genesis

2025-04-18 08:22:35 +02:00
commit 51a3cc6c2d
13 changed files with 794 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,119 @@
+# Similarity Search
+
+A Node.js module that performs word-order-independent similarity search on strings.
+
+This module is built as a native addon that uses C code for fast similarity computations. It uses Jaccard similarity between word sets to find matches regardless of word order.
+
+## Installation
+
+```bash
+npm install
+```
+
+## Usage
+
+```javascript
+const SimilaritySearch = require('./index');
+
+// Create a new search index with default capacity (500)
+const index = new SimilaritySearch();
+
+// Add strings to the index
+index.addString('bio bizz');
+index.addString('lightmix bizz btio substrate');
+index.addString('bizz bio mix light');
+
+// Add multiple strings at once
+index.addStrings([
+  'plant growth bio formula',
+  'garden soil substrate'
+]);
+
+// Search the index with a query and similarity cutoff
+const results = index.search('bio bizz', 0.2);
+
+// Display results
+results.forEach(match => {
+  console.log(`${match.similarity.toFixed(2)}: ${match.string}`);
+});
+```
+
+## API
+
+### `new SimilaritySearch([capacity])`
+
+Creates a new search index.
+
+- `capacity` (optional): Initial capacity for the index. Default: 500.
+
+### `addString(str)`
+
+Adds a string to the index.
+
+- `str`: The string to add.
+- Returns: Boolean indicating success.
+
+### `addStrings(strings)`
+
+Adds multiple strings to the index.
+
+- `strings`: Array of strings to add.
+- Returns: Boolean indicating whether every string was added successfully.
+
+### `search(query, [cutoff])`
+
+Searches the index for strings similar to the query.
+
+- `query`: The search query.
+- `cutoff` (optional): Similarity threshold between 0.0 and 1.0. Default: 0.2.
+- Returns: Array of matches, sorted by similarity (descending). Each match has a `string` and a `similarity` property.
+
+### `size()`
+
+Gets the number of strings in the index.
+
+- Returns: Number of strings in the index.
+
+## Helper Functions
+
+### `SimilaritySearch.createTestIndex([size])`
+
+Creates a test index with random data.
+
+- `size` (optional): Number of strings to generate. Default: 500.
+- Returns: A new SimilaritySearch instance with random data.
+
+### `SimilaritySearch.benchmark(index, queries, [cutoff])`
+
+Benchmarks the search performance.
+
+- `index`: The index to benchmark.
+- `queries`: Array of search queries.
+- `cutoff` (optional): Similarity threshold. Default: 0.2.
+- Returns: Benchmark results.
+
+## How It Works
+
+The similarity search uses Jaccard similarity between word sets:
+
+```
+similarity = (number of words shared by both strings) / (number of unique words across both strings)
+```
+
+This means word order doesn't matter: "bio bizz" matches "bizz bio" with a similarity of 1.0.
+
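The Jaccard formula described in "How It Works" can be sketched in plain JavaScript as a reference model. This is not the module's C implementation. The helper names `wordSet`, `jaccard`, and `referenceSearch` are hypothetical, and the tokenization (whitespace-separated, case-sensitive), the score for two empty strings, and the inclusive cutoff comparison are all assumptions that the native code may not share.

```javascript
// Reference sketch only: the real module computes similarity in C.
// Assumption: words are whitespace-separated and compared case-sensitively.
function wordSet(text) {
  return new Set(text.split(/\s+/).filter(Boolean));
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|.
function jaccard(a, b) {
  const setA = wordSet(a);
  const setB = wordSet(b);
  // Assumption: two empty strings score 0 rather than dividing by zero.
  if (setA.size === 0 && setB.size === 0) return 0;
  let shared = 0;
  for (const word of setA) {
    if (setB.has(word)) shared += 1;
  }
  // |A ∪ B| = |A| + |B| - |A ∩ B|
  return shared / (setA.size + setB.size - shared);
}

// Hypothetical stand-in for index.search(): keep matches at or above the
// cutoff, highest similarity first. Whether the native code compares with
// >= or > is an assumption.
function referenceSearch(strings, query, cutoff = 0.2) {
  return strings
    .map(string => ({ string, similarity: jaccard(query, string) }))
    .filter(match => match.similarity >= cutoff)
    .sort((x, y) => y.similarity - x.similarity);
}

console.log(jaccard('bio bizz', 'bizz bio'));           // 1
console.log(jaccard('bio bizz', 'bizz bio mix light')); // 0.5
```

In the second call the two strings share 2 words out of 4 unique words, giving 0.5. A model like this could serve as a cross-check for the native addon's results in tests, provided the assumptions above match its behavior.
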
+## Building
+
+To rebuild the native addon:
+
+```bash
+npm install
+```
+
+## Testing
+
+Run the test script:
+
+```bash
+npm test
+```