# Similarity Search

A Node.js module that performs word order independent similarity search on strings.

This module is built as a native addon that uses C code for fast similarity computations. It uses Jaccard similarity between word sets to find matches regardless of word order.

## Installation

```bash
npm install
```

## Usage

```javascript
const SimilaritySearch = require('./index');

// Create a new search index with default capacity (500)
const index = new SimilaritySearch();

// Add strings to the index
index.addString('bio bizz');
index.addString('lightmix bizz btio substrate');
index.addString('bizz bio mix light');

// Add multiple strings at once
index.addStrings([
  'plant growth bio formula',
  'garden soil substrate'
]);

// Search the index with a query and similarity cutoff
const results = index.search('bio bizz', 0.2);

// Display results
results.forEach(match => {
  console.log(`${match.similarity.toFixed(2)}: ${match.string}`);
});
```

## API

### `new SimilaritySearch([capacity])`

Creates a new search index.

- `capacity` (optional): Initial capacity for the index. Default: 500.

### `addString(str)`

Adds a string to the index.

- `str`: The string to add.
- Returns: Boolean indicating success.

### `addStrings(strings)`

Adds multiple strings to the index.

- `strings`: Array of strings to add.
- Returns: Boolean indicating if all adds were successful.

### `search(query, [cutoff])`

Searches the index for strings similar to the query.

- `query`: The search query.
- `cutoff` (optional): Similarity threshold between 0.0 and 1.0. Default: 0.2.
- Returns: Array of matching results, sorted by similarity (descending).

### `size()`

Gets the number of strings in the index.

- Returns: Number of strings in the index.

## Helper Functions

### `SimilaritySearch.createTestIndex([size])`

Creates a test index with random data.

- `size` (optional): Number of strings to generate. Default: 500.
- Returns: A new SimilaritySearch instance with random data.

### `SimilaritySearch.benchmark(index, queries, [cutoff])`

Benchmarks the search performance.

- `index`: The index to benchmark.
- `queries`: Array of search queries.
- `cutoff` (optional): Similarity threshold. Default: 0.2.
- Returns: Benchmark results.

## How It Works

The similarity search uses Jaccard similarity between word sets:

```
similarity = (number of words shared by both strings) / (number of unique words across both strings)
```

Because only set membership counts, word order doesn't matter: "bio bizz" matches "bizz bio" with a similarity of 1.0.

## Building

To rebuild the native addon:

```bash
npm install
```

## Testing

Run the test script:

```bash
npm test
```
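
To sanity-check scores without building the addon, the Jaccard formula from How It Works can be sketched in plain JavaScript. This is an illustration only, not the addon's C implementation. The names `toWordSet`, `jaccard`, and `referenceSearch` are hypothetical, and the sketch assumes that words are split on whitespace, compared case-sensitively, and that the cutoff is inclusive; the native code may handle any of these differently.

```javascript
// Illustrative pure-JavaScript sketch; not the addon's C implementation.
// Assumption: words are split on whitespace and compared case-sensitively.
function toWordSet(str) {
  return new Set(str.split(/\s+/).filter(Boolean));
}

// Jaccard similarity: |A ∩ B| / |A ∪ B| over the two word sets.
function jaccard(a, b) {
  const setA = toWordSet(a);
  const setB = toWordSet(b);
  // Convention chosen for this sketch: two empty strings score 0.
  if (setA.size === 0 && setB.size === 0) return 0;

  let shared = 0;
  for (const word of setA) {
    if (setB.has(word)) shared++;
  }
  // |A ∪ B| = |A| + |B| - |A ∩ B|
  return shared / (setA.size + setB.size - shared);
}

// Hypothetical helper that mirrors the documented search(query, [cutoff])
// result shape: matches at or above the cutoff, sorted by similarity descending.
function referenceSearch(strings, query, cutoff = 0.2) {
  return strings
    .map(string => ({ string, similarity: jaccard(query, string) }))
    .filter(match => match.similarity >= cutoff)
    .sort((x, y) => y.similarity - x.similarity);
}

const corpus = [
  'bio bizz',
  'lightmix bizz btio substrate',
  'bizz bio mix light'
];

for (const match of referenceSearch(corpus, 'bio bizz', 0.2)) {
  console.log(`${match.similarity.toFixed(2)}: ${match.string}`);
}
```

Under these assumptions, "bio bizz" against "bizz bio mix light" shares 2 of 4 unique words and scores 0.5, while "bio bizz" against "bizz bio" scores 1.0. Comparing such hand-computed values with the output of `index.search()` is one way to confirm how the addon tokenizes and applies the cutoff.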