Similarity Search
A Node.js module that performs word-order-independent similarity search on strings.
This module is built as a native addon that uses C code for fast similarity computations. Its similarity metric combines fuzzy matching, prefix matching, and word-level comparisons, so matches are found regardless of word order.
Installation
npm install similarity-search
Dependencies
- Node.js (with node-gyp for building native addons)
- nan (^2.22.2)
- node-addon-api (^6.0.0)
Usage
const SimilaritySearch = require('similarity-search');
// Create a new search index with default capacity (500)
const index = new SimilaritySearch();
// Add strings to the index
index.addString('bio bizz');
index.addString('lightmix bizz btio substrate');
index.addString('bizz bio mix light');
// Add multiple strings at once
index.addStrings([
  'plant growth bio formula',
  'garden soil substrate'
]);
// Search the index with a query and similarity cutoff
const results = index.search('bio bizz', 0.2);
// Display results
results.forEach(match => {
  console.log(`${match.similarity.toFixed(2)}: ${match.string}`);
});
API
new SimilaritySearch([capacity])
Creates a new search index.
- capacity (optional): Initial capacity for the index. Default: 500.
- Returns: A new SimilaritySearch instance.
addString(str)
Adds a string to the index.
- str: The string to add.
- Returns: Boolean indicating success (true if successful, false otherwise).
addStrings(strings)
Adds multiple strings to the index.
- strings: Array of strings to add.
- Returns: Boolean indicating whether all adds succeeded (true if all succeeded, false if any failed).
search(query, [cutoff])
Searches the index for strings similar to the query.
- query: The search query.
- cutoff (optional): Similarity threshold between 0.0 and 1.0. Default: 0.2.
- Returns: Array of matching results, sorted by similarity (descending). Each result is an object with:
  - string: The matching string
  - similarity: The similarity score (0.0 to 1.0)
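The return contract described above (filter by cutoff, sort by descending similarity, return objects with string and similarity) can be illustrated without the native addon. The following is a minimal pure-JavaScript sketch, not the module's implementation: searchWith and wordOverlap are hypothetical names, the scorer is a toy, and keeping scores exactly equal to the cutoff is an assumption.

```javascript
// Hypothetical stand-in for the documented search() contract.
// `scoreFn(query, candidate)` is an assumed callback returning a score in [0, 1].
function searchWith(strings, query, cutoff = 0.2, scoreFn) {
  return strings
    .map(string => ({ string, similarity: scoreFn(query, string) }))
    // ASSUMPTION: a score equal to the cutoff is kept; the README does not say.
    .filter(match => match.similarity >= cutoff)
    // Highest similarity first, as documented for search().
    .sort((a, b) => b.similarity - a.similarity);
}

// Toy scorer for illustration only: fraction of query words found in the candidate.
function wordOverlap(query, candidate) {
  const words = new Set(candidate.toLowerCase().split(/\s+/));
  const queryWords = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (queryWords.length === 0) return 0;
  const hits = queryWords.filter(w => words.has(w)).length;
  return hits / queryWords.length;
}

const results = searchWith(
  ['bio bizz', 'bizz bio mix light', 'garden soil substrate', 'bio formula'],
  'bio bizz',
  0.2,
  wordOverlap
);

results.forEach(match => {
  console.log(`${match.similarity.toFixed(2)}: ${match.string}`);
});
```

With this toy scorer, 'garden soil substrate' shares no words with the query and falls below the cutoff, so it is absent from the results.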
size()
Gets the number of strings in the index.
- Returns: Number of strings in the index.
Helper Functions
SimilaritySearch.createTestIndex([size])
Creates a test index with random data.
- size (optional): Number of strings to generate. Default: 500.
- Returns: A new SimilaritySearch instance with random data.
- Note: The first 5 strings are fixed test cases, followed by randomly generated strings.
SimilaritySearch.benchmark(index, queries, [cutoff])
Benchmarks the search performance.
- index: The index to benchmark.
- queries: Array of search queries.
- cutoff (optional): Similarity threshold. Default: 0.2.
- Returns: Array of benchmark results, each containing:
  - query: The search query
  - matches: Number of matches found
  - timeMs: Search time in milliseconds
  - topResults: Top 5 matching results
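To make the shape of the benchmark results concrete, here is a rough sketch of how a helper with this signature might be written. It is an illustration under assumptions, not the module's code: benchmarkSketch and fakeIndex are hypothetical names, and fakeIndex is a stand-in object with a toy search method so the example runs without the native addon.

```javascript
const { performance } = require('perf_hooks');

// Hypothetical re-creation of a benchmark helper: times each query and
// reports the documented fields (query, matches, timeMs, topResults).
function benchmarkSketch(index, queries, cutoff = 0.2) {
  return queries.map(query => {
    const start = performance.now();
    const results = index.search(query, cutoff);
    const timeMs = performance.now() - start;
    return {
      query,
      matches: results.length,
      timeMs,
      topResults: results.slice(0, 5) // at most the first 5 results
    };
  });
}

// Stand-in index (NOT the real SimilaritySearch): any object with a
// search(query, cutoff) method returning { string, similarity } objects works.
const fakeIndex = {
  strings: ['bio bizz', 'bizz bio mix light', 'garden soil substrate'],
  search(query, cutoff) {
    // Toy scoring: 1 if the candidate contains the query as a substring, else 0.
    return this.strings
      .map(string => ({ string, similarity: string.includes(query) ? 1 : 0 }))
      .filter(match => match.similarity >= cutoff);
  }
};

const report = benchmarkSketch(fakeIndex, ['bio', 'soil']);
report.forEach(r => {
  console.log(`${r.query}: ${r.matches} matches in ${r.timeMs.toFixed(3)} ms`);
});
```

Because the helper only relies on a search method, the same sketch should also accept a real SimilaritySearch instance in place of fakeIndex.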
How It Works
The similarity search uses a three-stage matching algorithm:
- Word-level Matching: The algorithm first splits both the query and target strings into words.
- Word Similarity Calculation: For each word pair, similarity is calculated using:
  - Levenshtein distance for fuzzy matching
  - Special handling for short words (words of 3 characters or fewer require an exact match)
  - Prefix matching for words of significantly different lengths
  - Length-based similarity adjustments
- Overall Similarity Score: The final similarity score is a weighted combination of:
  - Word match score (70% weight): Percentage of query words that have a good match
  - Average word similarity (30% weight): Average similarity of the best matching word pairs
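The stages above can be sketched in plain JavaScript. This is an illustrative approximation, not the addon's C implementation. Only the 70/30 weighting and the 3-character exact-match rule come from this README. The constants GOOD_MATCH and PREFIX_SCORE, the "at least twice as long" prefix condition, lowercasing, and checking the prefix rule before the short-word rule (so that "bio" can match "biology") are all assumptions.

```javascript
// ASSUMED constants; the README does not give the real values.
const GOOD_MATCH = 0.7;   // minimum word similarity counted as a "good match"
const PREFIX_SCORE = 0.8; // score awarded to a prefix match

// Classic dynamic-programming Levenshtein distance (two-row variant).
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Similarity of two non-empty words in [0, 1].
function wordSimilarity(a, b) {
  if (a === b) return 1;
  const shorter = a.length <= b.length ? a : b;
  const longer = a.length <= b.length ? b : a;
  // Prefix rule. ASSUMPTION: "significantly different length" means >= 2x.
  if (longer.length >= 2 * shorter.length && longer.startsWith(shorter)) {
    return PREFIX_SCORE;
  }
  // Short words (3 characters or fewer) must match exactly.
  if (shorter.length <= 3) return 0;
  // Fuzzy match: edit distance normalized by the longer word's length.
  return 1 - levenshtein(a, b) / longer.length;
}

// ASSUMPTION: words are lowercased and split on whitespace.
function tokenize(s) {
  return s.toLowerCase().split(/\s+/).filter(Boolean);
}

// Overall score: 70% share of query words with a good match,
// 30% average of each query word's best similarity.
function similarity(query, target) {
  const qWords = tokenize(query);
  const tWords = tokenize(target);
  if (qWords.length === 0 || tWords.length === 0) return 0;
  let matched = 0;
  let total = 0;
  for (const q of qWords) {
    const best = Math.max(...tWords.map(t => wordSimilarity(q, t)));
    if (best >= GOOD_MATCH) matched++;
    total += best;
  }
  return 0.7 * (matched / qWords.length) + 0.3 * (total / qWords.length);
}

console.log(similarity('bio bizz', 'bizz bio mix light').toFixed(2));
console.log(similarity('bio bizz', 'lightmix bizz btio substrate').toFixed(2));
```

Because each query word is compared against every target word and only its best match is kept, the score in this sketch does not depend on word order in either string.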
This approach provides robust matching that:
- Handles typos and slight variations in words
- Requires exact matches for short words to avoid false positives
- Recognizes prefix matches (e.g., "bio" matches "biology")
- Considers both word presence and character-level similarity