genesis
README.md
# Similarity Search

A Node.js module that performs word-order-independent similarity search on strings.

This module is built as a native addon that uses C code for fast similarity computations. It uses Jaccard similarity between word sets to find matches regardless of word order.

## Installation

```bash
npm install
```

## Usage

```javascript
const SimilaritySearch = require('./index');

// Create a new search index with default capacity (500)
const index = new SimilaritySearch();

// Add strings to the index
index.addString('bio bizz');
index.addString('lightmix bizz btio substrate');
index.addString('bizz bio mix light');

// Add multiple strings at once
index.addStrings([
  'plant growth bio formula',
  'garden soil substrate'
]);

// Search the index with a query and similarity cutoff
const results = index.search('bio bizz', 0.2);

// Display results
results.forEach(match => {
  console.log(`${match.similarity.toFixed(2)}: ${match.string}`);
});
```

## API

### `new SimilaritySearch([capacity])`

Creates a new search index.

- `capacity` (optional): Initial capacity for the index. Default: 500.

### `addString(str)`

Adds a string to the index.

- `str`: The string to add.
- Returns: Boolean indicating success.

### `addStrings(strings)`

Adds multiple strings to the index.

- `strings`: Array of strings to add.
- Returns: Boolean indicating whether all strings were added successfully.

### `search(query, [cutoff])`

Searches the index for strings similar to the query.

- `query`: The search query.
- `cutoff` (optional): Similarity threshold between 0.0 and 1.0. Default: 0.2.
- Returns: Array of matching results, sorted by similarity (descending).

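
The filtering and ordering described for `search` can be sketched in plain JavaScript. This is an illustration under assumptions, not the addon's actual code: `rankMatches` is a hypothetical helper, the `{ string, similarity }` shape is taken from the usage example, and treating the cutoff as inclusive (`>=`) is a guess.

```javascript
// Hypothetical helper: keep entries that meet the cutoff, best match first.
// Assumption: the cutoff is inclusive; the native addon may use a strict comparison.
function rankMatches(scored, cutoff = 0.2) {
  return scored
    .filter(match => match.similarity >= cutoff)   // filter() returns a new array
    .sort((a, b) => b.similarity - a.similarity);  // descending by similarity
}

// Pre-scored entries, made up for illustration only.
const scored = [
  { string: 'garden soil substrate', similarity: 0 },
  { string: 'bizz bio mix light', similarity: 0.5 },
  { string: 'bio bizz', similarity: 1 },
];

const ranked = rankMatches(scored, 0.2);
console.log(ranked.map(match => match.string));
```

With these inputs the entry scoring 0 falls below the cutoff, and the remaining two come back with the exact match first.
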
### `size()`

Gets the number of strings in the index.

- Returns: Number of strings in the index.

## Helper Functions

### `SimilaritySearch.createTestIndex([size])`

Creates a test index with random data.

- `size` (optional): Number of strings to generate. Default: 500.
- Returns: A new SimilaritySearch instance with random data.

### `SimilaritySearch.benchmark(index, queries, [cutoff])`

Benchmarks the search performance.

- `index`: The index to benchmark.
- `queries`: Array of search queries.
- `cutoff` (optional): Similarity threshold. Default: 0.2.
- Returns: Benchmark results.

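
The shape of the benchmark results is not documented above. As a rough idea of what such a measurement involves, the sketch below times a batch of queries by hand. `timeQueries`, its returned fields, and the stand-in `fakeSearch` function are all hypothetical; it assumes a Node.js version where `performance` is available as a global.

```javascript
// Hypothetical timing helper; it does not mirror the return value of benchmark().
function timeQueries(searchFn, queries, cutoff = 0.2) {
  const start = performance.now();
  let totalMatches = 0;
  for (const query of queries) {
    totalMatches += searchFn(query, cutoff).length;
  }
  const totalMs = performance.now() - start;
  return {
    queries: queries.length,
    totalMatches,
    totalMs,
    avgMs: queries.length > 0 ? totalMs / queries.length : 0,
  };
}

// Stand-in search function so the sketch runs without the native addon.
const fakeSearch = (query, cutoff) =>
  query.length > 3 ? [{ string: query, similarity: 1 }] : [];

const timing = timeQueries(fakeSearch, ['bio bizz', 'abc']);
console.log(timing);
```

To time a real index, the same helper could be called as `timeQueries((q, c) => index.search(q, c), queries)`.
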
## How It Works

The similarity search uses Jaccard similarity between word sets:

```
similarity = (number of words shared by both strings) / (total unique words across both strings)
```

This means word order doesn't matter: "bio bizz" matches "bizz bio" with 100% similarity (a score of 1.0).

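
The formula can be tried out with a short pure-JavaScript sketch. It is not the addon's C implementation: `toWordSet` and `jaccardSimilarity` are hypothetical names, and the sketch assumes words are split on whitespace and compared case-sensitively, which may differ from what the native code does.

```javascript
// Sketch only: word-set Jaccard similarity in plain JavaScript.
// Assumption: whitespace tokenization, case-sensitive comparison.
function toWordSet(str) {
  return new Set(str.split(/\s+/).filter(Boolean));
}

function jaccardSimilarity(a, b) {
  const setA = toWordSet(a);
  const setB = toWordSet(b);
  // Assumption: two empty strings score 0 rather than dividing by zero.
  if (setA.size === 0 && setB.size === 0) return 0;

  let shared = 0;
  for (const word of setA) {
    if (setB.has(word)) shared += 1;
  }
  // |A ∪ B| = |A| + |B| - |A ∩ B|
  const union = setA.size + setB.size - shared;
  return shared / union;
}

console.log(jaccardSimilarity('bio bizz', 'bizz bio'));              // 1
console.log(jaccardSimilarity('bio bizz', 'bizz bio mix light'));    // 0.5
console.log(jaccardSimilarity('bio bizz', 'garden soil substrate')); // 0
```

In the second call, two words are shared out of four unique words across both strings, giving 2 / 4 = 0.5.
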
## Building

To rebuild the native addon:

```bash
npm install
```

## Testing

Run the test script:

```bash
npm test
```