SearchExtensions: Soundex Support
New Soundex support in NinjaNye.SearchExtensions
I have recently released a new version of the NinjaNye.SearchExtensions
NuGet package. The main feature of this release is Soundex search support.
SearchExtensions is a library of IQueryable and IEnumerable extension methods to help simplify string searching.
What is Soundex
Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling. [Source: Wikipedia]
As of release 1.1, NinjaNye.SearchExtensions supports converting and searching for words based on the Soundex algorithm.
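To make the encoding concrete, here is a minimal sketch of the classic American Soundex rules: keep the first letter, map the remaining consonants to digits, collapse adjacent letters that share a digit, and pad to four characters. The class name `SoundexSketch` and method `Encode` are hypothetical and exist only for illustration. This is not the code inside NinjaNye.SearchExtensions, and the library may treat edge cases such as H and W differently.

```csharp
using System;
using System.Text;

// Illustrative only: a hypothetical helper, not the library's implementation.
public static class SoundexSketch
{
    // Digit for each letter A..Z; '0' marks letters that carry no code
    // (the vowels plus H, W and Y).
    private const string Codes = "01230120022455012623010202";

    public static string Encode(string word)
    {
        if (string.IsNullOrEmpty(word)) return string.Empty;

        var result = new StringBuilder(4);
        char previous = '0';

        foreach (char raw in word)
        {
            char c = char.ToUpperInvariant(raw);
            if (c < 'A' || c > 'Z') continue;   // ignore anything that is not A-Z

            char code = Codes[c - 'A'];

            if (result.Length == 0)
            {
                result.Append(c);               // the first letter is kept as a letter
            }
            else if (code != '0' && code != previous)
            {
                result.Append(code);            // new consonant sound
                if (result.Length == 4) break;  // a code is one letter plus three digits
            }

            // H and W do not separate letters with the same digit; vowels do.
            if (c != 'H' && c != 'W') previous = code;
        }

        // Pad short codes with zeros, e.g. "T23" becomes "T230".
        return result.Length == 0 ? string.Empty : result.ToString().PadRight(4, '0');
    }
}
```

With this sketch, `SoundexSketch.Encode("Robert")` and `SoundexSketch.Encode("Rupert")` both return `R163`, which is the sense in which the two names "sound alike".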
###How to: Performing Soundex searches
Search where a single property sounds like a single search term
var result = data.Search(x => x.Property1).Soundex("test");
Search where any of multiple properties sounds like a single search term
var result = data.Search(x => x.Property1, x => x.PropertyTwo)
                 .Soundex("test");
Search where a single property sounds like any one of multiple search terms
var result = data.Search(x => x.Property1).Soundex("test", "another");
Search where any of multiple properties sounds like any of multiple search terms
var result = data.Search(x => x.Property1, x => x.PropertyTwo)
                 .Soundex("test", "another");
###How to: Combining Soundex searches
Soundex searches are chained in the same way as any other search and can be combined with any other search type, although not every combination will be appropriate.
Search where Property1 sounds like term1 AND Property2 sounds like term2
var result = data.Search(x => x.Property1).Soundex("test")
                 .Search(x => x.Property2).Soundex("another");
###How to: Converting words to Soundex
As part of this update I created an extension method on string that can be used to convert a word to its Soundex code. This extension method is public and can be used as you wish outside of the Search() functionality.
Producing the Soundex code for a word is simple. First, make sure you are using the Soundex namespace:
using NinjaNye.SearchExtensions.Soundex;
Once you have this, you can use the ToSoundex() extension method:
string word = "test";
string soundex = word.ToSoundex();
Converting multiple words to Soundex codes
string sentence = "the quick brown fox";
string[] words = sentence.Split(' ');
var codes = words.Select(x => x.ToSoundex());
###Performance
A lot of the examples I saw whilst researching the subject performed the same task but not always in the most performant way. Because of this I was keen to build something that would scale. Below are the tests I ran during development.
####Test environment
All of these test results are from my development machine with the following specification:
- Intel Core i5-3317U CPU @ 1.70GHz
- 10GB RAM
- Windows 8.1 64-bit operating system
All of the tests below were performed against 1 million randomly generated words ranging from 2 to 10 characters in length.
Converting words using ToSoundex()
var result = words.Select(x => x.ToSoundex()).ToList();
Time taken: 0.6919661 seconds
Querying words that match 'test'
var result = words.Search(x => x).Soundex("test").ToList();
Time taken: 0.6385429 seconds (618 results)
Querying words that match two words
var result = words.Search(x => x).Soundex("test", "bacon").ToList();
Time taken: 0.4372583 seconds (1285 results)
Querying words that match ten words
var result = words.Search(x => x).Soundex("historians", "often", "articulate", "great", "battles", "elegantly", "without", "pause", "for", "thought").ToList();
Time taken: 0.5831033 seconds (7093 results)
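The harness used to produce these timings is not shown here, so the following is only a sketch of how similar measurements could be set up, assuming a seeded `System.Random` for the word list and `System.Diagnostics.Stopwatch` for timing. The names `TimingSketch`, `RandomWords` and `Time` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;

// Illustrative only: hypothetical helpers, not the harness behind the figures above.
public static class TimingSketch
{
    // Builds `count` random lowercase words of minLength..maxLength characters.
    public static List<string> RandomWords(int count, int minLength, int maxLength, int seed)
    {
        var random = new Random(seed);          // fixed seed keeps runs repeatable
        var words = new List<string>(count);
        var buffer = new StringBuilder(maxLength);

        for (int i = 0; i < count; i++)
        {
            buffer.Clear();
            // Random.Next's upper bound is exclusive, hence the + 1.
            int length = random.Next(minLength, maxLength + 1);
            for (int j = 0; j < length; j++)
            {
                buffer.Append((char)('a' + random.Next(26)));
            }
            words.Add(buffer.ToString());
        }

        return words;
    }

    // Runs the action once and returns the elapsed wall-clock time.
    public static TimeSpan Time(Action action)
    {
        var stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

A call such as `TimingSketch.Time(() => words.Search(x => x).Soundex("test").ToList())` over a list built with `TimingSketch.RandomWords(1000000, 2, 10, 42)` would time one of the queries above. The `ToList()` call matters because it forces the deferred query to execute inside the timed action.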
For a more in-depth write-up of the performance testing I have done, please see my latest post on the subject.
###Feature requests
I've had a great time developing this feature and I'm always open to new ideas, so if you have an idea for a feature that you believe would be a good addition to SearchExtensions, please get in touch.
Equally, if you are currently using SearchExtensions and can see areas that could be enhanced or improved, I'd love to hear from you.
If you would like to get in contact, please add a comment below or contact me on Twitter (@ninjanye).
Hi, is the Soundex algorithm language-sensitive, or could it also work in Italian? Thank you!
Hi @Frederico,
Unfortunately Soundex is based on the English pronunciation of words. I believe there is something called Phonetic matching that fixes this issue, but I have not looked into it.