SearchExtensions: Soundex Support

by John Nye

21 Oct 2014

New Soundex support in NinjaNye.SearchExtensions

I have recently released a new version of the NinjaNye.SearchExtensions NuGet package. The main feature of this release is Soundex search support.

PM> Install-Package NinjaNye.SearchExtensions

SearchExtensions is a library of IQueryable and IEnumerable extension methods to help simplify string searching.

What is Soundex

Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling. [Source: Wikipedia]

As of release 1.1, NinjaNye.SearchExtensions supports converting and searching for words based on the Soundex algorithm.
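
To make the idea concrete, here is a minimal sketch of the classic American Soundex encoding. It is not the library's implementation, and ToSoundex() may differ in details such as how it treats H and W. The SoundexSketch class and its Encode method are hypothetical names used only for illustration.

```csharp
using System;
using System.Text;

public static class SoundexSketch
{
    // Soundex digit for each letter A..Z. '0' marks letters that carry
    // no code of their own (the vowels plus H, W and Y).
    private const string Codes = "01230120022455012623010202";

    // Hypothetical helper: returns the four-character Soundex code for a word.
    public static string Encode(string word)
    {
        if (string.IsNullOrEmpty(word)) return string.Empty;

        var result = new StringBuilder(4);
        char previousCode = '0';

        foreach (char raw in word)
        {
            char letter = char.ToUpperInvariant(raw);
            if (letter < 'A' || letter > 'Z') continue; // ignore non-letters

            char code = Codes[letter - 'A'];

            if (result.Length == 0)
            {
                // The first letter is kept as-is, but its code is remembered
                // so that an immediately repeated code is not added again.
                result.Append(letter);
                previousCode = code;
                continue;
            }

            // H and W do not separate letters that share a code.
            if (letter == 'H' || letter == 'W') continue;

            if (code != '0' && code != previousCode)
            {
                result.Append(code);
                if (result.Length == 4) break;
            }

            // A vowel resets previousCode to '0', so the same code may
            // appear again after it.
            previousCode = code;
        }

        if (result.Length == 0) return string.Empty;

        // Short codes are padded with zeros to four characters.
        return result.ToString().PadRight(4, '0');
    }

    public static void Main()
    {
        Console.WriteLine(Encode("Robert")); // R163
        Console.WriteLine(Encode("Rupert")); // R163
    }
}
```

"Robert" and "Rupert" produce the same code, which is why a Soundex search treats them as a match.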

How to: Performing Soundex searches

Search where a single property sounds like a single search term

var result = data.Search(x => x.Property1).Soundex("test");

Search where any of multiple properties sounds like a single search term

var result = data.Search(x => x.Property1, x => x.Property2)
                 .Soundex("test");

Search where a single property sounds like any one of multiple search terms

var result = data.Search(x => x.Property1).Soundex("test", "another");

Search where any of multiple properties sounds like any of multiple search terms

var result = data.Search(x => x.Property1, x => x.Property2)
                 .Soundex("test", "another");

How to: Combining Soundex searches

Soundex searches are chained in the same way as any other search, and they can be combined with other search types, although doing so may not always be appropriate.

Search where Property1 sounds like term1 AND Property2 sounds like term2

var result = data.Search(x => x.Property1).Soundex("test")
                 .Search(x => x.Property2).Soundex("another");

How to: Converting words to Soundex

As part of this update I created an extension method on string that converts a word to its Soundex code. This extension method is public and can be used independently of the Search() functionality.

Producing the Soundex code for a word is simple. Firstly, make sure you are using the Soundex namespace:

using NinjaNye.SearchExtensions.Soundex;

Once you have this, you can use the ToSoundex() extension method:

string word = "test";
string soundex = word.ToSoundex();

Converting multiple words to Soundex codes

// Select() requires: using System.Linq;
string sentence = "the quick brown fox";
string[] words = sentence.Split(' ');   // Split returns an array of words
var codes = words.Select(x => x.ToSoundex());

Performance

Many of the examples I saw whilst researching the subject did the job, but not always efficiently. Because of this I was keen to build something that would scale. Below are the tests I ran during development.

Test environment

All of these test results are from my development machine with the following specification:

  • Intel Core i5-3317U CPU @ 1.70GHz
  • 10 GB RAM
  • Windows 8.1 64-bit operating system

All of the tests below were performed against 1 million randomly generated words ranging from 2 to 10 characters in length.
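
For anyone wanting to run a comparable measurement, the following is a rough sketch of how such a test could be set up. It is not the harness used for the figures in this post: the TimingSketch class, the word generator, the fixed seed and the lowercase-only alphabet are all assumptions made for illustration. A stand-in conversion is used so the sketch compiles without the package; swap in x => x.ToSoundex() once NinjaNye.SearchExtensions is referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class TimingSketch
{
    // Builds `count` random lowercase words of 2 to 10 characters.
    // The fixed seed is an assumption, chosen so that runs are repeatable.
    public static List<string> GenerateWords(int count, int seed = 42)
    {
        var random = new Random(seed);
        var words = new List<string>(count);
        for (int i = 0; i < count; i++)
        {
            int length = random.Next(2, 11); // upper bound is exclusive
            var chars = new char[length];
            for (int j = 0; j < length; j++)
            {
                chars[j] = (char)('a' + random.Next(26));
            }
            words.Add(new string(chars));
        }
        return words;
    }

    // Times one full pass of `convert` over the words.
    public static TimeSpan Time(IEnumerable<string> words, Func<string, string> convert)
    {
        var stopwatch = Stopwatch.StartNew();
        var result = words.Select(convert).ToList(); // ToList forces evaluation
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    public static void Main()
    {
        var words = GenerateWords(1000000);

        // Stand-in conversion (assumption); replace with x => x.ToSoundex()
        // when the package is referenced.
        var elapsed = Time(words, x => x.ToUpperInvariant());

        Console.WriteLine("Time taken: " + elapsed.TotalSeconds + " seconds");
    }
}
```

The ToList() call matters: LINQ queries are lazily evaluated, so without it the stopwatch would only measure the cost of building the query, not of running it.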

Converting words using ToSoundex()

var result = words.Select(x => x.ToSoundex()).ToList();
Time taken: 0.6919661 seconds

Querying words that match 'test'

var result = words.Search(x => x).Soundex("test").ToList();
Time taken: 0.6385429 seconds (618 results)

Querying words that match two words

var result = words.Search(x => x).Soundex("test", "bacon").ToList();
Time taken: 0.4372583 seconds (1285 results)

Querying words that match ten words

var result = words.Search(x => x).Soundex("historians", "often", "articulate", "great", "battles", "elegantly", "without", "pause", "for", "thought").ToList();
Time taken: 0.5831033 seconds (7093 results)

For a more in-depth write-up of the performance testing I have done, please see my latest post on the subject.


Feature requests

I've had a great time developing this feature, and I'm always open to new ideas. If you have an idea for a feature that you believe would be a good addition to SearchExtensions, please get in touch.

Equally, if you are currently using SearchExtensions and can see areas that could be enhanced or improved, I'd love to hear from you.

If you would like to get in contact, please add a comment below or contact me on Twitter (@ninjanye).

Comments 2

Federico says: 1563 days ago

Hi, the soundex algorithm is language sensitive or could it also work in Italian? Thank you!

John says: 1553 days ago

Hi @Federico,

Unfortunately Soundex is based on the English pronunciation of words. I believe there is something called Phonetic matching that fixes this issue, but I have not looked into it.

