Search Extensions: Soundex performance analysis
NinjaNye.SearchExtensions : Performance Analysis
This post is to explain, test and analyse the performance of the Soundex functionality within NinjaNye.SearchExtensions.
###Test environment All of the test results are from my development machine with the following specification:
- Intel Core i5-3317U CPU @ 1.70GHz
- 10Gb RAM
- Windows 8.1 64bit operating system
###Test Setup The tests were all performed against 1 million randomly generated words ranging from 2 to 10 characters. Each test had a new set of randomly generated words with which to work on. This processing was not included in the timing results.
Building the random words was done as follows:
private List<string> words;
private void BuildWords(int wordCount)
{
Console.WriteLine("Building {0} words...", wordCount);
this.words = new List<string>();
for (int i = 0; i < wordCount; i++)
{
string randomWord = this.BuildRandomWord();
this.words.Add(randomWord);
}
}
private const string letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
private string BuildRandomWord()
{
var letterCount = RandomInt(2, 10);
var sb = new StringBuilder(letterCount);
for (int i = 0; i < letterCount; i++)
{
var letterIndex = RandomInt(0, 51);
sb.Append(letters[letterIndex]);
}
return sb.ToString();
}
private static int RandomInt(int min, int max)
{
var rng = new RNGCryptoServiceProvider();
var buffer = new byte[4];
rng.GetBytes(buffer);
int result = BitConverter.ToInt32(buffer, 0);
return new Random(result).Next(min, max);
}
The above methods meant I was simply able to have the following [SetUp]
action performed on before every test.
[SetUp]
public void Setup()
{
this.BuildWords(1000000);
}
###The tests All the output you see below is taken from a single sample test run. These figures will vary slightly on each run since the source data is never exactly the same ####Converting 1 million words to soundex
[Test]
public void ToSoundex_OneMillionRecords_UnderOneSecond()
{
//Arrange
Console.WriteLine("Processing {0} words", words.Count);
var stopwatch = new Stopwatch();
Console.WriteLine("Begin soundex...");
stopwatch.Start();
//Act
var result = words.Select(x => x.ToSoundex()).ToList();
stopwatch.Stop();
Console.WriteLine("Time taken: {0}", stopwatch.Elapsed);
Console.WriteLine("Results retrieved: {0}", result.Count());
//Assert
Assert.True(stopwatch.Elapsed.TotalMilliseconds < 1000);
}
Output
Building 1000000 words...
Processing 1000000 words
Begin soundex...
Time taken: 00:00:00.6148250
Results retrieved: 1000000
####Querying 1 million words that sound like 'test'
[Test]
public void SearchSoundex_OneMillionWordsComparedToOneWord_UnderOneSecond()
{
//Arrange
Console.WriteLine("Processing {0} words", words.Count);
var stopwatch = new Stopwatch();
Console.WriteLine("Begin soundex search...");
stopwatch.Start();
//Act
var result = words.Search(x => x).Soundex("test").ToList();
stopwatch.Stop();
Console.WriteLine("Time taken: {0}", stopwatch.Elapsed);
Console.WriteLine("Results retrieved: {0}", result.Count);
//Assert
Assert.True(stopwatch.Elapsed.Milliseconds < 1000);
}
Output
Building 1000000 words...
Processing 1000000 words
Begin soundex search...
Time taken: 00:00:00.6203847
Results retrieved: 554
####Querying 1 million words that sound like 'test' or 'bacon'
[Test]
public void SearchSoundex_OneMillionWordsComparedToTwoWords_UnderOneSecond()
{
//Arrange
Console.WriteLine("Processing {0} words", words.Count);
var stopwatch = new Stopwatch();
Console.WriteLine("Begin soundex search...");
stopwatch.Start();
//Act
var result = words.Search(x => x).Soundex("test", "bacon").ToList();
stopwatch.Stop();
Console.WriteLine("Time taken: {0}", stopwatch.Elapsed);
Console.WriteLine("Results retrieved: {0}", result.Count);
//Assert
Assert.True(stopwatch.Elapsed.Milliseconds < 1000);
}
Output
Building 1000000 words...
Processing 1000000 words
Begin soundex search...
Time taken: 00:00:00.4232135
Results retrieved: 1230
####Querying 1 million words that sound like any of ten supplied words [Test] public void SearchSoundex_OneMillionWordsComparedToTenWords_UnderOneSecond() { //Arrange Console.WriteLine("Processing {0} words", words.Count);
var stopwatch = new Stopwatch();
Console.WriteLine("Begin soundex search...");
stopwatch.Start();
//Act
var result = words.Search(x => x).Soundex("historians", "often", "articulate", "great", "battles",
"elegantly", "without", "pause", "for", "thought").ToList();
stopwatch.Stop();
Console.WriteLine("Time taken: {0}", stopwatch.Elapsed);
Console.WriteLine("Results retrieved: {0}", result.Count);
//Assert
Assert.True(stopwatch.Elapsed.Milliseconds < 1000);
}
Output
Building 1000000 words...
Processing 1000000 words
Begin soundex search...
Time taken: 00:00:00.5468836
Results retrieved: 7149
Pedal to the metal
I am really pleased with these performance results. I have not seen how this performs against other Soundex processors out there but I hope it will be competitive. It was not always this perfomant and has only got to this stage through a lot of refactoring that was backed up by my test cases.
You can see the full source code for the
SoundexProcessor
the projects github repository
Performance tweaks
####Regex is cool! but slooooooooow
An initial implementation of the SoundexProcessor
used Regex. This was great as it enabled me to easily pass my tests, often by simply updating the regex codes. However, when it came to performance tests, processing 1 million records was taking close to 20 seconds.... With my tests as a safety net I was able to rewrite the functionality the regex pattern was providing by simply analysing each charachter myself.
char.ToUpper() vs char.IsUpper()
Click here for the code this relates section relates to
Another small improvement I made was instead of converting each character to it's upper counterpart and comparing that character to a set of upper case characters I simply check if it is an upper case character and perform the relevant checks based off of that character. This small change saved around 20% of the total running time at that point in time,
If you would like to know more about the soundex search functionality, please get in touch by adding a comment below or you can contact me on twitter (@ninjanye)