Improved Levenshtein search with SearchExtensions

by John Nye

04 Nov

Improved Levenshtein searching with SearchExtensions

NOTE: This post will not cover what Levenshtein Distance is or how it is calculated.
For more information on Levenshtein Distance, please visit the Wikipedia page

Up until now, NinjaNye.SearchExtensions has only been able to calculate the Levenshtein distance between:

  • A single property and a string search term
  • A single property and one other property

To celebrate 10,000 downloads of SearchExtensions, the latest version allows you to calculate the distance between:

  • A single property and multiple strings
  • Multiple properties and a single string
  • Multiple properties and multiple strings
  • A single property and multiple other properties
  • Multiple properties and a single property
  • Multiple properties and multiple other properties

You can install NinjaNye.SearchExtensions from nuget using the following:

PM> Install-Package NinjaNye.SearchExtensions

All of this is fairly difficult to describe, so below are some examples to help me explain.

Calculate levenshtein distance against multiple terms

The following functionality has been in SearchExtensions for a little while but it is a good starting point to describe how to get the Levenshtein distance between each users first name and the word "Jim"

var result = context.Users.LevenshteinDistanceOf(x => x.FirstName)

The above will return each user along with the Levenshtein distance of the FirstName compared to "Jim". In version 2.0, it is now possible to compare the FirstName to more than one term:

var result = context.Users.LevenshteinDistanceOf(x => x.FirstName)
      .ComparedTo("Jim", "Fred");

Calculate levenshtein distance against multiple properties

Calculating the distance against multiple properties is also now supported:

var result = context.Users.LevenshteinDistanceOf(x => x.FirstName, x => LastName)
      .ComparedTo("Jim", "Fred");

The above will result in 4 comparisons per record. Because the levenshtein calculation must be performed in memory, you should always reduce the record set as much as possible beforehand, e.g:

var result = context.Users.Search(x => x.CountryCode)
      .LevenshteinDistanceOf(x => x.FirstName, x => LastName)
      .ComparedTo("Jim", "Fred");

When performing a Levenshtein search it is important to always reduce your record set as much as possible, especially when comparing multiple properties to multiple terms

A new result

The result of a Levenshtein search also changed and now has additional properties to work with:

public interface ILevenshteinDistance<out T>
    /// <summary>
    /// The distance of the first comparison
    /// </summary>
    int Distance { get; }

    /// <summary>
    /// The queried item
    /// </summary>
    T Item { get; }

    /// <summary>
    /// A collection of all distances calculated
    /// </summary>
    int[] Distances { get; }

    /// <summary>
    /// The minimum distance of all levenshtein calculations
    /// </summary>
    int MinimumDistance { get; }

    /// <summary>
    /// The maximum distance of all levenshtein calculations
    /// </summary>
    int MaximumDistance { get; }

This means you can order the results so that the record with the closest match is ordered first:

var result = context.Users.LevenshteinDistanceOf(x => x.FirstName)
      .ComparedTo("Jim", "Fred")
      .OrderBy(x => x.MinimumDistance)
      .ThenBy(x => x.MaximumDistance);

If you would like to know more about this or any other feature, please get in touch by adding a comment below, contact me on twitter (@ninjanye) or you can raise an issue on the github project

Comments 0 * Be the first to comment!

Leave a message...

25 Feb