SearchExtensions: Ranked search results for IQueryable search terms
I have recently updated my search extensions project to enable ranked search results. This lets a user search for a term within a property and also order the results by relevance, measured by the number of hits.
Full source code can be found here: https://github.com/ninjanye/searchextensions
The SearchExtensions NuGet package is also available by running the following:
PM> Install-Package NinjaNye.SearchExtensions
The Goal
The thought behind a ranked search is to enable users to easily search their data collections and determine which results are more relevant than others.
How to use it
A ranked search is called in the same way as a regular search:
var result = queryableData.RankedSearch(x => x.Property, "searchTerm");
When used with a SQL data provider, this produces the following SQL. Notice that all the searching and ranking is done in SQL (not in memory). The divisor 10 is the length of the search term 'searchTerm':
SELECT
[Project1].[C1] AS [C1],
[Project1].[Property] AS [Property]
...
FROM ( SELECT
[Extent1].[Property] AS [Property],
...
(( CAST(LEN([Extent1].[Property]) AS int)) -
( CAST(LEN(REPLACE([Extent1].[Property], N'searchTerm', N'')) AS int)))
/ 10 AS [C1]
FROM [dbo].[Table] AS [Extent1]
WHERE [Extent1].[Property] LIKE N'%searchTerm%'
) AS [Project1]
How it was built (Expression Trees)
So here is the implementation. Firstly, to represent my ranked result I have the following interface
public interface IRanked<out T>
{
    int Hits { get; }
    T Item { get; }
}
... with the following concrete class
internal class Ranked<T> : IRanked<T>
{
    public int Hits { get; set; }
    public T Item { get; set; }
}
The RankedSearch extension method:
public static class RankedSearchExtensions
{
    public static IQueryable<IRanked<T>> RankedSearch<T>(this IQueryable<T> source,
                                                         Expression<Func<T, string>> stringProperty,
                                                         string searchTerm)
    {
        // Reuse the property lambda's parameter so every expression refers to the same record
        var parameterExpression = stringProperty.Parameters[0];
        var hitCountExpression = CalculateHitCount(stringProperty, searchTerm);
        var rankedInitExpression = ConstructRankedResult<T>(hitCountExpression,
                                                            parameterExpression);
        var selectExpression =
            Expression.Lambda<Func<T, Ranked<T>>>(rankedInitExpression, parameterExpression);
        // Filter with the regular Search, then project each match into a ranked result
        return source.Search(stringProperty, searchTerm)
                     .Select(selectExpression);
    }
}
The first thing this method does is call CalculateHitCount, which creates an expression that represents counting the number of times a search term occurs. I am using the following method to count occurrences so that this can be used by all providers, specifically SQL.
Note: Always write down the code you are trying to build to help visualize the expression tree
x => (x.Name.Length - x.Name.Replace([searchTerm], "").Length) / [searchTerm].Length;
In terms of building the above as an expression tree, this was accomplished as follows:
private static BinaryExpression CalculateHitCount<T>(Expression<Func<T, string>> stringProperty,
                                                     string searchTerm)
{
    Expression searchTermExpression = Expression.Constant(searchTerm);
    // Store term length to work out how many search terms were found
    Expression searchTermLengthExpression = Expression.Constant(searchTerm.Length);
    // Empty string expression to replace search terms with
    Expression emptyStringExpression = Expression.Constant("");
    PropertyInfo stringLengthProperty = typeof (string).GetProperty("Length");
    // Calculate the length of property
    var lengthExpression = Expression.Property(stringProperty.Body, stringLengthProperty);
    // Replace searchTerm with empty string in property
    MethodInfo replaceMethod = typeof(string).GetMethod("Replace",
                                                        new[] {typeof (string), typeof (string)});
    var replaceExpression = Expression.Call(stringProperty.Body, replaceMethod,
                                            searchTermExpression, emptyStringExpression);
    // Calculate length of replaced string
    var replacedLengthExpression = Expression.Property(replaceExpression, stringLengthProperty);
    // Calculate the difference between the property and the replaced property
    var charDiffExpression = Expression.Subtract(lengthExpression, replacedLengthExpression);
    // Divide the character difference by the number of characters in the
    // search term to get the number of occurrences
    return Expression.Divide(charDiffExpression, searchTermLengthExpression);
}
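To check the arithmetic without a database, the same tree can be built over a plain string parameter and compiled to a delegate. This is only a minimal sketch: HitCountSketch and Build are hypothetical names used for illustration and are not part of SearchExtensions, and the sketch assumes a non-empty search term (string.Replace rejects an empty one).

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical helper for illustration only; not part of SearchExtensions.
public static class HitCountSketch
{
    // Builds s => (s.Length - s.Replace(term, "").Length) / term.Length
    public static Expression<Func<string, int>> Build(string term)
    {
        ParameterExpression s = Expression.Parameter(typeof(string), "s");
        PropertyInfo lengthProperty = typeof(string).GetProperty("Length");
        MethodInfo replaceMethod = typeof(string).GetMethod("Replace",
            new[] { typeof(string), typeof(string) });

        // s.Length
        Expression length = Expression.Property(s, lengthProperty);
        // s.Replace(term, "")
        Expression replaced = Expression.Call(s, replaceMethod,
            Expression.Constant(term), Expression.Constant(""));
        // s.Replace(term, "").Length
        Expression replacedLength = Expression.Property(replaced, lengthProperty);
        // Characters removed, divided by the term length, gives the hit count
        Expression body = Expression.Divide(
            Expression.Subtract(length, replacedLength),
            Expression.Constant(term.Length));

        return Expression.Lambda<Func<string, int>>(body, s);
    }
}
```

Compiling the result, for example with Func<string, int> count = HitCountSketch.Build("an").Compile();, gives count("banana") == 2, because "banana" shrinks to "ba" (6 - 2 = 4 characters removed, divided by a term length of 2), and count("cherry") == 0.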
The second part of RankedSearch is to initialize a Ranked search result that holds the hit count as well as the original item. We already have the hit count expression from the method above. We now need to build an expression tree that takes the hit count and builds a ranked result.
The equivalent lambda I want to build is as follows:
x => new Ranked<T>{ Hits = [hitCountExpression], Item = x}
This is represented as the following expression tree. It is fairly simple, as it is simply initializing our ranked result:
private static Expression ConstructRankedResult<T>(Expression hitCountExpression,
                                                   ParameterExpression parameterExpression)
{
    var rankedType = typeof (Ranked<T>);
    // Construct the object
    var rankedCtor = Expression.New(rankedType);
    // Assign hitCount to Hits property
    var hitProperty = rankedType.GetProperty("Hits");
    var hitValueAssignment = Expression.Bind(hitProperty, hitCountExpression);
    // Assign record to Item property
    var itemProperty = rankedType.GetProperty("Item");
    var itemValueAssignment = Expression.Bind(itemProperty, parameterExpression);
    // Initialize Ranked object with property assignments
    return Expression.MemberInit(rankedCtor, hitValueAssignment, itemValueAssignment);
}
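The member-initialization tree can also be tried out in memory. The sketch below is an illustration under assumptions rather than the library's code: RankedSketch<T>, RankedInitSketch and RankByLength are hypothetical names, and the string's Length stands in for the real hit-count expression so the example stays self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical stand-in for the article's Ranked<T>; not the library's own type.
public class RankedSketch<T>
{
    public int Hits { get; set; }
    public T Item { get; set; }
}

// Hypothetical helper for illustration only.
public static class RankedInitSketch
{
    // Builds x => new RankedSketch<string> { Hits = x.Length, Item = x }
    // (x.Length is a placeholder for the hit-count expression)
    public static Expression<Func<string, RankedSketch<string>>> Build()
    {
        ParameterExpression x = Expression.Parameter(typeof(string), "x");
        Type rankedType = typeof(RankedSketch<string>);

        // Hits = x.Length
        MemberAssignment hits = Expression.Bind(
            rankedType.GetProperty("Hits"), Expression.Property(x, "Length"));
        // Item = x
        MemberAssignment item = Expression.Bind(rankedType.GetProperty("Item"), x);

        // new RankedSketch<string> { Hits = ..., Item = ... }
        MemberInitExpression init =
            Expression.MemberInit(Expression.New(rankedType), hits, item);
        return Expression.Lambda<Func<string, RankedSketch<string>>>(init, x);
    }

    // Projects each item through the built expression, then orders by Hits.
    public static List<RankedSketch<string>> RankByLength(IEnumerable<string> items)
    {
        return items.AsQueryable()
                    .Select(Build())
                    .OrderByDescending(r => r.Hits)
                    .ToList();
    }
}
```

Calling RankedInitSketch.RankByLength(new[] { "pear", "fig", "banana" }) returns "banana" (6), "pear" (4) and "fig" (3) in that order. The ordering is applied by the caller after the projection, in the same way that RankedSearch leaves ordering to the consumer.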
Get in touch
I'm not entirely happy with the method name RankedSearch as it suggests the result is ordered by default. This is not the case, as the user can order the results as they see fit. RankedSearch simply provides an occurrence (hit) count of the search term. If you have a suggestion for a better method name, please get in touch via the comments below, Twitter, or by emailing me using the link in the header.
I am currently implementing the RankedSearch feature for use with multiple properties and multiple search terms (a future post, no doubt), but if you have any ideas for future features or enhancements, then, again, please get in touch using the normal channels.
Hi John, love the search extensions. The ranked one really saved me a lot of time! I've got one little bug though, wondering if you can help me?
I'm searching over 4 fields: Name, Role, Goal and Reason (all text fields). If I populate each field with the word "testing" and then search for the term "testing" I get an error of :
"The cast to value type 'Int32' failed because the materialized value is null. Either the result type's generic parameter or the query must use a nullable type."
But if I reduce it to "testin" then I get results back fine.
My LINQ query is as follows:
var results = query.Search(
p => p.Name,
p => p.Role,
p => p.Goal,
p => p.Reason).Containing(term.Split(' '))
.ToRanked()
.OrderByDescending(r => r.Hits);
When I then evaluate to process the list of items I get the above error.
Thanks,
Paul
Hi Paul,
Thanks for getting in touch. Glad to hear you like SearchExtensions. I'll look into your issue this evening as it sounds like there may be a bug somewhere.
Could I possibly ask you to create an issue on the project's GitHub page? I'll begin work immediately, but GitHub issues are better for tracking progress and receiving updates.
I'll post an update as soon as I have any findings.
Cheers John
Hi Paul,
If possible, could you also include the stack trace in the issue? Hopefully that will help me identify the root cause of the issue a little quicker. If you are unable to create an issue, just add the details here and I'll create the issue over on GitHub.
Thanks again
Hi John, the search is great and now I've discovered ranked search. I'm struggling because all of my other code expects IQueryable(Of MyType) and not IQueryable(Of IRanked(Of MyType)), and I'm having trouble converting back out. I realize the perf hit I might take, but my subsets are small.
Thanks, -Collin
Hi Collin,
You should be able to do something like
var converted = ranked.Select(r => r.Item);
Apologies if this is slightly off as it's from memory and typed out on my phone (away from an IDE)
Hope that helps.