Fuzzy String for Javascript (a quicksilver-like string-comparing utility)
Welcome to Fuzzy String 101.
You wanted a javascript utility that compares two strings and tells you how similar they are? Done.
You have users who want to be able to type just a few characters and the option they're thinking about rises to the top, ready for them to simply hit Enter. But you don't have a tool that compares their few characters with a list of values and tells you which one is the closest match. Actually, you do.
Fuzzy String is like QuickSilver on steroids.
You can even change the weights of different matching characteristics.
Get it: http://github.com/dcparker/jquery_plugins/tree/master/fuzzy-string
How Fuzzy String Works
Note the terms: Fuzzy String tells you how alike an "abbreviation" is to each "base string".To begin, Fuzzy String makes the following assumptions:
- The characters you type don't have to match in the right order - although they lost a lot of point value if they're out of order.
- Characters in the base string can only be matched once. So, "Toby".score("TT") probably won't match (all depending on your weights), because the second "T" doesn't match any character that wasn't already matched.
The Algorithm
- Establish a max potential score for the base string based on the current weights.
- Step through the abbreviation character by character, find ALL possible ways to match each character (creates a tree of possible match paths for the entire abbreviation) and add bonuses to each match based on the weights.
- Add up the points from each abbreviation character for each possible path, choose the best path, and return the proportion of that score to the base string's max potential score.
The Weights
- +1 to start with.
- +0.5 for matching the first character. You already get the Acronym score for this, so just adds a little more.
- +1 for matching the first character of a word.
- +0.2 for matching a capital letter.
- +0.2 for matching the case.
- +0.2 for each consecutive character you match.
- -5.5 penalty for not finding the character at all.
- -1, and reduces all other bonuses to 0.1 of their original value, when the character matched is out of order.
Examples
// See which name is the best match
'Sid Kelley'.score('sk') => 0.29341317365269465
'Sid Kelley Jr'.score('sk') => 0.22171945701357465
'Shay Smirker'.score('sk') => 0.2
'Skore Mellowman'.score('sk') => 0.17299578059071732
// Now, see which set of characters matches the best
'Sid Kelley'.score('SK') => 0.3173652694610779
'Sid Kelley'.score('SKi') => 0.3403293413173653
'Sid Kelley'.score('SKid') => 0.4241616766467067
'Sid Kelley'.score('SKidlly') => 0.6517065868263472
'Sid Kelley'.score('SKidllyee') => 0.7465269461077843
// Put a couple more in the right order...
'Sid Kelley'.score('SidKllyee') => 0.8073952095808382
'Sid Kelley'.score('SidKelley') => 0.9041916167664668
'Sid Kelley'.score('Sid Kelley') => 1
3 comments:
So...why is this tagged as jQuery?
It looks pretty cool, but I'm still not sure what exactly the applications would be. The only one that comes to mind is having a search autocomplete function that returns 10 items, and then sort them using this plugin or something.
Exactly. Typically you would use this to sort live search results. You'd get the best results if you have the entire list in javascript, and let the fuzzy-compare find your matches and just show you the top 10.
I tagged it jQuery because it can be used with several jQuery plugins that use quicksilver.js as a drop-in replacement, for comparable (but better/smarter) results. Most relevant is probably my jQuery quickselect plugin.
I just downloaded the jquery quickselect plugin but the sample.html page (first dropdown) doesn't work ...
Post a Comment