Near Text Search
Peter Haworth
pete at lcsql.com
Wed Feb 26 21:50:59 EST 2014
Does mySQL have a Soundex function?
If I remember correctly, Devin has some functions that implement a
Levenshtein distance algorithm, which will identify potential misspellings
of a word.
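
For reference, here is a minimal sketch of what a Levenshtein distance
function could look like in LiveCode. This is not Devin's code; the handler
name levenshteinDistance and its parameter names are made up for
illustration. It counts the fewest single-character insertions, deletions
and substitutions needed to turn one string into the other, keeping only
two rows of the usual dynamic-programming table.

```livecode
-- Hypothetical helper, not taken from any existing library.
-- Returns the edit distance between pSource and pTarget.
function levenshteinDistance pSource, pTarget
   local tLenS, tLenT, tPrev, tCurr, tCost
   put the number of chars of pSource into tLenS
   put the number of chars of pTarget into tLenT
   if tLenS is 0 then return tLenT
   if tLenT is 0 then return tLenS
   -- tPrev[j] = distance from the empty string to the first j chars of pTarget
   repeat with j = 0 to tLenT
      put j into tPrev[j]
   end repeat
   repeat with i = 1 to tLenS
      -- deleting the first i chars of pSource costs i edits
      put i into tCurr[0]
      repeat with j = 1 to tLenT
         -- comparison is case-insensitive unless the caseSensitive is true
         if char i of pSource is char j of pTarget then
            put 0 into tCost
         else
            put 1 into tCost
         end if
         -- cheapest of: deletion, insertion, substitution (or match)
         put min(tPrev[j] + 1, tCurr[j - 1] + 1, tPrev[j - 1] + tCost) into tCurr[j]
      end repeat
      put tCurr into tPrev
   end repeat
   return tPrev[tLenT]
end levenshteinDistance
```

For example, levenshteinDistance("Acme Suply", "Acme Supply") returns 1
(one inserted "p"). A small distance relative to the length of the name
suggests a probable duplicate; what threshold to use is a judgment call
that would need tuning against real customer data.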
Pete
lcSQL Software <http://www.lcsql.com>
Home of lcStackBrowser <http://www.lcsql.com/lcstackbrowser.html> and
SQLiteAdmin <http://www.lcsql.com/sqliteadmin.html>
On Wed, Feb 26, 2014 at 6:06 PM, Bob Sneidar <bobsneidar at iotecdigital.com> wrote:
> Hi all.
>
> I'm trying to devise a way to implement a "near text search" when querying
> a mySQL database. My problem is, I get spreadsheet forms filled out by hand
> from our dispatcher, and he sometimes makes typos, just small ones, and I
> need to ensure there are no virtual duplicate customer records in my
> application. So I need to query the database in such a way that I come up
> with the nearest neighbor. I could do this easily in Foxpro, because they
> provide an argument for it, but I've searched around and no one seems to be
> able to produce a nearest neighbor search for text! You can do it for
> numbers, just not text.
>
> So now I'm trying to devise a way to convert a string to a number in such
> a way that two different strings would be extremely unlikely to produce
> the same number. So far I've come up with this:
>
> function textToNum theString
>    -- normalize case so "Smith" and "smith" give the same number
>    put toLower(theString) into theString
>    put 0 into theNum
>    put 1 into theSeed
>    repeat for each char theChar in theString
>       -- weight each character code by its 1-based position
>       add (charToNum(theChar) * theSeed) to theNum
>       add 1 to theSeed
>    end repeat
>    return theNum
> end textToNum
>
> The idea is that each character position would be multiplied by a seed
> value representing its position in the string. However I can foresee that
> it would be statistically possible to get pretty close and even get a match
> for two completely different strings. I *could* use a seed value equal to
> the number of lower case printable characters in the lower ascii table, but
> that could produce HUGE numbers and I am afraid of overflows.
>
> Any thoughts?
>
> Bob