Near Text Search
Bob Sneidar
bobsneidar at iotecdigital.com
Wed Feb 26 21:06:09 EST 2014
Hi all.
I’m trying to devise a way to implement a “near text search” when querying a MySQL database. My problem is that I get spreadsheet forms filled out by hand by our dispatcher. He sometimes makes typos, just small ones, and I need to ensure there are no near-duplicate customer records in my application. So I need to query the database in such a way that I come up with the nearest neighbor. I could do this easily in FoxPro, because it provides an argument for it, but I’ve searched around and no one seems to be able to produce a nearest-neighbor search for text! You can do it for numbers, just not text.
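The usual way to measure how “near” two strings are is edit (Levenshtein) distance: the smallest number of single-character insertions, deletions and substitutions that turns one string into the other. Below is a minimal Python sketch. It assumes the stored customer names have already been fetched from MySQL into a list and are compared in application code. The helper names `levenshtein` and `nearest` are hypothetical and are not part of LiveCode or MySQL.

```python
def levenshtein(a, b):
    """Edit distance: minimum number of single-character insertions,
    deletions and substitutions needed to turn a into b."""
    a, b = a.lower(), b.lower()          # compare case-insensitively
    prev = list(range(len(b) + 1))       # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                       # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def nearest(name, candidates):
    """Return (best_candidate, distance) for the closest stored name."""
    best = min(candidates, key=lambda c: levenshtein(name, c))
    return best, levenshtein(name, best)


if __name__ == "__main__":
    stored = ["ACME Suply", "Apex Supply", "Acme Service"]
    print(nearest("Acme Supply", stored))  # ('ACME Suply', 1)
```

A small distance threshold, such as 1 or 2, could then flag a new record as a probable duplicate for manual review instead of inserting it blindly.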
So now I’m trying to devise a way to convert a string to a number such that two different strings would be extremely unlikely to produce the same number. So far I’ve come up with this:
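function textToNum theString
   -- fold case; the original "put lower(theString)" discarded its result
   put toLower(theString) into theString
   put 0 into theNum
   put 1 into theSeed
   repeat for each char theChar in theString
      -- weight each character code by its 1-based position
      add charToNum(theChar) * theSeed to theNum
      add 1 to theSeed
   end repeat
   return theNum
end textToNum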
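For experimenting outside LiveCode, here is a Python port of the handler above. The name `text_to_num` is illustrative only. The port also makes the collision worry concrete: “ab” and “ca” already map to the same number, because 97·1 + 98·2 = 99·1 + 97·2 = 293.

```python
def text_to_num(the_string):
    """Port of the textToNum handler: the sum of each character code
    multiplied by its 1-based position in the (lowercased) string."""
    total = 0
    for position, ch in enumerate(the_string.lower(), start=1):
        total += ord(ch) * position
    return total


if __name__ == "__main__":
    print(text_to_num("ab"), text_to_num("ca"))  # 293 293
```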
The idea is that each character code is multiplied by a seed value representing its position in the string. However, I can foresee that it would be statistically possible for two completely different strings to get pretty close, or even to match exactly. I *could* use the number of lowercase printable characters in the lower ASCII table as the multiplier for each successive position, but that would produce HUGE numbers, and I am afraid of overflows.
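Using the alphabet size as a per-position multiplier amounts to treating each character as a digit in a positional number system. The sketch below assumes a hypothetical 27-symbol alphabet (a to z plus space), with digit values 1 to 27 in base 28, so distinct strings never collide. Python integers do not overflow, but the sketch exposes two other problems. First, the value needs about 1.45 decimal digits per character, so anything longer than 13 characters no longer fits a signed 64-bit integer such as MySQL’s BIGINT. Second, numeric closeness reflects *where* a typo occurs rather than *how many* typos there are.

```python
# Hypothetical alphabet: lowercase letters plus space. Digit values run
# from 1 to 27 and the base is 28, so digit 0 is never used and every
# distinct string gets a distinct number.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "
BASE = len(ALPHABET) + 1


def positional_code(s):
    """Encode s as a base-28 number, most significant character first."""
    n = 0
    for ch in s.lower():
        # ALPHABET.index raises ValueError for characters outside the alphabet
        n = n * BASE + ALPHABET.index(ch) + 1
    return n


if __name__ == "__main__":
    smith = positional_code("smith")
    print(positional_code("tmith") - smith)  # 614656: typo in 1st character
    print(positional_code("smyth") - smith)  # 12544: typo in 3rd character
    print(positional_code(" " * 13) < 2**63)  # True: 13 chars fit a signed 64-bit int
    print(positional_code(" " * 14) < 2**63)  # False: 14 chars do not
```

A single typo in the first letter of “smith” moves the code by 28^4 = 614,656, while one in the third letter moves it by only 16 · 28^2 = 12,544. A numeric nearest-neighbor search on such codes therefore would not reliably find typo-level near matches.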
Any thoughts?
Bob
More information about the use-livecode mailing list