Near Text Search

Bob Sneidar bobsneidar at iotecdigital.com
Wed Feb 26 21:06:09 EST 2014


Hi all.

I’m trying to devise a way to implement a “near text search” when querying a MySQL database. My problem is that I get spreadsheet forms filled out by hand by our dispatcher, and he sometimes makes typos, just small ones, and I need to ensure there are no virtual duplicate customer records in my application. So I need to query the database in such a way that I come up with the nearest neighbor. I could do this easily in FoxPro, because it provides an argument for it, but I’ve searched around and no one seems to be able to produce a nearest neighbor search for text! You can do it for numbers, just not text.

So now I’m trying to devise a way to convert a string to a number in such a way that two different strings would be extremely unlikely to produce the same number. So far I’ve come up with this:

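function textToNum theString
   -- fold case so "Smith" and "smith" give the same number
   put toLower(theString) into theString
   put 0 into theNum
   put 1 into theSeed
   repeat for each char theChar in theString
      put charToNum(theChar) into theAsciiCode
      -- weight each character code by its position in the string
      add (theAsciiCode*theSeed) to theNum
      add 1 to theSeed
   end repeat
   return theNum
end textToNum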

The idea is that each character’s code is multiplied by a seed value representing its position in the string. However, I can foresee that two completely different strings could produce numbers that are very close, or even identical. I *could* instead use a seed value equal to the number of lowercase printable characters in the lower ASCII table, but that could produce HUGE numbers, and I am afraid of overflows.
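
To make the collision worry concrete, here is a minimal sketch of the same position-weighted sum with two short inputs worked by hand. The names weightedSum and showCollision are made up for this illustration, and the arithmetic in the comments assumes plain lowercase ASCII codes (a = 97).

```livecode
-- Hypothetical helper: multiplies each character code by its
-- 1-based position in the string and adds up the results.
function weightedSum pString
   local tText, tSum, tPos
   put toLower(pString) into tText
   put 0 into tSum
   put 0 into tPos
   repeat for each char tChar in tText
      add 1 to tPos
      add charToNum(tChar) * tPos to tSum
   end repeat
   return tSum
end weightedSum

-- Hypothetical test handler: shows both results in the message box.
command showCollision
   -- "ac": 97*1 + 99*2 = 295
   -- "cb": 99*1 + 98*2 = 295
   -- Two unrelated strings, same number.
   put weightedSum("ac") && weightedSum("cb")
end showCollision
```

The reverse problem shows up as well: a small typo does not necessarily give a nearby number. By the same arithmetic, "smith" comes to 1632, while "mith" (first letter dropped) comes to 1083, because every remaining character shifts to a new position. On the overflow side, if the seed were a base of 26 (an assumption; the real count depends on which characters are allowed) and numbers are held as double-precision values, whole numbers stay exact only up to 2^53, and 26^11 < 2^53 < 26^12, so roughly 11 characters would fit before precision is lost.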

Any thoughts?

Bob


