Near Text Search

Bob Sneidar bobsneidar at iotecdigital.com
Thu Feb 27 04:17:51 CET 2014


Don’t know how I missed that, thanks. 

Bob


On Feb 26, 2014, at 18:50, Peter Haworth <pete at lcsql.com> wrote:

> Does MySQL have a Soundex function?
> 
> If I remember correctly, Devin has some functions that implement a
> Levenshtein distance algorithm, which will identify potential misspellings
> of a word.
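
For reference, a minimal sketch of the Levenshtein distance mentioned above, written in Python rather than LiveCode. This is not Devin's code; the function names `levenshtein` and `nearest` and the sample customer names are made up for illustration. It counts the fewest single-character insertions, deletions, and substitutions needed to turn one string into another, then picks the candidate with the smallest count.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # iterate rows over the longer string, keep rows short
    # previous[j] holds the distance from the empty string to b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # drop a character from a
                current[j - 1] + 1,      # add a character from b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def nearest(name, candidates):
    """Return the candidate closest to name by edit distance, ignoring case."""
    return min(candidates, key=lambda c: levenshtein(name.lower(), c.lower()))
```

With hypothetical data, `nearest("Acme Pritning", ["Acme Printing", "Zenith Labs", "Borealis Freight"])` picks "Acme Printing", since the swapped letters cost only two substitutions. Comparing against every stored name is a full scan, so for a large customer table it would probably make sense to narrow the candidates first (by first letter or length, for example) before computing distances.
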
> 
> Pete
> lcSQL Software <http://www.lcsql.com>
> Home of lcStackBrowser <http://www.lcsql.com/lcstackbrowser.html> and
> SQLiteAdmin <http://www.lcsql.com/sqliteadmin.html>
> 
> 
> On Wed, Feb 26, 2014 at 6:06 PM, Bob Sneidar <bobsneidar at iotecdigital.com> wrote:
> 
>> Hi all.
>> 
>> I'm trying to devise a way to implement a "near text search" when querying
>> a MySQL database. My problem is, I get spreadsheet forms filled out by hand
>> from our dispatcher, and he sometimes makes typos, just small ones, and I
>> need to ensure there are no virtual duplicate customer records in my
>> application. So I need to query the database in such a way that I come up
>> with the nearest neighbor. I could do this easily in FoxPro, because they
>> provide an argument for it, but I've searched around and no one seems to be
>> able to produce a nearest neighbor search for text! You can do it for
>> numbers, just not text.
>> 
>> So now I'm trying to devise a way to convert a string to a number in such
>> a way that a match between two different strings would be extremely
>> unlikely. So far I've come up with this:
>> 
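>> function textToNum theString
>>   -- fold case so "Smith" and "smith" produce the same number
>>   put toLower(theString) into theString
>>   put 0 into theNum
>>   put 1 into theSeed
>>   repeat for each char theChar in theString
>>      put charToNum(theChar) into theAsciiCode
>>      -- weight each character code by its 1-based position
>>      add (theAsciiCode*theSeed) to theNum
>>      add 1 to theSeed
>>   end repeat
>>   return theNum
>> end textToNum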
>> 
>> The idea is that each character's code would be multiplied by a seed
>> value representing its position in the string. However I can foresee that
>> it would be statistically possible to get pretty close and even get a match
>> for two completely different strings. I *could* use a seed value equal to
>> the number of lower case printable characters in the lower ASCII table, but
>> that could produce HUGE numbers and I am afraid of overflows.
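
Both worries can be checked with a small sketch. It is a Python port for illustration, not the LiveCode above: `text_to_num` mirrors the position-weighted sum, and `text_to_base_num` is one reading of the larger-seed idea, treating each character code as a digit in base 128. The base is an assumption chosen to cover all of 7-bit ASCII, not the exact seed described.

```python
def text_to_num(text: str) -> int:
    """Sum of each character code times its 1-based position (port of textToNum)."""
    return sum(ord(ch) * pos for pos, ch in enumerate(text.lower(), start=1))


def text_to_base_num(text: str, base: int = 128) -> int:
    """Read the character codes as digits of one number in the given base."""
    num = 0
    for ch in text.lower():
        # shift the digits so far one place left, then append this code
        num = num * base + ord(ch)
    return num


# "ac" -> 97*1 + 99*2 and "cb" -> 99*1 + 98*2 both come to 295
collision = text_to_num("ac") == text_to_num("cb")

# the base-128 form keeps the two strings apart, but grows 7 bits per character
bits_needed = text_to_base_num("abcdefghij").bit_length()
```

So the weighted sum collides even on two-letter strings, while the base-128 form is unambiguous for printable ASCII but passes 64 bits at ten characters, which is more than a MySQL BIGINT column can hold. Note also that neither number measures closeness: changing the first letter of a name moves the base-128 value enormously, even though it is a one-character typo.
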
>> 
>> Any thoughts?
>> 
>> Bob
>> _______________________________________________
>> use-livecode mailing list
>> use-livecode at lists.runrev.com
>> Please visit this url to subscribe, unsubscribe and manage your
>> subscription preferences:
>> http://lists.runrev.com/mailman/listinfo/use-livecode
>> 



