Near Text Search
Bob Sneidar
bobsneidar at iotecdigital.com
Wed Feb 26 22:17:51 EST 2014
Don’t know how I missed that, thanks.
Bob
On Feb 26, 2014, at 18:50, Peter Haworth <pete at lcsql.com> wrote:
> Does MySQL have a Soundex function?
>
> If I remember correctly, Devin has some functions that implement a
> Levenshtein distance algorithm, which will identify potential misspellings
> of a word.
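
For reference, a minimal LiveCode sketch of the Levenshtein distance idea mentioned above. This is not Devin's code; the handler and variable names are made up for illustration. It counts the fewest single-character insertions, deletions, and substitutions needed to turn one string into another, so a small result suggests a likely typo:

```livecode
function levenshteinDistance pSource, pTarget
   local tSourceLen, tTargetLen, tPrevRow, tCurrRow, tCost, i, j
   put the number of chars in pSource into tSourceLen
   put the number of chars in pTarget into tTargetLen

   -- row 0: building the first j chars of pTarget from nothing takes j insertions
   repeat with j = 0 to tTargetLen
      put j into tPrevRow[j]
   end repeat

   repeat with i = 1 to tSourceLen
      put empty into tCurrRow
      -- column 0: removing the first i chars of pSource takes i deletions
      put i into tCurrRow[0]
      repeat with j = 1 to tTargetLen
         if char i of pSource is char j of pTarget then
            put 0 into tCost
         else
            put 1 into tCost
         end if
         -- cheapest of: deletion, insertion, substitution (or match)
         put min(tPrevRow[j] + 1, tCurrRow[j - 1] + 1, tPrevRow[j - 1] + tCost) into tCurrRow[j]
      end repeat
      put tCurrRow into tPrevRow
   end repeat

   return tPrevRow[tTargetLen]
end levenshteinDistance
```

For example, `put levenshteinDistance("kitten", "sitting")` gives 3. The comparison uses `is`, so it ignores case unless `the caseSensitive` is set to true.
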
>
> Pete
> lcSQL Software <http://www.lcsql.com>
> Home of lcStackBrowser <http://www.lcsql.com/lcstackbrowser.html> and
> SQLiteAdmin <http://www.lcsql.com/sqliteadmin.html>
>
>
> On Wed, Feb 26, 2014 at 6:06 PM, Bob Sneidar <bobsneidar at iotecdigital.com> wrote:
>
>> Hi all.
>>
>> I'm trying to devise a way to implement a "near text search" when querying
>> a MySQL database. My problem is, I get spreadsheet forms filled out by hand
>> from our dispatcher, and he sometimes makes typos, just small ones, and I
>> need to ensure there are no virtual duplicate customer records in my
>> application. So I need to query the database in such a way that I come up
>> with the nearest neighbor. I could do this easily in Foxpro, because they
>> provide an argument for it, but I've searched around and no one seems to be
>> able to produce a nearest neighbor search for text! You can do it for
>> numbers, just not text.
>>
>> So now I'm trying to devise a way to convert a string to a number in such
>> a way that two different strings would be extremely unlikely to produce
>> the same number. So far I've come up with this:
>>
>> function textToNum theString
>>    -- fold case so "Smith" and "smith" produce the same number
>>    put toLower(theString) into theString
>>    put 0 into theNum
>>    put 1 into theSeed
>>    repeat for each char theChar in theString
>>       -- weight each character code by its 1-based position
>>       add (charToNum(theChar) * theSeed) to theNum
>>       add 1 to theSeed
>>    end repeat
>>    return theNum
>> end textToNum
>>
>> The idea is that each character code would be multiplied by a seed
>> value representing its position in the string. However, I can foresee that
>> it would be statistically possible to get pretty close and even get a match
>> for two completely different strings. I *could* use a seed value equal to
>> the number of lower case printable characters in the lower ASCII table, but
>> that could produce HUGE numbers and I am afraid of overflows.
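
To make the collision risk concrete, here is a small self-contained sketch of the same position-weighted sum. The handler names `positionalSum` and `showCollision` are made up for illustration. With weights 1, 2, 3, ... even the two-letter strings "ab" and "ca" come out as the same number:

```livecode
function positionalSum pText
   local tText, tSum, tWeight, tChar
   put toLower(pText) into tText
   put 0 into tSum
   put 1 into tWeight
   repeat for each char tChar in tText
      -- character code times its 1-based position
      add (charToNum(tChar) * tWeight) to tSum
      add 1 to tWeight
   end repeat
   return tSum
end positionalSum

on showCollision
   -- "ab": 97*1 + 98*2 = 293
   -- "ca": 99*1 + 97*2 = 293
   put positionalSum("ab") & comma & positionalSum("ca")
end showCollision
```

Because the sum only adds weighted codes together, raising the first character by 2 and lowering the second by 1 cancels out, which is why unrelated strings can match exactly.
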
>>
>> Any thoughts?
>>
>> Bob
>> _______________________________________________
>> use-livecode mailing list
>> use-livecode at lists.runrev.com
>> Please visit this url to subscribe, unsubscribe and manage your
>> subscription preferences:
>> http://lists.runrev.com/mailman/listinfo/use-livecode
>>