Near Text Search

Peter Haworth pete at lcsql.com
Wed Feb 26 21:50:59 EST 2014


Does MySQL have a Soundex function?
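
For reference, MySQL does provide a built-in SOUNDEX() function and a SOUNDS LIKE operator. The sketch below, in Python purely for illustration, shows the classic American Soundex rules so the idea is visible. MySQL's variant differs in details (for example, it can return codes longer than four characters), so treat this as an approximation of the technique and not of MySQL's exact output. The function name soundex is my own.

```python
def soundex(name: str) -> str:
    """Classic four-character American Soundex code (illustrative sketch)."""
    # Map each coded consonant to its Soundex digit.
    codes = {}
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in letters:
            codes[ch] = digit

    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0].upper()
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        code = codes.get(ch, "")
        # Emit a digit only when it differs from the previous coded letter.
        if code and code != prev:
            result += code
        # 'h' and 'w' do not separate equal codes; vowels do.
        if ch not in "hw":
            prev = code
    # Pad with zeros and truncate to the standard length of four.
    return (result + "000")[:4]
```

Names that sound alike, such as "Robert" and "Rupert", get the same code. Since the first letter is kept as-is, a typo in the first character would not be caught this way.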

If I remember correctly, Devin has some functions that implement the
Levenshtein distance algorithm, which will identify potential misspellings
of a word.
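
To make the idea concrete, here is a minimal sketch of Levenshtein distance, written in Python for illustration only. It is not Devin's code, which I don't have in front of me. The names levenshtein and nearest are hypothetical, and the candidate list stands in for customer names already fetched from the database.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character inserts, deletes, or substitutions
    needed to turn a into b."""
    # Keep the shorter string in b so the working rows stay small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] is the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def nearest(name: str, candidates: list) -> tuple:
    """Return (candidate, distance) for the closest candidate,
    ignoring case. Raises ValueError if candidates is empty."""
    key = name.lower()
    return min(((c, levenshtein(key, c.lower())) for c in candidates),
               key=lambda pair: pair[1])
```

For example, nearest("Acme Corpration", ["Acme Corporation", "Apex Corp"]) picks "Acme Corporation" at distance 1. A small distance threshold could then flag likely duplicates for review; what counts as "small" is a judgment call for your data.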

Pete
lcSQL Software <http://www.lcsql.com>
Home of lcStackBrowser <http://www.lcsql.com/lcstackbrowser.html> and
SQLiteAdmin <http://www.lcsql.com/sqliteadmin.html>


On Wed, Feb 26, 2014 at 6:06 PM, Bob Sneidar <bobsneidar at iotecdigital.com> wrote:

> Hi all.
>
> I'm trying to devise a way to implement a "near text search" when querying
> a MySQL database. My problem is that I get spreadsheet forms filled out by
> hand from our dispatcher, and he sometimes makes typos, just small ones, and
> I need to ensure there are no virtual duplicate customer records in my
> application. So I need to query the database in such a way that I come up
> with the nearest neighbor. I could do this easily in FoxPro, because it
> provides an argument for it, but I've searched around and no one seems to be
> able to produce a nearest neighbor search for text! You can do it for
> numbers, just not text.
>
> So now I'm trying to devise a way to convert a string to a number in such
> a way that two different strings would be extremely unlikely to produce
> the same number. So far I've come up with this:
>
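> function textToNum theString
>    -- fold case so "Smith" and "smith" give the same number
>    put toLower(theString) into theString
>    put 0 into theNum
>    put 1 into theSeed
>    repeat for each char theAscii in theString
>       put charToNum(theAscii) into theAsciiCode
>       -- weight each character code by its 1-based position
>       add (theAsciiCode*theSeed) to theNum
>       add 1 to theSeed
>    end repeat
>    return theNum
> end textToNum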
>
> The idea is that each character code would be multiplied by a seed
> value representing its position in the string. However, I can foresee that
> it would be statistically possible to get pretty close and even get a match
> for two completely different strings. I *could* use a seed value equal to
> the number of lowercase printable characters in the lower ASCII table, but
> that could produce HUGE numbers and I am afraid of overflows.
>
> Any thoughts?
>
> Bob
>
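
The collision worry in the quoted message is easy to confirm. Below is a small Python sketch (Python only for illustration; text_to_num mirrors the intent of the quoted textToNum, and base27_value is a hypothetical stand-in for the "larger seed" idea, restricted to the letters a-z as an assumption).

```python
def text_to_num(text: str) -> int:
    """Position-weighted sum of character codes, as the quoted
    textToNum intends."""
    total = 0
    for position, ch in enumerate(text.lower(), start=1):
        total += ord(ch) * position
    return total


def base27_value(word: str) -> int:
    """Collision-free number for an a-z word: a=1 ... z=26, read as
    base-27 digits. Assumption: only plain letters are handled."""
    value = 0
    for ch in word.lower():
        if not "a" <= ch <= "z":
            raise ValueError("only a-z is handled in this sketch")
        value = value * 27 + (ord(ch) - ord("a") + 1)
    return value
```

With the weighted sum, "ab" gives 97*1 + 98*2 = 293 and "ca" gives 99*1 + 97*2 = 293, so two unrelated strings already collide at length two. The base-27 version avoids collisions, but thirteen letters is about the most that fits in a signed 64-bit integer, so the overflow concern is real for typical customer names. Either way, numeric closeness would not necessarily track spelling closeness, which is why an edit-distance comparison is probably the better fit here.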


