Searching "teh" or "tihs"

hh hh at hyperhh.de
Fri Mar 10 13:59:31 EST 2017


1. The algorithm I implemented is for "fuzzy search" of written/typed words,
not for "similar sounding words" (soundex), which is mostly quite a different
problem. My demo script looks up a (mistyped) search string among the 3233
keywords of LCScript.
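To make the difference concrete, here is a minimal Python sketch of fuzzy lookup by edit distance. It is not the LiveCode implementation from the forum thread. It assumes the "optimal string alignment" variant of Damerau-Levenshtein distance, where an insertion, a deletion, a substitution or a swap of two neighbouring characters each costs a penalty of one. KEYWORDS, osa_distance and fuzzy_lookup are hypothetical names, and the keyword list is only a small stand-in for the LCScript keywords.

```python
# Hypothetical sample standing in for the 3233 LCScript keywords.
KEYWORDS = ["the", "this", "then", "there", "put", "get"]


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertion, deletion,
    substitution and adjacent transposition each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                # swapped neighbours, e.g. "eh" vs "he", cost one penalty
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def fuzzy_lookup(query, keywords, max_penalty=1):
    """Return (penalty, keyword) pairs with penalty <= max_penalty, best first."""
    q = query.lower()
    hits = [(osa_distance(q, k.lower()), k) for k in keywords]
    return sorted(hit for hit in hits if hit[0] <= max_penalty)


print(fuzzy_lookup("teh", KEYWORDS))   # [(1, 'the')]
print(fuzzy_lookup("tihs", KEYWORDS))  # [(1, 'this')]
```

With a maximum penalty of one, "teh" finds only "the" and "tihs" finds only "this" in this sample list; raising max_penalty makes the search sloppier and returns more candidates.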

2. For me that's the true value of LiveCode:
Don't talk about possible development; just do it. Then within a few hours
you have a solution that works on Mac/Win/Linux with LC 6/7/8/9 and is often
fast enough even for a Raspberry Pi 2/3, independent of current OS flavours.

If that solution is not good enough or not fast enough for you, then you can
write C or Java extensions. We already have a Java FFI available in LC 9-dp6!
I'm really looking forward to your solution.

In the meantime you can use my approach; it was updated today. I fixed a
small bug in the percentage search, which wasn't sloppy enough ;-)
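One common way to express a "percentage" search is a similarity score with a cutoff. The sketch below only illustrates that general idea and is not the percentage search from the demo. It uses Python's standard-library difflib.SequenceMatcher.ratio(), scales it to 0..100, and keeps keywords at or above a threshold. KEYWORDS, percentage_matches and min_percent are hypothetical names chosen for this example.

```python
import difflib

# Hypothetical sample keywords; the real demo searches the LCScript keywords.
KEYWORDS = ["the", "this", "then", "there", "put", "get"]


def percentage_matches(query, keywords, min_percent=60):
    """Return (percent, keyword) pairs scoring >= min_percent, best first."""
    q = query.lower()
    scored = []
    for k in keywords:
        ratio = difflib.SequenceMatcher(None, q, k.lower()).ratio()
        percent = round(100 * ratio)  # similarity scaled to 0..100
        if percent >= min_percent:
            scored.append((percent, k))
    return sorted(scored, reverse=True)


print(percentage_matches("teh", KEYWORDS))
print(percentage_matches("tihs", KEYWORDS))
```

Lowering min_percent makes this kind of search sloppier, so more keywords pass the cutoff.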

> Bob S. wrote:
> There is always the soundex() sql function. SELECT soundex('the') = soundex('teh') returns true. Not sure what the tolerance is though. Because of the arbitrary nature of languages, this really requires a lookup table for commonly mistyped words, with the ability to "learn" as corrections are made. Then you would need to be able to "uncorrect" or delete entries. Eventually you end up with something that is likely built into the OS already, so at that point it would be better to write an extension in C or Java. 
> 
> Bob S
> 
> 
> > hh wrote:
> > 
> > Searching is important for your project?
> > Would you like to ask "Did you mean the?" if user searches "teh"?
> > 
> > I've implemented a fuzzySearch algorithm in LiveCode script:
> > http://forums.livecode.com/viewtopic.php?p=152202#p152202
> > 
> > Now if you wish to look up "the" or "this" then fuzzySearch will find
> > it (among others) by searching "teh" or "tihs", with a penalty score of
> > one only for swapping the chars. 
> 
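For comparison with the soundex() approach quoted above, here is a minimal Python sketch of classic American Soundex (first letter plus three digits). Database implementations differ in details; MySQL's SOUNDEX(), for example, does not truncate to four characters. The sketch shows why Soundex is a sound-alike test rather than a typo test: "teh" and "the" get the same code, but so does the unrelated word "to". The names soundex and _CODES are chosen for this example only.

```python
# Classic American Soundex digits; vowels, Y, H and W get no digit.
_CODES = {ch: digit
          for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                                 ("L", "4"), ("MN", "5"), ("R", "6"))
          for ch in letters}


def soundex(word: str) -> str:
    """Return the four-character American Soundex code of word."""
    letters = [ch for ch in word.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            result += digit
        if ch not in "HW":  # H and W do not separate letters with equal codes
            prev = digit    # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]


for w in ("the", "teh", "to", "this", "tihs"):
    print(w, soundex(w))  # the T000, teh T000, to T000, this T200, tihs T200
```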




More information about the use-livecode mailing list