SpellCheck (re-inventing the wheel)

Alex Tweedly alex at tweedly.net
Sat Jan 29 20:53:16 EST 2005


Sivakatirswami wrote:

> ... if anyone gets their head around some fast algorithms for  pulling 
> up a short suggestion list before I do, please post it.

I'd say it doesn't need to be fast; it will only run when the word is 
not in the dictionary, and the result is that the user is consulted and 
has to make a decision and click or type in response. So while I'd never 
say you can just ignore speed, the algorithm need only be reasonable.

>
> My context is that most words that the users will mis-spell are 
> specialized (sanskrit, tamil, names of obscure places e.g. "Chennai" 
> ). So, where an obvious mis-spelling like "mikl " the user will know 
> how to change to "milk"-- for the specialized words, she will need to 
> see a selection of choices...
>
> I know we could go "crazy" with this and code for auto replacement of 
> the mis-spelled word, etc. which adds a new layer of complexity,  but 
> for now I would be satisfied with manual user entry into a separate 
> stack where they could enter 1, 2, 3 initial chars and get a list of 
> words starting with those characters (99.9% of cases the first char is 
> assured), a click down could put that on their clip board and they 
> could paste it over the mis-spelled word. I would have our master 
> all-publications lexicon entries loaded-appended to that global 
> variable gWords, to supplement the main word list.

If you want to do good, intelligent replacement suggestions, look into 
the links Brian gave - esp. the aspell one. You should be able to run 
aspell as an external program, and use a pipe to get data into it and 
back to your app (I think so, anyway - I haven't used pipes with Rev).
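
A rough, untested sketch of the simplest route, assuming a Unix-like 
system with aspell on the path. It uses shell() rather than a true pipe, 
and "aspell -a" is aspell's ispell-compatible mode, which answers a 
misspelt word with a line of the form "& word count offset: guess, guess, ...". 
The function name aspellSuggest is made up for illustration.

```livecode
function aspellSuggest pWord
  local tOut, tLine, tList
  -- assumption: pWord is a plain word containing no quotes or shell characters
  put shell("echo" && quote & pWord & quote && "| aspell -a") into tOut
  repeat for each line tLine in tOut
    -- only lines starting with "&" carry suggestions
    if char 1 of tLine is "&" then
      set the itemDelimiter to ":"
      put item 2 to -1 of tLine into tList
      replace ", " with cr in tList
      -- trim the leading space left after the colon
      return word 1 to -1 of tList
    end if
  end repeat
  return empty -- word was accepted, or aspell had no suggestions
end aspellSuggest
```

This returns one suggestion per line, or empty if aspell offers nothing.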

But if you are truly happy to leave your users to sort out the "standard 
English" words on their own, and only need to deal with the specialized 
ones .....

I'd suggest keeping the specialized words in a separate file. They can 
be read in and added to the gWords array, but also kept in a separate 
variable. By all means have a separate stack - but why make the user 
type the initial letters again? They've already typed the word once 
(admittedly wrongly). I'd use their failed attempt as a seed, and rank 
all the specialized words in your dictionary against it.
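
Loading the separate file might look something like this. It is only a 
sketch: the handler name loadSpecial and the file path are made up, the 
file is assumed to hold one word per line, and gWords is assumed to be an 
array keyed by word (adjust to however your gWords is really built).

```livecode
on loadSpecial pFile
  global gWords, gSpecial
  local tWord
  -- keep the specialized words on their own, one per line
  put url ("file:" & pFile) into gSpecial
  -- and also add them to the main dictionary array
  repeat for each line tWord in gSpecial
    if tWord is not empty then put true into gWords[tWord]
  end repeat
end loadSpecial
```

Called as, say, loadSpecial "special-words.txt" when the stack opens.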

Assuming that the specialized words are in the global gSpecial, one per 
line, a cheap and cheerful way to generate the suggestions would be:

    function suggest pWord
      global gSpecial
      local tIdeas, L, t, i, tResult

      repeat for each line L in gSpecial
        -- t counts how many leading chars of L match pWord
        put 0 into t
        repeat with i = 1 to the number of chars in pWord
          if char i of pWord = char i of L then
            add 1 to t
          else
            exit repeat
          end if
        end repeat
        if t > 0 then put L && t & cr after tIdeas
      end repeat
      -- best matches first; numeric, so that a score of 10 sorts above 9
      sort lines of tIdeas descending numeric by word 2 of each
      repeat for each line L in tIdeas
        put word 1 of L & space after tResult
      end repeat
      return tResult
    end suggest

Note - this does assume the first letter is correct. You could relax 
that requirement by dropping the "if t > 0 then" condition, so that every 
word goes into tIdeas whatever its score.
Presentation of this list (and then putting the selected line into the 
clipboardData) is very much personal taste - so I didn't do that part.
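
For what it's worth, one possible way to do it. This is only a sketch: 
the handler name showSuggestions and the field name "Suggestions" are 
made up, and the field is assumed to be a list field.

```livecode
on showSuggestions pBadWord
  local tList
  -- suggest() returns a space-separated list; trim it, then show one word per line
  put word 1 to -1 of suggest(pBadWord) into tList
  replace space with cr in tList
  put tList into field "Suggestions"
end showSuggestions

-- in the script of field "Suggestions" (with its listBehavior set to true)
on mouseUp
  -- copy the clicked word so the user can paste it over the mis-spelling
  set the clipboardData["text"] to the selectedText of me
end mouseUp
```

As an illustration with made-up data: if gSpecial held "Chennai", 
"Chidambaram" and "Madurai", then suggest("Chenai") would score them 4, 
2 and 0, so the field would list Chennai, then Chidambaram.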

-- Alex.



