SpellCheck (re-inventing the wheel)
Alex Tweedly
alex at tweedly.net
Sat Jan 29 20:53:16 EST 2005
Sivakatirswami wrote:
> ... if anyone gets their head around some fast algorithms for pulling
> up a short suggestion list before I do, please post it.
I'd say it doesn't need to be fast: it will only happen when the word is
not in the dictionary, and the result is that the user is consulted and
has to make a decision and click or type in response. So while I'd never
say you can just ignore speed, the algorithm need only be reasonable.
>
> My context is that most words that the users will mis-spell are
> specialized (sanskrit, tamil, names of obscure places e.g. "Chennai"
> ). So, where an obvious mis-spelling like "mikl " the user will know
> how to change to "milk"-- for the specialized words, she will need to
> see a selection of choices...
>
> I know we could go "crazy" with this and code for auto replacement of
> the mis-spelled word, etc. which adds a new layer of complexity, but
> for now I would be satisfied with manual user entry into a separate
> stack where they could enter 1, 2, 3 initial chars and get a list of
> words starting with those characters (99.9% of cases the first char is
> assured), a click down could put that on their clip board and they
> could paste it over the mis-spelled word. I would have our master
> all-publications lexicon entries loaded-appended to that global
> variable gWords, to supplement the main word list.
If you want to do good, intelligent replacement suggestions, look into
the links Brian gave - esp. the aspell one. You should be able to use
aspell as an external program, and use a pipe to get data into it and
back to your app (? I think - haven't used pipes with Rev).
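A rough sketch of the aspell route, untested against Rev itself. It swaps the pipe idea for Rev's shell() function, which is simpler than a true pipe. It assumes aspell is installed and on the PATH, that the shell is Unix-like, and that aspell is run in its ispell-compatible "-a" mode, where a miss with suggestions comes back as a line of the form "& word count offset: s1, s2, ...". The handler name aspellSuggest is made up for illustration.

```livecode
function aspellSuggest pWord
   local tCmd, tOut, tLine, tColon
   -- ASSUMPTION: pWord is a single plain word; refuse anything that
   -- could break the quoting of the shell command below
   if pWord contains quote or pWord contains space then return empty
   put "echo" && quote & pWord & quote && "| aspell -a" into tCmd
   put shell(tCmd) into tOut
   repeat for each line tLine in tOut
      -- the banner line and "*" (word is fine) lines are skipped;
      -- only "&" lines carry suggestions
      if char 1 of tLine is "&" then
         put offset(":", tLine) into tColon
         -- everything after ": " is the comma-separated suggestion list
         return char (tColon + 2) to -1 of tLine
      end if
   end repeat
   return empty
end aspellSuggest
```

The result is aspell's own comma-separated list, best guess first, or empty if the word was accepted or nothing was suggested.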
But if you are truly happy to leave your users to sort out the "standard
English" words on their own, and only need to deal with the specialized
ones .....
I'd suggest keeping the specialized words in a separate file. They can
be read in and added to the gWords array, but also kept in a separate
variable. By all means have a separate stack, but why make the user
type the initial letters again? They've already typed them once
(admittedly wrongly). I'd use their failed attempt as a seed, and sort
all the specialized words in your dictionary against it.
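The loading step described above might look something like this. It is only a sketch: it assumes the specialized words live one per line in a text file, and that gWords is an array keyed by word (if yours is a plain list, append gSpecial to it instead). The handler name loadSpecialWords and its pPath parameter are made up for illustration.

```livecode
on loadSpecialWords pPath
   global gWords, gSpecial
   local tWord
   -- ASSUMPTION: pPath points at a plain text file, one word per line
   put URL ("file:" & pPath) into gSpecial
   -- also register each special word in the main lookup array, so the
   -- ordinary "is it in the dictionary?" check accepts it
   repeat for each line tWord in gSpecial
      if tWord is not empty then put true into gWords[tWord]
   end repeat
end loadSpecialWords
```

Keeping the second copy in gSpecial means the suggestion pass only has to scan the short specialized list, not the whole dictionary.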
Assuming that the specialized words are to be in gSpecial, a cheap and
cheerful way to generate the suggestions would be:
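> function suggest pWord
>    -- assumes gSpecial is a global holding one special word per line
>    global gSpecial
>    local tIdeas, L, t, i, tResult
>
>    -- score each special word by the length of the prefix it shares with pWord
>    repeat for each line L in gSpecial
>       put 0 into t
>       repeat with i = 1 to the number of chars in pWord
>          if char i of pWord = char i of L then
>             add 1 to t
>          else
>             exit repeat
>          end if
>       end repeat
>       -- keep only words whose first letter matches
>       if t > 0 then put L && t & cr after tIdeas
>    end repeat
>    -- numeric, so that a score of 10 sorts above a score of 9
>    sort lines of tIdeas descending numeric by word 2 of each
>    -- return just the words, best match first, separated by spaces
>    repeat for each line L in tIdeas
>       put word 1 of L & space after tResult
>    end repeat
>    return tResult
> end suggest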
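Note - this does assume the first letter is correct. You could relax
that requirement by dropping the "if t > 0 then" condition (keeping the
"put"), though every special word would then appear in the list, with
the non-matching ones sorted last.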
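To see what the function gives back, here is a small made-up trial. The three place names and the handler name trySuggest are only sample data for illustration, and it assumes suggest can see gSpecial as a global.

```livecode
on trySuggest
   global gSpecial
   -- sample data only: three special words, one per line
   put "Chennai" & cr & "Chidambaram" & cr & "Coimbatore" into gSpecial
   -- "Chenai" shares 4 leading chars with Chennai, 2 with Chidambaram
   -- and 1 with Coimbatore, so they should come back in that order
   put suggest("Chenai")
end trySuggest
```

The closest prefix match comes first, so the user's likely target sits at the top of the list.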
Presentation of this list (and then adding selectedLine to
clipboardData) is very much personal taste - so I didn't do that part.
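For what it's worth, one minimal way to do the presentation, purely as a sketch. It assumes a list field named "Suggestions" (a made-up name) with its list behavior turned on, and that suggest returns space-separated words as above. The first handler can go in the stack or card script, the second in the script of the field itself.

```livecode
on showSuggestions pWord
   local tList
   put suggest(pWord) into tList
   -- one suggestion per line, so each can be clicked on its own
   replace space with cr in tList
   put tList into field "Suggestions"
end showSuggestions

-- in the script of field "Suggestions"
on mouseUp
   -- in a list field, the selectedText is the text of the clicked line
   set the clipboardData["text"] to the selectedText of me
end mouseUp
```

The user can then paste the chosen word over the mis-spelled one.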
-- Alex.