Text encoding: summary of results and times.
Alex Tweedly
alex at tweedly.net
Fri Sep 3 20:29:24 EDT 2021
I went back and re-did the tests, checking the results.
The file *is* UTF8, so I need to textDecode() it; if I don't, the results
are simply wrong, and so the times are irrelevant.
1. Once it has been textDecode()'d, i.e. is in internal format, my
algorithm gets the correct results, taking 115.1 seconds.
2. BUT if, just before the algorithm is run, I do a textEncode(tStr,
"UTF8"), it gets the correct results (identical to the above), but in
only 3.3 seconds.
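For anyone who wants to try the comparison without downloading the stack, the two cases above can be sketched roughly as follows. This is only an illustration, not the code from the test stack: the handler name compareEncodings, the parameter pPath and the placeholder runMyAlgorithm are hypothetical, and the placeholder just does a simple replace rather than the real spell-check algorithm.

```livecode
-- Hypothetical timing harness (not the code from the test stack).
-- pPath is assumed to be the path to a UTF-8 encoded text file.
on compareEncodings pPath
   local tRaw, tStr, tStart, tDecodedMs, tEncodedMs

   -- Read the raw bytes, then decode them into the internal string format.
   put URL ("binfile:" & pPath) into tRaw
   put textDecode(tRaw, "UTF8") into tStr

   -- Case 1: run the algorithm on the decoded (internal-format) string.
   put the milliseconds into tStart
   runMyAlgorithm tStr
   put the milliseconds - tStart into tDecodedMs

   -- Case 2: re-encode to UTF-8 just before running the algorithm.
   put textEncode(tStr, "UTF8") into tStr
   put the milliseconds into tStart
   runMyAlgorithm tStr
   put the milliseconds - tStart into tEncodedMs

   -- Report both timings in the message box.
   put "decoded:" && tDecodedMs && "ms, re-encoded:" && tEncodedMs && "ms"
end compareEncodings

-- Placeholder standing in for the real algorithm (an assumption).
-- pText is passed by value, so the caller's string is left unchanged.
command runMyAlgorithm pText
   replace "the" with "THE" in pText
end runMyAlgorithm
```

Calling compareEncodings with the path to one of the sample files should print both timings; the actual numbers will depend on the algorithm and the machine.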
The code, in a zip file containing the test stack, SpellCheck Library,
and the 'bible' and "war&peace" sample textfiles, can be downloaded from
https://www.tweedly.org/Downloads/SpellLib.gz
if anyone wants to look at it.
Alex.
On 03/09/2021 13:38, Alex Tweedly via use-livecode wrote:
>
> On 03/09/2021 11:07, David V Glasgow via use-livecode wrote:
>
>> Alex states that put textEncode(tWholeText, "UTF8") into tWholeText
>> speeds replace up, but David B says LC internal format is UTF16.
>> Doesn’t the 8 vs 16 difference matter? Or matters less than other
>> encodings?
>
> I would regard that timing comparison with much suspicion. I was
> textEncoding() it inappropriately - I had just read it in from a file,
> so I *should* have been textDecoding() it. Therefore it is unclear
> whether the times I was seeing then are meaningful.
>
> Alex.