david.bovill at gmail.com
Thu Sep 2 08:01:13 EDT 2021
Thanks for the question, Alex. I’m wrestling with the same issues, but so far I’ve had no responses from the encoding gurus here :)
This is my understanding:
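1) Yes, it’s recommended to convert text that comes from outside into LiveCode’s internal format (which is UTF-16, as far as I know). Strictly speaking, the function for that direction is textDecode (bytes in, text out); textEncode goes the other way (text in, bytes out), so after textEncode you are working on UTF-8 bytes rather than text. Once text is in the internal format, LiveCode handles everything “transparently” from then on, which I guess means all the usual language and control operations expect that internal format. My guess is this is why a few things have got slower compared with early versions of LiveCode.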
2) Without an explicit conversion the engine tries to guess the encoding (duck-typing?) and does this in a platform-specific way? Again, exactly what is going on there is a bit opaque to me, but the take-home message is that this is slower and less robust. So to answer your questions: as far as I can tell you lose nothing (assuming the original file is UTF-8), and yes, it’s the best alternative.
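For what it’s worth, here is a minimal sketch of the explicit route, assuming the file really is UTF-8. The handler names readUtf8File and timeReplace and the parameters pPath and pChar are made up for illustration; url "binfile:", textDecode and the milliseconds are the built-in pieces. I haven’t measured this against your file, so treat it as a starting point rather than a benchmark.

```livecode
-- Sketch only: read a file as raw bytes and decode it explicitly as UTF-8.
-- pPath is a hypothetical parameter holding the full path to the file.
function readUtf8File pPath
   local tBytes
   -- "binfile:" returns the bytes untouched, so the engine does no guessing
   put url ("binfile:" & pPath) into tBytes
   -- textDecode turns binary data into LiveCode text
   return textDecode(tBytes, "UTF-8")
end function

-- Sketch only: time a single replace on the decoded text.
-- pChar is a hypothetical parameter: the character to replace with a space.
command timeReplace pPath, pChar
   local tText, tStart
   put readUtf8File(pPath) into tText
   put the milliseconds into tStart
   replace pChar with space in tText
   -- With no destination, put writes to the message box
   put the milliseconds - tStart && "ms"
end timeReplace
```

Calling timeReplace with the path to the file and the character to replace shows the elapsed time for the replace alone, without the file read or the decode.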
I think the hard thing to find out is exactly what encoding a given file uses. It would be great if there were a duck-typing service where we could paste text or upload files and it would say “hey, this looks like UTF-8”, but that’s asking too much.
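One cheap partial check is the byte order mark. The U+FEFF you saw at the start of the file is a BOM, and in a UTF-8 file it is stored as the three bytes EF BB BF. The sketch below only looks for a BOM, so a file without one returns empty and still needs guessing. The handler name guessEncodingFromBOM and the parameter pBytes are invented for illustration.

```livecode
-- Sketch only: guess an encoding from a leading byte order mark.
-- pBytes is assumed to be binary data, e.g. read with url ("binfile:" & tPath).
-- Returns "UTF-8", "UTF-16BE", "UTF-16LE", or empty when no BOM is recognised.
-- Limitations: data shorter than 3 bytes is not examined, and a UTF-32LE BOM
-- (FF FE 00 00) also starts with FF FE, so it would be reported as "UTF-16LE".
function guessEncodingFromBOM pBytes
   local tB1, tB2, tB3
   if the number of bytes in pBytes < 3 then return empty
   put byteToNum(byte 1 of pBytes) into tB1
   put byteToNum(byte 2 of pBytes) into tB2
   put byteToNum(byte 3 of pBytes) into tB3
   -- EF BB BF
   if tB1 is 239 and tB2 is 187 and tB3 is 191 then return "UTF-8"
   -- FE FF
   if tB1 is 254 and tB2 is 255 then return "UTF-16BE"
   -- FF FE
   if tB1 is 255 and tB2 is 254 then return "UTF-16LE"
   return empty
end function
```

The result could then be passed to textDecode as the encoding name, falling back to a default such as "UTF-8" when the function returns empty.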
On 2 Sep 2021, 12:12 +0100, Alex Tweedly via use-livecode <use-livecode at lists.runrev.com>, wrote:
> Sorry to drag us off the interesting topic of licensing :-) into a
> LiveCode question.
> I know little or nothing about Unicode, text encodings, etc. - so my
> question is indeed naive.
> I have a text file (War & Peace from Project Gutenberg), about 3.4Mb.
> The Mac describes it simply as "Plain text".
> When I read that into a variable, and then do
> replace tChar with SPACE in tWholeText
> it takes between 1000 and 4000 millisecs - versus the 8-10 msecs I had
> expected from other samples.
> If I put in
> put textEncode(tWholeText, "UTF8") into tWholeText
> before the replace then it does indeed take 8-10 msecs.
> Q1. What (if anything) am I losing by doing that ?
> Q2. Is this the best alternative ?
> Additional info - I just discovered that according to the 'more' command
> line tool, the file starts with:
> <U+FEFF>The Project ....
> if that is useful.
> Many thanks,