Text encoding.

Keith Clarke keith.clarke at me.com
Thu Sep 2 07:53:20 EDT 2021


I may be wrong, but I thought the Mac’s ‘Plain Text’ just meant it’s a plain text (text/plain MIME type, .txt) file, which could be encoded as ASCII, UTF-8, UTF-16 or UTF-32, rather than a rich text (text/rtf MIME type, .rtf) file with embedded markup for styling, such as bold, italic, etc.

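The ‘<U+FEFF>’ at the start of the document is the Byte Order Mark (BOM) - see https://en.wikipedia.org/wiki/Byte_order_mark. The code point on its own doesn’t settle the encoding; the raw bytes at the start of the file do: FF FE means UTF-16 little-endian, FE FF means UTF-16 big-endian, and EF BB BF means UTF-8 with a BOM.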
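
A minimal sketch of checking the leading bytes and decoding to match, assuming LiveCode 7 or later (for byteToNum, textDecode, codepointToNum and the ‘begins with’ operator). bomEncoding and loadTextFile are hypothetical names, the byte patterns are the standard BOMs listed on the Wikipedia page above, and falling back to UTF-8 when no BOM is found is an assumption, not something the file guarantees:

```livecode
-- Hypothetical helper: name the encoding implied by a Byte Order Mark
-- at the start of pBytes (binary data). Returns empty if none is found.
function bomEncoding pBytes
   local tList, i
   -- Comma-separated values of the first (up to) four bytes, e.g. "239,187,191,84,"
   repeat with i = 1 to min(4, the number of bytes of pBytes)
      put byteToNum(byte i of pBytes) & comma after tList
   end repeat
   -- Test the UTF-32 marks first: FF FE 00 00 also begins with FF FE
   if tList begins with "255,254,0,0," then return "UTF-32LE"
   if tList begins with "0,0,254,255," then return "UTF-32BE"
   if tList begins with "239,187,191," then return "UTF-8"
   if tList begins with "255,254," then return "UTF-16LE"
   if tList begins with "254,255," then return "UTF-16BE"
   return empty
end bomEncoding

-- Hypothetical loader: read the raw bytes, decode them with the detected
-- encoding (assuming UTF-8 when there is no BOM) and drop a leading U+FEFF.
function loadTextFile pPath
   local tBytes, tEncoding, tText
   -- binfile: reads the bytes as-is, with no line-ending or text conversion
   put url ("binfile:" & pPath) into tBytes
   put bomEncoding(tBytes) into tEncoding
   if tEncoding is empty then put "UTF-8" into tEncoding
   put textDecode(tBytes, tEncoding) into tText
   -- In case the decoded text still starts with the BOM code point (65279)
   if tText is not empty then
      if codepointToNum(codepoint 1 of tText) = 65279 then delete codepoint 1 of tText
   end if
   return tText
end loadTextFile
```

Usage would be something like put loadTextFile(tPath) into tWholeText, with tPath (hypothetical) pointing at the Gutenberg file. Note that a file starting FF FE 00 00 is ambiguous (UTF-32LE, or UTF-16LE whose first character is a NUL); the sketch assumes UTF-32LE.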

HTH
Best,
Keith

> On 2 Sep 2021, at 12:12, Alex Tweedly via use-livecode <use-livecode at lists.runrev.com> wrote:
> 
> Sorry to drag us off the interesting topic of licensing :-) into a LiveCode question.
> 
> I know little or nothing about Unicode, text encodings, etc. - so my question is indeed naive.
> 
> I have a text file (War & Peace from Project Gutenberg), about 3.4 MB. The Mac describes it simply as "Plain text".
> 
> When I read that into a variable, and then do
>     replace tChar by SPACE in tWholeText
> it takes between 1000 and 4000 millisecs - versus the 8-10 msecs I had expected from other samples.
> 
> If I put in
>     put textEncode(tWholeText, "UTF8") into tWholeText
> before the replace, then it does indeed take 8-10 msecs.
> 
> Q1. What (if anything) am I losing by doing that ?
> 
> Q2. Is this the best alternative ?
> 
> Additional info - I just discovered that, according to the 'more' command-line tool, the file starts with:
> 
> <U+FEFF>The Project ....
> 
> if that is useful.
> 
> Many thanks,
> 
> Alex.
> 
> 
> _______________________________________________
> use-livecode mailing list
> use-livecode at lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode




