Text encoding.
Keith Clarke
keith.clarke at me.com
Thu Sep 2 07:53:20 EDT 2021
I may be wrong, but I thought the Mac’s ‘Plain Text’ just meant it’s a ‘text/plain’ MIME type file, which could be encoded as ASCII, UTF-8, UTF-16 or UTF-32, rather than a ‘text/rtf’ rich text MIME type file, with embedded markup for styling such as bold, italic, etc.
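The ‘<U+FEFF>’ at the start of the document is the Byte Order Mark. The code point alone doesn’t tell you the encoding; the bytes that encode it do: EF BB BF means UTF-8, FF FE means UTF-16 in ‘little-endian’ order, and FE FF means UTF-16 in ‘big-endian’ order. Since ‘more’ displays it as a single character, the file is most likely UTF-8 with a BOM - see https://en.wikipedia.org/wiki/Byte_order_mark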
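To check which encoding the BOM actually indicates, you can look at the first few bytes of the file. Here is a minimal sketch in Python (not LiveCode), just to illustrate the byte-level test. The function names detect_bom and decode_without_bom are made up for this example, and falling back to UTF-8 when no BOM is present is an assumption, not a rule.

```python
import codecs

# BOM byte signatures. The UTF-32 entries come first because the UTF-32 LE
# BOM (FF FE 00 00) begins with the same two bytes as the UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return (encoding, bom_length) for a leading BOM, or (None, 0) if none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_without_bom(data: bytes, default: str = "utf-8") -> str:
    """Strip any leading BOM and decode with the encoding it indicates.

    Using `default` when there is no BOM is an assumption for this sketch.
    """
    encoding, skip = detect_bom(data)
    return data[skip:].decode(encoding or default)
```

For a real file you would read it in binary mode, e.g. open(path, "rb").read(), and pass the bytes to decode_without_bom. The result is ordinary text with the BOM removed.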
HTH
Best,
Keith
> On 2 Sep 2021, at 12:12, Alex Tweedly via use-livecode <use-livecode at lists.runrev.com> wrote:
>
> Sorry to drag us off the interesting topic of licensing :-) and into a LiveCode question.
>
> I know little or nothing about Unicode, text encodings, etc. - so my question is indeed naive.
>
> I have a text file (War & Peace from Project Gutenberg), about 3.4 MB. The Mac describes it simply as "Plain text".
>
> When I read that into a variable, and then do
> replace tChar by SPACE in tWholeText
> it takes between 1000 and 4000 millisecs - versus the 8-10 msecs I had expected from other samples.
>
> If I put in
> put textEncode(tWholeText, "UTF8") into tWholeText
> before the replace then it does indeed take 8-10 msecs.
>
> Q1. What (if anything) am I losing by doing that ?
>
> Q2. Is this the best alternative ?
>
> Additional info - I just discovered that, according to the 'more' command, the file starts with:
>
> <U+FEFF>The Project ....
>
> if that is useful.
>
> Many thanks,
>
> Alex.