Unicode, the clipboard and LC fields
    Peter W A Wood 
    peterwawood at gmail.com
       
    Fri Jan 24 18:22:34 EST 2014
    
    
  
Richmond
It is almost impossible to determine the encoding of text from its contents alone. You can make educated guesses, but even when considering just four different encodings that is tricky.
You can get an idea of the complexity by taking a quick look at the encoding? function in this REBOL script. (You should be able to find the function easily, as there is a big banner with encoding? at the top of it.) The script counts characters that are likely to appear in one encoding but not in another. For instance, the presence of characters 129, 141, 144 and 157 hints that the text is MacRoman encoded.
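For illustration, here is a minimal Python sketch of that kind of heuristic. It is not a port of the REBOL script. It assumes the four candidates are UTF-8, UTF-16, MacRoman and Windows-1252 (`cp1252`), and the function name `guess_encoding` is hypothetical. Byte-order marks and strict UTF-8 decoding are checked first. After that, the sketch counts the bytes 129, 141, 144 and 157: Windows-1252 assigns no character to any of them, while MacRoman maps them to Å, ç, ê and ù.

```python
# Hypothetical helper: a rough guess at the encoding of a byte string,
# choosing among four assumed candidates (UTF-8, UTF-16, MacRoman, Windows-1252).

# Byte values that hint at MacRoman: unassigned in Windows-1252,
# but accented letters (Å, ç, ê, ù) in MacRoman.
MACROMAN_HINTS = {129, 141, 144, 157}


def guess_encoding(data: bytes) -> str:
    """Return a best-guess Python codec name; a heuristic, not a guarantee."""
    # 1. A byte-order mark is the closest thing to a certain signal.
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    # (A UTF-32 little-endian BOM also begins FF FE; UTF-32 is not a candidate here.)
    if data.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"
    # 2. Bytes that decode cleanly as UTF-8 are very likely UTF-8
    #    (plain ASCII also lands here, since ASCII is a subset of UTF-8).
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # 3. Count bytes that point to MacRoman rather than Windows-1252.
    hints = sum(1 for b in data if b in MACROMAN_HINTS)
    return "mac_roman" if hints > 0 else "cp1252"


if __name__ == "__main__":
    print(guess_encoding("Åre".encode("mac_roman")))
    print(guess_encoding("café".encode("cp1252")))
```

The guess is only as good as the text: a MacRoman file that contains none of those four bytes falls through to `cp1252`, and Cyrillic or other non-Latin encodings are not considered at all.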
Regards
Peter
On 25 Jan 2014, at 02:54, Richmond wrote:
> On 22/01/14 20:41, Graham Samuel wrote:
>> Richmond, thanks for inching my problem towards a solution. I downloaded your test.
> 
>> Clever, in fact too clever for me.
> 
> Possibly, but NOT clever enough . . .
> 
> I would like an easy way to know what character encoding is being used in a textField:
> 
> NOT just whether it is Unicode or not:
> 
> There are all sorts of variables, such as
> 
> fontLanguage [I have never quite worked out how that jibes with Unicode],
> 
> MacCyrillic,
> 
> and so on, ad nauseam.
> 
> ------
> 
> For the sake of argument, and at the risk of repeating myself:
> 
> I managed to resurrect a 120-page 'thing' of my wife's, written in mixed English and Bulgarian on
> Mac OS 9 when Mac OS 9 was all the rage.
> 
> In the end . . . after a lot of blood, sweat, tears and incredibly coarse remarks, I managed to turn it into
> a PDF with an embedded text layer . . . allowing, at least, the English to be directly transferred into an ODT
> document.
> 
> However, my wife will still have the "joy" of having to retype all the Bulgarian and all the other bits of text
> in various other languages, because they were originally typed on Mac OS 9 in the "funny ways" the Mac did
> things then, which are not the same as the "funny ways" (a.k.a. Unicode) we do things now.
> 
> Had I had a stack that allowed me to import the document, or copy-paste the text, and then told
> me the encodings of the various bits (chunks) so I could have run them through some merry little algorithms,
> life would have been considerably more refreshing.
> 
> ------------
> 
> Now, I know the argument about Livecode not being a jollified word-processor that was trotted out when I made a few
> comments about SuperCard having ways of doing paragraphing and so on.
> 
> And Livecode may NOT be a jollified word-processor; but if it is meant to be a computer programming language
> rather than a simplified subset of one, it should have the wherewithal for programmers to build a word-processor
> without recourse to outside resources. That means (quite apart from paragraph breaks, which can be easily arranged in Livecode)
> the ability to recognise all sorts of text-encoding standards and report them to the programmer.
> 
> -----------
> 
> Now Graham's "Clever" is jolly gratifying, but, frankly, comparing 2 textFields is not very clever,
> and while that can differentiate between ASCII text and Unicode text, that is as far as it goes.
> 
> ---------
> 
> My latest riff is to have a command of the sort:
> 
> put textEncoding
> 
> and something like 'plainText', 'RTFtext', 'htmlText' or 'unicodeText' would be output as the result.
> 
> And then, for those who really go a bundle on this kind of thing, we might extend that to 'UTF8', 'UTF16', 'UTF32' and so forth.
> 
> Richmond.
> 
> _______________________________________________
> use-livecode mailing list
> use-livecode at lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode
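Once the encoding of a chunk is known, re-decoding it can be quite short. The Python sketch below assumes the Bulgarian passages were typed in Mac OS Cyrillic, that their bytes survived unchanged, and that they are now being displayed as if they were MacRoman. The name `redecode` is hypothetical; `mac_roman` and `mac_cyrillic` are standard Python codecs. If the original bytes were lost along the way (for instance during the PDF conversion), no re-decoding can bring the text back.

```python
# Hypothetical helper: undo text that was decoded with the wrong Mac codec.
# Assumes the garbled string was produced by decoding the original bytes
# as `shown_as`, so encoding it with `shown_as` recovers those bytes exactly.


def redecode(garbled: str, shown_as: str = "mac_roman",
             actual: str = "mac_cyrillic") -> str:
    """Re-interpret mis-decoded text using the encoding it was really in."""
    raw = garbled.encode(shown_as)  # recover the original bytes
    return raw.decode(actual)       # decode them as they were meant to be


if __name__ == "__main__":
    original = "Здравей, свят"
    # Simulate the problem: Mac Cyrillic bytes read as MacRoman.
    garbled = original.encode("mac_cyrillic").decode("mac_roman")
    print(garbled)
    print(redecode(garbled))
```

ASCII bytes (0 to 127) are the same in both Mac encodings, so English in a mixed chunk passes through unchanged; accented Latin letters would not, which is why knowing the encoding of each chunk matters.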
    
    