chars changed in a fld

Sat Nov 30 17:52:42 EST 2013

On 01/12/2013, at 9:02 AM, Yves COPPE <yvescoppe at skynet.be> wrote:

> I mean with a LiveCode script!
> The link you post is another computer language…; I don’t understand this language
> I hope LC 7 will help me if you can’t …

What the answer on SO (Stack Overflow) says is that it's quite reliable (particularly if your string is long) to just check whether the data is valid UTF8. That's why I posted the Wikipedia article, which discusses invalid byte sequences in UTF8. So... without reading too deeply into it, and while coding in an email client:

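function ValidUTF8 pString
   -- compare bytes exactly; with the default caseSensitive of false,
   -- an accented letter may match its upper/lowercase partner
   set the caseSensitive to true
   -- bytes 192 and 193 could only start an overlong (invalid) sequence
   repeat with tCharNum = 192 to 193
      if numToChar(tCharNum) is in pString then return false
   end repeat
   -- bytes 245 to 255 never appear in valid UTF8
   repeat with tCharNum = 245 to 255
      if numToChar(tCharNum) is in pString then return false
   end repeat
   -- passing this check is necessary, but not sufficient, for valid UTF8
   return true
end ValidUTF8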
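ValidUTF8 only rejects byte values that can never occur in UTF8; it doesn't check that multi-byte sequences are well formed. A stricter sketch (StrictValidUTF8 is a hypothetical name, and like the functions here it assumes pre-7 LiveCode, where one char is one byte) walks the bytes and checks that each lead byte is followed by the right number of continuation bytes (128 to 191). It still lets through a few rarer invalid forms, such as overlong three-byte sequences and encoded surrogates.

```livecode
-- Hypothetical stricter check: verifies lead/continuation byte structure.
-- Assumes one char = one byte (LiveCode 6.x style strings).
function StrictValidUTF8 pString
   local tByte, tNeeded
   put 0 into tNeeded
   repeat for each char tChar in pString
      put charToNum(tChar) into tByte
      if tNeeded > 0 then
         -- inside a sequence: this byte must be a continuation (128-191)
         if tByte < 128 or tByte > 191 then return false
         subtract 1 from tNeeded
         next repeat
      end if
      -- plain ASCII needs no further checks
      if tByte < 128 then next repeat
      if tByte >= 194 and tByte <= 223 then
         put 1 into tNeeded
      else if tByte >= 224 and tByte <= 239 then
         put 2 into tNeeded
      else if tByte >= 240 and tByte <= 244 then
         put 3 into tNeeded
      else
         -- 128-191 without a lead byte, 192-193 (overlong), 245-255 (never used)
         return false
      end if
   end repeat
   -- a sequence cut off at the end of the string is invalid
   return tNeeded is 0
end StrictValidUTF8
```

This is a byte-by-byte walk, so it's slower than the two `is in` loops above on long strings, but it also catches a lone lead byte or a stray continuation byte.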

Now... just because it's valid UTF8 doesn't mean it's definitely UTF8. However, some editors write the Unicode byte order mark (the bytes 239, 187, 191) at the start of the text, and given that UTF8 is fairly likely these days, you might do something like this:

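function IsUTF8 pString
   -- compare bytes exactly rather than case-insensitively
   set the caseSensitive to true
   -- check for the UTF8 byte order mark, or plain ASCII (which is always valid UTF8)
   if char 1 to 3 of pString is numToChar(239)&numToChar(187)&numToChar(191) or pString is an ascii string then
      return true
   else
      -- no BOM and not plain ASCII: make an educated guess from the bytes
      return ValidUTF8(pString)
   end if
end IsUTF8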

It all gets more complicated, though, if your file could be UTF16 or UTF32 or even modified UTF8 or some other thing... lots of options, and I'm not sure how smart the engine will be about this just yet... There are libraries for this stuff, so maybe they will incorporate an appropriately licensed one: https://code.google.com/p/uchardet/
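If a file does start with a byte order mark, that at least tells you which of these encodings it is. A minimal sketch (BOMEncoding is a hypothetical name; it assumes the data was read as binary, e.g. from a "binfile:" URL, and that one char is one byte) compares the first bytes against the standard marks: 239 187 191 for UTF-8, 254 255 / 255 254 for UTF-16 big/little endian, and 0 0 254 255 / 255 254 0 0 for UTF-32. Files without a BOM still need guessing, which is what a library like uchardet is for.

```livecode
-- Hypothetical helper: returns the encoding named by a leading byte order
-- mark, or empty if there is none. Assumes one char = one byte.
function BOMEncoding pData
   local tBytes
   -- collect up to the first four byte values as "n,n,n,n,"
   repeat for each char tChar in char 1 to 4 of pData
      put charToNum(tChar) & comma after tBytes
   end repeat
   -- the UTF-32 LE mark begins with the UTF-16 LE mark, so test UTF-32 first;
   -- each value ends in a comma, so a match at position 1 is an exact prefix
   if offset("0,0,254,255,", tBytes) is 1 then return "UTF-32BE"
   if offset("255,254,0,0,", tBytes) is 1 then return "UTF-32LE"
   if offset("239,187,191,", tBytes) is 1 then return "UTF-8"
   if offset("254,255,", tBytes) is 1 then return "UTF-16BE"
   if offset("255,254,", tBytes) is 1 then return "UTF-16LE"
   return empty
end BOMEncoding
```

Comparing byte values as numbers sidesteps any case-insensitive matching of the high characters. One known ambiguity remains: UTF-16 LE text whose first character is U+0000 looks exactly like a UTF-32 LE mark.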

Cheers

--
Monte Goulding

M E R Goulding - software development services
mergExt - There's an external for that!