byteLen()?

Richard Gaskin ambassador at fourthworld.com
Thu Mar 9 16:24:50 EST 2017


Thanks for that background, Mark.  I always appreciate your informal 
tech notes.

I'm copying only the most relevant parts here - anyone looking for a 
good read will want the full post:
http://lists.runrev.com/pipermail/use-livecode/2017-March/235278.html


Mark Waddingham wrote:

 > This approach means that any multi-codepoint character in Unicode
 > still maps to a single byte - and any non-updated code which
 > manipulates strings as if they are data will still work (albeit with
 > some data loss in regards the original Unicode string - which it
 > wasn't written to understand anyway).

I'm not sure I follow that, but it almost sounds like no matter what the 
encoding each char is mapped to one byte, so a 5-char string like 
"hello" will take up 5 bytes - is that right?

Doesn't feel right, but there's so much to both Unicode and how LC 
handles it that I've lost my confidence with things like this.

Your guidance is appreciated, and perhaps it may help if I describe the 
use-case at hand:

I have some large files I want to open and read as binary (for speed 
mostly; if there's a reason I should be doing that as text let me know), 
then I'll work my way through them looking for substrings, keeping track 
of the byte offsets within the data where those can be found.

Once I have my list of byte offsets, I can save that as a sort of index 
file, and use "seek" or "read at" to go directly to that portion of the 
larger files whenever I need to access that data.
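
Roughly, the indexing pass I have in mind looks like the sketch below.
The handler name buildByteIndex and its parameters are placeholders I
made up for illustration.  It assumes the whole file fits in memory and
that pNeedle is already binary data in the same encoding as the file
(whether I need textEncode to get it that way is part of my question):

```livecode
-- Sketch only; buildByteIndex is a hypothetical handler name.
-- Assumes the file fits in memory and that pNeedle is already
-- binary data in the same encoding as the file.
function buildByteIndex pPath, pNeedle
   local tData, tSkip, tFound, tIndex
   -- binfile: reads the raw bytes with no text conversion
   put URL ("binfile:" & pPath) into tData
   put 0 into tSkip
   repeat forever
      -- byteOffset returns 0 if nothing is found, otherwise the
      -- position counted from just past the tSkip bytes skipped
      put byteOffset(pNeedle, tData, tSkip) into tFound
      if tFound is 0 then exit repeat
      -- convert to a 1-based offset from the start of the data
      add tFound to tSkip
      put tSkip & return after tIndex
   end repeat
   -- drop the trailing return
   if tIndex is not empty then delete the last char of tIndex
   return tIndex
end buildByteIndex
```

The result is a return-delimited list of 1-based byte offsets.  I'd
still need to check how those line up with what "seek" and "read at"
expect as a starting position.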

The data files may use a variety of encodings, mostly UTF-8 but I can 
expect Latin-ISO or perhaps even UTF-16.  In short, the encoding may not 
be known in advance.

But since I'm working with binary data the whole time, the encoding 
shouldn't matter, should it?

Earlier you wrote:

   the number of bytes in textEncode(tText, kEncoding)

...which implies that I would need to know the encoding (kEncoding), but 
do I really need textEncode for the use-case described here?
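
To make the question concrete, this is the sort of comparison I mean.
The handler name and the sample text are made up for illustration, and
the encoding names are the ones I believe textEncode accepts:

```livecode
-- Sketch only; compareByteLengths is a hypothetical handler name.
-- Puts the byte counts of the same text in two encodings into
-- the message box.
command compareByteLengths
   local tText, tUTF8Len, tUTF16Len
   put "hello" into tText
   put the number of bytes in textEncode(tText, "UTF-8") into tUTF8Len
   put the number of bytes in textEncode(tText, "UTF-16LE") into tUTF16Len
   put tUTF8Len && tUTF16Len
end compareByteLengths
```

If those two counts differ, then a byte offset into a file only makes
sense once I know which encoding the file was written in.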

-- 
  Richard Gaskin
  Fourth World Systems
  Software Design and Development for the Desktop, Mobile, and the Web
  ____________________________________________________________________
  Ambassador at FourthWorld.com                http://www.FourthWorld.com



