First 1000 characters without loop?

Monte Goulding monte at appisle.net
Thu Jun 22 21:44:33 EDT 2017


> On 23 Jun 2017, at 11:19 am, Richard Gaskin via use-livecode <use-livecode at lists.runrev.com> wrote:
> 
> Monte Goulding wrote:
> 
> >> On 23 Jun 2017, at 10:06 am, Richard Gaskin wrote:
> >>
> >> How can we know which is in use for a given string?
> >
> > You shouldn’t need to know. The engine will use native encoding where
> > possible for efficiency. A lot of the performance improvements between
> > LC 7 and 8 were using the right code paths based on whether the string
> > is native or unicode.
> 
> Seems murky.  I'd much rather at least have something like a byteLen function, which returns the number of bytes for a given string.  With that I can maintain byte offsets into a file with good performance and no ambiguity.

In theory, and in my opinion, `the number of bytes of <string>` should return whatever a byteLength function would, given that the codeunit docs state:

> The hierarchy of the new and altered chunk types is as follows: byte w of codeunit x of codepoint y of char z of word …. 

However, this report was resolved as not a bug, so I guess that theory is wrong and maybe there’s a docs bug in there (I have asked internally on our language channel): http://quality.livecode.com/show_bug.cgi?id=13248

put the number of codeunits of "😀️" -> 3

So this is actually 6 bytes, but as documented you can’t rely on the codeunit length being 16 bits, so I guess that means there is currently no way to get what you want reliably. Whether you need it is a separate discussion.
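
To make the arithmetic concrete, here is a rough sketch in Python (not LiveCode, so the counts can be checked outside the engine). It assumes the string above is U+1F600 followed by the variation selector U+FE0F, which would account for the 3 codeunits; the helper name utf16_code_units is made up for illustration.

```python
# Assumption: the emoji in the example is GRINNING FACE (U+1F600)
# followed by VARIATION SELECTOR-16 (U+FE0F).
s = "\U0001F600\uFE0F"


def utf16_code_units(text: str) -> int:
    """Count 16-bit code units by encoding to UTF-16 (no BOM)."""
    return len(text.encode("utf-16-le")) // 2


code_points = len(s)                      # Python strings count code points
utf16_units = utf16_code_units(s)         # surrogate pair + one unit
utf16_bytes = len(s.encode("utf-16-le"))  # two bytes per code unit
utf8_bytes = len(s.encode("utf-8"))       # 4 bytes + 3 bytes

print(code_points, utf16_units, utf16_bytes, utf8_bytes)
```

Note that the byte count depends entirely on which encoding you pick, which is why a byte length only makes sense once the text has been explicitly encoded.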
> 
> 
> >> Suppose I wanted to process a lot of text, so performance is
> >> critical. Using bytes would be optimal, since any chunk type or even
> >> Unicode characters may vary in length.
> >>
> >> So if I wanted to create an index of byte offsets into a large chunk
> >> of text, how would I know how long a character is?
> >
> > If it’s text encoded then you probably want to use character offsets
> > and let the engine worry about optimising it. If you know it’s binary
> > data then use bytes.
> 
> How do I find a substring in binary data in a way that will tell me the number of bytes of the offset?


If you are dealing with bytes of binary data then use byteOffset. Is that what you mean here? Probably better to talk about ranges rather than substrings if you are discussing binary data.
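
As a rough illustration of why a byte offset and a character offset differ once text has been encoded, here is a Python sketch (again not LiveCode). The byte_offset helper is hypothetical; it only mimics the 1-based, 0-when-not-found convention I'd expect from an offset function, and the sample text is made up.

```python
def byte_offset(needle: bytes, haystack: bytes) -> int:
    """Hypothetical helper: 1-based byte position of needle, 0 if absent."""
    idx = haystack.find(needle)  # bytes.find is 0-based, -1 if absent
    return idx + 1


text = "na\u00efve \U0001F600 text"   # contains one 2-byte and one 4-byte character in UTF-8
data = text.encode("utf-8")           # explicit encoding gives well-defined bytes

char_pos = text.find("text") + 1                     # 1-based character position
byte_pos = byte_offset("text".encode("utf-8"), data) # 1-based byte position
missing = byte_offset(b"\x00\x01", data)             # not present in the data

print(char_pos, byte_pos, missing)
```

The two positions diverge as soon as a multi-byte character precedes the match, so an index of byte offsets is only valid for the encoded form it was built against.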

Cheers

Monte



