Search Values of Array in "One Go"

Richard Gaskin ambassador at fourthworld.com
Thu Aug 24 15:52:47 EDT 2017


Bob Sneidar wrote:

 > I suppose thinking about it you could arrayencode an array then do a
 > search on it

You probably don't want to do that.  I've given a lot of thought to 
methods of indexing large LSON files on disk, and I can find no method 
anywhere near as practical as working with them in their native array 
structure, given their linear, non-indexed LSON format.

Besides, in order to serialize an LC array it needs to be small enough 
to fit into RAM to begin with, so the one thing we know about all LSON 
files is that they fit into RAM nicely. :)

It's been a while since Mark Waddingham generously provided some notes 
on the LSON format (for the older format, which has changed since the 
introduction of Unicode in v7), but IIRC it was roughly:

0x05 -- one-byte header indicating that what follows is an array (now 
0x06 in v7 and later)
   <element type op-code, with 0x05/0x06 being an array>
      <element name> NULL <4-byte data length indicator><element data>

There is a different op-code for numbers than for strings (IIRC 0x02 for 
numbers), allowing numbers to use a more compact binary form.

I'm guessing that in addition to the new op-code indicating an array 
type, the former NULL separator between element name and length UINT4 
has been replaced with a preceding length byte, since of course NULLs 
can be part of the Unicode string of the element name.
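To make the guess above concrete, here is a minimal sketch of the two ways of framing an element name. It is written in Python purely for illustration and is not the actual LSON encoder: the function names, the big-endian byte order, and the one-byte name length are all assumptions of mine. It only shows why a NULL separator stops working once the name itself may contain NULL bytes.

```python
import struct

# Toy framing only; NOT the real LSON layout. Byte order and the
# one-byte name length are assumptions made for illustration.

def pack_nul_terminated(name: bytes, data: bytes) -> bytes:
    # <name> NULL <4-byte data length> <data>
    return name + b"\x00" + struct.pack(">I", len(data)) + data

def unpack_nul_terminated(buf: bytes):
    end = buf.index(b"\x00")              # first NULL is taken as end of name
    name = buf[:end]
    (length,) = struct.unpack_from(">I", buf, end + 1)
    start = end + 5
    return name, buf[start:start + length]

def pack_length_prefixed(name: bytes, data: bytes) -> bytes:
    # <1-byte name length> <name> <4-byte data length> <data>
    if len(name) > 255:
        raise ValueError("name too long for a one-byte length prefix")
    return bytes([len(name)]) + name + struct.pack(">I", len(data)) + data

def unpack_length_prefixed(buf: bytes):
    n = buf[0]
    name = buf[1:1 + n]
    (length,) = struct.unpack_from(">I", buf, 1 + n)
    start = 1 + n + 4
    return name, buf[start:start + length]

# A UTF-16 name contains NULL bytes even for plain ASCII characters.
utf16_name = "key".encode("utf-16-le")    # b'k\x00e\x00y\x00'
payload = b"some value"

bad_name, _ = unpack_nul_terminated(pack_nul_terminated(utf16_name, payload))
good_name, good_data = unpack_length_prefixed(
    pack_length_prefixed(utf16_name, payload))
```

With the NULL-separated framing the name comes back truncated at its first NULL byte, while the length-prefixed framing round-trips both name and data intact.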

Nice tidy format, well suited for disk storage and network transfer, but 
looking for things requires linear search in LSON, whereas the 
de-serialized native array form takes advantage of the super-quick 
bucket hash to find a given key.
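As a rough illustration of that difference, the sketch below (Python, for illustration only) serializes a dictionary into a toy record stream loosely modeled on the layout recalled above and then looks up a key both ways. The record layout, the helper names serialize and find_in_stream, and the big-endian lengths are assumptions; the real LSON format differs in detail.

```python
import struct

def serialize(d: dict) -> bytes:
    # Toy stream of records: <name> NULL <4-byte length> <data>.
    # Assumes names contain no NULL bytes; not the real LSON layout.
    out = bytearray()
    for key, value in d.items():
        kb, vb = key.encode("utf-8"), value.encode("utf-8")
        out += kb + b"\x00" + struct.pack(">I", len(vb)) + vb
    return bytes(out)

def find_in_stream(buf: bytes, key: str):
    # No index exists, so every record before the match must be walked.
    target = key.encode("utf-8")
    pos = 0
    steps = 0
    while pos < len(buf):
        end = buf.index(b"\x00", pos)
        name = buf[pos:end]
        (length,) = struct.unpack_from(">I", buf, end + 1)
        start = end + 5
        steps += 1
        if name == target:
            return buf[start:start + length].decode("utf-8"), steps
        pos = start + length              # skip this record's data
    return None, steps

records = {f"k{i}": f"v{i}" for i in range(1000)}
blob = serialize(records)

value_from_stream, records_visited = find_in_stream(blob, "k999")
value_from_hash = records["k999"]         # single hashed lookup
```

Both lookups return the same value, but the stream search has to visit all 1000 records to reach the last key, while the in-memory lookup goes to it directly.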

Much as XQuery works on an in-memory, already-parsed form of XML, 
searching associative arrays in memory will be the way to go.

So looping is both faster to execute and easier to script in array form.

The biggest challenges would be parsing the query expression, and 
generalizing evaluation within the loop(s) to handle the range of query 
options.
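For a sense of scale, here is a minimal sketch of both halves: parsing a very small query expression and evaluating it in a loop over an array of records. It is in Python only for illustration, and everything in it is an assumption rather than a proposed design: the "field operator value" query syntax, the operator set, and the names parse_query, matches and search are hypothetical. Values are compared as numbers when both sides look numeric, otherwise as strings.

```python
import operator

# Hypothetical operator set for a "field operator value" query.
OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    "contains": None,   # handled separately as a substring test
}

def parse_query(text: str):
    # Split into field, operator, and the rest of the line as the value.
    field, op_name, value = text.split(None, 2)
    if op_name not in OPS:
        raise ValueError(f"unknown operator: {op_name}")
    return field, op_name, value

def _as_number(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        return None

def matches(record: dict, field: str, op_name: str, value: str) -> bool:
    if field not in record:
        return False
    actual = record[field]
    if op_name == "contains":
        return value in str(actual)
    a, b = _as_number(actual), _as_number(value)
    if a is not None and b is not None:
        return OPS[op_name](a, b)           # numeric comparison
    return OPS[op_name](str(actual), value)  # string comparison

def search(records: dict, query: str) -> list:
    # One pass over the array, returning the keys of matching records.
    field, op_name, value = parse_query(query)
    return [key for key, rec in records.items()
            if matches(rec, field, op_name, value)]

people = {
    "1": {"name": "Ann", "age": "34"},
    "2": {"name": "Bob", "age": "9"},
    "3": {"name": "Bobby", "age": "41"},
}
older_than_ten = search(people, "age > 10")
bobs = search(people, "name contains Bob")
```

A single-clause query like this is the easy part; supporting AND/OR, nested keys, and multiple comparison types is where the parsing and the generalized evaluation get harder.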


-- 
  Richard Gaskin
  Fourth World Systems
  Software Design and Development for the Desktop, Mobile, and the Web
  ____________________________________________________________________
  Ambassador at FourthWorld.com                http://www.FourthWorld.com



