Data Persistence

Richard Gaskin ambassador at fourthworld.com
Tue Jul 31 15:03:40 EDT 2018


Ben Rubinstein wrote:

 > FWIW, if your needs are super-simple, saving the values of various
 > fields -  and especially if those fields are not going to have
 > multi-line values, you may want to check out the keywords 'combine'
 > and 'split'.
 > The advantage over arrayEncode is that the file is very readable -
 > with the above delimiters, each line is an entry from the array, in
 > the format <key><tab><value>.

An excellent option for single-line name-value pairs.
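
As a rough illustration of the <key><tab><value> layout described above, here is a minimal sketch in Python rather than LiveCode. The function names simply mirror the LiveCode keywords and the sample keys are hypothetical; this is an analogue of the idea, not the engine's implementation.

```python
# Python analogue of the split/combine idea (not LiveCode syntax).
# Each entry becomes one line: <key><tab><value>.

def combine(pairs, row_delim="\n", col_delim="\t"):
    """Flatten a dict of single-line strings into delimited text."""
    for key, value in pairs.items():
        for text in (key, value):
            # Refuse data that would corrupt the format.
            if row_delim in text or col_delim in text:
                raise ValueError(f"delimiter found in {text!r}")
    return row_delim.join(f"{k}{col_delim}{v}" for k, v in pairs.items())


def split(text, row_delim="\n", col_delim="\t"):
    """Rebuild the dict from text produced by combine()."""
    result = {}
    for line in text.split(row_delim):
        if line:  # skip empty lines
            key, _, value = line.partition(col_delim)
            result[key] = value
    return result
```

The resulting file stays readable in any text editor, which is the advantage Ben describes, as long as every value is a single line.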


 > I know Richard questions why one would
 > ever want to read "LSON": my answer is that during development it can
 > be helpful, either directly when developing on desktop, or attaching
 > to error reports once the app has moved onto a device.

To clarify, my only one-size-fits-all rule on this is that there is no 
one-size-fits-all rule for any of this. :)

What's best for a given circumstance depends on the given circumstance.

I use a format I call ISHI for my CMS and a growing range of other 
things.  I'll get around to documenting it later, but it's little more 
than a slightly-augmented use of split and combine.  As such, it gives 
me blinding speed in CGI environments where every millisecond matters, 
while retaining the flexible ease of plain text that can be written 
anywhere in anything, a valuable trait for a system based around textual 
content authoring.


Use cases where LSON (LC's binary output from arrayEncode) is favorable 
have different requirements, such as array depth.  I use LSON in systems 
where I need the flexibility of a schema-free format and the ability to 
employ any level of depth to keep document internals separate from one 
another for instant parsing.

Humans are very good at handling one- and two-dimensional writing; e.g., 
strings and tables.  Easy to conceptualize, easy to maintain.

Once you add a third dimension things get murky, increasing cognitive 
load for writing and machine load for parsing.

With 3 or more dimensions, it's often more productive to craft a UI to 
edit the data, something that's super-easy to do in a tool like LiveCode.

And once you've made a UI for that, the storage format no longer 
matters, so there's no downside to keeping it in LC's native binary LSON 
format.

There are probably dozens of other use cases favoring different formats; 
these are just a couple examples.  None are hard to work with, so we can 
pick and choose each according to the needs of the task at hand.

With LC v9.0.1, examining raw array data during development has never 
been easier, thanks to the recently-fixed tree widget: drop one on a 
card, toss an array into it, see and edit everything easily.



 > The disadvantage is that you need a couple of characters that you know
 > won't be included in either key or value. It doesn't have to be tab
 > and return; you could use "◊" and "¶" - but if you have any doubts of
 > what your users might ever need to enter, this is an issue.

Aye, and there's the rub: now you've created another format standard. :)

https://xkcd.com/927/

Split and combine work naturally on tabular data where either the first 
column is a key or a sequential integer series can serve as the keys.

And as long as the data associated with each key is a single line, it'll 
work wonderfully.

But once element data includes returns, it becomes three-dimensional. 
Not too onerous, reasonably efficient, provided you keep in mind a few 
things of the sort you mentioned here, about making sure your delimiters 
will never occur in the data itself, lest you need to escape those 
delimiters, which may at times require a means to escape the escapes.
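
To make the "escape the escapes" point concrete, here is a small Python sketch (an illustration only, not ISHI or anything built into LiveCode). It assumes tab and return are the delimiters and backslash is the escape character; both function names are hypothetical.

```python
# Sketch: keep tab/return delimiters usable when values may contain them.
# Backslash is the assumed escape character, so it must be escaped too.

def escape(text):
    # Order matters: double existing backslashes first, then the delimiters.
    return (text.replace("\\", "\\\\")
                .replace("\t", "\\t")
                .replace("\n", "\\n"))


def unescape(text):
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            # Consume the character after the backslash and decode the pair.
            nxt = next(chars, "")
            out.append({"t": "\t", "n": "\n", "\\": "\\"}.get(nxt, nxt))
        else:
            out.append(ch)
    return "".join(out)
```

Decoding has to walk the text one character at a time; a chain of simple replacements in reverse would misread an escaped backslash followed by a literal "n".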

I spent so much time poking fun at the most common form of that a 
while back, Comma-Separated Values, here:
http://www.fourthworld.com/embassy/articles/csv-must-die.html

It's a generally dismissible article unless you enjoy the comically 
strident tone, but the note toward the end may be helpful for choosing 
delimiters at least matching those in fairly common use.


The challenge with any plain-text expression of multidimensional data 
becomes more evident the moment you need depth beyond three.

And that's why serialization formats were created.

XML led the pack for a long time, offering not only richly hierarchical 
data expression in a universal standard, but with nice extras like 
element attributes.

But the opening and closing tag pairs are cumbersome to write and add 
bulk to every element and sub-element.

So some clever soul decided to come up with a more compact serialization 
that just happened to be parseable using the existing JavaScript engine 
(arguably the most common use case since browsers are where API data is 
most often consumed), and thus JSON was born.

Along the way, MongoDB came up with BSON (Binary JSON) as a means of 
enjoying much of the schema-free flexibility of JSON, but in a binary 
format that was both more compact and much quicker for machines to 
parse, an excellent choice for large-scale storage in a system like 
Mongo (and in some ways the closest match to LC's LSON).

For all of JSON's JS-friendly compactness over XML, hand-writing it can 
be tedious and error-prone, so YAML was invented as the answer for those 
cases where human-writability is a stronger need than parsing efficiency 
(all the rules about white space make it luxuriously readable to humans 
at a minor cost to machines to wade through all that).  YAML is most 
often used for configuration where settings may lend themselves to data 
that might need hierarchically-ordered sets deeper than a two-dimensional 
table.

Lots of choices, little dogma.  Just use whatever works well for what you 
need to do in the moment.

And if you factor data storage through accessors, you can completely 
change the underlying storage format at any time without affecting 
anything else in your code base. Pass in arrays, expect arrays back, and 
what happens in between becomes a black box whose storage details don't 
matter. Go wild, try them all. :)
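
One way the accessor idea might look, sketched in Python with a hypothetical Store class: callers pass a dict (the nearest Python equivalent of an array here) in and get a dict back, and the on-disk format is chosen in one place. JSON stands in for a text format and pickle for a native binary one; neither is LSON.

```python
import json
import pickle
from pathlib import Path

# Each codec is an (encode, decode) pair working on bytes.
_CODECS = {
    "json": (lambda data: json.dumps(data).encode("utf-8"),
             lambda raw: json.loads(raw.decode("utf-8"))),
    "pickle": (pickle.dumps, pickle.loads),
}


class Store:
    """Hypothetical accessor: dicts in, dicts out, format hidden inside."""

    def __init__(self, path, fmt="json"):
        self.path = Path(path)
        self._encode, self._decode = _CODECS[fmt]

    def save(self, data):
        self.path.write_bytes(self._encode(data))

    def load(self):
        return self._decode(self.path.read_bytes())
```

Code that only calls save() and load() never learns which codec is in use, so swapping formats means changing the fmt argument where the Store is created and nothing else.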

-- 
  Richard Gaskin
  Fourth World Systems
  Software Design and Development for the Desktop, Mobile, and the Web
  ____________________________________________________________________
  Ambassador at FourthWorld.com                http://www.FourthWorld.com




