Re: Re: A suggestion for an in-memory database, following up on Richard’s experiment

Richard Gaskin ambassador at fourthworld.com
Thu Mar 1 12:45:18 EST 2018


Tom Glod wrote:

 > no reason why it wouldn't work... but keep 2 things in mind
 >
 > if you use community edition, the number of HTTP requests you can make
 > to the same domain at one time is exactly 1.

How many are needed from a single client?

While it's unfortunate that LiveCode Community Edition is the only open 
source scripting language I know of without cURL support available in 
its community, that's a problem that can be fixed here as it has been 
for others: someone in the community could wrap the most relevant 
subset affordably.  HTTPS alone would cover at least 80% of real-world 
needs in our API-driven world.

And in the meantime, the per-domain connection limit lives in the 
scripted libURL, so a modded version could raise it (though I wouldn't 
take it too high, for the same reasons browsers default to a small 
number of connections to a single host).
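
As a stopgap, you can also shell out to the host's curl binary where 
one is available.  A rough sketch, assuming curl is installed and on 
the PATH:

function curlGet pUrl
   -- -s: silent, -f: fail on HTTP error codes
   put shell("curl -sf" && quote & pUrl & quote) into tOutput
   if the result is not empty then return empty -- non-zero exit
   return tOutput
end curlGet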


 > For a system like this, it could be better to save to an SQL database,
 > because separate TXT files would be a lot of IO calls.

How would the method used for moving data from disk to socket affect 
throughput for socket comms?

Ultimately all persistent storage needs to write to disk.  MySQL is very 
efficient, but does a lot of complex B-tree traversal. It's possible 
(for _very_ limited use cases) to outperform it for some forms of simple 
retrieval in LC script.

You'd never match even a small fraction of the full scope of DB features 
without losing that advantage many times over.  But if you knew in 
advance that the only thing you'd ever need is simple retrieval, your 
options are broad.  Even writes aren't too bad under some circumstances.
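
For instance, a single-key fetch from a comma-delimited "key,value" 
text file is only a few lines of script.  A minimal sketch (hypothetical 
path, no error handling; note that lineOffset matches the substring 
anywhere in a line, so this is a sketch rather than a robust index):

function lookupValue pKey, pPath
   -- read the whole file, then scan for the first line
   -- containing "<key>,"
   put URL ("file:" & pPath) into tData
   put lineOffset(pKey & comma, tData) into tLine
   if tLine = 0 then return empty
   return item 2 of line tLine of tData
end lookupValue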

Another concern for larger systems may be inode depletion:  if you have 
a lot of records and each record is a separate file, then unless you 
tune the file system away from its defaults you'll run out of inodes 
long before you run out of disk (Ext4, for example, defaults to one 
inode per 16 KB of capacity, and with the common 4 KB block size every 
file, however small, consumes at least one block).
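
To put rough numbers on it: a 100 GB volume formatted with Ext4 
defaults gets about 100 GB / 16 KB, or roughly 6.5 million, inodes, so 
that's the ceiling on file count no matter how small each record is. 
The -i (bytes-per-inode) option to mkfs.ext4 can change that ratio at 
format time if you know you'll need more files.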


 >  I'd be curious to see the performance of using LC Arrays as Database.

Poor as CGI, promising as daemon.

The ability to serialize arrays in LC is very nice, but as intensive as 
one would imagine for the task:  beyond the fstat and fread, the engine 
needs to parse the data, extracting each element from length indicators, 
translating numbers (which are serialized in binary form) by type 
indicators, and tucking what it finds into the array, key by key and 
element by element.  Certainly faster in the engine's machine code than 
it would be in script, but it's a lot of work no matter who's doing it.
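
The round trip itself is only two calls (binfile: avoids any 
line-ending translation); all the cost above is hiding inside 
arrayDecode:

-- serialize an array to disk...
put arrayEncode(tArray) into URL ("binfile:" & tPath)

-- ...and unpack it again on the next run
put arrayDecode(URL ("binfile:" & tPath)) into tArray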

I was doing some measurements on this the other day, exploring options 
for server storage that might be reasonably performant while more 
portable than MySQL (and unencumbered by GPL in case I decide to ship a 
complete solution from it).

One file was plain text, with a 35-character key (longer than we 
commonly find, but much shorter than the 255-character max) as item 1 
of each line, and a 10-character integer as the value in item 2.

The second file was that same data in array form, stored on disk as LSON.

The test was for CGI, so each iteration reads the file from disk and 
obtains the value associated with what happens to be the last key in the 
text file.  I chose the last key specifically because it would be the 
worst case for lineOffset, while of course for arrays it makes almost no 
difference.
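
Each timing was along these lines (hypothetical paths; one cold read 
per request, CGI-style):

-- text variant: read the file, scan for the last key
put the milliseconds into tStart
put URL ("file:" & tTextPath) into tData
put item 2 of line lineOffset(tLastKey & comma, tData) of tData into tValue
put the milliseconds - tStart into tTextMs

-- LSON variant: read, deserialize, then index directly
put the milliseconds into tStart
put arrayDecode(URL ("binfile:" & tLsonPath)) into tArray
put tArray[tLastKey] into tValue
put the milliseconds - tStart into tLsonMs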

But even weighted against using lineOffset in a text file, the overhead 
of arrayDecode more than ate up any benefit of using arrays for a 
simple single lookup:  the LSON file took nearly 8 times longer for 
100,000 keys:

Text:  21.8 ms
LSON: 167.9 ms

All that said, the overhead of LSON only applies for CGIs, where each 
request is effectively working from a cold boot, and any files used need 
to be read and unpacked each time.

As a daemon, the array would already be in memory, completely avoiding 
the overhead of deserialization.
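
A skeleton of that daemon shape (hypothetical file name, port, and 
one-line-in, one-line-out wire protocol):

local sStore -- the array lives in RAM for the life of the process

on startup
   -- unpack once, at launch, instead of once per request
   put arrayDecode(URL ("binfile:" & "store.lson")) into sStore
   accept connections on port 9000 with message "clientConnected"
end startup

on clientConnected pSock
   read from socket pSock until return with message "gotRequest"
end clientConnected

on gotRequest pSock, pData
   -- line 1 of pData is the requested key
   write sStore[line 1 of pData] & return to socket pSock
   read from socket pSock until return with message "gotRequest"
end gotRequest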

In a broad sense that's more or less how MongoDB works: a key-value 
store in which the index is RAM-bound, with data on disk found by 
pointers in the index.
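
In LC terms that pattern would be an in-memory index array mapping each 
key to a byte offset and length in a single data file.  A sketch, 
assuming sIndex gets built at startup:

local sIndex -- sIndex[key] holds "offset,length" into the data file

function fetchRecord pKey, pDataPath
   if pKey is not among the keys of sIndex then return empty
   put item 1 of sIndex[pKey] into tOffset
   put item 2 of sIndex[pKey] into tLength
   open file pDataPath for binary read
   read from file pDataPath at tOffset for tLength
   close file pDataPath
   return it
end fetchRecord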

Using a CouchDB-like logfile method (append is among the faster disk 
write options), one could get pretty good performance for storing any 
arbitrary data; kinda like having one big array on disk, but with the 
added benefit of built-in versioning.
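
The write side of that is about as simple as disk I/O gets; a sketch 
(hypothetical log path, tab-delimited records):

on logWrite pKey, pValue, pLogPath
   -- append-only: later lines for a key supersede earlier ones,
   -- so old values remain on disk as history
   open file pLogPath for append
   write pKey & tab & pValue & return to file pLogPath
   close file pLogPath
end logWrite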

But this is ideal only for limited use cases, in which both of these 
conditions are met: shared hosting where you have no control over the 
DBs you can use, and a preference for document-style NoSQL storage.

If you're really concerned about C10k, you're probably not on a shared 
host (or you'll soon find out why you don't want to be on a shared host 
for that <g>).

And if you're on a well-equipped VPS or dedicated server, there's 
probably no reason why you wouldn't just use MongoDB or CouchDB if you 
prefer those.  Compiled to machine code, they'll give you not only far 
better performance than any scripted solution, but far more efficient 
and flexible options for managing the other half of most NoSQL stores: 
materialized views.

TL;DR:

I appreciate the desire for LC-based server components more than most, 
but given the performance advantage of any dedicated storage engine, 
those dedicated options are the better choice for scalable systems.

And even as middleware, LiveCode Server is great for small, low-load 
systems, but given the blinding speed of PHP7, PHP is the clear winner 
among scripting languages where performance is critical.

--
  Richard Gaskin
  Fourth World Systems
  Software Design and Development for the Desktop, Mobile, and the Web
  ____________________________________________________________________
  Ambassador at FourthWorld.com                http://www.FourthWorld.com



