Issues with storage of data in stack

Mark Talluto mark at canelasoftware.com
Fri Mar 23 00:28:57 CET 2018


Hi Lagi,

Sorry about the delayed reply. I have been on a long business trip. Your early designs are far more sophisticated than what we put together here. Super impressive history you have.

LiveCode really is the champion here: all we use to store the arrays is arrayEncode() and "put myArrayA into url(...)". Selecting which array cluster to store might be easier to understand with a video:

http://canelasoftware.com/pub/canela/liveCode_array_clustering.mp4

Once you understand how the array is structured, I think the method will be clear.

We do not preallocate space, and we never append. We overwrite a cluster whenever one or more of its records are saved to disk. The write happens at the end of the CRUD operation: if you 'create' a single record, the record is first created in memory, and then the cluster it belongs to is written to disk. I have toyed with the idea of making the write-to-disk step controllable by the dev, so you could define when the write takes place. For example, you might like to write to disk after every 5 transactions or so. But I have not found the write to affect performance noticeably enough to need that feature.
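The write-after-CRUD idea above can be sketched in Python (the real system is LiveCode; the file layout, names, and the cluster-key choice here are all assumptions for illustration):

```python
import json
from pathlib import Path

DATA_DIR = Path("table_clusters")  # assumed layout: one file per cluster

def cluster_key(record_id, key_len=1):
    # Assumption: the cluster is chosen from the leading characters of the record id.
    return record_id[:key_len]

class ClusteredTable:
    def __init__(self, key_len=1):
        self.key_len = key_len
        self.clusters = {}          # cluster key -> {record_id: record}
        DATA_DIR.mkdir(exist_ok=True)

    def create(self, record_id, record):
        key = cluster_key(record_id, self.key_len)
        self.clusters.setdefault(key, {})[record_id] = record  # update RAM first
        self._write_cluster(key)                               # then overwrite one cluster file

    def _write_cluster(self, key):
        # Overwrite the whole cluster file; no appending, no preallocation.
        path = DATA_DIR / f"cluster_{key}.json"
        path.write_text(json.dumps(self.clusters[key]))

table = ClusteredTable()
table.create("a1b2", {"last_name": "Smith", "first_name": "Ana"})
```

The point of the sketch is that a single create only rewrites the one small cluster file it lands in, which is why the per-operation write cost stays low.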

-Multi User-
Yes, everything is processed sequentially in the cloud. There are no open sockets, so you can have massive numbers of concurrent connections. All cloud calls are made via 'post'. PHP scripts handle each request by writing it to a cache area. One or more LiveCode standalones on the other end process the requests in the order they are received. Thus, should a process go down, no data is lost; when the process comes back up, everything continues as normal. Scale is handled by making more than one process available. Further scaling is handled by storing data across multiple droplets/VMs (sharding). This can keep repeating as needed.
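The cached-request pattern above can be sketched in Python: a front end (the PHP layer in the original setup) drops each request into a cache directory, and a worker drains the directory in arrival order. The file naming and directory layout here are assumptions, not the actual implementation:

```python
import json
import time
from pathlib import Path

CACHE_DIR = Path("request_cache")
CACHE_DIR.mkdir(exist_ok=True)

def enqueue(request):
    # Timestamp-prefixed names keep the files sortable in arrival order.
    name = f"{time.time_ns()}.json"
    (CACHE_DIR / name).write_text(json.dumps(request))

def drain(handler):
    # Process requests oldest-first. A worker crash loses nothing,
    # because unprocessed files stay in the cache until handled.
    for path in sorted(CACHE_DIR.glob("*.json")):
        handler(json.loads(path.read_text()))
        path.unlink()  # remove only after the request is handled

handled = []
enqueue({"op": "create", "record": {"last_name": "Smith"}})
enqueue({"op": "read", "id": "a1b2"})
drain(handled.append)
```

Because the queue lives on disk rather than in a socket connection, any number of clients can enqueue concurrently while a single worker keeps the processing strictly sequential.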

-File Size Limitations-
We stay clear of the OS inode limits by never approaching them. We found that 40,000 files would really bring performance down. Clustering the arrays lowers the file count to acceptable and controllable levels.
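One way to see how clustering controls the file count: the cluster counts in the test data below (16, 256, 4096) are exactly what you get by keying clusters on the first 1, 2, or 3 hex characters of a record id. That keying scheme is my assumption, shown only to illustrate how a longer cluster key trades file count against per-cluster size:

```python
HEX = "0123456789abcdef"

def possible_clusters(key_len):
    # Each extra hex character in the cluster key multiplies
    # the number of cluster files by 16.
    return len(HEX) ** key_len

for key_len in (1, 2, 3):
    print(key_len, "->", possible_clusters(key_len), "files")
```

A longer key means more, smaller files: each individual write gets cheaper, but loading the whole table touches more files, which matches the trade-off in the timings below.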

-Test Data-
100,000 records in table
Record size average: 45 chars
Keys in each record: last_name, first_name, date_of_birth, middle_name, student_number, gender, grade_level, active

Cluster size   Clusters per table   Load all (disk to RAM)   Write all (RAM to disk)   Write one (RAM to disk)
1              16                   1.46 secs                1.5 secs                  91.4 ms
2              256                  1.52 secs                1.5 secs                  6.7 ms
3              4096                 2.38 secs                1.6 secs                  0.8 ms

I hope this information is helpful. Please let me know if you have any other questions.

Best regards,

Mark Talluto
livecloud.io
nursenotes.net
canelasoftware.com




> On Mar 12, 2018, at 10:31 AM, Lagi Pittas <iphonelagi at gmail.com> wrote:
> 
> Hi Mark,
> 
> Thanks for the detailed explanation but I have a few (ish) questions ...
> 
> Hope you don't mind me asking these questions. I did have to write my
> own access routines in those bad old days before I started on
> Clipper/Foxpro/Delphi/Btrieve, and I do enjoy learning from others on
> the list and the forums - those AHA! moments when you finally get how
> the Heapsort works the night before the exam.
> 
> Many moons ago I wrote a multi-way B-tree based on the explanation in
> Wirth's book "Algorithms + Data Structures = Programs" - in UCSD
> Pascal for the Apple 2. When I was lucky, I had a 5MB hard drive for
> the bigger companies; for the smaller companies I made do with two
> 143k floppy disks and hashing for a "large" data set - oh, the memories.
> I used the B-trees if the codes were alphanumeric. I also had my
> own method where I kept the index in the first X blocks of the file
> and loaded the parts into memory as they were needed - a brain-dead
> version of yours, I suppose. I think we had about 40k of free RAM to
> play with, so we couldn't always keep everything in RAM. I even made
> the system multi-user and ran 20 Apple ][s on a network using a
> proprietary Nestar/Zynar network with ribbon cables - it worked, but
> am I glad we have Ethernet!
> 
> Anyway - I digress. I can understand the general idea of what you are
> explaining, but it's the low-level code for writing to the
> clusters/file on disk I'm not quite sure of.
> Which way do you build your initial file? Is it "sparse" or prebuilt,
> or does each cluster have a "pointer" to previous or next clusters?
> Do you have records "spanning" clusters, or do you leave any spare
> space in a cluster empty? Do you mark a "record" as deleted but not
> remove it until it's overwritten, or do what Foxpro/Dbase does
> and "PACK" them with a utility routine?
> I also presume you use the "at" option in the write command to write
> the clusters randomly, since you don't write the whole in-memory table.
> 
> Which brings me to my final questions - I presume your system is
> multi-user because you have a server program that receives calls and
> executes them sequentially? And lastly, what are the file size
> limitations of doing it this way - do you also virtualize the data in
> memory?
> 
> Sorry for all the questions, but this is the interesting stuff.
> 
> Regards Lagi



More information about the use-livecode mailing list