All this talk about DataBases

Richard Gaskin ambassador at fourthworld.com
Wed May 30 01:19:16 EDT 2007


Scott Kane wrote:

> I'd be curious to know why RR decided to change the behaviour of how
> stacks are read (from file as opposed to loaded fully into RAM).

Everything in computing involves tradeoffs.  The question of HC's 
storage vs. Rev's is about paging:

HC is constantly picking up and putting down small portions of stack
files, hoping to keep accurate track of which piece it left where and
when.  This paging adds tremendous complexity to the management of such
data, and even in the hands of experienced developers such methods are
inherently prone to corruption.  Error 5454 was a popular experience for
HC folks.  Note that FileMaker, which also uses a paging scheme,
addressed the problem only by putting an item in the File menu
specifically for rebuilding corrupted files, because it happens so
often.  With unusual care it was possible to have an unusually low
number of corrupted stacks in HC, but I never met a HyperCard developer
who didn't get a 5454 at least once.

And for all that complexity and fragility, we still have a system whose 
inventor made it clear more than once that he felt it was not a 
substitute for a database (popular usage conflicting with his 
recommendation notwithstanding).

Raney took Atkinson's advice and built a system optimized to favor that
wisdom:  if it's not a database, why create the fragile illusion that it
should be used as one, when dropping that illusion opens up a whole
world of both performance enhancement and developer control?

The latter is an interesting point:  HC always saved, whenever it wanted 
to, and you had no control over that.  But in the rest of the world 
users are in control of what gets saved, what gets discarded, what gets 
reverted, etc.  With Rev, you're in the driver's seat.  If you want to 
save, then just save, and if not then don't.  You can implement 
traditional document behaviors with Rev which just weren't possible in 
HC-based apps.

Extra bonus points: Since Rev stacks are written to disk as a single 
operation, true corruption of the stack file is a very rare thing.  In 
more than a decade with the product I've only seen one true case of
corruption, and only heard of maybe two others which might possibly have 
been corruption.  That brings the failure rate due to the memory 
management scheme down from HC by several orders of magnitude.
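
To make the "single operation" idea concrete, here is a minimal sketch in
Python (not Rev script).  It shows one common way to get all-or-nothing
saves in general: write the complete file to a temporary name, then swap
it into place.  Whether Rev does it this way internally is not something
I'm claiming here, and the name save_whole_file is purely hypothetical.

```python
import os
import tempfile


def save_whole_file(path, data):
    """Write all of `data` (bytes) to `path` as one all-or-nothing step.

    Assumption for illustration only: the whole file is rewritten on
    every save, so a reader sees either the old file or the new one,
    never a half-written mix of the two.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Swap the finished file into place in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything went wrong before the swap.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

The contrast with paging is that nothing here updates the file in small
pieces, so there is no bookkeeping about which piece went where.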


> I suspect it would be possible to work around this, I believe Rob Cozens
> does something of the sort with Serendipity, but the question is
> whether it's really worth while given it's all there already with a
> "real" database.

What is "real"?

A "database" could be seen more generically as a "data store", which may 
help us appreciate the differences between a simple flat-file like HC 
and a full-blown RDBMS, with its relationality, data types, and other 
features which aren't part of the HC model.

So think about it:  if all you need is rows and columns, why not just 
use rows and columns?

Why bother with the overhead of storing the data in fields on cards, 
when you can easily parse item and line chunks of a single block of data 
so very efficiently?

I've been using simple tab-delimited tables for a wide range of 
application data for years, and have found it reasonably efficient for 
data sets of up to 50,000 records, sometimes more.  Even HC bogs down 
with 50,000 cards, and putting my records into HC fields and cards would 
bloat the storage size by several MBs with all the unnecessary object 
overhead.
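
As a rough illustration of the rows-and-columns approach, here is a small
sketch in Python rather than Rev script.  Lines stand for records and
tab-separated items stand for columns.  The function names and the sample
data are made up for the example.

```python
def parse_table(text):
    """Split a tab-delimited block into records (one list of items per line)."""
    return [line.split("\t") for line in text.split("\n") if line]


def find_records(table, column, value):
    """Return every record whose item at `column` (0-based) equals `value`."""
    return [record for record in table if record[column] == value]


# Hypothetical sample table with columns: id, name, city
raw = "1\tAda\tLondon\n2\tGrace\tNew York\n3\tAlan\tLondon"

table = parse_table(raw)
londoners = find_records(table, 2, "London")
print([record[1] for record in londoners])  # ['Ada', 'Alan']
```

The whole table is one block of text, and a lookup is a single pass over
its lines, with no per-record object overhead.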

I started out writing these tables to text files, but inevitably I found 
I wanted multiple tables, metadata, and a lot more.  So instead I just 
started tucking these tables into custom properties, and today my 
favorite data file format is the stack file once again -- but rather 
than using fields on a large number of cards, I store entire tables in a 
single custom property, along with anything else I want in any other 
properties, all easily and robustly accessible using native Rev commands.
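
The shape of that idea, several named tables plus metadata in one file,
can be sketched loosely in Python.  This is only an analogy: a JSON file
stands in for the stack file, and dictionary keys stand in for custom
property names.  Every name below is hypothetical.

```python
import json


def save_store(path, store):
    """Write every named table and all metadata to one file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(store, f)


def load_store(path):
    """Read the whole store back into memory."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def rows(store, name):
    """Parse one named tab-delimited table into a list of records."""
    return [line.split("\t") for line in store[name].split("\n") if line]


# Hypothetical store: two whole tables, each held as a single value,
# plus a little metadata alongside them.
store = {
    "customers": "1\tAda\tLondon\n2\tGrace\tNew York",
    "orders": "100\t1\t2007-05-30\n101\t2\t2007-05-31",
    "meta": {"version": "1", "customerColumns": "id\tname\tcity"},
}
```

Calling save_store("data.json", store) and later load_store("data.json")
round-trips everything in one file, and rows(store, "orders") gives the
records of a single table.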

And think about it:  since every Rev object has multiple property sets, 
and a stack can have any number of cards, and cards can have groups, 
etc. -- all this means you can have richly hierarchically-ordered data 
sets using just custom properties.   Hierarchies reflect much of the 
world's taxonomy, and were not gracefully done with HC.
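
A loose Python analogy for that kind of hierarchy is a set of nested
dictionaries, with the levels standing in for stack, card, property set,
and property.  The names and data are invented for the example, and this
is not how Rev stores anything internally.

```python
def get_property(tree, path):
    """Walk nested dictionaries along a slash-separated path."""
    node = tree
    for key in path.split("/"):
        node = node[key]
    return node


# Hypothetical hierarchy: stack -> card -> property set -> property
data = {
    "catalog": {
        "fruit": {
            "prices": {"apple": "0.40", "pear": "0.55"},
            "stock": {"apple": "120", "pear": "80"},
        },
        "tools": {
            "prices": {"hammer": "12.00"},
            "stock": {"hammer": "7"},
        },
    }
}

print(get_property(data, "catalog/fruit/prices/pear"))  # 0.55
```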

So in brief, while Rev does ask developers to think about the needs of 
their data more carefully, it also provides many rich ways to work with 
that data very efficiently, all with native commands.  For any project 
with fewer than 50,000 records per table (probably about 80-90% of all 
projects <g>), you may find you can do everything you need without a 
single database connection.  And as your needs grow, Rev provides
database connectivity too.

-- 
  Richard Gaskin
  Fourth World Media Corporation
  ___________________________________________________________
  Ambassador at FourthWorld.com       http://www.FourthWorld.com


