All this talk about DataBases
Richard Gaskin
ambassador at fourthworld.com
Wed May 30 01:19:16 EDT 2007
Scott Kane wrote:
> I'd be curious to know why RR decided to change the behaviour of how
> stacks are read (from file as opposed to loaded fully into RAM).
Everything in computing involves tradeoffs. The question of HC's
storage vs. Rev's is about paging:
HC is constantly picking up and putting down small portions of stack
files, hoping to keep accurate track of which piece it left where when.
This paging adds tremendous complexity to the management of such data,
and even in the hands of experienced developers such methods are
inherently prone to corruption. Error 5454 was a popular experience for
HC folks. Note that FileMaker, which also uses a paging scheme,
addressed the problem only by putting an item in the File menu
specifically for rebuilding corrupted files, because it happens so often.
With unusual care it was possible to have an unusually low number of
corrupted stacks in HC, but I never met a HyperCard developer who didn't
get a 5454 at least once.
And for all that complexity and fragility, we still have a system whose
inventor made it clear more than once that he felt it was not a
substitute for a database (popular usage conflicting with his
recommendation notwithstanding).
Raney took Atkinson's advice and built a system optimized to favor that
wisdom: if it's not a database, why create the fragile illusion that it
should be used as one, when there's a whole world of both performance
enhancement and developer control available?
The latter is an interesting point: HC always saved, whenever it wanted
to, and you had no control over that. But in the rest of the world
users are in control of what gets saved, what gets discarded, what gets
reverted, etc. With Rev, you're in the driver's seat. If you want to
save, then just save, and if not then don't. You can implement
traditional document behaviors with Rev which just weren't possible in
HC-based apps.
Extra bonus points: Since Rev stacks are written to disk as a single
operation, true corruption of the stack file is a very rare thing. In
more than a decade with the product I've only seen one true case of
corruption, and only heard of maybe two others which might possibly have
been corruption. That brings the failure rate due to the memory
management scheme down from HC by several orders of magnitude.
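
To make the "single operation" idea concrete, here is a minimal sketch in Python (not Rev script, and not a claim about how Rev implements its save). It assumes one common way of getting an all-or-nothing save: write the complete file to a temporary name, then swap it into place. The function name `save_whole_file` is hypothetical.

```python
import os
import tempfile

def save_whole_file(path, data):
    """Write data (bytes) to path so the old file is replaced in one step."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Swap the finished file into place; readers see either the old
        # contents or the new ones, never a half-written mix.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything went wrong before the swap.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

The point of the design is that nothing ever edits the existing file in small pieces, which is where a paging scheme gets its opportunities to lose track of things.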
> I suspect it would be possible to work around this, I believe Rob Cozens
> does something of the sort with Serendipity, but the question is
> whether it's really worth while given it's all there already with a
> "real" database.
What is "real"?
A "database" could be seen more generically as a "data store", which may
help us appreciate the differences between a simple flat-file store like HC
and a full-blown RDBMS, with its relationality, data types, and other
features which aren't part of the HC model.
So think about it: if all you need is rows and columns, why not just
use rows and columns?
Why bother with the overhead of storing the data in fields on cards,
when you can easily parse item and line chunks of a single block of data
so very efficiently?
I've been using simple tab-delimited tables for a wide range of
application data for years, and have found it reasonably efficient for
data sets of up to 50,000 records, sometimes more. Even HC bogs down
with 50,000 cards, and putting my records into HC fields and cards would
bloat the storage size by several MBs with all the unnecessary object
overhead.
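
As a rough illustration of how little machinery a rows-and-columns table needs, here is a sketch in Python rather than Rev script. Splitting on newlines and tabs plays the role of line and item chunks; the sample data and the helper names `parse_table` and `select` are made up for the example.

```python
# Hypothetical sample data: one record per line, tab-separated columns,
# with the column names on the first line.
TABLE = "id\tname\tcity\n1\tAda\tLondon\n2\tGrace\tNew York\n3\tAlan\tLondon"

def parse_table(text):
    """Split a tab-delimited block into a header list and a list of rows."""
    lines = text.split("\n")
    header = lines[0].split("\t")
    rows = [line.split("\t") for line in lines[1:] if line]
    return header, rows

def select(header, rows, column, value):
    """Return every row whose named column equals value."""
    idx = header.index(column)
    return [row for row in rows if row[idx] == value]

header, rows = parse_table(TABLE)
londoners = select(header, rows, "city", "London")
```

Everything here is a linear scan over one block of text, which is why this approach stays comfortable for modest record counts and starts to hurt only as tables get large.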
I started out writing these tables to text files, but inevitably I found
I wanted multiple tables, metadata, and a lot more. So instead I just
started tucking these tables into custom properties, and today my
favorite data file format is the stack file once again -- but rather
than using fields on a large number of cards, I store entire tables in a
single custom property, along with anything else I want in any other
properties, all easily and robustly accessible using native Rev commands.
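
The shape of that arrangement can be sketched in Python (again, not Rev script): one file holding several named properties, each of which is a whole tab-delimited table or a bit of metadata. A dict stands in for a custom property set and JSON stands in for the stack file format; both are assumptions made for illustration, as are the property names.

```python
import json

def save_store(path, props):
    """Write every named property to a single file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(props, f)

def load_store(path):
    """Read the named properties back as a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

# Hypothetical store: two tables plus a piece of metadata, side by side.
store = {
    "customers": "id\tname\n1\tAda\n2\tGrace",
    "orders": "id\tcustomer_id\ttotal\n10\t1\t25.00",
    "schema_version": "1",
}
```

Calling `save_store("mydata.json", store)` and later `load_store("mydata.json")["customers"]` returns the whole customers table as one block of text, ready to be split into lines and items.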
And think about it: since every Rev object has multiple property sets,
and a stack can have any number of cards, and cards can have groups,
etc. -- all this means you can have richly hierarchically-ordered data
sets using just custom properties. Hierarchies reflect much of the
world's taxonomy, and were not gracefully done with HC.
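
A small Python sketch of the hierarchy idea, with nested dicts standing in for cards, groups and property sets. This mimics the structure only and is not Rev's API; the taxonomy data and the helper `get_prop` are invented for the example.

```python
def get_prop(tree, path):
    """Follow a sequence of keys down a nested dict and return the value."""
    node = tree
    for key in path:
        node = node[key]
    return node

# Hypothetical hierarchy: the outer level plays the role of a card, the
# next level a group or property set, and the leaf holds a whole table.
data = {
    "Animals": {
        "Mammals": {"table": "name\tlegs\ncat\t4\nbat\t2"},
        "Birds": {"table": "name\tlegs\nowl\t2"},
    },
}

mammals = get_prop(data, ["Animals", "Mammals", "table"])
```

Each leaf is still just a tab-delimited block, so the flat-table handling carries over unchanged at every level of the tree.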
So in brief, while Rev does ask developers to think about the needs of
their data more carefully, it also provides many rich ways to work with
that data very efficiently, all with native commands. For any project
with fewer than 50,000 records per table (probably about 80-90% of all
projects <g>), you may find you can do everything you need without a
single database connection. And as your needs grow, Rev provides
database connectivity too.
--
Richard Gaskin
Fourth World Media Corporation
___________________________________________________________
Ambassador at FourthWorld.com http://www.FourthWorld.com