OT: Metadata for Text Fragments - Help with Dbase Schema
Sivakatirswami
katir at hindu.org
Sun Mar 28 18:07:34 EDT 2010
Aloha, as usual here I come begging..
I'm working on a schema for a new database that will "hold anything and
everything," using Dublin Core modified/extended with the Media
Annotation specification, where field names are very "generic" and allow
you to maintain metadata and text content in the database for a wide
variety of "just about anything" (e.g., if you wanted to put into the
same table images of species of flowers, YouTube videos you have
uploaded, or chapters of a book, they will all "fit"), all to be
accessed later by RunRev desktop apps, Revlets, iRev, iPhone apps, etc.
Of course even the Dublin Core Metadata Initiative says that when it
gets down to doing your "application profile" things will start getting
customized pretty quickly... I've drafted a dozen input data sets and
as many output requirement scenarios.
I have sorted out almost all possible input and output
requirements and use cases for most resources. But I am stumped by
one, which I thought would be obvious, because in the world of academia
this would seem to be a common requirement:
metadata for text fragments, otherwise known as "the citation" but in
this case publisher, author, source are already present fields in the
Dublin Core. The problem comes with how to store metadata where the text
fragment is part of a whole.
OK, what are we talking about? Let's use a real and specific example.
The Hindu Vedas are vast and voluminous. We have selected many verses
for specific use. These need to be a) discrete and b) re-assemblable.
For example, here are some fun thoughts from the Rig Veda about gambling
(dice have been around for millennia!):
---------
Downward they roll, then jump in the air! Though handless themselves,
they can keep the upper hand over those who have! On the board, like
magic coals, they consume, though cold, the player's heart to ashes.
Rig Veda X, 34, 9
Abandoned, the wife of the gambler grieves. Grieved, too, is his mother
as he wanders vaguely. Afraid and in debt, ever greedy for money, he
steals in the night to the home of another.
Rig Veda X, 34, 10
He is seized by remorse when he sees his wife's lot, beside that of her
neighbor with well-ordered home. In the morning, however, he yokes the
brown steeds and at evening falls stupid before the cold embers.
Rig Veda X, 34, 11
-----------
How best to keep the citation string, such that later one could
aggregate these three verses into a unit such as we have above, while
they each have their own record in the database? (Also think of "quotes,"
"jokes," "sayings," "maxims," etc. in the same category of "text fragments.")
An exhaustive generic bibliographic citation is pretty well understood
to comprise the following levels
(where "collection, author, publisher, date, title etc." are already
present in the Dublin Core spec and my schema):
Series
Volume
Part
Section
Chapter
Paragraph-verse
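A citation string such as "Rig Veda X, 34, 9" could be split into
discrete levels like these once, on input. The following is only a
sketch in Python. The pattern, the function names, and the reading of
the three numbers as book, hymn and verse are assumptions made for
illustration, not anything defined by Dublin Core.

```python
import re

# Hypothetical pattern for citations shaped like "Rig Veda X, 34, 9":
# a work title, a Roman-numeral book, then two Arabic numbers.
CITATION = re.compile(
    r"^(?P<work>.+?)\s+(?P<book>[IVXLCDM]+),\s*(?P<hymn>\d+),\s*(?P<verse>\d+)$"
)

ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral):
    """Convert a Roman numeral such as 'X' or 'XIV' to an integer."""
    total = 0
    for i, ch in enumerate(numeral):
        value = ROMAN[ch]
        # A smaller value written before a larger one is subtracted (IV = 4).
        if i + 1 < len(numeral) and ROMAN[numeral[i + 1]] > value:
            total -= value
        else:
            total += value
    return total


def parse_citation(text):
    """Return (work, book, hymn, verse) with the numbers as integers."""
    m = CITATION.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognized citation: {text!r}")
    return (
        m.group("work"),
        roman_to_int(m.group("book")),
        int(m.group("hymn")),
        int(m.group("verse")),
    )


# Citations in arbitrary order sort naturally once they are parsed.
citations = ["Rig Veda X, 34, 11", "Rig Veda X, 34, 9", "Rig Veda X, 34, 10"]
ordered = sorted(parse_citation(c) for c in citations)
print(ordered)
```

Because the parsed numbers are integers, verse 9 sorts before verse 10,
which would not be the case if the citation were compared as text.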
Now... what is the best way to handle these citation levels in terms of
a schema? This is where I get stumped. The DCMI use of RDF/XML-style
notation is a different universe and does not translate well to a
relational PostgreSQL schema... If I study the back-end MySQL databases
for boxed LAMP apps (Drupal, WordPress, XOOPS, etc.), I see various
strategies depending on who developed the module that uses a specific
table (a snake pit of tables!).
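One possible relational shape, offered as a sketch rather than a
recommendation, is one column per citation level, so that each verse
keeps its own record and a passage is reassembled with an ordered
query. The example uses Python's built-in sqlite3 module as a stand-in
for PostgreSQL. The table name, the column names, and the mapping of
"X, 34, 9" onto volume, chapter and verse are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: one nullable column per citation level, so a
# resource uses only the levels that apply to it.
conn.execute("""
    CREATE TABLE text_fragment (
        id       INTEGER PRIMARY KEY,
        title    TEXT NOT NULL,   -- title of the whole work
        series   TEXT,
        volume   INTEGER,
        part     INTEGER,
        section  INTEGER,
        chapter  INTEGER,
        verse    INTEGER,         -- the "Paragraph-verse" level
        content  TEXT NOT NULL
    )
""")

# Only the first sentence of each verse is stored here, to keep the
# example short. Rows are inserted out of order on purpose.
rows = [
    ("Rig Veda", 10, 34, 11,
     "He is seized by remorse when he sees his wife's lot, beside that "
     "of her neighbor with well-ordered home."),
    ("Rig Veda", 10, 34, 9, "Downward they roll, then jump in the air!"),
    ("Rig Veda", 10, 34, 10, "Abandoned, the wife of the gambler grieves."),
]
conn.executemany(
    "INSERT INTO text_fragment (title, volume, chapter, verse, content) "
    "VALUES (?, ?, ?, ?, ?)",
    rows,
)


def fetch_passage(conn, title, volume, chapter, first, last):
    """Return (verse, content) pairs for a verse range, in verse order."""
    cur = conn.execute(
        "SELECT verse, content FROM text_fragment "
        "WHERE title = ? AND volume = ? AND chapter = ? "
        "AND verse BETWEEN ? AND ? ORDER BY verse",
        (title, volume, chapter, first, last),
    )
    return cur.fetchall()


passage = fetch_passage(conn, "Rig Veda", 10, 34, 9, 11)
for verse, content in passage:
    print(f"{verse}: {content}")
```

Keeping the levels as integers is a design choice that makes range
queries and ordering trivial. Works whose levels are not numeric would
need text columns plus a separate sort key.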
We see fields that hold discrete data values mapped with relation tables
to other data, and we also see fields that seem to be used to hold an
array of metadata. These are scary!
varChar(255) SomeData value: "a:23;isT:45;bv:$1;...." etc... some
quite long and completely opaque from a human readability point of view
which goes against basic DCMI principles.
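In contrast to a packed field like that, a display string can be
derived on demand from values that stay readable on their own. A
minimal sketch in Python follows. The record keys are hypothetical, and
the Roman numeral for the volume simply follows the "Rig Veda X, 34, 9"
style used in the verses quoted earlier.

```python
# Value/numeral pairs in descending order, including subtractive forms.
NUMERALS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def int_to_roman(number):
    """Convert a positive integer to a Roman numeral."""
    if number <= 0:
        raise ValueError("Roman numerals need a positive integer")
    parts = []
    for value, numeral in NUMERALS:
        count, number = divmod(number, value)
        parts.append(numeral * count)
    return "".join(parts)


def format_citation(record):
    """Build the display citation from discrete, readable values."""
    levels = [
        int_to_roman(record["volume"]),
        str(record["chapter"]),
        str(record["verse"]),
    ]
    return record["title"] + " " + ", ".join(levels)


# Hypothetical record, with one key per citation level.
record = {"title": "Rig Veda", "volume": 10, "chapter": 34, "verse": 9}
citation = format_citation(record)
print(citation)
```

The stored values stay legible to any later tool, and the presentation
format can change without touching the data.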
The whole name of the game being: how can you keep the metadata clear
enough and simple enough that it can live into the future and be easily
extracted and transformed? The known (and well-documented) problem is
that schemas which are too opaque are basically cast in stone. Any
second-generation agent (programmer, application, export tool, etc.)
is locked out, and the entire framework needs a complete refactoring
later (very expensive), such that many companies simply a) cannot
upgrade and b) suffer the consequences. I'm sure this issue is also
present in a lot of business frameworks.
I searched the web for any models, and will continue to do so... as one
would expect to see a lot of information from the academy where
citations for text fragments are a "mission critical" component for any
published document (PHP, scientific research, book reviews, teaching
texts etc)
But I want to put this out on this list... if anyone has experience with
database schemas for metadata for text fragments, otherwise called
"bibliographic citations," please email me off list if you have any
advice, pointers, URLs, models or resources. I would be deeply
grateful.
Or, if you feel this is a subject of general interest, then shout
"Please keep this thread on the list!" and we will.
TIA!
Sivakatirswami