OT: Metadata for Text Fragments - Help with Dbase Schema

Sivakatirswami katir at hindu.org
Sun Mar 28 18:07:34 EDT 2010


Aloha, as usual here I come begging..

I'm working on a schema for a new database that will "hold anything and 
everything", using Dublin Core modified/extended with the Media 
Annotation specification, where field names are very "generic" and allow 
you to maintain metadata and text content in the dbase for a wide 
variety of "just about anything" (i.e. if you wanted to put into the 
same table images of species of flowers, YouTube videos you have 
uploaded, or chapters of a book, they will all "fit"), all to be 
accessed later by RunRev desktop apps, Revlets, iRev, iPhone apps etc.

Of course even the Dublin Core Metadata Initiative says that when it 
gets down to doing your "application profile" things will start getting 
customized pretty quickly... I've drafted a dozen input data sets and 
as many output requirement scenarios. I have pretty much sorted almost 
all possible input and output requirements and use cases for most 
resources. But I am stumped by one, which I thought would be obvious, 
because in the world of academia this would seem to be a common 
requirement:

Metadata for text fragments, otherwise known as "the citation". In this 
case publisher, author and source are already present as fields in the 
Dublin Core. The problem comes with how to store metadata where the text 
fragment is part of a whole.

OK, what are we talking about? Let's use a real and specific example. 
The Hindu Vedas are vast and voluminous. We have selected many verses 
for specific use. These need to be a) discrete b) re-assemble-able

For example, here are some fun thoughts from the Rig Veda about gambling 
(dice have been around for millennia!):

---------

Downward they roll, then jump in the air! Though handless themselves, 
they can keep the upper hand over those who have! On the board, like 
magic coals, they consume, though cold, the player's heart to ashes.

Rig Veda X, 34, 9

Abandoned, the wife of the gambler grieves. Grieved, too, is his mother 
as he wanders vaguely. Afraid and in debt, ever greedy for money, he 
steals in the night to the home of another.

Rig Veda X, 34, 10

He is seized by remorse when he sees his wife's lot, beside that of her 
neighbor with well-ordered home. In the morning, however, he yokes the 
brown steeds and at evening falls stupid before the cold embers.

Rig Veda X, 34, 11

-----------
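
Each citation line above (for example "Rig Veda X, 34, 9") follows the 
same pattern, so it can be split into numeric parts that sort and group 
cleanly. The following is only a minimal sketch in Python. It assumes 
every citation looks like "Title Roman-numeral, number, number", and the 
names parse_citation, book, hymn and verse are hypothetical labels, not 
part of Dublin Core or any existing schema.

```python
import re

# Values of the individual Roman numeral letters.
ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

def roman_to_int(numeral):
    """Convert a Roman numeral such as 'X' or 'IX' to an integer."""
    total = 0
    for ch, nxt in zip(numeral, numeral[1:] + " "):
        value = ROMAN[ch]
        # Subtractive notation: a smaller letter before a larger one (IV, IX).
        total += -value if ROMAN.get(nxt, 0) > value else value
    return total

# Assumed layout: "<work title> <Roman book>, <hymn>, <verse>".
CITATION = re.compile(
    r"^(?P<work>.+?)\s+(?P<book>[IVXLCDM]+),\s*(?P<hymn>\d+),\s*(?P<verse>\d+)$"
)

def parse_citation(text):
    """Split a citation string into a dict of work title and numeric levels."""
    match = CITATION.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized citation: {text!r}")
    return {
        "work": match["work"],
        "book": roman_to_int(match["book"]),
        "hymn": int(match["hymn"]),
        "verse": int(match["verse"]),
    }

print(parse_citation("Rig Veda X, 34, 9"))
```

Keeping the parts as integers rather than as one display string means 
verse 9 sorts before verse 10, which a plain text sort would get wrong.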

How best to keep the citation string, such that later one could 
aggregate these three verses into a unit such as we have above, where 
they each have their own record in the dbase? (Also think of "quotes", 
"jokes", "sayings", "maxims" etc. in the same category of "text 
fragments".)


An exhaustive generic bibliographic citation is pretty well understood 
to comprise the following levels (where "collection, author, publisher, 
date, title etc." are already present in the Dublin Core spec and my 
schema):

Series
Volume
Part
Section
Chapter
Paragraph-verse

Now... what is the best way to handle the above in terms of a schema? 
This is where I get stumped. The DCMI use of RDF/XML-style notation is 
a different universe and does not translate well to a relational 
PostgreSQL schema... If I study the back-end MySQL dbases for boxed LAMP 
apps (Drupal, WordPress, XOOPS etc.) I see various strategies depending 
on who developed the module which uses a specific table (a snake pit of 
tables!)
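
To make the question concrete, one possible shape is a single table with 
one nullable column per citation level. This is only a sketch under 
stated assumptions: the table name fragment and its column names are 
hypothetical, SQLite (via Python) stands in for PostgreSQL so it runs 
anywhere, and mapping "X, 34, 9" to volume, chapter and verse is a guess 
at how the generic levels might be used for the Rig Veda.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One column per generic citation level; unused levels stay NULL.
conn.execute("""
    CREATE TABLE fragment (
        id      INTEGER PRIMARY KEY,
        source  TEXT NOT NULL,   -- would map to the Dublin Core source field
        series  TEXT,
        volume  INTEGER,
        part    INTEGER,
        section INTEGER,
        chapter INTEGER,
        verse   INTEGER,
        body    TEXT NOT NULL
    )
""")

# Each verse is its own record; inserted out of order on purpose.
rows = [
    ("Rig Veda", 10, 34, 11, "He is seized by remorse ..."),
    ("Rig Veda", 10, 34, 9, "Downward they roll ..."),
    ("Rig Veda", 10, 34, 10, "Abandoned, the wife of the gambler grieves ..."),
]
conn.executemany(
    "INSERT INTO fragment (source, volume, chapter, verse, body) "
    "VALUES (?, ?, ?, ?, ?)",
    rows,
)

def passage(conn, source, volume, chapter, first, last):
    """Return (verse, body) pairs for a verse range, in reading order."""
    cur = conn.execute(
        "SELECT verse, body FROM fragment "
        "WHERE source = ? AND volume = ? AND chapter = ? "
        "AND verse BETWEEN ? AND ? ORDER BY verse",
        (source, volume, chapter, first, last),
    )
    return cur.fetchall()

for verse, body in passage(conn, "Rig Veda", 10, 34, 9, 11):
    print(verse, body)
```

The records stay discrete, and the ORDER BY on the numeric columns is 
what makes them re-assemble-able into the original passage.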

We see fields that hold discrete data values mapped with relation tables 
to other data; and we also see fields that seem to be used to hold an 
array of metadata. These are scary!

varChar(255) SomeData  value: "a:23;isT:45;bv:$1;...."   etc... Some are 
quite long and completely opaque from a human-readability point of view, 
which goes against basic DCMI principles.
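
For contrast, one commonly used alternative to a packed field is a 
name/value table with one readable row per property. The sketch below is 
only an illustration: the table fragment_meta and the property names in 
it are hypothetical, and SQLite (via Python) is used just so that it 
runs without a server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per (fragment, property) instead of one packed string.
conn.execute("""
    CREATE TABLE fragment_meta (
        fragment_id INTEGER NOT NULL,
        name        TEXT NOT NULL,
        value       TEXT NOT NULL,
        PRIMARY KEY (fragment_id, name)
    )
""")

conn.executemany(
    "INSERT INTO fragment_meta (fragment_id, name, value) VALUES (?, ?, ?)",
    [
        (1, "source", "Rig Veda"),
        (1, "volume", "10"),
        (1, "chapter", "34"),
        (1, "verse", "9"),
    ],
)

def metadata(conn, fragment_id):
    """Collect all properties of one fragment into a plain dict."""
    cur = conn.execute(
        "SELECT name, value FROM fragment_meta "
        "WHERE fragment_id = ? ORDER BY name",
        (fragment_id,),
    )
    return dict(cur.fetchall())

print(metadata(conn, 1))
```

Every value can be read with a plain SELECT, at the cost of storing 
numbers as text and needing one row per property.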

The whole name of the game being: how can you keep the metadata clear 
enough and simple enough that it can live into the future and be easily 
extracted and transformed? The known (and well documented) problem is 
that schemas which are too opaque are basically cast in stone. Any 
second-generation agents (programmer, application, export tools etc.) 
are locked out, requiring a complete refactoring of the entire 
framework later (very expensive), such that many companies simply 
a) cannot upgrade and b) suffer the consequences. I'm sure this issue is 
also present in a lot of business frameworks.

I searched the web for any models, and will continue to do so... as one 
would expect to see a lot of information from the academy where 
citations for text fragments are a "mission critical" component for any 
published document (PHP, scientific research, book reviews, teaching 
texts etc)

But I want to put this out on this list: if anyone has experience with 
dbase schema for metadata for text fragments, otherwise called 
"bibliographic citation", please email me off list. If you have any 
advice, pointers, URLs, models or resources, I would be deeply 
grateful.

Or if you feel this is a subject of general interest then shout 
"Please keep this thread on the list!" and we will.

TIA!

Sivakatirswami









