Importing data into RevDB

Richard Gaskin ambassador at fourthworld.com
Fri Dec 4 18:30:14 EST 2009


David Coker wrote:

 > Thanks for the suggestions and example Jim!
 >
 > Of course I'm using sample data to test with, but the real files
 > that need to be worked with are pretty massive. To give you an idea
 > of just how massive...
 >
 > I have a subset of data that I've been working with for over a week
 > now, massaging text, filling in blank fields (concatenation in Excel)
 > and thought I had everything taken care of as of late last night. I
 > found out this morning that they needed one more column of data merged
 > into my "finished" file.

If you have it in Excel, can you export it as tab-delimited text?

Tabs rarely occur in field data, and if the field values contain no 
tabs or returns, you can use normal chunk expressions and get 
orders-of-magnitude better performance.
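
In Rev that means setting the itemDelimiter to tab and walking the lines 
and items. The same idea is sketched below in Python for illustration. 
The function name parse_tab_delimited is hypothetical, and the sketch 
assumes, as above, that no field value contains a tab or a line break:

```python
def parse_tab_delimited(text):
    """Split tab-delimited text into a list of rows, each a list of fields.

    Assumes no field value contains a tab or a line break, so no quoting
    or escaping rules are needed.
    """
    # splitlines() accepts \n, \r\n and \r line endings alike.
    return [line.split("\t") for line in text.splitlines()]


sample = 'Name\tCity\tNotes\nSmith, John\tBoston\tsays "hi"\n'
for row in parse_tab_delimited(sample):
    print(row)
# ['Name', 'City', 'Notes']
# ['Smith, John', 'Boston', 'says "hi"']
```

Commas and quote marks inside values need no special handling at all, 
which is the whole point of using tabs.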


 >>Just so you know CSV is the second worst format ever invented.
 >>They are still searching for worst one, but have not found it yet.
 >
 > <BIG GRIN> No question about it! </BIG GRIN>

They found it:  it was another form of CSV. ;)

That's one of the many problems with CSV: it isn't a single defined 
format, but rather a collection of ad hoc variants.  I've seen 
differences in escaping and quoting even among products from Microsoft 
alone, and between different versions of the same Microsoft product, 
not to mention the even greater number of variants used by other programs.

Some put quotes around every field value; others quote only text values 
but not numbers; others quote only multi-word values, leaving single-word 
text bare. Some escape quotes within values with a preceding backslash; 
others escape them by doubling them (a total Whiskey Tango Foxtrot 
"solution"). Some also escape returns within values, while many leave 
returns unescaped, forcing you to figure it out character by character. 
And others do even weirder things....
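
To see how much the variant matters, Python's standard csv module 
exposes two of these choices as dialect options, doublequote and 
escapechar. This is only an illustration in Python, not anything Rev 
provides; the sample records and the read_one helper are made up:

```python
import csv
import io

# Hypothetical sample: the same record as written by two different programs.
doubled = 'Smith,"He said ""hi""",42\r\n'        # embedded quotes doubled
backslashed = 'Smith,"He said \\"hi\\"",42\r\n'  # embedded quotes backslash-escaped


def read_one(text, **dialect_options):
    """Parse the first record of text with the given csv dialect options."""
    return next(csv.reader(io.StringIO(text, newline=""), **dialect_options))


# Default dialect: a doubled quote ("") means one literal quote.
print(read_one(doubled))  # ['Smith', 'He said "hi"', '42']

# Backslash variant: the reader must be told quotes are escaped, not doubled.
print(read_one(backslashed, doublequote=False, escapechar="\\"))  # ['Smith', 'He said "hi"', '42']

# Guessing the variant wrong: the backslashes end up inside the value.
print(read_one(backslashed))
```

Nothing in the file itself says which variant produced it, so the reader 
has to be configured by guesswork or by knowing the exporting program.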

I've had to write CSV parsers, using flags as Jim outlined.  After 
seeing the loss of productivity and performance from those formats, I 
now have a policy of never delivering any product with a CSV export 
option, on moral grounds. :)
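
Jim's outline isn't quoted here, so what follows is only a generic sketch 
of the flag approach in Python, not his code. It handles just one 
variant (optional quotes, embedded quotes doubled, returns allowed 
inside quoted values), and the name parse_csv is hypothetical:

```python
def parse_csv(text, delimiter=","):
    """Parse CSV text one character at a time using an in-quotes flag.

    Handles one variant only: fields may be wrapped in double quotes,
    a quote inside a quoted field is written as two quotes (""), and
    quoted fields may contain delimiters and line breaks.
    """
    rows, row, field = [], [], []
    in_quotes = False  # the flag: are we inside a quoted value?
    pending = False    # has the current record consumed any characters?
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if in_quotes:
            if c == '"' and i + 1 < n and text[i + 1] == '"':
                field.append('"')  # doubled quote -> one literal quote
                i += 1
            elif c == '"':
                in_quotes = False  # closing quote
            else:
                field.append(c)    # delimiters and returns are data here
            pending = True
        elif c in "\r\n":
            if c == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1             # treat \r\n as a single line break
            row.append("".join(field))
            rows.append(row)
            row, field, pending = [], [], False
        else:
            if c == '"':
                in_quotes = True   # opening quote
            elif c == delimiter:
                row.append("".join(field))
                field = []
            else:
                field.append(c)
            pending = True
        i += 1
    if pending:  # last record had no trailing line break
        row.append("".join(field))
        rows.append(row)
    return rows


sample = 'name,quote\r\n"Smith, John","He said ""hi""\nthen left"\r\n'
print(parse_csv(sample))
# [['name', 'quote'], ['Smith, John', 'He said "hi"\nthen left']]
```

Every other variant in the list above needs its own tweak to this loop, 
which is where the lost hours go.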

Any format that uses as its delimiter a character as common in field 
values as the comma is, to be as polite as possible, a stupid invention, 
almost an anti-invention.

In a just world, whoever first deployed a system that used CSV would be 
found and put in stocks in the public square with a sign reading:

  "I'm responsible for the loss of several million hours
   of other people's time."

--
  Richard Gaskin
  Fourth World
  Rev training and consulting: http://www.fourthworld.com
  Webzine for Rev developers: http://www.revjournal.com
  revJournal blog: http://revjournal.com/blog.irv


