Importing data into RevDB
ambassador at fourthworld.com
Fri Dec 4 18:30:14 EST 2009
David Coker wrote:
> Thanks for the suggestions and example Jim!
> Of course I'm using sample data to test with, but the real files
> that need to be worked with are pretty massive. To give you an idea
> of just how massive...
> I have a subset of data that I've been working with for over a week
> now, massaging text, filling in blank fields (concatenation in Excel)
> and thought I had everything taken care of as of late last night. I
> found out this morning that they needed one more column of data merged
> into my "finished" file.
If you have it in Excel, can you export it as tab-delimited text?
Tabs rarely occur in field data, and if the field values don't contain
tabs or returns you'd be able to use normal chunk expressions for
orders-of-magnitude better performance.
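To illustrate why tab-delimited data is so much easier to handle, here
is a minimal sketch in Python rather than Rev. The function name
parse_tab_delimited and the sample data are made up for illustration,
and the code assumes that no field contains a tab or a return, which is
exactly the condition described above:

```python
def parse_tab_delimited(text):
    """Split tab-delimited text into rows of fields.

    Assumes no field value contains a tab or a line break, so plain
    splitting is enough and no character-by-character scan is needed.
    """
    rows = []
    for line in text.splitlines():
        if line:  # skip empty lines, e.g. a trailing blank line
            rows.append(line.split("\t"))
    return rows


# Hypothetical sample data, for illustration only.
sample = "id\tname\tcity\n1\tSmith, John\tAustin\n2\tDoe, Jane\tBoston\n"
rows = parse_tab_delimited(sample)
print(rows[1][1])  # the comma inside the value needs no special handling
```

The rough equivalent in Rev is to set the itemDelimiter to tab and then
address the data with ordinary line and item chunk expressions.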
>>Just so you know CSV is the second worst format ever invented.
>>They are still searching for the worst one, but have not found it yet.
> <BIG GRIN> No question about it! </BIG GRIN>
They found it: it was another form of CSV. ;)
That's one of the many problems with CSV: it isn't a single defined
format, but rather a collection of ad hoc variants. I've seen
differences in escaping and quoting even among products from Microsoft
alone, and between different versions of the same Microsoft product, not
to mention the even greater number of variants used by other programs.
Some put quotes around every field value. Others quote only textual
values but not numbers, or only multi-word values but not single words.
Some escape quotes inside values with a preceding slash, while others
escape them by doubling the quotes (a total Whiskey Tango Foxtrot
"solution"). Some also escape returns in values, while many leave
returns unescaped, requiring you to figure it out character-by-character.
And others do even weirder things....
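As a small illustration of how two of these variants differ, the sketch
below uses Python's standard csv module on the same record written two
ways. The two sample strings are hypothetical, and the "slash" variant
is assumed here to mean a backslash; the helper name read_first_row is
made up for the example:

```python
import csv
import io

# The same logical record as written by two hypothetical exporters.
doubled = 'a,"say ""hi""",b\n'    # embedded quotes escaped by doubling
escaped = 'a,"say \\"hi\\"",b\n'  # embedded quotes escaped with a backslash


def read_first_row(text, **options):
    """Parse CSV text with the given reader options; return the first row."""
    return next(csv.reader(io.StringIO(text), **options))


row_doubled = read_first_row(doubled)  # the csv module's default settings
row_escaped = read_first_row(escaped, doublequote=False, escapechar="\\")

# Both calls recover the same fields, but only because each reader was
# configured for the variant that actually produced its input.
print(row_doubled == row_escaped)
```

Nothing in the file itself says which variant was used, so the reader
settings have to be chosen by inspecting the data or knowing the
exporting program.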
I've had to write CSV parsers, using flags as Jim outlined. After
seeing the loss of productivity and performance from those formats, I
now have a policy of never delivering any product with a CSV export
option, on moral grounds. :)
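For readers who have not written one, a flag-based parser looks roughly
like the sketch below. It is written in Python, it is not Jim's code,
and it handles only one assumed variant: fields optionally wrapped in
double quotes, with embedded quotes written as two double quotes. The
function name parse_csv and the sample data are hypothetical:

```python
def parse_csv(text, delimiter=","):
    """Parse one CSV variant: optionally double-quoted fields, with
    embedded quotes written as two double quotes."""
    rows, row, field = [], [], []
    in_quotes = False  # the flag: are we inside a quoted value?
    i = 0
    while i < len(text):
        ch = text[i]
        if in_quotes:
            if ch == '"':
                if text[i + 1:i + 2] == '"':
                    field.append('"')  # doubled quote means a literal quote
                    i += 1             # skip the second quote of the pair
                else:
                    in_quotes = False  # closing quote
            else:
                field.append(ch)  # commas and returns are plain data here
        elif ch == '"':
            in_quotes = True
        elif ch == delimiter:
            row.append("".join(field))
            field = []
        elif ch == "\n":
            row.append("".join(field))
            rows.append(row)
            row, field = [], []
        elif ch != "\r":  # drop unquoted CRs so CRLF line endings work
            field.append(ch)
        i += 1
    if field or row:  # final record without a trailing newline
        row.append("".join(field))
        rows.append(row)
    return rows


# Hypothetical sample with a comma, doubled quotes and a return in values.
sample = 'name,note\n"Smith, John","said ""hi""\nand left"\n'
print(parse_csv(sample))
```

Every character has to pass through the loop because a comma or return
only counts as a delimiter when the flag says we are outside quotes,
which is why this is so much slower than simple splitting.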
Any format whose delimiter is a character as common in field values as
the comma is, to be as polite as possible, a stupid invention, almost an
anti-invention.
In a just world, whoever first deployed a system that used CSV would be
found and put in stocks in the public square with a sign reading:
"I'm responsible for the loss of several million hours
of other people's time."
Rev training and consulting: http://www.fourthworld.com
Webzine for Rev developers: http://www.revjournal.com
revJournal blog: http://revjournal.com/blog.irv