Importing data into RevDB
Richard Gaskin
ambassador at fourthworld.com
Fri Dec 4 19:51:55 EST 2009
David Coker wrote:
> Hi Richard!
>>If you have it in Excel, can you export it using tab-delimited?
>
> Yes sir, been there and tried that several different ways.
> Unfortunately, that creates issues of a different sort. As a really
> poor example, this is one of the things I continue to run across
> after converting to tabbed format..
>
> Original example:
> blah blah,"doodah somestupidgarbagecharacterorlinefeed doodah
> doodah",12345abc
>
> Converted to tab delimited:
> blah blah(tab)"doodah somestupidgarbagecharacterorlinefeed doodah
> doodah"(tab)12345abc
>
> When saved, tab delimited format all too often renders something like
> this:
>
> blah blah,"doodah somestupidgarbagecharacter
> doodah
> doodah"(tab)12345abc
>
> Gone is any hope of reusing the file data.
I wonder if it might be worth doing a replace on
somestupidgarbagecharacterorlinefeed to something like
"_mydumbplaceholderthang_" (or any arbitrary string unlikely to appear
in the data), then do your parsing and as the last step replace your
placeholder with the linefeed char again.
For parsing WebMerge templates I use placeholders a lot as a convenient
way to get odd characters and strings out of the way so I can do the
heavy work, putting them back when needed.
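As a rough illustration of that placeholder idea, here is a minimal Python sketch (not Rev, and not WebMerge's actual code). It assumes values are wrapped in plain double quotes, that line endings are "\n", and that the placeholder string never occurs in the data. The function names are made up for the example.

```python
PLACEHOLDER = "_mydumbplaceholderthang_"  # any string unlikely to appear in the data


def protect_linefeeds(text, quote='"'):
    """Replace linefeeds that fall inside quoted values with PLACEHOLDER."""
    out = []
    in_quotes = False
    for ch in text:
        if ch == quote:
            # A doubled quote ("") toggles twice, so the state survives it.
            in_quotes = not in_quotes
        if ch == "\n" and in_quotes:
            out.append(PLACEHOLDER)
        else:
            out.append(ch)
    return "".join(out)


def parse_tab_delimited(text):
    """Parse tab-delimited text whose quoted values may contain linefeeds."""
    rows = []
    for line in protect_linefeeds(text).split("\n"):
        if not line:
            continue  # skip blank lines, including the one after a trailing linefeed
        # The heavy work sees one record per line; linefeeds go back as the last step.
        rows.append([item.replace(PLACEHOLDER, "\n") for item in line.split("\t")])
    return rows
```

The surrounding quotes are left on the values here; stripping them would be one more replace after the split.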
> That was actually the whole point of my original question about
> importing csv into a database. At that point I could likely pull
> the data out field by field as required and run scanners to clean
> it up enough to be used in a tab delimited format. At that point,
> it would be easy to work with using Rev in any number of ways.
If the final destination is one of the more common DBMSes out there like
SQLite or MySQL, there's got to be a CSV import filter available for
them, no?
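For what it's worth, here is one way such an import could look, sketched in Python with its standard csv and sqlite3 modules rather than in Rev. It assumes the first CSV row holds the column names and that every row has the same number of items; the function name import_csv is made up for the example.

```python
import csv
import io
import sqlite3


def import_csv(conn, table, csv_text):
    """Load CSV text whose first row holds column names into a new SQLite table."""
    # newline="" leaves linefeeds inside quoted values for the csv module to handle.
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    header = next(reader)
    # Quote identifiers so odd column names do not break the SQL.
    columns = ", ".join('"%s"' % name.replace('"', '""') for name in header)
    marks = ", ".join("?" for _ in header)
    quoted_table = '"%s"' % table.replace('"', '""')
    conn.execute("CREATE TABLE %s (%s)" % (quoted_table, columns))
    # The remaining rows are inserted as parameters, one row per record.
    conn.executemany("INSERT INTO %s VALUES (%s)" % (quoted_table, marks), reader)
    conn.commit()
```

Once the data is in the table, each value can be pulled out column by column and cleaned up, embedded linefeeds and all.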
> BTW, I used your Webmerge program last night for the first time in
> a pretty long while... As part of a test run I was doing, it created
> 81,000+ html pages in just over 90 minutes. :-)
If it took that long, I can tell you most of the processing time was
spent parsing the CSV.
Internally, WM uses the same format as FileMaker's Merge format,
tab-delimited without added quotes, escaping tabs in values with ASCII 4
and returns in values with ASCII 11. All supported formats (CSV, pipes,
Merge, etc.) get translated to that internal format so the actual
template processing can be standardized and fairly well optimized.
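To make the escaping concrete, here is a small Python sketch of a Merge-style encoding as described above: tab-delimited, no added quotes, tabs in values stored as ASCII 4 and returns as ASCII 11. This is only an illustration of the scheme, not WebMerge's or FileMaker's code, and the function names are made up. It assumes the values themselves never contain ASCII 4 or ASCII 11.

```python
TAB_ESCAPE = chr(4)      # stands in for a tab inside a value
RETURN_ESCAPE = chr(11)  # stands in for a return inside a value


def encode_row(values):
    """Turn a list of strings into one tab-delimited line with no quotes."""
    cleaned = []
    for value in values:
        # Normalize the different return styles before escaping them.
        value = value.replace("\r\n", "\n").replace("\r", "\n")
        cleaned.append(value.replace("\t", TAB_ESCAPE).replace("\n", RETURN_ESCAPE))
    return "\t".join(cleaned)


def decode_row(line):
    """Split one encoded line back into its original values."""
    return [
        item.replace(TAB_ESCAPE, "\t").replace(RETURN_ESCAPE, "\n")
        for item in line.split("\t")
    ]
```

Because every record is guaranteed to sit on one line and every tab is a real delimiter, the parser never has to track quote state, which is where most of the CSV time goes.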
If your data was in such a format to begin with, or even in one of the
most common pipe- or tab-delimited schemes that don't add quotes and use
a standard escape sequence for returns, WebMerge would complete those
81,000 pages in just a few minutes.
Here's an example from our Gallery page:
"On my first use of the full program yesterday, WebMerge generated
4.5MB of clean, error-free HTML in less than 9 seconds."
I have one customer who cranks out more than 300,000 pages at a time in
well under an hour, and his templates are fairly complex.
With most templates the processing time after parsing the data file is a
fraction of a second per page.
For example, the tutorial set included with the demo generally finishes
its 20 pages in well under a second. In fact, when I first made WM I
set up the results dialog to show time spent in minutes, and that was
too long so I added seconds, but even that was too long so I had to go
back and revise it to be able to show elapsed time in milliseconds. :)
And this is the slow version. I originally set up WebMerge to use a
template syntax that mimics FileMaker's, but our customer base now
includes very few people for whom familiarity with FMP matters. So in a
future version we'll be able to use an alternate template syntax that
lets us move most of the processing directly into the Rev engine with
the merge function, similar to how on-rev works.
Compared with the careful parsing of the FMP-style tags we do now, this
change will drop per-page processing times to a few milliseconds on
average, and for simpler templates even less.
SuperCard's merge function was one of the best things ever added to the
Rev engine. For all its convenient power, until on-rev expanded and
popularized it, it was one of the most under-utilized powerhouses in the
language.
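For anyone who hasn't used it, the general idea of merge-style templating can be sketched in a few lines of Python. This is only a rough analogue, not the Rev engine's merge function: it handles nothing but simple [[name]] placeholders looked up in a record, and the names merge_fields and FIELD are made up for the example.

```python
import re

# Matches [[name]] with optional spaces around a simple field name.
FIELD = re.compile(r"\[\[\s*(\w+)\s*\]\]")


def merge_fields(template, record):
    """Replace each [[name]] in the template with the matching record value."""
    # A name missing from the record raises KeyError rather than passing silently.
    return FIELD.sub(lambda match: str(record[match.group(1)]), template)
```

Something like merge_fields("<h1>[[title]]</h1>", {"title": "Widget"}) fills in the page in a single pass over the template, which is why this style of substitution can be so quick.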
--
Richard Gaskin
Fourth World
Rev training and consulting: http://www.fourthworld.com
Webzine for Rev developers: http://www.revjournal.com
revJournal blog: http://revjournal.com/blog.irv