Importing data into RevDB

Richard Gaskin ambassador at fourthworld.com
Fri Dec 4 19:51:55 EST 2009


David Coker wrote:

 > Hi Richard!
 >>If you have it in Excel, can you export it using tab-delimited?
 >
 > Yes sir, been there and tried that several different ways.
 > Unfortunately, that creates issues of a different sort. As a really
 > poor example, this is one of the things I continue to run across
 > after converting to tabbed format..
 >
 > Original example:
 > blah blah,"doodah somestupidgarbagecharacterorlinefeed doodah
 > doodah",12345abc
 >
 > Converted to tab delimited:
 > blah blah(tab)"doodah somestupidgarbagecharacterorlinefeed doodah
 > doodah"(tab)12345abc
 >
 > When saved, tab delimited format all too often renders something like
 > this:
 >
 > blah blah,"doodah somestupidgarbagecharacter
 > doodah
 > doodah"(tab)12345abc
 >
 > Gone is any hope of reusing the file data.

I wonder if it might be worth first replacing 
somestupidgarbagecharacterorlinefeed with something like 
"_mydumbplaceholderthang_" (or any arbitrary string unlikely to appear 
in the data), then doing your parsing, and as the last step replacing 
the placeholder with the linefeed char again.
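
The placeholder idea can be sketched roughly as follows. This is Python 
rather than Rev, and it is only an illustration: the function names are 
made up for the example, and it assumes the stray line breaks always 
fall inside double-quoted values and that lines end with a plain 
linefeed.

```python
PLACEHOLDER = "_mydumbplaceholderthang_"  # any string unlikely to appear in the data

def protect_quoted_newlines(text, placeholder=PLACEHOLDER):
    """Swap line breaks that fall inside double-quoted values for a placeholder."""
    out = []
    in_quotes = False
    for ch in text:
        if ch == '"':
            # Each quote character toggles whether we are inside a quoted value.
            in_quotes = not in_quotes
            out.append(ch)
        elif ch == "\n" and in_quotes:
            out.append(placeholder)
        else:
            out.append(ch)
    return "".join(out)

def parse_tab_delimited(text, placeholder=PLACEHOLDER):
    """Split into rows and fields, restoring the line breaks as the last step."""
    rows = []
    for line in protect_quoted_newlines(text, placeholder).split("\n"):
        if line == "":
            continue  # skip blank lines, such as the one after a trailing linefeed
        fields = [f.replace(placeholder, "\n") for f in line.split("\t")]
        rows.append(fields)
    return rows

# One record whose quoted value contains two stray line breaks, then a clean record.
sample = 'blah blah\t"doodah x\ndoodah\ndoodah"\t12345abc\nnext\t"plain"\t678\n'
rows = parse_tab_delimited(sample)
```

The quotes are left on the values here; stripping them would be one more 
pass once the rows have been split safely.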

For parsing WebMerge templates I use placeholders a lot as a convenient 
way to get odd characters and strings out of the way so I can do the 
heavy work, putting them back when needed.


 > That was actually the whole point of my original question about
 > importing csv into a database. At that point I could likely pull
 > the data out field by field as required and run scanners to clean
 > it up enough to be used in a tab delimited format. At that point,
 > it would be easy to work with using Rev in any number of ways.

If the final destination is one of the more common DBMSes out there like 
SQLite or MySQL, there's got to be a CSV import filter available for 
them, no?
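
For what it's worth, the sqlite3 command-line shell has a .import 
command (used with .mode csv) and MySQL has LOAD DATA INFILE. As a 
minimal sketch of the same idea from a script, here is Python's standard 
csv and sqlite3 modules loading CSV text with a line break inside a 
quoted value. The table name "items", the column names, and the sample 
data are all made up for the example, and it assumes the first CSV row 
holds the column names and every row has the same number of fields.

```python
import csv
import io
import sqlite3

def import_csv(conn, table, csv_text):
    """Create a table from the CSV header row and insert the remaining rows."""
    # csv.reader copes with line breaks inside quoted values on its own.
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    header = next(reader)
    cols = ", ".join('"%s"' % name.replace('"', '""') for name in header)
    marks = ", ".join("?" for _ in header)
    conn.execute('CREATE TABLE "%s" (%s)' % (table, cols))
    conn.executemany('INSERT INTO "%s" VALUES (%s)' % (table, marks), reader)
    conn.commit()

conn = sqlite3.connect(":memory:")  # throwaway in-memory database for the demo
sample = 'name,notes,code\nblah blah,"doodah\ndoodah\ndoodah",12345abc\n'
import_csv(conn, "items", sample)
rows = conn.execute('SELECT name, notes, code FROM "items"').fetchall()
```

Once the data is in a table, each column can be pulled out and cleaned 
up on its own, which is the field-by-field approach described above.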


 > BTW, I used your Webmerge program last night for the first time in
 > a pretty long while... As part of a test run I was doing, it created
 > 81,000+ html pages in just over 90 minutes. :-)

If it took that long I can tell you most of the processing time was 
spent parsing the CSV.

Internally, WM uses the same scheme as FileMaker's Merge format: 
tab-delimited without added quotes, with tabs in values escaped as 
ASCII 4 and returns in values escaped as ASCII 11.  All supported 
formats (CSV, pipes, Merge, etc.) get translated to that internal format 
so the actual template processing can be standardized and fairly well 
optimized.
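
To make that internal format concrete, here is a rough sketch of a 
CSV-to-internal-format translation in Python. It is not WebMerge's 
actual code; the function names are invented for the example, and the 
only details taken from the description above are the tab delimiter, 
ASCII 4 for tabs in values, and ASCII 11 for returns in values.

```python
import csv
import io

TAB_ESCAPE = "\x04"     # ASCII 4 stands in for a tab inside a value
RETURN_ESCAPE = "\x0b"  # ASCII 11 stands in for a return inside a value

def csv_to_merge(csv_text):
    """Translate CSV text to tab-delimited lines with escaped tabs and returns."""
    lines = []
    for row in csv.reader(io.StringIO(csv_text, newline="")):
        fields = []
        for value in row:
            # Normalize line endings first so every return gets the same escape.
            value = value.replace("\r\n", "\n").replace("\r", "\n")
            value = value.replace("\t", TAB_ESCAPE).replace("\n", RETURN_ESCAPE)
            fields.append(value)
        lines.append("\t".join(fields))
    return "\n".join(lines)

def merge_to_rows(merge_text):
    """Split the internal format back into rows, undoing the escapes."""
    rows = []
    for line in merge_text.split("\n"):
        rows.append([f.replace(TAB_ESCAPE, "\t").replace(RETURN_ESCAPE, "\n")
                     for f in line.split("\t")])
    return rows

sample = 'blah blah,"doodah\ndoodah\tdoodah",12345abc\nsecond,"plain",678\n'
merged = csv_to_merge(sample)
rows = merge_to_rows(merged)
```

After the translation every record sits on exactly one line and every 
tab is a field boundary, so the later processing needs nothing more than 
simple line and tab splits.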

If your data were in such a format to begin with, or even in one of the 
most common pipe- or tab-delimited schemes which don't add quotes and 
use a standard escape sequence for returns, WM would complete those 
81,000 pages in just a few minutes.

Here's an example from our Gallery page:

    "On my first use of the full program yesterday, WebMerge generated
     4.5MB of clean, error free HTML in less than 9 seconds."

I have one customer who cranks out more than 300,000 pages at a time in 
well under an hour, and his templates are fairly complex.

With most templates the processing time after parsing the data file is a 
fraction of a second per page.

For example, the tutorial set included with the demo generally finishes 
its 20 pages in well under a second.  In fact, when I first made WM I 
set up the results dialog to show time spent in minutes, and that was 
too long so I added seconds, but even that was too long so I had to go 
back and revise it to be able to show elapsed time in milliseconds. :)

And this is the slow version.  I originally set up WebMerge to use a 
template syntax that mimics FileMaker's, but over time our customer base 
has come to include very few people for whom familiarity with FMP 
matters.  So in a future version we'll be able to use an alternate 
template syntax that lets us move most of the processing directly into 
the Rev engine with the merge function, similar to how on-rev works.

Compared with the careful parsing of the FMP-style tags we do now, this 
change will drop per-page processing times to a few milliseconds on 
average, and for simpler templates even less.
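
For anyone who hasn't used a merge-style template, the general idea 
looks something like this rough Python analogue. It is not Rev code and 
not WebMerge's syntax: as I understand it, Rev's merge function 
evaluates expressions placed between [[ and ]], while this sketch only 
looks up field names in a record, and the template and field names are 
made up for the example.

```python
import re

# Matches [[name]], allowing optional spaces inside the brackets.
TAG = re.compile(r"\[\[\s*(\w+)\s*\]\]")

def merge_template(template, record):
    """Replace each [[name]] tag with the matching value from the record."""
    def fill(match):
        return str(record[match.group(1)])
    return TAG.sub(fill, template)

template = "<h1>[[title]]</h1><p>[[ body ]]</p>"
page = merge_template(template, {"title": "Doodah", "body": "12345abc"})
```

The work per page is a single pass over the template by the regular 
expression engine, which is the kind of saving you get from handing the 
substitution to the engine instead of parsing tags by hand in script.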

SuperCard's merge function was one of the best things ever added to the 
Rev engine.  For all its convenient power, until on-rev expanded and 
popularized it, it was one of the most under-utilized powerhouses in 
the language.

--
  Richard Gaskin
  Fourth World
  Rev training and consulting: http://www.fourthworld.com
  Webzine for Rev developers: http://www.revjournal.com
  revJournal blog: http://revjournal.com/blog.irv


