Database or large text file processing??
alex at tweedly.net
Mon Oct 10 06:42:33 CDT 2005
Ian Leigh wrote:
> I wish to retrieve details about a particular file which are held in
> a text file, fairly large at about 264000 lines. The file contains
> details about many files but I only need to retrieve one file at a
> time. I am wondering about the best way to deal with this. The text
> file is set out in such a way that the filename is found first (with
> a specific character) and the information about follows. It is always
> in the same format but obviously some have more text than others (is
> this making sense) so there aren't specific field lengths etc..
So I take it your file looks something like
+ <filename1>
info about filename1
which can be guaranteed to not contain any plus signs
because that's the "specific character" you mention
+ <filename2>
with not much info
+ <filename3> etc.
And each time the program is run, you get a filename (from the user) and
only want the info on that one file.
Is that close enough to an accurate description?
> I could try and bring that file in and use arrays I suppose but I
> don't know what effect that would have on performance and filesize.
You might need, or want, arrays in other languages, but don't need them
in Rev for this kind of thing.
> I don't know anything about using databases with rev but I wonder if
> using them would be a more elegant solution.
Don't see anything there that needs a database.
> I don't want to have to import the text file every time the program
> is run and the text file itself is subject to occasional updating.
> This led me to think that just search through the text file itself
> might make it a bit more robust but I could do with some advice about
> which way to go.
You'll want to read the file each time (to deal with the updating
issue). But it will be really, really quick.
I don't think 240K lines (say < 10 MB) should be a problem to read into a
variable in Rev, so I'd recommend (at least as a first try) reading the
whole file and searching it with lineOffset.
Here's a little script I tried out
    on mouseUp
       local tInfo, tFile, tCatalog
       answer file "Specify a catalog file"
       if it is empty then exit mouseUp -- user cancelled the dialog
       put it into tCatalog
       put fld "Input" into tFile
       put getFileInfo(tCatalog, tFile) into tInfo
       put tInfo into fld "Field"
    end mouseUp

    function getFileInfo pCatalog, pFile
       local tAllData, tStart, tEnd
       -- read the whole catalog into a variable
       put URL ("file:" & pCatalog) into tAllData
       -- find the header line for the requested file
       put lineOffset("+ " & pFile, tAllData) into tStart
       if tStart = 0 then
          -- file not found
          return "file " & pFile & " not found"
       end if
       -- find the next header; the result is counted from line tStart
       put lineOffset("+", tAllData, tStart) into tEnd
       if tEnd = 0 then
          -- no later header, so the entry runs to the last line
          put -1 into tEnd
       else
          -- convert the relative offset to the last line of this entry
          put tStart + tEnd - 1 into tEnd
       end if
       return line tStart to tEnd of tAllData
    end getFileInfo
and it retrieves the info in < 1 second on a 242439 line file (finding a
file about 100 lines from the end).
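One thing to watch: lineOffset matches on a substring, so asking for
"report" would also find a header such as "+ report_old.txt" if that one
comes first in the file. If each header line is exactly "+ " followed by
the filename and nothing else (that's an assumption about your file, I
haven't seen it), a variant along these lines might be safer. It's only a
sketch: getFileInfoExact is a name I made up, and it relies on the
wholeMatches property to make the first lineOffset match entire lines only.

```livecode
function getFileInfoExact pCatalog, pFile
   local tAllData, tStart, tEnd
   put URL ("file:" & pCatalog) into tAllData
   -- assumption: a header line is exactly "+ " & pFile, nothing more
   set the wholeMatches to true
   put lineOffset("+ " & pFile, tAllData) into tStart
   -- back to substring matching for the search for the next "+" line
   set the wholeMatches to false
   if tStart = 0 then
      return "file " & pFile & " not found"
   end if
   -- the result is counted from line tStart, as in getFileInfo
   put lineOffset("+", tAllData, tStart) into tEnd
   if tEnd = 0 then
      put -1 into tEnd
   else
      put tStart + tEnd - 1 into tEnd
   end if
   return line tStart to tEnd of tAllData
end getFileInfoExact
```

You'd call it just like the other one, e.g.
put getFileInfoExact(tCatalog, tFile) into tInfo. If your header lines
carry anything after the filename, this won't match and you'd want to stay
with the substring version.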
Hope that gives you some ideas ...
Alex Tweedly http://www.tweedly.net