Database or large text file processing??

Alex Tweedly alex at tweedly.net
Mon Oct 10 07:42:33 EDT 2005


Ian Leigh wrote:

>
> I wish to retrieve details about a particular file which are held in  
> a text file, fairly large at about 264000 lines. The file contains  
> details about many files but I only need to retrieve one file at a  
> time. I am wondering about the best way to deal with this. The text  
> file is set out in such a way that the filename is found first (with  
> a specific character) and the information about follows. It is always  
> in the same format but obviously some have more text than others (is  
> this making sense) so there aren't specific field lengths etc..
>
So I take it your file looks something like
+ <filename1>
info about filename1
which can be guaranteed to not contain any plus signs
because that's the "specific character" you mention
+ <filename2>
with not much info
+ <filename3> etc.

And each time the program is run, you get a filename (from the user) and 
only want the info on that one file.
Is that close enough to an accurate description ?

> I could try and bring that file in and use arrays I suppose but I  
> don't know what effect that would have on performance and filesize. 

You might need, or want, arrays in other languages, but don't need them 
in Rev for this kind of thing.

> I  don't know anything about using databases with rev but I wonder if  
> using them would be a more elegant solution. 

Don't see anything there that needs a database.

> I don't want to have to  import the text file every time the program 
> is run and the text file  itself is subject to occasional updating. 
> This led me to think that  just search through the text file itself 
> might make it a bit more  robust but I could do with some advice about 
> which way to go.
>
You'll want to read the file each time (to deal with the updating 
issue). But it will be really, really quick.

I don't think 240K lines (say < 10 MB) should be a problem to read into a 
variable in Rev, so I'd recommend (at least as a first try) reading the 
whole file into a variable and using lineOffset to find the entry.

Here's a little script I tried out:

> on mouseUp
>   local tInfo, tFile, tCatalog
>   answer file "Specify a catalog file"
>   if it is empty then exit mouseUp  -- user cancelled the dialog
>   put it into tCatalog
>   put fld "Input" into tFile
>   put getFileInfo(tCatalog, tFile) into tInfo
>   put tInfo into fld "Field"
> end mouseUp
>
> function getFileInfo pCatalog, pFile
>   local tAllData, tStart, tEnd
>   -- read the whole catalog into a variable
>   put URL ("file:" & pCatalog) into tAllData
>  
>   -- find the header line for this file
>   put lineOffset("+ " & pFile, tAllData) into tStart
>   if tStart = 0 then
>     -- file not found
>     return "file " & pFile & " not found"
>   end if
>   -- find the next header, skipping the first tStart lines;
>   -- the result is relative to the skipped lines
>   put lineOffset("+", tAllData, tStart) into tEnd
>   if tEnd = 0 then
>     -- no later header: the entry runs to the end of the file
>     put -1 into tEnd
>   else
>     put tStart + tEnd - 1 into tEnd
>   end if
>   return line tStart to tEnd of tAllData
> end getFileInfo

It retrieves the info in < 1 second on a 242439 line file (finding an 
entry about 100 lines from the end).
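
One caveat with the script above: lineOffset matches a substring anywhere 
in a line, so asking for "report" would also find "+ report_old". If that 
matters for your file names, a line-by-line loop that compares the whole 
header line is one way round it. This is only a sketch, not something I 
have run against your data: getFileInfoExact is a name made up for 
illustration, and it assumes every header line is exactly "+ " followed by 
the file name, with no trailing spaces.

```livecode
function getFileInfoExact pCatalog, pFile
  local tAllData, tLine, tInfo, tFound
  put URL ("file:" & pCatalog) into tAllData
  put false into tFound
  repeat for each line tLine in tAllData
    if tLine begins with "+" then
      -- a header line: either the next entry, or possibly ours
      if tFound then exit repeat
      if tLine is ("+ " & pFile) then put true into tFound
    end if
    -- once found, collect the header and the info lines after it
    if tFound then put tLine & return after tInfo
  end repeat
  if not tFound then return "file " & pFile & " not found"
  delete the last char of tInfo  -- drop the trailing return
  return tInfo
end getFileInfoExact
```

The comparison is case-insensitive unless you set the caseSensitive to 
true first. It also stops at the first matching header, so it should not 
need to walk the whole file unless the entry is near the end.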

Hope that gives you some ideas ...

-- 
Alex Tweedly       http://www.tweedly.net


