Problem parsing data in Gigabyte size text files
Dave
dave at looktowindward.com
Fri Jul 6 05:41:44 EDT 2007
Hi,
That sounds like a better approach to me too. However, if the problem
is that the file is larger than 2 GB (or whatever the limit is on
Windows), then it still won't work.
All the Best
Dave
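
For illustration, the chunk-by-chunk database load that Andre suggests in the quoted message could look roughly like the sketch below. It is Python rather than Revolution, and the element names (customer, name, birth_date), the ISO date format, and the function names are all assumptions made for the example; nothing in the thread describes the real XML layout. It also does not address the file-size limit itself.

```python
import sqlite3
import xml.etree.ElementTree as ET


def load_records(xml_source, db_path, tag="customer", batch_size=1000):
    """Stream records from xml_source into SQLite in batches.

    The tag and field names are hypothetical; adjust them to the real file.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer (name TEXT, birth_date TEXT)"
    )
    batch = []
    # iterparse reads the input incrementally instead of loading it whole.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != tag:
            continue
        batch.append((elem.findtext("name"), elem.findtext("birth_date")))
        elem.clear()  # drop the record's children once they are copied
        if len(batch) >= batch_size:
            conn.executemany("INSERT INTO customer VALUES (?, ?)", batch)
            conn.commit()
            batch.clear()
    if batch:  # flush the final partial batch
        conn.executemany("INSERT INTO customer VALUES (?, ?)", batch)
        conn.commit()
    return conn


def born_in_july_aged_30_to_40(conn, today):
    """Names of customers born in July who are 30 to 40 years old on `today`.

    Assumes birth_date is stored as YYYY-MM-DD text.
    """
    sql = """
        SELECT name FROM customer
        WHERE strftime('%m', birth_date) = '07'
          AND birth_date >  date(?, '-41 years')
          AND birth_date <= date(?, '-30 years')
        ORDER BY name
    """
    return [row[0] for row in conn.execute(sql, (today, today))]
```

Once the records are in a table, the cross-reference Andre mentions becomes a single query instead of nested loops over the raw text.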
On 5 Jul 2007, at 13:20, Andre Garzia wrote:
> Alejandro,
> if this is the kind of XML that has a simple record structure repeated
> over and over again, like a phone book, then why don't you break it
> into smaller, digestible chunks and insert it into something like
> SQLite or Valentina, chunk by chunk? By using an RDBMS you'll be able
> to query and make sense of the XML data easily, and those databases
> will have no problem dealing with large data sets.
>
> Even if you manage to load 8 GB of data in Rev, manipulating it will
> be kind of slow, I think. Just imagine the loops needed to make cross
> references, like finding everyone who was born in July and is between
> 30 and 40 years old.
>
> I'd write a little program that goes piece by piece, inserting this
> into a database, and then begin again from there.
>
> Andre
>
> On 7/4/07, Alejandro Tejada <capellan2000 at yahoo.com> wrote:
>>
>> Hi all,
>>
>> Recently, I was extracting data
>> from an 8-gigabyte ANSI text file
>> (an XML customer database), but after
>> processing approximately 3.5 gigabytes
>> of data, Revolution quit by itself and
>> Windows XP presented the familiar dialog
>> asking to notify the developer of this
>> error.
>>
>> The log file that I saved while using
>> the stack shows that after reading character
>> 3,758,096,384 (that is, more than 3 billion
>> characters) the stack could not read any further
>> into the XML database and started repeating the
>> last line of text that it had read.
>>
>> Note that I checked processor and memory use
>> with Windows Task Manager and everything was normal.
>> The stack was using between 30% and 70% of the processor,
>> and memory use was between 45 MB and 125 MB.
>>
>> The code used is similar to this:
>>
>> repeat until tCounter >= 8589934592 -- 8 gigabytes
>>   read from file tData at tCounter for 10000
>>   -- reads 10,000 characters from the database
>>   -- the characters are placed in the variable: it
>>   put processDATA(it) into tProcessedData
>>   write tProcessedData to file tNewFile
>>   put tCounter && last line of it & cr after URL tLOG
>>   add 10000 to tCounter
>> end repeat
>>
>> etc...
>>
>> I have repeated the test at least 3 times :((
>> and the results are almost the same, with a small
>> difference in the character at which the stack quits
>> while reading this 8-gigabyte XML database.
>>
>> I have checked for strange characters in that part of
>> the database, after splitting the file into many parts,
>> but have not found any.
>>
>> Any insight that you could provide on how to process
>> this database from start to end is more
>> than welcome. :)
>>
>> Thanks in advance.
>>
>> alejandro
>>
>>
>> Visit my site:
>> http://www.geocities.com/capellan2000/
>>
>>
>>
>> _______________________________________________
>> use-revolution mailing list
>> use-revolution at lists.runrev.com
>> Please visit this url to subscribe, unsubscribe and manage your
>> subscription preferences:
>> http://lists.runrev.com/mailman/listinfo/use-revolution
>>
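
One way to avoid seeking to an ever-growing absolute offset, as the quoted loop does, is to read the file sequentially and let the runtime track the position. The sketch below is Python, not Revolution; process_line stands in for the processDATA handler in the quoted code, and the function name, the default chunk size, and the line-based carry-over are assumptions. Whether the same change would get past the failure near character 3,758,096,384 in Revolution itself is untested.

```python
def process_in_chunks(src_path, dst_path, process_line, chunk_size=10_000):
    """Read src sequentially in fixed-size chunks; write processed lines to dst.

    Returns the number of bytes read from src.
    """
    carry = b""  # partial line left over from the previous chunk
    total = 0    # bytes consumed so far, useful for logging progress
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # an empty read means end of file
                break
            total += len(chunk)
            lines = (carry + chunk).split(b"\n")
            carry = lines.pop()  # the last piece may be an incomplete line
            for line in lines:
                dst.write(process_line(line) + b"\n")
        if carry:  # final line without a trailing newline
            dst.write(process_line(carry) + b"\n")
    return total
```

The loop ends on an empty read rather than on a hard-coded byte count, so it does not depend on the counter landing exactly on the file size, and a record split across two chunks is reassembled before it is processed.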