Problem parsing data in Gigabyte size text files
andre at andregarzia.com
Fri Jul 6 10:03:36 CDT 2007
FAT32 has a 4 GB file size limit, but I think NTFS, or whatever Windows uses
nowadays, is not limited like this.
On 7/6/07, Dave <dave at looktowindward.com> wrote:
> That sounds like a better approach to me too, however if the problem
> is because the file is > 2GB (or whatever the limit is on Windows)
> then it still won't work.
> All the Best
> On 5 Jul 2007, at 13:20, Andre Garzia wrote:
> > Alejandro,
> > if this is the kind of XML that has a simple record structure repeated
> > over and over again, like a phone book, then why don't you break it
> > into smaller, edible chunks and insert it into something like SQLite or
> > Valentina chunk by chunk? By using an RDBMS you'll be able to query and
> > make sense of the XML data easily, and those databases will have no
> > problem dealing with large data sets.
> > Even if you manage to load 8 GB of data in Rev, manipulating it will
> > be kind of slow, I think. Just imagine the loops needed to make cross
> > references like "find everyone who was born in July and is between 30
> > and 40 years old".
> > I'd make a little program to go piece by piece, inserting this into a
> > database, and then begin again from there.
> > Andre
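A minimal sketch of the chunk-by-chunk idea above, written in Python rather than Rev only because the approach is language-neutral. Everything named here is an assumption for illustration: the element names (customer, name, born), the table layout, and the two function names are hypothetical, and the real file's structure may differ. The records are streamed into SQLite in batches, and the "born in July, aged 30 to 40" question becomes a single query.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET


def load_customers(xml_source, conn, batch_size=1000):
    """Stream <customer> records from xml_source into SQLite in batches."""
    conn.execute("CREATE TABLE IF NOT EXISTS customers (name TEXT, born TEXT)")
    batch = []
    total = 0
    # iterparse hands back each element when its end tag is read, so the
    # file is consumed incrementally instead of being loaded in one go.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "customer":
            continue
        batch.append((elem.findtext("name"), elem.findtext("born")))
        elem.clear()  # discard the record's text and children once stored
        if len(batch) >= batch_size:
            conn.executemany("INSERT INTO customers VALUES (?, ?)", batch)
            conn.commit()
            total += len(batch)
            batch = []
    if batch:  # flush the final, partially filled batch
        conn.executemany("INSERT INTO customers VALUES (?, ?)", batch)
        conn.commit()
        total += len(batch)
    return total


def born_in_july_aged_30_to_40(conn, today):
    """Names of customers born in July who are 30 to 40 years old on `today`."""
    sql = """
        SELECT name FROM customers
        WHERE strftime('%m', born) = '07'
          AND ((strftime('%Y', :today) - strftime('%Y', born))
               - (strftime('%m-%d', :today) < strftime('%m-%d', born)))
              BETWEEN 30 AND 40
        ORDER BY name
    """
    return [row[0] for row in conn.execute(sql, {"today": today})]


# Tiny made-up sample standing in for the real 8 GB file.
sample = io.BytesIO(
    b"<customers>"
    b"<customer><name>Ana</name><born>1970-07-15</born></customer>"
    b"<customer><name>Bruno</name><born>1990-07-01</born></customer>"
    b"<customer><name>Carla</name><born>1972-03-10</born></customer>"
    b"<customer><name>Dino</name><born>1967-07-02</born></customer>"
    b"</customers>"
)
conn = sqlite3.connect(":memory:")
loaded = load_customers(sample, conn, batch_size=2)
matches = born_in_july_aged_30_to_40(conn, "2007-07-06")
print(loaded, matches)
```

With the data in a table, further cross references are just more queries, and an index on the born column would probably help once the table gets large.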
> > On 7/4/07, Alejandro Tejada <capellan2000 at yahoo.com> wrote:
> >> Hi all,
> >> Recently, I was extracting data
> >> from an 8-gigabyte ANSI text file
> >> (an XML customer database), but after
> >> processing approximately 3.5 gigabytes
> >> of data, Revolution quit by itself and
> >> Windows XP presented the familiar dialog
> >> asking to notify the developer of this
> >> error.
> >> The log file that I saved while using
> >> the stack shows that after reading character
> >> 3,758,096,384 (that is, more than 3 thousand million
> >> characters) the stack could not read any further
> >> into the XML database and started repeating the
> >> last line of text that it had read.
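The offset in that log is a very round number in binary: exactly 3.5 x 2^30. A quick check follows (Python, used only for the arithmetic; describe_offset is a name made up for this sketch). The roundness may hint at an internal limit in how the file position is handled rather than at bad data at that spot, but that is a guess, not something the log proves.

```python
# Offset reported in the log just before the reads stopped advancing.
failing_offset = 3_758_096_384

GIB = 2 ** 30  # one binary gigabyte, in characters/bytes


def describe_offset(offset):
    """Express the offset in GiB and hex, and compare it with 32-bit limits."""
    return {
        "gib": offset / GIB,
        "hex": hex(offset),
        "above_signed_32_max": offset > 2 ** 31 - 1,
        "below_unsigned_32_limit": offset < 2 ** 32,
    }


info = describe_offset(failing_offset)
print(info)
```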
> >> Note that I checked the processor and memory use
> >> with Windows Task Manager and everything was normal.
> >> The stack was using between 30% and 70% of the processor,
> >> and memory use was between 45 MB and 125 MB.
> >> The code used is similar to this:
> >> repeat until tCounter >= 8589934592 -- 8 gigabytes
> >>   read from file tData at tCounter for 10000
> >>   -- reads 10,000 characters from the database
> >>   -- and places them in the variable: it
> >>   if it is empty then exit repeat -- nothing left to read
> >>   put processDATA(it) into tProcessedData
> >>   write tProcessedData to file tNewFile
> >>   put tCounter && last line of it & cr after URL tLOG
> >>   add 10000 to tCounter
> >> end repeat
> >> etc...
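The loop above keeps track of an absolute character position itself. A sketch of one alternative is to let the file handle keep its own position and stop when a read comes back empty. This is Python for illustration only; process_in_chunks and transform are hypothetical names standing in for the stack's handler and processDATA, and whether a sequential read would get past the 3.5 GB mark in Rev is untested.

```python
import io


def process_in_chunks(src, dst, transform, chunk_size=10_000):
    """Read src sequentially in fixed-size chunks until it is exhausted."""
    total = 0
    while True:
        chunk = src.read(chunk_size)  # continues from the previous read
        if not chunk:  # an empty read means end of file
            break
        dst.write(transform(chunk))
        total += len(chunk)
    return total


# Small in-memory stand-ins for the input and output files.
src = io.BytesIO(b"a" * 25_000)
dst = io.BytesIO()
count = process_in_chunks(src, dst, bytes.upper)
print(count)
```

Fixed-size chunks cut across record boundaries, so whatever plays the role of transform has to cope with a record that starts in one chunk and ends in the next.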
> >> I have repeated the test at least 3 times :((
> >> and the results are almost the same, with a small
> >> difference in the character at which the stack quits
> >> while reading this 8-gigabyte XML database.
> >> I checked for strange characters in that part of
> >> the database when I split the file into many parts,
> >> but did not find any.
> >> Any insight that you could provide on how to process
> >> this database from start to end is more
> >> than welcome. :)
> >> Thanks in advance.
> >> alejandro
> >> Visit my site:
> >> http://www.geocities.com/capellan2000/
> >> _______________________________________________
> >> use-revolution mailing list
> >> use-revolution at lists.runrev.com
> >> Please visit this url to subscribe, unsubscribe and manage your
> >> subscription preferences:
> >> http://lists.runrev.com/mailman/listinfo/use-revolution