Problem parsing data in Gigabyte size text files

Andre Garzia andre at andregarzia.com
Fri Jul 6 10:03:36 CDT 2007


FAT32 has a 4 GB file size limit, but I think NTFS, or whatever Windows uses
nowadays, is not limited like this.
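
For reference, here is how the sizes mentioned in this thread compare. This is a small Python sketch rather than Revolution code, and the constant and function names are made up for illustration. The only inputs taken from the thread are the 8 GB loop bound and the character position from the quoted log; the FAT32 and signed 32-bit limits are the standard values.

```python
GIB = 2**30

FAT32_MAX_FILE = 2**32 - 1       # largest file FAT32 can store, in bytes
SIGNED_32_MAX = 2**31 - 1        # largest value a signed 32-bit offset can hold
FILE_SIZE = 8 * GIB              # 8589934592, the loop bound in the quoted script
FAILURE_OFFSET = 3_758_096_384   # character position reported in the quoted log


def fits_on_fat32(size_in_bytes):
    """Return True if a file of this size could exist on a FAT32 volume."""
    return size_in_bytes <= FAT32_MAX_FILE


if __name__ == "__main__":
    print("8 GB file fits on FAT32:", fits_on_fat32(FILE_SIZE))
    print("failure offset in GiB:", FAILURE_OFFSET / GIB)
    print("past the signed 32-bit limit:", FAILURE_OFFSET > SIGNED_32_MAX)
    print("below the 4 GB limit:", FAILURE_OFFSET < FAT32_MAX_FILE)
```

Since an 8 GB file cannot exist on FAT32 at all, the file is presumably already on a file system without that limit. The reported position is above the 2 GB mark and below the 4 GB mark, which by itself does not identify the cause of the crash.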

Cheers
andre


On 7/6/07, Dave <dave at looktowindward.com> wrote:
>
> Hi,
>
> That sounds like a better approach to me too, however if the problem
> is because the file is > 2GB (or whatever the limit is on Windows)
> then it still won't work.
>
> All the Best
> Dave
>
> On 5 Jul 2007, at 13:20, Andre Garzia wrote:
>
> > Alejandro,
> > if this is the kind of XML that has a simple record structure repeated
> > over and over again, like a phone book, then why don't you break it into
> > smaller edible chunks and insert it into something like SQLite or
> > Valentina, chunk by chunk? By using an RDBMS you'll be able to query and
> > make sense of the XML data easily, and those databases will have no
> > problem dealing with large data sets.
> >
> > Because even if you manage to load 8 GB of data in Rev, manipulating it
> > will be kind of slow, I think. Just imagine the loops needed to make
> > cross-references like "find everyone who was born in July and is between
> > 30 and 40 years old".
> >
> > I'd write a little program that goes piece by piece, inserting this into a
> > database, and then begin again from there.
> >
> > Andre
> >
> > On 7/4/07, Alejandro Tejada <capellan2000 at yahoo.com> wrote:
> >>
> >> Hi all,
> >>
> >> Recently, I was extracting data
> >> from an 8 gigabyte ANSI text file
> >> (an XML customer database), but after
> >> processing approximately 3.5 gigabytes
> >> of data, Revolution quit by itself and
> >> Windows XP presented the familiar dialog
> >> asking to notify the developer of this
> >> error.
> >>
> >> The log file that I saved while using
> >> the stack shows that after reading character
> >> 3,758,096,384 (that is, more than 3 thousand million
> >> characters) the stack could not read any further
> >> into the XML database and started repeating the
> >> last line of text that it had read.
> >>
> >> Note that I checked the processor and memory use
> >> with Windows Task Manager and everything was normal.
> >> The stack was using between 30 and 70% of the processor,
> >> and memory use was between 45 MB and 125 MB.
> >>
> >> The code used is similar to this:
> >>
> >> repeat until tCounter >= 8589934592 -- 8 gigabytes
> >> read from file tData at tCounter for 10000
> >> -- reads 10,000 characters from the database;
> >> -- these characters are placed in the variable: it
> >> put processDATA(it) into tProcessedData
> >> write tProcessedData to file tNewFile
> >> put tCounter && last line of it & cr after URL tLOG
> >> add 10000 to tCounter
> >> end repeat
> >>
> >> etc...
> >>
> >> I have repeated the test at least 3 times :((
> >> and the results are almost the same, with a small
> >> difference in the character position where the stack
> >> quits while reading this 8 gigabyte XML database.
> >>
> >> I have checked for strange characters in that part of
> >> the database, when I split the file into many parts,
> >> but have not found any.
> >>
> >> Any insight that you could provide to process
> >> this database from start to end is more
> >> than welcome. :)
> >>
> >> Thanks in advance.
> >>
> >> alejandro
> >>
> >>
> >> Visit my site:
> >> http://www.geocities.com/capellan2000/
> >>
> >>
> >>
> >> _______________________________________________
> >> use-revolution mailing list
> >> use-revolution at lists.runrev.com
> >> Please visit this url to subscribe, unsubscribe and manage your
> >> subscription preferences:
> >> http://lists.runrev.com/mailman/listinfo/use-revolution
> >>
>
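
The chunk-by-chunk database idea from the quoted message could be sketched roughly like this. It is written in Python with its bundled sqlite3 module, not in Revolution, and it rests on assumptions that are not in the thread: the records are assumed to be wrapped in a hypothetical <record> element, the "ANSI" file is read as Latin-1, and each record is stored whole in a single text column. The function name load_records and the table name are made up for illustration.

```python
import re
import sqlite3

# Hypothetical record delimiter; the real element names are not known.
RECORD_RE = re.compile(r"<record>(.*?)</record>", re.DOTALL)


def load_records(xml_path, db_path, chunk_size=1_000_000):
    """Stream xml_path in fixed-size chunks and insert each complete
    record into an SQLite table. Returns the number of records stored."""
    db = sqlite3.connect(db_path)
    db.execute("CREATE TABLE IF NOT EXISTS records (body TEXT)")
    buffer = ""
    count = 0
    # Latin-1 is an assumption; it maps every byte to a character.
    with open(xml_path, "r", encoding="latin-1") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buffer += chunk
            rows = []
            last_end = 0
            for match in RECORD_RE.finditer(buffer):
                rows.append((match.group(1),))
                last_end = match.end()
            db.executemany("INSERT INTO records (body) VALUES (?)", rows)
            count += len(rows)
            # Keep only the unfinished tail so a record split across
            # two chunks is completed on the next pass.
            buffer = buffer[last_end:]
    db.commit()
    db.close()
    return count
```

Only one chunk plus a partial record is held in memory at a time, so the size of the input file should not matter to the script itself. Splitting each record into proper columns, so that queries like the birthday example become simple SELECT statements, would be the next step.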


