Massive XML docs
Ruslan Zasukhin
sunshine at public.kherson.ua
Wed Feb 8 11:30:50 EST 2006
On 2/8/06 5:50 PM, "Todd Geist" <tg.lists at geistinteractive.com> wrote:
Hi Todd,
> Hello Everyone.
>
> I have some very large XML files (100 MB or more) that I would like to
> parse. And this thing has to be fast. I mean FAST. Since I have done
> almost no work with Rev and XML, I am looking for advice on how to
> proceed.
>
> The user experience needs to be something like this...
>
> I select the XML file and very quickly I see the info about the
> top-level nodes.
>
> I select one of the top-level nodes and I am very quickly presented
> with more detail on it, which may include all of its children and
> also related information from other parts of the XML document.
>
> I continue to work my way around the document by simply selecting
> the elements that I need more info on.
> At no point will a user ever need to see all the data that is in the
> XML doc. They are almost always looking for info on just one element
> buried in there. What I don't know is this: if the user is just walking
> around the XML document one piece at a time, do I need to load the
> whole thing into RAM in a revXMLTree? And if I do, won't that just be
> insane for a 100 MB XML file? Or can I do it one step at a time as the
> user requests more detailed info?
Is revXMLTree a DOM model?
If yes, then expect such an in-RAM tree to be about 5 times larger than the
original XML document. On the other hand, DOM is the fastest way to iterate
over a tree.
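For the "one step at a time" part of the question, the usual alternative to building a full DOM tree is a streaming (SAX-style) parse. That technique is not named in the reply; it is swapped in here as a minimal sketch using Python's standard `xml.etree.ElementTree.iterparse`, not revXML, whose streaming support is not assumed. It lists only the top-level elements and frees each one as soon as it has been seen, so memory stays roughly bounded by the largest single top-level element rather than by the whole file. The function name `top_level_summary` is hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def top_level_summary(source):
    """Hypothetical helper: stream through an XML document and return
    (tag, attributes) for each direct child of the root, without keeping
    the whole tree in memory (streaming, not DOM)."""
    summary = []
    depth = 0
    root = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if depth == 0:
                root = elem  # remember the root so finished children can be detached
            depth += 1
        else:  # "end": the element and its whole subtree are now complete
            depth -= 1
            if depth == 1:
                # elem is a direct child of the root: record it, then free it
                summary.append((elem.tag, dict(elem.attrib)))
                root.remove(elem)
    return summary
```

For example, `top_level_summary(io.BytesIO(b"<catalog><item id='1'><name>A</name></item><item id='2'/></catalog>"))` returns `[('item', {'id': '1'}), ('item', {'id': '2'})]`. The trade-off is the one described above: streaming keeps RAM small, but every pass re-reads the file, so it suits a quick overview better than repeated random access.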
> Can Valentina or SQLite be employed to help the situation?
> Or would it be faster to parse it into a whole slew of custom props?
> Any ideas and/or thoughts would be much appreciated.
Todd, you have touched on just a HUGE issue.
First of all, I need to understand your task better:
- So you parse the document and extract info. What comes next?
  Must your users be able to run queries against it?
  If so, it sounds like a job for a DBMS.
For tasks like yours, there are a few approaches in the industry:
A) work on the XML itself, with so-called native XML databases;
B) put the XML into a database.
It looks like the major DBMS vendors such as Oracle, IBM and MS are winning
this war, so approach B) is becoming the main one.
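Approach B) can be sketched with SQLite, which Todd mentions. The "shredding" scheme below is one simple, hypothetical way to do it (a single `nodes` table linked by `parent_id`), not Valentina's or any vendor's actual XML feature: the document is streamed into the table once, and afterwards each click in the UI becomes a cheap indexed query for one node's children, so the whole document never has to sit in RAM. The table, column and function names are illustrative assumptions; attributes are omitted to keep it short.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET


def load_xml_into_sqlite(source, conn):
    """Hypothetical loader: one row per element, linked to its parent."""
    # Hypothetical schema, not a vendor format.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS nodes ("
        " id INTEGER PRIMARY KEY, parent_id INTEGER, tag TEXT, text TEXT)"
    )
    # Index so "children of node X" lookups stay fast on large documents.
    conn.execute("CREATE INDEX IF NOT EXISTS nodes_parent ON nodes(parent_id)")
    stack = []  # row ids of the currently open ancestor elements
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            parent_id = stack[-1] if stack else None
            cur = conn.execute(
                "INSERT INTO nodes (parent_id, tag) VALUES (?, ?)",
                (parent_id, elem.tag),
            )
            stack.append(cur.lastrowid)
        else:
            node_id = stack.pop()
            # An element's text is only guaranteed complete at its "end" event.
            text = (elem.text or "").strip() or None
            conn.execute("UPDATE nodes SET text = ? WHERE id = ?", (text, node_id))
            elem.clear()  # drop the parsed subtree; its data now lives in SQLite
    conn.commit()


def children(conn, parent_id):
    """Return (id, tag, text) rows for one node's children (None = root level)."""
    return conn.execute(
        "SELECT id, tag, text FROM nodes WHERE parent_id IS ? ORDER BY id",
        (parent_id,),
    ).fetchall()
```

Usage: load once with `load_xml_into_sqlite(open("big.xml", "rb"), sqlite3.connect("big.db"))` (file names hypothetical), then call `children(conn, None)` for the top-level view and `children(conn, some_id)` each time the user drills into a node. Queries that follow cross-references between distant parts of the document would need extra tables or indexes beyond this sketch.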
I can make a small announcement: during the last few months we have been
developing new XML features in Valentina that will be comparable to those of
Oracle, IBM and MS. Soon we will introduce the first wave of them.
For more details, please subscribe to the Valentina beta list.
--
Best regards,
Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc
Valentina - Joining Worlds of Information
http://www.paradigmasoft.com
[I feel the need: the need for speed]
More information about the use-livecode mailing list