text files with quotes
David Vaughan
drvaughan55 at mac.com
Wed May 15 21:28:01 EDT 2002
Mark
Interesting thing. Simply putting "Parsing record" into a field is
costing you half your processing time.
I took your data sample and replicated it 1001 times into a field. With
one minor modification, your script processed that data in 13 seconds on
my machine. That modification was to take the data out of field "myfile"
and put it into a variable. I then ran my script (below) against the same
data and it completed in six seconds. I commented out the line that puts
"Parsing record..." into a field and it completed in zero seconds (well,
not quite of course, but I did not measure the milliseconds and it was
less than 500 ms).
The main changes I made were:
- put the data in a variable to process it
- use "repeat for each", not "repeat with x = 1 to N"
and some minor code changes which I think should be faster anyway.
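To make the second point concrete, here is a minimal sketch of the two loop forms side by side. It assumes the data is in field "f1" as in my script; the "process inLine here" comments are placeholders, not real code.

```livecode
on mouseUp
  put field "f1" into inList

  -- Indexed form: "line x of inList" is found by counting lines from
  -- the start of inList on every pass, so each pass costs more as x grows
  repeat with x = 1 to the number of lines of inList
    put line x of inList into inLine
    -- process inLine here
  end repeat

  -- "repeat for each" form: a single pass through inList, with each
  -- line handed to you in inLine as it is reached
  repeat for each line inLine in inList
    -- process inLine here
  end repeat
end mouseUp
```

On a few dozen lines you will not notice the difference; on thousands of lines the indexed form is the one that hurts.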
If you have a lot of data and want a progress statement, update it only
every hundred or thousand records or so. That saves most of the time
spent on display, while testing where you are up to is practically
cost-free.
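A sketch of what I mean, using the recNum counter and field "status" from my script. The interval of 100 is an arbitrary choice, and the processing step is elided:

```livecode
repeat for each line inLine in inList
  add 1 to recNum
  -- ... process inLine as before ...
  -- Touch the field only on every 100th record
  if recNum mod 100 = 0 then
    put "Parsing record..." & recNum into field "status"
  end if
  put inLine & return after outList
end repeat
```

The "mod" test runs on every record, but it is only arithmetic on a variable; the expensive part, writing to the field, now happens one time in a hundred.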
The script:
on mouseUp
  -- Timing statement just for fun
  put the seconds into sTime
  put empty into field "f2"
  -- Always faster to process from a variable than a field
  put field "f1" into inList
  put 0 into recNum
  -- Prefer this form of repeat wherever you can
  repeat for each line inLine in inList
    add 1 to recNum
    -- The processing part: strip the commas and the surrounding
    -- quotes from each quoted phrase in turn
    repeat while inLine contains quote
      -- x is the position of the opening quote
      put offset(quote,inLine) into x
      -- offset skips x chars and counts from there, so add x back
      -- to get the position of the closing quote
      put offset(quote,inLine,x) + x into y
      put char x to y of inLine into phrase
      replace "," with empty in phrase
      put char 2 to -2 of phrase into char x to y of inLine
    end repeat
    -- Continue parsing by item to create outLine(s)
    put "Parsing record..." & recNum into field "status"
    -- The following should probably be a function call like
    -- "put r2(inLine) into outLine"
    -- send mouseUp to button "r2"
    put inLine & return after outList
  end repeat
  put outList into field "f2"
  put (the seconds - sTime) into field "status"
end mouseUp
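On the function call mentioned in the comments: a minimal sketch of what r2 might look like as a function. I have not seen the script of your button "r2", so the body here is only a guess based on your description (replace the comma delimiters with tabs).

```livecode
-- Hypothetical stand-in for whatever button "r2" does now
function r2 pLine
  -- By this point the quoted commas are gone, so every remaining
  -- comma is a real delimiter
  replace comma with tab in pLine
  return pLine
end r2
```

With that in the same script (or anywhere further along the message path, such as the card or stack script), the line "put inLine & return after outList" becomes "put r2(inLine) & return after outList".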
Try it and see how it goes.
regards
David
On Thursday, May 16, 2002, at 10:51 , BCE wrote:
> David,
>
> Thanks for the response :-)
>
> A sample line in a file I might want to parse:
> 1,"5/14/02, 10:00 AM",Pending,Batch0001,"This is a memo field, gee it's cool",45362
>
> This line is comma delimited, but if I try to separate it that way, the
> interior quoted commas will mess things up, so I wrote a routine to fix
> that.
>
> The code is pasted below. It first sets the item delimiter to a quote and
> gets rid of any commas inside quotes, then sends the routine on to another
> function to replace the comma delimiters with tabs (or whatever). It uses
> "mod" to check for "0" remainders, so it can tell where to strip out
> quoted commas: with a quote as the delimiter, the items that were inside
> quotes are the ones whose item number is divisible by 2. This was the best
> method I could figure at the time. It also posts what line number it is
> working on in a status field.
>
> Incidentally, if ALL fields were surrounded by quotes, I could easily do a
> replace using
> replace (quote & "," & quote) with tab in field "myfile"
>
> In fact, I tested it, and it parses the file very quickly, but since only a
> few items are quoted, it's not the right tool.
>
>
> The code:
>
>
> on mouseup
>   set the itemdelimiter to quote
>   answer the number of items in line 1 of field "myfile"
>   put 1 into x
>   repeat for the number of lines of field "myfile"
>     put 1 into n
>
>     repeat for the number of items in line x of field "myfile"
>       if n mod 2=0 then
>         replace "," with " " in item n in line x of field "myfile"
>       end if
>       add 1 to n
>     end repeat
>
>     put "Parsing record " & x & "..." into field "status"
>     add 1 to x
>   end repeat
>   send mouseup to button "r2"
> end mouseup
>
> Thanks again for any ideas on this.
>
> Mark
>
>
>
> ----- Original Message -----
> From: "David Vaughan" <drvaughan55 at mac.com>
> To: <use-revolution at lists.runrev.com>
> Sent: Wednesday, May 15, 2002 7:46 PM
> Subject: Re: text files with quotes
>
>
>>
>> On Thursday, May 16, 2002, at 04:56 , Mark Paris wrote:
>>
>>> Anyone have a good method of parsing a text file that is delimited by
>>> commas, yet has quotes (") surrounding SOME of the records, which in turn
>>> may have commas inside of them (such as a notes field)? (!)
>>>
>>> The only routine I made for this was long and ugly, taking a second per
>>> record. I first replaced commas inside of quotes with a blank space, then
>>> got rid of the quotes, then parsed by comma.
>>
>> Mark
>>
>> I am not sure there is a trivial way of doing this but your second per
>> record sounds vastly too long to me. Rev is very fast at text processing
>> in my experience, after a little familiarity with its nifty features.
>> Are you able to post a sample of your data and your script (or a
>> fragment) to handle it? I have some thoughts about it but need some more
>> material to be sure of what might work well. I am prepared to bet that,
>> if not me, then someone will pounce with an improvement for you.
>>
>> regards
>> David
>>>
>>> Is there a routine I might be missing? Thanks!
>>>
>>> Mark
>>>
>>>
>>> _______________________________________________
>>> use-revolution mailing list
>>> use-revolution at lists.runrev.com
>>> http://lists.runrev.com/mailman/listinfo/use-revolution
>>>
>>