text files with quotes

David Vaughan drvaughan55 at mac.com
Wed May 15 21:28:01 EDT 2002


Mark

Interesting thing. Simply putting "Parsing record" into a field is 
costing you half your processing time.

I took your data sample and replicated it 1001 times into a field. With 
one minor modification, your script processed that data in 13 seconds on 
my machine. That mod was to take the data out of field "myField" and put 
it into a variable. I then ran my script (below) against the same data 
and it completed in six seconds. I then commented out the line that puts 
"Parsing record..." into a field and it completed in zero seconds (well, 
not quite, of course, but I did not measure the milliseconds and it was 
less than 500 ms).

The main changes I made were:
- put the data in a variable to process it
- use "repeat for each", not "repeat with x = 1 to N"
plus some minor code changes which I think should be faster anyway.
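
A rough timing sketch of the second point, not part of the original 
script. The handler name compareLoops and the variables tData, tLine, 
tIndexed and tForEach are made up for illustration; field "f1" is assumed 
to hold the test data. As far as I understand the engine, "line i of 
tData" has to be located by counting lines from the start of the text on 
every pass, whereas "repeat for each" steps through the text once.

```livecode
on compareLoops
   -- Work on a copy in a variable, not on the field itself
   put field "f1" into tData

   -- Indexed form: each "line i of tData" is looked up from the top
   put the milliseconds into tStart
   repeat with i = 1 to the number of lines of tData
      put line i of tData into tLine
   end repeat
   put the milliseconds - tStart into tIndexed

   -- "for each" form: tLine is handed over on every pass, no lookup
   put the milliseconds into tStart
   repeat for each line tLine in tData
      -- tLine already holds the current line
   end repeat
   put the milliseconds - tStart into tForEach

   answer "repeat with:" && tIndexed && "ms" & return & \
         "repeat for each:" && tForEach && "ms"
end compareLoops
```

The gap should widen as the number of lines grows; actual figures will 
depend on the machine and the data.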

If you have a lot of data and want a progress statement, update it only 
every hundred or thousand records or so. That saves most of the time spent 
on display, while the test for where you are up to is practically 
cost-free.
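
A minimal sketch of that idea, assuming the same field names ("f1", 
"status") and variable names as the script in this post. The interval of 
100 is only an example value, and the final "Done" message is an addition 
for illustration.

```livecode
on mouseUp
   put field "f1" into inList
   put 0 into recNum

   repeat for each line inLine in inList
      add 1 to recNum

      -- Touch the status field only on every 100th record;
      -- the mod test itself costs next to nothing
      if recNum mod 100 = 0 then
         put "Parsing record..." & recNum into field "status"
      end if

      -- ... process inLine here ...
   end repeat

   -- Final update so the last partial batch is reported too
   put "Done:" && recNum && "records" into field "status"
end mouseUp
```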

The script:

on mouseUp
-- Timing statement just for fun
   put the seconds into sTime
   put empty into field "f2"
   put field "f1" into inList -- Always faster to process from a variable than a field
   put 0 into recNum

   repeat for each line inLine in inList -- Prefer this form of repeat wherever you can
     add 1 to recNum

-- The processing part
     repeat while inLine contains quote
       put offset(quote,inLine) into x
       put offset(quote,inLine,x) + x into y
       put char x to y of inLine into phrase
       replace "," with empty in phrase
       put char 2 to -2 of phrase into char x to y of inLine
     end repeat

-- Continue parsing by item to create outLine(s)
     put "Parsing record..." & recNum into field "status"
-- The following should probably be a function call like "put r2(inLine) into outLine"
-- send mouseUp to button "r2"
     put inLine & return after outList
   end repeat

   put outList into field "f2"
   put (the seconds - sTime) into field "status"
end mouseUp

Try it and see how it goes.
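
In the spirit of the function-call comment in the script, the 
quote-stripping step could itself be wrapped as a function. This is only a 
sketch: the name stripQuotedCommas and the parameter pLine are 
hypothetical, and the body is the same loop as in the script.

```livecode
function stripQuotedCommas pLine
   repeat while pLine contains quote
      -- Position of the opening quote
      put offset(quote, pLine) into x
      -- offset() with a third argument skips x chars and counts from
      -- there, so adding x back gives the position of the closing quote
      put offset(quote, pLine, x) + x into y
      put char x to y of pLine into phrase
      replace "," with empty in phrase
      -- Drop the surrounding quotes and write the cleaned text back
      put char 2 to -2 of phrase into char x to y of pLine
   end repeat
   return pLine
end stripQuotedCommas
```

Inside the main loop it would be called as 
"put stripQuotedCommas(inLine) & return after outList". For the sample 
line quoted below, it should return:
1,5/14/02 10:00 AM,Pending,Batch0001,This is a memo field gee it's cool,45362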

regards
David

On Thursday, May 16, 2002, at 10:51 , BCE wrote:

> David,
>
> Thanks for the response  :-)
>
> A sample line in a file I might want to parse:
> 1,"5/14/02, 10:00 AM",Pending,Batch0001,"This is a memo field, gee it's cool",45362
>
> This line is comma delimited, but if I try to separate it that way, the
> interior quoted commas will mess things up, so I wrote a routine to fix
> that.
>
> The code is pasted below.  It first sets the item delimiter to a quote,
> and gets rid of any commas inside quotes, then sends the routine on to
> another function to replace the comma delimiters with tabs (or whatever).
> It uses "mod" to check for "0" remainders, thus it can distinguish where
> to strip out quoted commas, as the items delimited by quotes will be
> divisible by 2.  This was the best method I could figure at the time.  It
> also posts what line number it is working on in a status field.
>
> Incidentally, if ALL fields were surrounded by quotes, I could easily do
> a replace using
> replace (quote & "," & quote) with tab in field "myfile"
>
> In fact, I tested it, and it parses the file very quickly, but since only
> a few items are quoted, it's not the right tool.
>
>
> The code:
>
>
> on mouseup
>   set the itemdelimiter to quote
>   answer the number of items in line 1 of field "myfile"
>   put 1 into x
>   repeat for the number of lines of field "myfile"
>     put 1 into n
>
>     repeat for the number of items in line x of field "myfile"
>       if n mod 2=0 then
>         replace "," with " " in item n in line x of field "myfile"
>       end if
>       add 1 to n
>     end repeat
>
>     put "Parsing record " & x & "..." into field "status"
>     add 1 to x
>   end repeat
>   send mouseup to button "r2"
> end mouseup
>
> Thanks again for any ideas on this.
>
> Mark
>
>
>
> ----- Original Message -----
> From: "David Vaughan" <drvaughan55 at mac.com>
> To: <use-revolution at lists.runrev.com>
> Sent: Wednesday, May 15, 2002 7:46 PM
> Subject: Re: text files with quotes
>
>
>>
>> On Thursday, May 16, 2002, at 04:56 , Mark Paris wrote:
>>
>>> Anyone have a good method of parsing a text file that is delimited by
>>> commas, yet has quotes (") surrounding SOME of the records, which in
>>> turn may have commas inside of them (such as a notes field)?  (!)
>>>
>>> The only routine I made for this was long and ugly, taking a second per
>>> record.  I first replaced commas inside of quotes with a blank space,
>>> then got rid of the quotes, then parsed by comma.
>>
>> Mark
>>
>> I am not sure there is a trivial way of doing this but your second per
>> record sounds vastly too long to me. Rev is very fast at text processing
>> in my experience, after a little familiarity with its nifty features.
>> Are you able to post a sample of your data and your script (or a
>> fragment) to handle it? I have some thoughts about it but need some more
>> material to be sure of what might work well. I am prepared to bet that,
>> if not me, then someone will pounce with an improvement for you.
>>
>> regards
>> David
>>>
>>> Is there a routine I might be missing?  Thanks!
>>>
>>> Mark
>>>
>>>
>>> _______________________________________________
>>> use-revolution mailing list
>>> use-revolution at lists.runrev.com
>>> http://lists.runrev.com/mailman/listinfo/use-revolution
>>>