[somewhat OT] Text processing question (sort of)

Jim Ault JimAultWins at yahoo.com
Sun May 18 16:02:05 EDT 2008


Is this slower or about the same with your data set?
[these are not tested, so you may need to tweak syntax]

repeat for each line LNN in myData
   get myData
   filter it with LNN  -- keep only the lines matching LNN
   put line 1 of it & cr after uniqueOnly
end repeat
get the number of lines in uniqueOnly
put the number of lines in myData & " minus dups = " & it

Of course, making the target data set smaller and smaller has advantages,
though the added IF condition might defeat the speed gain near the end of the 40000 lines...

put empty into uniqueOnly
put myData into remainingLines
put the number of lines in remainingLines into remainingCount
repeat for each line LNN in myData
   filter remainingLines without LNN
   get the number of lines in remainingLines
   if it < remainingCount then  -- LNN was still present, so this is its first occurrence
      put LNN & cr after uniqueOnly
      put the number of lines in remainingLines into remainingCount
   end if
end repeat
get the number of lines in uniqueOnly
put the number of lines in myData & " minus dups = " & it

If all lines are shorter than 255 chars...

put myData into arrayFood
repeat for each line LNN in arrayFood
   put LNN & tab & 1 & cr after tempVar
end repeat
-- assuming the 255-char limit above holds
split tempVar using cr and tab
put the keys of tempVar into uniqueOnly
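
The keys of an array come back in no particular order, so if the original
order of the surviving lines matters, the same array idea can be used purely
as a lookup.  An untested sketch (the handler name dedupeLines and the
variable names are just illustrative):

function dedupeLines pData
   local tSeen, tOut
   repeat for each line LNN in pData
      if tSeen[LNN] is empty then
         put 1 into tSeen[LNN]   -- remember this line has been seen
         put LNN & cr after tOut   -- keep the first occurrence, in order
      end if
   end repeat
   return tOut
end dedupeLines

-- for example:  put dedupeLines(myData) into uniqueOnly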

Try these and see, not that it will be worth all the time and effort.  Once
you have a speedy solution, go on to the next task and leave the diving to
the benchmarkers out there.
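
For anyone who does want to dive, a rough timing wrapper (again untested)
around any of the snippets above could use the milliseconds:

put the milliseconds into tStart
-- ... run one of the dedupe snippets on myData here ...
put the milliseconds - tStart into tElapsed
put the number of lines in myData && "lines processed in" && tElapsed && "ms"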

Jim Ault
Las Vegas



On 5/18/08 11:27 AM, "jbv" <jbv.silences at club-internet.fr> wrote:

> 
> if anyone is interested, while trying to find the fastest way to compare
> each line of a list with every other line, I found the following technique
> quite fast:
> 
> -- myData contains the 40000 lines to check
> -- myData1 is a duplicate of myData
> 
> put myData into myData1
> 
>  repeat for each line j in myData
>   delete line 1 of myData1
>   repeat for each line i in myData1
>   end repeat
>  end repeat
> 




