how to compare 2 very large textfiles

Scott Rossi scott at tactilemedia.com
Thu Oct 6 18:28:00 EDT 2011


Pete, I meant to ask, how does your array solution work?  Where does the
comparison take place?  I've long used arrays for storing data but not much
beyond that.

Thanks & Regards,

Scott Rossi
Creative Director
Tactile Media, UX Design
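
Pete's array script is not quoted in this message, so the following is only a
minimal sketch of how an array-based comparison is commonly written in
LiveCode, not his actual code. The function name linesMissingFrom and the
variable names are hypothetical. The idea is that the lines of one text become
array keys, so the "comparison" is a single key lookup per line rather than a
scan through the whole of the other text.

```livecode
-- Hypothetical helper (an assumption, not Pete's script): returns the
-- lines of pSource that do not appear anywhere in pOther.
function linesMissingFrom pSource, pOther
   local tSeen, tMissing

   -- Pass 1: store every non-empty line of pOther as an array key.
   repeat for each line tLine in pOther
      if tLine is not empty then put true into tSeen[tLine]
   end repeat

   -- Pass 2: this is where the comparison takes place. Each line of
   -- pSource is checked with one array lookup.
   repeat for each line tLine in pSource
      if tLine is empty then next repeat
      if tSeen[tLine] is not true then put tLine & return after tMissing
   end repeat

   delete the last char of tMissing -- drop the trailing return
   return tMissing
end linesMissingFrom
```

Under these assumptions, the two checks from the original question would be
linesMissingFrom(tTextA, tTextB) for lines missing in B and
linesMissingFrom(tTextB, tTextA) for lines missing in A. Each text is walked
once per call, which would be consistent with the large speed difference
reported below.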



Recently, Pete wrote:

> Thanks for the report back on the speed, Alex.  I guess it's academic if the
> time is down to 100 msec, but I'm wondering if a binary search technique
> would be better or worse (assuming the lists were sorted, of course).
> 
> How did you create the two lists for your test?  I'd like to try the binary
> search but I'm stuck for an easy way to generate two large files like that!
> 
> Pete
> Molly's Revenge <http://www.mollysrevenge.com>
> 
> 
> 
> 
> On Thu, Oct 6, 2011 at 1:17 PM, Alex Tweedly <alex at tweedly.net> wrote:
> 
>> Much faster.
>> 
>> I tried the original script (with typo fixed) on 7000 lines of varying
>> length between 100 and 300 chars - took about 2 minutes to run. The array
>> version (again with typo fixed) took around 100 msec.
>> 
>> -- Alex.
>> 
>> On 06/10/2011 20:16, Scott Rossi wrote:
>> 
>>> FWIW, I tried a quick test of Matthias's script using two fields with 5000
>>> lines of 256 chars each.  I tried using "i is not among the lines of" and
>>> "i is not in" with identical results.  Processing time was 1 min 6 secs in
>>> both cases (Mac Intel Core2 Duo).  Perhaps the array option posted is faster.
>>> 
>>> Regards,
>>> 
>>> Scott Rossi
>>> Creative Director
>>> Tactile Media, UX Design
>>> 
>>> 
>>> 
>>> Recently, Michael Kann wrote:
>>> 
>>>> Matthias,
>>>> 
>>>> Your script should take a few seconds at most. There must be something
>>>> else going on to slow you down. If you want to post the script itself and
>>>> a few lines of data perhaps someone can figure it out.
>>>> 
>>>> Mike
>>>> 
>>>> --- On Wed, 10/5/11, Matthias Rebbe
>>>> <matthias_livecode_150811 at m-r-d.de> wrote:
>>>> 
>>>> From: Matthias Rebbe <matthias_livecode_150811 at m-r-d.de>
>>>> Subject: how to compare 2 very large textfiles
>>>> To: "How to use LiveCode" <use-livecode at lists.runrev.com>
>>>> Date: Wednesday, October 5, 2011, 5:00 PM
>>>> 
>>>> 
>>>> Hi,
>>>> 
>>>> I need to compare two very large text files with about 5000 - 7000 lines
>>>> each, with a line size of up to 256 chars.
>>>> 
>>>> I need to find out if there are lines missing in either file a or file b.
>>>> 
>>>> What is the best way to do this with good speed?
>>>> 
>>>> I tried checking, for each line in file a, whether the line is in file b.
>>>> After that, I check each line in file b to find out whether the line is
>>>> in file a.
>>>> 
>>>> With large files it takes about 10 to 15 minutes to do the complete check.
>>>> 
>>>> My script looks like this
>>>> 
>>>> repeat for each line i in tTextA
>>>>    if i is not among the lines of tTextB then put i & return after tMissingInB
>>>> end repeat
>>>> 
>>>> repeat for each line i in tTextB
>>>>    if i is not among the lines of tTextA then put i & return after tMissingInA
>>>> end repeat
>>>> 
>>>> Is there a better (faster) way?
>>>> 
>>>> Regards,
>>>> 
>>>> Matthias
>>>> 
>>> 
>>> 
>>> _______________________________________________
>>> use-livecode mailing list
>>> use-livecode at lists.runrev.com
>>> Please visit this url to subscribe, unsubscribe and manage your
>>> subscription preferences:
>>> http://lists.runrev.com/mailman/listinfo/use-livecode
>>> 
>>> 
>> 





