how to compare 2 very large textfiles
Scott Rossi
scott at tactilemedia.com
Thu Oct 6 18:28:00 EDT 2011
Pete, I meant to ask, how does your array solution work? Where does the
comparison take place? I've long used arrays for storing data but not much
beyond that.
Thanks & Regards,
Scott Rossi
Creative Director
Tactile Media, UX Design
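
Note for readers of the archive: Pete's array script is not quoted in this
message. The following is only a rough sketch of the general array technique,
not Pete's actual code. The idea is to use the lines of one file as array keys,
so the "comparison" is the key lookup itself: one lookup per line instead of a
scan of the whole other text. The handler name linesMissingFrom and its
parameter names are made up for illustration; tTextA, tTextB, tMissingInA and
tMissingInB are the variables from Matthias's script. Empty lines are skipped
in this sketch.

```livecode
-- Hypothetical helper: returns the lines of pSource that do not occur in pOther.
function linesMissingFrom pSource, pOther
   local tSeen, tMissing
   -- Index pOther: every distinct non-empty line becomes an array key.
   repeat for each line tLine in pOther
      if tLine is not empty then put true into tSeen[tLine]
   end repeat
   -- The comparison happens here, as a key lookup rather than a text search.
   repeat for each line tLine in pSource
      if tLine is empty then next repeat
      if tSeen[tLine] is not true then put tLine & return after tMissing
   end repeat
   return tMissing
end linesMissingFrom

-- Possible usage from another handler (assumes tTextA and tTextB are loaded):
--   put linesMissingFrom(tTextA, tTextB) into tMissingInB
--   put linesMissingFrom(tTextB, tTextA) into tMissingInA
```

Each text is walked once to build the index and once to probe it, which would
fit the large speed difference Alex reports further down the thread.
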
Recently, Pete wrote:
> Thanks for the report back on the speed, Alex. I guess it's academic if the
> speed is down to 100 msec, but I'm wondering if a binary search technique
> would be better or worse (assuming the lists were sorted, of course).
>
> How did you create the two lists for your test? I'd like to try the binary
> search but I'm stuck for an easy way to generate two large files like that!
>
> Pete
> Molly's Revenge <http://www.mollysrevenge.com>
>
> On Thu, Oct 6, 2011 at 1:17 PM, Alex Tweedly <alex at tweedly.net> wrote:
>
>> Much faster.
>>
>> I tried the original script (with typo fixed) on 7000 lines of varying
>> length between 100 and 300 chars - took about 2 minutes to run. The array
>> version (again with typo fixed) took around 100 msec.
>>
>> -- Alex.
>>
>> On 06/10/2011 20:16, Scott Rossi wrote:
>>
>>> FWIW, I tried a quick test of Matthias's script using two fields with 5000
>>> lines of 256 chars each. I tried using "i is not among the lines of" and
>>> "i is not in" with identical results. Processing time was 1 min 6 secs in
>>> both cases (Mac Intel Core2 Duo). Perhaps the array option posted is faster.
>>>
>>> Regards,
>>>
>>> Scott Rossi
>>> Creative Director
>>> Tactile Media, UX Design
>>>
>>>
>>>
>>> Recently, Michael Kann wrote:
>>>
>>>> Matthias,
>>>>
>>>> Your script should take a few seconds at most. There must be something
>>>> else going on to slow you down. If you want to post the script itself and
>>>> a few lines of data perhaps someone can figure it out.
>>>>
>>>> Mike
>>>>
>>>> --- On Wed, 10/5/11, Matthias Rebbe <matthias_livecode_150811 at m-r-d.de>
>>>> wrote:
>>>>
>>>> From: Matthias Rebbe <matthias_livecode_150811 at m-r-d.de>
>>>> Subject: how to compare 2 very large textfiles
>>>> To: "How to use LiveCode" <use-livecode at lists.runrev.com>
>>>> Date: Wednesday, October 5, 2011, 5:00 PM
>>>>
>>>>
>>>> Hi,
>>>>
>>>> I need to compare two very large text files of about 5000 - 7000 lines
>>>> each, with a line size of up to 256 chars.
>>>>
>>>> I need to find out if there are lines missing in either file a or file b.
>>>>
>>>> What is the best way to do this with good speed?
>>>>
>>>> I tried checking, for each line in file a, whether the line is in file b.
>>>> After that, I check each line in file b to find out whether it is in
>>>> file a.
>>>>
>>>> With large files it takes about 10 to 15 minutes to do the complete check.
>>>>
>>>> My script looks like this:
>>>>
>>>> -- collect the lines of A that are missing from B
>>>> repeat for each line i in tTextA
>>>>    if i is not among the lines of tTextB then put i & return after tMissingInB
>>>> end repeat
>>>>
>>>> -- collect the lines of B that are missing from A
>>>> repeat for each line i in tTextB
>>>>    if i is not among the lines of tTextA then put i & return after tMissingInA
>>>> end repeat
>>>>
>>>> Is there a better (faster) way?
>>>>
>>>> Regards,
>>>>
>>>> Matthias
>>>>
>>>
>>>
>>> _______________________________________________
>>> use-livecode mailing list
>>> use-livecode at lists.runrev.com
>>> Please visit this url to subscribe, unsubscribe and manage your
>>> subscription preferences:
>>> http://lists.runrev.com/mailman/listinfo/use-livecode
>>>
>>>
>>