Divide Large Data Blob?
Bob Sneidar
bobsneidar at iotecdigital.com
Mon May 16 19:53:32 EDT 2022
OK, so it appears there is a log2 function. The log2 of 1000 yields 9.965784, so I suppose if you round up, that gives you the maximum number of iterations needed to isolate a single line in a 1000-line sorted list: 10.
Bob S
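
For anyone who wants to check the arithmetic, here is a minimal sketch in Python rather than LiveCode. It assumes the junk lines all come before the valid ones and that at least one valid line exists. The names max_halvings, first_valid_line and is_valid are made up for illustration, and the validity test is a stand-in for whatever check fits the real data. It only finds where the valid data begins; the end boundary could be found the same way with the test reversed.

```python
import math

def max_halvings(n):
    """Upper bound on halving steps needed to narrow n candidates to one."""
    return math.ceil(math.log2(n)) if n > 1 else 0

def first_valid_line(lines, is_valid):
    """Return (index, probes) for the first line that passes is_valid.

    Assumes every junk line precedes every valid line and that at
    least one valid line exists (hypothetical test, see lead-in).
    """
    lo, hi = 0, len(lines) - 1
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if is_valid(lines[mid]):
            hi = mid        # boundary is at mid or earlier
        else:
            lo = mid + 1    # boundary is after mid
    return lo, probes

if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, max_halvings(n))
    data = ["junk"] * 333 + ["data"] * 667
    print(first_valid_line(data, lambda s: s == "data"))
```

Each probe at least halves the number of candidate lines, which is why the probe count never exceeds the rounded-up log2 of the line count.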
> On May 16, 2022, at 15:44 , Bob Sneidar via use-livecode <use-livecode at lists.runrev.com> wrote:
>
> So this has got me thinking. Apparently what I am calling Divide and Conquer is really called a binary search. I have looked on the interwebs for how to calculate the maximum number of iterations for a given number of values, but it seems that all the formulas offered up use functions for C. I am trying to figure out what a basic math formula for this is, given n values.
>
> Bob S
>
>
>> On May 16, 2022, at 15:23 , Bob Sneidar via use-livecode <use-livecode at lists.runrev.com> wrote:
>>
>> A maximum of 7 recursions is necessary to isolate a single instance of 100 possible values. 1000 requires a maximum of 10. 10000 values requires 14. The idea is that for every factor of 10, you need roughly 3 more recursions, since each recursion halves what is left and it takes a bit over 3 halvings to cut by a factor of 10. This of course assumes the data is sorted, which in your case is sorted into 3 containers. If you know the limits of how many lines can be garbage, and how many can be valid data, you narrow your scope significantly.
>>
>> Livecode is pretty damn quick at parsing this kind of data. If there are consistent delimiters (in this case a line break) then even 20 or 30 recursions is child's play.
>>
>> Bob S
>>
>>
>>> On May 16, 2022, at 15:00 , Bob Sneidar via use-livecode <use-livecode at lists.runrev.com> wrote:
>>>
>>> Do you know exactly which lines you need to toss, or do you need to search the data to find out where the beginning and end of the useful data is?
>>> If the former, then just put line x to y of your data into a new variable. If the latter, then a divide and conquer approach might be the answer. Get the line 30% in, test for valid, get the line 40% in, test, then 35% then 32.5% or 37.5% depending on your test.
>>>
>>> You may only have to do this a dozen or so times to find the exact line where your valid data begins.
>>>
>>> The other way of course is to get it all into a SQL database (how did you all know I was going to say that??) The downside is that you have to iterate through all your data once. The upside is a good one liner query statement may be all you need to process your data. And if you need to make multiple passes at your data, all the better.
>>>
>>> Bob S
>>>
>>>> On May 16, 2022, at 10:46 , Rick Harrison via use-livecode <use-livecode at lists.runrev.com> wrote:
>>>>
>>>> I have a large chunk of data that I want to
>>>> search as quickly as possible.
>>>>
>>>> Unfortunately the part I want to search is the
>>>> middle third of the data. The other thirds at
>>>> the beginning and at the end are just junk and
>>>> slow down my search so I want to get rid of them.
>>>>
>>>> I don’t want to search line by line as that
>>>> takes way too long.
>>>>
>>>> There’s no unique character dividing any
>>>> of these data regions.
>>>>
>>>> What’s the best way to do this?
>>>>
>>>> Thanks in advance!
>>>>
>>>> Rick
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> use-livecode mailing list
>>>> use-livecode at lists.runrev.com
>>>> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
>>>> http://lists.runrev.com/mailman/listinfo/use-livecode