Divide Large Data Blob?

Bob Sneidar bobsneidar at iotecdigital.com
Mon May 16 18:23:52 EDT 2022


A maximum of 7 recursions is necessary to isolate a single instance among 100 possible values. 1,000 values require a maximum of 10, and 10,000 values require 14. The idea is that for every factor of 10, you need roughly 3 more recursions (log2 of 10 is about 3.32). This of course assumes the data is sorted, which in your case it effectively is: sorted into 3 containers (junk, valid data, junk). If you know the limits of how many lines can be garbage and how many can be valid data, you narrow your scope significantly.
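
To check those counts, here is a minimal sketch in Python (not LiveCode; the function name max_halvings is made up for illustration). It assumes each step cuts the remaining candidates in half, so the worst case is the ceiling of log2(n):

```python
def max_halvings(n):
    """Worst-case number of halving steps to narrow n candidates down to one.

    Equivalent to ceil(log2(n)) for n >= 1, computed with integer
    arithmetic so there is no floating-point rounding to worry about.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1).bit_length()


for n in (100, 1000, 10000):
    print(n, max_halvings(n))
# prints:
# 100 7
# 1000 10
# 10000 14
```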

LiveCode is pretty damn quick at parsing this kind of data. If there are consistent delimiters (in this case a line break), then even 20 or 30 recursions is child's play. 
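
A rough sketch of the divide-and-conquer idea, again in Python rather than LiveCode. Everything here is an assumption for illustration: the names first_valid, last_valid and middle_block are invented, is_valid stands for whatever test tells a good line from junk, and the middle line of the data is assumed to fall inside the valid region (which should hold if the valid part really is the middle third):

```python
def first_valid(lines, lo, hi, is_valid):
    """Index of the first valid line in lines[lo..hi].

    Assumes lines[hi] is valid and the range reads junk..junk, valid..valid.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if is_valid(lines[mid]):
            hi = mid          # boundary is at mid or earlier
        else:
            lo = mid + 1      # boundary is after mid
    return lo


def last_valid(lines, lo, hi, is_valid):
    """Index of the last valid line in lines[lo..hi].

    Assumes lines[lo] is valid and the range reads valid..valid, junk..junk.
    """
    while lo < hi:
        mid = (lo + hi + 1) // 2   # round up so the loop always makes progress
        if is_valid(lines[mid]):
            lo = mid          # boundary is at mid or later
        else:
            hi = mid - 1      # boundary is before mid
    return lo


def middle_block(text, is_valid):
    """Return only the valid lines, found with two boundary searches."""
    lines = text.split("\n")
    anchor = len(lines) // 2  # assumed to sit inside the valid region
    start = first_valid(lines, 0, anchor, is_valid)
    end = last_valid(lines, anchor, len(lines) - 1, is_valid)
    return lines[start:end + 1]


# Made-up sample data: 40 junk lines, 30 data lines, 35 junk lines.
sample = "\n".join(["junk"] * 40
                   + ["data %d" % i for i in range(30)]
                   + ["junk"] * 35)
block = middle_block(sample, lambda s: s.startswith("data"))
print(len(block), block[0], block[-1])
```

Each search only tests a handful of lines, so the cost is in the split, not the search. If the middle line cannot be trusted to be valid, some other known-good line would have to serve as the anchor.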

Bob S


> On May 16, 2022, at 15:00 , Bob Sneidar via use-livecode <use-livecode at lists.runrev.com> wrote:
> 
> Do you know exactly which lines you need to toss, or do you need to search the data to find out where the beginning and end of the useful data are? 
> If the former, then just put line x to y of your data into a new variable. If the latter, then a divide and conquer approach might be the answer. Get the line 30% in and test whether it is valid, get the line 40% in and test again, then 35%, then 32.5% or 37.5% depending on your test. 
> 
> You may only have to do this a dozen or so times to find the exact line where your valid data begins. 
> 
> The other way of course is to get it all into a SQL database (how did you all know I was going to say that??). The downside is that you have to iterate through all your data once. The upside is that a good one-liner query statement may be all you need to process your data. And if you need to make multiple passes at your data, all the better. 
> 
> Bob S
> 
>> On May 16, 2022, at 10:46 , Rick Harrison via use-livecode <use-livecode at lists.runrev.com> wrote:
>> 
>> I have a large chunk of data that I want to
>> search as quickly as possible.  
>> 
>> Unfortunately the part I want to search is the 
>> middle third of the data.  The other thirds at 
>> the beginning and at the end are just junk and 
>> slow down my search so I want to get rid of them.
>> 
>> I don’t want to search line by line as that
>> takes way too long.
>> 
>> There’s no unique character dividing any
>> of these data regions.
>> 
>> What’s the best way to do this?
>> 
>> Thanks in advance!
>> 
>> Rick
>> 
>> 
>> 
>> _______________________________________________
>> use-livecode mailing list
>> use-livecode at lists.runrev.com
>> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
>> http://lists.runrev.com/mailman/listinfo/use-livecode
> 


