Reverse Intersect

Bob Sneidar bobs at twft.com
Tue May 25 18:46:03 EDT 2010


Hi Mark. Thanks for the interest. 

My apologies to everyone for the length of this post. 

First, remember that I am working with paged data. The reason is this: 

Let's say the source database has a million records. Since I cannot use a join, because they are dissimilar databases that are not connected in any way, I am going to have to query both databases separately, yes? If I query the destination database for all the keys I want to use to find records in the source database, then on the first pass, with an empty destination database, I am going to return every single record! Huge dataset, Revolution crashes!

But let's say I get away with it and Revolution does not crash (yet). Now I have to issue an SQL statement to the destination database, something like SELECT sw_id FROM devices WHERE sw_id IN (insert huge million-item comma-delimited string here). If I don't kill the SQL server, I will probably kill Revolution this time with yet another million-record dataset!

You may say I could use WHERE sw_id NOT IN (insert huge million-item comma-delimited string here), but that doesn't solve my huge dataset and query problem. If I am paging through the data, then I will get every record not in the page, even records that exist in both databases.

The solution? Page through the source database a reasonable number of records at a time, say 100 or 1,000 or 10,000, depending on the limits of Revolution and SQL. Now my second query will return a dataset with only the matching records, excluding the records that haven't been added yet. At this point I have one dataset with all the records in my page, and another dataset with all the records in my destination database that have matching keys, a subset of my paged data.
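To make the paging concrete, here is a minimal sketch in Python (standing in for the Revolution script, since lists here can't run Transcript). It assumes the `devices` table and `sw_id` column from the SELECT example above, and uses sqlite3 purely as a stand-in for the two unconnected servers:

```python
import sqlite3

PAGE_SIZE = 1000  # tune to what Revolution and the SQL server can handle

def missing_keys(source, dest):
    """Page through the source; for each page, ask the destination which
    of those keys it already has, and yield the ones it lacks."""
    offset = 0
    while True:
        # one page of keys from the source database
        page = [row[0] for row in source.execute(
            "SELECT sw_id FROM devices ORDER BY sw_id LIMIT ? OFFSET ?",
            (PAGE_SIZE, offset))]
        if not page:
            break
        # one query to the destination per page, not one per record
        placeholders = ",".join("?" * len(page))
        have = {row[0] for row in dest.execute(
            "SELECT sw_id FROM devices WHERE sw_id IN (%s)" % placeholders,
            page)}
        # the "reverse intersect": keys in this page but not in the destination
        for key in page:
            if key not in have:
                yield key
        offset += PAGE_SIZE
```

Because each page produces exactly two queries (one per database), the per-record round trips that caused the lag problem go away, at the cost of an IN clause no larger than the page size.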

Now I need to find out which keys are in my page dataset but not in my destination dataset; hence my need for a reverse intersect command. My strategy of working with one record at a time proved cumbersome, especially with a remote SQL server where internet lag plays a part. Whether I use a temporary table or memory variables to work with the data seems irrelevant, except that the SQL database is remote, remember? Lag issues. The fewer individual queries I make, the better the performance will be.
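Once both datasets are in memory, the "reverse intersect" itself is just a set difference that keeps the keys of the first set not present in the second. A tiny sketch (Python again standing in for the Revolution script; the keys are made up for illustration):

```python
page_keys = ["a1", "b2", "c3", "d4"]   # keys in the current source page
dest_keys = {"b2", "d4"}               # matching keys the destination returned

# reverse intersect: keep page keys the destination does not have
missing = [k for k in page_keys if k not in dest_keys]
print(missing)  # -> ['a1', 'c3']
```

The membership test against a set is constant-time, so even a 10,000-key page is cheap compared with a single round trip to the remote server.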

Bob



 
On May 25, 2010, at 3:18 PM, Mark Wieder wrote:

> stephen-
> 
> Tuesday, May 25, 2010, 3:08:33 PM, you wrote:
> 
>> if you want to remove duplicate keys - put the imported data immediately
>> into an array. Duplicates are automatically eliminated.
> 
> I was going to suggest something similar, but then thought this seems
> like a long way to go around for the desired result. Why not take the
> recordsets from the two database queries and use them on the fly to
> populate tables in a new database? Then you can run your sql matching
> statement on the new tables and delete the temporary database when
> you're done.
> 
> -- 
> -Mark Wieder
> mwieder at ahsoftware.net
> 
> _______________________________________________
> use-revolution mailing list
> use-revolution at lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-revolution
