Reverse Intersect
Bob Sneidar
bobs at twft.com
Tue May 25 18:46:03 EDT 2010
Hi Mark. Thanks for the interest.
My apologies to everyone for the length of this post.
First, remember that I am working with paged data. The reason is this:
Let's say the source database has a million records. Since I cannot use a join, the two databases being dissimilar and not connected in any way, I am going to have to query both databases separately, yes? If I query the source database for all the keys I want to use to find records in the destination database, then on the first pass, with an empty destination database, I am going to return every single record! Huge dataset, Revolution crashes!
But let's say I get away with it and Revolution does not crash (yet). Now I have to issue an SQL statement to the destination database, something like SELECT sw_id FROM devices WHERE sw_id IN (insert huge million-item comma-delimited string here). If I don't kill the SQL server, I will probably kill Revolution this time with yet another million-record data set!
You may say I could use WHERE sw_id NOT IN (insert huge million-item comma-delimited string here), but that doesn't solve my huge-dataset-and-query problem. And if I am paging through the data, a NOT IN query will return every destination record whose key is not in the current page, even records that exist in both databases.
The solution? Page through the source database a reasonable number of records at a time, say 100, 1,000, or 10,000, depending on the limits of Revolution and SQL. Now my second query, against the destination, will return a dataset with only the matching records, excluding the records that haven't been added yet. At this point I have one dataset with all the records in my page, and another dataset with all the records in my destination database that have matching keys, a subset of my paged data.
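To make that concrete, here is a rough sketch of the paging half in Revolution script. The table and column names (devices, sw_id), the MySQL-style LIMIT/OFFSET syntax, and the page size are all assumptions for illustration; gSourceID and gDestID would be connection IDs already obtained from revOpenDatabase, and the keys are assumed to be numeric so they don't need quoting in the IN clause:

constant kPageSize = 1000

function pageOfSourceKeys pPageNum
  global gSourceID
  -- fetch one page of keys from the source database
  -- (LIMIT/OFFSET syntax is a MySQL-style assumption)
  put "SELECT sw_id FROM devices ORDER BY sw_id LIMIT" && kPageSize && \
      "OFFSET" && (pPageNum * kPageSize) into tSQL
  return revDataFromQuery(tab, return, gSourceID, tSQL)
end pageOfSourceKeys

function matchingDestKeys pPageKeys
  global gDestID
  -- one batched query per page: which of these keys does the
  -- destination already have? (assumes numeric keys and a
  -- non-empty page)
  put pPageKeys into tList
  replace return with comma in tList
  put "SELECT sw_id FROM devices WHERE sw_id IN (" & tList & ")" into tSQL
  return revDataFromQuery(tab, return, gDestID, tSQL)
end matchingDestKeys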
Now I need to find out which keys are in my page dataset but not in my destination dataset. Hence my need for a reverse intersect command. My strategy of working with one record at a time proved to be cumbersome, especially with a remote SQL server where internet lag plays a part. Whether I use a temporary table or memory variables to work with the data seems irrelevant, except that the SQL database is remote, remember? Lag issues. The fewer individual queries I make, the better the performance will be.
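And here is the reverse intersect itself, sketched as a function along the lines stephen suggests below: index the matched keys in an array, then keep only the page keys that aren't there. The function name and the return-delimited parameters are just my assumptions for the example:

function reverseIntersect pPageKeys, pMatchedKeys
  local tSeenA, tMissing
  -- index the keys the destination already has, so each lookup
  -- below is a cheap array access instead of a lineoffset scan
  repeat for each line tKey in pMatchedKeys
    put true into tSeenA[tKey]
  end repeat
  -- collect every page key the destination does not have yet
  repeat for each line tKey in pPageKeys
    if tSeenA[tKey] is not true then
      put tKey & return after tMissing
    end if
  end repeat
  if tMissing is not empty then delete the last char of tMissing
  return tMissing
end reverseIntersect

Each page should then cost only two queries (one per database) plus one in-memory pass, so the lag per page comes down to the two round trips.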
Bob
On May 25, 2010, at 3:18 PM, Mark Wieder wrote:
> stephen-
>
> Tuesday, May 25, 2010, 3:08:33 PM, you wrote:
>
>> if you want to remove duplicate keys - put the imported data immediately
>> into an array. Duplicates are automatically eliminated.
>
> I was going to suggest something similar, but then thought this seems
> like a long way to go around for the desired result. Why not take the
> recordsets from the two database queries and use them on the fly to
> populate tables in a new database? Then you can run your sql matching
> statement on the new tables and delete the temporary database when
> you're done.
>
> --
> -Mark Wieder
> mwieder at ahsoftware.net
>