Another SQL question

Brian Yennie briany at qldlearning.com
Thu Feb 5 14:04:46 EST 2009


Hey Bob,

Hope these notes help - sounds like a fun project.

> It looks like your query above is going the opposite direction,  
> looking up customers that match the deptid in departments. I am  
> looking for department records whose deptid only exist in the  
> customer cursor. More on paging later.

Other than returning fields from both tables, my query should return  
identical results. It may just be a matter of style, but using IN with  
a sub-select seemed like overkill if you are just trying to match two  
tables based on the "deptid" field. Add a third table and the join  
grows elegantly, but you will probably not want to nest another SELECT  
for each table. If you just want fields from the departments table,  
you could say:

SELECT DISTINCT departments.* FROM customers,departments WHERE  
customers.deptid = departments.deptid

If you had a short list of customers, you could say something like:

SELECT customer.id, department.id, department.name FROM  
customers,departments WHERE customer.id IN (1,2,3,4,5,6...) AND  
customers.deptid = departments.deptid

Now you've got all of your departments listed by customer.

> Aye, I could return all the data with one query. The problem though  
> is that my app is going to contain some fairly complex queries with  
> multiple relations, and I have to construct these queries via  
> conditional coding. Also, I do not WANT to return data from 2 tables  
> in one cursor. First of all, I have like data in multiple tables  
> (deptid in customers and also in departments for example) as well as  
> a signature field, a deleted flag field, and a unique ID field in  
> EVERY table. I need these values for every record in every table.  
> That would mean I would have to use column aliases, so now my SQL  
> construction code would have to be orders of magnitude more  
> complicated, and much more difficult to troubleshoot should the  
> queries not return what I expect. Not impossible, just complicated.

You know your app best, but I would raise a red flag here. Avoiding  
multiple table queries could really set you back if you want to scale  
to large databases. When you've got 1,000,000 records in a table, your  
database engine won't blink joining it to another table provided you  
have indexed fields and a reasonable query. However, if you try to  
dump 100,000 of those IDs into an IN() portion of a subsequent query,  
prepare to wait. (NOTE: a sub-select can still work here, but may be  
harder to generate on the fly, especially with 3+ tables). If these  
subsequent queries are always going to use just a small "page" of  
records to join against, then it's probably fine either way.

On column aliases... What API are you using to fetch your field  
values? Could you just use a fully qualified naming convention:
SELECT department.name AS department_name, customer.name AS  
customer_name FROM ...

Alternatively, you could reference fields by number, and track the  
field names outside of SQL. In that case, it's valid SQL to just use  
the same field names:
SELECT department.name, customer.name FROM ...

(fetch department.name as field #1, customer.name as field #2).

> The method I am using instead is to return each table as it's own  
> cursor, resulting in MUCH simpler code to construct the queries, and  
> simpler queries themselves. Additionally, the signature, delete and  
> uniqueid fields can never be confused beacause each cursor for each  
> table retains it's own identity. So how is it relational? Well I  
> wrote my own relational system into my application, so that when I  
> navigate to a record in a cursor, I also look up the records in and  
> child subservient to that cursor. (IMHO that is what relational  
> really means). Now all I have to do is get values from the current  
> records in each cursor.

This is perfectly fine if it works for you. Just keep in mind that it  
really doesn't scale if you have large cursors. Performing 100 or 1000  
queries versus 1 multi-table query will be no contest. But if your  
parent cursor is always trimmed to a reasonable size, you'll be fine.  
No doubt simpler code and simpler queries are a big win when you are  
trying to write a tool for the general case. I would just be careful  
if you are worried about performance not to do this (pseudo-code):

## N + 1 queries for N customers
put query("SELECT id,deptid FROM customers WHERE state = 'CA'") into  
customerData
repeat for each line customerDetails in customerData
    put item 1 of customerDetails into customerID
    put item 2 of customerDetails into deptid
    put query("SELECT * FROM department WHERE deptid = "&deptid) into  
departments[customerID]
end repeat

## 1 query
put query("SELECT customer.id,department.id,department.name FROM  
customer,department WHERE customer.deptid = department.deptid") into  
customerData

Again, it depends. If N = 25, then maybe you stick with the cleaner  
looking code. But if N = 10,000 ...

> I also have the advantage of being able to present the entire child  
> cursor to the user in a one-to-many environment, as in an invoice  
> where there is a master record and many detail records. My table  
> data is ready made for my invoice detail. It's all in it's own cursor.

Sounds good, although I'm not sure I see how multi-table queries would  
stop you from doing this. If anything, it will just get you all of the  
master/detail information in one "combined" table with exactly the  
fields you want instead of having to grab a parent record and then  
assemble all of the child data with separate queries.

> The LIMIT is necessary because I am working towards a paging system  
> that will handle HUGE databases of unlimited size. This will get  
> around Revolutions limits on how much data can be returned in a  
> cursor (the limits of which I still do not have a definitive answer  
> on). This is why I need to use the SELECT for the master table as a  
> lookup in the child table.

Amen, LIMIT is your friend. Keep in mind, it works with multi-table  
queries as well. You can also use DISTINCT to remove duplicates. I'm  
not sure I follow the second comment. A multiple table query won't  
return any larger data sets unless you include more fields in your  
query.

> I do not use Revolution's built in queries because what I need to do  
> is more than just read data from a table and let the user edit it. I  
> need to do complex validations. For instance using the example  
> above, if someone edits a deptid in the department table, I need to  
> go find every other table that uses deptid, look up the old value  
> and change it to the new value. Otherwise I break the relational  
> link between those records.

Keep in mind that most database engines have built-in abilities to  
manage relations. That's not to say you can't do it yourself, but the  
use-case you describe above is handled automatically by MySQL,  
Valentina, SQL Server, etc. You just need to set up the constraints in  
your database schema and you can do cascading deletes, set fields to  
NULL, or throw an exception depending on your needs

Generally speaking, your approach may be fine for your application. It  
sounds like you are doing things from a 'FileMaker' like view where  
you only have a small number of records visible at a time. Thus there  
never will be a case where you have 100,000 parent records to deal  
with at once. However, if later you want more advanced reporting or  
querying capabilities, you may need the scalability.

HTH



More information about the use-livecode mailing list