Another SQL question
Brian Yennie
briany at qldlearning.com
Thu Feb 5 14:04:46 EST 2009
Hey Bob,
Hope these notes help - sounds like a fun project.
> It looks like your query above is going the opposite direction,
> looking up customers that match the deptid in departments. I am
> looking for department records whose deptid only exist in the
> customer cursor. More on paging later.
Other than returning fields from both tables, my query should return
identical results. It may just be a matter of style, but using IN with
a sub-select seemed like overkill if you are just trying to match two
tables based on the "deptid" field. Add a third table and the join
grows elegantly, but you will probably not want to nest another SELECT
for each table. If you just want fields from the departments table,
you could say:
SELECT DISTINCT departments.* FROM customers,departments WHERE
customers.deptid = departments.deptid
If you had a short list of customers, you could say something like:
SELECT customer.id, department.id, department.name FROM
customers,departments WHERE customer.id IN (1,2,3,4,5,6...) AND
customers.deptid = departments.deptid
Now you've got all of your departments listed by customer.
> Aye, I could return all the data with one query. The problem though
> is that my app is going to contain some fairly complex queries with
> multiple relations, and I have to construct these queries via
> conditional coding. Also, I do not WANT to return data from 2 tables
> in one cursor. First of all, I have like data in multiple tables
> (deptid in customers and also in departments for example) as well as
> a signature field, a deleted flag field, and a unique ID field in
> EVERY table. I need these values for every record in every table.
> That would mean I would have to use column aliases, so now my SQL
> construction code would have to be orders of magnitude more
> complicated, and much more difficult to troubleshoot should the
> queries not return what I expect. Not impossible, just complicated.
You know your app best, but I would raise a red flag here. Avoiding
multiple table queries could really set you back if you want to scale
to large databases. When you've got 1,000,000 records in a table, your
database engine won't blink joining it to another table provided you
have indexed fields and a reasonable query. However, if you try to
dump 100,000 of those IDs into an IN() portion of a subsequent query,
prepare to wait. (NOTE: a sub-select can still work here, but may be
harder to generate on the fly, especially with 3+ tables). If these
subsequent queries are always going to use just a small "page" of
records to join against, then it's probably fine either way.
On column aliases... What API are you using to fetch your field
values? Could you just use a fully qualified naming convention:
SELECT department.name AS department_name, customer.name AS
customer_name FROM ...
Alternatively, you could reference fields by number, and track the
field names outside of SQL. In that case, it's valid SQL to just use
the same field names:
SELECT department.name, customer.name FROM ...
(fetch department.name as field #1, customer.name as field #2).
> The method I am using instead is to return each table as it's own
> cursor, resulting in MUCH simpler code to construct the queries, and
> simpler queries themselves. Additionally, the signature, delete and
> uniqueid fields can never be confused beacause each cursor for each
> table retains it's own identity. So how is it relational? Well I
> wrote my own relational system into my application, so that when I
> navigate to a record in a cursor, I also look up the records in and
> child subservient to that cursor. (IMHO that is what relational
> really means). Now all I have to do is get values from the current
> records in each cursor.
This is perfectly fine if it works for you. Just keep in mind that it
really doesn't scale if you have large cursors. Performing 100 or 1000
queries versus 1 multi-table query will be no contest. But if your
parent cursor is always trimmed to a reasonable size, you'll be fine.
No doubt simpler code and simpler queries are a big win when you are
trying to write a tool for the general case. I would just be careful
if you are worried about performance not to do this (pseudo-code):
## N + 1 queries for N customers
put query("SELECT id,deptid FROM customers WHERE state = 'CA'") into
customerData
repeat for each line customerDetails in customerData
put item 1 of customerDetails into customerID
put item 2 of customerDetails into deptid
put query("SELECT * FROM department WHERE deptid = "&deptid) into
departments[customerID]
end repeat
## 1 query
put query("SELECT customer.id,department.id,department.name FROM
customer,department WHERE customer.deptid = department.deptid") into
customerData
Again, it depends. If N = 25, then maybe you stick with the cleaner
looking code. But if N = 10,000 ...
> I also have the advantage of being able to present the entire child
> cursor to the user in a one-to-many environment, as in an invoice
> where there is a master record and many detail records. My table
> data is ready made for my invoice detail. It's all in it's own cursor.
Sounds good, although I'm not sure I see how multi-table queries would
stop you from doing this. If anything, it will just get you all of the
master/detail information in one "combined" table with exactly the
fields you want instead of having to grab a parent record and then
assemble all of the child data with separate queries.
> The LIMIT is necessary because I am working towards a paging system
> that will handle HUGE databases of unlimited size. This will get
> around Revolutions limits on how much data can be returned in a
> cursor (the limits of which I still do not have a definitive answer
> on). This is why I need to use the SELECT for the master table as a
> lookup in the child table.
Amen, LIMIT is your friend. Keep in mind, it works with multi-table
queries as well. You can also use DISTINCT to remove duplicates. I'm
not sure I follow the second comment. A multiple table query won't
return any larger data sets unless you include more fields in your
query.
> I do not use Revolution's built in queries because what I need to do
> is more than just read data from a table and let the user edit it. I
> need to do complex validations. For instance using the example
> above, if someone edits a deptid in the department table, I need to
> go find every other table that uses deptid, look up the old value
> and change it to the new value. Otherwise I break the relational
> link between those records.
Keep in mind that most database engines have built-in abilities to
manage relations. That's not to say you can't do it yourself, but the
use-case you describe above is handled automatically by MySQL,
Valentina, SQL Server, etc. You just need to set up the constraints in
your database schema and you can do cascading deletes, set fields to
NULL, or throw an exception depending on your needs
Generally speaking, your approach may be fine for your application. It
sounds like you are doing things from a 'FileMaker' like view where
you only have a small number of records visible at a time. Thus there
never will be a case where you have 100,000 parent records to deal
with at once. However, if later you want more advanced reporting or
querying capabilities, you may need the scalability.
HTH
More information about the use-livecode
mailing list