OT : Advice on SQL database design

Igor de Oliveira Couto igor at semperuna.com
Mon Jun 30 21:09:50 EDT 2014


I'll try to provide some initial info:

On 1 Jul 2014, at 10:23 am, Alex Tweedly <alex at tweedly.net> wrote:

> I guess there's a meta-question about "what's a good book/source on SQL database design ?"
> 
> I am trying to design a database. There is one table which contains, let's say, Armed Forces Personnel
> - each row is a different person
> - columns include the obvious .... name, date of birth, ... that are common to all
> - there is a column saying which armed force the person is in
>      - and then there are additional, DIFFERENT fields depending on which one it is [...]

This is a very common design pattern. If you have ever done any Object-Oriented Programming, you would have come across this problem when trying to translate your objects' inheritance hierarchy into database storage tables. I'll give you a very common example: online catalogues. Let's say you have an online catalogue, where you sell different kinds of items: books, clothes, and computer accessories. Each one of these is an "item", and as an 'item', they have several properties in common: an 'item number' in your catalogue, 'pricing', 'stock level', and so on. Unfortunately, 'books' also need to display certain information - 'title', 'author', etc. - which is not appropriate for 'clothes', where we really need to have information like 'colour', 'size', and so on.

It is easy to visualise the relationship between these different types of 'items' in an OOP class diagram: "Item" would be a super class, with the properties (fields) common to all items: 'number', 'price', 'stock level', etc. "Books", "Clothes" and "Computer Accessories" are sub-classes of "Item", and as well as inheriting all the properties from the parent "Item", they define their own properties, which should only be available to items of that class. So, the "Books" class will define an "author" and "title" properties, and the "Clothes" class will define a "size" and "colour" properties, and so on. But how do we store this information in the database?

In our database, we are going to have a table of "orders". Each "order" will have one-to-many "items" connected to it. But these "items" may be 'books', 'clothes' or 'computer accessories'. How do we store this?

The type of relationship between 'orders' and its 'items' is traditionally called a one-to-many POLYMORPHIC RELATIONSHIP: the 'order' is connected to an 'item', but the 'item' can be more than one 'type of thing'. Usually, the way this is done is:

* you create a separate table each for 'books', 'clothes' and 'computer accessories', and in this book you store the information relevant to that type of object. 

* you create an 'items' table, and in that table you have all the information that is common to all items ('pricing', 'item number', 'stock level'). 

* each 'item' is going to reference a record in the 'books', 'clothes' or 'computer accessories' table. In order for that to happen, each item must have an "item_id" field, and an "item_type" field. The "item_id" is the id of the item in the 'books'/'clothes'/'accessories' table, and the "item_type" is simply the name of the table itself. 

Now you can relate 'orders' to 'items', and based on the "item_type" and "item_id" then retrieve the appropriate record from the correct table. 

This introduces an extra layer of complexity to your database queries - extra queries will have to be made with every CRUD operation of 'items' - but if you are using OOP, this table layout will closely match your classes, and makes updating your architecture easier in the future.

Now, let's throw a spanner in the works...

You may have heard of "NoSQL" databases. These are database systems that were created to try and overcome some of the problems inherent to SQL (and table-based) databases. There are many different types of NoSQL databases, which store information in a variety of flexible ways. Traditionally, NoSQL databases were designed to be able to store *massive* amounts of data - think Google-size - and be easy to be expanded. 

NoSQL databases usually don't store their data in 'rows' and 'tables' - they use 'records' and 'collections', but the main difference is, that there is no set 'fields': each record in a collection can have an arbitrary number of fields, and these can be totally different from the next record in the same collection. Now, *this* is flexible, and frees the developer from even having to know what kind information will be stored.

This flexibility and expandability usually came at a high cost: there were lots of features that we were used to in SQL databases, that were missing in NoSQL - important ones, such as relational integrity (most have no easy relational capability) and ACID transactions (no rollback and commit).

However, NoSQL databases have evolved *greatly* in the last couple of years, some open source projects have received a very healthy injection of cash. We are now starting to see some NoSQL databases that offer ALL of the flexibility of NoSQL, as well as having ALL of the SQL features. Possibly, the most notable one is "ArangoDB":

http://www.arangodb.org

This is an incredibly powerful database, which is free, extremely easy to instal, and well-documented.

Using a NoSQL database like Arango, we could simply have an "items" collection, and if different items have different fields, it doesn't matter. We can then later search for all items that from a certain 'author', or of a certain 'colour', and with one single query get the right book or item of clothing. We can also *embed* a full list of items in every 'order'. That way we can change our items at will, without fearing that it will alter the content of past orders. To do the same in an SQL database would require the addition of extra 'item template' tables, adding even more complexity to the architecture.

As you can see, there are some amazing advantages in NoSQL databases, and it may pay to invest some time and get acquainted with a solid product like ArangoDB. It may offer a different type of solution that may be a good inclusion in your arsenal of options.

I hope this helps.

Kind regards to all,

--
Igor Couto
Sydney, Australia




More information about the use-livecode mailing list