SQL JOIN tables should be used rather than comma-separated tables
Having used SQL since long before the ANSI JOIN syntax was well supported (first Sybase, then MS SQL and then Oracle) I resisted it for a long time out of habit, and also because at first the syntax was buggy when used in Oracle.
But I have come around to being in favor of it. The bugs have been fixed, and the main advantages are that: inner and outer joins are more clearly stated than by using '*=', or '(+)' suffixes on one side of a predicate; and the join criteria are clearly separated from the WHERE clause. It makes it much easier to see how the tables are being joined vs. how the results are being limited.
They stop Cartesian products from accidentally occurring. That alone should force the use of ANSI joins. cross joining two million row tables is not a pleasant experience!
Limiting syntax to use only ANSI joins also means that you end up with one and only one table as the root table (or view) to select from. I think this makes code much clearer.
I tend to writ my SQL like lisp, very compositional with lots of embedded views, in order to be very explicit on how I want tables and views joined. If the optimizer is good, it will know what parts of your statement it can optimize, and if it's not, it usually follows my code, and I don't have to resort to Optimizer hints in order for Oracle to get its black magic done.
I think the key is that join condition problems are easier to spot. There are several types query issues which are classic join condition issues. If you use cross join and where, these are harder to spot when troubleshooting. I think this is a bigger issue actually than accidental cross-joins. The latter sometimes happens occasionally. The former happens much more frequently and so wins there in terms of debugging time are quite important.
The way I see SQL in my mind is programming Venn Diagrams ( http://en.wikipedia.org/wiki/Venn_diagram ) followed by creating a projection that uses one or more of the areas within that diagram (contrasting strongly with imperative languages, i.e. logic programming). Unfortunately it's not straight-forward to do that because SQL is a superset of Venn Diagrams, but that line of thinking is where you need to be.
Yes, exactly. To use SQL well you need to think in terms of sets and set operations. Venn Diagrams are a good idea to help visualize this. If you find yourself writing a lot loops over cursors there's a good chance you're doing it wrong.
I thought this was a very good article, unusually so for a blog post about databases. I do take issue, however, with the idea that there's a beauty to SQL. There's a beauty in the conceptual model, but you have to squint sideways to see it through the deeply hostile syntax.
I'm not sure the "Relational Model" is even a great conceptual model for common business or analytical processing. I think SQL is ugly because the model is a poor fit; if the model was solid, a clean syntax would follow naturally. As for alternatives, MDX has superseded it for core "pivot table" analytics. For application processing, it seems modern document/hierarchical databases are good for many transactional needs.
What modern relational databases have... is a very intuitive abstract storage model. That's awesome. We've been working on an alternative "navigational model" (inspired from CODASYL and ORMs) based on this storage model. Our experimental query language is at http://htsql.org
I found the relational beauty in being able to "store every unique value just once." It resonated with the programming concept of "write a function so that you don't need to change things in many places." This in turn gave me a lens to viewing how businesses operate independent of what systems you put on top of the data.
Does this mean SQL is pretty? Not really, but it gets the job done. Though I did have to spend a lot of time optimizing SQL to force certain query optimizations.
Does this mean relational should be used everywhere? No, but it has a distinct place. I think beer is beautiful too, but I won't drink it for breakfast with the in-laws.
I disagree. There are issues with SQL (how many things can NULL mean?) but there is also beauty in it. Every language has issues.
You don't see the beauty in SQL until you try to get as much bang for the buck out of specific queries. Then the idea that you can think in terms of math and sets really shines. You can do a lot in SQL which would require a lot more app code to accomplish, and the SQL will be more elegant and easier to maintain.
The conceptual model is great, but IMHO a little impractical. Anyone who has ever had to trick the optimiser into choosing a better plan by changing the semi join from WHERE EXISTS to NOT IN (or vice versa) will appreciate what I'm saying :-)
Another thing to understand is how NULLs are handled in different contexts, using either the familiar two-valued logic or the more exotic three-valued logic. It's kind of messy but really worth knowing if you're working with SQL. The wikipedia page actually gives a pretty good account of the issue: https://en.wikipedia.org/wiki/Null_%28SQL%29
The most beneficial thing you can do when you start with SQL is to NEVER EVER put join clauses in the WHERE section. ALWAYS put the join clauses as close to the table you're joining as possible.
eg :
never do SELECT * FROM A JOIN B WHERE A.ID = B.a_id and B.id > 10
but always JOIN B ON A.ID = B.a_id AND B.id > 10
the second way of doing "scale" much much better when you add more tables, and start mixing left and right joins.
I was just about to say that! Truthfully this is a good introduction but is definitely not everything you need to know about SQL.(Obviously) Handling nulls is just one of many sticklers people using SQL frequently screw up.
Fantastic article. I clicked on it, hoping to find a way to help some of my less SQL-savvy co-workers to understand SQL a little better. I instead came away with a better understanding of my own.
I like SQL. The relational model is often underestimated. All of the complex structure that goes beyond individual rows is orthogonal and is itself first-class data. Contrast with document databases where hierarchies are part of the low-level storage mechanism. By making the structure first-class data, the relational model keeps you from painting yourself into a corner.
> If a has 3 columns and b has 5 columns, then the "output table" will have 8 (3 + 5) columns. The records contained in this combined table reference are those of the cross product / cartesian product of a x b.
This is where the easy steps become hard for real noobs.
Those are expressions. They only work once you have picked the table(s) - FROM - and determined the rows you want returned - WHERE. The query then groups and orders the results and calculates the scalar value of COUNT(1) which it repeats on every row, then it determines the weekday for every row for the column created_at and uses that value for output.
Know thy database engine for if you do not, your local DBA will be mightily pissed off when you do a cross join across 15 tables and watch his/her IOPS go through the roof...
[+] [-] ams6110|12 years ago|reply
Having used SQL since long before the ANSI JOIN syntax was well supported (first Sybase, then MS SQL and then Oracle) I resisted it for a long time out of habit, and also because at first the syntax was buggy when used in Oracle.
But I have come around to being in favor of it. The bugs have been fixed, and the main advantages are that: inner and outer joins are more clearly stated than by using '*=', or '(+)' suffixes on one side of a predicate; and the join criteria are clearly separated from the WHERE clause. It makes it much easier to see how the tables are being joined vs. how the results are being limited.
[+] [-] chris_wot|12 years ago|reply
[+] [-] batbomb|12 years ago|reply
I tend to writ my SQL like lisp, very compositional with lots of embedded views, in order to be very explicit on how I want tables and views joined. If the optimizer is good, it will know what parts of your statement it can optimize, and if it's not, it usually follows my code, and I don't have to resort to Optimizer hints in order for Oracle to get its black magic done.
[+] [-] einhverfr|12 years ago|reply
[+] [-] zamalek|12 years ago|reply
[+] [-] ams6110|12 years ago|reply
[+] [-] jfb|12 years ago|reply
[+] [-] clarkevans|12 years ago|reply
What modern relational databases have... is a very intuitive abstract storage model. That's awesome. We've been working on an alternative "navigational model" (inspired from CODASYL and ORMs) based on this storage model. Our experimental query language is at http://htsql.org
Our critique of SQL is at http://htsql.org/doc/overview.html#why-not-sql
[+] [-] mathattack|12 years ago|reply
Does this mean SQL is pretty? Not really, but it gets the job done. Though I did have to spend a lot of time optimizing SQL to force certain query optimizations.
Does this mean relational should be used everywhere? No, but it has a distinct place. I think beer is beautiful too, but I won't drink it for breakfast with the in-laws.
[+] [-] einhverfr|12 years ago|reply
You don't see the beauty in SQL until you try to get as much bang for the buck out of specific queries. Then the idea that you can think in terms of math and sets really shines. You can do a lot in SQL which would require a lot more app code to accomplish, and the SQL will be more elegant and easier to maintain.
[+] [-] chris_wot|12 years ago|reply
[+] [-] rhizome31|12 years ago|reply
[+] [-] bsaul|12 years ago|reply
eg : never do SELECT * FROM A JOIN B WHERE A.ID = B.a_id and B.id > 10 but always JOIN B ON A.ID = B.a_id AND B.id > 10
the second way of doing "scale" much much better when you add more tables, and start mixing left and right joins.
[+] [-] lysium|12 years ago|reply
[+] [-] adad95|12 years ago|reply
[+] [-] batbomb|12 years ago|reply
[+] [-] lukaseder|12 years ago|reply
[+] [-] chris_wot|12 years ago|reply
[+] [-] unknown|12 years ago|reply
[deleted]
[+] [-] chris_wot|12 years ago|reply
[+] [-] lukaseder|12 years ago|reply
Example: http://sqlfiddle.com/#!4/d41d8/16785
[+] [-] hobs|12 years ago|reply
[+] [-] saosebastiao|12 years ago|reply
[+] [-] triplesec|12 years ago|reply
[+] [-] mtdewcmu|12 years ago|reply
[+] [-] ExpiredLink|12 years ago|reply
This is where the easy steps become hard for real noobs.
[+] [-] pbhjpbhj|12 years ago|reply
[+] [-] julochrobak|12 years ago|reply
It's focusing on bandicoot language, but concepts are taken from the relational algebra. It should match the logic of SQL.
[+] [-] jcampbell1|12 years ago|reply
[+] [-] chris_wot|12 years ago|reply
[+] [-] siddboots|12 years ago|reply
[+] [-] lukaseder|12 years ago|reply
[+] [-] harrytuttle|12 years ago|reply
Know thy database engine for if you do not, your local DBA will be mightily pissed off when you do a cross join across 15 tables and watch his/her IOPS go through the roof...
[+] [-] lukaseder|12 years ago|reply
[+] [-] ams6110|12 years ago|reply
[+] [-] Shivetya|12 years ago|reply
[+] [-] shire|12 years ago|reply