Easy Steps to a Complete Understanding of SQL

[+] ams6110|12 years ago|reply

SQL JOIN tables should be used rather than comma-separated tables

Having used SQL since long before the ANSI JOIN syntax was well supported (first Sybase, then MS SQL and then Oracle) I resisted it for a long time out of habit, and also because at first the syntax was buggy when used in Oracle.

But I have come around to being in favor of it. The bugs have been fixed, and the main advantages are that: inner and outer joins are more clearly stated than by using '*=', or '(+)' suffixes on one side of a predicate; and the join criteria are clearly separated from the WHERE clause. It makes it much easier to see how the tables are being joined vs. how the results are being limited.

[+] chris_wot|12 years ago|reply

They stop Cartesian products from accidentally occurring. That alone should force the use of ANSI joins. cross joining two million row tables is not a pleasant experience!

[+] batbomb|12 years ago|reply

Limiting syntax to use only ANSI joins also means that you end up with one and only one table as the root table (or view) to select from. I think this makes code much clearer.

I tend to writ my SQL like lisp, very compositional with lots of embedded views, in order to be very explicit on how I want tables and views joined. If the optimizer is good, it will know what parts of your statement it can optimize, and if it's not, it usually follows my code, and I don't have to resort to Optimizer hints in order for Oracle to get its black magic done.

[+] einhverfr|12 years ago|reply

I think the key is that join condition problems are easier to spot. There are several types query issues which are classic join condition issues. If you use cross join and where, these are harder to spot when troubleshooting. I think this is a bigger issue actually than accidental cross-joins. The latter sometimes happens occasionally. The former happens much more frequently and so wins there in terms of debugging time are quite important.

[+] zamalek|12 years ago|reply

The way I see SQL in my mind is programming Venn Diagrams ( http://en.wikipedia.org/wiki/Venn_diagram ) followed by creating a projection that uses one or more of the areas within that diagram (contrasting strongly with imperative languages, i.e. logic programming). Unfortunately it's not straight-forward to do that because SQL is a superset of Venn Diagrams, but that line of thinking is where you need to be.

[+] ams6110|12 years ago|reply

Yes, exactly. To use SQL well you need to think in terms of sets and set operations. Venn Diagrams are a good idea to help visualize this. If you find yourself writing a lot loops over cursors there's a good chance you're doing it wrong.

[+] jfb|12 years ago|reply

I thought this was a very good article, unusually so for a blog post about databases. I do take issue, however, with the idea that there's a beauty to SQL. There's a beauty in the conceptual model, but you have to squint sideways to see it through the deeply hostile syntax.

[+] clarkevans|12 years ago|reply

I'm not sure the "Relational Model" is even a great conceptual model for common business or analytical processing. I think SQL is ugly because the model is a poor fit; if the model was solid, a clean syntax would follow naturally. As for alternatives, MDX has superseded it for core "pivot table" analytics. For application processing, it seems modern document/hierarchical databases are good for many transactional needs.

What modern relational databases have... is a very intuitive abstract storage model. That's awesome. We've been working on an alternative "navigational model" (inspired from CODASYL and ORMs) based on this storage model. Our experimental query language is at http://htsql.org

Our critique of SQL is at http://htsql.org/doc/overview.html#why-not-sql

[+] mathattack|12 years ago|reply

I found the relational beauty in being able to "store every unique value just once." It resonated with the programming concept of "write a function so that you don't need to change things in many places." This in turn gave me a lens to viewing how businesses operate independent of what systems you put on top of the data.

Does this mean SQL is pretty? Not really, but it gets the job done. Though I did have to spend a lot of time optimizing SQL to force certain query optimizations.

Does this mean relational should be used everywhere? No, but it has a distinct place. I think beer is beautiful too, but I won't drink it for breakfast with the in-laws.

[+] einhverfr|12 years ago|reply

I disagree. There are issues with SQL (how many things can NULL mean?) but there is also beauty in it. Every language has issues.

You don't see the beauty in SQL until you try to get as much bang for the buck out of specific queries. Then the idea that you can think in terms of math and sets really shines. You can do a lot in SQL which would require a lot more app code to accomplish, and the SQL will be more elegant and easier to maintain.

[+] chris_wot|12 years ago|reply

The conceptual model is great, but IMHO a little impractical. Anyone who has ever had to trick the optimiser into choosing a better plan by changing the semi join from WHERE EXISTS to NOT IN (or vice versa) will appreciate what I'm saying :-)

[+] rhizome31|12 years ago|reply

Another thing to understand is how NULLs are handled in different contexts, using either the familiar two-valued logic or the more exotic three-valued logic. It's kind of messy but really worth knowing if you're working with SQL. The wikipedia page actually gives a pretty good account of the issue: https://en.wikipedia.org/wiki/Null_%28SQL%29

[+] bsaul|12 years ago|reply

The most beneficial thing you can do when you start with SQL is to NEVER EVER put join clauses in the WHERE section. ALWAYS put the join clauses as close to the table you're joining as possible.

eg : never do SELECT * FROM A JOIN B WHERE A.ID = B.a_id and B.id > 10 but always JOIN B ON A.ID = B.a_id AND B.id > 10

the second way of doing "scale" much much better when you add more tables, and start mixing left and right joins.

[+] lysium|12 years ago|reply

Great article! I wish all SQL libraries would adhere to this structure instead of forcing devs to write glorified (string) SQL. Ie., let me

    from(join(table1, table2))
    .where(table1.col1 > 10)
    .groupBy(col1)
    .project(col1, avg(col2))

and only in that order.

[+] adad95|12 years ago|reply

http://www.linqpad.net/WhyLINQBeatsSQL.aspx

[+] batbomb|12 years ago|reply

I wrote my own library in Java recently for something similar. It was inspired by LINQ and SQL grammar.

     Select statement = table1.select()
       .join(table2,[join condition])
       .where(table1.col1.gt(10))
       .groupBy(table1.col1)
       .selection(col1, Fn.avg(table1.col2))

     Select s = new Select()
        .from(table1)
        .join(table2, table2.t1_col1.eq(table1.col1))
        .where(table1.col1.gt(10))
        .groupBy(table1.col1)
        .selection(col1, Fn.avg(table1.col2));

[+] lukaseder|12 years ago|reply

Check out http://www.jooq.org, by the same author as the article (me)

[+] chris_wot|12 years ago|reply

Have you tried LINQ?

[+] unknown|12 years ago|reply

[deleted]

[+] chris_wot|12 years ago|reply

Be careful with point 6. If you use a subquery through an IN clause and it returns a NULL as one of the rows, nothing will return.

[+] lukaseder|12 years ago|reply

That's only true for NOT IN predicates (which is mentioned in the article)

Example: http://sqlfiddle.com/#!4/d41d8/16785

    with a(x) as (
      select 1    from dual union all
      select 2    from dual union all
      select null from dual
    )
    select 1 from dual where 1 in (select x from a)

[+] hobs|12 years ago|reply

I was just about to say that! Truthfully this is a good introduction but is definitely not everything you need to know about SQL.(Obviously) Handling nulls is just one of many sticklers people using SQL frequently screw up.

[+] saosebastiao|12 years ago|reply

Fantastic article. I clicked on it, hoping to find a way to help some of my less SQL-savvy co-workers to understand SQL a little better. I instead came away with a better understanding of my own.

[+] triplesec|12 years ago|reply

Stuff in here they nevr taught me all those years ago either! And a smart geek is never be too proud for a refresher.

[+] mtdewcmu|12 years ago|reply

I like SQL. The relational model is often underestimated. All of the complex structure that goes beyond individual rows is orthogonal and is itself first-class data. Contrast with document databases where hierarchies are part of the low-level storage mechanism. By making the structure first-class data, the relational model keeps you from painting yourself into a corner.

[+] ExpiredLink|12 years ago|reply

> If a has 3 columns and b has 5 columns, then the "output table" will have 8 (3 + 5) columns. The records contained in this combined table reference are those of the cross product / cartesian product of a x b.

This is where the easy steps become hard for real noobs.

[+] pbhjpbhj|12 years ago|reply

A graphical example of a Cartesian join - http://web.utk.edu/sas/OnlineTutor/1.2/en/60477/m71/m71_3.ht....

[+] julochrobak|12 years ago|reply

here are some other graphical illustrations of the operators from relational algebra - http://bandilab.org/bandicoot-algebra.pdf

It's focusing on bandicoot language, but concepts are taken from the relational algebra. It should match the logic of SQL.

[+] jcampbell1|12 years ago|reply

I think the author has the order wrong. Consider:

    SELECT WEEKDAY(created_at) wkday, COUNT(1) 
    FROM orders
    GROUP BY wkday

The above query works fine, thus GROUP BY, ORDER BY, and HAVING are all aware of SELECT.

[+] chris_wot|12 years ago|reply

Those are expressions. They only work once you have picked the table(s) - FROM - and determined the rows you want returned - WHERE. The query then groups and orders the results and calculates the scalar value of COUNT(1) which it repeats on every row, then it determines the weekday for every row for the column created_at and uses that value for output.

[+] siddboots|12 years ago|reply

Which RDBMS? This would not work on Oracle. You would need "group by WEEKDAY(created_at)" in this example.

[+] lukaseder|12 years ago|reply

True, the explanation was maybe a bit simplistic. Some expressions are allowed to appear even if they're not declared in the GROUP BY clause...

[+] harrytuttle|12 years ago|reply

Not a bad article but it needs a step 11:

Know thy database engine for if you do not, your local DBA will be mightily pissed off when you do a cross join across 15 tables and watch his/her IOPS go through the roof...

[+] lukaseder|12 years ago|reply

Well, to beginners it's always good to express things in a positive way. The "DBA rules" is part of the 2nd year of SQL experience tutorial :-)

[+] ams6110|12 years ago|reply

This is why you have a test instance for developers (or more likely now, every developer runs his own engine locally or in a VM).

[+] Shivetya|12 years ago|reply

how many platforms do not permit setting limits on processing of queries?

[+] shire|12 years ago|reply

Thanks for this article very beneficial, I'm learning about Databases at the moment so this definitely comes in handy while learning SQL.

80 comments