
Modern SQL in PostgreSQL

446 points | lelf | 11 years ago | slideshare.net

130 comments

[+] rosser|11 years ago|reply
One of my favorite idioms lately for doing bulk updates without incurring lock contention is to chain CTEs, like so:

  with candidate_rows as (
     select id
       from table
      where conditions
      limit 1000
        for update nowait
  ), update_rows as (
     update table
        set column = value
       from candidate_rows
      where candidate_rows.id = table.id
  returning table.id
  )
  select count(1) from update_rows;
...and loop on issuing that query until the "count(1)" returns zero for some number of iterations (three works pretty well).

Want to add a column to your "orders" table and populate it without blocking concurrent writes for as long as it will take to rewrite a multi-million row table? Want to re-hash user passwords using something stronger than MD5, but not prevent users from ... you know, logging in for the duration?

CTEs are all that and the bag of chips.

[+] lastofus|11 years ago|reply
Do you have any resources for learning cool advanced SQL like this? I've used CTEs in the past, just from reading about them in the documentation, but I feel like I'm only scratching the surface of what's possible out there.
[+] jeffdavis|11 years ago|reply
"Want to add a column to your "orders" table and populate it without blocking concurrent writes for as long as it will take to rewrite a multi-million row table?"

PostgreSQL ALTER TABLE ... ADD COLUMN is an O(1) operation, and requires no data rewrite (as long as you are OK with NULLs).

You can even add an attribute to a composite type, and existing tables using that type will see the extra attribute. Again, O(1), no data rewrite.
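
A quick sketch of the difference (table and column names invented for illustration):

    -- Adding a nullable column with no default only touches the catalog;
    -- existing rows are untouched, so this returns immediately even on a
    -- multi-million row table:
    ALTER TABLE orders ADD COLUMN tracking_code text;

    -- By contrast, adding a column WITH a default (as of the 9.x releases
    -- discussed here) rewrites every row:
    ALTER TABLE orders ADD COLUMN status text DEFAULT 'new';

The fast path then pairs nicely with the batched-update idiom above for backfilling values afterwards.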

[+] kibibu|11 years ago|reply
> re-hash user passwords using something stronger than MD5

Isn't this only possible if you do it on user login - unless you are also cracking user passwords... Actually has anybody cracked their own MD5 user passwords to upgrade them?

[+] platz|11 years ago|reply
Is this really different from using a transaction with separate read (using an update lock) and update statements, and looping over that in the same way from the application until the count is 0?
[+] nroose|11 years ago|reply
I was interested in NoSQL, so I went to a Mongo presentation. Both examples would have been easier and faster in SQL, even the SQL I used 20 years ago. I asked for an example that would show a performance advantage; I got a tired, vague statement in response. Seems like snake oil to me.
[+] cgh|11 years ago|reply
Unfortunately you were downvoted, but it wasn't so long ago on HN that every second story was NoSQL this, NoSQL that. There were even "SQL is dead"/"relational DBs are dead" posts, just ridiculous. So it's nice to see stories like this.

I've been writing a lot of recursive queries for PostgreSQL lately using CTEs. Quite cool, though a little mind-bending at times.
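
For anyone who hasn't tried them, a minimal recursive CTE, walking an invented org-chart table (`employees`, with a self-referencing `manager_id`):

    WITH RECURSIVE subordinates AS (
        SELECT id, manager_id, name
          FROM employees
         WHERE id = 1                  -- the root of the traversal
        UNION ALL
        SELECT e.id, e.manager_id, e.name
          FROM employees e
          JOIN subordinates s ON e.manager_id = s.id
    )
    SELECT * FROM subordinates;

The non-recursive branch seeds the working set; the recursive branch joins back against it until no new rows are produced.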

[+] vidarh|11 years ago|reply
"NoSQL" is generally a misnomer. It's not SQL that is/was the problem, but that there are a lot of cases where specific common properties of RDBMSs are limiting. The "NoSQL" moniker is a result of the fact that most of these RDBMSs use SQL as the query language, while most of the "new" database engines do/did not.

Since then, a lot of the RDBMSs have adopted features that have reduced the gap. E.g. Postgres' rapidly improving support for indexed JSON data means that for cases where you have genuine reasons to have data you don't want a schema for, you can just operate on JSON (and best of all, you get to mix and match).
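
A minimal sketch of that jsonb-plus-index combination (9.4+; the `events` table and keys are invented for illustration):

    CREATE TABLE events (id serial PRIMARY KEY, payload jsonb);
    CREATE INDEX ON events USING gin (payload);

    -- containment queries like this one can use the GIN index:
    SELECT id FROM events WHERE payload @> '{"type": "login"}';

And because it's just a column, you can join it against ordinary relational tables in the same query.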

For some of the NoSQL databases that puts them in a pickle because they're not distinguishing themselves enough to have a clear value proposition any more.

But it is not the lack of SQL that has been the real value proposition.

[+] lkrubner|11 years ago|reply
> Both examples would have been easier and faster in SQL.

That's easy, so long as we mean the entire project when we say "faster". When I worked at Timeout.com they were importing information about hotels from a large number of sources. For some insane reason, they were storing the data in MySQL. Processing was a two-step process:

1.) the initial import was done with PHP

2.) a later phase normalized all the data to the schema that we wanted, and this was written in Scala

The crazy thing was that, during the first phase, we simply pulled in the data and stored it in the form that the 3rd party was using. That meant that we had a separate schema for every 3rd party that we imported data from. I think we pulled data from 8 sources, so we had 8 different schemas. When the 3rd party changed their schema, we had to change ours. If we added a 9th source of information, then we would have to create a 9th schema in MySQL. We also checked the 3rd party schema at this phase, which struck me as silly: it did not mean that step 2 could be innocent of the schema, since both step 1 and step 2 still had to know the structure of those foreign schemas. But it was necessary because we were writing to a database that had a schema.

The system struck me as verbose and too complicated.

It's important to note that most of the work involved with step 1 could be skipped entirely if we used MongoDB. Simply import documents, and don't care about their schema. Dump all the data we get in MongoDB. Then we can move straight to step 2, which is taking all those foreign schemas and normalizing them to the schema that we wanted to use.

For ETL situations like this, NoSQL document stores offer a huge convenience. Just grab data and dump it somewhere, and simplify the process. Your transformation phase is the only phase that should have to know about schemas; the import phase should be allowed to focus on the details of getting data and saving it.

[+] juliangregorian|11 years ago|reply
I'm not about to go to bat for Mongo in an SQL thread (there are plenty of real problems with that platform), but I rather enjoy their query syntax; it's very AST-like, even through the aggregation pipeline.

I don't believe it's much faster even in the best case (and I'm sure an experienced SQL expert wouldn't find it any "easier"), but on a grammatical level I do find it a fresh take on query structure, and writing queries and map-reduce jobs in CoffeeScript was extremely satisfying because of how terse, elegant, and pseudocode-like it turned out.

[+] dtech|11 years ago|reply
MySQL (and thus MariaDB, I presume) and SQLite seem to be pretty poor at supporting these "new" features.

SQLite seems logical, because it needs to be kept lean for embedding purposes, but does anyone know why MySQL is lagging behind so much?

[+] jeffdavis|11 years ago|reply
It seems to be a philosophical difference. A lot of people seem to think that the database should only support primitive operations, and the rest should be left to application code.

I am not one of those people -- I think a good database system (like postgres) can make many things dramatically simpler.

[+] barrkel|11 years ago|reply
My experience with PostgreSQL and MySQL is that PostgreSQL supports lots of features, but performance needs a very careful eye - often a neat feature isn't as useful as it seems because optimization isn't good enough. Whereas MySQL tends to be very good at what it can do, and makes it easy to do silly things, like turn off referential integrity in a slow query to boost performance.

It almost feels like a worse-is-better story. As a programmer, PostgreSQL is much better to work with; more tools, better EXPLAIN, more features, more types, more of almost everything. But to use in the heat of battle, it's less clear-cut. PostgreSQL's replication story is complicated. MySQL master-master replication is fairly easy to set up, and if you use a master as a hot failover, it all mostly just works; when the primary site comes back up, it resyncs with the failover. PostgreSQL has a lot of different replication stories - without a strong central narrative, it's hard to gain confidence.

[+] weaksauce|11 years ago|reply
Oracle influence? Postgres is now the leader in mindshare and features among the open-source offerings. If you like MySQL, you could check out MariaDB.
[+] threeseed|11 years ago|reply
> but do people know why MySQL is lagging behind so much

Because for 99% of the MySQL user base there isn't a need for these features.

Those using MySQL at scale are using it as a dumb key-value store with horizontal sharding. Those who aren't are typically using it with an ORM, so they aren't dealing with the database at the SQL layer.

Everyone else who is manually writing SQL generally was on or moved to Oracle, Teradata, SQL Server, PostgreSQL etc anyway.

[+] mjrpes|11 years ago|reply
One thing that would be nice is if SQL provided first class support for sub records.

So instead of:

    SELECT name,
           (SELECT GROUP_CONCAT(CONCAT_WS(',', post_id, post) SEPARATOR ';')
              FROM posts p
             WHERE p.user_id = u.user_id) AS posts
      FROM users u
     WHERE u.user_id = 1;

you could do:

    SELECT name,
           (SELECT post_id, post
              FROM posts p
             WHERE p.user_id = u.user_id) AS posts
      FROM users u
     WHERE u.user_id = 1;

and the query result would be:

    { name: 'Todd', posts: [ { post_id: 1, post: 'My Comment' } ] }

Obviously this is a simple example and could have been rewritten as a query on the posts table, inner joined on the user table, and duplicating the user's name in the result. But it becomes much nicer to have as queries get more complex.

A query that supports sub records would give you the flexibility to structure data like a JSON object and simplify the server end of REST APIs.

[+] exogen|11 years ago|reply
With Postgres:

    SELECT u.name, json_agg(p) AS posts
      FROM users u, posts p
     WHERE p.user_id = u.user_id
     GROUP BY u.user_id;
[+] qooleot|11 years ago|reply
If you're just trying to get JSON out (for a simple query to REST api, or in a node.js environment), have you considered converting the record/recordset to json in the subquery?

The JSON functions in 9.3+ are pretty handy for that sort of thing. Andrew (the core developer who wrote most of that functionality) and I decided to keep the API pretty lightweight, as it's easy to add your own functions to suit your needs.
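
A sketch of that subquery approach, reusing the same invented users/posts tables from upthread:

    SELECT u.name,
           (SELECT json_agg(p)
              FROM posts p
             WHERE p.user_id = u.user_id) AS posts
      FROM users u
     WHERE u.user_id = 1;

The scalar subquery returns a single JSON array per user, so users with no posts come back as NULL rather than disappearing from the result, which a plain join-and-group version wouldn't give you without an outer join.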

[+] zapov|11 years ago|reply
Postgres and Oracle support such constructs.

And Revenj has been using it for years: https://github.com/ngs-doo/revenj

And yes, Revenj now comes with an offline compiler ;)

[+] headgasket|11 years ago|reply
Postgresql 9.4 with jsonb sends mongo to the dustbin, IMHO. If you have to write it in js close to the data or if plpgsql is too steep of a learning curve, you can play with the experimental plv8.

But you should really pick up plpgsql. It's "Python" powerful, with an awesome db behind it (and a Python 2.x-consistent API, sadly, but the docs are very good). There is a great Sublime Text 2 package that makes writing and debugging functions in one file just awesome. Write an uncalled dummy function that has a lot of the API in it at the top of your file, and you'll get autocomplete on that part of the API.

In particular, do not miss getting acquainted with json and hstore, especially using json as a variable-size argument passing and returning mechanism; it's just hilariously effective.

cheers, and keep making this place(not only HN, our blue dot) better, F

[+] gamesbrainiac|11 years ago|reply
Anyone know where I can get the full video lecture for this? I'd really like to see it.
[+] fatalmind|11 years ago|reply
I gave this talk again on Saturday in Moscow, and it was videotaped there. The organizers told me they'll publish the English original as well as the Russian voice-over version. I'm keen for the second one ;)

I'll keep you posted.

[+] jiphex|11 years ago|reply
It was given at FOSDEM, the videos should be available "soon".
[+] leoh|11 years ago|reply
Interesting. I don't think django implements a lot of this in their ORM.
[+] qooleot|11 years ago|reply
I work on several Django apps, and also write advanced SQL in Postgres (CTEs, plv8 functions, materialized views, etc.). One thing the ORM allows, though, is writing a custom SQL query, and it's pretty easy to 'cast' the SQL results into a Django model.

With that said, it's still sometimes a struggle to justify when almost all the queries are highly custom and the app doesn't require Django Admin. Certainly there are advantages, but the minimalism of Flask, with its focus on the REST interface for integration and building microservices, is more appealing in some cases.

[+] jfb|11 years ago|reply
SQLAlchemy supports many of these advanced concepts.
[+] hcarvalhoalves|11 years ago|reply
Django's ORM is very limited, it doesn't even expose GROUP BY. I don't think there will ever be support for more powerful constructs like these.
[+] juliangregorian|11 years ago|reply
I often find myself wishing that SQL had a more uniform syntax. So much of it resembles COBOL to me, with all the capital letters and "english-like" arbitrary word order. At the same time, relational algebra exists and is very elegant, in an APL-like way. Would be very interested if there was a project to expose such an api to SQL databases.
[+] ris|11 years ago|reply
SQLAlchemy (for python) tries to sanitize the syntax away from SQL with some level of success.
[+] collyw|11 years ago|reply
SQL is case insensitive. It is a bit quirky though.
[+] xvirk|11 years ago|reply
Is it even possible?
[+] liprais|11 years ago|reply
According to my understanding, Teradata has supported these so-called "modern" features for several years.
[+] ecopoesis|11 years ago|reply
And yet, despite all these cool features, Postgres still doesn't support upsert or merge, which even MySQL manages to get right.
[+] rosser|11 years ago|reply
What's really funny about this comment is that, while all of the features discussed in TFA are fully standards-compliant SQL, MySQL implements none of them, and its "upsert" isn't.

EDIT: And before one slags PostgreSQL too hard for not currently supporting upsert, one might peruse the relevant pg wiki page [1] to better understand where development stands and why we don't yet have it. (Hint: it's actually kinda complicated, assuming you want it done, you know, "right".)

[1] https://wiki.postgresql.org/wiki/UPSERT
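
For context, the standard workaround at the time was a plpgsql retry loop, along the lines of the merge example in the PostgreSQL documentation (the kv table and function name here are invented):

    CREATE FUNCTION merge_kv(k text, v text) RETURNS void AS $$
    BEGIN
        LOOP
            -- first, try to update an existing row
            UPDATE kv SET value = v WHERE key = k;
            IF found THEN
                RETURN;
            END IF;
            -- not there: try to insert; if another session inserts the
            -- same key first, we get a unique_violation and retry
            BEGIN
                INSERT INTO kv (key, value) VALUES (k, v);
                RETURN;
            EXCEPTION WHEN unique_violation THEN
                -- do nothing; loop back and try the UPDATE again
            END;
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;

Even this only handles the single-row case and relies on a unique constraint being in place, which hints at why a general built-in UPSERT took so long to get right.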