Postgres gets support for upsert

[+] pbnjay|10 years ago|reply

I think I squealed with delight when I read this title. This is really a great addition, and will simplify so much of my code once I'm able to upgrade.

I try not to think too deeply about all the DB internals (some people are much smarter than me about this stuff), but clearly there are a lot of things that need maintained internally to make this correct, so kudos to all the people who contributed to this.

[+] craigkerstiens|10 years ago|reply

A big +1 to this. Peter Geoghegan is the primary author and has been championing it for over 2 Postgres releases now. I know he's especially excited to see it land.

Also it couldn't have been possible without the support of Heikki and Andres (who actually submitted this to HN as his first ever submission).

[+] anilshanbhag|10 years ago|reply

This is actually huge. A common problem that arises when you write applications is you want to INSERT if key does not exist else UPDATE. The right way of doing this without an upsert is using a transaction. However this will make life easier as you can do it directly in one SQL statement.

[+] colanderman|10 years ago|reply

Just "using a transaction" is insufficient. You must be prepared to handle the case that neither the INSERT nor the UPDATE succeeds (with READ COMMITTED isolation), or that the transaction fails (with REPEATABLE READ isolation or better), by repeating the transaction. And if latency is at all a concern to you, you must wrap this all in a stored procedure to avoid the necessary round-trip-time between the commands.

Hence this is more than just saving typing a couple lines -- this saves writing entire stupid loops to do what is conceptually a simple (and very common) operation.

Postgres gets better and better.

[+] ryan0x00|10 years ago|reply

i've seen a system in mysql do "insert on duplicate key update" for ages, so this is that same pattern?

[+] NoMoreNicksLeft|10 years ago|reply

Why do you need to use a transaction for that?

Do the update statement, followed by insert-where-not-exists. If the update doesn't match, 0 rows updated, and the insert works. If the update matches, then 0 rows inserted.

[+] viraptor|10 years ago|reply

I love that it's there now. I've been waiting for it for a long time, but one thing I don't get is... why does every single implementation have to have their own slightly different syntax?

pgsql -> on conflict

mysql -> replace / on duplicate

oracle -> merge

mssql -> merge

sqlite -> insert or replace

firebird -> merge / update or insert

[+] colanderman|10 years ago|reply

There's a great discussion here: https://wiki.postgresql.org/wiki/UPSERT#Syntax_discussion It basically boils down to that the different syntaxes mean slightly different things. The PG devs (IMO rightly) don't want to adopt an existing syntax that does something slightly different than what they are willing/able to provide, thus confusing users.

[+] anarazel|10 years ago|reply

MERGE is actually a separate feature. We're still considering implementing it. But it's syntax is too cumbersome for many usages. There's also some unrelated complexity - so implementing merge support does get easier by the infrastructure in here, but a fair amount of work remains.

[+] daigoba66|10 years ago|reply

I'm a fan of MERGE in T-SQL. An "upsert" is just one use-case enabled by the statement. Granted for the relatively simple use-case of an "upsert", especially a single row, MERGE has incredibly cumbersome syntax.

But MERGE also, in my opinion, aligns better with set-based logic. I.e., I have two _sets_ of data that I want to merge together. In some cases I want need to INSERT rows into the target, in other cases I need to UPDATE, and in some cases I might DELETE. The <ouput_clause> in T-SQL is also quite useful.

[+] nolok|10 years ago|reply

I'm not very familiar with those specific features in other dbms, but in mysql's case "on duplicate" and "replace" are two very different thing. The first is an upsert, while the other drops the old row if it exists and always make a new one.

Both have different use cases.

[+] DigitalSea|10 years ago|reply

Yet another reason why I absolutely love Postgres. It might have taken four years, but we finally got what many of us have been asking for. Nothing ruined my day more than having to write insert-update loops, I am beyond ecstatic for this.

[+] dan15|10 years ago|reply

You love Postgres because it takes four years to get requested features?

[+] petergeoghegan|10 years ago|reply

The documentation covering the new feature has now built: http://www.postgresql.org/docs/devel/static/sql-insert.html

[+] electrotype|10 years ago|reply

Postgres's documentation is one of the best I've seen. Thanks to everybody working on it!

[+] endymi0n|10 years ago|reply

For the backstory on why this has taken so long: http://www.depesz.com/2012/06/10/why-is-upsert-so-complicate...

[+] kbendyk|10 years ago|reply

I know it doesn't matter to some developers out there, but for me, the only thing left is the ability to add a column after another one. Even if it's only logical order, it's easier for me to read a table structure when fields are ordered nicely.

[+] leftnode|10 years ago|reply

Same thing here, I would love this feature. I like to make my tables have a common structure:

    id
    created_at
    updated_at
    status
    fkey1_id
    fkey2_id
    fkey3_id
    date_field1
    date_field2
    column1
    column2
    column3

Being able to add a new foreign key column with the "rest" of them would be awesome.

[+] jeltz|10 years ago|reply

Work on this started recently, but since it is an open source project there is no way to know if this will make it to 9.6 or not (the work was start too late to get into 9.5).

http://www.postgresql.org/message-id/flat/20141209174146.GP1...

[+] narsil|10 years ago|reply

Unsure what you mean. A subset of columns can be selected in any order, across tables. Were you referring to the select order used by "*"?

[+] corford|10 years ago|reply

I'd love this too. I wanted materialised views and upsert, so here's hoping :)

[+] ozgune|10 years ago|reply

So excited about upserts in Postgres!

This was the most requested feature in PostgreSQL: https://postgresql.uservoice.com/forums/21853-general

[+] olalonde|10 years ago|reply

Which version of Postgres will this land in?

[+] amitlan|10 years ago|reply

9.5, that is, upcoming!

[+] piker|10 years ago|reply

I am extremely happy to hear about this. We currently insert, catch the exception and update if necessary. That can lead us to race conditions in concurrent transactions. Thank you very much for your excellent work on Postgres. It continues to be an amazing product to work with.

[+] abannin|10 years ago|reply

I want to give a giant digital hug to all PG committers who made this happen.

[+] taspeotis|10 years ago|reply

Who decided the syntax? Why not use MERGE ... INTO ... USING ... ON ... WHEN like Oracle and MSSQL and apparently ANSI SQL [1]? I'm not aware of ON CONFLICT being standard.

[1] http://en.wikipedia.org/wiki/Merge_(SQL)

[+] veddan|10 years ago|reply

There's some extended discussion of the issue on the wiki[1]. A future full MERGE implementation might have slightly different semantics or performance characteristics than ON CONFLICT (which might also differ from other DBMS).

[1] https://wiki.postgresql.org/wiki/UPSERT#SQL_MERGE_syntax

[+] ivoras|10 years ago|reply

Does the description seems like it's a two-pass operation to anyone else? Can anyone with more detailed knowledge tell why can't this be a single-pass "INSERT in a unique index, UPDATE if it fails" operation?

[+] anarazel|10 years ago|reply

To understand the problem one first has to know that for better or worse postgres does not use index organized tables. Additionally this ought to work not only on a primary key but also on additional unique constraints. Including partial unique indexes, potentially over expressions (say lower(username) or something).

That makes some approaches harder.

The bigger problem is that 'update if fails' is easier than it sounds. When you find a conflicting row in the index, that row's creating transaction might not yet have committed. So you need to first wait for it to commit, and then retry. But that implies that the transaction could actually roll back or delete the row again. And suddenly there's not actually a row to UPDATE. Thus you need to retry, right?

There's also some issues around avoiding deadlocks. If you "naively" just acquire a 'value lock' on the to-be-inserted row, you can easily get into deadlock territory if more than one row is inserted in one transaction. Such value locks have to be released after an attempt to update. Which then necessitates a looping attempt...

Does that start to explain the problems a bit?

[+] smegel|10 years ago|reply

I remember about 10 years ago using UPSERT in a Teradata database to do delta-diff updating of tables.

Basically, I was updating a table from an external file, and I wanted to add what was new, leave what was the same, and delete any records no longer in the external table. Using UPSERT to insert new rows, update a timestamp on existing rows, and then afterwards delete any rows who's timestamp was before the UPSERT operation. Because of the nature of the file, doing an external patch/diff approach was difficult, but the UPSERT operation worked like a breeze!

[+] donw|10 years ago|reply

Damnit, I just wrote a sweet gem that handles this... ah, okay, but it requires an index constraint, whereas I needed to merge on arbitrary conditions. So I didn't totally waste my time.

Way to go Postgres team!

[+] loopdoend|10 years ago|reply

Is the gem open source? I checked your github and didn't see it, I think people would still find it useful!

[+] corford|10 years ago|reply

I've been looking forward to this for ages! The solution we use at the moment for upsert is a create or replace rule on the table but it isn't great as you need to remember to update the rule whenever you change the table schema...

  CREATE OR REPLACE RULE "replace_row" AS
  ON INSERT TO my_table
  WHERE EXISTS(SELECT 1 FROM my_table WHERE id=NEW.id)
  DO INSTEAD
  (UPDATE my_table SET
  col1=NEW.col1,
  col2=NEW.col2,
  col3=NEW.col3,
  WHERE id=NEW.id);

[+] anarazel|10 years ago|reply

That unfortunately is not safe on several fronts.

For one, the WHERE EXISTS() will not see concurrent insertions by transactions that are still in progress, which means you'll still get constraint violation errors.

For another, rules generally have very surprising behaviour. E.g. with this RULE you'll get into trouble if "id" isn't passed in as a explicit value, but uses a non-deterministic DEFAULT or directly passed in expression. Every reference to NEW.id will not be replaced by the result of that expression, but rather with the expression itself. For the common serial id column (aka autoincrement) you'll have a separate nextval() call in each reference. Which can make this explode in pretty damn confusing ways.

[+] agotterer|10 years ago|reply

Does anyone know when this is expected to be released?

[+] elchief|10 years ago|reply

New versions of Postgres usually ship in September. Except 9.4, which slipped to December.

[+] asnyder|10 years ago|reply

As someone that's been using PostgreSQL way before it was cool (since 2005), I'm super happy to see this. Previously I had to do EXISTS, IF FOUND UPDATE ELSE INSERT. Now I can just do upsert, so nice.

Very thankful to all the hard working Postgres devs. PostgreSQL was really good when I first started using it, and just keeps getting better. Glad to see the general community finally getting on the PostgreSQL bandwagon.

Now let's see if I can get more people to use internal PostgreSQL functions instead of constantly writing statements directly in their code.

[+] Mister_Snuggles|10 years ago|reply

This reminds me of a common idiom in the Pick-style databases that I've worked on...

    READU REC FROM FILE,KEY ELSE
        REC<1> = 'stuff'
        * Set other fields as appropriate for a brand new record - this is the 'insert'
    END
    REC<1> = 'blah'
    * Set other fields as appropriate for an update
    WRITE BLAH TO FILE,KEY ;* Write record and release locks

This can, of course, be wrapped in a transaction.

[+] unknown|10 years ago|reply

[deleted]

109 comments