I think I squealed with delight when I read this title. This is really a great addition, and will simplify so much of my code once I'm able to upgrade.
I try not to think too deeply about all the DB internals (some people are much smarter than me about this stuff), but clearly there are a lot of things that need to be maintained internally to make this correct, so kudos to all the people who contributed to this.
A big +1 to this. Peter Geoghegan is the primary author and has been championing it for over 2 Postgres releases now. I know he's especially excited to see it land.
Also, it wouldn't have been possible without the support of Heikki and Andres (who actually submitted this to HN as his first ever submission).
This is actually huge. A common problem that arises when you write applications is that you want to INSERT if the key does not exist, else UPDATE. The right way of doing this without an upsert is to use a transaction. This will make life easier, as you can do it directly in one SQL statement.
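For anyone who hasn't seen the new clause yet, a minimal sketch of the 9.5 statement (table and column names are made up):

```sql
-- Hypothetical "users" table with a unique constraint on email.
-- EXCLUDED refers to the row that was proposed for insertion.
INSERT INTO users (email, name, visits)
VALUES ('a@example.com', 'Alice', 1)
ON CONFLICT (email)
DO UPDATE SET name   = EXCLUDED.name,
              visits = users.visits + 1;
```

One statement, and the insert-or-update decision happens atomically inside the server.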
Just "using a transaction" is insufficient. You must be prepared to handle the case where neither the INSERT nor the UPDATE succeeds (with READ COMMITTED isolation), or where the transaction fails (with REPEATABLE READ isolation or better), by repeating the transaction. And if latency is at all a concern to you, you must wrap all this in a stored procedure to avoid the round-trip time between the commands.
Hence this is more than just saving typing a couple lines -- this saves writing entire stupid loops to do what is conceptually a simple (and very common) operation.
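For reference, the loop in question is essentially the old merge_db example from the pre-9.5 Postgres docs; a sketch of that plpgsql pattern (table and function names changed):

```sql
-- Pre-9.5 upsert: loop until either the UPDATE or the INSERT wins.
CREATE FUNCTION upsert_data(k INT, v TEXT) RETURNS VOID AS
$$
BEGIN
    LOOP
        UPDATE data SET value = v WHERE key = k;
        IF FOUND THEN
            RETURN;  -- row existed, update succeeded
        END IF;
        BEGIN
            INSERT INTO data (key, value) VALUES (k, v);
            RETURN;  -- row didn't exist, insert succeeded
        EXCEPTION WHEN unique_violation THEN
            -- Someone inserted concurrently; loop and try the UPDATE again.
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
```

Note that the loop really is necessary: between the failed UPDATE and the INSERT, a concurrent session can insert the row, and between a unique_violation and the retried UPDATE, it can delete it again.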
Do the update statement, followed by insert-where-not-exists. If the update doesn't match, 0 rows updated, and the insert works. If the update matches, then 0 rows inserted.
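That pattern looks roughly like this (hypothetical table), though note that without a surrounding retry, two concurrent sessions can both miss the UPDATE and then collide on the INSERT:

```sql
-- Step 1: try the update; affects 0 rows if the key doesn't exist.
UPDATE counters SET hits = hits + 1 WHERE name = 'page';

-- Step 2: insert only if no matching row exists.
INSERT INTO counters (name, hits)
SELECT 'page', 1
WHERE NOT EXISTS (SELECT 1 FROM counters WHERE name = 'page');
```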
I love that it's there now. I've been waiting for it for a long time, but one thing I don't get is... why does every single implementation have to have their own slightly different syntax?
There's a great discussion here: https://wiki.postgresql.org/wiki/UPSERT#Syntax_discussion
It basically boils down to the different syntaxes meaning slightly different things. The PG devs (IMO rightly) don't want to adopt an existing syntax that does something slightly different from what they are willing/able to provide, thus confusing users.
MERGE is actually a separate feature. We're still considering implementing it, but its syntax is too cumbersome for many usages. There's also some unrelated complexity: implementing MERGE support does get easier with the infrastructure added here, but a fair amount of work remains.
I'm a fan of MERGE in T-SQL. An "upsert" is just one use-case enabled by the statement. Granted, for the relatively simple use-case of an "upsert", especially for a single row, MERGE has incredibly cumbersome syntax.
But MERGE also, in my opinion, aligns better with set-based logic. I.e., I have two _sets_ of data that I want to merge together. In some cases I need to INSERT rows into the target, in other cases I need to UPDATE, and in some cases I might DELETE. The <output_clause> in T-SQL is also quite useful.
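A T-SQL sketch of all three cases in one statement (table names hypothetical): merge a staging set into a target, and report what happened to each row via OUTPUT.

```sql
MERGE INTO target AS t
USING staging AS s
    ON t.id = s.id
WHEN MATCHED THEN
    UPDATE SET t.value = s.value          -- row exists in both: update
WHEN NOT MATCHED BY TARGET THEN
    INSERT (id, value) VALUES (s.id, s.value)  -- only in staging: insert
WHEN NOT MATCHED BY SOURCE THEN
    DELETE                                -- only in target: delete
OUTPUT $action, inserted.id, deleted.id;  -- audit what happened per row
```

The single-row upsert case buries its intent under all this machinery, which is the cumbersome-syntax complaint above.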
I'm not very familiar with those specific features in other DBMSs, but in MySQL's case "on duplicate" and "replace" are two very different things. The first is an upsert, while the other drops the old row if it exists and always makes a new one.
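Side by side, with a hypothetical table `t` keyed on `id`:

```sql
-- Upsert: keeps the existing row, changes only what you ask for.
INSERT INTO t (id, hits) VALUES (1, 1)
ON DUPLICATE KEY UPDATE hits = hits + 1;

-- Not an upsert: deletes any old row with id = 1 and inserts a fresh one,
-- resetting every column you didn't mention back to its default.
REPLACE INTO t (id, hits) VALUES (1, 1);
```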
Yet another reason why I absolutely love Postgres. It might have taken four years, but we finally got what many of us have been asking for. Nothing ruined my day more than having to write insert-update loops, I am beyond ecstatic for this.
I know it doesn't matter to some developers out there, but for me, the only thing left is the ability to add a column after another one. Even if it's only logical order, it's easier for me to read a table structure when fields are ordered nicely.
Work on this started recently, but since it is an open source project there is no way to know whether it will make it into 9.6 or not (the work started too late to get into 9.5).
I am extremely happy to hear about this. We currently insert, catch the exception and update if necessary. That can lead us to race conditions in concurrent transactions. Thank you very much for your excellent work on Postgres. It continues to be an amazing product to work with.
Who decided the syntax? Why not use MERGE ... INTO ... USING ... ON ... WHEN like Oracle and MSSQL and apparently ANSI SQL [1]? I'm not aware of ON CONFLICT being standard.
There's some extended discussion of the issue on the wiki[1]. A future full MERGE implementation might have slightly different semantics or performance characteristics than ON CONFLICT (which might also differ from other DBMS).
Does the description seem like it's a two-pass operation to anyone else? Can anyone with more detailed knowledge explain why this can't be a single-pass "INSERT in a unique index, UPDATE if it fails" operation?
To understand the problem, one first has to know that, for better or worse, Postgres does not use index-organized tables. Additionally, this ought to work not only on a primary key but also on additional unique constraints, including partial unique indexes, potentially over expressions (say, lower(username) or something).
That makes some approaches harder.
The bigger problem is that 'update if it fails' is harder than it sounds. When you find a conflicting row in the index, that row's creating transaction might not have committed yet. So you need to first wait for it to commit, and then retry. But that implies that the transaction could actually roll back, or delete the row again, and suddenly there's no row to UPDATE. Thus you need to retry, right?
There are also some issues around avoiding deadlocks. If you "naively" just acquire a 'value lock' on the to-be-inserted row, you can easily get into deadlock territory when more than one row is inserted in a single transaction. Such value locks have to be released after an attempt to update, which then necessitates a looping attempt...
I remember about 10 years ago using UPSERT in a Teradata database to do delta-diff updating of tables.
Basically, I was updating a table from an external file, and I wanted to add what was new, leave what was the same, and delete any records no longer in the external file. I used UPSERT to insert new rows and update a timestamp on existing rows, then afterwards deleted any rows whose timestamp was from before the UPSERT operation. Because of the nature of the file, an external patch/diff approach was difficult, but the UPSERT operation worked like a breeze!
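With the new Postgres syntax, that load pattern could be sketched like this (table and column names hypothetical):

```sql
-- 1. Upsert each record from the file, stamping it with the load time.
INSERT INTO items (id, payload, seen_at)
VALUES (1, 'data from file', now())   -- repeated for each file record
ON CONFLICT (id)
DO UPDATE SET payload = EXCLUDED.payload,
              seen_at = EXCLUDED.seen_at;

-- 2. Rows whose stamp predates this load are gone from the file: delete them.
DELETE FROM items
WHERE seen_at < $1;   -- $1 = timestamp captured just before the load began
```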
Damnit, I just wrote a sweet gem that handles this... ah, okay, but it requires an index constraint, whereas I needed to merge on arbitrary conditions. So I didn't totally waste my time.
I've been looking forward to this for ages! The solution we use at the moment for upsert is a create or replace rule on the table but it isn't great as you need to remember to update the rule whenever you change the table schema...
CREATE OR REPLACE RULE "replace_row" AS
ON INSERT TO my_table
WHERE EXISTS (SELECT 1 FROM my_table WHERE id = NEW.id)
DO INSTEAD
    (UPDATE my_table SET
        col1 = NEW.col1,
        col2 = NEW.col2,
        col3 = NEW.col3
     WHERE id = NEW.id);
For one, the WHERE EXISTS() will not see concurrent insertions by transactions that are still in progress, which means you'll still get constraint violation errors.
For another, rules generally have very surprising behaviour. E.g. with this RULE you'll get into trouble if "id" isn't passed in as an explicit value, but comes from a non-deterministic DEFAULT or a directly passed-in expression. Every reference to NEW.id is replaced not by the result of that expression, but by the expression itself. For the common serial id column (aka autoincrement) you'll get a separate nextval() call at each reference, which can make this explode in pretty damn confusing ways.
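For comparison, on 9.5 the whole rule can be dropped in favour of spelling the conflict handling out per statement (values here are placeholders):

```sql
-- Same columns as the rule above, no rewrite-time surprises:
-- NEW.* expressions are evaluated once, and EXCLUDED refers to the result.
INSERT INTO my_table (id, col1, col2, col3)
VALUES (1, 'a', 'b', 'c')
ON CONFLICT (id)
DO UPDATE SET col1 = EXCLUDED.col1,
              col2 = EXCLUDED.col2,
              col3 = EXCLUDED.col3;
```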
As someone that's been using PostgreSQL way before it was cool (since 2005), I'm super happy to see this. Previously I had to do EXISTS, IF FOUND UPDATE ELSE INSERT. Now I can just do upsert, so nice.
Very thankful to all the hard working Postgres devs. PostgreSQL was really good when I first started using it, and just keeps getting better. Glad to see the general community finally getting on the PostgreSQL bandwagon.
Now let's see if I can get more people to use internal PostgreSQL functions instead of constantly writing statements directly in their code.
This reminds me of a common idiom in the Pick-style databases that I've worked on...
READU REC FROM FILE,KEY ELSE
   REC<1> = 'stuff'
   * Set other fields as appropriate for a brand new record - this is the 'insert'
END
REC<1> = 'blah'
* Set other fields as appropriate for an update
WRITE REC TO FILE,KEY ;* Write record and release locks
Postgres gets better and better.
pgsql -> on conflict
mysql -> replace / on duplicate
oracle -> merge
mssql -> merge
sqlite -> insert or replace
firebird -> merge / update or insert
Both have different use cases.
This was the most requested feature in PostgreSQL: https://postgresql.uservoice.com/forums/21853-general
[1] http://en.wikipedia.org/wiki/Merge_(SQL)
[1] https://wiki.postgresql.org/wiki/UPSERT#SQL_MERGE_syntax
Way to go Postgres team!