top | item 46696329

(no title)

msdrigg | 1 month ago

The most interesting thing for me in this article was the mention of `MERGE` almost in passing at the end.

> I'm not a big fan of using the constraint names in SQL, so to overcome both limitations I'd use MERGE instead:

``` db=# MERGE INTO urls t USING (VALUES (1000004, 'https://hakibenita.com')) AS s(id, url) ON t.url = s.url WHEN MATCHED THEN UPDATE SET id = s.id WHEN NOT MATCHED THEN INSERT (id, url) VALUES (s.id, s.url); MERGE 1 ```

I use `insert ... on conflict do update ...` all the time to handle upserts, but it seems like merge may be more powerful and able to work in more scenarios. I hadn't heard of it before.

discuss

order

gshulegaard|1 month ago

IIRC `MERGE` has been part of SQL for a while, but Postgres opted against adding it for many years because it's syntax is inherently non-atomic within Postgres's MVCC model.

https://pganalyze.com/blog/5mins-postgres-15-merge-vs-insert...

This is somewhat a personal preference, but I would just use `INSERT ... ON CONFLICT` and design my data model around it as much as I can. If I absolutely need the more general features of `MERGE` and _can't_ design an alternative using `INSERT ... ON CONLFICT` then I would take a bit of extra time to ensure I handle `MERGE` edge cases (failures) gracefully.

kbolino|1 month ago

It's kinda hard to handle MERGE failures gracefully. You generally expect the whole thing to succeed, and the syntax deceptively makes it seem like you can handle all the cases. But because of MVCC, you get these TOCTOU-style spurious constraint violations, yet there's no way to address them on a per-row basis, leading to the entire statement rolling back even for the rows that had no issues. If you are designing for concurrent OLTP workloads against the table, you should probably just avoid MERGE altogether. It's more useful for one-off manual fixups.

awesome_dude|1 month ago

That reference - my initial gut feeling was that `MERGE` felt more readable, but then I read this paragraph

> If you want the generality of MERGE, you have to accept the fact that you might get unique constraint violations, when there are concurrent inserts, versus with INSERT ON CONFLICT, the way it's designed with its speculative insertions, guarantees that you either get an INSERT or an UPDATE and that is true even if there are concurrent inserts. You might want to choose INSERT ON CONFLICT if you need the guarantee.

Basically, `MERGE` is susceptible to a concurrent process also writing `INSERT` where that `INSERT` and `MERGE` are unaware of one another, causing a duplicate value to be used.

philjohn|1 month ago

If you're doing large batch inserts, I've found using the COPY INTO the fastest way, especially if you use the binary data format so there's no overhead on the postgres server side.

sirfz|1 month ago

That doesn't work well with conflicts tho iirc