top | item 9967732

(no title)

lpsmith | 10 years ago

There is a good chance that your proposed algorithm to enforce a uniqueness constraint in the application won't work. As in, you've left out enough details that would be critical for getting it right, and in my experience, a lot of programmers would only get this right by accident if they get it right at all.

First problem is that the SQL standard provides no way to make this work portably on any standards-compliant database. So right there you are going to have to code to the database to one degree or another.

So, let's say you want to make this work in Postgres. Now, you'll need to be using Postgres 9.0 at least; otherwise your uniqueness constraint won't be a uniqueness constraint.

Try this, in any version of Postgres. Open up two psql sessions. In one, run a `create table unique (x text);`. Then run `begin isolation level repeatable read; select * from unique where x = 'foo';` in one of the sessions. Repeat those two commands in the other sessions.

Neither session sees 'foo'. So now both can go ahead and run `insert into unique values ('foo'); commit;`. Both transactions will succeed, and you can confirm that there are now two instances of 'foo' in the table.

In fact, `begin isolation level serializable` in PostgreSQL 9.0 or later is the minimum isolation level to make this work. And, you will need retry logic around the transaction in case of a serialization failure. (Perhaps your DB access layer or language would hide this latter detail from you, or perhaps not.)

In PostgreSQL 8.4 and before, serializable and repeatable read were equivalent, and both were still SQL standards compliant. In PostgreSQL 9.0, the repeatable read isolation level stayed the same, while the serializable isolation level was strengthened.

Unless you can accept a certain level of degraded accuracy by using a probabilistic construct such as a Bloom filter, by far the biggest cost of maintaining uniqueness is the index. And you'll need that index whether you use the database or the application to enforce uniqueness.

And, judiciously pushing computation onto a database can actually be cheaper for the database as well as its clients. This scenario is likely to be one of those situations.

discuss

No comments yet.