top | item 42287418

(no title)

asQuirreL | 1 year ago

One reason I can think of is that the database needs to maintain atomicity and isolate effects of any given operation (the A and I in ACID).

By manually batching the deletes, you are telling the database that the whole operation does not need to be atomic and other operations can see partial updates of it as they run. The database wouldn't be able to do that for every large delete without breaking its guarantees.

discuss

order

wruza|1 year ago

I think that gp’s comment can be reinterpreted as: why should this landmine exist when databases could notify a reader of its manual about this issue in an explicit way, for example:

  DELETE FROM t WHERE … BATCH 100
Which would simulate batched queries when called outside of transaction. This would remove the need of a client to be connected (or at least active) for a duration of this lenghty operation.

If DELETE is so special, make special ways to manage it. Don’t offload what is your competence onto a clueless user, it’s recipe for disaster. Replace DELETE with anything and it’s still true.

  ALTER DATABASE d SET UNBATCHED DELETE LIMIT 500000
I know a guy (not me) who deleted rows from an OLTP table that served a country-level worth of clients and put it down for two days. That is completely database’s fault. If its engine was designed properly for bigdata, it should have refused to do so on a table with gazillions of rows and suggested a proper way to do it.

cogman10|1 year ago

Rather than batching, I would want a "NO ROLLBACK DELETE" sort of command. The real expensive part of the delete is rewriting the records into the transaction log so that a cancel or crash can undo the delete.

If you've gone to the effort of batching things, you are still writing out those records, you are just giving the db a chance to delete them from the log.

I'd like to save my ssds that heartache and instead allow the database to just delete.

In MSSQL in some extreme circumstances, we've partitioned our tables specifically so we can use the 'TRUNCATE TABLE' command as delete is just too expensive.

That operation can wipe gbs in seconds.