MaxGabriel|1 month ago
* It's obvious from the schema: if there's a `deleted_at` column, I know how to query the table correctly (vs. not realizing rows get DELETEd, or having to know to look in another table)
* One way to do things: Analytics queries, admin pages, it all can look at the same set of data, vs having separate handling for historical data.
* DELETEs are likely fairly rare by volume for many use cases
* I haven't found soft-deleted rows to be a big performance issue. Intuitively this makes sense, since indexed lookups should be O(log N) in table size
* Undoing is really easy, because all the relationships stay in place, vs data already being moved elsewhere (In practice, I haven't found much need for this kind of undo).
In most cases, I've really enjoyed going even further and making rows fully immutable, using a new row to handle updates. This makes it really easy to reference historical data.
If I were doing the logging approach described in the article, I'd use database triggers that keep a copy of every INSERTed/UPDATEd/DELETEd row in a duplicate table. That way everything stays in the same database, easy to query and replicate elsewhere.
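The trigger idea above can be sketched with SQLite's stdlib bindings (a minimal illustration, not the commenter's actual setup; a production system would likely use PostgreSQL triggers, and the table and column names here are hypothetical). Each trigger copies the affected row, plus the operation name, into a duplicate audit table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);

-- Duplicate "audit" table: same columns, plus the operation and a timestamp.
CREATE TABLE users_audit (
    id INTEGER, name TEXT,
    op TEXT, changed_at TEXT DEFAULT CURRENT_TIMESTAMP
);

-- Keep a copy of every row an INSERT, UPDATE, or DELETE touches.
CREATE TRIGGER users_ins AFTER INSERT ON users BEGIN
    INSERT INTO users_audit (id, name, op) VALUES (NEW.id, NEW.name, 'INSERT');
END;
CREATE TRIGGER users_upd AFTER UPDATE ON users BEGIN
    INSERT INTO users_audit (id, name, op) VALUES (OLD.id, OLD.name, 'UPDATE');
END;
CREATE TRIGGER users_del AFTER DELETE ON users BEGIN
    INSERT INTO users_audit (id, name, op) VALUES (OLD.id, OLD.name, 'DELETE');
END;
""")

conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")
conn.execute("UPDATE users SET name = 'alicia' WHERE id = 1")
conn.execute("DELETE FROM users WHERE id = 1")

# The live table is empty, but the audit table has the full history.
print(conn.execute("SELECT id, name, op FROM users_audit").fetchall())
# [(1, 'alice', 'INSERT'), (1, 'alice', 'UPDATE'), (1, 'alicia', 'DELETE')]
```

Because the audit copy lives in the same database, it replicates with it and can be joined against the live tables in ordinary queries.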
nine_k|1 month ago
> DELETEs are likely fairly rare by volume for many use cases
All your other points make sense, given this assumption.
I've seen tables where 50%-70% were soft-deleted, and it did affect the performance noticeably.
> Undoing is really easy
Depends on whether undoing ever actually happens, and on whether the acts of deletion and undeletion require audit records anyway.
In short, there are cases when soft-deletion works well, and is a good approach. In other cases it does not, and is not. Analysis is needed before adopting it.
whizzter|1 month ago
That said, we've had soft deletes, and in discussions about keeping them on, one argument was that they're really only a half-assed measure: data lost to updates, rather than deletes, isn't actually saved.
KronisLV|1 month ago
I think we largely need support for "soft deletes" baked into SQL or its dialects directly, treated as something transparent: selecting soft-deleted rows is a special case, regular SELECTs skip those rows, and regular DELETE statements can be turned into soft deletes under the hood.
https://news.ycombinator.com/item?id=43781109
https://news.ycombinator.com/item?id=41272903
And then make dynamically sharding data by deleted/not deleted really easy to configure.
You soft-deleted a few rows? They get moved to another DB instance, an archive/bin of sorts. Normal queries wouldn't even consider it; only when you explicitly select soft-deleted rows would it be consulted.
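SQL doesn't offer this natively, but a view plus an INSTEAD OF trigger gets close to the "transparent" behaviour described above (the sharding-to-another-instance part isn't shown). A minimal sketch in SQLite, with hypothetical table and view names; the application only ever touches the view:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Backing table holds everything, including soft-deleted rows.
CREATE TABLE items_all (id INTEGER PRIMARY KEY, name TEXT, deleted_at TEXT);

-- Regular reads go through the view, which hides soft-deleted rows.
CREATE VIEW items AS SELECT id, name FROM items_all WHERE deleted_at IS NULL;

-- A plain DELETE against the view becomes a soft delete under the hood.
CREATE TRIGGER items_soft_delete INSTEAD OF DELETE ON items BEGIN
    UPDATE items_all SET deleted_at = CURRENT_TIMESTAMP WHERE id = OLD.id;
END;
""")

conn.execute("INSERT INTO items_all (id, name) VALUES (1, 'a'), (2, 'b')")
conn.execute("DELETE FROM items WHERE id = 1")  # soft delete, transparently

print(conn.execute("SELECT id, name FROM items").fetchall())      # [(2, 'b')]
print(conn.execute("SELECT count(*) FROM items_all").fetchall())  # [(2,)]
```

The caller issues an ordinary DELETE and ordinary SELECTs; only code that queries `items_all` directly ever sees the soft-deleted rows.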
da_chicken|1 month ago
At that point you should probably investigate partitioning or data warehousing.
scott_w|1 month ago
Depending on your use-case, having soft-deletes doesn't mean you can't clean out old deleted data anyway. You may want a process that grabs all data soft-deleted X years ago and just hard-delete it.
> Depends on whether undoing even happens, and whether the act of deletion and undeletion require audit records anyway.
Yes but this is no more complex than the current situation, where you have to always create the audit records.
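The retention process described above (grab everything soft-deleted more than X years ago and hard-delete it) can be sketched in a few lines; SQLite and the table name are stand-ins, and the cutoff would come from the current date in a real job:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, deleted_at TEXT)")
conn.executemany(
    "INSERT INTO orders (id, deleted_at) VALUES (?, ?)",
    [(1, "2015-01-01"),   # soft-deleted long ago: should be purged
     (2, "2024-06-01"),   # soft-deleted recently: keep for now
     (3, None)],          # live row: never touched by the purge
)

# Fixed cutoff for the example; a real job would use date('now', '-2 years').
cutoff = "2023-01-01"
conn.execute(
    "DELETE FROM orders WHERE deleted_at IS NOT NULL AND deleted_at < ?",
    (cutoff,),
)

print(conn.execute("SELECT id FROM orders ORDER BY id").fetchall())  # [(2,), (3,)]
```

The `deleted_at IS NOT NULL` guard keeps live rows safe even if the cutoff comparison is wrong, which makes the job cheap to reason about.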
paulddraper|1 month ago
(Again, a lot of it is O(log n), right?)
postexitus|1 month ago
infamia|1 month ago
Another great (and older) approach is adding temporal information to your traditional database, which gives immutability without the eventual-consistency headaches that normally come with event sourcing. Temporal SQL has its own set of challenges, of course, but you get to keep 30+ years of relational DB tooling, which is a boon. Event sourcing is great, but we shouldn't forget about other tools in our toolbelt as well!
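SQL:2011 standardized this as system-versioned tables, but the pattern is easy to hand-roll: rows are never updated in place, and each version carries a validity interval. A minimal sketch in SQLite with hypothetical names, showing an "as of" query against history:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Temporal table: a change closes the old row's valid_to
-- and inserts a new row, so history is never lost.
CREATE TABLE price_history (
    sku TEXT, price INTEGER,
    valid_from TEXT, valid_to TEXT   -- NULL valid_to = current row
);
""")
conn.executemany(
    "INSERT INTO price_history VALUES (?, ?, ?, ?)",
    [
        ("widget", 100, "2024-01-01", "2024-06-01"),
        ("widget", 120, "2024-06-01", None),
    ],
)

# "AS OF" query: the price that was in effect on a given date.
as_of = "2024-03-15"
row = conn.execute(
    """SELECT price FROM price_history
       WHERE sku = 'widget'
         AND valid_from <= ?
         AND (valid_to IS NULL OR valid_to > ?)""",
    (as_of, as_of),
).fetchone()
print(row)  # (100,)
```

Ordinary joins, indexes, and reporting tools work on `price_history` unchanged, which is the "keep 30+ years of tooling" point above.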
arter45|1 month ago
Also, it doesn't support non-immutable use cases AFAIK, so if you need both you have to use two database technologies (interfaces?), which can add complexity.
arter45|1 month ago
(timestamp, accountNumber, value, state)
And then you just
SELECT state FROM Table WHERE accountNumber = ... ORDER BY timestamp DESC LIMIT 1
right?
ozim|1 month ago
I think one of our problems is getting users to delete stuff they don’t need anymore.
eddd-ddde|1 month ago
I'm pretty sure it is possible, and it might even yield some performance improvements.
That way you wouldn't have to worry about deleted items impacting performance too much.
paulddraper|1 month ago
Memory >>>>> Disk in importance.
MaxGabriel|1 month ago
I think this is likely unnecessary for most use cases and is mostly a RAM saving measure, but could help in some cases.
rawgabbit|1 month ago
I usually tell people to stop treating databases like Firebase, waxing on/waxing off records and fields willy-nilly. You need to treat the database as the store of your business process, and your business processes demand retention of all requests. You need to keep the request to soft-delete a record. You need to keep the request to undelete a record.
Too much crap in the database? Then you need to create a field saying this record will be archived off by this date. On that date, you move the record off into another table or file that is only accessible to admins. And yes, you need to keep a record of that archival as well. Too much gunk in your request logs? Well then, you need to create an archive process for that as well.
These principles are nothing new. They are in line with “Generally Accepted Record Keeping Principles” which are US oriented. Other countries have similar standards.
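The archive step described above (move expired rows to an admin-only table, keeping a record of the move) can be sketched as a single transaction; SQLite, the table names, and the fixed "today" are illustrative stand-ins:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE requests (id INTEGER PRIMARY KEY, body TEXT, archive_after TEXT);
-- Admin-only archive table, with a record of when each row was moved.
CREATE TABLE requests_archive (id INTEGER, body TEXT, archived_at TEXT);
""")
conn.executemany(
    "INSERT INTO requests VALUES (?, ?, ?)",
    [(1, "old request", "2024-01-01"),   # past its archive date
     (2, "new request", "2030-01-01")],  # not yet due
)

today = "2025-01-01"  # a real job would use date('now')
with conn:  # one transaction: copy first, then delete, so nothing is lost
    conn.execute(
        "INSERT INTO requests_archive "
        "SELECT id, body, ? FROM requests WHERE archive_after <= ?",
        (today, today),
    )
    conn.execute("DELETE FROM requests WHERE archive_after <= ?", (today,))

print(conn.execute("SELECT id FROM requests").fetchall())          # [(2,)]
print(conn.execute("SELECT id FROM requests_archive").fetchall())  # [(1,)]
```

Copy-then-delete inside one transaction means a crash midway leaves either both tables or neither changed, and `archived_at` itself is the record of the archival.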