Wikipedia is tiny data. You don't start to really see cost scaling issues until you have active data a few hundred times larger and your data changes enough that autovacuuming can't keep up.
I'm getting paid to move a database that size this morning.
English language Wikipedia revision history dump:
April 2019: 18 880 938 139 465 bytes (19 TB) uncompressed. 937GB bz2 compressed. 157GB 7z compressed.
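For anyone who wants to sanity-check those figures, here is a rough sketch of the arithmetic. It assumes "GB" and "TB" in the dump listing are decimal units (10^9 and 10^12 bytes), which the source doesn't state; the helper names are made up for illustration.

```python
# Sizes quoted above for the April 2019 English Wikipedia revision history dump.
# Assumption: GB = 10**9 bytes and TB = 10**12 bytes (decimal units).
UNCOMPRESSED_BYTES = 18_880_938_139_465
BZ2_BYTES = 937 * 10**9
SEVENZ_BYTES = 157 * 10**9


def to_tb(n_bytes: int) -> float:
    """Convert bytes to decimal terabytes."""
    return n_bytes / 10**12


def to_tib(n_bytes: int) -> float:
    """Convert bytes to binary tebibytes, for comparison."""
    return n_bytes / 2**40


def compression_ratio(original: int, compressed: int) -> float:
    """How many times smaller the compressed dump is."""
    return original / compressed


if __name__ == "__main__":
    print(f"uncompressed: {to_tb(UNCOMPRESSED_BYTES):.2f} TB "
          f"({to_tib(UNCOMPRESSED_BYTES):.2f} TiB)")
    print(f"bz2 ratio: {compression_ratio(UNCOMPRESSED_BYTES, BZ2_BYTES):.1f}x")
    print(f"7z ratio:  {compression_ratio(UNCOMPRESSED_BYTES, SEVENZ_BYTES):.1f}x")
```

Under those assumptions the uncompressed dump is just under 19 TB, and the 7z archive is roughly two orders of magnitude smaller, which is what you'd expect from revision text that is mostly near-identical between versions.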
troupo|1 year ago
I assume since then it's grown at least ten-fold. It's already an amount of data that would cripple most NoSQL solutions on the market.
I honestly feel like I'm talking to functional programming zealots. There's this fictional product that is oh so much better than whatever tool you're talking about. No one has seen it, and no one has proven that it exists or that it works better than the current perfectly adequate and performant tool. But trust us, for some ridiculous, vaguely specified constraints it definitely works amazingly well.
This time it's "RDBMS is bad at soft deletions and versions because 19 TB of revisions on one of the world's most popular websites is tiny".
[1] https://meta.wikimedia.org/wiki/Data_dumps/Dumps_sizes_and_g...
Thews|1 year ago
They store revisions in compressed, mostly read-only storage for archival. https://wikitech.wikimedia.org/wiki/MariaDB#External_storage
They have the layout and backup plans of their servers available.
They've got an efficient layout, they use caching, and the workload is by nature very read-intensive.
https://wikitech.wikimedia.org/wiki/MariaDB#/media/File:Wiki...
Archival read-only servers don't have to worry about any of the maintenance mentioned. Use ChatGPT or something to play devil's advocate for you, because what you're calling magical and nonexistent is quite common.