(no title)
marsovo | 11 months ago
That being said, my background is primarily Microsoft SQL more than Postgres. As such I'm occasionally bemused at the sort-of monoculture here around Postgres, where if Postgres doesn't have it, it may as well not exist*.
And so it is in this case (the DELAYED_DURABILITY documentation I linked above). Alas, this doesn't seem to be something I see in standard SQL, so indeed, as you say, it's too bad that the standard doesn't provide for relaxing the rules.
Relatedly, the other interesting thing is the chatter about fsync. I know on Windows that's not the mechanism that's used, and out of curiosity I looked deeper into what MS-SQL does on Linux, and indeed they were able to get significant improvement by leveraging similar mechanisms to ensure the data is hardened to disk without a separate flush (see https://news.ycombinator.com/item?id=43443703). They contributed to kernel 4.18 to make it happen.
> The repeated use of a write request followed by a flush request may be detrimental to performance and will increase traffic on the I/O bus.
> Prior to the Linux Kernel 4.18 updates, Linux could use Fua but only for the file system journaling writes and not data writes.
> If your system supports Fua and you have the Linux Kernel 4.18 or newer updates, you can enable SQL Server trace flag -T3979 and use /opt/mssql/bin/mssql-conf set control.alternatewritethrough 0. SQL Server will use Fua write behavior patterns instead of Forced Flush. File systems supporting optimized Fua align SQL Server on Linux with SQL Server on Windows behavior and performance.
*I think performance of CTEs/Views is another topic where I noticed it, where it was just taken as given that they can hurt performance, whereas in T-SQL they are simply equivalent to subqueries
anarazel|11 months ago
FWIW, I, as a medium-long term PG developer, are also regularly ... bemused by that attitude. We do some stuff well, but we also do a lot of shit not so well, and PG is succeeding despite that, not because of.
> Relatedly, the other interesting thing is the chatter about fsync. I know on Windows that's not the mechanism that's used, and out of curiosity I looked deeper into what MS-SQL does on Linux, and indeed they were able to get significant improvement by leveraging similar mechanisms to ensure the data is hardened to disk without a separate flush (see https://news.ycombinator.com/item?id=43443703). They contributed to kernel 4.18 to make it happen.
Case in point about "we also do a lot of shit not so well" - you can actually get WAL writes utilizing FUA out of postgres, but it's advisable only under somewhat limited circumstances:
Most filesystems are only going to use FUA writes with O_DIRECT. The problem is that for streaming replication PG currently reads back the WAL from the filesystem. So from a performance POV it's not great to use FUA writes, because that then triggers read IO. And some filesystems have, uhm, somewhat odd behaviour if you mix buffered and unbuffered IO.
Another fun angle around this is that some SSDs have *absurdly* slow FUA writes. Many Samsung SSDs, in particular, have FUA write performance 2-3x slower than their already bad whole-cache-flush performance - and it's not even just client and prosumer drives, it's some of the more lightweight enterprise-y drives too.
Edit: fights with HN formatting.