kbaker | 6 months ago
I think this is the part that is confusing.
The fsyncing of the directory is supposed to be done by the filesystem/OS itself, not the application.
From `man fsync`:
> As well as flushing the file data, fsync() also flushes the metadata information associated with the file (see inode(7)).
So from SQLite's perspective, in DELETE mode a transaction is either before the fsync call and not committed, or after the fsync call and committed (or partially written somehow and needing rollback).
Unfortunately, it seems like this has traditionally been broken on many systems, requiring workarounds like SYNCHRONOUS = EXTRA.
agwa|6 months ago
The next paragraph in the man page explains this:
> Calling fsync() does not necessarily ensure that the entry in the directory containing the file has also reached disk. For that an explicit fsync() on a file descriptor for the directory is also needed.
https://man7.org/linux/man-pages/man2/fsync.2.html
Edit to add: I don't think there's a single Unix-like OS on which fsync would also fsync the directory, since a file can appear in an arbitrary number of directories, and the kernel doesn't know all the directories in which an open file appears.
This is a moot point anyways, because in DELETE mode, the operation that needs to be durably persisted is the unlinking of the journal file - what would you fsync for that besides the directory itself?
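To make that concrete, here is a minimal sketch of a durable unlink on a POSIX system. `durable_unlink` is a hypothetical helper written for illustration, not a function from SQLite; it only shows that the thing being synced is the directory, because the deleted file no longer has a descriptor worth syncing.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not part of SQLite): remove `name` from the
 * directory `dir_path`, then fsync the directory so that the removal of
 * the directory entry itself reaches stable storage.
 * Returns 0 on success, -1 on any error. */
int durable_unlink(const char *dir_path, const char *name)
{
    int dirfd = open(dir_path, O_RDONLY | O_DIRECTORY);
    if (dirfd < 0)
        return -1;

    if (unlinkat(dirfd, name, 0) != 0) {
        close(dirfd);
        return -1;
    }

    /* The name is gone from the namespace, but the directory update may
     * still be sitting in the page cache. Syncing the directory's own
     * file descriptor is what pushes it to disk. */
    int rc = fsync(dirfd);
    if (close(dirfd) != 0)
        rc = -1;
    return rc;
}
```

Calling `durable_unlink("/path/to/dbdir", "app.db-journal")` (both names are placeholders) returns only after the directory has been synced. A plain `unlink()` returns as soon as the in-memory directory is updated.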
kbaker|6 months ago
I got more curious, so I looked at the code here:
https://sqlite.org/src/file?name=src/pager.c&ci=trunk
and found something similar to what you are asking in this comment before `sqlite3PagerCommitPhaseTwo`:
So, it does this: assuming fsync works on both the main database and the hot journal, I don't see a way that it is not durable. It has to write and sync the full hot journal, then write to the main database, then zero out the hot journal, sync that, and only then does it atomically return from the commit (assuming FULL and DELETE), right?
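The ordering described above can be sketched as a toy rollback-journal commit. This is not SQLite's pager code: `toy_commit` and `write_and_sync` are hypothetical names, the "database" is a single buffer written at offset 0, and the journal is removed with an unlink, which is how the reply above describes DELETE mode. The last directory fsync is the step the reply is about; as far as I understand the documentation, it is what SYNCHRONOUS = EXTRA adds on top of FULL.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Write `len` bytes at offset 0 of `name` (relative to dirfd), creating
 * the file if needed, then fsync the file. Returns 0 or -1. */
static int write_and_sync(int dirfd, const char *name,
                          const void *buf, size_t len)
{
    int fd = openat(dirfd, name, O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    const char *p = buf;
    size_t done = 0;
    while (done < len) {
        ssize_t n = pwrite(fd, p + done, len - done, (off_t)done);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }

    int rc = fsync(fd);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}

/* Toy commit, for illustration only. `old_page` is the content being
 * replaced and `new_page` is the content being committed. */
int toy_commit(const char *dir_path, const char *db_name,
               const char *journal_name,
               const void *old_page, const void *new_page, size_t len)
{
    int rc = -1;
    int dirfd = open(dir_path, O_RDONLY | O_DIRECTORY);
    if (dirfd < 0)
        return -1;

    /* 1. Save the old content in the journal and sync the journal file,
     *    then sync the directory so the journal's name is durable too. */
    if (write_and_sync(dirfd, journal_name, old_page, len) != 0)
        goto out;
    if (fsync(dirfd) != 0)
        goto out;

    /* 2. Overwrite the database in place and sync it. */
    if (write_and_sync(dirfd, db_name, new_page, len) != 0)
        goto out;

    /* 3. Commit point: remove the journal. */
    if (unlinkat(dirfd, journal_name, 0) != 0)
        goto out;

    /* 4. Sync the directory. Without this step, the unlink may not have
     *    reached disk when power is lost; the journal could then
     *    reappear and the next open would roll the transaction back. */
    if (fsync(dirfd) != 0)
        goto out;

    rc = 0;
out:
    close(dirfd);
    return rc;
}
```

If step 4 is dropped, every fsync on the database and journal files can succeed and the commit can still be undone after a crash, since nothing has forced the directory entry removal to disk.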