top | item 37118024

(no title)

mbid | 2 years ago

The rename system call is not quite atomic. From https://linux.die.net/man/2/rename:

> However, when overwriting there will probably be a window in which both oldpath and newpath refer to the file being renamed.

A better way to think about rename is that it's given by two atomic operations in sequence: (1) An atomic link call to make the file available under the new name, and then (2) an atomic remove operation to delete the old name. Thus, it's possible to observe the state after (1) but before (2).

discuss

order

jakewins|2 years ago

Dan Luu also says rename is not atomic, at the bottom of this post: https://danluu.com/file-consistency/

I've wondered ever since I read that what the first bullet there means. It can't be what you're saying here, what you're saying would apply to racing processes as well as crashes?

So then.. what does he mean when he says rename isn't atomic if there's a crash? What's the failure condition / in-between state that can occur if there's a crash?

mbid|2 years ago

>what does he mean when he says rename isn't atomic if there's a crash?

Not sure. One of the papers he cites [1] has this to say about rename atomicity:

>Directory operations such as rename() and link() are seemingly atomic on all file systems that use techniques like journaling or copy-on-write for consistency

So it seems to depend on the file system. But the authors claim that not just link but also rename is atomic without qualification, which appears to be false, so I'm not sure how trustworthy this is.

There's a stackoverflow comment [2] that suggests that ext4 guarantees the kind of sequential atomicity property (atomic link followed by atomic remove) I mentioned:

>Kernel implementation depends on filesystem but here's implementation of Linux ext4 filesystem: elixir.bootlin.com/linux/v5.19.3/source/fs/ext4/namei.c#L3887 – it seems to first mark the inode as dirty, then create new link and after that delete the old link. If the kernel crashes in the middle, the end result could be two links and dirty mark. I would guess (but didn't investigate if this true) that journal recovery during the next mount would fix the end result to match atomic behavior of leaving either the old or new state depending on exact crash location.

In general, I find it rather weird how little authoritative documentation there is about whether or not this property holds given how important it is.

[1] https://www.usenix.org/system/files/conference/osdi14/osdi14...

[2] https://stackoverflow.com/questions/7054844/is-rename-atomic...

remram|2 years ago

Usually the atomicity I care about is with regard to newpath: I either observe no file/previous file at newpath, or I observe the new file at newpath. This is in opposition to a copy for example, where I might observe a partially-written file.

This is a great point though, when taking oldpath into account the operation is not atomic. TIL

fsckboy|2 years ago

the type of atomicity I've always seen it used for is, two processes try to rename a file at the same time, only one will succeed; the winner does what they planned and then rename the file back.

I guess you shouldn't ignore the error with both processes using the same destination name, why folks generally tend to put their pid in the lockfilename

yencabulator|2 years ago

Umm, two concurrent rename(2) syscalls to the same destination path will likely both succeed. One of them will win (not be overwritten), but both should succeed.

If you want atomic creation of a path, where no one else can succeed also, you want link(2) not rename(2).