top | item 35800848

(no title)

bluestreak | 2 years ago

> On x86, and I think every architecture, when you write to a memory mapping that is not already backed by a writable page, the kernel is notified that user code is trying to write. And the kernel needs to fill in the contents of the page, which requires a read if the page isn’t already loaded.

This is very true. Perhaps there wasn't enough context to what the article is describing. The read problem started to occur on database that is subject to constant write workload. Data is flowing in all the time at variable rate. Typically blocks are "hot" and being filled in fully within seconds if not millis.

Zeroing the file is an option to try. QuestDB allocates disk with `posix_fallocate()`, which doesn't have the required flags. We would need to explore `fallocate()`. Thanks.

discuss

order

amluto|2 years ago

If you are allocating zeroed space and then writing a whole page quickly, then you may well avoid a read. And doing this under extreme memory pressure will indeed cause the page to be written to disk while only partly written by user code, and it will need to be read back in. Reducing memory pressure is always good.

I would expect quite a bit better performance if you actually write a entire pages using pwrite(2) or the io_uring equivalent, though.

Messing with fallocate on a per-page basis is asking for trouble. It changes file metadata, and I expect that to hurt performance.

bluestreak|2 years ago

great, we are on the same page! `fallocate()` (or posix one) is called on large swades of file. 16MB default. Not too often to hurt performance. I wonder if zeroing the file with `fallocate()` will result in actual disk writes or is it ephemeral?