zlynx | 4 years ago
Everyone forgets or tries to ignore that text files ARE A BINARY FORMAT. It is encoded in 7-bit ASCII with records delimited by 0x0a bytes.
Corruption tends to be missing data, and so the reader has to jump ahead to find the next synchronization byte, aka 0x0a. This also leads to log parsers producing complete trash as they try to parse a line that has a new timestamp right in the middle of it.
Or there's a 4K block containing some text and then padded to the end with 0x00 bytes. And then the log continues adding more after reboot. Again, that's fixed by ignoring data until the next non-zero byte and/or 0x0a byte. This problem makes it really obvious that text logs are binary files.
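A minimal sketch of the recovery described above (a hypothetical helper, not anything journalctl actually does): drop the 0x00 padding and resplit at the 0x0a delimiters, accepting that a truncated record fuses with the one that follows it.

```python
def recover_records(data: bytes) -> list[bytes]:
    """Best-effort recovery of a NUL-padded, truncated text log:
    strip 0x00 padding, then split on the 0x0a record delimiter.
    A record cut off mid-line fuses with the next one -- producing
    exactly the 'timestamp in the middle of a line' garbage that
    trips up log parsers."""
    return [rec for rec in data.replace(b"\x00", b"").split(b"\n") if rec]
```

Running it on a log truncated mid-record and padded to the block boundary shows the fused line a parser would choke on.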
See the format definition at https://www.freedesktop.org/wiki/Software/systemd/journal-fi...
And here: this isn't perfect, but if you had to hack the text out with no journalctl available, you could try this:
grep -a -z 'SYSLOG_TIMESTAMP=\|MESSAGE=' /var/log/journal/69d27b356a94476da859461d3a3bc6fd/system@4fd7dfdde574402786d1a1ab2575f8fb-0000000001fc01f1-0005c59a802abcff.journal | sed -e 's/SYSLOG_TIMESTAMP=\|MESSAGE=/\n&/g'
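A rough Python equivalent of that grep/sed hack (a sketch, not a journal parser): scan the raw bytes for the two field names and keep everything up to the next NUL or newline. This assumes the field payloads are stored uncompressed, which the journal format does not guarantee for large fields.

```python
import re

def extract_fields(data: bytes) -> list[str]:
    """Pull SYSLOG_TIMESTAMP= and MESSAGE= payloads out of raw
    journal bytes, stopping at the next 0x00 or 0x0a byte.
    Compressed payloads will come out as garbage or be missed."""
    pat = re.compile(rb"(?:SYSLOG_TIMESTAMP=|MESSAGE=)[^\x00\n]*")
    return [m.group(0).decode("utf-8", errors="replace")
            for m in pat.finditer(data)]
```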
dec0dedab0de | 4 years ago
Thanks for pointing that out. I guess that's why they came up with their own format instead of just using SQLite, or something else that's already a standard.
> Everyone forgets or tries to ignore that text files ARE A BINARY FORMAT
That's a bit pedantic, even for HN standards :-)
But yes, I know all about fragile log parsers and race conditions of multiple processes writing to the same file. I was just thinking about a scenario where you end up having to read raw logs when things go haywire.