It's beyond me how he doesn't understand that text logs are a universal format, easily accessible, that can be instantly turned into whatever binary format you desire with a highly efficient insertion process (Splunk is just one of those that does a great job).
Here is the thing he doesn't seem to understand - all of us who are sysadmins absolutely understand the value of placing complex and large log files into a database so that we can query them efficiently. We also understand why having multi-terabyte text log files is not useful.
But what we find totally unacceptable is log files being shoved into binary repositories as the primary storage location. Because, you know what, everyone has their own idea of what that primary storage location should be, and those ideas are mostly incompatible with each other.
The nice thing about text - for the last 40 years it's been universally readable, and will be for the next 40 years. Many of these binary repositories will be unreadable within a short period, and will be immediately unreadable to those people who don't know the magic tool to open them.
Uh, I don't know what world you live in but I'd like the address because mine sucks in comparison.
Text logs are definitely not a "universal format". Easily accessible, sure. Human readable most of the time? Okay. Universal? Ten times nope.
I'll give you an example: uwsgi logs don't even have timestamps, and contain whatever crap the program's stdout outputs, so you often end up with three different types of your "universal format" in there. I'm not giving this example because it's contrived, but because I was dealing with it the very moment I read your comment.
"But what we find totally unacceptable is log files being shoved into binary repositories as the primary storage location"
The way I read his article, he's not really opposed to additionally keeping your logs around as text.
But you make a good point of using text as the primary storage location, since you can always easily feed it to some binary system for further analysis.
Would the best practice then be to keep your logs around as (compressed) text, but additionally feed it to your log analysis system of choice for greater querying capabilities?
Agreed. Logs are for when everything and anything is broken. They aren't supposed to be pretty or highly functional, they are just meant as a starting point for gathering data.
Our product stores all the logs raw in flat files on the file system; we don't use databases for keeping the logs in. This allows you to scale massively (the ingestion limit is that of the correlation engine and disk bandwidth). You then just need an efficient search crawler and use of metadata, so search performance is good too.
The issue is, if you ever need to pull the logs for court and you have messed with them (i.e. normalized them and stuffed them into a DB), then your chain of custody is broken.
Best of both worlds means parsed-out normalisation, so I don't have to remember that Juniper calls the source IP srcIP and Cisco SourceIP, but with the original logs under the covers for grepping if you need them.
Cool, so which standard binary log storage format should we all switch to?
Should I submit patches to jawstats so that it'll support google-log-format 1.0 beta, or the newer Amazon Cloud Storage 5 format? Or both? Or just go with the older Microsoft Log Storage Format? Or wait until Gruber releases Fireball Format? Has he decided yet whether to store dates as little-endian Unix 64 bit int timestamps, or is he still thinking about going with the Visual FoxPro date format, y'know, where the first 4 bytes are a 32-bit little-endian integer representation of the Julian date (so Oct. 15, 1582 = 2299161) and the last 4 bytes are the little-endian integer time of day represented as milliseconds since midnight? (True story, I had to figure that one out once. Without documentation.)
Should I write a new plugin for Sublime Text to handle the binary log formats? Or write something that will read the binary storage format and spit out text? Or is that too inefficient? Or should I give up on reading logs in a text form at all and write a GUI for it (maybe in Visual Basic)?
Do you know when I should expect suexec to start writing the same binary log format as Apache, or should I give up waiting on that and just write a daemon to read the suexec binary logs and translate them to the Apache binary logs?
Should I take the time to write a natural language parsing search engine for my custom binary log format? Do you think that's worth the time investment? I would really like to be able to search for common misspellings when users ask about a missing email, you know, like "/[^\s]+@domain.com/" does now.
I look forward to your guidance. I've been eagerly awaiting the day that I can have an urgent situation on my hands and I can dig through server logs with all of the ease and convenience of the Windows system logs.
The system should provide a standard API for writing and reading logs. The precise format of the underlying log files is thus rather unimportant at this level of abstraction. Other than the logging subsystem and recovery tools, there's no need for any software to be accessing such log files directly (outside of the API functions). This is how Windows has done it for years.
Binary logs may be fine for you, but don't force it on us!
This is really the important point here. For small systems, grep works fine. The number of people administering small systems is much greater than the number of people administering large systems. The systemd controversy has caused people to fear that change they don't want will be imposed on them and their objections insultingly dismissed: a consequence of incredibly bad social "change management" by its proponents.
They are therefore deploying pre-emptive rhetorical covering fire against the day when greppable logs will be removed from the popular Linux distributions. Plain text is the lingua franca; binary formats bind you to their tools with a particular set of design choices, bugs and disadvantages. My adhoc log grepping workflow has a different set of bugs and disadvantages, but they're mine.
That's really the key for me. My go-to example is searching for IP addresses across different logs. If I have just one machine, and I want to find an IP in the SSH, web and mail logs, I shouldn't have to use multiple tools to get that data.
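A minimal sketch of that one-tool workflow; the IP address, file names and log entries here are all invented for illustration:

```shell
# Stand-ins for the real auth, web and mail logs.
printf 'sshd: Accepted password for bob from 203.0.113.7\n' > demo-auth.log
printf '203.0.113.7 - - "GET / HTTP/1.1" 200\n'             > demo-access.log
printf 'postfix: connect from unknown[203.0.113.7]\n'       > demo-mail.log

# One tool, one invocation, all three logs. The dots are escaped
# so "." matches a literal dot rather than any character.
grep -H '203\.0\.113\.7' demo-auth.log demo-access.log demo-mail.log
```

The `-H` flag prefixes each match with its file name, so you can tell which log it came from.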
Logstash, Splunk and other tools store stuff in binary, as he writes, and that's perfectly valid, the only solution in fact. But I don't want to be forced to run a centralized logging server if I have just one or two servers.
If it's okay to claim that binary logging is the only way to go because you have hundreds of servers, it's also okay to claim that text files are the only solution because I have just one server.
Finally, aren't those binary logs (those that come from individual services) going to be transformed into text when I transmit them to something like Splunk, only to be transformed back into some internal binary format when received? It seems we could save a transformation in that process.
Oddly enough, even for large (>=1e5 physical machines) systems, grep works fine. Better yet, if the logs are important, you're shunting them off for some sort of longer-term storage for post-processing and indexing _anyway_, irrespective of the underlying disk format. Some folks continue to use plain text even then, just with some distributed systems magic wrapped around the traditional Unix tools.
(If you're shunting _all_ of your log data off at that scale, you're crazy, and you'll melt your switches if you aren't careful.)
The name of the game is to think of the problems that you're solving and how they relate to the business bottom line. Nothing more, nothing less. Additionally, what's most troubling is that we've turned this exercise into an emotional one, not one with any sort of scientific perspective.
I can personally say with conviction that I'd like to sit down and actually collect data on, e.g., how many instructions it takes to store logs to disk in plain text versus a binary format, how many it takes to retrieve logs from disk in both situations, and how much search latency I incur when trying to retrieve said logs from disk in the same. At scale, which is where most of my attention lies these days, that's the kind of thing that matters because those effects get amplified automatically—often to operators' and capacity planners' horrors—by the number of machines you have.
If you're dealing with smaller systems, it won't matter as much, but at that point, you're probably dealing with the other side of this, which is having information on how many requests you get for historical log data and what sort of criteria were used in that search. If you're getting requests less frequently than, say, once per quarter, it likely wouldn't be worth your time to invest in what Mr. Nagy is evangelizing.
tl;dr: Continue using your ad hoc grep-fu, but be mindful of how much time it takes you to get the data you're looking for. That alone will be your decision criterion for adopting something like this.
For sure the storage format should not hinder you from using grep if you want. Even with systemd you can pipe journalctl's output and use the same old regexes as its default behaviour is to be a glorified `cat` (but being able to use the --since and --until flags instead of matching date ranges by regexes makes it much better than `cat` for me).
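For plain text logs there's a rough equivalent of --since/--until that needs no date regexes at all, as long as the timestamps are ISO 8601 and therefore sort lexically. A sketch, with an invented file name and entries:

```shell
# ISO 8601 timestamps compare correctly as plain strings,
# so a date-range filter is just a string comparison in awk.
cat > demo-ts.log <<'EOF'
2024-05-01T09:00:01 svc start
2024-05-02T13:30:00 svc warn disk
2024-05-03T23:59:59 svc stop
EOF

awk -v since="2024-05-02" -v until="2024-05-03" \
    '$1 >= since && $1 < until' demo-ts.log
```

This prints only the 2024-05-02 record; no regex ever has to understand what a "valid date" looks like.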
Take this philosophy to an extreme and you end up with a dedicated data format and tooling/APIs to access the data for every subsystem, not just logging. Essentially, this is Windows.
The downside to this is that you no longer have a set of global tools which can easily operate across these separate datasets without writing code against an API. I hear PowerShell tackles this; I don't know how well. The general principle, though, harms velocity at getting something simple done, to the benefit of being able to do extremely complex things more easily. See Event Viewer for a good example of this.
Logs don't exist in isolation. I want to use generally global tooling to access and manipulate everything. I don't want to have to write (non-shell) code, recall a logging-specific API or to have to take the extra step of converting my logs back to the text domain in order to manipulate data from them against text files I have exported from elsewhere for a one-off job. An example might be if I have a bunch of mbox files and need to process them against log files that have message IDs in them. I could have an API to read the emails, and an API to read the logs, or I could just use textutils because I know an exact, validating regexp is not necessary and log format injection would have no consequence in this particular task.
I do see the benefits of having logs be better structured data, but I also see downsides of taking plain text logs away. Claiming that there are no downsides, and therefore no trade-off to be made, is futile. It's like playing whack-a-mole, because nobody is capable of covering every single use case.
Honestly - I agree about the ELK stack side - piping all your logs into ES / Logstash is a great idea. (Or Splunk / Graylog / Logentries)
If you run any sort of distributed system, this is vital. And while that counts as binary logs, I would argue that on the local boxes it should stay text.
I would agree. If you are running any sort of complex queries on your data - go to logstash, and do it there - it's much nicer than regexes.
If, on the other hand, you just want to see how a development environment is getting on, or to troubleshoot a known bad component, tailing into grep (or just tailing, depending on the verbosity of your logs) is fine.
I don't have to remember some weird incantation to see the local logs, worry about corruption etc.
One problem I will point out with the setup described is that syslog-ng can be blocking. If the user is disconnected from the central logstash, and their local one dies, as soon as the FIFO queue in syslog-ng fills, good luck writing to /dev/log, which means things like 'sudo' and 'login' have .... issues.
Instead, if you have text files being written out, and something like beaver collecting them and sending them to logstash, you have the best of both worlds.
Windows has had binary logging forever. Is Windows administration some wonderland of awesome capability for getting intelligence out of logs? Hell no.
For administering Unix like systems, the ability to use a variety of tools to process streams of text is an advantage and valuable capability.
That said, your needs do change when you're talking about managing 10 vs 10,000 vs 100,000 hosts. I think what you're really seeing here is a movement to "industrialize" the operations of these systems and push capabilities from paid management tools into the OS.
I think the largest problem with the Event Log is overreliance on structure. Often you have one particular log record that you know is the problem, but no idea what it means, because you have some generic event code and a bunch of meaningless structured data.
Freeform text logs usually contain more detail as to what exactly happened.
Grepping logs is terrible. Reverse engineering a binary format so you can diagnose why you are down/crashing/losing data is far worse. Logs should be handled as text until they reach their long term storage... then whatever helps analyze and query is fine...
Yeah, in the presence of adequate tooling you don't need to grep logs. But how much more effort is required to use those tool-friendly logging formats? Where is your god when the tool fails?
For me the main reason to access plaintext logs is that they seldom fail, and they are simple. They may be a bore to analyse, but they CAN be analysed.
Anyway, this discussion only makes sense if the task at hand involves heavy log analysis; don't complicate what is simple when it isn't needed.
As for the razor analogy, you're right; however, I wouldn't change my beard to be "razor compatible only". In the software world I'd say it is still not uncommon to find yourself "stranded on a desert island".
Oh jeez. Yes, there are better and more performant tools for parsing optimised binary databases; nobody disputes that. And yes, tools like Splunk are more user friendly than grep; nobody disputes that either. But to advocate a binary-only system for logs is short-sighted, because logs are the go-to when everything else fails and thus need to be readable when every other tool dies. There are quite a few scenarios that could cause this, too:
* log file corruption - text parsing would still work,
* tooling gets deleted - there's a million ways you can still render plain text even when you've lost half your POSIX/GNU userland,
* network connection problems, breaking push to a centralised database - local text copies would still be readable.
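The first scenario is easy to sketch: splice raw garbage bytes into the middle of a text log and the intact records still come straight out the other end (file name and contents invented for illustration):

```shell
# Build a log, then simulate partial corruption with junk bytes.
printf 'ok: record one\n'  >  demo-corrupt.log
printf '\377\376\300@@@\n' >> demo-corrupt.log   # raw garbage in the middle
printf 'ok: record two\n'  >> demo-corrupt.log

# -a tells grep to treat the file as text even though it now
# contains binary junk; the undamaged records are still found.
grep -a -c '^ok:' demo-corrupt.log
```

Both intact records are counted; only the corrupted span is lost, not the file.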
In his previous blog post he commented that there's no point running both a local text version and a binary version, but since the entirety of his rant is really about tooling rather than log file format, I've yet to see a convincing argument against running the two paradigms in parallel.
The ease of recovering data from a corrupted log file depends on whether the logged events have been written as sequential records. This is true for text-based logs (the record delimiter being a newline), and is also true of the most popular binary (i.e. structured) log formats, namely Windows event logs and systemd's journals. Probably not if you're storing them in a more general-purpose database, though.
So this really is dependent on the file format of your log data, rather than an inherent difference between text and binary logging.
This is a discussion for a sake of discussion. The way I see it is that author has a niche situation on his hands and therefore should use a product designed for that particular niche, instead of complaining how everyone's wrong and trying to shove his perspective down peoples' throats.
Sounds like somebody in the systemd camp. I really dislike added complexity when it is totally unnecessary. If people want to transform their logs into a different storage format, that is up to them. Text files, however, are a fantastically simple way of storing... (drumroll please) text. Surprising /s
> For example: find all logs between 2013-12-24 and 2015-04-11, valid dates only.
That’s a straw man. If you’re grepping logs, you don’t need a regular expression that matches only valid dates, because you can assume that the timestamps on the log records are valid dates.
Not to mention that 99.9% of the searches one does of a log file aren't really that complex. Heck, I'm willing to wager that 90%+ of my searches over the last 20 years have been in log files from a particular day.
That's the thing about having simple text log files - the cognitive load required to pull data out of them, often into a format that can then be manipulated by another tool (awk being one of the better known), is so low that you can perform these searches without a context switch.
If you have a problem, you can reach into the log files, pull out the data you need, possibly massage/sum/count particular records with awk, all without missing a beat.
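A minimal sketch of that massage/count step with awk; the field layout and file name are made up for illustration:

```shell
# A tiny access-log stand-in: method, path, status code.
cat > demo-access2.log <<'EOF'
GET /index 200
GET /style 200
GET /nope  404
POST /api  500
EOF

# Tally requests per status code in one pass.
awk '{n[$3]++} END {for (s in n) print s, n[s]}' demo-access2.log | sort
```

Three lines of output, one per status code, with no context switch out of the shell.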
This is particularly important for sysadmins who may be managing dozens of different applications and subsystems. Text files pull all of them together.
But, and here is the most important thing that people need to realize - for scenarios in which complex searching is required, by all means move it into a binary format - that just makes sense if you really need to do so.
The argument isn't all text instead of binary, it is at least text and then use binary where it makes sense.
More to the point: Text logs are just as structured as binary logs, but they have the additional property of not being as opaque and, therefore, being immediately usable with more preexisting, well-tested, well-known tooling.
> If you’re grepping logs, you don’t need a regular expression that matches only valid dates because you can assume that the timestamps on the log records are valid dates.
Even _if_ I agreed with your assumption[1], are you actually suggesting that
is a serious solution? I admit that it is shorter than the author's solution, _but it still proves his point_.
And then what about multi-line log records? `grep` can't tell where the record ends; sure, I can -A, but there's no number I can plug in that's going to just work: I need to guess, and if I get a truncated result or too much output, adjust. Worse, I can get too much output _and_ a truncated record where I need it…
Using regexes for time is like using regexes for HTML: it's possible-ish, but most people are probably doing it wrong, and storing things in their correct data structures is a much simpler solution.
After reading the article I wonder: if there are lots of tools that deliver all the binary advantages in indexes but leave the logs as text files, why is that not fine? To get the binary advantage, the log does not have to be binary.
The example with the timestamps is also strange. No matter how you store the timestamps, parsing a humanly reasonable query like "give me 10 hours starting from last Friday 2am" to an actual filter is a complex problem. The problem is complex no matter how you store your timestamp. You can choose to do the complexity before and create complex index structures. You can choose to have complex algorithms to parse simple timestamps in binary or text form, you can build complex regexes. But something needs to be complex, because the problem space is. Just being binary doesn't help you.
And that's really the point here, isn't it? Just being binary in itself is not an advantage. It doesn't even mean by itself that it will save disk space. But text in itself is an advantage, always, because text can be read by humans without help (and in some instances without any training or IT education), binary not.
Yesterday I was thinking there might be something to binary logs. Now I'm convinced there isn't. The only disadvantage seems to be that you also lose disk space if you store it in clear text. But disk space isn't an issue in most situations (and in many situations where it is an issue, you might have the resources and tools at hand to handle that as well). It is added complexity for no real advantage. Thanks for clearing that up.
Another advantage of using structured data rather than free-form text is that you can more precisely encode the essence of the event, with fields for timestamp, event source, type of event, its severity, any important parameters, and so on. This permits logging to be independent of the language of the system operator. Rather than grepping for what is almost always English text, one can query a language-independent set of fields, and then, if a suitable translation has been done, see the event in one's native language.
When applied widely throughout a system, this leads to the internationalisation of log messages. Thus lessening the anglocentric bias in systems software. Windows has done this for years, at least with its own system logging (other applications can still put free-form text into the event logs if they wish.)
With regards to disk space - compressed text logs are pretty common. The frequency with which they are compressed is adjustable, and, gzcat is a pretty well known mechanism for opening them.
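A quick sketch of that workflow with an invented file name; the compressed log streams straight back into the usual text pipeline:

```shell
# Write a small log, then compress it (as logrotate would).
printf 'boot ok\nERROR disk full\nshutdown\n' > demo-app.log
gzip -f demo-app.log          # produces demo-app.log.gz

# zcat (gzcat on some systems) decompresses to stdout,
# so grep and friends work on the compressed file unchanged.
zcat demo-app.log.gz | grep -c ERROR
```

Disk space is saved at rest, while the grep workflow is untouched.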
Now I'm no systemd apologist but maybe some of the hate towards systemd, journald and pals is unwarranted. If one gives these newer tools a chance, they actually have some nice features. Despite the Internet's opinion, seems like they were not actually created to make Linux users' lives difficult.
If binary logs turn out to be the wrong technological decision, I'm sure we'll figure that out and change over to text logs again. All it would take is a few key savvy users losing their logs to journald corruption and the change in the wider "ecosystem" would be made. But if all goes well... then what's to complain about? :-D
My main problem with this is that ASCII is not something that will ever change over time. The data format is wonderfully static. Forever. Introduce a binary format? You get versioning. That is a major downside.
What you lose when you move away from text logs is not any real benefit; what you lose is the illusion of control you have with text logs.
Text logs can be corrupted, text logs can be made unusable, you need a ton of domain-specific knowledge to even begin to make sense of text logs, etc.
But there's always a sense that, if you had the time, you could still personally extract meaning from them. With binary logs, you couldn't personally sit there and read them out line by line.
The issue is psychology, not pragmatism, and that's why text logs have been so sticky for so long.
A substring of text may or may not be a date, and based on the excellent tools available in Linux you can decide how to extract that "data point". If binary logging is little more than a stream of text, then that is fine, but I seriously doubt that is the push happening. Personally I prefer having a raw stream of data that I have to work with as best as I can, rather than having to use some flag defined by somebody else to range across dates. That is the fundamental difference, it seems: do you want a collection of tools that can be applied in a variety of ways, or do you want the "one way" (with potential versioning... have fun!)?
Again if the binary log is simply better compressed data, well we have ways of compressing text already as an afterthought. This really, fundamentally, seems to be a conflict in how people want to administer their systems and, for the most part, this seems to be about creating a "tool" that people then have to pay money for to better understand.
> Does database store the data in text files? No? That's my point.
This guy is a first-class idiot who knows just enough to reformulate a settled issue into yet another troll article. "a database (which then goes and stores the data in a binary format)". How about: a text file IS a database. It's encoded 1s and 0s in a universal format, instead of the binary DB format, which can be corrupted by the slightest modification or hardware failure.
I think there are a number of issues that are getting mushed into one.
* Journal is just terrible.
* some text logs are perfectly fine.
* when you are in rescue mode, you want text logs
* some people use text logs as a way to compile metrics
I think the most annoying thing for me about journald is that it forces you to do things their way. However, it's optional, and in CentOS 7 it's turned off, or it's beaten into such a shape that I haven't noticed it's there... (If that is the case, I've not really bothered to look; I poked about to see if logs still live in /var/log/, they did, and that was the end of it. Yes, I know that if this is the case, I've just undermined my case. Shhhhh.)
/var/log/messages for kernel oopses, auth for logins, and all the traditional systemy type things are good for text logs. Mainly because 99.9% of the time you get fewer than 10 lines a minute.
Being able to sed, grep, tee and pipe text files is brilliant on a slow connection with limited time/mental capacity, i.e. a rescue situation. I'm sure there will be a multitude of stable tools that'll pop up to deal with a standardised binary log format, in about ten years.
The last point is the big kicker here. This is where, quite correctly, it's time to question the use of grep. Regex is terrible. It's a force/problem amplifier. If you get it correct, well done. Wrong? You might not even know.
Unless you don't have a choice, you need to make sure that your app emits metrics directly, or as close to directly as possible. Failing that, you need to use something like Elasticsearch. However, because you're getting the metrics as an afterthought, you have to do much more work to make sure that they are correct. (Although forcing metrics into an app is often non-trivial.)
If you're starting from scratch, writing custom software, and think that log diving is a great way to collect metrics, you've failed.
If you are using off-the-shelf parts, it's worth spending the time interrogating the API to gather stats directly. You never know, collectd might have already done the hard work for you.
The basic argument he puts forth is this: text logs are a terrible way to interchange and store metrics. And yes, he is correct.
Of course you need to log some data in textual format for emergencies, but if you had a tool that indexes events on timestamps, servers, monitorees, severity and event type, while severely reducing the storage required, you would be able to log much more data, and find problems faster. Arguing binary vs text logs is like arguing serial port vs USB on some industrial systems.
Great to see some effort in this area. I've been using New Relic and it's pretty great for errors because we've set up Slack/email notifications. However, there's nothing for general log (e.g. access log) parsing. I'm installing an ELK stack on my machine right now and hope that it's enough.
Doesn't this just mean that we should have a more "intelligent" version of grep? For example, this "supergrep" could periodically index the files it is used on, so searching becomes faster.
It seems to me that most of the worry about a binary log file being "opaque" could be solved with a single utility:
log-cat <binary-log-file>
… that just outputs it in text. Then you can attack the problem with whatever text-based tools you want.
But to me, having a utility that could do things like get a range of log lines — in sorted order — or grep on just the message would be amazing. These are all things that proponents of grep, I'm sure, will say "you can!" do with grep… but you can't.
The dates example was a good one. I'd much rather:
Also, my log files are not "sorted". They are, but they're sorted _per-process_, and I might have multiple instances of some daemon running (perhaps on this VM, perhaps across many VMs), and it's really useful to see their logs merged together[2]. For this, you need to understand the notion of where a record starts and ends, because you need to re-order whole records. (And log records' messages are _going_ to contain newlines. I'm not logging a backtrace on one line.) grep doesn't sort. |sort doesn't know enough about a text log to adequately sort, but
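For what it's worth, the easy half of this is sketchable: merging per-source logs that are each already sorted works fine as long as every record is a single line with a sortable timestamp (file names and entries invented here); it's exactly the multi-line records described above that break it:

```shell
# Two per-VM logs, each internally sorted by ISO timestamp.
cat > demo-vm1.log <<'EOF'
2024-05-01T10:00:00 vm1 started
2024-05-01T10:00:02 vm1 served request
EOF
cat > demo-vm2.log <<'EOF'
2024-05-01T10:00:01 vm2 started
2024-05-01T10:00:03 vm2 served request
EOF

# -m merges already-sorted inputs into one chronological stream.
# This falls apart the moment a record (say, a backtrace) spans lines.
sort -m demo-vm1.log demo-vm2.log
```

The merged stream interleaves the two sources chronologically, which is the "see their logs merged together" use case, but only for one-line records.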
Binary files offer the opportunity for structured data. It's really annoying to try to find all 5xx's in a log, and your grep matches the process ID, the line number, the time of day…
I've seen some well-meaning attempts at doing JSON logs, such that each line is a JSON object[1]. (I've also seen it attempted where all that was available was a rudimentary format string, and the first " broke everything.)
Lastly, log files sometimes go into metrics (I don't really think this is a good idea, personally, but we need better libraries here too…). Is your log format even parseable? I've yet to run across one that had an unambiguous grammar: a newline in the middle of a log message, with the right text on the second line, can easily get picked up as a date, and suddenly, it's a new record. Every log file "parser" I've seen was a heuristic matcher, and I've seen almost all of them make mistakes. With the simple "log-cat" above, you can instantly turn a binary log into a text one. The reverse — if possible — is likely to be a "best-effort" transformation.
[1]: the log writer is forbidden to output a newline inside the object. This doesn't diminish what you can output in JSON, and allows newline to be the record separator.
[2]: I get requests from mobile developers telling me that the server isn't acting correctly all the time. In order to debug the situation, I first need to _find_ their request in the log. I don't know what process on what VM handled their request, but I often have a _very_ narrow time-range that it occurred in.
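The newline-delimited JSON idea from [1] can be sketched like this (file name, field names and entries are all invented); once there's real structure, "grep just the message" becomes trivial:

```shell
# One JSON object per line; embedded newlines in messages stay
# escaped as \n, so the record separator is never ambiguous.
cat > demo-log.jsonl <<'EOF'
{"ts":"2024-05-01T10:00:00","level":"error","msg":"boom\ntraceback line"}
{"ts":"2024-05-01T10:00:01","level":"info","msg":"recovered"}
EOF

# jq understands the structure: filter on one field, print another.
jq -r 'select(.level == "error") | .msg' demo-log.jsonl
```

This assumes jq is available; the point is that the query names fields rather than guessing at column positions or writing timestamp regexes.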
Windows systems have had better log querying tools than grep for years now, with a well structured log file format to match. It's good to see Linux distributions finally catching up in this regard.
Not that the log files on Linux are all entirely text-based anyway. The wtmp and btmp files are of a binary format, with specialised tools for querying. I don't see anyone complaining about these and insisting that they be converted to a text-only format.
ghshephard|10 years ago
Here is the thing he doesn't seem to understand - all of us who are sysadmins absolutely understand the value of placing complex and large log files into database so that we can query them efficiently. We also understand why having multi-terabyte text log files is not useful.
But what we find totally unacceptable is log files being shoved into binary repositories as the primary storage location. Because you know what everyone has their own idea of what that primary storage location should be, and they are mostly incompatible with each other.
The nice thing about text - for the last 40 years it's been universally readable, and will be for the next 40 years. Many of these binary repositories will be unreadable within a short period, and will be immediately unreadable to those people who don't know the magic tool to open them.
scrollaway|10 years ago
Uh, I don't know what world you live in but I'd like the address because mine sucks in comparison.
Text logs are definitely not a "universal format". Easily accessible, sure. Human readable most of the time? Okay. Universal? Ten times nope.
Give you an example: uwsgi logs don't even have timestamps, and contain whatever crap the program's stdout outputs, so you often end up with three different types of your "universal format" in there. I'm not giving this example because it's contrived, but because I was dealing with it the very moment I read your comment.
rkrzr|10 years ago
The way I read his article, he's not really opposed to additionally keeping your logs around as text. But you make a good point of using text as the primary storage location, since you can always easily feed it to some binary system for further analysis.
Would the best practice then be to keep your logs around as (compressed) text, but additionally feed it to your log analysis system of choice for greater querying capabilities?
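A minimal sketch of that "compressed text as primary storage" arrangement (file names are invented): rotated logs get gzipped, but stay greppable via zgrep, and feeding them onward to an analysis system is just a pipe.

```shell
# Compressed text stays greppable and pipeable.
printf '%s\n' '2015-06-25 error: disk full' '2015-06-25 info: ok' > /tmp/app.log.1
gzip -f /tmp/app.log.1
zgrep -c 'error' /tmp/app.log.1.gz   # search without decompressing by hand
zcat /tmp/app.log.1.gz | head -n 1   # or stream it into an analysis pipeline
```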
magnifyingglass|10 years ago
cones688|10 years ago
Our product stores all the logs raw, in flat files on the file system; we don't use databases for keeping the logs in. This allows you to scale massively (the ingestion limit is that of the correlation engine and disk bandwidth). You then just need an efficient search crawler and good use of metadata, so search performance is good too.
The issue is, if you ever need to pull the logs for court and you have messed with them (i.e. normalised them and stuffed them into a DB), then your chain of custody is broken.
Best of both worlds means parsed-out normalisation, so I don't have to remember that Juniper calls the source IP srcIP and Cisco calls it SourceIP, but with the original logs under the covers for grepping if you need them.
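A toy illustration of that normalisation step (the vendor field names and log lines here are invented, not real Juniper/Cisco syntax): map both spellings to one key in a derived copy, while the untouched originals stay on disk for chain of custody.

```shell
# Invented firewall logs with two vendor spellings of the same field.
cat > /tmp/fw.log <<'EOF'
juniper srcIP=192.0.2.1 action=deny
cisco SourceIP=192.0.2.2 action=permit
EOF
# Derived, normalised copy; /tmp/fw.log itself is never modified.
sed -e 's/srcIP=/src_ip=/' -e 's/SourceIP=/src_ip=/' /tmp/fw.log > /tmp/fw.normalised
grep -c 'src_ip=' /tmp/fw.normalised
```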
sobkas|10 years ago
Then a punch in the face is a universal form of communication, too. Also, EBCDIC is the only encoding the future will recognize!
thaumaturgy|10 years ago
Should I submit patches to jawstats so that it'll support google-log-format 1.0 beta, or the newer Amazon Cloud Storage 5 format? Or both? Or just go with the older Microsoft Log Storage Format? Or wait until Gruber releases Fireball Format? Has he decided yet whether to store dates as little-endian Unix 64 bit int timestamps, or is he still thinking about going with the Visual FoxPro date format, y'know, where the first 4 bytes are a 32-bit little-endian integer representation of the Julian date (so Oct. 15, 1582 = 2299161) and the last 4 bytes are the little-endian integer time of day represented as milliseconds since midnight? (True story, I had to figure that one out once. Without documentation.)
Should I write a new plugin for Sublime Text to handle the binary log formats? Or write something that will read the binary storage format and spit out text? Or is that too inefficient? Or should I give up on reading logs in a text form at all and write a GUI for it (maybe in Visual Basic)?
Do you know when I should expect suexec to start writing the same binary log format as Apache, or should I give up waiting on that and just write a daemon to read the suexec binary logs and translate them to the Apache binary logs?
Should I take the time to write a natural language parsing search engine for my custom binary log format? Do you think that's worth the time investment? I would really like to be able to search for common misspellings when users ask about a missing email, you know, like "/[^\s]+@domain.com/" does now.
I look forward to your guidance. I've been eagerly awaiting the day that I can have an urgent situation on my hands and I can dig through server logs with all of the ease and convenience of the Windows system logs.
geographomics|10 years ago
pjc50|10 years ago
This is really the important point here. For small systems, grep works fine. The number of people administering small systems is much greater than the number of people administering large systems. The systemd controversy has caused people to fear that change they don't want will be imposed on them and their objections insultingly dismissed: a consequence of incredibly bad social "change management" by its proponents.
They are therefore deploying pre-emptive rhetorical covering fire against the day when greppable logs will be removed from the popular Linux distributions. Plain text is the lingua franca; binary formats bind you to their tools with a particular set of design choices, bugs and disadvantages. My adhoc log grepping workflow has a different set of bugs and disadvantages, but they're mine.
mrweasel|10 years ago
That's really the key for me. My go-to example is searching for IP addresses across different logs. If I have just one machine and I want to find an IP in the SSH, web and mail logs, I shouldn't have to use multiple tools to get that data.
Logstash, Splunk and other tools store things in a binary format, as he writes, and that's perfectly valid, the only workable solution in fact. But I don't want to be forced to run a centralized logging server if I have just one or two servers.
If it's okay to claim that binary logging is the only way to go, because you have hundreds of servers, it's also okay to claim that text files are the only solution, because I just have one server.
Finally, aren't those binary logs (the ones that come from individual services) going to be transformed into text when I transmit them to something like Splunk, only to be transformed back into some internal binary format when received? It seems we could save a transformation in that process.
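The single-box case described above, sketched with invented file names and contents: one pattern, several text logs, no extra infrastructure.

```shell
# One IP address, three (made-up) logs, one tool.
mkdir -p /tmp/logs
echo 'sshd: accepted password for root from 198.51.100.9'  > /tmp/logs/auth.log
echo '198.51.100.9 - - "GET /" 200'                        > /tmp/logs/access.log
echo 'postfix: connect from unknown[198.51.100.9]'         > /tmp/logs/mail.log
grep -l '198.51.100.9' /tmp/logs/*.log   # which logs saw this address?
```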
nrr|10 years ago
(If you're shunting _all_ of your log data off at that scale, you're crazy, and you'll melt your switches if you aren't careful.)
The name of the game is to think of the problems that you're solving and how they relate to the business bottom line. No sooner, no later. Additionally, what's most troubling is that we've turned this exercise into an emotional one, not one with any sort of scientific-oriented perspective.
I can personally say with conviction that I'd like to sit down and actually collect data on, e.g., how many instructions it takes to store logs to disk in plain text versus a binary format, how many it takes to retrieve logs from disk in both situations, and how much search latency I incur when trying to retrieve said logs from disk in the same. At scale, which is where most of my attention lies these days, that's the kind of thing that matters because those effects get amplified automatically—often to operators' and capacity planners' horrors—by the number of machines you have.
If you're dealing with smaller systems, it won't matter as much, but at that point, you're probably dealing with the other side of this, which is having information on how many requests you get for historical log data and what sort of criteria were used in that search. If you're getting requests less frequently than, say, once per quarter, it likely wouldn't be worth your time to invest in what Mr. Nagy is evangelizing.
tl;dr: Continue using your ad hoc grep-fu, but be mindful of how much time it takes you to get the data you're looking for. That alone will be your decision criterion for adopting something like this.
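A crude version of the measurement suggested above, on synthetic data: generate a log of known size and time a representative query, so you have a number to compare against an indexed store. The log contents are invented.

```shell
# 100k synthetic log lines, then time one ad hoc search.
seq 1 100000 | sed 's/^/2015-06-26 12:00:00 req id=/' > /tmp/big.log
time grep -c 'id=99999$' /tmp/big.log
```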
mdekkers|10 years ago
Do you have any evidence for this statement? Because it sounds all kinds of wrong.
EmanueleAina|10 years ago
rlpb|10 years ago
The downside to this is that now you don't have a set of global tools which can easily operate across these separate datasets without writing code against an API. I hear PowerShell tackles this; I don't know how well. The general principle, though, harms velocity at just getting something simple done, to the benefit of being able to do extremely complex things more easily. See Event Viewer for a good example of this.
Logs don't exist in isolation. I want to use generally global tooling to access and manipulate everything. I don't want to have to write (non-shell) code, recall a logging-specific API or to have to take the extra step of converting my logs back to the text domain in order to manipulate data from them against text files I have exported from elsewhere for a one-off job. An example might be if I have a bunch of mbox files and need to process them against log files that have message IDs in them. I could have an API to read the emails, and an API to read the logs, or I could just use textutils because I know an exact, validating regexp is not necessary and log format injection would have no consequence in this particular task.
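The one-off job described above, sketched with invented data: extract message IDs from a mail log with grep, then use them as a fixed-string pattern file against another text file. Plain textutils, no logging API.

```shell
# Made-up mail log with message IDs.
cat > /tmp/mail.log <<'EOF'
delivered id=<abc@example.com>
bounced id=<def@example.com>
EOF
# Pull out the IDs, then match them against a second text file.
grep -o '<[^>]*>' /tmp/mail.log > /tmp/ids.txt
printf '%s\n' 'Message-ID: <def@example.com>' 'Message-ID: <zzz@example.com>' > /tmp/mbox-index
grep -c -F -f /tmp/ids.txt /tmp/mbox-index   # how many mails match logged IDs?
```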
I do see the benefits of having logs be better structured data, but I also see downsides of taking plain text logs away. Claiming that there are no downsides, and therefore no trade-off to be made, is futile. It's like playing whack-a-mole, because nobody is capable of covering every single use case.
pjmlp|10 years ago
Actually any non-UNIX OS clone out there, including mainframe and embedded OSes.
mugsie|10 years ago
If you run any sort of distributed system, this is vital. And while that counts as binary logs, I would argue that on the local boxes it should stay text.
I would agree: if you are running any sort of complex queries on your data, go to logstash and do it there - it's much nicer than regexes.
If, on the other hand, you just want to see how a development environment is getting on, or to troubleshoot a known bad component, tail'ing | grep (or just tail'ing, depending on the verbosity of your logs) is fine.
I don't have to remember some weird incantation to see the local logs, worry about corruption, etc.
One problem I will point out with the setup described is that syslog-ng can be blocking. If the user is disconnected from the central logstash and their local one dies, then as soon as the FIFO queue in syslog-ng fills, good luck writing to /dev/log, which means things like 'sudo' and 'login' have... issues.
Instead, if you have text files being written out, and something like beaver collecting them and sending them to logstash, you have the best of both worlds.
Spooky23|10 years ago
For administering Unix like systems, the ability to use a variety of tools to process streams of text is an advantage and valuable capability.
That said, your needs do change when you're talking about managing 10 vs 10,000 vs 100,000 hosts. I think what you're really seeing here is a movement to "industrialize" the operations of these systems and push capabilities from paid management tools into the OS.
dfox|10 years ago
Freeform text logs usually contain more detail as to what exactly happened.
indymike|10 years ago
phn|10 years ago
For me, the main reason to access plaintext logs is that they seldom fail, and they are simple. They may be a bore to analyse, but they CAN be analysed.
Anyway, this discussion only makes sense if the task at hand involves heavy log analysis; don't complicate what is simple when it isn't needed.
As for the razor analogy, you're right, however I wouldn't change my beard to be "razor compatible only". In the software world I'd say it is still not uncommon to find yourself "stranded in a desert island".
laumars|10 years ago
geographomics|10 years ago
So this really is dependent on the file format of your log data, rather than an inherent difference between text and binary logging.
nailer|10 years ago
arpa|10 years ago
4ydx|10 years ago
robinhouston|10 years ago
That’s a straw man. If you’re grepping logs, you don’t need a regular expression that matches only valid dates because you can assume that the timestamps on the log records are valid dates. But I suppose
doesn’t look so bad. The whole thing is similarly exaggerated.
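The point above in practice, on invented sample lines: because the timestamps in a log can be assumed to be valid dates, a loose anchor is enough, and nobody needs a regex that rejects February 30th.

```shell
# Made-up dated log lines.
cat > /tmp/dated.log <<'EOF'
2015-06-26 12:00:01 ok
2015-06-27 09:15:00 fail
EOF
grep -c '^2015-06-26' /tmp/dated.log   # no need to validate the date itself
```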
ghshephard|10 years ago
That's the thing about having simple text log files - the cognitive load required to pull data out of them, often into a format that can then be manipulated by another tool (awk, being one of the more well known), is so low that you can perform them without a context switch.
If you have a problem, you can reach into the log files, pull out the data you need, possibly massage/sum/count particular records with awk, all without missing a beat.
This is particularly important for sysadmins who may be managing dozens of different applications and subsystems. Text files pull all of them together.
But, and here is the most important thing that people need to realize - for scenarios in which complex searching is required, by all means move it into a binary format - that just makes sense if you really need to do so.
The argument isn't all text instead of binary, it is at least text and then use binary where it makes sense.
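The kind of no-context-switch pipeline described above, on an invented access-log layout (method, path, status, bytes): count and sum in one awk pass, nothing to install, nothing to configure.

```shell
# Made-up access log: method, path, status, bytes.
cat > /tmp/access.log <<'EOF'
GET /a 200 512
GET /b 500 128
GET /a 200 2048
EOF
# Count the 200s and sum their bytes in one pass.
awk '$3 == 200 { n++; bytes += $4 } END { print n, bytes }' /tmp/access.log
```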
cbd1984|10 years ago
deathanatos|10 years ago
Even _if_ I agreed with your assumption[1], are you actually suggesting that
is a serious solution? I admit that it is shorter than the author's solution, _but it still proves his point_. And then what about multi-line log lines? `grep` can't tell where the next line is; sure, I can -A, but there's no number I can plug in that's going to just work: I need to guess, and if I get a truncated result or too much output, adjust. Worse, I get too much output _and_ a truncated record where I need it…
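For what it's worth, one text-side workaround for the fixed -A window (the log format below is invented): treat any line starting with a timestamp as the start of a record and let awk carry a flag, so whole multi-line records come out regardless of their length.

```shell
# Made-up log with a multi-line record (a backtrace).
cat > /tmp/multi.log <<'EOF'
2015-06-26 12:00:01 ERROR boom
  at frame one
  at frame two
2015-06-26 12:00:02 INFO fine
EOF
# A timestamp line starts a record; keep printing until the flag flips.
awk '/^2015/ { keep = /ERROR/ } keep' /tmp/multi.log
```

This still depends on guessing what a record-start line looks like, which is exactly the heuristic-parsing problem the comment is complaining about.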
[1] most log file formats I've run across do not guarantee the date to appear in a given location.
nailer|10 years ago
TillE|10 years ago
4ydx|10 years ago
erikb|10 years ago
The example with the timestamps is also strange. No matter how you store the timestamps, parsing a humanly reasonable query like "give me 10 hours starting from last Friday 2am" to an actual filter is a complex problem. The problem is complex no matter how you store your timestamp. You can choose to do the complexity before and create complex index structures. You can choose to have complex algorithms to parse simple timestamps in binary or text form, you can build complex regexes. But something needs to be complex, because the problem space is. Just being binary doesn't help you.
And that's really the point here, isn't it? Just being binary in itself is not an advantage. It doesn't even mean by itself that it will save disk space. But text in itself is an advantage, always, because text can be read by humans without help (and in some instances without any training or IT education), binary not.
Yesterday I was thinking there might be something to binary logs. Now I'm convinced there isn't. The only disadvantage of text seems to be that it takes more disk space. But disk space isn't an issue in most situations (and in many situations where it is an issue, you might have the resources and tools at hand to handle that as well). It's added complexity for no real advantage. Thanks for clearing that up.
geographomics|10 years ago
When applied widely throughout a system, this leads to the internationalisation of log messages. Thus lessening the anglocentric bias in systems software. Windows has done this for years, at least with its own system logging (other applications can still put free-form text into the event logs if they wish.)
ghshephard|10 years ago
rquirk|10 years ago
journalctl --since="$(date -d'last friday 2am' '+%F %X')" --until="$(date -d'last friday 2am + 10 hours' '+%F %X')"
Now I'm no systemd apologist, but maybe some of the hate towards systemd, journald and pals is unwarranted. If one gives these newer tools a chance, they actually have some nice features. Despite the Internet's opinion, it seems they were not actually created to make Linux users' lives difficult.
If binary logs turn out to be the wrong technological decision, I'm sure we'll figure that out and change over to text logs again. All it would take is a few key savvy users losing their logs to journald corruption and the change in the wider "ecosystem" would be made. But if all goes well... then what's to complain about? :-D
indymike|10 years ago
4ydx|10 years ago
Frondo|10 years ago
Text logs can be corrupted, text logs can be made unusable, you need a ton of domain-specific knowledge to even begin to make sense of text logs, etc.
But there's always a sense that, if you had the time, you could still personally extract meaning from them. With binary logs, you couldn't personally sit there and read them out line by line.
The issue is psychology, not pragmatism, and that's why text logs have been so sticky for so long.
4ydx|10 years ago
Again if the binary log is simply better compressed data, well we have ways of compressing text already as an afterthought. This really, fundamentally, seems to be a conflict in how people want to administer their systems and, for the most part, this seems to be about creating a "tool" that people then have to pay money for to better understand.
jack9|10 years ago
This guy is a first class idiot who knows enough to reformulate a decided issue into yet another troll article. "a database (which then goes and stores the data in a binary format)". How about a text file IS a database. It's encoded 1s and 0s in a universal format instead of the binary DB format which can be corrupted with the slightest modification or hardware failure.
KaiserPro|10 years ago
* Journal is just terrible.
* some text logs are perfectly fine.
* when you are in rescue mode, you want text logs
* some people use text logs as a way to compile metrics
I think the most annoying thing for me about journald is that it forces you to do things their way. However, it's optional, and in CentOS 7 it's turned off, or it's beaten into such a shape that I haven't noticed it's there.... (If that is the case, I've not really bothered to look; I poked about to see if logs still live in /var/log/, they did, and that was the end of it. Yes, I know that if this is the case, I've just undermined my case. Shhhhh.)
/var/log/messages for kernel oopses, auth for logins, and all the traditional systemy type things are good for text logs. Mainly because 99.9% of the time you get less than 10 lines a minute.
Being able to sed, grep, tee and pipe text files is brilliant on a slow connection with limited time/mental capacity, i.e. a rescue situation. I'm sure there will be a multitude of stable tools that'll pop up to deal with a standardised binary log format, in about ten years.
The last point is the big kicker here. This is where, quite correctly, it's time to question the use of grep. Regex is terrible. It's a force/problem amplifier. If you get it correct, well done. Wrong? You might not even know.
Unless you don't have a choice, you need to make sure that your app kicks out metrics directly. Or as close to directly as possible. Failing that you need to use something like elastic search. However because you're getting the metrics as an afterthought, you have to do much more work to make sure that they are correct. (although forcing metrics into an app is often non trivial)
If you're starting from scratch, writing custom software, and think that log diving is a great way to collect metrics, you've failed.
If you are using off-the-shelf parts, it's worth spending the time interrogating the API to gather stats directly. You never know, collectd might have already done the hard work for you.
The basic argument he puts forth is this: text logs are a terrible way to interchange and store metrics. And yes, he is correct.
ownagefool|10 years ago
Just type journalctl and you should see the data there.
sika_grr|10 years ago
arenaninja|10 years ago
amelius|10 years ago
erikb|10 years ago
hartator|10 years ago
lurkinggrue|10 years ago
deathanatos|10 years ago
But to me, having a utility that could do things like get a range of log lines - in sorted order - or grep on just the message, would be amazing. These are all things that proponents of grep will, I'm sure, say "you can!" do with grep… but you can't.
The dates example was a good one. I'd much rather:
Also, my log files are not "sorted". They are, but they're sorted _per-process_, and I might have multiple instances of some daemon running (perhaps on this VM, perhaps across many VMs), and it's really useful to see their logs merged together[2]. For this, you need to understand the notion of where a record starts and ends, because you need to re-order whole records. (And log records' messages are _going_ to contain newlines. I'm not logging a backtrace on one line.) grep doesn't sort, and |sort doesn't know enough about a text log to sort it adequately. Binary files offer the opportunity for structured data: it's really annoying to try to find all 5xx's in a log when your grep matches the process ID, the line number, the time of day… I've seen some well-meaning attempts at doing JSON logs, s.t. each line is a JSON object[1]. (I've also seen it attempted where all that is available is a rudimentary format string, and the first " breaks everything.)
Lastly, log files sometimes go into metrics (I don't really think this is a good idea, personally, but we need better libraries here too…). Is your log format even parseable? I've yet to run across one that had an unambiguous grammar: a newline in the middle of a log message, with the right text on the second line, can easily get picked up as a date, and suddenly, it's a new record. Every log file "parser" I've seen was a heuristic matcher, and I've seen most all of them make mistakes. With the simple "log-cat" above, you can instantly turn a binary log into a text one. The reverse - if possible - is likely to be a "best-effort" transformation.
[1]: the log writer is forbidden to output a newline inside the object. This doesn't diminish what you can output in JSON, and allows newline to be the record separator.
[2]: I get requests from mobile developers telling me that the server isn't acting correctly all the time. In order to debug the situation, I first need to _find_ their request in the log. I don't know what process on what VM handled their request, but I often have a _very_ narrow time-range that it occurred in.
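The JSON-lines convention from [1], sketched with invented records: newlines inside a message are escaped as \n, so the literal newline is an unambiguous record separator and ordinary line tools keep working.

```shell
# One JSON object per line; the embedded \n in "msg" is escaped text,
# so each record really is exactly one line.
cat > /tmp/json.log <<'EOF'
{"ts":"2015-06-26T12:00:01Z","level":"error","msg":"boom\nwith trace"}
{"ts":"2015-06-26T12:00:02Z","level":"info","msg":"ok"}
EOF
grep -c '"level":"error"' /tmp/json.log   # line tools still apply
```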
scrollaway|10 years ago
imaginenore|10 years ago
geographomics|10 years ago
Not that the log files on Linux are all entirely text-based anyway. The wtmp and btmp files are in a binary format, with specialised tools for querying them. I don't see anyone complaining about these and insisting that they be converted to a text-only format.