madhouse | 10 years ago
I don't think I'm saying that. The article presents two setups and a few related use cases, where I believe binary log storage is superior.
> With the services you run, you might be able to dictate that the log formats are restrictive enough that writing a parser for each one isn't a problematic overhead.
I don't need to dictate all log formats. If I can't parse one, I'll just store it as-is, with some metadata (timestamp, origin host, and so on). My processed logs do not need to be completely uniform. As long as they share a few common keys, I can work with them.
For some apps or groups of apps, I can create special parsers, but I don't necessarily need that from day one. If I'm OK with only new logs being parsed according to the new rules (and most often, I am), I can add new rules at any time.
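A minimal sketch of the approach described above: try any app-specific parsers, and if none matches, fall back to storing the raw line with basic metadata. The access-log regex and the `process` function are my own illustration, not the author's actual rules; the point is only that a fallback record with a few common keys (`timestamp`, `origin_host`, `raw`) keeps everything ingestible, and new parsers can be appended later.

```python
import json
import re
import time

# Hypothetical parser for a made-up access-log format (an assumption,
# not a format from the article).
ACCESS_RE = re.compile(r'(?P<method>GET|POST) (?P<path>\S+) (?P<status>\d{3})')

def parse_access(line):
    m = ACCESS_RE.search(line)
    return m.groupdict() if m else None

# New rules can be appended here at any time; old stored logs are untouched.
PARSERS = [parse_access]

def process(line, origin_host):
    # Common keys every record gets, parsed or not.
    record = {
        'timestamp': time.time(),
        'origin_host': origin_host,
        'raw': line,  # the original line is always preserved as-is
    }
    for parser in PARSERS:
        fields = parser(line)
        if fields:
            record.update(fields)
            break
    return record

print(json.dumps(process('GET /index.html 200', 'web1'), sort_keys=True))
```

Lines that no parser understands still land in storage with the common keys, so they remain searchable by time and host even before a dedicated parser exists.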
> Parsing up-front, assuming you know what data you can safely throw away, might appear to some as a premature optimisation.
>> We have well documented tools and workflows, so anyone new to the system can catch up and start working with the logs within minutes.

> It sounds like this is something which could be usefully open-sourced, to show how it's done.
LogStash is a reasonable starting point. Our solution has a lot in common with it, at least at the idea level.
> Pre-parsed binary logs in a locked-down environment might be as flexible as freeform text, but I'd need to see a running system to properly judge.
Only our storage is binary. That is all the article is talking about. Within that binary blob, there are plenty of traces of freeform text, mostly in the MESSAGE keys of application logs that we care less about (and thus parse no further than basic syslog parsing). You still have the flexibility of freeform text, even if you store it in a binary storage format.
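To illustrate the distinction being made: a binary storage format can carry structured keys while one of those keys, MESSAGE, remains arbitrary freeform text. The length-prefixed key=value encoding below is a toy of my own invention (loosely journald-flavoured), not the author's actual on-disk format, but it shows how freeform text survives a binary round trip untouched.

```python
import struct

def encode(record):
    """Pack a dict into a binary blob of length-prefixed key=value fields."""
    out = bytearray()
    for key, value in record.items():
        field = f'{key}={value}'.encode('utf-8')
        out += struct.pack('>I', len(field)) + field  # 4-byte big-endian length
    return bytes(out)

def decode(blob):
    """Recover the dict from the binary blob."""
    record, offset = {}, 0
    while offset < len(blob):
        (length,) = struct.unpack_from('>I', blob, offset)
        offset += 4
        key, _, value = blob[offset:offset + length].decode('utf-8').partition('=')
        record[key] = value
        offset += length
    return record

# PRIORITY is structured; MESSAGE is untouched freeform text.
blob = encode({'PRIORITY': '6', 'MESSAGE': 'user gave up after 3 retries :('})
print(decode(blob))
```

The container is binary, yet nothing about the MESSAGE value had to be parsed, normalised, or thrown away.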