Most probably, said ops folks have quite a few war stories to share about logs.

Maybe a JVM-based app went haywire, producing 500GB of logs within 15 minutes, filling the disk, and breaking a critical system because no one anticipated that a disk could go from 75% free to 0% free in 15 minutes.

Maybe another JVM-based app went haywire inside a managed Kubernetes service, producing 4 terabytes of logs, and the company's Google Cloud monthly bill went from $5,000 to $15,000, because storage that is cheap by the byte is not cheap by the terabyte.

I completely agree that logs are useful, but developers often do not consider what to log and when. Check your company's cloud costs. I bet you the cost of keeping logs is at least 10%, maybe closer to 25% of the total cost.

andrewf|1 year ago

Agreed, you need to engineer the logging system and not just pray. "The log service slowed down and our writes to it are synchronous" is one I've seen a few times.
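For what it's worth, here is a minimal sketch of one way to avoid the synchronous-write failure mode, using Python's standard `logging.handlers.QueueHandler` and `QueueListener`. The names `DroppingQueueHandler`, `build_nonblocking_logger`, and `max_pending` are made up for illustration, and dropping records when the queue is full is an assumption about what you'd rather lose (logs, not request latency).

```python
import logging
import logging.handlers
import queue


class DroppingQueueHandler(logging.handlers.QueueHandler):
    """Hypothetical handler: never blocks the caller; drops records when the queue is full."""

    def __init__(self, log_queue):
        super().__init__(log_queue)
        self.dropped = 0  # how many records were discarded because the queue was full

    def enqueue(self, record):
        try:
            self.queue.put_nowait(record)
        except queue.Full:
            self.dropped += 1


def build_nonblocking_logger(name, slow_handler, max_pending=10_000):
    """Wire a logger so that slow_handler runs on a background thread.

    slow_handler stands in for whatever talks to the log service.
    max_pending bounds memory use if that service stalls (assumed value).
    """
    log_queue = queue.Queue(maxsize=max_pending)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records out of the root logger's handlers
    logger.addHandler(DroppingQueueHandler(log_queue))
    listener = logging.handlers.QueueListener(log_queue, slow_handler)
    listener.start()
    return logger, listener
```

The application thread only pays for a queue insert; the slow write happens in the listener thread. Call `listener.stop()` at shutdown to flush what is pending. Shutting down while the queue is still full is not handled in this sketch.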

On "do not consider what to log and when": I'm not saying don't think about it at all, but if I could anticipate bugs well enough to know exactly what I'll need to debug them, I'd just not write the bug.

jamesfinlayson|1 year ago

Just saw this at work recently - 94% of log disk space for domain controllers was filled by logging what groups users were in (I don't know the specifics but group membership is pretty static, and if a log-on fails I assume the missing group is logged as part of that failure message).

charlie0|1 year ago

Sounds like really bad design choices here. #1, logs shouldn't go on the same machine that's running the app; they should be shipped to another server. #2, if you want local logs, then properly set up log rotation. Doing both would be good.
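As a rough illustration of the local-rotation half, here is a sketch using Python's standard `logging.handlers.RotatingFileHandler`. The function name `build_capped_file_logger` and the default sizes are assumptions picked for the example, not a recommendation for any particular system.

```python
import logging
import logging.handlers


def build_capped_file_logger(name, path, max_bytes=50 * 1024 * 1024, backup_count=4):
    """Hypothetical helper: log to `path`, rotating before the file reaches max_bytes.

    Disk use stays at roughly max_bytes * (backup_count + 1), as long as
    no single message is itself larger than max_bytes.
    """
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backup_count
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records out of the root logger's handlers
    logger.addHandler(handler)
    return logger
```

With the assumed defaults that is about 250 MB at most, so a runaway app overwrites its own old logs instead of filling the disk. Shipping to another server could be layered on with something like `logging.handlers.SysLogHandler`, which is not shown here.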