But does it work? I’ve used LLMs for log analysis and they’ve been prone to hallucinating root causes: depending on the logs, the distance between cause and effect can be larger than the context window; for things to go badly wrong we’re usually dealing with multiple failures at once; and plenty of benign issues throw scary-sounding errors.
aluzzardi|3 days ago
Yes, it works really well.
1) The latest models are radically better at this. We noticed a massive improvement in quality starting with Sonnet 4.5.
2) The context issue is real. We solve it by using sub-agents that read through the logs and return only the relevant bits to the parent agent’s context.
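A minimal sketch of that sub-agent pattern (all names are illustrative, not aluzzardi's actual implementation): the parent agent splits a huge log into chunks, hands each chunk to a sub-agent whose only job is to extract lines relevant to the failure, and keeps just those excerpts in its own context. Here the sub-agent call is stubbed with keyword matching; a real system would prompt a model with the chunk and a failure description instead.

```python
def chunk_log(log_text: str, max_lines: int = 200) -> list[str]:
    """Split a log into chunks small enough for one sub-agent call."""
    lines = log_text.splitlines()
    return ["\n".join(lines[i:i + max_lines])
            for i in range(0, len(lines), max_lines)]

def sub_agent_extract(chunk: str, keywords: tuple[str, ...]) -> list[str]:
    """Stand-in for an LLM sub-agent: return only lines that look relevant.
    A real sub-agent would be a model call with its own (disposable) context."""
    return [line for line in chunk.splitlines()
            if any(k in line.lower() for k in keywords)]

def parent_context(log_text: str) -> str:
    """The parent agent sees only the concatenated excerpts, not the raw log."""
    relevant = []
    for chunk in chunk_log(log_text):
        relevant.extend(sub_agent_extract(chunk, ("error", "timeout", "refused")))
    return "\n".join(relevant)

log = ("INFO ok\n"
       "ERROR connection refused to db:5432\n"
       "INFO ok\n"
       "WARN slow query\n"
       "ERROR timeout after 30s\n")
print(parent_context(log))
```

The point of the pattern is that the noisy 99% of the log burns the sub-agents' context windows, not the parent's.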
verdverm|3 days ago
This post is a case study that shows one way to do this for a specific task. We found the root cause of a long-standing problem with our dev boxes this week using AI. I fed Gemini Deep Research a few logs and our tech stack, and it came back with an explanation of the underlying interactions, debugging commands, and the most likely fix. It was spot on; GDR is one of the best debugging tools for problems where you don't have full understanding.
If you are curious, and perhaps as a PSA: the issue was that Docker and Tailscale were competing over iptables updates, and in rare circumstances (one dev, once every few weeks) Docker DNS would get borked. The fix is to have NetworkManager ignore Docker-managed interfaces so Tailscale stops trying to do things with them.
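The comment doesn't give the exact config, but the standard way to mark interfaces as unmanaged in NetworkManager is a `[keyfile]` drop-in; a sketch along those lines (the filename and the exact interface globs for Docker's bridges are assumptions):

```ini
# /etc/NetworkManager/conf.d/10-ignore-docker.conf (illustrative path)
[keyfile]
# Tell NetworkManager to leave Docker's bridge and veth interfaces alone
unmanaged-devices=interface-name:docker0;interface-name:br-*;interface-name:veth*
```

After writing the file, reload with `systemctl reload NetworkManager` (or restart it) for the change to take effect.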
aluzzardi|3 days ago
This. We had much better success by letting the agent pull context rather than trying to push what we thought was relevant.
Turns out it's exactly like with a human: if you push the wrong context, it'll influence them to follow the wrong pattern.
shad42|3 days ago
So yes, it works; we have customers in production.