xinweihe | 6 months ago
When we say we "organize all logs, metrics, and traces", we mean more than just linking them together (which OTel already supports). What we're doing is:
- Context engineering optimization: We leverage the structure among logs, spans, and metadata to filter and group relevant context before passing it to the LLM. In real production issues, it's common to see 10k+ logs, traces, etc. related to a single incident, but most of it is noise. Throwing all that at agents usually leads to poor performance due to context bloat (see https://arxiv.org/pdf/2307.03172). We're addressing that with structured filtering and summarization. For more details, see https://bit.ly/45Bai1q.
- Human-in-the-loop UI: For cases where developers want to manually inspect or guide the agent, we provide a UI that makes it easy to zoom in on relevant subtrees, trace paths, or log clusters, and to directly select spans to include in the agent's reasoning.
The goal isn't just unification; it's scalable reasoning over noisy telemetry data, both automated and interactive.
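To make the filtering idea concrete, here is a minimal sketch of structured filtering plus summarization before handing context to an LLM. The record fields (`trace_id`, `level`, `message`) and the per-trace cap are assumptions for illustration, not the product's actual schema or pipeline:

```python
from collections import defaultdict

def filter_and_group(logs, max_per_trace=3):
    """Keep only error-level logs, group them by trace_id, and cap each
    group so the prompt stays small; summarize what was dropped instead
    of inlining thousands of near-duplicate lines."""
    by_trace = defaultdict(list)
    for rec in logs:
        if rec["level"] == "ERROR":
            by_trace[rec["trace_id"]].append(rec)

    context_lines = []
    for trace_id, recs in by_trace.items():
        kept = recs[:max_per_trace]
        dropped = len(recs) - len(kept)
        context_lines.append(f"trace {trace_id}:")
        context_lines.extend(f"  {r['message']}" for r in kept)
        if dropped:
            # Summarization step: note the cut instead of bloating context.
            context_lines.append(f"  ... and {dropped} more similar errors")
    return "\n".join(context_lines)

logs = [
    {"trace_id": "t1", "level": "ERROR", "message": "db timeout"},
    {"trace_id": "t1", "level": "INFO", "message": "retrying"},
    {"trace_id": "t2", "level": "ERROR", "message": "oom killed"},
]
print(filter_and_group(logs))
```

The point is only the shape of the transform: select by structure first, then compress, so the agent sees a few representative lines per trace rather than the raw firehose.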
Hope that clears things up a bit! Happy to dive deeper if useful.
lmeyerov | 6 months ago
It's interesting to wonder whether 80% of the question answering could be achieved with a prompts/otel.md over MCPs connected to Claude Code, letting agentic reasoning do the rest.
Ex:
* When investigating errors, only query for error-level logs
* When investigating performance, only query spans (skip logs unless required) and keep only name and time. Linearize as ... .
* When querying both logs & traces, inline logs near the relevant trace as part of an LLM-friendly stored artifact jobs/abc123/context.txt
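The span-only heuristic above could be sketched as a small linearizer: keep just name and duration per span and render children indented, so the critical path is readable at a glance. The span fields (`id`, `parent`, `name`, `duration_ms`) are assumed for illustration:

```python
def linearize_spans(spans):
    """Render a span tree as indented 'name duration' lines,
    dropping every other attribute to keep the context compact."""
    children = {}
    roots = []
    for s in spans:
        if s["parent"] is None:
            roots.append(s)
        else:
            children.setdefault(s["parent"], []).append(s)

    lines = []
    def walk(span, depth):
        lines.append(f"{'  ' * depth}{span['name']} {span['duration_ms']}ms")
        for child in children.get(span["id"], []):
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return "\n".join(lines)

spans = [
    {"id": "a", "parent": None, "name": "GET /checkout", "duration_ms": 840},
    {"id": "b", "parent": "a", "name": "db.query", "duration_ms": 790},
    {"id": "c", "parent": "b", "name": "db.connect", "duration_ms": 300},
]
print(linearize_spans(spans))
```

An artifact like this, stored per investigation job, is the kind of LLM-friendly intermediate the bullets describe.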
Are there aspects of the question answering (not UI widgets) you think are too hard there?
zecheng | 6 months ago