bfors|5 months ago
- Consider decoupling your collector from whatever is consuming your traces with something like Kafka. Traces can be pretty heavy, and it can be tricky to scale collectors. If something goes down, it's probably a good idea to keep writing the traces to a queue or topic.
- https://www.otelbin.io is a nice little tool to help with collector configuration
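
To make the decoupling point concrete, here is a minimal sketch in Python. It uses an in-memory `queue.Queue` as a stand-in for a Kafka topic, which is an assumption for illustration only: a real setup would use a durable broker. The function names (`collector_receive`, `consumer_drain`) and the span fields are hypothetical.

```python
import queue
import threading

# In-memory stand-in for a Kafka topic (assumption: a real deployment
# would use a durable broker so buffered spans survive restarts).
topic = queue.Queue(maxsize=10_000)


def collector_receive(span: dict) -> bool:
    """Collector side: enqueue the span and return immediately.

    Returns False when the buffer is full, so the caller can decide
    whether to drop, retry, or apply backpressure.
    """
    try:
        topic.put_nowait(span)
        return True
    except queue.Full:
        return False


def consumer_drain(sink: list, stop: threading.Event) -> None:
    """Consumer side: pull spans off the topic at its own pace."""
    # Keep going until a stop was requested AND the topic is empty.
    while not (stop.is_set() and topic.empty()):
        try:
            span = topic.get(timeout=0.1)
        except queue.Empty:
            continue
        sink.append(span)  # hypothetical backend write
        topic.task_done()


# Spans are accepted even while no consumer is running...
for i in range(5):
    collector_receive({"trace_id": i, "name": "GET /health"})

# ...and are drained once the consumer comes back.
stored: list = []
stop = threading.Event()
worker = threading.Thread(target=consumer_drain, args=(stored, stop))
worker.start()
stop.set()
worker.join()
```

The collector never blocks on the backend: it only needs the topic to be writable, so a consumer outage shows up as a growing backlog rather than as lost spans, at least until the buffer fills.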
akshayKMR|5 months ago
My ideal setup would be to just write SQL on telemetry data and plot dashboards / set alerts.
Also, thoughts on Vector vs otel agent?
srcreigh|5 months ago
Don’t use Vector or otel-agent. Add a materialized view in ClickHouse to transform the data, and switch HyperDX to load from your view (in the UI).
Jedd|5 months ago
This isn't a lot to go on.
The important thing is what you're trying to instrument - hosts, applications, network, microservices, all of the above? (And then whether you want a few weeks of retention or to keep years' worth.)
Grafana in front of Prometheus, with node-exporter or Telegraf (which can expose metrics in Prometheus format) on the clients, will tick a lot of boxes and is fast to get going.
Grafana in front of InfluxDB + Telegraf is similar, but personally I find PromQL easier than InfluxQL.
> ... write SQL on telemetry data and plot dashboards / set alerts.
Read up on the design of TSDBs and log / tracing datastores - their design and intent heavily influence their query languages.
diurnalist|5 months ago
IMO, with the current tech, it entirely depends on what data you're talking about.
For metrics and traces, I would use the OTel collector personally. You will have much more flexibility and it's pretty easy to write custom processors in Go. Support for traces is quite mature and metrics isn't far off. We've been running collectors at production scale for metric and trace ingest for the past couple of years, on the order of 1M events/sec (metric datapoints or spans). You mentioned low volume so that's less important, but I just wanted to mention it in case others find this comment.
Logs are a bit different. We looked into this in the past year. Vector has emerging support for OTLP but it's pretty early. Still, I bet it's pretty straightforward if your backend can ingest via OTLP. Our main concern with running the otel-collector as the log ingest agent was around throughput/performance. Vector is battle-tested; OTel is still a bit early in this space. I imagine the gap will close over time, but I would probably still reach for Vector for this use case at higher scale. That said, YMMV and, as with any technical decision, empirical data and benchmarking on your workloads will be the best way to determine the tradeoffs.
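
On the benchmarking point, one low-effort way to get comparable numbers is to feed each agent the same fixed input and time how long it takes to drain. A minimal sketch of generating that input in Python follows; the function name, the JSON fields, and the payload size are all made up for illustration, and a sample of your real logs would be a better input than synthetic lines.

```python
import json
import tempfile
from pathlib import Path


def write_synthetic_logs(path: Path, count: int, payload_bytes: int = 200) -> int:
    """Write `count` JSON log lines to `path` and return the bytes written."""
    filler = "x" * payload_bytes
    total = 0
    # newline="\n" keeps line endings (and byte counts) identical across platforms.
    with path.open("w", encoding="utf-8", newline="\n") as fh:
        for i in range(count):
            line = json.dumps({"seq": i, "level": "info", "msg": filler}) + "\n"
            fh.write(line)
            total += len(line.encode("utf-8"))
    return total


# Hypothetical location; point each agent's file source at this path.
log_path = Path(tempfile.mkdtemp()) / "bench.log"
bytes_written = write_synthetic_logs(log_path, count=1000)
```

Because the input is identical for every run, differences in drain time or CPU usage can be attributed to the agent and its configuration rather than to the data.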
For your scale you could probably get away with an OTel collector daemonset and maybe a deployment with the Target Allocator (to allocate Prometheus scrapes) and call it a day :)
GordonS|5 months ago
It's been solid, but the UI is kind of clunky and a little buggy here and there. Dashboards are tricky to set up too. But it has no dependencies, was easy to set up, and I couldn't find anything else that handled logs too.
cyberax|5 months ago
The UI is predictably an annoying mess, but that's the case with EVERY tracing solution I've tried. Very much including SigNoz.