proddata | 4 years ago
For the 3 caveats at the top, there are already two time-series solutions that look promising (QuestDB, TimescaleDB). Often an operational analytics DB (ClickHouse, CrateDB) might also be a solution.
akulkarni|4 years ago
Thanks for the mention, and I completely agree :-)
Personally, I find a lot in this article misguided.
For example, it essentially defines "time-series database" as "metric store." As TimescaleDB users know, TimescaleDB handles a lot more than just metrics. In fact, we handle any of the data types that Postgres can handle, which I suspect is more than what Honeycomb's custom store supports.
This is a broad generalization: some time-series databases are better at high cardinality than others. And what counts as "high cardinality": 100K? 1M? 10M? (We are in fact designed for _higher cardinalities_ than most other time-series databases [0].)

We just launched tracing and metrics support in the same backend, in Promscale, built on TimescaleDB [1].

I do commend the folks at Honeycomb for having a good product loved by some of my colleagues (at other companies). I also commend them for attempting to write an article aimed at educating. But I wish they had done more research, because without it, this article (IMO) ends up confusing more than educating.
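To make the cardinality question concrete, here is a toy stdlib sketch (all label names and value counts are hypothetical, not from either product) showing why "high cardinality" is hard to pin down: the number of distinct series grows multiplicatively with each label you add.

```python
from itertools import product

# Hypothetical labels on a single metric. Cardinality of the metric is
# the number of distinct label combinations, i.e. the product of the
# per-label value counts.
hosts = [f"host-{i}" for i in range(100)]          # 100 values
endpoints = ["/users", "/orders", "/items"]        # 3 values
container_ids = [f"c-{i}" for i in range(1_000)]   # high-churn label, 1000 values

distinct_series = set(product(hosts, endpoints, container_ids))
print(len(distinct_series))  # 100 * 3 * 1000 = 300000 series for ONE metric
```

Adding one more high-churn label (say, a request ID) multiplies that count again, which is why vendors disagree about where "high" starts.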
For anyone curious on our definition of "time-series data" and "time-series databases": https://blog.timescale.com/blog/what-the-heck-is-time-series...
[0] https://blog.timescale.com/blog/what-is-high-cardinality-how...
[1] https://blog.timescale.com/blog/what-are-traces-and-how-sql-...
ignoramous|4 years ago
PS https://www.timescale.com/papers/timescaledb.pdf is 404
oconnore|4 years ago
This might be a bit off topic, but speaking of gaps in common observability tooling: is an OLAP database a common go-to for longer-timescale analytics (as in [1])? We're using BigQuery, but on ~600GB of log/event data I start hitting memory limits even with fairly small analytical windows.
In this context I have seen other references to: Sawzall (google), Lingo (google), MapReduce/Pig/Cascading/Scalding. Are people using Spark for this sort of thing now? Perhaps a combined workflow would be ideal: filter/group/extract interesting data in Hadoop/Spark, and then load into OLAP for ad-hoc querying?
[1]: https://danluu.com/metrics-analytics/
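The combined workflow described above can be sketched with stdlib Python (event fields and rollup shape are hypothetical): a batch stage collapses raw events into small per-day aggregates, and only those aggregates are loaded into the OLAP store for ad-hoc querying.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical raw log events; in practice these would come from
# Hadoop/Spark reading hundreds of GB of log files.
raw_events = [
    {"ts": 1_600_000_000, "service": "api", "latency_ms": 120},
    {"ts": 1_600_000_050, "service": "api", "latency_ms": 80},
    {"ts": 1_600_086_400, "service": "web", "latency_ms": 200},
]

# Stage 1 (batch): filter/group/extract -- collapse raw events into
# per-(day, service) rollups so the OLAP store only sees aggregates.
rollups = defaultdict(lambda: {"count": 0, "total_latency_ms": 0})
for ev in raw_events:
    day = datetime.fromtimestamp(ev["ts"], tz=timezone.utc).date().isoformat()
    key = (day, ev["service"])
    rollups[key]["count"] += 1
    rollups[key]["total_latency_ms"] += ev["latency_ms"]

# Stage 2 (ad-hoc): the rollup table is tiny compared to the raw logs,
# so interactive queries stay within memory limits.
def mean_latency(day: str, service: str) -> float:
    r = rollups[(day, service)]
    return r["total_latency_ms"] / r["count"]

print(mean_latency("2020-09-13", "api"))  # (120 + 80) / 2 = 100.0
```

The trade-off is that any dimension you drop in stage 1 is gone from the rollup, so the batch job has to anticipate the questions you will ask later.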
proddata|4 years ago
I would not consider ClickHouse or CrateDB "classic" OLAP DBs. I can speak for CrateDB (I work there): it definitely would be able to handle 600GB and query across it in an ad-hoc manner.
We have users ingesting terabytes of events per day and running aggregations across 100 terabytes.
jpgvm|4 years ago
Druid only really has one downside, which is that it's still a bit of a pain to set up. It's gotten a ton better recently, and I have been contributing changes to make it work better out of the box with common big-data tooling like Avro.
For performance it's the top dog, except for really naive queries that are dominated by scan performance. For those you are best off with ClickHouse; its vectorized query engine is extremely fast for simpler, scan-heavy workloads.
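The row-store vs. column-store distinction behind that scan-performance point can be illustrated with a toy stdlib sketch (this is a simplification, not how ClickHouse or Druid are actually implemented): a scan-heavy aggregate only touches one field, so a columnar layout avoids per-row indirection entirely.

```python
# Row-oriented layout: one record per event; a scan must chase a dict
# lookup for every row even though it only needs one field.
N = 200_000
rows = [{"latency_ms": i % 500, "status": 200} for i in range(N)]

# Column-oriented layout: one flat array per field; a scan is a tight
# loop over contiguous values, which is what vectorized engines exploit.
latency_col = [i % 500 for i in range(N)]

row_sum = sum(r["latency_ms"] for r in rows)
col_sum = sum(latency_col)

assert row_sum == col_sum  # same answer, very different access pattern
```

In a real engine the columnar path additionally gets compression and SIMD over those contiguous values, which is where the "extremely fast for scan-heavy workloads" claim comes from.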
dominotw|4 years ago
https://www.youtube.com/playlist?list=PLSE8ODhjZXjY0GMWN4X8F...
Things have changed a little bit now, but not much.