[+] [-] tinco|11 years ago|reply
Do you think it's good enough to replace Graphite's Carbon at this point? Not as a drop-in replacement for Graphite, but as a backend to a custom metrics system?
I know your project is young and probably has not seen much battle testing, but your blog post indicates to me that you've put a lot of thought into making it robust.
We are using Carbon for our metrics solution at the moment, and I've read its source; it's not something I'd give a big "ready for production" stamp, even though I know many shops are using it in production.
Perhaps, if you feel like it and will entertain my cheap grab for information, could you give a brief explanation of the performance differences between your partition-style format and, for example, Whisper (like RRD; it's what Carbon uses) and InfluxDB? As far as I understand, Whisper is simply a cyclic buffer of fixed-interval points in a file per series, and InfluxDB is essentially a key-value store.
Your solution lies somewhere in between those, right?
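The Whisper/RRD model described above can be sketched as a fixed-size cyclic buffer of fixed-interval points, one per series. This is an illustrative sketch of the idea, not Whisper's actual on-disk format; all names here are made up.

```go
package main

import "fmt"

// ringArchive models one series: a fixed-length cyclic buffer of points
// spaced `step` timestamp units apart. Writing past the end wraps around
// and overwrites the oldest point, which is how retention works.
type ringArchive struct {
	step   int64     // timestamp units between points
	points []float64 // fixed-length cyclic buffer
}

func newRingArchive(step int64, n int) *ringArchive {
	return &ringArchive{step: step, points: make([]float64, n)}
}

// slot maps a (non-negative) timestamp to a buffer index.
func (a *ringArchive) slot(ts int64) int {
	return int((ts / a.step) % int64(len(a.points)))
}

func (a *ringArchive) write(ts int64, v float64) { a.points[a.slot(ts)] = v }
func (a *ringArchive) read(ts int64) float64     { return a.points[a.slot(ts)] }

func main() {
	a := newRingArchive(15, 4) // 15s resolution, retention of 4 points (60s)
	a.write(0, 1.0)
	a.write(15, 2.0)
	a.write(60, 5.0) // wraps around: same slot as ts=0, old point is lost
	fmt.Println(a.read(60), a.read(15)) // → 5 2
}
```

The key property is that lookups are O(1) arithmetic on the timestamp, but retention and resolution are fixed at creation time.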
[+] [-] rakoo|11 years ago|reply
I'm curious why Bolt isn't a good fit for your time-series project? It has excellent iteration support, supports concurrent readers, is already mmapped, and can hold multiple "namespaces" through the use of buckets.
[+] [-] shizcakes|11 years ago|reply
I read through your post, and I am curious: what is the reasoning behind splitting metrics across time-based partitions, rather than partitioning by metric itself?
[+] [-] carbocation|11 years ago|reply
This is a pleasant, self-honest discussion of purpose-driven software as it is being implemented. The language is irrelevant; it's fun to read this type of piece.
[+] [-] misframer|11 years ago|reply
Thank you! You're right; the language is irrelevant. I wrote this in Go because the parent project is written in Go. I'm looking forward to writing more of this blog series.
[+] [-] jrv|11 years ago|reply
Hi, Prometheus[0] author here. Thanks for the interesting article!
Since I was curious how this compares to Prometheus's internal storage for writes, I whipped up some (disclaimer: very naive and ad-hoc!) benchmarks[1] to get a rough feeling for Catena's performance. I am not achieving a lot of write performance with it yet, but maybe I'm doing something wrong or using it inefficiently. Some questions to investigate would be: what's the best number of rows to batch in one insert, and are timestamps in seconds, milliseconds, or essentially only user-interpreted (I noticed the partitioning at least depends heavily on the interval between timestamps)? So far I've just done a tiny bit of fiddling and results haven't changed dramatically.
The benchmark parameters:
* writing 10000 samples x 10000 metrics (100 million data points)
* initial state: empty storage
* source names: constant "testsource" for all time series
* metric names: "testmetric_<i>" (0 <= i < 10000)
* values: the metric index <i> (constant integer value within each series)
* timestamps: starting at 0 and increasing by 15 seconds for every iteration
* GOMAXPROCS=4 (4-core "Core i5-4690K" machine, 3.5GHz)
* Disk: SSD
* Other machine load: SoundCloud playing music in the background
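The parameters above translate into a simple data-generation loop. This sketch shows only the generation side; the Row type and batch shape are my own illustrative assumptions, not Catena's or Prometheus's actual insert API.

```go
package main

import "fmt"

// Row is a hypothetical sample record matching the benchmark parameters.
type Row struct {
	Source, Metric string
	Timestamp      int64
	Value          float64
}

// genBatch builds one iteration's worth of samples: one row per metric,
// with timestamps advancing 15 units per iteration.
func genBatch(iteration, numMetrics int) []Row {
	rows := make([]Row, 0, numMetrics)
	ts := int64(iteration) * 15
	for i := 0; i < numMetrics; i++ {
		rows = append(rows, Row{
			Source:    "testsource", // constant for all series
			Metric:    fmt.Sprintf("testmetric_%d", i),
			Timestamp: ts,
			Value:     float64(i), // constant within each series
		})
	}
	return rows
}

func main() {
	batch := genBatch(2, 3)
	fmt.Println(batch[1].Metric, batch[1].Timestamp, batch[1].Value) // → testmetric_1 30 1
}
```

Running this for 10000 iterations over 10000 metrics yields the 100 million data points used in the benchmark.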
The benchmark results:
#### Prometheus ####
(GOMAXPROCS=4 go run prometheus_bench.go -num-metrics=10000 -samples-per-metric=10000)
Time: 1m26s
Space: 138MB
#### Catena ####
(GOMAXPROCS=4 go run catena_bench.go -num-metrics=10000 -samples-per-metric=10000)
Time: 1h25m
Space: 190MB
So in this particular benchmark Catena took 60x longer and used 1.4x more space.
Please don't take this as discouragement or a statement on one being better than the other. Obviously Catena is very new and also probably optimized for slightly different use cases. And possibly I'm just doing something wrong (please tell me!). I also haven't dug into possible performance bottlenecks yet, but I saw it utilize 100% of all 4 CPU cores the entire time. In any case, I'd be interested in a set of benchmarks optimized specifically for Catena's use case.
Unfortunately we also haven't fully documented the internals of Prometheus's storage yet, but a bit of background information can be found here: http://prometheus.io/docs/operating/storage/ Maybe that's worth a blog post sometime.
[0] http://prometheus.io/
[1] The code for the benchmarks is here: https://gist.github.com/juliusv/ce7c3b5368cd7adf8bc6
[+] [-] misframer|11 years ago|reply
Thanks for trying it out! I haven't had time to run any benchmarks, so I really appreciate you taking the time to do this (especially since it took so long!).
I'm not sure what the best batch size is at the moment. Timestamps are int64s, and it's up to the user to interpret them as they wish. Partition sizes are in terms of the number of timestamps. If you had timestamps which correspond to seconds, and you wanted each partition to be 1 day, you'd choose 86400. This isn't configurable yet unless you modify the source.
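The partition-sizing rule above reduces to integer division on the timestamp. A minimal sketch, assuming second-resolution timestamps and one-day partitions as in the example:

```go
package main

import "fmt"

// With second-resolution timestamps and a partition size of 86400,
// each partition covers exactly one day. Timestamps are plain int64s;
// the units are whatever the user decides they are.
const partitionSize = 86400

// partitionID maps a timestamp to the partition that holds it.
func partitionID(ts int64) int64 {
	return ts / partitionSize
}

func main() {
	fmt.Println(partitionID(0))     // → 0 (first day)
	fmt.Println(partitionID(86399)) // → 0 (last second of the first day)
	fmt.Println(partitionID(86400)) // → 1 (second day)
}
```

The same arithmetic works for millisecond timestamps by choosing 86400000 instead, which is why the partitioning depends so heavily on the interval between timestamps.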
I'm not surprised it's that slow. I'm not a storage engine expert (still a college student), and this contains my first lock-free list implementation :). There is a lot of silliness going on with inserts, like launching a goroutine for each metric on inserts[0], using lock-free O(n) lists for sources and metrics[1] when I could have used a map, and there's still a lock left that should be removed[2], among other things.
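The map-based alternative mentioned above could look like the following. This is an illustrative sketch of a name-to-index registry with an RWMutex, not Catena's actual code.

```go
package main

import (
	"fmt"
	"sync"
)

// registry replaces an O(n) list scan with an O(1) map lookup,
// guarded by an RWMutex so concurrent readers don't block each other.
type registry struct {
	mu      sync.RWMutex
	metrics map[string]int // metric name -> index
}

func newRegistry() *registry {
	return &registry{metrics: make(map[string]int)}
}

func (r *registry) lookupOrCreate(name string) int {
	r.mu.RLock()
	if i, ok := r.metrics[name]; ok { // fast path: read lock only
		r.mu.RUnlock()
		return i
	}
	r.mu.RUnlock()

	r.mu.Lock()
	defer r.mu.Unlock()
	if i, ok := r.metrics[name]; ok { // re-check after taking the write lock
		return i
	}
	i := len(r.metrics)
	r.metrics[name] = i
	return i
}

func main() {
	r := newRegistry()
	fmt.Println(r.lookupOrCreate("testmetric_0")) // → 0
	fmt.Println(r.lookupOrCreate("testmetric_1")) // → 1
	fmt.Println(r.lookupOrCreate("testmetric_0")) // → 0 (existing entry)
}
```

The re-check after upgrading to the write lock matters: two goroutines can both miss on the read path, and without it the second writer would assign a duplicate index.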
On an unrelated note, I see that someone from Prometheus will be at Monitorama. I'll be there too, so I'd love to talk with you guys some more.
Thanks again!
[0] https://github.com/PreetamJinka/catena/blob/8e068b1b95ce1a10...
[1] https://github.com/PreetamJinka/catena/blob/8e068b1b95ce1a10...
[2] https://github.com/PreetamJinka/catena/blob/8e068b1b95ce1a10...
Edit: I'm not sure why I'm being downvoted... I'm not home at the moment, so I'm trying my best to answer using my phone.
Edit #2: Back home with a full-size QWERTY keyboard :).
[+] [-] sbt|11 years ago|reply
Welcome to the internet :)
[+] [-] bascule|11 years ago|reply
[+] [-] unknown|11 years ago|reply
[deleted]
[+] [-] kylered|11 years ago|reply