CNCF's Cortex v1.0: scalable, fast Prometheus implementation

[+] nopzor|6 years ago|reply

awesome job by the cortex team!

there's a lot of good questions, and some confusion in this thread. here is my view. note: i'm definitely biased; am the co-founder/ceo at grafana labs.

- at grafana labs we are huge fans of prometheus. it has become the most popular metrics backend for grafana. we view cortex and prometheus as complementary. we are also very active contributors to the prometheus project itself. in fact, cortex vendors in prometheus.

- you can think of cortex as a scale-out, multi-tenant, highly available "implementation" of prometheus itself.

- the reason grafana labs put so many resources into cortex is that it powers our grafana cloud product (which offers a prometheus backend). like grafana itself, we are also actively working on an enterprise edition of cortex that is designed to meet the security and feature requirements of the largest companies in the world.

- yes, cortex was born at weaveworks in 2016. tom wilkie (vp of product at grafana labs) co-created it while he worked there. after tom joined grafana labs in 2018, we decided to pour a lot more resources into the project, and managed to convince weave.works to move it to the cncf. this was a great move for the project and the community, and cortex has come a long long way in the last 2 years.

once again, a big hat tip to everyone who made this release possible. a big day for the project, and for prometheus users in general!

[edit: typos]

[+] Florin_Andrei|6 years ago|reply

I'm worried about this statement:

> Local storage is explicitly not production ready at this time.

https://cortexmetrics.io/docs/getting-started/getting-starte...

But I want a scale-out, multitenant implementation of Prometheus with local storage that's ready for prod. What are my options then? VictoriaMetrics?

[+] m0rphling|6 years ago|reply

Please note the difference between complimentary and complementary. It's a common homophone confusion in English.

The former means free of charge, or expressing praise or a compliment.

The latter means disparate things go well together and enhance each other's qualities.

[+] kapilvt|6 years ago|reply

also props to https://weave.works for creating cortex, open-sourcing it and moving it under cncf, something this blog post leaves out.

[+] netingle|6 years ago|reply

Hi! Tom, one of the Cortex authors here. Super proud of the team and this release - let me know if you have any questions!

[+] number101010|6 years ago|reply

Hey Tom!

Can you outline how Cortex differs from some of the other available Prometheus backends?

[+] ctovena|6 years ago|reply

Great job, Cortex team! Do you think this means Cortex will move to incubation in the CNCF landscape?

[+] ones_and_zeros|6 years ago|reply

Isn't prometheus an implementation and not an interface? I have "prometheus" running in my cluster, if it's not cortex, what implementation am I using?

[+] ownagefool|6 years ago|reply

It's kinda several things:

- The OSS product

- The Storage Format (I guess)

- The Interface for pulling metrics (https://github.com/OpenObservability/OpenMetrics)

I haven't dug into cortex even a little, but the other comments are suggesting it's API compatible but essentially claiming they're production ready because they'll give you things the OSS project won't give you out of the box, i.e. long term storage and RBAC.

Looks like a good thing.

[+] outworlder|6 years ago|reply

You are using Prometheus.

However, Prometheus can use different storage backends. The TSDB that it comes with is horrible.

I mean, it's workable. And can store an impressive amount of data points. If you don't care about historical data or scale, it may be all you need.

However, if your scale is really large, or if you care about the data, it may not be the right solution, and you'll need something like Cortex.

For instance, Prometheus' own TSDB has no 'fsck'-like tool. From time to time, it does compaction operations. If your process (or pod in K8s) dies, you may be left with duplicate time series. And now you have to delete some (or a lot!) of your data to recover.

Prometheus documentation, last I checked, even says it is not suitable for long-term storage.

[+] netingle|6 years ago|reply

Yes, Prometheus is an implementation - the HN text has a limited number of words, so I thought "Prometheus implementation" conveyed the fact Cortex was trying to be a 100% API compatible implementation of Prometheus, but with scalability, replication etc

[+] gouthamve|6 years ago|reply

Yes, you're running the Prometheus server. But Cortex is a Prometheus API-compatible service that scales horizontally and has multi-tenancy and other things built in.
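
To make "API compatible" concrete, here is a minimal Python sketch (not taken from the Cortex docs): the same Prometheus instant-query request (`/api/v1/query`) can be pointed at either backend, and a multi-tenant Cortex setup additionally expects a tenant ID in the `X-Scope-OrgID` header. The host names, the `/prometheus` path prefix, the tenant name `team-a`, and the helper name `build_instant_query` are all assumptions for illustration; check how your own deployment is routed.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request


def build_instant_query(base_url: str, promql: str,
                        tenant: Optional[str] = None) -> Request:
    """Build (but do not send) a Prometheus instant-query request.

    base_url is whatever serves the Prometheus HTTP API, e.g. a plain
    Prometheus server or a Cortex query endpoint (hypothetical hosts below).
    """
    url = f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"
    # Cortex identifies the tenant via the X-Scope-OrgID header;
    # a plain Prometheus server does not need it.
    headers = {"X-Scope-OrgID": tenant} if tenant else {}
    return Request(url, headers=headers)


# Same query, two backends (both host names are placeholders).
prom_req = build_instant_query("http://prometheus.example:9090", "up")
cortex_req = build_instant_query("http://cortex.example/prometheus", "up",
                                 tenant="team-a")
```

Passing either request to `urllib.request.urlopen` would actually send it; the only difference on the client side is the base URL and the tenant header.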

[+] Rapzid|6 years ago|reply

Dat architecture tho: https://cortexmetrics.io/docs/architecture/ . Holy bi-gebus.

[+] netingle|6 years ago|reply

That's the "microservices" mode - you can run it as a single process and the architecture becomes super boring.

It's like looking at the module interdependencies of a reasonably large piece of software; of course it's going to look complicated.

[+] zytek|6 years ago|reply

Congrats to Grafana Team!

If you're looking at scaling your Prometheus setup, also check out VictoriaMetrics.

Operational simplicity and scalability/robustness are what drive me to it.

I used to send metrics from multiple Kubernetes clusters with Prometheus - each cluster having Prom with a remote_write directive to send metrics to a central VictoriaMetrics service.

That way my "edge" prometheus installations are practically "stateless", easily set up using prometheus-operator. You don't even need to add persistent storage to them.
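
A rough sketch of that setup, under stated assumptions: each edge Prometheus only needs a `remote_write` entry plus an external label so the central store can tell clusters apart. The Python helper below just renders such a config fragment as text; `edge_prometheus_config` is a made-up name, and the cluster name and central URL are placeholders, not values from a real deployment.

```python
def edge_prometheus_config(cluster: str, remote_url: str) -> str:
    """Render a minimal prometheus.yml fragment for an "edge" instance.

    external_labels tags every series with the originating cluster;
    remote_write forwards samples to the central store at remote_url.
    """
    return (
        "global:\n"
        "  external_labels:\n"
        f"    cluster: {cluster}\n"
        "remote_write:\n"
        f"  - url: {remote_url}\n"
    )


# Placeholder values for illustration only.
fragment = edge_prometheus_config(
    "edge-eu-1", "http://central-metrics.example:8428/api/v1/write"
)
print(fragment)
```

Scrape configs are left out on purpose; in the setup described above they would come from prometheus-operator.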

[+] mmcclellan|6 years ago|reply

New to Cortex but when looking at a comparison of Prometheus and InfluxDB (like https://prometheus.io/docs/introduction/comparison/#promethe...) it appears that Cortex offers similar horizontal scalability features to the InfluxDB Enterprise offering. The linked comparison does note the difference between event logging and metrics recording but I am curious (choosy beggar that I am) whether others consider them separate tooling or whether it is possible to remain performant using one solution.

[+] stuff4ben|6 years ago|reply

This was a Weaveworks project right?

[+] gouthamve|6 years ago|reply

Yes, it was created at Weaveworks, but it was later donated to the CNCF and now the community is much bigger! Having said that, Weaveworks is still a major contributor!

46 comments