Clickhouse went corp a couple weeks ago, Timescale goes fully managed, Snowflake, Dremio, DBX ... :popcorn:
Apache Arrow, K8s, ML analytics have given rise to another DB War.
The end of NoSQL was the realization that SQL had a good reason for existing in most cases. Now we have massively distributed SQL in many flavours. I wonder what the hard lessons will be this time?
I'll wager small data companies will be spending good money on vastly overpowered engines... I wonder what else tho?
Using NoSQL is fast, but it requires you to know how you want to read the data before you start writing it. This is not the case with analytics, where new queries pop up all the time.
As a result, NoSQL never really caught on with the analytics/BI crowd; SQL is always king there, if you discount Excel :)
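To make the access-pattern point concrete, here is a minimal Python sketch. The `events` rows and field names are made up for illustration, and a plain dict stands in for a key-value store. The dict is laid out for one read path chosen up front, while the SQL table answers a question nobody planned for.

```python
import sqlite3

# Hypothetical sample data: (user_id, country, amount)
events = [
    ("u1", "DE", 10.0),
    ("u2", "US", 25.0),
    ("u1", "DE", 5.0),
    ("u3", "US", 7.5),
]

# Key-value style: the layout is chosen for ONE read path (look up by user_id).
by_user = {}
for user_id, country, amount in events:
    by_user.setdefault(user_id, []).append((country, amount))

# The planned query is a single key lookup.
u1_total = sum(amount for _, amount in by_user["u1"])

# An unplanned question ("total per country") has no key to use,
# so it has to scan every value in the store.
kv_per_country = {}
for rows in by_user.values():
    for country, amount in rows:
        kv_per_country[country] = kv_per_country.get(country, 0.0) + amount

# SQL style: the same unplanned question is just another query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, country TEXT, amount REAL)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", events)
sql_per_country = dict(
    conn.execute("SELECT country, SUM(amount) FROM events GROUP BY country")
)
conn.close()
```

Both approaches reach the same totals on four rows; the difference only bites when the store is large and the full scan is the expensive part.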
InfluxDB was doing a project in Rust/Arrow as well, right?
I think the lesson might be: be careful whom you take funding from, and when. All these managed services smack of VCs looking for ARR growth and a big exit...
I’d love to see some overlap/collab with Supabase here. If Timescale is providing databases I don’t have to monkey with that scale to my workload, and Supabase is providing Firebase-like ergonomics and DX, there’s some strong synergy here.
Supabase as an add-on to Timescale Cloud, or Timescale as a supported alternate provider for Supabase, would be compelling for get-it-done devs and teams out there.
We have 2 tables that are good candidates for Timescale; the others are fine with plain Postgres.
We run join queries across those 2 tables and the others. What do you suggest for this? Migrate everything to Timescale, or have two databases (Timescale for the 2 tables and PG for the rest)?
I find ClickHouse, with proper time-series encoding and tiered data storage, to be a better alternative to Timescale. There were also some issues with ingestion speed on TimescaleDB.
Can we think of Timescale as OLTP for time-series data, with guarantees from Postgres, and ClickHouse as OLAP for time-series data?
Clickhouse is a really great piece of technology, especially for general OLAP.
But for time-series workloads, we've found that the results are quite close, with TimescaleDB outperforming Clickhouse for many different types of query workloads. We'll be sharing our results soon.
This is quite interesting. Some of the features and upcoming features look really nice. One-click database forks would be really handy. VPC peering is nice, not just for security, but also so AWS doesn't fleece you with bandwidth charges ($0.01/GB on your side and on Timescale's side in the same AZ, and worse if not). For big data systems that can be a lot.
Seems like to scale it separately it would need to be EBS not local instance storage? I wonder if magnetic or SSD? That does constrain the performance, especially for queries.
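For a feel of how the bandwidth charge adds up, here is a back-of-the-envelope Python sketch. It simply takes the figure quoted in this comment ($0.01/GB, billed on both sides) as its default; the 50 TB/month workload is a made-up example, and current AWS pricing should be checked before relying on any of it.

```python
def monthly_transfer_cost(gb_per_month, rate_per_gb=0.01, billed_sides=2):
    """Estimate monthly data-transfer cost in dollars.

    The defaults are the figures quoted in the comment above
    ($0.01/GB, charged on both sides), not verified pricing.
    """
    return gb_per_month * rate_per_gb * billed_sides


# Hypothetical workload: 50 TB/month moving between the app and the database.
cost = monthly_transfer_cost(50_000)
print(f"Estimated transfer cost: ${cost:,.2f}/month")
```

The cost is linear in volume, which is why it goes unnoticed on small workloads and becomes a line item on large ones.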
Yes, we use EBS SSD on the backend so we can scale up storage separately from the instance. Our Cloud performance metrics are based on this backend so the short answer is no it doesn't constrain perf. The constraint I see right now is that we are currently mostly GP2 with a planned migration to GP3 which will allow for new independent controls of IOPS and throughput. There are certain, uncommon, situations where customers need to bump up performance beyond what the normal GP2 perf steps allow.
To tie GP2/3 back into the serverless vs. DBaaS concepts: we are looking at auto-scaling IOPS/throughput performance while also allowing more direct access, so that customers could control performance via APIs and manage it on their own.
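To illustrate why gp2 performance comes in size-tied "steps" while gp3 decouples it, here is a small Python sketch. The numbers are AWS's published general-purpose SSD baselines as I understand them (gp2: 3 IOPS per GiB with a floor of 100 and a cap of 16,000; gp3: 3,000 IOPS included regardless of size). Treat them as assumptions to verify against the EBS documentation; nothing here describes how Timescale Cloud is actually configured.

```python
def gp2_baseline_iops(size_gib):
    """gp2 ties IOPS to volume size: 3 IOPS per GiB, floor 100, cap 16,000."""
    return min(max(100, 3 * size_gib), 16_000)


# Assumed gp3 baseline: included with any gp3 volume, regardless of size.
GP3_BASELINE_IOPS = 3_000


def gp2_size_needed(target_iops):
    """Smallest gp2 volume (GiB) whose baseline reaches target_iops."""
    if target_iops > 16_000:
        raise ValueError("above the gp2 baseline cap")
    if target_iops <= 100:
        return 1  # the 100 IOPS floor applies even to the smallest volume
    return -(-target_iops // 3)  # ceiling division


for size in (100, 500, 1_000, 6_000):
    print(f"{size:>5} GiB  gp2 baseline: {gp2_baseline_iops(size):>6} IOPS  "
          f"gp3 baseline: {GP3_BASELINE_IOPS} IOPS")
```

On gp2 the only way to get more baseline IOPS is to buy more storage; on gp3 IOPS and throughput are provisioned separately from size, which is what makes independent (or automated) scaling of performance possible.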
Looks cool. Will this work with postgraphile, hasura, or prisma? They seem to suggest it does but I wonder if anyone has tried it. Postgraphile relies on row-level policies [0] and not even all hosted postgres instances work with it in that respect.
Your view is the pedant’s view. Of course there are servers. There always will be, to some degree. They’re just not your servers to manage (e.g. update, scale) or pay for. You don’t own them, or even rent them, so you can eliminate the hardware details from your thought process.
I like serverless. I know that cloud providers can run my code and auto scale it however is needed. I don't want to worry about how that actually works. I don't want to auto scale servers or think about RAM, etc.
I also like cloud services like Timescale. I just want a database of a certain size. How it runs? I don't care at all as long as it works.
Also a fan and user of Timescale, and I hate the serverless phase.
I'm pretty sure everyone is pushing it so they can make money from SaaS. But if this allows them to give it away for free, then I'm all for it (just won't be using it).
Serverless should probably just be renamed to "pay-per-request/CPU time/whatever" or something like that. By and large, most if not all "serverless" databases or data services (like Kafka/Pulsar) are just multi-tenant deployments, and you're billed on the metrics your tenant generates. That's unlike RDS, where you provision an instance that you pay for as long as it's running.
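The two billing models can be compared with a few lines of Python. This is only a sketch: the hourly and per-million-request prices below are hypothetical placeholders, not any vendor's real rates.

```python
def provisioned_cost(hours, price_per_hour):
    """Instance-style billing: pay for every hour the instance runs."""
    return hours * price_per_hour


def usage_cost(requests, price_per_million):
    """Consumption-style billing: pay only for the requests your tenant makes."""
    return requests / 1_000_000 * price_per_million


def break_even_requests(hours, price_per_hour, price_per_million):
    """Request volume at which both models cost the same."""
    return provisioned_cost(hours, price_per_hour) / price_per_million * 1_000_000


# Hypothetical prices, chosen only to make the arithmetic visible.
HOURS_PER_MONTH = 730
instance = provisioned_cost(HOURS_PER_MONTH, price_per_hour=0.20)
threshold = break_even_requests(HOURS_PER_MONTH, 0.20, price_per_million=0.50)

print(f"Provisioned instance: ${instance:.2f}/month")
print(f"Usage billing is cheaper below {threshold:,.0f} requests/month")
```

Below the break-even volume the metered model wins; above it, a provisioned instance is cheaper, which is roughly the trade-off between the two styles of product.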
"Gay people aren't even always happy! Gay means happy!"
Language changes. "Serverless" clearly doesn't mean "computers aren't involved", but "the computers are someone else's responsibility". It's a marketing term, just like "cloud", and it's unlikely to go away.
As with "hackers vs. crackers", "Linux vs. GNU/Linux", "copyright infringement vs. piracy", and other similar scenarios, the ship has sailed here.
There is a lot of marketing in that blog post, but I feel the tl;dr is that it is a rebranding of Forge (their second, in-house cloud offering) that will eventually replace their outsourced TimescaleDB cloud (the first offering, by Aiven).
Am I correct? If so, what is the plan for existing customers of those services? Especially since Forge didn't support clouds other than AWS last time I checked.
Not exactly right - seems like we could have clarified the difference more :-)
We have two cloud products, Timescale Cloud (which is what this post is discussing), and Managed Service for TimescaleDB (MST), which is what you are also referencing.
Also, as we say in the post:
Some of you may remember that we launched the first “Timescale Cloud” 2.5 years ago, as the world’s first fully-managed time-series database-as-a-service on AWS, GCP, Azure. That product is alive and well, and fully supported as before, but is now called “Managed Service for TimescaleDB”.
We're investing in and maintaining both. They are just different products, depending on what you are looking for.
Sounds like a lot of people don't like the "serverless" term. And I agree, it's not great.
So here's an "RFP" - any suggestions for a better term to describe this consumption-based experience, where you don't need to worry about servers (and ideally, you don't pay for what you don't use)?
[+] [-] xbpx|4 years ago|reply
Apache Arrow, K8s, ML analytics have given rise to another DB War.
The end of NoSQL was the realization that SQL had a good reason for existing in most cases. Now we have massively distributed SQL in many flavours. I wonder what the hard lessons will be this time?
I'll wager small data companies will be spending good money on vastly overpowered engines... I wonder what else tho?
[+] [-] akulkarni|4 years ago|reply
[0] https://blog.timescale.com/blog/building-open-source-busines...
[+] [-] dikei|4 years ago|reply
As a result, NoSQL never really caught on with the analytics/BI crowd; SQL is always king there, if you discount Excel :)
[+] [-] Wonnk13|4 years ago|reply
I think the lesson might be: be careful whom you take funding from, and when. All these managed services smack of VCs looking for ARR growth and a big exit...
[+] [-] hardwaresofton|4 years ago|reply
Supabase as an add-on to Timescale Cloud, or Timescale as a supported alternate provider for Supabase, would be compelling for get-it-done devs and teams out there.
[+] [-] CameronNemo|4 years ago|reply
https://supabase.io/
https://www.metabase.com/
https://postgrest.org/
https://hasura.io/
Of course grafana too, but that goes beyond just PostgreSQL.
[+] [-] Aarvay|4 years ago|reply
[+] [-] devops000|4 years ago|reply
[+] [-] Croftengea|4 years ago|reply
[+] [-] chrisdalke|4 years ago|reply
[+] [-] dominotw|4 years ago|reply
By this I assume you want columnar access along the time dimension.
There are a bunch of columnar options out there (Timescale being one). You can operate hybrid row + column access.
https://www.citusdata.com/blog/2021/03/06/citus-10-columnar-...
https://swarm64.com/post/postgresql-columnstore-index-intro/
[+] [-] Jarwain|4 years ago|reply
If not, it may be worthwhile to migrate to a host that does support TimescaleDB, if not Timescale's managed product itself.
[+] [-] devops000|4 years ago|reply
[+] [-] _9rq6|4 years ago|reply
Can we think of Timescale as OLTP for time-series data, with guarantees from Postgres, and ClickHouse as OLAP for time-series data?
Continuous Aggregates is a neat feature though.
[+] [-] akulkarni|4 years ago|reply
But for time-series workloads, we've found that the results are quite close, with TimescaleDB outperforming Clickhouse for many different types of query workloads. We'll be sharing our results soon.
[+] [-] eloff|4 years ago|reply
I wonder how the storage works: https://docs.timescale.com/cloud/latest/scaling-a-service/#p...
Seems like to scale it separately it would need to be EBS not local instance storage? I wonder if magnetic or SSD? That does constrain the performance, especially for queries.
[+] [-] clarkbw|4 years ago|reply
To tie GP2/3 back into the serverless vs. DBaaS concepts: we are looking at auto-scaling IOPS/throughput performance while also allowing more direct access, so that customers could control performance via APIs and manage it on their own.
(timescaler here)
[+] [-] jsjsossse|4 years ago|reply
[+] [-] rgbrgb|4 years ago|reply
[0]: https://www.graphile.org/postgraphile/security/
[+] [-] avthar|4 years ago|reply
That said, I'd be curious to hear about other folks' experiences.
Disclaimer: I work at Timescale.
[1] https://hasura.io/blog/using-timescaledb-with-hasura-graphql...
[+] [-] 1vuio0pswjnm7|4 years ago|reply
[+] [-] mfreed|4 years ago|reply
CoralCDN ran from about 2004 - 2015. Eventually, its need was pretty much negated by the rise of free CDN services (e.g., Cloudflare) or just lots of SaaS services that supported user-generated content. For example, in late 2004, many of the amateur videos of a large Indian Ocean earthquake & tsunami were shared using CoralCDN, but that soon went to YouTube. Podcasters like "This Week in Tech" were using CoralCDN, but those went to freemium podcasting services. And so on.
What eventually took CoralCDN "down" was that the academic platform on which the ~1000 servers ran, PlanetLab (https://planetlab.cs.princeton.edu/) was end-of-life'd.
But taking a step back, the original thinking behind CoralCDN was as a peer-to-peer CDN, but there were a lot of web security issues that actually made that difficult. If folks are interested, I talk about some of these issues in this 2009 retrospective [0], and also outline a browser-based P2P CDN in this workshop paper that could address them (and actually make it P2P but secure) [1]. But still, I think the economics of CDNs (and transit costs) have just changed, such that most of the P2P architectures just don't make sense today like they did in the early 2000s.
But thanks for the kind words!
[0] https://www.cs.princeton.edu/~mfreed/docs/coral-nsdi10.pdf
[1] https://www.cs.princeton.edu/~mfreed/docs/firecoral-iptps09....
[+] [-] akulkarni|4 years ago|reply
[+] [-] tofuahdude|4 years ago|reply
I'm a fan of Timescale but definitely not a fan of "serverless" as a phrase.
That phrase is just abstracting "other people's computers" one additional degree, in a borderline meaningless way.
There's still a server. How is that serverless? It's just a server managed in a more indirect way.
Please convince me I am wrong here.
[+] [-] btgeekboy|4 years ago|reply
[+] [-] leros|4 years ago|reply
I also like cloud services like Timescale. I just want a database of a certain size. How it runs? I don't care at all as long as it works.
[+] [-] qorrect|4 years ago|reply
I'm pretty sure everyone is pushing it so they can make money from SaaS. But if this allows them to give it away for free, then I'm all for it (just won't be using it).
[+] [-] zinclozenge|4 years ago|reply
[+] [-] ceejayoz|4 years ago|reply
Language changes. "Serverless" clearly doesn't mean "computers aren't involved", but "the computers are someone else's responsibility". It's a marketing term, just like "cloud", and it's unlikely to go away.
As with "hackers vs. crackers", "Linux vs. GNU/Linux", "copyright infringement vs. piracy", and other similar scenarios, the ship has sailed here.
[+] [-] Sytten|4 years ago|reply
Am I correct? If so, what is the plan for existing customers of those services? Especially since Forge didn't support clouds other than AWS last time I checked.
[+] [-] akulkarni|4 years ago|reply
We have two cloud products, Timescale Cloud (which is what this post is discussing), and Managed Service for TimescaleDB (MST), which is what you are also referencing.
Also, as we say in the post:
We're investing in and maintaining both. They are just different products, depending on what you are looking for.
[+] [-] akulkarni|4 years ago|reply
So here's an "RFP" - any suggestions for a better term to describe this consumption-based experience, where you don't need to worry about servers (and ideally, you don't pay for what you don't use)?
[+] [-] beoberha|4 years ago|reply
To me, serverless is fine. Yes, it’s a buzzword, but it makes sense.