> Developers want the durability, stability, and scalability of a SQL database but do not want to be constrained by managing a schema
Speak for yourself. I love my schemas.
This article is also 100% fluff, and zero actual information. Obviously I won’t sign up to anything without an ounce of information being provided up front.
From the Vercel point of view, this promises to answer one of the most frequent, interesting, and technically challenging questions since we first launched our "immutable deploys".
That is: how can I pair a brand-new frontend preview deploy with a serverless database that has the specific schema my new feature needs?
This technology makes the whole serverless stack feel complete.
It doesn’t solve the N+1 queries problem in a generic way. That is a major hurdle in scaling complexity-wise, which in turn is often deeply coupled with business realities.
More to the point, it probably cannot solve it efficiently at all, since it is not a graph database and thus cannot be paired with a generic GraphQL resolver (would generate join queries instead of lookups across edges) and a stack of generated static queries in the backend (no need to allow generic queries, just make it possible to write queries once and only once).
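For readers who have not run into the N+1 problem mentioned above, here is a minimal sketch of it. It uses Python's built-in sqlite3 module purely for illustration; the `authors` and `posts` tables and both function names are made up for the example and have nothing to do with PlanetScale or Vitess.

```python
import sqlite3

# Hypothetical two-table schema, held in memory for the demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'ada'), (2, 'bob');
    INSERT INTO posts VALUES (1, 1, 'a1'), (2, 1, 'a2'), (3, 2, 'b1');
""")


def titles_n_plus_one(conn):
    """One query for the authors, then one more query per author (N+1)."""
    queries = 0
    result = {}
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    queries += 1
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id",
            (author_id,),
        ).fetchall()
        queries += 1
        result[name] = [title for (title,) in rows]
    return result, queries


def titles_single_join(conn):
    """The same data fetched with a single JOIN (authors without posts are omitted)."""
    rows = conn.execute("""
        SELECT a.name, p.title
        FROM authors a JOIN posts p ON p.author_id = a.id
        ORDER BY a.id, p.id
    """).fetchall()
    result = {}
    for name, title in rows:
        result.setdefault(name, []).append(title)
    return result, 1
```

Both functions return the same mapping, but the first issues one query per author on top of the initial one, which is the cost that grows with the size of the result set.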
Before GitHub launched, I built some large eCommerce sites. For a VCS we used CVS, and then Subversion. We had a person on our team with the title of release manager, because branching in Subversion and merging back to make releases was a specialist effort. It took time, patience, and managing a tremendous amount of fighting between teams over which features and fixes could even be merged together in order to ship a release.
When I started using git, it broke my brain. I wasn't really sure I understood it, but the idea of cheap local branches soon became the most important thing to me in a VCS, and everything became easier and so much faster. You could just work on code the way real life happens, not in some methodical pre-planned release schedule that always rubbed harshly against the reality of bug fixes and ever-changing minds.
PlanetScale is a lot like that transformation. We've been thinking of databases as this specialist thing, where the arcane knowledge required to make them perform well belongs to a specialist role and is completely untouchable by mortal engineers. While we've learned about the importance of good data modeling, we've dealt with a layer in our stack that is essentially static. Features that require changes to a database schema sit and languish, because the pain and coordination required can often be too much for a team to want to deal with.
What PlanetScale has done is solve two massive problems in one platform. First, since it's built on Vitess, it Just Works At Scale. You don't need to fiddle with knobs to get it to perform well. You most definitely are not even in the top 10% of what is already running on Vitess, so you don't really need to worry about ever outgrowing PlanetScale. But the BIG innovation here is what is possible now that a database works the way the SDLC has evolved with git. Make a branch, change the schema, roll it out with your code. Just like anything else, really. Use the PlanetScale CLI to actually USE your database. Change the schema because it will make your code better, and don't worry that you're somehow going to do it wrong.
PlanetScale just made databases useful for developers beyond a nearly-static data store. It's a high-scale database that you can change like code. It will break your brain a little at first, just like git did. And then you'll wonder why the hell we waited so long to have a database this good.
One question about pricing -- I can't tell if the "free" tier costs are for one month, or ongoing. That is, do you get the free tier amounts for one month and then pay from then on, or is that level of service free from then on?
> It's a high-scale database that you can change like code. It will break your brain a little at first, just like git did.
I think you just described why developers liked MongoDB. It does not get a lot of love on HN, but having the DB schema map to your object model is very convenient.
As I understand it, Vitess is basically a really powerful sharding system, which goes a step further than typical sharding solutions by making the shards one or more separate databases. In the case of someone like Slack, because your tenant (e.g. your company's Slack) is completely isolated from other tenants, you can treat it as its own database with a master for just that DB, which allows much better scaling. The big limitations of Vitess are cross-shard transactions and the fact that your schema needs a clear-cut sharding key (like your tenant ID) that works nicely with your application's needs. The alternatives for scaling transactional SQL DBs in a multi-master fashion are the "NewSQL" DBs like Yugabyte and CockroachDB, which are basically document DBs with a partially implemented Postgres frontend, so they don't have the full feature set of your SQL engine like Vitess does, but they don't require as much attention to sharding. These are oversimplifications of the actual mechanisms, but they give a basic overview of the tradeoffs involved. Please feel free to correct me on any inaccuracies, as I'm not an expert in DBs.
Ultimately all databases scale the same way, by splitting up data into shards/partitions/segments and spreading them out over several servers, along with replication for durability. The partitioning is done by a primary/sorting/distribution key on the data for each table.
Implementations vary, but there are two major architectures: systems like Vitess/ProxySQL/Citus/Timescale that act as a proxy layer on top of an existing RDBMS running on multiple servers to make them look like a single database, and entirely custom projects like CockroachDB/TiDB/Yugabyte/Cloud Spanner which have their own native processing and data layers.
OLAP relational data warehouses like Vertica/Greenplum/MemSQL/Redshift/BigQuery are also natively distributed but focus on large-scale analytics with features like column-oriented storage and vectorized processing.
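The key-based partitioning described above can be sketched in a few lines. This is an illustration only: the shard names, the `shard_for` and `partition` helpers, and the SHA-256-modulo scheme are assumptions made for the example, not how Vitess or any of the databases listed actually place rows.

```python
import hashlib

# Hypothetical shard names; a real deployment would map these to servers.
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]


def shard_for(key: str, shards=SHARDS) -> str:
    """Pick a shard from a stable hash of the distribution key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def partition(rows, key_column, shards=SHARDS):
    """Group rows by the shard their distribution key hashes to."""
    placement = {name: [] for name in shards}
    for row in rows:
        placement[shard_for(str(row[key_column]), shards)].append(row)
    return placement
```

Because the hash is stable, every row with the same key lands on the same shard, which is what makes single-tenant queries cheap. Note that a plain modulo like this reshuffles most keys when the number of shards changes, which is one reason real systems use more elaborate schemes.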
Vitess is an additional layer on top of MySQL which all queries pass through. Among other things, it implements its own query parser, which can then do things like split a query across shards and join the results.
I wouldn't say there's too much "magic" in there, but it does a lot of known difficult things (schema management, sharding/resharding, connection pooling, query optimization, DB administration, monitoring, backup/failover) which are generally painful and expensive to do yourself.
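To make "split a query across shards and join results" concrete, here is a rough scatter-gather sketch. It is not Vitess code: each "shard" is just an in-memory SQLite database, and the `orders` table, `make_shard`, and `scatter_gather_totals` are hypothetical names invented for the example. The rows are assumed to be sharded by something other than the tenant, so one tenant can appear on more than one shard.

```python
import sqlite3


def make_shard(rows):
    """Create one in-memory database standing in for a single shard."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (tenant_id INTEGER, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn


# Tenant 1 has rows on both shards, so per-shard results must be merged.
shards = [
    make_shard([(1, 10), (2, 7)]),
    make_shard([(1, 5), (3, 6)]),
]


def scatter_gather_totals(shards):
    """Run the same aggregate on every shard, then merge the partial sums."""
    totals = {}
    for conn in shards:
        query = "SELECT tenant_id, SUM(amount) FROM orders GROUP BY tenant_id"
        for tenant_id, subtotal in conn.execute(query):
            totals[tenant_id] = totals.get(tenant_id, 0) + subtotal
    return totals
```

A sum merges trivially because partial sums add up. Aggregates such as averages or joins across shards need more work at the proxy layer, which is part of why this is painful to build yourself.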
IME the answer to "how did they make [hard to scale thing] easily scalable?" is usually that they introduce limitations in how you can use [hard to scale thing] so you can't use it in ways that are hard to scale, then automate scaling it in well-known ways for use cases that are so-limited. Vitess's site mentions that it relies on horizontal sharding, so right off the bat, my guess is that you can't use it in ways that are sharding-unfriendly, or if you can, then you'll be met with restrictions on much of the "magic" of it.
Rarely is it the case that someone's actually discovered e.g. novel math or something to make the hard part easier. Better tools (to do well-understood things more easily for this use case) and restrictions (so you don't use it in ways the tools can't handle) are the usual way.
I have been using the beta version of PlanetScale for a while, and it is extremely cool. It's using the mature technology that powers YouTube to provide a developer experience for databases similar to what Vercel and Netlify provide for hosting: it will give you a database branch for each Git branch and help you manage the workflows around it.
And it is the first truly serverless relational database offering that I am aware of. The cost scales to 0, so it is perfect for small projects, but that same instance will scale to support massive load when you need it.
> Developers want the durability, stability, and scalability of a SQL database but do not want to be constrained by managing a schema. It has been our goal to give both, not compromising on the power of your datastore but making changes feel as easy as deploying code.
So is this a wrapper around managing schemas, powered by Vitess? (Btw, I had to go to your GitHub to figure that out.)
If you want to be the database for developers you should know that developers do care about how you do this scaling.
Am I the only person reading this who doesn’t think that the marketing line in the middle about not managing a schema is super scary?
> Developers want the durability, stability, and scalability of a SQL database but do not want to be constrained by managing a schema. It has been our goal to give both, not compromising on the power of your datastore but making changes feel as easy as deploying code.
Sorry. There’s always a schema, and learning to manage it is a _good_ thing. If you’re not ready to manage and migrate, then you’re not in production. You’re in something else.
So just wanted to add a few random thoughts about some of this stuff.
With sharding, where this stuff eventually gets you is when your assumptions about shard keys no longer hold.
E.g., you might have user-initiated traffic to start with, so you can easily and automatically shard everything by user ID or whatever. Then one day those assumptions change because you have to accommodate event-driven traffic, i.e. not request/response, and the user’s ID can’t be assumed to always be present. For example, let’s say something in the real world causes an event to be pushed onto your queue. That event could correspond to a real user, but since it originated somewhere in the real world, there’s probably a separate ID for how that user is represented. So you can’t rely on that user ID being present to shard things by.
Not sure if that makes sense, but sharding can be hard. It’s not free, and I still think it’s important for engineers to understand the mental model they’re using, even with a tool like Vitess.
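A small sketch of the failure mode described above, under made-up assumptions: four shards keyed by `user_id` modulo the shard count, and a hypothetical `EXTERNAL_TO_USER` lookup table that maps an external identifier to the internal user ID. None of these names come from Vitess.

```python
from typing import Optional

NUM_SHARDS = 4

# Hypothetical lookup table: external device id -> internal user id.
EXTERNAL_TO_USER = {"device-abc": 42}


def shard_for_user(user_id: int) -> int:
    """Route by the shard key the system was originally designed around."""
    return user_id % NUM_SHARDS


def route(event: dict) -> Optional[int]:
    """Return the target shard, or None when the event cannot be routed by key."""
    if "user_id" in event:
        return shard_for_user(event["user_id"])
    # Event-driven traffic: only an external id is known, so translate it first.
    user_id = EXTERNAL_TO_USER.get(event.get("external_id"))
    if user_id is not None:
        return shard_for_user(user_id)
    # No usable shard key: the caller has to fall back, e.g. ask every shard.
    return None
```

The request/response path stays cheap, but every event without a `user_id` now costs an extra lookup, and anything missing from the lookup table degrades to querying all shards.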
Also, I saw a claim on either PlanetScale or Vitess that MySQL has no native support for horizontal scaling with automatic sharding, but I think it does? I think you just have to pay for it, though.
Also, cross-shard transactions were mentioned as another difficulty with sharding. They can be done with either sagas (a design pattern, so it depends on the context) or 2PC, which I believe is available in MySQL > 5.8 in the form of XA transactions.
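For anyone who has not seen the saga pattern mentioned above, here is a minimal sketch of the idea: each step has a compensating action, and when a later step fails, the completed steps are undone in reverse order. The two dictionaries stand in for two shards, and every name here (`run_saga`, `transfer`, the account names) is hypothetical. This illustrates the pattern only; it is not 2PC and it is not how any particular database implements cross-shard transactions.

```python
def run_saga(steps):
    """Run (action, compensation) pairs; undo completed steps if one fails."""
    completed = []
    try:
        for action, compensate in steps:
            action()
            completed.append(compensate)
    except Exception:
        # Roll back in reverse order using the compensating actions.
        for compensate in reversed(completed):
            compensate()
        return False
    return True


# Two hypothetical "shards", each holding one account balance.
shard_a = {"alice": 100}
shard_b = {"bob": 0}


def transfer(amount, fail_credit=False):
    """Move money from alice (shard A) to bob (shard B) as a two-step saga."""

    def debit():
        if shard_a["alice"] < amount:
            raise ValueError("insufficient funds")
        shard_a["alice"] -= amount

    def refund():
        shard_a["alice"] += amount

    def credit():
        if fail_credit:  # simulate the second shard being unavailable
            raise RuntimeError("shard B unavailable")
        shard_b["bob"] += amount

    def uncredit():
        shard_b["bob"] -= amount

    return run_saga([(debit, refund), (credit, uncredit)])
```

Unlike 2PC, other readers can observe the intermediate state (money debited but not yet credited) before the compensation runs, so the application has to tolerate that.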
Am I the only one finding it extremely light on useful information? And after reading all the comments here, those questions are still not answered.
Basically a hosted Vitess with MySQL? Where is the DC? Backup included? Redundancy? Spending cap? Own infrastructure or on top of another cloud? Support level? Uptime guarantee? Etc.
All of these are basic info required from a SaaS, and they are missing. Not even a FAQ.
Can I install this locally? Will it work without an internet connection? Is it fully open source? From what I can tell, the answer to all three questions is no.
Sounds interesting but there is not enough information for me to get a picture of what this actually means. Thought I was gonna be clever and install it, run it and see what it is all about but I can't figure out how to get it. One link is "sign up" and the other is "contact sales" but I'm not interested in either, I'm interested in "Download" but I cannot find it anywhere.
Am I stupid or misunderstanding something fundamental here? Is it just a hosted DB or something like that?
It's a hosted DB, and as far as I can tell the Killer Feature is that it makes schema updates less painful. How it does that without significant performance trade-offs or caps is unclear to me.
[EDIT] on reading further in their docs, my suspicion is that their "branching" concept is a hell of a lot more limited than I believed at first. I initially took it to mean you could have multiple active schemas working on your data at once—instead, I think it's more like exporting just the schema of your DB and importing it to a fresh DB, which is nothing new and doesn't run into all kinds of operational and security issues the other workflow would. I'm fairly sure all the actual magic is in the schema diffing, and the docs make me think even that isn't as fully-magical as one might hope.
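To illustrate the "export the schema, import it into a fresh DB, then diff" reading of branching described above, here is a small sketch using Python's built-in sqlite3 module. It is a guess at the general shape of such a workflow and not PlanetScale's implementation; `schema_of`, `branch_schema`, `column_diff`, and the `users` table are all hypothetical.

```python
import sqlite3


def schema_of(conn):
    """Map table name -> CREATE statement, as recorded by SQLite."""
    rows = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return dict(rows)


def branch_schema(main):
    """Create a fresh database with main's tables but none of its rows."""
    branch = sqlite3.connect(":memory:")
    for ddl in schema_of(main).values():
        branch.execute(ddl)
    return branch


def column_diff(main, branch):
    """List (table, column) pairs present in branch but not in main."""

    def columns(conn):
        # PRAGMA table_info rows are (cid, name, type, notnull, default, pk).
        return {
            (table, row[1])
            for table in schema_of(conn)
            for row in conn.execute(f"PRAGMA table_info({table})")
        }

    return sorted(columns(branch) - columns(main))


main = sqlite3.connect(":memory:")
main.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
main.execute("INSERT INTO users (email) VALUES ('a@example.com')")

# The branch starts empty, then diverges with a schema change.
branch = branch_schema(main)
branch.execute("ALTER TABLE users ADD COLUMN display_name TEXT")
```

The diff only reports added columns. Handling drops, renames, type changes, and applying the result to a live table without blocking it is where the hard part would be.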
It sounds like the database branch has a copy of the production schema, but what about the data itself? We've been using AWS Aurora clones to quickly give developers copies of production. Can zero-copy clones be made as part of the branch?
Ed: obviously this does not solve the bigger service of providing "magic" scale up/down, though.
It looks like the "branching schema" is essentially:
CREATE DATABASE branch_name WITH TEMPLATE main;
I might actually have to try this - I hadn't really thought template dbs would be all that useful in pg - but this branch-and-test-for-dev use case is interesting.
Then there's COPY (for data) and rename to "promote" a branch with data (though one would probably want to run the DDL on the main db):
ALTER DATABASE branch_name RENAME TO main; -- would have to move old main out of the way first - but might be possible in same transaction?
Looks like it's a ways off... PlanetScale says they're built on Vitess, and Vitess has Postgres compatibility at the bottom of their "Medium Term" roadmap: https://vitess.io/docs/resources/roadmap/
[+] [-] Aeolun|4 years ago|reply
[+] [-] mixmastamyk|4 years ago|reply
[+] [-] lysecret|4 years ago|reply
[+] [-] Rauchg|4 years ago|reply
[+] [-] adamfeldman|4 years ago|reply
[+] [-] eurasiantiger|4 years ago|reply
[+] [-] briandoll|4 years ago|reply
[+] [-] tomc1985|4 years ago|reply
[+] [-] chris_st|4 years ago|reply
[+] [-] hodgesrm|4 years ago|reply
[+] [-] unknown|4 years ago|reply
[deleted]
[+] [-] unknown_error|4 years ago|reply
And then what does PlanetScale add on top of Vitess hosted anywhere else?
Sorry, the linked blog post is both very abstract and assumes a high level of preexisting knowledge about database scaling.
[+] [-] motives|4 years ago|reply
[+] [-] manigandham|4 years ago|reply
[+] [-] paxys|4 years ago|reply
[+] [-] moshmosh|4 years ago|reply
[+] [-] sorenbs|4 years ago|reply
[+] [-] cfors|4 years ago|reply
[+] [-] olivierlacan|4 years ago|reply
> PlanetScale's Non-Blocking Schema Changes' workflow doesn't support FOREIGN KEYs in users' databases.
> PlanetScale determined that the production safety that Non-Blocking Schema Changes provide are worth this technical tradeoff. Learn more.
https://docs.planetscale.com/concepts/nonblocking-schema-cha...
[+] [-] halostatue|4 years ago|reply
[+] [-] asimpletune|4 years ago|reply
[+] [-] itwy|4 years ago|reply
Ironic coming from the infinitely scalable database, isn't it?
[+] [-] paxys|4 years ago|reply
[+] [-] chachra|4 years ago|reply
[+] [-] ksec|4 years ago|reply
[+] [-] jccooper|4 years ago|reply
As for the rest? Very good questions. Just coming out of beta, so perhaps they're still filling out the website.
[+] [-] nebulous1|4 years ago|reply
[+] [-] aantix|4 years ago|reply
Doesn’t look like there’s Postgres compatibility.
[+] [-] jaredcwhite|4 years ago|reply
Is it yet another example of vendor lock-in? Yes.
[+] [-] capableweb|4 years ago|reply
[+] [-] moshmosh|4 years ago|reply
[+] [-] ngrilly|4 years ago|reply
Where are you hosted? What latency should we expect from AWS, GCP, DigitalOcean and fly.io?
And also what degree of compatibility should we expect with MySQL? The doc is quite sparse on this.
[+] [-] stalluri|4 years ago|reply
https://docs.planetscale.com/concepts/branching
[+] [-] satyrnein|4 years ago|reply
[+] [-] vira28|4 years ago|reply
I like the cool things but I can’t migrate to MySQL just because of this.
[+] [-] Allstar|4 years ago|reply
[+] [-] e12e|4 years ago|reply
[+] [-] samlambert|4 years ago|reply
[+] [-] NAR8789|4 years ago|reply
[+] [-] qaq|4 years ago|reply