Neon – Serverless Postgres

667 points| nikolay | 3 years ago |neon.tech | reply

330 comments

[+] anilgulecha|3 years ago|reply
This is the missing piece on cloud for masses

  * we already have compute scale-to-zero (cloudrun, lambda, fly.io).

  * Network is default pay for use. Storage (S3) is default pay for use.

  * The only piece in the stack that was always-on was the database (only serverless db thus far was firestore, or something like sqlite+litestream)
With something like this we get a solid RDBMS engineered to be scale-to-zero, and with good developer experience.

This opens up a world of try-out mini applications that cost cents to host. serverless db (postgres) + serverless compute (cloud-run) + use as you go storage+network. This is a paradigm-shift stack. Exciting days ahead.

[+] manigandham|3 years ago|reply
There are plenty of serverless database options already: Firestore, DynamoDB, CosmosDB, FaunaDB, even MongoDB, and there are "newsql" distributed relational systems like CockroachDB and Planetscale with serverless plans.
[+] logifail|3 years ago|reply
> This opens up a world of try-out mini applications that cost cents to host

Given how much performance you can squeeze out of a $5/month VPS (I've been spinning them up and indeed down regularly over the last couple of years), is this really a paradigm shift?

[+] pid-1|3 years ago|reply
Fly.io does not scale to zero.

Lambda has many limitations.

In particular, for some reason AWS is allergic to providing a container deployment service that actually scales to zero.

[+] dragonwriter|3 years ago|reply
AWS Aurora Serverless v1 (in MySQL and Postgres flavors) has had serverless, scale-to-zero for quite a while.
[+] 8organicbits|3 years ago|reply
What's the cold start time for something using sqlite+litestream on scale-to-zero compute? I think you'd need to pull the db out of storage, so it would be slow to go from 0->1 instance. Anyone know if that's right?

Is there any cold start delay for neon?

[+] rektide|3 years ago|reply
> This is the missing piece on cloud for masses

I like this perspective a lot & think it's absolutely key here.

We- the world- still pick single-node writer postgres & read replicas when we have to store & query data. There are great Kubernetes postgres operators, but it's still a distinctly pre-cloud pre-scale type of technology, & this decoupling & shared-storage sounds ultra promising, allows independent & radical scale-up & scale-down, sounds principally much more manageable.

[+] hamandcheese|3 years ago|reply
If you can scale your app to zero, couldn’t you also just scale your database to zero once no more app servers are running?

Or for try-out apps, as you mention, you could just run Postgres next to your app in the same container.

This might be possible with fly.io, or will soon, I think.

I’m not sure how comfortable I am using a custom flavor of Postgres (even if it’s just the storage layer).

[+] antender|3 years ago|reply
We already had serverless db for ages and it's called ... Google Sheets. You can even query it with simple SQL-like language.

The problem with most other "serverless" databases is that they don't offer an HTTP API to query them from restricted environments like serverless functions.

[+] derefr|3 years ago|reply
Snowflake is probably the closest comparison.
[+] TruthWillHurt|3 years ago|reply
uh... AWS Aurora? Azure CosmosDB? GCP BigQuery?

All serverless, scale-to-zero or pay for demand...

[+] jokethrowaway|3 years ago|reply
I don't see the benefits over a $5 VPS. Even assuming I'd save a few cents over a VPS (which is absolutely not guaranteed), the cost saving is so minimal that I won't bother rewriting everything under the serverless paradigm just for it. Of course, that assumes your cloud doesn't cost an arm and a leg. I can understand people being excited to save on expensive AWS instances, but maybe you should just consider dumping AWS.

Scale doesn't matter for mini applications and scaling vertically (=throw money for a bigger server) will work for 99% of the companies. The 1% who need horizontal scaling will have custom everything regardless and will need to hire experts, not a good niche to release a product.

[+] SonOfLilit|3 years ago|reply
> Neon allows to instantly branch your Postgres database to support a modern development workflow. You can create a branch for your test environments for every code deployment in your CI/CD pipeline.

> Branches are virtually free and implemented using the "copy on write" technique.

Unless I missed that everyone supports this, this here could be a killer feature and should be advertised higher.
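
For intuition on why such branches can be "virtually free", here is a toy copy-on-write sketch in Python. It is not Neon's implementation: `Branch`, `fork`, `read`, and `write` are hypothetical names, and "pages" are just dictionary entries. Forking freezes the current pages into a shared read-only layer, and each side then stores only what it changes.

```python
class Branch:
    """Toy copy-on-write branch: keeps only the pages written on this branch."""

    def __init__(self, parent=None):
        self.parent = parent   # shared, read-only layer this branch builds on
        self.pages = {}        # page number -> bytes written on this branch

    def read(self, page_no):
        # Walk up the layers until one of them holds the page.
        layer = self
        while layer is not None:
            if page_no in layer.pages:
                return layer.pages[page_no]
            layer = layer.parent
        raise KeyError(page_no)

    def write(self, page_no, data):
        # Writes land in this branch's own layer and never touch shared layers.
        self.pages[page_no] = data

    def fork(self):
        # Freeze the current pages into a shared snapshot layer. No page data
        # is copied: the existing dict is handed over by reference.
        snapshot = Branch(parent=self.parent)
        snapshot.pages = self.pages
        self.parent = snapshot
        self.pages = {}
        return Branch(parent=snapshot)


main = Branch()
main.write(1, b"users v1")
main.write(2, b"orders v1")

test = main.fork()              # instant, regardless of database size
test.write(1, b"users v2")      # only the modified page is stored on the branch
main.write(2, b"orders v2")     # later changes on main stay invisible to the branch

print(test.read(1), test.read(2))   # b'users v2' b'orders v1'
print(main.read(1), main.read(2))   # b'users v1' b'orders v2'
```

The cost of a branch here is proportional to the pages it modifies, not to the size of the database, which is the property the quoted docs are advertising.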

[+] zxspectrum1982|3 years ago|reply
You can get that feature on any Postgres server by installing Citus
[+] jvolkman|3 years ago|reply
AWS Aurora Postgres supports this to an extent with "clones". You can even clone cross-account. The same copy-on-write stuff applies, so they're relatively cheap and fast. I hope that Google's new AlloyDB will also support it.

https://aws.amazon.com/about-aws/whats-new/2019/07/amazon_au...

There are some annoying restrictions, though. You can only have a single cross-account clone of a particular db per account.

[+] jhgb|3 years ago|reply
It sounds like something you might be able to accomplish with a copy-on-write VFS on top of a Firebird database file. (Not sure about PostgreSQL, but with Firebird, you only deal with one file, so with Firebird, this should definitely work.)
[+] rkwz|3 years ago|reply
What are the intended use cases for "branching" a database? Currently, I use separate databases for different environments, are branches better?
[+] rektide|3 years ago|reply
Really interesting. I've seen so much disaggregated database work, and so so so much of that exposes postgres interfaces. But all the good stuff has been closed source!

I'm very very excited to hear about a team taking this effort to postgres itself, in an open source fashion! From the Architecture[1] section of the README:

> A Neon installation consists of compute nodes and Neon storage engine.

> Compute nodes are stateless PostgreSQL nodes, backed by Neon storage engine.

> Neon storage engine consists of two major components: A) Pageserver. Scalable storage backend for compute nodes. B) WAL service. The service that receives WAL from compute node and ensures that it is stored durably.

Sounds like a very reasonable disaggregation strategy. Really hope to hear about this wonderful effort for many more years. Ticks the boxes: open-source with a great service offering: nice. Rust: nice.
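
To make the split concrete, here is a toy Python model of how I read that README excerpt. Everything in it is an assumption for illustration: the class and method names are made up, the "WAL" is an in-memory list, and having the pageserver rebuild pages by replaying WAL is my guess at how the two storage components connect, not something the quoted text states.

```python
from dataclasses import dataclass, field


@dataclass
class WalService:
    """Stands in for durable WAL storage (here: an append-only list)."""
    records: list = field(default_factory=list)

    def append(self, page_no, data):
        self.records.append((page_no, data))
        return len(self.records)       # LSN = position in the log


@dataclass
class Pageserver:
    """Serves a page as of a given LSN by replaying the WAL up to it."""
    wal: WalService

    def get_page(self, page_no, lsn):
        value = None
        for rec_page, data in self.wal.records[:lsn]:
            if rec_page == page_no:
                value = data           # later records overwrite earlier ones
        return value


class ComputeNode:
    """Stateless: keeps no data of its own, only handles to the services."""

    def __init__(self, wal, pageserver):
        self.wal = wal
        self.pageserver = pageserver
        self.last_lsn = len(wal.records)

    def write(self, page_no, data):
        self.last_lsn = self.wal.append(page_no, data)

    def read(self, page_no):
        return self.pageserver.get_page(page_no, self.last_lsn)


wal = WalService()
pageserver = Pageserver(wal)

node = ComputeNode(wal, pageserver)
node.write(7, "a")
node.write(7, "b")
del node                                     # "scale to zero": the compute node goes away

replacement = ComputeNode(wal, pageserver)   # a fresh node sees the same data
print(replacement.read(7))                   # b
print(pageserver.get_page(7, 1))             # a (the page as of an earlier LSN)
```

Because the compute node holds nothing that cannot be rebuilt from storage, throwing it away and starting a new one loses no data, which is what makes scale-to-zero plausible in this design.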

[+] nikita|3 years ago|reply
We are committed to building a durable company and we are well funded. So yes, you will hear from us for years to come as we will be shipping more and more features.
[+] ranguna|3 years ago|reply
Just yesterday I was comparing managed serverless postgres offers and was sad to temporarily end my investigation with a compromise of using managed aws RDS for development, hoping that a fully serverless postgres with a nice free tier would pop up before going to production, and here we are!

Congrats to the team for what feels like an amazing product. Signed up for the early access, can't wait to get my hands on this!

For anyone interested, these are the DB offers I looked into:

* DO managed postgres, no free tier but price scaling was not too aggressive, the issue is that it's not natively serverless and we're gonna get 100s of ephemeral connections.

* Cockroach, was the best option for our use case but it doesn't support triggers and stored procedures, so we can't use it right now (closely following https://github.com/cockroachdb/cockroach/issues/28296)

* Fly.io price scaling is too aggressive 6$ -> 33 -> 154 -> 1000s a month and no free tier that I could find.

* Aurora serverless v2 is only for aws internal access and we are using gcp.

* Aurora v1 was what we were gonna go with, but a lot of people online have shared negative opinions about its slow scaling. I didn't investigate enough, but I'm thinking we'd need to set up RDS Proxy for it to handle all our connections, which would've bumped up the price by a good amount. Also no free tier.

* Alloydb looked promising but also no free tier and starting price is a bit much for our current phase of development, but it was definitely something we'd look into in the future.

And now Neon, natively serverless with a (hopefully) good free tier to test things out and some hints about cross region data replication, amazing stuff!

[+] gorgoiler|3 years ago|reply
Postgres is mind boggling, coming from sqlite. In a good way, and both are amazing tools.

   with ordinality

   jsonb_*

   '3 minutes'::interval

   create index on my_json ->> 'a key'
It’s amazing how much stuff there is available. All the toys!
[+] CGamesPlay|3 years ago|reply
Just a quick point in defense of SQLite: that last one is almost verbatim possible in SQLite, and it is possible to calculate ordinals, although the syntax is with standard SQL rather than a custom syntax. The SQLite docs mention that they never found a use case for jsonb that ended up being faster or more efficient than json, so they left it out, although they do reserve the BLOB data type for jsonb if such a use case is discovered.
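
A minimal sketch of the SQLite side, using Python's built-in `sqlite3` module. The table and index names are made up for illustration, and it assumes an SQLite build with the JSON functions and window functions available (3.25 or later for `row_number()`).

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO docs (body) VALUES (?)",
    [(json.dumps({"a key": v}),) for v in ("pear", "apple", "fig")],
)

# Expression index on a JSON field, the counterpart of indexing my_json ->> 'a key'.
conn.execute(
    """CREATE INDEX docs_a_key ON docs (json_extract(body, '$."a key"'))"""
)

# Ordinals via the standard row_number() window function.
rows = conn.execute(
    """
    SELECT row_number() OVER (ORDER BY json_extract(body, '$."a key"')) AS ordinal,
           json_extract(body, '$."a key"') AS value
    FROM docs
    ORDER BY ordinal
    """
).fetchall()

print(rows)  # [(1, 'apple'), (2, 'fig'), (3, 'pear')]
```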
[+] manigandham|3 years ago|reply
From the teams page, the CEO of Neon is the cofounder of MemSQL/Singlestore which is one of the best database products I've used. Looks like a solid team to get this done. Very similar approach to Yugabyte (real postgres compute layer + custom scale out data layer) and many others in the OLAP space.
[+] nikita|3 years ago|reply
Nikita - CEO of Neon here. We intended to post this at the launch next month, but since it's here, I'm happy to answer any questions.

We have been hard at work and looking to open the service to the public soon.

[+] zeusly|3 years ago|reply
Hey Nikita, could you maybe put some more legal information on the webpage?

I'm trying to find out if you're a company and where you are located. Is there no legal entity behind this? Do you have a privacy policy?

[+] timmg|3 years ago|reply
How “cheap” is it to create new db instances?

I can imagine a world where it might be practical to have one master db for all of your customers/accounts. But a separate db instance for each customer’s data.

Is that the kind of architecture you think might be workable with your system?

[+] avinassh|3 years ago|reply
This is really exciting and thank you for making it open source. I am still trying to wrap my head around Neon, but is there any design document or architecture description? I want to learn more about the Neon storage engine and how it all fits together.

Also, how do I get an invite code to try?

edit: found this to get started - https://neon.tech/docs/storage-engine/architecture-overview/

[+] akmodi|3 years ago|reply
Hey Nikita! I was just looking at the docs but I was a bit confused about what the various compute instances were doing. Do they all serve reads and writes? If so, is there data partitioning or does this support distributed transactions?
[+] httgp|3 years ago|reply
Do you plan to solve for global data-at-the-edge availability? That to me is the killer feature for databases and one I’m direly in need of at work.
[+] code_biologist|3 years ago|reply
Cool stuff! Is PostGIS support difficult?
[+] lewisl9029|3 years ago|reply
Seems like this might implement database branching in the way most people would assume: branching both the data and schema? I remember being a bit disappointed to learn that PlanetScale's database "branching" was only for the schema [1], which is still quite useful, but this would be so much cooler!

I couldn't find much info about the replication models available/planned however. I would consider this to be table stakes at this point for a serverless database with the recent trend of pushing compute to the edge. This is much more interesting to me than scaling to 0, which is only really useful during the prototyping phase.

PlanetScale is single primary with eventually consistent read replicas, Fauna has strongly consistent global writes (or regional if you choose, but no option for replication between regions if you do) with a write latency cost, Dynamo/Cosmos are active-active with eventually consistent replication and fast writes globally. All useful in different scenarios, but I'd love to have one DB tech that can operate in all of these modes for different use cases within the same app, using the same programming model to interact with data across the board.

I think the decoupled storage engine here would open up some really interesting strategies around replication. What are the team's plans here?

[1] https://docs.planetscale.com/concepts/branching

[+] vira28|3 years ago|reply
Amazing work by the team. Congrats y'all. It was one of the best presentations at PGCon22.

I did email Heikki the following questions, in case someone from Neon is around here.

a) How does Neon compare to PolarDB (https://github.com/ApsaraDB/PolarDB-for-PostgreSQL)?

b) The readme mentions a component "Repository - Neon storage implementation". Does it use any special FileSystem? Any links to read more about it?

c) Heard the cold start is a second (IIRC), how does that value differ if one runs Neon on bare metal instead of k8s?

[+] talkingtab|3 years ago|reply
Why is this a good idea? In my experience, getting Postgres up and running is trivial. Docker anyone? And in many cases your data is your business so why hand it off? And if you are going to offer this product why not just call it what it is, "Postgres as service", instead of serverless which seems a bit misleading. Really it is simply Postgres running on your server.
[+] iknownothow|3 years ago|reply
I knew my bet on sticking with Postgres would pay off! This looks super exciting.

I thought of doing something similar for our data warehouse with AWS Fargate and Postgres but the cold starts and limited disk space required too much engineering on top to make it work.

Moving to Snowflake comes at the cost of losing so many Postgres features in exchange for speed. Things like foreign keys, constraints, extensions etc., which require so much engineering to replace in Snowflake. I would be happy to pay 25x the price for a 10x speed increase for a specific query.

[+] captnObvious|3 years ago|reply
I hope y’all have a plan for when AWS decides to pick up your open source project and turn it into a managed cloud solution. It’s a pattern of theirs. And with the way egress charges are structured they’re likely to snap up any clients straddling their cloud and yours.
[+] IgorPartola|3 years ago|reply
I am trying to understand how it works without digging into the code. It sounds like the disk-backed storage here uses S3 which would introduce some severe latency as well as orders of magnitude more access errors (S3 is not going to be more reliable than EBS, let alone physical disk arrays on a day to day basis). Also how do they mitigate latency from their network to mine? In other words why would I run this over a local install if performance mattered at all to me?
[+] infogulch|3 years ago|reply
How fast can a "scaled to zero" database start up? Does Neon use an "uninitialized hot spare" strategy to reduce startup latency like crdb?

How much memory do they expect a typical single postgresql compute instance to take? I saw that Neon is targeting 'thousands' of postgresql processes per server, though with giant multi-TB servers these days that doesn't really narrow it down.

Are the postgresql processes multi-tenant as well, or is multitenancy isolated to the storage layer?

---

Heikki from the Neon team presented a talk at Rust Finland 2022 about why they chose to develop Neon in Rust and what their experience was. https://www.youtube.com/watch?v=kAQeout-mh8

[+] nikita|3 years ago|reply
Postgres process is single tenant. Right now both provisioning a new Postgres instance (we call it a project) and cold start take 2 seconds. We will be improving on that.
[+] Shorn|3 years ago|reply
https://youtu.be/kAQeout-mh8?t=700

Nobody knew Rust, so they started out by hiring someone who did. Good move.

Business idea: consultancy that hires out competent Rust devs to new projects.

[+] _xnmw|3 years ago|reply
Are they going to stay up-to-date with the latest version of Postgres? One problem with Yugabyte, TimescaleDB, Aurora etc. is that they are stuck on older versions of Postgres, which makes it feel like an entirely different product after a few years.
[+] akulkarni|3 years ago|reply
"One of these things is not like the other"

TimescaleDB, because it is packaged as a PostgreSQL extension (and not a fork, unlike the others), stays compatible with mainline PostgreSQL, especially as PostgreSQL improves. This is one of the key advantages of our approach.

(Timescale co-founder)

[+] mattashii|3 years ago|reply
We are planning on supporting the latest stable version of PostgreSQL. Right now, we're a bit behind (we're at 14.1, latest PostgreSQL is 14.3) but that shouldn't be much of an issue.

We don't yet know how we're going to do major version migrations, as the product is still not even out of private beta.

[+] AndrewDucker|3 years ago|reply
Having a relational database where you're charged purely for the calls you make is a game-changer.

All of the relational databases I looked at in the past required you to have a gateway node on at all times, which is far too expensive for a simple hobby project.

[+] coder543|3 years ago|reply
> Branches are virtually free and implemented using the "copy of write" technique.

Copy on write, presumably.

[+] riedel|3 years ago|reply
This sounds really interesting! I wonder what kind of scaling use cases neon is good for. Is it e.g. good for custom scenarios like a geospatial timeseries database on top of postgres?

We admittedly don't have much of a clue about current database cluster tech, as we are IoT/ML researchers, but we are running a custom timescaledb cluster that receives a constant non-chunked write load from a lot of devices and may encounter some long-running queries on a roughly 500GB DB filled with geolocation data (even timing out if users are too creative). That is why we split it into a single ingress master and multiple read-only query replicas fed by WAL replication, to relax the consistency and sync requirements that seemed to be killing us (we need postgres because of postgis and have no capacity to rewrite the front-end). I wonder if neon would be good for such a use case and if it easily supports postgres extensions like timescaledb (hypertables) and postgis. Most of the time our system just collects measurements, but sometimes we really need to scale up for PoCs, which makes dimensioning really hard (for us).