
I'm all-in on server-side SQLite

1353 points | dpeck | 3 years ago | fly.io

404 comments

[+] bob1029|3 years ago|reply
> SQLite isn't just on the same machine as your application, but actually built into your application process. When you put your data right next to your application, you can see per-query latency drop to 10-20 microseconds. That's micro, with a μ. A 50-100x improvement over an intra-region Postgres query.

This is the #1 reason my exuberant technical mind likes that we use SQLite for all the things. Latency is the exact reason you would have a problem scaling any large system in the first place. Forcing it all into one cache-coherent domain is a really good way to begin eliminating entire universes of bugs.

Do we all appreciate just how much more throughput you can get in the case described above? A 100x latency improvement doesn't translate directly into the same # of transactions per second, but it's pretty damn close if your I/O subsystem is up to the task.
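The latency claim is easy to sanity-check with an in-process benchmark; here's a minimal Python sketch (the in-memory database and schema are illustrative, and absolute numbers will vary by machine):

```python
import sqlite3
import time

# In-memory db stands in for a local SQLite file next to the app;
# the schema and row counts are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)",
                 [(f"u{i}",) for i in range(1000)])
conn.commit()

# Time a point query; an in-process call pays no network round trip.
n = 10_000
start = time.perf_counter()
for i in range(n):
    row = conn.execute("SELECT name FROM users WHERE id = ?",
                       (i % 1000 + 1,)).fetchone()
elapsed = time.perf_counter() - start
print(f"avg per-query latency: {elapsed / n * 1e6:.1f} microseconds")
```

On typical hardware this lands in the microsecond range, against roughly a millisecond for an intra-region network round trip.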

[+] WJW|3 years ago|reply
How do any writes end up on other horizontally scaled machines though? To me the whole point of a database on another machine is that it is the single point of truth that many horizontally scaled servers can write to and read each others' updates from. If you don't need that, you might as well read the entire dataset into memory and be done with it.

I know TFA says that you can "soon" automagically replicate your sqlite db to another server, but it only allows writes on a single server and all others will be readers. Now you need to think about how to move all write traffic to a single app server. All writes to that server will still take several milliseconds (possibly more, since S3 is eventually consistent) to propagate to all replicas.

In short, 100x latency improvement for reads is great but a bit of a red herring since if you have read-only traffic you don't need sqlite replication. If you do have write traffic, then routing it through S3 will definitely not give you a 100x latency improvement over Postgres or MySQL anymore. Litestream is definitely on my radar, but as a continuous backup system for small apps ("small" meaning it runs and will always run on a single box) rather than a wholesale replacement of traditional client-server databases.

PS: Congrats Ben!

[+] judofyr|3 years ago|reply
> Latency is the exact reason you would have a problem scaling any large system in the first place.

Let's not forget why we started using separate database servers in the first place…

A web server does quite a lot of things: parsing/formatting HTTP/JSON/HTML, restructuring data, calculating stuff. This is typically quite separate from the data-loading aspect, and as you get more requests you'll have to add more CPU to keep up (regardless of the language).

By separating the web server from the database server you introduce more latency in favor of enabling scalability. Now you can spin up hundreds of web servers which all talk to a single database server. This is a typical strategy for scalability: decouple the logic and scale up individually.

If you couple them together it's more difficult to scale. First of all, in order to spin up a server you need a full version of the database. Good luck autoscaling on-demand! Also, now every write will have to be replicated to all the readers. That's a lot more bandwidth.

There are definitely use cases for Litestream, but it's far from a replacement for your typical Node + PostgreSQL stack. I can see it being useful as a lower-level component: You can use Litestream to build your "own" database server with customized logic which you can talk to using an internal protocol (gRPC?) from your web servers.

[+] closeparen|3 years ago|reply
This is a large part of what Rich Hickey emphasizes about Datomic, too. We're so used to the database being "over there" but it's actually very nice to have it locally. Datomic solves this in the context of a distributed database by having the read-only replicas local to client applications while the transaction-running parts are remote.
[+] throwaway894345|3 years ago|reply
If you're pushing the database up into the application layer, do you have to route all write operations through a single "master" application instance? If not, is there some multi-master scheme, and if so, is it cheaper to propagate state all the time than it is to have the application write to a master database instance over a network? Moreover, how does it affect the operations of your application? Are you still as comfortable bouncing an application instance as you would otherwise be?
[+] funstuff007|3 years ago|reply
This is exactly the reason I am so skeptical of the cloud. I don't care how easy it is to stand up VMs, containers, k8s, etc. What I need to know is how hard it is to lug my data to my application and vice versa. My feelings on this are so strong because I work mostly on database read-heavy applications.
[+] sanderjd|3 years ago|reply
What confuses me about this architecture I guess is: why have a SQL database at all? This sounds like a local cache. Which sure, of course those are super fast. But why does it need to be relational if all the data fits on the edge?
[+] a-dub|3 years ago|reply
if you can tolerate eventual consistency and have the disk/ram on the application vms, then sure, keeping the data and the indices close to the code has the added benefit of keeping request latency down.

downside of course is the complexity added in synchronization, which is what they're tackling here.

personally i like the idea of per-tenant databases with something like this to scale out for each tenant. it encourages architectures that are more conducive for e2ee or procedures that allow for better guarantees around customer privacy than big central databases with a customer id column.

[+] vmception|3 years ago|reply
> SQLite isn't just on the same machine as your application, but actually built into your application process.

How is that different from what's commonly happening? Android and iOS do this... right? ... but it's still accessing the filesystem to use it.

Am I missing something, or is what they are describing just completely commonplace, and only interesting to people who use microservices and never knew what was normal?

[+] deepstack|3 years ago|reply
A few years back I worked on a Java project. We used H2 instead of Postgres, embedded in the application's memory. It sped up queries tremendously. There is just no beating an in-application db.
[+] errantmind|3 years ago|reply
Just wait until (some) devs realize they don't even need sqlite, and can serialize their data directly to binary flat files with simple locking synchronization for backups.

I'm half joking but I've witnessed many devs use databases when a binary file will do. I've done this personally for years for most of my 'fits-in-RAM', non-transactional, denormalized datasets, which is almost all of them.

Better yet, use both if you have both types of data. The performance benefits are enormous and well worth the complexity tradeoff in my experience.
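For the "fits-in-RAM" case, the flat-file approach can be as simple as this sketch: serialize, take an advisory lock, write, then rename atomically into place. (POSIX-only because of fcntl; the format and locking scheme here are illustrative, not a recommendation for any particular app.)

```python
import fcntl  # POSIX-only advisory locking
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data.bin")
data = {"users": [{"id": 1, "name": "ada"}], "version": 3}

# Serialize the whole in-RAM dataset to a temp file while holding an
# exclusive advisory lock (so a backup process can coordinate with us),
# then rename into place so readers never see a half-written file.
tmp = path + ".tmp"
with open(tmp, "wb") as f:
    fcntl.flock(f, fcntl.LOCK_EX)
    pickle.dump(data, f)
    f.flush()
    os.fsync(f.fileno())  # make sure bytes hit disk before the rename
os.replace(tmp, path)     # atomic on POSIX filesystems

with open(path, "rb") as f:
    restored = pickle.load(f)
print(restored == data)
```

The atomic-rename trick is doing most of the work here; the lock only matters once a second process (e.g. the backup job) enters the picture.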

[+] raxxorraxor|3 years ago|reply
The usual use case for a database is that it has multiple users in different places which would be difficult with SQLite. But for other use cases I don't see a problem. I don't know how it scales ad infinitum, but you can manage a lot of data with it and the usual SQL server has limits too. Could be a good choice even before you care about latency.
[+] overview|3 years ago|reply
> Latency is the exact reason you would have a problem scaling any large system in the first place.

Not always. It depends on the architecture and your hosting strategy. I think it’s more likely for an instance of a web app to receive more requests than it can handle, causing the app to not service any requests.

[+] iveqy|3 years ago|reply
The latency alone is really important to me! I even built an ERP system that has a response time below 100 ms for all operations; it's a design goal.

My thought is that if you can see consumer behavior change depending on latency (for example on Amazon or Google), it is equally important for internal tools. Employee time is expensive.

[+] kumarvvr|3 years ago|reply
Throughput for a single service/app improves, but does it really scale? Across a cluster, you will have to have data replication and sync routines, which are a whole mess themselves.

The latency is not reduced, it is shifted elsewhere.

[+] wasd|3 years ago|reply
Fly is putting together a pretty great team and interesting tech stack. It's the service I see as a true disruptor to Heroku because it's doing something novel (not just cheaper).

I'm still a little murky on the tradeoffs with Fly (and litestream). @ben / @fly, you should write a tutorial on hosting a todo app using rails with litestream and any expected hurdles at different levels of scale (maybe comparing to Heroku).

[+] the_biot|3 years ago|reply
If only they could keep their website reachable, that would be the icing on the cake. Like every time I see them linked on HN, I click and cannot connect to their website.

Last time somebody from fly said they'd look into it, but alas. It was related to IPv6 on their end, as far as I could tell.

[+] quickthrower2|3 years ago|reply
How does Vercel fit in? I am having a lot of pleasure using their free tier and would be happy to pay if needed. My only concern is the pricing model being 0/20/Call us. I think clear usage-based pricing plans going 0-infinity should be the norm.
[+] purplerabbit|3 years ago|reply
Render is more of the successor IMO. Fly is a bit of a wildcard — they are bleeding edge, certainly, but they seem to shy away from focusing on implementation of some of the “boring” but extremely useful features present in most managed services (e.g., scaling volumes for Postgres)
[+] dsincl12|3 years ago|reply
Uhm... experience from a large project that used SQLite was that we were hit with SQLite only allowing one write transaction at a time. That is madness for any web app, really.

Why does everyone seem so hyped on this when it can't really work properly IRL? If you have large amounts of data that need to be stored, the app would die instantly, or leave all your users waiting for their changes to be saved.

What am I missing?

[+] masklinn|3 years ago|reply
> Uhm... experience from a large project that used SQLite was that we were hit with SQLite only allowing one write transaction at a time. That is madness for any web app, really.

"Lots of readers, few writers" is an extremely common application property tho. Your average HN thread has significantly more reads than it has writes, especially if you bump the "worthless" writes (e.g. vote counts) out of the DB and only flush them to the durable DB once in a while, for instance.

And SQLite's WAL mode supports this even better: there is still a single writer, but the writer doesn't block the readers anymore, which is a real workload issue in the exclusive-locking case (as the single writer would bring read concurrency down to 0).
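The reader/writer behavior in WAL mode is easy to demonstrate with stdlib sqlite3; a minimal sketch (table and names are made up):

```python
import os
import sqlite3
import tempfile

# WAL needs a file-backed database; the path here is just a temp file.
path = os.path.join(tempfile.mkdtemp(), "app.db")

writer = sqlite3.connect(path, isolation_level=None)  # manual transactions
writer.execute("PRAGMA journal_mode=WAL")
writer.execute("CREATE TABLE votes (item TEXT PRIMARY KEY, count INTEGER)")
writer.execute("INSERT INTO votes VALUES ('story-1', 0)")

# The writer opens a transaction and updates, but hasn't committed yet...
writer.execute("BEGIN")
writer.execute("UPDATE votes SET count = count + 1 WHERE item = 'story-1'")

# ...and a reader on a separate connection is NOT blocked: it sees a
# consistent snapshot of the last committed state instead of waiting.
reader = sqlite3.connect(path)
snapshot = reader.execute(
    "SELECT count FROM votes WHERE item = 'story-1'").fetchone()[0]

writer.execute("COMMIT")
print(snapshot)  # 0: the pre-commit value
```

With the default rollback journal, that read could instead fail with "database is locked" once the writer needs its exclusive lock.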

[+] jbverschoor|3 years ago|reply
- Most transactions are read-only

- "Large" applications can usually be sharded by account. This means 1 file per account, and can easily be put on the most optimal geolocation of the account

- You can defer locking until commit, allowing multiple writers ( https://www.sqlite.org/cgi/src/doc/begin-concurrent/doc/begi... ). This is good enough for most applications anyway.

- SQLite is simple, fast enough for almost anything, supports a good set of features and datatypes, and is very easy to embed.

[+] samwillis|3 years ago|reply
Quite right it’s not one size fits all but for any site that’s mostly read only it’s a brilliant solution.

Simon Willison has written about it and coined the term “baked data”: https://simonwillison.net/2021/Jul/28/baked-data/

Mozilla.org uses this architecture, Django app running off SQLite with the db rsync’ed to each application server.

[+] phaedrus|3 years ago|reply
The default settings of SQLite are very conservative and essentially enforce serial writes. With tuning and loosening that enforcement, you can go from 50 writes per second to 50,000.

Edit: forgot to mention that yes a major part of that is batching writes into fewer, bigger transactions; AFAIK you can't really get around that.
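A sketch of that kind of tuning: WAL, a relaxed synchronous setting, and, most importantly, batching many inserts into one transaction rather than paying a sync per row (the pragma values are illustrative, not a universal recommendation):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "tuned.db")
conn = sqlite3.connect(path, isolation_level=None)  # manual transactions

# Loosen the conservative defaults: WAL journal plus synchronous=NORMAL
# (in WAL mode this still survives application crashes).
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("PRAGMA synchronous=NORMAL")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")

# The big win: one transaction around 10,000 inserts means one commit,
# instead of 10,000 individually synced transactions.
rows = [(f"event-{i}",) for i in range(10_000)]
conn.execute("BEGIN")
conn.executemany("INSERT INTO events (payload) VALUES (?)", rows)
conn.execute("COMMIT")

count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)
```

Per-row autocommit is where the "50 writes per second" ceiling comes from on spinning disks; batching is what removes it.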

[+] dagw|3 years ago|reply
> What am I missing?

Many sites are Read (almost) Only. For sites where users interactively query/view/explore the data, but (almost) never write their own, it works great.

[+] beberlei|3 years ago|reply
Use more than one SQLite file? We have one per day and per project, for example.
[+] paulhodge|3 years ago|reply
Wow Litestream sounds really interesting to me. I was just starting on an architecture, that was either stupid or genius, of using many SQLite databases on the server. Each user's account gets their own SQLite file. So the service's horizontal scaling is good (similar to the horizontal scaling of a document DB), and it naturally mitigates data leaks/injections. Also opens up a few neat tricks like the ability to do blue/green rollouts for schema changes. Anyway Litestream seems pretty ideal for that, will be checking it out!
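The per-account layout can be sketched in a few lines; `get_tenant_db` and the directory scheme below are hypothetical names, just to show the routing idea:

```python
import os
import sqlite3
import tempfile

# Hypothetical layout: one SQLite file per user account under DATA_DIR.
DATA_DIR = tempfile.mkdtemp()

def get_tenant_db(account_id: str) -> sqlite3.Connection:
    # One file per account; the schema is created lazily on first open.
    conn = sqlite3.connect(os.path.join(DATA_DIR, f"{account_id}.db"))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)")
    return conn

alice = get_tenant_db("alice")
alice.execute("INSERT INTO notes (body) VALUES ('hello')")
alice.commit()

# Bob's file is completely separate: no shared tables and no tenant-id
# column to get wrong, and per-tenant backup or schema rollout is just
# per-file work.
bob = get_tenant_db("bob")
alice_count = alice.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
bob_count = bob.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
print(alice_count, bob_count)
```

The blue/green schema-migration trick falls out naturally: migrate each account's file independently and roll back by swapping files.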
[+] no_wizard|3 years ago|reply
This a great and interesting offering! I think this fits well with fly.io and their model of computing.

I now wish that I had engaged with this idea, very similar to Litestream, that I had about a year and a half ago. I always thought SQLite just needed a distribution layer to be extremely effective as a distributed database of sorts. Its flat-file architecture means it's easy to provision, restore, and back up. SQLite also has incremental snapshotting and reproducible WAL logs that can be used to do incremental backups, restores, writes, etc. It just needs a "frontend" to handle those bits. Latency has gotten to the point where you can replicate a database by propagating its continued snapshots out to object/blob storage (which is, on a high level, what Litestream appears to be doing). You could even achieve brute-force consensus with this approach if you ran it in a truly distributed way (though Raft is probably more efficient).

Reason I didn't do this? I thought to myself - why in the world in 2020 would someone choose to use SQLite at scale instead of something like Firebase, Spanner, Fauna, or even Postgres? So after I did an initial prototype (long gone, never pushed it to GitHub) I just felt like...there was no appetite for it.

Now I regret!

Just a long winded way of saying, congrats! This is awesome! Thanks for doing exactly what I wanted to do but didn't have the guts to follow through with.

[+] epilys|3 years ago|reply
I implemented exactly this setup, in Rust, last year for a client. Distributed WAL with write locks on a RAFT scheme. Custom VFS in Rust for sqlite3 to handle the IO. I asked the client to opensource it but it's probably not gonna happen... It's definitely doable though.
[+] tiffanyh|3 years ago|reply
@dang, the actual title is “ I'm All-In on Server-Side SQLite”

Maybe I missed it but where in the article does it say Fly acquired Litestream?

EDIT: Ben Johnson says he just joined Fly. Nothing about Fly “acquiring” Litestream.

https://mobile.twitter.com/benbjohnson/status/15237489883352...

[+] dang|3 years ago|reply
Elsewhere in this thread he says "the project was acquired" which is more or less "Fly.io Buys Litestream" (the submitted title).

I'm honestly not sure whether we should change it or not - minimizing complaints is the goal - what's it called when a function has two points that it keeps unstably jumping between?

[+] lnsp|3 years ago|reply
> Litestream has a new home at Fly.io, but it is and always will be an open-source project. My plan for the next several years is to keep making it more useful, no matter where your application runs, and see just how far we can take the SQLite model of how databases can work.

As far as I understood it, Fly.io hired the person working on Litestream and pays them to keep working on Litestream.

[+] gamblor956|3 years ago|reply
"Litestream has a new home at Fly.io, but it is and always will be an open-source project"

Very bottom of the post. Technically, Litestream remains an open-source project, so it's more accurate to say that Fly.io acquired the brand IP and the owner of that IP.

[+] jchw|3 years ago|reply
This is interesting! I like using Fly.io today, but I’m currently using a single node for most stuff with SQLite. Having some kind of failover and replication would be pretty awesome. I have yet to try Litestream and it does sound like there’s some issues to work out that could be pretty nasty, but I’ll definitely be watching.

Fly.io is very nice. It’s what I hoped Hyper.sh would be, except it isn’t dead. That said, there are a couple things I worry about… like, there’s no obvious way to resize disks, you pretty much need to make a new disk that’s larger, launch a new instance with it mounted, and transfer data from an existing instance. If it was automated, I probably wouldn’t care, though a zero downtime way of resizing disks would be a massive improvement. Another huge concern is just how good the free tier is. I actually am bothered that I basically don’t get billed. Hyper.sh felt a bit overpriced, and by comparison Fly.io does scale up in price but for small uses it feels like theft.

[+] tyingq|3 years ago|reply
Dqlite is also interesting, and in a similar space. It seems to have evolved from the LXC/LXD team wanting a replacement for Etcd. It's Sqlite with raft replication and also a networked client protocol.

https://dqlite.io/docs/architecture

[+] jrochkind1|3 years ago|reply
While the title is about a business acquisition, the article is mostly about the technology itself -- replicating SQLite, suggested as a superior option to a more traditional separate-process rdbms, for real large-scale production workloads.

I'd be curious to hear reactions to/experiences with that suggestion/technology, inside or outside the context of fly.io.

[+] kall|3 years ago|reply
I am as obsessed with sub 100ms responses as the people at fly.io, so I think the one writer and many, many readers architecture is smart and fits quite a few applications. When litestream adds actual replication it will get really exciting.

> it won't work well on ephemeral, serverless platforms or when using rolling deployments

That's... a lot of new applications these days.

[+] swaraj|3 years ago|reply
Looks v cool, but I feel like I'm missing a big part of the story: how do 2 app servers/processes connect to the same sqlite/litestream db?

Do you 'init' (restore) the db from each app process? When one app makes a write, is it instantly reflected on the other app's local sqlite?

[+] mtlynch|3 years ago|reply
Super cool! Congrats, Ben!

I've been building all of my projects for the last year with SQLite + fly.io + Litestream. It's already such a great experience, but I'm excited to see what develops now that Litestream is part of fly.

[+] aidenn0|3 years ago|reply
The SQLite team has done a good job over the years establishing an ethos (in the rhetorical sense) of writing reliable software. The degree to which this can transfer to Litestream is the degree to which Litestream is intrusive on the SQLite code.

Another way of saying it: I trust the SQLite team's statements of stability for SQLite because of history and a track record of following stringent development processes. The same is not true of the Litestream team. Does anybody know how much any potential damage introduced by the Litestream code could affect the integrity of my data on disk? Obviously the replication added by Litestream will only be as good as the Litestream team makes it, but to what degree is the local data store affected?

[+] kgeist|3 years ago|reply
>But database optimization has become less important for typical applications. <..> As much as I love tuning SQL queries, it's becoming a dying art for most application developers.

We thought so, too, but as our business started to grow, we had to spend months, if not years, rewriting and fine-tuning most of our queries, because every day there were reports of query timeouts in large clients' accounts... Some clients left because they were disappointed with performance.

Another issue is growing the development team. We made the application stateless so we can spin up additional app instances at no cost, or move them around between nodes, to make sure the load is evenly distributed across all nodes/CPUs (often a node simply dies for some reason). Since they are stateless, if an app instance crashes or becomes unstable, nothing happens, no data is lost, it's just restarted or moved to a less busy node. DB instances are now managed by the SRE team, which consists of a few very experienced devs, while the app itself (microservices) is written by several teams of varying experience, and you worry less about the app bringing down the whole production because microservice instances are ephemeral and can be quickly killed/restarted/moved around.

Simple solutions are attractive, but I'd rather invest in a more complex solution from the very beginning, because moving away from SQLite to something like Postgres can be costlier than investing some time in setting up 3-tier if you plan for your business to grow; otherwise you can eventually end up reinventing 3-tier, but with SQLite. But that's just my experience, maybe I'm too used to our architecture.

[+] anyfactor|3 years ago|reply
Story time!

A client told me that they will use a DigitalOcean droplet for a web app. Because the database was very small I chose to use SQLite3.

After delivery, the client said their devops guy wasn't available and they would like to deploy to Heroku. Heroku, being an ephemeral cloud service, couldn't handle the same-directory SQLite3 db I had there. The only solution was to use their Postgres database service.

For some reason, it was infuriating that I had to use a database like that to store a few thousand rows of data. Moreover, I would have to rewrite a ton of stuff to accommodate the change to Postgres.

I ended up using firestore.

---

I think something like this could have saved me a ton of hassle that day.

[+] ignoramous|3 years ago|reply
Looking forward to ditching my PlanetScale plans for this!

> ...people use Litestream today is to replicate their SQLite database to S3 (it's remarkably cheap for most SQLite databases to live-replicate to S3).

Cloudflare R2 would make that even cheaper. Cloudflare is set to open beta registration this week.

And if you squint just enough, you'd see R2, S3 et al. are NoSQL KV stores themselves, masquerading as disk drives, and used here to back up a SQL db...

> My claim is this: by building reliable, easy-to-use replication for SQLite, we make it attractive for all kinds of full-stack applications to run entirely on SQLite.

Disruption (? [0]) playing out as expected? That said, the word "reliable" is doing a lot of heavy lifting. Reliability in distributed systems is hard (well... easy if your definition of reliability is different ;) [1])

> And if you don't need the Postgres features, they're a liability.

Reminds me of WireGuard, and how it accomplishes so much more by doing so much less [2].

Congratulations Ben (but really, could have taken a chance with heavybit)!

----

[0] https://hbr.org/2015/12/what-is-disruptive-innovation

[1] God help me, the person on the orange site saying they need to run Jepsen tests to verify Litestream WAL-shipping. Stand back! You don't want to get barium sulfated!, https://twitter.com/tqbf/status/1510066302530072580

[2] "...there's something like 100 times less code to implement WireGuard than to implement IPsec. Like, that is very hard to believe, but it is actually the case. And that made it something really powerful to build on top of", https://www.lastweekinaws.com/podcast/screaming-in-the-cloud...

[+] mwcampbell|3 years ago|reply
Congratulations to Ben on getting a well-funded player like Fly to buy into this vision. I'm looking forward to seeing a complete, ready-to-deploy sample app, when the upcoming Litestream enhancements are ready.

I know that Fly also likes Elixir and Phoenix; they hired Chris McCord, after all. So would it make sense for Phoenix applications deployed in production on Fly to use SQLite and Litestream? Is support for SQLite in the Elixir ecosystem, particularly Ecto, good enough for this?

[+] otoolep|3 years ago|reply
Congratulations to Ben! This project has been like a rocket ship.
[+] krts-|3 years ago|reply
A great project with awesome implications. Well deserved, and the fly.io team are very pragmatic.

This will be even more brilliant than it already is when fly.io can get some slick sidecar/multi-process stuff.

I ended up back with Postgres after my misconfigs left me a bit burned with S3 costs and data stuff. But I think a master VM backed by persistent storage on fly with read replicas as required is maybe the next step: I love the simplicity of SQLite.