top | item 37613747

I'm all-in on server-side SQLite (2022)

214 points | rrampage | 2 years ago | fly.io

161 comments

[+] erulabs|2 years ago|reply
I hope fly is able to make it. I’m rooting for them - however - I’m starting to wonder if the SQLite push isn’t more “this is fun and interesting to build” and less “customers want this”.

Don’t get me wrong - this is neat - but I’d never suggest anyone actually use this outside of a fun experiment. The problem with existing SQL dbs isn’t really the architecture - it’s the awful queries that do in-memory sorting or make temporary tables for no reason or read-after-write, etc., not network latency. SQLite won’t fix your current production problems.

If it turns out they’re building this for customers throwing cash at them, awesome. I just somehow doubt it. I think Planetscale has the better approach: a drop in replacement for MySQL/RDS with a smarter query planner. As a production engineer that’s what I want to pay for!

[+] maxmcd|2 years ago|reply
Been feeling a little miffed about this recently. Litestream is excellent but if you have multiple writers your db gets corrupted. Quite easy to do with rolling deploys.

LiteFS was announced and is intended to help with this. Now it seems (https://fly.io/docs/litefs/getting-started-fly/) that it requires an HTTP proxy so that the application can guess about sqlite write/read usage by reading the HTTP request method. This seems... to introduce a different (maybe better?) set of gotchas to navigate.
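The method-based heuristic such a proxy applies can be sketched roughly like this (my own illustration, not LiteFS's actual code):

```python
# Route reads to a local replica and anything that might write to the
# primary, guessing intent purely from the HTTP method.
READ_METHODS = {"GET", "HEAD", "OPTIONS"}

def route(method: str) -> str:
    # GET/HEAD/OPTIONS are assumed read-only and can be served locally;
    # everything else is forwarded to the primary node.
    return "replica" if method.upper() in READ_METHODS else "primary"

print(route("GET"), route("POST"))  # → replica primary
```

The gotcha is exactly the one described: a GET handler that writes, or a POST that only reads, defeats the heuristic.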

There are now SQLite cloud offerings but you pay the network overhead and avoiding that was so much of the appeal of using SQLite.

Are people successfully using SQLite in a work or production setting with a replication and consistency strategy that they like? I've had trouble getting a setup to the point where I can recommend it for use at my job.

[+] bob1029|2 years ago|reply
> if you have multiple writers

Our strategy is to not attempt replication at the level of SQLite. We use a single binary for our SaaS product which shares 1 SQLiteConnection instance for the lifetime of the whole ordeal. Remember - every single SQLite connection instance is a file system abstraction, not some cleverly optimized in-memory/networking thing that Postgres or SQL Server is managing on your behalf. Every time you open a new connection to SQLite you are doing some pretty heavy-duty OS calls, relative to just reusing a prior connection. SQLite itself is typically built with serialized mode on by default, which deals with multiple threads sharing one connection. In my experience, this is the most stable & performant arrangement (with WAL et al. also enabled).
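A minimal sketch of this single-shared-connection pattern in Python (table and function names are my own illustration):

```python
import sqlite3
import threading

# One connection, shared for the life of the process. check_same_thread=False
# lets multiple threads use it; we add our own lock to serialize writes.
conn = sqlite3.connect("app.db", check_same_thread=False)
conn.execute("PRAGMA journal_mode=WAL")    # readers no longer block the writer
conn.execute("PRAGMA synchronous=NORMAL")  # common pairing with WAL
conn.execute("CREATE TABLE IF NOT EXISTS events(id INTEGER PRIMARY KEY, name TEXT)")

write_lock = threading.Lock()

def record_event(name):
    # The connection context manager commits on success, rolls back on error
    with write_lock, conn:
        conn.execute("INSERT INTO events(name) VALUES (?)", (name,))
```

Reopening a connection per request, by contrast, pays the file-open and schema-parse cost every time.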

Our backup solution is to snapshot the entire VM (or block storage device) that SQLite is running on. Replication is not a concern because our restore strategy is to just bring back a snapshot if required. Our customers are ultimately responsible for this and typically handle it with a few clicks through AWS, Azure or a quick email to their private cloud provider. RPO and RTO is entirely in their court and all parties prefer it this way - them being highly-regulated banks and us being a small startup operating at the edge of the abyss.

To this day, we have not once had to support recovery of a SQLite database from snapshot due to corruption or other weirdness. We've been at it for half a decade now.

[+] benbjohnson|2 years ago|reply
Author here. The single-node restriction for Litestream was one of the main reasons we started LiteFS. There isn't a way to handle streaming backup from multiple nodes with Litestream & S3 as SQLite is a single-writer system and there aren't any coordination primitives available with S3.

I agree that many of the SQLite cloud offerings introduce the same network overhead. With LiteFS, the goal is to have the data on the application node so you can avoid the network latency for most requests. Writes still need to go to the primary so that's unavoidable but read requests can be served directly from the replica. The LiteFS HTTP proxy was introduced as an easy way to have LiteFS manage consistency transparently so you can get read-your-writes consistency on replicas and strict serializability on the primary. That level of consistency works for a lot of applications but if you need stronger guarantees then there are usually trade-offs to be made.

[+] morelisp|2 years ago|reply
> Litestream is excellent but if you have multiple writers your db gets corrupted.

Isn't this not only well-documented, but (restricting to a single writer to avoid distributed systems issues while still making it easy to move that single writer around) sort of the whole point?

[+] jmull|2 years ago|reply
I’m bullish on SQLite, and this is mostly a great article, but this kind of stuff is flat-out misleading:

> When you put your data right next to your application, you can see per-query latency drop to 10-20 microseconds.

As if postgres and others don’t have a way to run application logic at the database. I like the SQLite way of doing it — you pretty much freely choose your own host language — anything with a decent SQLite client will work. While in Postgres, for example, you’ll probably end up with PL/pgSQL (there are other languages, but with constraints). So this isn’t about latency, as the whole section of the article suggests.

There’s actually a relative weakness in SQLite here, since it doesn’t include a built-in protocol to support running application logic separate from the database. That’s also architecturally useful, and so you may have to find/build a solution for this.

Just adding replicas isn’t a general solution either, because each replica has an inherent cost: changes have to somehow get to every replica.

E.g., systems can grow to have a lot of database clients. In traditional setups you begin to struggle with the number of connections. You might think with SQLite, “hey, no connections, no problems!” but now, instead of 1000 connections you’ve got 1000 replicas. That’s something you’re going to have to deal with… that’s 1000x write load, 1000x write bandwidth.
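A back-of-envelope calculation shows the fan-out cost (all numbers here are my own illustrative guesses, not from the article):

```python
# Every write on the primary must reach every replica.
replicas = 1000
writes_per_sec = 50        # primary write rate
page_size = 4096           # SQLite's default page size
pages_per_write = 2        # say, one data page + one index page touched

fan_out = replicas * writes_per_sec * pages_per_write * page_size
print(f"{fan_out / 1e6:.0f} MB/s of replication bandwidth")  # → 410 MB/s
```

A modest 50 writes/sec becomes hundreds of MB/s of outbound replication traffic once multiplied by the replica count.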

Perhaps fly.io has a solution for this, but I suspect it’s going to cost you.

[+] JohnBooty|2 years ago|reply

    As if postgres and others don’t have a way to run 
    application logic at the database.
I mean...

This is probably the least popular thing you could possibly suggest as an engineer in 2023.

Me? I actually think pushing app logic to the DB is a solid, underrated, and possibly even optimal solution for a lot of scenarios.

But don't tell anybody I said that. I might get beaten up.

That's probably why fly.io sort of glosses over it as a possibility. Almost nobody is even considering it as an option in 2023.

[+] sodapopcan|2 years ago|reply
> As if postgres and others don’t have a way to run application logic at the database.

I think it's reasonably fair of them not to specify this. The target audience of this article is people who are writing their applications in languages like Elixir, JS, Ruby, Python, and are not going to be interested in pushing all of their business logic to the db.

[+] TylerE|2 years ago|reply
That whole series of blog posts is an ad for fly.io.
[+] andrewstuart|2 years ago|reply
I can’t see any valid reason not to use Postgres at the back end, unless you are in some sort of environment, such as embedded or Cloudflare Workers, that rules it out. Or if you need a graph database there are better choices than Postgres.

Postgres is good on multi core, incredibly feature rich, multi user, supported by everything, lightweight and has all the tools for production workload and management. All stuff that is important.

The most important difference to me: SQLite, as I understand it, lacks flexibility in modifying table structures.

[+] kentonv|2 years ago|reply
Postgres is great at what it does, but it is extremely inefficient for storing a small amount of data, e.g. kilobytes or a few megabytes. sqlite, on the other hand, scales nicely all the way down to a few kb.

This matters for cloud in that it means with Postgres you cannot take a "lots of small databases" strategy, e.g. database per user or database per document. You pretty much have to group a lot of data into one big database.

Many apps want to do that anyway! For them, Postgres makes sense. But in the growing world of global deployments and edge compute, the lots-of-small-databases approach is getting popular because it means you can store that data out on hundreds or thousands of edge locations, rather than a single central location. And many (not all) applications actually fit pretty well into a database-per-user or database-per-document model. Centralizing their storage only hurts performance for no benefit.

As a bonus, if you are able to run sqlite compiled directly into your app, not making any kind of network connection, it can be much faster than Postgres, especially in "N+1 select" situations (which are well-known to be a problem with most SQL databases, but are not a problem when using local sqlite). Postgres does not support running as a library like this.
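The N+1 pattern looks like this (schema and data are my own toy example): one query for the parents, then one query per parent for its children. With in-process SQLite each extra query is a function call, not a network round trip.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE authors(id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books(id INTEGER PRIMARY KEY, author_id INT, title TEXT);
    INSERT INTO authors VALUES (1, 'Ann'), (2, 'Ben');
    INSERT INTO books VALUES (1, 1, 'X'), (2, 1, 'Y'), (3, 2, 'Z');
""")

# 1 query for authors + N queries for their books: ruinous over a network,
# nearly free when the database is a library call away.
books_by_author = {}
for author_id, name in db.execute("SELECT id, name FROM authors ORDER BY id"):
    titles = [t for (t,) in db.execute(
        "SELECT title FROM books WHERE author_id = ? ORDER BY id", (author_id,))]
    books_by_author[name] = titles

print(books_by_author)  # → {'Ann': ['X', 'Y'], 'Ben': ['Z']}
```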

[+] bambax|2 years ago|reply
> I can’t see any valid reason not to use Postgres at the back end

From TFA:

> if you don’t need the Postgres features, they’re a liability. For example, even if you don’t use multiple user accounts, you’ll still need to configure and debug host-based authentication. You have to firewall off your Postgres server. And more features mean more documentation, which makes it difficult to understand the software you’re running. The documentation for Postgres 14 is nearly 3,000 pages.

[+] sumtechguy|2 years ago|reply
SQLite fills a need in the market that postgres does not: local single-process data storage and retrieval with some structure. Postgres and all of the DBs like it kick in when you want more than one process involved. MS used to have Access, which filled this need very nicely. Once you go multi-user/process you probably do not want sqlite; you will want something that can do ACID at a multi-user level. Once you need more than one process to store data, you can make your installation a larger burden than you really need to deal with. On the other hand, I can think of maybe one or two projects where sqlite fit very nicely as a data store. Most of the other cases I had involved a 'real' db.

For this case I guess they could try to bend sqlite to do this, but the pain of doing so will probably not be worth the long-term trouble compared to just using mysql or postgres or something like those.

[+] bajsejohannes|2 years ago|reply
> I can’t see any valid reason not to use Postgres at the back end

I'm using sqlite in production for a backend service. There are plenty of downsides, but to focus on the positives:

- I can run all tests in a database in memory. It's incredibly fast to "spin up" and I can use a separate database per test

- Related, I find that I write more tests against the database instead of mocking a database, which cuts down on time writing tests.

- I don't need to start a database to run the backend

- I can have snapshots of the databases as single files for various scenarios

All in all, the development process feels a lot faster. When something takes a millisecond instead of seconds, you do things differently.
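The first two bullets can be sketched like this (schema is a made-up example): every test gets its own fresh `:memory:` database, so there's no server to start and no state shared between tests.

```python
import sqlite3
import unittest

def make_db():
    # ":memory:" creates a brand-new throwaway database per call
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users(id INTEGER PRIMARY KEY, name TEXT)")
    return db

class UserTest(unittest.TestCase):
    def setUp(self):
        self.db = make_db()  # isolated database per test, spins up in microseconds

    def test_insert(self):
        # Exercise the real database instead of a mock
        self.db.execute("INSERT INTO users(name) VALUES ('alice')")
        rows = self.db.execute("SELECT name FROM users").fetchall()
        self.assertEqual(rows, [("alice",)])
```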

[+] dimgl|2 years ago|reply
> any valid reason

Well, cost, right? Cost is a reason why someone may not want to use a traditional RDBMS. AWS RDS and GCP Cloud SQL aren't exactly the cheapest solutions out there.

[+] srcreigh|2 years ago|reply
Postgres doesn’t support index-organized (clustered b-tree) primary storage, so data can’t be automatically clustered on the primary key, which can reduce efficiency for some joins by 50x. MySQL and SQLite don’t have this issue.
[+] benbjohnson|2 years ago|reply
Author here. Cool to see the post make it up on HN again. I'm still as excited as ever about the SQLite space. So much great work going on from rqlite, cr-sqlite, & Turso, and we're still plugging away on LiteFS. I'm happy to answer any questions about the post.
[+] blagie|2 years ago|reply
I don't need to be sold on the virtues of applications running on systems like SQLite. The nineties had a lot of servers which were very simple (and performant) compared to LAMP, and I like systems like that.

What I would like is a good primer about the layers on top of SQLite. What does Litestream do for me? How does it compare to competitors? Why not just use SQLite directly? A more in-depth technical discussion would be nice. I'd also like to understand wrappers and ORMs for migration to other systems, should SQLite stop scaling.

[+] fiedzia|2 years ago|reply
> Why not just use SQLite directly?

SQLite does not provide replication, so there is no way to use it directly (other than copying the whole file). If you mean it as "Why not use it as a database?" then sure, you can use it directly, though the article states reasons for not doing so (resiliency and concurrency). Postgres is a lot better in those areas, and so is the tooling.

>I'd also like to understand wrappers and ORMs for migration to other systems, should SQLite stop scaling

It heavily depends on the ORM. For example, Django provides a good abstraction layer and many things work with any database with no change needed, but many others don't bother with that. However, just because a query runs doesn't mean it will return the same results. Any non-trivial app will rely on numerous accidental details and you can't switch db and expect everything will be fine. SQL is not really portable even in the parts it does cover, and there are many it doesn't.

[+] matlin|2 years ago|reply
If you need multiple writers and can handle eventual consistency, you should really be using cr-sqlite[1]. It'll allow you to have any number of workers/clients that can write locally within the same process (so no network overhead) but still guarantee convergence to the same state.

[1] https://github.com/vlcn-io/cr-sqlite

[+] endisneigh|2 years ago|reply
Use Postgres. Or if you insist on this type of architecture use CouchDB. I shudder thinking about a SQLite schema migration across clients with potentially unknown versions.

Seems like a disaster waiting to happen unless you have a bunch of logic centralized somewhere to keep track of the last known schemas per user client database. And if you’re going to do all that, unless you desperately need low latency (in which case you could use a multi-region database like Cockroach), why not just centralize?

[+] ak39|2 years ago|reply
SQLite not supporting "stored procedures" is a deal-breaker for me. The idea for stored procs is not to "put the process as close to the data" but simply that we have a single place for language-agnostic encapsulation of data procedures.
[+] chungy|2 years ago|reply
SQLite is an in-process database. If you need language-agnostic encapsulation of data procedures, SQLite is not for you. I would suggest you consider PostgreSQL.
[+] ketralnis|2 years ago|reply
I don't think I've ever needed language-agnostic procedures in a project where sqlite is also a fit. I like them both but at different times. I'd love to hear your use case though. Do you have microservices in different languages running on the same machine that share a db file? Or maybe a web + command line interface?

Sqlite's internals actually could support something like this: it has a bytecode engine (https://www.sqlite.org/opcode.html) that's more oriented around executing query plans, and it's missing some pieces (e.g. it has no stack, only registers), but much of the machinery is there to expand it to stored procedures.
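You can actually peek at that bytecode engine from any SQLite client by prefixing a statement with EXPLAIN (the exact opcode listing varies by SQLite version, so treat this as illustrative):

```python
import sqlite3

db = sqlite3.connect(":memory:")

# EXPLAIN returns the VDBE program a statement compiles to, one opcode
# per row: (addr, opcode, p1, p2, p3, p4, p5, comment).
for addr, opcode, *rest in db.execute("EXPLAIN SELECT 1 + 1"):
    print(addr, opcode)
```

Every program ends with a Halt opcode, and you can see the register-based (rather than stack-based) design in the p1-p3 operands.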

[+] pstuart|2 years ago|reply
What language(s) would the stored procedures be, and how would that look in keeping with the ethos of the project?

Their reasoning for not doing this is not unreasonable, but it certainly would be cool if such functionality existed.

[+] chasil|2 years ago|reply
The language for triggers in SQLite appears to be a fragment of SQL/PSM and Oracle PL/SQL.

Perhaps this will grow into a more thorough implementation.

[+] hahn-kev|2 years ago|reply
But man, the maintenance and debug nightmare never seems worth it for that tradeoff. Not to mention vendor lock-in.
[+] srameshc|2 years ago|reply
I recently saw the launch post of Electric SQL, which syncs to SQLite. I like the pattern of keeping the data close to the frontend, synced with the main DB; it can solve many problems. I hate to run another docker container or managed service for this layer, but if a part of the data from a database like Postgres could be synced using something simple like litestream and placed either on the edge or the client, that could solve many of these problems.
[+] kijin|2 years ago|reply
> When you put your data right next to your application, you can see per-query latency drop to 10-20 microseconds. That’s micro, with a μ. A 50-100x improvement over an intra-region Postgres query.

Why compare the latency of a remote Postgres database with a local SQLite database? If your app is so simple and self-contained that it runs on a single EC2 instance using local files, nothing prevents you from installing Postgres on the same machine, whether inside a container or not.

I have some simple apps on EC2 with MariaDB on localhost, and well-tuned queries rarely take more than 100-200 microseconds. That's total query execution time, not just communication latency. RDS just sucks for this kind of use case. It's not a useful comparison.

> As much as I love tuning SQL queries, it’s becoming a dying art for most application developers. Even poorly tuned queries can execute in under a second for ordinary databases.

Didn't you just say that milliseconds matter?

[+] morelisp|2 years ago|reply
But why take on the operational overhead of a separate DB server (not to mention 200 more microseconds), plus the EC2 surcharge? I would rather run app+SQLite + dumb object storage, than app + MySQL + MySQL incremental backup and restore.
[+] benbjohnson|2 years ago|reply
Author here. The comparison was meant to be about how Postgres (or any client/server RDBMS) is typically deployed. Yes, you can deploy Postgres on the same machine but I wouldn't say it's common. Maybe I could have expanded more on that point or simply referenced client/server architecture rather than Postgres so it didn't seem like a straw man argument.
[+] giantrobot|2 years ago|reply
> Why compare the latency of a remote Postgres database with a local SQLite database?

The SQLite DB is just a flat file that can be packaged in a lambda or whatever cloud's object store. It makes sense for fast access to read-heavy or read-only data. Even a shitty query can return rows in less than a millisecond, while the same amount of data in a remote DB is tens of milliseconds away.

[+] bradgessler|2 years ago|reply
Related: I wrote a piece last week on deploying Rails apps to production on Fly.io at https://fly.io/ruby-dispatch/sqlite-and-rails-in-production/

The work that’s made this possible is:

1. Litestack https://github.com/oldmoe/litestack runs everything on Sqlite

2. Fly.io’s work on the dockerfile-rails generator detecting Sqlite and Litestack in a Rails project, then setting up sane defaults for where that data is stored and persisted in production. This is all done behind the scenes with no intervention required from the person deploying.

3. Servers are overall faster and more powerful

I hope more Rails hosts make it easier and safer to deploy Sqlite to production. It will lower costs and reduce complexity for folks deploying apps.

[+] greatNespresso|2 years ago|reply
While different than the approach offered by Litestream, I am fairly excited by the direction of Cloudflare D1, making SQLite available at the edge without having to manage anything. Still in alpha, but worth looking at if you're looking for a cheap cloud option.
[+] mediumsmart|2 years ago|reply
Me too. It’s just a .db file on the server. The same as MySQL but this one is on the same server as the client’s site or my site. Get it? It’s a file! How crazy is that. If you wanted to outsource the sqlite the same way you do the myposmongresdb databases with separate login, scale from zero to IPO and all the trimmings, you would have to put it on another server or a service even. Then you can call it long distance and have a dedicated dbdudeuser like with a grown-up database and you get networklattemacciato for free! Endless possibilities and constellations.
[+] sergioisidoro|2 years ago|reply
A few years ago I made a decision to ship a SQLite database in an (internal) Ruby on Rails package. Why? Because there was a large set of (static) data that was required for the package to work, and it made no sense to make an API to query it from external sources (it wasn't that big, something like 5-10 MB if I recall). At the time it felt like a super dirty hack, but time seems to have validated that decision :)
[+] meitham|2 years ago|reply
I recently wrote a production system that uses SQLite as the main backend. SQLite is in memory in this case and its entire state gets rebuilt from Kafka on start. The DB receives about 2 updates a second, wrapped with a REST API (aiohttp) and OData filters. It has been able to handle close to 9k requests/second and it’s a primary system in a financial institution. So yes, SQLite is a fully capable prod db.
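The rebuild-from-log idea can be sketched like this (a plain list stands in for the Kafka consumer; the table and field names are my own invention, not from the system described):

```python
import sqlite3

# In-memory database, rebuilt from scratch on every start
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE positions(account TEXT PRIMARY KEY, qty INTEGER)")

def apply_event(event):
    # Upsert so replaying the stream leaves only the latest state per key
    db.execute(
        "INSERT INTO positions VALUES (:account, :qty) "
        "ON CONFLICT(account) DO UPDATE SET qty = excluded.qty",
        event)

# Replay the log (here a hard-coded stand-in for consuming a Kafka topic)
for ev in [{"account": "acc-1", "qty": 10}, {"account": "acc-1", "qty": 7}]:
    apply_event(ev)

print(db.execute("SELECT qty FROM positions").fetchone())  # → (7,)
```

Since the log is the source of truth, the in-memory database needs no durability of its own.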
[+] endisneigh|2 years ago|reply
You’re using SQLite and Kafka? Very ironic.
[+] jerrygenser|2 years ago|reply
If you don't need SQL (relational data), but maybe have a schema per topic, I've used rocksdb as a cache for the latest values in a tombstoned topic. It has high write throughput for rebuilding state when playing a stream forward.
[+] sigmonsays|2 years ago|reply
I don't want this to be taken the wrong way, but I read about fly.io and sqlite at least once a week.

Who is using this and why is it such a hot topic on HN?

[+] adamrezich|2 years ago|reply
SQLite appeals to the hacker because it is simple on the surface, complex beneath the surface, easy to use, and does its job well. however, replacing a traditional MySQL or PostgreSQL database with it takes a bit of work, because it's not a drop-in replacement. so you see this push and pull between people of different strata advocating for SQLite, disparaging it in favor of PostgreSQL, and some who try to engineer stuff around SQLite to make it a better fit for traditionally PostgreSQL-type situations.

if you've never tried SQLite on your own and you're comfortable with C, I'd recommend giving it a try, it's pretty dang cool.

[+] mnming|2 years ago|reply
Neither Litestream nor LiteFS meets my SQL needs: Litestream is a single-writer system, and LiteFS has data consistency risks. I can't justify replacing PostgreSQL with them.

I do understand those tools expanded the use cases of sqlite a lot and they are pretty cool in how they pulled it off. But I'm surprised Fly's investing here; feels like it tarnishes their infra provider rep.

If they do want to continue this investment, maybe investing in things like rqlite will be more appropriate for an infra shop.

[+] nik736|2 years ago|reply
Why would I use SQLite over PostgreSQL for regular CRUD apps?
[+] imhoguy|2 years ago|reply
Did anybody try something like this: read/write to a SQLite database file on the backend, but also allow the database file to be downloaded at any time by a rich JS frontend for read-only querying? I just wonder if the file is going to be (eventually) consistent and not corrupted.
[+] simonw|2 years ago|reply
My hunch is that if you want to do that the safe way would be to have a mechanism that creates a snapshot of the SQLite database for the client to download when they request it.

One way to do that is with VACUUM INTO, e.g. how I use it in this TIL: https://til.simonwillison.net/sqlite/python-sqlite-memory-to...
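A minimal sketch of the VACUUM INTO approach (file names here are my own; VACUUM INTO requires SQLite 3.27+). The snapshot is a compact, transactionally consistent copy that's safe to hand to a client:

```python
import os
import sqlite3

src = sqlite3.connect("notes.db")
src.execute("CREATE TABLE IF NOT EXISTS notes(body TEXT)")
src.execute("INSERT INTO notes VALUES ('hello')")
src.commit()

# VACUUM INTO writes a consistent copy of the database to a new file;
# it refuses to overwrite, so clear any stale snapshot first.
if os.path.exists("snapshot.db"):
    os.remove("snapshot.db")
src.execute("VACUUM INTO 'snapshot.db'")

# The copy is a fully valid, standalone SQLite database
copy = sqlite3.connect("snapshot.db")
print(copy.execute("SELECT body FROM notes").fetchone())  # → ('hello',)
```

The original database stays writable throughout, and the downloaded file never contains a half-applied transaction.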

If your database is less than 100MB or so I imagine this would easily be fast enough that the performance overhead wouldn't be worth worrying about.