We've been using SQLite as our principal data store for 6 years. Our application serves potentially hundreds of simultaneous users, each pushing 1-15 megabytes of business state to or from disk 1-2 times per second.
We have not had a single incident involving performance or data integrity issues throughout this time. The trick to this success is as follows:
- Use a single SqliteConnection instance per physical database file and share it responsibly within your application. I have seen some incorrect comments in this thread already regarding the best way to extract performance from SQLite using multiple connections. SQLite (by default for most distributions) is built with serialized mode enabled, so it would be very counterproductive to throw a Parallel.ForEach against one of these.
- Use WAL. Make sure you copy all 3 files if you are grabbing a snapshot of a running system, or moving databases around after an unclean shutdown.
- Batch operations if feasible. Leverage application-level primitives for this. Investigate techniques like LMAX Disruptor and other fancy ring-buffer-like abstractions if you are worried about millions of things per second on a single machine. You can insert many orders of magnitude faster if you have an array of contiguous items you want to put to disk.
- Just snapshot the whole VM if you need a backup. This is dead simple. We've never had a snapshot that wouldn't restore to a perfectly-functional application, and we test it all the time. This is a huge advantage of going all-in with SQLite. One app, one machine, one snapshot, etc...
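A minimal sketch of the first three points in Python's stdlib sqlite3 (rather than the .NET SqliteConnection the comment mentions); the file name and schema are invented for illustration:

```python
import sqlite3

# One connection per physical database file, shared by the whole process
# (guard it with a lock or a dedicated writer thread in real code).
con = sqlite3.connect("app.db", check_same_thread=False)
con.execute("PRAGMA journal_mode=WAL")  # enable write-ahead logging

con.execute("CREATE TABLE IF NOT EXISTS state (id INTEGER PRIMARY KEY, blob TEXT)")

# Batch: one transaction around many inserts instead of one commit per row.
rows = [(i, f"payload-{i}") for i in range(1000)]
with con:  # opens a transaction, commits on success
    con.executemany("INSERT OR REPLACE INTO state (id, blob) VALUES (?, ?)", rows)

print(con.execute("SELECT COUNT(*) FROM state").fetchone()[0])  # -> 1000
```

The batched transaction is where most of the speedup comes from: committing per row forces a sync to disk per row, while one transaction amortizes it over the whole batch.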
> Make sure you copy all 3 files if you are grabbing a snapshot of a running system
Or use the ".backup" command in the CLI to take a clean single file snapshot if that's what you need. Or, you can call the checkpoint C API and if it can move all the transactions from WAL to DB, it will remove the WAL file when it's finished.
I find databases of all sorts extremely interesting and I've tried many of them, of all flavors.
In the end, I always come back to Postgres. It's unbelievably powerful, lightweight and there's not much it can't do.
Why not use sqlite? Here's one example - I like to access my database from my IDE remotely - my understanding is that remote access is not possible with sqlite.
Another example - multitenancy is dead easy in postgres, makes things more secure and makes coding easier.
Another is that Postgres has an extremely rich set of data types, including arrays and JSON.
Also Postgres is designed for symmetric multiprocessing which matters in today's multicore world.
> In the end, I always come back to Postgres. It's unbelievably powerful, lightweight and there's not much it can't do.
Well, depends on how wide your definition is:
Filesystems are also databases. (And eg inside of Google, they are using databases to store filesystems.)
These days, we often think of relational databases. Filesystems are more like the hierarchical databases of yore that Codd (also) talked about in his original papers.
> Here's one example - I like to access my database from my IDE remotely - my understanding is that remote access is not possible with sqlite
Not sure exactly what you mean by 'remote access', but if it's to debug something on a remote DB from an SQL IDE, I do this quite often. I use sshfs to mount the remote filesystem and open the SQLite file in DBeaver.
Even in the case of Postgres, you would have to connect to a remote DB via SSH or via VPN. It's just that you will be using sshfs to connect to a remote SQLite file.
> In the end, I always come back to Postgres. It's unbelievably powerful, lightweight and there's not much it can't do.
How lightweight?
SQLite itself is around 600 KB. That's with all features enabled. I believe you can get it down to about half that if you disable features you don't need. RAM usage is typically under a dozen or so KB.
It doesn't even require that you have an operating system. All you need for a minimal build is a C runtime that includes memcmp, memcpy, memmove, memset, strcmp, strlen, and strncmp, and by default also malloc, realloc, and free (although it has provisions for plugging in different memory allocators). You also have to provide some code to access storage if you are running on a system that doesn't have something close to open, read, write, and the like.
This means it can even be used on many embedded systems far too small to run a Unix.
Heck, it is not even too unreasonable to compile SQLite to WebAssembly and include it on a web page if for some reason you need client-side SQL on the web.
I like everything about Postgres except how arcane it is. It feels like 1980s Unix software, and everything about it is manual. It seems like there are multiple ways to do many things and I never feel like I am sure I am doing it the "right" way (as in both correct and modern and what you should do).
I've been watching CockroachDB as a more modern alternative that can auto-scale easily, do master/master easily, etc. Those are all painful on Postgres to the point that hosted and managed pgsql is a cash cow for cloud vendors.
I've seen MySQL used successfully in a multi-master HA configuration (a Percona-based solution). Do you have anything to recommend for a true multi-master PostgreSQL solution (ideally open source)?
> Another example - multitenancy is dead easy in postgres, makes things more secure and makes coding easier.
Can you share more on how you do multitenancy? Are you using it for web/mobile? What language/Library do you use to access the database?
I'm building a multi-tenant web application on Node/Sequelize/Feathers and implementing the logic in code. I've seen row-level security but haven't looked deeply.
By "multitenancy is dead easy in Postgres", are you talking about schemas?
If so, have you run into any scaling issues with the number of tenants you can support? I looked into using schemas for multitenancy, but the various reports I found said that at around 50 tenants or so, things started breaking down very quickly.
A lot of IDEs and other graphical tools can point towards a SQLite database. Does it really have to be local though? I assumed you could hit it from any external file share? Or did you mean something else?
Cool article. I definitely agree that SQLite is the way to go in 99% of the cases where you're making some kind of app.
I have often seen Postgres, MySQL and MongoDB in places where they were really overkill and added unnecessary network and deployment complexity. And upon asking the designers it turns out they simply didn't know about SQLite...
Small gripe in case the poster reads this: there's a malapropism in the opening bit:
> SQLite is properly the only database you will ever need in most cases.

Should be:

> SQLite is probably
I'm not sure I understand why PostgreSQL would be "overkill" for something? It's very easy to set up and operate, almost anyone in the industry either knows it or they should be willing to learn it as it's the de facto standard for almost any modern company using an SQL database.
Also it's very easy to buy PostgreSQL from various DBaaS providers.
There's a Docker one-liner to set it up on any development machine, and you need to run it just once.
Sure, SQLite is a little bit easier, but it's so much less powerful as a database. Why not just go to Postgres directly and leave SQLite for what it's intended for: embedded programming like mobile apps or in-car entertainment systems, etc.
One example is Ubiquiti UniFi management software using MongoDB: you end up with 4 GB+ of space usage just to run some basic software because of the prealloc files.
I use Gitea (a GitHub clone) locally with SQLite running on ZFS, where I can take atomic snapshots of both the SQLite database and the git repositories.
This would not be possible if I used a remote database, as there is no way to do an atomic snapshot of both the remote database on another filesystem (even if it's ZFS) and the git repository.
Developers of services, please support SQLite and don't lock yourselves to a single database like PostgreSQL or MySQL. Even if the service is embedded in a Docker container, it is still painful to manage.
> I use Gitea(Github clone) locally with SQLite running on ZFS where I can take atomic snapshots of both the SQLite database and git repositories.
ZFS snapshots may not result in a consistent database backup of SQLite. You should use VACUUM INTO and then do a snapshot, or use the SQLite Backup API: https://www.sqlite.org/backup.html
Essentially, while you have a proper atomic snapshot of the changes on disk, the in-flight transactions won't be there, and you will be at the mercy of a successful recovery from the journal/WAL, if you have that.
It is important to mention that a "Remote" database could run on the same host ...
I run PostgreSQL on my VPS alongside gitea and some other services, and I can take snapshots of it just fine (assuming I don't take a snapshot as it is writing to disk, but that problem exists with SQLite as well).
1. You can run Postgres locally using the same ZFS for storage you'd use for the sqlite database. This should come with the same backup properties as sqlite.
2. I doubt that the git part of your setup can be snapshotted correctly while an operation is in progress, since git doesn't use transactions.
I've deployed Bitbucket and Postgres as a docker stack under the same subvolume and done snapshots on that. As long as you have WAL enabled in Postgres there shouldn't be an issue, and I've restored multiple times without problems.
This article says you can back up your database by just copying it, as it’s a single file (and apparently copying it while it’s in use is fine?). It also suggests enabling WAL mode. But WAL mode means you now have two files. Does the use of WAL introduce any concerns while backing up?
This is untrue and a sure way to corrupt your database[1].
From sqlite.org on how to corrupt your database:
> 1.2. Backup or restore while a transaction is active
> Systems that run automatic backups in the background might try to make a backup copy of an SQLite database file while it is in the middle of a transaction. The backup copy then might contain some old and some new content, and thus be corrupt.
> The best approach to make reliable backup copies of an SQLite database is to make use of the backup API that is part of the SQLite library. Failing that, it is safe to make a copy of an SQLite database file as long as there are no transactions in progress by any process. If the previous transaction failed, then it is important that any rollback journal (the -journal file) or write-ahead log (the -wal file) be copied together with the database file itself.
What you need is to follow the guide[2] and use the Backup API or the VACUUM INTO[3] statement to create a new database on the side.

[1] https://www.sqlite.org/howtocorrupt.html

[2] https://www.sqlite.org/backup.html

[3] https://www.sqlite.org/lang_vacuum.html#vacuuminto
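A minimal sketch of the VACUUM INTO approach via Python's stdlib sqlite3 (requires SQLite 3.27+; the file names and schema are invented):

```python
import os
import sqlite3

# A live database in WAL mode with some committed data.
con = sqlite3.connect("live.db")
con.execute("PRAGMA journal_mode=WAL")
con.execute("DROP TABLE IF EXISTS kv")
con.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
with con:
    con.execute("INSERT INTO kv VALUES ('a', '1')")

# VACUUM INTO writes a transactionally consistent, defragmented copy
# of the whole database into a brand-new single file.
if os.path.exists("copy.db"):
    os.remove("copy.db")  # VACUUM INTO refuses to overwrite an existing file
con.execute("VACUUM INTO 'copy.db'")

copy = sqlite3.connect("copy.db")
print(copy.execute("SELECT v FROM kv WHERE k='a'").fetchone()[0])  # -> 1
```

The resulting copy.db has no companion -wal or -shm files, so it is safe to snapshot or ship elsewhere as one file.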
Ugh, a dynamically typed SQL database, no thanks. I always find I have to build quite a bit of a "data layer" on top of SQLite, for example when using it in mobile apps.
Also, you have to build some basic stuff yourself, like enum or JSON support. It's not hard, but honestly I would feel much better using PostgreSQL in those situations.
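For JSON specifically, many SQLite builds do ship the JSON1 functions (and they are built in by default since 3.38), so basic JSON querying may not need to be hand-rolled. A small sketch with Python's bundled SQLite (the table and payload are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
con.execute(
    "INSERT INTO events (payload) VALUES (?)",
    ('{"user": "alice", "action": "login"}',),
)

# Query inside the JSON document without any application-side parsing.
row = con.execute(
    "SELECT json_extract(payload, '$.user') FROM events"
).fetchone()
print(row[0])  # -> alice
```

Enums still have to be emulated (e.g. with a CHECK constraint or a lookup table), so the broader point stands.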
If you want a super easy way to back up your SQLite database, check out https://litestream.io/
It will continuously backup your database to an S3 compatible service. There are very nice detailed instructions for various backends (AWS, Minio, B2, Digital Ocean) as well as instructions on how to run as a service.
I really like SQLite, but I also like Postgres because it supports more complex data types. Sure, you can do the same in SQLite by spreading the data over multiple tables, but it's just not as easy or straightforward.
This is why developing with MongoDB is so fast and easy in the beginning. You just go, store shit like you structure your data in code which makes the mapping extremely easy in most cases.
I really agree with this article. I actually built an app with SQLite as the backend that works locally, on Kubernetes, on the Mac App Store, on Raspberry Pis, and almost anywhere, and it has performed fantastically for years now. The biggest thing is that you have to make sure you always wrap your SQLite queries in a transaction, otherwise you can get corruption: https://github.com/yazz/visualjavascript
SQLite is great! But if you have mainly analytical queries over your data, you might want to give DuckDB [1] a go. It's an embedded database that uses a columnar representation similar to MonetDB (it's from the same group).

[1] https://duckdb.org/
SQLite is cool and all, but using "ever" with the caveat "most cases" really indicates that it is not "ever". Additionally, I would contest even the "most cases" part. The post is correct in that SQLite often is fine, though.
> SQLite is cool and all, but using "ever" with the caveat "most cases" really indicates that it is not "ever".
I think the "ever" is about "will I ever need to upgrade to a server-based database in this particular case". I.e., in cases A and B, you might do OK for a year or two with SQLite, but if your project grows, eventually you'll have to upgrade to Postgres; in these cases, SQLite is not the only database you'll ever need. But in cases C, D, E, and F, you'll never have to upgrade; in these cases, SQLite is the only database you'll ever need. The cases where SQLite is the only database you'll ever need outnumber the cases where SQLite is not the only database you'll ever need; thus, "SQLite is the only database you'll ever need in most cases."
I love SQLite. Used it to set up a small PHP tool which allowed users to check if their data was present in the latest Facebook leak. Took 30 mins of coding, incl. data import. The tool got picked up by the national media and the tiny 1GB VPS got quite hammered with traffic (±50 lookups a second, 5.5M records). But it didn't break a sweat. The only snag I ran into was running out of file descriptors/sockets, which was easily solved.
I personally think that SQL and tabular data should be built into languages in a manner similar to text and regular expressions. I get this (partially) in Python by using pandas and pandasql, which (to my understanding) runs SQLite in the background. There are a few other modules that do this as well.
Recreating relational table query operations as a set of operations unique to a specific programming language instead of integrating SQL makes, to me, as much conceptual sense as recreating a unique text pattern matching search for each language rather than integrating regular expressions.
This is only a small sliver of the topic here, but I do think that SQLite is probably a good backend for this.
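As a rough illustration of the idea, an in-memory SQLite database already gets Python fairly close to "SQL over plain in-process data" (the data here is invented):

```python
import sqlite3

# Hypothetical in-process data you'd otherwise filter with loops
# or list comprehensions.
orders = [("alice", 120), ("bob", 45), ("alice", 80)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (customer TEXT, total INTEGER)")
con.executemany("INSERT INTO orders VALUES (?, ?)", orders)

# A relational query over plain Python data.
result = con.execute(
    "SELECT customer, SUM(total) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(result)  # -> [('alice', 200), ('bob', 45)]
```

This is roughly what pandasql does behind the scenes: load the data into SQLite, run the query, return the rows.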
SQLite is awesome and just enough for many, many uses. The thing I wonder is whether enforced column types will come, or whether the current design is too deeply embedded in the DB. Having everything as a string is the main difference from a standard DB for me. Maybe some optional typing could be enforced? Anyway, I'm not criticising; I'm more than grateful for this awesome under-the-radar but cornerstone piece of tech, just like curl.
Not everything is stored as a string. If you set a column to a numeric type and pass a number, it's stored as a number.
SQLite doesn't enforce it. But people forget that the major reason for DB-side enforcement is multiple applications on one database.
This pattern has fallen out of favor in general for all databases (we prefer one service layer in front of the DB, and then multiple apps use that service layer).
And when you have one app or one service layer, that's where the enforcement can easily come from.
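A quick demonstration of the type-affinity behavior described above, using Python's stdlib sqlite3 (the table and values are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (n INTEGER, s TEXT)")
# The INTEGER column has numeric affinity: the string '42' is coerced
# and stored as an integer; the TEXT column stores the value 42 as text.
con.execute("INSERT INTO t VALUES ('42', 42)")
types = con.execute("SELECT typeof(n), typeof(s) FROM t").fetchone()
print(types)  # -> ('integer', 'text')
```

(Recent SQLite versions, 3.37+, also offer `CREATE TABLE ... STRICT`, which rejects values of the wrong type instead of coercing them.)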
In "most cases"? I take it by most cases it means, non enterprise applications with no load and basic use-cases? I can't think of a single application for SQLite at any company I've ever worked for, perhaps my first job for a small e-commerce company. Every database I work with has hundreds of GBs of data at the minimum and has to be highly available (i.e. tolerate any DB server dying with zero downtime) and in most cases, extremely low latency. I.e. respond to hundreds or usual thousands of requests per second with a response time ranging from 100ms to 1ms. Recently most of the databases I'm working with are responding to 10000+ requests per second at an average request time of less than 1ms.
Sure, for embedded databases with not much data and a single user, use SQLite, but for actual serious applications it's a toy.
> Even if you start out small and later need to upscale, as long as your web application can run on the same machine as the database, which it can in 99% of the time...
It's not really "upscaling" if it can still be handled by one machine, though, is it? And it might be much cheaper to handle the same load on a number of weaker "machines" than on a single powerful "machine" (replace "machine" with cloud VMs).
And for such "small-scale" problems, it's also not a big deal to run a "proper" DB server process on the same machine (but with the advantage that it's absolutely trivial to switch to multiple machines).
gwbas1c | 4 years ago:
How do you deal with things like updates? Upgrades to your service?
Do you just accept that you have scheduled downtime when your service won't be available?
dijit | 4 years ago:
To that goal, I think it's wildly successful; instead of writing files, I almost always reach for SQLite first.
When systems share a database, I, like you, reach straight for PostgreSQL. Most people reach for MySQL, which I do not prefer.
NicoJuicy | 4 years ago:
It also has JavaScript V8 functions to update data for migrations (e.g. migrating your events to a new model, instead of keeping the different versions).
Annatar | 4 years ago:
I agree, but there is one thing that it cannot do, one very important, critical thing: synchronous multi-master replication.
Well, vanilla PostgreSQL cannot. Vertica can.
jhgb | 4 years ago:
I found https://kristaps.bsd.lv/sqlbox/ to be an interesting approach.
diarrhea | 4 years ago:
Malapropism? You could just call it a typo (which is incorrect, but people get it).
the-dude | 4 years ago:
Rails switched to SQLite for at least the development environment.
As soon as I started to test the app with multiple clients (browsers, websockets, embedded websockets), I was hit with concurrency errors.
So I switched to PostgreSQL (which I love). But I had to.
EMM_386 | 4 years ago:
https://github.com/sqlitebrowser/sqlitebrowser