SQLite or PostgreSQL? It's Complicated

[+] msbarnett|3 years ago|reply

> Once again, SQLite happily stored my UUIDs on this column and even indexed them, but to get these across to PostgreSQL I needed to fix the schema and properly declare these keys as strings.

In PostgreSQL you're leaving performance on the table if you store UUIDs as strings instead of as the built in UUID type (128-bit value under the hood)

[+] cryptonector|3 years ago|reply

Plus PG and SQLite3 have fundamentally different approaches to types.

SQLite3 has very few types, you cannot create new types, and is duck-typed.

PG has many built-in types, you can create new types, and is (mostly) statically typed.

You really have to know this going in. If you have to support both, you'll probably have to write two versions of your SQL -- or you'll end up using an ORM that supports the two RDBMSes, and then you may get stuck with the ORM.

SQLite3 is amazing, but it's mainly amazing because of how easy it is to embed in apps. If that's what you need, use SQLite3. If you need more power, if you're building a server, then think long and hard about why not just go straight to PG.

EDIT: SQLite3 is amazing for other reasons too.

[+] masklinn|3 years ago|reply

There's also no word about indexing or analysis (and postgres is very sensitive to having up-to-date statistics).

And on sqlite you might be leaving memory / storage on the table if you use a UUID as PK and don't remove the implicit ROWID.

[+] miguelgrinberg|3 years ago|reply

For this particular application I discuss in the article it would make no difference, since there is only one not heavily used table that uses UUIDs. All the other tables use integer primary keys.

Also this may be not a very popular opinion, but I prefer to design the database without using database-specific features. Things such as using SQLAlchemy and sticking to basic types. Sure it may affect performance some, but thanks to having done that I was able to port the app from SQLite to PostgreSQL in like an hour.

[+] nick_|3 years ago|reply

Just wait until you learn that SQLite stores everything as a string :(

[+] jeffdn|3 years ago|reply

> The solution was to grow the slug column to 512 characters and retry.

No, the solution is to always use `text` as your column type, and use a check constraint if you need to enforce a limit. It's much easier and safer to alter a check constraint than it is to change a column type live in production. The `text` type and `varchar(x)` are identical under the hood, `text` takes up no more space on disk.

[+] luhn|3 years ago|reply

For Postgres, there's a strong case to be made VARCHAR(n) is preferable to TEXT+check constraint. As you note, they're stored the same on disk. But for a varchar, you can increase the length with just this:

    ALTER TABLE a ALTER COLUMN b TYPE VARCHAR(n)

This only requires ACCESS EXCLUSIVE for a fraction of a second while the metadata is updated.

Whereas updating a CHECK constraint will require a full scan of the table. If you do it naively, this locks the table for the entire scan. If you want to avoid that, it's three steps:

    ALTER TABLE a
        DROP CONSTRAINT b_c,
        ADD CONSTRAINT b_c CHECK (length(b) < n) NOT VALID;
    COMMIT;
    ALTER TABLE a
        VALIDATE CONSTRAINT b_c;

So as long as you're only increasing the length (which in my own experience is a safe assumption), VARCHAR is much easier to work with.

[+] SigmundA|3 years ago|reply

I thought as of PG 9.2 expanding a varchar column was metadata only operation and therefore low overhead.

I know in SQL Server there are two issues with doing varchar(max) for everything and increasing a columns size is metadata only. First indexes have a limit of 900 byte values and will fail at runtime if you index a column with no max length and insert a value larger than 900 bytes. PG seems to have this issue as well but the limit is 2712 bytes.

Second the query planner makes use of the size to determine how much memory to pre-allocate for the query, with unlimited length field it assume something like 4096 bytes and wastes working memory if your values are not actually that size. Not sure if PG has the second issue, having a max value defined is valuable information to a database engine along with other type information such as UUID's only taking 16 bytes instead of 36 bytes.

[+] masklinn|3 years ago|reply

Note: that's true for postgres and sqlite (which ignores the length limit entirely as they discovered anyway), not necessarily for other database systems.

[+] bob1029|3 years ago|reply

I think it's even more complicated.

The reduction in latency brought on by in-process databases, combined with modern NVMe storage, means that SQLite is a substantially faster approach than any other solution which requires a trip through the network stack.

I've got services in production for several financial institutions right now that are resolving most SQL queries within 100-1000 micro seconds by simply using SQLite on reasonable hardware.

How many more users could you support if the amount of time you had to await IO for each was reduced by ~2 orders of magnitude?

Obvious caveats here being the resilience angle. We addressed this with application-level protocols and additional instances.

[+] infogulch|3 years ago|reply

Another angle is unbounded horizontal scalability across tenants. If your application can be divided cleanly between different customers and one customer can be served with one instance (see "How many more users could you support if the amount of time you had to await IO for each was reduced by ~2 orders of magnitude?") then scaling to more customers is extremely simple and linear.

[+] abrookewood|3 years ago|reply

Agreed, but a fairer comparison would be SQLite vs PostgreSQL installed on the same machine.

[+] psankar|3 years ago|reply

What stops you from running postgres on the same vm/machine as your app ?! You avoid the network and reap all the benefits of the NVMe storage.

[+] srcreigh|3 years ago|reply

Can you share more info about resilience, app protocols and additional instances?

[+] WuxiFingerHold|3 years ago|reply

> How many more users could you support if the amount of time you had to await IO for each was reduced by ~2 orders of magnitude?

None - or practically not many, when using async (or sort of green threading) backend stacks. The waiting connections / users are just waiting and don't block any new connections.

If the network round trip to Postgres (in my experience around 3-10 ms, but of course highly depending on your server infra) was a concern Postgres could be placed on the same server as the backend, though I would not recommend it. But this relatively small IO overhead usually is not a concern for many apps.

[+] rattray|3 years ago|reply

Thank you for sharing! Anecdotes like this are very useful.

Can you share more about the context? How big/gnarly are the tables? How frequently/concurrently are they written to? Based on your experience here, when wouldn't you want to use this approach?

[+] philliphaydon|3 years ago|reply

It’s difficult to accept the results without looking at the query plans to see how the queries are running and if they are optimal. Seems like it’s just a straight dump of data into PostgreSQL and letting SQLAlchemy handle some queries and indexes but no analysis of the indexes at all.

Could be able to squeeze more perf out of both SQLite and PostgreSQL.

[+] miguelgrinberg|3 years ago|reply

The point of this exercise was to determine how these database perform with same schema, and if it makes sense to make the jump from SQLite to PostgreSQL or not. As a side note and since you ask, the indexes on this database are fairly well thought out, I have exhausted all ideas on indexing improvements to help with performance. I do mention a major implementation enhancement which is to precalculate monthly totals.

Also, another important point I tried to make is that this benchmark is useful for this particular deployment. I don't really want to convince you the numbers that I've got will also apply to you, my proposal is that you should test your application and make decisions based on your own results, which can certainly be different than mine.

[+] jayski|3 years ago|reply

exactamundo

[+] ed25519FUUU|3 years ago|reply

The fact that the performance is so close between a database that requires NO SERVER and one that requires a dedicated server + ops management is a signal to me that all projects should start on SQLite unless there is a compelling reason not to.

[+] masklinn|3 years ago|reply

It's funny because to me it says pretty much the opposite: because sqlite has no server it works entirely in-process, whereas the client will be talking to postgres over a unix socket, or possibly a full-blown network socket.

Plus it seems to be a readonly workload, something for which posgres is not ultra relevant (not that it doesn't work, but...).

[+] jayd16|3 years ago|reply

Postgres doesn't require a _dedicated_ server. If your requirements allow for SQLite but it didn't exist, you could conceivably run postgres on your app server to reduce the ops load if you really felt it was a significant effort.

If you might want to use multiple app instances connected to a shared DB, I would say it's probably easier to just use a local postgres container for dev and a managed cloud DB. Really not that much effort and you get automatic backups.

If you plan to never use a shared DB, SQLite is great though.

[+] Beltiras|3 years ago|reply

There are enough SQL oopsies in this article to show that you should NEVER use sqlite for any production except as filter databases or other sort of throwaways. RDBMSs are there to enforce constraints on data.

[+] lazide|3 years ago|reply

Uh, most distributed environments you have no choice but to use a server, because you’ll have clients on a (relatively) uncontrollable set of other machines.

If you’re writing something to run in an embedded/client app environment, then yeah why would you use Postgres for your one machine? You could, but it’ll add a lot of moving parts you’ll never need and probably don’t want (like remotely accessible network ports you’ll need to secure)

[+] rco8786|3 years ago|reply

yes!

It seems to me (or at least, I'm hoping this is happening) that the pendulum is swinging back to simple deployments and away from AWS/Cloud/K8s for every tiny little app.

One thing I'd love to see is that the 'default' MVP deployment is your code + SQLite running on a single VM. That will scale well into "traction" for the vast majority of applications people are building.

[+] avinassh|3 years ago|reply

hmm in that case I would probably pick Postgres because despite the added network call, it is still faster than an embedded DB.

[+] WuxiFingerHold|3 years ago|reply

I disagree:

Postgres and SQLite are very different in several aspects, e.g.

- types and enforcing type safety

- handling concurrency and transactions (available isolation levels)

- functions / procedures

- array support

Postgres can be much more valuable than just using it for "relational data storage". Its features can be a good reason for choosing it from the beginning.

[+] BeefWellington|3 years ago|reply

Sure, you can start there. Once you scale though you're going to have lots of fun.

[+] zinekeller|3 years ago|reply

I have a question that wasn't answered on the blog, and might be silly for me to ask. Is SQLite running in WAL mode or not? I usually found that WAL mode usually allows better performance even in read-heavy workloads as it doesn't need to exclusively lock the database each time.

Otherwise, the post is a nice reminder for SQLite users that foresee a database switch to use the newly-introduced strict mode.

[+] dgb23|3 years ago|reply

Yes, WAL mode can have a drastic impact.

Also this is apparently benchmarking Postgres and SQLite _with Python on top_. Some of the overhead here might not have much to do with either SQLite or Postgres at all. Just because one uses an ORM doesn't mean the same code is running and especially considering how sluggish Python is, I'm not convinced by any of this either way outside of this being an improvement for the very specific use-case presented. But we don't know _why_ that is.

[+] miguelgrinberg|3 years ago|reply

SQLite isn't running in WAL mode. This application isn't just read-heavy, it is ready-only for users, so I wouldn't think this will make a difference. The writes occur while everybody is sleeping, from a cron job that runs once a day to upload the traffic for the previous 24 hs.

[+] mjw1007|3 years ago|reply

I had a look at SQLite around the time sqlite3 appeared, and my notes say I decided I didn't like it because of the lax and sometimes inconsistent way it can treat its input.

I thought I'd have another go at it today (sqlite 3.34) and I'm mildly surprised that the things I didn't like are still there:

  sqlite> select '1 apple' + '2 oranges';
  3

  sqlite> select count(*) where 0.9;
  1
  sqlite> select count(*) where 1.1;
  1
  sqlite> select count(*) where not 0.9;
  1
  sqlite> select count(*) where not 1.1;
  0

[+] infogulch|3 years ago|reply

You may be interested in STRICT Tables [1], introduced just 6 months ago:

> Some developers appreciate the freedom that SQLite's flexible typing rules provide and use that freedom to advantage. But other developers are aghast at SQLite's flagrant rule-breaking and prefer the traditional rigid type system found in all other SQL database engines and in the SQL standard. For this latter group, SQLite supports a strict typing mode, as of version 3.37.0 (2021-11-27), that is enabled separately for each table.

[1]: https://www.sqlite.org/stricttables.html

Related past discussions:

Strict Tables – Column type constraints in SQLite - Draft | 2021-08-21 | 331 points, 108 comments | https://news.ycombinator.com/item?id=28259104

SQLite Release 3.37.0 | 2021-11-27 | 333 points, 112 comments | https://news.ycombinator.com/item?id=29363054

[+] simonw|3 years ago|reply

"I'm mildly surprised that the things I didn't like are still there"

That's because SQLite doesn't ship changes that break existing applications.

[+] gunapologist99|3 years ago|reply

The debate over type conversion seems somewhat like arguing language features (such as weak typing vs strong), which has been a spirited debate since forever.

The fact that this is debatable at all means that there are arguments and counter-arguments that serve both sides well, which further implies that there are use cases where some pros outweigh the cons on each side.

(Of course, this entirely neglects the point that it's usually fun and even educational to have a technical argument.)

[+] ormaaj|3 years ago|reply

This is totally uncomplicated. The choice of database is a function of your application and its architecture, not a matter of all-things-equal performance comparison.

"I want an extensible versitile database platform for use as a component in an infrastructure that is part of a larger system" - postgres.

"I want an embeddable lightweight SQL storage backend for a narrow scope, single-purpose, probably single-user, probably single-system application" - sqlite

Postgres should perform well in any role in the hands of an expert - it's a question of whether or not that is overkill. Sqlite can also perform well in a subset of those roles.

[+] mattxxx|3 years ago|reply

Some context: I've worked with massive PostgreSQL db's, containing millions of daily records spanning multiple years (allowing for potentially massive numbers of returned rows). I also exclusively MVP simple db architectures in SQLite (think: no massive joins or need to partition data).

Best takeaways here:

- SQLite kills it. It appears untuned PostgreSQL is only 2 seconds (???) faster than untuned SQLite; always try to kickoff a project with SQLite

- PostgreSQL has a performance cliff when query responses leave memory. This is extra true when an index no longer fits in memory. worker_mem is critical for pushing back this performance cliff

Side note: Tuning postgres can be really painful, but it's also a really dope tried-and-true db.

[+] xsc|3 years ago|reply

It seems a solid first step would be to use EXPLAIN to understand query planning/scanning issues.

Assuming 6.5M rows/day*1.5yr=~3.5B rows, how large is the dataset in both SQLite and pg? With indexes, is it fitting in RAM?

Linode 2GB plan is shared vcpu, so results will vary depending on who is sharing it with your service.

[+] miguelgrinberg|3 years ago|reply

This isn't the topic of the article so I haven't included it, but I have taken several "solid steps" to optimize this application throughout its year and a half of being in production.

This article is about one of them only, how I designed a test system to evaluate the two databases head to head.

[+] mcdonje|3 years ago|reply

>If you are going to take one thing away from this article, I hope it is that the only benchmarks that are valuable are those that run on your own platform, with your own stack, with your own data, and with your own software. And even then, you may need to add custom optimizations to get the best performance.

Good advice, contextually.

[+] ehutch79|3 years ago|reply

Is it just me or was testing done with only four concurrent clients? In a mostly read only environment?

Wouldn't moving to a static site generator been a better direction in that case?

If you're maxing out at four people reading your blog, I can't imagine more infrastructure is better.

[+] KronisLV|3 years ago|reply

You know, apart from some of the details in regards to what data types to better use etc., I think that both PostgreSQL and SQLite are excellent choices for most use cases!

Of course, SQLite might be the best choice for embedded or local usage, they even have this lovely page about using it as an application file format: https://sqlite.org/appfileformat.html

And PostgreSQL might scale better for multiple writers and situations where you need it on a separate VM/server/container that might be shared between different apps through a network connection of some sort.

Though it's also curious, because you never hear a lot about MySQL or MariaDB (comparatively, in my subjective opinion) on HN or many other places for that matter. And yet, it's still a decent option, has good tooling (MySQL Workbench is great), is widely supported and backs a significant amount of production systems out there.

Do people just view it as old/legacy, or is there the belief that for many use cases PostgreSQL might simply be the better option, e.g. also because of projects like PostGIS and support for more complex setups, like custom data types, more advanced scripting language for in-database processing and whatnot?

[+] mikece|3 years ago|reply

I really wish WordPress supported using SQLite as it's database instead of MySQL as this could make backups as simple as rsync'ing from prod server to backups.

[+] simonw|3 years ago|reply

There's a WordPress plugin that adds support for SQLite as an alternative to MySQL.

The implementation is (to my mind) simply astonishing: they run regular expressions against the SQL to convert it from MySQL dialect to SQLite! https://github.com/aaemnnosttv/wp-sqlite-db/blob/9a5604cce13...

I'm amazed that this works, but apparently it works really well. I would expect it to break the moment it ran into a WordPress plugin that executed custom SQL in a way they hadn't anticipated though.

[+] polyrand|3 years ago|reply

Great blog post. A few observations (considering the blog post skips numerous details):

SQLite really shines when compared to running a database in a separate server from the app. This is when the network latency starts being noticeable. Latency between you and the app is usually constant. The latency between the app and the database is what matters. If in this case, Postgres can run in the same host as the main app, I think it's a good thing to do. But very often that's not the case.

Even though not surprising, it's great to see how SQLite handles concurrency. From my experience, SQLite would be able to handle much more read concurrency. It would have been great to see more benchmarks (like 1000 concurrent users, even if the app would never see such traffic).

I was surprised that SQLite didn't beat Postgres on simpler queries. Without seeing the queries, if they contain multiple joins, that could be another reason why SQLite is slower since it can only do "nested loop" JOINs. The workloads also seem mostly analytical, we'll see where DuckDB goes in this space (looking promising for now).

One thing not mentioned in the blog post is "easy to maintain and update". If the app is deployed with Docker, adding Postgres means you go from worrying about:

  1 docker container + 1 docker volume

to:

  2 docker containers (docker-compose?) + 1 docker volume + internal docker networking

The fact that SQLite is so simple to operate also has some advantages. You can usually go back to an SQLite-based app 1 year later and understand everything. With other databases, you may need to review users, roles, permissions, connection string, etc.

Another cool thing about SQLite is that it scales mostly linearly as you add more CPU cores. It can scale as long as you can also scale the number of processes in your app. In other databases, scaling means adding connection pools and other tools in the middle.

Lastly, and related to the latency topic. The low latency in SQLite means that you can sometimes convert a big query into 5 separate, small and reusable queries without any performance hit. It's really remarkable home much cleaner your code can become with this.

In any case, I love working both with Postgres and SQLite, and we need more blog posts like this one showing real-world cases.

[+] rglullis|3 years ago|reply

Here is a comparison that I'd like to see next: how much time was spent on the following...

- porting the application, making all these benchmarks and writing a blog post

vs

- simply getting a VM with more RAM and more CPUs.

In the conclusion the author goes on to talk about the options "if I get the budget" vs "without throwing more money at the problem". I'd guess that the cost of the day of a Principal Engineer would far surpass the cost of years running a 4CPU server.

[+] yread|3 years ago|reply

I understand and partially agree with all the criticism of the article (and could add my own like why did you tweak work_mem out of all the hundreds of settings on postgres) but it's a blog post, not a scientific article - I still think it's a useful post. I might be in a similar situation at some point and it's nice to know that you haven't encountered any major gotchas and made it even 2x faster. Thanks for sharing your experience.

[+] rammy1234|3 years ago|reply

If you say using SQLite, there is always risk of losing that data file in disk right. How is that managed. is this a dumb question , which PostgreSQL do not ??

[+] xwdv|3 years ago|reply

No one ever got fired for just using postgtreSQL, maybe burned though.

[+] cocoflunchy|3 years ago|reply

According to https://www.linode.com/pricing/ a shared instance with 2GB of RAM and 1CPU is $10/month. I don't know how much a principal engineer at twilio makes but I'm pretty sure this wasn't a cost effective optimization... It's also a bit weird not to talk about indexing? Are sqlite and Postgresql similar enough that the same indexing strategy should be used?

244 comments