
Databases in 2025: A Year in Review

717 points | viveknathani_ | 1 month ago | cs.cmu.edu

192 comments


danielfalbo|1 month ago

Maybe off-topic but,

If you're not familiar with the CMU DB Group you might want to check out their eccentric teaching style [1].

I absolutely love their gangsta intros like [2] and pre-lecture dj sets like [3].

I also remember a video where he was lecturing with someone sleeping on the floor in the background for some reason. I can't find that video right now.

Not too sure about the context or Andy's biography; I'll look that up later. I'm even more curious now.

[1] https://youtube.com/results?search_query=cmu+database

[2] https://youtu.be/dSxV5Sob5V8

[3] https://youtu.be/7NPIENPr-zk?t=85

sirfz|1 month ago

Indeed, I was delighted when I read the part about Wu-Tang's time capsule; obviously OP is a Wu-Tang and general hip-hop fan. The intro you shared is dope!

znpy|1 month ago

I can't tell whether their "Intro to Database Systems" is an introductory (undergrad) course or something more advanced (as in, an introduction to database internals).

Anyone willing to clarify? I'm quite weak at database stuff, and I'd love to find a proper undergrad-level course to learn and catch up.

sargun|1 month ago

Andy Pavlo absolutely seems like the kind of guy that I would want to get a drink with.

dang|1 month ago

(I consed "https://" onto your links so they'd become clickable. Hope that's ok!)

beders|1 month ago

The author mentions that he just doesn't have the time to look at all the databases, but none of the reviews from the last few years mention immutable and/or bi-temporal databases.

That looks more like a blind spot to me, honestly. This category of databases is fantastic for industries like fintech.

Two candidates stand out: https://xtdb.com/blog/launching-xtdb-v2 (2025) and https://blog.datomic.com/2023/04/datomic-is-free.html (2023)

apavlo|1 month ago

> none of the reviews of the last few years mention immutable and/or bi-temporal databases.

We hosted XTDB to give a tech talk five weeks ago:

https://db.cs.cmu.edu/events/futuredata-reconstructing-histo...

> Which looks more like a blind spot to me honestly.

What do you want me to say about them? Just that they exist?

zie|1 month ago

You can get pretty far with just PG using tstzrange and friends: https://www.postgresql.org/docs/current/rangetypes.html

Otherwise there are full bitemporal extensions for PG, like this one: https://github.com/hettie-d/pg_bitemporal

What we do is use range types to mark when a row applies, which gives us history. Then, for 'immutability', we have two audit systems. The first is in-database, implemented as row triggers, and keeps an online copy of what changed and by whom. This also gives us built-in undo for everything: if some mistake happens, we can just undo the change, easy peasy. The audit log captures the undo as well, of course, so we keep that history too.

The second is an "off-line" copy, via PG logs, which get shipped off the main database into archival storage.

Works really well for us.
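For the curious, a minimal sketch of this pattern in Postgres (table, column, and trigger names are all hypothetical; a real schema would differ):

```sql
-- Needed for gist equality on plain types in the exclusion constraint below.
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- Validity period on each row via a range type: history "for free".
CREATE TABLE price (
    id       bigint GENERATED ALWAYS AS IDENTITY,
    item     text NOT NULL,
    amount   numeric NOT NULL,
    validity tstzrange NOT NULL DEFAULT tstzrange(now(), NULL),
    -- No two rows for the same item may have overlapping validity.
    EXCLUDE USING gist (item WITH =, validity WITH &&)
);

-- In-database audit log, populated by a row trigger.
CREATE TABLE price_audit (
    audit_id   bigint GENERATED ALWAYS AS IDENTITY,
    changed_at timestamptz NOT NULL DEFAULT now(),
    changed_by text NOT NULL DEFAULT current_user,
    operation  text NOT NULL,
    old_row    jsonb,
    new_row    jsonb
);

CREATE FUNCTION price_audit_fn() RETURNS trigger AS $$
BEGIN
    INSERT INTO price_audit (operation, old_row, new_row)
    VALUES (TG_OP, to_jsonb(OLD), to_jsonb(NEW));
    RETURN NULL;  -- AFTER trigger; return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER price_audit_trg
    AFTER INSERT OR UPDATE OR DELETE ON price
    FOR EACH ROW EXECUTE FUNCTION price_audit_fn();
```

Undo then amounts to re-applying `old_row` from the audit table, and that re-application is itself captured by the same trigger.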

radarroark|1 month ago

People are slow to realize the benefit of immutable databases, but it is happening. It's not just auditability; immutable databases can also allow concurrent reads while writes are happening, fast cloning of data structures, and fast undo of transactions.

The ones you mentioned are large backend databases, but I'm working on an "immutable SQLite"...a single file immutable database that is embedded and works as a library: https://github.com/radarroark/xitdb-java

delichon|1 month ago

I see people bolting temporality and immutability onto triple stores, because xtdb and datomic can't keep up with their SPARQL graph traversal. I'm hoping for a triple store with native support for time travel.

quotemstr|1 month ago

XTDB addresses a real use-case. I wish we invested more in time series databases actually: there's a ton of potential in a GIS-style database, but 1D and oriented around regions on the timeline, not shapes in space.

That said, it's kind of frustrating that XTDB has to be its own top-level database instead of a storage engine or plugin for another. XTDB's core competence is its approach to temporal row tagging and querying. What part of this core competence requires a new SQL parser?

I get that the XTDB people don't want to expose their feature set as a bunch of awkward table-valued functions or whatever. Ideally, DB plugins for Postgres, SQLite, DuckDB, whatever would be able to extend the SQL grammar itself (which isn't that hard if you structure a PEG parser right) and expose new capabilities in an ergonomic way so we don't end up with a world of custom database-verticals each built around one neat idea and duplicating the rest.

I'd love to see databases built out of reusable lego blocks to a greater extent than today. Why doesn't Calcite get more love? Is it the Java smell?

malloryerik|1 month ago

Btw, Datomic is free now that Nubank supports it (and runs a large bank on it).

There's also a fantastic mini, FOSS, file-based, Datomic-style Datalog DB (not immutable) called Datalevin. It uses the hyper-fast LMDB under the hood. https://github.com/juji-io/datalevin

anonymousDan|1 month ago

Why fintech specifically?

TekMol|1 month ago

From my perspective on databases, two trends continued in 2025:

1: Moving everything to SQLite

2: Using mostly JSON fields

Both started already a few years back and accelerated in 2025.

SQLite is just so nice and easy to deal with, with its no-daemon, one-file-per-db, one-type-per-value approach.

And the JSON arrow functions make it a pleasure to work with flexible JSON data.
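To illustrate (hypothetical table; the `->`/`->>` operators need SQLite 3.38+):

```sql
CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT);

INSERT INTO events (payload) VALUES
    ('{"type": "signup", "user": {"name": "ada", "plan": "pro"}}');

-- -> returns a JSON fragment, ->> returns a plain SQL value.
SELECT payload ->> '$.type'      AS type,       -- 'signup'
       payload -> '$.user'       AS user_json,  -- '{"name":"ada","plan":"pro"}'
       payload ->> '$.user.plan' AS plan        -- 'pro'
FROM events;

-- Expression indexes keep JSON lookups fast despite the flexible schema.
CREATE INDEX events_type ON events (payload ->> '$.type');
```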

delaminator|1 month ago

From my perspective, everything's DuckDB.

Single file per database, multiple ingestion formats, full-text search, S3 support, Parquet file support, columnar storage, fully typed.

WASM version for full SQL in JavaScript.
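A quick sketch of a couple of those (bucket, table, and column names are made up):

```sql
-- Query Parquet files (local or on S3) directly, no import step.
SELECT status, count(*)
FROM read_parquet('s3://my-bucket/logs/*.parquet')
GROUP BY status;

-- Full-text search comes from the fts extension.
INSTALL fts;
LOAD fts;
PRAGMA create_fts_index('docs', 'id', 'body');

SELECT id,
       fts_main_docs.match_bm25(id, 'database') AS score
FROM docs
WHERE score IS NOT NULL
ORDER BY score DESC;
```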

DrBazza|1 month ago

From my perspective - do you even need a database?

SQLite is kind of the middle ground between a full-fat database and 'writing your own object storage'. To put it another way, it provides a 'regularised' object-access API, rather than, say, a variant of types in a vector that you filter or map over.

kopirgan|1 month ago

As a backend database that's not multi-user, how many web connections doing writes can it realistically handle, assuming writes are small, say 100+ rows each?

Any mitigation strategy for larger use cases?

Thanks in advance!

andrewinardeer|1 month ago

Pardon my ignorance, yet wasn't the prevailing thought a few years ago that you would never use SQLite in production? Has that school of thought changed?

quotemstr|1 month ago

FWIW (and this is IMHO of course) DuckDB makes working with random JSON much nicer than SQLite, not least because I can extract JSON fields to dense columnar representations and do it in a deterministic, repeatable way.

The only thing I want out of DuckDB core at this point is support for overriding the columnar storage representation for certain structs. Right now, DuckDB decomposes structs into fields and stores each field in a column. I'd like to be able to say "no, please, pre-materialize this tuple subset and store this struct in an internal BLOB or something".
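The JSON-to-dense-columns workflow might look like this (file name, fields, and JSON shape are hypothetical):

```sql
-- Infer a schema from raw JSON; nested objects become typed structs.
CREATE TABLE raw AS
SELECT * FROM read_json_auto('events.jsonl');

-- Pin the fields you care about into dense, typed columns,
-- deterministically and repeatably.
CREATE TABLE events AS
SELECT id,
       actor.name AS actor_name,   -- struct member access
       actor.plan AS actor_plan,
       CAST(ts AS TIMESTAMP) AS ts
FROM raw;
```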

randomtoast|1 month ago

I would say SQLite when possible, PostgreSQL (incl. extensions) when necessary, DuckDB for local/hobbyist data analysis and BigQuery (often TB or PB range) for enterprise business intelligence.

odie5533|1 month ago

For as much talk as I see about SQLite, are people actually using it or does it just have good marketers?

CuriouslyC|1 month ago

I think the right pattern here is edge sharding of user data. Cloudflare makes this pretty easy with D1/Hyperdrive.

phendrenad2|1 month ago

Man, I hope so. Bailing people out of horribly slow NoSQL databases is good business.

A1aM0|1 month ago

Pavlo is right to be skeptical about MCP security. The entire philosophy of MCP seems to be about maximizing context availability for the model, which stands in direct opposition to the principle of Least Privilege.

When you expose a database via a protocol designed for 'context', you aren't just exposing data; you're exposing the schema's complexity to an entity that handles ambiguity poorly. It feels like we're just reinventing SQL injection, but this time the injection comes from the system's own hallucinations rather than a malicious user.

Miyamura80|1 month ago

Totally agree, unfettered access to databases is dangerous.

There are ways to reduce injection risk. Since LLMs are stateless, you can monitor the origin and trustworthiness of the context that enters the LLM and then decide whether MCP actions that affect state will be dangerous or not.

We've implemented a mechanism like this, based on Simon Willison's lethal-trifecta framework, as an MCP gateway monitoring what enters context. LMK if you have any feedback on this approach to MCP security. It's not as elegant as the approach Pavlo talks about in the post, but nonetheless we believe it's a good band-aid solution for the time being as the technology matures.

https://github.com/Edison-Watch/open-edison

nijave|1 month ago

Yes and no. Least privilege has existed in databases for a very long time. You need to implement correct DB privileges using user/roles, views, and other best practices. The MCP server is more like a dumb client in this setup.

However, that's easy for people to forget and throw privileged creds at the MCP and hope for the best.

The same stands for all LLM tools (MCP servers or otherwise). You always need to implement correct permissions in the tool itself; the LLM is too easily tricked and confused to enforce a permission boundary.
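In Postgres terms, the least-privilege setup for an MCP client might be sketched like this (role, view, and column names are invented for illustration):

```sql
-- A dedicated role for the MCP server: read-only, limited surface.
CREATE ROLE mcp_reader LOGIN PASSWORD 'change-me';

-- Start from nothing...
REVOKE ALL ON ALL TABLES IN SCHEMA public FROM mcp_reader;

-- ...then grant SELECT only on vetted views, never base tables.
CREATE VIEW customer_summary AS
SELECT id, region, created_at   -- no PII columns exposed
FROM customers;

GRANT USAGE ON SCHEMA public TO mcp_reader;
GRANT SELECT ON customer_summary TO mcp_reader;

-- Belt and braces: cap damage from runaway queries.
ALTER ROLE mcp_reader SET statement_timeout = '5s';
```

With this in place, the MCP server is just a dumb client: even a fully hallucinating model can't write, and can't read anything outside the vetted views.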

anthonypasq|1 month ago

I don't know anyone with a brain who is using a DB MCP with write permissions in prod. I mean, trying to lay the blame on a protocol for someone doing something as nuts as that seems unfair.

SpaceL10n|1 month ago

Was the trade-off so exciting that we abandoned our own principles? Or, are we lemmings?

Edit: My apologies for the cynical take. I like to think that this is just the move fast break stuff ethos coming about.

p2hari|1 month ago

The author mentions it in connection with the name change from EdgeDB to Gel. However, it could also have been included in the acquisitions landscape: Gel joined Vercel [1].

1. https://www.geldata.com/blog/gel-joins-vercel

lvl155|1 month ago

I want to thank Andy and the entire DB Group at CMU. They've done a great job of making databases accessible to so many people. They are world class.

techsystems|1 month ago

What did they do?

felipelalli|1 month ago

I think it's time for a big move towards immutable databases, which weren't even mentioned in this article. I've worked with Datomic and immudb: Datomic is very good, but extremely complex and exotic, with a difficult learning curve to get the tuning right. immudb is definitely not ready for production and starts having problems with mere hundreds of thousands of records. There's nothing too serious yet.

zjaffee|1 month ago

What an amazing set of articles. One thing I think he's missed is the clear multi-year trends.

Over the past 5 years there have been significant changes and several clear winners. Databricks and Snowflake have demonstrated an ability to stay resilient despite strong competition from the cloud providers themselves, often through the privatization of what was previously open source. This is especially relevant given the article's mention of how Cloudera and Hortonworks failed to make it.

I also think the quiet execution of databases like ClickHouse has been extremely impressive, filling a niche that previously had no obvious solution.

ComputerGuru|1 month ago

Pg18 is an absolutely fantastic release. Everyone talks about the async I/O worker support, but there's so much more: built-in Unicode locales, unique indexes/constraints/FKs that can be added in an unvalidated state, virtual generated (expression) columns, skip scans on btree indexes (absolutely huge), uuidv7 support, and so much more.
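A few of those in one sketch (hypothetical tables, assuming the Postgres 18 syntax described in the release notes; `NOT VALID` on foreign keys actually predates 18, but fits the unvalidated-constraint theme):

```sql
CREATE TABLE orders (
    -- uuidv7(): time-ordered UUIDs, much friendlier to btree indexes.
    id    uuid PRIMARY KEY DEFAULT uuidv7(),
    total numeric NOT NULL,
    -- Virtual generated column: computed on read, not stored on disk.
    total_with_tax numeric GENERATED ALWAYS AS (total * 1.2) VIRTUAL
);

-- Add a constraint without validating existing rows up front...
ALTER TABLE orders
    ADD COLUMN customer_id bigint,
    ADD CONSTRAINT orders_customer_fk
        FOREIGN KEY (customer_id) REFERENCES customers (id) NOT VALID;

-- ...then validate later, with a much weaker lock.
ALTER TABLE orders VALIDATE CONSTRAINT orders_customer_fk;
```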

ugamarkj|1 month ago

Interesting article and I've enjoyed reading all the comments. The focus on Postgres is fascinating to me. From an analytics database perspective, we've had great success with Exasol, which isn't so well known in the US. It is very low overhead (no index management) and extremely fast and scalable. They have a free version as well as a licensed MPP version -- cloud hosted or on-prem. It is a blank slate, but it can do all the things.

budapest05|1 month ago

For analytics workloads, Exasol is a great choice: high performance and MPP scale. They offer a free Personal Edition for download and testing, with cloud and on-prem options available: https://downloads.exasol.com/exasol-personal. Worth a quick benchmark with your own data.

throw0101d|1 month ago

Regarding distributed(-ish) Postgres, does anyone know if something like MySQL/MariaDB's multi-master Galera† exists for Pg:

> MariaDB Galera Cluster provides a synchronous replication system that uses an approach often called eager replication. In this model, nodes in a cluster synchronize with all other nodes by applying replicated updates as a single transaction. This means that when a transaction COMMITs, all nodes in the cluster have the same value. This process is accomplished using write-set replication through a group communication framework.

* https://mariadb.com/docs/galera-cluster/galera-architecture/...

This isn't necessarily about being "web scale", but having a first-party, fairly automated replication solution would make HA for a bunch of internal-only stuff much simpler.

† Yes, I am aware: https://aphyr.com/posts/327-jepsen-mariadb-galera-cluster

nijave|1 month ago

Citus, sort of Cockroach

For HA, Patroni, stolon, CNPG

Multimaster doesn't necessarily buy you availability. Usually it trades performance and potentially uptime for data integrity.

backtogeek|1 month ago

I can't believe the article has no mention of SQLite!?

bob1029|1 month ago

No MSSQL, DB2 or Oracle either. Anything this proven & stable is probably not worth blogging about in this context. SQLite gets a lot of attention on HN but that's a bit of an exception.

astrostl|1 month ago

Same. CMD-F, 'sqlite', no hits, skip and go straight to comments.

jereze|1 month ago

No mention of DuckDB? Surprising.

dujuku|1 month ago

Also somewhat surprised. DuckDB traction is impressive and on par with vector databases in their early phases. I think there's a good chance it will earn an honorable mention next year if adoption holds and becomes more mainstream. But my impression is that it's still early in its adoption curve where only those "in the know" are using it as a niche tool. It also still has some quirks and foot-guns that need moderately knowledgeable systems people to operate (e.g. it will happily OOM your DB)

mariocesar|1 month ago

Same surprise here. In practice, though, the community tends to talk about DuckDB more as a client-side tool than as a traditional database.

santiagobasulto|1 month ago

I love these yearly review posts. Thanks Andy and team.

divan|1 month ago

> Acquisitions ... Gel → Vercel

is a bit misleading. Gel (formerly EdgeDB) is sunsetting its development. The (extremely talented) team joins Vercel to work on other stuff.

That was a hard hit for me in December. I loved working with EdgeQL so much.

senderista|1 month ago

It is a beautifully designed language and would make a great starting point for future DB projects.

quotemstr|1 month ago

Why do "database" surveys like this not include DuckDB and SQLite, which are great [1] embedded answers to ClickHouse and PostgreSQL? Both are excellent and useful databases; DuckDB's reasonable syntax, fast vectorized everything, and support for ingesting the hairiest of data as in-DB ETL make me reach for it first these days, at least for the things I want to do.

Why is it that in "I'm a serious database person" circles, the popular embedded databases don't count?

[1] Yes, I know it's not an exact comparison.

gr4vityWall|1 month ago

Didn't know MongoDB was suing the company behind FerretDB. That's disgusting.

beembeem|1 month ago

Andy has a balanced and appropriate take here.

bzGoRust|1 month ago

I would like to mention that vector databases like Milvus got lots of new features to support RAG and agent development: BM25, hybrid search, etc.

qinchencq|1 month ago

Was hoping to read about graph database, AI-related changes..., but didn't expect this: "I almost died in the spring semester...surprisingly hard to concentrate on important things like databases when you can't breathe." Hope Prof. Pavlo has been breathing better, stellar review.

tiemster|1 month ago

Also emmer (which is perhaps too niche to get mentioned in an article like this), which focuses more on being a quick/flexible 'data scratchpad' than on scale.

https://hub.docker.com/r/tiemster/emmer

furrball010|1 month ago

Nice to see it get mentioned here :). I like using it for scripts etc. Quite flexible, because you can do everything with the API.

pjmlp|1 month ago

Over here, it is DB2, SQL Server or Oracle if using a plain RDBMS, or whatever DB abstraction layer is provided on top of a SaaS product, where we get to query through some kind of ORM abstraction preventing raw SQL, or through GraphQL, without knowing the implementation details.

sandos|1 month ago

This sounds like a flashback to J2EE, which I know is still alive and well. Banks, insurance companies, and the tax agency don't much care for fancy new stuff, only that it works.

npalli|1 month ago

Andy is probably the only person who adores Larry Ellison (Oracle) unironically.

sam_goody|1 month ago

I love his unabashed attitude.

- Larry hit spot #1! Yay!

- Larry lost $130 BILLION in two months. 1/3 of his wealth. "I don't care!"

viccis|1 month ago

Ironically unironically.

codeulike|1 month ago

Barely any mention of Oracle or MS SQL Server, commonly reckoned to be the #1 and #3 most-used databases in the world:

https://db-engines.com/en/ranking

qcnguy|1 month ago

Oracle is mentioned at the start, where he proclaims the "dominance" of Postgres and then admits its newest features have been in Oracle for nearly a quarter of a century already. The dominance he's talking about is only about how many startups raise how many millions from investors, not anything technical.

And then of course at the end he has a whole section about Larry Ellison, like always.

cluckindan|1 month ago

”I still haven't met anybody who is actively using Dgraph.”

That’s because it is mostly used in national security and military applications in several countries.

thesurlydev|1 month ago

Supabase seems to be killing it. I read somewhere they are used by ~70% of YCombinator startups. I wonder how many of those eventually move to self-hosted.

alexpadula|1 month ago

Been reading these for a few years. I enjoy them, thank you Andy. I hope you’re doing better.

andersmurphy|1 month ago

With a trend towards immutable single writer databases MMAP seems like a massive win.

mtndew4brkfst|1 month ago

Andy is very critical of using mmap in database implementations.

jimmar|1 month ago

> "The Dominance of PostgreSQL Continues"

It seems like the author is more focused on database features than user base. Every metric I can find online says that MySQL/MariaDB is more popular than PostgreSQL. PostgreSQL seems "better" (more features, better standards compliance) but MySQL/MariaDB works fine for many people. Am I living in a bubble?

mdasen|1 month ago

Popularity can mean multiple things. Are we talking about how frequently a database is used or how frequently a database is chosen for new projects? MySQL will always be very popular because some very popular things use it like WordPress.

It does feel like a lot of the momentum has shifted to PostgreSQL recently. You even see it in terms of what companies are choosing for compatibility. Google has a lot more MySQL work historically, but when they created a compatibility interface for Cloud Spanner, they went with PostgreSQL. ClickHouse went with PostgreSQL. More that I'm forgetting at the moment. It used to be that everyone tried for MySQL wire compatibility, but that doesn't feel like what's happening now.

If MySQL is making you happy, great. But there has certainly been a shift toward PostgreSQL. MySQL will continue to be one of the most used databases just as PHP will remain one of the most used programming languages. There's a lot of stuff already built with those things. I think most metrics would say that PHP is more widely deployed than NodeJS, but I think it'd be hard to argue that PHP is what the developer community is excited about.

Even search here on HN: in the past year, 4 MySQL stories with over 100 points, compared to 28 PostgreSQL stories with over 100 points (and zero MariaDB stories above 100 points, and 42 for SQLite). What are we talking about here on HN? Not nearly as frequently MySQL; we're talking about SQLite and PostgreSQL. That's not to say that MySQL doesn't work great for you or that it doesn't have a large installed base, but it isn't where the mindshare about the future is.

dujuku|1 month ago

> Every metric I can find online says that MySQL/MariaDB is more popular than PostgreSQL

What are those metrics? If you're talking about things like the db-engines rankings, those are heavily skewed by non-production workloads. For example, MySQL, still being the database for WordPress, will forever have a high number of installations and of developers using it and asking Stack Overflow questions. But when a new or established company is deciding which database to use for a custom application, MySQL is seldom in the running like it was 8-10 years ago.

spprashant|1 month ago

I think the author is basing his observations on where the money is flowing. PostgreSQL-adjacent startups and businesses are seeing a lot of investment.

apavlo|1 month ago

> Am I living in a bubble?

There are rumblings that the MySQL project is rudderless after Oracle fired the team working on the open-source project in September 2025. Oracle is putting all its energy in its closed-source MySQL Heatwave product. There is a new company that is looking to take over leadership of open-source MySQL but I can't talk about them yet.

The MariaDB Corporation financial problems have also spooked companies and so more of them are looking to switch to Postgres.

dmarwicke|1 month ago

We had to restrict ours to views only because it kept trying to run updates. It still breaks sometimes when it hallucinates column names, but at least it can't do anything destructive.

SchwKatze|1 month ago

Can we even say that Anyblox is a file format? By my understanding of the project it's "just" a decoder for other file formats to solve the MxN problem.

shekispeaks|1 month ago

TiDB has gained some momentum in silicon valley with companies looking to adopt it. Does he have any commentary on TiDB which is an OLTP and OLAP hybrid?

cryptica|1 month ago

It's so weird how everyone nowadays is using Postgres. It's not like end users can see your database.

It's disturbing how everyone is gravitating towards the same tools. This started happening since React and kept getting worse. Software development sucks nowadays.

All technical decisions about which tools to use are made by people who don't have to use the tools. There is no nuance anymore. There's a blanket solution for every problem and there isn't much to choose from. Meanwhile, software is less reliable than it's ever been.

It's like a bad dream. Everything is bad and getting worse.

esafak|1 month ago

What's wrong with Postgres?

da02|1 month ago

Which alternatives to PostgreSQL would you like to see get more attention?