Building a new database management system in academia (2017)

[+] onetimeuse92304|2 years ago|reply

It may seem daunting, but I think many people make it more complex / difficult than it needs to be.

I have rolled out two transactional databases of my own. In both cases I had to provide very specific properties, and for some reason I could not find an existing product that would meet all the requirements. For example, one of them was for an embedded device that was very restricted in memory: all operations needed to run with hard bounds on time and memory, and the storage for the data was a flash chip without wear levelling, which required the database itself to manage writes to prolong the chip's life.

The key is to notice how your database system is going to be different from others and what properties are not essential.

Also, making a general-purpose DBMS tends to be much more complex than making a more niche solution where you know a bit more about what the uses are going to be and what kinds of loads you can expect.

Creating a custom engine for a given application can be a very simple task, because you can easily cross out requirements you don't care about and you only need it to work well for the loads that this particular application can generate.

Also, it is unlikely you are going to beat the fierce competition in the general-purpose "and a kitchen sink" database management system market, but it is much easier to find a niche that is underserved and create a usable, competitive product with relatively little effort. That's how SQLite started.

[+] yingjunwu|2 years ago|reply

Proud to see my name (https://twitter.com/YingjunWu) mentioned in Andy's blog. I was a visiting PhD student of Andy's at CMU and was the top contributor to Peloton (https://github.com/cmu-db/peloton).

Today, building a database from scratch is extremely difficult, for several reasons: 1. it takes a long time regardless; 2. there are already so many successful (open-source) databases; 3. hiring top engineers is so expensive; 4. you won't get enough attention unless your system is drastically better than existing ones.

An interesting observation is that very few databases have been built since 2020 - almost all the newly built databases were developed on top of existing databases (PostgreSQL, ClickHouse, etc).

I started building RisingWave (https://github.com/risingwavelabs/risingwave) in early 2021. The only reason we built the system from scratch was that none of the existing systems could address the problem we are solving - distributed SQL stream processing at cloud scale. We tried Flink but gave up, as it's too heavy and its architecture was not designed for the cloud environment.

If you want to build a database from scratch, or are simply interested in databases, we may talk.

[+] iudqnolq|2 years ago|reply

You're obviously the expert here, but I was surprised that you found it notable that very few databases have been released in the last three years. That seems like a very short timeframe. Per Wikipedia, ClickHouse started as an experimental project in 2009 and was first released in 2016.

[+] stakhanov|2 years ago|reply

This is an announcement from 2017 about "the next five years", a period that is now squarely in the past.

Did the DBMS ever come into existence? (If so: link, please). If not: Why should we be interested in this announcement in 2023?

[+] paddw|2 years ago|reply

This is Andy Pavlo, so he probably got sidetracked with https://ottertune.com/

Not sure what op's intention with this was

[+] whoevercares|2 years ago|reply

DeepDive, one of the systems he listed, became SnorkelAI, which has become hot lately

[+] dang|2 years ago|reply

Discussed at the time:

Building a Database System in Academia - https://news.ycombinator.com/item?id=13931752 - March 2017 (15 comments)

[+] pcthrowaway|2 years ago|reply

Uh... the link to macrobase redirects to pornhub.

Not a good look for people browsing at work

[+] apavlo|2 years ago|reply

Yikes! Thanks for the heads up. Peter left Stanford so I guess they took over the domain name :-(

[+] eatonphil|2 years ago|reply

If you're interested in the idea of databases built from scratch since the time this post was written in 2017 (based on GitHub contributions info), here are a few:

- Materialize: 2017

- DuckDB: 2018

- RedPanda: 2019

- TigerBeetle: 2020

[+] zX41ZdbW|2 years ago|reply

According to my estimation, a new database engine is born every week - mostly key-value and document databases. Only a small subset of them survive after one year. According to a guess by Stonebraker, a DBMS takes around 7 years to become mature enough for general applications.

[+] jchrisa|2 years ago|reply

I am building a new immutable cryptographically verified database using IPLD data structures and prolly trees. This allows changes made anywhere to be transparently synced, and for operations to be commuted amongst untrusted peers, for instance allowing for shared index maintenance.

https://use-fireproof.com/docs/architecture

It's also the easiest way to write React apps. Here are some ChatGPT expert builders that I've trained to use the CSS framework of your choice with Fireproof: https://use-fireproof.com/docs/chatgpt-quick-start/#react-ex...

[+] guodong|2 years ago|reply

KùzuDB[1] is an in-process graph database that was built from scratch and came out of academia too. We are from the Data Systems Group at the University of Waterloo, started in Sep 2020, and have a small team actively working on it now. These two posts[2,3] explain where we are from and where we're going, if anyone is interested.

[1]: https://github.com/kuzudb/kuzu

[2]: https://kuzudb.com/blog/meet-kuzu.html

[3]: https://kuzudb.com/blog/what-every-gdbms-should-do-and-visio...

[+] otoolep|2 years ago|reply

I started rqlite[1] in 2014[2], FWIW. While I didn't build the storage engine, or the consensus system, I've built the entire "management" part of the RDBMS from scratch. I'm almost 10 years at it, and there is still plenty to do.

[1] https://www.rqlite.io

[2] https://www.philipotoole.com/9-years-of-open-source-database...

[+] tlarkworthy|2 years ago|reply

and DuckDB came out of academia too and is not based on Postgres either (highly relevant and notably absent from the author's list of academic DBs at the end of the article)

https://duckdb.org/pdf/SIGMOD2019-demo-duckdb.pdf

EDIT: oh the article is old

[+] zachmu|2 years ago|reply

Dolt started in 2018: https://doltdb.com

Yes we have commit history from 2015 but that's from an earlier db project (noms) that we forked and built on top of

[+] frankdejonge|2 years ago|reply

Correct me if I’m wrong, but I don’t think RedPanda is a database. I see it as a streaming data solution, and its novelty can be debated as well, since it’s basically Kafka.

[+] whoevercares|2 years ago|reply

FWIW, many of his recent students have gone to Databricks

[+] esjeon|2 years ago|reply

For unsuspecting readers: this article talks about the feasibility of building a new practical DBMS. The database is the most critical piece of software for businesses, so it has already been thoroughly explored and researched. It's very difficult to find a better solution for existing problems. One should either invent a new paradigm or tackle unsolved problems to justify the cost of development.

Technology-wise, writing a toy DBMS is not difficult. Even undergraduates can do it.

42 comments