atombender | 28 days ago

It's probably going to be acquired. The last effort to commercialize the TUM (Technical University of Munich) database group's work was acquired by Snowflake and disappeared into that stack.

CedarDB is the commercialization of Umbra, the TUM group's in-memory database led by Professor Thomas Neumann. Umbra is a successor to HyPer, so this is the third generation of the system Neumann came up with.

Umbra/CedarDB isn't a completely new way of doing database stuff, but basically a combination of several things that rearchitect the query engine from the ground up for modern systems: a query compiler that generates native code, a buffer pool manager optimized for multi-core, push-based DAG execution that divides work into batches ("morsels"), and in-memory Adaptive Radix Tries (never used in a database before, I think).
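
To make the "morsels" idea a bit more concrete, here is a toy sketch in Python. It is not Umbra's or CedarDB's code: the names (`MORSEL_SIZE`, `run_pipeline`, `morsel_driven_sum`), the tiny morsel size, and the example filter-and-sum pipeline are all made up for illustration. It only shows the general shape: the input is cut into small batches, worker threads pull the next batch from a shared queue whenever they are free, and per-morsel partial results are merged at the end.

```python
import threading
from queue import Queue, Empty

MORSEL_SIZE = 4  # hypothetical value; real engines use much larger morsels


def split_into_morsels(rows, size=MORSEL_SIZE):
    """Cut the input into fixed-size batches ("morsels")."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def run_pipeline(morsel):
    """Illustrative pipeline: keep even values, then compute a partial sum."""
    return sum(x for x in morsel if x % 2 == 0)


def morsel_driven_sum(rows, workers=3):
    """Workers repeatedly grab the next morsel until the queue is empty."""
    work = Queue()
    for morsel in split_into_morsels(rows):
        work.put(morsel)

    partials = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                morsel = work.get_nowait()
            except Empty:
                return  # no morsels left, this worker is done
            partial = run_pipeline(morsel)
            with lock:
                partials.append(partial)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Merge the per-morsel partial results into the final answer.
    return sum(partials)
```

The point of pulling small batches from a shared queue is that a fast worker simply takes more morsels, so work balances itself without being statically partitioned up front.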

It also has an advanced query planner that embraces the latest theoretical advances in query optimization, especially some techniques to unnest complex multi-join query plans, especially with queries that have a ton of joins. The TUM group has published some great papers on this.

Sesse__|28 days ago

> It also has an advanced query planner that embraces the latest theoretical advances in query optimization, especially some techniques to unnest complex multi-join query plans, especially with queries that have a ton of joins. The TUM group has published some great papers on this.

I always wondered how good these planners are in practice. The Neumann/Moerkotte papers are top notch (I've implemented several of them myself), but a planner is much more than its theoretical capabilities; you need so much tweaking and tuning to make anything work well, especially in the cost model. Does anyone have any Umbra experience and can say how well it works for things that are not DBT-3?

senderista|28 days ago

Umbra is not an in-memory database (HyPer was). TUM gave up on the feasibility of in-memory databases several years ago (when the price of RAM relative to storage stopped falling).

atombender|27 days ago

Thanks for the correction. My understanding was that it was still in-memory but "fell back on" disk. ART indexes were touted as one of the novel aspects of Umbra, and my understanding is that ART doesn't work well as an on-disk data structure, so I guess I need to read up on the architecture now.

cmrdporcupine|28 days ago

Yeah, I think the way Umbra was pitched when I watched the talks and read the paper was more as a "hybrid", in the sense that it aimed for something close to in-memory performance while optimizing the page-in/page-out performance profile.

The part of Umbra I found interesting was the buffer pool, so that's where I focused most of my attention when reading, though.

senderista|28 days ago

Are you thinking of HyPer being acquired by Tableau?

atombender|27 days ago

My bad. HyPer was acquired by Tableau, which was acquired by Salesforce.