Umbra is not an in-memory database (Hyper was). TUM gave up on pure in-memory databases as a viable design several years ago, when the price of RAM relative to storage stopped falling.

atombender|27 days ago

Thanks for the correction. My understanding was that it was still in-memory but "fell back on" disk. ART indexes were touted as one of the novel aspects of Umbra, and my understanding is that ART doesn't work well as an on-disk data structure, so I guess I need to read up on the architecture now.

cmrdporcupine|27 days ago

No, again, ART was Hyper's specialty. You're right that ART is specialized for in-memory workloads and is not amenable to paging.

I believe Umbra is heavily BTree based, just like its cousin LeanStore.

One of its specific innovations is its buffer pool, which uses virtual memory overcommit and multiple possible buffer sizes to squeeze better performance out of page management.
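
To make the buffer pool idea concrete, here is a rough sketch of how I picture it, not Umbra's actual code. Assumptions on my part: size classes start at 64 KiB and double, each class gets its own anonymous mapping, and eviction hands physical memory back with `madvise(MADV_DONTNEED)` where the platform offers it. `BASE_PAGE`, `size_class_for` and `SizeClassRegion` are hypothetical names made up for illustration.

```python
import mmap

BASE_PAGE = 64 * 1024  # assumed smallest size class (64 KiB)


def size_class_for(nbytes: int) -> int:
    """Return the smallest class c such that BASE_PAGE << c >= nbytes."""
    c = 0
    while (BASE_PAGE << c) < nbytes:
        c += 1
    return c


class SizeClassRegion:
    """Hypothetical region of equally sized frames for one size class."""

    def __init__(self, size_class: int, slots: int):
        self.frame_size = BASE_PAGE << size_class
        length = self.frame_size * slots
        # Anonymous mapping: the address range is reserved up front, while
        # physical memory is typically committed only when a frame is touched.
        if hasattr(mmap, "MAP_PRIVATE"):
            self.mem = mmap.mmap(-1, length, flags=mmap.MAP_PRIVATE)
        else:
            self.mem = mmap.mmap(-1, length)  # e.g. Windows
        self.resident = set()

    def load(self, slot: int, data: bytes) -> None:
        """Copy a page image into a frame (stands in for a read from disk)."""
        if len(data) > self.frame_size:
            raise ValueError("page does not fit this size class")
        off = slot * self.frame_size
        self.mem[off:off + len(data)] = data
        self.resident.add(slot)

    def read(self, slot: int, n: int) -> bytes:
        off = slot * self.frame_size
        return self.mem[off:off + n]

    def evict(self, slot: int) -> None:
        """Drop a frame; the virtual address range stays reserved."""
        if hasattr(self.mem, "madvise") and hasattr(mmap, "MADV_DONTNEED"):
            # Hint that the physical pages behind this frame can be reclaimed.
            self.mem.madvise(mmap.MADV_DONTNEED,
                             slot * self.frame_size, self.frame_size)
        self.resident.discard(slot)


# Overcommit in miniature: every class reserves room for 4 frames, even
# though only a fraction of them would be resident at any one time.
regions = {c: SizeClassRegion(c, slots=4) for c in range(3)}
payload = b"x" * (100 * 1024)        # a 100 KiB object
cls = size_class_for(len(payload))   # lands in the 128 KiB class
regions[cls].load(0, payload)
regions[cls].evict(0)
```

The point of the layout is that a large object gets one contiguous frame of the right size instead of being chopped across fixed-size pages, and the cost of reserving far more address space than RAM is just virtual memory.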

The talk at https://www.youtube.com/watch?v=pS2_AJNIxzU is delightful.

My understanding is that the research projects LeanStore and Umbra (and now, I assume, the product CedarDB, based on the people involved) are built on two observations: a) existing on-disk systems aren't designed with the characteristics of NVMe/SSD drives in mind, and b) RAM prices, up to this year, were not dropping at the same rate as they were in the early 2010s, so pure in-memory databases were no longer as competitive. That makes it important to look at how we can squeeze performance out of systems that perform paging. And of course in the last 6 months this has become extremely relevant with the massive spike in RAM prices.

That and the query compilation stuff, I guess, which I know less about.

cmrdporcupine|28 days ago

Yeah, I think the way Umbra was pitched when I watched the talks and read the paper was more as a "hybrid", in the sense that it aimed for something close to in-memory performance while optimizing the page-in/page-out performance profile.

The part of Umbra I found interesting was the buffer pool, so that's where I focused most of my attention when reading, though.