matharmin | 4 months ago

The project looks great! Object storage is often far more cost-efficient than a database on EBS. EBS often ends up 10-20x more expensive once you account for the 3x replication of a typical MongoDB deployment and the need to over-provision storage. Being able to scale compute independently of storage is great, too.
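As a rough illustration of where a 10-20x figure can come from, here is a back-of-the-envelope sketch. The per-GB prices and the 70% disk utilization are assumptions for illustration only (roughly list prices for gp3 and S3 Standard; check current pricing), and the helper name is made up. It ignores IOPS, request, and transfer charges.

```python
def monthly_cost_per_usable_gb(price_per_gb, replicas=1, utilization=1.0):
    """Monthly storage cost for one GB of actual data.

    price_per_gb: provisioned price in USD per GB-month
    replicas:     number of full copies you pay for
    utilization:  fraction of provisioned space that holds data (0 < u <= 1)
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return price_per_gb * replicas / utilization


# Assumed prices in USD per GB-month; not taken from the thread.
EBS_GP3_PRICE = 0.08
S3_STANDARD_PRICE = 0.023

# Typical replica set: 3 copies, disks kept about 70% full (assumption).
ebs = monthly_cost_per_usable_gb(EBS_GP3_PRICE, replicas=3, utilization=0.7)
# S3 bills only for bytes stored and replicates internally.
s3 = monthly_cost_per_usable_gb(S3_STANDARD_PRICE)
ratio = ebs / s3

print(f"EBS: ${ebs:.3f}/GB-month, S3: ${s3:.3f}/GB-month, ratio: {ratio:.1f}x")
```

Under these assumptions the ratio lands around 15x; lower utilization or pricier volume types push it higher.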

The biggest thing I'm missing from the docs (I checked the GitHub page and the site) is a list of which MongoDB features are and aren't supported. I've worked with Azure Cosmos DB before, and even though it claims MongoDB compatibility, it runs into many compatibility issues as soon as you have more than a basic CRUD application. Examples include proper change stream support, partial indexes, multikey indexes, the set of supported aggregation pipeline operations, tailable cursors, and snapshot queries.

Another thing that's not clear: What does multi-master/multi-write mean in practice? What happens if you write to the same data at the same time on different nodes?

iamlintaoz|4 months ago

That's exactly the reason. S3 is better than EBS in almost every respect except performance, and I am glad that our Data Substrate technology solves this issue gracefully [1].

As for compatibility, we are leveraging some of the code from version 4.0.3 (the last AGPL version), and compatibility is very good (we will show some results in later blog posts). As I mentioned in another reply, the Mongo APIs have been reasonably stable over the last few years, with only very minor changes. Most of the later versions improved performance and transaction support, both of which we support natively with our underlying data substrate technology. Still, if there is a specific API that you feel is needed, we'd be happy to implement it, and we welcome community contributions.

Multi-master/multi-writer means it is a fully distributed database. Of course you can run it in a single-node configuration and get all the single-node benefits, but when it is deployed as a cluster, you do not need to worry about which node to write to or how the data is sharded. If your writes can potentially conflict (i.e. they write to the same data at the same time on different nodes), the concurrency control will handle that for you. In fact, you would encounter the same issue even in a single-node configuration, since a single node is still multi-threaded.
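To make the single-node point concrete, here is a generic sketch of optimistic concurrency control using version checks, written with threads in one process. It is not a description of how EloqDoc or its data substrate resolves conflicts; the `VersionedStore` class and `increment` helper are hypothetical names used only for illustration.

```python
import threading


class VersionConflict(Exception):
    """Raised when a write is based on a stale version."""


class VersionedStore:
    """Toy in-memory store: each key maps to (version, value)."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()  # guards the check-and-set step only

    def read(self, key):
        with self._lock:
            return self._data.get(key, (0, None))

    def write(self, key, expected_version, value):
        # The write commits only if nobody else has written since our read.
        with self._lock:
            current_version, _ = self._data.get(key, (0, None))
            if current_version != expected_version:
                raise VersionConflict(key)
            self._data[key] = (current_version + 1, value)


def increment(store, key):
    # Read-modify-write with retry: on conflict, re-read and try again.
    while True:
        version, value = store.read(key)
        try:
            store.write(key, version, (value or 0) + 1)
            return
        except VersionConflict:
            continue


def worker(store, key, times):
    for _ in range(times):
        increment(store, key)


store = VersionedStore()
threads = [threading.Thread(target=worker, args=(store, "counter", 100))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(store.read("counter"))
```

No update is lost even though four writers race on the same key: a conflicting write is rejected and retried rather than silently overwriting. A distributed system has to solve the same problem across nodes, which is where the real design choices lie.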

[1] https://www.eloqdata.com/blog/2025/07/16/data-substrate-bene...