I am not sure of the exact statistic, but something like 95% of all production databases are less than 10GB. There seems to be a 'FAANG hacker' fascination with 'extreme scale', which probably comes from seeing the challenges faced by the handful of organizations working at that level. Most of the time, graph database users want (it's why they are there) a DB that lets them flexibly model their data and run complex queries. They probably also want some sort of interoperability. If you can do that well for 10GB, that is holy grail enough. We certainly found this developing the graph database TerminusDB [1]: most users have smaller production DBs, make light use of the bells-and-whistles features, and really want things like easy schema evolution.

[1] https://github.com/terminusdb/terminusdb
threeseed|2 years ago
Those are related, but distinct from each other.
And sure, about 95% of companies would have their needs met by a simpler system, but that still leaves a lot of companies who will not. For those of us in, say, finance doing customer/fraud analytics, I would welcome all the performance I can get.
loeg|2 years ago
The paper has "Scale to Hundreds of Thousands of Cores" in the title. I have not yet read it, but it seems unlikely that it doesn't talk about scalability.
rocqua|2 years ago
In these kinds of workloads you quickly run into performance bottlenecks. Even in-memory analyses need care to avoid complete pointer-chasing slowdowns.
I do still hope this is fast on, say, a single-CPU, 32-core, 64GB system with an SSD. But even if it takes a cluster to be useful, I will still love it.
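(To make the pointer-chasing point concrete, here is a toy sketch, not from the thread and purely illustrative: storing a graph in CSR (compressed sparse row) form keeps each vertex's neighbor list contiguous in memory, so a traversal scans flat arrays instead of hopping between heap-allocated node objects.)

```python
def to_csr(num_vertices, edges):
    """Build CSR arrays (offsets, targets) from an edge list.

    offsets[v]..offsets[v+1] delimits v's neighbors in targets,
    so each adjacency list is one contiguous slice.
    """
    offsets = [0] * (num_vertices + 1)
    for src, _ in edges:
        offsets[src + 1] += 1
    for i in range(num_vertices):          # prefix sums -> start offsets
        offsets[i + 1] += offsets[i]
    targets = [0] * len(edges)
    cursor = offsets[:-1].copy()
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets

def bfs(offsets, targets, start):
    """Breadth-first visit order over the CSR graph."""
    seen = {start}
    order = [start]
    frontier = [start]
    while frontier:
        nxt = []
        for v in frontier:
            # Contiguous slice scan instead of per-node pointer chasing.
            for u in targets[offsets[v]:offsets[v + 1]]:
                if u not in seen:
                    seen.add(u)
                    nxt.append(u)
        order.extend(nxt)
        frontier = nxt
    return order

# Tiny diamond graph: 0 -> {1, 2}, 1 -> 3, 2 -> 3
offsets, targets = to_csr(4, [(0, 1), (0, 2), (1, 3), (2, 3)])
print(bfs(offsets, targets, 0))  # -> [0, 1, 2, 3]
```

(The same layout trick is what many in-memory graph engines use; in a real system the arrays would be typed buffers rather than Python lists.)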
bubblethink|2 years ago
Yeah, but the hacker fascination is what drives progress. You could have made the same type of argument about ML, and we would have been content with MNIST.
im_down_w_otp|2 years ago
One of the simpler supported backends for our Modality product (https://auxon.io/products/modality) is built on SQLite. Its data model is a special case of a DAG, for modeling big piles of causally correlated events from piles and piles of distributed components in "system of systems" use cases. The scaling limiter is almost always how efficiently the traces & telemetry can be exfiltrated from the systems under test/observation, long before how fast the ingest path can actually record things becomes a problem.
That said, I do love me some RDMA action. Ten years ago I was fiddling with getting Erlang clustering working over RDMA on a little 5-node Infiniband cluster. With mixed results.
parentheses|2 years ago
- proportion of jobs (not companies) requiring extreme scale
- the fact that non-extreme scales are the long tail doesn't mean it's a fat tail
- proportion of buyers/potential users that walk away from the inability to handle extreme scale
... and more sarcastically
- proportion of articles about extreme scale
- proportion of repos about extreme scale
paulddraper|2 years ago
Definitely a data-heavy product, whatever it is you're offering.
(Unless you keep large blobs in the DB. But database scale has more to do with record counts than raw storage.)
ElFitz|2 years ago
And these are seemingly huge services.
And yet…