
LukeEF | 2 years ago

I am not sure of the exact statistic, but something like 95% of all production databases are less than 10GB. There seems to be a 'FAANG hacker' fascination with 'extreme-scale', which probably comes from seeing the challenges faced by the handful of organizations working at that level. Much of the time, most graph database users want (that's why they're there) a DB that allows you to flexibly model your data and run complex queries. They probably also want some sort of interoperability. If you can do that well for 10GB, that is holy grail enough. We certainly found this while developing the graph database TerminusDB [1]: most users have smaller production DBs, make light use of the bells-and-whistles features, and really want things like easy schema evolution.

[1] https://github.com/terminusdb/terminusdb


threeseed|2 years ago

This research paper is talking about performance whilst you're talking about scalability.

Those are related but are distinct from each other.

And sure, about 95% of companies would have their needs met by a simpler system, but that does leave a lot of companies that will not. And for those of us in, say, finance doing customer/fraud analytics, I would welcome all the performance I can get.

loeg|2 years ago

> This research paper is talking about performance whilst you're talking about scalability. Those are related but are distinct from each other.

The paper has "Scale to Hundreds of Thousands of Cores" in the title. I have not yet read the paper but it seems unlikely it doesn't talk about scalability.

rocqua|2 years ago

This isn't a graph database like neo4j. This is a graph database like I hoped neo4j would be. It's not about having an easier time working with schemas. It's about analyzing graphs that are too big to fit in RAM: transaction analysis for banks, traffic analysis of roads, failure resilience of utility networks, etc.

In these kinds of workloads you quickly run into performance bottlenecks. Even in-memory analyses need care to avoid complete pointer-chasing slowdowns.

I do still hope this is fast on, say, a single-CPU 32-core 64GB system with an SSD. But if this takes a cluster to be useful, then I will still love it.

RhodesianHunter|2 years ago

But the 5% of places where that kind of scale is needed are the ones paying the top 1% salary band, so this is the content distributed systems engineers like to read about and work on.

bubblethink|2 years ago

>There seems to be a 'FAANG hacker' fascination

Yeah, but the hacker fascination is what drives progress. You could have made the same type of argument about ML, and we would have been content with MNIST.

im_down_w_otp|2 years ago

I think I kind of agree with this.

One of the simpler supported backends for our Modality product (https://auxon.io/products/modality) is built using SQLite. The product's data model is a special case of a DAG for modeling big piles of causally correlated events from piles and piles of distributed components in "system of systems" use cases. Even so, the scaling limiter is almost always how efficiently the traces & telemetry can be exfiltrated from the systems under test/observation, long before how fast the ingest path can actually record things becomes a problem.

That said, I do love me some RDMA action. 10 years ago I was fiddling with getting Erlang clustering working via RDMA on a little 5-node Infiniband cluster. With mixed results.

belter|2 years ago

Interesting that you mention the value 10 GB, as it is the size of a DynamoDB partition or an AWS Aurora cell...

parentheses|2 years ago

I agree with your sentiment but I suppose you're considering the wrong statistics. Instead you should consider:

- how many jobs have interviews that necessitate knowing how to handle extreme scale

- the proportion of jobs (not companies) requiring extreme scale; the fact that non-extreme scales are the long tail doesn't mean it's a fat tail

- proportion of buyers/potential users that walk away from the inability to handle extreme scale

... and more sarcastically

- proportion of articles about extreme scale

- proportion of repos about extreme scale

mumblemumble|2 years ago

Only one anecdote, but I found out a while after starting at my current job that directly questioning the extent to which scale-out was actually needed to solve a problem during a technical interview question is the thing that made me stand out from the rest of the crowd, and landed me the job. Being able to constructively challenge assumptions is an incredibly valuable job skill, and good managers know that.

swader999|2 years ago

I get that angle, but I also see orgs capturing too much data. What's the use case for it? Not sure, but "if we ever do need it, we'll have it" is the typical answer.

fnord77|2 years ago

Really? I don't quite believe that. We're a tiny company with maybe 70 customers and our DB is roughly 11 TB.

paulddraper|2 years ago

Assuming 1 kB per "record", that's over 150 million records per customer.

Definitely a data heavy product, wherever it is that you're offering.

(Unless you keep large blobs in the DB. But database scale has more to do with records than raw storage.)
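For a quick sanity check of that figure (assuming the numbers above: 11 TB total, 70 customers, ~1 kB per record):

```python
# Back-of-envelope: records per customer given total DB size,
# customer count, and an assumed average record size.
total_bytes = 11 * 10**12   # 11 TB (assumed from the parent comment)
customers = 70
record_size = 1_000         # ~1 kB per record (assumption)

records_per_customer = total_bytes / customers / record_size
print(f"~{records_per_customer / 1e6:.0f} million records per customer")
# → ~157 million records per customer
```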

cubefox|2 years ago

That seems like a lot. What type of data?

crabmusket|2 years ago

Congratulations, you're in the 5%, along with us :)

ElFitz|2 years ago

Just have a look at the size of all of English Wikipedia. Or all of StackOverflow.

And these are seemingly huge services.

And yet…

deegles|2 years ago

What are the databases with easy schema evolution?