top | item 45928099

arconis987 | 3 months ago

would like to see work like this, but for datasets in the hundreds of TB or single-digit PB

but i definitely agree about this point

> Cluster fatigue is real

imo, the concept of “extremely ephemeral query workers” is under-explored

stateless, maintenance-free, burstable fleets of query workers are what I would like to see more of in the future.

it’s how we do it, and it gives us full-text search on multi-hundred terabyte data sets in S3, where queries finish in a handful of seconds. our approach: https://docs.scanner.dev/scanner/what-and-why/how-it-works/h...

anyone else running ephemeral query worker fleets?
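
For readers unfamiliar with the pattern, here is a minimal sketch of a stateless fan-out query, not Scanner's actual implementation. It assumes a hypothetical in-memory dict (`FAKE_BUCKET`) standing in for objects in S3 and a thread pool standing in for a burstable worker fleet; the names `query_worker` and `run_query` are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for objects in a bucket: key -> text content.
FAKE_BUCKET = {
    "logs/0001": "error: disk full\ninfo: started",
    "logs/0002": "info: heartbeat\nerror: timeout",
    "logs/0003": "info: heartbeat",
}

def query_worker(key, needle):
    """Stateless worker: read one object, return matching lines, keep nothing."""
    text = FAKE_BUCKET[key]  # a real worker would fetch this from object storage
    return [(key, line) for line in text.splitlines() if needle in line]

def run_query(needle, max_workers=8):
    """Fan the query out over every object, then merge the partial results."""
    keys = sorted(FAKE_BUCKET)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input keys.
        parts = pool.map(query_worker, keys, [needle] * len(keys))
        return [hit for part in parts for hit in part]

if __name__ == "__main__":
    for key, line in run_query("error"):
        print(key, line)
```

Because each worker holds no state between calls, the pool can be created for one query and discarded afterwards, which is the property the "ephemeral" label refers to.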

ramraj07|3 months ago

Yes, it's called Snowflake? They're exactly that, and it's why they work so well. I know you're asking for something OSS, but what Snowflake offers is a fleet of servers that can build your cluster in a second, as opposed to the minutes you need if you spin it up yourself.

sagarm|3 months ago

> extremely ephemeral query workers

Reading data from S3 can really add up, so this isn't as straightforward as it seems.
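
If "add up" is read as per-request cost (one possible reading; latency and throughput are others), a rough back-of-the-envelope sketch looks like the following. The function name and all inputs are assumptions for illustration, and the price passed in the example is a placeholder, not a quoted rate.

```python
def estimate_get_cost(total_bytes, bytes_per_get, usd_per_1000_gets):
    """Return (request_count, usd) for reading total_bytes in fixed-size GETs."""
    # Integer ceiling division: a partial final chunk still costs one request.
    requests = -(-total_bytes // bytes_per_get)
    return requests, requests * usd_per_1000_gets / 1000

if __name__ == "__main__":
    # Placeholder inputs: 100 TB scanned in 8 MB ranges at an assumed request price.
    n, usd = estimate_get_cost(100 * 10**12, 8 * 10**6, usd_per_1000_gets=0.0004)
    print(f"{n} requests, about ${usd:.2f} per full scan")
```

The per-scan figure is multiplied by every query that cannot skip data, which is why indexing or pruning matters more than the raw per-request price.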