
blop | 8 months ago

I found this PDF presentation with lots of great technical details about data management and a DevOps/infrastructure-oriented view of this telescope: https://ci-compass.org/assets/602137/2025jan23_cicompass_rub...

Worth a read for the devops guys around here!

  - about 20 TB per day, around 100 PB expected for the whole survey
  - 0.5 PB Ceph cluster for local data
  - workloads on a 20-node Kubernetes cluster, deployed with Argo CD
  - physical infra managed with Puppet/Ansible
  - 100 Gb/s (+40 Gb/s backup) fiber connection to a US-based datacenter for further processing
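A quick back-of-envelope check shows the two headline numbers are roughly consistent, assuming a survey duration of about ten years (my assumption; the slides only give the rates):

```python
# Sanity check: does ~20 TB/day add up to ~100 PB over the survey?
# SURVEY_YEARS = 10 is an assumption, not from the slides.
TB_PER_DAY = 20
SURVEY_YEARS = 10

total_tb = TB_PER_DAY * 365 * SURVEY_YEARS
total_pb = total_tb / 1000  # decimal units: 1 PB = 1000 TB

print(f"~{total_pb:.0f} PB of raw daily data")
```

That lands around 73 PB for the raw nightly stream, so the ~100 PB figure plausibly includes catalogs and derived data products on top.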


newpavlov|8 months ago

I wonder if they could reduce the data size at rest by using specialized compression techniques. You could probably build an averaged "model" of the sky observed by the telescope (probably accounting for stellar parallax and bright planets) and store only compressed diffs, not full images.

But I guess that since storage is relatively cheap, it's simply not worth the added complexity.
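A rough sketch of the diff-against-a-model idea (purely illustrative, with synthetic data; real pipelines would work on calibrated FITS frames, not raw arrays):

```python
import zlib

import numpy as np

rng = np.random.default_rng(42)

# Hypothetical sky frame: a static "model" of the sky plus small
# per-exposure noise and one transient/variable source.
model = rng.integers(0, 1000, size=(1024, 1024), dtype=np.int32)
frame = model + rng.integers(-5, 6, size=model.shape, dtype=np.int32)
frame[10, 10] += 50_000  # a bright transient

# Store only the residual against the model: small, low-entropy
# values compress far better than the full frame.
diff = frame - model

raw = zlib.compress(frame.tobytes(), level=6)
delta = zlib.compress(diff.tobytes(), level=6)

print(f"delta/raw compressed size ratio: {len(delta) / len(raw):.2f}")
```

The win comes entirely from the residuals having much lower entropy than the frame itself; in practice the noise floor of real detectors limits how far this goes.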

xhkkffbf|8 months ago

There's quite a bit of black out there. That should compress easily.
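True in the idealized case; a mostly-zero frame compresses to almost nothing even with a generic compressor (toy example below; real frames have a sky-background noise floor, so every pixel is nonzero and the gain is much smaller):

```python
import zlib

import numpy as np

rng = np.random.default_rng(0)

# Mostly-black frame: all zeros except ~1000 bright pixels.
frame = np.zeros((1024, 1024), dtype=np.uint16)
idx = rng.integers(0, frame.size, size=1000)
frame.flat[idx] = rng.integers(1, 65535, size=1000)

compressed = zlib.compress(frame.tobytes())
print(f"compressed to {len(compressed) / frame.nbytes:.1%} of raw size")
```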

Melatonic|8 months ago

Insanity - love it

cycomanic|8 months ago

If you think this is insanity, I encourage you to look up the expected data to come out of the SKA. Even after several processing steps, they expect several hundred PB/year (the raw data, which is not archived, is several orders of magnitude more). That is only SKA-low, I think; for SKA-mid we are talking exabytes/year. I recall their chief scientist saying that once they are operational, they will process more data than Google and Facebook combined.