top | item 43836824

(no title)

freeatnet | 10 months ago

Interesting! A friend recently asked me if I knew of any tools to improve GPU observability across their deployments (primarily for cost tracking purposes, I think), but he was looking for an OSS solution. Do you plan to open source this?

discuss

order

leeab|10 months ago

We have considered this and may go down this route in the future. One thing we asked ourselves was what open sourcing provides. Usually it's a desire for privacy or cost in the form of self-hosting, among other reasons.

Currently, our free version is self-hosted and monitors clusters with up to 64 GPUs. We feel this will work for many use cases, especially just to try it out. Monitoring GPUs typically requires you to deploy something where your GPUs live. Since you’re already installing software on your cluster, you might as well keep your data there too.

fustercluck|10 months ago

Your Github repo says you need 120 GB of persistent storage, but our bare metal GPU clusters only have local storage. Would like to try your thing, but hosting the data with the GPUs is a pretty big blocker for us.