top | item 41571335


adityapatadia | 1 year ago

Nice ideas, but we have chosen a really simple Kubernetes deployment: we install only the host OS (Ubuntu Server) and then join the self-hosted GPUs as workers in a Kubernetes cluster.

No other setup is needed, and our Grafana instance monitors whether the server (and its containers) are up and running.
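For readers unfamiliar with this setup, joining a bare-metal GPU box as a worker can be sketched with a kubeadm `JoinConfiguration` roughly like the one below. This is a hedged illustration, not the commenter's actual config: the endpoint, token, CA hash, and node label are all placeholders.

```yaml
# Sketch of a kubeadm JoinConfiguration -- all values are illustrative placeholders
apiVersion: kubeadm.k8s.io/v1beta3
kind: JoinConfiguration
discovery:
  bootstrapToken:
    apiServerEndpoint: "control-plane.example.com:6443"  # placeholder control-plane address
    token: "abcdef.0123456789abcdef"                     # placeholder bootstrap token
    caCertHashes:
      - "sha256:<ca-cert-hash>"                          # placeholder CA hash
nodeRegistration:
  kubeletExtraArgs:
    node-labels: "gpu=true"  # example label so GPU workloads can be scheduled here
```

Applied with `kubeadm join --config join.yaml`; for the GPUs to actually be schedulable, the cluster would also need a device plugin (e.g. NVIDIA's device plugin DaemonSet) installed.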

godelski | 1 year ago

Sorry, my suggestion was for if you need to do training. If you're only serving, then the suggestions I made aren't as valuable, and something like what you've done probably makes more sense. But you want a proper cluster setup to do multi-GPU and especially multi-node work.

drio | 1 year ago

> Would you mind sharing the name of the data center?

Curious to know what you use other than Grafana in your monitoring stack. We use Prometheus for metrics/alerts and Loki/Promtail for logs.
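For context on the Loki/Promtail side of that stack, a minimal Promtail config that tails local log files and pushes them to Loki might look like the sketch below. The Loki URL and file paths are illustrative placeholders, not the commenter's real setup.

```yaml
# Minimal Promtail config sketch -- URL and paths are illustrative placeholders
server:
  http_listen_port: 9080

positions:
  filename: /tmp/positions.yaml  # where Promtail records how far it has read each file

clients:
  - url: http://loki.example.com:3100/loki/api/v1/push  # placeholder Loki endpoint

scrape_configs:
  - job_name: system
    static_configs:
      - targets: [localhost]
        labels:
          job: varlogs
          __path__: /var/log/*.log  # glob of log files to tail
```

Prometheus would sit alongside this for metrics and alerting rules, with Grafana querying both as data sources.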