volta87 | 5 years ago

From the POV of an HPC cluster user, when using SLURM's `srun` or similar to schedule a job, this now allows you to use `srun --container=<your container>`, which starts your app inside that container on every node it runs on and makes sure MPI, GPUs, etc. all work.
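
For what it's worth, here is a rough sketch of what such a launch line might look like, written as a small Python helper that only builds the command and never runs it. The `--container` flag is the one mentioned above; the helper name `build_srun_command`, the container path, the app `./my_mpi_app`, its arguments, and the node/task counts are all made up for illustration. What the flag actually accepts depends on your site's SLURM version and configuration, so check locally.

```python
import shlex


def build_srun_command(container, app_argv, nodes=1, ntasks=1):
    """Build (but do not run) an srun argv list for a containerized job.

    Hypothetical helper: `container` is whatever your site's srun expects
    after --container=, and `app_argv` is the program plus its arguments.
    """
    if not app_argv:
        raise ValueError("app_argv must name the program to run")
    return [
        "srun",
        f"--nodes={nodes}",        # number of nodes to allocate
        f"--ntasks={ntasks}",      # total number of tasks (e.g. MPI ranks)
        f"--container={container}",
        *app_argv,
    ]


# Illustrative values only: 16 tasks spread over 4 nodes.
cmd = build_srun_command(
    "/path/to/your/container",
    ["./my_mpi_app", "--input", "data.h5"],
    nodes=4,
    ntasks=16,
)

# Print the shell-quoted command so it can be inspected or pasted.
print(shlex.join(cmd))
```

Building the argv as a list rather than a string keeps quoting correct if a path ever contains spaces, and you can hand the same list to `subprocess.run` once you're on a login node that actually has `srun`.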

If you don't know anything about containers, it will probably be a bit hard to imagine what this buys you. Don't worry: as more clusters move towards this model, you'll have to learn about containers at some point anyway.

From the POV of the HPC cluster, it means that the `module` system can be replaced with containers, which can significantly lower the maintenance overhead of the cluster. In a sense, it turns HPC cluster users into HPC cluster maintainers (who have to build their own images, prepare their own environments, etc.).
