(no title)
monus | 1 year ago
[1]: https://github.com/qawolf/crik
[2]: The Party Must Go On - Resume Pods After Spot Instance Shutdown, https://kccnceu2024.sched.com/event/1YeP3
JoosToopit|1 year ago
Does crik guarantee the order of events? Saving a checkpoint must be followed by killing the old process/pod, which must be followed by restoration; the order of these three events is strict. And given that CRIU can checkpoint and restore socket state correctly, how does that work on Kubernetes? The new pod will have a different IP.
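To make the ordering requirement concrete, here is a minimal sketch of a guard that only allows checkpoint, then terminate, then restore, in that order. It is not crik's actual mechanism; `Phase`, `MigrationOrder`, and `advance` are hypothetical names made up for illustration.

```python
from enum import Enum, auto


class Phase(Enum):
    RUNNING = auto()
    CHECKPOINTED = auto()
    TERMINATED = auto()
    RESTORED = auto()


# Each phase may only advance to the single phase listed here.
_NEXT = {
    Phase.RUNNING: Phase.CHECKPOINTED,
    Phase.CHECKPOINTED: Phase.TERMINATED,
    Phase.TERMINATED: Phase.RESTORED,
}


class MigrationOrder:
    """Hypothetical guard that rejects out-of-order migration steps."""

    def __init__(self):
        self.phase = Phase.RUNNING

    def advance(self, target):
        allowed = _NEXT.get(self.phase)
        if target is not allowed:
            raise RuntimeError(
                f"cannot go from {self.phase.name} to {target.name}"
            )
        self.phase = target
        return self.phase
```

With this guard, trying to restore before the old process has been terminated raises an error instead of leaving two live copies of the same workload.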
monus|1 year ago
rmetzler|1 year ago
Usually clients would connect to a Kubernetes Service to avoid the problem of changing IPs. I would do that even for a single pod.
alexeldeib|1 year ago
Animats|1 year ago
Doing this on a networked application is going to be iffy. The restored program sees a time jump. If the restore is from a checkpoint taken before a later crash, the world it lives in sees a replay of things the restored program already did once.
If you just want to migrate jobs within a cluster, there's Xen.
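On the replay concern above, one common mitigation is to make external side effects idempotent, so a repeated request after a restore is recognized and ignored. The sketch below assumes the peer cooperates by deduplicating on an operation ID; `Receiver`, `deposit`, `run_job`, and the `job-42/step-N` ID scheme are all hypothetical. It does nothing about the time jump.

```python
class Receiver:
    """Hypothetical external service that deduplicates by operation ID."""

    def __init__(self):
        self.applied = {}  # operation ID -> result of the first application
        self.balance = 0

    def deposit(self, op_id, amount):
        # A replayed operation returns the original result unchanged.
        if op_id in self.applied:
            return self.applied[op_id]
        self.balance += amount
        self.applied[op_id] = self.balance
        return self.balance


def run_job(receiver, start_step, end_step):
    # IDs are derived from the step number, so a restored process
    # regenerates the same IDs for the steps it repeats.
    for step in range(start_step, end_step):
        receiver.deposit(f"job-42/step-{step}", 10)
```

For example, if the job ran steps 0 to 2, crashed, and was restored from a checkpoint taken at step 1, calling `run_job(r, 0, 3)` and then `run_job(r, 1, 5)` applies each of the five steps exactly once.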