top | item 42123387

ksd482 | 1 year ago

This was a nice short read. A simple (temporary) solution, yet a clever one.

How was he managing the instances? Was he using Kubernetes, or did he write some script to handle auto-terminating the instances?

It would also be nice to know why:

1. Killing was quicker than restarting. Perhaps because of the business logic built into the Java application?

2. Killing was safe. How was the system architected so that requests weren't dropped altogether?

EDIT: formatting

jumploops | 1 year ago

The author mentions 2011 as the time they switched from REST to RPC-ish APIs, and this issue was related to that migration.

Kubernetes launched in 2014, if memory serves, and it took a while before widespread adoption, so I'm guessing this was some internal solution.

This was a great read, and harkens back to the days of managing 1000s of cores on bare metal!

braggerxyz | 1 year ago

> It would also be nice to know why:

1. Killing was quicker than restarting.

If you happen to restart one of the instances with a thread hanging in the infinite loop, you can wait a very long time until the Java container actually decides to kill itself, because it did not finish its graceful shutdown within the allotted timeout period. Some Java containers have a default of 300s for this. In this circumstance kill -9 is faster by a lot ;)

Also we had circumstances where the affected Java container did not stop even when the timeout was reached, because the misbehaving thread consumed the whole CPU and none was left for the supervisor thread. Then the only option is to kill the JVM's host process.
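A minimal sketch of why such a thread defeats graceful shutdown (this is an illustrative example, not the code from the article): in Java, interrupt() only sets a flag, so a worker spinning in a tight loop that never checks that flag keeps running after the stop request, and the supervisor is left waiting until its timeout expires.

```java
// Sketch: a worker thread stuck in a busy loop ignores interrupt(),
// so "graceful" shutdown cannot stop it -- only killing the JVM can.
public class StuckWorkerDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            // Simulates the misbehaving infinite loop: it never checks
            // Thread.interrupted(), so the interrupt flag has no effect.
            while (true) { /* spin */ }
        });
        worker.setDaemon(true); // let the JVM exit despite the stuck thread
        worker.start();

        worker.interrupt();     // the "graceful stop" request
        worker.join(1000);      // supervisor waits up to 1s

        // The worker is still alive: graceful shutdown failed, and a
        // real supervisor would now escalate to kill -9 on the process.
        System.out.println("worker still alive: " + worker.isAlive());
    }
}
```

A well-behaved worker would instead loop on `while (!Thread.currentThread().isInterrupted())`, which is exactly what the hanging instances in the story apparently did not do.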