
tcas | 1 year ago

My guess is this is all due to CloudWatch Logs PutLogEvents failures.

By default, a Docker container configured with the awslogs driver runs in "blocking" mode. As logs are written, Docker buffers them and pushes to CloudWatch Logs frequently. If the log stream is faster than the buffer can absorb, writes to stdout/stderr block and the container freezes on the logging write call. If PutLogEvents is failing, buffers are probably filling up and freezing containers. I assume most of AWS uses its own logging system, which could cause these large, intermittent failures.
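
The blocking behavior is easy to reproduce without Docker at all. A minimal sketch (plain Python, with a pipe standing in for the awslogs buffer; the exact fill point depends on your OS's pipe capacity):

```python
import os

# Sketch of why blocking mode freezes a container: stdout is a pipe and
# the log driver is the reader. If the reader stalls (e.g. on repeated
# PutLogEvents failures), the pipe fills and further writes block.
# We make the write end non-blocking here so the script can observe
# "buffer full" instead of hanging the way a blocking-mode container would.
r, w = os.pipe()
os.set_blocking(w, False)

written = 0
try:
    while True:
        written += os.write(w, b"x" * 4096)
except BlockingIOError:
    # In blocking mode this write would hang instead of raising,
    # freezing the application on its logging call.
    pass

print(f"pipe filled after {written} bytes")  # typically 65536 on Linux
os.close(r)
os.close(w)
```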

If you're okay with dropping logs, add something like this to the container's logging configuration:

  "max-buffer-size": "25m"
  "mode": "non-blocking"
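
For ECS, those options go under the container's logConfiguration in the task definition. A sketch (the log group and region values here are placeholders, not from the original comment):

```json
{
  "logConfiguration": {
    "logDriver": "awslogs",
    "options": {
      "awslogs-group": "/my/app",
      "awslogs-region": "us-east-1",
      "mode": "non-blocking",
      "max-buffer-size": "25m"
    }
  }
}
```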

mbaumbach | 1 year ago

I just want to thank you for providing this info. This was exactly the cause of some of our issues and this config setting restored functionality to a major part of our app.

tcas | 1 year ago

Happy it helped. If you have a very high-throughput app (or something that logs gigantic payloads), the "logging pauses" may slow down your app in non-obvious ways. Diagnosing it the very first time took forever (I think I straced the process in the Docker container and saw it was hanging on `write(1)`).
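
If you want to check for the same thing, something like this command sketch works (`<container>` and `<pid>` are placeholders you'd fill in):

```shell
# Find the host PID of the app running inside the container,
# then trace its write syscalls. A process frozen on logging
# shows a write(1, ...) call that never returns.
docker top <container>
strace -f -p <pid> -e trace=write
```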

https://aws.amazon.com/blogs/containers/preventing-log-loss-...