(no title)
mnahkies | 1 month ago
For example, our integration test suite on a particular service has become quite slow, but it's not particularly clear where the time is going. I suspect a decent amount of time is being spent talking to postgres, but I'd like a low touch way to profile this
6keZbCECT2uB|1 month ago
There are a few challenges here. - Off-cpu is missing the interrupt with integrated collection of stack traces, so you instrument a full timeline when they move on and off cpu or periodically walk every thread for its stack trace - Applications have many idle threads and waiting for IO is a common threadpool case, so its more challenging to associate the thread waiting for a pool doing delegated IO from idle worker pool threads
Some solutions: - Ive used nsight systems for non GPU stuff to visualize off CPU time equally with on CPU time - gdb thread apply all bt is slow but does full call stack walking. In python, we have py-spy dump for supported interpreters - Remember that any thing you can represent as call stacks and integers can be converted easily to a flamegraph. eg taking strace durations by tid and maybe fd and aggregating to a flamegraph
trillic|1 month ago
Kuinox|1 month ago