item 34802288

ishitatsuyuki | 3 years ago

IMHO, perf's decision to write raw stack copies directly to disk and unwind them in post-processing is a really bad design. It wastes disk space and, as the author pointed out, also incurs a lot of I/O overhead.
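To put rough numbers on the disk cost: with `perf record --call-graph dwarf`, each sample carries a copy of the user stack (8192 bytes by default, adjustable via `--call-graph dwarf,<size>`). The back-of-the-envelope sketch below uses assumed parameters (sample rate, CPU count and duration are illustrative, not measurements) and ignores per-sample headers and register dumps, so it is only a lower bound.

```python
def estimated_stack_bytes(rate_hz: int, stack_dump_bytes: int, cpus: int, seconds: int) -> int:
    """Lower bound on raw stack bytes written: one stack copy per sample.

    Ignores sample headers, register dumps, and any compression.
    """
    samples = rate_hz * cpus * seconds
    return samples * stack_dump_bytes


# Assumed: 1000 Hz per CPU, 8192-byte stack copies, 8 busy CPUs, 60 s run.
total = estimated_stack_bytes(rate_hz=1000, stack_dump_bytes=8192, cpus=8, seconds=60)
print(f"{total:,} bytes ~ {total / 2**30:.2f} GiB")  # prints "3,932,160,000 bytes ~ 3.66 GiB"
```

Even a one-minute run under these assumptions produces gigabytes of stack bytes that exist only to be unwound later.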

As an alternative, https://github.com/mstange/samply processes the data streamed from perf and unwinds it in real time. The unwinding overhead is surprisingly low: around 1% of a single core per profiled CPU. Eliminating the disk waste alone has tremendously improved my profiling experience. As a bonus, unwinding and symbolization work reliably, whereas post-processing frequently failed to terminate when I used the perf CLI directly.
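The win from unwinding as samples arrive is that raw stack bytes can be dropped immediately and only aggregated results kept. The sketch below is not samply's code; it is a hypothetical model in which each incoming sample has already been reduced to leaf-first return addresses (the unwinder itself is stubbed out), each address is symbolized against a made-up sorted symbol table with `bisect`, and stacks are counted. Memory then grows with the number of distinct stacks rather than the number of samples.

```python
import bisect
from collections import Counter
from typing import Iterable, Sequence

# Hypothetical symbol table: (start_address, name), sorted by start address.
# A real table would also carry symbol sizes; here the last entry has no upper bound.
SYMBOLS = [(0x1000, "main"), (0x1400, "parse"), (0x1900, "hash_lookup"), (0x2000, "memcpy")]
_STARTS = [start for start, _ in SYMBOLS]


def symbolize(addr: int) -> str:
    """Map an address to the symbol with the nearest lower start address."""
    i = bisect.bisect_right(_STARTS, addr) - 1
    return SYMBOLS[i][1] if i >= 0 else f"0x{addr:x}"


def aggregate(samples: Iterable[Sequence[int]]) -> Counter:
    """Consume samples one at a time and keep only counts of symbolized stacks."""
    counts: Counter = Counter()
    for addrs in samples:  # addrs: leaf-first addresses produced by an unwinder
        stack = tuple(symbolize(a) for a in addrs)
        counts[stack] += 1  # nothing from the raw sample is retained after this
    return counts


stream = [[0x1950, 0x1420, 0x1010], [0x1950, 0x1420, 0x1010], [0x2004, 0x1480, 0x1010]]
for stack, n in aggregate(stream).items():
    print(n, " <- ".join(stack))
# 2 hash_lookup <- parse <- main
# 1 memcpy <- parse <- main
```

Because `aggregate` accepts any iterable, it works equally well on a generator that yields samples as they are read from the kernel's ring buffer.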

sitkack | 3 years ago

Are you saying that DWARF-based unwinding should happen in real time, or that perf should use frame pointers and debug information to trivially sample the stacks and record the symbols?

If you have frame pointers and debug information, profiling is both high-resolution and fast. DWARF unwinding is the fallback for when frame pointers are missing.
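Frame-pointer unwinding is cheap because each frame stores a pointer to its caller's frame right next to the return address, so walking the stack is just following a linked list. DWARF unwinding, by contrast, has to look up call-frame information for every frame to recover the caller's registers. The sketch below simulates the frame-pointer walk on a toy stack: memory is a hypothetical dict of 8-byte words, the addresses are made up, and the layout (saved frame pointer at `fp`, return address at `fp + 8`) assumes the usual x86-64 convention with `-fno-omit-frame-pointer`.

```python
from typing import Dict, List

WORD = 8  # bytes per stack slot on a 64-bit target


def walk_frame_pointers(memory: Dict[int, int], fp: int, max_depth: int = 64) -> List[int]:
    """Return caller return addresses by following the saved-frame-pointer chain.

    Assumed layout per frame: memory[fp] = caller's fp, memory[fp + WORD] = return address.
    Stops at a null fp, a non-increasing fp (the stack grows down, so callers sit at
    higher addresses), an unmapped slot, or max_depth.
    """
    return_addrs: List[int] = []
    while fp and len(return_addrs) < max_depth:
        if fp not in memory or fp + WORD not in memory:
            break  # a real unwinder also bails out on unreadable memory
        return_addrs.append(memory[fp + WORD])
        next_fp = memory[fp]
        if next_fp != 0 and next_fp <= fp:
            break  # corrupt or cyclic chain
        fp = next_fp
    return return_addrs


# Toy stack (hypothetical addresses); the sampled instruction pointer supplies the leaf PC.
stack = {
    0x7000: 0x7040, 0x7008: 0x1420,  # leaf frame: saved fp, return address into its caller
    0x7040: 0x7080, 0x7048: 0x1010,  # middle frame
    0x7080: 0x0,    0x7088: 0x0F00,  # outermost frame: null saved fp ends the chain
}
print([hex(a) for a in walk_frame_pointers(stack, fp=0x7000)])  # ['0x1420', '0x1010', '0xf00']
```

Each step is two memory reads and a comparison, which is why frame-pointer sampling can run at high frequency with little overhead.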

If you are saying the DWARF information should be processed at the point of use rather than copied and processed later, then I concur. But we should also encourage folks to compile WITH `-fno-omit-frame-pointer` and `-g`.