

IMHO, perf's decision to write whole stacks directly to disk and unwind them as a post-process is a really bad design. It wastes disk space and, as the author pointed out, also adds a lot of I/O overhead.
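To put rough numbers on the disk-space point: in `--call-graph dwarf` mode perf copies a chunk of the user stack into every sample. The sketch below assumes an 8192-byte stack dump per sample (which I believe is perf's default dump size) and a 4000 Hz sampling rate. Both are assumptions, and real records also carry registers and headers, so treat the result as a lower bound on raw stack bytes only.

```python
# Rough estimate of raw stack bytes a DWARF-mode profile writes to disk.
# Assumptions (not measured): 8192-byte stack dump per sample, 4000 Hz
# sampling, every CPU busy for the whole run. Register copies and record
# headers are ignored, so real files are somewhat larger.

def raw_stack_bytes(dump_size: int, freq_hz: int, cpus: int, seconds: int) -> int:
    """Bytes of copied user stack produced over a profiling run."""
    return dump_size * freq_hz * cpus * seconds


GIB = 1024 ** 3

per_cpu_per_sec = raw_stack_bytes(8192, 4000, cpus=1, seconds=1)
one_minute_8_cpus = raw_stack_bytes(8192, 4000, cpus=8, seconds=60)

print(f"{per_cpu_per_sec / 1e6:.1f} MB/s per CPU")              # 32.8 MB/s per CPU
print(f"{one_minute_8_cpus / GIB:.1f} GiB for 1 min on 8 CPUs")  # 14.6 GiB for 1 min on 8 CPUs
```

An unwinder that keeps only return addresses would store far less: 32 frames of 8 bytes (an arbitrary depth) is 256 bytes per sample, 32 times smaller than the 8192-byte dump.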

As an alternative approach, https://github.com/mstange/samply processes the data streamed from perf and unwinds it in real time. The unwinding overhead is surprisingly low: around 1% of a single CPU for each CPU being profiled. Eliminating the disk waste alone has tremendously improved my profiling experience. As a bonus, unwinding and symbolization work reliably, whereas post-processing with the perf CLI directly frequently failed to terminate for me.
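The streaming design can be sketched as a consumer that unwinds each sample as soon as it arrives and keeps only aggregated call stacks, so the raw stack copies never need to be written out. This is a hypothetical illustration of the idea, not samply's code: `RawSample` and `profile_streaming` are made-up names, and `unwind` is a stand-in for a real DWARF or frame-pointer unwinder.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple

Stack = Tuple[int, ...]  # return addresses, innermost first


@dataclass
class RawSample:
    """Hypothetical raw sample: a couple of registers plus a copy of the user stack."""
    ip: int
    sp: int
    stack_bytes: bytes


def profile_streaming(
    samples: Iterable[RawSample],
    unwind: Callable[[RawSample], Stack],
) -> Counter:
    """Unwind each sample as it arrives and keep only per-stack counts.

    The function never stores the raw stack copies; its memory use grows
    with the number of distinct call stacks, not with the run length.
    """
    counts: Counter = Counter()
    for sample in samples:
        counts[unwind(sample)] += 1
    return counts


def fake_unwind(s: RawSample) -> Stack:
    # Stand-in unwinder: ignores the stack bytes and reports only the sampled ip.
    return (s.ip,)


# A generator, so samples are produced and consumed one at a time.
samples = (RawSample(0x1000, 0x7FF0, b"\0" * 8192) for _ in range(3))
print(profile_streaming(samples, fake_unwind))  # Counter({(4096,): 3})
```

In a real profiler the input would be the kernel's sample stream and `unwind` would consult the target's unwind info; the point here is only that aggregation happens before anything is persisted.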


sitkack|3 years ago

Are you saying that stacks should be unwound with DWARF information in real time, or that the profiler should use frame pointers and debug information to trivially sample the stacks and record the symbols?

If you have frame pointers and debug information, sampling is both high-resolution and fast. DWARF unwinding is a fallback for when frame pointers are missing.
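With frame pointers, unwinding is just following a linked list. On x86-64 with the usual `push rbp; mov rbp, rsp` prologue, the word at `rbp` holds the caller's saved `rbp` and the word at `rbp + 8` holds the return address. The sketch below walks that chain over a simulated memory dict; the addresses and the `walk_frame_pointers` name are made up, and a real profiler would read the sampled thread's stack instead (and would also have to cope with leaf functions and code built without frame pointers).

```python
from typing import Dict, List

WORD = 8  # x86-64 word size in bytes


def walk_frame_pointers(
    memory: Dict[int, int], ip: int, rbp: int, max_depth: int = 128
) -> List[int]:
    """Return [ip, ret_1, ret_2, ...] by chasing the saved-rbp chain.

    `memory` maps stack addresses to the 8-byte word stored there
    (a stand-in for reading the target's stack).
    """
    frames = [ip]
    while rbp != 0 and len(frames) < max_depth:
        saved_rbp = memory.get(rbp)         # caller's rbp
        ret_addr = memory.get(rbp + WORD)   # return address into the caller
        if saved_rbp is None or ret_addr is None:
            break  # walked off the captured stack
        frames.append(ret_addr)
        if saved_rbp <= rbp:
            break  # stack grows down, so caller frames must sit at higher addresses
        rbp = saved_rbp
    return frames


# Hypothetical stack for main -> foo -> bar, sampled while bar runs.
memory = {
    0x7FFFE0: 0x7FFFF0, 0x7FFFE8: 0x401250,  # bar's frame: saved rbp, return into foo
    0x7FFFF0: 0x800000, 0x7FFFF8: 0x401180,  # foo's frame: saved rbp, return into main
    0x800000: 0,        0x800008: 0x401050,  # main's frame: end of chain
}
stack = walk_frame_pointers(memory, ip=0x401300, rbp=0x7FFFE0)
print([hex(a) for a in stack])  # ['0x401300', '0x401250', '0x401180', '0x401050']
```

Each step is two memory reads, which is why frame-pointer sampling is cheap enough to do in the kernel at sample time.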

If you are saying the DWARF information should be processed at the point of use rather than copied and processed later, then I concur. But we should also encourage folks to compile WITH `-fno-omit-frame-pointer` and `-g`.

irogers|3 years ago

This could be a great Linux perf GSoC project. Projects and mentors are being looked for: https://wiki.linuxfoundation.org/gsoc/2023-gsoc-perf

lathiat|3 years ago

Parca has also done work to unwind DWARF in the kernel with eBPF: https://www.polarsignals.com/blog/posts/2022/11/29/profiling...

Edit: refer to another comment in this thread: https://news.ycombinator.com/item?id=34809265
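Without frame pointers, DWARF-style unwinding means looking up, for each program counter, a rule for computing the CFA (canonical frame address) and then reading the return address relative to it. As I understand the Parca post, they precompute such rules from `.eh_frame` into a sorted table that eBPF code can binary-search. Below is a heavily simplified, hypothetical Python version of that idea: it only supports CFA = rsp + offset or rbp + offset, assumes x86-64 (return address at CFA - 8), and the `UnwindRow` layout is my own invention, not Parca's format.

```python
import bisect
from dataclasses import dataclass
from typing import Dict, List, Optional

WORD = 8  # x86-64 word size in bytes


@dataclass(frozen=True)
class UnwindRow:
    """Hypothetical compact unwind row, valid from start_pc up to the next row."""
    start_pc: int
    cfa_reg: str               # "rsp", "rbp", or "none" (no rule: stop)
    cfa_offset: int            # CFA = cfa_reg + cfa_offset
    rbp_offset: Optional[int]  # caller's rbp saved at CFA + rbp_offset, if saved


def unwind_with_table(table: List[UnwindRow], memory: Dict[int, int],
                      pc: int, rsp: int, rbp: int, max_depth: int = 64) -> List[int]:
    starts = [row.start_pc for row in table]  # table must be sorted by start_pc
    frames = [pc]
    lookup_pc = pc  # innermost frame: pc is the exact sampled instruction
    while len(frames) < max_depth:
        i = bisect.bisect_right(starts, lookup_pc) - 1
        if i < 0:
            break  # pc below the first row: not covered by the table
        row = table[i]
        if row.cfa_reg == "rsp":
            cfa = rsp + row.cfa_offset
        elif row.cfa_reg == "rbp":
            cfa = rbp + row.cfa_offset
        else:
            break  # no unwind rule for this pc
        ret = memory.get(cfa - WORD)
        if not ret:
            break  # outermost frame or unreadable stack
        if row.rbp_offset is not None:
            rbp = memory.get(cfa + row.rbp_offset, rbp)
        frames.append(ret)
        rsp = cfa  # caller's rsp after the return equals the CFA
        # A return address points just after the call, possibly past the
        # caller's last instruction, so look up ret - 1 instead.
        lookup_pc = ret - 1
    return frames


# Hypothetical table: "main" keeps a frame pointer, "leaf" does not.
TABLE = [
    UnwindRow(0x1000, "rsp", 8, None),   # main entry
    UnwindRow(0x1001, "rsp", 16, -16),   # after push rbp
    UnwindRow(0x1004, "rbp", 16, -16),   # after mov rbp, rsp
    UnwindRow(0x1100, "none", 0, None),  # end of main
    UnwindRow(0x2000, "rsp", 8, None),   # leaf entry
    UnwindRow(0x2004, "rsp", 32, None),  # after sub rsp, 24
    UnwindRow(0x2050, "none", 0, None),  # end of leaf
]
MEMORY = {0x7018: 0x1050, 0x7100: 0, 0x7108: 0x500}
print([hex(a) for a in unwind_with_table(TABLE, MEMORY, pc=0x2010, rsp=0x7000, rbp=0x7100)])
# ['0x2010', '0x1050', '0x500']
```

The leaf frame is unwound purely from rsp, which is exactly the case where a frame-pointer walk would lose the caller.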