perf and Intel VTune are the two standard choices AFAIK. Both have a learning curve, and both are extremely capable in the right hands. (Well, on macOS you're pretty much locked into using Instruments; I don't know whether Callgrind works there, but I'd suspect it's an uphill battle.)
Running perf on the whole system, with the whole software stack compiled with frame pointers and flamegraphs for visualisation, is an essential starting point for understanding real-world performance problems.
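For anyone who hasn't done this before, here is a rough sketch of what that workflow can look like. It only builds and prints the command lines, it doesn't run anything. The assumptions are mine, not the commenter's: the code was built with `-fno-omit-frame-pointer`, the `stackcollapse-perf.pl` and `flamegraph.pl` scripts from the FlameGraph repository are on your PATH, and the helper name `perf_flamegraph_commands`, the 99 Hz sampling rate and the 30 second window are just illustrative defaults.

```python
import shlex


def perf_flamegraph_commands(seconds=30, freq=99, out_svg="perf.svg"):
    """Return shell command strings for a system-wide perf profile and flamegraph.

    Hypothetical helper for illustration; it assumes binaries were built with
    -fno-omit-frame-pointer so that frame-pointer unwinding works.
    """
    # -a: sample all CPUs (whole system); -F: sampling frequency in Hz;
    # --call-graph fp: unwind stacks using frame pointers.
    record = [
        "perf", "record", "-a", "-F", str(freq),
        "--call-graph", "fp", "--", "sleep", str(seconds),
    ]
    # stackcollapse-perf.pl and flamegraph.pl come from the FlameGraph
    # repository and are assumed to be on PATH.
    render = (
        "perf script | stackcollapse-perf.pl | flamegraph.pl > "
        + shlex.quote(out_svg)
    )
    return [shlex.join(record), render]


if __name__ == "__main__":
    for cmd in perf_flamegraph_commands():
        print(cmd)
```

System-wide recording usually needs root or a relaxed `perf_event_paranoid` setting, so treat the printed commands as a starting point to adapt rather than something to paste blindly.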
Sesse__|1 year ago
Callgrind is a CPU simulator that can output a profile of that simulation. I guess it's semantics whether you want to call that a profiler or not, but my point is that you don't need a simulator+profiler combo when you can just use a profiler on its own.
(There are exceptions where the determinism of Callgrind can be useful, like if you're trying to benchmark a really tiny change and are fine with the bias from the simulation diverging from reality, or if you explicitly care about call count instead of time spent.)
rwmj|1 year ago