nwlieb | 1 year ago
You can also use the PTWRITE instruction to attach metadata to the stream, which seems very powerful.
Hope we can see such an extension on AMD as well.
Sesse__|1 year ago
Typically you get a cycle count every six branches, give or take.
yosefk|1 year ago
https://yosefk.com/blog/profiling-in-production-with-functio...
https://danluu.com/perf-tracing/
Regarding the slowdown - magic-trace reports 2-10% slowdowns, which IMO is actually fine even for production (unless this adds up to a huge dollar cost; for most people it won't), since in return you are actually able to debug the rare slowdowns, which are the worst part of your user experience.
However, the hardware feature that I propose (https://yosefk.com/blog/profiling-in-production-with-functio...) would likely have lower overhead, since it relies on software issuing tracing instructions, e.g. at each function entry & exit (rather than at every control flow change), and it could be selective in various ways (e.g. exclude short functions without loops, and/or you could configure the hardware to ignore short calls. BTW maybe you can do this with Intel Processor Trace, too; I'm just not really familiar with it.)
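To make the "ignore short calls" idea concrete, here is a rough software analogue in Python. It is only a sketch of the filtering logic, not how the proposed hardware, Intel's tracing, or magic-trace actually work; `Tracer`, `traced`, and `min_duration_ns` are names made up for this example, and it assumes single-threaded code.

```python
import time
from functools import wraps


class Tracer:
    """Software stand-in for a trace buffer fed by entry/exit events."""

    def __init__(self, min_duration_ns=0, clock=time.perf_counter_ns):
        self.min_duration_ns = min_duration_ns  # calls shorter than this are dropped
        self.clock = clock                      # injectable so the demo is deterministic
        self.events = []                        # (kind, function name, timestamp)

    def traced(self, func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = self.clock()
            mark = len(self.events)             # index of this call's entry event
            self.events.append(("enter", func.__name__, start))
            try:
                return func(*args, **kwargs)
            finally:
                end = self.clock()
                if end - start < self.min_duration_ns:
                    # Short call: drop its entry event. Nested calls are
                    # shorter still, so they have already been dropped.
                    del self.events[mark:]
                else:
                    self.events.append(("exit", func.__name__, end))
        return wrapper


# Usage with a fake clock so the result is deterministic.
fake_times = iter([0, 100, 150, 5000])
tracer = Tracer(min_duration_ns=1000, clock=lambda: next(fake_times))


@tracer.traced
def parse_header():      # hypothetical short helper: 50 ns, filtered out
    return 1


@tracer.traced
def handle_request():    # hypothetical long call: 5000 ns, kept
    return parse_header() + 1


handle_request()
print(tracer.events)
# [('enter', 'handle_request', 0), ('exit', 'handle_request', 5000)]
```

In software the entry event still has to be written before you know the call was short, so the saving here is only in trace size; the point of doing it in hardware would be that the filtering costs the traced program nothing.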
yosefk|1 year ago
Like I said there, I'm frankly shocked that all CPUs haven't raced to implement similar features, that magic-trace, which is built on top of Intel Processor Trace, isn't used more widely, and that developers aren't insisting on running under magic-trace in production and requiring deployment on Intel servers for that purpose.
The extension I propose is much simpler, and seems similar to what PTWRITE would do if it were the only feature in Intel Processor Trace. I have a lot of experience in chip architecture, and I believe that every CPU maker and every chip maker can support this easily - much more so than full feature parity with Intel Processor Trace. I hope they will!
nwlieb|1 year ago
I wonder if this is a general issue relating to memory ordering or out-of-order execution, or whether this can be implemented more efficiently in a different extension.
Thank you for the linked article! Agreed on the huge potential for using these tools in production. The community could definitely benefit (even indirectly) by pushing for this kind of instruction set extension to be adopted more widely.