(no title)
kenjin4096 | 2 months ago
Also this time, I'm pretty confident because there are two perf improvements here: the dispatch logic, and the inlining. MSVC can actually convert switch-case interpreters to threaded code automatically if some conditions are met [1]. However, it does not seem to do that for the current CPython interpreter. In this case, I suspect the CPython interpreter loop is just too complicated to meet those conditions. The key point also that we would be relying on MSVC again to do its magic, but this tail calling approach gives more control to the writers of the C code. The inlining is pretty much impossible to convince MSVC to do except with `__forceinline` or changing things to use macros [2]. However, we don't just mark every function as forceinline in CPython as it might negatively affect other compilers.
[1]: https://github.com/faster-cpython/ideas/issues/183 [2]: https://github.com/python/cpython/issues/121263
jtrn|2 months ago
Also, I’m not that familiar with the whole process, but I just wanted to say that I think you were too hard on yourself during the last performance drama. So thank you again and remember not to hold yourself to an impossible standard no one else does.
halflings|2 months ago
That was a very niche error, that you promptly corrected, no need to be so apologetic about it! And thanks for all the hard work making Python faster!
kenjin4096|2 months ago