joncatanio's comments

joncatanio | 3 years ago | on: Modern Python Performance Considerations

This is a great read, and it's fantastic to see all the work being done to evaluate and improve the language!

The dynamic nature of the language is actually something I studied a few years back [1], particularly variable and object attribute lookups! My work was just a master's thesis, so we didn't go too deep into the trickier dynamic aspects of the language (e.g. eval, which we restricted entirely). But we did see performance improvements by restricting the language in ways that aid static analysis, which in turn allowed the compiler to generate faster code. For those interested, the abstract of my thesis [2] gives more insight into what we were evaluating.

> Our results showed that restricting dynamic code (code that is constructed at run time from other source code) and dynamic objects (mutation of the structure of classes and objects at run time) significantly improved the performance of our benchmarks.
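To make "dynamic objects" concrete, here is a minimal Rust sketch (not Cannoli's actual code; `DynObject` and `Point` are hypothetical names) contrasting an object whose attributes live in a hash map, so they can be added or deleted at run time, with one whose layout is fixed and can therefore become plain struct fields:

```rust
use std::collections::HashMap;

// Dynamic object: attributes live in a hash map keyed by name, so they can be
// added or deleted at run time (like `obj.z = 3` or `del obj.y` in Python).
struct DynObject {
    attrs: HashMap<String, i64>,
}

impl DynObject {
    fn get(&self, name: &str) -> Option<i64> {
        // Every attribute access hashes the name and probes the table.
        self.attrs.get(name).copied()
    }
}

// Restricted object: if a class's layout cannot change after it is defined,
// its attributes can become ordinary struct fields known at compile time.
struct Point {
    x: i64,
    y: i64,
}

fn main() {
    let mut dyn_p = DynObject { attrs: HashMap::new() };
    dyn_p.attrs.insert("x".to_string(), 1);
    dyn_p.attrs.insert("y".to_string(), 2);
    dyn_p.attrs.remove("y"); // allowed: the object's structure changes at run time
    assert_eq!(dyn_p.get("x"), Some(1));
    assert_eq!(dyn_p.get("y"), None);

    let p = Point { x: 1, y: 2 };
    assert_eq!(p.x + p.y, 3); // direct field access, no hashing
}
```

The restricted version gives up `del`/`setattr`-style mutation in exchange for field accesses that the Rust compiler resolves to fixed offsets.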

There was also some great discussion on HN when I posted our findings [3].

[1]: https://github.com/joncatanio/cannoli

[2]: https://digitalcommons.calpoly.edu/theses/1886/

[3]: https://news.ycombinator.com/item?id=17093051

joncatanio | 7 years ago | on: Show HN: Cannoli – A compiler for a subset of Python written in Rust

This is exactly why PyPy blew both Cannoli and CPython away in the microbenchmarks used for analysis. As I've said elsewhere, the focus was on comparing unoptimized Cannoli to optimized Cannoli, not on a direct comparison to CPython or PyPy. However, the microbenchmarks ran for 1 to 10 million iterations, giving the JIT plenty of time to find beneficial traces in the PyPy interpreter.

joncatanio | 7 years ago | on: Show HN: Cannoli – A compiler for a subset of Python written in Rust

After I ran the experimental evaluation, I had similar thoughts. If PyPy ever catches up to the current version of CPython, I'm not sure why anyone wouldn't use PyPy over CPython. The biggest hurdle is matching support for popular libraries like NumPy, TensorFlow, pandas, SciPy, etc. I know they're working on supporting these, but it's a lot of work and easier said than done.

joncatanio | 7 years ago | on: Show HN: Cannoli – A compiler for a subset of Python written in Rust

Hey Steve, glad you saw this post; now I get to thank you personally for all of your work on the Rust project. It's a really great community, and it was very easy to get help on IRC when I'd get stuck on new-to-me concepts. The documentation was also incredible, and the language itself is awesome! So thanks, and keep up the great work over there. I'm definitely going to be peddling Rust when I can haha.

joncatanio | 7 years ago | on: Show HN: Cannoli – A compiler for a subset of Python written in Rust

That work is great!

> We have presented the first limit study that tries to quantify the costs of various dynamic language features in Python.

This is spot on what we were doing as well, that's great to have this as a reference.

> 1. If I understand the source on GitHub correctly, you parse Python source code yourself. I'm fairly sure your simulation would be a lot more faithful if you compiled Python bytecode instead. Did you consider this, and if yes, was there a particular reason not to do it that way?

We actually did not consider this. It would be a very interesting concept to explore. In the unoptimized version of Cannoli, we look up variables in a list of hash tables (which represent the current levels of scope). We also implemented a scope optimization that uses indices to access scope elements instead, and this was much faster. However, it meant that `exec` and `del` were no longer permitted, since we would no longer be able to determine every scope element statically (consider `exec(input())`: this could introduce anything into scope, and we can't track that).
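A minimal Rust sketch of the two lookup strategies; it is illustrative only, not Cannoli's generated code. Values are simplified to `i64`, and the function names and the (depth, slot) addressing are hypothetical:

```rust
use std::collections::HashMap;

// Unoptimized: scopes are a stack of hash tables; a name is resolved at run time
// by searching from the innermost scope outward.
fn lookup_dynamic(scopes: &[HashMap<String, i64>], name: &str) -> Option<i64> {
    scopes.iter().rev().find_map(|scope| scope.get(name).copied())
}

// Optimized: if every name in every scope is known at compile time (no `exec`,
// no `del`), each name can be assigned a fixed (depth, slot) position and the
// generated code indexes directly instead of hashing.
fn lookup_indexed(scopes: &[Vec<i64>], depth: usize, slot: usize) -> i64 {
    scopes[depth][slot]
}

fn main() {
    let mut global: HashMap<String, i64> = HashMap::new();
    global.insert("x".to_string(), 1);
    let mut local: HashMap<String, i64> = HashMap::new();
    local.insert("x".to_string(), 2); // shadows the global `x`
    let dyn_scopes = vec![global, local];
    assert_eq!(lookup_dynamic(&dyn_scopes, "x"), Some(2));

    // The same program after a hypothetical compiler resolved `x` to depth 1, slot 0.
    let idx_scopes: Vec<Vec<i64>> = vec![vec![1], vec![2]];
    assert_eq!(lookup_indexed(&idx_scopes, 1, 0), 2);
}
```

The indexed version only works if the set of names in each scope is fixed at compile time, which is exactly the guarantee `exec(input())` breaks.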

If you know, how does CPython resolve scope if it maps variable names to indices? In the case of `exec(input())` where the input string is, say, `x = 1`, how would it compile bytecode that allocates space for `x` and indexes into its value? I don't have much experience with the CPython source, so please excuse me if the question seems naive :)!

> 2. Where do you actually make useful use of Rust's static ownership system? I've only skimmed that part of the thesis very quickly, but I missed how you track ownership in Python programs and can be sure that things don't escape. Can you give an example of a Python program using dynamic allocation that your compiler maps to Rust with purely static ownership tracking and freeing of the memory when it's no longer used?

Elements of the Value enum (which encapsulates all types) relied on `Rc` and `RefCell` to defer borrow checking to run time. Consider a function that has a local variable holding a newly instantiated object. Once that function call has finished, Cannoli pops that local scope table, and all of its mappings are dropped when the table goes out of scope. The object wrapped in an `Rc` has its reference count decremented to 0 and is freed.
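A small, self-contained Rust sketch of that scope-pop behavior. `Object`, `Scope`, and `call_function` are hypothetical names for illustration, and the `Weak` handle exists only so we can observe that the object was actually freed:

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::{Rc, Weak};

// Hypothetical stand-in for a Python object stored in a scope table.
struct Object {
    value: i64,
}

// One level of scope: variable names mapped to shared, mutable objects.
type Scope = HashMap<String, Rc<RefCell<Object>>>;

// Simulates calling a function whose local variable holds a new object.
// Returns a Weak handle so the caller can check whether the object survived.
fn call_function(scopes: &mut Vec<Scope>) -> Weak<RefCell<Object>> {
    let obj = Rc::new(RefCell::new(Object { value: 42 }));
    let weak = Rc::downgrade(&obj);

    let mut local = Scope::new();
    local.insert("obj".to_string(), obj); // the scope table holds the only strong reference
    scopes.push(local);

    // The function body runs here and reads its local through the scope table.
    assert_eq!(scopes.last().unwrap()["obj"].borrow().value, 42);

    // The call finishes: popping and dropping the table drops every mapping,
    // so the Rc's strong count falls to 0 and the object is freed.
    drop(scopes.pop());
    weak
}

fn main() {
    let mut scopes: Vec<Scope> = vec![Scope::new()]; // global scope
    let weak = call_function(&mut scopes);
    assert!(weak.upgrade().is_none()); // the object no longer exists
    assert_eq!(scopes.len(), 1);
}
```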

This is how I've interpreted Rust's ownership rules, though this was the first time I had ever used Rust, so it's possible I'm not completely right on this. But once that table goes out of scope, all of its elements should be dropped automatically (strictly speaking, the compiler inserts these drops based on ownership; the borrow checker only verifies that borrows are valid), and each `Rc` should be decremented, and freed once its count reaches 0.

> 3. Related to 2: Why bother with any notion of ownership at all? Did you try mapping everything to Rust's reference counting and just letting it do its best? I'm wondering how much slower that would be. Python is also reference counted, after all, and I guess the Rust compiler should have more opportunities to optimize reference counting operations.

I did defer a lot of borrow checking to run time with `Rc` and `RefCell`, but I tried to use them as little as possible to get the most out of whatever optimizations static borrow checking enables.
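For context on why one would limit them: `RefCell` checks the borrowing rules with a run-time flag on every `borrow`/`borrow_mut`, and `Rc` updates a reference count on every clone and drop. A minimal sketch (the helper name `conflicting_borrow_detected` is hypothetical) of an aliasing conflict that the compiler would reject for plain `&`/`&mut` references, but that `RefCell` only catches at run time:

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Returns true if RefCell refuses a mutable borrow while a shared borrow is alive.
fn conflicting_borrow_detected(cell: &RefCell<Vec<i64>>) -> bool {
    let _reader = cell.borrow(); // shared borrow held until the end of this function
    cell.try_borrow_mut().is_err() // `borrow_mut()` here would panic instead
}

fn main() {
    // Two handles to one mutable list, like a Python list bound to two names.
    let a: Rc<RefCell<Vec<i64>>> = Rc::new(RefCell::new(vec![1, 2, 3]));
    let b = Rc::clone(&a);

    b.borrow_mut().push(4); // mutation through `b` is visible through `a`
    assert_eq!(a.borrow().len(), 4);
    assert_eq!(Rc::strong_count(&a), 2);

    assert!(conflicting_borrow_detected(&a));
    // Once the shared borrow has ended, mutation is allowed again.
    assert!(b.try_borrow_mut().is_ok());
}
```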

> 4. In general, do you have an idea why your code is slower than Python, besides the hash table variable lookup issue I mentioned above?

If you remove the 3 outlier benchmarks (which are slow because of Rust printing and a suboptimal implementation of slices), Cannoli isn't too far off from CPython. In fact, on the ray casting benchmark, Cannoli began to outperform CPython at scale. This leads me to believe that the computations themselves are faster in Cannoli than in CPython. However, there is still a lot of work to do to make Cannoli more performant. The compiler itself was only developed for ~4 months, and I have no doubt that more development time would yield better results.

That being said, I think much of the slowdown comes from Rust features I didn't take advantage of. This is just speculation, but I think using lifetimes could benefit the compiled code a lot. I also think there may be more elegant translations for some constructs (e.g. slices) that could provide a speedup. But I can't point to any one thing causing the slowdown, and profiling the benchmarks (excluding the outliers) supports that.

joncatanio | 7 years ago | on: Show HN: Cannoli – A compiler for a subset of Python written in Rust

Great question! The "Compiling Python" section of my thesis is pretty much an explanation of how I had to translate elements of Python into Rust because of the borrow checker. There were a couple of tricks (like using closures for functions) for getting around compile-time borrow checking. Some situations required `Rc` and `RefCell` to provide multiple references to mutable data, which defers borrow checking to run time. So yes, the borrow checker got in the way. But I didn't have to write a garbage collector, because memory management was handled automatically by Rust's ownership rules (the caveat is cyclic references, which would need to be tracked separately; that work was omitted for time).
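To illustrate the cyclic-reference caveat, here is a minimal Rust sketch (the `Node` type and `make_cycle` helper are hypothetical) of two reference-counted objects that point at each other. Once the local handles are gone, neither count can reach 0, so plain `Rc` never frees them:

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Hypothetical node standing in for a Python object with an attribute that can
// point at another object (e.g. `a.other = b; b.other = a`).
struct Node {
    other: Option<Rc<RefCell<Node>>>,
}

// Builds a two-node cycle, lets the local handles go out of scope, and returns
// Weak handles so the caller can check whether the nodes were freed.
fn make_cycle() -> (Weak<RefCell<Node>>, Weak<RefCell<Node>>) {
    let a = Rc::new(RefCell::new(Node { other: None }));
    let b = Rc::new(RefCell::new(Node { other: Some(Rc::clone(&a)) }));
    a.borrow_mut().other = Some(Rc::clone(&b));
    // When `a` and `b` are dropped at the end of this function, each node is
    // still kept alive by the strong reference the other node holds.
    (Rc::downgrade(&a), Rc::downgrade(&b))
}

fn main() {
    let (wa, wb) = make_cycle();
    // Both nodes are still alive even though nothing outside the cycle refers to
    // them: reference counting alone has leaked them.
    assert!(wa.upgrade().is_some());
    assert!(wb.upgrade().is_some());
}
```

Breaking such cycles would need either `Weak` references for back-pointers or a separate cycle collector; CPython itself supplements reference counting with a cycle-detecting garbage collector (the `gc` module).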

It does complicate the generated code, and I don't know if Rust is the best intermediate representation. But I do think it was a better choice than C. Debugging the generated code was great because of the detail the Rust compiler provides in its warnings and errors.

I'd be interested in seeing how a Python interpreter written in Rust would compare to CPython; it would probably make better use of Rust's optimizations than generated code does.

joncatanio | 7 years ago | on: Show HN: Cannoli – A compiler for a subset of Python written in Rust

I would certainly like to. The base implementation of Cannoli is the best place to continue this. Some of the optimizations restrict parts of the language, and pushing those any further would ultimately diverge from Python itself. That being said, many of the optimizations could still be done and "fall back" on unoptimized code at run time if needed. I think this project shows a lot of promise; it just hasn't been developed as long as other projects, so it still needs plenty of work :).

joncatanio | 7 years ago | on: Show HN: Cannoli – A compiler for a subset of Python written in Rust

Yes :)! I mention PEP 3107 (https://www.python.org/dev/peps/pep-3107/) in the thesis. It allows annotations in function signatures (annotations on assignments came later, with PEP 526). Cannoli leverages type annotations in both function signatures and assignments to output optimized code. Other projects like Pythran also use type annotations.

PEP 3107 does say:

> By itself, Python does not attach any particular meaning or significance to annotations.

However, I think this will change, especially as more projects begin to outperform CPython. In the "Results > Object Optimization" section of the thesis paper, I cover using these very type annotations to optimize the code.

The biggest problem with Python annotations right now is that they don't really mean anything. Nothing is enforced, so it is totally valid to write `x: int = "string"`. The compiler would have to just ignore this annotation since it was provided the wrong data. A mislabeled type could also be difficult to detect once the variable is being used. So it's not perfect, but I think it's a step in the right direction.
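A hedged Rust sketch of what "ignoring" a wrong annotation could look like in generated code: check the annotated value's actual type once, run native `i64` arithmetic if the annotation holds, and fall back to generic dynamic dispatch otherwise. The `Value` enum here is a simplified stand-in, not the thesis's actual type, and `accumulate` and `add_dynamic` are hypothetical example functions:

```rust
// Simplified stand-in for a dynamic value type (not Cannoli's actual `Value` enum).
#[derive(Debug, PartialEq)]
enum Value {
    Int(i64),
    Str(String),
}

// Generic path: dispatch on the run-time types of both operands, like Python's `+`.
fn add_dynamic(a: &Value, b: &Value) -> Result<Value, String> {
    match (a, b) {
        (Value::Int(x), Value::Int(y)) => Ok(Value::Int(x + y)),
        (Value::Str(x), Value::Str(y)) => Ok(Value::Str(format!("{x}{y}"))),
        _ => Err("unsupported operand type(s) for +".to_string()),
    }
}

// Hypothetical output for the Python snippet:
//     x: int = <expr>
//     total = 0
//     for _ in range(n):
//         total += x
fn accumulate(x: &Value, n: u32) -> Result<Value, String> {
    if let Value::Int(raw) = *x {
        // Annotation verified once; the loop uses a native i64 with no per-iteration dispatch.
        let mut total: i64 = 0;
        for _ in 0..n {
            total += raw;
        }
        Ok(Value::Int(total))
    } else {
        // Annotation was wrong (e.g. `x: int = "string"`): ignore it and run the generic code.
        let mut total = Value::Int(0);
        for _ in 0..n {
            total = add_dynamic(&total, x)?;
        }
        Ok(total)
    }
}

fn main() {
    assert_eq!(accumulate(&Value::Int(2), 3), Ok(Value::Int(6)));
    // Mirrors Python: `0 + "string"` raises a TypeError on the first iteration.
    assert!(accumulate(&Value::Str("string".to_string()), 3).is_err());
}
```

The same check-then-fall-back pattern is one way optimizations could stay correct without fully trusting annotations.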

joncatanio | 7 years ago | on: Show HN: Cannoli – A compiler for a subset of Python written in Rust

Does it claim better performance than CPython or PyPy? I can't quite find the reference to PyPy (after a quick scan of the page/GitHub repo). It looks like a cool project! They seem to be doing a lot of optimizations, which they list on their GitHub page: https://github.com/kayhayen/Nuitka#optimization. It looks like the git repo was created ~2013 (I don't know if it was hosted/worked on elsewhere before that), so they've had a few years to optimize. Cool project though!

joncatanio | 7 years ago | on: Show HN: Cannoli – A compiler for a subset of Python written in Rust

I've used Python quite a bit for various projects. For a compilers class I wrote a compiler in Python and had a blast. So I spoke with that professor, decided I wanted to get a master's, and he suggested a project that analyzes Python. The main question concerned which dynamic features of a language cause performance issues. Python just happened to have a lot of the features that we hypothesized caused slowdowns, so we chose it. Plus we were both familiar with the language, so that was a draw.

The same analysis could be done on JS or Ruby. It would be cool to see if a similar compiler would yield the same performance results from restricting features in JS/Ruby, and it would nicely validate this work as well.

joncatanio | 7 years ago | on: Show HN: Cannoli – A compiler for a subset of Python written in Rust

This is very cool! Thanks for doing that :)!

I'm actually moving out to NYC this July to work for Major League Baseball's Advanced Media division (MLBAM). I'll be doing software engineering there, mainly API work for various apps. I'm very excited about it!

I'll have to work on compilers in my free time haha, I really enjoyed the work I did on this thesis.

joncatanio | 7 years ago | on: Show HN: Cannoli – A compiler for a subset of Python written in Rust

As others have commented, AOT compilation is limited to the information available at compile time. Various features of Python, like dynamic typing and object/class mutation (via del), preclude many static analysis techniques. In Cannoli, this meant that the compiler also had to generate code that manages scope at run time: whenever an identifier was encountered in the compiled code, a hashmap would be searched to find the bound value. This overhead becomes expensive, and the thesis covers optimizations that avoid it. PyPy's JIT operates on the PyPy interpreter itself, finding linear sequences of operations (traces) that are executed frequently. It can then compile these traces to machine code, so the next time a trace is encountered it can execute the compiled code directly. This self-analysis at run time provides information that an AOT compiler just doesn't have.

That being said, I did leave a few suggestions in the "future work" section that talk about writing an AOT compiler for RPython (the version of Python that PyPy's interpreter is written in). This would provide more information at compile time and would be an interesting comparison between a Python interpreter compiled AOT versus a Python interpreter with a JIT (PyPy).

joncatanio | 7 years ago | on: Show HN: Cannoli – A compiler for a subset of Python written in Rust

I recently finished the code for my thesis and wanted to share it with you all :). The goal of the thesis was to evaluate language features of Python that were hypothesized to cause performance issues. Quantifying the cost of these features could be valuable to language designers moving forward. Some interesting results were observed when implementing compiler optimizations for Python. An average speedup of 51% was achieved across a number of benchmarks. The thesis paper is linked on the GitHub repo, I encourage you to read it!

This was also my first experience with Rust. The Rust community is absolutely fantastic and the documentation is great. I had very little trouble with the "learning curve hell" that I hear associated with the language. It was definitely a great choice for this work.

I also included PyPy in my validation section and "WOW". It blew both Cannoli and CPython out of the water in performance. The work they're doing is very interesting and it definitely showed on the benchmarks I worked with.
