MonkeyType: A system for Python that automatically generates type annotations

[+] Bnshsysjab|6 years ago|reply

While I really like type annotations I hate that they’re little more than IDE hints.

There’s nothing to stop you calling functions with incorrect datatypes at runtime, and I’d consider this one of the single biggest weaknesses of Python right now.

[+] duckerude|6 years ago|reply

There is no runtime typechecking, but with an offline typechecker like mypy they're still far more than just IDE hints. In my experience you can catch most typing problems that way.
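
A minimal sketch of that gap (the function `double` is made up for illustration): the interpreter stores the annotations but does not act on them, so the mistyped call below runs, while an offline checker such as mypy would be expected to flag it.

```python
def double(n: int) -> int:
    """Annotated to take and return an int."""
    return n * 2


# The annotations are kept as metadata only; nothing checks them at call time,
# so passing a str is accepted and the result is a repeated string.
result = double("ab")
print(repr(result))            # 'abab'
print(double.__annotations__)  # the hints are stored, just never enforced
```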

PHP went with runtime typechecking, and it isn't pretty. The type system is much less powerful than Python's because it's really hard to add complex runtime checks to an already existing dynamic language. You can make a function check whether its argument is an array, but you can't make it check whether it's, say, an array of integers, because that would require walking through the entire array, or making every array do expensive bookkeeping in case it's needed later.
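
To illustrate the cost argument, here is a rough sketch in Python rather than PHP (both helper names are hypothetical): checking the container is a single type test, while checking its element type means visiting every element.

```python
from typing import Any


def is_list(value: Any) -> bool:
    # O(1): only the container's own type is inspected.
    return isinstance(value, list)


def is_list_of_int(value: Any) -> bool:
    # O(n): every element has to be visited to confirm its type.
    return isinstance(value, list) and all(isinstance(x, int) for x in value)


data = list(range(1_000_000)) + ["oops"]
print(is_list(data))         # True
print(is_list_of_int(data))  # False, but only after walking the whole list
```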

If you want more complex types you have to resort to phpdoc annotations, which are only useful for offline checkers, like Python's.

When I wrote PHP we used Psalm for offline typechecking. Between that and PhpStorm the runtime checks didn't add a lot of value, and they even made mocking a lot harder at times.

On the other hand I imagine runtime checks are more useful for legacy codebases that are too messy for static analysis. I didn't work on those.

[+] BiteCode_dev|6 years ago|reply

The fact you can rely on every API to allow duck typing is a core feature of the language. It has pros and cons, of course, like all features. But it is not an accident, it's by design.

And type hints have been designed with this in mind. It will stay that way.

If this feature makes you unhappy, Python will make you unhappy. It's important to know which values matter to you and choose your toolbox accordingly.

[+] memco|6 years ago|reply

I wrestle with this too, but I think it can be useful if you understand the limitations and usefulness of it for yourself. I know that the type hints are a way for me to think more carefully about what I want the functions to do and how I'd like them to respond. I'm not expecting the type checker to enforce correctness or optimize better, I'm simply hoping that the tools will help me write the best code I can for my uses. I think it's analogous to a writer working with an editor: the writer is allowed to write whatever they want, but the editor's job is to analyze it and provide suggestions to make it better. The writer has to decide what to do with those suggestions. Sometimes hinting gets in the way and I hate it, but sometimes it helps me. I try to use the help and ignore the interruptions.

[+] jdormit|6 years ago|reply

Check out Pydantic: https://pydantic-docs.helpmanual.io/. It lets you define model classes and enforces the type hints of the models' fields at runtime. Plus you get free serialization/deserialization to dicts, JSON, or arbitrary Python classes. It's _really_ nice.
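
For readers who have not seen the idea, here is a stdlib-only sketch of runtime field checking. It is not Pydantic's API, just a rough stand-in under the assumption that every field is annotated with a plain class; the `User` model is invented for the example.

```python
from dataclasses import dataclass
from typing import get_type_hints


@dataclass
class User:
    id: int
    name: str

    def __post_init__(self) -> None:
        # Compare each field's value with its annotated type.
        # Only plain classes are handled here, not generics such as list[int].
        for field_name, expected in get_type_hints(type(self)).items():
            value = getattr(self, field_name)
            if not isinstance(value, expected):
                raise TypeError(
                    f"{field_name} must be {expected.__name__}, "
                    f"got {type(value).__name__}"
                )


print(User(id=1, name="Ada"))
try:
    User(id="1", name="Ada")
except TypeError as exc:
    print(exc)  # id must be int, got str
```

A real library covers far more (nested models, generic containers, conversion of input data), which is the reason to reach for one instead of hand-rolling checks like these.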

[+] yodsanklai|6 years ago|reply

> There’s nothing to stop calling functions with incorrect datatypes at runtime

You can enforce typechecking of your code on the CI with a tool like `mypy` if you want additional safety. But you'll never get the same guarantees you'd get in OCaml or Rust. It's a tradeoff between flexibility and safety.

[+] Twirrim|6 years ago|reply

There's actually value beyond them being just IDE hints. They can help expose mixed type operations (e.g. foo = int + float) and other places where you may be unintentionally casting between types. Eliminating those can result in notable performance improvements, particularly in tight loops.

[+] londt8|6 years ago|reply

You can use strict types and enforce that all types are correct during runtime. Check out mypy with the "--disallow-any-expr" flag and Pydantic.

[+] raverbashing|6 years ago|reply

If you like Java go program in Java.

Python is not Java. It does not enforce type annotations by design. It's not a weakness.

[+] heavenlyblue|6 years ago|reply

What are your thoughts on purely static tools for that in Python?

Basically my latest project that I was thinking about is to write a symbolic executor for python that would simply preserve all of the required-type information for all of the execution paths of the program.

Also the other project is to write a tool that allows you to walk all of the possible executions of a given piece of code to find which functions are being called from it - incredibly useful for refactoring of the old code.
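
A much simpler static approximation of that second idea can be sketched with the standard `ast` module. It assumes that only calls written directly in the function body matter and ignores anything resolved dynamically; `SOURCE` and `called_names` are made-up names for the example.

```python
import ast

SOURCE = '''
def helper(x):
    return x + 1

def main(values):
    total = sum(helper(v) for v in values)
    print(total)
    return total
'''


def called_names(source: str, function_name: str) -> set:
    """Return the names called directly inside a top-level function."""
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.name == function_name:
            calls = set()
            for child in ast.walk(node):
                if isinstance(child, ast.Call):
                    func = child.func
                    if isinstance(func, ast.Name):
                        calls.add(func.id)      # plain call: helper(...)
                    elif isinstance(func, ast.Attribute):
                        calls.add(func.attr)    # method call: obj.method(...)
            return calls
    raise ValueError(f"no top-level function named {function_name!r}")


print(sorted(called_names(SOURCE, "main")))  # ['helper', 'print', 'sum']
```

It does not follow calls transitively and cannot see through `getattr`, decorators, or monkey-patching, which is where symbolic execution would have to take over.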

Does something like this exist already that you've used and found useful?

[+] BiteCode_dev|6 years ago|reply

I'm not sure, but maybe you can have a look at the following projects to have an idea of prior art:

- jedi: https://github.com/davidhalter/jedi

- nuitka: https://nuitka.net/

- python language server: https://github.com/palantir/python-language-server

- pyright: https://github.com/microsoft/pyright

- pyre: https://pyre-check.org/

- and of course mypy: http://mypy-lang.org/

I don't think any of them qualify, but they may all have little pieces.

Also, if you need to analyze Python syntax, baron will help: https://pypi.org/project/baron/

Good luck.

[+] chrisseaton|6 years ago|reply

> simply preserve all of the required-type information for all of the execution paths of the program

A type-flow analysis? Due to Python language features such as meta-programming and runtime code execution, you may find that at the end of this all your reported types will come out as 'Any' and the tool will be useless!

[+] tln|6 years ago|reply

I wrote something like this. It used the bytecode, simulating types on the stack.

Functions with annotations had to be called with compatible types, functions without annotations were symbolically executed for each call site. Constructors were special cased so that the object attributes were set up correctly. Types for a stack slot or local could diverge and be narrowed depending on the path through the program.

The amount of metaprogramming used in practice in python made using it for real programs feel insurmountable. Also, I'm pretty sure performance would have been a problem beyond the toy programs I got it working on. But it was a fun exercise.

I'll have to dig up that code

[+] Epskampie|6 years ago|reply

I think I’ve seen several of these now for python, seems very useful.

I hope a good one will be made soon for javascript/typescript.

Or maybe it’s needed less there because of TypeScript's type inference?

[+] smt88|6 years ago|reply

You’re correct that TS doesn’t need it due to its excellent type inference

[+] anentropic|6 years ago|reply

I will definitely check this out!

Very similar is https://google.github.io/pytype/ which also infers and adds annotations, as well as providing a type checker which in some ways works better than mypy (recursive types!)

[+] garrettgrimsley|6 years ago|reply

Previous discussion: https://news.ycombinator.com/item?id=15982390

[+] FpUser|6 years ago|reply

Not Python exactly, but I have this situation: I am writing a business server in C++ that exposes a JSON-based RPC API, plus a JavaScript client library to access said API.

I have no idea where that claim about the high productivity of these free-form languages came from. In C++, in addition to the nice IntelliSense, the compiler checks everything before I send the program to debug. So I do not need to do manual testing at runtime to discover a typo in my code.

In JavaScript I can write literally any crap and the system lets it go. IntelliSense sucks big time as well. As a result of this and the usual typos, I have to literally go and do a lot of clicking to see if this part blows up at runtime.

So all in all I'd say productivity in C++ is way better. And C++ is not a stellar example of an easy language. Actually it is easy for me. Not because I am a super-duper programmer, but because basic coding with the help of the standard library is a piece of cake. I do not really need to be an architecture astronaut and dive into any esoteric stuff. It's simply not needed.

[+] gcthomas|6 years ago|reply

To answer your question, Python is several times more expressive than C++ given that it is a higher-level language, and since fewer lines are needed for a given program it is likely to be quicker to write.

To quote Martelli from Google talking about how the YouTube startup outran Google Video's might so much that they had to buy them:

"Eventually we bought that little start up and we found how 20 developers ran circles around our hundreds of great developers. The solution was simple! Those 20 guys were using Python. We were using C++. So, that was YouTube and still is." [1]

Google took months to copy new YouTube features, while YouTube could catch up in a week after Google innovated.

[1] Google Books, search term 'python interviews google vs youtube'

[+] plafl|6 years ago|reply

Awesome. I will try this next Monday on a code base at work, I hope it will improve readability. I admit I'm a little skeptical about the adoption of optional typing if it's not going to affect performance but maybe this will change my mind.

[+] mkchoi212|6 years ago|reply

Haven’t been involved in the Python community for a while, but isn’t this already a thing? Really cool project, but I thought I saw that something like this already existed. If I’m right, is there anything that sets this apart from its counterparts?

[+] Browun|6 years ago|reply

What's the benefit of this over simply adding the type annotations directly? I guess this is mostly for those unwilling to understand types? Especially given the admitted limitations of inferring types, such as the add example discussed, this seems to be fixing an anti-pattern problem. Those who would build a project in Python that would largely benefit from these annotations would be best suited to just spend the couple of hours needed to truly apply them themselves.

[+] BiteCode_dev|6 years ago|reply

One of the core requirements of the type hints design is that they don't affect the Python language.

You can still write python programs without them.

You can add them only partially.

You can also add them later.

Python stays Python.

MonkeyType lets you take a code base that is untyped, or partially typed, for historical reasons or because it was not worth it at the time, and turn it into a fully typed project quickly.

It makes the transition between the exploration phase into the industrial phase much easier.

For me, it's kinda fantastic to be able to fiddle with a design, changing my mind again and again, without having to fight the type system, then once things are settled, add an additional safety net on top.

It's also very reassuring knowing I can just hack things, knowing that later on, if I want to take it to the next level, I have the option to change my mind about type hints.

[+] thelastbender12|6 years ago|reply

I used it some time back to bootstrap type annotations for a web API, with pretty reasonable results. Though the codebase was admittedly small.

[+] resoluteteeth|6 years ago|reply

Have you actually spent time adding type annotations to existing python code? In my experience it's a huge pain compared to other languages. Documentation is sloppy about types because people aren't used to worrying about them, so even looking at the code it's really hard to even determine the actual types of things without just running it and poking around.

[+] lmeyerov|6 years ago|reply

Is there any level of growing community consensus on a types/contracts-for-python effort?

[+] asplake|6 years ago|reply

Nice! I see some Django-related FAQs. Anyone used this with Flask?

[+] Bnshsysjab|6 years ago|reply

It should be relatively straightforward: it deduces typing based on your unit tests and CI.
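
The general idea of deducing types from a real run can be sketched with the standard `sys.setprofile` hook. This is not MonkeyType's code, only a toy illustration; `observed`, `record_call`, and `add` are made-up names.

```python
import sys
from collections import defaultdict

# Maps function name -> argument name -> set of observed type names.
observed = defaultdict(lambda: defaultdict(set))


def record_call(frame, event, arg):
    # The hook fires for several event kinds; only Python-level calls matter here.
    if event != "call":
        return
    code = frame.f_code
    for name in code.co_varnames[:code.co_argcount]:
        if name in frame.f_locals:
            observed[code.co_name][name].add(type(frame.f_locals[name]).__name__)


def add(a, b):
    return a + b


sys.setprofile(record_call)
try:
    # Stand-in for "run the test suite".
    add(1, 2)
    add("x", "y")
finally:
    sys.setprofile(None)

print({name: sorted(types) for name, types in observed["add"].items()})
# {'a': ['int', 'str'], 'b': ['int', 'str']}
```

The recorded types are only as complete as the calls that happened to run: `add` also works for floats or lists, but nothing here would suggest that, so generated annotations still need a human review.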

[+] rajbiswas125|6 years ago|reply

Would python finally become statically typed?

[+] BiteCode_dev|6 years ago|reply

No. It's been explicitly stated that Python will:

- never become statically typed

- never make the type hints mandatory

- never enforce the type hints at run time

Python is jokingly said to be "the second best language for most things". To achieve this, it has to be decent for many different purposes, such as setting up machine learning models, scripting GIS systems, doing test automation, powering a web site, provisioning Linux servers, scraping web pages, batch renaming files, teaching code, analyzing numbers from an HDF5 file, or being the client to a database/FTP/SSH/IMAP system.

It means it can't just be purely functional/OOP/imperative; it must borrow from each paradigm. It can't be strictly statically typed or strictly untyped; it has to offer features from both. Etc.

Because you don't want type hints when you are in Jupyter or scripting your Linux box. But you may want them when your SaaS reaches 3 million lines.

55 comments