This is really nicely written; kudos to the author for compiling a great deal of information in a readable format.
If I can be forgiven one nitpick: Poetry does not use a PEP 518-style[1] build configuration by default, which means that its use of `pyproject.toml` is slightly out of pace with the rest of the Python packaging ecosystem. That isn't to say that it isn't excellent, because it is! But the standards have come a long way, and you can now use `pyproject.toml` with any build backend as long as you use the standard metadata.
By way of example, here's a project that's completely PEP 517 and PEP 518 compatible without needing a setup.py or setup.cfg[2]. Everything goes through pyproject.toml.
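For readers who haven't seen one, here's a minimal sketch of what such a file looks like (names and versions are illustrative, not taken from the linked project):

```toml
# [build-system] is the PEP 517/518 part: it tells any frontend (pip,
# build, ...) which backend builds the project.
[build-system]
requires = ["flit_core>=3.2,<4"]
build-backend = "flit_core.buildapi"

# [project] is the standardized metadata (PEP 621).
[project]
name = "example-pkg"
version = "0.1.0"
description = "A package built without setup.py or setup.cfg"
requires-python = ">=3.8"
dependencies = ["requests>=2.0"]
```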
I believe that Poetry does conform to PEP 518 (i.e. it specifies `requires`/`build-backend` under `[build-system]`), but not to the `dependencies` part of PEP 621 [1]. There are plans for this in the future, though [2]. But I would defer to your expertise if I'm mistaken.
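Concretely, a Poetry-generated `pyproject.toml` looks roughly like this (a sketch, with illustrative names): the `[build-system]` table is standard PEP 518, while the dependencies live in Poetry's own `[tool.poetry]` tables rather than the PEP 621 `[project]` table:

```toml
# Standard PEP 518 build configuration:
[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

# Poetry-specific metadata and dependencies, outside the standard
# [project] table:
[tool.poetry]
name = "example-pkg"
version = "0.1.0"
description = ""

[tool.poetry.dependencies]
python = "^3.9"
requests = "^2.0"
```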
> By way of example, here's a project that's completely PEP 517 and PEP 518 compatible without needing a setup.py or setup.cfg[2]. Everything goes through pyproject.toml.
Using pyproject.toml with pip / flit still has many rough edges, such as pip being unable to install deps locally for development, and the lack of lock file generation. Poetry is way more mature IMO.
We start with learning that we absolutely need this Poetry thing because… it's what everyone else uses. It's refreshing to see an author who can skip the usual badly argued justifications and just plainly admit that he does not know shit and is just following the rest of the herd.
Then we continue by "solving" dependencies in the usual way: ignoring them and just freezing whatever happens to be present.
Then there is the inevitable firing up of virtualenv, because that's just what you have to do when dealing with messed-up dependencies.
The next one is new to me. Apparently, one does not just set up git hooks nowadays, but uses a separate tool with declarative config. Because if you ever happen upon something not covered by the Tool, that would mean you are no longer part of the herd.
Then we push our stuff straight to PyPI, because of course our stuff can't possibly have any dependencies outside of the Python herd ecosystem. It's not like we knew our dependencies anyway.
Then comes the fun part, pulling in tox, because when you have a special tool to handle dependencies, what you really need is another tool with a different environment and dependency model.
The code quality section I will just skip over; seeing what passes for code quality these days makes me too sad.
What follows is the setup of several proprietary services that modern open source seemingly can't exist without.
More interesting is the "tidying up" by moving code from the git root to a subdir. Now, this is of course a perfectly sensible thing to do, but I wonder why it is called 'src'? Maybe some herd member saw a compiled language somewhere and picked it up without understanding the difference between a compiled binary and source code?
Now don't take this as if I have a problem with the article content in itself. No, as a primer on modern Python packaging it's great. It's not the author's fault that his work is so comprehensive it lays bare all the idiosyncrasies, herd mentality, cargo-cultism and general laziness of the Python ecosystem these days. Or is it?
pip freeze doesn't pin transitive dependencies, so you have to pick something, and Poetry is fine and actively developed.
> virtualenv, because that's just what you have to do when dealing with messed up dependencies
No that's what you do when you have multiple dependency trees for different projects on your system. Somehow people got the message that global variables were bad but still think that "random bullshit strewn on my specific system" is a great way to make software that works on other people's machines.
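For what it's worth, the per-project setup being defended here is two commands with the standard library alone on Python 3 (no separate virtualenv package needed):

```shell
# Create an isolated environment inside the project directory...
python3 -m venv .venv
# ...and use it for this shell session; packages installed here never
# touch the system Python or any other project's dependency tree.
. .venv/bin/activate
python -c 'import sys; print(sys.prefix)'  # now resolves inside .venv
```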
> Because if you ever happen upon something not covered by the Tool
You write your own hook because it's entirely plugin based.
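For instance, a "local" repo in `.pre-commit-config.yaml` can run any command at all; the hook id and the grep below are made up for illustration:

```yaml
repos:
  - repo: local
    hooks:
      - id: forbid-print          # hypothetical custom hook
        name: forbid stray print() calls
        entry: bash -c '! grep -rn "print(" src/'
        language: system
        types: [python]
```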
> tox, because when you have special tool to handle dependencies, what you just need
A tool that doesn't pollute your development environment with testing packages and doesn't run your tests in your development environment, hygiene that before this tool basically nobody bothered to do because it was tedious.
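The isolation being described is visible even in a minimal tox.ini sketch: each listed env gets its own throwaway virtualenv with only the declared deps installed:

```ini
[tox]
envlist = py310, py311, py312

[testenv]
# Installed into tox's isolated env, never into your dev environment:
deps = pytest
commands = pytest {posargs}
```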
Jeez, I skimmed the article, and saw what I assume to be a comprehensive but basic primer on modern packaging, like you say. But I also inferred that the author is probably a newer programmer, with only a few years of experience. He's learning about tools and best practices in an accessible language and having fun sharing knowledge through his blog.
The sentiment behind your comments is shared, but I don't see the need to sarcastically rant about it and rail against all the suggestions OP made.
If anything, I'm surprised someone with more experience didn't see the post for what it is, and attacking someone's post like this just shows immaturity when you could have easily taken those opinions and formed a constructive argument or given good advice.
I had the same reaction. Not much explaining, no justification for the tooling decisions, pushing more undocumented code to pypi because why not, "I saw this other package do this", etc.
I guess it's great if you're just looking for a shortcut to push something up to pypi, but my guess is someone new to it won't really understand what's going on other than some vague sense that they're following "best practices".
And then I imagine that same person will go on to write another article like this, and on and on we go!
the irony here is python packaging has sucked forever, and this is just another example of it. "Do more with less" has never entered the average python developer's mind.
you'd think herd mentality might help it but it only creates more packaging solutions.
Nowadays, I've stopped using python outside of tiny scripts and I will never touch it for a large project.
I wish people would just forget about pre-commit; this thing is especially useless in a setting where a CI/CD pipeline exists. It's not that hard to write a simple Makefile or shell script to run linters on push.
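A minimal sketch of that alternative, assuming the usual linters (swap in whatever your project actually uses; note that Makefile recipe lines must be tab-indented):

```make
# Run the same checks locally and in CI, no pre-commit required.
.PHONY: lint
lint:
	black --check .
	isort --check-only .
	flake8 .
```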
Pre-commit is one of the most annoying tools to have come into existence in recent years that everyone seems to be cargo-culting. It doesn't play well with editors, since in order to find the actual binary path, you'd have to open up a sqlite database to fish out the virtualenv pre-commit created. Pre-commit also increases maintenance burden, since its configuration is completely separate from your usual requirements-dev.txt/pyproject.toml/setup.cfg etc. If you keep dev dependencies in one of those files because making your editor find the pre-commit-created binaries is hard, now you have to keep both versions in sync.
I really don't see the point of any pre-commit hooks unless you are the one guy that doesn't use a modern CI/CD platform.
Pre-commit is optional; you can just not install the hooks into .git/ ... Although I'd indeed prefer they run just before push, or just display warnings.
One thing that's really annoying these days is CI/CD that can't be replicated locally, generating quite annoying delays in development. Jenkins seems particularly problematic in this regard: the steps get encoded in some cryptic pet Jenkins server, and then you have to wait minutes until an agent picks the job up and reaches the step you actually care about. Other tools are a little quicker, but still...
So, I think at the very least pre-commit hooks help with this "over-reliance" on the CI/CD server. It's so much better DX when you can run parts of the pipeline instantaneously.
Our CI pipelines invoke pre-commit. That way it's trivial to run the exact same tools locally as would be run in CI.
Running the tools locally is basically about tightening the development loop. Many of the commonly used tools (e.g. black, isort etc.) actually make the changes to the files so you'll never even commit failing versions. Do you really want to push changes to some remote CI system only to be told it's failed some boring QA check? There's nothing at all stopping you from doing that. Pre-commit is completely optional for each developer. I would just recommend it for sanity reasons.
Pre-commit running tools in its own virtual environment is a feature, not a bug, in my book -- it means that the dependencies for my linter tools aren't mixed in with the dependencies for the code I'm writing.
And, keeping things separate from setup.cfg or pyproject.toml is optional: the tools still look for configuration in their usual places, so it's still possible to have your black options in pyproject.toml and just a bare-bones entry to call black in your .pre-commit-config.yaml if you prefer.
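That is, the tool config can stay where it always lives; a sketch:

```toml
# pyproject.toml -- black reads its options from here whether you run
# it directly or via a bare pre-commit entry that merely invokes it.
[tool.black]
line-length = 100
target-version = ["py311"]
```

The corresponding pre-commit entry then just names the black hook, with no options of its own.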
Do you have any tutorials for setting up CI/CD? My impression was that's all stuff that runs in the cloud but if it's something I can use on my own personal projects I'd play with it. Frankly a lot of these things become unintelligible. I've used pre-commit for things like black and autopep8 and that's all pretty understandable to me. The CI/CD things I've read all seem like everyone already understands some giant Rube Goldberg contraption that they're strapping on things for some reason that probably matters to giant dev teams.
PyPI is adding support for GitHub OIDC for publishing packages soon, so there will be no need to generate API keys - you can just grant your GitHub Actions permission to publish to PyPI.
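For context, this is roughly what OIDC-based ("trusted") publishing looks like from the workflow side once it's available; a hedged sketch, with the build step and trigger assumed:

```yaml
name: release
on:
  release:
    types: [published]

jobs:
  publish:
    runs-on: ubuntu-latest
    permissions:
      id-token: write       # lets the job request an OIDC token for PyPI
    steps:
      - uses: actions/checkout@v4
      - run: pipx run build
      - uses: pypa/gh-action-pypi-publish@release/v1  # no API key needed
```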
Thank you for linking it! Yes, this will be a huge convenience and security win for the large number of packages that use GitHub to release new versions.
However, it drinks the code coverage Kool-Aid that started like 30 years ago when code coverage tools emerged.
Management types said "high test code coverage == high quality"; let's bean-count that!
A great way to achieve high code coverage is to have less than robust code that does not check for crazy error cases that are really hard to reproduce in test cases.
Code coverage is a tool to help engineers write good tests. One takes the time to look at the results and improve the tests. It is a poor investment to obsess over code coverage on paths where the cost to test them greatly exceeds the value.
10% coverage and 100% are both alarm bells. Don't assume naive, easy to produce metrics are the same as quality code.
Python is the language with one of the highest 100%-coverage-to-effort ratios. The included unittest.mock framework makes it quite easy to trigger obscure errors and ensure they are handled properly.
Combined with thoughtful use of `# pragma: no cover`, 98% code coverage nowadays is an immediate warning that something was rushed. With this and type checking, I find RuntimeErrors much easier to avoid these days.
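As a tiny sketch of the mock-driven error testing described above (the function and the failure are made up for illustration):

```python
from unittest import mock

def read_config(path):
    """Return the file's contents, falling back to a default on I/O errors."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return "default"

# The failure branch would be awkward to hit with a real file, but
# mock.patch makes the obscure error path trivial to exercise:
with mock.patch("builtins.open", side_effect=OSError("disk on fire")):
    result = read_config("settings.cfg")

print(result)  # the fallback path was taken
```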
And typing, not even a mention?! :) But otherwise a great article, thank you!
100% coverage as a side effect of careful testing isn't a red flag.
Coverage is a decent (among other things) measure unless it becomes a target. Once it becomes a target you get shitty rushed tests that act mostly as cement surrounding current behavior - bugs and all.
"How do you create a Python package? How do you set up automated testing and code coverage? How do you publish the package? That's what this article teaches you." — delivered as promised!
We've been fine-ish with classic setup.py/setup.cfg + GHA for publishing to PyPI. But as we do OSS data science (GPU graph AI + viz), where conda is typical nowadays...
... Have to admit: we recently ended up contracting conda packaging out because it was nowhere near clear enough to make sense for our core team to untangle. Would love to see a similar tutorial on GitHub-flow packaging & publishing to conda. Still not convinced we're doing it right for subtleties like optional dependencies: the equivalent of `pip install graphistry` vs `pip install graphistry[ai]` vs `graphistry[umap-learn]`, etc.
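On the pip side, at least, those extras are plain PEP 621 metadata; a sketch with hypothetical dependency lists (not graphistry's actual ones):

```toml
[project]
name = "graphistry"
version = "0.0.0"
dependencies = ["pandas"]

[project.optional-dependencies]
# `pip install graphistry[ai]` pulls these in on top of the base deps:
ai = ["umap-learn", "torch"]
umap-learn = ["umap-learn"]
```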
We do a lot of deep learning and image processing and pip works much better for us. PyTorch makes wheels that contain all required DLLs on all systems. Maybe conda isn't needed anymore.
I found conda-forge community quite helpful here [1]. They make feedstock repositories based on templates that cover a lot of automation. Their bots pickup updated packages in pypi and automatically file merge requests, run tests and even merge updates if tests pass successfully. Basically you only need to maintain your recipe here and there when your dependencies change.
Oh, I see... I personally barely use conda and I have no idea how that is done. I don't think I'll write any blog article like that any time soon :( Maybe you could do it!
This is really nice. The only thing I'm missing here is a simple way to bump versions. Any ideas on how to do that?
For Node, it's quite simple and even built into npm (`npm version patch`). Also, the version lives only in the package.json file. For Python you probably have your version somewhere in __init__.py, and I always end up writing ugly bash scripts that modify multiple places with sed.
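Those sed scripts can at least become a small Python helper; a sketch, assuming the version lives in a single `__version__` string:

```python
import re
from pathlib import Path

def bump_patch(path):
    """Rewrite __version__ = "X.Y.Z" in-place with the patch part + 1."""
    source = Path(path)
    def repl(m):
        major, minor, patch = m.groups()
        return f'__version__ = "{major}.{minor}.{int(patch) + 1}"'
    source.write_text(
        re.sub(r'__version__ = "(\d+)\.(\d+)\.(\d+)"', repl, source.read_text())
    )

# Demo on a throwaway file standing in for a package __init__.py:
Path("demo_init.py").write_text('__version__ = "1.2.3"\n')
bump_patch("demo_init.py")
print(Path("demo_init.py").read_text())
```

(Poetry users get this built in as `poetry version patch`.)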
The title should be: How to create a "Python DISTRIBUTION package".
The term "python package" means something entirely different (or at the very least is ambiguous in a pypi/distribution context).
To add to the confusion, creating a totally normal, runnable python package in a manner that makes it completely self-contained such that it can be "distributed" in a standalone manner, while still being a totally normal boring python package, is also totally possible (if not preferred, in my view).
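One stdlib route to that kind of self-contained distribution is `zipapp`; a sketch that builds and runs a single-file app (package and file names are made up):

```python
import pathlib
import subprocess
import sys
import tempfile
import zipapp

# Lay out a normal, boring package plus a top-level __main__.py ...
app = pathlib.Path(tempfile.mkdtemp()) / "app"
pkg = app / "hello_app"
pkg.mkdir(parents=True)
(pkg / "__init__.py").write_text('def main():\n    print("hello from one file")\n')
(app / "__main__.py").write_text("from hello_app import main\nmain()\n")

# ... and bundle the whole tree into a single runnable archive.
pyz = app.with_name("hello.pyz")
zipapp.create_archive(app, pyz)
result = subprocess.run([sys.executable, str(pyz)], capture_output=True, text=True)
print(result.stdout.strip())
```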
I guess it's a great exercise to set up a repository yourself in this way, but once you have experience with the technologies involved, it's much easier to just use a cookiecutter template [1] to set up your package. Another aspect to consider is that there are often different tools to achieve the same goal, thus, it makes sense to experiment until you've found your perfect package setup.
Excellent points! I have seen several cookiecutter templates, but like you said, those aren't very useful when you are at the very start and everything looks weird and new.
Sigh... I don't see why I need to use a 3rd-party tool for what should be a very straightforward process in Python out of the box. In fact, I think these days it actually is straightforward, of course once you work out what you need to do...
This is almost exactly how I set up python projects; it’s reassuring to see it set out in one place.
I started using tox-poetry-installer[1] to make tox pick up pinned versions from the lock file and reuse the private package index credentials from poetry.
This excellent article references Textualize, which - as I have just found from their website - has a really great approach to job interviews:
https://www.textualize.io/jobs
[I have no ties to this company and have never applied there.]
To that end, I've had good success with `release-please`: https://github.com/googleapis/release-please . It's available as a GitHub Action and works out of the box very easily. It does tagging, publishing, editing the CHANGELOG, creating a release, and more. Whatever you want it to, really, using a bool flag in the CI pipeline that triggers after a release-please release was made, aka merged into main.
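For reference, wiring it up is a short workflow; a sketch along the lines of the project's docs (the inputs shown are assumptions to verify against upstream):

```yaml
name: release-please
on:
  push:
    branches: [main]

jobs:
  release:
    runs-on: ubuntu-latest
    steps:
      - uses: googleapis/release-please-action@v4
        with:
          release-type: python   # keeps version and CHANGELOG in sync
```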
woodruffw | 3 years ago
[1]: https://peps.python.org/pep-0518/
[2]: https://github.com/trailofbits/pip-audit/blob/main/pyproject...
milliams | 3 years ago
[1] https://peps.python.org/pep-0621/
[2] https://github.com/python-poetry/roadmap/issues/3
pid-1 | 3 years ago
RojerGS | 3 years ago
However, I took a look at PEP 518 and failed to understand what was wrong with Poetry's default configuration. Can you help me out?
rmbyrro | 3 years ago
The font size is extremely small to the point of being unreadable on a mobile phone.
radonek | 3 years ago
Spivak | 3 years ago
vonseel | 3 years ago
slingnow | 3 years ago
sigmonsays | 3 years ago
wyuenho | 3 years ago
doliveira | 3 years ago
globular-toast | 3 years ago
carso | 3 years ago
fluidcruft | 3 years ago
captn3m0 | 3 years ago
https://github.com/pypi/warehouse/issues/10619
woodruffw | 3 years ago
remram | 3 years ago
RojerGS | 3 years ago
diekhans | 3 years ago
Otherwise, an excellent article.
zeotroph | 3 years ago
pydry | 3 years ago
bvrmn | 3 years ago
RojerGS | 3 years ago
tony-allan | 3 years ago
lmeyerov | 3 years ago
plonk | 3 years ago
werewolf | 3 years ago
1. https://conda-forge.org/docs/user/introduction.html
RojerGS | 3 years ago
Also, can't believe everyone let me get away with not writing about documentation! I'll see to it that it gets done and added to the article.
slhck | 3 years ago
frumiousirc | 3 years ago
tpoacher | 3 years ago
(shameless plug: https://github.com/tpapastylianou/self-contained-runnable-py... )
davnn | 3 years ago
[1] https://github.com/search?q=python+package+cookiecutter
RojerGS | 3 years ago
zx14 | 3 years ago
Python is a mess.
dannyboland | 3 years ago
[1] https://github.com/enpaul/tox-poetry-installer
dr_kiszonka | 3 years ago
anonymoushn | 3 years ago
d0mine | 3 years ago
RojerGS | 3 years ago
shadycuz | 3 years ago
I don't have a blog post but you can see the process on my personal project https://github.com/DontShaveTheYak/cf2tf
Check out the merged PRs and the GitHub Actions.
I even do alpha releases to Test PyPI.
diarrhea | 3 years ago
Bullfight2Cond | 3 years ago
https://pdm.fming.dev/
iggy_knights | 3 years ago
Another good tool (which was endorsed by the PyPA) is Hatch - https://hatch.pypa.io/latest/environment/
I currently use PDM because it supports conda virtual environments for isolation, but am keeping an eye on Hatch.
nogbit | 3 years ago
fetzu | 3 years ago
RojerGS | 3 years ago