
How to create a Python package in 2022

424 points | kieto | 3 years ago | mathspp.com | reply

145 comments

[+] woodruffw|3 years ago|reply
This is really nicely written; kudos to the author for compiling a great deal of information in a readable format.

If I can be forgiven one nitpick: Poetry does not use a PEP 518-style[1] build configuration by default, which means that its use of `pyproject.toml` is slightly out of step with the rest of the Python packaging ecosystem. That isn't to say that it isn't excellent, because it is! But the standards have come a long way, and you can now use `pyproject.toml` with any build backend as long as you use the standard metadata.

By way of example, here's a project that's completely PEP 517 and PEP 518 compatible without needing a setup.py or setup.cfg[2]. Everything goes through pyproject.toml.

[1]: https://peps.python.org/pep-0518/

[2]: https://github.com/trailofbits/pip-audit/blob/main/pyproject...
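For illustration, a minimal fully standards-based `pyproject.toml` might look like the sketch below (the backend choice and all metadata values are made up for the example, not taken from the linked project):

```toml
[build-system]  # PEP 517/518: declare the build backend and its requirements
requires = ["flit_core>=3.2,<4"]
build-backend = "flit_core.buildapi"

[project]  # PEP 621: standard metadata that any compliant backend understands
name = "example-package"
version = "0.1.0"
description = "Example of standard pyproject.toml metadata"
requires-python = ">=3.8"
dependencies = ["requests>=2.0"]
```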

[+] pid-1|3 years ago|reply
> By way of example, here's a project that's completely PEP 517 and PEP 518 compatible without needing a setup.py or setup.cfg[2]. Everything goes through pyproject.toml.

Using pyproject.toml with pip / flit still has many rough edges, such as pip being unable to install deps locally for development, or the lack of lock file generation. Poetry is way more mature IMO.

[+] RojerGS|3 years ago|reply
Your nitpick is forgiven! Thanks a lot for this information, I was not aware of this...

However, I took a look at PEP 518 and failed to understand what was wrong with Poetry's default configuration. Can you help me out?

[+] rmbyrro|3 years ago|reply
If I can be forgiven another nitpick regarding readability:

The font size is extremely small to the point of being unreadable on a mobile phone.

[+] radonek|3 years ago|reply
What a great article.

We start with learning that we absolutely need this Poetry thing because… it's what everyone else uses. It's refreshing to see an author who can skip the usual badly argued justifications and just plainly admit that he does not know shit and is just following the rest of the herd. Then we continue by "solving" dependencies in the usual way: ignoring them and just freezing whatever happens to be present.

Then there is the inevitable firing up of virtualenv, because that's just what you have to do when dealing with messed-up dependencies.

Next one is new to me. Apparently, one does not just set up git hooks nowadays but uses a separate tool with declarative config. Because if you ever happen upon something not covered by the Tool, that would mean you are no longer part of the herd.

Then we push our stuff straight to PyPI, because of course our stuff can't possibly have any dependencies outside of the Python herd ecosystem. It's not like we knew our dependencies anyway.

Then comes the fun part, pulling in tox, because when you have a special tool to handle dependencies, what you really need is another tool with a different environment and dependency model.

The code quality section I will just skip over; seeing what passes for code quality these days makes me too sad. What follows is the setup of several proprietary services that modern open source seemingly can't exist without. What is more interesting is "tidying up" by moving code from the git root to a subdir. Now, this is of course a perfectly sensible thing to do, but I wonder why it is called 'src'? Maybe some herd member saw a compiled language somewhere and picked it up without understanding the difference between a compiled binary and source code?

Now don't take this as if I have a problem with the article content in itself. No, as a primer to modern Python packaging it's great. It's not the author's fault that his work is so comprehensive it lays bare all the idiosyncrasies, herd mentality, cargo-cultism and general laziness of the Python ecosystem these days. Or is it?

[+] Spivak|3 years ago|reply
> Poetry thing because…

pip freeze doesn't distinguish your direct dependencies from the transitive ones, so you have to pick something, and Poetry is fine and actively developed.

> virtualenv, because that's just what you have to do when dealing with messed up dependencies

No, that's what you do when you have multiple dependency trees for different projects on your system. Somehow people got the message that global variables were bad but still think that "random bullshit strewn on my specific system" is a great way to make software that works on other people's machines.

> Because if you ever happen upon something not covered by the Tool

You write your own hook because it's entirely plugin based.

> tox, because when you have special tool to handle dependencies, what you just need

A tool that doesn't pollute your development environment with testing packages and doesn't run your tests in your development environment: hygiene that, before this tool, basically nobody bothered with because it was tedious.
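On the "write your own hook" point above: a custom check can be declared as a `local` repo in `.pre-commit-config.yaml`; a minimal sketch (the hook id and script path below are hypothetical):

```yaml
repos:
  - repo: local
    hooks:
      - id: my-custom-check
        name: run my custom check
        entry: ./scripts/my_check.sh  # any executable checked into your repo
        language: script
        files: \.py$
```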

[+] vonseel|3 years ago|reply
Jeez, I skimmed the article, and saw what I assume to be a comprehensive but basic primer on modern packaging, like you say. But I also inferred that the author is probably a newer programmer, with only a few years of experience. He's learning about tools and best practices in an accessible language and having fun sharing knowledge through his blog.

The sentiment behind your comments is shared, but I don't see the need to sarcastically rant about it and rail against all the suggestions OP made.

If anything, I'm surprised someone with more experience didn't see the post for what it is. Attacking someone's post like this just shows immaturity, when you could have easily taken those opinions and formed a constructive argument or given good advice.

[+] slingnow|3 years ago|reply
I had the same reaction. Not much explaining, no justification for the tooling decisions, pushing more undocumented code to pypi because why not, "I saw this other package do this", etc.

I guess it's great if you're just looking for a shortcut to push something up to pypi, but my guess is someone new to it won't really understand what's going on other than some vague sense that they're following "best practices".

And then I imagine that same person will go on to write another article like this, and on and on we go!

[+] sigmonsays|3 years ago|reply
The irony here is that Python packaging has sucked forever, and this is just another example of it. "Do more with less" has never entered the average Python developer's mind.

you'd think herd mentality might help it but it only creates more packaging solutions.

Nowadays, I've stopped using Python outside of tiny scripts and I will never touch it for a large project.

[+] wyuenho|3 years ago|reply
I wish people would just forget about pre-commit; this thing is especially useless in a setting where a CI/CD pipeline exists. It's not that hard to write a simple Makefile or shell script to run linters on push.

Pre-commit is one of the most annoying tools to have come into existence in recent years, and everyone seems to be cargo-culting it. It doesn't play well with editors, since in order to find the actual binary path you'd have to open up a sqlite database to fish out the virtualenv pre-commit created. Pre-commit also increases the maintenance burden, since its configuration is completely separate from your usual requirements-dev.txt/pyproject.toml/setup.cfg etc. If you have dev dependencies in one of these files because making your editor find the pre-commit-created binaries is hard, now you have to keep both versions in sync.

I really don't see the point of any pre-commit hooks unless you are the one guy that doesn't use a modern CI/CD platform.
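For reference, the Makefile approach described above can be as small as the sketch below (the linter names and targets are examples, not prescriptions):

```make
# Sketch of a lint target; swap in whatever linters you actually use.
lint:
	black --check .
	isort --check-only .
	flake8 .

# Optionally wire it up as a plain git hook, no extra tooling required:
install-hooks:
	printf '#!/bin/sh\nmake lint\n' > .git/hooks/pre-push
	chmod +x .git/hooks/pre-push
```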

[+] doliveira|3 years ago|reply
Pre-commit is optional; you can just not install the hooks into .git/ ... Although I'd indeed prefer running them just before push, or having them merely display warnings.

One thing that's really annoying these days is CI/CD that can't be replicated locally, generating quite annoying delays in development. Jenkins seems particularly problematic in this regard: the steps get encoded in some cryptic pet Jenkins server, and then you have to wait minutes until an agent picks the job up and reaches the step you actually care about. Other tools are a little quicker, but still...

So, I think at the very least pre-commit hooks help with this "over-reliance" on the CI/CD server. It's so much better DX when you can run parts of the pipeline instantaneously.

[+] globular-toast|3 years ago|reply
Our CI pipelines invoke pre-commit. That way it's trivial to run the exact same tools locally as would be run in CI.

Running the tools locally is basically about tightening the development loop. Many of the commonly used tools (e.g. black, isort etc.) actually make the changes to the files so you'll never even commit failing versions. Do you really want to push changes to some remote CI system only to be told it's failed some boring QA check? There's nothing at all stopping you from doing that. Pre-commit is completely optional for each developer. I would just recommend it for sanity reasons.
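As a sketch, "CI invokes pre-commit" can be a three-step GitHub Actions job like the one below (workflow name and action versions are illustrative):

```yaml
name: lint
on: [push, pull_request]
jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
      - uses: pre-commit/action@v3.0.0  # runs `pre-commit run --all-files`
```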

[+] carso|3 years ago|reply
Pre-commit running tools in its own virtual environment is a feature, not a bug, in my book -- it means that the dependencies for my linter tools aren't mixed in with the dependencies for the code I'm writing.

And keeping things separate from setup.cfg or pyproject.toml is optional: the tools still look for configuration in their usual places, so it's still possible to have your black options in pyproject.toml and just a bare-bones entry to call black in your .pre-commit-config.yaml file if you prefer.
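E.g., the pre-commit entry stays minimal while black's real settings live in pyproject.toml; a sketch (the `rev` pin is illustrative):

```yaml
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/psf/black
    rev: 22.6.0
    hooks:
      - id: black  # reads [tool.black] (e.g. line-length) from pyproject.toml
```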

[+] fluidcruft|3 years ago|reply
Do you have any tutorials for setting up CI/CD? My impression was that's all stuff that runs in the cloud, but if it's something I can use on my own personal projects I'd play with it. Frankly, a lot of these things become unintelligible. I've used pre-commit for things like black and autopep8 and that's all pretty understandable to me. The CI/CD things I've read all seem to assume everyone already understands some giant Rube Goldberg contraption that they're strapping onto things for some reason that probably matters to giant dev teams.
[+] captn3m0|3 years ago|reply
PyPI is adding support for GitHub OIDC for publishing packages soon, so there will be no need to generate API keys - you can just grant your GitHub Actions permissions to publish to PyPI.

https://github.com/pypi/warehouse/issues/10619

[+] woodruffw|3 years ago|reply
Hey, that’s my issue :-)

Thank you for linking it! Yes, this will be a huge convenience and security win for the large number of packages that use GitHub to release new versions.

[+] remram|3 years ago|reply
Surely you mean OAuth2? I really hope you mean OAuth2.
[+] RojerGS|3 years ago|reply
Oh, this would be neat! Looking forward to it!
[+] diekhans|3 years ago|reply
Lot of good info and saved away!

However, it drinks the code coverage Kool-Aid that started like 30 years ago when code coverage tools emerged.

Management types said "high test code coverage == high quality"; let's bean-count that!!

A great way to achieve high code coverage is to have less than robust code that does not check for crazy error cases that are really hard to reproduce in test cases.

Code coverage is a tool to help engineers write good tests. One takes the time to look at the results and improve the tests. It is a poor investment to be obsessed with code coverage on paths where the cost to test them greatly exceeds the value.

10% coverage and 100% are both alarm bells. Don't assume naive, easy-to-produce metrics are the same as quality code.

Otherwise, an excellent article.

[+] zeotroph|3 years ago|reply
Python is the language with one of the highest 100%-coverage-to-effort ratios. The included unittest.mock framework makes it quite easy to trigger obscure errors and ensure they are handled properly.

Combined with thoughtful use of `# pragma: no cover`, 98% code coverage nowadays is an immediate warning that something was rushed. With this and type checking, I find RuntimeErrors much easier to avoid these days.

And typing, not even a mention?! :) But otherwise a great article, thank you!
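To make the mock-plus-pragma point concrete, here's a minimal sketch (the function and test are hypothetical, not from the article): an error branch that is hard to hit for real is exercised via unittest.mock, while a genuinely untestable line is excluded from coverage.

```python
import json
from unittest import mock


def read_config(path):
    """Load a JSON config file, treating I/O failures as 'no config'."""
    try:
        with open(path) as f:
            return json.load(f)
    except OSError:
        # Obscure in practice (permissions, disk errors), trivial to mock.
        return {}
    except KeyboardInterrupt:  # pragma: no cover
        raise


def test_read_config_handles_io_error():
    # Force the error path without touching the filesystem.
    with mock.patch("builtins.open", side_effect=OSError("boom")):
        assert read_config("settings.json") == {}
```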

[+] pydry|3 years ago|reply
100% coverage as a side effect of careful testing isn't a red flag.

Coverage is a decent measure (among other things) unless it becomes a target. Once it becomes a target you get shitty rushed tests that act mostly as cement surrounding current behavior - bugs and all.

[+] bvrmn|3 years ago|reply
I guess you don't have much experience with Python and how easy it is to get 100% coverage in it.
[+] RojerGS|3 years ago|reply
Thanks for the words of caution! For such a small package, I think 100% code coverage isn't necessarily a bad thing yet :P But you raise valid points!
[+] tony-allan|3 years ago|reply
"How do you create a Python package? How do you set up automated testing and code coverage? How do you publish the package? That's what this article teaches you." — delivered as promised!
[+] lmeyerov|3 years ago|reply
We've been fine-ish with classic setup.py/setup.cfg + GHA for publishing to PyPI. But as we do OSS data science (GPU graph AI + viz), where conda is typical nowadays...

... Have to admit: we recently ended up contracting the conda packaging out because it was nowhere near clear enough to make sense for our core team to untangle. Would love to see a similar tutorial on a GitHub-flow packaging & publishing setup for conda. Still not convinced we're doing it right for subtleties like optional dependencies: the equivalent of `pip install graphistry` vs `pip install graphistry[ai]` vs `graphistry[umap-learn]`, etc.
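On the pip side, those extras map onto standard metadata; a sketch under PEP 621 conventions (the dependency lists here are invented for illustration, not graphistry's actual setup):

```toml
[project]
name = "graphistry"
version = "0.0.0"

# `pip install graphistry[ai]` pulls in the extra's dependency list:
[project.optional-dependencies]
ai = ["scikit-learn", "torch"]
umap-learn = ["umap-learn"]
```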

[+] plonk|3 years ago|reply
We do a lot of deep learning and image processing and pip works much better for us. PyTorch makes wheels that contain all required DLLs on all systems. Maybe conda isn't needed anymore.
[+] werewolf|3 years ago|reply
I found the conda-forge community quite helpful here [1]. They make feedstock repositories based on templates that cover a lot of automation. Their bots pick up updated packages on PyPI and automatically file merge requests, run tests, and even merge updates if the tests pass. Basically you only need to maintain your recipe here and there when your dependencies change.

1. https://conda-forge.org/docs/user/introduction.html

[+] RojerGS|3 years ago|reply
Oh, I see... I personally barely use conda and have no idea how that is done. I don't think I'll write any blog article like that any time soon :( Maybe you could do it!
[+] RojerGS|3 years ago|reply
Hey, original author here. Thanks a lot for sharing this!

Also, can't believe everyone let me get away with not writing about documentation! I'll see to it that it gets done and added to the article.

[+] slhck|3 years ago|reply
This is really nice. The only thing I'm missing here is a simple way to bump versions. Any ideas on how to do that?

For Node, it's quite simple and even built into npm. Also, the version lives in just one place, the package.json file. For Python you probably have your version somewhere in __init__.py as well, and I always end up writing ugly bash scripts that modify multiple places with sed.
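A minimal sketch of what such a bump helper can look like in Python instead of sed (function names and the `__version__` convention are illustrative):

```python
import re


def bump_patch(version: str) -> str:
    """Increment the patch part of a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


def bump_version_line(source: str) -> str:
    """Rewrite a `__version__ = "X.Y.Z"` line inside a file's contents."""
    return re.sub(
        r'(__version__\s*=\s*")(\d+\.\d+\.\d+)(")',
        lambda m: m.group(1) + bump_patch(m.group(2)) + m.group(3),
        source,
    )
```

(Poetry users can also run `poetry version patch`, which edits the version in pyproject.toml for you.)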

[+] frumiousirc|3 years ago|reply
A very nice post. Consider adding a prominent RSS/atom feed for your blog. My lack of finding one means I won't easily catch any future posts.
[+] tpoacher|3 years ago|reply
The title should be: How to create a "Python DISTRIBUTION package".

The term "python package" means something entirely different (or at the very least is ambiguous in a pypi/distribution context).

To add to the confusion, creating a totally normal, runnable python package in a manner that makes it completely self-contained such that it can be "distributed" in a standalone manner, while still being a totally normal boring python package, is also totally possible (if not preferred, in my view).

(shameless plug: https://github.com/tpapastylianou/self-contained-runnable-py... )

[+] davnn|3 years ago|reply
I guess it's a great exercise to set up a repository yourself in this way, but once you have experience with the technologies involved, it's much easier to just use a cookiecutter template [1] to set up your package. Another aspect to consider is that there are often different tools to achieve the same goal; thus, it makes sense to experiment until you've found your perfect package setup.

[1] https://github.com/search?q=python+package+cookiecutter

[+] RojerGS|3 years ago|reply
Excellent points! I have seen several cookiecutter templates, but like you said, those aren't very useful when you are at the very start and everything looks weird and new.
[+] zx14|3 years ago|reply
Sigh, I don't see why I need to use a 3rd-party tool for what should be a very straightforward process in Python out of the box. In fact, I think these days it actually is straightforward, of course once you work out what you need to do...

Python is a mess.

[+] dannyboland|3 years ago|reply
This is almost exactly how I set up python projects; it’s reassuring to see it set out in one place.

I started using tox-poetry-installer[1] to make tox pick up pinned versions from the lock file and reuse the private package index credentials from poetry.

[1] https://github.com/enpaul/tox-poetry-installer

[+] dr_kiszonka|3 years ago|reply
This excellent article references Textualize, which - as I have just found from their website - has a really great approach to job interviews: https://www.textualize.io/jobs

[I have no ties to this company and have never applied there.]

[+] anonymoushn|3 years ago|reply
It's impressive that this involves 3 human-readable configuration languages and 2 markup languages
[+] d0mine|3 years ago|reply
You can use pyproject.toml for the tox configuration too:

    [tool.tox]
    legacy_tox_ini = """<tox.ini content here>"""
[+] RojerGS|3 years ago|reply
Yeah, but that looks a bit odd and you lose the syntax highlighting... I considered that but ended up going with a tox.ini file.
[+] shadycuz|3 years ago|reply
You can take this a step further and completely automate the release of your package. That means the tagging, publishing and the GitHub release notes.

I don't have a blog post but you can see the process on my personal project https://github.com/DontShaveTheYak/cf2tf

Check out the merged PRs and the GitHub Actions.

I even do alpha releases to TestPyPI.

[+] diarrhea|3 years ago|reply
To that end, I've had good success with `release-please`: https://github.com/googleapis/release-please . It's available as a GitHub Action and works out of the box very easily. It does tagging, publishing, editing the CHANGELOG, creating a release, and more. Whatever you want it to, really, using a boolean flag in the CI pipeline that triggers after a release-please release was made, aka merged into main.
[+] Bullfight2Cond|3 years ago|reply
Poetry uses non-standard dependency specification formats. PDM is like Poetry but faster/more standards compliant.

https://pdm.fming.dev/

[+] iggy_knights|3 years ago|reply
The good thing about finally converging on some sort of standard is that tools become more interoperable:

Another good tool (which was endorsed by the PyPA) is Hatch - https://hatch.pypa.io/latest/environment/

I currently use PDM because it supports conda virtual environments for isolation, but am keeping an eye on Hatch.

[+] nogbit|3 years ago|reply
Fantastic article. A clickable TOC at the top would be a great addition.
[+] fetzu|3 years ago|reply
Excellent and very informative post, thank you very much!
[+] RojerGS|3 years ago|reply
Thanks for the nice words. Was there anything that was unclear or that you think was missing?