Show HN: Wetlands – a lightweight Python library for managing Conda environments
30 points | arthursw | 9 months ago | arthursw.github.io
Wetlands not only simplifies the creation of isolated Conda environments with specific dependencies, but also allows you to run arbitrary Python code within those environments and retrieve the results. It uses the multiprocessing.connection and pickle modules for inter-process communication. Additionally, one can easily use shared memory between the environments, making data exchange more efficient.
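As a rough illustration of the mechanism (this is a minimal standard-library sketch, not the Wetlands API itself), multiprocessing.connection lets a parent process exchange pickled Python objects with another process; a thread stands in for the second process here:

```python
# Minimal sketch of the IPC mechanism (not the Wetlands API itself):
# multiprocessing.connection pickles arbitrary Python objects across
# a process boundary. A thread stands in for the second process here.
from multiprocessing.connection import Client, Listener
import threading

def worker(address, authkey):
    # The "isolated environment" side: receive a request, send a result.
    with Client(address, authkey=authkey) as conn:
        func_name, args = conn.recv()           # unpickled automatically
        if func_name == "square":
            conn.send([a * a for a in args])    # result is pickled back

authkey = b"secret"
with Listener(("localhost", 0), authkey=authkey) as listener:
    t = threading.Thread(target=worker, args=(listener.address, authkey))
    t.start()
    with listener.accept() as conn:
        conn.send(("square", [1, 2, 3]))
        result = conn.recv()
    t.join()

print(result)  # [1, 4, 9]
```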
Docs: https://arthursw.github.io/wetlands/latest/
Source: https://github.com/arthursw/wetlands/
I’d really appreciate any feedback. Thanks!
mushufasa|9 months ago
I've been using Conda for 10 years as my default package manager on my devices (not pipenv or poetry etc). I started because it was "the way" for data science but I kept with it because the syntax is really intuitive to me (conda create, conda activate).
I'm not sure what problem you are solving here -- the issue with conda, IMO, is that it is overkill for the rest of the Python community, so conda-forge has gradually declined, and I typically create a conda environment and then use pip for the latest libraries. Managing the conda environments, though, is not my issue -- that part works so well that I keep with it.
If you could explain why you created this and what problems you are solving with an example, that would be helpful. All package managers are aimed at "avoiding dependency conflicts" so that doesn't really communicate to me what this is and what real problem it solves.
N1H1L|9 months ago
superkuh|9 months ago
jpecar|9 months ago
Jokes aside, this feels very meta: a package manager for a package manager for a package manager. It reminds me of the old RFC 1925: "it is always possible to add another level of indirection". That RFC also says "perfection has been reached not when there is nothing left to add, but when there is nothing left to take away".
And as an HPC admin, I'm not offering my users any help with conda; I let them suffer on their own. Instead I'm showing them the greener pastures of spack, easybuild and eessi whenever I can. And they're slowly migrating over.
vindex10|9 months ago
could you elaborate a bit more on why the HPC world is special when it comes to configuring environments?
I always feel this is a typical problem in software development: separating the operating system environment from the application environment.
do you use spack / easybuild on your personal computer, for example when you need to install a package that is not part of the distribution?
arthursw|9 months ago
I made this library for a workflow management system, which can use any tool packaged with Conda, not just Python tools. The tools can be binaries written in C++, Java programs, or anything Conda can package. Note that Docker is not an option, because it cannot be installed automatically on all platforms (and because of its performance on non-Linux OSes).
My users do not have to worry about command lines to install tools since Wetlands is installed in the workflow management system. Each tool is installed when the user executes a workflow using it.
In the bio-image analysis and medical imaging communities, as well as many others, scientists are often unfamiliar with the Python ecosystem and the concept of virtual environments. However, they rely heavily on a wide range of tools, each with numerous dependencies, written in various languages. Applications with a built-in package management system like Wetlands greatly simplify their workflow by handling the complex task of setting up environments for these tools behind the scenes.
For example, Napari is an excellent viewer for multi-dimensional images, written in Python, which can easily be extended via plugins. There are hundreds of plugins for tasks like image denoising, registration, segmentation, particle tracking, etc. Plugins depend on tools (like Segment-Anything-Model, Cellpose, StarDist, etc.) that cannot all be installed in the same environment. Wetlands can come to the rescue and isolate each plugin in its own environment.
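The underlying pattern is roughly this (an illustrative sketch only, not the Wetlands API; sys.executable stands in for an environment-specific interpreter, and the doubling function is a stand-in for a real plugin computation):

```python
# Illustrative sketch of the isolation pattern (not the Wetlands API):
# run a function in a separate interpreter and retrieve the result as a
# pickled object. sys.executable stands in for an environment-specific
# interpreter; with Wetlands it would be the environment's own Python.
import pickle
import subprocess
import sys

# The "plugin" code executed in the other interpreter.
WORKER = """
import pickle, sys
args = pickle.load(sys.stdin.buffer)      # receive pickled arguments
result = [a * 2 for a in args]            # stand-in for e.g. segmentation
pickle.dump(result, sys.stdout.buffer)    # send the pickled result back
"""

proc = subprocess.run(
    [sys.executable, "-c", WORKER],
    input=pickle.dumps([1, 2, 3]),
    capture_output=True,
    check=True,
)
result = pickle.loads(proc.stdout)
print(result)  # [2, 4, 6]
```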
I hope the purpose of Wetlands is clearer now :)
barapa|9 months ago
phronimos|9 months ago
[0]: https://docs.conda.io/projects/conda-build/en/latest/resourc...
ElectricalUnion|9 months ago
Now it's mostly behind us, but there used to be a time when PyPI didn't have wheels (a 2012 thing), or manylinux wheels (a 2016 thing), for most libraries. pip install was a world of pain if you didn't have the "correct source packages" on your system.
And several of the projects built back then are no longer projects but deployed systems; they might as well stick to what is working.
joppy|9 months ago
One thing a conda package can do which a PyPI package cannot is have binary dependencies: a conda package is linked upon installation, and packages can declare dependencies on shared libraries. A common example is numeric libraries depending on a BLAS implementation: in a conda/pixi environment you will get exactly one BLAS shared library linked into your process, used by numpy, scipy, optimisers, etc. For some foundational libraries like BLAS, which have multiple implementations, the user even has the power to consistently switch the implementation within the environment, e.g. from OpenBLAS to Intel's MKL.
The PyPI package format does not allow binary dependencies: wheels must be self-contained when it comes to binary code (though not when it comes to Python code, which hopefully makes it clear that something here is inconsistent). Take any numerical Python environment and enumerate the copies of BLAS in it: you will probably find 3-5, all running their own threadpools.
Another very simple example is built-in modules depending on native code, like the sqlite3 module. In a conda/pixi installation you are guaranteed that the Python binary links against the same sqlite3 code as the command-line sqlite3 CLI tool in the same environment. Stuff like this removes many cross-language or cross-tool hassles.
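The sqlite3 point is easy to check directly (a small sketch using only the standard library; note the CLI may not be on PATH). In a conda/pixi environment the two versions printed below should match; with a system Python they often differ:

```python
# Compare the SQLite library linked into Python's sqlite3 module with
# the version reported by the sqlite3 CLI tool on PATH (if any). In a
# conda/pixi environment these come from the same package; elsewhere
# they may be two different builds of SQLite.
import shutil
import sqlite3
import subprocess

print("python sqlite3 module:", sqlite3.sqlite_version)

cli = shutil.which("sqlite3")
if cli:
    out = subprocess.run([cli, "--version"], capture_output=True, text=True)
    print("sqlite3 CLI tool:", out.stdout.split()[0])
else:
    print("no sqlite3 CLI on PATH")
```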
I prefer uv or poetry if I’m doing anything simple or pure python (or perhaps with a small binary dependency like an event loop). But pixi is the way to go for large environments with lots of extra tools and numerical libraries.
blactuary|9 months ago
When I read the uv docs and see other people's examples, I have a hard time understanding how it works for my workflow. It seems I could continue using conda for environment management and only use uv for package installation, and it would be much faster, but that also feels a little shaky, with potential for errors from combining the two tools; and since mamba became the default solver, conda is pretty fast, even when building a new env from scratch.
It feels like conda, with its ability to have multiple Python versions and env management built in, gives me more than uv, just without the package installation speed. But I am certainly open to someone explaining uv to me in a way that disproves that.
jessekv|9 months ago
uv replaces pip; conda and pip have been complementary for a long time. But I would be surprised if uv does not take on conda at some point, e.g. with a micromamba subcommand.
martinky24|9 months ago
chillpenguin|9 months ago
whalesalad|9 months ago
pyenv is all you need. it manages python versions and python virtual environments. you can create and destroy them just as easily as git branches.
pyenv + good ol' requirements.txt is really all you need.
if your env dictates containers, it's even easier to work with. FROM python:version and done.
mrweasel|9 months ago
Please don't. Never have a tool that automatically reaches out onto the internet to get a binary and then run it. Just let the user know that they need to install either pixi or micromamba. It's inherently unsafe and you don't know what will be put into those binaries in the future.
Maybe it's because I don't have a use case for this, but I don't really get what this is for. It's interesting, but I'm not really sure where I'd use it.
unknown|9 months ago
[deleted]