top | item 41633567

yaleman | 1 year ago

The fact that tensorflow takes up 12.9TiB is truly horrifying, and most of that is because they use pypi's storage as a dumping ground for their pre-release packages. What a nightmare they've put on other people's shoulders.

theamk|1 year ago

I think pypi should require larger packages, like tensorflow, to self-host their releases.

The support for that is all there already: the pypi index file contains an arbitrary URL for each data file plus a sha256 hash. Let pypi store the hashes, so there are no shenanigans with versions being secretly overridden, but point the actual data URLs to other servers.
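To make the hash-pinning idea concrete, here is a rough sketch in Python, assuming a PEP 503-style simple index page where each file link carries a `#sha256=...` fragment. The host `mirror.example.org`, the package name `bigpkg`, and the helper names `SimpleIndexParser` and `verify` are made up for illustration, and the "download" is faked with an in-memory byte string so nothing touches the network.

```python
import hashlib
from html.parser import HTMLParser
from urllib.parse import urldefrag


class SimpleIndexParser(HTMLParser):
    """Collect filename -> (url, sha256) from anchors in a simple index page."""

    def __init__(self):
        super().__init__()
        self.files = {}

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Split "https://host/file.whl#sha256=abc" into URL and fragment.
        url, fragment = urldefrag(href)
        if fragment.startswith("sha256="):
            filename = url.rsplit("/", 1)[-1]
            self.files[filename] = (url, fragment[len("sha256="):])


def verify(data: bytes, expected_sha256: str) -> bool:
    """Return True only if the downloaded bytes match the hash from the index."""
    return hashlib.sha256(data).hexdigest() == expected_sha256


# Hypothetical setup: the index is served by pypi, the file by someone else.
payload = b"pretend these are the wheel's bytes"
digest = hashlib.sha256(payload).hexdigest()
index_html = (
    '<a href="https://mirror.example.org/bigpkg-1.0-py3-none-any.whl'
    f'#sha256={digest}">bigpkg-1.0-py3-none-any.whl</a>'
)

parser = SimpleIndexParser()
parser.feed(index_html)
url, expected = parser.files["bigpkg-1.0-py3-none-any.whl"]

# The self-hosted server can go down, but it cannot swap the file unnoticed.
assert verify(payload, expected)
assert not verify(payload + b"tampered", expected)
```

The point of the split is that the index stays the source of truth for what a version is, while the bytes can live anywhere.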

(There must obviously be a balance for availability vs pypi's cost, so maybe pypi hosts only smaller files, and larger files must be self-hosted? Or pypi hosts "major releases" while pre-releases are self-hosted? And there should be manual exceptions for "projects with funding from huge corporations" and "super popular projects from solo developers"...)

aragilar|1 year ago

I believe tensorflow does remove old pre-releases (I know other projects do), so I think that number might be fairly static?

That tensorflow is that big isn't surprising, given the install of it plus its dependencies is many gigabytes (you can see the compressed sizes of wheels on the release pages e.g. https://pypi.org/project/tensorflow/#files), and the "tensorflow" package (as opposed to the affiliated packages) based on https://py-code.org/stats is 965.7 GiB, which really only includes a relatively small number of pre-releases.

Why tensorflow is that big comes down to needing to support many different kinds of GPUs with different ecosystem versions, and I suspect the time to build them with zig cc (assuming it works, and doesn't instead require pulling in a different compiler/toolchain) would be so excessive (especially on IoT/weaker devices) that it would make the point of the exercise moot.

amoshebb|1 year ago

Is it though? If it saves one engineer one afternoon that storage has paid for itself, and this thing has hundreds of thousands of downloads a day.

Wouldn’t it be more horrifying to force everybody who wants to use a prerelease to waste an afternoon getting it to build just to save half a hard drive?

skeledrew|1 year ago

That's beside the point though. Yes, having prebuilt binaries is very helpful. But what happens if Fastly decides against renewing next time and there is nobody else willing to sponsor? The cost is sky-high for the PSF to handle. Where does PyPI go?