pbalau | 2 months ago

> keep the commons clean [from the second link]

A glance at r/python shows that almost every week there is a new PyPI package generated by AI, of dubious utility.

I did some quick research using bigquery-public-data.pypi.distribution_metadata: out of 844,719 packages, 126,527 have only one release, almost 15%.
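Something along these lines reproduces that kind of breakdown (a minimal sketch, not the exact query used; it assumes one row per (name, version) upload in distribution_metadata and counts distinct versions per package name):

    -- Sketch only: count PyPI packages with exactly one release,
    -- assuming one row per (name, version) upload in the public table.
    SELECT
      COUNT(*) AS total_packages,
      COUNTIF(num_releases = 1) AS single_release_packages,
      ROUND(100 * COUNTIF(num_releases = 1) / COUNT(*), 1) AS pct_single_release
    FROM (
      SELECT name, COUNT(DISTINCT version) AS num_releases
      FROM `bigquery-public-data.pypi.distribution_metadata`
      GROUP BY name
    ) AS per_package;

For what it's worth, 126,527 / 844,719 works out to roughly 15%, consistent with the figure above.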

While it is not unfathomable that a chunk of those genuinely needed only one release and/or were written by hand, the number is too high. And PyPI is struggling for resources.

I wonder how much crap there is on GitHub, and I think this is an even larger issue, with newer versions of LLMs being trained on crap generated by older versions.

viraptor | 2 months ago

Storage is relatively cheap. Packages with only one release and little usage in the wild are a rounding error in cost. A few years ago, PyPI was already consuming over a million dollars' worth of CDN traffic per month. Storing a million small dead packages is not worth the concern.

pbalau | 2 months ago

My research was admittedly shallow, and I didn't look at how large those packages are, but the issue is with the practice itself.

It might not be a storage problem right now, but the practice of publishing crap is dangerous because it can easily be abused. I think it is very easy to publish a lot of very heavy packages via PyPI.

OptionOfT | 2 months ago

Same on r/rust. Post after post with a new project that does something groundbreaking.

Until you look at the source code and notice it's all held together with duct tape.