pbalau | 2 months ago
A glance at r/python will show that almost every week there is a new AI-generated PyPI package of dubious utility.
I did some quick research using bigquery-public-data.pypi.distribution_metadata: out of 844,719 packages, 126,527 have only one release, almost 15%.
While it is not unfathomable that a chunk of those really only needed one release and/or were written by hand, the number is too high. And PyPI is struggling for resources.
I wonder how much crap there is on GitHub. I think that is an even larger issue, with new versions of LLMs being trained on crap generated by older versions.
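For anyone who wants to sanity-check the single-release figure above, here is a rough sketch of the counting step. It is not the query I actually ran: it assumes you have already exported (name, version) pairs from the metadata table, and it assumes a package can show up more than once per version (for example, once per uploaded file), so versions are de-duplicated first. The function name `single_release_share` is made up for illustration.

```python
from collections import defaultdict


def single_release_share(rows):
    """Count packages with exactly one distinct version.

    rows: iterable of (package_name, version) pairs. Repeated pairs are
    fine; they are collapsed into a set per package (an assumption about
    how the exported data looks, not a documented guarantee).
    Returns (single_release_count, total_packages, share).
    """
    versions = defaultdict(set)
    for name, version in rows:
        versions[name].add(version)

    total = len(versions)
    single = sum(1 for seen in versions.values() if len(seen) == 1)
    share = single / total if total else 0.0
    return single, total, share


# Tiny made-up sample: "a" and "c" have one release each, "b" has two.
sample = [
    ("a", "1.0"), ("a", "1.0"),  # same version listed twice
    ("b", "1.0"), ("b", "2.0"),
    ("c", "0.1"),
]
single, total, share = single_release_share(sample)

# Share implied by the numbers quoted in the comment.
quoted_share = 126527 / 844719
print(f"sample: {single}/{total} single-release packages")
print(f"quoted: {quoted_share:.2%}")
```

The last line is just the arithmetic behind the "almost 15%" claim, so it works without access to the dataset.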
viraptor|2 months ago
pbalau|2 months ago
It might not be a storage problem right now, but the practice of publishing crap is dangerous because it can easily be abused. I think it is very easy to publish a lot of very heavy packages via PyPI.
OptionOfT|2 months ago
Until you look at the source code and notice it's all held together by duct tape.