

netdevphoenix | 13 days ago

Why would they? GitHub has 28 million public repos; Codeberg only hit 300k last year. Anyway, Codeberg was just a placeholder for 'repo source _less_ likely to be in their training data'. It was a quick candidate for a place to find a big old codebase with non-sensitive data.

It is indeed hard, but the guys at Codeberg are certainly an order of magnitude better than GitHub: they opted out of the main AI crawlers, regularly block IPs known to belong to AI startups, and allow you to make your repos accessible only to logged-in users.

You seem to be going off on a tangent here. The main point was about performing a well-documented test anyway.


simonw|13 days ago

My question about the "obvious" thing was genuine - it wasn't obvious to me.