pilimi_anna | 2 years ago
High-speed access available for anyone who can do at-scale text extraction, or who can supply us with new collections.
sillysaurusx | 2 years ago
Please focus on your opsec. The more visible you become, the angrier people will get. Don’t do anything silly like edit your Wikipedia page from your house.
With that out of the way, someone I know happens to have the original books3 epub files. I think they can be convinced to send them to you. It’s only 200,000 books, but that could theoretically grow your collection by 10% or so. I don’t know whether that would be helpful to you (you’ve far surpassed books3 at this point), but if so, let me know.
Given the legal risks, the best course of action for AI companies is probably to ignore English and European books entirely. There is plenty of Chinese data, and the models would learn all the same concepts without exposing anyone to lawsuits.
stavros | 2 years ago
Basically, you download a client and tell it "allocate 2 TB of my disks to whatever archive.org/donate/disk.rss says", and the server/client combination ensures you download and seed the rarest 2 TB of the collection.
This design is also open, in the sense that the server can share the database of torrents it contains, and anyone can use it to fetch any of the files in the dataset from the swarm.
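The allocation step described above can be sketched roughly as follows. This is a hypothetical illustration, not a real client: the `Torrent` record and `pick_rarest` helper are invented names, and a real implementation would track per-piece availability in the swarm rather than a single seeder count per torrent. The core idea is a greedy fill of the donated disk budget, rarest torrents first:

```python
from dataclasses import dataclass

@dataclass
class Torrent:
    name: str
    size_bytes: int
    seeders: int  # how many peers in the swarm already hold this file

def pick_rarest(catalog: list[Torrent], budget_bytes: int) -> list[Torrent]:
    """Greedily fill the donated disk budget with the rarest torrents first."""
    chosen, used = [], 0
    for t in sorted(catalog, key=lambda t: t.seeders):
        if used + t.size_bytes <= budget_bytes:
            chosen.append(t)
            used += t.size_bytes
    return chosen

# Toy catalog: the client skips the rare-but-too-large file and
# tops up the remaining budget with the next-rarest that fits.
catalog = [
    Torrent("common.tar", 500, 120),
    Torrent("rare-a.tar", 400, 1),
    Torrent("rare-b.tar", 700, 2),
]
print([t.name for t in pick_rarest(catalog, 1000)])  # → ['rare-a.tar', 'common.tar']
```

In practice the server would recompute seeder counts periodically and tell clients to swap out torrents that have become well-replicated, so the aggregate storage keeps covering the long tail.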
Would something like this be at all useful? I've emailed a few archivists but got no response, and the one person I've managed to talk to about it said there have been a few attempts at this, but they always fail for one reason or another.
pilimi_anna | 2 years ago