I will blame overlong copyright terms: 70 years after the author's death, or 95 years after publication, which means most recent work will effectively enter the commons a century or more from now [0].
Given the argument over LLMs consuming books illegally, I think publishers could be a little concerned that an LLM combining the partial previews of every modern work on a subject might destroy the market for the average book on that subject, with the license to do so having been properly granted via this feature.
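To put rough numbers on those two terms, here is a minimal sketch. It assumes the terms run to the end of the calendar year (so a work becomes free on January 1 of the following year) and ignores everything else, such as pre-1978 rules or the alternative 120-years-from-creation limit for works made for hire. The function name and constants are made up for illustration.

```python
# Simplified sketch of the two terms quoted above; not legal advice.
LIFE_PLUS_TERM = 70      # years after the author's death
PUBLICATION_TERM = 95    # years after publication


def first_public_domain_year(author_death_year=None, publication_year=None):
    """Return the first calendar year in which the work is in the public domain.

    Assumption: the term runs through December 31 of its final year,
    so the work is free starting January 1 of the next year.
    """
    if author_death_year is not None:
        return author_death_year + LIFE_PLUS_TERM + 1
    if publication_year is not None:
        return publication_year + PUBLICATION_TERM + 1
    raise ValueError("need author_death_year or publication_year")


# A work published in 2025 under the 95-year term:
print(first_public_domain_year(publication_year=2025))    # 2121
# A work whose author dies in 2060 under the life + 70 term:
print(first_public_domain_year(author_death_year=2060))   # 2131
```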
Among the less important things I'd like to send back in time to my past self:
"The trend in digitized book passages will reverse, and they will become harder and harder to find with time, so clip your own copies of everything you like to quote."
I’m genuinely curious how you feel about LLMs being trained on pirated material. Not being snarky here.
Your comment reflects the old “information wants to be free” ideals that used to dominate places like HN, Slashdot, and Reddit. But since LLMs arrived, a lot of the loudest voices here argue the opposite position when it comes to training data.
I’ve been trying to understand whether people have actually changed their views, or whether it’s mostly a shift in who is speaking up now.
I just checked and yes, search inside of books with previews is still possible.
(a) when you search books.google.com and find a book with a preview, it opens their new book viewer - the search is at the bottom of the page. You can also click "View All" to see all references of your search in that book.
(b) if you go to the book homepage (clicking X in the top right of the book viewer if that opened), there's still a "Search Inside Book" next to the "Preview" button under the title.
Anna's Archive, or any other book piracy site, does not replace Google Books' search functions at all. The search on those websites just looks inside the PDF text, whereas Google Books helped me many times to find manuscripts or old books that were not OCR'd properly. It's really a big loss.
They don't do full-text search anymore, especially for copyrighted books. I wonder if this is not a regression but an intentional move to give them a leg up in the AI race.
So, if you search for some text that occurs at the end of one chunk, will it then preview a following chunk? And could chaining these chunks give you the entire book?
If so, I could see someone doing this to exfiltrate books.
You're talking about in-book search (TFA is about search across all books), and yes, that was indeed once a known technique for extracting whole or nearly whole books.
That's why publishers responded by excluding sections of books from search (it will list the pages but you can't view them), and individual Google accounts became limited in how many extra pages they were ever allowed to see of an individual book beyond the standard preview pages.
But then LibGen, Z-lib, and Anna's Archive became popular and built up their collections...
If search gives you a preview with a few surrounding words, it is fairly simple to abuse search with quotation marks to extract bigger and bigger sections of the books, potentially till you have the whole book.
Since I pretty much only use Google Books for public domain books, old magazines, and newspapers, I haven't noticed any problem with it. Maybe it's not as dead as this person thinks.
Google Books is long dead. If you click on the author's name in one of the results, it will search inauthor:"Author's Name", and this search will return garbage because it chokes on double quotes. This has been true for at least a couple of years; Google Books is not compatible with itself. Changing the double quotes to single quotes fixes it. Also, lately, when you filter only for books that have Full View, some results that have Full View get dropped for no intelligible reason.
Nobody is looking at it. I wouldn't be surprised if the preview search was switched off by accident.
For me Books is only useful (and it is very useful) for books out of copyright, 100+ years old. Sometimes they aren't at archive.org.
I hate Google, but I think it's a bit absurd to criticize them on this if somehow it's over AI. The only reason Google created Books may even have been AI, but they were hoping to have the books open to everyone, and the publishers and authors whose full text is being blocked are literally the people who stopped it from happening. Maybe they spoke up about AI, too. I even find it hard to criticize Google for not taking care of Books - it has no purpose or profit potential for them anymore, and it's obviously charity that they don't take it down completely.
I suspect it's actually the opposite. Standard inverted-index text search is incredibly cheap and mature. Vector search requires generating embeddings and running approximate nearest-neighbor queries, which is significantly more compute-intensive than simple keyword matching. If they switched, it wasn't to save on compute costs.
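To make the cost difference concrete, here is a toy sketch of both approaches. It is only an illustration: the "embedding" is a plain bag-of-words count vector standing in for a learned model, the vector search is an exact brute-force scan rather than a real approximate nearest-neighbor index, and all function names are made up. Nothing here reflects how Google Books actually works.

```python
import math
from collections import defaultdict


def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def keyword_search(index, query):
    """Return ids of documents containing every query token (set intersection)."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    postings = [index.get(token, set()) for token in tokens]
    return set.intersection(*postings)


def toy_embed(text, vocab):
    """Stand-in for an embedding model: one count per vocabulary word."""
    tokens = text.lower().split()
    return [float(tokens.count(word)) for word in vocab]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vector_search(doc_vectors, query_vector, k=1):
    """Score every document against the query and return the top-k ids."""
    ranked = sorted(
        doc_vectors.items(),
        key=lambda item: cosine(item[1], query_vector),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]


docs = {
    1: "the whale swims in the sea",
    2: "the ship sails the sea",
    3: "a garden of roses",
}
index = build_inverted_index(docs)
vocab = sorted({token for text in docs.values() for token in text.lower().split()})
doc_vectors = {doc_id: toy_embed(text, vocab) for doc_id, text in docs.items()}

print(keyword_search(index, "whale sea"))
print(vector_search(doc_vectors, toy_embed("whale sea", vocab), k=1))
```

The keyword lookup touches only the posting lists for the query tokens, while the vector path has to embed the query and compare it against stored vectors, which is where the extra compute comes from.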
[+] [-] abetusk|2 months ago|reply
> The largest truly open library in human history
[0] https://en.wikipedia.org/wiki/Anna%27s_Archive
[+] [-] cft|2 months ago|reply
[+] [-] al_borland|2 months ago|reply
“Our mission is to organize the world’s information and make it universally accessible and useful”
https://about.google/company-info/
[+] [-] tick_tock_tick|2 months ago|reply
[+] [-] zb3|2 months ago|reply
[+] [-] crazygringo|2 months ago|reply
Almost certainly, this is something that publishers requested the removal of, under threat of requiring previews to be removed entirely.
Books that are out of copyright still have full search and display enabled.
So blame publishers, not Google.
[+] [-] abetusk|2 months ago|reply
[0] https://en.wikipedia.org/wiki/Copyright_law_of_the_United_St...
[+] [-] adamnemecek|2 months ago|reply
[+] [-] tamarinddreams|2 months ago|reply
[+] [-] Terr_|2 months ago|reply
[+] [-] mystraline|2 months ago|reply
Check out Library Genesis, Anna's Archive, and Sci-Hub for content.
Piracy isn't theft if buying isn't ownership.
[+] [-] kevin42|2 months ago|reply
[+] [-] GorbachevyChase|2 months ago|reply
[+] [-] adamnemecek|2 months ago|reply
[+] [-] Zathman|2 months ago|reply
[+] [-] adamnemecek|2 months ago|reply
[+] [-] lr0|2 months ago|reply
[+] [-] didip|2 months ago|reply
Then it would have been hella useful.
[+] [-] adamnemecek|2 months ago|reply
Here are two screenshots taken on Jan 20 and Jan 23 https://bsky.app/profile/adamnemecek.bsky.social/post/3mdbup...
[+] [-] toephu2|2 months ago|reply
Similarly, a year or so ago ChatGPT could summarize YouTube videos. Google put a stop to that, so now only Gemini can summarize YouTube videos.
[+] [-] jeffbee|2 months ago|reply
[+] [-] pfdietz|2 months ago|reply
[+] [-] crazygringo|2 months ago|reply
[+] [-] xorsula1|2 months ago|reply
[+] [-] Andrex|2 months ago|reply
"Hey, remove search?"
"OK, it was costing money anyways."
[+] [-] breppp|2 months ago|reply
[+] [-] londons_explore|2 months ago|reply
[+] [-] btrettel|2 months ago|reply
More on the HathiTrust project: https://en.wikipedia.org/wiki/HathiTrust
Though I don't know how many of the HathiTrust books are the "preview" kind the Reddit post mentions. Maybe none are?
[+] [-] bryanrasmussen|2 months ago|reply
[+] [-] mikestew|2 months ago|reply
"But a few days ago they removed ALL search functions for any books with previews, which are disproportionately modern books." <emphasis mine>
[+] [-] adamnemecek|2 months ago|reply
[+] [-] damnitbuilds|2 months ago|reply
Protest this by pirating, until copyright terms are reduced to make copyright once again a net benefit for society.
[+] [-] ChrisArchitect|2 months ago|reply
[+] [-] caplane|2 months ago|reply
[+] [-] pessimizer|2 months ago|reply
[+] [-] kingstnap|2 months ago|reply
Which tends to be kind of poop compared to true text search.
[+] [-] storystarling|2 months ago|reply