Show HN: I used Claude Code to discover connections between 100 books
524 points| pmaze | 1 month ago |trails.pieterma.es
I built a system for Claude Code to browse 100 non-fiction books and find interesting connections between them.
I started out with a pipeline in stages, chaining together LLM calls to build up a context of the library. I was mainly getting back the insight that I was baking into the prompts, and the results weren't particularly surprising.
On a whim, I gave CC access to my debug CLI tools and found that it wiped the floor with that approach. It gave actually interesting results and required very little orchestration in comparison.
One of my favourite trail of excerpts goes from Jobs’ reality distortion field to Theranos’ fake demos, to Thiel on startup cults, to Hoffer on mass movement charlatans (https://trails.pieterma.es/trail/useful-lies/). A fun tendency is that Claude kept getting distracted by topics of secrecy, conspiracy, and hidden systems - as if the task itself summoned a Foucault’s Pendulum mindset.
Details:
* The books are picked from HN’s favourites (which I collected before: https://hnbooks.pieterma.es/).
* Chunks are indexed by topic using Gemini Flash Lite. The whole library cost about £10.
* Topics are organised into a tree structure using recursive Leiden partitioning and LLM labels. This gives a high-level sense of the themes.
* There are several ways to browse. The most useful are embedding similarity, topic tree siblings, and topics cooccurring within a chunk window.
* Everything is stored in SQLite and manipulated using a set of CLI tools.
I wrote more about the process here: https://pieterma.es/syntopic-reading-claude/
I’m curious if this way of reading resonates for anyone else - LLM-mediated or not.
johnwatson11218|1 month ago
I just spent time getting it all running on docker compose and moved my web ui from express js to flask. I want to get the code cleaned up and open source it at some point.
johnwatson11218|1 month ago
johnwatson11218|1 month ago
-- -- Name: refresh_topic_tables(); Type: PROCEDURE; Schema: public; Owner: postgres --
CREATE PROCEDURE public.refresh_topic_tables() LANGUAGE plpgsql AS $$ BEGIN -- Drop tables in reverse dependency order DROP TABLE IF EXISTS topic_top_terms; DROP TABLE IF EXISTS topic_term_tfidf; DROP TABLE IF EXISTS term_df; DROP TABLE IF EXISTS term_tf; DROP TABLE IF EXISTS topic_terms;
EXCEPTION WHEN OTHERS THEN RAISE EXCEPTION 'Error refreshing topic tables: %', SQLERRM; END; $$;ct0|1 month ago
hellisad|1 month ago
fittingopposite|1 month ago
8organicbits|1 month ago
I'm seeing "Thanos committing fraud" in a section about "useful lies". Given that the founder is currently in prison, it seems odd to consider the lie useful instead of harmful. It kinda seems like the AI found a bunch of loosely related things and mislabeled the group.
If you've read these books I'm not seeing what value this adds.
Closi|1 month ago
Terretta|1 month ago
Theranos is the fraud mentioned in the piece.
theturtletalks|1 month ago
0. https://github.com/ValdikSS/GoodbyeDPI
noname120|1 month ago
https://habr.com/en/articles/456476/
https://android-review.googlesource.com/c/platform/system/bt...
dinkleberg|1 month ago
pxc|1 month ago
Anyway, it introduced me to the idea of using computational methods in the humanities, including literature. I found it really interesting at the time!
One of the the terms it introduced me to is "distant reading", whose name mirrors that of a technique you may have studied in your gen eds if you went to university ('close reading"). The idea is that rather than zooming in on some tiny piece of text to examine very subtle or nuanced meanings, you zoom out to hundreds or thousands of texts, using computers to search them for insights that only emerge from large bodies of work as wholes. The book argued that there are likely some questions that it is only feasible to ask this way.
An old friend of mine used techniques like this for dissertation in rhetoric, learning enough Python along the way to write the code needed for the analyses she wanted to do. I thought it was pretty cool!
I imagine LLMs are probably positioned now to push distant reading forward in an number of ways: enabling new techniques, allowing old techniques to be used without writing code, and helping novices get started with writing some code. (A lot of the maintainability issues that come with LLM code generation happily don't apply to research projects like this.)
Anyway, if you're interested in other computational techniques you can use to enrich this kind of reading, you might enjoy looking into "distant reading": https://en.wikipedia.org/wiki/Distant_reading
plutokras|1 month ago
LLMs are great at finding media by vague descriptions. ;)
smusamashah|1 month ago
In "Father wound" the words "abandoned at birth" are connected to "did not". Which makes it look like those visual connections are just a stylistic choice and don't carry any meaning at all.
Oras|1 month ago
hecanjog|1 month ago
chrisgd|1 month ago
The one I found most connected that the LLm didn’t was a connection between Jobs and the The Elephant in the Brain
The Elephant in the Brain: The less we know of our own ugly motives, the easier it is to hide them from others. Self-deception is therefore strategic, a ploy our brains use to look good while behaving badly.
Jobs: “He can deceive himself,” said Bill Atkinson. “It allowed him to con people into believing his vision, because he has personally embraced and internalized it.”
urbandw311er|1 month ago
I do like the idea though — perhaps there is a way to refine the prompting to do a second pass or even multiple passes to iteratively extract themes before the linking step.
Balgair|1 month ago
Have you read the Syntopicon by Mortimer J Adler?
It's right up your alley on this one. It's essentially this, but in 1965, by hand, with Isaac Asimov and William F Buckley Jr, among others.
Where did you get the books from? I've been trying to do something like this myself, but haven't been able to get good access to books under copyright.
Yeah, thinking a bit more here, you've created a Syntopicon. I've always wanted to make a modern one too! You can do the old school late night Wikipedia reading session with the trails idea of yours. Brilliant!
Really though, how can I help you make this bigger?
tolerance|1 month ago
I think that this sucks the discreet joy out of reading and learning. Having the ways that the topics within a certain book can cross over in lead into another book of a different topic externalized is hollowing and I don’t find it useful.
On the other hand I feel like seeing this process externalized gives us a glimpse at how “the algorithms” (read: recommender systems) suggest seemingly disjunctive content to users. So as a technical achievement I can’t knock what you’ve done and I’m satisfied to see that you’re the guy behind the HN Book map that I thought was nice too.
At its core this looks like a representation of the advantages that LLMs can afford to the humanities. Most of us know how Rob Pike feels about them. I wonder if his senior former colleague feels the same: https://www.cs.princeton.edu/~bwk/hum307/index.html. That’s a digression, but I’d like to see some people think in public about how to reasonably use these tools in that domain.
mathgeek|1 month ago
Intuitively, I agree. This feels like the different between being a creator (of your own thoughts as inspired by another person's) and a consumer (although in a somewhat educational sense). There would need to be a big advantage to being taught those initial thoughts, analogous to why we teach folks algebra/calculus via formulas rather than having every student figure out proofs for themselves.
bonkusbingus|1 month ago
drakeballew|1 month ago
amadeuswoo|1 month ago
samuelknight|1 month ago
This is the best way to re-enforce a copilot because models are pretty smart most of the time and I can correct the cases where it stumbles with minimal cognitive effort. Over time I find more and more tasks are solved by agent intelligence or these happy path hints. As primitive as it is, CLAUDE.md is the best we have for long-term adaptive memory.
pmaze|1 month ago
lkbm|1 month ago
I was recently trying to remember a portal fantasy I read as a kid. Goodreads has some impressive lists, not just "Portal Fantasies"[0], but "Portal Fantasies where the portal is on water[1], and a seven more "where/what's the portal" categories like that.
But the portal fantasy I was seeking is on the water and not on the list.
LLMs have failed me so far, as has browsing the larger portal fantasy list. So, I thought, what if I had an LLM look through a list of kids books published in the 1990s and categorize "is this a portal fantasy?" and "which category is the portal?"
I would 1. possibly find my book and 2. possibly find dozens of books I could add to the lists. (And potentially help augment other Goodread-like sites.)
Haven't done it, but I still might.
Anyway, thanks for making this. It's a really cool project!
[0] https://www.goodreads.com/list/show/103552.Portal_Fantasy_Bo...
[1] https://www.goodreads.com/list/show/172393.Fiction_Portal_is...
znnajdla|1 month ago
catlifeonmars|1 month ago
nkrisc|1 month ago
drakeballew|1 month ago
Edit/update: if you are looking for the phantom thread between texts, believe me that an LLM cannot achieve it. I have interrogated the most advanced models for hours, and they cannot do the task to any sort of satisfactory end that a smoked-out half-asleep college freshman could. The models don't have sufficient capacity...yet.
liqilin1567|1 month ago
rtgfhyuj|1 month ago
https://trails.pieterma.es/trail/collective-brain/ is great
what-the-grump|1 month ago
… realize that it’s nonsense and the LLM is not smart enough to figure out much without a reranker and a ton of technology that tells it what to do with the data.
You can run any vector query against a rag and you are guaranteed a response. With chunks that are unrelated any way.
baxtr|1 month ago
Take for example the OODA loop. How are the connections made here of any use? Seems like the words are semantically related but the concept are not. And even if they are, so what?
I am missing the so what.
Now imagine a human had read all these books. It would have come up with something new, I’m pretty sure about that.
https://trails.pieterma.es/trail/tempo-gradient/
timoth3y|1 month ago
You have an interesting idea here, but looking over the LLM output, it's not clear what these "connections" actually mean, or if they mean anything at all.
Feeding a dataset into an LLM and getting it to output something is rather trivial. How is this particular output insightful or helpful? What specific connections gave you, the author, new insight into these works?
You correctly, and importantly point out that "LLMs are overused to summarise and underused to help us read deeper", but you published the LLM summary without explaining how the LLM helped you read deeper.
pmaze|1 month ago
A trail that hits that balance well IMO is https://trails.pieterma.es/trail/pacemaker-principle/. I find the system theory topics the most interesting. In this one, I like how it pulled in a section from Kitchen Confidential in between oil trade bottlenecks and software team constraints to illustrate the general principle.
Aurornis|1 month ago
But so many of the links just don't make sense, as several comments have pointed out. Are these actually supposed to represent connections between books, or is it just a random visual effect that's suppose to imply they're connected?
I clicked on one category and it has "Us/Them" linked to "fictions" in the next summary. I get that it's supposed to imply some relationship but I can't parse the relationships
rjh29|1 month ago
lisdexan|1 month ago
hecanjog|1 month ago
I won't pile on to what everyone else has said about the book connections / AI part of this (though I agree that part is not the really interesting or useful thing about your project) but I think a walk-through of how you approach UI design would be very interesting!
jennyholzer6|1 month ago
[deleted]
unknown|1 month ago
[deleted]
unknown|1 month ago
[deleted]
zkmon|1 month ago
Orwelliian motives (sheer egoism, aesthetic enthusiasm, historical impulse and political purposes) are somewhat dated.
andai|1 month ago
The book was really big and it got stuck in "indexing". (Possibly broke the indexer?) But thanks to the CLI integration, it was able to just iteratively grep all the info it needed out of it. I found this very amusing.
Anthropic's article on retrieval emphasizes the importance of keyword search, since they often outperform embeddings depending on the query. Their own approach is a hybrid:
https://www.anthropic.com/engineering/contextual-retrieval
hising|1 month ago
frankdenbow|1 month ago
djeastm|1 month ago
It's like grabbing a half-dozen books off the library shelf, opening to a random page in each, then flit through them, kind of like a "engineering nerd book sample platter".
Aurornis|1 month ago
The visual style of linking phrases from one section to the next looks neat, but the connections don’t seem correct. There’s a link from “fictions” to “internal motives” near the top of the first link and several other links are not really obviously correct.
pmaze|1 month ago
There's two stages to the linking: first juxtaposing the excerpts, then finding and linking key phrases within them. I find the excerpts themselves often have interesting connections between them, but the key phrases can be a bit out there. The "fictions" to "internal motives" one does gel for me, given the theme of deceiving ourselves about our own motivations.
reedf1|1 month ago
akshay326|1 month ago
rhgraysonii|1 month ago
fudged71|1 month ago
itsangaris|1 month ago
guidoism|1 month ago
Balgair|1 month ago
I really think we all should sync up and talk more. I want to make this bigger.
dexterlagan|1 month ago
JimmyJamesJames|1 month ago
#1: would a larger dataset increase the depth and breadth of insight ( go to #2) #2: with the initial top 100, are there key ‘super node’ books that stand out as ones to read due the breadth they offer. Would a larger dataset identify further ‘super node’ books.
amelius|1 month ago
https://en.wikipedia.org/wiki/Netflix_Prize
(Are people still trying to improve upon the original winning solution?)
trinsic2|1 month ago
sciences44|1 month ago
Solid technical execution too. Well done!
barrenko|1 month ago
neodypsis|1 month ago
dev_l1x_be|1 month ago
adsharma|1 month ago
Wouldn't it be good if recursive Leiden and cypher was built into an embedded DB?
That's what I'm looking into with mcp-server-ladybug.
nurettin|1 month ago
Conclusion: you find wisdom in everything if you look for it.
gvedem|1 month ago
>The Law of Fives states simply that: ALL THINGS HAPPEN IN FIVES, OR ARE DIVISIBLE BY OR ARE MULTIPLES OF FIVE, OR ARE SOMEHOW DIRECTLY OR INDIRECTLY APPROPRIATE TO 5.
>The Law of Fives is never wrong.
>In the Erisian Archives is an old memo from Omar to Mal-2: "I find the Law of Fives to be more and more manifest the harder I look."
butterNaN|1 month ago
threecheese|1 month ago
fittingopposite|1 month ago
stogot|1 month ago
dangoodmanUT|1 month ago
typon|1 month ago
podgorniy|1 month ago
froil|1 month ago
simonw|1 month ago
jgalt212|1 month ago
chromanoid|1 month ago
I really appreciate you mentioning this. I think this is the nature of LLMs in general. Any symbol it processes can affect its reasoning capabilities.
joe_the_user|1 month ago
It's all fun and game 'till someone loses an eye/mind/even-tenuous-connection-to-reality.
Edit: I'd mention that the themes Claude finds qualify as important stuff imo. But they're all pretty grim and it's a bit problematic focusing on them for a long period. Also, they are often the grimmest spin things that are well known.
drakeballew|1 month ago
sgt101|1 month ago
https://medium.com/gft-engineering/using-text-embeddings-and...
Validation is a problem here - you find relationships, but so what? Is it right.... I can't say. It is interesting though.
unknown|1 month ago
[deleted]
iamzamek|1 month ago
pennaMan|1 month ago
localhost|1 month ago
Someone called the output "egregious, obviously referential LLM dog". And someone else called this trail an example of that: https://trails.pieterma.es/trail/tempo-gradient/
What struck me when reading it was that I parachuted into that trail with zero context. So I asked an LLM to generate some surrounding context for me: this is the verbatim prompt that I just gave gpt-5.2/medium running in my own agentic harness:
> read this: https://trails.pieterma.es/trail/tempo-gradient/. then i want you to write a short essay that shows the relationship between the different quotes making sure to use each quote in the process. the narrative should help me understand the links between the different quotes in each book in context. feel free to use your own knowledge of the books and your general knowledge to make this an approachable thing. what another llm did here is cool - it found relationships between this set of quotes. now i want you to materialize them into a compelling essay that invites curiosity and teaches me something. write the essay to tempo-gradient.md and respond in chat with how you thought about the problem and whether you thought these relationships were meaningful.
This was my agent's response:
Wrote the essay to tempo-gradient.md.
How I thought about it: I treated “tempo gradient” as the shared mechanism behind all eight excerpts—mismatch in loop speed (notice → interpret → decide → act) caused by attention, infrastructure latency, organizational rigidity, system delays, stealthy manipulation, or runaway acceleration. Then I ordered the quotes from “inner time” (tennis/attention) → engineered milliseconds (HFT) → explicit doctrine (OODA) → industrial schedules (mobilization) → structural delays (Beer Game/economy) → slow covert tempo (cyber) → extreme fast tempo (AI takeoff), so each quote sets up the next.
Whether the relationships felt meaningful: yes, mostly. They’re not “the same idea” so much as the same shape appearing in different contexts—feedback delay and decision cadence determining who gets to set the terms of reality for everyone else. The only relationship that feels a bit more interpretive than direct is tennis ↔ geopolitics, but even there the bridge (“how many nows” as a controllable input to decision tempo) is surprisingly sturdy.
And this is the output: https://gist.github.com/jflam/49753b7da64a74f07e35f6e2465f1a...
wormpilled|1 month ago
Interesting... seems like it wants the keys on your system! ;)
mannanj|1 month ago
only-one1701|1 month ago
jennyholzer6|1 month ago
[deleted]
unknown|1 month ago
[deleted]
jereees|1 month ago
pixelsmithy|1 month ago
This project, titled "Show HN: I used Claude Code to discover connections between 100 books," is an attempt to use an LLM (Claude) to perform "syntopical reading"—analyzing a library of 100 Hacker News favorite books to find thematic "trails" and connections between seemingly disparate texts. The author used CLI tools and LLM orchestration to index topics and generate these connections, aiming to move beyond simple summarization toward deeper insight.
Below are my thoughts on the project, followed by an analysis of the specific criticisms raised in the thread.
My Thoughts Conceptually, this is a fascinating experiment in "digital humanities." It attempts to automate a very high-level cognitive task: synthesis. Usually, we use computers for retrieval (search) or storage. Using them to find semantic bridges between concepts like "Jobs' reality distortion field" and "Theranos' fake demos" is a compelling use case for LLMs.
However, the execution reveals the current limits of this technology. The resulting "connections" often feel like a parlor trick—impressive that the machine did it, but often lacking the "click" of genuine insight. The project succeeds more as a technical visualization of vector embeddings than as a literary tool. It produces a map of linguistic proximity rather than conceptual necessity.
Criticisms & Agreement Analysis Here are the main criticisms from the comment section and my take on them:
1. The "Rorschach Test" / Spurious Connections Criticism: Users like tmountain, smusamashah, and timoth3y argue that the connections are "weaker than weak" or purely surface-level (e.g., linking "fracture" in one book to "crumble" in another). They suggest the project is an "LLM Rorschach test" where the human user forces meaning onto random noise generated by the model.
Do I agree? Yes. Reasoning: LLMs operate on statistical probability and vector similarity. They often confuse topical relatedness (these words appear in similar contexts) with causal or logical connection. A connection between "Us/Them" and "fictions" might make sense in a vector space, but to a human reader expecting a philosophical argument, it feels disjointed. Without the reasoning for the link being rigorous, the user has to do the heavy lifting to invent the connection, making the tool less of a "guide" and more of a "random prompt generator.”
2. Outsourcing Critical Thought Criticism: Users eloisius and DrewADesign argue that the project defeats the purpose of reading.[1] They contend that "the thinking is the point," and having a machine find connections robs the reader of the synthesis process that leads to understanding.
Do I agree? Partially. Reasoning: If the goal is learning, they are correct; you cannot learn by having a machine digest information for you. However, if the goal is discovery or research, this criticism is too harsh. Researchers have always used indices, concordances, and bibliographies to find connections they didn't know existed. If this tool is treated as a "super-index" rather than a "replacement reader," it has validity. The danger lies in mistaking the map (the AI's graph) for the territory (the actual knowledge).
3. Hallucinations and Conceptual Errors Criticism: User 8organicbits pointed out a weird label ("Thanos committing fraud" in a section about "useful lies") and questioned the logic of calling a fraud "useful" if the founder is in prison.
Do I agree? Yes. Reasoning: (Note: User Terretta clarified the commenter likely confused the comic villain Thanos with the company Theranos, which was in the text). However, the criticism about the label "useful lies" holds up. The LLM likely grouped "Theranos" under "useful lies" because the deception functioned for a time, but it lacks the nuance to understand that "fraud" and "useful tool" are categorically different to a human moral compass. This highlights the "alien" nature of LLM categorization—it organizes by semantic weight, not human logic.
4. "LLM Slop" and Fatigue Criticism: User typon and others noted the descriptions have a "distinct LLM voice" and dismissed it as "slop."[1] User Aurornis mentioned recognizing the writing style immediately.
Do I agree? Yes. Reasoning: By 2026 (the context of this thread), users are highly attuned to "AI-ese"—the perfectly grammatical but hollow, hedging, and overly enthusiastic tone of standard model outputs. This "slop" creates a trust deficit. When a human reads a connection written by a human, they assume intent. When they read one written by an LLM, they assume hallucination until proven otherwise. This high barrier to trust makes the project harder to enjoy.
Conclusion I agree with the consensus of the thread: Cool tech demo, shallow utility. The project is a great example of what LLMs can do (processing vast amounts of text to find patterns), but it inadvertently highlights what they cannot do (understand the deep, human significance of those patterns). It effectively automates the "what" but misses the "so what?"
djeastm|1 month ago
Perhaps you might instead provide your own TL;DR after reading it yourself?
jennyholzer6|1 month ago
[deleted]
durch|1 month ago
[deleted]
glemion43|1 month ago
A LLM is a transformer. It transforms a prompt into a result.
Or a human idea into a concrete java implementation.
Currently I'm exploring what unexpected or curious transformations LLMs are capable of but haven't found much yet.
At least I myself was surprised that an LLM can transform a description of something into an IMG by transforming it into a SVG.
calmoo|1 month ago
sidrag22|1 month ago
afro88|1 month ago
miracoli|1 month ago
nephihaha|1 month ago
jennyholzer6|1 month ago
[deleted]
napolux|1 month ago
pharrington|1 month ago