So a good test would be replacing the spell names in the books with made-up spells. And if the model gave a "real" spell name in its answer, that would also show whether it "cheated".
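A rough sketch of how that swap-and-check could look in Python. The fake spell names, the `SPELL_MAP` table and both function names are made up for illustration, and the substring check on the answer is a deliberately crude assumption:

```python
import re

# Hypothetical mapping: real spell names -> invented replacements.
SPELL_MAP = {
    "Expelliarmus": "Vortanimus",
    "Lumos": "Brillux",
}

def swap_spells(text, mapping):
    """Replace every real spell name in the text with its made-up counterpart."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], text)

def score_answer(answer, mapping):
    """Return (made-up spells the model found, real spells it leaked).

    A leaked real name never appeared in the modified text, so it can
    only have come from memory, i.e. the model "cheated".
    """
    found = sorted(fake for fake in mapping.values() if fake in answer)
    leaked = sorted(real for real in mapping if real in answer)
    return found, leaked
```

You'd run `swap_spells` over the book text, hand the result to the model, and pass its reply to `score_answer`. Anything in the second list is a name it recalled rather than read.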
A real test is synthesizing 100,000 sentences, selecting random ones, and then injecting the traits you want the LLM to detect and describe, e.g. have a set of words or phrases that may represent spells and have them used so that they do something. Then have the LLM find these random spells in the random corpus.
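Something like this, as a minimal sketch. The vocabulary, the spell table and the names `build_corpus` and `recall` are all invented here, and the template sentences are far cruder than what you'd want for a real benchmark:

```python
import random

# Hypothetical made-up spells and the effect each one has when spoken.
SPELLS = {
    "Quorvash": "the candle relit itself",
    "Membrilo": "the door forgot how to open",
    "Tathrex": "the river ran uphill",
}

def build_corpus(n_sentences, spells, n_injected, seed=0):
    """Generate filler sentences, then overwrite random ones with spell usage.

    Returns the corpus (list of sentences) and the ground truth
    (sentence index -> spell planted there).
    """
    rng = random.Random(seed)
    subjects = ["The baker", "A traveller", "The old clerk", "Her cousin"]
    verbs = ["carried", "sold", "repaired", "misplaced"]
    objects = ["a lantern", "the ledger", "two barrels", "a brass key"]
    corpus = [
        f"{rng.choice(subjects)} {rng.choice(verbs)} {rng.choice(objects)}."
        for _ in range(n_sentences)
    ]
    truth = {}
    for pos in rng.sample(range(n_sentences), n_injected):
        spell, effect = rng.choice(list(spells.items()))
        corpus[pos] = f'The apprentice said "{spell}" and {effect}.'
        truth[pos] = spell
    return corpus, truth

def recall(reported, truth):
    """Fraction of the planted spells that the model reported."""
    planted = set(truth.values())
    if not planted:
        return 0.0
    return len(planted & set(reported)) / len(planted)
```

Call `build_corpus(100_000, SPELLS, 50)`, join the sentences into one prompt, and feed whatever spell names the LLM returns into `recall`. Since the spells never existed before the run, nothing can be remembered from training data.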
It could still remember where each spell is mentioned. I think the only way to properly test this would be to run it against an unpublished manuscript.
ggrab|19 days ago
MarcellusDrum|15 days ago
outofpaper|23 days ago
lxgr|23 days ago
staticman2|23 days ago
If you ask a model to discuss an obscure work it'll have no clue what it's about.
This is very different than asking about Harry Potter.