nestorD|3 months ago
I can vouch for the fact that LLMs are great at searching in the original language, summarizing key points to let you know whether a document might be of interest, then providing you with a translation where you need one.
The fun part has been building tools to turn Claude Code and Codex CLI into capable research assistants for that type of project.
throwup238|3 months ago
What does that look like? How well does it work?
I ended up writing a research TUI with my own higher-level orchestration (basically having the thing keep working in a loop until a budget is reached) and document extraction.
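For readers wondering what "keep working in a loop until a budget has been reached" might look like, here is a minimal sketch. It is not the actual TUI code: `Budget`, `run_until_budget`, and `fake_step` are hypothetical names, and the cost units are arbitrary (they could stand for tokens, dollars, or seconds).

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Budget:
    """Hypothetical budget: a cap on the number of steps and on total cost."""
    max_steps: int
    max_cost: float
    steps: int = 0
    cost: float = 0.0

    def exhausted(self) -> bool:
        return self.steps >= self.max_steps or self.cost >= self.max_cost


# A step receives the notes so far and returns (new_note, cost, done).
Step = Callable[[List[str]], Tuple[str, float, bool]]


def run_until_budget(step: Step, budget: Budget) -> List[str]:
    """Call `step` repeatedly until it reports done or the budget runs out."""
    notes: List[str] = []
    while not budget.exhausted():
        note, cost, done = step(notes)
        budget.steps += 1
        budget.cost += cost
        notes.append(note)
        if done:
            break
    return notes


def fake_step(notes: List[str]) -> Tuple[str, float, bool]:
    # Stand-in for a real agent call (assumption: one call costs 1.5 units).
    return (f"finding {len(notes) + 1}", 1.5, False)


notes = run_until_budget(fake_step, Budget(max_steps=10, max_cost=5.0))
print(len(notes))  # the loop stops once accumulated cost reaches the cap
```

The budget is only checked between steps, so the last step can overshoot the cost cap slightly; a stricter version would estimate the cost before each call.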
nestorD|3 months ago
But I realized I was not using it much because it was so big and inflexible (plus I keep wanting to stamp out all the bugs, which I do not have the time to do on a hobby project). So I ended up extracting it into MCPs (equipped to do full-text search and download OCR from the various databases I care about) and AGENTS.md files (defining pipelines, as well as patterns for both searching behavior and reporting of results). I also put together a sub-agent for translation (cutting away all tools besides reading and writing files, and giving it some document-specific contextual information).
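As a rough illustration of the "cut away all tools besides reading and writing files" idea, here is a sketch in plain Python. It does not use the real Claude Code or Codex CLI configuration formats; the tool names, `restrict_tools`, and `SubAgentSpec` are all hypothetical, and the search and OCR functions are empty placeholders.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, Iterable


def read_file(path: str) -> str:
    return Path(path).read_text(encoding="utf-8")


def write_file(path: str, text: str) -> None:
    Path(path).write_text(text, encoding="utf-8")


def search_database(query: str) -> list:
    return []  # placeholder for a full-text search tool


def download_ocr(doc_id: str) -> str:
    return ""  # placeholder for an OCR download tool


# Hypothetical registry of everything the main agent can call.
ALL_TOOLS: Dict[str, Callable] = {
    "read_file": read_file,
    "write_file": write_file,
    "search_database": search_database,
    "download_ocr": download_ocr,
}


def restrict_tools(registry: Dict[str, Callable], allowed: Iterable[str]) -> Dict[str, Callable]:
    """Return only the allowed tools; fail loudly on a misspelled name."""
    allowed = set(allowed)
    unknown = allowed - set(registry)
    if unknown:
        raise KeyError(f"unknown tools: {sorted(unknown)}")
    return {name: fn for name, fn in registry.items() if name in allowed}


@dataclass
class SubAgentSpec:
    name: str
    tools: Dict[str, Callable]
    context: str  # document-specific background handed to the sub-agent


translator = SubAgentSpec(
    name="translator",
    tools=restrict_tools(ALL_TOOLS, {"read_file", "write_file"}),
    context="placeholder: period, region, and document type go here",
)
print(sorted(translator.tools))
```

Using an allowlist rather than a denylist means any tool added to the registry later stays out of the translator's reach unless it is explicitly granted.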
That lets me use Claude Code and Codex CLI (which, anecdotally, I have found to be the better of the two for that kind of work; it seems to deal better with longer inputs produced by searches) as the driver, telling them what I am researching and maybe how I would structure the search, then letting them run in the background before checking their report and steering the search based on that.
It is not perfect (if a search surfaces 300 promising documents, it will not check all of them, and it often misunderstands things due to lacking further context), but I now find myself reaching for it regularly, and I polish out problems one at a time. The next goal is to add more data sources and to maybe unify things further.