By cruft cleaner, do you mean cleaning up the HTML well? Right now, we do two things to help with that: a pretty robust parsing stack, and a "summaries" feature that returns LLM-generated, query-biased text output for every webpage returned. If you mean something else, though, I'm curious.