Why is this a problem? You don't need an LLM; you need a model that detects whether and where the given context appears in the response. We're so used to LLMs now that we forget these NLP problems have been worked on for a long time: they don't require a huge computational beast, and they take a few milliseconds to run (on only the response, while it's being streamed).
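A minimal sketch of what the comment is describing, using only the Python stdlib (no LLM): fuzzy-match a chunk of the streamed response against the given context and report whether and where it appears. The function name, threshold, and scoring are illustrative assumptions, not anyone's actual implementation.

```python
# Sketch: detect whether (and where) a response chunk appears in the
# context using stdlib fuzzy matching. Runs in milliseconds, no LLM.
from difflib import SequenceMatcher

def find_in_context(response_chunk: str, context: str, threshold: float = 0.8):
    """Return (start, end, score) of the best match of response_chunk
    inside context, or None if the overlap is below the threshold.
    The 0.8 threshold is an arbitrary illustrative choice."""
    matcher = SequenceMatcher(None, context, response_chunk, autojunk=False)
    match = matcher.find_longest_match(0, len(context), 0, len(response_chunk))
    if match.size == 0:
        return None
    # Score: fraction of the chunk covered by the longest shared run.
    score = match.size / max(len(response_chunk), 1)
    if score < threshold:
        return None
    return (match.a, match.a + match.size, score)

context = "The Eiffel Tower was completed in 1889 and stands 330 metres tall."
print(find_in_context("completed in 1889", context))  # found: (start, end, 1.0)
print(find_in_context("built in 2001", context))      # not in context: None
```

In a streaming setup you would call something like this on each sentence or chunk as it arrives, flagging spans that are not grounded in the context.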
phailhaus|1 month ago