top | item 44502398

(no title)

_1 | 7 months ago

We do use WebLLM and a hosted Weaviate database, but there are complaints about speed (both retrieval and time to first token as the context will get big). The Gemma 3n "nesting doll" approach sounds like it could be useful .. but haven't found anyone specifically doing it to add domain specific knowledge.

discuss

janalsncm|7 months ago

Typically retrieval is the fast part in my experience. Have you considered cheaper retrieval methods? Bm25 does pretty well on its own. And you can augment your dataset by precomputing relevant queries for each doc.