
acutesoftware | 1 month ago

I am using LangChain with a SQLite database - it works pretty well on a 16 GB GPU, but I started running it on a crappy NUC, which also worked, with lesser results.

The real lightbulb moment is when you realise the ONLY thing a RAG pipeline passes to the LLM is a short string of search results made up of small chunks of text. That changes it from 'magic' to 'ahh, ok - I need better search results'. With small models you cannot pass a lot of search results (TOP_K=5 is probably the limit), otherwise the small models 'forget context'.
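To make the point concrete, here is a minimal sketch (not the author's actual code - the function and variable names are made up) of what a RAG pipeline ultimately hands to the LLM: just the question plus the top-k retrieved chunks joined into one short context string.

```python
def build_prompt(question, retrieved_chunks, top_k=5):
    """Join the top_k retrieved chunks into a single context string.

    This string, plus the question, is everything the LLM sees -
    which is why retrieval quality matters so much. With small
    models, keep top_k low or they 'forget context'.
    """
    context = "\n\n".join(retrieved_chunks[:top_k])
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# Hypothetical usage: pretend these came back from a vector search
chunks = ["chunk one ...", "chunk two ...", "chunk three ..."]
prompt = build_prompt("What is in chunk two?", chunks, top_k=2)
```

With top_k=2, the third search result never reaches the model at all - better retrieval ranking beats a bigger prompt.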

It is fun trying to get decent results - and it is a rabbit hole. The next step I am going into is pre-summarising files and folders.

I open sourced the code I was using - https://github.com/acutesoftware/lifepim-ai-core


IXCoach | 1 month ago

You can modify this - there are settings for how much context to pass and for chunk size.

We had to do this: the 3 best matches at about 1000 characters each was far more effective than the default I ran into of 15-20 snippets of 4 sentences each.

We also found a setting for where to cut off and/or start each chunk, and set it to double newlines.

Then we just structured our agentic memory into meaningful chunks with 2 newlines between each, and it gelled perfectly.
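The approach described above can be sketched roughly like this (a hypothetical stand-in, not the commenter's actual settings): split the text on blank lines so each chunk is one meaningful unit, then pack paragraphs together up to roughly 1000 characters per chunk.

```python
def split_on_blank_lines(text, max_chars=1000):
    """Split text into chunks at double newlines, packing paragraphs
    together until adding another would exceed max_chars.

    Structuring the source text with a blank line between each
    meaningful unit keeps related content inside one chunk.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```

Small adjacent paragraphs get merged into one chunk; a paragraph that would push a chunk past the limit starts a new one, so chunk boundaries always fall on the double-newline separators rather than mid-sentence.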

( hope this helps )

reactordev | 1 month ago

You can expand your context window to something like 100,000 tokens to prevent memory loss.