yatz | 1 year ago
Here is how it works: when you upload attachments (in my case a very large PDF), it chunks the PDF into small parts and stores them in a vector database. The chunking part does not seem that great: on every call, the system retrieves a large chunk or many chunks and sends them to the model along with your prompt, which can inflate your per-request cost to ten times the prompt + response tokens combined. So be mindful of the hidden costs and monitor your usage.
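A toy sketch of the cost dynamic described above. The chunk sizes, retrieval count, and the 4-characters-per-token heuristic are all illustrative assumptions, not the vendor's actual parameters; real retrieval ranks chunks by embedding similarity rather than this stub.

```python
# Toy illustration (assumed numbers): naive RAG chunking can make the
# retrieved context dwarf the prompt + response tokens on every request.

def chunk(text: str, chunk_size: int = 500) -> list[str]:
    """Split text into fixed-size character chunks (naive chunking)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token."""
    return max(1, len(text) // 4)

def request_cost(prompt: str, retrieved: list[str], response: str) -> dict:
    """Break a single request's token bill into its components."""
    prompt_tokens = estimate_tokens(prompt)
    response_tokens = estimate_tokens(response)
    context_tokens = sum(estimate_tokens(c) for c in retrieved)
    return {
        "prompt": prompt_tokens,
        "retrieved_context": context_tokens,
        "response": response_tokens,
        "total": prompt_tokens + context_tokens + response_tokens,
    }

pdf_text = "x" * 50_000           # stand-in for a large PDF's text
chunks = chunk(pdf_text)          # 100 chunks of ~125 tokens each
retrieved = chunks[:10]           # retriever pulls many chunks per call

cost = request_cost("Summarize section 3", retrieved, "Section 3 covers...")
# the retrieved context is the dominant term in cost["total"]
```

With these made-up numbers the retrieved context alone is ~1250 tokens against single-digit prompt and response counts, which is the shape of the inflation the comment describes.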
nl|1 year ago
This is how RAG works.
While you can come up with workarounds, like using lesser LLMs as a pre-filtering step, the fact is that if you need GPT to read the doc, you need GPT to read the doc.
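A minimal sketch of the pre-filtering workaround mentioned above: a cheap scoring pass keeps only the chunks most relevant to the query before the expensive model sees them. In a real system the scorer would be a small LLM or an embedding model; here a keyword-overlap score stands in for it, and all chunk strings are invented examples.

```python
# Sketch under stated assumptions: keyword overlap as a stand-in for a
# cheap "lesser LLM" relevance scorer that pre-filters chunks.

def cheap_relevance_score(query: str, chunk: str) -> float:
    """Stand-in for a small model: fraction of query words found in the chunk."""
    query_words = set(query.lower().split())
    chunk_words = set(chunk.lower().split())
    return len(query_words & chunk_words) / max(1, len(query_words))

def prefilter(query: str, chunks: list[str], keep: int = 2) -> list[str]:
    """Rank chunks by the cheap score; only the top few reach the big model."""
    ranked = sorted(chunks, key=lambda c: cheap_relevance_score(query, c),
                    reverse=True)
    return ranked[:keep]

chunks = [
    "invoice totals and payment terms for Q3",
    "appendix: office floor plan diagrams",
    "payment schedule and late fee policy",
    "company holiday calendar",
]
selected = prefilter("what are the payment terms", chunks)
# only the payment-related chunks go on to the expensive model
```

This cuts the context sent per request, but as the comment notes, it only helps when the cheap scorer can tell which chunks matter; if the whole document is relevant, GPT still has to read the whole document.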
benreesman|1 year ago
Ideally, if you want a model's weights to include a credible representation of non-trivial data, you want it somewhere in the training pipeline (usually earlier is better for important stuff, but that's a heuristic at best). But there's transfer learning of various kinds, joint losses of countless kinds (CLIP in SD-style diffusers comes to mind), fine-tunes (if that doesn't just count as transfer learning), dimensionality reduction that is often remarkably effective, multi-tower models like what evolved into DLRM, and I'm forgetting/omitting easily 100x the approaches I mentioned.
It’s possible I misunderstand you, so please elaborate if so?