item 40395968

yatz | 1 year ago

The Assistants API is promising, but earlier versions have many issues, especially with how costs are calculated. Per the OpenAI docs, you pay for data storage, a fixed price per API call, plus token usage. It sounds straightforward until you start using it.

Here is how it works. When you upload an attachment, in my case a very large PDF, the API chunks that PDF into small parts and stores them in a vector database. The chunking does not seem very good: every time you make a call, the system loads a large chunk, or many chunks, and sends them to the model along with your prompt, which inflates your per-request cost to as much as 10 times the prompt and response tokens combined. So be mindful of the hidden costs and monitor your usage.
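A back-of-envelope sketch of the inflation described above: retrieved chunks are billed as input tokens right alongside your prompt. The prices and token counts here are made-up placeholders, not OpenAI's actual rates.

```python
# Rough per-request cost model for a RAG-style call (illustrative only).
def request_cost(prompt_toks, retrieved_toks, response_toks,
                 in_price_per_1k, out_price_per_1k):
    """Retrieved chunks count as input tokens, same as the prompt itself."""
    input_toks = prompt_toks + retrieved_toks
    return (input_toks / 1000) * in_price_per_1k \
         + (response_toks / 1000) * out_price_per_1k

# A 200-token prompt that drags in 8,000 tokens of chunks is dominated
# by the retrieved tokens, not by anything the user typed:
with_rag = request_cost(200, 8000, 500, 10.0, 30.0)
without_rag = request_cost(200, 0, 500, 10.0, 30.0)
```

With these placeholder numbers the retrieval-backed call costs several times the bare call, which is the kind of multiplier the parent comment is describing.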


nl | 1 year ago

> as every time you make a call, the system loads a large chunk or many chunks and sends them to the model along with your prompt,

This is how RAG works.

While you can come up with workarounds, like using cheaper LLMs as a pre-filtering step, the fact remains that if you need GPT to read the doc, you need GPT to read the doc.
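The loop the parent describes can be illustrated in a few lines: score stored chunks against the query, take the best ones, and prepend them to the prompt. Real systems use embedding similarity; simple word overlap stands in for it here, and all names are illustrative.

```python
# Toy RAG retrieval: everything returned by build_prompt is billed as
# input tokens, retrieved context included.
def overlap(a: str, b: str) -> int:
    """Crude relevance score: shared lowercase words between two strings."""
    return len(set(a.lower().split()) & set(b.lower().split()))

def build_prompt(query: str, chunks: list[str], k: int = 2) -> str:
    """Pick the k highest-scoring chunks and stuff them into the prompt."""
    top = sorted(chunks, key=lambda c: overlap(query, c), reverse=True)[:k]
    context = "\n---\n".join(top)
    return f"Context:\n{context}\n\nQuestion: {query}"
```

The point stands regardless of how retrieval is implemented: whatever context gets selected ends up in the prompt, and you pay for it.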

firejake308 | 1 year ago

True, this is how RAG works, but it's why I prefer open-source LLMs for RAG: the token costs are less opaque, and I can control how many chunks I pull from the database to manage my costs.
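The chunk-count control mentioned here amounts to capping retrieved context against a token budget before it reaches the prompt. A minimal sketch, with illustrative names and a crude 4-characters-per-token estimate:

```python
# Take top-ranked chunks in order until a rough token budget is hit,
# so per-request input cost stays bounded and predictable.
def select_chunks(ranked_chunks, max_tokens, est=lambda s: len(s) // 4):
    chosen, used = [], 0
    for chunk in ranked_chunks:
        cost = est(chunk)
        if used + cost > max_tokens:
            break  # adding this chunk would blow the budget
        chosen.append(chunk)
        used += cost
    return chosen
```

With a hosted black-box retriever you don't get this knob; running your own retrieval pipeline is what makes the cap possible.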

metaskills | 1 year ago

Yup, this seems right. You pay for tokens no matter what, even in other APIs. Did you know you can set an expiration on files, vector stores, etc.? No need to pay for long-term storage on those. Also, threads are free.
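A sketch of the expiration setting this comment refers to, assuming the official `openai` Python SDK. The method path shown (`client.beta.vector_stores.create`) has moved between SDK versions, so treat the call itself as illustrative; the `expires_after` payload is the part of interest.

```python
# Expiration policy for a vector store: the store is deleted some number
# of days after it was last used, so storage fees stop accruing.
expires_after = {
    "anchor": "last_active_at",  # the clock restarts on each use
    "days": 7,                   # deleted 7 days after last activity
}

# Illustrative usage with the openai SDK (path may differ by version):
# client = openai.OpenAI()
# store = client.beta.vector_stores.create(
#     name="docs",
#     expires_after=expires_after,
# )
```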

iamflimflam1 | 1 year ago

There isn’t really any other way for this to work. The only way for the model to answer questions about your PDF is for the information to be somewhere in the prompt.

benreesman | 1 year ago

That might be true of specific models, or of specific APIs for accessing them, but I’d argue it isn’t even remotely true of neural networks generally, or of generatively pretrained, decoder-only, attention-based language models in particular.

Ideally, if you want a model’s weights to include a credible representation of non-trivial data, you want it somewhere in the training pipeline (usually earlier is better for important stuff, but that’s a heuristic at best). But there’s transfer learning of various kinds, joint losses of countless kinds (CLIP in SD-style diffusers comes to mind), fine-tunes (if those don’t just count as transfer learning), dimensionality reduction that is often remarkably effective, multi-tower models like what evolved into DLRM, and I’m forgetting/omitting easily 100x the approaches I mentioned.

It’s possible I misunderstand you; if so, please elaborate.

barfbagginus | 1 year ago

The way they vectorized the PDF could be less efficient than simply extracting the text and dropping it into the context as text. If it's a 100 MB PDF, it's probably a scanned PDF, and OpenAI is probably using an OCR model to vectorize each page directly. It seems like an opaque process with room to be inefficient, so I would be interested to know whether we could save on token/vector fees by preprocessing the PDF to text with our own OCR.
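For a PDF that still has a text layer (i.e., not a scan), the preprocessing idea above can be as simple as extracting the text yourself and uploading that instead. A sketch assuming the third-party `pypdf` package; a truly scanned PDF has no text layer and would need real OCR (e.g. tesseract) instead.

```python
# Extract the text layer yourself before upload, so you pay for plain
# text rather than whatever the opaque server-side pipeline produces.
def pdf_to_text(path: str) -> str:
    """Concatenate the text layer of every page in the PDF."""
    from pypdf import PdfReader  # imported lazily; pip install pypdf
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

def rough_token_count(text: str) -> int:
    """Crude estimate: roughly 4 characters per English token."""
    return len(text) // 4
```

Comparing `rough_token_count` of the extracted text against observed usage would at least put a number on how much the hosted chunking inflates things.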

hnuser123456 | 1 year ago

How large is a very large PDF?

yatz | 1 year ago

Close to 100 MB.