top | item 35428107

(no title)

It's just the GPT-4 API - the chunks are sent as part of a prompt. In that case it won't use data from all chunks but it will try to find any chunks that provide descriptions of the document. I've found with research papers, for example, it fetches parts of the introduction and abstract.

discuss

wufufufu|2 years ago

Oh so there is pre-processing to find the useful portions? What are you using for the pre-processing?

I feel that it's inevitable that OpenAI et al. will be able to handle large PDF documents eventually. But until then I'm sure there's a lot of value of in this kind of pre-processing/chunking.

naveedjanmo|2 years ago

Yeah I think you're right - the 32k context window for GPT-4 (not available for everyone yet) is already enough for research papers. I'm using a library called Langchain, there's also LlamaIndex.