anotherpaulg | 4 months ago
1. Minimize the number of PDF pages per context/call. Don't dump a giant document set into one request; break it into the smallest coherent chunks.
2. In a clean context, re-send the page and the extracted target content and ask the model to proofread/double-check the extracted data.
3. Repeat the extraction and/or the proofreading steps with a different model and compare the results.
4. Iterate until a proofreading pass leaves the data unchanged, or flag the proofreading failures for a stronger model or human intervention.
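
A rough sketch of how those four steps could fit together, in case it helps. The `extract` and `proofread` callables are hypothetical stand-ins for whatever model calls you actually make (they are not any real library's API), and the dict-of-strings shape for the extracted data is just an assumption for illustration:

```python
from typing import Callable, Dict, List, Tuple

Extraction = Dict[str, str]
# Hypothetical signatures: an extractor turns one page into fields; a
# proofreader gets the page plus the current fields and returns the fields
# it believes are correct (unchanged if it agrees).
Extractor = Callable[[str], Extraction]
Proofreader = Callable[[str, Extraction], Extraction]


def extract_page(
    page: str,
    extract: Extractor,
    proofreaders: List[Proofreader],
    max_rounds: int = 3,
) -> Tuple[Extraction, bool]:
    """Return (data, ok); ok is False if proofreading never stabilized."""
    data = extract(page)
    for _ in range(max_rounds):
        changed = False
        for proofread in proofreaders:
            # Each call should be a clean context: only the page and the data.
            checked = proofread(page, data)
            if checked != data:
                data, changed = checked, True
        if not changed:
            return data, True  # every proofreader passed without edits
    return data, False  # flag for a stronger model or a human


def process_document(
    pages: List[str], extract: Extractor, proofreaders: List[Proofreader]
) -> Tuple[List[Extraction], List[int]]:
    """Handle one small chunk per call and collect indexes of flagged pages."""
    results: List[Extraction] = []
    flagged: List[int] = []
    for index, page in enumerate(pages):
        data, ok = extract_page(page, extract, proofreaders)
        results.append(data)
        if not ok:
            flagged.append(index)
    return results, flagged
```

Passing proofreaders backed by different models covers step 3, and the `flagged` list is what you would hand to a stronger model or a person.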
SketchySeaBeast|4 months ago