top | item 43723215

(no title)

tominous | 10 months ago

In my case I had hundreds of invoices in a not-very-consistent PDF format which I had contemporaneously tracked in spreadsheets. After data extraction (pdftotext + OpenAI API), I cross-checked against the spreadsheets, and for any discrepancies I reviewed the original PDFs and old bank statements.

The main issue I had was it was surprisingly hard to get the model to consistently strip commas from dollar values, which broke the csv output I asked for. I gave up on prompt engineering it to perfection, and just looped around it with a regex check.

Otherwise, accuracy was extremely good and it surfaced a few errors in my spreadsheets over the years.

discuss

order

jofzar|10 months ago

I hope there is a future where csv comma's don't screw up data. I know it will never happen but it's a nightmare.

Everyone has a story of a csv formatting nightmare