top | item 38482982

defsectec | 2 years ago

This looks amazing, I'll have to play around with this over the weekend.

I regularly hand transcribe RPG PDFs scans from dubious sources that have not always been run through OCR to produce selectable text. When they have, it wasn't always done very well.

It's literally faster to type it all myself than to fix all the errors left by copy-pasting (or by running OCR to turn it into text).

Even if the file was an official PDF, the formatting would often get screwed up, with lots of double or triple spaces and even tabs between words.

This would save so much time if I can get it to work. Thanks for sharing!
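
For the stray-whitespace problem mentioned above (double or triple spaces and tabs between words), here is a minimal Python sketch. It assumes the text has already been copied out of the PDF into a string, and `normalize_spacing` is a hypothetical helper name, not part of the tool being discussed. It only collapses spacing; it does nothing about OCR misreads.

```python
import re

def normalize_spacing(text: str) -> str:
    """Collapse runs of spaces and tabs into a single space, line by line."""
    cleaned_lines = []
    for line in text.splitlines():
        # Replace any run of spaces or tabs with one space, then trim the ends.
        cleaned_lines.append(re.sub(r"[ \t]+", " ", line).strip())
    # Line breaks are preserved so paragraphs and stat blocks keep their shape.
    return "\n".join(cleaned_lines)

sample = "The  goblin\tattacks   with a\t\trusty  dagger."
print(normalize_spacing(sample))  # prints: The goblin attacks with a rusty dagger.
```

Working line by line keeps the original line breaks, which probably matters for tables and stat blocks; whether that is the right trade-off depends on how the PDF text was extracted.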

milep|2 years ago

I also had this use case in mind. I already tried it with one book, but the results were not that good: many of the tables and text boxes were messed up. I had pretty good results converting tables to markdown with ChatGPT by taking a screenshot of a table and pasting it into the chat. It was able to handle some "irregular" tables with a bit of prompting, like "Read the table row by row. Column headers are X, Y, Z. X is text, Y is number, Z is word" as a simplified example.

crooked-v|2 years ago

> I regularly hand transcribe RPG PDFs scans from dubious sources

Heh, that was my immediate thought too. There's a ton of RPG stuff that never had any kind of physical release and is totally orphaned as IP.