top | item 46850586

(no title)

xecaz | 28 days ago

I also wrote an extraction tool that extracts all pictures to files named after the PDF they came from, OCRs each JPG for text and, if it finds more than 8 characters, writes the text to a .txt file. It leaves the original files in place should you need to revisit them, and makes the dump searchable locally. I'll link the repo if anyone from the media is interested; I didn't have the manpower to do this manually.
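For anyone curious what the naming and threshold steps might look like, here is a minimal Python sketch. It is not the code from the repo mentioned above: the names `image_name`, `write_text_if_long`, and `MIN_CHARS` are hypothetical, the `_img0001` naming pattern is an assumption, and the OCR engine is passed in as a plain callable because the comment does not say which one is used. Counting characters after stripping whitespace is also an assumption.

```python
from pathlib import Path
from typing import Callable, Optional

# Threshold from the comment: keep OCR text only if it is longer than this.
MIN_CHARS = 8


def image_name(pdf_path: Path, index: int, ext: str = "jpg") -> str:
    """Build an image file name that records which PDF the image came from.

    The exact pattern is an assumption; the comment only says the names are
    associated with the source PDF.
    """
    return f"{pdf_path.stem}_img{index:04d}.{ext}"


def write_text_if_long(
    image_path: Path,
    ocr: Callable[[Path], str],
    min_chars: int = MIN_CHARS,
) -> Optional[Path]:
    """OCR one image and write a sibling .txt file if the text is long enough.

    `ocr` is any function that takes an image path and returns its text.
    The image file itself is never modified or deleted, so the original
    stays available for a second look.
    """
    text = ocr(image_path).strip()
    if len(text) <= min_chars:
        return None  # too short to be useful, skip it
    txt_path = image_path.with_suffix(".txt")
    txt_path.write_text(text, encoding="utf-8")
    return txt_path
```

With the text sitting in .txt files next to the images, any local search tool (grep, for example) can search the dump. A real run would pass in an OCR callable, for instance a thin wrapper around Tesseract, though that choice is a guess on my part.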
