Show HN: DocRipper – Scrape .doc|.docx|.pdf|.txt|.sketch with 1 command

14 points | pzaich | 9 years ago | github.com

2 comments

Meekro | 9 years ago
Interesting! Apache Tika is the "classic" way of doing this, and it supports many dozens of formats. Is this meant to address certain needs that Tika doesn't?
pzaich | 9 years ago
Interesting! I started this project a long while ago and gradually added features and formats after running into performance issues with other file parsers (not Tika). Tika looks like a great solution if you don't mind the Java dependency.

Here's a JRuby wrapper: https://github.com/ricn/rika