Interesting! Apache Tika is the "classic" way of doing this, and it supports many dozens of formats. Is this meant to address certain needs that Tika doesn't?
Interesting! I started this project a long while ago and gradually added features and formats after running into performance issues with other file parsers (not Tika). Tika looks like a great solution if you don't mind the Java dependency.
Meekro|9 years ago
pzaich|9 years ago
Here's a JRuby wrapper: https://github.com/ricn/rika