(no title)
emj|1 year ago
Last time I tried to parse .docx it was full of opaque binary blobs. It might be a zip, but parsing the data is like summoning arcane magic. It might have changed in the last decade, but considering that Microsoft has no incentive to make the situation better, parsing it is always going to be a "fun" exercise.
tithe|1 year ago
But indexing PDFs, now there's a fun one.