Pandoc (https://pandoc.org) can convert a .docx file to Markdown and to other formats such as djot and Typst. I don't think pandoc can convert PowerPoint and Excel files.
The hard part about document conversion is not finding a tool that can convert between the formats but finding the one that does it best. I wonder how MarkItDown ranks on each of the file types it handles.
The README of MarkItDown mentions "indexing and text analysis" as the two motivating features, whereas Pandoc is more interested in document preparation via conversion that maintains rich text formatting.
Since my personal use leans towards the latter, I'm hesitant to believe that this tool will work better for me, but others may have other priorities.
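
For anyone who wants to try the pandoc route on a .docx file, here is a minimal sketch in Python. It assumes pandoc is installed and on your PATH. The helper names `pandoc_command` and `convert` and the extension map are my own, not part of pandoc, and the djot and Typst writers need a reasonably recent pandoc release.

```python
import shutil
import subprocess
from pathlib import Path

# Assumed output extensions for the three formats mentioned above.
EXTENSIONS = {"markdown": ".md", "djot": ".dj", "typst": ".typ"}


def pandoc_command(src, target_format="markdown"):
    """Build the pandoc argv for converting a .docx file (hypothetical helper)."""
    src = Path(src)
    dest = src.with_suffix(EXTENSIONS[target_format])
    return [
        "pandoc", str(src),
        "--from", "docx",
        "--to", target_format,
        "--output", str(dest),
    ]


def convert(src, target_format="markdown"):
    """Run pandoc if it is available; return the output path, or None if not."""
    if shutil.which("pandoc") is None:
        return None  # pandoc is not on PATH, so nothing is converted
    cmd = pandoc_command(src, target_format)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd[-1]
```

Calling `convert("report.docx", "typst")` would write `report.typ` next to the input. Whether the result keeps the formatting you care about is something you still have to check by eye.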
That was the first thing I checked, and it looks like they're using an existing Python package to parse docx files. I wonder if they contributed to it or vetted it thoroughly.
zamadatix|1 year ago
jez|1 year ago
disgruntledphd2|1 year ago
_rs|1 year ago
LordDragonfang|1 year ago