(no title)
jp57 | 2 months ago
Well, yes, but the binary blob is a zip archive of a directory of text XML files, and one could imagine tooling that wraps the git interaction in an unzip/zip bracket.
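A minimal sketch of what that unzip/zip bracket might look like, using only Python's standard `zipfile` module. The function names `unpack_docx` and `pack_docx` are made up for illustration, and the idea of running one before `git add` and the other after a checkout is an assumption, not an existing tool. Whether Word accepts every archive repacked this way (member order, compression settings) is untested here.

```python
import zipfile
from pathlib import Path


def unpack_docx(docx_path, out_dir):
    """Extract the .docx zip into out_dir and return the sorted member names."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(docx_path) as zf:
        zf.extractall(out)
        # Skip explicit directory entries; only files matter for diffing.
        return sorted(n for n in zf.namelist() if not n.endswith("/"))


def pack_docx(src_dir, docx_path):
    """Zip every file under src_dir back into a single archive at docx_path."""
    src = Path(src_dir)
    files = sorted(p for p in src.rglob("*") if p.is_file())
    with zipfile.ZipFile(docx_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for p in files:
            # Store paths relative to src_dir, with forward slashes as zip expects.
            zf.write(p, p.relative_to(src).as_posix())
    return [p.relative_to(src).as_posix() for p in files]
```

With something like this, git would track the extracted XML tree as ordinary text files, and the `.docx` would be rebuilt only when someone needs to open it in Word.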
The real problem is that lawyers, like basically all other non-programmers, neither know nor care about the sequence of bytes that makes a file in the minds of programmers. In their minds the file IS what they see when they open it in Word: a sequence of white rectangles with text laid out on them in specific ways, including tables with borders, etc. The fact that a lot of really complicated stuff goes on inside the file to get the WYSIWYG rendering is not only irrelevant to them, it's unknown.
Maybe the answer here will be along the lines of Karpathy's musings about making LLMs work directly with pixels (images of text), instead of encoded text and tokenizers [1]. An AI tool would take the document in its visually standard legal form, read it, and produce output with edits, redlines, etc. as directed by the user.
jpbryan|2 months ago
The diff of the document (referred to as a "redline") is what lawyers send to the client and their counterparties. It's essential that the redline is legible for all parties and reflects their professionalism.
Moreover, it is not enough to see the structural changes between the versions. A lawyer needs to see the formatting changes between the versions as well, which cannot be accomplished by diffing XML files.
HPsquared|2 months ago
Imustaskforhelp|2 months ago
If OpenXML can be converted to CSV or something similar, perhaps that could in turn be converted to recutils.
Recutils supports converting both mdb (Microsoft Access database files) and CSV files to/from recutils.
I saw this project in a recent Hacker News comment, and I had seen some comments there about how it does / can work decently with git features, iirc (https://news.ycombinator.com/item?id=46265811)
I am interested to hear your thoughts on recutils, and whether we could perhaps have a Microsoft Word (or similar) to git+recutils workflow.
I thought about it, and a tarred/zipped git folder that can also contain images and other content, referenced from recutils instead of an OpenXML/Word document, does feel like an interesting idea to me.
I am not sure, but I think OpenXML directly embeds data like pictures, which can definitely make it hard for git software to work. Basically, I am interested in what you think about this, and in any feedback.
conartist6|2 months ago
jiggawatts|2 months ago
At the start of the project the Markdown is authoritative, and the DOCX is just for previewing the styling. (Pandoc can insert the text into a layout template with placeholders.)
Towards the end of a project I'll start treating the DOCX as authoritative but continue generating Markdown from it, so I can run the AI over it as a final proof-read or whatever.
This is similar to what people used to do with DocBook, but with a friendlier text format and a more AI-friendly "modern" workflow with Git, etc.
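A rough sketch of the two conversion directions described above, written as Python helpers that only build the pandoc command lines. The helper names and file names are hypothetical. It assumes the layout template is passed through pandoc's `--reference-doc` option, which carries styles over; the placeholder-based templating mentioned above may well use a different mechanism. GitHub-flavored Markdown (`gfm`) as the output format is also just one possible choice.

```python
import subprocess


def md_to_docx_cmd(md_path, docx_path, template=None):
    """Early phase: Markdown is authoritative, DOCX is a styled preview."""
    cmd = ["pandoc", md_path, "-f", "markdown", "-t", "docx", "-o", docx_path]
    if template:
        # Assumption: styling comes from a reference .docx supplied by the user.
        cmd.append(f"--reference-doc={template}")
    return cmd


def docx_to_md_cmd(docx_path, md_path):
    """Late phase: DOCX is authoritative, Markdown is regenerated for git and AI review."""
    return ["pandoc", docx_path, "-f", "docx", "-t", "gfm", "-o", md_path]


def run(cmd):
    """Execute a prepared command; raises CalledProcessError if pandoc fails."""
    subprocess.run(cmd, check=True)
```

For example, `run(md_to_docx_cmd("report.md", "report.docx", template="layout.docx"))` would produce the preview, and `run(docx_to_md_cmd("report.docx", "report.md"))` would regenerate the Markdown once the DOCX becomes the source of truth.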
conception|2 months ago