I really assumed that this was that; even calling it a universal version control system for binary files would be kind of a weird way of describing it but is plausibly a valid description for the package manager.
Git can display diffs between binary files using custom diff drivers:
> Put the following line in your .gitattributes file: *.docx diff=word
> This tells Git that any file that matches this pattern (.docx) should use the “word” filter when you try to view a diff that contains changes. What is the “word” filter? You have to set it up [in .gitconfig].
In their 'Git is unsuited for applications' blog post[0] they also say the following:
> We currently have to clone the whole repository just to edit translation files. That is problematic for big repositories. The repository for posthog.com for example is ~680MB in size. Even though we only need translation files which would be at max 1MB in size, we have to clone the whole repository. That is also one of the reasons why git is not used at Facebook, Google & Co which have repository sizes in the gigabytes.
This is great for showing diffs. To actually make git store only deltas, not entire binaries, you would need to configure "clean" and "smudge" filters for the format.
Given that docx (and xlsx) are a bunch of XML files compressed by zip, you can actually have clean diffs, and small commits.
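A minimal sketch of the clean/smudge idea, not a docx-specific setup: it uses gzip as a stand-in for the zip container so that it runs with standard tools. The filter name `gunzip`, the file name and the sample content are assumptions made for illustration. The clean filter runs on `git add` and decides what git stores; the smudge filter runs on checkout and rebuilds the working-tree file.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

# Route *.gz files through a filter named "gunzip" (hypothetical name).
echo '*.gz filter=gunzip' > .gitattributes
# clean: working-tree bytes on stdin, bytes to store on stdout (decompressed).
git config filter.gunzip.clean 'gzip -dc'
# smudge: stored bytes on stdin, working-tree bytes on stdout (recompressed).
git config filter.gunzip.smudge 'gzip -nc'

# A compressed file stands in for a zipped document format.
printf 'cell B4: pending\n' | gzip -nc > sheet.txt.gz
git add .gitattributes sheet.txt.gz
git commit -q -m "add sheet"

# What git actually stored is the plain text, so later edits delta well.
stored=$(git cat-file -p HEAD:sheet.txt.gz)
printf '%s\n' "$stored"
```

A real docx or xlsx filter would have to unpack and repack the zip archive deterministically, which is more work than this sketch shows.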
Yeah, this is how I would prefer to solve this problem personally, but it would be really nice to have some collection of tools that cover common binary file formats automatically instead of having to configure this manually every time.
This is really great. I read the Git config article, but I thought the image diff example was kinda lackluster. I'm sure some better metrics could be extracted for a more descriptive diff.
Learnings from the comments so far: I need to refine the positioning of lix.
Hi, before you get too wedded to the name, you should be aware that there's already a major nix project called lix: https://lix.systems/.
I wonder how much room this leaves for unintended changes that are not shown. E.g. Excel is a complex format that allows all sorts of metadata and embeddings that would not always show up as cell changes ...
Depends on the diff you render and what the plugin tracks.
It can, but the plugins are not production-ready yet. I should clarify that.
> But then the first thing it talks about is diffing files. Which honestly shouldn’t even be a feature of VCS. That’s just a separate layer.
Most version control systems that are not Git support binary files. In the industry you most often see Perforce P4 and Subversion being used for that purpose.
Through the gitattributes and gitconfig files, git can be extended to work with any external tool for specific file types. For example: https://github.com/ewanmellor/git-diff-image
It seems to me that this is just an issue of diff features. Git can be extended to show semantic diffs of binary files, so this doesn't technically need a completely new VCS.
I look at the page and leave without any clue as to what it actually does. Agents and AI are mentioned so I assume it might just be incoherent slop?
Great semantic diffs, but does Lix actually define a merge algebra for concurrent structured edits, or are conflicts just punted back to humans? How does its SQL engine guarantee deterministic merges vs last-write-wins?
Merge algebra is similar to git with a three way merge. Given that lix tracks individual changes, the three way merge is more fine grained.
Yes. The tracking works via plugins to keep it generic. Here is a rough illustration:
File change -> Plugin (detects changes) -> Lix
Based on the product description, it seems that they don't like text, and want to deal in objects. It would feel strange if they did support a terminal, rather than a GUI.
gu5|1 month ago
yjftsjthsd-h|1 month ago
Rexxar|1 month ago
uasi|1 month ago
> Put the following line in your .gitattributes file: *.docx diff=word
> This tells Git that any file that matches this pattern (.docx) should use the “word” filter when you try to view a diff that contains changes. What is the “word” filter? You have to set it up [in .gitconfig].
https://git-scm.com/book/en/v2/Customizing-Git-Git-Attribute...
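A minimal sketch of the diff-driver mechanism quoted above, run in a throwaway repository. To stay runnable without a Word converter it swaps in `*.bin` files and `od` as the text converter; the driver name `hex`, the file name and the sample bytes are assumptions for illustration. For `.docx`, the same `textconv` setting would point at a document-to-text converter instead.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

# Attribute: *.bin files use the diff driver named "hex" (hypothetical name).
echo '*.bin diff=hex' > .gitattributes
# Driver: git runs this command on each version and diffs the text it prints.
git config diff.hex.textconv 'od -An -c'

# A file containing a NUL byte, which git treats as binary.
printf 'price\00010' > data.bin
git add .gitattributes data.bin
git commit -q -m "first version"

# Change the trailing "10" to "12" and diff the working tree.
printf 'price\00012' > data.bin
diff_output=$(git diff -- data.bin)
printf '%s\n' "$diff_output"
```

Without the two configuration lines, the same `git diff` only reports that the binary files differ.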
danudey|1 month ago
> We currently have to clone the whole repository just to edit translation files. That is problematic for big repositories. The repository for posthog.com for example is ~680MB in size. Even though we only need translation files which would be at max 1MB in size, we have to clone the whole repository. That is also one of the reasons why git is not used at Facebook, Google & Co which have repository sizes in the gigabytes.
I get that it can be a bit complex, but Git can handle this circumstance pretty easily if you know how (or write a script for it).
For example, cloning the GIMP repo from GitLab takes me about 56 seconds and uses up 632 MB on disk, using just `git clone <repo>`.
In comparison, running these commands:
(You can also run `git sparse-checkout init --no-cone` and then just `git sparse-checkout add '*.po'` to grab every .po file in the repo and nothing else)
Takes 14 seconds on my laptop and uses 59 MB of disk space, and checks out only the specified directories and their contents.
So yeah, it's not as automatic as one might like but ship a shell script to your translators and you're good to go. The 'Git can't do X' arguments are mostly untrue; it should really be 'Getting git to do X is more complicated than I would prefer' or 'Explaining how to do X in git is a pain', both of which are legitimate complaints.
[0] https://samuelstroschein.com/blog/git-limitations/
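The commands referred to in this comment did not survive in this copy of the thread. What follows is a sketch of the general technique (a blobless partial clone combined with sparse checkout), not necessarily the commenter's exact commands. To stay self-contained it builds a tiny local stand-in repository; the directory names `po` and `src`, the file contents and the clone name `translators` are assumptions. The timings and sizes quoted above come from the comment, not from this sketch.

```shell
set -eu
work=$(mktemp -d)

# Build a small stand-in "upstream" repository (hypothetical layout).
git init -q "$work/upstream"
cd "$work/upstream"
git config user.name "Example"
git config user.email "example@example.com"
mkdir -p po src
echo 'msgid "hello"' > po/de.po
echo 'int main(void) { return 0; }' > src/main.c
git add .
git commit -q -m "initial import"
# Let this repository serve partial clones and on-demand object requests.
git config uploadpack.allowFilter true
git config uploadpack.allowAnySHA1InWant true

# Blobless, sparse clone: file contents are fetched only when checked out.
cd "$work"
git clone -q --filter=blob:none --sparse "file://$work/upstream" translators
cd translators
# Check out just the translation directory.
git sparse-checkout set po

checked_out=$(ls)
printf '%s\n' "$checked_out"
```

Against a real remote, the `file://` URL would be replaced by the repository URL and the rest could be shipped to translators as a script, as the comment suggests.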
theknarf|1 month ago
nine_k|1 month ago
packetlost|1 month ago
cat5e|1 month ago
Thanks for sharing!
samuelstros|1 month ago
Going through the questions now. So much for going to bed.
samuelstros|1 month ago
Lix is not a replacement for git. Nor does it target version controlling code as the primary use case.
A better positioning might be "version control system as a library". The primary use case is embedding lix into applications, AI agents, etc. that need version control.
I need to go to bed now. I have a flight to catch in 6 hours.
PS I am open to suggestions regarding the positioning!
KingMob|1 month ago
Before clicking, I assumed this was actually a new feature of theirs that would apply nix build principles of some sort to version control of binaries.
micw|1 month ago
samuelstros|1 month ago
In general, lix gives an API to track changes in any file format (via plugins). The "diff noise" thus depends on a) the plugin, i.e. does it track the metadata? and b) what is rendered as the diff.
If the user doesn't care about seeing a diff of metadata in Excel, don't render the metadata in the diff. The latter is trivial because diffing in lix is just a SQL query.
mrgoldenbrown|1 month ago
Wow, sounds useful. Git doesn't do that out of the box.
BUT... the list of available "plugins" only has .csv, .md and .json, which are things that git already handles just fine?
Can it actually diff Excel and Word and PDF or not?
samuelstros|1 month ago
The way to write a plugin:
Take an off the shelf parser for pdf, docx, etc. and write a lix plugin. The moment a plugin parses a binary file into structured data, lix can handle the version control stuff.
forrestthewoods|1 month ago
But then the first thing it talks about is diffing files. Which honestly shouldn’t even be a feature of VCS. That’s just a separate layer.
samuelstros|1 month ago
There is nuance between git line by line diffing and what lix does.
For text diffing it holds true that diffing is a separate layer. Text files are small in size which allows on the fly diffing (that's what git does) by comparing two docs.
On the fly diffing doesn't work for structured file formats like xlsx, fig, dwg etc. It's too expensive. Both in terms of materializing two files at specific commits, and then diffing these two files.
What lix does under the hood is tracking individual changes, _which allows rendering a diff without on the fly diffing_. So lix is kind of responsible for the diffs but only in the sense that it provides a SQL API to query changes between two states. How the diff is rendered is up to the application.
Antibabelic|1 month ago
Izkata|1 month ago
thephotonsphere|1 month ago
https://lix.systems/
rlonstein|1 month ago
ezoe|1 month ago
Since git is the most popular VCS right now and will remain so for the foreseeable future, I don't think incompatibility with git is a good design choice.
samuelstros|1 month ago
But lix's use case is not version controlling code.
It’s embedding version control in applications. Hence lix runs within SQL databases. Apps have databases. Lix runs on top of them.
The benefit for the developer is a version control system within their database, and exposing version control to users.
notachatbot123|1 month ago
The person behind this boasts on Twitter that they fired all their remote developers and used AI instead.
Judging by tweets, this project is 2-3 years in the making.
> Lix is a universal version control system that can diff any file format (.xlsx, .pdf, .docx, etc).
> Unlike Git's line-based diffs, Lix understands file structure. Lix sees price: 10 → 12 or cell B4: pending → shipped, not "line 4 changed" or "binary files differ".
How? I have a custom binary file format, how would Lix be able to interpret this?
> Lix adds a version control system on top of SQL databases that let's you query virtual tables like file, file_history, etc. via plain SQL. These table's are version controlled.
What does SQL have to do with everything?
samuelstros|1 month ago
AI agents are the main reason right now why version control is needed outside of software engineering.
The mistake in the blog post is triggering comparisons to git, which leads to “why is this better/different than git?”.
If you have a custom binary file, you can write a plugin for it! :)
Lix runs on top of a SQL database because we initially built lix on top of git but needed:
- database semantics (transactions, ACID, etc.)
- SQL to express history queries (diffing arbitrary file formats can't be solved with a simple diff() API)
danmeier|1 month ago
samuelstros|1 month ago
In case of a conflict, you can either decide to do last write wins or surface the conflict to the user e.g. "Do you want to keep version A or version B?"
The SQL engine is unrelated to merging. Lix uses SQL as its storage and query engine, but not for merges.
anttiharju|1 month ago
brnt|1 month ago
mackross|1 month ago
yoyohello13|1 month ago
Edit: Oh I see. Seems like their use case is embedding version control into another application.
samuelstros|1 month ago
Someone can write a CLI for it. Though, the primary use case is not code version control but embedding into applications.
internet_points|1 month ago
orthoxerox|1 month ago
solidsnack9000|1 month ago
samuelstros|1 month ago
File change -> Plugin (detects changes) -> Lix
It works surprisingly well because most standard file formats have off-the-shelf parsers. Parse a file format, and voilà, it is trivial to diff. Then pass on a standard schema for changes to lix and you end up with a generic API to query changes.
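As a generic illustration of "parse, then diff" (this is not lix's plugin API, which the thread does not show), the sketch below parses two versions of a CSV file and reports changes per cell instead of per line. The column names, the row key and the output format are assumptions made for the example.

```shell
set -eu
old=$(mktemp)
new=$(mktemp)
printf 'id,price,status\n1,10,pending\n' > "$old"
printf 'id,price,status\n1,12,shipped\n' > "$new"

# Pass 1 (old file): remember every cell, keyed by row id and column index.
# Pass 2 (new file): read the header, then report each cell whose value changed.
changes=$(awk -F, '
  NR == FNR { if (FNR > 1) for (i = 2; i <= NF; i++) before[$1, i] = $i; next }
  FNR == 1  { for (i = 1; i <= NF; i++) header[i] = $i; next }
  { for (i = 2; i <= NF; i++)
      if (before[$1, i] != $i)
        printf "row %s %s: %s -> %s\n", $1, header[i], before[$1, i], $i }
' "$old" "$new")
printf '%s\n' "$changes"
```

Once a format is parsed into keyed values like this, each reported change is a small structured record rather than a changed line, which is the property the comment relies on.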
AmbroseBierce|1 month ago
samuelstros|1 month ago
I'm the creator of lix.
Lix doesn't target code version control. It can be used for it. But the primary use case is embedding version control in applications. Such an application can be an AI agent that modifies files, which entails the need to show what the agent did in that file, i.e. tracking the changes.
Git is good enough for code. I don't think there is space to gain much market share.
hekkle|1 month ago
lombasihir|1 month ago
mog_dev|1 month ago
dev_l1x_be|1 month ago
bibimsz|1 month ago