Lix – universal version control system for binary files

142 points| onecommit | 1 month ago |lix.dev

57 comments

gu5|1 month ago

Lix is also a soft fork of the official Nix package manager implementation: https://lix.systems/

yjftsjthsd-h|1 month ago

I really assumed that this was that; even calling it a universal version control system for binary files would be kind of a weird way of describing it but is plausibly a valid description for the package manager.

uasi|1 month ago

Git can display diff between binary files using custom diff drivers:

> Put the following line in your .gitattributes file: *.docx diff=word

> This tells Git that any file that matches this pattern (.docx) should use the “word” filter when you try to view a diff that contains changes. What is the “word” filter? You have to set it up [in .gitconfig].

https://git-scm.com/book/en/v2/Customizing-Git-Git-Attribute...
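
A minimal sketch of what such a "word" converter could do, assuming a Python helper rather than the tool the book uses. The names `docx_to_text` and `make_docx` are hypothetical; the helper only reads `word/document.xml` from the zip container and joins the text runs of each paragraph, so tables, headers, and footnotes are ignored.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml inside a .docx
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(data: bytes) -> str:
    """Return the paragraph text of a .docx file, one paragraph per line."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    lines = []
    for paragraph in root.iter(W_NS + "p"):
        runs = [node.text or "" for node in paragraph.iter(W_NS + "t")]
        lines.append("".join(runs))
    return "\n".join(lines)


def make_docx(paragraphs: list[str]) -> bytes:
    """Build a tiny docx-shaped zip for demonstration (not a complete Word file)."""
    ns = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    body = "".join(f"<w:p><w:r><w:t>{p}</w:t></w:r></w:p>" for p in paragraphs)
    xml = f'<w:document xmlns:w="{ns}"><w:body>{body}</w:body></w:document>'
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", xml)
    return buffer.getvalue()
```

Wrapped in a script that takes a file path as its argument and prints the result to stdout, a helper like this could be registered as the `textconv` program of the "word" diff driver, since that is the calling convention Git uses for textconv.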

danudey|1 month ago

In their 'Git is unsuited for applications' blog post[0] they also say the following:

> We currently have to clone the whole repository just to edit translation files. That is problematic for big repositories. The repository for posthog.com for example is ~680MB in size. Even though we only need translation files which would be at max 1MB in size, we have to clone the whole repository. That is also one of the reasons why git is not used at Facebook, Google & Co which have repository sizes in the gigabytes.

I get that it can be a bit complex, but Git can handle this circumstance pretty easily if you know how (or write a script for it).

For example, cloning the GIMP repo from GitLab takes me about 56 seconds and uses up 632 MB on disk, using just `git clone <repo>`.

In comparison, running these commands:

    git clone --quiet --filter=blob:none --sparse https://gitlab.gnome.org/GNOME/gimp.git gimp-sparse-clone
    git -C gimp-sparse-clone sparse-checkout add po po-libgimp po-plug-ins po-python po-script-fu po-tags po-tips po-windows-installer
(You can also run `git sparse-checkout init --no-cone` and then just `git sparse-checkout add *.po` to grab every .po file in the repo and nothing else)

Takes 14 seconds on my laptop and uses 59 MB of disk space, and checks out only the specified directories and their contents.

So yeah, it's not as automatic as one might like, but ship a shell script to your translators and you're good to go. The 'Git can't do X' arguments are mostly untrue; it should really be 'Getting git to do X is more complicated than I would prefer' or 'Explaining how to do X in git is a pain', both of which are legitimate complaints.

[0] https://samuelstroschein.com/blog/git-limitations/

theknarf|1 month ago

Would be interesting to see some tooling built around being a custom diff driver for a bunch of different standard formats!

nine_k|1 month ago

This is great for showing diffs. To actually make git store only deltas, not entire binaries, you would need to configure "clean" and "smudge" filters for the format. Given that docx (and xlsx) are a bunch of XML files compressed by zip, you can actually have clean diffs, and small commits.
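
A rough sketch of that idea, under stated assumptions: `clean` and `smudge` here are hypothetical Python functions, not an existing tool, and the stored form (sorted JSON with base64 for non-text members) is just one possible choice. A real filter would read from stdin, write to stdout, and probably pretty-print the XML so that diffs are line-oriented.

```python
import base64
import io
import json
import zipfile


def clean(zip_bytes: bytes) -> str:
    """Working tree -> repository: expand a zip container into sorted JSON text."""
    members = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in sorted(archive.namelist()):
            raw = archive.read(name)
            try:
                members[name] = {"text": raw.decode("utf-8")}
            except UnicodeDecodeError:
                # Non-text members (images, fonts) are kept as base64
                members[name] = {"base64": base64.b64encode(raw).decode("ascii")}
    return json.dumps(members, indent=1, sort_keys=True)


def smudge(stored: str) -> bytes:
    """Repository -> working tree: rebuild a zip container from the stored text."""
    members = json.loads(stored)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in sorted(members):
            entry = members[name]
            if "text" in entry:
                archive.writestr(name, entry["text"].encode("utf-8"))
            else:
                archive.writestr(name, base64.b64decode(entry["base64"]))
    return buffer.getvalue()
```

The rebuilt archive has the same members and contents but is not byte-identical to the original (member order, compression, and timestamps differ), and whether every office application accepts it is not verified here.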

packetlost|1 month ago

Yeah, this is how I would prefer to solve this problem personally, but it would be really nice to have some collection of tools that cover common binary file formats automatically instead of having to configure this manually every time.

cat5e|1 month ago

This is really great. I read the Git config article, but I thought the image diff example was kinda lackluster. I'm sure some better metrics could be extracted for a more descriptive diff.

Thanks for sharing!

samuelstros|1 month ago

Holy moly. I just went to bed. Checking my phone one last time, opening Hacker News for "one last scroll", and I see lix, my project, popping up here.

Going through the questions now. So much for going to bed.

samuelstros|1 month ago

Learnings from the comments so far: I need to refine the positioning of lix.

Lix is not a replacement for git. Nor does it target version controlling code as the primary use case.

A better positioning might be "version control system as a library". The primary use case is embedding lix into applications, AI agents, etc. that need version control.

I need to go to bed now. I have a flight to catch in 6 hours.

PS I am open to suggestions regarding the positioning!

KingMob|1 month ago

Hi, before you get too wedded to the name, you should be aware that there's already a major nix project called lix: https://lix.systems/.

Before clicking, I assumed this was actually a new feature of theirs that would apply nix build principles of some sort to version control of binaries.

micw|1 month ago

I wonder how much room this leaves for unintended changes that are not shown. E.g. Excel is a complex format that allows all sorts of metadata and embeddings that would not always show up as cell changes ...

samuelstros|1 month ago

Depends on the diff you render and what the plugin tracks.

In general, lix gives an API to track changes in any file format (via plugins). The "diff noise" thus depends on a) the plugin, i.e. does it track the metadata? and b) what is rendered as the diff.

If the user doesn't care about seeing a diff of metadata in Excel, don't render the metadata in the diff. The latter is trivial because diffing in lix is just a SQL query.
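
As a sketch of what "don't render the metadata" could look like when changes live in SQL: the `change` table, its columns, and `render_diff` below are hypothetical and chosen only for illustration; they are not lix's actual schema or API. SQLite stands in for whatever database the application uses.

```python
import sqlite3

# Hypothetical schema for illustration only; this is not lix's actual table layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE change ("
    " version TEXT, file TEXT, entity TEXT, kind TEXT, old TEXT, new TEXT)"
)
conn.executemany(
    "INSERT INTO change VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("v2", "orders.xlsx", "cell B4", "cell", "pending", "shipped"),
        ("v2", "orders.xlsx", "lastModifiedBy", "metadata", "alice", "bob"),
        ("v2", "orders.xlsx", "cell C7", "cell", "10", "12"),
    ],
)


def render_diff(conn: sqlite3.Connection, version: str, include_metadata: bool) -> list[str]:
    """Render one line per tracked change; metadata is dropped by a WHERE clause."""
    sql = "SELECT entity, old, new FROM change WHERE version = ?"
    if not include_metadata:
        sql += " AND kind != 'metadata'"
    sql += " ORDER BY entity"
    rows = conn.execute(sql, (version,))
    return [f"{entity}: {old} -> {new}" for entity, old, new in rows]
```

With `include_metadata=False` the two cell changes are returned and the `lastModifiedBy` row is filtered out by the query itself, so the rendering code never sees it.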

mrgoldenbrown|1 month ago

Home page states Lix can diff "any file format like .xlsx, .pdf, .docx".

Wow, sounds useful. Git doesn't do that out of the box.

BUT... the list of available "plugins" only has .csv, .md and .json, which are things that git already handles just fine?

Can it actually diff excel and word and PDF or not?

samuelstros|1 month ago

It can, but the plugins are not production-ready yet. I should clarify that.

The way to write a plugin:

Take an off-the-shelf parser for pdf, docx, etc. and write a lix plugin. The moment a plugin parses a binary file into structured data, lix can handle the version control stuff.

forrestthewoods|1 month ago

Weird sales pitch. I think Git is super mediocre and a VCS that supports binary files would be awesome.

But then the first thing it talks about is diffing files. Which honestly shouldn’t even be a feature of VCS. That’s just a separate layer.

samuelstros|1 month ago

> But then the first thing it talks about is diffing files. Which honestly shouldn’t even be a feature of VCS. That’s just a separate layer.

There is nuance between git line by line diffing and what lix does.

For text diffing it holds true that diffing is a separate layer. Text files are small in size which allows on the fly diffing (that's what git does) by comparing two docs.

On the fly diffing doesn't work for structured file formats like xlsx, fig, dwg etc. It's too expensive. Both in terms of materializing two files at specific commits, and then diffing these two files.

What lix does under the hood is tracking individual changes, _which allows rendering a diff without on the fly diffing_. So lix is kind of responsible for the diffs but only in the sense that it provides a SQL API to query changes between two states. How the diff is rendered is up to the application.

Antibabelic|1 month ago

Most version control systems that are not Git support binary files. In the industry you most often see Perforce P4 and Subversion being used for that purpose.

ezoe|1 month ago

It seems to me that this is just an issue of diff features. Git can be extended to show semantic diffs of binary files, so it doesn't technically need a completely new VCS.

Since git is the most popular VCS right now and will remain so for the foreseeable future, I don't think incompatibility with git is a good design choice.

samuelstros|1 month ago

Indeed, if lix were to target code version control, incompatibility with git would be a “dead on arrival” situation.

But Lix's use case is not version controlling code.

It’s embedding version control in applications. Hence the reason why lix runs within SQL databases. Apps have databases. Lix runs on top of them.

The benefit for the developer is a version control system within their database, and exposing version control to users.

notachatbot123|1 month ago

I look at the page and leave without any clue as to what it actually does. Agents and AI are mentioned so I assume it might just be incoherent slop?

The person behind this boasts on Twitter, that they fired all their remote developers and used AI instead.

Judging by tweets, this project is 2-3 years in the making.

> Lix is a universal version control system that can diff any file format (.xlsx, .pdf, .docx, etc).

> Unlike Git's line-based diffs, Lix understands file structure. Lix sees price: 10 → 12 or cell B4: pending → shipped, not "line 4 changed" or "binary files differ".

How? I have a custom binary file format, how would Lix be able to interpret this?

> Lix adds a version control system on top of SQL databases that let's you query virtual tables like file, file_history, etc. via plain SQL. These table's are version controlled.

What does SQL have to do with everything?

samuelstros|1 month ago

Thanks for the feedback.

AI agents are what is currently driving the need for version control outside of software engineering.

The mistake in the blog post is triggering comparisons to git, which leads to “why is this better/different than git?”.

If you have a custom binary file, you can write a plugin for it! :)

Lix runs on top of a SQL database because we initially built lix on top of git but needed:

- database semantics (transactions, acid, etc.)

- SQL to express history queries (diffing arbitrary file formats can't be solved with a simple diff() API)

danmeier|1 month ago

Great semantic diffs, but does Lix actually define a merge algebra for concurrent structured edits, or are conflicts just punted back to humans? How does its SQL engine guarantee deterministic merges vs last-write-wins?

samuelstros|1 month ago

The merge algebra is similar to git's: a three-way merge. Given that lix tracks individual changes, the three-way merge is more fine-grained.

In case of a conflict, you can either decide to do last write wins or surface the conflict to the user e.g. "Do you want to keep version A or version B?"
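
A minimal sketch of a fine-grained three-way merge with both conflict policies, assuming each file is modeled as a dict of entity id to value. The function `three_way_merge` and this data model are illustrative assumptions, not lix's actual implementation.

```python
def three_way_merge(base: dict, ours: dict, theirs: dict, last_write_wins: bool = False):
    """Merge two sets of per-entity edits against a common base.

    Returns (merged, conflicts). `theirs` is treated as the later write.
    A missing key means the entity does not exist in that version.
    """
    merged, conflicts = {}, {}
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t:        # both sides agree, or neither touched the entity
            value = o
        elif o == b:      # only theirs changed it
            value = t
        elif t == b:      # only ours changed it
            value = o
        elif last_write_wins:
            value = t     # both changed it: the later write silently wins
        else:
            conflicts[key] = (o, t)
            value = b     # both changed it: keep base until the user decides
        if value is not None:
            merged[key] = value
    return merged, conflicts
```

Because the unit of comparison is a single entity (a cell, a property) rather than a line, two people editing different cells of the same sheet merge cleanly, and only edits to the same entity end up in `conflicts`.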

The SQL engine is unrelated to merging. Lix uses SQL as storage and query engine, but not for merges.

anttiharju|1 month ago

for office files one can also unzip and zip to store them in git as plaintext

brnt|1 month ago

It's a pity Word doesn't open its own OOXML export. At least LibreOffice has .fodt.

yoyohello13|1 month ago

Looks cool, but seems kind of weird that it only works through an sdk. Should there be a cli or something?

Edit: Oh I see. Seems like their use case is embedding version control into another application.

samuelstros|1 month ago

Correct. Lix has been developed with the embedded use-case in mind.

Someone can write a CLI for it, though the primary use case is not code version control but embedding into applications.

orthoxerox|1 month ago

It's nice, but it needs to support the most common file formats used in gamedev to gain enough traction.

solidsnack9000|1 month ago

It was initially hard for me to understand how this could work but it looks like there is a plugin system?

samuelstros|1 month ago

Yes. The tracking works via plugins to keep it generic. Here is a rough illustration:

File change -> Plugin (detects changes) -> Lix

It works surprisingly well because most standard file formats have off-the-shelf parsers. Parse a file format and, voilà, it is trivial to diff. Then pass on a standard schema for changes to lix and you end up with a generic API to query changes.
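
The pipeline above can be sketched for CSV, where the standard library already provides the parser. Everything here is an assumption for illustration: `parse_csv`, `detect_changes`, and the `{"entity", "old", "new"}` change shape are hypothetical and not lix's plugin API or schema.

```python
import csv
import io


def parse_csv(text: str, key: str) -> dict:
    """Parse CSV text into {entity_id: value}, with one entity per cell."""
    entities = {}
    for row in csv.DictReader(io.StringIO(text)):
        for column, value in row.items():
            if column != key:
                entities[f"{row[key]}.{column}"] = value
    return entities


def detect_changes(before: str, after: str, key: str = "id") -> list[dict]:
    """Compare two versions of a file and emit changes in one generic shape."""
    old, new = parse_csv(before, key), parse_csv(after, key)
    return [
        {"entity": entity, "old": old.get(entity), "new": new.get(entity)}
        for entity in sorted(set(old) | set(new))
        if old.get(entity) != new.get(entity)
    ]
```

Once every format-specific parser reduces a file to the same kind of change records, the layer that stores and queries them no longer needs to know which file format they came from.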

AmbroseBierce|1 month ago

Git is a command line program so it feels strange that this doesn't seem to support that use case.

samuelstros|1 month ago

Hi,

I'm the creator of lix.

Lix doesn't target code version control. It can be used for it, but the primary use case is embedding version control in applications. Such an application can be an AI agent that modifies files, which entails the need to show what the agent did in a file, i.e. tracking the changes.

Git is good enough for code. I don't think there is space to gain much market share.

hekkle|1 month ago

Based on the product description, it seems that they don't like text, and want to deal in objects. It would feel strange if they did support a terminal, rather than a GUI.

lombasihir|1 month ago

because it's a stupid content tracker. see man git.

mog_dev|1 month ago

I wonder if this could be used in conjunction with git for UT5 projects

bibimsz|1 month ago

compelling problem statement. md and csv have their limits.