
Show HN: Vdm, a sane alternative to e.g. Git submodules

100 points| ryapric | 1 year ago |github.com

Hey folks! I've been working on something on & (mostly) off for a little over a year, and picked it back up recently because of yet another related frustration at work.

I've spent a lot of time ripping out git submodules from repos my teams use, but I've spent an equally large amount of time wondering why there doesn't seem to be a better option for managing arbitrary dependencies across repos in the Year of Our Lord 2024. So, I put together a really early version of such an arbitrary-dependency manager. It's called vdm, and you can find it in the linked URL above & below.

I'm sharing mostly because I'm curious if I'm blatantly missing some other tool that exists that isn't language-specific (like Bit for JS seems to be, for example), but also in case people have any hot-takes or feedback on the functionality as listed in the README.

Also of note is that I'm not sharing to potentially monetize or "generate customer interest" or anything -- I'm just another builder on the internet.

Thanks for looking, and let me know if you have any questions!

vdm: https://github.com/opensourcecorp/vdm

60 comments

posix86|1 year ago

Nice!!

If you're looking for alternatives, here's something we've built (hope I'm not hijacking this): https://github.com/audiotool/pasta

It's called "pasta" for copy pasta. It was built with exactly the same motivation as yours, also has a YAML config file, and is also implemented in Go, which is kinda interesting. If yours takes off and we can drop ours, that'd be awesome!

For some feedback, here are features we have which we think we'd be missing:

- we have the ability to copy individual files and specific subdirectories of other repos, not the entire repos

- mechanics to "clear" the target directory, in case a file gets deleted upstream, to keep the directories in sync

- we've modelled it with a plugin API, so you can implement new "copiers" for bitbucket, google drive, subversion, ...

- the github plugin we have uses the Github API for better performance, and you can add auth by setting an env var GITHUB_TOKEN

We also create a "result" file for every copy, noting the exact commit that was copied, which might or might not be useful... We were thinking of posting it here at some point but never got around to it. Again, if yours takes off, that'd be the best option :)

We're using it mostly to copy .proto definitions from one repo to another.
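
For readers who want to picture the "copy a subdirectory and clear the target" mechanics from the list above, here is a minimal sketch in Python. It is not pasta's or vdm's actual code; the function name `sync_subdir` and its parameters are made up for illustration, and it assumes the upstream repo is already checked out locally.

```python
import shutil
from pathlib import Path


def sync_subdir(source_repo: Path, subdir: str, target: Path) -> list[str]:
    """Copy source_repo/subdir into target, clearing target first.

    Hypothetical helper: clearing first means a file that was deleted
    upstream also disappears from the local copy on the next sync.
    Returns the sorted relative paths of the files that were copied.
    """
    src = source_repo / subdir
    if not src.is_dir():
        raise FileNotFoundError(f"no such subdirectory: {src}")
    if target.exists():
        shutil.rmtree(target)  # drop stale files left by earlier copies
    shutil.copytree(src, target)  # also creates missing parent directories
    return sorted(
        p.relative_to(target).as_posix() for p in target.rglob("*") if p.is_file()
    )
```

The trade-off of clearing first is that any local edits inside the target directory are lost on every sync, which fits a "getter, not a writer" tool but not a workflow that patches dependencies in place.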

glandium|1 year ago

Probably unpopular opinion: git submodules are just fine. They're "just" lacking a consistent UI. They have improved over the years, but the default config sucks because the defaults emulate the original, awful, UX. With proper configuration, it's much better, although there are still pain points (like rebase conflicts in non-submodule parts messing things up if you don't git submodule update)

dotancohen|1 year ago

Can we see your `git config -l`? I sparingly use git submodules, and don't really suffer from any of the common issues as I have a very strict update routine, but I'd love to see where things could be improved.

vilunov|1 year ago

If they don't have a consistent UI and emulate the original awful UX, then in what aspect are they nice?

They have a ton of problems in my experience, a few off the top of my head:

- They force the specific repo url, e.g. ssh github even if you prefer to clone by http.

- Pulling from remote becomes difficult when submodules change, e.g. when a submodule is merged into main repo and becomes a proper subdir.

- git commands such as `git checkout -- .` don't work properly on them and I don't see how configs could change that.

chipdart|1 year ago

> Probably unpopular opinion: git submodules are just fine. They're "just" lacking a consistent UI.

I second the sentiment. Git submodules work just fine. The UX could use some work. It baffles me why bolting on convoluted tools is considered a preferable alternative.

__m|1 year ago

What I don't like about submodules is that they are centralised: you can't easily migrate to another server without having them still point to the old one, because the URLs are version controlled. I have since moved to packages.

comex|1 year ago

If it just clones the repos and removes the .git directories, then I assume it doesn't keep their commit history? So if you use e.g. `git blame` or `git log` to look at file history, you will see when changes were introduced to the parent repo, but not when/why those changes were made in the first place.

In that respect, it resembles git-subtree with --squash, but differs from git-submodule or regular git-subtree.

ryapric|1 year ago

Yep, you have it correct. I've got a note at the bottom of the README that I'm considering adding a config field to keep the .git directory, but I'm trying to keep pretty far away from git-in-git consequences/use cases. I said the same in another comment here, but I don't envision vdm becoming something that's git-specific or developmental -- it's really just intended to be a getter, not a writer, and the functionality reflects that.

Cool info though, thanks for sharing!

quilombodigital|1 year ago

To me, the biggest indicator that all the links being posted here about Git submodule systems come from people who don't know what they're doing is the fact that all of them (vdm, pasta, peru, git-aggregator, etc.) are using YAML as a config. Anyone who has worked at least a few years with Git and YAML knows that this type of file is not Git/diff friendly. I've seen too many disastrous merges, and the developers in the company have to keep using unityyamlmerge to resolve a foolish decision by Unity. Moreover, if anyone here has tried to parse YAML, they understand how unnecessary it is to use this format 99% of the time. In your case, the only advice I can give is to use a complete repo config per line, so it doesn't spread across different lines. This ensures the atomicity and integrity of your information.

juped|1 year ago

I never thought of that before, but it's a good point.

esafak|1 year ago

What config format do you recommend?

greatgib|1 year ago

If you are looking for something very light and efficient, let me suggest you give this a try:

https://github.com/fviard/svn_xternals

Despite the README saying that it is a work in progress, the tool has been functional for a few years already. Also, again despite the name, it works with Git.

The idea is to be able to use the concept of "externals" from SVN transparently with SVN or Git. It does something similar to what Google's "gclient" was doing but in a more efficient way (i.e. a lot faster and consuming a lot fewer resources).

To use it, you just need to create a file ("externals.conf" in your project, for example), in a format like this:

externals.conf

   git@github.com:user/myproject_core.git                        myproject/core
   git@github.com:user/myproject_plugins_onething.git            myproject/plugins/onething
   git@github.com:anotheruser/another_thing.git@mybranch         myproject/plugins/another_thing
   git@github.com:corpuser/random_library.git@release-tag-123    myproject/vendor/random_library

Then, you can simply run: python3 externalsup.py

It will then automatically take care of the git clone, pull, or "switch" if you change a branch/tag indicator in the externals file.

That way, you can easily commit an externals.conf file in a root project folder, and individually manage the version of sub-components that can be hosted anywhere.

The "externals.conf" file is a plain text file, so it is easy to read and to diff when comparing different versions of your project.
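
To show how little parsing a one-entry-per-line format like the one above needs, here is a rough Python sketch. It is not taken from svn_xternals; the `External` type, the `parse_externals` name, and the rule that a branch or tag is whatever follows `.git@` are assumptions based only on the sample lines in this comment.

```python
from typing import NamedTuple, Optional


class External(NamedTuple):
    url: str
    ref: Optional[str]  # branch or tag; None when the line does not name one
    path: str


def parse_externals(text: str) -> list[External]:
    """Parse lines of the form '<url>[@<ref>]  <target path>'."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        source, path = line.split()  # exactly two whitespace-separated fields
        # Assumption: a ref only ever follows ".git@", so the "@" in
        # "git@github.com" is never mistaken for a ref separator.
        url, sep, ref = source.partition(".git@")
        if sep:
            entries.append(External(url + ".git", ref, path))
        else:
            entries.append(External(source, None, path))
    return entries
```

Because each dependency is a single line, a version bump shows up as a one-line diff, which is the property several commenters in this thread are asking for.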

lioeters|1 year ago

Git Subrepo is another alternative to submodules and subtree.

> This git command clones an external git repo into a subdirectory of your repo. Later on, upstream changes can be pulled in, and local changes can be pushed back. Simple.

https://github.com/ingydotnet/git-subrepo

After trying many similar solutions, it gets the closest to what I want to achieve, which is nested Git repositories. A project with subprojects, each of which can be an independent Git repo with its own history to push/pull, while the project itself has the entire codebase and history.

It's written in Bash, so fairly portable.

---

Edit: After skimming through the project vdm, I see the problems it aims to solve are different from what git-subrepo does. The latter is more about monorepos. Ah well, that's what I get for commenting before reading the post.

vdm does look useful for managing a project with external dependencies, which are Git repos owned by others or oneself. Maybe like a language-agnostic package manager.

djha-skin|1 year ago

I made a full dependency manager called Degasolv[1] capable of managing arbitrary code in zip files some years back. I wrote it in Clojure. It has features for hosting zip repositories, version comparison, transitive dependency resolution, the whole nine yards.

I poured my heart and soul into it[2] but it wasn't very popular. I guess there's not much need for a dependency manager that's not tailored to the needs of a particular community, like a platform or language.

1: https://github.com/djhaskin987/degasolv

2: https://degasolv.readthedocs.io

foooorsyth|1 year ago

Looks cool! Seems functionally similar to AOSP’s git-repo, but already feels more approachable with that simple yaml remote list.

What collaborative tool would you recommend using with vdm? AOSP has gerrit which is sort of specifically designed for this multi-remote meta setup. GitHub/GitLab don’t play nice with this type of environment.

mafuyu|1 year ago

This tool looks like "submodules, but lighter", while repo is "submodules, but heavier". Looks to me like the motivation is for dependencies that are not hard enough to justify a submodule.

Repo seriously sucks to use, but I also can't imagine many tools living up to AOSP-type workloads without being specifically designed for it. My gripe with repo is that it's really hard to pin the entire repo state if you have a bunch of prototype patches across multiple subrepos. I usually end up having to modify the XML directly.

ryapric|1 year ago

Thanks! That AOSP `repo` tool is one I'd not heard of, so thanks for sharing!

I actually haven't really put much thought into collaborative/multirepo development work using vdm -- the original intent was for it to strictly be a retriever of what the user specifies. I think the majority of both my frustration and complexity of other tools is because they're trying to solve for a lot more than at least I personally usually want to use them for. It's like, I just want a `pip install/requirements.txt/go.{mod,sum}` for any kind of tool, not just the language that takes up the majority of my codebase.

One of the thoughts I had, though, was to maybe add a field for each remote section called e.g. `try_local` that takes a filesystem path, and if that doesn't exist then fetch the remote. That way, your development workflow is exactly the same as usual -- developing a dependency is just doing work in the actual repo directory nearby. I'm not married to the idea though. I just REALLY don't want to have it be in the business of managing git-repos-in-git-repos, because vdm isn't really intended to be a Git tool, if that makes sense.
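
The `try_local` idea described above could look roughly like the following Python sketch. This is not how vdm works today (the field is only a thought in this comment), and `resolve_dependency` and the `fetch` callback are hypothetical names used for illustration.

```python
from pathlib import Path
from typing import Callable, Optional


def resolve_dependency(
    remote: str,
    try_local: Optional[str],
    fetch: Callable[[str], Path],
) -> Path:
    """Prefer a local checkout when it exists, otherwise fetch the remote.

    `fetch` stands in for whatever actually retrieves the remote and
    returns the directory it was written to.
    """
    if try_local:
        local = Path(try_local).expanduser()
        if local.is_dir():
            return local  # develop against the nearby repo directory
    return fetch(remote)
```

Keeping the fallback as a plain path check means the tool never has to know anything about the state of the local repo, which matches the stated goal of staying out of git-in-git management.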

prpl|1 year ago

I think you’re going to find that, out there, somebody has already built this. I’ve built one, and worked on two others that somebody built. Usually they have names like workspace manager or repo manager or whatever. Most will probably have something to build a dag and code to do a topological sort for the recursive projects. The better ones will use the topological sort to pull repos and build in parallel.
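
As a small illustration of the DAG-plus-topological-sort approach mentioned above, here is a sketch using Python's standard `graphlib` module. The `fetch_batches` name and the example project names are invented; each returned batch contains repos whose dependencies are already satisfied, so a tool could pull or build everything in one batch in parallel.

```python
from graphlib import TopologicalSorter


def fetch_batches(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group projects into batches that can be processed in parallel.

    `deps` maps each project to the set of projects it depends on.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything unblocked right now
        batches.append(ready)
        sorter.done(*ready)  # unblock the projects that depend on this batch
    return batches


# Hypothetical example: "app" needs "core" and "plugins"; "plugins" needs "core".
example = {"app": {"core", "plugins"}, "plugins": {"core"}, "core": set()}
```

For the example mapping, `fetch_batches(example)` yields three batches: `core` first, then `plugins`, then `app`.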

In addition, other tools can also do this to varying degrees of success, like Bazel and cmake.

rendaw|1 year ago

What problems are there with git submodules and how does this solve them? The readme isn't forthcoming in this respect.

t_believ-er873|1 year ago

Nice! As an alternative backup tool, you can look at GitProtect Backup & Disaster Recovery for GitHub, Bitbucket, and GitLab. It allows you to pick up the storage (Cloud/local or both), automate backups by scheduling them at the most appropriate time, avoiding throttling, and restore data immediately from any point in time in case of failure, and many other features that meet pain points.

sebastienbeau|1 year ago

In our case we do not use submodules, because we need to apply some patch or PR to the dependency.

To solve it we use git-aggregator (I am not the author), which is language agnostic too. It seems to have the same features as vdm plus some extra ones (the possibility to have a frozen file, the possibility to apply a patch/PR...)

Source : https://github.com/acsone/git-aggregator

keithnz|1 year ago

I quite like https://github.com/ingydotnet/git-subrepo

This allows you to treat common code in a repo as just a normal part of the repo. However, the common code is also in a repo of its own. This tool then allows you to push / merge your changes back to the common repo.

Check the git page for a list of the benefits.

samtheprogram|1 year ago

I think submodules make sense in a lot of use cases, but a gotcha I saw with a team introduced to them recently is that pulling down from a branch or switching branches doesn’t update the submodule and/or stop you from changing branches if it is modified without being committed in some way.

If I could have submodules that operated that way I think submodules would be a lot more straightforward to newcomers.

jayd16|1 year ago

Yup, submodules are actually ok. Like with most git issues, it's more of a tooling UX problem than an architecture deficiency.

kadoban|1 year ago

Does it do anything to help manage the .gitignore file(s)? Otherwise I'd think you have to specify the dependency in both places consistently, which sounds a bit tedious.

alex7734|1 year ago

For projects where I can't trust that the people involved can deal with submodule bullshit correctly I just use these git aliases:

    box = !cd ${GIT_PREFIX:-.} && git config --get remote.origin.url > .gitboxinfo && git rev-parse --abbrev-ref HEAD >> .gitboxinfo && git rev-parse HEAD >> .gitboxinfo && mv .git .gitbox && git add -f .gitboxinfo && true
    unbox = !cd ${GIT_PREFIX:-.} && mv .gitbox .git && true

Then I add the .gitbox folder to gitignore. Whenever I need to interact with the "submodule" repo I unbox, otherwise I leave it boxed and as far as everyone else in the project is concerned, the dependency was just copied n pasted in the project.

If you ever need to regenerate the gitbox folder from scratch you can take a peek at the gitboxinfo file and git clone and reset the dependency repo in a temp folder, then move the git folder next to the gitboxinfo file.

Plus unlike submodules with this you can have local changes to the submodule files without having to fork the submodule itself.
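
The "regenerate from .gitboxinfo" step described above can be sketched as follows. This only builds the git command lines rather than running them; it assumes the file holds exactly the three lines the `box` alias writes (remote URL, branch name, commit hash), and `regenerate_commands` is a made-up helper name.

```python
def regenerate_commands(info_text: str, workdir: str) -> list[list[str]]:
    """Build the git commands that recreate the boxed repo in `workdir`.

    Assumes .gitboxinfo contains three lines: remote URL, branch, commit.
    """
    url, branch, commit = info_text.split()
    clone = ["git", "clone"]
    if branch != "HEAD":  # a detached HEAD is recorded as the literal "HEAD"
        clone += ["--branch", branch]
    clone += [url, workdir]
    reset = ["git", "-C", workdir, "reset", "--hard", commit]
    return [clone, reset]
```

After running those commands, the resulting `.git` folder in the temp directory is what gets moved next to the `.gitboxinfo` file, as described in the comment.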

TekMol|1 year ago

For a Python project, what are the pros/cons of

1: A setup.py that installs dependencies like this:

    pip install git+https://github.com/dependency/repo

2: Git submodules

?

est|1 year ago

3. copy everything into vendor/lib folder.

version pinning, no extra install needed, works offline, zero deps headaches.

Example: requests.packages.*

skribanto|1 year ago

I like to wrap it in a venv (pure python project) or nix flake (mixed languages)

000ooo000|1 year ago

Not so much of a hot take as some confusion: what are the pain points of Git submodules that led to this tool? You imply they're 'not sane', or worse, but don't mention any of the deficiencies that your tool overcomes.

frizlab|1 year ago

The project looks interesting.

Regarding the name, I’m French, and VDM basically means FML in French.

anakaiti|1 year ago

nice! I've been using jsonnet-bundler for this, even for non-jsonnet projects.

neeh0|1 year ago

Another solution that "nix" solved years ago.

iveqy|1 year ago

This seems to be almost the same as Android's repo tool: https://android.googlesource.com/tools/repo

Personally I don't see the difference between this and submodules. Repo stores the information in xml files, vdm stores it in yaml files and git submodules in the git database. I don't really care.

The real headache for me is the trouble of traceability vs ease of use. You need to specify your dependencies with a sha1 to have traceable SLSA compliant builds, but that also means that you'll need to update all superrepos once a submodule is updated. Gerrit has support for this, but it's not atomic, and what about CI? What about CI that fails?

foooorsyth|1 year ago

>I don’t really care

I care about the aesthetics and the convenience that the tool provides. git-repo at least has a simple command to get all the latest stuff (repo sync). Git submodules is a mess in this regard. Just look at this stack overflow thread:

https://stackoverflow.com/questions/1030169/pull-latest-chan...

People are confused at how to do THE most basic command that you’d have to do every single day with a multi-repo environment. There’s debating in the comments about what flags you should actually use. No thanks.

There’s a lot of room for improvement in this space. git-repo isn’t widely used outside of aosp. Lots of organizations are struggling with proper tooling for this type of setup.

38|1 year ago

[deleted]

ryapric|1 year ago

Sure, but not all my code in a single repo is monolingual code, and not all of those languages have cooperative (or existing) package managers. Bash is a great example of something I wish I could stop copy-pasting between repos, and was actually the original motivator for vdm (along with protobuf files, for the same reasons).