The other conclusion to draw is "Git is a fantastic choice of database for starting your package manager, almost all popular package managers began that way."
I think the conclusion is more that package definitions can still be maintained on git/GitHub but the package manager clients should probably rely on a cache/db/a more efficient intermediate layer.
Mostly to avoid downloading the whole repo/resolve deltas from the history for the few packages most applications tend to depend on. Especially in today's CI/CD World.
This is exactly the right approach. I did this for my package manager.
It relies on a git repo branch for stable. There are yaml definitions of the packages including urls to their repo, dependencies, etc. Preflight scripts. Post install checks. And the big one, the signatures for verification. No binaries, rpms, debs, ar, or zip files.
What’s actually installed lives in a small SQLite database and searching for software does a vector search on each packages yaml description.
Semver included.
This was inspired by brew/portage/dpkg for my hobby os.
This is how WinGet works. It has a small SQLite db it downloads from a hosted url. The DB contains some minimal metadata and a url path to access the full metadata. This way WinGet only has to make API calls for packages it's actually interacting with. As a package manager, it has plenty of problems still, but it's a simple, elegant solution for the git as a DB issue.
I actually find that nixpkgs being a monorepo makes it even better. The code is surprisingly easy to navigate and learn if you've worked in large codebases before. The scaling issues are good problems to have, and git has gotten significantly better at handling large repos than it was a decade ago, when Facebook opted for Mercurial because git couldn't scale to their needs. If anything, it's GitHub issues and PRs that are probably showing its cracks.
For the purposes of the article, git isn't just being used as a database, it's being used as a protocol to replicate the database to the client to allow for offline operation and then keep those distributed copies in sync. And even for that purpose you can do better than git if you know what you're doing, but knowledge of databases alone isn't going to help you (let alone make your engineering more economical than relying on free git hosting).
saidinesh5|2 months ago
Mostly to avoid downloading the whole repo/resolve deltas from the history for the few packages most applications tend to depend on. Especially in today's CI/CD World.
reactordev|2 months ago
It relies on a git repo branch for stable. There are yaml definitions of the packages including urls to their repo, dependencies, etc. Preflight scripts. Post install checks. And the big one, the signatures for verification. No binaries, rpms, debs, ar, or zip files.
What’s actually installed lives in a small SQLite database and searching for software does a vector search on each packages yaml description.
Semver included.
This was inspired by brew/portage/dpkg for my hobby os.
pseufaux|2 months ago
edolstra|2 months ago
Sure, eventually you run into scaling issues, but that's a first world problem.
l9o|2 months ago
bluGill|2 months ago
kibwen|2 months ago
adastra22|2 months ago
IshKebab|2 months ago
fn-mote|2 months ago
As it is, this comment is just letting out your emotion, not engaging in dialogue.
venturecruelty|2 months ago