Ask HN: Which version control system should a new business in 2022 use?

18 points| tomrod | 4 years ago | reply

There seem to be a whole slew of version control systems out there. The three I have limited familiarity with are Github, Gitlab, and Bitbucket.

Do you have one you recommend for a new small business? Are there any that give better support for ML in addition to application code, such as data and model versioning options? What else would you ask if you were in my shoes?

21 comments

[+] austhrow743|4 years ago|reply
Imo with every single thing that’s not your business’s core advantage or differentiation, you should go with what you’re familiar with. If I were in your shoes I wouldn’t let myself turn it into a decision at all. Whatever I was already using would win by default.

To do otherwise is bikeshedding. Every comment you read in this thread, every FAQ you open, every free account you create to poke around: it’s time and energy that could have been spent working on the product. Far more important, though, is the false sense of achievement you’d feel having spent time choosing Bitbucket or whatever wins out, while not having progressed.

[+] dusted|4 years ago|reply
Indeed, if VCS is not your core biz, go with whatever you know well and feel comfortable with (comfortable enough to handle, for instance, disaster recovery and uncommon problems). Briefly consider the ease of onboarding new employees, and whether your own skills are good enough to teach it to them if your preferred system is not one that people can be expected to know.
[+] tomrod|4 years ago|reply
Your input is much appreciated!
[+] ncmncm|4 years ago|reply
All three of those are Git.

The only other modern alternative is Fossil, used for the SQLite project.

They are both good. Fossil is probably more reliable, but the difference is unlikely to affect your business. I have had Git repositories become corrupt, where I had to clone a new copy and abandon the old one. That was OK only because I was not maintaining private branches on them.

The problem with the commercial Git wrapper services is that they try to lock you in. All the secondary parts are not kept in the same, or any, repository, so if you come to want to migrate, it is a big chore. That is intentional.

I would like to know of an online Git service that keeps everything about a project in the same, or anyway some, clonable archive.

[+] smt88|4 years ago|reply
This is definitely bikeshedding[1]. Your choice of git host has nothing to do with your chances of success.

Use Github, ignore this thread, and spend your time on revenue-generating activities. You can switch later if you need to, and you aren't going to need to.

1. https://en.m.wiktionary.org/wiki/bikeshedding

[+] fundamental|4 years ago|reply
For code, git has won out among the other options. As for ML data+model versioning, that area is still evolving, and the right choices there depend on your ML frameworks as well as your approach to deploying new models.

Generally I'd view data and trained model versioning as separate from, but linked to, the training code versioning. In an ideal world you end up with a system where data version+training code version is in the metadata of a given model version, but there are plenty of other aspects of the data-science-themed add-ons to consider.
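
A minimal sketch of that ideal, under stated assumptions: the names `ModelVersion`, `describe_model`, and `to_json` are hypothetical, content hashes stand in for version identifiers, and the code version is simply passed in as a string (for example a git commit hash). It is an illustration, not how any particular ML tool stores its metadata.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


def content_hash(data: bytes) -> str:
    """Return a SHA-256 hex digest, used here as a stand-in version identifier."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class ModelVersion:
    """Hypothetical metadata record tying a model to its data and code."""
    model_hash: str    # hash of the serialized model artifact
    data_version: str  # hash of the training data snapshot
    code_version: str  # e.g. the git commit of the training code


def describe_model(model_bytes: bytes, data_bytes: bytes, code_version: str) -> ModelVersion:
    """Build the metadata record for one trained model."""
    return ModelVersion(
        model_hash=content_hash(model_bytes),
        data_version=content_hash(data_bytes),
        code_version=code_version,
    )


def to_json(record: ModelVersion) -> str:
    """Serialize the record so it can be stored next to the model artifact."""
    return json.dumps(asdict(record), sort_keys=True, indent=2)
```

Given such a record, anyone holding a model file can look up which data snapshot and which commit produced it, even though the three things are versioned in different places.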

[+] remram|4 years ago|reply
Those are not version control systems.
[+] orf|4 years ago|reply
If you don’t know any VCS, use git. If you know and are very familiar with another VCS, use that.

At the small business scale it doesn’t really matter, won’t materially improve your product and every second spent thinking about it is a second spent not thinking about your actual business.

[+] giantg2|4 years ago|reply
I think technically those three are all git. I believe you can host a git repo/server yourself too.

Currently use GitHub and my company uses BitBucket. I've used SVN in the past.

I think a small company doesn't need BitBucket since it's really just extended for integrating with JIRA.

[+] GauntletWizard|4 years ago|reply
The answer to the question you asked is unquestionably "Git", but the hosting provider for said git repository is open for debate.

I use GitHub and GitLab for clients. I use GitLab for my personal repos, because its free offerings were better. I don't recommend one over the other, per se - both have advantages. GitLab's all-inclusive CI is easier to use if you understand it well, but you need to have a tools guy who really understands the value of building your own and not building your own. GitHub as the de facto leader has more third-party integrations - I would use Circle CI over GitHub Actions, because Actions is very inflexible at the moment. It is planned to get better, and I believe it will.

ML support is not on any of their radars. Specifically for that, GitLab's ability to drop in your own runners would come in very handy, but data versioning support is not a first-class feature, though Git LFS is as good there as elsewhere.

[+] codegeek|4 years ago|reply
"New Small business"

Go with something simple, tried and tested. I prefer Github. I never warmed up to Gitlab's UI, and a few years ago I tried its OSS version, which was slow as hell (this was in 2015 so it's been a while). I used to be on Bitbucket but when they got acquired by Atlassian, game over.

[+] 908B64B197|4 years ago|reply
The three listed use git as their interface and back-end.

Now, they all lock you in their own issue and release tracker. I would go with github, just because it's pretty much the industry standard (some OSS projects moved to gitlab but keep a mirror on github just because there's so many users and it's where people expect to find the official repo).

[+] foobarbaz33|4 years ago|reply
> better support for ML... such as data and model versioning options?

I'm not familiar with the kinds of files ML deals with. But if you want to version control huge data dumps or non-textual files, or use it as a general-purpose "share folder" to store random things, git is not ideal. A git repo is best suited to storing source code for a single project.

SVN might be OK if you insist on using 1 source control system for both code and dumps.

But I don't think you need to force all things into 1 system. It would be acceptable to store pure source code in git. And dumps in something else, maybe using old school backup strategies rather than source control if they are truly massive.

[+] kelseyfrog|4 years ago|reply
Git+DVC is an option. I haven't personally used it, but I'm prepared to explore it when my team stumbles onto the problem of "code and data need to be in sync or things will break in very terrible ways."
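
To make the "code and data in sync" idea concrete, here is a rough sketch of the general approach: keep the data outside git, commit a small manifest of content hashes, and check the data against it before training. This is not DVC's actual file format or API; `file_digest`, `write_manifest`, and `out_of_sync` are hypothetical names, and the JSON manifest layout is an assumption made for illustration.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """Hash a file in chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_files: list[Path], manifest: Path) -> None:
    """Record each data file's digest; the manifest is the file you commit."""
    entries = {str(path): file_digest(path) for path in data_files}
    manifest.write_text(json.dumps(entries, sort_keys=True, indent=2))


def out_of_sync(manifest: Path) -> list[str]:
    """Return the data files that are missing or differ from the manifest."""
    entries = json.loads(manifest.read_text())
    stale = []
    for name, expected in entries.items():
        path = Path(name)
        if not path.exists() or file_digest(path) != expected:
            stale.append(name)
    return stale
```

Because the manifest lives in the same commit as the training code, checking out an old commit and running `out_of_sync` shows whether the data on disk matches what that code expected.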
[+] tomrod|4 years ago|reply
Depending on implementation, model objects can be binary objects or sometimes XML/JSON. Usually the former, and I'm not sure what ONNX uses under the hood.

The ideal for data outside of databases in my world is parquet or feather, which again are binary (if I understand the spec right).

[+] sepiasaucer|4 years ago|reply
Couldn’t git lfs handle large non-textual files? How does SVN handle large non-textual files?
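
For context on the Git LFS half of that question: Git LFS commits a small text pointer in place of the large file and stores the real bytes on a separate LFS server. The sketch below builds a pointer in the layout described by the Git LFS pointer spec (version line, `oid sha256:` line, `size` line); the function name `lfs_pointer` is hypothetical, and this is only an illustration of the format, not a replacement for the `git lfs` client.

```python
import hashlib

# Version URL used on the first line of Git LFS pointer files.
LFS_SPEC_URL = "https://git-lfs.github.com/spec/v1"


def lfs_pointer(content: bytes) -> str:
    """Build the text of a pointer for the given file content."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        f"version {LFS_SPEC_URL}\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )
```

The pointer stays a few lines long no matter how large the file is, which is why the git history itself does not grow with each new version of a binary.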
[+] pkrumins|4 years ago|reply
Definitely Github. Everyone is familiar with Github and can instantly start using it without thinking or learning. Hardly anyone has experience with Gitlab or Bitbucket, and you'll spend hours figuring out how to get started and where everything is.
[+] quickthrower2|4 years ago|reply
Generally Git/Github is a good choice. But you could throw a 3-sided die and pick any of them and be fine.
[+] bluehuman|4 years ago|reply
For database schema versioning, you may try bytebase.com; it can also work seamlessly with gitlab.
[+] swah|4 years ago|reply
Git + Github/Gitlab, don't think about it.
[+] crumbits|4 years ago|reply
I’d use CVS, it’s old but very well tested, it’s easy to use, and you can avoid dollar-sucking platforms and services easily with it.