item 27809110

Ask HN: How to handle 500+ repositories in GitHub?

27 points | voorloopnul | 4 years ago

Searching for ways to handle a large number of repositories, I stumbled on this thread: https://github.community/t/structuring-repositories-or-organisations/817

Apparently GitLab and Bitbucket offer a feature to group repositories into projects, while GitHub lacks anything similar.

How do companies using GitHub handle their repositories when there are hundreds of them?

32 comments


sverhagen|4 years ago

We are currently moving from Bitbucket to GitHub (hurrah!), and since we were "assigned" a single organization in GitHub by our corporate parent, we are also looking at importing a lot of repositories into a single organization. And the inability to further organize these repositories has been a big disappointment for an otherwise good experience with GitHub. Pretty much what this thread is about!

We end up telling our teams to use the different search functions to help in making sense of the madness: search by language (automatically detected by GitHub), team, type (forks or other), topics and names. We have a decent process to make sure the appropriate topics are set for repositories (we manage all non-forks through Terraform), and to make sure that names include a useful prefix. Though I don't think we'll ever stop debating the right prefixes for our repository names.
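The search-based approach can be sketched as a small query builder. This is a minimal sketch assuming GitHub's repository search qualifiers (org:, language:, topic:, fork:, in:name) and the REST endpoint https://api.github.com/search/repositories; the function names and the org, prefix and topic values are hypothetical.

```python
from urllib.parse import urlencode

# REST endpoint for repository search (assumed unauthenticated use is fine for small queries)
API_SEARCH = "https://api.github.com/search/repositories"


def build_repo_query(org, name_prefix=None, language=None, topics=(), include_forks=False):
    """Compose a repository search query from GitHub search qualifiers."""
    parts = []
    if name_prefix:
        # "in:name" restricts the free-text term to repository names
        parts.append(f"{name_prefix} in:name")
    parts.append(f"org:{org}")
    if language:
        parts.append(f"language:{language}")
    for topic in topics:
        parts.append(f"topic:{topic}")
    if include_forks:
        # Repository search excludes forks unless fork:true (or fork:only) is given
        parts.append("fork:true")
    return " ".join(parts)


def search_url(query):
    """URL-encode the query for the REST search endpoint."""
    return f"{API_SEARCH}?{urlencode({'q': query})}"
```

For example, build_repo_query("acme", name_prefix="payments", language="go", topics=["team-billing"]) gives "payments in:name org:acme language:go topic:team-billing"; the same string should also work in the github.com search box.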

It's a "poor man's" ordeal, that's for sure. Organizations are probably the right way to go, if you can.

We're still in the move, so too soon to tell, but I am also wondering if we aren't worried about a non-issue. Because while Bitbucket has "groups", I don't think I have ever consciously used those groups while working in Bitbucket (nor have we ever set up the Bitbucket groups very well for our organization anyway).

dariusj18|4 years ago

RE: Topics, this is a feature I just found. What is the UX like when trying to find repos by topic?

prepend|4 years ago

My org has about 500 repos split across 3 orgs. There’s a “main” org, an OSS org, and then splinters for people who are willing to pay for and manage their own.

For the main org, we have templates that projects typically follow: the readme says which group and which individuals manage the project. We require that they add a tag for the specific center and encourage tags for projects.

So it’s sort of possible to search for a tag within the org to see all the projects but it’s janky and confusing.

Some groups create “housekeeping repos” that are just repos with docs that link out to and describe all their projects. They use a repo instead of a wiki because GitHub wikis are kind of a pain to manage (you need specific permissions and can’t just fork and send a PR). So that group gives out the housekeeping repo as the link for new team members, etc.

For the OSS org projects, we also have a portal repo that builds a github.io portal site that shows cards for each repo and allows searching and sorting. The OSS org doesn’t use the tagging scheme because the public doesn’t really know or care about our internal org names. We have about 175 projects in our OSS org.

Note, we also run GitLab Community Edition internally, and it actually has subgroups and such. But since GitLab requires internal network access, GitHub use is growing: GitHub is in the cloud and doesn’t require a VPN. The GitLab license costs are much higher than GitHub’s and not really compatible with our dev style. We have lots of non-devs (the ratio is probably 3:1 non-dev:dev), and GitLab makes us license everyone the same, so we can’t pay $1000/year just so a PM can update readmes and project cards.

szszrk|4 years ago

There are GitHub organizations. I checked this year and it allowed me to create one for free.

https://docs.github.com/en/organizations/collaborating-with-...

prepend|4 years ago

There’s no hierarchy though, so that’s the problem for me. If I have 400 repos and divide them across 4 orgs with 100 each, users have to be added to all four groups, search across all four, etc.

GitLab allows groups under the org, so you can have the hierarchies common in org structures.

I wish GitHub would do this with suborgs. So I can have a “Foo” org for my company and then divisions or big projects can have suborgs. So all users are members of Foo and then also members of their particular groups.

This is sort of possible with Teams and permissions, but the UX is hard, as there’s no way to just see all the repos in a suborg.

sfgweilr4f|4 years ago

I'm having trouble with 30 or so repos. I shudder to think what 500 or more is like. Search every time? Kind of not good.

Maybe there is a place for an "index" repo that holds a set of github pages that acts as an index into all the repos and groups them via that page instead of just using search.
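An "index" page like that could be generated rather than hand-maintained. The sketch below is hypothetical: it assumes repo metadata (name, description) has already been fetched, groups repos by the part of the name before the first "-", and renders Markdown that a GitHub Pages site could serve; "acme" is a placeholder org.

```python
from collections import defaultdict


def render_index(repos, org):
    """Render a Markdown index grouping repos by the text before the first '-'."""
    groups = defaultdict(list)
    for repo in repos:
        prefix = repo["name"].split("-", 1)[0]
        groups[prefix].append(repo)

    lines = [f"# {org} repositories", ""]
    for prefix in sorted(groups):
        lines.append(f"## {prefix}")
        for repo in sorted(groups[prefix], key=lambda r: r["name"]):
            desc = repo.get("description") or "(no description)"
            lines.append(f"- [{repo['name']}](https://github.com/{org}/{repo['name']}): {desc}")
        lines.append("")  # blank line between groups
    return "\n".join(lines)
```

Grouping on a name prefix only works if the naming convention is actually followed; topics or a team field could be swapped in as the grouping key.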

prepend|4 years ago

One of our devs built a portal that does this that I think is easier to use than the normal GitHub search, http://cdcgov.github.io

I think most teams just know their repos and go to a small set of projects, so they don’t spend a lot of time browsing the org trying to find stuff.

brarsanmol|4 years ago

I believe you can do this in multiple ways. GitHub lets you create Teams and Projects, and it also lets you add Topics to repositories.

1) If you don't need access control on your repositories, you can simply create a project and link multiple repositories to it.

2) In a similar vein, another approach without access control is adding topics to your repositories and then searching for them in the GitHub search bar using "topic:your-topic".

3) Finally, if you need access controls or permissions on your repositories, you can create teams and assign repositories to each team.

I am not a professional or corporate user, so please forgive me if any of this is not 100% accurate.
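Unlike a folder, a repository can carry several topics, so topics behave like overlapping project groupings. A minimal sketch of building a topic-to-repos view, assuming repo dicts with the "name" and "topics" fields that GitHub's REST repository objects expose, already fetched; the function name is hypothetical.

```python
from collections import defaultdict


def topic_index(repos):
    """Map each topic to the sorted names of the repos tagged with it."""
    index = defaultdict(set)
    for repo in repos:
        # A repo with no topics simply doesn't appear in the index
        for topic in repo.get("topics", []):
            index[topic].add(repo["name"])
    return {topic: sorted(names) for topic, names in sorted(index.items())}
```

A repo tagged with both "billing" and "go" shows up under both keys, which is exactly what a single-parent folder hierarchy cannot express.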

ecesena|4 years ago

I started a new project with a monorepo, but soon the number of repos grew when I had to share smaller private sub-projects with smaller teams. I wish there were a way to share only a directory or so.

On the monorepo, I learned a couple of interesting things. For example, with npm packages, even if you host multiple packages in a single monorepo, you can still track dependents for each individual lib [1]. Well done, GitHub.

[1] https://github.com/.../.../network/dependents

fruityrudy|4 years ago

Google “git subrepo”

tossaway9000|4 years ago

430+ repos here, corporate setting. It's not great, but we use some custom scripts (validating users, groups, and which repos are allowed to be public), strict documentation, and quarterly auditing. We have a lot of strict requirements around branch protections, and the API for branch protections leaves a lot to be desired.

We've found that setting up new repos in a strictly documented manner has been the best way to approach it. We also have some GitHub Actions that run periodically to do sanity checks across repos.
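The commenter's scripts aren't shown, but a periodic cross-repo sanity check might look roughly like this. It is a hypothetical sketch: it assumes repo dicts with "name", "private" and "topics" fields (as in GitHub's REST repository objects), and the naming regex, public allowlist and "team-" topic rule are invented conventions for illustration.

```python
import re

# Hypothetical convention: lowercase words separated by single hyphens
NAME_RULE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def audit_repos(repos, public_allowlist, required_topic_prefix="team-"):
    """Return human-readable policy violations for a list of repo dicts."""
    problems = []
    for repo in repos:
        name = repo["name"]
        if not NAME_RULE.match(name):
            problems.append(f"{name}: name does not follow the naming convention")
        # Treat a missing "private" field as private so it is not flagged spuriously
        if not repo.get("private", True) and name not in public_allowlist:
            problems.append(f"{name}: public but not on the allowlist")
        if not any(t.startswith(required_topic_prefix) for t in repo.get("topics", [])):
            problems.append(f"{name}: missing a '{required_topic_prefix}*' topic")
    return problems
```

Run from a scheduled workflow, a non-empty result could open an issue or fail the job so the quarterly audit has less to catch.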

We're a Terraform shop, but we had countless issues with the GitHub Terraform provider (terraform-provider-github); maybe it's improved in the last year or so.

Also, GitHub has no "protections" around tagging. This really hurts us: we want to move to tags and releases for versioning, but we have no way to require multiple approvals before cutting an artifact that can be promoted, so we have to wrap some customization and processes around it.
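The "customization and processes" wrapped around tagging could include a simple approval gate checked before a release job cuts a tag. This is a hypothetical sketch of that rule only (where approvals are recorded and how the job reads them are left open), not a GitHub feature.

```python
def may_cut_release(requester, approvals, required=2):
    """True if enough distinct people other than the requester approved."""
    # Usernames are compared case-insensitively; self-approval doesn't count
    approvers = {a.lower() for a in approvals} - {requester.lower()}
    return len(approvers) >= required
```

Deduplicating and excluding the requester keeps one person from satisfying a two-approval rule alone.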

cik|4 years ago

Unfortunately we ended up giving in and using multiple GitHub organizations. There doesn't seem to be a better answer like repository tagging or labeling. It's not ideal, and has definitely resulted in lost issues, duplicated effort and the like. However - it's the best answer we have.

prepend|4 years ago

I’ve thought about this, but it makes it hard to “inner source” as each project just sticks to its org and there’s less cross-pollination. It’s also an admin hassle to move users across orgs.

noufalibrahim|4 years ago

Not exactly your requirement, but I heavily use GitHub Classroom while teaching, and it generates a ton of repositories (one per exercise per student). I name them consistently and wrote a few command-line scripts to grade, delete, etc. them.
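Consistent names are what make those scripts possible. A minimal sketch, assuming Classroom-style names of the form "<assignment>-<username>"; because both parts may contain hyphens, it matches against a known list of assignment slugs (longest first). The function names and slugs are hypothetical.

```python
from collections import defaultdict


def split_classroom_repo(name, assignments):
    """Split '<assignment>-<student>' using known assignment slugs, else None."""
    # Longer slugs first so 'hw1-extra' wins over 'hw1'
    for slug in sorted(assignments, key=len, reverse=True):
        if name.startswith(slug + "-"):
            return slug, name[len(slug) + 1:]
    return None


def group_by_assignment(names, assignments):
    """Collect student names per assignment, skipping repos that don't match."""
    groups = defaultdict(list)
    for name in names:
        parts = split_classroom_repo(name, assignments)
        if parts:
            groups[parts[0]].append(parts[1])
    return dict(groups)
```

A grading or cleanup script can then loop over one assignment's repos without touching anything else in the org.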

shoo|4 years ago

Probably need to think through the concrete use cases for "handle".

e.g. if there are requirements related to tracking issues in some uniform way across software projects that have a many:many relationship with source repos, a possible solution is to use some other system for issue tracking rather than trying to track issues in GitHub.

If there's a requirement to uniformly configure access controls, branch protection, etc. across hundreds of git repos, you could use Terraform or roll your own automation using the GitHub API, combined with whatever you can enforce at the GitHub org level.
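"Roll your own automation" could be as small as applying the same branch protection to every repo via the REST endpoint PUT /repos/{owner}/{repo}/branches/{branch}/protection. This is a sketch, assuming a token with admin rights on the repos; the org name, branch and two-approval policy are placeholders, and the body sets only a few of the endpoint's fields (the four top-level keys are required, null disables that rule).

```python
import json
import urllib.request


def protection_request(owner, repo, branch, token, approvals=2):
    """Build (but don't send) a branch protection PUT request."""
    url = f"https://api.github.com/repos/{owner}/{repo}/branches/{branch}/protection"
    body = {
        "required_status_checks": None,  # no required CI checks in this sketch
        "enforce_admins": True,
        "required_pull_request_reviews": {"required_approving_review_count": approvals},
        "restrictions": None,  # no push restrictions to specific users/teams
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Accept": "application/vnd.github+json", "Authorization": f"Bearer {token}"},
        method="PUT",
    )


def protect_all(owner, repo_names, token, branch="main"):
    """Apply the same protection to each repo; raises HTTPError on failure."""
    for name in repo_names:
        with urllib.request.urlopen(protection_request(owner, name, branch, token)) as resp:
            print(name, resp.status)
```

Keeping request construction separate from sending makes the policy easy to review and test without touching the API.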

bramblerose|4 years ago

In the process of moving from GitLab to GitHub. We just use naming conventions for this: all repositories are called Company.Group.Product, where Group roughly corresponds to the old GL group.
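With a Company.Group.Product convention, the old GitLab grouping can be recovered from names alone. A hypothetical sketch: "Acme" stands in for the company segment, and anything that doesn't match the pattern is collected separately so drift is visible.

```python
from collections import defaultdict


def group_by_segment(names, company):
    """Group 'Company.Group.Product' names by their Group segment."""
    groups = defaultdict(list)
    for name in names:
        parts = name.split(".")
        if len(parts) >= 3 and parts[0] == company:
            # Rejoin the remainder in case the product part itself contains dots
            groups[parts[1]].append(".".join(parts[2:]))
        else:
            groups["(unconventional)"].append(name)
    return {g: sorted(p) for g, p in sorted(groups.items())}
```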

andrewstuart2|4 years ago

Just out of curiosity, why the move? I'll admit I'm something of a GitLab fanboy, but honestly it's because they keep winning on just about everything in CE vs even the paid alternatives (in my mind).

dbg31415|4 years ago

One feature I REALLY like about ZenHub is that you can build a project board with multiple repositories on it. Easy to tie tickets from the back-end repo to the front-end repo, or whatever you need. Wish GitHub would build out multi-repo projects... or just buy ZenHub. Nice of GitHub to build out their own project boards in recent years, but it's never been as good as some of the 3rd Party tools that are out there.

https://www.zenhub.com/

jjice|4 years ago

For those who are in organizations that have close to 500 repos, why so many? Is it the sheer size of the company, or is it organizing individual internal libraries into different repos (seems like good practice), or something else? I've only ever worked for small companies with at most 5 real repos, and I currently work at a company with a mono repo.

Large scale systems are very interesting to me, but I have to say, I don't mind the mono repo.

Sevii|4 years ago

I work in an org with a similar setup where there is no name-spacing on git repositories. We end up prefixing the project name with our org name.

e.g. a repo name is: $OrgName + $ProjectName

Honestly, it is not a great solution because the org name has changed a few times and we have repos under 3-4 different naming schemes.
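When the org prefix has changed several times, one workaround is to normalize names by stripping any of the historical prefixes, so repos from different naming eras line up by project. A hypothetical sketch; the prefixes are placeholders, and longer prefixes are tried first so "acmedata" isn't mistaken for "acme".

```python
def project_name(repo_name, org_prefixes):
    """Strip whichever historical org prefix a repo name carries."""
    for prefix in sorted(org_prefixes, key=len, reverse=True):
        if repo_name.lower().startswith(prefix.lower()):
            # Drop the prefix and any separator left behind
            return repo_name[len(prefix):].lstrip("-_")
    return repo_name
```

This only papers over the problem: two eras can still collide on the same project name, so the result is best used for display and search, not as a unique key.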

rurban|4 years ago

I currently have 375 repos under my user, plus about 20 more under different orgs. What's the exact problem to solve? Just avoiding paying per user per org?

I even have some crontabs to update all issues locally via git bug bridge, so that I don't need an internet connection to github.com to work on tickets. Clicks are not easily automated, so I avoid them in my workflows. Same for almost-daily rebasing: all feature and bugfix branches are automatically rebased onto master when master moves. This is 90% automatic.
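The auto-rebase step might be scripted roughly as follows. This is a hypothetical sketch, not the commenter's setup: it uses "git rebase <base> <branch>" (which checks out the branch and rebases it), aborts on conflict so the branch is left as it was, and returns the branches needing manual attention, i.e. the non-automatic 10%.

```python
import subprocess


def rebase_plan(branches, base="master"):
    """Commands that rebase each feature branch onto base (nothing is run)."""
    return [["git", "rebase", base, b] for b in branches if b != base]


def run_rebases(branches, base="master", repo_dir="."):
    """Execute the plan; return branches whose rebase hit conflicts."""
    failed = []
    for cmd in rebase_plan(branches, base):
        result = subprocess.run(cmd, cwd=repo_dir)
        if result.returncode != 0:
            # Restore the branch to its pre-rebase state and move on
            subprocess.run(["git", "rebase", "--abort"], cwd=repo_dir)
            failed.append(cmd[-1])
    return failed
```

Rebased branches that were already pushed would still need a force-push, which this sketch deliberately leaves out.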

All my repos are public, I pay nothing. I find that better than paying for it and keeping them private.

PaywallBuster|4 years ago

"feature to group repositories into projects while Github is lacking"

giantg2|4 years ago

Maybe take a variation of the old COBOL playbook of numbering things in an extensible way: prefix repo names with numbers so that they are organized by them.
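One reading of that playbook: number in steps of 10, as with COBOL sequence numbers, so new repos can later slot in between existing ones while names still sort correctly. A hypothetical sketch; the step, padding width and names are arbitrary choices.

```python
def numbered_names(names, step=10, width=4):
    """Assign zero-padded numeric prefixes with gaps between them."""
    return [f"{(i + 1) * step:0{width}d}-{n}" for i, n in enumerate(names)]


def between(lower, upper):
    """Pick a free number between two existing prefixes, or None if none is left."""
    if upper - lower < 2:
        return None
    return (lower + upper) // 2
```

Zero-padding keeps lexical order (what GitHub's name sort uses) consistent with numeric order.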