My advice is that if components need to release together, then they ought to be in the same repo. I'd probably go further and say that if you just think components might need to release together then they should go in the same repo, because you can in fact pretty easily manage projects with different release schedules from the same repo if you really need to.
On the other hand if you've got a whole bunch of components in different repos which need to release together it suddenly becomes a real pain.
If you've got components that will never need to release together, then of course you can stick them in different repositories. But if you do this and you want to share common code between the repositories then you will need to manage that code with some sort of robust versioning system, and robust versioning systems are hard. Only do something like that when the value is high enough to justify the overhead. If you're in a startup, chances are very good that the value is not high enough.
As a final observation, you can split big repositories into smaller ones quite easily (in Git anyway) but sticking small repositories together into a bigger one is a lot harder. So start out with a monorepo and only split smaller repositories out when it's clear that it really makes sense.
Components might need to be released “together”, but if they are worked on by different teams, they’ll have different release processes: different timelines, different priorities.
First of all this is normal, because otherwise the development doesn’t scale.
In such a case the monorepo starts to suck. And that’s the problem with your philosophy ... it matters less how the components connect and more who is working on them.
Truth of the matter is that the monorepo encourages shortcuts. You’d think that the monorepo saves you from incompatibilities, but it does so at the expense of tight coupling.
In my experience people miss the forest for the trees here. If breaking compatibility between components is the problem, one obvious solution is to no longer break compatibility.
And another issue is one of responsibility. Having different teams working on different components in different repos will lead to an interesting effect ... nobody wants to own more than they have to, so teams will defend their components against unneeded complexity.
And no, you cannot split a monorepo into a polyrepo easily. Been there, done that. The reason is that working in a monorepo versus multiple repos influences the architecture quite a lot and the monorepo leads to very unclear boundaries.
My rule of thumb is: if you need to do PRs in several repositories to ship one feature, you should probably merge the repositories. At work, we have code spread among a bunch of repositories, and having to link to the 2–3 related PRs in other repos is a major PITA, and even more so for the reviewers.
> As a final observation, you can split big repositories into smaller ones quite easily (in Git anyway) but sticking small repositories together into a bigger one is a lot harder. So start out with a monorepo and only split smaller repositories out when it's clear that it really makes sense.
If you only need to do this once, subtree will do the job, even retaining all your history if you want.
I'm not sure what the easier way to split big repos is.
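For reference, a toy sketch of both directions with `git subtree` — joining a standalone repo in as a subdirectory, then splitting it back out, history intact. Everything here (repo names, paths, file contents) is invented for the demo, which runs entirely in a scratch directory:

```shell
# Build two throwaway repos in a scratch directory.
cd "$(mktemp -d)"
git init -q -b master libfoo
git -C libfoo config user.email demo@example.com
git -C libfoo config user.name demo
echo 'hello' > libfoo/foo.txt
git -C libfoo add . && git -C libfoo commit -qm 'libfoo history'

git init -q -b master app
cd app
git config user.email demo@example.com && git config user.name demo
echo 'app' > app.txt && git add . && git commit -qm 'app history'

# Joining: pull ../libfoo into subdirectory libfoo/, keeping its history.
git subtree add --prefix=libfoo ../libfoo master

# Splitting: extract libfoo/'s history onto its own branch, which can
# then be pushed to a fresh repository.
git subtree split --prefix=libfoo -b libfoo-only
```

Note that `git subtree` ships with stock Git on most platforms, unlike submodules it needs no metadata in the repo, and the split branch is a normal branch you can `git push` anywhere.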
I haven't tried in Git, but with Mercurial merging repos is as simple as pulling from an unrelated repository and merging, that's it. It's a lot simpler than splitting a repo up unless you accept that all of the old history can remain, then you just make a clone and delete what should no longer be a part of the repository.
But a monorepo leads to tight coupling, and that is just as much of a pain to work with as versioning: two teams end up simultaneously working on the same shared code, and you have not only merge conflicts but conflicting functionality.
So why is that? Why do we need to couple software development efforts to the release process? Based on my experience there is no difference between the monorepo and multirepo approaches from the deployment point of view.
After trying to get the best of both with Subversion Externals and Git Submodules, I'd have to agree. At least until things are so loosely coupled they're begging for a public release.
That said, some packaging solutions can bridge the gap reasonably well. Unless you need instantaneous, atomic releases.
What are you talking about! In my perfect micro services world I just have these enforced bounded contexts that are so perfectly designed they never need to change. Consequently all parts of the system are perfectly independent snowflakes that can be deployed without thinking about any other parts of the system. It’s beautiful really when you think about the mess that things were before we could do this!
I can think of situations where components 'need' to release together because of organizational rules and not any actual binding between the components, in that case of course they do not need to be in the same repository.
I agree that you should always start with one repo and split as needed, it's the MVR way (minimum viable repository)
My problem with polyrepos is that often organizations end up splitting things too finely, and now I'm unable to make a single commit to introduce a feature because my changes have to live across several repositories. Which makes code review more annoying because you have to tab back and forth to see all the context. It's doubly frustrating when I'm (or my team is) the only people working on those repositories, because now it doesn't feel like it gained any advantages. I know the author addresses this, but I can't imagine projects are typically at the scale they're describing. Certainly it's not my experience.
Also I definitely miss the ability to make changes to fundamental (internal) libraries used by every project. It's too much hassle to track down all the uses of a particular function, so I end up putting that change elsewhere, which means someone else will do it a little different in their corner of the world, which utterly confuses the first person who's unlucky enough to work in both code bases (at the same time, or after moving teams).
My current team managed to break a single "component" out into a separate repository. Then that repository broke into two, then those broke into other repositories, until we eventually had around 10 or so different repositories that we work on every day.
An average change touches 4 of them, and touching one of them triggers releases of 2 or 3 others on average. Even building these locally is super tedious, because we don't have any automation in place (and no formal plans for any) for chain-building these locally.
This is a nightmare scenario for myself. A simple change can require 4 pull requests and reviews, half a day to test and a couple hours to release.
Yet my team keeps identifying small pieces that can be conceptually separated from the rest of the functionality, even if they are heavily coupled, and makes new repos for these!
It's an interesting social problem in how you manage those project / library / repository boundaries. On the flipside, though, it's been well documented that among many of the major monorepos those boundaries still exist, they just become far more opaque because no one has to track them. You find the weird gatekeepers in the dark that spring out only when you get late in your code review process because you touched "their" file and they got an automated notice from a hidden rules engine in your CI process you didn't even realize existed.
In the polyrepo case those boundaries have to be made explicit (otherwise no one gets anything done) and those owners should be easily visible. You may not like the friction they sometimes bring to the table, but at least it won't be a surprise.
It's very much possible to make changes to internal libraries used all over the place, but it does require versioning to be something that people think about, and a mechanism by which those libraries aren't just pulled from source control to depend on them. Once you've got some sort of dependency management such as an internal gem/npm/whatever source you can treat those internal dependencies the same as you'd treat external ones, instead of having to somehow coordinate a release of absolutely everything in one go.
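As a sketch of what that looks like in npm terms (the scope, package name, registry, and version here are all invented): the shared library is published to an internal registry, and a consuming service pins a semver range exactly as it would for a third-party package:

```json
{
  "name": "billing-service",
  "dependencies": {
    "@acme/http-helpers": "^2.3.0"
  }
}
```

Bumping the library then becomes an ordinary dependency update in each consumer, on that team's own schedule, instead of a coordinated release of everything at once.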
That's not really that different in a monorepo since you often need reviews from the same number of people anyway.
I once had to wait 9 months to get a complex change through in a monorepo setting because of all the people involved, the amount of stuff it touched, and the fact that everything was constantly in flux, so I spent half my time tracking changes. I'm not saying it would have been faster in a polyrepo. I'm saying that complex changes are complex regardless of how the source is organized.
I do however think that polyrepos force you to be more disciplined, and that it is easier to slip up in a monorepo and turn a blind eye to tighter couplings.
The multi-repository code review is an interesting concept. Here at RhodeCode we're actually working on implementing such a solution, first of all to solve our internal problem of release code reviews usually spanning two projects at once.
This is a hard and complex problem. Especially how to make code-review not too messy if you target 5-8 repos at once.
I think this article is complete horseshit. A monorepo will serve you 99% of the time until you hit a certain level of scale when you get to worry about whether a monorepo or a polyrepo is actually material. Most cases are never going to get there. Before that point, a polyrepo is purely a distraction and makes synchronous deployment really painful. We had to migrate a polyrepo to a monorepo and it was not fun because it was a migration that should have never had to be done in the first place. Articles like this are fundamentally irresponsible.
I work on CI/CD systems, and that’s one thing that definitely gets harder in a monorepo.
So you made a commit. What artifacts change as a result? What do you need to rebuild, retest, and redeploy? It doesn’t take a large amount of scale to make rebuilding and retesting everything impossible. In a poly repo world, the repository is generally the unit of building and deployment. In monorepo it gets more messy.
For instance, one perceived benefit of a monorepo is it removes the need for explicit versioning between libraries and the code that uses them, since they’re all versioned together.
But now, if someone changes the library, you need to have a way to find all of its usages, and retest those to make sure the change didn't break their use. So there's a dependency tree of components somewhere that needs to be established, but now it's not explicit, and no one is given the option to pin to a particular version if they can't/won't update. This is the world of Google, and it influenced the (lack of) dependency management in Go.
You could very well publish everything independently, using semver, and put build descriptors inside each project subdirectory, but then, congratulations, you just invented the polyrepo, or an approximation thereof.
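The "what do I rebuild" question above is essentially a reverse-dependency closure. A toy sketch of computing it with nothing but a hand-maintained edge list (the component names and the `deps.txt` format are invented; real build systems derive this graph from build descriptors):

```shell
# deps.txt holds "dependency dependent" pairs; we take the fixed point of
# "who depends on anything already affected" starting from the changed component.
cd "$(mktemp -d)"
cat > deps.txt <<'EOF'
libfoo service-a
libfoo libbar
libbar service-b
EOF

echo libfoo > affected.txt            # the component the commit touched
while :; do
  before=$(grep -c . affected.txt)
  # For each edge whose dependency is affected, emit the dependent,
  # then union with the current affected set.
  awk 'NR==FNR {hit[$1]=1; next} hit[$1] {print $2}' affected.txt deps.txt \
    | cat - affected.txt | sort -u > next.txt
  mv next.txt affected.txt
  after=$(grep -c . affected.txt)
  [ "$before" = "$after" ] && break   # stop once the set stops growing
done
cat affected.txt   # libbar, libfoo, service-a, service-b
```

The point of the sketch is the second sentence of the comment above: in a polyrepo the repo boundary gives you this graph for free (one repo = one rebuild unit), while in a monorepo something has to maintain and query it explicitly.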
I found it to be neither horseshit nor irresponsible. A bit overdrawn and skewed in some of its arguments, perhaps. But then again... so was your critique. For example:
> We had to migrate a polyrepo to a monorepo and it was not fun because it was a migration that should have never had to be done in the first place
s/polyrepo/monorepo/ in the above and you have an assertion of about equal plausibility and weight.
Hear, hear, yowlingcat. The article is way too prescriptive and, agreed, borders on irresponsible. Monorepo vs polyrepo is way too broad a subject to create generalized stereotypes like this. These opinions sadly get taken as facts by impressionable managers, new developers, etc., and have cascading effects on the rest of us in the industry. Use what makes sense for the project environment and team; don't just throw shade at teams who are successfully and productively using monorepos where they make sense. Sure, there is good reason to split things up along boundaries sometimes (breaking out libraries, RPC modules, splitting along dev team boundaries, etc.), but not blindly by default. Will Torvalds split up the kernel into a polyrepo after reading this article? Something tells me that would be a bit disruptive.
> A monorepo will serve you 99% of the time until you hit a certain level of scale when you get to worry about whether a monorepo or a polyrepo is actually material
If you worked in a company that had a core product in a repo, and you wanted to create a slack bot for internal use, where would you put the code? I assume not within your core product's codebase, but within a separate repo, thus creating a polyrepo situation.
So when you say a monorepo will serve you in 99% of cases, are you not counting "side" projects, and simply talking about the core product?
My last 2 jobs have been working on developer productivity for 100+ developer organizations. One is a monorepo, one is not. Neither really seems to result in less work, or a better experience. But I've found that your choice just dictates what type of problems you have to solve.
Monorepos are going to be mostly challenges around scaling the org in a single repo.
Polyrepos are going to be mostly challenges with coordination.
But the absolute worst thing to do is not commit to a course of action and have to solve both sets of challenges (eg: having one pretty big repo with 80% of your code, and then the other 20% in a series of smaller repos)
Pretty funny to read that the things I do every day are impossible.
Monorepo and tight coupling are orthogonal issues. Limits on coupling come from the build system, not from the source repository.
Yes, you should assume there is a sophisticated "VFS". What is this "checkout" you speak of? I have no time for that. I am too busy grepping the entire code base, which is apparently not possible.
If the "the realities of build/deploy management at scale are largely identical whether using a monorepo or polyrepo", then why on earth would google invest enormous effort constructing an entire ecosystem around a monorepo? Choices: 1) Google is dumb. 2) Mono and poly are not identical.
> then why on earth would google invest enormous effort constructing an entire ecosystem around a monorepo? Choices: 1) Google is dumb. 2) Mono and poly are not identical.
I think, once you've chosen a path of mono or poly, you have quite a challenge ahead of you to migrate to the other.
At that point, the tradeoffs aren't based purely on the technical benefits - and "invest in monorepo tooling" may become a perfectly valid decision, as it's cheaper than "migrate to a polyrepo setup".
I'm not arguing either way for or against monorepo, just pointing out that "must be a good idea because Google does it" is invalid - technical merit is just one of the thousands of concerns to be balanced.
3) Google is committed to a monorepo to the point migrating away from it would be impractical.
Truth is, ending up with a monorepo is _really easy_. It usually starts with something that doesn't even _feel_ like more than one project: backend code, frontend templates and some celery/whatever tasks, maybe some minor utility CLI tools. And this happens at the stage nobody wants to even _think_ about more than one git repository.
Once those are big enough, it's likely too late.
But hey, you can always claim _you wanted it that way_. My cats always look good while pulling that one.
It can work. That doesn't mean it is a universal solution. And it doesn't even mean it is a solution that is guaranteed to cover most projects. Whether or not a monorepo works depends on a lot of factors. In my experience the number of cases where it doesn't work appears to outnumber the cases where it works.
It can work nicely when you have disciplined and demonstrably above average programmers that are good at structuring the internal architecture of systems and will know how to design for plasticity. It is also an advantage if all your code is written in the same style and doesn't come from a bunch of older codebases. But even then you can end up with messes that you will be likely to conveniently forget about.
For instance while clear decoupling was a goal when I worked at Google, it wasn't always a reality. There were still lots of very deep and direct dependencies that should never have been there.
It does not work well if you have "average" developers or if you have undisciplined developers or excessive bikeshedders (which kill productivity).
Then there is the tooling. Most people do not work for Google and do not have the ability to spend as much money and time on tooling as Google does. What Google does largely works because of the tooling. It would suck balls without it. To be honest: some things sucked balls even with the tooling. Especially when working with people in different time zones.
Google isn't really a valid example of why a monorepo is a good idea, because your average company isn't going to have a support structure even remotely as huge as Google's. (If you disagree: hey, it's easy, go work for Google for a while and then tell me I'm wrong)
“why on earth would google invest enormous effort constructing an entire ecosystem around a monorepo?”
Didn’t google have a monorepo before git was created? And was created by academics? Legacy and momentum have a strong influence on the future. Hasn’t google also built a lot of tools for the monorepo and dedicates employees to it? That’s exactly the issue this article is about.
From an external perspective, the speed and scale of product rollouts from the bigger tech companies is very slow. I don’t know if the tooling has much to do with it, but I suspect it might. I’ve heard some horror stories (some from here) about how it takes months to get small changes into production.
3) A monorepo with significant investment in ecosystem and tooling is a better choice than a polyrepo
For other (smaller) companies, polyrepo might be the better choice because [significant investment in ecosystem and tooling] is not appealing, and the investments of Google et al. have not leaked through sufficiently into generally available tools. Some headway is being made on the latter [1], so monorepo might be the "obvious" best choice in 10 years or so.
> As described above, at scale, a developer will not be able to easily edit or search the entirety of the codebase on their local machine. Thus, the idea that one can clone all of the code and simply do a grep/replace is not trivial in practice.
Yeah this is a pretty widespread and fundamental misunderstanding that leads to a lot of bad policy decisions.
If 'grepping code' is your first resort then you're hitting things with a hammer. I'm writing code that a machine is supposed to understand. If the machine can't understand how the bits interact then I have much bigger problems than where my code is stored. Probably we're dealing with a lot of toxic machismo bullshit that is hurting our ability to deliver.
If you want discipline, if you want cooperation, hell if you just want to be able to hire a bunch of new people when you land a big customer, you need some form of support for static analysis and the code navigation that it enables. Stop the propeller heads from using magic and runtime inference to wire up the parts of the system, or find a new gig. Even languages where static typing isn't a thing have documentation frameworks where you can provide hints that your IDE can understand (ex: jsdoc for Javascript).
For a large team, working without any kind of static analysis is a recipe for a rigid oligarchy. Only people who have memorized the system can reason about it. Everybody else who tries to make ambitious changes ends up breaking something. See what happens when you trust new people with new ideas? New is bad. Be safe with us.
And even if by some miracle you do make the change without blowing stuff up, you're still in the doghouse, because we have memorized the old way and you are disrupting things!
Some crazy ideas work well. Some reasonable ideas fail horribly. To grow, people need the space to tinker and an opaque codebase ruins those opportunities. Transparency is also helpful when debugging a production issue, because people can work in parallel to the people most likely to solve the problem (even the person who is usually right is way off base occasionally). I should be able to learn and possibly contribute without jamming up the rest of the team by asking inane questions.
You need pretty good but entirely achievable tooling and architecture to get that, but man when you do it's like getting over a cold and remembering what breathing feels like.
At least the author gave us the courtesy of italicizing his broken assumption from the outset of the post.
> Because, at scale, a monorepo must solve every problem that a polyrepo must solve, with the downside of encouraging tight coupling, and the additional herculean effort of tackling VCS scalability.
Right.
But you have to get to "scale" first (as it relates to VCSs). Most companies don't. Even if they're successful. Introducing polyrepos front loads the scaling problems for no reason whatsoever. A giant waste of time.
Checkmate! I didn't even need a snarky poll. The irony of that poll is that it clearly demonstrates his zealotry, not other people's.
There's a lot wrong with this article. Most of the arguments are either not backed up or are misleading. I haven't heard anyone argue they can drop dependency management because of a monorepo.
The author lists downsides of monorepos without listing the upsides and downsides of polyrepos, so it's really only half complete.
I don't think anyone who likes a monorepo is suggesting you just commit breaking changes to master and ignore downstream teams. What it does do is give the ability to see who those downstream teams (if any) might be.
The crux of the author's argument is that added information is harmful because you might use it wrong. It's just as easy (far easier, in fact) to ignore your partners without the information a monorepo gives. It's not really an argument at all. There's really nothing here but "there be dragons".
Monorepos provide some cross-functional information for a maintenance price. It's up to you whether the benefit is worth the overhead.
Seems like the main point is that you'll still need to add additional tooling (search, local cloning, build, etc.) to handle scaling, something you can do just as well with polyrepos. Conversely, for polyrepos, you can add tooling to fix issues with dependency management and multi-project changes/reviews. However, the author figures that monorepos encourage bad code culture and points out that Git is hard to build a monorepo on.
To me this message seems a bit shallow; of course we can build tooling to hide the fact that we have a polyrepo. Given good enough tooling and a consistent enough polyrepo structure (all using the same VCS, all linked from common tooling, following common coding standards and using the same build tooling, etc.), the distinction from having a monorepo is more of an implementation detail.
Given the choice between a consistent monorepo where everyone is running everything at HEAD and a polyrepo where each project have their own rules and there's no tooling to make a multi-project atomic change, I'd go for the former.
Given the choice between identical working environments but different underlying implementations I would go for whatever the tools team think is easier to maintain.
I’ve found monorepos to be extremely valuable in an immature, high-churn codebase.
Need to change a function signature or interface? Cool, global find & replace.
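That global find & replace can be a one-liner when everything lives in one tree. A toy demo in a scratch repo (the function name and files are invented; `sed -i` here is the GNU form):

```shell
# Set up a throwaway repo with one call site and one unrelated file.
cd "$(mktemp -d)"
git init -q -b master .
mkdir -p src
printf 'fetchUserById(7)\n' > src/a.js
printf 'plain file, no call\n' > src/b.js
git add .

# Rename every call site in one shot: list matching tracked files, sed them.
git grep -l 'fetchUserById(' | xargs sed -i 's/fetchUserById(/fetchUser(/g'
grep -n 'fetchUser(' src/a.js   # the call site is updated in place
```

In a polyrepo the same rename means repeating this per repo and then coordinating the releases, which is exactly the churn-amplification being described.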
At some point monorepos outgrow their usefulness. The sheer number of files in something that’s 10K+ LOC (not that large, I know) warrants breaking the codebase apart into packages.
Still, I almost err on the side of monorepos because of the convenience that editors like vscode offer: autocomplete, auto-updating imports, etc.
The biggest gripe I have with modern day monorepos is that people are trying to use Git to interact with them, which doesn't make a tremendous amount of sense, and results in either an immense amount of pain and/or the creation of a bunch of tools to try to coerce Git into behaving basically like SVN.
Which of course begs the question, rather than trying to perform a bunch of unnatural acts, why not just use SVN to start with? It works extremely well with monorepo & subtree workflows.
Sure it has some warts in a few dimensions around branching, versioning, etc. compared to Git when using Git in ways aligned with how Git wants to work, but those warts are minimal in comparison to what's required to pretzel Git monorepos into scaling effectively.
Maybe it's just that the author's cutoff is at the wrong team size, but the monorepo I work on (with ~150 devs) has almost none of the problems presented.
Unreasonable for a single dev to have the entire repo? I'm looking at a repo with ~10 million LoC and ~1.4 million commits. I have 74 different branches checked out right now. Hard drives are cheap.
Code refactors are impossible? I reviewed two of those this morning. They're essentially a non-event. I'm not sure what to make of the merge issue - does code review have to start over after a merge? That seems like a deep issue in your code review process. The service-oriented point seems like a non-sequitur, unless you're telling me I'm supposed to have a service for, say, my queue implementation or time library.
The VCS scalability issue is the only real downside I see here. And it is real, but it also seems worth it. It helps that the big players are paving the way here - Facebook's contributions to the scalability of Mercurial have definitely made a difference for us.
I do really like monorepos, but Google's other significant new project, Fuchsia, is set up as a multi-git repo (and I believe Chromium is too, maybe Android - haven't checked). For Fuchsia, they use a tool called "jiri"[1] to update the repos; previously (and maybe still in use) was the "gclient" sync tool[2] from depot_tools[3].
It even reflects a bit on the build system of choice: GN (used in the above), previously gyp, feels similar on the surface (script) to Bazel, but has some significant differences (GN has some more imperative parts, and it's a ninja-build generator, while Bazel, like pants/buck/please.build, is a build system on its own).
Simply fascinated :), and can't wait to see what the resolution of all this would be... Bazel is getting there to support monorepos (through WORKSPACEs), but there are some hard problems there...
Having worked with some organisations building on Android (>1,000 repos), life is not easy when you are trying to build on top of it and regularly take updates etc.
I asked one company how many changes required changes to more than one repo and was told "a small percentage". We then did some basic analysis of issue IDs across commits and discovered that it was in reality nearer 30% of changes. Keeping those together was just plain very hard.
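That "basic analysis of issue IDs across commits" is cheap to reproduce. A toy sketch that counts issue IDs appearing in more than one repo's history (the repo names, commit messages, and JIRA-style ID pattern are all invented):

```shell
# Two throwaway repos whose commits reference the same issue ID.
cd "$(mktemp -d)"
for r in repo-a repo-b; do
  git init -q -b master "$r"
  git -C "$r" config user.email demo@example.com
  git -C "$r" config user.name demo
  echo x > "$r/file"
  git -C "$r" add . && git -C "$r" commit -qm "PROJ-7 same fix, both halves"
done

# Per repo: unique issue IDs from commit subjects, tagged with the repo name.
# Then: count IDs seen in more than one repo.
crossed=$(for r in repo-a repo-b; do
  git -C "$r" log --pretty=%s | grep -oE '[A-Z]+-[0-9]+' | sort -u | sed "s/$/ $r/"
done | awk '{seen[$1]++} END {n=0; for (i in seen) if (seen[i] > 1) n++; print n}')
echo "$crossed issue ID(s) appear in more than one repo"
```

Running something like this against real histories is how the "small percentage" claim turned out to be nearer 30%.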
Start to scale this by teams of hundreds or thousands of devs and you get a lot of pain.
Managing branches is also hard - easy to create (with repo tool) - but hard to track changes.
Funny that there are so many reimplementations of git submodules but with support for "just give me HEAD" - Google has two (jiri and repo), my company has a home-grown one too.
Android uses a top level repo that behaves as a monorepo with thousands of submodules inside. It's also designed for multiple companies sharing code and working with non shared code at the same time which introduces some constraints and challenges.
My polyrepo cautionary tale: Two repos, one for fooclient, one for fooserver, talking to each other over protocol. Fooserver can do scary dangerous permanent things to company server instances, of which there are thousands.
Fooserver sprouts a query syntax ("just do this for test servers A and B"), pushed to production. Fooclient sprouts code that relies on this, pushed to production. A bit later, Fooserver is rolled back, blowing away query syntax, pushed to production. "Just do this for test servers A and B" now becomes "Do this for every server in the company". Hilarity ensues.
Are there any examples of someone who actually maintained a monorepo for a massive company who now says they shouldn't have? It always seems to be "back seat drivers" arguing against monorepos, not people with practical experience (that I can see, at least).
To me, the key point is this: Splitting your code into multiple repos draws a permanent architectural boundary, and it's done at the start of a project (when you know the least about the right solution).
The upsides and downsides of this are an interesting debate, but there is a cost to polyrepos if you want to change the system architecture. There is a cost to monorepos too, as argued by this post, and it's up to the tech leads as to which cost is greater.
"The frank reality is that, at scale, how well an organization does with code sharing, collaboration, tight coupling, etc. is a direct result of engineering culture and leadership, and has nothing to do with whether a monorepo or a polyrepo is used. The two solutions end up looking identical to the developer. In the face of this, why use a monorepo in the first place?"
.....because, as the author directly stated, the type of repo has nothing to do with the product being successful. So stop bikeshedding, pick a model, and get on with the real business of delivering a successful product.
Could you get the best of both worlds by having a monorepo of submodules? Code would live in separate repos, but references would be declared in the monorepo. Checkins and rollbacks to the monorepo would trigger CI.
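A toy sketch of that idea: a thin "meta" repo that pins each component repo at an exact SHA via submodules, so a commit (or revert) in the meta repo is the atomic unit CI reacts to. All names are invented, and the whole thing runs in a scratch directory:

```shell
# Two throwaway component repos.
cd "$(mktemp -d)"
for r in fooclient fooserver; do
  git init -q -b master "$r"
  git -C "$r" config user.email demo@example.com
  git -C "$r" config user.name demo
  echo "$r" > "$r/README"
  git -C "$r" add . && git -C "$r" commit -qm "$r v1"
done

# The meta repo: nothing but pinned references to the components.
git init -q -b master meta
cd meta
git config user.email demo@example.com && git config user.name demo
# Recent Git blocks file-protocol submodules by default; allow it for the demo.
git -c protocol.file.allow=always submodule add ../fooclient fooclient
git -c protocol.file.allow=always submodule add ../fooserver fooserver
git commit -qm 'Pin fooclient + fooserver at known-good SHAs'
git submodule status   # one pinned SHA per component
```

Rolling back a bad combination is then a single `git revert` in the meta repo, though the usual submodule ergonomics (remembering to `git submodule update`, two-step bumps) are exactly the friction other commenters in this thread complain about.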
- TDD smoke tests should run automatically in dev, on save, within 10 seconds. Bonus points for running a personal TDD sandbox on faster remote servers via rsync, triggered on file save.
- Standardize on 1-3 languages.
- Services composed of simpler 12factor microservices, not monorepo megaservices. Deploy fuse switching, proxying, HA/redundancy, rate limiting, monitoring and performance stats collection just like macroservices.
[+] [-] curtis|7 years ago|reply
On the other hand if you've got a whole bunch of components in different repos which need to release together it suddenly becomes a real pain.
If you've got components that will never need to release together, then of course you can stick them in different repositories. But if you do this and you want to share common code between the repositories then you will need to manage that code with some sort of robust versioning system, and robust versioning systems are hard. Only do something like that when the value is high enough to justify the overhead. If you're in a startup, chances are very good that the value is not high enough.
As a final observation, you can split big repositories into smaller ones quite easily (in Git anyway) but sticking small repositories together into a bigger one is a lot harder. So start out with a monorepo and only split smaller repositories out when it's clear that it really makes sense.
[+] [-] bad_user|7 years ago|reply
First of all this is normal, because otherwise the development doesn’t scale.
In such a case the monorepo starts to suck. And that’s the problem with your philosophy ... it matters less how the components connect, it matters more who is working on it.
Truth of the matter is that the monorepo encourages shortcuts. You’d think that the monorepo saves you from incompatibilities, but it does so at the expense of tight coupling.
In my experience people miss the forest from the trees here. If breaking compatibility between components is the problem, one obvious solution is to no longer break compatibility.
And another issue is one of responsibility. Having different teams working on different components in different repos will lead to an interesting effect ... nobody wants to own more than they have to, so teams will defend their components against unneeded complexity.
And no, you cannot split a monorepo into a polyrepo easily. Been there, done that. The reason is that working in a monorepo versus multiple repos influences the architecture quite a lot and the monorepo leads to very unclear boundaries.
forty | 7 years ago
naniwaduni | 7 years ago
If you only need to do this once, subtree will do the job, even retaining all your history if you want.
I'm not sure what the easier way to split big repos is.
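For reference, a minimal sketch of that one-time split with `git subtree` (all repo and directory names here are invented; requires the git-subtree command that ships with most Git installations):

```shell
# Extract libs/widget from a monorepo into its own repo, keeping history.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q big-repo && cd big-repo
git config user.email demo@example.com && git config user.name demo
mkdir -p libs/widget
echo 'hello' > libs/widget/widget.txt
git add . && git commit -qm "add widget"
# Rewrite the subdirectory's history onto its own branch...
git subtree split --prefix=libs/widget -b widget-only
# ...and pull that branch into a fresh standalone repository.
cd .. && git init -q widget-repo && cd widget-repo
git pull -q ../big-repo widget-only
```

The new repo contains only `widget.txt` and the commits that touched it.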
hvidgaard | 7 years ago
But a monorepo leads to tight coupling, and that is just as much a pain as versioning. Or two teams end up simultaneously working on the same shared code, and you have not only merge conflicts but conflicting functionality.
StreamBright | 7 years ago
paulryanrogers | 7 years ago
That said, some packaging solutions can bridge the gap reasonably well. Unless you need instantaneous, atomic releases.
andy_ppp | 7 years ago
bryanrasmussen | 7 years ago
I agree that you should always start with one repo and split as needed, it's the MVR way (minimum viable repository)
outsomnia | 7 years ago
mrgriffin | 7 years ago
Also I definitely miss the ability to make changes to fundamental (internal) libraries used by every project. It's too much hassle to track down all the uses of a particular function, so I end up putting that change elsewhere, which means someone else will do it a little different in their corner of the world, which utterly confuses the first person who's unlucky enough to work in both code bases (at the same time, or after moving teams).
inertiatic | 7 years ago
An average change touches 4 of them, and touching one of them triggers, on average, releases of 2 or 3 of them. Even building these locally is super tedious, because we don't have any automation in place (and no formal plan for any) for chain-building them locally.
This is a nightmare scenario for me. A simple change can require 4 pull requests and reviews, half a day to test, and a couple of hours to release.
Yet my team keeps identifying small pieces that can be conceptually separated from the rest of the functionality, even if they are heavily coupled, and makes new repos for these!
WorldMaker | 7 years ago
In the polyrepo case those boundaries have to be made explicit (otherwise no one gets anything done) and those owners should be easily visible. You may not like the friction they sometimes bring to the table, but at least it won't be a surprise.
dmix | 7 years ago
https://github.com/OctoLinker/OctoLinker
You can just click the imported project's name and it will switch to that repo.
jon-wood | 7 years ago
bborud | 7 years ago
I once had to wait 9 months to get a complex change through in a monorepo setting because of all the people involved, the amount of stuff it touched, and the fact that everything was constantly in flux, so I spent half my time tracking changes. I'm not saying it would have been faster in a polyrepo. I'm saying that complex changes are complex regardless of how the source is organized.
I do however think that polyrepos force you to be more disciplined, and that it is easier to slip up in a monorepo and turn a blind eye to tighter coupling.
marcinkuzminski | 7 years ago
This is a hard and complex problem. Especially how to make code-review not too messy if you target 5-8 repos at once.
yowlingcat | 7 years ago
thanatos_dem | 7 years ago
So you made a commit. What artifacts change as a result? What do you need to rebuild, retest, and redeploy? It doesn't take a large amount of scale to make rebuilding and retesting everything impossible. In a polyrepo world, the repository is generally the unit of building and deployment. In a monorepo it gets messier.
For instance, one perceived benefit of a monorepo is it removes the need for explicit versioning between libraries and the code that uses them, since they’re all versioned together.
But now, if someone changes the library, you need a way to find all of its usages and retest them to make sure the change didn't break their use. So there's a dependency tree of components somewhere that needs to be established, but now it's not explicit, and no one is given the option to pin to a particular version if they can't/won't update. This is the world of Google, and it influenced the (lack of) dependency management in Go.
You could very well publish everything independently, using semver, and put build descriptors inside each project subdirectory, but then, congratulations, you just invented the polyrepo, or an approximation thereof.
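Finding those usages is exactly what graph-aware build tools are for; with Bazel (one assumed choice, nothing in the thread prescribes a tool), `bazel query 'rdeps(//..., //libs/foo)'` lists everything that transitively depends on a target. Without such a tool, the fallback really is searching the tree (toy layout, invented names):

```shell
# Toy monorepo: two apps, one of which imports the shared library.
tmp=$(mktemp -d) && cd "$tmp"
mkdir -p app1 app2 libs/mylib
echo 'import mylib' > app1/main.py
echo 'import other' > app2/main.py
# Every file that mentions the library: the poor man's reverse-deps query.
grep -rl 'import mylib' . | sort
```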
drugme | 7 years ago
We had to migrate a polyrepo to a monorepo and it was not fun because it was a migration that should have never had to be done in the first place
s/polyrepo/monorepo/ in the above and you have an assertion of about equal plausibility and weight.
jskaggz | 7 years ago
audience_mem | 7 years ago
If you worked in a company that had a core product in a repo, and you wanted to create a slack bot for internal use, where would you put the code? I assume not within your core product's codebase, but within a separate repo, thus creating a polyrepo situation.
So when you say a monorepo will serve you in 99% of cases, are you not counting "side" projects, and simply talking about the core product?
lsferreira42 | 7 years ago
sfrench | 7 years ago
Monorepo challenges are mostly about scaling the org within a single repo.
Polyrepo challenges are mostly about coordination.
But the absolute worst thing to do is not commit to a course of action and have to solve both sets of challenges (e.g. having one pretty big repo with 80% of your code, and then the other 20% in a series of smaller repos).
rossjudson | 7 years ago
Pretty funny to read that the things I do every day are impossible.
Monorepo and tight coupling are orthogonal issues. Limits on coupling come from the build system, not from the source repository.
Yes, you should assume there is a sophisticated "VFS". What is this "checkout" you speak of? I have no time for that. I am too busy grepping the entire code base, which is apparently not possible.
If the "the realities of build/deploy management at scale are largely identical whether using a monorepo or polyrepo", then why on earth would google invest enormous effort constructing an entire ecosystem around a monorepo? Choices: 1) Google is dumb. 2) Mono and poly are not identical.
kiallmacinnes | 7 years ago
I think, once you've chosen a path of mono or poly, you have quite a challenge ahead of you to migrate to the other.
At that point, the tradeoffs aren't based purely on the technical benefits, and "invest in monorepo tooling" may become a perfectly valid decision, as it's cheaper than "migrate to a polyrepo setup".
I'm not arguing either way for or against monorepo, just pointing out that "must be a good idea because Google does it" is invalid - technical merit is just one of the thousands of concerns to be balanced.
LaGrange | 7 years ago
Truth is, ending up with a monorepo is _really easy_. It usually starts with something that doesn't even _feel_ like more than one project: backend code, frontend templates and some celery/whatever tasks, maybe some minor utility CLI tools. And this happens at the stage nobody wants to even _think_ about more than one git repository.
Once those are big enough, it's likely too late.
But hey, you can always claim _you wanted it that way_. My cats always look good while pulling that one.
bborud | 7 years ago
It can work nicely when you have disciplined and demonstrably above average programmers that are good at structuring the internal architecture of systems and will know how to design for plasticity. It is also an advantage if all your code is written in the same style and doesn't come from a bunch of older codebases. But even then you can end up with messes that you will be likely to conveniently forget about.
For instance while clear decoupling was a goal when I worked at Google, it wasn't always a reality. There were still lots of very deep and direct dependencies that should never have been there.
It does not work well if you have "average" developers or if you have undisciplined developers or excessive bikeshedders (which kill productivity).
Then there is the tooling. Most people do not work for Google and do not have the ability to spend as much money and time on tooling as Google does. What Google does largely works because of the tooling. It would suck balls without it. To be honest: some things sucked balls even with the tooling. Especially when working with people in different time zones.
Google isn't really a valid example of why a monorepo is a good idea, because your average company isn't going to have a support structure even remotely as huge as Google's. (If you disagree: hey, it's easy, go work for Google for a while and then tell me I'm wrong.)
ashelmire | 7 years ago
Didn't Google have a monorepo before Git was created? And wasn't it created by academics? Legacy and momentum have a strong influence on the future. Hasn't Google also built a lot of tools for the monorepo, and doesn't it dedicate employees to it? That's exactly the issue this article is about.
From an external perspective, the speed and scale of product rollouts from the bigger tech companies is very slow. I don’t know if the tooling has much to do with it, but I suspect it might. I’ve heard some horror stories (some from here) about how it takes months to get small changes into production.
dtech | 7 years ago
For other (smaller) companies, polyrepo might be the better choice because [significant investment in ecosystem and tooling] is not appealing, and the investments of Google et al. have not leaked through sufficiently into general available tools. Some headway is being made in the latter [1], so monorepo might be the "obvious" best choice in 10 years or so.
[1] For example, Git large file support is mostly from corporate contributors https://git-lfs.github.com/ https://github.com/Microsoft/VFSForGit
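Some of that leakage is already visible in stock Git: sparse-checkout (and, for network efficiency, partial clone) lets a developer materialize just one corner of a large repo. A small local sketch, assuming a reasonably recent Git and invented directory names:

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
# Build a tiny "monorepo" with two services.
git init -q mono && cd mono
git config user.email demo@example.com && git config user.name demo
mkdir -p services/a services/b
echo a > services/a/f && echo b > services/b/f
git add . && git commit -qm init
cd ..
# Clone it, then restrict the working tree to services/a only.
git clone -q mono work && cd work
git sparse-checkout set services/a
ls services
```

After `sparse-checkout set`, only `services/a` exists in the working tree; `services/b` stays in history but is never materialized.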
hinkley | 7 years ago
Yeah this is a pretty widespread and fundamental misunderstanding that leads to a lot of bad policy decisions.
If 'grepping code' is your first resort then you're hitting things with a hammer. I'm writing code that a machine is supposed to understand. If the machine can't understand how the bits interact then I have much bigger problems than where my code is stored. Probably we're dealing with a lot of toxic machismo bullshit that is hurting our ability to deliver.
If you want discipline, if you want cooperation, hell if you just want to be able to hire a bunch of new people when you land a big customer, you need some form of support for static analysis and the code navigation that it enables. Stop the propeller heads from using magic and runtime inference to wire up the parts of the system, or find a new gig. Even languages where static typing isn't a thing have documentation frameworks where you can provide hints that your IDE can understand (ex: jsdoc for Javascript).
For a large team, working without any kind of static analysis is a recipe for a rigid oligarchy. Only people who have memorized the system can reason about it. Everybody else who tries to make ambitious changes ends up breaking something. See what happens when you trust new people with new ideas? New is bad. Be safe with us.
And even if by some miracle you do make the change without blowing stuff up, you're still in the doghouse, because we have memorized the old way and you are disrupting things!
Some crazy ideas work well. Some reasonable ideas fail horribly. To grow, people need the space to tinker and an opaque codebase ruins those opportunities. Transparency is also helpful when debugging a production issue, because people can work in parallel to the people most likely to solve the problem (even the person who is usually right is way off base occasionally). I should be able to learn and possibly contribute without jamming up the rest of the team by asking inane questions.
You need pretty good but entirely achievable tooling and architecture to get that, but man when you do it's like getting over a cold and remembering what breathing feels like.
0xFACEFEED | 7 years ago
> Because, at scale, a monorepo must solve every problem that a polyrepo must solve, with the downside of encouraging tight coupling, and the additional herculean effort of tackling VCS scalability.
Right.
But you have to get to "scale" first (as it relates to VCSs). Most companies don't. Even if they're successful. Introducing polyrepos front loads the scaling problems for no reason whatsoever. A giant waste of time.
Checkmate! I didn't even need a snarky poll. The irony of that poll is that it clearly demonstrates his zealotry, not other people's.
jayd16 | 7 years ago
The author lists downsides of monorepos without listing the upsides and downsides of polyrepos, so it's really only half complete.
I don't think anyone who likes a monorepo is suggesting you just commit breaking changes to master and ignore downstream teams. What it does do is give you the ability to see who those downstream teams (if any) might be.
The crux of the author's argument is that added information is harmful because you might use it wrong. It's just as easy (far easier, in fact) to ignore your partners without the information a monorepo gives. It's not really an argument at all. There's really nothing here but "there be dragons".
Monorepos provide some cross-functional information for a maintenance price. It's up to you whether the benefit is worth the overhead.
jonex | 7 years ago
To me this message seems a bit shallow; of course we can build tooling to hide the fact that we have a polyrepo. Given sufficiently well-built tooling and a consistent enough polyrepo structure (all using the same VCS, all linked from common tooling, following common coding standards and using the same build tooling, etc.), the distinction from a monorepo is more of an implementation detail.
Given the choice between a consistent monorepo where everyone is running everything at HEAD and a polyrepo where each project has its own rules and there's no tooling to make a multi-project atomic change, I'd go for the former.
Given the choice between identical working environments but different underlying implementations, I would go for whatever the tools team thinks is easier to maintain.
olingern | 7 years ago
Need to change a function signature or interface? Cool, global find & replace.
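That global find & replace is a one-liner when everything is in one tree. A sketch with invented function names (`sed -i` as used here is GNU sed; BSD sed wants `-i ''`):

```shell
tmp=$(mktemp -d) && cd "$tmp"
mkdir -p svc1 svc2
echo 'fetch_user(uid)' > svc1/a.py
echo 'fetch_user(uid)' > svc2/b.py
# Rename every call site in one pass across the whole tree.
grep -rl 'fetch_user(' . | xargs sed -i 's/fetch_user(/fetch_user_by_id(/g'
grep -rh 'fetch_user_by_id' .
```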
At some point monorepos outgrow their usefulness. The sheer number of files in something that's 10K+ LOC (not that large, I know) warrants breaking the codebase apart into packages.
Still, I almost err on the side of monorepos because of the convenience that editors like vscode offer: autocomplete, auto-updating imports, etc.
im_down_w_otp | 7 years ago
Which of course begs the question, rather than trying to perform a bunch of unnatural acts, why not just use SVN to start with? It works extremely well with monorepo & subtree workflows.
Sure it has some warts in a few dimensions around branching, versioning, etc. compared to Git when using Git in ways aligned with how Git wants to work, but those warts are minimal in comparison to what's required to pretzel Git monorepos into scaling effectively.
thedufer | 7 years ago
Unreasonable for a single dev to have the entire repo? I'm looking at a repo with ~10 million LoC and ~1.4 million commits. I have 74 different branches checked out right now. Hard drives are cheap.
Code refactors are impossible? I reviewed two of those this morning. They're essentially a non-event. I'm not sure what to make of the merge issue - does code review have to start over after a merge? That seems like a deep issue in your code review process. The service-oriented point seems like a non-sequitur, unless you're telling me I'm supposed to have a service for, say, my queue implementation or time library.
The VCS scalability issue is the only real downside I see here. And it is real, but it also seems worth it. It helps that the big players are paving the way here - Facebook's contributions to the scalability of Mercurial have definitely made a difference for us.
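As an aside, many simultaneous checkouts need not mean many clones; in Git, `git worktree` gives each branch its own working directory against one shared object store (names invented; the commenter may well be on Mercurial, where `hg share` plays a similar role):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q main-wt && cd main-wt
git config user.email demo@example.com && git config user.name demo
git commit -q --allow-empty -m init
git branch feature-x
# Check out feature-x in a second directory, sharing the same .git store.
git worktree add -q ../feature-x-wt feature-x
git -C ../feature-x-wt branch --show-current
```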
malkia | 7 years ago
[1] - https://fuchsia.googlesource.com/jiri/ [2] - https://chromium.googlesource.com/chromium/tools/depot_tools... [3] - https://chromium.googlesource.com/chromium/tools/depot_tools...
It even reflects a bit in the build system of choice, GN (used in the above, previously gyp): it feels similar on the surface (script) to Bazel, but has some significant differences (GN has some more imperative parts, and it's a ninja-build generator, while Bazel, like pants/buck/please.build, is a build system on its own).
Simply fascinated :), and can't wait to see what the resolution of all this would be... Bazel is getting there to support monorepos (through WORKSPACEs), but there are some hard problems there...
robaato | 7 years ago
I asked one company how many changes required changes to more than one repo and was told "a small percentage". We then did some basic analysis of issue IDs across commits and discovered that it was in reality nearer 30% of changes. Keeping those together was just plain very hard.
Start to scale this by teams of hundreds or thousands of devs and you get a lot of pain.
Managing branches is also hard - easy to create (with repo tool) - but hard to track changes.
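That kind of cross-repo analysis is cheap to approximate when commit messages carry issue IDs (the `PROJ-n` convention below is an assumption):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
# Two repos that both received commits for the same issue.
for r in repo-a repo-b; do
  git init -q "$r" && cd "$r"
  git config user.email demo@example.com && git config user.name demo
  git commit -q --allow-empty -m "PROJ-1 shared change"
  cd ..
done
git -C repo-a commit -q --allow-empty -m "PROJ-2 local change"
# Issue IDs that show up in more than one repo's history.
for r in repo-*; do
  git -C "$r" log --format=%s | grep -o 'PROJ-[0-9]*' | sort -u
done | sort | uniq -d
```

Here `PROJ-1` is flagged as a multi-repo change while `PROJ-2` is not; the ratio of flagged IDs to all IDs is the "how many changes span repos" number.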
IshKebab | 7 years ago
Too | 7 years ago
towaway1138 | 7 years ago
Fooserver sprouts a query syntax ("just do this for test servers A and B"), pushed to production. Fooclient sprouts code that relies on this, pushed to production. A bit later, Fooserver is rolled back, blowing away query syntax, pushed to production. "Just do this for test servers A and B" now becomes "Do this for every server in the company". Hilarity ensues.
CJefferson | 7 years ago
ajuc | 7 years ago
Seriously, you have over 1 TB of code and 100 people wrote it?
thehazard | 7 years ago
rkangel | 7 years ago
The upsides and downsides of this are an interesting debate, but there is a cost to polyrepos if you want to change the system architecture. There is a cost to monorepos too, as argued by this post, and it's up to the tech leads which cost is greater.
peterwwillis | 7 years ago
.....because, as the author directly stated, the type of repo has nothing to do with the product being successful. So stop bikeshedding, pick a model, and get on with the real business of delivering a successful product.
sterlind | 7 years ago
sierdolij | 7 years ago
- Semantic versions.
- Group components into reusable packages.
- Don't use git submodules or other source cloning in builds; use native/platform package management.
- Access control is made much easier.
- Sign commits and tags.
- Code review either before- or after-the-fact, just do it(tm).
- Reproducible builds - strip out timestamps/random tokens/unsorted metadata.
- Create CHANGELOGs semi/automatically.
- Eliminate manual steps altogether.
- Distributed builds/build caching (distcc, ccache).
- TDD smoke tests should run automatically in dev on save, within 10 seconds. Bonus points for running a personal TDD sandbox on faster remote servers via rsync, triggered on file save.
- Standardize on 1-3 languages.
- Services composed of simpler 12factor microservices, not monorepo megaservices. Deploy fuse switching, proxying, HA/redundancy, rate limiting, monitoring and performance stats collection just like macroservices.
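For the changelog point in the list above: if commit subjects follow a convention (the `feat:`/`fix:` prefixes below are one common assumption, not something the list mandates), a release section falls out of `git log`:

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q . && git config user.email demo@example.com && git config user.name demo
git commit -q --allow-empty -m "feat: add login"
git commit -q --allow-empty -m "fix: handle empty password"
git commit -q --allow-empty -m "chore: bump deps"
# Keep only user-visible changes in the generated section.
{ echo "## Changes"; git log --format='- %s' | grep -E '^- (feat|fix):'; } > CHANGELOG.md
cat CHANGELOG.md
```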