top | item 39088826

rajeevk | 2 years ago

I have not analyzed the full potential and benefits of Diversion, but I would not agree with the statements you made about Git. I think you should not focus on Git in your pitch.

>>it was built for a very different world in 2005 (slow networks, much smaller projects, no cloud)

Slow networks: why is this a negative thing? If something is designed for a slow network, then it should perform well on a fast network.

Much smaller projects: I do not agree. I can say that it was not designed for very, very large projects initially, but many improvements were made later. When Microsoft adopted Git for Windows, they faced this problem and solved it. Please look at this: https://devblogs.microsoft.com/bharry/the-largest-git-repo-o...

No cloud: again, I would not agree. Git is distributed, so it should work perfectly for the cloud. I am not able to understand what the issue with Git is in a cloud environment.

>>In our previous startup, a data scientist accidentally destroyed a month’s work of his team by using the wrong Git command

This is mostly a configuration issue. I guess this was done by a force push command. AFAIK, you can disable force pushes by configuration.
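
For example, on a self-hosted bare repository this kind of configuration should block force pushes entirely (a sketch with a throwaway directory standing in for the real server repo; hosted services like GitHub expose the same idea as branch protection rules):

```shell
# Hypothetical server-side setup: in the bare repository that clients push to,
# non-fast-forward updates (i.e. `git push --force`) can be refused outright.
repo=$(mktemp -d)                                        # stand-in for the server repo
git init --bare --quiet "$repo"
git -C "$repo" config receive.denyNonFastForwards true   # reject forced pushes
git -C "$repo" config receive.denyDeletes true           # reject branch deletion by push
git -C "$repo" config receive.denyNonFastForwards        # prints: true
```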


jsnell|2 years ago

> Slow networks: why is this a negative thing? If something is designed for a slow network, then it should perform well on a fast network.

Designing for resource-constrained systems usually means you're making tradeoffs. If the resource constraint is removed, you're no longer getting the benefit of that tradeoff but are paying the costs.

For example, TCP was designed for slow and unreliable networks. When networks got faster, the design decisions that made sense for slow networks (e.g. 32 bit sequence numbers, 16 bit window sizes) became untenable, and they had to spend effort on retrofitting the protocol to work around these restrictions (TCP timestamps, window scaling).

fourside|2 years ago

That makes sense, but then the pitch should include something about how, back in 2005, Git's design had to make a trade-off because of X limitation, but now that restriction isn't applicable, which enables features A and B. I don't really see what trade-offs a faster network enables, other than making a network connection a requirement for doing work (commits are a REST call). I'm not sure that's a trade-off I'd want in my VCS, but maybe I'm just not the target audience for this.

funcDropShadow|2 years ago

Even a force push doesn't destroy the reflog or run the GC server-side. I wonder how you can accidentally lose data with Git. I've seen lots of people not being able to find it, but really destroying it is hard.

sasham|2 years ago

He force-pushed a diverged branch or something like that, and we only found out after a while. We were eventually able to recover because someone hadn't pulled. But it was not a fun experience :D

grumbel|2 years ago

> doesn't destroy the reflog or runs the GC server-side.

Git doesn't give you access to the server-side reflog either. So it's not of much use if you don't control the server.

As for losing data with Git, the easiest way to accomplish that is with data that hasn't been committed yet: a simple `git checkout` or `git reset --hard` can wipe out all your changes, and even the reflog won't keep a record of that.
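
A minimal sketch of the trap (throwaway repo, placeholder identity): the file never makes it into a commit, so no reflog entry ever points at it.

```shell
# Uncommitted (even staged) work lives only in the working tree and index,
# so a hard reset discards it with nothing in the reflog to recover from.
repo=$(mktemp -d); cd "$repo"
git init --quiet
git config user.email "dev@example.com"; git config user.name "Dev"
git commit --allow-empty --quiet -m "initial"
echo "hours of uncommitted work" > notes.txt
git add notes.txt                  # staged, but never committed
git reset --hard --quiet           # back to HEAD: notes.txt is deleted
test -f notes.txt || echo "notes.txt is gone"   # prints: notes.txt is gone
```

(Strictly speaking, a blob that was `git add`-ed can sometimes still be dug out with `git fsck --lost-found`, but nothing in the reflog will point you to it.)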

noufalibrahim|2 years ago

I agree. It's quite hard to actually destroy data in Git. Even with the so-called "destructive" commands, walking through the reflog can usually restore work that was accidentally deleted or whatever.
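
A minimal sketch of that recovery path (throwaway repo, placeholder identity), assuming the "destroyed" commit was reachable shortly before:

```shell
# After an accidental hard reset, the reflog still records where HEAD was,
# so the "lost" commit can be restored.
repo=$(mktemp -d); cd "$repo"
git init --quiet
git config user.email "dev@example.com"; git config user.name "Dev"
echo "v1" > file.txt; git add file.txt; git commit --quiet -m "first"
echo "v2" > file.txt; git commit --quiet -am "second"
git reset --hard --quiet HEAD~1       # oops: "second" seems gone
git reset --hard --quiet 'HEAD@{1}'   # the reflog entry brings it back
cat file.txt                          # prints: v2
```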

billpg|2 years ago

I configured my GitHub account to only allow commits with an anonymised email address. Time passed, and I used another machine on which I had already opened that repo before. I pulled my recent work successfully, wrote stuff, and then committed and pushed.

GitHub rejected my commit as I had the wrong email address. I then had to try to work out how to delete a commit but keep all my changes, so I could commit it all again with the correct email address.

I'm not sure exactly what I did, but in my ham-fisted experimentation I deleted the commit and restored my local copy to the way it was before my commit, losing all my work that day.
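
In hindsight, a soft reset would have done it (a sketch with placeholder addresses, not what I actually typed):

```shell
# One way to drop a commit but keep its changes: soft reset, fix the
# author email, and commit again.
repo=$(mktemp -d); cd "$repo"
git init --quiet
git config user.email "wrong@example.com"; git config user.name "Dev"
git commit --allow-empty --quiet -m "initial"
echo "a whole day of work" > work.txt
git add work.txt; git commit --quiet -m "day's work"   # wrong email
git reset --soft HEAD~1              # commit is gone, changes stay staged
git config user.email "right@example.com"
git commit --quiet -m "day's work"   # same changes, corrected address
git log --format=%ae -1              # prints: right@example.com
```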

cqqxo4zV46cp|2 years ago

Destroying it, and nobody knowing how to recover it (or that it can be recovered at all), are identical.

sasham|2 years ago

Thanks! We're definitely not trying to bash Git, it's done a lot of good for software development and for sure is going to continue evolving.

Git had much more of an edge when it was competing against SVN and other centralized VCSs. With 10 Mb networks (if you were in the office), you could feel physical pain when committing stuff ><

Regarding how Git is not perfect in the cloud world: check out GitHub's blog post here about their cloud dev environment, Codespaces: https://github.blog/2021-08-11-githubs-engineering-team-move...

"The GitHub.com repository is almost 13 GB on disk; simply cloning the repository takes 20 minutes."

Moving 13 GB inside your own cloud should take seconds at most. The problem is the way Git works: it clones your entire repository into the container with your cloud environment, using a slow network protocol. With Diversion it takes a few seconds.

andsoitis|2 years ago

> Thanks! We're definitely not trying to bash Git, it's done a lot of good for software development and for sure is going to continue evolving.

It is not about bashing Git; it is about anchoring your argument for why Diversion is a better alternative around Git. You're basically taking your game/arguments to their playing field, and thus will have an uphill battle for mindshare.

Instead, consider reframing the playing field and mentioning Git less (if at all). Something like "the future of version control is blah". Surprise us: talk to us about your vision for source control, or better yet, code and multi-discipline collaboration (e.g. between eng and design), etc.

asimpletune|2 years ago

I'm not sure I understand this at all.

> The problem is the way Git works, it clones your entire repository into the container with your cloud environment, using a slow network protocol.

What about git's network protocol is 'slow'?

I think I can also come up with a pretty simple experiment to prove or disprove this: 1. Fill a file with 13 GB of data and commit it. 2. Upload that to GitHub or wherever you want. 3. Time how long it takes to clone and compare that to the real GitHub.com.

You will find the one we made takes 'seconds' (or minutes, depending on your network connection), while the GitHub.com one will take some time.

So, same data, two different results? The difference in this experiment rules out the 'slow' network protocol as the difference maker. The real reason is that the GitHub.com repo will have hundreds or thousands of commits.

Basically, the difference is the commit history, because that's how Git needs to work. A clone transfers the entire commit history (every snapshot, delta-compressed in packfiles), not just the literal files at HEAD. I don't know what the network protocol has to do with that.

dartos|2 years ago

> We're definitely not trying to bash Git

Using git with bash is the best way to use git (:

funcDropShadow|2 years ago

That article also states that by using a standard Git feature, shallow clones, you go from 20 minutes to 90 seconds. Most of the problems touched upon in the article are about state management for local environments; yes, that can be tricky, and it can take time, but it has nothing to do with Git.
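
The difference is a single clone option (sketch against a throwaway local repo; the real thing would point at the GitHub URL instead of the `file://` path):

```shell
# A shallow clone fetches only the tip commit, not the whole history,
# which is where the article's 20min -> 90s improvement comes from.
src=$(mktemp -d)
git init --quiet "$src"
git -C "$src" config user.email "dev@example.com"
git -C "$src" config user.name "Dev"
for i in 1 2 3; do
  echo "$i" > "$src/f.txt"
  git -C "$src" add f.txt
  git -C "$src" commit --quiet -m "commit $i"
done
dst=$(mktemp -d)
git clone --quiet --depth 1 "file://$src" "$dst/shallow"
git -C "$dst/shallow" rev-list --count HEAD   # prints: 1 (tip commit only)
```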

vintagedave|2 years ago

>> a data scientist accidentally destroyed a month’s work of his team

> This is mostly a configuration issue

git apologism :)

(FWIW I do agree with the rest of your comment, and I hope you forgive the slight joke. Product users, for any product, are fallible humans. That might mean being fallible in accidentally deleting, or in forgetting to turn on the safety settings.)

Very seriously, something like this should not be possible in a source control system. Data integrity needs to be built in by design.

MatthiasPortzel|2 years ago

> Data integrity needs to be built in by design

It is built into Git by design. Git keeps commits around for 90 days even after they’re “deleted.” This is why people who understand Git were so skeptical of OP’s claim. The point that Git is confusing still stands, however.
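
The windows are configurable, too. These are the documented defaults, written out explicitly here as a sketch (in a throwaway repo):

```shell
# Git's default expiry windows for "deleted" history, set explicitly.
repo=$(mktemp -d); cd "$repo"
git init --quiet
git config gc.reflogExpire "90 days"             # reachable reflog entries
git config gc.reflogExpireUnreachable "30 days"  # unreachable reflog entries
git config gc.pruneExpire "2 weeks"              # loose unreachable objects
git config gc.reflogExpire                       # prints: 90 days
```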

devjab|2 years ago

The issue with a lot of freedom and unopinionated tools is always going to be the multitude of ways to fuck up. On the flip-side, you may not like what choices are made if you’re forced to use it in a certain way.

We enforce a strict pull-request squash commit with four-eyes approval only. You can't force push, you can't rebase, you can't not squash, or whatever else you'd want to do. But we don't pretend that is the "correct" way to use Git; we think it is, but who are we to tell you how to do you?

We take a similar approach to how we use TypeScript. We have our own library of coding "grammar" that you have to follow if you want to commit TS into our pipelines. Again, we have a certain way to do things and you have to follow it, but these ways might not work for anyone else, and we do sometimes alter them a little if there is a good reason to do so.

I don't personally mind strict and opinionated software. I too think Git has far too many ways to fuck up, and that it is far too easy to create a terrible work environment with JavaScript. It also takes a lot of initial effort to set up rules to make sure everyone works the same way. But again, what if the greater community decided that rebase was better than squash commits? Then we wouldn't like Git, and I'm sure the rebase crowd feels the same way. The result would likely leave us with two Gits.

Though I guess with initiatives like the launch here, there already are two Gits. So… well.

dmazzoni|2 years ago

What if someone pushes something inappropriate? Shouldn't there be a way to delete it?

As an example, what if someone pushes:

- A private key or password
- Copyrighted content
- Illegal content

In cases like this, it needs to be possible to remove the bad commit from the repository entirely.
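
It is possible, but it takes more than a delete button: the unreachable objects also have to be expired and pruned. A sketch in a throwaway repo with a fake secret:

```shell
# Dropping a bad commit from a branch does NOT remove it from the object
# store; expiring the reflog and pruning is what actually purges it.
repo=$(mktemp -d); cd "$repo"
git init --quiet
git config user.email "dev@example.com"; git config user.name "Dev"
git commit --allow-empty --quiet -m "initial"
echo "FAKE-PRIVATE-KEY" > key.txt
git add key.txt; git commit --quiet -m "oops, committed a secret"
bad=$(git rev-parse HEAD)
git reset --hard --quiet HEAD~1           # commit is off the branch...
git cat-file -e "$bad" && echo "still in the object store"
git reflog expire --expire=now --all      # forget the reflog entries
git gc --prune=now --quiet                # drop unreachable objects
git cat-file -e "$bad" 2>/dev/null || echo "really gone now"
```

(For history that has already been shared, the usual route is a rewrite with a tool like `git filter-repo` plus forced updates on every clone, and hosted services typically cache old objects until you ask them to purge.)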

IshKebab|2 years ago

> When Microsoft adopted Git for Windows, they faced this problem and solved it.

On Windows. On Linux, Git still doesn't scale well to very large repos. Before you say "but Linux uses Git!", we're talking about repos that are much bigger than Linux.

Also, the de facto large-file "solution" is LFS, which is another half-baked idea that doesn't really do the job.

You sound like you're offended that Git isn't perfect because you like it so much. But OP is 100% right here: these are things that Git doesn't do well. It's OK to really like something that isn't perfect. You don't have to defend flaws that it clearly has.

WorldMaker|2 years ago

>> When Microsoft adopted Git for Windows, they faced this problem and solved it.

> On Windows. On Linux Git still doesn't scale well to very large repos.

All of Microsoft's solutions for Git scaling have been cross-platform. Even VFS had a FUSE driver if you wanted it, but VFS is no longer Microsoft's recommended solution either; they have moved on to things like sparse "cone" checkouts and commit-graphs, almost all of which is in mainline Git today.

I also find it funny to see the complaint that Git scales worse on Linux than on Windows, given how many Windows developers I know have file-operation speed complaints on Windows that Linux doesn't have (and that's a big reason to move to Windows Dev Drive given the chance: somewhat Linux-like file performance).

graemep|2 years ago

How common are repos bigger than Linux?

Git also has the huge advantage of an ecosystem, tools, and integrations. It is overkill for small projects, and there are friendlier alternatives for those, but Git wins because it is what everyone knows. Something aimed at the small number of large projects will suffer the same problem.

Wytwwww|2 years ago

> really like something that isn't perfect. You don't have to defend flaws that it clearly has.

Certainly true. But it's not at all clear how the product solves these specific problems (they say "Painless Scalability", which sounds nice, but did they try developing any 100+ GB projects with massive numbers of commits/branches on it?).

Rygian|2 years ago

> This is mostly a configuration issue. I guess this was done by a force push command. AFAIK, you can disable force pushes by configuration.

If a feature can lead to actual unintended data loss, it should come disabled by default. Are there any other "unsafe by default" features in Git? What would be a sane general default that prevents unwanted data loss, and why is it the case?

guax|2 years ago

--force always implies data loss. You're overriding the remote state.

Do people use it in an unsafe manner because they don't understand Git? Yes, and therein lies a problem that could be tackled.

That said, I don't think Git has any feature that is unsafe by default.
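
There is also a middle ground built in: `--force-with-lease` overrides the remote only if it still looks the way you last fetched it. A sketch with a local bare repo standing in for the server and placeholder identities:

```shell
# --force-with-lease refuses to overwrite commits you haven't fetched,
# unlike plain --force.
remote=$(mktemp -d); git init --bare --quiet "$remote"
me=$(mktemp -d)
git clone --quiet "$remote" "$me" 2>/dev/null; cd "$me"
git config user.email "me@example.com"; git config user.name "Me"
git commit --allow-empty --quiet -m "base"
git push --quiet origin HEAD
# A teammate pushes on top, and we never fetch their commit:
mate=$(mktemp -d)
git clone --quiet "$remote" "$mate"
git -C "$mate" config user.email "mate@example.com"
git -C "$mate" config user.name "Mate"
git -C "$mate" commit --allow-empty --quiet -m "teammate's work"
git -C "$mate" push --quiet origin HEAD
# We rewrite history locally and try to force it up:
git commit --amend --quiet -m "rewritten base"
git push --force-with-lease origin HEAD 2>/dev/null \
  || echo "rejected: remote moved since we last fetched"
```

A plain `--force` in the last step would have silently thrown away the teammate's commit.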

couchand|2 years ago

Should a chain saw come with the ability to start the engine disabled by default?

dmazzoni|2 years ago

But it doesn't lead to data loss.

The commits that were overwritten by "force" are still there on the server. Any admin could recover them pretty easily. They're probably still present in the local repo of the person who ran `git push --force` too, as well as on the machine of anyone else who has cloned the repo.

The only way you'd actually lose data is if every single person who had a clone of the repo ran gc.

Or apparently if nobody knew about "git reflog" and nobody bothered to do a Google search for "oops I accidentally force pushed in git" to learn how to fix it.

aseipp|2 years ago

The Windows Git repository is only 300 GB; that's basically child's play when people are talking about "large repo scalability". Average game-developer projects will be multiple terabytes per branch, with a very high number of extremely large files, and very large histories on top of that. Git actually still handles large files very poorly, not only extremely large repos in aggregate. The problem with large Git repositories is nowhere near solved, I assure you.

laeri|2 years ago

This includes assets, right, or some kind of prebuilt data in custom formats? Otherwise it would be hard to have this much data in source files.

Kiro|2 years ago

Git is bad for games, and they should definitely make that comparison in their pitch if they want to capture that market.

kfrzcode|2 years ago

No, it's not. LFS has improved over the years. Git is supported as a first-class citizen in Unreal Engine 5, alongside P4.

jasfi|2 years ago

The complexity people think they face with Git can often be overcome with a good UI and/or tutorials.

sasham|2 years ago

In part, yes; e.g. lots of people like SourceTree. Some of the complexity is inherent, though, e.g. local vs. remote branches and the various conflicts and errors that result. Git has existed for 18 years, and yet the complexity problem hasn't been solved. Other tools like SVN were never considered to be so hard to use / easy to screw up.