Git from the inside out

[+] leni536|11 years ago|reply

> from inside out

As a physicist, it always surprises me how the thinking of physicists and programmers is, most of the time, kind of reversed. Most git tutorials seem like this to me (mechanics analogy):

   1. Slopes
   2. Springs and gears
   3. Horrendous contraptions
   4. Ropes and pulleys
   ...
   9. Newton I., II. and III.

These kinds of "inside out" tutorials are natural for me, and recently I taught git basics to my SO in a similar way. It worked out well (she is a physicist too). I don't want to generalize, though; it may be rooted in the common ways of teaching programming and physics.

[+] JoshTriplett|11 years ago|reply

Imagine if everyone learning physics came in with an attitude of "I need to learn and use the rocket equation as quickly as possible" (or substitute some other high-level problem for "the rocket equation"). You'd end up with strange backwards tutorials for physics that start out with that specific model, then a handful of related models, then how to abuse that model to handle things it doesn't really apply to, and much later the underlying physics and mathematics to solve arbitrary generalized problems.

Many people start out trying to figure out either "how do I use git exactly like (svn, cvs, vss, ...)" or "how do I commit and push my changes", so tutorials start there. Most people don't approach git by learning its underlying data model. Arguably people should, because it's a rather simple data model, and then all the commands become simple applications of that model.

[+] scott_karana|11 years ago|reply

Tutorials are specifically "learn to get something done, RIGHT NOW, working on the shoulders of giants".

Education from first principles is common in computer science, but tutorials aren't the place.

Analogous in physics would be, "I need to figure out exactly how much force is going to be applied through this climbing harness and pulley system lest I fall and die", and learning how to plug your variables into a standard set of equations without understanding anything but basic calculator operation.

[+] dasil003|11 years ago|reply

All the replies are missing the fundamental difference between learning physics and learning git: every animal on the planet is intuitively schooled from birth—even before birth by their DNA—in the daily practice of mechanics.

Learning git is a comparatively abstract intellectual pursuit. You can't assume people know how to practice version control or even what its purpose and utility are, so learning this way is going to be very hit or miss depending on the preparedness of the audience.

[+] martininmelb|11 years ago|reply

So, maybe this is more useful?

Unlike more primitive version control systems, git repositories are not linear, they already support branching, and are thus best visualised as trees in their own right. Branches thus become trees of trees. To visualise this, it’s simplest to think of the state of your repository as a point in a high-dimensional ‘code-space’, in which branches are represented as n-dimensional membranes, mapping the spatial loci of successive commits onto the projected manifold of each cloned repository.

(Quoted from tartley.com)

[+] RickHull|11 years ago|reply

It has to do with whether the audience already has a solid theoretic / symbolic / formal background. Do they already think in terms of models? For graduate-level physicists, this is almost surely the case.

Most git tutorials are for people who need to get up to speed quickly, having had git imposed on them from above, mapping surface area concepts from the prior version control system. Hard core engineers with an interest find their way to the theoretic center.

[+] amelius|11 years ago|reply

The reason is that programmers try to create new things, whereas physicists try to figure out how things that are already there, waiting to be discovered, actually work (or else they are really engineers).

Also, programmers create abstractions such that one does not need to know about the nitty gritty in order to understand how to use something.

[+] joshuapants|11 years ago|reply

I think this is generally because people want to start Doing Things right away with what they're learning, so most teaching attempts revolve around giving the student something they can Do and then explaining it afterward.

[+] kazinator|11 years ago|reply

Okay Johnny, here is how we drive a car. First, put on your seat belt. Next, we discuss Newton I, II and III. We really should have done that right away, but they say, "safety first!"

[+] rtpg|11 years ago|reply

This is disingenuous. As a child, you _did_ see slopes and springs and gears and ropes and pulleys before Newton, just not in physics classes but in everyday life. Physics classes start from first principles, maybe, but in terms of what you get exposure to, Newton's three laws show up pretty late in one's understanding of how things happen.

[+] kazinator|11 years ago|reply

People knew about inclined planes and pulleys before Newton postulated I, II and III.

[+] chx|11 years ago|reply

Regular pitch for http://www.sbf5.com/~cduan/technical/git/

> you can only really use Git if you understand how Git works. Merely memorizing which commands you should run at what times will work in the short run, but it’s only a matter of time before you get stuck or, worse, break something.

It's concise (the linked article is gigantic) and allows for an understanding of this overhyped, user-hostile DVCS. I know that git won, but don't expect me to be happy about it.

[+] fit2rule|11 years ago|reply

I found this tutorial (the one you recommend) to be, generally, terrible. It doesn't define terms, but rather just jumps into the git terminology without explaining things. For example the section on "Commit Objects" - what actually is a commit object? Is it a directory, is it a special file, is it .. something else? The term is used in an abstract manner without actually explaining what the term means - sure, it explains what is associated with the term, but why is it an object?

The same goes for the "Heads" explanation - it's a reference to a commit object. Is this a softlink, is it a field in a special file that contains a reference .. what "is it?", and why is the word "head" used?

Whereas in the tutorial referred to in this article, "Git From the Inside Out", the terms are actually defined before they're used in any significant fashion. "Commit Objects = After creating the tree graph, git commit creates a commit object. This is just another text file in .git/objects/:" (with example), and .. "Heads = Which is the current branch? To find out, Git goes to the HEAD file at .git/HEAD and finds: ref: refs/heads/master This says that HEAD is pointing at master. master is the current branch."
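The HEAD lookup quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it builds a throwaway directory that mimics the `.git` layout instead of touching a real repository, the helper name `current_branch` and the placeholder commit id are made up, and it handles only a loose ref file (real Git also deals with packed refs and other cases).

```python
import tempfile
from pathlib import Path


def current_branch(git_dir: Path):
    """Return (ref_name, commit_id) for a .git-like directory.

    Hypothetical helper: handles a symbolic HEAD pointing at a loose ref,
    or a detached HEAD that stores a commit id directly.
    """
    head = (git_dir / "HEAD").read_text().strip()
    if not head.startswith("ref: "):
        # Detached HEAD: the file holds a commit id, not a ref name.
        return None, head
    ref_name = head[len("ref: "):]
    commit_id = (git_dir / ref_name).read_text().strip()
    return ref_name, commit_id


# Build a fake .git layout in a temporary directory (not a real repository).
git_dir = Path(tempfile.mkdtemp()) / ".git"
(git_dir / "refs" / "heads").mkdir(parents=True)
(git_dir / "HEAD").write_text("ref: refs/heads/master\n")
fake_commit = "a" * 40  # placeholder, not a real commit id
(git_dir / "refs" / "heads" / "master").write_text(fake_commit + "\n")

ref_name, commit_id = current_branch(git_dir)
print(ref_name)   # refs/heads/master
print(commit_id)  # the placeholder id written above
```

Seen this way, a "head" is just a small text file whose content names a commit, and HEAD is a text file whose content names a head.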

I just want to point this difference out, because it actually is endemic in all Git tutorials I've found - either you define the basic terms, such that an association can be made in the mind of the reader, allowing them to grasp the abstractions .. or you don't. This appears to be a common problem with Git tutorials, in my opinion - the terms, or rather the taxology of the Git abstractions - are quite unclear. Why on earth the term "Head" is used to refer to a commit object is quite un-intuitive .. at first. Git really requires a deeper dive into the abstractions before surfacing for true understanding.

[+] barbs|11 years ago|reply

What alternative do you prefer?

[+] AceJohnny2|11 years ago|reply

See also "Git from the Bottom Up": https://jwiegley.github.io/git-from-the-bottom-up/

(originally a PDF in 2008)

[+] dwyer|11 years ago|reply

Much better article IMO. Introducing the low level commands that the higher level ones wrap around is a much more fun and interactive way to understand the .git schema to me.

[+] kazinator|11 years ago|reply

Wish `git` didn't have annoying special cases in it. For instance

   git rebase -i HEAD~2

won't work if there are only two commits, because HEAD~2 refers to a nonexistent commit after the first two.

There should be some friggin' NIL terminator there which takes the HEAD~2 reference.
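To make the complaint concrete, here is a toy Python model of `~n` resolution, assuming that `HEAD~n` means "follow the first parent n times". The commit names and the helper `nth_ancestor` are hypothetical; this is a sketch of the idea, not how Git is implemented.

```python
# Toy history: each commit id maps to its list of parent ids (made-up data).
parents = {
    "first": [],          # root commit: it has no parent
    "second": ["first"],
}
refs = {"HEAD": "second"}


def nth_ancestor(start, n):
    """Follow the first parent n times, like the ~n suffix."""
    commit = start
    for _ in range(n):
        if not parents[commit]:
            raise LookupError(f"{commit!r} is a root commit; it has no parent")
        commit = parents[commit][0]
    return commit


print(nth_ancestor(refs["HEAD"], 1))  # first

try:
    nth_ancestor(refs["HEAD"], 2)
except LookupError as err:
    # With only two commits there is nothing "before" the root to point at.
    print("HEAD~2 failed:", err)
```

In this model the root commit is the end of the chain, so there is no object that could play the role of the NIL terminator asked for above.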

Imagine having a function to, say, delete characters from a string which takes an open-ended range [from, to). Then imagine that the index to has to exist in the string; it must not point one element past the end! Oops, you cannot delete from a position to the end of the string.
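The string analogy can be written out as a short Python sketch. Both function names are invented for illustration: `delete_range` accepts the usual half-open range, while `delete_range_strict` models the awkward rule that `to` must be an existing index.

```python
def delete_range(s: str, start: int, end: int) -> str:
    """Delete the half-open range [start, end); end may equal len(s)."""
    if not (0 <= start <= end <= len(s)):
        raise IndexError("range out of bounds")
    return s[:start] + s[end:]


def delete_range_strict(s: str, start: int, end: int) -> str:
    """Same, but 'end' must be a valid index into s (the awkward variant)."""
    if not (0 <= start <= end < len(s)):
        raise IndexError("'end' must point at an existing character")
    return s[:start] + s[end:]


text = "abcdef"
print(delete_range(text, 2, 4))          # abef
print(delete_range(text, 2, len(text)))  # ab

try:
    delete_range_strict(text, 2, len(text))
except IndexError as err:
    # Deleting "to the end of the string" is impossible in the strict variant.
    print("strict variant failed:", err)
```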

The garbage-collected object graph is nice and "Lisp-like" in some ways, but silly in others.

Oh, and in case you're thinking "just make an empty initial commit, and it will effectively be your NIL terminator". No can do; git doesn't allow empty commits. Of course, you can make a file called ".nil" and add it and commit. Use "()" as the commit comment. :)

[+] nshepperd|11 years ago|reply

I feel like the lack of an initial empty commit is really a failure to match the intuitive graph model. Clearly there should be an arrow corresponding to "adding the first files". And that arrow needs somewhere to go from and to. Hence, `git init` should always start by creating an initial commit object referring to an empty tree.

A side bonus of this would be that since the initial commit is empty it has a fixed id. Suddenly, all git repositories have a common ancestor, and you can merge any two random projects together without losing history!
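A small Python sketch can check part of this idea, assuming the default SHA-1 object format, where an object id is the SHA-1 of a `<type> <size>\0` header followed by the body. The empty tree does have a fixed, well-known id. A commit object also records author, committer, and timestamps, so an empty initial commit would only have a fixed id if that metadata were fixed too. The identity string and message below are made up for illustration.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Hash an object the way Git's SHA-1 format does: header + NUL + body."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def commit_id(tree_id: str, when: int) -> str:
    """Id of a parentless commit with a hypothetical fixed identity."""
    ident = f"Nobody <nobody@example.com> {when} +0000"
    body = (
        f"tree {tree_id}\n"
        f"author {ident}\n"
        f"committer {ident}\n"
        "\n"
        "initial empty commit\n"
    ).encode()
    return git_object_id("commit", body)


empty_tree = git_object_id("tree", b"")
print(empty_tree)  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904

# Identical metadata gives the same commit id; a different timestamp does not.
print(commit_id(empty_tree, 0) == commit_id(empty_tree, 0))  # True
print(commit_id(empty_tree, 0) == commit_id(empty_tree, 1))  # False
```

So the "common ancestor for all repositories" would need `git init` to write the same metadata everywhere, not just the same empty tree.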

[+] chx|11 years ago|reply

git commit --allow-empty

[+] dnc|11 years ago|reply

For grokking git, an indispensable resource is the early git dev mailing list and the corresponding code base (from the first couple of months after the project started). Linus explained it in a very clear and precise way in the mailing list and the related code. The initial code base is surprisingly small (around 1200 LOC of clear and precise C code). The data structures used are simple and self-explanatory. Although most of the original code is not in the git code base anymore, the data structures and main design ideas have stayed intact so far.

[+] voltagex_|11 years ago|reply

I'm going to have to go through the archives later (anyone got an mbox?) but it's tricky to follow the early development. The archives seem to start at http://marc.info/?l=git&r=20&b=200504&w=2 and I don't see many design messages from Linus.

[+] coldpie|11 years ago|reply

I actually found gitcore-tutorial(7) to be a really great resource when I was learning git.

[+] mendelk|11 years ago|reply

Interesting article.

Also wasn't aware that Hacker School changed their name.

https://www.recurse.com/blog/77-hacker-school-is-now-the-rec...

[+] chromedude|11 years ago|reply

Don't worry, you aren't very far behind the times. It was announced yesterday.

[+] jordigh|11 years ago|reply

Huh, another git explanation.

Either the thing is so easy to understand that everyone can do it and is then compelled to write about it, or it's so difficult to understand that everyone feels the compulsion to explain it to everyone else.

[+] RickHull|11 years ago|reply

Maybe a quip, but that dichotomy is so false it's not even funny. Git is tough to wrap one's head around at first. It's not intuitive unless one already has a deep background in this space. Hence, there are lots of attempts to explain it in order to bring more into the fold of intuition. It seems perfectly natural and good, and your contemptuous tone puzzles me.

[+] bsder|11 years ago|reply

> it's so difficult to understand that everyone feels the compulsion to explain it to everyone else.

All successful religions include proselytization in their basic tenets.

Ahem. Anyhow. Let me be more charitable. git is so hard to learn because it has an impedance mismatch with the primary user's use case.

git clearly works in the large--that's what Linus designed it for. The problem is that git forces you into that workflow immediately and has no intermediate steps. The problem is that most of us use version control to coordinate less than 10 people, and git forces way too much mental energy on top of something which is very simple in CVS or SVN or ... any other version control system, really (maybe arch had a worse mental model ... that's not a compliment).

Use Mercurial for a while, and then use git. You will find yourself saying things like: "Why should I need to even care about X?" and the answer is always "Well, if you had 100 committers and 40 branches ..."

Things like: "Why would I need to name a branch?" "Why wouldn't I just sync a repository completely?" "Why not just clone the whole repository?" etc.

The problem is that everybody is forced into "git in the large" in order to contribute to open source projects. The tutorial that needs to be written is "git in the small", but I'm not sure that the design of git actually allows that tutorial to be written.

[+] ta0967|11 years ago|reply

Its concepts are easy to understand and git is very powerful, but the interface is horrible and a bitch to learn. Git does not need another tutorial, it needs an alternative porcelain.

[+] edavis|11 years ago|reply

Or, alternatively, git's building blocks are elegant, powerful, and interesting and writing about them makes for natural blog posts/guides/tutorials.

[+] logicallee|11 years ago|reply

>"This essay explains how Git works. It assumes you understand Git well enough to use it to version control your projects."

so...the opposite of Bjarne Stroustrup's maligned "The C++ Programming Language", which fails to explain how C++ works, after assuming you don't know it. :)

seriously though no need for the second sentence. this article is a great intro!

[+] RansomTime|11 years ago|reply

Footnote 3: git prune deletes all objects that cannot be reached from a ref. If the user runs this command, they may lose content.

In what cases would a user lose content? When something is added but not committed only?

[+] m0tive|11 years ago|reply

When you've committed something, but then rebased or reset the branch position so the commit is no longer in the history of any branch or tag. This usually isn't a problem, because when you rebase work you are making a copy of the commit so references to the data should be the same.

I also think it's worth noting, `git gc`, which is triggered automatically occasionally, actually runs `git prune`.
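The "cannot be reached from a ref" condition can be sketched as a graph walk in Python. This is a simplified model with made-up commit names: it follows only parent links from branch refs, whereas real Git also walks trees and blobs and treats other things (such as reflog entries) as starting points.

```python
def reachable(parents, refs):
    """Return every commit reachable from any ref by following parent links."""
    seen = set()
    stack = list(refs.values())
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents[commit])
    return seen


# Toy history a <- b <- c, with master pointing at c (hypothetical ids).
parents = {"a": [], "b": ["a"], "c": ["b"]}
refs = {"master": "c"}
print(sorted(set(parents) - reachable(parents, refs)))  # []

# Simulate moving master back to b, as a reset would: c is now orphaned.
refs["master"] = "b"
unreachable = set(parents) - reachable(parents, refs)
print(sorted(unreachable))  # ['c']
```

In this model, `c` is exactly the kind of committed-but-orphaned content that a prune would be allowed to delete.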

[+] ThinkBeat|11 years ago|reply

What tool did the poster use to create the diagrams?

[+] maryrosecook|11 years ago|reply

OmniGraffle. I really enjoy using it.

[+] Fannon|11 years ago|reply

A great title would have been: The Guts of Git.

[+] a3_nm|11 years ago|reply

Already taken: https://lwn.net/Articles/131657/

[+] msie|11 years ago|reply

I regularly read articles about Git's inner workings and I always seem to forget them. :-(

[+] RansomTime|11 years ago|reply

[deleted]

[+] ams6110|11 years ago|reply

Yet another attempt to explain the incomprehensible.

Why does such a popular version control system find itself in need of so many explanations?

Any startup attempting to market something that required a user to understand concepts such as this...

https://codewords.recurse.com/images/two/git-from-the-inside...

...would be laughed out of the room in any other context.

[+] dr4g0n|11 years ago|reply

A version control system with the feature set git has is necessarily complicated; this is not a bad thing. Forcing the complexity on the user is a bad thing, but git does not do this except when it is necessary. Using git at a basic level is not hard.

This essay is not attempting to explain how to use git, it is explaining how git itself works and how changes are physically tracked. There's no need for a basic user to know this information, but if someone wants to dig into git and understand how it works, this essay is a nice guide. Explaining how SVN or any other version control system works at this level of detail would also be complicated.

[+] rudolf0|11 years ago|reply

Git isn't exactly a startup (or a company, or even a product), nor did it ever intend to be.

Lots of great software happens to be difficult for a lot of people to intuitively understand.

[+] sp332|11 years ago|reply

You are not required to understand this concept. But if you're working on a large team and your branch history gets this complicated, you can still effectively use git to manage it. Try that with less-popular version control systems. (Mercurial comes to mind, actually.)

[+] sergiotapia|11 years ago|reply

Git won because of GitHub. It's an unpopular opinion, but I stand by it. If GitHub did not exist, Git wouldn't have been adopted by so many projects.

[+] brown9-2|11 years ago|reply

Sometimes technical things are complicated, because people don't work or communicate in tidy ways.

You could make similar complaints about many of the foundations that make technology possible. Take for example a protocol that enabled you to read this message: http://en.wikipedia.org/wiki/Transmission_Control_Protocol#/...

[+] semi-extrinsic|11 years ago|reply

So by your logic, we shouldn't have helicopters? Using them arguably requires a lot more understanding of complicated things than git does.
