I would maybe be interested in Git allowing you to plug in your own diff generators for different file types.
But I would not want Git itself trying to understand the contents of files. That seems to me to be an idea that lives on a misconception of the "things programmers believe about names" variety. Not every file in source control is source code. Not every programming language's grammar maps to an abstract syntax tree. In some files, such as makefiles, the difference between tabs and spaces is semantically significant. Some languages (such as Fortran and Racket) have variable syntax. And so on and so forth.
So I think that we really don't want the source control system itself trying to get too smart about the contents of files. That will inevitably make the source control system less compatible with the various kinds of things you might want to put into source control. And it will also make the source control system a lot more complicated than it would otherwise be, in return for a largely theoretical payoff.
But if we want to delegate the work of generating diffs off to other people, so that Git can allow for syntax- or semantics-aware diffing without having to personally wade into that quagmire (and perhaps also allowing language communities to support multiple source control systems, a bit like how it works with LSP), that might be an interesting thing to experiment with.
I disagree. Many engineers want to refactor across a sequence of small PRs, for example. Small PRs are a good thing, because they’re easier to understand. But today, Git makes this painful. Also, understanding how the meaning of code changes over time can help reduce bugs.
The solution will have to be pluggable. But I think it is possible, and there are sane things to do (e.g. fall back to vanilla git) when there are missing plugs.
Not only that, but imagine you realize there is a bug in the parsing tool. Now you have to go back and re-parse the code, or otherwise just deal with a bad history forever. Suddenly you’re storing text again.
I do kind of love the idea of Git using ASTs instead of source code. It makes a ton of sense.
Even just in the immediate term I wish I could make Git(hub) tabs/2 spaces/4 spaces/whatever agnostic. Seems crazy to me that in 2021 we still have to make opinionated choices across orgs about what to use... why can't we pull the code down, view it in whatever setup we want, then commit a normalized version?
[whispers] this is actually something tabs allow you to do natively by setting custom tab widths in text editors but I've given up trying to sell people on tabs at this point and just want to be able to do my own thing
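Git already has a hook for "commit a normalized version, view it however you like": clean/smudge filters. A path gets a filter through .gitattributes (for example `*.py filter=indent`), and the `filter.indent.clean` / `filter.indent.smudge` config entries name commands that read the file on stdin and write the converted version to stdout. The filter name `indent` and the `indent_filter.py` script are hypothetical; this sketch only rewrites leading indentation, storing tabs in the repository and each person's preferred number of spaces in their working tree.

```python
import re
import sys

# Matches the run of spaces/tabs at the start of every line.
_LEADING = re.compile(r"^[ \t]+", re.MULTILINE)


def clean(text: str, width: int) -> str:
    """Working tree -> repository: turn leading indentation into tabs."""
    def to_tabs(match: re.Match) -> str:
        cols = 0
        for ch in match.group(0):
            # A tab advances to the next multiple of `width`; a space adds one column.
            cols = cols + width - cols % width if ch == "\t" else cols + 1
        tabs, spaces = divmod(cols, width)
        return "\t" * tabs + " " * spaces  # leftover spaces are alignment
    return _LEADING.sub(to_tabs, text)


def smudge(text: str, width: int) -> str:
    """Repository -> working tree: expand leading tabs to `width` columns each."""
    return _LEADING.sub(lambda m: m.group(0).expandtabs(width), text)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: indent_filter.py clean|smudge WIDTH  (file content on stdin)
    mode, width = sys.argv[1], int(sys.argv[2])
    data = sys.stdin.read()
    sys.stdout.write(clean(data, width) if mode == "clean" else smudge(data, width))
```

Each person would configure the same width for both directions. Alignment that isn't a multiple of the indent width survives only as leftover spaces, so the alignment problem raised later in the thread doesn't go away.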
It's not that you're going too far, it's that you're not going far enough!
It's not a Git question, it's a programming language question. There's no reason source code needs to be stored as plain text[1]! Editors show it as text, we edit it as text, but why wouldn't it be _stored_ as an AST? Not only does formatting become an editor concern, but code could even be edited as a tree, as a graph, as whatever you want.
[1] - well, actually there's plenty of reasons: chiefly because plaintext is very interoperable
Tabs do work as long as they aren't fixed width (I don't know what you mean by "custom").
For instance, in many languages, one will sometimes have to split a function call across many lines, and in most languages function names aren't of fixed length, so to get a correct alignment for the parameters, the tab width at that point would have to match the length of the function name.
I agree with your idea of storing a normalized version of the code in the repo: it then wouldn't matter whether that version contains characters to align the code properly; they would just be inserted by the editor/linter as needed. The difficulty is that sometimes linting isn't enough, and some manual formatting is needed. Or perhaps the formatting rules are underspecified?
Another issue with AST diffing is when languages allow some form of syntactic sugar as preprocessing: the compiler might just see the simplified tree, not the one with the "sugary" forms. A tool capable of parsing such languages should also be able to handle these extensions.
fwiw, this is what we do in Dark [1]. We store (serialized) ASTs, and then we pretty-print them in the editor. This converts the AST into the tokens you see on your screen, complete with configurable* indentation, line length, etc. Code would be displayed according to your config*, and the same code would be displayed differently to another developer looking at it.
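As a rough illustration of "store the tree, pretty-print on display", here is a toy pretty-printer over a hypothetical two-node AST (identifiers and calls). It is not Dark's implementation; `indent` and `line_length` simply stand in for the configurable indentation and line length.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Call:
    """A function call node; arguments are nested nodes."""
    name: str
    args: List["Node"] = field(default_factory=list)


Node = Union[str, Call]  # a bare string is an identifier leaf


def flat(node: Node) -> str:
    """Render a node on a single line."""
    if isinstance(node, str):
        return node
    return f"{node.name}(" + ", ".join(flat(a) for a in node.args) + ")"


def pretty(node: Node, indent: int = 4, line_length: int = 80, level: int = 0) -> str:
    """Render `node`, breaking a call across lines when it exceeds `line_length`."""
    pad = " " * (indent * level)
    one_line = pad + flat(node)
    if isinstance(node, str) or len(one_line) <= line_length:
        return one_line
    inner = ",\n".join(pretty(a, indent, line_length, level + 1) for a in node.args)
    return f"{pad}{node.name}(\n{inner}\n{pad})"


tree = Call("print", [Call("max", ["a", "b"]), "c"])
print(pretty(tree))                              # one line with the defaults
print(pretty(tree, indent=2, line_length=10))    # broken across lines
```

The stored tree never changes; only the rendering depends on the reader's settings, which is also why line numbers stop being shared between readers.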
One of the practical issues here is, if your code fails to compile in CI with an error like
  /home/ci/src/foo.c:123:45: error: use of undeclared identifier 'a'
or
  /home/ci/src/bar.py:50: syntax error in type comment
or crashes in production with an error like
  java.lang.NullPointerException
      at com.example.Baz.doThings(Baz.java:1337)
you really want to be able to find line 123 column 45, line 50, or line 1337 in your editor, and have that be the same line as what your CI compiled and deployed.
On its own, tabs vs. spaces only affects columns, and you can probably figure things out without columns (although it's a shame to lose it). But different tab sizes affect how long your lines are, and line wrapping is a thing that people care about at least as much as tabs vs. spaces (people with different size monitor or fonts will easily see too-long or too-short lines on their display; if your spaces are equivalent to the tab stop, the distinction is literally invisible). And once you start rewrapping lines, everyone's line numbers are different.
I think it's possible to solve this by using some sort of AST-based index into the file and teaching IDEs to let you seek based on that, but it's suddenly a more complex problem.
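A minimal sketch of that idea for Python, assuming the standard `ast` module (Python 3.8+ for `end_lineno`): translate a line number from a CI log into a qualified definition name, then look that name up in a locally reformatted copy of the file. `qualname_at` and `line_of` are hypothetical helpers, and they only resolve positions down to the enclosing function or class.

```python
import ast
from typing import Iterator, Optional, Tuple

_DEFS = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def _definitions(tree: ast.AST, prefix: str = "") -> Iterator[Tuple[str, ast.AST]]:
    """Yield (qualified_name, node) for every def/class, outermost first."""
    for child in ast.iter_child_nodes(tree):
        if isinstance(child, _DEFS):
            name = f"{prefix}.{child.name}" if prefix else child.name
            yield name, child
            yield from _definitions(child, name)
        else:
            yield from _definitions(child, prefix)


def qualname_at(source: str, line: int) -> Optional[str]:
    """Innermost definition containing `line` (1-based), e.g. 'Baz.do_things'."""
    best = None
    for name, node in _definitions(ast.parse(source)):
        if node.lineno <= line <= node.end_lineno:
            best = name  # a later match must be nested inside the earlier one
    return best


def line_of(source: str, qualname: str) -> Optional[int]:
    """Line where the definition named `qualname` starts in `source`."""
    for name, node in _definitions(ast.parse(source)):
        if name == qualname:
            return node.lineno
    return None
```

For example, line 3 of the CI copy maps to `Baz.do_things`, which `line_of` can then find in a copy with different blank lines or wrapping.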
Reading this article, I feel as though the author doesn't deeply understand git.
git works on blobs of data, not files, and not lines of text. It doesn't just happen to also work on binary files; that's all it works on.
Now, if the author is suggesting that git-diff ought to have a language specific mode that parses changed files as ASTs to compare, now I'm interested. Let's do that. I'll help!
But git does not need to change how it works for that to happen. Git does not even need git-diff to exist to serve its main purpose.
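The "blobs" part is easy to reproduce: a blob's object ID is the SHA-1 of a short header (`blob <size>` plus a NUL byte) followed by the raw bytes, with no notion of lines at all. A small sketch using only `hashlib`, assuming the default SHA-1 object format rather than a SHA-256 repository:

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object ID Git assigns to `content` stored as a blob (SHA-1 object format)."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# Should agree with `git hash-object --stdin` fed the same bytes.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the ID depends only on the bytes, any diff Git shows (text-based or otherwise) is computed on demand from two such blobs.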
Rebases and cherry-picks work by applying diffs, not by copying blobs. Auto-merging also needs to look at file content as text, you can't auto-merge a binary file with git.
It's an often repeated fact that if you look inside Git, it doesn't work with diffs, it works with blobs. But if you look closer, it's often diffs again!
There's also a historical angle here that's important to inspect - Git was designed to specifically be content agnostic. There are some predecessors in the SCM space (like VSS) that are specifically language aware and allow the checking out of line ranges (pinning them so that no one else will make a conflicting change specifically) and even entire functions - these systems can cause a lot of grief while failing to protect the logic they're specifically trying to protect. As the warts on SVN got more and more visible I think the general assumption was that the replacement SCM would come out of this code-aware space - but it didn't and in retrospect we all dodged a huge bullet when that happened.
I absolutely adore tooling around git that makes diffs more visible - one thing I absolutely gush over is anything that can detect and highlight function reordering... however, the core process of merging and rebasing and all that jazz - I don't think we're going to find anything automated that I'll ever trust when I'm not working on a ridiculously clean codebase - minor changes can have echo effects and when two people are coding in the same general area they need to be aware of what the other person is trying to do.
I dunno I feel like you're focusing on a detail that's not particularly relevant. The author's main thrust is precisely what you described about parsing changed files as ASTs.
Storing AST instead of source code is one of the goals of the very interesting Unison programming language: https://www.unisonweb.org/
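A loose analogy in Python (not how Unison actually computes its hashes): hash a definition's AST dump instead of its text, so edits that only touch whitespace, comments, or redundant parentheses leave the hash unchanged. `ast.dump` leaves out line and column attributes by default, which is what makes this work.

```python
import ast
import hashlib


def ast_hash(source: str) -> str:
    """Hash the parsed structure of `source`, ignoring layout and comments."""
    tree = ast.parse(source)
    # ast.dump omits lineno/col_offset unless include_attributes=True.
    return hashlib.sha256(ast.dump(tree).encode("utf-8")).hexdigest()


a = "def f(x):\n    return x+1\n"
b = "def f( x ):  # add one\n    return (x + 1)\n"
print(ast_hash(a) == ast_hash(b))  # True
```

Note that renaming `x` here would still change the hash.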
Part of what's nice about Git (and plain text in general) is that it's the lowest common denominator for a lot of things. This is why traditional Unix tools are built oriented around streams of bytes. Text is a low level carrier protocol; you can encode almost anything in it, but you need to agree on some kind of format.
The good part is that you can use very very generic tools on almost arbitrary pieces of data. The bad part is that you might have to do a lot of parsing and re-parsing of the same data, and you have to contend with the dangers of underspecified formats.
Git follows the Unix tradition in this regard. As a result, it is nearly universal in what it can store. You can use it to store pretty much anything, but you are now at the lowest common denominator of support for any particular data format.
Git-for-ASTs will no longer have this universality property, but will gain a lot more power in the covered domain. This is a design tradeoff.
One thing that's nice about Git is that you can specify arbitrary diff drivers with the "attributes" system. So even if the Git database is storing plain text, your diff driver can parse your source code into ASTs and present AST diffs to you when you run `git diff`. Perhaps more impressive, you can configure custom merge drivers, so you can (theoretically) implement semantic merging of ASTs right inside Git.
There are probably some fundamental limitations of this system, because the underlying data is still stored as blobs of bytes. But you can get pretty far as long as you don't mind parsing and re-parsing the same text over and over.
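To make the diff-driver part concrete: a `textconv` program (configured as `diff.<name>.textconv` and selected per path with a `diff=<name>` attribute) is run with a file path as its argument and prints text that `git diff` then compares line by line. The hypothetical `pyast_textconv.py` below prints one indented AST dump per top-level statement, so formatting-only edits disappear from the diff and structural edits show up as changed AST lines. It assumes Python 3.9+ for `ast.dump(..., indent=...)`.

```python
import ast
import sys


def ast_text(source: str) -> str:
    """One indented AST dump per top-level statement, separated by blank lines."""
    try:
        tree = ast.parse(source)
    except (SyntaxError, ValueError):
        return source  # fall back to a plain text diff for unparsable files
    return "\n\n".join(ast.dump(stmt, indent=2) for stmt in tree.body) + "\n"


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical setup:  .gitattributes: *.py diff=pyast
    #                      git config diff.pyast.textconv "python3 pyast_textconv.py"
    with open(sys.argv[1], encoding="utf-8") as fh:
        sys.stdout.write(ast_text(fh.read()))
```

The repository still stores plain text; only what `git diff` compares changes.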
I don't see how this could ever work with evolving languages: different Git versions would produce different commits and read commits differently depending on the latest C++ standard. This could lead to version control bugs where different Git versions create different results from the same commit. That is horrible; version control needs to be 100% bug-free in that regard.
The only reasonable application would be to use a language AST parser to better identify relevant text diffs, but the commits still need to be stored as text.
This doesn't really make sense, because in order for those code changes to compile correctly, there must be a corresponding commit to the CI config that changes the compiler version or compiler switches for the new language version. The "semantic-diff-er" can also be driven by that commit so that it uses the correct language version.
`git` generally doesn't work with lines of text. Mostly it works with opaque file blobs and directory trees.
`git diff` and `git merge` work with lines of text by default, but they don't have to. You can supply your own `diff` and `merge` tools with the `difftool.*` and `mergetool.*` config options, try them out with the `git-difftool` and `git-mergetool` commands, and set the default tool with the `diff.tool` and `merge.tool` config options.
If someone wanted to create AST-based diff and merge tools for a given language, they could be plugged right into the existing `git` infrastructure and it would work with them absolutely fine.
This feature is useful in so many different places. I use it to diff small encrypted files in my repo - just add `gpg -d` as a diff configuration and now I can use git log, diff etc in a meaningful way with binary files.
I've heard of people using it with PDFs as well: a PDF-to-HTML converter gives you a good idea of what changed in the document.
What if generating a diff is nontrivial? Say you rename an identifier. That might be a single command in an IDE. A sufficiently high-level "diff" format could easily capture that intent. But working backwards from hundreds of touched lines across many files to deduce that single semantic edit is not trivial. Git assumes that arbitrary diffs can be deduced from "before" and "after" files, but this isn't the case - it may be that you'd rather generate the new file from the diff!
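A small sketch of how much reconstruction even the easy case takes, using Python's `ast`: the hypothetical `detect_rename` checks whether two versions of a file differ only by a consistent renaming of variables, arguments, functions, and classes. Anything else, such as an attribute rename or a rename combined with another edit, makes it give up.

```python
import ast
from typing import Dict, Optional

_SLOTS = ((ast.Name, "id"), (ast.arg, "arg"), (ast.FunctionDef, "name"),
          (ast.AsyncFunctionDef, "name"), (ast.ClassDef, "name"))


def _identifier_slots(tree: ast.AST):
    """Yield (node, attribute) pairs that hold renameable identifiers, in walk order."""
    for node in ast.walk(tree):
        for node_type, attr in _SLOTS:
            if isinstance(node, node_type):
                yield node, attr


def detect_rename(old_src: str, new_src: str) -> Optional[Dict[str, str]]:
    """Return {old_name: new_name} if the sources differ only by consistent renames."""
    old_tree, new_tree = ast.parse(old_src), ast.parse(new_src)
    old_ids = [getattr(n, a) for n, a in _identifier_slots(old_tree)]
    new_ids = [getattr(n, a) for n, a in _identifier_slots(new_tree)]
    if len(old_ids) != len(new_ids):
        return None
    mapping: Dict[str, str] = {}
    for old, new in zip(old_ids, new_ids):
        if mapping.setdefault(old, new) != new:
            return None  # one old name would have to become two different names
    if len(set(mapping.values())) != len(mapping):
        return None  # two old names collapse into one: not a pure rename
    # Blank out every identifier; whatever remains must be identical.
    for tree in (old_tree, new_tree):
        for node, attr in _identifier_slots(tree):
            setattr(node, attr, "_")
    if ast.dump(old_tree) != ast.dump(new_tree):
        return None
    return {old: new for old, new in mapping.items() if old != new}


print(detect_rename("def f(x):\n    return x + 1\n",
                    "def f(value):\n    return value + 1\n"))  # {'x': 'value'}
```

A refactoring tool that recorded "rename `x` to `value`" would not need any of this guesswork.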
> If someone wanted to create AST-based diff and merge tools for a given language, they could be plugged right into the existing `git` infrastructure and it would work with them absolutely fine.
There's a lot of tooling in the Eclipse modelling ecosystem which could easily be used for this. Storing XML-based models in git is no problem, and there's tooling for diffing and merging models via a GUI or programmatically. Combined with the fact that Xtext DSLs use EMF models to represent ASTs, it wouldn't be too hard to glue together an AST-based diff/merge tool for an Xtext DSL.
> `git` generally doesn't work with lines of text. Mostly it works with opaque file blobs and directory trees.
I am not sure this is true.
In the past it gave me problems with line-ending normalization between Windows/Mac/Linux, in and out. In those cases it definitely had a lines-of-text view of things.
Our tool uses git as the foundation of its functionality. It superimposes git diffs on top of ASTs.
It is insanely powerful.
For example, we use it to power semantic code search, and we currently support Python, JavaScript, and Java. We generate a JSON object describing the AST differences between initial and terminal commits on GitHub PRs. A full text search on the JSON objects performs surprisingly well when we want to answer questions like, "When did we add dateutils as a dependency?" or "When did we last change the /journals handler on the API?"
The Python integration currently sees the most use but if you are interested in other languages, we would be happy to support it.
Do drop me a DM if you want help getting started with Locust.
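In the same spirit, and purely as a hypothetical sketch (not Locust's actual output format), here is one way to summarize two revisions of a Python file as JSON that a plain full-text search could query for questions like "when did this import appear?". It only looks at top-level imports and top-level definitions.

```python
import ast
import json
from typing import Dict, Set, Tuple

_DEFS = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def _summary(source: str) -> Tuple[Set[str], Dict[str, str]]:
    """Top-level imported modules, and a structural dump of each top-level definition."""
    imports: Set[str] = set()
    defs: Dict[str, str] = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.Import):
            imports.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            imports.add(node.module)
        elif isinstance(node, _DEFS):
            defs[node.name] = ast.dump(node)  # excludes line numbers by default
    return imports, defs


def change_record(before: str, after: str) -> str:
    """JSON describing added/removed imports and added/removed/changed definitions."""
    old_imports, old_defs = _summary(before)
    new_imports, new_defs = _summary(after)
    record = {
        "imports_added": sorted(new_imports - old_imports),
        "imports_removed": sorted(old_imports - new_imports),
        "defs_added": sorted(new_defs.keys() - old_defs.keys()),
        "defs_removed": sorted(old_defs.keys() - new_defs.keys()),
        "defs_changed": sorted(name for name in old_defs.keys() & new_defs.keys()
                               if old_defs[name] != new_defs[name]),
    }
    return json.dumps(record, sort_keys=True)
```

Because the definition dumps ignore line numbers, a function that merely moved down the file is not reported as changed.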
Whenever I do Clojure, something that can get difficult when working with multiple people is how the parentheses/brackets/braces stack up, especially when everyone seems to have different opinions on how that works. As a result, if you're not careful, when there's a merge conflict you can have a ton of extra parentheses, which can be irritating to debug.
Obviously this is at some level an issue inherent to Lisps (and to be clear, I love Lisps, and these small headaches are worth it), but I think problems like that could be reduced if our source controls were aware of the ASTs.
Git is designed to require human oversight. This is usually a feature, but in recent years has become a bug with things like GitOps.
It's important to remember that Git is a terrible database because of its lack of semantic structure. All conflicts require a human who has the context. This is why almost no one builds a system that uses Git as a two-way interface. And when they do, it's via GitHub Pull Requests (which go to humans) and not Git itself.
In all, this makes it a wonderful general purpose shared filesystem. And that's about it.
The output could be a lot more compact, it could do better at adding context (in the same way https://github.com/romgrk/nvim-treesitter-context does, etc), but if you're interested in this it's really within reach, go help out.
> The fact that git works on lines of text [...] we could be looking at the alterations to the abstract syntax tree.
Fundamentally git does not operate on text, it operates on files (a content-addressed SCM, not a ledger of text diffs); diffs are generated on request between arbitrary Merkle trees. So there is no need to implicate git in such a tool; it can be independent:
GIT_EXTERNAL_DIFF
When the environment variable GIT_EXTERNAL_DIFF is set, the program
named by it is called to generate diffs, and Git does not use its
builtin diff machinery. For a path that is added, removed, or
modified, GIT_EXTERNAL_DIFF is called with 7 parameters:
path old-file old-hex old-mode new-file new-hex new-mode
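A sketch of that hook point (not yet a real AST differ): a hypothetical `ext_diff.py` that accepts the seven arguments listed above and prints a unified diff with Python's `difflib`. The commented line is where a language-aware comparison would go. It assumes UTF-8 text files and relies on the missing side of an added or deleted path being readable as an empty file (such as `/dev/null`).

```python
import difflib
import sys
from typing import List


def external_diff(args: List[str]) -> str:
    """Render a diff for one path from GIT_EXTERNAL_DIFF's arguments."""
    if len(args) != 7:
        # Unmerged paths are reported with just the path; nothing to compare.
        return f"* unmerged: {args[0]}\n" if args else ""
    path, old_file, old_hex, _old_mode, new_file, new_hex, _new_mode = args

    def read(name: str) -> List[str]:
        with open(name, encoding="utf-8", errors="replace") as fh:
            return fh.readlines()

    # A language-aware differ would parse both versions here instead.
    lines = difflib.unified_diff(
        read(old_file), read(new_file),
        fromfile=f"a/{path} ({old_hex})", tofile=f"b/{path} ({new_hex})",
    )
    return "".join(lines)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: GIT_EXTERNAL_DIFF="python3 ext_diff.py" git diff
    sys.stdout.write(external_diff(sys.argv[1:]))
```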
There's a good blog post about auto-merging JSON/XML structured data files (for game content) on the bitsquid blog from 2010:
> having content conflicts is no fun either. A level designer wants to work in the level editor, not manage strange content conflicts in barely understandable XML-files. The level designer should never have to mess with WinMerging the engine's file formats.
> And conflicts shouldn't be necessary. Most content conflicts are not actual conflicts. It is not that often that two people have moved the exact same object or changed the exact same settings parameter. Rather, the conflicts occur because a line-based merge tool tries to merge hierarchical data (XML or JSON) and messes up the structure.
> In those rare cases when there is an actual conflict, the content people don't want to resolve it in WinMerge. If two level designers have moved the same object, we don't really help them address the issue by bringing up a dialog box with a ton of XML mumbo-jumbo. Instead, it is much better to just pick one of the two locations and go ahead with merging the file. Then, the level designers can fix any problems that might have occurred in the level editor -- the right tool for the job.
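A minimal sketch of the structural merge described there, for JSON-like data already parsed into Python dicts: merge key by key against the common ancestor, and when both sides changed the same value, keep "ours" instead of writing conflict markers. `merge3` is a hypothetical helper; lists and other non-dict values are treated as opaque leaves.

```python
from typing import Any

_MISSING = object()  # marks a key that is absent on one side


def merge3(base: Any, ours: Any, theirs: Any) -> Any:
    """Three-way merge of JSON-like values; genuine conflicts resolve to `ours`."""
    if ours == theirs:
        return ours
    if ours == base:
        return theirs  # only their side changed (this also covers deletions)
    if theirs == base:
        return ours
    if isinstance(base, dict) and isinstance(ours, dict) and isinstance(theirs, dict):
        merged = {}
        # Preserve key order: our keys first, then keys only they or the base have.
        for key in dict.fromkeys([*ours, *theirs, *base]):
            value = merge3(base.get(key, _MISSING), ours.get(key, _MISSING),
                           theirs.get(key, _MISSING))
            if value is not _MISSING:
                merged[key] = value
        return merged
    return ours  # both sides changed the same leaf: pick one, fix it in the editor


base = {"tree": {"x": 1, "y": 2}}
print(merge3(base, {"tree": {"x": 5, "y": 2}}, {"tree": {"x": 1, "y": 7}}))
```

Two designers moving the same object along different axes merge cleanly; only a change to the exact same value falls back to picking one side.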
I don't understand why GitHub hasn't solved the issue of diffs starting with a '}' (or ')' or 'end'). Just slide the diff over while it starts with a closing token! I suppose it's an artifact of the diffing algorithm, but aren't there better diffing algorithms, even built-in within git?
This is by far the most obvious example of "git doesn't understand programming languages", but it also seems like the most straightforward to fix.
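The "slide the hunk" idea is easy to state for a pure insertion: an inserted block can move one line down whenever its first line equals the line right after it, because both placements describe the same edit. A sketch, assuming the hunk is given as a half-open range into the new file's lines and using a hypothetical `CLOSERS` set. (Git itself has related options such as `--indent-heuristic` and the `--patience`/`--histogram` algorithms, which may already improve some of these cases.)

```python
from typing import List, Tuple

CLOSERS = {"}", ")", "]", "end"}  # hypothetical: lines that only close a block


def slide_down(lines: List[str], start: int, end: int) -> Tuple[int, int]:
    """Move an inserted hunk lines[start:end] down while it begins with a closer.

    Shifting by one is only valid when lines[start] == lines[end]: the text
    outside the hunk stays the same, so the diff still describes the same change.
    """
    while (start < end and end < len(lines) and lines[start] == lines[end]
           and lines[start].strip() in CLOSERS):
        start, end = start + 1, end + 1
    return start, end


new = ["void a() {", "}", "void b() {", "}"]
print(slide_down(new, 1, 3))  # (2, 4): the hunk now starts at "void b() {"
```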
I’ve done quite a lot of work on version management on structured data (in my case this was for a version managed GIS database) and it’s not an easy problem, and is likely even harder with something like an AST that is generated from a text file and so does not preserve the identity of nodes. I’m not saying that it’s impossible, but it is more work and requires more tooling around it than people think, and it keeps coming up here and other places as a, “really good idea.”
I'm trying to remember the citation, but I remember seeing a presentation once from someone who studied this and they said that the thing that worked best was a hybrid approach: use structured diff at the top level of the program (modules / methods) but use line-based for statements and expressions. According to them, the structured diff can give unintuitive results if applied at the lowest syntactic levels.
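A rough sketch of that hybrid for Python, assuming Python 3.8+ for `ast.get_source_segment`: match top-level functions and classes by name, report whole definitions as added or removed, and fall back to an ordinary line diff (`difflib`) inside definitions that exist on both sides. `hybrid_diff` is a hypothetical name; module-level statements that aren't definitions, and decorators, are ignored for brevity.

```python
import ast
import difflib
from typing import Dict, List

_DEFS = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def _top_level_defs(source: str) -> Dict[str, str]:
    """Map each top-level def/class name to its source text (without decorators)."""
    return {node.name: ast.get_source_segment(source, node)
            for node in ast.parse(source).body if isinstance(node, _DEFS)}


def hybrid_diff(old: str, new: str) -> List[str]:
    """Structured at the top level, line-based inside each definition."""
    before, after = _top_level_defs(old), _top_level_defs(new)
    out = [f"removed {name}" for name in sorted(before.keys() - after.keys())]
    out += [f"added {name}" for name in sorted(after.keys() - before.keys())]
    for name in sorted(before.keys() & after.keys()):
        if before[name] != after[name]:
            out.append(f"changed {name}:")
            out += difflib.unified_diff(before[name].splitlines(),
                                        after[name].splitlines(), lineterm="", n=0)
    return out
```

Since matching is by name, merely reordering definitions produces no output at all.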
I’d give anything just to get a few basic merge modes. For example “this file can treat two one line additions as unordered”.
So any shared append-only file (a change log, an enumeration,…) doesn’t automatically conflict.
Syntax aware diffing would be great too, but I’d take something much simpler. For syntax aware stuff I’d love something that could tell semantic changes from noise.
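For what it's worth, Git ships a built-in `union` merge driver (`merge=union` in .gitattributes) that keeps lines from both sides instead of writing conflict markers. A custom driver can be stricter: Git runs a command configured as `merge.<name>.driver` with placeholders such as `%O %A %B` (ancestor, ours, theirs), expects the result written to the `%A` file, and treats a non-zero exit status as a conflict. The hypothetical `append_merge.py` below only auto-merges when both sides purely appended lines to the common ancestor.

```python
import sys
from typing import List, Optional


def merge_appends(base: List[str], ours: List[str], theirs: List[str]) -> Optional[List[str]]:
    """Merge two versions that only appended to `base`; None if either did more."""
    n = len(base)
    if ours[:n] != base or theirs[:n] != base:
        return None  # something other than an append happened: a real conflict
    ours_added = ours[n:]
    # Their additions go after ours; a line that both sides added is kept once.
    theirs_added = [line for line in theirs[n:] if line not in ours_added]
    return base + ours_added + theirs_added


def _read(path: str) -> List[str]:
    with open(path, encoding="utf-8") as fh:
        return fh.readlines()


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical config: merge.appendonly.driver = python3 append_merge.py %O %A %B
    ancestor, current, other = sys.argv[1:]
    merged = merge_appends(_read(ancestor), _read(current), _read(other))
    if merged is None:
        sys.exit(1)  # leave %A untouched and let Git report a conflict
    with open(current, "w", encoding="utf-8") as fh:
        fh.writelines(merged)
```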
mumblemumble|4 years ago
saurik|4 years ago
This is already supported.
ffwacom|4 years ago
Are there some examples of this?
madmax96|4 years ago
ironmagma|4 years ago
afavour|4 years ago
williamdclt|4 years ago
enriquto|4 years ago
fstrthnscnd|4 years ago
Anon_troll|4 years ago
You can often see where the writer put the most effort and thought by just seeing how they wrote it. This can help analyzing a codebase considerably.
If everything is normalized, you lose those valuable cues.
thrwyoilarticle|4 years ago
pbiggar|4 years ago
[1] https://darklang.com
* I haven't actually enabled users to configure this, but it's just some variables called `indent` and `lineLength` in the code
geofft|4 years ago
BiteCode_dev|4 years ago
thefreeman|4 years ago
convolvatron|4 years ago
but admit it, tabs are fragile and a pretty weak implementation
mabbo|4 years ago
tux3|4 years ago
munk-a|4 years ago
hardwaregeek|4 years ago
mbauman|4 years ago
https://nbdime.readthedocs.io/en/latest/vcs.html#git-integra...
nerdponx|4 years ago
ssivark|4 years ago
Jensson|4 years ago
dboreham|4 years ago
shepherdjerred|4 years ago
pkghost|4 years ago
Karellen|4 years ago
bspammer|4 years ago
dTal|4 years ago
tyleo|4 years ago
colonwqbang|4 years ago
The problem seems to be that we are lacking the format and the toolchain to manipulate it, and that is not the fault of git.
What is the state of the art in this area? Does somebody know of a viable format and toolchain, or any interesting projects looking to build them?
indentit|4 years ago
[1]: https://news.ycombinator.com/item?id=27875333
kapep|4 years ago
kmeisthax|4 years ago
Merge drivers are Git's most powerful and least known feature, and I really wish they were more common.
rileymat2|4 years ago
zomglings|4 years ago
tombert|4 years ago
ClassAndBurn|4 years ago
cormacrelf|4 years ago
I wonder if you can use it for automerge yet.
tomxor|4 years ago
maweki|4 years ago
Not to mention changing ASTs (while maintaining concrete syntax) in different versions of the language.
cies|4 years ago
This is a nice contemporary one:
https://github.com/projectional-haskell/structured-haskell-m...
Lisps also have all kinds of options available in Emacs, but it is more special to see this outside of the land of s-expressions.
shoo|4 years ago
-- http://bitsquid.blogspot.com/2010/06/avoiding-content-locks-...
CodeIsTheEnd|4 years ago
aardvark179|4 years ago
ufo|4 years ago
alkonaut|4 years ago
auscompgeek|4 years ago