item 31620002

Ask HN: Why aren't code diagram generating tools more common?

132 points | lurker137 | 3 years ago

When I'm trying to get familiar with a new codebase it often takes me a long time to build a proper mental model of the whole system. Even with my own projects, it's easy to lose track of all the components and their interactions since they're constantly changing, and making hand-drawn diagrams is time consuming.

So my questions are:

- Why isn't diagram generation automated as part of the build process (UML or otherwise)?

- Why aren't code visualization tools more popular? The options out there seem outdated

- Would you want to use these tools? What would be your ideal tool?

Edit: looks like this is a duplicate question https://news.ycombinator.com/item?id=31569646

I can't delete it, so feel free to keep discussing.

108 comments

mattm | 3 years ago
When I took on a tech lead role, I began to better understand the importance of having diagrams available for key parts of the system. It becomes so much easier to explain things to people and get them up to speed. I often find myself referring back to the diagrams to quickly refresh my memory as well.

After trying various diagramming tools and dragging around boxes and lines, I settled on PlantUML which makes diagrams much much easier to create and modify. It cuts out a lot of the pain of diagramming with the mouse which means there is less resistance to creating diagrams and I do it more.
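Because PlantUML diagrams are plain text, they are as easy to generate as they are to write by hand. A minimal sketch, assuming the component dependencies are already available as a Python dict (the `to_plantuml` helper and the component names are hypothetical, not part of PlantUML):

```python
def to_plantuml(deps: dict[str, list[str]]) -> str:
    """Render a {component: [dependencies]} mapping as PlantUML component-diagram text."""
    lines = ["@startuml"]
    # Declare every component once, including ones that only appear as dependencies.
    names = sorted(set(deps) | {d for targets in deps.values() for d in targets})
    lines += [f"[{name}]" for name in names]
    # One arrow per dependency edge, sorted so the output is stable between runs.
    for src in sorted(deps):
        for dst in sorted(deps[src]):
            lines.append(f"[{src}] --> [{dst}]")
    lines.append("@enduml")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical components, for illustration only.
    print(to_plantuml({"web": ["auth", "billing"], "auth": ["db"], "billing": ["db"]}))
```

Sorting keeps the generated text stable, so diffs of the diagram source stay small if it is checked into version control.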

To your question, "Why isn't diagram generation automated as part of the build process": one thing I've found that would be difficult to solve is the level of detail you need in the diagram. For instance, in a very complex system with many decision branches, a diagram with every branch would not be helpful. There are cases where I want a high-level component overview but don't want to clutter up the diagram with lots of details. And yet there may be cases where I do want more detail, but maybe only in a certain section of the code. I think this judgement of detail tradeoffs would be the hardest problem for diagram generation tools to solve. You want enough detail to be useful, but not so much that it's overkill.
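One way to approach that detail tradeoff is to make the zoom level an explicit parameter. A rough sketch, assuming module-level dependency edges with dotted names are already available (the `collapse` helper and the module names are hypothetical): names are merged to their first `depth` segments, and edges that fall inside a single group disappear.

```python
from collections import defaultdict


def collapse(edges, depth):
    """Merge dotted module names to their first `depth` segments.

    Edges that end up inside a single group are dropped: at this zoom
    level they are internal detail.
    """
    def group(name):
        return ".".join(name.split(".")[:depth])

    collapsed = defaultdict(set)
    for src, dst in edges:
        a, b = group(src), group(dst)
        if a != b:
            collapsed[a].add(b)
    return {a: sorted(bs) for a, bs in sorted(collapsed.items())}


if __name__ == "__main__":
    # Hypothetical module graph, for illustration only.
    edges = [
        ("app.api.users", "app.db.models"),
        ("app.api.orders", "app.db.models"),
        ("app.api.users", "app.api.auth"),
        ("app.db.models", "lib.orm"),
    ]
    print(collapse(edges, 1))
    print(collapse(edges, 2))
```

`depth=1` shows only top-level packages and raising it zooms in. Choosing the right depth for each area of the code is still the human judgement call described above.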

parker01011001 | 3 years ago
I hear you on having detail to be useful and not an overkill. As echoed by others, I usually bookmark threads like these to find a tool that can help, I've found SourceTrail and CodeSee. Have you used either one?
HighlandSpring | 3 years ago
One of the problems is: what language do we use to describe these diagrams? We do have UML and its various variants (PlantUML, Mermaid), but these are too low level to prescribe conventions for describing complex architectures. A sequence diagram could describe anything from customer journeys and REST API call patterns to call stacks within a VM. The granularity, or level of abstraction, needs to be captured, or else you end up with square metres of boxes that nobody can parse at a glance unless they're Rain Man.

The closest I've found to solving this problem is https://c4model.com/, but you still need tooling to turn your code into this markup. Can it be reliably inferred from code alone, without framework-specific interpreters? I doubt it.

And then you still need a frontend to zoom and navigate the ridiculous amount of hierarchy found within any modern software architecture, e.g. microservices.

It also doesn't help that microservice patterns prescribe that you don't share repositories or code. So now you also need to pattern-match untyped references across these codebases.

This is a lot of convention and tooling that I'm not sure exists.

Edit: and this is before even getting into version control and reconciling the target->as-is iterative loop.

pjot | 3 years ago
I really enjoy(ed) using c4. Still need to figure out how to protect against screenshots though!
CipherThrowaway | 3 years ago
Boxes and arrows are a bad representation for complex systems with detailed relationships. Legible diagrams are limited to high level representations of a system where many details are left out. Constructing useful high level views of complex systems requires human judgement.

Generation of legible diagrams could be accomplished on a domain or framework basis where code is subject to local patterns and can be structured "for" generation. We see this with things like OpenAPI schema generation.

Ultimately I think diagramming isn't prioritized because diagrams themselves aren't that valuable. They're just a medium for the actually valuable thing: high level representations.

chrismatheson | 3 years ago
I wonder if one could feed back into the other, though?

I want this diagram from my code because it’s a simple way to understand the system, ok then you better refactor your code to match that simple abstraction…

cheunste | 3 years ago
> Why isn't diagram generation automated as part of the build process (UML or otherwise)?

I vaguely recall Visual Studio has this option where you can generate some sort of class diagram. It looked like shit the last time I used it (~2019) especially as your classes get more and more functions built into it. I also can't imagine how shitty it looks for codebases that have a significant coupling problem.

Furthermore, creating a UML diagram is a documentation process rather than something that should be automatically built in. I put it on the same level as writing a Word document, something that's done as the project gets closer to being finished. Some places can live with that; a lot of places (actual software companies) probably can't, since they move unreasonably fast (Agile) and don't allow time for documentation, or they just purposely neglect it.

> Why aren't code visualization tools more popular? The options out there seem outdated

Because they look like shit. I tried Mermaid with Markdown and was not happy with the results. I tried PlantUML back in 2019, hated how it ended up looking, hated that I had to install Java for it, and gave up on it pretty quickly.

The only code visualization tools I ever use are draw.io and MS Visio. At least there's a plugin for that in VS Code.

> Would you want to use these tools? What would be your ideal tool?

Markdown with a vim option. It must also have an option to force a top-down flow, rather than freaking forcing a left-to-right layout.
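Mermaid, for one, takes the layout direction as part of the `flowchart` keyword (`TD` for top-down, `LR` for left-to-right), and GitHub renders Mermaid blocks inside Markdown. A small sketch that emits such text from an edge list; `to_mermaid` and the node names are hypothetical:

```python
def to_mermaid(edges, direction="TD"):
    """Mermaid flowchart text; direction "TD" is top-down, "LR" is left-to-right."""
    ids = {}

    def node(name):
        # Map arbitrary names (which may contain dots or spaces) to simple IDs
        # n0, n1, ... and show the real name as the label. Names containing a
        # double quote would still need escaping.
        if name not in ids:
            ids[name] = f"n{len(ids)}"
        return f'{ids[name]}["{name}"]'

    lines = [f"flowchart {direction}"]
    lines += [f"    {node(src)} --> {node(dst)}" for src, dst in edges]
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_mermaid([("main", "parser"), ("parser", "lexer")]))
```

The output can be pasted into a `mermaid` code block in a Markdown file; passing `direction="LR"` switches back to a left-to-right layout.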

kevan | 3 years ago
>It looked like shit the last time I used it (~2019) especially as your classes get more and more functions built into it. I also can't imagine how shitty it looks for codebases that have a significant coupling problem.

That's the point, right? Visually representing the complexity of the system. I've used IntelliJ to do this before to show why modifying certain behavior was so slow and error-prone. In that case there were 3-4 classes with heavily overlapping functionality because, surprise, in the past there were multiple teams contributing to the same codebase that all did their own thing.

flohofwoe | 3 years ago
I think UML (etc...) was one of those things that look great on the surface but once you start diving deeper all the problems hidden under the surface become overwhelming. If a thing has been tried many times in the past and even with a lot of money thrown at it, yet it still disappeared into obscurity, then it's a pretty good sign that the idea wasn't great to begin with.

In practice it's the same problem as "noodle graph" visual programming. It works well in some niches (e.g. creating shaders in graphics programming, or sometimes describing AI tasks in game programming), but it completely breaks down outside those niches.

NonNefarious | 3 years ago
Plus, if it's as much work to create a diagram as it is to do much of the programming (not to mention maintaining it), you're just not going to do it.

One type of diagram I have found to be truly useful, though, is the sequence diagram. I needed to integrate someone else's library into my application, and having this was a huge help.

If anyone has a pointer to a good sequence-diagram generator (that runs on Mac, preferably), I'd be happy to hear about it!

la3lma | 3 years ago
It's hard because programming is hard :-). I still believe UML is great, but the difficulty is to make the diagrams so precise that they convey crucial understanding, yet so abstract that they hide as much detail as possible.

That is nontrivial, and it is very hard to do well. But it is also the essential job necessary when designing software and then communicating the essence of that design.

My favourite tool, btw, is PlantUML. It lets you describe diagrams (class, sequence, deployment) with text/algebra. PlantUML works well up to the point where the diagrams become too complex for the layout algorithm to handle well.

I used to think of this as an annoyance, but now I think of it as a feature: It is a way for the universe to tell me that the model is becoming too complex. The layout algorithm serves as a proxy for everyone else that should parse the diagram, and if I can make the diagram better by simplifying, so be it.

Now, a human can do diagram layout better than PlantUML, so a human can easily concoct diagrams that are both more complex and better looking than PlantUML's, but it is my firm belief that this is usually not a good thing: more often than not, it means the message is lost in the complexity of the diagram.

Keep it simple!

Weidenwalker | 3 years ago
I've already mentioned this on the other thread (https://news.ycombinator.com/item?id=31569646), but my friend and I have been working on https://www.codeatlas.dev as a sideproject - it's a tool for creating pretty (2D!) visualisations of codebases, while providing additional insights via overlays (e.g. commit density, programming language or other results from static analysis like dead code/test coverage/etc.). For example here's the Kubernetes codebase visualised using codeatlas: https://www.codeatlas.dev/repo/kubernetes/kubernetes

At the moment, codeatlas is just the static gallery, but we're only a few weekends away from releasing a Github action that deploys this diagram on github pages for your own repos - if you're interested, feel free to watch this repo: https://github.com/codeatlasHQ/codebase-visualizer-action

OP, how close is this to what you had in mind in your question?

EDIT: fixed broken links :o

mthoms | 3 years ago
Just a heads up: Your links are broken. I think it's because you are using Reddit's syntax which HN doesn't support.
lurker137 | 3 years ago
I've since been convinced that what I had in mind initially (generating a bunch of static diagrams with each build) is not very useful. Your site comes closer to what I think would be the better solution, an interactive diagram, but at the level of classes/functions and their interactions instead of files/folders. Your project looks great for exploring a Github repository though.
porcoda | 3 years ago
I’ve had this need a few times. Just a couple weeks ago I needed to quickly understand the set of package dependencies within a codebase and wrote some scripts that extracted a report as well as a graphviz file. I’ve done that a few times over the years. The biggest obstacle to a general purpose tool usually is the compiler front end that is needed to correctly parse the code to get the entities and relations you need to visualize. Without that it’s hard to write a reliable tool for extracting the information, and if you care about multiple languages you need multiple front ends.
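A sketch of that kind of script for a Python codebase, using the standard `ast` module as the front end so nothing has to be parsed with regexes. It assumes `.py` files under one root directory, records the module named in each `from X import Y`, and skips relative imports; `imports_of`, `import_edges`, and `to_dot` are hypothetical helper names:

```python
import ast
from pathlib import Path


def imports_of(source: str) -> set[str]:
    """Absolute module names imported by one Python source file (relative imports skipped)."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            found.add(node.module)
    return found


def import_edges(root) -> list[tuple[str, str]]:
    """(importer, imported) pairs for every .py file below `root`."""
    root = Path(root)
    edges = set()
    for path in sorted(root.rglob("*.py")):
        parts = list(path.relative_to(root).with_suffix("").parts)
        if parts[-1] == "__init__":
            parts = parts[:-1]  # a package's __init__.py stands for the package itself
        src = ".".join(parts) or root.name
        edges.update((src, dst) for dst in imports_of(path.read_text(encoding="utf-8")))
    return sorted(edges)


def to_dot(edges) -> str:
    """Graphviz DOT text; rankdir=TB lays the graph out top to bottom."""
    lines = ["digraph deps {", "  rankdir=TB;"]
    lines += [f'  "{a}" -> "{b}";' for a, b in edges]
    lines.append("}")
    return "\n".join(lines)
```

Writing `to_dot(import_edges("src"))` to a file and running Graphviz's `dot -Tsvg` on it produces an image. Covering another language would mean swapping in that language's parser, which is exactly the front-end obstacle described here.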

People do want it (contrary to the common HN refrain of “well, I don’t want it so clearly nobody wants it”). We’ve had customers where I work specifically ask for these kinds of tools. They’re just harder to write than they seem, and not only for the parsing reason I mention above. For many codebases you see a giant ball of spaghetti if you look at the full graph, or the layout algorithm gives you something gigantic and hard to browse. That’s a deficiency in graph visualization tools: again, a hard problem with little good tooling out there.

I’d love to see more work in this area since there do exist people who see value in it, contrary to the skeptics.

snapdaddy | 3 years ago
I agree that dependency graphs are the way to go - they show the structure of the code better than anything else. And yes, the problem is that the graphs show all of the details, whether they are important for gaining a good, overall picture of the system or not.

What you need is the ability to filter the graph. Nrwl's Nx monorepo toolset has a pretty cool dependency graph feature built in. Here's a video of how they use it:

https://youtu.be/KTGKpoiLE0k?t=253
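A sketch of one such filter, independent of how Nx implements it: keep only the part of a dependency graph within a few hops of the node you care about. The `neighborhood` helper and the node names are hypothetical:

```python
from collections import deque


def neighborhood(edges, focus, hops):
    """Keep edges whose endpoints are both within `hops` steps of `focus`, ignoring direction."""
    adjacent = {}
    for a, b in edges:
        adjacent.setdefault(a, set()).add(b)
        adjacent.setdefault(b, set()).add(a)
    # Breadth-first search outward from the focus node, stopping at the hop limit.
    dist = {focus: 0}
    queue = deque([focus])
    while queue:
        node = queue.popleft()
        if dist[node] == hops:
            continue
        for nxt in adjacent.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return [(a, b) for a, b in edges if a in dist and b in dist]


if __name__ == "__main__":
    edges = [("ui", "api"), ("api", "db"), ("db", "backup"), ("ui", "theme")]
    print(neighborhood(edges, "api", 1))
```

Direction is ignored so that both what the focus node uses and what uses it stay visible; a directed variant would only follow outgoing edges.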

graphviz | 3 years ago
Where is the funding for this work? Customers want to solve end-to-end problems, now, so it's difficult to get support for deep work on a particular part of the visualization and analysis stack (for almost anything, not just code).
rgoulter | 3 years ago
Maybe it'd be neat... but I think sometimes "the map is not the territory" goes both ways: I probably want a diagram to be simpler than the actual system.

With a manually constructed diagram, I have leverage to handwave irrelevant details away.

Perhaps to compare with documentation: it's easy to automatically describe things like types, and maybe call graphs, but there's value in having prose that explains details about the interface which the program's types don't reveal. With diagrams that visualise a system, the significance (or incidental nature) of the relationships may be hard to pick out automatically.

nelgaard | 3 years ago
Yes, exactly. There are no tools that can build proper mental models.

Most of a system is either uninteresting or trivial. You need someone to tell you where the interesting part is.

sidlls | 3 years ago
Diagram generation is plagued by the same problems as the "Rational Rose" fantasy of automatic code generation from diagrams: trivial applications are trivial to diagram, and non-trivial ones defy it, as the complexity (dependencies tend to form dense, multiply connected graphs in these applications) quickly outstrips any straightforward mapping to a visual representation.

I wouldn't use these tools anyway, to be honest. They have some limited utility when constrained to small components/parts of an application (e.g., self-contained libraries), but for understanding systems as a whole there is too much to have effective reverse-engineering into a visualization (in my opinion).

charlieflowers | 3 years ago
I think it's kind of like the AI Winter: there was a period of time when the software industry really went down a stupid path with regard to diagram-based code generation. A lot of Kool-Aid was drunk over promises of making it possible for everyone to program.

But, of course, it turns out, someone still needs to understand and be able to debug all the nuances that makes complex logic systems complex, especially when they're cobbled together from many underlying systems.

The real goal should be to take good programmers and magnify what they can do. But since the industry bought so hard into the naive vision, the industry is behind where it should be on a smarter vision.

rjsw | 3 years ago
The OP is asking for diagram generation not code generation.
dahart | 3 years ago
Speculating based on both writing large systems from scratch, and joining a group where a large confusing system was being used...

Diagrams are sometimes unnecessary overhead early in a project. Sometimes I’ve used them and seen other people use them for initial design planning, especially if management needs to be involved or approve the plans & schedule. But by a year later, the design has grown and changed, and everyone on board is so familiar with the code, but also so pressed for time and feature delivery, that making diagrams doesn’t make sense: nobody involved at this point needs them. Two years later, when the code is getting complicated and slowing down, and you’re onboarding some new people, that’s when it might help to sketch the flow of code.

FWIW, sometimes a good profiling tool will show you, and let you explore, call stacks, call graphs, execution charts, etc. I often reach for a profiler when I’m new to a codebase. Flame charts are a fave of mine. You can find flame charts in Chrome’s debug tools, or in compiled-language profilers like VTune or Valgrind. Here’s a decent article on how to use them: https://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
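Flame graphs are built from aggregated stack samples. As a sketch of the input side, assuming you already have stack samples as lists of frame names (outermost frame first), this folds them into the `frame;frame;frame count` lines that the FlameGraph scripts from the linked article consume; `fold` is a hypothetical helper:

```python
from collections import Counter


def fold(samples):
    """Collapse stack samples (outermost frame first) into 'a;b;c count' lines."""
    counts = Counter(";".join(stack) for stack in samples)
    # Sorted output keeps identical profiles byte-for-byte identical.
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


if __name__ == "__main__":
    samples = [["main", "parse", "lex"], ["main", "parse", "lex"], ["main", "render"]]
    print("\n".join(fold(samples)))
```

The width of each box in the resulting graph is proportional to its count, which is why a handful of hot paths stand out even in an unfamiliar codebase.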

Another issue is that well designed code bases diagram themselves by their module structure, while diagrams for poorly designed code bases may not help understand them at all. When code has too many side effects, or things are poorly or misleadingly named, when class boundaries aren’t well defined or the code has a lot of spaghetti, diagrams might not really help.

IMO two things are worth doing: 1) get a mentor in any new codebase any time you can, and 2) start building your own arsenal of code diagramming tools, rather than wondering why or waiting for others to do it. Demonstrate the value of diagramming code to the people around you and see if you can get it to catch on.

lurker137 | 3 years ago
The thing with profiling tools is that they're more focused on the details than on the big picture at the system level. I'll definitely be using flame graphs more though, thanks for the tip. You're also absolutely right that I shouldn't wait for others to make the tools, but sometimes tools don't exist for a good reason that I wouldn't have realized otherwise.
rramadass | 3 years ago
CASE Tools with round-trip engineering need to make a comeback.

To answer your question, people do use various tools to extract class hierarchies, call graphs, cross-reference listings, etc. The other HN thread that you linked to contains some details. Lots of people do use them. You can easily add Doxygen/CFlow etc. to your makefiles to generate the diagrams during every build. The key thing is not to try to comprehend the entire system as a whole (all but impossible for large systems), but to localize your study to one module at a time. Once you have the different pieces mapped out, you can combine them by hand.
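Doxygen and cflow target C and C++. As a rough analogue for a single Python module, and in the spirit of studying one module at a time, this sketch builds a call graph between the module's own top-level functions. It ignores methods and attribute calls such as `obj.f()`; `call_graph` is a hypothetical helper:

```python
import ast


def call_graph(source: str) -> dict[str, list[str]]:
    """Map each top-level function to the module's own functions it calls by plain name."""
    functions = [
        node for node in ast.parse(source).body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    defined = {fn.name for fn in functions}
    graph = {}
    for fn in functions:
        called = {
            node.func.id
            for node in ast.walk(fn)
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
        }
        # Calls to builtins or imported names are dropped to keep the view local.
        graph[fn.name] = sorted(called & defined)
    return graph


if __name__ == "__main__":
    src = "def a():\n    b()\n    print(c())\n\ndef b():\n    pass\n\ndef c():\n    return b()\n"
    print(call_graph(src))
```

The per-module graphs can then be stitched together by hand, or fed into any of the text-based diagram formats mentioned in this thread.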

davidy123 | 3 years ago
Fully agree. A diagram is OK for describing a flow, but for complex code or systems round-tripping is a necessity; otherwise the diagram is, or quickly becomes, inaccurate and worse than useless. In the 90s I tried software called TogetherJ. It seemed to support round-tripping really well, so well that maintaining code was the same as maintaining diagrams, and that led to better quality in both, along with documentation and other benefits such as conveying higher-order concepts. I just did a search for TogetherJ and the only reference I could find was a 20-year-old forum mention. Weird. I suppose there must be high-end enterprise CASE software that still supports this approach, though I'm guessing most people abuse it, so it's no longer respected. I think these things happen in cycles: as we become able to 'orchestrate' larger systems with meta-descriptions of components, such tools become more valuable.
mtkd | 3 years ago
Any high-level diagram usually needs one of the architects to produce it manually, in the context of the audience and what aspect of the structure/logic they specifically need to know about.

In most complex systems, the part where the magic happens is likely impossible for a tool to identify, so it would get lost in the noise of the cruft around it. That holds even for monoliths using frameworks, and especially for anything distributed across microservices, and it's usually that aspect that is of most interest.

phailhaus | 3 years ago
> - Why isn't diagram generation automated as part of the build process (UML or otherwise)?

This is very hard. And since it's hard, it's not automated. And since it's not automated, it goes out of date very quickly. I think that's the fundamental issue: keeping around evergreen documentation is a lot of overhead. There is no connection between the code and the diagrams, so it's too easy to change the code and not realize that the diagram needs to be updated too.
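One way to tie diagrams to code is to regenerate the diagram source in CI and fail the build when it no longer matches the committed copy. A sketch of the comparison step using the standard `difflib` module; the helper name is hypothetical, and `generated` could come from any generator that emits one of the text formats mentioned in this thread:

```python
import difflib


def stale_diagram_diff(generated: str, committed: str) -> list[str]:
    """Unified diff from the committed diagram source to freshly generated text.

    An empty list means the committed diagram still matches the code.
    """
    return list(difflib.unified_diff(
        committed.splitlines(),
        generated.splitlines(),
        fromfile="committed",
        tofile="generated",
        lineterm="",
    ))


if __name__ == "__main__":
    diff = stale_diagram_diff("[web] --> [db]\n[web] --> [cache]", "[web] --> [db]")
    print("\n".join(diff) or "diagram is up to date")
```

A CI step could exit with a non-zero status whenever the list is non-empty, which forces whoever changed the code to regenerate the diagram and, ideally, look at it.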

Another thing is that it's really the most useful for new members. If you've been working on the infra for a while, you already know the structure and you don't need the diagram. So teams tend to just avoid the diagrams altogether.

parker01011001 | 3 years ago
Completely agree. What ends up happening is that when generating diagrams doesn't become a habit, it gets forgotten, and members end up spending more time understanding the entire codebase. I find it hard to believe that nobody is building a tool for this, though, so I expect we'll eventually have it.
photochemsyn | 3 years ago
One issue is that a general visualization tool might have a lot of problems jumping from a codebase in language X to a codebase in language Y (let alone a mixed codebase). MS seems to have this Code Map tool, but it looks like it's mainly for C# / VB, with some C++ support.

https://docs.microsoft.com/en-us/visualstudio/modeling/map-d...

In many cases it might really be faster and easier to just diagram things out with a pad of paper and a pencil compared to setting up a tool like this and getting all the parts working correctly without any bugs.

That said, a virtual reality 3D tool for visualizing codebase dependencies, internal structure, which parts call which other parts, internal exception handling, etc. would be pretty cool. Maybe it's an area where machine learning could help.

lurker137 | 3 years ago
That tool from Microsoft comes closest to what I think would be ideal, generated diagrams that aren't static and where you can include/exclude and move around components. It would definitely have to be language specific, but once things catch on IDEs implement language specific plugins soon after.
Kapura | 3 years ago
- Why isn't diagram generation automated as part of the build process (UML or otherwise)?

It's another thing that can break, another element that needs to be maintained. In my experience there are very few pieces of code that will be able to run indefinitely without ever being updated, fixed, or re-examined at some point. The cost of adding more processes is not one-time, and it can be difficult to figure out what the time bounds are.

- Why aren't code visualization tools more popular? The options out there seem outdated

People who are interested in the structure of code are typically engineers, capable of writing and reading the codebase of interest. A UML diagram may be a way to understand an element of the system, but things such as in-line comments in the codebase itself are often more instructive on structure and function.

- Would you want to use these tools? What would be your ideal tool?

When I was in high school, if I didn't want to read, say, Crime & Punishment, I could buy the CliffsNotes version and get a chapter-by-chapter summary of major characters, events, and literary techniques. In many ways, it contained all of the information of the book without the substance.

But importantly, it took significantly less time to read and fully process than the book, while being written in the same language. In code, it is already extremely easy to look through header files, or collapse every function in your IDE to get a high-level overview of what data and methods exist. You can then dive in immediately to anything you would like to understand better ("what does 'UpdateSignificanceValue' really mean") and there's no mental overhead in translating from an encoded diagram into whatever your mental model is. This is why I do not personally see value in code visualization, outside of notes I take that are relevant to a specific problem I am working on.
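That collapsed view can also be produced outside an IDE. A small sketch for Python source using the standard `ast` module, listing classes and functions with their positional parameter names only (nested functions, keyword-only parameters, and annotations are ignored); `outline` and the sample class are hypothetical:

```python
import ast


def outline(source: str) -> list[str]:
    """List classes and function signatures, like a collapsed editor view."""
    lines = []

    def visit(body, indent):
        for node in body:
            if isinstance(node, ast.ClassDef):
                lines.append(f"{indent}class {node.name}")
                visit(node.body, indent + "    ")  # methods are indented under their class
            elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                args = ", ".join(a.arg for a in node.args.args)  # positional parameters only
                lines.append(f"{indent}def {node.name}({args})")

    visit(ast.parse(source).body, "")
    return lines


if __name__ == "__main__":
    src = "class Scorer:\n    def UpdateSignificanceValue(self, v):\n        self.v = v\n\ndef main():\n    pass\n"
    print("\n".join(outline(src)))
```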

mariojv | 3 years ago
This isn't a tool for generating diagrams from actual code, but I have really enjoyed using PlantUML lately while putting together design or architecture proposals: https://plantuml.com

As someone who is not a very visual person at all, I found it really nice to use to make my design docs more comprehensible to visual learners. I've gotten good feedback about designs every time I've used the tool.

forinti | 3 years ago
Even a few very basic UML diagrams (Use Cases, Class, Sequence) can form a very effective introduction to a system.

I feel that people have just embraced Agile blindly and simply forgot about basic modelling.

Frost1x | 3 years ago
Embraced, or been forced into it? If you want to do long-term planning and design, good luck. Everyone wants to continuously change their mind about what they want or need and have the software react, yet they also somehow assume this approach creates well-defined systems, when it does quite the opposite. Adaptation can still create reliable and well-defined systems, but the rate of adaptation needs to be reasonable. In agile that simply isn't the case; it's just a way to pass consumer demand, and responsibility for meeting that demand, right down to developers while placing arbitrary budget and time constraints around the process. Development teams often act like small businesses nowadays, with similar risks but fewer rewards, and with a middleman sitting between them and the consumer, unless you work at a large tech company, where you're still a little insulated from this (though not entirely, when product lines are killed off).

Ultimately, you just create an endless amount of complex work that keeps developers continuously busy. On the bright side, there's a never-ending amount of tedious work wrestling systems back into some manageable form; on the downside, that work is miserable, in my opinion because much of it could be avoided with proper planning. Expectations eventually meet reality no matter how many developers management burns through, and at some point it's clearly not an issue with technology but with approach and project management. By that time the organization has had enough turnover above and below those pushing agile that those issues, too, can be hand-waved away, and the cycle repeats.

hotcrossbunny | 3 years ago
Absolutely this! Communication of design is an enormous gap that has emerged in the last couple of decades.
rramadass | 3 years ago
>I feel that people have just embraced Agile blindly and simply forgot about basic modelling.

Very Good Point!

If you only look at things piecemeal and never holistically, the need for modeling and corresponding tools decreases.

mynegation | 3 years ago
At the beginning of my career I worked for a company that started with two reverse-engineering tools: one to produce low-level, single method/function flowcharts, and another for automatic extraction of high-level components and the connections between them. It retired the former, later the latter, and pivoted to static analysis tools: finding logical errors and security vulnerabilities, and enforcing coding standards. So I have first-hand knowledge of what worked and what did not.

The main problem with low-level code visualization was that, in most cases, it did not add much over well-formatted code. As for the high-level architecture extraction tool, which is closer to what the question here asks about, many links on the diagram do not just involve header inclusion, module imports, method calls, etc., which are relatively easy to extract (not without their own challenges with virtual and indirect calls, though). Users wanted to see inter-process communication (sockets, queues, pipes, HTTP connections), and extracting those is an uphill battle, though we did introduce some of it (lots of custom, platform-specific code). Between this and knowing which connections are important and which are less so, automatically extracted diagrams were of limited value.

rramadass | 3 years ago
What were the two tools used?
prbs23 | 3 years ago
I totally agree that having visual diagrams of a code base can be super helpful, especially when getting familiar with the code, or onboarding a new engineer. However I don't think we see automated tools because generating a useful diagram from source code is not a solved problem.

Fundamentally I think that the useful kind of software or system diagrams are always abstractions of the actual code. Figuring out the correct abstraction for the intended purpose requires either experience or a lot of trial and error. It may be possible for very specific applications, but I kind of doubt there is an algorithm to generate the content for a useful system diagram from the raw code.

Then there is the problem of rendering and laying out the diagram automatically. We have Graphviz and Mermaid, and probably others I haven't heard of, and while these do an okay job, I've never found their layout algorithms to be particularly great.

Overall, I don't think anything is going to be as useful as a manually drawn diagram, made with a specific intent in mind.