Show HN: Visualize the entropy of a codebase with a 3D force-directed graph
180 points | gabimtme | 2 years ago | github.com
I work at a startup where business evolves really fast, and requirements change frequently, so it's easy to end up with big piles of code stacked together without a clear structure, especially with tight deadlines. I made dep-tree [1] to help us maintain a clean code architecture and a logical separation of concerns between parts of the application. It does this by (1) visualizing the source files and the dependencies between them using a 3D force-directed graph, and (2) enforcing some dependency rules that allow/forbid dependencies between different parts of the application.
The 3D force-directed graph visualization works like this:
- It takes an entrypoint to the codebase, usually the main executable file or a library's entrypoint (index.js, main.py, etc.).
- It recursively crawls import statements, gathering the other source files that are depended upon.
- It creates a directed graph out of that, where nodes are source files and edges are the dependencies between them.
- It renders this graph in the browser using a 3D force-directed layout, where attraction/repulsion forces are applied to each node depending on which other nodes it is connected to.
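The crawl-and-graph steps above can be sketched in Python for Python sources. This is a minimal illustration under stated assumptions, not dep-tree's implementation (dep-tree is written in Go and supports several languages): source files are held in an in-memory dict, only absolute imports are followed, and a module name such as `app.db` is assumed to map to the file `app/db.py`. The names `crawl`, `imported_modules`, and `module_to_path` are hypothetical. It follows imports with a queue rather than recursion, which visits the same set of files.

```python
import ast
from collections import deque


def module_to_path(module: str) -> str:
    # Assumed mapping for illustration: "app.db" -> "app/db.py"
    return module.replace(".", "/") + ".py"


def imported_modules(source: str) -> list[str]:
    """Return the absolute module names imported by a Python source string."""
    modules = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            modules.append(node.module)  # relative imports (level > 0) are skipped
    return modules


def crawl(entrypoint: str, sources: dict[str, str]) -> dict[str, set[str]]:
    """Follow imports from the entrypoint; return {file: files it depends on}."""
    graph: dict[str, set[str]] = {}
    queue = deque([entrypoint])
    while queue:
        path = queue.popleft()
        if path in graph:
            continue  # already visited; this also guards against import cycles
        graph[path] = set()
        for module in imported_modules(sources[path]):
            target = module_to_path(module)
            if target in sources:  # unknown modules are treated as external
                graph[path].add(target)
                queue.append(target)
    return graph


sources = {
    "main.py": "import app.api\nimport os\n",
    "app/api.py": "from app.db import connect\n",
    "app/db.py": "def connect(): ...\n",
}
print(crawl("main.py", sources))
# {'main.py': {'app/api.py'}, 'app/api.py': {'app/db.py'}, 'app/db.py': set()}
```

Imports that don't map to a known source file (`os` here) are left out, which keeps the graph focused on the project's own files.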
With this, properly decoupled codebases will tend to form clusters of nodes, representing logical parts that live together and are clearly separated from other parts, while tightly coupled codebases will render without clear clusters or any clear structural pattern in the node placement.
Some examples of this visualization for well-known codebases are:
TypeScript: https://dep-tree-explorer.vercel.app/api?repo=https%3A%2F%2F...
React: https://dep-tree-explorer.vercel.app/api?repo=https%3A%2F%2F...
Svelte: https://dep-tree-explorer.vercel.app/api?repo=https%3A%2F%2F...
Langchain: https://dep-tree-explorer.vercel.app/api?repo=https%3A%2F%2F...
Numpy: https://dep-tree-explorer.vercel.app/api?repo=https%3A%2F%2F...
Deno: https://dep-tree-explorer.vercel.app/api?repo=https%3A%2F%2F...
The visualizations are cool, but they're just the first step. The dependency-rule checking is what makes the tool actually useful on a daily basis, and it's what keeps us using it every day in our CI pipelines to enforce decoupling. More info about this feature is available in the repo: https://github.com/gabotechs/dep-tree?tab=readme-ov-file#che.... The code is fully open source.
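To give a feel for what rule checking over a dependency graph involves, here is a hypothetical Python sketch. The `Rule` shape (a glob for the importing file plus allow/deny glob lists) and the `check` function are illustrative assumptions, not dep-tree's configuration format; the repo's README documents the real one. Every edge is tested against each rule whose source glob matches the importing file, and each violation is reported.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass
class Rule:
    source: str                                     # glob for the importing file
    allow: list[str] = field(default_factory=list)  # if non-empty, only these targets are allowed
    deny: list[str] = field(default_factory=list)   # targets that are always forbidden


def check(graph: dict[str, set[str]], rules: list[Rule]) -> list[str]:
    """Return one message per (edge, rule) pair that breaks the rule."""
    violations = []
    for src in sorted(graph):
        for dst in sorted(graph[src]):
            for rule in rules:
                if not fnmatch(src, rule.source):
                    continue  # rule doesn't apply to this importing file
                if any(fnmatch(dst, pattern) for pattern in rule.deny):
                    violations.append(f"{src} -> {dst}: denied by rule for {rule.source}")
                elif rule.allow and not any(fnmatch(dst, pattern) for pattern in rule.allow):
                    violations.append(f"{src} -> {dst}: not allowed by rule for {rule.source}")
    return violations


graph = {
    "src/models/user.py": {"src/services/email.py", "src/utils/time.py"},
    "src/services/email.py": {"src/models/user.py"},
}
rules = [Rule(source="src/models/*", deny=["src/services/*"])]
for message in check(graph, rules):
    print(message)
# src/models/user.py -> src/services/email.py: denied by rule for src/models/*
```

Note that `fnmatch`'s `*` also matches `/`, so `src/models/*` covers nested files too. A CI step built on something like this would exit non-zero whenever the returned list is non-empty.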
ytjohn | 2 years ago
From a visualization perspective, it reminds me a lot of Gource. Gource is a cool visualization of contributions to a repo: you see individual contributors buzzing around, updating files commit by commit and merge by merge.
https://github.com/acaudwell/Gource
gabimtme | 2 years ago
Golang is very challenging to support, because dependencies between files inside a package are not explicitly declared: you can use any function from any file without importing it, as long as both files belong to the same package. Supporting Golang would probably require spawning an LSP and resolving symbols.
I implemented dep-tree in Go because things were going to get algorithmic af, so it was better to choose a language as simple as possible that was also performant.
sam_bristow | 2 years ago
[1] https://erikbern.com/2016/12/05/the-half-life-of-code.html
gabimtme | 2 years ago
Weidenwalker | 2 years ago
Always interesting to see different ways of visualising the same thing. A while ago my friend and I also made a codebase visualisation tool (https://www.codeatlas.dev/gallery), but instead of taking the graph route, we opted for Voronoi treemaps in 2D! It's a tradeoff between form and function for sure; modelling code as a DAG is definitely more powerful for static analysis. However, in most graph-based visualisations (this, Gource) I find myself getting lost super quickly, because the shapes are not very recognisable.
Really impressed by how polished this already is: nice docs, on-the-fly rendering. Congrats!
If I ever find time to work on codebase visualisation again, I might have to steal the idea of codebase entropy to better decide which files to place close to which others!
Weidenwalker | 2 years ago
daxfohl | 2 years ago
Edit: oh, looking at the docs, apparently that's exactly what this tool does. It would be nice to have function-level granularity, though, maybe by annotating the code itself.
sam_bristow | 2 years ago
contravariant | 2 years ago
gabimtme | 2 years ago
a1o | 2 years ago
sideshowb | 2 years ago
gabimtme | 2 years ago
SushiHippie | 2 years ago
Let's say my project looks like this:
src/example/foo.py
src/example/bar.py
If bar.py contains the statement "from example.foo import Foo", there is no link between foo.py and bar.py. However, if the statement is "from .foo import Foo", it shows a link.
gabimtme | 2 years ago
export PYTHONPATH=src
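The difference between the two imports can be illustrated with a small resolver sketch: absolute imports are looked up under a list of root directories (analogous to entries on `sys.path`, which `PYTHONPATH` extends), while relative imports are resolved from the importing file's own package. The `resolve` function is hypothetical, handles only plain `module.py` files (no package `__init__.py`), and is not dep-tree's resolver.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve(importer: Path, module: str, level: int, roots: list[Path]) -> Optional[Path]:
    """Return the .py file an import refers to, or None if it can't be found.

    level is 0 for absolute imports and the number of leading dots otherwise,
    matching ast.ImportFrom.level.
    """
    if level > 0:
        # Relative import: start in the importer's package, go up one per extra dot.
        base = importer.parent
        for _ in range(level - 1):
            base = base.parent
        candidates = [base]
    else:
        # Absolute import: try each root in order, like entries on sys.path.
        candidates = roots
    for base in candidates:
        target = base.joinpath(*module.split(".")).with_suffix(".py")
        if target.is_file():
            return target
    return None


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    package = project / "src" / "example"
    package.mkdir(parents=True)
    (package / "foo.py").write_text("class Foo: ...\n")
    bar = package / "bar.py"
    bar.write_text("from example.foo import Foo\n")

    # Without src as a root, the absolute import can't be found: no edge.
    print(resolve(bar, "example.foo", 0, roots=[project]))
    # With src as a root (what PYTHONPATH=src provides), it resolves.
    print(resolve(bar, "example.foo", 0, roots=[project / "src"]))
    # The relative form resolves from bar.py's own package either way.
    print(resolve(bar, "foo", 1, roots=[project]))
```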
Already__Taken | 2 years ago
MilStdJunkie | 2 years ago
[1] AsciiDoc/reST (the include directive in both), XML (DITA/S1000D/DocBook/etc., each with its own transclusion mechanism), or any markup that supports transclusion.
palmfacehn | 2 years ago
Would love to see a tool that could automatically break these dependencies into optional features within their crate. It felt like a poor use of my time to track everything down manually.
TN1ck | 2 years ago
https://youtu.be/oyLBGkS5ICk?si=cawjnPnR9riEyvf2
sideshowb | 2 years ago
Out of interest, I'm wondering how this sort of method works if you ignore the semi-arbitrary distinction between your own code and other libraries. If, say, an array class is used everywhere, wouldn't that look like a bad pattern in the dependency graph? Or is there a way to read the graph that tells you your pervasive use of np.array is still appropriately decoupled?
gabimtme | 2 years ago
If a node is depended upon a lot, each of the resulting edges exerts a weaker force on the adjacent nodes. This accounts for the fact that some files will be depended upon a lot, and that's fine.
There's also the option to exclude those kinds of files from the analysis with the --exclude flag. I've found that useful for massive auto-generated files.
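A minimal sketch of both ideas, assuming the graph is a dict from each file to the files it imports. The thread doesn't state the exact weighting dep-tree uses; dividing each edge's weight by its target's number of importers is an assumption chosen only to show the effect. The hypothetical `exclude` function mimics the intent of `--exclude` with glob patterns and does not reproduce that flag's actual matching rules.

```python
from collections import Counter
from fnmatch import fnmatch


def exclude(graph: dict[str, set[str]], patterns: list[str]) -> dict[str, set[str]]:
    """Drop files matching any glob pattern, along with every edge touching them."""
    def excluded(path: str) -> bool:
        return any(fnmatch(path, pattern) for pattern in patterns)

    return {
        src: {dst for dst in targets if not excluded(dst)}
        for src, targets in graph.items()
        if not excluded(src)
    }


def edge_weights(graph: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Weight each edge by 1 / (number of files importing its target)."""
    importers = Counter(dst for targets in graph.values() for dst in targets)
    return {
        (src, dst): 1.0 / importers[dst]
        for src, targets in graph.items()
        for dst in targets
    }


graph = {
    "a.py": {"utils.py", "b.py"},
    "b.py": {"utils.py"},
    "c.py": {"utils.py", "gen/schema_pb2.py"},  # imports an auto-generated file
}
graph = exclude(graph, ["gen/*"])
weights = edge_weights(graph)
print(weights[("a.py", "b.py")])      # 1.0 (b.py has one importer)
print(weights[("c.py", "utils.py")])  # 0.3333333333333333 (utils.py has three importers)
```

In a layout, a weight like this would scale the attraction along each edge, so a ubiquitous helper such as an array class pulls only weakly on each of its importers instead of dragging the whole graph into one knot.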
christkv | 2 years ago
leetrout | 2 years ago
> I work at a startup where business evolves really fast, and requirements change frequently, so it's easy to end up with big piles of code stacked together without a clear structure, especially with tight deadlines
That smells.
It sounds like the team could benefit from a better tech stack and a bit more discipline in how it is applied to building solutions.
> Enforcing some dependency rules that allow/forbid dependencies between different parts of the application.
What is the alternative to this tool that lowers the cognitive barrier / builds the right muscles for the team to understand what they should / shouldn't depend on?
gabimtme | 2 years ago
In our specific case things are actually pretty good; we've built a lot of discipline around maintainability. But in general this is a recurring problem in tech teams that might not be able to afford the time it takes to build that discipline.
> What is the alternative to this tool that lowers the cognitive barrier / builds the right muscles for the team to understand what they should / shouldn't depend on?
Some ecosystems allow you to split the codebase into modular units (npm workspaces, Cargo workspaces, etc.), which forces developers to modularize things, since dependencies between modules need to be explicitly declared.
This is good, but usually not enough, as nothing prevents you from messing things up within a module/workspace.
There is other tooling with functionality similar to dep-tree, but it's language-specific and its visualizations aren't suitable for large codebases (.dot files, 2D SVGs...).
nyrikki | 2 years ago
This is why we see all these products targeted at companies with 24 microservices and 26 developers who have to run end-to-end tests on everything.
Architectural erosion is primarily a cultural issue and any tool that helps people discover and call out architectural violations is potentially useful.
Many companies can't just apply the inverse Conway maneuver, and if you look at the State of DevOps report, note how it calls out CAB (change advisory board) processes and controls as obstacles that keep even high-performing companies from becoming elite.
Take this product as an example: it really just means you want to keep k8s but have given up on loose coupling and high cohesion.
https://www.signadot.com/blog/how-uber-and-doordash-enable-d...
Throwing products at structure problems typically doesn't work.
crucialfelix | 2 years ago
Maintaining a code base requires communication, PR reviews and discipline. That doesn't always happen.
Having lint-check rules is brilliant. Never mind discipline: you just need a friendly error saying "don't import services into an ORM model file". I'm going to adopt this right away.
gjgtcbkj | 2 years ago
rikroots | 2 years ago
One piece of feedback, if I may. It's really difficult to read the blue labels against the black background. Is there any way to change the palette colors?
[1] https://dep-tree-explorer.vercel.app/api?repo=https%3A%2F%2F...
gabimtme | 2 years ago
That's definitely a point to improve. I calibrated things just by looking at my own screen, which might have a high saturation/brightness setting.
Thanks for the feedback!
_ZeD_ | 2 years ago
gabimtme | 2 years ago
airstrike | 2 years ago
___
0. https://nodejs.org/api/packages.html#subpath-patterns
gabimtme | 2 years ago
enoch2090 | 2 years ago
gabimtme | 2 years ago
matheusmoreira | 2 years ago
compacct27 | 2 years ago
The visualization here is amazing in its own right as well. Can I ask which part of the codebase renders it and handles the force-directed part?
gabimtme | 2 years ago
Force-directed drawing is a family of algorithms for displaying graphs in 2D or 3D space that simulate attraction/repulsion based on the connections between nodes. The Wikipedia page explains it really well: https://en.wikipedia.org/wiki/Force-directed_graph_drawing
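For readers curious about the mechanics, here is a minimal 2D force-directed layout in Python, loosely following the Fruchterman-Reingold scheme described on that Wikipedia page: every pair of nodes repels with force k²/d, every edge attracts its endpoints with force d²/k, and a cooling "temperature" caps how far a node can move per iteration. It is an illustration only; dep-tree renders in 3D in the browser, and its force model and parameters aren't described in the thread. The function names and default values are hypothetical.

```python
import math
import random


def layout(nodes: list[str], edges: list[tuple[str, str]], iterations: int = 200,
           k: float = 1.0, seed: int = 0) -> dict[str, tuple[float, float]]:
    """Return {node: (x, y)} after a simple force-directed simulation."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    temperature = 0.1  # maximum distance a node may move in one iteration

    def apply(disp, a, b, force):
        # force(d) > 0 pushes a and b apart; force(d) < 0 pulls them together.
        dx = pos[a][0] - pos[b][0]
        dy = pos[a][1] - pos[b][1]
        d = math.hypot(dx, dy) or 1e-9  # avoid division by zero for coincident nodes
        f = force(d)
        for node, sign in ((a, 1.0), (b, -1.0)):
            disp[node][0] += sign * dx / d * f
            disp[node][1] += sign * dy / d * f

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                apply(disp, a, b, lambda d: k * k / d)  # every pair repels
        for a, b in edges:
            apply(disp, a, b, lambda d: -d * d / k)  # connected nodes attract
        for n in nodes:
            length = math.hypot(*disp[n]) or 1e-9
            step = min(length, temperature)  # cap the move at the temperature
            pos[n][0] += disp[n][0] / length * step
            pos[n][1] += disp[n][1] / length * step
        temperature *= 0.98  # cool down so the layout settles
    return {n: (p[0], p[1]) for n, p in pos.items()}


def mean_distance(pos, pairs):
    return sum(math.dist(pos[a], pos[b]) for a, b in pairs) / len(pairs)


# Two tightly knit "modules" joined by a single cross-module import.
nodes = ["a1", "a2", "a3", "b1", "b2", "b3"]
intra = [("a1", "a2"), ("a2", "a3"), ("a1", "a3"), ("b1", "b2"), ("b2", "b3"), ("b1", "b3")]
pos = layout(nodes, intra + [("a1", "b1")])
inter = [(a, b) for a in nodes[:3] for b in nodes[3:]]
print(f"mean distance within modules: {mean_distance(pos, intra):.2f}")
print(f"mean distance across modules: {mean_distance(pos, inter):.2f}")
```

With two tightly connected groups joined by one cross-group edge, the groups should settle into two separate clusters, which is the effect the post relies on to make well-decoupled parts of a codebase stand out.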
> Love it, I think dependency trees are super underused data for static analysis.
Definitely, especially for evaluating "the big picture" of a codebase.
DenisM | 2 years ago
gabimtme | 2 years ago
jongjong | 2 years ago
React's graph looks like a mess. Why am I not surprised...
graphviz | 2 years ago
Graphs are wonderful abstractions for the structures that arise in many kinds of engineering, but you need to focus on understanding those abstractions, not just pictures rendered by heuristics. Visualization can be wonderful, but has its limitations, especially when used out of the box.