
Show HN: Visualizing a Codebase

283 points | wattenberger | 4 years ago | octo.github.com

I explored an alternative way to view codebases to the typical folder/file list, showing a bird's-eye-view of its structure.

https://octo.github.com/projects/repo-visualization

96 comments


wattenberger|4 years ago

We always look at our code in a file/folder list - I explored an alternative way to view codebases, showing a bird's-eye-view of its structure. This write-up walks through the motivations, ways to use the visualization, and potential future directions (there are many!).

There's also an interactive tool to check out your own repos and a GitHub Action if you want to integrate a diagram into a README.

hsn915|4 years ago

I honestly don't understand the point of this.

Is it just a visualization of the directory structure?

I expect code visualization to more or less ignore the file structure and focus on semantic analysis. For example, show the major components of the system and how they interact. Perhaps the major components are represented by some kind of a module, or a collection of modules. I don't have any concrete ideas. But I was expecting something in this vicinity.

pvillano|4 years ago

ah, but is there a generalizable way to do this? Every code base has files and directories.

hevalon|4 years ago

Personally I find it interesting, as tools like these have been helping me understand team and delivery dynamics when I'm joining new dev teams.

This particular one is great, as it reminds me of a book I read a while back: Your Code as a Crime Scene [1] by Adam Tornhill.

Adam explains something similar, but takes the whole concept to the next level by showing how tech debt and hidden coupling can be discovered using the git history and similar file structure visualisations.

[1] https://pragprog.com/titles/atcrime/your-code-as-a-crime-sce...
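
To make the "hidden coupling from git history" idea concrete, here is a minimal sketch that counts how often two files change in the same commit. It is not taken from the book or from any tool mentioned here; the function name `co_change_counts` and the sample history are hypothetical, and the input is assumed to be a list of file lists, one per commit.

```python
from collections import Counter
from itertools import combinations

def co_change_counts(commits):
    """Count how often each pair of files appears in the same commit."""
    pairs = Counter()
    for files in commits:
        # Sort so that (a, b) and (b, a) are counted as the same pair.
        for a, b in combinations(sorted(set(files)), 2):
            pairs[(a, b)] += 1
    return pairs

# Hypothetical history: each inner list is the files touched by one commit.
history = [
    ["api/user.py", "db/schema.sql"],
    ["api/user.py", "db/schema.sql", "README.md"],
    ["api/user.py", "tests/test_user.py"],
]
top = co_change_counts(history).most_common(1)
```

Pairs with a high count that live in unrelated folders are the candidates for hidden coupling; a real analysis would probably also normalise by how often each file changes on its own.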

tessierashpool|4 years ago

me too. I have a set of git history analysis scripts that I often use when I'm joining a new team.

they're here:

https://github.com/gilesbowkett/rewind

but they're a bit stale at the moment. one major weakness they have is that they work on a per-repo basis. no problem at all for monorepos, but for a company with a lot of repos, it'd make sense to use the GitHub GraphQL API to find out which repos see the most activity.

breck|4 years ago

This is so cool. I really appreciate how they added the "Search for a file" and "Excluded paths" to the demo. Makes it a lot more useful while still so simple to use.

Edit: the more I play with it the more I like it. Also just noticed their feature to deep link to repos (example: https://octo-repo-visualization.vercel.app/?repo=owid%2Fowid...). The future directions they mention also sound really exciting. Seeing files that cause a lot of CI failures, files by # of authors, files by # of changes, all that stuff would be really cool.
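
For the "files by # of authors, files by # of changes" direction, a rough sketch of how those numbers could be derived from git history. This is not how the tool does it; `file_stats` is a hypothetical helper, and it assumes the log text was produced with something like `git log --name-only --pretty=format:@%an`, so each commit starts with an "@author" line followed by the paths it touched.

```python
from collections import defaultdict

def file_stats(log_text):
    """Return {path: (change_count, author_count)} from git log text.

    Assumed input format: an "@author" line per commit, then one path per
    line. Paths that themselves start with "@" would be misread.
    """
    changes = defaultdict(int)
    authors = defaultdict(set)
    current = None
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank separator between commits
        if line.startswith("@"):
            current = line[1:]
        elif current is not None:
            changes[line] += 1
            authors[line].add(current)
    return {path: (changes[path], len(authors[path])) for path in changes}

# Hypothetical log excerpt in the assumed format.
sample = """@alice
src/app.js
README.md

@bob
src/app.js

@alice
src/app.js
"""
stats = file_stats(sample)
```

Either number could then be used as the circle size or colour in place of file size.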

rckrd|4 years ago

My favorite tool for visualizing a codebase is Gource. Here's a 1 minute visualization of the Linux Kernel repository from 1991-2015 https://twitter.com/mattrickard/status/1423366779590430721

onefuncman|4 years ago

I used Gource to visualize a hackathon. We had everyone start from a templated repo, so it was really easy to show the entirety of the participants' coding activity (albeit at a macro level).

taeric|4 years ago

I really really want to like these sorts of visualizations. But they just fall flat on me.

The "you can see really quickly..." text is scrolling by and I'm like... "Nope, that picture still means nothing to me." :(. It starts highlighting different parts and I'm completely at a loss on what is highlighted.

I do think this can be very effective once I'm trained on it. Such that I plan to play with it. But I just don't visually think of programs in anything close to this manner.

Anyone know of studies that explore how we think of our programs?

withinboredom|4 years ago

Anecdotally, I don’t visualize my code, however there is a sense of distance and depth between things (in the functional sense, not the lexical sense).

I feel like this visualization goes a long way to showing the distance I feel when working on the code, but that’s only because the visualization captures the lexical distance and we generally group functionally close things together.

akdas|4 years ago

I worked on something like this a few years ago, only in VR so you could walk around the visualization and use your spatial recognition abilities in 3D.

One part we struggled with was evolving the visualization with the codebase. I see in the demos at the bottom that small changes to the codebase can have a large impact on the visualization (unless I'm missing something), making it difficult to treat the visualization as a fingerprint over time. I wonder if there are plans to address this.

This is an area I'm very interested in, happy to chat about it any time.

breck|4 years ago

My big bet (10% confident it's correct and will be world changing), is that having languages that rely on spatial position of tokens in a 2D or 3D grid will be a big leap forward and make 3D visualizations quite natural.

I think 3-D visualizations of 1-D languages (all our current programming languages are 1-D) will not be so helpful, as you will be looking at transformations, not the actual code as it exists.

If anyone is intrigued and wants to write a function that takes as input a parsed Tree Notation program (https://jtree.treenotation.org/designer/) and outputs a https://www.mecabricks.com/ file, get in touch! I could even fund something like that, if needed. Such a function would then be able to generate a LEGO version of any program written in a Tree Language. From there, I think there could be interesting discoveries to be made related to future version control systems and collaborative editing algos (I think you could beat CRDT/OT/et al).

dale_glass|4 years ago

That sounds interesting, can you give more details about that?

I work on Vircadia (https://vircadia.com/) and have been thinking for a while that it would be cool to have in-world visualizations of things like the project's structure and github activity. It's a big one, so perhaps the right 3D representation would make the project's structure more understandable at a glance.

dvt|4 years ago

Apologies for the hot take, but imo GitHub has been really knocking it out of the park with terrible ideas lately (remember how everyone fell all over themselves during the Copilot release?). This is an absolutely worthless visualization that only impresses those that haven't heavily worked with visualizations. A few points right off the bat:

    - Labels are way too small, so you'll need to zoom in...
    - ...but if you zoom in, you'll need to pan...
    - ...and if you need to pan, you lose context
    - Hovering over "connected files" is just a jumbled mess

Case in point: look at the `paperjs/paper.js` example they themselves provide. There's a big circle called "packages" and inside that circle, two smaller circles that all contain the exact same files: "package.json," "index.js," and "README.md" -- how is this insightful in any way? I need to go to the repo to actually see that one of the folders is called "paper-jsdom" and the other one "paper-jsdom-canvas." The visualization literally confuses me more than just looking at the repo.

I don't mean to be overly negative, but it's just not a good visualization and no one will ever seriously use this.

boulos|4 years ago

This is a Show HN post. While you have valid criticisms (small labels, using filenames as labels produces lots of package.json, etc.), the way you shared it certainly violates the site guidelines (“Be kind”).

You knew you were being harsh and let your emotional response get to you. But, you should remind yourself that a person was on the other side of this post, and she cared enough to share it. Even if you feel the visualization is unacceptably bad, you should seek to find a way to provide constructive criticism. You’ve got the beginnings of actionable feedback, it’s just covered in invective language (though directed at the work not the person, so that’s something!).

spartanatreyu|4 years ago

I don't think it's useful in its current form (except perhaps for newcomers to a project to have an idea of the file structure at a glance).

But I also don't think they're trying to present this as a new killer feature they've been working on for years.

I'm pretty sure this is just an experiment/exploration done by a few people over a few months to see what they found, then they presented their results.

NikhilVerma|4 years ago

https://news.ycombinator.com/newsguidelines.html

> Be kind. Don't be snarky. Have curious conversation; don't cross-examine. Please don't fulminate. Please don't sneer, including at the rest of the community.

> When disagreeing, please reply to the argument instead of calling names. "That is idiotic; 1 + 1 is 2, not 3" can be shortened to "1 + 1 is 2, not 3."

> Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith.

TechBro8615|4 years ago

The guy literally says he timeboxed his exploration on this experiment.

I think it's pretty cool and would love to have the option to navigate repositories like this.

zumu|4 years ago

While I wholeheartedly agree, I'd argue there is value in having R&D teams tinkering around with different ideas. They may churn out duds, but theoretically they will produce something valuable eventually.

LeftHandPath|4 years ago

It's probably useless - but it would make for a killer powerpoint slide at your next meeting, wouldn't it?

pizza|4 years ago

I admire the people who work on pet projects like this one and share them with the world despite getting criticism that is just barely constructive. These are merely design issues, not things that stem from it being a terrible idea. It's a good idea that just needs a v2 design. There is no need to go on a diatribe about unrelated events; this was clearly just someone taking an idea they thought could be valuable and sharing it with others in the hopes they might gain value from it also.

I have needed quick visual fingerprints of repos for a very long time. Even just a crude layout of the file sizes was what I needed, but this provides even more. I just don't like the tone of this criticism, and nobody would ever do Show HNs ever again if "it's just not a good visualization and no one will ever seriously use this" was the standard for commentary.

onion2k|4 years ago

As a tool for exploring a repo it does have some flaws in navigating, but as a tool for comparing the complexity of two repos it looks very useful. It's immediately obvious where the depth, complexity and 'weight' of something lies. That's useful.

Plus, even if the result isn't perfect, the fact people are exploring alternatives to a tree structure is great, because trees suck for anything that's broad and deep, especially in a language you're new to that doesn't have familiar patterns.

gus_massa|4 years ago

I tried with https://github.com/racket/racket and for some reason it puts all the content of the subfolder "racket"/"src" in a vertical strip near the middle of the circle instead of spreading the parts evenly. How is each part arranged?

wattenberger|4 years ago

interesting! I'm seeing them grouped in the middle. It's a tricky layout, using d3.js's circle packing algorithm, then recursively using a force layout to relax each folder's contents.
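
Before any packing or force relaxation can run, each folder needs a size. A minimal sketch of that preliminary step, under the assumption that a folder's size is the sum of its files and that a circle's area (not its radius) should be proportional to size. It does not reproduce d3's packing or the force layout; `total_size`, `radius_for`, and the `repo` tree are hypothetical.

```python
import math

def total_size(node):
    """Sum file sizes bottom-up; a folder's size is the sum of its children."""
    if "children" not in node:
        return node["size"]
    return sum(total_size(child) for child in node["children"])

def radius_for(size, scale=1.0):
    """Radius of a circle whose area is proportional to size."""
    return scale * math.sqrt(size / math.pi)

# Hypothetical repo tree: leaves carry a size, folders carry children.
repo = {"name": "root", "children": [
    {"name": "README.md", "size": 100},
    {"name": "src", "children": [
        {"name": "index.js", "size": 300},
        {"name": "util.js", "size": 400},
    ]},
]}
root_size = total_size(repo)
```

Scaling by area rather than radius keeps a file that is four times larger from looking sixteen times larger.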

noproto|4 years ago

Took a look at my own codebase, which is 99% Rust. All gray, I'm guessing Rust isn't currently a recognized file type? Either way, very nice! I currently use the "dirtree" tool (https://github.com/emad-elsaid/dirtree) to generate diagrams like this of my codebase for documentation: https://github.com/WhiteBeamSec/WhiteBeam/wiki/Code-layout

The "eralchemy" tool (https://github.com/Alexis-benoist/eralchemy) is also excellent at visualizing SQL databases: https://github.com/WhiteBeamSec/WhiteBeam/wiki/SQL-schema

Zababa|4 years ago

.ml and .mli files are not recognized either. Still, it's a fun tool.

fire|4 years ago

Oddly enough, the page partially loads, hangs, crashes the tab and attempts to reload, hangs, and then crashes the entire running mobile chrome instance on my phone.

I don't think I've ever seen that before! I'm guessing the page is just memory heavy and android 11's memory manager can't figure out how to deal with it.

( chrome mobile, pixel 3 xl, android 11 )

alphabet9000|4 years ago

100% cpu for me, browser tab uses 3+ GB ram, chromium hangs then crashes

dddw|4 years ago

Same, crashed a rather recent phone as well

er4hn|4 years ago

Very fun! Would a similar visualization work for showing the insides of a go binary?

It would be super cool to have a way to visualize how different modules add bloat in size (and may pull in other bloaty modules as well)

juancampa|4 years ago

This is cool, but using rectangles instead of circles would help this visualization. Circles waste real estate and are not friendly to labels (e.g. curved text that is harder to read).

wattenberger|4 years ago

There are definitely many trade-offs with using a circle pack layout - I snuck a bit of the reasoning into the collapsed section halfway through the write-up! Overall, this layout worked best for me, with the nesting feeling very natural and the circles feeling very "atomic".

But big picture, this write-up isn't tied to the current visualization! It's more focused on exploring _how_ a visual representation could help our understanding of codebases. There are tons of jumping off points, including different vizes!

thiht|4 years ago

Agree on this. Rectangles can also be aligned and given a logical order. It’s probably less nice to the eye than circles, but more useful.
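
The rectangle alternative is usually called a treemap. A minimal sketch of the simplest variant (slice-and-dice, alternating the split direction at each depth), with children sorted by name to get the "logical order" mentioned above. The function names and the sample tree are hypothetical, and real treemap layouts typically use smarter splitting to avoid thin slivers.

```python
def size_of(node):
    """A folder's size is the sum of its children; a file carries its own."""
    if "children" in node:
        return sum(size_of(c) for c in node["children"])
    return node["size"]

def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Lay out a size tree as nested rectangles (name, x, y, w, h)."""
    if out is None:
        out = []
    out.append((node["name"], x, y, w, h))
    children = node.get("children", [])
    total = sum(size_of(c) for c in children)
    if total == 0:
        return out  # leaf, or a folder with nothing to draw
    offset = 0.0
    for child in sorted(children, key=lambda c: c["name"]):
        share = size_of(child) / total
        if depth % 2 == 0:  # even depth: split the width
            slice_and_dice(child, x + offset, y, w * share, h, depth + 1, out)
            offset += w * share
        else:               # odd depth: split the height
            slice_and_dice(child, x, y + offset, w, h * share, depth + 1, out)
            offset += h * share
    return out

# Hypothetical tree: one file and one folder of equal total size.
tree = {"name": "root", "children": [
    {"name": "a", "size": 2},
    {"name": "b", "children": [{"name": "c", "size": 1},
                               {"name": "d", "size": 1}]},
]}
rects = slice_and_dice(tree, 0, 0, 100, 40)
```

Every pixel of the root rectangle ends up assigned to some file, which is the space efficiency argument; the trade-off is that nesting is harder to read than with enclosing circles.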

graderjs|4 years ago

I found this useful on my project. I realized I have many 'dust' files in directories. Tiny little guys just like grains of sand nestling among the larger circles, looking to be useful. Beautiful structure and images! I love seeing my beautiful work in this beautiful format. It really brings out the beauty! :)

agucova|4 years ago

This is actually something I'd consider adding to all of my big projects. It really does help with discoverability.

gnrlst|4 years ago

Random nitpick: the issue with color coding files is that you may have many different file types leading to colors that overlap.

Case in point in the author's create-react-app example: in one of the scrolling "comment boxes", the author calls out that the "tasks/" folder is mainly CSS files which made me raise an eyebrow...why would a tasks folder be mainly CSS files? -- and upon closer inspection of the colored legend .sh files are a VERY similar green. Just to satisfy my curiosity I visited the repo and sure enough, it was just .sh files, without a single .css file.

It makes me doubt the experience of the author...how can a folder called tasks/ (in any repo) be .css files?
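
One way to reduce the near-identical-green problem is to space hues evenly across the file types that actually occur in a repo, instead of using a fixed palette. A minimal sketch using Python's standard `colorsys` module; `extension_colors` and the lightness/saturation values are arbitrary choices for illustration, not what the tool does.

```python
import colorsys

def extension_colors(extensions):
    """Map each extension to a hex colour with evenly spaced hues."""
    exts = sorted(set(extensions))
    colors = {}
    for i, ext in enumerate(exts):
        # Hue runs over [0, 1); lightness and saturation are fixed guesses.
        r, g, b = colorsys.hls_to_rgb(i / len(exts), 0.5, 0.65)
        colors[ext] = "#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255))
    return colors

palette = extension_colors([".css", ".sh", ".js", ".css"])
```

With many extensions the hues still crowd together, so this only postpones the problem; grouping rare types into "other" or adding a hover label would be needed beyond a dozen or so colours.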

devsatish|4 years ago

This is cool. I remember using “Understand for C++”, which does something like this: a full source code graph visualization, function flow, etc. This of course starts as a folder visualization, but I see the value in seeing the big picture.

KronisLV|4 years ago

I find that it's useful to look not just at the current contents of a codebase, but also at how it has evolved over time. For example, after being onboarded, this lets me see where most of the current effort on a codebase is concentrated and what the biggest recent changes have been.

For this, I believe that Gource is a lovely tool, which you can just point at a Git repository and it will visualize it: https://gource.io/

anigbrowl|4 years ago

Nice implementation. I especially like the curved directory titles.

hashhar|4 years ago

I like the quick insights I can gain from this! Very promising. It's very basic in its current implementation, but I see a lot of potential, especially around the "how files are linked" part.

It's a nice bird's eye view. One thing I'd like is for there to be multiple metrics to use for the size of packages e.g. lines of code, number of files, number of methods etc.

That way you can make sense of what are the heavyweight parts of the codebase.
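
A minimal sketch of the "multiple metrics" idea, assuming per-file line counts are already available (for example from a line-counting tool). `folder_metrics` is a hypothetical helper that rolls files up to their top-level folder; number of methods would need a language-aware parser and is left out.

```python
from collections import defaultdict
from pathlib import PurePosixPath

def folder_metrics(line_counts):
    """Aggregate {path: lines} into {top_level_folder: {files, lines}}."""
    metrics = defaultdict(lambda: {"files": 0, "lines": 0})
    for path, lines in line_counts.items():
        parts = PurePosixPath(path).parts
        # Files sitting directly in the repo root are grouped under ".".
        top = parts[0] if len(parts) > 1 else "."
        metrics[top]["files"] += 1
        metrics[top]["lines"] += lines
    return dict(metrics)

# Hypothetical input: repo-relative paths mapped to line counts.
summary = folder_metrics({
    "src/a.py": 120,
    "src/lib/b.py": 80,
    "README.md": 10,
})
```

Any of the resulting numbers could drive the circle size, so the same layout can answer "where is most of the code?" and "where are most of the files?".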

eyelidlessness|4 years ago

Weird seeing this as a Show HN. That said: since MS and GitHub are the same company… one of the things I really want is to be able to opt in to reference/search into dependencies. I don’t need visualization, I need “yes show me node_modules/*/*.js when normally I wouldn’t want that.” I use a VSCode extension that does this in the file browser, but I want it across everything that determines whether something is hidden.

banana_giraffe|4 years ago

Perhaps it's not fair, but the first repo I thought of trying, aws/aws-cli, caused it to freeze my browser's tab. When it finally unfroze, I was presented with a few large circles and way too many tiny dots to be useful.

Guess there's an upper limit on the size of the repo, or perhaps it's more geared to different "shapes" of layouts.

zkldi|4 years ago

Is this site critically slow in firefox for anyone else or just me? It's running at around 1fps...

slava_kiose|4 years ago

Thank you very much for the article! Everything is available, very useful information!

imagineerschool|4 years ago

Very interesting and illuminating!

Please continue on this adventure, you're onto something great!

cameronbrown|4 years ago

Is there anything like this, but for visualising the connections between git commits, files, GitHub issues, maybe even classes etc...?

Waterluvian|4 years ago

I got halfway down the page to a full screen graph and it was impossible to scroll further.

Can we just not do these cute UI gimmicks?

butwhywhyoh|4 years ago

How is this any different from just showing the fully expanded tree of the folder structure?

arvindrajnaidu|4 years ago

Now if we can just click on those circles and start coding.

agustif|4 years ago

I can't access it, HN hug of death?

wattenberger|4 years ago

Hmm, it should be loading! What OS/browser are you on?

nathias|4 years ago

I prefer tree.