Show HN: Instantly visualize any codebase as an interactive diagram
222 points | ahmedkhaleel | 1 year ago | gitdiagram.com
Given any public GitHub repository, it generates diagrams in Mermaid.js using Claude 3.5 Sonnet.
I extract information from the file tree and README for details and interactivity (you can click components to be taken to the relevant files and directories).
Also, you can replace "hub" with "diagram" in any repository URL to access its diagram
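A rough sketch of that URL swap in Python. The function name `to_diagram_url` is made up for illustration, and forcing the result to https is an assumption, not something the site documents:

```python
from urllib.parse import urlsplit, urlunsplit


def to_diagram_url(repo_url: str) -> str:
    """Swap the github.com host for gitdiagram.com, keeping the owner/repo path."""
    parts = urlsplit(repo_url)
    if parts.netloc.lower() not in ("github.com", "www.github.com"):
        raise ValueError(f"not a GitHub URL: {repo_url}")
    # Assumption: always emit https, so http and https inputs map to one URL.
    return urlunsplit(("https", "gitdiagram.com", parts.path, "", ""))


print(to_diagram_url("https://github.com/id-Software/Quake-III-Arena"))
```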
I created this because I wanted to contribute to open-source projects but quickly realized their codebases are too massive for me to dig through manually, so this helps me get started
I do still plan on adding other features like private repository access if that becomes a thing people want
This project was heavily inspired by https://gitingest.com/ so make sure to check that out as well!
Hopefully this tool can help you and feedback is always welcome!
corysama|1 year ago
https://gitdiagram.com/id-Software/Quake-III-Arena
https://imgur.com/a/gwoabtk
Clicking on any box takes you directly to either a file or a folder in the repo. AFAICT, the boxes, wires, groups, labels are all inferred by the AI.
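For anyone curious how clickable boxes can work in Mermaid, here is a minimal sketch using Mermaid's `click <node> href "<url>"` directive. This is not the tool's actual output: the helper `mermaid_with_links`, the `example/repo` repository, the component names, and the `main` branch are all hypothetical.

```python
def mermaid_with_links(components, edges, repo_url, branch="main"):
    """Build a Mermaid flowchart whose nodes link to files or folders in a repo.

    components: {node_id: (label, path, is_dir)}
    edges: [(source_id, target_id)]
    """
    lines = ["flowchart TD"]
    for node_id, (label, path, is_dir) in components.items():
        # GitHub serves folders under /tree/ and files under /blob/.
        kind = "tree" if is_dir else "blob"
        lines.append(f'    {node_id}["{label}"]')
        lines.append(f'    click {node_id} href "{repo_url}/{kind}/{branch}/{path}"')
    for src, dst in edges:
        lines.append(f"    {src} --> {dst}")
    return "\n".join(lines)


# Hypothetical components, for illustration only.
diagram = mermaid_with_links(
    components={
        "api": ("API layer", "src/api", True),
        "main": ("Entry point", "src/main.py", False),
    },
    edges=[("main", "api")],
    repo_url="https://github.com/example/repo",
)
print(diagram)
```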
ahmedkhaleel|1 year ago
ComputerGuru|1 year ago
Also might want to coalesce https and http
I'm not sure whether it queues jobs for processing, so that a refresh after a failure continues where it left off, or whether it starts over. The “progress bar” makes it hard to say.
Aside: I dislike the “modern progress bar” that’s just a scrolling marquee of pithy quips. One of the difficult problems I worked on for a SW project was adding sane progress to a multi-stage backup tool, so that the completed percentage and ETA correctly represented a mix of millions of single-KB files and random multi-GB files, backed up across multiple pipelines on multiple cores, asynchronously piping from one stage to the next with buffering. I needed to add a good progress metric without poisoning CPU core caches or hurting the efficiency of how work was being divided. This doesn’t seem as hard by comparison!
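A much-simplified, single-stage sketch of that kind of metric: each file is charged a fixed overhead plus its size, so millions of tiny files and a few huge ones both move the bar sensibly. The class name `WeightedProgress` and the 64 KiB per-file cost are assumptions for illustration, not the backup tool's actual design, and nothing here addresses the multi-core or cache concerns.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WeightedProgress:
    total_files: int
    total_bytes: int
    # Assumption: fixed per-file overhead, expressed in byte-equivalents.
    per_file_cost: int = 64 * 1024
    done_files: int = 0
    done_bytes: int = 0

    def record(self, size_bytes: int) -> None:
        """Mark one file of the given size as finished."""
        self.done_files += 1
        self.done_bytes += size_bytes

    def _work(self, files: int, size: int) -> int:
        # Work units = per-file overhead plus raw bytes.
        return files * self.per_file_cost + size

    def fraction(self) -> float:
        total = self._work(self.total_files, self.total_bytes)
        if total == 0:
            return 1.0
        return self._work(self.done_files, self.done_bytes) / total

    def eta_seconds(self, elapsed: float) -> Optional[float]:
        """Linear extrapolation from work done so far; None before any work is done."""
        done = self._work(self.done_files, self.done_bytes)
        if done == 0:
            return None
        remaining = self._work(self.total_files, self.total_bytes) - done
        return elapsed * remaining / done
```

In a real multi-stage pipeline each stage would likely need its own counters and weights; this only shows the weighting idea.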
Sorry for only having tangentially relevant things to report at this time; still waiting for it to finish with the fish-shell codebase so I can give some good feedback!
layer8|1 year ago
uncomplexity_|1 year ago
"Estimated cost: $8.07 USD" lol
Edit: "Repository is too large (>200k tokens) for analysis. Claude 3.5 Sonnet's max context length is 200k tokens. Current size: 1334798 tokens."
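A minimal sketch of a pre-flight size check like the one that message implies. The 200k limit comes from the error text; the 4-characters-per-token estimate is only a crude heuristic assumed here (a real tokenizer would give different counts), and the names `estimate_tokens` and `check_budget` are hypothetical.

```python
MAX_CONTEXT_TOKENS = 200_000  # limit quoted in the error message


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Crude size estimate; assumes roughly 4 characters per token."""
    return int(len(text) / chars_per_token)


def check_budget(token_count: int, limit: int = MAX_CONTEXT_TOKENS):
    """Return an error string if the input exceeds the limit, else None."""
    if token_count > limit:
        return (
            f"Repository is too large (>{limit // 1000}k tokens) for analysis. "
            f"Current size: {token_count} tokens."
        )
    return None


print(check_budget(1334798))
```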
mshockwave|1 year ago
I'd really like to see how well (or badly) it works on mega projects, because those are usually the ones where I need diagrams like this the most.
freakynit|1 year ago
""" Repository is too large (>200k tokens) for analysis. Claude 3.5 Sonnet's max context length is 200k tokens. Current size: 1334798 tokens. """
mazambazz|1 year ago
btown|1 year ago
https://github.com/ahmedkhaleel2004/gitdiagram/blob/main/bac... - the prompts in question. Don't sell yourself short as per the comments, they're very well designed prompts!
Using an inexpensive LLM to summarize each file might be an interesting next step, putting few-word summaries alongside the filenames in much the same setup you currently have! But, honestly, it may not be particularly necessary for large existing open-source projects that have already bikeshedded their file naming over many iterations, and/or have highly intentional structures for maintainability.
billyp-rva|1 year ago
[0] https://www.ilograph.com/blog/posts/diagrams-ai-can-and-cann...
mulmboy|1 year ago
Anecdotally, I've had great success with code-to-diagram via LLM, including fine details. But as with anything LLM, you need to really get the context right. This cannot be overemphasized. And iterate with the LLM, goodness.
lor_louis|1 year ago
Might be a bug, so here's the repo. https://github.com/lorlouis/cedit
ahmedkhaleel|1 year ago
antonpirker|1 year ago
I tried it with mine: https://gitdiagram.com/getsentry/sentry-python
A few things:
There are way more integrations in the integration layer, so maybe they should either all be shown, or a "..." somewhere should tell people that there are more.
The "Hub" is deprecated, so it would be cool if that fact were shown somewhere.
Otherwise really cool!
Animats|1 year ago
I put in a repository of mine that implements a UI in Rust, and it gave me a reasonable diagram. It's just a top-level structure of the program, though. No detail. Not much info about connections between components. The layout was kind of weird.[1]
Another one, from a fork I have of a rendering library.[2] It found the big parts, but provides little insight.
Here's a JPEG 2000 decoder. Even less insight.[3]
The progress messages are bogus. They have no relationship to what's going on: messages indicating progress appear even for a bad URL.
[1] https://gitdiagram.com/John-Nagle/ui-mock
[2] https://gitdiagram.com/John-Nagle/rend3-hp
[3] https://gitdiagram.com/John-Nagle/jpeg2000-decoder
NathanFlurry|1 year ago
For comparison:
- Hand made diagram: https://github.com/rivet-gg/rivet/blob/d45bf556e903404ab2df0...
- GitDiagram (no instructions): https://gitdiagram.com/rivet-gg/rivet
owenpalmer|1 year ago
karmakaze|1 year ago
> File tree and README combined exceeds token limit (50,000). Current size: 159829 tokens. This GitHub repository is too large for my wallet, but you can continue by providing your own Anthropic API key.
Without having an idea of the output it would produce, I can't tell if it's worth it. I'm not particularly interested in this test example so it's something I might try for an easy win, but probably tweak and maintain whatever it produces--or discard it and make something by hand. Showing something subjectively incorrect is good motivation.
ahmedkhaleel|1 year ago
thih9|1 year ago
Also, I tried this with https://github.com/rails/rails and it never finished.
dylan604|1 year ago
shahzaibmushtaq|1 year ago
ahmedkhaleel|1 year ago
initramfs|1 year ago
https://gitdiagram.com/EI2030/Low-power-E-Paper-OS
https://gitdiagram.com/hatonthecat/Solar-Kernel
https://gitdiagram.com/hatonthecat/OpenSourceCondo
https://gitdiagram.com/hatonthecat/Open-Source-Car
jesse__|1 year ago
ahmedkhaleel|1 year ago
jasfi|1 year ago
layer8|1 year ago
WillAdams|1 year ago
https://github.com/WillAdams/gcodepreview
which is probably the weirdest structure one could imagine (Literate Program as a .tex file containing Python and OpenSCAD code for https://pythonscad.org/ ) there the Python file is the core, there is an intermediate OpenSCAD file which wraps it, and then a top-level OpenSCAD file which the user interacts with.
blondin|1 year ago
ahmedkhaleel|1 year ago
ssivark|1 year ago
gloosx|1 year ago
I don't get this statement: tried the first repository with the following result
>Repository is too large (>200k tokens) for analysis
So it seems this is not suitable for codebases that are "too massive", because they are "too large". What should one do with those?
DoingIsLearning|1 year ago
Error code: 529 - {'type': 'error', 'error': {'type': 'overloaded_error', 'message': 'Overloaded'}}
I assume Anthropic is suffering...
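For transient overload errors like this, a client can retry with exponential backoff. A generic sketch follows; `OverloadedError` and `call_with_backoff` are hypothetical names, not part of any SDK, and the attempt count and delays are arbitrary assumptions.

```python
import random
import time


class OverloadedError(Exception):
    """Hypothetical stand-in for an 'overloaded' (HTTP 529) API error."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on OverloadedError with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except OverloadedError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            delay = base_delay * 2 ** attempt
            # Small random jitter so many clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

The `sleep` parameter exists only so the wait can be swapped out, for example in tests.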
nulld3v|1 year ago
Diagram could maybe have a bit more detail but what is there looks accurate! Really cool stuff OP!
whalesalad|1 year ago
ahmedkhaleel|1 year ago
visch|1 year ago
Minervaskell|1 year ago
Error message: Repository is too large (>200k tokens) for analysis. Claude 3.5 Sonnet's max context length is 200k tokens. Current size: 1334798 tokens.
Cool project though! Kudos!
chris_5f|1 year ago
ahmedkhaleel|1 year ago
fsndz|1 year ago
nhatcher|1 year ago
https://gitdiagram.com/ironcalc/IronCalc
I think the color coding for the legend is incorrect though.
Overall looks great, congratulations and thanks!
louthy|1 year ago
“Failed to generate diagram. Please try again later.”
[1] https://github.com/louthy/language-ext
abrookewood|1 year ago
diamondage|1 year ago
k0ns0l|1 year ago
Quick thought: Since you're tackling large codebases, maybe add some zoom controls?
mparnisari|1 year ago
ahmedkhaleel|1 year ago
phoenixreader|1 year ago
Jerrrry|1 year ago
Well done!
zackproser|1 year ago