Show HN: Auto Wiki – Turn your codebase into a Wiki

183 points | oshams | 2 years ago | wiki.mutable.ai

Hi HN! I’m Omar from Mutable.ai. We want to introduce Auto Wiki (https://wiki.mutable.ai/), which lets you generate a Wiki-style website to document your codebase. Citations link to code, with clickable references to each line of code being discussed. Here are some examples of popular projects:

React: https://wiki.mutable.ai/facebook/react

Ollama: https://wiki.mutable.ai/jmorganca/ollama

D3: https://wiki.mutable.ai/d3/d3

Terraform: https://wiki.mutable.ai/hashicorp/terraform

Bitcoin: https://wiki.mutable.ai/bitcoin/bitcoin

Mastodon: https://wiki.mutable.ai/mastodon/mastodon

Auto Wiki makes it easy to see at a high level what a codebase is doing and how the work is divided. In some cases we’ve identified entire obsolete sections of codebases by seeing a section for code that was no longer important. Auto Wiki relies on our citations system, which cuts back on hallucinations. Each citation links to a precise reference or definition, so the generated text is grounded in the code being cited rather than in free-form generation.

We’ve run Auto Wiki on the most popular 1,000 repos on GitHub. If you want us to generate a wiki of a public repo for you, just comment in this thread! The wikis take time to generate as we are still ramping up our capacity, but I’ll reply that we’ve launched the process and then come back with a link to your wiki when it’s ready.

For private repos, you can use our app (https://app.mutable.ai) to generate wikis. We also offer private deployments with our own model for enterprise customers; you can ping us at [email protected]. Anyone who already has access to a repo through GitHub will be able to view the wiki; only the person generating the wikis needs to pay to create them. Pricing starts at $4 and ramps up in $2 increments depending on how large your repo is.

In an upcoming version of Auto Wiki, we’ll include other sources of information relevant to your code and generate architectural diagrams.

Please check out Auto Wiki and let us know your thoughts! Thank you!

132 comments

[+] teraflop|2 years ago|reply
Cool concept. Right off the bat I see some big issues with the generated CPython documentation:

> This provides a register-based virtual machine that executes the bytecode through simple opcodes.

Python's VM is stack-based, not register-based.
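
One way to check the stack-based claim yourself is with the standard `dis` module. This is only a minimal sketch: the function `add` is a made-up example, and the exact opcode names differ between CPython versions, so the snippet prints them rather than assuming a particular listing.

```python
import dis


def add(a, b):
    # Hypothetical example function; any small function works.
    return a + b


# Collect the names of the bytecode instructions CPython compiled `add` into.
opnames = [ins.opname for ins in dis.get_instructions(add)]

# The operands are pushed by LOAD-style instructions, and the add and return
# instructions name no source or destination registers: they work on
# whatever is on top of the evaluation stack.
print(opnames)
```

On the versions I am aware of, the listing contains one or more `LOAD_*` instructions followed by an add instruction and `RETURN_VALUE`, which is the shape you would expect from a stack machine.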

> The tiered interpreter in …/ceval.c can compile bytecode sequences into "traces" of optimized microoperations.

No such functionality exists in CPython, as far as I know.

> The dispatch loop switches on opcodes, calling functions to manipulate the operand stack. It implements stack manipulation with macros.

No it doesn't. If you look at the bytecode interpreter, it's full of plain old statements like `stack_pointer += 1;`.

> The tiered interpreter is entered from a label. It compiles the bytecode sequence into a trace of "micro-operations" stored in the code object. These micro-ops are then executed in a tight loop in the trace for faster interpretation.

As mentioned above, this seems to be a complete hallucination.

> During initialization, …/pylifecycle.c performs several important steps: [...] It creates the main interpreter object and thread

No, the code in this file creates an internal thread state object, corresponding to the already-running thread that calls it.

> References: Python/clinic/import.c.h The module implements finding and loading modules from the file system and cached bytecode.

This is kinda sorta technically correct, but the description never mentions the crucial fact that most of this C code only exists to bootstrap and support the real import machinery, which is written in Python, not C. (Also, the listed source file is the wrong one: it just contains auto-generated function wrappers, not the actual implementations.)

> Core data structure modules like …/arraymodule.c provide efficient implementations of homogeneous multidimensional arrays

Python's built-in array module provides only one-dimensional arrays.
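
This is easy to confirm with a short sketch using only the standard library (the variable names are just for illustration): `array` accepts a flat sequence of values of one type and rejects nested lists.

```python
from array import array

# A typed, one-dimensional sequence of C ints.
flat = array("i", [1, 2, 3, 4])

try:
    # Each element must itself be an integer, so a list of lists is rejected.
    array("i", [[1, 2], [3, 4]])
    nested_ok = True
except TypeError:
    nested_ok = False

print(len(flat), nested_ok)  # prints "4 False"
```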

And so on.

[+] nerdponx|2 years ago|reply
Great example of plausible but completely incorrect outputs from an AI model that would go largely undetected by a non-expert human.
[+] oshams|2 years ago|reply
Thank you for this feedback. We actually have an Auto Wiki v2 in the works which is even higher quality, would be interesting to see how it changes when that comes out.
[+] hk__2|2 years ago|reply
That’s nice but the name is confusing: it’s not generating a wiki at all, but a documentation website with a Wikipedia-like theme. Wikis are collaborative websites; Wikipedia is only one of them.
[+] userbinator|2 years ago|reply
...and I thought it was a wiki about cars.
[+] oshams|2 years ago|reply
Apologies for the confusion, we are thinking of adding the ability to edit the wikis. What do you think?
[+] velox_neb|2 years ago|reply
Reading these wikis makes me feel we need to invent some visual convention to indicate AI-generated text. Like a particular color or font. This would make it so people don't feel cheated after they realize they just spent several minutes trying to make sense of something churned out by an LLM. (I mean this as a voluntary design enhancement for sites that want to be nice, of course people can always cheat.)
[+] charcircuit|2 years ago|reply
I think this would be better as a more general bot authored distinction.
[+] lucasban|2 years ago|reply
Color or font would not necessarily be accessible; a consistent icon or tag around it would likely be easier for screen readers or other low-vision situations.
[+] oshams|2 years ago|reply
Hi! Appreciate your comment. I personally think AI-generated content is the future. The reactions people are having to AI-generated content are very similar to the reactions to the printing press, whereby anyone could write anything and mass distribute it. I think people also had similar reactions to Google indexing the web. (Note: I'm not discounting existential risk; that is real, but another topic for another day.)
[+] 8organicbits|2 years ago|reply
I think this falls into a common mistake people make about documentation. Good documentation doesn't explain what the code does; it explains why the code is written the way it is, the constraints that caused this decision to be made and even alternatives not considered. You can't really guess those things by looking at code. I'm a fan of ADRs for that reason.

Honestly this looks overly verbose to me, a common LLM problem. The mistakes others cite are also pretty concerning.

https://adr.github.io/

[+] _a_a_a_|2 years ago|reply
Good point, completely agree and interesting link, thanks
[+] paxys|2 years ago|reply
The entire point of a Wiki is that it can be collaboratively edited. This is static documentation, just with a Wikipedia-like UI.
[+] oshams|2 years ago|reply
Would this be your top request? We're thinking of adding that functionality.
[+] dormento|2 years ago|reply
And it's wrong! It's not difficult to find whole paragraphs that were entirely made up. LLMs are not fit for this sort of thing.
[+] CGamesPlay|2 years ago|reply
I'd love to see the wiki generated for a less already-documented example. These high-profile projects are good demos and the results look compelling (I checked out AutoGPT's and NeoVim's), but these projects already have a ton of documentation that helps the model substantially. What are the smaller projects where it has to generate documentation from code (and not necessarily well-commented code) rather than existing documentation?
[+] TheEzEzz|2 years ago|reply
Super cool. When I think about accelerating teams while maintaining quality/culture, I think about the adage "if you want someone to do something, make it easy."

Maintaining great READMEs, documentation, onboarding docs, etc, is a lot of work. If Auto Wiki can make this substantially easier, then I think it could flip the calculus and make it much more common for teams to invest in these artifacts. Especially for the millions of internal, unloved repos that actually hold an org together.

[+] oshams|2 years ago|reply
Thank you! We like the analogy of dehydrating knowledge that can be used (hydrated) later. Beyond even unloved repos, we'd even argue broader organizational knowledge that seems to have been lost to history like Roman Concrete or how to precisely build the Saturn V could potentially be "stored" using AI.
[+] Amigo5862|2 years ago|reply
The only thing I see that this adds over existing docs-to-HTML tooling is that it uses a wikipedia-inspired theme.

Meanwhile on the negative side, it adds hallucinations. You say you "cut back" on them but as teraflop's comment shows, it still has plenty.

BTW: even the Mastodon link from your OP says "wiki not found" for me.

[+] eduardosalaz|2 years ago|reply
Does it parse Julia files? I am having trouble generating the wiki for a Julia repository. What surprised me was that it could parse and understand .tex files! Looks promising.
[+] oshams|2 years ago|reply
Hey! Yes, it should work. Is there a public repo in particular you'd like us to Auto Wiki? Please bear with us as we ramp up on capacity.
[+] oshams|2 years ago|reply
FYI, an update from us: we're moving our authentication system to wiki.mutable.ai so you can generate wikis for private repos without needing to go through app.mutable.ai.
[+] comex|2 years ago|reply
[Edit: Apparently I’m reviewing the wrong product; see replies.]

I tried the app version on one of my old repos. It’s a somewhat challenging test case because there are few comments and parts of the code are incomplete, though I’d say the naming convention is pretty good. The app suggested the question “What is the purpose of the ‘safemode-ui-hook.m’ file?” I accepted the suggestion, and the output was… completely wrong.

I’m not surprised it guessed the purpose wrong; even a human would need some context to understand what’s going on in that particular file, though of course the AI did worse by being confidently wrong rather than saying it didn’t know. But the AI also made specific claims that could be seen as wrong just by reading the file. It claimed the file “defines a SUBSTITUTE_safemodeUIHook C struct” when neither that struct name nor anything like it appears anywhere in the file. The name seems to just be mashed together from the repo name and file name.

Which makes me wonder, did the AI even see the content of the file? Is it pre-summarized somehow in a way that makes it know very little about the file? Or did the AI see it in full, but hallucinate anyway?

[+] oshams|2 years ago|reply
It sounds like you're referring to our chat product. We are aware of the limitations of that, this is why we created auto wiki! We plan to integrate the two in the future.
[+] sysread|2 years ago|reply
Why does it request write access to my repos via gh auth?
[+] oshams|2 years ago|reply
Great question. We originally had some functionality that could open pull requests from our app. What makes sense instead is to:

1. Move private wikis to wiki.mutable.ai (not app.mutable.ai).

2. Restrict permissions for the wiki GitHub app to read-only.

Hope that explains things, we just wanted to launch as early as possible to get all the wonderful feedback from the HN community so we can bake it into Auto Wiki v2.

[+] OmarShehata|2 years ago|reply
The Bitcoin and Mastodon links don't seem to be working! (wiki not found)

Would love to see this for Godot (https://github.com/godotengine/godot). Maybe Maplibre too (https://github.com/maplibre/maplibre-native)!

[+] oshams|2 years ago|reply
We're trying to work out why it doesn't load for a subset of people. We tested on all browsers/OS configs with 0.5% coverage. Please accept my apologies.

We are generating those two wikis now. Thanks for the request.

[+] Eiim|2 years ago|reply
I'll go ahead and put in a request for my own repo: Eiim/Chokistream

In the meantime, I have a different bit of feedback: the categories don't make much sense to me. I can't find a consistent theme in "Tooling", Bun isn't really a frontend library (although it has frontend components like a bundler), I don't know much about Urbit but it doesn't look like it belongs in "Crypto" (just a P2P network with a crypto-adjacent userbase), iptv-org/iptv doesn't seem to make sense in Education, etc.

Also, a number of the links in the Bun page (the ones not in monospace) are 404s. I don't see those types of links on other pages so maybe a bug that was fixed but not backported?

Edit: It'd also be nice if the search bar could just search for repo name instead of having to remember the associated GH user

[+] perpil|2 years ago|reply
For a wilder idea of what you can do with GitHub and wikis see https://speedrun.cc No AI, so the wikis need to be hand created, but being able to build tools and UIs right into your GitHub documentation is a powerful concept.
[+] elicksaur|2 years ago|reply
How are you going to handle this scenario:

- Person reads your auto wiki explanation of some part of a codebase.

- The explanation is incorrect.

- The person, believing your explanation as authoritative, complains to the developers of the codebase. Maybe opens an issue on an open-source project, posts on a discord or the like.

- The maintainers now have to deal with this misinformation adding overhead to their workload.

As someone who has helped others who were led astray by ChatGPT, this setup adds a ton of mental baggage to the person’s ask for help. They now have “But ChatGPT said…” to contradict the actually correct thing that you are trying to teach them.

[+] oshams|2 years ago|reply
Great question, here's our plan:

1. Allow users to give feedback on wikis and individual sections (we already have this).

2. Increase accuracy. We will continually push for this; Auto Wiki v2 will be significantly more accurate AND informative.

3. Allow GH repo owners to edit or modify content (seems that's common feedback on this launch).

4. General education and encouragement for people to read the wikis and click through to the actual citation to ground them.

[+] TachyonicBytes|2 years ago|reply
As long as this is happening, might as well try some of my favorites: https://github.com/wasm3/wasm3, https://github.com/WebAssembly/wabt, https://github.com/bytecodealliance/wasmtime
[+] TachyonicBytes|2 years ago|reply
Also, how does it "know" which parts are the important parts? Example, from the React repo, we have this:

    The key components of React's implementation include:

    The reconciler, implemented in …/react-reconciler, which contains the ReactFiberReconciler class and algorithms for recursively diffing virtual DOM trees and scheduling rendering work. The beginWork and completeWork phases drive the reconciliation process.

But the reconciler seems to be a recent, experimental package rather than a core one, so I wouldn't call it a key component.
[+] oshams|2 years ago|reply
So sorry, missed this, we're on it! Great choices btw. Wasm related stuff is a good test for auto wiki.