top | item 35297766

Autodoc: Toolkit for auto-generating codebase documentation using LLMs

166 points | funfunfunction | 3 years ago | github.com

86 comments

[+] SilverBirch|3 years ago|reply
I have a conceptual problem with this. Documentation is meant to describe stuff that's not in the code. Sure, there's the odd occasion where you've done some super weird optimization where you want to say "Over here I've translated all my numbers into base 5, bit reversed them and then added them; mathematically this is the same as just adding them in base 9, but it fits our custom logic cell better". But that's the exception. The general purpose of documentation is to describe why it's doing what it's doing, not how. Tell me that this module does X this way because that helps this other module do Y. Tell me why you've divided the problem this way. You're giving information about why certain design decisions were made, not just describing what they are.

It doesn't matter how good your LLM is; the information it needs to document simply isn't there for it to find. You're never going to get a comment out of this that says "This interface is meant to be backwards compatible with the interface Bob once wrote on a napkin in the pub on a particularly quiet Friday afternoon when he decided to reinvent Kafka".

[+] ithkuil|3 years ago|reply
Just give your LLM access to all your Slack chats and screenshots of everything found in the bins, and it would tell you what Bob had for breakfast too.
[+] gchamonlive|3 years ago|reply
I agree with you that documentation should expose developer intent that isn't encoded in the codebase, but if the documentation at least lowers the bar to understanding the code, I believe it could have its merits.

Autodoc won't substitute for the need for engineers to document their work, but I believe that especially in legacy codebases it could help with maintenance of an otherwise helpless codebase.

[+] ModernMech|3 years ago|reply
I've been using ChatGPT to write docs. Here's how I'll do it: I'll start feeding it specs and examples about my project. Then I will tell it the outline of the docs we are going to write. Then for each section, I tell it we're going to write that section, and provide it with the subsections. Finally, in the prompt, I fill in all the key details it needs to hit, which is where I would tell it "Make a note that the interface was meant to be backwards compatible...". I had already defined ahead of time that a "Note" is an element in the docs with a little emoji in front and special styling, so it even formats it nicely.
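The section-by-section workflow described above can be sketched as a small prompt builder. This is a hypothetical illustration of the approach, not ModernMech's actual setup; every name here is invented for the example:

```python
# Sketch of the section-by-section doc-writing workflow described above.
# All function and field names are illustrative, not from any real tool.

def build_section_prompt(section_title, subsections, key_details):
    """Assemble the per-section prompt: outline first, then the key
    details the model must hit (including any special 'Note' items
    whose styling was defined earlier in the conversation)."""
    lines = [f"We are now going to write the section '{section_title}'."]
    lines.append("It has the following subsections:")
    for sub in subsections:
        lines.append(f"- {sub}")
    lines.append("Make sure to cover these key details:")
    for detail in key_details:
        lines.append(f"- {detail}")
    return "\n".join(lines)

prompt = build_section_prompt(
    "Public API",
    ["Endpoints", "Authentication"],
    ["Note: the interface is meant to be backwards compatible with v1."],
)
```

Feeding one such prompt per section keeps each model call small and lets the author inject the "why" context (the backwards-compatibility note, the design rationale) that the code alone can't supply.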
[+] usefulcat|3 years ago|reply
I think it depends a lot on what exactly is meant by ‘documentation’. If I’m looking at the man page for strlen, what it does is almost always the first thing I need to know. I’d go so far as to say that knowing what something does is almost always a prerequisite for understanding anything to do with ‘why’.
[+] cush|3 years ago|reply
Yep. Comments are for why, not how. Docs, though, can be very dumb and still quite useful. "This is the class to access the Animals database. Use getAnimals to query for animals."
[+] divan|3 years ago|reply
Just tested it on a side-project codebase.

Main impression: it hallucinates like crazy. I asked "How does authorization of an HTTP request work?" and it started spitting out an explanation of how the user's bcrypt hash is stored in an SQLite database and the token is stored in a Redis cache. There are no signs of SQLite or Redis whatsoever in this project.

In another query it started confidently explaining how the `getTeam` and `createTeam` functions work. There is no such entity, nor even the word "team", in the entire codebase. To add insult to injury, it said that this whole team-handling logic is stored in `/assets/sbadmin2/scss/_mixins.scss`.

Another time it offered an extremely detailed explanation of a business-logic question, linking to a lot of existing files from the project, but it was completely off.

Sometimes it offered meaningful explanations but ignored the question. For example, I asked it to explain the relation between two entities and it started showing how to display one of them in an HTML template.

But I guess it's only a matter of time before tools like this become a daily assistant. Seems invaluable for newcomers to a codebase.

[+] scary-size|3 years ago|reply
Wrong documentation is even worse than no documentation. Without it you're at least forced to look at the code and validate your assumptions, and get a feel for the code base. Wrong docs pointing you to tech that's not even used are going to be a mess.
[+] jprete|3 years ago|reply
Alternatively, it's hyped up like crazy, the tech is inherently bad at information retrieval, and most of the people hyping it are trying to get in on a gold rush.

To be clear, I don't know the answer.

[+] funfunfunction|3 years ago|reply
Hey if your project is public I would love to take a look if you don’t mind sharing a link
[+] sour-taste|3 years ago|reply
Most interesting part to me, the prompts:

https://github.com/context-labs/autodoc/blob/83f03a3cee62d6e...

> You are acting as a code documentation expert for a project called ${projectName}. Below is the code from a file located at \`${filePath}\`. Write a detailed technical explanation of what this code does. Focus on the high-level purpose of the code and how it may be used in the larger project. Include code examples where appropriate. Keep you response between 100 and 300 words. DO NOT RETURN MORE THAN 300 WORDS. Output should be in markdown format. Do not say "this file is a part of the ${projectName} project". Do not just list the methods and classes in this file. Code: ${fileContents} Response:
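For reference, Python's `string.Template` happens to use the same `${name}` placeholder syntax as the quoted template, so the interpolation step can be mimicked like this (a sketch only; Autodoc itself does this with JavaScript template literals in TypeScript):

```python
from string import Template

# The quoted Autodoc prompt uses ${projectName}, ${filePath}, and
# ${fileContents} placeholders; string.Template shares that syntax,
# so we can reproduce the fill-in step. The template text below is an
# abbreviated excerpt of the prompt quoted above.
TEMPLATE = Template(
    "You are acting as a code documentation expert for a project called "
    "${projectName}. Below is the code from a file located at `${filePath}`. "
    "Write a detailed technical explanation of what this code does. "
    "Code: ${fileContents} Response:"
)

prompt = TEMPLATE.substitute(
    projectName="autodoc",
    filePath="src/index.ts",
    fileContents="export const hello = () => 'world';",
)
```

One prompt of this shape is generated per file, so the cost and output length scale with the number of files in the repository.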

[+] koboll|3 years ago|reply
This basically matches my experience in trying to get it to do the right thing. BEING VERY EXPLICIT AND ANGRY WORKS TO REINFORCE A POINT. Specifically telling it to not do a thing it will otherwise do is often necessary.

The only part that surprises me is `Output should be in markdown format`. Usually being that vague results in weird variation in output; I'd have expected a formatted example in the prompt for GPT to copy.

[+] mpalmer|3 years ago|reply
There's a subreddit where people post angry prescriptive memos or notices from their terrible bosses, and this would fit right in.
[+] hedora|3 years ago|reply
This has far surpassed my dystopian predictions of how people would misuse LLMs.

Self-spamming your own code base with comments that are either obvious, misleading, or wrong was previously unfathomable to me.

Most people think I’m unrealistically pessimistic.

Well done.

[+] jamesrom|3 years ago|reply
Code is the ultimate reference for understanding a project, while documentation is often neglected, outdated, or incorrect. It can also be difficult to keep up to date.

An LLM may not fully comprehend the code like the original author, but it can offer a different perspective that may be valuable. The only argument I've seen against LLMs is that they may encourage laziness, but this is a flawed argument, similar to those made against the printing press, which was said to make people illiterate.

As a reader of the docs: does it require discipline to refer back to the code when needed? Yes, but this is no different from the discipline required to write documentation in the first place. The key difference is that the discipline is shifted from author to reader.

[+] armchairhacker|3 years ago|reply
But you see, developers have been self-spamming their code with obvious, misleading, and/or wrong comments for decades. Especially with those pesky "everything must have a doc-comment" linters. Think of all the time ChatGPT will save doing it for them!
[+] layer8|3 years ago|reply
It can help to give an initial overview of a code base, its structure, and the functionality it contains (with the usual risk of inaccuracies and confabulations), but it can't supply the rationale and contextual information that good documentation provides, which can't be derived from just the code. It also can't distinguish between what is an implementation detail and what is part of an implicit interface contract.
[+] abhishekbasu|3 years ago|reply
Your pessimism is hardly unwarranted. This will lead to rather lousy code documentation, and might be misused by folks. But on the other hand, I look at this as something that will be effective as a "summarizer" of sorts, describing already-written code that has poor or no documentation.
[+] riku_iki|3 years ago|reply
He can now use GPT to generate code, comments, and code-review comments, and work three jobs simultaneously.
[+] bqmjjx0kac|3 years ago|reply
It would be chef’s kiss perfect if another LLM did the code review.
[+] andrewmcwatters|3 years ago|reply
I'm far, far more interested in having an LLM tell me where particular functionality is in a codebase, and how it works from a high level.

Autogenerating function documentation seems like such a low bar by comparison. It's like taking limited creativity and applying it with high powered tools.

Literally like asking for a faster horse.

Tell me how WebKit generates tiles for rasterizing a document tree. Show me specifically where it takes virtualized rendering commands and translates them into port specific graphics calls.

Show me the specific binary format and where it is written for Unreal Engine 5 .umaps so that I can understand the embedded information for working between different types of software or porting to different engines.

Some codebases are so large that it literally doesn't matter if individual functions are documented when you have to build a mental model of several layers of abstraction to understand how something works.

[+] funfunfunction|3 years ago|reply
Completely agree. Explaining how systems work in plain English is much more valuable than just giving the inputs and outputs of individual functions. We want to understand how a system and its subsystems work independently and interdependently.

We're not there yet with Autodoc; there is still tons of work to do.

If you haven't tried the demo, give it a shot. You might be surprised.

[+] IanCal|3 years ago|reply
Documentation can help the LLM search, though. Layering of tasks is important.
[+] golem14|3 years ago|reply
Today, LLMs learn from codebases that have mostly insightful comments from well-meaning humans.

In the future, the training sets will contain more and more automatically generated content that I believe will not be curated well, leading to a spiral of ever-declining quality.

[+] sour-taste|3 years ago|reply
One thing I find worse than no docs is wrong docs.

It would be really cool if we could take code + docs, feed it into an LLM and get a determination of whether the code matches what's in the docs. It could also be a good way to evaluate the correctness of the generated docs from the linked tool (assuming it works).
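Even without an LLM, a cheap first pass on that idea is to flag identifiers mentioned in the docs that never appear in the code, which would have caught the hallucinated `getTeam`/`createTeam` example upthread. A rough sketch, with purely illustrative names and data:

```python
import re

def phantom_references(docs: str, code: str) -> set:
    """Return identifiers referenced in backticks in the docs that never
    appear in the code -- a crude signal that the docs may be hallucinated.
    (Substring matching is deliberately naive; a real checker would parse
    the code and resolve symbols properly.)"""
    mentioned = set(re.findall(r"`([A-Za-z_][A-Za-z0-9_]*)`", docs))
    return {name for name in mentioned if name not in code}

# Hypothetical docs/code pair: the docs mention a function that doesn't exist.
docs = "Use `getAnimals` to query animals. Team logic lives in `createTeam`."
code = "def getAnimals(db):\n    return db.query('animals')\n"
missing = phantom_references(docs, code)
```

A check like this only catches outright fabrications; verifying that the prose *description* of real functions is accurate would still need the LLM-based comparison proposed above.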

[+] userbinator|3 years ago|reply
If docs can be generated, they're not worth reading.
[+] rkagerer|3 years ago|reply
Come talk to me when it can reverse engineer as well as Ken Shirriff, develop a complete understanding of the whole codebase, and generate authoritatively accurate and useful output. Oh, and uncover bugs while it's at it.
[+] layer8|3 years ago|reply
This still wouldn’t be enough, because writing good documentation usually requires knowing contextual information not contained in the code base.
[+] awestroke|3 years ago|reply
Oh, so it's only interesting once it has completely surpassed even the smartest human in intelligence? When that happens, why would anybody bring it to you? And why would humans be needed for anything?
[+] coolspot|3 years ago|reply
> Come talk to me when it can…

I am afraid no one will come talk to you when that happens.

[+] whiplash451|3 years ago|reply
New tech always needs early adopters. You do not have to be one of them, but do not dismiss them either.
[+] jeffypoo|3 years ago|reply
Are you really grandstanding against an AI model rn?
[+] zx8080|3 years ago|reply
How do you verify the meaning of the docs? How do you deal with model hallucinations?

It would be hell to lose trust in API docs due to those risks.

[+] docandrew|3 years ago|reply
The way I’m thinking of it is as a junior engineer who can go do busy work for me. I’m not going to accept their “PRs” without a review. Even if it gets me 75% of the way there, that’s still a big time savings for me.
[+] splatzone|3 years ago|reply
Very good point, but easily solved - just tag the docs as being generated by GPT-4, and make sure whoever reads them knows it.
[+] verdverm|3 years ago|reply
Please run this on the BuildKit code base, which has almost no comments but a huge usage footprint. It's also (obviously) a non-trivial test case.
[+] chatmasta|3 years ago|reply
It was funny how for years the only documentation of BuildKit was some .md file in the middle of that repo with all the magic incantations for running apt install with cached layers... to be fair, I actually found it clearer than the real Docker docs.
[+] funfunfunction|3 years ago|reply
Hi there, creator of autodoc here.

You can do it yourself and make a pull request back to the BuildKit codebase :)

If it gets merged, everyone who uses BuildKit would have access.

[+] smrtinsert|3 years ago|reply
I'm more interested in LLMs getting to the point where they can look at the several hundred codebases in your company and tell you who sets what value in your local data model, and why they set it. There's always Slack instead of poorly generated documentation.
[+] splatzone|3 years ago|reply
This is extremely interesting. I have a few monstrous repositories I’d like to try it on.

The thing I’m wondering about is the cost. How much would it cost to run this on the entire WordPress source, for example?

[+] funfunfunction|3 years ago|reply
How many pages? You can get an estimate for how much it would cost using the `estimate` command in Autodoc.
[+] whiplash451|3 years ago|reply
Very cool stuff.

I think people who dismiss this kind of tool because it can hallucinate are missing the point.

The AI will get better and better, but more importantly, we will evolve and learn to work with this kind of tool.

[+] ImageDeeply|3 years ago|reply
Granting that some hallucination is likely, still seems like a step in the right direction.