Literate programming is much more than just commenting code

173 points| increscent | 4 years ago |justinmeiners.github.io

107 comments

[+] falcolas|4 years ago|reply
My favorite literate program still has to be the book "Physically Based Rendering". An optimized, feature rich ray tracer in the form of a textbook.

That said, I wouldn't personally want to try and collaborate on such a program with more than one other person. It would make for a great single-contributor OSS library though. Rubber duck debugging built right into the prose.

[+] taeric|4 years ago|reply
https://smile.amazon.com/gp/product/1541259335 is also a great book. As is Stanford GraphBase.

My personal bet is that it is probably easier to collaborate on something like this than you would think. The imposed structure of programs, in general, already makes a lot of collaboration tough.

[+] eunoia|4 years ago|reply
Great book! It’s available online, for free at https://www.pbrt.org/.

You can also find older, physical editions on eBay for $10-$15.

[+] svat|4 years ago|reply
I would go further: literate programming is not just "much more than" commenting code, because you can do LP without commenting much. The main thing in LP is the idea/orientation of writing as if you're writing something for a human reader. This does often lead to more comments, but even something like "here's the code" followed by lots of code can be LP, if you deem it sufficient for your intended audience. (Earlier comment of mine about target audience and not over-commenting: https://news.ycombinator.com/item?id=29871047)

This works well for people who are writers by nature (like Knuth who's always making edits and improvements to his books https://news.ycombinator.com/item?id=30149221). One problem though (and there are several) is that because this is so personal, nearly everyone who seriously tries LP ends up writing their own LP tool (including the author of this post!).

[+] WorldMaker|4 years ago|reply
I'm somewhat hopeful the growing ubiquity of especially Jupyter notebooks leads to better, more universal tools for literate programming. Notebooks have always been a form of literate programming. Jupyter and its underlying formats are now ubiquitous enough with a lot of strong IDE support (across a variety of IDEs) that I'm hopeful a better convergence as a "general literate programming platform" from the notebook side may just be a matter of time. (Other than that a lot of strong LP proponents so far seem to mostly be oblivious to the happenings in Notebook spaces and vice versa, despite there being so much cross-over.)
[+] antirez|4 years ago|reply
I think likewise. When I had to write the radix tree implementation for Redis I faced two problems:

- I needed a stable implementation as soon as possible: I had a performance issue that needed to be solved by range queries.

- The radix tree was full of corner cases.

So I resorted to literate programming, which is in general very close to my usual programming style. You can find it in the rax.c file inside the Redis source code; as you can see, as the algorithm is enunciated, the corresponding code is implemented.

Other than that, I wrote a very extensive fuzzer for the implementation. Result: after the initial development I don't think it was ever hit by serious bugs, and now the implementation is very easy to modify if needed.

[+] BeetleB|4 years ago|reply
The problems one will run into with literate programming:

1. Lack of tooling.

2. Refactoring becomes nontrivial

3. How one would write a program in literate style will vary widely from person to person. If you write your code in literate style, it may be easy for you to follow it years later and modify it, but it likely will not be the case for a coworker. If they have to modify the code, the cognitive load will not be too different from that of just dealing with well written code.

Disclaimer: I've written two nontrivial programs literate style that I continue to rely on and occasionally modify years after writing them. It works as advertised.

[+] klibertp|4 years ago|reply
For point 3. - it's exactly the same with code that's not literate. Writing code is ultimately about expressing ideas using a language, which is really much closer to writing a novel than to drawing a plan for a bridge. As such, if you want to make your code understandable to others, you have to learn to write well. Just like with novels, there's no problem with having a personal style, or a specific flavor that comes from how you use the language, how you structure your sentences and paragraphs, how you guide a reader through the story.

In other words, the style varying between people is not a problem - bad writing is. And, unfortunately, in my experience very few programmers are capable of consciously producing good writing. The fact that most of the docs out there are barely-legible trash is proof of this.

I'm sure that reading literate code from Charles Stross would be a blast. It would be exciting, sometimes surprising, but still clear, easy to navigate, structured in a way allowing for extension within a well thought-out framework. Unfortunately, when people without his talent try to use LP, they produce things on par with that unfinished fantasy novel you started writing in 8th grade.

Programming requires a bit of talent, but you can get by with lots of hard work. Literate programming is much harder than that and requires a lot of talent to be beneficial to the codebase. Without that, your LP code will be Fifty Shades of Twilight, and honestly, we don't need more of things like that.

[+] WorldMaker|4 years ago|reply
I feel like problem 1 is on the cusp of solutions given the amount of money poured into tooling for "notebooks" like Jupyter. Notebooks are a form of literate programming. Projects written in Jupyter Notebooks are getting larger and scaling harder. I think a convergence should eventually happen that larger scale literate programming tasks can benefit immensely from the tooling investments in notebooks like Jupyter.
[+] throwaquestion5|4 years ago|reply
Problems 1 and 3 I could imagine. I would need to learn how to be a better writer to share a literate program.

As someone experienced in the topic, what's the biggest hurdle when trying to refactor the code?

[+] yumiris|4 years ago|reply
Literate programming has been particularly useful for my "dotfile" configurations, such as .emacs, .vimrc, .zshrc and even the .gitconfig file.

I use one .org file to declare all of my configurations, and tangle them together into the aforementioned files. This keeps things pretty portable, and makes up for the unintuitive readability of many dotfiles.

It can also work for rudimentary shell scripts and other single-file goodies; however, scaling it to proper multi-file programs proves to be difficult, especially when multiple developers are involved.

[+] syntaxfree|4 years ago|reply
This is a cool idea. Also, if you switch tools like WMs, you know what you used to have, even if it takes some work to reconstruct what that was. But I have such a tangle of glued-together and custom-written tiling WM rice that I can never switch to anything ever again.
[+] sritchie|4 years ago|reply
Literate programming is going to feel far more powerful when we expand the definition to include:

- Smalltalk-ish things like writing suites of custom viewers for various types,

- demos and examples in-line inside of a library

- multiple stories about the same piece of code, but all with the ability to IMPORT the story as a library

I've been writing sicmutils[0] as a "literate library"; see the automatic differentiation implementation as an example[1].

A talk I gave yesterday at ELS[2] demos a much more powerful host that uses Nextjournal's Clerk[3] to power physics animations, TeX rendering etc, but all derived from a piece of Clojure source that you can pull in as a library, ignoring all of these presentation effects.

Code should perform itself, and it would be great if when people thought "LP" they imagined the full range of media through which that performance could happen.

[0] sicmutils: https://github.com/sicmutils/sicmutils

[1] autodiff namespace: https://github.com/sicmutils/sicmutils/blob/main/src/sicmuti...

[2] Talk code: https://github.com/sritchie/programming-2022

[3] Clerk: https://github.com/nextjournal/clerk

[+] WorldMaker|4 years ago|reply
Don't forget to include "notebooks" in the expanded view of literate programming. The amount of code being written in Jupyter notebooks alone today in practice dwarfs much of literate programming in preceding years.
[+] QuikAccount|4 years ago|reply
I like literate programming in theory but the most common response I see to it is that writing self documenting code is better because as you are working on a code base with many people, it is unlikely they will keep your prose up to date as the code is changed.
[+] mannykannot|4 years ago|reply
> self documenting code is better because as you are working on a code base with many people, it is unlikely they will keep your prose up to date as the code is changed.

There is no reason to believe they are any more likely to keep code self-documenting (or to succeed even if they try) - it is not as if it will not compile or run unless it is.

I see literate programming to be an attempt to put some rigor into the otherwise terminally vague concept of self-documenting code (conceptually, it is way beyond the platitudes in 'clean code', even though it came first.) It is, however, doomed to failure in practice because it always takes less information (and less skill) to merely specify what a program will do than it does to not only specify what it will do but also explain and justify that as a correct and efficient solution to a problem that matters.

Neither 'literate' nor 'self-documenting' code is an objective concept.

[+] Koshkin|4 years ago|reply
Self-documenting code is fine - until someone starts wondering why code does what it does, or if someone wants to generate documentation. (No, the lazy style, "OpenFile - opens a file", does not cut it.)
[+] falcolas|4 years ago|reply
I think the one potential mitigating factor is that new features can be entirely new "chapters". Thanks to the tangling, a feature that needs to be added in 10 different places in the code can be written completely separately from the rest of the code.

Additionally, bugs can be fixed in-situ, refactoring can occur at will, and neither would require the prose around them to change, since code being talked about (despite moving or undergoing small changes) still fulfills the original, documented, purpose.
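
A rough sketch of the tangling behaviour described above, assuming noweb-style syntax (`<<name>>=` opens a chunk, `@` returns to prose, `<<name>>` references a chunk). The function names `parse_chunks` and `expand` are hypothetical, and this toy version has no cycle detection or error reporting.

```python
import re
from collections import defaultdict

DEFINE = re.compile(r"^<<(.+?)>>=\s*$")   # a line that opens a chunk definition
USE = re.compile(r"^(\s*)<<(.+?)>>\s*$")  # a line that references another chunk


def parse_chunks(source):
    """Collect chunk bodies; repeated definitions of a name are appended."""
    chunks = defaultdict(list)
    current = None
    for line in source.splitlines():
        m = DEFINE.match(line)
        if m:
            current = m.group(1)
        elif line.strip() == "@" or line.startswith("@ "):
            current = None  # back to prose, which is not emitted
        elif current is not None:
            chunks[current].append(line)
    return chunks


def expand(chunks, name, indent=""):
    """Recursively replace chunk references with their bodies."""
    out = []
    for line in chunks[name]:
        m = USE.match(line)
        if m:
            # Keep the indentation of the referencing line.
            out.extend(expand(chunks, m.group(2), indent + m.group(1)))
        else:
            out.append(indent + line)
    return out
```

Because a second `<<steps>>=` definition later in the document is appended to the first, a new "chapter" can contribute code to an existing chunk without touching the chapter that introduced it.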

[+] taeric|4 years ago|reply
The problem with self documenting code is that it doesn't help justify all of the parts into the whole. This is particularly troublesome in code where a refactor effectively isolated entire sections of the code, but the person that did the refactor didn't realize it, and now you have code that exists only for the sake of existing tests.
[+] doliveira|4 years ago|reply
How are newcomers handled in those "self-documenting codebases"?
[+] lf-non|4 years ago|reply
I am not a big fan of the complex literate programming style involving code-generation which this article talks about.

But I recently discovered that Google's zx [1] scripting utility supports executing scripts in markdown documents and I combined it with httpie [2] and usql [3] for a bit of quick and dirty automation testing and api verification code and it worked out pretty well.

I imagine for most people nowadays jupyter or vscode notebooks are the closest it comes to practical literate programming.

[1] https://github.com/google/zx#markdown-scripts

[2] https://github.com/httpie/httpie

[3] https://github.com/xo/usql

[+] andrewshadura|4 years ago|reply
ifupdown, the Debian tool to manage network interfaces, used to be written in literate C using noweb. When I took it over from the original author, I struggled to understand how it worked. I had to print out the weaved version of it, and read it making notes on the paper. I eventually managed to make sense of it, but making any change was very difficult, so I ended up converting it to plain C, adding some comments from the original literate source and reindenting.
[+] foxdeploy|4 years ago|reply
This talked about writing code for humans, then immediately jumped into some arcane mathematical scrawl like the stuff when Sephiroth casts Supernova
[+] vim-guru|4 years ago|reply
I've written a fair share of literate code.

It works well for personal stuff where you would like to leave some bits of information for yourself (typically, configuration files).

It works well for small libraries where good documentation is important.

It works well for visualisation work, where you may combine multiple languages and data formats without writing APIs for each.

In larger-scale apps, though, and with collaboration, you run into problems with tooling on multiple levels. I am working on tackling scale, but collaboration is tricky. Mostly because you need structure to collaborate, and then you will likely end up with an outline that's pretty close to a directory tree, and then you've lost one of the good bits of literate code, in my opinion.

[+] atweiden|4 years ago|reply
I’d like to see a literate programming version of GitHub where the community standardizes around an eminently-readable Markdown-like syntax. srcweave [1] looks like a great start.

[1]: https://github.com/justinmeiners/srcweave

[+] rektide|4 years ago|reply
Building a Habitable Computing Environment[1] was a recent brush I had with a "literate" computing project, this time less about programming specifically & about system setup/config.

I confess I'd rather forgotten what literate specifically meant beyond code comments describing the flow, but I did find it to be a remarkably comprehensive & understandable document, a prime example of how we might teach & understand computing. Even if it did leave me puzzling out what a number of the many many many scripts were for!

Certainly the overall project of computing needs a lot of help, ways to explain itself. I've seen tons and tons and tons of "dotfiles" projects, but none have gotten anywhere near as comprehensible as this literate programming project, from what I've seen.

[1] https://tess.oconnor.cx/config/hobercfg.html https://news.ycombinator.com/item?id=30748033 (19 points, 1d ago, 0 comments)

[+] mci|4 years ago|reply
IMHO, attempts to show literate programming on screen are doomed to meet with mediocre success. DEK invented literate programming with printed books in mind. I dare say that the only successful literate programs are books printed on paper.

First, in a printed book, it is easier to find a previous page and compare a fragment on it with the current fragment. Second, a printed book has no links tempting you with the words "CLICK ME" to disrupt the flow, so you can read it from cover to cover with fewer distractions. Third, anecdotally, I can see flaws much more easily on a printout than on screen, both in programs and in texts.

[+] kwhitefoot|4 years ago|reply
> in a printed book, it is easier to find a previous page and compare a fragment on it with the current fragment.

This is why I like plain text for everything (or Emacs Org Mode) because then I can have multiple frames showing different parts of the same buffer in Emacs.

[+] floodyberry-|4 years ago|reply
I always feel like I'm missing something about literate programming because every example is "godawful". Transitioning between text and context-less code fragments littered with markup is incredibly jarring, and instead of something "written for humans", you now have something that is neither text _nor_ code that you have to piece together and hold in your head as you go.
[+] nonrandomstring|4 years ago|reply
This is great stuff. It's how all code and data research should be presented, where the document is the program and you can reproduce it as easily as you can read it. After years of using Pure Data (a visual dataflow) whose unofficial motto was "The diagram is the program" I got this philosophy stuck deep in my brain. Today I use Org-Mode (In Emacs) for tangling (with something called Babel) that can run source code from many languages as part of an active document.
[+] michaelrpeskin|4 years ago|reply
I remember my first day of advanced computer graphics (25 years ago, so advanced then wasn’t advanced now) the whole class was shocked when the professor did his dramatic start of class saying “you should never document your code!”…dramatic pause…”you should code your document”.

I still think of that today when I’m writing complex algorithms. I write everything first in prose. Then I translate that to a more list-like structure. And then I fill in the code around that.

Works really well when I have to come back months later and figure out what I was thinking.
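
As an illustration of that prose, then list, then code workflow, here is a small sketch. The interval-merging problem is an assumed example chosen for brevity, not something from the comment: the docstring holds the prose, the numbered comments are the list, and the code is filled in around them.

```python
def merge_intervals(intervals):
    """Merge overlapping closed intervals.

    Prose first: sort the intervals by start, then sweep left to right,
    growing the current interval while the next one overlaps it.
    """
    # 1. Sort by start so that overlapping intervals are adjacent.
    ordered = sorted(intervals)
    merged = []
    for start, end in ordered:
        # 2. If this interval overlaps the last merged one, extend it.
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        # 3. Otherwise it starts a new merged interval.
        else:
            merged.append([start, end])
    return [tuple(pair) for pair in merged]
```

Months later, the docstring and the numbered comments still read as the original outline, with the code hanging off each step.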

[+] dwohnitmok|4 years ago|reply
Are there any large (> 5 people teams) projects written with literate programming?

Also are there any IDE plugins or error stack trace/debuggers for literate programming?

I haven't really paid attention to literate programming in a long long time and I'm curious if the field has advanced.

(Also I don't understand this: "A typical literate file produces many source files." Why? Why would you care about having multiple source files? Isn't the literate file the source at this point?)

[+] guitarbill|4 years ago|reply
I worked on a larger project where some of the code was "literate" programming. It was an absolute pain to modify anything. Debugging, not so much, since tangle produces raw source code. This you can work with. The problem is working with the original files.

Syntax highlighting? Good luck! But possibly you could work around this, e.g. via custom highlighting syntax. Same with any auto-complete, contextual IDE help, etc. Refactoring was painful.

Also, the text absolutely destroys being able to scan and reason about the control flow quickly. Especially bad when a dev decides something needs "a lot of documentation" and writes a small novel.

Needless to say, it was truly awful.

[+] flukus|4 years ago|reply
IMHO the two biggest problems I find with existing tools are that they assume the documentation is the source of truth and that the tangling will leave artifacts in code. Both of these make them poorly suited to the sort of projects that most of us work on. I'm not Knuth writing a dead-tree tome, but I would like some better ways to add documentation to existing projects and have to integrate with others.

I wrote a PoC of a tangling tool that worked through "virtual" files and had a syntax-aware handler. There's enough information that it could possibly work with language servers too, but sadly I haven't had time to take things further: https://gitlab.com/lusher/tanglemd

[+] k__|4 years ago|reply
"Also are there any IDE plugins or error stack trace/debuggers for literate programming?"

Good question!

I always think it would be nice to just write Markdown sprinkled with code, but without IDE/editor support, it's dead in the water :(

[+] goosedragons|4 years ago|reply
You might still want multiple tangled files to follow a familiar app structure, or for languages that need to be compiled or if you don't want users to have to add the extra step of tangling the source before they build your program.
[+] hzhou321|4 years ago|reply
What are the key differences between a human audience and a machine interpreter? It is not the language or prose. It is the structure and order. For machines, details come first. You declare all the actors and types, with all their unforgiving annotations, first. You may tuck the details into a header, but it still needs to be ordered according to the compiler and structured in a way that the machine gets the details first. On the other hand, for humans, it is top-down and context-oriented. The details are important, but only after we establish the right context.

So for literate programming, if you just think it is how you write the code (e.g. self-documenting or not), or you think it is the amount of commenting (e.g. doc string or not), if you are not first and constantly thinking about how to structure your code and establish context, you are not getting literate programming.

Now, once you understand your ends, the means (tangle or weave) will come along. It is easy to invent one if you don't have one. On the other hand, getting your coworkers to agree and work together, that's hard. It is easy to get machines to work together and it is easy for humans to cope.

[+] copperx|4 years ago|reply
It would be great if IDEs supported literate programming; the tangle/weave commands, simple as they are, create many possible points for navigation. An IDE would be ideal to go back and forth from the prose to the code.
[+] taeric|4 years ago|reply
You will be shocked to know that emacs and org-mode can do exactly this. You can tangle source, and go from the tangled source back to the section that generated that source.

If you are wanting to just do cweb, then the debugging symbols already let you step through the source line by line without having to look at the tangled source.
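
Jumping from tangled output back to the prose depends on the tangler remembering where each emitted line came from (CWEB's ctangle does this by emitting `#line` directives). A minimal sketch of that bookkeeping, assuming a Markdown-style document with fenced code; `tangle_with_map` is a hypothetical name and not part of any of the tools mentioned.

```python
FENCE = "`" * 3  # Markdown code-fence marker


def tangle_with_map(doc_text):
    """Extract fenced code and record the document line each line came from."""
    code, origin = [], []  # origin[i] is the 1-based document line of code[i]
    inside = False
    for lineno, line in enumerate(doc_text.splitlines(), start=1):
        if line.startswith(FENCE):
            inside = not inside  # fence lines toggle code/prose mode
        elif inside:
            code.append(line)
            origin.append(lineno)
    return code, origin
```

With `origin` in hand, an editor or debugger can translate "line 2 of the tangled file" into the matching line of the literate document.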

[+] ggm|4 years ago|reply
I have always felt a literate program is probably, for many of us, a future deliverable on the hack we've implemented up front.

Very very few people can start from the abstraction and get TO a literate outcome without a lot of false steps along the way.

Or, as an alternative, the LOC of a literate program has to include the 100x cost of exploring how to carve it out of the block of mud we start from, including making our own tools.

[+] JasonFruit|4 years ago|reply
The cool thing is that if it's valuable, you can leave the iteration artifacts in the literate document, not as commented-out blocks like we often see, but as a part of the explanation of how we arrived at the final version of our solution. Not all code that's in the document has to end up in the compiled/executed files.
[+] tomjen3|4 years ago|reply
>Very very few people can start from the abstraction and get TO a literate outcome without a lot of false steps along the way.

But don't writers face the same issue with their text? Am I the only one who writes more code than what ends up in a PR? Isn't that exactly what the Git history is for?

[+] derangedHorse|4 years ago|reply
Maybe I'm missing something, but I didn't find his way of programming to be all that more useful than just having well written code. The small code snippets are labeled and shown where they are referenced but this seems to mimic the functionality of functions which, when using an IDE like Visual Studio, can have where it's referenced identified through tooling.