There's a saying in business: "What gets measured gets improved."
The same holds for programming. Additionally: "What gets documented gets improved."
I used to edit my .emacs1 directly. One day I decided to convert it into a literate program2. So over the course of a week I went about extracting chunks of code, documenting them, and rearranging them.
“We tend to read the code only long enough to understand it for the purpose(s) we currently have in mind.”
The process of converting the file to literate form alone improved it. There were blocks of old unused code I deleted. When I came across lines I'd written when I was less experienced with Emacs, I was able to rewrite them in a more compact and understandable form. And I fixed some annoyances that had always bugged me but only became clear to me as I was reading through the code.
You could argue that just the act of reading the file was what improved the code. But you'd be missing the point. Why would I have read the file if I weren't documenting it? The only reasons for which I've seen people read code are:
Both of these are shallow dips into the code. When performing them, the objective is to get in and get out. We tend to read the code only long enough to understand it for the purpose(s) we currently have in mind. But we're missing a wider, broader understanding of the code. And without that understanding, we can only make focused improvements.
There is nothing wrong with incremental change. There is tremendous power in Kaizen. And maybe that's why it's such a prevalent development model.
But in order to achieve revolutionary improvements in code, we need insight. We need to be able to make connections in the code. The deeper we understand the code, the more we can hold of it in our head3.

| comments | literate text | book | |
|---|---|---|---|
| detail | function | inter-function | super-function |
| location | source file | source file | external file |
We know we need comments. Most experienced programmers write them.
Comments are the most detailed level of documentation. They deal with concepts smaller at the function level.
Literate text wraps and contextualizes the comments and the blocks of code that they inhabit. It explains the connections and relationships between blocks. It's the glue between paragraphs4.
Good literate text is a narrative. It explains concepts as they come along in the natural flow of the program.
The handbook means business. It's about getting things done.
The handbook for your program guides new programmers—or yourself once you've been away from the code for too long—to the parts they are interested in now. It provides a map, an overview of the territory that the program covers.
A handbook can have howtos, FAQs, tutorials. If your program is for other programmers, this would be your program's developer website.
We enjoy demonstrating our knowledge to other people. We want to be recognized for how smart we are. We like to show off our creations. And excepting sociopaths, we enjoy helping other people.
Many programming literati bemoan the lose of craftsmanship in our field. But if we observe the programming process, we can see why it happens.
A lone programmer5 composes a program fragment at his terminal. He's lost in the moment as he explores the problem space. He follows trails of thought along paths of reasoning, sometimes backtracking when he hits a dead end. He knows the result he's trying to achieve and why. And he's most likely being driven by a deadline, whether imposed by someone else or unconsciously by himself.
When the programmer achieves his desired result, he feels a rush of satisfaction at solving a problem and creating something. He might not by 100% satisfied with his solution or the approach he took to it, but he reasons that he can improve upon it later. For now he is justifiably exhausted and his mind needs a break.
At least.
And fixing errors doesn't count as a 2nd pass.
Comments are for documenting code as you write it. And documenting code as you write it can be a distraction.
As with video encoding, we can get a better result with a 2-pass approach. In the first pass you can focus on writing the code. Sprinkle comments as you go for any details that might be unclear.
But now your code is literature. It's a document. And you wouldn't write a document in 1 pass. That's called the first draft! You would proofread it at least once. As you proofread a document, you catch mistakes that are so obvious you're amazed that you made them in the first place. And as you subvocalize what you read you find awkward parts of the document that you need to rewrite. You may change how a sentence starts or move a paragraph up or down.
The same should apply to programming. Just as a sentence or paragraph may not make sense after being written on the fly, a function may not make sense. Or perhaps one paragraph doesn't lead cleanly into another. Maybe a paragraph is trying to say too much and needs to be broken into smaller pieces.
The proofreading becomes even more valuable once you've been away from the document for a while, especially after a period of sleep. Much of what you had stored in short-term memory has been flushed out and you see the document more from the perspective of someone who has never seen it before. So there's no need to force yourself to document as you go. Take your time. Back off from the code for a while. Give your mind room to breathe. But come back later and take a look, otherwise someone will be reading your first draft in the future.
When you believe that your program will be read (because you've provided it in a convenient format which shows value) and you have pride in either the code or the documentation, you will work to improve the other.
If one chapter of your book or section of your document looks full, impressive, and is a pleasure to read and you come across another section that falls flat, you'll naturally want to fix it. And because you have your program in printed form, you'll see those sections. They might begin to gnaw at you after a time.
It's like having a clean room. When you've cleaned your room you're more alert and intent on keeping it clean. You know the effort it took you to clean it, and you don't want to have to repeat it. Besides, you get enjoyment from the feeling of having a clean room and want it to last.
1. A .emacs file is a configuration file for the Emacs editor. It's actually a program written in the Elisp programming language. You can find mine here.
2. I used the Noweb tool, if you're curious.
3. See the studies of chess masters conducted by Alfred Binet.
4. If you're a TeXnician, it can help to think of it as glue.
5. Almost all programmers are lone programmers. Even when working in teams, most programmers are alone at their terminal as they actually compose their program fragments. The only exception I know of is in pair programming. And I doubt that even in a full Extreme Programming shop that the majority of code is written while working as pairs. How do you achieve flow without solitude?
Last Updated 2009-05-03 00:06:49 UTC
Sun May 24, 2009 23:46
[...] As you proofread a document, you catch mistakes that are so obvious you're amazed that you made them in the first place. And as you subvocalize what you read you find awkward parts of the document that you need to rewrite.
[... ] We enjoy demonstrating our knowledge to other people. We want to be recognized for how smart we are. We like to show off our creations. And excepting sociopaths, we enjoy helping other people. [...]
Thanks, Magnus
Fri Oct 02, 2009 06:41
Yes, there's always a tension between clean code and deadlines/market pressure. I hope that we can shift more towards taking more time when programming in the future. However, sometimes it feels like the industry is going in the opposite direction ;)
—Chris Forno (jekor)
Thu Jan 28, 2010 04:53
One of the principles of cleanroom programming (the methodology) is not to even run your code until after you've gone through a peer verification. Peer verification constitutes examining every line of code to ensure that it does its part of the intended function. If you've broken everything down into small enough modules, it's fairly clear.
The professor who taught this method at my school found that in practice, verified code takes less time to write. Seems like a preposterous allegation, since in addition to writing the code you're now writing all of the intended functions for all of the code and also going through a painstaking review process. But the research shows that doing this pays huge dividends in the maintenance phase, almost entirely eliminating that eternal hell of fixing bugs most projects lapse into.
People often talk about how much better their code is when they use Haskell because of the type checking. I think the expression "type checking" really doesn't do what Haskell's actually doing justice, because it's really proving quite a bit about the correctness of your program. I think if you were to combine cleanroom methodology with Haskell, you'd have a real win. Haskell's type system seems stronger to me than Eiffel's contracts, which are another useful formal method.
I'm ignoring what seems to me to be the existential crisis of software, that people seem to need to see churn. If your project isn't actively developed, you'll lose users, even if your project is provably correct or close to it. People want new features, and correctness is antithetical to bloat. Dijkstra harped on this and I think he was right. While Magento, Drupal and Joomla are gaining users, the Unix perspective of unifying many small tools hasn't really even showed up yet on the web. We expect large systems and we expect them to continue getting larger, not more correct.
The article is great advice though. I've been meaning to get serious about literate programming. This is the best argument for it I've seen yet.
—Daniel Lyons