Basic Concepts of Literate Programming


Programming in real life is a very difficult enterprise. As a student, you may write a program of perhaps 2,500-3,000 lines, which is about the size of a student compiler or operating system. You will write that program and then essentially discard it. In industry, you will probably be responsible for several routines at once, each several thousand lines long. These "industrial strength" routines will be part of a system that your company depends on to do business; a bug in a commonly used system may cost a company thousands of dollars per day until it is fixed.

Making the problem worse, you probably will not have written several of those routines yourself. Many companies rely on the "institutional memory" of their programmers rather than on documentation. This situation is untenable in the long run. A major cause of this reliance is that compilers make documentation very hard to do, for two reasons:

Compilers demand their input in a very strict order that is not the order a human reader would find natural.
Compilers cannot deal with an extended character set intermixed with the code.

This problem faced Donald Knuth when he invented the typesetting system TeX in the late 1970s. His answer was to keep the program and its documentation together in a single file; the first use of TeX was to write the documentation of the system itself[2]. Systems of this kind have become known as webs[3]. There are many webs available, and they all eliminate the two problems noted above. noweb, by Norman Ramsey of Bellcore, is one of the simplest and most flexible.

Literate programming systems (which I'll abbreviate as LPS, although the abbreviation is non-standard) take two types of input from a single file. The input is divided into chunks, so called because the LPS rearranges them as needed. The two types are documentation chunks and code chunks. A code chunk can be any code fragment. A documentation chunk is anything that is not a code chunk; it can contain, literally, anything, and typically it contains formatting information.
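To make the two chunk types concrete, here is a minimal sketch in Python of how a single file might be split into documentation and code chunks. It assumes the noweb convention that a code chunk begins with a line of the form <<name>>= and that a documentation chunk begins with a line starting with @; the function name split_chunks and the SAMPLE text are illustrative only and are not part of any LPS.

```python
import re

# Assumed noweb convention: a code chunk begins with a line "<<name>>=".
CHUNK_DEF = re.compile(r"^<<(.+)>>=\s*$")


def split_chunks(source):
    """Split literate source into (kind, name, text) tuples.

    kind is "doc" or "code"; name is None for documentation chunks.
    """
    chunks = []
    kind, name, lines = "doc", None, []  # a file starts in documentation
    for line in source.splitlines():
        m = CHUNK_DEF.match(line)
        starts_doc = line == "@" or line.startswith("@ ")
        if m or starts_doc:
            # A new chunk begins, so close the one being collected.
            if lines:
                chunks.append((kind, name, "\n".join(lines)))
            if m:
                kind, name, lines = "code", m.group(1), []
            else:
                rest = line[2:]  # text following "@ " belongs to the doc chunk
                kind, name, lines = "doc", None, ([rest] if rest else [])
        else:
            lines.append(line)
    if lines:
        chunks.append((kind, name, "\n".join(lines)))
    return chunks


# Hypothetical literate source: documentation and code interleaved.
SAMPLE = """\
This program prints a greeting.
<<hello.py>>=
<<say hello>>
@ The greeting itself is a single call.
<<say hello>>=
print("hello")
"""

for kind, name, text in split_chunks(SAMPLE):
    print(kind, name, repr(text))
```

Note that the chunks come back in the order the author wrote them, which is the order chosen for the human reader, not the order the compiler needs.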

The file therefore has two different uses. One is to serve as input to the compiler(s); the other is to feed the formatting programs. When the file is to be input to the compiler, we run it through a filter known as a tangle. This program strips out the documentation chunks and puts the code chunks in their proper order (see Section 7.2). When the file is to be formatted for presentation, the filter is known as a weave. The weave program inserts text formatting information.
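The reordering step of a tangle can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how noweb itself is implemented: it assumes the code chunks have already been collected into a dictionary keyed by chunk name, and that a reference to another chunk is written as <<name>> alone on a line. The names tangle, CHUNK_REF, and CHUNKS are hypothetical.

```python
import re

# Assumed form of a chunk reference: optional indentation, then <<name>>
# alone on the line. (A simplification made for this sketch.)
CHUNK_REF = re.compile(r"^(\s*)<<(.+)>>\s*$")


def tangle(chunks, root, _active=()):
    """Expand the code chunk named root into compiler-ready source lines."""
    if root in _active:
        raise ValueError(f"chunk {root!r} refers to itself")
    out = []
    for line in chunks[root]:
        m = CHUNK_REF.match(line)
        if m:
            indent, name = m.groups()
            # Replace the reference with the referenced chunk's lines,
            # keeping the indentation of the referring line.
            expanded = tangle(chunks, name, _active + (root,))
            out.extend(indent + sub for sub in expanded)
        else:
            out.append(line)
    return out


# Hypothetical chunks, listed in an order that suits the explanation
# rather than the order the compiler needs.
CHUNKS = {
    "report the result": ['print("total =", total)'],
    "compute the total": ["total = sum(values)"],
    "main.py": [
        "values = [1, 2, 3]",
        "<<compute the total>>",
        "<<report the result>>",
    ],
}

print("\n".join(tangle(CHUNKS, "main.py")))
```

Starting from the root chunk and expanding references recursively is what lets the author present the pieces in any order: the tangle, not the author, is responsible for producing the strict order the compiler demands. A weave would instead keep every chunk in its written order and add formatting commands around it.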


Steve Stevenson
Wed Feb 26 10:54:45 EST 1997