Mark Smotherman
v. 1.5
December 4, 2021
This is a collection of links, ideas, and people that have influenced my programming and that I want to pass on to my students in CPSC 1010 and 1020. Of course, it is a personal review and not an exhaustive review of all the wisdom and experience that is available in the published literature and on the web.
The guidelines are focused on relatively high-level issues, such as interfaces, resource management, memory management, and concurrency. Such rules affect application architecture and library design. Following the rules will lead to code that is statically type safe, has no resource leaks, and catches many more programming logic errors than is common in code today. And it will run fast – you can afford to do things right.
- Optimize for the reader, not the writer
Our codebase (and most individual components submitted to it) is expected to continue for quite some time. As a result, more time will be spent reading most of our code than writing it. We explicitly choose to optimize for the experience of our average software engineer reading, maintaining, and debugging code in our codebase rather than ease when writing said code. "Leave a trace for the reader" is a particularly common sub-point of this principle: When something surprising or unusual is happening in a snippet of code (for example, transfer of pointer ownership), leaving textual hints for the reader at the point of use is valuable (std::unique_ptr demonstrates the ownership transfer unambiguously at the call site).
- Be consistent with existing code
Using one style consistently through our codebase lets us focus on other (more important) issues. Consistency also allows for automation: tools that format your code or adjust your #includes only work properly when your code is consistent with the expectations of the tooling. In many cases, rules that are attributed to "Be Consistent" boil down to "Just pick one and stop worrying about it"; the potential value of allowing flexibility on these points is outweighed by the cost of having people argue over them.
Google currently recommends C++17:
Currently, code should target C++17, i.e., should not use C++2x features, with the exception of designated initializers. The C++ version targeted by this guide will advance (aggressively) over time.

Do not use non-standard extensions.
Consider portability to other environments before using features from C++14 and C++17 in your project.
The Mythical Man Month is a reflection on why the software effort for S/360 had so many more difficulties than the hardware effort. It presented Brooks's Law: "Adding manpower to a late software project makes it later." The full text of the first edition of the book (1975) can be found here (funded by Gordon Bell). For the 20th-anniversary edition of The Mythical Man Month in 1995, Brooks added four chapters, including his 1986 essay "No Silver Bullet – Essence and Accidents of Software Engineering," in which he writes, "There is no single development, in either technology or in management technique, that by itself promises even one order-of-magnitude improvement in productivity, in reliability, in simplicity," and a reflection on his essay after a nine-year period, "'No Silver Bullet' Refired," in which he again stated that there was no "magical solution" to improving programming productivity. More recently, at ICSE 2018, he extended his prediction that no silver bullet will be found within the next ten years.

Brooks has wonderfully described the joys and woes of programming in his book, The Mythical Man Month:

The Joys of the Craft

Why is programming fun? What delights may its practitioner expect as his reward?
First is the sheer joy of making things. As the child delights in his mud pie, so the adult enjoys building things, especially things of his own design. I think this delight must be an image of God's delight in making things, a delight shown in the distinctness and newness of each leaf and each snowflake.
Second is the pleasure of making things that are useful to other people. Deep within, we want others to use our work and to find it helpful. In this respect the programming system is not essentially different from the child's first clay pencil holder "for Daddy's office."
Third is the fascination of fashioning complex puzzle-like objects of interlocking moving parts and watching them work in subtle cycles, playing out the consequences of principles built in from the beginning. The programmed computer has all the fascination of the pinball machine or the jukebox mechanism, carried to the ultimate.
Fourth is the joy of always learning, which springs from the nonrepeating nature of the task. In one way or another the problem is ever new, and its solver learns something: sometimes practical, sometimes theoretical, and sometimes both.
Finally, there is the delight of working in such a tractable medium. The programmer, like the poet, works only slightly removed from pure thought-stuff. He builds his castles in the air, from air, creating by exertion of the imagination. Few media of creation are so flexible, so easy to polish and rework, so readily capable of realizing grand conceptual structures. (As we shall see later, this very tractability has its own problems.)
Yet the program construct, unlike the poet's words, is real in the sense that it moves and works, producing visible outputs separate from the construct itself. It prints results, draws pictures, produces sounds, moves arms. The magic of myth and legend has come true in our time. One types the correct incantation on a keyboard, and a display screen comes to life, showing things that never were nor could be.
Programming then is fun because it gratifies creative longings built deep within us and delights sensibilities we have in common with all men.
The Woes of the Craft
Not all is delight, however, and knowing the inherent woes makes it easier to bear them when they appear.
First, one must perform perfectly. The computer resembles the magic of legend in this respect, too. If one character, one pause, of the incantation is not strictly in proper form, the magic doesn't work. Human beings are not accustomed to being perfect, and few areas of human activity demand it. Adjusting to the requirement for perfection is, I think, the most difficult part of learning to program.
Next, other people set one's objectives, provide one's resources, and furnish one's information. One rarely controls the circumstances of his work, or even its goal. In management terms, one's authority is not sufficient for his responsibility. It seems that in all fields, however, the jobs where things get done never have formal authority commensurate with responsibility. In practice, actual (as opposed to formal) authority is acquired from the very momentum of accomplishment.
The dependence upon others has a particular case that is especially painful for the system programmer. He depends upon other people's programs. These are often maldesigned, poorly implemented, incompletely delivered (no source code or test cases), and poorly documented. So he must spend hours studying and fixing things that in an ideal world would be complete, available, and usable.
The next woe is that designing grand concepts is fun; finding nitty little bugs is just work. With any creative activity come dreary hours of tedious, painstaking labor, and programming is no exception.
Next, one finds that debugging has a linear convergence, or worse, where one somehow expects a quadratic sort of approach to the end. So testing drags on and on, the last difficult bugs taking more time to find than the first.
The last woe, and sometimes the last straw, is that the product over which one has labored so long appears to be obsolete upon (or before) completion. Already colleagues and competitors are in hot pursuit of new and better ideas. Already the displacement of one's thought-child is not only conceived, but scheduled.
This always seems worse than it really is. The new and better product is generally not available when one completes his own; it is only talked about. It, too, will require months of development. The real tiger is never a match for the paper one, unless actual use is wanted. Then the virtues of reality have a satisfaction all their own.
Of course the technological base on which one builds is always advancing. As soon as one freezes a design, it becomes obsolete in terms of its concepts. But implementation of real products demands phasing and quantizing. The obsolescence of an implementation must be measured against other existing implementations, not against unrealized concepts. The challenge and the mission are to find real solutions to real problems on actual schedules with available resources.
One of Paul Graham's early essays recommended "Programming Bottom-Up," and he gives four advantages (citation removed):
- By making the language do more of the work, bottom-up design yields programs which are smaller and more agile. A shorter program doesn't have to be divided into so many components, and fewer components means programs which are easier to read or modify. Fewer components also means fewer connections between components, and thus less chance for errors there. As industrial designers strive to reduce the number of moving parts in a machine, experienced Lisp programmers use bottom-up design to reduce the size and complexity of their programs.
- Bottom-up design promotes code re-use. When you write two or more programs, many of the utilities you wrote for the first program will also be useful in the succeeding ones. Once you've acquired a large substrate of utilities, writing a new program can take only a fraction of the effort it would require if you had to start with raw Lisp.
- Bottom-up design makes programs easier to read. An instance of this type of abstraction asks the reader to understand a general-purpose operator; an instance of functional abstraction asks the reader to understand a special-purpose subroutine.
- Because it causes you always to be on the lookout for patterns in your code, working bottom-up helps to clarify your ideas about the design of your program. If two distant components of a program are similar in form, you'll be led to notice the similarity and perhaps to redesign the program in a simpler way.
In the introduction to Writing Solid Code, Maguire states (emphasis in original):
The most critical requirement for writing bug-free code is to become attuned to what causes bugs. All of the techniques and guidelines presented in this book are the result of programmers asking themselves two questions over and over again, year after year, for every bug found in their code:

- How could I have automatically detected this bug?
- How could I have prevented this bug?

For example, one technique he recommends is that your program should scan and validate any data tables at startup. This catches any bug in the data as early as possible. Maguire provides a number of memory logging routines in his book, but I would now recommend that you use valgrind instead (see below).
Code Complete is over 800 pages in length and full of coding wisdom. I find it hard to pick one best idea, but let me highlight part of his discussion of software quality:

There's no such thing as a free lunch, and even if there were, there's no guarantee that it would be any good. Software development is a far cry from haute cuisine, however, and software quality is unusual in a significant way. The General Principle of Software Quality is that improving quality reduces development costs.

Understanding this principle depends on understanding a key observation: The best way to improve productivity and quality is to reduce the time spent reworking code, whether the rework arises from changes in requirements, changes in design, or debugging. The industry-average productivity for a software product is about 8 to 20 lines of code per person per day. It takes only a matter of minutes to type in 8 to 20 lines of code, so how is the rest of the day spent?

A software product takes much longer to develop than a simple program and involves a lot more than merely typing in the code. But that's not what usually fills the rest of the day.

The rest of the day is usually spent debugging. Debugging takes about 50 percent of the time spent in a traditional, naive software-development cycle. Getting rid of debugging by preventing errors improves productivity. Therefore, the most obvious method of shortening a development schedule is to improve the quality of the product and decrease the amount of time spent debugging and reworking the software.

Links to his articles in IEEE Software and other periodicals can be found here. The front matter, chapter 5, and the index of Code Complete 2 can be found here.
Parnas has had lasting impact on software design with his concept of information hiding, which is the idea that programmers should identify and encapsulate design decision "secrets" to reduce complexity and to restrict the impact of the most-likely changes to the design. Here are excerpts from his 2009 presentation on "Making Information Hiding Work" (emphasis in original):
Information Hiding is often considered an empirical result.
It is a theoretical result!
Theorem: If you can prove program A correct knowing only the interface to program B, then if you change B without changing its interface, A need not change (proof still valid).
Requires empirical verification
• That people can know what to hide and what they are hiding.
• That the interfaces can be efficient
• That we can describe interfaces without revealing secrets.
...
We still have bad software because Information Hiding is not done well.
• Secrets show (even in textbook examples).
• The abstraction is not documented.
...
What would make a module structure good?
• Parts can be designed independently.
• Parts can be implemented independently.
• There are no duplicate or almost alike pieces of code (clones, cut/pasties)
• Parts can be tested independently.
• Parts can be changed independently.
• Integration is easy if the modules met their specification
• Maintainers can find the relevant code and change it quickly
He recommends that students learn more than just Java and that they work to fully comprehend pointers and recursion:
But beyond the prima-facie importance of pointers and recursion, their real value is that building big systems requires the kind of mental flexibility you get from learning about them, and the mental aptitude you need to avoid being weeded out of the courses in which they are taught. Pointers and recursion require a certain ability to reason, to think in abstractions, and, most importantly, to view a problem at several levels of abstraction simultaneously. And thus, the ability to understand pointers and recursion is directly correlated with the ability to be a great programmer.

Nothing about an all-Java CS degree really weeds out the students who lack the mental agility to deal with these concepts. As an employer, I've seen that the 100% Java schools have started churning out quite a few CS graduates who are simply not smart enough to work as programmers on anything more sophisticated than Yet Another Java Accounting Application, although they did manage to squeak through the newly-dumbed-down coursework.

He also emphasizes the importance of fixing bugs early in his essay, The Joel Test: 12 Steps to Better Code:
5. Do you fix bugs before writing new code?

... "zero defects" meant that at any given time, the highest priority is to eliminate bugs before writing any new code. Here's why.
In general, the longer you wait before fixing a bug, the costlier (in time and money) it is to fix.
For example, when you make a typo or syntax error that the compiler catches, fixing it is basically trivial.
When you have a bug in your code that you see the first time you try to run it, you will be able to fix it in no time at all, because all the code is still fresh in your mind.
If you find a bug in some code that you wrote a few days ago, it will take you a while to hunt it down, but when you reread the code you wrote, you'll remember everything and you'll be able to fix the bug in a reasonable amount of time.
But if you find a bug in code that you wrote a few months ago, you'll probably have forgotten a lot of things about that code, and it's much harder to fix. By that time you may be fixing somebody else's code, and they may be in Aruba on vacation, in which case, fixing the bug is like science: you have to be slow, methodical, and meticulous, and you can't be sure how long it will take to discover the cure.
And if you find a bug in code that has already shipped, you're going to incur incredible expense getting it fixed.
Although discouraged by style guidelines, I have found it helpful to use spaces to pad out the conditions for better visual effect. Placing simple statement blocks on the same lines as the conditions and vertically aligning all the blocks also add to the visual table effect, e.g.,
if ( newNum < oldNum ) { decrFlag = 1; }
else if ( newNum == oldNum ) { equalFlag = 1; }
else { incrFlag = 1; }
After testing, you can use a "code beautifier" tool like indent to reformat the code to meet the expected style guidelines.
If the dependent conditions being tested are simple values, then the nest of if-else statements can usually be replaced by a switch statement or perhaps by a lookup table for a more concise program segment.
An advanced technique is to encode various conditions into an index variable using bitwise or-ing into specific fields of the index value and then using the index value to access the appropriate entry in a lookup table. E.g., the code segment above becomes:
if ( newNum < oldNum ) { indexValue = indexValue | 4; }
else if ( newNum == oldNum ) { indexValue = indexValue | 2; }
else { indexValue = indexValue | 1; }
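As a small, hedged illustration of the last step (the table contents and the action strings below are made up for this sketch), the encoded index can then select an entry from a lookup table:

#include <cstdio>

int main( ) {
    int newNum = 7, oldNum = 9;                   // example inputs
    int indexValue = 0;

    if ( newNum < oldNum )       { indexValue = indexValue | 4; }
    else if ( newNum == oldNum ) { indexValue = indexValue | 2; }
    else                         { indexValue = indexValue | 1; }

    // one table entry per possible index value (only 1, 2, and 4 can occur here)
    const char *action[8] = { "", "increase", "equal", "", "decrease", "", "", "" };

    printf( "action is %s\n", action[indexValue] );
    return 0;
}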
In C and C++, a for loop is an abbreviation of a while loop. Therefore, the loop:
for ( i = 0; i < count; i++ ) {
// loop body
}
expands into:
i = 0;
while ( i < count ) {
// loop body
i++;
}
I rarely use do-while loops. Note that the condition test in a do-while loop is executed after the body of the loop has been executed; so, unlike a while loop, the body of a do-while loop always executes at least once.
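As a minimal sketch of where that behavior is useful (the 1-to-10 prompt is just an illustrative example):

#include <cstdio>

int main( ) {
    int value = 0;

    // the body runs once before the condition is tested,
    // which fits "prompt until valid" input loops
    do {
        printf( "enter a value between 1 and 10: " );
        if ( scanf( "%d", &value ) != 1 ) { return 1; }   // bail out on bad input
    } while ( value < 1 || value > 10 );

    printf( "you entered %d\n", value );
    return 0;
}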
A colleague sent me this humorous comparison.
Authors and sources vary on suggested maximum routine sizes. I remember being advised to limit PL/I routines to at most one page of line printer output (66 lines). McConnell reviewed several studies on routine sizes and error rates; while he saw no consistent pattern across the studies, he recommends that you limit routines to 200 lines. The Google C++ Style Guide recommends that functions should be less than 40 lines, and the C++ Core Guidelines state that 1-5 line functions should be considered normal.
The Google C++ Style Guide and the C++ Core Guidelines have numerous other recommendations about function design and preferred parameter-passing choices in C++.
Other courses will discuss these choices in more depth, but here are the general considerations:
This simple structure is also sometimes used for a FIFO queue with an insert operation at the tail and a remove operation at the head. However, the insert operation requires a list traversal if there is only a head pointer. As mentioned above, adding a tail pointer makes the inserts at the tail more efficient (in particular, inserts become O(1) as opposed to O(N)).
A testing tip for linked lists is to have at least one test vector that populates a list, empties it, and then re-populates it. In grading student programs over the years, restoring the empty list condition after removing the last node of a list is a common source of bugs (e.g., the head pointer may be correctly set to null but the tail pointer is a dangling reference).
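Here is a minimal sketch of such a queue (C-style, with an int payload for illustration); note how removing the last node resets the tail pointer along with the head pointer:

#include <cstdio>
#include <cstdlib>

struct Node {
    int   value;
    Node *next;
};

// FIFO queue: insert at the tail (O(1) with a tail pointer), remove at the head
Node *head = NULL;
Node *tail = NULL;

void insert_tail( int value ) {
    Node *n = (Node *) malloc( sizeof( Node ) );
    n->value = value;
    n->next  = NULL;
    if ( head == NULL ) { head = n; }           // first node: head starts pointing to it
    else                { tail->next = n; }
    tail = n;                                   // tail always points to the new last node
}

int remove_head( ) {
    Node *n = head;                             // caller must not call this on an empty list
    int value = n->value;
    head = n->next;
    if ( head == NULL ) { tail = NULL; }        // list is now empty: reset tail too,
    free( n );                                  //   otherwise tail dangles (the bug noted above)
    return value;
}

int main( ) {
    insert_tail( 1 );
    insert_tail( 2 );
    printf( "%d\n", remove_head( ) );
    printf( "%d\n", remove_head( ) );           // the list is now empty
    insert_tail( 3 );                           // re-populate: exercises the empty-list reset
    printf( "%d\n", remove_head( ) );
    return 0;
}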
When feasible, consider using C++ vectors or one of the various container classes in the C++ STL in place of writing your own linked list routines.
By separating the method definitions from the class declaration, a C++ class can partially satisfy the ideal of an abstract data type. However, a class declaration does not completely hide the implementation since it discloses the private attributes in the declaration. Furthermore, limits on the values that the private attributes can hold are not specified in the class declaration.
Good class design means that there is a class invariant that must be maintained by all methods; thus, unrestricted access should not be allowed. Use a struct instead of a class if you plan to provide unrestricted getter and setter methods for each private variable. To illustrate this, consider a point data structure with x and y coordinates. If there is no invariant that must be maintained (e.g., the point must lie in Quadrant I), then the point should be defined as a struct rather than a class. See Erik Engheim, "To Hell With Setters and Getters," Medium, November 4, 2017.
Abstract Data Type                                    C++ Class
------------------                                    ---------
specifies the values that the data type can hold      not in general; invariants can help
specifies the operations that can be performed        yes, through the list of public methods
without specifying how the data type is implemented   partially, through the separate definition
                                                        of the class methods; however,
                                                        - private attributes and methods are
                                                          visible in the class declaration
                                                        - public getter and setter methods also
                                                          reveal the implementation
// UML
//
// +---------------+
// |      Cup      |
// +---------------+
// | - max_volume_ |   private attributes
// | - volume_     |   names use Google C++ style with _ suffix
// +---------------+
// | + Cup()       |   public methods
// | + Capacity()  |   abbreviated versions without parameter
// | + Contains()  |     and return value data types
// | + Increase()  |
// | + Decrease()  |
// +---------------+
//
// ...
//
// invariants
//
//   max_volume_ >= 3
//
//   0 <= volume_ <= max_volume_
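A sketch of how this class might be written so that the invariants are checked; the method semantics and parameter types here are assumptions read off the UML above:

#include <cassert>

class Cup {
 public:
  explicit Cup( int max_volume ) : max_volume_( max_volume ), volume_( 0 ) {
    assert( max_volume_ >= 3 );                          // establish the invariant at construction
  }
  int  Capacity( ) const { return max_volume_; }
  int  Contains( ) const { return volume_; }
  void Increase( int amount ) {
    volume_ += amount;
    assert( 0 <= volume_ && volume_ <= max_volume_ );    // re-check the invariant
  }
  void Decrease( int amount ) {
    volume_ -= amount;
    assert( 0 <= volume_ && volume_ <= max_volume_ );
  }
 private:
  int max_volume_;   // invariant: max_volume_ >= 3
  int volume_;       // invariant: 0 <= volume_ <= max_volume_
};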
Input file:

4
10
20
30
40

Code pattern:

scanf( "%d", &count );
for ( i = 0; i < count; i++ ) {
   scanf( "%d", &value );
   // use value
}
Input file:

10
20
30
40
-1

Code pattern:

scanf( "%d", &value );
while ( value != -1 ) {
   // use value
   scanf( "%d", &value );
}
Input file:

10
20
30
40

Code pattern:

while ( scanf( "%d", &value ) != EOF ) {
   // use value
}
head, grep, sort, tail, uniq, and wc are Unix commands that are often helpful when combined with each other.
Furthermore, many programs are written for the Unix environment using the filter pattern so that they can easily participate in a pipeline. This means that a program is written for a single purpose and that it reads a simple text stream from stdin and writes a simple text stream to stdout. (Note that stderr does not flow through the pipeline without extra redirection.) The problem-solving power comes from the ability to easily combine several single-purpose programs on the command line.
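As a minimal sketch of the filter pattern (the program name "lower" and the file name in the comment are hypothetical):

#include <cstdio>
#include <cctype>

// Filter pattern: single purpose, text in on stdin, text out on stdout,
// so it can sit anywhere in a pipeline, e.g.:  cat notes.txt | ./lower | sort | uniq
int main( ) {
    int c;
    while ( ( c = getchar( ) ) != EOF ) {
        putchar( tolower( c ) );     // this filter's single purpose: lowercase the stream
    }
    return 0;
}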
See this lesson on Unix pipes and filters. See also Eric S. Raymond's discussion of the filter pattern here and my discussion here.
There are multiple ways to set a parameter that governs the execution of a program (for example, the size of an array):
For example, consider testing a stack module. The mini-language could contain commands such as:
e
+ 3
t
v 3
-
e
And the shell script could be:
./test_interpreter < test_vector_1
./test_interpreter < test_vector_2
...
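As a hedged sketch only: the command letters above are not spelled out on this page, so the meanings assumed below (+ push, - pop, t print the top, v verify the top, e report empty) are guesses for illustration, and a real test interpreter would call the stack module under test rather than the inline array used here:

#include <cstdio>

// Sketch of a test interpreter driven by a mini-language read from stdin.
// Assumed commands:  + n  push n     -  pop     t  print the top value
//                    v n  verify the top value equals n     e  report whether empty
int main( ) {
    int stack[100];
    int top = -1;
    int n;
    char cmd;

    while ( scanf( " %c", &cmd ) == 1 ) {
        switch ( cmd ) {
            case '+': if ( scanf( "%d", &n ) == 1 && top < 99 ) { stack[++top] = n; }
                      break;
            case '-': if ( top >= 0 ) { top--; }
                      break;
            case 't': if ( top >= 0 ) { printf( "top = %d\n", stack[top] ); }
                      break;
            case 'v': if ( scanf( "%d", &n ) == 1 ) {
                          printf( "%s\n", ( top >= 0 && stack[top] == n ) ? "ok" : "FAIL" );
                      }
                      break;
            case 'e': printf( "empty? %s\n", ( top < 0 ) ? "yes" : "no" );
                      break;
        }
    }
    return 0;
}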
Turn on all the compiler warnings (e.g., -Wall). Use your own assertion statements to validate input, parameters, assumptions, preconditions, postconditions, and even "impossible" conditions (i.e., sanity checks).
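For example, a sketch of assertion use on a hypothetical helper function:

#include <cassert>
#include <cstdio>

// Hypothetical example: average of the first n elements of an array.
double average( const int values[], int n ) {
    assert( values != NULL );       // precondition: a valid array was passed
    assert( n > 0 );                // precondition: at least one element
    int sum = 0;
    for ( int i = 0; i < n; i++ ) { sum += values[i]; }
    return (double) sum / n;
}

int main( ) {
    int data[3] = { 2, 4, 9 };
    printf( "%f\n", average( data, 3 ) );
    return 0;
}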
If you have a long list of compiler errors, fix the first few then look down the remainder for obvious errors that you can also fix. After you have fixed all the obvious errors and those at the top of the list, recompile. Some of the non-obvious error messages may have been generated because statement parsing in the compiler was thrown off by one of the early errors, and you may now see a much shorter list.
Use version control or consider saving a copy of your original code. You may find that the "fixes" you made while debugging are wrong and that you want to restart from the original code, or you want to use diff to compare the original and latest copies.
Think before you throw in extra print statements. Using assertions and/or a debugger may be a better use of your time. Alternatively, try to recreate the bug in a smaller or more constrained context. You may want to write a trivial program to verify your understanding of a language concept. You may want to write a special test case that will confirm or deny your guess about what went wrong.
If you do decide to add print statements, consider a "binary search" approach to limit the amount of information you need to review. For example, add the print statement to the middle of your program, and then see if the error occurs before or after this point.
If you are convinced that you need to add a large number of print statements, consider using macros and conditional compilation. Or perhaps consider defining a verbose mode for your program, which can be enabled by a command line flag. Use special formats for the extra debugging output so that you can easily visually scan for unusual patterns or other clues. The extra output lines can also be piped through grep to screen for critical messages.
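As a rough sketch of the conditional-compilation approach (the DPRINT macro name and the "DBG:" prefix are arbitrary choices for this example):

#include <cstdio>

// Compile with  -DDEBUG  to turn the extra output on; without it the
// statements disappear entirely.  The "DBG:" prefix makes the lines easy
// to grep for (or to grep out).
#ifdef DEBUG
#define DPRINT(...)  fprintf( stderr, "DBG: " __VA_ARGS__ )
#else
#define DPRINT(...)
#endif

int main( ) {
    int total = 0;
    for ( int i = 1; i <= 3; i++ ) {
        total += i;
        DPRINT( "i = %d, total = %d\n", i, total );
    }
    printf( "total = %d\n", total );
    return 0;
}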
Alternatively, you can write the extra information to a log file, or to a "flight recorder" internal data structure (i.e., a circular buffer). If you choose logging, consider logging the sizes of your data structures at key points to gain clues from imbalances, perhaps using a graphical approach for summarizing the extra information (e.g., a histogram). This is usually done by writing a script to post-process the log file.
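Here is a hedged sketch of one way a flight recorder might look; the buffer size, message length, and function names are arbitrary choices for illustration:

#include <cstdio>
#include <cstring>

// Flight recorder sketch: keep only the most recent RECORDS messages in a
// circular buffer; dump them only when a problem is detected.
#define RECORDS 8
#define MSGLEN  80

char log_buffer[RECORDS][MSGLEN];
int  next_slot = 0;

void record( const char *msg ) {
    strncpy( log_buffer[next_slot], msg, MSGLEN - 1 );
    log_buffer[next_slot][MSGLEN - 1] = '\0';
    next_slot = ( next_slot + 1 ) % RECORDS;      // wrap around, overwriting the oldest entry
}

void dump( ) {
    for ( int i = 0; i < RECORDS; i++ ) {
        int slot = ( next_slot + i ) % RECORDS;   // print oldest first
        if ( log_buffer[slot][0] != '\0' ) { fprintf( stderr, "%s\n", log_buffer[slot] ); }
    }
}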
Consider logging each entry into key functions or procedures, where the log record contains not just the routine name but also the values of the input parameters. You may want to additionally log each exit from these key routines where the log record contains the return value or values of the output parameters. Reviewing these values may give you clues about any bugs in the calling routines.
Consider using signature checking for dynamic data structures, that is, defining an extra field that will contain a structure-specific identifier (i.e., the signature, or what McConnell calls a dog tag and others call a canary or sentinel). The identifier can be a character string or an unusual hex number. When a pointer to a data structure is passed to a routine, have the routine first validate the data structure by checking for the correct signature. Just prior to freeing the data structure, be sure to check and then change the signature; this will help catch double-free and use-after-free errors. McConnell recommends including signatures at the beginning and the end of dynamically allocated areas to catch overwrite errors. A checksum field can also be used to catch corrupted data areas.
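A minimal sketch of the signature idea, with made-up tag values and a hypothetical Record structure (a fuller version would also place a signature at the end of the allocated block and perhaps add a checksum):

#include <cassert>
#include <cstdio>
#include <cstdlib>

#define LIVE_SIGNATURE 0xC0FFEE42u
#define DEAD_SIGNATURE 0xDEADBEEFu

struct Record {
    unsigned int signature;    // must be LIVE_SIGNATURE while the record is valid
    int          data;
};

Record *create_record( int data ) {
    Record *r = (Record *) malloc( sizeof( Record ) );
    r->signature = LIVE_SIGNATURE;
    r->data = data;
    return r;
}

void use_record( Record *r ) {
    assert( r != NULL && r->signature == LIVE_SIGNATURE );   // validate before use
    printf( "%d\n", r->data );
}

void destroy_record( Record *r ) {
    assert( r->signature == LIVE_SIGNATURE );   // catches double-free and use-after-free
    r->signature = DEAD_SIGNATURE;              // change the tag before freeing
    free( r );
}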
After you have experienced the woes of debugging several times, use the memory of these "war stories" to counteract the resentment you may feel towards taking the extra time to unit test.
For example, the running time for matrix multiplication can be greatly reduced when picking the best loop nesting order and/or adding cache blocking (also called tiling). For common algorithms there may be compiler optimizations and autotuners that generate efficient code. PHiPAC, Portable High Performance ANSI C, and ATLAS, Automatically Tuned Linear Algebra Software, are example autotuners for common dense linear algebra applications.
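As a small illustration of the loop-nesting-order point (this is only a sketch; blocking/tiling is a further refinement not shown here), the two versions below do the same arithmetic, but the i-k-j order walks the row-major arrays sequentially in the inner loop and is often much faster:

#include <cstdio>
#define N 256

double a[N][N], b[N][N], c[N][N];    // globals start zeroed, so c begins as the zero matrix

// Textbook i-j-k order: the inner loop strides down a column of b,
// a cache-unfriendly access pattern for row-major storage.
void multiply_ijk( ) {
    for ( int i = 0; i < N; i++ )
        for ( int j = 0; j < N; j++ )
            for ( int k = 0; k < N; k++ )
                c[i][j] += a[i][k] * b[k][j];
}

// i-k-j order: the inner loop walks rows of both b and c sequentially.
void multiply_ikj( ) {
    for ( int i = 0; i < N; i++ )
        for ( int k = 0; k < N; k++ )
            for ( int j = 0; j < N; j++ )
                c[i][j] += a[i][k] * b[k][j];
}

int main( ) {
    multiply_ikj( );                 // time this against multiply_ijk( ) to compare
    printf( "c[0][0] = %f\n", c[0][0] );
    return 0;
}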
Typically binding times are:
The encoding of a state machine would use bit fields of the index value to represent the output and transition tables. An MS thesis at CMU implemented this approach for comparing all possible 2-bit branch predictors, which would need 20 bits:
Index   (Binary)   Condition   Parameter   Reject?
  0      (0 00)        F           0
  1      (0 01)        F           1
  2      (0 10)        F           2
  3      (0 11)        F           3         yes
  4      (1 00)        T           0
  5      (1 01)        T           1
  6      (1 10)        T           2
  7      (1 11)        T           3         yes
each possible fsm is encoded as a 20-bit string

  inputs and predictions: 0 = untaken, 1 = taken
  four states: 0,1,2,3 (encoded in two bits)
  state 0 is the initial state

  output table:
  .---------------- prediction for state 0 (0 or 1)
  |  .------------- prediction for state 1 (0 or 1)
  |  |  .---------- prediction for state 2 (0 or 1)
  |  |  |  .------- prediction for state 3 (0 or 1)
  |  |  |  |
  p0 p1 p2 p3 n0[] n1[] n2[] n3[]
              |    |    |    |              transition table:
              |    |    |    `------------- next states for state 3
              |    |    `------------------ next states for state 2
              |    `----------------------- next states for state 1
              `---------------------------- next states for state 0

  where ni[] is a 4-bit string of
     2-bit next state for state i and input 0
     2-bit next state for state i and input 1
Using this encoding technique, you can execute the various possibilities directly or you can structure the code with the for loop as a generator program that writes out complete C programs to different source files. Another alternative is to structure the generator to write a program to a temporary file and use system() calls in Linux to compile and run the generated program. The generated program should then write the index value (i.e., its "signature") to an output file along with its other output. Then only the output files need to be saved and can be postprocessed for various metrics with the best metrics tagged with the temporary program signatures.
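A hedged sketch of the generator structure (the file names and the gcc command line are illustrative only):

#include <cstdio>
#include <cstdlib>

// Generator sketch: for each encoding, write a small C program to a temporary
// file, compile and run it, and let the generated program tag its own output
// with its index value (its "signature").
int main( ) {
    for ( int index = 0; index < 8; index++ ) {
        FILE *src = fopen( "candidate.c", "w" );
        fprintf( src, "#include <stdio.h>\n" );
        fprintf( src, "int main(void){\n" );
        fprintf( src, "   /* ... code specialized for encoding %d ... */\n", index );
        fprintf( src, "   printf( \"signature %%d\\n\", %d );\n", index );
        fprintf( src, "   return 0;\n}\n" );
        fclose( src );

        system( "gcc -O2 -o candidate candidate.c" );

        char run_command[64];
        sprintf( run_command, "./candidate > output_%d.txt", index );
        system( run_command );
    }
    return 0;
}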
There is a spectrum of approaches to software design, from the rigid "waterfall" approach, where the design is expected to be complete before coding starts, to the freewheeling "code and fix", where little attention is paid to the overall software design. There are problems at both edges of the spectrum, and many experienced programmers favor an incremental or spiral approach, where an initial design is refined during its implementation.
"Code and fix" – Years ago, mainframe programmers had to wait hours or even days to get results from the run of a compiler. Consequently, they were more careful about avoiding bugs when writing a program. Charles Mann describes the unintended negative effect of today's immediate feedback when compiling code in "Why Software Is So Bad":
Instead of meticulously planning code, programmers stayed up in caffeinated all-night hacking sessions, constantly bouncing results off the compiler. Again and again, the compiler would spit back error messages; the programmers would fix the mistakes one by one until the software compiled properly. "The attitude today is that you can write any sloppy piece of code and the compiler will run diagnostics," says SRI's Neumann. "If it doesn't spit out an error message, it must be done correctly, right?"

As programs grew in size and complexity, however, the limits of this "code and fix" approach became evident. On average, professional coders make 100 to 150 errors in every thousand lines of code they write, according to a multiyear study of 13,000 programs by Humphrey of Carnegie Mellon.
... The inadequate design in the final products, critics argue, reflects inadequate planning in the process of creating them. According to a study by the Standish Group, a consulting firm in West Yarmouth, MA, U.S. commercial software projects are so poorly planned and managed that in 2000 almost a quarter were canceled outright, creating no final product. The canceled projects cost firms $67 billion; overruns on other projects racked up another $21 billion. But because "code and fix" leads to such extensive, costly rounds of testing, even successful projects can be wildly inefficient. Incredibly, software projects often devote 80 percent of their budgets to repairing flaws they themselves produced - a figure that does not include the even more costly process of furnishing product support and developing patches for problems found after release.
"System testing goes on for almost half the process," Humphrey says. And even when "they finally get it to work, there's still no design." In consequence, the software can't be updated or improved with any assurance that the updates or improvements won't introduce major faults. "That's the way software is designed and built everywhere-it's that way in spaceships, for God's sake."
Unfortunately, most student projects are done in the code-and-fix style because they are used once and then thrown away, what my colleague Brian Malloy calls "Dixie cup" programming. Steve McConnell uses the analogy of building a doghouse (i.e., code and fix) versus building a skyscraper. I find the building analogy useful, e.g.,
Agile – There are numerous incremental approaches to software development. Agile development is one of those approaches that is currently popular. Bertrand Meyer has analyzed Agile methods and written a book, Agile! The Good, the Hype and the Ugly. He rejects Agile's replacements of software requirements and specifications by user stories and tests; however, he praises Agile's short iteration cycles and its association of tests with functionality. An ACM webcast by Meyer presenting the key ideas from his book can be found here.
Design by contract – My colleague Murali Sitaraman highly recommends design by contract, which Bertrand Meyer introduced in the Eiffel programming language in the 1980s. There are now a number of third-party libraries and preprocessors for implementing design by contract in other programming languages, such as C++. Design by contract relies on the programmer explicitly listing (ideally prior to any coding) the preconditions, postconditions, and invariants for a software component. This causes the programmer to pay careful attention to the design of a software component. Design assumptions are placed directly in the code and are automatically used for assertion checking during software development. Direct language support reduces the clutter of individual assertion/if statements that would otherwise be needed, and the assertion checking can be turned off for better performance in trusted production code. For more information and examples, see here.
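Since C++ (through C++17) has no built-in contract syntax, the following is only a rough sketch of the idea using assertion statements on a hypothetical BoundedCounter class; a design-by-contract library or preprocessor would automate and extend this:

#include <cassert>

// Contract-style sketch for a bounded counter.
// Invariant:  0 <= count_ <= limit_
class BoundedCounter {
 public:
  explicit BoundedCounter( int limit ) : limit_( limit ), count_( 0 ) {
    assert( limit_ > 0 );                   // precondition on construction
    CheckInvariant( );
  }
  void Increment( ) {
    assert( count_ < limit_ );              // precondition: room to increment
    int old_count = count_;
    count_++;
    assert( count_ == old_count + 1 );      // postcondition
    CheckInvariant( );
  }
  int Count( ) const { return count_; }
 private:
  void CheckInvariant( ) const { assert( 0 <= count_ && count_ <= limit_ ); }
  int limit_;
  int count_;
};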
Object-oriented design – Brooks termed object-oriented (O-O) programming a "brass bullet" in his "'No Silver Bullet' Refired" reflection, since it promises to allow programmers to build with bigger pieces. Brooks considered why O-O hasn't been a silver bullet and shared an excerpt from correspondence with Parnas:
The answer is simple. It is because [O-O] has been tied to a variety of complex languages. Instead of teaching people that O-O is a type of design, and giving them design principles, people have taught that O-O is the use of a particular tool. We can write good or bad programs with any tool. Unless we teach people how to design, the languages matter very little. The result is that people do bad designs with these languages and get very little value from them. If the value is small, it won't catch on.

Further work in O-O design has led to the development of design patterns, with over 50 patterns identified in the Wikipedia article.
Tom Dalling, author of Programming for Beginners, writes in a blog post about the importance of vocabulary:
Doctors do not discuss diagnoses with each other in simple terms. Medical schools don't teach in simple terms, either. Medicine is complicated, so they communicate with a complicated set of terminology. Everyone understands the term "broken leg," but that is too vague for a medical diagnosis. Using precise terminology like "transverse compound fracture of the tibia" allows medical staff and students to communicate accurately.

Programming is not as complex as the human body, but it is still complex. This complexity is why programming has so much terminology. There are lots of concepts, and all of those concepts have names. Just like in medicine, we use terminology to communicate accurately, because simple terms are often too vague.
... there are no three tips that will fix this problem. You just have to invest time into learning the vocabulary.
In his "'No Silver Bullet' Refired" reflection, Brooks noted the problem of programmers needing a larger vocabulary to facilitate reuse of large building blocks:
Higher-level languages have larger vocabularies, more complex syntax, and richer semantics.

As a discipline, we have not pondered the implications of this fact for program reuse. To improve quality and productivity, we want to build programs by composing chunks of debugged function that are substantially higher than statements in programming languages. Therefore, whether we do this by object class libraries or procedure libraries, we must face the fact that we are radically raising the sizes of our programming vocabularies.
Vocabulary learning constitutes no small part of the intellectual barrier to reuse. So today people have class libraries with over 3000 members. Many objects require specification of 10 to 20 parameters and option variables.
Anyone programming with that library must learn the syntax (the external interfaces) and the semantics (the detailed functional behavior) of its members if they are to achieve all of the potential reuse.
This task is far from hopeless. ... We need research to appropriate for the software reuse problem the large body of knowledge as to how people acquire language.
Good choices of names in a design can lessen the burden of vocabulary learning. Ralph Johnson and Brian Foote highlight this in their paper on "Designing Reusable Classes":
Although protocols are important for defining interfaces within programs, they are even more important as a way for programmers to communicate with [each] other. Shared protocols create a shared vocabulary that programmers can reuse to ease the learning of new classes. Just as mathematicians reuse the names of arithmetic operations for matrices, polynomials, and other algebraic objects, so Smalltalk programmers use the same names for operations on many kinds of classes. Thus, a programmer will know the meaning of many of the components of a new program the first time it is read.
The Google C++ Style Guide says:
The most important consistency rules are those that govern naming. The style of a name immediately informs us what sort of thing the named entity is: a type, a variable, a function, a constant, a macro, etc., without requiring us to search for the declaration of that entity. The pattern-matching engine in our brains relies a great deal on these naming rules.
It's clear that this page is very limited in what I have presented. I am leaving out contributions to programming skills and software engineering from Bentley, Dijkstra, Hoare, Kernighan and Plauger, Knuth, Scott Meyers, Wirth, and so many others. I hope that what I have included is helpful to you, the reader, and encourages you in the craft of programming.
mark@clemson.edu