Beyond Y2K – It’s Just the Beginning
John D. McGregor
When you read this, we will be rapidly approaching the culmination of the most chronicled design decision in computing history. Last-minute efforts will be demanded by managers, promises will be made by politicians, and millions of dollars will be made by hucksters from various scams. Incidentally, it is ironic that 2K is not 2000! Nevertheless, this is not a column about Y2K specifically; rather, it is about the quality and integrity of design decisions.
The Y2K crisis is based on a design decision made by a number of people many years ago. Would they have made a different decision then with today’s knowledge? I am not certain that they would or that they should. Should they have wasted all that storage space all those years when it wasn’t needed because in twenty or thirty years it would be needed? I don’t think so.
The other night, riding in the airport limo, I was asked whether the Y2K "situation" raised ethical issues. My response was that the original decision was reasonable in context and was not malicious; therefore, it is not an ethical issue. I went on to say that the way the repair of Y2K-related problems is handled in a company certainly could raise ethical issues. Any manager who did not carefully and promptly examine the inventory of software either in use in the company or produced by the company for possible problems was acting unethically with respect to their clients. Incidentally, I worked with an international standard in one domain that was adopted in 1997 and is not Y2K compliant!
I believe that we are going to increasingly be faced with fundamental problems that are the result of poor design decisions. This is inevitable given:
I want to consider two issues in this column that are related to the current crisis. First, I will discuss the origin of the Y2K phenomenon and the general problem of quality design decisions. Second, I will describe some possible ways to improve the quality of these decisions.
The next "Y2K" bug
Let me say immediately that I am not smart enough to know what the next major problem will be, but given the ubiquitous nature of computers, it will undoubtedly have far-reaching impact. Even if the decision is not as low level and fundamental as the Y2K issue, it could still have tremendous impact. Imagine the effect of a faulty decision that is limited to e-commerce systems. I do think I know something about how this earth-shaking event will be created. There are two key factors in creating a crisis such as Y2K: poor documentation of design decisions and poor planning.
The first requirement is to make a design decision without adequately investigating and communicating the limitations of that decision. The original design decision to only use two digits to represent the year was a good decision at the time. I am not certain whether a detailed study was carried out, but I am certain that there was insufficient documentation of the fact that at some time in the future two digits would not be acceptable. In fact, since that time, thousands of developers accepted this decision as a basic principle rather than a constraint subject to review. Many systems have been constructed using this assumption even when storage space was no longer the expensive problem it once was.
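To make the limitation concrete, here is a minimal sketch in Python of arithmetic on two-digit years. The function names are hypothetical, and the windowing pivot of 50 is an assumption chosen for illustration, not a value taken from any particular system.

```python
def years_between_2digit(start_yy: int, end_yy: int) -> int:
    """Elapsed years when only the last two digits of each year are stored."""
    return end_yy - start_yy


def expand_year(yy: int, pivot: int = 50) -> int:
    """Windowing repair: values below the pivot are read as 20xx, others as 19xx.

    The pivot (an assumption here) is itself a design decision with a
    boundary, and so deserves the same documentation as the original one.
    """
    return 2000 + yy if yy < pivot else 1900 + yy


# An account opened in 1985 and examined in 1999 looks fine.
print(years_between_2digit(85, 99))        # 14
# Examined in 2000, the stored year is 00 and the result goes negative.
print(years_between_2digit(85, 0))         # -85
# With both years expanded to four digits the subtraction is correct again.
print(expand_year(0) - expand_year(85))    # 15
```

Note that the windowing repair only moves the boundary; it does not remove it, which is exactly why the limitation needs to be written down next to the decision.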
Every day in software factories around the world, thousands of decisions are made. Any one of these, if it is sufficiently fundamental and if it is propagated to enough other systems, could become the next "Y2K" bug. Interestingly, some of the very techniques that are intended to improve software development, such as design patterns, foster an environment in which a single decision can be widely propagated and perhaps employed in environments where it is not applicable. However, I want to use the design pattern technique to support our discussion of making quality design decisions.
Figure 1 shows some of the sections in one popular format of design pattern description. The Problem section provides an opportunity for the development team to construct a statement of the design decision to be made. I have found that this is often the most important step since it causes the team to carefully consider, often for the first time, exactly what is the problem to be solved. The Context section also provides an opportunity for the project staff to consider the basic constraints over which they have no control.
Figure 1 – Design Pattern Format
The Forces section contains the discussion of the competing opportunities. Below we will discuss a technique for structuring this section. The Solution section provides the description of the basic design that has been determined to be optimal. The Resulting Context section describes how the original environment has been modified because of the application of the pattern.
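The five sections can be captured as a simple record. The following is a minimal sketch in Python, not a prescribed format: the class name, the `review_triggers` field, and the `needs_review` method are assumptions added for illustration, and the example text paraphrases the two-digit-year decision discussed earlier.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DesignDecision:
    """A design decision recorded under the pattern-style headings."""
    problem: str
    context: str
    forces: List[str]
    solution: str
    resulting_context: str
    # Assumed extra field: events that should prompt a review of the decision.
    review_triggers: List[str] = field(default_factory=list)

    def needs_review(self, event: str) -> bool:
        """True if the given event is one of the recorded review triggers."""
        return event in self.review_triggers


two_digit_year = DesignDecision(
    problem="How should the year be stored in date fields?",
    context="Storage is expensive; all dates fall within the 1900s.",
    forces=["storage cost", "range of representable years"],
    solution="Store only the last two digits of the year.",
    resulting_context="Dates outside 1900-1999 cannot be represented.",
    review_triggers=["dates after 1999 required", "storage no longer expensive"],
)

print(two_digit_year.needs_review("storage no longer expensive"))  # True
```

Recording the triggers alongside the solution is what turns the decision from an unexamined principle into a constraint subject to review.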
Two basic decisions being made in many projects these days are the selection of a database model, relational or object-oriented, and the selection of a distribution architecture. I mention these together because each of these decisions influences the other and ultimately forces other decisions as well. The technique that will be discussed below supports the analysis of dependent decisions; however, I want to focus on only the database selection issue.
The second requirement for creating a Y2K level of crisis is the failure to plan beyond the next couple of versions of an application. There is a nuclear power plant near my home. Its owner recently applied to extend its licensed life for an additional ten years beyond the original planned life of twenty years. This will require a re-evaluation of the plant’s design and a projection of whether the wear experienced in the past twenty years indicates that it can safely operate another ten. What is of interest here, beyond the fact that I can glow in the dark at parties, is that the original plan actually included a specific lifetime for the plant. The physical nature of the plant and the radiation that causes it to deteriorate make this type of calculation a matter of mathematics. Specific lifetimes are also identified for airplanes in commercial service. The FAA takes the age of an airplane into consideration when specifying what maintenance procedures must be performed.
Some software products have implicit lifetimes, such as tax preparation programs, but most just live until the next version is installed or until competing programs render them obsolete. The basic lifetime of a software application is not considered in corporate planning. This is considered a reasonable view because software does not "wear" the way that physical machinery does. Or does it?
Think about physical components in a loosely coupled mechanical environment such as an automobile. As a mechanical component moves, friction causes the part to change. When a component breaks, it is replaced. Often it is replaced with the latest product that performs that function rather than an exact duplicate. I recently replaced the original equipment tires on my van with a replacement product from a national manufacturer. The newer tires have a different drag coefficient and a different edge design, making them squeal every time I turn a sharp corner in the mountains near my home. As the car gets older, worn parts change how the car steers and rides. Finding original parts becomes more difficult, and the replacement parts often have just a slight mismatch.
As a piece of software gets older, individual statements do not change, but their result may be different if the statement causes an interaction with another component that has been changed. A recent crash of a major e-commerce system was apparently due to the fact that a new release of the system expected a particular patch of the operating system that had not been installed on the server machine. New releases of existing components bring new implementations of existing interfaces. New standards emerge, versions are backward compatible only to a point, and the old browser won’t display newer versions of HTML. Your browser "fails" on more and more web sites. Your trusty graphics editor will neither import nor produce the newer formats. The new version of the operating system provides new ways of performing a standard operation but also introduces subtle side effects that are not readily obvious, or are all too obvious. Tried running VisiCalc lately? Over time, a piece of software "looks and feels" old and does not fit into its environment as tightly as newer software. It is "wearing".
Companies that produce large amounts of software, whether for their own internal use, for integration into a more comprehensive product, or just as a product in its own right, should allocate some resources to the review and maintenance of the basic assumptions underlying their products. Below I will outline some techniques for supporting this activity.
Avoiding That Next Bug
As I have already stated, I am not smart enough to know which design or implementation decision will be the next world-affecting bug. Instead I am going to talk about how we might avoid creating the next bug or at least recognize it before it becomes a global threat like the current problem. I will first discuss three techniques for addressing these issues and then I will show some additional tasks which support those techniques.
First is the technique of introspection. Basically this is an analysis in which we examine the internal, self-imposed requirements and constraints of a project rather than the external user requirements. We cannot question every detail of every design on every project; therefore, we must find a way to identify those details that have the potential to adversely affect the quality of the project. Let’s consider a couple of possible techniques.
First is the time-honored testing technique of examining boundaries. A tester builds test cases that examine the values closest to points at which a small change in input values results in a large change in the expected system response. Sensitivity analysis is a technique in which the analyst investigates the input range to identify places at which the system’s response changes, or is expected to change, in a larger than usual manner. Design decision boundaries include the points at which the word size changes from 32 bits to 64, the character set encodings change, or the year changes from 1999 to 2000. When these changes occur, an introspective analysis should be triggered.
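As a rough illustration of examining boundaries, the sketch below (Python, with hypothetical helper names) lists the values on either side of two of the boundaries just mentioned and checks which of them still fit a signed 32-bit word.

```python
def boundary_values(boundary: int, step: int = 1) -> list:
    """Return the values just below, at, and just above a boundary."""
    return [boundary - step, boundary, boundary + step]


def fits_in_signed_32(value: int) -> bool:
    """True if the value is representable in a signed 32-bit word."""
    return -2**31 <= value <= 2**31 - 1


# The year rollover.
print(boundary_values(2000))    # [1999, 2000, 2001]

# The first value that no longer fits in a signed 32-bit word.
word_limit = boundary_values(2**31)
print(word_limit)               # [2147483647, 2147483648, 2147483649]
print([fits_in_signed_32(v) for v in word_limit])   # [True, False, False]
```

The test values themselves are trivial to produce; the harder part of the analysis is noticing that a design decision has such a boundary at all.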
In the last few years one major change that has been recognized as a potential difficulty is internationalization. The change from English to French or Italian, which can be handled by the substitution of one string for another, is not a boundary point. It is the change of character set, say from English to Chinese, that is a boundary point. This change may require collating sequences that have different numbers of members, messages that take widely varying amounts of space, and display drivers that can display unique symbols.
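The difference in space can be seen in a small Python sketch. The sample messages and the choice of UTF-8 as the storage encoding are assumptions made for illustration; other encodings would give different byte counts.

```python
def fits_in_ascii(text: str) -> bool:
    """True if every character of the text can be stored as 7-bit ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


messages = {"English": "Welcome", "French": "Bienvenue", "Chinese": "欢迎"}

for language, text in messages.items():
    characters = len(text)
    stored_bytes = len(text.encode("utf-8"))  # assumption: UTF-8 storage
    print(language, characters, stored_bytes, fits_in_ascii(text))
```

In this sample the English and French messages occupy one byte per character and fit a buffer sized by character count, while the Chinese message needs three bytes per character and cannot be stored in an ASCII-only field at all.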
A context diagram, such as the one in Figure 2, is a second tool that can be used to trigger introspection. This diagram is typically used to show the "external" dependencies of a system. As these entities change, such as a new version of the database, this should trigger analyses that examine any dependencies between the system and the changed external component. I am using "external" here to mean any piece that the current project is not responsible for producing. Adding support for a new platform such as Unix usually does trigger some level of analysis but usually only to the extent of changes in external interfaces between the OS and the application. An introspective analysis would also consider indirect relationships that might be affected. For example, sizes of communication buffers may be different and may lead to unexpected results.
Figure 2 – Context Diagram
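One way to make the context diagram act as a trigger is to record the version of each external entity at the time the design was last reviewed and compare it with what is currently deployed. The sketch below is in Python; the entity names and version strings are hypothetical.

```python
def changed_dependencies(reviewed: dict, current: dict) -> list:
    """Names of external entities that are new or whose version has changed
    since the design was last reviewed."""
    return sorted(
        name for name, version in current.items()
        if reviewed.get(name) != version
    )


# Versions recorded when the design was last reviewed (hypothetical values).
reviewed = {"database": "1.0", "operating system": "3.1", "browser": "2.0"}

# Versions in the current environment; a new platform has also been added.
current = {
    "database": "2.0",
    "operating system": "3.1",
    "browser": "2.0",
    "unix platform": "1.0",
}

for name in changed_dependencies(reviewed, current):
    print("Introspective analysis needed for:", name)
```

The check only says where to look. Indirect effects, such as a different communication buffer size on the new platform, still have to be found by the analysis itself.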
Life Cycle Management
The second technique for avoiding the next Y2K bug involves the application of a life cycle perspective to software asset management. Life cycle management is a common technique within IT organizations for managing the costs and quality of hardware but it has not often been applied to software especially at the component level. A life cycle approach considers all of the activities from the creation of the asset to its destruction. In our case we will consider the life cycle for a software component.
First, the life cycle approach defines a series of phases through which the asset progresses from the time that it is created to the time that it is destroyed. The life cycle for an object is different from the life cycle for a class or component. A class is a definition that goes through stepwise refinement over time. An object is a runtime entity that goes through the states defined for it in the class definition. Each progresses through different phases at different rates and on different time scales. By definition, classes last at least as long as any of their instances exist. Object lifetimes vary from a few processor cycles, for objects used as transient copies during assignment operations, to years if the object is persistent.
Figure 3 shows the life cycle definition that we use for software components. This life cycle is a continuum along which each component can be placed. The position on that continuum guides the developer in terms of what actions should be taken to reuse a component or to move it along to a more mature stage.
Figure 3 – Component Life Cycle
Concept realized as a component
Component used in n applications
Concept refined & implementation refactored
Component is used in many applications
Component technically obsolete
Component obsolete & removed from catalog
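The phases listed for Figure 3 can be treated as an ordered enumeration so that each component in a catalog carries its position on the continuum. This is a minimal Python sketch; the member names are shortened forms of the phase labels, and the rule in `is_reuse_candidate` is an assumption for illustration rather than part of the life cycle definition.

```python
from enum import IntEnum


class ComponentPhase(IntEnum):
    REALIZED = 1              # Concept realized as a component
    USED_IN_N = 2             # Component used in n applications
    REFINED = 3               # Concept refined & implementation refactored
    WIDELY_USED = 4           # Component is used in many applications
    TECHNICALLY_OBSOLETE = 5  # Component technically obsolete
    REMOVED = 6               # Component obsolete & removed from catalog


def next_phase(phase: ComponentPhase) -> ComponentPhase:
    """Move one step along the continuum, stopping at the final phase."""
    return ComponentPhase(min(phase + 1, ComponentPhase.REMOVED))


def is_reuse_candidate(phase: ComponentPhase) -> bool:
    """Assumed rule: components are offered for reuse until they are obsolete."""
    return phase < ComponentPhase.TECHNICALLY_OBSOLETE


print(next_phase(ComponentPhase.REFINED).name)             # WIDELY_USED
print(is_reuse_candidate(ComponentPhase.WIDELY_USED))      # True
print(is_reuse_candidate(ComponentPhase.TECHNICALLY_OBSOLETE))  # False
```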
The life cycle approach facilitates the assignment of a lifetime to each asset. This can be based on estimates derived from the perceived life times of systems that use the component and other interacting components. Context diagrams like the one that was used above can be used to define those systems that will affect the lifetime of the software asset. The analysis might consider questions such as:
I want to consider the example of a decision of whether to use a relational or object-oriented database for a financial accounting system. I am going to use the design pattern description format presented above to document the issues. I am NOT suggesting that the information constitutes a design pattern, merely that this is a comprehensive framework for capturing and organizing the relevant information.
Problem: The financial accounting system allows several levels of financial service people to access information about the financial transactions of a company. The system retains information about financial transactions for at least the seven years required by the Internal Revenue Service. The system being developed could use either a relational or an object-oriented database.
Context: The system is being designed using an object-oriented approach and implemented using Java. The information is stored in objects while it is manipulated by the application. It can be stored in any format provided it is translated into objects.
Forces: Figure 4 presents a "House of Quality" diagram from Quality Function Deployment [1]. The vertical list on the left represents the forces against which the "solutions" must be evaluated. Each is accompanied by a priority assigned by the development team. The horizontal list across the top represents the design choices. In this case I am going to look at four individual products, but they are arranged to also provide a comparison of the object-oriented (the leftmost pair) and relational models.
The body of the matrix contains ratings that could be derived in any number of ways. (These are not real numbers. I don’t want to spend time defending specific results.) The use of 9, 3, and 1 as weighting values separates the ratings sufficiently to reflect major differences. This is realistic: the user of the form cannot express very small differences, and this type of analysis is really only accurate at a macro level.
Figure 4 – House of Quality
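The arithmetic behind such a matrix is a priority-weighted sum for each design choice. The following Python sketch shows the calculation only: the forces, priorities, product labels, and ratings are all invented for illustration and are not the values from Figure 4.

```python
# Hypothetical forces with team-assigned priorities (higher is more important).
priorities = {
    "fit with object design": 5,
    "seven-year retention": 4,
    "ad hoc reporting": 3,
    "licence cost": 2,
}

# Hypothetical ratings on the 9 / 3 / 1 scale for four unnamed products.
ratings = {
    "OO product A": {"fit with object design": 9, "seven-year retention": 3,
                     "ad hoc reporting": 3, "licence cost": 3},
    "OO product B": {"fit with object design": 9, "seven-year retention": 3,
                     "ad hoc reporting": 1, "licence cost": 3},
    "Relational product A": {"fit with object design": 1, "seven-year retention": 3,
                             "ad hoc reporting": 9, "licence cost": 3},
    "Relational product B": {"fit with object design": 1, "seven-year retention": 3,
                             "ad hoc reporting": 9, "licence cost": 1},
}


def weighted_score(priorities: dict, product_ratings: dict) -> int:
    """Sum of priority times rating over every force; a missing rating counts as 0."""
    return sum(p * product_ratings.get(force, 0) for force, p in priorities.items())


scores = {name: weighted_score(priorities, r) for name, r in ratings.items()}
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score}")
```

With these invented numbers the two object-oriented products score 72 and 66 and the two relational products 50 and 46. Because the ratings are so coarse, only gaps of that size should be read as meaningful.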
Solution: Based on the information produced by the analysis of the forces, it is decided to use the Poet database. Notice that the analysis makes it quite clear that either of the databases using the object-oriented model resolves more of the forces than those using the relational model.
Resulting Context: After applying this solution, other decisions become necessary. The types of locking and the granularity of transactions must be determined in other design decisions. The result of applying this solution is a decision that does not need to be reevaluated until there is either a major upgrade of Poet that will require extensive rework of the application or until another type of model is introduced to structure persistent stores.
High-quality software only comes from high-quality design decisions. However, what is a good decision today may not be a good one tomorrow. By documenting decisions and identifying boundaries at which these decisions should be reconsidered, project teams can manage decisions more effectively. The design pattern descriptive format can be used to structure the information about a decision. A number of techniques are available to assist in these decisions. Quality Function Deployment provides a systematic approach to evaluating the relative weightings of forces that determine the ultimate decision. This technique recognizes that not all the forces involved in a decision are of equal importance. The Y2K phenomenon should be ample warning that periodic introspection of fundamental decisions is required to prevent today’s reasonable assumption from becoming tomorrow’s professional embarrassment.
1. Fatemeh Zahedi, Quality Information Systems, Boyd and Fraser, 1995.