Testing Analysis and Design Models

Testing Models: the Requirements Model

John D. McGregor

When was the last time you left a design review session feeling like you had really improved the quality of the system you were developing? Reviews are so human-intensive that most projects seek to minimize the time allocated to them, because we can seldom answer that question in the affirmative. One reason is that, while design reviews may follow a structured agenda, a design is seldom systematically and completely examined with respect to its requirements. Often the model itself, rather than some external, reliable source, determines the content of the review. As a result, we seldom have a feeling of closure from a review session.

One technique for reducing major code churn in a project that is using heavy doses of concurrent engineering is to rely on easily modified models as the system representation. The project can then delay coding until the requirements and architecture are firm. The larger the project, the more beneficial this procedure is. But this only works if the models are considered as "real" as the final code. This is seldom the case in our business despite the fact that a growing number of projects write some or all of their code at the touch of a button (from inside a CASE tool).

In the last column I listed a number of areas in the software development process that support quality construction practices. One of those areas was the review and inspection of analysis and design models. In this column and the next, I will describe some techniques that ensure that the models are more completely examined than in the typical review. I will also provide examples of applying the technique to specific models.

Test Vs Review

Design and code reviews are an established part of many development processes. They are very effective at locating certain types of faults in a system. Where they are not effective is in finding something that is not there. That’s not a misprint! Errors of omission are particularly difficult to recognize. Reviews by their very structure are usually biased toward validating the material presented for review rather than searching for those concepts that should be there but that have been omitted.

A test is a comparison between what is and what should be. The product being examined, such as code or a diagram, is constructed from some type of specification for that product. Every test hypothesizes that if the product receives a particular input it will respond in the specified manner. The input is "applied" to the product and the resulting state of the product is evaluated to determine whether the product performed as required. The simulated execution using the test case as input is often where this process becomes labor intensive. More about this below.

What makes a test more powerful than a review is the systematic search of the input space. Although we can’t examine every possible requirement, systematically selecting test cases is far superior to randomly selecting those that we will verify. The search can be defined to explore the most risky or the most frequently used sections of the program or to emphasize those types of errors that are particularly critical to the project.
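One way to make this systematic search concrete is to rank the candidate scenarios by the attribute the project cares about most. The Python sketch below is illustrative only: the Scenario record, the 1 to 5 scales for risk and frequency, and the sample scenarios are all assumptions made for the example, not part of any particular method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    risk: int       # assumed scale: 1 (low) to 5 (high)
    frequency: int  # assumed scale: 1 (rare) to 5 (constant use)


def select_scenarios(candidates, budget, key="risk"):
    """Return the names of the `budget` highest-ranked scenarios for `key`."""
    if key not in ("risk", "frequency"):
        raise ValueError(f"unknown ranking attribute: {key}")
    # Highest score first; ties are broken alphabetically so the result is repeatable.
    ranked = sorted(candidates, key=lambda s: (-getattr(s, key), s.name))
    return [s.name for s in ranked[:budget]]


# Hypothetical candidate scenarios for a banking application.
candidates = [
    Scenario("withdraw cash", risk=5, frequency=5),
    Scenario("print statement", risk=1, frequency=3),
    Scenario("change PIN", risk=4, frequency=1),
    Scenario("check balance", risk=2, frequency=5),
]

by_risk = select_scenarios(candidates, budget=2, key="risk")
by_frequency = select_scenarios(candidates, budget=2, key="frequency")
print(by_risk)
print(by_frequency)
```

The same candidate pool yields different test sets depending on whether the session is aimed at the riskiest or the most frequently used parts of the system.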

Basic Model Testing Procedure

The basic procedure for testing a model is very similar to that for a design review with the addition of steps for generating and executing the explicit test cases. For a given model under test (MUT), there is some reference model that served as the basis for the construction of the MUT. For example, the domain analysis model serves as the reference model for construction of the application analysis model. This reference model is often used as the basis for determining whether the MUT is behaving as specified or not.

Figure 1: Model to Model

Testing steps

Identify the model(s) to be tested. The test should be limited to a single level of model, such as either an analysis or design model. Each model usually comprises multiple diagrams. On most projects of any size, the entire model will not be tested at one time. The portion of the model to be tested should be identifiable and will often correspond to an organizational unit such as a team.

Specify the levels of coverage desired. Coverage is stated in terms of some subset of the model that will be examined. Appropriate levels may come from analysis of the environment within which the system executes or from standards set by the customer or regulatory authorities.

Select people to participate in the review process. This is more important than for traditional code testing. The model testing process often involves manually simulating the MUT’s reaction to its inputs. This simulation is human driven unless the models have been constructed using special tools. The "testers" should include representatives of the group that produced the MUT, representatives of the group that produced the reference model and representatives of the group that will use the MUT as their reference model.

Select the test cases for the MUT. This is a critical step in the technique. Rather than being guided by what is in the MUT, this step supports the systematic selection of what gets reviewed. The specific test cases are usually derived from models constructed earlier in the development process. These tests consist of specific scenarios that will be run through the MUT. For example, the inputs to testing analysis models are derived from the use cases in the requirements model.

"Execute" the scenarios. This amounts to a desk check of the model. The test session involves the participation of the model builders in an interactive process. This is the step that is most specific to the particular model. In the next section I will discuss this further, and I will expand on it in the sections that discuss specific models.

Evaluate the results of the tests. Model tests are typically so tightly incorporated into the development process that this evaluation is continuous throughout the development period. Faults found are relayed to developers more quickly than in traditional testing because the developers provide the simulation. They experience the faults first hand!

Evaluate the process. The last step in any good process is an evaluation of the process itself. Here we can ask questions such as:

What was our yield (faults/page)?
Was this the right set of people to have in the session?
Was this the correct time to schedule the review?

The result is an analysis that can affect the effectiveness of future reviews.

Testing Session

The actual test is carried out in an interactive session that often involves creating diagrams in the development notation. That is, the application of a test case usually is documented by constructing a diagram just as the design process would. In fact, if the development team is using the UML notation and a comprehensive modeling approach, this activity will be an integral part of the development process. For example, once the class diagram and some level of state model have been constructed, a message sequence diagram can be constructed for a use case scenario. As the message sequence diagram is created, the class diagram can be checked to be certain that all messages in the sequence diagram correspond to associations in the class diagram. The sequence of messages shown in the message sequence diagram should also be checked to determine whether it conforms to the allowable sequence shown in the state diagram.
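The two cross-checks described above can be pictured with a small Python sketch. It assumes the diagrams have been transcribed by hand into simple data structures; the class names (Teller, Account, Ledger), the messages, and the states are hypothetical and stand in for whatever the project's own diagrams contain.

```python
# Class diagram: each association is an unordered pair of class names.
associations = {
    frozenset(pair) for pair in [("Teller", "Account"), ("Account", "Ledger")]
}

# State diagram for Account: (current state, message) -> next state.
account_transitions = {
    ("Closed", "open"): "Open",
    ("Open", "deposit"): "Open",
    ("Open", "close"): "Closed",
}

# Sequence diagram for one use case scenario: (sender, receiver, message).
scenario = [
    ("Teller", "Account", "open"),
    ("Teller", "Account", "deposit"),
    ("Account", "Ledger", "record"),
    ("Teller", "Ledger", "audit"),  # no Teller-Ledger association exists
]


def messages_without_association(messages, associations):
    """Return the messages whose sender and receiver are not associated."""
    return [m for m in messages if frozenset((m[0], m[1])) not in associations]


def first_illegal_message(messages, receiver, transitions, start):
    """Replay the messages sent to `receiver` against its state diagram.

    Returns (state, message) for the first message the state diagram does
    not allow, or None when the whole sequence is allowable.
    """
    state = start
    for _sender, target, message in messages:
        if target != receiver:
            continue
        next_state = transitions.get((state, message))
        if next_state is None:
            return (state, message)
        state = next_state
    return None


unassociated = messages_without_association(scenario, associations)
illegal = first_illegal_message(scenario, "Account", account_transitions, "Closed")
print(unassociated)
print(illegal)
```

In a manual session the testers perform the same comparison by eye; the sketch only shows what is being compared with what.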

In a later section I am going to discuss some specific models. In that section I will describe exact activities that are used to structure the examination.

Evaluation Criteria

There are three basic criteria for evaluating a model. Each can be the focus of specific test cases. Later when I discuss specific models, I will give some examples of test cases for each of these criteria.

Correctness

A model is correct if it is judged to be equivalent to some reference standard that is assumed to be an infallible source of truth (an oracle in the testing jargon). The standard often is a human expert who judges based on their knowledge. In particular, in object-oriented software projects, this oracle is often referred to as a domain expert.

Completeness

A model is complete if no required elements are missing. It is judged by determining if the entities in the model describe the aspects of knowledge being modeled in sufficient detail for the goals of the current portion of the system being developed. This judgment is based on the model’s ability to represent the required situations and on the knowledge of experts. In an iterative process, completeness is considered to be relative to the level of maturity and scope of content expected in the current increment.

Consistency

A model is consistent if there are no contradictions among the elements within the model. This may be judged by considering whether the relationships among the entities in the model allow a concept to be represented in more than one way. Other indications include whether it is possible to find contradictions, such as differing cardinalities, in the way the model represents a concept or scenario.

There are actually two levels of consistency we should consider. A diagram is internally consistent if there are no contradictions between the elements in the diagram. A model is internally consistent (same as a diagram being externally consistent) if there are no contradictions between the elements in the different diagrams that comprise the model.
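As a rough illustration of checking consistency across the diagrams of one model, the Python sketch below looks for an association that is drawn with different cardinalities in different diagrams. The diagram names, the class names, and the dictionary layout are assumptions made for the example.

```python
from collections import defaultdict


def cardinality_conflicts(diagrams):
    """Find associations whose cardinality differs between diagrams.

    `diagrams` maps a diagram name to a dict of
    (source class, target class) -> cardinality at the target end.
    """
    seen = defaultdict(dict)
    for diagram, associations in diagrams.items():
        for ends, cardinality in associations.items():
            seen[ends][diagram] = cardinality
    # An association is contradictory when its recorded cardinalities disagree.
    return {
        ends: by_diagram
        for ends, by_diagram in seen.items()
        if len(set(by_diagram.values())) > 1
    }


# Two hypothetical class diagrams that belong to the same model.
diagrams = {
    "accounts view": {("Customer", "Account"): "1..*"},
    "billing view": {
        ("Customer", "Account"): "0..1",
        ("Account", "Branch"): "1",
    },
}

conflicts = cardinality_conflicts(diagrams)
print(conflicts)
```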

Specific models

I obviously don’t have sufficient space to discuss all of the models built on the typical development project nor all of the different notations used for various methods. In this column I am going to discuss testing the requirements model and next month I will focus on a couple of models at very different levels. Throughout I will use the UML as the defining notation. I will also make the same distinction in this discussion between diagrams and models as is made in the UML approach. A diagram is a single type of picture while a model may use several different diagram types to present several views of the system.

Requirements model

An anxious manager asked me the other day if there wasn’t some way that he could test the requirements model before expending lots of effort in designing and implementing based on incomplete or incorrect requirements. My response was "I am glad you asked me that!" Since this is the very first model created in a project, there is no formal reference model to use to evaluate the use case model. But, since it is the first model, it is quite important that it be validated. I will describe a technique in which the requirements writers and other domain experts will serve as our "input model" upon which to base our tests. The remainder of this discussion will be organized around the process outlined above.

Identify the model(s) to be tested. The requirements model in an object-oriented project comprises the use case diagram and the actor diagram. The purpose of this model is to define what the system is intended to do. It is useful to test this model since an error here has consequences over the entire development life cycle. It may also be one of the most difficult to "test". The discussion here will assume that the use cases have been decomposed using the uses and extends relations. The test of this model might be automated if an executable requirements model has been constructed; however, since this is still the exception rather than the rule, I will use a manual testing approach.

Figure 2: Use Case model

Specify the levels of coverage desired. The level of test coverage for the use case model is determined by the degree to which the actor model is covered. That is, how representative of the user community is the pool of reviewers? In Figure 2, the actor diagram illustrates the equivalence classes of system users and the distinct testing perspectives represented by each equivalence class. I suggest that there be a person for each equivalence class of user to ensure adequate representation. But the most important, and most difficult, selection criterion is to find representatives who will commit sufficient time to do an adequate job.
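Coverage of the actor model can be expressed as a simple ratio. The Python sketch below is one possible way to compute it, assuming each reviewer has been tagged with the single actor equivalence class he or she represents; the actor and reviewer names are hypothetical.

```python
def actor_coverage(actor_classes, reviewers):
    """Return (fraction of actor classes represented, sorted missing classes).

    `actor_classes` is the set of equivalence classes in the actor diagram;
    `reviewers` maps a reviewer's name to the actor class they represent.
    """
    if not actor_classes:
        raise ValueError("the actor diagram defines no actor classes")
    represented = set(reviewers.values()) & set(actor_classes)
    missing = sorted(set(actor_classes) - represented)
    return len(represented) / len(actor_classes), missing


# Hypothetical actor classes and reviewer pool.
actor_classes = {"Operator", "Supervisor", "Auditor", "Maintainer"}
reviewers = {"Ana": "Operator", "Ben": "Operator", "Chen": "Auditor"}

coverage, missing = actor_coverage(actor_classes, reviewers)
print(coverage, missing)
```

Two reviewers from the same equivalence class add no coverage, which is why the pool above covers only half of the actor model.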

Select people to participate in the review process.

For a test of requirements, the three audiences that should be represented in the testing session are requirements writers, development personnel and users. The "users" may be represented by experts within the company who focus on the domain, representing a breadth of knowledge, rather than specific products. These people are valuable for judging the correctness and consistency of the requirements. The requirements and development personnel are in a better position to judge the completeness of the requirements model relative to the product being developed. The subject matter experts will raise an issue if they see missing concepts, but the requirements writers can judge whether the missing concept falls within the scope of the application.

Select the test cases for the MUT.

This is the most difficult step to motivate project personnel to complete and managers to tolerate. Many developers feel that they "know" what should be in the requirements model and don’t need to spend the time. My experience has shown that developers make assumptions in the absence of explicit requirements. These assumptions are usually based on experience from previous projects. Often the features about which assumptions are made are just the ones being changed in the new product.

A test case is a scenario similar to a use case. The test cases are constructed by those reviewers who have not created the original requirements model. The test case writers should select situations that are as unusual as possible while still being realistic requirements. This provides one basis for determining the completeness of the model. The scenarios should be very specific and should be constructed before the reviewers examine, and possibly become biased by, the MUT.

"Execute" the scenarios.

The requirements model is a difficult one to consider for execution. I have had success using the message sequence diagram notation to represent any algorithm. For the requirements model, each vertical line corresponds to a use case fragment. The example in Figure 3 shows three use case fragments that are related by the uses relation. The testers should be able to represent each test case using this technique and the decomposed use case model.

Figure 3: Message sequence chart

Evaluate the results of the tests.

A correctness fault in the requirements model would be evidenced by situations in which a valid use is incorrectly described. Perhaps no mention is made of data that must be provided as part of the use or a required pre-condition use is omitted.

An obvious completeness fault would be a valid use of the system that is not included in the use case model. An even more critical error is the omission of a valid actor from the actor diagram. This can cause the omission of a large number of required uses of the system. A third type of completeness fault would occur when a sub-use case is not available to fulfill a uses relation required by a system level use case.
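The three kinds of completeness fault can be pictured as set comparisons between what the reviewers expect and what the model contains. The Python sketch below assumes the use case model and the reviewers' expectations have been written down as plain sets and dictionaries; every name in it is hypothetical.

```python
def completeness_faults(model_use_cases, model_actors, uses,
                        expected_uses, expected_actors):
    """List completeness faults in a use case model.

    model_use_cases, model_actors: sets taken from the model under test.
    uses: dict mapping a use case to the use cases it uses.
    expected_uses, expected_actors: sets named by the reviewers.
    """
    faults = []
    # A valid use of the system that the model does not include.
    for use in sorted(expected_uses - model_use_cases):
        faults.append(f"missing use case: {use}")
    # A valid actor that is absent from the actor diagram.
    for actor in sorted(expected_actors - model_actors):
        faults.append(f"missing actor: {actor}")
    # A uses relation whose target sub-use case is not available.
    for source in sorted(uses):
        for target in uses[source]:
            if target not in model_use_cases:
                faults.append(f"unfulfilled uses relation: {source} -> {target}")
    return faults


faults = completeness_faults(
    model_use_cases={"Withdraw cash", "Validate card"},
    model_actors={"Customer"},
    uses={"Withdraw cash": ["Validate card", "Dispense cash"]},
    expected_uses={"Withdraw cash", "Transfer funds"},
    expected_actors={"Customer", "Service technician"},
)
for fault in faults:
    print(fault)
```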

A consistency fault would be evidenced if two uses handle similar user requests but in very different ways. The chances of this type of error occurring are greatly reduced if the use model has been properly structured. However, a diagram in which the use cases are decomposed increases the possibility that the model may not be complete because the use case descriptions are no longer atomic.

The test process that I presented earlier is administered by the system test team supplemented by a group of users, or surrogates for the users such as system engineers who are responsible for writing the system requirements. Users can be aggregated into groups having related uses of the system much as I have previously described constructing equivalence classes for input data. In Figure 2, the two operators having similar use of the application can be grouped and one chosen to represent that perspective on the system. The representative of each user equivalence class can be selected for the test group based on convenience considerations such as who has a tight deadline, who is located near the testing team and other factors including:

A need for breadth of representation,
A need to give priority to critical uses and/or users, and
A need to explore areas most likely to contain faults.

This group reviews the model by having each user examine those use cases related to their use of the system using the three criteria discussed above.

Each "tester" would examine the model for completeness by determining that all their uses of the system were represented in the model. Each tester would examine the full text of the use cases that describe their uses to determine that the use case describes each correctly. Finally, the group of testers would examine the model to determine whether uses that are similar, yet have different users, are handled consistently in the model.

"Passing" this test would be achieved if the reviewers cannot identify any missing, conflicting or incorrect uses of the system. This conclusion should be reached only after each user representative has verified that all their required uses have been accounted for in the model.

Evaluate the process.

There are a number of techniques that can be used to evaluate the quality of the testing process and to identify improvements that should be made. Below I will very briefly discuss a few ideas.

How effective was the testing session? The yield in this type of review can be based on the number of comments made per page of document. This would be a very crude measure. A refinement on this would be to count only those comments that result in a modification to the MUT. A manager might prefer to evaluate the number of modifications per hour. The basic idea is to use a measure for which an increase relates directly to the purpose of the activity. This is as opposed to a measure such as pages reviewed per hour in which an increase or decrease does not relate directly to the real purpose of the activity: finding faults!
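The yield measures mentioned above are easy to compute once the comments from a session have been classified. The Python sketch below assumes each comment is recorded as a boolean that is True when the comment led to a modification of the MUT; the figures in the example are invented.

```python
def review_yield(comments, pages, hours):
    """Compute crude and refined yield measures for one review session.

    `comments` holds one boolean per comment: True when the comment
    resulted in a modification to the model under test.
    """
    if pages <= 0 or hours <= 0:
        raise ValueError("pages and hours must be positive")
    modifications = sum(1 for changed in comments if changed)
    return {
        "comments_per_page": len(comments) / pages,          # very crude
        "modifications_per_page": modifications / pages,     # refined
        "modifications_per_hour": modifications / hours,     # manager's view
    }


# Invented session: 40 comments, 12 of which changed the model,
# on a 20-page document reviewed in 4 hours.
comments = [True] * 12 + [False] * 28
measures = review_yield(comments, pages=20, hours=4)
print(measures)
```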

Was the review scheduled at the correct time? I analyzed the results of one review recently that occurred immediately after a major shift in the application’s architecture. In fact, the major change occurred within the time when the document was frozen for comment. The decision was made to continue the review in spite of the change. Approximately half of the comments made during the review were because of this change! The yield of the review was very low because of this preoccupation with problems that the creators of the model fully understood and would fix as soon as the document was unfrozen.

Were the correct people involved in the review? This testing approach requires knowledgeable participants since many of them will provide input into the symbolic execution of the model. Incorrect test results can be the result of incorrect input rather than an incorrect model. The process also requires judgment about when the answer is correct as well as complete and consistent.

Summary

The systematic testing of analysis and design models is a labor-intensive exercise that is highly effective. The technique can be made more efficient by a careful selection of use cases to serve as test cases. Depending upon the selection technique, the faults that would have the most critical implications for the system or those that would occur most frequently can be targeted. Defects identified at this level affect a wider scope of the system and can be removed with less effort than defects found at the more detailed level of code testing. The activities I described here work closely with the typical development techniques used by developers following an object-oriented method. By integrating these techniques into the development process, the process becomes self-validating in that one development activity identifies defects in the products of previous development tasks. Testing becomes a continuous process that guides development rather than a judgmental step at the end of the development process. Several of these techniques can be automated, further reducing the effort required. The result is an improved system which ultimately will be of higher quality and in many cases at a lower cost due to early detection.

References

Martin Fowler. UML Distilled, Addison-Wesley, Reading, Massachusetts, 1997.

John D. McGregor. Planning for Testing, Journal of Object-Oriented Programming, February 1997.

John D. McGregor. Quality - Thy name is not testing, Journal of Object-Oriented Programming, March 1998.