The Pentium Chronicles
This is a set of observation/discussion questions about Bob Colwell's
The Pentium Chronicles and links for further information.
The book was assigned reading for a number of semesters in CPSC 3300,
Introduction to Computer Organization, at Clemson University.
Bob Colwell
"If my readers walk away with only one idea from the book, I'd hope
it's this one: the essence of engineering is the art of the
compromise. There is no great design without constraints, and
successfully juggling those constraints against the absolute project
goals is the engineering primary job. Just about every topic I
raise in that book is a description of how I tried to find the
right balance point between two or more competing influences."
- Bob Colwell (personal correspondence, with his permission to quote,
Oct. 9, 2006)
Foreword
Wen-mei Hwu
- Compare Wen-mei Hwu's statements that parallelism should be
"under the hood" in order to simplify programming to the
recent trends toward multithreaded and multicore processors. (p. xi)
- ** The ultimate principles of great project management are listed
on p. xiii as
- Acquire the best minds.
- Organize them so that each of them operates within their
comfort zones.
- Protect them from the forces of mediocrity.
- Help them to take calculated risks.
- Motivate them to tie up loose ends.
Were these principles evident in the DG Eagle project? Explain.
** - marks a question that refers back to
The Soul of a New Machine
Preface
Fred Pollack
- What was the importance of Fred Pollack's leadership? (p. xvii)
- ** Was there equal leadership involved in the Eagle Project?
Which DG leader(s) provided it?
Chapter 1 - Introduction
P6 architecture team working under Bob Colwell:
- Glenn Hinton
- Dave Papworth
- Michael Fetterman
- Andy Glew
Also, Randy Steck was the P6 design manager, and Gurbir Singh was
the chief bus designer (Singh's MS degree is from Clemson ECE).
- What is "proliferation thinking"? (pp. 3-4)
- Why should a project not begin until the right leadership is in
place? (p. 7)
- Identify and explain the four major phases of a project,
according to Colwell. (pp. 7-8)
- ** Identify dates for (or events that mark) the concept, refinement,
realization, and production phases for the Eagle project.
Chapter 2 - The Concept Phase
- Why was the dual-die approach a potential design mistake? (p. 12)
- "What constitutes success for this project?" Why is this question
important, and what was the answer for the P6? (p. 13)
- Why did meeting in a storage room help? (p. 18)
- Why should you keep a record of the "roads not taken"? (p. 19)
- Explain the acronym DFA. What did the DFA tool do? (pp. 21-24, see
also the Appendix, pp. 173-178)
- Why can't all teams be compared equally? (p. 28)
- Should you overpromise or overdeliver? What is Colwell's advice?
(p. 29)
- Colwell applies this promise/deliver choice to a microprocessor
design task. Does a similar choice apply to predicting schedules
for software projects? Explain.
- ** Did Tom West choose to overpromise or overdeliver on the
Eagle project? Does your answer depend on his target audience,
and does your answer depend on whether it was regarding performance
or schedule? Explain. (E.g., "'We're always assuming that things'll
break right for us,' observed Alsing." p. 117 in Kidder)
- How long are typical companies' roadmaps (i.e., planning horizons)?
(pp. 31-33)
- Explain why software vendors look "sideways or backwards". (p. 34)
- What is likely to happen if management treats team members as
"field-replaceable units"? (p. 35)
- Explain the effect of geographic distance on "us-versus-them"
conflicts. (p. 39)
- Isn't cubicle floorplanning too much micro-management? Explain
how it helped to do this kind of planning in the early part of
the P6 project. (pp. 39-40)
- Explain how Colwell solved the coding standard conflict. (p. 42)
[I think he confuses the functions of Unix commands "indent" and
"lint". Both are mentioned in the P6 coding standards document.]
- Google C++ Style Guide
Chapter 3 - The Refinement Phase
- What can happen when a team gets done early? Would a team leader
be tempted to sandbag to avoid this? (p. 44) How would you
counteract this?
- What can happen when a team leader wants to show how self-sufficient
he or she is? (p. 44)
- How did writing the behavioral model help? (pp. 49-51)
Was it worth it? (Or is Colwell being defensive in the face of
complaints that it "wasn't worth it"?)
- What is a POR? (see pp. 14-15) Should it be on-line? Explain.
- What is an ECO? Should they be electronically tracked? Explain.
- What went wrong with ECOs in managing the Willamette project? (pp. 53-54)
- What is the role of an ECO czar? (p. 57)
- What was the benefit of video-taped talks by the architects? (p. 60)
- Why should you not "make an example" of someone who makes a design
mistake? (p. 62)
- Explain how an overzealous desire to measure the project schedule
can lead to more bugs. (p. 63) What can you do to counteract this?
- Explain how bug measurement can sometimes lead to undercounting bugs.
(p. 66) What can you do to counteract this?
- Explain why late changes are typically more error prone. (p. 67)
- What are the differences between the following terms?
- bug sighting
- anomaly
- confirmed bug
- Why shouldn't a design engineer be allowed to declare that a
bug is fixed? (p. 69)
- ** Why is debugging a microprocessor in the 1990s harder than
debugging a superminicomputer in 1980?
- What did the P6 architects add to the hardware to make the
design easier to debug? Compare the hardware help(s) to
software techniques such as
- gdb trace back of call stack
- program events (e.g., calls and parameters) written to
a log file on disk or to a circular buffer in memory
(sometimes called a "flight recorder")
- other software techniques
- "Foster a design culture that encourages an emotional attachment by
the designer to the product they are designing (not just their part
of the product). But engineers must also be able to emotionally
distance themselves from their work when it is in the project's best
interests." (p. 72)
Explain what this means and how it should affect a design review.
- ** Did Tom West foster within the DG Eagle team the emotional
attachment-but-ability-to-distance that Colwell recommends? Explain.
- How can a design review become an emotional minefield? (p. 75)
- At what point in a project should design reviews start? (p. 75)
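The debugging question above mentions writing program events to a circular buffer in memory, sometimes called a "flight recorder". As a point of comparison with the P6 hardware aids, a minimal software sketch in C might look like the following. All names (flight_recorder, fr_log, fr_get, fr_dump, FR_CAPACITY) and the tiny buffer size are assumptions made for illustration; nothing here is taken from the book or from the P6 design.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define FR_CAPACITY 4   /* hypothetical size; a real recorder would be much larger */
#define FR_MSG_LEN  48  /* hypothetical maximum message length, including the NUL */

/* One recorded event: a sequence number plus a short text message. */
struct fr_event {
    unsigned long seq;
    char msg[FR_MSG_LEN];
};

/* Fixed-size circular buffer: once full, each new event overwrites the oldest. */
struct flight_recorder {
    struct fr_event slots[FR_CAPACITY];
    unsigned long next_seq;  /* total number of events ever recorded */
};

void fr_init(struct flight_recorder *fr)
{
    assert(fr != NULL);
    memset(fr, 0, sizeof *fr);
}

/* Record an event; messages longer than FR_MSG_LEN - 1 are truncated. */
void fr_log(struct flight_recorder *fr, const char *msg)
{
    assert(fr != NULL && msg != NULL);
    struct fr_event *e = &fr->slots[fr->next_seq % FR_CAPACITY];
    e->seq = fr->next_seq++;
    strncpy(e->msg, msg, FR_MSG_LEN - 1);
    e->msg[FR_MSG_LEN - 1] = '\0';
}

/* Number of events currently retained (at most FR_CAPACITY). */
size_t fr_count(const struct flight_recorder *fr)
{
    return fr->next_seq < FR_CAPACITY ? (size_t)fr->next_seq : FR_CAPACITY;
}

/* Return the i-th retained event, oldest first, or NULL if i is out of range. */
const struct fr_event *fr_get(const struct flight_recorder *fr, size_t i)
{
    size_t n = fr_count(fr);
    if (i >= n)
        return NULL;
    unsigned long oldest = fr->next_seq - n;
    return &fr->slots[(oldest + i) % FR_CAPACITY];
}

/* Print the retained history, oldest event first. */
void fr_dump(const struct flight_recorder *fr, FILE *out)
{
    for (size_t i = 0; i < fr_count(fr); i++) {
        const struct fr_event *e = fr_get(fr, i);
        fprintf(out, "[%lu] %s\n", e->seq, e->msg);
    }
}
```

A program would call fr_log at points of interest (function entry, parameter values) and call fr_dump after a failure, or inspect the buffer from a debugger. Only the most recent FR_CAPACITY events survive, which keeps the memory cost bounded at the price of losing older history.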
Chapter 4 - The Realization Phase
- What is bad about a too-early start to the realization? (p. 81)
- What is the "essence of engineering"? (p. 81)
- Why was the 1-percent rule important? (pp. 81-82)
- What was the purpose of starting the MAS documents early
in the refinement phase? (p. 83)
- Why is "pipelining" the architects not a good idea? (p. 83)
- What is Colwell's version of "pinball"? -- If you do not ____,
you do not get to keep playing. (p. 85)
- Why do performance monitoring and testability need a special czar?
(p. 89)
- What is gratuitous innovation, and why is it considered
harmful? (pp. 89-90) ** How much did gratuitous innovation
influence Josh Rosen in his work on the ALU board for the DG Eagle?
- Why would upper management not be supportive of adding new tests
to the validation plans as validators learned more about the
design? (p. 91)
- see Figure 2 ("Bugs down over time = manager bonus.")
and Figure 3 ("No bonus.") in Al Bessey, et al.,
"A Few Billion Lines of Code Later: Using Static Analysis to
Find Bugs in the Real World," CACM, Feb. 2010
- "Representative story: At company X, version 2.4 of the tool
found approximately 2,400 errors, and over time the company
fixed about 1,200 of them. Then it upgraded to version 3.6.
Suddenly there were 3,600 errors. The manager was furious for
two reasons: One, we 'undid' all the work his people had done,
and two, how could we have missed them the first time?"
- in the Bessey article, see the discussion of tool "upgrade models",
including choices such as "Never before a release"
- How did Colwell's HOTM metric help "manage his manager"? (p. 92)
- What is a "shoot the engineers" point in a project? (pp. 92-93)
- What did Colwell do when Bentley reminded him that validators
"don't put the bugs in"? (p. 93)
- What are the reasons Colwell gives for big companies fielding
multiple design teams? (p. 94)
- ** Compare Colwell's thoughts regarding a request for a new
simulator for Willamette to West's thoughts about a simulator
for Eagle. (p. 97)
- Why would Colwell sometimes "sit on" company standardization
efforts? (pp. 97-98)
- What motivates a team to go "above and beyond"? (pp. 99-100)
- ** Compare Colwell's approach to the Tom West quote:
"No one ever pats anyone on the back around here. That's how it works."
- Why did Colwell at one point decline extra head count? (pp. 105-106)
- Compare Colwell's decision to decline extra head count with
"Brooks's Law" in Fred Brooks, The Mythical Man-Month:
"Adding manpower to a late software project makes it later."
- Why did Colwell at one point request extra head count? (p. 106)
What was the difference in the respective tasks? How much
communication was required among the respective teams for the
tasks?
- Why did the "estimate of work accomplished" and "estimate of
effort remaining" curves never converge? (Fig. 4.1 on p. 107,
also p. 110)
- Why should a "crunch phase" be limited to six months? (pp. 110-111)
- ** How long was the Eagle project's crunch phase?
- Colwell did not like being pressured by his design manager on the
Willamette project (pp. 110-111). Yet he pushed people underneath
him at times (e.g., Bentley, p. 93). Is Colwell being hypocritical,
or is there value in his opinion that the architects should
have "some time to think"?
- What were the three issues that caused Colwell to stop the project
and devote efforts to optimize? (p. 82, p. 95, and p. 113)
Note: "tapeout" is the point in the project when the design is
released to manufacturing.
(see, e.g., Wikipedia)
Chapter 5 - The Production Phase
- What is the outcome of the concept phase? The refinement phase?
The realization phase? (p. 8, p. 81, and p. 115)
- What are the barriers faced in driving a new design into production?
(pp. 116-117) See, e.g.,
- How does pre-silicon testing of cache SRTL differ from post-silicon
testing of the actual cache hardware? (p. 117)
- What is a speed curve? (p. 118)
- see D. Ditzel,
"A Conversation with Dan Dobberpuhl,"
ACM Queue, Dec. 5, 2003
DITZEL: When you build chips, do they all come out at the same speed and
the same power?
DOBBERPUHL: Don't we wish? The semiconductor process variation is quite
wide and follows a statistical distribution that designers target
anywhere from three to six sigma points of the variations, depending
on what kind of yield they want to get. In terms of frequency, it's
not unusual to have a 40 or 50 percent spread from the slowest to
fastest coming out of the same process line.
- What was the purpose of the "war room"? List some of the issues that
were handled. (pp. 119-120)
- Explain the three reasons Colwell cites why you might choose not to
fix a bug. (see the footnote on p. 120)
- What are test vectors? (p. 122)
- Identify and discuss some performance tuning tips for the P6.
- Was the choice to favor 32-bit performance over 16-bit performance
a good decision? If so, why was Colwell subject to so much
criticism? Explain your answer. (pp. 124-127, also see p. 27)
- "Intel's Rivals Ready to Exploit P6 Weakness," Byte, October 1995
[link broken]
(from "Poor, Poky P6" illustration)
"BYTE's cross-platform BYTEmark CPU and FPU benchmarks confirm
that a 90-MHz Pentium outperforms a 150-MHz P6-based system
running in 16-bit code. The performance of the P6 improves over
that of the Pentium when running 32-bit code, as you would expect.
The P6's difficulty with 16-bit code is less pronounced at the
application level, but it's still noticeable. Practically any
100-MHz Pentium-based machine will outrace a 150-MHz P6-based
computer when running Windows 3.1 applications."
- Why should you hold back some design tasks for the engineers who
are finishing up another project? (p. 129)
- Explain the "Windows NT Saga". Why was it a "Dilbert" moment?
(pp. 130-132)
- How did Colwell handle the 16-bit performance criticism at the
rollout? What was his goal? (p. 133)
- What are some of Jerry Weissman's tips on making a presentation?
(pp. 134-136) See also
Chapter 6 - The People Factor
VLSI Design Engineer
Intel offers you the opportunity to develop world-class expertise
in architecture, logic or circuit design. You may also choose to work
in product teams where you will acquire broad skills at all levels of
design. The primary focus of your job will be specifying and implementing
effective design solutions.
Intel Job Description on www.intel.com, 1997
- Describe the recruitment process. Why did they need to do a
"strcpy test"? (pp. 138-139)
- What qualities did Colwell identify as typical in a "fast tracker"?
(pp. 139-140, see also Colwell's self-description on p. 167)
(Cf. Rasala wanting engineers "who took an interest in the entire
computer", p. 150 in Kidder)
- See the work by Robert Kelley, e.g., as described in
Alan Webber,
"Are You a Star at Work?" fastcompany.com, May 1998.
- study at Bell Labs to determine the characteristics that
could predict the star engineers - found that IQ, school
attended, etc., were not good predictors
- "The secrets to being a star are not in people's personal
characteristics but in how people go about doing their work."
- take initiative
- beyond your job description
- something that will help the company succeed
- something that will help other people (not just something
that will get you noticed)
- often involves risk
- see it through to completion
- earn your way into a network of expertise
- must be willing and able to help others with your expertise
- like a bartering system, you must bring something with you
to trade
- self-management
- time, personal relationships, career skills and growth
- perspective
- understand your job within the context of the company
- see problems from different viewpoints (e.g., competitors,
customers, managers, team members)
- recognize patterns
- ability to be a follower and help leaders accomplish company goals
- team work
- be willing to accept group commitments, schedules, etc.
- build consensus; help reduce team conflict
- understand the company organization and competing interests
within the company
- determine who to trust and who to avoid
- determine which issues within the company will become
important and which should be ignored
- communication skills
- match communication to audience
- don't overcommunicate
- "The good news is, all of these skills can be learned."
- a York College of Pennsylvania survey on professionalism, 2009
- characteristics of professionalism
- "personal interaction skills, including courtesy and respect"
- "the ability to communicate, which includes listening skills"
- "a work ethic which includes being motivated and working on a
task until it is complete"
- "appearance"
- valued traits
- "accepts personal responsibility for decision and actions"
- "is able to act independently"
- "has a clear sense of direction and purpose"
- "Many companies favor job candidates with stellar academic
records from prestigious schools - but AT&T and Google have
established through quantitative analysis that a demonstrated
ability to take initiative is a far better predictor of high
performance on the job." from
T.H. Davenport, J. Harris, and J. Shapiro
"Competing on Talent Analytics,"
Harvard Business Review, October 2010
- "The seven top characteristics of success at Google are all
soft skills: being a good coach; communicating and listening
well; possessing insights into others (including others
different values and points of view); having empathy toward
and being supportive of one's colleagues; being a good
critical thinker and problem solver; and being able to make
connections across complex ideas." from Cathy Davidson
(in a Valerie Strauss column),
"The surprising thing Google learned about its employees - and
what it means for today's students,"
Washington Post, December 20, 2017
- See Srilatha Manne,
"The Road to Success In Industry."
blog post, Computer Architecture Today, Oct. 5, 2017
- What are some of the difficult cases in firing someone? When would
demotion be a good outcome? (p. 141)
- What does it mean to "own your own career"? (footnote on p. 142)
- Compare "constructive confrontation" with "win/win". (p. 144)
- from Cheryl Dahle,
"Is the Internet Second Nature?", Fast Company, June 2001:
"Intel takes ideas seriously too -- so much so that it subjects
them to a trial by fire. The company even includes a segment on
'constructive confrontation' in the training that it offers all
new hires. The class teaches employees how to rip into one another's
ideas without actually ripping into one another. 'We have this
common way to disagree, and that gives us speed,' says Michael Fors,
37, a comanager of Intel University and an instructor for the course.
'We don't spend time being defensive or taking things personally.
We cut through all of that and get to the issues.'"
- from Bill Rosenblatt's review of
Inside Intel:
"Intel's aggressive corporate culture also became the model
for high-tech companies. It's a style they call 'constructive
confrontation.' (Grove, in his book Only the Paranoid Survive,
calls it 'debate.') The culture of constructive confrontation
demands that everyone eschew politeness and restraint when
expressing opinions. People who are shy or who can't take the
heat (like, say, IBM middle-management veterans) could never
survive. Inside Intel tells stories of many meetings
among top management devolving into shouting matches that
verged on physical violence."
- Bob Sutton,
"An Insider's View of Constructive Confrontation at Intel,"
May 2007
- "Constructive Confrontation: When Conflict Enhances Collaboration,"
Wharton@Work newsletter, March 2008
- Stephen Covey,
The 7 Habits of Highly Effective People:
"Win/win is a frame of mind and heart that constantly seeks mutual benefit
in all human interactions. One person's success is not achieved at
the expense or exclusion of the success of others. It's not your way,
or my way; it's a better way or a higher way."
- Guy Burgess and Heidi Burgess,
"Constructive Confrontation: A Strategy for Dealing with Intractable
Environmental Conflicts"
- Explain how OpX can draw a house painter's attention to the scaffolding
rather than the house. (p. 144, see also p. 167)
- Describe iMBO. (pp. 150-151)
- Describe the dangers of a corporate entitlement attitude. (pp. 152-153)
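The recruitment question earlier in this chapter refers to a "strcpy test". Assuming this means asking a candidate to write the C library routine strcpy by hand, an answer could be as short as the sketch below. The name my_strcpy is hypothetical, and the code is an illustration rather than anything reproduced from the book.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy the NUL-terminated string src into dst, including the terminator,
 * and return dst. The caller must provide a destination that is large
 * enough and that does not overlap the source.
 */
char *my_strcpy(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    char *ret = dst;
    /* Copy one character at a time; stop after the NUL has been copied. */
    while ((*dst++ = *src++) != '\0')
        ;
    return ret;
}
```

Like the library routine, this version trusts the caller to supply a destination large enough for the source string and its terminator; discussing that assumption could itself be part of the exercise.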
Chapter 7 - Inquiring Minds Like Yours
- Compare Colwell's feelings regarding the Merced project (pp. 162-163)
with this advice from Fred Brooks in The Mythical Man-Month:
"Plan to throw one away; you will, anyhow."
- (from Brooks, p. 116)
"In most projects, the first system built is barely usable.
It may be too slow, too big, awkward to use, or all three.
... Where a new system concept or new technology is used,
one has to build a system to throw away, for even the best
planning is not so omniscient as to get it right the first
time. The management question, therefore is not whether
to build a pilot system and throw it away. You will
do that. The only question is whether to plan in advance to
build a throwaway, or to promise to deliver the throwaway to
customers. ... Delivering that throwaway to customers buys
time, but it does so only at the cost of agony for the user,
distraction for the builders while they do the redesign, and
a bad reputation for the product that the best redesign will
find hard to live down."
(emphasis in original)
- Note that in the 20th anniversary edition of MMM, Brooks
advocates for an incremental-build approach to software projects
so that user testing can begin early and so that a build-to-budget
strategy can be adopted.
- Is Intel a sweatshop? (pp. 164-166)
- Compare Colwell's thoughts with those of Paul Otellini
recorded in Rich Karlgaard,
"Intel CEO Otellini on Successful Company Culture,"
Forbes, February 16, 2011
In your 36 years, Paul, how has Intel's culture changed - and how
has it stayed the same?
"Instantaneous communication is the biggest change in my career. I can
do a video that instantly goes to Intel's 80,000 employees. Information
has become a competitive advantage. Speed of business is obviously faster.
Ninety percent of the revenue Intel gets in December comes from products
that weren't on the market in January. Another change is work life balance.
Intel used to be a company of 8am check-ins and timecards. But now people
can telecommute, take time out during the day."
Wasn't Andy Grove the author of Intel's notorious time cards?
"Hah, yes. But that changed when we taught Andy email. Now Andy emails
everyone all the time ... (laughs) ... to my eternal regret."
...
How would you summarize the Intel culture?
"Egalitarian. Merit based. That came from Noyce. Anyone can speak in a
meeting, but you must speak with data. That came from Moore. Take risks.
Embrace innovation, but do it with discipline. That's Grove. World-class
manufacturing came from Barrett. I've added a marketing component."
Long term consideration - for discussion during whole book review
Technical info on the P6
Background and technical description of the Intel P6 microarchitecture
More of Bob Colwell's writings and presentations
List of At Random columns in IEEE Computer, 2002-2005
"Engineering Lessons from the Pittsburgh Steelers,"
presentation at Grove City College, April 2009
Oral history of Robert P. Colwell,
interviewed by Paul Edwards, August 2009
"The Chip Design Game at the End of Moore's Law,"
keynote speech at HotChips 25, 2013