Nov 95 Features

P6 Unveiled

We put a P6 through its paces

An Impressive New Processor

[Illustration: The P6 Chip]

The P6's fancy footwork, appetite for 32-bit apps and built-in second-level cache memory make it an impressive new processor. We put a P6 through its paces to give you the first in-depth look at how fast Intel's new chip really is.

by Gil Bassak

Right now, Intel's newest microprocessor, the P6 chip, is a mighty rare commodity. But you'll see it popping up in myriad new personal computers and server products that will be hitting the market any time now. We managed to get our hands on a preproduction, demonstration PC that included a P6 on an Intel motherboard. Our goal was to see if it really delivers the boons it promises for Windows 95 users. We'll have more on those tests later.

Endowed with impressive technical credentials, the 5.5 million-transistor P6 processor is Intel's first and only x86-architecture processor honed for the new 32-bit Windows 95 applications. But these qualities alone don't reveal the fast and fancy processing capabilities the Intel designers built into it.
Nor do they suggest the new applications and markets--some way beyond the desktop--that PC vendors envision once they harness P6 processing power within their products.

Awesome Anatomy

So what makes the P6 special? A microprocessor like the P6 requires much esoteric design, involving various trade-offs to keep the chip both manufacturable and affordable. In the end, however, it's a handful of features that make the P6 noteworthy, including 32-bit-wide processing capacity, high-speed second-level cache and glueless multiprocessing capacity. It also has high-reliability features like error checking and correction, and a trio of clever processing shortcuts collectively called dynamic execution. The result is a 133MHz processor that, for 32-bit applications, reaches an estimated 200 on the SPECint92 benchmark, roughly twice the performance of a 100MHz Pentium processor.

Optimized for a 32-bit world, the P6 actually spins its wheels when running 16-bit applications and operating systems. Intel is very clear on this point: The P6 is not meant for Windows 3.x and the 16-bit programs that run under it. If you haven't upgraded to a 32-bit OS, you are much better off with a 150MHz Pentium processor.

One reason for this is that the P6 gets much of its processing punch from multiple parallel "pipelines," or data streams. Operating independently, these pipelines allow the P6 to move on to other instructions and data during waiting times--called latencies--required by an earlier instruction. To take advantage of this pipelined architecture, today's 16-bit software would have to be recompiled for such "multithreaded" execution by, for example, identifying instructions that might cause latencies. Whatever minimal improvements the P6 might bring to 16-bit software simply wouldn't be worth the added cost.

Instead, P6-based PCs need 32-bit operating systems like Windows 95, Windows NT and OS/2 Warp.
Similarly, application software needs to be written specifically for 32-bit processing in order to utilize the P6's power and capabilities.

Beyond its 32-bit capacity, the P6 stands apart from its predecessors by incorporating its own second-level cache memory. Almost, but not quite on the chip, the 16 million-transistor, 256KB address and data cache sits adjacent to the P6 processor in an unusual dual-cavity package. Expected to come in a 512KB version as well, the cache enjoys unfettered access to the core processor. That access is gained through a dedicated 64-bit "backside" bus that runs at the processor's internal clock rate. This way, the P6's normal connection to the outside world--the "frontside" bus--can be adjusted to one-half, one-third or one-quarter of the internal clock rate (depending on which is appropriate for a specific system's capabilities), while the cache connection continues to scream along at the full clock speed.

Like the Pentium, the P6 has a fully integrated first-level cache that dedicates 8KB each to data and instructions. Both first- and second-level cache structures store the most recently used commands and data, anticipating the statistical likelihood that recently accessed memory will be needed again. In the event of a cache hit, the processor is able to access data more quickly from the cache's high-speed static memory than if it sought it from the slower system memory. In contrast, a miss occurs when the cache does not contain the desired data, forcing the processor to access system RAM. The second-level cache also balances the work loads among the processors in multiprocessor systems.

On the downside, the built-in second-level cache prevents system designers from increasing its size to improve performance. Intel counters this claim by pointing to the high-speed backside bus and the nonblocking feature of both cache levels. Nonblocking means the processor continues to access memory while it processes a cache miss.
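The hit-and-miss behavior described above can be sketched with a toy cache model. This is only an illustration: the direct-mapped organization, the 32-byte line size and the class name `DirectMappedCache` are assumptions made for the example, not details of the P6's actual cache design. Only the 8KB size is borrowed from the article's first-level data cache.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that only counts hits and misses.

    The organization and line size are illustrative assumptions,
    not the P6's real cache design.
    """

    def __init__(self, size_bytes=8 * 1024, line_bytes=32):
        self.line_bytes = line_bytes
        self.num_lines = size_bytes // line_bytes
        self.tags = [None] * self.num_lines  # one stored tag per cache line
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Record one memory access; return True on a hit, False on a miss."""
        block = address // self.line_bytes   # which memory block is touched
        index = block % self.num_lines       # the cache line it maps to
        tag = block // self.num_lines        # identifies the block in that line
        if self.tags[index] == tag:
            self.hits += 1
            return True
        # Miss: the block would be fetched from slower system RAM and kept.
        self.tags[index] = tag
        self.misses += 1
        return False


cache = DirectMappedCache()
# Read the same 4KB region twice, one 4-byte word at a time.
for _ in range(2):
    for address in range(0, 4096, 4):
        cache.access(address)

print(cache.hits, cache.misses)
```

In this run only the first touch of each 32-byte line misses; every later access, including the whole second pass, is served from the cache, which is the "recently accessed memory will be needed again" effect at work.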
As a result, says Intel, tests show that a 512KB cache in the P6 performs as if it had 2MB or 3MB of conventional nonblocking cache.

Four Heads Are Better

The tests Intel refers to were run on a multiprocessor server, which brings us to a P6 feature that system makers especially like: the capacity for glueless multiprocessing among up to four P6 processors. In comparison, only two Pentium processors can be as easily harnessed to work in tandem. However, you'll need special symmetric multiprocessing (SMP) software to access the P6's full multiprocessing power.

In addition to SMP, reliable operation and data integrity are qualities the PC makers seek in adapting the P6 to server and other high-performance applications. Here vendors are reassured by the P6 processor's error checking and correction circuits between the processor core and the second-level cache. This allows the OS to eye the bus to find and fix memory contents gone bad. In addition, built-in diagnostics, fault recovery mechanisms and functional redundancy checks operate among multiple processors.

The real key to the P6's high performance is its dynamic execution of instructions. Normally, a processor follows instructions strictly in the order that it pulls them from memory. Indeed, following a different order would change the program entirely. Dynamic execution, however, adjusts the flow of instructions for fastest execution, jumping ahead, if necessary, to subsequent instructions even before the current task is finished. In the end, the original instruction sequence is preserved. The ultimate aim is to keep information flowing within the P6.

In all, three techniques make up dynamic execution: multiple branch prediction, data flow analysis and speculative execution. Stated simply, multiple branch prediction anticipates branch operations based on past patterns. Branch instructions direct the processor down one of two different instruction paths, depending on a previous event or outcome.
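Guessing a branch from its past behavior can be sketched with a textbook two-bit saturating counter. This is a generic scheme chosen for illustration, not a description of the P6's actual predictor, and the function name `predict_branches` and the sample loop pattern are made up for the example.

```python
def predict_branches(outcomes):
    """Return the fraction of branch outcomes a two-bit counter guesses right.

    `outcomes` is a sequence of booleans: True means the branch was taken.
    Counter values 0-1 predict "not taken"; values 2-3 predict "taken".
    """
    if not outcomes:
        return 0.0
    counter = 2  # start out weakly predicting "taken"
    correct = 0
    for taken in outcomes:
        prediction = counter >= 2
        if prediction == taken:
            correct += 1
        # Nudge the counter toward what actually happened, within 0..3.
        if taken:
            counter = min(counter + 1, 3)
        else:
            counter = max(counter - 1, 0)
    return correct / len(outcomes)


# A loop branch: taken nine times, then not taken on exit, repeated ten times.
loop_pattern = ([True] * 9 + [False]) * 10
accuracy = predict_branches(loop_pattern)
print(accuracy)
```

On this pattern the counter guesses wrong only at each loop exit, so nine of every ten branches are predicted correctly. Real programs are less regular, which is why measured accuracy varies.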
Although the actual branch outcomes aren't always as guessed, they are correct often enough, from 75 percent to 85 percent of the time, to speed the execution.

The second technique, data flow analysis, schedules the instructions for execution as soon as they are ready, rather than by the order in which they appear in the program flow. Once the instruction is executed, its result is available to the processor in much less time than if the processor had to execute it from scratch. The third leg of dynamic execution, speculative execution, anticipates instructions that are likely to be needed.

The end result of all of these features is that the P6, more than any of its predecessors, can sweep through entire sections of a program and set up its resources to make quick work of them in parallel.

The P6 Niche

The P6 may take the personal computer to places that were once the sole domain of expensive minicomputers and high-end workstations: 3-D modeling, engineering and design, software development, and animation and multimedia authoring, to name just a few.

CompuAdd Computer Corp., for example, is working with hardware and software vendors to prepare PCs for several vertical-market systems within the medical, engineering and 3-D graphics fields. Moreover, for the small office/home office (SOHO) market, the company is preparing machines that offer high-end multimedia features. These machines will be bundled with MPEG boards that take full advantage of the P6's capabilities to display film-quality video, free of the distracting delays caused by lesser processors.

Using multiple processors, P6 machines will begin to compete in the workstation market for applications in the financial and business worlds, as well as decision-support functions. High-performance PCs will replace the midrange computers that, in the first downsizing wave, replaced mainframes. With the P6 as a relatively low-cost alternative, cost-conscious companies using midsize computers will certainly take notice.
The move of P6 PCs to midrange computing clearly puts the ball in the court of software vendors, who must develop applications that fully exploit both the new processor's 32-bit and multiprocessing capabilities.

Advanced Web navigators, intelligent agents and games are other applications that will benefit from the P6's advent. In addition to new applications and faster operations, users can expect the P6's high-performance processing to yield enhanced interfaces, like the use of multiple windows, to ease user interaction. Normally, the greater the number of open display windows, the less attention the processor can give to other chores, like responding to requests from a modem or keyboard. The P6, however, runs fast enough to manage three or four Web browsing windows while still attending to the demands of other programs and peripheral devices.

A Life of Service

Scalable network servers may be another slot for relatively low-cost but potent P6 PCs. For example, NEC Technologies will use the chip to expand its Express line of network servers. Other possible destinations for the P6 include Web and rendering servers.

To deliver the horsepower that servers need, up to four P6 processors will run side-by-side in the same machine without the need for "glue" logic circuits. PC makers need only drop the additional processors into awaiting sockets. In addition, the P6 already carries a number of reliability-boosting features usually reserved for servers. These include error checking and correction, built-in diagnostics and fault recovery, and functional redundancy checking among multiple processors.

The P6 multiprocessor will probably be a good fit for high-end server applications, but the high cost of retraining and supporting people after they switch to new and unfamiliar 32-bit applications, as well as the cost of replacing the PCs themselves with upgraded hardware, may slow the processor's move into widespread use.
Although many companies will stick to the Pentium processor in the near term, power users, who tend to need less hand-holding, are good candidates for upgrading early to the P6. Those who do move to the new processor will find a flood of applications next year. An International Data Corp. forecast predicts that 32-bit operating systems will make up 80 percent of the total number of worldwide shipments by the end of 1996.

Although the Windows 95 users who employ true 32-bit applications will see speed gains, the real winners may be those who use a wholly 32-bit environment, such as Windows NT. Although 32-bit Windows 95-based applications will run better on the P6, in general, the Pentium processor is preferable for the more common 16-bit applications.

Old Apps for New

Demanding applications, like software-based desktop videoconferencing and continuous speech recognition, may also tap the P6 for processing power. It remains to be seen if the P6 can pull off such processing-intensive work without the help of separate digital signal processor (DSP) chips.

Intel hopes that its native signal processing (NSP) approach will blossom with the P6. NSP puts certain mathematically intensive computations, normally the work of dedicated DSP chips, squarely on the shoulders of the host processor. Such computations are needed for tasks like manipulating graphics as well as compressing and decompressing video and audio. Intel expects that by hitching NSP technology to the P6 processor, PCs will also be able to tackle authoring and rendering applications without DSPs or accelerators. Payoffs from NSP include lower-cost systems and a higher degree of adaptability, since new compression specifications for graphics, multimedia and videoconferencing can be added by simply downloading software.

That's how the P6 works, along with our take on where you should use it. Now let's turn to the real-world tests we ran to see how the chip does in real life.
We tested to see how the processor stands up against the Pentium and we've gauged its performance for 16- and 32-bit applications. The results follow in "By the Numbers!"

Gil Bassak is a freelance technical journalist living in Ossining, N.Y.

P6 Gets Aboard ATX

Replacing the AT Standard

by Gil Bassak and John Gartner

[Illustration: P6 Gets Aboard ATX]

In addition to redesigning CPUs, Intel has proposed a new motherboard design to replace the AT standard. The new design abandons the popular baby AT form factor for what Intel calls the ATX. The new configuration, according to Intel, is simpler and less expensive (about $11 less in materials cost) than the baby AT design, gives easier access to upgradable components and brings cooling air closer to the processor (lower left of illustration). Other innovations include a single power connector for peripherals, and a drive connector that's closer to the drive bays (upper right of illustration).

But will the industry go along? Many companies, especially cash-strapped PC makers, appreciate Intel's efforts. Intel foots the bill for research while everyone reaps the benefits. But not everyone agrees with Intel's ideas. Deep-pocketed vendors, for example, are more likely to cut their own design path to distinguish their products from others.

It's too early to know how many vendors will build systems around the ATX, but if the past is any measure the verdict will favor Intel. One difficulty, though, is that changing the processor, motherboard and chassis simultaneously may result in design difficulties, unless Intel clearly states which changes need to be made to accommodate the new board.

At least one major manufacturer has decided to create its own solution.
Hewlett-Packard has developed its own motherboard in an effort to make RAM configuration more flexible. HP's desktop system design includes eight memory sockets instead of four, and supports more memory--256MB instead of 128MB using interleaved memory pairs. Other differences from Intel's design include more expansion slots and an optional on-board SCSI interface.

Once the P6 becomes available, though, chances are you'll see the ATX board in all but a few PCs.

Field Trip To A Fab

A world few people have seen

by John D. Ruley

I'm looking through a glass wall into a world few people have ever seen. It's a strange, artificial world, where people wear white "bunny suit" overalls and breathe through face masks and filter cartridges. No, I'm not talking about some alien planet. The place I'm talking about is in Aloha, Ore., a tree-filled suburb some 30 miles from Portland. It's Intel's D-1 fabrication plant (or "fab"), where Intel makes its latest microprocessors.

Touring D-1 is a curious experience. I'm not allowed into the manufacturing space--a gigantic Class-1 clean room that covers 67,000 square feet and runs the length of the building's total 450,000 square feet. Those entering must wear the aforementioned bunny suit, which protects the delicate silicon surfaces from human contamination. No matter how much Intel's engineers want to impress you, they won't risk tainting the clean room.

A tall, friendly, open-faced Intel engineer named Kevin Hassett takes me on my tour--walking outside the room, looking in through double doors that are used only in emergencies. He explains how 200 workers on staggered shifts work three days in one week, four the next to maintain the fab's 24-hours-a-day, 364-days-a-year schedule. They close the plant one day a year to work on its electrical system.

The plant produces delicate 8-inch wafers, each divided into tiny squares less than a centimeter per side. Each square is, potentially, a microprocessor.
Each contains more than 3 million individual transistor switches that measure less than a micron (one millionth of a meter) in size.

Cleanliness Counts

Keeping the wafers clean is vital. A single speck of microscopic dust can destroy a microprocessor. Class-1 means the room is certified to contain fewer than one particle of dirt per cubic meter of air. That's cleaner--by a factor of 1,000--than the rooms NASA uses to assemble space probes.

Wafers, which will eventually be cut up into individual microprocessors, move 25 at a time from station to station in a closed cassette that protects them from the few particles in the fab. Hundreds of wafers move through the facility at any time. Processing a wafer takes weeks.

In a quiet conference room, Dr. Youssef El-Mansy, who directs Intel's Oregon-based technology group, explains the economics of chip production. Doing Class-1 fabrication lets Intel minimize the number of defects on a wafer to about one defect per square centimeter. Trying to reduce defects more than that is pretty much impossible.

Smaller Is Better

That's why the microprocessor's size is critical. If a microprocessor takes up much more than one square centimeter, then most microprocessors would have defects, and the "yield"--the percentage of microprocessors that come off the wafer without defects--would be very small. If the microprocessors are smaller than a square centimeter, more will be defect-free, and the yield will rise. A positive side effect is that smaller microprocessors can be run at higher frequencies--making them faster.

The key to making microprocessors is to keep the wafers clean and make the microprocessors small. Dr. El-Mansy is making them very small. At the end of our meeting, he hands me a plastic keychain. Embedded in it is a defective Pentium processor from the D-1 line. Less than half a centimeter on a side, it contains well over 3 million individual transistor switches.
Working together, those transistors made up one of the most powerful ICs ever invented--but with one speck of dust, it's just a keychain.

By The Numbers!

by John D. Ruley and William Gee

[Chart: 32-bit Low-level Tests]
[Chart: Memory Throughput (Read+Write) Tests]
[Chart: 16-bit Low-level Tests (Wintune 2.0)]
[Chart: SQL Server Tests ("B" Benchmark, SQL Server 6.0)]
[Chart: 16-Bit Application Test (Word 6)]

How does the P6 stack up against other processors? We went to Intel's customer engineering laboratory in Hillsboro, Ore., to find out. While there, we ran a test series of our own on a prototype system configured with the P6. The tests included our standard Wintune 2.0 benchmark and our new 32-bit NTHell 2.0 benchmark. We also ran application tests using several versions of Microsoft Office. Since the P6 will initially be sold largely into the high-performance application server market, our tests also incorporated an unscaled, in-memory client/server database benchmark.

As expected, our overall results showed that the P6 represents a marginal improvement over the Pentium as a platform for today's 16-bit desktop applications. However, the processor's performance is equivalent to a dual Pentium when used for high-performance 32-bit computing. Since a single-processor system based on the P6 is significantly simpler (and therefore both cheaper to manufacture and more reliable) than a dual-processor Pentium, we believe that single-processor P6 systems are well-suited as application servers.
As 32-bit desktop applications become more common, the P6 will migrate to desktop computers as well.

This ability to deliver full performance only when running 32-bit code is reminiscent of how NT behaves when using a RISC processor. However, the P6 has one huge advantage: it runs all applications, be they 16- or 32-bit, natively, without software emulation.

We noted differences in P6 performance when running Windows 95 compared to Windows NT. In particular, we measured a three-to-one difference in video (GDI) performance, probably due to Win95's use of 16-bit code in its graphics subsystem. This indicates that P6 systems will be capable of full performance only when running an uncompromised 32-bit OS such as Windows NT.