1 MPUTERS C O WITHOUT CL CKS O ASYNCHRONOUS CHIPS IMPROVE COMPUTER PERFORMANCE BY LETTING EACH CIRCUIT RUN AS FAST AS IT CAN By Ivan E. Sutherland and Jo Ebergen How fast is your personal computer? When people ask this question, they are typically referring to the frequency of a minuscule clock inside the computer, a crystal oscillator that sets the basic rhythm used throughout the machine. In a computer with a speed of one gigahertz, for ex- ample, the crystal “ticks” a billion times a second. Every action of the computer takes place in tiny steps, each a billionth of a second long. A simple transfer of data may take only one step; complex calculations may take many steps. All operations, however, must begin and end according to the clock’s timing signals. OLIVIER LAUDE SCIENTIFIC AMERICAN 62 AUGUST 2002 62 SCIENTIFIC AMERICAN COPYRIGHT 2002 SCIENTIFIC AMERICAN, INC.
2 ADVOCATE FOR ASYNCHRONY: Ivan E. Sutherland, one of the authors of this article, has been called “the father of computer graphics.” Now the leader of a research group at Sun Microsystems Labor atories, he holds a silicon wafer containing UltraSPARC IIIi processor chips, which use some asynchronous circuits. COPYRIGHT 2002 SCIENTIFIC AMERICAN, INC.
3 Because most modern computers use a single rhythm, we — a chip chronous design to build a data-driven media processor — and Philips Electronics call them synchronous. Inside the computer’s microprocessor for editing graphics, video and audio for two of its pagers. produced an asynchronous microcontroller chip, a clock distribution system delivers the timing signals from Asynchronous parts of otherwise synchronous systems are also the crystal oscillator to the various circuits, just as sound in air beginning to appear; the UltraSPARC IIIi processor recently in- delivers the beat of a drum to soldiers to set their marching troduced by Sun includes some asynchronous circuits developed pace. Because all parts of the chip share the same rhythm, the by our group. We believe that asynchronous systems will be- output of any circuit from one step can serve as the input to any come ever more popular as researchers learn how to exploit their other circuit for the next step. The synchronization provided benefits and develop methods for simplifying their design. Asyn- by the clock helps chip designers plan sequences of actions for chronous chip makers have achieved a good measure of techni- the computer. cal success, but commercial success is still to come. We remain The use of a central clock also creates problems. As speeds a long way from fulfilling the full promise of asynchrony. have increased, distributing the timing signals has become more and more difficult. Present-day transistors can process data so Beat the Clock quickly that they can accomplish several steps in the time that of asynchronous WHAT ARE THE POTENTIAL BENEFITS it takes a wire to carry a signal from one side of the chip to the systems? First, asynchrony may speed up computers. In a syn- other. Keeping the rhythm identical in all parts of a large chip chronous chip, the clock’s rhythm must be slow enough to ac- requires careful design and a great deal of electrical power. commodate the slowest action in the chip’s circuits. If it takes Wouldn’t it be nice to have an alternative? a billionth of a second for one circuit to complete its operation, Our research group at Sun Microsystems Laboratories seeks the chip cannot run faster than one gigahertz. Even though such alternatives. Along with several other groups worldwide, many other circuits on that chip may be able to complete their we are investigating ways to design computing systems in which operations in less time, these circuits must wait until the clock each part can proceed at its own pace instead of depending on ticks again before proceeding to the next logical step. In con- the rhythm of a central clock. We call such systems asynchro- trast, each part of an asynchronous system takes as much or nous. Each part of an asynchronous system may extend or short- as little time for each action as it needs. Complex operations en the timing of its steps when necessary, much as a hiker takes can take more time than average, and simple ones can take less. long or short steps when walking across rough terrain. Some of Actions can start as soon as the prerequisite actions are done, the pioneers of the computer age, such as mathematician Alan without waiting for the next tick of the clock. Thus, the sys- M. Turing, tried using asynchronous designs to build machines tem’s speed depends on the average action time rather than the in the early 1950s. Engineers soon abandoned this approach in slowest action time. favor of synchronous computers because common timing made Coordinating asynchronous actions, however, also takes the design process so much easier. time and chip area. If the efforts required for local coordination Now asynchronous computing is experiencing a renaissance. are small, an asynchronous system may, on average, be faster Researchers at the University of Manchester in England, the Uni- than a clocked system. Asynchrony offers the most help to ir- versity of Tokyo and the California Institute of Technology have regular chip designs in which slow actions occur infrequently. demonstrated asynchronous microprocessors. Some asynchro- Asynchronous design may also reduce a chip’s power con- nous chips are already in commercial mass production. In the sumption. In the current generation of large, fast synchronous late 1990s Sharp, the Japanese electronics company, used asyn- chips, the circuits that deliver the timing signals take up a good chunk of the chip’s area. In addition, as much as 30 percent of Overview/ Cloc kless Systems the electrical power used by the chip must be devoted to the Most modern computers are synchronous: all their ■ clock and its distribution system. Moreover, because the clock operations are coordinated by the timing signals of tiny is always running, it generates heat whether or not the chip is crystal oscillators within the machines. Now researchers doing anything useful. are designing asynchronous systems that can process In asynchronous systems, idle parts of the chip consume neg- data without the need for a governing clock. ligible power. This feature is particularly valuable for battery- SPARC ARE REGISTERED TRADEMARKS OF SUN IN THE U.S. AND OTHER COUNTRIES Asynchronous systems rely on local coordination ■ powered equipment, but it can also cut the cost of larger systems LTRA circuits to ensure an orderly flow of data. The two by reducing the need for cooling fans and air-conditioning to pre- most important coordination circuits are called the vent them from overheating. The amount of power saved de- Rendezvous and the Arbiter. pends on the machine’s pattern of activity. Systems with parts ■ The potential benefits of asynchronous systems that act only occasionally benefit more than systems that act con- include faster speeds, lower power consumption and less tinuously. Most computers have components, such as the float- radio interference. As integrated circuits becom e ing-point arithmetic unit, that often remain idle for long periods. more complex, chip designers will need to learn Furthermore, asynchronous systems produce less radio in- asynchronous techniques. terference than synchronous machines do. Because a clocked system uses a fixed rhythm, it broadcasts a strong radio signal COPYRIGHT IS HELD BY SUN MICROSYSTEMS,INC. SUN, SUN MICROSYSTEMS, INC., AND U 64 SCIENTIFIC AMERICAN AUGUST 2002 COPYRIGHT 2002 SCIENTIFIC AMERICAN, INC.
4 KET BRIGADES C BU the bucket brigade can be used to describe THE METAPHOR of clock tocks, each person grasps the bucket pushed forward by the flow of data in a computer. A synchronous computer system ). An asynchronous system, in middle the preceding person ( the ticktock in which each is like a bucket brigade person follows contrast, is like an ordinary bucket brigade: each person who rhythm of a clock. When the clock ticks, each person pushes a holds a bucket can pass it down the line as soon as the next I.E.S. and J.E. — bucket forward to the next person down the line ( top ). When the bottom person’s hands are free ( ). SYNCHRONOUS ASYNCHRONOUS at its operating frequency and at the harmonics of that fre- In contrast, increasing the speed of a clocked system usually re- quency. Such signals can interfere with cellular phones, televi- quires upgrading every part. sions and aircraft navigation systems that operate at the same Lo c al Cooperation frequencies. Asynchronous systems lack a fixed rhythm, so they TO DESCRIBE HOW ASYNCHRONOUS systems work, we spread their radiated energy broadly across the radio spectrum, often use the met aphor of the bucket brigade. A clocked system emitting less at any one frequency. ucket brigade in which each person must pass and re- is like a b Yet another benefit of asynchronous design is that it can be ets according to the ticktock rhythm of the clock. ceive buck used to build bridges between clocked computers running at dif- When the clock ticks, each person pushes a bucket forward to ferent speeds. Many computing clusters, for instance, link fast the next person down the line; when the clock tocks, each per- PCs with slower machines. These clusters can tackle complex son grasps the bucket pushed forward by the preceding person problems by dividing the computational tasks among the PCs. see illustrations above [ ]. The rhythm of this brigade cannot go Such a system is inherently asynchronous: different parts march faster than the time it takes the slowest person to move the to different beats. Moving data controlled by one clock to the of the buckets are light, everyone heaviest bucket. Even if most control of another clock requires asynchronous bridges, because in the line must wait for the clock to tick before passing the next the data may be “out of sync” with the receiving clock. bucket. Finally, although asynchronous design can be challenging, An asynchronous bucket brigade is governed by local co- it can also be wonderfully flexible. Because the circuits of an operation rather than a common clock. Each person who holds asynchronous system need not share a common rhythm, de- a bucket can pass it to the next person down the line as soon signers have more freedom in choosing the system’s parts and as the next person’s hands are free. Before each action, one per- determining how they interact. Moreover, replacing any part son may have to wait until the other is ready. When most of the with a faster version will improve the speed of the entire system. BRYAN CHRISTIE DESIGN 65 www.sciam.com SCIENTIFIC AMERICAN COPYRIGHT 2002 SCIENTIFIC AMERICAN, INC.
5 HO W A RENDEZVOUS CIRCUIT WORKS can coordinate the RENDEZVOUS CIRCUITS RENDEZVOUS CIRCUIT actions of an asynchronous system, uller C-element M Input wire 1 allowing data to flow in an orderly fashion Inverter without the need for a central clock. Shown here is an electronic pipeline controlled by a chain of Muller C- elements, each of which allows data to pass down the line only when the utput wire O — indicating that preceding stage is “full” — and the following data are ready to move stage is “empty.” line pe Data pi Data latch Each Muller C-element has two input wires and one output wire. The output M uller C-element fires when both inputs are FALSE changes to 2 TRUE when both inputs FALSE and back to . (In the diagram, TRUE TRUE signals are are FALSE signals in red.) shown in blue and The inverter makes the initial inputs to the Muller C-element differ, setting all stages empty at the start. Let’s assume and the TRUE that the left input is initially ). A change in signal at 1 ( FALSE right input TRUE ) 2 ( to FALSE the left input from arrive ta Da Latch open s briefly — indicates that the stage to the left is full that is, some data have arrived. Because uller C-element fires M 3 the inputs to the Muller C-element are now the same, its output changes to FALSE . This change in signal does three things: it moves data down the pipeline by briefly making the data latch trans- signal back to the FALSE parent, it sends a preceding C-element to make the left stage empty, and it sends a FALSE signal ahead to the next C-element to make the — I.E.S. and J.E. ). right stage full ( 3 Latch open s briefly Da ta arrive buckets are light, however, they can move down the line very A clocked pipeline executes these actions in a rhythm indepen- quickly. Moreover, when there’s no water to move, everyone dent of the operations performed or the size of the numbers. can rest between buckets. A slow person will still hinder the per- and one may take just as much time as adding two Adding one formance of the entire brigade, but replacing the slowpoke will 30-digit numbers. In an asynchronous pipeline, though, the du- return the system to its best speed. ration of each action may depend on the operation performed, Bucket brigades in computers are called pipelines. A com- the size of the numbers and the locations of the data in memo- mon pipeline executes the computer’s instructions. Such a ry (just as in a bucket brigade, the amount of water in a bucket pipeline has half a dozen or so stages, each of which acts like a may determine how long it takes to pass it on). person in a bucket brigade. Instead of handling buckets of wa- Without a clock to govern its actions, an asynchronous sys- ter, however, each stage performs one action of the instruction’s tem must rely on local coordination circuits instead. These cir- execution. For example, a processor executing the instruction cuits exchange completion signals to ensure that the actions at “ADD A B C” must fetch the instruction from memory, decode each stage begin only when the circuits have the data they need. the instruction, get the numbers from addresses A and B in mem- The two most important coordination circuits are called the ory, do the addition and store the sum in memory address C. Rendezvous and the Arbiter. BRYAN CHRISTIE DESIGN SCIENTIFIC AMERICAN 66 AUGUST 2002 COPYRIGHT 2002 SCIENTIFIC AMERICAN, INC.
6 A Rendezvous element indicates when the last of two or Given only one request, an Arbiter promptly permits the cor- more signals has arrived at a particular stage. Asynchronous sys- responding action, delaying any second request until the first tems use these elements to wait until all the concurrent actions action is completed. When an Arbiter gets two requests at once, finish before starting the next action. For instance, an arithmetic it must decide which request to grant first. For example, when division circuit must have both the dividend (say, 16) and the di- two processors request access to a shared memory at approxi- visor (say, 2) before it can divide one by the other (to reach the mately the same time, the Arbiter puts the requests into a se- answer 8). quence, granting access to only one processor at a time. The Ar- One form of Rendezvous circuit is called the Muller C-ele- biter guarantees that there are never two actions under way at ment, named after David Muller, now retired from a professor- once, just as the traffic officer prevents accidents by ensuring ship at the University of Illinois. A Muller C-element is a logic that there are never two cars passing through the intersection see box on opposite circuit with two inputs and one output [ on a collision course. TRUE , its ]. When both inputs of a Muller C-element are page Although Arbiter circuits never grant more than one request output becomes TRUE FALSE . When both inputs are , its output at a time, there is no way to build an Arbiter that will always FALSE . Otherwise the output remains unchanged. For becomes reach a decision within a fixed time limit. Present-day Arbiters the Muller C-element to act as a Rendezvous circuit, its inputs reach decisions very quickly on average, usually within about must not change again until its output responds. A chain of a few hundred picoseconds. (A picosecond is a trillionth of a Muller C-elements can control the flow of data down an elec- second.) When faced with close calls, however, the circuits may tronic bucket brigade. occasionally take twice as long, and in very rare cases the time Our research group recently introduced a new kind of Ren- needed to make a decision may be 10 times as long as ]. GasP evolved see box on next page dezvous circuit called GasP [ normal. from an earlier family of circuits designed by Charles E. Mol- The fundamental difficulty in making these de- thout a clock Wi its actions, an asynchronous to govern coordination circuits instead. system must rely on local nar, our late colleague at Sun Microsystems. Molnar dubbed his cisions is nicely illustrated by the parable of Buridan’s creation asP*, which stands for asynchronous symmetric pulse ass. Attributed to Buridan, a 14th-century French philoso- protocol (the asterisk indicates the double “P”). We added the pher, this parable suggests that an ass placed exactly between “G” to the name because GasP is what you are supposed to do two equal piles of hay might starve to death because it would when you see how fast our new circuits go. We have found that be unable to choose which pile to eat. Similar minor dilemmas GasP modules are as fast and as energy-efficient as Muller C- are familiar in everyday life. For example, two people ap- elements, fit better with ordinary data latches and offer much proaching a doorway at the same time may pause before de- greater versatility in complex designs. ciding who will go through first. They can go through in either order, and Buridan’s ass can eat from either pile of hay. In both Buridan’s Ass cases, all that is needed is a way to break the tie. performs another task essential for AN ARBITER CIRCUIT An Arbiter breaks ties. Like a flip-flop circuit, an Arbiter has asynchronous computers. An Arbiter is like a traffic officer at two stable states corresponding to the two choices. One can an intersection who decides which car may pass through next. think of these states as the Pacific Ocean and the Gulf of Mex- JO EBERGEN IVAN E. SUTHERLAND and are true believers in ) is asynchronous computing. Although Sutherland ( middle left best known as a pioneer of computer graphics — he invented the interactive graphics program Sketchpad in 1963 — he became involved in asynchronous circuit design in the mid-1960s while building a graphics processor at Harvard University. He is now a THE AUTHORS vice president and fellow at Sun Microsystems, leading the Asyn- chronous Design Group at the company’s laboratories. Ebergen ( middle right ) became fascinated by asynchronous circuit design 20 years ago during a three-month stint as a research assistant to Charles L. Seitz of Caltech. He subsequently taught at Eind- hoven University of Technology in the Netherlands and the Uni- versity of Waterloo in Canada before joining Sun’s Asynchronous Design Group in the summer of 1996. OLIVIER LAUDE 67 www.sciam.com SCIENTIFIC AMERICAN COPYRIGHT 2002 SCIENTIFIC AMERICAN, INC.
7 A GASP MODULE WORKS HOW ull-up P GASP MODULES can also act as Rendezvous ull-down P transistor elements for an asynchronous data pipeline. transistor NAND 1 (The “P” in GasP is capitalized to acknowledge gate an earlier family of circuits.) Each GasP module has two wires connecting it to its Inverter neighbors and an output wire that drives a data latch. At the heart of the module is a AS G P MODULE FALSE output NAND gate, which produces a only when both inputs are TRUE . Otherwise the NAND produces a TRUE output. The connection wires between modules Data pi pe line Data latch represent the stages in the pipeline. At the NAND gate TRUE start, all the signals in these wires are fires 2 ), indicating that the stages are empty ( blue 2 ) ( ). The arrival of data in the pipeline ( 1 FALSE changes the incoming wire’s signal to Tr ansistor ansistor Tr conducts conducts TRUE ). An inverter flips the signal to ( red and sends it to the NAND gate, which changes its . This makes the data latch FALSE output to transparent, allowing the data to move down ansistor conducts Tr FALSE output also drives the the pipeline. The arrive Latch opens Da ta TRUE state (which incoming wire back to the NAND gate NAND gate fires means empty) via a pull-up transistor and resets 3 state FALSE drives the outgoing wire to the (which means full) via an inverter and a pull- down transistor. The NAND gate’s output ansistor Tr conducts then returns to ), making the latch 3 TRUE ( FALSE signal in opaque again. Meanwhile the the outgoing wire triggers the same process in the next GasP module. I.E.S. and J.E. — Da ta arrive Latch opens Latch closes ico. Each request to an Arbiter pushes the circuit toward one metaphor, you can move the Continental Divide with a shovel, stable state or the other, just as a hailstone that falls in the but you cannot get rid of it. Although there is no way to elimi- Rocky Mountains can roll downhill toward the Pacific or the nate meta-stability, simple, well-designed Arbiter circuits can en- Gulf. Between the two stable states, however, there must be a sure that virtually all delays are very brief. A typical contempo- meta-stable line, which is equivalent to the Continental Divide. rary Arbiter has a normal delay of 100 picoseconds and experi- If a hailstone falls precisely on the divide, it may balance mo- ences a delay of 400 picoseconds less than once every 10 hours mentarily on that sharp mountain ridge before tipping toward of operation. The probability of delays decreases exponentially the Pacific or the Gulf. Similarly, if two requests arrive at an Ar- with the length of the delay: an 800-picosecond pause occurs biter within a few picoseconds of each other, the circuit may less than once every billion years of operation. pause in its meta-stable state before reaching one of its stable he Need for Speed T states to break the tie. SUN MICROSYSTEMS concentrates on de- OUR GROUP AT Novice Arbiter designers often seek to avoid even the occa- signing fast asynchronous systems. We have found that speed sional long delay by fashioning complicated circuits. A common often comes from simplicity. Our initial goal was to build a error involves a circuit that notices the “hung” meta-stable state — like two counterflow pipeline with two opposing data flows and pushes the Arbiter toward a particular decision. This is like parallel bucket brigades moving in opposite directions. We training Buridan’s ass to go left when it notices a hard choice. wanted the data from both flows to interact at each of the Such training, however, merely gives the ass three choices rather stages; the hard challenge was to ensure that every “north- than two: go left, go right, or notice a hard choice and there- bound” data element would interact with every “southbound” fore go left. Even a trained ass will starve when unable to decide data element. Arbitration turned out to be essential. At each between the last two choices. Or, to use the geographic BRYAN CHRISTIE DESIGN 68 SCIENTIFIC AMERICAN AUGUST 2002 COPYRIGHT 2002 SCIENTIFIC AMERICAN, INC.
8 joint between successive stages, an Arbiter circuit permitted Furthermore, the experiments at Manchester, Caltech and only one element at a time to pass. This project proved very use- Philips demonstrate that asynchronous microprocessors can be ful as a research target; we learned a great deal about coordi- compatible with their clocked counterparts. The asynchronous nation and arbitration and built test chips to prove the relia- processors can connect to peripheral machines without special bility of our Arbiter circuits. programs or interface circuitry. More recently, we have chosen a fresh research target, a pro- A Challenging Time cessing structure we call FLEET. The name refers not only to freedom of asynchro- ALTHOUGH THE ARCHITECTURAL speed but also to the collection of computing elements, each of nous systems is a great benefit, it also poses a difficult challenge. which we call a ship. Each ship does its task concurrently with Because each part sets its own pace, that pace may vary from the others; the FLEET system controls their actions by moving time to time in any one system and may vary from system to data among them through an asynchronous switching network. system. If several actions are concurrent, they may finish in a Our work on FLEET has led to many discoveries. We wanted large number of possible sequences. Enumerating all the pos- speed, and that led us to create the basic GasP circuits. We want- sible sequences of actions in a complex asynchronous chip is as ed to steer data items from one pipeline to another, like cars at difficult as predicting the sequences of actions in a schoolyard highway interchanges, and that led us to design a larger family full of children. This dilemma is called the state explosion prob- of GasP circuits that can act like a dense system of freeways. lem. Can chip designers create order out of the potential These circuits can move data about twice as fast as a clocked sys- chaos of concurrent actions? tem could. To gauge the speed of our switching networks, we of- T in the coming he technological trend is inevitable: decades, asynchronous design will become prevalent. Fortunately, researchers are developing theories ten build rings on our test chips around which the data elements for tackling this problem. Designers need not worry rush like race cars. We measure the time it takes a data element about all the possible sequences of actions if they can set to complete a round-trip, which is a lot easier than measuring the certain limitations on the communication behavior of each cir- much shorter time it takes data to advance through one stage. cuit. To continue the schoolyard metaphor, a teacher can pro- Our designs are beginning to enter Sun’s computer products. mote safe play by teaching each child how to avoid danger. The UltraSPARC IIIi processor chip contains asynchronous data Another difficulty is that we lack mature design tools, ac- queues that accept information from memory chips [ see illus- cepted testing methods and widespread education in asyn- tration on page 63 ]. This asynchronous system is the simplest chronous design. A growing research community is making and fastest way to compensate for the differences in arrival time good progress, but the present total investment in clock-free of signals from memory chips that lie at different distances from computing pales in comparison with the investment in clocked the processor. Sun’s product designers are gaining confidence design. Nevertheless, we are confident that the relentless ad- that they can build asynchronous parts, that the parts work and vances in the speed and complexity of integrated circuits will can be tested, and that asynchrony offers important advantages force designers to learn asynchronous techniques. We do not over clocked design. As their confidence grows, more and more know yet whether asynchronous systems will flourish first with- commercial products will use asynchronous parts for greater in large computer and electronics companies or within start-up speed and flexibility, better power efficiency and reduced radio companies eager to develop new ideas. The technological trend, interference. however, is inevitable: in the coming decades, asynchronous Sun is by no means the only company investigating asyn- design will become prevalent. Eventually there will no longer chronous circuits. A group at Philips Research in the Nether- be an easy answer to the question, How fast is your personal lands has developed an asynchronous error corrector for a dig- computer? ital compact cassette player and an asynchronous version of a popular microcontroller for portable devices. The asynchro- MO RE TO EXPLORE nous microcontroller has been incorporated into pagers sold by Micropipelines. Ivan E. Sutherland: The Turing Award Lecture. Philips. The success of the Philips research group comes from Communications of the ACM, Vol. 32, No. 6, pages 720–738; June 1989. three factors. First, the team learned how to create products Asynchronous Circuits and Systems. Proceedings Special issue of rapidly using a programming language called Tangram that of the IEEE; February 1999. simplifies hardware design. Second, the low power consump- Principles of Asynchronous Circuit Design: A Systems Perspective. tion of their asynchronous circuits enables a pager to operate Edited by Jens Sparsø and Steve Furber. Kluwer Academic Publishers, 2001. longer on a battery charge. Third, the reduced radio interfer- ence of their asynchronous circuits allowed the inclusion of The Asynchronous Logic Home Page is at www.cs.man.ac.uk/async/ both a computer and a sensitive radio receiver in a tiny device. wsinap/async.html ~ www.win.tue.nl/ The Asynchronous Bibliography is at 69 SCIENTIFIC AMERICAN www.sciam.com COPYRIGHT 2002 SCIENTIFIC AMERICAN, INC.