Showing posts with label electronics. Show all posts
Showing posts with label electronics. Show all posts

Inside an unusual 7400-series chip implemented with a gate array

When I look inside a chip from the popular 7400 series, I know what to expect: a fairly simple die, implemented in a straightforward, cost-effective way. However, when I looked inside a military-grade chip built by Integrated Device Technology (IDT)4 I found a very unexpected layout: over 1500 transistors in an orderly matrix. Even stranger, most of the die is wasted: less than 20% of these transistors are used, forming scattered circuits connected by thin metal wires.

In this blog post, I look at this chip in detail, describe its gates, and explain how it implements the "1-of-4" decoder function. I also discuss why it sometimes makes sense to build chips with a gate array design such as this, despite the inefficiency.

A photo of the tiny silicon die in its package.  This chip is the IDT 54FCT139ALB dual 1-of-4 decoder.  Click this image (or any other) for a larger version.

A photo of the tiny silicon die in its package. This chip is the IDT 54FCT139ALB dual 1-of-4 decoder. Click this image (or any other) for a larger version.

In the photo below, you can see the silicon die in more detail, with the silicon appearing pink. The main circuitry is implemented in the nine rows that form the gate array, a grid of 1584 transistors. The tiny dark rectangles are transistors of two types, NMOS and PMOS, that work together to implement CMOS logic circuits. At this scale, the metal wiring is visible as faint gray lines and smudges, but most of the transistors are unconnected. Surrounding the gate array are 22 input/output (I/O) blocks each with a square bond pad. As with the transistors, many of these I/O blocks are unused. Fourteen of these bond pads have tiny metal bond wires (the thick black lines) that connect the silicon die to the chip's external pins. Finally, the pairs of bond wires at the center left and center right provide ground and power connections for the chip.

Closeup die photo.

Closeup die photo.

The photo below zooms in on three rows of circuitry in the chip. The large dark rectangles are pairs of transistors, with two lines of transistors in each row of circuitry. At the top and bottom of each row, the thick horizontal white lines are metal wiring that provides power and ground. In each row, one line of transistors holds PMOS transistors, next to the power wiring, while the other line holds NMOS transistors, next to the ground wiring. (The orientation flips in each successive row, so it isn't obvious which transistors are which unless you check the power connections at the end of the row.)

A closeup of the die.

A closeup of the die.

The transistors are wired into gates by the metal layers, the white lines. The gates are connected by horizontal and vertical wiring using the wiring channels between the rows. This wiring style is very similar to standard-cell logic. However, unlike standard-cell logic, the underlying transistor grid is fixed, resulting in wasted transistors. In the image above, most of the transistors in the middle row are used, while the top row is unused and the bottom row is mostly unused.

The diagram below shows the structure of one of the transistor blocks, which contains two tall, thin MOS transistors. The vertical metal contacts connect to the sources and drains of the transistors, with the two transistors sharing the middle contact. (On an integrated circuit, the source and drain of a transistor are identical, so it is arbitrary which side is the source and which is the drain.) The short horizontal metal contacts at the top connect to the gates of the two transistors; the gates are made of polysilicon, which is barely visible in the die photo. The gates partition the active silicon (green), forming the transistors. The gate width is approximately 1 µm.

A block of two transistors as they appear on the die, along with a diagram showing the structure. The bar indicates a length of 10 µm.

A block of two transistors as they appear on the die, along with a diagram showing the structure. The bar indicates a length of 10 µm.

NAND gate

In this section, I'll explain the construction of one of the NAND gates on the die. The NAND gate below uses four transistors, two NMOS transistors on the top and two PMOS transistors on the bottom. The white lines are the metal wiring, forming two layers. Most of the wiring (including power and ground) is in the lower (M1) layer. The slightly wider and darker vertical segments are the upper (M2) layer. The circles connect the metal layers when they join, or connect the metal layer to the underlying silicon or polysilicon. With two metal layers, it's a bit tricky to see how the wiring is connected. The A and B inputs each connect to two transistor gates. The transistor group at the top is connected to ground on the right, with the output on the left. The transistor group on the bottom is connected to Vcc on the left and right, with the output in the middle. This has the effect of putting the upper transistors in series and the lower transistors in parallel.

A NAND gate on the die.

A NAND gate on the die.

Below, I've drawn the schematic of the NAND gate. On the left, the layout of the schematic matches the die layout above. On the right, I've redrawn the schematic with a more traditional layout. To understand its operation, note that a PMOS transistor (top on the right schematic) turns on when the input is low, while an NMOS transistor (bottom on the right) turns on when the input is high. When both inputs are high, the two NMOS transistors turn on, connecting ground to the output, pulling it low. When either input is low, one of the PMOS transistors turns on, pulling the output high. Thus, the circuit implements the NAND function. The NMOS and PMOS transistors operate in a complementary fashion, giving CMOS (Complementary MOS) its name.

Schematic of a NAND gate.

Schematic of a NAND gate.

NOR gate

In this section, I'll explain the layout of one of the NOR gates on the die, shown below. This gate is twice as large as the previous NAND gate so it can provide twice the output current.1 The NOR gate uses eight transistors, four PMOS transistors in the upper half and four NMOS transistors in the lower half. (Note that Vcc and ground are flipped compared to the previous gate, as are the NMOS and PMOS transistors.) The two transistors in each block are wired in parallel to produce more current for the output. (A out is the same signal as A in, exiting the block at the top to connect to other circuitry.)

A NOR gate on the die.

A NOR gate on the die.

The schematic below shows the wiring of the eight transistors. The schematic layout corresponds to the physical layout to make it easier to map between the image and the schematic. The upper transistor groups are wired in series, while the lower transistor groups are wired in parallel.

Schematic corresponding to the gate above.

Schematic corresponding to the gate above.

The schematic below has been redrawn to make the functionality clearer, and the parallel transistors have been removed. If either input is high, one of the NMOS transistors on the bottom will turn on and pull the input low. If both inputs are low, the two PMOS transistors will turn on and pull the input high. This provides the desired NOR function.

Simplified NOR gate schematic.

Simplified NOR gate schematic.

Note that the NAND and NOR gates have similar but opposite schematics. In the NAND gate, the NMOS transistors are in series while the PMOS transistors are in parallel. In the NOR gate, the roles of the transistors are swapped.

The chip's circuit

The chip I examined is a "dual 1-of-4 decoder with enable".2 The decoding function takes a two-bit input and selects one of four output lines depending on the binary value. The enable line must be low to activate this operation; otherwise all four output lines are disabled. The chip contains two of these decoders, which is why it is called a dual decoder. In total, the chip contains 18 logic gates,3 so it is very simple, even by 1990s standards.

I reverse-engineered the chip and created the schematic below, showing one of the dual units. Each NAND gate matches one of the four input possibilities to drive one of the four outputs. The NOR gates support the ENABLE signal, blocking the outputs unless ENABLE is active (i.e. low).

Reverse-engineered schematic of half the chip.

Reverse-engineered schematic of half the chip.

The chip uses a general-purpose I/O block (below) for each pin, that can be used as an input or an output depending on how it is wired. Each block contains two large drive transistors: an NMOS transistor to pull the output low and a PMOS transistor to pull the output high. The I/O block has separate control lines for the two output transistors. (At the bottom of the image below, two thin metal wires drive the high-side and low-side transistors.) This permits tri-state logic: if neither transistor is energized, the output is left floating. The gate array drives the output transistors with high-current inverter, constructed from multiple transistors in parallel. (This is why the schematic shows more inverters than may seem necessary.)

One of the 22 I/O blocks on the die. Each I/O block is associated with a bond pad, where a bond wire can be connected to an external pin.

One of the 22 I/O blocks on the die. Each I/O block is associated with a bond pad, where a bond wire can be connected to an external pin.

When used as an input, the pad is wired to the surrounding circuitry slightly differently, connecting to input protection diodes (not shown on the schematic). Thus, the functionality of the I/O blocks can be changed by modifying the metal layers, without changing the underlying silicon.

Some 7400-series history

The earliest logic integrated circuits used resistors and transistors internally, so they were called RTL (Resistor Transistor Logic), but RTL had significant performance problems. RTL was rapidly replaced by Diode Transistor Logic (DTL) and then Transistor Transistor Logic (TTL). In 1964, Texas Instruments created a line of TTL integrated circuits for military applications called the SN5400 series. This was shortly followed by the commercial-grade SN7400 series.

The 7400 series of integrated circuits was inexpensive, fast, and easy to use. The line started with simple logic circuits such as four NAND gates on a chip, and moved into more complex chips such as counters, shift registers, and ALUs. The 7400 series became very popular in the 1970s and 1980s, used by electronics hobbyists and high-performance minicomputers alike. These chips became essential building blocks and "glue" logic for microcomputers, heavily used in the Apple II for instance.

The original 7400 series branched into dozens of families with different performance characteristics but the same functionality. The 74LS (low-power Schottky) family, for instance, became very popular as it both improved speed and reduced power consumption. In the mid-1970s, 7400-series chips were introduced that used CMOS circuitry instead of TTL for dramatically lower power consumption. This CMOS family, the 74C series, was followed by numerous other CMOS families.

That brings us to the chip I examined, a member of IDT's 74FCT (Fast CMOS TTL-compatible) line of chips, introduced in the mid-1980s. (Specifically, it is in the 54FCT family because it handles a wider temperature range.) These chips used advanced CMOS technology to provide high speed, low power consumption, and as a military option, radiation tolerance.

Conclusions

Why would you make a chip in this inefficient way, using a gate array that wastes most of the die area? The motivation is that most of the design cost can be shared across many different part types. Each step of integrated circuit processing requires an expensive mask for photolithography. With a gate array, all chip types use the same underlying silicon and transistors, with custom masks just for the two metal layers. In comparison, a fully custom chip might require eight custom masks, which costs much more. The tradeoff is that gate array chips are larger so the manufacturing cost is higher per device.5 Thus, a gate array design is better when selling chips in relatively small quantities, while a custom design is cheaper when mass-producing chips.6 IDT focused on the high-performance and military market rather than the commodity chip market, so gate arrays were a good fit.

One last thing. The packaging of this chip is very interesting since it is mounted on a multi-chip module. The module also contains two Atmel EEPROMs. Presumably the decoder chip decodes address bits to select one of the EEPROMs.

The multi-chip module containing the decoder chip along with an AT28HC64 EPROM on either side.

The multi-chip module containing the decoder chip along with an AT28HC64 EPROM on either side.

Thanks to Don S. for providing the chip. Follow me on Twitter @kenshirriff or RSS for updates. I've also started experimenting with Mastodon recently as @oldbytes.space@kenshirriff.

Notes and references

  1. Properly sizing the transistors in a gate is important for performance. Since the transistors in the gate array are all the same size, multiple transistors are used in parallel to get the desired current. The 1999 book Logical Effort describes a methodology for maximizing the performance of CMOS circuits by correctly sizing the transistors. 

  2. The part number is "IDT 54FCT139ALB". "54" indicates the chip operates under an enhanced temperature range of -55°C to +125°C. The "A" indicates the chip is 35% faster than the base series (but not as fast as "C"). "L" indicates the chip is packaged in a leadless chip carrier, the square package shown at the top of the article. Finally, "B" indicates the chip was tested according to military standards: MIL-STD-883, Class B. 

  3. The chip contains 18 logic gates according to the functional schematic in the datasheet (below). The implementation actually uses 52 logic gates by my count (2×26) because the implementation doesn't exactly match the schematic. In particular, the datasheet shows three-input NAND gates, but the chip uses a NAND gate and a NOR gate along with inverters. The chip also has additional inverters to drive the output transistors in each I/O block.

    Schematic of the chip from the datasheet.

    Schematic of the chip from the datasheet.

     

  4. Integrated Device Technology was a spinoff from Hewlett Packard that started in 1980. IDT built advanced CMOS chips including fast static RAM and microprocessors (bit-slice and MIPS). It became part of Renesas in 2018. A very detailed 1986 profile of IDT is here. IDT's logo is pretty cool, combining a chip wafer and calculus.

    The logo of Integrated Device Technology.

    The logo of Integrated Device Technology.

    Here's how the logo looks on the die:

    Closeup of the die showing the IDT logo.

    Closeup of the die showing the IDT logo.

    The die also has the initials of the designers, along with some mysterious symbols. One looks like the Chinese character "正". (Update: based on a Twitter comment, these symbols are probably tally marks, indicating the revision count for each mask.)

    Closeups of two parts of the die.

    Closeups of two parts of the die.

  5. Integrated circuit manufacturing is partitioned into the "front end of line", where the transistors are created on the silicon wafer, and the "back end of line", where the metal wiring is put on top to connect the transistors. With a gate array construction, the front end of line steps create generic gate array wafers. The back end of line steps then connect the transistors as desired for a particular component. The gate array wafers can be produced in large quantities and stored, and then customized for specific products in smaller quantities as needed. This reduces the time to supply a particular chip type since only the back end of line process needs to take place. 

  6. The IDT High-Speed CMOS Logic Design Guide briefly mentions the gate array design. The FCT family was built from two sizes of gate arrays, "4004" for smaller chips and "8000" for larger chips. Later, IDT shrunk the original "Z-step" gate arrays to smaller, higher-performance "Y-step" arrays. They then customized some of the devices to create the "W-step" devices. Looking at the markings on the die, we see that this chip uses the "4004Y" gate array.

    The die shows gate slice 4004Y and part 4139Y (indicating 54139 or 74139). The numbers are slightly obscured by a bond wire.

    The die shows gate slice 4004Y and part 4139Y (indicating 54139 or 74139). The numbers are slightly obscured by a bond wire.

     

Inside the mechanical Bendix Air Data Computer, part 5: motor/tachometers

The Bendix Central Air Data Computer (CADC) is an electromechanical analog computer that uses gears and cams for its mathematics. It was a key part of military planes such as the F-101 and the F-111 fighters, computing airspeed, Mach number, and other "air data". The rotating gears are powered by six small servomotors, so these motors are in a sense the fundamental component of the CADC. In the photo below, you can see one of the cylindrical motors near the center, about 1/3 of the way down.

The servomotors in the CADC are unlike standard motors. Their name—"Motor-Tachometer Generator" or "Motor and Rate Generator"1—indicates that each unit contains both a motor and a speed sensor. Because the motor and generator use two-phase signals, there are a total of eight colorful wires coming out, many more than a typical motor. Moreover, the direction of the motor can be controlled, unlike typical AC motors. I couldn't find a satisfactory explanation of how these units worked, so I bought one and disassembled it. This article (part 5 of my series on the CADC2) provides a complete teardown of the motor/generator and explain how it works.

The Bendix MG-1A Central Air Data Computer with the case removed, showing the compact gear mechanisms inside. Click this image (or any other) for a larger version.

The Bendix MG-1A Central Air Data Computer with the case removed, showing the compact gear mechanisms inside. Click this image (or any other) for a larger version.

The image below shows a closeup of two motors powering one of the pressure signal outputs. Note the bundles of colorful wires to each motor, entering in two locations. At the top, the motors drive complex gear trains. The high-speed motors are geared down by the gear trains to provide much slower rotations with sufficient torque to power the rest of the CADC's mechanisms.

Two motor/generators in the pressure section of the CADC. The one at the back is mostly hidden.

Two motor/generators in the pressure section of the CADC. The one at the back is mostly hidden.

The motor/tachometer that we disassembled is shorter than the ones in the CADC (despite having the same part number), but the principles are the same. We started by removing a small C-clip on the end of the motor and and unscrewing the end plate. The unit is pretty simple mechanically. It has bearings at each end for the rotor shaft. There are four wires for the motor and four wires for the tachometer.3

The motor disassembled to show the internal components.

The motor disassembled to show the internal components.

The rotor (below) has two parts on the shaft. the left part is for the motor and the right drum is for the tachometer. The left part is a squirrel-cage rotor4 for the motor. It consists of conducting bars (light-colored) on an iron core. The conductors are all connected at both ends by the conductive rings at either end. The metal drum on the right is used by the tachometer. Note that there are no electrical connections between the rotor components and the rest of the motor: there are no brushes or slip rings. The interaction between the rotor and the windings in the body of the motor is purely magnetic, as will be explained.

The rotor and shaft.

The rotor and shaft.

The motor/tachometer contains two cylindrical stators that create the magnetic fields, one for the motor and one for the tachometer. The photo below shows the motor stator inside the unit after removing the tachometer stator. The stators are encased in hard green plastic and tightly pressed inside the unit. In the center, eight metal poles are visible. They direct the magnetic field onto the rotor.

Inside the motor after removing the tachometer winding.

Inside the motor after removing the tachometer winding.

The photo below shows the stator for the tachometer, similar to the stator for the motor. Note the shallow notches that look like black lines in the body on the lower left. These are probably adjustments to the tachometer during manufacturing to compensate for imperfections. The adjustments ensure that the magnetic fields are nulled out so the tachometer returns zero voltage when stationary. The metal plate on top shields the tachometer from the motor's magnetic fields.

The stator for the tachometer.

The stator for the tachometer.

The poles and the metal case of the stator look solid, but they are not. Instead, they are formed from a stack of thin laminations. The reason to use laminations instead of solid metal is to reduce eddy currents in the metal. Each lamination is varnished, so it is insulated from its neighbors, preventing the flow of eddy currents.

One lamination from the stack of laminations that make up the winding. The lamination suffered some damage during disassembly; it was originally round.

One lamination from the stack of laminations that make up the winding. The lamination suffered some damage during disassembly; it was originally round.

In the photo below, I removed some of the plastic to show the wire windings underneath. The wires look like bare copper, but they have a very thin layer of varnish to insulate them. There are two sets of windings (orange and blue, or red and black) around alternating metal poles. Note that the wires run along the pole, parallel to the rotor, and then wrap around the pole at the top and bottom, forming oblong coils around each pole.5 This generates a magnetic field through each pole.

Removing the plastic reveals the motor windings.

Removing the plastic reveals the motor windings.

The motor

The motor part of the unit is a two-phase induction motor with a squirrel-cage rotor.6 There are no brushes or electrical connections to the rotor, and there are no magnets, so it isn't obvious what makes the rotor rotate. The trick is the "squirrel-cage" rotor, shown below. It consists of metal bars that are connected at the top and bottom by rings. Assume (for now) that the fixed part of the motor, the stator, creates a rotating magnetic field. The important principle is that a changing magnetic field will produce a current in a wire loop.7 As a result, each loop in the squirrel-cage rotor will have an induced current: current will flow up9 the bars facing the north magnetic field and down the south-facing bars, with the rings on the end closing the circuits.

A squirrel-cage rotor. The numbered parts are (1) shaft, (2) end cap, (3) laminations, and (4) splines to hold the laminations. Image from Robo Blazek.

A squirrel-cage rotor. The numbered parts are (1) shaft, (2) end cap, (3) laminations, and (4) splines to hold the laminations. Image from Robo Blazek.

But how does the stator produce a rotating magnetic field? And how do you control the direction of rotation? The next important principle is that current flowing through a wire produces a magnetic field.8 As a result, the currents in the squirrel cage rotor produce a magnetic field perpendicular to the cage. This magnetic field causes the rotor to turn in the same direction as the stator's magnetic field, driving the motor. Because the rotor is powered by the induced currents, the motor is called an induction motor.

The diagram below shows how the motor is wired, with a control winding and a reference winding. Both windings are powered with AC, but the control voltage either lags the reference winding by 90° or leads the reference winding by 90°, due to the capacitor. Suppose the current through the control winding lags by 90°. First, the reference voltage's sine wave will have a peak, producing the magnetic field's north pole at A. Next (90° later), the control voltage will peak, producing the north pole at B. The reference voltage will go negative, producing a south pole at A and thus a north pole at C. The control voltage will go negative, producing a south pole at B and a north pole at D. This cycle will repeat, with the magnetic field rotating counter-clockwise from A to D. Conversely, if the control voltage leads the reference voltage, the magnetic field will rotate clockwise. This causes the motor to spin in one direction or the other, with the direction controlled by the control voltage. (The motor has four poles for each winding, rather than the one shown below; this increases the torque and reduces the speed.)

Diagram showing the servomotor wiring.

Diagram showing the servomotor wiring.

The purpose of the capacitor is to provide the 90° phase shift so the reference voltage and the control voltage can be driven from the same single-phase AC supply (in this case, 26 volts, 400 hertz). Switching the polarity of the control voltage reverses the direction of the motor.

There are a few interesting things about induction motors. You might expect that the motor would spin at the same rate as the rotating magnetic field. However, this is not the case. Remember that a changing magnetic field induces the current in the squirrel-cage rotor. If the rotor is spinning at the same rate as the magnetic field, the rotor will encounter an unchanging magnetic field and there will be no current in the bars of the rotor. As a result, the rotor will not generate a magnetic field and there will be no torque to rotate it. The consequence is that the rotor must spin somewhat slower than the magnetic field. This is called "slippage" and is typically a few percent of the full speed, with more slippage as more torque is required.

Many household appliances use induction motors, but how do they generate a rotating magnetic field from a single-phase AC winding? The problem is that the magnetic field in a single AC winding will just flip back and forth, so the motor will not turn in either direction. One solution is a shaded-pole motor, which puts a copper bar around part of each pole to break the symmetry and produce a weakly rotating magnetic field. More powerful induction motors use a startup winding with a capacitor (analogous to the control winding). This winding can either be switched out of the circuit once the motor starts spinning,10 or used continuously, called a permanent-split capacitor (PSC) motor. The best solution is three-phase power (if available); a three-phase winding automatically produces a rotating magnetic field.

Tachometer/generator

The second part of the unit is the tachometer generator, sometimes called the rate unit.11 The purpose of the generator is to produce a voltage proportional to the speed of the shaft. The unusual thing about this generator is that it produces a 400-hertz output that is either in phase with the input or 180° out of phase. This is important because the phase indicates which direction the shaft is turning. Note that a "normal" generator is different: the output frequency is proportional to the speed.

The diagram below shows the principle behind the generator. It has two stator windings: the reference coil that is powered at 400 Hz, and the output coil that produces the output signal. When the rotor is stationary (A), the magnetic flux is perpendicular to the output coil, so no output voltage is produced. But when the rotor turns (B), eddy currents in the rotor distort the magnetic field. It now couples with the output coil, producing a voltage. As the rotor turns faster, the magnetic field is distorted more, increasing the coupling and thus the output voltage. If the rotor turns in the opposite direction (C), the magnetic field couples with the output coil in the opposite direction, inverting the output phase. (This diagram is more conceptual than realistic, with the coils and flux 90° from their real orientation, so don't take it too seriously. As shown earlier, the coils are perpendicular to the rotor so the real flux lines are completely different.)

Principle of the drag-cup rate generator. From Navy electricity and electronics training series: Principles of synchros, servos, and gyros, Fig 2-16

But why does the rotating drum change the magnetic field? It's easier to understand by considering a tachometer that uses a squirrel-cage rotor instead of a drum. When the rotor rotates, currents will be induced in the squirrel cage, as described earlier with the motor. These currents, in turn, generate a perpendicular magnetic field, as before. This magnetic field, perpendicular to the orginal field, will be aligned with the output coil and will be picked up. The strength of the induced field (and thus the output voltage) is proportional to the speed, while the direction of the field depends on the direction of rotation. Because the primary coil is excited at 400 hertz, the currents in the squirrel cage and the resulting magnetic field also oscillate at 400 hertz. Thus, the output is at 400 hertz, regardless of the input speed.

Using a drum instead of a squirrel cage provides higher accuracy because there are no fluctuations due to the discrete bars. The operation is essentially the same, except that the currents pass through the metal of the drum continuously instead of through individual bars. The result is eddy currents in the drum, producing the second magnetic field. The diagram below shows the eddy currents (red lines) from a metal plate moving through a magnetic field (green), producing a second magnetic field (blue arrows). For the rotating drum, the situation is similar except the metal surface is curved, so both field arrows will have a component pointing to the left. This creates the directed magnetic field that produces the output.

A diagram showing eddy currents in a metal plate moving under a magnet, Image from Chetvorno.

A diagram showing eddy currents in a metal plate moving under a magnet, Image from Chetvorno.

The servo loop

The motor/generator is called a servomotor because it is used in a servo loop, a control system that uses feedback to obtain precise positioning. In particular, the CADC uses the rotational position of shafts to represent various values. The servo loops convert the CADC's inputs (static pressure, dynamic pressure, temperature, and pressure correction) into shaft positions. The rotations of these shafts power the gears, cams, and differentials that perform the computations.

The diagram below shows a typical servo loop in the CADC. The goal is to rotate the output shaft to a position that exactly matches the input voltage. To accomplish this, the output position is converted into a feedback voltage by a potentiometer that rotates as the output shaft rotates.12 The error amplifier compares the input voltage to the feedback voltage and generates an error signal, rotating the servomotor in the appropriate direction. Once the output shaft is in the proper position, the error signal drops to zero and the motor stops. To improve the dynamic response of the servo loop, the tachometer signal is used as a negative feedback voltage. This ensures that the motor slows as the system gets closer to the right position, so the motor doesn't overshoot the position and oscillate. (This is sort of like a PID controller.)

Diagram of a servo loop in the CADC.

Diagram of a servo loop in the CADC.

The error amplifier and motor drive circuit for a pressure transducer are shown below. Because of the state of electronics at the time, it took three circuit boards to implement a single servo loop. The amplifier was implemented with germanium transistors (since silicon transistors were later). The transistors weren't powerful enough to drive the motors directly. Instead, magnetic amplifiers (the yellow transformer-like modules at the front) powered the servomotors. The large rectangular capacitors on the right provided the phase shift required for the control voltage.

One of the three-board amplifiers for the pressure transducer.

One of the three-board amplifiers for the pressure transducer.

Conclusions

The Bendix CADC used a variety of electromechanical devices including synchros, control transformers, servo motors, and tachometer generators. These were expensive military-grade components driven by complex electronics. Nowadays, you can get a PWM servo motor for a few dollars with the gearing, feedback, and control circuitry inside the motor housing. These motors are widely used for hobbyist robotics, drones, and other applications. It's amazing that servo motors have gone from specialized avionics hardware to an easy-to-use, inexpensive commodity.

A modern DC servo motor. Photo by Adafruit (CC BY-NC-SA 2.0 DEED).

A modern DC servo motor. Photo by Adafruit (CC BY-NC-SA 2.0 DEED).

Follow me on Twitter @kenshirriff or RSS for updates. I'm also on Mastodon as @oldbytes.space@kenshirriff. Thanks to Joe for providing the CADC. Thanks to Marc Verdiell for disassembling the motor.

Notes and references

  1. The two types of motors in the CADC are part number "FV-101-19-A1" and part number "FV-101-5-A1" (or FV101-5A1). They are called either a "Tachometer Rate Generator" or "Tachometer Motor Generator", with both names applied to the same part number. The "19" and "5" units look the same, with the "19" used for one pressure servo loop and the "5" used everywhere else.

    The motor that I got is similar to the ones in the CADC, but shorter. The difference in size is mysterious since both have the Bendix part number FV-101-5-A1.

    For reference, the motor I disassembled is labeled:

    Cedar Division Control Data Corp. ST10162 Motor Tachometer F0: 26V C0: 26V TACH: 18V 400 CPS DSA-400-70C-4651 FSN6105-581-5331 US BENDIX FV-101-5-A1

    I wondered why the motor listed both Control Data and Bendix. In 1952, the Cedar Engineering Company was spun off from the Minneapolis Honeywell Regulator Company (better known as Honeywell, the name it took in 1964). Cedar Engineering produced motors, servos, and aircraft actuators. In 1957, Control Data bought Cedar Engineering, which became the Cedar Division of CDC. Then, Control Data acquired Bendix's computer division in 1963. Thus, three companies were involved. 

  2. My previous articles on the CADC are:

     

  3. From testing the motor, here is how I believe it is wired:
    Motor reference (power): red and black
    Motor control: blue and orange
    Generator reference (power): green and brown
    Generator out: white and yellow 

  4. The bars on the squirrel-cage rotor are at a slight angle. Parallel bars would go in and out of alignment with the stator, causing fluctuations in the force, while the angled bars avoid this problem. 

  5. This cross-section through the stator shows the windings. On the left, each winding is separated into the parts on either side of the pole. On the right, you can see how the wires loop over from one side of the pole to the other. Note the small circles in the 12 o'clock and 9 o'clock positions: cross sections of the input wires. The individual horizontal wires near the circumference connect alternating windings.

    A cross-section of the stator, formed by sanding down the plastic on the end.

    A cross-section of the stator, formed by sanding down the plastic on the end.

     

  6. It's hard to find explanations of AC servomotors since they are an old technology. One discussion is in Electromechanical components for servomechanisms (1961). This book points out some interesting things about a servomotor. The stall torque is proportional to the control voltage. Servomotors are generally high-speed, but low-torque devices, heavily geared down. Because of their high speed and their need to change direction, rotational inertia is a problem. Thus, servomotors typically have a long, narrow rotor compared with typical motors. (You can see in the teardown photo that the rotor is long and narrow.) Servomotors are typically designed with many poles (to reduce speed) and smaller air gaps to increase inductance. These small airgaps (e.g. 0.001") require careful manufacturing tolerance, making servomotors a precision part. 

  7. The principle is Faraday's law of induction: "The electromotive force around a closed path is equal to the negative of the time rate of change of the magnetic flux enclosed by the path." 

  8. Ampère's law states that "the integral of the magnetizing field H around any closed loop is equal to the sum of the current flowing through the loop." 

  9. The direction of the current flow (up or down) depends on the direction of rotation. I'm not going to worry about the specific direction of current flow, magnetic flux, and so forth in this article. 

  10. Once an induction motor is spinning, it can be powered from a single AC phase since the stator is rotating with respect to the magnetic field. This works for the servomotor too. I noticed that once the motor is spinning, it can operate without the control voltage. This isn't the normal way of using the motor, though. 

  11. A long discussion of tachometers is in the book Electromechanical Components for Servomechanisms (1961). The AC induction-generator tachometer is described starting on page 193.

    For a mathematical analysis of the tachometer generator, see Servomechanisms, Section 2, Measurement and Signal Converters, MCP 706-137, U.S. Army. This source also discusses sources of errors in detail. Inexpensive tachometer generators may have an error of 1-2%, while precision devices can have an error of about 0.1%. Accuracy is worse for small airborne generators, though. Since the Bendix CADC uses the tachometer output for damping, not as a signal output, accuracy is less important. 

  12. Different inputs in the CADC use different feedback mechanisms. The temperature servo uses a potentiometer for feedback. The angle of attack correction uses a synchro control transformer, which generates a voltage based on the angle error. The pressure transducers contain inductive pickups that generate a voltage based on the pressure error. For more details, see my article on the CADC's pressure transducer servo circuits

Reverse engineering standard cell logic in the Intel 386 processor

The 386 processor (1985) was Intel's most complex processor at the time, with 285,000 transistors. Intel had scheduled 50 person-years to design the processor, but it was falling behind schedule. The design team decided to automate chunks of the layout, developing "automatic place and route" software.1 This was a risky decision since if the software couldn't create a dense enough layout, the chip couldn't be manufactured. But in the end, the 386 finished ahead of schedule, an almost unheard-of accomplishment.

In this article, I take a close look at the "standard cells" used in the 386, the logic blocks that were arranged and wired by software. Reverse-engineering these circuits shows how standard cells implement logic gates, latches, and other components with CMOS transistors. Modern integrated circuits still use standard cells, much smaller now, of course, but built from the same principles.

The photo below shows the 386 die with the automatic-place-and-route regions highlighted in red. These blocks of unstructured logic have cells arranged in rows, giving them a characteristic striped appearance. In comparison, functional blocks such as the datapath on the left and the microcode ROM in the lower right were designed manually to optimize density and performance, giving them a more solid appearance. As for other features on the chip, the black circles around the border are bond wire connections that go to the chip's external pins. The chip has two metal layers, a small number by modern standards, but a jump from the single metal layer of earlier processors such as the 286. The metal appears white in larger areas, but purplish where circuitry underneath roughens its surface. For the most part, the underlying silicon and the polysilicon wiring on top are obscured by the metal layers.

Die photo of the 386 processor with standard-cell logic highlighted in red.

Die photo of the 386 processor with standard-cell logic highlighted in red.

Early processors in the 1970s were usually designed by manually laying out every transistor individually, fitting transistors together like puzzle pieces to optimize their layout. While this was tedious, it resulted in a highly dense layout. Federico Faggin, designer of the popular Z80 processor, describes finding that the last few transistors wouldn't fit, so he had to erase three weeks of work and start over. The closeup of the resulting Z80 layout below shows that each transistor has a different, complex shape, optimized to pack the transistors as tightly as possible.2

A closeup of transistors in the Zilog Z80 processor (1976). This chip is NMOS, not CMOS, which provides more layout flexibility. The metal and polysilicon layers have been removed to expose the underlying silicon. The lighter stripes over active silicon indicate where the polysilicon gates were. I think this photo is from the Visual 6502 project but I'm not sure.

A closeup of transistors in the Zilog Z80 processor (1976). This chip is NMOS, not CMOS, which provides more layout flexibility. The metal and polysilicon layers have been removed to expose the underlying silicon. The lighter stripes over active silicon indicate where the polysilicon gates were. I think this photo is from the Visual 6502 project but I'm not sure.

Standard-cell logic is an alternative that is much easier than manual layout.3 The idea is to create a standard library of blocks (cells) to implement each type of gate, flip-flop, and other low-level component. To use a particular circuit, instead of arranging each transistor, you use the standard design. Each cell has a fixed height but the width varies as needed, so the standard cells can be arranged in rows. For example, the die photo below three cells in a row: a latch, a high-current inverter, and a second latch. This region has 24 transistors in total with PMOS above and NMOS below. Compare the orderly arrangement of these transistors with the Z80 transistors above.

Some standard cell circuitry in the 386. I removed the metal and polysilicon to show the underlying silicon. The irregular blotches are oxide that wasn't fully removed, and can be ignored.

Some standard cell circuitry in the 386. I removed the metal and polysilicon to show the underlying silicon. The irregular blotches are oxide that wasn't fully removed, and can be ignored.

The space between rows is used as a "wiring channel" that holds the wiring between the cells. The photo below zooms out to show four rows of standard cells (the dark bands) and the wiring in between. The 386 uses three layers for this wiring: polysilicon and the upper metal layer (M2) for vertical segments and the lower metal layer (M1) for horizontal segments.

Some standard-cell logic in the 386 processor.

Some standard-cell logic in the 386 processor.

To summarize, with standard cell logic, the cells are obtained from the standard cell library as needed, defining the transistor layout and the wiring inside the cell. However, the locations of each cell (placing) need to be determined, as well as how to arrange the wiring (routing). As will be seen, placing and routing the cells can be done manually or automatically.

Use of standard cells in the 386

Fairly late in the design process, the 386 team decided to use automatic place and route for parts of the chip. By using automatic place and route, 2,254 gates (consisting of over 10,000 devices) were placed and routed in seven weeks. (These numbers are from a paper "Automatic Place and Route Used on the 80386", co-written by Pat Gelsinger, now the CEO of Intel. I refer to this paper multiple times, so I'll call it APR386 for convenience.4) Automatic place and route was not only faster, but it avoided the errors that crept in when layout was performed manually.5

The "place" part of automatic place and route consists of determining the arrangement of the standard cells into rows to minimize the distance between connected cells. Running long wires between cells wastes space on the die, since you end up with a lot of unnecessary metal wiring. But more importantly, long paths have higher resistance, slowing down the signals. Placement is a difficult optimization problem that is NP-complete. Moreover, the task was made more complicated by weighting paths by importance and electrical characteristics, classifying signals as "normal", "fast", or "critical". Paths were also weighted to encourage the use of the thicker M2 metal layer rather than the lower M1 layer.

The 386 team solved the placement problem with a program called Timberwolf, developed by a Berkeley grad student. As one member of the 386 team said, "If management had known that we were using a tool by some grad student as a key part of the methodology, they would never have let us use it." Timberwolf used a simulated annealing algorithm, based on a simulated temperature that decreased over time. The idea is to randomly move cells around, trying to find better positions, but gradually tighten up the moves as the "temperature" drops. At the end, the result is close to optimal. The purpose of the temperature is to avoid getting stuck in a local minimum by allowing "bad" changes at the beginning, but then tightening up the changes as the algorithm progresses.

Once the cells were placed in their positions, the second step was "routing", generating the layout of all the wiring. A suitable commercial router was not available in 1984, so Intel developed its own. As routing is a difficult problem (also NP-complete), they took an iterative heuristic approach, repeatedly routing until they found the smallest channel height that would work. (Thus, the wiring channels are different sizes as needed.) Then they checked the R-C timing of all the signals to find any signals that were too slow. Designers could boost the size of the associated drivers (using the variety of available standard cells) and try the routing again.

Brief CMOS overview

The 386 was the first processor in Intel's x86 line to be built with a technology called CMOS instead of using NMOS. Modern processors are all built from CMOS because CMOS uses much less power than NMOS. CMOS is more complicated to construct, though, because it uses two types of transistors—NMOS and PMOS—so early processors were typically NMOS. But by the mid-1980s, the advantages of switching to CMOS were compelling.

The diagram below shows how an NMOS transistor is constructed. The transistor can be considered a switch between the source and drain, controlled by the gate. The source and drain regions (green) consist of silicon doped with impurities to change its semiconductor properties, forming N+ silicon. The gate consists of a layer of polysilicon (red), separated from the silicon by a very thin insulating oxide layer. Whenever polysilicon crosses active silicon, a transistor is formed. A PMOS transistor has similar construction except it swaps the N-type and P-type silicon, consisting of P+ regions in a substrate of N silicon.

Diagram showing the structure of an NMOS transistor.

Diagram showing the structure of an NMOS transistor.

The NMOS and PMOS transistors are opposite in their construction and operation. An NMOS transistor turns on when the gate is high, while a PMOS transistor turns on when the gate is low. An NMOS transistor is best at pulling its output low, while a PMOS transistor is best at pulling its output high. In a CMOS circuit, the transistors work as a team, pulling the output high or low as needed; this is the "Complementary" in CMOS. (The behavior of MOS transistors is complicated, so this description is simplified, just enough to understand digital circuits.)

One complication is that NMOS transistors are built on P-type silicon, while PMOS transistors are built on N-type silicon. Since the silicon die itself is N silicon, the NMOS transistors need to be surrounded by a tub or well of P silicon.6 The cross-section diagram below shows how the NMOS transistor on the left is embedded in a well of P-type silicon.

Simplified structure of the CMOS circuits.

Simplified structure of the CMOS circuits.

For proper operation, the silicon that surrounds transistors needs to be connected to the appropriate voltage through "tap" contacts.7 For PMOS transistors, the substrate is connected to power through the taps, while for NMOS transistors the well region is connected to ground through the taps. The chip needs to have enough taps to keep the voltage from fluctuating too much; each standard cell typically has a positive tap and a ground tap.

The actual structure of the integrated circuit is much more three-dimensional than the diagram above, due to the thickness of the various layers. The diagram below is a more accurate cross-section. The 386 has two layers of metal: the lower metal layer (M1) in blue and the upper metal layer (M2) in purple. Polysilicon is colored red, while the insulating oxide layers are gray.

Cross-section of CHMOS III transistors. From A double layer metal CHMOS III technology, image colorized by me.

Cross-section of CHMOS III transistors. From A double layer metal CHMOS III technology, image colorized by me.

This complicated three-dimensional structure makes it harder to interpret the microscope images. Moreover, the two metal layers obscure the circuitry underneath. I have removed various layers with acids for die photos, but even so, the images are harder to interpret than those of simpler chips. If the die photos look confusing, don't be surprised.

A logic gate in CMOS is constructed from NMOS and PMOS transistors working together. The schematic below shows a NAND gate with two PMOS transistors in parallel above and two NMOS transistors in series below. If both inputs are high, the two NMOS transistors turn on, pulling the output low. If either input is low, a PMOS transistor turns on, pulling the output high. (Recall that NMOS and PMOS are opposites: a high voltage turns an NMOS transistor on while a low voltage turns a PMOS transistor on.) Thus, the CMOS circuit below produces the desired output for the NAND function.

A CMOS NAND gate.

A CMOS NAND gate.

The diagram below shows how this NAND gate is implemented in the 386 as a standard cell.9 A lot is going on in this cell, but it boils down to four transistors, as in the schematic above. The yellow region is the P-type silicon that forms the two PMOS transistors; the transistor gates are where the polysilicon (red) crosses the yellow region.8 (The middle yellow region is the drain for both transistors; there is no discrete boundary between the transistors.) Likewise, the two NMOS transistors are at the bottom, where the polysilicon (red) crosses the active silicon (green). The blue lines indicate the metal wiring for the cell. I thinned these lines to make the diagram clearer; in the actual cell, the metal lines are as thick as they can be without touching, so they cover most of the cell. The black circles are contacts, connections between the metal and the silicon or polysilicon. Finally, the well taps are the opposite type of silicon, connected to the underlying silicon well or substrate to keep it at the proper voltage.

A standard cell for NAND in the 386.

A standard cell for NAND in the 386.

Wiring to a cell's inputs and output takes place at the top or bottom of the cell, with wiring in the channels between rows of cells. The polysilicon input and output lines are thickened at the top and bottom of the cell to allow connections to the cell. The wiring between cells can be done with either polysilicon or metal. Typically the upper metal layer (M2) is used for vertical wiring, while the lower metal layer (M1) is used for horizontal runs. Since each standard cell only uses M1, vertical wiring (M2) can pass over cells. Moreover, a cell's output can also use a vertical metal wire (M2) rather than the polysilicon shown. The point is that there is a lot of flexibility in how the system can route wires between the cells. The power and ground wires (M1) are horizontal so they can run from cell to cell and a whole row can be powered from the ends.

The photo below shows this NAND cell with the metal layers removed by acid, leaving the silicon and the polysilicon. You can match the features in the photo with the diagram above. The polysilicon appears green due to thin-film effects. At the bottom, two polysilicon lines are connected to the inputs.

Die photo of the NAND standard cell with the metal layers removed. The image isn't as clear as I would like, but it was very difficult to remove the metal without destroying the polysilicon.

Die photo of the NAND standard cell with the metal layers removed. The image isn't as clear as I would like, but it was very difficult to remove the metal without destroying the polysilicon.

The photo below shows how the cell appears in the original die. The two metal layers are visible, but they hide the polysilicon and silicon underneath. The vertical metal stripes are the upper (M2) wiring while the lower metal wiring (M1) makes up the standard cell. It is hard to distinguish the two metal layers, which makes interpretation of the images difficult. Note that the metal wiring is wide, almost completely covering the cell, with small gaps between wires. The contacts are visible as dark circles. Is hard to recognize the standard cells from the bare die, as the contact pattern is the only distinguishing feature.

Die photo of the NAND standard cell showing the metal layer.

Die photo of the NAND standard cell showing the metal layer.

One of the interesting features of the 386's standard cell library is that each type of logic gate is available in multiple drive strengths. That is, cells are available with small transistors, large transistors, or multiple transistors in parallel. Because the wiring and the transistor gates have capacitance, a delay occurs when changing state. Bigger transistors produce more current, so they can switch the values on a wire faster. But there are two disadvantages to bigger transistors. First, they take up more space on the die. But more importantly, bigger transistors have bigger gates with more capacitance, so their inputs take longer to switch. (In other words, increasing the transistor size speeds up the output but slows the input, so overall performance could end up worse.) Thus, the sizes of transistors need to be carefully balanced to achieve optimum performance.10 With a variety of sizes in the standard cell library, designers can make the best choices.

The image below shows a small NAND gate. The design is the same as the one described earlier, but the transistors are much smaller. (Note that there is one row of metal contacts instead of two or three.) The transistor gates are about half as wide (measured vertically) so the NAND gate will produce about half the output current.11

Die photo of a small NAND standard cell with the metal removed.

Die photo of a small NAND standard cell with the metal removed.

Since the standard cells are all the same height, the maximum size of a transistor is limited. To provide a larger drive strength, multiple transistors can be used in parallel. The NAND gate below uses 8 transistors, four PMOS and four NMOS, providing twice as much current.

A large NAND gate as it appears on the die, with the metal removed. The left side is slightly obscured by some remaining oxide.

A large NAND gate as it appears on the die, with the metal removed. The left side is slightly obscured by some remaining oxide.

The diagram below shows the structure of the large NAND gate, essentially two NAND gates in parallel. Note that input 1 must be provided separately to both halves by the routing outside the cell. Input 2, on the other hand, only needs to be supplied to the cell once, since it is wired to both halves inside the cell.

A diagram showing the structure of the large NAND gate.

A diagram showing the structure of the large NAND gate.

Inverters are also available in a variety of drive strengths, from very small to very large, as shown below. The inverter on the left uses the smallest transistors, while the inverter on the right not only uses large transistors but is constructed from six inverters in parallel. One polysilicon input controls all the transistors.

A small inverter and a large inverter.

A small inverter and a large inverter.

A more complex standard cell is XOR. The diagram below shows an XOR cell with large drive current. (There are smaller XOR cells). As with the large NAND gate, the PMOS transistors are doubled up for more current. The multiple input connections are handled by the routing outside the cell. Since the NMOS transistors don't need to be doubled up, there is a lot of unused space in the lower part of the cell. The extra space is used for a very large tap contact, consisting of 24 contacts to ground the well.

The structure of an XOR cell with large drive current.

The structure of an XOR cell with large drive current.

XOR is a difficult gate to build with CMOS. The cell above implements it by combining a NOR gate and an AND-NOR gate, as shown below. You can verify that if both inputs are 0 or both inputs are 1, the output is forced low as desired. In the layout above, the NOR gate is on the left, while the AND-NOR gate has the AND part on the right. A metal wire down the center connects the NOR output to the AND-NOR input. The need for two sub-gates is another reason why the XOR cell is so large.

Schematic of the XOR cell.

Schematic of the XOR cell.

I'll describe one more cell, the latch, which holds one bit and is controlled by a clock signal. Latches are heavily used in the 386 whenever a signal needs to be remembered or a circuit needs to be synchronous. The 386 has multiple types of standard cell latches including latches with set or reset controls and latches with different drive strengths. Moreover, two latches can be combined to form an edge-triggered flip-flop standard cell.

The schematic below shows the basic latch circuit, the most common type in the 386. On the right, two inverters form a loop. This loop can stably hold a 0 or 1 value. On the left, a PMOS transistor and an NMOS transistor form a transmission gate. If the clock is high, both transistors will turn on and pass the input through. If the clock is low, both transistors will turn off and block the input. The trick to the latch is that one inverter is weak, producing just a small current. The consequence is that the input can overpower the inverter output, causing the inverter loop to switch to the input value. The result is that when the clock is high, the latch will pass the input value through to the output. But when the clock is low, the latch will hold its previous value. (The output is inverted with respect to the input, which is slightly inconvenient but reduces the size of the latch.)

Schematic of a latch.

Schematic of a latch.

The standard cell layout of the latch (below) is complicated, but it corresponds to the schematic. At the left are the PMOS and NMOS transistors that form the transmission gate. In the center is the weak inverter, with its output to the left. The weak transistors are in the middle; they are overlapped by a thick polysilicon region, creating a long gate that produces a low current.12 At the right is the inverter that drives the output. The layout of this circuit is clever, designed to make the latch as compact as possible. For example, the two inverters share power and ground connections. Notice how the two clock lines pass from top to bottom through gaps in the active silicon so each line only forms one transistor. Finally, the metal line in the center connects the transmission gate outputs and the weak inverter output to the other inverter's input, but asymmetrically at the top so the two inverters don't collide.

The standard cell layout of a latch.

The standard cell layout of a latch.

To summarize, I examined many (but not all) of the standard cells in the 386 and found about 70 different types of cells. These included the typical logic gates with various drive strengths: inverters, buffers, XOR, XNOR, AND-NOR, and 3- and 4-input logic gates. There are also transmission gates including ones that default high or low, as well as multiplexers built from transmission gates. I found a few cells that were surprising such as dual inverters and a combination 3-input and 2-input NAND gate. I suspect these consist of two standard cells that were merged together, since they seem too specialized to be part of a standard cell library.

The APR386 paper showed six of the standard cells in the 386 with the diagram below. The small and large inverters are the same as the ones described above, as is the NAND gate NA2B. The latch is similar to the one described above, but with larger transistors. The APR386 paper also showed a block of standard cells, which I was able to locate in the 386.13

Examples of standard cells, from APR386. The numbers are not defined but may indicate input and output capacitance. (Click for a larger version.)

Examples of standard cells, from APR386. The numbers are not defined but may indicate input and output capacitance. (Click for a larger version.)

Intel's standard cell line

Intel productized its standard cells around 1986 as a 1.5 µm library using Intel's CMOS technology (called CHMOS III).14 Although the library had over 100 cell types, it was very limited compared to the cells used inside the 386. The library included logic gates, flip-flops, and latches as well as scalable registers, counters, and adders. Most gates only came in one drive strength. Even inverters only came in "normal" and "high" drive strength. I assume these cells are the same as the ones used in the 386, but I don't have proof. The library also included larger devices such as a cell-compatible 80C51 microcontroller and PC peripheral chips such as the 8259 programmable interrupt controller and the 8254 programmable interval timer. I think these were re-implemented using standard cells.

Intel later produced a 1.0 µm library using CHMOS IV, for use "both by ASIC customers and Intel's internal chip designers." This library had a larger collection of drive strengths. The 1.0 µm library included the 80C186 and associated peripheral chips.

Layout techniques in the 386

In this section, I'll look at the active silicon regions, making the cells themselves more visible. In the photos below, I dissolved the metal and polysilicon, leaving the active silicon. (Ignore the irregular greenish shapes; these are oxide that wasn't fully removed.)

The photo below shows the silicon for three rows of standard cells using automatic place and route. You can see the wide variety of standard cell widths, but the height of the cells is constant. The transistor gates are visible as the darker vertical stripes across the silicon. You may be able to spot the latch in each row, distinguished by the long, narrow transistors of the weak inverters.

Three rows of standard cells that were automatically placed and routed.

Three rows of standard cells that were automatically placed and routed.

In the first row, the larger PMOS transistors are on top, while the smaller NMOS transistors are below. This pattern alternates from row to row, so the second row has the NMOS transistors on top and the third row has the PMOS transistors on top. The height of the wiring channel between the cells is variable, made as small as possible while fitting the wiring.

The 386 also contains regions of standard cells that were apparently manually placed and routed, as shown in the photo below. Using standard cells avoids the effort of laying out each transistor, so it is still easier than a fully custom layout. These cells are in rows, but the rows are now double rows with channels in between. The density is higher, but routing the wires becomes more challenging.

Three rows of standard cells that were manually placed and routed.

Three rows of standard cells that were manually placed and routed.

For critical circuitry such as the datapath, the layout of each transistor was optimized. The register file, for example, has a very dense layout as shown below. As you can see, the density is much higher than in the previous photos. (The three photos are at the same scale.) Transistors are packed together with very little wasted space. This makes the layout difficult since there is little room for wiring. For this particular circuit, the lower metal layer (M1) runs vertically with signals for each bit while the upper metal layer (M2) runs horizontally for power, ground, and control signals.15

Three rows of standard cells that were manually placed and routed.

Three rows of standard cells that were manually placed and routed.

The point of this is that the 386 uses a variety of different design techniques, from dense manual layout to much faster automated layout. Different techniques were used for different parts of the chip, based on how important it was to optimize. For example, circuits in the datapath were typically repeated 32 times, once for each bit, so manual effort was worthwhile. The most critical functional blocks were the microcode ROM (CROM), large PLAs, ALU, TLB (translation lookaside buffer), and the barrel shifter.16

Conclusions

Standard cell logic and automatic place and route have a long history before the 386, back to the early 1970s, so this isn't an Intel invention.17 Nonetheless, the 386 team deserves the credit for deciding to use this technology at a time when it was a risky decision. They needed to develop custom software for their placing and routing needs, so this wasn't a trivial undertaking. This choice paid off and they completed the 386 ahead of schedule. The 386 ended up being a huge success for Intel, moving the x86 architecture to 32-bits and defining the dominant computer architecture for the rest of the 20th century.

If you're interested in standard cell logic, I wrote about standard cell logic in an IBM chip. I plan to write more about the 386, so follow me on Twitter @kenshirriff or RSS for updates. I'm also on Mastodon occasionally as @[email protected]. Thanks to Pat Gelsinger and Roxanne Koester for providing helpful papers.

Notes and references

  1. The decision to use automatic place and route is described on page 13 of the Intel 386 Microprocessor Design and Development Oral History Panel, a very interesting document on the 386 with discussion from some of the people involved in its development. 

  2. Circuits that had a high degree of regularity, such as the arithmetic/logic unit (ALU) or register storage were typically constructed by manually laying out a block to implement a bit and then repeating the block as needed. Because a circuit was repeated 32 times for the 32-bit processor, the additional effort was worthwhile. 

  3. An alternative layout technique is the gate array, which doesn't provide as much flexibility as a standard cell approach. In a gate array (sometimes called a master slice), the chip had a fixed array of transistors (and often resistors). The chip could be customized for a particular application by designing the metal layer to connect the transistors as needed. The density of the chip was usually poor, but gate arrays were much faster to design, so they were advantageous for applications that didn't need high density or produced a relatively small volume of chips. Moreover, manufacturing was much faster because the silicon wafers could be constructed in advance with the transistor array and warehoused. Putting the metal layer on top for a particular application could then be quick. Similar gate arrays used a fixed arrangement of logic gates or flip-flops, rather than transistors. Gate arrays date back to 1967

  4. The full citation for the APR386 paper is "Automatic Place and Route Used on the 80386" by Joseph Krauskopf and Pat Gelsinger, Intel Technology Journal, Spring 1986. I was unable to find it online. 

  5. Once the automatic place and route process had finished, the mask designers performed some cleanup along with compaction to squeeze out wasted space, but this was a relatively minor amount of work.

    While manual optimization has benefits, it can also be overdone. When the manufacturing process improved, the 80386 moved from a 1.5 µm process to a 1 µm process. The layout engineers took advantage of this switch to optimize the standard cell circuitry, manually squeezing out some extra space. Unfortunately, optimizing one block of a die doesn't necessarily make the die smaller, since the size is constrained by the largest blocks. The result is that the optimized 80386 has blocks of empty space at the bottom (visible as black rectangles) and the standard-cell optimization didn't provide any overall benefit. (As the Pentium Pro chief architect Robert Colwell explains, "Removing the state of Kansas does not make the perimeter of the United States any smaller.")

    Comparison of the 1.5 µm die and the 1 µm die at the same scale. Photos courtesy of Antoine Bercovici.

    Comparison of the 1.5 µm die and the 1 µm die at the same scale. Photos courtesy of Antoine Bercovici.

    At least compaction went better for the 386 than for the Pentium. Intel performed a compaction on the Pentium shortly before release, attempting to reduce the die size. The engineers shrunk the floating point divider, removing some lookup table cases that they proved were unnecessary. Unfortunately, the proof was wrong, resulting in floating point errors in a few cases. This caused the infamous Pentium FDIV bug, a problem that became highly visible to the general public. Replacing the flawed processors cost Intel 475 million dollars. And it turned out that shrinking the floating point divider had no effect on the overall die size.

    Coincidentally, early models of the 386 had an integer multiplication bug, but Intel fixed this with little cost or criticism. The 386 bug was an analog issue that only showed up unpredictably with a combination of argument values, temperature, and manufacturing conditions. 

  6. This chip is built on a substrate of N-type silicon, with wells of P-type silicon for the NMOS transistors. Chips can be built the other way around, starting with P-type silicon and putting wells of N-type silicon for the PMOS transistors. Another approach is the "twin-well" CMOS process, constructing wells for both NMOS and PMOS transistors. 

  7. The bulk silicon voltage makes the boundary between a transistor and the bulk silicon act as a reverse-biased diode, so current can't flow across the boundary. Specifically, for a PMOS transistor, the N-silicon substrate is connected to the positive supply. For an NMOS transistor, the P-silicon well is connected to ground. A P-N junction acts as a diode, with current flowing from P to N. But the substrate voltages put P at ground and N at +5, blocking any current flow. The result is that the bulk silicon can be considered an insulator, with current restricted to the N+ and P+ doped regions. If this back bias gets reversed, for example, due to power supply fluctuations, current can flow through the substrate. This can result in "latch-up", a situation where the N and P regions act as parasitic NPN and PNP transistors that latch into the "on" state. This shorts power and ground and can destroy the chip. The point is that the substrate voltages are very important for the proper operation of the chip. 

  8. I'm using the standard CMOS coloring scheme for my diagrams. I'm told that Intel uses a different color scheme internally. 

  9. The schematic below shows the physical arrangement of the transistors for the NAND gate, in case it is unclear how to get from the layout to the logic gate circuit. The power and ground lines are horizontal so power can pass from cell to cell when the cells are connected in rows. The gate's inputs and outputs are at the top and bottom of the cell, where they can be connected through the wiring channels. Even though the transistors are arranged horizontally, the PMOS transistors (top) are in parallel, while the NMOS transistors (bottom) are in series.

    Schematic of the NAND gate as it is arranged in the standard cell.

    Schematic of the NAND gate as it is arranged in the standard cell.

     

  10. The 1999 book Logical Effort describes a methodology for maximizing the performance of CMOS circuits by correctly sizing the transistors. 

  11. Unfortunately, the word "gate" is used for both transistor gates and logic gates, which can be confusing. 

  12. You might expect that these transistors would produce more current since they are larger than the regular transistors. The reason is that a transistor's current output is proportional to the gate width divided by the length. Thus, if you make the transistor bigger in the width direction, the current increases, but if you make the transistor bigger in the length direction, the current decreases. You can think of increasing width as acting as multiple transistors in parallel. Increasing length, on the other hand, makes a longer path for current to get from the source to the drain, weakening it. 

  13. The APR386 paper discusses the standard-cell layout in detail. It includes a plot of a block of standard-cell circuitry (below).

    A block of standard-cell circuitry from APR386.

    A block of standard-cell circuitry from APR386.

    After carefully studying the 386 die, I was able to find the location of this block of circuitry (below). The two regions match exactly; they look a bit different because the M1 metal layer (horizontal) doesn't show up in the plot above.

    The same block of standard cells on the 386 die.

    The same block of standard cells on the 386 die.

     

  14. Intel's CHMOS III standard cells are documented in Introduction to Intel Cell-Based Design (1988). The CHMOS IV library is discussed in Design Methodology for a 1.0µ Cell-based Library Efficiently Optimized for Speed and Area. The paper Validating an ASIC Standard Cell Library covers both libraries. 

  15. For details on the 386's register file, see my earlier article

  16. Source: "High Performance Technology Circuits and Packaging for the 80386", Jan Prak, Proceedings, ICCD Conference, Oct. 1986. 

  17. I'll provide more history on standard cells in this footnote. RCA patented a bipolar standard cell in 1971, but this was a fixed arrangement of transistors and resistors, more of a gate array than a modern standard cell. Bell Labs researched standard cell layout techniques in the early 1970s, calling them Polycells, including a 1973 paper by Brian Kernighan. By 1979 A Guide to LSI Implementation discussed the standard cell approach and it was described as well-known in this patent application. Even so, Electronics called these design methods "futuristic" in 1980.

    Standard cells became popular in the mid-1980s as faster computers and improved design software made it practical to produce semi-custom designs that used standard cells. Standard cells made it to the cover of Digital Design in August 1985, and the article inside described numerous vendors and products. Companies like Zymos and VLSI Technology (VTI) focused on standard cells. Traditional companies such as Texas Instruments, NCR, GE/RCA, Fairchild, Harris, ITT, and Thomson introduced lines of standard cell products in the mid-1980s.