
How Do FPGAs Work?

Field programmable gate arrays (FPGAs) can implement any digital logic, from microprocessors to video generators or cryptocurrency miners. An FPGA consists of many logic blocks, each typically containing flip-flops and logic functions, plus a routing network that connects the logic blocks. What makes FPGAs special is that they are programmable hardware: you can redefine each logic block and the connections between them, so complex digital circuits can be built without physically wiring individual gates and flip-flops and without the expense of designing a custom integrated circuit.

FPGAs were invented by Ross Freeman, who co-founded Xilinx in 1984 and introduced the first FPGA, the XC2064. This FPGA was much simpler than modern FPGAs: it contained only 64 logic blocks, whereas modern FPGAs have thousands or millions. Even so, it led to the current multi-billion-dollar FPGA industry. Because of its importance, the XC2064 was inducted into the Chip Hall of Fame. In this article, we reverse-engineer Xilinx’s XC2064, explaining its internal circuitry (above) and how the “bitstream” programs it.

Today, FPGAs are programmed using hardware description languages such as Verilog or VHDL, but at the time Xilinx provided its own development software, XACT, which ran under MS-DOS and cost $12,000. XACT worked at a much lower level than today’s FPGA development tools: the user defined the function of each logic block (as shown in the screenshot below) and the connections between logic blocks, and XACT routed the connections and generated a bitstream file that could be loaded into the FPGA.

The two lookup tables F and G implement the logic equations at the bottom of the screen, and the top section shows the Karnaugh map for that logic.

The FPGA is configured via a bitstream, a sequence of bits in a proprietary format. If you look at the XC2064’s bitstream (shown below), it is a puzzling mix of patterns that repeat irregularly and are scattered throughout. There is no clear connection between the functions defined in XACT and the data in the bitstream. However, studying the physical circuitry of the FPGA reveals the structure of the bitstream data, so that it can be understood.

The XC2064 bitstream

How FPGAs Work

The diagram below, from the original FPGA patent, shows the basic structure of an FPGA. In this simplified FPGA, there are 9 logic blocks (in blue) and 12 I/O pins. An interconnect network links the components together. The logic blocks are connected to each other and to the I/O pins by setting switches (the diagonal lines) on the interconnect. Each logic element can be programmed with the desired logic function. The result is a highly programmable chip that can implement any circuit that fits within the available resources.

FPGA patent shows logic blocks (LE) connected via interconnects

CLBs: Configurable Logic Blocks

While the figure above shows nine configurable logic blocks (CLBs), the XC2064 has 64 CLBs. The figure below shows the structure of each CLB. Each CLB has four inputs (A, B, C, and D) and two outputs (X and Y). Between them is combinational logic that can be programmed with any desired logic function. The CLB also contains a flip-flop that allows the FPGA to implement counters, shift registers, state machines, and other stateful circuits. The trapezoids are multiplexers, which can be programmed to pass through any of their inputs. The multiplexers allow the CLB to be configured for a specific task, selecting the required signals for the flip-flop controls and outputs.

Configurable Logic Blocks in XC2064

So how does the combinational logic implement arbitrary logic functions? Does it use logic gates such as AND, OR, and XOR gates?

No, it uses a clever trick called a lookup table (LUT), which holds the truth table of the logic function. For example, a function of three variables is defined by the 8 rows of its truth table. The LUT consists of 8 bits of memory plus multiplexer circuitry to select the correct value. By storing the appropriate values in this 8-bit memory, any 3-input logic function can be implemented.
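To make the lookup-table idea concrete, here is a minimal sketch that treats a 3-input LUT as an 8-entry truth table indexed by the input values. The helper names `make_lut` and `lut_eval`, and the entry ordering `index = 4*C + 2*B + A`, are assumptions for illustration only; the sketch does not reproduce the XC2064’s actual bit ordering in the bitstream.

```python
from typing import Callable, List

def make_lut(func: Callable[[int, int, int], int]) -> List[int]:
    """Fill an 8-entry table with func(a, b, c) for every input combination.

    Assumed (hypothetical) ordering: entry index = 4*c + 2*b + a.
    """
    table = []
    for index in range(8):
        a = index & 1
        b = (index >> 1) & 1
        c = (index >> 2) & 1
        table.append(func(a, b, c) & 1)
    return table

def lut_eval(table: List[int], a: int, b: int, c: int) -> int:
    """Return the stored output bit; no gates compute the function itself."""
    return table[(c << 2) | (b << 1) | a]

# Very different functions fit in the same 8 bits of storage.
majority = make_lut(lambda a, b, c: int(a + b + c >= 2))
xor3 = make_lut(lambda a, b, c: a ^ b ^ c)

print(majority)  # [0, 0, 0, 1, 0, 1, 1, 1]
print(xor3)      # [0, 1, 1, 0, 1, 0, 0, 1]
```

Changing the function only means writing different bits into the table; the evaluation path stays the same, which is exactly why a LUT is reprogrammable.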

Interconnects

The second key part of the FPGA is the interconnect, which can be programmed to connect the CLBs in different ways. The interconnect is quite complex, but roughly speaking, there are several horizontal and vertical line segments between each pair of CLBs. Programmable interconnect points allow connections to be made between the horizontal and vertical lines, so arbitrary paths can be created.

More complex connections are made through “switch matrices”. Each switch matrix has 8 pins, which can be connected together in (almost) any way. The diagram below shows the interconnect structure of the XC2064, providing connections to the logic blocks (cyan) and I/O pins (yellow). The inset shows a close-up of the routing features. The green box is the 8-pin switch matrix, while the small squares are the programmable interconnect points.

The XC2064 FPGA has an 8×8 grid of CLBs

Each CLB has a two-letter name from AA to HH. The interconnect can connect, for example, the output of block DC to the input of block DE, as shown below. The red lines indicate the routing path and the small red squares indicate activated routing points. After leaving block DC, the signal passes through a routing point to an 8-pin switch matrix (green), which directs it through two more routing points and another 8-pin switch matrix. (Unused vertical and horizontal paths are not shown.) Note that routing is quite complex; even this short path uses four routing points and two switch matrices.

Example of a signal routed from the output of block DC to block DE

The screenshot below shows how routing looks in the XACT program. The yellow lines indicate the routing between logic blocks. As more signals are added, the challenge is to route them efficiently without conflicting paths. XACT performs automatic routing, but routes can also be edited manually.

Screenshot of the XACT program

This MS-DOS program is controlled by keyboard and mouse

Implementation

The rest of this article discusses the internal circuitry of the XC2064, reverse engineered from a photo of the die.

The following figure shows the layout of the XC2064 chip. The main part of the FPGA is an 8×8 grid of tiles. Each tile contains a logic block and its adjacent routing circuitry. Although diagrams show the logic blocks (CLBs) as distinct from the routing around them, that is not how the FPGA is implemented. Instead, each logic block and its adjacent routing are implemented together as a single unit, the tile. (Specifically, a tile includes the routing above and to the left of its CLB.)

Layout of the XC2064 chip

The I/O blocks around the edge of the IC provide communication with the outside world. They are connected to the small green square pads, which are wired to the chip’s external pins. The die is divided by buffers (green): two vertical and two horizontal. These buffers amplify signals that travel long distances across the chip, reducing delay. The vertical shift register (pink) and the horizontal column select circuit (blue) are used to load the bitstream into the chip, as described below.

Tile’s Internal Architecture

The figure below shows the layout of a single tile in the XC2064; the chip contains 64 of these tiles packed together as shown above. About 40% of each tile is occupied by the memory cells (green) that hold the configuration bits. The top third of the tile handles interconnect routing through two switch matrices and many individual routing switches, with the logic block below. The key parts of the logic block are the input multiplexers, the flip-flop, and the lookup tables (LUTs). Each tile connects to adjacent tiles through vertical and horizontal wiring for interconnect, power, and ground. Configuration data bits are fed horizontally into the memory cells, while vertical signals select the specific column of memory cells to be loaded.

Layout of a single Tile in XC2064

Transistors

The FPGA is implemented in CMOS logic, which is built from NMOS and PMOS transistors. Transistors have two main roles in the FPGA. First, they are combined to form logic gates. Second, transistors are used as switches that signals pass through, for example to control routing. In this role, the transistor is called a pass transistor.

Structure of a MOSFET

The following close-up of the die photo shows what a transistor looks like under the microscope. The polysilicon gate is the serpentine line between two regions of doped silicon.

MOSFETs in FPGAs

Bitstream and Configuration Storage

The configuration information in the XC2064 is stored in configuration memory cells. Instead of using a block of RAM for storage, the FPGA’s memory is distributed across the chip in a 160 × 71 grid, so that each bit is located next to the circuitry it controls. The following diagram shows how the configuration bitstream is loaded into the FPGA. The bitstream is fed into a shift register running down the center of the chip (pink). After 71 bits have been loaded into the shift register, the column select circuitry (blue) selects a particular column of memory and loads the bits into that column in parallel. Then the next 71 bits are loaded into the shift register and the next column to the left becomes the selected column. This process repeats for all 160 columns of the FPGA, loading the entire bitstream into the chip. Using a shift register avoids bulky memory-addressing circuitry.
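The loading scheme can be simulated in a few lines. This is a minimal sketch under stated assumptions: the first column written is the rightmost one, each new bit enters the top of the shift register and pushes earlier bits down, and any framing or header bits the real file format may contain are ignored. The function names `load_bitstream` and `bit_position` are hypothetical.

```python
from typing import List, Tuple

ROWS = 71      # length of the vertical shift register (bits per column)
COLUMNS = 160  # number of memory columns, selected one at a time

def load_bitstream(bits: List[int], rows: int = ROWS, columns: int = COLUMNS) -> List[List[int]]:
    """Simulate column-by-column loading of configuration memory.

    Assumptions: loading starts at the rightmost column and moves left;
    a new bit enters the register at the top, pushing earlier bits down.
    """
    if len(bits) != rows * columns:
        raise ValueError(f"expected {rows * columns} bits, got {len(bits)}")
    memory = [[0] * columns for _ in range(rows)]  # memory[row][column]
    shift_register = [0] * rows
    column = columns - 1  # start at the right-hand column
    for count, bit in enumerate(bits, start=1):
        # Shift by one position: new bit at the top, bottom bit falls off.
        shift_register = [bit] + shift_register[:-1]
        if count % rows == 0:
            # Register is full: the column-select circuit writes it in parallel.
            for row in range(rows):
                memory[row][column] = shift_register[row]
            column -= 1  # the next column to the left becomes selected
    return memory

def bit_position(index: int, rows: int = ROWS, columns: int = COLUMNS) -> Tuple[int, int]:
    """Return (row, column) where bitstream bit `index` lands, same assumptions."""
    return rows - 1 - index % rows, columns - 1 - index // rows

# Tiny 3-row, 2-column example.
print(load_bitstream([1, 0, 0, 0, 1, 1], rows=3, columns=2))  # [[1, 0], [1, 0], [0, 1]]
```

Because `bit_position` is plain arithmetic on the file offset, the sketch also illustrates why the file layout mirrors the physical grid: there is no addressing or indirection between a bit’s position in the stream and its place on the die.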

How the bitstream is loaded into the FPGA

Importantly, the bitstream is arranged exactly as it is on the chip: the layout of the bits in the bitstream file matches their physical layout on the die. As shown below, each bit is stored next to the FPGA circuitry it controls. Thus, the bitstream file format is determined directly by the layout of the hardware circuitry. For example, where there are gaps between FPGA tiles due to buffer circuitry, the same gaps appear in the bitstream. The contents of the bitstream are not designed around software concepts such as fields, data tables, or configuration blocks. Understanding the bitstream depends on thinking from a hardware perspective rather than a software perspective.

Each bit of configuration memory is implemented as shown below. Each memory cell consists of two inverters connected in a loop. This circuit has two stable states, so it can store a bit: either the top inverter outputs 1 and the bottom inverter outputs 0, or vice versa. To write to the cell, the pass transistor on the left is switched on, allowing the data signal through. The signal on the data line simply overpowers the inverters, writing the desired bit. (The same path can also be used to read configuration data back out of the FPGA.) The Q and inverted Q outputs control the desired functions in the FPGA, such as closing routing connections, providing bits for lookup tables, or controlling latch circuits. (In most cases, only the Q output is used.)
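As a behavioral sketch (not a transistor-level model), the cell’s externally visible behavior can be captured in a tiny class: it holds its value indefinitely, and only when the column select line switches on the pass transistor does the data line override the loop. The class name `ConfigCell` and method `access` are hypothetical.

```python
class ConfigCell:
    """Behavioral model of a configuration memory cell: two cross-coupled
    inverters (a bistable loop) written through a pass transistor."""

    def __init__(self) -> None:
        self.q = 0  # output of one inverter in the loop

    @property
    def q_bar(self) -> int:
        # The other inverter in the loop always drives the complement.
        return 1 - self.q

    def access(self, column_select: int, data_line: int) -> None:
        """When column_select is high, the pass transistor conducts and the
        driven data line overpowers the loop; otherwise the cell holds."""
        if column_select:
            self.q = data_line & 1

cell = ConfigCell()
cell.access(column_select=1, data_line=1)  # write a 1
cell.access(column_select=0, data_line=0)  # column not selected: value is kept
print(cell.q, cell.q_bar)  # 1 0
```

In the FPGA, `q` would then drive something like the gate of a routing pass transistor or one entry of a LUT.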

Schematic of a configuration memory cell, from the data sheet

The top Q is the output and the bottom Q is the inverted output

The figure below shows the physical layout of the memory cells. The diagram on the left shows eight memory cells, with one cell highlighted. Each horizontal data line feeds all the memory cells in its row. Each column select line selects all the memory cells in its column for writing. The middle photo zooms in on the silicon and polysilicon that form the transistors of one memory cell.

Physical layout of storage cells

Lookup Table Multiplexer

As mentioned earlier, the FPGA implements arbitrary logic functions using lookup tables. The following figure shows how a lookup table is implemented in the XC2064. The eight values on the left are stored in eight memory cells. Four multiplexers each select one value from a pair based on input A: if A is 0, the top value is selected; if A is 1, the bottom value is selected. Next, the larger multiplexer selects one of the four values based on B and C. In this case, the result is the desired value A XOR B XOR C. By placing different values in the lookup table, the logic function can be changed as needed.
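The two-stage selection can be expressed as a small multiplexer tree. This sketch assumes the eight cells are listed top to bottom, grouped into pairs (0,1), (2,3), and so on, and that the larger 4-to-1 multiplexer uses index 2·C + B; the real wiring order on the die may differ. `mux2` and `lut_mux_tree` are hypothetical names.

```python
from typing import Sequence

def mux2(sel: int, top: int, bottom: int) -> int:
    """2-to-1 multiplexer: sel=0 passes the top input, sel=1 the bottom one."""
    return bottom if sel else top

def lut_mux_tree(cells: Sequence[int], a: int, b: int, c: int) -> int:
    """Stage 1: four small multiplexers each pick one value of a pair using A.
    Stage 2: a 4-to-1 multiplexer picks one of the four using B and C
    (assumed index 2*C + B)."""
    stage1 = [mux2(a, cells[2 * i], cells[2 * i + 1]) for i in range(4)]
    return stage1[(c << 1) | b]

# Cell contents that make the tree compute A XOR B XOR C under this ordering.
XOR3_CELLS = [0, 1, 1, 0, 1, 0, 0, 1]

for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            print(a, b, c, lut_mux_tree(XOR3_CELLS, a, b, c))
```

Under these assumptions the tree selects cell 4·C + 2·B + A, so it is functionally the same as indexing a truth table; the multiplexer structure is simply how that indexing is built from pass transistors.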

XOR implementation using lookup tables

Each multiplexer is implemented with pass transistors. Depending on the control signals, one of the pass transistors is switched on, passing its input to the output. The diagram below shows part of the LUT circuitry, multiplexing two of the bits. On the right are two memory cells. Each bit is amplified by an inverter and then passes through a pass transistor of the middle multiplexer, which selects one of the bits.

Close-up of the circuit in the LUT implementation

Flip-flop

Each CLB contains a flip-flop that allows the FPGA to implement latches, state machines, and other stateful circuits. The diagram below shows the flip-flop implementation. It uses a primary/secondary (master/slave) design. When the clock is low, the first multiplexer lets data into the primary latch. When the clock goes high, the multiplexer closes the loop of the first latch, holding the value. (The bit is inverted twice, by the OR-NAND gate and the inverter, so it remains unchanged.) Also, when the clock goes high, the secondary latch’s multiplexer receives the bit from the first latch (note that its clock is inverted). This value becomes the output of the flip-flop. When the clock goes low, the secondary multiplexer closes its loop, latching the bit. Thus, the flip-flop is edge-sensitive, latching the value on the rising edge of the clock. The set and reset lines force the flip-flop high or low.
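The edge-triggered behavior that emerges from two level-sensitive latches can be sketched behaviorally. Each call to `step` represents one clock phase; D is captured from the last low phase before the clock rises. The class name, the `step` interface, and the choice that set wins over reset when both are asserted are assumptions of this sketch, not details taken from the XC2064.

```python
class MasterSlaveFlipFlop:
    """Behavioral sketch of a primary/secondary (master/slave) flip-flop.

    Clock low: the primary latch is transparent and follows D; the secondary
    latch holds, so Q is unchanged.
    Clock high: the primary latch holds; the secondary latch is transparent
    and copies the primary's value to Q.
    """

    def __init__(self) -> None:
        self.primary = 0
        self.q = 0

    def step(self, clock: int, d: int, set_: int = 0, reset: int = 0) -> int:
        if set_ or reset:
            # Set/reset force both latches (assumed priority: set over reset).
            value = 1 if set_ else 0
            self.primary = value
            self.q = value
        elif clock:
            self.q = self.primary  # secondary passes the held primary value
        else:
            self.primary = d & 1   # primary follows D while Q holds
        return self.q

ff = MasterSlaveFlipFlop()
phases = [(0, 1), (1, 0), (1, 0), (0, 0), (1, 1)]  # (clock, d)
print([ff.step(clock, d) for clock, d in phases])  # [0, 1, 1, 1, 0]
```

Note that changing D while the clock is high (the second, third, and fifth phases) has no effect on Q; the output only changes when the clock rises.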

Flip-flop implementation, with arrows pointing to the first multiplexer and the two OR-NAND gates

8-pin switch matrix

The switch matrix is an important routing element. Each switch matrix has eight “pins” (two on each side) that can be connected in almost any combination. This allows signals to turn, split, or cross over more flexibly than individual routing points allow. The diagram below shows a portion of the routing network between four CLBs (cyan). The switch matrix (green) can be configured with any combination of the connections shown on the right. Note that each pin can be connected to 5 of the other 7 pins. For example, pin 1 can be connected to pin 3, but not to pins 2 or 4. This makes the matrix almost a crossbar, with 20 potential connections instead of 28.
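To illustrate how closing several switches at once can split a signal, the sketch below groups pins into electrically connected nets with a union-find structure. It only uses the pin pairs the text names (1–3, and 5–1 from the memory-cell example), and it deliberately does not encode which of the 28 possible pairs the real matrix provides; `switch_matrix_nets` is a hypothetical helper.

```python
from typing import Dict, Iterable, List, Set, Tuple

def switch_matrix_nets(closed: Iterable[Tuple[int, int]], pins: int = 8) -> List[Set[int]]:
    """Group switch-matrix pins into electrically connected nets.

    `closed` lists pin pairs whose pass transistor is switched on. This
    sketch does not check which pairs the real XC2064 matrix offers.
    """
    parent: Dict[int, int] = {p: p for p in range(1, pins + 1)}

    def find(p: int) -> int:
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for a, b in closed:
        parent[find(a)] = find(b)  # merge the two nets

    nets: Dict[int, Set[int]] = {}
    for p in range(1, pins + 1):
        nets.setdefault(find(p), set()).add(p)
    # Report only nets that actually join two or more pins.
    return sorted((n for n in nets.values() if len(n) > 1), key=min)

# A signal entering pin 1 is split toward pins 3 and 5.
print(switch_matrix_nets([(1, 3), (5, 1)]))  # [{1, 3, 5}]
```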

Based on the Xilinx Programmable Gate Array data sheet

The switch matrix is implemented by a row of pass transistors controlled by the memory cells above and below them. Each transistor is flanked by the two switch matrix pins that it can connect. Thus, each switch matrix has 20 associated control bits.

With two switch matrices per tile, that makes 40 control bits per tile. The figure below shows one of the memory cells, which is connected to the long, bent gate of the pass transistor below it. This transistor controls the connection between pin 5 and pin 1.
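The switch-matrix numbers in this section follow from simple counting, assuming (as the text describes) one pass transistor and one control bit per potential connection:

```latex
\begin{align*}
\binom{8}{2} = \frac{8 \times 7}{2} &= 28 && \text{pin pairs in a full 8-pin crossbar} \\
\frac{8 \times 5}{2} &= 20 && \text{connections when each pin reaches 5 of the other 7} \\
2 \times 20 &= 40 && \text{control bits per tile (two switch matrices)}
\end{align*}
```

The division by 2 appears because each connection joins two pins and would otherwise be counted once from each end.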

One of the memory cells

Therefore, the bit in the bitstream corresponding to this memory cell controls the switch connection between pin 5 and pin 1. Similarly, the other memory cells and their associated transistors control the other switch connections. Note that these connections are not arranged in any particular order, so the mapping between bitstream bits and switch pins appears arbitrary.

Input Routing

The inputs to the CLB use a different encoding scheme in the bitstream, which is explained by the hardware implementation. In the figure below, the eight circled nodes are potential inputs to CLB DD.

Schematic of the encoding scheme used for the inputs of the CLB in the bitstream

At most one node can be selected as an input, since connecting two signals to the same input would short them together. The desired input is selected with a multiplexer. One straightforward solution is an 8-input multiplexer, where 3 control bits select one of the 8 signals. Another straightforward solution is to use 8 pass transistors, each with its own control signal, switching on just one of them to select the desired signal. However, the FPGA uses a hybrid approach that avoids the decoding hardware of the first method but needs only 5 control signals instead of the 8 required by the second.

FPGA uses multiplexer to select one of eight inputs

The schematic above shows the two-stage multiplexer approach used in the FPGA. In the first stage, one of the control signals is activated. In the second stage, either the top or the bottom signal is selected as the output. For example, suppose control signal B/F is sent to the first stage and “ABCD” is sent to the second stage; then input B is the only input that passes through to the output. Therefore, selecting one of the eight inputs requires 5 bits in the bitstream and uses 5 memory cells.
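The hybrid scheme can be sketched as an encoder and a selector. The pairing of inputs (A/E, B/F, C/G, D/H) follows from the B/F example; the assumption that one memory cell drives both halves of the second stage through its Q and inverted-Q outputs is ours, offered as one way 5 cells can suffice. `control_bits_for` and `select_input` are hypothetical names.

```python
from typing import Dict, List, Sequence, Tuple

INPUTS = "ABCDEFGH"  # top group A-D, bottom group E-H

def control_bits_for(name: str) -> Tuple[List[int], int]:
    """Encode an input as 4 one-hot first-stage bits (A/E, B/F, C/G, D/H)
    plus 1 second-stage bit (1 = top group ABCD, 0 = bottom group EFGH)."""
    i = INPUTS.index(name)
    stage1 = [0, 0, 0, 0]
    stage1[i % 4] = 1
    return stage1, int(i < 4)

def select_input(stage1_bits: Sequence[int], use_top: int, signals: Dict[str, int]) -> int:
    """Two-stage hybrid multiplexer: the active first-stage signal passes one
    top and one bottom input; the second stage picks one of those two."""
    if sum(stage1_bits) != 1:
        raise ValueError("exactly one first-stage control signal must be active")
    k = list(stage1_bits).index(1)
    top, bottom = signals[INPUTS[k]], signals[INPUTS[k + 4]]
    return top if use_top else bottom

stage1, use_top = control_bits_for("B")
print(stage1, use_top)  # [0, 1, 0, 0] 1  -> 5 bits in total
```

Compared with the alternatives in the text, this uses 5 bits instead of 8 (one per pass transistor) while needing no 3-to-8 decoder, since each first-stage bit directly drives its own pass transistors.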

Conclusion

The XC2064 uses a variety of highly optimized circuits to implement its logic blocks and routing. The circuitry had to be laid out compactly to fit on the chip. Even so, the XC2064 was a very large chip, larger than the microprocessors of its time, so it was difficult to manufacture at first and cost hundreds of dollars. Compared with modern FPGAs, the XC2064 had a tiny number of cells, but it launched a revolutionary new product category.

Two concepts are the key to understanding the XC2064 bitstream. First, the FPGA consists of 64 tiles, repeated blocks that each combine a logic block and its routing. Although FPGAs are described as logic blocks surrounded by routing, that is not how they are implemented.

The second concept is that there is no abstraction in the bitstream; it maps directly onto the two-dimensional layout of the FPGA. Therefore, the bitstream only makes sense if you consider the physical layout of the FPGA.

