Fault tolerant cell array architecture

A cell array and fault-tolerant technology, applied in the field of data processing systems. It addresses the problems that lithographic errors limit the complexity of circuitry that can be fabricated in one piece without fatal flaws, that each system requires many separate chips, and that the number of interconnections is severely limited, so as to achieve moderate yields of defect-free arrays and high redundancy.

Inactive Publication Date: 2008-02-07
NORMAN RICHARD S

AI Technical Summary

Benefits of technology

[0020] It is therefore one object of the present invention to provide a highly redundant network of cells that allows a large array of cells to be organized from a monolithically fabricated unit, with at least moderate yields of defect-free arrays in spite of significant numbers of defective cells, where all array cells can be directly addressed and have access to a global data bus, allowing the cell array to be used as a compact high-performance memory system.
[0021] It is another object of the present invention to provide a highly redundant network of cells that allows a large array of cells to be organized on a monolithically fabricated unit, with at least moderate yields of defect-free arrays in spite of significant numbers of defective cells, where all array cells have bi-directional communication with their neighboring array cells in at least 3 total dimensions (of which at least two dimensions are physical), allowing the cell array to be efficiently used as a parallel processing system on massively parallel tasks of 3-dimensional or higher connectivity.
[0022] It is another object of the present invention to provide a cell-based fault-tolerant array containing sufficient redundancy to allow cells large enough to contain RISC (Reduced Instruction Set Computer) or CISC (Complex Instruction Set Computer) processors to be used while maintaining at least moderate yields on up to wafer-sized arrays.

Problems solved by technology

But lithographic errors have set limits on the complexity of circuitry that can be fabricated in one piece without fatal flaws.
Portable computers using single-chip processors can be built on single circuit boards today, but because lithographic errors limit the size and complexity of today's chips, each system still requires many separate chips.
Using separate chips also creates off-chip data flow bottlenecks because the chips are connected on a macroscopic rather than a microscopic scale, which severely limits the number of interconnections.
Macroscopic inter-chip connections also increase power consumption.
Furthermore, even single board systems use separate devices external to that board for system input and output, further increasing system size and power consumption.
The most compact systems thus suffer from severe limits in battery life, display resolution, memory, and processing power.
Chip size limits, however, prevent the amount of on-chip memory from exceeding a tiny fraction of the memory used in a whole system.
While such shared memory parallel systems do remove the von Neumann uniprocessor bottleneck, the funneling of memory access from all the processors through a single data path rapidly reduces the effectiveness of adding more processors.
This requires complex macroscopic (and hence off-chip-bottleneck-limited) connections between the processors and external chips and devices, which rapidly increases the cost and complexity of the system as the number of processors is increased.
Having each processor set connected to an external I/O device also necessitates having a multitude of connections between the processor array and the external devices, thus greatly increasing the overall size, cost and complexity of the system.
The chip-size limit, however, forces a severe trade-off between number and size of processors in such architectures; the CM-1 chip used 1-bit processors instead of the 8-bit to 32-bit processors in common use at that time.
But even for massively parallel tasks, trading one 32-bit processor per chip for 32 one-bit processors per chip does not produce any performance gains except for those tasks where only a few bits at a time can be processed by a given processor.
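The trade-off described above can be illustrated with a minimal sketch. It assumes a simplified bit-serial cycle model that is not taken from the source: a processor narrower than its operand needs one cycle per processor-width slice of that operand. The function name `ops_per_cycle` and its parameters are hypothetical.

```python
def ops_per_cycle(num_processors: int, processor_bits: int, operand_bits: int) -> float:
    """Aggregate throughput in operations per cycle.

    Assumption (illustrative only): a processor needs
    ceil(operand_bits / processor_bits) cycles to finish one operation.
    """
    cycles_per_op = -(-operand_bits // processor_bits)  # ceiling division
    return num_processors / cycles_per_op


# One 32-bit processor versus 32 one-bit processors on the same chip.
wide_on_32_bit_data = ops_per_cycle(1, 32, 32)
narrow_on_32_bit_data = ops_per_cycle(32, 1, 32)
wide_on_4_bit_data = ops_per_cycle(1, 32, 4)
narrow_on_4_bit_data = ops_per_cycle(32, 1, 4)
```

Under this model the two configurations tie on 32-bit operands, and the one-bit processors pull ahead only when each operation touches just a few bits, which matches the exception stated in the text.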
Furthermore, these non-standard processors do not run standard software, requiring everything from operating systems to compilers to utilities to be re-written, greatly increasing the expense of programming such systems.
The sizes of these arrays are also limited by lithographic errors, so systems based on such arrays are subjected to the off-chip data flow bottleneck.
Since the output elements cannot add or subtract or edit-and-pass-on a data stream, such display elements can do no data decompression or other processing, so the output array requires a single uncompressed data stream, creating a bandwidth bottleneck as array size increases.
This necessity for perfection creates low yields and high costs for such displays.
But increased use of such links and increases in their range and data transfer rates are all increasing their demands for bandwidth.
Some electromagnetic frequency ranges are already crowded, making this transmission bottleneck increasingly a limiting factor.
Power requirements also limit the range of such systems and often require the transmitter to be physically pointed at the receiver for reliable transmission to occur.
Processors, however, have large numbers of circuits with unique functions (often referred to in the art as random logic circuits), and a spare circuit capable of replacing one kind of defective circuit cannot usually replace a different kind, making these general spare-circuit schemes impractical for processors.
Of these replication schemes, circuit duplication schemes, as exemplified by U.S. Pat. Nos. 4,798,976 and 5,111,060, use the least resources for redundancy, but provide the least protection against defects because two defective copies of a given circuit (or a defect in their joint output line) still create an uncorrectable defect.
This, however, leads to a dilemma: when the voting is done on the output of large blocks of circuitry, there is a significant chance that two out of the three copies will have defects, but when the voting is done on the output of small blocks of circuitry, many voting circuits are needed, increasing the likelihood of errors in the voting circuits themselves. Ways to handle having two defective circuits out of three (which happens more frequently than the 2 defects out of 2 problem that the duplication schemes face) are also known.
Not only is a large N an inefficient use of space, but it increases the complexity of the voting circuits themselves, and therefore the likelihood of failures in them.
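The comparison between duplication and N-fold voting can be checked with a short calculation. This is a sketch under assumptions the source does not state: each copy is defective independently with the same probability, the voter itself is perfect, and N is odd. The function names are hypothetical.

```python
from math import comb


def majority_failure_probability(n_copies: int, p_defect: float) -> float:
    """Probability that more than half of n_copies are defective,
    so a majority vote gives the wrong result.
    Assumes independent defects and a defect-free voter."""
    needed = n_copies // 2 + 1
    return sum(
        comb(n_copies, k) * p_defect**k * (1 - p_defect) ** (n_copies - k)
        for k in range(needed, n_copies + 1)
    )


def duplication_failure_probability(p_defect: float) -> float:
    """Probability that both copies in a duplication scheme are defective."""
    return p_defect * p_defect


p = 0.1  # illustrative per-copy defect probability, not a figure from the source
two_of_three = majority_failure_probability(3, p)
two_of_two = duplication_failure_probability(p)
```

With these assumptions, at least two defective copies out of three is more likely than two out of two for any defect probability strictly between 0 and 1, which is consistent with the remark in the text. The sketch ignores voter defects, which the text identifies as the other half of the dilemma.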
While this scheme can be applied to integrated circuits (although it predates them considerably), it requires four times as many gates, each with twice as many inputs, as equivalent non-redundant logic, increasing the circuit area and power requirements too much to be practical.
All these N-fold redundancy schemes also suffer from problems where if the replicates are physically far apart, gathering the signals requires extra wiring, creating propagation delays, while if the replicates are close together, a single large lithographic error can annihilate the replicates en masse, thus creating an unrecoverable fault.
The resulting one-dimensional chains, however, lack the direct addressability needed for fast memory arrays, the positional regularity of array cells needed for I/O arrays, and the two-dimensional or higher neighbor-to-neighbor communication needed to efficiently handle most parallel processing tasks.
This limits the usefulness of these arrangements to low or medium performance memory systems and to tasks dominated by one-dimensional or lower connectivity, such as sorting data.
Addressing cells through a global bus has significant drawbacks; it does not allow parallel access of multiple cells, and comparing the cell's address with an address on the bus introduces a delay in accessing the cell.
Furthermore, with large numbers of cells it is an inefficient use of power: for N cells to determine whether they are being addressed, each must check a minimum of log2(N) address bits (in binary systems), so an address signal requires enough power to drive N*log2(N) inputs.
This is a high price in a system where all intercell signals are global.
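The N*log2(N) figure above can be worked through for concrete array sizes. This is a minimal sketch; the function name `address_inputs_driven` is hypothetical, and rounding the address width up to a whole number of bits is an assumption for array sizes that are not powers of two.

```python
def address_inputs_driven(num_cells: int) -> int:
    """Number of gate inputs one global address signal must drive
    when every cell compares the full binary address."""
    if num_cells < 1:
        raise ValueError("num_cells must be at least 1")
    address_bits = (num_cells - 1).bit_length()  # ceil(log2(N)) for N >= 1
    return num_cells * address_bits


inputs_for_1024_cells = address_inputs_driven(1024)  # 1024 cells * 10 bits
inputs_for_1000_cells = address_inputs_driven(1000)  # 1000 cells * 10 bits
```

The count grows faster than the number of cells, which is the power cost the text refers to.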
Several considerations, however, diminish its applicability to large high-performance arrays at all but the lowest defect densities.
Thus while large cells create high defect rates, small cell sizes create significant delays in the propagation of signals across the array.
As cell size decreases, yields grow rapidly, but the propagation delays grow, too.
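The tension between cell size, yield, and propagation delay can be sketched numerically. The source gives no yield formula, so this example assumes the common Poisson approximation, in which the probability that a cell is defect-free is exp(-area * defect density). All names and numeric values are illustrative assumptions.

```python
import math


def cell_yield(cell_area: float, defect_density: float) -> float:
    """Probability that one cell has no defects, assuming a Poisson defect
    model (an assumption, not taken from the source)."""
    return math.exp(-cell_area * defect_density)


def cells_crossed(array_side: float, cell_side: float) -> int:
    """Number of cells a signal passes through to cross the array in one
    dimension when it is relayed cell by cell."""
    return math.ceil(array_side / cell_side)


# Illustrative units: array side 100, defect density 0.01 per unit area.
for side in (20.0, 10.0, 5.0):
    y = cell_yield(side * side, 0.01)
    hops = cells_crossed(100.0, side)
    print(f"cell side {side:5.1f}: cell yield {y:.3f}, cells crossed {hops}")
```

Shrinking the cell raises the per-cell yield but lengthens the chain of cells a signal must traverse, which is the trade-off the text describes.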
But row-addressing signals propagated across the array would pass sequentially through up to 30 gates, creating far too long a delay for high-performance memory systems.
This interconnection scheme also has problems when used for processing cells, although it is targeted for that use.
The cell bypassing scheme does support two-dimensional neighbor-to-neighbor connectivity, and could support a column-oriented bus for each column, but it cannot support a corresponding row-oriented bus without the 2-gate-per-cell delay.
This multi-cell shift also prevents this scheme from being useful in arrays where physical position of array cells is important, such as direct input or output cell arrays.

Embodiment Construction

[0100] Direct Replacement Cell Fault Tolerant Architecture

[0101] Because lithographic errors limit the size of traditional chips, chip-based computer architectures use many separate chips for processing, memory and input/output control. A number of these separate processor, memory, and auxiliary chips are encapsulated in bulky ceramic packages and affixed to even bulkier printed circuit boards to connect to each other. A svelte processor chip like IBM/Apple/Motorola's PowerPC 601, for example, uses a ceramic holder 20 times its own size to allow it to be connected to a still-larger circuit board. While each chip uses wires fabricated on a microscopic scale (on the order of 1 micron) internally, the board-level interconnections between the chips use wires fabricated on a macroscopic scale (on the order of 1 millimeter, or 1000 times as wide). Because of this, chip-based architectures not only suffer from the expense of dicing wafers into chips then packaging and interconnecting those ...

Abstract

A data processing system containing a monolithic network of cells with sufficient redundancy provided through direct logical replacement of defective cells by spare cells to allow a large monolithic array of cells without uncorrectable defects to be organized, where the cells have a variety of useful properties. The data processing system according to the present invention overcomes the chip-size limit and off-chip connection bottlenecks of chip-based architectures, the von Neumann bottleneck of uniprocessor architectures, the memory and I/O bottlenecks of parallel processing architectures, and the input bandwidth bottleneck of high-resolution displays, and supports integration of up to an entire massively parallel data processing system into a single monolithic entity.

Description

[0001] This application is a divisional of U.S. application Ser. No. 10/368,003 which is a continuation of U.S. application Ser. No. 10/000,813, filed on Nov. 30, 2001 entitled “Output and/or Input Coordinated Processing Array”, which is a continuation of U.S. application Ser. No. 09/679,168, filed on Oct. 4, 2000 entitled “Efficient Direct Replacement Cell Fault Tolerant Architecture”, which is a continuation of U.S. application Ser. No. 09/376,194, filed on Aug. 18, 1999 entitled “Efficient Direct Replacement Cell Fault Tolerant Architecture”, which is a continuation of U.S. application Ser. No. 08/821,672, filed on Mar. 19, 1997 entitled “A Fault Tolerant Data Processing System Fabricated on a Monolithic Substrate”, which is a continuation of U.S. application Ser. No. 08/618,397, filed on Mar. 19, 1996 entitled “Efficient Direct Replacement Cell Fault Tolerant Architecture”, which is a continuation of U.S. application Ser. No. 08/216,262, filed on Mar. 22, 1994 entitled “Efficien...

Application Information

Patent Type & Authority: Applications (United States)
IPC(8): G06F9/30, G02F1/133, G02F1/135, G06F3/033, G06F3/041, G06F3/043, G06F3/048, G06F11/20, G06F15/76, G06F15/80, G09G3/20, H01L27/144, H01L27/146, H01L27/148, H02H3/05
CPC: G02F1/13318, H01L27/14831, G06F3/041, G06F3/043, G06F3/0488, G06F3/04886, G06F11/20, G06F11/2051, G06F15/8007, G06F15/8023, G09G3/20, G09G3/2085, G09G3/2088, G09G2300/0809, G09G2300/0842, G09G2330/08, G09G2360/18, H01L27/1446, H01L27/14634, G02F1/135, G06F11/2041
Inventor: NORMAN, RICHARD S.
Owner: NORMAN RICHARD S