Fault injection and deterministic repair for debug of packaged semiconductor dies
By incorporating a built-in self-test circuit and deterministic repair technology, the problem of detecting and repairing interconnect failure modes and system failure modes in multi-chip packaging is solved, thereby improving manufacturing yield and system reliability and reducing manufacturing costs.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- INTEL CORP
- Filing Date
- 2025-11-26
- Publication Date
- 2026-06-30
AI Technical Summary
In the semiconductor chip manufacturing process, as devices shrink and performance requirements increase, package-level integration faces challenges such as undesirable material interactions, precision and scaling requirements, power delivery requirements, limited failure tolerance, and material and manufacturing costs. In particular, in multi-chip packaging, interconnect failure modes and system failure modes are difficult to detect and repair effectively.
Built-in self-test (BIST) circuitry and deterministic repair technology are employed: faults are injected and deterministically repaired by programming the failing channel register, per-channel error register, and error injection register via the JTAG protocol. The signal paths in the multi-chip package (MCP) are tested and repaired using the FBIST machine.
This enables effective detection and repair of failure modes in multi-chip packaging, improving manufacturing yield, reducing manufacturing costs, and enhancing system reliability and performance.
Description
Technical Field
[0001] The description generally relates to semiconductor devices and, more specifically, to testing, which may include testing of multi-chip packages of stacked dies, high-density interconnects, interposers, and systems-on-a-chip.
Background Technology
[0002] Semiconductor chips are central to intelligent devices and systems such as personal computers, laptops, tablets, phones, servers, and other consumer and industrial products and systems. Manufacturing semiconductor chips presents numerous challenges, which are amplified as devices become smaller and performance demands increase. These challenges include, for example, undesirable material interactions, precision and scaling requirements, power delivery requirements, limited failure tolerance, and material and manufacturing costs.
[0003] Semiconductor devices, including Systems-on-a-Chip (SoCs), can integrate so-called partitions or blocks onto a single die. An SoC can include, for example, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), an Image Processing Unit (IPU), a Neural Processing Unit (NPU), a Bus Interface Unit (BIU), and / or a Data Streaming Accelerator (DSA). As a result, die size increases and leads to significant manufacturing cost impacts. Package-level integration can mitigate some of the negative manufacturing cost impacts of SoC integration. Various package patterns are possible, such as, for example, 2D planar multi-chip packages (MCPs), 3D packages (e.g., two or more dies on a base die), or other types of stacked and / or side-by-side MCPs. Multi-chip packages can provide small form factor and high die density with fine-pitch die interconnects. The shorter and smaller the interconnects, the lower the parasitic capacitance and delay, which can lead to superior performance of the MCP.
Attached Figure Description
[0004] Figures are provided to aid in understanding this disclosure. Figures may include illustrations and descriptions of examples of structures, components, data, methods, and systems. For ease of explanation and understanding, these structures, components, data, methods, and systems in the figures are not exhaustive descriptions. Therefore, the figures should not be construed as depicting the entire boundaries and limits of possible structures, components, data, methods, and systems without departing from the scope of this disclosure. Furthermore, features are not necessarily shown to scale, partly because some features are small and their explanation in the figures is expected to be clear.
[0005] Figure 1 provides examples of multi-chip package (MCP) components.
[0006] Figure 2 shows a transmitting circuit system that can be part of a circuit system of a semiconductor die.
[0007] Figure 3 shows a receiving circuit system that can be part of a circuit system of a semiconductor die.
[0008] Figure 4 shows a method for testing a semiconductor device that includes injecting an error from a transmitting die to a receiving die.
[0009] Figure 5 further illustrates the method for testing a semiconductor device that includes injecting an error from the transmitting die to the receiving die.
[0010] Figure 6 shows an error injection multiplexer circuit system.
[0011] Figure 7 provides a method for performing deterministic repair of the signal path between the transmitting and receiving dies.
[0012] Figure 8 shows a method for verifying the operation of a multi-chip package (MCP) component.
[0013] Figure 9 provides an example of a computing system.
[0014] The following is a description of certain details and implementation methods, including a non-limiting description of the diagrams depicting some examples and implementation methods.
Detailed Implementation
[0015] References to one or more examples should be understood as descriptions of specific features, structures, or characteristics included in at least one implementation. The phrases “an example” or “example” do not necessarily refer to the same example or embodiment. Any aspect described herein may potentially be combined with any other aspect or similar aspect described herein, regardless of whether those aspects relate to the same figures or elements described.
[0016] The terms “connection” and / or “coupling” can indicate that two or more elements are in direct physical or electrical contact with each other. However, the term “coupling” can also mean that two or more elements are not in direct contact with each other, but are separated by one or more elements, but they can still cooperate or interact with each other, for example, physically, magnetically, optically or electrically.
[0017] The terms “first,” “second,” etc., do not indicate order, quantity, or importance, but are used to distinguish one element from another. The terms “one” and “a” in this document do not indicate a limitation on quantity, but rather indicate the presence of at least one of the cited items. The terms “following” or “after” may indicate immediately following or following some other events (single or multiple). According to alternative embodiments, other sequences of operations may also be performed. Furthermore, depending on the application, additional operations may be added or removed.
[0018] Disjunctive language (such as the phrase "at least one of X, Y, or Z") is generally used to indicate that an element or feature can be X, Y, or Z or any combination thereof (e.g., X, Y, and / or Z). Therefore, such disjunctive language should not be construed as implying that some embodiments require at least one of X, at least one of Y, and at least one of Z to each be present.
[0019] The flowcharts shown herein provide examples of sequences of various process actions. Flowcharts can indicate actions to be performed by software or firmware routines and physical operations. Physical operations can be performed by semiconductor processing and / or test equipment (including computer systems that run test protocols and operate test equipment and systems). Although shown in a specific sequence or order, the order of actions can be modified unless otherwise stated. Therefore, the illustrated diagrams should be understood as examples. Processes can be performed in different orders, and some actions can be performed in parallel. Furthermore, one or more actions can be omitted, and not all implementations require all actions to be performed.
[0020] The various components described can be building blocks for performing the described operations or functions. The described components can include software, hardware, or a combination thereof. Some components can be implemented as software modules, hardware modules, special-purpose hardware (e.g., application-specific integrated circuits (ASICs) and digital signal processors (DSPs)), embedded controllers, and / or hardwired circuit systems. Other components can be semiconductor processing and / or testing equipment capable of performing physical operations such as, for example, photolithography, probing, material deposition (e.g., chemical vapor deposition, atomic layer deposition, physical vapor deposition, electrodeposition, and / or sputtering), chemical mechanical polishing (CMP), and etching.
[0021] The various computer operations or functions described herein can be described or defined as software code, instructions, configuration, and / or data. Software content can be provided via an article of manufacture on which the content is stored, or by operating a communication interface to transmit data via the communication interface. A machine-readable storage medium enables a machine to perform the described functions or operations. A machine-readable storage medium includes any mechanism that stores information in a tangible form accessible to a machine (e.g., a computing device), such as recordable / non-recordable media (e.g., read-only memory (ROM), random access memory (RAM), disk storage media, optical storage media, flash memory devices). Instructions can be stored on a machine-readable storage medium in a non-transitory form. A communication interface includes any mechanism that interfaces with, for example, hardwired, wireless, or optical media to communicate with another device, such as, for example, a memory bus interface, a processor bus interface, an internet connection, or a disk controller.
[0022] Terms such as chip, die, IC (integrated circuit system) chip, IC die, microelectronic chip, microelectronic die, semiconductor die, semiconductor device and / or semiconductor chip are interchangeable and refer to devices that include integrated circuits that can be partially formed from semiconductor materials.
[0023] Semiconductor chip manufacturing processes are sometimes divided into front-end-of-line (FEOL) and back-end-of-line (BEOL) processes. The electronic circuitry within the chip, including active and passive devices such as transistors, capacitors, resistors, and / or memory cells, is fabricated in the so-called FEOL process. Memory cells include, for example, electronic circuitry for random access memory (RAM) (such as static RAM (SRAM) and dynamic RAM (DRAM)), read-only memory (ROM), non-volatile memory, and / or flash memory. FEOL processes can be, for example, complementary metal-oxide-semiconductor (CMOS) processes. BEOL processes involve the metallization of the chip, where interconnects are formed in layers, and the feature size of the interconnects increases in layers closer to the surface of the semiconductor chip. Interconnects in a semiconductor chip integrated into a heterogeneous package (e.g., a package including memory and logic chips) may also include, for example, through-silicon vias (TSVs) lateral to the semiconductor chip device regions. Semiconductor devices with TSVs can blur the distinction between BEOL and FEOL processes.
[0024] Semiconductor chip interconnects can be created by etching trenches or vias into a dielectric layer and filling the trenches or vias with metal. The dielectric layer can include, for example, low-k dielectrics, SiO2, silicon nitride (SiN), silicon carbide (SiC), and / or silicon carbonitride (SiCN). Low-k dielectrics include, for example, fluorine-doped SiO2, carbon-doped SiO2, porous SiO2, porous carbon-doped SiO2, combinations thereof, and these materials with gas-filled gaps or bubbles. A dielectric layer that includes conductive features can be an interlayer dielectric (ILD) feature. Generally, low-k dielectrics exhibit a dielectric constant lower than that of SiO2.
[0025] The terms “package,” “encapsulation,” “IC package,” “chip package,” “microelectronic package,” or “semiconductor chip package” are interchangeable and generally refer to an enclosed carrier of one or more chips, in which the chips are coupled to a package substrate and encapsulated. The package substrate provides electrical interconnections between the chip (single or multiple) and other chips and / or a motherboard or other circuit boards for I / O (input / output) communication and power delivery. A package with multiple chips can, for example, be a system within a package.
[0026] Package substrates typically include a dielectric layer or structure having conductive structures on, through, and / or embedded in the dielectric layer. The dielectric layer can be, for example, a build-up layer. Dielectric materials include Ajinomoto build-up film (ABF), but other dielectric materials are also possible. Semiconductor package substrates can have a core or be coreless. Semiconductor packages with a core can have dielectric layers, such as build-up layers, on more than one side of the core (e.g., on two opposite sides of the core). The core may include through-hole vias containing conductive material. Other structures or devices within the package substrate are also possible.
[0027] A “core” or “package core” typically refers to a layer that is usually embedded within a package substrate. The core can provide structure or rigidity to the package substrate. The core is an optional feature of the package substrate. The core can be a dielectric organic or inorganic material and may have conductive vias extending through the layer. Conductive vias can include metals, such as copper. For example, the package core can be made of glass materials (such as, for example, aluminosilicates, borosilicates, aluminoborosilicates, silica, and fused silica), silicon, silicon nitride, silicon carbide, gallium nitride, or aluminum oxide. In some examples, the core material is a glass fiber reinforced organic resin, such as an epoxy resin. Another exemplary package substrate core is FR4 (woven glass fiber reinforced epoxy). In other examples, the package substrate core is a solid amorphous glass layer.
[0028] In other examples of the encapsulation substrate core, the substrate core is a glass core comprising one or more solid amorphous glass layers. The glass substrate core may comprise glass, such as, for example, aluminosilicate, borosilicate, aluminoborosilicate, silica, and fused silica, and optionally also comprises one or more of the following: Al₂O₃, B₂O₃, MgO, CaO, SrO, BaO, SnO₂, Na₂O, K₂O, SrO, P₂O₃, ZrO₂, Li₂O, Ti, and / or Zn. In further examples of the glass core, the glass may comprise silicon and oxygen, and optionally any one or more of the following: aluminum, boron, magnesium, calcium, barium, tin, sodium, potassium, strontium, phosphorus, zirconium, lithium, titanium, and / or zinc. In some examples, the glass encapsulation substrate core comprises at least 23% silicon and at least 26% oxygen by weight. In other examples, the glass encapsulation substrate core comprises at least 23% silicon, at least 26% oxygen, and at least 5% aluminum by weight.
[0029] Additionally, an example of a solid amorphous glass substrate core can be considered to have a rectangular prism volume. The rectangular prism volume may contain vias already filled with one or more different materials. The material in the vias may be a conductive metal such as copper. An example of a solid amorphous glass substrate core may have a thickness ranging from 50 μm to 1.4 mm. Furthermore, the packaging substrate may include a multilayer glass substrate. The packaging substrate in this example may be a coreless substrate. The multilayer glass substrate may have a thickness, for example, ranging from 25 μm to 50 μm. Moreover, the glass substrate core may have side dimensions ranging from 10 mm to 250 mm. For example, the substrate core may be 10 mm × 10 mm to up to 250 mm × 250 mm in both dimensions, but the substrate core does not necessarily have the same values in both dimensions.
[0030] The package substrate may include one or more interconnect bridges. Interconnect bridges may be partially, completely, or not embedded in the package substrate. Interconnect bridges provide interconnections between chips housed on the package substrate. Interconnections can provide signal I / O between chips. Some interconnect bridges (such as interconnect bridges with conductive through-vias) may also provide power to operatively connected chips. Interconnect bridges may include regions with traces having a smaller width dimension (minimum trace dimension), a smaller height dimension, and / or a smaller length dimension than the vias and traces of the surrounding package substrate. For example, in some regions, the width dimension (or minimum dimension) may be 3 μm or less and / or 10 μm or less. Interconnect bridges may also have a smaller trace pitch than the surrounding package substrate. For example, the center-to-center trace pitch may be 3 μm or less and / or 10 μm or less in some regions. Interconnect bridge substrates may include, for example, silicon, silicon-on-insulator, float glass, borosilicate glass, silicon dioxide, polymers, one or more organic polymer materials, ceramics, and / or silicon nitride materials. Interconnect bridge substrates may include, for example, one or more dielectric layers, which are composed of silicon oxide, silicon nitride, silicon oxynitride, carbon-doped oxide, methylsilsesquioxane, hydrogen silsesquioxane, die back-side film (DBF), epoxy resin film, Class B epoxy resin film, or other dielectric materials. Interconnect bridges may also include coreless substrates composed of multiple dielectric layers. The dielectric layers may be, for example, die back-side film (DBF), epoxy resin film, Class B epoxy resin film, or other dielectric materials. Other materials may also be used for interconnect bridge substrates. Other materials are also possible.
[0031] In an MCP, inter-die connectivity can be achieved through base dies using interface ports (one on each top die). This type of connectivity can be called a D2D (die-to-die) interconnect. D2D ports can consist of any number of bundles. In this example, the D2D ports consist of 48 bundles connected to different partition units in the top die, and each bundle can have, for example, 10-57 channels and 1-2 redundant channels. Any number of channels can be used. Redundant channels can be used to replace channels with compromised connectivity through a fuse repair process. The integrity of connectivity for each channel can be tested by a built-in self-test (FBIST) circuit, where both sides of the top die can operate in Tx / Rx (transmit / receive) mode.
[0032] The attachment of top dies (single or multiple) to base dies can be associated with process yield challenges because the IO (input / output) microbump pitch is increasingly smaller for interconnects formed during the assembly process. Failure modes during interconnect formation (e.g., bump bonding techniques) include hard defects such as open and short circuits formed by misalignment, for example. Hard defects can be randomly placed. Failure modes of packaged dies also include system failures. System failure modes can occur due to design margins that can be identified under PVT (process, voltage, temperature) variations and can be associated, for example, with process variations such as lithographic critical dimensions (CD) or implantation profiles on the wafer edge.
[0033] The built-in self-test IC controller for the MCP may include the following registers, which can be programmed according to the Joint Test Action Group (JTAG) protocol of the Institute of Electrical and Electronics Engineers (IEEE) 1149.1 standard.
a) Failing channel register: The register width is equal to log2(total number of channels), rounded up to a whole number of bits. There can be three failing channel registers: failing channel 1 register, failing channel 2 register, and failing channel 3 register. These registers can record up to three failing channels. There can be two recovery channels. The third register can be dedicated to providing additional failure analysis data. For example, if there are 100 channels, the register width is 7. If channels 10, 27, and 31 fail, then the failing channel 1 register records 10, the failing channel 2 register records 27, and the failing channel 3 register records 31.
b) Per-channel error register: This register is as wide as the number of channels. It records error information for each channel; hardware sets a channel's bit to 1 if that channel fails. For example, if there are 10 channels, the register width is 10. If channels 5 and 7 fail, register bits 5 and 7 record 1: 10'b0001010000.
c) Error injection register: This register is as wide as the number of channels. If the user sets a specific bit to 1, a fault can be injected into the corresponding channel. For example, if there are 10 channels, the register width is 10. If the user intends to inject an artificial error into channel 4, the user sets bit 4 to 1.
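The register sizing and bit layout described above can be illustrated with a short Python sketch. It is a software model only, not the hardware of the disclosure. The function names are hypothetical, and two points are assumptions inferred from the worked examples: channel n maps to bit n - 1 counted from the least significant bit (which reproduces 10'b0001010000 for channels 5 and 7), and failing channels are recorded lowest-numbered first.

```python
def failing_channel_register_width(total_channels: int) -> int:
    """Bits needed for a failing channel register: log2(total_channels), rounded up."""
    if total_channels < 1:
        raise ValueError("total_channels must be positive")
    return max(1, (total_channels - 1).bit_length())


def channel_mask(channels, total_channels: int) -> int:
    """Build a one-bit-per-channel value; channel n maps to bit n - 1 (assumption)."""
    value = 0
    for ch in channels:
        if not 1 <= ch <= total_channels:
            raise ValueError(f"channel {ch} is out of range")
        value |= 1 << (ch - 1)
    return value


def failing_channel_registers(error_value: int, total_channels: int, count: int = 3):
    """Return up to `count` failing channel numbers, lowest first (assumed order)."""
    failed = [ch for ch in range(1, total_channels + 1)
              if (error_value >> (ch - 1)) & 1]
    return failed[:count]


def format_register(value: int, width: int) -> str:
    """Render a register value in Verilog-style binary notation."""
    return f"{width}'b{value:0{width}b}"
```

With these helpers, `failing_channel_register_width(100)` gives 7, `format_register(channel_mask([5, 7], 10), 10)` gives `10'b0001010000`, and an error injection value for channel 4 is `channel_mask([4], 10)`.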
[0034] Figure 1 provides an example of a multi-chip package (MCP). The MCP includes a first semiconductor die 105 and a second semiconductor die 107. The first semiconductor die 105 includes a transmitting circuit system 115 and a receiving circuit system 110, and the second semiconductor die 107 includes a transmitting circuit system 116 and a receiving circuit system 111. The first semiconductor die 105 and the second semiconductor die 107 are electrically coupled to a third semiconductor die 125. This electrical coupling can be referred to as die-to-die (D2D) coupling. The D2D port can, for example, consist of 48 bundles connected to different partition units in the first die 105, and each bundle can have 10-57 channels and 1-2 redundant channels (to replace channels with broken connectivity via a fuse repair process). Generally, the terms bundle, partition, and cluster are used interchangeably. A partition can refer to dividing a digital circuit design into smaller, more manageable sections or blocks based on functionality, hierarchy, or other design criteria. For example, a computing digital circuit system can be partitioned into instruction fetch units, instruction decode units, execution units, and cache memories (L1, L2). The integrity of connectivity for each channel can be tested by an FBIST machine, where the two sides of the top die operate in a Tx / Rx (transmit / receive) mode. As the technology node shrinks, the first semiconductor die 105 and the second semiconductor die 107 can be attached to the third semiconductor die 125 via IO microbumps with increasingly smaller pitches. The third (base) semiconductor die 125 is electrically connected to the package substrate 130.
[0035] Within a cluster, there can be hundreds of transmitting circuit systems 115 and 116 and hundreds of receiving circuit systems 110 and 111. A typical cluster can contain up to 500 transmitting circuit units and 500 receiving circuit units, or more. The number of transmitting units can be equal to the number of receiving units, which can be the same as the number of signals crossing the semiconductor die. The transmitting units in the cluster can be controlled by a single transmitting BIST controller, and the receiving units in the cluster can be controlled by a single receiving BIST controller. The transmitting and receiving units may include a) functional flip-flops, b) functional paths to be tested, and c) test circuitry.
[0036] Semiconductor dies 105, 107, and 125 can be any combination of a microprocessor, CPU (Central Processing Unit), GPU (Graphics Processing Unit), processing core, system-on-a-chip, other processing hardware, a combination of processors or processing cores, a programmable general-purpose or special-purpose microprocessor, an accelerator, DSP, I / O management, a programmable controller, ASIC, programmable logic device (PLD), 3D integrated circuit (3DIC), interposer, HBM, and / or other memory devices. A semiconductor chip can be, for example, any chip described herein with respect to Figure 9. For example, the first semiconductor die 105 may be a computing chip, the second semiconductor die 107 may be a SOC chip, and the third semiconductor die 125 may be a 3D IC or an active interposer including a circuit system. The semiconductor chip package described herein can generally be part of a variety of larger package structures and configurations, and the foregoing examples are not intended to limit the types of possible components.
[0037] Figure 2 provides an example of a transmitting circuit unit, which can be, for example, part of the transmitting circuit system 115 or 116 of Figure 1. In Figure 2, F1 is a functional flip-flop. The test circuit system is demarcated by the first dashed line 205. The functional path to be tested can be F1->B1->B2. The second dashed box demarcates the signal I / O channel 210, which is, for example, a channel between the first semiconductor die and the second semiconductor die. In Figure 3, dashed box 305 demarcates signal I / O channel 310, which may be electrically coupled to signal I / O channel 210, for example. The first semiconductor die may be semiconductor die 105, the second semiconductor die may be semiconductor die 107, and the signal I / O channel may pass through a third semiconductor die, such as the semiconductor die 125 of Figure 1. Circuit system CG1 is a clock-gating circuit system.
[0038] Figure 3 provides an example of a receiving circuit unit, which can be, for example, part of the receiving circuit system 110 or 111 of Figure 1. In Figure 3, F2 is a functional flip-flop. The test circuit system is demarcated by the first dashed line 305. The test circuit system may also include a block marked TGF. Circuit systems CG1 and CG2 are clock-gating circuit systems. The functional path to be tested may include B4 -> B2.
[0039] The functional flip-flops F1 and F2 can be part of the functional logic to be tested. They operate on a clock that is multiplexed between the functional clock and the slow test clock. The clocks for the F1 and F2 flip-flops are controlled by the clock gating unit CG1. The input clock to clock gate CG1 is multiplexed between the slow test clock and the full-speed functional clock. Clock gating enable and clock selection are generated by a finite state machine (FSM) within the TX and RX circuits.
[0040] Functional path crossover is a crossover via a 3D IC or via a link between F1 in the transmitting die and F2 in the receiving die. Crossover can include: F1 (on the Tx die) -> B1 (on the Tx die) -> B4 (on the Rx die) -> B2 (on the Rx die) -> F2 (on the Rx die).
[0041] In Figure 2, the test circuit may include RES flip-flops, comparators, and EF flip-flops. The RES flip-flop is the result generation flip-flop. It generates a toggle pattern that is compared with the test_input applied through the F1 flip-flop. The clock to the RES flip-flop is controlled by clock gate CG1. The test input is generated by the test controller and is also a toggle pattern. The test_input toggle pattern propagates through F1. The RES flip-flop on the TX circuit can be used for class testing, i.e., testing performed before the TX die and RX die are packaged. The RES flip-flop on the RX circuit can be used for both class (die-level) and grade (package-level) testing.
[0042] Comparator 215 can be an XOR gate, whose inputs are the RES output and the F1 output via B1 (in the functional path of the TX circuit system). For example, if the two inputs of the XOR gate do not match, its output is 1. Comparator 215 in the TX circuit system can be used for class testing, i.e., testing performed before the TX die and RX die are packaged. Comparator 315 in the RX circuit system can be used for both class (die level) and grade (packaged die) testing.
[0043] The EF flip-flop is an error flip-flop. If the XOR compare fails, the error is recorded as a 1 in the EF flip-flop, and this value is sticky. The clock signal to the EF flip-flop is controlled by clock gate CG1. The EF flip-flop can be read on the receiving side using the chain fbist_rx_sync_check_in->fbist_rx_sync_check_out of Figure 3, and on the transmitting side using the chain fbist_tx_shift_in->fbist_tx_shift_out of Figure 2. The EF flip-flop in the TX circuit system can be used for class testing. The EF flip-flop in the RX circuit system can be used for both class (die-level) and grade (package-level) testing.
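The comparator and sticky error flip-flop behavior described above can be modeled in a few lines of Python. This is a behavioral sketch only, with hypothetical names (`StickyErrorFlop`, `compare_streams`); it assumes one comparison per clock and ignores clock gating and timing.

```python
class StickyErrorFlop:
    """Behavioral model of an EF flip-flop: once set to 1 it stays 1."""

    def __init__(self) -> None:
        self.value = 0

    def clock(self, expected_bit: int, observed_bit: int) -> int:
        # XOR comparator: output is 1 only when the two inputs do not match.
        mismatch = (expected_bit ^ observed_bit) & 1
        # Sticky capture: OR the mismatch into the stored value.
        self.value |= mismatch
        return self.value


def compare_streams(expected, observed) -> int:
    """Clock a fresh EF model once per bit pair and return its final value."""
    ef = StickyErrorFlop()
    for exp_bit, obs_bit in zip(expected, observed):
        ef.clock(exp_bit, obs_bit)
    return ef.value
```

For example, comparing the toggle pattern `[1, 0, 1, 0]` with an identical stream leaves the model at 0, while a single mismatching cycle sets it to 1 for the rest of the run.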
[0044] The FBIST chain stitches together the REF-EF flip-flops and RES-EF flip-flops for the channels. The chain is used to read the EF flip-flops for channel failure analysis. For example, if there are 10 channels and the EF flip-flop for channel 5 is set to 1, then channel 5 has failed.
[0045] Example testing of the link may involve the following phases: 1) initialization; 2) clock switch 1; 3) application phase; 4) clock switch 2; and 5) EF readout.
1) Initialization may involve: a) initializing F2 and RES to the same value; and b) switching to a slower clock.
2) Clock switch 1 may involve: a) switching to a faster clock for the application phase; b) turning off the clock before the clock switch occurs; and c) switching to a faster functional clock or a slower test clock.
3) The application phase may involve: a) running a toggle pattern of 64 clocks or a programmable number of clocks; b) testing for defects from F1 via B4 via B2 to F2; and c) deterministically degating functional clocks CG1 and CG2.
4) Clock switch 2 may involve: a) switching to a slower clock for EF chain readout; b) turning off the clock before the clock switch occurs; and c) slowing down the test clock.
5) EF readout may involve: a) reading the EF flip-flops to detect a failed channel; b) logging the channel failure to a user-readable register; and c) slowing down the test clock.
[0046] The initialization phase places the same value in the F2 and RES flip-flops on the receive circuitry (Rx die) before the application phase begins testing. F2 is initialized using `test_input` injected into the transmit circuitry (Tx die) by the FBIST controller. RES is initialized using the FBIST chain (fbist_rx_sync_check_in->fbist_rx_sync_check_out in Figure 3). The clocks to F2 and RES are controlled by clock gates CG1 and CG2 respectively, and the inputs to clock gates CG1 and CG2 are multiplexed clocks.
[0047] For Clock Switch 1, the clock is turned off to avoid unwanted glitches and then switched to the desired clock (depending on the test phase, a slow or fast clock). The clock is then degated so that the clock from the multiplexer will come out clean (without glitches). During the Clock Switch 1 phase, the clock is switched to a faster functional clock on the RES and EF flip-flops before entering the application phase.
[0048] During the application phase, once the F2 and RES flip-flops on the RX circuit system (RX die or "RX") are initialized to the same value, the toggle pattern is run on the TX circuit system (TX die) via test_input from the FBIST controller. Because RES on RX is already a toggle flip-flop, RES on RX continues to toggle. CG1 and CG2 on RX are degated within a predetermined number of clock cycles.
[0049] For clock switch 2, the clock is turned off to avoid unwanted glitches, switched, and degated so that the clock from the multiplexer comes out clean (glitch-free). Before entering the EF readout stage, the clock is switched from the fast clock to the slow clock on RES and the EF flip-flops. The EF readout stage is performed on the slow clock.
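The five test phases and the glitch-avoiding clock switch can be sketched as a small Python model. The class and function names are hypothetical, and the model makes one simplifying assumption that is stricter than the text: it treats any clock selection made while the clock is still running as an error, to capture the rule that the clock is turned off before the multiplexer switches.

```python
class ClockMux:
    """Model of a gated clock multiplexer with 'slow' and 'fast' inputs."""

    def __init__(self) -> None:
        self.selected = "slow"
        self.turned_off = False

    def turn_off(self) -> None:
        self.turned_off = True

    def select(self, clock: str) -> None:
        if clock not in ("slow", "fast"):
            raise ValueError("clock must be 'slow' or 'fast'")
        if not self.turned_off:
            # Assumption of this sketch: switching a running clock is disallowed.
            raise RuntimeError("turn the clock off before switching to avoid glitches")
        self.selected = clock

    def turn_on(self) -> None:
        self.turned_off = False


def switch_clock(mux: ClockMux, clock: str) -> None:
    """Turn the clock off, switch the multiplexer, then release the clock."""
    mux.turn_off()
    mux.select(clock)
    mux.turn_on()


def run_link_test_phases(mux: ClockMux):
    """Walk the five phases and record which clock is active in each."""
    log = [("initialization", mux.selected)]
    switch_clock(mux, "fast")
    log.append(("clock switch 1", mux.selected))
    log.append(("application", mux.selected))
    switch_clock(mux, "slow")
    log.append(("clock switch 2", mux.selected))
    log.append(("EF readout", mux.selected))
    return log
```

Running `run_link_test_phases(ClockMux())` yields the slow clock for initialization and EF readout and the fast clock for the application phase, matching the order of phases given above.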
[0050] Figure 4 illustrates a method for testing a semiconductor device, such as an MCP. The MCP may be, for example, a 3D IC. A semiconductor package including a die is selected for testing (405). The semiconductor package may be a semiconductor package that has passed previous testing and / or is considered defect-free. The package may include a first die selected as a signal transmitting die (TX die) and a second die selected as a signal receiving die (RX die). The package may also include a third die, such as an interposer or smart interposer through which the signal will pass. Based on a register, designated as the error injection register, that the user programs via JTAG, the MCP test controller in the TX die injects 0 on the selected channel, channel X (410). The 0 traverses the TX die and reaches the F2 functional flip-flop on the RX die. On the RX die, the F2 output, which is 0, is compared with the toggle pattern on the RX die, and EF is set to 1 (415). If the failing channel register correctly reports channel X as a failing channel, the reporting mechanism and logic work correctly for the channel (420). If the failing channel register does not correctly report channel X as a failing channel, there is a problem with the error reporting mechanism for channel X (420). The elements of the method shown in Figure 4 can be repeated for all other channels. The method of Figure 4 allows determining whether the error reporting mechanism, logic implementation, and timing path are working correctly with respect to fault injection. If a part fails, it is then known that the failure is not due to a problem with the error reporting mechanism or the logic implemented in the semiconductor die. In this example, 0 is injected; however, 1 could also be injected.
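The per-channel check of Figure 4 can be sketched in Python as a loop that injects a fault on each channel in turn and confirms that the same channel is reported as failing. The `SimulatedPackage` class, its `broken_reporting` parameter, and the eight-cycle toggle pattern are hypothetical stand-ins for a known-good package and its JTAG access, introduced only for illustration.

```python
TOGGLE_PATTERN = [1, 0] * 4  # "1010..." reference pattern on the RX side (assumed length)


class SimulatedPackage:
    """Stand-in for a known-good package; some channels may misreport."""

    def __init__(self, total_channels: int, broken_reporting=()) -> None:
        self.total_channels = total_channels
        self.broken_reporting = set(broken_reporting)

    def inject_and_read(self, channel: int):
        """Inject 0 on `channel`; return the channel reported as failing, or None."""
        f2_outputs = [0] * len(TOGGLE_PATTERN)  # injected 0 arrives at F2 every cycle
        ef = 0
        for expected, observed in zip(TOGGLE_PATTERN, f2_outputs):
            ef |= expected ^ observed  # sticky mismatch capture
        if ef and channel not in self.broken_reporting:
            return channel
        return None


def channels_with_bad_reporting(package: SimulatedPackage):
    """Repeat the injection for every channel and list those not reported correctly."""
    return [ch for ch in range(1, package.total_channels + 1)
            if package.inject_and_read(ch) != ch]
```

An empty result means the reporting path worked for every channel, so a later failure on a real part would not be attributed to the reporting mechanism itself.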
[0051] Figure 5 further describes the method for testing semiconductor devices, such as MCPs. A JTAG programmable register is defined in the transmitting die to select the MCP channel for injecting artifact faults. An MCP test controller 505 in the transmitting die can inject a 0 into the transmitting circuitry 510. The transmitting circuitry 510 is electrically coupled to the receiving circuitry 515 of the receiving die via interconnect 520. For clarity, Figure 5 reproduces only part of the transmitting circuitry 510 and the receiving circuitry 515 from Figure 2 and Figure 3. The per-channel error register 525 (EF) can be as wide as the number of channels. The per-channel error register 525 records error information for each channel and is set to 1 by hardware if a channel fails. For example, if there are 10 channels, the register width is 10, and if channels 5 and 7 fail, bits 5 and 7 of the per-channel error register 525 record 1, resulting in the register value 10'b0001010000. The toggle circuit 530 creates the "101010..." pattern. Comparator 315 can detect a mismatch at the gate output.
[0052] The error injection register can be as wide as the number of channels. If the user sets a specific bit to 1, a fault is injected on the corresponding channel. For example, if there are 10 channels, the register is 10 bits wide, and if the user intends to inject an artifact error into channel 4, the user sets bit 4 to 1.
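The one-bit-per-channel layout of the per-channel error register and the error injection register can be illustrated with a short sketch. The function names are hypothetical. The sketch assumes bits are counted from 1 at the least significant end, because that is the numbering under which failed channels 5 and 7 yield the value 10'b0001010000 given above; the text does not state the numbering explicitly.

```python
# Illustrative helpers; names and the 1-based bit numbering are assumptions.
def encode_register(channels, width=10):
    """Set one bit per listed channel (channel n maps to bit n, counted from 1)."""
    value = 0
    for ch in channels:
        if not 1 <= ch <= width:
            raise ValueError(f"channel {ch} outside 1..{width}")
        value |= 1 << (ch - 1)
    return value


def decode_register(value, width=10):
    """Return the channels whose bit is set in a register value."""
    return [ch for ch in range(1, width + 1) if (value >> (ch - 1)) & 1]


def format_register(value, width=10):
    """Render a value in Verilog-style binary notation, e.g. 10'b0001010000."""
    return f"{width}'b{value:0{width}b}"
```

Under these assumptions, `format_register(encode_register([5, 7]))` gives `10'b0001010000`, and `encode_register([4])` gives the error injection register value that targets channel 4.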
[0053] In another method, an error injection multiplexer can be placed after the repair multiplexer 225 of Figure 2. Figure 6 depicts an error injection multiplexer that can be used in the transmit circuitry of the MCP transmit die. The error injection multiplexer select line comes from a user-programmable JTAG register bit. When the select line is 1, error injection is enabled and the multiplexer output equals the input from the user-programmable error injection value bit. The user-programmable error injection value bit can be set to 0 or 1 to simulate a stuck-at-0 or stuck-at-1 fault, respectively. In this alternative method, errors can be injected on all channels at once, rather than serially as in the method of Figures 4-5. After error injection, the correctness of the reporting mechanism is checked in both methods.
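The behavior of the error injection multiplexer can be modeled in a few lines. This is a sketch under assumptions: the function and parameter names are illustrative, and a single select bit and value bit shared by all channels is assumed to model injection on all channels at once.

```python
# Behavioral model of the error injection multiplexer; names are assumptions.
def error_injection_mux(data_bit, inject_enable, inject_value):
    """Pass the functional data bit, or the injected value when the select line is 1."""
    return inject_value if inject_enable else data_bit


def drive_channels(data_bits, inject_enable, inject_value):
    """Apply the same select and value bits to every channel in parallel."""
    return [error_injection_mux(b, inject_enable, inject_value) for b in data_bits]
```

With `inject_enable=1` and `inject_value=0`, every channel behaves as if stuck at 0 regardless of its data; with `inject_enable=0`, the functional data passes through unchanged.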
[0054] In the methods of Figures 4-6, artifacts (errors) with random patterns can also be injected instead of artifacts with constant values. Additionally, random patterns can be driven to inject artifacts for speed-related failures. For example, a pattern of 11110000 can be injected.
[0055] Figure 7 depicts a method for performing deterministic repair in an MCP. Channels 0 and 6 are labeled Channel 0R and Channel 6R, where R indicates that they are repair channels or redundant channels. Channels 1 through 5 are functional channels (not all channels are shown). On functional channel 4, an error has been exposed using the methods described herein (e.g., as shown and described with respect to Figures 4 to 6). The error in interconnect 520 is shown as star 720. Transmitting circuitry 510 is electrically coupled to receiving circuitry 515 of the receiving die via interconnect 520. For clarity, in Figure 5 the transmitting circuitry 510 and the receiving circuitry 515 are reproduced only in part from Figure 2 and Figure 3. Repair paths 705 and 710 illustrate the repair of signal paths bypassing channel 4. On the transmit die, after the repair, data on channel 4 is rerouted to channel 5. Similarly, data on channel 5 on the transmit die is rerouted to channel 6R. Repair paths can exist on all channels on the transmit die. On the receive side, channel 6R is rerouted through channel 5. Similarly, channel 5 is rerouted through channel 4. Repair paths also exist on all channels on the receive die.
[0056] Repair paths 705 and 710 can create longer timing paths. If repair paths 705 and 710 pass testing, the repair of channel 4 is complete. Potential repair paths can exist in the design and in the silicon for all channels used on both the transmit and receive dies. For ease of explanation, Figure 7 shows the repair path only for channel 4. The same process can be repeated for other channels in the MCP.
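The rerouting around channel 4 can be expressed as a pair of lane maps. The sketch below is illustrative only: the function names are hypothetical, lane 6 stands for redundant channel 6R, and shifting every channel at or above the failed one toward 6R is an assumption that generalizes the single channel-4 example given in the text.

```python
# Lane-shift sketch; names and the generalized shift rule are assumptions.
def tx_repair_map(failed, functional=range(1, 6)):
    """Map each functional channel to the physical lane it drives after repair."""
    return {ch: ch + 1 if ch >= failed else ch for ch in functional}


def rx_repair_map(failed, functional=range(1, 6)):
    """Map each physical lane back to the functional channel it carries (inverse of TX)."""
    return {lane: ch for ch, lane in tx_repair_map(failed, functional).items()}
```

For `failed=4`, the transmit map sends channel 4 over lane 5 and channel 5 over lane 6 (6R), and the receive map returns lane 6 to channel 5 and lane 5 to channel 4, so the faulty lane 4 carries no data.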
[0057] Other circuit designs that can perform the functions described herein are also possible.
[0058] Figure 8 depicts a method for testing semiconductor package assemblies comprising multiple semiconductor dies. A batch or group of MCP components is selected because its manufacturing yield is below a set percentage (e.g., less than 90%) (805). From this batch or group, MCPs without defects on the 3D IC cross channels are selected for further testing (810). The error reporting mechanism, logic implementation, and timing path can then be checked to ensure they are working correctly with respect to fault injection. It may then be known that a portion of the failures are not due to problems in the error reporting mechanism or silicon logic. Faults are injected on the 3D IC cross channels (815). For example, the fault injection methods of Figures 4-6 can be used, and faults can be injected one after another or in parallel. It is determined whether the test failed and, if a faulty channel is found, whether the reported number of the faulty channel is correct (820). If the test did not fail, there is a logic problem related to the reporting mechanism (825). If the test indicates a failure, the error reporting mechanism passes the test (830). The failed channel is then repaired (835), for example using the method discussed herein with respect to Figure 7. It is then determined whether the repaired channel passes the test (840). If the repaired channel passes the test, pre-silicon logic problems can be ruled out (845). The problem may be related to the manufacturing process, defects, and / or manufacturing tools. If the repaired channel fails the test, the problem is related to the repair path (850).
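The decision flow of Figure 8, from step 820 onward, can be summarized as a small function. This is a sketch rather than the disclosed procedure: the function name, its parameters, and the returned strings are illustrative, and treating a wrongly reported channel number as a reporting problem is an assumption carried over from the description of Figure 4.

```python
# Decision-flow sketch for steps 820-850; names and messages are assumptions.
def diagnose(test_failed, reported_channel, injected_channel, repaired_channel_passes):
    """Return a coarse diagnosis after fault injection and repair."""
    if not test_failed:
        # An injected fault should always make the test fail (825).
        return "logic problem in reporting mechanism (825)"
    if reported_channel != injected_channel:
        # Assumption: a wrong channel number is also a reporting problem.
        return "reporting mechanism reports wrong channel"
    # Reporting mechanism passes (830); the channel is repaired (835) and retested (840).
    if repaired_channel_passes:
        return "pre-silicon logic ruled out; suspect process, defects, or tools (845)"
    return "problem related to the repair path (850)"
```

For example, `diagnose(True, 4, 4, True)` corresponds to the branch ending at step 845, and `diagnose(True, 4, 4, False)` to the branch ending at step 850.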
[0059] Figure 9 depicts an example computing system. The computing system can be a system for operating equipment in a semiconductor manufacturing plant. For example, instructions for operating a semiconductor test system or for performing one or more aspects of the processes described herein can be stored and / or executed on the computing system. The computing system 900 may include more, different, or fewer features than those described with respect to Figure 9.
[0060] The computing system 900 includes a processor 910, which provides processing, operation management, and execution of instructions for the system 900. The processor 910 may include any type of microprocessor, CPU (Central Processing Unit), GPU (Graphics Processing Unit), processing core, or other processing hardware that provides processing for the system 900, or a combination of processors or processing cores. The processor 910 controls the overall operation of the system 900 and may be or include one or more programmable general-purpose or special-purpose microprocessors, DSPs, programmable controllers, ASICs, programmable logic devices (PLDs), etc., or combinations of these devices.
[0061] In one example, system 900 includes an interface 912 coupled to processor 910, which may represent a higher-speed or high-throughput interface for system components requiring higher bandwidth connections, such as memory subsystem 920 or graphics interface component 940 and / or accelerator 942. Interface 912 represents interface circuitry, which may be a standalone component or integrated onto the processor die. Where present, graphics interface 940 interfaces with a graphics component used to provide a visual display to a user of system 900. In one example, the display may include a touchscreen display.
[0062] Accelerator 942 may be a fixed-function or programmable offload engine that can be accessed or used by processor 910. For example, an accelerator among accelerators 942 may provide data compression (DC) capabilities, cryptographic services (such as public key encryption (PKE)), cryptography, hash / authentication capabilities, decryption, or other capabilities or services. In some cases, accelerator 942 may be integrated into a CPU socket (e.g., a connector to a motherboard (or circuit board, printed circuit board, system board, or logic board) that includes a CPU and provides an electrical interface to the CPU). For example, accelerator 942 may include a single-core or multi-core processor, a graphics processing unit, a logic execution unit, a single-level or multi-level cache, a functional unit that can be used to independently execute programs or threads, an application-specific integrated circuit (ASIC), a neural network processor (NNP), programmable control logic, and programmable processing elements such as field-programmable gate arrays (FPGAs) or programmable logic devices (PLDs). Accelerator 942 may provide multiple neural networks, CPUs, processor cores, or general-purpose graphics processing units, or may make graphics processing units available for use by artificial intelligence (AI) or machine learning (ML) models.
[0063] Memory subsystem 920 represents the main memory of system 900 and provides storage for code to be executed by processor 910 or data values to be used during routine execution. Memory subsystem 920 may include one or more memory devices 930, such as read-only memory (ROM), flash memory, one or more random access memories (RAM), such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), and / or other memory devices, or combinations of such devices. Among other things, memory 930 stores and hosts operating system (OS) 932, which provides a software platform for executing instructions in system 900, and stores and hosts applications 934 and processes 936. In one example, memory subsystem 920 includes memory controller 922, which is a memory controller for generating commands and issuing commands to memory 930. Memory controller 922 may be a physical part of processor 910 or a physical part of interface 912. For example, memory controller 922 may be an integrated memory controller integrated into circuitry within processor 910.
[0064] System 900 may also optionally include one or more buses or bus systems between devices, such as memory buses, graphics buses, and / or interface buses. Buses or other signal lines can communicatively or electrically couple components together, or both communicatively and electrically couple components. Buses may include physical communication lines, point-to-point connections, bridges, adapters, controllers, or other circuitry, or combinations thereof. Buses may include, for example, one or more of the following: a system bus, a Peripheral Component Interconnect (PCI) or PCI Express (PCIe) bus, a HyperTransport or Industry Standard Architecture (ISA) bus, a small computer system interface (SCSI) bus, a universal serial bus (USB), or a FireWire bus.
[0065] In one example, system 900 includes interface 914, which can be coupled to interface 912. In one example, interface 914 represents interface circuitry, which may include standalone components and integrated circuit systems. In one example, a user interface component or a peripheral component, or both, is coupled to interface 914. Network interface 950 provides system 900 with the ability to communicate with remote devices (e.g., servers or other computing devices) via one or more networks. Network interface 950 may include an Ethernet adapter, wireless interconnect component, cellular network interconnect component, USB, or other wired or wireless standard-based interfaces or proprietary interfaces. Network interface 950 may transmit data to devices in the same data center or rack or remote device, which may include transmitting data stored in memory.
[0066] Some examples of the network interface 950 are part of, or used by, an infrastructure processing unit (IPU) or a data processing unit (DPU). An xPU can refer at least to an IPU, DPU, GPU, GPGPU (general-purpose computing on a graphics processing unit), or other processing unit (e.g., an accelerator device). An IPU or DPU may include a network interface with one or more programmable pipelines or fixed-function processors to offload operations that could otherwise be performed by the CPU. An IPU or DPU may include one or more memory devices.
[0067] In one example, system 900 includes one or more input / output (I / O) interfaces 960. I / O interfaces 960 may include one or more interface components (e.g., audio, alphanumeric, haptic / touch, or other interfaces) for users to interact with system 900. Peripheral interfaces 970 may include other types of hardware interfaces, such as interfaces to semiconductor manufacturing equipment and / or electrostatic charge management devices.
[0068] In one example, system 900 includes a storage subsystem 980. Storage subsystem 980 includes one or more storage devices 984, which may be or include any conventional medium for storing data in a non-volatile manner, such as one or more magnetic, solid-state, and / or optical disks. Storage device 984 can generally be considered "memory," although memory 930 is typically the execution or operational memory that provides instructions to processor 910. While storage device 984 is non-volatile, memory 930 may include volatile memory (i.e., the value or state of the data is indeterminate if power to system 900 is interrupted). In one example, storage subsystem 980 includes a controller 982 for interfacing with storage device 984. In one example, controller 982 is a physical part of interface 912 or processor 910, or may include circuitry or logic from both processor 910 and interface 914.
[0069] A power source (not depicted) provides power to the components of system 900. More specifically, the power source typically interfaces with one or more power supplies in system 900 to provide power to the components of system 900.
[0070] Examples of the system can be implemented in various types of computing devices, smartphones, tablets, personal computers, and networking devices such as switches, routers, and rack and blade servers, such as those used in data center and / or server farm environments.
Examples
[0071] A method may include: selecting a multi-chip package assembly, wherein the multi-chip package assembly includes a first die and a second die, wherein the first die includes a transmitting circuitry system and the second die includes a receiving circuitry system, wherein the transmitting circuitry system is electrically coupled to the receiving circuitry system, and wherein the receiving circuitry system includes a failure channel register; injecting an error into a selected channel of the transmitting circuitry system; and determining whether the failure channel register has correctly reported the failed channel. The error may be injected from a test controller in the first die. Determining whether the failure channel register has correctly reported the failed channel may include comparing the output from a functional flip-flop with a toggle pattern. Error injection may be performed serially multiple times for multiple channels. Error injection may be performed in parallel on multiple channels. The method may further include logging the channel failure to a user-readable register. The method may further include performing repair on the identified failed channel.
[0072] A method may include: selecting a multi-chip package assembly, wherein the multi-chip package assembly includes a first die and a second die, wherein the first die includes a first transmitting circuitry system and the second die includes a first receiving circuitry system, wherein the first die includes a second transmitting circuitry system, and wherein the second die includes a second receiving circuitry system; wherein the first transmitting circuitry system is electrically coupled to the first receiving circuitry system; rerouting a first signal path from the first transmitting circuitry system to the second transmitting circuitry system; and rerouting a second signal path from the second receiving circuitry system to the first receiving circuitry system. The second transmitting circuitry system may be a redundant transmitting circuitry system. The second receiving circuitry system may be a redundant receiving circuitry system. A failure can be identified in a signal path including the first transmitting circuitry system and the first receiving circuitry system. The second transmitting circuitry system may include a first repair multiplexer, and the first signal path is routed through the first repair multiplexer. The second receiving circuitry system may include a second repair multiplexer, and the second signal path is routed through the second repair multiplexer. The second transmitting circuitry system may be electrically coupled to the second receiving circuitry system. The second transmitting circuit system is a redundant transmitting circuit system, wherein the second transmitting circuit system is a neighboring redundant transmitting circuit system to the first transmitting circuit system.
[0073] At least one machine-readable storage medium may include non-transitory instructions that, when executed by a processor, cause the device to: determine, on a selected multi-chip package assembly, that a first channel includes a fault, wherein the multi-chip package assembly includes a first die and a second die, wherein the first die includes a first transmitting circuitry system and the second die includes a first receiving circuitry system, wherein the first die includes a second transmitting circuitry system, wherein the second die includes a second receiving circuitry system, and wherein the first channel includes the first transmitting circuitry system and the first receiving circuitry system; reroute a first signal path from the first transmitting circuitry system to the second transmitting circuitry system; and reroute a second signal path from the second receiving circuitry system to the first receiving circuitry system. The second transmitting circuitry system may be a redundant transmitting circuitry system. The second receiving circuitry system may be a redundant receiving circuitry system. The second transmitting circuitry system may include a first repair multiplexer, and the first signal path may be routed through the first repair multiplexer. The second transmitting circuitry system may be a redundant transmitting circuitry system, wherein the second transmitting circuitry system is the redundant transmitting circuitry system closest to the first transmitting circuitry system.
[0074] In addition to the content described herein, various modifications may be made to the disclosed content and implementation methods without departing from its scope. Therefore, the descriptions and examples herein should be interpreted as illustrative rather than restrictive.
Claims
1. A method comprising: selecting a multi-chip package assembly, wherein the multi-chip package assembly includes a first die and a second die, wherein the first die includes a transmitting circuit system and the second die includes a receiving circuit system, wherein the transmitting circuit system is electrically coupled to the receiving circuit system, and wherein the receiving circuit system includes a failure channel register; injecting an error into a selected channel of the transmitting circuit system; and determining whether the failure channel register has correctly reported the failed channel.
2. The method according to claim 1, wherein the error is injected from a test controller in the first die.
3. The method according to claim 1, wherein determining whether the failure channel register has correctly reported the failed channel includes comparing an output from a functional flip-flop with a toggle pattern.
4. The method according to claim 1, wherein error injection is performed serially multiple times for multiple channels.
5. The method according to claim 1, wherein error injection is performed in parallel on multiple channels.
6. The method according to claim 1, further comprising recording the channel failure to a user-readable register.
7. The method according to claim 1, further comprising performing repair on the identified failed channel.
8. A method comprising: selecting a multi-chip package assembly, wherein the multi-chip package assembly includes a first die and a second die, wherein the first die includes a first transmitting circuit system and the second die includes a first receiving circuit system, wherein the first die includes a second transmitting circuit system and the second die includes a second receiving circuit system, and wherein the first transmitting circuit system is electrically coupled to the first receiving circuit system; rerouting a first signal path from the first transmitting circuit system to the second transmitting circuit system; and rerouting a second signal path from the second receiving circuit system to the first receiving circuit system.
9. The method according to claim 8, wherein the second transmitting circuit system is a redundant transmitting circuit system.
10. The method according to claim 8, wherein the second receiving circuit system is a redundant receiving circuit system.
11. The method according to claim 8, wherein a failure is identified in a signal path that includes the first transmitting circuit system and the first receiving circuit system.
12. The method according to claim 8, wherein the second transmitting circuit system includes a first repair multiplexer, and the first signal path is routed through the first repair multiplexer.
13. The method according to claim 8, wherein the second receiving circuit system includes a second repair multiplexer, and the second signal path is routed through the second repair multiplexer.
14. The method according to claim 8, wherein the second transmitting circuit system is electrically coupled to the second receiving circuit system.
15. The method according to claim 8, wherein the second transmitting circuit system is a redundant transmitting circuit system, and wherein the second transmitting circuit system is a redundant transmitting circuit system neighboring the first transmitting circuit system.
16. At least one machine-readable storage medium comprising non-transitory instructions that, when executed by a processor, cause a device to: determine, on a selected multi-chip package assembly, that a first channel includes a fault, wherein the multi-chip package assembly includes a first die and a second die, wherein the first die includes a first transmitting circuit system and the second die includes a first receiving circuit system, wherein the first die includes a second transmitting circuit system, wherein the second die includes a second receiving circuit system, and wherein the first channel includes the first transmitting circuit system and the first receiving circuit system; reroute a first signal path from the first transmitting circuit system to the second transmitting circuit system; and reroute a second signal path from the second receiving circuit system to the first receiving circuit system.
17. The at least one machine-readable storage medium according to claim 16, wherein the second transmitting circuit system is a redundant transmitting circuit system.
18. The at least one machine-readable storage medium according to claim 16, wherein the second receiving circuit system is a redundant receiving circuit system.
19. The at least one machine-readable storage medium according to claim 16, wherein the second transmitting circuit system includes a first repair multiplexer, and the first signal path is routed through the first repair multiplexer.
20. The at least one machine-readable storage medium according to claim 16, wherein the second transmitting circuit system is a redundant transmitting circuit system, and wherein the second transmitting circuit system is the redundant transmitting circuit system closest to the first transmitting circuit system.
21. A computer program product comprising instructions that, when executed by a processor, cause the processor to perform the method according to any one of claims 1-15.
22. A computer system for testing a multi-chip package assembly, wherein the multi-chip package assembly includes a first die and a second die, the computer system comprising: a processor; and a memory coupled to the processor and storing instructions that, when executed by the processor, are used for: selecting a channel of a transmitting circuit system of the first die; injecting an error into the selected channel of the transmitting circuit system of the first die; and determining whether a failure channel register of the second die has correctly reported the failed channel.
23. The computer system according to claim 22, wherein the error is injected from a test controller in the first die.
24. The computer system according to claim 22, wherein determining whether the failure channel register has correctly reported the failed channel includes comparing an output from a functional flip-flop with a toggle pattern.
25. The computer system according to claim 22, wherein the instructions are further used for repairing the identified failed channel.