Wafer-level chip architecture and reconfiguration method supporting multi-level hybrid reconfiguration
A wafer-level chip architecture with multi-level hybrid reconfiguration enables rapid evolution and flexible adaptation of chip capabilities. It solves the problem of rigid chip design in existing technologies while reducing costs and improving security and adaptability.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Application (China)
- Current Assignee / Owner
- QUANTUM CORE CLOUD (BEIJING) MICROELECTRONICS TECH CO LTD
- Filing Date
- 2026-03-17
- Publication Date
- 2026-06-19
Figure CN122240555A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of general architecture of stored-program computers, and more specifically to a wafer-level chip architecture that supports multi-level hybrid reconfiguration.
Background Technology
[0002] The integrated circuit industry has long relied on process technology advancements to drive chip performance improvements. However, with the slowdown of Moore's Law, the evolution cycle of advanced processes has lengthened to over two years, and the path of improving computing power solely through process technology is nearing saturation. Meanwhile, the design cost of chips using advanced processes below seven nanometers exceeds $300 million, with tape-out cycles lasting 18 to 24 months. Once the design is finalized, all the chip's functions and performance boundaries are fixed. If new application scenarios need to be adapted or design flaws need to be corrected, a complete redesign and tape-out is necessary, resulting in extremely high product iteration costs and extremely long cycles.
[0003] In the field of artificial intelligence, scenarios such as large-model inference, training, and edge deployment have vastly different requirements for numerical precision, memory bandwidth, and computing latency. A chip with a single fixed architecture cannot meet all of these needs at once. In addition, the von Neumann architecture separates storage from computation, so data must move frequently between storage and computing units, which consumes considerable power. This memory wall problem is becoming increasingly prominent under large-scale artificial intelligence workloads.
[0004] While existing reconfigurable technologies have alleviated the aforementioned problems to some extent, they still have significant limitations. Field-programmable gate arrays (FPGAs) have significantly lower area and power efficiency than application-specific integrated circuits (ASICs) due to their lookup table structure; coarse-grained reconfigurable arrays only support single-dimensional reconfiguration at the logic level and cannot simultaneously respond to evolutionary demands at the physical interconnect topology and packaging form level; existing 3D packaging technologies lack unified standardized interface specifications, making it difficult for chips from different vendors to interoperate.
[0005] Patent document CN120371772B discloses a chip based on a reconfigurable neuromorphic interconnect architecture. This chip includes a hierarchical dynamically reconfigurable interconnect layer and memristor-CMOS reconfigurable interconnect units. It dynamically adjusts routing paths through a dynamic routing algorithm and connects CMOS switches and memristor switches in parallel to achieve interconnect reconfiguration. While this scheme demonstrates some innovation in dynamic adjustment at the interconnect level, it suffers from the following shortcomings:
[0006] First, the reconfiguration capability of the proposed solution is limited to adjusting the routing paths of the chip's internal interconnect network. This is single-dimensional, logic-level reconfiguration. It neither switches computing precision modes dynamically nor substantively changes the physical metal interconnect topology through back-end processes, so its reconfiguration depth is limited.
[0007] Second, the proposed solution uses a memristor-CMOS hybrid switch structure to reconfigure interconnects. In large-scale mass production, memristor devices suffer from poor device-to-device uniformity, insufficient endurance, and limited compatibility with mainstream complementary metal-oxide-semiconductor processes, all of which make engineering implementation difficult.
[0008] Third, the proposed solution does not provide an expansion mechanism at the chip packaging level, cannot dynamically expand computing and storage capabilities through three-dimensional stacking of heterogeneous chips, and does not involve the implementation of an in-memory computing architecture.
[0009] Fourth, the proposed solution lacks a cross-layer collaborative reconfiguration management mechanism. It therefore cannot uniformly schedule, prioritize, and arbitrate reconfiguration operations across the software, interconnect, and packaging layers, nor securely manage their versions.
[0010] Therefore, there is an urgent need for a wafer-level architecture that can continuously evolve chip capabilities without re-tape-out. It should achieve this through collaborative reconfiguration at three levels (software instructions, physical interconnects, and 3D packaging), and it should also provide a robust cross-level collaborative management and security protection mechanism.
Summary of the Invention
[0011] The purpose of this invention is to address the above shortcomings by proposing a wafer-level chip architecture and reconfiguration method that supports multi-level hybrid reconfiguration.
[0012] The present invention adopts the following technical solution:
[0013] A wafer-level chip architecture supporting multi-level hybrid reconfiguration, wherein the architecture uses a base wafer as a physical carrier and includes a software instruction layer reconfiguration subsystem, a metal interconnect layer reconfiguration subsystem, a three-dimensional packaging layer reconfiguration subsystem, and a reconfiguration awareness controller.
[0014] The software instruction layer reconfiguration subsystem is integrated inside the base wafer. It dynamically reconfigures the computing precision mode and the local logic mapping relationships without changing the physical structure of the chip.
[0015] The metal interconnect layer reconfiguration subsystem utilizes the reserved reconfiguration area and reconfigurable metal interconnect process on the base wafer to change the physical interconnect topology between functional modules inside the chip.
[0016] The three-dimensional packaging layer reconfiguration subsystem uses the standardized packaging interface reserved on the top layer of the base wafer to expand capability through three-dimensional stacking of heterogeneous chips.
[0017] The reconfiguration awareness controller is integrated on the base wafer. It is connected to each of the software instruction layer, metal interconnect layer, and three-dimensional packaging layer reconfiguration subsystems and provides unified collaborative management of all three.
[0018] Furthermore, the software instruction layer reconfiguration subsystem includes a dynamic instruction set, a reconfigurable computing power controller, and a sandbox-isolated execution domain;
[0019] The dynamic instruction set includes a basic fixed instruction set stored in a one-time programmable memory area whose contents cannot be modified, and an extensible reconfigurable instruction set that is dynamically loaded after signature verification.
[0020] The reconfigurable computing power controller includes a bit-width configurable multiplier tree that supports dynamic switching between four precision modes: four-bit integer, eight-bit integer, sixteen-bit floating-point, and thirty-two-bit floating-point. It also includes a reconfiguration state snapshot module, which is used to save and restore the computing state during precision switching to achieve hot reconfiguration.
[0021] The sandbox-isolated execution domain provides an independent hardware-isolated execution domain for each dynamically loaded instruction package, and the domains are isolated from each other through a hardware memory protection unit.
[0022] Furthermore, via placeholder arrays and floating metal leads are pre-arranged in the reserved reconfiguration area during fabrication of the base wafer. They are left unconnected in the initial state.
[0023] The reserved reconfiguration area is divided into three levels according to its activation capability: Level 1 corresponds to standard local interconnection, Level 2 corresponds to cross-module bypass connection, and Level 3 corresponds to global rerouting.
[0024] The reconfigurable metal interconnect process uses a differential activation method. Selective exposure and metal deposition are performed only on the interconnect nodes to be changed in the current modification. Before activation, terahertz time-domain spectroscopy performs non-contact alignment accuracy detection on the target via area.
[0025] Furthermore, the standardized packaging interface adopts a high-density micro-bump layout and defines three types of standard slots: computing power expansion slots, storage expansion slots, and acceleration expansion slots. These are used for the stacking connection of computing power expansion chips, storage expansion chips, and dedicated acceleration chips, respectively. The electrical timing specifications and power management specifications of the three types of slots are unified.
[0026] The standardized packaging interface embeds an on-chip optical interconnect interface;
[0027] The three-dimensional packaging layer reconfiguration subsystem further includes a memory-logic fusion chip. The stacked layer of this chip integrates lightweight computing logic units that support vector dot-product and normalization operations within the storage array.
[0028] The three-dimensional packaging layer reconfiguration subsystem adjusts the ratio of computing chips to storage chips in real time through a dynamic storage-computing ratio adjustment mechanism.
[0029] Furthermore, the reconfiguration awareness controller consists of four functional units: an event listening bus, a physical resource mapping engine, a scheduling baseline pusher, and a state consistency arbiter. These units are interconnected through the controller's internal bus.
[0030] The event listening bus collects state change events from the three reconfiguration subsystems.
[0031] The physical resource mapping engine maintains a global physical resource graph and uses an incremental update strategy to handle events.
[0032] The scheduling baseline pusher converts the update results of the physical resource mapping engine into differential scheduling instructions. It pushes them to the software instruction layer reconfiguration subsystem asynchronously and without blocking.
[0033] The state consistency arbiter prioritizes concurrent change requests across layers. Changes to the metal interconnect layer have the highest priority, followed by the three-dimensional packaging layer and then the software instruction layer. The arbiter also maintains a reconfiguration version chain that records the state before and after each reconfiguration operation as a hash chain, supporting rollback to any historical version.
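The arbitration order in [0033] behaves like a priority queue. The following minimal Python sketch works under stated assumptions and does not model the patent's circuit. The numeric ranks, the `Arbiter` and `ChangeRequest` names, and first-come-first-served ordering within a layer are all hypothetical.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

# Hypothetical ranks: lower value is served first (M-Layer > P-Layer > S-Layer).
LAYER_PRIORITY = {"M": 0, "P": 1, "S": 2}

@dataclass(order=True)
class ChangeRequest:
    priority: int
    seq: int                         # arrival order breaks ties within a layer
    layer: str = field(compare=False)
    payload: str = field(compare=False)

class Arbiter:
    """Illustrative model of cross-layer priority arbitration."""

    def __init__(self):
        self._heap = []
        self._seq = count()

    def submit(self, layer: str, payload: str) -> None:
        req = ChangeRequest(LAYER_PRIORITY[layer], next(self._seq), layer, payload)
        heapq.heappush(self._heap, req)

    def next_request(self):
        return heapq.heappop(self._heap) if self._heap else None

arb = Arbiter()
arb.submit("S", "switch to INT8")
arb.submit("M", "activate via group 7")
arb.submit("P", "Type-M chip inserted")
print([arb.next_request().layer for _ in range(3)])  # ['M', 'P', 'S']
```

Using the arrival counter as a tie-breaker keeps requests from the same layer in submission order.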
[0034] Furthermore, the event listening bus performs three input functions:
- it reads the chip descriptor through the handshake pins built into each standardized packaging interface slot;
- it receives the topology change description table through the unidirectional write interface reserved at the edge of the base wafer;
- it receives reconfiguration status notifications reported by the software instruction layer reconfiguration subsystem through the on-chip bus.
[0035] The chip descriptor, topology change description table, and reconfiguration status notification must all pass digital signature verification by the state consistency arbiter before being written into the pending queue of the physical resource mapping engine.
[0036] The unidirectional write interface is implemented with a one-time programmable memory cell array. Its write-only characteristic physically prevents the chip's internal topology state from being read externally through this interface.
[0037] Furthermore, each node in the reconfiguration version chain includes a node sequence number, an event timestamp, an event source layer identifier, an event type, the hash of the global physical resource graph state before the change, the hash of the global physical resource graph state after the change, the operator's public key identifier, and the operator's digital signature.
[0038] The chain integrity constraint of the reconfiguration version chain is that the pre-change state hash recorded by the Nth node must equal the post-change state hash recorded by the (N-1)th node.
[0039] The state consistency arbiter verifies the constraint before writing to each new node. If the verification fails, it locks the system state and reports a security alarm to the system management software.
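The integrity rule in [0038] and the check in [0039] can be sketched as a hash-chained log. In this hypothetical Python model, SHA-256 is an assumed hash choice and sequence numbers are assumed to start at 0. Operator signatures are stored but not verified, and rollback itself is not modelled.

```python
import hashlib
from dataclasses import dataclass

def state_hash(state: bytes) -> str:
    return hashlib.sha256(state).hexdigest()  # assumed hash function

@dataclass(frozen=True)
class ChainNode:
    seq: int
    timestamp_us: int
    source_layer: str      # "S", "M" or "P"
    event_type: str
    pre_hash: str          # GPRG hash before the change
    post_hash: str         # GPRG hash after the change
    operator_key_id: str
    operator_sig: bytes    # stored only; signature checking is out of scope here

class VersionChain:
    def __init__(self, genesis_hash: str):
        self.nodes = []
        self.head_hash = genesis_hash  # post-change hash of the most recent node
        self.locked = False
        self.alarms = []

    def append(self, node: ChainNode) -> bool:
        if self.locked:
            return False  # a locked chain accepts nothing until authorized recovery
        # Integrity rule: node N's pre-change hash must equal node N-1's post-change hash.
        if node.pre_hash != self.head_hash or node.seq != len(self.nodes):
            self.locked = True
            self.alarms.append(f"chain integrity failure at seq {node.seq}")
            return False
        self.nodes.append(node)
        self.head_hash = node.post_hash
        return True

s0, s1, s2 = (state_hash(x) for x in (b"v0", b"v1", b"v2"))
chain = VersionChain(genesis_hash=s0)
ok = chain.append(ChainNode(0, 10, "P", "insert", s0, s1, "op-1", b""))
bad = chain.append(ChainNode(1, 20, "S", "precision", s0, s2, "op-1", b""))  # stale pre_hash
print(ok, bad, chain.locked)  # True False True
```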
[0040] Furthermore, the three reconfiguration subsystems respond on different time scales:
- the software instruction layer reconfiguration subsystem responds to algorithm evolution requirements daily;
- the three-dimensional packaging layer reconfiguration subsystem responds to scale expansion requirements weekly;
- the metal interconnect layer reconfiguration subsystem responds to topology optimization requirements monthly.
Together, the three response cycles cover capability evolution requirements throughout the entire life cycle of the chip product.
[0041] The reconfiguration awareness controller occupies no more than 3% of the total area of the base wafer, and the reserved reconfiguration area occupies 8% to 15%.
[0042] A reconfiguration method for the wafer-level chip architecture supporting multi-level hybrid reconfiguration, the method comprising the following steps:
[0043] The software instruction layer reconfiguration subsystem dynamically loads signature-verified instruction packages into its extensible reconfigurable instruction set to reconfigure the computing precision mode and the local logic mapping relationships. During reconfiguration, the reconfiguration state snapshot module saves a snapshot of the current computing state and restores it once reconfiguration is complete. This achieves hot reconfiguration without interrupting the task being executed.
[0044] The metal interconnect layer reconfiguration subsystem applies the differential activation process to the target via region in the reserved reconfiguration area to change the physical interconnect topology between functional modules inside the chip. Before activation, terahertz time-domain spectroscopy is used for non-contact alignment detection.
[0045] Heterogeneous chips are stacked through the standardized packaging interface slots of the three-dimensional packaging layer reconfiguration subsystem. When the event listening bus of the reconfiguration awareness controller detects a chip insertion, it automatically reads the chip descriptor and triggers an incremental update of the physical resource mapping engine. The scheduling baseline pusher then pushes the update result to the software instruction layer reconfiguration subsystem as differential scheduling instructions, completing adaptation of the scheduling strategy.
[0046] The state consistency arbiter performs priority arbitration among the reconfiguration operations of the above three layers and records each reconfiguration operation in the reconfiguration version chain.
[0047] Furthermore, when a chip in the three-dimensional packaging layer reconfiguration subsystem goes offline unexpectedly, the following actions take place:
- the event listening bus immediately triggers an emergency offline event;
- the state consistency arbiter temporarily raises the processing priority of this event to the highest level;
- the physical resource mapping engine sets the corresponding node state to fault;
- the scheduling baseline pusher immediately pushes a task migration instruction, notifying the software instruction layer reconfiguration subsystem to migrate all tasks on that chip to other available resources.
Until the migration is complete, no new tasks may be assigned to the node.
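The emergency-offline path in [0047] can be modelled as follows. In this hypothetical sketch, priority escalation is left to the arbiter and the round-robin placement of migrated tasks is an assumed policy. `Node` and `EmergencyHandler` are illustrative names.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    state: str = "available"  # "available" or "fault"
    tasks: list = field(default_factory=list)

class EmergencyHandler:
    def __init__(self, nodes):
        self.nodes = {n.node_id: n for n in nodes}
        self.migrating = set()  # nodes that must not receive new tasks

    def on_unexpected_offline(self, node_id):
        node = self.nodes[node_id]
        node.state = "fault"          # mark the node as faulty in the resource graph
        self.migrating.add(node_id)   # block assignment until migration finishes
        targets = [n for n in self.nodes.values() if n.state == "available"]
        if not targets:
            return []                 # no destination; node stays blocked
        moved = []
        for i, task in enumerate(node.tasks):
            dst = targets[i % len(targets)]  # assumed round-robin placement
            dst.tasks.append(task)
            moved.append((task, dst.node_id))
        node.tasks.clear()
        self.migrating.discard(node_id)
        return moved

    def can_assign(self, node_id):
        node = self.nodes[node_id]
        return node.state == "available" and node_id not in self.migrating

h = EmergencyHandler([Node("C0", tasks=["t1", "t2", "t3"]), Node("C1"), Node("C2")])
print(h.on_unexpected_offline("C0"))  # [('t1', 'C1'), ('t2', 'C2'), ('t3', 'C1')]
```

The faulty node remains unassignable after migration because its state stays `fault`.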
[0048] When integrity verification of the reconfiguration version chain fails, the state consistency arbiter stops accepting new reconfiguration events. It then locks the current system state, reports a security alarm to the system management software, and waits for an authorized recovery operation.
[0049] The beneficial effects achieved by this invention are:
[0050] This invention performs collaborative reconfiguration in three dimensions: the software instruction layer, the metal interconnect layer, and the 3D packaging layer. It thereby compresses the chip capability evolution cycle from years, under the traditional single-chip design model, to days.

The software instruction layer responds to algorithm evolution needs daily, the 3D packaging layer to scaling needs weekly, and the metal interconnect layer to topology optimization needs monthly. These three complementary time granularities cover capability evolution needs throughout the chip product lifecycle. This fundamentally changes the existing situation, in which chip capabilities are fixed at tape-out and can evolve only through redesign and remanufacturing.

The differential activation process selectively modifies the reserved reconfiguration area, performing exposure and metal deposition only on the changed interconnect nodes, which eliminates the need for re-tape-out. A single interconnect change takes no more than six weeks of processing and costs more than 60% less than redesign and tape-out.

Through standardized packaging interfaces, the 3D packaging layer supports plug-and-play of compliant chips from different suppliers. This avoids the cost of repeatedly purchasing customized chips for different application scenarios.

The reconfiguration awareness controller senses state changes in the three reconfiguration subsystems in real time through the event listening bus. It maintains the global physical resource graph incrementally through the physical resource mapping engine. The update latency for a single insertion event is no more than 50 microseconds and the end-to-end coordination latency is no more than 200 microseconds, so the coordination overhead is transparent to upper-layer applications.

The state consistency arbiter uses digital signature verification to prevent unauthorized reconfiguration operations. The reconfiguration version chain records the state before and after each reconfiguration operation as a hash chain, supporting rollback to any historical version and meeting industrial-grade security and compliance requirements.
[0051] For a further understanding of the features and technical content of the present invention, refer to the following detailed description and drawings. The drawings are provided for reference and illustration only and are not intended to limit the present invention.
Description of the Drawings
[0052] Figure 1 is a schematic diagram of the overall structure of the wafer-level chip architecture of the present invention;
[0053] Figure 2 is a schematic diagram of the internal structure of the software instruction layer reconfiguration subsystem of the present invention;
[0054] Figure 3 is a schematic diagram of the internal structure of the reconfiguration awareness controller of the present invention;
[0055] Figure 4 is a schematic diagram of the event interfaces between the RAC and the three reconfiguration subsystems of the present invention;
[0056] Figure 5 is a complete timing diagram of the three-layer collaborative mechanism when a storage chip is added to the P-Layer of the present invention.
Detailed Description of the Embodiments
[0057] The following specific embodiments illustrate the implementation of the present invention. Those skilled in the art can understand its advantages and effects from the content disclosed in this specification. The present invention can also be implemented or applied through other specific embodiments, and details in this specification can be modified based on different viewpoints and applications without departing from its spirit. The accompanying drawings are simple schematic illustrations and are not drawn to actual scale. The following embodiments describe the relevant technical content in further detail, but the disclosed content is not intended to limit the scope of protection of the present invention.
[0058] Example 1:
[0059] This embodiment provides a wafer-level chip architecture that supports multi-level hybrid reconfiguration. As shown in Figure 1, the architecture uses a base wafer as the physical carrier and integrates three subsystems on it: a software instruction layer reconfiguration subsystem (S-Layer), a metal interconnect layer reconfiguration subsystem (M-Layer), and a three-dimensional packaging layer reconfiguration subsystem (P-Layer). The reconfiguration awareness controller (RAC) provides unified, coordinated management of these three subsystems.

The three subsystems are complementary in time granularity:
- the S-Layer responds to algorithm evolution requirements daily;
- the P-Layer responds to scaling requirements weekly;
- the M-Layer responds to topology optimization requirements monthly.
Together, they cover the capability evolution requirements throughout the entire lifecycle of the chip product.
[0060] The base wafer serves as the physical foundation of the architecture and comprises several functional module areas and several reserved areas. The functional module areas integrate computing modules, storage control modules, and input/output interface modules. The reserved areas are of two types:
- the reserved reconfiguration area (RRA) used by the M-Layer, distributed in the isolation zones between functional modules and accounting for 8% to 15% of the total area of the base wafer;
- the standardized packaging interface (SPI) area used by the P-Layer, located on the top layer of the base wafer and supporting stacking connections of heterogeneous chips.
The RAC is integrated in the central management area of the base wafer, occupies no more than 3% of its total area, and operates on an independent power supply domain.
[0061] Referring to Figure 2, the S-Layer is integrated inside the base wafer and includes a Dynamic Instruction Set (DISet), a Reconfigurable Computing Controller (RCC), and a Sandbox Isolated Execution Domain (SIE).
[0062] The DISet comprises two parts: the Basic Fixed Instruction Set (FIS) and the Extensible Reconfigurable Instruction Set (EIS). The FIS is stored in a one-time programmable memory area, and its contents cannot be modified. It is responsible for chip startup, security initialization, and handshake communication with the RAC. The EIS is dynamically loaded by externally signed and verified instruction packets and is used to reconfigure the computing power precision mode and local logic mapping relationship. The supported computing power precision modes include INT4, INT8, BF16, and FP32, and the switching delay between each precision mode does not exceed 10 clock cycles.
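A minimal sketch of EIS package loading follows. HMAC-SHA256 from the Python standard library stands in for the unspecified digital-signature scheme. This is an assumption; a real device would more plausibly verify an asymmetric signature. The FIS entries, the key, and the package names are hypothetical.

```python
import hashlib
import hmac

# Assumption: a provisioned HMAC key stands in for the package signature scheme.
DEVICE_KEY = b"hypothetical-device-key"
SUPPORTED_MODES = ("INT4", "INT8", "BF16", "FP32")

# FIS contents are fixed; a tuple models the read-only one-time programmable area.
FIS = ("BOOT", "SECURE_INIT", "RAC_HANDSHAKE")

def sign_package(mode: str, payload: bytes) -> bytes:
    """Vendor-side helper: the tag covers both the target mode and the payload."""
    return hmac.new(DEVICE_KEY, mode.encode() + b"|" + payload, hashlib.sha256).digest()

class DynamicInstructionSet:
    def __init__(self):
        self.eis = {}  # loaded package name -> precision mode it configures

    def load_package(self, name: str, mode: str, payload: bytes, tag: bytes) -> bool:
        if mode not in SUPPORTED_MODES:
            return False
        if not hmac.compare_digest(sign_package(mode, payload), tag):
            return False  # unsigned or tampered package: refuse to load into the EIS
        self.eis[name] = mode
        return True

diset = DynamicInstructionSet()
tag = sign_package("INT8", b"int8-gemm-microcode")
print(diset.load_package("gemm_int8", "INT8", b"int8-gemm-microcode", tag))  # True
print(diset.load_package("gemm_int4", "INT4", b"int8-gemm-microcode", tag))  # False
```

Binding the mode into the signed bytes prevents a valid package from being replayed under a different precision mode.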
[0063] The RCC is the hardware execution module of the S-Layer and comprises a precision-scalable computing array and a local logic remapping network.

The precision-scalable computing array uses a bit-width configurable multiplier tree whose bit width is controlled by instruction parameters in the EIS. The same hardware circuit therefore operates in different precision modes under different instruction configurations.

The local logic remapping network uses a programmable routing matrix to remap the logical connections between computing nodes in real time without changing the physical wiring. It supports logical topologies such as systolic arrays, tree reduction, and ring communication.

The RCC also includes a reconfiguration state snapshot module. When a precision switch or logic remapping operation is performed, this module saves a snapshot of the current computing state and restores it after the switch completes, realizing hot reconfiguration without interrupting the ongoing task.
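The hot-reconfiguration behaviour of the RCC can be illustrated in software. This hypothetical Python model saturates operands to the INT4/INT8 ranges and passes BF16/FP32 values through unchanged, which is a simplification. It shows only the snapshot, reconfigure, restore ordering and does not model the multiplier-tree hardware.

```python
import copy
from dataclasses import dataclass, field

# Saturation ranges for the integer modes; BF16/FP32 pass through in this model.
INT_RANGES = {"INT4": (-8, 7), "INT8": (-128, 127)}

@dataclass
class ComputeState:
    partial_sums: list = field(default_factory=list)
    next_index: int = 0

class ReconfigurableComputeController:
    def __init__(self, mode: str = "FP32"):
        self.mode = mode
        self.state = ComputeState()

    def multiply(self, a, b):
        bounds = INT_RANGES.get(self.mode)
        if bounds:  # clamp operands to the active integer bit width
            lo, hi = bounds
            a = max(lo, min(hi, round(a)))
            b = max(lo, min(hi, round(b)))
        return a * b

    def switch_mode(self, new_mode: str) -> None:
        snapshot = copy.deepcopy(self.state)  # save in-flight computing state
        self.state = ComputeState()           # model: reconfiguration clears datapath registers
        self.mode = new_mode                  # re-program the multiplier tree bit width
        self.state = snapshot                 # restore so the task resumes where it stopped

rcc = ReconfigurableComputeController("INT8")
rcc.state.partial_sums.append(rcc.multiply(100, 3))
rcc.state.next_index = 1
rcc.switch_mode("INT4")
print(rcc.mode, rcc.state.partial_sums, rcc.multiply(100, 3))  # INT4 [300] 21
```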
[0064] The SIE provides an independent hardware-isolated execution domain for each dynamically loaded instruction package. The domains are isolated from each other by a hardware memory protection unit to prevent cross-domain side-channel attacks and unauthorized access.
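The per-domain isolation in [0064] amounts to a bounds check enforced by the memory protection unit. The sketch below is a hypothetical model with one contiguous region per domain. It covers access control only and does not model side-channel defences.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Region:
    base: int
    size: int

    def contains(self, addr: int, length: int) -> bool:
        return self.base <= addr and addr + length <= self.base + self.size

    def overlaps(self, other: "Region") -> bool:
        return self.base < other.base + other.size and other.base < self.base + self.size

class MemoryProtectionUnit:
    """Hypothetical per-domain access check; one region per execution domain."""

    def __init__(self):
        self.regions = {}

    def assign(self, domain: str, region: Region) -> None:
        if any(region.overlaps(r) for r in self.regions.values()):
            raise ValueError("execution domains must not share memory")
        self.regions[domain] = region

    def check(self, domain: str, addr: int, length: int = 1) -> bool:
        region = self.regions.get(domain)
        return region is not None and region.contains(addr, length)

mpu = MemoryProtectionUnit()
mpu.assign("pkg_a", Region(0x1000, 0x1000))
mpu.assign("pkg_b", Region(0x2000, 0x1000))
print(mpu.check("pkg_a", 0x1800, 4), mpu.check("pkg_a", 0x2000, 4))  # True False
```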
[0065] The M-Layer uses the reserved reconfiguration area (RRA) and the reconfigurable metal interconnect process (RMIC) on the base wafer to change the physical interconnect topology between functional modules inside the chip.
[0066] Via placeholder arrays and floating metal leads are pre-arranged in the RRA during fabrication of the base wafer. In the initial state they are unconnected and have no effect on the normal operation of the chip circuitry. The RRA is divided into three levels according to activation capability:
- Level 1 corresponds to standard local interconnection, used to optimize connections within a single functional module;
- Level 2 corresponds to cross-module bypass connection, used to establish a dedicated direct channel between two adjacent functional modules;
- Level 3 corresponds to global rerouting, used for long-distance interconnect reconfiguration across multiple functional modules.
The three levels can be enabled step by step as needed without interfering with one another.
[0067] The RMIC process employs a differential activation method. Through differential comparison of the mask layout, selective exposure and metal deposition are performed only on the interconnect nodes that need to be changed, rather than remaking the entire metal layer. This keeps the process cycle for a single interconnect change within 4 to 6 weeks. Before activation, the RMIC process uses terahertz time-domain spectroscopy to perform non-contact alignment accuracy detection on the target via area. Activation is only performed after confirming that the alignment accuracy meets the standards.
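Differential activation can be read as a set difference between the current and target via layouts, gated by the activation levels enabled so far. This sketch is hypothetical. `ViaNode`, the level gating, and the alignment tolerance are illustrative assumptions, and the terahertz spectroscopy step is reduced to a threshold check on a measured offset.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ViaNode:
    node_id: int
    level: int  # 1 = local, 2 = cross-module bypass, 3 = global rerouting

def plan_activation(current: set, target: set, enabled_levels: set) -> set:
    """Differential comparison: only nodes in the target layout but not yet active."""
    delta = target - current
    blocked = sorted(v.node_id for v in delta if v.level not in enabled_levels)
    if blocked:
        raise ValueError(f"activation level not enabled for nodes {blocked}")
    return delta

def alignment_ok(measured_offset_nm: float, tolerance_nm: float) -> bool:
    # Placeholder for the THz time-domain spectroscopy check; the tolerance is a
    # hypothetical parameter, not a value given in the source.
    return abs(measured_offset_nm) <= tolerance_nm

a, b, c = ViaNode(1, 1), ViaNode(2, 2), ViaNode(3, 3)
delta = plan_activation(current={a}, target={a, b}, enabled_levels={1, 2})
print(sorted(v.node_id for v in delta), alignment_ok(3.2, tolerance_nm=10.0))  # [2] True
```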
[0068] The P-Layer utilizes the standardized packaging interface (SPI) on the top layer of the base wafer to achieve three-dimensional stacking expansion of heterogeneous chips, including the heterogeneous in-memory computing expansion architecture (HCIM-X).
[0069] The SPI adopts a high-density microbump layout with a microbump pitch of 50 micrometers. It defines three types of standard slots: Type-C, Type-M, and Type-A, which are used for stacking and connecting computing power expansion chips, storage expansion chips, and dedicated acceleration chips, respectively. The electrical timing and power management specifications of the three types of slots are unified, supporting interoperability of chips from different suppliers that conform to the SPI specification. The SPI embeds an on-chip optical interconnect interface to establish a high-speed optoelectronic hybrid transmission channel between the computing power expansion chip and the base wafer, with a bandwidth density of not less than 2 Tbps/mm².
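Two details of the SPI lend themselves to a quick check: matching a chip's declared type to the slot it occupies, and the bump density implied by the 50 micrometer pitch. The 2-bit type codes below are hypothetical. The density figure assumes a square bump grid, which the text does not state.

```python
from enum import Enum

class SlotType(Enum):
    TYPE_C = "computing power expansion"
    TYPE_M = "storage expansion"
    TYPE_A = "dedicated acceleration"

# Hypothetical 2-bit codes presented on the chip type encoding signal line.
CHIP_TYPE_CODES = {0b01: SlotType.TYPE_C, 0b10: SlotType.TYPE_M, 0b11: SlotType.TYPE_A}

def slot_accepts(slot: SlotType, chip_type_code: int) -> bool:
    """True if the chip's declared type matches the slot type."""
    return CHIP_TYPE_CODES.get(chip_type_code) is slot

def bumps_per_mm2(pitch_um: float) -> float:
    """Microbump count per mm^2, assuming a square grid at the given pitch."""
    per_side = 1000.0 / pitch_um
    return per_side * per_side

print(slot_accepts(SlotType.TYPE_M, 0b10), slot_accepts(SlotType.TYPE_C, 0b10))  # True False
print(bumps_per_mm2(50))  # 400.0
```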
[0070] The HCIM-X includes a Memory Logic Fusion Chip (MLFC) and a Dynamic Compute-Store Ratio Adjustment (DCMR) mechanism. The MLFC is a memory chip mounted on a Type-M slot. Its stacking layer integrates lightweight computing logic units, including a static random access memory array and a multiply-accumulate operation array. It supports high-frequency operator operations such as vector dot product and normalization within the memory array, eliminating the overhead of data transfer between the storage and computing modules. The DCMR adjusts the ratio of the number of computing chips to storage chips in real time through the control bus of the SPI to adapt to different workload characteristics. The ratio of memory to computing is 1:8 in inference scenarios and 1:2 in training scenarios.
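The DCMR ratios can be turned into a chip split for a given stack budget. In this illustrative Python sketch, 1:8 and 1:2 are read as storage:compute, as the text states. Rounding the compute share down is an assumption.

```python
# Storage : compute ratios stated in the text for each workload type.
RATIOS = {"inference": (1, 8), "training": (1, 2)}

def rebalance(total_slots: int, workload: str) -> tuple:
    """Split a stack budget into (compute_chips, storage_chips) near the target ratio."""
    storage_part, compute_part = RATIOS[workload]
    compute = (total_slots * compute_part) // (storage_part + compute_part)
    return compute, total_slots - compute

print(rebalance(18, "inference"), rebalance(18, "training"))  # (16, 2) (12, 6)
```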
[0071] The RAC implements unified collaborative management of the S-Layer, M-Layer, and P-Layer. It is connected through independent on-chip buses to the RCC of the S-Layer, the TWP write interface of the M-Layer, and the SPI handshake pins of the P-Layer. The RAC continuously monitors state change events in all three layers. After a reconfiguration operation occurs in any layer, it automatically updates the global physical resource mapping and pushes scheduling policy update instructions to the S-Layer, keeping the reconfiguration operations of the three layers consistent.
[0072] Example 2:
[0073] Based on the overall architecture described in Example 1, this example describes in detail the internal structure and workflow of the Reconfiguration Awareness Controller (RAC).
[0074] Referring to Figure 3, the RAC consists of four functional units interconnected through the RAC's internal bus:
- the Event Listening Bus (ELB) collects state change events from the three reconfiguration subsystems;
- the Physical Resource Mapping Engine (PRME) maintains the global physical resource mapping and performs incremental updates;
- the Scheduling Baseline Pusher (SBP) converts resource mapping changes into differential scheduling policy instructions and pushes them to the S-Layer;
- the State Consistency Arbiter (SCA) performs priority arbitration of concurrent multi-layer changes and maintains the full-link state hash chain.
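The SBP turns resource-map changes into differential instructions and pushes them without blocking. This can be sketched with `asyncio.Queue.put_nowait`. The instruction tuples (`ADD`, `REMOVE`, `UPDATE`), the capacity-only resource view, and the queue size are hypothetical simplifications.

```python
import asyncio

def diff_schedule(old: dict, new: dict) -> list:
    """Turn two resource views (module id -> available capacity) into delta instructions."""
    ops = [("ADD", m, new[m]) for m in sorted(new.keys() - old.keys())]
    ops += [("REMOVE", m, None) for m in sorted(old.keys() - new.keys())]
    ops += [("UPDATE", m, new[m]) for m in sorted(old.keys() & new.keys()) if old[m] != new[m]]
    return ops

async def push_and_drain(ops: list) -> list:
    to_s_layer = asyncio.Queue(maxsize=64)
    for op in ops:
        to_s_layer.put_nowait(op)  # never waits; raises asyncio.QueueFull if the S-Layer lags
    # Stand-in for the S-Layer consumer draining its inbox.
    return [await to_s_layer.get() for _ in range(to_s_layer.qsize())]

ops = diff_schedule({"C0": 100, "C1": 100}, {"C0": 100, "C1": 50, "M0": 256})
received = asyncio.run(push_and_drain(ops))
print(received)  # [('ADD', 'M0', 256), ('UPDATE', 'C1', 50)]
```

Only changed entries produce instructions, so unchanged modules such as `C0` generate no traffic to the S-Layer.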
[0075] Referring to Figure 4, the ELB is the standardized event communication interface between the RAC and the three reconfiguration subsystems.
[0076] In the interface between the ELB and the P-Layer, each SPI slot has four built-in signal lines: insertion detection, chip type encoding, handshake response, and parameter ready. When a chip is inserted, the insertion detection signal becomes valid. On detecting this transition, the ELB sends a handshake response to the corresponding slot and begins reading the chip descriptor (CD) over the low-speed serial bus.

The CD contains a chip type identifier, peak computing power or bandwidth parameters, operating voltage range, maximum power consumption, upper limit of interconnect interface bandwidth, an embedded computing logic flag, and the manufacturer's digital signature.

After receiving the CD, the ELB calls the SCA to verify the manufacturer's digital signature. If verification succeeds, the ELB writes the CD into the pending queue of the PRME and records the timestamp, slot number, and chip serial number in the event log.
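The P-Layer handshake can be sketched in three steps: detect the insertion edge, verify the CD's vendor signature, then enqueue and log. In this hypothetical model, HMAC-SHA256 stands in for the manufacturer's digital signature. The `vendor` and `serial` fields are assumed additions that let the verifier pick a key and the log record a serial number.

```python
import dataclasses
import hashlib
import hmac
import time
from collections import deque

# Hypothetical vendor key store; HMAC-SHA256 stands in for the vendor's digital signature.
VENDOR_KEYS = {"vendor-x": b"hypothetical-vendor-key"}

@dataclasses.dataclass(frozen=True)
class ChipDescriptor:
    chip_type: str
    peak_perf: float        # peak computing power or bandwidth
    voltage_range: tuple    # (min_V, max_V)
    max_power_w: float
    link_bw_limit: float
    has_compute_logic: bool
    vendor: str             # assumed field used to select the verification key
    serial: str             # assumed field; the text logs a chip serial number
    signature: bytes = b""

    def body(self) -> bytes:
        fields = dataclasses.astuple(dataclasses.replace(self, signature=b""))
        return repr(fields).encode()

def vendor_sign(cd: ChipDescriptor) -> ChipDescriptor:
    tag = hmac.new(VENDOR_KEYS[cd.vendor], cd.body(), hashlib.sha256).digest()
    return dataclasses.replace(cd, signature=tag)

def insertion_edge(prev_level: int, level: int) -> bool:
    return prev_level == 0 and level == 1  # insertion detection line becomes valid

pending_queue, event_log = deque(), []

def on_descriptor(slot: int, cd: ChipDescriptor) -> bool:
    key = VENDOR_KEYS.get(cd.vendor)
    if key is None:
        return False
    expected = hmac.new(key, cd.body(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, cd.signature):
        return False
    pending_queue.append(cd)                             # hand over to the PRME
    event_log.append((time.time_ns(), slot, cd.serial))  # timestamp, slot, serial
    return True

cd = vendor_sign(ChipDescriptor("storage", 1024.0, (0.7, 0.9), 15.0, 512.0, True,
                                "vendor-x", "SN-0001"))
if insertion_edge(0, 1):
    print(on_descriptor(slot=2, cd=cd))  # True
forged = dataclasses.replace(cd, max_power_w=5.0)
print(on_descriptor(2, forged))  # False
```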
[0077] In the interface implementation between the ELB and the M-Layer, the topology change of the M-Layer is submitted by the wafer manufacturer through the one-way write interface (TWP) reserved at the edge of the base wafer after the back-end process is completed. The TWP is implemented using a one-time programmable memory cell array. Each memory cell corresponds to an activatable via node in the RRA. When a via node is activated, the manufacturer sets the corresponding memory cell and submits a topology change description table (TCD) through the TWP. The TCD contains the number of via nodes activated this time, the number of each activated node, the source module identifier and target module identifier connected, the interconnect type, the bandwidth level after activation, and the manufacturer's digital signature of all contents of the TCD. The one-way write characteristic of the TWP ensures from a physical structure perspective that the internal topology state of the chip cannot be read by the outside through this interface. After the ELB verifies the signature of the TCD, it writes the TCD into the pending queue of the PRME.
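The TWP's write-only semantics and the TCD acceptance step can be modelled as follows. Python cannot enforce physical write-only behaviour, so the class simply exposes no read method. HMAC-SHA256, the key, and the record layout are assumptions.

```python
import hashlib
import hmac
from dataclasses import dataclass

MFR_KEY = b"hypothetical-manufacturer-key"  # assumed; stands in for the manufacturer signature

class WriteOnlyOTP:
    """Models the TWP: cells can only be programmed 0 -> 1 and no read method is exposed."""

    def __init__(self, n_cells: int):
        self._cells = bytearray(n_cells)  # one cell per activatable via node in the RRA

    def program(self, index: int) -> bool:
        if self._cells[index]:
            return False  # already programmed; OTP cells cannot be cleared
        self._cells[index] = 1
        return True

@dataclass(frozen=True)
class ViaActivation:
    node_id: int
    src_module: str
    dst_module: str
    link_type: str
    bw_level: int

def tcd_tag(entries: tuple) -> bytes:
    body = repr((len(entries),) + entries).encode()  # activated count + node records
    return hmac.new(MFR_KEY, body, hashlib.sha256).digest()

def elb_accept_tcd(entries: tuple, tag: bytes, pending_queue: list) -> bool:
    if not hmac.compare_digest(tcd_tag(entries), tag):
        return False
    pending_queue.append({"activated_count": len(entries), "entries": entries})
    return True

otp, queue = WriteOnlyOTP(4096), []
tcd = (ViaActivation(17, "CU0", "CU1", "bypass", 2),)
for e in tcd:  # manufacturer side: set the cell for each activated via
    otp.program(e.node_id)
print(elb_accept_tcd(tcd, tcd_tag(tcd), queue), otp.program(17))  # True False
```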
[0078] In the interface between the ELB and the S-Layer, after each instruction package load or precision mode switch completes, the RCC writes a Reconstruction Status Notification (RSN) to the ELB over the on-chip bus. The RSN includes the event type, the current precision mode, the identifier of the affected computing module, the new logical mapping version number, and a checksum. After the ELB verifies the checksum, it writes the RSN into the pending queue of the PRME.
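The text names neither the checksum algorithm nor the RSN frame layout. The sketch below assumes CRC-32 (Python's `zlib.crc32`) appended to a little-endian packed frame; the field widths in `_RSN_FMT` and the `RSN` class are illustrative assumptions.

```python
import struct
import zlib
from dataclasses import dataclass

# Assumed layout: event type (u8), precision mode (u8), module id (u16), version (u32).
_RSN_FMT = "<BBHI"


@dataclass
class RSN:
    event_type: int
    precision_mode: int
    module_id: int
    mapping_version: int

    def encode(self) -> bytes:
        body = struct.pack(_RSN_FMT, self.event_type, self.precision_mode,
                           self.module_id, self.mapping_version)
        return body + struct.pack("<I", zlib.crc32(body))   # checksum trailer

    @staticmethod
    def decode_checked(frame: bytes) -> "RSN":
        body, (crc,) = frame[:-4], struct.unpack("<I", frame[-4:])
        if zlib.crc32(body) != crc:
            raise ValueError("RSN checksum mismatch; not forwarded to the PRME")
        return RSN(*struct.unpack(_RSN_FMT, body))
```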
[0079] The PRME maintains a global physical resource graph (GPRG), which is stored in the static random access memory inside the RAC as a weighted directed graph data structure, supporting up to 256 functional module nodes and 1024 interconnection edges.
[0080] Each node of the GPRG records the module identifier, module type, current running status, peak computing power parameters, current load rate, maximum available bandwidth, and the timestamp of the most recent reconstruction of the corresponding functional module. Each edge of the GPRG records the source node module identifier, target node module identifier, interconnection type, current activation status, bandwidth limit, and measured transmission delay.
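A minimal data-structure sketch of the GPRG follows. The record fields follow [0079] and [0080]; the class names, field types, and the choice to key edges by `(src, dst)` (which assumes at most one edge per direction between two modules) are assumptions. Only the 256-node and 1024-edge limits come from the text.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

MAX_NODES, MAX_EDGES = 256, 1024   # capacity limits stated in [0079]


@dataclass
class Node:
    module_id: int
    module_type: str
    status: str              # current running status
    peak_compute: float
    load_rate: float
    max_bandwidth: float
    last_reconfig_ts: int    # timestamp of the most recent reconstruction


@dataclass
class Edge:
    src: int
    dst: int
    link_type: str
    active: bool
    bandwidth_limit: float
    measured_delay_ns: float


class GPRG:
    """Weighted directed graph: nodes keyed by module id, edges by (src, dst)."""

    def __init__(self) -> None:
        self.nodes: Dict[int, Node] = {}
        self.edges: Dict[Tuple[int, int], Edge] = {}

    def add_node(self, node: Node) -> None:
        if node.module_id not in self.nodes and len(self.nodes) >= MAX_NODES:
            raise OverflowError("GPRG node capacity exhausted")
        self.nodes[node.module_id] = node

    def add_edge(self, edge: Edge) -> None:
        if edge.src not in self.nodes or edge.dst not in self.nodes:
            raise KeyError("edge endpoints must be existing nodes")
        key = (edge.src, edge.dst)
        if key not in self.edges and len(self.edges) >= MAX_EDGES:
            raise OverflowError("GPRG edge capacity exhausted")
        self.edges[key] = edge
```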
[0081] The PRME processes events from the ELB queue using an incremental update strategy. For a P-Layer insertion event, the PRME adds one functional module node, together with directed edges corresponding to its SPI physical connections, based on the CD content. For an M-Layer topology change event, the PRME traverses the activated node records in the TCD, changes the activation status of each corresponding edge from inactive to available, and fills in its bandwidth limit and transmission delay. For an S-Layer precision switching event, the PRME updates the computing power parameter field of the affected functional module node. After each incremental update, the PRME computes a hash value over the entire GPRG and passes it to the SCA as the updated state fingerprint. The processing time of an incremental update scales linearly with the number of nodes and edges involved in the change, and the incremental update delay for a single P-Layer insertion event does not exceed 50 microseconds.
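The three incremental-update cases and the state fingerprint can be sketched with plain dictionaries. SHA-256 and the canonical JSON encoding are assumptions, since the text only says "a hash value"; `apply_event`, `state_fingerprint` and the event dictionary layout are hypothetical. The update touches only the node or edges named in the event, while this naive fingerprint re-hashes the whole graph.

```python
import hashlib
import json


def state_fingerprint(graph: dict) -> str:
    # SHA-256 over a canonical, key-sorted JSON encoding of the whole graph.
    blob = json.dumps(graph, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()


def apply_event(graph: dict, event: dict) -> str:
    """Apply one ELB event to graph = {"nodes": {...}, "edges": {...}} in place."""
    nodes, edges = graph["nodes"], graph["edges"]
    layer = event["layer"]
    if layer == "P":                  # insertion: one node plus its SPI edges
        cd = event["cd"]
        nodes[cd["module_id"]] = {"type": cd["type"], "status": "available",
                                  "peak": cd["peak"]}
        for peer in event["spi_peers"]:
            edges[f'{peer}->{cd["module_id"]}'] = {"state": "available", "bw": cd["bw"]}
    elif layer == "M":                # via activation: reserved edges go live
        for act in event["tcd"]:
            edge = edges[f'{act["src"]}->{act["dst"]}']
            edge.update(state="available", bw=act["bw"], delay=act["delay"])
    elif layer == "S":                # precision switch: one node's compute figure
        nodes[event["module_id"]]["peak"] = event["peak"]
    else:
        raise ValueError(f"unknown layer {layer!r}")
    return state_fingerprint(graph)
```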
[0082] The SBP converts each incremental update result of the PRME into a Differential Scheduling Instruction (DSI) and pushes it to the RCC of the S-Layer asynchronously and without blocking. The DSI includes the instruction type, the identifier of the module involved, the suggested action type, the suggested ratio of task load to migrate, the bandwidth parameters of the new path, and the expected latency improvement. There are four instruction types: new available resource, resource offline, path update, and load balancing suggestion.
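The four DSI categories map naturally onto an enumeration. The text ties two of them to concrete events (a P-Layer insertion yields "new available resource" in [0088], an unexpected removal yields a migration instruction in [0092]); the mapping for the remaining cases in the hypothetical `dsi_from_update`, and all class and field names, are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class DSIType(Enum):
    NEW_RESOURCE = "new available resource"
    RESOURCE_OFFLINE = "resource offline"
    PATH_UPDATE = "path update"
    LOAD_BALANCE = "load balancing suggestion"


@dataclass(frozen=True)
class DSI:
    kind: DSIType
    module_id: str
    action: str                        # suggested action type
    migrate_ratio: float = 0.0         # suggested share of task load to migrate
    path_bandwidth: float = 0.0        # bandwidth of the new path
    latency_gain_us: float = 0.0       # expected latency improvement


def dsi_from_update(source: str, module_id: str, **params: float) -> DSI:
    """Hypothetical SBP mapping from a PRME update to one DSI."""
    if source == "P":                  # new chip on the P-Layer ([0088])
        return DSI(DSIType.NEW_RESOURCE, module_id, "schedule", **params)
    if source == "offline":            # unexpected removal ([0092])
        return DSI(DSIType.RESOURCE_OFFLINE, module_id, "migrate_all", migrate_ratio=1.0)
    if source == "M":                  # assumed: via activation changes a path
        return DSI(DSIType.PATH_UPDATE, module_id, "reroute", **params)
    return DSI(DSIType.LOAD_BALANCE, module_id, "rebalance", **params)  # assumed default
```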
[0083] The RCC maintains an internal DSI receiving queue with a depth of 16. At the start of each scheduling cycle, the RCC scheduler checks this queue; if it is not empty, the scheduler retrieves a DSI and updates its scheduling policy according to the suggested action. If the currently executing task carries the highest-priority flag, the scheduler may defer processing the DSI to the next scheduling cycle, so that SBP pushes never enter the critical execution path of the computing task.
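The RCC side reduces to a bounded queue polled once per scheduling cycle. This sketch assumes one DSI is consumed per cycle and that a push into a full queue is refused rather than blocking the SBP; neither detail is specified in the text, and `RCCScheduler` is a hypothetical name.

```python
from collections import deque
from typing import Any, List


class RCCScheduler:
    DEPTH = 16                          # DSI receiving queue depth from [0083]

    def __init__(self) -> None:
        self.queue: deque = deque()
        self.applied: List[Any] = []    # DSIs folded into the scheduling policy

    def push(self, dsi: Any) -> bool:
        # Called by the SBP; never blocks, refuses when the queue is full (assumed).
        if len(self.queue) >= self.DEPTH:
            return False
        self.queue.append(dsi)
        return True

    def start_cycle(self, top_priority_task_running: bool) -> None:
        # Defer all DSI handling while a highest-priority task is executing.
        if top_priority_task_running or not self.queue:
            return
        self.applied.append(self.queue.popleft())   # assumed: one DSI per cycle
```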
[0084] The SCA is responsible for two tasks: priority arbitration and maintenance of the reconstructed version chain.
[0085] In the priority arbitration mechanism, M-Layer changes have the highest arbitration priority, followed by P-Layer changes, and S-Layer changes have the lowest. M-Layer changes come first because via activation is a physically irreversible operation, so the whole system must perceive and adapt to it before anything else. When the SCA detects multiple concurrent events, it places the lower-priority events in a waiting queue with a depth of 8. Once the higher-priority event has completed its PRME incremental update and SBP DSI push, the SCA takes the next event from the waiting queue for processing. If the waiting queue is full, the SCA raises an alarm and notifies the system management software to intervene.
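Priority arbitration can be sketched with a heap keyed on layer priority, plus an arrival counter so equal-priority events keep FIFO order. Simplifications and assumptions: every event passes through the 8-deep waiting queue (rather than only the lower-priority ones), a submission to a full queue is refused after raising the alarm, and the `EMERGENCY` level anticipates the temporary priority boost described in [0092]. `Arbiter` and `PRIORITY` are hypothetical names.

```python
import heapq
from typing import Any, Callable, List, Optional, Tuple

# Lower value is served first. EMERGENCY models the temporary boost in [0092].
PRIORITY = {"EMERGENCY": -1, "M": 0, "P": 1, "S": 2}


class Arbiter:
    WAIT_DEPTH = 8

    def __init__(self, on_alarm: Callable[[str], None]) -> None:
        self._heap: List[Tuple[int, int, str, Any]] = []
        self._arrivals = 0            # tie-breaker: FIFO within one priority level
        self.on_alarm = on_alarm

    def submit(self, layer: str, event: Any) -> bool:
        if len(self._heap) >= self.WAIT_DEPTH:
            self.on_alarm(f"waiting queue full, {layer}-layer event refused")
            return False
        heapq.heappush(self._heap, (PRIORITY[layer], self._arrivals, layer, event))
        self._arrivals += 1
        return True

    def next_event(self) -> Optional[Tuple[str, Any]]:
        # Called once the previous event's PRME update and DSI push are done.
        if not self._heap:
            return None
        _, _, layer, event = heapq.heappop(self._heap)
        return layer, event
```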
[0086] In the maintenance mechanism of the Reconstructed Version Chain (RVC), after the SCA completes event processing and confirms that the PRME incremental update is complete, it generates an RVC node and writes it to the non-volatile storage area inside the RAC. The RVC node contains a node sequence number, the event timestamp, the event source layer identifier, the event type, the hash value of the GPRG state before the change, the hash value of the GPRG state after the change, the operator's public key identifier, and the operator's digital signature over all contents of the node. The chain integrity constraint of the RVC is that the pre-change state hash recorded by the (N+1)-th node must equal the post-change state hash recorded by the N-th node. The SCA checks this constraint before writing each new node; if the check fails, it locks the system state and reports a security alarm to the system management software. To roll back to a historical version, the system management software traverses the RVC to locate the target node, restores the corresponding historical GPRG state to the static random access memory of the PRME, and then triggers the SBP to push a full resource table synchronization instruction to the S-Layer to complete the rollback.
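The RVC integrity rule is a standard hash chain: each node's pre-change hash must equal its predecessor's post-change hash. In the sketch below, SHA-256 is assumed for the state hash, operator signatures are carried but not verified, and `RVCNode`, `ReconstructedVersionChain` and `find_version` are hypothetical names. `find_version` only locates the node for a rollback; where the historical GPRG contents themselves are stored is not described, so restoring them is left out.

```python
import hashlib
from dataclasses import dataclass
from typing import List


def state_hash(state: bytes) -> str:
    return hashlib.sha256(state).hexdigest()


@dataclass(frozen=True)
class RVCNode:
    seq: int
    timestamp: int
    layer: str
    event_type: str
    pre_hash: str
    post_hash: str
    operator_key_id: str
    signature: bytes = b""            # operator signature (not verified in this sketch)


class ReconstructedVersionChain:
    def __init__(self) -> None:
        self.nodes: List[RVCNode] = []
        self.locked = False

    def append(self, node: RVCNode) -> None:
        if self.locked:
            raise RuntimeError("state locked; awaiting authorized recovery")
        if self.nodes and node.pre_hash != self.nodes[-1].post_hash:
            self.locked = True        # chain integrity constraint violated
            raise RuntimeError("RVC integrity check failed: security alarm")
        self.nodes.append(node)

    def find_version(self, target_hash: str) -> RVCNode:
        # Walk back from the newest node to the one that produced target_hash.
        for node in reversed(self.nodes):
            if node.post_hash == target_hash:
                return node
        raise KeyError(target_hash)
```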
[0087] With reference to Figure 5, the complete workflow of the three-layer collaborative mechanism is explained using the example of adding a storage chip with embedded computing logic to the P-Layer.
[0088] When the chip is inserted into a Type-M slot of the SPI, the insertion detection signal becomes valid. The ELB detects this transition, sends a handshake response to the chip, reads the CD over the low-speed serial bus, verifies the manufacturer's digital signature in the CD, and writes the CD into the PRME's pending queue; these steps complete within 5 microseconds of chip insertion. The PRME then adds the corresponding functional module node and SPI physical connection edges to the GPRG and recalculates the GPRG state hash; this incremental update completes within 50 microseconds. After confirming that no higher-priority events are pending, the SCA generates the RVC node for this change and writes it to the non-volatile storage area within 5 microseconds of the PRME update. The SBP then generates a DSI of type "new available resource", containing the in-memory computing capability parameters and available bandwidth of the storage chip, and pushes it to the DSI receiving queue of the RCC within 5 microseconds of the RVC node write. In its next scheduling cycle, the RCC scheduler reads the DSI from the receiving queue and preferentially dispatches tasks suited to in-memory computing to the embedded computing logic unit of the newly added chip. The total end-to-end latency, from physical insertion of the chip to the S-Layer scheduling tasks onto the new resource, does not exceed 200 microseconds.
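Reading the stage budgets above as strictly sequential, the 200-microsecond end-to-end bound leaves the following worst-case allowance for the RCC stage, which covers waiting for the next scheduling cycle and dispatching the task. The five-term serial decomposition is an interpretation of the paragraph, not a figure it states.

```latex
\begin{aligned}
T_{\mathrm{e2e}} &= T_{\mathrm{ELB}} + T_{\mathrm{PRME}} + T_{\mathrm{RVC}} + T_{\mathrm{SBP}} + T_{\mathrm{RCC}} \le 200\,\mu\mathrm{s} \\
T_{\mathrm{ELB}} + T_{\mathrm{PRME}} + T_{\mathrm{RVC}} + T_{\mathrm{SBP}} &\le 5 + 50 + 5 + 5 = 65\,\mu\mathrm{s} \\
T_{\mathrm{RCC}}^{\text{worst case}} &\le 200 - 65 = 135\,\mu\mathrm{s}
\end{aligned}
```

Under this reading, the bound holds even when every earlier stage uses its full budget only if the RCC scheduling period plus dispatch time stays within roughly 135 microseconds.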
[0089] When the ELB fails to verify the digital signature of the CD or the TCD, the ELB discards the event, does not write it into the PRME pending queue, and reports an alarm message containing the slot number and the reason for the failure to the system management software.
[0090] When the number of backlogged events in the PRME queue exceeds a preset threshold, the PRME stops accepting new events and the ELB asserts a backpressure signal to each layer, instructing it to postpone event reporting. Normal reception resumes once the queue has drained below the threshold.
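The backpressure rule has separate stop and resume conditions, which the sketch below models as a single flag. The threshold value, the `PendingQueue` name, and the choice to accept the event that first pushes the backlog over the threshold are assumptions.

```python
from collections import deque


class PendingQueue:
    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.items = deque()
        self.backpressure = False     # signal the ELB would assert to all three layers

    def offer(self, event) -> bool:
        if self.backpressure:
            return False              # layers must postpone reporting
        self.items.append(event)
        if len(self.items) > self.threshold:
            self.backpressure = True  # backlog now exceeds the threshold
        return True

    def take(self):
        event = self.items.popleft()
        if self.backpressure and len(self.items) < self.threshold:
            self.backpressure = False  # drained below the threshold: resume
        return event
```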
[0091] When the integrity verification of the RVC chain fails, the SCA stops accepting any new reconstruction events, locks the current system state, reports a security alarm to the system management software, and waits for authorized recovery operations.
[0092] When a P-Layer chip unexpectedly goes offline, its insertion detection signal becomes invalid without warning. The ELB immediately triggers an emergency offline event, the SCA temporarily raises the processing priority of this event to the highest level, the PRME sets the corresponding node status to fault, and the SBP immediately pushes a task migration instruction notifying the S-Layer to migrate all tasks on that chip to other available resources. Until the migration is complete, no new tasks may be assigned to this node.
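The emergency path amounts to three state changes: mark the node faulty, hand its tasks to the S-Layer for migration, and refuse new placements on it. This hypothetical sketch reduces the GPRG to a status map; `OfflineHandler`, its methods, and the `available`/`fault` status strings are assumptions, and the priority boost is left to the arbiter.

```python
from typing import Dict, List


class OfflineHandler:
    def __init__(self, status: Dict[str, str], placement: Dict[str, List[str]]) -> None:
        self.status = status          # module id -> "available" or "fault"
        self.placement = placement    # module id -> ids of tasks running there

    def on_detect_lost(self, module_id: str) -> List[str]:
        self.status[module_id] = "fault"                # PRME marks the node faulty
        return list(self.placement.get(module_id, []))  # tasks the S-Layer must move

    def can_assign(self, module_id: str) -> bool:
        return self.status.get(module_id) == "available"

    def complete_migration(self, module_id: str, moves: Dict[str, str]) -> None:
        for task, target in moves.items():
            if not self.can_assign(target):
                raise ValueError(f"{target} is not an available resource")
            self.placement.setdefault(target, []).append(task)
        self.placement[module_id] = []
```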
[0093] Embodiment 3:
[0094] Based on the overall architecture described in Embodiment 1 and the reconstructed perception controller described in Embodiment 2, this embodiment describes the specific implementation of the two security encryption authentication mechanisms, chip access authentication and instruction packet authentication, in the architecture.
[0095] The secure encryption authentication mechanism is based on a hardware root of trust. During fabrication of the base wafer, the root public key is permanently written, in a one-time programmable manner, to the storage area of the basic fixed instruction set. The root public key cannot be modified and serves as the trust anchor for all authentication operations. Every external entity that needs to access the architecture, including heterogeneous chips and reconfiguration instruction packages, must hold a valid certificate that chains back to the root private key corresponding to the root public key in order to pass authentication. The mechanism uses the Elliptic Curve Digital Signature Algorithm (ECDSA) with 256-bit keys. Signature verification is performed by a dedicated hardware verification unit in the state consistency arbiter, runs in parallel with the incremental update operations of the physical resource mapping engine, and does not occupy clock cycles of the main computing path.
[0096] Chip access authentication refers to the process by which the architecture verifies the identity and legitimacy of a heterogeneous chip after it is inserted into the standardized encapsulation interface slot. The specific implementation is as follows:
[0097] Before the chip ships, the chip manufacturer writes a chip identity certificate into the read-only storage area of the chip. The certificate contains the chip serial number, the chip type identifier, the manufacturer's public key, the manufacturer's digital signature over this information, and the certificate validity period. It is signed with a manufacturer private key that has been authorized by the root private key, forming a two-level certificate chain: the root certificate corresponds to the root public key embedded in the base wafer, the manufacturer's certificate is issued under the root certificate, and the chip identity certificate is issued under the manufacturer's certificate.
[0098] When the chip is inserted into the slot of the standardized package interface, the event listening bus detects a valid insertion detection signal, sends a handshake response to the chip, and requests the chip to report its chip identity certificate via a low-speed serial bus. The chip then transmits its chip identity certificate and chip descriptor together to the event listening bus.
[0099] The event listening bus forwards the received chip identity certificate to the hardware verification unit of the state consistency arbitrator, which performs two-level certificate chain verification in the following steps: first, it verifies the signature on the manufacturer's certificate with the solidified root public key to confirm the legitimacy of the manufacturer's public key; second, it verifies the signature on the chip identity certificate with the manufacturer's public key to confirm the authenticity of the chip serial number and type identifier; finally, it checks that the current timestamp falls within the certificate validity period.
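The three checks can be sketched as below. To keep the example runnable on the Python standard library, HMAC-SHA256 stands in for the 256-bit ECDSA of [0095]. HMAC is symmetric, so this shows only the order and structure of the checks and provides none of the security of real public-key signatures. `toy_sign`, `toy_verify`, the certificate classes, the `tbs` encoding, and the reason codes are hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass


def toy_sign(key: bytes, msg: bytes) -> bytes:
    # STAND-IN ONLY: HMAC-SHA256 replaces 256-bit ECDSA so the sketch is stdlib-only.
    return hmac.new(key, msg, hashlib.sha256).digest()


def toy_verify(key: bytes, msg: bytes, sig: bytes) -> bool:
    return hmac.compare_digest(toy_sign(key, msg), sig)


@dataclass(frozen=True)
class ManufacturerCert:
    manufacturer_key: bytes
    root_sig: bytes                # issued under the root key


@dataclass(frozen=True)
class ChipCert:
    serial: str
    chip_type: str
    not_before: int
    not_after: int
    mfr_sig: bytes                 # issued under the manufacturer key

    def tbs(self) -> bytes:
        # "To-be-signed" bytes: every field except the signature itself.
        return f"{self.serial}|{self.chip_type}|{self.not_before}|{self.not_after}".encode()


def verify_chip_chain(root_key: bytes, mcert: ManufacturerCert,
                      ccert: ChipCert, now: int) -> str:
    """Return "ok" or a failure reason code, in the check order of [0099]."""
    if not toy_verify(root_key, mcert.manufacturer_key, mcert.root_sig):
        return "BAD_MANUFACTURER_CERT"
    if not toy_verify(mcert.manufacturer_key, ccert.tbs(), ccert.mfr_sig):
        return "BAD_CHIP_CERT"
    if not ccert.not_before <= now <= ccert.not_after:
        return "OUTSIDE_VALIDITY_PERIOD"
    return "ok"
```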
[0100] If all three checks pass, the hardware verification unit returns an authentication pass flag to the event listening bus, which writes the chip descriptor into the pending queue of the physical resource mapping engine and thereby triggers the subsequent incremental update. If any check fails, the hardware verification unit returns an authentication failure flag and a failure reason code to the event listening bus. The event listening bus then discards the chip descriptor without writing it into the pending queue and reports an alarm message containing the slot number, chip serial number, and failure reason code to the system management software. The slot of the failed chip enters a locked state and ignores all new insertion events until the system management software issues an unlock command, which prevents brute-force attacks in which an attacker repeatedly inserts and removes unauthorized chips.
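The slot-locking rule is a small per-slot state machine. In this hypothetical `SlotGuard` sketch, the authentication outcome arrives as a result string (for example from the chain check of [0099]), and the alarm is modelled as a list entry carrying the slot number, chip serial number and reason code.

```python
from typing import List, Tuple


class SlotGuard:
    def __init__(self, num_slots: int) -> None:
        self.locked = [False] * num_slots
        self.alarms: List[Tuple[int, str, str]] = []   # (slot, serial, reason code)

    def on_insert(self, slot: int, serial: str, auth_result: str) -> bool:
        """Return True only if the CD may be written to the PRME pending queue."""
        if self.locked[slot]:
            return False                  # a locked slot ignores insertion events
        if auth_result != "ok":
            self.locked[slot] = True      # lock until management software unlocks
            self.alarms.append((slot, serial, auth_result))
            return False                  # CD discarded
        return True

    def unlock(self, slot: int) -> None:
        # Models the unlock command issued by the system management software.
        self.locked[slot] = False
```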
[0101] Instruction package authentication is the process by which the architecture verifies the legitimacy of the source and the integrity of the content of an instruction package when an external authorized party issues an extensible reconstruction instruction set package to the software instruction layer reconstruction subsystem. The specific implementation is as follows:
[0102] When generating an instruction package, the issuer computes a hash value over the entire package content and signs it with the issuer's private key. The signature and the issuer's public key certificate are then attached to the package header to form a complete signed instruction package. The issuer's public key certificate must likewise be issued by the root private key, or by an intermediate private key authorized by the root private key, so that it belongs to the same certificate chain system as chip access authentication.
[0103] After the signed instruction package reaches the extensible reconstruction instruction set loading module of the software instruction layer reconstruction subsystem through the external interface, the loading module does not execute its content immediately. It first forwards the header information of the signed package to the hardware verification unit of the state consistency arbitrator and requests instruction package authentication.
[0104] The hardware verification unit performs instruction packet authentication according to the following steps: First, it verifies the certificate chain of the issuer's public key certificate to confirm the legality of the issuer's public key; second, it verifies the signature with the issuer's public key to confirm that the content of the instruction packet has not been tampered with during transmission; finally, it verifies whether the target chip serial number field contained in the header of the instruction packet is consistent with the serial number of this chip to prevent instruction packets issued for other chips from being illegally reused for this chip.
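The package checks follow the same pattern as chip access authentication. Again HMAC-SHA256 stands in for ECDSA so the sketch runs on the standard library, and only the order of the checks is meaningful. One design point is made explicit as an assumption: the target serial number is covered by the signed hash, consistent with [0102] signing the entire package content; otherwise an attacker could retarget a package by editing its header. `SignedPackage`, `authenticate_package` and the reason codes are hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass


def toy_verify(key: bytes, msg: bytes, sig: bytes) -> bool:
    # STAND-IN ONLY: HMAC-SHA256 in place of 256-bit ECDSA.
    expected = hmac.new(key, msg, hashlib.sha256).digest()
    return hmac.compare_digest(expected, sig)


@dataclass(frozen=True)
class SignedPackage:
    issuer_key: bytes          # key carried in the issuer certificate
    issuer_cert_sig: bytes     # root (or intermediate) signature over issuer_key
    target_serial: str         # serial number of the chip the package is issued for
    body: bytes
    sig: bytes                 # issuer signature over digest()

    def digest(self) -> bytes:
        # Assumed: the signed hash covers the target serial as well as the body.
        return hashlib.sha256(self.target_serial.encode() + b"\x00" + self.body).digest()


def authenticate_package(pkg: SignedPackage, root_key: bytes, my_serial: str) -> str:
    """Return "ok" or a failure reason code, in the check order of [0104]."""
    if not toy_verify(root_key, pkg.issuer_key, pkg.issuer_cert_sig):
        return "BAD_ISSUER_CERT"
    if not toy_verify(pkg.issuer_key, pkg.digest(), pkg.sig):
        return "TAMPERED"
    if pkg.target_serial != my_serial:
        return "WRONG_TARGET_CHIP"
    return "ok"
```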
[0105] If all of the above checks pass, the hardware verification unit returns an authentication pass flag to the extensible reconstruction instruction set loading module, which loads the instruction package into the sandbox-isolated execution domain for execution. At the same time, the reconfiguration state snapshot module saves a snapshot of the current computing state so that, if execution of the package fails, the system can roll back to the stable state that preceded loading. If any check fails, the hardware verification unit returns an authentication failure flag and a failure reason code; the loading module discards the instruction package, performs no reconfiguration operation, and reports an alarm message to the system management software.
[0106] After each instruction package authentication, whether or not it succeeds, the state consistency arbiter records the authentication event in the reconstructed version chain. The record includes the instruction package hash value, the issuer's public key identifier, the authentication result, and the timestamp, so that every instruction package loading operation is traceable and auditable.
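Paragraphs [0105] and [0106] combine three behaviours: snapshot before execution, rollback on failure, and an audit record for every authentication outcome. The sketch below models the sandboxed execution domain as a plain callable, the computing state as a dictionary, and the reconstructed version chain as a simple list; `load_package`, the audit record keys, and SHA-256 as the package hash are assumptions.

```python
import copy
import hashlib
from typing import Callable, Dict, List


def load_package(state: Dict, body: bytes, auth_result: str, issuer_key_id: str,
                 execute: Callable[[Dict, bytes], None], audit: List[Dict], now: int) -> bool:
    """Return True if the package was authenticated and executed successfully."""
    # Every authentication outcome is recorded, pass or fail ([0106]).
    audit.append({"pkg_hash": hashlib.sha256(body).hexdigest(),
                  "issuer": issuer_key_id, "result": auth_result, "ts": now})
    if auth_result != "ok":
        return False                     # discard; no reconfiguration performed
    snapshot = copy.deepcopy(state)      # reconfiguration state snapshot
    try:
        execute(state, body)             # stands in for the sandboxed execution domain
        return True
    except Exception:
        state.clear()
        state.update(snapshot)           # roll back to the pre-load stable state
        return False
```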
[0107] The chip access authentication and instruction packet authentication share the same root public key system and hardware verification unit, forming a unified security encryption authentication platform. Both authentication mechanisms record the authentication results to the reconstructed version chain, so that the trusted state of the chip at any time can be completely restored by traversing the reconstructed version chain. When the integrity verification of the reconstructed version chain fails, the state consistency arbitrator simultaneously suspends the processing of chip access authentication and instruction packet authentication, locks the system state, and waits for authorized recovery operations to ensure that the security mechanism is not bypassed in abnormal system states.
[0108] The content disclosed above is only a preferred and feasible embodiment of the present invention, and is not intended to limit the scope of protection of the present invention. Therefore, all equivalent technical changes made based on the content of the present invention specification and drawings are included within the scope of protection of the present invention. Furthermore, the elements therein can be updated as technology develops.
Claims
1. A wafer-level chip architecture supporting multi-level hybrid reconfiguration, characterized in that, The architecture uses a basic wafer as the physical carrier and includes a software instruction layer reconstruction subsystem, a metal interconnect layer reconstruction subsystem, a three-dimensional packaging layer reconstruction subsystem, and a reconstruction perception controller. The software instruction layer reconstruction subsystem is integrated inside the base wafer and is used to dynamically reconstruct the computing power precision mode and local logic mapping relationship without changing the chip physical structure. The metal interconnect layer reconfiguration subsystem utilizes the reserved reconfiguration area and reconfigurable metal interconnect process on the base wafer to change the physical interconnect topology between functional modules inside the chip. The three-dimensional packaging layer reconstruction subsystem utilizes the standardized packaging interface reserved on the top layer of the basic wafer to achieve capability expansion through three-dimensional stacking of heterogeneous chips; The reconstruction perception controller is integrated on the base wafer and is connected to the software instruction layer reconstruction subsystem, the metal interconnect layer reconstruction subsystem, and the three-dimensional packaging layer reconstruction subsystem, respectively, for unified collaborative management of the three reconstruction subsystems.
2. The wafer-level chip architecture as described in claim 1, characterized in that, The software instruction layer reconfiguration subsystem includes a dynamic instruction set, a reconfigurable computing power controller, and a sandbox isolated execution domain. The dynamic instruction set includes a basic fixed instruction set stored in a one-time programmable memory area whose contents cannot be modified, and an extensible reconfigurable instruction set that is dynamically loaded after signature verification. The reconfigurable computing power controller includes a bit-width configurable multiplier tree that supports dynamic switching between four precision modes: four-bit integer, eight-bit integer, sixteen-bit floating-point, and thirty-two-bit floating-point. It also includes a reconfiguration state snapshot module, which is used to save and restore the computing state during precision switching to achieve hot reconfiguration. The sandbox-isolated execution domain provides an independent hardware-isolated execution domain for each dynamically loaded instruction package, and the domains are isolated from each other through a hardware memory protection unit.
3. The wafer-level chip architecture as described in claim 2, characterized in that, The reserved reconstruction area is pre-arranged with via occupancy arrays and suspended metal leads during the fabrication of the base wafer, and is not connected in the initial state; The reserved reconfiguration area is divided into three levels according to its activation capability: Level 1 corresponds to standard local interconnection, Level 2 corresponds to cross-module bypass connection, and Level 3 corresponds to global rerouting. The reconfigurable metal interconnect process employs a differential activation method, performing selective exposure and metal deposition only on the interconnect nodes that need to be changed this time, and using terahertz time-domain spectroscopy to perform non-contact alignment accuracy detection on the target via area before activation.
4. The wafer-level chip architecture as described in claim 3, characterized in that, The standardized packaging interface adopts a high-density micro-bump layout and defines three types of standard slots: computing power expansion slots, storage expansion slots, and acceleration expansion slots. These are used for the stacking connection of computing power expansion chips, storage expansion chips, and dedicated acceleration chips, respectively. The electrical timing specifications and power management specifications of the three types of slots are unified. The standardized packaging interface embeds an on-chip optical interconnect interface; The three-dimensional encapsulation layer reconstruction subsystem also includes a storage logic fusion chip. The stacked layer of the storage logic fusion chip integrates a lightweight computing logic unit, which supports vector dot product and normalization operations within the storage array. The three-dimensional encapsulation layer reconstruction subsystem adjusts the ratio of the number of computing power chips to storage chips in real time through a dynamic storage-computing ratio adjustment mechanism.
5. The wafer-level chip architecture as described in claim 4, characterized in that, The reconstruction perception controller consists of four functional units: an event listening bus, a physical resource mapping engine, a scheduling baseline pusher, and a state consistency arbiter, which are interconnected through the internal bus of the reconstruction perception controller. The event listening bus is responsible for collecting state change events of the three-layer reconfiguration subsystem; The physical resource mapping engine maintains a global physical resource graph and uses an incremental update strategy to handle events. The scheduling baseline pusher converts the update results of the physical resource mapping engine into differential scheduling instructions and pushes them to the software instruction layer reconstruction subsystem in an asynchronous and non-blocking manner. The state consistency arbitrator prioritizes concurrent change requests across multiple layers, with changes to the metal interconnect layer having the highest priority, followed by the three-dimensional encapsulation layer, and then the software instruction layer. It also maintains a reconstructed version chain, which records the state before and after each reconstruction operation in a hash chain manner, supporting rollback to any historical version.
6. The wafer-level chip architecture as described in claim 5, characterized in that, The event listening bus reads the chip descriptor through the handshake pins built into the standardized package interface slot, receives the topology change description table through the unidirectional write interface reserved at the edge of the basic wafer, and receives the reconstruction status notification reported by the software instruction layer reconstruction subsystem through the on-chip bus. The chip descriptor, topology change description table and reconstruction status notification must all pass digital signature verification by the state consistency arbitrator before being written into the physical resource mapping engine's pending queue. The unidirectional write interface is implemented using a one-time programmable memory cell array, and its unidirectional write characteristic physically prevents the chip's internal topology state from being read externally through this interface.
7. The wafer-level chip architecture as described in claim 6, characterized in that, Each node in the reconstructed version chain includes a node sequence number, an event timestamp, an event source layer identifier, an event type, a hash value of the global physical resource graph state before the change, a hash value of the global physical resource graph state after the change, an operator's public key identifier, and an operator's digital signature. The chain integrity constraint of the reconstructed version chain is that the hash value of the pre-change state recorded by the (N+1)-th node must be equal to the hash value of the post-change state recorded by the N-th node. The state consistency arbiter verifies the constraint before writing each new node. If the verification fails, it locks the system state and reports a security alarm to the system management software.
8. The wafer-level chip architecture as described in any one of claims 1 to 7, characterized in that, The software instruction layer reconstruction subsystem responds to algorithm evolution requirements on a daily basis, the three-dimensional packaging layer reconstruction subsystem responds to scale expansion requirements on a weekly basis, and the metal interconnect layer reconstruction subsystem responds to topology optimization requirements on a monthly basis. The three response cycles together cover the capability evolution requirements throughout the entire life cycle of the chip product. The reconstruction perception controller occupies no more than 3% of the total area of the base wafer, and the reserved reconfiguration area occupies 8% to 15% of the total area of the base wafer.
9. A multi-level hybrid reconfiguration method based on the architecture described in any one of claims 1 to 8, characterized in that, The method includes the following steps: The software instruction layer reconstruction subsystem dynamically loads signed and verified instruction packages through its extensible reconstruction instruction set to reconstruct the computing power precision mode and local logical mapping relationship. During the reconstruction process, the reconstruction state snapshot module saves the current computing state in a snapshot and restores it after the reconstruction is completed, thus achieving hot reconstruction without interrupting the task being executed. The metal interconnect layer reconstruction subsystem performs a differential activation process on the target via region in the reserved reconstruction area to change the physical interconnect topology between functional modules inside the chip. Before activation, terahertz time-domain spectroscopy is used for non-contact alignment detection. Heterogeneous chips are stacked through the standardized encapsulation interface slots of the three-dimensional encapsulation layer reconstruction subsystem. After the event listening bus of the reconstruction perception controller detects chip insertion, it automatically reads the chip descriptor and triggers the incremental update of the physical resource mapping engine. The scheduling baseline pusher pushes the update result to the software instruction layer reconstruction subsystem in the form of differential scheduling instructions to complete the scheduling strategy adaptation. The state consistency arbitrator performs priority arbitration on the above three-layer reconstruction operations and records each reconstruction operation to the reconstruction version chain.
10. The method as described in claim 9, characterized in that, When a chip of the 3D encapsulation layer reconstruction subsystem goes offline unexpectedly, the event listening bus immediately triggers an emergency offline event. The state consistency arbitrator temporarily raises the processing priority of this event to the highest level. The physical resource mapping engine sets the corresponding node status to fault. The scheduling baseline pusher immediately pushes a task migration instruction to notify the software instruction layer reconstruction subsystem to migrate all tasks on the chip to other available resources. Before the migration is completed, no new tasks are allowed to be assigned to this node. When the integrity verification of the reconstructed version chain fails, the state consistency arbiter stops accepting any new reconstruction events, locks the current system state, reports a security alarm to the system management software, and waits for authorized recovery operations.