A reliability optimization method and apparatus

By constructing an electrical capture probability formula and encoding topological features using graph neural networks, and combining Monte Carlo architecture vulnerability factors and hardening cost weights, the problem of unreasonable hardening priority allocation in integrated circuits is solved, and more reasonable reliability optimization is achieved.

CN122242409A · Pending · Publication Date: 2026-06-19 · HANGZHOU JINZHI ZHAOXIN TECHNOLOGY INNOVATION DEVELOPMENT CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
HANGZHOU JINZHI ZHAOXIN TECHNOLOGY INNOVATION DEVELOPMENT CO LTD
Filing Date
2026-04-07
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing technologies have failed to effectively achieve multi-dimensional collaborative modeling of electrical capture probability, architectural vulnerability factor and circuit topology characteristics, resulting in unreasonable hardening priority allocation and mismatch between hardening schemes and actual circuit characteristics in integrated circuit reliability optimization.

Method used

An electrical capture probability formula is constructed, and the architectural vulnerability factor is estimated by a timing-aware Monte Carlo method based on gate-level simulation and static timing analysis. The topological features of the circuit are encoded using a graph neural network to obtain hardening cost weights, which are combined with the vulnerability results to determine the hardening priority weights and generate the target hardening scheme.

Benefits of technology

This improves the rationality of integrated circuit reliability optimization, comprehensively considering the vulnerability of the circuit physical layer and architecture layer, as well as the actual hardening cost, and achieves more reasonable hardening decisions.

✦ Generated by Eureka AI based on patent content.


Abstract

This application provides a reliability optimization method and apparatus. First, based on the vulnerability window and the circuit's operating clock cycle, an electrical capture probability formula is constructed to calculate the electrical capture probability. Then, based on gate-level simulation and static timing analysis, timing-aware Monte Carlo architecture vulnerability factor estimation is performed to obtain the trigger-level architecture vulnerability factor. The circuit netlist is converted into a directed graph, and circuit topology feature encoding is completed based on a graph neural network to obtain hardening cost weights. Finally, based on the trigger-level architecture vulnerability factor, electrical capture probability, and hardening cost weights, hardening priority weights are determined. Based on the hardening priority weights and hardening cost weights, a target hardening scheme is determined for reliability optimization. The entire process no longer considers a single indicator for hardening decisions but comprehensively judges the vulnerability of the circuit's physical and architectural layers, actual hardening costs, and circuit topology characteristics, improving the rationality of reliability optimization.

Description

Technical Field

[0001] This application relates to the field of integrated circuit technology, and in particular to a reliability optimization method and apparatus.

Background Technology

[0002] As semiconductor process feature sizes continue to shrink, the sensitivity of integrated circuits to soft errors caused by high-energy particle radiation has increased dramatically. Selective hardening based on circuit vulnerability assessment has become the core solution for balancing reliability with area and power consumption overhead. However, existing technologies have not achieved multi-dimensional collaborative modeling and integrated decision-making of electrical capture probability, architectural vulnerability factor, and circuit topology characteristics. They only consider a few indicators for reliability optimization, resulting in unreasonable hardening priority and a mismatch between hardening schemes and actual circuit characteristics.

[0003] In conclusion, how to improve the rationality of reliability optimization is a problem that urgently needs to be solved by those skilled in the art.

Summary of the Invention

[0004] In view of this, this application provides a reliability optimization method and apparatus, which aims to improve the rationality of reliability optimization.

[0005] In a first aspect, this application provides a reliability optimization method, including: constructing an electrical capture probability formula based on the vulnerability window and the circuit operating clock cycle, so as to calculate the electrical capture probability; performing timing-aware Monte Carlo architecture vulnerability factor estimation based on gate-level simulation and static timing analysis to obtain the trigger-level architecture vulnerability factor; converting the circuit netlist into a directed graph and encoding the circuit topology features based on a graph neural network to obtain the hardening cost weights; determining the hardening priority weight based on the trigger-level architecture vulnerability factor, the electrical capture probability, and the hardening cost weight; and determining a target hardening scheme based on the hardening priority weight and the hardening cost weight, so as to optimize reliability.

[0006] Optionally, before constructing the electrical capture probability formula based on the vulnerability window and the circuit operating clock cycle to calculate the electrical capture probability, the method further includes: determining the calculation formula for the architectural vulnerability factor based on queuing theory; determining the vulnerability window based on setup time, hold time, and clock skew; and decomposing the total failure-in-time (FIT) rate of the chip into the product of the raw FIT rate, the electrical capture probability, and the architectural vulnerability factor, so as to construct a decoupled computation model.

[0007] Optionally, the step of constructing an electrical capture probability formula based on the vulnerability window and the circuit operating clock cycle to calculate the electrical capture probability includes: Based on the vulnerability window and the circuit's operating clock cycle, the electrical capture probability formula is constructed. A refined electrical capture probability formula is constructed by introducing four degradation factors into the electrical capture probability formula. The four degradation factors include pulse width degradation factor, electrical shielding factor, metastable factor, and phase resolution time shielding factor. The trigger-level electrical capture probability is calculated using the refined electrical capture probability formula. The determination of hardening priority weights based on the trigger-level architecture vulnerability factor, the electrical capture probability, and the hardening cost weight includes: The hardening priority weight is determined based on the trigger-level architecture vulnerability factor, the trigger-level electrical capture probability, and the hardening cost weight.

[0008] Optionally, the step of performing timing-aware Monte Carlo architecture vulnerability factor estimation based on gate-level simulation and static timing analysis to obtain trigger-level architecture vulnerability factors includes: extracting the dwell-period duration of the trigger from the Fast Signal Database (FSDB) waveform; performing importance sampling based on the dwell-period duration and the timing margin, and calculating the probability of extended don't-care (XDC) terms through formal verification; and using the probability of extended don't-care terms as a control variable to correct the architectural vulnerability factor, thereby obtaining the trigger-level architectural vulnerability factor.

[0009] Optionally, the step of converting the circuit netlist into a directed graph and encoding the circuit topology features based on a graph neural network to obtain the hardening cost weights includes: A graph structure is constructed with triggers as nodes and combinational logic connections between triggers as edges. Fan-out number, timing margin, vulnerability factor of the trigger-level architecture, and electrical capture probability are extracted as initial features of the nodes. The initial features of the nodes are aggregated by a graph sampling aggregation encoder, and the area weight, power consumption weight and fan-out sensitivity weight are predicted by a multilayer perceptron.

[0010] Optionally, determining the target hardening scheme based on the hardening priority weight and the hardening cost weight for reliability optimization includes: based on the hardening priority weight and the hardening cost weight, generating the hardening scheme with the optimization objective of minimizing hardening area overhead and hardening power consumption overhead, subject to the constraints that the total chip FIT does not exceed a preset target, that the triggers of key modules are hardened, and that the critical path meets the lower limit of hardening effectiveness, so as to optimize reliability.

[0011] Optionally, after calculating the trigger-level electrical capture probability using the refined electrical capture probability formula, the method further includes: By aggregating the trigger-level architecture vulnerability factor and the trigger-level electrical capture probability at the fetch, decode, execute, store, and write-back stages, a pipeline-level architecture vulnerability index is obtained.

[0012] In a second aspect, this application provides a reliability optimization device, comprising: a first building module, used to construct an electrical capture probability formula based on the vulnerability window and the circuit operating clock cycle, so as to calculate the electrical capture probability; a first processing module, used to perform timing-aware Monte Carlo architecture vulnerability factor estimation based on gate-level simulation and static timing analysis to obtain trigger-level architecture vulnerability factors; a second processing module, used to convert the circuit netlist into a directed graph and complete the circuit topology feature encoding based on the graph neural network to obtain the hardening cost weights; a first determining module, used to determine the hardening priority weight based on the trigger-level architecture vulnerability factor, the electrical capture probability, and the hardening cost weight; and a second determining module, used to determine the target hardening scheme based on the hardening priority weight and the hardening cost weight, so as to optimize reliability.

[0013] Optionally, the device further includes: a third determining module, used to determine the calculation formula for the architectural vulnerability factor based on queuing theory; a fourth determining module, used to determine the vulnerability window based on the setup time, hold time, and clock skew; and a second building module, used to decompose the total chip FIT into the product of the raw FIT, the electrical capture probability, and the architectural vulnerability factor, so as to build a decoupled computation model.

[0014] Optionally, the first building module is specifically used for: constructing the electrical capture probability formula based on the vulnerability window and the circuit's operating clock cycle; constructing a refined electrical capture probability formula by introducing four degradation factors into the electrical capture probability formula, the four degradation factors being the pulse width degradation factor, the electrical shielding factor, the metastability factor, and the phase resolution time shielding factor; and calculating the trigger-level electrical capture probability using the refined electrical capture probability formula. The first determining module is specifically used for: determining the hardening priority weight based on the trigger-level architecture vulnerability factor, the trigger-level electrical capture probability, and the hardening cost weight.

[0015] Optionally, the first processing module is specifically used for: extracting the dwell-period duration of the trigger from the Fast Signal Database (FSDB) waveform; performing importance sampling based on the dwell-period duration and the timing margin, and calculating the probability of extended don't-care (XDC) terms through formal verification; and using the probability of extended don't-care terms as a control variable to correct the architectural vulnerability factor, thereby obtaining the trigger-level architectural vulnerability factor.

[0016] Optionally, the second processing module is specifically used for: A graph structure is constructed with triggers as nodes and combinational logic connections between triggers as edges. Fan-out number, timing margin, vulnerability factor of the trigger-level architecture, and electrical capture probability are extracted as initial features of the nodes. The initial features of the nodes are aggregated by a graph sampling aggregation encoder, and the area weight, power consumption weight and fan-out sensitivity weight are predicted by a multilayer perceptron.

[0017] Optionally, the second determining module is specifically used for: based on the hardening priority weight and the hardening cost weight, generating the hardening scheme with the optimization objective of minimizing hardening area overhead and hardening power consumption overhead, subject to the constraints that the total chip FIT does not exceed a preset target, that the triggers of key modules are hardened, and that the critical path meets the lower limit of hardening effectiveness, so as to optimize reliability.

[0018] Optionally, the device further includes: The probability capture module is used to aggregate the trigger-level architecture vulnerability factor and the trigger-level electrical capture probability according to the instruction fetch stage, decoding stage, execution stage, storage stage, and write-back stage to obtain the pipeline-level architecture vulnerability index.

[0019] In a third aspect, embodiments of this application provide a computer device, including: a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the reliability optimization method as described in any of the embodiments of the first aspect of this application.

[0020] In a fourth aspect, embodiments of this application provide a computer-readable storage medium storing instructions that, when executed on a terminal device, cause the terminal device to perform a reliability optimization method as described in any of the embodiments of the first aspect of this application.

[0021] This application provides a reliability optimization method. When executing the method, firstly, an electrical capture probability formula is constructed based on the vulnerability window and the circuit's operating clock cycle to calculate the electrical capture probability. Next, based on gate-level simulation and static timing analysis, timing-aware Monte Carlo architecture vulnerability factor estimation is performed to obtain the trigger-level architecture vulnerability factor. Then, the circuit netlist is converted into a directed graph, and circuit topology feature encoding is completed based on a graph neural network to obtain hardening cost weights. Based on the trigger-level architecture vulnerability factor, electrical capture probability, and hardening cost weights, hardening priority weights are determined. Finally, based on the hardening priority weights and hardening cost weights, a target hardening scheme is determined for reliability optimization. In this way, the entire process no longer considers a single indicator for hardening decisions, but rather combines the vulnerabilities of the circuit's physical layer and architecture layer with actual hardening costs and circuit topology characteristics for a comprehensive judgment, improving the rationality of reliability optimization.

Attached Figure Description

[0022] To more clearly illustrate the technical solutions in this embodiment or the prior art, the drawings used in the description of the embodiment or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0023] Figure 1 is a flowchart of a reliability optimization method provided in an embodiment of this application;
Figure 2 is a schematic diagram of the first-stage algorithm flow provided in an embodiment of this application;
Figure 3 is a schematic diagram of the second-stage algorithm flow provided in an embodiment of this application;
Figure 4 is a schematic diagram of the structure of a reliability optimization device provided in an embodiment of this application;
Figure 5 is a schematic diagram of the structure of a computer device provided in an embodiment of this application.

Detailed Implementation

[0024] The technical solutions in the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. This application provides a reliability optimization method and apparatus, which relates to the field of integrated circuit technology. The above are merely examples and do not limit the application field of the method and apparatus provided in this application.

[0025] As semiconductor process feature sizes continue to shrink, the sensitivity of integrated circuits to soft errors caused by high-energy particle radiation has increased dramatically. Selective hardening based on circuit vulnerability assessment has become the core solution for balancing reliability with area and power consumption overhead. However, existing technologies have not achieved multi-dimensional collaborative modeling and integrated decision-making of electrical capture probability, architectural vulnerability factor, and circuit topology characteristics. They only consider a few indicators for reliability optimization, resulting in unreasonable hardening priority and a mismatch between hardening schemes and actual circuit characteristics.

[0026] The inventors, through research, proposed the technical solution of this application. First, based on the vulnerability window and the circuit's operating clock cycle, an electrical capture probability formula is constructed to calculate the electrical capture probability. Next, based on gate-level simulation and static timing analysis, timing-aware Monte Carlo architecture vulnerability factor estimation is performed to obtain the trigger-level architecture vulnerability factor. Then, the circuit netlist is converted into a directed graph, and circuit topology feature encoding is completed based on a graph neural network to obtain hardening cost weights. Based on the trigger-level architecture vulnerability factor, electrical capture probability, and hardening cost weights, hardening priority weights are determined. Finally, based on the hardening priority weights and hardening cost weights, the target hardening scheme is determined for reliability optimization. In this way, the entire process no longer considers a single indicator for hardening decisions, but rather combines the vulnerabilities of the circuit's physical and architectural layers with actual hardening costs and circuit topology characteristics for a comprehensive judgment, improving the rationality of reliability optimization.

[0027] To enable those skilled in the art to better understand the present application, the present application will be further described in detail below with reference to the accompanying drawings and specific embodiments. Obviously, the described embodiments are only a part of the embodiments of the present application, and not all of them. Based on the embodiments of the present application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present application. It should be noted that, for ease of description, only the parts related to the invention are shown in the accompanying drawings. Unless otherwise specified, the embodiments and features in the embodiments of the present application can be combined with each other.

[0028] Referring to Figure 1, which is a flowchart of a reliability optimization method provided in an embodiment of this application, the method includes:

S101: Based on the vulnerability window and the circuit operating clock cycle, construct the electrical capture probability formula to calculate the electrical capture probability.

[0029] First, Little's law from queuing theory is introduced into architectural vulnerability analysis, providing rigorous mathematical support for AVF estimation. Specifically, an ACE bit in a flip-flop (trigger) of an integrated circuit is treated as a customer in a queuing system and the flip-flop as a service station; a write of an ACE bit represents a customer arrival, and an overwrite or read of the ACE bit represents a customer departure. This yields a queuing model with a single queue and a single service station. Little's law (average number of customers in the system = customer arrival rate × average customer dwell time) can be mapped to AVF estimation, resulting in the following formula:

$$\mathrm{AVF}_i = \lambda_i \cdot W_i = \frac{N_i}{T} \cdot \frac{1}{N_i} \sum_{k=1}^{N_i} \tau_{i,k} = \frac{1}{T} \sum_{k=1}^{N_i} \tau_{i,k}$$

It should be noted that AVF (Architectural Vulnerability Factor) represents the probability that a storage unit holds an ACE bit, reflecting the degree of fault exposure at the architectural level (the probability that a fault will affect program output). ACE (Architecturally Correct Execution) classifies hardware state bits by their impact on program output: faults in ACE bits affect the output, while faults in non-ACE bits do not. In the formula, $\mathrm{AVF}_i$ is the architectural vulnerability factor of trigger i; $\lambda_i$ is the average arrival rate of ACE bit writes to trigger i per unit time (cycles⁻¹); $W_i$ is the average dwell time (cycles) of an ACE bit within trigger i; $T$ is the total simulation period; the sum runs over all sampling windows; $\tau_{i,k}$ is the single-segment dwell time of the k-th ACE bit in the i-th trigger; and $N_i$ is the total number of ACE bits of the i-th trigger within the observation period.

This formula strictly defines AVF as the pure architecture-level fault exposure probability, which depends only on the arrival and dwell behavior of ACE bits. It is thereby completely separated from physical factors such as electrical, timing, and process effects, achieving theoretical decoupling between architectural exposure and electrical sensitivity.
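As a minimal sketch of the Little's-law mapping described above, the following Python function computes the AVF of one flip-flop from its ACE dwell windows. The function name, the `(start, end)` window representation in clock cycles, and the example numbers are illustrative assumptions, not part of the application.

```python
from typing import List, Tuple


def avf_from_dwell_windows(windows: List[Tuple[int, int]], total_cycles: int) -> float:
    """AVF of one flip-flop via Little's law: arrival rate x mean dwell time."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    if not windows:
        return 0.0
    dwell_times = [end - start for start, end in windows]
    arrival_rate = len(windows) / total_cycles        # ACE writes per cycle
    mean_dwell = sum(dwell_times) / len(dwell_times)  # average residency in cycles
    # The product equals sum(dwell_times) / total_cycles.
    return arrival_rate * mean_dwell


# Hypothetical data: three ACE residencies (10 + 5 + 25 cycles) in a 100-cycle run.
example_avf = avf_from_dwell_windows([(0, 10), (20, 25), (50, 75)], total_cycles=100)
```

With these assumed inputs the flip-flop holds an ACE bit for 40 of 100 cycles, so the arrival-rate and dwell-time product reduces to the occupied fraction of the simulation.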

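[0030] Next, the vulnerability window is defined as:

$$T_{\mathrm{vw}} = t_{\mathrm{setup}} + t_{\mathrm{hold}} + t_{\mathrm{skew}}$$

where $T_{\mathrm{vw}}$ denotes the vulnerability window, $t_{\mathrm{setup}}$ is the setup time of the trigger, $t_{\mathrm{hold}}$ is the hold time of the trigger, and $t_{\mathrm{skew}}$ is the clock skew.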

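[0031] Based on the vulnerability window and the circuit's operating clock cycle, a basic formula for calculating the electrical capture probability is constructed:

$$P_{\mathrm{cap}} = \frac{T_{\mathrm{vw}}}{T_{\mathrm{clk}}}$$

where $P_{\mathrm{cap}}$ denotes the electrical capture probability, $T_{\mathrm{vw}}$ denotes the vulnerability window, and $T_{\mathrm{clk}}$ denotes the operating clock cycle of the circuit.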

[0032] In some implementations, four degradation factors can be introduced into the basic formula for calculating the electrical capture probability to construct a refined electrical capture probability formula:

$$P_{\mathrm{cap},i} = \frac{T_{\mathrm{vw}}}{T_{\mathrm{clk}}} \cdot \eta_{\mathrm{pw}} \cdot \eta_{\mathrm{es}} \cdot \eta_{\mathrm{ms}} \cdot \eta_{\mathrm{ph}}$$

where $\eta_{\mathrm{pw}}$ is the pulse width degradation factor, which reflects the probability that the SET pulse width exceeds the logic gate decision threshold; the SET pulse attenuation characteristics under different processes, voltages, and temperatures are calibrated through SPICE circuit simulation, and the value range is [0,1]. $\eta_{\mathrm{es}}$ is the electrical shielding factor, which reflects the attenuation and shielding effect of the combinational logic path on the SET pulse; it is determined by the path logic depth, drive capability, and load capacitance, and is calculated through timing analysis and fault propagation simulation. $\eta_{\mathrm{ms}}$ is the metastability factor, which reflects the change in fault capture probability caused by metastability on timing-critical paths; it is used to model violating paths with slack < 0 and is calibrated by SPICE Monte Carlo simulation. $\eta_{\mathrm{ph}}$ is the phase resolution time masking factor, which combines the SET pulse phase with the clock alignment relationship to correct the vulnerability window.
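The basic and refined capture probabilities can be sketched numerically as follows. This is an illustration under stated assumptions: the window is taken as the sum of setup time, hold time, and clock skew, the four degradation factors are applied multiplicatively, and all function names and numeric values are hypothetical rather than taken from the application.

```python
from typing import Iterable


def vulnerability_window(t_setup: float, t_hold: float, t_skew: float) -> float:
    """Assumed form: window = setup time + hold time + clock skew (same time unit)."""
    return t_setup + t_hold + t_skew


def capture_probability(t_window: float, t_clk: float,
                        factors: Iterable[float] = ()) -> float:
    """Basic probability (window / clock period), scaled by optional factors in [0, 1]."""
    if t_clk <= 0:
        raise ValueError("clock period must be positive")
    probability = t_window / t_clk
    for factor in factors:
        if not 0.0 <= factor <= 1.0:
            raise ValueError("degradation factors must lie in [0, 1]")
        probability *= factor
    return min(probability, 1.0)  # a probability cannot exceed 1


# Hypothetical timing values in nanoseconds.
window = vulnerability_window(t_setup=0.05, t_hold=0.03, t_skew=0.02)
p_basic = capture_probability(window, t_clk=1.0)
# Assumed factors: pulse width, electrical shielding, metastability, phase masking.
p_refined = capture_probability(window, t_clk=1.0, factors=(0.8, 0.5, 1.0, 0.9))
```

Because every factor is bounded by 1, the refined value can only stay equal to or fall below the basic value, which matches the role of the factors as degradation terms.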

[0033] In this embodiment, the total FIT is strictly decomposed into a product of three dimensions (raw hardware faults, electrical capture probability, and architectural exposure probability), achieving end-to-end traceability and optimizability:

$$\mathrm{FIT}_{\mathrm{total}} = \sum_{i} \mathrm{FIT}_{\mathrm{raw},i} \cdot P_{\mathrm{cap},i} \cdot \mathrm{AVF}_i$$

Here FIT (Failure-In-Time) is the failure rate expressed as the number of failures per billion hours, and is a core indicator for measuring hardware reliability. $\mathrm{FIT}_{\mathrm{total}}$ is the overall system soft error FIT value; $\mathrm{FIT}_{\mathrm{raw},i}$ is the raw hardware fault FIT value of trigger i; $P_{\mathrm{cap},i}$ is the electrical fault capture probability of trigger i; and $\mathrm{AVF}_i$ is the architectural vulnerability factor of trigger i.
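To make the decomposition concrete, the sketch below sums per-flip-flop contributions (raw FIT × capture probability × AVF) and ranks flip-flops by contribution. The `FlipFlop` class, the field names, and the numbers are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FlipFlop:
    name: str
    fit_raw: float  # raw soft-error FIT of the cell
    p_cap: float    # electrical capture probability
    avf: float      # architectural vulnerability factor

    @property
    def fit_contribution(self) -> float:
        return self.fit_raw * self.p_cap * self.avf


def total_fit(flops: List[FlipFlop]) -> float:
    """System FIT as the sum of decoupled per-flip-flop contributions."""
    return sum(ff.fit_contribution for ff in flops)


def rank_by_contribution(flops: List[FlipFlop]) -> List[str]:
    """Names ordered from largest to smallest FIT contribution."""
    return [ff.name for ff in sorted(flops, key=lambda ff: ff.fit_contribution, reverse=True)]


# Hypothetical values for three flip-flops.
flops = [
    FlipFlop("ff_a", fit_raw=1e-3, p_cap=0.10, avf=0.5),
    FlipFlop("ff_b", fit_raw=1e-3, p_cap=0.04, avf=0.9),
    FlipFlop("ff_c", fit_raw=2e-3, p_cap=0.02, avf=0.1),
]
system_fit = total_fit(flops)
ranking = rank_by_contribution(flops)
```

Keeping the three terms as separate fields means a change in one dimension, for example a recalibrated capture probability, can be traced to its effect on the total without recomputing the others.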

[0034] This allows for a strict separation of the three dimensions of technology, electrical systems, and architecture.

[0035] S102: Based on gate-level simulation and static timing analysis, perform timing-aware Monte Carlo architecture vulnerability factor estimation to obtain trigger-level architecture vulnerability factors.

[0036] First, the Synopsys VCS compiler is used to perform gate-level functional simulation of the chip circuit, outputting FSDB-format waveform files that fully record the timing behavior of level toggling, writing, overwriting, and reading for all flip-flops. FSDB (Fast Signal Database) is the format used to store gate-level simulation waveform data, from which the ACE dwell times are extracted.

[0037] Next, the timing parameters of each trigger are extracted using the static timing analysis tool PrimeTime STA: setup time $t_{\mathrm{setup}}$, hold time $t_{\mathrm{hold}}$, clock skew $t_{\mathrm{skew}}$, and timing margin $\mathrm{Slack}$. These provide the inputs for calculating the electrical capture probability.

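[0038] The trigger states are analyzed cycle by cycle from the FSDB waveform to extract the non-overlapping ACE dwell windows:

$$\mathcal{W}_{i,k} = \left[\, t^{\mathrm{w}}_{i,k},\; t^{\mathrm{e}}_{i,k} \,\right]$$

where $\mathcal{W}_{i,k}$ is the k-th non-overlapping ACE dwell window of trigger i; $t^{\mathrm{w}}_{i,k}$ is the time when the k-th ACE bit is written to the trigger; and $t^{\mathrm{e}}_{i,k}$ is the time when the k-th ACE bit is overwritten or read.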

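[0039] Next, the single-segment dwell time of the k-th ACE bit in the i-th trigger is calculated as the difference between the time it is overwritten or read, $t^{\mathrm{e}}_{i,k}$, and the time it is written, $t^{\mathrm{w}}_{i,k}$:

$$\tau_{i,k} = t^{\mathrm{e}}_{i,k} - t^{\mathrm{w}}_{i,k}$$

That is, by traversing all simulation cycles and collecting the dwell windows of all ACE bits of trigger i, the dwell time set $\{\tau_{i,k}\}$ is obtained.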
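The window extraction and dwell-time calculation above can be sketched as a single pass over a flip-flop's event list. The event encoding (`"write"` and `"read"` tuples), the assumption that every write deposits an ACE bit, and the choice to close a still-open window at the end of simulation are all illustrative assumptions; a real flow would derive the events from the FSDB waveform and an ACE classification.

```python
from typing import List, Tuple


def extract_dwell_windows(events: List[Tuple[int, str]], end_cycle: int) -> List[Tuple[int, int]]:
    """Return non-overlapping (write_cycle, end_cycle) windows for one flip-flop.

    A "write" opens a window (closing any open one, i.e. an overwrite);
    a "read" closes the open window.
    """
    windows = []
    open_start = None
    for cycle, kind in sorted(events):
        if kind not in ("write", "read"):
            raise ValueError(f"unknown event kind: {kind}")
        if open_start is not None and cycle > open_start:
            windows.append((open_start, cycle))
        open_start = cycle if kind == "write" else None
    # Assumption: a value still resident at the end of simulation counts until end_cycle.
    if open_start is not None and end_cycle > open_start:
        windows.append((open_start, end_cycle))
    return windows


def dwell_times(windows: List[Tuple[int, int]]) -> List[int]:
    """Single-segment dwell time of each window: end minus start."""
    return [end - start for start, end in windows]


# Hypothetical event trace for one flip-flop over 50 cycles.
trace = [(5, "write"), (12, "read"), (20, "write"), (30, "write"), (42, "read")]
example_windows = extract_dwell_windows(trace, end_cycle=50)
example_dwell = dwell_times(example_windows)
```

Each window ends exactly where the next may begin, so the windows never overlap and their dwell times can be summed directly.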

[0040] To reduce the variance of the Monte Carlo estimate and improve computational efficiency, this application employs a time-weighted importance sampling strategy. First, a sampling distribution is designed based on the dwell-period length $\tau_{i,k}$ and the timing margin $\mathrm{Slack}$, so that high-impact cycles are sampled preferentially.
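The idea of a time-weighted sampling distribution can be illustrated with a self-normalized importance sampling estimator. The specific proposal used here (dwell time divided by slack plus a small constant) is an assumption chosen for illustration, since the application's own distribution is not reproduced in this text; the function names and data are likewise hypothetical.

```python
import random
from typing import List


def proposal_distribution(dwell_times: List[float], slacks: List[float],
                          eps: float = 0.05) -> List[float]:
    """Assumed proposal: favor long dwell windows and small (or negative) timing slack."""
    if any(d <= 0 for d in dwell_times):
        raise ValueError("dwell times must be positive")
    raw = [d / (max(s, 0.0) + eps) for d, s in zip(dwell_times, slacks)]
    total = sum(raw)
    return [r / total for r in raw]


def importance_sampled_mean(values: List[float], proposal: List[float],
                            n_samples: int, rng: random.Random) -> float:
    """Self-normalized importance sampling estimate of the uniform mean of `values`."""
    n = len(values)
    target = 1.0 / n  # uniform target over all windows
    drawn = rng.choices(range(n), weights=proposal, k=n_samples)
    weights = [target / proposal[k] for k in drawn]
    return sum(w * values[k] for w, k in zip(weights, drawn)) / sum(weights)


# Hypothetical windows: dwell times in cycles, slack in ns, and a per-window
# indicator of whether an injected fault reached the output.
dwell = [7, 10, 12, 3]
slack = [0.40, 0.05, -0.02, 0.30]
q = proposal_distribution(dwell, slack)
propagates = [0.0, 1.0, 1.0, 0.0]
estimate = importance_sampled_mean(propagates, q, n_samples=2000, rng=random.Random(42))
```

The importance weights (target over proposal) compensate for the biased sampling, so windows that are drawn more often contribute proportionally less per draw.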

[0041] Next, the JasperGold formal verification tool is used to calculate the probability of the extended don't-care (XDC) term, that is, the probability that the ACE bit is logically masked and does not affect the output in the k-th dwell window.

[0042] Finally, the XDC probability is used as a control variable to correct the AVF and reduce the false positive rate, yielding the trigger-level AVF. Based on the typical pipeline stages of a processor (fetch, decode, execute, store, write back), the trigger-level AVF and the trigger-level electrical capture probability are then aggregated by weighted averaging to obtain stage-level vulnerability indices for both quantities, which provides a basis for targeted hardening.
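The stage-level aggregation can be sketched as a weighted mean per pipeline stage. The tuple layout, the stage labels, and the per-flip-flop weights are assumptions made for illustration; the application does not specify the weights in this text.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

STAGES = ("fetch", "decode", "execute", "store", "write_back")


def aggregate_by_stage(
    flops: Iterable[Tuple[str, float, float, float]]
) -> Dict[str, Tuple[float, float]]:
    """Weighted mean of (AVF, capture probability) per pipeline stage.

    Each item is (stage, avf, p_cap, weight); the weight is a hypothetical
    per-flip-flop importance, for example its bit count.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    for stage, avf, p_cap, weight in flops:
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        acc = sums[stage]
        acc[0] += weight * avf
        acc[1] += weight * p_cap
        acc[2] += weight
    return {s: (a / w, p / w) for s, (a, p, w) in sums.items() if w > 0}


# Hypothetical flip-flop data.
stage_index = aggregate_by_stage([
    ("execute", 0.6, 0.10, 1.0),
    ("execute", 0.2, 0.30, 3.0),
    ("fetch", 0.5, 0.20, 1.0),
])
```

Stages with no flip-flops are simply absent from the result, which avoids reporting a misleading zero for an unobserved stage.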

[0043] As shown in Figure 2, which is a schematic diagram of the first-stage algorithm flow provided in an embodiment of this application, the first-stage algorithm flow includes the implementations of steps S101 and S102.

S103: Convert the circuit netlist into a directed graph, and complete the circuit topology feature encoding based on the graph neural network to obtain the hardening cost weight.

[0044] To achieve topology-aware and context-aware adaptive hardening, this application uses the ROOQ-GNN hardening optimizer to transform the circuit into a graph structure. It learns topological features through a graph neural network to predict the optimal hardening weights. ROOQ (Reliability-Oriented Optimization via Queueing-Theoretic AVF and Adaptive Protection) is a reliability-oriented optimization framework based on queuing-theoretic AVF and adaptive protection; GNN (Graph Neural Network) is a graph neural network used to learn circuit topology features and achieve context-aware hardening priority allocation.

[0045] First, the circuit netlist is converted into a directed graph:

$$G = (V, E)$$

where the nodes $V$ are all triggers, and the edges $E$ represent the combinational logic connections between triggers.

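[0046] The initial feature vector of node i is:

$$x_i = \left[\, F_i,\; \mathrm{Slack}_i,\; \mathrm{AVF}_i,\; P_{\mathrm{cap},i} \,\right]$$

where $F_i$ is the fan-out number of the i-th register, $\mathrm{Slack}_i$ is its timing margin, $\mathrm{AVF}_i$ is its trigger-level architecture vulnerability factor, and $P_{\mathrm{cap},i}$ is its electrical capture probability.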
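The graph construction and initial node features can be sketched with plain dictionaries. The input format (a list of flip-flop names plus source and destination pairs already traced through combinational logic) and the use of distinct successor count as the fan-out number are simplifying assumptions; a real flow would parse the netlist and take fan-out from it.

```python
from typing import Dict, List, Tuple


def build_flop_graph(
    flop_names: List[str], connections: List[Tuple[str, str]]
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Directed graph with flip-flops as nodes and flop-to-flop logic paths as edges."""
    successors = {name: [] for name in flop_names}
    predecessors = {name: [] for name in flop_names}
    for src, dst in connections:
        if src not in successors or dst not in successors:
            raise KeyError(f"unknown flip-flop in edge ({src}, {dst})")
        if dst not in successors[src]:  # ignore duplicate paths
            successors[src].append(dst)
            predecessors[dst].append(src)
    return successors, predecessors


def node_features(name: str, successors: Dict[str, List[str]], slack: Dict[str, float],
                  avf: Dict[str, float], p_cap: Dict[str, float]) -> List[float]:
    """Initial feature vector: [fan-out, timing slack, AVF, capture probability]."""
    return [float(len(successors[name])), slack[name], avf[name], p_cap[name]]


# Hypothetical three-flop circuit; the repeated (ff_a, ff_b) path is deduplicated.
succ, pred = build_flop_graph(
    ["ff_a", "ff_b", "ff_c"],
    [("ff_a", "ff_b"), ("ff_a", "ff_c"), ("ff_b", "ff_c"), ("ff_a", "ff_b")],
)
x_a = node_features("ff_a", succ, {"ff_a": 0.3}, {"ff_a": 0.4}, {"ff_a": 0.1})
```

Storing both successor and predecessor lists lets a later aggregation step choose whether to gather features along or against the signal direction.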

[0047] Next, a 3-layer graph sampling and aggregation encoder, GraphSAGE, is used to aggregate neighbor node features through a message passing mechanism, generating a topology-aware embedding vector $h_i$ for each node.

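[0048] The embedding vector is decoded using a multilayer perceptron (MLP) to predict the hardening cost weights for each trigger. The hardening cost weights include $w_{\mathrm{area},i}$, $w_{\mathrm{power},i}$, and $w_{\mathrm{fanout},i}$, where $w_{\mathrm{area},i}$ is the area weight, $w_{\mathrm{power},i}$ is the power consumption weight, and $w_{\mathrm{fanout},i}$ is the fan-out sensitivity weight.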
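The encoder and decoder described above can be illustrated with a small pure-Python forward pass. This is a sketch under several assumptions: a mean aggregator is used, neighbor sampling and normalization are omitted, the decoder is a single linear layer with a sigmoid rather than a full MLP, and the weights are random and untrained. All names and values are hypothetical.

```python
import math
import random


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def mean_vector(vectors, dim):
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def sage_layer(features, neighbors, w_self, w_neigh):
    """One GraphSAGE-style layer: ReLU(W_self * h + W_neigh * mean(neighbor h))."""
    updated = {}
    for node, h in features.items():
        aggregated = mean_vector([features[n] for n in neighbors[node]], len(h))
        z = [a + b for a, b in zip(matvec(w_self, h), matvec(w_neigh, aggregated))]
        updated[node] = [max(0.0, value) for value in z]
    return updated


def predict_cost_weights(embedding, w_out):
    """Assumed decoder: linear map plus sigmoid, giving three weights in (0, 1)."""
    area, power, fanout = (1.0 / (1.0 + math.exp(-z)) for z in matvec(w_out, embedding))
    return {"area": area, "power": power, "fanout": fanout}


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
# Hypothetical node features: [fan-out, slack, AVF, capture probability].
features = {"ff_a": [2.0, 0.3, 0.4, 0.10], "ff_b": [1.0, -0.1, 0.7, 0.25],
            "ff_c": [0.0, 0.5, 0.2, 0.05]}
neighbors = {"ff_a": ["ff_b", "ff_c"], "ff_b": ["ff_a", "ff_c"], "ff_c": ["ff_a", "ff_b"]}
hidden = 8
embeddings = features
for in_dim in (4, hidden, hidden):  # three stacked layers
    embeddings = sage_layer(embeddings, neighbors,
                            random_matrix(hidden, in_dim, rng),
                            random_matrix(hidden, in_dim, rng))
w_out = random_matrix(3, hidden, rng)
cost_weights = {node: predict_cost_weights(h, w_out) for node, h in embeddings.items()}
```

Applying separate matrices to a node's own features and to the neighbor mean is equivalent to applying one matrix to their concatenation, which is the usual way the GraphSAGE update is written. In practice the weights would be learned, for instance with a graph learning library, rather than drawn at random.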

[0049] S104: Determine the hardening priority weight based on the trigger-level architecture vulnerability factor, electrical capture probability, and hardening cost weight.

[0050] Considering AVF, timing, topology, and fan-out factors, the hardening priority weight $\omega_i$ of trigger i is defined. In its definition, $F_{\max}$ denotes the maximum fan-out among all registers, and $\mathrm{Slack}_i$ denotes the timing margin of the i-th register.

[0051] S105: Based on the hardening priority weight and the hardening cost weight, determine the target hardening scheme for reliability optimization.

[0052] Finally, with the objectives of minimizing area and power consumption, and constrained by reliability (FIT), timing convergence, and mandatory hardening of critical modules, a mixed-integer linear programming (MILP) model is constructed to solve for the optimal hardening scheme. The objective function minimizes the area and power consumption overhead of the hardened registers and is expressed in terms of the following quantities: $\mathcal{F}$, the set of all triggers; $A_i$, the area overhead of the protection mechanism corresponding to the i-th register; $P_i$, the power consumption overhead of the protection mechanism corresponding to the i-th register; and the binary decision variable $x_i \in \{0, 1\}$, where $x_i = 1$ indicates that the i-th register is hardened and $x_i = 0$ indicates that it is not hardened.

[0053] Its core constraints are expressed in terms of $\rho_i$, the FIT reduction factor of the i-th register after the corresponding protection mechanism is applied, and $\mathrm{FIT}_{\mathrm{target}}$, the system target FIT rate.

[0054] The constraints include, but are not limited to: meeting the system's FIT target, requiring critical modules to be hardened, and setting a lower limit for the effectiveness of hardening critical paths.

[0055] Finally, the MILP solver automatically selects the optimal solution for each trigger, minimizing global overhead.
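The selection problem can be illustrated on a toy instance by exhaustive search, used here as a stand-in for a MILP solver. The cost (area plus power of hardened flip-flops), the constraint form (a hardened flip-flop contributes its FIT multiplied by a reduction factor), and all names and numbers are assumptions for illustration; the search is exponential and only suitable for a handful of flip-flops.

```python
from itertools import product
from typing import Dict, Iterable, Optional, Tuple


def select_hardening(
    flops: Dict[str, Dict[str, float]], fit_target: float, must_harden: Iterable[str] = ()
) -> Tuple[Optional[Dict[str, int]], Optional[float]]:
    """Cheapest 0/1 hardening assignment whose residual FIT meets the target.

    Each entry of `flops` has hypothetical keys: fit, area, power, reduction.
    """
    names = sorted(flops)
    required = set(must_harden)
    best_choice, best_cost = None, None
    for bits in product((0, 1), repeat=len(names)):
        choice = dict(zip(names, bits))
        if any(choice[n] == 0 for n in required):
            continue  # critical-module flip-flops must be hardened
        residual = sum(f["fit"] * (f["reduction"] if choice[n] else 1.0)
                       for n, f in flops.items())
        if residual > fit_target:
            continue  # reliability constraint violated
        cost = sum((f["area"] + f["power"]) * choice[n] for n, f in flops.items())
        if best_cost is None or cost < best_cost:
            best_choice, best_cost = choice, cost
    return best_choice, best_cost


# Hypothetical per-flip-flop data.
design = {
    "ff_a": {"fit": 5.0, "area": 2.0, "power": 1.0, "reduction": 0.1},
    "ff_b": {"fit": 3.0, "area": 1.0, "power": 1.0, "reduction": 0.1},
    "ff_c": {"fit": 1.0, "area": 4.0, "power": 2.0, "reduction": 0.1},
}
plan, plan_cost = select_hardening(design, fit_target=4.0)
forced_plan, forced_cost = select_hardening(design, fit_target=4.0, must_harden=["ff_c"])
```

In this toy instance, forcing `ff_c` to be hardened raises the minimum cost, which shows how a mandatory-hardening constraint can trade overhead for coverage of a critical module. An infeasible target returns `(None, None)`.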

[0056] As shown in Figure 3, which is a schematic diagram of the second-stage algorithm flow provided in an embodiment of this application, the second-stage algorithm flow includes the implementations of steps S103, S104 and S105.

[0057] In this embodiment, an electrical capture probability formula is first constructed based on the vulnerability window and the circuit's operating clock cycle to calculate the electrical capture probability. Next, based on gate-level simulation and static timing analysis, timing-aware Monte Carlo architecture vulnerability factor estimation is performed to obtain the trigger-level architecture vulnerability factor. Then, the circuit netlist is converted into a directed graph, and circuit topology feature encoding is completed based on a graph neural network to obtain hardening cost weights. Based on the trigger-level architecture vulnerability factor, electrical capture probability, and hardening cost weights, hardening priority weights are determined. Finally, based on the hardening priority weights and hardening cost weights, a target hardening scheme is determined for reliability optimization. In this way, the entire process no longer considers a single indicator for hardening decisions, but rather combines the vulnerabilities of the circuit's physical and architectural layers with actual hardening costs and circuit topology characteristics for a comprehensive judgment, improving the rationality of reliability optimization.

[0058] The above are some specific implementations of the reliability optimization method provided in the embodiments of this application. Based on this, this application also provides a corresponding apparatus. The apparatus provided in the embodiments of this application will be described below from the perspective of functional modularity.

[0059] Referring to Figure 4, which is a schematic diagram of a reliability optimization device 400 provided in an embodiment of this application, the reliability optimization device includes: a first building module 410, used to construct an electrical capture probability formula based on the vulnerability window and the circuit operating clock cycle, so as to calculate the electrical capture probability; a first processing module 420, used to perform timing-aware Monte Carlo architecture vulnerability factor estimation based on gate-level simulation and static timing analysis, so as to obtain the trigger-level architecture vulnerability factor; a second processing module 430, used to convert the circuit netlist into a directed graph and complete the circuit topology feature encoding based on a graph neural network, so as to obtain the hardening cost weight; a first determining module 440, used to determine the hardening priority weight based on the trigger-level architecture vulnerability factor, the electrical capture probability, and the hardening cost weight; and a second determining module 450, used to determine the target hardening scheme based on the hardening priority weight and the hardening cost weight, so as to optimize reliability.

[0060] Optionally, the device 400 further includes: a third determining module, used to determine the calculation formula for the architecture vulnerability factor based on queuing theory; a fourth determining module, used to determine the vulnerability window based on the setup time, hold time, and clock skew; and a second building module, used to decompose the total chip failures-in-time (FIT) rate into the product of the raw FIT rate, the electrical capture probability, and the architecture vulnerability factor, so as to build a decoupled computation model.
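The decoupled computation model can be sketched numerically as follows. This is an illustrative sketch under stated assumptions: the text only says that the vulnerability window is determined from setup time, hold time, and clock skew, and that the capture probability is built from the window and the clock cycle; the specific forms used here (window as the sum of the three quantities, probability as the window divided by the clock period) are assumptions, and all function names and numbers are hypothetical.

```python
def vulnerability_window(t_setup, t_hold, clock_skew):
    """Assumed form: window = setup time + hold time + clock skew."""
    return t_setup + t_hold + clock_skew


def electrical_capture_probability(window, clock_period):
    """Assumed form: share of the clock period covered by the window (max 1)."""
    if clock_period <= 0:
        raise ValueError("clock_period must be positive")
    return min(1.0, window / clock_period)


def derated_fit(raw_fit, ecp, avf):
    """Decoupled model: total FIT = raw FIT * capture probability * AVF."""
    return raw_fit * ecp * avf


# Illustrative values only (times in nanoseconds).
window = vulnerability_window(t_setup=0.05, t_hold=0.03, clock_skew=0.02)
ecp = electrical_capture_probability(window, clock_period=1.0)
fit = derated_fit(raw_fit=100.0, ecp=ecp, avf=0.5)
print(window, ecp, fit)
```

Because the three factors multiply, each can be estimated by a separate analysis and updated independently, which is the practical benefit of the decoupling.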

[0061] Optionally, the first building module 410 is specifically used for: constructing the electrical capture probability formula based on the vulnerability window and the circuit operating clock cycle; constructing a refined electrical capture probability formula by introducing four levels of degradation factors into the electrical capture probability formula, the four levels of degradation factors including a pulse width degradation factor, an electrical shielding factor, a metastability factor, and a phase resolution time shielding factor; and calculating the trigger-level electrical capture probability using the refined electrical capture probability formula. The first determining module 440 is specifically used for: determining the hardening priority weight based on the trigger-level architecture vulnerability factor, the trigger-level electrical capture probability, and the hardening cost weight.
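One plausible way to apply the four degradation factors is shown below. This is a sketch only: the text does not state here how the factors enter the formula, so the multiplicative combination and the restriction of each factor to the range 0 to 1 are assumptions, and the function and parameter names are hypothetical.

```python
def refined_ecp(base_ecp, pulse_width, electrical, metastability, resolution):
    """Scale a base capture probability by four degradation factors.

    Assumption: each factor lies in [0, 1] and the factors combine by
    multiplication; the application may define a different combination.
    """
    factors = (pulse_width, electrical, metastability, resolution)
    for factor in factors:
        if not 0.0 <= factor <= 1.0:
            raise ValueError("each degradation factor must lie in [0, 1]")
    result = base_ecp
    for factor in factors:
        result *= factor
    return result


# Illustrative values only.
trigger_ecp = refined_ecp(0.2, pulse_width=0.5, electrical=0.5,
                          metastability=1.0, resolution=1.0)
print(trigger_ecp)
```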

[0062] Optionally, the first processing module 420 is specifically used for: extracting the duration of the dwell period of each trigger from the Fast Signal Database (FSDB) waveform; performing importance sampling based on the dwell-period duration and the timing margin, and calculating the probability of the extended don't-care terms through formal verification; and using the probability of the extended don't-care terms as a control variate to correct the architecture vulnerability factor, so as to obtain the trigger-level architecture vulnerability factor.
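The correction step can be illustrated with a generic Monte Carlo variance-reduction sketch. It assumes that the "control variable" in the text is a control variate in the statistical sense, that each sample yields one indicator for "fault corrupts the architectural state" and one for "fault lands on a don't-care condition", and that the exact don't-care probability is supplied by formal verification. The sampling here is plain random sampling from a synthetic model rather than importance sampling over FSDB dwell periods, and all names and probabilities are hypothetical.

```python
import random
from statistics import fmean


def control_variate_estimate(y, x, x_mean_known):
    """Estimate E[Y] using X as a control variate whose true mean is known."""
    if len(y) != len(x) or len(y) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    y_bar, x_bar = fmean(y), fmean(x)
    cov = sum((a - y_bar) * (b - x_bar) for a, b in zip(y, x))
    var = sum((b - x_bar) ** 2 for b in x)
    beta = cov / var if var > 0 else 0.0  # variance-minimizing coefficient
    return y_bar - beta * (x_bar - x_mean_known)


def simulate(n, p_dont_care=0.3, p_ace_given_care=0.6, seed=0):
    """Synthetic fault injections: x marks don't-care hits, y marks failures."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        dont_care = rng.random() < p_dont_care
        fails = (not dont_care) and rng.random() < p_ace_given_care
        x.append(1.0 if dont_care else 0.0)
        y.append(1.0 if fails else 0.0)
    return y, x


y, x = simulate(5000)
avf_plain = fmean(y)  # uncorrected Monte Carlo estimate
avf_corrected = control_variate_estimate(y, x, x_mean_known=0.3)
print(avf_plain, avf_corrected)
```

The correction helps because the two indicators are strongly negatively correlated: any sampling error in the observed don't-care rate is used to cancel part of the error in the vulnerability estimate.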

[0063] Optionally, the second processing module 430 is specifically used for: constructing a graph structure with triggers as nodes and the combinational logic connections between triggers as edges; extracting the fan-out number, timing margin, trigger-level architecture vulnerability factor, and electrical capture probability as the initial features of the nodes; and aggregating the initial features of the nodes with a GraphSAGE (graph sample-and-aggregate) encoder and predicting the area weight, power consumption weight, and fan-out sensitivity weight with a multilayer perceptron.
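The graph construction and one aggregation step can be sketched in plain Python. This is a simplified illustration, not the encoder of this application: it performs a single mean aggregation over all fan-out neighbours without neighbour sampling or learned weights, it omits the multilayer perceptron, and the trigger names and feature values are hypothetical.

```python
def build_graph(edges):
    """Adjacency list: trigger -> triggers it drives through combinational logic."""
    adj = {}
    for src, dst in edges:
        adj.setdefault(src, []).append(dst)
        adj.setdefault(dst, [])
    return adj


def mean_aggregate(adj, feats):
    """Concatenate each node's features with the mean of its neighbours' features."""
    out = {}
    for node, nbrs in adj.items():
        own = feats[node]
        if nbrs:
            mean = [sum(feats[n][k] for n in nbrs) / len(nbrs)
                    for k in range(len(own))]
        else:
            mean = [0.0] * len(own)  # no neighbours: pad with zeros
        out[node] = own + mean
    return out


# Node features: [fan-out number, timing margin, AVF, capture probability].
edges = [("ff0", "ff1"), ("ff0", "ff2"), ("ff1", "ff2")]
feats = {
    "ff0": [2.0, 0.10, 0.5, 0.1],
    "ff1": [1.0, 0.20, 0.3, 0.1],
    "ff2": [0.0, 0.30, 0.1, 0.2],
}
adj = build_graph(edges)
emb = mean_aggregate(adj, feats)
print(emb["ff0"])
```

In a trained encoder the concatenated vector would pass through learned linear layers and a nonlinearity before the multilayer perceptron predicts the three cost weights.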

[0064] Optionally, the second determining module 450 is specifically used for: generating the target hardening scheme based on the hardening priority weight and the hardening cost weight, with the optimization objective of minimizing the hardening area overhead and the hardening power consumption overhead, subject to the constraints that the total chip failures-in-time (FIT) rate does not exceed a preset target, that the triggers of key modules are hardened, and that the critical path meets the lower limit of hardening effectiveness, so as to optimize reliability.

[0065] Optionally, the device 400 further includes: a probability capture module, used to aggregate the trigger-level architecture vulnerability factor and the trigger-level electrical capture probability by instruction fetch stage, decode stage, execute stage, memory access stage, and write-back stage, so as to obtain the pipeline-level architecture vulnerability index.
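The stage-wise aggregation can be sketched as follows. The aggregation function is not specified in this passage, so the choice below (the mean, per stage, of the product of each trigger's architecture vulnerability factor and electrical capture probability) is an assumption, and the stage labels and sample records are hypothetical.

```python
from collections import defaultdict

STAGES = ("fetch", "decode", "execute", "memory", "writeback")


def pipeline_vulnerability(records):
    """Map each pipeline stage to the mean of AVF * ECP over its triggers.

    records: iterable of (stage, avf, ecp) tuples, one per trigger.
    Stages with no triggers are left out of the result.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for stage, avf, ecp in records:
        if stage not in STAGES:
            raise ValueError(f"unknown pipeline stage: {stage}")
        sums[stage] += avf * ecp
        counts[stage] += 1
    return {s: sums[s] / counts[s] for s in STAGES if counts[s]}


# Illustrative values only.
records = [
    ("fetch", 0.5, 0.2),
    ("fetch", 0.3, 0.2),
    ("execute", 1.0, 0.1),
]
index = pipeline_vulnerability(records)
print(index)
```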

[0066] This application also provides corresponding devices and computer storage media for implementing the solutions provided in this application.

[0067] As shown in Figure 5, computer device 01 is represented in the form of a general-purpose computing device. The components of computer device 01 may include, but are not limited to: one or more processors or processor units 03, system memory 08, and bus 04 connecting different system components (including system memory 08 and processor unit 03).

[0068] Bus 04 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. Examples of these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.

[0069] Computer device 01 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by computer device 01, including volatile and non-volatile media, removable and non-removable media.

[0070] System memory 08 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 09 and / or cache memory 10. Computer device 01 may further include other removable / non-removable, volatile / non-volatile computer system storage media. By way of example only, storage system 11 may be used to read and write non-removable, non-volatile magnetic media (not shown in Figure 5; usually referred to as a "hard drive"). Although not shown in Figure 5, a disk drive for reading and writing to a removable non-volatile disk (e.g., a "floppy disk") and an optical disk drive for reading and writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 04 via one or more data media interfaces. System memory 08 may include at least one program product having a set (e.g., at least one) of program modules configured to perform the functions of the embodiments of the present invention.

[0071] A program / utility 12 having a set (at least one) of program modules 13 may be stored, for example, in system memory 08. Such program modules 13 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data. Each or some combination of these examples may include an implementation of a network environment. Program modules 13 typically perform the functions and / or methods described in the embodiments of the present invention.

[0072] Computer device 01 can also communicate with one or more external devices 02 (e.g., keyboard, pointing device, display 07, etc.), with one or more devices that enable a user to interact with computer device 01, and / or with any device (e.g., network card, modem, etc.) that enables computer device 01 to communicate with one or more other computing devices. This communication can be performed through input / output (I / O) interface 06. Furthermore, computer device 01 can also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and / or a public network such as the Internet) through network adapter 05. As shown in Figure 5, network adapter 05 communicates with other modules of computer device 01 via bus 04. It should be understood that, although not shown in Figure 5, other hardware and / or software modules may be used in conjunction with computer device 01, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

[0073] The processor unit 03 executes various functional applications and data processing by running programs stored in the system memory 08, such as implementing a reliability optimization method provided in the embodiments of this application.

[0074] It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0075] As can be seen from the above description of the embodiments, those skilled in the art can clearly understand that all or part of the steps in the methods of the above embodiments can be implemented by means of software plus a general-purpose hardware platform. Based on this understanding, the technical solution of this application can be embodied in the form of a software product. This computer software product can be stored in a storage medium, such as a read-only memory (ROM) / RAM, magnetic disk, optical disk, etc., including several instructions to cause a computer device (which may be a personal computer, a server, or a network communication device such as a router) to execute the methods described in various embodiments or some parts of the embodiments of this application.

[0076] The various embodiments in this specification are described in a progressive manner. For similar or identical parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the apparatus embodiments are basically similar to the method embodiments, so their description is relatively brief; for relevant parts, refer to the descriptions in the method embodiments. Some or all of the modules can be selected according to actual needs to achieve the purpose of this embodiment. Those skilled in the art can understand and implement this without creative effort.

[0077] The above description is merely an exemplary implementation of this application and is not intended to limit the scope of protection of this application.

Claims

1. A reliability optimization method, characterized in that it comprises: constructing an electrical capture probability formula based on a vulnerability window and a circuit operating clock cycle, so as to calculate an electrical capture probability; performing timing-aware Monte Carlo architecture vulnerability factor estimation based on gate-level simulation and static timing analysis, so as to obtain a trigger-level architecture vulnerability factor; converting a circuit netlist into a directed graph and completing circuit topology feature encoding based on a graph neural network, so as to obtain a hardening cost weight; determining a hardening priority weight based on the trigger-level architecture vulnerability factor, the electrical capture probability, and the hardening cost weight; and determining a target hardening scheme based on the hardening priority weight and the hardening cost weight, so as to optimize reliability.

2. The method according to claim 1, characterized in that, before constructing the electrical capture probability formula based on the vulnerability window and the circuit operating clock cycle so as to calculate the electrical capture probability, the method further comprises: determining a calculation formula for the architecture vulnerability factor based on queuing theory; determining the vulnerability window based on setup time, hold time, and clock skew; and decomposing the total chip failures-in-time (FIT) rate into the product of the raw FIT rate, the electrical capture probability, and the architecture vulnerability factor, so as to construct a decoupled computation model.

3. The method according to claim 1, characterized in that constructing the electrical capture probability formula based on the vulnerability window and the circuit operating clock cycle so as to calculate the electrical capture probability comprises: constructing the electrical capture probability formula based on the vulnerability window and the circuit operating clock cycle; constructing a refined electrical capture probability formula by introducing four levels of degradation factors into the electrical capture probability formula, the four levels of degradation factors including a pulse width degradation factor, an electrical shielding factor, a metastability factor, and a phase resolution time shielding factor; and calculating a trigger-level electrical capture probability using the refined electrical capture probability formula; and determining the hardening priority weight based on the trigger-level architecture vulnerability factor, the electrical capture probability, and the hardening cost weight comprises: determining the hardening priority weight based on the trigger-level architecture vulnerability factor, the trigger-level electrical capture probability, and the hardening cost weight.

4. The method according to claim 1, characterized in that performing timing-aware Monte Carlo architecture vulnerability factor estimation based on gate-level simulation and static timing analysis so as to obtain the trigger-level architecture vulnerability factor comprises: extracting the duration of the dwell period of each trigger from the Fast Signal Database (FSDB) waveform; performing importance sampling based on the dwell-period duration and the timing margin, and calculating the probability of the extended don't-care terms through formal verification; and using the probability of the extended don't-care terms as a control variate to correct the architecture vulnerability factor, so as to obtain the trigger-level architecture vulnerability factor.

5. The method according to claim 1, characterized in that converting the circuit netlist into a directed graph and completing circuit topology feature encoding based on a graph neural network so as to obtain the hardening cost weight comprises: constructing a graph structure with triggers as nodes and the combinational logic connections between triggers as edges; extracting the fan-out number, timing margin, trigger-level architecture vulnerability factor, and electrical capture probability as the initial features of the nodes; and aggregating the initial features of the nodes with a GraphSAGE (graph sample-and-aggregate) encoder and predicting an area weight, a power consumption weight, and a fan-out sensitivity weight with a multilayer perceptron.

6. The method according to claim 1, characterized in that determining the target hardening scheme based on the hardening priority weight and the hardening cost weight so as to optimize reliability comprises: generating the target hardening scheme based on the hardening priority weight and the hardening cost weight, with the optimization objective of minimizing the hardening area overhead and the hardening power consumption overhead, subject to the constraints that the total chip failures-in-time (FIT) rate does not exceed a preset target, that the triggers of key modules are hardened, and that the critical path meets the lower limit of hardening effectiveness, so as to optimize reliability.

7. The method according to claim 3, characterized in that, after calculating the trigger-level electrical capture probability using the refined electrical capture probability formula, the method further comprises: aggregating the trigger-level architecture vulnerability factor and the trigger-level electrical capture probability by instruction fetch stage, decode stage, execute stage, memory access stage, and write-back stage, so as to obtain a pipeline-level architecture vulnerability index.

8. A reliability optimization device, characterized in that it comprises: a first building module, used to construct an electrical capture probability formula based on a vulnerability window and a circuit operating clock cycle, so as to calculate an electrical capture probability; a first processing module, used to perform timing-aware Monte Carlo architecture vulnerability factor estimation based on gate-level simulation and static timing analysis, so as to obtain a trigger-level architecture vulnerability factor; a second processing module, used to convert a circuit netlist into a directed graph and complete circuit topology feature encoding based on a graph neural network, so as to obtain a hardening cost weight; a first determining module, used to determine a hardening priority weight based on the trigger-level architecture vulnerability factor, the electrical capture probability, and the hardening cost weight; and a second determining module, used to determine a target hardening scheme based on the hardening priority weight and the hardening cost weight, so as to optimize reliability.

9. A computer device, characterized in that it comprises: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the reliability optimization method as described in any one of claims 1-7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores instructions that, when executed on a terminal device, cause the terminal device to perform the reliability optimization method as described in any one of claims 1-7.