A method for automatically analyzing defects
By constructing a multi-source evidence set, pruning root cause candidate nodes with a graph attention network, and combining a historical defect knowledge base with a test predicate library, the method addresses the low accuracy of large-language-model defect analysis in complex systems, achieving higher analytical accuracy and hallucination suppression.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- AUTOLINK INFORMATION TECHNOLOGY CO LTD
- Filing Date
- 2026-05-21
- Publication Date
- 2026-06-19
AI Technical Summary
In complex systems, defect analysis based on large language models suffers from low accuracy and is prone to hallucinations.
A multi-source evidence set is constructed; the evidence fusion module generates a defect panorama embedding; non-generative prior constraints are generated from the historical defect knowledge base; root cause candidate nodes are pruned using a graph attention network; hypothesis consistency is verified against the test predicate library; and a dual-channel feedback mechanism calibrates the belief distribution, eliminating hallucinated hypotheses and improving the accuracy of the analysis.
It effectively reduces the probability of hallucinations in model predictions, improves the accuracy of defect analysis, and enhances the ability to analyze complex systems.
Figure CN122242572A_ABST
Description
Technical Field
[0001] This invention relates to the field of data processing technology, specifically to an automatic defect analysis method.
Background Technology
[0002] In the mid-to-late stages of embedded product development projects such as intelligent vehicles, drones, and intelligent robots, defect (hereinafter referred to as bug) handling is a major task for developers. Improving the efficiency of bug reporting and fixing is a key focus in the field. For example, the automatic bug reporting technology described in patent CN121233968B improves reporting efficiency. Bug-related data discovered during testing is automatically uploaded to a defect database. Uploaded data includes: bug occurrence time, slice logs, and structured problem descriptions of the defect. Coders analyze and correct bugs within their assigned scope. Currently, most mainstream bug analysis methods are still based on manual analysis. Coders analyze bugs by referring to slice logs and problem descriptions generated in the system when the bug occurred, as submitted by testers. The analysis time varies greatly depending on the developers' abilities. Furthermore, when encountering bugs with complex root causes, collaboration among coders from multiple modules is required, or coders must be very familiar with the system architecture, often requiring significant manpower to resolve.
[0003] With technological advancements, large language models have been introduced into the field of defect analysis. For example, document CN121858999A discloses a method for defect analysis based on historical defect data. This method constructs input text from historical defect data using defect description text and associated logs, and output text based on bug analysis results. Training sample data consisting of system prompts, input text, and output text is then used to train a large language model, identifying the logical relationship between logs and defect semantics. The trained large language model then receives new defect description texts and associated logs submitted by testers and provides defect troubleshooting suggestions. This approach trains a large language model using historical defect data, helping developers narrow down the scope of their investigations. While it doesn't change the manual bug analysis approach, it improves developer efficiency by narrowing the analysis scope.
[0004] However, in the development of embedded systems for smart devices, the control system structure is very complex, and the actual use environment of the product is diverse and difficult to predict, resulting in a wide variety of defect phenomena and root causes. The sample data built based on historical defect data covers only a limited range of situations. The output answer of the Large Language Model (LLM) is essentially a statistical prediction based on existing knowledge. In addition, the model's operating mechanism tends to reward guesses. Therefore, in the absence of sufficient constraints, the model often "fabricates" seemingly reasonable but actually non-existent causal relationships based on statistical priors. That is, the probability of hallucinations in the model's answer is very high, which leads to low accuracy of the inference results output by the model when performing defect analysis on complex systems. Consequently, the application scenarios of reasoning based on the Large Language Model are limited.
Summary of the Invention
[0005] To address the low accuracy of existing large-language-model approaches when analyzing defects in complex systems, this invention provides an automatic defect analysis method that can effectively reduce the probability of hallucinations in model predictions and improve the accuracy of defect analysis based on large language models.
[0006] The technical solution of the present invention is as follows: an automatic defect analysis method, characterized in that it includes the following steps:
S1: Construct the global input $I_i=\{\varepsilon_i, G, H^*, \rho\}$, where $\varepsilon_i$ is the multi-source evidence set corresponding to the i-th input data, and the total number of evidence types M in the multi-source evidence set is greater than or equal to 4; G is the basic graph structure of the program; $H^*$ is the historical defect knowledge base; ρ is the test predicate library; and $I_i$ is the global input corresponding to the i-th input data.
S2: Construct the global output $O_i=\{\tilde{H}_i, \Pi_i, U_i\}$, where $\tilde{H}_i=\{H_k\}_{k=1}^{K}$ is the set of root cause hypotheses corresponding to the i-th sample, $H_k$ is the k-th root cause hypothesis corresponding to the i-th sample, and K is the total number of root cause hypotheses in the set; $O_i$ is the global output corresponding to the i-th sample; $\Pi_i=\{\pi_k^{(t)}\}_{k=1}^{K}$, $\Pi_i\in\Delta^{K}$, is the root cause belief distribution corresponding to the i-th sample; $\pi_k^{(t)}$ is the confidence probability that the k-th root cause hypothesis is the true root cause at the t-th iteration step; $\Delta^{K}$ denotes the set of mathematical constraints that the root cause belief distribution must satisfy, namely non-negativity and all values summing to 1; $U_i=\{h_i,k_i,\Delta\Theta\}$ is the system update data corresponding to the i-th sample, where $h_i$ is the defect panorama embedding of the current i-th sample, $k_i$ is the labeled true root cause index corresponding to the i-th sample, and $\Delta\Theta$ is the parameter increment of the model's lightweight update.
S3: Construct the evidence fusion module. The evidence fusion module preprocesses and fuses the multi-source heterogeneous evidence set $\varepsilon_i$ corresponding to each input data to generate a comprehensive feature representing the current defect, denoted the defect panorama embedding $h_i$. At the same time, it searches the historical defect knowledge base $H^*$ to generate non-generative prior constraints $N_i$, which constrain the range of root cause hypotheses that can subsequently be generated.
S4: Construct the constraint reasoning module. For each input data, the constraint reasoning module constructs a dynamic fault propagation subgraph $G_i$ rooted at the defect anomaly point. A Graph Attention Network (GAT) fuses the defect panorama embedding $h_i$ with the node features $x_v$ to compute the root cause candidate score $\alpha_v$ of each node $v\in G_i$. Filtering out low-scoring nodes prunes the hypothesis generation space in a structured way and eliminates irrelevant nodes, yielding a root cause candidate node set. The large language model then generates root cause hypotheses only within the root cause candidate node set. For each generated root cause hypothesis $H_k$, its consistency with the actual code is verified against the test predicate library ρ, the violation degree $V(H_k)$ is computed quantitatively, and hallucinated hypotheses whose violation degree exceeds a threshold are eliminated.
S5: Construct a defect analysis model, which includes an evidence fusion module and a constraint reasoning module connected in sequence.
S6: Construct training sample data based on historical defect data, and train the model to obtain the trained defect analysis model. The structure of the training sample data includes the global input $I_i$ and the global output $O_i$.
S7: Deploy the trained defect analysis model in the defect management system. Based on the received defect-related data, the system constructs a global input $I_i$ for each defect and feeds it into the defect analysis model; the model's output is used to construct the output data $O_i$, giving the inference results for the input data.
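The scoring-and-pruning stage of step S4 can be illustrated with a small sketch. The exact attention formula is not recoverable from this text, so the logit used here, LeakyReLU(a · (W1 h_i + W2 x_v)) followed by a temperature-scaled softmax, is an assumption chosen only to match the parameter shapes given in the further features; the toy dimensions, random weights, node names, and feature values are likewise hypothetical.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def candidate_scores(h, node_features, w1, w2, a, temperature=1.0, slope=0.2):
    """Return a softmax-normalised candidate score alpha_v for every node.

    ASSUMED logit: LeakyReLU(a . (W1 h + W2 x_v)) / temperature.
    """
    projected_h = matvec(w1, h)
    logits = {}
    for node, x in node_features.items():
        z = [p + q for p, q in zip(projected_h, matvec(w2, x))]
        raw = sum(ai * zi for ai, zi in zip(a, z))
        logits[node] = (raw if raw > 0 else slope * raw) / temperature
    top = max(logits.values())  # subtract the maximum for numerical stability
    exps = {node: math.exp(value - top) for node, value in logits.items()}
    total = sum(exps.values())
    return {node: value / total for node, value in exps.items()}


def prune(scores, threshold):
    """Keep only the nodes whose score reaches the threshold."""
    return {node for node, score in scores.items() if score >= threshold}


# Toy dimensions: d = 2, so d_x = 3 + d = 5 and the shared space has d + d_x = 7.
rng = random.Random(0)
d, d_x = 2, 5
w1 = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(d + d_x)]
w2 = [[rng.uniform(-1, 1) for _ in range(d_x)] for _ in range(d + d_x)]
a = [rng.uniform(-1, 1) for _ in range(d + d_x)]
h_i = [0.4, -0.1]  # hypothetical defect panorama embedding
# x_v = [cyclo, change_cnt, cov, embed(v)...]; all values are made up.
node_features = {
    "dataParse": [0.8, 0.6, 0.3, 0.2, -0.5],
    "dbInsert": [0.4, 0.1, 0.9, 0.7, 0.1],
    "paramCheck": [0.1, 0.0, 0.95, -0.3, 0.4],
}
scores = candidate_scores(h_i, node_features, w1, w2, a)
candidates = prune(scores, threshold=1.0 / len(scores))
print(scores, candidates)
```

Because the scores are normalised over the subgraph, the threshold is relative to the number of nodes; the sketch uses the uniform score 1/|V_i| as a cut-off purely for illustration.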
[0007] Its further features are as follows.
The defect analysis model also includes a post-processing module placed after the constraint reasoning module. The post-processing module initializes the belief distribution of the root cause hypotheses from the node attention score $\alpha_v$ and the hypothesis violation degree $V(H_k)$, obtaining the initial belief value $\pi_k^{(0)}$ of the k-th root cause hypothesis; a dual-channel feedback mechanism then updates the beliefs based on feedback and outputs the updated root cause hypothesis set and belief distribution.
The global input construction method includes the following steps:
a1: The basic graph structure of the program is $G=(V,E)$, where V is the set of nodes and E is the set of edges. The basic graph structure is a directed graph that represents the internal logical connections and data flow of the software code. A node in V represents a basic unit of code in the software that executes independently and has logical boundaries; it is the physical carrier of defects. An edge in E represents the semantic relationship between nodes; it is the path carrier of fault propagation.
a2: Construct the multi-source evidence set as $\varepsilon_i=\{e_{i,m}\}_{m=1}^{M}$, where $e_{i,m}$ is the evidence of the m-th type corresponding to the i-th defect sample and M is the total number of evidence types. The types indexed by m include: stack log, code change, test coverage, and dependency topology. According to the preset function definitions, the function in which the defect occurred is identified and recorded as the function to be analyzed. The stack log is the slice log submitted when a defect is reported. Code changes refer to the changes to the code of the function to be analyzed, from the version in which a defect was last detected to the version in which the current defect was detected; code change records are not required for functions being tested for the first time. Test coverage refers to the test coverage of the code of the function to be analyzed in the version where the defect was found. The dependency topology refers to the calling relationships between nodes V in the code of the function to be analyzed.
a3: Construct the historical defect knowledge base as $H^*=\{(h_j^*,k_j^*)\}_{j=1}^{N}$, where $h_j^*$ is the fusion embedding of the j-th historical defect case whose root cause has already been labeled; $k_j^*$ is the root cause index corresponding to the j-th historical defect case; and N is the total number of historical defect cases in the historical defect knowledge base.
a4: Construct the test predicate library as $\rho=\{P_r\}_{r=1}^{R}$, where the test predicate $P_r$ is the r-th code fact determination rule and R is the total number of code fact determination rules defined in the test predicate library. The code fact determination rules are used to verify whether the content of a root cause hypothesis $H_k$ matches the facts in the code.
Each root cause hypothesis $H_k$ includes the node at which the defect is located and a description of the cause of the defect.
The types of edges include: synchronous call edges, data dependency edges, inheritance edges, implementation dependency edges, instantiation edges, initialization edges, asynchronous callback edges, and log printing edges.
The types of nodes include: function nodes, method nodes, class nodes, module nodes, interface nodes, code block nodes, critical path nodes, and resource nodes.
The operations in the evidence fusion module include:
b1: Map all original evidence into a real-valued space of unified dimension to eliminate modal heterogeneity: $\tilde{e}_{i,m}=\mathrm{Enc}_m(e_{i,m})$, where $e_{i,m}\in\mathbb{R}^{d_m}$ is the original evidence of the m-th type for the i-th sample; $d_m$ is the dimension of the original evidence of the m-th type; $\mathrm{Enc}_m(\cdot)$ is the modality-specific encoder designed for that type of evidence; and $\tilde{e}_{i,m}\in\mathbb{R}^{d}$ is the encoded embedding of the m-th type of evidence, with all modalities unified to d dimensions.
b2: Use the evidence calibration head to quantify the uncertainty of each modality on the current sample, obtaining the confidence center embedding $\mu_{i,m}$ and the reliability weight $\omega_{i,m}$: $(\mu_{i,m},\tau_{i,m})=g_m(\tilde{e}_{i,m})$, where $g_m(\cdot)$ is the calibration head specific to the m-th type of evidence, consisting of an input layer, a hidden layer, and an output layer, with a hidden layer dimension of d/2; $\mu_{i,m}\in\mathbb{R}^{d}$ is the confidence center embedding of the m-th type of evidence and characterizes the valid features of the evidence; $\tau_{i,m}$ and $\tau_{i,m'}$ are the log-variance uncertainties of the m-th and m'-th types of evidence corresponding to the i-th input data; and $\omega_{i,m}\in(0,1)$ is the reliability weight of the m-th type of evidence corresponding to the i-th input data, computed from these log-variance uncertainties and satisfying the normalization constraint $\sum_{m=1}^{M}\omega_{i,m}=1$.
b3: Fuse the confidence center embeddings of all modalities according to their reliability weights to generate the defect panorama embedding $h_i\in\mathbb{R}^{d}$, the comprehensive feature that characterizes the current defect. The fusion uses a linear projection function of the form $x\mapsto W_p x+b_p$, where $W_p\in\mathbb{R}^{d\times d}$ and $b_p\in\mathbb{R}^{d}$ are the learnable parameters of the projection function.
b4: Calculate the cosine similarity $s_{ij}$ between the current defect embedding and each historical defect embedding, and retrieve the L most similar cases to generate the non-generative prior constraints $N_i$, which constrain the range of root cause hypotheses that can subsequently be generated: $s_{ij}=\dfrac{h_i^{\top}h_j^*}{\lVert h_i\rVert\,\lVert h_j^*\rVert}$; $N_i=\{h_j^*\mid \mathrm{rank}(s_{ij})\le L\}$, where $h_j^*\in\mathbb{R}^{d}$ is the fusion embedding of the j-th historical defect case in the historical defect knowledge base; $s_{ij}\in[-1,1]$ is the cosine similarity between the defect corresponding to the current i-th input data and the j-th historical defect; L is the number of similar cases; $\mathrm{rank}(\cdot)$ is the ranking function; and $N_i$ is the set of non-generative prior constraints for the i-th input data.
The modality-specific encoders include: Transformer-type models for stack and log text, CodeBERT or GraphCodeBERT models for code changes, single-layer MLP models for test coverage, and graph encoders for the dependency topology.
The operations in the constraint reasoning module include:
c1: Starting from the defect anomaly root node $v_0$, perform bounded-depth expansion and valid-edge filtering on the program base graph G to generate a dynamic subgraph $G_i$ containing only nodes related to fault propagation: $G_i=\{(v,e)\in G\mid \mathrm{depth}(v,v_0)\le D_{\max},\ e\in\varepsilon_{\mathrm{valid}}\}$, where $G_i=(V_i,E_i)$ is the dynamic fault propagation subgraph of the i-th sample and is a subset of G; $V_i$ is the set of nodes in $G_i$; $E_i$ is the set of edges in $G_i$; $\mathrm{depth}(v,v_0)$ is the shortest path length from node v to the root node $v_0$; $D_{\max}$ is the maximum expansion depth; and $\varepsilon_{\mathrm{valid}}$ is the set of valid edges.
c2: For each node v of the dynamic subgraph $G_i$, construct a multi-dimensional comprehensive node feature $x_v=[\mathrm{cyclo}(v),\ \mathrm{change\_cnt}(v),\ \mathrm{cov}(v),\ \mathrm{embed}(v)]^{\top}$, where $x_v\in\mathbb{R}^{d_x}$ and $d_x=3+d$ is the total dimension of the node features; $\mathrm{cyclo}(v)\in[0,1]$ is the cyclomatic complexity of the function; $\mathrm{change\_cnt}(v)\in[0,1]$ is the number of code changes within a specified time period; $\mathrm{cov}(v)\in[0,1]$ is the test coverage; and $\mathrm{embed}(v)\in\mathbb{R}^{d}$ is the semantic embedding of the function or module.
c3: Use the Graph Attention Network (GAT) to fuse the defect panorama embedding $h_i$ with the node features $x_v$, compute the attention score of each node as its root cause candidate score $\alpha_v$, and filter out low-scoring nodes to obtain the root cause candidate node set $C_i=\{v\in V_i\mid \alpha_v\ge\epsilon\}$, where $\alpha_v\in(0,1)$ is the root cause candidate score of node v, satisfying $\sum_{v\in V_i}\alpha_v=1$; $a\in\mathbb{R}^{d+d_x}$ is the learnable attention vector of the graph attention; $W_1\in\mathbb{R}^{(d+d_x)\times d}$ and $W_2\in\mathbb{R}^{(d+d_x)\times d_x}$ are learnable feature projection matrices; the attention computation also uses a temperature coefficient; $\epsilon$ is the scoring threshold; and $C_i$ is the root cause candidate node set.
c4: Based on the Large Language Model (LLM), generate the root cause hypotheses $\{H_k\}_{k=1}^{K}$ only within the root cause candidate node set $C_i$, in combination with the non-generative prior constraints $N_i$.
c5: For each generated root cause hypothesis $H_k$, verify the consistency of $H_k$ with the code facts against the test predicate library ρ, compute the violation degree $V(H_k)$, and remove the hallucinated hypotheses whose violation degree exceeds the threshold $V_{\max}$ to obtain the feasible root cause hypothesis set $H_i=\{H_k\mid V(H_k)\le V_{\max}\}$. Here the test predicate $P_r$ is the r-th code fact determination rule; R is the total number of code fact determination rules defined in the test predicate library ρ; $P_r(H_k)$ indicates whether hypothesis $H_k$ satisfies the code fact defined by $P_r$; $\lambda_r\in(0,1)$ is the predicate weight, satisfying $\sum_{r=1}^{R}\lambda_r=1$; $H_i$ is the set of feasible root cause hypotheses; and $V_{\max}$ is the violation threshold.
The operations in the post-processing module include:
d1: From the node attention score $\alpha_v$ and the hypothesis violation degree $V(H_k)$, initialize the belief distribution of the root cause hypotheses and obtain the initial belief $\pi_k^{(0)}$, where $\pi_k^{(0)}$ is the initial belief value of the k-th hypothesis and the initial beliefs lie in $\Delta^{K}$, that is, they are non-negative and satisfy $\sum_{k=1}^{K}\pi_k^{(0)}=1$; $v(H_k)$ denotes the dynamic subgraph node corresponding to root cause hypothesis $H_k$, and $\alpha_{v(H_k)}$ is the root cause candidate score of that node; β is the preset violation penalty coefficient; and $\exp(-\beta\cdot V(H_k))$ is the penalty term based on the violation degree.
d2: Define the observation feedback of the human review channel and the system channel as $O_{\mathrm{obs}}=\{O_{\mathrm{human}},O_{\mathrm{sys}}\}$, where $O_{\mathrm{human}}$ is the subjective human observation, namely the annotation results of R&D engineers on the root cause hypotheses, expressed as a corrected belief distribution in $\Delta^{K}$; and $O_{\mathrm{sys}}$ is the objective system observation: the code is corrected according to the detection results output by the constraint reasoning module, and reproduction of the defect is then tested in a specified environment. The reproduction result is represented by a binary variable $y\in\{0,1\}$, where 0 indicates that the defect was not reproduced and 1 indicates that the defect was reproduced.
d3: For the human channel, beliefs are updated using projected KL minimization, which yields the confidence probability $\pi_k^{(t+1)}$ that the k-th root cause hypothesis is the true root cause at iteration step t+1, where $\pi^{(t)}$ is the confidence probability that the root cause hypothesis is the true root cause at the t-th iteration step and γ is the smoothing coefficient.
d4: For the objective observations of the system channel, beliefs are updated using likelihood weighting, which transforms objective observations of the system environment into probability weights. Here $L(y\mid H_k)$ is the likelihood function, representing the probability of observing the result y under the condition that hypothesis $H_k$ holds; it is calibrated by the model's historical accuracy, where $\rho_{\mathrm{correct}}\in(0,1)$ is the historical root cause localization accuracy of the model, estimated from the confusion matrix.
The post-processing module also includes lightweight model parameter adaptation and historical knowledge base update. In the lightweight parameter adaptation, only the core sub-modules of the model are updated in small steps using low-rank LoRA adaptation, while the parameters of the main body of the large model and of the encoders are frozen: $W^{(t+1)}=W^{(t)}+\Delta W$, $\Delta W=A\cdot B^{\top}$, where $W^{(t)}$ is the weight matrix of the module to be updated; $\Delta W$ is the weight increment, obtained from the labeled data of the dual-channel feedback; and $A\in\mathbb{R}^{d\times r}$ and $B\in\mathbb{R}^{d\times r}$ are low-rank matrices of rank r. In the historical knowledge base update, the defect panorama embedding $h_i$ of the current defect, together with the labeled true root cause index $k_i$, is added to the historical defect knowledge base $H^*$.
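The belief handling in steps d1 to d4 can be sketched as follows. The update formulas themselves are not recoverable from this text, so every functional form below is an assumption: the initial belief is taken proportional to α · exp(−β·V), which matches the terms named in d1; the human channel is modeled as a γ-weighted mixture of the current belief and the engineer's corrected distribution; and the system channel multiplies the belief by a likelihood in which a fix based on the true root cause prevents reproduction (y = 0) with probability ρ_correct. Function names and numeric values are hypothetical.

```python
import math


def normalise(values):
    """Rescale non-negative values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def initial_belief(alpha, violation, beta=2.0):
    """ASSUMED d1 form: pi_k^(0) proportional to alpha_k * exp(-beta * V_k)."""
    return normalise([a * math.exp(-beta * v) for a, v in zip(alpha, violation)])


def human_update(belief, human, gamma=0.5):
    """ASSUMED d3 form: smooth the belief towards the engineer's distribution."""
    return normalise([(1 - gamma) * b + gamma * h for b, h in zip(belief, human)])


def system_update(belief, tested, y, p_correct=0.8):
    """ASSUMED d4 form: Bayes-style likelihood weighting.

    `tested` is the index of the hypothesis whose fix was applied;
    y = 0 means the defect was not reproduced afterwards, y = 1 means it was.
    """
    weighted = []
    for k, b in enumerate(belief):
        if k == tested:
            likelihood = p_correct if y == 0 else 1 - p_correct
        else:
            likelihood = 1 - p_correct if y == 0 else p_correct
        weighted.append(b * likelihood)
    return normalise(weighted)


# Three hypotheses with made-up candidate scores and violation degrees.
pi0 = initial_belief(alpha=[0.5, 0.3, 0.2], violation=[0.1, 0.0, 0.6])
# An engineer favours the second hypothesis.
pi1 = human_update(pi0, human=[0.2, 0.7, 0.1])
# The fix for the second hypothesis is applied and the defect no longer reproduces.
pi2 = system_update(pi1, tested=1, y=0)
print(pi0, pi1, pi2)
```

Under these assumptions, both channels move probability mass towards the second hypothesis, while the high violation degree of the third hypothesis already suppresses it at initialization.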
[0008] This application provides an automatic defect analysis method that constructs a multi-source evidence set comprising at least four types of heterogeneous data to describe defects from different dimensions, thereby improving the completeness and representativeness of sample features. In the evidence fusion module, while the multi-source heterogeneous evidence is preprocessed, non-generative prior constraints are generated by retrieving the historical defect knowledge base; these constrain the generation range of subsequent root cause hypotheses and thus suppress hallucinations. In the constraint reasoning module, a dynamic fault propagation subgraph is constructed with the defect anomaly point as its root. A graph attention network (GAT) fuses the defect panorama embedding $h_i$ with the node features $x_v$ to compute the root cause candidate score of each node; filtering out low-scoring nodes prunes the hypothesis generation space in a structured way and keeps only high-probability root cause candidate nodes, which improves the accuracy of the results. Root cause hypotheses are generated only within this candidate set. For each generated hypothesis, its consistency with the code facts is verified against the test predicate library, the violation degree is quantified, and hallucinated hypotheses whose violation degree exceeds a threshold are eliminated. The root cause output in this application is not a single conclusion but a set of root cause hypotheses accompanied by a belief distribution. In the post-processing module, a dual-channel feedback mechanism quantifies the credibility of each hypothesis in probabilistic form. The initial belief is jointly determined by the graph attention score and the violation degree; subsequent dual-channel feedback continuously corrects the posterior, calibrating the beliefs and improving the accuracy of the final judgment. The method effectively reduces the probability of hallucinations in the model's predictions and improves the accuracy of defect analysis based on large language models.
Attached Figure Description
[0009] Figure 1 is a schematic diagram of the system structure of the defect analysis model; Figure 2 shows Example 1 of multi-source heterogeneous evidence; Figure 3 shows Example 2 of multi-source heterogeneous evidence; Figure 4 shows an example of training sample data labels; Figure 5 shows an example of a standardized JSON data structure for a single training sample; Figure 6 shows an example of the system prompt for the training samples; Figure 7 shows an example of the user input for the training samples; Figure 8 shows an example of the output portion of a training sample; Figure 9 shows an example of a defect report output by the model.
Detailed Implementation
[0010] This application includes an automatic defect analysis method, which includes the following steps.
[0011] S1: Construct the global input $I_i=\{\varepsilon_i, G, H^*, \rho\}$, where $\varepsilon_i$ is the multi-source evidence set corresponding to the i-th input data, and the total number of evidence types M in the multi-source evidence set is greater than or equal to 4; G is the basic graph structure of the program; $H^*$ is the historical defect knowledge base; and ρ is the test predicate library.
[0012] The global input construction method includes the following steps.
[0013] a1: The basic graph structure of the program is G=(V,E), where V is the set of nodes and E is the set of edges. The program base graph G is constructed through static program analysis and is a directed graph used to represent the internal logical relationships and data flow of the software code. Each node in V represents a basic unit of code that executes independently and has logical boundaries in the software; nodes are the physical carriers of defects.
[0014] The types of node V include: function nodes and method nodes, such as userLogin(), dataParse(), and paymentCheck(), which are the most core node types; Class nodes, module nodes, and interface nodes, such as: UserService, OrderController, DBConnection; Code blocks and critical path nodes, such as loop bodies, exception handling blocks, and core branch logic; Resource nodes, such as configuration files, third-party libraries, database connection pools, and cache instances.
[0015] Edge E: Represents the semantic relationship between nodes and is the path carrier for fault propagation.
[0016] The specific types of edge E are determined based on the relationships between nodes in the actual code. In this embodiment, the edge types include: synchronous call edges, data dependency edges, inheritance edges, implementation dependency edges, instantiation edges, initialization edges, asynchronous callback edges, and log printing edges. Examples are as follows: 1) Synchronous call edge: When function A executes, it directly calls function B. It must wait for B to return before it can continue. This is the path where faults are most easily propagated. Judgment methods: Direct function calls appear in the code; blocking execution and return value dependencies exist; it belongs to a strong control flow dependency; Example: UserService.login() → PasswordEncoder.encode(); The "→" symbol represents a directed edge. When UserService.login() is executed, it directly calls PasswordEncoder.encode() and needs to wait for the return of PasswordEncoder.encode().
[0017] 2) Data-dependent edges: The output, variable, or result of node A serves as the input of node B, with data transmission, a clear data flow direction, and faults can propagate along with the data; Judgment method: There is variable passing, parameter assignment, and return value usage; the data flow is from A to B; it belongs to strong data dependency; Example: paramCheck() → dataParse() → dbInsert(); The output of paramCheck() is used as the input of dataParse(), and the output of dataParse() is used as the input of dbInsert().
[0018] 3) Inheritance or implementation dependency edge: an inheritance, implementation, or composition relationship between classes or interfaces. Example: AbstractService → UserServiceImpl; UserServiceImpl inherits from AbstractService. AbstractService provides a general business logic skeleton or partial implementation, and UserServiceImpl extends or overrides specific methods on this basis.
[0019] 4) Instantiation or initialization edge: an object creation, resource loading, or configuration injection relationship.
[0020] Example: ConfigLoader → DBConnection; The ConfigLoader is responsible for reading configuration files (such as database URL, username, password, etc.) and passing this configuration information to the DBConnection, or using this configuration information to trigger the instantiation of the DBConnection.
[0021] 5) Asynchronous callback edge: A initiates an asynchronous task and continues execution directly without waiting for B to complete; the execution sequence is decoupled, and the failure propagation is weak and has a low probability.
[0022] Judgment method: The code contains async / await, thread, callback, and future; there is no blocking wait or immediate return dependency; it belongs to weak control flow / weak data flow.
[0023] 6) Log printing edge: Used only for outputting logs and debugging information, without affecting business logic, transmitting data, or changing program state.
[0024] Judgment method: Only log(), print(), debug() and other logging functions are called; there is no business calculation, no state modification, and no call to key logic; it belongs to the auxiliary edge with no fault propagation capability.
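The typed edges described above feed the bounded-depth expansion that builds the dynamic fault propagation subgraph. A minimal sketch follows, under two assumptions that this text does not state: log printing edges are the only edge type excluded from the valid set, and the expansion follows edges in both directions so that callers and callees of the anomaly point are both reached. The edges `dbInsert → DBConnection` and `dbInsert → Logger.debug` are hypothetical additions to the examples given in the text.

```python
from collections import deque
from enum import Enum


class EdgeType(Enum):
    SYNC_CALL = "synchronous call"
    DATA_DEP = "data dependency"
    INHERITANCE = "inheritance"
    IMPLEMENTATION = "implementation dependency"
    INSTANTIATION = "instantiation"
    INITIALIZATION = "initialization"
    ASYNC_CALLBACK = "asynchronous callback"
    LOG_PRINT = "log printing"


# ASSUMPTION: only log printing edges are treated as unable to propagate faults.
VALID_EDGES = set(EdgeType) - {EdgeType.LOG_PRINT}


def expand_subgraph(edges, root, max_depth, valid_types=VALID_EDGES):
    """Breadth-first expansion from the anomaly node, bounded by max_depth."""
    adjacency = {}
    for src, dst, etype in edges:
        if etype in valid_types:
            adjacency.setdefault(src, []).append(dst)
            adjacency.setdefault(dst, []).append(src)  # direction ignored (assumption)
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand beyond the depth bound
        for neighbour in adjacency.get(node, []):
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    kept = [(s, t, e) for s, t, e in edges
            if e in valid_types and s in depth and t in depth]
    return set(depth), kept


edges = [
    ("paramCheck", "dataParse", EdgeType.DATA_DEP),
    ("dataParse", "dbInsert", EdgeType.DATA_DEP),
    ("dbInsert", "DBConnection", EdgeType.SYNC_CALL),    # hypothetical edge
    ("ConfigLoader", "DBConnection", EdgeType.INSTANTIATION),
    ("dbInsert", "Logger.debug", EdgeType.LOG_PRINT),    # hypothetical edge
    ("UserService.login", "PasswordEncoder.encode", EdgeType.SYNC_CALL),
]
nodes_d1, edges_d1 = expand_subgraph(edges, root="dbInsert", max_depth=1)
nodes_d2, edges_d2 = expand_subgraph(edges, root="dbInsert", max_depth=2)
print(sorted(nodes_d1), sorted(nodes_d2))
```

Weakly propagating edges such as asynchronous callbacks are kept in this sketch; a stricter valid set could be passed through `valid_types` without changing the traversal.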
[0025] During the training phase, this application constructs training sample data using historical defect data, and uses these training samples as input data for model training. During the inference phase, it constructs input data based on received defect-related data and feeds it into the trained defect analysis model for defect analysis. Therefore, in the following explanations of some parameters of the input data, the terms "sample" and "defect sample" will be used to define the specific parameters.
[0026] a2: Construct the multi-source evidence set as $\varepsilon_i=\{e_{i,m}\}_{m=1}^{M}$, where $e_{i,m}$ is the evidence of the m-th type corresponding to the i-th defect sample and M is the total number of evidence types.
[0027] This application designs the multi-source heterogeneous defect evidence set ε as the data structure that describes the original evidence of a defect. The specific value of M and the corresponding evidence types are defined according to the actual situation. In this embodiment M=4: m=1 corresponds to the stack log; m=2 to code changes, i.e., revision history; m=3 to test coverage, such as 80% test coverage; and m=4 to the dependency topology.
[0028] According to the preset functional definitions, the functions where bugs occur are identified and marked as: functions to be statistically analyzed. During implementation, the system is decomposed into functions through static analysis of the system and program. The size and boundaries of specific functions are statistically analyzed based on historical data. The functions of an intelligent driving system include: cameras, radar, lidar, maps and positioning, adaptive cruise control (ACC), lane keeping assist (LKA), automatic emergency braking (AEB), automatic parking, voice commands, air conditioning, navigation, music playback, windows, lights, etc.
[0029] The stack log is the slice log submitted when a defect is reported; the specific submission content is implemented based on existing technology. Code changes refer to the record of changes to the code of the function to be analyzed, from the version in which a defect was last detected to the version in which the current defect was detected; this data is collected from the code revision history. Code change records are not required for functions being tested for the first time. Test coverage refers to the test coverage of the code of the function to be analyzed in the version where the defect was found; in practice, it is collected from test log files. The dependency topology refers to the calling relationships between nodes V in the code of the function to be analyzed; this data is collected based on the relationships defined in the program base graph G.
[0030] In addition to the four types of evidence mentioned above, other types of evidence can be added based on the actual situation of the system. For example, system monitoring metrics can be used as evidence when m=5. System monitoring metrics refer to real-time numerical indicators of the system's operating status within a certain period before and after the bug occurs, such as: CPU utilization, memory usage, interface response time, request throughput, error rate, timeout rate, number of active users, CAN bus load rate, etc., which are automatically collected by the monitoring system.
[0031] a3: Construct a historical defect knowledge base: $H^* = \{(h_j^*, k_j^*)\}_{j=1}^{N}$, where $h_j^*$ is the fusion embedding of the j-th historical defect case with an already labeled root cause; $k_j^*$ is the root cause index corresponding to the j-th historical defect case; and N is the total number of historical defect cases in the historical defect knowledge base.
[0032] In this application, the root cause refers to the most original and fundamental code, configuration, and logical error that ultimately leads to the occurrence of a bug. The root cause indicates a single source, not a superficial phenomenon. Specifically, locating the root cause involves finding exactly which line / section of code, which configuration, or which dependency is causing the problem. The root cause is not a superficial phenomenon found during bug analysis, such as null pointer exceptions, timeouts, crashes, or error messages, but rather a specific location in the code where a fundamental reason, such as a missed null check, incorrect parameters, or a logical oversight, causes the bug.
[0033] a4: Construct the test predicate library: $\rho = \{P_r\}_{r=1}^{R}$, where $P_r$ is a test predicate representing the r-th code fact decision rule; R is the total number of code fact decision rules defined in the test predicate library; the code fact decision rules are used to verify whether the content of a root cause hypothesis $H_k$ matches the facts in the code. A root cause hypothesis $H_k$ includes: the defect location node and a description of the cause of the defect.
[0034] In practical applications, each test predicate $P_r$ is configured as a code fact-checking rule that automatically outputs a Boolean result, and is used to verify whether a root cause hypothesis is consistent with objective facts such as code structure, runtime status, change history, and test coverage. Input of $P_r$: a root cause hypothesis $H_k$, whose content includes the location node and a description of the defect cause. Output of $P_r$: true or false; true indicates that the hypothesis conforms to the facts, and false indicates that it does not. Function: to quantify the consistency between hypotheses and code facts, and to automatically identify and eliminate unreasonable hypotheses (hallucination hypotheses).
[0035] The content of the test predicates is derived from the historical defects, common fault types, language characteristics, and architectural features of the target software system, from which high-frequency code fact types that can be determined automatically are extracted.
[0036] S2: Construct the global output $O_i = \{\tilde{H}_i, \Pi_i, U_i\}$. $O_i$ includes two parts, root cause hypothesis results and system update data, and supports R&D work order integration and model iteration optimization.
[0037] Here, $\tilde{H}_i$ represents the set of root cause hypotheses corresponding to the i-th sample, $\tilde{H}_i = \{H_k\}_{k=1}^{K}$; the root cause hypotheses in the set are arranged in descending order of confidence level; $H_k$ represents the k-th root cause hypothesis corresponding to the i-th input sample, and K represents the total number of root cause hypotheses in the root cause hypothesis set.
[0038] A root cause hypothesis $H_k$ represents one candidate root cause answer generated from the i-th input data, containing root cause location information and an explanation of the defect's cause. It is a single judgment of "which code segment and what error caused the defect." The root cause hypothesis set is the unsorted set formed by collecting all candidate root cause answers, and contains all root cause candidates to be verified. $\tilde{H}_i$ is the sorted root cause hypothesis set: an ordered list obtained by sorting that hypothesis set from high to low confidence after predicate verification and elimination of hallucination hypotheses.
[0039] $\Pi_i = \{\pi_k^{(t)}\}_{k=1}^{K}$, $\Pi_i \in \Delta^K$, represents the root cause belief distribution corresponding to the i-th sample; $\pi_k^{(t)}$ represents the root cause belief, i.e., the confidence probability that the k-th root cause hypothesis is the true root cause at the t-th iteration step, with $\pi_k^{(t)} \in [0,1]$; the initial belief $\pi_k^{(0)}$ represents the original confidence level before the belief iteration update (t=0).
[0040] $\Delta^K$ represents the set of mathematical constraints that the root cause belief distribution must satisfy. Specifically, the constraints are: non-negativity of every value, and all values summing to 1.
[0041] $U_i = \{h_i, k_i, \Delta\Theta\}$ is the system update data, where $h_i$ is the defect panorama embedding of the current i-th defect, $k_i$ is the true root cause index after annotation, and $\Delta\Theta$ is the parameter increment for lightweight model updates.
[0042] In this application, the current defect fusion embedding $h_i$ serves as a global digital feature fingerprint of the defect, uniquely representing the comprehensive semantic and structural information of the current defect. The annotated true root cause index $k_i$ indicates to the inference model the true location of the defect in the code, such as the node location.
[0043] S3: Construct the evidence fusion module. The evidence fusion module preprocesses the multi-source heterogeneous evidence $\varepsilon_i$ corresponding to each input data to generate comprehensive features characterizing the current defect, denoted as the defect panorama embedding $h_i$. At the same time, it searches the historical defect knowledge base $H^*$ to generate non-generative prior constraints $N_i$, which constrain the range of subsequent root cause hypotheses. Specifically, preprocessing includes: unified encoding, uncertainty quantification, and inverse-uncertainty weighted fusion.
[0044] The input to the evidence fusion module is $\text{Input}_A = \{\varepsilon_i, H^*\}$, and the output is $\text{Output}_A = \{h_i, N_i, \{\omega_{i,m}\}_{m=1}^{M}\}$, where $\omega_{i,m}$ represents the reliability weight of the m-th type of evidence corresponding to the i-th input data.
[0045] The operations in the evidence fusion module include: b1: Map original evidence of different dimensions and types to a real-valued space of unified dimension to eliminate modal heterogeneity: $\tilde{e}_{i,m} = \mathrm{Enc}_m(e_{i,m})$, where $e_{i,m} \in \mathbb{R}^{d_m}$ is the original evidence of the m-th type of the i-th sample; $d_m$ is the dimension of the original evidence of the m-th type; $\mathrm{Enc}_m(\cdot)$ is a modality-specific encoder designed for each type of evidence; and $\tilde{e}_{i,m} \in \mathbb{R}^{d}$ is the encoded embedding of the m-th type of evidence, with all modalities unified into d dimensions.
[0046] In this embodiment, four different types of original evidence are included, and a modality-specific encoder $\mathrm{Enc}_m(\cdot)$ is constructed for each type. The modality-specific encoders include: Transformer-type models, such as BERT or LLaMA, for stack and log text, with text data as input; for example, Figure 2 shows a stack trace evidence fragment with m=1, for which the BERT encoder outputs a 768-dimensional vector. CodeBERT or GraphCodeBERT models for code changes; for example, Figure 2 shows a code change diff snippet with m=2, for which the CodeBERT encoder outputs a 768-dimensional vector. A single-layer MLP model for test coverage, with a coverage vector as input; for example, Figure 3 shows test coverage evidence with m=3, for which the single-layer MLP outputs a 768-dimensional vector. Graph encoders, such as GCN or GraphSAGE models, for the dependency topology, with a graph structure constructed from static code analysis as input; for example, Figure 3 shows a dependency topology evidence fragment with m=4, for which the GCN outputs a 768-dimensional vector.
[0047] b2: The evidence calibration head quantifies the uncertainty of each modality on the current sample to obtain the confidence center embedding $\mu_{i,m}$ and the reliability weight $\omega_{i,m}$. Inverse-uncertainty weighting is used to automatically downweight high-noise evidence and upweight high-reliability evidence.
[0048] $(\mu_{i,m}, \tau_{i,m}) = g_m(\tilde{e}_{i,m})$, where $g_m(\cdot)$ is a specific calibration head for the m-th type of evidence. The structure of the specific calibration head includes an input layer, a hidden layer, and an output layer; the dimension of the hidden layer is d / 2. Each head is trained only on its own modality and adapts to the noise characteristics of that type of evidence.
[0049] $\mu_{i,m} \in \mathbb{R}^{d}$ is the confidence center embedding of the m-th type of evidence and represents the effective features of the evidence. The confidence center is the core content that remains after noise is removed from a piece of evidence, that is, the clean feature that the model actually uses to determine the root cause of the bug. Evidence of the same type contains noise and fluctuation; the specific calibration head extracts the most stable and useful core features as the confidence center, which supports the accuracy of the inference model's results.
[0050] $\tau_{i,m}$ is the log-variance uncertainty of the m-th type of evidence: $\tau_{i,m} = \log(\sigma_{i,m}^2)$, where $\sigma_{i,m}^2$ is the variance of the evidence embedding. The larger $\tau_{i,m}$ is, the noisier and less reliable the evidence; after training, the specific calibration head $g_m(\cdot)$ directly outputs $\tau_{i,m}$ for each piece of evidence.
[0051] $\omega_{i,m} \in (0,1)$ represents the reliability weight of the m-th type of evidence corresponding to the i-th input data, satisfying the normalization constraint $\sum_{m=1}^{M} \omega_{i,m} = 1$. The reliability weight is calculated as $\omega_{i,m} = \exp(-\tau_{i,m}) / \sum_{m'=1}^{M} \exp(-\tau_{i,m'})$: the larger the uncertainty τ, the smaller exp(-τ), and therefore the smaller the weight ω. Based on this formula, the model adaptively assigns greater weights to more reliable evidence. Because $\omega_{i,m}$ is learned automatically from the uncertainty of the evidence itself, highly reliable evidence is automatically upweighted and highly noisy evidence is automatically downweighted.
[0052] This application integrates four or more types of heterogeneous evidence (stack traces, code change diffs, test coverage, dependency topology, etc.) and quantifies the uncertainty $\tau_{i,m}$ of each type through an evidence-specific calibration head. Inverse-uncertainty weighting, $\omega_{i,m} = \exp(-\tau_{i,m}) / \sum_{m'=1}^{M} \exp(-\tau_{i,m'})$, achieves adaptive weight allocation: higher uncertainty results in a lower weight, automatically biasing the fusion result towards high-quality evidence and mathematically guaranteeing the optimal signal-to-noise ratio. The four types of evidence describe defects from different dimensions: logs reflect runtime anomalies, diffs reflect code changes, coverage reflects testing blind spots, and topology reflects propagation paths. Together they form a comprehensive defect profile and improve the completeness and representativeness of sample features. The quality of each type of evidence varies greatly across defect scenarios; for example, some defects have detailed logs but no code changes, while others have clear coverage reports but incomplete logs. The inverse-uncertainty weighting mechanism enables the model to identify automatically which evidence is credible and which is noisy, reducing the weight of high-noise evidence and increasing the weight of high-reliability evidence, so that low-quality evidence does not contaminate the fusion result. This mechanism is data-driven and adaptive: it requires no manual weight setting and allows the evidence modalities to be extended flexibly.
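As an illustration of the inverse-uncertainty weighting described above, the following minimal Python sketch computes ω = exp(-τ) / Σ exp(-τ) for one sample. Since τ = log(σ²), exp(-τ) equals 1/σ², so this amounts to inverse-variance weighting. The function name `reliability_weights` and the example τ values are assumptions made for illustration; in the described method the τ values would be output by the calibration heads.

```python
import math


def reliability_weights(log_variances):
    """Return omega_m = exp(-tau_m) / sum over m' of exp(-tau_m')."""
    if not log_variances:
        raise ValueError("at least one evidence type is required")
    # Shift by the smallest tau before exponentiating for numerical
    # stability; the shift cancels in the ratio.
    tau_min = min(log_variances)
    scores = [math.exp(-(tau - tau_min)) for tau in log_variances]
    total = sum(scores)
    return [score / total for score in scores]


# Hypothetical log-variances for: stack log, code diff, coverage, topology.
taus = [-1.0, 0.0, 2.0, 0.5]
weights = reliability_weights(taus)
for tau, weight in zip(taus, weights):
    print(f"tau={tau:+.1f} -> omega={weight:.3f}")
```

The evidence type with the smallest τ receives the largest weight, and adding a further evidence type only requires appending one more τ to the list.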
[0053] b3: Multi-evidence weighted fusion: The confidence center embeddings of each modality are fused according to the reliability weights to generate the defect panorama embedding $h_i$, the comprehensive feature characterizing the current defect: $h_i = \sum_{m=1}^{M} \omega_{i,m}\,\phi(\mu_{i,m})$, where $\phi(\cdot)$ is a linear projection function that finely aligns the embeddings of each modality into the same feature space: $\phi(x) = W_p x + b_p$, where $W_p \in \mathbb{R}^{d \times d}$ and $b_p \in \mathbb{R}^{d}$ are the learnable parameters of the projection function. $h_i \in \mathbb{R}^{d}$ is the comprehensive embedding of the defect and provides unified feature input for subsequent reasoning; it addresses the misalignment of heterogeneous evidence features, making the fusion more principled and learnable.
[0054] In this application, the specific calibration head $g_m(\cdot)$ extracts the confidence center embedding $\mu_{i,m}$ for each type of evidence, and the extracted $\mu_{i,m}$ are then fused by weighting, which reduces noise interference at the source and improves the quality of the defect features. In addition, the reliability weight $\omega_{i,m}$ used in the weighted fusion is learned automatically from the inherent uncertainty $\tau_{i,m}$ of the evidence rather than defined manually, so the calculation is more consistent with the characteristics of the data itself and subsequent results are more accurate.
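The weighted fusion step can be sketched as follows, assuming the fusion is a weighted sum of linearly projected confidence center embeddings with projection W_p x + b_p. The helper names `project` and `fuse`, the toy dimension d=2, and all numeric values are hypothetical; a real system would use learned parameters and d=768.

```python
def project(W_p, b_p, x):
    """Linear projection W_p x + b_p for a d x d matrix and a d-vector."""
    return [
        sum(w * xj for w, xj in zip(row, x)) + b
        for row, b in zip(W_p, b_p)
    ]


def fuse(mus, weights, W_p, b_p):
    """Weighted sum of the projected confidence center embeddings."""
    if len(mus) != len(weights):
        raise ValueError("one weight is required per evidence type")
    h = [0.0] * len(b_p)
    for mu, weight in zip(mus, weights):
        projected = project(W_p, b_p, mu)
        for j, value in enumerate(projected):
            h[j] += weight * value
    return h


# Toy example with two evidence types and d = 2 (illustrative values only).
identity = [[1.0, 0.0], [0.0, 1.0]]
h_i = fuse(
    mus=[[1.0, 0.0], [0.0, 1.0]],
    weights=[0.75, 0.25],
    W_p=identity,
    b_p=[0.0, 0.0],
)
print(h_i)
```

Because the weights sum to 1, the bias term b_p contributes exactly once to the fused embedding regardless of how many evidence types are present.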
[0055] b4: Historical defect comparison retrieval, used as a prior constraint. The cosine similarity $s_{ij}$ between the current defect embedding and each historical defect embedding is calculated, the L most similar cases are retrieved, and the non-generative prior constraints $N_i$ are generated to constrain the range of root cause hypotheses that can subsequently be generated: $s_{ij} = \dfrac{h_i^{\top} h_j^*}{\lVert h_i \rVert\,\lVert h_j^* \rVert}$; $N_i = \{h_j^* \mid \mathrm{rank}(s_{ij}) \le L\}$, where $h_j^* \in \mathbb{R}^{d}$ is the fusion embedding of the j-th historical defect case in the historical defect knowledge base; $s_{ij} \in [-1, 1]$ is the cosine similarity between the current i-th defect and the j-th historical defect, and a larger value indicates greater similarity. L is the number of similar cases; its value is set according to historical data and computational efficiency. In this embodiment, L ∈ [5, 20], which balances the strength of the prior constraints against computational efficiency. rank() is the ranking function: $\mathrm{rank}(s_{ij})$ sorts the similarities in descending order, the L largest $s_{ij}$ are selected, and their corresponding $h_j^*$ are taken as the most similar historical cases to construct $N_i$. $N_i$ is thus a set consisting of the L most similar historical defects, and it serves as the prior input to the subsequent constraint reasoning module for the i-th input data.
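A minimal sketch of the top-L retrieval in step b4 is given below. It assumes the knowledge base is held in memory as a list of (embedding, root cause index) pairs; the function names and the toy two-dimensional embeddings are hypothetical. Returning the root cause index alongside each embedding is a convenience of this sketch and is not part of the definition of N_i.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def retrieve_prior(h_i, knowledge_base, top_l):
    """Return the top_l most similar cases as (s_ij, h_j_star, k_j_star)."""
    scored = [
        (cosine_similarity(h_i, h_star), h_star, k_star)
        for h_star, k_star in knowledge_base
    ]
    # Sort by similarity only, largest first.
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_l]


# Toy knowledge base: (fusion embedding, labeled root cause index).
knowledge_base = [
    ([1.0, 0.0], 7),
    ([0.0, 1.0], 3),
    ([1.0, 1.0], 5),
    ([-1.0, 0.0], 9),
]
for s_ij, _, k_star in retrieve_prior([1.0, 0.1], knowledge_base, top_l=2):
    print(f"root cause index {k_star}: similarity {s_ij:.3f}")
```

For a large knowledge base, the linear scan would typically be replaced by an approximate nearest-neighbour index; that choice is outside what the text specifies.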
[0056] The evidence fusion module quantifies the uncertainty $\tau_{i,m}$ of each type of evidence: reliable evidence is automatically upweighted, while unreliable evidence is automatically downweighted. The fusion result retains only the high-confidence features $\mu_{i,m}$, so the model reasons solely on the core features of the evidence, which effectively reduces hallucinations caused by noise.
[0057] This application constructs a reasoning model based on a large language model and uses real historical defect cases as priors, forcing the large language model to reason only within the distribution range of real historical root causes. Non-generative constraints built from real knowledge lock the reasoning space and suppress hallucinations at their source. In the evidence fusion module, historical cases are retrieved by similarity to the defect panorama embedding $h_i$ rather than by simple keyword matching, which captures deep logical similarities and yields more accurate retrieval results. The non-generative prior constraints are constructed through the historical defect comparison retrieval mechanism as follows. First, the cosine similarity between the current defect panorama embedding $h_i$ and each embedding $h_j^*$ in the historical defect knowledge base is calculated to obtain the semantic similarity $s_{ij}$ between defects. Then, after the current defect has been matched against historically labeled and verified real defects, the top-L most similar historical cases are extracted to construct the set $N_i$ of similar historical defects. This configurable prior construction ensures the validity of the priors without expanding the search space, balancing accuracy and efficiency. Finally, $N_i$ establishes a known range of true root causes: this set of real cases serves as the prior and forces the subsequent constraint reasoning module to generate hypotheses only within the code areas where similar historical root causes are located, disallowing unfounded, out-of-range, or unconventional hypotheses. Verified real knowledge thus constrains the direction of subsequent reasoning.
[0058] S4: Construct the constraint reasoning module. The constraint reasoning module takes the defect anomaly as the root and constructs a dynamic fault propagation subgraph $G_i$ for each input data. A graph attention network (GAT) combines the defect fusion embedding $h_i$ with the node features $x_v$ to calculate the root cause candidate score $\alpha_v$, $v \in G_i$, of each node. Filtering out low-scoring nodes achieves structured pruning of the hypothesis generation space and eliminates irrelevant nodes, yielding the root cause candidate node set. Then, based on the large language model, root cause hypotheses are generated only within the root cause candidate node set. For each generated root cause hypothesis $H_k$, consistency with the code facts is verified through the test predicate library ρ, the violation degree $V(H_k)$ is quantitatively calculated, and hallucination hypotheses whose violation degree exceeds a threshold are eliminated. The input to the constraint reasoning module is $\text{Input}_B = \{h_i, N_i, G, \rho\}$, and the output is $\text{Output}_B = \{h_i, \alpha_v, \{V(H_k)\}_{k=1}^{K}\}$.
[0059] The operations in the constraint reasoning module include: c1: Construction of the dynamic fault propagation subgraph. Starting from the defect anomaly root node $v_0$, bounded depth expansion and effective edge filtering are performed on the program base graph G to generate a dynamic subgraph $G_i$ containing only nodes related to fault propagation. The defect or exception root node $v_0$ is, for example, the top frame of the stack or the assertion failure point. $G_i = \{(v, e) \in G \mid \mathrm{depth}(v, v_0) \le D_{\max},\ e \in \varepsilon_{\text{valid}}\}$, where $G_i = (V_i, E_i)$ is the dynamic fault propagation subgraph of the i-th sample and is a subset of G; $V_i$ is the set of nodes in $G_i$; $E_i$ is the set of edges in $G_i$; $\mathrm{depth}(v, v_0)$ is the shortest path length from node v to the root node $v_0$; and $D_{\max}$ is the maximum expansion depth.
[0060] The root cause of a software defect is usually not at the point of error, but rather propagates downstream from an upstream node along the program's control and data flow. Bounded depth expansion ensures that the search scope covers all potential root cause nodes within a 3-5 call depth. Industry experience shows that over 95% of root causes are within 5 call depths of the point of error. Effective edge filtering is based on the semantic type of the edges.
[0061] This method uses $D_{\max}$ to control the depth of the subgraph. The value of $D_{\max}$ is set based on computational efficiency and on empirical data about root nodes and bug occurrence nodes in historical data, to avoid excessively large subgraphs that would reduce inference efficiency. In this embodiment, $D_{\max} \in [3, 5]$. $\varepsilon_{\text{valid}}$ is the set of valid edges, defined based on the relationships between nodes in the actual code or statistically from historical data. For example: synchronous call edges are the easiest paths for fault propagation and are valid edges; faults can propagate with data, so data dependency edges are valid edges; asynchronous tasks are decoupled in execution time, so fault propagation along them is weak and improbable, and asynchronous callback edges are defined as invalid edges in this embodiment; log functions only output logs and debugging information without affecting business logic, transmitting data, or changing program state, so log printing edges are defined as invalid edges. Therefore, data dependency edges and synchronous call edges are retained, while irrelevant nodes reached only through invalid edges, such as asynchronous callback edges and log printing edges, are removed. Removing these edges ensures that the subgraph retains only the "inevitable path" of faults, improving the accuracy of the candidate node set.
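The bounded depth expansion with edge filtering in step c1 can be sketched as a breadth-first search, as shown below. This is a minimal sketch under stated assumptions: the base graph is an adjacency dictionary whose edges point from a node toward the nodes a fault could have come from, the edge type labels (`sync_call`, `data_dependency`, `async_callback`, `log_print`) are hypothetical names for the edge categories in this embodiment, and the nodes `PathPlanner`, `Logger`, and `Telemetry` are invented for the example.

```python
from collections import deque

# Edge types retained in this embodiment (label strings are hypothetical).
VALID_EDGE_TYPES = frozenset({"sync_call", "data_dependency"})


def build_subgraph(base_graph, root, d_max, valid_types=VALID_EDGE_TYPES):
    """Breadth-first expansion from root, keeping valid edges up to d_max.

    base_graph maps node -> list of (neighbor, edge_type).
    Returns (depth_by_node, kept_edges).
    """
    depth = {root: 0}
    kept_edges = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if depth[node] == d_max:
            continue  # nodes at the depth limit are kept but not expanded
        for neighbor, edge_type in base_graph.get(node, []):
            if edge_type not in valid_types:
                continue  # drop async callback and log printing edges
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
            kept_edges.append((node, neighbor, edge_type))
    return depth, kept_edges


BASE_GRAPH = {
    "ParkingAPP": [("PathPlanner", "sync_call"), ("Logger", "log_print")],
    "PathPlanner": [
        ("SteeringActuator", "data_dependency"),
        ("Telemetry", "async_callback"),
    ],
}

depths, edges = build_subgraph(BASE_GRAPH, "ParkingAPP", d_max=3)
print(depths)
```

Because breadth-first search visits nodes in order of distance, the recorded depth of each node is its shortest path length from the root, matching the definition of depth(v, v0).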
[0062] Real-world software systems may contain thousands or even tens of thousands of function / module nodes. Searching for the root cause across the entire graph would be computationally expensive and would easily introduce irrelevant interference. By using a dynamic fault propagation subgraph, this method compresses the search scope from the entire graph, including all nodes, to a subgraph bounded by the maximum depth $D_{\max}$. The dynamic fault propagation subgraph typically retains only 10-30 highly relevant nodes, which significantly improves inference efficiency. Unlike the simple "keyword matching" method in existing technologies, this method preserves the structured semantics of the code (who calls whom, where the data flows from and to) through the fault propagation subgraph, enabling the model to understand "how the fault propagates along the call chain" rather than only "where the error occurred." For example, logs show that ParkingAPP reports an error, but the fault propagation graph reveals that the true root cause lies in the upstream SteeringActuator. The dynamic subgraph in this application is constructed in real time for each defect, ensuring strong relevance to the current defect while avoiding the inefficiency of a full graph search.
[0063] c2: Node feature construction. For each node in the dynamic subgraph $G_i$, a multi-dimensional comprehensive node feature $x_v$ is constructed, integrating code structure, change, coverage, and semantic information to provide a basis for graph attention reasoning: $x_v = [\mathrm{cyclo}(v), \mathrm{change\_cnt}(v), \mathrm{cov}(v), \mathrm{embed}(v)]^{\top}$.
[0064] In the formula, cyclo(v)∈[0,1] represents the normalized cyclomatic complexity of the function, characterizing the code complexity; change_cnt(v)∈[0,1]: the number of code changes within a specified time period. In this embodiment, the specified time period is 30 days, and the value of the number of code changes in the past 30 days is normalized. The more frequent the changes, the higher the probability of defects. cov(v)∈[0,1] represents the test coverage, which represents the proportion of code (function / code block) corresponding to node v that is executed by unit test cases. It can be automatically obtained from the code or project file; the lower the coverage, the higher the probability of defects.
[0065] $\mathrm{embed}(v) \in \mathbb{R}^{d}$ is the semantic embedding of a function or module: the high-dimensional semantic vector of code node v (function / module / class) encoded by the code pre-trained model CodeBERT, encoded independently in the node feature construction step of the constraint reasoning module. Specifically: the encoding takes place after the dynamic fault propagation subgraph $G_i$ is constructed; the encoding object is each code node v in the subgraph; the encoder is a code-specific pre-trained model built on CodeBERT; and the encoder output is a vector embed(v) of fixed dimension d, where d is the dimension into which the evidence fusion module unifies all types of evidence modalities.
[0066] cyclo(v), change_cnt(v), and cov(v) are three explicit handcrafted features, and embed(v) is a d-dimensional semantic embedding, so $x_v \in \mathbb{R}^{d_x}$, where $d_x = 3 + d$ is the total dimension of the node feature $x_v$.
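The node feature layout can be illustrated with a short sketch. The text states that the handcrafted features are normalized to [0, 1] but does not specify how; the clipped min-max normalization below is therefore an assumption, as are the function names and example values. The embedding is passed in as a plain list standing in for the CodeBERT output.

```python
def min_max_normalize(value, low, high):
    """Scale value into [0, 1] by min-max normalization, clipping outliers."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return min(1.0, max(0.0, (value - low) / (high - low)))


def node_feature(cyclo, change_cnt, cov, embed):
    """Build x_v = [cyclo(v), change_cnt(v), cov(v), embed(v)], length 3 + d."""
    for name, value in (("cyclo", cyclo), ("change_cnt", change_cnt), ("cov", cov)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be normalized to [0, 1]")
    return [cyclo, change_cnt, cov, *embed]


# Illustrative raw values: 15 changes in the last 30 days, assuming a
# historical maximum of 30 changes; cyclomatic complexity 12 out of 40.
x_v = node_feature(
    cyclo=min_max_normalize(12, 0, 40),
    change_cnt=min_max_normalize(15, 0, 30),
    cov=0.25,
    embed=[0.1, -0.2],  # stand-in for a d-dimensional CodeBERT embedding
)
print(len(x_v), x_v)
```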
[0067] c3: Graph attention pruning of candidate nodes. A graph attention network (GAT) combines the defect fusion embedding $h_i$ with the node feature $x_v$ and calculates an attention score for each node to obtain the root cause candidate score $\alpha_v$. Filtering out low-scoring nodes achieves structured pruning of the hypothesis generation space, resulting in the root cause candidate node set $C_i$: $C_i = \{v \in V_i \mid \alpha_v \ge \epsilon\}$, where $\alpha_v \in (0,1)$ is the root cause candidate score of node v, and the scores of all nodes sum to 1: $\sum_{v \in V_i} \alpha_v = 1$; $a \in \mathbb{R}^{d+d_x}$ is the learnable attention vector of the graph attention; and the scoring threshold $\epsilon$ is obtained from historical data statistics. In this embodiment, $\epsilon \in [0.05, 0.1]$.
[0068] $W_1 \in \mathbb{R}^{(d+d_x) \times d}$ and $W_2 \in \mathbb{R}^{(d+d_x) \times d_x}$ are learnable feature projection matrices; $\sqrt{d}$ is the temperature coefficient, where d is the dimension into which the evidence fusion module unifies all types of evidence modalities. Using the square root of the dimension as the temperature coefficient alleviates the distortion of attention scores caused by high-dimensional features and improves the numerical stability and training convergence of the model.
[0069] $C_i$ is the set of root cause candidate nodes. Pruning candidate nodes by graph attention yields a set $C_i$ containing only nodes with "abnormalities in multiple dimensions", which singles out the code modules most likely to cause defects and significantly reduces the hypothesis generation space. In this embodiment, the number of nodes in $C_i$ is controlled between 10 and 20.
[0070] The root cause candidate score $\alpha_v$ is an association score calculated by concatenating the node's own features $x_v$ with the defect panorama embedding $h_i$ and applying the learnable projection matrices and attention vector. Through the attention mechanism, GAT assigns each node in the graph an importance score $\alpha_v$ related to the current defect. The core idea is that not all code modules are equally likely to be root causes: only nodes with "high code complexity, frequent recent changes, low test coverage, and semantic features highly correlated with the defect description" are high-probability root cause candidates.
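The scoring and pruning in step c3 can be sketched as follows. The exact scoring formula is not reproduced in the text above, so this sketch makes explicit assumptions: the logit of node v is a · LeakyReLU(W1 h_i + W2 x_v) / sqrt(d), which is one form consistent with the stated shapes of a, W1, and W2 (summing the two projections is equivalent to projecting the concatenation of h_i and x_v), followed by a softmax over the subgraph nodes. The LeakyReLU activation, the function names, and all numeric values are assumptions for illustration.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def leaky_relu(z, slope=0.2):
    """LeakyReLU activation (the choice of activation is an assumption)."""
    return z if z > 0 else slope * z


def candidate_scores(h_i, node_features, W1, W2, a):
    """Softmax over nodes of a . LeakyReLU(W1 h_i + W2 x_v) / sqrt(d)."""
    d = len(h_i)
    projected_h = matvec(W1, h_i)
    logits = {}
    for node, x_v in node_features.items():
        projected_x = matvec(W2, x_v)
        hidden = [leaky_relu(p + q) for p, q in zip(projected_h, projected_x)]
        logits[node] = sum(ai * hi for ai, hi in zip(a, hidden)) / math.sqrt(d)
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {node: math.exp(z - peak) for node, z in logits.items()}
    total = sum(exps.values())
    return {node: e / total for node, e in exps.items()}


def prune(alpha, threshold):
    """Keep nodes whose candidate score is at least the threshold."""
    return {node for node, score in alpha.items() if score >= threshold}


# Toy setup with d = 1, so d_x = 4 and d + d_x = 5 (illustrative values).
W1 = [[0.1]] * 5
W2 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, -1, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
a = [1.0, 1.0, 1.0, 1.0, 0.0]
features = {
    "A": [0.9, 0.8, 0.1, 0.5],  # complex, frequently changed, poorly covered
    "B": [0.1, 0.1, 0.9, 0.5],  # simple, stable, well covered
}
alpha = candidate_scores([1.0], features, W1, W2, a)
print(alpha, prune(alpha, threshold=0.2))
```

With only two nodes, the toy threshold of 0.2 is used instead of the embodiment's range of 0.05 to 0.1, which is intended for subgraphs with more nodes.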
[0071] c4: Generation of constrained root cause hypotheses. Based on the large language model (LLM), root cause hypotheses $\{H_k\}_{k=1}^{K}$ are generated only within the root cause candidate node set $C_i$, incorporating the non-generative prior constraints $N_i$ composed of historically similar cases. Each hypothesis $H_k$ includes the root cause node location and a natural language explanation of the defect cause, such as: "The func1 function of module A has a null pointer because the return value of the third-party interface was not verified."
[0072] c5: Computable violation test to eliminate hallucinations. For each generated root cause hypothesis $H_k$, consistency with the code facts is verified through the test predicate library ρ and quantified as the violation degree $V(H_k)$. Hallucination hypotheses whose violation degree exceeds the threshold $V_{\max}$ are removed, yielding the feasible root cause hypothesis set $H_i$.
[0073] In this application, the violation degree $V(H_k)$ is a quantifiable score of the degree of inconsistency between the k-th root cause hypothesis $H_k$ and the objective facts in the code: it reflects how many test predicates the hypothesis violates and how much weight they carry. The higher the score, the less the hypothesis conforms to the actual situation in the code, and the more likely it is to be a hallucination hypothesis.
[0074] $V(H_k) = \sum_{r=1}^{R} \lambda_r \cdot \mathbb{1}[\neg P_r(H_k)]$, where $P_r$ is a test predicate representing the r-th code fact decision rule; R is the total number of code fact decision rules defined in the test predicate library ρ; $P_r(H_k)$ indicates whether hypothesis $H_k$ conforms to the code facts defined by $P_r$; and $\lambda_r \in (0,1)$ is the predicate weight, satisfying $\sum_{r=1}^{R} \lambda_r = 1$.
[0075] $\mathbb{1}[\neg P_r(H_k)]$ is the indicator function: it takes the value 1 when the negation of $P_r(H_k)$ is true, that is, when hypothesis $H_k$ does not match the code facts, and 0 otherwise.
[0076] $V(H_k) \in [0,1]$ indicates the degree of violation between hypothesis $H_k$ and the actual code. In the calculation of $V(H_k)$, $P_r(H_k)$ is negated: if the situation assumed by $H_k$ is inconsistent with the actual situation in the code, the result of $P_r(H_k)$ is 0, which becomes 1 after negation and is then multiplied by $\lambda_r$ and accumulated. When the negated value of $P_r(H_k)$ is 0, it does not contribute to the accumulation. Therefore, the larger the value of $V(H_k)$, the more serious the deviation from the actual code. After $V(H_k)$ is calculated, hallucination hypotheses are eliminated using the violation degree threshold $V_{\max}$ to obtain the valid candidate set $H_i$: $H_i = \{H_k \mid V(H_k) \le V_{\max}\}$, where $H_i$ is the set of feasible root cause hypotheses and $V_{\max}$ is the violation degree threshold, whose value is set based on historical data; in this embodiment, $V_{\max} \in [0.2, 0.5]$.
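The violation test can be sketched with predicates represented as Boolean functions. Everything specific in this example is hypothetical: the three predicates, their weights, the code facts (`KNOWN_NODES`, `CHANGED_NODES`, `CALL_PATH`), and the hypotheses are invented for illustration and are not the predicates of Table 1.

```python
# Hypothetical code facts gathered by static analysis and version control.
KNOWN_NODES = {"moduleA.func1", "moduleB.parse"}
CHANGED_NODES = {"moduleA.func1"}
CALL_PATH = {"moduleA.func1", "moduleB.parse"}

# Each predicate takes a hypothesis and returns True if it matches the facts.
PREDICATES = [
    lambda h: h["node"] in KNOWN_NODES,    # the node exists in the code
    lambda h: h["node"] in CHANGED_NODES,  # the node was changed recently
    lambda h: h["node"] in CALL_PATH,      # the node lies on the failing path
]
WEIGHTS = [0.5, 0.2, 0.3]  # predicate weights, summing to 1


def violation_degree(hypothesis, predicates, weights):
    """Sum the weights of all predicates the hypothesis fails."""
    if len(predicates) != len(weights):
        raise ValueError("one weight is required per predicate")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("predicate weights must sum to 1")
    return sum(w for p, w in zip(predicates, weights) if not p(hypothesis))


def feasible_hypotheses(hypotheses, predicates, weights, v_max):
    """Keep hypotheses whose violation degree does not exceed v_max."""
    return [
        h for h in hypotheses
        if violation_degree(h, predicates, weights) <= v_max
    ]


hypotheses = [
    {"node": "moduleA.func1", "cause": "return value not checked for null"},
    {"node": "moduleZ.ghost", "cause": "refers to a function that does not exist"},
    {"node": "moduleB.parse", "cause": "wrong parameter order"},
]
kept = feasible_hypotheses(hypotheses, PREDICATES, WEIGHTS, v_max=0.3)
print([h["node"] for h in kept])
```

The third hypothesis fails only the low-weight "changed recently" predicate, so its violation degree stays under the threshold and it is retained, which illustrates the tolerance for partially unsatisfied predicates.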
[0077] The verification predicates and verification methods in this embodiment are shown in Table 1.
[0078] Table 1: Core Verification Predicates and Automatic Validation Methods
[0079] Through the test predicates, this method transforms the question of "whether the hypothesis is reasonable" into an "automatically computable Boolean decision", thus achieving automatic detection and elimination of hallucinations. The violation degree $V(H_k)$ defined in this method is not a simple "pass / fail" but a weighted continuous value in the range [0,1], which allows the model to tolerate some predicates not being satisfied (because some predicates may not be verified fully and accurately due to tool limitations). A hypothesis is eliminated only when its overall violation exceeds the threshold: when multiple objective facts contradict the hypothesis, the cumulative violation exceeds the threshold, and the hypothesis is judged to be a hallucination and eliminated. This fact-anchoring mechanism does not rely on the model's own judgment but uses externally verifiable objective evidence to constrain the model output, addressing at its source the hallucination problem of "self-consistent but factually wrong" output that occurs in LLM inference.
[0080] This application also includes a post-processing module that maintains a probability-based distribution of root cause beliefs on the feasible root cause hypotheses output by the constraint reasoning module. It integrates dual-channel feedback from R&D review and system environment observation, and updates beliefs according to Bayesian rules. At the same time, it updates the model through lightweight parameter adaptation and supplements the historical knowledge base, forming a closed loop of analysis-feedback-optimization.
[0081] Based on the node attention scores $\alpha_v$ and the hypothesis violation degrees $V(H_k)$, the post-processing module initializes the belief distribution of the root cause hypotheses to obtain the initial belief value $\pi_k^{(0)}$ of the k-th root cause hypothesis. A dual-channel feedback mechanism is designed to update the beliefs based on feedback and to output the updated root cause hypothesis set and belief distribution.
[0082] The input of the post-processing module is $Input_C = \{H_i, \{\alpha_v\}_{v \in G_i}, \{V(H_k)\}_{k=1}^{K}, O_{obs}\}$. It comprises the feasible root cause hypothesis set $H_i$, the root cause candidate scores $\alpha_v$ of the nodes $v \in G_i$ of the dynamic fault propagation subgraph $G_i$ constructed in the constraint reasoning module, the violation degree $V(H_k)$ of each hypothesis $H_k$, and the dual-channel observation feedback $O_{obs}$.
[0083] The output of the post-processing module is $Output_C = \{\Pi_i, U_i\}$: the updated root cause belief distribution $\Pi_i$ and the system update data $U_i$ (fusion embedding, labeled root cause, and parameter increment), which are the core parts of the global output.
[0084] The operations in the post-processing module include:
d1: Initialize the belief distribution so that hypotheses with a high score and a low violation degree start with high confidence. Based on the node attention score $\alpha_v$ and the hypothesis violation degree $V(H_k)$, initialize the belief distribution of the root cause hypotheses to obtain the initial belief $\pi_k^{(0)}$:
$$\pi_k^{(0)} = \frac{\alpha_{v(H_k)} \exp(-\beta \cdot V(H_k))}{\sum_{j=1}^{K} \alpha_{v(H_j)} \exp(-\beta \cdot V(H_j))}$$
In the formula, $\pi^{(0)} \in \Delta^K$, and the initial belief value $\pi_k^{(0)}$ of the k-th hypothesis satisfies the constraints $\pi_k^{(0)} \ge 0$ and $\sum_{k=1}^{K} \pi_k^{(0)} = 1$; $v(H_k)$ is the dynamic subgraph node corresponding to the root cause hypothesis $H_k$, and $\alpha_{v(H_k)}$ is the root cause candidate score of that node; $\beta$ is a preset violation penalty coefficient: the larger $\beta$ is, the heavier the penalty on hypotheses with a high violation degree. The specific value of $\beta$ is obtained from historical data statistics; in this embodiment, $\beta \in [1,5]$. $\exp(-\beta \cdot V(H_k))$ is the exponential penalty term for the violation degree: the higher the violation degree, the smaller this term. $\alpha_{v(H_k)}$ comes from the GAT graph attention and $\exp(-\beta \cdot V(H_k))$ from the predicate test; because the two are multiplied, a hypothesis receives a high initial belief only when the model's judgment is credible and the hypothesis is consistent with the facts at the same time.
[0085] The initial belief $\pi_k^{(0)}$ integrates both the node candidate score and the hypothesis violation degree. The higher the candidate score of node v, the more likely the code node corresponding to hypothesis $H_k$ is the source of the failure; the lower the violation degree of $H_k$, the more consistent the hypothesis is with the facts in the code. This application uses both as the basis for the initial belief, so that credible hypotheses with high scores and low violation receive higher initial confidence, while implausible hypotheses with low scores and high violation are automatically suppressed.
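The two-factor initialization can be sketched in a few lines. The sketch assumes the initial belief is the product of candidate score and exponential violation penalty, normalized over the feasible hypotheses; the candidate scores and the value of β below are hypothetical, and only the violation degrees 0.12 and 0.28 are taken from the embodiment's example.

```python
import math
from typing import List


def initial_beliefs(scores: List[float], violations: List[float], beta: float) -> List[float]:
    """pi_k(0) proportional to alpha_k * exp(-beta * V_k), normalized to sum to 1."""
    if not scores or len(scores) != len(violations):
        raise ValueError("scores and violations must be non-empty and equally long")
    raw = [a * math.exp(-beta * v) for a, v in zip(scores, violations)]
    total = sum(raw)
    if total <= 0:
        raise ValueError("at least one hypothesis needs a positive candidate score")
    return [x / total for x in raw]


# Hypothetical candidate scores and beta; violation degrees from the example.
beliefs = initial_beliefs(scores=[0.6, 0.4], violations=[0.12, 0.28], beta=2.0)
```

A larger β widens the gap between low-violation and high-violation hypotheses, while β close to 0 makes the ranking depend almost entirely on the attention scores.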
[0086] d2: Define the dual-channel observation feedback. The observation feedback of the human review channel and the system channel is defined as $O_{obs} = \{O_{human}, O_{sys}\}$, corresponding to subjective annotation and objective system observation respectively; together they provide the basis for updating beliefs. $O_{human}$ is the human review observation: the annotation of the root cause hypotheses by R&D engineers, represented as a corrected belief distribution in $\Delta^K$. In practice, developers can manually select which hypothesis is the true root cause, and the system automatically converts the selection into a standard probability distribution for the model to learn from; alternatively, the true root cause index $k^*$ can be labeled directly and converted into one-hot form.
[0087] $O_{sys}$ is the objective system observation: the code is fixed according to the detection result output by the constraint reasoning module, and reproduction of the defect is then tested in a specified environment. The reproduction result is represented by a binary variable y, where $y \in \{0,1\}$. y=0 indicates that the defect was not reproduced, that is, hypothesis $H_k$ is accurate and the code fix succeeded; y=1 indicates that the defect was reproduced, that is, hypothesis $H_k$ is inaccurate and the code fix did not succeed. Reproduction environments include the test environment, the gray-release environment, and the production environment; the environment used to reproduce the bug should be selected according to actual needs.
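One way to hold the two feedback channels in code is shown below. This is an illustrative data structure only: the class and field names are hypothetical, and hypothesis indices are assumed to be 0-based.

```python
from dataclasses import dataclass
from typing import List, Optional


def one_hot(k_star: int, num_hypotheses: int) -> List[float]:
    """Convert a labeled true root cause index k* into a one-hot distribution."""
    if not 0 <= k_star < num_hypotheses:
        raise ValueError("k_star is outside the hypothesis set")
    return [1.0 if k == k_star else 0.0 for k in range(num_hypotheses)]


@dataclass
class Observation:
    """Dual-channel feedback O_obs = {O_human, O_sys}; either channel may be absent."""
    human: Optional[List[float]] = None  # corrected belief distribution on the simplex
    reproduced: Optional[int] = None     # y: 0 = defect not reproduced, 1 = reproduced

    def __post_init__(self) -> None:
        if self.human is not None:
            if any(p < 0 for p in self.human) or abs(sum(self.human) - 1.0) > 1e-9:
                raise ValueError("human feedback must be a probability distribution")
        if self.reproduced not in (None, 0, 1):
            raise ValueError("reproduced must be 0 or 1")


# An engineer marks the first of two hypotheses as the true root cause,
# and the fix based on it removes the defect in the chosen environment.
obs = Observation(human=one_hot(0, 2), reproduced=0)
```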
[0088] d3: Belief update in the human review channel: the belief is updated by projected KL minimization, which keeps the updated belief close to the R&D label while avoiding abrupt belief changes caused by labeling errors, thus achieving a smooth update: [formula not reproduced in the source text]. Here, $\pi^{(t)}$ is the confidence probability that each root cause hypothesis is the true root cause at iteration step t; KL is the KL divergence, which measures the difference between two probability distributions (the smaller the value, the more similar the distributions), $KL(p \| q) = \sum_{k=1}^{K} p_k \log(p_k / q_k)$; K is the total number of root cause hypotheses in the root cause hypothesis set; $\gamma$ is the smoothing coefficient: the smaller its value, the more weight is placed on the human review label, and the larger its value, the more of the historical belief is preserved. In this embodiment, $\gamma \in (0,1)$.
[0089] The confidence probability $\pi_k^{(t+1)}$ that the k-th root cause hypothesis is the true root cause at iteration step t+1 is then obtained by calculation: [formula not reproduced in the source text].
[0090] d4: System channel belief update: the belief is updated from the objective system observation by likelihood weighting, which converts the objective observation of the system environment into probability weights and strengthens or weakens each hypothesis belief without human intervention:
$$\pi_k^{(t+1)} = \frac{\pi_k^{(t)} \, L(y \mid H_k)}{\sum_{j=1}^{K} \pi_j^{(t)} \, L(y \mid H_j)}$$
In the formula, $L(y \mid H_k)$ is the likelihood function, the probability of observing the result y given that hypothesis $H_k$ holds, calibrated by the model's historical accuracy: [formula not reproduced in the source text]; $\rho_{correct} \in (0,1)$ is the historical root cause localization accuracy of the model, estimated from the confusion matrix. The specific value of $\rho_{correct}$ is obtained from historical data statistics; in this embodiment, $\rho_{correct} \in [0.8, 0.95]$.
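A likelihood-weighted update of this kind can be sketched as below. The update itself is plain Bayesian reweighting; the likelihood function is an assumption made for illustration, since the likelihood formula is not reproduced in this text. It assumes that one hypothesis (index `tested`) was used for the fix, that a fix based on the true root cause removes the defect with probability ρ_correct, and that a fix based on a wrong hypothesis removes it with probability 1 − ρ_correct.

```python
from typing import List


def likelihoods(y: int, tested: int, num_hypotheses: int, rho: float) -> List[float]:
    """Assumed L(y | H_k); y = 0 means the defect was not reproduced after the fix."""
    if y not in (0, 1):
        raise ValueError("y must be 0 or 1")
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie in (0, 1)")
    # Probability of this observation if the tested hypothesis is the true root cause.
    if_tested_true = rho if y == 0 else 1.0 - rho
    return [if_tested_true if k == tested else 1.0 - if_tested_true
            for k in range(num_hypotheses)]


def likelihood_update(belief: List[float], lik: List[float]) -> List[float]:
    """pi_k(t+1) proportional to pi_k(t) * L(y | H_k), renormalized."""
    weighted = [p * l for p, l in zip(belief, lik)]
    total = sum(weighted)
    if total <= 0:
        raise ValueError("observation has zero probability under every hypothesis")
    return [w / total for w in weighted]


prior = [0.78, 0.22]  # initial beliefs from the embodiment's example
posterior_fixed = likelihood_update(prior, likelihoods(0, tested=0, num_hypotheses=2, rho=0.9))
posterior_failed = likelihood_update(prior, likelihoods(1, tested=0, num_hypotheses=2, rho=0.9))
```

Under these assumptions a successful fix raises the belief in the tested hypothesis, and a failed fix shifts probability mass to the remaining hypotheses instead of discarding the tested one outright.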
[0091] The post-processing module also includes operations such as lightweight model parameter adaptation and historical knowledge base updates.
[0092] Lightweight parameter adaptation of the model: to avoid the high computational cost of full fine-tuning, only the core sub-modules of the model are updated in small steps using low-rank adaptation (LoRA), while the main body of the large model and the underlying encoder parameters are frozen: $W^{(t+1)} = W^{(t)} + \Delta W$; $\Delta W = A \cdot B^T$. In the formula, $W^{(t)}$ is the weight matrix of the module to be updated. In this embodiment, the matrices to be updated are the evidence-specific calibration head $g_m$ and the projection matrix of the graph attention network (GAT): the calibration head $g_m$ controls uncertainty estimation, and the GAT projection matrix controls attention allocation in the graph. This method updates these two matrices from the result of each defect analysis, so the update acts directly on root cause localization accuracy while keeping the parameter set small and the update risk low. The semantic encoding capability of the large model (BERT / CodeBERT) is already learned during pre-training and does not need to be adjusted after each defect analysis.
[0093] $\Delta W$ is the weight increment, trained on the labeled data obtained from dual-channel feedback; $A \in R^{d \times r}$ and $B \in R^{d \times r}$ are low-rank matrices, so that $A \cdot B^T \in R^{d \times d}$, with r much smaller than d. The value of the low-rank dimension r is obtained from historical data statistics; in this embodiment, $r \in [8, 64]$. With low-rank adaptation, each adapted matrix has only 2dr trainable parameters, a fraction 2r/d of the $d^2$ parameters that full fine-tuning of that matrix would update, and less than 1% of the total number of model parameters. Model optimization is completed by updating only this small number of parameters, which greatly reduces training and deployment costs.
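The low-rank update and its parameter saving can be checked with a short sketch. It assumes a square d×d weight matrix with both factors of shape d×r, so that A·Bᵀ is d×d; the tiny matrices and the hidden size d = 768 used in the ratio are hypothetical examples, and plain Python lists stand in for tensors.

```python
from typing import List

Matrix = List[List[float]]


def lora_delta(a: Matrix, b: Matrix) -> Matrix:
    """Delta W = A @ B^T for A, B of shape (d, r); the result has shape (d, d)."""
    r = len(a[0])
    if any(len(row) != r for row in a + b):
        raise ValueError("A and B must share the rank dimension r")
    return [[sum(ai[k] * bj[k] for k in range(r)) for bj in b] for ai in a]


def apply_update(w: Matrix, delta: Matrix) -> Matrix:
    """W(t+1) = W(t) + Delta W, element by element."""
    return [[wij + dij for wij, dij in zip(wr, dr)] for wr, dr in zip(w, delta)]


def trainable_fraction(d: int, r: int) -> float:
    """Trainable share of one d x d matrix: 2*d*r / d^2 = 2r/d."""
    return (2 * d * r) / (d * d)


# Rank-1 update of a 3 x 3 zero matrix (hypothetical values).
A = [[1.0], [2.0], [3.0]]
B = [[1.0], [0.0], [-1.0]]
W0 = [[0.0] * 3 for _ in range(3)]
W1 = apply_update(W0, lora_delta(A, B))
ratio = trainable_fraction(d=768, r=8)  # 16/768, about 2% of that one matrix
```

The ratio 2r/d applies to a single adapted matrix; the share of the whole model is much smaller because the frozen backbone parameters are not touched.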
[0094] Historical knowledge base update: the defect panorama embedding $h_i$ of the current defect and the labeled true root cause index $k_i$ are added to the historical defect knowledge base $H^*$. This lets cases accumulate automatically and strengthens the prior constraint obtained by similarity retrieval in the evidence fusion module.
[0095] This application uses feedback data to update the model sub-modules via LoRA and to supplement the historical knowledge base, constructing a self-growing closed loop of defect analysis, feedback, and optimization. Real defect cases are added to the historical knowledge base in real time, continuously strengthening the prior constraints and enabling long-term adaptive evolution of the system. Each fixed defect is automatically added to the knowledge base, so the prior retrieval of the evidence fusion module, and with it the system's reasoning results, becomes more accurate the longer the system is used.
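The self-growing knowledge base and its similarity retrieval can be illustrated with an in-memory sketch. The cosine-similarity top-L retrieval follows the description of the evidence fusion module; the list-based storage, the function names, and the two-dimensional embeddings are hypothetical simplifications.

```python
import math
from typing import List, Tuple

Case = Tuple[List[float], int]  # (fusion embedding h_j*, root cause index k_j*)


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity s_ij in [-1, 1] between two embeddings."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    if norm == 0:
        raise ValueError("zero-length embedding")
    return dot / norm


def add_case(kb: List[Case], embedding: List[float], root_cause_index: int) -> None:
    """Append a fixed defect (h_i, k_i) to the historical knowledge base."""
    kb.append((list(embedding), root_cause_index))


def retrieve(kb: List[Case], query: List[float], top_l: int) -> List[Case]:
    """Return the L historical cases most similar to the current defect embedding."""
    ranked = sorted(kb, key=lambda case: cosine(query, case[0]), reverse=True)
    return ranked[:top_l]


kb: List[Case] = []
add_case(kb, [1.0, 0.0], 3)
add_case(kb, [0.0, 1.0], 7)
nearest = retrieve(kb, query=[0.9, 0.1], top_l=1)
```

Every case appended by `add_case` becomes a candidate for later retrievals, which is the mechanism by which fixed defects strengthen the prior constraint over time.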
[0096] S5: Construct the defect analysis model. As shown in Figure 1, the defect analysis model includes an evidence fusion module, a constraint reasoning module, and a post-processing module connected in sequence.
[0097] This application proposes a hypothesis belief update and dual-channel feedback mechanism. Through dual-factor initialization, manual and system dual-channel calibration, KL smoothing iteration, LoRA lightweight update and knowledge base self-growth, a self-optimizing system for intelligent bug location is constructed that is closed-loop, robust, efficient and low-cost.
[0098] This method also uses a two-factor initialization based on the node attention score $\alpha_v$ and the hypothesis violation degree $V(H_k)$. It automatically raises the confidence of hypotheses with high scores and low violation and lowers the confidence of those with low scores and high violation, so that hallucinations are suppressed and credible hypotheses are strengthened from the outset. The proposed joint belief initialization mechanism, based on node candidate scores and an exponential violation penalty, integrates model structural features with code fact-checking results to allocate initial beliefs accurately and reasonably, improving the reliability of root cause ranking at the source.
[0099] S6: Construct training sample data based on historical defect data, and train the model to obtain a trained defect analysis model. The structure of the training sample data is: system prompt, global input $I_i$, global output $O_i$. The standard ground truth is used as the label when constructing the training sample data. Figure 4 shows an example of a label: the field `root_cause_node` marks the node where the defect occurred, `root_cause_label` marks the description of the defect phenomenon, `root_cause_category` marks the actual cause of the defect and how it was introduced, `fix_method` marks the defect repair method, and `verified` marks the defect correction result. The standardized JSON data structure for a single training sample is shown in Figure 5.
[0100] An example of the system prompt in a training sample constructed after integrating multi-source heterogeneous evidence is shown in Figure 6; the global input (user input section) is shown in Figure 7; and the input text section is shown in Figure 8. In the example of Figures 6-8, $V_{max}$ = 0.5, and the computable violation test gives:
V(H1) = 0.12 (passes, below the threshold of 0.5): setAngle() coverage is only 25% (P2 is satisfied), and there have been recent code changes (P3 is satisfied);
V(H2) = 0.28 (passes, below the threshold of 0.5): can_send_with_retry() is a new function (P3 is satisfied);
V(H3) = 0.55 (rejected, exceeding the threshold of 0.5): TrajectoryController coverage is 67% (P2 is not satisfied), and there are no recent changes (P3 is not satisfied).
The retained hypothesis set is therefore Hi = {H1, H2}. Belief initialization: the confidence $\pi_k^{(t)}$ is written as pi_k in the embodiment; pi_1(0) = 0.78, pi_2(0) = 0.22.
[0101] The final output is shown in Figure 9. This method outputs not a single root cause conclusion but a set of root cause hypotheses {H1, H2, ..., HK}, each accompanied by a belief value pi_k (range [0, 1], satisfying Sum(pi_k) = 1) that quantifies the confidence in the hypothesis in probabilistic form. In practical applications, engineers can decide the order of investigation from the confidence levels instead of facing a single conclusion that may be right or wrong and gives them no basis for judgment. This method models root cause localization as a Bayesian inference problem: the root cause hypothesis is a random variable, and the belief distribution is the posterior probability. The initial belief is jointly determined by the graph attention score and the violation degree, and the posterior is continuously corrected through dual-channel feedback. This probabilistic output reflects the fact that the root cause of a defect is uncertain: in complex systems the same phenomenon may be caused by multiple factors, and the probability distribution expresses exactly this uncertainty.
[0102] S7: Deploy the trained defect analysis model in the defect management system. Based on the received defect-related data, the system constructs a global input $I_i$ for each defect and feeds it into the defect analysis model. The root cause hypothesis set, belief distribution, and system update data output by the model form the output data $O_i$, which is the inference result for the input data. The inference results can also be integrated with systems such as R&D work orders and automated regression testing.
[0103] The following section uses experimental data to illustrate the performance advantages of this method. Experimental setup: Experimental dataset: A total of 862 historically fixed defects were collected from three industrial projects (intelligent parking system, order management system, and distributed storage platform). These defects were divided into a training set (603 defects), a validation set (130 defects), and a test set (129 defects) in a 7:1.5:1.5 ratio. Each sample contains complete evidence from all four categories and manually labeled true root causes.
[0104] Baseline comparison: (1) LLM-Direct: Directly input logs and descriptions into a large language model without any constraints, freely generating root causes; (2) Keyword-Match: Traditional defect analysis based on keyword matching and rule templates; (3) RAG-Base: Root cause generation is enhanced by retrieving historical cases using only log text; (4) MEU-HBUF (this method): Complete three-module process (MEU + DPG-CR + HBUF).
[0105] Evaluation indicators: Top-1 accuracy (the proportion of defects for which the first-ranked hypothesis is the true root cause); Top-3 recall (the proportion of defects for which the true root cause appears among the top 3 hypotheses); hallucination rate (the proportion of output hypotheses that clearly contradict the facts in the code); Expected Calibration Error (ECE, the average gap between the model's predicted confidence and its actual accuracy).
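The ranking and calibration indicators can be computed as in the following sketch. It uses the common equal-width binning form of ECE with 10 bins, which is an assumption, since the binning used in the experiments is not stated; all sample values are hypothetical.

```python
from typing import List, Sequence


def top_k_hit_rate(ranked: Sequence[Sequence[int]], truth: Sequence[int], k: int) -> float:
    """Share of defects whose true root cause is among the top-k ranked hypotheses."""
    hits = sum(1 for r, t in zip(ranked, truth) if t in r[:k])
    return hits / len(truth)


def expected_calibration_error(conf: Sequence[float], correct: Sequence[bool],
                               bins: int = 10) -> float:
    """Bin-weighted average of |mean confidence - accuracy| over equal-width bins."""
    buckets: List[List[int]] = [[] for _ in range(bins)]
    for i, c in enumerate(conf):
        if not 0.0 <= c <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")
        buckets[min(int(c * bins), bins - 1)].append(i)
    ece = 0.0
    for idx in buckets:
        if not idx:
            continue
        avg_conf = sum(conf[i] for i in idx) / len(idx)
        accuracy = sum(1 for i in idx if correct[i]) / len(idx)
        ece += len(idx) / len(conf) * abs(avg_conf - accuracy)
    return ece


# Hypothetical rankings (hypothesis indices, best first) and true root causes.
ranked = [[2, 0, 1], [1, 2, 0]]
truth = [2, 0]
top1 = top_k_hit_rate(ranked, truth, k=1)
top3 = top_k_hit_rate(ranked, truth, k=3)
# Hypothetical top-1 confidences and whether each top-1 hypothesis was correct.
ece = expected_calibration_error([0.9, 0.9, 0.6, 0.6], [True, True, True, False])
```

A well-calibrated belief distribution gives a low ECE: hypotheses reported with 0.8 confidence should turn out to be the true root cause about 80% of the time.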
[0106] The experimental results are shown in Table 2 below.
[0107] Table 2: Comparison of Experimental Results
[0108] The data in Table 2 show that this method has a clear advantage across the evaluation indicators, most notably the hallucination rate, which this method reduces significantly.
[0109] To verify the independent contribution of each innovative module, ablation experiments were conducted by removing one component from each of the three modules of this method in turn. The experimental results are shown in Table 3.
[0110] Table 3: Ablation Experiment Data
[0111] Table 3 compares the complete model constructed by this method (first row) with three ablated variants. In the second row, the uncertainty-weighted constraint in the evidence fusion module is removed, so all evidence types are fused with equal weights; noisy data then interferes with the result, raising the hallucination rate and lowering the accuracy. In the third row, graph attention candidate node pruning in the constraint reasoning module is removed, so the LLM reasons freely and the hallucination rate rises significantly. In the fourth row, the dual-channel feedback in the post-processing module is removed; because the evidence fusion module and the constraint reasoning module are still present, both accuracy and hallucination rate are better than in the second and third rows, but without a feedback channel for improvement they remain worse than those of the complete model in the first row.
Claims
1. An automatic defect analysis method, characterized in that it includes the following steps:
S1: Construct the global input $I_i = \{\varepsilon_i, G, H^*, \rho\}$, where $\varepsilon_i$ is the multi-source evidence set corresponding to the i-th input data, and the total number of evidence types M in the multi-source evidence set is greater than or equal to 4; G is the basic graph structure of the program; $H^*$ is the historical defect knowledge base; $\rho$ is the test predicate library; and $I_i$ is the global input corresponding to the i-th input data;
S2: Construct the global output $O_i = \{\tilde{H}_i, \Pi_i, U_i\}$, where $\tilde{H}_i = \{H_k\}_{k=1}^{K}$ is the root cause hypothesis set corresponding to the i-th sample, $H_k$ is the k-th root cause hypothesis corresponding to the i-th sample, and K is the total number of root cause hypotheses in the root cause hypothesis set; $O_i$ is the global output corresponding to the i-th sample; $\Pi_i = \{\pi_k^{(t)}\}_{k=1}^{K}$, $\Pi_i \in \Delta^K$, is the root cause belief distribution corresponding to the i-th sample; $\pi_k^{(t)}$ is the confidence probability that the k-th root cause hypothesis is the true root cause at iteration step t; $\Delta^K$ denotes the set of mathematical constraints that the root cause belief distribution must satisfy, namely non-negativity and all values summing to 1; $U_i = \{h_i, k_i, \Delta\Theta\}$ is the system update data corresponding to the i-th sample, where $h_i$ is the defect panorama embedding of the current i-th sample, $k_i$ is the labeled true root cause index corresponding to the i-th sample, and $\Delta\Theta$ is the parameter increment of the model's lightweight update;
S3: Construct the evidence fusion module; the evidence fusion module preprocesses and fuses the multi-source heterogeneous evidence set $\varepsilon_i$ corresponding to each input data to generate a comprehensive feature representing the current defect, denoted the defect panorama embedding $h_i$; at the same time, it retrieves the historical defect knowledge base $H^*$ to generate the non-generative prior constraint $N_i$, which constrains the range in which subsequent root cause hypotheses can be generated;
S4: Construct the constraint reasoning module; for each input data, the constraint reasoning module constructs a dynamic fault propagation subgraph $G_i$ rooted at the defect anomaly point; uses a graph attention network (GAT) to fuse the defect panorama embedding $h_i$ with the node features $x_v$ and calculate the root cause candidate score $\alpha_v$, $v \in G_i$, of each node; filters out low-scoring nodes to achieve structured pruning of the hypothesis generation space, eliminating irrelevant nodes to obtain the root cause candidate node set; then generates root cause hypotheses with the large language model only within the root cause candidate node set; and, for each generated root cause hypothesis $H_k$, verifies its consistency with the actual code against the test predicate library $\rho$, quantitatively calculates the violation degree $V(H_k)$, and eliminates hallucinated hypotheses whose violation degree exceeds a threshold;
S5: Construct a defect analysis model, which includes an evidence fusion module and a constraint reasoning module connected in sequence;
S6: Construct training sample data based on historical defect data, and train the model to obtain the trained defect analysis model; the structure of the training sample data includes the global input $I_i$ and the global output $O_i$;
S7: Deploy the trained defect analysis model in the defect management system; based on the received defect-related data, the system constructs a global input $I_i$ for each defect and feeds it into the defect analysis model, and the model's output is used to construct the output data $O_i$, which is the inference result for the input data.
2. The automatic defect analysis method according to claim 1, characterized in that: the defect analysis model also includes a post-processing module, which is arranged after the constraint reasoning module; the post-processing module initializes the belief distribution of the root cause hypotheses based on the node attention score $\alpha_v$ and the hypothesis violation degree $V(H_k)$, obtaining the initial belief value $\pi_k^{(0)}$ of the k-th root cause hypothesis; a dual-channel feedback mechanism updates the beliefs based on feedback and outputs the updated root cause hypothesis set and belief distribution.
3. The automatic defect analysis method according to claim 1, characterized in that the global input is constructed by the following steps:
a1: The basic graph structure of the program is G = (V, E), where V denotes the nodes and E the edges; the basic graph structure of the program is a directed graph that represents the internal logical connections and data flow of the software code; a node in V represents a basic unit of code in the software that executes independently and has logical boundaries, and is the physical carrier of defects; an edge in E represents the semantic relationship between nodes, and is the path carrier of fault propagation;
a2: Construct the multi-source evidence set $\varepsilon_i = \{e_{i,m}\}_{m=1}^{M}$, where $e_{i,m}$ is the evidence of the m-th type corresponding to the i-th defect sample and M is the total number of evidence types; the evidence types include: stack log, code change, test coverage, and dependency topology; the function in which the defect occurred is located according to a preset function definition and recorded as the function under analysis; the stack log is the slice log submitted when the defect is reported; the code change is the change to the code of the function under analysis from the version in which a defect was last detected to the version in which the defect was detected this time, and no code change record is required for a function that is being tested for the first time; the test coverage is the test coverage of the code of the function under analysis in the version in which the defect was found; the dependency topology is the calling relationship between the nodes V in the code of the function under analysis;
a3: Construct the historical defect knowledge base $H^* = \{(h_j^*, k_j^*)\}_{j=1}^{N}$, where $h_j^*$ is the fusion embedding of the j-th historical defect case whose root cause has already been labeled, $k_j^*$ is the root cause index corresponding to the j-th historical defect case, and N is the total number of historical defect cases in the historical defect knowledge base;
a4: Construct the test predicate library $\rho = \{P_r\}_{r=1}^{R}$, where the test predicate $P_r$ is the r-th code fact determination rule and R is the total number of code fact determination rules defined in the test predicate library; the code fact determination rules are used to verify whether the content of a root cause hypothesis $H_k$ matches the facts in the code; a root cause hypothesis $H_k$ includes the node at which the defect is located and a description of the cause of the defect.
4. The automatic defect analysis method according to claim 3, characterized in that: The types of edges include: synchronous call edges, data dependency edges, inheritance edges, implementation dependency edges, instantiation edges, initialization edges, asynchronous callback edges, and log printing edges; The types of nodes include: function nodes, method nodes, class nodes, module nodes, interface nodes, code block nodes, critical path nodes, and resource nodes.
5. The automatic defect analysis method according to claim 1, characterized in that the operations in the evidence fusion module include:
b1: Map all original evidence into a real-valued space of unified dimension to eliminate modal heterogeneity: $\tilde{e}_{i,m} = Enc_m(e_{i,m})$, where $e_{i,m} \in R^{d_m}$ is the original evidence of the m-th type of the i-th sample and $d_m$ is the dimension of the original evidence of the m-th type; $Enc_m(\cdot)$ is the modality-specific encoder designed for that evidence type; $\tilde{e}_{i,m} \in R^{d}$ is the encoded embedding of the m-th type of evidence, with all modalities unified to d dimensions;
b2: Quantify the uncertainty of each modality on the current sample with the evidence calibration head to obtain the confidence center embedding $\mu_{i,m}$ and the reliability weight $\omega_{i,m}$: $(\mu_{i,m}, \tau_{i,m}) = g_m(\tilde{e}_{i,m})$; $\omega_{i,m} = \exp(-\tau_{i,m}) / \sum_{m'=1}^{M} \exp(-\tau_{i,m'})$; where $g_m(\cdot)$ is the calibration head specific to the m-th type of evidence, consisting of an input layer, a hidden layer, and an output layer, with a hidden layer dimension of d/2; $\mu_{i,m} \in R^{d}$ is the confidence center embedding of the m-th type of evidence, which characterizes the valid features of the evidence; $\tau_{i,m}$ is the log-variance uncertainty of the m-th type of evidence corresponding to the i-th input data; $\tau_{i,m'}$ is the log-variance uncertainty of the m'-th type of evidence corresponding to the i-th input data; $\omega_{i,m} \in (0,1)$ is the reliability weight of the m-th type of evidence corresponding to the i-th input data, satisfying the normalization constraint $\sum_{m=1}^{M} \omega_{i,m} = 1$;
b3: Fuse the confidence center embeddings of the modalities according to their reliability weights to generate the defect panorama embedding $h_i$, the comprehensive feature that characterizes the current defect: $h_i = \phi(\sum_{m=1}^{M} \omega_{i,m} \mu_{i,m})$; where $\phi(x) = W_p x + b_p$ is a linear projection function, $W_p \in R^{d \times d}$ and $b_p \in R^{d}$ are the learnable parameters of the projection function, and $h_i \in R^{d}$ is the defect panorama embedding;
b4: Calculate the cosine similarity $s_{ij}$ between the current defect embedding and the historical defect embeddings, and retrieve the L most similar cases to generate the non-generative prior constraint $N_i$, which constrains the range in which subsequent root cause hypotheses can be generated: $s_{ij} = \frac{h_i \cdot h_j^*}{\|h_i\| \, \|h_j^*\|}$; $N_i = \{h_j^* \mid rank(s_{ij}) \le L\}$; where $h_j^* \in R^{d}$ is the fusion embedding of the j-th historical defect case in the historical defect knowledge base; $s_{ij} \in [-1,1]$ is the cosine similarity between the defect corresponding to the current i-th input data and the j-th historical defect; L is the number of similar cases; rank() is the ranking function; $N_i$ is the non-generative prior constraint for the i-th input data.
6. The automatic defect analysis method according to claim 5, characterized in that: the modality-specific encoders include Transformer-type models for stack and log text, CodeBERT or GraphCodeBERT models for code changes, single-layer MLP models for test coverage, and graph encoders for the dependency topology.
7. The automatic defect analysis method according to claim 1, characterized in that the operations in the constraint reasoning module include:
c1: Starting from the defect anomaly root node $v_0$, perform bounded-depth expansion and valid-edge filtering on the program base graph G to generate a dynamic subgraph $G_i$ containing only nodes related to fault propagation: $G_i = \{(v, e) \in G \mid depth(v, v_0) \le D_{max},\ e \in \varepsilon_{valid}\}$; where $G_i = (V_i, E_i)$ is the dynamic fault propagation subgraph of the i-th sample and is a subset of G; $V_i$ is the set of nodes in $G_i$; $E_i$ is the set of edges in $G_i$; $depth(v, v_0)$ is the shortest path length from node v to the root node $v_0$; $D_{max}$ is the maximum expansion depth; $\varepsilon_{valid}$ is the set of valid edges;
c2: Construct a multi-dimensional comprehensive node feature $x_v$ for each node v of the dynamic subgraph $G_i$: $x_v = [cyclo(v), change\_cnt(v), cov(v), embed(v)]^T$; where $x_v \in R^{d_x}$ and $d_x = 3 + d$ is the total dimension of the node features; $cyclo(v) \in [0,1]$ is the cyclomatic complexity of the function; $change\_cnt(v) \in [0,1]$ is the number of code changes within a specified time period; $cov(v) \in [0,1]$ is the test coverage; $embed(v) \in R^{d}$ is the semantic embedding of the function or module;
c3: Use the graph attention network (GAT) to fuse the defect panorama embedding $h_i$ with the node features $x_v$, calculate the attention score of each node as its root cause candidate score $\alpha_v$, and filter out low-scoring nodes to obtain the root cause candidate node set $C_i$: [formula for $\alpha_v$ not reproduced in the source text]; $C_i = \{v \in V_i \mid \alpha_v \ge \epsilon\}$; where $\alpha_v \in (0,1)$ is the root cause candidate score of node v, satisfying $\sum_{v \in V_i} \alpha_v = 1$; a is the learnable attention vector of the graph attention, $a \in R^{d+d_x}$; $W_1 \in R^{(d+d_x) \times d}$ and $W_2 \in R^{(d+d_x) \times d_x}$ are learnable feature projection matrices; the score formula also contains a temperature coefficient; $\epsilon$ is the scoring threshold; $C_i$ is the root cause candidate node set;
c4: Based on the large language model (LLM), generate the root cause hypotheses $\{H_k\}_{k=1}^{K}$ only within the root cause candidate node set $C_i$, in combination with the non-generative prior constraint $N_i$;
c5: For each generated root cause hypothesis $H_k$, verify the consistency of $H_k$ with the code facts against the test predicate library $\rho$, calculate the violation degree $V(H_k)$, and remove the hallucinated hypotheses whose violation degree exceeds the threshold $V_{max}$ to obtain the feasible root cause hypothesis set $H_i$: $V(H_k) = \sum_{r=1}^{R} \lambda_r \, (1 - P_r(H_k))$; where the test predicate $P_r$ is the r-th code fact determination rule; R is the total number of code fact determination rules defined in the test predicate library $\rho$; $P_r(H_k) \in \{0,1\}$ indicates whether hypothesis $H_k$ satisfies the code fact defined by $P_r$; $\lambda_r \in (0,1)$ is the predicate weight, satisfying $\sum_{r=1}^{R} \lambda_r = 1$; $H_i = \{H_k \mid V(H_k) \le V_{max}\}$; where $H_i$ is the feasible root cause hypothesis set and $V_{max}$ is the violation threshold.
8. The automatic defect analysis method according to claim 2, characterized in that the operations in the post-processing module include:
d1: Based on the node attention score $\alpha_v$ and the hypothesis violation degree $V(H_k)$, initialize the belief distribution of the root cause hypotheses to obtain the initial belief $\pi_k^{(0)}$: $\pi_k^{(0)} = \frac{\alpha_{v(H_k)} \exp(-\beta \cdot V(H_k))}{\sum_{j=1}^{K} \alpha_{v(H_j)} \exp(-\beta \cdot V(H_j))}$; where $\pi^{(0)} \in \Delta^K$, and the initial belief value $\pi_k^{(0)}$ of the k-th hypothesis satisfies the constraints $\pi_k^{(0)} \ge 0$ and $\sum_{k=1}^{K} \pi_k^{(0)} = 1$; $v(H_k)$ is the dynamic subgraph node corresponding to the root cause hypothesis $H_k$; $\alpha_{v(H_k)}$ is the root cause candidate score of that node; $\beta$ is the preset violation penalty coefficient; $\exp(-\beta \cdot V(H_k))$ is the penalty term based on the violation degree;
d2: Define the observation feedback of the human review channel and the system channel as $O_{obs} = \{O_{human}, O_{sys}\}$; where $O_{human}$ is the human review observation, the annotation of the root cause hypotheses by R&D engineers, represented as a corrected belief distribution in $\Delta^K$; $O_{sys}$ is the objective system observation: the code is fixed according to the detection result output by the constraint reasoning module, and reproduction of the defect is then tested in a specified environment; the reproduction result is represented by a binary variable y, $y \in \{0,1\}$, where 0 indicates that the defect was not reproduced and 1 indicates that the defect was reproduced;
d3: Update the belief in the human review channel by projected KL minimization; the confidence probability $\pi_k^{(t+1)}$ that the k-th root cause hypothesis is the true root cause at iteration step t+1 is obtained by calculation: [formula not reproduced in the source text]; where $\pi^{(t)}$ is the confidence probability that each root cause hypothesis is the true root cause at iteration step t, and $\gamma$ is the smoothing coefficient;
d4: Update the belief from the objective system observation by likelihood weighting, which converts the objective observation of the system environment into probability weights: $\pi_k^{(t+1)} = \frac{\pi_k^{(t)} \, L(y \mid H_k)}{\sum_{j=1}^{K} \pi_j^{(t)} \, L(y \mid H_j)}$; where $L(y \mid H_k)$ is the likelihood function, the probability of observing the result y given that hypothesis $H_k$ holds, calibrated by the model's historical accuracy: [formula not reproduced in the source text]; $\rho_{correct} \in (0,1)$ is the historical root cause localization accuracy of the model, estimated from the confusion matrix.
9. The automatic defect analysis method according to claim 8, characterized in that: the post-processing module also performs lightweight model parameter adaptation and historical knowledge base update; the lightweight parameter adaptation of the model is as follows: only the core sub-modules of the model are updated in small steps using low-rank adaptation (LoRA), while the parameters of the main body of the large model and of the encoder are frozen: $W^{(t+1)} = W^{(t)} + \Delta W$; $\Delta W = A \cdot B^T$; where $W^{(t)}$ is the weight matrix of the module to be updated; $\Delta W$ is the weight increment, obtained from the labeled data of the dual-channel feedback; $A \in R^{d \times r}$ and $B \in R^{d \times r}$ are low-rank matrices and r is the low-rank dimension; the historical knowledge base update adds the defect panorama embedding $h_i$ of the current defect and the labeled true root cause index $k_i$ to the historical defect knowledge base $H^*$.