Software exception report adaptive construction method based on deep semantic analysis
By employing hierarchical semantic parsing and adaptive construction methods, the problem of semantic mismatch in software anomaly reports has been solved, enabling more accurate and stable anomaly localization and report generation, thereby improving the quality and efficiency of anomaly reporting.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- GOLDEN SHIELD TESTING TECH CO LTD
- Filing Date
- 2026-04-15
- Publication Date
- 2026-06-16
AI Technical Summary
Existing methods for building software anomaly reports cannot adapt to the dynamic changes in software anomaly information due to fixed-depth semantic parsing, resulting in a semantic mismatch between the generated report and the actual anomaly scenario, which affects the effectiveness of localization and repair decisions.
A hierarchical semantic parsing mechanism is adopted. By constructing a hierarchical semantic representation structure of symbol feature layer, call topology layer and business logic layer, and combining the distinction between trusted anchor points and non-anchored nodes, the parsing depth boundary is dynamically determined and targeted completion is performed to generate adaptive anomaly reports.
It improves the semantic consistency and interpretability of anomaly localization, reduces the probability of misjudgment, enhances the engineering practicality and credibility of anomaly reports, realizes the fusion and conflict coordination of multi-source information, and ensures the structural integrity and semantic consistency of reports.
Figure CN122019243B_ABST
Description
Technical Field
[0001] This invention relates to the field of semantic parsing technology, specifically to an adaptive construction method for software anomaly reports based on deep semantic parsing.
Background Technology
[0002] Existing methods for building software anomaly reports typically employ a combination of predefined templates and rule engines. This involves pre-setting fixed report structures and field mapping rules, and then, when an anomaly is triggered, filling the corresponding fields with raw data such as stack traces, error codes, and timestamps to generate a formatted report. Some improved solutions incorporate natural language processing techniques, using keyword extraction or shallow semantic matching to summarize the anomaly log text and enrich the report's descriptive content.
[0003] However, all the above methods implicitly assume a common technical premise: performing fixed-depth semantic parsing on anomaly information and driving the structured construction of reports based on fixed parsing results. This assumption ignores the essential characteristics of software anomaly information: anomaly information is a cross-sectional projection of the program's runtime state, with a structural break between its surface semantics and deep technical semantics. The same surface anomaly feature can correspond to multiple fundamentally different deep semantics, and the effective boundaries of deep semantics dynamically change with changes in code context, business execution path, and system runtime state. Therefore, fixed-depth semantic parsing cannot provide semantically sufficient and granularly adapted parsing results for report construction, leading to a systematic semantic mismatch between the structure of the generated report and the actual anomaly scenario, severely affecting the report's ability to effectively support anomaly localization and remediation decisions.
[0004] To address this, an adaptive construction method for software anomaly reports based on deep semantic parsing is proposed.
Summary of the Invention
[0005] The purpose of this invention is to provide an adaptive construction method for software anomaly reports based on deep semantic parsing. By constructing a hierarchical semantic representation structure, determining the parsing depth boundary, performing targeted completion and semantic increment negotiation, the method achieves reliable parsing of anomaly information and adaptive generation of report structure.
[0006] To achieve the above objectives, the present invention provides the following technical solution:
[0007] An adaptive construction method for software anomaly reports based on deep semantic parsing includes:
[0008] When an exception is triggered, the bytecode structure index, thread frame state sequence, and bounded log buffer are exported to form an evidence set.
[0009] Based on the evidence set, the original anomaly information is parsed progressively through the symbol feature layer, the call topology layer, and the business logic layer. Semantic nodes that match the evidence set are marked as trusted anchors, and unmatched nodes are marked as non-anchored nodes. Based on the coverage of trusted anchors in each layer, an intra-layer confidence index is generated for each layer, forming a hierarchical semantic representation structure.
[0010] Based on the hierarchical semantic representation structure, a gradient sequence is formed from the change in confidence between adjacent layers and the effective parsing depth boundary is determined; targeted completion against the bytecode structure index is performed on non-anchored nodes in the boundary layer, and the semantic parsing result data packet is output.
[0011] Based on the semantic parsing result data packet, the necessity markers, maximum semantic reference depth and non-anchored explicit annotation requirements of each report structure unit are obtained to form report structure constraint data;
[0012] Based on the semantic parsing result data packet and the report structure constraint data, non-anchored reference nodes are detected, merged according to evidence missing type, and sent as local supplementation requests. The semantic increment returned by each request round is counted. Cross-unit conflict nodes are resolved according to the degree of trusted anchor coverage of each conclusion, and an adaptive anomaly report is output.
[0013] Preferably, the evidence set acquisition process includes: when an anomaly trigger signal is generated, exporting the bytecode structure index of the currently loaded classes; for dynamically generated classes whose complete bytecode records cannot be exported through standard interfaces in the runtime environment, the associated semantic nodes are directly marked as non-anchored nodes during parsing and the evidence missing type is recorded as "method definition unreachable";
[0014] Export the current thread frame state sequence, which includes the fully qualified name of the method, the bytecode execution location, and the local variable type information corresponding to each frame; extract the log buffer content within a preset time window before the exception trigger time, which includes the business operation identifier sequence, the timestamp corresponding to each identifier, and the thread identifier corresponding to each identifier.
[0015] The evidence is solidified into an immutable set at the moment of the exception trigger, and no update operation is performed on the evidence set during the parsing process.
[0016] Preferably, the hierarchical semantic representation structure acquisition process includes: the symbol feature layer matches the exception class name, method signature, and bytecode line number in each stack frame, one by one, against the type definition records in the bytecode structure index. Symbol nodes with a corresponding type definition record in the index are marked as trusted anchors, and symbol nodes without one are marked as non-anchored nodes. The confidence index of the symbol feature layer is the ratio of the number of trusted anchors to the number of non-anchored nodes in this layer.
[0017] The call topology layer constructs a directed call graph with stack frames as nodes, ordered by call sequence. For the call relationship represented by each directed edge, the corresponding method call instruction record is searched in the bytecode structure index, and the nodes at both ends of a matched edge are marked as trusted anchors. The confidence index of the call topology layer is the ratio of the number of trusted anchors to the number of non-anchored nodes in this layer.
[0018] The business logic layer performs co-occurrence matching, by time interval, between the path segments in the directed call graph and the business operation identifier sequence in the bounded log buffer whose thread identifier is consistent with that of the current exception. Business operation identifiers generated by other threads in the same time window are excluded, and path nodes that have a co-occurrence relationship with a business operation identifier in the corresponding time interval are marked as trusted anchors. The confidence index of the business logic layer is the ratio of the number of trusted anchors to the number of non-anchored nodes in this layer.
[0019] The three-layer confidence index and the annotation results of each layer node are encapsulated together into a hierarchical semantic representation structure.
[0020] Preferably, the semantic parsing result data packet acquisition process includes: calculating the difference between confidence indices between adjacent layers to form a gradient sequence, and scanning the gradient sequence with a preset absolute threshold; if there is a difference exceeding the absolute threshold, the layer position where the difference is largest is determined as the effective parsing depth boundary; if all differences are below the absolute threshold, the effective parsing depth boundary is extended to the deepest layer; if all differences exceed the absolute threshold and the difference between the maximum and minimum difference is below a preset relative threshold, the effective parsing depth boundary is shrunk to the first layer, and a contextual evidence deficiency notification containing the evidence missing status of each layer is generated.
[0021] After determining the effective parsing depth boundary, a targeted completion query is performed on each non-anchored node within the boundary layer in the bytecode structure index. The query scope includes the method definition records of the direct caller frame and the directly called frame corresponding to the non-anchored node. If the query is successful, the non-anchored node is upgraded to a trusted anchor and the confidence index of this layer is updated. If the query fails, the evidence missing type is recorded according to the ambiguity of the call relationship, the lack of type information, or the unreachability of the method definition. The hierarchical position markers of the effective parsing depth boundary, the annotation results of each layer of nodes, and the evidence missing type classification records are encapsulated into a semantic parsing result data packet for output.
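As a non-limiting illustration, the targeted completion step can be sketched in Python as follows. The `query` callable stands in for the lookup over the method definition records of the direct caller and direct callee frames; its signature, the `Missing` enum, and the string labels are assumptions introduced for this sketch and are not specified by the method.

```python
from enum import Enum


class Missing(Enum):
    # The three evidence missing types named in the text.
    AMBIGUOUS_CALL = "call relationship ambiguous"
    NO_TYPE_INFO = "type information missing"
    UNREACHABLE = "method definition unreachable"


def complete_layer(labels, query):
    """Try to upgrade each non-anchored node of the boundary layer.

    labels: dict node -> "anchor" | "non_anchored"
    query:  hypothetical callable; returns True on a successful lookup,
            otherwise a Missing value describing why the lookup failed.
    """
    updated = dict(labels)
    missing = {}
    for node, label in labels.items():
        if label != "non_anchored":
            continue
        result = query(node)
        if result is True:
            updated[node] = "anchor"      # upgrade to trusted anchor
        else:
            missing[node] = result        # record evidence missing type
    anchors = sum(1 for v in updated.values() if v == "anchor")
    non_anchored = len(updated) - anchors
    # Updated layer confidence: A / U, with U replaced by 1 when it is zero.
    confidence = anchors / (non_anchored if non_anchored else 1)
    return updated, missing, confidence
```

The function returns new mappings rather than mutating its input, so the pre-completion annotation results remain available for packaging.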
[0022] Preferably, the report structure constraint data acquisition process includes: extracting the effective parsing depth boundary level position from the semantic parsing result data packet, and generating labels for the report structure units according to the boundary level: when the boundary is located in the symbol feature layer, the code location unit is marked as mandatory, and the business impact description unit is marked as prohibited; when the boundary is located in the call topology layer, extracting the number of connected components in the directed call graph, when the number of connected components is one, the single path root cause location unit is marked as mandatory, and when the number of connected components is greater than one, the multi-candidate path comparison unit is marked as mandatory; when the boundary is located in the business logic layer, counting the number of co-occurrence coverage nodes of the business operation identifier in the thread-consistent log buffer, when exceeding the preset diffusion threshold, the functional impact diffusion unit is marked as mandatory; and using the level where the effective parsing depth boundary is located as the maximum semantic layer depth that all report structure units are allowed to reference.
[0023] Map each missing type in the evidence missing type classification record to the corresponding non-anchored explicit annotation requirements; encapsulate the set of necessity markers, the maximum semantic layer depth, and the set of non-anchored explicit annotation requirements into report structure constraint data output.
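The labelling rules above can be read as a small decision table. The following Python sketch is one possible encoding; the unit names, layer names, dictionary layout, and the wording of the annotation requirements are hypothetical and chosen only for illustration.

```python
def report_constraints(boundary, components=1, cooccurrence_nodes=0,
                       diffusion_threshold=0, missing_types=()):
    """Derive report structure constraint data from the depth boundary.

    boundary: "symbol" | "call_topology" | "business_logic" (assumed names)
    """
    markers = {}
    if boundary == "symbol":
        markers["code_location"] = "mandatory"
        markers["business_impact"] = "prohibited"
    elif boundary == "call_topology":
        if components == 1:
            markers["single_path_root_cause"] = "mandatory"
        elif components > 1:
            markers["multi_candidate_path_comparison"] = "mandatory"
    elif boundary == "business_logic":
        if cooccurrence_nodes > diffusion_threshold:
            markers["functional_impact_diffusion"] = "mandatory"
    else:
        raise ValueError(f"unknown layer: {boundary}")
    # One explicit annotation requirement per evidence missing type.
    annotations = {m: f"explicitly annotate non-anchored content: {m}"
                   for m in missing_types}
    return {
        "markers": markers,
        "max_depth": boundary,   # deepest layer any unit may reference
        "annotations": annotations,
    }
```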
[0024] Preferably, detecting non-anchored reference nodes, merging them according to the type of missing evidence and sending partial supplementation requests, and calculating the semantic increment returned each time specifically includes: grouping the partial supplementation requests according to the code module corresponding to the non-anchored node referenced by each structural unit, executing requests pointing to different code modules concurrently, and merging requests pointing to the same code module into a single targeted query;
[0025] After the parsing module returns the differential supplementary data, the number of newly added trusted anchors is counted as the semantic increment: when the semantic increment is greater than zero, the content of the corresponding structural unit is updated in a differential manner and the non-anchored reference detection is re-executed to enter the next round of request process; when the semantic increment is equal to zero, it is determined that the structural unit has reached the semantic limit of the current evidence set, and structured semantic boundary annotations are inserted at the positions of the remaining non-anchored content. The annotation content includes a standardized description of the corresponding evidence missing type, and the unit is marked as completed and the negotiation loop of the unit is terminated.
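A minimal sketch of the negotiation loop for a single structural unit is given below, assuming a hypothetical `supplement(module, nodes)` callable that returns the set of nodes newly resolved to trusted anchors. Requests for different modules are issued sequentially here for brevity; the concurrent execution described above is omitted.

```python
def negotiate_unit(non_anchored, module_of, supplement, missing_type):
    """Iterate supplementation requests until the semantic increment is zero.

    non_anchored: set of node ids referenced by the unit
    module_of:    dict node id -> code module
    supplement:   hypothetical callable (module, frozenset of nodes) -> set
                  of node ids that became trusted anchors
    missing_type: dict node id -> evidence missing type description
    """
    remaining = set(non_anchored)
    rounds = 0
    while remaining:
        # Merge requests that point to the same code module.
        by_module = {}
        for node in remaining:
            by_module.setdefault(module_of[node], set()).add(node)
        gained = set()
        for module, nodes in by_module.items():
            gained |= set(supplement(module, frozenset(nodes))) & nodes
        rounds += 1
        if not gained:          # semantic increment == 0: limit reached
            break
        remaining -= gained     # differential update, then re-detect
    # Semantic boundary annotations for whatever could not be anchored.
    annotations = {node: missing_type[node] for node in remaining}
    return remaining, annotations, rounds
```

Terminating on a zero increment bounds the loop: each further round must anchor at least one node, so the number of rounds never exceeds the number of non-anchored nodes plus one.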
[0026] Preferably, the adaptive anomaly reporting output process includes:
[0027] After all structural units have completed negotiation, a mapping table is established from the identifiers of all cross-unit reference semantic nodes in the report to the parsing conclusions. Conflicts are detected where the same semantic node identifier has different parsing conclusions in different structural units. For each conflicting node, the number of trusted anchors associated with each of the two parsing conclusions is compared, and the parsing conclusion with more trusted anchors is taken as the final parsing conclusion. When the number of trusted anchors associated with each of the two conflicting parsing conclusions is equal, the parsing conclusion from the shallower parsing depth is taken as the final parsing conclusion.
[0028] The content of conflict nodes in the relevant structural units is updated based on the final analysis results. The overall structural integrity of the report is verified. After confirming that the necessity markers of each structural unit have been satisfied, the adaptive report text, the analysis depth limitation declaration, and the conflict resolution record are packaged and output to form a complete adaptive anomaly report.
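The conflict-resolution rule described above (more trusted anchors wins; on a tie, the shallower parsing depth wins) can be sketched as follows. The tuple layout and the use of an integer depth, with 0 denoting the shallowest layer, are assumptions of this sketch; it also accepts more than two competing conclusions per node, which generalizes the two-conclusion case in the text.

```python
def resolve_conflicts(claims):
    """Pick one parsing conclusion per cross-unit semantic node.

    claims: iterable of (node_id, unit, conclusion, anchor_count, depth)
    Returns (final conclusions by node, list of nodes that had a conflict).
    """
    by_node = {}
    for node, unit, conclusion, anchors, depth in claims:
        by_node.setdefault(node, []).append((conclusion, anchors, depth, unit))

    final, conflicts = {}, []
    for node, items in by_node.items():
        # Higher anchor count first; on a tie, smaller depth (shallower) wins.
        best = max(items, key=lambda c: (c[1], -c[2]))
        final[node] = best[0]
        if len({c[0] for c in items}) > 1:
            conflicts.append(node)   # kept for the conflict resolution record
    return final, conflicts
```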
[0029] Compared with the prior art, the beneficial effects of the present invention are as follows:
[0030] 1. This invention constructs a layered semantic parsing mechanism consisting of a "symbolic feature layer—call topology layer—business logic layer," and introduces a distinction between trusted anchor points and non-anchored nodes. This enables a step-by-step mapping and verification of anomaly information from the underlying code to the upper-level business semantics. Compared to existing analysis methods that rely solely on stack traces or logs, this scheme utilizes bytecode structure indexes and thread frame states for cross-validation, effectively improving the semantic consistency and interpretability of anomaly localization. Simultaneously, by quantifying the reliability of each parsing layer through confidence metrics, the anomaly analysis process is transformed from experience-driven to a calculable and measurable process, significantly reducing the probability of misjudgment and improving the accuracy and stability of root cause analysis in complex systems.
[0031] 2. This invention proposes an adaptive mechanism for determining the effective parsing depth based on inter-layer confidence gradients. This mechanism dynamically identifies the credible boundary of semantic parsing and performs targeted completion queries at the boundary layer. Compared with traditional fixed-depth or full-scale parsing methods, it avoids ineffective deep parsing while preserving parsing quality and reducing computational overhead. Through structured classification and explicit labeling of evidence missing types, the system can also clearly express the boundary between "known" and "unknown," avoiding misleading conclusions. Even with insufficient evidence, it can still output reports with complete structure and clear semantic boundaries, improving the engineering practicality and credibility of anomaly reports.
[0032] 3. This invention introduces a negotiation-based report construction mechanism driven by semantic increments to achieve dynamic improvement and conflict self-consistency in anomaly report content. The system issues local supplementation requests for non-anchored nodes, merged by module, and uses the semantic increment to decide whether to continue iterating, avoiding invalid request loops. When conflicting conclusions exist among multiple structural units, they are resolved automatically based on trusted anchor coverage and parsing level, ensuring overall report consistency. Compared with existing static generation or manual revision methods, this method automatically completes multi-source information fusion and conflict coordination, improving report generation efficiency and ensuring reliable output in terms of structural integrity and semantic consistency.
Attached Figure Description
[0033] Figure 1 is a schematic diagram of the adaptive construction method for software anomaly reports based on deep semantic parsing provided by the present invention;
[0034] Figure 2 is a schematic diagram of the effective parsing depth boundary determination logic provided by the present invention;
[0035] Figure 3 is a schematic diagram of the semantic increment-driven local supplementation logic flow provided by the present invention.
Detailed Implementation
[0036] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are for illustrative purposes only and are not intended to limit the invention.
[0037] Example 1
[0038] This embodiment provides a complete description of the adaptive construction method for software anomaly reports based on deep semantic parsing. The method runs in software systems with runtime bytecode inspection capabilities (such as server-side applications deployed in a Java Virtual Machine environment), with the ultimate output goal being adaptive anomaly reports for R&D and operations personnel.
[0039] Referring to Figure 1, this invention provides an adaptive construction method for software anomaly reports based on deep semantic parsing. The technical solution is as follows: when an anomaly is triggered, the bytecode structure index, thread frame state sequence, and bounded log buffer are exported to form an evidence set; based on the evidence set, the original anomaly information is parsed progressively through the symbol feature layer, call topology layer, and business logic layer; semantic nodes that match the evidence set are marked as trusted anchors, and unmatched nodes are marked as non-anchored nodes; based on the coverage of trusted anchors in each layer, an intra-layer confidence index is generated to form a hierarchical semantic representation structure; based on the hierarchical semantic representation structure, the change in confidence between adjacent layers is calculated to form a gradient sequence, the effective parsing depth boundary is determined, targeted completion against the bytecode structure index is performed on non-anchored nodes within the boundary layer, and a semantic parsing result data packet is output; based on the semantic parsing result data packet, the necessity markers, maximum semantic reference depth, and non-anchored explicit annotation requirements of each report structure unit are obtained, forming report structure constraint data; based on the semantic parsing result data packet and the report structure constraint data, non-anchored reference nodes are detected, merged according to evidence missing type, and sent as local supplementation requests, and the semantic increment returned each time is counted; cross-unit conflict nodes are resolved according to the degree of trusted anchor coverage of each conclusion, and an adaptive anomaly report is output.
[0040] Furthermore, the process of acquiring the evidence set includes:
[0041] When an abnormal trigger signal is generated, the bytecode structure index of the currently loaded classes is exported. The bytecode structure index contains the method signature set, method call instruction position sequence, and type inheritance relationship mapping of each loaded class. For dynamically generated classes that cannot export complete bytecode records using the standard interface in the runtime environment, the method signature record of the corresponding frame is included in the bytecode structure index and marked as a dynamic type entry. The semantic nodes associated with this entry are directly marked as non-anchored nodes in subsequent parsing and the evidence missing type is recorded as unreachable method definition, which does not affect the matching process of other loaded class entries.
[0042] Export the frame state sequence of the current thread. The frame state sequence includes the fully qualified method name, bytecode execution location, and local variable type information for each frame. Extract the log buffer content within a preset time window before the exception trigger time. The log buffer content includes a business operation identifier sequence, a timestamp corresponding to each identifier, and a thread identifier corresponding to each identifier. The business operation identifier is a structured identifier field in the log buffer used to represent business processing actions or a reliably extractable text identifier (such as interface name, transaction code, or pre-written business tag). For the same log entry, only one business operation identifier is extracted.
[0043] The above three types of data are solidified into an immutable evidence set at the moment of anomaly triggering. No update operation is performed on the evidence set in subsequent analysis processes to ensure that the evidence sources on which each analysis is based have temporal consistency.
[0044] Specifically, after the runtime environment receives a software exception trigger signal, it immediately initiates three evidence export operations concurrently. All three operations must complete synchronously at the moment of exception triggering, and the exported evidence must not be supplemented or updated in subsequent analysis stages.
[0045] The first category is bytecode structure indexing. This involves using runtime bytecode inspection interfaces (such as the JVMTI interface or an equivalent bytecode manipulation framework) to enumerate all classes loaded in the current runtime environment. For each loaded class, the following are extracted: the fully qualified signatures of all methods (including method name, parameter type list, and return type); the bytecode offset sequence of all method call instructions within each method body; and the type inheritance mapping of the class's parent classes and implemented interfaces. These three items together constitute the bytecode structure index entry for that class. For classes dynamically generated at runtime through reflection proxies, bytecode enhancement frameworks, or anonymous inner class mechanisms, and for which complete bytecode records cannot be exported using standard interfaces, only the method signature string of the corresponding frame is included in the index and a dynamic type entry annotation is added. All semantic nodes associated with entries bearing this annotation are directly marked as non-anchored nodes in subsequent layers of parsing, with the evidence missing type pre-defined as "method definition unreachable," and this pre-defined type does not affect the matching process of other normally exported entries. Semantic nodes are the basic processing units for structurally representing the original information of an exception and its associated runtime evidence. A semantic node must include at least: node identifier, parsing level, node type, corresponding evidence source identifier, and node content description.
[0046] The second category involves exporting the thread frame state sequence. This involves obtaining the complete frame stack of the thread that triggered the exception, and extracting the following from each frame: the fully qualified name of the corresponding method (including package path, class name, method name, and parameter types); the current execution position of the frame in the method bytecode (represented by bytecode offset); and the declaration type information of each variable in the local variable table. The frame stack is arranged in the order of invocation, from the innermost called frame to the outermost calling frame, forming an ordered frame state sequence.
[0047] The third category is bounded log buffer truncation, which truncates the log buffer content within a preset time window before the exception is triggered. This includes the identifier string of each business operation, the corresponding millisecond-precision timestamp, and the thread identifier that generated the identifier.
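For illustration only, the three evidence categories can be modelled as an immutable snapshot. The Python sketch below uses frozen dataclasses and a read-only mapping view; all class names, field names, and the `freeze_evidence` helper are hypothetical, and the actual export from the runtime (for example via JVMTI) is outside the scope of the sketch.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping, Tuple


@dataclass(frozen=True)
class Frame:
    method: str                  # fully qualified method name
    bytecode_offset: int         # current execution position
    local_types: Tuple[str, ...]


@dataclass(frozen=True)
class LogEntry:
    operation_id: str            # business operation identifier
    timestamp_ms: int
    thread_id: str


@dataclass(frozen=True)
class EvidenceSet:
    bytecode_index: Mapping[str, object]   # class name -> index entry
    frames: Tuple[Frame, ...]              # innermost called frame first
    log_window: Tuple[LogEntry, ...]


def freeze_evidence(index, frames, logs, trigger_ms, window_ms):
    """Snapshot the three evidence sources at the trigger moment."""
    window = tuple(e for e in logs
                   if trigger_ms - window_ms <= e.timestamp_ms <= trigger_ms)
    # Copy the index and expose it through a read-only view.
    return EvidenceSet(MappingProxyType(dict(index)), tuple(frames), window)
```

Because the dataclass is frozen and the containers are tuples or read-only views, later parsing stages cannot update the snapshot by accident.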
[0048] The preset time window is obtained as follows: collect a historical set of anomaly samples from the target application, ideally no fewer than 30 anomaly events with complete log records; for each sample, calculate the interval between the anomaly trigger time and the earliest pre-trigger log timestamp of a business operation identifier associated with the call path of that anomaly; sort the intervals of all samples and take the 75th percentile as the initial configuration value of the preset time window, so that it covers the complete time span of most business operation sequences. In an initial deployment without historical data, use 5 seconds as a conservative default and update the configuration using the above method once 30 samples have accumulated.
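The window calibration can be sketched as below. Reading the sorted-interval cutoff as a 75th percentile computed by the nearest-rank method is an assumption of this sketch, as are the function and constant names; the 30-sample minimum and the 5-second default come from the description.

```python
import math

DEFAULT_WINDOW_MS = 5_000   # conservative default without history
MIN_SAMPLES = 30            # minimum number of historical anomaly samples


def preset_window_ms(intervals_ms):
    """Return the preset time window in milliseconds.

    intervals_ms: per-sample interval between the earliest associated
    business operation timestamp and the anomaly trigger time.
    """
    if len(intervals_ms) < MIN_SAMPLES:
        return DEFAULT_WINDOW_MS
    ordered = sorted(intervals_ms)
    rank = math.ceil(0.75 * len(ordered))   # nearest-rank, 1-based
    return ordered[rank - 1]
```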
[0049] Furthermore, based on the evidence set, the original anomaly information is parsed progressively, and an intra-layer confidence index is generated for each layer according to the coverage of trusted anchors in that layer, forming a hierarchical semantic representation structure. Specifically:
[0050] In the symbolic feature layer, the exception class name, method signature, and bytecode line number in the stack frame are matched against the type definition records in the bytecode structure index one by one. Symbol nodes with corresponding type definition records in the index are marked as trusted anchors, and symbol nodes without corresponding records are marked as non-anchored nodes. The confidence index of the symbolic feature layer is generated by the ratio of the number of trusted anchors to the number of non-anchored nodes in this layer.
[0051] In the call topology layer, a directed call graph is constructed with stack frames as nodes in the call order. For each directed edge in the graph, the corresponding method call instruction record is searched in the bytecode structure index. The nodes at both ends of the found edge are marked as trusted anchors. The confidence index of the call topology layer is generated by the ratio of the number of trusted anchors to the number of non-anchored nodes.
[0052] In the business logic layer, co-occurrence matching is performed, by time interval, between path segments in the directed call graph and the business operation identifier sequence in the bounded log buffer whose thread identifier is consistent with that of the current exception. Business operation identifiers generated by other threads in the same time window are excluded to ensure execution context consistency. If the method signature set corresponding to a path segment has a co-occurrence relationship with the business operation identifier sequence in the corresponding time interval, the path segment node is marked as a trusted anchor. The confidence index of the business logic layer is the ratio of the number of trusted anchors to the number of non-anchored nodes in this layer.
[0053] The three-layer confidence index and the annotation results of each layer node are encapsulated together into a hierarchical semantic representation structure.
[0054] Specifically, based on the solidified evidence set, the abnormal information is analyzed in three layers: symbolic feature layer, call topology layer, and business logic layer.
[0055] The intra-layer confidence index is defined as follows: count the semantic nodes in the layer marked as trusted anchors (denoted A) and the semantic nodes marked as non-anchored nodes (denoted U); the ratio of A to U is the confidence index of the layer. When U is zero (i.e., all nodes in the layer are trusted anchors), U is replaced by 1 in the calculation, so the confidence index equals the total number of nodes in the layer. The handling of this special case is written into the configuration during system initialization to ensure that the confidence index is a finite, comparable value in all scenarios.
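The definition above reduces to a one-line computation. A minimal Python sketch, with a hypothetical function name:

```python
def confidence_index(anchors: int, non_anchored: int) -> float:
    """Ratio A / U, with U replaced by 1 when the layer has no
    non-anchored nodes so that the result stays finite."""
    if anchors < 0 or non_anchored < 0:
        raise ValueError("node counts must be non-negative")
    return anchors / (non_anchored if non_anchored > 0 else 1)
```

For example, a layer with 6 trusted anchors and 2 non-anchored nodes has an index of 3.0, and a layer with 4 trusted anchors and no non-anchored nodes has an index of 4.0, which equals its node count.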
[0056] Symbol feature layer parsing: using the exception class name (extracted if the current frame is the exception-throwing frame), the fully qualified method signature, and the bytecode execution position of each frame in the frame state sequence as symbol nodes, perform an exact-match search in the bytecode structure index to confirm whether the index contains a type definition record fully consistent with the method signature and an instruction record corresponding to the bytecode position; nodes that match are marked as trusted anchors, and nodes that do not match are marked as non-anchored nodes; count A and U and calculate the confidence index of the symbol feature layer.
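A simplified sketch of this exact-match step follows. It assumes the bytecode structure index has been reduced to a dictionary from method signature to the set of known instruction offsets, which is a hypothetical layout; exception class name matching is left out for brevity.

```python
def parse_symbol_layer(frames, index):
    """Label each frame as a trusted anchor or a non-anchored node.

    frames: list of (method_signature, bytecode_offset)
    index:  dict method_signature -> set of instruction offsets (assumed)
    Returns (labels by frame position, A, U).
    """
    labels = {}
    for pos, (signature, offset) in enumerate(frames):
        offsets = index.get(signature)
        matched = offsets is not None and offset in offsets
        labels[pos] = "anchor" if matched else "non_anchored"
    a = sum(1 for v in labels.values() if v == "anchor")
    u = len(labels) - a
    return labels, a, u
```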
[0057] Call topology layer parsing: each frame in the frame state sequence is used as a node of a directed graph, and directed edges (from the calling frame to the called frame) are established according to the actual call order to construct a directed call graph. For each directed edge, the bytecode structure index is searched to determine whether the calling frame's method contains a call instruction pointing to the method of the called frame, and whether the bytecode offset of that call instruction matches the current execution position of the calling frame. For matched directed edges, both end nodes are marked as trusted anchors; for unmatched directed edges, both end nodes are marked as non-anchored nodes (when the same node is marked by multiple edges, the trusted anchor marking takes priority and the node is counted only once); the call topology layer confidence index is then calculated.
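The edge check and the anchor-priority rule can be sketched as follows, assuming frames are ordered from the innermost called frame outward and that call instruction records are available as a nested dictionary `call_sites[caller][offset] -> callee`. That layout and the names are assumptions for illustration.

```python
def parse_call_topology(frames, call_sites):
    """Label call graph nodes from edge matches.

    frames:     list of (method_signature, bytecode_offset), innermost first
    call_sites: dict caller_signature -> {offset: callee_signature} (assumed)
    Returns (labels by frame position, A, U).
    """
    anchored, touched = set(), set()
    for callee_pos in range(len(frames) - 1):
        caller_pos = callee_pos + 1
        caller_sig, caller_offset = frames[caller_pos]
        callee_sig = frames[callee_pos][0]
        touched.update((caller_pos, callee_pos))
        # Edge matches when the caller has a call instruction at its current
        # execution position that targets the callee's method.
        if call_sites.get(caller_sig, {}).get(caller_offset) == callee_sig:
            anchored.update((caller_pos, callee_pos))
    # Anchor marking takes priority; each node is counted once.
    labels = {pos: ("anchor" if pos in anchored else "non_anchored")
              for pos in sorted(touched)}
    a = len(anchored)
    u = len(touched) - a
    return labels, a, u
```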
[0058] Business logic layer parsing: Log entries with thread identifiers consistent with the current exception-triggered thread are filtered from the bounded log buffer, excluding log entries generated by other threads, forming a thread-consistent log subset. For each continuous frame sequence (path segment) in the directed call graph, the business operation identifier associated with that path segment is found within the thread-consistent log subset whose timestamp falls within a preset time window before the exception trigger time; path nodes with the above co-occurrence relationship are marked as trusted anchors, and path nodes without co-occurrence relationship are marked as non-anchored nodes; A and U are statistically analyzed, and the business logic layer confidence index is calculated.
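The thread filtering and time-window co-occurrence check might look like the sketch below. The `LogEntry` fields, the `segment_ops` mapping from path segment to business operation identifier, and the function name are all hypothetical; the source does not specify how a path segment is associated with its operation identifier.

```python
from typing import Dict, List, NamedTuple


class LogEntry(NamedTuple):
    thread_id: str
    timestamp: float
    operation_id: str


def mark_business_nodes(
    segment_ops: Dict[str, str],
    log_buffer: List[LogEntry],
    thread_id: str,
    trigger_time: float,
    window: float,
) -> Dict[str, bool]:
    """Map each path segment to True (trusted anchor) or False (non-anchored)."""
    # Thread-consistent subset, restricted to the window before the trigger time.
    seen = {
        entry.operation_id
        for entry in log_buffer
        if entry.thread_id == thread_id
        and trigger_time - window <= entry.timestamp <= trigger_time
    }
    return {segment: op in seen for segment, op in segment_ops.items()}
```

Entries from other threads and entries older than the window do not contribute to co-occurrence, so the segments they would have supported remain non-anchored.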
[0059] The three-layer confidence index and the annotation results of each layer node are encapsulated into a hierarchical semantic representation structure and passed to the next stage.
[0060] Furthermore, based on the hierarchical semantic representation structure, the gradient sequence is calculated, the effective parsing depth boundary is determined, bytecode index lookup and directional completion are performed on non-anchored nodes within the boundary layer, and the semantic parsing result data packet is output. Referring to Figure 2, this specifically includes:
[0061] The difference between the confidence indices of adjacent layers is calculated sequentially to form a gradient sequence. The gradient sequence is scanned against a preset absolute threshold. If any difference exceeds the absolute threshold, the inter-layer position with the largest difference is determined as the effective parsing depth boundary. If all differences are below the absolute threshold, the effective parsing depth boundary is extended to the deepest layer. If all differences exceed the absolute threshold and the difference between the maximum and minimum differences is below a preset relative threshold, the effective parsing depth boundary is shrunk to the first layer, and a contextual evidence deficiency notification containing the evidence deficiency status of each layer is generated. The per-layer evidence deficiency descriptions in this notification are written into the report metadata in structured form and are presented explicitly in the adaptive anomaly report as an analysis depth limitation declaration. The declaration includes the boundary layer position and the judgment basis that triggered the shrinkage.
[0062] After determining the effective parsing depth boundary, a targeted completion query is performed on each non-anchored node within the boundary layer in the bytecode structure index. The query scope is the method definition records of the direct caller frame and the directly called frame corresponding to the non-anchored node. If the query is successful, the non-anchored node is upgraded to a trusted anchor and the confidence index of this layer is updated. If the query is unsuccessful, the evidence missing type is recorded according to the category of unclear call relationship, missing type information, or unreachable method definition.
[0063] The hierarchical location markers of the effective parsing depth boundaries, the annotation results of each layer node, and the classification records of evidence missing types are encapsulated into a semantic parsing result data packet for output.
[0064] Specifically, the difference in confidence index between adjacent layers is calculated sequentially to form a gradient sequence (positive values indicate a decrease in confidence from the previous layer to the next layer, and negative values indicate an increase).
[0065] The preset absolute threshold is obtained as follows: calculate the statistical distribution of inter-layer gradient values over the historical anomaly sample set, and take the 70th percentile of the sorted distribution as the initial value of the absolute threshold. A decrease exceeding this value is considered significant. When there is no historical data, 0.4 is used as the default value, meaning that a decrease of more than 0.4 in the confidence index between adjacent layers is considered significant.
[0066] The preset relative threshold is obtained as follows: when all gradients exceed the absolute threshold and the difference between the maximum and minimum values of the gradient sequence is lower than the relative threshold, the confidence decreases at the layer boundaries are similar in magnitude, so the parsing layers cannot be prioritized. By default, the relative threshold is set to one-quarter of the absolute threshold, meaning that when the gradient differences between layers are too small to distinguish priorities, overall parsability is considered low. After system deployment, the relative threshold can be updated to the 25th percentile of the distribution of gradient sequence ranges (maximum minus minimum) over all historical cases that triggered Scenario 3.
[0067] The boundary determination rules are as follows:
[0068] Scenario 1: If there are elements in the gradient sequence that exceed the absolute threshold, the interlayer position corresponding to the largest value is taken as the effective parsing depth boundary. Scenario 2: If all gradient values are below the absolute threshold, the boundary extends to the business logic layer (deepest layer), and the report can reference the parsing results of all three layers. Scenario 3: If all gradient values exceed the absolute threshold and the difference between the maximum and minimum values of the gradient sequence is below the preset relative threshold, the boundary shrinks to the symbolic feature layer (first layer). The system generates a notification of insufficient contextual evidence, which includes the gradient values of each layer and the comparison results with the threshold. This notification is written into the report metadata in a structured form and presented in the final report as a statement of limited analysis depth. The statement includes the boundary layer position (first layer) and the criteria for triggering the shrinkage.
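The three scenarios can be sketched in Python as below. Several points are assumptions, because the text leaves them open: Scenario 3 is tested before Scenario 1 (its condition is a special case of Scenario 1's), the boundary for the largest gradient is taken to be the shallower layer of the pair, and a gradient exactly equal to the threshold is treated as not exceeding it. All names are illustrative.

```python
from typing import List, Optional, Tuple

LAYERS = ["symbolic_feature", "call_topology", "business_logic"]


def gradient_sequence(confidences: List[float]) -> List[float]:
    """Positive values mean confidence decreases from one layer to the next."""
    return [confidences[i] - confidences[i + 1] for i in range(len(confidences) - 1)]


def determine_boundary(
    confidences: List[float],
    abs_threshold: float = 0.4,
    rel_threshold: Optional[float] = None,
) -> Tuple[int, bool]:
    """Return (boundary layer index, evidence_deficiency_notice_required)."""
    if rel_threshold is None:
        rel_threshold = abs_threshold / 4  # default: one-quarter of the absolute threshold
    grads = gradient_sequence(confidences)
    exceeding = [g > abs_threshold for g in grads]
    # Scenario 3 (assumed to take precedence): every drop is significant
    # and the drops are too similar to rank the layers.
    if all(exceeding) and (max(grads) - min(grads)) < rel_threshold:
        return 0, True
    # Scenario 1: boundary at the inter-layer position with the largest drop.
    if any(exceeding):
        return grads.index(max(grads)), False
    # Scenario 2: no significant drop, so the boundary extends to the deepest layer.
    return len(confidences) - 1, False
```

For example, `determine_boundary([3.0, 2.0, 1.0])` yields `(0, True)`: both gradients are 1.0, both exceed 0.4, and their range of 0 is below the default relative threshold of 0.1, so the boundary shrinks to the first layer and a notice is required.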
[0069] After the boundary is determined, a targeted completion query is performed on each non-anchored node within the boundary layer. The query scope is limited to the method definition records of the direct caller frame and the directly called frame corresponding to the node. If the query hits, the node is upgraded to a trusted anchor and the confidence index of the layer is updated. If the query misses, the miss is classified as follows: if the bytecode structure index contains no record of the corresponding method, it is recorded as "method definition unreachable"; if a method record exists but lacks type declaration information, it is recorded as "missing type information"; if the method definition exists but no call instruction establishes a direct call relationship with the current frame, it is recorded as "unclear call relationship".
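The miss classification can be expressed as a small decision function. This is a sketch under assumptions: the `MethodRecord` structure, its fields, and the idea that one function returns `None` on a hit are hypothetical modeling choices, not details given in the text.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional, Set


class MissingEvidence(Enum):
    METHOD_DEFINITION_UNREACHABLE = "method definition unreachable"
    TYPE_INFORMATION_MISSING = "type information missing"
    CALL_RELATIONSHIP_UNCLEAR = "call relationship unclear"


@dataclass
class MethodRecord:
    has_type_declaration: bool
    # Methods with a direct call relationship to this one (hypothetical field).
    direct_call_partners: Set[str] = field(default_factory=set)


def completion_query(
    method: str, current_frame: str, index: Dict[str, MethodRecord]
) -> Optional[MissingEvidence]:
    """Return None on a hit (node can be upgraded), else the miss category."""
    record = index.get(method)
    if record is None:
        return MissingEvidence.METHOD_DEFINITION_UNREACHABLE
    if not record.has_type_declaration:
        return MissingEvidence.TYPE_INFORMATION_MISSING
    if current_frame not in record.direct_call_partners:
        return MissingEvidence.CALL_RELATIONSHIP_UNCLEAR
    return None
```

The checks run in the order the text lists them, so a method absent from the index is never reported as lacking type information.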
[0070] The system encapsulates the boundary level location markers, the node annotation results of each level, and the evidence missing type classification records into a semantic parsing result data packet for output.
[0071] Furthermore, based on the semantic parsing result data packet, the necessity markers, maximum semantic reference depth, and non-anchored explicit annotation requirements for each report structure unit are obtained, forming report structure constraint data, specifically including:
[0072] Extract the level position of the effective parsing depth boundary from the semantic parsing result data packet, and generate a necessity marker for each report structure unit based on that level, so as to constrain the content when the computer automatically generates the report: when the boundary is located at the symbolic feature layer, the code location precision positioning unit is marked as mandatory and the business impact description unit is marked as prohibited; when the boundary is located at the call topology layer, the number of connected components in the call directed graph is extracted, and the single path root cause localization unit is marked as mandatory when there is one connected component, while the multi-candidate path comparison unit is marked as mandatory when there is more than one; when the boundary is located at the business logic layer, the number of co-occurrence coverage nodes of the business operation identifiers in the thread-consistent log subset is counted, and the functional impact diffusion unit is marked as mandatory when this number exceeds a preset diffusion threshold.
[0073] The maximum semantic layer depth that any report structure unit is allowed to reference is the level at which the effective parsing depth boundary is located.
[0074] Map the various missing types in the evidence missing type classification record to the corresponding non-anchored explicit annotation requirements: unreachable method definition is mapped to definition boundary annotation constraints, unclear call relationship is mapped to path uncertainty declaration constraints, and missing type information is mapped to type source untraceable annotation constraints.
[0075] The set of necessity tags, the maximum semantic reference depth, and the set of non-anchored explicit annotation requirements are encapsulated into report structure constraint data output.
[0076] Specifically, based on the level position of the effective parsing depth boundary, necessity markers are generated for each report structure unit:
[0077] When the boundary is located at the symbolic feature layer, the code location precision positioning unit (a structured data unit containing technical positioning information such as the class name, method name, and bytecode line number where the exception occurred) is marked as mandatory; the business impact description unit is marked as prohibited because the current evidence set only supports symbolic layer technical information and has no business semantic layer evidence.
[0078] When the boundary is located at the call topology layer, the number of connected components is extracted from the call directed graph (a connected component is a subset of the graph's nodes in which a directed path exists between any two nodes and no directed path exists between a node in the subset and any node outside it): when the number of connected components is one, the single path root cause localization unit is marked as mandatory; when the number of connected components is greater than one, the multi-candidate path comparison unit is marked as mandatory, and the trusted anchor coverage of each path must meet the minimum anchor coverage limit configured by the system (default 50%, i.e., a path is included in the report only if at least half of its nodes are trusted anchors).
[0079] When the boundary is located at the business logic layer, the number of co-occurring coverage nodes of the business operation identifier in the thread-consistent log subset is counted (the number of non-repeating business operation identifiers that have a co-occurrence relationship with the parsing path). When the coverage number exceeds the preset diffusion threshold, the functional impact diffusion unit is marked as mandatory. The preset diffusion threshold is obtained in the following way: combined with the system architecture document, the median of the business operation nodes directly associated with each module is counted, and 1.5 times this median is rounded up as the diffusion threshold, indicating that when the impact range exceeds the normal association range of a single module, it is considered diffusion; when there is no architecture document, 3 is used as the default value.
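The marker rules and the diffusion threshold derivation can be sketched as follows. Unit names are written as snake_case keys and the function names are illustrative assumptions; the minimum anchor coverage check for candidate paths is left out of this sketch.

```python
import math
import statistics
from typing import Dict, List


def diffusion_threshold(ops_per_module: List[int]) -> int:
    """1.5 x median of directly associated operation nodes, rounded up; 3 if unknown."""
    if not ops_per_module:
        return 3  # default when no architecture document is available
    return math.ceil(1.5 * statistics.median(ops_per_module))


def necessity_markers(
    boundary: str,
    connected_components: int = 0,
    cooccurring_operations: int = 0,
    threshold: int = 3,
) -> Dict[str, str]:
    """Return report unit -> 'mandatory' / 'prohibited' for the given boundary layer."""
    markers: Dict[str, str] = {}
    if boundary == "symbolic_feature":
        markers["code_location_precision_positioning"] = "mandatory"
        markers["business_impact_description"] = "prohibited"
    elif boundary == "call_topology":
        if connected_components == 1:
            markers["single_path_root_cause_localization"] = "mandatory"
        elif connected_components > 1:
            markers["multi_candidate_path_comparison"] = "mandatory"
    elif boundary == "business_logic":
        if cooccurring_operations > threshold:
            markers["functional_impact_diffusion"] = "mandatory"
    else:
        raise ValueError(f"unknown boundary layer: {boundary}")
    return markers
```

With per-module operation counts of 2, 3, and 5, the median is 3 and the threshold becomes 5 (1.5 x 3 = 4.5, rounded up).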
[0080] The maximum semantic layer depth that any report structure unit is allowed to reference is the level at which the effective parsing depth boundary is located. No unit may reference semantic content beyond this depth.
[0081] The evidence missing types are mapped as follows: method definition unreachable is mapped to the definition boundary annotation constraint, which requires the corresponding unit to declare in a standardized format that the method definition cannot be obtained from the current bytecode structure index; unclear call relationship is mapped to the path uncertainty declaration constraint, which requires a declaration that the call path segment cannot be traced deterministically in the current context; missing type information is mapped to the type source untraceable annotation constraint, which requires a declaration that the type source of the node cannot be traced in the current bytecode structure index.
[0082] The above three types of constraints are encapsulated into report structure constraint data output.
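One possible shape for the encapsulated constraint data is shown below. The dictionary keys and the function name `build_constraint_data` are assumptions chosen for illustration; the text only states which three kinds of constraints are packaged together.

```python
from typing import Dict, Iterable, List, Union

# Evidence missing type -> explicit annotation requirement (names are illustrative).
ANNOTATION_CONSTRAINTS = {
    "method_definition_unreachable": "definition_boundary_annotation",
    "unclear_call_relationship": "path_uncertainty_declaration",
    "missing_type_information": "type_source_untraceable_annotation",
}


def build_constraint_data(
    markers: Dict[str, str], boundary_depth: int, missing_types: Iterable[str]
) -> Dict[str, Union[Dict[str, str], int, List[str]]]:
    """Package necessity markers, maximum reference depth, and annotation requirements."""
    requirements = sorted({ANNOTATION_CONSTRAINTS[t] for t in missing_types})
    return {
        "necessity_markers": dict(markers),
        "max_reference_depth": boundary_depth,
        "annotation_requirements": requirements,
    }
```

Duplicate missing types collapse to a single requirement, since the constraint applies per type rather than per node.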
[0083] Furthermore, non-anchored reference nodes are detected, local supplement requests are merged according to the evidence missing type and then sent, and the semantic increment of each return is counted. Referring to Figure 3, this specifically includes:
[0084] Local supplement requests are grouped according to the code modules corresponding to the non-anchored nodes referenced by each structural unit. Requests pointing to different code modules are executed concurrently, while requests pointing to the same code module are merged into a single targeted query. The query results are synchronously returned to all related structural units to eliminate data inconsistency under concurrent access.
[0085] After the parsing module returns the differential supplementary data, the number of newly added trusted anchors is counted as the semantic increment: when the semantic increment is greater than zero, the content of the corresponding structural unit is updated in a differential manner and the non-anchored reference detection is re-executed to enter the next round of request process; when the semantic increment is equal to zero, it is determined that the structural unit has reached the semantic limit of the current evidence set, and structured semantic boundary annotations are inserted at the positions of the remaining non-anchored content. The annotation content contains a standardized description of the corresponding evidence missing type, the unit is marked as completed, and the negotiation loop of the unit is terminated. Since the number of non-anchored nodes in each structural unit is monotonically non-increasing in each iteration and the node state can only be changed from non-anchored to trusted anchor in one direction, the negotiation process will inevitably terminate within a finite number of iterations.
[0086] Furthermore, cross-unit conflict nodes are resolved using the trusted anchor coverage of each conclusion as the criterion, and an adaptive anomaly report is output, specifically including:
[0087] After all structural units have completed negotiation, a mapping table is established from the identifiers of all cross-unit reference semantic nodes in the report to the parsing conclusions, and conflicting situations are detected in which the same semantic node identifier has different parsing conclusions in different structural units.
[0088] For each conflict node, compare the number of trusted anchors associated with each of the two conflicting parsing conclusions, and take the conclusion with more trusted anchors as the final parsing conclusion; when the two counts are equal, take the conclusion from the shallower parsing depth as the final parsing conclusion. The replaced conclusion, the number of trusted anchors for each conclusion, whether the tie-breaking rule was triggered, and the basis for the resolution are written into the conflict resolution record of the report metadata.
[0089] The content of conflict nodes in the relevant structural units is updated based on the final analysis results. The overall structural integrity of the report is verified. After confirming that the necessity markers of each structural unit have been satisfied, the adaptive report text, the analysis depth limitation declaration, and the conflict resolution record are packaged and output to form a complete adaptive anomaly report.
[0090] Specifically, supplementary requests are grouped according to the code modules corresponding to the non-anchored nodes referenced by each structural unit. The division of code modules is based on the granularity of the combination of the top-level package name and the sub-package name of the fully qualified method name in the bytecode structure index (e.g., com.example.order and com.example.payment are different modules). Requests pointing to different code modules are executed concurrently; multiple requests pointing to the same code module are merged into a single directed query, and the query results are synchronously returned to all related units, eliminating duplicate queries and data inconsistencies under concurrent access.
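The grouping and merging behavior can be sketched with a thread pool. The assumption that a module is the first three dot-separated segments of the fully qualified method name follows the `com.example.order` example but is not stated as a rule; `query` is a hypothetical callable standing in for the parsing module.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Tuple


def module_of(fq_method: str, depth: int = 3) -> str:
    """Assumed module key: the first `depth` segments of the qualified name."""
    return ".".join(fq_method.split(".")[:depth])


def dispatch_supplement_requests(
    requests: List[Tuple[str, str]],
    query: Callable[[str, List[str]], Any],
) -> Dict[str, Dict[str, Any]]:
    """requests: (unit_id, fully qualified method). Returns unit -> module -> result."""
    by_module: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for unit_id, method in requests:
        by_module[module_of(method)].append((unit_id, method))
    # One merged query per module; different modules run concurrently.
    with ThreadPoolExecutor() as pool:
        futures = {
            module: pool.submit(query, module, sorted({m for _, m in items}))
            for module, items in by_module.items()
        }
        results = {module: fut.result() for module, fut in futures.items()}
    # Every unit that referenced a module receives the same result object.
    per_unit: Dict[str, Dict[str, Any]] = defaultdict(dict)
    for module, items in by_module.items():
        for unit_id, _ in items:
            per_unit[unit_id][module] = results[module]
    return dict(per_unit)
```

Because all units share the single result object for a module, two units cannot observe different answers for the same module within one round.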
[0091] For each structural unit, after the assembler completes the initial content mapping, it performs non-anchored reference detection: if non-anchored reference nodes exist, a local supplement request is generated and sent to the parsing module; after the parsing module returns differential supplement data, it counts the number of newly added trusted anchors (semantic increment). When the semantic increment is greater than zero, only the content covered by the newly added anchors is updated in a differential manner, and the non-anchored reference detection is re-executed on the updated unit to enter the next round; when the semantic increment is equal to zero, it is determined that the unit has reached the semantic limit of the current evidence set, and structured semantic boundary annotations are inserted at the remaining non-anchored content positions (the annotation content contains a standardized description of the corresponding missing evidence type, such as the call relationship being untraceable in the current context), marking the unit as completed and exiting the negotiation loop. The limited termination of the negotiation process is based on the following fact: in each iteration, the number of non-anchored nodes monotonically does not increase (the node state can only be changed from non-anchored to trusted anchor in one direction, which is irreversible), and the negotiation process terminates when the semantic increment is equal to zero. Therefore, the negotiation process will inevitably terminate within a finite number of iterations, without the need for an additional timeout protection mechanism.
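The termination argument can be made concrete with a short loop. In this sketch `supplement` is a hypothetical callable standing in for the parsing module; it receives the pending non-anchored nodes and returns those it could upgrade to trusted anchors.

```python
from typing import Callable, FrozenSet, Set, Tuple


def negotiate(
    non_anchored: Set[str],
    supplement: Callable[[FrozenSet[str]], Set[str]],
) -> Tuple[Set[str], int]:
    """Run the negotiation loop; return (nodes needing boundary annotations, rounds)."""
    remaining = set(non_anchored)
    rounds = 0
    while remaining:
        # Only nodes that were pending can count toward the semantic increment.
        newly_anchored = set(supplement(frozenset(remaining))) & remaining
        rounds += 1
        if not newly_anchored:
            break  # semantic increment is zero: limit of the current evidence set
        remaining -= newly_anchored  # one-way transition: non-anchored -> anchor
    return remaining, rounds
```

Each round either removes at least one node from `remaining` or exits, so the loop runs at most one more time than the initial number of non-anchored nodes.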
[0092] After all units have completed negotiation, the system establishes a mapping table from the identifiers of all cross-unit reference semantic nodes in the report to the parsing conclusions of each unit, and detects conflicts in which the same node identifier has different parsing conclusions in different units. For each conflicting node, the parsing conclusion with more associated trusted anchors is taken as the final conclusion; if the numbers of trusted anchors are equal, the conclusion from the shallower parsing depth is taken as the final conclusion, because shallower conclusions rest on direct bytecode matching evidence and do not depend on an inference chain. The replaced conclusion, the number of trusted anchors of each party, whether the tie-breaking rule was triggered, and the resolution basis are all written into the conflict resolution record of the report metadata.
[0093] The system performs structural integrity verification on the entire report. After confirming that all mandatory units have generated content and that no prohibited unit has generated content, it encapsulates and outputs the adaptive report text, the analysis depth limitation declaration (included only when Scenario 3 is triggered), and the conflict resolution record to form a complete adaptive anomaly report.
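A minimal sketch of the pairwise resolution rule follows. The `Conclusion` fields and the record keys are illustrative assumptions; the case where both the anchor counts and the depths are equal is not covered by the text, and this sketch simply keeps the first argument.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Conclusion:
    unit: str
    text: str
    anchor_count: int
    depth: int  # 1 = symbolic feature, 2 = call topology, 3 = business logic


def resolve_conflict(a: Conclusion, b: Conclusion) -> Tuple[Conclusion, Dict[str, object]]:
    """Pick the final conclusion and build the conflict resolution record."""
    tie = a.anchor_count == b.anchor_count
    if tie:
        # Tie-break: prefer the shallower depth (assumption: first wins if depths match).
        winner, loser = (a, b) if a.depth <= b.depth else (b, a)
        basis = "shallower depth"
    else:
        winner, loser = (a, b) if a.anchor_count > b.anchor_count else (b, a)
        basis = "more trusted anchors"
    record = {
        "replaced_conclusion": loser.text,
        "anchor_counts": {a.unit: a.anchor_count, b.unit: b.anchor_count},
        "tie_break_triggered": tie,
        "basis": basis,
    }
    return winner, record
```

The returned record carries the four items the text requires in the report metadata.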
[0094] Example 2
[0095] This embodiment, based on the method described in Embodiment 1, provides a detailed explanation of the call path signature hash caching mechanism and the confidence gradient normalization mechanism. The technical content of the five stages—evidence set acquisition, three-layer progressive parsing, boundary determination and controlled completion, report structure constraint generation, and negotiated assembly and conflict resolution—from Embodiment 1 is fully retained in this embodiment. The following description focuses on the two new technical features added in this embodiment.
[0096] After the semantic parsing result data packet is output, a hash is calculated over the call path signature, which consists of the fully qualified method names of all frames in the thread frame state sequence arranged in call order. The hash value of the call path signature is combined with the bytecode structure index version identifier, obtained by hashing the contents of all entries in the bytecode structure index, to form a composite cache key. The composite cache key is stored as a cache entry together with the level position marker of the effective parsing depth boundary, the distribution record of trusted anchors at each layer, and the evidence missing type classification record. When a subsequent exception is triggered, before hierarchical semantic parsing starts, the call path signature hash value is calculated from the frame state sequence of the current exception and the version identifier is calculated from the current bytecode structure index, and the two are combined to perform a cache lookup. If the lookup succeeds, the boundary level position and trusted anchor distribution in the cache entry are loaded as the initial parsing state, and only the directional completion of non-anchored nodes within the boundary layer is re-executed using the current evidence set, skipping the full parsing process of the symbolic feature layer, call topology layer, and business logic layer. The completion result is merged with the cached state and encapsulated into a semantic parsing result data packet for output. If the lookup fails, the complete hierarchical parsing process is executed.
[0097] During the cache lookup process, after confirming a match in the call path signature hash value, the bytecode structure index version identifier stored in the matched cache entry is further extracted and compared character by character with the version identifier of the bytecode structure index in the current runtime environment. If the two are completely identical, the cache entry is deemed valid, and cache loading is performed. If there is any difference between the two, it is determined that the bytecode structure has changed since the cache entry was created, the cache entry is removed from the cache, the complete hierarchical parsing process is initiated, and the newly generated semantic parsing result data packet is rewritten to the cache after the complete parsing is completed. By binding cache validity to bytecode version consistency, it is ensured that changes in bytecode structure caused by code deployment updates or dynamic class loading will not cause the system to use expired trusted anchor data, thereby avoiding parsing errors caused by cache obsolescence.
[0098] Specifically, in this embodiment, after the complete three-layer parsing process described in Embodiment 1 is completed and the semantic parsing result data packet is output, the cache writing operation is started immediately.
[0099] The call path signature is calculated as follows: Extract the fully qualified names of all methods from the thread frame state sequence, arranged in call order (from the outermost calling frame to the innermost called frame). Concatenate the fully qualified names of all methods in order using a preset separator (such as the vertical bar). Perform a SHA-256 hash operation on this string, and use the hexadecimal representation of the result as the call path signature hash value. The SHA-256 algorithm will always produce the same 256-bit output under the same input, and the probability of different inputs producing the same output is extremely low (approximately on the order of 2 to the power of -256). In engineering practice, it can be considered as having no collision risk, satisfying the requirement of cache key uniqueness.
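The signature calculation maps directly onto Python's standard `hashlib` module. The function name and the UTF-8 encoding choice are assumptions; the vertical bar separator follows the example given in the text.

```python
import hashlib
from typing import Sequence


def call_path_signature(methods: Sequence[str], separator: str = "|") -> str:
    """SHA-256 hex digest of the method names joined in call order.

    methods is ordered from the outermost calling frame to the innermost called frame.
    """
    joined = separator.join(methods)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()
```

The digest is always 64 hexadecimal characters, and reversing the order of the same frames produces a different signature because the joined string differs.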
[0100] The bytecode structure index version identifier is calculated as follows: iterate through all loaded class entries in the bytecode structure index (including normal and dynamically typed entries). For each entry, concatenate the fully qualified class name with the signatures of all methods of that class in lexicographical order. Perform a SHA-256 hash operation on the concatenation of all entries, and use the hexadecimal representation of the result as the version identifier. When a class is loaded or unloaded at runtime, the set of entries changes, so the concatenated content and therefore the version identifier change accordingly, which in principle detects all runtime changes affecting the bytecode structure.
[0101] The hash value of the aforementioned call path signature is concatenated with the bytecode structure index version identifier (separated by a preset connector) to form a composite cache key. Using the composite cache key as an index, the level position marker of the effective parsing depth boundary, the distribution records of trusted anchors at each layer (including the node identifiers of the trusted anchors at each layer and their evidence source types), and the evidence missing type classification records from the semantic parsing result data packet are stored as a cache entry and written to a hash table structure in process memory (the cache capacity limit is a fixed number of entries set in the system configuration file, with a default of 1000 entries; when this limit is exceeded, entries are evicted according to a least-recently-used policy).
[0102] When a subsequent exception is triggered, before symbolic feature layer parsing starts, the system first calculates the call path signature hash value of the current exception and the current bytecode structure index version identifier, combines the two into a lookup key, and performs an exact key lookup in the cache hash table.
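A sketch of the version identifier calculation, with the index modeled as a dictionary from class name to method signatures. Sorting the class entries and the `#`, `,`, and newline delimiters are assumptions added so that the result does not depend on iteration order; the text specifies only the lexicographical ordering of method signatures within a class.

```python
import hashlib
from typing import Dict, List


def index_version(index: Dict[str, List[str]]) -> str:
    """SHA-256 hex digest over every class entry and its sorted method signatures."""
    digest = hashlib.sha256()
    for class_name in sorted(index):  # assumption: stable class order
        entry = class_name + "#" + ",".join(sorted(index[class_name]))
        digest.update(entry.encode("utf-8"))
        digest.update(b"\n")  # entry delimiter (assumption)
    return digest.hexdigest()
```

Loading a new class adds an entry and changes the digest, while listing the same methods in a different order does not.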
[0103] If a match is found, the system further performs a cache validity check: extracts the bytecode structure index version identifier stored in the matched cache entry, performs a character-by-character equality comparison with the currently calculated version identifier, and determines that the two strings are consistent if their lengths are the same and the character values at each character position are the same; otherwise, they are determined to be inconsistent.
[0104] When the version identifiers match, the cache entry is deemed valid: the boundary level position and trusted anchor distribution in the cache entry are loaded as the initial state for the current parsing, and the full parsing process of the symbolic feature layer and the call topology layer is skipped. The business logic layer is re-verified against the current logs, after which the process proceeds directly to the directional completion step for non-anchored nodes within the boundary layer. The completion step uses the evidence set frozen at the current trigger time (including the content captured from the current log buffer), so that even when the call paths are identical, the log buffer differences at the time of this exception are fully taken into account in the completion judgment. After completion finishes, the completion result is merged with the cached state and encapsulated into a semantic parsing result data packet for output. The subsequent report structure constraint generation and negotiated assembly steps are exactly the same as in Embodiment 1.
[0105] When the version identifiers are inconsistent, the cache entry is deemed invalid: the entry is deleted from the cache hash table, and the complete three-layer parsing process is started. After the three-layer parsing is completed and the semantic parsing result data packet is output, a new cache entry is written under the current composite cache key (which includes the latest version identifier) for reuse by subsequent exceptions on the same path.
[0106] When a cache lookup fails, the complete three-layer parsing process is executed, and the cache entry is written after completion.
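The lookup, validity check, and eviction flow can be sketched with `collections.OrderedDict`. The class name `ParseCache`, the `:` connector, and the least-recently-used eviction are assumptions of this sketch; the cached state is treated as an opaque object.

```python
from collections import OrderedDict
from typing import Any, Optional


class ParseCache:
    """In-memory cache keyed by call path hash plus index version identifier."""

    def __init__(self, capacity: int = 1000) -> None:
        self.capacity = capacity
        self._entries: "OrderedDict[str, tuple]" = OrderedDict()

    @staticmethod
    def _key(path_hash: str, version: str) -> str:
        return f"{path_hash}:{version}"  # ':' is an assumed connector

    def put(self, path_hash: str, version: str, state: Any) -> None:
        key = self._key(path_hash, version)
        self._entries[key] = (version, state)
        self._entries.move_to_end(key)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry

    def get(self, path_hash: str, version: str) -> Optional[Any]:
        key = self._key(path_hash, version)
        entry = self._entries.get(key)
        if entry is None:
            return None  # miss: caller runs the complete three-layer parsing
        stored_version, state = entry
        if stored_version != version:
            del self._entries[key]  # stale entry: invalidate and report a miss
            return None
        self._entries.move_to_end(key)  # mark as recently used
        return state
```

Because the version identifier is part of the key in this sketch, a lookup after a bytecode change already misses, and the explicit comparison acts as a defensive re-check.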
[0107] Through the above mechanism, when similar anomalies are triggered repeatedly in the production environment, the system skips full parsing of the first two layers and performs only single-layer completion. This reduces the amount of CPU computation compared with the full parsing path (the reduction is more pronounced for deep call stacks with many topology nodes) and shortens the latency of anomaly report generation; the version-aware invalidation mechanism ensures that the cache is automatically rebuilt after code deployments and updates, without sacrificing the accuracy of the parsing conclusions.
[0108] After calculating the difference in confidence indices between adjacent layers to form a gradient sequence, and before applying absolute and relative thresholds, a gradient normalization step is performed: for each element in the gradient sequence, the value of the element is divided by the sum of the total number of nodes in the corresponding two adjacent layers. The sum of the total number of nodes is equal to the sum of the total number of trusted anchor points and the total number of non-anchored nodes in the two layers. The normalized gradient sequence after replacing each element is used as the input for absolute and relative threshold determination. The sum of the total number of nodes in the two layers used for normalization is greater than zero in actual abnormal scenarios, and there is no case where the division by zero occurs. The absolute and relative thresholds are uniformly applied in the normalized numerical domain, without the need for separate configuration for different call stack depths. This ensures that the boundary determination conclusions remain comparable between abnormal events with large differences in the number of call stack frames, reducing boundary misjudgments caused by changes in call stack depth.
[0109] In this embodiment, after the gradient sequence calculation described in Embodiment 1 is completed and before the absolute threshold and relative threshold determination are performed, a gradient normalization step is inserted.
[0110] The normalization calculation process is as follows: For each element in the gradient sequence (corresponding to the confidence difference between a pair of adjacent layers), extract the total number of nodes in each of the two adjacent layers corresponding to that element. The total number of nodes is equal to the sum of the number of trusted anchor points and the number of non-anchored nodes in that layer. Add the total number of nodes in the two layers to obtain the total number of nodes in the two layers. Divide the original gradient difference by the total number of nodes in the two layers. The result is the normalized gradient value of the boundary between the layers, which means the average confidence change contributed by each node at the boundary between the layers. The total number of nodes in the two layers must be greater than zero in actual abnormal scenarios because: the call stack contains at least one frame, which corresponds to at least one symbolic feature layer node. Therefore, the number of symbolic feature layer nodes in the three parsing layers must be greater than zero, and there is no case where the total number of nodes in the two layers is zero. After performing the above operation on all elements of the gradient sequence, replace the original gradient sequence with the normalized gradient sequence. Subsequent absolute threshold scanning and relative threshold determination are both applied to the normalized gradient sequence.
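A short sketch of the normalization step. The function and parameter names are illustrative assumptions; `node_totals` holds A + U for each layer in parsing order.

```python
from typing import List


def normalize_gradients(confidences: List[float], node_totals: List[int]) -> List[float]:
    """Divide each adjacent-layer confidence difference by the pair's combined node count."""
    if len(confidences) != len(node_totals):
        raise ValueError("one node total is required per layer")
    normalized = []
    for i in range(len(confidences) - 1):
        combined = node_totals[i] + node_totals[i + 1]
        if combined <= 0:
            # The text argues this cannot occur in practice; guard anyway.
            raise ValueError("combined node count must be positive")
        normalized.append((confidences[i] - confidences[i + 1]) / combined)
    return normalized
```

With confidence indices 4.0, 2.0, 1.0 and layer totals 8, 8, 4, the raw gradients 2.0 and 1.0 become 2.0 / 16 = 0.125 and 1.0 / 12, which are then compared against the normalized thresholds.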
[0111] To keep the normalized threshold configuration practical, the system configuration file sets the default absolute threshold for the normalized scenario as follows: the historical anomaly sample set is analyzed, the normalized gradient value between each pair of adjacent layers is calculated for every sample, and the 70th value after sorting all normalized gradient values is taken as the absolute threshold. When no historical data is available, 0.05 is used as the default, meaning that an average per-node confidence change exceeding 0.05 is treated as a significant decrease. This value approximates the result of dividing the absolute threshold of 0.4 used in the non-normalized scenario of Embodiment 1 by the typical call stack frame count of 8, so the two configurations behave consistently in typical scenarios. The normalized relative threshold is likewise set to one quarter of the absolute threshold.
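The default threshold selection can be sketched as follows. This is a minimal sketch under one stated assumption: the "70th value after sorting" is interpreted here as a nearest-rank 70th percentile, which is one possible reading of the text and not confirmed by it. The function and constant names are hypothetical.

```python
from typing import Sequence, Tuple

# Fallback from the text: 0.4 (non-normalized threshold) / 8 (typical frames).
FALLBACK_ABSOLUTE_THRESHOLD = 0.05
# The relative threshold is one quarter of the absolute threshold.
RELATIVE_FRACTION = 0.25


def default_thresholds(
    history: Sequence[float], percentile: int = 70
) -> Tuple[float, float]:
    """Return (absolute_threshold, relative_threshold) for normalized gradients.

    Assumption: the historical rule is read as a nearest-rank percentile
    over all normalized gradient values from past anomaly samples.
    """
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    if history:
        ordered = sorted(history)
        # Nearest-rank: ceil(percentile / 100 * n), using integer arithmetic.
        rank = max(1, -(-percentile * len(ordered) // 100))
        absolute = ordered[rank - 1]
    else:
        absolute = FALLBACK_ABSOLUTE_THRESHOLD
    return absolute, absolute * RELATIVE_FRACTION


# Illustrative history values, not taken from the source.
sample_history = [0.10, 0.02, 0.05, 0.07, 0.01, 0.03, 0.09, 0.04, 0.08, 0.06]
with_history = default_thresholds(sample_history)
without_history = default_thresholds([])
```

A different percentile definition (for example, linear interpolation) would give slightly different thresholds on small sample sets; the nearest-rank form is used here only because it always returns an observed value.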
[0112] The normalization mechanism makes the boundary determination result independent of call stack size: for a typical call stack of 8 frames and a deep call stack of 40 frames, the same type of confidence pattern produces similar normalized gradient values. A fixed threshold therefore yields consistent determination behavior in both scenarios. This reduces the cases in which a deep-stack anomaly is misjudged as having low overall resolvability merely because of its call stack depth, and improves the accuracy of boundary determination across diverse runtime environments.
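The fixed-threshold determination referred to above can be sketched in Python using the three-way rule stated in claim 4, applied to an already normalized gradient sequence. Several details are assumptions for illustration: gradients are positive when confidence decreases, a value equal to the threshold counts as not exceeding it, the "all exceed with small spread" case is checked first because it is a special case of "some difference exceeds", and the boundary is reported as the index of the shallower layer of the steepest pair.

```python
from typing import Sequence, Tuple

# Layer order from the text, shallowest first.
LAYER_NAMES = ("symbol feature", "call topology", "business logic")


def effective_depth_boundary(
    gradients: Sequence[float], absolute: float, relative: float
) -> Tuple[int, bool]:
    """Return (boundary_layer_index, evidence_deficiency_flag).

    gradients[i] is the normalized confidence drop from layer i to layer i + 1.
    """
    if not gradients:
        raise ValueError("at least one adjacent-layer gradient is required")

    exceeds = [g > absolute for g in gradients]

    # Every difference is large and they are nearly uniform: shrink to the
    # first layer and signal that contextual evidence is insufficient.
    if all(exceeds) and (max(gradients) - min(gradients)) < relative:
        return 0, True

    # At least one significant drop: the boundary sits at the steepest one.
    if any(exceeds):
        steepest = max(range(len(gradients)), key=lambda i: gradients[i])
        return steepest, False

    # No significant drop anywhere: extend to the deepest layer.
    return len(gradients), False


# Illustrative calls using the no-history defaults 0.05 and 0.0125.
no_drop = effective_depth_boundary([0.015625, 0.0416], 0.05, 0.0125)
one_drop = effective_depth_boundary([0.01, 0.08], 0.05, 0.0125)
uniform_drop = effective_depth_boundary([0.08, 0.085], 0.05, 0.0125)
```

Because the same two thresholds are passed regardless of how many frames the call stack holds, the function needs no depth-specific configuration, which is the property the paragraph describes.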
[0113] Those skilled in the art will readily understand that the above description is merely a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.
Claims
1. An adaptive construction method for software anomaly reports based on deep semantic parsing, characterized in that it comprises: when an exception is triggered, exporting the bytecode structure index, the thread frame state sequence, and the bounded log buffer to form an evidence set; based on the evidence set, parsing the original anomaly information progressively through the symbol feature layer, the call topology layer, and the business logic layer, marking semantic nodes that match the evidence set as trusted anchors and unmatched nodes as non-anchored nodes, and generating a confidence index for each layer from the coverage of trusted anchors in that layer, to form a hierarchical semantic representation structure; based on the hierarchical semantic representation structure, forming a gradient sequence from the change in confidence between adjacent layers, determining the effective parsing depth boundary, performing targeted completion against the bytecode structure index for non-anchored nodes within the boundary layer, and outputting a semantic parsing result data packet; based on the semantic parsing result data packet, obtaining the necessity marker of each report structure unit, the maximum semantic reference depth, and the non-anchored explicit annotation requirements, to form report structure constraint data; and based on the semantic parsing result data packet and the report structure constraint data, detecting non-anchored reference nodes, merging them according to evidence missing type and sending local supplementation requests, counting the semantic increment returned each time, resolving cross-unit conflict nodes according to the degree of trusted anchor coverage of each conclusion, and outputting an adaptive anomaly report.
2. The adaptive construction method for software anomaly reports based on deep semantic parsing according to claim 1, characterized in that: the evidence set acquisition process includes: when an exception trigger signal is generated, exporting the bytecode structure index of the currently loaded classes; for dynamically generated classes whose complete bytecode records cannot be exported through the standard interfaces of the runtime environment, directly marking the associated semantic nodes as non-anchored nodes during parsing and recording the evidence missing type as unreachable method definition; exporting the current thread frame state sequence, which includes the fully qualified method name, the bytecode execution location, and the local variable type information corresponding to each frame; extracting the log buffer content within a preset time window before the exception trigger time, which includes the business operation identifier sequence, the timestamp corresponding to each identifier, and the thread identifier corresponding to each identifier; and solidifying the evidence into an immutable set at the moment the exception is triggered, with no update operation performed on the evidence set during parsing.
3. The adaptive construction method for software anomaly reports based on deep semantic parsing according to claim 1, characterized in that: the process of obtaining the hierarchical semantic representation structure includes: the symbol feature layer matches the exception class name, method signature, and bytecode line number in each stack frame against the type definition records in the bytecode structure index; symbol nodes with a corresponding type definition record in the index are marked as trusted anchors, and symbol nodes without one are marked as non-anchored nodes; the confidence index of the symbol feature layer is generated from the ratio of the number of trusted anchors to the number of non-anchored nodes in this layer; the call topology layer constructs a directed call graph with stack frames as nodes in call order, and searches the bytecode structure index for the method call instruction record corresponding to the call relationship represented by each directed edge; the nodes at both ends of a matched edge are marked as trusted anchors, and the confidence index of the call topology layer is generated from the ratio of the number of trusted anchors to the number of non-anchored nodes in this layer; the business logic layer performs co-occurrence matching between the path segments of the directed call graph and the business operation identifier sequence in the bounded log buffer whose thread identifier is consistent with that of the current exception within the time interval, excludes business operation identifiers generated by other threads in the same time window, and marks path nodes that co-occur with a business operation identifier in the corresponding time interval as trusted anchors; the confidence index of the business logic layer is generated from the ratio of the number of trusted anchors to the number of non-anchored nodes in this layer; and the three layer confidence indices and the node annotation results of each layer are encapsulated together into the hierarchical semantic representation structure.
4. The adaptive construction method for software anomaly reports based on deep semantic parsing according to claim 1, characterized in that: The semantic parsing result data packet acquisition process includes: calculating the difference between confidence indices between adjacent layers to form a gradient sequence, scanning the gradient sequence with a preset absolute threshold; if there is a difference exceeding the absolute threshold, the layer position with the largest difference magnitude is determined as the effective parsing depth boundary; if all differences are below the absolute threshold, the effective parsing depth boundary is extended to the deepest layer; if all differences exceed the absolute threshold and the difference between the maximum and minimum difference values is below a preset relative threshold, the effective parsing depth boundary is shrunk to the first layer, and a contextual evidence deficiency notification containing the evidence missing status of each layer is generated. After determining the effective parsing depth boundary, a targeted completion query is performed on each non-anchored node within the boundary layer in the bytecode structure index. The query scope includes the method definition records of the direct caller frame and the directly called frame corresponding to the non-anchored node. If the query is successful, the non-anchored node is upgraded to a trusted anchor and the confidence index of this layer is updated. If the query fails, the evidence missing type is recorded according to the ambiguity of the call relationship, the lack of type information, or the unreachability of the method definition. The hierarchical position markers of the effective parsing depth boundary, the annotation results of each layer of nodes, and the evidence missing type classification records are encapsulated into a semantic parsing result data packet for output.
5. The adaptive construction method for software anomaly reports based on deep semantic parsing according to claim 1, characterized in that: the process of obtaining the report structure constraint data includes: extracting the level position of the effective parsing depth boundary from the semantic parsing result data packet; generating labels for the report structure units based on the boundary level: when the boundary is located at the symbol feature layer, the code location unit is marked as mandatory and the business impact description unit is marked as prohibited; when the boundary is located at the call topology layer, the number of connected components in the directed call graph is extracted, the single-path root cause location unit is marked as mandatory when the number of connected components is one, and the multi-candidate path comparison unit is marked as mandatory when the number of connected components is greater than one; when the boundary is located at the business logic layer, the number of nodes covered by co-occurrence with business operation identifiers in the thread-consistent log buffer is counted, and the functional impact diffusion unit is marked as mandatory when the number exceeds a preset diffusion threshold; using the level at which the effective parsing depth boundary is located as the maximum semantic layer depth that all report structure units are allowed to reference; mapping each missing type in the evidence missing type classification record to the corresponding non-anchored explicit annotation requirement; and encapsulating the set of necessity markers, the maximum semantic layer depth, and the set of non-anchored explicit annotation requirements into the report structure constraint data for output.
6. The adaptive construction method for software anomaly reports based on deep semantic parsing according to claim 1, characterized in that: The process involves detecting non-anchored reference nodes, merging them according to the type of missing evidence, sending partial supplementation requests, and calculating the semantic increment returned each time. Specifically, this includes grouping partial supplementation requests according to the code module corresponding to the non-anchored nodes referenced by each structural unit, executing requests pointing to different code modules concurrently, and merging requests pointing to the same code module into a single targeted query. After the parsing module returns the differential supplementary data, the number of newly added trusted anchors is counted as the semantic increment: when the semantic increment is greater than zero, the content of the corresponding structural unit is updated in a differential manner and the non-anchored reference detection is re-executed to enter the next round of request process; when the semantic increment is equal to zero, it is determined that the structural unit has reached the semantic limit of the current evidence set, and structured semantic boundary annotations are inserted at the positions of the remaining non-anchored content. The annotation content includes a standardized description of the corresponding evidence missing type, and the unit is marked as completed and the negotiation loop of the unit is terminated.
7. The adaptive construction method for software anomaly reports based on deep semantic parsing according to claim 1, characterized in that: The process of adaptive anomaly reporting output includes: After all structural units have completed negotiation, a mapping table is established from the identifiers of all cross-unit reference semantic nodes in the report to the parsing conclusions. Conflicts are detected where the same semantic node identifier has different parsing conclusions in different structural units. For each conflicting node, the number of trusted anchors associated with each of the two parsing conclusions is compared, and the parsing conclusion with more trusted anchors is taken as the final parsing conclusion. When the number of trusted anchors associated with each of the two conflicting parsing conclusions is equal, the parsing conclusion from the shallower parsing depth is taken as the final parsing conclusion. The content of conflict nodes in the relevant structural units is updated based on the final analysis results. The overall structural integrity of the report is verified. After confirming that the necessity markers of each structural unit have been satisfied, the adaptive report text, the analysis depth limitation declaration, and the conflict resolution record are packaged and output to form a complete adaptive anomaly report.
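The conflict resolution rule in claim 7 can be illustrated with a minimal Python sketch. The Conclusion structure, its field names, and the numeric depth encoding (0 for the symbol feature layer, larger values for deeper layers) are assumptions for illustration. The claim compares two conclusions per node; the sketch applies the same ordering to any number of candidates, which is a generalization and not part of the claim.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Conclusion:
    """A parsing conclusion that one structural unit holds for a node."""
    unit: str             # structural unit that references the node
    node_id: str          # identifier of the cross-unit semantic node
    text: str             # the parsing conclusion itself
    trusted_anchors: int  # trusted anchors associated with this conclusion
    depth: int            # assumed encoding: 0 = shallowest parsing layer


def resolve_conflicts(
    conclusions: List[Conclusion],
) -> Tuple[Dict[str, Conclusion], List[str]]:
    """Return the final conclusion per node and the ids that were in conflict."""
    by_node: Dict[str, List[Conclusion]] = defaultdict(list)
    for conclusion in conclusions:
        by_node[conclusion.node_id].append(conclusion)

    final: Dict[str, Conclusion] = {}
    conflicts: List[str] = []
    for node_id, candidates in by_node.items():
        # A conflict exists when units disagree on the conclusion text.
        if len({c.text for c in candidates}) > 1:
            conflicts.append(node_id)
        # More trusted anchors wins; on a tie the shallower depth wins.
        final[node_id] = min(
            candidates, key=lambda c: (-c.trusted_anchors, c.depth)
        )
    return final, conflicts


# Illustrative data, not taken from the source.
sample = [
    Conclusion("root cause unit", "n1", "null order id", trusted_anchors=3, depth=1),
    Conclusion("impact unit", "n1", "stale cache entry", trusted_anchors=5, depth=2),
    Conclusion("root cause unit", "n2", "timeout in payment call", trusted_anchors=2, depth=2),
    Conclusion("code location unit", "n2", "missing handler", trusted_anchors=2, depth=0),
]
final_conclusions, conflict_ids = resolve_conflicts(sample)
```

The returned conflict list corresponds to what the claim calls the conflict resolution record; how that record is formatted in the final report is not specified here.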