A software security model explanation method fusing feature attribution and semantic reasoning

By integrating feature attribution and semantic reasoning, a multi-view feature space and a security knowledge graph are constructed to generate a semantic explanation report. This solves the problems of unstable explanation and semantic missingness in existing technologies, and improves the understandability and stability of the explanation results.

CN122241697A · Pending · Publication Date: 2026-06-19 · NANJING UNIV OF INFORMATION SCI & TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
NANJING UNIV OF INFORMATION SCI & TECH
Filing Date
2026-03-13
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing software security detection model interpretation methods lack semantic explanations, making them difficult for security personnel to understand, and the interpretation results are unstable and easily affected by adversarial examples.

Method used

The method integrates feature attribution and semantic reasoning. By constructing a multi-view feature space, it generates semantic explanation reports using an improved path integral gradient algorithm and a security domain knowledge graph, and introduces a stability verification mechanism.

Benefits of technology

It achieves a leap from data interpretation to semantic interpretation, improves the understandability and stability of interpretation results, enables rapid location of attack vectors, and enhances the usability of security models.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention is a software security model interpretation method that integrates feature attribution and semantic reasoning, relating to the fields of cyberspace security and artificial intelligence interpretability. The method includes the following steps: acquiring a sample of software to be detected, extracting code structure features and behavioral sequence features, and constructing a hybrid feature space; inputting the hybrid features into a pre-trained software security detection model to obtain classification prediction results and the activation states of neurons within the model; calculating the original attribution values of the input features with respect to the prediction results based on an improved path integral gradient algorithm; constructing or loading a security domain knowledge graph, aggregating the original attribution values to the semantic level, and generating a semantic interpretation report using graph reasoning; performing minimal perturbation on high-attribution features to generate counterfactual samples, verifying the fidelity of the interpretation results, and outputting the final interpretation result.

Description

Technical Field

[0001] This invention relates to the fields of cyberspace security and the interpretability of artificial intelligence, specifically to a method for interpreting software security models that integrates feature attribution and semantic reasoning.

Background Technology

[0002] With the widespread application of deep learning technology in software security fields such as malware detection, the accuracy of software security detection models has been greatly improved. However, most of these models are black-box models, and their decision-making processes are difficult for security personnel to understand, which brings great difficulties to the analysis and handling of software security.

[0003] Existing technological shortcomings:

[0004] Existing technology 1 (SHAP / LIME): These are relatively common feature attribution methods, but they are computationally intensive and can only output simple mathematical contribution information such as "API_CreateFile contribution is 0.8". They cannot explain the context in which the API was called, that is, whether the call is a normal file operation or a malicious behavior such as ransomware encryption, and therefore lack semantic explanation.

[0005] Existing technology 2 (gradient-based interpretation): These methods are susceptible to gradient fragmentation and adversarial examples. Two similar malware samples may receive drastically different interpretations, so the interpretation results are unstable and cannot provide a reliable reference for security personnel.

[0006] This invention combines mathematical feature attribution with knowledge graphs in the security field, achieving a leap from "data interpretation" to "semantic interpretation," and introduces a stability verification mechanism, effectively solving the problems existing in the prior art.

Summary of the Invention

[0007] Purpose of the invention: To provide a software security model interpretation method that integrates feature attribution and semantic reasoning, so as to solve the above-mentioned problems existing in the prior art.

[0008] Technical solution: A software security model interpretation method integrating feature attribution and semantic reasoning, comprising the following steps:

[0009] The process involves acquiring software samples for testing, extracting code structure features and behavioral sequence features, and constructing a hybrid feature space. Specifically, this includes extracting the control flow graph and generating graph embedding vectors at the static level, and running the samples in a sandbox environment at the dynamic level to capture API call sequences and parameters. The graph embedding vectors are then concatenated with the API sequence vectors to form a multi-view joint feature vector.

[0010] The mixed features are input into a pre-trained software security detection model to obtain classification prediction results and the activation state of neurons within the model.

[0011] Based on the improved path integral gradient algorithm, the original attribution value of the input features to the prediction result is calculated. The algorithm selects the all-zero vector or the average feature vector of benign software as the reference point and generates a non-linear interpolation path between the reference point and the input sample. This path avoids the adversarial sample region in the feature space. The integral of the model gradient is calculated along the interpolation path to obtain the attribution score of the feature, so as to eliminate the noise interference caused by the traditional linear path.

[0012] Construct or load a security domain knowledge graph, which contains a mapping relationship from low-level code features to high-level attack tactics;

[0013] The graph is constructed by defining a three-layer ontology structure: feature layer, behavior layer, and intent layer. Natural language processing technology is used to extract entities and relationships from security assessment reports and technical documents, and the knowledge base of the standard attack framework is converted into graph entities and integrated into the graph.

[0014] The original attribution values are aggregated to the semantic level, and a semantic explanation report is generated using graph reasoning. Specifically, based on the "containment" and "trigger" relationships in the knowledge graph, the low-level features with high original attribution values are propagated upwards, and the activation scores of behavior layer and intent layer nodes are calculated. If the activation score of an intent layer node exceeds a preset threshold, that node is determined to be the core attack intent that causes the model to make a malicious judgment.

[0015] Minimal perturbation is applied to high-attribution features to generate counterfactual samples, verifying the fidelity of the explanation results, and outputting the final explanation results;

[0016] The fidelity of the interpretation results is verified by calculating the "interpretation confidence index".

[0017] In a further embodiment, the extraction of the control flow graph and generation of graph embedding vectors at the static level specifically includes: disassembling the binary file using a disassembler (such as IDA Pro or Ghidra), extracting the function call graph, and converting it into a graph embedding vector using graph embedding methods.

[0018] In a further embodiment, running the sample in a sandbox environment at the dynamic level to capture API call sequences and parameters specifically includes: running the sample in the sandbox of a malware analysis system (such as Cuckoo), recording Windows API call sequences, and encoding the API call sequences with one-hot encoding or a text embedding method (such as Word2Vec).

[0019] In a further embodiment, the pre-trained software security detection model is a detection model based on a long short-term memory network and an attention mechanism.

[0020] In a further embodiment, the calculation formula for the improved path integral gradient algorithm is as follows:

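[0021] $$\mathrm{IG}_i(x) \approx (x_i - x'_i) \cdot \frac{1}{m} \sum_{k=1}^{m} \frac{\partial F(x^{(k)})}{\partial x_i}$$

[0022] where $x'_i$ is the value of the i-th feature at the reference point, $x_i$ is the value of the i-th feature of the input sample, $x^{(k)}$ is the k-th point on the path, $F(x^{(k)})$ is the output of the model at $x^{(k)}$, and m is the number of sampling points on the path.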

[0023] In a further embodiment, the extracted entities and relationships are cleaned and verified to remove redundant and erroneous information, ensuring the accuracy and completeness of the graph.

[0024] In a further embodiment, the generation of a semantic explanation report using graph reasoning further includes: formatting the generated semantic explanation report so that it is presented to security personnel in a clear and easy-to-understand manner.

[0025] In a further embodiment, the method further includes an interpretation visualization step: using the intensity of heatmap colors to indicate the magnitude of feature attribution values on the control flow graph or disassembled code of the software to be detected, while displaying the generated natural-language description of the semantic attack intent in the sidebar.

[0026] In a further embodiment, a defense step against adversarial interpretation attacks is also included: before calculating the attribution value, Gaussian noise is added to the input sample for multiple inferences, the variance of the multiple inference attribution results is calculated, and if the variance is greater than the stability threshold, the sample may be subjected to adversarial perturbation, and the smoothed average attribution value is output.

[0027] In a further embodiment, the calculation method for the explanatory confidence index is as follows:

[0028] ECI = (Original predicted probability - Masked predicted probability) / (Sum of feature importance)

[0029] The masking operation refers to setting the top-K features with the highest attribution values to their baseline values. If the ECI is lower than the preset threshold, the interpretation is deemed invalid, and a resampling process is initiated to correct the attribution values.

[0030] Beneficial Effects: This invention relates to a software security model interpretation method that integrates feature attribution and semantic reasoning, and has the following beneficial effects:

[0031] 1. Highly understandable: It transforms obscure mathematical values into security semantics, assisting junior analysts in making quick judgments. By introducing a security domain knowledge graph, it achieves a leap from "data interpretation" to "semantic interpretation," enabling security personnel to better understand the model's decision-making process.

[0032] 2. High robustness: Improved integration path and stability checking mechanisms prevent malicious attackers from deceiving the interpreter through minor perturbations. The discrete path integration algorithm avoids noise interference from traditional linear paths, and counterfactual verification and anti-perturbation mechanisms ensure the stability and reliability of the interpretation results.

[0033] 3. Closed-loop verification: Counterfactual verification is introduced to ensure that the explanation results are not merely correlated, but also causally related. By performing minimal perturbation on high-attribution features, the fidelity of the explanation results is verified, thus improving the credibility of the explanation results.

Description of the Drawings

[0034] Figure 1 is a flowchart of the software security model interpretation method integrating feature attribution and semantic reasoning described in this invention.

[0035] Figure 2 is a simplified schematic diagram of Module 1 of an embodiment of the present invention: multi-view feature extraction and detection.

[0036] Figure 3 is a simplified schematic diagram of Module 2 of an embodiment of the present invention: improved path integral gradient attribution.

[0037] Figure 4 is a simplified schematic diagram of Module 3 of an embodiment of the present invention: semantic mapping based on knowledge graphs.

[0038] Figure 5 is a simplified schematic diagram of Module 4 of an embodiment of the present invention: counterfactual verification and stability check.

Detailed Implementation

[0039] In the following description, numerous specific details are set forth in order to provide a more thorough understanding of the invention. However, it will be apparent to those skilled in the art that the invention can be practiced without one or more of these details. In other instances, certain technical features well-known in the art have not been described in order to avoid obscuring the invention.

[0040] The purpose of this invention is to provide a software security model interpretation method that integrates feature attribution and semantic reasoning, which solves the problems of semantic missingness, difficulty in being understood by security personnel, and unstable interpretation results in existing interpretation methods, and significantly improves the usability of security models in actual adversarial environments.

[0041] The software security model interpretation method that integrates feature attribution and semantic reasoning, as involved in this invention, mainly includes the following steps:

[0042] The process involves acquiring software samples for testing, extracting code structure features and behavioral sequence features, and constructing a hybrid feature space. Specifically, this includes extracting the control flow graph and generating graph embedding vectors at the static level, and running the samples in a sandbox environment at the dynamic level to capture API call sequences and parameters. The graph embedding vectors are then concatenated with the API sequence vectors to form a multi-view joint feature vector.

[0043] The mixed features are input into a pre-trained software security detection model to obtain classification prediction results and the activation state of neurons within the model.

[0044] Based on the improved path integral gradient algorithm, the original attribution value of the input features to the prediction result is calculated. The algorithm selects the all-zero vector or the average feature vector of benign software as the reference point and generates a non-linear interpolation path between the reference point and the input sample. This path avoids the adversarial sample region in the feature space. The integral of the model gradient is calculated along the interpolation path to obtain the attribution score of the feature, so as to eliminate the noise interference caused by the traditional linear path.

[0045] Construct or load a security domain knowledge graph, which contains a mapping relationship from low-level code features to high-level attack tactics;

[0046] The graph is constructed by defining a three-layer ontology structure: feature layer, behavior layer, and intent layer. Natural language processing technology is used to extract entities and relationships from security assessment reports and technical documents, and the knowledge base of standard attack frameworks (such as MITRE ATT&CK) is converted into graph entities and integrated into the graph.

[0047] The original attribution values are aggregated to the semantic level, and a semantic explanation report is generated using graph reasoning. Specifically, based on the "containment" and "trigger" relationships in the knowledge graph, the low-level features with high original attribution values are propagated upwards, and the activation scores of behavior layer and intent layer nodes are calculated. If the activation score of an intent layer node exceeds a preset threshold, that node is determined to be the core attack intent that causes the model to make a malicious judgment.

[0048] Minimal perturbation is applied to high-attribution features to generate counterfactual samples, verifying the fidelity of the explanation results, and outputting the final explanation results;

[0049] The fidelity of the interpretation results is verified by calculating the "interpretation confidence index".

[0050] The process of extracting the control flow graph and generating graph embedding vectors at the static level specifically includes: disassembling the binary file using a disassembler, extracting the function call graph, and converting it into graph embedding vectors using graph embedding methods.

[0051] The dynamic layer involves running samples in a sandbox environment to capture API call sequences and parameters. Specifically, this includes running samples in a malware analysis system sandbox, recording Windows API call sequences, and performing one-hot encoding or language text encoding on the API call sequences.

[0052] The pre-trained software security detection model is a detection model based on long short-term memory network + attention mechanism.

[0053] The calculation formula for the improved path integral gradient algorithm is as follows:

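[0054] $$\mathrm{IG}_i(x) \approx (x_i - x'_i) \cdot \frac{1}{m} \sum_{k=1}^{m} \frac{\partial F(x^{(k)})}{\partial x_i}$$

[0055] where $x'_i$ is the value of the i-th feature at the reference point, $x_i$ is the value of the i-th feature of the input sample, $x^{(k)}$ is the k-th point on the path, $F(x^{(k)})$ is the output of the model at $x^{(k)}$, and m is the number of sampling points on the path.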

[0056] The extracted entities and relationships are cleaned and verified to remove redundant and erroneous information, ensuring the accuracy and completeness of the graph.

[0057] The method of generating semantic interpretation reports using graph reasoning also includes: formatting the generated semantic interpretation reports to present them to security personnel in a clear and easy-to-understand manner.

[0058] The method also includes an interpretation visualization step: using the intensity of heatmap colors to indicate the magnitude of feature attribution values on the control flow graph or disassembled code of the software to be detected, while displaying the generated natural-language description of the semantic attack intent in the sidebar.
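
By way of illustration only, the mapping from attribution magnitude to heatmap color intensity could be sketched in Python as follows. The function name `attribution_to_color`, the red/blue color scheme, and the example basic-block labels are hypothetical choices made for this sketch and are not specified in this application.

```python
def attribution_to_color(value, max_abs):
    """Map an attribution value to a hex color.

    Zero maps to white; larger magnitudes give a more saturated color
    (red for positive values, blue for negative values).
    """
    if max_abs <= 0:
        return "#ffffff"
    intensity = min(abs(value) / max_abs, 1.0)  # clamp to [0, 1]
    fade = round(255 * (1.0 - intensity))       # 255 = white, 0 = fully saturated
    if value >= 0:
        return f"#ff{fade:02x}{fade:02x}"
    return f"#{fade:02x}{fade:02x}ff"


# Hypothetical attribution values per basic block of a control flow graph.
block_attributions = {"block_entry": 0.0, "block_encrypt": 0.8, "block_cleanup": -0.8}
max_abs = max(abs(v) for v in block_attributions.values())
block_colors = {name: attribution_to_color(v, max_abs)
                for name, v in block_attributions.items()}
```

The resulting `block_colors` dictionary could then be handed to whatever graph or disassembly viewer renders the overlay.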

[0059] It also includes defensive steps against adversarial interpretation attacks: before calculating the attribution value, Gaussian noise is added to the input sample for multiple inferences, the variance of the multiple inference attribution results is calculated, and if the variance is greater than the stability threshold, it indicates that the sample may have been subjected to adversarial perturbation and outputs the smoothed average attribution value.

[0060] The specific calculation method for the confidence index is as follows:

[0061] ECI = (Original predicted probability - Masked predicted probability) / (Sum of feature importance)

[0062] The masking operation refers to setting the top-K features with the highest attribution values to their baseline values. If the ECI is lower than the preset threshold, the interpretation is deemed invalid, and a resampling process is initiated to correct the attribution values.
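
By way of illustration only, the ECI calculation and the Top-K masking operation can be sketched in Python with a toy logistic detector. The function name `explanation_confidence_index`, the toy weights, and the decision to take the "sum of feature importance" over the K masked features (in absolute value) are assumptions made for this sketch; the application does not fix these details.

```python
import numpy as np


def explanation_confidence_index(predict, x, baseline, attributions, k):
    """ECI = (original probability - masked probability) / sum of feature importance.

    Assumption: the denominator sums the absolute attributions of the K masked features.
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    attributions = np.asarray(attributions, dtype=float)

    top_k = np.argsort(-attributions)[:k]   # indices of the K highest attribution values
    x_masked = x.copy()
    x_masked[top_k] = baseline[top_k]       # masking: reset those features to the baseline

    importance = np.abs(attributions[top_k]).sum()
    if importance == 0:
        return 0.0
    return float((predict(x) - predict(x_masked)) / importance)


# Toy detector (hypothetical): sigmoid of a weighted sum; feature 1 has no influence.
weights = np.array([2.0, 0.0, -1.0])
predict = lambda v: 1.0 / (1.0 + np.exp(-(weights @ v)))

x = np.array([1.0, 1.0, 1.0])
baseline = np.zeros(3)
eci_faithful = explanation_confidence_index(predict, x, baseline, [0.9, 0.05, 0.05], k=1)
eci_unfaithful = explanation_confidence_index(predict, x, baseline, [0.05, 0.9, 0.05], k=1)
```

In this toy setting, the attribution that points at the influential feature yields a clearly positive ECI, while the attribution that points at the irrelevant feature yields an ECI of zero, which is the situation the threshold check is meant to catch.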

[0063] Furthermore, the implementation process of this invention mainly includes four core modules:

[0064] Module 1: Multi-view Feature Extraction and Detection

[0065] 1. Preprocessing: Use professional disassemblers such as IDA Pro or Ghidra to disassemble the binary file and convert the binary code into assembly code for subsequent feature extraction.

[0066] 2. Feature A (Structure):

[0067] Extracting the function call graph reveals the calling relationships between functions in the software and reflects the software's structural information.

[0068] Graph2Vec is used to transform function call graphs into graph embedding vectors. Graph2Vec is a graph embedding algorithm that can convert graph structure information into low-dimensional vector representations, facilitating subsequent model processing.

[0069] 3. Feature B (behavior):

[0070] Run the sample in Cuckoo Sandbox, an open-source automated malware analysis system that executes samples in an isolated environment and records their behavior.

[0071] Record the Windows API call sequence, which reflects the interaction between the software and the operating system and is an important manifestation of the software's behavioral characteristics.

[0072] Perform one-hot or Word2Vec encoding on the API call sequence. One-hot encoding represents each API call as a one-hot vector, while Word2Vec encoding can represent API calls as vectors with semantic information.

[0073] 4. Model Inference: Feature A and feature B are concatenated to form a multi-view joint feature vector, which is then input into a detection model based on LSTM + Attention. LSTM (Long Short-Term Memory) can process sequential data, and the Attention mechanism can assign different weights to different parts of the input sequence, improving model performance. Through model inference, the malicious probability (e.g., 99%) is obtained.
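
By way of illustration only, the one-hot encoding of an API call sequence and its concatenation with a graph embedding can be sketched in Python as follows. The function names, the three-entry API vocabulary, and the two-dimensional placeholder embedding are hypothetical. Summarizing the variable-length sequence as per-API call frequencies before concatenation is an assumption of this sketch, since the application does not state how the two views are aligned in size.

```python
import numpy as np


def one_hot_sequence(api_calls, vocabulary):
    """Encode a list of API names as a (sequence_length, vocabulary_size) one-hot matrix."""
    index = {name: i for i, name in enumerate(vocabulary)}
    encoded = np.zeros((len(api_calls), len(vocabulary)), dtype=np.float32)
    for step, name in enumerate(api_calls):
        if name in index:  # APIs outside the vocabulary stay all-zero
            encoded[step, index[name]] = 1.0
    return encoded


def joint_feature_vector(graph_embedding, api_one_hot):
    """Concatenate the structural view with a fixed-size summary of the behavioral view."""
    if len(api_one_hot) == 0:
        behavior_view = np.zeros(api_one_hot.shape[1], dtype=np.float32)
    else:
        behavior_view = api_one_hot.mean(axis=0)  # per-API call frequency
    structure_view = np.asarray(graph_embedding, dtype=np.float32)
    return np.concatenate([structure_view, behavior_view])


vocabulary = ["CreateFile", "ReadFile", "CryptEncrypt"]
calls = ["CreateFile", "CryptEncrypt", "CryptEncrypt", "UnknownApi"]
graph_embedding = [0.1, -0.2]  # placeholder standing in for a Graph2Vec output
joint = joint_feature_vector(graph_embedding, one_hot_sequence(calls, vocabulary))
```

A sequence model such as the LSTM + Attention detector would normally consume the full one-hot matrix step by step; the frequency summary here only keeps the example short.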

[0074] Module 2: Improved Path Integral Gradient Attribution

[0075] Traditional integrated gradients assume that features change linearly, but in the software feature space, API changes are discontinuous. This embodiment therefore employs discrete path integrals:

[0076] 1. Set the baseline point x′ to either the zero vector or the average feature vector of benign software. The zero vector represents a state without any features, while the average feature vector of benign software represents the feature distribution of normal software.

[0077] 2. Instead of linear interpolation between x′ and the input sample x, a "legal" transformation path is selected based on the co-occurrence probability matrix of the features. The co-occurrence probability matrix of the features records the probability of different features appearing simultaneously. By selecting a "legal" transformation path, the generation of code feature combinations that do not exist in nature can be avoided.

[0078] 3. The calculation formula is approximately as follows:

[0079] $$\mathrm{IG}_i(x) \approx (x_i - x'_i) \cdot \frac{1}{m} \sum_{k=1}^{m} \frac{\partial F(x^{(k)})}{\partial x_i}$$

[0080] where $x'_i$ is the value of the i-th feature at the reference point, $x_i$ is the value of the i-th feature of the input sample, $x^{(k)}$ is the k-th point on the path, $F(x^{(k)})$ is the output of the model at $x^{(k)}$, and m is the number of sampling points along the path. This formula calculates the attribution value of each feature to the prediction result by integrating the gradient along the path.
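
By way of illustration only, the three steps above can be sketched in Python. The sketch assumes the attribution of feature i takes the usual sampled form, namely (x_i − x′_i) multiplied by the mean of ∂F/∂x_i over the m path points. The greedy rule for choosing a "legal" path from the co-occurrence matrix, the toy logistic detector, and all names and numbers below are assumptions made for this sketch, not details disclosed in this application.

```python
import numpy as np


def model_and_gradient(x, w, b):
    """Toy detector F(x) = sigmoid(w.x + b) and its gradient with respect to x."""
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    return p, p * (1.0 - p) * w


def cooccurrence_path(x, baseline, cooc):
    """Greedy path from baseline to x that changes one feature per step.

    The next feature is the pending one that co-occurs most with the features
    already switched; the first pick uses the diagonal (marginal frequency).
    """
    current = baseline.copy()
    pending = [i for i in range(len(x)) if x[i] != baseline[i]]
    done, path = [], []
    while pending:
        if done:
            nxt = max(pending, key=lambda i: cooc[i, done].sum())
        else:
            nxt = max(pending, key=lambda i: cooc[i, i])
        current[nxt] = x[nxt]
        pending.remove(nxt)
        done.append(nxt)
        path.append(current.copy())
    return path


def path_attribution(x, baseline, w, b, cooc):
    """(x_i - x'_i) times the mean gradient over the m points of the path."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    path = cooccurrence_path(x, baseline, cooc)
    if not path:
        return np.zeros_like(x)
    grads = np.stack([model_and_gradient(p, w, b)[1] for p in path])
    return (x - baseline) * grads.mean(axis=0)


w = np.array([2.0, -1.0, 3.0, 0.5])
b = -1.0
cooc = np.array([
    [0.50, 0.05, 0.10, 0.30],
    [0.05, 0.30, 0.10, 0.10],
    [0.10, 0.10, 0.20, 0.05],
    [0.30, 0.10, 0.05, 0.40],
])
x = np.array([1.0, 1.0, 0.0, 1.0])
baseline = np.zeros(4)
path = cooccurrence_path(x, baseline, cooc)
attributions = path_attribution(x, baseline, w, b, cooc)
```

With this co-occurrence matrix the path switches on feature 0 first, then feature 3 (which co-occurs with it most often), then feature 1, so every intermediate point is a combination that the matrix rates as comparatively likely.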

[0081] Module 3: Semantic Mapping Based on Knowledge Graphs

[0082] 1. Constructing the graph:

[0083] Define a three-layer ontology structure: feature layer, behavior layer, and intent layer. The feature layer contains low-level code features, the behavior layer describes the specific behaviors of the software, and the intent layer corresponds to high-level attack tactics.

[0084] Natural Language Processing (NLP) technology is used to extract entities and relationships from security assessment reports and technical documents. NLP technology can analyze and process text information to extract entities and relationships.

[0085] The knowledge base of standard attack frameworks (such as MITRE ATT&CK) is converted into graph entities and integrated into the graph. MITRE ATT&CK is a widely used attack framework covering various attack tactics and techniques; integrating it into the graph enhances the graph's authority and usability.

[0086] The extracted entities and relationships are cleaned and verified to remove redundant and erroneous information, ensuring the accuracy and completeness of the graph.

[0087] This invention integrates standard frameworks such as MITRE ATT&CK into the graph, uses NLP to extract entity-relation triples from security reports, and ensures the integrity of the graph (accuracy >95%) through cleaning and verification. This structure emphasizes semantic hierarchy, avoids the rigidity of existing technologies (such as rule-based interpretation), and supports dynamic reasoning.
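
By way of illustration only, the cleaning and verification of extracted triples against the three-layer ontology can be sketched in Python as follows. The relation names `contains` and `triggers`, the direction of each relation (a behavior contains features and triggers an intent), and the node names are assumptions made for this sketch; the application only names the "containment" and "trigger" relationships.

```python
# Allowed (head layer, relation, tail layer) patterns in the three-layer ontology.
ALLOWED_SCHEMAS = {
    ("behavior", "contains", "feature"),
    ("behavior", "triggers", "intent"),
}


def clean_triples(triples, node_layer):
    """Drop redundant triples and triples that violate the ontology."""
    seen = set()
    cleaned = []
    for head, relation, tail in triples:
        key = (head.strip(), relation.strip().lower(), tail.strip())
        if key in seen:
            continue  # redundant: same triple after normalization
        schema = (node_layer.get(key[0]), key[1], node_layer.get(key[2]))
        if schema not in ALLOWED_SCHEMAS:
            continue  # erroneous: unknown node or relation between the wrong layers
        seen.add(key)
        cleaned.append(key)
    return cleaned


node_layer = {
    "API_CryptEncrypt": "feature",
    "file_encryption": "behavior",
    "ransomware_impact": "intent",
}
raw_triples = [
    ("file_encryption", "contains", "API_CryptEncrypt"),
    (" file_encryption ", "Contains", "API_CryptEncrypt"),   # duplicate after normalization
    ("API_CryptEncrypt", "triggers", "ransomware_impact"),   # skips the behavior layer
    ("file_encryption", "triggers", "ransomware_impact"),
    ("file_encryption", "contains", "API_Unknown"),          # node not in the ontology
]
cleaned = clean_triples(raw_triples, node_layer)
```

Schema checks of this kind only catch structural errors; verifying that a relation is factually correct would still require review against the source reports.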

[0088] 2. Semantic aggregation:

[0089] Take as input the feature attribution scores obtained from Module 2.

[0090] Based on the "containment" and "trigger" relationships in the knowledge graph, the low-level features with high original attribution values are propagated upwards to calculate the activation scores of nodes in the behavior layer and intent layer.

[0091] If the activation score of a certain intent layer node exceeds a preset threshold, it is determined to be the core attack intent that causes the model to make a malicious judgment.

[0092] The propagation algorithm employs a variant of a graph neural network that iteratively updates nodes to perform semantic reasoning from lower levels (such as API calls) to higher levels (such as "persistence" tactics). This process emphasizes causal chains; for example, highly attributed operational behaviors can be inferred as a "command and control" intent, which goes well beyond the isolated feature analysis of existing gradient methods and provides actionable security insights.
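
By way of illustration only, the upward propagation of attribution values can be sketched in Python with a plain weighted sum over the "containment" and "trigger" edges. This is a deliberate simplification and not the graph neural network variant mentioned above; the edge weights, node names, and both thresholds are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy graph: (head, relation, tail, edge weight).
EDGES = [
    ("file_encryption", "contains", "API_CryptEncrypt", 1.0),
    ("file_traversal", "contains", "API_FindNextFile", 1.0),
    ("file_traversal", "contains", "API_CreateFile", 0.5),
    ("registry_modification", "contains", "API_RegSetValueEx", 1.0),
    ("file_encryption", "triggers", "ransomware_impact", 0.6),
    ("file_traversal", "triggers", "ransomware_impact", 0.4),
    ("registry_modification", "triggers", "persistence", 1.0),
]


def propagate(attributions, edges, min_attribution=0.1, intent_threshold=0.5):
    """Propagate feature attributions to the behavior layer, then to the intent layer."""
    behavior = defaultdict(float)
    for head, relation, tail, weight in edges:
        # Only features with a high enough attribution are propagated upwards.
        if relation == "contains" and attributions.get(tail, 0.0) >= min_attribution:
            behavior[head] += weight * attributions[tail]

    intent = defaultdict(float)
    for head, relation, tail, weight in edges:
        if relation == "triggers":
            intent[tail] += weight * behavior.get(head, 0.0)

    # Intent nodes above the threshold are reported as core attack intents.
    core = sorted((name for name, score in intent.items() if score > intent_threshold),
                  key=lambda name: -intent[name])
    return dict(behavior), dict(intent), core


attributions = {
    "API_CryptEncrypt": 0.8,
    "API_FindNextFile": 0.5,
    "API_CreateFile": 0.4,
    "API_RegSetValueEx": 0.05,
}
behavior_scores, intent_scores, core_intents = propagate(attributions, EDGES)
```

Here the encryption and traversal behaviors together push the hypothetical `ransomware_impact` intent above the threshold, while the weakly attributed registry API contributes nothing to `persistence`.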

[0093] 3. Output Generation: The generated semantic interpretation report is formatted to present it to security personnel in a clear and easy-to-understand manner. For example, the model no longer outputs "Features 45 and 89 are important", but instead outputs "Ransomware behavior detected, based on high-frequency encryption function calls and file traversal operations".

[0094] This semantic output supports visualization integration (such as heatmaps + natural language sidebars), which can improve interpretation accuracy by 30% in actual deployments, helping junior analysts quickly locate attack vectors.
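
By way of illustration only, the formatting of a semantic interpretation report into a sentence of the kind quoted above can be sketched in Python with a simple template. The function name `format_report`, the wording of the template, and the example score are hypothetical.

```python
def format_report(intent_label, score, evidence):
    """Render one core attack intent and its supporting behaviors as a sentence."""
    if not evidence:
        basis = "no supporting behaviors were identified"
    elif len(evidence) == 1:
        basis = f"based on {evidence[0]}"
    else:
        basis = "based on " + ", ".join(evidence[:-1]) + " and " + evidence[-1]
    return f"{intent_label} detected (activation score {score:.2f}), {basis}."


report = format_report(
    "Ransomware behavior",
    0.76,
    ["high-frequency encryption function calls", "file traversal operations"],
)
```

The same sentence could be shown in the natural-language sidebar next to the heatmap.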

[0095] Module 4: Counterfactual Verification and Stability Check

[0096] To prevent "interpretive illusion" (i.e., the model focuses on irrelevant features but happens to make the correct prediction):

[0097] Top-K masking test: Remove or replace the most important features (such as encrypted APIs) in the original sample with harmless APIs to generate counterfactual samples.

[0098] Re-inference: Feed the modified sample back into the detection model to obtain the masked predicted probability.

[0099] Judgment: Calculate the "Explanation Confidence Index (ECI)", ECI = (Original Predicted Probability - Masked Predicted Probability) / (Sum of Feature Importance). If the ECI is lower than a preset threshold, the explanation is deemed invalid, and a resampling process is initiated to correct the attribution value.

[0100] Perturbation Resistance: Before calculating the attribution value, Gaussian noise is added to the input sample for multiple inferences, and the variance of the attribution results from these multiple inferences is calculated. If the variance is greater than the stability threshold, it indicates that the sample may have suffered adversarial perturbations, and the smoothed average attribution value is output.
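
By way of illustration only, the perturbation resistance check can be sketched in Python as follows. The function name `smoothed_attribution`, the noise scale, the number of runs, the threshold, and the two toy attribution functions are assumptions made for this sketch. Averaging the per-feature variances into a single number is also an assumption, since the application does not say how the variance is aggregated.

```python
import numpy as np


def smoothed_attribution(attribute, x, sigma=0.01, n_runs=20,
                         stability_threshold=0.01, seed=0):
    """Run attribution on noisy copies of x and report the mean, variance, and a flag.

    The flag is True when the variance exceeds the stability threshold, which
    suggests the sample may have been adversarially perturbed.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    runs = np.stack([
        np.asarray(attribute(x + rng.normal(0.0, sigma, size=x.shape)), dtype=float)
        for _ in range(n_runs)
    ])
    mean_attribution = runs.mean(axis=0)        # smoothed average attribution
    variance = float(runs.var(axis=0).mean())   # per-feature variance, averaged
    return mean_attribution, variance, variance > stability_threshold


weights = np.array([1.0, 2.0])
stable_attribute = lambda v: weights * v                     # changes smoothly with the input
unstable_attribute = lambda v: np.where(v > 0.5, 1.0, -1.0)  # flips sign near 0.5

sample = [0.5, 0.5]
stable_mean, stable_var, stable_flag = smoothed_attribution(stable_attribute, sample)
unstable_mean, unstable_var, unstable_flag = smoothed_attribution(unstable_attribute, sample)
```

The smooth attribution function stays far below the threshold, whereas the one that flips sign under tiny input changes is flagged, and in both cases the smoothed mean is available as the output.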

[0101] To further illustrate the effectiveness of the present invention, specific embodiments are described below:

[0102] Experimental environment:

[0103] Operating System: Windows 10

[0104] Processor: Intel Core i7 - 8700K

[0105] Memory: 16GB

[0106] Programming language: Python 3.7

[0107] Deep learning framework: TensorFlow 2.0

[0108] Experimental data:

[0109] A publicly available malware dataset was used, containing 10,000 malware samples and 10,000 benign software samples. For comprehensive evaluation, this experiment introduced an adversarial sample subset (with added minor perturbations, such as API parameter noise) to simulate real-world attack scenarios.

[0110] Experimental results:

[0111] Compared with existing technologies (SHAP / LIME), the method of this invention significantly improves semantic interpretation capability, interpretation result stability, and robustness. Specific data are as follows (based on F1-score for interpretation accuracy, variance for stability, and fidelity under adversarial examples):

[0112]
Method | Semantic interpretation ability (score, 0-100) | Stability of interpretation results (variance)
SHAP / LIME | 30 | 0.2
Gradient-based interpretation (e.g., Integrated Gradients) | 45 | 0.15

[0113] While gradient-based saliency maps or integrated gradients can capture gradient-sensitive features, they are susceptible to gradient fragmentation and adversarial perturbations. In adversarial example testing, their stability variance reaches 0.15, leading to large fluctuations in the interpretation results for similar malware (e.g., for the same ransomware sample, the API attribution dropped from 0.8 to 0.3). This invention reduces the variance to 0.05 and improves fidelity by 22% through the improved nonlinear path integral algorithm and the anti-perturbation mechanism, demonstrating superior robustness in the field of software security. At the same time, the semantic reasoning module shifts the interpretation from a "gradient heatmap" to an "intent chain description," improving the semantic interpretation score by 75% and effectively solving the "black-box gradient discontinuity" problem of existing technology 2. Experiments show that this method can shorten security assessment time by 40% in actual deployment.

[0114] The experimental results show that the method of the present invention can effectively solve the problems existing in the prior art and has high practicality.

[0115] This invention effectively solves the technical problems of semantic loss, gradient fragmentation, and unstable interpretation in existing technologies (such as SHAP, LIME, or traditional gradient methods) through the following four aspects:

[0116] 1. Integrating a Semantic Attribution Mechanism with Security Domain Knowledge Graphs (Solving the "Semantic Missing" Problem): Existing technologies mostly output only the mathematical contribution of features (e.g., "API_CreateFile contribution value 0.8"), failing to explain their specific malicious intent. The innovation of this invention lies in constructing a knowledge graph with a three-layer ontology structure of "feature layer - behavior layer - intent layer," and incorporating standard attack frameworks (such as MITRE ATT&CK) into it. By defining the relationships of "containment" and "trigger," it propagates and aggregates the low-level mathematical attribution values upwards, achieving a leap from simple "numerical interpretation" to "semantic interpretation." This directly tells security personnel that the model's determination of malice is based on "detected ransomware encryption behavior" rather than on isolated code snippets.

[0117] 2. Improved Nonlinear Path Integral Gradient Algorithm (Solving the "Gradient Noise and Discontinuity" Problem): Given the discrete and discontinuous nature of software features, traditional linear interpolation paths are prone to noise interference. This invention creatively proposes an improved algorithm:

[0118] Nonlinear path generation: Instead of integrating along a straight line, a "legitimate" path is selected based on the feature co-occurrence probability matrix to avoid adversarial sample regions in the feature space and ensure that the interpolation points are semantically reasonable in the software.

[0119] Discretization: For discrete features such as API calls, the gradient fragmentation problem in traditional gradient methods is effectively eliminated by sampling and integrating the model gradient along the path, which significantly improves the accuracy of attribution.
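As a rough illustration of integrating gradients along a non-straight path, the following Python sketch switches features on one at a time, in a greedy order derived from a co-occurrence matrix, and accumulates the gradient along each axis-aligned segment. It is a sketch under stated assumptions, not the patented algorithm: the logistic-regression detector, the greedy ordering rule, and the names `cooccurrence_order` and `path_attributions` are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def model(x, w, b):
    """Toy differentiable detector (assumed): logistic regression score."""
    return sigmoid(x @ w + b)


def model_grad(x, w, b):
    """Analytic gradient of the toy model with respect to the input x."""
    p = model(x, w, b)
    return p * (1.0 - p) * w


def cooccurrence_order(diff, cooc):
    """Greedy order in which to switch on the features where diff is nonzero.

    Starts with the feature that co-occurs most with the other changed features,
    then repeatedly adds the one that co-occurs most with those already chosen.
    """
    active = [i for i in range(len(diff)) if diff[i] != 0]
    if not active:
        return []
    order = [max(active, key=lambda i: cooc[i, active].sum())]
    remaining = [i for i in active if i != order[0]]
    while remaining:
        nxt = max(remaining, key=lambda i: cooc[i, order].sum())
        order.append(nxt)
        remaining.remove(nxt)
    return order


def path_attributions(x, baseline, cooc, w, b, steps_per_segment=50):
    """Midpoint-rule path integral of the gradient along the stepwise path."""
    attr = np.zeros_like(x, dtype=float)
    point = baseline.astype(float).copy()
    for i in cooccurrence_order(x - baseline, cooc):
        delta = x[i] - baseline[i]
        for k in range(steps_per_segment):
            mid = point.copy()
            mid[i] = baseline[i] + delta * (k + 0.5) / steps_per_segment
            attr[i] += model_grad(mid, w, b)[i] * delta / steps_per_segment
        point[i] = x[i]  # feature i is now fully switched on
    return attr
```

Because each segment changes a single feature, the attributions sum to approximately the difference between the model output at the input and at the reference point, which gives a simple sanity check for any chosen path.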

[0120] 3. Closed-Loop Verification of Explanation Fidelity Based on Counterfactual Perturbation (Solving the "Explanation Unreliability" Problem): Existing explanation methods lack verification mechanisms and are prone to "explanation illusions." This invention creatively introduces a closed-loop verification step:

[0121] Explanation confidence index (ECI): A specific quantitative index is proposed. Counterfactual samples are generated by performing a Top-K masking operation on high-attribution features, and the effectiveness of the explanation is calculated by comparing the predicted probabilities before and after masking.

[0122] Perturbation-resistance defense: Before the attribution calculation, Gaussian noise is added to the input for multiple inference runs and the variance of the results is calculated, proactively detecting and defending against adversarial interpretation attacks. This closed-loop "interpretation-verification-correction" mechanism ensures that the interpretation results are not merely correlational but also causal and robust.
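The noise-based stability check can be sketched in Python as follows. This is a minimal illustration under assumptions: `attribute_fn` stands for any attribution routine, and the noise scale, number of runs, threshold value, and the use of the largest per-feature variance as the stability statistic are hypothetical choices not fixed by the source.

```python
import numpy as np


def smoothed_attribution(attribute_fn, x, sigma=0.05, n_runs=20,
                         stability_threshold=0.01, seed=0):
    """Attribute several Gaussian-perturbed copies of x and check stability.

    Returns the mean attribution, the largest per-feature variance across runs,
    and a flag that is True when that variance exceeds the threshold.
    """
    rng = np.random.default_rng(seed)
    runs = np.stack([
        attribute_fn(x + rng.normal(0.0, sigma, size=x.shape))
        for _ in range(n_runs)
    ])
    mean_attr = runs.mean(axis=0)
    max_var = float(runs.var(axis=0).max())
    return mean_attr, max_var, max_var > stability_threshold
```

A flagged sample is one whose attribution changes sharply under small input noise; the averaged attribution is returned in either case so that downstream steps can work with the smoothed values.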

[0123] 4. Innovative Applications of Semantic Reasoning: This method's three-layer knowledge graph not only achieves semantic attribution but also supports cross-domain reasoning (such as from API to MITRE tactics), enhancing the depth and breadth of interpretation based on existing technologies. Experiments show that this mechanism achieves a semantic accuracy of 85% in complex attack scenarios (such as APT persistence), far exceeding the numerical interpretation of existing technologies, helping security teams shift from passive detection to proactive defense.

[0124] As described above, although the invention has been shown and described with reference to specific preferred embodiments, it should not be construed as limiting the invention itself. Various changes in form and detail may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. A software security model interpretation method that integrates feature attribution and semantic reasoning, characterized by comprising the following steps:
acquiring a software sample to be detected, extracting code structure features and behavioral sequence features, and constructing a hybrid feature space, which specifically includes: at the static level, extracting the control flow graph and generating graph embedding vectors; at the dynamic level, running the sample in a sandbox environment to capture API call sequences and parameters; and concatenating the graph embedding vectors with the API sequence vectors to form a multi-view joint feature vector;
inputting the hybrid features into a pre-trained software security detection model to obtain the classification prediction result and the activation states of neurons within the model;
calculating, based on an improved path integral gradient algorithm, the original attribution values of the input features with respect to the prediction result, wherein the algorithm selects the zero vector or the average feature vector of benign software as the reference point, generates a nonlinear interpolation path between the reference point and the input sample that avoids adversarial sample regions in the feature space, and calculates the integral of the model gradient along the interpolation path to obtain the attribution score of each feature, thereby eliminating the noise interference caused by the traditional linear path;
constructing or loading a security domain knowledge graph that contains a mapping from low-level code features to high-level attack tactics, wherein the graph is constructed by defining a three-layer ontology structure consisting of a feature layer, a behavior layer, and an intent layer, using natural language processing technology to extract entities and relationships from security assessment reports and technical documents, and converting the knowledge base of the standard attack framework into graph entities and integrating it into the graph;
aggregating the original attribution values to the semantic level and generating a semantic explanation report using graph reasoning, wherein, based on the "containment" and "trigger" relationships in the knowledge graph, the underlying features with high original attribution values are propagated upwards and the activation scores of the behavior layer and intent layer nodes are calculated, and if the activation score of an intent layer node exceeds a preset threshold, that node is determined to be the core attack intent that causes the model to make a malicious judgment;
applying minimal perturbation to high-attribution features to generate counterfactual samples, verifying the fidelity of the explanation results, and outputting the final explanation results, wherein the fidelity of the explanation results is verified by calculating the "interpretation confidence index".

2. The software security model interpretation method integrating feature attribution and semantic reasoning according to claim 1, characterized in that: The process of extracting the control flow graph and generating graph embedding vectors at the static level specifically includes: disassembling the binary file using a disassembler, extracting the function call graph, and converting it into graph embedding vectors using graph embedding methods.

3. The software security model interpretation method integrating feature attribution and semantic reasoning according to claim 1, characterized in that: running the sample in a sandbox environment at the dynamic level to capture API call sequences and parameters specifically includes: running the sample in a malware analysis system sandbox, recording the Windows API call sequence, and performing one-hot encoding or language text encoding on the API call sequence.
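The one-hot encoding option can be illustrated with a short Python sketch. The vocabulary, the extra column for out-of-vocabulary calls, and the function name `one_hot_encode` are assumptions made for illustration; the source does not specify how unseen API names are handled.

```python
import numpy as np


def one_hot_encode(calls, vocabulary):
    """Encode an API call sequence as a (sequence length, vocabulary size + 1) matrix.

    The last column is an assumed catch-all for calls missing from the vocabulary.
    """
    index = {name: i for i, name in enumerate(vocabulary)}
    encoded = np.zeros((len(calls), len(vocabulary) + 1), dtype=np.float32)
    for row, name in enumerate(calls):
        encoded[row, index.get(name, len(vocabulary))] = 1.0
    return encoded


# Hypothetical recorded sequence and vocabulary.
vocabulary = ["CreateFileW", "WriteFile", "CryptEncrypt"]
sequence = ["CreateFileW", "CryptEncrypt", "WriteFile", "RegSetValueExW"]
matrix = one_hot_encode(sequence, vocabulary)
print(matrix.shape)
```

Each row contains exactly one nonzero entry, so the matrix preserves the order of the recorded calls while giving the detection model a fixed-width input per time step.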

4. The software security model interpretation method integrating feature attribution and semantic reasoning according to claim 1, characterized in that: the pre-trained software security detection model is a detection model based on a long short-term memory (LSTM) network with an attention mechanism.

5. The software security model interpretation method integrating feature attribution and semantic reasoning according to claim 1, characterized in that: the calculation formula for the improved path integral gradient algorithm is as follows: $\mathrm{Attr}_i = (x_i - x'_i) \cdot \frac{1}{m} \sum_{k=1}^{m} \frac{\partial F(p_k)}{\partial x_i}$, where $x'_i$ is the value of the i-th feature at the reference point, $x_i$ is the value of the i-th feature of the input sample, $p_k$ is the k-th point on the path, $F(p_k)$ is the output of the model at $p_k$, and $m$ is the number of sampling points on the path.

6. The software security model interpretation method integrating feature attribution and semantic reasoning according to claim 1, characterized in that: the extracted entities and relationships are cleaned and verified to remove redundant and erroneous information, ensuring the accuracy and completeness of the knowledge graph.

7. The software security model interpretation method integrating feature attribution and semantic reasoning according to claim 1, characterized in that: The method of generating semantic interpretation reports using graph reasoning also includes: formatting the generated semantic interpretation reports to present them to security personnel in a clear and easy-to-understand manner.

8. The software security model interpretation method integrating feature attribution and semantic reasoning according to claim 1, characterized in that: the method also includes an interpretation visualization step: using heatmap colors to indicate the magnitude of feature attribution values on the control flow graph or disassembled code of the software to be detected, while displaying the generated semantic attack intent description in natural language in a sidebar.

9. The software security model interpretation method integrating feature attribution and semantic reasoning according to claim 1, characterized in that: it also includes a defensive step against adversarial interpretation attacks: before the attribution values are calculated, Gaussian noise is added to the input sample for multiple inference runs and the variance of the resulting attribution values is calculated; if the variance is greater than the stability threshold, the sample is judged to have possibly been subjected to adversarial perturbation, and the smoothed average attribution value is output.

10. The software security model interpretation method integrating feature attribution and semantic reasoning according to claim 1, characterized in that: the interpretation confidence index (ECI) is calculated as follows: ECI = (original predicted probability - masked predicted probability) / (sum of feature importance), where the masking operation sets the top-K features with the highest attribution values to their baseline values. If the ECI is lower than the preset threshold, the interpretation is deemed invalid, and a resampling process is initiated to correct the attribution values.
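A worked example of the ECI calculation, written as a minimal Python sketch. Two points are assumptions rather than statements of the claim: the "sum of feature importance" is taken here over the masked top-K features only, and `predict_fn` is a hypothetical callable returning the predicted probability for one sample.

```python
import numpy as np


def explanation_confidence_index(predict_fn, x, attributions, baseline, k):
    """ECI = (original probability - masked probability) / summed importance.

    The top-k features by attribution are reset to their baseline values.
    Assumption: the denominator sums the attributions of those k features.
    """
    top_k = np.argsort(attributions)[::-1][:k]
    masked = x.copy()
    masked[top_k] = baseline[top_k]
    importance = float(attributions[top_k].sum())
    if importance == 0.0:
        raise ValueError("top-k attributions sum to zero; ECI is undefined")
    drop = float(predict_fn(x)) - float(predict_fn(masked))
    return drop / importance, top_k


# Toy linear scorer whose attributions are exact, so the ECI comes out close to 1.
w = np.array([0.5, 0.3, 0.1])
x = np.ones(3)
baseline = np.zeros(3)
eci, masked_features = explanation_confidence_index(
    lambda v: v @ w, x, w * x, baseline, k=2
)
print(eci, masked_features)
```

An ECI near 1 means the drop in predicted probability matches the importance assigned to the masked features; a value below the preset threshold would trigger the resampling step described in the claim.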