A software vulnerability exploit risk prediction method and system

By combining the time-aware prediction mechanism of an ensemble learning prediction model with multi-dimensional feature extraction and dynamic weight adjustment, the method overcomes the limitations of the CVSS scoring system, predicts software vulnerability exploitation risk with high precision, and adapts to changing network environments and threat intelligence.

CN122241710A | Pending | Publication Date: 2026-06-19 | QI AN XIN TECHNOLOGY GROUP INC +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
QI AN XIN TECHNOLOGY GROUP INC
Filing Date
2026-02-05
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

In existing approaches, the CVSS scoring system is limited as a tool for software vulnerability risk assessment: it assesses a single dimension, uses a static assessment model, does not fuse threat intelligence, and adapts poorly to different environments. As a result, its predictions of vulnerability exploitation risk are inaccurate.

Method used

A time-aware prediction mechanism using an ensemble learning prediction model is employed. Through a multi-dimensional feature extraction engine, static features, time-series features, threat intelligence features, and environmental context features are extracted from software vulnerabilities and external threat intelligence sources. Risk prediction is performed by combining random forest, gradient boosting, XGBoost, neural networks, and LSTM time-series predictors, dynamically adjusting model weights, and integrating multi-time-window predictions.

Benefits of technology

It achieves accurate prediction of software vulnerability exploitation risks, dynamically reflects how risk changes over time, takes organization-specific network environments and threat intelligence into account, and improves both prediction accuracy and adaptability.

✦ Generated by Eureka AI based on patent content.

Figure CN122241710A_ABST

Abstract

This invention discloses a method for predicting software vulnerability exploitation risks, applied to a software vulnerability exploitation risk prediction system that includes a multi-dimensional feature extraction engine and an ensemble learning prediction model. The method includes: acquiring information on the software vulnerability to be predicted and multiple external threat intelligence sources; using the multi-dimensional feature extraction engine to extract multi-dimensional features from the software vulnerability information and the multiple external threat intelligence sources, the multi-dimensional features including at least static features, temporal features, threat intelligence features, and environmental context features; inputting the multi-dimensional features into the ensemble learning prediction model; and predicting the exploitation probability of the software vulnerability based on the model's time-aware prediction mechanism. By applying this time-aware prediction mechanism to multi-dimensional features, the method can accurately predict the exploitation risk of software vulnerabilities.

Description

Technical Field

[0001] This invention relates to the field of network security technology, and in particular to a method and system for predicting software vulnerability exploitation risks.

Background Technology

[0002] As cybersecurity threats become increasingly complex, the number of software vulnerabilities discovered and disclosed is growing exponentially. Statistics show that more than 20,000 new CVE vulnerabilities are discovered each year, but only about 2-7% of these vulnerabilities are actually exploited.

[0003] In existing approaches, vulnerability management relies mainly on the CVSS scoring system for risk assessment, which has the following drawbacks:
1. Single-dimensional assessment: CVSS scores are based solely on the technical characteristics of a vulnerability and ignore other relevant information.

[0004] 2. Static evaluation model: it cannot reflect how the exploitation risk of a vulnerability changes over time.

[0005] 3. Insufficient threat intelligence fusion: there is no effective mechanism for extracting and fusing threat intelligence features, so external threat information cannot be fully used.

[0006] 4. Poor environmental adaptability: it does not consider organization-specific factors such as the network environment, security controls, and business criticality.

[0007] Therefore, there is an urgent need to provide a software vulnerability exploitation risk prediction solution that can solve at least one of the above technical problems.

Summary of the Invention

[0008] This invention provides a method and system for predicting software vulnerability exploitation risks. It employs a time-aware prediction mechanism based on an ensemble learning prediction model and multi-dimensional features to accurately predict the exploitation risks of software vulnerabilities.

[0009] In a first aspect, embodiments of the present invention provide a method for predicting software vulnerability exploitation risks, applied to a software vulnerability exploitation risk prediction system that includes a multi-dimensional feature extraction engine and an ensemble learning prediction model. The method includes: obtaining information on the software vulnerability to be predicted and multiple external threat intelligence sources; using the multi-dimensional feature extraction engine to extract multi-dimensional features from the software vulnerability information and the multiple external threat intelligence sources, the multi-dimensional features including at least static features, temporal features, threat intelligence features, and environmental context features; and inputting the multi-dimensional features into the ensemble learning prediction model and predicting the exploitation probability of the software vulnerability based on the model's time-aware prediction mechanism.

[0010] Furthermore, the multi-dimensional feature extraction engine includes a threat intelligence feature fusion unit, which fuses threat intelligence from the multiple external threat intelligence sources. The method further includes: The threat intelligence feature fusion ... Threat actor analysis, threat activity analysis, and exploitation capability assessment are performed on the fused data, and a threat intelligence feature vector is calculated using a feature fusion algorithm. The threat intelligence feature vector is a set of quantitative indicators that includes at least: APT organization interest, number of dark web mentions, weaponization probability, ransomware association, and PoC availability.
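The fusion step in [0010] names five output indicators but does not say how values from different sources are combined. The following is a minimal sketch under stated assumptions: each source is taken to report a dictionary of the five indicators, and both the fusion rules (maximum for probability-like scores, sum for dark web mention counts, logical OR for flags) and the field names such as `apt_interest` and `dark_web_mentions` are hypothetical, not the patent's feature fusion algorithm.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Mapping


@dataclass
class ThreatIntelVector:
    """Hypothetical container for the five indicators named in [0010]."""
    apt_interest: float          # 0..1
    dark_web_mentions: int       # total mention count across sources
    weaponization_prob: float    # 0..1
    ransomware_associated: bool
    poc_available: bool


def fuse_threat_intel(reports: Iterable[Mapping[str, Any]]) -> ThreatIntelVector:
    """Fuse per-source reports with assumed rules: max for scores, sum for counts, any for flags."""
    reports = list(reports)
    if not reports:
        return ThreatIntelVector(0.0, 0, 0.0, False, False)
    return ThreatIntelVector(
        apt_interest=max(float(r.get("apt_interest", 0.0)) for r in reports),
        dark_web_mentions=sum(int(r.get("dark_web_mentions", 0)) for r in reports),
        weaponization_prob=max(float(r.get("weaponization_prob", 0.0)) for r in reports),
        ransomware_associated=any(bool(r.get("ransomware_associated")) for r in reports),
        poc_available=any(bool(r.get("poc_available")) for r in reports),
    )


# Two hypothetical sources reporting on the same vulnerability.
vec = fuse_threat_intel([
    {"apt_interest": 0.4, "dark_web_mentions": 3, "poc_available": True},
    {"apt_interest": 0.7, "dark_web_mentions": 5, "weaponization_prob": 0.6},
])
```

Summing mention counts is the simplest choice but double-counts mentions seen by several sources; a real fusion unit would likely deduplicate first.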

[0011] Furthermore, the threat actor analysis includes: APT organization identification and calculation of the APT organization interest level; criminal organization identification and calculation of criminal organization interest; and state-level threat identification and calculation of a state-level threat score. The threat activity analysis includes: dark web mention analysis and calculation of a dark web popularity score; social media monitoring and calculation of social media popularity; detection and analysis of exploit kits and their contents; and ransomware association analysis and calculation of the ransomware association degree. The exploitation capability assessment includes: PoC code availability analysis and calculation of a PoC availability score; and weaponization assessment and calculation of the weaponization probability.
The multi-dimensional feature extraction engine further includes a static feature extractor, a temporal feature constructor, and an environmental context feature extractor. The static feature extractor is used at least to: extract basic features, including attack vector (AV), attack complexity (AC), and privileges required (PR), using a CVSS vector parsing algorithm; establish a correlation model between vulnerability type and exploitation difficulty based on CWE weakness type mapping; and analyze the affected products to calculate supplier concentration and product coverage features. The temporal feature constructor is used at least to calculate vulnerability age, construct patch availability time windows, and compute exploit maturity scores and update frequency features. The environmental context feature extractor is used at least for network exposure assessment, business criticality calculation, and security control effectiveness calculation.
The static features include: CVSS base score, impact score, exploitability score, CWE category, number of affected products, number of affected suppliers, number of reference links, description length, remote access, authentication requirements, and complexity level. The temporal features include: days since vulnerability release, days since patch availability, exploit maturity, update frequency, age classification, seasonality factor, time decay factor, remediation confidence, and report confidence. The threat intelligence features include: APT group interest level, dark web mention frequency, weaponization potential, state-level threats, criminal organization interest, exploit kits, ransomware associations, PoC availability, and social media popularity. The environmental context features include: asset exposure score, business criticality, network reachability, security control effectiveness, patch deployment difficulty, system redundancy, data sensitivity, and compliance impact.
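The static feature extractor in [0011] parses the CVSS vector to obtain AV, AC, and PR. A sketch, assuming CVSS v3.x vector strings such as `CVSS:3.1/AV:N/AC:L/PR:N/...`. The patent does not say how categorical metric values become numbers, so this sketch borrows the CVSS v3.1 metric weights (PR weights for Scope: Unchanged); the output feature names are hypothetical.

```python
from typing import Dict

# CVSS v3.1 metric weights (PR weights are the Scope: Unchanged values).
AV_WEIGHTS = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC_WEIGHTS = {"L": 0.77, "H": 0.44}
PR_WEIGHTS = {"N": 0.85, "L": 0.62, "H": 0.27}


def parse_cvss_vector(vector: str) -> Dict[str, str]:
    """Split 'CVSS:3.1/AV:N/AC:L/...' into {'AV': 'N', 'AC': 'L', ...}."""
    parts = vector.strip().split("/")
    if not parts[0].startswith("CVSS:"):
        raise ValueError(f"not a CVSS v3 vector: {vector!r}")
    metrics: Dict[str, str] = {}
    for part in parts[1:]:
        key, sep, value = part.partition(":")
        if not sep:
            raise ValueError(f"malformed metric {part!r}")
        metrics[key] = value
    return metrics


def static_cvss_features(vector: str) -> Dict[str, float]:
    """Numeric AV/AC/PR features plus two derived flags (hypothetical names)."""
    m = parse_cvss_vector(vector)
    return {
        "attack_vector": AV_WEIGHTS[m["AV"]],
        "attack_complexity": AC_WEIGHTS[m["AC"]],
        "privileges_required": PR_WEIGHTS[m["PR"]],
        "remote_access": 1.0 if m["AV"] == "N" else 0.0,   # network-reachable
        "high_complexity": 1.0 if m["AC"] == "H" else 0.0,
    }


feats = static_cvss_features("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H")
```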

[0012] Furthermore, the ensemble learning prediction model includes at least a random forest predictor, a gradient boosting predictor, an XGBoost predictor, a neural network predictor, and an LSTM time series predictor. The random forest predictor handles nonlinear feature relationships and ranks feature importance; the gradient boosting predictor optimizes prediction accuracy and handles feature interactions; the XGBoost predictor processes large-scale data (data volume reaching a first preset value) while preventing overfitting; the neural network predictor learns nonlinear patterns whose complexity reaches a second preset value; and the LSTM time series predictor captures time-series dependencies. The ensemble strategies for combining the predictors include a voting ensemble strategy, a stacking ensemble strategy, and a weighted ensemble strategy. The ensemble learning prediction model uses an adaptive weight allocation algorithm that dynamically adjusts each predictor's weight based on the diversity and accuracy of its predictions.
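[0012] says only that weights are adjusted by prediction diversity and accuracy. The sketch below is one possible scoring rule, assumed rather than taken from the patent: accuracy is 1 minus the Brier score on a validation set, diversity is a predictor's mean absolute deviation from the ensemble mean, and the raw weight is accuracy × (1 + diversity), normalized to sum to 1. The validation data is a toy example.

```python
from typing import Dict, Sequence


def adaptive_weights(val_preds: Dict[str, Sequence[float]],
                     labels: Sequence[int]) -> Dict[str, float]:
    """Assumed scoring rule combining accuracy and diversity.

    accuracy  = 1 - Brier score on a validation set
    diversity = mean |p_model - p_ensemble_mean| over validation samples
    weight    = accuracy * (1 + diversity), normalised to sum to 1
    """
    names = list(val_preds)
    n = len(labels)
    ensemble_mean = [sum(val_preds[m][i] for m in names) / len(names) for i in range(n)]
    raw: Dict[str, float] = {}
    for m in names:
        p = val_preds[m]
        brier = sum((p[i] - labels[i]) ** 2 for i in range(n)) / n
        diversity = sum(abs(p[i] - ensemble_mean[i]) for i in range(n)) / n
        raw[m] = max(0.0, 1.0 - brier) * (1.0 + diversity)
    total = sum(raw.values()) or 1.0
    return {m: w / total for m, w in raw.items()}


def weighted_ensemble(probs: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted ensemble strategy: weighted mean of per-predictor probabilities."""
    return sum(weights[m] * p for m, p in probs.items())


# Toy validation predictions for three of the five predictors.
val = {
    "random_forest": [0.9, 0.2, 0.8, 0.1],
    "xgboost": [0.7, 0.4, 0.6, 0.3],
    "lstm": [0.5, 0.5, 0.5, 0.5],
}
weights = adaptive_weights(val, labels=[1, 0, 1, 0])
risk = weighted_ensemble({"random_forest": 0.8, "xgboost": 0.6, "lstm": 0.5}, weights)
```

Because the diversity term rewards any disagreement with the consensus, the constant 0.5 output of the toy LSTM ends up weighted almost as heavily as XGBoost; a practical version would balance the two terms, for example with a tunable coefficient.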

[0013] Furthermore, the time-aware prediction mechanism of the ensemble learning prediction model predicts software vulnerability exploitation risk as follows.
Obtain the basic information and historical time-series data of the software vulnerability. The basic information includes the vulnerability's release date, last modification date, and CVSS score. The historical time-series data is vulnerability-related data that changes over time, including the time the exploit code (PoC) appeared, the number of dark web mentions, and social media popularity data.
Construct temporal features, including: days since vulnerability release, days since the vulnerability was last modified, days since patch availability, vulnerability age classification, vulnerability information update frequency, and a seasonality factor.
Model maturity evolution: a base score is calculated from the public status of the exploit code, with a higher base score indicating a more mature exploitation technique. The base score is combined with external threat intelligence to assess the likelihood that the vulnerability will be weaponized for real attacks. An exponential decay function models the natural decay of vulnerability popularity over time, with the decay rate controlled by a decay constant.
Perform multi-time-window prediction using a rule-based method with an attenuation factor. The attenuation factor increases with the prediction horizon, raising the risk weight of long-term predictions; the base prediction value is multiplied by the attenuation factor to obtain the adjusted prediction probability, and a clipping function keeps the probability within the valid range.
Predict with the LSTM time series model: historical feature data is converted into fixed-length sequences that serve as input to a pre-trained Long Short-Term Memory (LSTM) network, which outputs a sequence of future exploitation probabilities. The future trend of exploitation risk is calculated from this output.
Fuse the prediction results: the baseline predictions for the multiple time windows are multiplied by the prediction weights of the LSTM model and then fused, and a confidence score is calculated from the fused result and the uncertainty of each model. Finally, output the vector of exploitation probabilities of the software vulnerability for each time window, together with the confidence score.
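The exponential decay and the multi-window adjustment in [0013] can be sketched as follows. The decay constant, the 7/30/90-day horizons, and their attenuation factors are hypothetical values; [0013] states only that the factor grows with the prediction horizon and that a clipping function keeps each probability in range.

```python
import math
from typing import Dict

# Hypothetical horizons (days) -> attenuation factors; [0013] only says the
# factor increases with the prediction horizon.
WINDOW_FACTORS: Dict[int, float] = {7: 1.0, 30: 1.2, 90: 1.5}


def decayed_popularity(initial: float, days: float, decay_constant: float = 0.05) -> float:
    """Exponential decay of popularity: initial * exp(-decay_constant * days)."""
    return initial * math.exp(-decay_constant * days)


def clip_probability(p: float) -> float:
    """Clipping function that keeps a probability inside [0, 1]."""
    return max(0.0, min(1.0, p))


def multi_window_prediction(base_prediction: float) -> Dict[int, float]:
    """Multiply the base prediction by each window's factor, then clip."""
    return {days: clip_probability(base_prediction * factor)
            for days, factor in WINDOW_FACTORS.items()}


windows = multi_window_prediction(0.7)   # 7-, 30- and 90-day probabilities
half_life_days = math.log(2) / 0.05      # popularity halves after about 13.9 days
```

With a base prediction of 0.7, the 90-day value (0.7 × 1.5 = 1.05) is clipped to 1.0, which is exactly the case the clipping step exists for.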

[0014] Furthermore, the ensemble learning prediction model also includes a time-series data collector, a feature extraction and fusion module, a time-series prediction engine, an adaptive weight adjuster, a confidence quantifier, and a feedback loop module. Inputting the multi-dimensional feature vector into the ensemble learning prediction model and predicting the exploitation risk probability of the software vulnerability based on its time-aware prediction mechanism includes: collecting multi-source time-series data through the time-series data collector, the data including vulnerability disclosure timestamps, exploitation event sequences, and changes in APT interest, with support for real-time streaming input; extracting time-related features and fusing them with non-time-related features through the feature extraction and fusion module, which uses an attention mechanism to highlight key time points, the time-related features including vulnerability age distribution and exploitation time series, and the non-time-related features including CVSS scores and APT organization interest; and performing multi-step prediction on the input time-series vector through the time-series prediction engine, which outputs the probability of vulnerability exploitation over multiple future time windows. And/or, the method further includes: dynamically adjusting the model weights through the adaptive weight adjuster using reinforcement learning or gradient descent; outputting an uncertainty score for the prediction through the confidence quantifier, computed with Bayesian methods or confidence intervals, to assess the reliability of the prediction; and feeding the prediction results back into model training through the feedback loop module for online learning.
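The feature extraction and fusion module in [0014] uses an attention mechanism to highlight key time points. Below is a minimal dot-product attention pooling sketch. In a trained model the query vector would be learned; here the caller supplies it, and the per-step signals in the usage line are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(steps: Sequence[Sequence[float]],
                   query: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Dot-product attention over time steps.

    steps: one feature vector per time step (hypothetical signals, e.g. PoC and dark web activity)
    query: query vector; learned in a real model, supplied by the caller here
    Returns (one attention weight per step, attention-weighted context vector).
    """
    scores = [sum(q * x for q, x in zip(query, h)) for h in steps]
    alphas = softmax(scores)
    dim = len(steps[0])
    context = [sum(a * h[d] for a, h in zip(alphas, steps)) for d in range(dim)]
    return alphas, context


# Three weekly steps; the last week shows a spike in both signals.
alphas, context = attention_pool([[0.1, 0.0], [0.2, 0.1], [0.9, 0.8]], query=[1.0, 1.0])
```

The context vector could then be concatenated with non-time-related features such as the CVSS score and APT organization interest before prediction; that concatenation is an assumption about how the fusion is done.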

[0015] Furthermore, the APT organization interest level is represented by the APT organization interest score, calculated as: APT_interest_score = Σ(confidence_level_i × actor_sophistication_i) / n, where: confidence_level_i is the confidence level for the i-th APT organization, a value between 0 and 1 characterizing the credibility of the intelligence source for that organization; actor_sophistication_i is the sophistication (capability) score of the i-th APT organization, a value between 0 and 1 quantifying its attack technology level, resources, and historical behavior; Σ denotes summation over all relevant APT organizations; n is the number of APT organizations; and APT_interest_score is the resulting interest score, ranging from 0 to 1, where a higher score indicates greater interest.
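The APT interest formula in [0015] translates directly into code. The only added assumption is the n = 0 case, which the formula leaves undefined; this sketch returns 0.0 when no APT organization is associated with the vulnerability.

```python
from typing import Sequence, Tuple


def apt_interest_score(actors: Sequence[Tuple[float, float]]) -> float:
    """APT_interest_score = sum(confidence_level_i * actor_sophistication_i) / n.

    actors: (confidence_level_i, actor_sophistication_i) pairs, each in [0, 1].
    Returns 0.0 for n = 0 (an assumption; the formula is undefined there).
    """
    if not actors:
        return 0.0
    for confidence, sophistication in actors:
        if not (0.0 <= confidence <= 1.0 and 0.0 <= sophistication <= 1.0):
            raise ValueError("confidence_level and actor_sophistication must lie in [0, 1]")
    return sum(c * s for c, s in actors) / len(actors)


# Two hypothetical APT organizations linked to one vulnerability:
score = apt_interest_score([(0.9, 0.8), (0.5, 0.4)])  # (0.72 + 0.20) / 2 = 0.46
```

Since each product confidence_level_i × actor_sophistication_i lies in [0, 1], their mean also lies in [0, 1], which matches the stated range of the score.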

[0016] In a second aspect, embodiments of the present invention provide a software vulnerability exploitation risk prediction system, comprising: a data acquisition module, used to obtain information on the software vulnerability to be predicted and multiple external threat intelligence sources; a multi-dimensional feature extraction engine, used to extract multi-dimensional features from the software vulnerability information and the multiple external threat intelligence sources, the multi-dimensional features including at least static features, temporal features, threat intelligence features, and environmental context features; and an ensemble learning prediction model, used to predict the exploitation probability of the software vulnerability from the input multi-dimensional features based on a time-aware prediction mechanism.

[0017] In a third aspect, embodiments of the present invention provide an electronic device, including a memory and a processor connected to each other. The memory stores a computer program, and the processor invokes the computer program stored in the memory to perform any of the above methods.

[0018] In a fourth aspect, embodiments of the present invention provide a computer-readable storage medium storing a computer program which, when run by a computer, performs any of the methods described above.

[0019] This invention discloses a method and system for predicting software vulnerability exploitation risks. The system includes a multi-dimensional feature extraction engine and an ensemble learning prediction model. The method includes: acquiring information on the software vulnerability to be predicted and multiple external threat intelligence sources; using the multi-dimensional feature extraction engine to extract multi-dimensional features from the software vulnerability information and the multiple external threat intelligence sources, the multi-dimensional features including at least static features, temporal features, threat intelligence features, and environmental context features; inputting the multi-dimensional features into the ensemble learning prediction model; and predicting the exploitation probability of the software vulnerability based on the model's time-aware prediction mechanism. By applying this time-aware prediction mechanism to multi-dimensional features, the method and system can accurately predict the exploitation risk of software vulnerabilities.

[0020] Other features and advantages of the invention will be set forth in the description which follows, and will be apparent in part from the description, or may be learned by practicing the invention. The objects and other advantages of the invention may be realized and obtained by means of the structures particularly pointed out in the written description, claims, and drawings.

[0021] The technical solution of the present invention will be further described in detail below with reference to the accompanying drawings and embodiments.

Brief Description of the Drawings

[0022] The accompanying drawings are provided to further illustrate the invention and form part of the specification. They are used together with the embodiments of the invention to explain the invention and do not constitute a limitation thereof.

[0023] Figure 1 is a flowchart of a software vulnerability exploitation risk prediction method provided in an embodiment of the present invention; Figure 2 is a block diagram of a software vulnerability exploitation risk prediction system provided in an embodiment of the present invention; Figure 3 is a block diagram of another software vulnerability exploitation risk prediction system provided in an embodiment of the present invention.

Detailed Description of Embodiments

[0024] The embodiments of the present invention will be described below with reference to the accompanying drawings. It should be understood that the embodiments described herein are for illustration and explanation only and are not intended to limit the present invention.

[0025] The following briefly explains the terms used in this patent application. APT Interest: the degree of potential interest or attention that a specific APT group (usually a state-sponsored or highly organized cyber threat actor) has in a particular vulnerability, system, or target. As a quantitative indicator, it mainly measures the likelihood that an APT group will exploit the vulnerability in attacks. It is calculated from threat intelligence data (such as historical attack patterns, organizational motivations, and geopolitical factors) and helps predict whether a vulnerability will become a high-risk target.

[0026] The technical solution of this patent application will be explained and illustrated through specific embodiments below.

[0027] As shown in Figure 1, an embodiment of this invention provides a method for predicting software vulnerability exploitation risks, applied to the software vulnerability exploitation risk prediction system shown in Figure 2, which includes a multi-dimensional feature extraction engine and an ensemble learning prediction model. The method includes steps 101 to 103. Step 101: obtain information on the software vulnerability to be predicted and multiple external threat intelligence sources. Step 102: use the multi-dimensional feature extraction engine to extract multi-dimensional features from the software vulnerability information and the multiple external threat intelligence sources; the multi-dimensional features include at least static features, temporal features, threat intelligence features, and environmental context features. Step 103: input the multi-dimensional features into the ensemble learning prediction model, and predict the exploitation probability of the software vulnerability based on the model's time-aware prediction mechanism.
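Steps 101 to 103 can be sketched as a small pipeline. The record type, the extractor mapping, and the toy model in the usage lines are hypothetical placeholders; in the patent, step 102 is performed by the multi-dimensional feature extraction engine and step 103 by the ensemble learning prediction model.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Mapping


@dataclass
class VulnerabilityRecord:
    """Hypothetical output of step 101: vulnerability info plus raw intel reports."""
    vuln_id: str
    info: Dict[str, Any]
    intel_reports: List[Dict[str, Any]] = field(default_factory=list)


FeatureExtractor = Callable[[VulnerabilityRecord], Dict[str, float]]
Model = Callable[[Dict[str, float]], float]


def predict_exploit_risk(record: VulnerabilityRecord,
                         extractors: Mapping[str, FeatureExtractor],
                         model: Model) -> float:
    """Step 102 (feature extraction per group) followed by step 103 (prediction)."""
    features: Dict[str, float] = {}
    for group, extract in extractors.items():  # e.g. static, temporal, threat_intel, environment
        for name, value in extract(record).items():
            features[f"{group}.{name}"] = value
    return model(features)


# Placeholder record and extractors; the "model" is a toy mean, not an ensemble.
record = VulnerabilityRecord("EXAMPLE-0001", {"cvss_base": 9.8}, [{"poc_available": True}])
extractors = {
    "static": lambda r: {"cvss": r.info["cvss_base"] / 10.0},
    "threat_intel": lambda r: {"poc": float(any(x.get("poc_available") for x in r.intel_reports))},
}
risk = predict_exploit_risk(record, extractors, model=lambda f: sum(f.values()) / len(f))
```

Prefixing each feature with its group name keeps the four feature groups distinguishable when they are merged into one vector.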

[0028] In one embodiment, the multi-dimensional feature extraction engine includes a threat intelligence feature fusion unit, which fuses threat intelligence from the multiple external threat intelligence sources. The method further includes: The threat intelligence feature fusion ... Threat actor analysis, threat activity analysis, and exploitation capability assessment are performed on the fused data, and a threat intelligence feature vector is calculated using a feature fusion algorithm. The threat intelligence feature vector is a set of quantitative indicators that includes at least: APT organization interest, number of dark web mentions, weaponization probability, ransomware association, and PoC availability.

[0029] The threat actor analysis in the above embodiments may include: APT organization identification and calculation of the APT organization interest level; criminal organization identification and calculation of criminal organization interest; and state-level threat identification and calculation of a state-level threat score. The threat activity analysis in the above embodiments may include: dark web mention analysis and calculation of a dark web popularity score; social media monitoring and calculation of social media popularity; detection and analysis of exploit kits and their contents; and ransomware association analysis and calculation of the ransomware association degree. The exploitation capability assessment in the above embodiments includes: PoC code availability analysis and calculation of a PoC availability score; and weaponization assessment and calculation of the weaponization probability.
The multi-dimensional feature extraction engine further includes a static feature extractor, a temporal feature constructor, and an environmental context feature extractor. The static feature extractor is used at least to: extract basic features, including attack vector (AV), attack complexity (AC), and privileges required (PR), using a CVSS vector parsing algorithm; establish a correlation model between vulnerability type and exploitation difficulty based on CWE weakness type mapping; and analyze the affected products to calculate supplier concentration and product coverage features. The temporal feature constructor is used at least to calculate vulnerability age, construct patch availability time windows, and compute exploit maturity scores and update frequency features. The environmental context feature extractor is used at least for network exposure assessment, business criticality calculation, and security control effectiveness calculation.
The static features include: CVSS base score, impact score, exploitability score, CWE category, number of affected products, number of affected suppliers, number of reference links, description length, remote access, authentication requirements, and complexity level. The temporal features include: days since vulnerability release, days since patch availability, exploit maturity, update frequency, age classification, seasonality factor, time decay factor, remediation confidence, and report confidence. The threat intelligence features include: APT group interest level, dark web mention frequency, weaponization potential, state-level threats, criminal organization interest, exploit kits, ransomware associations, PoC availability, and social media popularity. The environmental context features include: asset exposure score, business criticality, network reachability, security control effectiveness, patch deployment difficulty, system redundancy, data sensitivity, and compliance impact.
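[0029] mentions a correlation model between vulnerability type and exploitation difficulty based on CWE mapping, without specifying it. One simple data-driven version, assumed here, estimates the exploitation rate per CWE from labelled historical vulnerabilities, smoothed toward a base rate so that rarely seen CWEs are not over- or under-estimated. The prior of 0.05 is a hypothetical value chosen from the 2-7% exploitation range cited in [0002], the history in the usage lines is toy data, and a lower rate stands in for higher exploitation difficulty.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

PRIOR_RATE = 0.05      # hypothetical base rate, within the 2-7% range in [0002]
PRIOR_WEIGHT = 20.0    # hypothetical strength of the prior (pseudo-observations)


def fit_cwe_exploit_rates(history: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Smoothed estimate of P(exploited | CWE) from (cwe_id, was_exploited) pairs."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # cwe -> [exploited, total]
    for cwe, exploited in history:
        counts[cwe][0] += int(exploited)
        counts[cwe][1] += 1
    return {cwe: (hits + PRIOR_RATE * PRIOR_WEIGHT) / (total + PRIOR_WEIGHT)
            for cwe, (hits, total) in counts.items()}


def cwe_exploit_feature(rates: Dict[str, float], cwe: str) -> float:
    """Feature for one vulnerability; unseen CWEs fall back to the prior rate."""
    return rates.get(cwe, PRIOR_RATE)


# Toy history: 3 of 10 CWE-787 cases exploited, 0 of 10 CWE-79 cases exploited.
history = [("CWE-787", True)] * 3 + [("CWE-787", False)] * 7 + [("CWE-79", False)] * 10
rates = fit_cwe_exploit_rates(history)  # CWE-787: 4/30, CWE-79: 1/30
```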

[0030] In one embodiment, the ensemble learning prediction model includes at least a random forest predictor, a gradient boosting predictor, an XGBoost predictor, a neural network predictor, and an LSTM time series predictor. The random forest predictor handles nonlinear feature relationships and ranks feature importance; the gradient boosting predictor optimizes prediction accuracy and handles feature interactions; the XGBoost predictor processes large-scale data (data volume reaching a first preset value) while preventing overfitting; the neural network predictor learns nonlinear patterns whose complexity reaches a second preset value; and the LSTM time series predictor captures time-series dependencies. The ensemble strategies for combining the predictors include a voting ensemble strategy, a stacking ensemble strategy, and a weighted ensemble strategy. The ensemble learning prediction model uses an adaptive weight allocation algorithm that dynamically adjusts each predictor's weight based on the diversity and accuracy of its predictions.
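Of the three ensemble strategies in [0030], voting is the simplest to illustrate. The 0.5 threshold, the rule that a tie counts as "not exploited", and the predictor outputs in the usage lines are assumptions for illustration; stacking would instead train a meta-learner on these per-predictor outputs.

```python
from typing import Dict


def hard_vote(probs: Dict[str, float], threshold: float = 0.5) -> bool:
    """Majority vote: each predictor votes 'exploited' when p >= threshold; ties -> False."""
    votes = sum(1 for p in probs.values() if p >= threshold)
    return votes * 2 > len(probs)


def soft_vote(probs: Dict[str, float]) -> float:
    """Soft voting: unweighted mean of the predictors' probabilities."""
    return sum(probs.values()) / len(probs)


# Hypothetical outputs of the five predictors for one vulnerability.
probs = {"random_forest": 0.62, "gradient_boosting": 0.55, "xgboost": 0.48,
         "neural_network": 0.71, "lstm": 0.40}
decision = hard_vote(probs)   # 3 of 5 votes -> True
mean_prob = soft_vote(probs)  # 2.76 / 5 = 0.552
```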

[0031] In another embodiment, the time-aware prediction mechanism of the ensemble learning prediction model predicts software vulnerability exploitation risk as follows.
Obtain the basic information and historical time-series data of the software vulnerability. The basic information includes the vulnerability's release date, last modification date, and CVSS score. The historical time-series data is vulnerability-related data that changes over time, including the time the exploit code (PoC) appeared, the number of dark web mentions, and social media popularity data.
Construct temporal features, including: days since vulnerability release, days since the vulnerability was last modified, days since patch availability, vulnerability age classification, vulnerability information update frequency, and a seasonality factor.
Model maturity evolution: a base score is calculated from the public status of the exploit code, with a higher base score indicating a more mature exploitation technique. The base score is combined with external threat intelligence to assess the likelihood that the vulnerability will be weaponized for real attacks. An exponential decay function models the natural decay of vulnerability popularity over time, with the decay rate controlled by a decay constant.
Perform multi-time-window prediction using a rule-based method with an attenuation factor. The attenuation factor increases with the prediction horizon, raising the risk weight of long-term predictions; the base prediction value is multiplied by the attenuation factor to obtain the adjusted prediction probability, and a clipping function keeps the probability within the valid range.
Predict with the LSTM time series model: historical feature data is converted into fixed-length sequences that serve as input to a pre-trained Long Short-Term Memory (LSTM) network, which outputs a sequence of future exploitation probabilities. The future trend of exploitation risk is calculated from this output.
Fuse the prediction results: the baseline predictions for the multiple time windows are multiplied by the prediction weights of the LSTM model and then fused, and a confidence score is calculated from the fused result and the uncertainty of each model. Finally, output the vector of exploitation probabilities of the software vulnerability for each time window, together with the confidence score.
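The LSTM branch in [0031] relies on a pre-trained network, which is not reproduced here, but the steps around it can be sketched: turning history into fixed-length input sequences, and deriving a trend from the predicted probability sequence. Left zero-padding for short histories and a least-squares slope as the "trend" are assumptions.

```python
from typing import List, Sequence


def to_fixed_length(history: Sequence[Sequence[float]], length: int) -> List[List[float]]:
    """Keep the most recent `length` feature vectors; left-pad with zeros if shorter."""
    if length < 1 or not history:
        raise ValueError("need length >= 1 and at least one time step")
    dim = len(history[0])
    recent = [list(step) for step in history[-length:]]
    padding = [[0.0] * dim for _ in range(length - len(recent))]
    return padding + recent


def risk_trend(future_probs: Sequence[float]) -> float:
    """Least-squares slope of the predicted probability sequence, per time step.

    Positive means rising exploitation risk, negative means falling.
    """
    n = len(future_probs)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = sum(future_probs) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(future_probs))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


seq = to_fixed_length([[0.1, 2.0], [0.3, 5.0]], length=4)  # two zero rows, then history
trend = risk_trend([0.2, 0.3, 0.4])                          # about +0.1 per step
```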

[0032] In one embodiment, the ensemble learning prediction model further includes a time-series data collector, a feature extraction and fusion module, a time-series prediction engine, an adaptive weight adjuster, a confidence quantifier, and a feedback loop module. Inputting the multi-dimensional feature vector into the ensemble learning prediction model and predicting the exploitation risk probability of the software vulnerability based on its time-aware prediction mechanism includes: collecting multi-source time-series data through the time-series data collector, the data including vulnerability disclosure timestamps, exploitation event sequences, and changes in APT interest, with support for real-time streaming input; extracting time-related features and fusing them with non-time-related features through the feature extraction and fusion module, which uses an attention mechanism to highlight key time points, the time-related features including vulnerability age distribution and exploitation time series, and the non-time-related features including CVSS scores and APT organization interest; and performing multi-step prediction on the input time-series vector through the time-series prediction engine, which outputs the probability of vulnerability exploitation over multiple future time windows. And/or, in another embodiment, the method further includes: dynamically adjusting the model weights through the adaptive weight adjuster using reinforcement learning or gradient descent; outputting an uncertainty score for the prediction through the confidence quantifier, computed with Bayesian methods or confidence intervals, to assess the reliability of the prediction; and feeding the prediction results back into model training through the feedback loop module for online learning.
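For the confidence quantifier in [0032], the confidence-interval option can be sketched over the predictions of the ensemble members. This assumes a normal-approximation interval (mean ± z·s/√n with z = 1.96) clipped to [0, 1], defines uncertainty as the interval width and confidence as 1 minus that width; these definitions are hypothetical, and a Bayesian posterior would be the alternative the text mentions.

```python
import math
import statistics
from typing import Dict, Sequence


def confidence_from_members(member_probs: Sequence[float], z: float = 1.96) -> Dict[str, float]:
    """Normal-approximation interval over ensemble member predictions (needs n >= 1)."""
    n = len(member_probs)
    mean = statistics.fmean(member_probs)
    s = statistics.stdev(member_probs) if n > 1 else 0.0  # sample standard deviation
    half_width = z * s / math.sqrt(n)
    lower, upper = max(0.0, mean - half_width), min(1.0, mean + half_width)
    uncertainty = upper - lower                          # interval width in [0, 1]
    return {"mean": mean, "lower": lower, "upper": upper,
            "uncertainty": uncertainty, "confidence": 1.0 - uncertainty}


tight = confidence_from_members([0.6, 0.6, 0.6])  # members agree: no uncertainty
wide = confidence_from_members([0.2, 0.9, 0.5])   # members disagree: wide interval
```

A wide interval signals that the predictors disagree, which is the situation in which the prediction deserves the least trust.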

[0033] The interest level of APT organizations is represented by the APT organization interest score, and the calculation formula is as follows: APT_interest_score = Σ(confidence_level_i × actor_sophistication_i) / n Where, confidence_level_i: the confidence level for the i-th APT organization, with a value between 0 and 1, used to characterize the credibility of the intelligence source of the APT organization; actor_sophistication_i: The complexity or capability score of the i-th APT organization, used to quantify the organization's attack technology level, resources and historical behavior, with a value between 0 and 1; Σ: summation over all relevant APT organizations; n: number of APT organizations; APT_interest_score: The interest score of the APT organization. The higher the score, the greater the interest. The value ranges from 0 to 1.

[0034] A key objective of this invention is to construct a system that integrates multi-dimensional feature vector information to achieve high-precision vulnerability exploitation prediction. The vulnerability exploitation prediction system based on multi-dimensional feature fusion and ensemble learning includes the following core technical components: (I) Multi-dimensional feature extraction engine. 1. Static feature extractor: (1) a CVSS vector parsing algorithm extracts basic features such as attack vector (AV), attack complexity (AC), and privileges required (PR); (2) CWE weakness type mapping establishes a correlation model between weakness type and exploitation difficulty; (3) affected-product analysis computes vendor concentration and product coverage features.

[0035] 2. Temporal feature constructor:
(1) Vulnerability age: days_since_published = (current_time - published_date).days
(2) Patch availability window: patch_availability_days = max(0, days_since_published - 30)
(3) Exploit maturity score: exploit_maturity_score = f(exploit_availability_status)
(4) Update frequency: update_frequency = 1.0 / max(1, (modified_date - published_date).days)
3. Threat intelligence feature fusion algorithm:
APT_interest_score = Σ(confidence_level_i × actor_sophistication_i) / n
Dark_web_mentions = count(mentions in dark_web_sources)
Weaponization_likelihood = base_score + intel_factor × 0.2
4. Contextual feature extractor:
(1) Network exposure assessment: exposure_score = f(attack_vector, network_config)
(2) Business criticality calculation: criticality = Σ(product_weight_i × business_impact_i)
(3) Security control effectiveness: controls_effectiveness = Σ(control_weight_i × deployment_status_i)

[0036] In summary, multidimensional feature extraction engines include: (1) Static feature extractor, used to extract static features such as CVSS score, CWE type, and affected products; (2) Temporal feature constructor, used to calculate temporal features such as vulnerability age, patch availability, and exploit maturity; (3) Threat intelligence feature fusion tool, used to extract threat intelligence features such as APT organization interest level, number of mentions on the dark web, and weaponization potential; (4) Contextual feature extractor, used to assess environmental features such as asset exposure, business criticality, and security control effectiveness.
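Several of the extraction steps in [0034]-[0035] reduce to short functions. The sketch below is a minimal illustration, assuming CVSS v3.x vector strings and `datetime` timestamps as inputs; the function names are hypothetical. The AV, AC and PR weights are the published CVSS v3.1 metric values, which the application itself does not specify, and the unspecified f() functions (exploit maturity, exposure) are left out. With the 30-day patch lag from [0035], it reproduces the temporal sample values shown in [0059].

```python
from datetime import datetime

# CVSS v3.1 metric weights for the three base metrics named in [0034].
AV_WEIGHTS = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC_WEIGHTS = {"L": 0.77, "H": 0.44}
PR_WEIGHTS = {"N": 0.85, "L": 0.62, "H": 0.27}          # Scope: Unchanged
PR_WEIGHTS_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}   # Scope: Changed


def parse_cvss_vector(vector: str) -> dict:
    """Static features: map AV, AC and PR of a CVSS v3.x vector to weights."""
    prefix, *metrics = vector.split("/")
    if not prefix.startswith("CVSS:3"):
        raise ValueError(f"not a CVSS v3.x vector: {vector!r}")
    codes = dict(m.split(":", 1) for m in metrics)
    pr_table = PR_WEIGHTS_CHANGED if codes.get("S") == "C" else PR_WEIGHTS
    return {
        "AV": AV_WEIGHTS[codes["AV"]],
        "AC": AC_WEIGHTS[codes["AC"]],
        "PR": pr_table[codes["PR"]],
    }


def temporal_features(published: datetime, modified: datetime, now: datetime) -> dict:
    """Temporal features from [0035], using the 30-day patch lag stated there."""
    days_since_published = (now - published).days
    return {
        "days_since_published": days_since_published,
        "patch_availability_days": max(0, days_since_published - 30),
        # max(1, ...) avoids division by zero when modified == published
        "update_frequency": 1.0 / max(1, (modified - published).days),
    }


def weighted_sum(weights, values) -> float:
    """Σ(weight_i × value_i), the form of criticality and controls_effectiveness."""
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    return sum(w * v for w, v in zip(weights, values))


if __name__ == "__main__":
    print(parse_cvss_vector("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))
    # {'AV': 0.85, 'AC': 0.77, 'PR': 0.85}
    print(temporal_features(datetime(2023, 6, 1), datetime(2023, 6, 16),
                            datetime(2023, 7, 1)))
    # {'days_since_published': 30, 'patch_availability_days': 0,
    #  'update_frequency': 0.06666666666666667}
```

Keeping the scope-dependent PR table explicit mirrors the CVSS v3.1 specification, where PR:L and PR:H carry different weights when Scope is Changed.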

[0037] (II) Ensemble learning prediction model. The ensemble learning prediction model uses an adaptive weight allocation algorithm that dynamically adjusts the weight of each base predictor according to the predictors' diversity and accuracy. It supports multi-time-window prediction, estimating the probability of vulnerability exploitation within multiple preset periods (e.g., 30, 90, and 180 days).

[0038] 1. Multi-algorithm fusion architecture: (1) Random forest predictor: handles nonlinear feature relationships and ranks feature importance; (2) Gradient boosting predictor: optimizes prediction accuracy and handles feature interactions; (3) XGBoost predictor: handles large-scale data (data volume reaching a first preset value) and mitigates overfitting; (4) Neural network predictor: learns complex nonlinear patterns (complexity reaching a second preset value); (5) LSTM time-series predictor: captures temporal dependencies.

[0039] 2. Ensemble strategies: (1) Voting: soft_voting = Σ(weight_i × probability_i) / Σ(weight_i); (2) Stacking: a meta-learner combines the predictions of the base models; (3) Weighted ensemble: weights are adjusted dynamically according to model performance.

[0040] 3. Confidence calculation algorithm: confidence_score = |prediction_probability - 0.5| × 2; ensemble_confidence = agreement_factor × 0.6 + individual_confidence × 0.4.
(III) Time-aware prediction mechanism. The time-aware prediction mechanism uses a multi-time-window prediction model and a maturity evolution algorithm to predict vulnerability exploitation risk dynamically. By incorporating the time dimension, it produces dynamic exploitation probabilities together with quantified uncertainty. Unlike traditional static prediction models, it models how vulnerability risk evolves as a time series, capturing patterns such as the lifecycle curve from disclosure to exploitation, and thereby provides a more accurate and forward-looking risk assessment.
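The voting and confidence formulas in [0039]-[0040] can be sketched as follows. This is a minimal illustration, not the application's implementation: the application does not define `agreement_factor` or `individual_confidence`, so this sketch assumes agreement = 1 - 2 × (population standard deviation of the base predictions) and individual confidence = mean `confidence_score`. The `performance_weights` rule for the weighted ensemble is likewise hypothetical, and stacking is not sketched.

```python
from statistics import pstdev


def soft_voting(probabilities, weights) -> float:
    """soft_voting = Σ(weight_i × probability_i) / Σ(weight_i)."""
    if len(probabilities) != len(weights):
        raise ValueError("one weight per base predictor is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * p for w, p in zip(weights, probabilities)) / total


def performance_weights(validation_scores):
    """Hypothetical weighted-ensemble rule: weights proportional to validation scores."""
    total = sum(validation_scores)
    return [s / total for s in validation_scores]


def confidence_score(prediction_probability: float) -> float:
    """|p - 0.5| × 2: 0 for a coin-flip prediction, 1 for p = 0 or p = 1."""
    return abs(prediction_probability - 0.5) * 2


def ensemble_confidence(probabilities) -> float:
    """agreement_factor × 0.6 + individual_confidence × 0.4.

    Assumptions: agreement_factor = 1 - 2 × population std of the base
    predictions; individual_confidence = mean confidence_score.
    """
    agreement_factor = max(0.0, 1.0 - 2.0 * pstdev(probabilities))
    individual = sum(confidence_score(p) for p in probabilities) / len(probabilities)
    return agreement_factor * 0.6 + individual * 0.4


probs, weights = [0.65, 0.72, 0.68], [0.4, 0.3, 0.3]
print(round(soft_voting(probs, weights), 4))  # 0.68
```

With the base-model outputs from [0103], soft voting reproduces the 0.68 ensemble prediction in [0104]. The ensemble confidence here comes out near 0.71 rather than 0.83, because [0104] uses assumed per-model confidences instead of `confidence_score`.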

[0041] In the vulnerability exploitation risk prediction system, this mechanism serves as a core module, integrating multi-dimensional features (including technical features, threat intelligence, and historical exploitation data) and introducing time-series models to predict the dynamic spread of risks, which helps identify potential outbreak points for "zero-day" vulnerabilities or emerging threats.

[0042] Time-series awareness: traditional forecasting often ignores the time factor, so the model responds slowly to sudden events. This mechanism introduces a time-series awareness layer that treats vulnerability risk as a time-series process (modeled, e.g., with ARIMA or LSTM) in order to capture its trend, seasonality, and noise.

[0043] Prediction mechanism: combining ensemble learning with a temporal model, the mechanism performs multi-step prediction. It first extracts temporal features (such as vulnerability age and changes in exploitation after disclosure), then generates dynamic prediction scores through a time-aware fusion algorithm. Temporal adaptive weight adjustment updates the model weights in real time based on historical data, making the predictions more sensitive to newly arriving intelligence. This, together with adaptations specific to the security domain (such as the suddenness of APT attacks), distinguishes the mechanism from general-purpose temporal models such as Prophet.

[0044] The Temporal Risk Score (TRS) is calculated as follows: TRS_t = f(TRS_{t-1}) + Δ(feature fusion) × ω_t, where f() is the temporal recursive function (e.g., LSTM state transition), Δ is the feature increment at the current time step, and ω_t is the adaptive weight (dynamically calculated based on confidence level).
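The TRS recursion in [0044] can be illustrated with a stand-in for the learned transition. This is a minimal sketch under stated assumptions, none of them from the application: f(x) = retention × x replaces the LSTM state transition, ω_t is taken directly as the step's confidence, and TRS is clipped to [0, 1]. `temporal_risk_scores` and `retention` are hypothetical names.

```python
def temporal_risk_scores(deltas, confidences, initial=0.0, retention=0.9):
    """Iterate TRS_t = f(TRS_{t-1}) + Δ_t × ω_t and return every TRS_t.

    Assumptions: f(x) = retention × x stands in for the LSTM state
    transition, ω_t is the step's confidence, and TRS is clipped to [0, 1].
    """
    if len(deltas) != len(confidences):
        raise ValueError("need one confidence per feature increment")
    trs = initial
    history = []
    for delta, omega in zip(deltas, confidences):
        trs = retention * trs + delta * omega
        trs = min(1.0, max(0.0, trs))
        history.append(trs)
    return history


print(temporal_risk_scores([0.2, 0.3, 0.1], [1.0, 0.8, 0.5]))
```

The three steps give approximately 0.2, 0.42 and 0.428. A low-confidence increment (ω_t = 0.5 in the last step) moves the score less than a high-confidence one, which is the role the adaptive weight plays in the formula.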

[0045] This time-aware prediction mechanism can be composed of the following key technical components, forming a closed-loop prediction pipeline: (1) Time-series data collector: Collects time-series data from multiple external threat intelligence sources (such as CVE vulnerability intelligence database, threat intelligence platform), including vulnerability disclosure timestamps, exploitation event sequences, APT interest changes, etc., and can support real-time streaming input (e.g., using Kafka or similar queues).

[0046] (2) Feature extraction and fusion module: extracts time-related features (such as the vulnerability age distribution and the exploitation time series), fuses them with non-time-related features (e.g., CVSS score and APT_interest_score), and uses an attention mechanism (attention layer) to highlight key time points.

[0047] (3) Time-series prediction engine: its core is an LSTM (Long Short-Term Memory) network or a Transformer variant that supports multi-step prediction. The model input is the time-series vector X_t = [feature1_t, feature2_t, ..., featurek_t], and the output is the exploitation probability P(exploit) for each future step from t+1 to t+n.

[0048] (4) Adaptive weight adjuster: uses reinforcement learning or gradient descent to optimize the model weights dynamically. For example, when APT activity has recently increased, the weights shift toward threat intelligence features.

[0049] (5) Confidence quantifier: Integrates Bayesian methods or confidence interval calculations to output the uncertainty score of the prediction (e.g., 95% confidence interval) to help users assess the reliability of the prediction.

[0050] (6) Feedback loop: The prediction results are fed back into the model training to achieve online learning, thereby adapting to new threats.
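Components (2), (4) and (5) above can each be sketched in a few lines. These are illustrative stand-ins under stated assumptions, not the application's implementation: softmax attention pooling with externally supplied relevance scores (a trained attention layer would compute them), one exponentiated-gradient step on the ensemble's squared error as the gradient-based weight adjuster, and a normal-approximation confidence interval over ensemble-member predictions in place of a Bayesian treatment. All function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def attention_pool(sequence, scores):
    """(2) Softmax attention over time steps; returns (weights, pooled vector)."""
    if not sequence or len(sequence) != len(scores):
        raise ValueError("need one score per time step")
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * step[j] for w, step in zip(weights, sequence))
              for j in range(len(sequence[0]))]
    return weights, pooled


def update_weights(weights, predictions, outcome, learning_rate=0.5):
    """(4) One exponentiated-gradient step on the ensemble's squared error."""
    ensemble = sum(w * p for w, p in zip(weights, predictions))
    # d/dw_i of (ensemble - outcome)^2 is 2 (ensemble - outcome) p_i
    grads = [2.0 * (ensemble - outcome) * p for p in predictions]
    scaled = [w * math.exp(-learning_rate * g) for w, g in zip(weights, grads)]
    total = sum(scaled)
    return [s / total for s in scaled]


def prediction_interval(member_predictions, level=0.95):
    """(5) Normal-approximation interval for the mean member prediction.

    Returns (centre, lower, upper); bounds are clipped to [0, 1].
    """
    n = len(member_predictions)
    if n < 2:
        raise ValueError("need at least two member predictions")
    centre = mean(member_predictions)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    half_width = z * stdev(member_predictions) / math.sqrt(n)
    return centre, max(0.0, centre - half_width), min(1.0, centre + half_width)


weights, fused = attention_pool([[0.05, 0.1], [0.15, 0.4]], [0.0, 2.0])
print(weights)  # the second time step receives about 88% of the weight
print(update_weights([0.5, 0.5], [0.9, 0.2], outcome=1.0))
print(prediction_interval([0.65, 0.72, 0.68]))
```

In the weight update, the predictor that came closer to the observed exploitation gains weight after renormalization. This mirrors the shift toward threat-intelligence-driven predictors described in (4). With only a handful of ensemble members, a Student-t quantile would give a wider and more honest interval than the normal approximation used here.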

[0051] The following specific example illustrates the mechanism. Suppose a web server vulnerability (CVE-2024-YYYY), disclosed in January 2024, is being evaluated. The system collects time-series data: the exploitation rate is 5% in the first week after disclosure and rises to 15% in the second week (driven by APT interest).

[0052] Input: the time series [5%, 15%, ?], where the week-3 value is to be predicted, plus features (e.g., CVSS = 8.5, APT_interest_score = 0.7).

[0053] Processing: the LSTM model predicts an exploitation rate of 25% for week 3, with an uncertainty of ±5%.

[0054] Output: TRS = 0.65 (high risk); the recommendation is to patch immediately and monitor traffic.

[0055] Effect: without this mechanism, a static model might underestimate the risk score at 0.4; time-series awareness captures the upward trend and gives early warning of potential attacks.

[0056] The following explains how to construct a multi-dimensional feature vector fusion framework in this patent application. In this embodiment, a 37-dimensional feature vector is used as an example.

[0057] Step 1: construct the feature extraction pipeline. Initialize a FeatureExtractionEngine instance and configure the static feature extractor, temporal feature constructor, threat intelligence feature fuser, and environmental context feature extractor. Set up feature standardization and normalization so that the 37-dimensional feature vector (static + temporal + threat intelligence + environmental features) is generated automatically.

[0058] Step 2: machine learning model training. Data preprocessing: impute missing feature values and detect outliers, then partition the data into training, validation, and test sets (e.g., at a 6:2:2 ratio). Optimize hyperparameters by random search or Bayesian optimization. Train the base models in parallel, and select the ensemble strategy that performs best on the validation set.
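The partitioning and imputation in Step 2 can be sketched as follows, assuming a seeded shuffle before the 6:2:2 split and median imputation. Neither choice is specified in the application, and both function names are hypothetical. Outlier detection and hyperparameter search are not shown.

```python
import random
from statistics import median


def split_6_2_2(records, seed=42):
    """Shuffle, then split into train / validation / test at a 6:2:2 ratio."""
    items = list(records)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 6 // 10
    n_val = len(items) * 2 // 10
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])


def impute_median(train_column, column):
    """Replace None with the median of the training values only."""
    present = [v for v in train_column if v is not None]
    fill = median(present)
    return [fill if v is None else v for v in column]


train, val, test = split_6_2_2(range(100))
print(len(train), len(val), len(test))  # 60 20 20
```

Computing the fill value from the training partition alone, then applying it to validation and test data, keeps information from the held-out sets out of the preprocessing.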

[0059] Feature vector construction algorithm:

```python
def extract_feature_vector(vulnerability_features):
    """Concatenate the four feature groups into a single 37-dimensional list."""
    vector = []

    # Static features (11 dimensions)
    static = vulnerability_features.static_features
    vector.extend([
        static.cvss_base_score,
        static.cvss_impact_score,
        static.cvss_exploitability_score,
        # ... other static features
    ])

    # Temporal features (9 dimensions)
    temporal = vulnerability_features.temporal_features
    vector.extend([
        # Vulnerability age in years, capped at 2.0
        min(temporal.days_since_published / 365.0, 2.0),
        temporal.exploit_maturity_score,
        # ... other temporal features
    ])

    # Threat intelligence features (9 dimensions)
    threat_intel = vulnerability_features.threat_intel_features
    vector.extend([
        threat_intel.apt_group_interest,
        threat_intel.weaponization_likelihood,
        # ... other threat intelligence features
    ])

    # Environmental features (8 dimensions)
    environmental = vulnerability_features.environmental_features
    vector.extend([
        environmental.asset_exposure_score,
        environmental.business_criticality,
        # ... other environmental features
    ])

    return vector
```

An example run of the feature extraction engine produces the following output:

```text
Feature Extraction Engine
==================================================
Testing vulnerability: CVE-2023-12345
CVSS Score: 8.5
CWE: CWE-79
Exploit Available: ExploitAvailability.POC
Extracting features...

Static Features:
  CVSS Base Score: 8.5
  CVSS Impact Score: 9.1
  CVSS Exploitability Score: 3.9
  CWE Category: Input Validation
  CWE Weakness Type: Web Application
  Affected Product Count: 2
  Vendor Count: 1
  Reference Count: 2
  Description Length: 93
  Has Remote Access: True
  Requires Authentication: False
  Complexity Level: low

Temporal Features:
  Days Since Published: 30
  Days Since Modified: 15
  Patch Availability Days: 0
  Exploit Maturity Score: 0.3
  Exploit Code Availability: 0.5
  Remediation Level: 1.0
  Report Confidence: 1.0
  Age Category: new
  Update Frequency: 0.06666666666666667

Threat Intelligence Features:
  APT Group Interest: 0.8
  Dark Web Mentions: 1
  Social Media Buzz: 0.0
  Exploit Kit Inclusion: True
  Ransomware Association: False
  Nation State Interest: 0.8
  Criminal Group Interest: 0.6
  PoC Availability Score: 0.5
  Weaponization Likelihood: 0.43999999999999995

Environmental Features:
  Asset Exposure Score: 0.7
  Business Criticality: 0.8
  Network Accessibility: 1.0
  Security Controls Effectiveness: 0.9999999999999999
  Patch Deployment Difficulty: 0.3
  System Redundancy: 0.5
  Data Sensitivity: 0.8
  Compliance Impact: 0.8999999999999999

Converting to feature vector...
Feature vector length: 37
Feature vector (first 10 values): [8.5, 9.1, 3.9, 2, 1, 2, 0.093, 1.0, 0.0, 0.2]

Testing individual extractors...
Static extractor - CWE Category: Input Validation
Temporal extractor - Age Category: new
Threat Intel extractor - APT Interest: 0.8
Environmental extractor - Business Criticality: 0.8
Feature extraction test completed successfully!
All extractors are working correctly.
```

The following describes the method for extracting the associated threat intelligence features in this patent application.

[0060] Threat intelligence feature vectorization achieves breadth and depth by aggregating intelligence from commercial feeds, open-source sources, the dark web, and active defense infrastructure (honeypots). The threat intelligence feature fusion engine is the point where all of this data converges: it deduplicates, standardizes, and correlates the data to provide a unified view for subsequent analysis. A feature fusion algorithm then combines the outputs of the three analytical dimensions (threat actor analysis, threat activity analysis, and exploitation capability assessment) into a structured threat intelligence feature vector, a set of quantitative indicators such as APT group interest level, dark web mention frequency, weaponization probability, ransomware association, and Proof-of-Concept (PoC) availability.

[0061] Output:
APT organization interest level: 0.8 (numeric; higher values indicate greater danger)
Dark web mentions: 15 (numeric; reflects popularity)
Weaponization probability: 0.65 (numeric; reflects technical maturity)
Ransomware association: True (Boolean; a strong risk indicator)
PoC availability: 0.7 (numeric; available PoC code lowers the attack threshold)
The APT organization interest level is represented by the APT organization interest score:
APT_interest_score = Σ(confidence_level_i × actor_sophistication_i) / n
where confidence_level_i is the confidence level for the i-th APT organization, a score based on intelligence reliability (typically between 0 and 1) that indicates how credible the intelligence about that organization's interest is; for example, a reliable source such as an official report yields a high confidence level. actor_sophistication_i is the sophistication or capability score of the i-th APT organization (0 to 1), quantifying its attack technology level, resources, and historical behavior; for example, a state-sponsored organization such as APT29 may have a higher score. Σ sums over all n relevant APT organizations, and dividing by n gives the average interest score (0 to 1, with higher values indicating greater overall interest).

[0062] The calculation averages each APT organization's confidence-weighted sophistication score, so organizations with both high confidence and high sophistication contribute more to the total. Because the score integrates data from multiple sources, it reduces the bias of any single intelligence source and keeps the score objective and reliable.

[0063] The following example illustrates this. Suppose we need to assess a vulnerability (CVE-2023-XXXX) affecting energy infrastructure, and the system identifies three APT groups that may be interested in it (n = 3). Using threat intelligence databases (such as MITRE ATT&CK or custom APT profiles), the system collects data and calculates: APT Group 1: APT28 (Fancy Bear, with a background in Country A), confidence_level_1 = 0.8 (the intelligence comes from reliable sources, such as CrowdStrike reports, confirming that the group has targeted energy systems).

[0064] actor_sophistication_1 = 0.9 (High complexity: possesses advanced tools and persistent penetration capabilities).

[0065] Contribution: 0.8 × 0.9 = 0.72.

[0066] APT Group 2: APT43 (Background in Country B) confidence_level_2 = 0.6 (Intelligence is moderately reliable, based on an open-source report, but a direct link has not been confirmed).

[0067] actor_sophistication_2 = 0.7 (Medium complexity: focused on intellectual property theft, but not energy expertise).

[0068] Contribution: 0.6 × 0.7 = 0.42.

[0069] APT Group 3: Lazarus Group (Background in Country B) confidence_level_3 = 0.9 (High confidence: Historical data shows multiple instances of targeting critical infrastructure).

[0070] actor_sophistication_3 = 0.85 (High complexity: involving destructive attacks such as WannaCry).

[0071] Contribution: 0.9 × 0.85 = 0.765.

[0072] Overall calculated score: APT_interest_score = (0.72 + 0.42 + 0.765) / 3 ≈ 0.635 (or 63.5%).

[0073] The overall calculated score indicates a moderate to high overall APT interest (>0.5), suggesting that this vulnerability may be exploited by these groups, especially APT28 and Lazarus Group, due to their significant contributions. The system will adjust the vulnerability exploitation probability prediction accordingly, and it is recommended to strengthen the protection of energy systems (such as applying patches or monitoring abnormal traffic).
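The worked example in [0063]-[0072] can be checked with a direct implementation of the APT interest formula. This is a minimal sketch: `apt_interest_score` is a hypothetical name. The range check and the 0.0 result for an empty group list are assumptions, not rules stated in the application.

```python
def apt_interest_score(groups):
    """APT_interest_score = Σ(confidence_level_i × actor_sophistication_i) / n.

    `groups` is a list of (confidence_level, actor_sophistication) pairs,
    both in [0, 1]. Returning 0.0 when no group is observed is an assumption.
    """
    if not groups:
        return 0.0
    for confidence, sophistication in groups:
        if not (0.0 <= confidence <= 1.0 and 0.0 <= sophistication <= 1.0):
            raise ValueError("confidence and sophistication must lie in [0, 1]")
    return sum(c * s for c, s in groups) / len(groups)


# APT28, APT43 and Lazarus Group values from [0063]-[0071]
score = apt_interest_score([(0.8, 0.9), (0.6, 0.7), (0.9, 0.85)])
print(round(score, 3))  # 0.635
```

Because the formula is a plain mean of products, adding a low-confidence, low-sophistication group lowers the overall score. A deployment that wants the score to reflect only the most dangerous actor would need a different aggregate, such as the maximum.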

[0074] The feature fusion process described above comprehensively quantifies and characterizes the external threat environment faced by a vulnerability. When the threat intelligence feature vector is input into the prediction model along with the vulnerability's static and temporal features, the model can make more accurate and forward-looking risk predictions based not only on technical details but also on the real-world threat landscape.

[0075] The time-aware prediction mechanism of this patent application is described below.

[0076] The time-series-aware prediction algorithm's workflow framework integrates feature engineering, maturity modeling, multi-time-window analysis, and deep learning models, aiming to provide a multi-dimensional, high-precision framework for time-series vulnerability exploitation prediction.

[0077] The algorithm's input consists of two parts: Basic vulnerability information: This includes static information such as the vulnerability's release date, last modification date, and CVSS score.

[0078] Historical time series data: This includes data on historical events related to vulnerabilities, such as the emergence time of exploit code (PoC), the number of mentions on the dark web, and the changes in social media popularity over time.

[0079] Specifically, the workflow can be divided into the following phases. Phase 1: Temporal feature construction. This phase focuses on feature engineering, extracting dynamic features with business meaning from raw time data to provide high-quality input for subsequent models.

[0080] days_since_published: Calculates the number of days since the vulnerability was published, and is a core indicator for measuring the "age" of a vulnerability.

[0081] days_since_modified: Calculates the number of days since the vulnerability was last modified, reflecting the stability of the vulnerability information and its recent activity level.

[0082] patch_availability_days: counts the days elapsed in the "security window" since a patch became available, assuming patches become available 30 days after the vulnerability is published; it is used to assess the risk of delayed patching.

[0083] age_category: Classifies vulnerabilities by age (e.g., "new", "medium", "old"), making it easier for the model to capture risk patterns at different lifecycle stages.

[0084] update_frequency: The frequency at which vulnerability information is updated. High-frequency updates may mean that the vulnerability is being actively researched or exploited.

[0085] seasonal_factor: Extracts the month of release as a seasonal factor to capture potential annual periodic attack patterns (such as during holidays).
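The Phase 1 features not covered by the [0035] formulas (days_since_modified, age_category and seasonal_factor) can be sketched as follows. This is a minimal sketch, assuming `date` inputs. The 90-day and 365-day category boundaries are hypothetical, since the application names the categories but not their thresholds, and `lifecycle_features` is a hypothetical name.

```python
from datetime import date


def lifecycle_features(published: date, modified: date, today: date,
                       new_max_days: int = 90, medium_max_days: int = 365) -> dict:
    """days_since_modified, age_category and seasonal_factor.

    The category thresholds (new_max_days, medium_max_days) are assumptions.
    """
    days_since_published = (today - published).days
    if days_since_published <= new_max_days:
        age_category = "new"
    elif days_since_published <= medium_max_days:
        age_category = "medium"
    else:
        age_category = "old"
    return {
        "days_since_modified": (today - modified).days,
        "age_category": age_category,
        # Month of publication (1-12) used as the seasonal factor
        "seasonal_factor": published.month,
    }


print(lifecycle_features(date(2023, 6, 1), date(2023, 6, 16), date(2023, 7, 1)))
# {'days_since_modified': 15, 'age_category': 'new', 'seasonal_factor': 6}
```

The printed values match the "Days Since Modified: 15" and "Age Category: new" entries in the [0059] sample output. A raw month number treats December and January as far apart, so a sine/cosine encoding of the month is a common alternative if the model should see the calendar as cyclic.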

[0086] Phase 2: Exploit maturity evolution modeling. This phase models how exploitation matures, quantifying the evolution from theoretical existence to actual weaponization.

[0087] exploit_maturity_score: Calculates a score based on the public status of the exploit code (e.g., none, private, PoC, functional exploit). A higher score indicates a more mature exploit technique.

[0088] weaponization_likelihood: Combines the base score and external threat intelligence (intel_factor) to assess the likelihood that the vulnerability could be weaponized for actual attacks.

[0089] temporal_decay: Applies an exponential decay function to simulate the phenomenon of vulnerability popularity naturally decreasing over time. decay_constant controls the decay rate.
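The three Phase 2 quantities can be sketched as below, with several assumptions labeled. The status-to-score table is hypothetical: only its ordering follows [0087], and the PoC value of 0.3 is taken from the "Exploit Maturity Score" in the [0059] sample output. Clipping the weaponization likelihood to [0, 1], the exp(-t / decay_constant) decay form, and the 180-day default constant are also assumptions.

```python
import math

# Hypothetical table: ordering follows [0087]; only the PoC value (0.3)
# matches the sample output in [0059].
EXPLOIT_MATURITY = {"none": 0.0, "private": 0.15, "poc": 0.3, "functional": 0.8}


def exploit_maturity_score(status: str) -> float:
    """Map the public status of exploit code to a maturity score."""
    return EXPLOIT_MATURITY[status.lower()]


def weaponization_likelihood(base_score: float, intel_factor: float) -> float:
    """base_score + intel_factor × 0.2, clipped to [0, 1] (clipping assumed)."""
    return min(1.0, max(0.0, base_score + intel_factor * 0.2))


def temporal_decay(days_since_published: float, decay_constant: float = 180.0) -> float:
    """Assumed form exp(-t / decay_constant); a larger constant decays more slowly."""
    return math.exp(-days_since_published / decay_constant)


print(exploit_maturity_score("PoC"), round(temporal_decay(180), 4))  # 0.3 0.3679
```

With this form, decay_constant is the number of days after which popularity has fallen to about 37% (1/e) of its initial level, which makes the parameter easy to tune against observed mention counts.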

[0090] Phase 3: Multi-time-window prediction. This phase uses a rule-based, decay-factor approach to quickly generate baseline forecasts for multiple future time windows (30, 90, and 180 days).

[0091] decay_factor: increases with the forecast horizon and adjusts the risk weight of longer-term forecasts. It reflects the assumption that, although uncertainty grows with the horizon, the cumulative exploitation risk also increases.

[0092] adjusted_pred: Multiply the base prediction value by the decay factor to obtain the adjusted prediction probability.

[0093] clip(adjusted_pred, 0, 1): Uses a clipping function to ensure that the predicted probability value is always within the valid range of [0, 1].
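Phase 3 can be sketched as a small function. The application says only that decay_factor grows with the horizon, so this sketch assumes the hypothetical form (horizon / 30) ** 0.25. That form equals 1.0 at 30 days and increases slowly beyond it. The windows and the clip to [0, 1] follow [0090]-[0093].

```python
HORIZONS_DAYS = (30, 90, 180)


def decay_factor(horizon_days: int, reference_days: int = 30, alpha: float = 0.25) -> float:
    """Hypothetical factor: 1.0 at the reference horizon, increasing with horizon."""
    return (horizon_days / reference_days) ** alpha


def multi_window_baseline(base_pred: float, horizons=HORIZONS_DAYS) -> dict:
    """adjusted_pred = clip(base_pred × decay_factor, 0, 1) for each window."""
    return {h: min(1.0, max(0.0, base_pred * decay_factor(h))) for h in horizons}


print(multi_window_baseline(0.5))
```

For a base prediction of 0.5 this gives roughly 0.5, 0.66 and 0.78 for the 30-, 90- and 180-day windows. For a base prediction of 0.9, the 180-day value is clipped to 1.0, which is why the clip step in [0093] is needed.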

[0094] Phase 4: LSTM time-series model prediction. This phase introduces a deep learning model to capture complex, nonlinear temporal dependencies in the data.

[0095] create_sequences: Processes historical feature data into a fixed-length sequence (sequence_length) to serve as the input sequence for the LSTM model.

[0096] lstm_model.predict: applies a pre-trained Long Short-Term Memory (LSTM) network to the input sequences to predict a sequence of future exploitation probabilities.

[0097] calculate_trend: Calculates the future trend of risk (upward, downward, or stable) based on the prediction results of LSTM.
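The two helper steps around the LSTM in Phase 4 can be sketched without a deep learning library. The trained model (`lstm_model.predict`) is not reproduced here. `create_sequences` builds sliding windows with next-step targets. `calculate_trend` classifies the least-squares slope of the predicted sequence. The slope-based rule and its 0.01 tolerance are assumptions, not taken from the application.

```python
def create_sequences(series, sequence_length):
    """Slide a fixed-length window over a time-ordered series.

    Returns (inputs, targets): each input holds `sequence_length`
    consecutive observations, and its target is the observation after it.
    """
    if sequence_length < 1:
        raise ValueError("sequence_length must be positive")
    inputs, targets = [], []
    for start in range(len(series) - sequence_length):
        inputs.append(series[start:start + sequence_length])
        targets.append(series[start + sequence_length])
    return inputs, targets


def calculate_trend(predicted, tolerance=0.01):
    """Label predicted probabilities as 'upward', 'downward' or 'stable'.

    Uses the least-squares slope per step; `tolerance` is an assumed
    threshold below which the trend counts as stable.
    """
    n = len(predicted)
    if n < 2:
        return "stable"
    x_mean = (n - 1) / 2
    y_mean = sum(predicted) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(predicted))
    den = sum((i - x_mean) ** 2 for i in range(n))
    slope = num / den
    if slope > tolerance:
        return "upward"
    if slope < -tolerance:
        return "downward"
    return "stable"


print(create_sequences([1, 2, 3, 4, 5], 3))  # ([[1, 2, 3], [2, 3, 4]], [4, 5])
print(calculate_trend([0.05, 0.15, 0.25]))   # upward
```

The second call uses the week-1 to week-3 rates from the [0051]-[0053] example, which rise by ten points per step.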

[0098] Phase 5: Prediction result fusion. This phase is the core decision-making step of the algorithm: it integrates the two prediction paths (rule-based multi-window prediction and deep-learning-based LSTM prediction) so that their strengths complement each other.

[0099] `final_prediction`: This performs a weighted fusion by multiplying the baseline predictions from multiple time windows with the prediction weights (`lstm_weight`) of the LSTM model. `lstm_weight` can be a fixed value or a variable that is dynamically adjusted based on the model confidence, aiming to balance fast heuristic predictions with the complex patterns captured by deep models.

[0100] calculate_temporal_confidence: Based on the fused prediction results and the uncertainties of each model, calculate the final confidence score to provide users with a measure of the reliability of the prediction.

[0101] Output result: The final output is a prediction vector containing probabilities for multiple future time windows (e.g., 30 days, 90 days, 180 days), along with a comprehensive confidence score, providing comprehensive, dynamic, and interpretable data support for security decisions.
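Phase 5 can be sketched under one possible reading of [0099]. The application says that the baseline predictions are multiplied by `lstm_weight` and fused, but it does not give the exact rule. This sketch therefore assumes a per-window convex blend, (1 - lstm_weight) × baseline + lstm_weight × lstm. The confidence rule (path agreement discounted by mean model uncertainty) is likewise a hypothetical stand-in for `calculate_temporal_confidence`.

```python
def fuse_predictions(baseline, lstm, lstm_weight=0.5):
    """Assumed fusion: per window, (1 - lstm_weight) × baseline + lstm_weight × lstm."""
    if not 0.0 <= lstm_weight <= 1.0:
        raise ValueError("lstm_weight must lie in [0, 1]")
    return {h: (1 - lstm_weight) * baseline[h] + lstm_weight * lstm[h]
            for h in baseline}


def calculate_temporal_confidence(baseline, lstm, model_uncertainties):
    """Hypothetical confidence: path agreement discounted by mean model uncertainty."""
    disagreement = sum(abs(baseline[h] - lstm[h]) for h in baseline) / len(baseline)
    mean_uncertainty = sum(model_uncertainties) / len(model_uncertainties)
    return max(0.0, 1.0 - disagreement) * max(0.0, 1.0 - mean_uncertainty)


baseline = {30: 0.4, 90: 0.6, 180: 0.7}
lstm = {30: 0.6, 90: 0.6, 180: 0.8}
print(fuse_predictions(baseline, lstm, lstm_weight=0.5))
print(calculate_temporal_confidence(baseline, lstm, [0.1, 0.1]))
```

A fixed `lstm_weight` corresponds to the "fixed value" option in [0099]. Making it a function of the LSTM's own uncertainty would implement the dynamically adjusted option.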

[0102] For example, consider a typical software vulnerability, CVE-2023-12345, which affects popular web server software and has a CVSS base score of 7.5. The system first extracts a multi-dimensional feature vector X (simplified example: [0.75, 0.6, 0.8, 0.9], representing the normalized CVSS score, temporal maturity, threat intelligence relevance, and environmental exposure, respectively).

[0103] The list of trained models includes Random Forest (prediction probability 0.65), XGBoost (prediction probability 0.72), and LSTM (prediction probability 0.68), with corresponding weights of [0.4, 0.3, 0.3].

[0104] The function predict_exploitation_probability(X, models, weights) executes as follows:
individual_predictions = [0.65, 0.72, 0.68]
individual_confidences = [0.8, 0.85, 0.82] (assumed calculated values)
weighted_predictions = 0.65 × 0.4 + 0.72 × 0.3 + 0.68 × 0.3 = 0.26 + 0.216 + 0.204 = 0.68
ensemble_predictions = 0.68 / 1.0 = 0.68 (divided by the sum of the weights; a 68% exploitation probability)
ensemble_confidence = calculate_ensemble_confidence(...) = 0.83 (83% confidence)
By integrating the outputs of multiple models, the function provides quantitative predictions that support prioritized remediation of high-risk vulnerabilities.

[0105] This invention discloses a method for predicting software vulnerability exploitation risks, used in a software vulnerability exploitation risk prediction system. The system includes a multi-dimensional feature extraction engine and an ensemble learning prediction model. The method includes: acquiring software vulnerability information to be predicted and multiple external threat intelligence sources; using the multi-dimensional feature extraction engine to extract multi-dimensional features from the software vulnerability information and the multiple external threat intelligence sources, wherein the multi-dimensional features include at least: static features, temporal features, threat intelligence features, and environmental context features; inputting the multi-dimensional features into the ensemble learning prediction model; and predicting the probability of software vulnerability exploitation risk based on the temporal awareness prediction mechanism of the ensemble learning prediction model. This method, employing the temporal awareness prediction mechanism of the ensemble learning prediction model and based on multi-dimensional features, can accurately predict the exploitation risk of software vulnerabilities.

[0106] In another aspect, embodiments of the present invention provide a software vulnerability exploitation risk prediction system. As shown in Figure 2, the system includes an acquisition module 201, a multi-dimensional feature extraction engine 202, and an ensemble learning prediction model 203, wherein: the acquisition module 201 is used to acquire information on the software vulnerability to be predicted and multiple external threat intelligence sources; the multi-dimensional feature extraction engine 202 is used to extract multi-dimensional features from the software vulnerability information and the multiple external threat intelligence sources, the multi-dimensional features including at least static features, temporal features, threat intelligence features, and environmental context features; and the ensemble learning prediction model 203 is used to predict the probability of exploitation of the software vulnerability from the input multi-dimensional features, based on a time-aware prediction mechanism.

[0107] The invention further provides another software vulnerability exploitation risk prediction system. As shown in Figure 3, the multi-dimensional feature extraction engine 202 may include a static feature extractor, a temporal feature constructor, a threat intelligence feature fuser, and an environmental context feature extractor; the ensemble learning prediction model 203 may include a random forest predictor, a gradient boosting predictor, an XGBoost predictor, a neural network predictor, and an LSTM time-series predictor.

[0108] Thirdly, embodiments of the present invention provide an electronic device, including: a memory and a processor, wherein the memory and the processor are connected; Memory, used to store computer programs; A processor is used to invoke a computer program stored in memory to perform any of the above methods.

[0109] Fourthly, embodiments of the present invention provide a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when run by a computer, performs any of the methods described above.

[0110] It should be noted that the content of the method embodiments and system embodiments provided in this patent application corresponds one-to-one with each other. The content involved in any embodiment can be referenced or combined with other embodiments to form part of that embodiment. For ease of description, this patent application focuses on explaining the method embodiments. The description of relevant technical features and solutions of the system, electronic device, and computer-readable storage medium embodiments can be referred to the relevant content in the method embodiments.

[0111] Those skilled in the art will understand that embodiments of the present invention can be provided as methods, apparatus, systems, electronic devices, or computer program products. Therefore, the present invention can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage and optical storage) containing computer-usable program code.

[0112] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more processes and/or one or more blocks of Figure 1.

[0113] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes and/or one or more blocks of Figure 1.

[0114] Obviously, those skilled in the art can make various modifications and variations to this invention without departing from its spirit and scope. Therefore, if these modifications and variations fall within the scope of the claims of this invention and their equivalents, this invention also intends to include these modifications and variations.

Claims

1. A software vulnerability exploitation risk prediction method, characterized in that it is applied to a software vulnerability exploitation risk prediction system comprising a multi-dimensional feature extraction engine and an ensemble learning prediction model, the method comprising: obtaining information on a software vulnerability to be predicted and multiple external threat intelligence sources; extracting, by the multi-dimensional feature extraction engine, multi-dimensional features from the software vulnerability information and the multiple external threat intelligence sources, the multi-dimensional features including at least static features, temporal features, threat intelligence features, and environmental context features; and inputting the multi-dimensional features into the ensemble learning prediction model and predicting the probability of exploitation of the software vulnerability based on the time-aware prediction mechanism of the ensemble learning prediction model.

2. The method according to claim 1, wherein the multi-dimensional feature extraction engine includes a threat intelligence feature fusion unit configured to fuse threat intelligence from the multiple external threat intelligence sources; the method further includes: The threat intelligence feature fusion ... ; performing threat actor analysis, threat activity analysis, and exploitation capability assessment on the fused data, and calculating a threat intelligence feature vector based on a feature fusion algorithm, wherein the threat intelligence feature vector is a set of quantitative indicators including at least: APT organization interest, number of dark web mentions, weaponization probability, ransomware association, and PoC availability.
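Claim 2 does not specify the feature fusion algorithm. As one possible sketch, assuming each source reports indicator values together with a hypothetical reliability weight in [0, 1], probability-like indicators can be fused by a reliability-weighted mean, while the dark web mention count is summed across sources (which assumes the sources do not report the same mentions twice).

```python
from typing import Dict, List, Tuple

# Indicators listed in claim 2; the key names are hypothetical.
INDICATORS = (
    "apt_interest",
    "dark_web_mentions",
    "weaponization_probability",
    "ransomware_association",
    "poc_availability",
)
COUNT_INDICATORS = {"dark_web_mentions"}  # counts are added, not averaged


def fuse_threat_intel(reports: List[Tuple[float, Dict[str, float]]]) -> Dict[str, float]:
    """Fuse (source_reliability, indicators) reports into one threat intelligence vector."""
    fused: Dict[str, float] = {}
    for name in INDICATORS:
        pairs = [(w, r[name]) for w, r in reports if name in r and w > 0]
        if not pairs:
            fused[name] = 0.0  # no source reported this indicator
        elif name in COUNT_INDICATORS:
            fused[name] = float(sum(v for _, v in pairs))
        else:
            total_weight = sum(w for w, _ in pairs)
            fused[name] = sum(w * v for w, v in pairs) / total_weight
    return fused
```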

3. The method according to claim 2, characterized in that: the threat actor analysis includes: APT organization identification and calculation of the APT organization interest level; criminal organization identification and calculation of the criminal organization interest level; and national-level threat identification and calculation of a national-level threat score; the threat activity analysis includes: dark web mention analysis and calculation of a dark web popularity score; social media monitoring and calculation of social media popularity; exploit kit detection and analysis of exploit kit contents; and ransomware association analysis and calculation of the ransomware association degree; the exploitation capability assessment includes: PoC code availability analysis and calculation of a PoC availability score; and weaponization assessment and calculation of a weaponization probability; the multi-dimensional feature extraction engine further includes a static feature extractor, a temporal feature builder, and an environmental context feature extractor; the static feature extractor is configured at least to: extract basic features based on a CVSS vector parsing algorithm, including attack vector (AV), attack complexity (AC), and privileges required (PR).
Based on CWE vulnerability type mapping, establish a correlation model between vulnerability type and exploitation difficulty; and analyze the affected products and calculate vendor concentration and product coverage features; the temporal feature builder is configured at least to: calculate the vulnerability timeline, construct patch availability time windows, and compute exploit maturity scores and update frequency features; the environmental context feature extractor is configured at least to perform: network exposure assessment, business criticality calculation, and security control effectiveness calculation; the static features include: CVSS base score, impact score, exploitability score, CWE category, number of affected products, number of affected vendors, number of reference links, description length, remote access, authentication requirement, and complexity level; the temporal features include: days since vulnerability release, days since patch availability, exploit maturity, update frequency, age classification, seasonality factor, time decay factor, remediation confidence, and report confidence; the threat intelligence features include: APT group interest level, dark web mention frequency, weaponization potential, national-level threats, criminal organization interest, exploit kits, ransomware association, PoC availability, and social media popularity; and the environmental context features include: asset exposure score, business criticality, network reachability, security control effectiveness, patch deployment difficulty, system redundancy, data sensitivity, and compliance impact.
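The CVSS vector parsing step of the static feature extractor can be illustrated with a CVSS v3.x vector string such as `CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H`. The sketch below assumes v3.x input only and encodes AV, AC, and PR with the numeric base-metric weights from the CVSS v3.1 specification; the binary remote access, authentication requirement, and complexity flags and their derivation rules are illustrative assumptions, not the claimed mapping.

```python
from typing import Dict

# Base-metric weights from the CVSS v3.1 specification.
AV_WEIGHTS = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC_WEIGHTS = {"L": 0.77, "H": 0.44}
PR_WEIGHTS_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}  # Scope: Unchanged
PR_WEIGHTS_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}     # Scope: Changed


def parse_cvss_vector(vector: str) -> Dict[str, str]:
    """Split a CVSS v3.x vector string into {metric: value}."""
    parts = vector.strip().split("/")
    if not parts[0].startswith("CVSS:3"):
        raise ValueError(f"unsupported CVSS vector: {vector!r}")
    metrics: Dict[str, str] = {}
    for part in parts[1:]:
        key, sep, value = part.partition(":")
        if not sep:
            raise ValueError(f"malformed metric: {part!r}")
        metrics[key] = value
    return metrics


def static_cvss_features(vector: str) -> Dict[str, float]:
    """Numeric AV/AC/PR features plus hypothetical binary flags."""
    m = parse_cvss_vector(vector)
    pr_table = PR_WEIGHTS_CHANGED if m.get("S") == "C" else PR_WEIGHTS_UNCHANGED
    return {
        "attack_vector": AV_WEIGHTS[m["AV"]],
        "attack_complexity": AC_WEIGHTS[m["AC"]],
        "privileges_required": pr_table[m["PR"]],
        "remote_access": 1.0 if m["AV"] == "N" else 0.0,
        "authentication_required": 0.0 if m["PR"] == "N" else 1.0,
        "high_complexity": 1.0 if m["AC"] == "H" else 0.0,
    }
```

The PR weight depends on the Scope metric in CVSS v3.1, which is why the sketch reads `S` before looking up `PR`.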

4. The method according to claim 1, characterized in that the ensemble learning prediction model includes at least: a random forest predictor, a gradient boosting predictor, an XGBoost predictor, a neural network predictor, and an LSTM time-series predictor; wherein the random forest predictor is used to handle nonlinear feature relationships and rank feature importance; the gradient boosting predictor is used to optimize prediction accuracy and handle feature interactions; the XGBoost predictor is used to process large-scale data whose volume reaches a first preset value while preventing overfitting; the neural network predictor is used to learn nonlinear patterns whose complexity reaches a second preset value; and the LSTM time-series predictor is used to capture temporal dependencies; the ensemble strategies for combining the predictors include a voting ensemble strategy, a stacking ensemble strategy, and a weighted ensemble strategy; and the ensemble learning prediction model adopts an adaptive weight allocation algorithm that dynamically adjusts the weight of each predictor based on the prediction diversity and accuracy of the models.
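Claim 4 names the adaptive weight allocation algorithm but not its formula. A minimal sketch, assuming validation-set predictions are available for each predictor: each raw weight is taken as accuracy × (1 + diversity), where diversity is the mean absolute deviation of a predictor from the ensemble mean, and the weights are then normalized. This rule, and all function names below, are hypothetical; the voting and weighted strategies are shown, while stacking (training a meta-model on the predictors' outputs) is omitted for brevity.

```python
from typing import Dict, List


def adaptive_weights(val_preds: Dict[str, List[float]], labels: List[int],
                     threshold: float = 0.5) -> Dict[str, float]:
    """Weight each predictor by accuracy * (1 + diversity), normalized to sum to 1."""
    names = list(val_preds)
    n = len(labels)
    mean_pred = [sum(val_preds[k][j] for k in names) / len(names) for j in range(n)]
    raw: Dict[str, float] = {}
    for k in names:
        preds = val_preds[k]
        accuracy = sum((p >= threshold) == bool(y) for p, y in zip(preds, labels)) / n
        diversity = sum(abs(p - m) for p, m in zip(preds, mean_pred)) / n
        raw[k] = accuracy * (1.0 + diversity)
    total = sum(raw.values())
    if total == 0.0:
        return {k: 1.0 / len(names) for k in names}  # fall back to equal weights
    return {k: v / total for k, v in raw.items()}


def weighted_ensemble(preds: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted ensemble strategy: weighted average of predictor probabilities."""
    return sum(weights[k] * p for k, p in preds.items())


def voting_ensemble(preds: Dict[str, float], threshold: float = 0.5) -> int:
    """Voting ensemble strategy: strict majority of predictors above the threshold."""
    votes = sum(p >= threshold for p in preds.values())
    return int(votes * 2 > len(preds))
```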

5. The method according to claim 1, characterized in that the time-aware prediction mechanism of the ensemble learning prediction model predicts software vulnerability exploitation risk as follows: obtaining basic information and historical time-series data of the software vulnerability, wherein the basic information includes the release date, last modification date, and CVSS score of the vulnerability, and the historical time-series data is vulnerability-related data that changes over time, including the time at which proof-of-concept (PoC) exploit code appeared, the number of dark web mentions, and social media popularity data; constructing temporal features, including: days since vulnerability release, days since the last modification of the vulnerability, days since patch availability, vulnerability age classification, vulnerability information update frequency, and seasonality factor; exploit maturity evolution modeling: calculating a base score from the public status of the exploit code, where a higher base score indicates a more mature exploitation technique, and combining the base score with external threat intelligence to assess the likelihood that the vulnerability will be weaponized for actual attacks; applying an exponential decay function to simulate the natural decline of vulnerability popularity over time, with the decay rate controlled by a decay constant; performing multi-time-window prediction using a method based on rules and attenuation factors, wherein the attenuation factor increases as the prediction time range increases, so as to raise the risk weight of long-term predictions; multiplying the base prediction value by the attenuation factor to obtain the adjusted prediction probability; and using a clipping function to keep the predicted probability within the valid range.
LSTM time-series prediction: processing historical feature data into fixed-length sequences that serve as the input sequences of the LSTM model; using a pre-trained Long Short-Term Memory (LSTM) network to predict on the input sequence and output a prediction result, the prediction result being a sequence of future exploitation probabilities; and calculating the future trend of exploitation risk from the prediction result. Prediction result fusion: weighting the baseline prediction results for the multiple time windows with the prediction weights of the LSTM model and fusing them; and calculating a confidence score from the fused prediction results and the uncertainty of each model. Outputting a prediction vector of the exploitation probability of the software vulnerability in each time window, together with the confidence score.
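The rule-based baseline, sequence preparation, and fusion steps of claim 5 can be sketched as follows. Several parts are hypothetical and chosen only for illustration: the maturity-to-base-score table, the attenuation factors for 7, 30, and 90 day windows, the decay constant of 0.01 per day, the product of base score and a threat intelligence score as the weaponization likelihood, linear fusion with an `lstm_weight`, and the confidence heuristic. The pre-trained LSTM itself is not shown; its per-window outputs are simply passed in as `lstm`.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical values; the claim does not fix them.
MATURITY_BASE_SCORE = {"unproven": 0.1, "proof_of_concept": 0.5, "functional": 0.8, "high": 1.0}
ATTENUATION_FACTORS = {7: 1.0, 30: 1.2, 90: 1.5}  # grows with the horizon (days)


def clip(x: float, lo: float = 0.0, hi: float = 1.0) -> float:
    """Clipping function: keep a probability inside [lo, hi]."""
    return max(lo, min(hi, x))


def baseline_windows(maturity: str, threat_intel: float, days_since_release: float,
                     decay_constant: float = 0.01) -> Dict[int, float]:
    """Rule-based multi-window baseline with exponential popularity decay."""
    weaponization = MATURITY_BASE_SCORE[maturity] * clip(threat_intel)
    popularity = math.exp(-decay_constant * days_since_release)  # natural decay
    base_prediction = weaponization * popularity
    return {w: clip(base_prediction * f) for w, f in ATTENUATION_FACTORS.items()}


def to_fixed_length(history: Sequence[float], seq_len: int) -> List[float]:
    """Keep the latest seq_len points; left-pad with the earliest one if too short."""
    if seq_len < 1 or not history:
        raise ValueError("need seq_len >= 1 and a non-empty history")
    recent = list(history[-seq_len:])
    return [recent[0]] * (seq_len - len(recent)) + recent


def trend(future_probs: Sequence[float]) -> float:
    """Average per-step change of the LSTM output sequence (> 0 means rising risk)."""
    if len(future_probs) < 2:
        return 0.0
    return (future_probs[-1] - future_probs[0]) / (len(future_probs) - 1)


def fuse(baseline: Dict[int, float], lstm: Dict[int, float], lstm_weight: float,
         uncertainties: Sequence[float]) -> Tuple[Dict[int, float], float]:
    """Linear fusion per window plus a heuristic confidence score in [0, 1]."""
    fused = {w: clip((1 - lstm_weight) * baseline[w] + lstm_weight * lstm[w]) for w in baseline}
    disagreement = sum(abs(baseline[w] - lstm[w]) for w in baseline) / len(baseline)
    mean_uncertainty = sum(uncertainties) / len(uncertainties)
    confidence = clip(1.0 - 0.5 * (disagreement + mean_uncertainty))
    return fused, confidence
```

For example, a vulnerability with public PoC code, a threat intelligence score of 0.8, and a fresh disclosure yields baseline probabilities of about 0.4, 0.48, and 0.6 for the 7, 30, and 90 day windows under these assumed constants.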

6. The method according to claim 1, characterized in that the ensemble learning prediction model further includes: a time-series data collector, a feature extraction and fusion module, a time-series prediction engine, an adaptive weight adjuster, a confidence quantifier, and a feedback loop module; inputting the multi-dimensional feature vector into the ensemble learning prediction model and predicting the exploitation probability of the software vulnerability based on the time-aware prediction mechanism of the ensemble learning prediction model includes: collecting multi-source time-series data through the time-series data collector, the data including vulnerability disclosure timestamps, exploitation event sequences, and changes in APT interest, wherein the time-series data collector supports real-time streaming input; extracting, by the feature extraction and fusion module, time-related features and fusing them with non-time-related features, using an attention mechanism to highlight key time points, wherein the time-related features include vulnerability age distribution and exploitation time series, and the non-time-related features include CVSS scores and APT organization interest; and performing, by the time-series prediction engine, multi-step prediction on the input time-series vector and outputting the probability of vulnerability exploitation over multiple future time windows; and/or the method further includes: dynamically adjusting model weights by the adaptive weight adjuster using reinforcement learning or gradient descent; outputting, by the confidence quantifier, an uncertainty score for the prediction computed using Bayesian methods or confidence intervals, so as to assess the reliability of the prediction; and feeding the prediction results back into model training through the feedback loop module for online learning.
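For the gradient-descent option of the adaptive weight adjuster, one possible online update is sketched below. It assumes the ensemble output is a weighted sum of predictor probabilities, that each feedback event supplies an observed outcome (1 if the vulnerability was exploited, 0 otherwise), and that squared error is the loss; after each gradient step the weights are clipped at zero and renormalized. The loss, learning rate, and `update_weights` name are hypothetical.

```python
from typing import Dict


def update_weights(weights: Dict[str, float], preds: Dict[str, float],
                   outcome: int, lr: float = 0.1) -> Dict[str, float]:
    """One online gradient step on (ensemble - outcome)^2, then renormalize."""
    ensemble = sum(weights[k] * preds[k] for k in weights)
    error = ensemble - outcome
    # d/dw_k (ensemble - outcome)^2 = 2 * error * preds[k]
    updated = {k: max(0.0, w - lr * 2.0 * error * preds[k]) for k, w in weights.items()}
    total = sum(updated.values())
    if total == 0.0:
        return {k: 1.0 / len(weights) for k in weights}  # reset to equal weights
    return {k: v / total for k, v in updated.items()}
```

When an exploitation is observed, the predictor that assigned it the higher probability gains weight, which is the behavior the feedback loop module is meant to drive.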

7. The method according to claim 2, characterized in that, The interest level of APT organizations is represented by the APT organization interest score, and the calculation formula is as follows: APT_interest_score = Σ(confidence_level_i × actor_sophistication_i) / n Where, confidence_level_i: the confidence level for the i-th APT organization, with a value between 0 and 1, used to characterize the credibility of the intelligence source of the APT organization; actor_sophistication_i: The complexity or capability score of the i-th APT organization, used to quantify the organization's attack technology level, resources and historical behavior, with a value between 0 and 1; Σ: summation over all relevant APT organizations; n: number of APT organizations; APT_interest_score: The interest score of the APT organization. The higher the score, the greater the interest. The value ranges from 0 to 1.
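The APT organization interest score of claim 7 translates directly into code. The sketch below takes one (confidence_level_i, actor_sophistication_i) pair per APT organization and checks that both values lie in [0, 1]; returning 0.0 when no organization is observed (n = 0, where the formula is undefined) is an assumption.

```python
from typing import Sequence, Tuple


def apt_interest_score(actors: Sequence[Tuple[float, float]]) -> float:
    """APT_interest_score = sum(confidence_level_i * actor_sophistication_i) / n."""
    if not actors:
        return 0.0  # assumption: no observed APT organizations means no interest
    for confidence, sophistication in actors:
        if not (0.0 <= confidence <= 1.0 and 0.0 <= sophistication <= 1.0):
            raise ValueError("confidence and sophistication must lie in [0, 1]")
    return sum(c * s for c, s in actors) / len(actors)
```

For two organizations with (confidence, sophistication) of (0.9, 0.8) and (0.5, 0.4), the score is (0.72 + 0.20) / 2 = 0.46.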

8. A software vulnerability exploitation risk prediction system, characterized in that the system includes a data acquisition module, a multi-dimensional feature extraction engine, and an ensemble learning prediction model, wherein: the data acquisition module is used to obtain information on a software vulnerability to be predicted and data from multiple external threat intelligence sources; the multi-dimensional feature extraction engine is used to extract multi-dimensional features from the software vulnerability information and the multiple external threat intelligence sources, the multi-dimensional features including at least static features, temporal features, threat intelligence features, and environmental context features; and the ensemble learning prediction model is used to predict the exploitation probability of the software vulnerability from the input multi-dimensional features based on a time-aware prediction mechanism.

9. An electronic device, characterized in that it comprises: a memory and a processor, the memory and the processor being connected to each other; the memory being configured to store a computer program; and the processor being configured to invoke the computer program stored in the memory to perform the method according to any one of claims 1-7.

10. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a computer, performs the method according to any one of claims 1-7.