Digital twin network proactive fault prediction and sandbox verification system and method

The digital twin network proactive fault prediction and sandbox verification system provides high-precision fault prediction and efficient verification of handling solutions, addressing the low prediction accuracy, low verification efficiency, and high operation and maintenance costs of traditional network fault management and improving network operation and maintenance capabilities.

CN122247833A · Pending · Publication Date: 2026-06-19 · NANJING YUNWEI COMM TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: NANJING YUNWEI COMM TECH CO LTD
Filing Date: 2026-05-12
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Existing network fault management technologies suffer from low fault prediction accuracy, delayed early warning, the lack of a standardized verification system, high operation and maintenance costs, and the lack of a continuous optimization mechanism.

Method used

A digital twin network proactive fault prediction and sandbox verification system is adopted. The network twin modeling module provides high-precision real-time mapping, the multi-source data acquisition module cleans and standardizes the collected data, and machine learning algorithms predict faults from that data. The fault predictions and candidate handling solutions are then simulated and verified in the sandbox verification module, forming a closed-loop management system.

Benefits of technology

It enables accurate fault prediction 30-180 minutes in advance, efficiently verifies handling solutions, reduces operation and maintenance costs, improves network operation stability and operation and maintenance efficiency, and adapts to complex network operation and maintenance needs.

✦ Generated by Eureka AI based on patent content.

Figure CN122247833A_ABST

Abstract

This invention discloses a digital twin network proactive fault prediction and sandbox verification system and method, belonging to the field of intelligent network operation and maintenance technology. It includes a network twin modeling module, a multi-source data acquisition module, an intelligent predictive analysis module, and a sandbox verification module. The network twin modeling module is used to construct a full-element digital twin of the physical network, realizing real-time mapping of the physical network topology, device status, and operating parameters. Through the high-precision real-time mapping of the network twin modeling module and the accurate processing of the multi-source data acquisition module, combined with a hybrid model of "random forest feature screening + LSTM time series prediction" and incremental learning of the fault knowledge base, accurate fault prediction 30-180 minutes in advance is achieved, solving the problems of low accuracy, delayed early warning, insufficient data support, and reliance on post-event repair in traditional network fault prediction.

Description

Technical Field

[0001] This invention relates to the field of intelligent network operation and maintenance technology, and in particular to a digital twin network proactive fault prediction and sandbox verification system and method.

Background Technology

[0002] Driven by next-generation information technologies such as 5G, cloud computing, and the Internet of Things, networks have become the core infrastructure supporting social production and daily life, as well as the digital transformation of industry. Their coverage continues to expand, the number of device nodes is surging, and business scenarios are becoming increasingly diversified, placing higher demands on the stability, reliability, and operational efficiency of network operation. Digital twin networks, as an emerging technology integrating digital modeling, data acquisition, intelligent analysis, and virtual simulation, provide a new technical path for network fault management by constructing a full-element digital mapping of the physical network. Network fault prediction and handling are crucial for ensuring continuous network operation. Their core logic lies in collecting network operation data, identifying early warning signs of faults, predicting potential faults in advance, and formulating scientific handling plans to reduce the impact of faults on services. This process relies on accurate network mapping, high-quality data support, efficient intelligent analysis, and secure solution verification as its foundation.

[0003] Existing network fault management technologies in the industry still have significant technical shortcomings. First, traditional network fault prediction relies heavily on single data sources and simple algorithm models and lacks high-precision digital mapping of physical networks. Data collection coverage is incomplete, preprocessing is ineffective, and models do not fully incorporate historical fault experience, resulting in low fault prediction accuracy and delayed warnings. Potential faults are therefore difficult to predict, and operations remain in a passive "post-event repair" mode. Second, fault handling solutions lack a standardized verification system. In most cases, testing is done directly in the physical network, which easily leads to secondary faults and service interruptions; verification is also serial, resulting in low efficiency and an inability to quickly select the optimal solution. Furthermore, the verification scenarios adapt poorly and cannot cover diverse operating conditions such as high-load voice services and high-definition video transmission. Third, there is no closed-loop management mechanism for the entire process. Historical fault cases and handling experience are not systematically accumulated and reused, and models and solutions cannot be continuously iterated and optimized based on new data. As a result, operation and maintenance capabilities are difficult to improve, costs are high, and the needs of modern complex networks cannot be met.

Summary of the Invention

[0004] The purpose of this invention is to provide a digital twin network proactive fault prediction and sandbox verification system and method, which solves the technical problems mentioned in the background art.

[0005] To achieve the above objectives, the present invention provides the following technical solution: a digital twin network proactive fault prediction and sandbox verification system, comprising a network twin modeling module, a multi-source data acquisition module, an intelligent predictive analysis module, and a sandbox verification module. The network twin modeling module is used to construct a full-element digital twin of the physical network, enabling real-time mapping of the physical network topology, device status, and operating parameters. The multi-source data acquisition module is used to collect traffic data, device log data, and link status data of the physical network, and to output standardized multi-dimensional network operation data. The intelligent predictive analysis module analyzes the multi-dimensional network operation data based on machine learning algorithms to predict network faults in advance and to output fault prediction results and candidate handling solutions. The sandbox verification module is used to build a virtual simulation environment to simulate and verify the fault prediction results and the corresponding fault handling solutions. It outputs a verification report that includes the effectiveness of each solution and the degree of business impact, forming a closed-loop management of "prediction-verification-optimization".

[0006] Preferably, the network twin modeling module includes a topology mapping unit and a dynamic update unit; The topology mapping unit uses 3D modeling technology to restore network devices, link connections and deployment environment, with a modeling accuracy error of ≤1%, and supports digital mapping of fine-grained parameters such as device ports and link bandwidth; The dynamic update unit adopts an improved MQTT / OPCUA dual-mode synchronization protocol, which adjusts the data transmission frequency through an adaptive heartbeat mechanism to achieve real-time linkage between the digital twin and the physical network, with a data synchronization delay of ≤50ms. Digital twins support cross-level and cross-regional visualization of network status, and can be used to view the operational details of a single device or the overall network topology through the drill-down function.

[0007] Preferably, the multi-source data acquisition module includes a data acquisition unit and a data preprocessing unit; The data acquisition unit integrates a network probe, an SNMPv3 protocol collector, and a log collection tool, covering data acquisition from the core network, access network, and terminal devices, and supports compliant parsing of encrypted data. The data preprocessing unit uses the 3σ criterion combined with the isolated forest algorithm to denoise the collected data, unifies the data units through Min-Max standardization, performs format conversion simultaneously, and removes outlier data at a rate of ≤3%. The sampling frequency is adjustable, ranging from 1 time / second to 1 time / minute. It supports dynamic adaptation of the sampling granularity based on network load, automatically increasing the sampling frequency in high-load scenarios and reducing the frequency in low-load scenarios to reduce resource consumption.
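The first denoising stage and the Min-Max step described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names are hypothetical, the isolation forest stage is deliberately left out, and the sample readings are invented for the example.

```python
from statistics import mean, pstdev


def three_sigma_filter(values, k=3.0):
    """Drop samples that lie more than k standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All samples are identical, so nothing can be an outlier.
        return list(values)
    return [v for v in values if abs(v - mu) <= k * sigma]


def min_max_scale(values):
    """Map samples linearly onto the [0, 1] interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series carries no spread; map it to 0.0 by convention.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Invented link-utilization readings with one obvious spike.
readings = [10.0, 11.0, 12.0] * 7 + [500.0]
cleaned = three_sigma_filter(readings)
scaled = min_max_scale(cleaned)
```

Note that the 3σ test only works when the series is long enough for a single spike to stand out; with very few samples the spike inflates the standard deviation and is not removed.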

[0008] Preferably, the intelligent predictive analysis module includes a model training unit and a fault prediction unit; The model training unit adopts a hybrid fusion architecture of "random forest feature selection + LSTM time series prediction". First, the feature importance score is calculated by random forest to select key features with a weight ≥ 0.6. Then, the key features are input into the LSTM neural network for time series modeling. The model is trained based on more than 10,000 historical fault data, and the fault prediction accuracy is ≥ 92%. The fault prediction unit is used to analyze real-time network operation data, identify fault precursor characteristics, continuously monitor data trends through a sliding window algorithm, and output the predicted results of fault type, location and impact range 30-180 minutes in advance, supporting automatic classification of fault risk levels.
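The feature screening step can be illustrated with a short sketch. The text does not say how the "weight" is scaled; because raw random forest importance scores typically sum to 1 across all features, a threshold of 0.6 on raw scores could select at most one feature. The sketch below therefore assumes the scores are rescaled so that the most important feature has weight 1.0. That rescaling, the function name, and the example scores are all assumptions made for illustration.

```python
def select_key_features(importances, threshold=0.6):
    """Return feature names whose rescaled weight is >= threshold.

    importances: dict mapping feature name -> raw importance score.
    Weights are rescaled so the top feature has weight 1.0 (an assumption,
    not something the source text specifies).
    """
    top = max(importances.values())
    if top <= 0:
        return []
    weights = {name: score / top for name, score in importances.items()}
    selected = [name for name, w in weights.items() if w >= threshold]
    # Most important feature first; ties keep their original order.
    return sorted(selected, key=lambda name: -weights[name])


# Invented importance scores for the feature parameters named in the text.
raw_scores = {
    "traffic_fluctuation": 0.30,
    "port_error_count": 0.25,
    "cpu_utilization": 0.20,
    "memory_growth_rate": 0.15,
    "link_packet_loss": 0.10,
}
key_features = select_key_features(raw_scores)
```

The selected names would then determine which columns are fed to the LSTM stage.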

[0009] Preferably, the sandbox verification module includes a simulation environment setup unit and a solution verification unit; The simulation environment building unit adopts containerization and network function virtualization technology to reproduce the topology and operating environment of the physical network. It reduces the difference between the simulation and the physical environment through virtual-real calibration algorithm, supports large-scale network simulation with 1000+ nodes, and the data interaction latency between nodes is ≤20ms. The solution verification unit is used to simulate the execution of fault handling solutions, test the effectiveness and feasibility of the solutions, and output verification data including fault recovery time, business impact, and resource utilization. A distributed scheduling architecture is adopted to achieve parallel verification of multiple schemes. Through task sharding and dynamic resource allocation mechanism, the verification efficiency is improved by more than 50%, and it supports the simultaneous verification of up to 8 candidate schemes.
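The parallel verification of candidate solutions can be sketched on a single machine with a thread pool. This is only a stand-in for the distributed scheduling architecture described above: the function names are hypothetical, and `verify_one` represents whatever routine runs one candidate in the sandbox.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 8  # upper bound on simultaneously verified candidates, per the text


def verify_in_parallel(candidates, verify_one):
    """Run verify_one(candidate) for every candidate concurrently.

    candidates: list of hashable candidate identifiers (for example, names).
    verify_one: hypothetical callable that returns a report for one candidate.
    Returns a dict mapping candidate -> report, in input order.
    """
    if len(candidates) > MAX_PARALLEL:
        raise ValueError(f"at most {MAX_PARALLEL} candidates can be verified at once")
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL) as pool:
        # pool.map preserves the order of the input list.
        reports = list(pool.map(verify_one, candidates))
    return dict(zip(candidates, reports))
```

A caller would pass the candidate identifiers produced by the prediction stage and then rank the returned reports, for example by fault recovery time.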

[0010] Preferably, the intelligent predictive analysis module also includes a fault knowledge base unit; The fault knowledge base unit stores historical fault cases, fault characteristics, and handling solutions, with a cumulative number of cases ≥10,000, categorized and indexed by fault type and impact level; It supports online incremental learning based on new knowledge, and selects newly added fault data from the most recent 30 days through a sliding time window to fine-tune and update the hybrid prediction model. The model iteration cycle is ≤7 days, continuously improving prediction accuracy. The knowledge base supports natural language queries and can quickly match similar historical cases and optimal solutions based on fault characteristics, providing data support for the generation of candidate solutions.
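Matching a query against stored fault cases can be illustrated with TF-IDF weighting and cosine similarity, the technique the detailed embodiment names for this purpose. The sketch below is a minimal standard-library version: it assumes whitespace-separated English text (Chinese text would need a word segmenter first), uses a smoothed IDF formula chosen for the example, and all function names and sample cases are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (dict) per document, plus the IDF table."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF so that every known term has a positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * idf[t] for t, c in tf.items()})
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_matches(query, cases, k=5):
    """Return up to k (case, score) pairs with non-zero similarity, best first."""
    vectors, idf = tfidf_vectors(cases)
    q_tokens = query.lower().split()
    q_tf = Counter(t for t in q_tokens if t in idf)  # ignore unseen terms
    q_vec = {t: (c / len(q_tokens)) * idf[t] for t, c in q_tf.items()}
    scored = sorted(
        ((cosine(q_vec, v), i) for i, v in enumerate(vectors)),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [(cases[i], score) for score, i in scored[:k] if score > 0]
```

For example, querying "packet loss on switch" against a case list that contains "link packet loss on core switch port" ranks that case first.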

[0011] Preferably, the sandbox verification module has a three-level verification standard, including basic function verification, performance index verification and business continuity verification. Basic function verification focuses on the completeness of the fault handling plan, requiring no omissions in core operations, no errors in configuration distribution, and a pass rate of ≥95%. Performance metrics verification focuses on fault recovery efficiency and resource consumption, requiring fault recovery time ≤30 seconds, resource utilization ≤60%, and a compliance rate ≥90%. The business continuity verification focuses on detecting the impact of the solution on normal business operations, requiring business interruption time ≤ 5 seconds and data packet loss rate ≤ 0.1%; When all three verifications meet the threshold requirements, the fault handling plan is deemed executable; otherwise, optimization suggestions are automatically generated.
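The three-level standard amounts to a conjunction of threshold checks. A minimal sketch, using the thresholds stated above, is given below; the class name, field names, and issue messages are hypothetical, and all rates are expressed as fractions rather than percentages.

```python
from dataclasses import dataclass


@dataclass
class VerificationMetrics:
    functional_pass_rate: float         # basic function verification
    recovery_time_s: float              # performance: fault recovery time
    resource_utilization: float         # performance: peak utilization
    performance_compliance_rate: float  # performance: compliance rate
    interruption_s: float               # continuity: business interruption time
    packet_loss_rate: float             # continuity: data packet loss rate


def evaluate(m):
    """Return (executable, issues); executable only if every check passes."""
    issues = []
    if m.functional_pass_rate < 0.95:
        issues.append("basic function pass rate below 95%")
    if m.recovery_time_s > 30:
        issues.append("fault recovery time exceeds 30 seconds")
    if m.resource_utilization > 0.60:
        issues.append("resource utilization exceeds 60%")
    if m.performance_compliance_rate < 0.90:
        issues.append("performance compliance rate below 90%")
    if m.interruption_s > 5:
        issues.append("business interruption time exceeds 5 seconds")
    if m.packet_loss_rate > 0.001:
        issues.append("data packet loss rate exceeds 0.1%")
    return (not issues, issues)
```

The returned issue list is the natural starting point for generating optimization suggestions for a plan that is not yet executable.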

[0012] A method for proactive fault prediction and sandbox verification of digital twin networks includes:
Step S1: Construct a digital twin of the physical network through the network twin modeling module, use the improved MQTT / OPCUA dual-mode synchronization protocol to realize real-time mapping between the virtual and physical networks, and complete the initial calibration of the digital twin against the physical network at the same time.
Step S2: The multi-source data acquisition module collects multi-dimensional operational data of the physical network. After preprocessing such as noise reduction, standardization, and format conversion, the data is transmitted to the intelligent predictive analysis module through an encrypted transmission channel.
Step S3: The intelligent predictive analysis module analyzes the data using a hybrid model of "random forest feature screening + LSTM time series prediction" to generate fault prediction results and candidate handling solutions.
Step S4: The sandbox verification module uses containerization + NFV technology to build a simulation environment, performs parallel verification of the candidate handling solutions, and outputs a verification report including each solution's compliance status and optimization suggestions.
Step S5: Select the optimal handling solution based on the verification report, push it to the network management terminal, record the data for the entire process, and update the fault knowledge base at the same time.

[0013] Preferably, the fault prediction process in step S3 specifically includes:
Step S31: Extract feature parameters from multi-dimensional operational data, including traffic fluctuation coefficient, device CPU utilization, link packet loss rate, cumulative port error code value, and memory usage growth rate.
Step S32: Calculate the weights of each feature using the random forest algorithm, select key features with a weight ≥ 0.6, input them into the trained LSTM neural network, and calculate the probability of failure and confidence level.
Step S33: When the probability of failure is ≥60% and the confidence level is ≥85%, it is judged as a high-risk failure, and 3-5 candidate handling solutions are generated; when the probability of failure is between 30% and 60% and the confidence level is ≥80%, it is judged as a medium-risk failure, and 2-3 candidate handling solutions are generated; when the probability of failure is <30%, it is judged as a low-risk failure, and early warning prompts and monitoring suggestions are output.
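The grading rule in step S33 can be written as a small decision function. The sketch below follows the stated thresholds; the text does not cover every combination (for example, a probability ≥60% with confidence below 85%), so those cases are returned as "undetermined" here, and a probability of exactly 60% is treated as high risk. Both choices, like the function name, are assumptions of this example.

```python
def classify_risk(probability, confidence):
    """Map (failure probability, confidence) to a risk level.

    Both inputs are fractions in [0, 1]. Returns (level, candidate_range),
    where candidate_range is the (min, max) number of candidate handling
    solutions to generate, or (0, 0) when none are generated.
    """
    if probability >= 0.60 and confidence >= 0.85:
        return ("high", (3, 5))
    if 0.30 <= probability < 0.60 and confidence >= 0.80:
        return ("medium", (2, 3))
    if probability < 0.30:
        # Low risk: only early warning prompts and monitoring suggestions.
        return ("low", (0, 0))
    # Combinations the text does not specify.
    return ("undetermined", (0, 0))
```

For instance, `classify_risk(0.72, 0.90)` yields a high-risk result with 3-5 candidates, while `classify_risk(0.72, 0.50)` falls into the unspecified region.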

[0014] Preferably, the verification process in step S4 supports custom verification scenarios, and parameters such as network load rate, service type, and fault triggering timing can be set. The parameter adjustment range for the verification scenario includes network load rate of 20%-100%, more than 3 types of mainstream services, and fault triggering delay of 0-30 seconds. It supports saving and recalling scenario templates, and has preset three typical scenario templates: high-load voice service, high-definition video transmission, and mixed service concurrency, to adapt to the verification needs of different network operating conditions. During the verification process, the deviation between the simulation environment and the physical network is monitored in real time. When the deviation exceeds 5%, the calibration mechanism is automatically activated to ensure the authenticity of the verification results.
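The deviation check that triggers calibration can be sketched as follows. The text does not define how deviation is measured; this example assumes relative deviation per metric, |simulated - physical| / |physical|, and uses hypothetical metric names and values.

```python
def relative_deviation(simulated, physical):
    """Relative deviation of a simulated value from the physical measurement."""
    if physical == 0:
        return 0.0 if simulated == 0 else float("inf")
    return abs(simulated - physical) / abs(physical)


def metrics_needing_calibration(sim_metrics, phys_metrics, threshold=0.05):
    """Return names of metrics whose deviation exceeds the threshold (default 5%)."""
    return [
        name
        for name, physical in phys_metrics.items()
        if name in sim_metrics
        and relative_deviation(sim_metrics[name], physical) > threshold
    ]


# Invented readings: latency drifts by 7.5%, throughput by about 1%.
sim = {"latency_ms": 21.5, "throughput_mbps": 940.0}
phys = {"latency_ms": 20.0, "throughput_mbps": 950.0}
drifted = metrics_needing_calibration(sim, phys)
```

A non-empty result would be the signal to start the calibration mechanism for the listed metrics.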

[0015] Compared with related technologies, the digital twin network active fault prediction and sandbox verification system and method provided by the present invention have the following beneficial effects: 1. This invention provides a digital twin network proactive fault prediction and sandbox verification system and method. Through the high-precision real-time mapping of the network twin modeling module and the accurate processing of the multi-source data acquisition module, combined with the incremental learning of the "random forest feature screening + LSTM time series prediction" hybrid model and the fault knowledge base, it can achieve accurate fault prediction 30-180 minutes in advance. This solves the problems of low accuracy, delayed early warning, insufficient data support, and reliance on post-event repair in traditional network fault prediction.

[0016] 2. This invention provides a digital twin network proactive fault prediction and sandbox verification system and method. It builds a high-fidelity simulation environment through containerization + NFV technology, adopts a three-level verification standard and a distributed parallel verification architecture, and efficiently verifies candidate disposal solutions and outputs optimization suggestions while ensuring service continuity. It solves the problems of low verification efficiency, high risk, lack of standardized evaluation system, and easy interruption of physical network services in traditional disposal solutions.

[0017] 3. This invention provides a digital twin network proactive fault prediction and sandbox verification system and method. Through a closed-loop management process of "prediction-verification-optimization", combined with flexible scenario adaptability and continuous iterative model optimization mechanism, it realizes the transformation of network operation and maintenance from passive response to proactive prevention and control, while reducing operation and maintenance costs and improving network operation stability. It solves the problems of traditional network operation and maintenance lacking continuous optimization mechanism, poor scenario adaptability, and difficulty in improving operation and maintenance efficiency and network reliability.

Attached Figure Description

[0018] Figure 1 is a flowchart of the present invention;
Figure 2 is an extended flowchart of the network twin modeling module of the present invention;
Figure 3 is an extended flowchart of the multi-source data acquisition module of the present invention;
Figure 4 is an extended flowchart of the intelligent predictive analysis module of the present invention;
Figure 5 is an extended flowchart of the sandbox verification module of the present invention.

Detailed Implementation

[0019] The technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort are within the scope of protection of the present invention.

[0020] Example 1: Referring to Figures 1-5, the present invention provides a technical solution: a digital twin network proactive fault prediction and sandbox verification system, including a network twin modeling module, a multi-source data acquisition module, an intelligent predictive analysis module, and a sandbox verification module. The network twin modeling module is used to construct a full-element digital twin of the physical network, enabling real-time mapping of the physical network topology, device status, and operating parameters. The network twin modeling module includes a topology mapping unit and a dynamic update unit. The topology mapping unit uses 3D modeling technology to restore network devices, link connections, and the deployment environment, with a modeling accuracy error of ≤1%, and supports digital mapping of fine-grained parameters such as device ports and link bandwidth. The dynamic update unit adopts an improved MQTT / OPCUA dual-mode synchronization protocol, which adjusts the data transmission frequency through an adaptive heartbeat mechanism to achieve real-time linkage between the digital twin and the physical network, with a data synchronization delay of ≤50ms. The digital twin supports cross-level and cross-regional visualization of network status, and the operational details of a single device or the overall network topology can be viewed through the drill-down function. In this implementation scheme, the topology mapping unit uses laser scanning and interfaces with equipment manufacturers to obtain the precise dimensions, port layout, and link connections of physical network devices. Combined with a 3D rendering engine, it constructs a 1:1 digital twin visualization model with an error margin controlled within 1%, ensuring that fine-grained parameters such as device port speeds and link bandwidth limits match the physical devices. This meets the need for accurate access to detailed device parameters during fault prediction. 
The dynamic update unit employs an improved MQTT / OPCUA dual-mode synchronization protocol. For high-priority data, the OPCUA protocol ensures transmission reliability, while the MQTT protocol reduces transmission overhead for routine operating parameters. An adaptive heartbeat mechanism dynamically adjusts the transmission interval based on the frequency of changes in physical network data, ranging from a minimum of 100ms / time to a maximum of 10s / time, ensuring that data synchronization latency is stably controlled within 50ms. The visualization interface of the digital twin supports three-level display by backbone network, aggregation network and access network, and can also display cross-regional network topology by geographical region. Administrators can drill down to view real-time operating data such as device CPU usage, memory usage, and port traffic by clicking on device icons. The overall network status is presented intuitively through color marking, providing visual support for fault location.
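The adaptive heartbeat can be illustrated by a function that maps the observed rate of change of the physical network data to a transmission interval bounded by the 100 ms and 10 s limits given above. The mapping itself (one transmission per observed change, clamped to the bounds) is an assumption of this sketch, as is the function name; the text does not specify the adjustment rule.

```python
MIN_INTERVAL_S = 0.1   # 100 ms per transmission, the stated minimum
MAX_INTERVAL_S = 10.0  # 10 s per transmission, the stated maximum


def next_heartbeat_interval(changes_per_second):
    """Choose the next transmission interval from the observed change rate.

    Faster-changing data gets a shorter interval; static data falls back to
    the maximum interval. The result is always within the stated bounds.
    """
    if changes_per_second <= 0:
        return MAX_INTERVAL_S
    target = 1.0 / changes_per_second
    return min(MAX_INTERVAL_S, max(MIN_INTERVAL_S, target))
```

For example, a parameter that changes twice per minute would be sent at the 10 s ceiling, while one changing 100 times per second would be sent at the 100 ms floor.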

[0021] The multi-source data acquisition module is used to collect traffic data, device log data, and link status data of the physical network, and output standardized multi-dimensional network operation data; The multi-source data acquisition module includes a data acquisition unit and a data preprocessing unit; The data acquisition unit integrates a network probe, an SNMPv3 protocol collector, and a log collection tool, covering data acquisition from the core network, access network, and terminal devices, and supports compliant parsing of encrypted data. The data preprocessing unit uses the 3σ criterion combined with the isolated forest algorithm to denoise the collected data, unifies the data units through Min-Max standardization, performs format conversion simultaneously, and removes outlier data at a rate of ≤3%. The sampling frequency is adjustable, ranging from 1 time / second to 1 time / minute. It supports dynamic adaptation of the sampling granularity according to network load, automatically increasing the sampling frequency in high-load scenarios and reducing the frequency in low-load scenarios to reduce resource consumption. In this implementation scheme, the data acquisition unit deploys distributed network probes covering key devices such as core switches, routers, and access points. It collects device operating parameters via the SNMPv3 protocol, which supports encryption authentication mechanisms to ensure data transmission security. The log collection tool adopts a Flume+Logstash architecture to collect device system logs, application logs, and security logs in real time. It supports compliant parsing of HTTPS encrypted logs and ensures the integrity of encrypted data extraction by connecting to the decryption interface provided by the device manufacturer. The data preprocessing unit first uses the 3σ criterion to identify and remove outliers that deviate from the mean by three times the standard deviation. 
Then, it further filters hidden outliers using the Isolation Forest algorithm. This dual denoising mechanism strictly controls the outlier removal rate to within 3%. Min-Max standardization maps data of different dimensions to the [0,1] interval, and the format conversion is uniformly set to JSON format for easy model invocation by the intelligent predictive analysis module. The dynamic adaptation logic of the collection frequency is triggered based on the network load rate threshold. When the CPU utilization of the core device is ≥70% or the link bandwidth utilization is ≥80%, it is determined to be a high load scenario, and the collection frequency is automatically increased to 1 time / second. When the load rate is below 30%, it is reduced to 1 time / minute, which ensures the timeliness of data while minimizing the occupation of network resources during the collection process.
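The threshold logic for the collection frequency can be sketched as a small function. The text does not define "load rate" for the low-load case or say what happens between the two thresholds; this example assumes that low load means both CPU and bandwidth utilization are below 30%, and that the current interval is kept in the intermediate range. The function name is hypothetical.

```python
HIGH_LOAD_INTERVAL_S = 1.0   # 1 sample per second
LOW_LOAD_INTERVAL_S = 60.0   # 1 sample per minute


def sampling_interval_s(cpu_util, bandwidth_util, current_interval_s):
    """Return the sampling interval in seconds for the given load.

    cpu_util and bandwidth_util are fractions in [0, 1].
    """
    if cpu_util >= 0.70 or bandwidth_util >= 0.80:
        return HIGH_LOAD_INTERVAL_S
    if max(cpu_util, bandwidth_util) < 0.30:
        return LOW_LOAD_INTERVAL_S
    # Intermediate load: keep whatever interval is currently in use
    # (an assumption; the source does not cover this range).
    return current_interval_s
```

Keeping the current interval in the intermediate range also avoids rapid switching when the load hovers near one of the thresholds.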

[0022] The intelligent predictive analysis module analyzes multi-dimensional network operation data based on machine learning algorithms to predict network faults in advance and output fault prediction results and candidate handling solutions. The intelligent predictive analysis module includes a model training unit and a fault prediction unit; The model training unit adopts a hybrid fusion architecture of "random forest feature selection + LSTM time series prediction". First, the feature importance score is calculated by random forest to select key features with a weight ≥ 0.6. Then, the key features are input into the LSTM neural network for time series modeling. The model is trained based on more than 10,000 historical fault data, and the fault prediction accuracy is ≥ 92%. The fault prediction unit is used to analyze real-time network operation data, identify fault precursor characteristics, continuously monitor data trends through a sliding window algorithm, and output the predicted results of fault type, location and impact range 30-180 minutes in advance, supporting automatic classification of fault risk levels. The intelligent predictive analysis module also includes a fault knowledge base unit; The fault knowledge base unit stores historical fault cases, fault characteristics, and handling solutions, with a cumulative number of cases ≥10,000, categorized and indexed by fault type and impact level; It supports online incremental learning based on new knowledge, and selects newly added fault data from the most recent 30 days through a sliding time window to fine-tune and update the hybrid prediction model. The model iteration cycle is ≤7 days, continuously improving prediction accuracy. The knowledge base supports natural language queries and can quickly match similar historical cases and optimal handling solutions based on fault characteristics, providing data support for the generation of candidate solutions. 
In this implementation scheme, the hybrid fusion architecture of the model training unit uses the Scikit-learn library in Python to implement random forest feature selection, setting the number of decision trees to 100 and the maximum depth to 15 layers. The importance score of each feature is calculated using the Gini coefficient, and key features with a weight ≥0.6, such as traffic fluctuation coefficient and cumulative port error code value, are selected to reduce the interference of redundant features on model training. The LSTM neural network is built using the TensorFlow framework, with 3 hidden layers, 128 neurons in each layer, and a dropout rate of 0.2. It is trained based on more than 10,000 historical data covering 12 types of faults, including hardware failure, link interruption, and configuration error. An early stopping mechanism is used to avoid overfitting, and the fault prediction accuracy is finally stabilized at over 92%. The sliding window algorithm of the fault prediction unit is set with a window size of 5 minutes and a step size of 1 minute. It continuously monitors the changing trend of key features. When the feature parameters are detected to exceed the normal threshold range and show a continuous deterioration trend, the fault prediction process is started. Based on the threshold model trained on historical data, the prediction results are output 30-180 minutes in advance. The lead time for hardware fault prediction is 60-180 minutes, and the lead time for link congestion fault prediction is 30-60 minutes. The risk level is divided into three levels: high, medium and low, which correspond to different response priorities. The fault knowledge base unit uses a MySQL+Elasticsearch architecture to store data. MySQL is used for structured storage of basic information about fault cases, while Elasticsearch is used for full-text retrieval of fault characteristics and handling solutions, supporting indexing by fault type and impact level. 
Online incremental learning selects newly added fault data from the last 30 days through a sliding time window and updates the output layer parameters of the LSTM model using a fine-tuning method. The model iteration cycle is controlled within 7 days, and each iteration can improve the prediction accuracy by 0.5%-1%. The natural language query function supports searching by Chinese keywords, matching similar historical cases based on the TF-IDF algorithm, and returning the Top 5 optimal handling solutions and execution effect data, providing direct reference for candidate solution generation.
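The sliding-window trend monitoring can be sketched as follows, using the 5-minute window and 1-minute step given above. The example assumes one sample per minute, so a window holds 5 samples, and it interprets "continuous deterioration" as a strictly rising window whose latest value exceeds the threshold. That interpretation, the function names, and the sample values are assumptions.

```python
def sliding_windows(samples, size=5, step=1):
    """Yield (start_index, window) pairs over the sample series."""
    for start in range(0, len(samples) - size + 1, step):
        yield start, samples[start:start + size]


def is_deteriorating(window, threshold):
    """True if the window rises strictly and its last value exceeds the threshold."""
    rising = all(later > earlier for earlier, later in zip(window, window[1:]))
    return rising and window[-1] > threshold


def first_alarm_index(samples, threshold, size=5, step=1):
    """Index of the sample that completes the first deteriorating window, or None."""
    for start, window in sliding_windows(samples, size, step):
        if is_deteriorating(window, threshold):
            return start + size - 1
    return None


# Invented per-minute packet loss readings (percent) that start to climb.
loss = [0.1, 0.1, 0.2, 0.1, 0.2, 0.3, 0.5, 0.7, 0.9]
alarm_at = first_alarm_index(loss, threshold=0.6)
```

In this series the first window that both rises throughout and ends above the threshold is the one covering minutes 3 to 7, so the prediction process would start at minute 7.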

[0023] The sandbox verification module is used to build a virtual simulation environment to simulate and verify the fault prediction results and corresponding fault handling solutions, and output a verification report that includes the effectiveness of the solution and the degree of business impact, forming a closed-loop management of "prediction-verification-optimization". The sandbox verification module includes a simulation environment setup unit and a solution verification unit; The simulation environment building unit adopts containerization and network function virtualization technology to reproduce the topology and operating environment of the physical network. It reduces the difference between the simulation and the physical environment through virtual-real calibration algorithm, supports large-scale network simulation with 1000+ nodes, and the data interaction latency between nodes is ≤20ms. The solution verification unit is used to simulate the execution of fault handling solutions, test the effectiveness and feasibility of the solutions, and output verification data including fault recovery time, business impact, and resource utilization. A distributed scheduling architecture is adopted to achieve parallel verification of multiple schemes. Through task sharding and dynamic resource allocation mechanism, the verification efficiency is improved by more than 50%, and it supports the simultaneous verification of up to 8 candidate schemes. The sandbox verification module has a three-level verification standard, including basic function verification, performance indicator verification, and business continuity verification. Basic function verification focuses on the completeness of the fault handling plan, requiring no omissions in core operations, no errors in configuration distribution, and a pass rate of ≥95%. 
Performance metrics verification focuses on fault recovery efficiency and resource consumption, requiring fault recovery time ≤30 seconds, resource utilization ≤60%, and a compliance rate ≥90%. The business continuity verification focuses on detecting the impact of the solution on normal business operations, requiring business interruption time ≤ 5 seconds and data packet loss rate ≤ 0.1%; When all three verifications meet the threshold requirements, the fault handling plan is deemed executable; for plans that do not meet the requirements, optimization suggestions are automatically generated. In this implementation plan, the simulation environment building unit uses Docker containerization technology to deploy network function modules and combines NFV technology to virtualize network resources. By reading the topology data and operating parameters of the network twin, it reproduces the device connection relationships, bandwidth limitations, and service deployment of the physical network in a 1:1 manner. The virtual-real calibration algorithm compares the key indicators of the simulation environment and the physical network every 10 minutes. When the deviation exceeds 3%, the simulation parameters are automatically adjusted to ensure that the difference is controlled within an acceptable range. It supports large-scale network simulation with 1000+ nodes, and the data interaction latency between nodes is controlled within 20ms by optimizing the forwarding mechanism of the network virtualization layer, meeting the fault verification requirements of large enterprise networks and campus networks. The solution verification unit simulates the execution of candidate solutions in four steps: "fault injection - solution execution - indicator monitoring - result statistics". Fault injection simulates target fault scenarios through scripts. 
Indicator monitoring adopts the Prometheus + Grafana architecture to collect data such as fault recovery time, resource utilization, and business interruption duration in real time. The distributed scheduling architecture is implemented on Kubernetes: it shards the verification tasks of multiple candidate solutions across computing nodes and dynamically allocates CPU and memory resources according to node resource utilization, improving verification efficiency by more than 50% compared with traditional serial verification. It supports the simultaneous verification of up to 8 candidate solutions, which shortens the solution selection cycle. The three-level verification standard applies AND logic: a plan is deemed executable only when all three levels pass. Basic function verification uses automated scripts to check the accuracy of configuration command issuance and the completeness of the operation steps in the handling plan, and requires a pass rate of ≥95%. Performance indicator verification requires that the fault recovery time, measured from the start of plan execution to the restoration of network service, not exceed 30 seconds, and that the CPU and memory utilization of core devices not exceed 60% during plan execution. Both indicators must be met to pass. Business continuity verification monitors the business interruption time and data packet loss rate during plan execution by simulating voice calls, video transmissions, and file downloads. For plans that fail to meet the standards, optimization suggestions are output based on the specific non-compliant items, such as "fault recovery time is too long, it is recommended to optimize the execution order of configuration commands" and "resource utilization exceeds the standard, it is recommended to reduce redundant operation steps."
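As a non-limiting illustration, the three-level verification with AND logic could be sketched as follows. The thresholds are those stated above; the class name, field names, and the reading of the "compliance rate" as a fraction of performance checks that pass are assumptions made for this sketch. Only the two suggestion texts quoted in this embodiment are reproduced; other failed items are reported by name only.

```python
from dataclasses import dataclass


@dataclass
class VerificationMetrics:
    basic_pass_rate: float         # fraction of basic function checks passed
    recovery_time_s: float         # plan start to service restoration, seconds
    peak_resource_util: float      # peak CPU/memory utilization as a fraction
    performance_compliance: float  # assumed: fraction of performance checks met
    interruption_s: float          # business interruption time, seconds
    packet_loss_rate: float        # data packet loss rate as a fraction


# Suggestion texts quoted in the embodiment; other items have none specified.
SUGGESTIONS = {
    "fault_recovery_time": "optimize the execution order of configuration commands",
    "resource_utilization": "reduce redundant operation steps",
}


def evaluate(m):
    """Return (executable, failed_items, suggestions) for one candidate plan."""
    checks = {
        "basic_function_pass_rate": m.basic_pass_rate >= 0.95,
        "fault_recovery_time": m.recovery_time_s <= 30,
        "resource_utilization": m.peak_resource_util <= 0.60,
        "performance_compliance_rate": m.performance_compliance >= 0.90,
        "business_interruption_time": m.interruption_s <= 5,
        "packet_loss_rate": m.packet_loss_rate <= 0.001,
    }
    failed = [name for name, ok in checks.items() if not ok]
    suggestions = [SUGGESTIONS[name] for name in failed if name in SUGGESTIONS]
    # AND logic: the plan is executable only if every check passes.
    return (not failed, failed, suggestions)
```

Returning the failed items alongside the verdict keeps the optimization suggestions tied to the specific non-compliant indicators.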

[0024] A method for proactive fault prediction and sandbox validation of digital twin networks includes: Step S1: Construct a digital twin of the physical network through the network twin modeling module, and use the improved MQTT / OPCUA dual-mode synchronization protocol to realize the real-time mapping of the virtual and physical networks, and synchronously complete the initial calibration of the digital twin and the physical network. Step S2: The multi-source data acquisition module collects multi-dimensional operational data of the physical network. After preprocessing such as noise reduction, standardization and format conversion, the data is transmitted to the intelligent predictive analysis module through an encrypted transmission channel. Step S3: The intelligent prediction and analysis module analyzes the data using a hybrid model of "random forest feature screening + LSTM time series prediction" to generate fault prediction results and candidate disposal solutions. The fault prediction process in step S3 specifically includes: Step S31: Extract feature parameters from multi-dimensional operational data, including traffic fluctuation coefficient, device CPU utilization, link packet loss rate, cumulative port error code value, and memory usage growth rate. Step S32: Calculate the weights of each feature using the random forest algorithm, select key features with a weight ≥ 0.6, input them into the trained LSTM neural network, and calculate the probability of failure and confidence level. 
Step S33: When the probability of failure is ≥60% and the confidence level is ≥85%, it is judged as a high-risk failure, and 3-5 candidate handling solutions are generated; when the probability of failure is between 30% and 60% and the confidence level is ≥80%, it is judged as a medium-risk failure, and 2-3 candidate handling solutions are generated; when the probability of failure is <30%, it is judged as a low-risk failure, and early warning prompts and monitoring suggestions are output. In this implementation scheme, the feature parameters extracted in step S31 are all core precursor indicators of network faults. The traffic fluctuation coefficient is obtained by calculating the ratio of the standard deviation to the mean of traffic over a 5-minute period. The device CPU utilization and memory usage growth rate are calculated using a moving average algorithm to determine their trends over a 10-minute period. The link packet loss rate is calculated by statistically analyzing the ratio of lost data packets to total data packets per unit time. The cumulative port error code value is summarized hourly, representing the number of error codes at the physical layer and data link layer, ensuring that the feature parameters comprehensively reflect the network's operational status. In the random forest feature selection process of step S32, the selected key features are normalized and then input into the trained LSTM neural network. The model calculates the temporal variation of the feature parameters and outputs the probability of fault occurrence and confidence level. The confidence level reflects the reliability of the model's prediction results and is calculated backwards based on historical prediction accuracy. Step S33's risk level determination logic is based on the scope of the fault's impact. 
High-risk faults refer to situations that may cause core business interruptions, generating 3-5 candidate handling solutions covering different strategies such as rapid recovery, resource redundancy, and business switching. Medium-risk faults refer to situations that affect local businesses or some users, generating 2-3 more targeted solutions. Low-risk faults refer to situations that only have minor anomalies and do not affect business operations for the time being, outputting early warning prompts and monitoring suggestions, without needing to generate handling solutions, thus reducing unnecessary operational costs.
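As a non-limiting illustration, the traffic fluctuation coefficient of step S31 and the risk grading of step S33 could be sketched as follows. The function names are hypothetical, and the use of the population standard deviation is an assumption. Combinations that the thresholds above do not cover (for example, a probability ≥60% with a confidence level below 85%) are returned as "unclassified" rather than guessed.

```python
import statistics


def traffic_fluctuation_coefficient(samples):
    """Ratio of standard deviation to mean of the traffic samples in one window."""
    mean = statistics.fmean(samples)
    if mean == 0:
        return 0.0
    # Assumption: population standard deviation; the text does not say which.
    return statistics.pstdev(samples) / mean


def packet_loss_rate(lost_packets, total_packets):
    """Ratio of lost data packets to total data packets per unit time."""
    return lost_packets / total_packets if total_packets else 0.0


def classify_risk(probability, confidence):
    """Map (failure probability, confidence) to a risk level and a
    (min, max) count of candidate handling solutions to generate."""
    if probability >= 0.60 and confidence >= 0.85:
        return "high", (3, 5)
    if 0.30 <= probability < 0.60 and confidence >= 0.80:
        return "medium", (2, 3)
    if probability < 0.30:
        # Low risk: only early warning prompts and monitoring suggestions.
        return "low", (0, 0)
    # Not covered by the stated thresholds; handling is left open here.
    return "unclassified", (0, 0)
```

For example, `classify_risk(0.45, 0.82)` falls in the medium-risk band, so 2-3 candidate handling solutions would be generated.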

[0025] Step S4: The sandbox verification module uses containerization + NFV technology to build a simulation environment, performs parallel verification of candidate treatment solutions, and outputs a verification report including the solution's compliance status and optimization suggestions. The verification process in step S4 supports custom verification scenarios, and parameters such as network load rate, service type, and fault triggering timing can be set. The parameter adjustment range for the verification scenario includes network load rate of 20%-100%, more than 3 types of mainstream services, and fault triggering delay of 0-30 seconds. It supports saving and recalling scenario templates, and has preset three typical scenario templates: high-load voice service, high-definition video transmission, and mixed service concurrency, to adapt to the verification needs of different network operating conditions. During the verification process, the deviation between the simulation environment and the physical network is monitored in real time. When the deviation exceeds 5%, the calibration mechanism is automatically activated to ensure the authenticity of the verification results. In this implementation scheme, the customized verification scenario in step S4 allows users to configure parameters through a visual interface. The network load rate can be adjusted by simulating the number of virtual users. The service types cover more than three mainstream types, including voice, high-definition video, file transfer, and web services. The fault trigger delay can be set from 0 to 30 seconds to adapt to the verification requirements of different fault occurrence scenarios. 
Three preset typical scenario templates are optimized for high-frequency service scenarios: the high-load voice service template simulates 1000 concurrent VoIP calls, focusing on monitoring call latency and jitter; the high-definition video transmission template simulates 500 concurrent 4K video streams, focusing on monitoring bandwidth usage and stuttering; and the mixed service concurrency template mixes voice, video, and file transfer services in a 3:4:3 ratio to simulate the service distribution of a real network. A virtual-to-real deviation calibration mechanism is implemented during the verification process. By periodically synchronizing real-time data from the physical network to the simulation environment, when the deviation between the simulation environment's link latency, packet loss rate, and other indicators and the physical network exceeds 5%, the network topology parameters of the simulation environment are automatically adjusted to ensure that the verification results accurately reflect the implementation effect of the solution in the physical network, avoiding misjudgments of the solution due to differences between the simulation and actual environments.
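As a non-limiting illustration, the 5% virtual-to-real deviation check and the 3:4:3 mixed service template could be sketched as follows. The indicator names, the use of relative deviation against the physical network value, and the rule for distributing leftover sessions are assumptions made for this sketch; the embodiment does not define how the deviation is computed.

```python
def relative_deviation(sim_value, phys_value):
    """Relative deviation of a simulated indicator from its physical value."""
    if phys_value == 0:
        return 0.0 if sim_value == 0 else float("inf")
    return abs(sim_value - phys_value) / abs(phys_value)


def indicators_needing_calibration(sim, phys, threshold=0.05):
    """Names of indicators whose deviation exceeds the threshold (5% here)."""
    return sorted(
        name
        for name in phys
        if name in sim and relative_deviation(sim[name], phys[name]) > threshold
    )


def mixed_service_sessions(total, ratio=(3, 4, 3)):
    """Split a session count into voice, video, and file transfer shares."""
    parts = sum(ratio)
    counts = [total * r // parts for r in ratio]
    # Assumption: leftover sessions from integer division go to the first services.
    for i in range(total - sum(counts)):
        counts[i % len(counts)] += 1
    return dict(zip(("voice", "video", "file_transfer"), counts))
```

A non-empty result from `indicators_needing_calibration` would correspond to the condition under which the calibration mechanism is activated.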

[0026] Step S5: Select the optimal handling solution based on the verification report, push it to the network management terminal, record the entire process data, and simultaneously update it to the fault knowledge base. In this implementation plan, the optimal solution is selected by a weighted scoring method based on three core indicators in the verification report, weighted as follows: solution effectiveness (50%), business impact (30%), and resource utilization (20%). Each indicator is scored according to its compliance status, and the solution with the highest total score is selected as the optimal solution. The selected optimal solution is pushed to the network management terminal via a RESTful API, and an operation guide document is generated, including the solution execution steps, precautions, and emergency rollback strategies, facilitating rapid execution by operations and maintenance personnel. The entire process is recorded, encompassing data collection, fault prediction, solution verification, and solution execution, including collection timestamps, feature parameter values, prediction results, verification indicators, and execution effects. This data is synchronously updated to the fault knowledge base as training data for subsequent incremental model learning and as historical case references, continuously improving the closed-loop management system of "prediction-verification-optimization".
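As a non-limiting illustration, the weighted scoring of step S5 could be sketched as follows. The weights are those stated above; the 0-100 score scale, the convention that a higher score is better for every indicator, and the dictionary layout are assumptions made for this sketch.

```python
# Weights stated in the embodiment: effectiveness 50%, impact 30%, resources 20%.
WEIGHTS = {
    "effectiveness": 0.5,
    "business_impact": 0.3,
    "resource_utilization": 0.2,
}


def total_score(scores):
    """Weighted sum of per-indicator scores (assumed 0-100, higher is better)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def select_optimal(candidates):
    """Return the candidate with the highest total score.

    On a tie, max() keeps the first candidate in the list; the embodiment
    does not define a tie-break rule.
    """
    return max(candidates, key=lambda c: total_score(c["scores"]))
```

For example, a plan scored 80/90/80 totals 83 and would be preferred over a plan scored 90/60/70, which totals 77, even though the latter is more effective.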

[0027] It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, the phrase "comprising an element defined as..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes the element.

[0028] Although embodiments of the invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.

Claims

1. A digital twin network active fault prediction and sandbox verification system, characterized in that: It includes a network twin modeling module, a multi-source data acquisition module, an intelligent predictive analysis module, and a sandbox verification module; The network twin modeling module is used to construct a full-element digital twin of the physical network, enabling real-time mapping of the physical network topology, device status, and operating parameters; The multi-source data acquisition module is used to collect traffic data, device log data, and link status data of the physical network, and output standardized multi-dimensional network operation data; The intelligent predictive analysis module analyzes multi-dimensional network operation data based on machine learning algorithms to predict network faults in advance and output fault prediction results and candidate handling solutions. The sandbox verification module is used to build a virtual simulation environment to simulate and verify the fault prediction results and corresponding fault handling solutions, and output a verification report that includes the effectiveness of the solution and the degree of business impact, forming a closed-loop management of "prediction-verification-optimization".

2. The digital twin network active fault prediction and sandbox verification system according to claim 1, characterized in that: The network twin modeling module includes a topology mapping unit and a dynamic update unit; The topology mapping unit uses 3D modeling technology to restore network devices, link connections and deployment environment, with a modeling accuracy error of ≤1%, and supports digital mapping of fine-grained parameters such as device ports and link bandwidth; The dynamic update unit adopts an improved MQTT / OPCUA dual-mode synchronization protocol, which adjusts the data transmission frequency through an adaptive heartbeat mechanism to achieve real-time linkage between the digital twin and the physical network, with a data synchronization delay of ≤50ms. Digital twins support cross-level and cross-regional visualization of network status, and can be used to view the operational details of a single device or the overall network topology through the drill-down function.

3. The digital twin network active fault prediction and sandbox verification system according to claim 1, characterized in that: The multi-source data acquisition module includes a data acquisition unit and a data preprocessing unit; The data acquisition unit integrates a network probe, an SNMPv3 protocol collector, and a log collection tool, covering data acquisition from the core network, access network, and terminal devices, and supports compliant parsing of encrypted data. The data preprocessing unit denoises the collected data using the 3σ criterion combined with the isolation forest algorithm, unifies the data units through Min-Max normalization, performs format conversion at the same time, and has an outlier data removal rate of ≤3%. The sampling frequency is adjustable from once per second to once per minute. It supports dynamic adaptation of the sampling granularity based on network load, automatically increasing the sampling frequency in high-load scenarios and reducing it in low-load scenarios to reduce resource consumption.

4. The active fault prediction and sandbox verification system for digital twin networks according to claim 1, characterized in that: The intelligent predictive analysis module includes a model training unit and a fault prediction unit; The model training unit adopts a hybrid fusion architecture of "random forest feature selection + LSTM time series prediction". First, the feature importance score is calculated by random forest to select key features with a weight ≥ 0.6. Then, the key features are input into the LSTM neural network for time series modeling. The model is trained based on more than 10,000 historical fault data, and the fault prediction accuracy is ≥ 92%. The fault prediction unit is used to analyze real-time network operation data, identify fault precursor characteristics, continuously monitor data trends through a sliding window algorithm, and output the predicted results of fault type, location and impact range 30-180 minutes in advance, supporting automatic classification of fault risk levels.

5. The digital twin network active fault prediction and sandbox verification system according to claim 1, characterized in that: The sandbox verification module includes a simulation environment setup unit and a solution verification unit; The simulation environment building unit adopts containerization and network function virtualization technology to reproduce the topology and operating environment of the physical network. It reduces the difference between the simulation and the physical environment through virtual-real calibration algorithm, supports large-scale network simulation with 1000+ nodes, and the data interaction latency between nodes is ≤20ms. The solution verification unit is used to simulate the execution of fault handling solutions, test the effectiveness and feasibility of the solutions, and output verification data including fault recovery time, business impact, and resource utilization. A distributed scheduling architecture is adopted to achieve parallel verification of multiple schemes. Through task sharding and dynamic resource allocation mechanism, the verification efficiency is improved by more than 50%, and it supports the simultaneous verification of up to 8 candidate schemes.

6. The digital twin network active fault prediction and sandbox verification system according to claim 1, characterized in that: The intelligent predictive analysis module also includes a fault knowledge base unit; The fault knowledge base unit stores historical fault cases, fault characteristics, and handling solutions, with a cumulative number of cases ≥10,000, categorized and indexed by fault type and impact level; It supports online incremental learning based on new knowledge, and selects newly added fault data from the most recent 30 days through a sliding time window to fine-tune and update the hybrid prediction model. The model iteration cycle is ≤7 days, continuously improving prediction accuracy. The knowledge base supports natural language queries and can quickly match similar historical cases and optimal solutions based on fault characteristics, providing data support for the generation of candidate solutions.

7. The digital twin network active fault prediction and sandbox verification system according to claim 1, characterized in that: The sandbox verification module has a three-level verification standard, including basic function verification, performance indicator verification, and business continuity verification. Basic function verification focuses on the completeness of the fault handling plan, requiring no omissions in core operations, no errors in configuration distribution, and a pass rate of ≥95%. Performance metrics verification focuses on fault recovery efficiency and resource consumption, requiring fault recovery time ≤30 seconds, resource utilization ≤60%, and a compliance rate ≥90%. The business continuity verification focuses on detecting the impact of the solution on normal business operations, requiring business interruption time ≤ 5 seconds and data packet loss rate ≤ 0.1%; When all three verifications meet the threshold requirements, the fault handling plan is deemed executable. The system automatically generates optimization suggestions for solutions that do not meet the standards.

8. A method for proactive fault prediction and sandbox verification of digital twin networks, applied to a digital twin network proactive fault prediction and sandbox verification system as described in any one of claims 1-7, characterized in that it includes: Step S1: Construct a digital twin of the physical network through the network twin modeling module, and use the improved MQTT / OPCUA dual-mode synchronization protocol to realize the real-time mapping of the virtual and physical networks, and synchronously complete the initial calibration of the digital twin and the physical network. Step S2: The multi-source data acquisition module collects multi-dimensional operational data of the physical network. After preprocessing such as noise reduction, standardization and format conversion, the data is transmitted to the intelligent predictive analysis module through an encrypted transmission channel. Step S3: The intelligent predictive analysis module analyzes the data using a hybrid model of "random forest feature screening + LSTM time series prediction" to generate fault prediction results and candidate handling solutions. Step S4: The sandbox verification module uses containerization + NFV technology to build a simulation environment, performs parallel verification of the candidate handling solutions, and outputs a verification report including the solution's compliance status and optimization suggestions. Step S5: Select the optimal handling solution based on the verification report, push it to the network management terminal, record the entire process data, and simultaneously update it to the fault knowledge base.

9. The method for proactive fault prediction and sandbox verification of a digital twin network according to claim 8, characterized in that: The fault prediction process in step S3 specifically includes: Step S31: Extract feature parameters from multi-dimensional operational data, including traffic fluctuation coefficient, device CPU utilization, link packet loss rate, cumulative port error code value, and memory usage growth rate. Step S32: Calculate the weights of each feature using the random forest algorithm, select key features with a weight ≥ 0.6, input them into the trained LSTM neural network, and calculate the probability of failure and confidence level. Step S33: When the probability of failure is ≥60% and the confidence level is ≥85%, it is judged as a high-risk failure, and 3-5 candidate handling solutions are generated; when the probability of failure is between 30% and 60% and the confidence level is ≥80%, it is judged as a medium-risk failure, and 2-3 candidate handling solutions are generated; when the probability of failure is <30%, it is judged as a low-risk failure, and early warning prompts and monitoring suggestions are output.

10. The method for proactive fault prediction and sandbox verification of a digital twin network according to claim 8, characterized in that: The verification process in step S4 supports custom verification scenarios, and parameters such as network load rate, service type, and fault triggering timing can be set. The parameter adjustment range for the verification scenario includes network load rate of 20%-100%, more than 3 types of mainstream services, and fault triggering delay of 0-30 seconds. It supports saving and recalling scenario templates, and has preset three typical scenario templates: high-load voice service, high-definition video transmission, and mixed service concurrency, to adapt to the verification needs of different network operating conditions. During the verification process, the deviation between the simulation environment and the physical network is monitored in real time. When the deviation exceeds 5%, the calibration mechanism is automatically activated to ensure the authenticity of the verification results.