An AI-based method and system for adaptive archiving and backup of massive data

By using AI multimodal data evaluation and deep reinforcement learning models, an adaptive backup strategy is constructed, which solves the problems of insufficient intelligence and inefficient collaboration in existing technologies, and realizes intelligent management and control of the entire process of massive data backup and improves disaster recovery efficiency.

CN122240399A · Pending · Publication Date: 2026-06-19 · GUILIN UNIVERSITY OF TECHNOLOGY +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
GUILIN UNIVERSITY OF TECHNOLOGY
Filing Date
2026-03-18
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing massive data backup technologies suffer from shallow intelligence, inefficient edge-cloud collaboration, static backup strategies, and insufficient disaster recovery capabilities. They are unable to adapt to dynamic changes in data types and business needs, resulting in limited resource utilization and backup efficiency.

Method used

An AI multimodal data evaluation model and a deep reinforcement learning model are used to construct a data feature matrix and generate an adaptive backup strategy, enabling intelligent data transfer and storage across a three-tier architecture of edge, cloud, and endpoint. Combined with fault prediction and self-healing mechanisms, this improves the dynamic optimization capability of the backup strategy.

Benefits of technology

It enables intelligent management and control of the entire process of massive data backup, improves resource utilization and disaster recovery speed, ensures data consistency and integrity, and adapts to backup needs in complex scenarios.

✦ Generated by Eureka AI based on patent content.

[Figure: CN122240399A_ABST]

Abstract

This invention relates to the field of intelligent storage, specifically disclosing an AI-based method and system for adaptive archiving and backup of massive data. The method includes: collecting massive data and metadata from a three-tier architecture (edge, cloud, and endpoint) to construct a feature matrix; outputting backup priorities through an AI multimodal evaluation model; generating adaptive backup strategies through a deep reinforcement learning model; executing archiving and backup and intelligent data flow under the three-tier architecture; and simultaneously achieving node failure prediction, automatic switching, and data consistency verification. This invention solves the technical problems of low intelligence in existing massive data backup systems, insufficient edge-cloud collaboration, static backup strategies, and low disaster recovery efficiency, achieving AI-driven adaptive archiving and backup of massive data and intelligent management throughout the entire process under a three-tier architecture (edge, cloud, and endpoint).

Description

Technical Field

[0001] This invention relates to the field of intelligent storage technology, specifically to an AI-based method and system for adaptive archiving and backup of massive data.

Background Technology

[0002] In the field of intelligent storage, massive data archiving and backup is a core component for ensuring data security and long-term management. The end-edge-cloud three-tier architecture is widely adopted because it balances storage performance against cost, and AI technology is gradually being integrated to improve backup automation. Although existing massive data backup methods and systems have incorporated AI and the end-edge-cloud architecture, four significant shortcomings remain. First, the intelligence is shallow and depends on manual intervention. Current AI applications largely focus on basic data feature recognition and lack multimodal data fusion and evaluation capabilities. They require manual configuration of backup parameters, cannot automatically determine the necessity and priority of data backups, and struggle to adapt to dynamic changes in data types and business needs. Second, there are gaps in end-edge-cloud collaboration. Data and AI capabilities flow poorly between the three tiers. Edge and terminal AI models are rigid and adapt poorly to available resources, and cloud AI capabilities cannot be effectively pushed down to them, which creates data silos and intelligence gaps and limits overall backup efficiency. Third, backup strategies are static and lack optimization. No multi-objective dynamic optimization mechanism has been built, so backup frequency, number of replicas, and storage tier cannot be adjusted according to real-time conditions such as network bandwidth, storage resources, and data popularity. As a result, resource utilization is low and backup decisions are poorly matched to actual conditions. Fourth, disaster recovery capabilities are weak. A full-process fault prediction and self-healing mechanism is lacking and data consistency verification methods are limited, which makes it difficult to balance recovery speed against data integrity and to meet disaster recovery needs in complex scenarios.

[0003] In summary, existing technologies suffer from insufficient intelligence, inefficient edge-cloud collaboration, rigid backup strategies, and inadequate disaster recovery capabilities. There is an urgent need for an AI-driven edge-cloud collaborative adaptive archiving and backup solution to achieve intelligent management and control of the entire process of massive data backup.

Summary of the Invention

[0004] To address the aforementioned problems in existing technologies, this invention provides an AI-based method and system for adaptive archiving and backup of massive data, solving issues such as insufficient intelligence and inefficient collaboration in existing massive data backup systems, and improving intelligent backup management and disaster recovery efficiency.

[0005] To achieve the above objectives, this invention proposes an AI-based adaptive archiving and backup method for massive amounts of data, comprising:

S1. Collect massive amounts of data and corresponding metadata from each layer of the end-edge-cloud three-tier architecture, and construct a data feature matrix. The expression of the data feature matrix is: [expression missing], where S is the file size, T is the data type, C is the creation time, F is the modification frequency, H is the access popularity, B is the business tag, L is the compliance level, and O is the data source.

S2. Input the data feature matrix into the AI multimodal data evaluation model and output the data backup priority as a quantized value from 0 to 10. The backup priority is: [expression missing].

S3. Construct a backup policy optimizer based on a deep reinforcement learning model. Using the backup priority as input, generate an adaptive backup policy. The reward function of the deep reinforcement learning model is: [expression missing], where R1 is the weighting coefficient, R2 is the reliability parameter, R3 is the performance parameter, and R4 is the cost parameter.

S4. According to the adaptive backup strategy, perform massive data archiving and backup operations under the end-edge-cloud three-tier architecture to realize intelligent data flow and storage between different levels.

[0006] Preferably, in S2, the AI multimodal data evaluation model integrates a deep learning model and a knowledge graph. The deep learning model is an LSTM model or a Transformer model, and the knowledge graph is used to construct a dependency network between data in various dimensions.

[0007] Preferably, in S3, the state space of the deep reinforcement learning model includes data volume, network bandwidth, storage resource utilization, node health status and backup task queue, and the action space includes backup frequency adjustment, replica number change, storage level migration and backup node switching.

[0008] Preferably, in S4, the three-tier architecture of end-edge-cloud archive backup operation specifically involves the terminal layer performing local cache backup of critical data and synchronizing it to the edge layer in real time; the edge layer performing a combination of incremental and differential backup operations on the received data and periodically synchronizing it to the cloud layer; and the cloud layer performing full backup and long-term archive storage on the received data.

[0009] Preferably, it also includes real-time monitoring of the operating status of backup nodes at each level in the three-tier architecture of end-edge-cloud, and automatically triggering data migration and backup node switching operations when the probability of node failure is detected to reach a preset threshold.

[0010] A system for adaptive archiving and backup of massive amounts of data based on AI is also proposed, including a data acquisition module, an AI multimodal evaluation module, a reinforcement learning optimization module, and an edge-cloud collaborative backup module. The data acquisition module is used to collect massive amounts of data and corresponding metadata at each level of the three-tier architecture of edge-cloud, and construct the data feature matrix described in S1. The AI multimodal evaluation module is used to receive the data feature matrix and output the backup priority mentioned in S2 through the AI multimodal data evaluation model. The reinforcement learning optimization module is used to build a backup strategy optimizer based on a deep reinforcement learning model, and to generate an adaptive backup strategy by taking the backup priority as input. The deep reinforcement learning model adopts the reward function described in S3. The edge-cloud collaborative backup module is used to perform massive data archiving and backup under the edge-cloud three-level architecture described in S4 according to the adaptive backup strategy, so as to realize intelligent data flow and storage across levels.

[0011] Preferably, the AI multimodal evaluation module includes a model deployment unit, which performs knowledge distillation processing on the AI multimodal data evaluation model to adapt to the hardware resource deployment requirements of the edge layer and the terminal layer.

[0012] Preferably, the reinforcement learning optimization module includes a distributed scheduling unit, which deploys a multi-agent reinforcement learning framework, with each node in the edge layer deploying an independent agent and the cloud layer deploying a globally coordinated agent.

[0013] Preferably, the edge-cloud collaborative backup module includes a data consistency verification unit, which generates a unique blockchain certificate for each backup data block and performs a data backup integrity verification operation.

[0014] Preferably, it also includes a fault prediction and self-healing module, which monitors the operating status of nodes at each level through an AI anomaly detection model, constructs a virtual model of the system in conjunction with digital twin technology, and performs advance prediction and automatic recovery operations for node faults.

[0015] Therefore, this invention proposes an AI-based adaptive archiving and backup method and system for massive data, with the following beneficial effects: (1) The backup strategy is dynamically generated through multimodal AI evaluation and deep reinforcement learning. Combined with the intelligent collaboration and data flow of the three-level architecture of end-edge-cloud, it adapts to the real-time changes in data and resource status, and greatly improves the utilization rate of backup resources and the overall backup efficiency of the architecture.

[0016] (2) By integrating AI fault prediction, digital twin and blockchain verification technologies, we can achieve early warning and automatic self-healing of node faults, ensure the integrity of backup data, significantly improve the response speed and data consistency of disaster recovery, and realize intelligent management and control of the entire process of massive data backup.

[0017] The technical solution of the present invention will be further described in detail below with reference to the accompanying drawings and embodiments.

Description of the Drawings

[0018] Figure 1 is an overall flowchart of the AI-based method and system for adaptive archiving and backup of massive data according to the present invention.

Figure 2 is a schematic diagram of the data flow and archiving backup under the three-tier architecture (edge, cloud, and end) of the method and system.

Figure 3 is a schematic diagram of the AI-driven backup priority evaluation and strategy optimization logic of the method and system.

Figure 4 is a schematic diagram of the fault prediction, self-healing, and data consistency verification mechanisms of the method and system, wherein (a) shows the data consistency verification mechanism and (b) shows the fault prediction and self-healing mechanism.

Detailed Implementation

[0019] To make the technical solutions, advantages, and objectives of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the described embodiments of the present invention without creative effort are within the protection scope of this application.

[0020] Unless otherwise defined, the technical or scientific terms used in this invention shall have the ordinary meaning as understood by one of ordinary skill in the art to which this invention pertains.

[0021] As shown in Figures 1-4, this invention provides a method and system for adaptive archiving and backup of massive amounts of data based on AI.

[0022] This invention provides an AI-based adaptive archiving and backup method for massive amounts of data, comprising:

S1. Collect massive amounts of data and corresponding metadata from each layer of the end-edge-cloud three-tier architecture, and construct a data feature matrix. The expression for the data feature matrix is: [expression missing], where S is the file size, T is the data type, C is the creation time, F is the modification frequency, H is the access popularity, B is the business tag, L is the compliance level, and O is the data source.

S2. Input the data feature matrix into the AI multimodal data evaluation model and output the data backup priority. The output is a quantized value from 0 to 10, and the backup priority is: [expression missing]. The AI multimodal data evaluation model integrates a deep learning model and a knowledge graph. The deep learning model is either an LSTM model or a Transformer model, and the knowledge graph is used to construct a dependency network between data in various dimensions.

[0023] S3. Construct a backup policy optimizer based on a deep reinforcement learning model. Taking backup priority as input, generate an adaptive backup policy. The reward function of the deep reinforcement learning model is: [expression missing], where R1 is the weighting coefficient, R2 is the reliability parameter, R3 is the performance parameter, and R4 is the cost parameter. The state space of the deep reinforcement learning model includes data volume, network bandwidth, storage resource utilization, node health status, and backup task queue, while the action space includes backup frequency adjustment, replica number change, storage tier migration, and backup node switching.

[0024] S4. Following the adaptive backup strategy, perform massive data archiving and backup operations under the three-tier architecture of end-edge-cloud to achieve intelligent data flow and storage between different levels.

[0025] The three-tier architecture of endpoint, edge, and cloud performs archiving and backup operations as follows: the endpoint layer performs local cache backup of critical data and synchronizes it to the edge layer in real time; the edge layer performs a combination of incremental and differential backup operations on the received data and synchronizes it to the cloud layer periodically; and the cloud layer performs full backup and long-term archiving storage on the received data.
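The edge layer described above combines incremental and differential backups. As a rough illustration of the conventional distinction between the two (the patent text does not say how they are combined), the sketch below selects files by modification time. The function names, file names, and timestamps are hypothetical.

```python
def incremental_set(mtimes: dict[str, float], last_backup: float) -> set[str]:
    """Files changed since the most recent backup of any kind."""
    return {path for path, t in mtimes.items() if t > last_backup}


def differential_set(mtimes: dict[str, float], last_full: float) -> set[str]:
    """Files changed since the most recent full backup."""
    return {path for path, t in mtimes.items() if t > last_full}


# Hypothetical modification times (seconds) for three files on an edge node.
mtimes = {"sensor_a.bin": 5.0, "sensor_b.bin": 15.0, "sensor_c.bin": 25.0}
incr = incremental_set(mtimes, last_backup=20.0)
diff = differential_set(mtimes, last_full=10.0)
```

With these values the incremental set holds only the file changed after the last backup, while the differential set also includes the file changed after the last full backup.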

[0026] The AI-based adaptive archiving and backup method for massive data proposed in this invention also includes real-time monitoring of the operating status of backup nodes at each level in the three-tier architecture of edge-cloud. When the probability of node failure is detected to reach a preset threshold, data migration and backup node switching operations are automatically triggered.

[0027] This invention proposes an AI-based system for adaptive archiving and backup of massive amounts of data, comprising a data acquisition module, an AI multimodal evaluation module, a reinforcement learning optimization module, and an edge-cloud collaborative backup module.

[0028] The data acquisition module is used to collect massive amounts of data and corresponding metadata from each layer of the edge-cloud three-tier architecture to construct the data feature matrix of S1. The AI multimodal evaluation module is used to receive the data feature matrix and output the backup priority of S2 through the AI multimodal data evaluation model. The reinforcement learning optimization module is used to build a backup strategy optimizer based on a deep reinforcement learning model. It takes backup priority as input to generate an adaptive backup strategy. The deep reinforcement learning model adopts the S3 reward function. The reinforcement learning optimization module includes a distributed scheduling unit, which deploys a multi-agent reinforcement learning framework. Each node in the edge layer deploys an independent agent, and the cloud layer deploys a globally coordinated agent.

[0029] The edge-cloud collaborative backup module is used to perform massive data archiving and backup under the S4 edge-cloud three-level architecture according to the adaptive backup strategy, so as to realize intelligent data flow and storage across levels.

[0030] The edge-cloud collaborative backup module includes a data consistency verification unit, which generates a unique blockchain certificate for each backup data block and performs data backup integrity verification operations.
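The patent does not describe how the blockchain certificate is produced. As a simplified stand-in, the sketch below builds a local SHA-256 hash chain in which each block's certificate depends on the block content and on the previous certificate. It is not a distributed ledger, and the names `GENESIS`, `certify`, and `verify` are hypothetical.

```python
import hashlib

GENESIS = "0" * 64  # assumed starting value for the chain


def certify(blocks: list[bytes]) -> list[str]:
    """Return one certificate per block, each chained to the previous one."""
    certs = []
    prev = GENESIS
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        prev = hashlib.sha256((prev + digest).encode("ascii")).hexdigest()
        certs.append(prev)
    return certs


def verify(blocks: list[bytes], certs: list[str]) -> bool:
    """Recompute the chain and compare it with the stored certificates."""
    return len(blocks) == len(certs) and certify(blocks) == certs


blocks = [b"backup block 1", b"backup block 2"]
certs = certify(blocks)
intact = verify(blocks, certs)
tampered = verify([b"backup block 1", b"altered"], certs)
```

Because each certificate folds in its predecessor, changing any block invalidates its own certificate and every later one, which is the property an integrity check relies on.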

[0031] The AI multimodal evaluation module includes a model deployment unit, which performs knowledge distillation on the AI multimodal data evaluation model to adapt to the hardware resource deployment requirements of the edge layer and the terminal layer.
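The patent does not give the distillation objective. One common formulation, shown here only as an assumption, trains the small student model to match the large teacher's temperature-softened output distribution by minimizing a KL divergence scaled by the squared temperature. The sketch treats the model outputs as class logits, which the source does not confirm.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: list[float],
                      student_logits: list[float],
                      temperature: float = 2.0) -> float:
    """T^2 * KL(teacher || student) over softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Hypothetical logits for a three-class output.
same = distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1])
different = distillation_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0])
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge.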

[0032] The present invention proposes an AI-based system for adaptive archiving and backup of massive data, which also includes a fault prediction and self-healing module. The fault prediction and self-healing module monitors the operating status of nodes at each level through an AI anomaly detection model, and constructs a virtual model of the system by combining digital twin technology to perform advance prediction and automatic recovery operations for node faults.

[0033] This invention takes the production data archiving and backup scenario of a smart manufacturing enterprise as an example. The enterprise operates 500 production devices (terminal layer), 10 workshop edge servers (edge layer), and 1 enterprise private cloud platform (cloud layer). It needs to securely archive and back up massive amounts of heterogeneous data such as equipment sensor data, production scheduling data, and quality inspection data. The specific implementation process is as follows. Data acquisition and feature matrix construction: The data acquisition module collects sensor data from terminal-layer production equipment in real time (file size S = 200 KB per file, data type T = time-series data, creation time C = 2024-XX-XX 10:05:30, modification frequency F = real-time update, access popularity H = 3 accesses per hour, business tag B = key production data, compliance level L = level 2, data source O = equipment A101). It also synchronously collects workshop-level statistical data aggregated at the edge layer and historical archived data from the cloud layer and, according to the formula ..., constructs the data feature matrix for each data set.
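The feature expression itself is not reproduced in this text, so the following is only a sketch of how one metadata record could be encoded as a numeric row in the order S, T, C, F, H, B, L, O. The category encodings, the class and function names, and the numeric values chosen for creation time and modification frequency are assumptions, since the source gives them only as an elided date and "real-time update".

```python
from dataclasses import dataclass

# Hypothetical category vocabularies; the patent does not specify encodings.
DATA_TYPES = {"time_series": 0, "scheduling": 1, "quality_inspection": 2, "log": 3}
BUSINESS_TAGS = {"ordinary": 0, "key_production": 1}


@dataclass
class FileMetadata:
    size_kb: float            # S: file size
    data_type: str            # T: data type
    created_epoch: float      # C: creation time, seconds since epoch
    mods_per_hour: float      # F: modification frequency
    accesses_per_hour: float  # H: access popularity
    business_tag: str         # B: business tag
    compliance_level: int     # L: compliance level
    source: str               # O: data source


def feature_row(m: FileMetadata, source_ids: dict[str, int]) -> list[float]:
    """Encode one item as a numeric row in the order S, T, C, F, H, B, L, O."""
    return [
        m.size_kb,
        float(DATA_TYPES[m.data_type]),
        m.created_epoch,
        m.mods_per_hour,
        m.accesses_per_hour,
        float(BUSINESS_TAGS[m.business_tag]),
        float(m.compliance_level),
        float(source_ids.setdefault(m.source, len(source_ids))),
    ]


def feature_matrix(items: list[FileMetadata]) -> list[list[float]]:
    """Stack one row per item; data sources get integer ids in order seen."""
    source_ids: dict[str, int] = {}
    return [feature_row(m, source_ids) for m in items]


# Sensor record from the example; 0.0 and 3600.0 are placeholder assumptions.
sensor = FileMetadata(200.0, "time_series", 0.0, 3600.0, 3.0,
                      "key_production", 2, "A101")
matrix = feature_matrix([sensor])
```

A real deployment would more likely use one-hot or learned embeddings for the categorical fields. Integer ids are used here only to keep the row shape visible.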

[0034] Backup Priority Assessment: After receiving the data feature matrix, the AI multimodal assessment module analyzes the temporal variation patterns of the data using an LSTM model optimized by knowledge distillation. Combined with the "equipment data - production process - quality control" dependency network constructed from the knowledge graph, it outputs the backup priority. In this example, critical production data receives a priority of 8.5 and ordinary log data a priority of 3.2.
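The evaluation model itself (LSTM plus knowledge graph) is not sketched here. The snippet below only illustrates one way a downstream scheduler might consume the 0 to 10 scores, by ordering backup tasks so that higher-priority data is handled first. The function name and task labels are hypothetical.

```python
import heapq


def backup_order(priorities: dict[str, float]) -> list[str]:
    """Return task names sorted from highest to lowest backup priority."""
    for name, p in priorities.items():
        if not 0.0 <= p <= 10.0:
            raise ValueError(f"priority for {name!r} is outside the 0-10 range")
    # heapq is a min-heap, so priorities are negated to pop the largest first.
    heap = [(-p, name) for name, p in priorities.items()]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


# Scores taken from the example above.
order = backup_order({"ordinary_log": 3.2, "critical_production": 8.5})
```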

[0035] Adaptive Backup Strategy Generation: The reinforcement learning optimization module takes backup priority as input and constructs a policy optimizer based on a deep reinforcement learning model. Its state space includes parameters such as the current workshop network bandwidth (100 Mbps), edge server storage utilization (65%), and backup task queue length (12 tasks). The action space outputs a policy of "incremental backup of critical data every 30 minutes, 2 replicas, and storage tiers of edge + cloud." The reward is computed according to ... and used for optimization.
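To make the state and action spaces concrete, the sketch below models them as plain data structures and fills them with the values from this example. The reward expression is missing from this text, so the weighted form used here (reliability plus performance minus cost) is purely an assumption, as are all field names, the data volume, the node health value, and the target node name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupState:
    data_volume_gb: float        # hypothetical unit
    bandwidth_mbps: float
    storage_utilization: float   # fraction in [0, 1]
    node_health: float           # hypothetical score in [0, 1]
    queue_length: int


@dataclass(frozen=True)
class BackupAction:
    interval_minutes: int            # backup frequency
    replicas: int                    # number of replicas
    storage_tiers: tuple[str, ...]   # storage tier placement
    target_node: str                 # backup node selection


def reward(reliability: float, performance: float, cost: float,
           weights: tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Assumed reward: weighted reliability + weighted performance - weighted cost."""
    w_r, w_p, w_c = weights
    return w_r * reliability + w_p * performance - w_c * cost


# Bandwidth, utilization, and queue length come from the example;
# data volume and node health are placeholders.
state = BackupState(data_volume_gb=1.0, bandwidth_mbps=100.0,
                    storage_utilization=0.65, node_health=1.0, queue_length=12)
action = BackupAction(interval_minutes=30, replicas=2,
                      storage_tiers=("edge", "cloud"), target_node="edge-1")
r = reward(reliability=0.9, performance=0.5, cost=0.2)
```

In a trained agent the policy network would map `BackupState` to a `BackupAction`. Here the action is simply written out to mirror the policy quoted in the example.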

[0036] Edge-cloud collaborative archiving and backup: The edge-cloud collaborative backup module executes the strategy. Terminal device A101 caches and backs up key data locally and then synchronizes it to the edge server in workshop 1 in real time. The edge layer performs incremental and differential backups on the received data and synchronizes the summarized data to the cloud at 3:00 AM every day. The cloud layer performs a full backup and stores it to the cold archive node. The data consistency verification unit generates a unique blockchain certificate for each backup data block.

[0037] Fault prediction and self-healing: The fault prediction and self-healing module monitors the status of the edge server in workshop 3 in real time through the AI anomaly detection model. When the probability of failure is detected to be 85% (preset threshold 80%), the module automatically triggers data migration to the backup edge node after simulating the impact of the failure using a digital twin virtual model. There is no data loss during the switching process, and the recovery takes 20 seconds.
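The threshold trigger in this step can be sketched as a simple planning function. The anomaly detection model and the digital twin simulation are outside the scope of the sketch, which takes predicted failure probabilities as given. Node names, the `Node` class, and the rule of choosing the healthiest standby are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    failure_probability: float
    is_standby: bool = False


def plan_failover(nodes: list[Node], threshold: float = 0.80) -> dict[str, str]:
    """Map each active node at or above the threshold to a healthy standby."""
    standbys = [n for n in nodes
                if n.is_standby and n.failure_probability < threshold]
    plan: dict[str, str] = {}
    for node in nodes:
        if node.is_standby or node.failure_probability < threshold:
            continue
        if not standbys:
            raise RuntimeError("no healthy standby node available")
        # Assumed rule: migrate to the standby least likely to fail.
        target = min(standbys, key=lambda n: n.failure_probability)
        plan[node.name] = target.name
    return plan


# Workshop 3 edge server at 85% against the 80% threshold from the example.
nodes = [
    Node("edge-workshop-3", 0.85),
    Node("edge-workshop-1", 0.10),
    Node("edge-standby", 0.05, is_standby=True),
]
plan = plan_failover(nodes)
```

The comparison uses "at or above" because the text says migration is triggered when the probability reaches the preset threshold.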

[0038] Therefore, this invention provides an AI-based method and system for adaptive archiving and backup of massive data, which solves the problems of shallow intelligence in existing massive data backup, gaps in edge-cloud collaboration, static backup strategies lacking dynamic optimization, and weak disaster recovery capabilities. It improves the adaptive control capability and full-process intelligent management level of massive data archiving and backup under the three-level edge-cloud architecture, while also improving the utilization rate of backup resources and the data consistency guarantee and fault self-healing efficiency in disaster recovery scenarios.

[0039] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can still be made to the technical solutions of the present invention, and these modifications or equivalent substitutions cannot cause the modified technical solutions to deviate from the spirit and scope of the technical solutions of the present invention.

Claims

1. An AI-based method for adaptive archiving and backup of massive data, characterized in that it includes:

S1. Collect massive amounts of data and corresponding metadata from each layer of the end-edge-cloud three-tier architecture, and construct a data feature matrix. The expression of the data feature matrix is: [expression missing], where S is the file size, T is the data type, C is the creation time, F is the modification frequency, H is the access popularity, B is the business tag, L is the compliance level, and O is the data source;

S2. Input the data feature matrix into the AI multimodal data evaluation model and output the data backup priority as a quantized value from 0 to 10. The backup priority is: [expression missing];

S3. Construct a backup policy optimizer based on a deep reinforcement learning model. Using the backup priority as input, generate an adaptive backup policy. The reward function of the deep reinforcement learning model is: [expression missing], wherein R1 is a reliability parameter, R2 is a performance parameter, and R3 is a cost parameter;

S4. According to the adaptive backup strategy, perform massive data archiving and backup operations under the end-edge-cloud three-tier architecture to realize intelligent data flow and storage between different levels.

2. The method for adaptive archiving and backup of massive amounts of data based on AI according to claim 1, characterized in that, in S2, the AI multimodal data evaluation model integrates a deep learning model and a knowledge graph. The deep learning model is an LSTM model or a Transformer model, and the knowledge graph is used to construct a dependency network between data in various dimensions.

3. The method for adaptive archiving and backup of massive amounts of data based on AI according to claim 1, characterized in that, in S3, the state space of the deep reinforcement learning model includes data volume, network bandwidth, storage resource utilization, node health status, and backup task queue, while the action space includes backup frequency adjustment, replica number change, storage level migration, and backup node switching.

4. The method for adaptive archiving and backup of massive amounts of data based on AI according to claim 1, characterized in that, in S4, the archiving and backup operations under the three-tier architecture of endpoint, edge, and cloud are specifically as follows: the endpoint layer performs local cache backup of critical data and synchronizes it to the edge layer in real time; the edge layer performs a combination of incremental and differential backup operations on the received data and synchronizes it to the cloud layer periodically; and the cloud layer performs full backup and long-term archiving storage on the received data.

5. The method for adaptive archiving and backup of massive amounts of data based on AI according to claim 1, characterized in that it also includes real-time monitoring of the operating status of backup nodes at each level in the three-tier architecture of edge-cloud, and, when the probability of node failure is detected to reach a preset threshold, automatically triggering data migration and backup node switching operations.

6. A system for adaptive archiving and backup of massive amounts of data based on AI, characterized in that it is used to implement the method for adaptive archiving and backup of massive data based on AI according to any one of claims 1-5, and includes a data acquisition module, an AI multimodal evaluation module, a reinforcement learning optimization module, and an edge-cloud collaborative backup module. The data acquisition module is used to collect massive amounts of data and corresponding metadata at each level of the three-tier architecture of edge-cloud, and construct the data feature matrix described in S1. The AI multimodal evaluation module is used to receive the data feature matrix and output the backup priority mentioned in S2 through the AI multimodal data evaluation model. The reinforcement learning optimization module is used to build a backup strategy optimizer based on a deep reinforcement learning model, and to generate an adaptive backup strategy by taking the backup priority as input. The deep reinforcement learning model adopts the reward function described in S3. The edge-cloud collaborative backup module is used to perform massive data archiving and backup under the edge-cloud three-level architecture described in S4 according to the adaptive backup strategy, so as to realize intelligent data flow and storage across levels.

7. The AI-based adaptive archiving and backup system for massive amounts of data according to claim 6, characterized in that the AI multimodal evaluation module includes a model deployment unit, which performs knowledge distillation processing on the AI multimodal data evaluation model to adapt to the hardware resource deployment requirements of the edge layer and the terminal layer.

8. The AI-based adaptive archiving and backup system for massive amounts of data according to claim 6, characterized in that the reinforcement learning optimization module includes a distributed scheduling unit, which deploys a multi-agent reinforcement learning framework. Each node in the edge layer deploys an independent agent, and the cloud layer deploys a globally coordinated agent.

9. The system for adaptive archiving and backup of massive amounts of data based on AI according to claim 6, characterized in that the edge-cloud collaborative backup module includes a data consistency verification unit, which generates a unique blockchain certificate for each backup data block and performs a data backup integrity verification operation.

10. The system for adaptive archiving and backup of massive amounts of data based on AI according to claim 6, characterized in that it also includes a fault prediction and self-healing module, which monitors the operating status of nodes at each level through an AI anomaly detection model, and constructs a virtual model of the system in conjunction with digital twin technology to perform advance prediction and automatic recovery operations for node faults.