A big data-oriented cloud disaster recovery backup method

A disaster recovery backup and big data technology, applied in the field of data error detection and error-response generation. It addresses problems such as the limited throughput of big data storage systems, failure to meet real-time system requirements, and increased client load, achieving the effects of an enhanced remote disaster recovery capability, reduced disaster recovery cost, and a lower risk of data leakage.

Active Publication Date: 2018-02-27
广州鼎甲计算机科技有限公司


Problems solved by technology

One approach deduplicates the backup database files in the resource pool, but as the database files grow, this differential-deletion method also causes system performance bottlenecks.
On the other hand, client-side compressed storage is used to offload the storage server: the client runs a deduplication program on the input file to produce split data blocks and their corresponding fingerprint values, and issues a query request for each fingerprint; the distribution server records the storage location of each split block and forwards the query to the duplicate-data processing device responsible for that fingerprint; that device checks whether the fingerprint already exists; if it does not, the device stores the new split block on the storage server under the new fingerprint. Such operations, however, usually increase the load on the client, as the sketch below illustrates.
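To make this pipeline concrete, here is a minimal sketch, not the patented implementation: fixed-size splitting, SHA-256 fingerprints, and a single in-memory `FingerprintIndex` standing in for both the distribution server and the duplicate-data processing device. `CHUNK_SIZE` and all other names are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # hypothetical 4 MiB split size


class FingerprintIndex:
    """Stand-in for the distribution server plus the duplicate-data
    processing device: maps a block fingerprint to a storage location."""

    def __init__(self):
        self._locations = {}

    def query(self, fingerprint):
        # The distribution server forwards the query; the processing
        # device answers whether the fingerprint already exists.
        return self._locations.get(fingerprint)

    def store(self, fingerprint, location):
        self._locations[fingerprint] = location


def backup_file(path, index, storage):
    """Client-side pass: split the input file, fingerprint each block,
    and upload only blocks whose fingerprint is not yet known."""
    with open(path, "rb") as f:
        block_no = 0
        while chunk := f.read(CHUNK_SIZE):
            fp = hashlib.sha256(chunk).hexdigest()
            if index.query(fp) is None:           # unseen fingerprint
                location = f"{path}#{block_no}"   # hypothetical locator
                storage[location] = chunk         # upload the new block
                index.store(fp, location)
            block_no += 1
```

Note that the splitting and hashing run entirely on the client, which is exactly where the extra load described above comes from: backing up the same file twice uploads no new blocks, but the client still reads and fingerprints every block.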
Practice has shown that data in big data storage systems differs in access heat: the access volume and update rate of hot data far exceed those of older, cold data. Distinguishing data heat inevitably involves splitting and reassembling large numbers of data blocks, while the I/O performance of the storage medium and the bandwidth of the storage network usually limit the throughput of the big data storage system.
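As a rough illustration of access heat, one could track per-block access counters that decay each accounting period, treating blocks above a threshold as hot and keeping them cached. The decay factor and threshold below are illustrative assumptions, not values taken from the patent.

```python
from collections import defaultdict

DECAY = 0.5         # hypothetical aging factor per accounting period
HOT_THRESHOLD = 10  # hypothetical boundary between hot and cold


class HeatTracker:
    """Counts block accesses and ages the counts, so blocks that stop
    being accessed gradually fall below the hot threshold."""

    def __init__(self):
        self.heat = defaultdict(float)

    def record_access(self, block_id):
        self.heat[block_id] += 1.0

    def end_period(self):
        # Apply exponential decay so that stale counts fade away.
        for block_id in self.heat:
            self.heat[block_id] *= DECAY

    def is_hot(self, block_id):
        return self.heat[block_id] >= HOT_THRESHOLD
```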
[0005] Current disaster recovery backup systems typically use HDFS on a private cloud as the platform, using MapReduce tasks for data segmentation combined with content-based deduplication, or they store data directly in a public cloud and rely on the public cloud's deduplication technology and multi-copy remote disaster recovery strategy. These approaches suit only offline storage backup services and usually cannot meet current real-time system requirements.




Detailed Description of the Embodiments

[0028] The present invention is described in further detail below with reference to the embodiments and the accompanying drawings, but embodiments of the present invention are not limited thereto.

[0029] The present invention uses content-based duplicate-data deletion to perform distributed deduplication. After a server in the cloud storage network performs a disaster recovery backup for a client of the production system, it reads and extracts the metadata of the data objects in the backup set and stores it in the cache nodes of the cloud storage network. When new metadata arrives, the metadata array spaces of the old and new versions are compared; if metadata of the same version is found, the data objects are further compared byte by byte, so that changed data is found even when the metadata versions are identical. If a data object is a duplicate, a pointer is assigned to it and the redundant data object is deleted. ...
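A minimal sketch of the comparison step just described, under simplified assumptions: each `MetaRecord` carries a version tag extracted from the backup set, a version match triggers a byte-by-byte comparison of the data objects, and a confirmed duplicate is replaced with a pointer. The in-memory dictionaries stand in for the cache nodes of the cloud storage network.

```python
from dataclasses import dataclass


@dataclass
class MetaRecord:
    object_id: str
    version: str  # metadata version tag extracted from the backup set
    data: bytes   # payload of the data object (simplified: held inline)


def deduplicate(new, cache, pointers):
    """Compare new metadata against the cached version space; on a
    version match, verify the objects byte by byte before replacing
    the duplicate with a pointer to the existing copy."""
    old = cache.get(new.version)
    if old is not None and old.data == new.data:  # byte-by-byte check
        # Confirmed duplicate: record a pointer and drop the new copy.
        pointers[new.object_id] = old.object_id
        return
    # Changed or unseen data: keep the object and index its metadata.
    cache[new.version] = new
```

The byte-by-byte verification matters because, as the paragraph above notes, identical metadata versions do not guarantee identical payloads.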



Abstract

The big data-oriented cloud disaster recovery backup method disclosed by the present invention comprises the following steps: establishing file-block hash fingerprints and snapshot pointers to realize compressed storage backup of different file versions, while transmitting the file-block fingerprints to a private cloud storage system; the private cloud establishes a file-block fingerprint index database and compares hash fingerprints through a MapReduce task to perform preliminary deduplication of the transmitted blocks; fine-grained, content-based re-blocking and hashing is applied to the data blocks, and another MapReduce subtask computes the similarity matrix of the data blocks; the distribution of block pointers is used to count the access heat of data blocks; the fingerprint index database and hot data are cached in the front tier of storage, while cold data and archived backup data are stored centrally with version snapshots and regularly backed up to a public cloud storage system. By caching the fingerprint library and hot data, the method solves the poor real-time performance of deduplication in traditional disaster recovery backup.
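To illustrate the preliminary MapReduce deduplication over hash fingerprints described above, the following toy map/reduce pass in plain Python groups transmitted blocks by fingerprint and keeps one copy plus a reference count per group. A real deployment would run this as a MapReduce job against the fingerprint index database; every name here is illustrative.

```python
import hashlib
from itertools import groupby


def map_phase(blocks):
    """Map: emit a (fingerprint, block) pair for each transmitted block."""
    return [(hashlib.sha256(b).hexdigest(), b) for b in blocks]


def reduce_phase(pairs):
    """Reduce: group pairs by fingerprint and keep one copy plus a
    reference count per group -- the coarse, preliminary deduplication."""
    pairs.sort(key=lambda kv: kv[0])
    unique = {}
    for fp, group in groupby(pairs, key=lambda kv: kv[0]):
        copies = list(group)
        unique[fp] = (copies[0][1], len(copies))  # (block, refcount)
    return unique


# Usage: three blocks, two identical, reduce to two unique fingerprints.
blocks = [b"alpha", b"beta", b"alpha"]
assert len(reduce_phase(map_phase(blocks))) == 2
```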

Description

Technical Field

[0001] The invention relates to the field of data backup, and in particular to a big data-oriented cloud disaster recovery backup method.

Background Art

[0002] Past data protection solutions were based on deduplication within stand-alone devices, but data storage and backup networks are trending toward large-scale distributed storage networks, in which multiple storage and data-processing devices are connected by high-speed communication lines to provide cloud storage and highly available services. Disaster recovery backup of massive heterogeneous data usually uses a distributed cloud storage network: a backup set is stored across different devices in the form of data blocks, which shares the load among devices and improves the fault tolerance of the data. However, the same data block may be stored repeatedly on different devices, so a large amount of redundant data accumulates in the cloud storage ne...


Application Information

Patent Type & Authority: Patent (China)
IPC (8): G06F11/14
Inventors: 林伟伟, 张子龙, 钟坯平
Owner: 广州鼎甲计算机科技有限公司