Big-data-oriented cloud disaster tolerant backup method

A disaster recovery backup and big data technology, applied in the field of data error detection and error-response generation. It addresses problems such as the limited throughput of big data storage systems, failure to meet real-time system requirements, and increased client load, and achieves the effects of enhanced remote disaster recovery, reduced disaster recovery cost, and a lower risk of data leakage.

Active Publication Date: 2015-09-23
SOUTH CHINA UNIV OF TECH

AI Technical Summary

Problems solved by technology

One approach deduplicates the backup database files in the resource pool, but as the database files grow, this differential-deletion method also causes system performance bottlenecks.
On the other hand, client-side compressed storage is used to relieve the high load on the storage server: the client typically runs a data deduplication program on the input file to generate split data blocks and their corresponding fingerprint values, then issues a query request for each fingerprint value; a distribution server records the storage location of each split data block and forwards the query request to the corresponding duplicate-data processing device according to the fingerprint value; the duplicate-data processing device judges whether the fingerprint value already exists, and if not, stores the new split data block on the storage server under the new fingerprint value. Such operations, however, usually increase the load on the client.
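
As a rough illustration of this client-side flow (not the patent's exact implementation), the sketch below splits a file into fixed-size blocks, fingerprints each one, and routes fingerprint queries to partitioned duplicate-data processors. The block size, the SHA-256 hash, and names such as DistributionServer and store_on_storage_server are assumptions made for illustration only.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # assumed split size; the patent does not fix one

def split_and_fingerprint(path):
    """Client side: split the input file and compute a fingerprint per block."""
    with open(path, "rb") as f:
        while block := f.read(CHUNK_SIZE):
            yield hashlib.sha256(block).hexdigest(), block

def store_on_storage_server(block):
    # Stub: a real system would write the block to a storage server
    # and return its address there.
    return "blob:" + hashlib.sha256(block).hexdigest()[:12]

class DuplicateProcessor:
    """One duplicate-data processing device, owning a slice of fingerprint space."""
    def __init__(self):
        self.known = {}                      # fingerprint -> storage location

    def handle(self, fp, block):
        if fp not in self.known:             # unseen fingerprint: store the new block
            self.known[fp] = store_on_storage_server(block)
        return self.known[fp]

class DistributionServer:
    """Forwards each fingerprint query to the responsible processor and
    records where every split block is stored."""
    def __init__(self, processors):
        self.processors = processors
        self.locations = {}                  # fingerprint -> storage location

    def query(self, fp, block):
        proc = self.processors[int(fp[:8], 16) % len(self.processors)]
        self.locations[fp] = proc.handle(fp, block)
        return self.locations[fp]
```

A client would iterate over split_and_fingerprint(path) and call query(fp, block) for each pair; a duplicate block resolves to its existing location without being stored again, which is precisely the extra hashing work that shifts load onto the client.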
Practice has shown that data in big data storage systems has different access heat: the access volume and update rate of hot data far exceed those of older, cold data. Distinguishing data by heat inevitably involves a large amount of data-block segmentation and reassembly, while the I/O performance of the storage media and the bandwidth of the storage network usually limit the throughput of the big data storage system.
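
To make the hot/cold distinction concrete, here is a minimal access-heat counter, assuming a simple hit-count threshold; the text above does not specify how heat is measured, so the threshold and the Counter-based design are assumptions.

```python
from collections import Counter

class HeatTracker:
    """Counts accesses per block fingerprint. Blocks at or above the threshold
    are treated as hot (candidates for fast front-end caching); the rest are
    cold (candidates for centralized archival). The threshold is an assumption."""
    def __init__(self, hot_threshold=100):
        self.hits = Counter()
        self.hot_threshold = hot_threshold

    def record_access(self, fp):
        self.hits[fp] += 1

    def is_hot(self, fp):
        return self.hits[fp] >= self.hot_threshold
```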
[0005] Current disaster recovery backup systems usually use HDFS on a private cloud as the platform and MapReduce tasks to split the data, combined with content-aware deduplication, or they store data directly in the public cloud and rely on the public cloud's deduplication technology and multi-copy remote disaster recovery strategy. These approaches are suitable only for offline storage backup services and usually cannot meet current real-time system requirements.

Embodiment Construction

[0028] The present invention will be further described in detail below in conjunction with the embodiments and the accompanying drawings, but the embodiments of the present invention are not limited thereto.

[0029] The present invention uses content-aware data deduplication to perform distributed deduplication. After the server of the cloud storage network completes a disaster recovery backup for a client of the production system, it reads and extracts the metadata of the data objects in the backup set and stores it in the cache nodes of the cloud storage network. When new metadata arrives, the metadata array spaces of the old and new versions are compared; if metadata of the same version is found, the corresponding data objects are further compared byte by byte, so that changed data is found even when the metadata versions are identical. If a data object is a duplicate, a pointer is assigned to it and the duplicate object is deleted. Th...
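
A minimal sketch of the comparison logic in [0029], under some assumptions: metadata is an (object id, version) pair, "assigning a pointer" means mapping the duplicate to the retained copy's fingerprint, and load_object fetches a stored object by fingerprint. All of these names are illustrative, not the patent's.

```python
import hashlib

def load_object(fingerprint, store):
    # Hypothetical fetch of a previously stored object from the cloud storage network.
    return store[fingerprint]

def dedup_backup_set(cached_meta, objects, store):
    """cached_meta: {object_id: (version, fingerprint)} kept on the cache nodes.
    objects: iterable of (object_id, version, data) from the new backup set.
    Returns pointers assigned to duplicates and the objects that must be stored."""
    pointers, to_store = {}, []
    for obj_id, version, data in objects:
        prev = cached_meta.get(obj_id)
        if prev and prev[0] == version:
            # Same metadata version: compare byte by byte, since changed data
            # can hide behind an identical version tag.
            if data == load_object(prev[1], store):
                pointers[obj_id] = prev[1]   # duplicate: point at retained copy
                continue                     # and drop the duplicate object
        fp = hashlib.sha256(data).hexdigest()
        cached_meta[obj_id] = (version, fp)
        store[fp] = data
        to_store.append((fp, data))
    return pointers, to_store
```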

Abstract

The invention discloses a big-data-oriented cloud disaster tolerant backup method, which comprises the following steps: building file-block hash fingerprints and snapshot pointers to realize compressed storage backup of different versions of a file, while transmitting the file-block fingerprints to a private cloud storage system; building a file-block fingerprint index database on the private cloud; comparing hash fingerprints through a MapReduce task to perform primary deduplication on transmitted blocks; performing fine-grained, content-based secondary blocking and hashing on each data block; calculating the similarity matrix and block-pointer distribution of the data blocks through another MapReduce subtask; counting the access heat of each data block; caching the fingerprint index database and hot data in the storage front end; storing cold data and archived backup data centrally; building version snapshots; and regularly backing up the data to a public cloud storage system. By caching the fingerprint database and hot data, the method solves problems such as the poor real-time performance of data deduplication in conventional disaster tolerant backup.
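
To illustrate the first step of the abstract (block hash fingerprints plus snapshot pointers for compressed versioned backup), here is a minimal sketch; the fixed block size and the SHA-256 hash are assumptions, as the abstract names neither.

```python
import hashlib

CHUNK = 4 * 1024 * 1024  # assumed block size

def snapshot(data, store):
    """Build a version snapshot: an ordered list of block fingerprints.
    Blocks already present in `store` are shared with earlier versions, so
    only changed blocks consume new space (compressed storage of versions)."""
    pointers = []
    for i in range(0, len(data), CHUNK):
        block = data[i:i + CHUNK]
        fp = hashlib.sha256(block).hexdigest()
        store.setdefault(fp, block)   # one physical copy per fingerprint
        pointers.append(fp)
    return pointers
```

Successive versions then differ only in the pointers that changed, and transmitting just the fingerprints to the private cloud lets the MapReduce comparison run without moving unchanged blocks.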

Description

Technical Field

[0001] The invention relates to the field of data backup, in particular to a big-data-oriented cloud disaster recovery backup method.

Background

[0002] In the past, data protection solutions were based on data deduplication in stand-alone devices, but data storage and backup networks are evolving toward large-scale distributed storage networks, in which multiple storage and data processing devices are connected by high-speed communication lines to provide cloud storage and highly available services. Disaster recovery backup of massive heterogeneous data usually uses a distributed cloud storage network: a backup set is stored on different devices in the form of data blocks, which shares the load across devices and improves data fault tolerance. However, the same data block may be stored repeatedly on different devices, so a large amount of redundant data accumulates in the cloud storage network...

Application Information

IPC(8): G06F11/14
Inventors: 林伟伟, 张子龙, 钟坯平
Owner: SOUTH CHINA UNIV OF TECH