Load-balancing-based data placement method and device for data deduplication

A data placement technology for deduplication with load balancing, applied in electrical digital data processing, special data processing applications, instruments, etc. It addresses the problems of a sacrificed deduplication rate, limited capacity, narrow scope of application, and large workload and overhead, eliminating the read load bottleneck and improving read performance.

Active Publication Date: 2016-08-03
NAT UNIV OF DEFENSE TECH

AI Technical Summary

Problems solved by technology

[0010] Single-node research adds some redundant data and thus sacrifices the deduplication rate. Its capacity is also limited and its scope of application relatively narrow, so it cannot meet the large-capacity requirements of the big data era.
Although multi-nodes can expand stor

Examples

Embodiment Construction

[0031] Figures 1 to 4 take four nodes as an example. Figure 1 is a schematic diagram of data deduplication using the round-robin placement method: when data blocks are stored, they are placed on the nodes in round-robin order by node number.
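
The round-robin baseline can be illustrated with a minimal sketch. It assumes nodes are numbered from 0 and that blocks are placed strictly in arrival order; the function name `round_robin_placement` is hypothetical and does not come from the patent.

```python
def round_robin_placement(num_blocks, node_num=4):
    """Return the node number assigned to each stored block, in arrival order.

    Assumption: nodes are numbered 0 .. node_num - 1 and the i-th stored
    block simply goes to node i mod node_num.
    """
    return [i % node_num for i in range(num_blocks)]


# Six blocks on the four-node example wrap around after node 3.
print(round_robin_placement(6))  # [0, 1, 2, 3, 0, 1]
```

Round-robin ignores which write IO a block belongs to, which is the property the load-balancing placement changes.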

[0032] Figure 2 is the basic flowchart of data deduplication, comprising data chunking, calculation of the characteristic value (fingerprint), index table lookup, deletion of duplicate blocks, and storage of unique blocks.
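
The basic flow can be sketched as follows. This is an illustration only: fixed-size chunking, SHA-1 as the characteristic value, and an in-memory dictionary as the index table are all assumptions, since the text does not specify them, and the function name `deduplicate` is hypothetical.

```python
import hashlib


def deduplicate(data: bytes, block_size: int = 4):
    """Chunk data, fingerprint each block, and keep only unique blocks.

    Returns (store, recipe): `store` holds the unique blocks and `recipe`
    lists, for every original block, its position in `store`.
    """
    index = {}   # index table: fingerprint -> position in `store`
    store = []   # unique blocks actually written
    recipe = []  # one entry per original block, used to rebuild the data
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]         # data chunking
        fingerprint = hashlib.sha1(block).hexdigest()    # characteristic value
        if fingerprint not in index:                     # index table lookup
            index[fingerprint] = len(store)
            store.append(block)                          # store unique block
        recipe.append(index[fingerprint])                # duplicates only add a reference
    return store, recipe


store, recipe = deduplicate(b"AAAABBBBAAAACCCC")
print(store)   # [b'AAAA', b'BBBB', b'CCCC']
print(recipe)  # [0, 1, 0, 2]
```

The original data can be rebuilt with `b"".join(store[i] for i in recipe)`.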

[0033] Figure 3 is a schematic diagram of the load-balancing-based data placement for deduplication adopted in the present invention. The specific execution process is:

[0034] The first step is to define two new data structures. The array PlacementTable[NodeNum] stores the numbers of the nodes on which the sequentially arriving blocks of one placement are placed, and the character array Last_RequestID stores the RequestID of the previous data block;

[0035] The second step is to initialize the array PlacementTable[No...
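
The remaining steps are not shown in this excerpt, so the following is only a hedged sketch of how the two data structures might be used, not the patented procedure. The class `IOAwarePlacer`, its fingerprint index, the per-node load counter, and the tie-breaking rule are all assumptions made for illustration.

```python
class IOAwarePlacer:
    """Hypothetical per-IO placement using PlacementTable and Last_RequestID."""

    def __init__(self, node_num):
        self.node_num = node_num
        self.placement_table = []    # nodes used by blocks of the current write IO
        self.last_request_id = None  # RequestID of the previous data block
        self.index = {}              # assumed index table: fingerprint -> node
        self.load = [0] * node_num   # assumed count of unique blocks per node

    def place(self, request_id, fingerprint):
        # A new RequestID means a new write IO: start a fresh placement.
        if request_id != self.last_request_id:
            self.placement_table = []
            self.last_request_id = request_id
        node = self.index.get(fingerprint)
        if node is None:
            # Unique block. Assumed rule: prefer the node least used by this
            # IO, then the least loaded node, then the lowest node number.
            node = min(
                range(self.node_num),
                key=lambda n: (self.placement_table.count(n), self.load[n], n),
            )
            self.index[fingerprint] = node
            self.load[node] += 1
        # Duplicate blocks stay where they are but still occupy a node in this IO.
        self.placement_table.append(node)
        return node


placer = IOAwarePlacer(node_num=4)
first_io = [placer.place("w1", fp) for fp in ["a", "b", "c", "d"]]
second_io = [placer.place("w2", fp) for fp in ["e", "b", "f", "g"]]
print(first_io)   # [0, 1, 2, 3]
print(second_io)  # [0, 1, 2, 3]
```

In the second IO, block `b` is a duplicate already stored on node 1. Plain round-robin over the unique blocks `e`, `f`, `g` would put them on nodes 0, 1, 2 and leave two blocks of that IO on node 1, whereas this sketch steers `f` away from node 1.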

Abstract

The invention relates to a load-balancing-based data placement method and device for data deduplication. Building on existing distributed deduplication systems, it changes the data block placement strategy to further improve file read performance while keeping the deduplication rate unchanged. The method places data blocks using a single write IO as the basic unit, so that the blocks of the same IO are spread across as many mutually independent storage nodes as possible. This eliminates the load bottleneck during file reads as far as possible, maximizes parallel use of the independent nodes, and improves the read performance of the system.
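
One simple way to quantify the read load bottleneck is the largest number of blocks that any single node must serve for one read. The sketch below assumes every block costs the same to read and that nodes serve requests in parallel; the function name `read_bottleneck` is hypothetical.

```python
from collections import Counter


def read_bottleneck(block_nodes):
    """Largest number of blocks one node must serve for a single read.

    `block_nodes` lists the node number holding each block of the file or IO.
    Assumption: equal per-block cost and fully parallel nodes, so the busiest
    node bounds the read time.
    """
    return max(Counter(block_nodes).values(), default=0)


print(read_bottleneck([0, 1, 1, 2]))  # 2: node 1 serves two blocks
print(read_bottleneck([0, 1, 2, 3]))  # 1: every node serves one block
```

Under this simplified model, spreading the blocks of one IO over distinct nodes halves the bottleneck in the four-block example.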

Description

Technical field

[0001] The invention belongs to the technical field of data deduplication and provides a load-balancing-based data placement method for distributed deduplication systems (Data Deduplication System), which eliminates the load bottleneck when reading files and improves the read performance of the system.

Background technique

[0002] With the rapid development of information technology, big data and cloud computing have become mainstream. The explosive growth of data and the continuous improvement of computer performance place ever higher demands on storage systems, which face challenges in both capacity and performance.

[0003] Faced with rapidly growing data volumes, large-scale data centers need ever larger storage capacity. Blindly purchasing storage devices to increase capacity is not an effective way to solve the capacity problem. In addition, purchasing...

Application Information

IPC(8): G06F17/30
CPC: G06F16/1752
Inventor: 肖侬, 邓明翥, 陈志广, 刘芳, 张学成
Owner: NAT UNIV OF DEFENSE TECH