Dynamic copy management method based on HDFS

A copy management technology, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc., that improves concurrent performance and accuracy

Inactive Publication Date: 2014-03-12
LANGCHAO ELECTRONIC INFORMATION IND CO LTD

AI Technical Summary

Problems solved by technology

Specifically, it mainly sol

Examples

Embodiment 1

[0048] An HDFS-based dynamic copy management method comprises a copy placement strategy, a dynamic copy creation strategy and a dynamic copy deletion strategy. The copy placement strategy includes the placement strategy for the master copy and the default copies and the placement strategy for other copies. It is an active balancing strategy: it takes load balancing into account from the moment a copy is created and actively places each copy in the best position, namely the position with the lightest load, so as to eliminate the potential risk of load imbalance as far as possible. Rather than placing a newly created copy arbitrarily anywhere in the storage system, the strategy determines the best location from the computing power of each storage node and the number of data blocks it already stores.

Embodiment 2

[0050] On the basis of Embodiment 1, the placement strategy for the master copy and the default copies in this embodiment is as follows: for each data block in HDFS, when the file is written into the file system there are by default one master copy and two default copies. The master copy and one of the default copies are stored on the local rack (the cluster behind the same router as the location from which the file is uploaded), and the other default copy is placed on any rack other than the local rack.
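The rack-level rule above can be sketched in a few lines of Python. This is only an illustration, not the patent's implementation: the function name `plan_initial_copies`, the `Placement` record and the rack/node names are hypothetical, and the choice of which node inside a rack receives a copy is deliberately left arbitrary here because this embodiment does not specify it.

```python
from typing import Dict, List, NamedTuple


class Placement(NamedTuple):
    role: str  # "master" or "default"
    rack: str
    node: str


def plan_initial_copies(racks: Dict[str, List[str]], local_rack: str) -> List[Placement]:
    """Plan one master copy and two default copies for a data block.

    Hypothetical helper: the master copy and one default copy go to the
    local rack, the other default copy goes to some other rack.
    """
    local_nodes = racks[local_rack]
    if len(local_nodes) < 2:
        raise ValueError("local rack needs at least two nodes")

    # Any non-empty rack other than the local one is acceptable; sorting
    # only makes this sketch deterministic.
    remote_racks = sorted(r for r in racks if r != local_rack and racks[r])
    if not remote_racks:
        raise ValueError("need at least one non-empty remote rack")
    remote_rack = remote_racks[0]

    return [
        Placement("master", local_rack, local_nodes[0]),
        Placement("default", local_rack, local_nodes[1]),
        Placement("default", remote_rack, racks[remote_rack][0]),
    ]


racks = {"rack-a": ["a1", "a2", "a3"], "rack-b": ["b1", "b2"]}
plan = plan_initial_copies(racks, "rack-a")
for p in plan:
    print(p.role, p.rack, p.node)
```

Keeping two copies on the local rack keeps most write traffic behind one router, while the third copy on a different rack protects the block against the loss of a whole rack.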

Embodiment 3

[0052] On the basis of Embodiment 2, this embodiment selects a machine within a rack according to two parameters:

[0053] the number of data blocks already stored;

[0054] the CPU processing performance.

[0055] Let the number of data blocks stored on the i-th machine be N_i and its CPU processing performance be CA_i, and let the variable P be computed from these two quantities, where k_1 and k_2 are constant coefficients (the expression defining P is missing from this text). Calculate the P value of every node in the local rack and select the two machines with the smallest P values to hold the master copy and one of the default copies; then calculate the P value of every node in the remote rack and select the machine with the smallest P value to hold the other default copy. During selection, machines that already hold a copy of this data block are skipped. Free space is checked at the same time, and machines without enough space to store the copy are also skipped.
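The selection procedure in [0055] can be sketched as follows. The text above does not give the expression for P, so the scoring function is passed in as a parameter; `example_score` is purely an assumption for illustration (more stored blocks and lower CPU performance both raise the score), and the `Node` record, `pick_lightest` and all values are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class Node:
    name: str
    stored_blocks: int      # N_i
    cpu_performance: float  # CA_i
    free_bytes: int


def example_score(node: Node, k1: float = 1.0, k2: float = 1.0) -> float:
    # ASSUMPTION: this formula is NOT taken from the patent text, which
    # omits the definition of P. It is one plausible load score only.
    return k1 * node.stored_blocks + k2 / node.cpu_performance


def pick_lightest(
    nodes: Iterable[Node],
    count: int,
    block_size: int,
    holders: Set[str],
    score: Callable[[Node], float] = example_score,
) -> List[Node]:
    """Return the `count` eligible nodes with the smallest P value."""
    candidates = [
        n for n in nodes
        if n.name not in holders        # skip nodes that already hold a copy
        and n.free_bytes >= block_size  # skip nodes without enough space
    ]
    return sorted(candidates, key=score)[:count]


BLOCK = 128 * 2**20  # hypothetical block size in bytes
local_rack = [
    Node("a1", 100, 2.0, 10**9),
    Node("a2", 10, 2.0, 10**9),
    Node("a3", 5, 2.0, 10),      # too little free space
    Node("a4", 1, 2.0, 10**9),   # already holds a copy
    Node("a5", 50, 4.0, 10**9),
]
chosen = pick_lightest(local_rack, count=2, block_size=BLOCK, holders={"a4"})
print([n.name for n in chosen])
```

Calling the same function with `count=1` on the nodes of the remote rack would pick the machine for the other default copy.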

Abstract

The invention relates to a dynamic copy management method based on HDFS. The method adopts a copy placement strategy, a dynamic copy creation strategy and a dynamic copy deletion strategy. The copy placement strategy comprises a placement strategy for the master copy and the default copies and a placement strategy for other copies, and it is an active balancing strategy: load balancing is fully considered from the moment a copy is created, and each copy is actively placed at the position where the load is lightest. To address the low efficiency of HDFS in processing hot spot data, the method creates HDFS copies dynamically according to hot spot data conditions. The judgment of hot spot data is combined with N historical records to improve accuracy, copy creation is made predictable, and a dynamic copy deletion strategy is provided, so that the concurrent performance of the whole HDFS cluster can be effectively improved.

Description

Technical field

[0001] The invention relates to HDFS in the current big data Hadoop ecosystem, and in particular to a dynamic copy management method based on HDFS.

Technical background

[0002] The Hadoop Distributed File System, HDFS for short, is a distributed file system. GFS, the Google File System, is a dedicated file system designed by Google to store massive search data.

[0003] With the rapid development of the Internet, the amount of data has increased exponentially. To adapt to this situation, many large server architectures such as data centers and cloud computing have emerged. In big data processing, Google's GFS provides an effective method for handling large files, and HDFS, the file system under Hadoop, is an open source implementation of GFS that realizes most of its functions and is widely used in big data processing at this stage. A distributed parallel file system is used, then in the parallel file system, re...

Application Information

IPC(8): G06F17/30
CPC: G06F16/184
Inventor: 孟祥飞, 孙志云, 吴楠
Owner LANGCHAO ELECTRONIC INFORMATION IND CO LTD