Copy management strategy for data blocks in HDFS

A copy (replica) management technique for HDFS data blocks, applied in the computer field. It addresses problems such as unpredictable user waiting times and unbalanced data placement, and it reduces waiting time and improves system throughput.

Active Publication Date: 2013-12-04
XI AN JIAOTONG UNIV
3 Cites, 27 Cited by

AI Technical Summary

Problems solved by technology

[0005] However, because of its random nature, this replica placement strategy leads to uneven data placement.
At the same time, cloud computing environments generate large amounts of data. These data and their replicas are stored in HDFS. Due to the imbala...



Examples


Embodiment Construction

[0028] The present invention will be described in detail below in conjunction with the accompanying drawings.

[0029] A copy (replica) management strategy for data blocks in HDFS comprises four parts: a method for selecting when the replica strategy starts, a method for selecting the number of replicas, a method for selecting replica positions, and a method for placing new replicas.

[0030] The start timing is selected with a fixed-period strategy: the replica strategy is started once per fixed period, checks how files have been accessed, and determines the number and location of replicas. In theory, the strategy needs to start only once per cycle. To minimize the impact of replica data copying on system performance, the best approach is to start the strategy and carry out the replication at the moment when the system load is lightest. The specific steps are: 1) Determine the start-up cy...
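The idea of starting at the lightest-load moment of a cycle can be illustrated with a minimal Python sketch. It assumes a one-day cycle sampled once per hour; the function name `lightest_load_hour` and the load values are hypothetical and are not taken from the patent, whose own steps are truncated in this listing.

```python
from typing import Sequence


def lightest_load_hour(hourly_load: Sequence[float]) -> int:
    """Return the hour (0-23) with the lowest load in a one-day cycle.

    hourly_load is an assumed input: one average load sample per hour.
    """
    if len(hourly_load) != 24:
        raise ValueError("expected one load sample per hour of the day")
    # Ties resolve to the earliest hour because min() keeps the first minimum.
    return min(range(24), key=lambda hour: hourly_load[hour])


# Hypothetical load profile: busy during the day, quietest around 03:00.
sample_load = [0.30, 0.22, 0.15, 0.10, 0.12, 0.20, 0.35, 0.50,
               0.65, 0.80, 0.85, 0.90, 0.88, 0.86, 0.84, 0.82,
               0.78, 0.75, 0.70, 0.66, 0.60, 0.52, 0.45, 0.38]

start_hour = lightest_load_hour(sample_load)
print(start_hour)  # 3
```

A scheduler could then trigger the replica strategy once per day at `start_hour`, so that the extra copy traffic falls in the quietest window.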


Abstract

A copy (replica) management strategy for data blocks in HDFS comprises a method for selecting the start time of the replica strategy, a method for selecting the number of replicas, a method for selecting replica positions, and a method for placing newly added replicas. To select the start time, a start-up cycle is determined first, with one day treated as an access cycle in which the replica strategy is started, and the start moment is then determined. To select the number of replicas, the calculation adopts a Poisson distribution and then applies a replica-count calculation method based on queuing theory. To select replica positions, a replica placement strategy is applied when choosing racks and nodes, taking their utilization into account. To place newly added replicas, the rack whose nodes are accessed most by users is chosen, and the most lightly loaded node in that rack is selected to create the replica. The strategy proposes an HDFS copy-first strategy, which can reduce the waiting time when users access HDFS data and improve system throughput.
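The abstract names Poisson arrivals and queuing theory for the replica count but does not give the formula here. As a rough illustration only, the sketch below swaps in a standard M/M/c model (Erlang C) and treats each replica as one server; this model choice, the function names, the waiting-time threshold, and the lower bound of 3 replicas are all assumptions rather than the patent's method. The second function follows the stated placement rule for new replicas: the most accessed rack, then its most lightly loaded node.

```python
import math
from typing import Dict, Tuple


def erlang_c_wait_probability(servers: int, offered_load: float) -> float:
    """Probability that a request has to wait in an M/M/c queue (Erlang C)."""
    if offered_load >= servers:
        return 1.0  # unstable queue: requests pile up
    partial = sum(offered_load ** k / math.factorial(k) for k in range(servers))
    tail = (offered_load ** servers / math.factorial(servers)) * (
        servers / (servers - offered_load))
    return tail / (partial + tail)


def mean_wait(servers: int, arrival_rate: float, service_rate: float) -> float:
    """Mean queueing delay for Poisson arrivals and `servers` replicas."""
    offered_load = arrival_rate / service_rate
    if offered_load >= servers:
        return math.inf
    wait_prob = erlang_c_wait_probability(servers, offered_load)
    return wait_prob / (servers * service_rate - arrival_rate)


def replicas_needed(arrival_rate: float, service_rate: float, max_wait: float,
                    min_replicas: int = 3, max_replicas: int = 10) -> int:
    """Smallest replica count whose mean wait is within max_wait (assumed rule)."""
    for count in range(min_replicas, max_replicas + 1):
        if mean_wait(count, arrival_rate, service_rate) <= max_wait:
            return count
    return max_replicas  # cap reached; the cap itself is an assumption


def place_new_replica(rack_access_counts: Dict[str, int],
                      node_loads_by_rack: Dict[str, Dict[str, float]]
                      ) -> Tuple[str, str]:
    """Pick the most accessed rack, then the lightest-loaded node in it."""
    busiest_rack = max(rack_access_counts, key=rack_access_counts.get)
    nodes = node_loads_by_rack[busiest_rack]
    lightest_node = min(nodes, key=nodes.get)
    return busiest_rack, lightest_node


# Hypothetical numbers: 8 requests/s for a block, 2 requests/s per replica.
print(replicas_needed(arrival_rate=8.0, service_rate=2.0, max_wait=0.1))  # 6

racks = {"rack-a": 120, "rack-b": 340}
loads = {"rack-a": {"dn1": 0.4, "dn2": 0.2},
         "rack-b": {"dn3": 0.7, "dn4": 0.3}}
print(place_new_replica(racks, loads))  # ('rack-b', 'dn4')
```

The sketch does not check whether the chosen node already holds a replica of the block or has enough free space; a real placement routine would need both checks.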

Description

Technical field

[0001] The invention belongs to the technical field of computers, and in particular relates to a copy (replica) management strategy for data blocks in HDFS (Hadoop Distributed File System).

Background

[0002] Hadoop is a highly reliable, highly scalable platform for storage and distributed parallel computing developed by the Apache open source organization. It was first developed as the base platform of the open source search engine project Nutch, later became independent of Nutch, and is now one of the typical open source cloud computing platforms. The Hadoop core implements a block-based distributed file system (Hadoop Distributed File System, HDFS) and the MapReduce computing model for distributed computing.

[0003] HDFS uses a block mechanism to store data sets in a distributed manner and improves system reliability through a data block redundancy strategy. Each data block has multiple copies in the system at the same time...


Application Information

IPC(8): G06F17/30
Inventor 伍卫国樊源泉姚超魏伟高颜曹莹方段章峰朱霍
Owner XI AN JIAOTONG UNIV