HDFS-based replica balancing method

A replica balancing technology, applied in special data processing applications, instruments, and electrical digital data processing. It addresses the difficulty of keeping node configurations consistent across a cluster and improves load balancing capability and cluster performance.

Active Publication Date: 2014-09-24
UNIV OF ELECTRONICS SCI & TECH OF CHINA
3 Cites, 10 Cited by

AI Technical Summary

Problems solved by technology

In real large clusters, it is difficult to keep node configurations consistent across the cluster. A new HDFS-based replica balancing method is therefore needed.

Examples

Embodiment Construction

[0045] The HDFS-based replica balancing method of the present invention comprises three parts: cluster configuration, data collection, and execution of the Balancer program. The invention is described further below with reference to the drawings and embodiments.

[0046] 1) Cluster configuration

[0047] As shown in Figure 1, a Performance class is designed to represent a performance evaluation index of a DataNode, and it provides a getPerformance method for obtaining the corresponding performance data. A concrete performance class is defined for each performance indicator of the DataNode, and each one is a subclass of Performance: the CpuPerformance class obtains the CPU speed of the DataNode, the MemoryPerformance class obtains its memory capacity, and the DiskPerformance class obtains its disk capacity; modify the HDFS communi...
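The class hierarchy described in [0047] might be sketched in Java roughly as follows. Only the names Performance, getPerformance, CpuPerformance, MemoryPerformance and DiskPerformance come from the text. The return type, the way each value is measured (core count as a stand-in for CPU speed, JVM heap limit as a stand-in for memory capacity), and the PerformanceReport helper are assumptions made for illustration, not the patent's implementation.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Base type for one performance evaluation index of a DataNode.
abstract class Performance {
    // Returns the raw value of this indicator (the unit is an assumption per subclass).
    abstract double getPerformance();
}

class CpuPerformance extends Performance {
    @Override
    double getPerformance() {
        // Assumption: the number of available cores stands in for "CPU speed".
        return Runtime.getRuntime().availableProcessors();
    }
}

class MemoryPerformance extends Performance {
    @Override
    double getPerformance() {
        // Assumption: the JVM heap limit in MiB stands in for memory capacity.
        return Runtime.getRuntime().maxMemory() / (1024.0 * 1024.0);
    }
}

class DiskPerformance extends Performance {
    private final File volume;

    DiskPerformance(File volume) {
        this.volume = volume;
    }

    @Override
    double getPerformance() {
        // Total size in GiB of the partition that holds the given path.
        return volume.getTotalSpace() / (1024.0 * 1024.0 * 1024.0);
    }
}

// Hypothetical helper: gathers every indicator into one report keyed by class name.
final class PerformanceReport {
    static Map<String, Double> collect(List<Performance> indicators) {
        Map<String, Double> report = new LinkedHashMap<>();
        for (Performance p : indicators) {
            report.put(p.getClass().getSimpleName(), p.getPerformance());
        }
        return report;
    }
}
```

Keeping each indicator behind the same getPerformance method means a new indicator can be added as one more subclass without changing the code that gathers the values.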

Abstract

The invention discloses an HDFS-based replica balancing method. An abstract Performance class is defined in the cluster's configuration items, and the performance data of all DataNodes is collected through heartbeat messages. During data migration, DataNode matching must satisfy the node-matching rules of the existing Balancer program and must also take the DataNodes' performance index data into account. Each DataNode is evaluated by the ratio of its performance score to its storage space, and the DataNode with the best evaluation is matched with the DataNode with the worst evaluation. As a result, the amount of data each DataNode stores is directly proportional to its performance, which improves the load balancing capability of HDFS and the performance of the cluster. The performance differences between node configurations therefore do not need to be considered when the cluster is built.
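The evaluation and matching step in the abstract can be illustrated with a small Java sketch. It rests on several assumptions that the abstract does not state: the evaluation is computed as performance score divided by bytes already stored, the worst-evaluated node acts as the migration source and the best-evaluated node as the target, and the pairing is repeated inward over the sorted list. The NodeStat and ReplicaMatcher names are hypothetical, and the existing Balancer matching rules that the method also requires are left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical per-node summary built from the collected performance data.
final class NodeStat {
    final String name;
    final double performanceScore; // assumed aggregate of the node's indicators
    final double usedBytes;        // assumed meaning of "storage space"

    NodeStat(String name, double performanceScore, double usedBytes) {
        this.name = name;
        this.performanceScore = performanceScore;
        this.usedBytes = usedBytes;
    }

    // Higher value: the node stores little data relative to its performance.
    double evaluation() {
        return performanceScore / Math.max(usedBytes, 1.0); // guard against empty nodes
    }
}

final class ReplicaMatcher {
    // Returns {source, target} name pairs, worst evaluation paired with best.
    static List<String[]> match(List<NodeStat> nodes) {
        List<NodeStat> sorted = new ArrayList<>(nodes);
        sorted.sort(Comparator.comparingDouble(NodeStat::evaluation));
        List<String[]> pairs = new ArrayList<>();
        int worst = 0;
        int best = sorted.size() - 1;
        while (worst < best) {
            pairs.add(new String[] { sorted.get(worst).name, sorted.get(best).name });
            worst++;
            best--;
        }
        return pairs;
    }
}
```

Under these assumptions, data moves from nodes that hold too much for their performance toward nodes that hold too little, which pushes stored data toward being proportional to performance.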

Description

Technical field

[0001] The invention relates to the technical field of data processing and control, and in particular to an HDFS-based replica balancing method.

Background

[0002] In recent years, with the advent of the Web 2.0 era marked by user-generated content, applications such as blogs, SNS, P2P, IM, pictures, and videos have developed rapidly, and information services have become closer to people's lives. Data on the Internet is growing geometrically. Although the powerful computing and storage capabilities of supercomputers can meet the computing and storage requirements of Internet data, their cost is very high and they are difficult to deploy widely. To overcome the storage, computing, memory, and other limits of a single machine at low cost, attention has turned to distributed systems. Common distributed computing projects use idle computers to transmit data via the Internet or Ethernet, allowing each com...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/119, G06F16/182
Inventor: 罗光春, 田玲, 陈爱国, 舒康
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA