HDFS load source and sink node selection method based on multiple measurement indexes

A source and sink node selection technique for HDFS load balancing, based on multiple metrics. It addresses problems such as inaccurate selection of HDFS source and sink nodes and the resulting cluster performance degradation, making load migration more reasonable and accurate and improving work efficiency.

Inactive Publication Date: 2015-10-14
SICHUAN UNIV
5 Cites, 12 Cited by

AI Technical Summary

Problems solved by technology

[0056] To address the deficiencies of the prior art, the present invention provides a method for selecting HDFS load source and sink nodes based on multiple measurement indicators. It effectively solves the problem of cluster performance degradation caused by inaccurate selection of HDFS source and sink nodes.

Examples

Embodiment 1

[0091] Embodiment 1: When using the AHP method to quantify the actual workload of the server, the main steps are as follows:

[0092] (1) Build a server load hierarchy model (as shown in Figure 1);

[0093] (2) Construct a judgment matrix of the importance of each factor or index;

[0094] $A_1 = \begin{bmatrix} U_{11} & U_{12} & U_{13} & U_{14} & U_{15} \\ U_{21} & U_{22} & \dots \end{bmatrix}$
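To make steps (1) and (2) more concrete, the following is a minimal sketch of how indicator weights can be derived from an AHP judgment matrix. It is not the patent's own procedure: it uses the row geometric-mean approximation of the priority vector and Saaty's consistency ratio, and the 3x3 matrix values, the function names `ahp_weights` and `consistency_ratio`, and the choice of three indicators are all hypothetical.

```python
import math

# Random consistency index (RI) by matrix order, from Saaty's commonly cited table.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(matrix):
    """Approximate the AHP priority vector with the row geometric-mean method."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(matrix, weights):
    """Return CR = CI / RI; a CR below 0.1 is conventionally treated as acceptable."""
    n = len(matrix)
    if n <= 2:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    # Estimate the principal eigenvalue as the mean of (A w)_i / w_i.
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


# Hypothetical pairwise judgments for three indicators (not taken from the patent).
judgment = [
    [1.0, 3.0, 5.0],
    [1.0 / 3.0, 1.0, 3.0],
    [1.0 / 5.0, 1.0 / 3.0, 1.0],
]
weights = ahp_weights(judgment)
cr = consistency_ratio(judgment, weights)
```

With these illustrative judgments the first indicator receives the largest weight, and the consistency ratio stays under the conventional 0.1 threshold, so the matrix would normally be accepted without revision.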

Embodiment 2

[0120] Figure 1 is a model diagram of the server load quantification hierarchy in the present invention.

[0121] In previous studies, the load of a server in a cluster has been estimated from one indicator or a combination of several. The main indicators are as follows:

[0122] ● Storage usage

[0123] ● Disk I/O access rate

[0124] ● Service response time

[0125] ● CPU utilization

[0126] ● Memory usage

[0127] ● Number of tasks

[0128] ● Response delay time of network communication

[0129] ● Virtual memory usage

[0130] ● Cumulative processing time of currently active tasks

[0131] ● CPU temperature

[0132] ● Network bandwidth usage

[0133] ● Failure time

[0134] In the present invention, balancing the distributed cluster system means balancing data; that is, it concerns only file operations in HDFS, including uploading and downloading files. It follows that, in this scenario, the main pressure on the cluster system ...
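As an illustration of how a set of indicators can be reduced to a single load value, here is a minimal sketch that takes a weighted sum of indicators already normalised to the range [0, 1]. The text above is truncated before it names the indicators the invention actually uses, so the three indicator names, their weights, the sample readings, and the function name `load_score` are assumptions chosen only for the example.

```python
def load_score(metrics, weights):
    """Weighted sum of indicators that are already normalised to [0, 1]."""
    if set(metrics) != set(weights):
        raise ValueError("metrics and weights must cover the same indicators")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * metrics[name] for name in weights)


# Hypothetical weights (for example, produced by an AHP step) and one node's readings.
example_weights = {
    "storage_usage": 0.5,
    "disk_io_rate": 0.3,
    "network_bandwidth_usage": 0.2,
}
example_metrics = {
    "storage_usage": 0.8,
    "disk_io_rate": 0.5,
    "network_bandwidth_usage": 0.25,
}
score = load_score(example_metrics, example_weights)
```

Requiring the weights to sum to 1 keeps the score in [0, 1], which makes load values comparable across nodes.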

Embodiment 3

[0138] Figure 2 is a flow chart of the naive Bayes-based load migration strategy in the present invention.

[0139] Its main steps are as follows:

[0140] 1) The master node collects the load information of each node and saves it in a file.

[0141] 2) A classifier is trained with the NB (naive Bayes) algorithm on the historical load information of the nodes. The classifier has three classes: overload, balance, and idle, and each class has 8 characteristic attributes. The classification thresholds of these 8 attribute values are shown in Figure 4.

[0142] 3) Once the classifier is generated, it is used to compute the class of each node, and the results are written to a file from which the load balancer selects the source and sink nodes.

[0143] 4) The balancer starts and reads the classification result file.

[0144] 5) The balancer divides the nodes into three queues according to the classification results, and the queues are so...
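Steps 2) to 5) can be sketched roughly as follows. This is an illustrative categorical naive Bayes classifier with Laplace smoothing, not the patent's implementation. For brevity it uses two discretised attributes instead of the eight mentioned above, and the training samples, node names, load values, and the sort direction inside each queue are all assumptions (the source text is cut off before it states the ordering).

```python
import math
from collections import Counter, defaultdict

CLASSES = ("overload", "balance", "idle")


def train_nb(samples):
    """Count class and per-attribute value frequencies from (features, label) pairs."""
    class_counts = Counter(label for _, label in samples)
    value_counts = defaultdict(Counter)  # (label, attribute index) -> value counts
    values_seen = defaultdict(set)       # attribute index -> distinct values
    for features, label in samples:
        for i, value in enumerate(features):
            value_counts[(label, i)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen


def classify(model, features):
    """Return the class with the highest Laplace-smoothed log posterior."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_logp = None, -math.inf
    for label in CLASSES:
        count = class_counts.get(label, 0)
        logp = math.log((count + 1) / (total + len(CLASSES)))
        for i, value in enumerate(features):
            k = max(len(values_seen[i]), 1)
            logp += math.log((value_counts[(label, i)][value] + 1) / (count + k))
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


def build_queues(nodes, model):
    """Split nodes into three queues by class, then sort each queue by load."""
    queues = {label: [] for label in CLASSES}
    for node in nodes:
        queues[classify(model, node["features"])].append(node)
    # Assumed ordering: heaviest first for overload and balance, lightest first for idle.
    queues["overload"].sort(key=lambda n: n["load"], reverse=True)
    queues["balance"].sort(key=lambda n: n["load"], reverse=True)
    queues["idle"].sort(key=lambda n: n["load"])
    return queues


# Hypothetical history: (disk I/O level, storage level) -> observed class.
history = [
    (("high", "high"), "overload"), (("high", "mid"), "overload"),
    (("mid", "mid"), "balance"), (("mid", "low"), "balance"),
    (("low", "low"), "idle"), (("low", "mid"), "idle"),
]
model = train_nb(history)
nodes = [
    {"name": "dn1", "features": ("high", "high"), "load": 0.90},
    {"name": "dn2", "features": ("high", "mid"), "load": 0.95},
    {"name": "dn3", "features": ("low", "low"), "load": 0.10},
    {"name": "dn4", "features": ("low", "mid"), "load": 0.20},
    {"name": "dn5", "features": ("mid", "mid"), "load": 0.50},
]
queues = build_queues(nodes, model)
```

Sorting inside each queue by a separately quantified load value is what lets a balancer pick the most extreme nodes first, even though the classifier itself only outputs a class label.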

Abstract

The present invention discloses an HDFS load source and sink node selection method based on multiple measurement indexes and belongs to the field of Internet data storage. The method comprises the following steps: 1, quantifying the load values of the data nodes with an AHP method; 2, classifying the data nodes with a naive Bayes algorithm and sorting the nodes within each class according to the actual load values obtained in step 1; and 3, selecting source and sink nodes according to a defined node selection strategy. The method has the following beneficial effects: it effectively solves the problem of reduced cluster performance caused by inaccurate selection of HDFS source and sink nodes, gives the HDFS cluster a better balancing effect, reduces how often the HDFS cluster needs load balancing, reduces the cluster resources consumed by load balancing, and effectively improves the overall performance of the HDFS cluster.
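Step 3 refers to a defined node selection strategy that this excerpt does not spell out. Purely as an illustration of what such a strategy could look like, the sketch below greedily pairs the most heavily loaded overloaded node (source) with the most lightly loaded idle node (sink). The greedy rule, the function name `select_pairs`, and the sample nodes are assumptions, not the patented strategy.

```python
def select_pairs(overloaded, idle):
    """Greedily pair the heaviest overloaded sources with the lightest idle sinks.

    Each node is a (name, load) tuple; unmatched nodes are left out of the result.
    """
    sources = sorted(overloaded, key=lambda node: node[1], reverse=True)
    sinks = sorted(idle, key=lambda node: node[1])
    return [(src[0], dst[0]) for src, dst in zip(sources, sinks)]


# Hypothetical nodes with quantified load values.
pairs = select_pairs(
    overloaded=[("dn1", 0.90), ("dn2", 0.95)],
    idle=[("dn4", 0.20), ("dn3", 0.10)],
)
```

Here `dn2`, the heaviest node, is matched with `dn3`, the lightest, so the largest imbalance is addressed first.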

Description

Technical field

[0001] The invention belongs to the field of Internet data storage, and in particular relates to a method for selecting HDFS load source and sink nodes based on multiple measurement indicators.

Background technique

[0002] In recent years, as society has become more information-driven and Internet technology has developed rapidly, more and more people use the Internet to obtain information, shop, and find entertainment. The resulting massive data places higher demands on the servers of data storage and processing centers, in particular on how to store and process that data. Cloud computing and cloud storage came into use in this context, and Hadoop is one of the relatively mature cloud computing platforms with good development momentum. Some large enterprises and research institutions at home and abroad are using the working mechanism of Hadoop to develop and build their own cloud computing pla...

Application Information

IPC(8): G06F9/50, G06F17/30
Inventor: 刘晓洁, 康承昆, 林平
Owner: SICHUAN UNIV