Data placement method based on distributed cluster

A data placement technology for distributed clusters, classified under electrical components and transmission systems. It addresses performance loss and longer recovery times during data recovery, and the effect of node computing power on performance. Its effects are preventing waste of resources, ensuring load balance, and ensuring transmission efficiency.

Status: Inactive. Publication date: 2014-02-19
LANGCHAO ELECTRONIC INFORMATION IND CO LTD
3 cites, 29 cited by

AI Technical Summary

Problems solved by technology

However, when a remote node is too far from the local node, data recovery can take unnecessarily long, and selecting nodes at random cannot guarantee that stored data is balanced across nodes.
Because node failures are frequent in the system, selecting remote nodes at random causes unnecessary performance loss during data recovery, degrading the performance of the entire storage system.
In addition, the network distance of remote data copies, the data load of each node, and the computing power of each node all affect performance.




Embodiment Construction

[0029] With reference to the accompanying drawings, a specific example is used below to describe how the distributed cluster-based data placement method of the present invention is implemented.

[0030] First, deploy a distributed cluster environment: install the Hadoop components on the CentOS 6.3 operating system according to the official documentation, then enable the HDFS and MapReduce services. The nodes in rack 1 have ordinary computing capability, and the nodes in racks 2 and 3 have fast computing capability; each rack contains 5 DataNode nodes. The flow of the data placement method for the distributed cluster is shown in Figure 1. When a user submits a data storage request, nodes are first selected from different racks, and it is then determined whether the number of nodes obtained has reached the selected fixed value. On entering the data placement evaluation module, the distance information of the current node is calculated first, the numbe...
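The first stage of this flow (collect nodes from different racks until a fixed number is reached) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the node names, the `select_candidates` helper, and the round-robin order across racks are hypothetical, and the fixed value is passed in as a parameter because the text does not state it. The evaluation step that follows in the flow is truncated in the source and is not modeled here.

```python
from itertools import zip_longest

# Hypothetical topology mirroring the embodiment: 3 racks with 5 DataNodes
# each (rack 1 ordinary compute, racks 2 and 3 fast compute).
CLUSTER = {f"/rack{r}": [f"dn{r}-{i}" for i in range(1, 6)] for r in (1, 2, 3)}


def select_candidates(cluster: dict[str, list[str]], fixed_value: int) -> list[str]:
    """Collect nodes in round-robin order over racks, so consecutive
    candidates come from different racks, and stop once the count reaches
    fixed_value (a hypothetical stand-in for the "selected fixed value")."""
    if fixed_value <= 0:
        return []
    candidates = []
    # zip_longest yields one node per rack per round; None pads shorter racks.
    for round_nodes in zip_longest(*cluster.values()):
        for node in round_nodes:
            if node is None:
                continue
            candidates.append(node)
            if len(candidates) == fixed_value:
                return candidates
    return candidates  # the cluster has fewer nodes than fixed_value


print(select_candidates(CLUSTER, 4))  # → ['dn1-1', 'dn2-1', 'dn3-1', 'dn1-2']
```

With the embodiment's 15 DataNodes, asking for 4 candidates takes one node from each rack before returning to rack 1, which is one simple way to keep early candidates rack-diverse.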


Abstract

The invention discloses a data placement method based on a distributed cluster. To address the problem that node load, the computing power of compute nodes, and the movement of massive data all affect job performance, the method combines these three factors to compute an evaluation value for data placement and then selects nodes according to that value. The method has the following advantages: data placement is load-balanced, which improves parallelism during data reads and writes; node computing power is used well, because computation tasks are distributed according to each node's computing power, which reduces job run time; and transmission performance is good, because data is stored on nearby compute nodes, which minimizes data transfer and improves efficiency.
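The abstract states only that load, computing power, and data movement are combined into one evaluation value; it does not give the formula. The sketch below assumes a weighted sum of three normalized terms (nearness, free capacity, relative computing power) and picks the highest-scoring nodes. The `Node` fields, the weights, the example values, and both function names are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    distance: int   # network distance from the requesting node (hops)
    load: float     # fraction of storage already used, 0.0 to 1.0
    compute: float  # relative computing power (1.0 ordinary, 2.0 fast)


# Hypothetical weights: the abstract does not publish the actual formula.
W_DISTANCE, W_LOAD, W_COMPUTE = 0.4, 0.3, 0.3


def evaluation_value(node: Node, max_distance: int, max_compute: float) -> float:
    """Weighted sum of three terms, each scaled to [0, 1], higher is better."""
    nearness = 1.0 - node.distance / max_distance if max_distance else 1.0
    free_space = 1.0 - node.load
    power = node.compute / max_compute if max_compute else 0.0
    return W_DISTANCE * nearness + W_LOAD * free_space + W_COMPUTE * power


def choose_nodes(nodes: list[Node], k: int) -> list[str]:
    """Return the names of the k nodes with the highest evaluation value."""
    max_d = max(n.distance for n in nodes)
    max_c = max(n.compute for n in nodes)
    ranked = sorted(nodes, key=lambda n: evaluation_value(n, max_d, max_c), reverse=True)
    return [n.name for n in ranked[:k]]


# Illustrative candidates as seen from a client running on dn1-1.
NODES = [
    Node("dn1-1", distance=0, load=0.9, compute=1.0),
    Node("dn1-2", distance=2, load=0.5, compute=1.0),
    Node("dn2-1", distance=4, load=0.2, compute=2.0),
    Node("dn3-1", distance=4, load=0.1, compute=2.0),
]
print(choose_nodes(NODES, 2))  # → ['dn1-1', 'dn3-1']
```

With these weights the nearly full local node still wins narrowly on nearness, and the emptier fast node in rack 3 outranks the fast node in rack 2. Changing the weights shifts the balance between locality, load balancing, and computing power.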

Description

Technical field

[0001] The invention relates to a data placement method based on a distributed cluster.

Technical background

[0002] With the continuous development of Internet technology and the rapid growth of online information, the ability to process large-scale data sets efficiently and reliably is crucial to the development of the Internet. MapReduce is a parallel programming framework that makes such programs easy to write: massive data can be processed with MapReduce in a Hadoop cluster, improving efficiency through parallelism. However, because the input of a MapReduce job is usually very large, a job whose data is spread across different racks must move a large amount of data, which hurts job performance. Data should therefore be placed close to the computing nodes to reduce the performance loss caused by moving large amounts of data. Therefore, the data placement method of the distributed cluster is very i...
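The locality argument above rests on network distance between nodes. The sketch below computes distance in the style of Hadoop's rack-aware topology: count the hops from each node up to their closest common ancestor in a location path such as `/rack1/dn1-1`, which gives 0 for the same node, 2 for the same rack, and 4 for different racks. The rack and node names and the `closest_replica` helper are illustrative assumptions, and this is not Hadoop's own `NetworkTopology` code.

```python
def network_distance(a: str, b: str) -> int:
    """Sum of hops from a and b up to their closest common ancestor.

    Locations are topology paths such as '/rack1/dn1-1'."""
    pa = a.strip("/").split("/")
    pb = b.strip("/").split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def closest_replica(reader: str, replicas: list[str]) -> str:
    """Pick the replica nearest to the reading node (first one wins on ties)."""
    return min(replicas, key=lambda r: network_distance(reader, r))


print(network_distance("/rack1/dn1-1", "/rack1/dn1-1"))  # → 0
print(network_distance("/rack1/dn1-1", "/rack1/dn1-2"))  # → 2
print(network_distance("/rack1/dn1-1", "/rack2/dn2-1"))  # → 4
```

A job that reads its input from a replica with a small distance moves less data across racks, which is the performance concern paragraph [0002] describes.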


Application Information

Patent type & authority: Application (China)
IPC(8): H04L29/08
Inventors: 郭美思 (Guo Meisi), 王秀娟 (Wang Xiujuan)
Owner: LANGCHAO ELECTRONIC INFORMATION IND CO LTD