Data transmission method and system based on Hadoop

A Hadoop-based data transmission method and system, applied in the field of big data processing systems, addresses the slow transmission of intermediate results and achieves the effects of reducing system storage overhead, improving transmission efficiency, and shortening execution time.

Active Publication Date: 2016-09-21
INST OF COMPUTING TECHNOLOGY - CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

However, this patent focuses on taking advantage of the network bandwidth advantages of the storage network to accelerate the transmission efficiency of i




Example Embodiment

[0049] In order to make the objectives, technical solutions and advantages of the present invention clearer, the following describes the Hadoop-based data transmission method and system of the present invention with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are only used to explain the present invention, but not to limit the present invention.

[0050] In order to reduce the waiting time of the Reduce task, increase the parallelism between Map tasks and Reduce tasks, and reduce system storage overhead, the present invention changes the approach of the background art, in which each Map task generates its own intermediate result file. Instead, all Map tasks running on the same server use the same intermediate result file to store their intermediate results. With the use of index files, the intermediate results can be transmitted during the execution of the Map task, and the intermediate results are activ...
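The shared intermediate result file and its index can be pictured with a minimal Python sketch. This is an illustration only, not the patented implementation: the class `SharedIntermediateFile`, the `IndexEntry` fields, and the in-memory buffer standing in for an on-disk file are all assumptions introduced here.

```python
import io
from dataclasses import dataclass


@dataclass
class IndexEntry:
    """Hypothetical index record; the field layout is an assumption."""
    map_id: int      # Map task that produced the record
    partition: int   # Reduce task that should receive it
    offset: int      # byte offset of the record in the shared file
    length: int      # record length in bytes


class SharedIntermediateFile:
    """One intermediate result file shared by all Map tasks on a server."""

    def __init__(self):
        self.data = io.BytesIO()  # stands in for the on-disk file
        self.index = []           # stands in for the index file

    def append(self, map_id, partition, record):
        # Always write at the end, then record where the bytes landed.
        offset = self.data.seek(0, io.SEEK_END)
        self.data.write(record)
        self.index.append(IndexEntry(map_id, partition, offset, len(record)))

    def read(self, entry):
        # The index entry is enough to locate a record without scanning.
        self.data.seek(entry.offset)
        return self.data.read(entry.length)


shared = SharedIntermediateFile()
shared.append(map_id=0, partition=1, record=b"apple\t1\n")
shared.append(map_id=1, partition=0, record=b"kiwi\t1\n")
shared.append(map_id=0, partition=1, record=b"apple\t1\n")

# Records destined for Reduce task 1, located through the index alone.
for_reduce_1 = [shared.read(e) for e in shared.index if e.partition == 1]
```

Because every record is located through the index, records from different Map tasks can be interleaved in one file and still be read out per Reduce task.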



Abstract

The invention discloses a Hadoop-based data transmission method and system. The method comprises: an intermediate result file generation step, in which an intermediate result file is established to store, in real time, all intermediate results generated by a Map task; an index establishment step, in which an index file is established and updated in real time according to the intermediate result file; and a transmission step, in which, according to the index file, untransmitted intermediate results are actively sent to the corresponding Reduce task when untransmitted intermediate results exist in the intermediate result file and that Reduce task has been started. The invention shortens Hadoop task execution time, increases the degree of parallelism between Map tasks and Reduce tasks, raises system resource utilization, and reduces system storage cost.
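The transmission step can be sketched in Python as follows. This is a hedged illustration under stated assumptions, not the method as claimed: the function `push_pending`, the `(partition, payload)` index layout, the `transmitted` set, and the `send` callback are hypothetical names chosen for this example.

```python
from collections import defaultdict


def push_pending(index, transmitted, started_reducers, send):
    """Actively send untransmitted entries whose Reduce task has started.

    index            -- list of (partition, payload) tuples in append order
    transmitted      -- set of index positions that were already sent
    started_reducers -- set of partitions whose Reduce task is running
    send             -- callback taking (partition, payload)
    """
    pushed = 0
    for position, (partition, payload) in enumerate(index):
        if position in transmitted:
            continue  # already sent in an earlier pass
        if partition not in started_reducers:
            continue  # Reduce task not started yet, so keep it pending
        send(partition, payload)
        transmitted.add(position)
        pushed += 1
    return pushed


received = defaultdict(list)


def send(partition, payload):
    # Stand-in for a network transfer to the Reduce task.
    received[partition].append(payload)


index = [(0, "a,1"), (1, "b,1")]
transmitted = set()

first = push_pending(index, transmitted, {0}, send)      # only Reduce task 0 is up
index.append((0, "a,1"))                                 # the Map task keeps producing
second = push_pending(index, transmitted, {0, 1}, send)  # Reduce task 1 has now started
```

The second pass sends both the entry that was held back for Reduce task 1 and the newly appended entry, which is the sense in which transmission overlaps with Map execution.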

Description

Technical field

[0001] The invention relates to the field of big data processing systems, in particular to a Hadoop-based data transmission method and system.

Background art

[0002] To make the present invention easier to understand, several terms are first explained as follows:

[0003] Hadoop system: a distributed system infrastructure developed by the Apache Foundation, with which users can develop distributed programs without knowing the underlying details of the distributed system. Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS).

[0004] MapReduce computing framework: a software framework for parallel processing of large data sets based on the Hadoop distributed file system; together with HDFS, it constitutes the two core components of Hadoop. The MapReduce computing framework runs Map tasks and Reduce tasks implemented by users.

[0005] Figure 1 is a schematic diagram of the data processing flow of the Hadoop system. ...
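As a rough illustration of the user-implemented Map and Reduce tasks mentioned in [0004], the following Python sketch runs a word count in a single process. It does not use the Hadoop API; the names `map_task`, `reduce_task`, and `run_job` are assumptions made for this example, and the in-memory grouping stands in for the intermediate results that Hadoop would move between tasks.

```python
from collections import defaultdict


def map_task(line):
    """User-defined Map: emit a (word, 1) pair for every word in the line."""
    return [(word, 1) for word in line.split()]


def reduce_task(key, values):
    """User-defined Reduce: sum all counts collected for one word."""
    return key, sum(values)


def run_job(lines):
    # Group the Map output by key; this grouping plays the role of the
    # intermediate results passed from Map tasks to Reduce tasks.
    grouped = defaultdict(list)
    for line in lines:
        for key, value in map_task(line):
            grouped[key].append(value)
    return dict(reduce_task(key, values) for key, values in grouped.items())


counts = run_job(["big data", "big cluster"])
```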


Application Information

IPC(8): G06F9/50
CPC: G06F9/5066
Inventor: 曹政, 郭嘉梁, 李强
Owner: INST OF COMPUTING TECHNOLOGY - CHINESE ACAD OF SCI