
Data equalization processing method, device and system based on map and reduce (MapReduce)

A map-and-reduce data balancing technology, applied in the field of MapReduce-based data balancing processing, which addresses problems such as extended data processing time, reduced data processing efficiency, and ineffective use of TaskTracker resources, and achieves effects such as reducing total processing time, reducing load imbalance, and improving efficiency.

Active Publication Date: 2016-06-15
TENCENT CLOUD COMPUTING BEIJING CO LTD
Cites: 3 | Cited by: 0

AI Technical Summary

Problems solved by technology

In practical applications, the time required for data processing (the time at which the task ends) is determined by the Reducer with the longest processing time. Because the Reducers' load is unbalanced, the time for the system to process data is therefore extended. For example, Reducer1 in the system may still be processing data while Reducer2 is already idle, so TaskTracker resources cannot be used effectively and the efficiency of data processing is reduced.
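As a simple illustration of why this matters (this formula is our own, not from the patent text): with R Reducers, the reduce stage ends only when the slowest Reducer ends, so

$$T_{\text{job}} \approx \max_{1 \le i \le R} T_i,$$

where $T_i$ grows with the volume of intermediate data assigned to Reducer $i$. In the example above, Reducer2 sitting idle while Reducer1 is still working means the maximum, and hence the total time, is larger than necessary; balancing the assigned data lowers that maximum.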




Embodiment Construction

[0071] In order to make the purpose, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0072] In the prior art, the intermediate results produced by a Mapper are stored in RAM; when the intermediate result data stored in RAM is merged and partitioned, it is processed according to a specified number of Reduce partitions and output to the corresponding buffers, so that the intermediate result data load across the input buffers is unbalanced.

[0073] In the embodiment of the present invention, before the intermediate result data stored in RAM is partitioned, fine-grained partition preprocessing is first performed on the intermediate result data to obtain the distribution of the intermediate result data across the fine-grained partitions, and, according to the obtained data distribution and a preset balance strategy, the i...
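The paragraph above is truncated, but its general idea (fine-grained partitioning followed by a balance-driven assignment of fine-grained partitions to reduce partitions) can be sketched as follows. This is an illustrative Python sketch only, not the patent's implementation; the greedy "largest partition to the currently lightest reducer" heuristic is an assumption standing in for the unspecified preset balance strategy.

```python
# Illustrative sketch: assign fine-grained partitions to reduce partitions so
# that the data volume per reducer is roughly balanced. The greedy largest-first
# strategy here is an assumption; the patent only states that a preset balance
# strategy is applied to the observed data distribution.
import heapq

def balance_partitions(fine_sizes, num_reducers):
    """fine_sizes: dict {fine_partition_id: data_volume}; returns {fine_id: reducer_id}."""
    # Min-heap of (current load, reducer_id), so the lightest reducer is always on top.
    heap = [(0, r) for r in range(num_reducers)]
    heapq.heapify(heap)
    mapping = {}
    # Place the largest fine-grained partitions first (classic LPT heuristic).
    for fine_id, size in sorted(fine_sizes.items(), key=lambda kv: kv[1], reverse=True):
        load, reducer = heapq.heappop(heap)
        mapping[fine_id] = reducer
        heapq.heappush(heap, (load + size, reducer))
    return mapping

# Example: 8 fine-grained partitions assigned to 2 reducers
# (the fine-grained partition count is greater than the reducer count).
sizes = {0: 50, 1: 10, 2: 40, 3: 5, 4: 30, 5: 20, 6: 25, 7: 20}
print(balance_partitions(sizes, num_reducers=2))
```

With these example sizes the greedy assignment gives each reducer a load of 100, whereas a naive hash of 8 partitions onto 2 reducers could leave one reducer far more loaded than the other.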



Abstract

The invention discloses a data balancing processing method, device and system based on map and reduce (MapReduce). The method includes: obtaining the data submitted by a client and performing initial partitioning on the obtained data according to a preset number of Mappers; performing mapping processing on the data in each initial partition to obtain intermediate result data; calling a partitioner function to perform fine-grained partitioning of the intermediate result data according to a preset number of fine-grained partitions, where the preset number of fine-grained partitions is greater than the number of reduce partitions; outputting the data volume information of the intermediate results in each fine-grained partition to a working server, and receiving from the working server the correspondence between the fine-grained partitions and the reduce partitions; merging the intermediate result data belonging to the same reduce partition across the fine-grained partitions and outputting it to the corresponding reduce partition; and performing reduce processing on the intermediate result data in the reduce partitions to obtain the corresponding data processing results. By applying the invention, the data load can be balanced and the efficiency of data processing can be improved.
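To make the flow in the abstract concrete, here is a minimal Python sketch of the mapper-side steps it describes: hash keys into F fine-grained partitions (with F greater than the number R of reduce partitions), report per-partition data volumes to a coordinating working server, then merge fine-grained partitions into their assigned reduce partitions using the returned correspondence. All names, and the hash-based partitioner, are illustrative assumptions rather than the patent's concrete implementation.

```python
# Illustrative mapper-side sketch of the flow described in the abstract.
# F (fine-grained partitions) > R (reduce partitions); the "working server"
# returns a fine-partition -> reduce-partition correspondence. All names here
# are hypothetical.
from collections import defaultdict

F = 16  # preset number of fine-grained partitions
R = 4   # number of reduce partitions (F > R)

def fine_partition(key):
    """Partitioner function: place a key into one of F fine-grained partitions."""
    return hash(key) % F

def map_side(records, get_correspondence):
    # 1. Fine-grained partitioning of the intermediate (key, value) results.
    fine = defaultdict(list)
    for key, value in records:
        fine[fine_partition(key)].append((key, value))

    # 2. Report the per-partition data volume to the working server and receive
    #    the fine-partition -> reduce-partition correspondence in return.
    sizes = {fid: len(pairs) for fid, pairs in fine.items()}
    correspondence = get_correspondence(sizes)

    # 3. Merge fine-grained partitions that belong to the same reduce partition.
    reduce_parts = defaultdict(list)
    for fid, pairs in fine.items():
        reduce_parts[correspondence[fid]].extend(pairs)
    return reduce_parts
```

Here `get_correspondence` stands in for the working server; it could, for instance, apply a balance strategy such as the greedy sketch shown under Embodiment Construction above.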

Description

Technical field

[0001] The invention relates to distributed data computing technology, and in particular to a data balancing processing method, device and system based on MapReduce.

Background technique

[0002] MapReduce is an existing system architecture applied to large-scale data processing. As a programming system architecture, it is widely used for parallel computing over large-scale data sets (such as data sets larger than 1 TB), for example large-scale distributed filtering, large-scale distributed sorting, web link graph inversion, web access log analysis, inverted index construction, document clustering, machine learning, and statistics-based machine translation. In the MapReduce system architecture, the data processing process is divided into two stages: the first stage is the mapping (Map) stage, in which the data to be processed is initially partitioned and each element in an initial partition is computed and output to a reduce partition. Among ...
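As background for readers unfamiliar with the two stages mentioned above, here is a minimal, generic word-count example of the Map and Reduce stages in plain Python. It only illustrates the MapReduce programming model and is not part of the patent.

```python
# Minimal generic illustration of the two MapReduce stages (word count).
# Not part of the patent; it only shows the Map -> group-by-key -> Reduce flow.
from collections import defaultdict

def map_stage(document):
    # Emit a (key, value) pair for each word in the document.
    for word in document.split():
        yield word, 1

def reduce_stage(key, values):
    # Aggregate all values that share the same key.
    return key, sum(values)

documents = ["map reduce map", "reduce data data"]
partitions = defaultdict(list)
for doc in documents:
    for key, value in map_stage(doc):
        partitions[key].append(value)   # shuffle: group values by key

results = [reduce_stage(k, v) for k, v in partitions.items()]
print(sorted(results))  # [('data', 2), ('map', 2), ('reduce', 2)]
```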

Claims


Application Information

Patent Type & Authority Patents(China)
IPC IPC(8): G06F17/30H04L29/06H04L29/08
Inventors: 蔡斌, 田万鹏, 万乐, 史晓峰, 邱翔虎, 刘奕慧, 肖桂菊, 宫振飞, 张文郁, 韩欣, 崔小丰
Owner: TENCENT CLOUD COMPUTING BEIJING CO LTD