Data distribution method and device based on MapReduce as well as computer readable storage medium

A MapReduce-based data distribution technology, applied to big data processing in the field of Internet information technology. It addresses severely unbalanced key-value data, where hot partitions hold large data volumes and non-hot partitions hold small ones, so that the long processing time of the hot partitions delays completion of the whole job.

Active Publication Date: 2018-09-28
MIGU CO LTD +1

AI Technical Summary

Problems solved by technology

[0004] However, when the data in the key-value pairs is severely unbalanced, the data volume of a hot partition is large while that of a non-hot partition is small. As a result, the reduce processing time for the hot partition's data is longer than the reduce processing time for the non-hot partition's data, which in turn delays completion of the entire job.
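The imbalance described in [0004] can be illustrated with a minimal Python sketch. The partitioner `first_byte_partition`, the sample keys, and the assumption that reduce time grows with the number of key-value pairs received are all hypothetical and are not taken from the patent.

```python
def reduce_loads(pairs, num_reducers, partition):
    """Count how many key-value pairs each reduce task receives."""
    loads = [0] * num_reducers
    for key, _ in pairs:
        loads[partition(key, num_reducers)] += 1
    return loads


def first_byte_partition(key, num_reducers):
    # Hypothetical stand-in for a key-only partitioner: it looks at the key,
    # never at how many pairs share that key.
    return key.encode("utf-8")[0] % num_reducers


# Skewed input: the key "hot" accounts for 60 of the 100 pairs.
pairs = [("hot", 1)] * 60 + [("a", 1)] * 20 + [("b", 1)] * 20
loads = reduce_loads(pairs, 2, first_byte_partition)
print(loads)  # [80, 20]
```

Here "hot" and "b" land on the same reduce task, so one task handles 80 pairs and the other 20. Under the stated assumption, the job finishes only when the 80-pair task finishes, although a 60/40 split was possible.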



Examples


Embodiment 1

[0055] The embodiment of the present invention provides a data allocation method based on MapReduce. As shown in Figure 2, the method may include:

[0056] S101. Execute a preset Map function on the input document to be processed to obtain a set of key-value pairs.

[0057] The data allocation method provided by the embodiment of the present invention applies to scenarios in which the partitioner operation is used to partition data when big data is processed with the MapReduce computing model.

[0058] In the embodiment of the present invention, MapReduce computes input slices from the documents to be processed, and each input slice corresponds to one map task. Each map task processes its input slice with the map() method (the preset Map function) to obtain a set of key-value pairs.

[0059] In the embodiment of the present invention, the storage format of a key-value pair is <key, value>, where key is the key and value is the value.
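As a rough illustration of step S101, the following Python sketch runs a Map function over input slices and collects <key, value> pairs. The word-count `map_fn` and the sample slices are hypothetical; the source does not say what the preset Map function computes.

```python
def map_fn(input_slice):
    """Hypothetical preset Map function: emit a <word, 1> pair for each word."""
    return [(word, 1) for word in input_slice.split()]


# Each input slice is handled by its own map task; here they run in sequence.
input_slices = ["a b a", "c a"]
pairs = [pair for s in input_slices for pair in map_fn(s)]
print(pairs)  # [('a', 1), ('b', 1), ('a', 1), ('c', 1), ('a', 1)]
```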

[0060] S102. Calcul...

Embodiment 2

[0095] The embodiment of the present invention provides a data allocation method based on MapReduce. As shown in Figure 3, the method may include:

[0096] S201. The MapReduce-based data distribution device runs a preset Map function on an input document to be processed to obtain a set of key-value pairs.

[0097] The data allocation method provided by the embodiment of the present invention applies to scenarios in which the partitioner operation is used to partition data when big data is processed with the MapReduce computing model.

[0098] In the embodiment of the present invention, MapReduce computes input slices from the documents to be processed, and each input slice corresponds to one map task. Each map task processes its input slice with the map() method (the preset Map function) to obtain a set of key-value pairs, and then uses the count() method to count the set of key-value pairs to obtain the number of key-value pairs corresponding to th...
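The counting step might look like the Python sketch below. The paragraph above is truncated, so counting pairs per key is an assumption, and `count_pairs` is a hypothetical helper rather than the count() method the patent refers to. The number of distinct keys corresponds to the "type number" of keys mentioned in the abstract.

```python
from collections import Counter


def count_pairs(pairs):
    """Return (pairs per key, number of distinct keys).

    Assumption: the count is taken per key; the source text is truncated here.
    """
    per_key = Counter(key for key, _ in pairs)
    return per_key, len(per_key)


pairs = [("a", 1), ("b", 1), ("a", 1), ("c", 1), ("a", 1)]
per_key, type_number = count_pairs(pairs)
print(dict(per_key), type_number)  # {'a': 3, 'b': 1, 'c': 1} 3
```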

Embodiment 3

[0173] Figure 4 is a first schematic diagram of the composition and structure of the MapReduce-based data distribution device proposed by the embodiment of the present invention. In practical application, based on the same inventive concept as Embodiment 1 and Embodiment 2, as shown in Figure 4, the MapReduce-based data distribution device 1 of the embodiment of the present invention includes a processor 10, a memory 11 and a communication bus 12. In a specific embodiment, the processor 10 may be an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field programmable gate array (FPGA), a CPU, a controller, a microcontroller, or a microprocessor. It can be understood ...


Abstract

The embodiment of the invention discloses a data distribution method and device based on MapReduce, as well as a computer-readable storage medium. The method can comprise the following steps: running a preset Map function on an input to-be-processed document to obtain a key-value pair set; calculating the type number of keys in the key-value pair set by using a preset counting method; calculating a standard numerical value corresponding to each key by using a preset algorithm; performing a remainder calculation on the standard numerical values and the type number, so as to divide key-value pairs with the same type number in the key-value pair set into one partition, thereby obtaining at least one partition, where the number of partitions is the same as the type number; and, based on the number of key-value pairs in each of the at least one partition and the number of to-be-distributed key-value pairs corresponding to each of at least one Reduce task, establishing a correspondence between the at least one partition and the at least one Reduce task, so that the key-value pairs of the at least one partition can be processed by the at least one Reduce task.
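The steps in the abstract can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: CRC32 stands in for the unspecified "preset algorithm", and a greedy largest-first assignment to the least-loaded Reduce task stands in for the unspecified rule that maps partitions to Reduce tasks. All function names are hypothetical.

```python
import zlib
from collections import defaultdict


def standard_value(key):
    # Assumption: CRC32 replaces the patent's unspecified "preset algorithm".
    return zlib.crc32(key.encode("utf-8"))


def partition_pairs(pairs):
    """Group pairs by (standard value of key) mod (number of distinct keys)."""
    type_number = len({key for key, _ in pairs})
    if type_number == 0:
        return {}
    partitions = defaultdict(list)
    for key, value in pairs:
        partitions[standard_value(key) % type_number].append((key, value))
    return dict(partitions)


def assign_partitions(partitions, num_reducers):
    """Map each partition to a Reduce task using the partition sizes.

    Assumption: largest partition first, each to the least-loaded task.
    """
    loads = [0] * num_reducers
    assignment = {}
    order = sorted(partitions.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for pid, items in order:
        target = min(range(num_reducers), key=lambda r: loads[r])
        assignment[pid] = target
        loads[target] += len(items)
    return assignment, loads


# Partition sizes 60, 20 and 20 spread over two Reduce tasks.
sizes = {0: [("hot", 1)] * 60, 1: [("a", 1)] * 20, 2: [("b", 1)] * 20}
assignment, loads = assign_partitions(sizes, 2)
print(assignment, loads)  # {0: 0, 1: 1, 2: 1} [60, 40]
```

With a generic hash such as CRC32, two different keys can share a remainder, so this sketch may produce fewer non-empty partitions than the type number. The abstract states that the two counts are equal, which presumably depends on how the preset algorithm assigns standard values.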

Description

Technical field

[0001] The present invention relates to big data processing technology in the field of Internet information technology, in particular to a MapReduce-based data distribution method, device and computer-readable storage medium.

Background technique

[0002] In recent years, with the rapid development of electronic technology, the amount of data that computers need to process has grown larger and larger. To handle massive data, the MapReduce distributed computing model is used to process big data. The core of MapReduce is the Map stage and the Reduce stage. In the Map stage, a group of map servers processes the required data in the input shards into <key, value> key-value pairs. In the Reduce stage, a group of reduce servers merges the data that shares the same key. When a map server sends <key, value> key-value pairs to a reduce server for processing, it needs to parti...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F9/50
CPC: G06F9/5027
Inventor: 徐健, 张文启, 曹中强, 严国友, 孙一波
Owner: MIGU CO LTD