A MapReduce system and method for processing data streams

A data stream processing technology applied in the computer field. It addresses the problems that a data stream cannot be divided into a fixed number of data blocks and that Reduce processing cannot be performed on it, so that existing MapReduce systems cannot process data streams.

Active Publication Date: 2016-08-31
TENCENT CLOUD COMPUTING BEIJING CO LTD

AI Technical Summary

Problems solved by technology

Because a data stream has no fixed size, it cannot be divided into a fixed number of data blocks, and Reduce processing cannot wait until the intermediate key-value pairs of the entire stream have been obtained. Existing MapReduce systems therefore cannot process data streams.
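The limitation can be illustrated with a minimal sketch. It is not taken from the patent: `split_into_blocks` and `endless_stream` are hypothetical names, and the ceiling-division split is only one simple way a batch system might cut a finite data set into at most a fixed number of blocks.

```python
from itertools import count, islice

def split_into_blocks(records, num_blocks):
    """Batch-style split: needs len(records), so the input must be finite."""
    size = max(1, -(-len(records) // num_blocks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]

# A finite data set can be divided into a fixed number of blocks.
dataset = [("k%d" % (i % 3), i) for i in range(10)]
blocks = split_into_blocks(dataset, 4)

def endless_stream():
    """Hypothetical unbounded source of (key, value) pairs."""
    for i in count():
        yield ("k%d" % (i % 3), i)

# A stream has no length, so the same split cannot even start.
try:
    split_into_blocks(endless_stream(), 4)
    split_failed = False
except TypeError:
    split_failed = True

# What remains possible is taking bounded batches as the data arrives.
first_batch = list(islice(endless_stream(), 5))
```

The last line hints at the direction the invention takes: instead of splitting the whole input up front, bounded portions of the stream are taken and processed as they arrive.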

Examples

Embodiment 1

[0057] Referring to Figure 1, an embodiment of the present invention provides a MapReduce system, which comprises:

[0058] A client (Client) module 101, a job tracking (Job Tracker) module 102, M Map modules 103 and R Reduce modules 104, where M is a positive integer and R is a natural number.

[0059] The client module 101 is configured to submit the parallel processing job written by the user to the job tracking module 102, and to inform the job tracking module 102 of the source location of the parallel computing data stream, where the parallel computing data stream is the data stream to be processed by the parallel processing job.

[0060] The data in the parallel computing data stream takes the form of key-value pairs (key, value).

[0061] The job tracking module 102 includes:

[0062] The job decomposition and allocation unit 102a is configured to receive the parallel processing job submitted by the client module 101, decompose the parallel processin...

Embodiment 2

[0103] Referring to Figure 8, an embodiment of the present invention provides a method for processing a data stream, the method comprising:

[0104] 201: According to the arrival time of the data stream or a preset time interval, continuously obtain a preset number of pieces of original data stream data from the source location of the parallel computing data stream, where the parallel computing data stream is the data stream to be processed by the parallel processing job submitted by the user.

[0105] 202: Perform Map processing on each piece of acquired original data stream data to obtain the intermediate result data corresponding to that piece.

[0106] 203: Merge and group the intermediate result data into segments to obtain multiple intermediate result data segments.

[0107] 204: Perform corresponding Reduce processing on all intermediate result data having the same key in multiple intermediate result data segments to obtain corresponding final result data.

[0108] ...
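Steps 201 to 204 can be sketched as a small single-process pipeline. This is an illustrative sketch under stated assumptions, not the patented implementation: all function names are hypothetical, the Map function is a word count chosen only for demonstration, acquisition is driven by a record count alone (the time-based trigger of step 201 is omitted), and the `segments_per_reduce` parameter that decides when Reduce runs is an assumption, since the source text is truncated before it specifies this.

```python
from collections import defaultdict
from itertools import islice

def acquire_batches(source, batch_size):
    """Step 201: repeatedly take a preset number of records from the stream."""
    it = iter(source)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

def map_record(record):
    """Step 202: hypothetical Map function (word count on a line of text)."""
    _key, line = record
    return [(word, 1) for word in line.split()]

def build_segment(intermediate):
    """Step 203: merge intermediate results and group them by key."""
    segment = defaultdict(list)
    for key, value in intermediate:
        segment[key].append(value)
    return dict(segment)

def reduce_segments(segments, reduce_fn):
    """Step 204: reduce all values that share a key across the segments."""
    grouped = defaultdict(list)
    for segment in segments:
        for key, values in segment.items():
            grouped[key].extend(values)
    return {key: reduce_fn(values) for key, values in grouped.items()}

def process_stream(source, batch_size, segments_per_reduce, reduce_fn=sum):
    """Yield one result dict each time enough segments have accumulated."""
    pending = []
    for batch in acquire_batches(source, batch_size):
        intermediate = [kv for record in batch for kv in map_record(record)]
        pending.append(build_segment(intermediate))
        if len(pending) == segments_per_reduce:
            yield reduce_segments(pending, reduce_fn)
            pending = []
    if pending:  # flush when the (finite, demo-only) source ends
        yield reduce_segments(pending, reduce_fn)

# Demo input: (key, value) pairs whose values are lines of text.
lines = [(0, "a b a"), (1, "b c"), (2, "a"), (3, "c c"), (4, "b")]
results = list(process_stream(lines, batch_size=2, segments_per_reduce=2))
# results == [{"a": 3, "b": 2, "c": 3}, {"b": 1}]
```

Because `process_stream` is a generator, it emits results incrementally and never needs the whole input, which is the property that lets the same structure run over an unbounded source.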

Abstract

The invention discloses a MapReduce system and a method for processing data streams, belonging to the technical field of computers. The MapReduce system includes M Map (mapping) modules and R Reduce (reduction) modules. Each Map module includes an original data stream data acquisition unit, an intermediate result data acquisition unit, an intermediate result data segment acquisition unit and an intermediate result data segment processing unit. Each Reduce module includes a Reduce processing unit. When processing a data stream in parallel, the invention merges the output of the Map modules into segments and then performs Reduce processing on those segments, so that the MapReduce system can support data stream processing. This removes the limitation that existing MapReduce systems cannot process data streams.

Description

Technical field

[0001] The invention relates to the field of computer technology, in particular to a method for processing data streams in a MapReduce system.

Background technique

[0002] With the development of computer technology, the amount of data that computers need to process keeps increasing. A single computer can no longer handle some large-scale data, so multiple computers must be combined into a computer cluster to process the data in parallel. To this end, the prior art provides a system for processing large-scale data in parallel: the MapReduce (mapping and reduction) system.

[0003] At present, the MapReduce system can process various large-scale data sets in parallel. The process of parallel processing a data set is as follows: according to the number of Map (mapping) tasks, the data set is decomposed into multiple data blocks; Each original ...

Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/30
Inventor: 蔡斌万乐刘奕慧史晓峰宫振飞张文郁张迪楚大鹏自然
Owner: TENCENT CLOUD COMPUTING BEIJING CO LTD