A MapReduce system and method for processing data streams
A data stream processing technology, applied in the computer field, that addresses problems such as a data stream not being divisible into a fixed number of data blocks and not being amenable to Map and Reduce processing.
Examples
Embodiment 1
[0057] Referring to Figure 1, an embodiment of the present invention provides a MapReduce system, which comprises:
[0058] A client module 101, a job tracking module 102, M Map modules 103 and R Reduce modules 104, wherein M is a positive integer and R is a natural number.
[0059] The client module 101 is configured to submit the parallel processing job written by the user to the job tracking module 102, and to inform the job tracking module 102 of the source location information of the parallel computing data stream, wherein the parallel computing data stream is the data stream to be processed by the parallel processing job.
[0060] The data in the parallel computing data stream exists in the form of key-value pairs (key, value).
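As an illustration only, what the client module hands to the job tracking module can be sketched as a small record. Every name here (`JobSubmission`, `source_location`, the example URL, and the string/integer types chosen for keys and values) is a hypothetical choice and does not come from the patent text. The sketch also assumes that "natural number" allows R = 0, in contrast to the strictly positive M.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical concrete type for one (key, value) record in the data stream.
KeyValue = Tuple[str, int]


@dataclass(frozen=True)
class JobSubmission:
    """Hypothetical description of a job sent by the client module."""

    job_name: str
    source_location: str      # where the parallel computing data stream comes from
    num_map_modules: int      # M, a positive integer
    num_reduce_modules: int   # R, a natural number (assumed here to allow 0)

    def __post_init__(self) -> None:
        # Enforce the constraints on M and R stated in paragraph [0058].
        if self.num_map_modules < 1:
            raise ValueError("M must be a positive integer")
        if self.num_reduce_modules < 0:
            raise ValueError("R must be a natural number")


job = JobSubmission("wordcount", "tcp://example.invalid:9000", 4, 2)
record: KeyValue = ("error", 1)
```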
[0061] Wherein, the job tracking module 102 includes:
[0062] The job decomposition and allocation unit 102a is configured to receive the parallel processing job submitted by the client module 101, decompose the parallel processin...
Embodiment 2
[0103] Referring to Figure 8, an embodiment of the present invention provides a method for processing a data stream, the method comprising:
[0104] 201: According to the arrival time of the data stream or a preset time interval, continuously obtain a preset number of pieces of original data stream data from the source location of the parallel computing data stream, wherein the parallel computing data stream is the data stream to be processed by the parallel processing job submitted by the user.
[0105] 202: Perform Map processing on each piece of acquired original data stream data to obtain intermediate result data corresponding to each piece of original data stream data.
[0106] 203: Merge and group the intermediate result data into segments to obtain multiple intermediate result data segments.
[0107] 204: Perform corresponding Reduce processing on all intermediate result data having the same key in multiple intermediate result data segments to obtain corresponding final result data.
[0108] ...
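Steps 201 to 204 can be sketched in a single process as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it uses only the count-based acquisition variant of step 201, a word-count Map and a summing Reduce chosen purely as examples, and a finite input so that the final Reduce can run once. All function names are hypothetical. How and when Reduce is triggered on an unbounded stream is not covered by the excerpt above and is not modeled here.

```python
from collections import defaultdict
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple

KV = Tuple[str, int]


def take_batches(stream: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Step 201 (count-based variant): pull up to batch_size records at a time."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def map_record(record: str) -> List[KV]:
    """Step 202: example Map function (word count) applied to one record."""
    return [(word, 1) for word in record.split()]


def group_segment(pairs: Iterable[KV]) -> Dict[str, List[int]]:
    """Step 203: merge one batch's intermediate results into a key-grouped segment."""
    segment: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        segment[key].append(value)
    return dict(segment)


def reduce_segments(segments: Iterable[Dict[str, List[int]]]) -> Dict[str, int]:
    """Step 204: reduce all values that share a key across every segment."""
    merged: Dict[str, List[int]] = defaultdict(list)
    for segment in segments:
        for key, values in segment.items():
            merged[key].extend(values)
    return {key: sum(values) for key, values in merged.items()}


def process_stream(stream: Iterable[str], batch_size: int = 2) -> Dict[str, int]:
    """Run steps 201-204 over a finite stand-in for the data stream."""
    segments = []
    for batch in take_batches(stream, batch_size):
        intermediate = [kv for record in batch for kv in map_record(record)]
        segments.append(group_segment(intermediate))
    return reduce_segments(segments)


result = process_stream(["a b", "b c", "a"], batch_size=2)
```

With the three input records above and a batch size of 2, the stream is split into two segments, and keys that appear in both segments (here `a`) are combined only in the final Reduce step.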