Data partitioning method and device for stream data processing system
A data partitioning technology for stream data processing systems, applied in the field of big data processing. It addresses the problem that some worker nodes carry a heavy workload, which degrades system performance, and it achieves good load balance while avoiding communication overhead.
Embodiment
[0043] In this embodiment, the data partitioning method provided by the present invention is applied to counting the k most frequent words in stream data.
[0044] With the Key Grouping method, the k most frequent words in the stream data are counted as follows. Each word serves as the key. The data source node (source) uses a hash function to map different words to different worker nodes for processing. Each worker node (worker) runs several counting programs that count how often each word occurs. After a period of time, each worker selects its local Top-k and sends it to the downstream worker nodes, which aggregate the results and compute the final Top-k.
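The Key Grouping flow described in [0044] can be sketched in a few lines of Python. This is a minimal single-process illustration, not the patent's implementation: the function names `route` and `key_grouping_top_k` are hypothetical, and `zlib.crc32` is an assumed stand-in for the unspecified hash function used by the source node.

```python
import zlib
from collections import Counter


def route(word, num_workers):
    # Assumed hash function: CRC32 of the UTF-8 bytes, modulo the worker count.
    # The same word always maps to the same worker.
    return zlib.crc32(word.encode("utf-8")) % num_workers


def key_grouping_top_k(stream, num_workers, k):
    # One counter per worker node; each stands in for a worker's counting programs.
    workers = [Counter() for _ in range(num_workers)]
    for word in stream:
        workers[route(word, num_workers)][word] += 1

    # Each worker emits its local Top-k after the counting period.
    local_top_k = [worker.most_common(k) for worker in workers]

    # The downstream node merges the local results into the final Top-k.
    merged = Counter()
    for partial in local_top_k:
        for word, count in partial:
            merged[word] += count
    return merged.most_common(k), workers


if __name__ == "__main__":
    stream = ["the"] * 5 + ["a"] * 3 + ["champagne"]
    top, _ = key_grouping_top_k(stream, num_workers=3, k=2)
    print(top)
```

Because every occurrence of a word lands on exactly one worker, each local count is already the word's full count, so merging the local Top-k lists needs no further coordination between workers.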
[0045] Because words occur in the data stream with different frequencies (for example, "the" occurs far more often than "champagne"), the load on the worker nodes processing different words is severely unbalanced.
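The imbalance described in [0045] can be made concrete by counting how many tuples each worker receives under hash-based routing. The sketch below is illustrative only: `worker_loads` and `imbalance` are hypothetical helper names, `zlib.crc32` is an assumed hash function, and the ratio of maximum to mean load is just one common way to quantify skew, not a metric taken from the patent.

```python
import zlib


def route(word, num_workers):
    # Assumed hash function; any deterministic hash would behave similarly.
    return zlib.crc32(word.encode("utf-8")) % num_workers


def worker_loads(stream, num_workers):
    # Number of tuples each worker receives under Key Grouping.
    loads = [0] * num_workers
    for word in stream:
        loads[route(word, num_workers)] += 1
    return loads


def imbalance(loads):
    # Ratio of the busiest worker's load to the mean load (1.0 means perfectly even).
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 0.0


if __name__ == "__main__":
    # A skewed stream: "the" dominates, "champagne" is rare.
    stream = ["the"] * 90 + ["champagne"] * 10
    loads = worker_loads(stream, num_workers=4)
    print(loads, imbalance(loads))
```

Whichever worker is assigned "the" receives at least 90 of the 100 tuples in this toy stream, no matter how many workers are added, which is the skew that Key Grouping cannot avoid.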
[0046] The top k words with the highest frequency in ...