A highly reliable distributed method and system for real-time data stream statistics
A statistical method based on distributed technology, applied in the field of big data, which addresses problems such as failure to achieve second-level latency, nodes that cannot be scaled dynamically, and single points of failure.
Examples
Example 1
[0066] Suppose each record from the data source contains four fields: id, timestamp, source IP (sip), and destination IP (dip). One such data stream is as follows:
[0067]
ID   Timestamp   Sip       Dip
1    09:25       1.1.1.1   2.2.2.2
1    09:25       1.1.1.1   3.3.3.3
1    09:26       1.1.1.1   6.6.6.6
1    09:26       3.3.3.3   5.5.5.5
2    09:27       4.4.4.4   2.2.2.2
2    09:28       4.4.4.4   3.3.3.3
2    09:28       6.6.6.6   1.1.1.1
[0068] The task is to count, for every id that appears within a 3min window, how many records were generated from each source IP. With this invention, the requirement is configured as the following service rule:
[0069]
Data source   Map granularity   Reduce granularity   Send to     Data processing rule
Info_mq       1min              3min                 Result_mq   Group_by_and_count: id, sip
[0070] Both the map node and the reduce node will read this rule and parse the rule into tasks corresponding to map a...
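The rule above can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Rule`, `map_stage`, `reduce_stage` and `run` are hypothetical, message queues are replaced by in-memory lists, and windows are assumed to be tumbling windows aligned to midnight (the source does not state how windows are aligned). The map stage produces partial counts per 1min bucket, and the reduce stage merges those partial counts into 3min windows.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical in-memory form of the service rule in [0069]."""
    source: str
    map_minutes: int
    reduce_minutes: int
    sink: str
    group_by: tuple


RULE = Rule("Info_mq", 1, 3, "Result_mq", ("id", "sip"))

# The sample stream from [0067].
RECORDS = [
    {"id": 1, "timestamp": "09:25", "sip": "1.1.1.1", "dip": "2.2.2.2"},
    {"id": 1, "timestamp": "09:25", "sip": "1.1.1.1", "dip": "3.3.3.3"},
    {"id": 1, "timestamp": "09:26", "sip": "1.1.1.1", "dip": "6.6.6.6"},
    {"id": 1, "timestamp": "09:26", "sip": "3.3.3.3", "dip": "5.5.5.5"},
    {"id": 2, "timestamp": "09:27", "sip": "4.4.4.4", "dip": "2.2.2.2"},
    {"id": 2, "timestamp": "09:28", "sip": "4.4.4.4", "dip": "3.3.3.3"},
    {"id": 2, "timestamp": "09:28", "sip": "6.6.6.6", "dip": "1.1.1.1"},
]


def to_minutes(hhmm):
    """Convert 'HH:MM' to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def to_hhmm(total_minutes):
    """Convert minutes since midnight back to 'HH:MM'."""
    return f"{total_minutes // 60:02d}:{total_minutes % 60:02d}"


def map_stage(records, rule):
    """Count records per (map bucket, group key) at map granularity."""
    partial = Counter()
    for record in records:
        bucket = to_minutes(record["timestamp"]) // rule.map_minutes
        key = tuple(record[field] for field in rule.group_by)
        partial[(bucket, key)] += 1
    return partial


def reduce_stage(partial, rule):
    """Merge map-stage partial counts into reduce-granularity windows."""
    buckets_per_window = rule.reduce_minutes // rule.map_minutes
    result = Counter()
    for (bucket, key), count in partial.items():
        window = bucket // buckets_per_window
        start = to_hhmm(window * buckets_per_window * rule.map_minutes)
        result[(start,) + key] += count
    return dict(result)


def run(records, rule):
    """Run both stages; keys are (window start, id, sip)."""
    return reduce_stage(map_stage(records, rule), rule)


if __name__ == "__main__":
    for key, count in sorted(run(RECORDS, RULE).items()):
        print(key, count)
```

Under the midnight-aligned assumption, the sample records fall into two windows, one starting at 09:24 and one starting at 09:27. A different alignment (for example, windows starting at the first record) would group the same records differently.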