Index statistics method, system and device and storage medium

A statistical method and system, applied in computing and special-purpose data processing, that addresses three problems of existing approaches: no support for item-by-item time window updates, a large amount of calculation, and high delay.

Pending Publication Date: 2019-11-19
BEIJING JINGDONG SHANGKE INFORMATION TECH CO LTD +1

AI Technical Summary

Problems solved by technology

[0004] The technical problem to be solved by the present invention is to overcome the defects of the prior art, in which the default window mechanism of a big data real-time processing framework does not support item-by-item updating of the time window, while approaches developed by inheriting the framework's API methods suffer from a large amount of calculation and high delay. To this end, the invention provides an index statistics method, system, device and storage medium based on a sliding window of a big data processing framework, which reduce the amount of calculation and the delay by reusing already calculated statistical values.



Examples


Embodiment 1

[0097] As shown in Figure 1, this embodiment provides an index statistics method based on a sliding window of a big data processing framework, where the big data processing framework is Flink. The method uses the State object in the mapWithState operator provided by Flink to store the following key parameters: the time linked list, the minimum value of the time linked list, the maximum value of the time linked list, and the statistical value of the indicator. The time linked list is a doubly linked list, so each search can proceed in whichever direction locates out-of-order data and expired data fastest. Insertion and deletion in the time linked list have O(1) complexity, which improves processing speed and efficiency.
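The state described in paragraph [0097] can be illustrated with a small, framework-free Python sketch. It is not the patent's implementation and does not use Flink. The class name `WindowState`, the choice of a sum as the statistic, the window interval (max - window, max], and the rule that late data already outside the window is dropped are all assumptions made for illustration, since the detailed steps are truncated in this excerpt.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    ts: int                          # event timestamp
    value: float                     # indicator field to be counted
    prev: Optional["Node"] = None
    next: Optional["Node"] = None


class WindowState:
    """Hypothetical per-key state: time-ordered doubly linked list plus a running sum."""

    def __init__(self, window: int):
        self.window = window                  # preset time window value (assumed > 0)
        self.head: Optional[Node] = None      # minimum timestamp in the list
        self.tail: Optional[Node] = None      # maximum timestamp in the list
        self.total = 0.0                      # statistic reused across updates

    def _insert_after(self, left: Optional[Node], node: Node) -> None:
        # O(1) splice once the position is known; left=None means "new head".
        right = left.next if left else self.head
        node.prev, node.next = left, right
        if left:
            left.next = node
        else:
            self.head = node
        if right:
            right.prev = node
        else:
            self.tail = node

    def _insert_sorted(self, node: Node) -> None:
        if self.head is None:
            self.head = self.tail = node
            return
        # Walk from whichever end is closer in time to the new timestamp.
        if self.tail.ts - node.ts <= node.ts - self.head.ts:
            cur = self.tail
            while cur is not None and cur.ts > node.ts:
                cur = cur.prev
            self._insert_after(cur, node)
        else:
            cur = self.head
            while cur is not None and cur.ts <= node.ts:
                cur = cur.next
            self._insert_after(cur.prev if cur else self.tail, node)

    def update(self, ts: int, value: float) -> float:
        # Assumption: data older than the current window is ignored.
        if self.tail is not None and ts <= self.tail.ts - self.window:
            return self.total
        self._insert_sorted(Node(ts, value))
        self.total += value                   # reuse the previous statistic
        limit = self.tail.ts - self.window
        # Expired entries sit at the head, so each removal is O(1).
        while self.head is not None and self.head.ts <= limit:
            self.total -= self.head.value
            self.head = self.head.next
            if self.head:
                self.head.prev = None
            else:
                self.tail = None
        return self.total
```

Calling `update(ts, value)` once per record returns the current windowed sum. In-order records are appended at the tail without any traversal, and only out-of-order records trigger a walk from the nearer end.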

[0098] The index statistical method provided in this embodiment includes the following steps:

[0099] Step 101, Flink uses multiple receivers to receive external data in parallel; wherein, each piece of data includes an...

Embodiment 2

[0119] As shown in Figure 2, the index statistics system based on the sliding window of the big data processing framework in this embodiment includes a parameter storage module 1, a data processing module 2 and a real-time processing module 3.

[0120] The big data processing framework is Flink, which includes multiple data receivers.

[0121] The parameter storage module 1 uses the State object in the mapWithState operator to store several parameters: the time linked list, the minimum value of the time linked list, the maximum value of the time linked list, and the index statistical value. The time linked list is a doubly linked list that stores data in chronological order, so each search can proceed in whichever direction locates out-of-order data and expired data fastest. Insertion and deletion in the time linked list have O(1) complexity, which increases processing speed and improves efficiency.

[0122] The...

Embodiment 3

[0152] On the basis of Embodiment 2, each piece of data in this embodiment includes several indicator fields to be counted. In addition, as shown in Figure 7, and unlike the real-time processing module 3 in Embodiment 2, the real-time processing module 3 in this embodiment includes a third data generating module 304, a fourth data generating module 305 and a second current data generating module 306.

[0153] The third data generating module 304 is configured to use the latest received piece of data as the third data.

[0154] The fourth data generating module 305 is configured to use the keyBy operator to distribute the third data to different nodes according to the dimension of the indicator field to be counted; the data distributed to each node is the fourth data.

[0155] The second current data generating module 306 is configured to use the fourth data as the current data.
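The key-based distribution in paragraphs [0153] to [0155] can be sketched without Flink as simple hash partitioning. This is only an illustration under stated assumptions: the helper names `partition_for` and `distribute`, the field names `shop_id` and `amount`, and the use of CRC32 as the hash are hypothetical, and Flink's real key-to-subtask assignment is more involved than a plain modulo.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List


def partition_for(key: str, parallelism: int) -> int:
    """Map a dimension key to a node index with a stable hash (assumed scheme)."""
    return zlib.crc32(key.encode("utf-8")) % parallelism


def distribute(records: Iterable[dict], key_field: str,
               parallelism: int) -> Dict[int, List[dict]]:
    """Route each record to a node chosen by the value of its key field."""
    nodes: Dict[int, List[dict]] = defaultdict(list)
    for record in records:
        node = partition_for(str(record[key_field]), parallelism)
        nodes[node].append(record)   # arrival order is preserved per node
    return nodes


if __name__ == "__main__":
    # Hypothetical records: one dimension field and one indicator field.
    sample = [
        {"shop_id": "a", "amount": 1},
        {"shop_id": "b", "amount": 2},
        {"shop_id": "a", "amount": 3},
    ]
    for node, batch in sorted(distribute(sample, "shop_id", 4).items()):
        print(node, batch)
```

Because every record with the same key value lands on the same node, each node can keep its own window state per key and update it independently of the others.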

[0156] Common statistical indicators are based on a certain business dimension for ...


Abstract

The invention discloses an index statistics method, system, device and storage medium based on a sliding window of a big data processing framework. The index statistics method comprises the following steps: storing a plurality of parameters by using a State object in a mapWithState operator, wherein the parameters comprise a time linked list, a minimum value of the time linked list, a maximum value of the time linked list and an index statistical value; enabling the big data processing framework to receive data, wherein each piece of data comprises an index field to be counted and a corresponding timestamp, and the time linked list stores the timestamps of the data in chronological order; taking the most recently received data as the current data; and updating the time linked list, the minimum value, the maximum value and the index statistical value according to the timestamp of the current data, the index field to be counted of the current data and a preset time window value, wherein the index statistical value is obtained by reusing previously calculated values. Because the calculated statistical value is reused, the amount of calculation is reduced, the calculation speed is increased, the delay is reduced, and the statistical requirements of the business can be met more quickly.
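As an illustration of what reusing the calculated value can mean, assume the statistic is a sum over a window of length W ending at the current maximum timestamp. Both the choice of a sum and the symbols below are assumptions, since the excerpt does not give the update formula. Under that assumption, one update step can be written as:

```latex
S_{\text{new}} = S_{\text{old}} + v_{\text{new}} - \sum_{i \,:\, t_i \le t_{\max} - W} v_i
```

Here S_old is the stored statistic, v_new is the indicator value of the current data, t_max is the maximum timestamp after the update, and the sum runs over the entries that have just expired. The cost of an update is then proportional to the number of expired entries rather than to the number of entries in the whole window.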

Description

Technical field

[0001] The present invention relates to the field of index statistics for big data, in particular to an index statistics method, system, device and storage medium based on a sliding window of a big data processing framework.

Background technique

[0002] With the rapid development of big data processing technology, businesses place ever higher requirements on the timeliness of data, and traditional offline index calculation can no longer meet them. The rapid growth of demand has driven the iteration of technology, and various frameworks capable of processing big data in real time continue to emerge. Flink (a big data processing framework) has received widespread attention for its ultra-short processing latency and high throughput, and has been applied in production environments.

[0003] In the field of indicator statistics, statistical indicators based on time windows are the most common form. Among various types of tim...


Application Information

IPC(8): G06F16/2458, G06F16/27
Inventor 白荣林, 徐峰, 张帅, 王浩
Owner BEIJING JINGDONG SHANGKE INFORMATION TECH CO LTD