Variance optimization histogram construction method and device based on Spark Streaming

A histogram and variance technology, applied in the field of big data computing, that addresses three problems with existing methods: they cannot quickly construct histograms over online streaming data, they cannot construct variance-optimized histograms in a streaming data environment, and they are poorly suited to constructing variance-optimized histograms dynamically.

Pending Publication Date: 2017-09-22
NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT
Cites: 4; Cited by: 2

AI Technical Summary

Problems solved by technology

[0004] The traditional algorithm based on dynamic programming must traverse the data set multiple times, so its time and space complexity are high. Another method can construct a variance-optimized histogram for any data set in sub-linear time, but it applies only to offline data and cannot construct variance-optimized histograms in a streaming data environment. Given the limited memory available in streaming environments, researchers have also proposed constructing variance-optimized histograms from sample data; however, this approach presupposes that the data distribution is known in advance, so that continuously arriving stream data can be randomly sampled according to that distribution information. In addition, there is a dynamically adjusted approximate variance-optimized histogram method that inserts each newly arrived element into the corresponding bucket and keeps the total variance of the histogram approximately optimal by splitting and merging buckets. This method greatly reduces the time complexity of construction, but it must retain all of the original data, which makes it unsuitable for dynamically constructing variance-optimized histograms in a space-limited streaming big data environment. Finally, within distributed computing frameworks, one method uses the MapReduce framework to construct approximate variance-optimized histograms for data in probabilistic databases, but it also works only on offline data and cannot quickly handle online streaming data. Moreover, current popular streaming data computing platforms provide no method for computing variance-optimized histograms.
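For context, the "traditional algorithm based on dynamic programming" is most likely the classic O(n²B) dynamic program for V-optimal histograms. The Python sketch below is illustrative only and assumes the input is an ordered list of values (for example, frequencies over an attribute's sorted domain) split into contiguous buckets; the function name `v_optimal_histogram` is hypothetical and does not come from the patent. It makes the cost described above concrete: all n values must be held in memory, the table has (B+1)×(n+1) entries, and the triple loop runs in O(n²B) time.

```python
from typing import List, Tuple


def v_optimal_histogram(values: List[float], num_buckets: int) -> Tuple[float, List[Tuple[int, int]]]:
    """Exact V-optimal histogram via the classic O(n^2 * B) dynamic program.

    Returns the minimal total sum of squared errors (SSE) and the buckets
    as half-open index ranges [start, end) into `values`.
    """
    n = len(values)
    if n == 0 or num_buckets <= 0:
        raise ValueError("need at least one value and one bucket")
    k = min(num_buckets, n)  # more buckets than values is pointless

    # Prefix sums of values and squared values give O(1) SSE per range.
    ps = [0.0] * (n + 1)
    pss = [0.0] * (n + 1)
    for i, v in enumerate(values):
        ps[i + 1] = ps[i] + v
        pss[i + 1] = pss[i] + v * v

    def sse(a: int, b: int) -> float:
        # SSE of values[a:b] around their mean: sum(x^2) - (sum x)^2 / count.
        s = ps[b] - ps[a]
        return (pss[b] - pss[a]) - s * s / (b - a)

    inf = float("inf")
    # cost[j][i] = best SSE covering values[:i] with exactly j buckets.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    split = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(j, n + 1):
            for m in range(j - 1, i):  # last bucket is values[m:i]
                c = cost[j - 1][m] + sse(m, i)
                if c < cost[j][i]:
                    cost[j][i] = c
                    split[j][i] = m

    # Walk the split table backwards to recover the bucket boundaries.
    buckets = []
    i = n
    for j in range(k, 0, -1):
        m = split[j][i]
        buckets.append((m, i))
        i = m
    buckets.reverse()
    return cost[k][n], buckets
```

For example, `v_optimal_histogram([1, 1, 1, 10, 10, 10], 2)` returns `(0.0, [(0, 3), (3, 6)])`, because each bucket then holds identical values.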

Embodiment Construction

[0073] Specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.

[0074] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in these embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

[0075] With the explosive growth of network data, real-time analysis of streaming data has become an active research field. The variance-optimized histogram can return high-preci...
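For reference, a variance-optimized (V-optimal) histogram is conventionally defined as the partition of an ordered sequence of values f_1, ..., f_n into B contiguous buckets b_1, ..., b_B that minimizes the total within-bucket squared error, where every value in a bucket is approximated by the bucket mean. This is the standard textbook objective; the notation is ours, not the patent's. The second line expands the square, which lets each bucket's error be computed in constant time from prefix sums of f_i and f_i^2.

```latex
\mathrm{SSE} = \sum_{k=1}^{B} \sum_{i \in b_k} \left( f_i - \bar{f}_{b_k} \right)^2,
\qquad
\bar{f}_{b_k} = \frac{1}{|b_k|} \sum_{i \in b_k} f_i

\sum_{i \in b_k} \left( f_i - \bar{f}_{b_k} \right)^2
  = \sum_{i \in b_k} f_i^2 - \frac{1}{|b_k|} \Bigl( \sum_{i \in b_k} f_i \Bigr)^2
```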

Abstract

The invention relates to a variance optimization histogram construction method and device based on Spark Streaming. The method comprises the following steps: using Spark Streaming to sample streaming data online; dynamically constructing a variance-optimized histogram from the online sample data; and dynamically updating the histogram with newly added data, reconstructing it from that data. With this technical scheme, a high-precision approximate variance-optimized histogram can be constructed within limited memory by scanning the data only once.
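The three steps in the abstract (sample online, build from the sample, update as new data arrives) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented method: the text available here does not specify the sampling scheme, so reservoir sampling (Algorithm R) is substituted, and the histogram is rebuilt after each micro-batch with the classic O(n²B) dynamic program over the sample's value frequencies. The names `ReservoirSampler`, `v_optimal`, and `build_histogram` are hypothetical. In a real Spark Streaming job the per-batch logic would typically run inside a `DStream.foreachRDD` callback; here micro-batches are simulated with plain Python lists.

```python
import random
from collections import Counter
from typing import Iterable, List, Tuple


class ReservoirSampler:
    """Algorithm R: keeps a uniform random sample of fixed size from a stream."""

    def __init__(self, capacity: int, seed: int = 0):
        self.capacity = capacity
        self.sample: List[int] = []
        self.seen = 0
        self._rng = random.Random(seed)

    def add_batch(self, batch: Iterable[int]) -> None:
        # Each micro-batch is scanned once; memory stays O(capacity).
        for x in batch:
            self.seen += 1
            if len(self.sample) < self.capacity:
                self.sample.append(x)
            else:
                j = self._rng.randrange(self.seen)  # uniform in [0, seen)
                if j < self.capacity:
                    self.sample[j] = x


def v_optimal(freqs: List[float], buckets: int) -> List[Tuple[int, int]]:
    """O(n^2 * B) dynamic program; returns contiguous [start, end) buckets."""
    n = len(freqs)
    k = min(buckets, n)
    ps, pss = [0.0], [0.0]
    for f in freqs:
        ps.append(ps[-1] + f)
        pss.append(pss[-1] + f * f)

    def sse(a: int, b: int) -> float:
        s = ps[b] - ps[a]
        return pss[b] - pss[a] - s * s / (b - a)

    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(j, n + 1):
            for m in range(j - 1, i):
                c = cost[j - 1][m] + sse(m, i)
                if c < cost[j][i]:
                    cost[j][i], cut[j][i] = c, m
    out, i = [], n
    for j in range(k, 0, -1):
        out.append((cut[j][i], i))
        i = cut[j][i]
    return out[::-1]


def build_histogram(sampler: ReservoirSampler, buckets: int) -> List[Tuple[int, int, float]]:
    """Rebuild the histogram from the current sample.

    Returns (low_value, high_value, estimated_avg_frequency) per bucket.
    """
    counts = Counter(sampler.sample)
    values = sorted(counts)
    scale = sampler.seen / max(len(sampler.sample), 1)  # sample count -> stream count
    freqs = [counts[v] * scale for v in values]
    return [(values[a], values[b - 1], sum(freqs[a:b]) / (b - a))
            for a, b in v_optimal(freqs, buckets)]


if __name__ == "__main__":
    sampler = ReservoirSampler(capacity=1000, seed=42)
    rng = random.Random(7)
    for _ in range(5):  # simulated micro-batches
        batch = [rng.choice([1, 2, 3, 10, 11, 12]) for _ in range(500)]
        sampler.add_batch(batch)                  # step 1: online sampling
        hist = build_histogram(sampler, buckets=2)  # steps 2-3: rebuild
    print(sampler.seen, len(sampler.sample), hist)
```

The design choice here is that memory is bounded by the reservoir capacity rather than the stream length, and the dynamic program only ever runs over the distinct values in the sample, which keeps each rebuild cheap. The trade-off is that the histogram can be no more accurate than the sample it is built from.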

Description

Technical field

[0001] The invention relates to the field of big data computing, and in particular to a method and device for constructing a variance-optimized histogram based on Spark Streaming.

Background

[0002] With the rapid development of the Internet, the Internet of Things, cloud computing, and communication technologies, the explosive growth of data has brought new opportunities and challenges to many industries, and efficient computation over massive data has become a focus of current research. Big data computing can be divided into two modes: batch computing and streaming computing. Because streaming data is continuous, time-sensitive, bursty, and of unknown distribution, online processing techniques for streaming data are less mature than batch computing over offline data, and with the development of technology, the demand for efficient calculation of streaming data and statistical analysis and other app...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8) G06F17/30
CPC: G06F16/9024
Inventors: 史亮 (Shi Liang), 王勇 (Wang Yong), 张鸿 (Zhang Hong), 何慧虹 (He Huihong)
Owner: NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT