Freshness sensitive big data summary information maintenance and aggregate value query method

A technology for information maintenance of big data, applied in database design/maintenance, electronic digital data processing, structured data retrieval, etc. It addresses the difficulty of receiving and managing streaming data in real time, the inability to obtain real-time aggregate statistics, and the difficulty of supporting real-time query requests, with the effect of improving real-time processing performance and real-time query efficiency.

Active Publication Date: 2015-09-30
INST OF INFORMATION ENG CHINESE ACAD OF SCI
Cites: 2; Cited by: 8

AI Technical Summary

Problems solved by technology

[0004] In the fast data environment, building real-time summary information for each temporal object faces two basic problems. The first is how to receive and manage high-speed streaming big data in real time: existing research has shown that Hadoop-based analysis software has difficulty supporting highly real-time query requests, and for streaming big data in particular it is hard to return meaningful results for queries with strict time constraints (reference: G. Mishne, J. Dalton, Z. Li, A. Sharma, and J. Lin, "Fast data in the era of big data: Twitter's real-time related query suggestion architecture," in Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD '13. New York, NY, USA: ACM, 2013, pp. 1147–1158).
Existing research can quickly obtain the aggregate values of the top-k objects in big data, but cannot obtain real-time aggregate statistics for an arbitrary object (reference: F. Li, K. Yi, and W. Le, "Top-k queries on temporal data," The VLDB Journal, vol. 19, no. 5, pp. 715–733, Oct. 2010).
Temporal object management and query optimization techniques have been studied for many years (reference: I. F. Ilyas, G. Beskales, and M. A. Soliman, "A survey of top-k query processing techniques in relational database systems," ACM Comput. Surv., vol. 40, no. 4, 2008), but many of these techniques manage the time attribute with an MVB-tree, so every data write or query costs at least O(log_B N) time, which cannot meet the high-speed loading and real-time query requirements of the fast data environment. Approximate computing is an effective way to process and accelerate data stream computation; examples include approximate range-sum methods (reference: X. Yun, G. Wu, G. Zhang, K. Li, and S. Wang, "FastRAQ: A fast approach to range-aggregate queries in big data environments," IEEE Transactions on Cloud Computing, vol. PP, no. 99, pp. 1–1, 2014), ordered-set sampling methods (reference: E. Cohen, G. Cormode, and N. Duffield, "Structure-aware sampling: Flexible and accurate summarization," Proceedings of the VLDB Endowment, vol. 4, no. 11, 2011), and sliding-window techniques (reference: M. Datar, A. Gionis, P. Indyk, and R. Motwani, "Maintaining stream statistics over sliding windows (extended abstract)," in Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, ser. SODA '02, 2002, pp. 635–644).
However, current approximate computing methods do not take into account the time-sensitive characteristics of temporal objects: new and old data are held to a single, uniform error standard. To obtain high-precision results, the whole system must be configured with a small error parameter and must therefore maintain a large number of samples; if only a small amount of sample data is kept, high-precision approximate computation cannot be provided.
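
The storage trade-off above reduces to simple arithmetic if each time stage i holding n_i records keeps every record independently with probability p_i: the expected sample count is the sum of n_i * p_i. The sketch below is a hypothetical illustration; this text does not say how error parameters translate into sampling rates, so the stage sizes and rates are assumptions chosen only to show the effect of a uniform versus an age-dependent error standard.

```python
# Hypothetical illustration only: stage sizes and sampling rates are
# made-up numbers, not values from the patent.

def expected_samples(stage_sizes, rates):
    """Expected retained samples under per-stage Bernoulli sampling: sum(n_i * p_i)."""
    if len(stage_sizes) != len(rates):
        raise ValueError("need exactly one sampling rate per stage")
    return sum(n * p for n, p in zip(stage_sizes, rates))


# Four time stages of 1,000,000 records each, ordered newest -> oldest.
sizes = [1_000_000] * 4

# Uniform error standard: every stage is sampled at the rate fresh data needs.
uniform = expected_samples(sizes, [0.1, 0.1, 0.1, 0.1])

# Freshness-sensitive: the sampling rate (precision) drops as data ages.
freshness = expected_samples(sizes, [0.1, 0.01, 0.001, 0.0001])

print(round(uniform), round(freshness))  # 400000 111100
```

With these made-up numbers the uniform scheme keeps about 3.6 times as many samples (400,000 versus 111,100) while giving the newest stage no more precision than the freshness-sensitive scheme does.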

Examples

[0054] To make the above objects, features, and advantages of the present invention clearer and easier to understand, the invention is further described below through a specific example. Example: time-interval aggregation queries in a streaming big data environment.

[0055] In this example, FS-Sketch is deployed in the streaming big data environment as the front-end receiver of the streaming data: it receives each record in O(1) time and maintains the summary data. FS-Sketch effectively supports the TRAQ-type statistical queries proposed by the present invention. FS-Sketch is generally deployed as an in-memory structure, and its data can further be serialized to files for persistent storage. FS-Sketch also effectively captures the data distribution of streaming big data, providing a basis for building higher-level index structures and summary data.
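
The O(1) receive path and the serialization step can be illustrated with a minimal Python sketch. It is not FS-Sketch itself: the class name `SummaryReceiver`, the fixed-width time buckets, and the (count, sum) summary per bucket are hypothetical choices made for illustration. Each record costs a constant number of hash-table operations rather than a tree descent, and the whole in-memory state round-trips through JSON.

```python
import json
from collections import defaultdict


class SummaryReceiver:
    """Hypothetical front-end receiver: O(1) work per record, JSON-serializable."""

    def __init__(self, bucket_width):
        self.bucket_width = bucket_width
        # object key (assumed to be a string) -> {time bucket index -> [count, sum]}
        self.summary = defaultdict(dict)

    def receive(self, key, ts, value):
        # A constant number of dict operations per record; no tree descent.
        bucket = ts // self.bucket_width
        cell = self.summary[key].setdefault(bucket, [0, 0])
        cell[0] += 1
        cell[1] += value

    def to_json(self):
        # JSON object keys must be strings, so bucket indices are stringified.
        return json.dumps({
            "bucket_width": self.bucket_width,
            "summary": {
                key: {str(b): cell for b, cell in buckets.items()}
                for key, buckets in self.summary.items()
            },
        })

    @classmethod
    def from_json(cls, text):
        data = json.loads(text)
        receiver = cls(data["bucket_width"])
        for key, buckets in data["summary"].items():
            receiver.summary[key] = {int(b): cell for b, cell in buckets.items()}
        return receiver


r = SummaryReceiver(bucket_width=60)
for ts, value in [(5, 3), (70, 4), (75, 1)]:
    r.receive("user42", ts, value)

restored = SummaryReceiver.from_json(r.to_json())
print(restored.summary["user42"])  # {0: [1, 3], 1: [2, 5]}
```

Writing the `to_json()` string to a file, and passing the file's contents back to `from_json`, gives the kind of persistence described above; a real deployment would likely use a more compact binary format.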

[0056] 2. Experimental data and conclusion

[0057] The experiment accord...

Abstract

The invention discloses a freshness-sensitive big data summary information maintenance and aggregate value query method. The method comprises the following steps: (1) a time tracker is built for the data of each temporal object; (2) temporal object data to be written are mapped to the corresponding time tracker according to their temporal object, and the tracker divides that data into multiple time stages and sets an error parameter for each time stage; (3) the tracker samples the data within each time stage according to that stage's error parameter and stores the samples in the sample set of the time tracker. During a query, the corresponding time tracker is first located by the key of the temporal object; the tracker then looks up its time stages according to the query time information and returns a query value computed from the samples of the matching time stages. The method can effectively manage and query temporal object data and supports higher-level subject-oriented computing applications.
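
The write and query paths in the abstract can be sketched in Python as below. This is a hypothetical illustration under stated assumptions, not the patented implementation: stages are fixed time ranges given by ascending start times, each stage's error parameter is represented directly by a Bernoulli sampling rate (a smaller error corresponds to a higher rate), a sum over a time range is estimated by inverse-probability (Horvitz–Thompson) weighting of the stored samples, and `relax` models freshness by thinning an aging stage to a lower rate. The names `TimeTracker`, `FreshnessSketch`, `range_sum`, and `relax` are invented for the example.

```python
import random
from bisect import bisect_left, bisect_right


class TimeTracker:
    """Hypothetical per-object tracker holding Bernoulli samples per time stage."""

    def __init__(self, starts, rates, rng):
        # starts[i] is the start time of stage i (ascending); the last stage is
        # open-ended. rates[i] stands in for stage i's error parameter.
        if len(starts) != len(rates):
            raise ValueError("need exactly one rate per stage")
        self.starts = starts
        self.rates = list(rates)
        self.rng = rng
        self.samples = [[] for _ in starts]  # stage i -> list of (ts, value)

    def write(self, ts, value):
        i = bisect_right(self.starts, ts) - 1  # stage containing ts
        if i < 0:
            raise ValueError("timestamp precedes the first stage")
        if self.rng.random() < self.rates[i]:
            self.samples[i].append((ts, value))

    def range_sum(self, t0, t1):
        """Estimate the sum of values with t0 <= ts < t1."""
        lo = max(0, bisect_right(self.starts, t0) - 1)  # first overlapping stage
        hi = bisect_left(self.starts, t1)               # one past the last one
        total = 0.0
        for i in range(lo, hi):
            hits = sum(v for ts, v in self.samples[i] if t0 <= ts < t1)
            total += hits / self.rates[i]  # scale each sample by 1 / p_i
        return total

    def relax(self, i, new_rate):
        """Age stage i: lower its rate and thin its samples to match."""
        old = self.rates[i]
        if not 0 < new_rate <= old:
            raise ValueError("new rate must be in (0, current rate]")
        keep = new_rate / old  # kept overall with prob old * keep = new_rate
        self.samples[i] = [s for s in self.samples[i] if self.rng.random() < keep]
        self.rates[i] = new_rate


class FreshnessSketch:
    """Hypothetical front end: maps each object key to its own TimeTracker."""

    def __init__(self, starts, rates, seed=0):
        self.starts, self.rates = starts, rates
        self.rng = random.Random(seed)
        self.trackers = {}

    def write(self, key, ts, value):
        tracker = self.trackers.get(key)
        if tracker is None:
            tracker = self.trackers[key] = TimeTracker(self.starts, self.rates, self.rng)
        tracker.write(ts, value)

    def query(self, key, t0, t1):
        # Locate the tracker by key, then query the stages overlapping [t0, t1).
        tracker = self.trackers.get(key)
        return 0.0 if tracker is None else tracker.range_sum(t0, t1)


# Three stages starting at t = 0, 100, 200; the newest is sampled most densely.
sk = FreshnessSketch(starts=[0, 100, 200], rates=[0.01, 0.1, 1.0], seed=7)
for t in range(300):
    sk.write("obj-1", t, 1)
print(sk.query("obj-1", 200, 300))  # 100.0: the newest stage keeps every record
```

Queries over the newest stage are exact here because its rate is 1.0; queries reaching into older stages return unbiased but noisier estimates, which mirrors the per-stage precision the abstract describes.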

Description

Technical field
[0001] The invention belongs to the field of information technology. Targeting the application characteristics of streaming big data and the characteristics of big data over its life cycle, it proposes a freshness-sensitive big data summary information maintenance and aggregate value query method that effectively supports high-precision approximate aggregate statistical queries over streaming big data for any time interval, providing basic tools and a platform for other online computation over streaming big data.
Background art
[0002] Streaming big data refers to big data sources produced by a class of applications with high throughput and massive data scale, also known as Fast Data. Typical applications include microblog data from large microblog websites, click-stream data from shopping websites, and transaction log stream data. A common feature of this type of data is that each data record contains a time attribute (Ts) generated by...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/21; G06F16/2477
Inventors: Wu Guangjun (吴广君), Wang Shupeng (王树鹏), Yun Xiaochun (云晓春), Zhang Xiaoyu (张晓宇)
Owner: INST OF INFORMATION ENG CHINESE ACAD OF SCI