Time Series Similarity Query Method Based on Inverted Index

Time series and inverted index technology, applied to special data processing applications, instruments, and electrical digital data processing. It addresses problems of existing approaches, such as inconsistent query performance, index overhead, and poor scalability with data size, and achieves low space overhead and maintenance cost, reduced disk I/O overhead, and stable scalability.

Inactive Publication Date: 2017-06-13
ZHEJIANG UNIV
4 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

Such methods include piecewise aggregate approximation (PAA), piecewise linear approximation (PLA), symbolic aggregate approximation (SAX), singular value decomposition (SVD), and principal component analysis (PCA). The first three first divide the original time series into segments and then process each segment separately: PAA computes the mean of each segment; PLA fits a line segment to each segment; SAX builds on PAA and discretizes each segment mean into a symbol. Because these methods approximate the original time series differently, their query performance also differs.
SVD and PCA work by performing a single eigen-decomposition over all time series. Their typical drawbacks are high computational complexity and a decomposition that can only be carried out in memory, so they scale very poorly with data size.
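To make the segment-based representations concrete, here is a minimal sketch of PAA and SAX in Python. It is an illustration only, not the patent's implementation: the function names, the requirement that the series length be a multiple of the segment count, and the choice of equiprobable standard-normal breakpoints (the usual SAX convention, not stated in this text) are assumptions.

```python
from statistics import NormalDist, fmean


def paa(series, segments):
    """Piecewise aggregate approximation: the mean of each equal-length segment."""
    n = len(series)
    if n % segments != 0:
        # Assumption of this sketch: no handling of uneven segment lengths.
        raise ValueError("len(series) must be a multiple of segments")
    width = n // segments
    return [fmean(series[i * width:(i + 1) * width]) for i in range(segments)]


def sax_breakpoints(alphabet_size):
    """Cut points that split the standard normal into equiprobable regions."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def sax(series, segments, alphabet="abcd"):
    """Discretize the PAA means into symbols; expects a Z-normalized series."""
    cuts = sax_breakpoints(len(alphabet))
    # The symbol index is the number of breakpoints lying below the segment mean.
    return "".join(alphabet[sum(value > c for c in cuts)] for value in paa(series, segments))


# Hypothetical, roughly normalized series of 8 points, reduced to 4 segments.
series = [-1.5, -1.3, -0.4, -0.2, 0.1, 0.3, 1.4, 1.6]
word = sax(series, 4)  # "abcd"
```

Each symbol stands for a whole segment, so the word is much shorter than the series; this is what makes it usable as an index key.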
[0005] Most indexing methods used in industry so far are tree-based spatial indexes. The B-tree was first used to index one-dimensional data and is the basis of many hierarchical index structures. The R-tree family, such as the R*-tree and R+-tree, organizes data with minimum bounding rectangles (MBRs), but an MBR can cover a large amount of space that contains no data, which produces many "false hits" in query results and lowers query efficiency. The A-tree uses a vector approximation file to store the upper and lower bounds of the MBRs and virtual bounding rectangles, which keeps index overhead low and query completeness high.
Because time series in industrial production are high-dimensional or ultra-high-dimensional, they may remain very high-dimensional even after dimensionality reduction within an acceptable loss of precision. Tree-based index methods are therefore prone to the "curse of dimensionality".
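The "false hit" effect of minimum bounding rectangles described above can be shown with a small two-dimensional sketch. The point set, the query rectangle, and the helper names are hypothetical; the code only illustrates why a bounding rectangle can match a query that contains no data.

```python
def mbr(points):
    """Minimum bounding rectangle as (xmin, ymin, xmax, ymax)."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))


def intersects(a, b):
    """True if two axis-aligned rectangles overlap (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(rect, point):
    """True if the point lies inside the rectangle (borders included)."""
    return rect[0] <= point[0] <= rect[2] and rect[1] <= point[1] <= rect[3]


# Points on a diagonal: their bounding rectangle is the whole 10 x 10 square,
# although almost all of that square is empty.
points = [(i, i) for i in range(11)]
box = mbr(points)

# A query in the empty lower-right corner still overlaps the rectangle ...
query = (7, 0, 10, 3)
node_matches = intersects(box, query)

# ... but no actual point falls inside it, so visiting this node is a false hit.
true_hits = [p for p in points if contains(query, p)]
```

Here `node_matches` is true while `true_hits` is empty: the index node has to be read even though it contributes nothing to the result.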


Examples

Detailed Description of Embodiments

[0036] The present invention will be described in further detail below in conjunction with the accompanying drawings.

[0037] As shown in Figure 1, the time series similarity query method based on an inverted index of the present invention comprises the following steps:

[0038] (1) Index construction, specifically including the following sub-steps:

[0039] (1.1) Sequentially read each time series $T=\{t_1,t_2,\ldots,t_i,\ldots,t_n\}$ from the time series database;

[0040] (1.2) Perform feature extraction on the time series $T$ to obtain the coarse-grained symbolic aggregate approximation (SAX) word $SW'$ and the fine-grained SAX word $SW''$; specifically:

[0041] (1.2.1) For the time series $T$, compute the mean $m$ and standard deviation $\sigma$ of all sampling points and Z-normalize $T$ according to formula (1), obtaining the normalized time series $T'=\{t'_1,t'_2,\ldots,t'_i,\ldots,t'_n\}$;

[0042] $$t'_i = \frac{t_i - m}{\sigma}, \quad i = 1, 2, \ldots, n \tag{1}$$

[0043] (1.2.2) Use symbol ...
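The Z-normalization of step (1.2.1) can be sketched as follows. This is a minimal illustration: the function name is hypothetical, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption, since the text does not say which one is meant.

```python
from statistics import fmean, pstdev


def z_normalize(series):
    """Shift a series to zero mean and scale it to unit standard deviation."""
    m = fmean(series)
    sigma = pstdev(series)  # assumption: population standard deviation
    if sigma == 0:
        # A constant series has no shape to normalize; fail rather than divide by zero.
        raise ValueError("cannot Z-normalize a constant series")
    return [(t - m) / sigma for t in series]


# Hypothetical series with mean 5 and population standard deviation 2.
normalized = z_normalize([2, 4, 4, 4, 5, 5, 7, 9])
```

After normalization, two series with the same shape but different offset or amplitude map to the same values, which is what lets fixed breakpoints be used when the series is later turned into symbols.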


Abstract

The invention discloses a time series similarity query method based on an inverted index. The method comprises index construction and query processing. First, a real-valued time series is converted into a discrete character string through the symbolic aggregate approximation (SAX) representation. Feature subsequences are then extracted and their codes are stored in vector approximation files; the subsequences are converted into words at two granularities and inserted into an inverted index, building a multi-granularity time series inverted index. An efficient two-stage filtering query method is designed for this index to support k-nearest-neighbor similarity queries. While maintaining relatively high precision, the method has low query time overhead and scales well with time series length, the k of the k-nearest-neighbor query, and data set size. The method can play an important role in daily activities and industrial production, for example real-time querying of stock volatility and online pattern recognition on sensor data streams.
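The general idea of a two-granularity inverted index with a two-stage filter can be sketched as below. This is a simplified illustration under stated assumptions, not the patented procedure: it indexes one word per whole series (the feature-subsequence extraction and vector approximation file encoding are omitted), the class name, the `COARSE` and `FINE` parameters, and the fallback rule in stage two are all hypothetical, and the result is an approximate k-nearest-neighbor answer.

```python
from collections import defaultdict
from math import dist
from statistics import NormalDist, fmean, pstdev


def z_normalize(series):
    m, sigma = fmean(series), pstdev(series)
    if sigma == 0:
        raise ValueError("cannot Z-normalize a constant series")
    return [(t - m) / sigma for t in series]


def sax_word(series, segments, alphabet_size):
    """SAX word of a Z-normalized series whose length is a multiple of segments."""
    width = len(series) // segments
    cuts = [NormalDist().inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]
    symbols = []
    for s in range(segments):
        avg = fmean(series[s * width:(s + 1) * width])
        symbols.append(chr(ord("a") + sum(avg > c for c in cuts)))
    return "".join(symbols)


class MultiGranularityIndex:
    COARSE = (2, 3)  # hypothetical (segments, alphabet size) for coarse words
    FINE = (4, 4)    # hypothetical (segments, alphabet size) for fine words

    def __init__(self):
        self.series = {}                 # id -> normalized series
        self.coarse = defaultdict(set)   # coarse word -> posting set of ids
        self.fine = defaultdict(set)     # fine word -> posting set of ids

    def add(self, series_id, raw):
        norm = z_normalize(raw)
        self.series[series_id] = norm
        self.coarse[sax_word(norm, *self.COARSE)].add(series_id)
        self.fine[sax_word(norm, *self.FINE)].add(series_id)

    def knn(self, raw_query, k):
        q = z_normalize(raw_query)
        # Stage 1: keep only series that share the query's coarse word.
        candidates = self.coarse.get(sax_word(q, *self.COARSE), set())
        # Stage 2: narrow to series sharing the fine word, if enough remain.
        fine_hits = candidates & self.fine.get(sax_word(q, *self.FINE), set())
        if len(fine_hits) >= k:
            candidates = fine_hits
        # Rank the surviving candidates by exact Euclidean distance.
        ranked = sorted(candidates, key=lambda sid: (dist(q, self.series[sid]), sid))
        return ranked[:k]


index = MultiGranularityIndex()
index.add("up", [1, 2, 3, 4, 5, 6, 7, 8])
index.add("up_noisy", [1, 2, 3, 5, 4, 6, 7, 8])
index.add("down", [8, 7, 6, 5, 4, 3, 2, 1])
index.add("flat_then_up", [1, 1, 1, 1, 1, 1, 5, 9])

top2 = index.knn([2, 3, 4, 5, 6, 7, 8, 9], k=2)  # ["up", "up_noisy"]
```

In this toy data set the falling series is discarded by the coarse word alone, and the exact distance is computed only for the few series that survive both filters. The price of the filtering is that a true neighbor whose coarse word differs from the query's would be missed.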

Description

Technical field

[0001] The invention relates to the fields of databases, data mining, and information retrieval, and in particular to a time series similarity query method based on an inverted index.

Background

[0002] Time series are ubiquitous in daily life and industrial production, for example real-time transaction data of funds or stocks, daily sales data in the retail market, sensor monitoring data in the process industry, astronomical observation data, aerospace radar and satellite monitoring data, and real-time weather temperature and air quality indices.

[0003] Time series similarity query, also known as time series query by example, is widely needed in industry. For example, in real-time stock trading, traders want to retrieve from massive historical stock data the k historical sequences most similar to the current stock trend, as a reference for gaining valuable knowledge and insight. Complete the tr...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30
CPC: G06F16/2228, G06F16/2246, G06F16/2272, G06F16/244, G06F16/2452, G06F16/24528
Inventors: 孙建伶, 陈岭, 蔡青林, 马骄阳
Owner: ZHEJIANG UNIV