Time series data motif identification method and device

A time series data motif identification technology, applied in electrical digital data processing, special data processing applications, instruments, etc. It addresses problems such as heavy computation, loss of time series data information, and slow recognition of motifs, with the effect of improving accuracy and increasing the number of motifs identified.

Inactive Publication Date: 2015-06-17
NEC CORP

AI Technical Summary

Problems solved by technology

[0006] The exact identification method described above must calculate the Euclidean distance between every pair of scanned data subsequences. When the number of scanned data subsequences is large, this pairwise distance calculation becomes very expensive, resulting in slow recognition of motifs.
[0007] The probabilistic identification method described above, by contrast, discretizes the time series data and reduces its dimensionality through symbolization and random projection. This loses some time series information that may belong to motifs, and data subsequences that differ widely in displacement and have a low probability of recurring in the original time series data may be identified as motifs. The accuracy of the motifs identified by the probabilistic method is therefore not high. In addition, a data subsequence is recognized as a motif only when the number of times its symbol subsequence shares the same symbol with other symbol subsequences at the projected positions reaches a certain threshold. Motifs whose count does not reach the threshold cannot be recognized, so the number of motifs identified by the probabilistic method is limited, which further reduces the accuracy of motif identification.
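The cost described in [0006] can be illustrated with a minimal Python sketch. It assumes equal-length subsequences taken with a sliding window; the function name, the window length and the sample series are hypothetical and are not taken from the patent. The sketch only shows that comparing every pair requires n(n-1)/2 distance calculations, and it does not exclude overlapping windows.

```python
import math
from itertools import combinations

def brute_force_closest_pair(subsequences):
    """Compare every pair of equal-length subsequences by Euclidean distance."""
    best = None          # (distance, index_a, index_b) of the closest pair so far
    comparisons = 0      # number of distance calculations performed
    for (i, a), (j, b) in combinations(enumerate(subsequences), 2):
        d = math.dist(a, b)
        comparisons += 1
        if best is None or d < best[0]:
            best = (d, i, j)
    return best, comparisons

# Hypothetical data: a sliding window of length 3 over 8 points gives 6 subsequences.
series = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 5.0, 0.0]
w = 3
subs = [series[i:i + w] for i in range(len(series) - w + 1)]
best, comparisons = brute_force_closest_pair(subs)
print(len(subs), comparisons, best)
```

With n subsequences the loop runs n(n-1)/2 times, so doubling the number of subsequences roughly quadruples the work.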

Examples


Embodiment 1

[0075] An embodiment of the present invention provides a method for motif recognition of time series data. Referring to Figure 1, the method flow provided by this embodiment includes:

[0076] 101: Obtain time series data to be analyzed, divide the time series data to be analyzed into at least two data subsequences, and perform symbolic processing on each data subsequence to obtain at least two symbolic subsequences.

[0077] 102: Perform a preset number of random projections on the symbol subsequences, and record, for each projected symbol subsequence, the number of times it has the same symbol at the projection positions as other projected symbol subsequences.

[0078] 103: For each pair of data subsequences whose recorded count exceeds the threshold, calculate the distance between the two data subsequences, and take the pairs whose distance is smaller than the first preset distance as the identified standard motifs.

[0079] 104: Cluster the standard motifs within each preset range to ob...
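Steps 101 to 103 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the text does not specify the symbolization scheme, so the sketch assumes a SAX-style approach (z-normalization, segment averaging, and a 4-letter alphabet with breakpoints -0.67, 0, 0.67), non-overlapping segmentation, and per-pair collision counting. All function names, parameters and data are hypothetical.

```python
import math
import random
from collections import defaultdict
from itertools import combinations

BREAKPOINTS = (-0.67, 0.0, 0.67)  # assumed breakpoints for the alphabet "abcd"

def symbolize(sub, word_len):
    """Step 101 (assumed SAX-style): z-normalize, average segments, map to letters."""
    mean = sum(sub) / len(sub)
    std = math.sqrt(sum((x - mean) ** 2 for x in sub) / len(sub)) or 1.0
    z = [(x - mean) / std for x in sub]
    seg = len(z) // word_len
    word = []
    for k in range(word_len):
        avg = sum(z[k * seg:(k + 1) * seg]) / seg
        word.append("abcd"[sum(avg > b for b in BREAKPOINTS)])
    return "".join(word)

def count_collisions(words, n_projections, n_positions, rng):
    """Step 102: per pair, count projections in which both words agree on all kept positions."""
    counts = defaultdict(int)
    word_len = len(words[0])
    for _ in range(n_projections):
        positions = sorted(rng.sample(range(word_len), n_positions))
        buckets = defaultdict(list)
        for idx, word in enumerate(words):
            buckets[tuple(word[p] for p in positions)].append(idx)
        for members in buckets.values():
            for i, j in combinations(members, 2):
                counts[(i, j)] += 1
    return counts

def standard_motifs(subs, counts, threshold, max_dist):
    """Step 103: run the true distance check only on pairs whose count exceeds the threshold."""
    return [(i, j) for (i, j), c in counts.items()
            if c > threshold and math.dist(subs[i], subs[j]) < max_dist]

# Hypothetical data, split into non-overlapping subsequences of length 4.
series = [0, 1, 2, 3, 9, 9, 9, 9, 0, 1, 2, 3, 5, 1, 5, 1]
subs = [series[i:i + 4] for i in range(0, len(series), 4)]
words = [symbolize(s, word_len=4) for s in subs]
counts = count_collisions(words, n_projections=10, n_positions=2, rng=random.Random(0))
pairs = standard_motifs(subs, counts, threshold=5, max_dist=0.5)
print(words, dict(counts), pairs)
```

The distance check in `standard_motifs` runs only on the candidate pairs that survive the projection stage, which is how this kind of approach avoids comparing every pair of subsequences.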

Embodiment 2

[0103] Analysis of and research on the motifs of time series data can reveal important laws governing the movement, change and development of things, which is of great significance for understanding things correctly and making scientific decisions based on that understanding. For example, by studying the time series data of a city's annual traffic conditions, important indicators of the city's traffic conditions can be obtained, and these indicators provide a basis for predicting the city's future traffic conditions. To this end, an embodiment of the present invention provides a method for motif recognition of time series data. The method provided in this embodiment is explained in detail below in combination with the content of Embodiment 1 above. Referring to Figure 2, the method flow provided by this embodiment includes:

[0104] 201: Obtain time series data to be analyzed.

[0105] This embodiment does not specifically limit the way to obtain the ti...

Embodiment 3

[0231] Referring to Figure 10, an embodiment of the present invention provides a time series data motif recognition device. The device includes:

[0232] An acquisition module 1001, configured to acquire time series data to be analyzed;

[0233] A segmentation module 1002, configured to segment the time series data to be analyzed into at least two data subsequences;

[0234] A processing module 1003, configured to perform symbolic processing on each data subsequence to obtain at least two symbolic subsequences;

[0235] A projection module 1004, configured to perform a preset number of random projections on the symbol subsequence;

[0236] A recording module 1005, configured to record the number of times that each projected symbol subsequence and other projected symbol subsequences have the same symbol at the projection position;

[0237] The first identification module 1006 is used to calculate the distance between the two data subsequences corresponding to the number of recordi...


Abstract

The invention discloses a time series data motif identification method and device, belonging to the field of time series data analysis. The method comprises the following steps: the time series data to be analyzed are divided into at least two data subsequences, and each data subsequence is converted into a symbol subsequence; random projection is performed on the symbol subsequences, and the number of times that each projected symbol subsequence has the same symbols as other projected symbol subsequences is recorded; two data subsequences whose recorded count exceeds a threshold and whose distance from each other is smaller than a first preset distance are taken as identified standard motifs; the standard motifs within each preset range are clustered to obtain a center data subsequence, and the variance of each preset range is calculated from the standard motifs in that range and the center data subsequence; the threshold is then decreased, the distance is calculated between each data subsequence whose recorded count exceeds the decreased threshold and the center data subsequence of the preset range in which it is located, and each data subsequence whose distance is smaller than the variance of its preset range is taken as an identified motif. The method and device improve the accuracy of motif identification while maintaining identification speed.
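The refinement stage summarized in the abstract (center data subsequence, per-range variance, lowered threshold) can be sketched for a single preset range as follows. The abstract does not say how the center or the variance is computed, so this sketch assumes the center is the element-wise mean of the standard motifs and the variance is the mean squared Euclidean distance to that center. The function names and data are hypothetical.

```python
import math

def center_and_variance(motifs):
    """Assumed definitions: element-wise mean, and mean squared distance to it."""
    n = len(motifs)
    center = [sum(col) / n for col in zip(*motifs)]
    variance = sum(math.dist(m, center) ** 2 for m in motifs) / n
    return center, variance

def refine(candidates, center, variance):
    """Keep candidates whose distance to the center is below the range's variance."""
    return [c for c in candidates if math.dist(c, center) < variance]

# Hypothetical standard motifs found in one preset range.
motifs = [[0, 0, 0], [2, 2, 2]]
center, variance = center_and_variance(motifs)

# Hypothetical candidates admitted after the collision threshold is lowered.
candidates = [[1, 1, 2], [5, 5, 5]]
kept = refine(candidates, center, variance)
print(center, variance, kept)
```

Lowering the threshold admits more candidates than the first pass, and the comparison against the center filters out those that do not resemble the standard motifs already found in the same range.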

Description

Technical field

[0001] The invention relates to the field of time series data analysis, in particular to a method and device for motif recognition of time series data.

Background

[0002] With the development of statistics, more and more data take the form of time series data. Time series data refers to data recorded in time order, for example daily fluctuation data of the stock market, annual rainfall data, and annual traffic condition data. Time series data contain recurring similar subsequences, and these recurring similar subsequences are called motifs. Since the motifs in time series data are of great significance to scientific research, how to identify motifs in large-scale time series data is the key to the study of time series data.

[0003] Existing motif recognition methods for time series data fall into two common categories: the exact recognition method and the probabilistic recognition method. For the prec...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventor: 刘博陈成李建强
Owner: NEC CORP