A time series data similarity measurement method and measurement system

A time series data similarity measurement technology, applicable to neural learning methods, medical data mining, electrical digital data processing and related fields. It addresses problems such as reduced efficiency and accuracy of similarity calculation and the loss of time information, and yields a dense, reasonable and effective representation.

Status: Inactive
Publication Date: 2019-06-28
XI AN JIAOTONG UNIV
Cites: 3; Cited by: 5

AI Technical Summary

Problems solved by technology

[0003] At present, sequence representations based on one-hot vectors are commonly used. Because such representations are sparse and high-dimensional, they seriously reduce the efficiency and accuracy of similarity calculation.
In addition,


Embodiment

[0080] Referring to Figure 1, a time series data similarity measurement method according to an embodiment of the present invention is applied to the similarity measurement of electronic health records and includes the following steps:

[0081] S101, constructing an effective representation of medical sequence events in electronic health records.

[0082] Step 1: The electronic health record (EMR) matrix is too sparse, so the first task is to make the sparse matrix dense and reduce the dimensionality of the high-dimensional matrix. Referring to Figure 2, each EMR matrix is converted into an event sequence: events are arranged by their relative time of occurrence, events that occur on the same day are not ordered among themselves, and the result is a vector H.
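The conversion in Step 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the EMR matrix has one row per event type and one column per day, with a nonzero entry meaning the event occurred on that day. The function name `emr_matrix_to_sequence`, the event names, and the choice to sort same-day events alphabetically (only to make the output deterministic, since the source leaves them unordered) are all assumptions.

```python
from typing import List, Sequence, Tuple


def emr_matrix_to_sequence(
    matrix: Sequence[Sequence[int]], event_names: Sequence[str]
) -> List[Tuple[int, str]]:
    """Flatten a sparse (event x day) matrix into a dense list of (day, event) pairs."""
    sequence: List[Tuple[int, str]] = []
    n_days = len(matrix[0]) if matrix else 0
    for day in range(n_days):
        # Collect every event recorded on this day.
        todays = [event_names[row] for row in range(len(matrix)) if matrix[row][day]]
        # Same-day events carry no order; sorting is only for reproducibility.
        for name in sorted(todays):
            sequence.append((day, name))
    return sequence


# Hypothetical 3-event, 4-day record (rows: events, columns: days).
names = ["glucose_test", "insulin", "visit"]
emr = [
    [1, 0, 0, 1],  # glucose_test on day 0 and day 3
    [0, 0, 1, 0],  # insulin on day 2
    [1, 0, 0, 0],  # visit on day 0
]
H = emr_matrix_to_sequence(emr, names)
```

Days with no events simply disappear from `H`, which is what makes the sequence dense compared with the original matrix; the day index is kept alongside each event so that timing information is still available to later steps.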

[0083] Step 2: Use word2vec to map each medical event in the electronic health record to a fixed-length vector, so as to capture the relative relationships among the medical events in the record. word2vec is an efficient...
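To make the word2vec idea concrete, the sketch below shows how training pairs could be drawn from an event sequence under the skip-gram formulation, where each event is paired with its neighbours inside a context window. The source does not say whether skip-gram or CBOW is used, nor what window size applies, so the function `skipgram_pairs`, the `window` parameter, and the sample events are assumptions for illustration only. The embedding training itself is omitted.

```python
from typing import List, Sequence, Tuple


def skipgram_pairs(events: Sequence[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for every event and its neighbours within `window`."""
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(events):
        lo = max(0, i - window)
        hi = min(len(events), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # an event is not its own context
                pairs.append((center, events[j]))
    return pairs


# Hypothetical event sequence taken from one patient record.
events = ["visit", "glucose_test", "insulin"]
pairs = skipgram_pairs(events, window=1)
```

Events that repeatedly appear near one another produce many shared pairs, which is what lets the learned fixed-length vectors place related medical events close together.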

Abstract

The invention discloses a time series data similarity measurement method and measurement system. The method comprises the following steps: first, a vector representation is learned for each event across all time series data; second, the occurrence time of each event is mapped to a vector with the same dimension as the event vector and embedded into the event vector by vector addition; finally, the resulting event sequence representation is fed into a convolutional neural network for supervised learning, yielding a robust time series data similarity measurement model, which is then used to measure similarity. The method makes the representation of time series data more reasonable and effective, and can therefore improve the accuracy of time series data similarity measurement.
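The second step of the abstract, adding a time vector to an event vector of the same dimension, can be sketched as below. The text here does not state how the occurrence time is mapped to a vector, so this example swaps in a sinusoidal encoding purely as an assumption; the names `time_vector` and `add_time`, the base constant 10000, and the sample values are hypothetical. The convolutional network that consumes the result is not shown.

```python
import math
from typing import List, Sequence


def time_vector(day: float, dim: int) -> List[float]:
    """Map an occurrence time to a `dim`-length vector (assumed sinusoidal encoding)."""
    vec: List[float] = []
    for k in range(dim):
        # Pairs of positions share a frequency; even slots use sin, odd slots use cos.
        freq = 1.0 / (10000 ** ((2 * (k // 2)) / dim))
        angle = day * freq
        vec.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
    return vec


def add_time(event_vec: Sequence[float], day: float) -> List[float]:
    """Embed the time vector into the event vector by element-wise addition."""
    t = time_vector(day, len(event_vec))
    return [e + x for e, x in zip(event_vec, t)]


# Hypothetical 4-dimensional event vector for an event on day 0.
combined = add_time([0.5, 0.5, 0.5, 0.5], day=0)
```

Because the time vector has the same dimension as the event vector, the sum keeps the representation at a fixed width per event, so the same event occurring at different times yields different inputs without enlarging the model's input size.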

Description

Technical field

[0001] The invention belongs to the technical field of time series data similarity, and in particular relates to a time series data similarity measurement method and a measurement system.

Background

[0002] Data similarity measurement is a basic problem in data science and is involved in many application fields, such as natural language processing, data retrieval, and cohort analysis. Real-world scenarios contain a large amount of time series data, and such data are usually temporal, high-dimensional, heterogeneous, sparse, of unequal length, and irregular.

[0003] At present, sequence representations based on one-hot vectors are commonly used. Because such representations are sparse and high-dimensional, they seriously reduce the efficiency and accuracy of similarity calculation. In addition, existing methods usually aggregate sequence events within a specific time period, ignorin...
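The one-hot limitation described in [0003] can be illustrated with a small sketch. Any two distinct events have orthogonal one-hot vectors, so their cosine similarity is always zero regardless of how clinically related they are, while the vector length grows with the number of event types. The event names and the two-dimensional dense vectors below are made up for illustration and are not taken from the patent.

```python
import math
from typing import List, Sequence


def one_hot(index: int, size: int) -> List[float]:
    """Return a vector of length `size` with a single 1.0 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical vocabulary of three event types.
vocab = ["glucose_test", "hba1c_test", "insulin"]
sim_one_hot = cosine(one_hot(0, len(vocab)), one_hot(1, len(vocab)))

# Made-up dense vectors for the same two events, for contrast only.
dense = {"glucose_test": [0.9, 0.1], "hba1c_test": [0.8, 0.2]}
sim_dense = cosine(dense["glucose_test"], dense["hba1c_test"])
```

Here `sim_one_hot` is zero even though both events are related tests, whereas a dense representation can assign them a high similarity with far fewer dimensions.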

Application Information

IPC(8): G06K9/62, G06N3/04, G06N3/08, G06F17/27, G16H50/70
Inventor: 钱步月, 张先礼, 陆亮, 王谞动, 刘小彤, 李扬, 卫荣, 郑庆华
Owner XI AN JIAOTONG UNIV