Method for computing similarity joins over massive time series data

A method for computing similarity joins over time series data, applied in the database field. It addresses the problem that existing methods do not consider computational efficiency in the exact-computation stage that follows partitioning, and it achieves balanced computation and a uniform data volume across partitions.

Inactive Publication Date: 2019-03-19
XINJIANG INST OF ENG
7 Cites, 2 Cited by

AI Technical Summary

Problems solved by technology

Two existing methods, MAPSS (proposed by Google) and ClusterJoin (proposed by Microsoft), focus only on the amount of computation in the partitioning stage and do not consider computational efficiency in the exact-computation stage that follows partitioning.

Examples

Embodiment Construction

[0053] A specific embodiment of the present invention is described in detail below with reference to Figures 1-7. It should be understood, however, that the scope of protection of the invention is not limited to this embodiment.

[0054] The invention provides a method for computing similarity joins over massive time series data, comprising the following steps:

[0055] S1. Data preprocessing. Because processing the massive data set directly is too difficult, first randomly sample a small data set S from the massive data set D (see step 1 in Figure 2);
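Step S1 can be illustrated with a minimal sketch. The text does not say how the random sample is drawn, so the code below assumes reservoir sampling (Algorithm R), which suits data too large to hold in memory because it reads the input once. The function name `reservoir_sample`, the sample size `k`, and the toy data set are hypothetical and chosen only for illustration.

```python
import random
from typing import Iterable, List, Sequence


def reservoir_sample(stream: Iterable[Sequence[float]], k: int, seed: int = 0) -> List[Sequence[float]]:
    """Draw up to k items uniformly at random from a stream in one pass."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    reservoir: List[Sequence[float]] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Toy stand-in for the massive data set D: 1000 short series (assumption).
D = [[float(i), float(i + 1), float(i + 2)] for i in range(1000)]
S = reservoir_sample(iter(D), k=50)
```

The sample S is then the only data the tree-building step has to touch, which is the point of sampling first.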

[0056] S2. Select a reference point for the sampled data S and build an SJT tree, denoted SJT_S. When computing the similarity join of massive time series data, the SJT tree can be used to prune unnecessary data comparisons, which improves processing efficiency. Based on the SJT tree, the data pair comparisons of the similarity join are divided into two types. The first type is the internal da...
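The excerpt does not describe the SJT tree's structure, so the sketch below is not the SJT tree. It only illustrates the general idea of pruning comparisons with a reference point, assuming a metric distance (Euclidean here): by the triangle inequality, if |d(x, p) - d(y, p)| > eps for a reference point p, then d(x, y) > eps and the pair can be skipped without computing d(x, y). The function name, the reference point, and the data are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def join_with_reference_pruning(
    series: Sequence[Sequence[float]],
    reference: Sequence[float],
    eps: float,
) -> Tuple[List[Tuple[int, int]], int]:
    """Return index pairs within eps, plus the number of pairs pruned."""
    # One distance per series to the reference point, computed once.
    to_ref = [math.dist(s, reference) for s in series]
    result: List[Tuple[int, int]] = []
    pruned = 0
    for i, j in combinations(range(len(series)), 2):
        # Triangle-inequality lower bound on d(series[i], series[j]).
        if abs(to_ref[i] - to_ref[j]) > eps:
            pruned += 1
            continue
        # Only the surviving pairs need the exact distance.
        if math.dist(series[i], series[j]) <= eps:
            result.append((i, j))
    return result, pruned


series = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [5.0, 5.0, 5.0], [5.0, 5.1, 5.0]]
pairs, pruned = join_with_reference_pruning(series, reference=[0.0, 0.0, 0.0], eps=0.5)
```

In this toy run, four of the six candidate pairs are discarded using only the precomputed distances to the reference point.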

Abstract

The invention relates to the technical field of databases and discloses a method for computing similarity joins over massive time series data in a distributed environment. The method comprises the following steps: S1, data preprocessing: randomly sample a small data set S from the massive data set D; S2, select a reference point for the sampled data S and build an SJT tree, denoted SJT_S; S3, expand the SJT_S tree into a complete tree SJT_C; S4, build a partition set P = {G1, G2, ..., Gi, ..., Gn} from the leaf nodes of the complete tree SJT_C; and S5, use the distributed computing framework MapReduce to compare the candidate data pairs in the partition set P = {G1, G2, ..., Gi, ..., Gn} and obtain all data pairs in the massive time series data set D that meet the threshold. The method is designed specifically for similarity join computation. Its main advantage is that partition information is used to prune the massive data set, which reduces the amount of computation and improves efficiency. According to the reported tests, the method outperforms two existing methods: MAPSS, proposed by Google, and ClusterJoin, proposed by Microsoft.
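The map, group, and reduce pattern behind step S5 can be sketched in plain Python without a real MapReduce cluster. Everything specific here is an assumption made for illustration: partitions are assigned by nearest reference point rather than by SJT_C leaf nodes, the distance is Euclidean, and the names `map_phase`, `shuffle`, and `reduce_phase` are hypothetical. The sketch compares only pairs that fall in the same group; pairs that span groups need extra handling that is not shown.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

Record = Tuple[int, Sequence[float]]  # (series id, series values)


def map_phase(dataset: Sequence[Sequence[float]], refs: Sequence[Sequence[float]]) -> Iterator[Tuple[int, Record]]:
    """Emit (partition id, record); partition = nearest reference point (assumption)."""
    for sid, s in enumerate(dataset):
        pid = min(range(len(refs)), key=lambda p: math.dist(s, refs[p]))
        yield pid, (sid, s)


def shuffle(pairs: Iterable[Tuple[int, Record]]) -> Dict[int, List[Record]]:
    """Group records by partition id, as the framework would between map and reduce."""
    groups: Dict[int, List[Record]] = defaultdict(list)
    for pid, record in pairs:
        groups[pid].append(record)
    return groups


def reduce_phase(groups: Dict[int, List[Record]], eps: float) -> Iterator[Tuple[int, int]]:
    """Within each group, emit the id pairs whose distance is at most eps."""
    for members in groups.values():
        for (i, a), (j, b) in combinations(members, 2):
            if math.dist(a, b) <= eps:
                yield (i, j)


dataset = [[0.0, 0.0], [0.2, 0.0], [9.0, 9.0], [9.0, 9.3], [4.0, 4.0]]
refs = [[0.0, 0.0], [9.0, 9.0]]
matches = sorted(reduce_phase(shuffle(map_phase(dataset, refs)), eps=0.5))
```

Because each group is reduced independently, groups of similar size keep the reducers evenly loaded, which is the balance property the summary emphasizes.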

Description

Technical field

[0001] The invention relates to the technical field of databases, in particular to a method for computing similarity joins over massive time series data.

Background

[0002] With the rapid development of the Internet of Things, sensor networks, the Internet, and various smart devices, many industries (such as medical care, cyberspace, and various monitoring applications) have continuously accumulated massive time series data. Analyzing and mining time series data is of great significance: a time series records the patterns and characteristics of a measured object as it changes over time, and analysis and mining algorithms can surface the valuable information it contains.

[0003] At present, one of the hot issues in the analysis of massive time series data is to use distributed storage and computing platforms to study the sequence similarity join problem, which means that under a given similarity measur...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F16/2458
Inventors: 刘文, 张土前, 王思秀, 刘俊霞, 付国庆
Owner: XINJIANG INST OF ENG