Method for calculating similarity connection of mass time series data

A time series computation method in the database field. It addresses the fact that existing methods ignore computation efficiency in the refinement (exact computation) stage after partitioning, and it balances the computation load and evens out the data volume across partitions.

Inactive Publication Date: 2019-03-19

XINJIANG INST OF ENG

Cites 7, cited by 2


AI Technical Summary

Problems solved by technology

Two existing methods, MAPSS (proposed by Google) and ClusterJoin (proposed by Microsoft), focus only on the amount of computation in the partitioning stage and do not consider computation efficiency in the refinement stage that follows partitioning.



Detailed Description of Embodiments

[0053] A specific embodiment of the present invention is described in detail below with reference to Figures 1-7. It should be understood, however, that the protection scope of the present invention is not limited by this specific embodiment.

[0054] The invention provides a method for computing the similarity join of massive time series data, comprising the following steps:

[0055] S1. Data preprocessing. Because processing the massive data directly is too costly, a small data set S is first randomly sampled from the massive data set D (step 1 in Figure 2);
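The excerpt does not say how S is drawn or how large it should be. The following is a minimal sketch, assuming independent Bernoulli sampling at a fixed rate; `sample_dataset`, `rate`, and `seed` are hypothetical names chosen for illustration.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_dataset(data: Sequence[T], rate: float, seed: int = 0) -> List[T]:
    """Draw a random sample S from data set D (hypothetical helper).

    Each record is kept independently with probability `rate`
    (Bernoulli sampling, an assumption; the patent excerpt does not
    name a sampling scheme). This needs only one pass over D.
    """
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must be in (0, 1]")
    rng = random.Random(seed)  # seeded so the sample is reproducible
    return [x for x in data if rng.random() < rate]


# Usage: a toy D of 10,000 two-point series, sampled at about 1%.
D = [[float(i), float(i + 1)] for i in range(10_000)]
S = sample_dataset(D, rate=0.01, seed=42)
print(len(S))
```

Bernoulli sampling is chosen here only because each record can be decided on its own, which would suit a streaming or per-mapper pass; reservoir sampling would be the alternative if an exact sample size were required.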

[0056] S2. Select a reference point for the sampled data S and build an SJT tree, denoted SJT_S. When computing the similarity join of massive time series data, the SJT tree can be used to prune unnecessary data comparisons, which improves processing efficiency. Based on the SJT tree, the data-pair comparisons of the similarity join are divided into two types: the first type is the internal da...
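The excerpt does not describe the internal structure of the SJT tree, so the sketch below does not implement it. It shows only the generic reference-point pruning rule that such structures typically rely on, assuming Euclidean distance: by the triangle inequality, |d(x, r) - d(y, r)| <= d(x, y), so a pair whose distances to the reference point r differ by more than the threshold eps can be discarded without computing d(x, y). All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Series = Sequence[float]


def euclidean(a: Series, b: Series) -> float:
    """Euclidean distance between two equal-length series (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def candidate_pairs(data: List[Series], ref: Series, eps: float) -> List[Tuple[int, int]]:
    """Index pairs that survive pruning against a single reference point."""
    # Distance of every series to the reference point, computed once.
    dist = [euclidean(s, ref) for s in data]
    # Sort by that distance so the inner sweep can stop early.
    order = sorted(range(len(data)), key=lambda i: dist[i])
    pairs = []
    for a, i in enumerate(order):
        for j in order[a + 1:]:
            if dist[j] - dist[i] > eps:
                break  # all later j are farther still, so d(i, j) > eps
            pairs.append((min(i, j), max(i, j)))
    return pairs


def similarity_join(data: List[Series], ref: Series, eps: float) -> List[Tuple[int, int]]:
    """Refine surviving candidates with the exact distance."""
    return [(i, j) for i, j in candidate_pairs(data, ref, eps)
            if euclidean(data[i], data[j]) <= eps]


data = [[0.0, 0.0], [0.5, 0.0], [3.0, 0.0], [3.0, 0.4], [10.0, 10.0]]
print(similarity_join(data, ref=[0.0, 0.0], eps=1.0))  # [(0, 1), (2, 3)]
```

In this toy input only 2 of the 10 possible pairs reach the exact distance check; the other 8 are pruned by the reference-point bound alone.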


Abstract

The invention relates to the technical field of databases and discloses a method for computing the similarity join of massive time series data in a distributed environment, comprising the following steps: S1, data preprocessing: randomly sample a small data set S from the massive data set D; S2, select a reference point for the sampled data S and build an SJT tree, denoted SJT_S; S3, expand the SJT_S tree into a complete tree SJT_C; S4, build a partition set P = {G1, G2, ..., Gi, ..., Gn} over the leaf nodes of the complete tree SJT_C; and S5, use the distributed computing framework MapReduce to compute the similarity-join comparison data pairs in the partition set P = {G1, G2, ..., Gi, ..., Gn}, obtaining all data pairs in the massive time series data set D that satisfy the threshold. Designed specifically for similarity join computation, the method's main advantage is that it prunes the massive data set using partition information, which effectively reduces the amount of computation and improves computational efficiency. In testing, the method outperformed two existing methods: MAPSS, proposed by Google, and ClusterJoin, proposed by Microsoft.
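Step S5 runs the comparisons with MapReduce. The sketch below simulates the map, shuffle, and reduce phases in plain Python to show how partitioned comparison can still find every qualifying pair. The partitioning rule is an assumption, not the SJT_C leaf partition from S4: each record goes to its nearest reference point ("home") and is replicated to any other partition q whose reference point is at most 2·eps farther away. By the triangle inequality, this guarantees that any pair within eps meets in at least one reducer. Euclidean distance and all function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Series = Sequence[float]


def euclidean(a: Series, b: Series) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def map_phase(records: Dict[int, Series], refs: List[Series], eps: float):
    """Emit (partition_id, (record_id, home_id)) for each record copy."""
    for rid, s in records.items():
        d = [euclidean(s, r) for r in refs]
        home = min(range(len(refs)), key=d.__getitem__)
        yield home, (rid, home)
        for q in range(len(refs)):
            # Replicate near-boundary records; the tiny tolerance only
            # guards against float rounding and costs extra copies at most.
            if q != home and d[q] - d[home] <= 2 * eps + 1e-9:
                yield q, (rid, home)


def shuffle(emitted):
    """Group mapper output by partition id, as the framework would."""
    groups = defaultdict(list)
    for key, value in emitted:
        groups[key].append(value)
    return groups


def reduce_phase(q, values, records, eps) -> List[Tuple[int, int]]:
    homes = [rid for rid, h in values if h == q]
    outers = [(rid, h) for rid, h in values if h != q]
    out = []
    # Pairs whose members both live in partition q.
    for a in range(len(homes)):
        for b in range(a + 1, len(homes)):
            i, j = homes[a], homes[b]
            if euclidean(records[i], records[j]) <= eps:
                out.append((min(i, j), max(i, j)))
    # Cross-partition pairs appear in both home partitions;
    # report each one only in the smaller partition id.
    for i in homes:
        for j, h in outers:
            if q < h and euclidean(records[i], records[j]) <= eps:
                out.append((min(i, j), max(i, j)))
    return out


def mapreduce_similarity_join(records, refs, eps) -> List[Tuple[int, int]]:
    groups = shuffle(map_phase(records, refs, eps))
    result = []
    for q, values in groups.items():
        result.extend(reduce_phase(q, values, records, eps))
    return sorted(result)


records = {0: [0.0, 0.0], 1: [0.5, 0.0], 2: [3.0, 0.0], 3: [3.0, 0.4]}
refs = [[0.0, 0.0], [3.0, 0.0]]
print(mapreduce_similarity_join(records, refs, eps=1.0))  # [(0, 1), (2, 3)]
```

The "report only in the smaller partition id" rule avoids a separate deduplication job. The 2·eps replication is also where such schemes pay their cost: the more records fall near partition boundaries, the more copies the reducers must compare.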

Description

Technical field

[0001] The invention relates to the technical field of databases, and in particular to a method for computing the similarity join of massive time series data.

Background art

[0002] With the rapid development of the Internet of Things, sensor networks, the Internet, and various smart devices, many industries (such as healthcare, cyberspace, and various monitoring applications) have continuously accumulated massive amounts of time series data. Analyzing and mining this data is of great significance: over time, a time series captures many of the patterns and characteristics by which the measured object changes, and analysis and mining algorithms can bring this valuable information to light.

[0003] At present, one of the hot issues in the analysis of massive time series data is using distributed storage and computing platforms to study the sequence similarity join problem, which means that under a given similarity measur...
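The definition in [0003] is cut off before it names the similarity measure. As a baseline, here is a minimal sketch of a similarity self-join, assuming Euclidean distance and a threshold eps (both assumptions; the function name is hypothetical). It compares every pair, so it performs n(n-1)/2 distance computations, which is the quadratic cost that partitioning and pruning are meant to avoid on massive data.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def naive_similarity_self_join(
    data: List[Sequence[float]], eps: float
) -> Tuple[List[Tuple[int, int]], int]:
    """All index pairs (i, j), i < j, within eps, plus the comparison count."""
    pairs = []
    comparisons = 0
    for i, j in combinations(range(len(data)), 2):
        comparisons += 1
        if euclidean(data[i], data[j]) <= eps:
            pairs.append((i, j))
    return pairs, comparisons


D = [[1.0, 2.0, 3.0], [1.1, 2.1, 2.9], [5.0, 5.0, 5.0], [5.2, 4.9, 5.1]]
pairs, n_cmp = naive_similarity_self_join(D, eps=0.5)
print(pairs, n_cmp)  # [(0, 1), (2, 3)] 6
```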


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06F16/2458

Inventors: Liu Wen (刘文), Zhang Tuqian (张土前), Wang Sixiu (王思秀), Liu Junxia (刘俊霞), Fu Guoqing (付国庆)

Owner: XINJIANG INST OF ENG
