Clustering-based time series data compression method and system

A clustering-based time series data compression method and system, applied in the fields of database models, relational databases, and electronic digital data processing. It addresses the problems of wasted storage resources, lossy compression, and failure to minimize the space occupied by the data, thereby improving the compression effect and saving storage space.

Pending Publication Date: 2020-01-17
NANJING ILUVATAR COREX TECH CO LTD (DBA ILUVATAR COREX INC NANJING)

AI Technical Summary

Problems solved by technology

[0003] The existing time series data compression methods are either lossy compression, that is, the accuracy of the data will be lost; or the comp



Examples


Example Embodiment

[0057] A specific embodiment of the method of the present invention is as follows:

[0058] Suppose the data block to be compressed contains the following six data strings:

[0059] (1) 1.0, 2.0, 3.0, 4.0, 5.0

[0060] (2) 2.0, 3.0, 4.0, 5.0, 6.0

[0061] (3) 1.0, 0.0, 0.0, 1.0, 1.0

[0062] (4) 2.0, 0.0, 0.0, 2.0, 2.0

[0063] (5) 3.0, 0.0, 0.0, 3.0, 3.0

[0064] (6) 0.0, 1.0, 2.0, 3.0, 4.0

[0065] Using the min-max normalized data as features, the features of the six data strings are:

[0066] (1) 0.0, 0.25, 0.5, 0.75, 1.0

[0067] (2) 0.0, 0.25, 0.5, 0.75, 1.0

[0068] (3) 1.0, 0.0, 0.0, 1.0, 1.0

[0069] (4) 1.0, 0.0, 0.0, 1.0, 1.0

[0070] (5) 1.0, 0.0, 0.0, 1.0, 1.0

[0071] (6) 0.0, 0.25, 0.5, 0.75, 1.0
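
The feature extraction step can be sketched as follows. This is a minimal illustration in Python, not the patent's implementation: the function name `min_max_normalize` and the handling of a constant data string (mapped to all zeros) are assumptions, since the text does not cover that case. The input is the six data strings from paragraphs [0059] to [0064].

```python
def min_max_normalize(values):
    """Scale a data string to the range [0, 1].

    Assumption: a constant data string (max == min) maps to all zeros,
    because the source text does not say how to treat it.
    """
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# The six data strings of the example data block.
data_block = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 3.0, 4.0, 5.0, 6.0],
    [1.0, 0.0, 0.0, 1.0, 1.0],
    [2.0, 0.0, 0.0, 2.0, 2.0],
    [3.0, 0.0, 0.0, 3.0, 3.0],
    [0.0, 1.0, 2.0, 3.0, 4.0],
]

features = [min_max_normalize(s) for s in data_block]
for i, f in enumerate(features, start=1):
    print(f"({i})", ", ".join(f"{x:g}" for x in f))
```

Normalization removes differences in offset and scale, so data strings with the same shape get identical features regardless of their absolute values.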

[0072] Using the DBSCAN clustering algorithm with Euclidean distance, the data strings can be divided into two categories:

[0073] The first category is (1), (2), (6), and the second category is (3), (4), (5);
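
The clustering step can be reproduced with a small, dependency-free DBSCAN sketch. The parameter values `eps=0.5` and `min_samples=2` are assumptions chosen for this example; the text does not state which parameters were used. The helper names are likewise illustrative.

```python
import math


def min_max_normalize(values):
    """Scale a data string to [0, 1]; a constant string maps to zeros (assumption)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Indices of all points within Euclidean distance eps (includes i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # noise reachable from a core point becomes a border point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # j is a core point, so keep expanding
    return labels


data_block = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 3.0, 4.0, 5.0, 6.0],
    [1.0, 0.0, 0.0, 1.0, 1.0],
    [2.0, 0.0, 0.0, 2.0, 2.0],
    [3.0, 0.0, 0.0, 3.0, 3.0],
    [0.0, 1.0, 2.0, 3.0, 4.0],
]
features = [min_max_normalize(s) for s in data_block]

# eps and min_samples are assumed values, not taken from the source.
labels = dbscan(features, eps=0.5, min_samples=2)

# Group the 1-based data string numbers by cluster label.
clusters = {}
for number, label in enumerate(labels, start=1):
    clusters.setdefault(label, []).append(number)
print(clusters)
```

With these assumed parameters the sketch groups data strings (1), (2), (6) into one cluster and (3), (4), (5) into the other, matching the two categories stated above.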

[0074] For each category, select the corresponding optimal compression algorithm for compression.
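
One way to read "optimal" here is "smallest lossless output among a set of candidate algorithms". The sketch below takes that interpretation as an assumption. The candidate set (`zlib`, `bz2`, `lzma` from the Python standard library) and the serialization as little-endian doubles are illustrative choices only; the source does not name specific compression algorithms or a storage format.

```python
import bz2
import lzma
import struct
import zlib

# Hypothetical candidate set: (compress, decompress) pairs from the standard library.
CANDIDATES = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def serialize(strings):
    """Pack all values of a category as little-endian 64-bit floats (assumed format)."""
    flat = [v for s in strings for v in s]
    return struct.pack(f"<{len(flat)}d", *flat)


def deserialize(raw):
    """Inverse of serialize: return the flat list of values."""
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))


def pick_best(strings):
    """Try every candidate and keep the one producing the fewest bytes."""
    raw = serialize(strings)
    name = min(CANDIDATES, key=lambda n: len(CANDIDATES[n][0](raw)))
    return name, CANDIDATES[name][0](raw)


# The two categories from the example, holding the original data strings.
categories = {
    "first": [
        [1.0, 2.0, 3.0, 4.0, 5.0],
        [2.0, 3.0, 4.0, 5.0, 6.0],
        [0.0, 1.0, 2.0, 3.0, 4.0],
    ],
    "second": [
        [1.0, 0.0, 0.0, 1.0, 1.0],
        [2.0, 0.0, 0.0, 2.0, 2.0],
        [3.0, 0.0, 0.0, 3.0, 3.0],
    ],
}

results = {label: pick_best(strings) for label, strings in categories.items()}
for label, (name, blob) in results.items():
    print(f"{label}: {name}, {len(blob)} bytes")
```

On inputs this small, fixed header overhead dominates the output size, so the sketch only illustrates the selection mechanism. Because every candidate is lossless, decompressing and deserializing recovers the original values exactly.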


Abstract

The invention discloses a clustering-based time series data compression method and system. The method comprises the following steps: dividing the data into a plurality of data blocks, and segmenting each data block into data strings; for each data block, using a similarity measurement method and a clustering algorithm to measure the similarity between the data strings in the block and to cluster all of them; for each data block, determining an optimal compression algorithm for each category of data strings; and for each data block, compressing the data strings in each category with that category's optimal compression algorithm. With this method, different time series data can be divided into groups according to how the data is distributed, and a suitable compression algorithm is determined for each group. Storage space is saved by using a different compression algorithm for each group. The method is not limited to a specific clustering algorithm, similarity measurement algorithm, or compression algorithm, and the compression effect of the whole method can be improved by continually extending the set of algorithms.
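
The first step, dividing data into blocks and segmenting each block into data strings, might look like the sketch below. Fixed-size blocks and fixed-length data strings are assumptions made for illustration, as are the function names and the sizes used; the abstract does not say how block or string boundaries are chosen.

```python
def split_into_blocks(values, block_size):
    """Divide a value sequence into consecutive blocks (assumed fixed size)."""
    return [values[i:i + block_size] for i in range(0, len(values), block_size)]


def segment_block(block, string_length):
    """Segment one block into consecutive data strings (assumed fixed length)."""
    return [block[i:i + string_length] for i in range(0, len(block), string_length)]


# Hypothetical input: 24 values of a time series.
series = [float(i % 7) for i in range(24)]

blocks = split_into_blocks(series, block_size=10)
strings_per_block = [segment_block(b, string_length=5) for b in blocks]

for number, strings in enumerate(strings_per_block, start=1):
    print(f"block {number}: {len(strings)} data string(s)")
```

The last block and the last data string may be shorter than the requested size. A real implementation would need a policy for such remainders, which the source does not describe.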

Description

Technical field

[0001] The invention belongs to the technical field of time series data compression, and in particular relates to a clustering-based time series data compression method and system.

Background

[0002] Stored time series data occupies a large amount of storage space, which wastes storage resources. Time series compression compresses the key-value pairs, each composed of a timestamp and a value, to reduce the space they occupy on disk or in memory, while allowing the original data to be reconstructed by a corresponding algorithm.

[0003] Existing time series data compression methods are either lossy, sacrificing the accuracy of the data, or achieve an unsatisfactory compression ratio, so the space occupied by the data is not minimized and resources are wasted.

Contents of the invention

[0004] The technical problem to be solved by the present invention...


Application Information

IPC(8): G06F16/215, G06F16/2458, G06F16/28
CPC: G06F16/285, G06F16/2474, G06F16/215
Inventor: 戴峰, 赵志强
Owner: NANJING ILUVATAR COREX TECH CO LTD (DBA ILUVATAR COREX INC NANJING)