Data stream similarity connection method

A similarity join (connection) method for data streams, applied in the field of data management. It addresses the large index maintenance overhead and reduced system processing efficiency of existing approaches, and achieves the effects of improving processing efficiency and performance, avoiding index maintenance overhead, and running fast.

Active Publication Date: 2016-12-21
GUANGXI UNIV
AI Technical Summary

Problems solved by technology

This existing solution is not suitable for a data stream environment, for two reasons. First, data on a stream arrives in the system continuously, so it is impossible to build one large index that organizes all of the data; expired index entries must instead be deleted regularly according to the sliding window semantics. Frequently deleting expired data from a large index structure incurs a large index maintenance overhead (for example, repeatedly rebalancing a B+ tree) and reduces the processing efficiency of the system, so a lightweight index designed for the data stream environment is urgently needed. Second, data on a stream may arrive out of order, so the deletion strategy for expired data must be considered carefully when designing the lightweight index structure, to ensure the correctness and completeness of future query results.

Examples

Embodiment 1

[0050] In the embodiment of the present invention, two data streams R and S are given. Each data stream consists of data tuples of the basic form r_i = (r, time) and s_j = (s, time), where r = (r_1, ..., r_n) is a histogram record containing n data buckets and time is the timestamp at which the histogram record was generated. Given a similarity threshold θ and a sliding window size |W|, the embodiment returns a set of tuple pairs {<r_i, s_j>}, where r_i ∈ R and s_j ∈ S, that satisfy the sliding window time constraint |r_i.timestamp - s_j.timestamp| ≤ |W| and the EMD-based similarity constraint EMD(r_i, s_j) ≤ θ. The meanings of the related symbols are detailed in Table 1.

[0051]-[0053] Table 1 (symbol definitions; the table body is not reproduced in this text)
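
The two join conditions in [0050] can be illustrated with a brute-force baseline. This is only a sketch of the problem definition, not the indexed method of the invention. It assumes one-dimensional histograms of equal total mass with unit ground distance between adjacent buckets, in which case the EMD equals the sum of absolute differences of the two cumulative sums. The names `emd_1d` and `naive_similarity_join` are hypothetical.

```python
from itertools import accumulate


def emd_1d(r, s):
    """EMD between two equal-mass 1-D histograms.

    Assumption of this sketch: adjacent buckets are at ground distance 1,
    so the EMD reduces to the L1 distance between the cumulative sums.
    """
    if len(r) != len(s):
        raise ValueError("histograms must have the same number of buckets")
    return sum(abs(a - b) for a, b in zip(accumulate(r), accumulate(s)))


def naive_similarity_join(R, S, theta, window):
    """Return all pairs (r_i, s_j) that satisfy both constraints.

    R and S are lists of (histogram, timestamp) tuples. Every pair is
    compared, so the cost is O(|R| * |S|); no index is used.
    """
    result = []
    for r_hist, r_time in R:
        for s_hist, s_time in S:
            in_window = abs(r_time - s_time) <= window
            if in_window and emd_1d(r_hist, s_hist) <= theta:
                result.append(((r_hist, r_time), (s_hist, s_time)))
    return result
```

Checking the time constraint first avoids computing the EMD for pairs that are already excluded by the sliding window.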

[0054] Figure 1 is a flowchart of a data stream similarity join method provided by an embodiment of the present invention. As shown in Figure 1, the method may include:

[0055] Step S1. Build a B+ tree forest set index ...

Embodiment 2

[0085] Figure 4 is a flowchart of a data stream similarity join method provided by another embodiment of the present invention. Steps in Figure 4 that carry the same reference numbers as steps in Figure 1 have the same description as in Figure 1, which is not repeated here.

[0086] As shown in Figure 4, step S3 is also included after step S1: when the number of data tuples contained in the B+ tree forest set index is greater than or equal to c*P and F_active.maxTime - F_active.minTime >= P, a new B+ tree forest index F_new is created and set as the current active index F_active. Here, F_active.maxTime is the maximum timestamp of the data tuples maintained by the current active index, F_active.minTime is the minimum timestamp of the data tuples maintained by the current active index, and c is the preset capacity factor of the B+ tree forest index.
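
The rollover condition in step S3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `ForestIndex` and `ForestSet` are hypothetical stand-ins that track only a tuple count and the minimum and maximum timestamps, the tuple count is taken over the active index, and the condition is checked after each insertion.

```python
class ForestIndex:
    """Hypothetical stand-in for one B+ tree forest index.

    Only the bookkeeping needed for the rollover test is modeled.
    """

    def __init__(self):
        self.count = 0
        self.min_time = None
        self.max_time = None

    def add(self, timestamp):
        # min/max are updated on every insert, so out-of-order
        # arrivals are handled as well.
        self.count += 1
        if self.min_time is None or timestamp < self.min_time:
            self.min_time = timestamp
        if self.max_time is None or timestamp > self.max_time:
            self.max_time = timestamp


def should_roll_over(active, c, P):
    """Step S3 condition: count >= c*P and maxTime - minTime >= P."""
    if active.count == 0:
        return False
    return active.count >= c * P and active.max_time - active.min_time >= P


class ForestSet:
    """Hypothetical set of forest indexes with one active index."""

    def __init__(self, c, P):
        self.c = c  # capacity factor
        self.P = P  # preset time span value
        self.active = ForestIndex()
        self.forests = [self.active]

    def insert(self, timestamp):
        self.active.add(timestamp)
        if should_roll_over(self.active, self.c, self.P):
            # Create F_new and make it the current active index.
            self.active = ForestIndex()
            self.forests.append(self.active)
```

Requiring both conditions means a burst of tuples with nearly identical timestamps does not trigger a rollover until the active index also spans at least P time units.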

[0087] Specifically, each B+ tree ...

Abstract

The invention relates to a data stream similarity join method. According to a preset time span value P, a B+ tree forest set index is constructed on a data stream R. When the timestamps of data tuples on data stream R and a data stream S fall within the time range of the current sliding window, a similarity join between R and S based on the Earth Mover's Distance (EMD) is carried out under sliding window semantics, using the B+ tree forest set index. The method thus provides a solution for EMD-based similarity join over data streams under sliding window semantics, and the processing efficiency and performance of the similarity join are significantly improved.

Description

Technical field

[0001] The invention relates to the technical field of data management, in particular to a data stream similarity join method.

Background technique

[0002] The Melody-Join strategy designs an efficient index construction strategy for similarity queries based on the Earth Mover's Distance (EMD). First, each high-dimensional data tuple is mapped to a one-dimensional histogram through a feature vector. A cumulative distribution function (CDF) is then constructed for the mapped one-dimensional histogram, the CDF is approximated by a normal distribution, and the resulting normal distribution is transformed by the Hough transform into a data point in two-dimensional space. This process converts high-dimensional data tuples into data points in two-dimensional space. After Melody-Join, a grid index can be constructed on the two-dimensional space and the...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/2246, G06F16/2453, G06F16/24553
Inventor: 许嘉, 宋超, 吕品, 李陶深, 张佳振
Owner: GUANGXI UNIV