Data stream similarity connection method
A data stream similarity connection method, applied in the field of data management. It addresses problems such as reduced system processing efficiency and large index maintenance overhead, and achieves improved processing efficiency and performance, avoidance of index maintenance overhead, and fast processing speed.
Embodiment 1
[0050] In the embodiment of the present invention, two data streams R and S are given. Each data stream consists of data tuples of the basic form r_i = (r, time) and s_j = (s, time), where r = (r_1, ..., r_n) is a histogram record containing n data buckets and time is the timestamp at which the histogram record was generated. Given the similarity threshold θ and the sliding window size |W|, the embodiment of the present invention returns a set of tuple pairs {<r_i, s_j>}, where r_i ∈ R and s_j ∈ S, that satisfy both the sliding window time constraint |r_i.timestamp - s_j.timestamp| ≤ |W| and the similarity constraint based on the Earth Mover's Distance (EMD), EMD(r_i, s_j) ≤ θ. The meanings of the related symbols are detailed in Table 1.
[0053] Table 1
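The connection condition defined in paragraph [0050] can be illustrated with a brute-force reference sketch in Python. This is not the indexed method of the invention; it only checks the two constraints pair by pair. The function names emd_1d and window_join are hypothetical, and the EMD computation assumes equal-mass histograms whose adjacent buckets are a ground distance of 1 apart, which the source does not state.

```python
from itertools import accumulate


def emd_1d(r, s):
    """EMD between two equal-mass histograms.

    Assumption (not from the source): bucket k and bucket k+1 are a ground
    distance of 1 apart, so the EMD reduces to the sum of absolute
    prefix-sum differences.
    """
    if len(r) != len(s):
        raise ValueError("histograms must have the same number of buckets")
    diffs = [a - b for a, b in zip(r, s)]
    return sum(abs(c) for c in accumulate(diffs))


def window_join(R, S, theta, window):
    """Brute-force check of both constraints over all pairs.

    R and S are lists of (histogram, time) tuples. Returns index pairs (i, j)
    with |time_i - time_j| <= window and EMD(r_i, s_j) <= theta.
    """
    result = []
    for i, (r, r_time) in enumerate(R):
        for j, (s, s_time) in enumerate(S):
            if abs(r_time - s_time) <= window and emd_1d(r, s) <= theta:
                result.append((i, j))
    return result


R = [([1, 0, 0], 0), ([0, 0, 1], 5)]
S = [([0, 1, 0], 1), ([0, 0, 1], 20)]
print(window_join(R, S, theta=1, window=3))  # [(0, 0)]
```

Every pair is compared here, so the cost grows with |R| * |S|; a sketch like this is useful mainly as a correctness baseline for an indexed implementation.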
[0054] Figure 1 is a flowchart of a data stream similarity connection method provided by an embodiment of the present invention. As shown in Figure 1, the method may include:
[0055] Step S1. Build a B+ tree forest set index ...
Embodiment 2
[0085] Figure 4 is a flowchart of a data stream similarity connection method provided by another embodiment of the present invention. Steps in Figure 4 that have the same reference numbers as steps in Figure 1 have the same meaning; the text descriptions given for Figure 1 apply to them and are not repeated here.
[0086] As shown in Figure 4, step S3 is also included after step S1: when the number of data tuples contained in the B+ tree forest set index is greater than or equal to c*P and F_active.maxTime - F_active.minTime >= P, create a new B+ tree forest index F_new and set F_new as the current active index F_active. Here, F_active.maxTime is the maximum timestamp of the data tuples maintained by the current active index, F_active.minTime is the minimum timestamp of the data tuples maintained by the current active index, and c is the preset capacity factor of the B+ tree forest index.
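The switching condition of step S3 can be sketched in Python as follows. The classes ForestIndex and ForestIndexSet and the function should_roll_over are hypothetical stand-ins: the index here stores only timestamps rather than real B+ trees, P is taken as a numeric parameter whose meaning is as given elsewhere in the description, and checking the condition right after each insertion is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ForestIndex:
    """Hypothetical stand-in for one B+ tree forest index (timestamps only)."""
    timestamps: List[float] = field(default_factory=list)

    def insert(self, time: float) -> None:
        self.timestamps.append(time)

    @property
    def size(self) -> int:
        return len(self.timestamps)

    @property
    def min_time(self) -> float:
        return min(self.timestamps)

    @property
    def max_time(self) -> float:
        return max(self.timestamps)


def should_roll_over(active: ForestIndex, c: float, p: float) -> bool:
    """Step S3 condition: tuple count >= c*P and maxTime - minTime >= P."""
    if active.size == 0:
        return False  # an empty index has no time span to measure
    return active.size >= c * p and active.max_time - active.min_time >= p


class ForestIndexSet:
    """Keeps a list of indexes; the last one plays the role of F_active."""

    def __init__(self, c: float, p: float) -> None:
        self.c = c
        self.p = p
        self.indexes = [ForestIndex()]

    @property
    def active(self) -> ForestIndex:
        return self.indexes[-1]

    def insert(self, time: float) -> None:
        self.active.insert(time)
        # Assumption: the condition is evaluated after every insertion.
        if should_roll_over(self.active, self.c, self.p):
            self.indexes.append(ForestIndex())  # F_new becomes F_active
```

Both parts of the condition must hold: an index that is full but covers a time span shorter than P stays active, as does one that covers a long span but holds fewer than c*P tuples.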
[0087] Specifically, each B+ tree ...