A suffix array indexing method and apparatus for real-time data streams
A suffix array data indexing technology, applied in the field of data indexing, that speeds up response time and avoids the problem of inverted-index accuracy being easily affected by the quality of word segmentation.
Examples
Embodiment 1
[0066] In the method for suffix array indexing of real-time data streams described in this embodiment, the process of creating a suffix array index can be divided into two parts: data processing and storage, and generation of a suffix array index.
[0067] A. Data processing and storage. As shown in Figures 1 and 2, a piece of source data corresponds to a document; a document contains multiple fields, and the field is the unit of data storage. A field contains multiple segments, which are divided into temporary segments, dynamic segments, and persistent segments. Segments improve indexing efficiency: each segment is an independent suffix array index that maintains its own source data and index information. The data processing and storage process includes the following steps:
[0068] A101. The client submits an index request to the server through an HTTP request, records the name of the index library and other information through the request l...
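The document, field, and segment hierarchy described in [0067] can be sketched as a small data model. This is a minimal illustration, not the patented implementation: the names `SegmentKind`, `Segment`, `IndexField`, and `build_suffix_array` are hypothetical, and the naive suffix sort stands in for whatever construction algorithm the patent actually uses.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class SegmentKind(Enum):
    """The three segment types named in [0067]."""
    TEMPORARY = "temporary"
    DYNAMIC = "dynamic"
    PERSISTENT = "persistent"


def build_suffix_array(text: str) -> List[int]:
    """Naive construction: sort suffix start offsets by suffix text.

    Assumption: used only for illustration; it is far too slow for
    real 100 MB segments.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


@dataclass
class Segment:
    """An independent suffix array index holding its own source data."""
    kind: SegmentKind
    source: str
    suffix_array: List[int] = field(init=False)

    def __post_init__(self) -> None:
        self.suffix_array = build_suffix_array(self.source)


@dataclass
class IndexField:
    """A field is the storage unit and contains multiple segments."""
    name: str
    segments: List[Segment] = field(default_factory=list)

    def add_segment(self, source: str,
                    kind: SegmentKind = SegmentKind.TEMPORARY) -> Segment:
        # Each new piece of data gets its own segment; existing
        # segments are left untouched.
        segment = Segment(kind, source)
        self.segments.append(segment)
        return segment


title = IndexField("title")
title.add_segment("banana")
title.add_segment("bandana", SegmentKind.PERSISTENT)
```

Because every `Segment` carries its own source text and suffix array, adding a segment never requires touching the ones already stored in the field.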
Embodiment 2
[0093] A suffix array indexing method for real-time data streams, using temporary segments to improve indexing efficiency.
[0094] Assume that the real-time data stream arrives at the server in three batches, from which source data A, source data B, and source data C are extracted respectively. Each piece of source data is 100 MB in size. There are two ways to index the real-time data stream:
[0095] Implementation 1: Without the use of temporary segments
[0096] Because a suffix array can only be constructed for a complete segment at a time, splicing new data onto the end of the old data and then indexing the result means that the suffix array index for the old data is rebuilt repeatedly, as shown in Figure 4.
[0097] Time T1: Create a suffix array index for source data A (100MB);
[0098] Time T2: Source data B is spliced at the end of source data A, and a suffix array index is created f...
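The cost of the splicing approach can be made concrete with a small sketch that counts how many characters pass through suffix array construction. It rests on two assumptions that the truncated text above does not spell out: that every time step rebuilds the index over all data spliced so far, and that the alternative builds one independent segment per batch. The function names are hypothetical, and each character stands in for 1 MB of source data.

```python
def build_suffix_array(text):
    """Naive suffix array: suffix start offsets sorted by suffix text."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def index_by_splicing(batches):
    """Splice each batch onto the old data, then re-index everything."""
    spliced = ""
    suffix_array = []
    chars_indexed = 0
    for batch in batches:
        spliced += batch
        suffix_array = build_suffix_array(spliced)  # old data indexed again
        chars_indexed += len(spliced)
    return suffix_array, chars_indexed


def index_by_segments(batches):
    """Assumed alternative: one independent segment per batch."""
    segments = []
    chars_indexed = 0
    for batch in batches:
        segments.append((batch, build_suffix_array(batch)))
        chars_indexed += len(batch)
    return segments, chars_indexed


# Three batches of 100 characters stand in for A, B, and C (100 MB each).
batches = ["a" * 100, "b" * 100, "c" * 100]
_, spliced_cost = index_by_splicing(batches)
_, segmented_cost = index_by_segments(batches)
```

Under these assumptions the splicing approach indexes 100 + 200 + 300 = 600 units of data, while the per-segment approach indexes 300, because no old data is ever indexed twice.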
Embodiment 3
[0108] A suffix array indexing method for real-time data streams, in which the suffix array index is composed of segment source data, a segment suffix array, and segment information. As shown in Figure 6, the suffix array index retrieval process includes the following steps:
[0109] C101. The client initiates a search request, specifying the name of the target index library, the domain to be retrieved, and the search content; if the target index library and the domain to be retrieved are not specified, it defaults to all index libraries and all domains;
[0110] C102. The server receives and parses the retrieval request, determines the target index library, and obtains the corresponding domain object according to the domain to be retrieved;
[0111] C103. Each domain object starts an independent thread to complete data retrieval, reads all segments of the domain to be retrieved (including temporary segments, dynamic segments and persistent segments), and retrieves each segment independ...
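The retrieval flow in C101 to C103 can be sketched as follows: each domain is searched in its own thread, and within a domain every segment is searched independently by binary search over its suffix array. This is a hedged illustration only. The function names, the `(source, suffix_array)` tuple layout, and the use of `ThreadPoolExecutor` are assumptions; the source does not say how threads are managed or how results are merged.

```python
from concurrent.futures import ThreadPoolExecutor


def build_suffix_array(text):
    """Naive suffix array: suffix start offsets sorted by suffix text."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def search_segment(source, suffix_array, pattern):
    """Return the sorted start offsets of pattern within one segment."""
    m = len(pattern)
    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if source[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if source[suffix_array[mid]:suffix_array[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(suffix_array[start:lo])


def search_domain(segments, pattern):
    """Search every segment of one domain independently."""
    hits = []
    for segment_id, (source, suffix_array) in enumerate(segments):
        for offset in search_segment(source, suffix_array, pattern):
            hits.append((segment_id, offset))
    return hits


def search(domains, pattern):
    """Run one search task per domain, each in its own worker thread."""
    with ThreadPoolExecutor(max_workers=max(1, len(domains))) as pool:
        futures = {name: pool.submit(search_domain, segments, pattern)
                   for name, segments in domains.items()}
        return {name: future.result() for name, future in futures.items()}


def make_segment(text):
    return (text, build_suffix_array(text))


domains = {
    "title": [make_segment("banana"), make_segment("bandana")],
    "body": [make_segment("cabana")],
}
results = search(domains, "ana")
```

Each hit is reported as a `(segment_id, offset)` pair so that a caller can map it back to the segment's own source data, which matches the idea that every segment maintains its source data and index independently.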