A Merge Optimization Method Based on Lucene Index Segment

An optimization method for merging index segments, applied to text database indexing and unstructured text data retrieval. It addresses two problems: the index segments selected for merging may not improve retrieval speed, and the merge process degrades the performance of the indexing and retrieval functions. The claimed effects are improved retrieval speed and fast calculation.

Active Publication Date: 2021-08-31
CHONGQING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

[0006] In view of this, the purpose of the present invention is to provide a merge optimization method based on Lucene index segments. It addresses two problems: the index segment merging process degrades the performance of the indexing and retrieval functions, and the index segments selected for merging may not improve retrieval speed.



Embodiment Construction

[0021] The preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

[0022] In the proposed method, when a new index segment is flushed to disk, the CPU and I/O load of the current node is collected together with the segment information of the index, and both are submitted to the merge analysis module, which determines whether the merge condition is satisfied. When the condition is met, the feature matrix of the index segments is first computed from the dictionary file of each index segment, and the minHash algorithm and the minimum hash signature algorithm are then used to compute the signature matrix of the index segments. The similarity coefficient between each pair of index segments is calculated from the signature matrix, and then the index segment is divi...
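The merge-condition check described in [0022] can be illustrated with a minimal Java sketch. The excerpt does not state which thresholds or rules the merge analysis module applies, so the class names `NodeLoad` and `MergeGate`, the threshold constants, and the rule itself (merge only when the node has spare CPU and I/O capacity and enough segments have accumulated) are all assumptions made for illustration.

```java
/** Hypothetical snapshot of node load; names and ranges are illustrative only. */
final class NodeLoad {
    final double cpuUtilization; // fraction of CPU in use, in [0, 1]
    final double ioUtilization;  // fraction of I/O capacity in use, in [0, 1]

    NodeLoad(double cpuUtilization, double ioUtilization) {
        this.cpuUtilization = cpuUtilization;
        this.ioUtilization = ioUtilization;
    }
}

/** Assumed stand-in for the merge analysis module; not the patent's actual rule. */
final class MergeGate {
    // Assumed thresholds, chosen only to make the example concrete.
    static final double MAX_CPU = 0.70;
    static final double MAX_IO = 0.60;
    static final int MIN_SEGMENTS = 4;

    /** Returns true when the node is lightly loaded and enough segments exist. */
    static boolean shouldMerge(NodeLoad load, int segmentCount) {
        boolean nodeHasHeadroom =
                load.cpuUtilization < MAX_CPU && load.ioUtilization < MAX_IO;
        boolean enoughSegments = segmentCount >= MIN_SEGMENTS;
        return nodeHasHeadroom && enoughSegments;
    }
}
```

Under these assumed thresholds, `MergeGate.shouldMerge(new NodeLoad(0.3, 0.2), 6)` allows a merge, while a node at 90% CPU or an index with only two segments does not. Deferring merges on a busy node is one plausible way to limit the impact of merging on indexing and retrieval.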


Abstract

The invention relates to a merge optimization method based on Lucene index segments and belongs to the technical field of computer indexing. It includes the following steps. A merge analysis module combines the load information of the current node with the segment information of the index to judge whether the merge condition is satisfied. The feature matrix of the index segments is obtained from the dictionary file contained in each index segment, and the minHash algorithm and the minimum hash signature algorithm are then used to calculate the signature matrix of the index segments. Using the signature matrix and the Jaccard similarity principle, the similarity coefficient between each pair of index segments is calculated, and the index segments are divided into different similar sets according to the similarity coefficient. A similarity evaluation model scores each similar set, the sets are sorted by score, and the one or more sets with the highest scores are handed to the merge thread for merging. The optimization method reduces the impact of the merge operation on the performance of the indexing and retrieval functions and can effectively improve retrieval speed.
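The signature and similarity steps in the abstract can be sketched in Java as follows. This is a generic minHash sketch, not the patent's implementation: each segment's dictionary is modeled as a set of term strings, the hash family `(a*x + b) mod p`, the class name `SegmentMinHash`, and the greedy grouping against the first member of each set are all assumptions. The similarity evaluation model is not specified in this excerpt, so the sketch stops at grouping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.Set;

/** Illustrative minHash over segment term sets; names and parameters are assumed. */
final class SegmentMinHash {
    private static final long PRIME = 2147483647L; // 2^31 - 1
    private final long[] a; // multipliers of the assumed hash family
    private final long[] b; // offsets of the assumed hash family

    SegmentMinHash(int numHashes, long seed) {
        Random rnd = new Random(seed);
        a = new long[numHashes];
        b = new long[numHashes];
        for (int i = 0; i < numHashes; i++) {
            a[i] = 1 + rnd.nextInt(Integer.MAX_VALUE - 1); // non-zero modulo PRIME
            b[i] = rnd.nextInt(Integer.MAX_VALUE);
        }
    }

    /** One signature column: the minimum hash of the segment's terms per function. */
    long[] signature(Set<String> terms) {
        long[] sig = new long[a.length];
        Arrays.fill(sig, Long.MAX_VALUE);
        for (String term : terms) {
            long x = term.hashCode() & 0x7fffffffL; // non-negative term id
            for (int i = 0; i < a.length; i++) {
                long h = (a[i] * x + b[i]) % PRIME;
                if (h < sig[i]) sig[i] = h;
            }
        }
        return sig;
    }

    /** Estimated Jaccard similarity: fraction of signature positions that agree. */
    static double similarity(long[] s1, long[] s2) {
        int equal = 0;
        for (int i = 0; i < s1.length; i++) {
            if (s1[i] == s2[i]) equal++;
        }
        return (double) equal / s1.length;
    }

    /** Assumed greedy grouping: join the first set whose first member is similar enough. */
    static List<List<Integer>> group(List<long[]> sigs, double threshold) {
        List<List<Integer>> groups = new ArrayList<>();
        for (int i = 0; i < sigs.size(); i++) {
            boolean placed = false;
            for (List<Integer> g : groups) {
                if (similarity(sigs.get(g.get(0)), sigs.get(i)) >= threshold) {
                    g.add(i);
                    placed = true;
                    break;
                }
            }
            if (!placed) {
                List<Integer> g = new ArrayList<>();
                g.add(i);
                groups.add(g);
            }
        }
        return groups;
    }
}
```

Two segments with identical dictionaries produce identical signatures and an estimated similarity of 1.0, so they land in the same set. The appeal of signatures is that each comparison touches a fixed number of hash values rather than the full dictionaries.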

Description

Technical field

[0001] The invention belongs to the technical field of computer indexing and relates to a merge optimization method based on Lucene index segments.

Background

[0002] Lucene is not a ready-to-use full-text search application but a full-text indexing toolkit written in Java, with which developers can quickly and conveniently build full-text indexing applications.

[0003] Lucene can hold several indexes, and each index contains several index segments. An index segment consists of inverted files, forward files and some intermediate files, and the inverted files in turn consist of a dictionary and posting lists. When new data is added to an index, it is first written to a cache in Lucene, and when the cache reaches a certain threshold it is flushed to disk to form a new index segment. Each index segment is an independent retrieval unit and can be searched on its own. ...
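The buffer-then-flush behavior described in [0003] can be pictured with a toy Java model. It does not use Lucene's API: the class `ToyIndex`, the document-count threshold, and the substring search are all simplifications assumed for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of buffered writes that flush into immutable segments; not Lucene's API. */
final class ToyIndex {
    private final int flushThreshold;                 // assumed: number of buffered documents
    private final List<String> buffer = new ArrayList<>();
    private final List<List<String>> segments = new ArrayList<>();

    ToyIndex(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    /** Buffers a document and flushes once the threshold is reached. */
    void addDocument(String doc) {
        buffer.add(doc);
        if (buffer.size() >= flushThreshold) flush();
    }

    /** Turns the current buffer into a new segment. */
    void flush() {
        if (buffer.isEmpty()) return;
        segments.add(new ArrayList<>(buffer));
        buffer.clear();
    }

    int segmentCount() {
        return segments.size();
    }

    /** Searches every flushed segment independently and concatenates the hits. */
    List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        for (List<String> segment : segments) {
            for (String doc : segment) {
                if (doc.contains(term)) hits.add(doc);
            }
        }
        return hits;
    }
}
```

Because `search` visits each segment in turn, the cost of a query in this model grows with the number of segments, which is the motivation for merging them.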


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/31
Inventor: 熊安萍, 李传根, 龙林波
Owner: CHONGQING UNIV OF POSTS & TELECOMM