
Combination optimizing method based on Lucene index section

An optimization method for index segments, applied in special data processing applications, instruments, electronic digital data processing and similar fields. It addresses two problems: the index segment merging process degrades the performance of the indexing and retrieval functions, and the index segments selected for merging do not improve retrieval speed. Effects: improved retrieval speed and fast calculation.

Active Publication Date: 2018-11-30
CHONGQING UNIV OF POSTS & TELECOMM
Cites 6; cited by 2

AI Technical Summary

Problems solved by technology

[0006] In view of this, the purpose of the present invention is to provide a merge optimization method based on Lucene index segments. It solves two problems: the index segment merging process affects the performance of the indexing and retrieval functions, and the index segments selected for merging do not improve retrieval speed.




Embodiment Construction

[0022] The preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

[0023] The method for merging and optimizing index segments proposed by the present invention works as follows. When a new index segment is flushed to disk, the CPU and I/O load of the current node is collected and the segment information of the index is obtained; this information is submitted to the merge analysis module, which analyzes it to determine whether the merge condition is satisfied. When the merge condition is met, the feature matrix of the index segments is first calculated from the dictionary file of each index segment, and the minHash algorithm is then combined with the minimum hash signature algorithm to calculate the signature matrix of the index segments. The similarity coefficient between each pair of index segments is calculated from the signature matrix, and then the index segment is divi...
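As a rough illustration of the pipeline in [0023], the sketch below builds a term-by-segment feature matrix from each segment's term dictionary, computes a MinHash signature matrix using explicit random row permutations (the textbook formulation), and estimates the Jaccard similarity of two segments as the fraction of signature rows on which they agree. It is a sketch under assumptions, not the patent's implementation: the load thresholds in `should_merge`, the permutation count, and all function names are hypothetical, and a real system would read term dictionaries from Lucene segment files rather than from Python sets.

```python
import random

# Hypothetical merge-condition thresholds; the source does not state the real values.
MAX_CPU_LOAD = 0.7
MAX_IO_LOAD = 0.7
MIN_SEGMENTS = 4


def should_merge(cpu_load, io_load, segment_count):
    """Assumed gate: merge only when the node is lightly loaded and enough segments exist."""
    return (cpu_load < MAX_CPU_LOAD and io_load < MAX_IO_LOAD
            and segment_count >= MIN_SEGMENTS)


def feature_matrix(dictionaries):
    """Rows are terms of the combined vocabulary, columns are segments;
    an entry is 1 when that segment's term dictionary contains the term."""
    vocab = sorted(set().union(*dictionaries))
    matrix = [[1 if term in d else 0 for d in dictionaries] for term in vocab]
    return vocab, matrix


def signature_matrix(matrix, num_perm=128, seed=7):
    """MinHash signatures: for each random row permutation, record per segment
    the smallest permuted position of a row in which the segment has a 1."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    rng = random.Random(seed)
    sig = []
    for _ in range(num_perm):
        perm = list(range(n_rows))
        rng.shuffle(perm)  # perm[r] is the new position of row r
        sig.append([
            min((perm[r] for r in range(n_rows) if matrix[r][c]), default=n_rows)
            for c in range(n_cols)
        ])
    return sig


def estimated_jaccard(sig, i, j):
    """Fraction of signature rows on which segments i and j agree,
    an unbiased estimate of their Jaccard similarity."""
    return sum(1 for row in sig if row[i] == row[j]) / len(sig)


if __name__ == "__main__":
    dicts = [{"lucene", "index", "segment", "merge"},
             {"lucene", "index", "segment", "search"},
             {"apple", "banana"}]
    vocab, fm = feature_matrix(dicts)
    sig = signature_matrix(fm)
    print(should_merge(0.3, 0.2, 5))
    print(estimated_jaccard(sig, 0, 1))  # close to the exact value 3/5 = 0.6
    print(estimated_jaccard(sig, 0, 2))  # the dictionaries share no terms
```

Explicit permutations keep the example faithful to the definition but cost O(rows) per permutation; MinHash implementations for large vocabularies usually substitute random hash functions over term IDs for true permutations.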



Abstract

The invention relates to a merge optimization method based on Lucene index segments and belongs to the technical field of computer indexing. The method comprises the following steps: combining the load information of the current node with the segment information of the index, a merge analysis module is built to judge whether the merge condition is met; a feature matrix of the index segments is obtained from the dictionary file contained in each index segment, and the minHash algorithm is combined with the minimum hash signature algorithm to calculate the signature matrix of the index segments; using the signature matrix and the Jaccard similarity principle, a similarity coefficient is calculated between index segments, and according to these coefficients the index segments are divided into different similar sets; finally, a similarity evaluation model scores each similar set, the sets are sorted by score, and the one or more sets with the highest scores are selected and merged by a merge thread. The method reduces the impact of merge operations on the performance of the indexing and search functions and effectively improves search speed.
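The grouping and selection steps in the abstract can be sketched as follows, under stated assumptions. The similarity coefficients are given as a symmetric matrix; segments whose pairwise coefficient reaches a hypothetical threshold are placed in the same similar set (connected components found with union-find); and because the abstract does not specify the similarity evaluation model, a hypothetical score, the mean pairwise similarity within the set, stands in for it. `SIM_THRESHOLD`, `score_set` and the other names are illustrative only.

```python
from itertools import combinations

SIM_THRESHOLD = 0.5  # hypothetical; the source does not give a value


def similar_sets(sim, threshold=SIM_THRESHOLD):
    """Group segments whose pairwise similarity >= threshold (connected components)."""
    n = len(sim)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if sim[i][j] >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def score_set(group, sim):
    """Stand-in evaluation model: mean pairwise similarity; singletons score 0."""
    pairs = list(combinations(group, 2))
    if not pairs:
        return 0.0
    return sum(sim[i][j] for i, j in pairs) / len(pairs)


def select_sets_to_merge(sim, top_k=1, threshold=SIM_THRESHOLD):
    """Rank candidate sets (two or more segments) by score and return the top_k."""
    candidates = [g for g in similar_sets(sim, threshold) if len(g) >= 2]
    candidates.sort(key=lambda g: score_set(g, sim), reverse=True)
    return candidates[:top_k]


if __name__ == "__main__":
    sim = [[1.0, 0.8, 0.1, 0.0],
           [0.8, 1.0, 0.2, 0.1],
           [0.1, 0.2, 1.0, 0.6],
           [0.0, 0.1, 0.6, 1.0]]
    print(select_sets_to_merge(sim))           # segments 0 and 1 score highest
    print(select_sets_to_merge(sim, top_k=2))
```

Connected components make set membership transitive, so a chain of moderately similar segments can end up in one set; requiring every pair in a set to exceed the threshold would give tighter sets at the cost of more bookkeeping.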

Description

Technical field

[0001] The invention belongs to the technical field of computer indexing and relates to a merge optimization method based on Lucene index segments.

Background art

[0002] Lucene is not a full-text indexing application that can be used directly, but a full-text indexing toolkit written in the Java programming language, with which developers can build full-text indexing applications conveniently and quickly.

[0003] Lucene can hold several indexes, and each index contains several index segments. An index segment consists of inverted files, forward files and some intermediate files, and the inverted files in turn consist of a term dictionary and posting lists. When new data is added to an index, it is first written to an in-memory buffer in Lucene; when the buffer reaches a certain threshold, it is flushed to disk to form a new index segment. Each index segment is an independent retrieval unit and can be searched on its own. ...
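To make the segment life cycle in [0003] concrete, here is a toy model: documents accumulate in an in-memory buffer and, once a document-count threshold is reached, are flushed as an immutable segment holding a term dictionary with posting lists; a search visits every segment independently and combines the hits. This is not Lucene's API. `ToyIndex` and its `flush_threshold` are hypothetical, and in Lucene itself flushing is governed by `IndexWriterConfig` settings such as `setRAMBufferSizeMB` and `setMaxBufferedDocs`.

```python
from collections import defaultdict


class ToyIndex:
    """Toy buffered index (hypothetical, not Lucene): the buffer is flushed into a
    new segment, a dict mapping term -> sorted doc ids, when it reaches the threshold."""

    def __init__(self, flush_threshold=2):
        self.flush_threshold = flush_threshold
        self.buffer = []    # pending (doc_id, text) pairs, not yet searchable
        self.segments = []  # flushed segments, each an independent retrieval unit
        self.next_id = 0

    def add(self, text):
        self.buffer.append((self.next_id, text))
        self.next_id += 1
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Build the term dictionary and posting lists for the buffered documents."""
        if not self.buffer:
            return
        postings = defaultdict(list)
        for doc_id, text in self.buffer:  # doc ids arrive in ascending order
            for term in set(text.lower().split()):
                postings[term].append(doc_id)
        self.segments.append(dict(postings))
        self.buffer = []

    def search(self, term):
        """Query each segment separately and merge the hits."""
        hits = []
        for segment in self.segments:
            hits.extend(segment.get(term.lower(), []))
        return sorted(hits)


if __name__ == "__main__":
    idx = ToyIndex(flush_threshold=2)
    for text in ["Lucene index", "index segment", "merge segment"]:
        idx.add(text)
    print(len(idx.segments), idx.search("segment"))  # third doc is still buffered
    idx.flush()
    print(len(idx.segments), idx.search("segment"))
```

Because every flush adds another segment that each search must visit, the segment count grows with indexing volume, which is what makes the choice of which segments to merge matter.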


Application Information

IPC(8): G06F17/30
Inventors: 熊安萍 (Xiong Anping), 李传根 (Li Chuangen), 龙林波 (Long Linbo)
Owner CHONGQING UNIV OF POSTS & TELECOMM