Pretreatment method for compressing inverted index
An inverted index preprocessing technology, applied in electrical digital data processing, special data processing applications, instruments, etc., that addresses low efficiency and poor suitability for parallel decompression.
Active Publication Date: 2012-08-22
NANKAI UNIV
AI Technical Summary
Problems solved by technology
[0019] The purpose of the present invention is to provide a new preprocessing method for inverted index compression based on linear regression. It addresses two problems of inverted index compression methods built on the existing d-gap preprocessing method: their parallel decompression efficiency is low, and they are not suitable for combination with the set merge method.
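For context, d-gap preprocessing can be sketched as follows. This is a minimal illustration, not code from the patent; the function names `to_dgaps` and `from_dgaps` are hypothetical, and keeping the first docID unchanged is a common convention assumed here.

```python
from itertools import accumulate


def to_dgaps(doc_ids):
    """Keep the first docID and replace every later one with its gap to the previous docID."""
    if not doc_ids:
        return []
    gaps = [b - a for a, b in zip(doc_ids, doc_ids[1:])]
    return [doc_ids[0]] + gaps


def from_dgaps(gaps):
    """Rebuild the docIDs with a running sum over the gaps."""
    return list(accumulate(gaps))


posting_list = [3, 7, 12, 18]
gaps = to_dgaps(posting_list)
restored = from_dgaps(gaps)
```

Restoring the docID at a given position needs the sum of all earlier gaps, which is one way to see why d-gap lists are awkward to decompress in parallel.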
Examples
Embodiment 1
[0062] Referring to Figure 4, which shows the first embodiment of the preprocessing method for inverted index compression of the present invention, the specific steps are as follows:
[0063] Step S401: for each posting list, make a two-dimensional scatter plot with the index $x_i$ of each docID as the abscissa and its value $y_i$ as the ordinate, where $x_i$ and $y_i$ are non-negative integers, $i = 1, \ldots, n$, and $n$ is a positive integer. Generate a linear regression line $y = f(x) = \alpha + \beta x$ by the least squares method, with $\beta = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) / \sum_{i}$ ...
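The fit in Step S401 can be sketched as below. The formula in the text is cut off after the numerator, so the denominator (the sum of squared deviations of x from its mean) and the intercept (mean of y minus slope times mean of x) used here are the standard textbook least-squares expressions, not text recovered from the patent. The name `fit_line`, the sample posting list, and the choice of indices 1 to n are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = alpha + beta * x (textbook closed form)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    # A single-point list has no spread in x; fall back to a flat line.
    beta = numerator / denominator if denominator else 0.0
    alpha = y_mean - beta * x_mean
    return alpha, beta


# Hypothetical posting list; indices are assumed to run from 1 to n.
doc_ids = [3, 7, 12, 18, 21, 30]
xs = list(range(1, len(doc_ids) + 1))
alpha, beta = fit_line(xs, doc_ids)

# Vertical deviation of each point from the regression line.
deviations = [y - (alpha + beta * x) for x, y in zip(xs, doc_ids)]
```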
Embodiment 2
[0078] Referring to Figure 5, which shows the second embodiment of the preprocessing method for inverted index compression of the present invention, the specific steps are as follows:
[0079] Step S501: for each posting list, make a two-dimensional scatter plot with the index $x_i$ of each docID as the abscissa and its value $y_i$ as the ordinate, where $x_i$ and $y_i$ are non-negative integers, $i = 1, \ldots, n$, and $n$ is a positive integer. Generate a linear regression line $y = f(x) = \alpha + \beta x$ by the least squares method, with $\beta = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) / \sum_{i=}$ ...
Embodiment 3
[0090] Referring to Figure 6, which shows the third embodiment of the preprocessing method for inverted index compression of the present invention, the specific steps are as follows:
[0091] Step S601: for each posting list, make a two-dimensional scatter plot with the index $x_i$ of each docID as the abscissa and its value $y_i$ as the ordinate, where $x_i$ and $y_i$ are non-negative integers, $i = 1, \ldots, n$, and $n$ is a positive integer. Generate a linear regression line $y = f(x) = \alpha + \beta x$ by the least squares method, with $\beta = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) / \sum_{i}$ ...
Abstract
The invention relates to a preprocessing method for inverted index compression, which comprises the following steps. For each inverted list, draw a two-dimensional scatter diagram using the docID indices as horizontal coordinates and the docID values as vertical coordinates, and generate a linear regression line by the least squares method, so that the sum of squares of the vertical deviations from each point in the diagram to the line is minimal; this yields a vertical deviation list equivalent to the inverted list. For each vertical deviation list, round up all the vertical deviations to obtain an integer deviation list equivalent to the vertical deviation list. For each integer deviation list, calculate the minimum value and subtract it from all the integer deviations to obtain a non-negative integer deviation list equivalent to the integer deviation list. Compression algorithms built on this preprocessing achieve a higher compression ratio and improved parallel decompression efficiency, and combine better with the set merge method.
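The three steps in the abstract can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the names `preprocess` and `restore` are hypothetical, indices are assumed to run from 1 to n, the regression coefficients use the standard closed-form least-squares expressions, and "rounding up" is read as the ceiling function.

```python
import math


def fit_line(xs, ys):
    """Textbook least-squares fit of y = alpha + beta * x."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    beta = num / den if den else 0.0
    return y_mean - beta * x_mean, beta


def preprocess(doc_ids):
    """Turn a non-empty posting list into (alpha, beta, offset, non-negative residuals)."""
    if not doc_ids:
        raise ValueError("posting list must not be empty")
    xs = range(1, len(doc_ids) + 1)
    alpha, beta = fit_line(xs, doc_ids)
    # For an integer y, ceil(y - f(x)) equals y - floor(f(x)); this form keeps
    # the round trip exact even though f(x) is a floating-point value.
    rounded = [y - math.floor(alpha + beta * x) for x, y in zip(xs, doc_ids)]
    offset = min(rounded)
    return alpha, beta, offset, [r - offset for r in rounded]


def restore(alpha, beta, offset, residuals):
    """Invert preprocess; each docID depends only on its own index and residual."""
    return [
        r + offset + math.floor(alpha + beta * x)
        for x, r in enumerate(residuals, start=1)
    ]


doc_ids = [3, 7, 12, 18, 21, 30]  # hypothetical posting list
alpha, beta, offset, residuals = preprocess(doc_ids)
recovered = restore(alpha, beta, offset, residuals)
```

In this sketch each docID is restored from its own index and residual without a running sum over earlier entries, which is the property that makes element-wise, and therefore parallel, decompression possible.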
Description
Technical field
[0001] The invention relates to the field of inverted index compression, in particular to a preprocessing method for inverted index compression.
Background art
[0002] The most widely used data structure in full-text search engines is the inverted index. An inverted index consists of two main parts: a dictionary and posting lists (inverted lists). The dictionary establishes a one-to-one correspondence between keywords and posting lists, and a keyword's posting list is composed of a series of basic units called postings. Given a keyword, each of its postings may contain information such as the document identifier (called docID), frequency, and location of the webpage where the keyword appears, or may contain only the docID of the webpage where the keyword appears. In this invention, we assume that each posting list consists of a series of docIDs.
[0003] The full-text search engine continuously receives user query requests, performs word segmentation on query requests to obtain sev...
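The dictionary and posting-list structure described in paragraph [0002] can be sketched as below, under the same simplification that each posting holds only a docID. The function name `build_inverted_index`, the sample documents, and the whitespace tokenization are assumptions for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each keyword to a sorted posting list of the docIDs that contain it."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        # set() keeps one posting per (keyword, document) pair.
        for word in sorted(set(docs[doc_id].lower().split())):
            index[word].append(doc_id)
    return dict(index)


# Hypothetical documents keyed by docID.
docs = {
    1: "inverted index compression",
    2: "linear regression",
    3: "index compression by regression",
}
index = build_inverted_index(docs)
```

Because documents are visited in ascending docID order, every posting list comes out sorted, which is the form the regression-based preprocessing operates on.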