Preprocessing method for inverted index compression
An inverted-index preprocessing technique, applied in electrical digital data processing, special data processing applications, instruments, and similar fields, that addresses the problems of low efficiency and unsuitability for parallel decompression.
Examples
Embodiment 1
[0047] Referring to Figure 4, which shows the first embodiment of the preprocessing method for inverted index compression of the present invention, the specific steps are as follows:
[0048] Step S401: for each posting list, make a two-dimensional scatter plot with the index $x_i$ of each docID as the abscissa and its value $y_i$ as the ordinate, where $x_i$ and $y_i$ are non-negative integers, $i = 1, \dots, n$, and $n$ is a positive integer. Generate a linear regression line $y = f(x) = \alpha + \beta x$ by the least squares method, so that the sum of squares of the vertical deviations $y_i - f(x_i)$ of all points $(x_i, y_i)$ from the line, $\sum_{i=1}^{n} (y_i - f(x_i))^2$, is minimized. This yields a list of vertical deviations equivalent to the posting list. This process is called linear regression. Only the slope, intercept, and vertical deviation list need to be computed offline and saved to a file; the corresponding inverted list can then be computed from them during online decompression, that is to say, th...
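As an illustration of Step S401, the following Python sketch fits the least-squares line to a posting list, derives the vertical deviation list, and restores the docIDs from the intercept, slope, and deviations. The function names, the 1-based index, and the sample docIDs are assumptions made for this example and do not come from the patent text; the final rounding only absorbs floating-point error.

```python
def fit_line(doc_ids):
    """Least-squares fit of y = alpha + beta * x over points (i, doc_ids[i-1])."""
    n = len(doc_ids)
    if n == 0:
        raise ValueError("posting list must not be empty")
    xs = range(1, n + 1)  # assumption: indices start at 1
    mean_x = sum(xs) / n
    mean_y = sum(doc_ids) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, doc_ids))
    beta = sxy / sxx if sxx else 0.0  # a single point gives a flat line
    alpha = mean_y - beta * mean_x
    return alpha, beta


def vertical_deviations(doc_ids, alpha, beta):
    """Vertical deviation y_i - f(x_i) of every point from the fitted line."""
    return [y - (alpha + beta * x) for x, y in enumerate(doc_ids, start=1)]


def restore(alpha, beta, deviations):
    """Recompute the docIDs from intercept, slope, and deviation list."""
    return [round(alpha + beta * x + d) for x, d in enumerate(deviations, start=1)]


if __name__ == "__main__":
    posting_list = [3, 7, 12, 15, 21, 26]  # hypothetical docIDs
    alpha, beta = fit_line(posting_list)
    deviations = vertical_deviations(posting_list, alpha, beta)
    print(alpha, beta, deviations)
    print(restore(alpha, beta, deviations) == posting_list)
```

Each restored docID depends only on its own index and deviation, so in this sketch the entries can be recomputed independently of one another.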
Embodiment 2
[0064] Referring to Figure 5, which shows the second embodiment of the preprocessing method for inverted index compression of the present invention, the specific steps are as follows:
[0065] Step S501: for each posting list, make a two-dimensional scatter plot with the index $x_i$ of each docID as the abscissa and its value $y_i$ as the ordinate, where $x_i$ and $y_i$ are non-negative integers, $i = 1, \dots, n$, and $n$ is a positive integer. Generate a linear regression line $y = f(x) = \alpha + \beta x$ by the least squares method, so that the sum of squares of the vertical deviations $y_i - f(x_i)$ of all points $(x_i, y_i)$ from the line, $\sum_{i=1}^{n} (y_i - f(x_i))^2$, is minimized. This yields a list of vertical deviations equivalent to the posting list.
[0066] Step S502: for each vertical deviation list, every vertical deviation $y_i - f(x_i)$ is rounded up and recorded as an integer deviation, yielding a list of integer deviations equivalent to that vertical deviation list.
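The rounding in Step S502 can be sketched in Python as follows, assuming that "rounded up" means the ceiling function. Because each docID $y_i$ is an integer, $\lceil y_i - f(x_i) \rceil = y_i - \lfloor f(x_i) \rfloor$, so the sketch computes the integer deviation in that form to avoid a floating-point subtraction, and the docID is recovered exactly by adding $\lfloor f(x_i) \rfloor$ back. The function names, the 1-based index, and the sample intercept and slope are hypothetical and are not taken from the patent text.

```python
import math


def integer_deviations(doc_ids, alpha, beta):
    """Ceiling of y_i - f(x_i), computed as y_i - floor(f(x_i)) for integer y_i."""
    return [y - math.floor(alpha + beta * x)
            for x, y in enumerate(doc_ids, start=1)]  # assumption: 1-based index


def restore_from_integer_deviations(alpha, beta, int_devs):
    """Invert integer_deviations: y_i = floor(f(x_i)) + integer deviation."""
    return [math.floor(alpha + beta * x) + e
            for x, e in enumerate(int_devs, start=1)]


if __name__ == "__main__":
    posting_list = [3, 7, 12, 15, 21, 26]  # hypothetical docIDs
    alpha, beta = 0.5, 4.5                 # hypothetical intercept and slope
    int_devs = integer_deviations(posting_list, alpha, beta)
    print(int_devs)
    print(restore_from_integer_deviations(alpha, beta, int_devs) == posting_list)
```

The round trip is lossless as long as the decoder evaluates the line with the same intercept and slope values that the encoder used.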
[0067] Step S503, for each integer dispersion list, if the integer disper...
Embodiment 3
[0076] Referring to Figure 6, which shows the third embodiment of the preprocessing method for inverted index compression of the present invention, the specific steps are as follows:
[0077] Step S601: for each posting list, make a two-dimensional scatter plot with the index $x_i$ of each docID as the abscissa and its value $y_i$ as the ordinate, where $x_i$ and $y_i$ are non-negative integers, $i = 1, \dots, n$, and $n$ is a positive integer. Generate a linear regression line $y = f(x) = \alpha + \beta x$ by the least squares method, so that the sum of squares of the vertical deviations $y_i - f(x_i)$ of all points $(x_i, y_i)$ from the line, $\sum_{i=1}^{n} (y_i - f(x_i))^2$, is minimized. This yields a list of vertical deviations equivalent to the posting list.
[0078] Step S602: divide each vertical deviation list into segments of equal length. To achieve a better compression ratio, the segment length s is generally taken to be a power of 2, such as 128 or 256.
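A minimal Python sketch of the segmentation in Step S602 is shown below. The function name and the default segment length are assumptions for illustration, and the sketch lets the final segment be shorter when the list length is not a multiple of s, which the text above does not specify.

```python
def split_into_segments(deviations, segment_length=128):
    """Split a deviation list into consecutive segments of segment_length items.

    Assumption: the last segment may be shorter than the others.
    """
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    return [deviations[start:start + segment_length]
            for start in range(0, len(deviations), segment_length)]


if __name__ == "__main__":
    deviations = [0.5, -1.0, 2.0, 0.0, -0.5, 1.5, -2.0, 0.25, 1.0, -0.75]  # made-up values
    for segment in split_into_segments(deviations, segment_length=4):
        print(segment)
```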
[0079] Step S603, for each segment of each vertical deviation lis...