Index query method and index query device
An index query method and index query device, applied in the field of big data, that improve query performance and enable distributed fast indexing.
Examples
Embodiment 1
[0065] Referring to Figure 1, which is a schematic flowchart of an index query method provided in Embodiment 1 of the present invention, the method includes the following steps:
[0066] S11. Obtain original data in the distributed file system, and generate inverted index information of the original data, wherein the inverted index information includes a file name and an offset;
[0067] Specifically, the inverted index information generated for the original data mainly stores data index information, that is, the position information of the specified fields of the original records. With this index, the file in which the data is located and the offset of the corresponding data can be queried quickly. It is called an inverted index because each entry in the index table includes an attribute value and the address of each record having that attribute value: the attribute value is not determined by the record; rather, the position of the record is determined by the attribute value, thus becomin...
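The mapping from an attribute value to a file name and offset described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the record layout (`key=value` pairs separated by commas, one record per line), the function name `build_inverted_index`, and the sample file names are all assumptions made for illustration, and in-memory byte strings stand in for files in the distributed file system.

```python
from collections import defaultdict

def build_inverted_index(files, field):
    """Map each value of `field` to the (file name, byte offset) of every record holding it.

    `files` is a hypothetical {file name: raw bytes} mapping standing in for
    files stored in the distributed file system.
    """
    index = defaultdict(list)
    for file_name, data in files.items():
        offset = 0
        for line in data.splitlines(keepends=True):
            text = line.decode().strip()
            if text:  # skip blank lines
                record = dict(pair.split("=", 1) for pair in text.split(","))
                if field in record:
                    index[record[field]].append((file_name, offset))
            offset += len(line)  # byte offset of the next record
    return dict(index)

# Hypothetical sample data: two files with comma-separated key=value records.
files = {
    "part-0000": b"user=alice,city=Paris\nuser=bob,city=Rome\n",
    "part-0001": b"user=carol,city=Paris\n",
}
index = build_inverted_index(files, "city")

# Look up every record whose city is "Rome", then read it back via file name and offset.
file_name, offset = index["Rome"][0]
record = files[file_name][offset:].split(b"\n", 1)[0]
```

Here the lookup goes from attribute value to record position, which is the "inverted" direction the paragraph describes.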
Embodiment 2
[0082] For the specific process of steps S11 to S15, refer to Embodiment 1 of the present invention and Figure 1. Referring to Figure 2, step S12, in which the original data is compressed according to a preset LZO compression mode to generate LZO compressed blocks and a corresponding random access index is generated, specifically includes:
[0083] S121. According to the size of the preset compressed block, compress the original data into an LZO compressed block by using the LZO compression mode;
[0084] It can be understood that, by modifying the LZO source code, the generation of the related index files and of supporting classes for MapReduce and the like has been implemented. MapReduce is a programming model mainly used for parallel computation over large-scale data sets. MapReduce monitors a directory and applies LZO compression to the original data to obtain compressed file blocks. The LZO file format consists of a file header and multiple file blocks. Each block ...
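The idea of block-wise compression with a random access index can be sketched as follows. This is only an illustration under stated assumptions: Python's standard library has no LZO codec, so `zlib` is swapped in as a stand-in compressor, no file header is written, and the names `compress_blocks` and `read_at` are hypothetical. Each index entry pairs an original data offset with a compressed block offset, so a read only has to decompress the blocks that cover the requested range.

```python
import bisect
import zlib

def compress_blocks(data, block_size):
    """Compress `data` in fixed-size blocks; return (compressed bytes, index).

    Each index entry is (original data offset, compressed block offset).
    zlib stands in for LZO here.
    """
    blocks = bytearray()
    index = []
    for start in range(0, len(data), block_size):
        index.append((start, len(blocks)))
        blocks += zlib.compress(data[start:start + block_size])
    return bytes(blocks), index

def read_at(compressed, index, offset, length):
    """Return `length` original bytes starting at `offset`, decompressing only the needed blocks."""
    if not index or offset < 0:
        return b""
    starts = [orig for orig, _ in index]
    i = bisect.bisect_right(starts, offset) - 1  # block containing `offset`
    out = bytearray()
    pos = offset
    while len(out) < length and i < len(index):
        orig_start, comp_start = index[i]
        comp_end = index[i + 1][1] if i + 1 < len(index) else len(compressed)
        block = zlib.decompress(compressed[comp_start:comp_end])
        out += block[pos - orig_start:]
        pos = orig_start + len(block)  # continue at the start of the next block
        i += 1
    return bytes(out[:length])

# Hypothetical sample: 10240 bytes of data, preset block size of 1024 bytes.
data = bytes(range(256)) * 40
compressed, index = compress_blocks(data, 1024)
chunk = read_at(compressed, index, 1000, 100)  # spans the first two blocks
```

A smaller block size makes each random read cheaper but usually lowers the compression ratio, which is presumably why the block size is a preset parameter.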
Embodiment 3
[0110] Corresponding to the methods disclosed in Embodiment 1 and Embodiment 2 of the present invention, Embodiment 3 of the present invention also provides an index query device. Referring to Figure 6, the device specifically includes:
[0111] The acquisition module 1 is configured to acquire original data in the distributed file system, and generate inverted index information of the original data, wherein the inverted index information includes a file name and an offset;
[0112] The compression module 2 is configured to compress the original data according to a preset LZO compression mode to generate an LZO compressed block, and generate a corresponding random access index, wherein the random access index includes an original data offset and a compressed block offset;
[0113] The writing module 3 is used to write the inverted index information into the inverted index file according to the index file format, wherein the inverted index file includes a file header, an index block,...


