
Method and device for creating full-text index data

A full-text indexing technology, applied in digital data processing, special-purpose data processing applications, instruments and related fields. It addresses the declining efficiency of creating index data, reducing word segmentation time and making effective use of computing resources.

Status: Inactive; Publication Date: 2017-05-31
TIANJIN NANKAI UNIV GENERAL DATA TECH
Cites: 4; Cited by: 3

AI Technical Summary

Problems solved by technology

[0005] In view of this, an embodiment of the present invention provides a method and device for creating full-text index data, to solve the technical problem in the prior art that the efficiency of creating index data declines.

Examples

Embodiment 1

[0043] Figure 1 is a flowchart of the method for creating full-text index data provided by Embodiment 1 of the present invention. This embodiment applies to the creation of full-text index data. The method can be executed by a device for creating full-text index data. The device can be implemented in software and/or hardware and can be integrated into a database system.

[0044] Referring to Figure 1, the method for creating full-text index data includes:

[0045] S110. Perform word segmentation on the documents in parallel, and record word positions and word marks.

[0046] Figure 2 is a schematic diagram of the process in the method for creating full-text index data provided by Embodiment 1 of the present invention. Referring to Figure 1 and Figure 2, multiple parallel processing threads can, for example, be used for word segmentation, and their number can be adjusted to the user's actual hardware resources...
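The sketch below illustrates step S110 and is only an illustration under stated assumptions. Several worker threads segment documents concurrently, and each word occurrence is recorded as a (word mark, document number, position) tuple.

Several parts of it are hypothetical and not taken from the patent:
- the whitespace "segmenter", which stands in for a real Chinese word segmenter;
- the `WordMarks` table, which assumes a word mark is an integer ID shared across threads;
- the names `segment` and `parallel_segment`.

CPython threads share one interpreter lock, so this shows the structure of the step rather than a real CPU speedup. A database engine would use native threads.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class WordMarks:
    """Hypothetical thread-safe table assigning an integer word mark to each distinct word."""

    def __init__(self):
        self._marks = {}
        self._lock = threading.Lock()

    def mark_of(self, word):
        with self._lock:
            # len() is evaluated before insertion, so new words receive 0, 1, 2, ...
            return self._marks.setdefault(word, len(self._marks))


def segment(doc_id, text, marks):
    """Stand-in segmenter: splits on whitespace (Chinese text needs a real segmenter here).
    Returns (word_mark, doc_id, position) tuples."""
    return [(marks.mark_of(word), doc_id, pos) for pos, word in enumerate(text.split())]


def parallel_segment(docs, marks, num_threads=4):
    """Segment {doc_id: text} on num_threads workers (tune to the available hardware).
    pool.map yields results in input order, so records stay grouped by document."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        per_doc = list(pool.map(lambda item: segment(item[0], item[1], marks), docs.items()))
    return [record for records in per_doc for record in records]
```

The same word always receives the same mark, whichever thread sees it first. Only the numbering order of the marks depends on scheduling.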

Embodiment 2

[0055] Figure 3 is a schematic flowchart of the method for creating full-text index data provided by Embodiment 2 of the present invention. This embodiment builds on the embodiments above and adds the following steps before the documents are segmented in parallel: number the documents to be indexed, generate a data item from each document's number and content, and fill the data items into memory blocks of a preset size.

[0056] Referring to Figure 3, the method for creating full-text index data includes:

[0057] S210. Number the documents to be indexed, and generate data items from the document numbers and document contents.

[0058] Data items are encapsulated in Doc ID (document number) order, and each data block contains the ordered Doc IDs and the corresponding document contents.
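The patent does not give a byte layout for a data item. The sketch below assumes one for step S210, purely for illustration. Each item is a 4-byte Doc ID, then a 4-byte content length, then the UTF-8 content.

Documents are numbered consecutively in arrival order, so the resulting items are already in Doc ID order. The names `make_data_items` and `decode_data_item`, and the layout itself, are hypothetical.

```python
import struct

# Hypothetical data item layout: doc_id (uint32), content length (uint32), UTF-8 content.
HEADER = struct.Struct("<II")


def make_data_items(documents, first_doc_id=0):
    """Number documents consecutively and encode each as a data item.
    Returns (doc_id, encoded_bytes) pairs, ordered by Doc ID."""
    items = []
    for doc_id, text in enumerate(documents, start=first_doc_id):
        body = text.encode("utf-8")
        items.append((doc_id, HEADER.pack(doc_id, len(body)) + body))
    return items


def decode_data_item(raw):
    """Inverse of the encoding above: recover (doc_id, text) from one data item."""
    doc_id, length = HEADER.unpack_from(raw, 0)
    body = raw[HEADER.size:HEADER.size + length]
    return doc_id, body.decode("utf-8")
```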

[0059] S220. Fill the data item into a memory block with a preset size.

[0060] For example, the main thread can pre-read the documen...
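Step S220 can be sketched as a greedy packer that works under stated assumptions. It appends encoded data items, already in Doc ID order, to the current block until the next item would exceed the preset block size. It then starts a new block.

Two choices here are hypothetical and not specified by the patent:
- an item larger than the block size gets a block of its own;
- each block is a plain byte string.

The name `fill_blocks` is also illustrative.

```python
def fill_blocks(items, block_size):
    """Pack encoded data items (bytes, in Doc ID order) into blocks of at most block_size bytes.
    Assumption: an item larger than block_size is placed alone in an oversized block."""
    blocks, current, used = [], [], 0
    for item in items:
        # Close the current block if this item would overflow it.
        if current and used + len(item) > block_size:
            blocks.append(b"".join(current))
            current, used = [], 0
        current.append(item)
        used += len(item)
    if current:
        blocks.append(b"".join(current))
    return blocks
```

Because items are never reordered, every block holds a contiguous, ascending run of Doc IDs. This matches the ordering described in [0058].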

Embodiment 3

[0067] Figure 4 is a schematic flowchart of the method for creating full-text index data provided by Embodiment 3 of the present invention. This embodiment builds on the embodiments above and adds the following step after identical words have been classified together: the classified words are encapsulated to generate encapsulated data packets. Each packet contains a word mark and the position information data corresponding to that word mark, and the position information data is stored differentially.
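One common way to store positions "differentially" is delta (gap) encoding, which replaces each value with its difference from the previous one. The sketch below assumes that reading.

It also assumes positions are (Doc ID, offset) pairs. Document numbers are stored as gaps, and offsets are stored as gaps within the same document, restarting at each new document. The helper names and the tuple form of the packet are hypothetical.

```python
def delta_encode(positions):
    """positions: (doc_id, offset) pairs for one word mark.
    Each output pair holds the doc_id gap from the previous entry and the offset gap
    from the previous offset in the same document (absolute when a new document starts)."""
    encoded, prev_doc, prev_off = [], 0, 0
    for doc_id, offset in sorted(positions):
        if doc_id != prev_doc:
            prev_off = 0  # offsets restart at each new document
        encoded.append((doc_id - prev_doc, offset - prev_off))
        prev_doc, prev_off = doc_id, offset
    return encoded


def delta_decode(encoded):
    """Rebuild absolute (doc_id, offset) pairs from the gaps produced by delta_encode."""
    positions, doc_id, offset = [], 0, 0
    for doc_gap, off_gap in encoded:
        if doc_gap != 0:
            offset = 0
        doc_id += doc_gap
        offset += off_gap
        positions.append((doc_id, offset))
    return positions


def make_packet(word_mark, positions):
    """Hypothetical encapsulated data packet: the word mark plus its delta-encoded positions."""
    return word_mark, delta_encode(positions)
```

Gaps are usually much smaller than absolute values, which is what makes a later variable-length integer encoding compact.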

[0068] Referring to Figure 4, the method for creating full-text index data includes:

[0069] S310. Number the documents to be indexed, and generate data items from the document numbers and document contents.

[0070] Figure 5 is a schematic diagram of a creation example in the method for creating full-text index data provided by Embodiment 3 of the present invention. Referring to Figure 4 and Figure 5, ...

Abstract

The invention provides a method and a device for creating full-text index data. The method comprises the following steps:
- documents are segmented into words in parallel, and word positions and word marks are recorded;
- the word segmentation results are encapsulated in the order of the storage data blocks that record the word positions and word marks, generating encapsulated blocks and an index of the encapsulated blocks, where the index contains the numbers and quantities of the words;
- the words are sorted according to the index;
- the encapsulated blocks are unpacked and identical words are classified together.

The method and device make effective use of computing resources and reduce word segmentation time. In addition, the word segmentation results can be sorted, so that the index data is obtained accurately.
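The abstract does not say exactly how the index drives the sort. One plausible reading, sketched here as an assumption, is a counting-sort-style merge. Each encapsulated block carries an index of word marks and their occurrence counts. Summing those counts gives every word a contiguous output range before any block is unpacked. Unpacking then scatters each record into its word's range, so identical words end up adjacent and can be classified in one pass.

Word marks are assumed to be integers, as produced by segmentation. The function names are hypothetical.

```python
from collections import Counter


def encapsulate(block_records):
    """Wrap one storage data block's segmentation results into an encapsulated block.
    block_records: (word_mark, doc_id, position) tuples.
    Returns (index, records); index maps each word mark to its number of occurrences."""
    index = Counter(mark for mark, _, _ in block_records)
    return dict(index), list(block_records)


def sort_and_classify(packets):
    """Sort words across all encapsulated blocks using their indexes, then unpack and group."""
    # Step 1: total the per-block counts.
    totals = Counter()
    for index, _ in packets:
        totals.update(index)

    # Step 2: reserve a contiguous output range for each word mark, in ascending order.
    next_slot, start = {}, 0
    for mark in sorted(totals):
        next_slot[mark] = start
        start += totals[mark]

    # Step 3: unpack every block, placing each record in its word's range.
    ordered = [None] * start
    for _, records in packets:
        for record in records:
            mark = record[0]
            ordered[next_slot[mark]] = record
            next_slot[mark] += 1

    # Step 4: classify identical words, which are now adjacent, into one group each.
    groups = {}
    for mark, doc_id, position in ordered:
        groups.setdefault(mark, []).append((doc_id, position))
    return groups
```

Because blocks are unpacked in order and records keep their order within a block, each word's positions stay in Doc ID order when the blocks are in Doc ID order.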

Description

Technical field

[0001] The invention belongs to the field of data retrieval, and in particular relates to a method and device for creating full-text index data.

Background art

[0002] In a relational database system, full-text indexing is one of the most efficient ways to retrieve document data. In today's network environment, the volume of information and the number of users are growing explosively, and full-text indexing has become one of the main means of information retrieval. The inverted index is the core of a full-text retrieval system, and the efficiency with which it is created strongly affects how the system can be applied.

[0003] An inverted index, also known as a reverse index, postings file or inverted file, is an indexing method used in full-text search. It stores a mapping from a word to its storage locations in a document or a group of documents. It is the most commonly used data structure in document retriev...
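As a minimal illustration of the inverted index defined in [0003], the sketch below maps each word to the (document number, position) locations where it occurs. It assumes whitespace tokenization in place of real word segmentation, and the name `build_inverted_index` is hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """docs: {doc_id: text}. Maps each word to its (doc_id, position) locations,
    in ascending document and position order. Whitespace tokenization is an assumption."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for position, word in enumerate(docs[doc_id].split()):
            index[word].append((doc_id, position))
    return dict(index)
```

For example, `build_inverted_index({1: "full text index", 2: "text data"})["text"]` gives `[(1, 1), (2, 0)]`. Answering a query for "text" then needs only one lookup instead of a scan of every document.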

Application Information

IPC (8th edition): G06F17/30
CPC: G06F16/328
Inventors: 崔维力, 武新, 史大义, 梁东阳
Owner: TIANJIN NANKAI UNIV GENERAL DATA TECH