A method and device for constructing an index library, and a query method and device

The index library and inverted index technology, applied in the field of index library construction, addresses the long time consumption and heavy workload of indexing in the prior art. It reduces the amount of indexing, responds quickly, and saves overhead.

Active Publication Date: 2015-10-14
ALIBABA GRP HLDG LTD

AI Technical Summary

Problems solved by technology

[0023] This application provides a method and device for constructing an index database, to solve the technical problems of heavy workload and long time consumption in the indexing process of the prior art.



Embodiment Construction

[0042] Figure 3 is a flowchart of a method for building an index library in an embodiment of the present application. As shown in Figure 3, the method for building an index library in this embodiment includes:

[0043] Step 310: Collect electronic documents;

[0044] Step 312: Extract keywords from the electronic documents;

[0045] Step 314: Classify the keywords into first-category keywords, second-category keywords and third-category keywords;

[0046] Step 316: Filter out the first-category keywords and the second-category keywords; and

[0047] Step 318: Build an inverted index for the third-category keywords.

[0048] The first-category keywords, the second-category keywords and the third-category keywords belong to different categories. Electronic documents include web pages, Word documents, PDF documents and other electronic information.
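Steps 310 to 318 can be sketched in Python as follows. This is only an illustration under stated assumptions: the text above does not say how keywords are extracted or what distinguishes the three categories, so the whitespace tokenizer, the two lookup sets (`first_set`, `second_set`) and all function names here are hypothetical stand-ins rather than the method of the application.

```python
from collections import defaultdict

# Hypothetical labels; the source only says the three categories differ.
FIRST, SECOND, THIRD = 1, 2, 3


def extract_keywords(text):
    """Step 312 stand-in: naive whitespace tokenization (an assumption)."""
    tokens = (tok.strip(".,;:!?()").lower() for tok in text.split())
    return [tok for tok in tokens if tok]


def classify(keyword, first_set, second_set):
    """Step 314 stand-in: categories are decided by two assumed lookup sets."""
    if keyword in first_set:
        return FIRST
    if keyword in second_set:
        return SECOND
    return THIRD


def build_index(documents, first_set, second_set):
    """Steps 316 and 318: drop first/second-category keywords, index the rest."""
    index = defaultdict(set)
    for doc_id, text in documents.items():  # Step 310: the collected documents
        for kw in extract_keywords(text):
            if classify(kw, first_set, second_set) == THIRD:
                index[kw].add(doc_id)
    # Posting lists are sorted so the result is deterministic.
    return {kw: sorted(ids) for kw, ids in index.items()}


docs = {
    1: "The index library stores an inverted index.",
    2: "A query reads the index.",
}
index = build_index(docs, first_set={"the", "a", "an"}, second_set={"stores"})
print(index["index"])
```

Only third-category keywords receive posting lists, so the index holds fewer entries than one built over every extracted keyword.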

[0049] In step 312, the keywords may be obtained by segmenting the arti...


Abstract

The invention discloses a method and a device for building an index database, as well as a method and a device for querying. The method for building an index database comprises the following steps: collecting electronic documents; extracting keywords from the electronic documents; classifying the keywords into first-class keywords, second-class keywords and third-class keywords; filtering out the first-class keywords and the second-class keywords; and building an inverted index for the third-class keywords, wherein the first-class keywords, the second-class keywords and the third-class keywords belong to different classes. In the embodiment of the invention, keywords are classified and the keywords that do not need to be indexed are eliminated, which saves disk space. Moreover, when a queried keyword is one that does not need to be indexed, the index database is not queried, which saves the overhead of disk read and write operations.
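The query-side behavior described in the abstract can be illustrated with a short Python sketch. It assumes the set of filtered keywords is available at query time and that an empty result is returned for them; neither detail is stated in the text, and the names `CountingIndex`, `query` and `unindexed` are hypothetical.

```python
class CountingIndex:
    """Toy in-memory index that counts lookups, standing in for disk reads."""

    def __init__(self, postings):
        self.postings = postings
        self.lookups = 0

    def get(self, keyword):
        self.lookups += 1
        return self.postings.get(keyword, [])


def query(keyword, index, unindexed):
    """Return document ids, skipping the index for filtered keywords."""
    kw = keyword.lower()
    if kw in unindexed:
        # Assumption: a filtered keyword yields an empty result without
        # touching the index at all.
        return []
    return index.get(kw)


index = CountingIndex({"inverted": [1], "query": [2]})
unindexed = {"the", "a", "an"}

hits = query("inverted", index, unindexed)
skipped = query("The", index, unindexed)
print(hits, skipped, index.lookups)
```

The lookup counter increases only for the indexed keyword, which mirrors the claimed saving: a query for a filtered keyword never reaches the index database.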

Description

Technical field

[0001] The invention relates to information processing technology, in particular to a method and device for constructing an index database, and a query method and device.

Background technique

[0002] With the development of the Internet, the amount of information keeps increasing, and various search engines have come into use. As shown in Figure 1, a traditional search engine mainly includes the following parts:

[0003] Searcher 101, whose function is mainly to roam the Internet and to find and collect information;

[0004] Indexer 102, whose function is to understand the information collected by the searcher, extract index items from it, use them to represent documents and generate an index table of the document library, and store them in the index library 105;

[0005] Retriever 103, whose function is to quickly retrieve documents in the index library 105 according to the user's query, perform relevance evaluation, sort the results to be output, and provide reaso...


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/30
Inventor 吴凯杨二宝沈加翔陈维
Owner ALIBABA GRP HLDG LTD