Method for automatically creating keyword index table

This keyword index table technique, applied in the computer field, addresses the low precision of keyword extraction methods that ignore word semantics, part of speech, and similar information, and improves both precision and recall.

Status: Inactive
Publication Date: 2013-04-24
IOL WUHAN INFORMATION TECH CO LTD

AI Technical Summary

Problems solved by technology

[0008] The above two types of methods extract keywords based on frequency or rules, without taking into ac...




Embodiment Construction

[0037] Given a large library of reference translation documents, finding documents similar to the one being translated by matching it against the entire library is impractical in both time and space. Building a keyword index table for the reference translation library makes it possible to quickly narrow the library to a suitable subset of reference documents, which speeds up queries while still returning relatively accurate matches. Keywords characterize the important information and core content of a document, which makes it easier to obtain its summary information and to retrieve specific documents.
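The lookup described in [0037] can be pictured as an inverted index from keywords to document identifiers. The following is a minimal sketch under stated assumptions, not the patented implementation: the function names, the sample library, and the ranking by number of shared keywords are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def build_keyword_index(doc_keywords):
    """Map each keyword to the set of document ids that carry it."""
    index = defaultdict(set)
    for doc_id, keywords in doc_keywords.items():
        for kw in keywords:
            index[kw].add(doc_id)
    return index


def candidate_documents(index, query_keywords):
    """Rank reference documents by how many query keywords they share.

    Ranking by overlap count is an assumption; the source only says the
    index narrows the library to a suitable subset.
    """
    hits = defaultdict(int)
    for kw in query_keywords:
        for doc_id in index.get(kw, ()):
            hits[doc_id] += 1
    # Most shared keywords first; ties broken by document id.
    return sorted(hits, key=lambda d: (-hits[d], d))


# Hypothetical reference library: document id -> extracted keywords.
reference_library = {
    "doc_a": {"translation", "memory", "alignment"},
    "doc_b": {"keyword", "index", "retrieval"},
    "doc_c": {"keyword", "translation", "corpus"},
}
index = build_keyword_index(reference_library)
ranked = candidate_documents(index, {"keyword", "translation"})
```

Only documents that share at least one keyword with the query are touched, which is why this kind of lookup avoids a full similarity pass over the library.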

[0038] The thesaurus dictionary is a synonym classification dictionary encoded in a tree structure. Each node of the tree structure has a unique code and corres...
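One way to read a tree-coded synonym dictionary is that two words are closer the longer the path their codes share from the root. The sketch below assumes dot-separated codes such as `B.02.01`, and a similarity equal to the shared prefix depth divided by the longer code's depth. Both the code format and the formula are illustrative assumptions, since the source text is truncated before it defines them.

```python
def shared_depth(code_a, code_b):
    """Count how many leading tree levels two codes have in common."""
    depth = 0
    for a, b in zip(code_a.split("."), code_b.split(".")):
        if a != b:
            break
        depth += 1
    return depth


def code_similarity(code_a, code_b):
    """Assumed measure: shared prefix depth over the longer code's depth."""
    longest = max(len(code_a.split(".")), len(code_b.split(".")))
    return shared_depth(code_a, code_b) / longest


def word_similarity(thesaurus, w1, w2):
    """Best similarity over every pair of sense codes of the two words."""
    return max(
        (
            code_similarity(c1, c2)
            for c1 in thesaurus.get(w1, [])
            for c2 in thesaurus.get(w2, [])
        ),
        default=0.0,  # unknown words have no senses to compare
    )


# Hypothetical dictionary: word -> list of sense codes (one per sense).
thesaurus = {
    "car": ["B.02.01"],
    "automobile": ["B.02.01"],
    "truck": ["B.02.03"],
    "bank": ["D.05.01", "C.01.04"],
    "shore": ["C.01.04"],
}
```

Taking the maximum over sense pairs lets a polysemous word such as "bank" match "shore" through one sense while its other sense is ignored.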



Abstract

The invention discloses a method for automatically creating a keyword index table. The method comprises: performing word segmentation on a document to be translated to obtain its word list, and tagging the word list with parts of speech; filtering the word list for candidate keywords to obtain a coarse candidate word set together with the code of each sense of every candidate keyword; building synonym chains from the candidate keywords according to the semantic similarity between words, yielding a set of synonym chains; computing the weight of each word in the synonym chain set and extracting keywords by weight to form a keyword set; and comparing the keyword set with the existing keyword index set of the reference library. If the index set already contains a candidate keyword, the set of related documents is returned; otherwise the candidate keyword is added to the index set and an index entry is created for it. Compared with traditional keyword extraction methods, the method markedly improves precision and recall.
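The steps in the abstract can be sketched end to end as follows. This is an illustrative outline under several assumptions, not the claimed method: the part-of-speech tags kept (`n`, `v`), the greedy chain construction, the chain weight (summed word frequency), and all function names are hypothetical, and word segmentation and tagging are assumed to have been done already.

```python
from collections import Counter

KEEP_TAGS = {"n", "v"}  # assumption: keep nouns and verbs as candidates


def filter_candidates(tagged_words):
    """Keep only words whose part-of-speech tag marks them as candidates."""
    return [word for word, tag in tagged_words if tag in KEEP_TAGS]


def build_chains(words, similar):
    """Greedily group words into synonym chains using a similarity test."""
    chains = []
    for word in dict.fromkeys(words):  # unique words, first-seen order
        for chain in chains:
            if any(similar(word, member) for member in chain):
                chain.append(word)
                break
        else:
            chains.append([word])
    return chains


def extract_keywords(words, chains, top_k):
    """Score each chain by summed frequency; emit its most frequent word."""
    freq = Counter(words)
    scored = []
    for chain in chains:
        weight = sum(freq[w] for w in chain)
        representative = max(chain, key=lambda w: freq[w])
        scored.append((weight, representative))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [rep for _, rep in scored[:top_k]]


def lookup_or_index(index, doc_id, keywords):
    """Return related documents for known keywords; index the unknown ones."""
    related = set()
    for kw in keywords:
        if kw in index:
            related |= index[kw]
        else:
            index[kw] = {doc_id}
    return related


# Hypothetical segmented and tagged input.
tagged = [("engine", "n"), ("motor", "n"), ("the", "x"), ("engine", "n"),
          ("runs", "v"), ("fuel", "n"), ("motor", "n"), ("engine", "n")]
synonym_pairs = {frozenset({"engine", "motor"})}
similar = lambda a, b: frozenset({a, b}) in synonym_pairs

words = filter_candidates(tagged)
chains = build_chains(words, similar)
keywords = extract_keywords(words, chains, top_k=2)

index = {"engine": {"ref_1"}}
related = lookup_or_index(index, "new_doc", keywords)
```

Grouping synonyms before weighting means "engine" and "motor" pool their counts, so a concept spread across several surface forms is not outranked by a single repeated word.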

Description

Technical field

[0001] The invention relates to computer technology, in particular to a method for automatically establishing a keyword index table.

Background

[0002] Keywords characterize the important information and core content of a document, which makes it easier to obtain its summary information and to retrieve specific documents. Keywords have traditionally been extracted by hand, which is very time-consuming. As the number of documents grows sharply, manual extraction is increasingly unable to meet practical needs. How to extract keywords automatically is therefore an active and difficult topic in document retrieval research.

[0003] Keyword extraction is a basic research problem in the field of text mining. Many text mining systems use the sentence where the keyword is located as the abstract sentence. Most clustering and classification algori...


Application Information

IPC(8): G06F17/30
Inventor 江潮
Owner IOL WUHAN INFORMATION TECH CO LTD