Method for extracting key phrases based on lexical chain

A key phrase extraction technology based on lexical chains, applied in fields such as special data processing applications, instruments, and electronic digital data processing. It addresses the inability of existing keyword extraction methods to accurately reflect topic information and their low coverage of document topic information, and it achieves increased speed, reduced dimensionality, and avoidance of redundancy.

Status: Inactive; Publication Date: 2011-04-27
HARBIN INST OF TECH

AI Technical Summary

Problems solved by technology

[0005] The purpose of the present invention is to solve two problems: existing keyword extraction methods cannot accurately reflect the topic information described in an article, and existing key phrase extraction methods based on lexical chains have low extraction accuracy and low coverage of document topic information. To this end, the invention provides a key phrase extraction method based on lexical chains.



Examples


Specific Embodiment 1

[0029] Specific Embodiment 1: This embodiment is described with reference to Figure 1. A key phrase extraction method based on lexical chains is implemented on a computer on which the "HowNet" dictionary is installed. The specific steps of the method are:

[0030] Step 1: Take the document to be processed as the extraction object, and obtain the meanings of the words in the document;

[0031] Step 2: Use the dictionary "HowNet" to disambiguate words, and filter out the abstract sememes in "HowNet";

[0032] Step 3: Construct a lexical chain for the disambiguated words, obtain a set L of lexical chains, and obtain multiple strong chains;

[0033] Step 4: Select a core word from each strong chain, and use these core words to form the core word set of the document;

[0034] Step 5: Calculate the co-occurrence rate between different core words in the core word set, and select the core word whose co-occurrence rate is greater than the extraction threshold ...
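Steps 4 and 5 above can be illustrated with a minimal Python sketch. The source text is truncated and gives neither the rule for choosing a core word nor the formula for the co-occurrence rate, so both are assumptions here: the core word is taken to be the chain member that occurs most often in the document, and the co-occurrence rate of two words is taken to be the Jaccard overlap of the sentences containing them. The function names `pick_core_word`, `cooccurrence_rate`, and `extract_key_phrases` are hypothetical, and the strong chains are supplied directly rather than built with "HowNet".

```python
from collections import Counter
from itertools import combinations


def pick_core_word(chain, doc_words):
    """Step 4 (assumed rule): the chain member occurring most often in the document."""
    counts = Counter(w for w in doc_words if w in chain)
    # Ties are broken alphabetically so the result is deterministic.
    return min(chain, key=lambda w: (-counts[w], w))


def cooccurrence_rate(a, b, sentences):
    """Assumed definition: Jaccard overlap of the sentences containing a and b."""
    has_a = {i for i, s in enumerate(sentences) if a in s}
    has_b = {i for i, s in enumerate(sentences) if b in s}
    union = has_a | has_b
    return len(has_a & has_b) / len(union) if union else 0.0


def extract_key_phrases(strong_chains, sentences, threshold):
    """Step 5: keep core-word pairs whose co-occurrence rate exceeds the threshold."""
    doc_words = [w for s in sentences for w in s]
    core_words = sorted({pick_core_word(c, doc_words) for c in strong_chains})
    return [
        (a, b)
        for a, b in combinations(core_words, 2)
        if cooccurrence_rate(a, b, sentences) > threshold
    ]


# Toy input: each sentence is a list of already segmented, filtered words.
sentences = [
    ["keyword", "extraction", "document"],
    ["keyword", "extraction", "index"],
    ["library", "index"],
    ["keyword", "document"],
]
strong_chains = [{"keyword", "phrase"}, {"extraction", "mining"}, {"index", "catalog"}]
key_phrases = extract_key_phrases(strong_chains, sentences, threshold=0.5)
```

With this toy input the core words are "extraction", "index", and "keyword", and only the pair ("extraction", "keyword") has a rate above 0.5. The threshold plays the role of the user-set extraction threshold mentioned in the method.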

Specific Embodiment 2

[0037] Specific Embodiment 2: This embodiment further describes Step 1 of the lexical-chain-based key phrase extraction method of Specific Embodiment 1. The steps for obtaining word meanings in Step 1 are:

[0038] Step A: Perform word segmentation and stop word filtering on the document to obtain the word space WordSet of the document;

[0039] Step B: Sequentially scan the word space WordSet to obtain the meaning of each word in the word space WordSet one by one. The process of obtaining the meaning of each word is:

[0040] Step B1: Let the word sequence in the document be M1, M2, M, M3, M4, where M is the word whose meaning is currently to be determined and M1, M2, M3, M4 are the context of M, as shown in Figure 2. The vertices in Figure 2 represent the sense classes corresponding to each word, and the edges between vertices represent the degree of association between sense classes;

[0041] Step B2: From f...
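Step A and the context window of Step B1 can be sketched as follows. This is only an illustration under stated assumptions: a regular expression over lowercase English letters stands in for a real word segmenter (a Chinese document would need a dedicated one), `STOP_WORDS` is a tiny hypothetical list, and the names `build_word_set` and `context_windows` are not from the source. The sense-class graph and the later sub-steps are not covered, because the source text is truncated.

```python
import re

# Hypothetical stop-word list; a real system would load a full list for the target language.
STOP_WORDS = {"the", "a", "of", "is", "and", "to", "in"}


def build_word_set(text):
    """Step A: segment the text and drop stop words, keeping document order."""
    # Simple letter-run matching stands in for a real word segmenter.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def context_windows(word_set, size=2):
    """Step B1: pair each word M with up to `size` neighbours on each side."""
    for i, m in enumerate(word_set):
        left = word_set[max(0, i - size):i]
        right = word_set[i + 1:i + 1 + size]
        yield m, left + right


word_set = build_word_set("The bank of the river is muddy and wide.")
windows = list(context_windows(word_set))
```

Here `word_set` plays the role of WordSet, and each entry of `windows` pairs a word M with its context M1, M2, M3, M4. Words near the start or end of the document simply receive a shorter context.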

Specific Embodiment 3

[0043] Specific Embodiment 3: This embodiment further describes Step 2 of the lexical-chain-based key phrase extraction method of Specific Embodiment 1. The dictionary "HowNet" used in Step 2 is a word database stored on the computer's hard drive.


Abstract

The invention discloses a method for extracting key phrases based on lexical chains. It aims to solve two problems: traditional keyword extraction methods cannot accurately reflect the subject information described in an article, and traditional key phrase extraction methods based on lexical chains have low extraction accuracy and low coverage of a document's subject information. The method comprises the following steps: first, acquiring word meanings; second, disambiguating words using the dictionary "HowNet"; third, constructing lexical chains from the disambiguated words and obtaining a plurality of strong chains; fourth, selecting a core word from each strong chain to form a core word set; and fifth, calculating the co-occurrence rate among different core words in the core word set and selecting, as key phrases, the core words whose co-occurrence rate is greater than the extraction threshold set by a user. With the invention, the subject information of documents can be effectively reflected, the accuracy of key phrase extraction can be improved, and the subject information of documents can be effectively covered by fewer key phrases. The invention is applied to the field of keyword extraction.

Description

Technical field

[0001] The invention relates to a key phrase extraction method.

Background

[0002] With the popularity of the Internet, people are exposed to more and more information every day, so quickly and accurately grasping the content of large amounts of information has become increasingly important in daily life. Keyword tagging technology is a good solution to this problem. Good keywords enable readers to quickly grasp the main content of an article and also deepen their understanding of it.

[0003] Keyword extraction has long been a major research problem in the field of text mining. The technology can also be applied to other fields. For example, a large number of library systems and information retrieval systems use keyword extraction technology to construct document indexes; the sentence is used as an abstract sentence; many clustering and classification algorithms al...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventor: 刘铭, 刘远超, 王晓龙, 刘秉权, 林磊, 单丽莉, 孙承杰
Owner HARBIN INST OF TECH