Improved TextRank keyword extraction method and device

An improved TextRank keyword extraction method and device, applied in the field of keyword extraction, that addresses the problems of low keyword extraction accuracy and of ignoring the importance of the candidate keywords themselves.

Active Publication Date: 2021-06-11
YUNNAN UNIV
Cites: 9; Cited by: 1

AI Technical Summary

Problems solved by technology

[0005] This application discloses an improved TextRank keyword extraction method and device. It addresses a technical problem in the prior art: when extracting keywords, the traditional TextRank algorithm uses the number of word co-occurrences as the edge weight but ignores the importance of the candidate keywords themselves, which results in low keyword extraction accuracy.



Examples


Embodiment Construction

[0058] In the prior art, the traditional TextRank algorithm uses the number of word co-occurrences as the edge weight when extracting keywords but ignores the importance of the candidate keywords themselves, which makes the accuracy of keyword extraction low. To solve this technical problem, this application discloses an improved TextRank keyword extraction method and device through the following two embodiments.

[0059] The first embodiment of the present application discloses an improved TextRank keyword extraction method. Referring to the workflow diagram shown in Figure 1, the improved TextRank keyword extraction method comprises:

[0060] In step S101, the initial text is obtained and preprocessed to determine a total set of candidate keywords. The preprocessing consists of dividing the initial text into multiple sentences and performing word segmentation, part-of-speech tagging, part-of-speech filtering and stop-word removal on any senten...
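The preprocessing in step S101 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the regular-expression tokenizer, the small `STOP_WORDS` set and the function names are assumptions made for illustration, and part-of-speech tagging and filtering are omitted because they require an external tagger.

```python
import re

# Hypothetical stop-word list for illustration only; a real system would use a fuller list.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "for", "on"}


def split_sentences(text):
    """Split text into sentences after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def candidate_keywords(text):
    """Return (token list per sentence, total set of candidate keywords).

    Assumption: tokens are lowercase alphabetic runs. Part-of-speech
    filtering is NOT performed here.
    """
    sentences = []
    for sentence in split_sentences(text):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        kept = [t for t in tokens if t not in STOP_WORDS and len(t) > 1]
        if kept:
            sentences.append(kept)
    total = {t for tokens in sentences for t in tokens}
    return sentences, total
```

Keeping the per-sentence token lists alongside the total candidate set preserves word order and sentence position, which later steps (co-occurrence windows, position coefficient) depend on.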



Abstract

The invention discloses an improved TextRank keyword extraction method and device. The method comprises the following steps: constructing a word co-occurrence network of a text; based on the word co-occurrence network, introducing two complex-network statistical characteristics of a node, degree centrality and clustering coefficient, to obtain an initial weight of the node; allocating the initial weight to the connecting edge between two nodes according to the importance of the adjacent node to the node, thereby determining the weight of the connecting edge, so that the connecting edges are weighted and the importance score of each node is determined; introducing a position coefficient to adjust the importance score of the node and determining the final weight of each node; and finally, ranking the nodes by final weight and determining the keywords of the text. The method and the device use the degree centrality and the clustering coefficient of a node to weight the connecting edges and combine them with the position feature of the node to extract the keywords of the text, so that keyword extraction accuracy can be effectively improved.
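The pipeline summarized in the abstract can be sketched in Python as follows. The abstract does not give the formulas, so every formula here is an assumption chosen for illustration: the initial weight is a linear mix of degree centrality and clustering coefficient controlled by a hypothetical `alpha`, the edge weight from u to v is v's share of the initial weights of u's neighbors, and the position coefficient is a hypothetical boost for words in the first sentence.

```python
def build_adjacency(sentences, window=2):
    """Undirected word co-occurrence network as {word: set(neighbors)}."""
    adj = {}
    for tokens in sentences:
        for i, u in enumerate(tokens):
            adj.setdefault(u, set())
            for v in tokens[i + 1 : i + window]:
                if u != v:
                    adj.setdefault(v, set())
                    adj[u].add(v)
                    adj[v].add(u)
    return adj


def degree_centrality(adj, v):
    """Degree of v divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    return len(adj[v]) / (n - 1) if n > 1 else 0.0


def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in adj[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))


def improved_textrank(sentences, window=2, alpha=0.5, d=0.85,
                      iterations=50, first_sentence_boost=1.5):
    """Return [(word, final_weight)] sorted by descending final weight."""
    adj = build_adjacency(sentences, window)
    # Assumed initial weight: linear mix of the two network statistics.
    init = {
        v: alpha * degree_centrality(adj, v)
        + (1 - alpha) * clustering_coefficient(adj, v)
        for v in adj
    }
    # Assumed edge weight u -> v: v's share of u's neighbors' initial weights.
    edge = {}
    for u in adj:
        total = sum(init[k] for k in adj[u])
        for v in adj[u]:
            edge[(u, v)] = init[v] / total if total > 0 else 1.0 / len(adj[u])
    # TextRank-style iteration over the weighted edges.
    scores = {v: 1.0 for v in adj}
    for _ in range(iterations):
        scores = {
            v: (1 - d) + d * sum(edge[(u, v)] * scores[u] for u in adj[v])
            for v in adj
        }
    # Assumed position coefficient: boost words from the first sentence.
    first = set(sentences[0]) if sentences else set()
    final = {
        v: s * (first_sentence_boost if v in first else 1.0)
        for v, s in scores.items()
    }
    return sorted(final.items(), key=lambda item: (-item[1], item[0]))
```

For example, `improved_textrank([["text", "rank", "graph"], ["graph", "rank", "keyword"], ["graph", "text"]])` returns the four words ordered by final weight. Because each node's outgoing edge weights sum to 1, the iteration has the same form as weighted TextRank; only the source of the edge weights differs.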

Description

Technical field

[0001] The present application relates to the technical field of natural language processing, in particular to an improved TextRank keyword extraction method and device.

Background

[0002] Text keywords are important words that accurately summarize the text content and reflect the author's writing intention. Text keywords not only summarize the theme of the text but also reflect its main content and emotional tendency. Therefore, accurate and efficient text keyword extraction is very important for text clustering, text summarization and information retrieval.

[0003] The traditional TextRank algorithm is a keyword extraction algorithm based on graph ranking. It uses the co-occurrence relationship of candidate keywords within a window to establish links between associated candidate keywords and thereby build a word co-occurrence network. A formula is then used to iteratively calculate the weight of each node in the word co-occurrence networ...
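For reference, the traditional approach described in the background can be sketched in Python. This is an illustrative sketch rather than text from the patent: it uses the commonly published weighted TextRank update with co-occurrence counts as edge weights, and the function names, the `window` convention (a window of 2 links adjacent words) and the damping factor `d=0.85` are assumptions.

```python
from collections import defaultdict


def cooccurrence_graph(sentences, window=2):
    """Undirected graph where weight[u][v] counts co-occurrences within the window."""
    weight = defaultdict(lambda: defaultdict(float))
    for tokens in sentences:
        for i, u in enumerate(tokens):
            for v in tokens[i + 1 : i + window]:
                if u != v:
                    weight[u][v] += 1.0
                    weight[v][u] += 1.0
    # Convert to plain dicts so later lookups cannot create new keys.
    return {u: dict(nbrs) for u, nbrs in weight.items()}


def textrank(weight, d=0.85, iterations=50):
    """Iterate S(v) = (1 - d) + d * sum_u [w(u, v) / sum_k w(u, k)] * S(u)."""
    scores = {node: 1.0 for node in weight}
    for _ in range(iterations):
        new_scores = {}
        for v in weight:
            total = 0.0
            for u, w_uv in weight[v].items():
                out_sum = sum(weight[u].values())
                total += w_uv / out_sum * scores[u]
            new_scores[v] = (1 - d) + d * total
        scores = new_scores
    return scores
```

Here the edge weight depends only on how often two words co-occur, so two neighbors with equal co-occurrence counts contribute equally regardless of their own importance in the network.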


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/31, G06F16/35, G06F40/284
CPC: G06F16/313, G06F16/35, G06F40/284, Y02D10/00
Inventor: 赵娜杨燕王莹港郁湧王剑康雁王鑫锴张强荐胡盛柴焰明龙镇文俊杰马伟云
Owner: YUNNAN UNIV