Multi-factor fused TextRank keyword extraction algorithm

A multi-factor keyword extraction technique based on the TextRank algorithm. It addresses the trade-off in supervised methods, which give good results but carry a high manual labeling cost, and it reduces the influence of local features on the extraction result.

Pending Publication Date: 2020-01-24
YANAN UNIV

AI Technical Summary

Problems solved by technology

Supervised learning requires high-quality training data to be labeled in advance. It gives better results, but the cost of manual preprocessing is high.



Examples


Embodiment

[0085] Example: To calculate the PageRank value (PR value) of webpage A, one must first know which webpages link to webpage A, that is, obtain the inbound links of webpage A. Each inbound link acts as a vote for webpage A, and the PR value of webpage A is calculated from these votes.

[0086] The PR value is calculated iteratively using formula (1).

[0087]
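Formula (1) is not reproduced in this text. Assuming it is the standard PageRank recurrence, which matches the symbols S(v_i), In(v_i), Out(v_j) and d described in paragraph [0088], it would read as follows. The constant term (1 - d) is part of that assumption; some formulations divide it by the number of pages.

```latex
S(v_i) = (1 - d) + d \sum_{v_j \in In(v_i)} \frac{1}{|Out(v_j)|} \, S(v_j) \qquad (1)
```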

[0088] In formula (1), S(v_i) is the PR value (importance score) of webpage i, In(v_i) is the set of webpages that link to webpage i, and Out(v_j) is the set of outgoing links of webpage j. Formula (1) shows that the score of webpage i is the sum of the votes of all its inbound webpages. The vote that webpage i receives from a linking webpage j is the score of webpage j divided evenly among all outgoing links of webpage j. This even split does not take the importance of webpage i into account. The d in formula (1) is the damping coefficient, which is used to ensure ...
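The iteration described above can be sketched in Python. This is a minimal illustration, not the patent's implementation: it assumes the standard PageRank update S(v_i) = (1 - d) + d * sum of S(v_j) / |Out(v_j)| over inbound pages, a damping coefficient d = 0.85, and a convergence tolerance, none of which are stated in this text. The function name `pagerank` and the input format (a dict of outgoing links) are hypothetical.

```python
def pagerank(out_links, d=0.85, tol=1e-6, max_iter=100):
    """Iterate S(v_i) = (1 - d) + d * sum(S(v_j) / |Out(v_j)|) over inbound pages.

    out_links maps each page to the pages it links to.
    d, tol and max_iter are assumed values, not taken from the source.
    """
    # Collect every page that appears as a source or as a target.
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    if not nodes:
        return {}

    # Build In(v_i): the pages that link to each page.
    in_links = {n: [] for n in nodes}
    for src, targets in out_links.items():
        for dst in set(targets):
            in_links[dst].append(src)

    # |Out(v_j)|: number of distinct outgoing links per page.
    out_degree = {n: len(set(out_links.get(n, ()))) for n in nodes}

    scores = {n: 1.0 for n in nodes}
    for _ in range(max_iter):
        # Each inbound page j votes with an equal share of its own score.
        new_scores = {
            n: (1 - d) + d * sum(scores[j] / out_degree[j] for j in in_links[n])
            for n in nodes
        }
        delta = max(abs(new_scores[n] - scores[n]) for n in nodes)
        scores = new_scores
        if delta < tol:
            break
    return scores


# Usage: A is linked from B and C, B is linked from A, C has no inbound links.
links = {"A": ["B"], "B": ["A"], "C": ["A"]}
scores = pagerank(links)
```

In this small graph, page A collects votes from both B and C and ends up with the highest score, while C, which has no inbound links, keeps only the constant term (1 - d).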


Abstract

The invention relates to the technical field of natural language processing, and in particular to a multi-factor fused TextRank keyword extraction algorithm. Five factors that influence the TextRank keyword extraction algorithm are considered: word coverage, word position, word frequency, word length and word span. The findings are: (1) global factors have more influence than local factors in the keyword extraction process; (2) the influence weights increase in the order word coverage, word length, word frequency, word span, word position; (3) the influence weights of word coverage and word length are roughly equal, as are those of word span and word frequency. When keywords are extracted from a text with the TextRank algorithm, it is therefore sufficient to consider only two factors, word position and word span, in a ratio of 7:3. Because calculating the word span requires traversing the text again after the word graph has been built, it consumes additional running time. If the running speed of the algorithm is critical, word span can be replaced by word frequency; the extraction effect is slightly reduced but remains good.
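The 7:3 combination of word position and word span, and the fallback to word frequency, can be sketched as follows. Only the two factor names, the 7:3 ratio and the frequency substitution come from the abstract; everything else is an assumption made for illustration. In particular, the span definition (distance between first and last occurrence, normalised by text length), the frequency normalisation, the linear combination, and the function names are hypothetical, and the position score is taken as a given input because its definition is not stated in this text. How the combined weight enters the TextRank iteration is likewise not specified here.

```python
from collections import Counter


def word_span(tokens):
    """Assumed definition: (last index - first index + 1) / text length."""
    first, last = {}, {}
    for idx, tok in enumerate(tokens):
        first.setdefault(tok, idx)  # remember the first occurrence only
        last[tok] = idx             # overwrite until the last occurrence
    n = len(tokens)
    return {w: (last[w] - first[w] + 1) / n for w in first}


def word_frequency(tokens):
    """Assumed normalisation: count divided by the highest count."""
    counts = Counter(tokens)
    if not counts:
        return {}
    top = max(counts.values())
    return {w: c / top for w, c in counts.items()}


def combined_weight(position, second, w_position=0.7, w_second=0.3):
    """Linear 7:3 mix of a position score and a second factor (span or frequency).

    The linear form is an assumption; the source only gives the ratio.
    Words without a position score are treated as having position 0.
    """
    return {
        w: w_position * position.get(w, 0.0) + w_second * second[w]
        for w in second
    }


tokens = ["graph", "rank", "graph", "word"]
position = {"graph": 1.0, "rank": 0.5}  # hypothetical position scores

# Default: position and span at 7:3.
by_span = combined_weight(position, word_span(tokens))
# Faster variant: replace span with frequency, which needs no second pass
# over the text once the counts are known.
by_freq = combined_weight(position, word_frequency(tokens))
```

Swapping `word_span` for `word_frequency` leaves the rest of the pipeline unchanged, which is why the substitution is cheap to try when running time matters.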

Description

Technical field

[0001] The invention relates to the technical field of natural language processing, in particular to a multi-factor TextRank keyword extraction algorithm.

Background

[0002] Keyword extraction is a foundational and core technology in natural language processing. It is widely used in information retrieval, text classification, text clustering, text similarity, automatic summarization, man-machine dialogue, string similarity measurement and other fields.

[0003] Automatic keyword extraction methods fall into two categories, supervised and unsupervised, according to whether supervised learning is performed. A typical supervised method builds a learning model from training data and judges whether each word belongs to the keyword category. Supervised learning requires high-quality training data to be labeled in advance; the cost of manual preprocessing is high, but the effect is better. Unsupervised learning is widely used because it does not ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/216, G06F40/289
Inventor: 牛永洁
Owner: YANAN UNIV