A text keyword weight calculation method integrating a word position factor and a word frequency factor

A keyword weight calculation technology, applied in the field of text keyword weight calculation. It addresses three problems: the loss of information carried by text structure, the failure to consider the influence of word position on weight, and occurrence counts that alone are insufficient to express a word's actual weight. The method is simple to operate.

Pending Publication Date: 2019-05-17
SHANGHAI UNIV

AI Technical Summary

Problems solved by technology

[0003] (1) The vector space model treats a text as a collection of words and treats the words as mutually independent, thereby losing the information carried by the text's paragraph structure.
[0004] (2) When calculating word frequency, the TFIDF method does not consider the influence of a word's position on its weight; the number of occurrences or co-occurrences alone is not enough to express a word's actual weight.
[0005] (3) When the TFIDF method calculates the inverse document frequency of a word, it relies on a domain text collection and cannot be applied to a single text; the quality and scale of the domain collection strongly affect word weight calculation and keyword extraction.
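
To make problem (3) concrete, the following is a minimal sketch of a conventional TF-IDF weight, not code from the patent. The function name `tf_idf`, the argument names, and the smoothed form of the inverse document frequency are assumptions chosen for illustration; other TF-IDF variants exist.

```python
import math
from collections import Counter


def tf_idf(doc_tokens, corpus):
    """Return {word: tf * idf} for one tokenized document.

    `corpus` is a list of tokenized documents (the domain collection).
    The smoothed idf used here is one common variant, picked for this
    sketch; it is an assumption, not the formula from the patent.
    """
    counts = Counter(doc_tokens)
    total = len(doc_tokens)
    n_docs = len(corpus)
    weights = {}
    for word, count in counts.items():
        # Document frequency: how many documents contain the word.
        df = sum(1 for doc in corpus if word in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        weights[word] = (count / total) * idf
    return weights
```

When the collection holds only the document itself, every word has the same idf, so the weight collapses to plain term frequency. This is the dependence on a domain collection that the method aims to avoid.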


Examples


Detailed Description of Embodiments

[0021] Embodiments of the present invention will be further described below in conjunction with the accompanying drawings.

[0022] This embodiment takes the article "HRing: A Structured P2P Overlay Based on Harmonic Series", published in IEEE Transactions on Parallel and Distributed Systems, as an example. As shown in Figure 1, the steps of the text representation model are as follows:

[0023] S1. Open a single text and recombine its paragraphs to form a new text. The title of the original text becomes the first paragraph of the new text; the first and last paragraphs of the original text become the second and third paragraphs of the new text, respectively; the rest of the original text is merged into one paragraph in the original order. The new text therefore has four paragraphs.
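
Step S1 can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: the function name `recombine` is hypothetical, "first and last" is read as the opening paragraph and the closing paragraph of the body, and the input is assumed to be already split into a title and a list of body paragraphs.

```python
def recombine(title, paragraphs):
    """Build the four-paragraph text described in S1.

    Returns [title, first paragraph, last paragraph, merged remainder].
    Assumes at least three body paragraphs so that the remainder is
    non-empty; shorter texts are outside the scope of this sketch.
    """
    if len(paragraphs) < 3:
        raise ValueError("this sketch assumes at least three body paragraphs")
    first, last = paragraphs[0], paragraphs[-1]
    # Merge everything between the first and last paragraph, in order.
    middle = " ".join(paragraphs[1:-1])
    return [title, first, last, middle]
```

For example, `recombine("T", ["a", "b", "c", "d"])` returns `["T", "a", "d", "b c"]`.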

[0024] S2. Preprocess the new text, including word segmentation and stop-word removal. The remaining words are used as candidate keywords, and a candidate keyword matrix A[i][...
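
The preprocessing in S2 might look like the sketch below for English text. Everything named here is an assumption for illustration: the regular-expression tokenizer, the tiny `STOP_WORDS` set, and the functions `candidates` and `preprocess`. Because the definition of the matrix A is cut off above, the sketch returns one list of candidate words per paragraph and does not attempt to reproduce the matrix layout.

```python
import re

# Illustrative stop-word list only; a real list would be much larger.
STOP_WORDS = {"a", "an", "the", "of", "on", "in", "and", "is", "to", "for"}


def candidates(paragraph):
    """Lowercase, split into alphanumeric tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", paragraph.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def preprocess(new_text):
    """Map each paragraph of the recombined text to its candidate keywords."""
    return [candidates(p) for p in new_text]
```

Applied to the example title, `preprocess(["HRing: A Structured P2P Overlay Based on Harmonic Series"])` yields `[["hring", "structured", "p2p", "overlay", "based", "harmonic", "series"]]`.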


Abstract

The invention relates to a text keyword weight calculation method integrating a word position factor and a word frequency factor. The method comprises the following steps: (1) opening a single text and recombining its paragraphs to form a new text; (2) preprocessing the new text, including word segmentation and stop-word removal, and constructing a candidate keyword matrix from the remaining words, which serve as candidate keywords; (3) calculating the weight of each candidate keyword by using a harmonic series to combine the word position factors and the word frequency factors; and (4) outputting the weight corresponding to each candidate keyword. The method makes full use of text structure information, that is, it fuses the word position factors and word frequency factors in the text, and it can calculate keyword weights for a single text without depending on a domain text collection. Compared with TFIDF and TEXTRANK, the method is simple to operate and performs well, and it can fulfil the functions of both TFIDF and TEXTRANK.
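
The exact weighting formula is not given in this excerpt, so the following is only one possible reading of step (3), offered as a hedged sketch. It assumes that the k-th occurrence of a word within a paragraph contributes 1/k of that paragraph's position factor, so repeated occurrences add progressively less. The values in `POSITION_FACTORS` and the names `harmonic` and `keyword_weights` are hypothetical.

```python
from collections import Counter

# Hypothetical position factors for the four paragraphs of the new text:
# title, first paragraph, last paragraph, merged remainder.
# These numbers are invented for illustration, not taken from the patent.
POSITION_FACTORS = [4.0, 2.0, 2.0, 1.0]


def harmonic(n):
    """Partial sum of the harmonic series: 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))


def keyword_weights(paragraph_tokens, position_factors=POSITION_FACTORS):
    """Sum, over paragraphs, position factor times harmonic(count).

    `paragraph_tokens` is a list of token lists, one per paragraph.
    This is an assumed combination rule, not the patent's formula.
    """
    weights = {}
    for factor, tokens in zip(position_factors, paragraph_tokens):
        for word, count in Counter(tokens).items():
            weights[word] = weights.get(word, 0.0) + factor * harmonic(count)
    return weights
```

Under these assumptions, a word that appears once in the title outweighs a word that appears three times in the merged remainder, since 4.0 exceeds 1 + 1/2 + 1/3.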

Description

Technical field

[0001] The present invention relates to a text keyword weight calculation method integrating a word position factor and a word frequency factor. Specifically, it uses a harmonic series to combine the word position factor and the word frequency factor when calculating word weights, raising the weight of words in the title and in the first and last paragraphs, and making the weight contributed by each position where a word appears decrease as the word's frequency increases.

Background

[0002] The most widely used keyword extraction algorithm is the vector space model. The vector space model represents a text as a weight vector in which each component corresponds to a word, and the weight of each word is determined by the TFIDF method. The TFIDF method uses a word weight formula to calculate the importance of a word to a single text in the corpus. The word weight of the TFIDF method is the product of the term frequency TF (Term Frequ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/33, G06F17/27
Inventor: 骆祥峰, 陈雪, 陈光勇, 王鹏, 张惠然, 王小飞, 魏晓
Owner: SHANGHAI UNIV