A text keyword weight calculation method integrating a word position factor and a word frequency factor

A keyword weight calculation technology, applied in the field of text keyword weight calculation. It addresses problems of existing methods, such as loss of information, failure to consider factors that influence weight, and weights that do not adequately express a word's actual importance, and it achieves simple operation with good effect.
CN109766408A · Pending · Publication Date: 2019-05-17 · SHANGHAI UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications (China)
Current Assignee / Owner
SHANGHAI UNIV
Publication Date
2019-05-17

Abstract

The invention relates to a text keyword weight calculation method integrating a word position factor and a word frequency factor. The method comprises the following steps: (1) opening a single text and recombining its paragraphs to form a new text; (2) preprocessing the new text, including word segmentation and stop word removal, and constructing a candidate keyword matrix by taking the remaining words as candidate keywords; (3) calculating the weight of each candidate keyword by using a harmonic series to combine the position factor and the word frequency factor of each word; and (4) outputting the weight corresponding to each candidate keyword. The method makes full use of text structure information by fusing the word position factors and word frequency factors in the text, and it can calculate keyword weights for a single text without depending on a domain text set. Compared with TFIDF and TEXTRANK, the method is simple to operate and gives good results, and it can fulfil the functions of TFIDF and TEXTRANK.
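
Steps (1) to (4) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented formula, which the abstract does not give. The numeric position factors, the tiny stop-word list, the regular-expression tokenizer, and the rule that the k-th occurrence of a word contributes its position factor divided by k are all hypothetical choices made here to show how a harmonic series could combine position and frequency.

```python
import re
from collections import defaultdict

# Hypothetical position factors: the source only states that words in the
# title and in the first and last paragraphs receive a higher weight.
POSITION_FACTOR = {"title": 3.0, "edge": 2.0, "body": 1.0}
STOP_WORDS = {"the", "a", "of", "and", "is", "in", "to"}  # illustrative only


def tokenize(text):
    """Lowercase word segmentation followed by stop-word removal."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def keyword_weights(title, paragraphs):
    """Return {word: weight} for a single text, highest weight first.

    Assumed rule: the k-th occurrence of a word contributes
    position_factor / k, so repeated occurrences add harmonically
    diminishing amounts.
    """
    # Step 1: recombine the text as (zone, text) segments in reading order.
    segments = [("title", title)]
    last = len(paragraphs) - 1
    for i, para in enumerate(paragraphs):
        zone = "edge" if i in (0, last) else "body"
        segments.append((zone, para))

    seen = defaultdict(int)       # occurrences of each word so far
    weights = defaultdict(float)  # accumulated weight per candidate keyword
    for zone, text in segments:
        # Step 2: segmentation and stop-word removal.
        for word in tokenize(text):
            # Step 3: harmonic combination of position and frequency.
            seen[word] += 1
            weights[word] += POSITION_FACTOR[zone] / seen[word]

    # Step 4: output the candidates sorted by descending weight.
    return dict(sorted(weights.items(), key=lambda kv: -kv[1]))
```

Calling `keyword_weights("keyword weight", ["keyword extraction is useful", "the weight of a keyword", "weight matters"])` needs no corpus: every quantity is derived from the single input text, which is the property the abstract emphasises.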

Description

Technical field

[0001] The present invention relates to a text keyword weight calculation method integrating a word position factor and a word frequency factor. Specifically, it uses a harmonic series to combine the word position factor and the word frequency factor when calculating the weight of a word: it raises the weight of words in the title and in the first and last paragraphs, and, as a word's frequency increases, the weight contributed by each position where the word appears decreases.

Background technique

[0002] The most widely used keyword extraction algorithm is the vector space model. The vector space model represents a text as a weight vector in which each item corresponds to a word, and the weight of each word is determined by the TFIDF method. The TFIDF method uses a word weight formula to calculate the importance of a word to a single text in a corpus. The word weight of the TFIDF method is the product of the term frequency TF (Term Frequ...
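
For comparison, the TFIDF weighting mentioned above can be sketched in its common textbook form, where the weight of a term in a document is its relative frequency multiplied by the logarithm of the number of documents divided by the number of documents containing the term. The function name `tfidf` and the unsmoothed logarithm are choices made for this sketch; the truncated source text does not specify which variant it has in mind.

```python
import math
from collections import Counter


def tfidf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    result = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        # Weight = relative term frequency * log(N / document frequency).
        result.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return result
```

In this form a term that appears in every document receives a weight of zero, and no weight can be computed without a set of documents to count over.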
