Method for ordering significance of keywords in text

A keyword importance ranking method for text, applied in special data processing applications, instruments, and electrical digital data processing. It addresses the incomplete basis used to judge keyword importance and the resulting poor reliability of the judgment, thereby improving the accuracy and credibility of the ranking.

Status: Inactive
Publication Date: 2014-04-23
SHANGHAI UNIV
Cites: 5; Cited by: 17

AI Technical Summary

Problems solved by technology

However, the methods mentioned above use only one kind of information, either the word frequency of the keyword or the co-occurrence relationship between terms, and judge the importance of the k...




Embodiment Construction

[0024] An embodiment of the present invention is described in detail below with reference to the accompanying drawings. The method for sorting the importance of keywords in a text according to the present invention, as shown in Figure 1, comprises the following steps:

[0025] (1) Segment the text into words, remove the stop words, and retain the punctuation marks that serve to segment sentences. Collect the keywords of the text into a keyword set, denoted A; for example, A = {data mining, classification, algorithm, decision tree}.

[0026] (2) Count the word frequency of each keyword in keyword set A and assemble the counts into the keyword word frequency vector, denoted B; for example, B = [9, 6, 11, 11].

[0027] (3) Following the order of the terms in the keyword word frequency vector B, the co-oc...
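Steps (1) and (2) above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: it assumes the text has already been segmented into tokens, and the names `STOP_WORDS`, `SENTENCE_PUNCT`, `preprocess`, `keyword_set`, and `frequency_vector` are hypothetical. The source does not enumerate the stop words or the punctuation marks it retains, so both sets here are placeholders.

```python
from collections import Counter

# Hypothetical placeholders: the source lists neither the stop words
# nor the sentence-segmenting punctuation marks it keeps.
STOP_WORDS = {"the", "a", "of", "is", "and", "for", "to", "in"}
SENTENCE_PUNCT = {".", ",", ";", "!", "?"}


def preprocess(tokens):
    """Step (1): drop stop words, keep sentence-segmenting punctuation."""
    return [t for t in tokens
            if t in SENTENCE_PUNCT or t.lower() not in STOP_WORDS]


def keyword_set(filtered):
    """Step (1): keyword set A, listed in order of first appearance."""
    keywords = []
    for t in filtered:
        if t not in SENTENCE_PUNCT and t not in keywords:
            keywords.append(t)
    return keywords


def frequency_vector(filtered, keywords):
    """Step (2): word frequency vector B, aligned with the order of A."""
    counts = Counter(t for t in filtered if t not in SENTENCE_PUNCT)
    return [counts[k] for k in keywords]


# Toy, pre-segmented input (an assumption; real input would come from a
# word segmenter).
tokens = ["data mining", "is", "the", "classification", "algorithm", ".",
          "decision tree", "algorithm", ",", "classification", "."]

filtered = preprocess(tokens)
A = keyword_set(filtered)
B = frequency_vector(filtered, A)
print(A)  # ['data mining', 'classification', 'algorithm', 'decision tree']
print(B)  # [1, 2, 2, 1]
```

Keeping A as an ordered list rather than an unordered set makes each position of B refer to a fixed keyword, which is the alignment the word frequency vector relies on.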



Abstract

The invention discloses a method for ordering the significance of keywords in a text. The text is first segmented into words and the stop words are removed to obtain the keyword set of the text. The word frequencies of the keywords are then counted to obtain the corresponding word frequency vector. Punctuation marks that mark a pause are taken as the boundary endpoints of a co-occurrence window, and the co-occurrence information between terms is counted to obtain a co-occurrence array, from which a vector describing the distribution of each keyword's co-occurrences is derived. The co-occurrence array is also processed to obtain a keyword significance vector judged from the keyword co-occurrence relationship. This significance vector is then integrated with the co-occurrence distribution vector and the word frequency vector to obtain the comprehensive significance of each keyword in the text. Finally, the keywords are ordered by the computed significance. Because the method judges keyword significance from several types of information, it can improve the accuracy and reliability of that judgment.
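The punctuation-bounded co-occurrence window described in the abstract can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the function names, the `SENTENCE_PUNCT` set, and the counting convention (a pair of keywords is counted once for each window in which both appear) are assumptions, since this excerpt does not give the exact counting rule. The later steps that derive the distribution vector and the significance vector from the array are not specified here and are deliberately left out.

```python
# Hypothetical set of pause-marking punctuation; the source does not list it.
SENTENCE_PUNCT = {".", ",", ";", "!", "?"}


def split_windows(filtered):
    """Cut the token stream into windows bounded by punctuation marks."""
    windows, current = [], []
    for t in filtered:
        if t in SENTENCE_PUNCT:
            if current:
                windows.append(current)
            current = []
        else:
            current.append(t)
    if current:
        windows.append(current)
    return windows


def cooccurrence_matrix(filtered, keywords):
    """Symmetric matrix: entry [i][j] counts the windows that contain both
    keyword i and keyword j (assumed convention; diagonal stays zero)."""
    index = {k: i for i, k in enumerate(keywords)}
    n = len(keywords)
    matrix = [[0] * n for _ in range(n)]
    for window in split_windows(filtered):
        present = sorted({index[t] for t in window if t in index})
        for a in present:
            for b in present:
                if a != b:
                    matrix[a][b] += 1
    return matrix


# Toy token stream with stop words already removed (assumed input).
filtered = ["data mining", "classification", "algorithm", ".",
            "decision tree", "algorithm", ",", "classification", "."]
keywords = ["data mining", "classification", "algorithm", "decision tree"]

matrix = cooccurrence_matrix(filtered, keywords)
for row in matrix:
    print(row)
# [0, 1, 1, 0]
# [1, 0, 1, 0]
# [1, 1, 0, 1]
# [0, 0, 1, 0]
```

Using punctuation as the window boundary, rather than a fixed number of tokens, means two keywords co-occur only when they share a clause or sentence.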

Description

Technical field

[0001] The invention relates to a method for sorting the importance of keywords in a text. The method comprehensively uses the word frequency of keywords, the co-occurrence relationship between terms, and the distribution of term co-occurrence to calculate the importance of each keyword in the text, and thereby orders the keywords by importance.

Background art

[0002] In text processing, important keywords are extracted from a text and used to represent it in subsequent tasks. Extracting the important keywords essentially requires sorting the keywords by importance. Without introducing external knowledge, there are two ways to judge keyword importance from keyword word frequency and term co-occurrence: one judges the importance of keywords using only their word frequency information...


Application Information

IPC(8): G06F17/30
CPC: G06F16/3334
Inventor: 陈雪, 汤文清
Owner: SHANGHAI UNIV