Text keyword extraction method based on a topic model

A text keyword extraction technique based on a topic model, applied in fields such as special data processing applications, instruments, and electrical digital data processing. It addresses problems such as poor extraction results and degraded extraction quality.

Inactive Publication Date: 2014-04-23
SHANGHAI UNIV
Cites: 3 · Cited by: 29

AI Technical Summary

Problems solved by technology

When texts differ in length, and especially for short texts in which many terms appear only once, using term frequency as the basis for extracting keywords is unlikely to give good results.
Moreover, the traditional text keyword extraction meth


Examples


Detailed description of the embodiments

[0032] Embodiments of the present invention will be described in further detail below in conjunction with the accompanying drawings.

[0033] As shown in Figure 1, the text keyword extraction method based on a topic model comprises the following steps:

[0034] S1. Apply a topic model to a large text training set to obtain the probability matrix between terms and topics, recorded as the term-topic probability matrix WT of the training text set;

[0035] S2. Segment a text into words and preprocess it by removing stop words to obtain the corresponding candidate keyword set A. Then, for each keyword in A, take the row corresponding to that keyword from the term-topic probability matrix WT of the training text set, generating the term-topic probability matrix B of the candidate keyword set for the corresponding term-to-topi...
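Steps S1 and S2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patent's implementation: the toy matrix `WT`, the stop-word list, whitespace segmentation, the helper names, and the choice to drop terms missing from `WT` are all hypothetical. The word frequency weight vector D named in the abstract is assumed here to be the normalized term counts.

```python
from collections import Counter

# Hypothetical stand-in for the trained matrix WT from S1:
# each term maps to its probabilities under two topics.
WT = {
    "topic": [0.30, 0.05],
    "model": [0.25, 0.05],
    "keyword": [0.20, 0.10],
    "page": [0.02, 0.03],
}

# Hypothetical stop-word list; a real system would use a proper one.
STOP_WORDS = {"the", "a", "of", "from", "is"}


def candidate_tokens(text):
    """Whitespace segmentation plus stop-word removal (assumed preprocessing).

    Terms that do not occur in WT are dropped, which is an assumption
    of this sketch rather than something stated in the source.
    """
    tokens = [t.lower() for t in text.split()]
    return [t for t in tokens if t not in STOP_WORDS and t in WT]


def build_candidates(text):
    """Return the candidate set A, its matrix B, and frequency weights D."""
    counts = Counter(candidate_tokens(text))
    A = sorted(counts)                  # candidate keyword set A
    B = [WT[t] for t in A]              # rows of WT for the terms in A
    total = sum(counts.values())
    D = [counts[t] / total for t in A]  # normalized word frequencies (assumed)
    return A, B, D


A, B, D = build_candidates("the topic model extracts a keyword from the topic")
print(A, B, D)
```

In practice `WT` would come from a trained topic model rather than a hand-written dictionary, and the segmenter would be language-specific.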


Abstract

The invention discloses a text keyword extraction method based on a topic model. First, a topic model is trained on a large text training set to obtain the term-topic probability matrix WT of the training text set. Next, the term-topic probability vectors of the terms in a candidate keyword set A are collected into the candidate-keyword term-topic probability matrix B, and the word frequency weight vector D corresponding to the candidate keyword set is obtained. Using B, the term weight vector of the candidate keywords and the topic vector of the text are then computed iteratively until the final corrected text topic vector and term weight vector are obtained, from which the keywords of the text are extracted. The method reduces keyword extraction errors caused by differences in text length and extracts keywords that better represent the content of the text.
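The abstract does not give the update rule for the iterative computation, so the following Python sketch shows only one plausible reading. The alternating update (topic vector from weighted rows of B, then term weights from D times each term's affinity to the topic vector), the fixed iteration count, and every name and number in the example are assumptions made for illustration.

```python
def normalize(v):
    """Scale a non-negative vector so that it sums to 1."""
    s = sum(v)
    return [x / s for x in v] if s > 0 else list(v)


def refine(B, D, iterations=20):
    """Alternately update the text topic vector and the term weights.

    Assumed update rule (not taken from the patent):
      topic[k] is proportional to sum_i w[i] * B[i][k]
      w[i]     is proportional to D[i] * sum_k B[i][k] * topic[k]
    """
    n_terms, n_topics = len(B), len(B[0])
    w = normalize(D)
    topic = [1.0 / n_topics] * n_topics
    for _ in range(iterations):
        topic = normalize(
            [sum(w[i] * B[i][k] for i in range(n_terms)) for k in range(n_topics)]
        )
        w = normalize(
            [D[i] * sum(B[i][k] * topic[k] for k in range(n_topics))
             for i in range(n_terms)]
        )
    return topic, w


def top_keywords(A, w, n=2):
    """Return the n candidate terms with the largest final weight."""
    ranked = sorted(zip(A, w), key=lambda pair: pair[1], reverse=True)
    return [term for term, _ in ranked[:n]]


# Hypothetical short text: every candidate term occurs once, so D is uniform
# and word frequency alone cannot rank the terms.
A = ["keyword", "model", "page", "topic"]
B = [[0.20, 0.10], [0.25, 0.05], [0.02, 0.03], [0.30, 0.05]]
D = [0.25, 0.25, 0.25, 0.25]

topic_vec, w = refine(B, D)
print(top_keywords(A, w))
```

With uniform frequencies the ranking is driven entirely by how strongly each term is tied to the dominant topic of the text, which matches the abstract's stated aim of reducing errors caused by text length.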

Description

Technical field

[0001] The present invention relates to a method for extracting keywords from text; more specifically, it relates to a method that obtains a probability matrix between terms and topics from a topic model and then uses that matrix to extract keywords that better express the topical content of the text.

Background

[0002] Before a computer can process a text, the text must be given a formal representation. Traditional methods usually extract keywords from the text to represent its content, and the word frequency of the keywords is a very important basis for that extraction. However, because different types of text differ in length, keyword frequencies are subject to large errors. This is especially true of short texts, in which many terms appear only once. Under the above circumstances, if the word frequency of the term is used as the basis for extrac...


Application Information

IPC(8): G06F17/27
Inventors: 陈雪, 汤文清
Owner SHANGHAI UNIV