Automatic extraction method for text document theme word meaning

An automatic extraction technology for text documents, applied in special data processing applications, instruments, electrical digital data processing, etc., that addresses problems such as the inaccuracy of existing topic meaning extraction algorithms.

Active Publication Date: 2010-11-17
COMTEC SOLAR JIANGSU +1
0 Cites, 14 Cited by

AI Technical Summary

Problems solved by technology

[0009] In order to eliminate the problem that the existing subject meaning extraction algorithm is inaccurate due to the ambiguity of words, the pre

Embodiment Construction

[0063] Given a training text document set $T = \{t_1, \dots, t_{|T|}\}$ and a set of text documents to be extracted (the test text document set) $E = \{e_1, \dots, e_{|E|}\}$, each text document in $T$ and $E$ is processed according to the following steps 1 and 2:

[0064] Step 1: Text document preprocessing. For a text document $t_i$ in $T$ ($i = 1, \dots, |T|$, where $|T|$ is the number of text documents in the set $T$), first use step 1.1 to obtain the candidate topic words of the document, then use step 1.2 to obtain the candidate topic word senses, and finally use step 1.3 to merge the candidate topic word senses, giving the final set of candidate topic word senses of $t_i$.

[0065] Step 1.1: Obtain candidate topic words. First, remove the numbers and punctuation marks from the text document $t_i$ and represent the document as a set of words: $t_i = \{w_{i1}, \dots, w_{ij}, \dots\}$. Then, for each word $w_{ij}$ in the word set, the present inv...
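The first part of step 1.1 (stripping numbers and punctuation, then treating the document as a set of words) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `to_word_set` is hypothetical, and the sketch assumes English text, whitespace tokenization, and lowercasing, none of which the source specifies. The per-word processing that follows in the truncated part of the paragraph is not covered.

```python
import re


def to_word_set(text):
    """Drop digits and punctuation, then return the document's distinct words."""
    # Replace every character that is not an ASCII letter or whitespace with a
    # space. Restricting to ASCII letters is an assumption of this sketch.
    cleaned = re.sub(r"[^A-Za-z\s]", " ", text)
    # Lowercasing is also an assumption; the source does not mention case.
    return set(cleaned.lower().split())


words = to_word_set("In 2010, the bank raised rates; the river bank flooded.")
print(sorted(words))
```

Note that the ambiguous word "bank" appears once in the resulting set even though the sentence uses it in two senses, which is the polysemy problem the later word sense steps are meant to address.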

Abstract

The invention relates to a method for automatically extracting the topic word senses of a text document. The method comprises the following steps: first, text document preprocessing is performed on a training text document set and a test text document set to obtain the set of candidate topic word senses for each text document in both sets; next, the feature attribute values of each candidate topic word sense are calculated; finally, a Bayesian model is used to extract the final topic word senses of each text document in the test text document set. Because the whole extraction process operates on word senses in place of words, it avoids the inaccuracy caused by polysemy and can improve the precision of topic meaning extraction.
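The final step, deciding with a Bayesian model whether a candidate word sense is a topic word sense, might look roughly like the sketch below. The source does not say which Bayesian model or which feature attributes are used, so everything here is an assumption for illustration: a naive Bayes classifier over discrete features with add-one smoothing, the hypothetical function names `train` and `topic_probability`, and two made-up features (a frequency bucket and a position bucket).

```python
from collections import Counter, defaultdict


def train(examples):
    """Count statistics from (feature_tuple, is_topic) pairs with discrete values."""
    class_counts = Counter(label for _, label in examples)
    # value_counts[(label, k)][v]: examples of class `label` whose k-th feature is v
    value_counts = defaultdict(Counter)
    # values_seen[k]: distinct values observed for the k-th feature
    values_seen = defaultdict(set)
    for features, label in examples:
        for k, v in enumerate(features):
            value_counts[(label, k)][v] += 1
            values_seen[k].add(v)
    return class_counts, value_counts, values_seen


def topic_probability(model, features):
    """Posterior P(is_topic | features) under the naive independence assumption."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for label in (True, False):
        # Class prior with add-one smoothing over the two classes.
        score = (class_counts[label] + 1) / (total + 2)
        for k, v in enumerate(features):
            # Smoothed likelihood of the k-th feature value given the class.
            score *= (value_counts[(label, k)][v] + 1) / (
                class_counts[label] + len(values_seen[k])
            )
        scores[label] = score
    return scores[True] / (scores[True] + scores[False])


# Hypothetical training data: (frequency bucket, position bucket) -> is topic sense
training = [
    (("high", "early"), True),
    (("high", "late"), True),
    (("low", "late"), False),
    (("low", "early"), False),
    (("high", "late"), False),
]
model = train(training)
print(topic_probability(model, ("high", "early")))
print(topic_probability(model, ("low", "late")))
```

In this arrangement the statistics would be gathered from the training text document set, and each candidate word sense of a test document would be scored and kept if its posterior is high enough; the threshold or ranking rule is not stated in the source.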

Description

Technical field

[0001] The invention relates to a method for automatically extracting the topic meaning of a text document, and belongs to the fields of computer information processing, natural language processing and the like. It is suitable for fast and accurate extraction of topics from large numbers of text documents.

Background art

[0002] With the development of the Internet, the total amount of information is growing exponentially, and much of it reaches people in the form of electronic text documents. There is an urgent need for automated tools that help people quickly find the information they actually need in this mass of information. The primary task in achieving this goal is to extract the topic meaning of a text document. Topic meaning can also be applied in many other text mining fields, such as text classification, text clustering, and text retrieval. In the most ideal situat...

Application Information

IPC(8): G06F17/27, G06F17/30
Inventor: 方俊, 郭雷, 常威威
Owner: COMTEC SOLAR JIANGSU