Topic extraction method based on word co-occurrence frequency

A technique that uses word co-occurrence frequency to extract topics, applied in the field of automatic text topic extraction

CN1560762A | Inactive | Publication Date: 2005-01-05 | SHANGHAI JIAO TONG UNIV

Embodiment Construction

[0017] Figure 1 is a block diagram of the method of the present invention. An embodiment is given below, following the steps of the method and the accompanying drawing:

[0018] Sample text:

[0019] First battle of Huaihe River pollution control won: discharges along the Huaihe River meet the standard

[0020] Bengbu Newspaper reporters Huang Zhenzhong and Bai Jianfeng reported on January 1: The New Year's bell has just sounded, and good news has arrived from the thousand-mile Huaihe River: discharges from industrial pollution sources along the Huaihe River now meet the standard, and the pollution load has been reduced by more than 40%. The first battle of Huaihe River pollution control has been won.

[0021] Xie Zhenhua, director of the National Environmental Protection Agency, solemnly announced that among the 1,562 polluting enterprises in the Huaihe River Basin, 1,139 have completed t...

Abstract

The invention is a topic extraction method based on word co-occurrence frequency and belongs to the field of information processing. Taking the word as the basic processing unit, the method first counts the occurrences of each word in the input text and deletes words that occur only once as well as common words. It then counts the co-occurrence frequency of words in the text and derives the co-occurrence information between each pair of words, storing the statistics in a matrix. Next, it calculates the information carried by each sentence and paragraph from the co-occurrence relations of the words it contains. Finally, after weight adjustment, it ranks and outputs the topic sentences and paragraphs, thereby extracting the topic sentences or paragraphs.
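The steps in the abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented procedure: the visible text does not give the formula for co-occurrence information, so positive pointwise mutual information over sentences is swapped in; the stop-word list, the regex tokenizer (the original targets Chinese text, which would need a word segmenter), and the `weights` hook standing in for the unspecified weight adjustment are all hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Assumed stop-word list; the patent only speaks of deleting "common words".
STOP_WORDS = {"the", "a", "an", "of", "is", "in", "and", "to", "was", "has"}

def split_sentences(text):
    """Naive sentence splitter (assumption: ., ! and ? end a sentence)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]

def tokenize(sentence):
    """Lower-cased alphabetic tokens; a stand-in for real word segmentation."""
    return re.findall(r"[a-z]+", sentence.lower())

def rank_sentences(text, weights=None):
    """Return (sentence, score) pairs sorted by descending score."""
    sentences = split_sentences(text)
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)

    # Step 1: count every word in the input text.
    counts = Counter(w for sent in tokens for w in sent)

    # Step 2: delete words that occur once and common (stop) words.
    vocab = {w for w, c in counts.items() if c > 1 and w not in STOP_WORDS}
    kept = [sorted(set(sent) & vocab) for sent in tokens]

    # Step 3: count in how many sentences each word and each word pair occur.
    word_freq = Counter(w for sent in kept for w in sent)
    pair_freq = Counter(p for sent in kept for p in combinations(sent, 2))

    # Step 4: co-occurrence information per pair, kept in a sparse "matrix".
    # Assumption: positive PMI, log2(P(a, b) / (P(a) * P(b))), floored at 0.
    info = {
        (a, b): max(0.0, math.log2(c * n / (word_freq[a] * word_freq[b])))
        for (a, b), c in pair_freq.items()
    }

    # Step 5: a sentence's information is the sum over its word pairs.
    scores = [sum(info[p] for p in combinations(sent, 2)) for sent in kept]

    # Step 6: weight adjustment (hypothetical hook: per-sentence multipliers).
    if weights is None:
        weights = [1.0] * n
    scores = [s * w for s, w in zip(scores, weights)]

    # Step 7: rank sentences; ties keep their original order.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [(sentences[i], scores[i]) for i in order]

if __name__ == "__main__":
    sample = ("River pollution control wins. Pollution control along the "
              "river continues. Factories closed yesterday. Weather was sunny.")
    for sentence, score in rank_sentences(sample):
        print(f"{score:.2f}  {sentence}")
```

The same scoring could be applied to paragraphs by splitting on blank lines instead of sentence punctuation. A dictionary keyed by word pairs is used in place of a dense matrix because most pairs never co-occur.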

Description

Technical field

[0001] The invention relates to a method for automatically extracting text topics, in particular to a topic extraction method based on word co-occurrence frequency, and belongs to the field of network information processing technology.

Background

[0002] Topic extraction is one of the basic tasks of automatic text processing. It can be performed at several levels: topic words, topic concepts, topic sentences, and topic paragraphs. The extraction step usually applies various weighting algorithms to calculate how much each word, sentence, or paragraph contributes to the topic of the text, and selects those that contribute most. However, the weighting and extraction algorithms are mostly statistical and empirical weighting systems, which do not take into account the relationship between words appearing in the text, especially when the text style changes, the empirical and statistical wei...


Application Information

Publication: CN1560762A (05 Jan 2005)
IPC: G06F17/27
Inventors: 李建华; 李生红