Topic extraction method based on word co-occurrence frequency

A topic extraction technique based on word co-occurrence frequency, applied in the field of automatic extraction of text topics

Inactive Publication Date: 2005-01-05
SHANGHAI JIAO TONG UNIV

AI Technical Summary

Problems solved by technology

However, this method cannot be style-independent, and th




Embodiment Construction

[0017] Figure 1 is a block diagram of the method of the present invention. An embodiment of the invention is given below in conjunction with the method and the accompanying drawing:

[0018] Sample text:

[0019] First battle of Huaihe River pollution control won: discharges along the Huaihe River meet the standard

[0020] Bengbu Newspaper reporters Huang Zhenzhong and Bai Jianfeng reported on January 1: The bell of the New Year has just sounded, and there is good news from the thousand-mile Huaihe River: the discharge from industrial pollution sources along the Huaihe River has reached the standard, and the pollution load has been reduced by more than 40%. The first battle of Huaihe River pollution control has been won.

[0021] Xie Zhenhua, director of the National Environmental Protection Agency, solemnly announced that among the 1,562 polluting enterprises in the Huaihe River Basin, 1,139 have completed t...


Abstract

The invention is a topic extraction method based on word co-occurrence frequency and belongs to the field of information processing. The method uses the word as the basic processing unit. It first counts the occurrences of each word in the input text and deletes words that occur only once as well as common words. It then counts the co-occurrence frequency of words in the text, derives the amount of co-occurrence information between each pair of words, and stores the statistical results in a matrix. Next, it calculates the amount of information carried by each sentence or paragraph from the co-occurrence relations of the words it contains. Finally, after weight adjustment, it ranks and outputs the sentences or paragraphs, thereby extracting the topic sentences or topic paragraphs.
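The steps in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the abstract does not define the co-occurrence information measure or the weight adjustment, so the sketch assumes positive pointwise mutual information over sentence-level co-occurrence and a simple normalization by the number of retained words. The names `STOP_WORDS`, `tokenize`, and `rank_sentences` are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Assumed list of common words; the source does not say which words are removed.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "has", "was"}


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def rank_sentences(sentences, stop_words=STOP_WORDS):
    """Return (sentence, score) pairs, highest co-occurrence score first."""
    tokens = [tokenize(s) for s in sentences]

    # Step 1: count how often each word occurs in the whole text.
    counts = Counter(w for sent in tokens for w in sent)

    # Step 2: delete words that occur only once and common words.
    vocab = {w for w, c in counts.items() if c > 1 and w not in stop_words}
    kept = [sorted(set(sent) & vocab) for sent in tokens]

    # Step 3: count in how many sentences each word and each word pair occurs.
    word_sents = Counter(w for sent in kept for w in sent)
    pair_sents = Counter(p for sent in kept for p in combinations(sent, 2))
    n = len(sentences)

    # Step 4: co-occurrence information per pair, kept in a sparse "matrix".
    # Assumption: positive pointwise mutual information (PMI).
    info = {}
    for (a, b), c in pair_sents.items():
        pmi = math.log2(c * n / (word_sents[a] * word_sents[b]))
        info[(a, b)] = max(pmi, 0.0)

    # Step 5: sum the pair information inside each sentence.
    # Assumed weight adjustment: divide by the number of retained words.
    scored = []
    for i, sent in enumerate(kept):
        total = sum(info[p] for p in combinations(sent, 2))
        scored.append((total / len(sent) if sent else 0.0, i))

    # Step 6: rank by score, breaking ties by original order.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [(sentences[i], score) for score, i in scored]
```

Calling `rank_sentences` on a list of sentences returns them ordered by score, so the first few entries serve as candidate topic sentences. The same scoring could be applied to paragraphs by passing paragraphs instead of sentences.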

Description

Technical field

[0001] The invention relates to a method for automatically extracting text topics, in particular to a topic extraction method based on word co-occurrence frequency. It belongs to the field of information processing technology for networks.

Background

[0002] Topic extraction is one of the basic tasks of automatic text processing. It can be performed at multiple levels, such as topic words, topic concepts, topic sentences, and topic paragraphs. The topic extraction step usually applies a weighting algorithm to calculate how much each word, sentence, or paragraph contributes to the text topic, and selects those that contribute most. However, most weighting and extraction algorithms are statistical and empirical weighting systems that do not take into account the relationships between the words appearing in the text; in particular, when the text style changes, the empirical and statistical wei...


Application Information

IPC(8): G06F17/27
Inventors: 李建华, 李生红, 杨树堂, 苏贵洋, 马颖华, 陆松年
Owner SHANGHAI JIAO TONG UNIV