Text theme mining method based on intra-sentence association graph

The method applies to text topic mining based on an inter-sentence relationship graph. It addresses the low topic quality of existing methods, their inability to handle large-scale text data, and the immaturity of current techniques, and it achieves high versatility.

Inactive Publication Date: 2015-01-21
SHANGHAI CHUWA SOFTWARE
Cites: 4 | Cited by: 10
AI Technical Summary

Problems solved by technology

Although these technologies have improved topic quality to some extent, they are still immature and cannot meet the needs of large-scale text data processing. Topic mining methods based on shallow feature statistics (such as word frequency statistics within sentences) are general-purpose, but some existing statistical methods are too simple, and the quality of the extracted topics is not high.

Embodiment Construction

[0027] The embodiments of the present invention will be described in further detail below in conjunction with the accompanying drawings, but the present embodiments are not intended to limit the present invention, and any similar structures and similar changes of the present invention should be included in the protection scope of the present invention.

[0028] As shown in Figure 1, the text topic mining method based on an inter-sentence correlation graph provided by an embodiment of the present invention comprises the following steps:

[0029] 1) Target text preprocessing

[0030] Divide the target text into sentences to obtain the sentence sequence table S of the text, perform lexical analysis on each sentence in S to extract the words it contains, and use the words in each sentence as its feature words;
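
The preprocessing step in [0030] can be illustrated with a minimal sketch. The patent text does not say which sentence splitter or lexical analyzer is used, so the punctuation-based splitting rule and the regex tokenizer below are assumptions made for illustration, and the function names `split_sentences`, `feature_words`, and `preprocess` are hypothetical.

```python
import re


def split_sentences(text):
    """Split text into sentences at end-of-sentence punctuation.

    Assumption: a sentence ends at . ! ? or the CJK marks 。！？.
    Requires Python 3.7+ (re.split on patterns that may match empty strings).
    """
    parts = re.split(r"(?<=[.!?。！？])\s*", text)
    return [p.strip() for p in parts if p.strip()]


def feature_words(sentence):
    """Stand-in for lexical analysis: lowercase runs of word characters.

    Assumption: a real system would use a proper tokenizer or word
    segmenter, especially for Chinese text.
    """
    return re.findall(r"\w+", sentence.lower())


def preprocess(text):
    """Return the sentence sequence table S and the feature words of each sentence."""
    S = split_sentences(text)
    words = [feature_words(s) for s in S]
    return S, words


if __name__ == "__main__":
    S, words = preprocess("Graphs link sentences. Topics emerge from links!")
    for sentence, feats in zip(S, words):
        print(sentence, feats)
```

For Chinese input, the regex tokenizer would return each unsegmented run of characters as a single "word", so a dedicated word segmenter would likely be needed in practice.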

[0031] 2) Construct the sentence affinity matrix of the target text as:

[0032] $A = [A_{ij}]_{m \times m}$

[...
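
The definition of the entries $A_{ij}$ is cut off in this excerpt, so the following is only a sketch of how an m×m sentence affinity matrix might be built from the feature words of step 1. It assumes, purely for illustration, that affinity is the Jaccard overlap of two sentences' feature-word sets and that the diagonal is zero; the patent's actual formula may differ. The names `jaccard` and `affinity_matrix` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap of two word lists (assumed similarity measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def affinity_matrix(words):
    """Build a symmetric m x m matrix A from per-sentence feature words.

    Assumption: A[i][j] is the Jaccard overlap of sentences i and j,
    and A[i][i] = 0 so a sentence does not vote for itself.
    """
    m = len(words)
    A = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            A[i][j] = A[j][i] = jaccard(words[i], words[j])
    return A


if __name__ == "__main__":
    words = [["text", "topic"], ["topic", "graph"], ["yarn"]]
    for row in affinity_matrix(words):
        print(row)
```

Any symmetric, non-negative similarity (for example cosine similarity over term-frequency vectors) could be swapped in for `jaccard` without changing the shape of the matrix.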


Abstract

The invention provides a text topic mining method based on an inter-sentence association graph and relates to the technical field of data mining. It addresses the low quality and poor generality of existing mining methods. The method first divides the target text into sentences to obtain the sentence sequence table of the text, then builds the sentence association matrix of the target text and calculates the weight of each element in the sentence sequence table. Topic sentences are selected according to the calculated weights. Each time a topic sentence is selected, the weights of all non-topic sentences are adjusted, and the next topic sentence is selected according to the adjusted weights. This is repeated until the total number of characters in all selected topic sentences reaches a preset character-count threshold. Finally, the selected topic sentences together serve as the topic content mined from the target text. The method is suitable for text documents of various genres, styles, and types.
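
The select-then-adjust loop described in the abstract can be sketched as follows. The excerpt does not give the weight formula or the adjustment rule, so both are assumptions here: the initial weight of a sentence is taken to be the sum of its affinities to all other sentences, and after each selection the weight of every remaining sentence is reduced in proportion to its affinity with the selected one. The function name `select_topic_sentences` and the `penalty` parameter are hypothetical.

```python
def select_topic_sentences(S, A, max_chars, penalty=1.0):
    """Greedily pick topic sentences until their total length reaches max_chars.

    S: list of sentences; A: m x m affinity matrix (list of lists).
    Assumptions (not from the patent text): initial weight = row sum of A;
    adjustment = subtract penalty * A[i][best] * weight[best].
    """
    weights = [sum(row) for row in A]
    remaining = set(range(len(S)))
    chosen = []
    total_chars = 0
    while remaining and total_chars < max_chars:
        # Highest weight wins; ties go to the earlier sentence.
        best = max(remaining, key=lambda i: (weights[i], -i))
        remaining.remove(best)
        chosen.append(best)
        total_chars += len(S[best])
        # Down-weight sentences that are similar to the one just chosen.
        for i in remaining:
            weights[i] -= penalty * A[i][best] * weights[best]
    # Assumption: report topic sentences in their original text order.
    return [S[i] for i in sorted(chosen)]


if __name__ == "__main__":
    S = ["aaaa", "bbbb", "cccc"]
    A = [[0.0, 0.9, 0.1],
         [0.9, 0.0, 0.2],
         [0.1, 0.2, 0.0]]
    print(select_topic_sentences(S, A, max_chars=8))
```

In the usage example, sentence 1 is picked first; sentence 0 is nearly a duplicate of it and is penalized heavily, so the less similar sentence 2 is picked second. This is the kind of redundancy control that re-weighting after each selection is meant to provide.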

Description

Technical Field

[0001] The invention relates to data mining technology, in particular to a text topic mining method based on an inter-sentence correlation graph.

Background

[0002] Text topic mining mainly refers to using a computer to automatically extract, from a text collection, the key sentences that best represent its subject content, forming a concise and coherent short text. With the exponential growth in the number of documents on the web, quickly discovering the topic of a text is becoming increasingly important. Refined and accurate subject content saves users time filtering information and improves their work efficiency.

[0003] Among existing topic mining methods, those based on text structure features and sentence positions depend on the genre of the target text; a method that works for scientific and technical literature is not necessarily suitable for news articles; with the development of n...

Application Information

IPC(8): G06F17/30; G06F17/27
CPC: G06F16/36; G06F40/216
Inventor: 陶余会, 吴康宁, 孙煦峰, 赵亮
Owner: SHANGHAI CHUWA SOFTWARE