Abstracting method for mass-text quick understanding

A text summarization technology, applied in the field of generating text summaries with topic models, that addresses two problems: the sentence length and sentence position of text units are ignored, and the semantic correlation between sentences is difficult to measure accurately.

Inactive Publication Date: 2017-01-04
UNIV OF ELECTRONICS SCI & TECH OF CHINA
Cites: 4; Cited by: 18

AI Technical Summary

Problems solved by technology

[0006] The purpose of the present invention is to provide, in view of the above-mentioned problems, a summarization method for the rapid understanding of massive texts.


Examples

Embodiment Construction

[0067] All features disclosed in this specification can be combined in any way, except for mutually exclusive features and/or steps.

[0068] The present invention is described in detail below with reference to Figures 1-5.

[0069] The invention proposes a summarization method aimed at rapid understanding of massive texts, and the model is applied to speaker recognition to obtain good results. The overall flow of the algorithm is shown in Figure 1 and includes the following steps:

[0070] Step 1: Obtain a text collection composed of text to be analyzed;

[0071] Step 2: Perform word segmentation, anaphora resolution, redundant information removal and basic unit division on the corpus of the text collection to obtain the preprocessed corpus;

[0072] Step 2.1: Use the ICTCLAS word segmentation system to perform rough segmentation of the text collection with the N-shortest-path algorithm, obtaining all possible word segmentation combinations, and use the cascaded hidden Markov model ...
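
The rough segmentation idea in Step 2.1 can be illustrated with a small sketch. This is not the ICTCLAS implementation and does not call ICTCLAS; it is a toy k-best dynamic program over a word lattice, assuming that every dictionary word (plus each single character as a fallback) is an edge of cost 1, so that the "shortest" paths are the segmentations with the fewest words. The function name `n_shortest_segmentations`, the toy dictionary, and the unit edge cost are all hypothetical, and the cascaded hidden Markov model stage is not covered.

```python
import heapq


def n_shortest_segmentations(text, dictionary, n=2):
    """Return up to n segmentations of `text` that use the fewest words.

    Assumption for illustration: each dictionary word found in `text`, and
    each single character as a fallback, is an edge of cost 1 in a DAG whose
    nodes are character positions.
    """
    max_len = max((len(w) for w in dictionary), default=1)
    # best[i] holds up to n (cost, words) pairs for segmenting text[:i].
    best = [[] for _ in range(len(text) + 1)]
    best[0] = [(0, [])]
    for end in range(1, len(text) + 1):
        candidates = []
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if len(piece) == 1 or piece in dictionary:
                # Extend every kept path that ends at `start` by this word.
                for cost, words in best[start]:
                    candidates.append((cost + 1, words + [piece]))
        # Keep only the n cheapest partial paths ending at this position.
        best[end] = heapq.nsmallest(n, candidates, key=lambda c: (c[0], c[1]))
    return [words for _, words in best[len(text)]]


if __name__ == "__main__":
    toy_dictionary = {"研究", "研究生", "生命", "起源"}  # hypothetical lexicon
    for seg in n_shortest_segmentations("研究生命起源", toy_dictionary, n=2):
        print(" / ".join(seg))
```

Keeping several candidate segmentations instead of one leaves ambiguous spans (here "研究 / 生命" versus "研究生 / 命") open for a later disambiguation stage to resolve.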

Abstract

The invention discloses a summarization method for the rapid understanding of massive texts. The method includes the following steps: the text content is obtained; the text is preprocessed with operations such as word segmentation, coreference resolution, redundant information removal, and division into analysis units; topic analysis is performed on the text content with a topic model to obtain the topic distribution of the text; a graph model is built according to the topic correlations between analysis units, and the weights of all directed edges in the graph model are calculated; the graph model is computed with the contribution iteration method until it converges, and a text summary of suitable length is generated as required. With this method, massive unstructured text data can be analyzed automatically, and a text summary that fully covers the core topics is obtained to stand in for the massive original data, thereby achieving the goal of rapid understanding.
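
The graph step described above can be sketched as follows. The excerpt does not give the patent's edge-weight formula or the details of its contribution iteration, so this sketch substitutes two generic stand-ins: cosine similarity between per-unit topic distributions as the edge weight, and a PageRank-style damped iteration run until the scores stop changing. Because cosine similarity is symmetric, the raw weights of the edges i→j and j→i are equal here, which the patent's directed weights need not be. The names `rank_units` and `summarize`, the damping factor, and the example topic distributions are all hypothetical.

```python
import math


def cosine(p, q):
    """Cosine similarity between two topic distributions."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def rank_units(topic_dists, damping=0.85, tol=1e-8, max_iter=1000):
    """Score analysis units with a PageRank-style iteration (a stand-in)."""
    n = len(topic_dists)
    # weights[i][j] is the weight of edge i -> j; self-loops are excluded.
    weights = [[cosine(topic_dists[i], topic_dists[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        new_scores = []
        for j in range(n):
            # Each unit i passes its score along edges in proportion to weight.
            incoming = sum(scores[i] * weights[i][j] / out_sum[i]
                           for i in range(n) if out_sum[i] > 0)
            new_scores.append((1 - damping) / n + damping * incoming)
        converged = max(abs(a - b) for a, b in zip(new_scores, scores)) < tol
        scores = new_scores
        if converged:
            break
    return scores


def summarize(units, topic_dists, k):
    """Pick the k highest-scoring units and return them in original order."""
    scores = rank_units(topic_dists)
    top = sorted(range(len(units)), key=lambda i: scores[i], reverse=True)[:k]
    return [units[i] for i in sorted(top)]


if __name__ == "__main__":
    units = ["sentence A", "sentence B", "sentence C", "sentence D"]
    # Hypothetical two-topic distributions, as a topic model might produce.
    dists = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15], [0.1, 0.9]]
    print(summarize(units, dists, k=2))
```

Returning the selected units in their original order keeps the summary readable, and `k` plays the role of the required summary length.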

Description

Technical field

[0001] The invention relates to the field of text information mining, in particular to a method for generating text summaries using topic models.

Background

[0002] At present, the rapid popularization of the Internet has resulted in explosive growth of information resources. Abundant information resources bring great convenience to people, but they also make it difficult to select the resources that are actually useful. Among the types of network information resources, the proportion of unstructured resources is increasing, and processing them is more difficult than processing structured data. Text is a typical kind of unstructured information, and its effective analysis and processing has important theoretical value and practical significance for the Internet and many industries.

[0003] Text summarization is a very important part of text information processing. The gene...

Application Information

IPC(8): G06F17/30, G06F17/27
CPC: G06F16/345, G06F40/211, G06F40/284, G06F40/30
Inventor: 刘贵松, 秦科, 罗光春, 卢国明, 李宝程
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA