Unsupervised multi-document abstract generation method for public opinion analysis

A public opinion analysis and multi-document technology applied in the field of unsupervised document summary generation. It solves problems such as the poor practicability of generative summaries, the lack of Chinese public opinion summary training corpora, and low effectiveness. The effect of the search space...

Active Publication Date: 2020-08-28
HARBIN INST OF TECH
Cites: 9; Cited by: 5

AI Technical Summary

Problems solved by technology

[0007] The present invention provides an unsupervised multi-document summarization method for public opinion analysis, which solves problems such as the low effectiveness of existing multi-document su...



Examples


Embodiment 1

[0059] An unsupervised multi-document summary generation method oriented to public opinion analysis, comprising the following steps:

[0060] Step 1: Collect online public opinion news in real time, and automatically divide it into news collections according to network hotspots;

[0061] Step 2: Extract a single-document summary of each public opinion news item in the collection in an unsupervised manner;

[0062] Step 3: Analyze all the extracted single-document summaries in the collection to obtain an unsupervised multi-document summary.
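The three steps can be sketched as a small driver that takes the hotspot collections produced by step 1 and applies a single-document stage and a fusion stage to each collection. This is a minimal sketch under stated assumptions: the function names `summarize_collections`, `first_sentence`, and `join_all` and the stage signatures are hypothetical, and the placeholder stages only stand in for the patent's unsupervised models, which this excerpt does not describe.

```python
from typing import Callable, Dict, List

# Hypothetical stage signatures; the patent excerpt does not define these interfaces.
SingleDocSummarizer = Callable[[str], List[str]]  # article text -> extracted sentences
MultiDocFuser = Callable[[List[List[str]]], str]  # per-article summaries -> one summary


def summarize_collections(
    collections: Dict[str, List[str]],
    summarize_single: SingleDocSummarizer,
    fuse: MultiDocFuser,
) -> Dict[str, str]:
    """Apply steps 2 and 3 to every hotspot collection built in step 1."""
    output: Dict[str, str] = {}
    for hotspot, articles in collections.items():
        # Step 2: one extractive summary per news article.
        single_summaries = [summarize_single(article) for article in articles]
        # Step 3: merge the single-document summaries into one summary.
        output[hotspot] = fuse(single_summaries)
    return output


# Placeholder stages, for illustration only.
def first_sentence(article: str) -> List[str]:
    """Keep only the first sentence of an article (a stand-in for step 2)."""
    sentences = [s.strip() for s in article.split(".") if s.strip()]
    return sentences[:1]


def join_all(summaries: List[List[str]]) -> str:
    """Concatenate all extracted sentences (a stand-in for step 3)."""
    return " ".join(sentence + "." for summary in summaries for sentence in summary)


collections = {"hotspot A": ["Alpha happened. Details follow.", "Beta reacted. More text."]}
result = summarize_collections(collections, first_sentence, join_all)
print(result)  # {'hotspot A': 'Alpha happened. Beta reacted.'}
```

Passing the stages in as callables keeps the driver independent of whichever extractive and generative models are plugged into steps 2 and 3.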

[0063] Further, in step 1, automatically dividing news collections according to network hotspots specifically means: obtaining hotspots from the Internet, such as Weibo, Baidu, and WeChat hot topics; using each hotspot as a query and collecting news related to it through search engines; and establishing a hotspot-news relationship in which one hotspot corresponds to multiple news items, so as to divide...
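The hotspot-news grouping in step 1 amounts to a one-to-many mapping from each hotspot query to the news it retrieves. A minimal sketch, assuming a hypothetical `search` callable in place of a real search-engine client (no specific engine API is implied) and assuming that repeated hotspots across platforms and repeated search results should be merged; the toy hotspot lists and index are made up for illustration.

```python
from typing import Callable, Dict, Iterable, List


def build_hotspot_collections(
    hotspots: Iterable[str],
    search: Callable[[str], List[str]],
) -> Dict[str, List[str]]:
    """Map each hotspot to the news items retrieved with it as the query.

    `search` is a hypothetical stand-in for a search-engine client.
    """
    collections: Dict[str, List[str]] = {}
    # Hotspot lists from several platforms may overlap; keep the first occurrence.
    for hotspot in dict.fromkeys(hotspots):
        # One hotspot -> many news items; drop repeated results, keep ranking order.
        collections[hotspot] = list(dict.fromkeys(search(hotspot)))
    return collections


# Toy data: hypothetical hotspot lists and a fake search index.
weibo_hot = ["topic x"]
baidu_hot = ["topic x", "topic y"]
fake_index = {"topic x": ["news 1", "news 2", "news 1"], "topic y": ["news 3"]}

groups = build_hotspot_collections(weibo_hot + baidu_hot, lambda q: fake_index.get(q, []))
print(groups)  # {'topic x': ['news 1', 'news 2'], 'topic y': ['news 3']}
```

A news item may still appear under several hotspots, since each hotspot's collection is built from its own query.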

Embodiment 2

[0092] An unsupervised multi-document summary generation method oriented to public opinion analysis, comprising the following steps:

[0093] Step 1: Collect online public opinion news in real time, and automatically divide it into news collections according to network hotspots;

[0094] Step 2: Extract a single-document summary of each public opinion news item in the collection in an unsupervised manner;

[0095] Step 3: Analyze all the extracted single-document summaries in the collection to obtain an unsupervised multi-document summary.

[0096] The purpose of this step is to generate, from the multiple single-document summaries output in step 2, a text summary that has fluent sentences and low redundancy and captures the core content of the document collection. Being unsupervised, generative, and multi-document, the method meets the needs of public opinion analysis on all three counts, so an unsupervised generative multi-document summarization method is used to analyze the public opi...
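The "low redundancy" requirement can be illustrated with a simple filter that drops near-duplicate sentences across the single-document summaries before they are fused. This is not the patent's generative step, which the excerpt does not describe; it is a swapped-in illustration using word-set Jaccard similarity with a hypothetical threshold of 0.5. Whitespace tokenization is an assumption that only suits space-delimited text; Chinese input would first need word segmentation.

```python
from typing import List


def jaccard(a: str, b: str) -> float:
    """Word-set Jaccard similarity; whitespace tokenization is an assumption."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def drop_redundant(summaries: List[List[str]], threshold: float = 0.5) -> List[str]:
    """Keep a sentence only if its similarity to every kept sentence is below `threshold`."""
    kept: List[str] = []
    for summary in summaries:
        for sentence in summary:
            if all(jaccard(sentence, k) < threshold for k in kept):
                kept.append(sentence)
    return kept


single_doc_summaries = [
    ["The bridge closed today", "Traffic was diverted"],
    ["The bridge closed today due to flooding", "Officials gave no date"],
]
print(drop_redundant(single_doc_summaries))
# ['The bridge closed today', 'Traffic was diverted', 'Officials gave no date']
```

The greedy pass keeps the first-seen version of a repeated fact, so the order of the input summaries determines which phrasing survives.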

Embodiment 3

[0131] This embodiment differs from Embodiment 2 in that, in step 2, an unsupervised algorithm model is adopted, so no manual data labeling is required and the labor and time costs of labeling data are avoided; the data obtained in step 1 is used directly as the training corpus, which fully exploits the potential of the large-scale corpus crawled from the Internet;

[0132] This step uses an extractive summarization method to identify, from the original news text, a series of sentences strongly related to the article's core theme; these sentences are then passed to step 3. If a generative summarization method were used here, it would easily produce ungrammatical sentences, which would cause error propagation and degrade the overall performance of the method;
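The excerpt does not name the extractive algorithm used in step 2. As a stand-in, the sketch below uses TextRank-style graph ranking, a well-known unsupervised extractive method: sentences are nodes, edges are weighted by word overlap normalized by sentence lengths, and the top-`k` sentences by weighted PageRank score are returned in their original order. The sentence splitter, tokenizer, and parameter defaults (`k=2`, damping `d=0.85`, 50 iterations) are assumptions; `\w+` tokenization treats a run of Chinese characters as one token, so Chinese news would need a word segmenter first.

```python
import math
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive splitter on '.', '!', '?' and the Chinese full stop (an assumption)."""
    parts = re.split(r"(?<=[.!?。])\s*", text)
    return [p.strip() for p in parts if p.strip()]


def similarity(a: List[str], b: List[str]) -> float:
    """TextRank overlap score: shared words / (log|a| + log|b|)."""
    overlap = len(set(a) & set(b))
    if overlap == 0:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    return overlap / denom if denom > 0 else float(overlap)


def textrank_summary(text: str, k: int = 2, d: float = 0.85, iters: int = 50) -> List[str]:
    """Return the k highest-ranked sentences in their original order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= k:
        return sentences
    tokens = [re.findall(r"\w+", s.lower()) for s in sentences]
    # Symmetric weighted adjacency matrix without self-loops.
    w = [[similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in w]
    scores = [1.0] * n
    # Weighted PageRank by power iteration.
    for _ in range(iters):
        scores = [
            (1 - d) + d * sum(
                w[j][i] / out_weight[j] * scores[j] for j in range(n) if out_weight[j] > 0
            )
            for i in range(n)
        ]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


news = (
    "The city council approved the new budget. The budget funds new schools. "
    "Critics said the budget ignores roads. The weather was sunny."
)
print(textrank_summary(news))
# ['The city council approved the new budget.', 'The budget funds new schools.']
```

Because it copies whole sentences from the source instead of generating new ones, an extractive stage like this keeps step 2's output grammatical, which matches the stated reason for avoiding error propagation into step 3.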

[0133] This step uses single-document summarization in preparation for the subsequent multi-document summarization task. Due to the long text length of pu...



Abstract

The invention discloses an unsupervised multi-document summary generation method for public opinion analysis. The method comprises the steps of: 1, collecting online public opinion news in real time and automatically dividing it into news sets according to network hotspots; 2, extracting a single-document summary of each piece of public opinion news in a set in an unsupervised manner; and 3, analyzing all the extracted single-document summaries in the set to obtain an unsupervised multi-document summary. The method solves the problems of existing multi-document summarization methods, namely their relatively low effectiveness, the relatively poor practicability of generative summarization, and the lack of Chinese public opinion summarization training corpora, thereby enabling public opinion news to be monitored.

Description

Technical field

[0001] The invention belongs to the technical field of unsupervised generation of document summaries, and in particular relates to an unsupervised method for generating multi-document summaries oriented to public opinion analysis.

Background art

[0002] Automatic summarization is one of the most important technologies in natural language processing. Its goal is to have computers automatically extract or generate, from a text or a collection of texts, concise and coherent short texts that accurately convey the meaning of the original. Users need only read the summary to grasp the main information of a document, which saves considerable time otherwise spent searching and reading large documents and thus improves reading efficiency.

[0003] Depending on the classification criterion, automatic summarization techniques can be divided into different categories, mainly in the following three ways:

[0004] 1. Depending...


Application Information

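IPC (8): G06F16/34; G06F16/9532; G06N3/04; G06N3/08; G06F16/36
CPC: G06F16/345; G06F16/9532; G06N3/08; G06F16/36; G06N3/045; Y02D10/00
Inventors: Zhao Tiejun (赵铁军), Xu Bing (徐冰), Yang Muyun (杨沐昀), Song Zhixun (宋治勋), Cao Hailong (曹海龙), Zhu Conghui (朱聪慧)
Owner: HARBIN INST OF TECH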