Multi-document abstract generation method and system

Multi-document summarization technology, applied in the fields of natural language processing and deep learning, addresses two problems: word vectors alone are insufficient for keyword extraction tasks, and existing algorithms ignore the topic information of documents. The resulting summaries achieve high accuracy and good readability.

Inactive Publication Date: 2019-10-15
COMMUNICATION UNIVERSITY OF CHINA

AI Technical Summary

Problems solved by technology

[0015] To address the problems that word vectors alone are insufficient for keyword extraction tasks and that existing algorithms ignore the topic information of documents, the present invention provides a method and system for generating multi-document summaries.



Examples


Embodiment Construction

[0091] To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the protection scope of the present invention.

[0092] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments.

[0093] In order to make the technical solutions and advantages in the examples of the present application clearer, the exemplary embodiments of the present application will be further described in detail below...


Abstract

The invention provides a multi-document abstract generation method comprising the following steps: S1, determining a topic, acquiring a plurality of documents related to the topic, and constructing a first corpus; S2, constructing an HLDA topic model for the topic and obtaining sub-topics; S3, calculating importance scores of the clauses; S4, calculating the importance of the sub-topics; and S5, extracting abstract sentences. The method adds news features and improves the HLDA topic importance calculation to obtain reasonable sentence scores. In addition to the traditional abstract sorting step, it uses inter-sentence information as one of the criteria for ordering sentences, so that the final abstract sentences are more accurate and read more smoothly.
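Steps S3 to S5 can be illustrated with a minimal sketch. It is not the patented method: the abstract gives no formulas, so the frequency-based sentence score, the normalized sub-topic importance, and the one-sentence-per-sub-topic selection are all assumptions made for illustration. The HLDA model of S2 is not implemented; the sub-topic label of each sentence is supplied by hand as a hypothetical `labels` list, and the news features and inter-sentence ordering are omitted.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase word tokens; a stand-in for real word segmentation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def sentence_scores(sentences):
    """S3 (assumed scoring): mean corpus frequency of a sentence's tokens."""
    freq = Counter(tok for s in sentences for tok in tokenize(s))
    scores = []
    for s in sentences:
        toks = tokenize(s)
        scores.append(sum(freq[t] for t in toks) / len(toks) if toks else 0.0)
    return scores


def subtopic_importance(scores, labels):
    """S4 (assumed): summed sentence scores per sub-topic, normalized to sum to 1."""
    totals = Counter()
    for score, label in zip(scores, labels):
        totals[label] += score
    norm = sum(totals.values()) or 1.0
    return {label: total / norm for label, total in totals.items()}


def extract_summary(sentences, labels, max_sentences=2):
    """S5 (assumed): best sentence of each sub-topic, most important sub-topic first."""
    scores = sentence_scores(sentences)
    importance = subtopic_importance(scores, labels)
    ranked = sorted(importance, key=importance.get, reverse=True)
    summary = []
    for label in ranked[:max_sentences]:
        members = [i for i, l in enumerate(labels) if l == label]
        best = max(members, key=lambda i: scores[i])
        summary.append(sentences[best])
    return summary


# Hypothetical input: sentences from several documents on one topic,
# each already assigned to a sub-topic (in the patent, this comes from HLDA).
sentences = [
    "The storm hit the coast on Monday.",
    "The storm damaged homes.",
    "Officials opened shelters for residents.",
]
labels = ["storm", "storm", "relief"]
summary = extract_summary(sentences, labels)
```

Ranking sub-topics before picking sentences keeps the summary from being dominated by a single heavily covered sub-topic, which is the general motivation for computing sub-topic importance separately from sentence scores.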

Description

Technical field

[0001] The invention relates to the technical fields of natural language processing and deep learning, and in particular to a method and system for generating multi-document abstracts.

Background

[0002] In recent years, massive data has brought great convenience to people, but it has also brought great challenges to data analysis and search. In the context of big data, quickly obtaining the required key information from massive data has become an urgent problem.

[0003] For example, a hot news topic typically has a large number of related documents on the Internet. The content of these web pages is highly repetitive and similar, so readers spend a great deal of time and effort obtaining the information they need. Multi-document summarization technology uses machine learning, graph models, topic models, and other techniques to obtain the content of multiple documents related to the topic, automatically e...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/33, G06F16/34, G06F17/27
CPC: G06F16/3344, G06F16/345, G06F40/211, G06F40/289
Inventor: 李樱胡诚成王永滨于水源胡滔
Owner: COMMUNICATION UNIVERSITY OF CHINA