Document abstract generation method, device and equipment, and computer readable storage medium

A document abstract generation technology in the field of data processing. It addresses abstract redundancy, poor readability, and the inability to accurately represent the meaning of a document, and produces abstracts that are logical and readable.

Pending Publication Date: 2019-08-16
RICOH KK
Cites 3; cited by 2

AI Technical Summary

Problems solved by technology

[0004] There are two main ways to generate document summaries. The first is extractive summarization. Summaries generated this way are redundant, and because few features are used, the output summaries have low accuracy and cannot accurately represent the meaning of the document. The second is abstractive (generative) summarization, which outputs summaries with high accuracy, but the generated summaries do not conform to people's reading habits and have poor readability.
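To make the redundancy problem of extractive summarization concrete, the following is a toy frequency-based extractive summarizer. It is a generic baseline chosen only for illustration, not a method described in this document: each sentence is scored by the summed document-wide frequency of its words, and the top `k` sentences are kept. The function name and sample sentences are hypothetical.

```python
from collections import Counter

def extractive_summary(sentences, k):
    # Score each sentence by summing the document-wide frequency of its
    # words, then return the k best sentences in their original order.
    words = [s.lower().split() for s in sentences]
    freq = Counter(w for ws in words for w in ws)
    scores = [sum(freq[w] for w in ws) for ws in words]
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]

doc = [
    "Ricoh released a new printer",
    "The new printer from Ricoh was released today",
    "Analysts expect strong sales",
]
print(extractive_summary(doc, 2))
# → ['Ricoh released a new printer', 'The new printer from Ricoh was released today']
```

Both selected sentences state the same fact. Nothing in such a baseline checks whether a chosen sentence repeats an earlier one, which is the kind of redundancy the background describes.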

Examples

Embodiment 1

[0064] An embodiment of the present invention provides a method for generating a document abstract, which, as shown in Figure 1, includes the following steps:

[0065] Step 101: Use training data to train a neural network model with an attention matrix. The training data includes at least one group of first original sentences and their corresponding abstracts, and the first original sentences are plain text sentences in the training document;

[0066] Specifically, the neural network model may be a seq2seq model.

[0067] Step 102: Input each second original sentence of the document to be processed into the neural network model to obtain a summary corresponding to each second original sentence, where the second original sentence is a plain text sentence in the document to be processed;

[0068] Step 103: Establish a phrase attention table according to the attention matrix between each second original sentence of the document to be processed and its corresponding abstract, where the phrase attention ta...
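This excerpt truncates the definition of the phrase attention table and does not say which attention mechanism the seq2seq model uses, so the following is illustrative only. It assumes dot-product (Luong-style) attention over toy hidden states, treats every token as a one-word phrase, and defines a phrase's table entry as the total attention it receives from the abstract (the column sum of the attention matrix). The function names, hidden states, and scoring rule are all hypothetical.

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_matrix(decoder_states, encoder_states):
    """Dot-product attention. Row i holds the weights that abstract token i
    places on each source token: shape [len(abstract)][len(source)]."""
    return [softmax([sum(a * b for a, b in zip(d, e)) for e in encoder_states])
            for d in decoder_states]

def phrase_attention_table(sentences, matrices):
    """Hypothetical table: entry [s][j] = (token, score), where score is the
    total attention that source token j of sentence s receives from that
    sentence's abstract (the column sum of its attention matrix)."""
    table = []
    for tokens, matrix in zip(sentences, matrices):
        scores = [sum(row[j] for row in matrix) for j in range(len(tokens))]
        table.append(list(zip(tokens, scores)))
    return table

# Toy 2-dimensional hidden states (hypothetical) for a 3-token source
# sentence and the 2-token abstract the model generated for it.
sentence = ["ricoh", "new", "printer"]
encoder_states = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
decoder_states = [[2.0, 0.0], [0.0, 2.0]]

A = attention_matrix(decoder_states, encoder_states)
table = phrase_attention_table([sentence], [A])
for token, score in table[0]:
    print(f"{token}: {score:.3f}")
```

Summing a column rewards source tokens that several abstract tokens attend to. A real system would read the attention weights from the trained model's decoder rather than recompute them from toy states.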

Embodiment 2

[0091] An embodiment of the present invention further provides an apparatus for generating a document abstract, which, as shown in Figure 6, includes:

[0092] The training module 21 is configured to train a neural network model with an attention matrix by using training data, where the training data includes at least one group of first original sentences and their corresponding abstracts, and the first original sentences are plain text sentences in the training document;

[0093] The input module 22 is configured to input each second original sentence of the document to be processed into the neural network model to obtain a summary corresponding to each second original sentence, where the second original sentence is a plain text sentence in the document to be processed;

[0094] The processing module 23 is configured to establish a phrase attention table according to the attention matrix between each second original sentence of the document to be processed and its corresponding abstra...

Embodiment 3

[0108] An embodiment of the present invention further provides an electronic device 30 for generating a document abstract, which, as shown in Figure 7, includes:

[0109] a processor 32; and

[0110] a memory 34 in which computer program instructions are stored,

[0111] wherein, when the computer program instructions are executed by the processor 32, the processor 32 performs the following steps:

[0112] A neural network model with an attention matrix is obtained by training with training data, wherein the training data includes at least one group of first original sentences and their corresponding abstracts, and the first original sentences are plain text sentences in the training document;

[0113] Inputting each second original sentence of the document to be processed into the neural network model to obtain a summary corresponding to each second original sentence, where the second original sentence is a plain text sentence in the document to be processed;

[0114] ...

Abstract

The invention provides a document abstract generation method, apparatus, device, and computer-readable storage medium, and belongs to the technical field of data processing. The method comprises: training with training data to obtain a neural network model with an attention matrix, wherein the training data comprises at least one set of first original sentences and corresponding abstracts; inputting each second original sentence of a to-be-processed document into the neural network model to obtain an abstract corresponding to each second original sentence; establishing a phrase attention table according to the attention matrix between each second original sentence of the to-be-processed document and its corresponding abstract; and selecting initial phrases from the to-be-processed document, expanding each initial phrase according to the phrase attention table to obtain a plurality of expanded phrase candidate sets, aggregating the phrases in each phrase candidate set into sentences, and generating the abstract of the to-be-processed document. An abstract generated by this method is concise, accurate, and readable, and contains no redundant information.
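The abstract describes the final stage only at a high level: select initial phrases, expand them with the phrase attention table into candidate sets, and aggregate the candidates into sentences. The following is one hypothetical reading, not the patented procedure. It assumes one-word phrases, a table of `(token, attention score)` pairs per sentence, top-`k` seed selection, threshold-based expansion to neighbouring tokens, and merging of overlapping spans. The function names, `k`, `threshold`, and the sample table are all illustrative assumptions.

```python
def select_initial(table, k):
    # Seed phrases: the k (sentence, position) entries with the highest score.
    entries = [(score, s, j) for s, row in enumerate(table)
               for j, (_, score) in enumerate(row)]
    entries.sort(reverse=True)
    return [(s, j) for _, s, j in entries[:k]]

def expand(table, s, j, threshold):
    # Grow a seed left and right while neighbouring tokens score at or
    # above the threshold; returns the span (sentence, start, end).
    row = table[s]
    left = right = j
    while left > 0 and row[left - 1][1] >= threshold:
        left -= 1
    while right < len(row) - 1 and row[right + 1][1] >= threshold:
        right += 1
    return (s, left, right)

def aggregate(table, spans):
    # Merge overlapping or adjacent spans of the same sentence so no token
    # is emitted twice, then join each sentence's spans in document order.
    merged = []
    for s, l, r in sorted(spans):
        if merged and merged[-1][0] == s and l <= merged[-1][2] + 1:
            merged[-1] = (s, merged[-1][1], max(merged[-1][2], r))
        else:
            merged.append((s, l, r))
    words_by_sentence = {}
    for s, l, r in merged:
        words_by_sentence.setdefault(s, []).extend(t for t, _ in table[s][l:r + 1])
    return ". ".join(" ".join(w) for _, w in sorted(words_by_sentence.items())) + "."

def generate_abstract(table, k=3, threshold=0.15):
    # Seeds -> expanded candidate spans (a set drops exact duplicates) -> sentences.
    spans = {expand(table, s, j, threshold) for s, j in select_initial(table, k)}
    return aggregate(table, spans)

# Hypothetical phrase attention table: (token, score) pairs per sentence.
table = [
    [("ricoh", 0.7), ("released", 0.3), ("a", 0.0), ("new", 0.2), ("printer", 0.8)],
    [("the", 0.05), ("printer", 0.6), ("is", 0.2), ("fast", 0.9)],
]
print(generate_abstract(table))
# → ricoh released new printer. printer is fast.
```

Merging overlapping spans is one simple way to meet the stated goal of an abstract without redundant information. Raising `threshold` yields shorter phrases, and lowering it keeps more context.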

Description

Technical field

[0001] The present invention relates to the technical field of data processing, and in particular, to a method, apparatus, device, and computer-readable storage medium for generating a document abstract.

Background

[0002] With the rapid development of Internet technology, more and more users read news on the Internet, and reading it on a mobile terminal such as a mobile phone has become common. However, a very large amount of news, in many categories and forms, is published on the Internet every day, and it is very difficult for people to read so much content in limited time and grasp its main points. Moreover, for long news items, the limited screen of mobile terminals such as mobile phones means the full content of the news often cannot be displayed on the first screen of the mob...

Application Information

Patent type & authority: Application (China)
IPC (8): G06F16/34; G06F17/27; G06N3/04
CPC: G06N3/04; G06F40/258; G06F40/289; G06F16/345
Inventors: 秦添轶 (Qin Tianyi), 张永伟 (Zhang Yongwei), 董滨 (Dong Bin), 姜珊珊 (Jiang Shanshan), 张佳师 (Zhang Jiashi)
Owner: RICOH KK