Automatic text abstracting method based on pre-trained language model

A language model and automatic summarization technology, applied in neural learning methods, biological neural network models, natural language data processing, etc. It addresses two problems: the encoder fails to learn the source information well, and the decoder consequently produces repetitive or semantically irrelevant abstracts. Its effects are overcoming the lack of access to long-distance information and improving training speed.

Pending Publication Date: 2020-09-29
HOHAI UNIV

AI Technical Summary

Problems solved by technology

[0008] To solve problems in current automatic text summarization, such as the encoder failing to learn the information in the source text well, which leads to repeated or semantically irrelevant summaries at the decoder, the present invention proposes an automatic text summarization method based on a pre-trained language model.


Examples

Embodiment Construction

[0034] The invention will be described in further detail below in conjunction with the accompanying drawings.

[0035] Figure 1 shows the flow chart of the present invention, a method for automatic text summarization based on a pre-trained language model. It is divided into two main parts:

[0036] (1) Encode the source text using an encoder built on a pre-trained language model.

[0037] (2) Use a decoder built on an LSTM network, combined with an attention mechanism, to decode the encoded vectors and generate a text summary.
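
The attention step used by the decoder in part (2) can be illustrated with a minimal sketch. It shows how one decoder state is compared against every encoder vector to produce attention weights and a context vector. The dot-product scoring, the function names `softmax` and `attend`, and the toy vectors are all assumptions made for illustration; the source does not specify which scoring function the invention uses, and the LSTM itself is not modeled here.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Return (weights, context) for a single decoding step.

    Assumption: dot-product scoring. The source only says that the
    LSTM decoder is "combined with the attention mechanism".
    """
    # One score per source position: similarity of decoder state and encoder vector.
    scores = [sum(d * e for d, e in zip(decoder_state, enc))
              for enc in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context vector: attention-weighted sum of the encoder vectors.
    context = [sum(w * enc[i] for w, enc in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context

# Toy encoder output (hypothetical values): three source positions, two dimensions.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = attend(decoder_state, encoder_states)
```

In a full decoder, the context vector would typically be combined with the LSTM hidden state before predicting the next summary token; that combination is left out because the source does not describe it in the visible text.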

[0038] To automatically generate a general summary for a piece of text, the method shown in Figure 2 can be used. The specific process is:

[0039] (1) Use the BERT pre-trained network to encode the source text.

[0040] The processed source text information can be expressed as $H_0 = [E_1, \ldots, E_N]$, where $E_1$ represents the vector representation of the first word (character) in the source text; $E_N$ A...
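
The shape of this representation, one vector per character of the source text, can be sketched as below. This is only an illustration of the data layout: `toy_embed` and `build_h0` are hypothetical names, and the fixed code-point-based vectors are a stand-in that has nothing to do with the learned representations a BERT network would produce.

```python
def toy_embed(char, dim=4):
    """Hypothetical stand-in for a learned embedding.

    Derives a fixed vector from the character's code point so that the
    example runs without any model weights. It is NOT a BERT embedding.
    """
    code = ord(char)
    return [((code >> (8 * i)) & 0xFF) / 255.0 for i in range(dim)]

def build_h0(text, dim=4):
    """Build a list [E_1, ..., E_N] with one vector per character of the text."""
    return [toy_embed(ch, dim) for ch in text]

source_text = "自动文本摘要"   # six characters, so N = 6
h0 = build_h0(source_text)
```

Here `h0[0]` plays the role of the first vector and `h0[-1]` the role of the last one, so the list length equals the number of characters in the source text.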


Abstract

The invention relates to an automatic text abstracting method based on a pre-trained language model, and belongs to the technical field of natural language processing. The method comprises the following steps: encoding the source text information with the pre-trained language model BERT network; and then automatically generating an abstract for the source text through an LSTM combined with an attention mechanism. In the automatic abstracting task for Chinese text, the generated Chinese abstracts achieve good readability and high quality, and the model trains quickly. Because the pre-trained language model serves as the encoder, relatively high-quality abstracts can be generated even with little training data.

Description

Technical field

[0001] The invention relates to an automatic text summarization method based on a pre-trained language model, belonging to the technical field of natural language processing.

Background technique

[0002] With the continuous emergence of new media platforms, the information people are exposed to daily has grown explosively, leaving them troubled by information overload; with the accelerating pace of life, people have no time to sort through everything they receive. By reading an abstract, people can understand the original text more efficiently and reduce the time and energy spent browsing information.

[0003] At present, the text summarization techniques commonly used at home and abroad can be divided mainly into extractive and generative approaches.

[0004] The extractive method calculates the importance of each sentence from statistical characteristics of the text, such as word frequency, ...
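
The word-frequency scoring mentioned in [0004] can be illustrated with a minimal sketch of an extractive summarizer. It is a generic illustration of the background technique, not the method of this invention. The function names, the regex-based sentence splitting and tokenization, and the choice of average word frequency as the sentence score are all assumptions; Chinese text would additionally need word segmentation, which is omitted here.

```python
import re
from collections import Counter

def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    """Lowercase word tokens (assumption: space-delimited language)."""
    return re.findall(r"\w+", sentence.lower())

def extract_summary(text, top_k=1):
    """Return the top_k sentences by average word frequency, in source order."""
    sentences = split_sentences(text)
    freq = Counter(tok for s in sentences for tok in tokenize(s))

    def score(sentence):
        toks = tokenize(sentence)
        return sum(freq[t] for t in toks) / len(toks) if toks else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:top_k])   # restore original sentence order
    return [sentences[i] for i in chosen]

text = ("Summaries save time. "
        "Summaries help readers save time when reading news. "
        "The weather was pleasant.")
summary = extract_summary(text, top_k=1)
```

Averaging rather than summing the frequencies keeps long sentences from winning on length alone; other statistical features could be added to `score` in the same way.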

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/126, G06F40/258, G06F40/30, G06N3/04, G06N3/08
CPC: G06F40/126, G06F40/258, G06F40/30, G06N3/049, G06N3/08, G06N3/047, G06N3/045, G06N3/044
Inventor: 王宇, 师岩
Owner: HOHAI UNIV