An automatic text summarization method based on a pre-trained language model

The method applies a pre-trained language model to automatic text summarization in the field of text processing. It addresses large parameter counts, difficult model convergence, and unreadable generated summaries, and it achieves better summarization performance, better semantic compression, and improved readability.

Inactive Publication Date: 2019-06-14
BEIHANG UNIV
5 Cites, 50 Cited by

AI Technical Summary

Problems solved by technology

[0007] (1) Summarization usually requires a relatively complex model to learn the meaning of the text and generate a summary. Such a model has a large number of parameters, and large-scale, high-quality training corpora are scarce, so it is difficult for the model to converge to good performance.
[0008] (2) High-quality Chinese training corpora for automatic text summarization are especially scarce. Harbin Institute of Technology has released a Weibo corpus, but its quality is poor: it contains many low-quality samples in which the summary is unrelated to the original content.
[0009] (3) Generated summaries are often hard to read. Words are repeated meaninglessly, adjacent words are unrelated to each other, and some summaries are completely unreadable.


Examples

Embodiment 1

[0042] Example 1, for the following text:

[0043] Original: In 2007, Jobs showed people the iPhone and declared that "it will change the world". Some people thought he was exaggerating. However, 8 years later, touch-screen smartphones represented by the iPhone have swept all corners of the world. In the future, smartphones will become "true personal computers" and make greater contributions to human development.

[0044] The present invention produces the following summary:

[0045] Abstract: iPhone has been popularized all over the world and will make greater contributions to mankind

Embodiment 2

[0046] Example 2, for the following text:

[0047] Original: Last night, several people were found smoking on a China United Airlines flight from Chengdu to Beijing. Later, because of the weather, the plane diverted to Taiyuan Airport, where several passengers were found smoking by the cabin door. Some passengers asked for a new security check, but the captain decided to continue the flight, which led to a conflict between the crew and the non-smoking passengers. China United Airlines is currently contacting the crew to verify the incident.

[0048] The present invention produces the following summary:

[0049] Abstract: Passengers smoked when the plane landed, and non-smokers' request for a second security check was refused, causing a conflict

Abstract

The invention provides an automatic text summarization method based on a pre-trained language model. A complex deep language model is trained on an ultra-large-scale unsupervised Chinese corpus. The lower layers of the model extract and retain the grammar and structure information of a text, and the higher layers extract and retain its semantic and contextual information, which provides richer text features and semantic information for the automatic text summarization task. The pre-trained language model is combined with the Encoder so that these text features and semantic information are fully used, which gives better semantic compression and improves summarization performance. The pre-trained language model is also combined with the Decoder, so that text generation considers both the semantics of the original text and the semantic information of the vocabulary, which improves the readability of the generated text and its relevance to the original text.
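
The abstract does not say how the low-layer and high-layer features of the language model are merged before they reach the Encoder. One common option, assumed here and not taken from the patent, is a softmax-weighted sum over layers (the scalar mix popularized by ELMo). The sketch below shows that idea in plain Python; the names `mix_layers` and `layer_logits` are hypothetical, and in a real system the logits would be learned together with the summarization model.

```python
import math


def softmax(logits):
    """Turn arbitrary real-valued logits into weights that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def mix_layers(layer_outputs, layer_logits):
    """Combine per-layer token vectors into one vector per token.

    layer_outputs[l][t] is the vector for token t at layer l of a
    (hypothetical) pre-trained language model; lower l carries more
    syntactic information, higher l more semantic information.
    layer_logits[l] is the unnormalized weight of layer l.
    """
    if len(layer_outputs) != len(layer_logits):
        raise ValueError("need exactly one logit per layer")
    weights = softmax(layer_logits)
    num_tokens = len(layer_outputs[0])
    dim = len(layer_outputs[0][0])
    mixed = [[0.0] * dim for _ in range(num_tokens)]
    for weight, layer in zip(weights, layer_outputs):
        for t, vector in enumerate(layer):
            for d, value in enumerate(vector):
                mixed[t][d] += weight * value
    return mixed


# Toy input: two layers, one token, two dimensions.
low_layer = [[1.0, 0.0]]
high_layer = [[0.0, 1.0]]
encoder_input = mix_layers([low_layer, high_layer], [0.0, 0.0])
```

With equal logits each layer receives weight 0.5, so the mixed vector is the average of the two layers. Raising the logit of the high layer would shift the Encoder input toward the semantic features.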

Description

Technical field

[0001] The invention relates to a text processing method, and mainly to an automatic text summarization method based on a pre-trained language model.

Background technique

[0002] The task of automatic text summarization is to take a text of a certain length as input and produce a shorter text as output that expresses the core meaning of the input. Automatic text summarization is an important research direction in natural language processing. There are two main approaches. The first is extractive summarization, which finds the sentences in the original text that are closest to the main idea of the full text and extracts them as the summary. The second is generative summarization, in which the system reads the original text and, on the basis of understanding the whole article, generates a piece of text character by character or word by word according to a certain method to express the central idea of the orig...
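
The extractive approach described in the background can be illustrated with a small frequency-based baseline. This is a minimal sketch of the general idea and not the method of the invention, which is generative. It assumes that the sentence whose words are, on average, most frequent in the document is closest to the main idea. The function names are hypothetical, and the tokenizer assumes whitespace-delimited text, so Chinese input would need a word segmenter first.

```python
import re
from collections import Counter


def split_sentences(text):
    """Split text into sentences on English or Chinese end punctuation."""
    pieces = re.findall(r"[^.!?。！？]+[.!?。！？]?", text)
    return [p.strip() for p in pieces if p.strip()]


def tokenize(sentence):
    """Lowercase word tokens; assumes whitespace-delimited languages."""
    return re.findall(r"\w+", sentence.lower())


def extractive_summary(text, num_sentences=1):
    """Return the highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    # Document-level word frequencies stand in for "the main idea".
    freq = Counter(tok for s in sentences for tok in tokenize(s))

    def score(sentence):
        tokens = tokenize(sentence)
        if not tokens:
            return 0.0
        # Average frequency, so long sentences are not favored by length.
        return sum(freq[t] for t in tokens) / len(tokens)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]),
                    reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)


print(extractive_summary("Cats sleep. Cats and dogs play. Birds fly."))
```

A baseline like this can only copy sentences from the source, which is why it cannot produce a compressed rewording such as the summaries in the embodiments.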

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/34, G06F16/36, G06F17/27
Inventor: 李建欣, 唐彬, 毛乾任, 闫昊, 包梦蛟, 邰振赢
Owner: BEIHANG UNIV