Automatic abstract generation method and device, electronic equipment and storage medium

An automatic summarization technology applied in electronic digital data processing, natural language data processing, unstructured text data retrieval, etc. It solves the problem that rare named entities cannot be generated, with the effect of enhancing feature representation ability and improving accuracy.

Pending Publication Date: 2021-01-05
杭州远传新业科技股份有限公司

AI Technical Summary

Problems solved by technology

[0004] In order to overcome the deficiencies in the prior art, one of the purposes of the present invention is to provide a method for automatically generating abstracts, which uses a trained Transformer encoding and decoding model to encode and decode the word vector of each word to obtain the word vectors of multiple generated words, thereby enhancing the feature representation ability of the word vectors of the multiple generated words, and divides each generated word into a first type of generated word or a second type of generated word.



Examples


Embodiment 1

[0050] Embodiment 1 provides a method for automatically generating abstracts. Referring to Figure 1, the method includes the following steps:

[0051] S110. Based on the original text and the named entities in the original text, perform calculations with two trained embedding vector models respectively to obtain the first character vector of each word in the original text and the second character vector of each word in the named entities, and concatenate the first character vector and the second character vector of each word to obtain the word vector of each word.
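The patent publishes no code; the following is a minimal sketch of step S110 under the assumption that the two trained embedding vector models are lookup tables, one over the ordinary characters of the original text and one over characters belonging to named entities. All class and parameter names, vocabulary sizes and dimensions are chosen for illustration only.

```python
# Sketch of S110: two embedding models whose per-character outputs are concatenated.
import torch
import torch.nn as nn

class DualEmbedding(nn.Module):
    def __init__(self, char_vocab_size, entity_vocab_size, dim=256):
        super().__init__()
        # First embedding model: character vectors of the original text.
        self.char_embed = nn.Embedding(char_vocab_size, dim)
        # Second embedding model: character vectors of named-entity characters
        # (index 0 is assumed for characters that are not part of any entity).
        self.entity_embed = nn.Embedding(entity_vocab_size, dim, padding_idx=0)

    def forward(self, char_ids, entity_ids):
        # char_ids, entity_ids: (batch, seq_len) integer tensors
        first = self.char_embed(char_ids)       # first character vectors
        second = self.entity_embed(entity_ids)  # second character vectors
        # Concatenate the two character vectors to form the word vector of each word.
        return torch.cat([first, second], dim=-1)  # (batch, seq_len, 2*dim)

# Usage: word_vecs = DualEmbedding(6000, 2000)(char_ids, entity_ids)
```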

[0052] Named entity information is very important: named entity words such as person names, place names, times or organization names in the original text have a high probability of appearing in the corresponding abstract. Therefore, the first character vector and the second character vector of each word are concatenated so that the word vector of each word in the original text contains the named entity information.

Embodiment 2

[0081] Embodiment 2 is an improvement on the basis of Embodiment 1. The trained Transformer encoding and decoding model includes a position encoding layer, an encoder and a decoder. Since the Transformer model relies entirely on the attention mechanism for encoding and decoding, rather than on a sequence model as traditional methods do, temporal order cannot be reflected from the model's input alone. Therefore, the position vector, obtained by passing the word vector of each word through the position encoding layer, is added to the embedding vector of each word; this helps to determine the position of each word, or the distance between words in the sequence, and thus better expresses the relationships between words. The embodiment specifically includes the following steps:

[0082] Input the word vector of each word into the position encoding layer to perform position encoding and obtain the position vector of each word; combine the word vector and the position vector of each word to obtain the embedding vector of each word.
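The patent does not specify the position encoding scheme. A minimal sketch of this step is given below, assuming the fixed sinusoidal encoding of the original Transformer and simple addition of the position vector to each word vector; the function names and dimensions are illustrative.

```python
# Sketch of the position encoding step in Embodiment 2 (sinusoidal encoding assumed).
import math
import torch

def positional_encoding(seq_len, dim):
    # dim is assumed even; pos * div follows the sin/cos scheme of the original Transformer.
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)        # (seq_len, 1)
    div = torch.exp(torch.arange(0, dim, 2, dtype=torch.float32)
                    * (-math.log(10000.0) / dim))                        # (dim/2,)
    pe = torch.zeros(seq_len, dim)
    pe[:, 0::2] = torch.sin(pos * div)   # even dimensions
    pe[:, 1::2] = torch.cos(pos * div)   # odd dimensions
    return pe                            # position vector for each sequence position

def add_position(word_vecs):
    # word_vecs: (batch, seq_len, dim); the position vector is broadcast over the batch
    # and added to each word vector to give the embedding vector fed to the encoder.
    _, seq_len, dim = word_vecs.shape
    return word_vecs + positional_encoding(seq_len, dim).to(word_vecs.device)
```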

Embodiment 3

[0092] Embodiment 3 is an improvement on the basis of Embodiment 2. The pointer network uses a probability distribution calculation function to connect the output of the decoder and the output of the encoder in the trained Transformer encoding and decoding model, and solves, through the pointer mechanism, the problem that named entities absent from the dictionary cannot be generated, while retaining the ability to handle long-distance dependencies effectively and to compute in parallel.

[0093] The trained pointer network includes a linear transformation layer, a normalization layer and a probability distribution calculation function. The linear transformation layer is a simple fully connected neural network that projects the word vector produced by the decoder into a much larger vector called logits. Assuming that 10,000 different English words in the dictionary are learned from the training set, the logits vector is 10,000 cells long, each cell corresponding to the score of one word in the dictionary.
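A minimal sketch of such an output layer is given below. The linear transformation layer produces dictionary-sized logits and the normalization layer (softmax) turns them into a generation distribution; a copy distribution built from the encoder attention lets source-side named entities be produced by pointing. The gating formula is an assumption in the spirit of a pointer-generator network, not the patent's exact probability distribution calculation function.

```python
# Sketch of a pointer-style output layer for Embodiment 3 (gating scheme assumed).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PointerOutput(nn.Module):
    def __init__(self, dim, vocab_size):
        super().__init__()
        self.proj = nn.Linear(dim, vocab_size)   # linear transformation layer -> logits
        self.gate = nn.Linear(dim, 1)            # probability of generating vs. copying

    def forward(self, dec_state, attn_weights, src_ids):
        # dec_state:    (batch, dim)      decoder output (word vector) for the current step
        # attn_weights: (batch, src_len)  attention of the decoder over encoder outputs
        # src_ids:      (batch, src_len)  source token ids, assumed to index the same
        #                                 dictionary as the logits (extended if needed)
        gen_dist = F.softmax(self.proj(dec_state), dim=-1)   # normalization layer
        p_gen = torch.sigmoid(self.gate(dec_state))          # (batch, 1)
        # Scatter the attention weights onto the vocabulary ids of the source tokens,
        # so that rare named entities in the source can receive probability mass.
        copy_dist = torch.zeros_like(gen_dist).scatter_add(1, src_ids, attn_weights)
        return p_gen * gen_dist + (1.0 - p_gen) * copy_dist  # final word distribution
```

The word emitted at each step would then be taken from this final distribution, e.g. by argmax during greedy decoding.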


Abstract

The invention discloses an automatic abstract generation method and apparatus, an electronic device and a storage medium. The method comprises: performing calculations on the original text and the named entities in the original text based on two trained embedding vector models to obtain the first character vector and the second character vector of each word, and concatenating them to obtain the word vector of each word; encoding and decoding the word vector of each word through the trained Transformer encoding and decoding model to obtain the word vectors of multiple generated words, so that the feature representation ability of the word vectors of the multiple generated words is enhanced, and dividing each generated word into a first type of generated word or a second type of generated word; and processing the first type of generated words and the second type of generated words with a trained pointer network and a trained memory network respectively to obtain first type output words and second type output words, a plurality of first type output words and/or a plurality of second type output words forming the target abstract, so that the problem that rare named entities cannot be generated is effectively solved.

Description

Technical field

[0001] The present invention relates to the field of natural language processing, and in particular to an automatic abstract generation method and device, electronic equipment and a storage medium.

Background technique

[0002] Automatic summarization refers to automatically generating, by means of an algorithm, a short text that summarizes a given original text. At present, automatic summarization algorithms are mainly divided into extractive automatic summarization and generative automatic summarization. Extractive automatic summarization extracts, in units of sentences, one or more sentences with generalization ability from the original text as the abstract, but this method has difficulty adapting to varied original text content. Generative automatic summarization generates abstracts by mining deeper semantic information to paraphrase and summarize the central idea of the original text. The content of the abstract is ...


Application Information

IPC(8): G06F40/284; G06F40/295; G06F40/216; G06F40/242; G06F16/35; G06F40/126
CPC: G06F16/355; G06F40/126; G06F40/216; G06F40/242; G06F40/284; G06F40/295
Inventor 嵇望王伟凯郭心南董悦李舟扬钱艳安毫亿朱鹏飞梁青
Owner 杭州远传新业科技股份有限公司