
Text abstract generation method for enhancing semantic correlation

A technology for semantic correlation and summarization, applied in semantic analysis, semantic tool creation, text database querying, etc. It addresses problems such as model training difficulty, deficient encoding of language sequence information, and semantic gaps, and achieves convenient implementation, strong generalization ability, and reduced difficulty.

Inactive Publication Date: 2020-02-07
BEIJING UNIV OF TECH
6 Cites, 5 Cited by

AI Technical Summary

Problems solved by technology

However, convolutional neural networks are limited in their ability to encode language sequence information. The Transformer model proposed by Vaswani et al. in 2017 can both process language information and be trained in parallel.
In practice, however, an unlimited corpus is not available for training the model, so summaries generated by the Transformer are fluent but show a semantic gap relative to the target summary.



Examples


Embodiment Construction

[0032] The present invention will be described in further detail below in combination with specific network model examples and with reference to the accompanying drawings.

[0033] The hardware used for the present invention is one PC with one 1080 graphics card.

[0034] In this section, we conduct extensive experiments to explore the impact of the proposed method. The operation flow of the network architecture designed by the present invention is shown in Figure 1 and specifically includes the following steps:

[0035] Step 1: process the text data set by removing special symbols and processing the text according to word frequency, then build a dictionary for training. Each key in the dictionary is a word, and the value is the id of that word.

[0036] Step 2: randomly initialize the Embedding layer matrix, and select the word vector corresponding to each word according to its id in the dictionary.

[0037] Step 3, get the pre...
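
Steps 1 and 2 can be illustrated with a minimal Python sketch. This is not the patent's implementation: the whitespace tokenizer, the `<pad>` and `<unk>` reserved tokens, the `min_freq` cutoff, the embedding dimension, and all function names are assumptions made for illustration. Chinese text would need a word segmenter in place of `split()`.

```python
import random
import re
from collections import Counter

PAD, UNK = "<pad>", "<unk>"  # hypothetical reserved tokens, not named in the source


def clean(text):
    # Replace special symbols with spaces; \w keeps letters, digits, and CJK characters.
    return re.sub(r"[^\w\s]", " ", text)


def build_vocab(texts, min_freq=1):
    # Count word frequencies over the cleaned corpus (whitespace tokenization is an assumption).
    counts = Counter(tok for t in texts for tok in clean(t).split())
    vocab = {PAD: 0, UNK: 1}
    # Assign ids in order of descending frequency; drop words rarer than min_freq.
    for word, freq in counts.most_common():
        if freq >= min_freq:
            vocab[word] = len(vocab)
    return vocab  # key: word, value: id


def init_embedding(vocab_size, dim, seed=0):
    # Randomly initialized embedding matrix with one row per dictionary id.
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]


def lookup(tokens, vocab, embedding):
    # Select each word's vector by its id; unknown words fall back to UNK.
    return [embedding[vocab.get(tok, vocab[UNK])] for tok in tokens]


texts = ["the cat sat!", "the cat ran #fast"]
vocab = build_vocab(texts)
embedding = init_embedding(len(vocab), dim=4)
vectors = lookup(["the", "dog"], vocab, embedding)
```

Here `"dog"` is not in the dictionary, so it receives the `<unk>` row. In a trained model the embedding matrix would be a learnable parameter rather than a fixed list.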


Abstract

The invention discloses a text abstract generation method for enhancing semantic correlation, and belongs to the field of text generation. The method comprises the following steps: first, massive Chinese data is preprocessed, for example by denoising; the abstract is then passed through a pre-training network to obtain word vectors corresponding to the text; the article to be summarized is converted into word vectors and sent to the Encoder end of the model for feature representation; finally, at the Decoder end of the model, the similarity between the generated abstract vector and the pre-trained vector is calculated and combined with the LOSS value of the model, and the gradient calculation of the model is performed. Because the pre-training vector and the model-generated vector are combined, the burden of feature extraction on the model is reduced, the semantics of the abstract generated by the model are kept consistent with those of the original text, and the problem of generated abstracts being unrelated to the original text is solved.
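
The idea of combining a similarity term with the model's LOSS value can be sketched as follows. The abstract does not specify the similarity measure, the base loss, or how the two are combined, so the cosine similarity, the cross-entropy loss, the additive form, and the `weight` parameter below are all assumptions, as are the function names.

```python
import math


def cosine_similarity(u, v):
    # Cosine of the angle between two vectors; 0.0 if either vector is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def cross_entropy(probs, target_ids):
    # Mean negative log-likelihood of the target tokens (assumed base loss).
    eps = 1e-12  # guards against log(0)
    total = -sum(math.log(p[t] + eps) for p, t in zip(probs, target_ids))
    return total / len(target_ids)


def combined_loss(probs, target_ids, generated_vec, pretrained_vec, weight=1.0):
    # Assumed combination: base loss plus a penalty that grows as the generated
    # abstract vector drifts away from the pre-trained vector.
    base = cross_entropy(probs, target_ids)
    similarity = cosine_similarity(generated_vec, pretrained_vec)
    return base + weight * (1.0 - similarity)
```

With this form, a generated vector that points in the same direction as the pre-trained vector adds no penalty, while an orthogonal one adds the full `weight`. In actual training the same arithmetic would be written in an automatic-differentiation framework so that gradients flow through both terms.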

Description

Technical field

[0001] The invention belongs to the field of natural language generation, and in particular relates to a sequence-to-sequence method for text abstract generation.

Background technique

[0002] With the rapid development of information technology, the information explosion is affecting people's lives. On the one hand, the Internet holds a large number of web pages and texts, but content-related texts contain a great deal of redundancy, and reading through this duplicated content costs people considerable time and energy. On the other hand, social development is accelerating the pace of life, and increasingly fragmented time drives people to obtain content through the Internet instead of traditional paper materials such as books. How to extract the main content from a large amount of text has therefore become a hot topic in current academic research.

[0003] Regarding the issue of text summari...


Application Information

IPC(8): G06F16/34, G06F16/33, G06F16/36, G06F40/30, G06F40/247
CPC: G06F16/3335, G06F16/3344, G06F16/345, G06F16/374
Inventor: 刘博, 申利彬
Owner: BEIJING UNIV OF TECH