
Text simplification method based on word vector query model

A text simplification and word-vector technology, applied to biological neural network models, instruments, computing, etc. It addresses the problems of long training time, low text-simplification efficiency, and slow model training and convergence; its effects include improved quality and accuracy of the generated text, reduced training time and memory usage, and a reduced number of parameters.

Active Publication Date: 2018-03-27
PEKING UNIV

AI Technical Summary

Problems solved by technology

However, the traditional sequence-to-sequence algorithm generates words in the decoder only by mapping the hidden-layer representation to the vocabulary through a single large matrix multiplication, so the semantics of the words are not fully utilized. Moreover, because the vocabulary is generally large, this mapping matrix gives the entire network a huge number of parameters, resulting in slower model training and convergence, longer training time, greater memory usage, and inefficient text simplification.
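The parameter cost of that large output matrix can be illustrated with a toy calculation. This is an illustrative sketch, not the patent's implementation; the vocabulary size V and hidden size d below are assumed values chosen only for the example.

```python
# Parameter cost of a softmax output projection versus reusing word vectors
# as the output layer. Sizes are assumed for illustration only.
V, d = 50000, 512  # hypothetical vocabulary size and hidden size

# Standard seq2seq decoder: a separate projection matrix maps the hidden
# state to vocabulary logits, adding V * d parameters.
proj_params = V * d

# Word-vector-query decoder: next-word scores are computed against the
# existing word-vector table, so the output layer adds no new parameters.
query_params = 0

saved = proj_params - query_params
print(saved)  # 25600000 parameters avoided in this toy configuration
```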

Method used



Examples


Embodiment Construction

[0033] The present invention is further described below through embodiments in conjunction with the accompanying drawings, without limiting the scope of the invention in any way.

[0034] The invention provides a method for generating simplified text based on a word-vector retrieval model. Figure 1 is a flow diagram of the method provided by the present invention, and Figure 2 is a schematic diagram of its specific implementation. The generation step of the classic sequence-to-sequence algorithm is improved so that the target output is generated by querying word vectors; the model is then trained by minimizing the negative log likelihood of the gold-standard words under the predicted distribution, thereby generating the complete simplified text.
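The "query word vectors to generate" step can be sketched as follows. This is a minimal NumPy illustration under assumed toy dimensions, not the patented implementation: it scores the decoder's hidden state against every word vector with a dot product and applies a softmax to obtain a next-word distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 10, 4                 # toy vocabulary size and hidden size (assumed)
E = rng.normal(size=(V, d))  # word-vector table shared with the input side
h = rng.normal(size=d)       # decoder hidden state at the current step

# Attention-style query: relevance of each vocabulary word to the state.
scores = E @ h
probs = np.exp(scores - scores.max())
probs /= probs.sum()         # softmax over the vocabulary

next_word = int(np.argmax(probs))  # greedy choice of the next output word
```

Because the scores are taken directly against the word-vector table, the semantics encoded in the embeddings participate in the output decision, which is the motivation stated in the description above.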

[0035] The following embodiments take simplified text from Wikipedia as an example; the original text is as follows:

[0036] “Depending on the context, another closely-related meaning of constituent ...



Abstract

The invention provides a text simplification method based on a word vector query model. Building on a sequence-to-sequence model, during decoding an attention mechanism is used to compute the correlation between the decoder's hidden state and the word vectors of the entire vocabulary, which serves as a measure of how likely each word is to be generated next. The method comprises the following steps: a text encoder is designed to compress the original text; a text-simplification decoding generator is designed to iteratively compute the current hidden-layer vector and the context vector at each time step; the retrieval correlation of each word in the vocabulary is computed and the predicted word at the current moment is output, yielding a complete simplified text; the generation model is trained by minimizing the negative log likelihood of the actual target words under the predicted distribution; after training, the complete simplified text is generated. The method improves the quality and accuracy of the generated text, greatly reduces the number of parameters compared with existing methods, and reduces training time and memory usage.
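The steps in the abstract can be mocked up end-to-end in a few lines. The sketch below is a deliberately simplified stand-in (a mean-pooled "encoder" and a toy recurrence in place of a real RNN and attention), with all sizes and data assumed; it only shows how each decoding step queries the word-vector table and how the negative log likelihood of the gold words is accumulated as the training loss.

```python
import numpy as np

rng = np.random.default_rng(1)
V, d, T = 8, 4, 3                  # toy vocab size, hidden size, output length

E = rng.normal(size=(V, d)) * 0.1  # shared word-vector table

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Step 1: the encoder compresses the original text (here: mean word vector).
src = [2, 5, 1]
enc = E[src].mean(axis=0)

# Steps 2-4: at each moment compute hidden and context vectors, query the
# word-vector table for each word's retrieval correlation, output a word.
h, loss, out = enc.copy(), 0.0, []
targets = [3, 0, 7]                # gold simplified words (toy data)
for t in range(T):
    context = enc                  # stand-in for an attention context vector
    h = np.tanh(h + context)       # stand-in for the decoder recurrence
    probs = softmax(E @ h)         # correlation of h with every word vector
    out.append(int(np.argmax(probs)))
    # Step 5: training minimizes the negative log likelihood of gold words.
    loss += -np.log(probs[targets[t]])
```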

Description

Technical field

[0001] The invention belongs to the technical field of natural language processing and relates to a text simplification method, in particular to a text simplification method based on a word vector query model.

Background technique

[0002] Many existing text simplification algorithms use sequence-to-sequence generative models. These algorithms are based on deep learning and evolved from neural machine translation: by observing a large-scale training corpus mapping source language to target language, they can simplify text automatically after a period of training. However, because the traditional sequence-to-sequence algorithm generates words in the decoder only by mapping the hidden-layer representation to the vocabulary through a single large matrix multiplication, the semantics of the words are not fully utilized. Moreover, this mapping matrix gives the entire network a huge number of parameters (the vocabulary is generally large), resulting in slower model training and convergence, longer training time, and greater memory usage.

Claims


Application Information

IPC(8): G06F17/24; G06F17/30; G06N3/04
CPC: G06F16/332; G06F40/186; G06N3/04
Inventor: 孙栩 (Xu Sun), 马树铭 (Shuming Ma), 李炜 (Wei Li)
Owner PEKING UNIV