Text simplification method based on word vector query model

A text simplification technique based on word vectors, applied in fields such as biological neural network models, instruments, and computing. It addresses slow model training and convergence, long training time, and low text simplification efficiency, with the aim of improving the quality and accuracy of the generated text while reducing the number of parameters, the training time, and the memory usage.

Active Publication Date: 2018-03-27

PEKING UNIV

1 Cites, 40 Cited by


Problems solved by technology

The traditional sequence-to-sequence algorithm generates each word in the decoder by mapping the hidden-layer representation onto the vocabulary through a single large matrix multiplication, so the semantics of the words are not fully utilized.

Moreover, because the vocabulary is generally large, this mapping matrix gives the network a very large number of parameters. The result is slower training and convergence, longer training time, higher memory consumption, and therefore inefficient text simplification.
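
To make the parameter argument concrete, the sketch below compares the size of a dense hidden-to-vocabulary output layer with the size of a scorer that queries existing word vectors. The bilinear form of the scorer, the function names, and all sizes are illustrative assumptions; the patent text here does not state them.

```python
def projection_output_params(hidden_size: int, vocab_size: int) -> int:
    """Weights plus biases of a dense hidden-to-vocabulary output layer."""
    return hidden_size * vocab_size + vocab_size


def query_output_params(hidden_size: int, embed_size: int) -> int:
    """Weights of a hypothetical bilinear scorer q^T W e.

    Assumption: the scorer reuses the word embedding table, so only the
    matrix W (hidden_size x embed_size) is counted as extra parameters.
    """
    return hidden_size * embed_size


# Illustrative sizes only; they are not taken from the patent.
hidden_size, embed_size, vocab_size = 256, 256, 50_000

dense = projection_output_params(hidden_size, vocab_size)
query = query_output_params(hidden_size, embed_size)

print(f"dense output layer: {dense:,} parameters")
print(f"word vector query:  {query:,} parameters")
```

Under these assumed sizes the dense layer grows linearly with the vocabulary, while the query scorer does not depend on the vocabulary size at all.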

Embodiment Construction

[0033] The invention is further described below through embodiments, with reference to the accompanying drawings. The embodiments do not limit the scope of the invention in any way.

[0034] The invention provides a method for generating simplified text based on a word vector retrieval model. Figure 1 is a block flow diagram of the method provided by the invention, and Figure 2 is a schematic diagram of a specific implementation. The method improves the generation step of the classic sequence-to-sequence algorithm so that the target output is produced by querying word vectors. The model is then trained by minimizing the negative log-likelihood of the reference words under the predicted word distribution, after which it generates the complete simplified text.

[0035] The following embodiment takes a Wikipedia text to be simplified as an example. The original text is as follows:
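
The training objective in [0034] can be illustrated with a minimal sketch of the negative log-likelihood over one target sequence. The function name and the toy probabilities are hypothetical; the sketch assumes that each decoding step already yields a probability distribution over the vocabulary.

```python
import math


def negative_log_likelihood(step_probs, target_ids):
    """Mean negative log-likelihood of the reference word ids.

    step_probs[t][i] is the predicted probability of word i at step t;
    target_ids[t] is the index of the reference word at step t.
    """
    total = 0.0
    for probs, target in zip(step_probs, target_ids):
        total -= math.log(probs[target])
    return total / len(target_ids)


# Toy distributions over a 3-word vocabulary for 2 decoding steps.
step_probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
]

good = negative_log_likelihood(step_probs, [0, 1])  # references get high probability
bad = negative_log_likelihood(step_probs, [2, 0])   # references get low probability
print(good, bad)
```

The loss is lower when the model assigns high probability to the reference words, which is why training minimizes it.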

[0036] “Depending on the context, another closely-related meaning of constituent ...

Abstract

The invention provides a text simplification method based on a word vector query model. Building on a sequence-to-sequence model, the method borrows from the attention mechanism during decoding: the correlation between the hidden state of the decoder and the word vectors of all words in the vocabulary is computed and used as a measure of how likely each word is to be generated at the next step. The method includes the following steps: a text encoder is designed, and the original text is compressed; a text simplification decoding generator is designed, and the current hidden-layer vector and the context vector are computed recurrently at every time step; the retrieval correlation of each word in the vocabulary is obtained, the predicted word at the current time step is output, and a complete simplified text is obtained; the model for generating the simplified text is trained by minimizing the negative log-likelihood of the actual target words under the predicted word distribution; after training, the complete simplified text is generated. The method can improve the quality and accuracy of the generated text, greatly reduce the number of parameters compared with existing methods, and reduce the training time and the memory usage.
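
The word query step described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the bilinear score between the decoder state and each word vector, the function name, and the random toy values are all assumptions, since the abstract only says that the correlation is obtained with reference to the attention mechanism.

```python
import math
import random


def query_word_distribution(hidden, embeddings, w_query):
    """Score every word vector against the decoder state, then apply softmax.

    hidden:     decoder hidden state, length d_h
    embeddings: one word vector per vocabulary entry, each of length d_e
    w_query:    hypothetical d_h x d_e matrix mapping the state into
                the embedding space (assumed bilinear scoring)
    """
    embed_size = len(embeddings[0])
    # query[j] = sum_i hidden[i] * w_query[i][j]
    query = [
        sum(h * row[j] for h, row in zip(hidden, w_query))
        for j in range(embed_size)
    ]
    # Correlation of the query with each word vector (dot product).
    scores = [sum(q * e for q, e in zip(query, emb)) for emb in embeddings]
    # Numerically stable softmax over the vocabulary.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


rng = random.Random(0)
vocab = ["the", "cat", "sat", "down"]  # toy vocabulary
d_h, d_e = 6, 8                        # illustrative sizes
embeddings = [[rng.gauss(0, 1) for _ in range(d_e)] for _ in vocab]
w_query = [[rng.gauss(0, 1) for _ in range(d_e)] for _ in range(d_h)]
hidden = [rng.gauss(0, 1) for _ in range(d_h)]

probs = query_word_distribution(hidden, embeddings, w_query)
predicted = vocab[probs.index(max(probs))]
print(predicted, probs)
```

Because the scores are computed against the word vectors themselves, the only output-side weights in this sketch are those of `w_query`, whose size does not depend on the vocabulary.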

Description

Technical field

[0001] The invention belongs to the technical field of natural language processing and relates to a text simplification method, in particular to a text simplification method based on a word vector query model.

Background

[0002] Many existing text simplification algorithms use sequence-to-sequence generative models. These algorithms are based on deep learning and evolved from neural machine translation algorithms. Given a large-scale training corpus of source-language to target-language pairs, they can simplify text automatically after a period of training. However, because the traditional sequence-to-sequence algorithm only maps the hidden-layer representation onto the vocabulary through a large matrix multiplication when generating words in the decoder, the semantics of the words are not fully utilized. Moreover, mapping through a large matrix makes the entire network structure use a h...

Application Information

IPC(8): G06F17/24, G06F17/30, G06N3/04

CPC: G06F16/332, G06F40/186, G06N3/04

Inventor: 孙栩, 马树铭, 李炜

Owner: PEKING UNIV
