
Omission recovery method for short text understanding

An omission recovery method for short text, falling under electrical digital data processing, special data-processing applications, and instruments; it addresses the problems of the model losing source-sequence structure information and of low exact-match accuracy.

Active Publication Date: 2019-11-08
SUZHOU UNIV

AI Technical Summary

Problems solved by technology

[0058] In the seq2seq model proposed by Zheng et al., the structural information of the sequence to be recovered comes from the Bi-LSTM encoding. The seq2seq model uses maximum likelihood estimation (MLE) to learn the semantic structure of the sequence to be recovered only implicitly, so the model loses part of the source-sequence structure information at prediction time, resulting in a low exact-match accuracy.
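To make the objection concrete: MLE training under teacher forcing scores each output token independently against the gold token, with no explicit term tying the prediction to the source-sequence structure. A minimal sketch of this per-token negative log-likelihood objective (toy vocabulary and probabilities, not the cited model):

```python
import math

def sequence_nll(step_probs, target_ids):
    """Negative log-likelihood of a target sequence under per-step token
    distributions -- the MLE objective used to train seq2seq models.
    Each step is scored independently against its gold token."""
    return -sum(math.log(p[t]) for p, t in zip(step_probs, target_ids))

# Toy decoder outputs: a distribution over a 3-word vocabulary at each step.
probs = [
    [0.7, 0.2, 0.1],   # step 1
    [0.1, 0.8, 0.1],   # step 2
]
target = [0, 1]        # gold token ids
loss = sequence_nll(probs, target)   # -(ln 0.7 + ln 0.8)
```

Nothing in this loss rewards preserving the source sequence's structure as a whole, which is the gap the patent's method aims to close.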



Examples


Embodiment Construction

[0100] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments, so that those skilled in the art can better understand the present invention and implement it, but the examples given are not intended to limit the present invention.

[0101] The omission recovery model used in the present invention is a neural network framework based on an encoder and a decoder. The model is divided into three main parts: an embedding layer, an encoding layer, and a decoding layer. The embedding layer obtains distributed representations of discrete words; the encoding layer mines features of the text; the decoding layer uses the features extracted by the encoding layer to generate the omission-completed result. Each part is introduced separately below.
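The three-layer pipeline described above can be sketched end to end as follows. This is a minimal illustration only: the linear encoder and greedy decoder are placeholders for the patent's actual (recurrent/attention-based) components, and all dimensions are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, EMB, HID = 10, 4, 4   # illustrative sizes, not the patent's

# Embedding layer: lookup table mapping discrete word ids to dense vectors.
E = rng.standard_normal((VOCAB, EMB))

def embed(token_ids):
    return E[token_ids]                      # (seq_len, EMB)

# Encoding layer: stand-in feature extractor (one linear map plus tanh;
# the patent's encoder would be a Bi-LSTM / attention-based module).
W_enc = rng.standard_normal((EMB, HID))

def encode(x):
    return np.tanh(x @ W_enc)                # (seq_len, HID)

# Decoding layer: projects encoder features to vocabulary scores and
# greedily picks a token per position (placeholder for real decoding).
W_dec = rng.standard_normal((HID, VOCAB))

def decode(h):
    logits = h @ W_dec                       # (seq_len, VOCAB)
    return logits.argmax(axis=-1)

tokens = np.array([1, 5, 3])                 # toy input word ids
output = decode(encode(embed(tokens)))       # omission-completed ids
```

The point of the sketch is the dataflow — ids → embeddings → features → output ids — not the specific transforms, which each layer's description below refines.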

[0102] Embedding layer

[0103] The main function of the embedding layer (Embedding) is to map discrete word units to a low-dimensional semantic space, a...
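A concrete view of this mapping: an embedding layer is a lookup table, and selecting row i of the embedding matrix is mathematically the same as multiplying a one-hot vector by that matrix. A toy sketch with an illustrative 5-word vocabulary and a 3-dimensional semantic space (values are made up):

```python
import numpy as np

# Embedding matrix: one row per vocabulary word, one column per dimension
# of the semantic space. Illustrative values only.
E = np.arange(15, dtype=float).reshape(5, 3)   # (vocab=5, dim=3)

word_id = 2
one_hot = np.eye(5)[word_id]   # discrete, sparse representation

lookup = E[word_id]            # row selection: the distributed representation
matmul = one_hot @ E           # identical result via one-hot multiplication
```

In practice the lookup form is used because it avoids materializing the sparse one-hot vectors.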


Abstract

The invention discloses an omission recovery method for short text understanding. The model is divided into three main parts: an embedding layer, an encoding layer, and a decoding layer. The embedding layer is used to obtain distributed representations of discrete words; the encoding layer is used to mine features of the text; and the decoding layer generates the omission-completed result using the features extracted by the encoding layer. The method has the beneficial effects that the given model assumes any word in the sentence may be omitted, the structure information of the to-be-recovered sequence is fully considered during both training and prediction, and the ill-formed-sentence problem caused by seq2seq is effectively alleviated. In addition, for short text understanding, the model fuses a cross-attention mechanism and a self-attention mechanism and, compared with a seq2seq model, can extract more and deeper text features.
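The abstract does not specify how the two attention streams are combined, so the fusion step below (concatenating self-attention and cross-attention features) is an assumption for illustration only. A minimal numpy sketch of scaled dot-product attention applied both within the to-be-recovered sequence and against a context sequence:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention with a numerically stable softmax."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
d = 4                               # illustrative feature dimension
x = rng.standard_normal((3, d))     # sequence to be recovered (3 tokens)
ctx = rng.standard_normal((5, d))   # context sequence (e.g. prior dialogue)

self_feat = attention(x, x, x)        # self-attention within the sequence
cross_feat = attention(x, ctx, ctx)   # cross-attention against the context
fused = np.concatenate([self_feat, cross_feat], axis=-1)   # (3, 2*d)
```

Self-attention captures structure inside the elliptical sentence itself, while cross-attention pulls in the shared context the omitted words refer to; the fused features feed the decoder.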

Description

Technical field

[0001] The invention relates to the field of natural language processing, in particular to an omission recovery method for short text understanding.

Background technique

[0002] As a linguistic phenomenon, ellipsis is ubiquitous in natural languages. According to statistics, about 96% of subjects in English are explicit, while only about 64% of subjects in Chinese are explicit; ellipsis is therefore more frequent in Chinese than in English. In addition, in short-text application scenarios such as dialogue systems, the two parties share the same dialogue background, and some information is shared across multiple rounds of dialogue, so ellipsis is especially common in dialogue. Similarly, in question answering systems, consecutive questions are correlated with one another, and follow-up questions are also correlated with the answers to previous questions. They may share some informatio...

Claims


Application Information

IPC(8): G06F17/27, G06F17/22
CPC: Y02D10/00
Inventors: 孔芳 (Kong Fang), 郑杰 (Zheng Jie), 周国栋 (Zhou Guodong)
Owner: SUZHOU UNIV