Sentence alignment method based on a deep neural network

A deep neural network technique, applied to neural-network-based sentence alignment, that addresses problems such as the loss of word-matching information and the loss of sentence context information.

Status: Inactive
Publication Date: 2018-12-21
SUZHOU UNIV
Cites: 10. Cited by: 21.

AI Technical Summary

Problems solved by technology

Computing the similarity of word pairs only through word embedding may lose the context information of the sentence, and judging the sentence al

Examples


Embodiment 1

[0070] Figure 3 is the structural diagram of the deep neural network sentence alignment method (Bi-RNN+GRN+CNN) of the present invention, and Figure 9 gives the specific flow chart. The method comprises the following steps:

[0071] 1) Corpus preprocessing: generate the vocabulary and the word-embedding vocabulary from the training corpus;

[0072] 2) Word embedding layer: for each word in the sentence, look up its word embedding in the word embedding table. The bilingual word embeddings provided by the reference paper [Note 1] represent each word as a vector, so that similar words have similar representations;

[0073] 3) A bidirectional recurrent neural network layer encodes the sentences, considering not only the semantic information of each word itself but also the context information of ...
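
Steps 1) and 2) above can be illustrated with a minimal sketch. It assumes whitespace tokenization, a hypothetical `<unk>` token for out-of-vocabulary words, and a toy embedding table with made-up values; the patent itself uses the bilingual word embeddings of its reference paper, which are not reproduced here. The names `build_vocab` and `embed_sentence` are illustrative only.

```python
from collections import Counter

UNK = "<unk>"  # hypothetical token for out-of-vocabulary words (assumption)


def build_vocab(corpus, min_count=1):
    """Map each word seen at least min_count times to an integer id."""
    counts = Counter(word for sentence in corpus for word in sentence.split())
    vocab = {UNK: 0}
    for word, count in sorted(counts.items()):
        if count >= min_count:
            vocab[word] = len(vocab)
    return vocab


def embed_sentence(sentence, vocab, table):
    """Look up one embedding vector per word; unknown words share the UNK row."""
    return [table[vocab.get(word, vocab[UNK])] for word in sentence.split()]


# Toy training corpus (assumption: already tokenized by whitespace).
corpus = ["the cat sleeps", "the dog runs"]
vocab = build_vocab(corpus)

# Toy 2-dimensional embedding table, one row per vocabulary id (made-up values).
table = [[0.1 * i, -0.1 * i] for i in range(len(vocab))]

# "bird" is not in the vocabulary, so it falls back to the UNK row.
vectors = embed_sentence("the bird sleeps", vocab, table)
```

In practice the table would be loaded from pretrained embeddings rather than generated, and the source and target languages would each have their own vocabulary.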

Embodiment 2

[0080] Figure 6 shows the bidirectional recurrent neural network model (Bi-RNN), another embodiment of the present invention. In this embodiment, the sentence alignment method based on a deep neural network comprises the following steps:

[0081] 1) Corpus preprocessing: generate the vocabulary and the word-embedding vocabulary from the training corpus;

[0082] 2) Generate a word embedding layer: for each word in the sentence, look up its word embedding in the word embedding table;

[0083] 3) A bidirectional recurrent neural network layer encodes the sentences, considering not only the semantic information of each word itself but also the word's context information, so that each word obtains a hidden state containing its context information. The hidden states of the words in each sentence are averaged to obtain the sentence vector, and the two sentence vectors are then concatenated to obtain v_r;

[0084] 4) Multi-layer perceptron layer, input the re...
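
Step 3) above (bidirectional encoding, averaging, and concatenation into v_r) can be sketched as follows. This is a rough illustration under several assumptions that the text does not confirm: a plain tanh recurrent cell instead of whatever cell the patent uses, random untrained weights, separate parameters for the source and target sentences, and toy dimensions. All function and parameter names are hypothetical.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Random weight matrix with small values (untrained, for illustration)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def rnn_pass(inputs, w_in, w_rec):
    """Run a plain tanh RNN over inputs and return one hidden state per step."""
    hidden = [0.0] * len(w_rec)
    states = []
    for x in inputs:
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[j], x))
                + sum(w * hi for w, hi in zip(w_rec[j], hidden))
            )
            for j in range(len(w_rec))
        ]
        states.append(hidden)
    return states


def encode_sentence(embeddings, params):
    """Average the concatenated forward/backward hidden states of all words."""
    forward = rnn_pass(embeddings, params["fw_in"], params["fw_rec"])
    # Read the sentence right to left, then restore the original word order.
    backward = rnn_pass(embeddings[::-1], params["bw_in"], params["bw_rec"])[::-1]
    word_states = [f + b for f, b in zip(forward, backward)]
    size = len(word_states[0])
    return [sum(s[k] for s in word_states) / len(word_states) for k in range(size)]


def make_params(embed_dim, hidden_dim, rng):
    return {
        "fw_in": make_matrix(hidden_dim, embed_dim, rng),
        "fw_rec": make_matrix(hidden_dim, hidden_dim, rng),
        "bw_in": make_matrix(hidden_dim, embed_dim, rng),
        "bw_rec": make_matrix(hidden_dim, hidden_dim, rng),
    }


rng = random.Random(0)
EMBED_DIM, HIDDEN_DIM = 4, 3  # toy sizes (assumption)
src_params = make_params(EMBED_DIM, HIDDEN_DIM, rng)
tgt_params = make_params(EMBED_DIM, HIDDEN_DIM, rng)

# Stand-ins for the word embeddings of a 5-word source and 6-word target sentence.
src_embeddings = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(5)]
tgt_embeddings = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(6)]

v_src = encode_sentence(src_embeddings, src_params)
v_tgt = encode_sentence(tgt_embeddings, tgt_params)
v_r = v_src + v_tgt  # list concatenation of the two sentence vectors
```

Each sentence vector has length 2 × HIDDEN_DIM because every word's forward and backward states are joined before averaging, so v_r has length 4 × HIDDEN_DIM regardless of the sentence lengths.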

Embodiment 3

[0088] Figure 7 shows the bidirectional recurrent neural network + convolutional neural network model (Bi-RNN+CNN), another embodiment of the present invention. In this embodiment, the sentence alignment method based on a deep neural network comprises the following steps:

[0089] 1) Corpus preprocessing: generate the vocabulary and the word-embedding vocabulary from the training corpus;

[0090] 2) Generate a word embedding layer: for each word in the sentence, look up its word embedding in the word embedding table;

[0091] 3) A bidirectional recurrent neural network layer encodes the sentences, considering not only the semantic information of each word itself but also the word's context information, so that each word obtains a hidden state containing its context information. The hidden states of the words in each sentence are averaged to obtain the sentence vector, and the two sentence vectors are then concatenated to obtain v_r;

[0092]...


Abstract

The invention relates to a sentence alignment method based on a deep neural network, which comprises the following steps: preprocessing the corpus to generate a vocabulary and a word-embedding vocabulary; and encoding the sentences with a bidirectional recurrent neural network layer that considers not only the semantic information of each word itself but also the word's context information, so that each word obtains a hidden state containing its context information. The hidden states of the words in each sentence are averaged to obtain sentence vectors, and the two sentence vectors are then concatenated. A perceptron layer is used to obtain a more abstract representation and determine whether the sentences are aligned. In addition, the word hidden states obtained by the bidirectional recurrent neural network encoding contain not only each word's own meaning but also its context information.
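
The perceptron layer mentioned above can be illustrated with a small sketch. It assumes a single tanh hidden layer, a sigmoid output, and a 0.5 decision threshold, none of which the abstract specifies; the weights are made-up values and `alignment_probability` is a hypothetical name.

```python
import math


def dense(vector, weights, biases):
    """Affine layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, vector)) + b for row, b in zip(weights, biases)]


def alignment_probability(pair_vector, hidden_w, hidden_b, out_w, out_b):
    """Hidden tanh layer followed by a sigmoid output (assumed architecture)."""
    hidden = [math.tanh(z) for z in dense(pair_vector, hidden_w, hidden_b)]
    logit = dense(hidden, [out_w], [out_b])[0]
    return 1.0 / (1.0 + math.exp(-logit))


# Toy concatenated sentence-pair vector and made-up weights (assumptions).
pair_vector = [0.2, -0.1, 0.4, 0.3]
hidden_w = [[0.5, -0.3, 0.8, 0.1], [-0.2, 0.4, 0.1, 0.7]]
hidden_b = [0.0, 0.1]
out_w = [1.2, -0.7]
out_b = 0.05

probability = alignment_probability(pair_vector, hidden_w, hidden_b, out_w, out_b)
is_aligned = probability >= 0.5  # hypothetical decision threshold
```

In a trained model the weights would be learned from labeled aligned and non-aligned sentence pairs, and the threshold could be tuned on held-out data.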

Description

Technical field

[0001] The invention relates to a sentence alignment method based on a neural network.

Background

[0002] Parallel corpora are an extremely important resource for many natural language processing tasks. Tasks such as machine translation, cross-language information retrieval, and bilingual dictionaries all require the support of parallel corpora. The sentence alignment task is to extract parallel sentence pairs that are translations of each other from two documents in different languages and to use them to expand the parallel corpus, thereby addressing the problem of small parallel corpora.

[0003] Early research on sentence alignment was mainly based on feature matching, which focused only on the surface information of the bilingual sentences, that is, judging whether the sentences are aligned according to the length relationship between the two sentences. Then, according to the relationship between word pairs...
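
The length-based feature matching described in [0003] can be pictured with a deliberately simple heuristic. This sketch is not the method of any particular prior system: the word-count ratio and the `low` and `high` bounds are assumptions chosen only for illustration.

```python
def length_ratio_match(src, tgt, low=0.5, high=2.0):
    """Accept a sentence pair when the target/source word-count ratio is in range."""
    src_len, tgt_len = len(src.split()), len(tgt.split())
    if src_len == 0 or tgt_len == 0:
        return False
    return low <= tgt_len / src_len <= high


similar_lengths = length_ratio_match("a b c", "x y z w")
very_different_lengths = length_ratio_match("a", "x y z w v")
```

A check like this looks only at surface length, which is why it cannot distinguish a true translation from an unrelated sentence of similar length.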


Application Information

IPC(8): G06F17/27, G06N3/04
CPC: G06F40/211, G06N3/045
Inventors: 丁颖, 李军辉, 周国栋
Owner SUZHOU UNIV