
Sentence relationship-based word vector training method

A sentence relationship-based training method, applied in the field of deep learning and natural language. It addresses problems such as word vectors being unable to express the polysemy of Chinese words and the mapping relationship between word vectors not being fully expressed, and it achieves a fuller expression of that mapping relationship.

Active Publication Date: 2020-02-28
SUN YAT SEN UNIV
2 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0005] To overcome the problems in the above-mentioned prior art, namely that word vectors cannot express the ambiguity (polysemy) of Chinese words and that the mapping relationship between word vectors cannot be fully expressed, the present invention provides a word vector training method based on sentence relationships. The method incorporates the inter-sentence relationship of Chinese sentences into training, and it computes the K, Q, and V matrices of the self-attention algorithm with a nonlinear neural-network method, so that the ambiguity of Chinese words and the matrix mapping relationship of word vectors are both expressed better.



Examples


Embodiment

[0035] As shown in Figure 1, an embodiment of the word vector training method based on sentence relationships comprises the following steps:

[0036] Step 1: Extract all the words in the training data set and number them so that each word corresponds to one number, creating a word list.

[0037] Step 2: Use several sentence groups as training samples. Each sentence group consists of two sentences and the relationship between them. Replace each word in the sentences with its number from the word list, and convert the relationship between the two sentences into a numeric label. Insert a symbol [seq] marking the sentence boundary between the two sentences, and insert a classification symbol [CLS] representing the task type at the beginning of the two sentences.

[0038] Step 3: After the words of the sentences have been numbered, feed in the corresponding word embedding vectors and set a multi-dimensional vector representation for each word, including a word position relationship vector, s...
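Steps 1 and 2 above can be illustrated with a minimal sketch. It is not the patent's implementation: the relation names in `RELATION_LABELS`, the helper names `build_vocab` and `encode_group`, the choice to reserve word-list numbers for [CLS] and [seq], and the toy pre-segmented sentences are all assumptions made for illustration.

```python
# Hypothetical relation labels; the source does not list the actual label set.
RELATION_LABELS = {"unrelated": 0, "related": 1}

CLS, SEQ = "[CLS]", "[seq]"


def build_vocab(sentence_groups):
    """Step 1: assign one integer number to every distinct word in the data."""
    # Assumption: the two special symbols also receive numbers in the word list.
    vocab = {CLS: 0, SEQ: 1}
    for first, second, _relation in sentence_groups:
        for word in first + second:
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def encode_group(first, second, relation, vocab):
    """Step 2: [CLS] + first sentence + [seq] + second sentence, plus a numeric label."""
    tokens = [CLS] + first + [SEQ] + second
    ids = [vocab[token] for token in tokens]
    return ids, RELATION_LABELS[relation]


# Toy sentence groups; word segmentation is assumed to have been done already.
groups = [
    (["我", "喜欢", "苹果"], ["苹果", "很", "甜"], "related"),
    (["我", "喜欢", "苹果"], ["今天", "下雨"], "unrelated"),
]
vocab = build_vocab(groups)
encoded = [encode_group(a, b, r, vocab) for a, b, r in groups]
```

Each entry of `encoded` pairs a list of word numbers with the numeric relation label, which is the form of training sample that Step 2 describes.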



Abstract

The invention relates to a sentence relationship-based word vector training method, which comprises the following steps: in the first stage of pre-training, adding the inter-sentence relationship of Chinese sentences to train a model, and calculating the matrices K, Q and V in a self-attention algorithm using a nonlinear neural-network calculation. By combining the language characteristics of Chinese, the method can better express the ambiguity of words. Because the matrices K, Q and V in the self-attention algorithm are calculated with a nonlinear neural-network method, the mapping relationship between vectors can be expressed more fully.
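The idea of computing K, Q and V nonlinearly can be sketched as follows. The text available here does not give the exact form of the nonlinearity, so this example assumes a single dense layer with a tanh activation for each projection, and it assumes the usual scaled dot-product form for the attention itself. The function names, dimensions, and random initialisation are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def dense_tanh(x, w, bias):
    """One dense layer followed by tanh: the assumed nonlinear projection."""
    return [[math.tanh(v + b) for v, b in zip(row, bias)] for row in matmul(x, w)]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, params):
    """Scaled dot-product self-attention with nonlinear Q, K, V projections."""
    q = dense_tanh(x, *params["q"])
    k = dense_tanh(x, *params["k"])
    v = dense_tanh(x, *params["v"])
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / scale for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights


def init_params(d_model, d_head, rng):
    """Random weights and zero biases for the three projections (illustrative only)."""
    def layer():
        w = [[rng.uniform(-0.5, 0.5) for _ in range(d_head)] for _ in range(d_model)]
        return w, [0.0] * d_head
    return {"q": layer(), "k": layer(), "v": layer()}


rng = random.Random(0)
x = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]  # 3 tokens, d_model = 4
output, weights = self_attention(x, init_params(4, 2, rng))
```

A purely linear variant would drop the `math.tanh` call in `dense_tanh`; the sketch differs from that baseline only in the activation applied to the three projections.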

Description

Technical field

[0001] The present invention relates to the field of deep learning and natural language, and more specifically to a word vector training method based on sentence relationships.

Background technique

[0002] In natural language processing, especially in tasks that apply deep learning, words are first converted into tokens (that is, each word in the word list is given a numeric label, one label per word), and each word must then be represented as a vector. A word vector is a multi-dimensional vector. The goal of word vectors is to better express the relationships between words and the ambiguity of words. (Ambiguity refers to a word having different meanings in different texts.)

[0003] The earliest word vectors were trained by three methods: the neural network language model (NNLM), word2vec, and GloVe, but the word vectors obtained by the three methods cannot reflect ...


Application Information

IPC(8): G06F40/211, G06F40/284, G06F16/33, G06F16/35, G06N3/04, G06N3/08
CPC: G06F16/3344, G06F16/35, G06N3/08, G06N3/045, Y02D10/00
Inventor: 谢梓莹, 潘嵘
Owner: SUN YAT SEN UNIV