
A method of generating a word vector from a multi-task model

A multi-task model and word vector technology, applied to neural learning methods, biological neural network models, and special data-processing applications. It addresses the problem that word vectors carry limited information, and achieves strong generalization ability, high efficiency, and improved quality and versatility.

Active Publication Date: 2019-02-12
SUN YAT SEN UNIV

AI Technical Summary

Problems solved by technology

Moreover, because such training lacks supervision and guidance, it differs considerably from how humans learn words, so the information carried by the resulting word vectors is limited.




Embodiment Construction

[0026] The accompanying drawings are for illustration only and should not be construed as limiting this patent. To better illustrate this embodiment, certain components in the drawings are omitted, enlarged, or reduced, and do not represent the dimensions of the actual product. Those skilled in the art will understand that some well-known structures and their descriptions may be omitted from the drawings. The positional relationships depicted in the drawings are likewise for illustration only and should not be construed as limiting this patent.

[0027] In step S1, the present invention requires a large text corpus as training data, mainly open-source Chinese Wikipedia data. The unsupervised tasks can be trained directly on the unlabeled data, while the part-of-speech annotations for the Wikipedia data can be obtained by running existing open-source tagging tools over it, and...
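The step-S1 data preparation described above can be sketched as follows. This is a hedged, minimal illustration: `simple_pos_tag` is a toy stand-in for the open-source part-of-speech tagger the patent refers to (in a Chinese setting this might be a tool such as jieba's `posseg` module), and the tiny English corpus is illustrative, not from the patent.

```python
# Toy sketch of step S1: the same corpus feeds both the unsupervised
# task (raw, unlabeled text) and the POS task (auto-annotated text).
corpus = [
    "the model learns word vectors from text",
    "unsupervised tasks need no labels",
]

# Unsupervised task: the raw tokenised sentences ARE the training data.
unsupervised_data = [sentence.split() for sentence in corpus]

def simple_pos_tag(token):
    """Toy tagger with crude suffix rules; for illustration only,
    NOT the open-source tool the patent actually uses."""
    if token.endswith("s"):
        return "NOUN"
    if token.endswith("ed") or token.endswith("es"):
        return "VERB"
    return "X"

# POS task: auto-annotate the same corpus to obtain (token, tag) pairs.
pos_data = [[(t, simple_pos_tag(t)) for t in sent] for sent in unsupervised_data]
print(pos_data[1][0])  # ('unsupervised', 'VERB')
```

The key design point is that no manual labeling is needed: the unsupervised task consumes the text as-is, and the supervised POS task gets its labels from an automatic tagger.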



Abstract

The present invention relates to the technical field of natural language processing within computing and, more particularly, to a method of generating a word vector from a multi-task model. The method integrates information from an unsupervised task, a classification task, a part-of-speech tagging task, and other task models to enrich the information contained in the generated word vector; at the same time, it uses efficient, sufficiently expressive models for the multi-task integration, so that the method can be used on large datasets. The unsupervised task is trained with the Global Vectors for Word Representation (GloVe) model to obtain information related to the language model. The classification task is trained with the FastText model to obtain category information in the text. The part-of-speech task is trained with a logistic regression model to obtain part-of-speech-related information. The method can quickly produce high-quality, semantically rich word vectors on large-scale datasets, so that it can be applied to natural language processing task scenarios.
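The fusion described in the abstract can be sketched as concatenating the per-task embeddings of each word into one richer vector. This is a hypothetical illustration, not the patent's actual implementation: the lookup tables below are random stand-ins for trained GloVe, FastText, and POS-task models, and the dimensions are arbitrary choices.

```python
import numpy as np

# Random stand-ins for embeddings produced by the three task models.
rng = np.random.default_rng(0)
vocab = ["经济", "学习", "模型"]

glove_emb = {w: rng.normal(size=50) for w in vocab}     # unsupervised task
fasttext_emb = {w: rng.normal(size=30) for w in vocab}  # classification task
pos_emb = {w: rng.normal(size=20) for w in vocab}       # POS-tagging task

def fused_vector(word):
    """Concatenate the task-specific embeddings into one multi-task vector."""
    return np.concatenate([glove_emb[word], fasttext_emb[word], pos_emb[word]])

v = fused_vector("模型")
print(v.shape)  # (100,)
```

Concatenation is one simple fusion choice; it preserves each task's information intact, at the cost of a higher-dimensional final vector.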

Description

technical field [0001] The present invention relates to the technical field of natural language processing within computing and, more specifically, to a method for generating word vectors with a multi-task model. Background technique [0002] Word vector representation is an operation that converts text into a numerical encoding that is easy for models to compute with. It was initially expressed by a simple one-hot vector, in which each dimension of the vector represents a word: a "1" in a dimension indicates that word, and a "0" indicates it is not that word. In such a representation, each word vector has exactly one dimension set to "1" and all others set to "0". Although the one-hot vector is simple, it has many problems, such as high dimensionality, sparseness, and the vocabulary gap (word meanings are not related). With the development of deep learning technology, the distributed representation of words (Distributed Representation) is currently the most widely used - representing words as lo...
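The one-hot representation and its vocabulary-gap problem described in [0002] can be shown in a few lines. The toy vocabulary is illustrative, not from the patent.

```python
# Minimal sketch of the one-hot word representation.
vocab = ["king", "queen", "apple"]

def one_hot(word, vocab):
    """Return a vector with a single 1 at the word's index, 0 elsewhere."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

print(one_hot("queen", vocab))  # [0, 1, 0]

# Vocabulary gap: distinct one-hot vectors are always orthogonal, so
# "king" and "queen" look exactly as unrelated as "king" and "apple".
dot = sum(a * b for a, b in zip(one_hot("king", vocab), one_hot("queen", vocab)))
print(dot)  # 0
```

This zero similarity between every pair of words is precisely what distributed (dense) representations fix.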

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F17/27; G06N3/08
CPC: G06N3/084; G06F40/284; Y02T10/40
Inventors: 黄定帮, 潘嵘
Owner: SUN YAT SEN UNIV