Word vector generation method and device, computer storage medium and electronic equipment

A word vector technology applied in the field of natural language processing. It addresses problems such as word vectors failing to reflect the different senses of polysemous words and the resulting poor accuracy on natural language processing tasks.

Active Publication Date: 2020-09-22
TENCENT TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

However, natural language contains polysemous words: a single word can carry different meanings in different contexts. Existing methods generate only one word vector per word, which cannot reflect those different meanings, so natural language processing tasks subsequently built on the word vectors suffer from poor accuracy.

Embodiment Construction

[0062] The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, but not all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative efforts shall fall within the protection scope of the present invention.

[0063] Natural language refers to the language that people use every day. For example, English, Chinese, Russian, etc. are all natural languages. Natural Language Processing (NLP) is an important direction in the field of computer science and artificial intelligence. Natural language processing is a science that is closely related to the study of linguistics and integrates linguistics, computer science, and mathematics. It mainly stud...

Abstract

The invention provides a word vector generation method and device, a computer storage medium and electronic equipment. The method comprises: for the target text unit in each sentence, obtaining an associated text composed of the associated text units within the target text unit's context window; mining the text set composed of all the associated texts with a document topic generation model to obtain a topic distribution vector for the target text unit of each sentence; configuring the same semantic tag for target text units whose topic distribution vectors are relatively similar, and different semantic tags for target text units whose topic distribution vectors are relatively dissimilar; and training a word vector model on the training corpus set with the semantic tags added, to obtain a word vector for each semantic tag. Because the topic distribution vector of a target text unit reflects its context, the semantics of the target text unit in different sentences can be distinguished by topic distribution vector similarity, and training the word vector model on that distinction yields word vectors for the different semantics.
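The steps in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation. The helper names (`context_window`, `assign_semantic_tags`, `tag_corpus`), the `word#tag` label format, the window size, the similarity threshold, and the greedy grouping rule are all assumptions. The topic distribution vectors are hand-written placeholders standing in for the output of a document topic generation model, which is not run here.

```python
import math

def context_window(tokens, index, size):
    """Return the tokens within `size` positions of tokens[index], excluding the target itself."""
    left = tokens[max(0, index - size):index]
    right = tokens[index + 1:index + 1 + size]
    return left + right

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def assign_semantic_tags(topic_vectors, threshold):
    """Greedy grouping (an assumption): an occurrence joins the first group whose
    representative vector is at least `threshold` similar, else it starts a new group."""
    representatives, tags = [], []
    for vec in topic_vectors:
        for tag, rep in enumerate(representatives):
            if cosine(vec, rep) >= threshold:
                tags.append(tag)
                break
        else:
            representatives.append(vec)
            tags.append(len(representatives) - 1)
    return tags

def tag_corpus(sentences, target, tags):
    """Rewrite the target in sentence i as 'target#tag' (hypothetical label format)."""
    return [
        [f"{target}#{tag}" if tok == target else tok for tok in tokens]
        for tokens, tag in zip(sentences, tags)
    ]

sentences = [
    ["i", "ate", "an", "apple", "after", "lunch"],
    ["the", "apple", "phone", "has", "a", "new", "chip"],
    ["she", "peeled", "the", "apple", "and", "sliced", "it"],
]
target = "apple"

# Step 1: one associated text per sentence, built from the target's context window.
associated_texts = [context_window(s, s.index(target), 2) for s in sentences]

# Step 2 (placeholder): a topic model would be fitted on `associated_texts`;
# these per-sentence topic distribution vectors are invented for illustration.
topic_vectors = [[0.9, 0.1], [0.2, 0.8], [0.85, 0.15]]

# Step 3: similar topic vectors share a semantic tag, dissimilar ones do not.
tags = assign_semantic_tags(topic_vectors, threshold=0.9)

# Step 4: the tagged corpus is what a word vector model would then be trained on.
tagged_sentences = tag_corpus(sentences, target, tags)
print(tags)
print(tagged_sentences[1])
```

Training any word vector model on `tagged_sentences` would then produce separate vectors for each tagged token, since the model sees `apple#0` and `apple#1` as distinct vocabulary entries.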

Description

Technical field

[0001] The present invention relates to the technical field of natural language processing, and in particular to a method, apparatus, computer storage medium and electronic device for generating word vectors.

Background

[0002] Natural language processing is an important direction in the field of artificial intelligence, and usually involves tasks such as sentiment analysis, intelligent question answering, and machine translation. However, computer programs cannot directly process the characters or words of natural language. For a computer to understand natural language, a necessary step is to represent each character or word with a corresponding word vector, so that the tasks above can then be carried out by processing the word vectors.

[0003] The prior art generally trains an existing word vector (Word2Vec) model directly on a corpus set (containing multiple sentences), so as to obtain word vectors...
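The limitation described in the background can be made concrete with a small sketch. It assumes a vocabulary keyed by surface form, which is how word vector models are commonly set up. The example word "bank", the two sentences, and the function name `build_vocabulary` are illustrative and do not come from the patent text.

```python
def build_vocabulary(sentences):
    """Map each distinct surface form to one integer id, in order of first appearance."""
    vocab = {}
    for tokens in sentences:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab

corpus = [
    ["he", "sat", "on", "the", "river", "bank"],   # "bank" as a riverside
    ["she", "opened", "a", "bank", "account"],     # "bank" as a financial institution
]

vocab = build_vocabulary(corpus)

# Both occurrences of "bank" resolve to the same id, so a model with one
# embedding row per id can only learn a single vector for both senses.
bank_ids = {vocab[tok] for tokens in corpus for tok in tokens if tok == "bank"}
print(len(bank_ids))
```

Because the lookup key is the surface form alone, no amount of training can separate the two senses unless the tokens themselves are made distinct beforehand.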

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/30, G06F40/237
CPC: G06F40/30, G06F40/237, Y02D10/00
Inventor: 刘志煌
Owner: TENCENT TECH (SHENZHEN) CO LTD