Word vector generation method and device, computer storage medium and electronic equipment

A word vector generation technique for target text units, applied in the field of natural language processing. It addresses the problem that a single word vector per word cannot reflect the different senses of a polysemous word, which degrades the accuracy of downstream tasks.

Active Publication Date: 2020-09-22
TENCENT TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

Natural language contains polysemous words: the same word can carry different meanings in different contexts. Existing methods generate only a single word vector per word, which cannot reflect the different senses of a polysemous word, so downstream tasks built on those word vectors have poor accuracy.



Examples


Embodiment Construction

[0064] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0065] Natural language refers to the language that people use every day; English, Chinese, and Russian are all natural languages. Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It is closely tied to the study of linguistics and integrates linguistics, computer science, and mathematics. It mainly studies methods and re...


Abstract

The invention provides a word vector generation method and device, a computer storage medium, and electronic equipment. The method comprises the following steps. First, sequence pattern mining is applied to a training corpus to obtain target sequence patterns that contain a target text unit, where the support of each target sequence pattern is not less than a support threshold. Second, for each sentence, if the context window of the target text unit in that sentence contains a target sequence pattern that meets a length condition, a semantic label is assigned to the target text unit. Target text units matching the same target sequence pattern carry the same semantic label, and target text units matching different target sequence patterns carry different semantic labels. Third, a word vector model is trained to obtain a word vector for each semantic label. Because the mined target sequence patterns reflect the different contexts in which the target text unit appears, occurrences in different contexts receive different semantic labels, and the target text unit ends up with multiple word vectors, one for each of its senses.
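The three steps in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the patented implementation: the function names `mine_target_patterns` and `tag_corpus`, the brute-force enumeration of ordered subsequences in place of a real sequence pattern mining algorithm, the definition of support as the fraction of sentences containing a pattern, the length condition `len(pattern) >= min_pattern_len`, the rule that prefers longer patterns when several match, and the `word#k` label format.

```python
from collections import Counter
from itertools import combinations


def is_subsequence(pattern, tokens):
    """True if `pattern` occurs in `tokens` in order (gaps allowed)."""
    it = iter(tokens)
    return all(tok in it for tok in pattern)


def mine_target_patterns(sentences, target, min_support, max_len=3):
    """Brute-force stand-in for sequence pattern mining (assumption).

    Returns {pattern: support} for ordered subsequences of length
    2..max_len that contain `target`. Support is taken here to be the
    fraction of sentences in which the pattern occurs.
    """
    counts = Counter()
    for tokens in sentences:
        seen = set()  # count each pattern at most once per sentence
        for n in range(2, max_len + 1):
            for combo in combinations(tokens, n):  # preserves token order
                if target in combo:
                    seen.add(combo)
        counts.update(seen)
    total = len(sentences)
    return {p: c / total for p, c in counts.items() if c / total >= min_support}


def tag_corpus(sentences, target, patterns, window=2, min_pattern_len=2):
    """Replace each occurrence of `target` with a per-pattern semantic label."""
    # One label per pattern; the "word#k" format is hypothetical.
    label_of = {p: f"{target}#{i}" for i, p in enumerate(sorted(patterns))}
    # Assumed tie-break: longer pattern first, then higher support.
    ranked = sorted(patterns, key=lambda p: (-len(p), -patterns[p], p))
    tagged = []
    for tokens in sentences:
        out = list(tokens)
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            ctx = tokens[max(0, i - window): i + window + 1]
            for p in ranked:
                if len(p) >= min_pattern_len and is_subsequence(p, ctx):
                    out[i] = label_of[p]
                    break
        tagged.append(out)
    return tagged


corpus = [
    ["i", "eat", "apple", "pie"],
    ["she", "eat", "apple", "daily"],
    ["apple", "released", "phone"],
    ["apple", "released", "laptop"],
]
patterns = mine_target_patterns(corpus, "apple", min_support=0.5, max_len=2)
tagged = tag_corpus(corpus, "apple", patterns, window=1)
print(patterns)
print(tagged)
```

In this toy corpus the fruit sense and the company sense of "apple" end up with different labels. The tagged sentences could then be passed to any word vector model (for example a Word2Vec implementation), so that each label is trained as its own vocabulary entry; that step is omitted here to keep the sketch free of third-party dependencies.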

Description

Technical field

[0001] The present invention relates to the technical field of natural language processing, and in particular to a method, device, computer storage medium, and electronic equipment for generating word vectors.

Background

[0002] Natural language processing is an important direction in the field of artificial intelligence and usually involves tasks such as sentiment analysis, intelligent question answering, and machine translation. However, computer programs cannot directly process the characters or words of natural language. For a computer to understand natural language, each character or word it contains must be represented by a corresponding word vector, so that the above tasks can then be carried out by processing the word vectors.

[0003] The prior art generally directly uses a corpus set (which contains multiple sentences) to train an existing word vector (Word2Vec) model, so as to obtain word vectors corre...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/30, G06F40/237
CPC: G06F40/30, G06F40/237
Inventor: 刘志煌
Owner: TENCENT TECH (SHENZHEN) CO LTD