Word vector construction method and device, electronic equipment and storage medium

A word vector construction method, applied in the field of big data, that addresses the inability to distinguish polysemous words and the resulting loss of accuracy, achieving accurate encoding and disambiguation of polysemous words.

Active Publication Date: 2020-06-30
TENCENT TECH (SHENZHEN) CO LTD
Cites: 12; Cited by: 4

AI Technical Summary

Problems solved by technology

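When current mainstream word vector models (such as the word2vec model) construct word vectors, a word is usually represented by a single vectorized code, resulting in the inability to distinguish polysemous words in different contexts during the word vector construction stage.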




Embodiment Construction

[0044] In order to make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below in conjunction with the drawings in the embodiments of the present application.

[0045] For the convenience of understanding, the nouns involved in the embodiments of the present application are explained below:

[0046] Cloud technology refers to a hosting technology that unifies a series of resources such as hardware, software, and network in a wide area network or a local area network to realize data calculation, storage, processing, and sharing.

[0047] Cloud technology is a general term for the network technology, information technology, integration technology, management platform technology, application technology, and so on, applied under the cloud computing business model. It can form a resource pool, which can be u...


Abstract

The invention relates to the technical field of big data, and discloses a word vector construction method and device, electronic equipment and a storage medium. A more accurate word vector is constructed by fusing a local word vector, which represents local context features, with a topic distribution vector, which represents the probability distribution of topics of the texts in which the segmented words are located, so as to disambiguate polysemous words. The method comprises the following steps: performing word segmentation on a plurality of to-be-processed texts to obtain the segmented words in each to-be-processed text; obtaining a local word vector of each segmented word based on its context; acquiring a topic distribution vector of each segmented word based on the probability distribution of the topic to which that segmented word belongs in the plurality of to-be-processed texts; and fusing the local word vector and the topic distribution vector of each segmented word to obtain its target word vector.
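The four steps in the abstract can be sketched end to end as a toy Python program. This is an illustration under stated assumptions, not the patented implementation: segmentation is a lowercase whitespace split (Chinese text would need a real word segmenter), the local word vector is a normalized bag of neighbouring words inside a hypothetical `window`, the topic-word table `PHI` is made up and stands in for a trained topic model such as LDA, the text's topic distribution is estimated by EM with that table held fixed, and "fusing" is assumed to mean plain concatenation. All names are hypothetical.

```python
from collections import Counter

# Hypothetical topic-word probabilities p(w | z); a real system would learn these
# from the to-be-processed texts with a topic model such as LDA.
PHI = {
    "fruit": {"apple": 0.3, "juicy": 0.3, "water": 0.2, "harvest": 0.2},
    "tech": {"apple": 0.3, "iphone": 0.3, "released": 0.2, "version": 0.2},
}
EPS = 1e-6  # smoothing for words a topic has never seen


def segment(text):
    """Assumed segmenter: lowercase whitespace split."""
    return text.lower().split()


def local_vector(tokens, i, vocab, window=2):
    """Local context feature: normalized counts of words within `window` positions of token i."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    ctx = Counter(tokens[j] for j in range(lo, hi) if j != i)
    total = sum(ctx.values()) or 1
    return [ctx[w] / total for w in vocab]


def doc_topics(tokens, topics, iters=20):
    """Estimate p(z | d) by EM while keeping the topic-word table fixed."""
    theta = {z: 1 / len(topics) for z in topics}
    if not tokens:
        return theta
    for _ in range(iters):
        acc = {z: 0.0 for z in topics}
        for w in tokens:
            weights = {z: theta[z] * PHI[z].get(w, EPS) for z in topics}
            norm = sum(weights.values())
            for z in topics:
                acc[z] += weights[z] / norm
        theta = {z: acc[z] / len(tokens) for z in topics}
    return theta


def topic_vector(word, theta, topics):
    """p(z | w, d), proportional to p(z | d) * p(w | z): the word's likely topic in this text."""
    weights = [theta[z] * PHI[z].get(word, EPS) for z in topics]
    norm = sum(weights)
    return [x / norm for x in weights]


def build_word_vectors(texts, window=2):
    """Return, per text, (token, target_vector) pairs; fusion here is concatenation."""
    docs = [segment(t) for t in texts]
    vocab = sorted({w for d in docs for w in d})
    topics = sorted(PHI)  # ["fruit", "tech"]
    out = []
    for tokens in docs:
        theta = doc_topics(tokens, topics)
        out.append([
            (w, local_vector(tokens, i, vocab, window) + topic_vector(w, theta, topics))
            for i, w in enumerate(tokens)
        ])
    return out


texts = [
    "the apple harvest is juicy with plenty of water",
    "apple has released the latest iphone version",
]
vectors = build_word_vectors(texts)
fruit_apple = vectors[0][1][1]
brand_apple = vectors[1][0][1]
print(fruit_apple[-2:], brand_apple[-2:])  # topic parts: [p(fruit), p(tech)]
```

Because both halves of the target vector depend on the surrounding text, the two occurrences of "apple" receive different vectors, with the topic part leaning toward "fruit" in the first text and "tech" in the second. Concatenation is only one fusion choice; a weighted sum after projecting both parts to a common dimension would be another.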

Description

Technical field

[0001] The present application relates to the field of big data technology, and in particular to a word vector construction method and device, electronic equipment and a storage medium.

Background art

[0002] Polysemy is a common phenomenon in natural language processing and a problem that needs to be solved in many scenarios. In both Chinese and English, many words have different meanings in different contexts. For example, "apple" is a type of fruit in a text such as "Apple has been produced in recent times with plenty of water," while "Apple" refers to a mobile phone brand in a text such as "Apple has released the latest version of its iPhone." When current mainstream word vector models (such as the word2vec model) construct word vectors, a word is usually represented by a single vectorized code, resulting in the inability to distinguish polysemous words in different contexts during the word vector construction stage, reducing the subse...
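To make the limitation concrete, the toy Python snippet below mimics a static, word2vec-style lookup after training: each word type maps to exactly one stored vector, so context has no influence. The table `STATIC_VECTORS` and its 3-dimensional values are hypothetical, chosen only for illustration.

```python
# Hypothetical static embedding table: one fixed vector per word type.
STATIC_VECTORS = {
    "apple": [0.21, -0.47, 0.08],
    "water": [0.55, 0.10, -0.32],
    "iphone": [-0.18, 0.62, 0.40],
}
UNKNOWN = [0.0, 0.0, 0.0]  # fallback for words outside the toy table


def embed(sentence):
    """Map each lowercased, whitespace-separated token to its single stored vector."""
    return [STATIC_VECTORS.get(tok, UNKNOWN) for tok in sentence.lower().split()]


fruit_sense = embed("Apple has been produced in recent times with plenty of water")
brand_sense = embed("Apple has released the latest version of its iPhone")

# Both occurrences of "apple" receive the identical vector, whatever the context.
print(fruit_sense[0] == brand_sense[0])  # → True
```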


Application Information

IPC(8): G06F40/284; G06F40/289; G06F40/30; G06K9/62
CPC: G06F18/253
Inventor: 刘志煌 (Liu Zhihuang)
Owner: TENCENT TECH (SHENZHEN) CO LTD