Semantic computing method for improving word vector model

A semantic computing and word vector technology, applied in the field of information science, which can solve problems such as the semantic gap and the curse of dimensionality that affect a model's deep understanding of semantics, and achieves the effect of expanding the semantic computing functions of the word vector model.

Active Publication Date: 2017-10-24
GUANGZHOU HEYAN BIG DATA TECH CO LTD
Cites: 9 | Cited by: 56

AI Technical Summary

Problems solved by technology

This kind of word vector representation has two disadvantages: (1) it easily leads to the curse of dimensionality; (2) it cannot describe the similarity between words well, i.e. the semantic gap.
[0004] However, Word2vec in the prior art does not fully consider the part-of-speech factor, even though part of speech is important information for training word vectors: it carries semantic information and grammatical rules, and using it captures the ordering regularities and relations between words (for example, an adjective can be followed by a noun but not by an adverb). Ignoring it therefore limits the model's deep understanding of semantics.
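
To make this concrete, a minimal sketch (not the patent's code; it assumes the third-party jieba package and its posseg tagger) shows the word/part-of-speech pairs from which such ordering regularities can be read:

```python
# Hypothetical illustration: part-of-speech tags make ordering regularities
# such as "adjective + noun" explicit alongside each word in the corpus.
import jieba.posseg as pseg  # assumes the jieba package is installed

for word, flag in pseg.cut("美丽的花园"):  # "beautiful garden"
    print(word, flag)  # expected output along the lines of: 美丽 a / 的 uj / 花园 n
```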


Embodiment Construction

[0056] Hereinafter, the present invention will be further described in detail with reference to the embodiments and drawings, but the implementation of the present invention is not limited thereto.

[0057] In order to solve the defects of the prior art, the present invention provides a semantic calculation method and system for improving the Word2vec word vector model.

[0058] The method is described in detail through the following embodiments:

[0059] First, the overall process of the semantic calculation method of the improved word vector model of the present invention is introduced. Please refer to Figure 1, which is a flowchart of the steps of the semantic calculation method of the improved word vector model of the present invention. The present invention provides a semantic calculation method for an improved word vector model, including the following steps:

[0060] S1: corpus preprocessing (a minimal sketch follows the steps below). Specifically, step S1 includes:

[0061] S11: Remove irrelevant characters, including r...
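
The specific characters removed in S11 are truncated above; the following minimal sketch only assumes a typical cleanup (stripping characters other than Chinese characters, ASCII letters, and digits, then normalizing whitespace) and is not the patent's own implementation:

```python
import re

def preprocess(text: str) -> str:
    """Toy cleanup for step S1: drop characters that carry no lexical
    content (assumed character classes) and normalize whitespace."""
    # Keep Chinese characters, ASCII letters/digits and whitespace; drop the rest.
    text = re.sub(r"[^\u4e00-\u9fa5A-Za-z0-9\s]", " ", text)
    # Collapse the runs of whitespace left behind by the removal.
    return re.sub(r"\s+", " ", text).strip()

print(preprocess("Word2vec（词向量）模型，2017 年！"))  # -> "Word2vec 词向量 模型 2017 年"
```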



Abstract

The invention provides a semantic computing method for improving a word vector model. The method comprises the following steps: S1, preprocessing a corpus; S2, part-of-speech tagging, in which the words obtained from corpus preprocessing are tagged with their parts of speech; S3, vector initialization, in which the tagged words and their parts of speech are vectorized; S4, context vector integration, in which the context word vectors and part-of-speech vectors of each word are computed and integrated; S5, establishing a Huffman tree, training the network, optimizing the objective function, and judging whether the error reaches a threshold value; S6, obtaining the vectors, namely the word vectors and the part-of-speech vectors; and S7, applying the vectors, namely carrying out semantic computing with the word vectors and part-of-speech vectors. Compared with the prior art, the method adds a part-of-speech factor to the vectors, improves the existing Word2vec model, applies the improved model innovatively, and expands the range of semantic computing that can be carried out with Word2vec.
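
As a rough illustration of steps S3 and S4, the sketch below is not the patent's implementation: it replaces the Huffman-tree (hierarchical softmax) training of S5 with a plain softmax for brevity, and all names and sizes are hypothetical. It shows how a CBOW-style projection could sum context word vectors together with their part-of-speech vectors before predicting the centre word:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabularies of words and part-of-speech tags (hypothetical).
vocab = {"美丽": 0, "的": 1, "花园": 2, "很": 3, "大": 4}
pos_tags = {"a": 0, "uj": 1, "n": 2, "d": 3}
dim = 8

# S3: initialize word vectors and part-of-speech vectors.
W_word = rng.normal(scale=0.1, size=(len(vocab), dim))
W_pos = rng.normal(scale=0.1, size=(len(pos_tags), dim))
W_out = rng.normal(scale=0.1, size=(len(vocab), dim))  # stands in for the Huffman tree of S5

def project(context):
    """S4: integrate the context word vectors and part-of-speech vectors
    by summing them into a single projection vector."""
    h = np.zeros(dim)
    for word, tag in context:
        h += W_word[vocab[word]] + W_pos[pos_tags[tag]]
    return h / len(context)

def predict(context):
    """Plain softmax over the vocabulary; the patent instead walks a
    Huffman tree (hierarchical softmax) at this point."""
    scores = W_out @ project(context)
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

# Context around the centre word "花园" in "美丽 的 花园 很 大".
probs = predict([("美丽", "a"), ("的", "uj"), ("很", "d"), ("大", "a")])
print({w: round(float(probs[i]), 3) for w, i in vocab.items()})
```

Training would then adjust W_word, W_pos and the output parameters by gradient descent until the error in S5 falls below the threshold; the resulting rows of W_word and W_pos are the vectors obtained in S6.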

Description

Technical field

[0001] The invention relates to the field of information science, in particular to a semantic calculation method and system for improving the Word2vec word vector model.

Background technique

[0002] Handing natural language to machine learning algorithms for processing requires turning the language into mathematical form, and word vectorization is one way to do so. The simplest way to vectorize words is the One-hot Representation: a vocabulary is built, each word in it is numbered sequentially, and each word is given a vector in which exactly one component is 1 and all the others are 0. This kind of word vector representation has two shortcomings: (1) it easily produces the curse of dimensionality; (2) it cannot describe the similarity between words well, i.e. the semantic gap. To overcome these defects, Hinton proposed the Distributed Representation in 1986. The basic idea is to map each word into a fixed-length k-dimensional short vecto...
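
A small illustrative sketch (not from the patent) shows why the One-hot Representation suffers from the semantic gap: any two distinct one-hot vectors are orthogonal, so their cosine similarity is 0 no matter how closely related the words are, and the vector length grows with the vocabulary:

```python
import numpy as np

vocab = ["king", "queen", "apple"]   # toy vocabulary
one_hot = np.eye(len(vocab))         # one row per word: a single 1, the rest 0

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# "king" vs "queen" and "king" vs "apple" are equally dissimilar: both 0.0.
print(cosine(one_hot[0], one_hot[1]))  # 0.0
print(cosine(one_hot[0], one_hot[2]))  # 0.0
```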


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F17/27, G06F17/30
CPC: G06F16/30, G06F40/30
Inventors: 刘志煌, 刘冶, 李宏浩, 傅自豪, 邝秋华
Owner: GUANGZHOU HEYAN BIG DATA TECH CO LTD