A Semantic Computing Method for Improved Word Vector Model

A semantic computing method based on an improved word vector model, applied in the field of information science. It addresses the semantic gap, the curse of dimensionality, and the insufficient consideration of part-of-speech factors, and expands the functions of semantic computing.

Active Publication Date: 2021-01-12
GUANGZHOU HEYAN BIG DATA TECH CO LTD
Cites: 9; Cited by: 0

AI Technical Summary

Problems solved by technology

This kind of word vector representation (one-hot) has two disadvantages: (1) it is prone to the curse of dimensionality; (2) it cannot describe the similarity between words well, which is known as the semantic gap.
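Both disadvantages can be seen in a few lines. The sketch below is a minimal illustration (the toy vocabulary and helper names are assumptions, not from the patent): each one-hot vector is as long as the vocabulary, and any two distinct words have a dot product of 0, so the representation carries no notion of similarity.

```python
def one_hot_vectors(vocabulary):
    """Assign each word a |V|-dimensional vector with a single 1 at its index."""
    size = len(vocabulary)
    vectors = {}
    for i, word in enumerate(vocabulary):
        vec = [0] * size
        vec[i] = 1
        vectors[word] = vec
    return vectors


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Toy vocabulary (illustrative only)
vocab = ["king", "queen", "apple", "run"]
vecs = one_hot_vectors(vocab)
print(len(vecs["king"]))                 # 4: dimension grows with vocabulary size
print(dot(vecs["king"], vecs["queen"]))  # 0: distinct words are always orthogonal
```

With a realistic vocabulary of hundreds of thousands of words, each vector becomes correspondingly long and sparse, while "king" and "queen" remain exactly as unrelated as "king" and "apple".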
[0004] However, Word2vec in the prior art does not fully consider part of speech. Part of speech is important information for training word vectors, because it carries both semantic information and grammatical rules; using it helps capture the order, regularity, and relations of words (for example, an adjective can be followed by a noun but not by an adverb). Ignoring it limits the model's deep understanding of semantics.
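One way part-of-speech information can enter a CBOW-style Word2vec model is by giving every POS tag its own vector and combining it with the word vectors of the context. The sketch below assumes a simple rule (add each context word's vector to its POS vector, then average over the window); the excerpt does not give the patent's exact integration formula, and the function names, tags, and dimensions are hypothetical.

```python
import random


def init_vectors(keys, dim, seed=0):
    """Assumed initialization: one small random vector per key. Keys are
    sorted so the seeded assignment is reproducible."""
    rng = random.Random(seed)
    return {key: [rng.uniform(-0.5 / dim, 0.5 / dim) for _ in range(dim)]
            for key in sorted(keys)}


def context_vector(tagged_sentence, center, window, word_vecs, pos_vecs):
    """Assumed integration rule: average (word vector + POS vector) over the
    context window, excluding the center word itself."""
    dim = len(next(iter(word_vecs.values())))
    total = [0.0] * dim
    count = 0
    start = max(0, center - window)
    end = min(len(tagged_sentence), center + window + 1)
    for j in range(start, end):
        if j == center:
            continue
        word, pos = tagged_sentence[j]
        for d in range(dim):
            total[d] += word_vecs[word][d] + pos_vecs[pos][d]
        count += 1
    return [x / count for x in total] if count else total


# A toy POS-tagged sentence (tags chosen for illustration)
sentence = [("the", "DET"), ("red", "ADJ"), ("car", "NOUN"), ("stops", "VERB")]
word_vecs = init_vectors({w for w, _ in sentence}, dim=8, seed=1)
pos_vecs = init_vectors({p for _, p in sentence}, dim=8, seed=2)
ctx = context_vector(sentence, center=2, window=1,
                     word_vecs=word_vecs, pos_vecs=pos_vecs)
print(len(ctx))  # 8
```

Summing then averaging keeps the context vector the same size as a word vector, so it can feed an unchanged output layer; concatenation or weighted sums are plausible alternatives, and which one the patent uses is not stated here.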




Embodiment Construction

[0056] The present invention will be described in further detail below in conjunction with the examples and the accompanying drawings, but the embodiments of the present invention are not limited thereto.

[0057] To overcome the defects of the prior art, the present invention provides a semantic computing method and system that improves the Word2vec word vector model.

[0058] The invention is described in detail through the following examples:

[0059] First, the overall flow of the semantic computing method for the improved word vector model of the present invention is introduced; see Figure 1, which is a flow chart of the steps of the method. The invention provides a semantic computing method for improving a word vector model, comprising the following steps:

[0060] S1: Corpus preprocessing. Specifically, step S1 includes:

[0061] S11: Remove irrelevant ch...
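Step S11 is truncated in this excerpt, so its exact rule is not known. As a hypothetical illustration of this kind of corpus cleaning, the sketch below assumes "irrelevant" means HTML tags, URLs, and any character that is not a CJK ideograph, ASCII letter, digit, or whitespace; the patterns and the function name are assumptions, not the patent's definition.

```python
import re

# Hypothetical cleaning rules (assumed, not taken from the patent)
TAG_RE = re.compile(r"<[^>]+>")                        # HTML tags
URL_RE = re.compile(r"https?://\S+")                   # web links
DROP_RE = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9\s]")   # anything outside the kept set
SPACE_RE = re.compile(r"\s+")                          # runs of whitespace


def remove_irrelevant_chars(text):
    """Strip tags, URLs, and unwanted characters, then normalize whitespace."""
    text = TAG_RE.sub(" ", text)   # remove tags first so URLs cannot swallow them
    text = URL_RE.sub(" ", text)
    text = DROP_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


print(remove_irrelevant_chars("<p>词向量 model!! see https://example.com</p>"))
```

Full-width Chinese punctuation such as "，" falls outside the kept range and is removed as well; a real pipeline would tune these rules to its corpus before POS tagging (S2).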



Abstract

The invention provides a semantic computing method based on an improved word vector model. The method comprises the following steps: S1, preprocessing a corpus; S2, part-of-speech tagging, i.e. tagging the parts of speech of the words obtained by preprocessing the corpus; S3, vector initialization, i.e. vectorizing the tagged words and their parts of speech; S4, context vector integration, i.e. computing and integrating the context word vectors and part-of-speech vectors of the words; S5, building a Huffman tree, training the network, optimizing the objective function, and judging whether the error reaches a threshold; S6, obtaining the word vectors and the part-of-speech vectors; and S7, applying the vectors, i.e. performing semantic computing with the word vectors and part-of-speech vectors. Compared with the prior art, the method adds a part-of-speech factor to the vectors, improves the existing Word2vec model, applies the improved model in new ways, and expands the semantic computing functions achievable with Word2vec.
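Step S5 builds a Huffman tree, which in Word2vec underlies hierarchical softmax: frequent words sit near the root and receive short binary codes, and each word's path of inner nodes defines the binary decisions the network is trained on. The sketch below shows only the tree construction, assuming word frequencies are already counted; the 0/1 branch convention and the node ids are illustrative choices and may differ from the patent or from any particular Word2vec implementation.

```python
import heapq
import itertools


def build_huffman_codes(freqs):
    """Return {word: (code, path)}: code is the list of 0/1 branch bits from
    the root, path lists the inner-node ids visited from the root."""
    if not freqs:
        return {}
    counter = itertools.count()  # tie-breaker so heapq never compares nodes
    heap = [(f, next(counter), ("leaf", w)) for w, f in freqs.items()]
    heapq.heapify(heap)
    inner_id = 0
    # Repeatedly merge the two least frequent nodes into a new inner node.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(counter),
                              ("inner", inner_id, left, right)))
        inner_id += 1
    # Walk the tree iteratively, recording branch bits and inner-node ids.
    codes = {}
    stack = [(heap[0][2], [], [])]
    while stack:
        node, code, path = stack.pop()
        if node[0] == "leaf":
            codes[node[1]] = (code, path)
        else:
            _, nid, left, right = node
            stack.append((left, code + [0], path + [nid]))
            stack.append((right, code + [1], path + [nid]))
    return codes


# Toy frequency counts (made up for illustration)
freqs = {"the": 50, "car": 20, "red": 15, "stops": 10, "quickly": 5}
for word, (code, path) in build_huffman_codes(freqs).items():
    print(word, code, path)
```

With these counts, "the" receives the one-bit code [0], while the rarest words "quickly" and "stops" receive four-bit codes, so the average number of binary decisions per training step stays small.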

Description

Technical field
[0001] The invention relates to the field of information science, in particular to a semantic computing method and system based on an improved Word2vec word vector model.
Background
[0002] Feeding natural language to machine learning algorithms requires representing the language mathematically, and word vectorization is one way to do so. The simplest way to vectorize words is one-hot representation: a vocabulary is built and each word in it is numbered sequentially, and each word is assigned a vector in which exactly one component (at the word's index) is 1 and all others are 0. This representation has two disadvantages: (1) it is prone to the curse of dimensionality; (2) it cannot describe the similarity between words well, which is known as the semantic gap. To overcome these defects, Hinton proposed distributed representation in 1986. The basic idea is to map each word into a fixed-length k-di...
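Once words are mapped to dense fixed-length vectors, similarity becomes measurable, which is what the semantic computing in step S7 relies on. The sketch below assumes cosine similarity as the measure (a common choice; the excerpt does not name the patent's measure) and uses made-up 3-dimensional vectors rather than trained ones; `most_similar` is a hypothetical helper name.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def most_similar(query, vectors, topn=2):
    """Rank the other words by cosine similarity to the query word."""
    q = vectors[query]
    scores = [(w, cosine(q, v)) for w, v in vectors.items() if w != query]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:topn]


# Toy dense vectors (made-up values, not trained)
vectors = {
    "car":   [0.9, 0.1, 0.0],
    "truck": [0.8, 0.2, 0.1],
    "apple": [0.0, 0.9, 0.4],
}
print(most_similar("car", vectors, topn=1))
```

Unlike one-hot vectors, where every pair of distinct words scores 0, dense vectors let related words such as "car" and "truck" score close to 1 while unrelated ones score near 0. The same function could be applied to the part-of-speech vectors obtained in S6.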


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/30; G06F16/30
CPC: G06F16/30; G06F40/30
Inventors: 刘志煌 (Liu Zhihuang), 刘冶 (Liu Ye), 李宏浩 (Li Honghao), 傅自豪 (Fu Zihao), 邝秋华 (Kuang Qiuhua)
Owner: GUANGZHOU HEYAN BIG DATA TECH CO LTD