Word vector training method and system fusing word class information and position information

A technology relating to position information and word vectors, applied in the fields of instruments, computing, and electronic digital data processing. It addresses problems such as insufficient use of part-of-speech information, overly coarse granularity of part-of-speech information, and unreasonable updating of part-of-speech information.

Active Publication Date: 2017-10-10
HUAZHONG UNIV OF SCI & TECH
3 Cites, 37 Cited by

AI Technical Summary

Problems solved by technology

[0005] In view of the above defects or improvement needs of the prior art, the object of the present invention is to provide a word vector training method and system that fuses part-of-speech and position information, thereby solving the problem of

Examples


Embodiment Construction

[0044] To make the object, technical solution, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. The specific embodiments described here serve only to explain the present invention and do not limit it. In addition, the technical features of the various embodiments described below can be combined with one another as long as they do not conflict.

[0045] Because existing word vector learning methods ignore part of speech and its importance in natural language, the present invention provides a word vector learning method that combines part-of-speech and position information. Building on the original skip-gram model, the method takes into account the part-of-speech relationships and positional relationships between words, so that the model ca...
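As a rough illustration of the inputs such a method needs, the sketch below enumerates skip-gram (center, context) pairs from an already tagged sentence and attaches the two part-of-speech tags and the relative offset to each pair. The function name `skipgram_examples`, the window size, the tag names, and the toy sentence are all assumptions made for illustration; this excerpt does not name the segmentation or tagging tools the patent uses.

```python
from typing import List, Tuple

# (word, part-of-speech tag); the tag set here is hypothetical
TaggedToken = Tuple[str, str]
# (center, context, center POS, context POS, relative offset)
Example = Tuple[str, str, str, str, int]


def skipgram_examples(sentence: List[TaggedToken], window: int = 2) -> List[Example]:
    """Enumerate skip-gram pairs together with POS tags and relative position."""
    examples: List[Example] = []
    for i, (center, center_pos) in enumerate(sentence):
        lo = max(0, i - window)
        hi = min(len(sentence), i + window + 1)
        for j in range(lo, hi):
            if j == i:
                continue  # a word is not its own context
            context, context_pos = sentence[j]
            # j - i is negative for left context, positive for right context
            examples.append((center, context, center_pos, context_pos, j - i))
    return examples


# Toy, pre-tagged input (assumed; real input would come from a tagger)
sentence = [("dogs", "NOUN"), ("chase", "VERB"), ("cats", "NOUN")]
examples = skipgram_examples(sentence, window=1)
for ex in examples:
    print(ex)
```

Keeping the signed offset, rather than only the distance, lets a later model distinguish left context from right context for a given pair of tags.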


Abstract

The invention discloses a word vector training method and system that fuses word class (part-of-speech) information and position information. The method includes the following steps: data is preprocessed to obtain a target text; the target text is subjected to word segmentation and part-of-speech tagging; the part-of-speech information and the position information are modeled; and, on the basis of a skip-gram model with a negative sampling strategy, the part-of-speech information and the position information are fused into word vector learning to obtain target word vectors, which are then evaluated on a word analogy task and a word similarity task. Because the method models the part-of-speech information of words and the position information between parts of speech, both are fully used to help train the word vectors, and parameters are updated more reasonably during training.
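The abstract does not state how the part-of-speech and position information enters the skip-gram objective, so the following is only a minimal sketch under a stated assumption: a standard skip-gram negative-sampling update whose gradient is scaled by a scalar weight looked up from a hypothetical table keyed by (center tag, context tag, offset). The table `pos_position_weight`, the function `sgns_step`, and all numeric values are illustrative and are not taken from the patent.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b) -> float:
    return sum(x * y for x, y in zip(a, b))


def sgns_step(v_center, u_context, u_negatives, weight=1.0, lr=0.05):
    """One weighted negative-sampling update, applied in place.

    Returns the weighted loss measured before the update.
    """
    grad_center = [0.0] * len(v_center)
    loss = 0.0
    # Label 1 for the observed context word, 0 for each sampled negative
    pairs = [(u_context, 1.0)] + [(u, 0.0) for u in u_negatives]
    for u, label in pairs:
        score = sigmoid(dot(v_center, u))
        loss -= math.log(score if label == 1.0 else 1.0 - score)
        g = weight * (score - label)  # gradient w.r.t. the dot product
        for k in range(len(u)):
            grad_center[k] += g * u[k]    # uses u[k] before it is updated
            u[k] -= lr * g * v_center[k]  # v_center is updated after the loop
    for k in range(len(v_center)):
        v_center[k] -= lr * grad_center[k]
    return weight * loss


random.seed(0)
dim = 8
v = [random.uniform(-0.5, 0.5) for _ in range(dim)]
u_pos = [random.uniform(-0.5, 0.5) for _ in range(dim)]
u_negs = [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(3)]

# Hypothetical weight table: (center POS, context POS, offset) -> scalar
pos_position_weight = {("VERB", "NOUN", 1): 1.5}
w = pos_position_weight.get(("VERB", "NOUN", 1), 1.0)

losses = [sgns_step(v, u_pos, u_negs, weight=w) for _ in range(200)]
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```

With `weight=1.0` this reduces to the plain negative-sampling update; any other way of fusing the tag and position signal would change only how `g` is computed.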

Description

Technical field

[0001] The invention belongs to the technical field of natural language processing, and more specifically relates to a word vector training method and system that integrates part-of-speech and position information.

Background

[0002] In recent years, with the rapid development of mobile Internet technology, the scale of data on the Internet has grown rapidly, and the complexity of the data has also increased dramatically. This makes processing and analyzing such massive unstructured, unlabeled data a major problem.

[0003] Traditional machine learning methods use feature engineering to represent data symbolically, which makes modeling and solving easier. However, with the bag-of-words representations commonly used in feature engineering, such as one-hot vectors, the feature dimensionality grows sharply as data complexity increases, leading to the curse of dimensionality. And there is still a...
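The dimensionality issue with one-hot vectors can be seen in a few lines. This is a minimal sketch with a toy vocabulary (the words and the helper name `one_hot` are assumptions): each vector is as long as the vocabulary, and any two distinct words have a dot product of zero.

```python
def one_hot(word, vocab):
    """Return the one-hot vector of `word` over the ordered list `vocab`."""
    index = {w: i for i, w in enumerate(vocab)}
    vec = [0] * len(vocab)
    vec[index[word]] = 1
    return vec


vocab = ["cat", "dog", "runs"]
cat_vec = one_hot("cat", vocab)
dog_vec = one_hot("dog", vocab)

# The vector length equals the vocabulary size, so it grows with the data
print(len(dog_vec), dog_vec)

# Distinct words are always orthogonal under this encoding
similarity = sum(a * b for a, b in zip(cat_vec, dog_vec))
print(similarity)
```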


Application Information

IPC(8): G06F17/27, G06K9/62
CPC: G06F40/284, G06F18/214
Inventor: 文坤梅, 李瑞轩, 刘其磊, 李玉华, 辜希武, 昝杰, 杨琪
Owner: HUAZHONG UNIV OF SCI & TECH