Chinese word vector modeling method

A modeling method and word vector technology, applied in the field of Chinese word vector modeling, that addresses problems such as unregistered (out-of-vocabulary) words.

Active Publication Date: 2020-10-30
TONGJI UNIV

AI Technical Summary

Problems solved by technology

The present invention mainly addresses the problem of unregistered (out-of-vocabulary) words from the perspective of word vector training. It decomposes words into strokes that are already included in the user dictionary, so that all Chinese characters can be represented.
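
The decomposition idea can be sketched as follows. This is a minimal illustration, not the patent's implementation: the stroke table `STROKE_DICT`, the five-class stroke coding, the vocabulary `WORD_VOCAB`, and the helper `word_to_strokes` are all hypothetical names and toy data introduced here.

```python
# Hypothetical toy stroke dictionary (an assumption for illustration, not data
# from the patent). Codes: 1 horizontal, 2 vertical, 3 left-falling,
# 4 right-falling or dot, 5 turning.
STROKE_DICT = {
    "人": "34",
    "大": "134",
    "十": "12",
    "木": "1234",
    "林": "12341234",
}

# Hypothetical word-level vocabulary seen during training.
WORD_VOCAB = {"大人", "木"}


def word_to_strokes(word):
    """Return one stroke-code string per character of the word."""
    missing = [ch for ch in word if ch not in STROKE_DICT]
    if missing:
        raise KeyError(f"characters missing from stroke dictionary: {missing}")
    return [STROKE_DICT[ch] for ch in word]


# "大林" is not in WORD_VOCAB, yet every character still has a stroke sequence.
oov_word = "大林"
print(oov_word in WORD_VOCAB, word_to_strokes(oov_word))
```

A word that is missing from the word-level vocabulary can still be fed to a model as stroke sequences, provided each of its characters appears in the stroke dictionary.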

Examples

Embodiment Construction

[0039] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments, so that those skilled in the art can better understand the present invention and implement it, but the examples given are not intended to limit the present invention.

[0040] Existing Chinese word vector modeling methods simply introduce information such as radicals and strokes. Given the complexity and diversity of Chinese character morphology, a simple n-gram model cannot represent semantics well. The present invention proposes a variable-length representation method for Chinese characters, uses an attention mechanism to explore the internal relationships among the stroke combinations of Chinese characters and their spatial connections with a higher degree of freedom, and designs a refined model that strengthens fine-grained morphological information and fuses it with semantic information...
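
To make the role of attention over a variable-length representation concrete, here is a small sketch. It assumes plain dot-product attention with a single query vector, which the text does not specify; `softmax`, `attention_pool`, and the toy vectors are hypothetical and stand in for learned sub-block embeddings.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, query):
    """Pool any number of sub-block vectors into one fixed-size vector.

    Assumption: dot-product scoring against a single query vector.
    """
    scores = [sum(q * x for q, x in zip(query, vec)) for vec in vectors]
    weights = softmax(scores)
    dim = len(query)
    pooled = [sum(w * vec[d] for w, vec in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights


# Characters with two or three sub-blocks both map to a 2-dimensional vector.
query = [1.0, 0.0]
two_blocks = [[1.0, 0.0], [0.0, 1.0]]
three_blocks = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
for blocks in (two_blocks, three_blocks):
    pooled, weights = attention_pool(blocks, query)
    print(len(blocks), len(pooled), round(sum(weights), 6))
```

The point of the sketch is only that the output size is independent of the number of sub-blocks, which is what lets characters of differing structural complexity share one downstream network.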

Abstract

The invention discloses a Chinese word vector modeling method. The method includes: using the BPE algorithm to adaptively combine Chinese strokes into Chinese character sub-blocks, and using an attention mechanism to combine and represent the internal structure of Chinese characters; applying a Highway network to the Chinese character representation obtained by information extraction, for fine-grained information enhancement; and, considering the complexity of Chinese grammar, constructing a bidirectional LSTM structure for semantic encoding in the semantic extraction stage. The result is an end-to-end deep neural language model whose basic components are a Chinese character adaptive combination layer, a morphological information extraction layer, a fine-grained information enhancement layer, and a semantic information extraction layer. Beneficial effects of the present invention: the invention constructs an input form different from n-grams and adaptively fuses Chinese strokes and radicals to form sub-blocks of Chinese characters.
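
The stroke-combination step can be illustrated with the standard byte pair encoding (BPE) merge loop applied to stroke codes. This is a sketch under assumptions: the toy corpus, the digit coding of strokes, and the function names `merge_pair` and `learn_bpe` are hypothetical, and the patent's actual merge criteria and data are not given in this text.

```python
from collections import Counter


def merge_pair(seq, pair):
    """Replace each adjacent occurrence of `pair` in `seq` with one merged token."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_bpe(stroke_strings, num_merges):
    """Learn up to `num_merges` merges; return (merges, segmented sequences)."""
    seqs = [list(s) for s in stroke_strings]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for seq in seqs:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break
        best, count = pairs.most_common(1)[0]
        if count < 2:  # nothing repeats, so further merges add no sub-blocks
            break
        merges.append(best)
        seqs = [merge_pair(seq, best) for seq in seqs]
    return merges, seqs


# Toy corpus of per-character stroke codes (hypothetical data).
corpus = ["1234", "12341234", "134", "34", "12"]
merges, segmented = learn_bpe(corpus, num_merges=3)
print(merges)
print(segmented)
```

On this toy corpus the three learned merges are `('3', '4')`, `('1', '2')`, and `('12', '34')`, so the recurring stroke run `1234` ends up as a single sub-block while rarer sequences stay split into smaller pieces.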

Description

Technical field

[0001] The invention relates to the field of natural language processing, in particular to a Chinese word vector modeling method.

Background technique

[0002] Word embeddings have become an essential part of any deep learning-based natural language processing system. Such systems encode words and sentences as fixed-length dense vectors, which greatly improves how neural networks process text data. In recent years, a large number of word embedding methods have been proposed. The most commonly used models are Word2Vec and GloVe, both of which are unsupervised methods based on the distributional hypothesis and are available for various languages. Considering the complexity of Chinese character morphology, more and more scholars have begun to study modeling methods for Chinese word vectors. Scholars at the Hong Kong Polytechnic University first proposed using Chinese radical information as components of CBOW and Skip-Gram to ...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/284, G06N3/04
CPC: G06F40/284, G06N3/045
Inventor: 徐斌辰, 康琦, 马璐
Owner: TONGJI UNIV