Chinese word vector generation method based on similar contexts and reinforcement learning

This technology applies reinforcement learning and similar contexts in the areas of neural learning methods, text database queries, and biological neural network models. It addresses the problems that adjacent words may be semantically irrelevant, that existing methods do not consider this case, and that the resulting word vector representations are of low quality. It thereby avoids semantically irrelevant contexts, improves word vector quality, and enhances the performance of the learning architecture.

Active Publication Date: 2020-04-17
UNIV OF ELECTRONICS SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

[0005] The technical problem to be solved by the present invention is that existing Chinese word vector generation methods all make predictions from the relationship between the target word and its adjacent context. They do not take into account that some words in Chinese, although adjacent, are semantically irrelevant to each other, and as a result the quality of the word vector representation is not high.

Examples

Embodiment

[0064] As shown in Figures 1 to 8, the Chinese word vector generation method based on similar contexts and reinforcement learning of the present invention comprises:

[0065] Select a corpus and perform corpus preprocessing to construct a Chinese corpus;

[0066] Perform similar context discovery on Chinese target words to obtain similar contexts related to the semantics of Chinese target words;

[0067] Construct a Chinese word vector reinforcement learning framework, and conduct reinforcement learning to obtain the word vector representation of the Chinese target word.
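The excerpt does not say how similar contexts are discovered, so the following is only a minimal sketch of the underlying idea: a neighbouring word is kept as context only if it appears semantically related to the target word. Pointwise mutual information (PMI) over co-occurrence counts is used here purely as a stand-in relatedness score. The function names `pmi_table` and `filtered_context`, the window size, the threshold, and the toy pre-segmented corpus are all assumptions for illustration, not part of the patented method.

```python
import math
from collections import Counter


def pmi_table(sentences, window=2):
    """Count co-occurrences within `window` tokens and return a PMI scoring function."""
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in sentences:
        word_counts.update(tokens)
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pair_counts[(word, tokens[j])] += 1
    total_words = sum(word_counts.values())
    total_pairs = sum(pair_counts.values())

    def pmi(a, b):
        joint = pair_counts[(a, b)]
        if joint == 0:
            # Words that never co-occur get the lowest possible score.
            return float("-inf")
        p_ab = joint / total_pairs
        p_a = word_counts[a] / total_words
        p_b = word_counts[b] / total_words
        return math.log(p_ab / (p_a * p_b))

    return pmi


def filtered_context(tokens, index, pmi, window=2, threshold=0.0):
    """Return the window neighbours of tokens[index] whose PMI with it exceeds `threshold`."""
    target = tokens[index]
    lo, hi = max(0, index - window), min(len(tokens), index + window + 1)
    return [
        tokens[j]
        for j in range(lo, hi)
        if j != index and pmi(target, tokens[j]) > threshold
    ]


# Hypothetical, already word-segmented toy corpus (illustration only).
corpus = [
    ["我", "喜欢", "吃", "苹果"],
    ["我", "喜欢", "吃", "香蕉"],
    ["他", "的", "苹果", "很", "甜"],
    ["我", "的", "书"],
]
pmi = pmi_table(corpus)
print(filtered_context(corpus[0], 3, pmi))
```

A plain window-based model would use every neighbour inside the window; the filter above drops neighbours whose score falls below the threshold, which is one simple way to approximate "contexts related to the semantics of the target word".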

[0068] The overall process of the present invention is shown in Figure 1. The specific implementation steps are as follows:

[0069] Step 1, corpus construction: select a corpus and perform corpus preprocessing to construct a Chinese corpus;

[0070] 1.1, corpus preprocessing: use the opencc toolkit to perform Simplified/Traditional Chinese character conversion on the downloaded Internet text,...

Abstract

The invention discloses a Chinese word vector generation method based on similar contexts and reinforcement learning. It addresses the problems that existing Chinese word vector generation methods make predictions from the relationship between a target word and its adjacent context, do not consider that some adjacent words in Chinese are semantically irrelevant, and therefore produce word vectors of limited quality. The method comprises the following steps: selecting a corpus and preprocessing it to construct a Chinese corpus; performing similar context discovery on the Chinese target words to obtain similar contexts related to the semantics of those target words; and constructing a Chinese word vector reinforcement learning framework and performing reinforcement learning to obtain the word vector representation of each Chinese target word. The method can resolve the problem of adjacent but semantically irrelevant Chinese words and generate high-quality Chinese word vectors.

Description

Technical field

[0001] The invention relates to the technical field of natural language processing, in particular to a method for generating Chinese word vectors based on similar contexts and reinforcement learning.

Background technique

[0002] Natural language processing is an important direction in the field of computer science and artificial intelligence. Currently, natural language processing tasks include machine translation, sentiment analysis, text summarization, text classification, and information extraction. In natural language processing tasks, the first step is to consider how to enable computers to represent natural language. Computers cannot represent natural language directly, so a method is needed to express natural language mathematically so that computers can process it. This is the word vector. Word vectors represent natural language as real-valued vectors that carry semantics. Specifically, words are mapped into a vector space and represented with...
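To make the idea of mapping words into a vector space concrete, here is a minimal sketch in which semantic relatedness is read off as the cosine similarity between vectors. The three-dimensional vectors in `toy_vectors` are invented for illustration and are not learned by any method described in this patent.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical hand-written vectors (assumption: real word vectors are learned
# from a corpus and usually have tens to hundreds of dimensions).
toy_vectors = {
    "苹果": [0.9, 0.1, 0.0],  # apple
    "香蕉": [0.8, 0.2, 0.1],  # banana
    "电脑": [0.0, 0.1, 0.9],  # computer
}

fruit_sim = cosine_similarity(toy_vectors["苹果"], toy_vectors["香蕉"])
mixed_sim = cosine_similarity(toy_vectors["苹果"], toy_vectors["电脑"])
print(fruit_sim, mixed_sim)
```

In this toy setup the two fruit words point in nearly the same direction and so score higher than the fruit and computer pair, which is the property a good word vector representation is expected to have.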

Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F16/33, G06F40/30, G06N3/04, G06N3/08
CPC: G06F16/3344, G06N3/08, G06N3/045
Inventor: 杨尚明, 张云, 刘勇国, 李巧勤
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA