Word embedding representation method based on internal semantic hierarchical structure

A technology combining hierarchical structure and embedding representation, applied in semantic analysis, neural architectures, natural language data processing, etc. It addresses the problem that existing methods neglect rich semantic information.

Status: Inactive. Publication date: 2017-08-08
XIAMEN UNIV
Cites: 2; Cited by: 18

AI Technical Summary

Problems solved by technology

However, these methods ignore the rich semantic information carried by the characters that make up Chinese words.
Current word embedding learning methods therefore remain insufficient, and obtaining better word embeddings is still of great research value.



Examples


Specific embodiments

[0028] Step 1: serialize the character-level tree structure of each word, following the principle that the hierarchical structure of the word's internal characters must remain invariant;

[0029] Step 2: use the resulting sequence as the input of a bidirectional GRU network to encode the embedding representation;

[0030] Step 3: train the parameters with the objective of maximizing the language model probability.
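Step 3 states only the training goal. Assuming a standard left-to-right language model over a training corpus $w_1, \dots, w_T$, with all parameters (including those of the bidirectional GRU encoder) collected in $\theta$, the objective could be written as the log-likelihood below. The softmax form, the vocabulary $V$, the context vector $\mathbf{c}_t$ summarizing $w_1, \dots, w_{t-1}$, and the output vectors $\mathbf{u}_w$ are illustrative assumptions, not details given in the text.

```latex
\theta^{*} = \arg\max_{\theta} \sum_{t=1}^{T} \log p\left(w_t \mid w_1, \dots, w_{t-1}; \theta\right),
\qquad
p\left(w_t \mid w_1, \dots, w_{t-1}; \theta\right)
  = \frac{\exp\left(\mathbf{u}_{w_t}^{\top} \mathbf{c}_{t}\right)}
         {\sum_{w \in V} \exp\left(\mathbf{u}_{w}^{\top} \mathbf{c}_{t}\right)}
```

Under this reading, the word embeddings produced in Step 2 would enter the model through $\mathbf{c}_t$, so gradients of the log-likelihood flow back into the GRU parameters.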

[0031] The following describes the implementation details of the key steps:

[0032] 1. Serialize the tree structure

[0033] In the present invention, an open-source tool [9] is used to obtain the internal hierarchical structure of each word in the form of a character-level tree, from which serialized word-structure information can be extracted.
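The text states that the tree is serialized so that the hierarchy stays invariant but, in this excerpt, does not show the exact format. As a minimal sketch, assuming a hypothetical `Node` type whose internal nodes carry a label and whose leaves are single characters, a pre-order traversal that writes `(LABEL`, the children, and `)` yields a flat sequence that keeps both each character's hierarchical position and its classification label. The bracket scheme, the labels `NN` and `NN-c`, and the tree shape used for 建筑业 ("construction industry") are illustrative assumptions, not the output of the tool cited as [9].

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    """Hypothetical character-level tree node: an internal node has a label
    (structure/classification tag) and children; a leaf is one character."""
    label: str
    children: List[Union["Node", str]] = field(default_factory=list)


def serialize(node: Union[Node, str]) -> List[str]:
    """Pre-order traversal emitting '(LABEL', the child tokens, then ')'.

    Brackets record each character's hierarchical position and labels record
    its classification. Assumes no leaf character is itself a bracket token.
    """
    if isinstance(node, str):  # leaf: a single character
        return [node]
    tokens = ["(" + node.label]
    for child in node.children:
        tokens.extend(serialize(child))
    tokens.append(")")
    return tokens


def deserialize(tokens: List[str]) -> Union[Node, str]:
    """Inverse of serialize(), used to check that no structure is lost."""
    pos = 0

    def parse() -> Union[Node, str]:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if not tok.startswith("("):
            return tok  # a character leaf
        node = Node(tok[1:])
        while tokens[pos] != ")":
            node.children.append(parse())
        pos += 1  # consume the closing ')'
        return node

    return parse()


# Hypothetical tree for 建筑业: [[建 筑] 业]
word_tree = Node("NN", [Node("NN-c", ["建", "筑"]), "业"])
sequence = serialize(word_tree)
# sequence == ["(NN", "(NN-c", "建", "筑", ")", "业", ")"]
```

Because `deserialize(serialize(tree))` reproduces the original tree, the flat sequence loses no hierarchical information, which is one way to read the "invariance" requirement.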

[0034] Figure 1 shows character-level tree structures obtained with the open-source tool. The character-level tree structure of the sentence "China's construction industry presents a new pattern" contains the words "Chi...



Abstract

The invention relates to natural language processing based on deep learning, and in particular to a word embedding representation method based on an internal semantic hierarchical structure. An open-source tool is used to acquire the internal hierarchical structure of each word in an input text. This structure resembles a conventional constituency tree, except that the basic unit is the character, and each character is labeled with its hierarchical position and classification information within the structure. The structure is then serialized according to the principle that the hierarchical structure must remain invariant, yielding a text sequence that preserves the hierarchical position and classification information inside the word. A bidirectional GRU network encodes this sequence; the two embedding vectors produced by the forward and backward GRUs are concatenated and then passed through a non-linear transformation and a ReLU operation to obtain the final embedding vector of the word. The framework is clear and concise, and the method is direct; it helps learn word embeddings with richer meaning that better serve downstream natural language processing tasks.
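The encoding described in the abstract can be sketched in plain Python. This is a minimal illustration under several assumptions not stated in the source: the last hidden state of each direction is taken as that direction's embedding, the GRU is a standard cell (update gate z, reset gate r) without bias terms, the "non-linear transformation" is read as an affine map followed by ReLU, and `GRU`, `WordEncoder`, and all dimensions are hypothetical names and sizes. Weights are random, so the output is an untrained vector.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class GRU:
    """Single-layer GRU cell; bias terms omitted for brevity."""

    def __init__(self, in_dim: int, hid_dim: int, rng: random.Random):
        self.hid_dim = hid_dim
        # Input matrix W and recurrent matrix U for gates z, r and candidate h.
        self.W = {g: rand_matrix(hid_dim, in_dim, rng) for g in "zrh"}
        self.U = {g: rand_matrix(hid_dim, hid_dim, rng) for g in "zrh"}

    def step(self, x: Vector, h: Vector) -> Vector:
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.W["h"], x), matvec(self.U["h"], rh))]
        # Conventions differ on which term z weights; here: (1 - z) * h + z * candidate.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def run(self, xs: List[Vector]) -> Vector:
        h = [0.0] * self.hid_dim
        for x in xs:
            h = self.step(x, h)
        return h  # last hidden state summarizes the sequence


class WordEncoder:
    """Hypothetical BiGRU encoder: serialized word tokens -> word embedding."""

    def __init__(self, vocab: List[str], tok_dim: int, hid_dim: int, out_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.tok_emb: Dict[str, Vector] = {
            t: [rng.uniform(-0.1, 0.1) for _ in range(tok_dim)] for t in vocab
        }
        self.fwd = GRU(tok_dim, hid_dim, rng)
        self.bwd = GRU(tok_dim, hid_dim, rng)
        self.W_o = rand_matrix(out_dim, 2 * hid_dim, rng)
        self.b_o = [0.0] * out_dim

    def encode(self, tokens: List[str]) -> Vector:
        xs = [self.tok_emb[t] for t in tokens]
        h_f = self.fwd.run(xs)        # forward GRU reads left to right
        h_b = self.bwd.run(xs[::-1])  # backward GRU reads right to left
        concat = h_f + h_b            # list concatenation splices the two vectors
        pre = [a + b for a, b in zip(matvec(self.W_o, concat), self.b_o)]
        return [max(0.0, v) for v in pre]  # ReLU
```

For example, encoding a hypothetical serialized sequence such as `["(NN", "(NN-c", "建", "筑", ")", "业", ")"]` with `WordEncoder(sorted(set(tokens)), tok_dim=8, hid_dim=6, out_dim=5)` returns a 5-dimensional non-negative vector. Using the last state of each direction is one common choice; pooling over all positions would be an equally plausible reading of the abstract.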

Description

Technical field

[0001] The present invention relates to natural language processing based on deep learning, and in particular to a word embedding representation method based on internal semantic hierarchy.

Background

[0002] Natural language processing, an interdisciplinary branch of computer science and linguistics, is currently a very active field. It studies theories and methods for processing and using natural language so as to enable effective communication with computer systems. In recent years, natural language processing based on deep learning has become the main trend in the development of the discipline.

[0003] Word embedding refers to representing the semantic information of words with distributed vectors. By mapping the words of a natural language into low-dimensional, dense vectors, all words lie in the same vector space, and the concept of "distance" is introduced to measure the semantic similarity between words, wh...
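The idea of measuring semantic similarity through distance in a shared vector space can be illustrated with cosine similarity, one common choice; the passage itself does not name a specific measure. The three-dimensional vectors below are made-up toy values, not trained embeddings, and `nearest` is a hypothetical helper.

```python
import math
from typing import Dict, List


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between u and v (assumes neither is a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nearest(word: str, table: Dict[str, List[float]]) -> str:
    """Return the other word whose vector is most similar to `word`'s."""
    query = table[word]
    return max((w for w in table if w != word),
               key=lambda w: cosine_similarity(query, table[w]))


# Toy embedding table (values invented for illustration only).
toy = {
    "建筑": [0.9, 0.1, 0.0],  # "construction"
    "建造": [0.8, 0.2, 0.1],  # "build"
    "格局": [0.0, 0.3, 0.9],  # "pattern"
}
```

In this toy table, `nearest("建筑", toy)` returns `"建造"`, because their vectors point in nearly the same direction, while the vector for 格局 is almost orthogonal to both.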


Application Information

Patent type & authority: Application (China)
IPC(8): G06F17/27; G06N3/04
CPC: G06F40/247; G06F40/30; G06N3/04
Inventors: 苏劲松 (Su Jinsong), 杨静 (Yang Jing), 阮志伟 (Ruan Zhiwei), 张祥文 (Zhang Xiangwen)
Owner: XIAMEN UNIV