A word vector improvement method embedding external dictionary information

A technique involving word vectors and dictionaries, applied to improving word vectors by embedding external dictionary information. It addresses problems such as insufficient mining of word meaning, the inability of context words to directly determine the meaning of a center word, and insufficient training of low-frequency words. Its effects are to reduce the distance between the vectors of words with similar meanings and to alleviate the shortage of labeled data.

Active Publication Date: 2019-01-25
SUN YAT SEN UNIV
Cites: 4; Cited by: 3

AI Technical Summary

Problems solved by technology

The reason for this shortcoming is that the words surrounding the center word can only determine how the center word is used; they cannot directly determine its meaning.
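The point can be illustrated with a toy count of context words. In the hypothetical four-sentence corpus below, the antonyms "good" and "bad" occur in exactly the same contexts, so a model that sees only context windows receives no signal that their meanings differ. The function name `context_counts`, the corpus, and the window size are illustrative choices, not part of the patent.

```python
from collections import Counter

def context_counts(sentences, target, window=2):
    """Count the words within `window` positions of each occurrence of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            if tok == target:
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                # Collect the neighbours, skipping the target position itself.
                counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts

# Hypothetical toy corpus: antonyms placed in identical contexts.
corpus = [
    "the food was good today",
    "the food was bad today",
    "a good movie",
    "a bad movie",
]
print(context_counts(corpus, "good") == context_counts(corpus, "bad"))  # True
```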
[0006] (2) Because the corpus is written by people, two words with similar meanings sometimes occur with very different frequencies, which leaves low-frequency words insufficiently trained and their meanings insufficiently mined.
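To see why frequency matters: in skip-gram training, each occurrence of a word as the center produces one update per context word, so a word's share of updates scales with its frequency. A minimal sketch on an invented corpus in which "big" is frequent and its near-synonym "huge" is rare; `skipgram_pairs` is a hypothetical helper and `window=1` is an arbitrary choice.

```python
from collections import Counter

def skipgram_pairs(tokens, window=1):
    """Yield (center, context) pairs the way a skip-gram model trains on them."""
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                yield center, tokens[j]

# Hypothetical toy corpus: "big" is frequent, its near-synonym "huge" is rare.
tokens = "a big house a big car a big dog a huge tree a big cat".split()
updates = Counter(center for center, _ in skipgram_pairs(tokens))
print(updates["big"], updates["huge"])  # 8 2
```

Here "huge" receives a quarter of the center-word updates that "big" does, which is the kind of imbalance described in [0006].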



Embodiment Construction

[0025] The accompanying drawings are for illustrative purposes only and should not be construed as limiting this patent. To better illustrate this embodiment, certain components in the drawings may be omitted, enlarged, or reduced, and do not represent the size of the actual product. Those skilled in the art will understand that some well-known structures and their descriptions may be omitted from the drawings. The positional relationships described in the drawings are for illustrative purposes only and should not be construed as limiting this patent.

[0026] As shown in Figure 1, the word vector improvement method embedding external dictionary information includes the following steps:

[0027] S1: Prepare a large corpus and an electronic dictionary;

[0028] S2: Similar-word dictionary: each word in the electronic dictionary may have synonyms and near-synonyms, which are extracted and recorded using scripts;
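Step S2 does not specify the dictionary's file format or the extraction script. Assuming, purely for illustration, that each entry is a line of the form `headword<TAB>syn1,syn2,...`, the recording step could look like the sketch below. The entry format, the function name `build_similar_word_dict`, and the choice to treat synonymy as symmetric are all assumptions.

```python
from collections import defaultdict

def build_similar_word_dict(lines):
    """Build a symmetric similar-word dictionary from dictionary entries.

    Assumes a hypothetical entry format: "headword<TAB>syn1,syn2,...".
    """
    similar = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or "\t" not in line:
            continue  # skip blank lines and entries without a synonym field
        head, syn_field = line.split("\t", 1)
        for syn in syn_field.split(","):
            syn = syn.strip()
            if syn and syn != head:
                similar[head].add(syn)
                similar[syn].add(head)  # assumption: synonymy is symmetric
    return dict(similar)

entries = ["happy\tglad,joyful", "glad\tpleased", "table"]
sim = build_similar_word_dict(entries)
print(sorted(sim["glad"]))  # ['happy', 'pleased']
```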

[0029] S...


Abstract

The present invention relates to the technical field of natural language processing, and more particularly to a word vector improvement method embedding external dictionary information. The invention fuses information from a similar-word dictionary and a related-word dictionary into ordinary word vectors. Compared with ordinary word vectors, the invention better separates the influence of co-occurring words and, at the same time, reduces the distance between the vectors of words with similar meanings, so that the final word vectors are closer to the objective meanings of the words. Moreover, because word vectors underlie many natural language processing tasks, word vectors closer to the objective meaning of a word help improve downstream tasks. High-quality external pre-trained word vectors can also alleviate the shortage of labeled data in some tasks.
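This excerpt does not give the patent's fusion objective, so the sketch below substitutes a well-known related technique, retrofitting (Faruqui et al., 2015), which iteratively moves each vector toward its dictionary neighbours while keeping it anchored to its original value. It only illustrates the stated effect of reducing the distance between vectors of similar words; it is not the patent's method, and it ignores the related-word dictionary entirely. The function name, the toy vectors, and the weights `alpha` and `beta` are assumptions.

```python
import math

def retrofit(vectors, similar, iterations=10, alpha=1.0):
    """Retrofitting-style update (a stand-in, not the patent's method).

    vectors: word -> list of floats (the original, pre-trained vectors)
    similar: word -> set of similar words (e.g. from a similar-word dictionary)
    """
    new = {w: list(v) for w, v in vectors.items()}
    for _ in range(iterations):
        for word, neighbours in similar.items():
            nbrs = [n for n in neighbours if n in new]
            if word not in new or not nbrs:
                continue  # words without vectors or neighbours keep their vectors
            beta = 1.0 / len(nbrs)  # neighbours share equal weight
            # Weighted average of the original vector and the current neighbour vectors.
            new[word] = [
                (alpha * vectors[word][d] + beta * sum(new[n][d] for n in nbrs))
                / (alpha + beta * len(nbrs))
                for d in range(len(vectors[word]))
            ]
    return new

vectors = {"happy": [1.0, 0.0], "glad": [0.0, 1.0], "table": [5.0, 5.0]}
similar = {"happy": {"glad"}, "glad": {"happy"}}
new = retrofit(vectors, similar)
print(round(math.dist(vectors["happy"], vectors["glad"]), 2),
      round(math.dist(new["happy"], new["glad"]), 2))  # 1.41 0.47
```

On this toy input the distance between "happy" and "glad" shrinks, while "table", which has no dictionary entry, keeps its original vector.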

Description

Technical field

[0001] The present invention relates to the technical field of natural language processing, and more specifically to a method for improving word vectors by embedding external dictionary information.

Background art

[0002] Word vectors, a common underlying technology in current natural language processing, are vectorized representations of words, and each dimension of the vector often carries some meaning related to the meaning of the word. The most widely used word vector technique at present is word2vec, which produces distributed word vectors. It follows the distributional hypothesis: the meaning of a word is jointly determined by the words in its context. word2vec has two implementations, the CBOW model and the skip-gram model. The idea of the CBOW model is to predict the center word from several words around it, while the idea of the skip-gram model is to predict the center ...
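The two word2vec schemes differ in which side of each context window is the input and which is predicted: CBOW maps the surrounding words to the center word, while skip-gram maps the center word to each surrounding word. A minimal sketch of how each turns a sentence into training examples; the toy sentence and window size are illustrative, the function names are hypothetical, and real word2vec training steps such as subsampling and negative sampling or hierarchical softmax are omitted.

```python
def cbow_examples(tokens, window=2):
    """CBOW: (list of context words) -> center word to predict."""
    examples = []
    for i, center in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window), min(len(tokens), i + window + 1))
                   if j != i]
        examples.append((context, center))
    return examples

def skipgram_examples(tokens, window=2):
    """Skip-gram: center word -> each context word to predict."""
    return [(center, ctx)
            for context, center in cbow_examples(tokens, window)
            for ctx in context]

tokens = "the cat sat on the mat".split()
print(cbow_examples(tokens, window=1)[2])       # (['cat', 'on'], 'sat')
print(skipgram_examples(tokens, window=1)[:3])  # [('the', 'cat'), ('cat', 'the'), ('cat', 'sat')]
```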


Application Information

Patent Type & Authority: Application (China)
IPC (8): G06F17/27
CPC: G06F40/247
Inventors: Huang Miaoxin (黄淼鑫), Pan Rong (潘嵘)
Owner: SUN YAT SEN UNIV