
Dictionary-based word vector generation method and system

Word-vector and dictionary technology, applied to generating word vectors from dictionaries. It addresses problems such as insufficient training of low-frequency words and insufficient mining of word meaning, and aims to produce more accurate word vectors whose meanings are easier to mine.

Pending Publication Date: 2021-01-01
WORKWAY SHENZHENINFORMATION TECH CO LTD

AI Technical Summary

Problems solved by technology

Corpora are compiled manually, and two words with similar meanings can occur at very different frequencies. As a result, low-frequency words are insufficiently trained and their meanings are insufficiently mined.



Examples


Detailed Description of the Embodiments

[0039] Embodiments of the present invention will be described in detail below in conjunction with the accompanying drawings.

[0040] It should be noted that, where no conflict arises, the following embodiments and the features in them can be combined with each other. All other embodiments that those of ordinary skill in the art obtain from the embodiments in the present disclosure without creative work fall within the protection scope of the present disclosure.

[0041] The following describes various aspects of the embodiments that are within the scope of the appended claims. It should be apparent that the aspects described herein may be embodied in a wide variety of forms and that any specific structure and/or function described herein is illustrative only. Based on the present disclosure, one skilled in the art should appreciate that an aspect described herein may be implemented independently of any other aspects and that two or more of t...


Abstract

The invention relates to a dictionary-based word vector generation method and system. The method comprises the following steps: forming a vocabulary set from the words contained in the dictionary; counting how often each word in the vocabulary set occurs in the definitions contained in the dictionary; segmenting each definition into words according to those frequencies to obtain a definition word sequence; taking the words as nodes and connecting them with directed edges according to the correspondence between each word and its definition word sequence; determining the weight of each directed edge to obtain a dictionary-based directed graph; and running the DeepWalk algorithm (rendered as "depth walk" in the machine translation) on the directed graph to obtain the word vectors. Because the method fuses the lexical information provided by the dictionary into the word vectors, it provides a high-quality data basis for word vector training, allows word meanings to be mined more thoroughly, and supports natural language processing tasks.
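The steps in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the patent's disclosed implementation: the toy dictionary `TOY_DICT`, the greedy frequency-first segmentation rule, the use of occurrence counts as edge weights, and the function names are all hypothetical. The sketch covers only the graph construction and the random-walk half of a DeepWalk-style procedure, and it uses weighted walks where the original DeepWalk samples neighbors uniformly.

```python
import random
from collections import Counter

# Hypothetical toy dictionary: headword -> definition written without spaces,
# to mimic text that must be segmented before use.
TOY_DICT = {
    "cat": "smallpet",
    "dog": "loyalpet",
    "pet": "tameanimal",
    "animal": "livingbeing",
    "small": "notbig",
    "loyal": "faithful",
    "tame": "notwild",
}

def count_frequencies(dictionary):
    """Count how often each headword occurs inside all definitions."""
    freq = Counter({word: 0 for word in dictionary})
    for definition in dictionary.values():
        for word in dictionary:
            freq[word] += definition.count(word)
    return freq

def segment(definition, freq):
    """Greedy segmentation (assumed rule): at each position take the matching
    headword with the highest frequency, breaking ties by length; characters
    that start no headword are skipped."""
    words, i = [], 0
    while i < len(definition):
        candidates = [w for w in freq if definition.startswith(w, i)]
        if candidates:
            best = max(candidates, key=lambda w: (freq[w], len(w)))
            words.append(best)
            i += len(best)
        else:
            i += 1
    return words

def build_graph(dictionary, freq):
    """Directed edge headword -> word in its definition sequence.
    Edge weight (assumed) = number of times the word appears in the sequence."""
    return {head: dict(Counter(segment(definition, freq)))
            for head, definition in dictionary.items()}

def random_walks(graph, walk_length=5, walks_per_node=2, seed=0):
    """Weighted random walks from every node; a walk stops at a dead end."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in graph:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = graph.get(walk[-1], {})
                if not neighbors:
                    break
                step = rng.choices(list(neighbors),
                                   weights=list(neighbors.values()))[0]
                walk.append(step)
            walks.append(walk)
    return walks

if __name__ == "__main__":
    freq = count_frequencies(TOY_DICT)
    graph = build_graph(TOY_DICT, freq)
    for walk in random_walks(graph):
        print(" -> ".join(walk))
```

In a DeepWalk-style pipeline, the resulting walks are treated as sentences and passed to a skip-gram trainer to produce the final vectors. That training step is omitted here to keep the sketch dependency-free.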

Description

Technical field

[0001] The invention relates to the field of natural language processing, and in particular to a method and system for generating word vectors based on a dictionary.

Background

[0002] The technique of representing words as vectors originated in the 1960s with the development of vector space models for information retrieval; singular value decomposition was used to reduce dimensionality, and latent semantic analysis was then introduced in the late 1980s. With the continuous development of technology, word vectors combined with deep networks are widely used in existing natural language processing tasks. Usually, word vectors are generated from a massive unlabeled corpus. The basic idea is to predict the current word from the text that surrounds it. The corpus is manually compiled, and the frequency of two words with similar meanings is sometimes very different, which will lead to insufficient training of words with low frequency and insufficien...


Application Information

IPC(8): G06F40/242, G06F40/289, G06F40/284, G06F40/30
CPC: G06F40/242, G06F40/284, G06F40/289, G06F40/30
Inventor 练睿肖杰莫永卓赵顺峰
Owner WORKWAY SHENZHENINFORMATION TECH CO LTD