
A low-dimensional word representation learning method based on frequency distribution correction

A low-dimensional word representation learning method, applied in fields such as instruments, computing, and electrical digital data processing, that addresses problems such as the insufficient accuracy of low-dimensional word representations and the failure of semantic computations.

Active Publication Date: 2021-05-14
SHANXI UNIV
4 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

Distributed word representations derived directly from a word-pair co-occurrence matrix have a major flaw: each word's representation is a very sparse, high-dimensional vector, which often causes semantic computations based on it to fail.
Existing methods for learning low-dimensional word representations, in turn, still leave room for improvement in accuracy.



Examples


Detailed Description of the Embodiments

[0017] The technical solutions of the present invention are described in more detail below with reference to specific embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.

[0018] Refer to Figure 1, which is a schematic diagram of the low-dimensional word representation learning method based on frequency distribution correction provided by the present invention. The method includes the following steps:

[0019] S110: Generate a vocabulary V from the given corpus C, where V is the set of all distinct words appearing in C.
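Step S110 can be illustrated with a minimal sketch. It assumes the corpus has already been split into sentences and lower-cased tokens; the source does not specify tokenization, and the names `build_vocabulary`, `corpus_c`, `vocabulary_v`, and `word_to_index` are hypothetical, chosen only for this example.

```python
def build_vocabulary(corpus):
    """Return the sorted list of distinct words in a tokenized corpus.

    `corpus` is assumed to be an iterable of sentences, each a list of
    string tokens (an assumption; the source does not fix a format).
    """
    vocab = set()
    for sentence in corpus:
        vocab.update(sentence)
    # Sorting is not required by S110; it only makes the result deterministic.
    return sorted(vocab)


# Toy corpus C, invented for illustration.
corpus_c = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]

vocabulary_v = build_vocabulary(corpus_c)

# An index per word is convenient for later co-occurrence counting.
word_to_index = {word: i for i, word in enumerate(vocabulary_v)}
```

Here `vocabulary_v` contains each distinct word exactly once, regardless of how often it occurs in the corpus.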

[0020] In the present invention, a corpus is a collection of language instances collected in the form of natural language text...



Abstract

The invention discloses a low-dimensional word representation learning method based on frequency distribution correction. For a given corpus, the co-occurrence frequency of word pairs within a set window is counted, and an appropriate power transformation is applied to the logarithm of the co-occurrence frequency. The exponent of the power transformation is optimized adaptively according to the corpus, so that the distribution of word-pair co-occurrence frequencies is first corrected to a Zipf distribution. Low-dimensional word representation vectors are then learned with the GloVe model. Experiments show that the resulting word representations are more accurate and faster to train. The invention thus makes it possible to generate low-dimensional word representations with higher precision.
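The first two operations named in the abstract, windowed co-occurrence counting and a power transformation of the log frequency, can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: the window is symmetric and unweighted, the exponent is passed in as a fixed value (this excerpt does not describe how it is optimized for a corpus), and the GloVe training step is omitted. The function names `count_cooccurrences` and `power_transform_log` are hypothetical.

```python
import math
from collections import Counter


def count_cooccurrences(corpus, window=2):
    """Count how often each ordered word pair co-occurs within `window` tokens.

    Assumption: a symmetric window with equal weight for every distance.
    """
    counts = Counter()
    for sentence in corpus:
        for i, word in enumerate(sentence):
            # Look back at most `window` tokens and count both directions.
            for j in range(max(0, i - window), i):
                counts[(word, sentence[j])] += 1
                counts[(sentence[j], word)] += 1
    return counts


def power_transform_log(counts, exponent):
    """Raise the natural log of each co-occurrence count to `exponent`.

    Assumption: `exponent` is positive and supplied by the caller.
    Pairs seen exactly once map to 0.0 because log(1) == 0.
    """
    return {pair: math.log(x) ** exponent for pair, x in counts.items()}


# Toy corpus, invented for illustration.
toy_corpus = [["a", "b", "a", "b"]]
counts = count_cooccurrences(toy_corpus, window=1)
transformed = power_transform_log(counts, exponent=2.0)
```

In this toy corpus the adjacent pair (`a`, `b`) occurs three times, so its transformed value is the square of log 3. A real pipeline would feed the transformed values to a GloVe-style trainer in place of the raw counts.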

Description

Technical field

[0001] The invention relates to the field of low-dimensional word representation learning, in particular to a low-dimensional word representation learning method based on frequency distribution correction.

Background art

[0002] In natural language, words are the basic units that carry semantics. How should the meaning of a word be represented? The distributional hypothesis proposed by Harris in 1954 provides a theoretical basis: words with similar contexts have similar semantics. Firth further elaborated the distributional hypothesis in 1957: a word is characterized by the company it keeps (the meaning of a word can be characterized by the words around it).

[0003] With the widespread use of large-scale corpora in natural language processing, a distributed representation of words has evolved from the distributional hypothesis above. This method needs to construct a word-pair co-occurrence matrix, and directly obtain the distributed representation of...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/284
CPC: G06F40/284
Inventor: 曹学飞, 李济洪, 王瑞波, 王钰, 石隽峰, 谷波, 牛倩
Owner: SHANXI UNIV