
Word vector matrix compression method and device, and method and device for obtaining word vectors

A word vector matrix compression method, applied in the field of data processing, that addresses problems such as the large size of word vector matrices and the large storage space they occupy.

Active Publication Date: 2022-04-12
HUAWEI TECH CO LTD
9 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0004] Because natural language has a large vocabulary and complex syntax and grammar, a large number of features are needed to describe it. The generated word vector matrix is therefore large and occupies considerable storage space, so it usually cannot be applied directly on a client device.
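To give a rough sense of the scale behind this problem, the sketch below estimates the storage footprint of an uncompressed word vector matrix. The vocabulary size, vector dimension, and 4-byte float width are hypothetical figures chosen for illustration; none of them come from the application.

```python
def matrix_size_bytes(vocab_size: int, dim: int, bytes_per_value: int = 4) -> int:
    """Storage needed for a dense vocab_size x dim matrix of fixed-width values."""
    return vocab_size * dim * bytes_per_value


# Hypothetical figures (not from the source): 200,000 words, 300 dimensions, float32.
size = matrix_size_bytes(200_000, 300)
print(f"{size / 2**20:.1f} MiB")  # prints 228.9 MiB
```

Under these assumed figures the matrix alone takes a few hundred mebibytes, which illustrates why storing it on a client device can be impractical.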



Examples


Detailed Description of Embodiments

[0051] The relevant terms involved in the embodiments of the present application are briefly introduced below to facilitate readers' understanding.

[0052] Vocabulary is the sum of all words and/or phrases in a language (such as Chinese or English) or within a specific range. In the embodiments of the present application, unless otherwise specified, "word" may refer to a word or a phrase. Here, a word includes a "character" or a "ci" (a word of one or more characters) in Chinese, and a "word" in languages such as English.

[0053] The semantic information of a word is a collection of feature information used to describe the word. The feature information of a word may include, but is not limited to, at least one of the following: the meaning of the word, its part of speech (such as noun or adjective), its synonyms and antonyms, and the like. For example, the semantic information of "beautiful" may include: the meaning is "beautiful, that is, it is close to perfection or ideal in form, propor...



Abstract

The present application discloses a word vector matrix compression method and device, and a method and device for obtaining word vectors. It relates to the technical field of data processing and helps save storage space on client devices. The word vector matrix compression method includes: generating, based on a word vector model, a word vector matrix representing the vocabulary to be processed, and using the generated matrix as the word vector matrix to be compressed, where each row or column of the matrix to be compressed is a word vector and each word vector represents one word in the vocabulary to be processed; classifying the word vectors in the matrix to be compressed according to the semantic information of the vocabulary to be processed, to obtain at least two categories; compressing the word vectors of at least one of the at least two categories; constructing a compressed word vector matrix from the compressed word vectors of the at least one category; and storing the compressed word vector matrix.
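The steps in the abstract (classify by semantic information, compress at least one category, build and store the compressed matrix) can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the abstract does not say how a category is compressed, so 8-bit linear quantization is swapped in here as a stand-in, and the category labels, function names, and sample vectors are all hypothetical.

```python
from collections import defaultdict


def classify(words, category_of):
    """Group row indices by a semantic category (here: a hypothetical label per word)."""
    groups = defaultdict(list)
    for i, word in enumerate(words):
        groups[category_of[word]].append(i)
    return dict(groups)


def quantize(rows):
    """Assumed compression step: 8-bit linear quantization with one shared scale."""
    max_abs = max((abs(x) for row in rows for x in row), default=0.0)
    scale = max_abs / 127 if max_abs else 1.0
    codes = [[round(x / scale) for x in row] for row in rows]
    return {"scale": scale, "codes": codes}


def compress_matrix(words, matrix, category_of, compress_categories):
    """Build a per-category store; only the chosen categories are compressed."""
    store = {}
    for cat, idxs in classify(words, category_of).items():
        rows = [matrix[i] for i in idxs]
        block = {"words": [words[i] for i in idxs]}
        if cat in compress_categories:
            block.update(compressed=True, **quantize(rows))
        else:
            block.update(compressed=False, rows=rows)
        store[cat] = block
    return store


def lookup(store, word):
    """Obtain a word vector, dequantizing if its category was compressed."""
    for block in store.values():
        if word in block["words"]:
            k = block["words"].index(word)
            if block["compressed"]:
                return [c * block["scale"] for c in block["codes"][k]]
            return block["rows"][k]
    raise KeyError(word)


# Hypothetical data: one word vector per row, categories taken from part of speech.
words = ["beautiful", "pretty", "run", "walk"]
matrix = [[0.50, -0.25], [0.40, -0.20], [1.0, 0.0], [0.9, 0.1]]
category_of = {"beautiful": "adj", "pretty": "adj", "run": "verb", "walk": "verb"}
store = compress_matrix(words, matrix, category_of, compress_categories={"adj"})
```

In this sketch only the "adj" category is quantized, so `lookup(store, "run")` returns the original vector while `lookup(store, "beautiful")` returns a close approximation. Choosing which categories to compress, and how aggressively, is where the semantic classification would pay off in practice.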

Description

Technical field

[0001] The present application relates to the technical field of data processing, and in particular to a word vector matrix compression method and device, and a method and device for obtaining word vectors.

Background

[0002] With the development of deep learning technology, the performance of natural language processing (NLP) tasks has improved greatly. NLP tasks include word segmentation, part-of-speech tagging, named-entity recognition (NER), sentence classification, dialogue systems, and so on. In deep-learning-based NLP, the core step is to represent vocabulary through a language model. The language model captures not only the meanings of the words themselves but also the relationships between different words, such as synonymy, antonymy, and contextual relationships.

[0003] The word vector matrix is a specific representation of the language model. A word vector matrix is a matrix composed of one or...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/253, G06F16/35
CPC: G06F16/35, G06F40/284
Inventor: 谢月飞宋增猛王俊汤华马占寅
Owner: HUAWEI TECH CO LTD