Word vector matrix compression method and device and word vector acquisition method and device

A compression method and compression device, applied in the field of data processing, that address problems such as the large size of a word vector matrix and the large storage space it occupies.

Active Publication Date: 2019-08-06
HUAWEI TECH CO LTD

AI Technical Summary

Problems solved by technology

[0004] Because language has a large vocabulary and complex syntax and grammar, a large number of features are required to describe it. The generated word vector matrix is therefore large and occupies a relatively large amount of storage space, so it usually cannot be applied directly on a client device.
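
To give a sense of scale, the storage cost of a dense word vector matrix can be estimated as vocabulary size times vector dimension times bytes per element. The sketch below uses illustrative figures (100,000 words, 300 dimensions, 32-bit floats) that are assumptions, not values taken from the application; the function name is hypothetical.

```python
def matrix_size_bytes(num_words, dim, bytes_per_value=4):
    """Estimate the storage size of a dense word vector matrix.

    All arguments are illustrative assumptions: one vector of `dim`
    values per word, each value stored in `bytes_per_value` bytes.
    """
    return num_words * dim * bytes_per_value


# Hypothetical vocabulary of 100,000 words with 300-dimensional float32 vectors.
size = matrix_size_bytes(100_000, 300)
print(f"{size / 1e6:.0f} MB")  # prints "120 MB"
```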




Embodiment Construction

[0051] The relevant terms used in the embodiments of the present application are briefly introduced below to aid understanding.

[0052] A vocabulary is the sum of all words and/or phrases in a language (such as Chinese or English) or within a specific range. In the embodiments of the present application, unless otherwise specified, "word" may refer to a word or a phrase. Here, "word" covers both a "character" and a "ci" in Chinese, as well as a "word" in languages such as English.

[0053] The semantic information of a word is a collection of feature information used to describe the word. The feature information of a word may include, but is not limited to, at least one of the following: the meaning of the word, its part of speech (such as noun or adjective), its synonyms and antonyms, and the like. For example, the semantic information of "beautiful" may include: the meaning is "beautiful, that is, it is close to perfection or ideal in form, propor...


Abstract

The invention discloses a word vector matrix compression method and device and a word vector acquisition method and device, which relate to the technical field of data processing and help save storage space on client equipment. The word vector matrix compression method comprises the following steps: generating, based on a word vector model, a word vector matrix that represents a to-be-processed vocabulary, and taking the generated matrix as a to-be-compressed word vector matrix, wherein one row or one column of the to-be-compressed word vector matrix is a word vector, and each word vector in the matrix represents one word in the to-be-processed vocabulary; classifying the word vectors in the to-be-compressed word vector matrix according to the semantic information of the to-be-processed vocabulary to obtain at least two categories; and compressing at least one of the at least two categories, constructing a compressed word vector matrix from the word vectors obtained by compressing the at least one category, and storing the compressed word vector matrix.
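
The steps in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the method claimed in the application: the category labels are supplied by hand rather than derived from semantic information, and each compressed category is replaced by the mean of its word vectors, which is one simple choice that the text above does not specify. The names `compress_word_vectors`, `row_of_word`, and `compress_labels` are hypothetical.

```python
import numpy as np


def compress_word_vectors(matrix, labels, compress_labels):
    """Compress selected categories of a word vector matrix.

    matrix: array of shape (n_words, dim), one row per word.
    labels: one category label per word (assumed to be given).
    compress_labels: categories whose words will share one vector.

    Assumption: a compressed category is represented by the mean of
    its word vectors. Words in other categories keep their own rows.
    Returns the compressed matrix and, for each word, its row index.
    """
    matrix = np.asarray(matrix, dtype=np.float32)
    rows = []
    shared_row = {}
    for label in compress_labels:
        members = [i for i, l in enumerate(labels) if l == label]
        if not members:
            continue
        shared_row[label] = len(rows)
        rows.append(matrix[members].mean(axis=0))

    row_of_word = np.empty(len(labels), dtype=np.int64)
    for i, label in enumerate(labels):
        if label in shared_row:
            row_of_word[i] = shared_row[label]
        else:
            row_of_word[i] = len(rows)
            rows.append(matrix[i])
    return np.stack(rows), row_of_word


# Toy data: four words, two categories, only "positive" is compressed.
words = ["good", "great", "cat", "dog"]
labels = ["positive", "positive", "animal", "animal"]
matrix = [[1.0, 0.0], [3.0, 0.0], [0.0, 1.0], [0.0, 5.0]]

compressed, row_of_word = compress_word_vectors(matrix, labels, {"positive"})

# Word vector acquisition: look up the word's row in the compressed matrix.
vector_for_great = compressed[row_of_word[words.index("great")]]
```

In this toy case the compressed matrix has three rows instead of four, and "good" and "great" resolve to the same shared vector. The saving grows with the number of words per compressed category, at the cost of losing distinctions between words inside that category.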

Description

Technical field

[0001] The present application relates to the technical field of data processing, and in particular to a word vector matrix compression method and device, and a word vector acquisition method and device.

Background

[0002] With the development of deep learning technology, performance on natural language processing (NLP) tasks has improved greatly. NLP tasks include word segmentation, part-of-speech tagging, named-entity recognition (NER), sentence classification, dialogue systems, and so on. In deep-learning-based approaches, the core of an NLP task is to represent vocabulary through a language model. The language model not only captures the meaning of the words themselves, but also reflects the relationships between different words, such as synonymy, antonymy, and contextual relationships.

[0003] The word vector matrix is a specific representation of the language model. A word vector matrix is a matrix composed of one or...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F16/35
CPC: G06F16/35, G06F40/284
Inventors: 谢月飞, 宋增猛, 王俊, 汤华, 马占寅
Owner: HUAWEI TECH CO LTD