
A text information processing method, device and equipment

A text information processing method, applied in the field of information processing. It addresses the problems of model performance loss, reduced accuracy, and large data footprint, and achieves a reduced parameter count while maintaining model accuracy with little accuracy loss.

Active Publication Date: 2022-04-19
CLOUD WISDOM BEIJING TECH
9 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

However, when the vocabulary is large, the word2vec data takes up a great deal of space. For example, one set of English word vectors has a dimension of 300 and covers a total of 2 million words and subwords; the word vector file alone is 7 GB. In common application scenarios, where memory and computing resources are limited, such a file cannot be used.
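The storage figure above can be sanity-checked with a little arithmetic. The sketch below assumes the vectors are held as raw 32-bit floats, which the source does not state; the helper name `dense_embedding_bytes` is made up for illustration. A text-format file on disk is larger than this in-memory lower bound, which fits the 7 GB figure quoted for the file.

```python
def dense_embedding_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to hold a dense embedding table as raw numbers.

    bytes_per_value=4 assumes float32 storage (an assumption, not from the source).
    """
    return num_vectors * dim * bytes_per_value


# 2 million words and subwords, 300 dimensions each.
size = dense_embedding_bytes(2_000_000, 300)
print(f"{size / 1e9:.1f} GB")  # 2.4 GB
```

Even under this optimistic assumption, the table needs gigabytes of memory before any model parameters are counted.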
[0005] In addition, common model compression methods include knowledge distillation, network pruning, and low-rank approximation. The problem is that they may not achieve the expected effect, and their efficiency is low.
At the same time, the model's performance loss after compression is usually severe. For example, after the word vectors are reduced from 300 dimensions to 50 dimensions with a low-rank approximation method, accuracy drops by 30-40%. Another common model compression method is product quantization. Its core idea is to cluster the weights, represent each cluster by an index, and replace the weights in the original weight matrix with those indexes. However, this compression method is not suitable for compressing the vocabulary of a sentence vector model, because a change in the data of any dimension affects the expressive accuracy of the entire word vector.
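The clustering-and-index idea behind product quantization can be sketched as follows. This is a generic, minimal illustration with NumPy, not the procedure claimed in the patent: the function names (`kmeans`, `pq_encode`, `pq_decode`), the plain k-means loop, and the toy matrix sizes are all assumptions made for the example.

```python
import numpy as np


def kmeans(x, k, iters=20, seed=0):
    """Plain k-means: returns (centroids, label of the nearest centroid per row)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, dist.argmin(axis=1)


def pq_encode(matrix, num_subspaces, num_centroids):
    """Split columns into sub-blocks, cluster each block, store cluster indexes."""
    n, dim = matrix.shape
    if dim % num_subspaces != 0 or num_centroids > 256:
        raise ValueError("dim must divide evenly; at most 256 centroids fit in uint8")
    sub_dim = dim // num_subspaces
    codebooks = np.empty((num_subspaces, num_centroids, sub_dim), dtype=matrix.dtype)
    codes = np.empty((n, num_subspaces), dtype=np.uint8)
    for s in range(num_subspaces):
        block = matrix[:, s * sub_dim:(s + 1) * sub_dim]
        codebooks[s], codes[:, s] = kmeans(block, num_centroids, seed=s)
    return codebooks, codes


def pq_decode(codebooks, codes):
    """Rebuild an approximation by looking each index up in its codebook."""
    parts = [codebooks[s][codes[:, s]] for s in range(codebooks.shape[0])]
    return np.concatenate(parts, axis=1)


# Toy weight matrix: 1000 rows of 16 values, quantized in 4 sub-blocks.
rng = np.random.default_rng(0)
weights = rng.standard_normal((1000, 16))
codebooks, codes = pq_encode(weights, num_subspaces=4, num_centroids=32)
approx = pq_decode(codebooks, codes)
print(codes.shape, approx.shape)  # (1000, 4) (1000, 16)
```

Only `codes` (one byte per sub-block) and the small `codebooks` array need to be stored. The reconstruction is lossy, which is the accuracy concern raised above.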



Examples


Embodiment Construction

[0043] Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present invention are shown in the drawings, it should be understood that the invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided for more thorough understanding of the present invention and to fully convey the scope of the present invention to those skilled in the art.

[0044] As shown in Figure 1, the present invention provides a method for processing text information, including:

[0045] Step 11, obtaining the sentence vector model and word vector of the text information;

[0046] Step 12, determining the first parameter matrix and the first vocabulary according to the sentence vector model and the word vector;

[0047] Step 13, performing product quantization processing on the first parameter matrix t...


Abstract

Embodiments of the present invention provide a text information processing method, device, and equipment. The method includes: acquiring a sentence vector model and a word vector of the text information; determining a first parameter matrix and a first vocabulary according to the sentence vector model and the word vector; performing product quantization processing on the first parameter matrix to obtain a quantized encoding matrix; performing compression processing on the first vocabulary to obtain a compressed vocabulary; and processing the text information according to the quantized encoding matrix and the compressed vocabulary to obtain a processing result. The embodiments of the present invention greatly reduce the number of model parameters while maintaining model accuracy, with little loss of accuracy.
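To give a rough sense of how much a quantized encoding matrix can shrink a parameter matrix, the sketch below compares dense storage with product-quantized storage. Every number in it is an assumption chosen for illustration (a 2,000,000 x 300 float32 matrix, 100 sub-blocks, 256 centroids per sub-block); the abstract does not give the patent's actual settings, and the helper `pq_storage_bytes` is hypothetical.

```python
def pq_storage_bytes(num_rows, dim, num_subspaces, num_centroids=256, bytes_per_value=4):
    """Bytes for product-quantized storage: one-byte codes plus float codebooks."""
    if dim % num_subspaces != 0:
        raise ValueError("dim must be divisible by num_subspaces")
    if num_centroids > 256:
        raise ValueError("one-byte codes can address at most 256 centroids")
    codes = num_rows * num_subspaces  # one byte per sub-block per row
    codebooks = num_subspaces * num_centroids * (dim // num_subspaces) * bytes_per_value
    return codes + codebooks


# Assumed sizes, for illustration only.
dense = 2_000_000 * 300 * 4  # float32 dense matrix
quantized = pq_storage_bytes(2_000_000, 300, num_subspaces=100)
print(dense, quantized)  # 2400000000 200307200
```

Under these assumed settings the codes dominate the quantized size, and the codebooks add only about 0.3 MB.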

Description

Technical field

[0001] The present invention relates to the technical field of information processing, in particular to a text information processing method, device and equipment.

Background technique

[0002] word2vec is a word embedding method that converts uncomputable, unstructured words into computable, structured vectors, turning natural language processing problems into mathematical problems. It is a prerequisite for common natural language processing tasks such as text classification, semantic similarity calculation, and machine translation.

[0003] Following an idea similar to word2vec, sentence-level content can also be vectorized (sentence2vec) so that it can be computed on efficiently in later use, for example to find similar sentences. A common method is as follows:

[0004] The word vectors corresponding to the words appearing in a sentence are mapped into the sentence vector space through a mapping parameter matrix to obtain the sentence vector. H...
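The mapping described in paragraph [0004] can be sketched in a few lines. This is a minimal illustration, not the patent's model: averaging the word vectors before applying the mapping matrix is an assumed pooling choice, and the names `sentence_vector`, `vocab`, `word_vectors`, and `mapping` are hypothetical.

```python
import numpy as np


def sentence_vector(tokens, vocab, word_vectors, mapping):
    """Average the vectors of known words, then map into the sentence space.

    vocab: dict from word to row index in word_vectors
    word_vectors: array of shape (vocab_size, word_dim)
    mapping: parameter matrix of shape (word_dim, sentence_dim)
    """
    ids = [vocab[t] for t in tokens if t in vocab]  # skip out-of-vocabulary words
    if not ids:
        return np.zeros(mapping.shape[1])
    pooled = word_vectors[ids].mean(axis=0)
    return pooled @ mapping


# Toy data: two known words, 2-dimensional word vectors, 3-dimensional sentences.
vocab = {"hello": 0, "world": 1}
word_vectors = np.array([[1.0, 0.0], [0.0, 1.0]])
mapping = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(sentence_vector(["hello", "world", "unknown"], vocab, word_vectors, mapping))
```

Both `word_vectors` and `mapping` grow with the vocabulary size and vector dimension, which is why their storage cost matters when the vocabulary is large.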


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/289, G06F16/35, G06K9/62
CPC: G06F40/289, G06F16/35, G06F18/23213
Inventor: 梁矗, 郑铁樵, 张博
Owner: CLOUD WISDOM BEIJING TECH