
Data processing device, data processing method, and recording medium

A data processing device and quantization technique, applicable to electrical digital data processing, special data processing applications, database indexing, and the like, which addresses problems such as uneven dispersion among sub-vectors, an excessive or insufficient number of clusters, and reduced quantization efficiency.

Pending Publication Date: 2019-09-06
KK TOSHIBA
  • Summary
  • Abstract
  • Description
  • Claims
  • Application Information

AI Technical Summary

Problems solved by technology

However, with this method the degree of dispersion varies greatly between sub-vectors, so when generating a codebook the number of clusters may become excessive or insufficient depending on the sub-vector.
Furthermore, if a codebook generated in this way is used for product quantization of the feature vectors, the quantization efficiency may decrease.

Method used


Image

  • Data processing device, data processing method, and recording medium

Examples


First Embodiment

[0042] Figure 6 is a block diagram showing a functional configuration example of the data processing device 10A of the first embodiment. As shown in Figure 6, the data processing device 10A of this embodiment includes a sub-vector group generation unit 11, a codebook generation unit 12, and a conversion unit 13.

[0043] The sub-vector group generation unit 11 generates M sub-vector groups 230 from the feature vector set 200 composed of N feature vectors 210. Each of the M sub-vector groups 230 includes N dimension-variable sub-vectors 220 obtained from the N feature vectors 210. Each of the N dimension-variable sub-vectors 220 has, as its elements, one or more dimension values extracted from the feature vector 210. The number M of sub-vector groups 230 generated by the sub-vector group generation unit 11 is smaller than the dimension D of the feature vector 210; unlike in conventional methods, it is not a fixed value but a variable value determined adaptively.
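The adaptive grouping above can be sketched in Python. Since this excerpt does not give the patent's exact rule, the sketch assumes one plausible criterion: dimensions are greedily accumulated into a group until the summed per-dimension variance passes a threshold, so that the number of groups M emerges from the data rather than being fixed. All function and parameter names here are hypothetical.

```python
import numpy as np

def make_subvector_groups(features, max_group_var):
    """Split D-dimensional feature vectors into M variable-width groups
    of dimensions, closing a group once its accumulated variance exceeds
    a threshold (illustrative criterion; the patent's exact rule is not
    given in this excerpt)."""
    per_dim_var = features.var(axis=0)      # variance of each dimension
    groups, current, running = [], [], 0.0
    for d in range(features.shape[1]):
        current.append(d)
        running += per_dim_var[d]
        if running >= max_group_var:        # close this group
            groups.append(current)
            current, running = [], 0.0
    if current:
        groups.append(current)
    # each group yields N sub-vectors whose width equals the group size
    return [features[:, g] for g in groups]

rng = np.random.default_rng(0)
# dimensions with very different dispersion, as in the problem statement
X = rng.normal(size=(100, 8)) * np.array([5, 5, 1, 1, 1, 1, 0.2, 0.2])
subvecs = make_subvector_groups(X, max_group_var=30.0)
print([sv.shape for sv in subvecs])  # M groups, each of shape (N, variable width)
```

High-variance dimensions end up in small groups and low-variance dimensions in wide ones, which is one way the dispersion imbalance described above can be evened out.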

[0044]...

Second Embodiment

[0058] Next, a second embodiment will be described. Compared with the first embodiment described above, this embodiment adds a function of adjusting the upper limit T of the number of clusters, a parameter that determines the quantization level. The other functions are the same as in the first embodiment, so only the functions specific to this embodiment are described below.

[0059] In practical applications, targets such as the following must be considered: how much change in retrieval accuracy is acceptable before and after converting the feature vector set 200 into the compressed code set 260, or how high the compression rate of that conversion should be. It is therefore required that a target value for the rate of change in retrieval accuracy, or for the compression rate, be set as a hyperparameter.
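As a concrete illustration of the compression-rate side of such a target, the standard product-quantization storage accounting can be computed as follows. This is the usual PQ bookkeeping, not a formula quoted from the patent: a D-dimensional float vector costs D times the bits per value, while its compressed code costs M indices of ceil(log2 k) bits each, where k is the number of clusters per group.

```python
import math

def compression_rate(D, M, k, bits_per_value=32):
    """Ratio of raw storage (D floats) to compressed storage
    (M cluster indices of ceil(log2 k) bits each). Standard PQ
    accounting, used here only to illustrate the hyperparameter."""
    raw_bits = D * bits_per_value
    code_bits = M * math.ceil(math.log2(k))
    return raw_bits / code_bits

print(compression_rate(D=128, M=8, k=256))  # 128*32 / (8*8) = 64.0
```

Raising the cluster-count upper limit T allows larger k, which lowers the compression rate but can improve retrieval accuracy, which is exactly the trade-off the hyperparameter targets.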

[0060] Here, where X is the number of times the feature vec...

Third Embodiment

[0072] Next, a third embodiment will be described. Compared with the second embodiment described above, this embodiment adds the following function: when a new feature vector 210 is added to the feature vector set 200, it is judged whether the codebook 240 needs to be updated, and the codebook 240 is updated only when the judgment is that an update is needed. The other functions are the same as in the first embodiment, so only the functions specific to this embodiment are described below.

[0073] In practical applications, it is sometimes required to add new feature vectors 210 to the retained feature vector set 200 at any time. If the codebook 240 were updated every time a new feature vector 210 is added to the feature vector set 200, updating the codebook 240 would require a great deal of computation time, which is inefficient. Therefore, in this embodiment, when a new feature vector 210 is added to the feature vector set 200, it is judge...
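The excerpt cuts off before stating the patent's actual update criterion, so the sketch below substitutes a hypothetical one: update only when the existing codebook quantizes the new vector poorly, i.e. when its total quantization error across all sub-vector groups exceeds a threshold. Every identifier and the criterion itself are assumptions for illustration, not the claimed method.

```python
import numpy as np

def needs_codebook_update(new_vec, groups, codebook, error_threshold):
    """Hypothetical update test: sum, over the dimension groups, the
    squared distance from the new vector's sub-vector to its nearest
    representative vector, and flag an update when the total error
    exceeds the threshold."""
    err = 0.0
    for g, centers in zip(groups, codebook):
        sub = new_vec[g]
        err += np.min(((centers - sub) ** 2).sum(axis=1))
    return err > error_threshold

# toy codebook: two dimension groups, two representatives each
groups = [[0, 1], [2]]
codebook = [np.array([[0.0, 0.0], [1.0, 1.0]]),
            np.array([[0.0], [5.0]])]

print(needs_codebook_update(np.array([0.1, 0.1, 0.2]), groups, codebook, 1.0))  # False
print(needs_codebook_update(np.array([3.0, 3.0, 2.5]), groups, codebook, 1.0))  # True
```

Under such a test, well-represented additions skip the expensive codebook rebuild, matching the efficiency motivation stated above.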



Abstract

The invention relates to a data processing apparatus, a data processing method, and a recording medium, wherein feature vectors can be quantized efficiently. A data processing device according to an embodiment includes a sub-vector group generating unit, a codebook generating unit, and a converting unit. The sub-vector group generating unit generates, from a feature vector set of N number of D-dimensional feature vectors, M number of sub-vector groups (where M<D holds true). Each of the M number of sub-vector groups includes N number of dimension-variable sub-vectors obtained from the N number of D-dimensional feature vectors. For each of the M number of sub-vector groups, the codebook generating unit performs clustering of the N number of dimension-variable sub-vectors, and generates a codebook in which the representative vector of each cluster is associated with an index. The converting unit performs product quantization using the codebook and converts each of the N number of D-dimensional feature vectors into a compressed code made of a combination of M number of indexes.
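The pipeline in the abstract can be sketched end to end as follows. This is a minimal illustration, assuming a plain k-means for the clustering step and hand-chosen dimension groups; the identifiers are illustrative and not taken from the patent.

```python
import numpy as np

def kmeans(data, k, iters=20, seed=0):
    """Tiny k-means, standing in for the clustering step."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((data[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = data[labels == j].mean(axis=0)
    return centers, labels

def build_codebook_and_encode(features, groups, k):
    """For each dimension group, cluster its sub-vectors, keep the
    centroid per index as the representative vector, and encode each
    feature vector as the M cluster indices of its sub-vectors."""
    codebook, codes = [], []
    for g in groups:
        centers, labels = kmeans(features[:, g], k)
        codebook.append(centers)   # index -> representative vector
        codes.append(labels)       # per-vector cluster index for this group
    return codebook, np.stack(codes, axis=1)  # codes: shape (N, M)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
groups = [[0, 1, 2], [3], [4, 5]]  # M = 3 variable-width groups, M < D = 6
codebook, codes = build_codebook_and_encode(X, groups, k=4)
print(codes.shape)  # (200, 3): each vector compressed to 3 small indices
```

Each 6-dimensional float vector is thus replaced by 3 cluster indices, with the codebook kept once for the whole set.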

Description

[0001] This application claims the benefit of priority from Japanese Patent Application No. 2018-024700, filed on February 15, 2018, the entire contents of which are incorporated herein by reference.

Technical Field

[0002] Embodiments of the present invention relate to a data processing device, a data processing method, and a recording medium.

Background

[0003] With the advent of the big-data era, the need to retain large numbers of feature vectors, used as examples in pattern recognition and the like, has increased. Accompanying this, the hardware costs of the memory and hard disk drives that retain the feature vectors increase. As one solution to this problem, there is a known method of reducing the memory size of feature vectors by product quantization. Product quantization is a technique of dividing a feature vector into a plurality of sub-vectors and, referring to a codebook, replacing each sub-vector with the index of a representative vect...

Claims


Application Information

IPC(8): G06F16/22, G06F16/583, G06K9/62
CPC: G06F16/583, G06F16/2228, G06F18/23213, G06F16/9017, G06F16/2237, G06F17/16, G06F16/35
Inventor: 近藤真晖
Owner: KK TOSHIBA