Method and device for quantizing local features of picture into visual vocabularies

A local-feature and visual-vocabulary technology, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses problems such as large computing overhead, poor robustness, and quantization error, achieving the effects of reducing computing overhead, improving robustness, and reducing quantization error.

Active Publication Date: 2013-04-03
BEIJING BAIDU NETCOM SCI & TECH CO LTD
3 Cites 0 Cited by

AI-Extracted Technical Summary

Problems solved by technology

[0006] The above-mentioned closest-path mapping method easily causes quantization errors because each layer selects only the single nearest word; small changes in a picture's local features can therefore be quantized to different visual words, resulting in m...

Abstract

The invention provides a method and device for quantizing local features of a picture into visual vocabularies. The method comprises the steps of: S1, determining the words to be selected from the first layer of a visual vocabulary tree; S2, calculating the confidence of the path where each word to be selected at the current layer is located, using the distance between the local feature and each word to be selected at the current layer and the confidence of the path where the parent node of each such word is located; and S3, selecting the words to be selected at the current layer whose path confidence is greater than or equal to a preset confidence threshold, and judging whether the current layer is the last layer. If the current layer is the last layer, the words selected at the current layer are determined to be the visual words of the local feature; if not, the words selected at the current layer enter the next layer, the child nodes of the selected words are determined to be the words to be selected at the next layer, and the operation returns to step S2. By adopting the method and device, the computational overhead of the quantization process is decreased while the robustness to quantization errors is improved.

Application Domain

Special data processing applications

Examples

  • Experimental program (2)

Example Embodiment

[0036] Embodiment One
[0037] In this embodiment of the present invention, a fixed number of words is no longer selected at each layer; instead, the concept of path confidence is introduced, and the path confidence is used to decide whether to select the word corresponding to a path and enter the next layer of nodes. Specifically, as shown in Figure 2, in the process of querying the visual vocabulary tree for each local feature of the picture, the following steps are performed starting from the first layer of the visual vocabulary tree:
[0038] Step 201: Determine the vocabulary to be selected from the first level of the visual vocabulary tree.
[0039] In this embodiment of the present invention, all the words in the first layer can be used as the words to be selected. Alternatively, other selection methods can be used, such as taking the words of one or several nodes as the words to be selected.
[0040] For ease of understanding, a brief introduction to the visual vocabulary tree is given first. The visual vocabulary tree is pre-established from a large-scale image collection: visual words are extracted from the local features of the images, and the extracted large-scale visual vocabulary is clustered hierarchically to form the tree. Each child node is obtained by further clustering the vocabulary of its parent node, and each node is a collection of one or more words, which is usually called a vocabulary in this field.
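As an illustration of how such a tree might be built, the following sketch performs hierarchical k-means over a set of feature vectors. The function names (`kmeans`, `build_vocab_tree`), the dict-based node layout, and the naive clustering loop are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def kmeans(points, k, iters=10, rng=None):
    """Naive k-means used to split a node's features into k child clusters."""
    points = np.asarray(points, dtype=float)
    rng = rng or np.random.default_rng(0)
    # initialize centers from k distinct data points
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        # assign each point to its nearest center
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each center to the mean of its assigned points
        for j in range(k):
            if (labels == j).any():
                centers[j] = points[labels == j].mean(axis=0)
    return centers, labels

def build_vocab_tree(features, branch=2, depth=2, rng=None):
    """Hierarchically cluster features: each node stores a cluster center
    (its 'vocabulary') and child sub-clusters, mirroring the tree above."""
    features = np.asarray(features, dtype=float)
    node = {"center": features.mean(axis=0), "children": []}
    if depth == 0 or len(features) < branch:
        return node  # leaf: last-layer vocabulary
    centers, labels = kmeans(features, branch, rng=rng)
    for j in range(branch):
        sub = features[labels == j]
        if len(sub):
            node["children"].append(build_vocab_tree(sub, branch, depth - 1, rng))
    return node
```

In practice the branching factor and depth are much larger, and a production system would use a robust k-means implementation rather than this toy loop.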
[0041] Step 202: Using the distance between the local feature and each word to be selected at the current layer, together with the confidence of the path where the parent node of each such word is located, calculate the confidence of the path where each word to be selected at the current layer is located.
[0042] When calculating the confidence of the path where each word to be selected at the current layer is located, the following formula can be used:
[0043] γ_i = γ_c × (Dist_min / Dist_i)        (1)
[0044] where γ_c is the confidence of the path where the parent node of the i-th word to be selected is located; for the first layer, the confidence of the path where each parent node is located takes a preset initial value, for example 1. Dist_i is the distance between the local feature and the i-th word to be selected, and Dist_min is the minimum distance between the local feature and any word to be selected at the current layer.
[0045] Since a local feature is an n-dimensional vector, the vocabulary of each node in the visual vocabulary tree is also an n-dimensional vector. For example, if the Scale-Invariant Feature Transform (SIFT) feature is used, n is 128. The distance between the local feature and a word to be selected can therefore be calculated with any vector distance measure, including but not limited to Euclidean distance, cosine distance, etc.
[0046] In fact, the confidence of the path where each word is located is a cumulative value: it is accumulated along the path extended from its parent node, and it reflects how far each word to be selected at the current layer deviates from the minimum distance to the local feature.
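Formula (1) can be sketched as a small vectorized helper; the function name and NumPy usage below are illustrative assumptions, not from the patent:

```python
import numpy as np

def path_confidences(parent_confs, dists):
    """gamma_i = gamma_c(i) * Dist_min / Dist_i  -- formula (1).
    parent_confs: confidence of each candidate's parent path (gamma_c);
    dists: distance from the local feature to each candidate word.
    Dist_min is taken over ALL candidates at the current layer."""
    parent_confs = np.asarray(parent_confs, dtype=float)
    dists = np.asarray(dists, dtype=float)
    return parent_confs * (dists.min() / dists)
```

Note that the candidate closest to the feature inherits its parent's confidence unchanged (ratio 1), while farther candidates are discounted in proportion to their distance.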
[0047] Step 203: Select the vocabulary to be selected whose confidence degree of the path in the current hierarchy is greater than or equal to a preset confidence threshold.
[0048] After the confidence of the path where each word to be selected at the current layer is located has been calculated in step 202, words continue to be selected according to this confidence, with a preset confidence threshold as the selection basis. The confidence threshold is usually an empirical value, for example 0.97.
[0049] Paths are selected by judging the confidence of the path where each word is located. The number of words selected at each layer is therefore not fixed, but depends on how well each word reflects the local feature.
[0050] Step 204: Judging whether the current layer is the last layer, if yes, execute step 205; otherwise, execute step 206.
[0051] Step 205: Determine the words selected at the current layer as the visual words of the local feature, and end the visual vocabulary quantization for this local feature.
[0052] Step 206: The words selected at the current layer enter the next layer, the child nodes of the selected words are determined as the words to be selected at the next layer, and the next layer is taken as the current layer; then return to step 202.
[0053] By analogy, the local feature is mapped to the last-layer words on all paths whose cumulative confidence is greater than or equal to the confidence threshold. It should be noted that the last layer here may be the last layer of the visual vocabulary tree, that is, its leaf nodes. If optimal efficiency is not required, a higher layer of the visual vocabulary tree can also be set as the last layer of quantization; for example, if the penultimate layer is set as the last layer, local features are mapped to the penultimate-layer words on the paths whose confidence is greater than or equal to the confidence threshold.
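Steps 201-206 can be sketched as a breadth-first traversal that prunes every path whose cumulative confidence falls below the threshold. The node layout (dicts with `center` and `children`), the Euclidean distance, and all names are illustrative assumptions:

```python
import numpy as np

def quantize(feature, first_layer, theta=0.97):
    """Map one local feature to visual words, keeping every path whose
    cumulative confidence stays >= theta (steps 201-206).
    Nodes are dicts {"center": ndarray, "children": [...]}; a node with
    no children is a last-layer word."""
    feature = np.asarray(feature, dtype=float)
    # Step 201: all first-layer words are candidates; parent confidence = 1.0
    frontier = [(node, 1.0) for node in first_layer]
    words = []
    while frontier:
        # Step 202: distances and path confidences at the current layer
        dists = [np.linalg.norm(feature - n["center"]) for n, _ in frontier]
        dmin = min(dists)
        nxt = []
        for (node, parent_conf), d in zip(frontier, dists):
            conf = parent_conf * dmin / d if d > 0 else parent_conf
            # Step 203: prune paths below the confidence threshold
            if conf < theta:
                continue
            if node["children"]:
                # Steps 204/206: not the last layer -- descend to children
                nxt += [(child, conf) for child in node["children"]]
            else:
                # Step 205: last layer -- selected visual word
                words.append((node, conf))
        frontier = nxt
    return words
```

Unlike a nearest-path search, several branches may survive at each layer, so one feature can map to multiple visual words.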
[0054] A simple example illustrates the flow of the method shown in Figure 2. Take the visual vocabulary tree shown in Figure 3 as an example and assume it has three layers. For a certain local feature, start from the first layer: first determine that all the words in the first layer, namely word 1 and word 2, are the words to be selected, and calculate according to formula (1) the confidence γ1 of the path where word 1 is located and the confidence γ2 of the path where word 2 is located, where γ_c takes the preset initial value 1. Assume the calculated γ1 is greater than the preset confidence threshold θ and γ2 is less than θ; then word 1 is selected to enter the second layer, and the child nodes of word 1, namely word 3 and word 4, are determined as the words to be selected.
[0055] Calculate according to formula (1) the confidence γ3 of the path where word 3 is located and the confidence γ4 of the path where word 4 is located, where γ_c takes the confidence of the path where word 1 is located, that is, γ1. Assume the calculated γ3 and γ4 are both greater than θ; then word 3 and word 4 are both selected to enter the third layer, and the child nodes of word 3 (word 7 and word 8) and the child nodes of word 4 (word 9 and word 10) are determined as the words to be selected.
[0056] Calculate according to formula (1) the confidences γ7, γ8, γ9, and γ10 of the paths where words 7, 8, 9, and 10 are located, where γ_c for words 7 and 8 takes γ3, and γ_c for words 9 and 10 takes γ4. Assume that the confidences of the paths where word 7 and word 9 are located are greater than θ; since the third layer is the last layer, word 7 and word 9 are selected as the visual words of this local feature.
[0057] The example given here is deliberately simple. In an actual query, the depth of the visual vocabulary tree and the number of child nodes per parent node are often large, and the number of visual words represented by the tree is very large, so the computational savings of the method provided by this embodiment of the present invention are significant.
[0058] In addition, for a picture with many local features that is affected by quantization errors, enough local features can still be quantized to the same visual words for effective retrieval. Therefore, to further reduce unnecessary computational overhead (here, the overhead of searching the inverted index during image retrieval, after the inverted index has been built from the visual vocabulary), an upper limit can be placed on the number of visual words per picture. After all local features of the picture have been quantized, the visual words of all local features are sorted by the confidence of their paths, and the top N visual words are selected as the visual words of the picture. N is the set upper limit, which can be a fixed positive integer or can be set according to the number of local features.
[0059] When N is set according to the number of local features: if the picture has many local features, it has stronger discriminative power during retrieval, and relatively few visual words suffice to achieve high retrieval accuracy; conversely, if the picture has few local features, it has weaker discriminative power, and relatively more visual words are needed. Therefore, the larger the number of local features, the smaller N can be set, and the smaller the number of local features, the larger N can be set.
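The top-N cap described above might look like the following sketch. The rule deriving N from the feature count is a hypothetical example, since the text fixes no formula:

```python
def top_n_words(quantized, n_cap=None, num_features=None):
    """Keep only the N most confident visual words of a picture.
    quantized: list of (word, path_confidence) pairs.
    When n_cap is None, derive N from the feature count: pictures with
    many features are already distinctive, so fewer words suffice
    (an illustrative heuristic, not the patent's rule)."""
    if n_cap is None:
        n_cap = max(10, 1000 // max(num_features, 1))  # hypothetical rule
    # sort by path confidence, highest first, and truncate
    ranked = sorted(quantized, key=lambda wc: wc[1], reverse=True)
    return ranked[:n_cap]
```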

Example Embodiment

[0061] Embodiment Two
[0062] Figure 4 is a structural diagram of the device provided by the second embodiment of the present invention. As shown in Figure 4, the device may include: an initial query unit 401, a confidence calculation unit 402, a selection judgment unit 403, and a visual vocabulary determination unit 404.
[0063] Quantizing local features against the visual vocabulary tree is in fact the process of querying the tree for each local feature of the picture. During this process, the initial query unit 401 determines the words to be selected from the first layer of the visual vocabulary tree, takes the first layer as the current layer, and triggers the confidence calculation unit 402.
[0064] Specifically, the initial query unit 401 may use all the words in the first layer as the words to be selected. Alternatively, other selection methods may be used, such as taking the words of one or several nodes as the words to be selected.
[0065] After the confidence calculation unit 402 is triggered (initially by the initial query unit 401, and subsequently by the selection judgment unit 403), it uses the distance between the local feature and each word to be selected at the current layer, together with the confidence of the path where the parent node of each such word is located, to calculate the confidence of the path where each word to be selected at the current layer is located. The confidence of the path where the parent node of each first-layer word is located takes a preset initial value.
[0066] The confidence γ_i of the path where the i-th word to be selected at the current layer is located can be calculated according to the following formula:
[0067] γ_i = γ_c × (Dist_min / Dist_i)
[0068] where γ_c is the confidence of the path where the parent node of the i-th word to be selected is located, Dist_i is the distance between the local feature and the i-th word to be selected, and Dist_min is the minimum distance between the local feature and any word to be selected at the current layer.
[0069] When calculating the distance between the local features and the vocabulary to be selected, any method of calculating the distance between vectors, such as Euclidean distance, cosine distance, etc., can be used.
[0070] Based on the results of the confidence calculation unit 402, the selection judgment unit 403 selects the words to be selected at the current layer whose path confidence is greater than or equal to the preset confidence threshold and judges whether the current layer is the last layer. If so, the words selected at the current layer are provided to the visual vocabulary determination unit 404; otherwise, the words selected at the current layer enter the next layer, their child nodes are determined as the words to be selected at the next layer, and the next layer is taken as the current layer to trigger the confidence calculation unit 402.
[0071] Wherein, the above-mentioned confidence threshold generally adopts an empirical value, such as 0.97.
[0072] The visual vocabulary determining unit 404 is configured to determine the vocabulary provided by the selection judging unit 403 as a visual vocabulary of local features.
[0073] The local feature is finally mapped to the last-layer words on all paths whose cumulative confidence is greater than or equal to the confidence threshold. The last layer here may be the last layer of the visual vocabulary tree, that is, its leaf nodes. If optimal efficiency is not required, a higher layer of the visual vocabulary tree can also be set as the last layer of quantization; for example, if the penultimate layer is set as the last layer, local features are mapped to the penultimate-layer words on the paths whose confidence is greater than or equal to the confidence threshold.
[0074] In addition, for a picture with many local features that is affected by quantization errors, enough local features can still be quantized to the same visual words for effective retrieval. Therefore, to further reduce unnecessary computational overhead (here, the overhead of searching the inverted index during image retrieval, after the inverted index has been built from the visual vocabulary), the device may also include a vocabulary control unit 405, which, after the visual vocabulary quantization of all local features of the picture is completed, sorts the visual words of all local features by the confidence of their paths and selects the top N visual words as the visual words of the picture, where N is a preset positive integer.
[0075] N is the set upper limit on the number of visual words; it can be a fixed positive integer or can be set according to the number of local features.
[0076] When N is set according to the number of local features: if the picture has many local features, it has stronger discriminative power during retrieval, and relatively few visual words suffice to achieve high retrieval accuracy; conversely, if the picture has few local features, it has weaker discriminative power, and relatively more visual words are needed. Therefore, the larger the number of local features, the smaller N can be set, and the smaller the number of local features, the larger N can be set.
[0077] After the pictures in a picture library have been quantized into visual vocabulary using the above method and device, the visual vocabulary can be used to build an inverted index, providing a basis for image retrieval, and can be applied in large-scale image retrieval products. During image retrieval, the above method and device can likewise be used to quantize the local features of the image to be retrieved into visual words; the resulting visual words are then used to search the inverted index of the image library, and the hit pictures are returned as the search results.
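A minimal sketch of the inverted-index usage described here, with library pictures scored by the number of visual words they share with the query (all names and the voting-style scoring are illustrative assumptions):

```python
from collections import defaultdict

def build_inverted_index(picture_words):
    """picture_words: {picture_id: iterable of visual-word ids}.
    Returns word_id -> set of picture ids containing that word."""
    index = defaultdict(set)
    for pid, words in picture_words.items():
        for w in words:
            index[w].add(pid)
    return index

def retrieve(index, query_words):
    """Score library pictures by how many query visual words they share,
    highest score first."""
    scores = defaultdict(int)
    for w in query_words:
        for pid in index.get(w, ()):
            scores[pid] += 1
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Capping each picture's vocabulary at N words (as above) directly shrinks these posting lists, which is the retrieval-time saving the text refers to.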
[0078] As can be seen from the above description, in the process of quantizing visual vocabulary, the present invention neither selects a single nearest word per layer nor a fixed number of N words; instead, it adaptively selects an appropriate number of words to enter the next layer according to the proximity between the local feature and the nodes at each layer of the visual vocabulary tree, where proximity is measured by the confidence of the path where each word is located. The method and device provided by the invention therefore have the following advantages:
[0079] 1) Compared with the method of selecting a single nearest word at each layer, the quantization error is reduced and robustness to quantization errors is improved.
[0080] 2) Because the influence of quantization errors is reduced, the recall rate of image retrieval is improved. Experiments show that the average number of correctly recalled results per image retrieval increases by 37%.
[0081] 3) Compared with the method of selecting a fixed number of words at each layer, the word-selection strategy of the present invention is more reasonable: words that are not close to the local feature are no longer selected merely to satisfy a fixed count, which reduces unnecessary path expansion, thereby reducing computational overhead and improving quantization efficiency.
