Image feature decomposition method, device, equipment and storage medium
By using clustering, normalization, and weighted summation methods on the feature maps output by the neural network, the problem that Grad-Cam cannot distinguish different features of an image is solved, and feature decomposition and importance display are realized, supporting in-depth research on causal analysis.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- PING AN TECH (SHENZHEN) CO LTD
- Filing Date
- 2023-02-15
- Publication Date
- 2026-06-23
AI Technical Summary
The existing Grad-Cam method cannot effectively distinguish different features of an image or indicate their relative importance in causal analysis, which makes it impossible to interpret in detail what the neural network has learned.
By clustering, normalizing, and weighted summing the feature map set output by the neural network, the feature map set is decomposed and the relative importance between different features is displayed.
It enables effective decomposition and importance display of different features in images, supporting in-depth research on subsequent causal analysis.
Figure CN116797526B_ABST
Description
Technical Field
[0001] This application relates to the field of artificial intelligence technology, and in particular to an image feature decomposition method, apparatus, device and storage medium.
Background Technology
[0002] Causal analysis is a mainstream direction in current work on the interpretability of neural networks. Its main idea is to infer the causal relationship between different features of an image and the output of the neural network, thereby providing a better explanation of the neural network as a "black box." Grad-Cam is a relatively mature method for feature visualization, which can, to some extent, explain the results learned by the neural network. Applying Grad-Cam to causal analysis is worth exploring, but Grad-Cam itself still has significant limitations: its visualization range is broad, it cannot distinguish different image features, and it cannot indicate the relative importance of different features. For example, it cannot distinguish different features in medical images, nor can it assign importance to them. Furthermore,... Figure 1: although the important regions of the bird learned by the neural network are identified, the different features of the bird are not distinguished. For instance, the features of the bird's head and body are not well decoupled, nor is their relative importance given. This poses a significant obstacle to subsequent causal analysis.
Summary of the Invention
[0003] To address the aforementioned technical problems, the purpose of this application is to provide an image feature decomposition method, apparatus, device, and storage medium, which aims to distinguish different features of an image and display the relative importance between different features.
[0004] In a first aspect, embodiments of the present invention provide an image feature decomposition method, comprising:
[0005] obtaining the feature map set output by the neural network;
[0006] clustering the feature maps in the feature map set to obtain feature map subsets of different categories;
[0007] normalizing the feature maps in the feature map subsets of different categories respectively to obtain normalized feature map subsets for each category;
[0008] performing weighted summation on the feature maps in the normalized feature map subset of each category to obtain the final feature map for each category.
[0009] Furthermore, the clustering of feature maps in the feature map set to obtain feature maps of different categories includes:
[0010] The feature maps in the feature map set are clustered according to the pixel where the maximum pixel value of the feature map is located, resulting in feature maps of different categories.
[0011] Furthermore, the step of normalizing the feature maps in the subset of feature maps includes:
[0012] Divide each pixel in the feature map by the maximum pixel value in the feature map set.
[0013] Furthermore, the step of performing a weighted summation of the feature maps in the normalized feature map subset includes:
[0014] Obtain the gradient map corresponding to each feature map in the normalized feature map subset;
[0015] Calculate the weights of each feature map based on the gradient map corresponding to each feature map;
[0016] The feature maps in the normalized feature map subset are weighted and summed according to their respective weights.
[0017] Furthermore, the step of calculating the weights of each feature map based on the gradient map corresponding to each feature map includes:
[0018] The gradient maps corresponding to each feature map are globally averaged to obtain the weights of each feature map.
[0019] In a second aspect, embodiments of the present invention provide an image feature decomposition apparatus, comprising:
[0020] The acquisition module is used to acquire the feature map set output by the neural network;
[0021] The clustering module is used to cluster the feature maps in the feature map set to obtain feature map subsets of different categories;
[0022] The normalization processing module is used to normalize the feature maps in the feature map subsets of different categories respectively, so as to obtain the normalized feature map subsets of each category;
[0023] The weighted summation module is used to perform weighted summation on the feature maps in the feature map subsets of each normalized category to obtain the final feature map for each category.
[0024] Furthermore, the clustering module is specifically used for:
[0025] The feature maps in the feature map set are clustered according to the pixel where the maximum pixel value of the feature map is located, resulting in feature maps of different categories.
[0026] Furthermore, the weighted summation module is specifically used for:
[0027] Obtain the gradient map corresponding to each feature map in the normalized feature map subset;
[0028] Calculate the weights of each feature map based on the gradient map corresponding to each feature map;
[0029] The feature maps in the normalized feature map subset are weighted and summed according to their respective weights.
[0030] In a third aspect, embodiments of this application provide a computer device, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps of any of the methods described above.
[0031] In a fourth aspect, embodiments of this application provide a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of any of the methods described above.
[0032] This invention, through its embodiments, obtains a feature map set output by a neural network; clusters the feature maps in the feature map set to obtain feature map subsets of different categories; normalizes the feature maps in the feature map subsets of different categories to obtain normalized feature map subsets of each category; and performs weighted summation on the feature maps in the feature map subsets of each category after normalization to obtain the final feature map of each category. In this way, different features are effectively decomposed, and the relative importance between different features can be displayed.
Description of the Drawings
[0033] To more clearly illustrate the technical solution of this application, the drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.
[0034] Figure 1 is a schematic flowchart of the image feature decomposition method provided in the embodiments of this application;
[0035] Figure 2 is another schematic flowchart of the image feature decomposition method provided in the embodiments of this application;
[0036] Figure 3 is a feature decomposition diagram of a bird provided in an embodiment of this application;
[0037] Figure 4 is a flowchart illustrating the weighted summation of feature maps within a subset of feature maps, provided in an embodiment of this application;
[0038] Figure 5 is a schematic diagram of the structure of an image feature decomposition device provided in an embodiment of this application;
[0039] Figure 6 is a schematic block diagram of the structure of the computer device provided in the embodiments of this application.
Detailed Implementation
[0040] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0041] Example 1:
[0042] Grad-Cam (Gradient-weighted Class Activation Mapping) is a well-established method for feature visualization, offering some interpretation of the results learned by neural networks. Applying Grad-Cam to causal analysis is a promising technique, but it has significant limitations, such as a broad visualization range, inability to distinguish different image features, and inability to assess the importance of different features. To address these issues, this invention improves upon Grad-Cam to differentiate image features, providing a better foundation for causal analysis.
[0043] Referring to Figure 1 and Figure 2, this application provides an image feature decomposition method, including steps S1-S4:
[0044] S1. Obtain the feature map set output by the neural network;
[0045] S2. Cluster the feature maps in the feature map set to obtain feature map subsets of different categories;
[0046] S3. Normalize the feature maps in the feature map subsets of different categories respectively to obtain normalized feature map subsets of each category;
[0047] S4. Perform weighted summation on the feature maps in the subsets of the normalized feature maps for each category to obtain the final feature map for each category.
[0048] As described in step S1 above, since the output of the neural network includes multiple feature maps, all the feature maps output by the neural network are treated as a single feature map set. The neural network can be a convolutional neural network or another similar neural network; this embodiment of the invention does not limit the type of neural network.
[0049] As explained in step S2 above, clustering divides a dataset into different classes or clusters according to a specific criterion, maximizing the similarity of data objects within the same cluster and maximizing the differences between data objects in different clusters. In other words, after clustering, data of the same class should be grouped together as much as possible, while data of different classes should be separated as much as possible. This embodiment of the invention, by clustering feature maps in a feature map set according to set rules, can obtain subsets of feature maps of different categories, thereby achieving the decomposition of different features.
[0050] As in step S3 above, it should be noted that if the feature maps of each category are directly weighted and summed, only the relatively important regions of each category can be obtained, but the regions and features that the network is most concerned with cannot be explained. Therefore, it is necessary to normalize all feature maps.
[0051] As described in step S4 above, by performing weighted summation on the feature maps in the feature map subsets of each normalized category, the final feature map for the corresponding category can be obtained, thus completing the final decomposition of image features.
[0052] The present invention is applied to the feature decomposition of bird images, and the resulting feature decomposition diagram is shown in Figure 3. From Figure 3, it can be observed that the embodiments of the present invention can effectively distinguish between the bird's head and abdomen, and that the bird's head is more important to the network than the bird's abdomen. Based on this, we can manipulate different features to study the causal relationships between different features, as well as between different features and the network output.
[0053] The embodiments of the present invention are also applied to feature decomposition of images in medical imaging, for example, visualization of pneumonia detection features.
[0054] This invention, through its embodiments, obtains a feature map set output by a neural network; clusters the feature maps in the feature map set to obtain feature map subsets of different categories; normalizes the feature maps in the feature map subsets of different categories to obtain normalized feature map subsets of each category; and performs weighted summation on the feature maps in the normalized feature map subsets of each category to obtain the final feature map of each category. This effectively decomposes different features and reveals the relative importance between different features.
[0055] Furthermore, based on this, the relationships between different features of interest to the network and between these features and the network output can be well studied, facilitating subsequent causal analysis. Finally, the image feature decomposition method provided in this embodiment can be easily applied to different networks without changing the network architecture, and the method is simple and has good scalability.
[0056] In one embodiment, clustering the feature maps in the feature map set to obtain feature maps of different categories includes:
[0057] The feature maps in the feature map set are clustered according to the pixel where the maximum pixel value of the feature map is located, resulting in feature maps of different categories.
[0058] In this embodiment of the invention, it should be noted that by clustering the feature maps in the feature map set according to the different pixel points where the maximum pixel value of the feature map is located, the feature maps of different categories obtained are more distinct, thereby realizing the decomposition of different features.
[0059] To better understand why the feature maps in the feature map set are clustered based on the pixel with the largest pixel value, an example is given below for explanation.
[0060] Taking the ResNet50 network as an example, the network outputs 2048 feature maps, each with a size of 8x8 pixels. When visualizing features using Grad-Cam, interpolation is typically used to upsample the weighted summed feature maps to the original image size. Therefore, each pixel in the 8x8 feature map corresponds to a fixed region in the original image. Furthermore, experimental results show that each feature map has its own region of greatest interest; specifically, each feature map has its own maximum pixel value, and the pixel values surrounding the maximum pixel value are often also relatively large. This explains why the heatmap obtained by Grad-Cam has a wide range. Based on these reasons, the 2048 feature maps are clustered according to the pixel containing the maximum pixel value, thus completing the decomposition of different features.
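The clustering rule described above can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the patent's implementation: the function name `cluster_by_peak`, the `(C, H, W)` array layout, and the toy data are assumptions, and ties are resolved by `argmax` returning the first maximum, which the source does not specify.

```python
from collections import defaultdict
from typing import Dict, List

import numpy as np


def cluster_by_peak(feature_maps: np.ndarray) -> Dict[int, List[int]]:
    """Group feature maps by the flat index of their maximum pixel.

    feature_maps has shape (C, H, W). The result maps each occupied
    pixel position (0 .. H*W-1) to the list of map indices peaking there.
    """
    num_maps = feature_maps.shape[0]
    # argmax over the flattened H*W grid gives one peak position per map
    peak_positions = feature_maps.reshape(num_maps, -1).argmax(axis=1)
    clusters = defaultdict(list)
    for channel, position in enumerate(peak_positions):
        clusters[int(position)].append(channel)
    return dict(clusters)


# Toy stand-in for the (2048, 8, 8) ResNet50 output: six 2x2 maps,
# each with a single positive pixel at a chosen location.
toy_maps = np.zeros((6, 2, 2))
peaks = [(0, 0), (1, 1), (0, 0), (0, 1), (1, 1), (0, 0)]
for channel, (row, col) in enumerate(peaks):
    toy_maps[channel, row, col] = 1.0 + channel

clusters = cluster_by_peak(toy_maps)
print(clusters)  # {0: [0, 2, 5], 3: [1, 4], 1: [3]}
```

With real 8x8 maps the same call would return at most 64 groups, one per pixel position at which some map attains its maximum.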
[0061] In one embodiment, the step of normalizing the feature maps in the feature map subset includes:
[0062] Divide each pixel in the feature map by the maximum pixel value in the feature map set.
[0063] In this embodiment of the invention, it should be noted that step S3 involves normalizing the feature maps in the feature map subsets of different categories. Since the normalization process is the same for feature maps in the feature map subsets of different categories, the feature maps in each category are normalized according to the above-described normalization steps to obtain the normalized feature map subsets for each category. For ease of understanding, an example is given below.
[0064] Suppose that clustering the feature maps in the feature map set yields three subsets: the first subset, the second subset, and the third subset. Since the normalization method for each subset is the same, we will only explain the first subset as an example. Assume the first subset contains two feature maps, the first and the second, and the feature map set contains 2048 feature maps, with a maximum pixel value of 240. The normalization method for the first feature map is to divide each pixel value by 240. Similarly, the normalization method for the second feature map is to divide each pixel value by 240. After normalizing all feature maps in the first subset, we obtain the normalized first subset.
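The numeric example above can be reproduced with a short NumPy sketch. The random data, the helper name `normalize_by_global_max`, and the choice of maps 7 and 42 as members of the first subset are all hypothetical; only the divisor of 240 and the set size of 2048 come from the text.

```python
import numpy as np


def normalize_by_global_max(subset: np.ndarray, global_max: float) -> np.ndarray:
    """Divide every pixel of every map in the subset by the set-wide maximum."""
    if global_max <= 0:
        raise ValueError("global_max must be positive")
    return subset / global_max


rng = np.random.default_rng(0)
# Stand-in for 2048 feature maps of size 8x8 with values below 200
all_maps = rng.uniform(0.0, 200.0, size=(2048, 8, 8))
all_maps[7, 3, 4] = 240.0  # force the set-wide maximum used in the example

global_max = float(all_maps.max())  # taken over the whole set, not per subset
first_subset = all_maps[[7, 42]]    # hypothetical membership of the first subset
normalized_first_subset = normalize_by_global_max(first_subset, global_max)
```

Because the divisor comes from the whole set, a subset that does not contain the global peak ends up with a maximum below 1, which is what keeps the categories comparable with one another.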
[0065] Furthermore, to facilitate understanding of the reason for normalizing the clustered feature maps, the following explanation is provided.
[0066] Taking the ResNet50 network as an example, after clustering the 2048 feature maps output by the ResNet50 network, at most 64 categories can be obtained, because the feature map size is 8*8 and there are therefore only 64 pixel positions at which a maximum can be located. There may be fewer than 64 categories, because the regions corresponding to some pixels are not the areas the network is most interested in. In this case, directly performing a weighted summation of the feature maps for each category yields only the relatively important regions within each category and fails to explain the regions the network is most interested in overall. Because the Grad-Cam method performs a weighted summation over all feature maps, the normalization must likewise be performed over the whole feature map set.
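One way to read this argument is written out below. The notation is introduced here for illustration and does not appear in the source: $A_k$ is the $k$-th feature map of size $H \times W$, $w_k$ its weight, $S_c$ the set of maps assigned to category $c$, and $M_c$ the final map of that category.

```latex
% Number of categories is bounded by the number of pixel positions
% (64 for the 8 x 8 maps in the ResNet50 example).
\[
\#\{c\} \le H \cdot W
\]
% Set-wide normalization followed by a per-category weighted sum
\[
A_{\max} = \max_{k,i,j} A_k(i,j), \qquad
M_c = \sum_{k \in S_c} w_k \, \frac{A_k}{A_{\max}}
\]
% The sets S_c partition the maps and share one divisor, so the category maps
% add up to the weighted sum over all feature maps, scaled by a constant.
\[
\sum_{c} M_c = \frac{1}{A_{\max}} \sum_{k} w_k A_k
\]
```

If each subset were instead divided by its own maximum, the divisors would differ between categories and the last identity would no longer hold, so the magnitudes of the $M_c$ could not be compared across categories.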
[0067] Referring to Figure 4, in one embodiment, the step of performing a weighted summation of the feature maps in the normalized feature map subset includes:
[0068] S21. Obtain the gradient map corresponding to each feature map in the normalized feature map subset;
[0069] S22. Calculate the weight of each feature map based on the gradient map corresponding to each feature map;
[0070] S23. Perform a weighted summation of the feature maps in the normalized feature map subset according to the weights of each feature map.
[0071] In this embodiment of the invention, it should be noted that step S4 involves performing a weighted summation on the feature maps in the normalized feature map subsets for each category. Since the weighted summation method is the same for the feature maps in each category's feature map subset, by performing a weighted summation on the feature maps in the feature map subsets for each category according to the above-described weighted summation method, the final feature map for each category can be obtained. For ease of understanding, an example is given below.
[0072] Suppose there are three subsets of feature maps for different categories: the first category, the second category, and the third category. Since the weighted summation of the feature maps in each subset is performed in the same way, the following explanation will only use the first category's feature map subset as an example.
[0073] Suppose that the normalized feature map subset for the first category consists of two feature maps: a normalized first feature map and a normalized second feature map. Obtain the gradient map corresponding to the normalized first feature map and calculate its weights based on this gradient map. Obtain the gradient map corresponding to the normalized second feature map and calculate its weights based on this gradient map. Then, perform a weighted sum of the normalized first and second feature maps using their respective weights to obtain the final feature map for the first category.
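The two-map example above can be sketched as follows, assuming the two weights have already been computed from the gradient maps. The function name `weighted_sum`, the 2x2 map size, and all numeric values are made up for illustration.

```python
import numpy as np


def weighted_sum(normalized_subset: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Return sum_k weights[k] * normalized_subset[k] as an (H, W) map.

    normalized_subset has shape (K, H, W) and weights has shape (K,).
    """
    # tensordot with axes=1 contracts the K axis of both arguments
    return np.tensordot(weights, normalized_subset, axes=1)


# Normalized first and second feature maps of the first category (toy 2x2 maps)
first_category = np.array([
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.4, 0.8], [0.0, 0.0]],
])
weights = np.array([0.5, 0.25])  # assumed weights of the two maps

final_first_category = weighted_sum(first_category, weights)
# 0.5 * first map + 0.25 * second map = [[0.6, 0.2], [0.0, 0.0]]
```

Repeating the same call for the second and third categories would give one final map per category.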
[0074] In one embodiment, calculating the weights of each feature map based on the gradient map corresponding to each feature map includes:
[0075] The gradient maps corresponding to each feature map are globally averaged to obtain the weights of each feature map.
[0076] In this embodiment of the invention, by performing global averaging on the gradient maps corresponding to each feature map, the weights of each feature map can be obtained, and the feature maps can be weighted and summed according to their weights to obtain the final feature map.
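Global averaging of a gradient map reduces it to a single scalar weight. The sketch below assumes the gradient maps are already available as a `(K, H, W)` array; in Grad-Cam they are usually the gradients of the class score with respect to each feature map, but the source does not spell out how they are obtained. The function name and the numbers are illustrative.

```python
import numpy as np


def global_average_weights(gradient_maps: np.ndarray) -> np.ndarray:
    """Average each (H, W) gradient map over its pixels; returns shape (K,)."""
    return gradient_maps.mean(axis=(1, 2))


# Toy gradient maps for two feature maps
gradient_maps = np.array([
    [[0.2, 0.4], [0.6, 0.8]],
    [[-0.1, 0.1], [0.3, 0.5]],
])
weights = global_average_weights(gradient_maps)  # approximately [0.5, 0.2]
```

A weight can be negative when the gradients are mostly negative; the source does not say how such maps should be treated, so this sketch leaves them unchanged.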
[0077] Example 2:
[0078] Referring to Figure 5, this invention provides an image feature decomposition apparatus, comprising:
[0079] Acquisition module 1 is used to acquire the feature map set output by the neural network;
[0080] Clustering module 2 is used to cluster the feature maps in the feature map set to obtain feature map subsets of different categories;
[0081] Normalization processing module 3 is used to normalize the feature maps in the feature map subsets of different categories respectively, so as to obtain the normalized feature map subsets of each category;
[0082] The weighted summation module 4 is used to perform weighted summation on the feature maps in the feature map subsets of each normalized category to obtain the final feature map for each category.
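The four modules listed above could be organized as one small class, with one method per module. This is a hypothetical sketch rather than the apparatus itself: the class and method names are invented, the feature and gradient maps are passed in as arrays instead of being captured from a real network, and the rules used inside each method (peak-position clustering, set-wide maximum, global averaging) follow the method embodiments.

```python
import numpy as np


class ImageFeatureDecomposer:
    """Hypothetical container mirroring modules 1 to 4."""

    def __init__(self, feature_maps, gradient_maps):
        # Stand-ins for tensors that would come from a real network
        self._feature_maps = np.asarray(feature_maps, dtype=float)
        self._gradient_maps = np.asarray(gradient_maps, dtype=float)

    def acquire(self):
        """Module 1: return the feature map set, shape (C, H, W)."""
        return self._feature_maps

    def cluster(self, feature_maps):
        """Module 2: map each peak position to the indices of its maps."""
        peaks = feature_maps.reshape(len(feature_maps), -1).argmax(axis=1)
        return {int(p): np.flatnonzero(peaks == p) for p in np.unique(peaks)}

    def normalize(self, feature_maps, clusters):
        """Module 3: divide every subset by the set-wide maximum."""
        global_max = feature_maps.max()
        if global_max <= 0:
            raise ValueError("feature maps must contain a positive value")
        return {p: feature_maps[idx] / global_max for p, idx in clusters.items()}

    def weighted_sum(self, normalized, clusters):
        """Module 4: weight each map by its globally averaged gradient."""
        weights = self._gradient_maps.mean(axis=(1, 2))
        return {
            p: np.tensordot(weights[clusters[p]], maps, axes=1)
            for p, maps in normalized.items()
        }

    def run(self):
        maps = self.acquire()
        clusters = self.cluster(maps)
        normalized = self.normalize(maps, clusters)
        return self.weighted_sum(normalized, clusters)


rng = np.random.default_rng(0)
feature_maps = rng.uniform(0.0, 1.0, size=(16, 4, 4))  # toy data
gradient_maps = rng.normal(size=(16, 4, 4))            # toy data
final_maps = ImageFeatureDecomposer(feature_maps, gradient_maps).run()
```

Keeping each module as its own method means the clustering rule or the weighting rule can be swapped without touching the others, which matches the source's claim that the method can be applied to different networks without changing their architecture.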
[0083] As described for acquisition module 1 above, since the output of the neural network includes multiple feature maps, all the feature maps output by the neural network are treated as a single feature map set. The neural network can be a convolutional neural network or another similar neural network; this embodiment of the invention does not limit the type of neural network.
[0084] As described in clustering module 2 above, clustering is the process of dividing a dataset into different classes or clusters according to a specific criterion, maximizing the similarity of data objects within the same cluster and maximizing the differences between data objects in different clusters. In other words, after clustering, data of the same class should be grouped together as much as possible, while data of different classes should be separated as much as possible. This embodiment of the invention, by clustering feature maps in a feature map set according to set rules, can obtain subsets of feature maps of different categories, thereby achieving the decomposition of different features.
[0085] As mentioned in the normalization module 3 above, it should be noted that if the feature maps of each category are directly weighted and summed, only the relatively important regions of each category can be obtained, but the regions and features that the network is most concerned with cannot be explained. Therefore, it is necessary to normalize all feature maps.
[0086] As described in the weighted summation module 4 above, by performing weighted summation on the feature maps in the feature map subsets of each normalized category, the final feature map of the corresponding category can be obtained, thereby completing the final decomposition of image features.
[0087] This invention, through its embodiments, obtains a feature map set output by a neural network; clusters the feature maps in the feature map set to obtain feature map subsets of different categories; normalizes the feature maps in the feature map subsets of different categories to obtain normalized feature map subsets of each category; and performs weighted summation on the feature maps in the normalized feature map subsets of each category to obtain the final feature map of each category. This effectively decomposes different features and reveals the relative importance between different features.
[0088] In one embodiment, the clustering module 2 is specifically used for:
[0089] The feature maps in the feature map set are clustered according to the pixel where the maximum pixel value of the feature map is located, resulting in feature maps of different categories.
[0090] In this embodiment of the invention, it should be noted that by clustering the feature maps in the feature map set according to the different pixel points where the maximum pixel value of the feature map is located, the feature maps of different categories obtained are more distinct, thereby realizing the decomposition of different features.
[0091] To better understand why the feature maps in the feature map set are clustered based on the pixel with the largest pixel value, an example is given below for explanation.
[0092] Taking the ResNet50 network as an example, the network outputs 2048 feature maps, each with a size of 8x8 pixels. When visualizing features using Grad-Cam, interpolation is typically used to upsample the weighted summed feature maps to the original image size. Therefore, each pixel in the 8x8 feature map corresponds to a fixed region in the original image. Furthermore, experimental results show that each feature map has its own region of greatest interest; specifically, each feature map has its own maximum pixel value, and the pixel values surrounding the maximum pixel value are often also relatively large. This explains why the heatmap obtained by Grad-Cam has a wide range. Based on these reasons, the 2048 feature maps are clustered according to the pixel containing the maximum pixel value, thus completing the decomposition of different features.
[0093] In one embodiment, the normalization processing module 3 is specifically used for:
[0094] Divide each pixel in the feature map by the maximum pixel value in the feature map set.
[0095] In this embodiment of the invention, it should be noted that since the normalization process for feature maps in different categories of feature map subsets is the same, the feature maps in each category of feature map subsets are normalized according to the above-described normalization steps to obtain the normalized feature map subsets for each category. For ease of understanding, an example is given below.
[0096] Suppose that clustering the feature maps in the feature map set yields three subsets: the first subset, the second subset, and the third subset. Since the normalization method for each subset is the same, we will only explain the first subset as an example. Assume the first subset contains two feature maps, the first and the second, and the feature map set contains 2048 feature maps, with a maximum pixel value of 240. The normalization method for the first feature map is to divide each pixel value by 240. Similarly, the normalization method for the second feature map is to divide each pixel value by 240. After normalizing all feature maps in the first subset, we obtain the normalized first subset.
[0097] Furthermore, to facilitate understanding of the reason for normalizing the clustered feature maps, the following explanation is provided.
[0098] Taking the ResNet50 network as an example, after clustering the 2048 feature maps output by the ResNet50 network, at most 64 categories can be obtained, because the feature map size is 8*8 and there are therefore only 64 pixel positions at which a maximum can be located. There may be fewer than 64 categories, because the regions corresponding to some pixels are not the areas the network is most interested in. In this case, directly performing a weighted summation of the feature maps for each category yields only the relatively important regions within each category and fails to explain the regions the network is most interested in overall. Because the Grad-Cam method performs a weighted summation over all feature maps, the normalization must likewise be performed over the whole feature map set.
[0099] In one embodiment, the weighted summation module 4 is specifically used for:
[0100] Obtain the gradient map corresponding to each feature map in the normalized feature map subset;
[0101] Calculate the weights of each feature map based on the gradient map corresponding to each feature map;
[0102] The feature maps in the normalized feature map subset are weighted and summed according to their respective weights.
[0103] In this embodiment of the invention, it should be noted that since the weighted summation of the feature maps in each category's feature map subset is performed in the same way after normalization, the final feature map for each category can be obtained by performing a weighted summation on the feature maps in the feature map subset according to the above-mentioned weighted summation method. For ease of understanding, an example is given below.
[0104] Suppose there are three subsets of feature maps for different categories: the first category, the second category, and the third category. Since the weighted summation of the feature maps in each subset is performed in the same way, the following explanation will only use the first category's feature map subset as an example.
[0105] Suppose that the normalized feature map subset for the first category consists of two feature maps: a normalized first feature map and a normalized second feature map. Obtain the gradient map corresponding to the normalized first feature map and calculate its weights based on this gradient map. Obtain the gradient map corresponding to the normalized second feature map and calculate its weights based on this gradient map. Then, perform a weighted sum of the normalized first and second feature maps using their respective weights to obtain the final feature map for the first category.
[0106] In one embodiment, calculating the weights of each feature map based on the gradient map corresponding to each feature map includes:
[0107] The gradient maps corresponding to each feature map are globally averaged to obtain the weights of each feature map.
[0108] In this embodiment of the invention, by performing global averaging on the gradient maps corresponding to each feature map, the weights of each feature map can be obtained, and the feature maps can be weighted and summed according to their weights to obtain the final feature map.
[0109] Example 3:
[0110] Referring to Figure 6, this application also provides a computer device, which may be a server, and whose internal structure may be as shown in Figure 6. The computer device includes a processor, memory, network interface, and database connected via a system bus. The processor provides computational and control capabilities. The memory includes a non-volatile storage medium and internal memory. The non-volatile storage medium stores the operating system, computer programs, and the database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium. The database stores data such as that used by the image feature decomposition method. The network interface is used to communicate with external terminals via a network connection. When the computer program is executed by the processor, it implements an image feature decomposition method, including: acquiring a feature map set output by a neural network; clustering the feature maps in the feature map set to obtain feature map subsets of different categories; normalizing the feature maps in the feature map subsets of different categories to obtain normalized feature map subsets of each category; and performing weighted summation on the feature maps in the normalized feature map subsets of each category to obtain the final feature map of each category.
[0111] This invention, through its embodiments, obtains a feature map set output by a neural network; clusters the feature maps in the feature map set to obtain feature map subsets of different categories; normalizes the feature maps in the feature map subsets of different categories to obtain normalized feature map subsets of each category; and performs weighted summation on the feature maps in the normalized feature map subsets of each category to obtain the final feature map of each category. This effectively decomposes different features and reveals the relative importance between different features.
[0112] Example 4:
[0113] This application also provides a computer-readable storage medium storing a computer program thereon. When the computer program is executed by a processor, it implements an image feature decomposition method, including the steps of: acquiring a feature map set output by a neural network; clustering the feature maps in the feature map set to obtain feature map subsets of different categories; normalizing the feature maps in the feature map subsets of different categories to obtain normalized feature map subsets of each category; and performing weighted summation on the feature maps in the feature map subsets of each category after normalization to obtain the final feature map of each category.
[0114] In this embodiment, a feature map set output by a neural network is obtained; the feature maps in the feature map set are clustered to obtain feature map subsets of different categories; and the feature maps in the feature map subsets of different categories are normalized respectively to obtain a normalized feature map subset for each category.
[0115] The feature maps in the normalized feature map subset of each category are then weighted and summed to obtain the final feature map of each category. This effectively decomposes the different features and shows the relative importance between them.
[0116] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium. When executed, the computer program can include the processes of the embodiments of the above methods. Any reference to memory, storage, databases, or other media provided in this application and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
[0117] It should be noted that, in this document, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, apparatus, article, or method. Unless otherwise specified, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, apparatus, article, or method that includes that element.
[0118] The above description is only a preferred embodiment of this application and does not limit the patent scope of this application. Any equivalent structural or procedural changes made based on the content of this application's specification and drawings, or direct or indirect applications in other related technical fields, are similarly included within the patent protection scope of this application.
Claims
1. An image feature decomposition method, characterized by comprising: obtaining a feature map set output by a neural network, wherein the neural network is a convolutional neural network and each feature map in the feature map set is an 8×8 feature map; clustering the feature maps in the feature map set according to the pixel at which the maximum pixel value of each feature map is located, to obtain feature map subsets of different categories; normalizing the feature maps in the feature map subsets of different categories to obtain a normalized feature map subset for each category; and performing weighted summation on the feature maps in the normalized feature map subset of each category to obtain a final feature map for each category; wherein the step of performing weighted summation on the feature maps in the normalized feature map subset comprises: obtaining the gradient map corresponding to each feature map in the normalized feature map subset; calculating the weight of each feature map based on the gradient map corresponding to that feature map; and performing weighted summation on the feature maps in the normalized feature map subset according to the weights of the feature maps; and wherein the step of calculating the weight of each feature map based on the gradient map corresponding to that feature map comprises: performing global averaging on the gradient map corresponding to each feature map to obtain the weight of that feature map.
2. The image feature decomposition method according to claim 1, characterized in that the step of normalizing the feature maps in the feature map subset comprises: dividing each pixel in the feature map by the maximum pixel value in the feature map set.
3. An image feature decomposition device, characterized by comprising: an acquisition module, used to acquire a feature map set output by a neural network, wherein the neural network is a convolutional neural network and each feature map in the feature map set is an 8×8 feature map; a clustering module, used to cluster the feature maps in the feature map set according to the different pixel points at which the maximum pixel values of the feature maps are located, so as to obtain feature map subsets of different categories; a normalization processing module, used to normalize the feature maps in the feature map subsets of different categories respectively, so as to obtain a normalized feature map subset for each category; and a weighted summation module, used to perform weighted summation on the feature maps in the normalized feature map subset of each category to obtain a final feature map for each category; wherein the weighted summation module is specifically used for: obtaining the gradient map corresponding to each feature map in the normalized feature map subset; calculating the weight of each feature map based on the gradient map corresponding to that feature map; and performing weighted summation on the feature maps in the normalized feature map subset according to the weights of the feature maps; and wherein calculating the weight of each feature map based on the gradient map corresponding to that feature map comprises: performing global averaging on the gradient map corresponding to each feature map to obtain the weight of that feature map.
4. A computer device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that, when the processor executes the computer program, it implements the steps of the method of claim 1 or 2.
5. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the steps of the method of claim 1 or 2.
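For reference only, the steps recited in claims 1 and 2 may be written compactly as below. The notation is an assumption introduced here and does not appear in the application: $A^k$ denotes the $k$-th 8×8 feature map, $G^k$ its corresponding gradient map, $S_c$ the subset of maps whose maximum lies at pixel $c$, and $M_c$ the final feature map of that category. The normalization is written with the maximum taken over the subset, which is one possible reading of claim 2.

```latex
\begin{align}
  p_k &= \arg\max_{(i,j)} A^k_{ij}, &
  S_c &= \{\, k : p_k = c \,\} \\
  \hat{A}^k &= \frac{A^k}{\max_{k' \in S_c} \max_{(i,j)} A^{k'}_{ij}}, &
  k &\in S_c \\
  w_k &= \frac{1}{8 \times 8} \sum_{i=1}^{8} \sum_{j=1}^{8} G^k_{ij} \\
  M_c &= \sum_{k \in S_c} w_k \, \hat{A}^k
\end{align}
```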