Image detection method, device, medium and apparatus

By employing multi-round detection and feature mask fusion decoding techniques, the actual abnormal region is gradually approximated, solving the problem of insufficient accuracy in abnormal image detection in existing technologies and achieving higher precision in abnormal region localization and detection.

CN116958025B · Active · Publication Date: 2026-06-26 · TENCENT TECH SHANGHAI

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: TENCENT TECH SHANGHAI
Filing Date: 2022-12-15
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

Existing technologies for abnormal image detection and abnormal region localization are not accurate enough, and are prone to misjudging normal images as abnormal.

Method used

A multi-round detection method is adopted. The feature encoding map of the image to be detected is masked and fused using the abnormal region of the previous detection round to obtain the current feature fusion map. The current abnormal region is determined by feature decoding, and the actual abnormal region is approximated round by round.

Benefits of technology

It improves the accuracy of anomaly location and anomaly detection, and reduces false positives.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN116958025B_ABST

Abstract

The application discloses an image detection method and device, a medium and equipment, relates to the technical field of image processing, and comprises the following steps: acquiring at least one first feature coding graph corresponding to a to-be-detected image, and an abnormal region corresponding to the to-be-detected image in a previous detection round; performing mask fusion on the at least one first feature coding graph based on the abnormal region, to obtain a current feature fusion graph; the current feature fusion graph represents image features of the to-be-detected image except the abnormal region; performing feature decoding processing on the current feature fusion graph to obtain at least one current feature decoding graph; and determining a current abnormal region corresponding to the to-be-detected image in a current detection round according to the at least one first feature coding graph and the at least one current feature decoding graph. The application can improve the accuracy of abnormal detection and abnormal region positioning.

Description

Technical Field

[0001] This application relates to the field of image processing technology, specifically to image detection methods, apparatus, media, and equipment.

Background Technology

[0002] Artificial Intelligence (AI) is a comprehensive technology within computer science that studies the design principles and implementation methods of various intelligent machines, enabling them to possess perception, reasoning, and decision-making capabilities. AI technology is a multidisciplinary field encompassing a wide range of areas, including natural language processing, machine learning, and deep learning. With technological advancements, AI will be applied in more fields and play an increasingly important role.

[0003] In related technologies, artificial intelligence-based image processing methods can be used to detect abnormal images and locate abnormal regions. However, the location of abnormal regions is still not accurate enough, and in extreme cases, normal images may be judged as abnormal.

Summary of the Invention

[0004] To improve the accuracy of anomaly detection and anomaly region localization, this application provides an image detection method, apparatus, medium, and device. The technical solution is as follows:

[0005] In a first aspect, this application provides an image detection method, the method comprising:

[0006] Obtain at least one first feature encoding map corresponding to the image to be detected, and the previous abnormal region corresponding to the image to be detected in the previous detection round;

[0007] Based on the previous abnormal region, the at least one first feature coding map is masked and fused to obtain the current feature fusion map; the current feature fusion map represents the image features of the image to be detected other than the previous abnormal region;

[0008] The current feature fusion map is subjected to feature decoding processing to obtain at least one current feature decoding map;

[0009] Based on the at least one first feature encoding map and the at least one current feature decoding map, the current abnormal region corresponding to the image to be detected in the current detection round is determined.

[0010] In a second aspect, this application provides an image detection apparatus, the apparatus comprising:

[0011] The acquisition module is used to acquire at least one first feature encoding map corresponding to the image to be detected, and the previous abnormal region corresponding to the image to be detected in the previous detection round;

[0012] The mask fusion module is used to perform mask fusion on the at least one first feature coding map based on the previous abnormal region to obtain a current feature fusion map; the current feature fusion map represents the image features of the image to be detected other than the previous abnormal region;

[0013] The decoding module is used to perform feature decoding processing on the current feature fusion map to obtain at least one current feature decoding map;

[0014] An abnormal region determination module is used to determine the current abnormal region corresponding to the image to be detected in the current detection round based on the at least one first feature encoding map and the at least one current feature decoding map.

[0015] In a third aspect, this application provides a computer-readable storage medium storing at least one instruction or at least one program, which is loaded and executed by a processor to implement the image detection method as described in the first aspect.

[0016] In a fourth aspect, this application provides a computer device including a processor and a memory, wherein the memory stores at least one instruction or at least one program, the at least one instruction or at least one program being loaded and executed by the processor to implement the image detection method as described in the first aspect.

[0017] In a fifth aspect, this application provides a computer program product comprising computer instructions that, when executed by a processor, implement the image detection method as described in the first aspect.

[0018] The image detection method, apparatus, medium, and device provided in this application have the following technical effects:

[0019] The solution provided in this application utilizes the previous abnormal region corresponding to the image to be detected in the previous detection round to perform mask fusion on at least one first feature encoding map corresponding to the image to be detected, thereby obtaining a current feature fusion map. The current feature fusion map can represent the image features of the image to be detected other than the previous abnormal region. The current feature fusion map is then subjected to feature decoding processing to obtain at least one current feature decoding map. The current feature decoding map is the result of masking the image features corresponding to the previous abnormal region and decoding and restoring the previous abnormal region as if it were a normal region. For the features within the previous abnormal region, there will be differences in feature representation between at least one first feature encoding map and at least one current feature decoding map. By utilizing the differences in feature representation, the current abnormal region in the current detection round can be determined. Compared to the previous abnormal region, the current abnormal region can correct the part of the previous abnormal region that was misjudged as an abnormal region to a normal region, thereby improving the accuracy of abnormal region localization.

[0020] The solution provided in this application also employs a multi-round detection method, which continuously brings the current abnormal area closer to the actual abnormal area, thereby improving the accuracy of abnormal area location and enhancing the accuracy of abnormal detection.

[0021] Additional aspects and advantages of this application will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of this application.

Brief Description of the Drawings

[0022] To more clearly illustrate the technical solutions and advantages in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0023] Figure 1 is a schematic diagram of the implementation environment of an image detection method provided in an embodiment of this application;

[0024] Figure 2 is a schematic flowchart of an image detection method provided in an embodiment of this application;

[0025] Figure 3 is a schematic diagram of a process for mask fusion of a first feature coding map provided in an embodiment of this application;

[0026] Figure 4 is a schematic diagram of another process for mask fusion of the first feature coding map provided in an embodiment of this application;

[0027] Figure 5 is a schematic flowchart of a process for determining a current abnormal region provided in an embodiment of this application;

[0028] Figure 6 is a schematic diagram of a process for performing the first round of detection on an image to be detected, provided in an embodiment of this application;

[0029] Figure 7 is a schematic flowchart of an image detection method based on a reverse distillation anomaly detection model provided in an embodiment of this application;

[0030] Figure 8 is a schematic diagram of the training process of a reverse distillation anomaly detection model provided in an embodiment of this application;

[0031] Figure 9 is a schematic diagram of a process for mask fusion of the feature coding map of a first sample provided in an embodiment of this application;

[0032] Figure 10 is a schematic diagram of a process for defect detection in industrial products provided in an embodiment of this application;

[0033] Figure 11 is a schematic diagram illustrating the difference in detection accuracy between the technical solution provided in this application and related technologies, as provided in an embodiment of this application;

[0034] Figure 12 is a schematic diagram of an image detection device provided in an embodiment of this application;

[0035] Figure 13 is a schematic diagram of the hardware structure of a device for implementing an image detection method provided in an embodiment of this application.

Detailed Implementation

[0036] Artificial Intelligence (AI) is the theory, methods, technology, and application systems that use digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to achieve optimal results. In other words, AI is a comprehensive technology within computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a way similar to human intelligence. AI studies the design principles and implementation methods of various intelligent machines, enabling them to have perception, reasoning, and decision-making capabilities. AI technology is a comprehensive discipline involving a wide range of fields, encompassing both hardware and software technologies. Fundamental AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing technology, operating / interactive systems, and mechatronics.

[0037] The solutions provided in this application involve technologies such as deep learning (DL) in artificial intelligence.

[0038] Deep learning (DL) is a major research direction in the field of machine learning (ML), bringing it closer to its original goal—artificial intelligence. Deep learning learns the inherent patterns and hierarchical representations of sample data; the information gained during this learning process greatly aids in interpreting data such as text, images, and sound. Its ultimate goal is to enable machines to possess analytical and learning capabilities like humans, capable of recognizing data such as text, images, and sound. Deep learning is a complex machine learning algorithm that has achieved results in speech and image recognition far exceeding previous related technologies. Deep learning has yielded significant achievements in search technology, data mining, machine learning, machine translation, natural language processing, multimedia learning, speech recognition, recommendation and personalization technologies, and other related fields. Deep learning enables machines to mimic human activities such as sight, hearing, and thought, solving many complex pattern recognition problems and significantly advancing artificial intelligence-related technologies.

[0039] The solutions provided in this application can be deployed in the cloud, and also involve cloud technologies.

[0040] Cloud technology refers to a hosting technology that unifies hardware, software, and network resources within a wide area network (WAN) or local area network (LAN) to achieve data computation, storage, processing, and sharing. It can also be understood as a general term for network technologies, information technologies, integration technologies, management platform technologies, and application technologies based on cloud computing business models. These technologies can form resource pools, allowing for on-demand use and flexibility. Backend services of cloud computing systems require substantial computing and storage resources, such as video websites, image websites, and many portal websites. With the rapid development and application of the internet industry, every item may have its own identification mark in the future, requiring transmission to backend systems for logical processing. Data at different levels will be processed separately, and various industry data require robust system support; therefore, cloud technology relies on cloud computing as its foundation. Cloud computing is a computing model that distributes computing tasks across a resource pool composed of numerous computers, enabling various application systems to obtain computing power, storage space, and information services as needed. The network providing these resources is called the "cloud." From the user's perspective, resources in the "cloud" are infinitely scalable, readily available, and can be used on demand, expanded at any time, and paid for based on usage. As a provider of fundamental cloud computing capabilities, a cloud resource pool platform, often referred to as a cloud platform or Infrastructure as a Service (IaaS), is established. This platform deploys various types of virtual resources within the resource pool for external customers to choose from. 
The cloud resource pool primarily includes: computing devices (which can be virtualized machines containing operating systems), storage devices, and network devices.

[0041] To improve the accuracy of anomaly detection and anomaly region localization, embodiments of this application provide image detection methods, apparatus, media, and devices. The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of them. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application. Examples of the embodiments are shown in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout.

[0042] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or server that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or devices.

[0043] It is understood that in the specific embodiments of this application, data such as user information are involved. When the above embodiments of this application are applied to specific products or technologies, user permission or consent is required, and the collection, use and processing of related data must comply with the relevant laws, regulations and standards of the relevant countries and regions.

[0044] Refer to Figure 1, which is a schematic diagram of the implementation environment of an image detection method provided in an embodiment of this application. As shown in Figure 1, the implementation environment may include at least a client 110 and a server 120.

[0045] Specifically, the client 110 may include devices such as smartphones, desktop computers, tablets, laptops, in-vehicle terminals, digital assistants, smart wearable devices, image acquisition devices, and voice interaction devices. It may also include software running on the device, such as web pages provided to users by service providers, or applications provided by such service providers. Specifically, the client 110 can be used to acquire an image to be detected and upload it to the server 120. As an image acquisition device, the client 110 can also generate an image of the object to be detected and use it as the image to be detected.

[0046] Specifically, the server 120 can be a standalone physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms. The server 120 may include a network communication unit, a processor, a memory, and the like. The client 110 and the server 120 can be connected directly or indirectly via wired or wireless communication, which is not limited herein. Specifically, the server 120 can be used to perform multiple rounds of image detection processing on the image to be detected, ultimately determining whether the image to be detected is abnormal and, if so, identifying the abnormal region of the image.

[0047] This application embodiment can also be implemented using cloud technology. Cloud technology refers to a hosting technology that unifies hardware, software, and network resources within a wide area network (WAN) or local area network (LAN) to achieve data computation, storage, processing, and sharing. It can also be understood as a general term for network technologies, information technologies, integration technologies, management platform technologies, and application technologies based on cloud computing business models. Cloud technology requires cloud computing as its support. Cloud computing is a computing model that distributes computing tasks across a resource pool composed of a large number of computers, enabling various application systems to obtain computing power, storage space, and information services as needed. The network providing these resources is called the "cloud." Specifically, the server 120 and the database are located in the cloud. The server 120 can be a physical machine or a virtualized machine.

[0048] The following describes an image detection method provided in this application. Figure 2 is a schematic flowchart of an image detection method provided in an embodiment of this application. This application provides the operational steps of the method as described in the embodiments or flowchart, but more or fewer operational steps may be included based on routine or non-inventive effort. The order of steps listed in the embodiments is merely one of many possible execution orders and does not represent the only one. When executed in an actual system or server product, the method can be executed sequentially according to the embodiments or drawings, or in parallel (e.g., in a parallel processor or multi-threaded processing environment). Referring to Figure 2, an image detection method provided in this application embodiment may include the following steps:

[0049] S210: Obtain at least one first feature encoding map corresponding to the image to be detected, and the previous abnormal region corresponding to the image to be detected in the previous detection round.

[0050] In this embodiment, the image to be detected is the image whose abnormality needs to be determined, and if it is an abnormal image, the abnormal region needs to be identified. The abnormal image can be an image containing sensitive information or an image indicating that the photographed item has a defect. The abnormal region is the image area corresponding to the sensitive information or the defect.

[0051] In this embodiment, the first feature encoding map is a feature map obtained after feature encoding processing of the image to be detected, containing features of normal regions and features of potentially abnormal regions in the image to be detected. Specifically, in the current detection round, the image to be detected is input into multiple encoders, and each encoder performs multi-scale feature extraction and encoding processing on the image to be detected to obtain at least one first feature encoding map. The scales of each first feature encoding map are different. Multi-scale processing can efficiently detect and extract features of targets of different sizes in the image to be detected, improving the accuracy of detection. In this embodiment, the image to be detected is detected in multiple rounds. In each detection round, at least one first feature encoding map corresponding to the image to be detected can be the same. Therefore, at least one first feature encoding map generated in the first round of detection can be reused to improve the efficiency of image detection.

[0052] In the embodiments of this application, steps S210 to S240 are described with the Xth detection round (X = 2, 3, ...) as the current detection round; the previous detection round is therefore the (X-1)th detection round. In each detection round, the abnormal region corresponding to the image to be detected is determined and used as the previous abnormal region in the next detection round.
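The round-by-round structure described above can be sketched as a simple loop. This is a minimal illustration only, not the application's implementation: the function name `run_detection_rounds` and the callables `encode`, `first_round`, and `refine_round` are hypothetical placeholders for the encoder, the first detection round, and steps S220 to S240 respectively, and a fixed number of rounds is assumed as the stopping rule.

```python
from typing import Callable, List

import numpy as np


def run_detection_rounds(
    image: np.ndarray,
    encode: Callable[[np.ndarray], List[np.ndarray]],
    first_round: Callable[[List[np.ndarray]], np.ndarray],
    refine_round: Callable[[List[np.ndarray], np.ndarray], np.ndarray],
    num_rounds: int = 3,
) -> np.ndarray:
    """Return the abnormal region after `num_rounds` detection rounds.

    All callables are hypothetical stand-ins:
      encode       -> first feature encoding maps (computed once, then reused)
      first_round  -> abnormal region of round 1
      refine_round -> steps S220-S240 for round X, given the region of round X-1
    """
    if num_rounds < 1:
        raise ValueError("num_rounds must be at least 1")

    # The encoding maps are the same in every round, so compute them once.
    enc_maps = encode(image)

    # Round 1 has no previous abnormal region.
    region = first_round(enc_maps)

    # Rounds 2..num_rounds: the region from round X-1 feeds round X.
    for _ in range(2, num_rounds + 1):
        region = refine_round(enc_maps, region)
    return region
```

Passing the region from one iteration to the next mirrors the statement that each round's result serves as the previous abnormal region of the following round.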

[0053] S220: Based on the previous abnormal region, perform mask fusion on at least one first feature coding map to obtain the current feature fusion map; the current feature fusion map represents the image features of the image to be detected other than the previous abnormal region.

[0054] In this embodiment, the previous abnormal region is the image region identified as sensitive information or a defect in the previous detection round. Based on the previous abnormal region, the features corresponding to the previous abnormal region in at least one first feature encoding map are masked, and the at least one masked first feature encoding map is fused to obtain the current feature fusion map in the current detection round. Since the features corresponding to the previous abnormal region are masked, the current feature fusion map represents the image features of the image to be detected excluding the previous abnormal region, that is, it represents the image features contained in the previous normal region corresponding to the image to be detected in the previous detection round.

[0055] In one embodiment of this application, step S220 can be specifically implemented as follows:

[0056] S221: Based on the previous abnormal region, determine at least one current mask region; the at least one current mask region corresponds one-to-one with the at least one first feature coding map.

[0057] When the at least one first feature encoding map differs in scale, the previous abnormal region is scaled to determine at least one current mask region, such that the at least one current mask region corresponds one-to-one with the at least one first feature encoding map. As shown in Figure 3, the current mask region Ma1 corresponds to the first feature encoding map f_E1, so f_E1 can be masked based on Ma1; similarly, the current mask region Ma2 corresponds to the first feature encoding map f_E2, and the current mask region Ma3 corresponds to the first feature encoding map f_E3.
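As a rough illustration of step S221, the previous abnormal region can be treated as a binary mask and resampled to the spatial size of each first feature encoding map. The sketch below assumes nearest-neighbor resampling, which the text does not specify; the function names `resize_mask_nearest` and `masks_for_scales` are hypothetical.

```python
import numpy as np


def resize_mask_nearest(mask: np.ndarray, out_hw: tuple) -> np.ndarray:
    """Resample a 2-D boolean mask to (out_h, out_w) by nearest-neighbor lookup."""
    in_h, in_w = mask.shape
    out_h, out_w = out_hw
    # For each output row/column, pick the source row/column it maps back to.
    rows = (np.arange(out_h) * in_h) // out_h
    cols = (np.arange(out_w) * in_w) // out_w
    return mask[rows[:, None], cols[None, :]]


def masks_for_scales(prev_region: np.ndarray, feature_shapes: list) -> list:
    """One current mask region per feature map, in the same order (Ma1, Ma2, Ma3, ...)."""
    return [resize_mask_nearest(prev_region, hw) for hw in feature_shapes]
```

Nearest-neighbor lookup can miss very small regions at coarse scales; a max-pooling style reduction would be a more conservative alternative under the same assumptions.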

[0058] S222: Based on at least one current mask region, perform masking processing on at least one first feature coding map to obtain at least one second feature coding map.

[0059] In one specific implementation, as shown in Figure 4, step S222 can be implemented as follows:

[0060] S2221: Based on the target mask region, determine the feature occlusion region and the feature preservation region in the target feature coding map; the target feature coding map is any first feature coding map in at least one first feature coding map, and the target mask region is the current mask region in at least one current mask region corresponding to the target feature coding map.

[0061] That is, based on the target mask region, the target feature coding map is divided into regions, where the feature occlusion region is equivalent to the target mask region, that is, the feature region in the target feature coding map that corresponds to the previous abnormal region, and the feature preservation region is the feature region in the target feature coding map that corresponds to the previous normal region.

[0062] S2222: Update the feature data within the feature occlusion region to obtain the updated feature occlusion region.

[0063] In some embodiments, the feature data can be set to zero or noise data can be added to the feature data to obtain an updated feature occlusion region. The feature data is the feature representation data of the feature points, which can be in vector form. Setting the feature data within the feature occlusion region to zero or adding noise data achieves the occlusion of features corresponding to the previous abnormal region, distinguishing it from the features corresponding to the previous normal region, thus facilitating the subsequent determination of the region requiring feature decoding and restoration.

[0064] S2223: Based on the updated feature occlusion region and feature preservation region, obtain the second feature encoding map corresponding to the target feature encoding map.

[0065] For example, as shown in Figure 3, f_E1 can represent the feature vectors of the individual feature points, Ma1 is the feature occlusion region of f_E1, and the region of f_E1 outside Ma1 is its feature preservation region. Masking f_E1 according to formula (1) yields the corresponding second feature encoding map f'_E1:

[0066]

[0067] Formula (1) denotes an element-wise (dot) product between f_E1 and Ma1, which sets the features in the region Ma1 to 0 and leaves the other regions unchanged. Correspondingly, f_E2 and f_E3 are masked in the same way according to Ma2 and Ma3 to obtain f'_E2 and f'_E3.

[0068] In the embodiments disclosed in S2221-S2223, based on at least one current mask region, the feature masking regions of each first feature coding map are masked accordingly, and each of the resulting second feature coding maps can completely and accurately retain the features corresponding to the previous normal region in the corresponding first feature coding map, while accurately masking the features corresponding to the previous abnormal region, thereby ensuring the reliability of the subsequent feature decoding and restoration results.
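The masking in S2221 to S2223 can be sketched as follows. This is one plausible reading rather than the application's exact formula: it assumes the mask is a boolean array that is True inside the feature occlusion region, so the element-wise product is taken with its complement. The function name `mask_features`, the `mode` switch, and the Gaussian noise model are all assumptions introduced for illustration.

```python
import numpy as np


def mask_features(feat, mask, mode="zero", noise_std=1.0, rng=None):
    """Occlude the features inside `mask` and keep everything else unchanged.

    feat: float array of shape (C, H, W), a first feature encoding map.
    mask: array of shape (H, W), truthy inside the feature occlusion region.
    mode: "zero" sets occluded features to 0; "noise" adds Gaussian noise to them.
    """
    mask = np.asarray(mask, dtype=bool)
    if feat.shape[1:] != mask.shape:
        raise ValueError("mask must match the spatial size of the feature map")

    occluded = mask.astype(feat.dtype)[None, :, :]  # 1 inside the occlusion region
    keep = 1.0 - occluded                           # 1 inside the preservation region

    if mode == "zero":
        # Element-wise product: occluded positions become 0, the rest are unchanged.
        return feat * keep
    if mode == "noise":
        rng = np.random.default_rng() if rng is None else rng
        noise = rng.normal(0.0, noise_std, size=feat.shape).astype(feat.dtype)
        # Noise is added only where the mask is set.
        return feat + noise * occluded
    raise ValueError("mode must be 'zero' or 'noise'")
```

Broadcasting the (H, W) mask over the channel axis applies the same occlusion to every channel of a feature point, which matches the description of feature data as a per-point vector.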

[0069] S223: Downsample and concatenate at least one second feature coding map to obtain the current feature fusion map.

[0070] When the at least one first feature coding map differs in scale, the at least one second feature coding map also differs in scale, so the second feature coding maps must be brought to a common scale. Specifically, as shown in Figure 3, based on the scale ratios among the first feature coding maps, f'_E1 is downsampled by two 3×3 convolutions to obtain f''_E1, f'_E2 is downsampled by one 3×3 convolution to obtain f''_E2, and f'_E3 is used directly as f''_E3, so that f''_E1, f''_E2, and f''_E3 have the same scale. The feature data of the feature points (i.e., pixels) at the same location in f''_E1, f''_E2, and f''_E3 are then concatenated (concat) to obtain a concatenated feature map. For example, if the feature data of feature point A in f''_E1, f''_E2, and f''_E3 have dimensions p, q, and r respectively, the feature data of feature point A after concatenation has dimension p+q+r. A 1×1 convolution can also be applied to the concatenated feature map to reduce the number of channels and obtain the current feature fusion map.

[0071] In the embodiments disclosed in S221-S223, when the scales of at least one first feature coding map are different, a current masking region that matches the scale of each first feature coding map is determined based on the previous abnormal region. This allows the feature regions corresponding to the previous abnormal region in each first feature coding map to be accurately masked, ensuring that each masked feature region corresponds to the previous abnormal region. This provides an accurate data foundation for subsequent feature decoding and restoration. At the same time, at least one second feature coding map is downsampled and stitched together. The resulting current feature fusion map retains the information used for feature decoding and restoration while reducing the redundancy of feature data, thereby improving the processing efficiency of image detection.
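The downsample-and-concatenate step S223 can be illustrated with plain NumPy. The sketch assumes that each 3×3 convolution halves the spatial size (stride 2, padding 1) and that adjacent scales differ by a factor of two; neither detail is stated in the text. The function names and the weight arguments are hypothetical, and the weights would be learned in practice.

```python
import numpy as np


def conv3x3_stride2(x, w):
    """3x3 convolution with stride 2 and zero padding 1.

    x: (C_in, H, W); w: (C_out, C_in, 3, 3). Returns (C_out, ceil(H/2), ceil(W/2)).
    """
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out_h, out_w = (h + 1) // 2, (wd + 1) // 2
    out = np.zeros((w.shape[0], out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = xp[:, 2 * i:2 * i + 3, 2 * j:2 * j + 3]
            # Contract over input channels and the 3x3 window.
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out


def conv1x1(x, w):
    """1x1 convolution: x is (C_in, H, W), w is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)


def fuse(f1, f2, f3, w1a, w1b, w2, w_fuse):
    """Bring three masked maps to the coarsest scale, concatenate, reduce channels."""
    d1 = conv3x3_stride2(conv3x3_stride2(f1, w1a), w1b)  # two downsampling convs
    d2 = conv3x3_stride2(f2, w2)                         # one downsampling conv
    cat = np.concatenate([d1, d2, f3], axis=0)           # p + q + r channels
    return conv1x1(cat, w_fuse)
```

Concatenating along the channel axis is what gives each feature point a vector of dimension p+q+r before the 1×1 convolution reduces it.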

[0072] S230: Perform feature decoding processing on the current feature fusion map to obtain at least one current feature decoding map.

[0073] In this embodiment, the current feature fusion map is subjected to feature decoding processing to obtain at least one current feature decoding map in the current detection round. The feature data contained in the at least one current feature decoding map obtained in different detection rounds are different. Specifically, in the current detection round, the current feature fusion map is input into multiple decoders, and each decoder performs multi-scale feature decoding and reconstruction processing on the current feature fusion map to obtain at least one current feature decoding map. The scales of the current feature decoding maps are different.

[0074] It is understandable that the current feature fusion map represents the image features corresponding to the previous normal region determined in the previous detection round, while the image features corresponding to the previous abnormal region are occluded. In the decoding stage, feature restoration is performed on the premise that the image to be detected is a normal image, that is, the occluded feature region is regarded as a normal feature region and restored.

[0075] S240: Determine the current abnormal region corresponding to the image to be detected in the current detection round based on at least one first feature encoding map and at least one current feature decoding map.

[0076] In this embodiment, at least one current feature decoding map and at least one first feature encoding map have a one-to-one scale correspondence. The first feature encoding map is a feature map obtained after feature encoding processing of the image to be detected. When the image to be detected is actually an abnormal image, the first feature encoding map contains the image features corresponding to the actual normal region and the image features corresponding to the actual abnormal region of the image to be detected. The current feature decoding map, which has the same scale as the first feature encoding map, is the result of feature decoding and restoration after assuming that the image to be detected is a normal image and masking the features corresponding to the previous abnormal region in the first feature encoding map. If region A in the previous abnormal region belongs to the actual normal region, the encoding result in the first feature encoding map and the decoding result in the corresponding current feature decoding map will be consistent. If region B in the previous abnormal region belongs to the actual abnormal region, the encoding result in the first feature encoding map and the decoding result in the corresponding current feature decoding map will be significantly different. As a result, the first feature encoding map and the corresponding current feature decoding map produce a difference in the feature representation of region B. Region B can be identified as the current abnormal region in the current detection round, while region A will be corrected to belong to the current normal region in the current detection round. Compared with the previous abnormal region, the current abnormal region is closer to the actual abnormal region.

[0077] This application adopts a multi-round detection method. As the number of detection rounds increases, when the image to be detected is actually an abnormal image, the current abnormal area will continuously approach the actual abnormal area, thereby improving the accuracy of abnormal area detection and localization.
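
The round-by-round approximation can be sketched as a simple loop. The sketch below only illustrates the control flow: `multi_round_detect` and `toy_round` are hypothetical names, the per-round detector is a toy stand-in for the masking, fusion, decoding and comparison steps, and the stopping rule (stop when the region is empty or no longer changes, or after `max_rounds`) is an assumption that the text does not specify.

```python
import numpy as np

def multi_round_detect(detect_round, first_region, max_rounds=3):
    """Refine an abnormal region by feeding each round's result into the next.

    detect_round: callable mapping the previous abnormal region (bool array)
                  to the current abnormal region (bool array).
    """
    region = first_region
    history = [region]
    for _ in range(max_rounds):
        new_region = detect_round(region)
        history.append(new_region)
        # Assumed stopping rule: empty region (image judged normal) or no change.
        if not new_region.any() or np.array_equal(new_region, region):
            break
        region = new_region
    return history[-1], history

# Toy stand-in for one detection round: masked positions that are actually
# normal are restored well and drop out; actually abnormal positions remain.
actual = np.zeros((8, 8), dtype=bool)
actual[2:4, 2:4] = True
first = np.zeros((8, 8), dtype=bool)
first[1:6, 1:6] = True  # over-estimated region from the first round

def toy_round(prev_region):
    return prev_region & actual

final, history = multi_round_detect(toy_round, first)
print(int(first.sum()), int(final.sum()))
```

In this toy setting, the region shrinks from the over-estimated first-round result to the actual abnormal region, mirroring how region A is corrected to normal while region B is retained.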

[0078] In some embodiments, as shown in Figure 5, step S240 may include the following steps:

[0079] S241: Determine at least one set of feature maps based on at least one first feature coding map and at least one current feature decoding map; the feature maps include the first feature coding map and the current feature decoding map with the same scale.

[0080] S242: Based on the similarity relationship between the first feature encoding map and the current feature decoding map in each group of feature maps, at least one anomaly index distribution map is obtained.

[0081] For each set of feature maps, for two feature points at the same position in the first feature encoding map and the current feature decoding map with the same scale, calculate the feature similarity index between them, such as cosine similarity, spatial distance, etc., to obtain the anomaly map corresponding to each set of feature maps. The scale of the anomaly map is consistent with the scale of the feature map, and the anomaly index indicated by the anomaly map is negatively correlated with the feature similarity index.

[0082] S243: Overlay at least one abnormal indicator distribution map to obtain the current abnormal indicator distribution map.

[0083] When at least one first feature coding map has a different scale, the scales of the anomaly index distribution maps will also be different. Therefore, it is necessary to transform at least one anomaly index distribution map to the same scale and then superimpose them to obtain the current anomaly index distribution map. For example, anomaly index distribution map A has a scale of 64*64, anomaly index distribution map B has a scale of 32*32, and anomaly index distribution map C has a scale of 16*16. Anomaly index distribution maps A, B, and C are upsampled at different scales to obtain anomaly index distribution maps A′, B′, and C′, each with a scale of 256*256. The anomaly index data corresponding to the same pixel position in anomaly index distribution maps A′, B′, and C′ are added together to obtain the current anomaly index distribution map with a scale of 256*256. Taking the image of the product to be detected as an example, the current anomaly index distribution map can be represented as an anomaly index heatmap of the product to be detected.

[0084] S244: Based on a preset threshold, perform binarization processing on the current abnormal indicator distribution map to determine the current abnormal region.

[0085] The current anomaly indicator distribution map represents the numerical distribution of anomaly indicators in the current detection round. By binarizing the current anomaly indicator distribution map according to a preset threshold, the current anomaly region and the current normal region corresponding to the image to be detected in the current detection round can be determined. For example, regions with anomaly indicator values higher than the preset threshold are considered current anomaly regions, and the rest are considered current normal regions. In a special case, the current anomaly region can be empty, meaning that an image identified as abnormal in the previous detection round is corrected to a normal image in the current detection round. This avoids false positives and improves the accuracy of anomaly detection.

[0086] In some embodiments, the preset threshold can be inferred from the detection results of the validation sample set during the model training phase. The validation sample set consists of positive samples. The preset threshold can be the product of the maximum value of the abnormal indicator data obtained after the model detects each positive sample in the validation sample set and the application coefficient. The application coefficient can be determined according to business needs. For example, in the defect detection of industrial products, the application coefficient can be selected as 2.
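
As a small worked example of this rule, the threshold can be computed from the anomaly indicator maps of the validation positive samples. The function name `preset_threshold` and the sample values are illustrative assumptions; only the rule itself (maximum value times the application coefficient) comes from the text.

```python
import numpy as np

def preset_threshold(validation_anomaly_maps, application_coefficient=2.0):
    """Largest anomaly indicator seen on normal validation images, scaled up."""
    max_normal = max(float(m.max()) for m in validation_anomaly_maps)
    return max_normal * application_coefficient

# Illustrative anomaly indicator maps for two positive (normal) validation samples.
maps = [np.array([[0.05, 0.10], [0.02, 0.08]]),
        np.array([[0.15, 0.01], [0.03, 0.04]])]
threshold = preset_threshold(maps, application_coefficient=2.0)
print(threshold)
```

Here the largest value over the normal samples is 0.15, so with an application coefficient of 2 the threshold is about 0.3.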

[0087] In the above embodiments, the similarity relationship between the first feature encoding map and the current feature decoding map of the same scale is used to obtain a corresponding anomaly index distribution map. The anomaly index distribution map is used to characterize the degree of difference between the features of each feature point after feature encoding and decoding. Thus, the current abnormal region can be accurately and reliably determined based on the degree of difference and the size of a preset threshold. If region A in the previous abnormal region belongs to the actual normal region, the encoding result in the first feature encoding map and the decoding result in the corresponding current feature decoding map tend to be consistent. If region B in the previous abnormal region belongs to the actual abnormal region, the encoding result in the first feature encoding map and the decoding result in the corresponding current feature decoding map are significantly different. Thus, the first feature encoding map and the corresponding current feature decoding map produce a difference in the feature representation of region B. The abnormal index data in region B is high, so region B can be determined as the current abnormal region in the current detection round, while region A will be corrected to belong to the current normal region in the current detection round. Compared with the previous abnormal region, the current abnormal region is closer to the actual abnormal region, realizing the approximation of the actual abnormal region based on the previous abnormal region, and improving the accuracy of abnormal region positioning.
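
Steps S242 to S244 can be sketched as follows. This is a minimal sketch under stated assumptions: cosine similarity is used as the similarity index (the text also allows spatial distance), the anomaly index is taken as 1 minus the cosine similarity, upsampling is nearest-neighbour on square maps with integer factors, and the function names are hypothetical.

```python
import numpy as np

def cosine_anomaly_map(enc, dec, eps=1e-8):
    """enc, dec: (C, H, W) maps of the same scale. Returns (H, W) = 1 - cosine similarity."""
    dot = (enc * dec).sum(axis=0)
    norm = np.linalg.norm(enc, axis=0) * np.linalg.norm(dec, axis=0)
    return 1.0 - dot / (norm + eps)

def upsample_nearest(m, size):
    """Nearest-neighbour upsampling of a square map to (size, size)."""
    factor = size // m.shape[0]
    return np.kron(m, np.ones((factor, factor)))

def current_abnormal_region(enc_maps, dec_maps, size, threshold):
    """Sum the upsampled per-scale anomaly maps, then binarize with the threshold."""
    total = np.zeros((size, size))
    for enc, dec in zip(enc_maps, dec_maps):
        total += upsample_nearest(cosine_anomaly_map(enc, dec), size)
    return total, total > threshold

rng = np.random.default_rng(0)
enc_maps = [rng.normal(size=(3, s, s)) for s in (8, 4, 2)]
dec_maps = [e.copy() for e in enc_maps]       # perfect restoration everywhere ...
dec_maps[0][:, 0:2, 0:2] *= -1.0              # ... except one badly restored patch
heatmap, region = current_abnormal_region(enc_maps, dec_maps, size=16, threshold=1.0)
print(int(region.sum()))
```

Because the anomaly index falls as similarity rises, well-restored positions stay near 0 while the poorly restored patch approaches 2 and is the only area above the threshold.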

[0088] In this embodiment, steps S310 to S350 are described with the first round of detection as the current detection round, as shown in Figure 6:

[0089] S310: Acquire the image to be detected.

[0090] S320: Perform feature encoding processing on the image to be detected to obtain at least one first feature encoding map.

[0091] The image to be detected is input into the encoder for feature encoding processing. Feature encoding represents the extraction and description of image information of the image to be detected.

[0092] S330: Fuse at least one first feature coding map to obtain the current feature fusion map.

[0093] In the first round of detection, since there is no previous abnormal region, the fusion of the at least one first feature coding map does not include masking. That is, the at least one first feature coding map is directly downsampled according to its scale and concatenated to obtain the current feature fusion map.

[0094] S340: Perform feature decoding processing on the current feature fusion map to obtain at least one current feature decoding map.

[0095] S350: Determine the current abnormal region corresponding to the image to be detected in the current detection round based on at least one first feature encoding map and at least one current feature decoding map.

[0096] For steps S340 and S350, refer to steps S230 and S240, respectively; they will not be repeated here.

[0097] Figure 7 is a flowchart of an image detection method based on a reverse distillation anomaly detection model provided in an embodiment of this application. Referring to Figure 7, the image detection method provided in this embodiment may include the following steps:

[0098] S410: Acquire the image to be detected.

[0099] S420: Input the image to be detected into the encoder in the reverse distillation anomaly detection model to obtain at least one first feature encoding map corresponding to the image to be detected.

[0100] In the current detection round, the encoder in the reverse distillation anomaly detection model can perform feature extraction and encoding on the image to be detected to obtain at least one first feature encoding map corresponding to the image to be detected, or at least one first feature encoding map obtained in the first round of detection can be used directly.

[0101] S430: The fusion module in the reverse distillation anomaly detection model performs mask fusion on at least one first feature coding map based on the previous anomaly region to obtain the current feature fusion map.

[0102] S440: Input the current feature fusion map into the decoder of the reverse distillation anomaly detection model to perform feature decoding processing and obtain at least one current feature decoding map.

[0103] For steps S430 and S440, refer to the mask fusion and feature decoding steps (S220 and S230) in the foregoing embodiments; they will not be repeated here.

[0104] As shown in Figure 7, the reverse distillation anomaly detection model can include an encoding module (encoder 1, encoder 2, and encoder 3), a fusion module, and a decoding module (decoder 1, decoder 2, and decoder 3). The encoding module extracts multi-scale image features from the image to be detected. Encoder 1, encoder 2, and encoder 3 each output a first feature encoding map. Specifically, encoder 2 can extract features from the first feature encoding map output by encoder 1, and encoder 3 can extract features from the first feature encoding map output by encoder 2. The low-dimensional first feature encoding map can contain information such as texture and edges, while the high-dimensional first feature encoding map can contain information such as semantic structure. The fusion module aggregates the first feature encoding maps of different scales and reduces them to a low-dimensional space, retaining the information used for feature decoding and reconstruction while compressing features and reducing redundant information. For the detailed fusion process, refer to the embodiment illustrated in Figure 3, which will not be repeated here. The decoding module mainly implements the distillation process: it first restores the deep features from the current feature fusion map and then restores the shallow features, obtaining current feature decoding maps at multiple scales. The reverse distillation anomaly detection model shown in Figure 7 integrates reverse knowledge distillation and an encoder-decoder framework. The encoding and decoding modules have a symmetrical network architecture, so that the feature dimensions at corresponding positions are consistent. For example, the feature maps output by encoder 1 and decoder 1 have the same scale, and their feature points have the same feature dimension. Based on the first feature encoding map and the current feature decoding map of the same scale, multiple anomaly index distribution maps M1, M2, and M3 are determined, and the current anomaly region in the current detection round can be determined based on M1, M2, and M3. The reverse distillation anomaly detection model shown in Figure 7 can also include a feature extraction module, which can be a ResBlock, to obtain a fused feature representation that is more suitable for feature reconstruction by the decoder.

[0105] The reverse distillation anomaly detection model disclosed in the above embodiments integrates reverse knowledge distillation and the encoder-decoder framework to perform multi-scale feature encoding, which enriches the feature information. At the same time, compared with directly restoring the first feature encoding map of the last encoder (which contains a lot of redundant semantic-structure information), the above embodiments adopt multi-scale fusion and multi-scale decoding, which reduces the difficulty of feature decoding and restoration and ensures the effective restoration of key feature information.

[0106] Figure 8 is a schematic diagram of the training process of a reverse distillation anomaly detection model provided in an embodiment of this application. Referring to Figure 8, the model training process may include the following steps:

[0107] S410: Obtain sample images, which are positive samples.

[0108] Positive samples represent normal images. During model training, all sample images can be used for iterative training; one iteration is called an epoch. Since the sample images are positive, the model to be trained performs one round of image detection on each sample image in one iteration, eliminating the need for multiple rounds of detection.

[0109] S420: Input the sample image into the encoder of the model to be trained, perform feature encoding processing, and obtain at least one first sample feature encoding map corresponding to the sample image.

[0110] S430: The fusion module in the model to be trained performs random mask fusion on at least one first sample feature encoding map to obtain a sample feature fusion map.

[0111] Sample images that are positive samples contain no abnormal regions. Therefore, a random mask can be used to partially mask the first sample feature encoding map in order to train the decoder's ability to decode and restore normal features.

[0112] Specifically, step S430 may include the following steps:

[0113] S431: Determine at least one random mask region, wherein the at least one random mask region corresponds one-to-one to the at least one first sample feature encoding map.

[0114] For example, as shown in Figure 9, the random mask region Ma1 corresponding to the first sample feature encoding map S_E1 can be a noise region covering a certain proportion of the map, as shown in formula (2). The noise region function involves ρ, a predefined parameter (whose default value can be 0.001), and epoch, which denotes the training round. As the training rounds increase, the random mask region gradually grows, continuously improving the decoder's context awareness and thus its feature decoding and reconstruction capability. Scaling Ma1 yields the random mask region Ma2 corresponding to the first sample feature encoding map S_E2 and the random mask region Ma3 corresponding to the first sample feature encoding map S_E3.

[0115]

[0116] S432: Based on at least one random mask region, perform masking processing on at least one first sample feature coding map to obtain at least one second sample feature coding map.

[0117] For example, as shown in Figure 9, S_E1 can represent the feature vectors of the sample feature points. The random mask region Ma1 is the feature masking region of S_E1, and its complement is the feature preservation region of S_E1. Masking S_E1 according to formula (3) yields the corresponding second sample feature encoding map S'_E1:

[0118]

[0119] Formula (3) represents an element-wise (dot) product between S_E1 and the feature preservation region, which sets the features in region Ma1 to 0 and leaves the other regions unchanged. Correspondingly, S_E2 and S_E3 are masked in the same way according to Ma2 and Ma3, respectively, to obtain S'_E2 and S'_E3.

[0120] S433: Downsample and concatenate at least one second sample feature encoding map to obtain a sample feature fusion map.

[0121] As shown in Figure 9, based on the scale ratios between the first sample feature encoding maps, S'_E1 is downsampled by two 3×3 convolutions to obtain S"_E1, S'_E2 is downsampled by one 3×3 convolution to obtain S"_E2, and S"_E3 is obtained directly from S'_E3. S"_E1, S"_E2 and S"_E3 are then concatenated. Furthermore, the concatenated features can be processed by a 1×1 convolution to reduce the number of channels, resulting in the sample feature fusion map in this embodiment.

[0122] In the embodiments disclosed in S431-S433 above, when the scales of at least one first sample feature coding map are different, a random mask region adapted to the scale of each first sample feature coding map is determined, so that the feature masking regions in each first sample feature coding map correspond to each other, providing an accurate data basis for subsequent feature decoding and restoration. At the same time, at least one second sample feature coding map is downsampled and spliced. The resulting sample feature fusion map retains the information used to realize feature decoding and restoration, and reduces the redundancy of feature data, which can improve the efficiency of model training.
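
Steps S431 and S432 can be sketched as follows. Since formula (2) is not reproduced in this text, the mask proportion is passed in directly as a hypothetical `ratio` argument rather than derived from ρ and the epoch. The rule used to scale Ma1 down to Ma2 and Ma3 (a coarse cell is masked if any fine cell inside it is masked) and all function names are likewise assumptions for illustration.

```python
import numpy as np

def random_mask(h, w, ratio, rng):
    """Boolean mask with roughly `ratio` of positions set (True = masked)."""
    return rng.random((h, w)) < ratio

def downscale_mask(mask, factor):
    """Coarsen a mask: a coarse cell is masked if any fine cell inside it is masked."""
    h, w = mask.shape
    return mask.reshape(h // factor, factor, w // factor, factor).any(axis=(1, 3))

def apply_mask(feat, mask):
    """feat: (C, H, W). Element-wise product with the preservation region (not mask)."""
    return feat * (~mask)[None, :, :]

rng = np.random.default_rng(0)
ma1 = random_mask(8, 8, ratio=0.25, rng=rng)   # mask for the largest-scale map
ma2 = downscale_mask(ma1, 2)                   # scaled to the middle map
ma3 = downscale_mask(ma1, 4)                   # scaled to the smallest map

s_e1 = np.ones((2, 8, 8))
s_e2 = np.ones((4, 4, 4))
s_e3 = np.ones((8, 2, 2))
s1_masked = apply_mask(s_e1, ma1)
s2_masked = apply_mask(s_e2, ma2)
s3_masked = apply_mask(s_e3, ma3)
print(ma1.shape, ma2.shape, ma3.shape)
```

Features inside each mask become 0 and all other features are unchanged, which matches the behaviour described for formula (3).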

[0123] S440: Input the sample feature fusion map into the decoder in the model to be trained, perform feature decoding processing, and obtain at least one sample feature decoding map.

[0124] For step S440, refer to the foregoing embodiments; it will not be repeated here.

[0125] S450: Determine the target loss information based on at least one first sample feature encoding map and at least one sample feature decoding map.

[0126] Specifically, the first sample feature encoding map is a feature map obtained after feature encoding processing of the sample image. The sample feature decoding map, which has the same scale as the first sample feature encoding map, is the result of feature decoding and restoration after masking the features corresponding to the random masked region, assuming that the sample image is a normal image. Considering that the sample image is actually a normal image, ideally, the feature representation data contained in the first sample feature encoding map and the corresponding sample feature decoding map should be consistent. During the training phase, there will be differences between the first sample feature encoding map and the corresponding sample feature decoding map. The difference in feature representation between the first sample feature encoding map and the corresponding sample feature decoding map can be used to determine the target loss information, thereby adjusting the model to be trained to improve the decoder's feature decoding and restoration capability for positive samples. It should be noted that the training method provided in this application uses positive samples for training, so the decoder only has the ability to decode and restore the features of positive samples, or in other words, the ability to decode and restore features in normal areas of the image, but not the ability to decode and restore features in abnormal areas of the image.

[0127] Specifically, step S450 may include the following steps:

[0128] S451: Determine at least one set of sample feature maps based on at least one first sample feature encoding map and at least one sample feature decoding map; the sample feature maps include first sample feature encoding maps and sample feature decoding maps of the same scale.

[0129] S452: Based on at least one random mask region, determine at least one feature masking region and feature preservation region for each of the first sample feature coding maps.

[0130] That is, each first sample feature encoding map is divided into regions according to its random mask region, and the feature masking region is the random mask region corresponding to that first sample feature encoding map.

[0131] S453: Based on the first sample feature encoding map and the sample feature decoding map in each set of sample feature maps, determine at least one set of sample loss information; the sample loss information includes the first sample loss information corresponding to the feature occlusion region and the second sample loss information corresponding to the feature preservation region.

[0132] For a set of sample feature maps, for two feature points at the same position in the first sample feature encoding map and the sample feature decoding map of the same scale, calculate the similarity index between them, such as cosine similarity, spatial distance, etc., to obtain the anomaly map corresponding to each set of sample feature maps. The scale of the anomaly map is consistent with the scale of the sample feature maps. The anomaly index indicated by the anomaly map is negatively correlated with the similarity index.

[0133] The first sample loss information corresponding to the feature masking region can be the sum of the abnormal index values of the feature points within the feature masking region, and the second sample loss information corresponding to the feature preservation region can be the sum of the abnormal index values of the feature points within the feature preservation region. In another feasible approach, the first sample loss information can be the average of the abnormal index values of the feature points within the feature masking region, and the second sample loss information can be obtained similarly.

[0134] S454: The first sample loss information and the second sample loss information in each group of sample loss information are weighted and summed to obtain at least one target sample loss information.

[0135] S455: Summing the loss information of at least one target sample yields the target loss information.

[0136] Referring to Figure 8, the target loss information Loss can be calculated as shown in formula (4), where L_i represents the target sample loss information corresponding to the i-th (i = 1, 2, or 3) group of sample feature maps, σ is the learning weight of the first sample loss information L_{i*Mai} corresponding to the feature occlusion region, and 1-σ is the learning weight of the second sample loss information corresponding to the feature preservation region. σ can be set between 0.6 and 0.7, which makes learning faster for the feature occlusion region and slightly slower for the feature preservation region, thereby accelerating the learning of the ability to decode occluded features and improving the training efficiency of the model.

[0137]

[0138] In the above embodiments, first sample loss information and second sample loss information are determined based on the feature occlusion region and the feature preservation region, respectively. The first sample loss information and second sample loss information in each group of sample loss information are then weighted and summed to obtain the target sample loss information corresponding to each group of sample feature maps. Finally, the target sample loss information is summed to obtain the target loss information. Assigning weights to the sample loss information of the feature occlusion region and the feature preservation region can accelerate the learning of the decoding and reconstruction capability of occlusion features, while also improving the training efficiency of the model.
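
The weighting in steps S453 to S455 can be sketched as follows. Formula (4) itself is not reproduced in this text, so the code follows the prose description only: a σ-weighted sum of the masked-region and preserved-region losses per scale, then a sum over scales. The function names, the use of sums rather than averages, and the example values are assumptions.

```python
import numpy as np

def group_loss(anomaly_map, mask, sigma=0.65):
    """Weighted sum of the losses over the occlusion and preservation regions."""
    first = anomaly_map[mask].sum()    # first sample loss: feature occlusion region
    second = anomaly_map[~mask].sum()  # second sample loss: feature preservation region
    return sigma * first + (1.0 - sigma) * second

def target_loss(anomaly_maps, masks, sigma=0.65):
    """Sum the per-scale target sample losses L_i into the target loss."""
    return sum(group_loss(a, m, sigma) for a, m in zip(anomaly_maps, masks))

# Two illustrative scales with hand-picked anomaly index values.
a1 = np.ones((2, 2))
m1 = np.array([[True, False], [False, False]])
a2 = np.array([[2.0]])
m2 = np.array([[True]])
loss = target_loss([a1, a2], [m1, m2], sigma=0.6)
print(loss)
```

For these values the first scale contributes 0.6 × 1 + 0.4 × 3 = 1.8 and the second contributes 0.6 × 2 = 1.2, for a total of about 3.0.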

[0139] S460: Adjust the model to be trained based on the target loss information to obtain the reverse distillation anomaly detection model.

[0140] In the training process shown in Figure 8, encoders 1 to 3 are network modules pre-trained with positive and negative samples; they can be ResNet, WideResNet, etc. Therefore, during the training process provided in steps S410-S460, the encoder weights do not need to be updated; only the decoder, the fusion module, and possibly a feature extraction module need to be adjusted. The dashed line in Figure 8 shows the direction in which the target loss information is backpropagated to adjust the weights and other parameters of the decoder, fusion module, etc., to obtain the reverse distillation anomaly detection model. The reverse distillation anomaly detection model includes encoders 1 to 3, decoders 1 to 3, the fusion module, etc.

[0141] In the above embodiments, a reverse distillation anomaly detection model is constructed and trained based on reverse knowledge distillation and the encoder-decoder framework. This model can efficiently perform the feature encoding, fusion, and decoding processing of the image detection method provided in the embodiments of this application. At the same time, only positive samples are used during the training phase, so the decoder can decode and restore features in normal areas of the image but not in abnormal areas. As a result, the decoder's representation of features in an abnormal area differs from the output of the encoder, which allows the abnormal area to be located accurately.

[0142] The method provided in this application can be used for anomaly detection in various industrial products and achieves better defect localization. Especially for rigid objects commonly found on industrial production lines, the improved accuracy of the located anomaly regions allows products to be graded by the size of the anomaly region, significantly saving manpower. Taking a toothbrush as an example, as shown in Figure 10, the image to be detected is an image of a toothbrush. The image is input into the reverse distillation anomaly detection model provided in this embodiment for a first round of detection, resulting in a first-round anomaly heatmap, which is the distribution map of anomaly indicators in the first round of detection. The first-round anomaly heatmap is binarized according to a threshold to determine the first-round anomaly region. In the second round of detection, the reverse distillation anomaly detection model performs feature fusion and decoding based on the multiple first feature encoding maps corresponding to the image to be detected and the first-round anomaly region, obtaining multiple current feature decoding maps. Based on the multiple first feature encoding maps and the multiple current feature decoding maps, a second-round anomaly heatmap is determined, which is the distribution map of anomaly indicators in the second round of detection. The second-round anomaly region can then be determined from the second-round anomaly heatmap. Although the second-round anomaly region is not shown in Figure 10, the difference between the first-round and second-round anomaly heatmaps shows that the current abnormal area keeps approaching the actual abnormal area, which can improve the accuracy of defect area detection.

[0143] As shown in Figure 11, in the inspection of various industrial products, the abnormal areas determined by the embodiments of this application are closer to the ground-truth abnormal areas than those determined by the prior art. In practical application scenarios, this makes it easier to set the segmentation threshold, which facilitates the automatic classification of defective products.

[0144] This application also provides an image detection device 1200. As shown in Figure 12, the device 1200 may include:

[0145] The acquisition module 1210 is used to acquire at least one first feature encoding map corresponding to the image to be detected, and the previous abnormal region corresponding to the image to be detected in the previous detection round;

[0146] The mask fusion module 1220 is used to perform mask fusion on the at least one first feature coding map based on the previous abnormal region to obtain a current feature fusion map; the current feature fusion map represents the image features of the image to be detected other than the previous abnormal region;

[0147] Decoding module 1230 is used to perform feature decoding processing on the current feature fusion map to obtain at least one current feature decoding map;

[0148] The abnormal region determination module 1240 is used to determine the current abnormal region corresponding to the image to be detected in the current detection round based on the at least one first feature encoding map and the at least one current feature decoding map.

[0149] In one embodiment of this application, the mask fusion module 1220 may include:

[0150] The mask region determination unit is configured to determine at least one current mask region based on the previous abnormal region; the at least one current mask region corresponds one-to-one to the at least one first feature coding map;

[0151] A masking unit is used to perform masking processing on the at least one first feature coding map according to the at least one current masking region to obtain at least one second feature coding map;

[0152] The fusion unit is used to downsample and concatenate the at least one second feature coding map to obtain the current feature fusion map.

[0153] In one embodiment of this application, the masking unit may include:

[0154] The region partitioning subunit is used to determine the feature occlusion region and the feature preservation region in the target feature coding map based on the target mask region; the target feature coding map is any one of the at least one first feature coding maps, and the target mask region is the current mask region in the at least one current mask region that corresponds to the target feature coding map;

[0155] The feature data update subunit is used to update the feature data within the feature occlusion area to obtain the updated feature occlusion area;

[0156] The feature combination subunit is used to obtain the second feature encoding map corresponding to the target feature encoding map based on the updated feature occlusion region and the feature preservation region.

[0157] In one embodiment of this application, the feature data update subunit can be used to set the feature data to zero or add noise data to the feature data to obtain the updated feature occlusion region.
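
The two update options for the feature occlusion region can be illustrated with a short sketch. The function name `update_occluded`, the Gaussian form of the noise, and the `noise_std` parameter are assumptions; the text only states that the feature data are set to zero or have noise data added.

```python
import numpy as np

def update_occluded(feat, mask, mode="zero", noise_std=1.0, rng=None):
    """Return a copy of feat (C, H, W) with the occluded positions updated.

    mode="zero": set the occluded features to 0.
    mode="noise": add Gaussian noise to the occluded features (assumed noise model).
    """
    out = feat.copy()
    if mode == "zero":
        out[:, mask] = 0.0
    elif mode == "noise":
        if rng is None:
            rng = np.random.default_rng()
        out[:, mask] += rng.normal(scale=noise_std, size=out[:, mask].shape)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return out

feat = np.ones((3, 4, 4))
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True  # feature occlusion region

zeroed = update_occluded(feat, mask, mode="zero")
noised = update_occluded(feat, mask, mode="noise", rng=np.random.default_rng(0))
print(zeroed[0])
```

In both modes the feature preservation region is left unchanged, so the second feature encoding map is simply the combination of the updated occlusion region and the untouched preservation region.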

[0158] In one embodiment of this application, the abnormal region determination module 1240 may include:

[0159] A feature map combining unit is configured to determine at least one set of feature maps based on the at least one first feature encoding map and the at least one current feature decoding map; the feature maps include the first feature encoding map and the current feature decoding map with the same scale;

[0160] The first abnormal indicator distribution determination unit is used to obtain at least one abnormal indicator distribution map based on the similarity relationship between the first feature encoding map and the current feature decoding map in each set of feature maps;

[0161] The second abnormal indicator distribution determination unit is used to overlay the at least one abnormal indicator distribution map to obtain the current abnormal indicator distribution map;

[0162] The current abnormal region determination unit is used to perform binarization processing on the current abnormal indicator distribution map based on a preset threshold to determine the current abnormal region.
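The four units of the abnormal region determination module can be sketched as below. The source only refers to a "similarity relationship"; using one minus the channel-wise cosine similarity as the abnormal indicator, upsampling each per-scale map to the image size with nearest-neighbour indexing before overlaying, and the function names are assumptions made for this example.

```python
import numpy as np

def cosine_distance_map(enc, dec, eps=1e-8):
    """Per-position abnormal indicator: 1 - cosine similarity over channels."""
    num = (enc * dec).sum(axis=0)
    den = np.linalg.norm(enc, axis=0) * np.linalg.norm(dec, axis=0) + eps
    return 1.0 - num / den

def upsample_nearest(score, out_h, out_w):
    """Nearest-neighbour resize of a 2-D score map (assumed resizing scheme)."""
    h, w = score.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return score[rows][:, cols]

def current_abnormal_region(enc_maps, dec_maps, image_hw, threshold):
    """Overlay the per-scale indicator maps and binarize with a preset threshold."""
    total = np.zeros(image_hw)
    # Each (enc, dec) pair is one set of feature maps with the same scale.
    for enc, dec in zip(enc_maps, dec_maps):
        total += upsample_nearest(cosine_distance_map(enc, dec), *image_hw)
    return total, total > threshold
```

The function returns both the overlaid abnormal indicator distribution map and the binarized current abnormal region, so the threshold can be inspected against the raw scores.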

[0163] In one embodiment of this application, the apparatus may further include:

[0164] The image acquisition module is used to acquire the image to be detected;

[0165] The model encoding module is used to input the image to be detected into the encoder in the reverse distillation anomaly detection model to obtain the at least one first feature encoding map corresponding to the image to be detected.

[0166] The model fusion module is used to perform, through the fusion module in the reverse distillation anomaly detection model, mask fusion on the at least one first feature coding map based on the previous abnormal region to obtain the current feature fusion map;

[0167] The model decoding module is used to input the current feature fusion map into the decoder of the reverse distillation anomaly detection model, perform feature decoding processing, and obtain the at least one current feature decoding map.

[0168] In one embodiment of this application, the device 1200 may further include:

[0169] A sample image acquisition module is used to acquire sample images, wherein the sample images are positive samples;

[0170] The sample encoding module is used to input the sample image into the encoder in the model to be trained, perform feature encoding processing, and obtain at least one first sample feature encoding map corresponding to the sample image.

[0171] The sample mask fusion module is used to perform random mask fusion on the at least one first sample feature encoding map by the fusion module in the model to be trained, so as to obtain a sample feature fusion map.

[0172] The sample decoding module is used to input the sample feature fusion map into the decoder in the model to be trained, perform feature decoding processing, and obtain at least one sample feature decoding map.

[0173] The target loss information determination module is used to determine target loss information based on the at least one first sample feature encoding map and the at least one sample feature decoding map;

[0174] The model adjustment module is used to adjust the model to be trained based on the target loss information to obtain the reverse distillation anomaly detection model.

[0175] In one embodiment of this application, the sample mask fusion module may include:

[0176] A random mask region determination unit is configured to determine at least one random mask region, wherein the at least one random mask region corresponds one-to-one with the at least one first sample feature coding map;

[0177] A random masking unit is used to perform masking processing on the at least one first sample feature coding map according to the at least one random masking region to obtain at least one second sample feature coding map.

[0178] The sample fusion unit is used to downsample and concatenate the at least one second sample feature encoding map to obtain the sample feature fusion map.
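One possible way to produce random mask regions that correspond one-to-one with the sample feature coding maps is sketched below. The source does not say how the random regions are drawn; sampling a boolean grid at the coarsest scale and repeating it to the finer scales, the mask_ratio parameter and the function name random_mask_regions are all assumptions for illustration.

```python
import numpy as np

def random_mask_regions(scales, mask_ratio=0.3, rng=None):
    """Draw one random mask region per feature-map scale.

    scales: list of (H, W) sizes, each an integer multiple of the coarsest one.
    Returns a list of boolean arrays, one per scale, covering the same image area.
    """
    rng = np.random.default_rng() if rng is None else rng
    base_h = min(h for h, _ in scales)
    base_w = min(w for _, w in scales)
    # Assumption: each coarsest-scale cell is masked independently with mask_ratio.
    base = rng.random((base_h, base_w)) < mask_ratio
    regions = []
    for h, w in scales:
        # Repeat every coarse cell so the finer-scale region covers the same area.
        fine = np.repeat(np.repeat(base, h // base_h, axis=0), w // base_w, axis=1)
        regions.append(fine)
    return regions
```

Each returned region can then be applied to its sample feature coding map before the masked maps are downsampled and concatenated.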

[0179] In one embodiment of this application, the target loss information determination module may include:

[0180] A sample feature map combining unit is configured to determine at least one set of sample feature maps based on the at least one first sample feature encoding map and the at least one sample feature decoding map; the sample feature maps include the first sample feature encoding map and the sample feature decoding map with the same scale;

[0181] The region segmentation unit is used to determine the feature occlusion region and the feature preservation region of each of the at least one first sample feature coding map according to the at least one random mask region;

[0182] The sample loss information determination unit is used to determine at least one set of sample loss information based on the first sample feature encoding map and the sample feature decoding map in each set of sample feature maps; the sample loss information includes the first sample loss information corresponding to the feature occlusion region and the second sample loss information corresponding to the feature preservation region;

[0183] The target sample loss information determination unit is used to perform a weighted summation of the first sample loss information and the second sample loss information in each group of sample loss information to obtain at least one target sample loss information.

[0184] The target loss information determination unit is used to sum the at least one target sample loss information to obtain the target loss information.
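The loss computation described by these units can be sketched as follows. The per-position loss is taken here to be one minus the cosine similarity between the sample encoding and decoding maps, averaged separately over the occlusion and preservation regions; that choice, the weights w_occluded and w_preserved, and the function names are assumptions, since the source does not give the loss formula.

```python
import numpy as np

def cosine_distance_map(enc, dec, eps=1e-8):
    """Per-position distance: 1 - cosine similarity over the channel axis."""
    num = (enc * dec).sum(axis=0)
    den = np.linalg.norm(enc, axis=0) * np.linalg.norm(dec, axis=0) + eps
    return 1.0 - num / den

def target_loss(enc_maps, dec_maps, mask_regions, w_occluded=1.0, w_preserved=1.0):
    """Weighted per-scale loss, summed over all sets of sample feature maps."""
    total = 0.0
    for enc, dec, region in zip(enc_maps, dec_maps, mask_regions):
        dist = cosine_distance_map(enc, dec)
        # First sample loss: mean distance inside the feature occlusion region.
        loss_occ = dist[region].mean() if region.any() else 0.0
        # Second sample loss: mean distance inside the feature preservation region.
        loss_keep = dist[~region].mean() if (~region).any() else 0.0
        # Target sample loss for this scale: weighted sum of the two parts.
        total += w_occluded * loss_occ + w_preserved * loss_keep
    return float(total)
```

Separate weights make it possible to emphasize how well the decoder restores the occluded features relative to how well it reproduces the preserved ones.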

[0185] It should be noted that the apparatus provided in the above embodiments is only illustrated by the division of the above functional modules when implementing its functions. In actual applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus and method embodiments provided in the above embodiments belong to the same concept, and the specific implementation process can be found in the method embodiments, which will not be repeated here.

[0186] This application provides a computer device including a processor and a memory. The memory stores at least one instruction or at least one program, which is loaded and executed by the processor to implement an image detection method as provided in the above method embodiments.

[0187] Figure 13 shows a schematic diagram of the hardware structure of an apparatus for implementing an image detection method provided in an embodiment of this application. This apparatus may constitute or include the device or system provided in the embodiment of this application. As shown in Figure 13, device 10 may include one or more processors 1002 (shown as 1002a, 1002b, ..., 1002n in the figure; processor 1002 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 1004 for storing data, and a transmission device 1006 for communication functions. In addition, it may also include: a display, an input/output interface (I/O interface), a universal serial bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a power supply, and/or a camera. Those skilled in the art will understand that the structure shown in Figure 13 is for illustrative purposes only and does not limit the structure of the electronic device described above. For example, device 10 may also include more or fewer components than those shown in Figure 13, or have a configuration different from that shown in Figure 13.

[0188] It should be noted that the aforementioned one or more processors 1002 and/or other data processing circuits are generally referred to herein as "data processing circuits". These data processing circuits may be embodied, in whole or in part, in software, hardware, firmware, or any combination thereof. Furthermore, the data processing circuits may be a single, independent processing module, or may be wholly or partially integrated into any other element within device 10 (or mobile device). As involved in the embodiments of this application, the data processing circuits serve as a processor control mechanism (e.g., selection of a variable resistor termination path connected to an interface).

[0189] The memory 1004 can be used to store software programs and modules of application software, such as the program instructions / data storage device corresponding to the method described in the embodiments of this application. The processor 1002 executes various functional applications and data processing by running the software programs and modules stored in the memory 1004, thereby realizing the above-described image detection method. The memory 1004 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory 1004 may further include memory remotely located relative to the processor 1002, and these remote memories can be connected to the device 10 via a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.

[0190] The transmission device 1006 is used to receive or send data via a network. Specific examples of the network described above may include a wireless network provided by the communication provider of device 10. In one example, the transmission device 1006 includes a Network Interface Controller (NIC), which can connect to other network devices via a base station to communicate with the Internet. In another example, the transmission device 1006 may be a Radio Frequency (RF) module, used for wireless communication with the Internet.

[0191] The display may be, for example, a touchscreen liquid crystal display (LCD) that allows a user to interact with the user interface of device 10 (or a mobile device).

[0192] This application also provides a computer-readable storage medium, which can be disposed in a server to store at least one instruction or at least one program related to implementing an image detection method in the method embodiment. The at least one instruction or at least one program is loaded and executed by the processor to implement the image detection method provided in the above method embodiment.

[0193] Optionally, in this embodiment, the storage medium may be located at at least one of the multiple network servers in a computer network. Optionally, in this embodiment, the storage medium may include, but is not limited to, various media capable of storing program code, such as USB flash drives, read-only memory (ROM), random access memory (RAM), portable hard drives, magnetic disks, or optical disks.

[0194] This application also provides a computer program product or computer program, which includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions, causing the computer device to perform an image detection method provided in the various optional embodiments described above.

[0195] As can be seen from the above embodiments, the image detection method, apparatus, medium, and device provided in this application have the following technical effects:

[0196] The solution provided in this application utilizes the previous abnormal region corresponding to the image to be detected in the previous detection round to perform mask fusion on at least one first feature encoding map corresponding to the image to be detected, thereby obtaining a current feature fusion map. The current feature fusion map can represent the image features of the image to be detected other than the previous abnormal region. The current feature fusion map is then subjected to feature decoding processing to obtain at least one current feature decoding map. The current feature decoding map is the result of decoding and restoring the image features corresponding to the previous abnormal region after masking. For the features in the previous abnormal region, there will be differences in feature representation between at least one first feature encoding map and at least one current feature decoding map. By utilizing the differences in feature representation, the current abnormal region in the current detection round can be determined. Compared with the previous abnormal region, the current abnormal region can correct the part of the previous abnormal region that was misjudged as an abnormal region to a normal region, thereby improving the accuracy of abnormal region localization.

[0197] The solution provided in this application also employs a multi-round detection method, which brings the current abnormal region progressively closer to the actual abnormal region, thereby improving the accuracy of both abnormal region localization and abnormality detection.
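The multi-round procedure can be sketched as a simple loop. Everything beyond the round-by-round feedback of the abnormal region is an assumption here: the empty region used for the first round, the stop-when-unchanged criterion, the max_rounds limit, and the callable detect_round, which stands in for one full pass of mask fusion, decoding and abnormal region determination.

```python
import numpy as np

def multi_round_detect(detect_round, image_hw, max_rounds=5):
    """Feed each round's abnormal region back as the mask for the next round.

    detect_round: hypothetical callable mapping the previous abnormal region
                  (boolean (H, W) array) to the current abnormal region.
    Returns the final region and the number of rounds that were run.
    """
    # Assumption: nothing is masked before the first detection round.
    prev = np.zeros(image_hw, dtype=bool)
    current = prev
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        current = detect_round(prev)
        # Assumed stopping rule: the region no longer changes between rounds.
        if np.array_equal(current, prev):
            break
        prev = current
    return current, rounds
```

A fixed number of rounds would work equally well in this sketch; the convergence check only avoids running extra rounds once the region has stabilized.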

[0198] It should be noted that the order of the embodiments described above is merely for descriptive purposes and does not represent the superiority or inferiority of the embodiments. Furthermore, the above description focuses on specific embodiments of this application. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps described in the claims can be performed in a different order than that shown in the embodiments and still achieve the desired results. Additionally, the processes depicted in the drawings do not necessarily require a specific or sequential order to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.

[0199] The various embodiments in this application are described in a progressive manner. Similar or identical parts between embodiments can be referred to mutually. Each embodiment focuses on describing the differences from other embodiments. In particular, the device, equipment, and storage medium embodiments are basically similar to the method embodiments, so the descriptions are relatively simple; relevant parts can be referred to the descriptions of the method embodiments.

[0200] Those skilled in the art will understand that all or part of the steps of the above embodiments can be implemented by hardware or by a program instructing related hardware. The program can be stored in a computer-readable storage medium, such as a read-only memory, a disk, or an optical disk.

[0201] The above description is only a preferred embodiment of this application and is not intended to limit this application. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the protection scope of this application.

Claims

1. An image detection method, characterized in that, The method includes: Obtain at least one first feature encoding map corresponding to the image to be detected, and the previous abnormal region corresponding to the image to be detected in the previous detection round; Based on the previous abnormal region, the at least one first feature coding map is masked and fused to obtain the current feature fusion map; the current feature fusion map represents the image features of the image to be detected other than the previous abnormal region; The current feature fusion map is subjected to feature decoding processing to obtain at least one current feature decoding map; Based on the at least one first feature encoding map and the at least one current feature decoding map, the current abnormal region corresponding to the image to be detected in the current detection round is determined.

2. The method according to claim 1, characterized in that, The step of performing mask fusion on the at least one first feature coding map based on the previous abnormal region to obtain the current feature fusion map includes: Based on the previous abnormal region, at least one current mask region is determined; the at least one current mask region corresponds one-to-one with the at least one first feature coding map; Based on the at least one current mask region, the at least one first feature coding map is masked accordingly to obtain at least one second feature coding map; The at least one second feature encoding map is downsampled and concatenated to obtain the current feature fusion map.

3. The method according to claim 2, characterized in that, The step of masking the at least one first feature coding map according to the at least one current mask region to obtain at least one second feature coding map includes: Based on the target mask region, the feature occlusion region and the feature preservation region in the target feature coding map are determined; the target feature coding map is any one of the at least one first feature coding map, and the target mask region is the current mask region in the at least one current mask region that corresponds to the target feature coding map; The feature data within the feature occlusion region is updated to obtain the updated feature occlusion region; Based on the updated feature occlusion region and the feature preservation region, a second feature encoding map corresponding to the target feature encoding map is obtained.

4. The method according to claim 3, characterized in that, The step of updating the feature data within the feature occlusion region to obtain the updated feature occlusion region includes: The feature data is set to zero or noise data is added to the feature data to obtain the updated feature occlusion region.

5. The method according to claim 1, characterized in that, The step of determining the current abnormal region corresponding to the image to be detected in the current detection round based on the at least one first feature encoding map and the at least one current feature decoding map includes: Based on the at least one first feature encoding map and the at least one current feature decoding map, at least one set of feature maps is determined; the feature maps include the first feature encoding map and the current feature decoding map with the same scale; Based on the similarity relationship between the first feature encoding map and the current feature decoding map in each set of feature maps, at least one abnormal indicator distribution map is obtained; The at least one abnormal indicator distribution map is superimposed to obtain the current abnormal indicator distribution map; Based on a preset threshold, the current abnormal indicator distribution map is binarized to determine the current abnormal region.

6. The method according to claim 1, characterized in that, The method further includes: Acquire the image to be detected; The image to be detected is input into the encoder in the reverse distillation anomaly detection model to obtain the at least one first feature encoding map corresponding to the image to be detected; The fusion module in the reverse distillation anomaly detection model performs mask fusion on the at least one first feature coding map based on the previous abnormal region to obtain the current feature fusion map; The current feature fusion map is input into the decoder of the reverse distillation anomaly detection model for feature decoding processing to obtain the at least one current feature decoding map.

7. The method according to claim 6, characterized in that, The method further includes: Acquire sample images, wherein the sample images are positive samples; The sample image is input into the encoder of the model to be trained and feature encoding is performed to obtain at least one first sample feature encoding map corresponding to the sample image; The fusion module in the model to be trained performs random mask fusion on the at least one first sample feature encoding map to obtain a sample feature fusion map; The sample feature fusion map is input into the decoder in the model to be trained for feature decoding processing to obtain at least one sample feature decoding map; The target loss information is determined based on the at least one first sample feature encoding map and the at least one sample feature decoding map; The model to be trained is adjusted based on the target loss information to obtain the reverse distillation anomaly detection model.

8. The method according to claim 7, characterized in that, The step of performing random mask fusion on the at least one first sample feature encoding map by the fusion module in the model to be trained to obtain a sample feature fusion map includes: Determine at least one random mask region, wherein the at least one random mask region corresponds one-to-one with the at least one first sample feature coding map; Based on the at least one random mask region, the at least one first sample feature coding map is masked accordingly to obtain at least one second sample feature coding map; The at least one second sample feature encoding map is downsampled and concatenated to obtain the sample feature fusion map.

9. The method according to claim 7, characterized in that, The step of determining the target loss information based on the at least one first sample feature encoding map and the at least one sample feature decoding map includes: Based on the at least one first sample feature encoding map and the at least one sample feature decoding map, at least one set of sample feature maps is determined; the sample feature maps include the first sample feature encoding map and the sample feature decoding map with the same scale; Based on the at least one random mask region, the feature occlusion region and the feature preservation region of each of the at least one first sample feature coding map are determined accordingly; Based on the first sample feature encoding map and the sample feature decoding map in each set of sample feature maps, at least one set of sample loss information is determined; the sample loss information includes the first sample loss information corresponding to the feature occlusion region and the second sample loss information corresponding to the feature preservation region; The first sample loss information and the second sample loss information in each set of sample loss information are weighted and summed to obtain at least one target sample loss information; The target loss information is obtained by summing the at least one target sample loss information.

10. An image detection device, characterized in that, The device includes: The acquisition module is used to acquire at least one first feature encoding map corresponding to the image to be detected, and the previous abnormal region corresponding to the image to be detected in the previous detection round; The mask fusion module is used to perform mask fusion on the at least one first feature coding map based on the previous abnormal region to obtain a current feature fusion map; the current feature fusion map represents the image features of the image to be detected other than the previous abnormal region; The decoding module is used to perform feature decoding processing on the current feature fusion map to obtain at least one current feature decoding map; An abnormal region determination module is used to determine the current abnormal region corresponding to the image to be detected in the current detection round based on the at least one first feature encoding map and the at least one current feature decoding map.

11. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores at least one instruction or at least one program, which is loaded and executed by a processor to implement the image detection method as described in any one of claims 1 to 9.

12. A computer device, characterized in that, The computer device includes a processor and a memory, the memory storing at least one instruction or at least one program, the at least one instruction or at least one program being loaded and executed by the processor to implement the image detection method as described in any one of claims 1 to 9.