Abnormality recognition method and apparatus, and storage medium, electronic device and program product

WO2025139040A9 · PCT designated stage · Publication Date: 2026-06-18 · TENCENT TECHNOLOGY (SHENZHEN) CO LTD

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
TENCENT TECHNOLOGY (SHENZHEN) CO LTD
Filing Date
2024-09-09
Publication Date
2026-06-18

AI Technical Summary

Technical Problem

In existing technologies, anomaly detection is inefficient: the model must be retrained even for previously encountered objects, resulting in high resource consumption and low efficiency.

Method used

First-stage feature extraction is performed on an acquired image of an object to obtain basic visual features. Target guidance information matching the first visual feature (the standard visual feature whose similarity to the basic visual features is greater than or equal to a preset threshold) is then used to perform second-stage feature extraction. Finally, the anomaly identification result is obtained by comparing the resulting features with the standard visual features of the matched sample image.

Benefits of technology

It enables rapid identification of previously encountered objects, reduces computational load while preserving identification accuracy, and improves anomaly identification efficiency.

Generated by Eureka AI based on patent content.

Abstract figure: CN2024117645_18062026_PF_FP_ABST

Abstract

Disclosed in the present application are an abnormality recognition method and apparatus, and a storage medium and an electronic device. The method comprises: acquiring a first image to be subjected to recognition; performing first-stage feature extraction on the first image to obtain at least one basic visual feature; determining, from among at least one standard visual feature, a first visual feature having a feature similarity to the basic visual feature that is greater than or equal to a preset threshold value; acquiring target guidance information matching the first visual feature; by means of the target guidance information, performing second-stage feature extraction on the first image to obtain at least one second visual feature related to a target object type; when a standard visual feature of a second image is acquired, comparing the at least one second visual feature with the standard visual feature of the second image to obtain a comparison result; and by means of the comparison result, determining an abnormality recognition result for the first image. The present application solves the technical problem of the abnormality recognition efficiency being low.

Description

Anomaly identification method, apparatus, storage medium, electronic device, and program product

[0001] This application claims priority to Chinese Patent Application No. 2023118209059, filed on December 26, 2023, entitled “Anomaly Identification Method, Apparatus, Storage Medium and Electronic Device”, the entire contents of which are incorporated herein by reference.

Technical Field

[0002] This application relates to the field of computer science, and more specifically, to anomaly detection.

Background

[0003] In anomaly detection scenarios, pre-trained models are typically used to identify anomalies in objects. However, even for objects that have been encountered before, the model must be retrained before it can identify them, and model training itself consumes significant resources. As a result, anomaly detection efficiency remains low.

[0004] There is currently no effective solution to the above problems.

[0005] Summary of the Invention

[0006] This application provides an anomaly identification method, apparatus, storage medium, electronic device, and program product to at least solve the technical problem of low anomaly identification efficiency.

[0007] According to one aspect of the embodiments of this application, an anomaly identification method is provided, comprising: acquiring a first image to be identified, wherein the first image is an image obtained by collecting an object; performing a first-stage feature extraction on the first image to obtain at least one basic visual feature, wherein the basic visual feature is used to represent the visual attributes presented by the object surface, the object surface including the surface of the object; determining a first visual feature from at least one standard visual feature whose feature similarity with the basic visual feature is greater than or equal to a preset threshold, wherein the standard visual feature is a feature obtained by feature extraction from multiple sample images belonging to multiple object types, and the sample image corresponding to the first visual feature is a second image belonging to the target object type; acquiring target guidance information matching the first visual feature, wherein the target guidance information is used to guide attention to features in the image related to the target object type; performing a second-stage feature extraction on the first image using the target guidance information to obtain at least one second visual feature related to the target object type; comparing the at least one second visual feature with the standard visual feature of the second image to obtain a comparison result; and determining an anomaly identification result of the first image based on the comparison result.

[0008] According to another aspect of the embodiments of this application, an anomaly recognition device is also provided, comprising: a first acquisition unit, configured to acquire a first image to be recognized, wherein the first image is an image obtained by capturing an object; a first extraction unit, configured to perform first-stage feature extraction on the first image to obtain at least one basic visual feature, wherein the basic visual feature is used to represent the visual attributes presented by the object surface, the object surface including the surface of the object; a first determination unit, configured to determine, from at least one standard visual feature, a first visual feature whose feature similarity with the basic visual feature is greater than or equal to a preset threshold, wherein the standard visual feature is obtained by performing feature extraction on multiple sample images belonging to multiple object types, and the sample image corresponding to the first visual feature is a second image belonging to the target object type; a second acquisition unit, configured to acquire target guidance information matching the first visual feature, wherein the target guidance information is used to guide attention to features in the image related to the target object type; a second extraction unit, configured to perform second-stage feature extraction on the first image using the target guidance information to obtain at least one second visual feature related to the target object type; a first comparison unit, configured to compare the at least one second visual feature with the standard visual features of the second image to obtain a comparison result; and a second determination unit, configured to determine the anomaly recognition result of the first image based on the comparison result.

[0009] According to another aspect of the embodiments of this application, an electronic device is also provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the above-described anomaly identification method through the computer program.

[0010] In another aspect, embodiments of this application provide a storage medium for storing a computer program for executing the above-mentioned anomaly identification method.

[0011] In another aspect, embodiments of this application provide a computer program product including a computer program, which, when run on a computer, causes the computer to perform the above-mentioned anomaly identification method.

[0012] In this embodiment, a first image to be identified is acquired, wherein the first image is an image obtained by capturing an object; first-stage feature extraction is performed on the first image to obtain at least one basic visual feature, wherein the basic visual feature is used to represent the visual attributes presented by the object surface, and the object surface includes the surface of the object; from the at least one standard visual feature, a first visual feature whose feature similarity with the basic visual feature is greater than or equal to a preset threshold is determined, wherein the standard visual features are features obtained by performing first-stage feature extraction on multiple sample images belonging to multiple object types, and the first visual feature is a feature obtained by performing first-stage feature extraction on a second image belonging to the target object type; target guidance information matching the first visual feature is obtained, wherein the target guidance information is used to guide attention to features in the image related to the target object type; second-stage feature extraction is performed on the first image using the target guidance information to obtain at least one second visual feature related to the target object type; when the standard visual features of the second image are obtained, the at least one second visual feature is compared with them to obtain a comparison result, wherein these standard visual features are obtained by performing second-stage feature extraction on the second image; and the anomaly recognition result of the first image is determined based on the comparison result. By extracting the basic visual features, determining the matching first visual feature, and comparing the resulting second visual features with the standard visual features of the second image, previously encountered objects can be recognized rapidly.
Furthermore, by introducing target guidance information, this embodiment not only identifies previously encountered objects quickly but also focuses on features related to the target object type. This reduces the amount of computation in the second-stage feature extraction while preserving the accuracy of identifying previously encountered objects. The embodiment thus identifies previously encountered objects quickly without sacrificing accuracy, achieving the technical effect of improved anomaly identification efficiency and solving the technical problem of low anomaly identification efficiency.

Brief Description of the Drawings

[0013] Figure 1 is a schematic diagram of the application environment of an optional anomaly identification method according to an embodiment of this application;

[0014] Figure 2 is a schematic flowchart of an optional anomaly identification method according to an embodiment of this application;

[0015] Figure 3 is a schematic diagram of an optional anomaly identification method according to an embodiment of this application;

[0016] Figure 4 is a schematic diagram of another optional anomaly identification method according to an embodiment of this application;

[0017] Figure 5 is a schematic diagram of another optional anomaly identification method according to an embodiment of this application;

[0018] Figure 6 is a schematic diagram of an optional anomaly identification device according to an embodiment of this application;

[0019] Figure 7 is a schematic diagram of the structure of an optional electronic device according to an embodiment of this application.

Detailed Description

[0020] To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present application, and not all embodiments. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort should fall within the scope of protection of the present application.

[0021] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.

[0022] For ease of understanding, the following terms are explained:

[0023] The solutions provided in this application involve technologies such as computer vision technology in artificial intelligence, and are specifically illustrated through the following embodiments:

[0024] According to one aspect of the embodiments of this application, an anomaly identification method is provided. Optionally, the anomaly identification method can be applied to, but is not limited to, the environment shown in FIG. 1. This environment may include, but is not limited to, a user device 102 and a server 112. The user device 102 may include, but is not limited to, a display 104, a processor 106, and a memory 108, and the server 112 includes a database 114 and a processing engine 116.

[0025] The specific process can be summarized in the following steps:

[0026] In step S102, the user equipment 102 acquires a first image to be identified, wherein the first image is an image obtained by capturing an object.

[0027] In step S104, the first image is sent to server 112 via network 110.

[0028] In steps S106-S114, server 112 performs a first-stage feature extraction on the first image through processing engine 116 to obtain at least one basic visual feature; from the at least one standard visual feature, a first visual feature with a feature similarity greater than or equal to a preset threshold with respect to the basic visual feature is determined; target guidance information matching the target object type is obtained; through the target guidance information, a second-stage feature extraction is performed on the first image to obtain at least one second visual feature related to the target object type; and when the standard visual features of the second image are obtained, the at least one second visual feature is compared with the standard visual features of the second image to obtain a comparison result.

[0029] In step S116, the comparison result is sent to user equipment 102 via network 110. User equipment 102 determines the anomaly recognition result of the first image through processor 106 based on the comparison result, displays the anomaly recognition result on display 104, and stores the comparison result in memory 108.

[0030] Besides the example shown in Figure 1, the user device described above may be a terminal device configured with a target client, which may include, but is not limited to, at least one of the following: a mobile phone (such as an Android phone or an iOS phone), a laptop computer, a tablet computer, a PDA, a MID (Mobile Internet Device), a PAD, a desktop computer, or a smart TV. The target client may be a video client, an instant messaging client, a browser client, an educational client, etc. The network may include, but is not limited to, wired and wireless networks, wherein wired networks include local area networks, metropolitan area networks, and wide area networks, and wireless networks include Bluetooth, Wi-Fi, and other networks that enable wireless communication. The server may be a single server, a server cluster composed of multiple servers, or a cloud server. The above is only an example, and no limitation is made in this embodiment.

[0031] Optionally, as shown in FIG. 2, the anomaly identification method can be performed by an electronic device, such as the user device or server shown in FIG. 1, and includes the following steps:

[0032] S202, acquire the first image to be identified, wherein the first image is an image obtained by capturing an object;

[0033] S204, perform a first-stage feature extraction on the first image to obtain at least one basic visual feature, wherein the basic visual feature is used to represent the visual attributes presented by the object surface, and the object surface includes the surface of the object.

[0034] S206, from at least one standard visual feature, determine a first visual feature whose feature similarity with the basic visual feature is greater than or equal to a preset threshold, wherein the standard visual feature is a feature obtained by feature extraction from multiple sample images belonging to multiple object types, and the sample image corresponding to the first visual feature is a second image belonging to the target object type.

[0035] S208, obtain target guidance information matching the first visual feature, wherein the target guidance information is used to guide the focus toward features in the image related to the target object type;

[0036] S210, using target guidance information, perform a second-stage feature extraction on the first image to obtain at least one second visual feature related to the target object type;

[0037] S212, compare at least one second visual feature with the standard visual features of the second image to obtain the comparison result;

[0038] S214, determine the anomaly recognition result of the first image based on the comparison result.
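Steps S202 to S214 can be strung together in a minimal Python sketch. Everything concrete here is an assumption for illustration, not the applicant's implementation: the "image" is already a flat feature vector, `first_stage_extract` and `second_stage_extract` are hypothetical placeholders, the target guidance information is modeled as one weight per feature dimension, cosine similarity and Euclidean distance stand in for the unspecified similarity and comparison measures, and the best-scoring stored type is taken as the first visual feature.

```python
import math
from dataclasses import dataclass


@dataclass
class StoredType:
    """Hypothetical record for one known object type (its sample is the second image)."""
    name: str
    basic_feature: list      # first-stage feature of the sample image
    standard_feature: list   # second-stage feature of the sample image
    guidance: list           # hypothetical guidance: one weight per feature dimension


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.hypot(*a), math.hypot(*b)
    return dot / (na * nb) if na and nb else 0.0


def first_stage_extract(image):
    # Placeholder for S204: the input is treated as an already-extracted vector.
    return list(image)


def second_stage_extract(image, guidance):
    # Placeholder for S210: re-weight each dimension by the guidance weights.
    return [w * x for w, x in zip(guidance, image)]


def identify(image, library, sim_threshold=0.9, dist_threshold=0.5):
    """Return (type name, is_abnormal), or None when no stored type matches."""
    if not library:
        return None
    basic = first_stage_extract(image)                                    # S204
    best = max(library, key=lambda t: cosine(basic, t.basic_feature))     # S206
    if cosine(basic, best.basic_feature) < sim_threshold:
        return None  # unseen object type: this is where new training would be needed
    second = second_stage_extract(image, best.guidance)                   # S208-S210
    distance = math.dist(second, best.standard_feature)                   # S212
    return best.name, distance > dist_threshold                           # S214


# Toy library with a single known type "A" built from one sample vector.
sample_a, guidance_a = [1.0, 0.0, 0.5], [1.0, 0.2, 1.0]
library = [StoredType("A", first_stage_extract(sample_a),
                      second_stage_extract(sample_a, guidance_a), guidance_a)]
```

With this toy library, `identify([1.0, 0.05, 0.5], library)` returns `("A", False)`; a uniformly scaled vector such as `[1.6, 0.0, 0.8]` still matches type A (cosine similarity ignores scale) but is flagged abnormal by the distance check; and `[0.0, 1.0, 0.0]` returns `None`, the case in which the product would be treated as new.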

[0039] Optionally, in this embodiment, the anomaly identification method can be applied to, but is not limited to, continuous inspection of various industrial products, in order to locate abnormal products during continuous inspection. With model-based recognition, whenever a production line switches the items it produces, the detection model must be retrained and redeployed, even if the new item has been detected by the model before. The method proposed in this embodiment, by contrast, requires no retraining or redeployment for previously encountered items; additional training is needed only when the product introduced on the production line is genuinely new. This can greatly reduce the labor, material, and time costs of retraining and redeployment when old products return to the line.

[0040] Optionally, in this embodiment, the object can be understood as, but is not limited to, an actual existing object or an object that needs to be identified. It can be any form of entity, such as a product, part, or organism. For example, on a production line, if anomaly identification is required for every part produced, then the part can be understood as an object. Whenever a new part is produced and captured by a camera, the image captured by that camera can be considered the first image.

[0041] Optionally, in this embodiment, basic visual features that can describe the visual attributes of the object's surface are extracted from the image to capture various attributes of the object's surface, such as color, texture, and shape, providing important information for subsequent anomaly identification. The visual attributes of the object's surface can refer to the characteristics presented by the object's surface that can be perceived and recognized by the visual system. These are the basis for people to observe and identify objects and are also an important basis for computer vision systems to perform image analysis and understanding, such as gloss, color, texture, shape, and transparency.

[0042] In this embodiment, the basic visual features can be features extracted from the image to describe the surface visual attributes of the object. These features cover multiple aspects and together constitute a comprehensive description of the visual characteristics of the object's surface, such as color features, shape features, spatial relationship features, and local features.
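As an illustration of one of the basic visual features listed above, a color feature can be as simple as a quantized RGB histogram. This is a hedged sketch only: the patent does not say how basic visual features are computed, and `color_histogram` and its `bins` parameter are hypothetical.

```python
def color_histogram(pixels, bins=4):
    """Normalized joint RGB histogram used as a simple color feature.

    pixels: iterable of (r, g, b) tuples with channel values in 0..255.
    bins:   number of bins per channel, so the feature has bins**3 entries.
    """
    hist = [0.0] * (bins ** 3)
    count = 0
    for r, g, b in pixels:
        # Map each channel value to a bin index in 0..bins-1.
        ri, gi, bi = (r * bins) // 256, (g * bins) // 256, (b * bins) // 256
        hist[ri * bins * bins + gi * bins + bi] += 1
        count += 1
    # Normalize so images of different sizes give comparable features.
    return [h / count for h in hist] if count else hist


pixels = [(255, 0, 0), (250, 10, 5), (0, 0, 255), (0, 255, 0)]
feature = color_histogram(pixels, bins=2)
```

Here the two reddish pixels fall into the same bin, so `feature` is `[0.0, 0.25, 0.25, 0.0, 0.5, 0.0, 0.0, 0.0]`. Shape, spatial-relationship, and local features would be computed separately and concatenated or stored alongside.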

[0043] Optionally, in this embodiment, a first visual feature similar to the basic visual feature extracted from the first image is determined from a series of standard visual features. For example, the similarity between the basic visual feature and the standard visual features extracted from multiple sample images is compared, and a standard visual feature with a similarity greater than or equal to a preset threshold is selected from these standard visual features as the first visual feature.

[0044] To further illustrate, consider an optional assumption: a sample image library contains various fruits, with multiple sample images for each fruit, and standard visual features are extracted from these sample images. Basic visual features are then extracted from an image of an apple to be identified (the first image). Next, this embodiment compares these basic visual features with the standard visual features in the sample image library and identifies the first visual features whose similarity to the basic visual features of the apple image is greater than or equal to a preset threshold. The fruits in the sample images corresponding to these first visual features are visually very similar to the apple and are highly likely to belong to the same object type. These first visual features are therefore used for subsequent anomaly identification of the apple image.
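The fruit example can be sketched as a threshold search over a small feature library. This is a minimal sketch under stated assumptions: cosine similarity is used as the similarity measure, the library is a hypothetical dictionary from object type to feature vectors, the 0.9 threshold and all vectors are invented, and `find_first_visual_features` is a hypothetical name.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0


def find_first_visual_features(basic, library, threshold=0.9):
    """Return (object_type, similarity) for every stored standard feature whose
    similarity to `basic` is >= threshold, best match first.

    library maps an object type name to a list of standard feature vectors
    (one per sample image). An empty result means no known type matched.
    """
    matches = []
    for object_type, features in library.items():
        for feature in features:
            similarity = cosine_similarity(basic, feature)
            if similarity >= threshold:
                matches.append((object_type, similarity))
    matches.sort(key=lambda m: m[1], reverse=True)
    return matches


library = {
    "apple": [[0.9, 0.1, 0.3], [0.85, 0.15, 0.35]],
    "banana": [[0.2, 0.9, 0.1]],
}
apple_matches = find_first_visual_features([0.88, 0.12, 0.32], library)
unknown_matches = find_first_visual_features([0.0, 0.0, 1.0], library)
```

The apple-like query matches both apple entries and no banana entry, while the unrelated query returns an empty list, which corresponds to an object type the system has not seen before.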

[0045] Optionally, in this embodiment, the standard visual features can be representative visual features extracted from multiple sample images belonging to various object types. After processing and refinement, the standard visual features can accurately describe and represent the typical visual attributes of a particular object type. The extraction of standard visual features typically involves analyzing and learning from a large number of sample images to extract the most representative and distinguishable features, such as color, texture, shape, spatial relationships, etc., used to describe the visual attributes of an object's surface.

[0046] Optionally, in this embodiment, feature similarity can be an indicator used to measure the degree of similarity between two visual features, typically obtained by calculating the distance or correlation between the two features. In anomaly detection, feature similarity is used to compare the basic visual features extracted from the image to be identified with the standard visual features extracted from the sample images. Feature similarity can be calculated using methods such as Euclidean distance, cosine similarity, and the Pearson correlation coefficient. When the similarity between two features is greater than or equal to a preset threshold, the features can be considered similar; for a distance measure such as Euclidean distance, a smaller value indicates greater similarity.
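The three measures named above can be written out directly; the sketch below uses only the Python standard library. The `distance_to_similarity` mapping is one common convention and an assumption here, since the patent does not say how a distance would be turned into a similarity score.

```python
import math


def euclidean_distance(a, b):
    # Distance: 0 means identical; larger means less similar.
    return math.dist(a, b)


def cosine_similarity(a, b):
    # Angle-based similarity in [-1, 1]; insensitive to overall scale.
    norm = math.hypot(*a) * math.hypot(*b)
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def pearson_correlation(a, b):
    # Pearson correlation is the cosine similarity of the mean-centred vectors.
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return cosine_similarity([x - ma for x in a], [y - mb for y in b])


def distance_to_similarity(d):
    # One common (assumed) mapping from a distance to a similarity in (0, 1].
    return 1.0 / (1.0 + d)


a, b, c = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]
```

For `a` and `b`, cosine similarity and Pearson correlation are both 1 even though the Euclidean distance is the square root of 14, which shows why the choice of measure matters: cosine and Pearson ignore overall scale, while Euclidean distance does not. For `a` and `c`, the Pearson correlation is -1.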

[0047] Optionally, in this embodiment, object type refers to a class or category of objects with similar visual attributes and characteristics. In anomaly identification, object type can refer to normal, non-abnormal objects, or to a specific type of abnormal object. For example, on an industrial production line, object type A can correspond to product A, object type B to product B, and object type C to product C. By identifying and classifying object types, anomalies can be detected and handled in a timely manner.

[0048] Optionally, in this embodiment, the sample images are image datasets used to extract standard visual features and first visual features. These datasets contain images of various object types, with multiple sample images for each object type. By analyzing and learning from the sample images, standard visual features representing each object type can be extracted, including first visual features used for comparison and recognition.

[0049] Optionally, in this embodiment, target guidance information matching the target object type is acquired. The purpose of this target guidance information is to guide the focus on features in the image related to the target object type, enabling more accurate identification and analysis of these features. Acquiring and using target guidance information matching the target object type can significantly improve the accuracy and efficiency of anomaly identification. By guiding the image processing algorithm to focus on features related to the target object type, interference from irrelevant information in feature extraction can be reduced, improving the accuracy and efficiency of feature extraction. This will help to more accurately identify abnormal objects or phenomena and take appropriate measures in a timely manner. Simultaneously, by using target guidance information, the need for manual intervention can be reduced, the degree of automation can be increased, and the cost and complexity of anomaly identification can be lowered.

[0050] To further illustrate, consider an anomaly detection task targeting a specific type of fruit (such as an apple). In this case, the target guidance information might include information about the apple's visual features such as color, shape, and texture. This information will guide the image processing algorithm to focus more on apple-related features, such as red or green areas, circular or elliptical shapes, and specific texture patterns. By acquiring and using this target guidance information, this embodiment can more accurately extract apple-related features from the image, thereby enabling anomaly detection.

[0051] Optionally, in this embodiment, the target guidance information can be information specifically designed to guide image processing or analysis algorithms to focus more on features related to a specific object or task. It can take various forms, such as pre-defined rules, models or parameters learned from large amounts of data, or even guidance obtained through expert experience or user interaction. Target guidance information can be used to enhance the feature extraction process, making it more focused on task-related features. For example, in a face recognition task, target guidance information might guide the algorithm to focus more on key areas such as the eyes, nose, and mouth. In deep learning, attention mechanisms are a commonly used technique that dynamically adjusts the model's attention to different image regions through target guidance information. This helps the model better understand and interpret images. Simultaneously, target guidance information can also be used as a supervisory signal during training, helping the model learn and converge to a good solution more quickly.
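The attention mechanism mentioned above can be sketched as standard scaled dot-product attention with a single query. Treating the target guidance information as that query vector, and representing the image as a list of per-region feature vectors, are assumptions for illustration; the patent does not specify how guidance information enters the model.

```python
import math


def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def guided_attention_pool(regions, guidance):
    """Scaled dot-product attention with one query.

    regions:  list of per-region feature vectors extracted from the image.
    guidance: hypothetical guidance vector used as the attention query.
    Returns the attention-weighted feature and the per-region weights.
    """
    d = len(guidance)
    scores = [sum(g * r for g, r in zip(guidance, region)) / math.sqrt(d)
              for region in regions]
    weights = softmax(scores)
    pooled = [sum(w * region[i] for w, region in zip(weights, regions))
              for i in range(d)]
    return pooled, weights


regions = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
pooled, weights = guided_attention_pool(regions, guidance=[5.0, 0.0])
```

With guidance pointing along the first dimension, the first region receives most of the weight (about 0.94), so the pooled feature is dominated by the region the guidance favors, which is the "focus on target-related features" behavior described above.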

[0052] Optionally, in this embodiment, using the target guidance information for the second-stage feature extraction can significantly improve the accuracy and relevance of the second visual features. Guiding the algorithm to focus on features related to the target object type reduces interference from irrelevant information, thereby improving the accuracy and efficiency of feature extraction. This helps to identify objects in images more accurately and to distinguish subtle differences between object types. The extracted second visual features can also be used for further data processing tasks, such as object classification, scene understanding, and 3D reconstruction, supporting better performance of computer vision systems.

[0053] To further illustrate, consider an optional assumption: a dataset of images containing various fruits has been created, and target guidance information matching the target object type of apple has been acquired. This target guidance information may include features such as the apple's color, shape, and texture. This embodiment can use this target guidance information to perform a second-stage feature extraction on a first image containing an apple, focusing on extracting second visual features related to the apple. These second visual features may include the smoothness of the apple's skin, the presence or absence of spots or stripes, etc., which will be used to more accurately identify the apple in the image and distinguish it from other fruit types.

[0054] Optionally, in this embodiment, the second visual feature may be, but is not limited to, the same as, or different from, the first visual feature. For example, the second visual feature may be a feature used to describe the texture structure of an object's surface in an image. Texture is a visual attribute of an object's surface, typically composed of many small, repeating, or periodic patterns. These patterns may be the result of the combined effects of factors such as the object's surface micro-geometry, lighting conditions, and material properties, providing rich information about the object's surface structure and material properties, thereby aiding in object identification and classification. Compared to other visual features such as color and shape, the second visual feature is generally more robust to changes in lighting and viewing angle.

[0055] Optionally, in this embodiment, texture attributes refer to the specific texture characteristics exhibited by the surface of an object. These attributes may include roughness, smoothness, directionality, regularity, etc. For example, the surface of a stone may exhibit rough and irregular texture attributes, while the surface of a piece of glass may exhibit smooth and reflective texture attributes.
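Two of the texture attributes listed above, roughness and directionality, can be approximated from intensity gradients. This is a hedged sketch with a hypothetical `texture_descriptor` function; practical systems often use richer texture descriptors such as Gabor filter banks, gray-level co-occurrence matrices, or learned CNN features, and the patent does not specify which is used.

```python
import math


def texture_descriptor(gray):
    """Return (roughness, x_share) for a 2-D grayscale image (list of rows).

    roughness: mean gradient magnitude from forward differences
               (0 for a perfectly flat surface).
    x_share:   fraction of total absolute intensity change that occurs along x;
               near 1 for vertical stripes, near 0 for horizontal stripes,
               0.5 when there is no change at all.
    """
    gx_sum = gy_sum = 0.0
    magnitudes = []
    for y in range(len(gray) - 1):
        for x in range(len(gray[0]) - 1):
            gx = gray[y][x + 1] - gray[y][x]
            gy = gray[y + 1][x] - gray[y][x]
            gx_sum += abs(gx)
            gy_sum += abs(gy)
            magnitudes.append(math.hypot(gx, gy))
    roughness = sum(magnitudes) / len(magnitudes) if magnitudes else 0.0
    total = gx_sum + gy_sum
    x_share = gx_sum / total if total else 0.5
    return roughness, x_share


flat = [[10] * 4 for _ in range(4)]
stripes = [[0, 9, 0, 9] for _ in range(4)]
```

A flat patch like smooth glass gives `(0.0, 0.5)`, while the striped patch gives `(9.0, 1.0)`: high roughness, with all variation running along x.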

[0056] Optionally, in this embodiment, at least one second visual feature related to the target object type refers to a second visual feature closely related to a specific object type. Each object type may have one or more unique second visual features, which are unique to that type of object or are the main differences between that type of object and other types of objects. Taking wood as an example, typical second visual features may include growth rings, grain direction, and density. These features can be used to distinguish different types of wood, such as oak and pine. In image processing, the extraction and analysis of second visual features for wood can help automatically identify and classify different types of wood.

[0057] Optionally, in this embodiment, comparing at least one second visual feature extracted from the first image with the standard visual features yields a quantified comparison result, from which it can be accurately determined whether the first image contains an anomaly. This can also significantly improve the accuracy and efficiency of anomaly identification and reduce false alarms and missed detections. Furthermore, since the second visual features are relatively robust to changes in illumination and viewing angle, this embodiment has good stability and reliability in practical applications.

[0058] To further illustrate, consider an optional assumption: a dataset containing normal apple images has been provided, from which standard visual features have been extracted. Now, for an apple image to be identified (i.e., the first image), this embodiment extracts at least one second visual feature. Next, this embodiment compares this feature with the standard visual features of the second image to obtain a comparison result. If the comparison result shows that the two are very similar, then this embodiment can consider the apple image to be identified to be normal; if the comparison result shows that the two are significantly different, then this embodiment can consider the image to be abnormal.

[0059] Optionally, in this embodiment, the standard visual features are second visual features extracted from images considered "normal" or "anomaly-free." These features represent the texture attributes that such objects should normally possess and serve as the benchmark in anomaly detection. When anomaly detection is required on a new image, its second visual features can be compared with the standard visual features to determine whether an anomaly exists. Typically, standard visual features are obtained through feature extraction and statistical analysis of a large number of normal sample images, which may require specific algorithms or tools, such as deep learning models or image processing techniques, to extract and describe these features.

[0060] Optionally, in this embodiment, the comparison result is obtained by comparing the second visual features of the image to be detected with the standard visual features. This result is typically a numerical value or a set of numerical values that quantifies the similarity or difference between the two, and may include a similarity score, a difference score, a distance metric, etc. For example, if Euclidean distance is used as the metric, the comparison result is a numerical value representing the degree of difference between the second visual features of the image to be detected and those of the standard image.
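The metrics named above can be illustrated with a minimal Python sketch, assuming that a second visual feature and a standard visual feature are plain NumPy vectors of equal length. The function name `compare_features` is hypothetical and not part of the application; it simply returns two distances (larger means more different) and one similarity (larger means more alike).

```python
import numpy as np

def compare_features(query: np.ndarray, standard: np.ndarray) -> dict:
    """Return candidate comparison results between two feature vectors."""
    diff = query - standard
    euclidean = float(np.linalg.norm(diff))   # L2 distance: larger = more different
    manhattan = float(np.abs(diff).sum())     # L1 distance: larger = more different
    # Cosine similarity: larger = more similar (assumes non-zero vectors).
    cosine_sim = float(query @ standard / (np.linalg.norm(query) * np.linalg.norm(standard)))
    return {"euclidean": euclidean, "manhattan": manhattan, "cosine_similarity": cosine_sim}

scores = compare_features(np.array([3.0, 4.0]), np.array([0.0, 4.0]))
print(scores)
```

Which metric is appropriate depends on how the feature space was trained; the sketch does not imply that the application uses any particular one.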

[0061] Optionally, in this embodiment, the anomaly identification result is a conclusion about whether the image to be detected is abnormal, derived from the comparison results. This result is typically a binary label (normal or abnormal) or a continuous value representing the degree of abnormality. The anomaly identification result is usually determined by comparing the comparison results with a preset threshold or comparison standard. If the comparison result exceeds the threshold or does not meet the comparison standard, the image to be detected is identified as abnormal; otherwise, it is identified as normal.
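A minimal sketch of the threshold decision described above, assuming the comparison result is a distance (larger means more different). The function name, the threshold value, and the continuous "degree" (the amount by which the distance exceeds the threshold) are illustrative assumptions, not details given in the application.

```python
def identify_anomaly(distance: float, threshold: float) -> tuple:
    """Map a distance-type comparison result to a label and an anomaly degree."""
    # Binary label: a distance beyond the preset threshold is treated as abnormal.
    label = "abnormal" if distance > threshold else "normal"
    # Hypothetical continuous degree: how far the distance exceeds the threshold.
    degree = max(0.0, distance - threshold)
    return label, degree

print(identify_anomaly(0.3, 0.5))  # ('normal', 0.0)
```

If a similarity score were used instead of a distance, the comparison direction would be reversed.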

[0062] It should be noted that this embodiment, by combining a two-stage feature extraction and comparison process and using target guidance information to focus on features related to the target object type, can significantly improve the accuracy and efficiency of anomaly identification, effectively reducing false positives and false negatives. Furthermore, due to the use of multiple features and guidance information, this embodiment also possesses high flexibility and adaptability, and can be applied to different types of objects and anomaly identification tasks.

[0063] In this embodiment, the standard visual features used in the first stage and those used in the second stage can be the same features. For example, suppose sample image a is the second image. After the basic visual features of the first image are obtained in the first stage, they are compared for similarity with the standard visual features of sample image a; after the second visual features of the first image are obtained in the second stage, they are compared with the same standard visual features of sample image a. That is, feature extraction does not need to be repeated on the same sample image to obtain standard visual features for the comparisons in different stages, and in this case the standard visual features can be pre-extracted.

[0064] In other cases, feature extraction can be performed twice on the same sample image serving as the second image, once for each of the two stages, to obtain the standard visual features used for feature comparison in each stage. This application does not limit this.

[0065] As a further example, as shown in Figure 3, a first image 302 to be identified is acquired, wherein the first image 302 is an image obtained by capturing an object; first-stage feature extraction is performed on the first image 302 to obtain at least one basic visual feature 304, wherein the basic visual feature 304 represents the visual attributes presented by the object surface, and the object surface includes the surface of the object; from at least one standard visual feature 306, a first visual feature 308 whose feature similarity to the basic visual feature 304 is greater than or equal to a preset threshold is determined, wherein the standard visual features 306 are features obtained by first-stage extraction on multiple sample images belonging to various object types, the first visual feature 308 is a feature obtained by first-stage extraction on a second image belonging to the target object type 310, the multiple sample images include the second image, and the second image is an image acquired from a standard object of the target object type 310; target guidance information 312 matching the target object type 310 is obtained, wherein the target guidance information 312 is used to direct attention to features in the image related to the target object type 310; using the target guidance information 312, second-stage feature extraction is performed on the first image 302 to obtain at least one second visual feature 314 related to the target object type 310, wherein the second visual feature represents the texture attribute of the object surface; when the standard visual feature 316 of the second image is obtained, the at least one second visual feature 314 is compared with the standard visual feature 316 to obtain a comparison result 318, wherein the standard visual feature 316 is the feature obtained by performing second-stage feature extraction on the second image; and, based on the comparison result 318, the anomaly recognition result 320 of the first image 302 is determined.

[0066] In the embodiment corresponding to Figure 3, the standard visual feature 306 and the standard visual feature 316 for the same sample image can be the same standard visual feature.

[0067] The embodiments provided in this application extract basic visual features and determine the first visual features by comparing them with standard visual features, achieving rapid identification of passing objects. Furthermore, by introducing target guidance information, this embodiment not only enables rapid identification of passing objects but also focuses on features related to the target object type, which reduces the computational load of the second-stage feature extraction while ensuring the accuracy of passing object identification. Thus, passing objects are identified rapidly without sacrificing accuracy, achieving the technical effect of improving anomaly recognition efficiency.

[0068] As an optional approach, the method further includes the following steps before obtaining target guidance information matching the target object type:

[0069] S1-1, perform region segmentation on the second image to obtain a first region and a second region, wherein, within each of the first region and the second region, the feature similarity between the second visual features is greater than or equal to a first threshold;

[0070] S1-2, guided by initial guidance information, perform the second-stage feature extraction on images belonging to the target object type to extract first sample features and second sample features, wherein the first sample features are features with first feature attributes, the second sample features are features with second feature attributes, the first feature attributes are the feature attributes within the first region, and the second feature attributes are the feature attributes within the second region;

[0071] S1-3, compare the first sample features with the second visual features in the first region to obtain a first result, and compare the second sample features with the second visual features in the second region to obtain a second result;

[0072] S1-4, Adjust the initial guidance information according to the first result and the second result to obtain the target guidance information.
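Steps S1-3 and S1-4 can be sketched as an iterative adjustment loop. This is a minimal illustration only, assuming that the guidance information is modeled as a single additive prompt vector applied to the sample features, that each region is represented by the mean of its second visual features, and that the adjustment is plain gradient descent on the mean squared mismatch. None of these modeling choices, nor the names `adjust_guidance`, `lr`, or `steps`, come from the application.

```python
import numpy as np

def adjust_guidance(sample_r1, sample_r2, region_r1, region_r2, prompt, lr=0.1, steps=50):
    """Hypothetical S1-3/S1-4 loop: guidance modeled as an additive prompt vector."""
    proto1 = region_r1.mean(axis=0)  # reference second visual feature, first region
    proto2 = region_r2.mean(axis=0)  # reference second visual feature, second region
    n = len(sample_r1) + len(sample_r2)
    history = []
    for _ in range(steps):
        # Guided sample features are modeled as raw features plus the prompt.
        res1 = sample_r1 + prompt - proto1  # first result: mismatch in the first region
        res2 = sample_r2 + prompt - proto2  # second result: mismatch in the second region
        loss = (np.sum(res1 ** 2) + np.sum(res2 ** 2)) / n
        history.append(float(loss))
        # Gradient of the mean squared mismatch with respect to the prompt.
        grad = 2.0 * (res1.sum(axis=0) + res2.sum(axis=0)) / n
        prompt = prompt - lr * grad  # S1-4: adjust the guidance information
    return prompt, history

rng = np.random.default_rng(0)
target_prompt, hist = adjust_guidance(
    rng.normal(size=(8, 4)), rng.normal(size=(8, 4)),
    rng.normal(1.0, 0.1, size=(20, 4)), rng.normal(2.0, 0.1, size=(20, 4)),
    prompt=np.zeros(4))
```

In a learned model the prompt would typically be optimized by backpropagation through the feature extractor rather than by this closed-form gradient.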

[0073] Optionally, in this embodiment, region segmentation can be a key step in image processing, involving dividing an image into multiple regions with similar visual attributes. These regions can be segmented based on color, texture, shape, or other image features. In region segmentation, the algorithm attempts to identify and aggregate visually similar parts of the image while preserving the boundaries between different parts. The main purpose of region segmentation is to simplify image representation, highlight important image structures, and reduce the complexity of subsequent processing tasks. By dividing an image into regions with similar attributes, it is easier to analyze and understand image content and extract useful information. Region segmentation can be implemented using various methods, including but not limited to thresholding, edge detection, region growing, level set methods, clustering algorithms (such as K-means), graph cut methods, etc.
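Of the methods listed above, K-means clustering is the simplest to sketch. The following minimal example assumes the second image has already been reduced to one feature vector per patch and groups those vectors into k regions; the function name and the deterministic farthest-point initialization are illustrative choices, and the first-threshold check of S1-1 is not implemented here.

```python
import numpy as np

def kmeans_regions(features, k=2, iters=20):
    """Group patch features into k regions with plain K-means."""
    features = np.asarray(features, dtype=float)
    # Deterministic farthest-point initialization (an illustrative choice).
    centers = [features[0]]
    for _ in range(1, k):
        dist = np.linalg.norm(features[:, None, :] - np.array(centers)[None, :, :], axis=2)
        centers.append(features[dist.min(axis=1).argmax()])
    centers = np.array(centers)
    labels = np.zeros(len(features), dtype=int)
    for _ in range(iters):
        # Assign every patch to its nearest center.
        dist = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Move each center to the mean of its members (skip empty clusters).
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return labels

patch_feats = [[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]]
region_labels = kmeans_regions(patch_feats, k=2)
```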

[0074] Optionally, in this embodiment, the shared feature attributes of the second visual features within a region refer to the common or similar visual characteristics exhibited by the texture within a specific region. These feature attributes can be descriptions based on the texture's shape, size, orientation, density, periodicity, roughness, etc. Understanding the shared feature attributes of the second visual features within a region is crucial for image processing and computer vision tasks. By identifying and analyzing these feature attributes, this embodiment can better understand image content and extract useful information. For example, in object recognition tasks, different objects may have different second visual feature attributes; by identifying and comparing these attributes, object classification and recognition can be achieved.

[0075] Optionally, the initial guidance information provides this embodiment with basic information or preliminary identification results regarding the type of the target object. This information may be basic features such as the object's shape, color, and texture, or results obtained from the first-stage feature extraction. Based on this initial guidance information, this embodiment can perform the second-stage feature extraction on the image more effectively.

[0076] Optionally, in this embodiment, the image is divided into multiple image blocks. Each image block represents a local region of the image and contains information about the pixels within that region. By analyzing and processing these image blocks, this embodiment can extract more detailed and specific feature information.

[0077] Specifically, this embodiment uses the initial guidance information to guide feature extraction on the image patches. It may focus on image patches related to the initial guidance information and extract their features as first sample features. At the same time, this embodiment also considers the relationships and contextual information between image patches, extracting higher-level features related to the target object type as second sample features.

[0078] In this way, image patches serve as carriers of local information. Guided by initial guidance information, this embodiment can more accurately extract feature information related to the target object type, thereby improving the efficiency and accuracy of feature extraction and providing strong support for subsequent classification, recognition, or reconstruction tasks.

[0079] It should be noted that by first performing region segmentation and feature extraction on the second image, this embodiment can more accurately identify and extract second visual features related to the target object type. Second, by guiding the algorithm to focus on textures with specific feature attributes, the targeting and accuracy of feature extraction can be further improved. Finally, by comparing the sample features with the second visual features within each region, the validity and accuracy of the finally determined feature attributes of interest can be verified, thereby significantly improving the accuracy and efficiency of subsequent anomaly identification.

[0080] To further illustrate, an optional assumption is to identify a fabric with a specific texture. First, this embodiment acquires a sample image of this fabric (the second image) and performs region segmentation, obtaining two regions with similar second visual features: a first region and a second region. Next, this embodiment analyzes the second visual features within these two regions, extracting their shared first and second feature attributes, such as texture density and direction. Then, this embodiment guides the algorithm to focus on textures with these feature attributes, extracting first and second sample features from other images belonging to this fabric. Next, this embodiment compares these sample features with the second visual features within the original regions, obtaining a first result and a second result. If both results indicate that the similarity between the sample features and the second visual features within the regions exceeds a set threshold (a second threshold and a third threshold), then this embodiment confirms that the first and second feature attributes are valid features of interest. Otherwise, this embodiment needs to adjust these feature attributes and repeat the above process until the accurate feature of interest is found.

[0081] The embodiments provided in this application involve region segmentation of a second image to obtain a first region and a second region, wherein the feature similarity between each second visual feature within the first or second region is greater than or equal to a first threshold. Initial guidance information guides the second-stage feature extraction process for images belonging to the target object type, extracting first sample features and second sample features. The first sample features are features with a first feature attribute, and the second sample features are features with a second feature attribute. The first feature attribute is a feature attribute within the first region, and the second feature attribute is a feature attribute within the second region. The first sample features and the second visual features within the first region are compared to obtain a first result, and the second sample features and the second visual features within the second region are compared to obtain a second result. The initial guidance information is adjusted based on the first and second results to obtain target guidance information, thereby improving the targeting and accuracy of feature extraction and providing effective support for subsequent efficient anomaly identification.

[0082] As an optional approach, when the at least one second visual feature comprises multiple second visual features, comparing the at least one second visual feature with the standard visual feature of the second image to obtain the comparison result includes:

[0083] S2-1, obtain, for each of the multiple second visual features, a first distance to the standard visual feature of the second image in the feature space, wherein distance in the feature space is inversely related to the similarity between features;

[0084] S2-2, determine, from the multiple second visual features, a first feature with the largest first distance and a second feature with the smallest first distance;

[0085] S2-3, obtain a second distance between the first feature and the second feature in the feature space;

[0086] S2-4, obtain the comparison result based on the second distance.
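Steps S2-1 to S2-3 can be sketched directly in Python, assuming Euclidean distance in the feature space and NumPy arrays for the features. The function name is hypothetical, and returning the raw second distance as the comparison result for S2-4 is a simplification.

```python
import numpy as np

def second_distance(second_feats: np.ndarray, standard: np.ndarray):
    """Return (first_feature_idx, second_feature_idx, second_distance)."""
    # S2-1: first distance from each second visual feature to the standard feature.
    first_d = np.linalg.norm(second_feats - standard, axis=1)
    # S2-2: farthest feature (first feature) and closest feature (second feature).
    far_idx = int(first_d.argmax())
    near_idx = int(first_d.argmin())
    # S2-3: distance between the first feature and the second feature.
    d2 = float(np.linalg.norm(second_feats[far_idx] - second_feats[near_idx]))
    return far_idx, near_idx, d2

# Features A, B, C and standard feature S, as in the example of paragraph [0090].
feats = np.array([[3.0, 4.0], [1.0, 0.0], [0.0, 0.5]])
far_idx, near_idx, d2 = second_distance(feats, np.array([0.0, 0.0]))
```

Here A is farthest from S and C is closest, so the second distance is measured between A and C.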

[0087] Optionally, in this embodiment, the feature space is a high-dimensional space used to represent and compare image features. In this space, each point represents a feature, and the coordinates of the point are composed of the various dimensions of the feature. The feature space provides a method for visualizing and quantifying the relationships between features, making it easier to understand and compare the similarities and differences between different features.

[0088] Optionally, in this embodiment, distance is a metric used in the feature space to quantify the similarity or difference between two features. This distance is typically calculated using a specific metric function, such as Euclidean distance, Manhattan distance, or cosine distance (one minus cosine similarity). Different metric functions may emphasize different feature attributes or characteristics. In the feature space, the distance between two points reflects the similarity between the features they represent: the closer the distance, the higher the similarity; the farther the distance, the lower the similarity. This metric helps clarify the relationships between different features and which features are more important or useful for a specific task.

[0089] It should be noted that this embodiment can obtain a quantitative assessment of the similarity between at least one second visual feature and a standard visual feature. This can help identify the second visual features that are most similar to and least similar to the standard visual features, and understand the degree of difference between them. This information is very valuable for subsequent image processing and analysis tasks (such as object recognition, scene understanding, etc.) because it can help to more accurately understand and interpret image content.

[0090] To further illustrate, consider a possible set of second visual features A, B, and C, and a standard visual feature S. In the feature space, the distances between A, B, C, and S are measured. It is found that A is furthest from S, and C is closest to S. Next, the distance between A and C (the second distance) is calculated. If this distance is large, it indicates a significant difference between these second visual features, and the comparison result may suggest that these second visual features do not match the standard visual feature well. If this distance is small, it indicates a smaller difference between these second visual features, and the comparison result may be more positive.

[0091] Through the embodiments provided in this application, a first distance in feature space is obtained between each of the at least one second visual features and a standard visual feature, wherein the distance in feature space is inversely related to the similarity between features; a first feature with the farthest first distance and a second feature with the closest first distance are determined from the at least one second visual feature; a second distance in feature space is obtained between the first feature and the second feature; and a comparison result is obtained based on the second distance, thereby achieving the purpose of more accurately understanding and interpreting image content, and thus realizing the technical effect of improving the accuracy of the comparison result.

[0092] As an optional approach, obtaining the comparison result based on the second distance includes:

[0093] S3-1, calculate a third distance between a third feature and the first feature in the feature space, where the third feature is the second visual feature adjacent to the second feature in the feature space;

[0094] S3-2, apply robust optimization to the second distance using the third distance to obtain a target distance, wherein the comparison result includes the target distance.
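The application does not specify how the "robust optimization" in S3-2 is performed. The following sketch shows one plausible reading only, under explicit assumptions: the third feature is taken as the nearest neighbour of the second feature (excluding the second and first features themselves), and the target distance is the average of the second and third distances, so that a single noisy closest feature has less influence. Both the neighbour rule and the averaging are hypothetical, as is the function name.

```python
import numpy as np

def target_distance(second_feats, far_idx, near_idx):
    """Hypothetical S3-1/S3-2: fuse the second distance with a neighbour-based third distance."""
    feats = np.asarray(second_feats, dtype=float)
    if len(feats) < 3:
        raise ValueError("need at least three second visual features")
    # S3-1 (assumed rule): the third feature is the nearest neighbour of the second
    # (closest) feature, excluding the second feature itself and the first feature.
    d_to_near = np.linalg.norm(feats - feats[near_idx], axis=1)
    d_to_near[[near_idx, far_idx]] = np.inf
    third_idx = int(d_to_near.argmin())
    second_d = np.linalg.norm(feats[far_idx] - feats[near_idx])
    third_d = np.linalg.norm(feats[far_idx] - feats[third_idx])
    # S3-2 (assumed form): average the two distances to damp a single outlier.
    return float(0.5 * (second_d + third_d))

feats = np.array([[3.0, 4.0], [1.0, 0.0], [0.0, 0.5]])
result = target_distance(feats, far_idx=0, near_idx=2)
```

Other robust choices, such as taking the median over several neighbours, would fit the same step equally well.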

[0095] As an optional approach, when the at least one basic visual feature comprises a first number of basic visual features, performing first-stage feature extraction on the first image to obtain the at least one basic visual feature includes:

[0096] S4-1, the first image is divided into a first number of image blocks, wherein the image blocks are used to represent local image regions of the first image;

[0097] S4-2, Perform the first stage of feature extraction on the first number of image blocks to obtain the first image features corresponding to each image block as the basic visual features;

[0098] When the at least one second visual feature comprises a second number of second visual features, performing second-stage feature extraction on the first image to obtain the at least one second visual feature related to the target object type includes:

[0099] S5-1, the first image is divided into a second number of image blocks, wherein the image blocks are used to represent local image regions of the first image;

[0100] S5-2, perform the second stage of feature extraction on the second number of image blocks to obtain the second image features corresponding to each image block as the second visual features.

[0101] Optionally, in this embodiment, an image patch can be a group of adjacent pixel regions in an image that have the same characteristics (such as grayscale values). In image processing, these adjacent pixel regions are usually treated as a whole for processing and analysis. The concept of image patches has applications in various image processing tasks, such as feature extraction, image compression, and image enhancement. By processing image patches, information in the image can be extracted more effectively, and computational complexity can be reduced.

[0102] Optionally, in this embodiment, the first quantity and the second quantity can be different. Because the image-patch features extracted in the first stage serve as basic visual features for determining the object type, their accuracy requirement is less stringent; a smaller first quantity can therefore be used to coarsen the patch granularity and improve the efficiency of first-stage feature extraction. The second-stage features, by contrast, must be more closely related to the target object type, so their accuracy requirement is more stringent; a larger second quantity can therefore be used to refine the patch granularity and improve the accuracy of second-stage feature extraction. In other words, the first quantity in this embodiment can be smaller than the second quantity.
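The two granularities can be illustrated with a minimal sketch that splits an image into a coarse grid for the first stage and a finer grid for the second stage. The 224 x 224 input size, the 4 x 4 and 14 x 14 grids, and the mean/standard-deviation "extractor" are hypothetical stand-ins; in practice each stage would use a learned feature extractor.

```python
import numpy as np

def split_into_blocks(image: np.ndarray, grid: int) -> np.ndarray:
    """Split an H x W x C image into grid x grid non-overlapping blocks."""
    h, w, c = image.shape
    bh, bw = h // grid, w // grid
    image = image[: bh * grid, : bw * grid]  # drop remainder rows/columns
    blocks = image.reshape(grid, bh, grid, bw, c).transpose(0, 2, 1, 3, 4)
    return blocks.reshape(grid * grid, bh, bw, c)  # row-major block order

def block_features(blocks: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in extractor: per-channel mean and std of each block."""
    return np.concatenate([blocks.mean(axis=(1, 2)), blocks.std(axis=(1, 2))], axis=1)

image = np.random.default_rng(0).random((224, 224, 3))
coarse = block_features(split_into_blocks(image, grid=4))   # first quantity: 16 blocks
fine = block_features(split_into_blocks(image, grid=14))    # second quantity: 196 blocks
```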

[0103] It should be noted that this embodiment employs a two-stage feature extraction method. First, basic feature information is obtained from various local regions of the image. Then, in the second stage, through more targeted processing, features closely related to the target object type can be further extracted, thereby improving the accuracy of target object recognition, especially in cases of complex backgrounds or when the target object is not significantly different from the background.

[0104] To further illustrate, consider an optional assumption: a first image containing a basket of fruit. In the first stage, the image is segmented into multiple image patches, each containing a portion of a fruit. From these image patches, basic visual features such as color and shape can be extracted as first image features. In the second stage, if the target object is known to be an apple, more attention might be paid to image patches associated with apple features, and more specific features, such as the apple's red color, specific shape, or texture, could be extracted as second image features.

[0105] The embodiments provided in this application divide a first image into a first number of image blocks, wherein each image block represents a local image region of the first image; a first-stage feature extraction is performed on the first number of image blocks to obtain first image features corresponding to each image block, wherein the basic visual features include the first image features; the first image is then divided into a second number of image blocks, wherein each image block represents a local image region of the first image; and a second-stage feature extraction is performed on the second number of image blocks to obtain second image features corresponding to each image block, wherein the second visual features include the second image features. This achieves two-stage feature extraction at the granularity of image blocks, thereby achieving the technical effect of improving the recognition accuracy of target objects.

[0106] As an optional approach, after performing a second-stage feature extraction on a second number of image patches to obtain the second image features corresponding to each image patch, the method further includes:

[0107] Perform the following steps until the number of key image features stored according to the target object type is greater than or equal to a quantity threshold:

[0108] S6-1, Determine the current image features from the second image features corresponding to each image block;

[0109] S6-2, calculate, for each second image feature other than the current image feature, its fourth distance to the current image feature in the feature space, wherein distance in the feature space is inversely related to the similarity between features;

[0110] S6-3, determine the second image feature with the largest fourth distance as a key image feature;

[0111] S6-4, Store the obtained key image features according to the target object type;

[0112] S6-5, if the number of key image features obtained is less than the number threshold, determine the next image feature from the second image features corresponding to each image block, and use the next image feature as the current image feature.
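A literal reading of steps S6-1 to S6-5 can be sketched as follows, assuming Euclidean distance and taking the current image feature in block order. Two details are assumptions added so that the loop is well defined: key features are kept distinct, and the loop also stops once every block has been used as the current feature. Storage "according to the target object type" is left to the caller, and the function name is hypothetical.

```python
import numpy as np

def select_key_features(second_feats: np.ndarray, quantity_threshold: int) -> list:
    """Literal sketch of S6-1..S6-5; returns indices of key image features."""
    key_indices = []
    for current in range(len(second_feats)):  # S6-1 / S6-5: current feature in block order
        # S6-2: fourth distance from every other feature to the current feature.
        d = np.linalg.norm(second_feats - second_feats[current], axis=1)
        d[current] = -np.inf  # exclude the current feature itself
        candidate = int(d.argmax())  # S6-3: the farthest feature becomes a key feature
        if candidate not in key_indices:  # assumption: keep key features distinct
            key_indices.append(candidate)  # S6-4: store (per-type storage done by caller)
        if len(key_indices) >= quantity_threshold:
            break
    return key_indices

feats = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
keys = select_key_features(feats, quantity_threshold=2)
```

Because each key feature is the one farthest from some current feature, the stored set tends to cover diverse appearances of the object type, which matches the stated aim of describing the type more comprehensively.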

[0113] Optionally, in this embodiment, key image features are used to update the standard visual features belonging to the target object type. For example, when second-stage feature extraction is performed on the second number of image blocks to obtain the second image features corresponding to each image block, the standard visual features belonging to the target object type are the standard visual features of the second image. After the key image features are obtained, they can be used to optimize the standard visual features of the second image so that these more closely reflect the true representation of the target object type, thereby updating the standard visual features of that type.

[0114] It should be noted that this embodiment allows for the gradual extraction of key image features most relevant to the target object type. These key features are not only highly correlated with the target object type but also exhibit significant differences from one another, enabling the pre-storage of some key characteristics that more comprehensively describe the target object type. Once a sufficient number of key image features are obtained, these features can be used for subsequent classification, recognition, or reconstruction tasks, balancing the accuracy and efficiency of anomaly detection.

[0115] To further illustrate, consider an optional assumption: an image containing multiple fruits, with the target object being apples. After extracting second image features from multiple image patches in the second stage, the iterative process begins. First, a feature from one image patch is selected as the current image feature. Then, the fourth distance between this feature and the features of all other image patches is calculated. If a feature from one image patch is found to have the largest difference from the current image feature, this feature is identified as a key image feature and stored. If the number of stored key image features is insufficient, the process continues by selecting the feature from the next image patch as the current image feature and repeating the above process. The repetition ends when the number of stored key image features is sufficient.

[0116] The embodiments provided in this application perform the following steps until the number of key image features stored according to the target object type is greater than or equal to a quantity threshold: determining the current image feature from the second image features corresponding to each image block; calculating, for each second image feature other than the current image feature, its fourth distance to the current image feature in the feature space, wherein distance in the feature space is inversely related to the similarity between features; determining the second image feature with the largest fourth distance as a key image feature; storing the obtained key image features according to the target object type; and, when the number of obtained key image features is less than the quantity threshold, determining the next image feature from the second image features corresponding to each image block and using it as the current image feature. This pre-stores key characteristics that describe the target object type more comprehensively, achieving the technical effect of balancing the accuracy and efficiency of anomaly recognition.

[0117] As an optional approach, after performing a first-stage feature extraction on the first image to obtain at least one basic visual feature, the method further includes:

[0118] S7-1, when the feature similarity between every standard visual feature in the at least one standard visual feature and the basic visual feature is less than the preset threshold, set the object type to which the object belongs as an additional object type, wherein the additional object type is a type other than the nominal object types to which the multiple sample images belong;

[0119] S7-2, capture standard objects of the additional object type to obtain multiple third images, and use the third images as sample images.

[0120] Optionally, in this embodiment, in the field of image recognition and processing, a database (multiple sample images) containing various known object types is typically constructed. This database is used to train algorithms to recognize these known types. However, in practical applications, the system may encounter objects not in this database, requiring a mechanism to handle these unknown, additional object types. In other words, these additional object types are any object types that do not belong to the known or predefined nominal object types.

[0121] To further illustrate, consider an optional assumption: a recognition system capable of identifying two object types, "cat" and "dog." When the system encounters an image containing a "rabbit," it extracts basic visual features and compares them with the standard visual features of both "cat" and "dog." If the similarity to both is below the preset threshold, the system classifies the object as an additional object type, namely "rabbit." Next, the system acquires images of a standard "rabbit" object (the third images) and adds them to the sample images.
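The matching and registration logic of S7-1 and S7-2 can be sketched as follows, assuming cosine similarity between basic visual features and a library that maps each object type to an array of standard visual features. The function names, the library layout, and the threshold value 0.9 are hypothetical.

```python
import numpy as np

def match_object_type(basic_feat, library, threshold):
    """Return the best-matching type, or None if every similarity is below threshold (S7-1)."""
    best_type, best_sim = None, -np.inf
    for obj_type, std_feats in library.items():
        sims = std_feats @ basic_feat / (
            np.linalg.norm(std_feats, axis=1) * np.linalg.norm(basic_feat))
        if sims.max() > best_sim:
            best_type, best_sim = obj_type, float(sims.max())
    return best_type if best_sim >= threshold else None

def register_additional_type(library, new_type, third_image_feats):
    """S7-2: features of the third images become standard features of the new type."""
    library[new_type] = np.asarray(third_image_feats, dtype=float)

library = {"cat": np.array([[1.0, 0.0, 0.0]]), "dog": np.array([[0.0, 1.0, 0.0]])}
rabbit_feat = np.array([0.0, 0.0, 1.0])
if match_object_type(rabbit_feat, library, threshold=0.9) is None:
    register_additional_type(library, "rabbit", [[0.0, 0.0, 1.0]])
```

After registration, later images of the same kind match the new type instead of falling below the threshold.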

[0122] It should be noted that the additional steps enhance the anomaly detection capability, enabling it to handle a wider variety of objects. By dynamically adding newly identified object types to the sample image library, it can gradually adapt to and recognize a broader range of scenes and objects. Furthermore, this embodiment can improve the flexibility of anomaly detection, allowing it to better cope with various challenges and changes in practical applications.

[0123] Through the embodiments provided in this application, when the feature similarity between every standard visual feature in the at least one standard visual feature and the basic visual feature is less than a preset threshold, the object type to which the object belongs is set as an additional object type, wherein the additional object type is a type other than the nominal object types to which the multiple sample images belong; standard objects of the additional object type are captured to obtain multiple third images, and the third images are used as sample images. These additional steps enhance anomaly recognition so that it can handle more types of objects, achieving the technical effect of improving the flexibility of anomaly recognition.

[0124] As an optional solution, and for ease of understanding, the above anomaly identification method is applied to a continuous detection scenario, as shown in Figure 4, in order to locate abnormal products in that scenario.

[0125] Optionally, in this embodiment, as shown in Figure 5, the above anomaly recognition method comprises a training process and a testing process, which are executed by the Continual Prompting Module (CPM) and the Structure-based Contrastive Learning (SCL) module.

[0126] Specifically, the Continual Prompting Module (CPM) is divided into three parts: a Key, used for one-to-one retrieval from an image to its module; a Prompt, used to extract category-specific features; and a Knowledge part, which stores category-specific features.

[0127] The Structure-based Contrastive Learning (SCL) module aims to distill the knowledge of SAM (Segment Anything Model) into the backbone ViT. By training the Prompt, the feature representation that ViT produces for a specific item can be modified. The Prompt is a small set of additional parameters that can be superimposed directly on the features of each ViT layer, enabling fine-tuning of the model.

[0128] Optionally, in the training process, features are first extracted from the training images using ViT and stored in the Key for use during testing. Next, the images are processed by SAM to obtain segmentation maps, in which each region has consistent second visual features and a corresponding label. These labels are then transferred to the feature maps extracted by ViT, so that contrastive learning can be performed on the basis of the SAM segmentation maps.
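One way to transfer the SAM region labels to the ViT feature map, as described in [0128], is a per-patch majority vote. The sketch below assumes the segmentation map is a grid of integer region ids whose height and width are divisible by the ViT patch size; the majority-vote rule and the function name are assumptions, not details given in the application.

```python
from collections import Counter
from typing import List

def patch_labels_from_segmentation(seg_map: List[List[int]], patch: int) -> List[List[int]]:
    """Assign each ViT patch the region id that covers most of its pixels."""
    h, w = len(seg_map), len(seg_map[0])
    if h % patch or w % patch:
        raise ValueError("segmentation map size must be divisible by the patch size")
    labels = []
    for top in range(0, h, patch):
        row = []
        for left in range(0, w, patch):
            votes = Counter(
                seg_map[y][x]
                for y in range(top, top + patch)
                for x in range(left, left + patch)
            )
            # most_common(1) returns [(region_id, count)] for the majority region.
            row.append(votes.most_common(1)[0][0])
        labels.append(row)
    return labels
```

The resulting label grid has one entry per patch feature, which is the form a label-consistency mask needs.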

[0129] After the features extracted by ViT and the labels obtained from SAM are available, contrastive learning is performed on them and the gradients are backpropagated. The contrastive loss function shown in Figure 5 is used for loss calculation during contrastive learning. First, the feature map is normalized. Next, the similarity matrix of the patch features is calculated. Finally, a mask is created from label consistency and used to calculate the loss, so that features with the same label are pulled close to each other while features with different labels are pushed apart. The definition is as follows:
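The three steps of [0129] (normalize, compute the patch similarity matrix, mask by label consistency) can be illustrated with a supervised-contrastive-style loss. This is a minimal sketch assuming a temperature-scaled cosine similarity and a SupCon-style log-softmax averaged over positives; the exact loss defined in the application may differ, and the temperature value and function names are hypothetical.

```python
import math
from typing import List, Sequence

def l2_normalize(v: Sequence[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

def label_mask_contrastive_loss(features: List[List[float]], labels: List[int],
                                temperature: float = 0.1) -> float:
    """Patches sharing a label are positives (pulled together); all other
    patches act as negatives (pushed apart)."""
    z = [l2_normalize(f) for f in features]                    # step 1: normalize
    n = len(z)
    sim = [[sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)] for i in range(n)]              # step 2: similarity matrix
    total, anchors = 0.0, 0
    for i in range(n):
        # step 3: label-consistency mask (same label, anchor itself excluded)
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # log-sum-exp over all non-anchor patches, stabilized by the row max
        others = [sim[i][k] for k in range(n) if k != i]
        m = max(others)
        log_denom = m + math.log(sum(math.exp(s - m) for s in others))
        total += -sum(sim[i][j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

When patches with the same label already have similar features, the loss is near zero; mismatched labels raise it, which produces the gradients used to update the Prompt.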

[0130] Furthermore, after 50 epochs of training, the Prompt is optimized and, combined with the features extracted by ViT, better represents the specific second visual features of different items. The model then extracts features from all images of each category, uses the farthest point sampling (FPS) algorithm to select the most representative features, and stores them in the Knowledge part of the CPM.
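Farthest point sampling, used in [0130] to pick representative features for the Knowledge part, is a greedy procedure: repeatedly add the feature whose distance to the already-selected set is largest. A minimal sketch, assuming Euclidean distance and an arbitrary first pick (index 0); the function names and the start index are hypothetical.

```python
import math
from typing import List

def euclidean(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def farthest_point_sampling(features: List[List[float]], k: int, start: int = 0) -> List[int]:
    """Greedy FPS: indices of k features spread out over feature space."""
    if not features or k <= 0:
        return []
    k = min(k, len(features))
    selected = [start]
    # Distance from every feature to its nearest already-selected feature.
    min_dist = [euclidean(f, features[start]) for f in features]
    while len(selected) < k:
        nxt = max(range(len(features)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, f in enumerate(features):
            min_dist[i] = min(min_dist[i], euclidean(f, features[nxt]))
    return selected
```

Near-duplicate features (such as indices 0 and 1 in `[[0, 0], [0.1, 0], [5, 5], [0, 5], [5, 0]]`) are picked last, which is why FPS keeps the stored Knowledge compact while still covering the feature distribution.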

[0131] In the testing process, features are first extracted from the input image (Inference Image in Figure 5) using ViT. These features are compared one by one with the features in the Key of each CPM to determine the product category of the input image and its corresponding CPM. Once that CPM is found, the input image is processed by ViT again, this time with the Prompt of the corresponding CPM added, to extract category-specific features. These features are then compared with the Knowledge in the corresponding CPM to calculate the anomaly score of each patch. Retrieving the CPM by comparing features with its Key and scoring by comparing features with its Knowledge use the same method. In this way, the method can accurately identify and locate abnormal regions in the input image, providing an important basis for subsequent anomaly handling and analysis.
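The Key retrieval step of [0131] can be sketched as choosing the category whose Key memory lies closest to the input image's patch features. Since [0131] says retrieval and Knowledge comparison use the same method, this sketch assumes a PatchCore-style image-level distance (the largest nearest-neighbour distance from any test patch feature to the memory) with Euclidean distance; the category names and function names are hypothetical.

```python
import math
from typing import Dict, List

Vec = List[float]

def euclidean(a: Vec, b: Vec) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def image_to_memory_distance(test_feats: List[Vec], memory: List[Vec]) -> float:
    """Largest nearest-neighbour distance from any test feature to the memory."""
    return max(min(euclidean(t, m) for m in memory) for t in test_feats)

def retrieve_cpm(test_feats: List[Vec], keys: Dict[str, List[Vec]]) -> str:
    """Return the category whose Key memory is closest to the image's features."""
    return min(keys, key=lambda cat: image_to_memory_distance(test_feats, keys[cat]))
```

For example, with hypothetical Keys `{"bottle": [[0, 0], [0, 1]], "screw": [[10, 10], [10, 11]]}`, features near the origin retrieve the "bottle" CPM, whose Prompt and Knowledge are then used for the second pass.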

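[0132] Suppose that $x_{\mathrm{test}}$ denotes all the features of an image and $m_{\mathrm{test}}$ is any feature in $x_{\mathrm{test}}$. Let K be the Key or Knowledge to be retrieved from, and let m be any feature in K. The goal of retrieval is the feature in K closest to $m_{\mathrm{test}}$, which, following PatchCore, is obtained together with the anomaly score through formulas (1), (2), and (3):

$$ m_{\mathrm{test},*},\ m_* = \arg\max_{m_{\mathrm{test}} \in x_{\mathrm{test}}} \ \arg\min_{m \in K} \left\| m_{\mathrm{test}} - m \right\|_2 \tag{1} $$

$$ s^* = \left\| m_{\mathrm{test},*} - m_* \right\|_2 \tag{2} $$

$$ s = \left( 1 - \frac{\exp \left\| m_{\mathrm{test},*} - m_* \right\|_2}{\sum_{m \in \mathcal{N}(m_*)} \exp \left\| m_{\mathrm{test},*} - m \right\|_2} \right) \cdot s^* \tag{3} $$

where $\mathcal{N}(m_*)$ denotes the nearest neighbors of $m_*$ in K.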

[0133] For any $x_{\mathrm{test}}$, its anomaly score with respect to K can be calculated using formulas (1), (2), and (3) above. Formula (1) finds the representative pair $m_{\mathrm{test},*}$ and $m_*$: $m_{\mathrm{test},*}$ is the feature in $x_{\mathrm{test}}$ that is farthest from K (that is, whose distance to its nearest feature in K is largest), and $m_*$ is the feature in K closest to $m_{\mathrm{test},*}$. As shown in formula (2), the basic anomaly score $s^*$ is the distance between the two. Formula (3) further uses the distances between $m_{\mathrm{test},*}$ and the neighbors of $m_*$ in K to make the score s more robust.
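Formulas (1) to (3) can be transcribed directly into code. A minimal sketch, assuming Euclidean distance and that the neighbourhood of m_* is its b nearest features in K, including m_* itself; the neighbourhood size b = 3 and the function names are hypothetical choices, not values from the application.

```python
import math
from typing import List, Tuple

Vec = List[float]

def euclidean(a: Vec, b: Vec) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def patchcore_score(x_test: List[Vec], K: List[Vec], b: int = 3) -> Tuple[float, float]:
    """Return (s_star, s) for one image against memory K, following formulas (1)-(3)."""
    # Formula (1): nearest memory feature for every test feature, then keep the
    # test feature whose nearest-neighbour distance is largest.
    best = None
    for m_test in x_test:
        m_near = min(K, key=lambda m: euclidean(m_test, m))
        d = euclidean(m_test, m_near)
        if best is None or d > best[0]:
            best = (d, m_test, m_near)
    s_star, m_test_star, m_star = best   # formula (2): s* = ||m_test,* - m*||_2
    # Formula (3): the b nearest neighbours of m* in K (m* itself comes first).
    neighbours = sorted(K, key=lambda m: euclidean(m_star, m))[:b]
    dists = [euclidean(m_test_star, m) for m in neighbours]
    top = max(dists)                      # subtract the max exponent for stability
    denom = sum(math.exp(d - top) for d in dists)
    weight = 1.0 - math.exp(s_star - top) / denom
    return s_star, weight * s_star
```

The weight in formula (3) lies between 0 and 1: it shrinks the score when m_test,* is about as close to the other neighbours of m_* as to m_* itself, which is what makes s less sensitive to a single outlying memory feature.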

[0134] Category-specific features are extracted and compared with the Knowledge in the CPM, which determines the anomaly score of each patch in the test image and thereby completes the calculation of the overall anomaly score of the test image. To obtain a more detailed anomaly distribution, the test image is divided into different patches and the score of each patch is calculated in the same way, except that only the features of the patch being scored are considered.

[0135] In this way, an anomaly score is obtained for each patch of the test image. To obtain a continuous and more intuitive anomaly distribution map, these discrete patch scores are further processed with Gaussian smoothing, which makes the anomaly scores vary continuously and gradually across the image and produces the final anomaly score map. This map not only shows clearly which regions of the image contain anomalies, but also reflects their severity through the gradual change in scores, providing an intuitive tool for subsequent anomaly processing and analysis.
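The post-processing in [0135] can be sketched as nearest-neighbour upsampling of the patch-score grid to pixel resolution followed by a separable Gaussian blur. This assumes edge replication at the borders, a kernel truncated at three standard deviations, and sigma = 4 as a default; all of these, and the function names, are assumptions rather than parameters stated in the application.

```python
import math
from typing import List

Grid = List[List[float]]

def gaussian_kernel(sigma: float) -> List[float]:
    """Normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma))
    k = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]

def upsample_nearest(grid: Grid, factor: int) -> Grid:
    """Repeat every patch score factor x factor times."""
    return [[v for v in row for _ in range(factor)] for row in grid for _ in range(factor)]

def blur_1d(values: List[float], kernel: List[float]) -> List[float]:
    r = len(kernel) // 2
    n = len(values)
    # Indices outside the signal are clamped to the border (edge replication).
    return [sum(kernel[j] * values[min(max(i + j - r, 0), n - 1)] for j in range(len(kernel)))
            for i in range(n)]

def gaussian_smooth(grid: Grid, sigma: float) -> Grid:
    kernel = gaussian_kernel(sigma)
    rows = [blur_1d(row, kernel) for row in grid]                # horizontal pass
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]    # vertical pass
    return [list(row) for row in zip(*cols)]

def anomaly_score_map(patch_scores: Grid, patch_size: int, sigma: float = 4.0) -> Grid:
    """Turn a grid of per-patch scores into a smooth pixel-level score map."""
    return gaussian_smooth(upsample_nearest(patch_scores, patch_size), sigma)
```

Because the kernel is normalized, a uniform score grid stays uniform, while an isolated high patch score spreads into a smooth peak that still marks the anomalous region.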

[0136] The anomaly identification method provided by the embodiments of this application is also flexible. Specifically, the algorithm used to calculate the anomaly score map is not limited to the Gaussian smoothing mentioned above; it can be replaced with other suitable algorithms according to actual needs or application scenarios. This allows the method to adapt to different data distributions and anomaly types, improving its practicality and accuracy.

[0137] In addition to the choice of anomaly score map algorithm, the training process in this embodiment can also be adjusted. For example, the contrastive learning loss function can be modified or replaced to suit different training objectives and data characteristics, and the contrastive learning method used during training can likewise be adjusted as needed, leaving room for further optimization of model performance.

[0138] Furthermore, this embodiment uses the Prompt to enhance the model's ability to extract category-relevant information without altering the backbone network structure, which effectively improves the model's representational power and accuracy. As an alternative, the embodiment also allows an adaptation module to be added after the backbone network. Such an adaptation module can achieve an effect similar to the Prompt, enhancing the model's sensitivity to specific information by fine-tuning or extending the network. Either way, model performance can be optimized without changing the core structure of the model, reducing the complexity and cost of model tuning.

[0139] It is understood that in the specific embodiments of this application, data such as user information are involved. When the above embodiments of this application are applied to specific products or technologies, user permission or consent is required, and the collection, use and processing of related data must comply with the relevant laws, regulations and standards of the relevant countries and regions.

[0140] It should be noted that, for the sake of simplicity, the foregoing method embodiments are all described as a series of actions. However, those skilled in the art should understand that this application is not limited to the described order of actions, as some steps may be performed in other orders or simultaneously according to this application. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and the actions and modules involved are not necessarily essential to this application.

[0141] According to another aspect of the embodiments of this application, an anomaly identification apparatus for implementing the above-described anomaly identification method is also provided. As shown in FIG6, the apparatus includes:

[0142] The first acquisition unit 602 is used to acquire a first image to be identified, wherein the first image is an image captured of an object;

[0143] The first extraction unit 604 is used to perform a first-stage feature extraction on the first image to obtain at least one basic visual feature, wherein the basic visual feature is used to represent the visual attributes presented by the object surface, and the object surface includes the surface of the object.

[0144] The first determining unit 606 is used to determine, from at least one standard visual feature, a first visual feature whose feature similarity to the basic visual feature is greater than or equal to a preset threshold. The standard visual features are obtained by feature extraction from multiple sample images belonging to multiple object types, and the sample image corresponding to the first visual feature is a second image belonging to the target object type.

[0145] The second acquisition unit 608 is used to acquire target guidance information matching the first visual feature, wherein the target guidance information is used to guide attention to features in the image that are related to the target object type.

[0146] The second extraction unit 610 is used to perform a second-stage feature extraction on the first image using target guidance information to obtain at least one second visual feature related to the target object type.

[0147] The first comparison unit 612 is used to compare at least one second visual feature with the standard visual features of the second image to obtain a comparison result.

[0148] The second determining unit 614 is used to determine the anomaly recognition result of the first image based on the comparison result.

[0149] For specific implementation examples, please refer to the examples shown in the above anomaly identification method; these examples will not be repeated here.

[0150] As an optional solution, the device also includes:

[0151] The segmentation unit is used to perform region segmentation on the second image, before the target guidance information matching the target object type is obtained, to obtain a first region and a second region, wherein the feature similarity between the second visual features within the first region, or within the second region, is greater than or equal to a first threshold.

[0152] The third extraction unit is used to guide, before the target guidance information matching the target object type is obtained, the second-stage feature extraction on an image belonging to the target object type, and to extract a first sample feature and a second sample feature, wherein the first sample feature has a first feature attribute, the second sample feature has a second feature attribute, the first feature attribute is a feature attribute of the first region, and the second feature attribute is a feature attribute of the second region.

[0153] The second comparison unit is used to compare the first sample features and the second visual features in the first region before obtaining the target guidance information matching the target object type, to obtain a first result, and to compare the second sample features and the second visual features in the second region, to obtain a second result.

[0154] The third determining unit is used to adjust the initial guidance information according to the first result and the second result before obtaining the target guidance information that matches the target object type, so as to obtain the target guidance information.

[0155] For specific implementation examples, please refer to the examples shown in the above anomaly identification method; these examples will not be repeated here.

[0156] As an optional solution, when the at least one second visual feature is multiple second visual features, the first comparison unit 612 includes:

[0157] The first acquisition module is used to acquire the first distance between multiple second visual features and the standard visual features of the second image in the feature space, wherein the distance in the feature space is inversely related to the similarity between the features;

[0158] The first determining module is used to determine, according to the first distances, a first feature that is farthest away among the multiple second visual features and a second feature that is closest to the first feature;

[0159] The second acquisition module is used to acquire the second distance between the first feature and the second feature in the feature space;

[0160] The third acquisition module is used to obtain the comparison results based on the second distance.

[0161] For specific implementation examples, please refer to the examples shown in the above anomaly identification method; these examples will not be repeated here.

[0162] As an optional solution, the third acquisition module includes:

[0163] A calculation submodule is used to calculate the third distance between the third feature and the first feature in the feature space, wherein the third feature is a second visual feature that is adjacent to the second feature in the feature space;

[0164] The optimization submodule is used to robustly optimize the second distance using the third distance to obtain the target distance, wherein the comparison result includes the target distance;

[0165] The second determining unit 614 includes:

[0166] The second determining module is used to determine that the anomaly identification result is that the first image is in an abnormal state when the target distance is greater than or equal to a preset distance threshold.

[0167] The third determining module is used to determine that the anomaly identification result is that the first image is in a normal state when the target distance is less than a preset distance threshold.
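The decision rule of [0166] and [0167] reduces to a threshold comparison on the target distance. A minimal sketch; the `localize` helper, which applies the same threshold to each patch score to list abnormal patches, is a hypothetical extension and not a unit described in the application.

```python
from typing import List, Tuple

def decide(target_distance: float, distance_threshold: float) -> str:
    """Abnormal when the target distance reaches the preset threshold, else normal."""
    return "abnormal" if target_distance >= distance_threshold else "normal"

def localize(score_map: List[List[float]], distance_threshold: float) -> List[Tuple[int, int]]:
    """Hypothetical extension: (row, col) of patches whose score reaches the threshold."""
    return [(r, c)
            for r, row in enumerate(score_map)
            for c, v in enumerate(row)
            if v >= distance_threshold]
```

Note that a distance exactly equal to the threshold is classified as abnormal, matching the "greater than or equal to" wording.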

[0168] For specific implementation examples, please refer to the examples shown in the above anomaly identification method; these examples will not be repeated here.

[0169] As an optional approach, when the at least one basic visual feature is a first number of basic visual features, the first extraction unit 604 includes:

[0170] The first segmentation module is used to segment the first image into a first number of image blocks, wherein the image blocks are used to represent local image regions of the first image;

[0171] The first extraction module is used to perform a first-stage feature extraction on a first number of image blocks to obtain the first image features corresponding to each image block as basic visual features.

[0172] When the at least one second visual feature is a second number of second visual features, the second extraction unit 610 includes:

[0173] The second segmentation module is used to segment the first image into a second number of image blocks, wherein the image blocks are used to represent local image regions of the first image;

[0174] The second extraction module is used to perform a second-stage feature extraction on a second number of image blocks to obtain the second image features corresponding to each image block as the second visual features.
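The block segmentation performed by the first and second segmentation modules can be sketched as splitting the image into non-overlapping square blocks in raster order, the form a ViT patch-embedding layer expects. This assumes a single-channel image whose height and width are divisible by the block size; the function name and block size are hypothetical.

```python
from typing import List

def split_into_blocks(image: List[List[float]], block: int) -> List[List[float]]:
    """Split an H x W image into non-overlapping block x block image blocks
    (raster order), flattening each block into one vector."""
    h, w = len(image), len(image[0])
    if h % block or w % block:
        raise ValueError("image size must be divisible by the block size")
    blocks = []
    for top in range(0, h, block):
        for left in range(0, w, block):
            blocks.append([image[y][x]
                           for y in range(top, top + block)
                           for x in range(left, left + block)])
    return blocks
```

Under this sketch the "first number" or "second number" of image blocks is (H / block) × (W / block); using a different block size in the two stages yields different numbers of blocks.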

[0175] For specific implementation examples, please refer to the examples shown in the above anomaly identification method; these examples will not be repeated here.

[0176] As an optional solution, the device also includes:

[0177] The execution module is used to, after the second-stage feature extraction is performed on the second number of image blocks to obtain the second image feature corresponding to each image block, perform the following steps until the number of key image features stored for the target object type is greater than or equal to the number threshold:

[0178] The current image features are determined from the second image features corresponding to each image block;

[0179] Calculate the fourth distance, in the feature space, between the current image feature and each of the second image features corresponding to the image blocks other than the current image feature, wherein the distance in the feature space is inversely related to the similarity between features;

[0180] The second image feature with the largest fourth distance is identified as the key image feature;

[0181] If the number of obtained key image features is greater than or equal to the number threshold, the obtained key image features are stored according to the target object type.

[0182] If the number of key image features obtained is less than the number threshold, the next image feature is determined from the second image features corresponding to each image block, and the next image feature is used as the current image feature.

[0183] For specific implementation examples, please refer to the examples shown in the above anomaly identification method; these examples will not be repeated here.

[0184] As an optional solution, the device also includes:

[0185] The setting unit is used to, after the first-stage feature extraction on the first image yields at least one basic visual feature, set the object type to which the object belongs as an additional object type when the feature similarity between the basic visual feature and every one of the at least one standard visual feature is less than the preset threshold, wherein the additional object type is a type other than the rated object types to which the multiple sample images belong.

[0186] The acquisition unit is used to, after the first-stage feature extraction on the first image yields at least one basic visual feature, capture standard objects of the additional object type to obtain multiple third images, and use the third images as sample images.

[0187] For specific implementation examples, please refer to the examples shown in the above anomaly identification method; these examples will not be repeated here.

[0188] According to another aspect of the embodiments of this application, an electronic device for implementing the above-described anomaly identification method is also provided. The electronic device may be, but is not limited to, the user device 102 or the server 112 shown in FIG1. This embodiment takes the user device 102 as an example for illustration. Further, as shown in FIG7, the electronic device includes a memory 702 and a processor 704. The memory 702 stores a computer program, and the processor 704 is configured to execute the steps in any of the above-described method embodiments through the computer program.

[0189] Optionally, in this embodiment, the aforementioned electronic device may be located in at least one of a plurality of network devices in a computer network.

[0190] Optionally, in this embodiment, the processor can be configured to perform the following steps via a computer program:

[0191] S1, acquire the first image to be identified, wherein the first image is an image captured of an object;

[0192] S2, perform a first-stage feature extraction on the first image to obtain at least one basic visual feature, wherein the basic visual feature is used to represent the visual attributes presented by the object surface, and the object surface includes the surface of the object.

[0193] S3, determine, from at least one standard visual feature, a first visual feature whose feature similarity to the basic visual feature is greater than or equal to a preset threshold, wherein the standard visual feature is a feature obtained by feature extraction from multiple sample images belonging to multiple object types, and the sample image corresponding to the first visual feature is a second image belonging to the target object type.

[0194] S4, obtain target guidance information matching the first visual feature, wherein the target guidance information is used to guide attention to features in the image related to the target object type;

[0195] S5, using target guidance information, perform a second-stage feature extraction on the first image to obtain at least one second visual feature related to the target object type;

[0196] S6, compare at least one second visual feature with the standard visual features of the second image to obtain the comparison result;

[0197] S7, determine the anomaly identification result of the first image based on the comparison result.

[0198] Optionally, those skilled in the art will understand that the structure shown in FIG7 is merely illustrative and does not limit the structure of the electronic device described above. For example, the electronic device may also include more or fewer components (such as network interfaces) than shown in FIG7, or have a different configuration than shown in FIG7.

[0199] The memory 702 can be used to store software programs and modules, such as the program instructions/modules corresponding to the anomaly identification method and apparatus in this embodiment. The processor 704 executes various functional applications and data processing by running the software programs and modules stored in the memory 702, thereby implementing the aforementioned anomaly identification method. The memory 702 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory 702 may further include memory located remotely from the processor 704, which can be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof. Specifically, the memory 702 may be used to store, without limitation, information such as the first image, the second image, and the anomaly identification result. As an example, as shown in FIG7, the memory 702 may include, but is not limited to, the first acquisition unit 602, the first extraction unit 604, the first determination unit 606, the second acquisition unit 608, the second extraction unit 610, the first comparison unit 612, and the second determination unit 614 of the anomaly identification apparatus. It may further include, but is not limited to, other module units of the anomaly identification apparatus, which are not elaborated in this example.

[0200] Optionally, the transmission device 706 described above is used to receive or send data via a network. Specific examples of the network described above may include wired networks and wireless networks. In one example, the transmission device 706 includes a Network Interface Controller (NIC), which can be connected to other network devices and a router via a network cable to communicate with the Internet or a local area network. In another example, the transmission device 706 is a radio frequency (RF) module, used for wireless communication with the Internet.

[0201] In addition, the aforementioned electronic device also includes: a display 708 for displaying the first image, the second image, and information such as the anomaly recognition result; and a connection bus 710 for connecting the various module components in the aforementioned electronic device.

[0202] In other embodiments, the aforementioned user equipment or server can be a node in a distributed system, wherein the distributed system can be a blockchain system, which is a distributed system formed by connecting multiple nodes through network communication. The nodes can form a peer-to-peer network, and any form of computing device, such as a server, user equipment, or other electronic device, can become a node in the blockchain system by joining this peer-to-peer network.

[0203] According to one aspect of this application, a computer program product is provided, comprising a computer program / instructions containing program code for performing the methods shown in the flowchart. In such embodiments, the computer program can be downloaded and installed from a network via a communication component, and / or installed from a removable medium. When the computer program is executed by a central processing unit, it performs various functions provided in embodiments of this application.

[0204] The sequence numbers of the embodiments in this application are for descriptive purposes only and do not represent the superiority or inferiority of the embodiments.

[0205] It should be noted that the computer system of the electronic device is merely an example and should not impose any limitations on the functionality and scope of use of the embodiments of this application.

[0206] A computer system includes a Central Processing Unit (CPU), which performs various appropriate actions and processes based on programs stored in Read-Only Memory (ROM) or programs loaded from the storage section into Random Access Memory (RAM). ROM also stores various programs and data required for system operation. The CPU, ROM, and RAM are interconnected via a bus. Input/output interfaces (I/O interfaces) are also connected to the bus.

[0207] The following components are connected to the input / output interface: input sections including keyboards, mice, etc.; output sections including cathode ray tubes (CRTs), liquid crystal displays (LCDs), and speakers; storage sections including hard drives; and communication sections including network interface cards such as LAN cards and modems. The communication section performs communication processing via a network such as the Internet. Drives are also connected to the input / output interface as needed. Removable media, such as disks, optical discs, magneto-optical discs, semiconductor memories, etc., are installed on the drive as needed so that computer programs read from them can be installed into the storage section as required.

[0208] Specifically, according to embodiments of this application, the processes described in the various method flowcharts can be implemented as computer software programs. For example, embodiments of this application include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via a communication component, and / or installed from a removable medium. When the computer program is executed by a central processing unit, it performs various functions defined in the system of this application.

[0209] According to one aspect of this application, a computer-readable storage medium is provided, wherein a processor of a computer device reads a computer program from the computer-readable storage medium, and the processor executes the computer program, causing the computer device to perform the methods provided in the various alternative implementations described above.

[0210] Optionally, in this embodiment, the computer-readable storage medium described above may be configured to store a computer program for performing the following steps:

[0211] S1, acquire the first image to be identified, wherein the first image is an image captured of an object;

[0212] S2, perform a first-stage feature extraction on the first image to obtain at least one basic visual feature, wherein the basic visual feature is used to represent the visual attributes presented by the object surface, and the object surface includes the surface of the object.

[0213] S3, determine, from at least one standard visual feature, a first visual feature whose feature similarity to the basic visual feature is greater than or equal to a preset threshold, wherein the standard visual feature is a feature obtained by feature extraction from multiple sample images belonging to multiple object types, and the sample image corresponding to the first visual feature is a second image belonging to the target object type.

[0214] S4, obtain target guidance information matching the first visual feature, wherein the target guidance information is used to guide attention to features in the image related to the target object type;

[0215] S5, using target guidance information, perform a second-stage feature extraction on the first image to obtain at least one second visual feature related to the target object type;

[0216] S6, compare at least one second visual feature with the standard visual features of the second image to obtain the comparison result;

[0217] S7, determine the anomaly identification result of the first image based on the comparison result.

[0218] Optionally, in embodiments of this application, the terms "module" or "unit" refer to a computer program or part of a computer program that has a predetermined function and works with other related parts to achieve a predetermined goal, and can be implemented wholly or partially using software, hardware (such as processing circuitry or memory), or a combination thereof. Similarly, a processor (or multiple processors or memory) can be used to implement one or more modules or units. Furthermore, each module or unit can be part of an overall module or unit that includes the functionality of that module or unit.

[0219] Optionally, in this embodiment, those skilled in the art will understand that all or part of the steps in the various methods of the above embodiments can be implemented by a program instructing related hardware of an electronic device. The program can be stored in a computer-readable storage medium, which may include: flash drive, read-only memory (ROM), random access memory (RAM), disk or optical disk, etc.

[0220] The sequence numbers of the embodiments in this application are for descriptive purposes only and do not represent the superiority or inferiority of the embodiments.

[0221] If the integrated units in the above embodiments are implemented as software functional units and sold or used as independent products, they can be stored in the aforementioned computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause one or more computer devices (which may be personal computers, servers, or network devices, etc.) to execute all or part of the steps of the methods of the various embodiments of this application.

[0222] In the above embodiments of this application, the descriptions of each embodiment have different focuses. For parts not described in detail in a certain embodiment, please refer to the relevant descriptions of other embodiments.

[0223] In the several embodiments provided in this application, it should be understood that the disclosed user equipment can be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces, indirect coupling or communication connection between units or modules, and may be electrical or other forms.

[0224] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0225] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.

[0226] The above description is only a preferred embodiment of this application. It should be noted that for those skilled in the art, several improvements and modifications can be made without departing from the principle of this application, and these improvements and modifications should also be considered within the scope of protection of this application.

Claims

1. An anomaly detection method, the method being executed by an electronic device and comprising: acquiring a first image to be identified, wherein the first image is an image captured of an object; performing first-stage feature extraction on the first image to obtain at least one basic visual feature, wherein the basic visual feature represents visual attributes presented by an object surface, and the object surface includes the surface of the object; determining, from at least one standard visual feature, a first visual feature whose feature similarity to the basic visual feature is greater than or equal to a preset threshold, wherein the standard visual features are obtained by feature extraction from multiple sample images belonging to multiple object types, and the sample image corresponding to the first visual feature is a second image belonging to a target object type; obtaining target guidance information matching the first visual feature, wherein the target guidance information is used to direct attention to features in the image that are related to the target object type; performing, using the target guidance information, second-stage feature extraction on the first image to obtain at least one second visual feature related to the target object type; comparing the at least one second visual feature with the standard visual features of the second image to obtain a comparison result; and determining an anomaly detection result for the first image based on the comparison result.
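The flow of claim 1 can be sketched as follows. This is a minimal, hypothetical Python illustration under stated assumptions, not the applicant's implementation: features are plain float vectors, feature similarity is assumed to be cosine similarity, and the two extraction stages, the guidance information and the comparison are caller-supplied stand-ins (`first_stage`, `second_stage`, `guidance_by_type` and `score_fn` are illustrative names not taken from the application). The sketch also assumes that the standard features of the matched type stand in for the standard visual features of the second image and share a feature space with both extraction stages.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_target_type(
    basic_features: List[Vector],
    standard_features: Dict[str, List[Vector]],
    threshold: float,
) -> Optional[str]:
    """Object type of the standard feature most similar to any basic feature,
    or None if the best similarity is below the preset threshold."""
    best_type: Optional[str] = None
    best_sim = -math.inf
    for obj_type, features in standard_features.items():
        for std in features:
            for basic in basic_features:
                sim = cosine_similarity(basic, std)
                if sim > best_sim:
                    best_type, best_sim = obj_type, sim
    return best_type if best_sim >= threshold else None


def detect_anomaly(
    patches: List[Vector],
    first_stage: Callable[[Vector], Vector],
    second_stage: Callable[[Vector, Vector], Vector],
    standard_features: Dict[str, List[Vector]],
    guidance_by_type: Dict[str, Vector],
    score_fn: Callable[[List[Vector], List[Vector]], float],
    sim_threshold: float,
    score_threshold: float,
) -> Optional[str]:
    # Stage 1: type-agnostic basic visual features.
    basic = [first_stage(p) for p in patches]
    target_type = match_target_type(basic, standard_features, sim_threshold)
    if target_type is None:
        return None  # no known object type matched
    # Stage 2: re-extract features under the type-specific guidance (hypothetical form).
    guidance = guidance_by_type[target_type]
    second = [second_stage(p, guidance) for p in patches]
    # Compare with the standard features of the matched type.
    score = score_fn(second, standard_features[target_type])
    return "abnormal" if score >= score_threshold else "normal"
```

Returning `None` when no standard feature reaches the threshold leaves room for the additional-object-type handling described in claim 7.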

2. The method according to claim 1, further comprising, before obtaining the target guidance information matching the target object type: segmenting the second image to obtain a first region and a second region, wherein the feature similarity within the first region or within the second region is greater than or equal to a first threshold; performing, under the guidance of initial guidance information, the second-stage feature extraction on an image belonging to the target object type to extract first sample features and second sample features, wherein the first sample features have a first feature attribute and the second sample features have a second feature attribute, the first feature attribute being a feature attribute within the first region and the second feature attribute being a feature attribute within the second region; comparing the first sample features with the second visual features in the first region to obtain a first result, and comparing the second sample features with the second visual features in the second region to obtain a second result; and adjusting the initial guidance information based on the first result and the second result to obtain the target guidance information.

3. The method according to claim 1, wherein, when the at least one second visual feature comprises a plurality of second visual features, comparing the at least one second visual feature with the standard visual features of the second image to obtain a comparison result comprises: obtaining first distances, in a feature space, between each of the plurality of second visual features and the standard visual features of the second image, wherein a distance in the feature space is inversely related to the similarity between features; determining, from the plurality of second visual features, a first feature having the largest first distance, and a second feature that is closest to the first feature; obtaining a second distance between the first feature and the second feature in the feature space; and obtaining the comparison result based on the second distance.
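One possible reading of claim 3 is sketched below, under assumptions the claim does not fix: the first distance of each second visual feature is its Euclidean distance to the nearest standard visual feature, the first feature is the second visual feature with the largest first distance, and the second feature is taken to be the standard visual feature nearest to it. This nearest-neighbour reading resembles the max-of-min patch scoring used in PatchCore; that correspondence, and the names `nearest` and `second_distance`, are ours.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest(query: Vector, bank: List[Vector]) -> Tuple[int, float]:
    """Index of, and Euclidean distance to, the closest vector in bank."""
    best_i, best_d = -1, math.inf
    for i, candidate in enumerate(bank):
        d = math.dist(query, candidate)
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d


def second_distance(
    second_features: List[Vector], standard_features: List[Vector]
) -> Tuple[int, int, float]:
    """Return (first feature index, second feature index, second distance)."""
    # First distance: each second visual feature to its nearest standard feature.
    first = [nearest(f, standard_features) for f in second_features]
    # First feature: the second visual feature with the largest first distance.
    first_idx = max(range(len(first)), key=lambda i: first[i][1])
    # Second feature (assumed): the standard feature closest to the first feature.
    second_idx, dist = first[first_idx]
    return first_idx, second_idx, dist
```

Because distance is inversely related to similarity, a large second distance means that at least one local feature of the first image has no close counterpart among the standard features.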

4. The method according to claim 3, wherein obtaining the comparison result based on the second distance comprises: calculating a third distance, in the feature space, between a third feature and the first feature, wherein the third feature is a second visual feature adjacent to the second feature in the feature space; and performing robust optimization on the second distance using the third distance to obtain a target distance, wherein the comparison result includes the target distance; and determining the anomaly detection result of the first image based on the comparison result comprises: if the target distance is greater than or equal to a preset distance threshold, determining that the anomaly detection result is that the first image is in an abnormal state; and if the target distance is less than the preset distance threshold, determining that the anomaly detection result is that the first image is in a normal state.
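Claim 4 does not specify the robust optimization. The sketch below assumes the re-weighting used in PatchCore: the second distance is scaled by one minus a softmax weight taken over the distances from the first feature to the second feature and to the second feature's nearest neighbours (the third features). As a further assumption, the neighbours are drawn from the same feature bank that contains the second feature; `bank`, `second_index`, `num_neighbors` and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def robust_target_distance(
    first_feature: Vector,
    bank: List[Vector],
    second_index: int,
    num_neighbors: int,
) -> float:
    """Re-weight the second distance using the neighbourhood of the second feature."""
    second = bank[second_index]
    d2 = math.dist(first_feature, second)
    # Third features: the bank entries closest to the second feature (itself excluded).
    others = [b for i, b in enumerate(bank) if i != second_index]
    others.sort(key=lambda b: math.dist(b, second))
    d3 = [math.dist(first_feature, b) for b in others[:num_neighbors]]
    # Softmax weight of d2 among {d2} and the third distances, shifted by the max for stability.
    dists = [d2] + d3
    peak = max(dists)
    exps = [math.exp(d - peak) for d in dists]
    weight = 1.0 - exps[0] / sum(exps)
    return weight * d2


def classify(target_distance: float, distance_threshold: float) -> str:
    """Claim 4 decision rule: abnormal at or above the preset distance threshold."""
    return "abnormal" if target_distance >= distance_threshold else "normal"
```

Under this weighting, a second feature that sits in a dense neighbourhood (third distances close to the second distance) pulls the target distance down, while an isolated second feature leaves it nearly unchanged; this dampens spurious scores caused by sparse regions of the bank.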

5. The method according to claim 1, wherein, when the at least one basic visual feature comprises a first number of basic visual features, performing the first-stage feature extraction on the first image to obtain the at least one basic visual feature comprises: dividing the first image into the first number of image blocks, wherein each image block represents a local image region of the first image; and performing the first-stage feature extraction on the first number of image blocks to obtain, as the basic visual features, first image features corresponding to each image block; and when the at least one second visual feature comprises a second number of second visual features, performing, using the target guidance information, the second-stage feature extraction on the first image to obtain the at least one second visual feature related to the target object type comprises: dividing the first image into the second number of image blocks, wherein each image block represents a local image region of the first image; and performing, using the target guidance information, the second-stage feature extraction on the second number of image blocks to obtain, as the second visual features, second image features corresponding to each image block.
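The block division of claim 5 can be illustrated with a small helper that tiles a grayscale image, represented as a list of pixel rows, into equal non-overlapping blocks and applies a per-block extractor. Non-overlapping rectangular tiling, the divisibility requirement and the function names are assumptions for illustration; the claim only requires that each block represent a local image region, and the two stages may use different block counts (for example, 2 by 2 blocks in one stage and 4 by 4 in the other).

```python
from typing import Callable, List

Image = List[List[float]]


def split_into_blocks(image: Image, block_h: int, block_w: int) -> List[List[float]]:
    """Tile the image into non-overlapping blocks in row-major order, each flattened."""
    height, width = len(image), len(image[0])
    if height % block_h or width % block_w:
        raise ValueError("image size must be divisible by the block size")
    blocks = []
    for top in range(0, height, block_h):
        for left in range(0, width, block_w):
            blocks.append([
                image[r][c]
                for r in range(top, top + block_h)
                for c in range(left, left + block_w)
            ])
    return blocks


def extract_per_block(
    image: Image,
    block_h: int,
    block_w: int,
    extractor: Callable[[List[float]], List[float]],
) -> List[List[float]]:
    """One feature vector per image block, as required in both extraction stages."""
    return [extractor(block) for block in split_into_blocks(image, block_h, block_w)]
```

In a real system the `extractor` would be the first-stage or guided second-stage feature network; here any function from a flattened block to a vector can be plugged in.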

6. The method according to claim 5, wherein, after performing the second-stage feature extraction on the second number of image blocks to obtain the second image features corresponding to each image block as the second visual features, the method further comprises: performing the following steps until the number of key image features stored for the target object type is greater than or equal to a number threshold, wherein the key image features are used to update the standard visual features belonging to the target object type: determining a current image feature from the second image features corresponding to the image blocks; calculating fourth distances, in the feature space, between the current image feature and each of the other second image features, wherein a distance in the feature space is inversely related to the similarity between features; identifying the second image feature with the largest fourth distance as a key image feature; storing the obtained key image feature under the target object type; and if the number of obtained key image features is less than the number threshold, determining a next image feature from the second image features corresponding to the image blocks and using the next image feature as the current image feature.
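The loop in claim 6 resembles greedy farthest-point selection. The sketch below follows the claim step by step, with two assumptions the claim leaves open: the "next image feature" is taken to be the key feature just selected, and features already used are excluded so the loop cannot revisit them. Euclidean distance and the names `select_key_features` and `store_key_features` are illustrative.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def select_key_features(
    features: List[Vector], count_threshold: int, start_index: int = 0
) -> List[Vector]:
    """Repeatedly pick the remaining feature farthest from the current one."""
    if not features:
        return []
    selected: List[Vector] = []
    used = {start_index}
    current = start_index
    while len(selected) < count_threshold:
        candidates = [i for i in range(len(features)) if i not in used]
        if not candidates:
            break  # fewer features than the number threshold
        # Fourth distance: each remaining feature to the current feature.
        key = max(candidates, key=lambda i: math.dist(features[i], features[current]))
        selected.append(features[key])
        used.add(key)
        current = key  # assumption: the new key feature becomes the current one
    return selected


def store_key_features(
    bank: Dict[str, List[Vector]], object_type: str, keys: List[Vector]
) -> None:
    """Append key features to the standard features stored for the object type."""
    bank.setdefault(object_type, []).extend(keys)
```

Picking the farthest feature at each step favours features that are spread across the feature space, so a small number of key features can cover diverse normal appearances when updating the standard visual features.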

7. The method according to any one of claims 1 to 6, wherein, after performing the first-stage feature extraction on the first image to obtain the at least one basic visual feature, the method further comprises: if the feature similarity between each of the at least one standard visual feature and the basic visual feature is less than the preset threshold, setting the object type to which the object belongs as an additional object type, wherein the additional object type is a type other than the predetermined object types to which the plurality of sample images belong; and acquiring images of a standard object of the additional object type to obtain multiple third images, and using the third images as the sample images.
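The fallback of claim 7 can be sketched as follows, again assuming cosine similarity. When no standard visual feature reaches the preset threshold, the object is assigned a caller-chosen additional type and an empty entry is registered so that features extracted from the third images can be added later. Capturing the third images is outside the sketch, and `resolve_object_type` and `additional_type` are hypothetical names.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def resolve_object_type(
    basic_features: List[Vector],
    standard_features: Dict[str, List[Vector]],
    threshold: float,
    additional_type: str,
) -> Tuple[str, bool]:
    """Return (object type, whether new sample images must be acquired)."""
    best_type: Optional[str] = None
    best_sim = -math.inf
    for obj_type, feats in standard_features.items():
        for std in feats:
            for basic in basic_features:
                sim = cosine_similarity(basic, std)
                if sim > best_sim:
                    best_type, best_sim = obj_type, sim
    if best_type is not None and best_sim >= threshold:
        return best_type, False
    # Every similarity is below the threshold: treat the object as an additional type.
    standard_features.setdefault(additional_type, [])
    return additional_type, True
```

The returned flag signals that images of a standard object of the new type should be acquired and used as additional sample images.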

8. An anomaly detection apparatus, comprising: a first acquisition unit, configured to acquire a first image to be identified, wherein the first image is an image captured of an object; a first extraction unit, configured to perform first-stage feature extraction on the first image to obtain at least one basic visual feature, wherein the basic visual feature represents visual attributes presented by an object surface, and the object surface includes the surface of the object; a first determining unit, configured to determine, from at least one standard visual feature, a first visual feature whose feature similarity to the basic visual feature is greater than or equal to a preset threshold, wherein the standard visual features are obtained by feature extraction from multiple sample images belonging to multiple object types, and the sample image corresponding to the first visual feature is a second image belonging to a target object type; a second acquisition unit, configured to acquire target guidance information matching the first visual feature, wherein the target guidance information is used to direct attention to features in the image that are related to the target object type; a second extraction unit, configured to perform, using the target guidance information, second-stage feature extraction on the first image to obtain at least one second visual feature related to the target object type; a first comparison unit, configured to compare the at least one second visual feature with the standard visual features of the second image to obtain a comparison result; and a second determining unit, configured to determine an anomaly detection result for the first image based on the comparison result.

9. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by an electronic device, performs the method according to any one of claims 1 to 7.

10. A computer program product comprising a computer program that, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.

11. An electronic device comprising a memory and a processor, the memory storing a computer program, the processor being configured to perform the method of any one of claims 1 to 7 via the computer program.