Image processing methods, image model training methods, devices, media and equipment

By fusing and desensitizing foreground and background features of multiple target recognition images collected from the same area, the problem of low image privacy protection security in existing biometric technologies is solved, the attack cost for attackers is increased, and the protection strength of image privacy is enhanced.

CN116843911B · Active · Publication Date: 2026-06-30 · ALIPAY (HANGZHOU) INFORMATION TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
ALIPAY (HANGZHOU) INFORMATION TECH CO LTD
Filing Date
2023-05-18
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing biometric technologies provide low security for image privacy protection, which leaves them vulnerable to attack and makes it difficult to safeguard users' private information.

Method used

Multiple target recognition images of the same area are acquired, and their foreground and background features are extracted separately and then fused and desensitized to generate a fused desensitized image. An image fusion and desensitization model performs this joint desensitization across the images, which increases the attacker's attack cost.

Benefits of technology

This improves the privacy protection and security of biometric images, making it difficult for attackers to train anti-desensitization models and enhancing the strength of image privacy protection.

✦ Generated by Eureka AI based on patent content.

Figure CN116843911B_ABST

Abstract

This specification discloses an image processing method, an image model training method, an apparatus, a storage medium, and a device. The method includes: acquiring a first number of target recognition images, where each target recognition image is an image of a target object collected in a predetermined area; determining image features of each target recognition image in the first number of target recognition images, the image features including foreground features and background features; performing fusion and desensitization processing on the foreground features and background features of the first number of target recognition images respectively, generating corresponding foreground fusion desensitized images and background fusion desensitized images; and performing fusion processing on the foreground fusion desensitized images and background fusion desensitized images to generate a fusion desensitized image corresponding to the first number of target recognition images.

Description

Technical Field

[0001] This specification relates to the field of machine learning technology, and in particular to an image processing method, an image model training method, an apparatus, a storage medium, and a device.

Background Technology

[0002] Biometric technology provides users with an authentication method other than passwords and has been applied in many scenarios in recent years, such as facial recognition payment, fingerprint attendance, and iris recognition safes. While providing convenience to users, biometric technology requires the collection of users' biometric images, thus raising the issue of biometric information leakage.

[0003] In related technical solutions, data encryption technology is used to protect the privacy of users' biometric images. For example, the biometric image is encrypted during image transmission or storage, and then decrypted during the computation phase. However, this technical solution has low security and is easily cracked by attackers, making it difficult to guarantee the security of users' privacy information in most scenarios.

[0004] Therefore, increasing the cost of attacks for attackers and improving the security of privacy protection for biometric images have become urgent technical challenges.

Summary of the Invention

[0005] This specification provides an image processing method, an image model training method, an apparatus, a storage medium, and a device that can increase the attack cost for attackers and improve the security of image privacy protection.

[0006] Firstly, embodiments of this specification provide an image processing method, including:

[0007] Acquire a first number of target recognition images, wherein the target recognition images are images of target objects collected in a predetermined area;

[0008] Determine the image features of each target recognition image in the first number of target recognition images, the image features including foreground features and background features;

[0009] The foreground features and background features of the first number of target recognition images are respectively fused and desensitized to generate corresponding foreground fused desensitized images and background fused desensitized images;

[0010] The foreground fused desensitized image and the background fused desensitized image are fused to generate a fused desensitized image corresponding to the first number of target recognition images.

[0011] Secondly, embodiments of this specification provide an image model training method, wherein the image model includes an image fusion and desensitization model, and the method includes:

[0012] Acquire a first number of target recognition images, wherein the target recognition images are images of target objects collected in a predetermined area;

[0013] Determine the image features of each target recognition image in the first number of target recognition images, the image features including foreground features and background features;

[0014] The image features of the first number of target recognition images are input into the image fusion desensitization model to obtain the corresponding foreground fusion desensitization image and background fusion desensitization image;

[0015] The foreground fused desensitized image and the background fused desensitized image are fused to generate the fused desensitized image corresponding to the first number of target recognition images;

[0016] Based on the fused desensitized image and the first number of target recognition images, the model loss of the image model is determined, and the model loss includes the desensitization loss;

[0017] The image model is trained based on the model loss.

[0018] Thirdly, embodiments of this specification provide an image processing apparatus, including:

[0019] The image acquisition module is configured to acquire a first number of target recognition images, wherein the target recognition images are images of target objects collected in a predetermined area;

[0020] The feature determination module is configured to determine the image features of each target recognition image in the first number of target recognition images, the image features including foreground features and background features;

[0021] The foreground and background decoupling and desensitization module is configured to perform fusion and desensitization processing on the foreground features and background features of the first number of target recognition images respectively, and generate corresponding foreground fusion desensitization images and background fusion desensitization images;

[0022] The fusion processing module is configured to perform fusion processing on the foreground fused desensitized image and the background fused desensitized image to generate a fused desensitized image corresponding to the first number of target recognition images.

[0023] Fourthly, embodiments of this specification provide an image model training apparatus, wherein:

[0024] The image model includes an image fusion desensitization model, and the apparatus includes:

[0025] The image acquisition module is configured to acquire a first number of target recognition images, wherein the target recognition images are images of target objects collected in a predetermined area;

[0026] The feature determination module is configured to determine the image features of each target recognition image in the first number of target recognition images, the image features including foreground features and background features;

[0027] The foreground and background decoupling and desensitization module is configured to input the image features of the first number of target recognition images into the image fusion and desensitization model to obtain the corresponding foreground fusion desensitization image and background fusion desensitization image;

[0028] The fusion processing module is configured to perform fusion processing on the foreground fused desensitized image and the background fused desensitized image to generate a fused desensitized image corresponding to the first number of target recognition images;

[0029] The model loss determination module is configured to determine the model loss of the image model based on the fused desensitized image and the first number of target recognition images, wherein the model loss includes the desensitization loss;

[0030] The model training module is configured to train the image model based on the model loss.

[0031] Fifthly, embodiments of this specification provide a computer storage medium storing a plurality of instructions adapted to be loaded by a processor to execute the steps of the method described above.

[0032] Sixthly, embodiments of this specification provide a computer program product containing instructions that, when run on a computer or processor, cause the computer or processor to perform the steps of the method described above.

[0033] Seventhly, embodiments of this specification provide an electronic device, including a processor and a memory, wherein the memory stores a computer program adapted to be loaded by the processor to execute the steps of the method described above.

[0034] According to the technical solution of the embodiments of this specification, multiple target recognition images collected from the same area are used for joint desensitization. The foreground and background of the multiple target recognition images are decoupled, and the images are fused and desensitized based on the similarity of the foregrounds and backgrounds across the images. As a result, attackers cannot learn the correspondence between a desensitized image and its original images during the desensitization stage, which makes it difficult for them to train an effective anti-desensitization model that recovers the originals from the desensitized images. This increases the attacker's attack cost and improves the security of privacy protection for biometric images.

Attached Figure Description

[0035] To more clearly illustrate the technical solutions in the embodiments or prior art of this specification, the drawings used in the description of the embodiments or prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this specification. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0036] Figure 1 is a schematic diagram of the implementation environment of an image processing method provided according to an embodiment of this specification;

[0037] Figure 2 is a schematic flowchart of an image processing method provided according to an embodiment of this specification;

[0038] Figure 3 is a schematic diagram of the process for selecting a first number of target recognition images according to an embodiment of this specification;

[0039] Figure 4 is a flowchart of an image model training method provided according to an embodiment of this specification;

[0040] Figure 5 is a schematic diagram of the process of training an image selection model according to an embodiment of this specification;

[0041] Figure 6 is a flowchart of another image model training method provided according to an embodiment of this specification;

[0042] Figure 7 is a schematic structural diagram of the image processing apparatus provided according to an embodiment of this specification;

[0043] Figure 8 is a schematic structural diagram of the image model training apparatus provided according to an embodiment of this specification;

[0044] Figure 9 is a schematic structural diagram of an electronic device provided according to an embodiment of this specification.

Detailed Implementation

[0045] To make the features and advantages of this specification more apparent and understandable, the technical solutions in the embodiments of this specification will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this specification, and not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of this specification without creative effort are within the scope of protection of this specification.

[0046] First, the terms and concepts used in one or more embodiments of this specification will be explained.

[0047] Biometric images: including but not limited to various biometric images that can be used for identity authentication, such as faces, fingerprints, irises, etc.

[0048] Privacy protection: the protection of identity information in biometric images.

[0049] Foreground-background correlation: the similarity between the foregrounds and backgrounds of multiple images, used for subsequent multi-image joint privacy protection.

[0050] Mutual information: a measure of the degree of dependence or association between two random variables, such as two images.
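As a concrete illustration of the mutual-information measure defined above, the sketch below estimates it for two equally sized grayscale images from their joint intensity histogram, using I(A;B) = Σ p(a,b) log[p(a,b) / (p(a)p(b))]. The histogram (plug-in) estimator, the bin count, and the function name `mutual_information` are assumptions for illustration; this specification does not state how mutual information is computed.

```python
import numpy as np


def mutual_information(img_a: np.ndarray, img_b: np.ndarray, bins: int = 16) -> float:
    """Plug-in estimate of I(A;B), in nats, from the joint intensity histogram of two images."""
    if img_a.shape != img_b.shape:
        raise ValueError("images must have the same shape")
    # Count co-located pixel intensity pairs, then normalize to a joint probability table.
    joint, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)  # marginal distribution of A, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)  # marginal distribution of B, shape (1, bins)
    nz = p_xy > 0  # empty cells contribute 0 (0 * log 0 is taken as 0)
    return float(np.sum(p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz])))


rng = np.random.default_rng(0)
a = rng.random((64, 64))
b = rng.random((64, 64))  # independent of a
print(mutual_information(a, a) > mutual_information(a, b))  # True
```

An image shares maximal information with itself (the estimate equals the entropy of its intensity histogram), while two independent noise images score close to zero.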

[0051] In related technical solutions, image desensitization (de-identification) algorithms are used to protect the privacy of biometric images. For example, a desensitization model and a corresponding anti-desensitization model are trained on users' biometric images. During image transmission or storage, the desensitization model encrypts and desensitizes a user's biometric image to protect privacy, and during the computation stage the anti-desensitization model reverses the desensitization. However, this solution desensitizes each image individually. Once an attacker obtains a large amount of original data and the corresponding desensitized data, they can train their own anti-desensitization model, enabling large-scale theft of user privacy.

[0052] Based on the above, embodiments of this specification provide an image processing method and an image model training method. In these embodiments, multiple target recognition images collected from the same area are used for joint desensitization: their foregrounds and backgrounds are decoupled, and the images are fused and desensitized based on the similarity of the foregrounds and backgrounds across the images. Attackers therefore cannot learn the correspondence between a desensitized image and its original images during the desensitization stage, which makes it difficult for them to train an effective anti-desensitization model that recovers the originals. This increases the attacker's attack cost and improves the security of privacy protection for biometric images.

[0053] The technical solutions of the embodiments of this specification will now be described in detail with reference to the accompanying drawings.

[0054] Figure 1 is a schematic diagram of the implementation environment of an image processing method provided in the embodiments of this specification.

[0055] As shown in Figure 1, the implementation environment may include terminal 110 and server 140.

[0056] Terminal 110 is connected to server 140 via a wireless or wired network. Optionally, terminal 110 may be, but is not limited to, a smartphone, tablet, laptop, desktop computer, or smartwatch. An application that supports the image processing method is installed and running on terminal 110.

[0057] Server 140 is a standalone physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, Content Delivery Network (CDN), and big data and artificial intelligence platforms. Server 140 provides background services for applications running on terminal 110.

[0058] Those skilled in the art will understand that the number of terminals described above can be more or less. For example, there may be only one terminal, or there may be dozens or hundreds of terminals, or even more, in which case other terminals may also be included in the above implementation environment. This specification does not limit the number of terminals or the type of device in the embodiments.

[0059] Having introduced the implementation environment, the application scenarios of the embodiments of this specification are described below in conjunction with it. In the following description, "the terminal" refers to terminal 110 and "the server" refers to server 140 in the above implementation environment. The technical solutions provided by the embodiments of this specification can be applied to image desensitization processing, such as the desensitization of face recognition images, fingerprint recognition images, license plate recognition images, and iris recognition images.

[0060] Taking a face recognition scenario as an example, with user authorization, a first number of face recognition images are acquired, where each face recognition image is an image of a target face collected in a predetermined area; the image features of each of the first number of face recognition images, including foreground features and background features, are determined; the foreground features and background features of the first number of face recognition images are respectively fused and desensitized to generate corresponding foreground fused desensitized images and background fused desensitized images; and the foreground fused desensitized images and background fused desensitized images are fused to generate a fused desensitized image corresponding to the first number of face recognition images.

[0061] It should be noted that the above description is based on the application of the technical solution provided in the embodiments of this specification to a face recognition scenario. The technical solution provided in the embodiments of this specification can also be applied to other appropriate image desensitization processing scenarios. The implementation process is the same as the above description and belongs to the same inventive concept, so it will not be repeated here.

[0062] It should be noted that the steps in the image processing method in the example embodiments of this specification may be partially executed by the client, partially executed by the server, or entirely executed by the server or entirely by the client. This specification does not impose any special limitations on this.

[0063] Based on the implementation environment shown in Figure 1, the image processing method provided in the embodiments of this specification is described in detail below with reference to Figures 2 to 4. It should be noted that the above implementation environment is shown only to facilitate understanding of the spirit and principles of this specification and does not limit the embodiments in any way; rather, the embodiments can be applied to any applicable scenario.

[0064] Figure 2 is a schematic flowchart of an image processing method provided in an embodiment of this specification. This image processing method can be executed by a device with computing capabilities, such as a terminal device or a server. As shown in Figure 2, the image processing method in the embodiments of this specification may include the following steps S210 to S240.

[0065] In step S210, a first number of target recognition images are acquired, wherein the target recognition images are images of target objects collected in a predetermined area.

[0066] In the example embodiment, the target object can be a face, iris, or license plate, etc., and the target recognition image can be an image of the target object collected in a predetermined area, such as a target face image collected simultaneously by different devices in the predetermined area. A first number of target recognition images can be acquired by using different devices set up in the predetermined area, or a first number of target recognition images can be acquired by using the same device in the predetermined area at different times, and then uploaded to a server. The predetermined area can be an area such as a station, supermarket, shopping mall, or office building.

[0067] In some example embodiments, the target recognition images are a first number of target object recognition images captured simultaneously by different devices in a predetermined area. For example, suppose the predetermined area is the entrance of a supermarket, and three cameras are set up at the entrance of the supermarket to capture three facial recognition images of people entering the supermarket at the same time.

[0068] In some other example embodiments, the target recognition image is a first number of target object recognition images acquired by different types of devices in a predetermined area. For example, the first number of target recognition images includes visible light recognition images and infrared light recognition images. For instance, suppose the different types of devices include a visible light image acquisition device and an infrared image acquisition device. The visible light recognition image of the target object is acquired by the visible light image acquisition device located in the same area, and the infrared light recognition image of the target object is acquired by the infrared image acquisition device.

[0069] In other example embodiments, the target recognition image is a first number of target object recognition images collected by the same device in a predetermined area at different times. For example, during face recognition verification, with user authorization, a face recognition image of the user's frontal face, a nodding face recognition image, or a blinking face recognition image captured by the user's terminal camera is acquired.

[0070] Furthermore, the first number of target recognition images can be target recognition images selected from multiple target recognition images whose similarity is greater than a predetermined threshold. For example, suppose 5 target recognition images are collected in the same area, and 3 target recognition images with a similarity greater than the predetermined threshold are selected from the 5 target recognition images. The predetermined threshold can be determined based on the number of multiple target recognition images and the first number.
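The specification does not say how the images whose similarity exceeds the threshold are chosen. One simple possibility, sketched below under that caveat, is an exhaustive search over all k-subsets of a precomputed pairwise similarity matrix, keeping the subset whose least similar pair is most similar. `select_similar_subset` is a hypothetical helper name and the matrix values are illustrative only.

```python
from itertools import combinations
from typing import Optional, Tuple

import numpy as np


def select_similar_subset(
    sim: np.ndarray, k: int, threshold: float
) -> Optional[Tuple[int, ...]]:
    """Hypothetical helper: pick k image indices whose pairwise similarities all exceed threshold.

    sim is an (n, n) symmetric similarity matrix. Every k-subset is checked, and the
    subset whose least similar pair scores highest is returned; None means no subset qualifies.
    """
    if k < 2:
        raise ValueError("k must be at least 2 to compare pairs")
    best, best_score = None, -np.inf
    for subset in combinations(range(sim.shape[0]), k):
        weakest = min(sim[i, j] for i, j in combinations(subset, 2))
        if weakest > threshold and weakest > best_score:
            best, best_score = subset, weakest
    return best


# Five captured images, of which 0, 1 and 2 are mutually similar (illustrative values).
example_sim = np.array([
    [1.00, 0.90, 0.85, 0.20, 0.30],
    [0.90, 1.00, 0.80, 0.10, 0.40],
    [0.85, 0.80, 1.00, 0.30, 0.20],
    [0.20, 0.10, 0.30, 1.00, 0.60],
    [0.30, 0.40, 0.20, 0.60, 1.00],
])
print(select_similar_subset(example_sim, k=3, threshold=0.7))  # (0, 1, 2)
```

Exhaustive search is only practical for small counts such as the 5-choose-3 case in the text; larger pools would call for a greedy or clustering-based selection.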

[0071] In step S220, the image features of each target recognition image in the first number of target recognition images are determined. The image features include foreground features and background features.

[0072] In an example embodiment, foreground features can be features of the target object in the target recognition image, such as facial features in a face recognition image; background features can be features of the recognition scene in the target recognition image, such as office scene features or supermarket scene features. Each target recognition image in a first number of target recognition images is encoded to generate image encoded features, and foreground and background features of each target recognition image are extracted from the image encoded features. For example, the image encoded features are input into a foreground and background feature extraction model to extract foreground and background features of each target recognition image. The foreground and background feature extraction model can be a pre-trained convolutional neural network model, which includes a foreground feature extraction module and a background feature extraction module. The foreground feature extraction module is configured to extract foreground features of the target recognition image using a pre-trained foreground extraction convolutional kernel, and the background feature extraction module is configured to extract background features of the target recognition image using a pre-trained background extraction convolutional kernel.
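The sketch below mirrors the described structure with NumPy: an encoding step followed by separate foreground and background extraction kernels applied to the encoded features. It is a single-channel toy, not the patent's model. The random kernels merely stand in for the pre-trained kernels, and the ReLU after encoding and the helper names `conv2d_valid` and `extract_fg_bg` are assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d_valid(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Single-channel 2-D cross-correlation without padding (what CNN layers compute)."""
    windows = sliding_window_view(x, kernel.shape)  # (H - kh + 1, W - kw + 1, kh, kw)
    return np.einsum("ijkl,kl->ij", windows, kernel)


def extract_fg_bg(image, enc_kernel, fg_kernel, bg_kernel):
    """Encode an image, then run separate foreground and background extraction kernels."""
    encoded = np.maximum(conv2d_valid(image, enc_kernel), 0.0)  # image encoding features (ReLU assumed)
    foreground = conv2d_valid(encoded, fg_kernel)  # foreground feature extraction module
    background = conv2d_valid(encoded, bg_kernel)  # background feature extraction module
    return foreground, background


rng = np.random.default_rng(0)
# Random 3x3 kernels are placeholders for the pre-trained kernels in the text.
enc_k, fg_k, bg_k = (rng.standard_normal((3, 3)) for _ in range(3))
images = rng.random((3, 32, 32))  # a first number of 3 grayscale images
features = [extract_fg_bg(img, enc_k, fg_k, bg_k) for img in images]
print(features[0][0].shape, features[0][1].shape)  # (28, 28) (28, 28)
```

Sharing one encoder and splitting only at the last stage keeps the foreground and background features spatially aligned, which the later mask-based compositing relies on.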

[0073] It should be noted that although the foreground and background feature extraction model is illustrated using a convolutional neural network as an example, those skilled in the art should understand that the foreground and background feature extraction model can also be other appropriate models, such as the Transformer model, etc., and the embodiments in this specification do not impose any special limitations on this.

[0074] In step S230, the foreground features and background features of the first number of target recognition images are fused and desensitized respectively to generate corresponding foreground fused desensitized images and background fused desensitized images.

[0075] In the example embodiment, the fusion desensitization process refers to fusing the foreground or background features of multiple images and then desensitizing the fused image. The foreground features of a first number of target recognition images are fused and desensitized to generate a corresponding foreground fused desensitized image; the background features of the first number of target recognition images are fused and desensitized to generate a corresponding background fused desensitized image.

[0076] For example, assuming the target recognition image is a face recognition image, the foreground features (i.e., face features) of each face recognition image in the first number of face recognition images are fused to generate a corresponding foreground fused image (i.e., face fused image). The foreground fused image may include the correlation features between the foreground features of each image in the first number of target recognition images. The background features (i.e., scene features) of each face recognition image in the first number of target recognition images are fused to generate a corresponding background fused image (i.e., scene fused image). The background fused image may include the correlation features between the background features of each image in the first number of target recognition images. The foreground fused image and the background fused image are desensitized respectively to generate corresponding foreground fused desensitized images and background fused desensitized images.

[0077] Further, in the example embodiment, the image features of a first number of target recognition images are input into the image fusion desensitization model to obtain corresponding foreground fusion desensitized images and background fusion desensitized images. The image fusion desensitization model includes a foreground fusion desensitization sub-model and a background fusion desensitization sub-model. The image features of the first number of target recognition images and the corresponding foreground features are input into the foreground fusion desensitization sub-model to obtain the foreground fusion desensitized image, and the image features of the first number of target recognition images and the corresponding background features are input into the background fusion desensitization sub-model to obtain the background fusion desensitized image. The image fusion desensitization model can be a convolutional neural network model. The foreground fusion desensitization sub-model is configured to fuse and desensitize the foreground features in the image features of the target recognition images, for example, by blurring the fused foreground features to generate the foreground fusion desensitized image; the background fusion desensitization sub-model is configured to fuse and desensitize the background features in the image features of the target recognition images, for example, by blurring the fused background features to generate the background fusion desensitized image.
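Under the simplifying assumption that fusion is a pixel-wise average across the first number of feature maps (the text leaves the fusion operator to a learned sub-model), the fuse-then-blur step can be sketched as follows. The mean (box) filter stands in for the blurring mentioned above; `box_blur` and `fuse_and_desensitize` are hypothetical names, and the same function is applied once to the stacked foreground maps and once to the stacked background maps.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def box_blur(x: np.ndarray, size: int = 5) -> np.ndarray:
    """Mean filter with edge padding; the output has the same shape as x."""
    if size % 2 == 0:
        raise ValueError("size must be odd")
    pad = size // 2
    padded = np.pad(x, pad, mode="edge")
    return sliding_window_view(padded, (size, size)).mean(axis=(-2, -1))


def fuse_and_desensitize(feature_maps: np.ndarray, blur_size: int = 5) -> np.ndarray:
    """feature_maps: (N, H, W) foreground or background maps from N target recognition images."""
    fused = feature_maps.mean(axis=0)  # assumed fusion operator: pixel-wise mean across images
    return box_blur(fused, blur_size)  # desensitization by blurring the fused map


rng = np.random.default_rng(0)
fg_maps = rng.random((3, 28, 28))  # stand-ins for foreground features of 3 images
bg_maps = rng.random((3, 28, 28))  # stand-ins for background features of 3 images
fg_desensitized = fuse_and_desensitize(fg_maps)
bg_desensitized = fuse_and_desensitize(bg_maps)
```

Because each output pixel mixes several source images before blurring, no single original maps one-to-one onto the result, which is the property the specification relies on to frustrate anti-desensitization training.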

[0078] It should be noted that although the image fusion desensitization model is illustrated using a convolutional neural network as an example, those skilled in the art should understand that the image fusion desensitization model can also be other suitable models, such as the Transformer model, etc., and the embodiments in this specification do not impose any special limitations on this.

[0079] In step S240, the foreground fusion desensitized image and the background fusion desensitized image are fused to generate a fusion desensitized image corresponding to the first number of target recognition images.

[0080] In an example embodiment, the target object is extracted from the foreground fusion desensitized image, and the target object in the foreground fusion desensitized image is fused to the target object position in the background fusion desensitized image to generate a fusion desensitized image containing the target object. The target object is a fusion target object of multiple target objects in a first number of target recognition images, and the fusion target object contains the association features between multiple target objects in the first number of target recognition images.

[0081] For example, the target object image is extracted from the foreground fusion desensitized image using a foreground mask, and then the extracted target object image is fused to the target object position in the background fusion desensitized image to generate a fusion desensitized image containing the target object.
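The mask-based compositing described above reduces to a per-pixel blend. This sketch assumes the two desensitized images are already aligned so that the target object occupies the same position in both, and that the foreground mask is already available; how the mask is obtained is not specified here, and `composite` is a hypothetical name.

```python
import numpy as np


def composite(fg_desensitized: np.ndarray, bg_desensitized: np.ndarray, fg_mask: np.ndarray) -> np.ndarray:
    """Paste the masked target object from the foreground image onto the background image.

    fg_mask is 1 on target-object pixels and 0 elsewhere; values in between blend softly.
    Assumes both images are aligned, so the object already sits at its position in the background.
    """
    if not (fg_desensitized.shape == bg_desensitized.shape == fg_mask.shape):
        raise ValueError("inputs must share one shape")
    mask = np.clip(fg_mask, 0.0, 1.0)
    return mask * fg_desensitized + (1.0 - mask) * bg_desensitized


fg = np.full((6, 6), 0.9)  # toy foreground fusion desensitized image
bg = np.full((6, 6), 0.1)  # toy background fusion desensitized image
mask = np.zeros((6, 6))
mask[2:4, 2:4] = 1.0  # hypothetical target-object region
fused = composite(fg, bg, mask)
print(fused[3, 3], fused[0, 0])  # 0.9 0.1
```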

[0082] According to the technical solution in the example embodiment of Figure 2, multiple target recognition images collected from the same area are used for joint desensitization: the foregrounds and backgrounds of the images are decoupled, and the images are fused and desensitized based on the similarity of their foregrounds and backgrounds. Attackers therefore cannot learn the correspondence between a desensitized image and its original images during the desensitization stage, which makes it difficult for them to train an effective anti-desensitization model that recovers the originals. This increases the attacker's attack cost and improves the security of privacy protection for biometric images.

[0083] Furthermore, in some example embodiments, the image features of the target recognition image also include foreground category features and background category features. Foreground category features may include user identifier classification, license plate identifier classification, ID document identifier classification, etc., while background category features may include scene identifier classification, such as office, station, supermarket, etc. The foreground category features and background category features of each target recognition image in a first number of target recognition images are determined; the first number of target recognition images are classified based on their foreground and background category features; and the similarity between target recognition images of the same category is calculated based on the classification results.
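A minimal sketch of category-restricted similarity follows. It assumes each image already has a flattened, nonzero feature vector and string category labels, and it uses cosine similarity as a stand-in metric because the specification does not name one. The function name and label values are hypothetical. Comparing only images that share both labels is what cuts the number of pairs evaluated.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

import numpy as np


def within_category_similarity(
    features: np.ndarray, fg_labels: Sequence[str], bg_labels: Sequence[str]
) -> Dict[Tuple[int, int], float]:
    """Cosine similarity for pairs that share both foreground and background category.

    features: (M, D) flattened feature vectors (assumed nonzero). Returns {(i, j): sim}, i < j.
    """
    groups: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for idx, key in enumerate(zip(fg_labels, bg_labels)):
        groups[key].append(idx)  # indices are appended in increasing order
    sims = {}
    for members in groups.values():
        for i, j in combinations(members, 2):  # only pairs inside one category group
            a, b = features[i], features[j]
            sims[(i, j)] = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return sims


feats = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
fg_cat = ["user_a", "user_a", "user_b", "user_a"]  # hypothetical user identifier classes
bg_cat = ["office", "office", "office", "station"]
print(within_category_similarity(feats, fg_cat, bg_cat))  # {(0, 1): 1.0}
```

With four images only one pair is compared instead of six, because the other images fall into different (user, scene) groups.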

[0084] According to the technical solution in the above example embodiment, classifying target recognition images based on foreground and background categories and determining the similarity between classified target recognition images can reduce the computational load of subsequent fusion and desensitization processing and improve image processing efficiency.

[0085] Figure 3 is a schematic diagram of the process for selecting a first number of target recognition images according to an embodiment of this specification.

[0086] Referring to Figure 3, in step S310, a second number of target recognition images are acquired, where the second number is greater than the first number.

[0087] In the example embodiment, the second number is the number of original target recognition images collected by different devices located in the predetermined area. Because the first number of target recognition images is selected from these original images, the second number is greater than the first number. In this step, the second number of target recognition images collected by the different devices in the predetermined area are acquired.

[0088] For example, if the predetermined area is the entrance of a supermarket and five cameras are installed there, the five cameras simultaneously capture five facial recognition images of people entering the supermarket.

[0089] In step S320, the similarity between every two target recognition images in the second number of target recognition images is determined.

[0090] In an example embodiment, each target recognition image in the second number of target recognition images is encoded to generate image encoding features. Image features of each target recognition image are extracted from the image encoding features. The image features include foreground features and background features. For example, the image encoding features are input into a foreground and background feature extraction model to extract the foreground and background features of each target recognition image. The foreground and background feature extraction model can be a pre-trained convolutional neural network model, which includes a foreground feature extraction module and a background feature extraction module. The foreground feature extraction module is configured to extract the foreground features of the target recognition image through a pre-trained foreground extraction convolutional kernel, and the background feature extraction module is configured to extract the background features of the target recognition image through a pre-trained background extraction convolutional kernel.

[0091] Further, for every two target recognition images in the second number of target recognition images, the foreground similarity between their foreground features and the background similarity between their background features are determined, and the similarity between the two images is determined based on this foreground similarity and background similarity. For example, the cosine similarity between the foreground features of each pair of images is computed and used to determine their foreground similarity; likewise, the cosine similarity between the background features of each pair is computed and used to determine their background similarity.
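The pairwise cosine similarities can be computed for all N images at once as N×N matrices. The following is a sketch, assuming that each image's foreground and background features have already been flattened into fixed-length vectors. The random arrays are placeholders, not real features.

```python
import numpy as np


def cosine_similarity_matrix(features: np.ndarray) -> np.ndarray:
    """features: N x D. Returns the N x N matrix of pairwise cosine similarities."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)  # normalize rows, guarding against zero vectors
    return unit @ unit.T


rng = np.random.default_rng(0)
fg_feats = rng.normal(size=(5, 16))  # placeholder foreground features of N = 5 images
bg_feats = rng.normal(size=(5, 16))  # placeholder background features
fg_sim = cosine_similarity_matrix(fg_feats)
bg_sim = cosine_similarity_matrix(bg_feats)
```

Entry (i, j) of fg_sim is the foreground similarity of images i and j. The diagonal is 1, and the matrix is symmetric.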

[0092] In step S330, a first number of target recognition images are selected from a second number of target recognition images based on similarity.

[0093] In an example embodiment, a first number of target recognition images are selected from a second number of target recognition images based on the foreground similarity and background similarity of every two target recognition images. For example, a foreground similarity matrix and a background similarity matrix are generated based on the foreground similarity and background similarity of every two target recognition images. The similarity matrices are encoded to generate similarity features. Based on the similarity features and the foreground and background features of the second number of target recognition images, the first number of target recognition images are selected from the second number of target recognition images.

[0094] For example, similarity features and foreground and background features of a second number of target recognition images are input into an image selection model to output a first number of target recognition images. The mutual information between the images in the first number of target recognition images satisfies a predetermined model convergence condition of the image selection model. The predetermined model convergence condition can be a convergence condition where the mutual information is greater than a predetermined threshold.

[0095] Furthermore, in some example embodiments, the mutual information between a second number of target recognition images is determined based on similarity, and a first number of target recognition images are selected from the second number of target recognition images based on the magnitude of the mutual information. Mutual information is a measure used to evaluate the degree of dependence between two random variables. For example, the second number of target recognition images are sorted based on the magnitude of the mutual information between them, and the first number of target recognition images are selected in descending order.
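The specification does not state how the mutual information is estimated. This sketch substitutes a common histogram-based estimator of mutual information between two arrays. It uses pixel intensities here, although feature vectors could be handled in the same way. Each image is scored by its total mutual information with the others, and the top K are kept in descending order of score. The estimator, the bin count, and the function names are assumptions, not the method of the embodiments.

```python
import numpy as np


def histogram_mutual_information(a: np.ndarray, b: np.ndarray, bins: int = 16) -> float:
    """Plug-in MI estimate (in nats) from the joint histogram of two equally sized arrays."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()            # joint distribution p(x, y)
    px = pxy.sum(axis=1, keepdims=True)  # marginal p(x), shape bins x 1
    py = pxy.sum(axis=0, keepdims=True)  # marginal p(y), shape 1 x bins
    nz = pxy > 0                         # skip empty cells (0 * log 0 = 0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px * py)[nz])))


def select_top_k_by_mi(images, k: int, bins: int = 16):
    """Rank images by total MI with all others (descending); return the top-K indices, sorted."""
    n = len(images)
    mi = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            mi[i, j] = mi[j, i] = histogram_mutual_information(images[i], images[j], bins)
    scores = mi.sum(axis=1)
    order = np.argsort(-scores)  # descending MI score
    return sorted(order[:k].tolist()), scores


rng = np.random.default_rng(0)
base = rng.random((32, 32))
# Three noisy views of the same scene plus two unrelated images (N = 5, K = 3).
images = [base + 0.05 * rng.standard_normal((32, 32)) for _ in range(3)]
images += [rng.random((32, 32)) for _ in range(2)]
selected, scores = select_top_k_by_mi(images, k=3)
```

The three views of the same scene share much more information with one another than with the unrelated images, so they are selected.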

[0096] According to the technical solution in the example embodiment of Figure 3, a first number of images (K images) are selected from a second number of images (N images) based on the similarity features between the images. This allows relatively similar images to be selected for fusion and desensitization processing, thereby reducing the computational load of that processing and improving image processing efficiency.

[0097] Figure 4 is a schematic flowchart of an image model training method provided according to embodiments of this specification. The method can be executed by a device with computing capabilities, such as a server or a terminal device. As shown in Figure 4, the image model training method in the embodiments of this specification may include the following steps S410 to S460.

[0098] Referring to Figure 4, in step S410, a first number of target recognition images are acquired, where the target recognition images are images of target objects collected in a predetermined area.

[0099] In the example embodiment, the implementation process and effect of step S410 are similar to those of step S210, and will not be repeated here.

[0100] In step S420, the image features of each target recognition image in the first number of target recognition images are determined. The image features include foreground features and background features.

[0101] In an example embodiment, foreground features can be features of the target object in the target recognition image, such as facial features in a face recognition image; background features can be features of the recognition scene in the target recognition image, such as office scene features or supermarket scene features. Each target recognition image in a first number of target recognition images is encoded to generate image encoded features, and foreground and background features of each target recognition image are extracted from the image encoded features. For example, the image encoded features are input into a foreground and background feature extraction model to extract foreground and background features of each target recognition image. The foreground and background feature extraction model can be a pre-trained convolutional neural network model, which includes a foreground feature extraction module and a background feature extraction module. The foreground feature extraction module is configured to extract foreground features of the target recognition image using a pre-trained foreground extraction convolutional kernel, and the background feature extraction module is configured to extract background features of the target recognition image using a pre-trained background extraction convolutional kernel.

[0102] It should be noted that although the foreground and background feature extraction model is illustrated using a convolutional neural network as an example, those skilled in the art should understand that the foreground and background feature extraction model can also be other appropriate models, such as the Transformer model, etc., and the embodiments in this specification do not impose any special limitations on this.

[0103] In step S430, the image features of the first number of target recognition images are input into the image fusion desensitization model to obtain the corresponding foreground fusion desensitization image and background fusion desensitization image.

[0104] In the example embodiment, the foreground features of the first number of target recognition images are fused and desensitized to generate corresponding foreground fused and desensitized images; the background features of the first number of target recognition images are fused and desensitized to generate corresponding background fused and desensitized images.

[0105] For example, the image features of the first number of target recognition images are input into an image fusion desensitization model to obtain the corresponding foreground fusion desensitized image and background fusion desensitized image. The image fusion desensitization model includes a foreground fusion desensitization sub-model and a background fusion desensitization sub-model. The image features of the first number of target recognition images and their corresponding foreground features are input into the foreground fusion desensitization sub-model to obtain the foreground fusion desensitized image; the image features and their corresponding background features are input into the background fusion desensitization sub-model to obtain the background fusion desensitized image. The image fusion desensitization model can be a convolutional neural network model. The foreground fusion desensitization sub-model is configured to fuse and desensitize the foreground features in the image features of the target recognition images, for example, by blurring the fused foreground features to generate the foreground fusion desensitized image; the background fusion desensitization sub-model is configured to fuse and desensitize the background features in the image features of the target recognition images, for example, by blurring the fused background features to generate the background fusion desensitized image.
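The "fuse, then blur" behavior of each sub-model can be illustrated with a fixed, non-learned stand-in: average the K maps, then apply a box blur. In the embodiments, both sub-models are learned networks. The averaging fusion, the mean-filter blur, and the helper names below are assumptions chosen only to make the idea concrete.

```python
import numpy as np


def box_blur(image: np.ndarray, radius: int = 1) -> np.ndarray:
    """Mean filter over a (2r+1) x (2r+1) window, with edge padding; image is H x W."""
    size = 2 * radius + 1
    padded = np.pad(image, radius, mode="edge")
    h, w = image.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + h, dx:dx + w]  # accumulate each shifted copy of the window
    return out / (size * size)


def fuse_and_blur(maps: np.ndarray, radius: int = 1) -> np.ndarray:
    """maps: K x H x W foreground (or background) maps; fuse by averaging, then blur."""
    fused = maps.mean(axis=0)
    return box_blur(fused, radius)


maps = np.random.default_rng(0).random((3, 8, 8))  # K = 3 toy single-channel maps
out = fuse_and_blur(maps)
```

Averaging mixes the K inputs, so no single source map survives intact. Blurring then suppresses the fine detail that could tie the result back to any one input.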

[0106] It should be noted that although the image fusion desensitization model is illustrated using a convolutional neural network as an example, those skilled in the art should understand that the image fusion desensitization model can also be other suitable models, such as the Transformer model, etc., and the embodiments in this specification do not impose any special limitations on this.

[0107] In step S440, the foreground fusion desensitized image and the background fusion desensitized image are fused to generate a fusion desensitized image corresponding to the first number of target recognition images.

[0108] In the example embodiment, the target object is extracted from the foreground fusion desensitized image, and the target object in the foreground fusion desensitized image is fused to the target object position in the background fusion desensitized image to generate a fusion desensitized image containing the target object. The target object is a fusion target object of multiple target objects in a first number of target recognition images, and the fusion target object contains the association features between multiple target objects in the first number of target recognition images.

[0109] For example, the target object image is extracted from the foreground fusion desensitized image using a foreground mask, and then the extracted target object image is fused to the target object position in the background fusion desensitized image to generate a fusion desensitized image containing the target object.

[0110] In step S450, based on the fused desensitized image and a first number of target recognition images, the model loss of the image model is determined, and the model loss includes the desensitization loss.

[0111] In an example embodiment, the desensitization loss is used to minimize the similarity between the fused desensitized image and any one of the first number of target recognition images, and the desensitization loss is determined by the difference between the fused desensitized image and the target recognition image.

[0112] For example, the feature similarity between the image features of the fused desensitized image and the image features of each target recognition image in the first number of target recognition images is determined, and the desensitization loss of the image model is determined based on these feature similarities.
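The goal is low similarity to any one of the originals, so one natural reading is to penalize the largest of the K feature similarities. The following is a minimal sketch, assuming cosine similarity on per-image feature vectors. Taking the maximum rather than, for example, the mean is an assumption, as are the helper names and the threshold value used below.

```python
import numpy as np


def desensitization_loss(fused_feat: np.ndarray, original_feats: np.ndarray):
    """Largest cosine similarity between the fused image's feature and any original's feature.

    fused_feat: D-dim feature of the fused desensitized image.
    original_feats: K x D features of the K target recognition images.
    """
    f = fused_feat / max(np.linalg.norm(fused_feat), 1e-12)
    norms = np.linalg.norm(original_feats, axis=1, keepdims=True)
    o = original_feats / np.maximum(norms, 1e-12)
    sims = o @ f  # K cosine similarities
    return float(sims.max()), sims


def meets_threshold(loss: float, threshold: float) -> bool:
    """Training target: similarity to every original stays below the predetermined threshold."""
    return loss < threshold


originals = np.eye(4)[:3]  # K = 3 toy original features
loss_far, _ = desensitization_loss(np.array([0.0, 0.0, 0.0, 1.0]), originals)
loss_near, _ = desensitization_loss(np.array([1.0, 0.0, 0.0, 0.0]), originals)
```

A fused feature orthogonal to all originals gives a loss of 0. One that matches an original gives a loss of 1.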

[0113] In step S460, the image model is trained based on the model loss.

[0114] In the example embodiment, the model loss includes a desensitization loss. Based on this model loss, the image model is trained using gradient descent to adjust its parameters so that the feature similarity between the fused desensitized image and any one of the first number of target recognition images is less than a predetermined threshold. The predetermined threshold can be determined based on the size of the first number.

[0115] For example, the ADAM (Adaptive Moment Estimation) optimizer is used to train the image model until it converges. The ADAM optimizer uses both the first-moment estimate (the mean of the gradient) and the second-moment estimate (the uncentered variance of the gradient) to calculate the update step size, and the image model is trained with this step size. The step size adapts to both the mean and the squared magnitude of the gradient, rather than being determined directly by the current gradient alone, so training the image model with the ADAM optimizer improves training efficiency.
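The ADAM update described above can be written out directly. This is a minimal NumPy sketch of the standard Adam rule with its usual default hyperparameters. It is applied to a toy one-dimensional quadratic rather than to the image model. The helper adam_minimize is hypothetical.

```python
import numpy as np


def adam_minimize(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Run plain Adam updates on parameters x using gradients from grad_fn."""
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)  # first moment: running mean of gradients
    v = np.zeros_like(x)  # second moment: running mean of squared gradients (uncentered variance)
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter adaptive step
    return x


# Minimize f(x) = (x - 3)^2, whose gradient is 2 (x - 3).
x_opt = adam_minimize(lambda x: 2 * (x - 3.0), x0=[0.0])
```

Dividing by the square root of v_hat normalizes each step by the recent gradient scale. In practice, a framework optimizer such as PyTorch's torch.optim.Adam implements this rule.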

[0116] According to the technical solution in the example embodiment of Figure 4, multiple target recognition images collected from the same area are jointly desensitized: the foregrounds and backgrounds of the images are decoupled, and fusion desensitization is performed on the images based on the similarity of their foregrounds and backgrounds. As a result, attackers cannot know the correspondence between the desensitized image and the original images during the desensitization stage, which makes it difficult for them to train an effective anti-desensitization model to reverse the desensitization. This increases the attacker's attack cost and improves the security of privacy protection for biometric images.

[0117] Furthermore, in the example embodiment, the image model further includes an anti-desensitization model, and the model loss also includes an anti-desensitization loss, which is used to ensure that the anti-desensitized image remains consistent with the original image. The image model training method further includes: inputting the fused desensitized image into the anti-desensitization model to obtain a first number of predicted anti-desensitized images; and determining the anti-desensitization loss of the anti-desensitization model based on the image differences between the first number of predicted anti-desensitized images and the first number of target recognition images.

[0118] In this example embodiment, the above-mentioned training of the image model based on model loss includes: adjusting the parameters of the image model based on desensitization loss and anti-desensitization loss, such that the feature similarity between the fused desensitized image and any image in the first number of target recognition images is less than a predetermined threshold, and the predicted anti-desensitized image is consistent with the original target recognition image.
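The anti-desensitization loss and the combined objective can be sketched as follows. The sketch assumes that the anti-desensitization model returns its K predictions in the same order as the K inputs, and that image difference is measured by mean squared error. The MSE choice and the balancing factor weight are assumptions not stated in the specification.

```python
import numpy as np


def anti_desensitization_loss(predicted: np.ndarray, originals: np.ndarray) -> float:
    """Mean squared error between the K predicted anti-desensitized images and the K originals (K x H x W)."""
    if predicted.shape != originals.shape:
        raise ValueError("expected one prediction per original image, in the same order")
    return float(np.mean((predicted - originals) ** 2))


def total_loss(desens_loss: float, anti_loss: float, weight: float = 1.0) -> float:
    """Weighted sum of the two terms; `weight` is a hypothetical balancing hyperparameter."""
    return desens_loss + weight * anti_loss


originals = np.ones((3, 4, 4))   # K = 3 toy original images
predicted = np.zeros((3, 4, 4))  # a maximally wrong reconstruction
anti = anti_desensitization_loss(predicted, originals)
loss = total_loss(desens_loss=0.2, anti_loss=anti, weight=0.5)
```

Minimizing this sum pushes the fused image away from every original. At the same time, it keeps the originals recoverable by the anti-desensitization model, which matches the two conditions stated above.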

[0119] According to the technical solution in the above example embodiments, in the anti-desensitization stage, since attackers cannot know the correspondence between the desensitized image and the original image, it is difficult for attackers to train the anti-desensitization model to perform anti-desensitization on the desensitized image, thereby increasing the attacker's attack cost and improving the security of privacy protection for biometric images.

[0120] Furthermore, in the example embodiment, the method further includes: acquiring a second number of target recognition images, the second number being greater than the first number; determining the similarity between every two target recognition images in the second number of target recognition images; inputting the similarity and image features of the target recognition images into an image selection model to obtain a first number of target recognition images, wherein the mutual information between the images in the first number of target recognition images satisfies a predetermined model convergence condition of the image selection model, the predetermined model convergence condition being a convergence condition where the mutual information is greater than a predetermined threshold.

[0121] For example, the image features of each target recognition image in a second number of target recognition images are determined, including foreground features and background features; the foreground similarity of the foreground features and the background similarity of the background features of every two target recognition images in the second number of target recognition images are determined. Based on the foreground and background similarities, the mutual information between the images in the second number of target recognition images is determined, and a first number of target recognition images are selected from the second number of target recognition images based on the magnitude of the mutual information. Mutual information is a measure used to evaluate the degree of dependence between two random variables. For example, the second number of target recognition images are sorted based on the magnitude of the mutual information between the images in the second number of target recognition images, and a first number of target recognition images are selected in descending order.

[0122] According to the technical solution in the above example embodiment, by selecting a first number of images (K images) from a second number (N images) based on the similarity features between images, it is possible to select relatively similar images for fusion and desensitization processing, thereby reducing the computational load of fusion and desensitization processing and improving image processing efficiency.

[0123] Figure 5 is a schematic diagram illustrating the process of training an image selection model according to embodiments of this specification.

[0124] Referring to Figure 5, in step S510, the foreground similarity and background similarity of every two target recognition images are input into the similarity encoding module to obtain the similarity features corresponding to the second number of target recognition images.

[0125] In the example embodiment, the image selection model includes a similarity encoding module, which encodes the foreground similarity and background similarity of each pair of target recognition images to generate a similarity encoded feature vector. For example, the input to the similarity encoding module is an N*N foreground similarity matrix and a background similarity matrix corresponding to N target recognition images, and the output is a similarity encoded feature vector corresponding to N target recognition images.
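One way to make the similarity encoding module work for any N is to encode each (foreground similarity, background similarity) pair separately, then pool over partner images. This DeepSets-style design, its tanh layer, and the random placeholder weights are assumptions; the specification does not give the encoder's architecture.

```python
import numpy as np


def encode_similarity(fg_sim: np.ndarray, bg_sim: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """fg_sim, bg_sim: N x N similarity matrices; weight: 2 x D; bias: D. Returns N x D features."""
    pair = np.stack([fg_sim, bg_sim], axis=-1)  # N x N x 2: (fg, bg) similarity for each image pair
    hidden = np.tanh(pair @ weight + bias)      # N x N x D: per-pair encoding
    return hidden.mean(axis=1)                  # N x D: pool over partner images, so any N works


rng = np.random.default_rng(0)
n, d = 5, 8
fg_sim = rng.uniform(-1, 1, size=(n, n))  # placeholder similarity matrices
bg_sim = rng.uniform(-1, 1, size=(n, n))
weight = rng.normal(size=(2, d))          # placeholder parameters (would be learned)
bias = np.zeros(d)
sim_feats = encode_similarity(fg_sim, bg_sim, weight, bias)
```

Because of the mean pooling, an image's encoding does not depend on the order in which its partner images are listed.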

[0126] In step S520, the similarity features and the corresponding foreground and background features are input into the prediction selection module to obtain a first number of target recognition images.

[0127] In an example embodiment, the image selection model includes a prediction selection module. This module selects a first number of target recognition images from a second number of target recognition images based on the similarity features of the target recognition images and the corresponding foreground and background features. The mutual information between the images in the first number of target recognition images satisfies a predetermined model convergence condition for the image selection model. This predetermined model convergence condition can be a convergence condition where the mutual information is greater than a predetermined threshold.

[0128] In step S530, the mutual information between a first number of target recognition images is determined.

[0129] In the example embodiment, the loss function of the image selection model is mutual information. Mutual information is a measure used to evaluate the degree of dependence or association between two random variables, such as the degree of dependence or association between two images. The mutual information between a first number of target recognition images is calculated.
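For discrete random variables, mutual information has the standard definition given in the first line below. The second line is an assumed set-level score for the K selected images S: a sum of pairwise terms compared against a threshold τ. The specification states only that the mutual information must exceed a predetermined threshold.

```latex
I(X;Y) = \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}

\mathcal{I}(S) = \sum_{\substack{i,j \in S \\ i<j}} I(X_i;X_j) > \tau, \qquad |S| = K
```

I(X;Y) is zero when X and Y are independent and grows as they become more dependent. Maximizing it therefore favors mutually related images.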

[0130] In step S540, the parameters of the image selection model are adjusted based on mutual information.

[0131] In the example embodiment, the parameters of the similarity encoding module and the prediction selection module of the image selection model are adjusted based on mutual information, so that the mutual information between the images of a first number of target recognition images satisfies the predetermined model convergence condition of the image selection model, such as the convergence condition that the mutual information is greater than a predetermined threshold.

[0132] For example, the ADAM optimizer is used to train the image selection model until it converges. The ADAM optimizer uses both the first-moment estimate (the mean of the gradient) and the second-moment estimate (the uncentered variance of the gradient) to calculate the update step size, and the image selection model is trained with this step size. The step size adapts to both the mean and the squared magnitude of the gradient, rather than being determined directly by the current gradient alone, so training the image selection model with the ADAM optimizer improves training efficiency.

[0133] According to the technical solution in the example embodiment of Figure 5, the image selection model is trained based on mutual information, so it can select a first number of similar target recognition images from a second number of target recognition images, thereby reducing the computational load of subsequent fusion and desensitization processing and improving image processing efficiency.

[0134] Figure 6 is a flowchart illustrating another image model training method provided according to an embodiment of this specification.

[0135] Traditional biometric image privacy protection algorithms based on de-identification / anti-de-identification treat only a single image as the image to be de-identified, which makes it easy for attackers to train an anti-de-identification module for attacks. To overcome this problem, this specification proposes a privacy protection method based on decoupling and associating the foregrounds and backgrounds of multiple images. According to the technical solution of the embodiments of this specification, several images are selected from multiple images collected from the same area for joint encryption, while the remaining unselected images are de-identified individually. This prevents attackers from knowing the correspondence between the de-identified images and the original images, thereby increasing the attack cost and making it difficult to train an effective anti-de-identification model. The image model training method of this embodiment is described in detail below with reference to Figure 6.

[0136] Referring to Figure 6, in step S610, the foreground similarity and background similarity of the acquired target recognition images are determined.

[0137] In the example embodiment, the acquired target recognition image is input into the foreground-background similarity analysis model to decouple the foreground and background of the target recognition image, thereby obtaining the foreground similarity between foregrounds and the background similarity between backgrounds after decoupling. The foreground-background similarity analysis model can be a convolutional neural network model or other appropriate models such as the Transformer model.

[0138] Furthermore, the training of the foreground-background similarity analysis model based on feature decoupling includes the following parts:

[0139] (1) Model structure: The model consists of three parts: a basic feature encoder, a foreground feature encoder, and a background feature encoder.

[0140] (2) Input and output: The input of the basic feature encoder is the original (non-desensitized) target recognition image, and the output is the corresponding image coding feature. The input of the foreground feature encoder is the image coding feature, and the output is the foreground feature (for example, biometric features) and the foreground category (for example, a user identifier classification). The input of the background feature encoder is the image coding feature, and the output is the background feature and the background category (for example, a scene category).

[0141] (3) Loss function: The loss function consists of two parts. The first part is the classification loss, which includes foreground feature classification and background feature classification. The second part is the alignment loss, which is used to make the features of the background and foreground of the same category as close as possible.

[0142] (4) Training method: Based on the above model structure and loss function, the ADAM optimizer is used to optimize and train the model until the model converges.

[0143] (5) Calculation of foreground and background similarity: Input the two target recognition images into the foreground and background similarity analysis model trained above to obtain the foreground and background features of each target recognition image. Further, calculate the foreground similarity of the two target recognition images based on the foreground features, for example, calculate the cosine similarity of the foreground features of the two target recognition images; calculate the background similarity of the two target recognition images based on the background features, for example, calculate the cosine similarity of the background features of the two target recognition images.
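The two-part loss in item (3) can be sketched as cross-entropy classification terms plus an alignment term. The specification does not define the alignment loss. This sketch assumes it is the mean squared distance of each feature to the centroid of its category. The weighting factor align_weight and the helper names are hypothetical.

```python
import numpy as np


def cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Mean softmax cross-entropy; logits: N x C, labels: N integer class ids."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def alignment_loss(features: np.ndarray, labels: np.ndarray) -> float:
    """Assumed form: mean squared distance of each feature to its category centroid."""
    total = 0.0
    for c in np.unique(labels):
        group = features[labels == c]
        total += float(((group - group.mean(axis=0)) ** 2).sum())
    return total / len(features)


def decoupling_loss(fg_logits, fg_labels, bg_logits, bg_labels, fg_feats, bg_feats, align_weight=1.0):
    """Classification loss (foreground + background) plus weighted alignment loss."""
    cls = cross_entropy(fg_logits, fg_labels) + cross_entropy(bg_logits, bg_labels)
    align = alignment_loss(fg_feats, fg_labels) + alignment_loss(bg_feats, bg_labels)
    return cls + align_weight * align


labels = np.array([0, 0, 1, 1])
tight = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])   # same-category features close
spread = np.array([[0.0, 0.0], [3.0, 0.0], [5.0, 5.0], [8.0, 5.0]])  # same-category features far apart
```

The alignment term is small when features of the same category cluster together. The classification terms keep the categories separable.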

[0144] In step S620, K target recognition images are selected from the N target recognition images based on mutual information, where N>K.

[0145] In an example embodiment, K target recognition images are selected from N acquired target recognition images using an image selection model based on mutual information. For example, K images are selected from N target recognition images acquired by different devices in the same area at the same time as input for subsequent joint desensitization.

[0146] Furthermore, the training of the image selection model based on mutual information prediction of image sets includes the following parts:

[0147] (1) Model structure: The model structure is divided into three parts. The first part is the similarity encoder, the second part is the foreground and background feature encoder, and the third part is the prediction selection module.

[0148] (2) Input and output: The input of the similarity encoder is the N*N foreground similarity matrix and the N*N background similarity matrix of N target recognition images, and the output is the similarity feature; the input of the foreground and background feature encoder is the foreground and background features of N target recognition images, and the output is the encoded foreground and background features; the input of the prediction selection module is the similarity feature and the encoded foreground and background features of N target recognition images, and the output is the K images selected by the prediction selection module;

[0149] (3) Loss function: The loss function is mutual information extreme value loss;

[0150] (4) Training method: Based on the above model structure and loss function, the image selection model is optimized and trained using the ADAM optimizer until the model converges.
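At inference time, the prediction selection module can be pictured as scoring each image from its concatenated features and keeping the top K. The linear scoring vector and the function name are placeholder assumptions. Hard top-K selection is not differentiable, so training it against the mutual-information loss would require a relaxation such as Gumbel-softmax or a soft top-K, which this sketch does not attempt.

```python
import numpy as np


def predict_selection(sim_feats: np.ndarray, fg_bg_feats: np.ndarray, weight: np.ndarray, k: int):
    """Score each of N images from its concatenated features and return the top-K indices (sorted)."""
    joint = np.concatenate([sim_feats, fg_bg_feats], axis=1)  # N x (D1 + D2)
    scores = joint @ weight                                    # N scores from a placeholder linear scorer
    top_k = np.argsort(-scores)[:k]                            # highest scores first
    return sorted(top_k.tolist()), scores


sim_feats = np.array([[5.0], [1.0], [4.0], [0.0]])  # N = 4, D1 = 1 (toy values)
fg_bg_feats = np.zeros((4, 2))                      # D2 = 2
selected, scores = predict_selection(sim_feats, fg_bg_feats, weight=np.ones(3), k=2)
```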

[0151] In step S630, the K target recognition images are fused and desensitized to obtain fused and desensitized images.

[0152] In the example embodiment, after the K target recognition images are selected, they are jointly desensitized by the fusion desensitization model, while the remaining N−K images are desensitized individually.

[0153] Furthermore, the training of the fusion desensitization model for associating and desensitizing K target images includes the following parts:

[0154] (1) Model structure: The fusion desensitization model includes five parts: a feature encoder, a foreground desensitization module, a background desensitization module, a fusion module, and an anti-desensitization module;

[0155] (2) Input and output: The input of the feature encoder is the K target recognition images to be desensitized, and the output is the image coding features of the K target recognition images; the input of the background desensitization module is the image coding features of the K target recognition images and the corresponding background features generated in step S610 above, and the output is a background fusion desensitized image; the input of the foreground desensitization module is the image coding features of the K target recognition images and the corresponding foreground features generated in step S610 above, and the output is a foreground fusion desensitized image; the input of the fusion module is the background fusion desensitized image and the foreground fusion desensitized image, and the output is a fusion desensitized image; the input of the anti-desensitization module is the fusion desensitized image, and the output is the reconstructed K target recognition images before desensitization;

[0156] (3) Loss function: The loss function is the desensitization loss and the anti-desensitization loss. The desensitization loss is used to make the similarity between the desensitized image and any target recognition image before desensitization as low as possible. The anti-desensitization loss is used to make the anti-desensitized image and the original target recognition image as consistent as possible.

[0157] (4) Training method: Based on the above model structure and loss function, the ADAM optimizer is used to optimize and train the model until the model converges.
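The data flow between the five modules can be sketched as a small pipeline class. Each module is a pluggable callable, so the wiring is visible without committing to any network design. The class name, the field names, and the toy lambdas in the usage lines are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

import numpy as np

Array = np.ndarray


@dataclass
class FusionDesensitizationPipeline:
    """Hypothetical wiring of the five modules; each field is a placeholder callable."""

    encoder: Callable[[Array], Array]           # K images -> K image coding features
    fg_module: Callable[[Array, Array], Array]  # (codes, foreground features) -> foreground fusion desensitized image
    bg_module: Callable[[Array, Array], Array]  # (codes, background features) -> background fusion desensitized image
    fusion: Callable[[Array, Array], Array]     # (foreground image, background image) -> fusion desensitized image
    anti: Callable[[Array], Array]              # fusion desensitized image -> K reconstructed images

    def forward(self, images: Array, fg_feats: Array, bg_feats: Array) -> Tuple[Array, Array]:
        codes = self.encoder(images)
        fg_image = self.fg_module(codes, fg_feats)
        bg_image = self.bg_module(codes, bg_feats)
        fused = self.fusion(fg_image, bg_image)
        reconstructed = self.anti(fused)  # feeds the anti-desensitization loss during training
        return fused, reconstructed


# Usage with toy stand-ins: K = 3 single-channel 8 x 8 images.
K, H, W = 3, 8, 8
pipeline = FusionDesensitizationPipeline(
    encoder=lambda imgs: imgs,
    fg_module=lambda codes, feats: codes.mean(axis=0),
    bg_module=lambda codes, feats: codes.mean(axis=0),
    fusion=lambda fg, bg: 0.5 * (fg + bg),
    anti=lambda fused: np.repeat(fused[None], K, axis=0),
)
images = np.random.default_rng(0).random((K, H, W))
fused, reconstructed = pipeline.forward(images, fg_feats=np.zeros((K, 4)), bg_feats=np.zeros((K, 4)))
```

One fused image comes out of K inputs, and the anti-desensitization module expands it back into K reconstructions. The desensitization and anti-desensitization losses compare these outputs against the originals.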

[0158] In step S640, the remaining target recognition images are desensitized as individual images.

[0159] In the example embodiment, the remaining target recognition images are each desensitized using a single-image desensitization model. This model can be a convolutional neural network model or another suitable model such as the Transformer model.

[0160] Furthermore, the training of a single-image desensitization model mainly includes the following parts:

[0161] (1) Model structure: The model structure includes a desensitization module and an anti-desensitization module;

[0162] (2) Input and output: The input of the desensitization module is an original image, and the output is a desensitized image; the input of the anti-desensitization module is a desensitized image, and the output is an image before desensitization.

[0163] (3) Loss function: The loss function consists of a desensitization loss and an anti-desensitization loss. The desensitization loss aims to make the similarity between the desensitized image and the target recognition image before desensitization as low as possible, and the anti-desensitization loss aims to make the anti-desensitized image as consistent as possible with the original target recognition image.

[0164] (4) Training method: Based on the above model structure and loss function, the model is trained with the Adam optimizer until it converges.
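The Adam optimizer named in step (4) keeps running estimates of the mean and uncentered variance of each gradient and scales every step by their bias-corrected ratio. The sketch below applies the standard Adam update to a single scalar parameter with common default hyperparameters; it is illustrative only, since a real model would use a deep-learning framework's built-in optimizer, and the fixed step count merely stands in for "until the model converges".

```python
import math
from typing import Callable


def adam_minimize(
    grad: Callable[[float], float],
    w0: float,
    lr: float = 0.1,
    beta1: float = 0.9,
    beta2: float = 0.999,
    eps: float = 1e-8,
    steps: int = 1000,
) -> float:
    """Minimize a scalar function with the Adam update rule, given its gradient."""
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g       # first-moment (mean) estimate
        v = beta2 * v + (1 - beta2) * g * g   # second-moment (uncentered variance) estimate
        m_hat = m / (1 - beta1 ** t)          # bias correction for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w
```

For example, `adam_minimize(lambda w: 2 * (w - 3), 0.0)` minimizes (w - 3)^2 starting from w = 0 and returns a value close to 3.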

[0165] In step S650, the trained model is deployed to a terminal device or server.

[0166] In the example embodiment, the trained foreground-background similarity analysis model, image selection model, fusion desensitization model, and single-image desensitization model are deployed to a terminal device or server. For example, all models can be deployed to the terminal device or to the server, or some models can be deployed to the terminal device and the rest to the server.

[0167] Furthermore, the N target recognition images acquired simultaneously by different devices within the same area are processed on a single device, such as a terminal device or a server. The models are applied as follows:

[0168] (1) K-image selection: Using the above image selection model, K target recognition images are selected from the N input target recognition images as input to the fusion desensitization model;

[0169] (2) Joint encryption: Input K target recognition images into the fusion desensitization model to obtain a desensitized image;

[0170] (3) Single-image encryption: Input the remaining N-K images into the single-image desensitization model to obtain N-K desensitized images;

[0171] (4) Upload N-K+1 desensitized images to the cloud server for desensitization and calculation.
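The four application steps above can be sketched as a small pipeline. This is an illustration under stated assumptions: `select_k`, `fuse_desensitize`, and `single_desensitize` are hypothetical stand-ins for the trained image selection, fusion desensitization, and single-image desensitization models, and images are treated as opaque values.

```python
from typing import Callable, List, Sequence, TypeVar

Image = TypeVar("Image")


def desensitize_batch(
    images: Sequence[Image],
    select_k: Callable[[Sequence[Image]], List[int]],
    fuse_desensitize: Callable[[List[Image]], Image],
    single_desensitize: Callable[[Image], Image],
) -> List[Image]:
    """Return the N-K+1 desensitized images to upload for N input images."""
    # (1) Indices of the K images picked by the (hypothetical) selection model.
    chosen = select_k(images)
    chosen_set = set(chosen)
    # (2) Joint encryption: the K selected images become one fused desensitized image.
    joint = fuse_desensitize([images[i] for i in chosen])
    # (3) Single-image encryption for the remaining N-K images.
    rest = [single_desensitize(img) for i, img in enumerate(images) if i not in chosen_set]
    # (4) 1 + (N - K) images are uploaded.
    return [joint] + rest
```

With five inputs and K = 2, the function returns four images, matching the N-K+1 count in step (4).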

[0172] According to the technical solution of the example embodiment shown in Figure 6, multiple target recognition images collected from the same area are jointly desensitized: the foreground and background of the multiple target recognition images are decoupled, and fusion desensitization is performed on them based on the similarity of their foregrounds and backgrounds. As a result, attackers cannot learn the correspondence between a desensitized image and the original images during the desensitization stage, which makes it difficult for them to train an effective anti-desensitization model to reverse the desensitization. This increases the attacker's attack cost and improves the security of privacy protection for biometric images.

[0173] The foregoing has described specific embodiments of this specification. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in a different order than that shown in the embodiments and may still achieve the desired result. Furthermore, the processes depicted in the drawings do not necessarily require the specific or sequential order shown to achieve the desired result. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

[0174] The image processing apparatus provided in the embodiments of this specification is described in detail below with reference to Figure 7 and the system architecture shown in Figure 1. It should be noted that the image processing apparatus shown in Figure 7 is used to perform the methods of the embodiments shown in Figures 2-6 of this specification. For ease of explanation, only the parts related to the embodiments of this specification are shown; for specific technical details not disclosed here, please refer to the embodiments shown in Figures 2-6 of this specification.

[0175] Please refer to Figure 7, which is a schematic structural diagram of an image processing apparatus provided in an embodiment of this specification. As shown in Figure 7, the image processing apparatus 700 in this embodiment may include: an image acquisition module 710, a feature determination module 720, a foreground/background decoupling and desensitization module 730, and a fusion processing module 740, where:

[0176] Image acquisition module 710 is configured to acquire a first number of target recognition images, wherein the target recognition images are images of target objects collected in a predetermined area;

[0177] The feature determination module 720 is configured to determine the image features of each target recognition image in the first number of target recognition images, the image features including foreground features and background features;

[0178] The foreground and background decoupling and desensitization module 730 is configured to perform fusion and desensitization processing on the foreground features and background features of the first number of target recognition images respectively, and generate corresponding foreground fusion desensitization images and background fusion desensitization images;

[0179] The fusion processing module 740 is configured to perform fusion processing on the foreground fused desensitized image and the background fused desensitized image to generate a fused desensitized image corresponding to the first number of target recognition images.

In some example embodiments, based on the above scheme, the foreground/background decoupling and desensitization module 730 is configured to:

[0180] The foreground features of each target recognition image in the first number of target recognition images are fused to generate a corresponding foreground fused image;

[0181] The background features of each target recognition image in the first number of target recognition images are fused to generate a corresponding background fused image;

[0182] The foreground fusion image and the background fusion image are respectively desensitized to generate corresponding foreground fusion desensitized images and background fusion desensitized images.
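A minimal sketch of this decoupled fuse-then-desensitize scheme follows. It assumes that foreground and background features are equally sized flat float lists, that fusion is an element-wise mean over the K images, and that the final fusion step composites the two desensitized maps with a per-pixel foreground mask; the specification fixes none of these choices, and `desensitize` is a hypothetical placeholder for the learned desensitization sub-models.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def fuse_mean(features: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of K equally sized feature maps (one assumed fusion rule)."""
    k = len(features)
    return [sum(values) / k for values in zip(*features)]


def decoupled_fusion_desensitize(
    fg_features: Sequence[Sequence[float]],
    bg_features: Sequence[Sequence[float]],
    mask: Sequence[float],
    desensitize: Callable[[Vector], Vector],
) -> Vector:
    # Fuse the K foreground maps and the K background maps separately.
    fg_fused = fuse_mean(fg_features)
    bg_fused = fuse_mean(bg_features)
    # Desensitize each fused map on its own.
    fg_desens = desensitize(fg_fused)
    bg_desens = desensitize(bg_fused)
    # Composite into one image using a per-pixel foreground mask in [0, 1].
    return [m * f + (1.0 - m) * b for m, f, b in zip(mask, fg_desens, bg_desens)]
```

Because fusion happens before desensitization, the output is a single image from which no one-to-one mapping back to any individual input is directly available, which is the property paragraph [0172] relies on.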

[0183] In some example embodiments, based on the above solution, the device 700 further includes:

[0184] The image acquisition module is configured to acquire a second number of target recognition images before acquiring a first number of target recognition images, wherein the second number is greater than the first number;

[0185] A similarity determination module is configured to determine the similarity between every two target recognition images in the second number of target recognition images;

[0186] The image filtering module is configured to select the first number of target recognition images from the second number of target recognition images based on the similarity.

[0187] In some example embodiments, based on the above scheme, the similarity determination module includes:

[0188] The feature determination unit is configured to determine image features of each target recognition image in the second number of target recognition images, the image features including foreground features and background features;

[0189] The foreground-background similarity determination unit is configured to determine the foreground similarity of the foreground features and the background similarity of the background features for every two target recognition images in the second number of target recognition images.

[0190] The image filtering module is also configured to:

[0191] The first number of target recognition images are selected from the second number of target recognition images based on the foreground similarity and the background similarity.

[0192] In some example embodiments, based on the above scheme, the image filtering module is further configured as follows:

[0193] Based on the foreground similarity and the background similarity, determine the foreground similarity matrix and background similarity matrix corresponding to the second number of target recognition images;

[0194] Based on the foreground similarity matrix and the background similarity matrix, the similarity features of the second number of target recognition images are determined;

[0195] The first number of target recognition images are selected from the second number of target recognition images based on the similarity feature.
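The similarity matrices and selection step could be sketched as follows. This is a hedged illustration rather than the specification's method: it assumes cosine similarity between feature vectors, a hypothetical weight `alpha` combining the foreground and background matrices, each image's mean similarity to the other images as its similarity feature, and a greedy heuristic in place of a trained image selection model.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def similarity_matrix(features: Sequence[Sequence[float]]) -> List[List[float]]:
    """N x N matrix of pairwise cosine similarities."""
    return [[cosine(a, b) for b in features] for a in features]


def select_similar_images(
    fg: Sequence[Sequence[float]],
    bg: Sequence[Sequence[float]],
    k: int,
    alpha: float = 0.5,  # hypothetical foreground/background weighting
) -> List[int]:
    """Greedily pick k of the N images with the highest combined similarity."""
    fg_sim, bg_sim = similarity_matrix(fg), similarity_matrix(bg)
    n = len(fg)
    # Combined matrix: weighted sum of the foreground and background similarity matrices.
    combined = [[alpha * fg_sim[i][j] + (1 - alpha) * bg_sim[i][j] for j in range(n)]
                for i in range(n)]
    # Similarity feature per image: mean similarity to all other images.
    score = [sum(combined[i][j] for j in range(n) if j != i) / (n - 1) for i in range(n)]
    # Seed with the most "central" image, then add whichever image is most similar
    # to the images already chosen.
    chosen = [max(range(n), key=lambda i: score[i])]
    while len(chosen) < k:
        rest = [i for i in range(n) if i not in chosen]
        chosen.append(max(rest, key=lambda i: sum(combined[i][c] for c in chosen)))
    return sorted(chosen)
```

Given four feature vectors of which three point in nearly the same direction, selecting k = 3 returns those three indices and leaves out the outlier. A threshold on `score`, as in claim 1, could be added to reject candidates that are not similar enough.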

[0196] In some example embodiments, based on the above scheme, the image features further include foreground category features and background category features, and the feature determination module 720 is further configured to:

[0197] Determine the foreground category features and background category features of each target recognition image in the first number of target recognition images.

[0198] In some example embodiments, based on the above scheme, the target recognition image is a biometric image, and the target object includes a human face object.

[0199] In some example embodiments, based on the above solution, the device 700 further includes:

[0200] The anti-desensitization module is used to input the fused desensitized image into a pre-trained anti-desensitization model to obtain the first number of target recognition images. The anti-desensitization model is an anti-desensitization model trained using the first number of target recognition images as samples.

[0201] The above is an illustrative embodiment of an image processing apparatus according to this specification. It should be noted that the technical solution of this image processing apparatus and the technical solution of the image processing method described above belong to the same concept. Details not described in detail in the technical solution of the image processing apparatus can be found in the description of the technical solution of the image processing method described above.

[0202] Figure 8 is a schematic structural diagram of an image model training device provided according to an embodiment of this specification. As shown in Figure 8, the image model includes an image fusion and desensitization model. The image model training device 800 in this embodiment may include: an image acquisition module 810, a feature determination module 820, a foreground/background decoupling and desensitization module 830, a fusion processing module 840, a model loss determination module 850, and a model training module 860, where:

[0203] Image acquisition module 810 is configured to acquire a first number of target recognition images, wherein the target recognition images are images of target objects collected in a predetermined area;

[0204] The feature determination module 820 is configured to determine the image features of each target recognition image in the first number of target recognition images, the image features including foreground features and background features;

[0205] The foreground and background decoupling and desensitization module 830 is configured to input the image features of the first number of target recognition images into the image fusion and desensitization model to obtain the corresponding foreground fusion desensitization image and background fusion desensitization image;

[0206] The fusion processing module 840 is configured to perform fusion processing on the foreground fused desensitized image and the background fused desensitized image to generate a fused desensitized image corresponding to the first number of target recognition images;

[0207] The model loss determination module 850 is configured to determine the model loss of the image model based on the fused desensitized image and the first number of target recognition images, wherein the model loss includes the desensitization loss;

[0208] The model training module 860 is configured to train the image model based on the model loss.

[0209] In some example embodiments, based on the above scheme, the image fusion desensitization model includes a foreground fusion desensitization sub-model and a background fusion desensitization sub-model, and the foreground-background decoupling desensitization module 830 is configured as follows:

[0210] The image features and corresponding foreground features of the first number of target recognition images are input into the foreground fusion desensitization sub-model to obtain the foreground fusion desensitization image;

[0211] The image features of the first number of target recognition images and the corresponding background features are input into the background fusion desensitization sub-model to obtain the background fusion desensitization image.

[0212] In some example embodiments, based on the above scheme, the image model further includes an anti-desensitization model, the model loss further includes an anti-desensitization loss, and the device 800 further includes:

[0213] The anti-desensitization module is used to input the fused desensitized image into the anti-desensitization model to obtain the first number of predicted anti-desensitized images;

[0214] The anti-desensitization loss determination module is used to determine the anti-desensitization loss of the anti-desensitization model based on the image differences between the first number of predicted anti-desensitized images and the first number of target recognition images.

[0215] The model training module 860 is also configured to:

[0216] The parameters of the image model are adjusted based on the desensitization loss and the anti-desensitization loss.

[0217] In some example embodiments, based on the above solution, the device 800 further includes:

[0218] The image acquisition module is configured to acquire a second number of target recognition images before acquiring a first number of target recognition images, wherein the second number is greater than the first number;

[0219] A similarity determination module is configured to determine the similarity between every two target recognition images in the second number of target recognition images;

[0220] The image selection module is configured to input the similarity and the image features of the target recognition image into the image selection model to obtain the first number of target recognition images.

[0221] In some example embodiments, based on the above scheme, the similarity determination module is configured as follows:

[0222] Determine the image features of each target recognition image in the second number of target recognition images, the image features including foreground features and background features;

[0223] Determine the foreground similarity of the foreground features and the background similarity of the background features for every two target recognition images in the second number of target recognition images.

[0224] In some example embodiments, based on the above scheme, the image selection model includes a similarity encoding module and a prediction selection module, the loss function of the image selection model is mutual information, and the image selection module is further configured to:

[0225] The foreground similarity and background similarity of each pair of target recognition images are input into the similarity encoding module to obtain similarity features corresponding to the second number of target recognition images;

[0226] The similarity features, along with the corresponding foreground and background features, are input into the prediction and selection module to obtain the first number of target recognition images.

[0227] Determine the mutual information between the first number of target recognition images;

[0228] Based on the mutual information, the parameters of the image selection model are adjusted.
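The specification names mutual information as the selection model's loss but does not say how it is estimated or whether it is maximized or minimized. One common plug-in estimator treats the quantized pixel values of two images as paired samples and computes MI from their joint histogram; the sketch below uses that estimator (an assumption) and sums it over all pairs in a candidate selection.

```python
import math
from collections import Counter
from typing import Sequence


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Plug-in MI estimate (in nats) between two equally long sequences of quantized pixels."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def selection_mutual_information(images: Sequence[Sequence[int]]) -> float:
    """Sum of pairwise MI over the selected images (one possible selection objective)."""
    total = 0.0
    for i in range(len(images)):
        for j in range(i + 1, len(images)):
            total += mutual_information(images[i], images[j])
    return total
```

Identical images give an MI equal to their entropy, and statistically independent images give zero, so this quantity measures how much the selected images share; a differentiable estimator would be needed to backpropagate it into the selection model.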

[0229] In some example embodiments, based on the above scheme, the image features further include a foreground category and a background category, and the feature determination module 820 is further configured to:

[0230] The feature vector of the target recognition image is input into the foreground feature encoder to obtain the foreground features and the foreground category;

[0231] The feature vector of the target recognition image is input into the background feature encoder to obtain the background features and background category of the target recognition image.
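To illustrate the two-branch encoding, the sketch below assumes each encoder is a single linear projection with a ReLU, followed by a linear classification head whose argmax gives the category. The weights are hypothetical placeholders; the foreground and background encoders would be two instances of the same function with separately trained weights.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def linear(weights: Matrix, x: Sequence[float]) -> List[float]:
    """y = W x for a row-major weight matrix."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def encode(x: Sequence[float], proj: Matrix, cls_head: Matrix) -> Tuple[List[float], int]:
    """Return (features, category index) for one branch (foreground or background)."""
    features = [max(0.0, v) for v in linear(proj, x)]  # ReLU(W_proj x)
    logits = linear(cls_head, features)
    category = max(range(len(logits)), key=lambda i: logits[i])  # argmax over classes
    return features, category
```

A call such as `fg_features, fg_category = encode(x, FG_PROJ, FG_HEAD)`, with hypothetical weight matrices `FG_PROJ` and `FG_HEAD`, yields the foreground branch; the background branch is the same call with its own weights.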

[0232] The above is an illustrative scheme of an image model training device according to an embodiment of this specification. It should be noted that the technical solution of this image model training device and the technical solution of the image model training method described above belong to the same concept. Details not described in detail in the technical solution of the image model training device can be found in the description of the technical solution of the image model training method described above.

[0233] This specification also provides a computer storage medium that can store multiple program instructions adapted to be loaded by a processor to execute the method steps of the embodiments shown in Figures 2-6 above. For the specific execution process, refer to the description of the embodiments shown in Figures 2-6, which is not repeated here.

[0234] This specification also provides a computer program product storing at least one instruction, which is loaded by a processor to execute the image processing method of the embodiments shown in Figures 2-6 above. For the specific execution process, refer to the description of the embodiments shown in Figures 2-6, which is not repeated here.

[0235] Please refer to Figure 9, which is a schematic structural diagram of an electronic device provided in an exemplary embodiment of this specification. The electronic device in this specification may include one or more of the following components: a processor 910, a memory 920, an input device 930, an output device 940, and a bus 950. The processor 910, memory 920, input device 930, and output device 940 may be connected via the bus 950.

[0236] Processor 910 may include one or more processing cores. Processor 910 connects to various parts of the electronic device through various interfaces and lines, and performs various functions and processes data of electronic device 900 by running or executing instructions, programs, code sets, or instruction sets stored in memory 920, and by calling data stored in memory 920. Optionally, processor 910 may be implemented using at least one hardware form of digital signal processing (DSP), field-programmable gate array (FPGA), or programmable logic array (PLA). Processor 910 may integrate one or more of the following: central processing unit (CPU), graphics processing unit (GPU), and modem. The CPU mainly handles the operating system, user interface, and applications; the GPU is responsible for rendering and drawing the displayed content; and the modem is used for wireless communication. It is understood that the modem may also not be integrated into processor 910 and may be implemented separately through a communication chip.

[0237] The memory 920 may include random access memory (RAM) or read-only memory (ROM). Optionally, the memory 920 includes a non-transitory computer-readable storage medium. The memory 920 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 920 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (e.g., a touch function, a sound playback function, or an image playback function), instructions for implementing the method embodiments described above, and so on. The operating system may be an Android system (including systems deeply customized from Android), an iOS system (including systems deeply customized from iOS), or another system.

[0238] In order for the operating system to distinguish the specific application scenarios of third-party applications, it is necessary to establish data communication between the third-party applications and the operating system. This would allow the operating system to obtain the current scenario information of the third-party applications at any time, and then perform targeted system resource adaptation based on the current scenario.

[0239] The input device 930 is used to receive input instructions or data, and includes, but is not limited to, a keyboard, mouse, camera, microphone, or touch device. The output device 940 is used to output instructions or data, and includes, but is not limited to, a display device and a speaker. In one example, the input device 930 and the output device 940 can be combined, and both the input device 930 and the output device 940 can be a touch display screen.

[0240] In addition, those skilled in the art will understand that the structure of the electronic device shown in the above figures does not constitute a limitation on the electronic device. The electronic device may include more or fewer components than shown, or combine certain components, or have different component arrangements. For example, the electronic device may also include radio frequency circuits, input units, sensors, audio circuits, Wireless Fidelity (WiFi) modules, power supplies, Bluetooth modules, etc., which will not be described in detail here.

[0241] In the electronic device shown in Figure 9, the processor 910 can be used to invoke an image processing application or an image model training application stored in the memory 920, for example to perform the following operations:

[0242] Acquire a first number of target recognition images, wherein the target recognition images are images of target objects collected in a predetermined area;

[0243] Determine the image features of each target recognition image in the first number of target recognition images, the image features including foreground features and background features;

[0244] The foreground features and background features of the first number of target recognition images are respectively fused and desensitized to generate corresponding foreground fused desensitized images and background fused desensitized images;

[0245] The foreground fused desensitized image and the background fused desensitized image are fused to generate a fused desensitized image corresponding to the first number of target recognition images.

[0246] The above is an illustrative embodiment of an electronic device according to this specification. It should be noted that the technical solution of this electronic device belongs to the same concept as the technical solution of the image processing method or image model training method described above. Details not described in detail in the technical solution of the electronic device can be found in the description of the technical solution of the image processing method or image model training method described above.

[0247] Those skilled in the art will understand that all or part of the processes in the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods. The storage medium for the computer program can be a magnetic disk, optical disk, read-only memory, or random access memory, etc.

[0248] The above-disclosed embodiments are merely preferred embodiments of this specification and should not be construed as limiting the scope of this specification. Therefore, any equivalent variations made in accordance with the claims of this specification shall still fall within the scope of this specification.


Claims

1. An image processing method, comprising: Acquire a first number of target recognition images, wherein the target recognition images are images of target objects collected in a predetermined area, and the first number of target recognition images include target recognition images selected from multiple target recognition images with a similarity greater than a predetermined threshold; Determine the image features of each target recognition image in the first number of target recognition images, the image features including foreground features and background features; The foreground features of the first number of target recognition images are fused and desensitized, and the background features of the first number of target recognition images are fused and desensitized to generate corresponding foreground fused and desensitized images and background fused and desensitized images. The fusion and desensitization process means fusing the foreground features or the background features of the first number of target recognition images and desensitizing the fused image. The foreground fused desensitized image and the background fused desensitized image are fused to generate a fused desensitized image corresponding to the first number of target recognition images.

2. The method according to claim 1, wherein, The step of fusing and desensitizing the foreground features of the first number of target recognition images and fusing and desensitizing the background features of the first number of target recognition images to generate corresponding foreground fused desensitized images and background fused desensitized images includes: The foreground features of each target recognition image in the first number of target recognition images are fused to generate a corresponding foreground fused image; The background features of each target recognition image in the first number of target recognition images are fused to generate a corresponding background fused image; The foreground fusion image and the background fusion image are respectively desensitized to generate corresponding foreground fusion desensitized images and background fusion desensitized images.

3. The method according to claim 1, wherein, Before acquiring the first number of target recognition images, the method further includes: Acquire a second number of the target recognition images, wherein the second number is greater than the first number; Determine the similarity between any two target recognition images in the second number of target recognition images; The first number of target recognition images are selected from the second number of target recognition images based on the similarity.

4. The method according to claim 3, wherein, Determining the similarity between any two target recognition images in the second number of target recognition images includes: Determine the image features of each target recognition image in the second number of target recognition images, the image features including foreground features and background features; Determine the foreground similarity of the foreground features and the background similarity of the background features for every two target recognition images in the second number of target recognition images. The step of selecting the first number of target recognition images from the second number of target recognition images based on the similarity includes: The first number of target recognition images are selected from the second number of target recognition images based on the foreground similarity and the background similarity.

5. The method according to claim 4, wherein, The step of selecting the first number of target recognition images from the second number of target recognition images based on the foreground similarity and the background similarity includes: Based on the foreground similarity and the background similarity, determine the foreground similarity matrix and background similarity matrix corresponding to the second number of target recognition images; Based on the foreground similarity matrix and the background similarity matrix, the similarity features of the second number of target recognition images are determined; The first number of target recognition images are selected from the second number of target recognition images based on the similarity feature.

6. The method according to claim 1, wherein, The image features also include foreground category features and background category features. Determining the image features of each target recognition image in the first number of target recognition images includes: Determine the foreground category features and background category features of each target recognition image in the first number of target recognition images.

7. The method according to any one of claims 1 to 6, wherein, The target recognition image is a biometric image, and the target object includes a human face.

8. The method according to any one of claims 1 to 6, wherein, The method further includes: The fused desensitized image is input into a pre-trained anti-desensitization model to obtain the first number of target recognition images. The anti-desensitization model is an anti-desensitization model trained using the first number of target recognition images as samples.

9. An image model training method, wherein, The image model includes an image fusion desensitization model, and the method includes: Acquire a first number of target recognition images, wherein the target recognition images are images of target objects collected in a predetermined area, and the first number of target recognition images include target recognition images selected from multiple target recognition images with a similarity greater than a predetermined threshold; Determine the image features of each target recognition image in the first number of target recognition images, the image features including foreground features and background features; The image features of the first number of target recognition images are input into the image fusion desensitization model for fusion desensitization processing to obtain the corresponding foreground fusion desensitization image and background fusion desensitization image. The fusion desensitization processing means fusing the foreground features or the background features of the first number of target recognition images and desensitizing the fused image. The foreground fused desensitized image and the background fused desensitized image are fused to generate the fused desensitized image corresponding to the first number of target recognition images; Based on the fused desensitized image and the first number of target recognition images, the model loss of the image model is determined, and the model loss includes the desensitization loss; The image model is trained based on the model loss.

10. The method according to claim 9, wherein, The image fusion desensitization model includes a foreground fusion desensitization sub-model and a background fusion desensitization sub-model. The step of inputting the image features of the first number of target recognition images into the image fusion desensitization model for fusion desensitization processing to obtain corresponding foreground fusion desensitized images and background fusion desensitized images includes: The image features and corresponding foreground features of the first number of target recognition images are input into the foreground fusion desensitization sub-model to obtain the foreground fusion desensitization image; The image features of the first number of target recognition images and the corresponding background features are input into the background fusion desensitization sub-model to obtain the background fusion desensitization image.

11. The method according to claim 9, wherein the image model further includes an anti-desensitization model, the model loss further includes an anti-desensitization loss, and the method further includes: inputting the fused desensitized image into the anti-desensitization model to obtain a first number of predicted anti-desensitized images; and determining the anti-desensitization loss of the anti-desensitization model based on image differences between the first number of predicted anti-desensitized images and the first number of target recognition images; and training the image model based on the model loss includes: adjusting parameters of the image model based on the desensitization loss and the anti-desensitization loss.
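Claim 11 says the parameters are adjusted based on both losses but not how they are combined. One common pattern, assumed here, is an adversarial min-max split: the anti-desensitization model (the simulated attacker) minimizes its reconstruction error, while the desensitization side is rewarded when that error is large. Mean squared error as the "image difference", the copy-the-input attacker, and the weighting are all hypothetical choices for illustration.

```python
from typing import List, Tuple

Vector = List[float]


def mse(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def naive_anti_desensitization_model(fused: Vector, count: int) -> List[Vector]:
    """Hypothetical attacker: predicts every original as a copy of the fused image."""
    return [list(fused) for _ in range(count)]


def anti_desensitization_loss(predicted: List[Vector], originals: List[Vector]) -> float:
    """Mean reconstruction error between predicted anti-desensitized images and originals."""
    return sum(mse(p, o) for p, o in zip(predicted, originals)) / len(originals)


def adversarial_objectives(desens_loss: float, anti_loss: float,
                           weight: float = 1.0) -> Tuple[float, float]:
    """Assumed min-max split: returns (desensitizer objective, attacker objective).

    Both values are meant to be minimized by their respective models.
    """
    desensitizer_objective = desens_loss - weight * anti_loss
    attacker_objective = anti_loss
    return desensitizer_objective, attacker_objective


originals = [[1.0, 2.0], [3.0, 4.0]]
fused = [2.0, 3.0]
predicted = naive_anti_desensitization_model(fused, len(originals))
anti_loss = anti_desensitization_loss(predicted, originals)
desens_obj, attacker_obj = adversarial_objectives(desens_loss=0.5, anti_loss=anti_loss)
```

Under this reading, a desensitizer that drives the attacker's reconstruction error up is what "increasing the attacker's attack cost" looks like in training terms.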

12. The method according to claim 9, wherein, before acquiring the first number of target recognition images, the method further includes: acquiring a second number of target recognition images, the second number being greater than the first number; determining the similarity between every two target recognition images in the second number of target recognition images; and inputting the similarities and the image features of the target recognition images into an image selection model to obtain the first number of target recognition images.
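The selection step of claim 12 (narrowing a second number of candidates to a first number of mutually similar images) can be sketched with a greedy heuristic. This deliberately replaces the patent's learned image selection model with a plain rule: cosine similarity is assumed as the similarity measure, and the threshold echoes the "similarity greater than a predetermined threshold" condition of claim 9. All function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pairwise_similarity(features: List[Vector]) -> Dict[Tuple[int, int], float]:
    """Cosine similarity for every pair (i, j) with i < j."""
    return {(i, j): cosine(features[i], features[j])
            for i, j in combinations(range(len(features)), 2)}


def select_images(features: List[Vector], k: int, threshold: float) -> List[int]:
    """Heuristic stand-in for the learned image selection model.

    Greedily grows a group of k images whose pairwise similarities all exceed
    `threshold`, trying the most "central" images as seeds first.
    Returns an empty list if no such group is found.
    """
    sims = pairwise_similarity(features)

    def sim(i: int, j: int) -> float:
        return sims[(min(i, j), max(i, j))]

    n = len(features)
    order = sorted(range(n), key=lambda i: -sum(sim(i, j) for j in range(n) if j != i))
    for seed in order:
        group = [seed]
        for candidate in order:
            if len(group) == k:
                break
            if candidate != seed and all(sim(candidate, g) > threshold for g in group):
                group.append(candidate)
        if len(group) == k:
            return sorted(group)
    return []


# Second number = 4 candidate images; select a first number k = 3 of them.
candidates = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.95, 0.05]]
selected = select_images(candidates, k=3, threshold=0.9)
```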

13. The method according to claim 12, wherein determining the similarity between every two target recognition images in the second number of target recognition images includes: determining image features of each of the second number of target recognition images, the image features including foreground features and background features; and determining, for every two target recognition images in the second number of target recognition images, a foreground similarity between their foreground features and a background similarity between their background features.
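The decoupled similarity of claim 13 keeps two scores per image pair instead of one. A minimal sketch, assuming cosine similarity for both the foreground and the background comparison (the claim does not name a measure):

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def decoupled_similarities(
    fg_features: List[Vector], bg_features: List[Vector]
) -> Dict[Tuple[int, int], Tuple[float, float]]:
    """Map each image pair (i, j), i < j, to (foreground similarity, background similarity)."""
    return {
        (i, j): (cosine(fg_features[i], fg_features[j]),
                 cosine(bg_features[i], bg_features[j]))
        for i, j in combinations(range(len(fg_features)), 2)
    }


fg = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
bg = [[1.0, 1.0], [2.0, 2.0], [1.0, -1.0]]
sims = decoupled_similarities(fg, bg)
```

Keeping the two scores separate lets a later stage distinguish, for example, the same subject captured against different backgrounds from different subjects captured against the same background.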

14. The method according to claim 13, wherein the image selection model includes a similarity encoding module and a prediction selection module, and the loss function of the image selection model is mutual information; inputting the similarities and the image features of the target recognition images into the image selection model to obtain the first number of target recognition images includes: inputting the foreground similarity and the background similarity corresponding to every two target recognition images into the similarity encoding module to obtain similarity features corresponding to each of the second number of target recognition images; and inputting the similarity features, together with the corresponding foreground features and background features, into the prediction selection module to obtain the first number of target recognition images; and the method further includes: determining the mutual information between the first number of target recognition images; and adjusting parameters of the image selection model based on the mutual information.
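The module layout of claim 14 can be sketched as below. In the patent both modules are learned; here they are replaced by fixed, hypothetical rules (the encoder averages each image's pairwise foreground and background similarities, the selector keeps the top-k images by summed score). Mutual information is estimated with a standard histogram (plug-in) estimator over quantized pixel values, which is an assumed choice. The claim also does not say whether the mutual information is maximized or minimized, and this plug-in estimate is not differentiable, so gradient-based training would need a differentiable estimator instead.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

PairSims = Dict[Tuple[int, int], Tuple[float, float]]


def similarity_encoding(pair_sims: PairSims, n: int) -> List[Tuple[float, float]]:
    """Hypothetical encoder: per-image mean foreground and mean background similarity."""
    fg_total = [0.0] * n
    bg_total = [0.0] * n
    for (i, j), (fg_sim, bg_sim) in pair_sims.items():
        fg_total[i] += fg_sim
        fg_total[j] += fg_sim
        bg_total[i] += bg_sim
        bg_total[j] += bg_sim
    return [(fg_total[i] / (n - 1), bg_total[i] / (n - 1)) for i in range(n)]


def predict_select(similarity_features: List[Tuple[float, float]], k: int) -> List[int]:
    """Hypothetical selector: rank images by summed similarity features and keep the top k."""
    ranked = sorted(range(len(similarity_features)),
                    key=lambda i: -sum(similarity_features[i]))
    return sorted(ranked[:k])


def mutual_information(xs: Sequence[int], ys: Sequence[int]) -> float:
    """Plug-in mutual information estimate, in nats, between two paired discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(c / n * math.log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def mean_pairwise_mi(images: List[Sequence[int]]) -> float:
    """Average mutual information over every pair of selected (quantized) images."""
    pairs = list(combinations(images, 2))
    return sum(mutual_information(a, b) for a, b in pairs) / len(pairs)


pair_sims = {(0, 1): (1.0, 0.5), (0, 2): (0.0, 0.5), (1, 2): (0.0, 0.5)}
features = similarity_encoding(pair_sims, n=3)
selected = predict_select(features, k=2)
quantized = [[0, 1, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1]]
mi = mean_pairwise_mi([quantized[i] for i in selected])
```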

15. The method according to claim 9, wherein the image features further include a foreground category and a background category, and determining the image features of each of the first number of target recognition images includes: inputting a feature vector of the target recognition image into a foreground feature encoder to obtain the foreground features and the foreground category of the target recognition image; and inputting the feature vector of the target recognition image into a background feature encoder to obtain the background features and the background category of the target recognition image.
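The paired encoders of claim 15 each map one shared feature vector to a (features, category) pair. A minimal sketch, assuming each encoder is a linear projection followed by nearest-prototype classification; the class name, weights, and category labels ("face", "palm", "indoor", "outdoor") are all hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]


class FeatureEncoder:
    """Hypothetical encoder: a linear projection followed by nearest-prototype classification."""

    def __init__(self, weights: List[Vector], prototypes: Dict[str, Vector]):
        self.weights = weights        # shape: feature_dim x input_dim
        self.prototypes = prototypes  # category name -> prototype feature vector

    def __call__(self, x: Vector) -> Tuple[Vector, str]:
        features = [sum(w * xi for w, xi in zip(row, x)) for row in self.weights]
        # The category is the prototype closest to the features in squared Euclidean distance.
        category = min(
            self.prototypes,
            key=lambda c: sum((f - p) ** 2 for f, p in zip(features, self.prototypes[c])),
        )
        return features, category


foreground_encoder = FeatureEncoder(
    weights=[[1, 0, 0, 0], [0, 1, 0, 0]],
    prototypes={"face": [1.0, 1.0], "palm": [-1.0, -1.0]},
)
background_encoder = FeatureEncoder(
    weights=[[0, 0, 1, 0], [0, 0, 0, 1]],
    prototypes={"indoor": [0.0, 1.0], "outdoor": [1.0, 0.0]},
)

feature_vector = [0.9, 1.1, 0.2, 0.8]
fg_features, fg_category = foreground_encoder(feature_vector)
bg_features, bg_category = background_encoder(feature_vector)
```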

16. An image processing apparatus, comprising: an image acquisition module, configured to acquire a first number of target recognition images, wherein the target recognition images are images of target objects collected in a predetermined area, and the first number of target recognition images include target recognition images that are selected from multiple target recognition images and whose similarity is greater than a predetermined threshold; a feature determination module, configured to determine image features of each of the first number of target recognition images, the image features including foreground features and background features; a foreground and background decoupling and desensitization module, configured to perform fusion desensitization processing on the foreground features of the first number of target recognition images and on the background features of the first number of target recognition images respectively, generating a corresponding foreground fused desensitized image and background fused desensitized image, wherein the fusion desensitization processing means fusing the foreground features, or the background features, of the first number of target recognition images and desensitizing the fused image; and a fusion processing module, configured to fuse the foreground fused desensitized image and the background fused desensitized image to generate a fused desensitized image corresponding to the first number of target recognition images.

17. An image model training device, wherein the image model includes an image fusion desensitization model, and the device includes: an image acquisition module, configured to acquire a first number of target recognition images, wherein the target recognition images are images of target objects collected in a predetermined area, and the first number of target recognition images include target recognition images that are selected from multiple target recognition images and whose similarity is greater than a predetermined threshold; a feature determination module, configured to determine image features of each of the first number of target recognition images, the image features including foreground features and background features; a foreground and background decoupling and desensitization module, configured to input the image features of the first number of target recognition images into the image fusion desensitization model to obtain a corresponding foreground fused desensitized image and background fused desensitized image; a fusion processing module, configured to fuse the foreground fused desensitized image and the background fused desensitized image to generate a fused desensitized image corresponding to the first number of target recognition images; a model loss determination module, configured to determine a model loss of the image model based on the fused desensitized image and the first number of target recognition images, the model loss including a desensitization loss; and a model training module, configured to train the image model based on the model loss.

18. A computer storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps of the method according to any one of claims 1 to 15.

19. An electronic device, comprising a processor and a memory, wherein the memory stores a computer program adapted to be loaded by the processor to perform the steps of the method according to any one of claims 1 to 15.

20. A computer program product comprising instructions that, when run on a computer or processor, cause the computer or processor to perform the steps of the method according to any one of claims 1 to 15.