Clustering-based decision-making method and related equipment for deepfake image detection

A clustering-based deepfake image detection method, which extracts features from the face image and compares their distances to cluster centers, addresses the difficulty of detecting high-quality fake images and achieves accurate and efficient fake-image recognition.

CN116580462B (Active). Publication Date: 2026-06-30. HARBIN INST OF TECH SHENZHEN GRADUATE SCHOOL

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
HARBIN INST OF TECH SHENZHEN GRADUATE SCHOOL
Filing Date
2023-04-20
Publication Date
2026-06-30

Figure CN116580462B_ABST

Abstract

This invention provides a deepfake image detection method and related equipment based on clustering decision-making, belonging to the field of image processing technology. The method includes: acquiring a face image to be detected; performing image segmentation on the face image to be detected to obtain at least one local face image; inputting the face image to be detected and the local face image into a trained feature extraction model to obtain fusion features output by the feature extraction model; obtaining a first distance between the fusion feature and a trained first cluster center, and a second distance between the fusion feature and a trained second cluster center; determining the forgery detection result of the face image to be detected based on the first distance and the second distance, wherein the forgery detection result reflects whether the face image to be detected is a forgery image, the first cluster center reflects the fusion feature corresponding to the forgery image, and the second cluster center reflects the fusion feature corresponding to the real image. This invention can improve the generalization of deepfake image detection.

Description

Technical Field

[0001] This invention relates to the field of image processing technology, and in particular to a deepfake image detection method based on clustering decision and related equipment.

Background Technology

[0002] Deepfake technology can generate fake facial images, which malicious users can exploit to harm others. Currently, the human eye often cannot distinguish high-quality deepfake images, making the detection of deepfake facial images a crucial issue that urgently needs to be addressed in the industry.

Summary of the Invention

[0003] This invention provides a clustering-based deepfake image detection method and related equipment to address the shortcomings of existing technologies in detecting high-quality deepfake images, thereby achieving accurate detection of deepfake images.

[0004] This invention provides a method for detecting deepfake images based on clustering decision-making, comprising:

[0005] A face image to be detected is acquired, and the face image to be detected is segmented to obtain at least one local face image, wherein the local face image reflects the facial features in the face image to be detected;

[0006] The face image to be detected and the local face image are input into a trained feature extraction model to obtain the fused features output by the feature extraction model;

[0007] The first distance between the fusion feature and the trained first cluster center, and the second distance between the fusion feature and the trained second cluster center are obtained respectively. The forgery detection result of the face image to be detected is determined based on the first distance and the second distance. The forgery detection result reflects whether the face image to be detected is a forgery image. The first cluster center reflects the fusion feature corresponding to the forgery image, and the second cluster center reflects the fusion feature corresponding to the real image.

[0008] The trained feature extraction model, the trained first cluster center, and the trained second cluster center are trained based on multiple sets of training data. Each set of training data includes a sample face image to be detected and a forgery detection label corresponding to the sample face image. The training process of the feature extraction model includes:

[0009] An initial feature extraction model and a binary classifier are trained based on multiple sets of training data to obtain an intermediate feature extraction model. The binary classifier is used to output a forgery prediction result based on the fused features output by the feature extraction model. The forgery prediction result is used to classify the training data as real images or forged images.

[0010] The intermediate feature extraction model is trained based on multiple sets of training data to obtain the trained feature extraction model, the trained first cluster center, and the trained second cluster center.

[0011] According to the deepfake image detection method based on clustering decision-making provided by the present invention, training the intermediate feature extraction model based on the multiple sets of training data to obtain the trained feature extraction model, the trained first cluster center, and the trained second cluster center includes:

[0012] Based on the multiple sets of training data, the initial values of the first cluster center and the second cluster center are determined;

[0013] A target training data batch is determined from multiple sets of training data, wherein the target training data batch includes multiple sets of target training data;

[0014] The sample face image to be detected in the target training data is segmented to obtain sample local face images. The sample face image to be detected and the sample local face images are input into the intermediate feature extraction model to obtain the sample fusion features output by the intermediate feature extraction model.

[0015] The parameters of the intermediate feature extraction model are updated based on the sample fusion features and the forgery detection labels corresponding to each target training data in the target training data batch, and the first cluster center and the second cluster center are updated based on the sample fusion features and the forgery detection labels corresponding to each target training data in the target training data batch;

[0016] The step of determining the target training data batch in multiple sets of training data is repeated until the parameters of the intermediate feature extraction model, the first cluster center, and the second cluster center converge. The intermediate feature extraction model after parameter convergence is taken as the trained feature extraction model, the first cluster center after convergence is taken as the trained first cluster center, and the second cluster center after convergence is taken as the trained second cluster center.

[0017] According to the deepfake image detection method based on clustering decision-making provided by the present invention, determining the initial values of the first cluster center and the second cluster center based on the multiple sets of training data includes:

[0018] The multiple sets of training data are divided into a first set and a second set, wherein the forgery detection label of each set of training data in the first set indicates that the corresponding sample face image to be detected is a forged image, and the forgery detection label of each set of training data in the second set indicates that the corresponding sample face image to be detected is a real image;

[0019] Based on the intermediate feature extraction model, a first fusion feature is determined for each sample face image to be detected in the first set, and a second fusion feature is determined for each sample face image to be detected in the second set. The average of all first fusion features is used as the initial value of the first cluster center, and the average of all second fusion features is used as the initial value of the second cluster center.

[0020] According to the deepfake image detection method based on clustering decision-making provided by the present invention, updating the parameters of the intermediate feature extraction model based on the sample fusion feature and the forgery detection label corresponding to each target training data in the target training data batch includes:

[0021] The loss function is calculated based on the sample fusion feature and the forgery detection label corresponding to each target training data in the target training data batch;

[0022] The parameters of the intermediate feature extraction model are updated with the goal of minimizing the loss function;

[0023] The loss function includes a first loss function and a second loss function, wherein the first loss function is [formula not reproduced] and the second loss function is [formula not reproduced], where F represents the sample fusion feature corresponding to the target training data, Fake represents the set of sample fusion features corresponding to the target training data in the target training data batch whose forgery detection label is a forged image, Real represents the set of sample fusion features corresponding to the target training data in the target training data batch whose forgery detection label is a real image, CF represents the current first cluster center, CR represents the current second cluster center, and β is a constant.

[0024] According to the deepfake image detection method based on clustering decision provided by the present invention, updating the first cluster center and the second cluster center based on the sample fusion feature and the forgery detection label corresponding to each target training data in the target training data batch includes:

[0025] The first cluster center is updated based on the sample fusion features corresponding to all target training data in the target training data batch whose forgery detection label is a forged image.

[0026] The second cluster center is updated based on the sample fusion features corresponding to all target training data in the target training data batch whose forgery detection label is a real image.

[0027] According to the deepfake image detection method based on clustering decision provided by the present invention, updating the first cluster center according to the sample fusion features corresponding to all target training data in the target training data batch whose forgery detection labels indicate forged images includes:

[0028] The first cluster center is updated based on the first formula;

[0029] Updating the second cluster center based on the sample fusion features corresponding to all target training data in the target training data batch whose forgery detection labels indicate real images includes:

[0030] The second cluster center is updated based on the second formula;

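[0031] The first formula is $CF_t = (1-\alpha)\,CF_{t-1} + \alpha\,\mathrm{Avg}\big(\sum_{F \in \mathrm{Fake}} F\big)$, and the second formula is $CR_t = (1-\alpha)\,CR_{t-1} + \alpha\,\mathrm{Avg}\big(\sum_{F \in \mathrm{Real}} F\big)$;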

[0032] Here, $CF_t$ denotes the first cluster center after the update based on the t-th target training data batch, $CR_t$ denotes the second cluster center after the update based on the t-th target training data batch, F denotes a sample fusion feature corresponding to the target training data, Fake denotes the set of sample fusion features corresponding to the target training data in the target training data batch whose forgery detection label is a forged image, Real denotes the set of sample fusion features corresponding to the target training data in the target training data batch whose forgery detection label is a real image, Avg() denotes the average of the contents in parentheses, and α is a constant.
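As an illustration of [0031] and [0032], the following is a minimal Python sketch of one cluster-center update. It assumes fused features are NumPy vectors, labels are encoded as 1 for forged and 0 for real, and Avg(∑F) is read as the mean of the batch features in each set. The function name update_centers, the default α = 0.1, and the choice to leave a center unchanged when the batch contains no sample of its class are hypothetical, not taken from the patent.

```python
import numpy as np


def update_centers(cf_prev, cr_prev, feats, labels, alpha=0.1):
    """One moving-average update of the forged (CF) and real (CR) centers.

    cf_prev, cr_prev: current centers, shape (D,).
    feats: sample fusion features of the batch, shape (B, D).
    labels: 1 = forged, 0 = real (assumed encoding).
    alpha: the constant alpha; 0.1 is a placeholder value.
    """
    feats = np.asarray(feats, dtype=float)
    labels = np.asarray(labels)
    cf = np.array(cf_prev, dtype=float)
    cr = np.array(cr_prev, dtype=float)
    fake = feats[labels == 1]
    real = feats[labels == 0]
    if len(fake):  # assumption: keep CF unchanged if the batch has no forged samples
        cf = (1 - alpha) * cf + alpha * fake.mean(axis=0)
    if len(real):  # assumption: keep CR unchanged if the batch has no real samples
        cr = (1 - alpha) * cr + alpha * real.mean(axis=0)
    return cf, cr


cf, cr = update_centers(np.zeros(2), np.zeros(2),
                        [[1.0, 1.0], [3.0, 3.0], [-2.0, 0.0]], [1, 1, 0], alpha=0.5)
print(cf, cr)  # [1. 1.] [-1.  0.]
```

With α close to 0 each center moves slowly and behaves like a running average over many batches; with α = 1 each center is simply replaced by the current batch mean of its class.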

[0033] According to the deepfake image detection method based on clustering decision-making provided by the present invention, the feature extraction model includes a first frequency feature extraction module, a second frequency feature extraction module, a first spatial feature extraction module, and a second spatial feature extraction module; obtaining the fused features output by the feature extraction model includes:

[0034] The face image to be detected is input into the first spatial feature extraction module and the first frequency feature extraction module respectively. The first spatial feature of the face image to be detected is extracted based on the first spatial feature extraction module, and the first frequency feature of the face image to be detected is extracted based on the first frequency feature extraction module.

[0035] The local face image is input into the second spatial feature extraction module and the second frequency feature extraction module respectively. The second spatial feature of the local face image is extracted by the second spatial feature extraction module, and the second frequency feature of the local face image is extracted by the second frequency feature extraction module.

[0036] All the first spatial features, the first frequency features, the second spatial features, and the second frequency features are concatenated to obtain concatenated features, and the fused features are obtained based on the concatenated features.
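A minimal sketch of this concatenation and fusion step, using the sizes stated in [0067] (1*256 spatial features, 1*128 frequency features, a 1*200 fused feature) and assuming four local face images as in [0062]. The concatenation order, the function name fuse, and the plain linear projection with randomly initialized weights are assumptions for illustration only; in a trained model the weight and bias would be learned.

```python
import numpy as np

SPATIAL_DIM, FREQ_DIM, FUSED_DIM = 256, 128, 200  # sizes taken from [0067]
NUM_LOCAL = 4  # left eye, right eye, nose, mouth (one implementation in [0062])
CONCAT_DIM = (1 + NUM_LOCAL) * (SPATIAL_DIM + FREQ_DIM)  # 1920


def fuse(global_spatial, global_freq, local_spatial, local_freq, weight, bias):
    """Concatenate every spatial and frequency feature, then apply one fully connected layer.

    global_spatial: (256,) first spatial feature of the whole face.
    global_freq:    (128,) first frequency feature of the whole face.
    local_spatial:  list of (256,) second spatial features, one per local image.
    local_freq:     list of (128,) second frequency features, one per local image.
    weight, bias:   (200, CONCAT_DIM) and (200,) parameters of the linear layer.
    """
    parts = [global_spatial, global_freq, *local_spatial, *local_freq]
    concat = np.concatenate([np.ravel(p) for p in parts])
    if concat.shape[0] != weight.shape[1]:
        raise ValueError(f"expected {weight.shape[1]} inputs, got {concat.shape[0]}")
    return weight @ concat + bias


rng = np.random.default_rng(0)
w = rng.normal(scale=0.01, size=(FUSED_DIM, CONCAT_DIM))
b = np.zeros(FUSED_DIM)
fused = fuse(
    rng.normal(size=SPATIAL_DIM),
    rng.normal(size=FREQ_DIM),
    [rng.normal(size=SPATIAL_DIM) for _ in range(NUM_LOCAL)],
    [rng.normal(size=FREQ_DIM) for _ in range(NUM_LOCAL)],
    w,
    b,
)
print(fused.shape)  # (200,)
```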

[0037] The present invention also provides a deepfake image detection device based on clustering decision, comprising:

[0038] The image acquisition module is used to acquire a face image to be detected, perform image segmentation on the face image to be detected, and obtain at least one local face image, wherein the local face image reflects the facial features in the face image to be detected;

[0039] The feature fusion module is used to input the face image to be detected and the local face image into the trained feature extraction model to obtain the fused features output by the feature extraction model;

[0040] The forgery detection module is used to obtain a first distance between the fusion feature and a trained first cluster center, and a second distance between the fusion feature and a trained second cluster center, and to determine the forgery detection result of the face image to be detected based on the first distance and the second distance. The forgery detection result reflects whether the face image to be detected is a forgery image. The first cluster center reflects the fusion feature corresponding to the forgery image, and the second cluster center reflects the fusion feature corresponding to the real image.

[0041] The trained feature extraction model, the trained first cluster center, and the trained second cluster center are trained based on multiple sets of training data. Each set of training data includes a sample face image to be detected and a forgery detection label corresponding to the sample face image. The training process of the feature extraction model includes:

[0042] An initial feature extraction model and a binary classifier are trained based on multiple sets of training data to obtain an intermediate feature extraction model. The binary classifier is used to output a forgery prediction result based on the fused features output by the feature extraction model. The forgery prediction result is used to classify the training data as real images or forged images.

[0043] The intermediate feature extraction model is trained based on multiple sets of training data to obtain the trained feature extraction model, the trained first cluster center, and the trained second cluster center.

[0044] The present invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the deepfake image detection method based on clustering decision as described above.

[0045] The present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the deepfake image detection method based on clustering decision as described above.

[0046] The present invention provides a deepfake image detection method and related equipment based on clustering decision. The method segments a face image, inputs both the face image and the local face images into a feature extraction model, and detects forgery from the distances between the fused features output by the model and the cluster centers of real and forged images. During training of the feature extraction model, a binary classifier is first used to train the model so that it learns to focus on tampering traces in selected facial regions. Then, based on clustering decision, contrastive learning is used to further amplify the differences between real and forged samples. As a result, the forgery detection result, which is derived from the feature-distance relationship between the target face image and the cluster centers of real and forged images, is more accurate, achieving accurate detection of deepfake images.

Attached Figure Description

[0047] To more clearly illustrate the technical solutions in this invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.

[0048] Figure 1 is a flowchart of the deepfake image detection method based on clustering decision provided by the present invention;

[0049] Figure 2 is a schematic diagram of the image segmentation process in the deepfake image detection method based on clustering decision provided by the present invention;

[0050] Figure 3 is a schematic diagram of the process of generating fused features in the feature extraction model of the deepfake image detection method based on clustering decision provided by the present invention;

[0051] Figure 4 is a schematic diagram of the model training process in the deepfake image detection method based on clustering decision provided by the present invention;

[0052] Figure 5 is a schematic diagram of the deepfake image detection device based on clustering decision provided by the present invention;

[0053] Figure 6 is a schematic structural diagram of the electronic device provided by the present invention.

Detailed Implementation

[0054] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this invention. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without creative effort are within the scope of protection of this invention.

[0055] The deepfake image detection method based on clustering decision-making provided by the present invention is described below with reference to Figures 1-4.

[0056] As shown in Figure 1, the deepfake image detection method based on clustering decision provided by the present invention includes the following steps:

[0057] S110. Obtain a face image to be detected, perform image segmentation on the face image to be detected, and obtain at least one local face image, wherein the local face image reflects the facial features in the face image to be detected.

[0058] S120. Input the face image to be detected and the local face image into the trained feature extraction model to obtain the fused features output by the feature extraction model;

[0059] S130. Obtain the first distance between the fused feature and the trained first cluster center and the second distance between the fused feature and the trained second cluster center, and determine the forgery detection result of the face image to be detected based on the first distance and the second distance. The forgery detection result reflects whether the face image to be detected is a forged image. The first cluster center reflects the fused feature corresponding to forged images, and the second cluster center reflects the fused feature corresponding to real images.
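The patent does not state the distance metric or the exact decision rule for S130. A minimal sketch, assuming Euclidean distance and labeling the image as forged when its fused feature is strictly closer to the first (forged) center than to the second (real) center; the function name detect is hypothetical.

```python
import numpy as np


def detect(fused, cf, cr):
    """Compare a fused feature with the two trained cluster centers.

    Returns (is_forged, first_distance, second_distance), where the first
    distance is to the forged-image center CF and the second to the
    real-image center CR. Euclidean distance is an assumption.
    """
    fused = np.asarray(fused, dtype=float)
    d1 = float(np.linalg.norm(fused - np.asarray(cf, dtype=float)))
    d2 = float(np.linalg.norm(fused - np.asarray(cr, dtype=float)))
    return d1 < d2, d1, d2


is_forged, d1, d2 = detect([0.9, 0.1], cf=[1.0, 0.0], cr=[0.0, 1.0])
print(is_forged)  # True: the feature lies much closer to CF than to CR
```

In this sketch a tie (d1 equal to d2) is reported as real; the patent does not say how ties are handled.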

[0060] The inventors discovered that simply introducing deep learning models to detect deepfake images mostly only works for detecting specific types of tampering, i.e., deepfake traces present in the dataset used to train the model. These models often perform poorly when detecting deepfake images they have never encountered before; this characteristic can be referred to as a lack of generalization. To make models for detecting deepfake images more relevant to real-world scenarios, one possible approach is to design models and training strategies that guide the model to autonomously focus on more common feature anomalies, rather than just those regions that have the greatest impact on the final result. However, such models or training strategies are often quite complex, and the number of parameters is larger than that of typical benchmark models, making them unsuitable for deployment on devices with limited computing power and memory.

[0061] The method provided by this invention segments a face image, inputs both the face image and a local face image into a feature extraction model to extract features, and then detects forgery based on the fused features output by the feature extraction model and the distance between the fused features and the cluster centers of the real and forged images. During the training process of the feature extraction model, a binary classifier is first used to train the model, enabling it to focus on tampering traces in certain selective facial regions. Then, based on clustering decisions, contrastive learning is further utilized to amplify the differences between real and forged samples, making the final image forgery detection result generated based on the feature distance relationship between the target face image and the cluster centers of the real and forged images more accurate. This achieves accurate detection of deepfake images. Furthermore, since all features are treated equally in distance calculation, the method provided by this invention avoids over-reliance on certain features, effectively improving the model's generalization ability, while also requiring less complex model structure design and fewer model parameters.

[0062] Specifically, as shown in Figure 2, the face image to be detected can be a face image extracted from a single image or from a video frame. The face image to be detected is segmented to obtain at least one local face image, which can be a facial-feature region of the face image to be detected, such as the eyes, mouth, or nose. In one possible implementation, there are four local face images: the left eye region image, the right eye region image, the nose region image, and the mouth region image of the face image to be detected. The segmentation can be performed with an existing face detector, such as Blazeface.
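The patent leaves the cropping details to the face detector. A minimal sketch, assuming the detector has already supplied (x, y) pixel keypoints for the left eye, right eye, nose, and mouth, and that each local face image is a fixed-size square patch centred on its keypoint and shifted inward at the image border. The keypoint dictionary layout, the 32-pixel crop size, and the function name crop_local_regions are hypothetical.

```python
import numpy as np

REGIONS = ("left_eye", "right_eye", "nose", "mouth")


def crop_local_regions(face, keypoints, size=32):
    """Return {region: size x size patch} cut from an H x W (x C) face image.

    keypoints maps each name in REGIONS to an (x, y) pixel position, e.g. as
    produced by a face landmark detector (assumed input format).
    """
    h, w = face.shape[:2]
    if size > h or size > w:
        raise ValueError("crop size exceeds the face image")
    half = size // 2
    crops = {}
    for name in REGIONS:
        x, y = keypoints[name]
        # centre the window on the keypoint, then clamp it inside the image
        x0 = min(max(int(round(x)) - half, 0), w - size)
        y0 = min(max(int(round(y)) - half, 0), h - size)
        crops[name] = face[y0:y0 + size, x0:x0 + size]
    return crops


face = np.zeros((128, 128, 3), dtype=np.uint8)
kps = {"left_eye": (40, 50), "right_eye": (88, 50), "nose": (64, 75), "mouth": (64, 100)}
patches = crop_local_regions(face, kps)
print({k: v.shape for k, v in patches.items()})
```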

[0063] The feature extraction model includes a first frequency feature extraction module, a second frequency feature extraction module, a first spatial feature extraction module, and a second spatial feature extraction module; obtaining the fused features output by the feature extraction model includes:

[0064] The face image to be detected is input into the first spatial feature extraction module and the first frequency feature extraction module respectively. The first spatial feature of the face image to be detected is extracted based on the first spatial feature extraction module, and the first frequency feature of the face image to be detected is extracted based on the first frequency feature extraction module.

[0065] The local face image is input into the second spatial feature extraction module and the second frequency feature extraction module respectively. The second spatial feature of the local face image is extracted by the second spatial feature extraction module, and the second frequency feature of the local face image is extracted by the second frequency feature extraction module.

[0066] All the first spatial features, the first frequency features, the second spatial features, and the second frequency features are concatenated to obtain concatenated features, and the fused features are obtained based on the concatenated features.

[0067] As shown in Figure 3, after the target face image and the local face images are input into the feature extraction model, the features of the target face image are first extracted by the first spatial feature extraction module and the first frequency feature extraction module, and the features of the local face images are extracted by the second spatial feature extraction module and the second frequency feature extraction module. Specifically, the first and second spatial feature extraction modules can be existing image spatial feature extraction modules, such as convolution modules, and each first spatial feature and each second spatial feature is a vector of size 1*256. To improve the efficiency and accuracy of deepfake detection, the method provided by this invention takes the facial features as the main basis for discrimination and uses the remaining parts of the face only as supplementary features. That is, the first spatial feature extraction module has fewer channels and fewer parameters than the second spatial feature extraction module. To further enhance the feature representation, the method also extracts frequency features from the face image to be detected and from the local face images. Specifically, the first and second frequency feature extraction modules extract frequency features using the Discrete Cosine Transform (DCT): for each image, the average amplitude at each frequency is computed to form a 1*128 frequency vector, which serves as the first or second frequency feature.
Finally, all first frequency features, second frequency features, first spatial features, and second spatial features are concatenated and projected through a fully connected layer onto a fused feature F of size 1*200, which is used for classification.
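The patent states only that the DCT is applied and that the average amplitude of each frequency yields a 1*128 vector. One possible reading, sketched here under that assumption, groups the 2-D DCT-II coefficients of a grayscale image into 128 bands by normalized radial frequency and averages the absolute coefficients within each band. The banding scheme, the orthonormal DCT built from its basis matrix with NumPy, and the names dct_matrix and frequency_feature are illustrative choices, not the patent's specification.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n (row k = k-th cosine basis)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] = np.sqrt(1.0 / n)
    return m


def frequency_feature(gray, n_bins=128):
    """Average |DCT| amplitude per radial frequency band -> vector of length n_bins."""
    gray = np.asarray(gray, dtype=float)
    h, w = gray.shape
    coeffs = dct_matrix(h) @ gray @ dct_matrix(w).T  # separable 2-D DCT-II
    amp = np.abs(coeffs)
    # radial frequency of each coefficient, normalised to [0, 1)
    u = np.arange(h)[:, None] / h
    v = np.arange(w)[None, :] / w
    radius = np.sqrt(u ** 2 + v ** 2) / np.sqrt(2.0)
    bins = np.minimum((radius * n_bins).astype(int), n_bins - 1)
    sums = np.bincount(bins.ravel(), weights=amp.ravel(), minlength=n_bins)
    counts = np.bincount(bins.ravel(), minlength=n_bins)
    return sums / np.maximum(counts, 1)  # bands with no coefficients stay 0


face_gray = np.random.default_rng(0).random((64, 64))
vec = frequency_feature(face_gray)
print(vec.shape)  # (128,)
```

A flat image puts all of its energy into the DC coefficient, so only the first band is non-zero; forgery artifacts are typically sought in how energy is distributed across the higher bands.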

[0068] In most deep-learning-based forgery detection methods, the features extracted from the image to be detected are input into a binary classifier, which outputs a detection result indicating whether the image is real or forged, and the feature extraction model and the binary classifier are trained together. In the method provided by this invention, however, all features are treated equally in order to improve the model's generalization ability. Instead of the above approach, as shown in Figure 4, a conventional binary classifier is first used to train the feature extraction model, which yields the center positions of the positive and negative samples (positive samples correspond to real images and negative samples to fake images) in the feature space. After the centers of the positive and negative samples in the dataset are obtained, the feature extraction model and the cluster centers are trained further. By further separating the features of real and fake images, this improves the generalization ability of fake-image detection based on the feature extraction model and the cluster centers. That is, the trained feature extraction model, the trained first cluster center, and the trained second cluster center are obtained based on multiple sets of training data, each of which includes a sample face image to be detected and the forgery detection label corresponding to that sample face image. The training process of the feature extraction model includes:

[0069] An initial feature extraction model and a binary classifier are trained based on multiple sets of training data to obtain an intermediate feature extraction model. The binary classifier is used to output a forgery prediction result based on the fused features output by the feature extraction model.

[0070] The intermediate feature extraction model is trained based on multiple sets of training data to obtain the trained feature extraction model, the trained first cluster center, and the trained second cluster center.

[0071] Specifically, the step of training the intermediate feature extraction model based on multiple sets of training data to obtain the trained feature extraction model, the trained first cluster center, and the trained second cluster center includes:

[0072] Based on the multiple sets of training data, the initial values of the first cluster center and the second cluster center are determined;

[0073] A target training data batch is determined from multiple sets of training data, wherein the target training data batch includes multiple sets of target training data;

[0074] The sample face image to be detected in the target training data is segmented to obtain sample local face images. The sample face image to be detected and the sample local face images are input into the intermediate feature extraction model to obtain the sample fusion features output by the intermediate feature extraction model.

[0075] The parameters of the intermediate feature extraction model are updated based on the sample fusion features and the forgery detection labels corresponding to each target training data in the target training data batch, and the first cluster center and the second cluster center are updated based on the sample fusion features and the forgery detection labels corresponding to each target training data in the target training data batch;

[0076] The step of determining the target training data batch in multiple sets of training data is repeated until the number of the target training data batches reaches a preset value. When the number of the target training data batches reaches the preset value, the training ends. The intermediate feature extraction model at the end of training is taken as the trained feature extraction model, the first cluster center at the end of training is taken as the trained first cluster center, and the second cluster center at the end of training is taken as the trained second cluster center.
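The control flow of [0073] to [0076] can be sketched as a short loop. This is a minimal skeleton under several assumptions: labels are 1 for forged and 0 for real; extract (segmentation plus the intermediate feature extraction model) and update_model (one optimizer step on the loss of [0083] to [0085], whose formulas are not reproduced in the source) are hypothetical caller-supplied hooks; the model step uses the current centers before they are refreshed; and the centers follow the moving-average formulas of [0031]. Training stops once the preset number of batches has been processed.

```python
import numpy as np


def train_stage_two(batches, extract, update_model, cf, cr, alpha=0.1, max_batches=1000):
    """Second training stage: alternate model steps and cluster-center updates.

    batches: iterable of (images, labels); labels 1 = forged, 0 = real (assumed).
    extract: hypothetical hook, images -> fused features of shape (B, D).
    update_model: hypothetical hook, (features, labels, cf, cr) -> None,
        performing one optimizer step on the clustering loss.
    max_batches: the preset number of target training data batches.
    """
    cf, cr = np.asarray(cf, dtype=float), np.asarray(cr, dtype=float)
    for t, (images, labels) in enumerate(batches, start=1):
        feats = np.asarray(extract(images), dtype=float)
        labels = np.asarray(labels)
        update_model(feats, labels, cf, cr)  # uses the current (pre-update) centers
        fake, real = feats[labels == 1], feats[labels == 0]
        if len(fake):
            cf = (1 - alpha) * cf + alpha * fake.mean(axis=0)
        if len(real):
            cr = (1 - alpha) * cr + alpha * real.mean(axis=0)
        if t >= max_batches:  # preset number of batches reached: stop training
            break
    return cf, cr


calls = []
batches = [
    ([[2.0, 2.0], [-2.0, -2.0]], [1, 0]),
    ([[4.0, 4.0], [0.0, 0.0]], [1, 0]),
    ([[100.0, 100.0], [100.0, 100.0]], [1, 0]),  # not reached with max_batches=2
]
cf, cr = train_stage_two(
    batches,
    extract=lambda imgs: imgs,  # identity stand-in for the real model
    update_model=lambda f, y, c1, c2: calls.append(len(f)),
    cf=[0.0, 0.0],
    cr=[0.0, 0.0],
    alpha=0.5,
    max_batches=2,
)
print(cf, cr)  # [2.5 2.5] [-0.5 -0.5]
```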

[0077] The initial feature extraction model and the trained feature extraction model have the same structure but different parameters. The initial feature extraction model is trained on the multiple sets of training data, and its parameters are updated until they converge; the resulting model is the intermediate feature extraction model. The intermediate feature extraction model therefore has the same structure as both the initial feature extraction model and the trained feature extraction model.

[0078] Based on the intermediate feature extraction model, a first cluster center reflecting the features of forged image samples and a second cluster center reflecting the features of real image samples can be obtained. Determining the initial values of the first and second cluster centers based on multiple sets of training data includes:

[0079] The multiple sets of training data are divided into a first set and a second set, wherein the forgery detection label of each set of training data in the first set indicates that the corresponding sample face image to be detected is a forged image, and the forgery detection label of each set of training data in the second set indicates that the corresponding sample face image to be detected is a real image;

[0080] Based on the intermediate feature extraction model, a first fusion feature corresponding to each sample face image to be detected in the first set and a second fusion feature corresponding to each sample face image to be detected in the second set are determined. The average value of each first fusion feature is used as the initial value of the first cluster center, and the average value of each second fusion feature is used as the initial value of the second cluster center.

[0081] Specifically, after obtaining the intermediate feature extraction model, the training dataset is traversed, and each sample face image to be detected that is a forged image is input into the intermediate feature extraction model to obtain each of the first fusion features output by the intermediate feature extraction model. Then, each sample face image to be detected that is a real image is input into the intermediate feature extraction model to obtain each of the second fusion features output by the intermediate feature extraction model. The average value of each of the first fusion features is taken as the initial value of the first cluster center, and the average value of each of the second fusion features is taken as the initial value of the second cluster center.
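The initialization in [0079] to [0081] amounts to taking two class-wise means of the fusion features. The following is a minimal sketch, with several assumptions. Features are taken to be plain lists of floats, and labels are 1 for forged and 0 for real. The hypothetical `extract` callable stands in for the intermediate feature extraction model, which in the patent also receives the local face images.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise average of equally sized feature vectors."""
    if not vectors:
        raise ValueError("cannot average an empty set of features")
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def init_centers(
    samples: Sequence[Tuple[object, int]],
    extract: Callable[[object], Vector],
) -> Tuple[Vector, Vector]:
    """Return initial (first, second) cluster centers.

    Assumes label 1 = forged image, 0 = real image; `extract` is a
    hypothetical stand-in for the intermediate feature extraction model.
    """
    first_set = [extract(img) for img, label in samples if label == 1]   # forged
    second_set = [extract(img) for img, label in samples if label == 0]  # real
    return mean_vector(first_set), mean_vector(second_set)
```

For example, with an identity `extract`, forged features [1, 2] and [3, 4] give a first center of [2, 3].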

[0082] The step of updating the parameters of the intermediate feature extraction model based on the sample fusion features and the forgery detection labels corresponding to each target training data in the target training data batch includes:

[0083] The loss function is calculated based on the sample fusion feature and the forgery detection label corresponding to each target training data in the target training data batch;

[0084] The parameters of the intermediate feature extraction model are updated with the goal of minimizing the loss function;

[0085] The loss function includes a first loss function and a second loss function, wherein the first loss function is … and the second loss function is …, in which F represents the sample fusion feature corresponding to the target training data, Fake represents the set of sample fusion features corresponding to the target training data in the target training data batch whose forgery detection label is a forged image, Real represents the set of sample fusion features corresponding to the target training data in the target training data batch whose forgery detection label is a real image, CF represents the current first cluster center, CR represents the current second cluster center, and β is a constant.

[0086] After the initial values of the first and second cluster centers are obtained, the feature extraction model and the cluster centers are trained and updated through clustering decisions. Pushing the features of real and fake images further apart improves the generalization ability of the feature extraction model and the cluster centers in detecting fake images. Minimizing the loss function increases the distance between the feature centers of real and fake images and pulls samples with the same label together, while β controls the allowed margin. The loss function can be minimized by minimizing a weighted sum of the first and second loss functions.
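The exact formulas of the two loss terms are not reproduced in this text. The following is therefore only an illustrative sketch of losses with the properties [0086] describes, not the patent's formulas. The assumed first loss pulls each sample fusion feature toward the center of its own label. The assumed second loss penalizes a feature that lies within margin β of the opposite label's center. All function names and the `weight` hyperparameter are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def compactness_loss(fake: Sequence[Vector], real: Sequence[Vector],
                     cf: Vector, cr: Vector) -> float:
    """Assumed first loss: mean squared distance of each feature to its own center."""
    terms = [math.dist(f, cf) ** 2 for f in fake] + [math.dist(f, cr) ** 2 for f in real]
    return sum(terms) / len(terms)


def margin_loss(fake: Sequence[Vector], real: Sequence[Vector],
                cf: Vector, cr: Vector, beta: float) -> float:
    """Assumed second loss: hinge penalty when a feature is closer than
    beta to the center of the opposite label."""
    terms = [max(0.0, beta - math.dist(f, cr)) for f in fake]
    terms += [max(0.0, beta - math.dist(f, cf)) for f in real]
    return sum(terms) / len(terms)


def total_loss(fake: Sequence[Vector], real: Sequence[Vector],
               cf: Vector, cr: Vector, beta: float, weight: float = 1.0) -> float:
    """Weighted sum of the two assumed terms."""
    return compactness_loss(fake, real, cf, cr) + weight * margin_loss(fake, real, cf, cr, beta)
```

This plain-Python version only evaluates the loss value. In actual training, the same expressions would be written with a differentiable tensor library so that gradients reach the feature extractor, with the centers treated as constants (they are updated separately).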

[0087] The step of updating the first cluster center and the second cluster center based on the sample fusion feature and the forgery detection label corresponding to each target training data in the target training data batch includes:

[0088] The first cluster center is updated based on the sample fusion features corresponding to all target training data in the target training data batch whose forgery detection label is a forged image.

[0089] The second cluster center is updated based on the sample fusion features corresponding to all target training data in the target training data batch whose forgery detection label is a real image.

[0090] During clustering-decision-based training of the feature extraction model, the cluster centers are updated according to the features of the current training batch and a center movement rate α. This momentum update strategy prevents the center points from stagnating or changing position too rapidly. The step of updating the first cluster center based on the sample fusion features corresponding to all target training data in the target training data batch whose forgery detection label is a forged image includes:

[0091] The first cluster center is updated based on the first formula;

[0092] The step of updating the second cluster center based on the sample fusion features corresponding to all target training data in the target training data batch whose forgery detection label is a real image includes:

[0093] The second cluster center is updated based on the second formula;

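[0094] The first formula is $CF_t = (1-\alpha)\,CF_{t-1} + \alpha\,\mathrm{Avg}\big(\sum_{F \in \mathrm{Fake}} F\big)$, and the second formula is $CR_t = (1-\alpha)\,CR_{t-1} + \alpha\,\mathrm{Avg}\big(\sum_{F \in \mathrm{Real}} F\big)$;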

[0095] Here, $CF_t$ represents the first cluster center updated based on the t-th target training data batch, $CR_t$ represents the second cluster center updated based on the t-th target training data batch, F represents the sample fusion feature corresponding to the target training data, Fake represents the set of sample fusion features corresponding to the target training data in the target training data batch whose forgery detection label is a forged image, Real represents the set of sample fusion features corresponding to the target training data in the target training data batch whose forgery detection label is a real image, Avg() represents the average of the contents in parentheses, and α is a constant (the center movement rate).
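The first and second formulas are an exponential moving average of the per-class batch means. The following is a minimal sketch, assuming labels 1 = forged and 0 = real; the function names are hypothetical. The text does not say what happens when a batch contains no sample of one class, since Avg() over an empty set is undefined. Keeping that center unchanged is an assumption made here.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def momentum_update(prev_center: Vector, batch_features: Sequence[Vector],
                    alpha: float) -> Vector:
    """C_t = (1 - alpha) * C_{t-1} + alpha * Avg(features of this class in the batch)."""
    if not batch_features:
        # Assumption: leave the center in place if the class is absent from the batch.
        return list(prev_center)
    n = len(batch_features)
    batch_mean = [sum(col) / n for col in zip(*batch_features)]
    return [(1 - alpha) * c + alpha * m for c, m in zip(prev_center, batch_mean)]


def update_centers(batch: Sequence[Tuple[Vector, int]], cf: Vector, cr: Vector,
                   alpha: float) -> Tuple[Vector, Vector]:
    """Split the batch's fusion features by label and update both centers."""
    fake = [f for f, label in batch if label == 1]
    real = [f for f, label in batch if label == 0]
    return momentum_update(cf, fake, alpha), momentum_update(cr, real, alpha)
```

A small α moves each center slowly toward the current batch mean, which is the "neither stagnating nor jumping" behavior [0090] describes.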

[0096] The trained feature extraction model and the first and second cluster centers are used for forgery detection of the face image to be detected. The fused features corresponding to the face image to be detected are obtained using the trained feature extraction model. The forgery detection result of the face image to be detected is determined according to the distance between the fused features and the first and second cluster centers, respectively. Specifically, if the fused features are closer to the first cluster center, the face image to be detected is determined to be a forgery image; otherwise, the face image to be detected is determined to be a real image.
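The nearest-center decision in [0096] can be sketched as follows. The text does not name the distance metric, so Euclidean distance (`math.dist`) is an assumption, and `classify` is a hypothetical name. A tie resolves to "real", matching the "otherwise" branch in the text.

```python
import math
from typing import List

Vector = List[float]


def classify(feature: Vector, cf: Vector, cr: Vector) -> str:
    """Label a fusion feature by its nearer trained cluster center."""
    first_distance = math.dist(feature, cf)   # to the forged-image center
    second_distance = math.dist(feature, cr)  # to the real-image center
    return "forged" if first_distance < second_distance else "real"
```

With centers [1, 0] (forged) and [0, 0] (real), a feature at [0.9, 0] is labeled forged and one at [0.1, 0] is labeled real.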

[0097] The deepfake image detection device based on clustering decision-making provided by the present invention is described below. The device described below and the deepfake image detection method based on clustering decision-making described above may be referred to correspondingly. As shown in Figure 5, the deepfake image detection device based on clustering decision-making provided by the present invention includes:

[0098] The image acquisition module 510 is used to acquire a face image to be detected, perform image segmentation on the face image to be detected, and obtain at least one local face image, wherein the local face image reflects the facial features in the face image to be detected.

[0099] The feature fusion module 520 is used to input the face image to be detected and the local face image into a trained feature extraction model to obtain the fused features output by the feature extraction model;

[0100] The forgery detection module 530 is used to obtain a first distance between the fusion feature and a trained first cluster center, and a second distance between the fusion feature and a trained second cluster center, and to determine the forgery detection result of the face image to be detected based on the first distance and the second distance. The forgery detection result reflects whether the face image to be detected is a forgery image. The first cluster center reflects the fusion feature corresponding to the forgery image, and the second cluster center reflects the fusion feature corresponding to the real image.

[0101] The trained feature extraction model, the trained first cluster center, and the trained second cluster center are trained based on multiple sets of training data. Each set of training data includes a sample face image to be detected and a forgery detection label corresponding to the sample face image. The training process of the feature extraction model includes:

[0102] An initial feature extraction model and a binary classifier are trained based on multiple sets of training data to obtain an intermediate feature extraction model. The binary classifier is used to output a forgery prediction result based on the fused features output by the feature extraction model. The forgery prediction result is used to classify the training data as real images or forged images.

[0103] The intermediate feature extraction model is trained based on multiple sets of training data to obtain the trained feature extraction model, the trained first cluster center, and the trained second cluster center.

[0104] Figure 6 is a schematic diagram of the physical structure of an electronic device. As shown in Figure 6, the electronic device may include a processor 610, a communications interface 620, a memory 630, and a communication bus 640, wherein the processor 610, the communications interface 620, and the memory 630 communicate with each other via the communication bus 640. The processor 610 can call logical instructions in the memory 630 to execute a deepfake image detection method based on clustering decisions. This method includes: acquiring a face image to be detected; performing image segmentation on the face image to be detected to obtain at least one local face image, wherein the local face image reflects the facial features in the face image to be detected.

[0105] The face image to be detected and the local face image are input into a trained feature extraction model to obtain the fused features output by the feature extraction model;

[0106] The first distance between the fusion feature and the trained first cluster center, and the second distance between the fusion feature and the trained second cluster center are obtained respectively. The forgery detection result of the face image to be detected is determined based on the first distance and the second distance. The forgery detection result reflects whether the face image to be detected is a forgery image. The first cluster center reflects the fusion feature corresponding to the forgery image, and the second cluster center reflects the fusion feature corresponding to the real image.

[0107] The trained feature extraction model, the trained first cluster center, and the trained second cluster center are trained based on multiple sets of training data. Each set of training data includes a sample face image to be detected and a forgery detection label corresponding to the sample face image. The training process of the feature extraction model includes:

[0108] An initial feature extraction model and a binary classifier are trained based on multiple sets of training data to obtain an intermediate feature extraction model. The binary classifier is used to output a forgery prediction result based on the fused features output by the feature extraction model. The forgery prediction result is used to classify the training data as real images or forged images.

[0109] The intermediate feature extraction model is trained based on multiple sets of training data to obtain the trained feature extraction model, the trained first cluster center, and the trained second cluster center.

[0110] Furthermore, the logical instructions in the memory 630 may be implemented as software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present invention in essence, or the part of it that contributes over the prior art, or part of the technical solution, may be embodied as a software product. This computer software product is stored in a storage medium and includes several instructions that cause a computer device (such as a personal computer, server, or network device) to execute all or part of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium capable of storing program code, such as a USB flash drive, portable hard drive, read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disc.

[0111] In another aspect, the present invention also provides a computer program product including a computer program that can be stored on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer can execute the deepfake image detection method based on clustering decision provided by the above methods, the method including: acquiring a face image to be detected, performing image segmentation on the face image to be detected to obtain at least one local face image, the local face image reflecting the facial features in the face image to be detected;

[0112] The face image to be detected and the local face image are input into a trained feature extraction model to obtain the fused features output by the feature extraction model;

[0113] The first distance between the fusion feature and the trained first cluster center, and the second distance between the fusion feature and the trained second cluster center are obtained respectively. The forgery detection result of the face image to be detected is determined based on the first distance and the second distance. The forgery detection result reflects whether the face image to be detected is a forgery image. The first cluster center reflects the fusion feature corresponding to the forgery image, and the second cluster center reflects the fusion feature corresponding to the real image.

[0114] The trained feature extraction model, the trained first cluster center, and the trained second cluster center are trained based on multiple sets of training data. Each set of training data includes a sample face image to be detected and a forgery detection label corresponding to the sample face image. The training process of the feature extraction model includes:

[0115] An initial feature extraction model and a binary classifier are trained based on multiple sets of training data to obtain an intermediate feature extraction model. The binary classifier is used to output a forgery prediction result based on the fused features output by the feature extraction model. The forgery prediction result is used to classify the training data as real images or forged images.

[0116] The intermediate feature extraction model is trained based on multiple sets of training data to obtain the trained feature extraction model, the trained first cluster center, and the trained second cluster center.

[0117] In another aspect, the present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements a deepfake image detection method based on clustering decision provided by the methods described above. The method includes: acquiring a face image to be detected, performing image segmentation on the face image to be detected to obtain at least one local face image, wherein the local face image reflects the facial features in the face image to be detected.

[0118] The face image to be detected and the local face image are input into a trained feature extraction model to obtain the fused features output by the feature extraction model;

[0119] The first distance between the fusion feature and the trained first cluster center, and the second distance between the fusion feature and the trained second cluster center are obtained respectively. The forgery detection result of the face image to be detected is determined based on the first distance and the second distance. The forgery detection result reflects whether the face image to be detected is a forgery image. The first cluster center reflects the fusion feature corresponding to the forgery image, and the second cluster center reflects the fusion feature corresponding to the real image.

[0120] The trained feature extraction model, the trained first cluster center, and the trained second cluster center are trained based on multiple sets of training data. Each set of training data includes a sample face image to be detected and a forgery detection label corresponding to the sample face image. The training process of the feature extraction model includes:

[0121] An initial feature extraction model and a binary classifier are trained based on multiple sets of training data to obtain an intermediate feature extraction model. The binary classifier is used to output a forgery prediction result based on the fused features output by the feature extraction model. The forgery prediction result is used to classify the training data as real images or forged images.

[0122] The intermediate feature extraction model is trained based on multiple sets of training data to obtain the trained feature extraction model, the trained first cluster center, and the trained second cluster center.

[0123] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.

[0124] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.

[0125] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for detecting deepfake images based on clustering decision-making, characterized in that the method comprises: A face image to be detected is acquired, and the face image to be detected is segmented to obtain at least one local face image, wherein the local face image reflects the facial features in the face image to be detected; The face image to be detected and the local face image are input into a trained feature extraction model to obtain the fused features output by the feature extraction model; The first distance between the fusion feature and the trained first cluster center, and the second distance between the fusion feature and the trained second cluster center are obtained respectively. The forgery detection result of the face image to be detected is determined based on the first distance and the second distance. The forgery detection result reflects whether the face image to be detected is a forgery image. The first cluster center reflects the fusion feature corresponding to the forgery image, and the second cluster center reflects the fusion feature corresponding to the real image. The trained feature extraction model, the trained first cluster center, and the trained second cluster center are trained based on multiple sets of training data. Each set of training data includes a sample face image to be detected and a forgery detection label corresponding to the sample face image. The training process of the feature extraction model includes: An initial feature extraction model and a binary classifier are trained based on multiple sets of training data to obtain an intermediate feature extraction model. The binary classifier is used to output a forgery prediction result based on the fused features output by the feature extraction model. The forgery prediction result is used to classify the training data as real images or forged images.
The intermediate feature extraction model is trained based on multiple sets of training data to obtain the trained feature extraction model, the trained first cluster center, and the trained second cluster center.

2. The deepfake image detection method based on clustering decision-making according to claim 1, characterized in that the step of training the intermediate feature extraction model based on multiple sets of training data to obtain the trained feature extraction model, the trained first cluster center, and the trained second cluster center includes: Based on the multiple sets of training data, the initial values of the first cluster center and the second cluster center are determined; A target training data batch is determined from multiple sets of training data, wherein the target training data batch includes multiple sets of target training data; The sample face image to be detected in the target training data is segmented to obtain sample local face images. The sample face image to be detected and the sample local face images are input into the intermediate feature extraction model to obtain the sample fusion features output by the intermediate feature extraction model. The parameters of the intermediate feature extraction model are updated based on the sample fusion features and the forgery detection labels corresponding to each target training data in the target training data batch, and the first cluster center and the second cluster center are updated based on the sample fusion features and the forgery detection labels corresponding to each target training data in the target training data batch; The step of determining the target training data batch in multiple sets of training data is repeated until the number of the target training data batches reaches a preset value. When the number of the target training data batches reaches the preset value, the training ends.
The intermediate feature extraction model at the end of training is taken as the trained feature extraction model, the first cluster center at the end of training is taken as the trained first cluster center, and the second cluster center at the end of training is taken as the trained second cluster center.

3. The deepfake image detection method based on clustering decision-making according to claim 2, characterized in that the step of determining the initial values of the first cluster center and the second cluster center based on multiple sets of training data includes: The training data in each group is classified to obtain a first set and a second set, wherein the forgery detection label in the training data in the first set indicates that the corresponding sample face image to be detected is a forgery image, and the forgery detection label in the training data in the second set indicates that the corresponding sample face image to be detected is a real image; Based on the intermediate feature extraction model, a first fusion feature corresponding to each sample face image to be detected in the first set and a second fusion feature corresponding to each sample face image to be detected in the second set are determined. The average value of each first fusion feature is used as the initial value of the first cluster center, and the average value of each second fusion feature is used as the initial value of the second cluster center.

4. The deepfake image detection method based on clustering decision-making according to claim 2, characterized in that the step of updating the parameters of the intermediate feature extraction model based on the sample fusion features and the forgery detection labels corresponding to each target training data in the target training data batch includes: The loss function is calculated based on the sample fusion feature and the forgery detection label corresponding to each target training data in the target training data batch; The parameters of the intermediate feature extraction model are updated with the goal of minimizing the loss function; The loss function includes a first loss function and a second loss function, wherein the first loss function is … and the second loss function is …, in which F represents the sample fusion feature corresponding to the target training data, Fake represents the set of sample fusion features corresponding to the target training data in the target training data batch whose forgery detection label is a forged image, Real represents the set of sample fusion features corresponding to the target training data in the target training data batch whose forgery detection label is a real image, CF represents the current first cluster center, CR represents the current second cluster center, and β is a constant.

5. The deepfake image detection method based on clustering decision-making according to claim 2, characterized in that the step of updating the first cluster center and the second cluster center based on the sample fusion feature and the forgery detection label corresponding to each target training data in the target training data batch includes: The first cluster center is updated based on the sample fusion features corresponding to all target training data in the target training data batch whose forgery detection label is a forged image. The second cluster center is updated based on the sample fusion features corresponding to all target training data in the target training data batch whose forgery detection label is a real image.

6. The deepfake image detection method based on clustering decision-making according to claim 5, characterized in that the step of updating the first cluster center based on the sample fusion features corresponding to all target training data in the target training data batch whose forgery detection label is a forged image includes: The first cluster center is updated based on the first formula; The step of updating the second cluster center based on the sample fusion features corresponding to all target training data in the target training data batch whose forgery detection label is a real image includes: The second cluster center is updated based on the second formula; The first formula is $CF_t = (1-\alpha)\,CF_{t-1} + \alpha\,\mathrm{Avg}\big(\sum_{F \in \mathrm{Fake}} F\big)$, and the second formula is $CR_t = (1-\alpha)\,CR_{t-1} + \alpha\,\mathrm{Avg}\big(\sum_{F \in \mathrm{Real}} F\big)$; wherein $CF_t$ represents the first cluster center updated based on the t-th target training data batch, $CR_t$ represents the second cluster center updated based on the t-th target training data batch, F represents the sample fusion feature corresponding to the target training data, Fake represents the set of sample fusion features corresponding to the target training data in the target training data batch whose forgery detection label is a forged image, Real represents the set of sample fusion features corresponding to the target training data in the target training data batch whose forgery detection label is a real image, Avg() denotes averaging of the values within the parentheses, and α is a constant.

7. The deepfake image detection method based on clustering decision-making according to claim 1, characterized in that the feature extraction model includes a first frequency feature extraction module, a second frequency feature extraction module, a first spatial feature extraction module, and a second spatial feature extraction module; obtaining the fused features output by the feature extraction model includes: The face image to be detected is input into the first spatial feature extraction module and the first frequency feature extraction module respectively. The first spatial feature of the face image to be detected is extracted based on the first spatial feature extraction module, and the first frequency feature of the face image to be detected is extracted based on the first frequency feature extraction module. The local face image is input into the second spatial feature extraction module and the second frequency feature extraction module respectively. The second spatial feature of the local face image is extracted based on the second spatial feature extraction module, and the second frequency feature of the local face image is extracted based on the second frequency feature extraction module. All the first spatial features, the first frequency features, the second spatial features, and the second frequency features are concatenated to obtain concatenated features, and the fused features are obtained based on the concatenated features.

8. A deepfake image detection device based on clustering decision-making, characterized in that the device comprises: The image acquisition module is used to acquire a face image to be detected, perform image segmentation on the face image to be detected, and obtain at least one local face image, wherein the local face image reflects the facial features in the face image to be detected; The feature fusion module is used to input the face image to be detected and the local face image into the trained feature extraction model to obtain the fused features output by the feature extraction model; The forgery detection module is used to obtain a first distance between the fusion feature and a trained first cluster center, and a second distance between the fusion feature and a trained second cluster center, and to determine the forgery detection result of the face image to be detected based on the first distance and the second distance. The forgery detection result reflects whether the face image to be detected is a forgery image. The first cluster center reflects the fusion feature corresponding to the forgery image, and the second cluster center reflects the fusion feature corresponding to the real image. The trained feature extraction model, the trained first cluster center, and the trained second cluster center are trained based on multiple sets of training data. Each set of training data includes a sample face image to be detected and a forgery detection label corresponding to the sample face image. The training process of the feature extraction model includes: An initial feature extraction model and a binary classifier are trained based on multiple sets of training data to obtain an intermediate feature extraction model. The binary classifier is used to output a forgery prediction result based on the fused features output by the feature extraction model. The forgery prediction result is used to classify the training data as real images or forged images.
The intermediate feature extraction model is trained based on multiple sets of training data to obtain the trained feature extraction model, the trained first cluster center, and the trained second cluster center.

9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that when the processor executes the program, it implements the deepfake image detection method based on clustering decision as described in any one of claims 1 to 7.

10. A non-transitory computer-readable storage medium having a computer program stored thereon, characterized in that when the computer program is executed by a processor, it implements the deepfake image detection method based on clustering decision as described in any one of claims 1 to 7.