Deepfake detection method and system using same
The deepfake detection method improves deepfake detection by classifying image quality, adjusting model parameters, and generating a meta-model to enhance detection performance across varying image qualities.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- UI (UNIVERSITY IND FOUNDATION) YONSEI UNIVERSITY
- Filing Date
- 2025-12-10
- Publication Date
- 2026-06-18
AI Technical Summary
Existing deepfake detection models struggle with varying performance due to data quality and lack of generalization, especially when faced with new deepfake generation techniques, leading to inefficiencies in detecting degraded or varied image quality.
The deepfake detection method uses a deep learning model to classify image quality, adjusts the parameters of a pre-trained detection model based on the classified data, measures the detection performance of the adjusted model, generates a meta-model from the measured performance, and determines whether an image is a deepfake using the meta-model. The pre-trained detection model is trained on the similarity between image features and text features.
The method effectively detects deepfakes even in situations of degraded or varied image quality, enhancing detection performance compared to existing models like Xception.
Abstract
Description
Deepfake detection method and system using the same
[0001] The present invention relates to a deepfake detection method, and more specifically, to a method capable of effectively detecting deepfakes even in deepfake images of various qualities.
[0002]
[0003] Deepfakes can be utilized in various forms and bring about positive effects, but they have a negative impact on society and cause serious social problems. As the easy creation and dissemination of fake content using deepfakes threatens social and ethical security, the importance of deepfake detection is growing.
[0004] On the other hand, deepfake detection models have limitations in that their performance varies depending on the quality of the input and training data. Furthermore, detection effectiveness is limited due to the insufficient ability of existing detection models to respond to new deepfake generation techniques. These problems stem from the diversity and variability of data quality and the lack of generalization ability in existing detection models regarding new types of deepfakes.
[0005] The present invention is derived from research conducted as part of the Ministry of Science and ICT’s Digital Dysfunction Response Technology Development (Project No.: 2710008048, Sub-project No.: 00230337, Research Project Name: Development of a platform for advanced deepfake detection, generation suppression, and distribution prevention to respond to maliciously altered content, Lead Organization: Sungkyunkwan University Industry-Academic Cooperation Foundation, Research Period: 2024.01.01 ~ 2024.12.31).
[0006] Meanwhile, the Korean government, as the entity that provided the project, has no property interest in any aspect of the present invention.
[0007]
[0008] The various embodiments described in this specification aim to solve the problems of the prior art and provide a deepfake detection method that maintains high detection performance even in situations where data quality is degraded or differs.
[0009] The problems that this disclosure aims to solve are not limited to those described above, and other unmentioned problems will be clearly understood by a person skilled in the art from the description below.
[0010]
[0011] A deepfake detection method performed by at least one processor according to one embodiment comprises: a step of inputting a first image dataset into a deep learning model to classify the quality of one or more images included in the first image dataset; a step of adjusting parameters of a previously trained detection model based on one or more classified image data; a step of receiving a second image dataset and measuring the detection performance of the adjusted detection model; a step of generating a meta-model based on the measured detection performance and the adjusted detection model; and a step of receiving an external image and determining whether it is a deepfake using the meta-model.
[0012] Here, the step of classifying the quality of the image may include dividing each of the one or more images into frames and detecting faces in the divided frames to extract face image data.
[0013] Here, the above-mentioned pre-trained detection model may be trained by measuring the similarity between the feature vector of a training image and the feature vector of a training text, and by using the measured similarity.
[0014] Here, the training image includes a false image and a true image, and the training text may include text corresponding to the false image and text corresponding to the true image.
[0015] Here, the above-mentioned pre-trained detection model may be trained to increase the similarity of correct image-text pairs and decrease the similarity of incorrect image-text pairs.
[0016] Here, the above-mentioned pre-trained detection model may include one or more detection models with different patch sizes and numbers of parameters.
[0017] Here, the step of adjusting the parameters of the previously trained detection model may include the step of updating the image encoder of the previously trained detection model according to the quality of the classified image data.
[0018] Here, the step of adjusting the parameters of the previously trained detection model may include updating the image encoder of the first detection model with image data classified as a first quality and updating the image encoder of the second detection model with image data classified as a second quality.
[0019] Here, the step of generating the meta-model may include the step of generating the meta-model by assigning different weights to the adjusted detection model based on the measured detection performance.
[0020] Here, the step of generating the meta-model may include the step of generating the meta-model by assigning a first weight to a first adjusted detection model and assigning a second weight to a second adjusted detection model based on the measured detection performance.
[0021] In another embodiment, a computer program stored on a non-transitory computer-readable recording medium is provided to execute the deepfake detection method.
[0022] In another embodiment, a deepfake detection system is provided, comprising: a first module that inputs a first image dataset into a deep learning model to classify the quality of one or more images included in the first image dataset; a second module that adjusts parameters of a previously trained detection model using the classified one or more image data; a third module that receives a second image dataset and measures the detection performance of the detection model with adjusted parameters; a fourth module that generates a meta-model based on the measured detection performance and the detection model with adjusted parameters; and a fifth module that receives an external image and determines whether it is a deepfake using the meta-model.
[0023]
[0024] According to one embodiment of the present invention, deepfakes can be detected effectively even in situations where image data quality is degraded or differs in quality.
[0025] The effects according to the present disclosure are not limited to those described above, and other unmentioned effects will be clearly understood by a person skilled in the art from the description below.
[0026]
[0027] FIG. 1 is a figure showing components classified according to operations performed in a deepfake detection system according to one embodiment.
[0028] FIG. 2 is a diagram illustrating operations performed in the components of a deepfake detection system according to one embodiment.
[0029] FIG. 3 is a diagram illustrating a method for training a previously trained detection model according to one embodiment.
[0030] FIG. 4 is a flowchart of a deepfake detection method according to one embodiment.
[0031]
[0032] Hereinafter, exemplary embodiments according to the present invention will be described in detail with reference to the contents described in the attached drawings. However, the present invention is not limited or restricted by exemplary embodiments. Unless otherwise defined, all terms used in this specification (including technical and scientific terms) shall be used in a meaning that is commonly understood by those skilled in the art to which this disclosure belongs, but this may vary depending on the intent of those skilled in the art, case law, the emergence of new technology, etc.
[0033] Furthermore, terms defined in commonly used dictionaries are not to be interpreted ideally or excessively unless explicitly and specifically defined otherwise. In certain cases, terms have been selected at the applicant's discretion, and in such cases, their meanings will be described in detail in the relevant explanatory sections. Accordingly, terms used in this disclosure should be defined not merely by their names, but based on their meanings and the content throughout this disclosure.
[0034] Throughout this specification, when a part is described as "comprising" a certain component, this means that, unless specifically stated otherwise, it does not exclude other components but may include additional components. Furthermore, the singular form used in this specification includes the plural form unless specifically stated otherwise. Additionally, the expression "at least one of a, b, and / or c" as used throughout this specification may encompass 'a alone', 'b alone', 'c alone', 'a and b', 'a and c', 'b and c', or 'a, b, and c all'.
[0035] Meanwhile, terms such as "first and / or second" used in this specification may be used to describe various components, but they are used solely for the purpose of distinguishing one component from another and are not intended to limit the scope to the components referred to by such terms. For example, without departing from the scope of the present invention, the first component may be named the second component, and the second component may also be named the first component.
[0036] Additionally, terms such as “…part,” “…module,” etc., as described in this specification refer to a unit that processes at least one function or operation, which may be implemented in hardware or software, or a combination of hardware and software. Furthermore, embodiments of this disclosure may be represented in this specification by functional block configurations and various processing steps. These functional blocks may be implemented by various numbers of hardware and / or software configurations that execute specific functions. For example, embodiments of this disclosure may employ integrated circuit configurations such as memory, processing, logic, look-up tables, etc., which can execute various functions under the control of one or more microprocessors or other control devices.
[0037] In an embodiment according to the present disclosure, functions related to artificial intelligence may be implemented through a processor and memory. In this case, the processor may be any one of a general-purpose processor such as a CPU (Central Processing Unit), AP (Application Processor), or DSP (Digital Signal Processor); a graphics-dedicated processor such as a GPU (Graphics Processing Unit) or VPU (Vision Processing Unit); or an artificial intelligence-dedicated processor such as an NPU (Neural Network Processing Unit). The processor may process input data according to predefined operation rules or artificial intelligence models stored in memory. Alternatively, if the processor is an artificial intelligence-dedicated processor, it may be designed with a hardware structure specialized for processing a specific artificial intelligence model. In some embodiments according to the present disclosure, functions related to artificial intelligence may be implemented through a plurality of processors.
[0038] In an embodiment according to the present disclosure, a predefined operation rule or artificial intelligence model may be configured to perform machine learning. Here, being configured to perform machine learning means that the predefined operation rule or artificial intelligence model is configured to perform a desired characteristic (or objective) by learning using a plurality of training data based on a learning algorithm. Such learning may be performed on the device itself in which the artificial intelligence according to the present disclosure is implemented, or it may be performed through a separate server and / or system.
[0039] Artificial intelligence models can be implemented as neural networks (or artificial neural networks) and can operate based on statistical learning algorithms that mimic biological neurons in machine learning and cognitive science. A neural network can refer to a model in which artificial neurons (nodes), which form a network through synaptic connections, change the strength of those connections through learning to acquire problem-solving capabilities. A neural network can be composed of multiple neural network layers; for example, a neural network may include an input layer, a hidden layer, and an output layer. Each of the multiple neural network layers may include at least one node and at least one weight, and neural network operations can be performed through operations between the results of the previous layers and the weights. At least one weight of the multiple neural network layers may be optimized based on the learning results of the artificial intelligence model. For example, at least one weight may be updated so that the loss value or cost value obtained from the artificial intelligence model during the learning process is reduced or minimized. Neural networks can infer a result to be predicted from an arbitrary input.
[0040] The learning methods of artificial intelligence models can be classified according to the learning approach into supervised learning, where input and output data are provided as training data and the correct answer (output data) corresponding to the problem (input data) is predetermined; unsupervised learning, where only input data is provided without output data and the correct answer (output data) corresponding to the problem (input data) is not predetermined; and reinforcement learning, where a reward is granted whenever an action is taken from the current state and learning proceeds in a direction that maximizes this reward. Alternatively, they can be classified according to the architecture, which is the structure of the learning model.
[0041] In the embodiments of the present disclosure, at least one of various artificial intelligence structures and algorithms may be used as the artificial intelligence model, for example: a Convolutional Neural Network (CNN) such as GoogleNet, AlexNet, or VGG Network; Region with Convolutional Neural Network (R-CNN); Region Proposal Network (RPN); Recurrent Neural Network (RNN); Stacking-based Deep Neural Network (S-DNN); State-Space Dynamic Neural Network (S-SDNN); Deconvolution Network; Deep Belief Network (DBN); Restricted Boltzmann Machine (RBM); Fully Convolutional Network; Long Short-Term Memory (LSTM) Network; Classification Network; Generative Modeling; eXplainable AI; Continual AI; Representation Learning; AI for Material Design; BERT, SP-BERT, MRC / QA, Text Analysis, Dialog System, GPT-3, and GPT-4 for natural language processing; Visual Analytics, Visual Understanding, Video Synthesis, and ResNet for vision processing; and Anomaly Detection, Prediction, Time-Series Forecasting, Optimization, Recommendation, and Data Creation for data intelligence. The examples described above are merely examples of artificial intelligence structures and algorithms used according to the embodiments of the present disclosure and do not limit the artificial intelligence structures and algorithms used according to the embodiments of the present disclosure.
[0042] Hereinafter, various embodiments of the present disclosure will be described in detail with reference to the accompanying drawings. In describing the embodiments, technical details that are well known in the art to which the present invention pertains and are not directly related to the present invention will be omitted. This is to ensure that the essence of the present invention is conveyed more clearly without obscuring it by omitting unnecessary explanations. For the same reason, some components in the accompanying drawings may be exaggerated, omitted, or schematically depicted. Furthermore, the size of each component does not entirely reflect its actual size. Throughout this specification, the same reference numerals may refer to the same or corresponding components.
[0043] FIG. 1 is a figure showing components classified according to operations performed in a deepfake detection system according to one embodiment.
[0044] In the illustrated embodiments, each component may have different functions and capabilities other than those described below, and may include additional components other than those described below. Additionally, in one embodiment, each component may be implemented using one or more physically separated devices, or by one or more processors or a combination of one or more processors and software, and may not be clearly distinguished in specific operation as in the illustrated examples.
[0045] The deepfake detection method illustrated in FIG. 1 can be implemented within a logic circuit by hardware, firmware, software, or a combination thereof, and can also be implemented using a general-purpose or specific-purpose computer. The device can be implemented using a hardwired device, a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), etc.
[0046] In addition, the system can be implemented as a System on Chip (SoC) including one or more processors and controllers.
[0047] In addition, deepfake detection methods may be installed in the form of software, hardware, or a combination thereof on a computing device or server equipped with hardware elements. A computing device or server may refer to various devices that include, in whole or in part, communication devices such as communication modems for communicating with various devices or wired / wireless communication networks, memory for storing data for executing programs, and microprocessors for executing programs to perform calculations and commands.
[0048] With reference to FIG. 1, a deepfake detection system may include a classification module (100), an adjustment module (200), a measurement module (300), a generation module (400), and a judgment module (500). Each module may include at least one processor. Specifically, each module may mean a set of processors operating for a single function (e.g., classification, adjustment, etc.).
[0049] The classification module (100) can acquire a first image dataset. The first image dataset is a dataset of images having different qualities, and the classification module (100) can input the first image dataset into a deep learning model. For example, the first image dataset may include a first quality image, a second quality image, and a third quality image. The classification module (100) may also be referred to as a classification processor.
[0050] The classification module (100) inputs a first image dataset into a deep learning model and can classify the quality of one or more images included in the first image dataset. Here, the deep learning model is an artificial neural network model that performs neural network operations on input data according to a learned method and outputs a model output value corresponding to the input data; it can receive various input data according to the purpose and output various corresponding model output values. In one embodiment, the artificial neural network may have various neural network structures such as a Recurrent Neural Network (RNN) and a Convolutional Neural Network (CNN). For example, the deep learning model may be a model that extracts features using a CNN and trains a regression model on MOS (Mean Opinion Score) data to predict quality.
[0051] For example, the classification module (100) can input the first image dataset into a deep learning model to classify it into a first quality image, a second quality image, and a third quality image.
[0052] Additionally, the classification module (100) can divide each of one or more images included in the first image dataset into frames and detect faces in the divided frames to extract face image data.
[0053] The classification module (100) can input the extracted face image data into a deep learning model and classify the quality of the face image.
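The classification flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the `predict_mos` callable stands in for the CNN-plus-regression quality model, the cut-offs of 3.5 and 2.5 on a 1-to-5 MOS scale are hypothetical values chosen only for the example, and frame splitting and face detection are assumed to have already produced the face crops.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical cut-offs on a 1-5 MOS scale; the source does not specify any.
HIGH_THRESHOLD = 3.5
LOW_THRESHOLD = 2.5


def quality_label(mos: float) -> str:
    """Map a predicted MOS value to one of three quality classes."""
    if mos >= HIGH_THRESHOLD:
        return "high"
    if mos >= LOW_THRESHOLD:
        return "medium"
    return "low"


def classify_by_quality(
    face_crops: Iterable[Tuple[str, object]],
    predict_mos: Callable[[object], float],
) -> Dict[str, List[str]]:
    """Group face crops (id, pixels) by the quality class of their predicted MOS."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for crop_id, pixels in face_crops:
        buckets[quality_label(predict_mos(pixels))].append(crop_id)
    return dict(buckets)


# Toy usage: the "pixels" are already scores, and the predictor is the identity.
crops = [("a", 4.2), ("b", 3.0), ("c", 1.7), ("d", 3.5)]
groups = classify_by_quality(crops, predict_mos=lambda x: x)
```

Each resulting group can then be passed to the adjustment stage as the image data classified as one quality level.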
[0054] The adjustment module (200) can adjust the parameters of a previously trained detection model. The adjustment module (200) may also be referred to as an adjustment processor.
[0055] Here, the pre-trained detection model may be a multimodal model combining an image deep learning model and a language model. The pre-trained detection model may measure the similarity between the feature vectors of a training image and the feature vectors of a training text, and be trained based on the measured similarity. Here, the training image may include a false image and a true image, and the training text may include text corresponding to the false image and text corresponding to the true image.
[0056] Specifically, the previously trained detection model may be trained to increase the similarity of correct image-text pairs and to decrease the similarity of incorrect image-text pairs. Here, a correct image-text pair refers to a pair consisting of a false image and the text corresponding to the false image, or a pair consisting of a true image and the text corresponding to the true image. An incorrect image-text pair refers to a pair consisting of a false image and the text corresponding to the true image, or a pair consisting of a true image and the text corresponding to the false image. Hereinafter, the text corresponding to the true image is referred to as the true text, and the text corresponding to the false image is referred to as the false text.
[0057] Specifically, the previously trained detection model may be trained to increase the similarity between the feature vector of the training true image and the feature vector of the training true text, and the similarity between the feature vector of the training false image and the feature vector of the training false text.
[0058] On the other hand, the pre-trained detection model may be trained to decrease the similarity between the feature vector of the training true image and the feature vector of the training false text, and the similarity between the feature vector of the training false image and the feature vector of the training true text.
[0059] A detailed explanation of this training is provided below with reference to FIG. 3.
[0060] The pre-trained detection model may include one or more detection models with different patch sizes and numbers of parameters. For example, the pre-trained detection model may include a model trained with image patches of size 32×32 or 16×16.
[0061] The adjustment module (200) can adjust the parameters of a pre-trained detection model based on a first image dataset classified by quality. Specifically, the adjustment module (200) can adjust by updating the weights of all layers of the pre-trained detection model. Alternatively, the adjustment module (200) can adjust by updating the weights of some layers of the pre-trained detection model. More specifically, the adjustment module (200) can update the image encoders of the pre-trained detection model according to the quality of the classified image data.
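The difference between updating all layers and updating only some layers (here, the image encoder) can be illustrated with a framework-free sketch. The parameter names, the `image_encoder.` prefix convention, and the plain gradient-descent step are assumptions made for illustration; an actual implementation would typically rely on a deep learning framework's own mechanism for freezing parameters.

```python
from typing import Dict, Optional


def sgd_step(
    params: Dict[str, float],
    grads: Dict[str, float],
    lr: float,
    trainable_prefix: Optional[str] = None,
) -> Dict[str, float]:
    """Return updated parameters; names not matching the prefix stay frozen.

    With trainable_prefix=None every parameter is updated (full fine-tuning).
    """
    updated = {}
    for name, value in params.items():
        is_trainable = trainable_prefix is None or name.startswith(trainable_prefix)
        updated[name] = value - lr * grads[name] if is_trainable else value
    return updated


# Hypothetical parameter names for a two-tower image-text model.
params = {"image_encoder.w": 1.0, "image_encoder.b": 0.5, "text_encoder.w": 2.0}
grads = {"image_encoder.w": 0.2, "image_encoder.b": -0.1, "text_encoder.w": 0.4}

# Update only the image encoder, leaving the text encoder untouched.
encoder_only = sgd_step(params, grads, lr=0.5, trainable_prefix="image_encoder.")
# Update every layer.
all_layers = sgd_step(params, grads, lr=0.5)
```

In the encoder-only case the text-side parameters keep their pre-trained values, so the text prompts continue to map to the same feature vectors while the image side adapts to one quality level.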
[0062] The adjustment module (200) can generate a first adjusted detection model by updating the image encoder of the first detection model with image data classified as a first quality. The adjustment module (200) can generate a second adjusted detection model by updating the image encoder of the second detection model with image data classified as a second quality. The adjustment module (200) can generate a third adjusted detection model by updating the image encoder of the third detection model with image data classified as a third quality.
[0063] For example, the image encoder of the first detection model, which has the largest number of parameters, can be updated with high-quality classified image data, and the image encoder of the third detection model, which has the smallest number of parameters, can be updated with low-quality classified image data.
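One way to express the pairing of detection models with quality classes is to sort the models by parameter count and match them against the quality classes ordered from best to worst. The model names and parameter counts below are hypothetical, and matching the middle-sized model to the middle quality class is an assumption consistent with, but not stated in, the example above.

```python
from typing import Dict, List

# Hypothetical detection-model variants and parameter counts (illustrative only).
MODEL_PARAM_COUNTS: Dict[str, int] = {
    "detector_small": 90_000_000,
    "detector_large": 430_000_000,
    "detector_medium": 150_000_000,
}

# Quality classes ordered from best to worst.
QUALITY_ORDER: List[str] = ["high", "medium", "low"]


def pair_models_with_quality(
    param_counts: Dict[str, int], quality_order: List[str]
) -> Dict[str, str]:
    """Pair the largest model with the best quality class, and so on down."""
    if len(param_counts) != len(quality_order):
        raise ValueError("need exactly one model per quality class")
    by_size = sorted(param_counts, key=param_counts.get, reverse=True)
    return dict(zip(quality_order, by_size))


assignment = pair_models_with_quality(MODEL_PARAM_COUNTS, QUALITY_ORDER)
```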
[0064] The measurement module (300) can receive a second image dataset and measure the detection performance of the adjusted detection model. Here, the second image dataset is a dataset of images having different qualities and may be the same as or different from the first image dataset. For example, the second image dataset may include a fourth quality image, a fifth quality image, and a sixth quality image. The measurement module (300) may also be referred to as a measurement processor.
[0065] For example, the measurement module (300) can receive the fourth quality image, the fifth quality image, and the sixth quality image and measure the detection performance of each of the first to third adjusted detection models.
[0066] For example, the measurement module (300) can measure performance using deepfake measurement metrics such as AUC or ACC.
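For reference, the two metrics named above can be computed as in the following sketch. It assumes that labels are 1 for deepfake and 0 for real, that each model outputs a score where higher means more likely deepfake, and that ACC uses a 0.5 decision threshold; none of these conventions is fixed by the source.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of samples whose thresholded score matches the label (1 = deepfake)."""
    correct = sum(int(s >= threshold) == y for y, s in zip(labels, scores))
    return correct / len(labels)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half. This pairwise form is written for clarity, not speed.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.8, 0.2]
acc_value = accuracy(labels, scores)  # 5 of 6 samples are classified correctly
auc_value = auc(labels, scores)       # 8 of 9 positive-negative pairs are ordered correctly
```

AUC is independent of the decision threshold, whereas ACC depends on it, so the two metrics can rank the adjusted detection models differently.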
[0067] The generation module (400) can generate a meta-model based on the measured detection performance and the adjusted detection model. Here, the meta-model may refer to a model generated by assigning different weights to the adjusted detection model based on the measured detection performance. For example, the generation module (400) can generate a meta-model by assigning a first weight to the first adjusted detection model and a second weight to the second adjusted detection model. The generation module (400) may also be referred to as a generation processor.
[0068] The generation module (400) can generate a meta-model using a weighted average or a weighted multi-classification method. For example, the generation module (400) can generate a meta-model by using a weighted average method to return a probability value (e.g., a Sigmoid output value).
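A weighted-average meta-model of this kind can be sketched as below. The rule used to derive the weights (each model's measured score divided by the sum of all scores) is an assumption; the source states only that different weights are assigned based on measured performance. The stub detectors return fixed probabilities and stand in for the adjusted detection models.

```python
from typing import Callable, Dict

Detector = Callable[[object], float]  # returns P(deepfake) in [0, 1]


def performance_weights(performance: Dict[str, float]) -> Dict[str, float]:
    """Normalize per-model performance scores (e.g. AUC) so the weights sum to 1."""
    total = sum(performance.values())
    if total <= 0:
        raise ValueError("performance scores must sum to a positive value")
    return {name: score / total for name, score in performance.items()}


def build_meta_model(detectors: Dict[str, Detector], performance: Dict[str, float]) -> Detector:
    """Return a callable that outputs the weighted average of detector probabilities."""
    weights = performance_weights(performance)

    def meta_model(image: object) -> float:
        return sum(weights[name] * detect(image) for name, detect in detectors.items())

    return meta_model


# Stub detectors that return fixed probabilities, standing in for adjusted models.
detectors = {"first": lambda img: 0.9, "second": lambda img: 0.6, "third": lambda img: 0.3}
measured_auc = {"first": 0.9, "second": 0.6, "third": 0.5}

meta_model = build_meta_model(detectors, measured_auc)
probability = meta_model("external_image")
```

Because the weights sum to 1 and each detector returns a value between 0 and 1, the combined output also stays between 0 and 1 and can be read as a probability.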
[0069] The judgment module (500) can receive an external image and determine whether the image is a deepfake using the meta-model generated by the generation module (400). Specifically, the judgment module (500) can receive an external image together with a message requesting a determination of whether it is a deepfake, and can then use the meta-model to make that determination. The judgment module (500) may also be referred to as a judgment processor.
[0070] According to one embodiment, it was confirmed that the detection performance of deepfakes is superior when using the present invention compared to when using Xception, which is widely used as a backbone.
[0071]
[0072] FIG. 2 is a diagram illustrating operations performed in the components of a deepfake detection system according to one embodiment.
[0073] Referring to FIG. 2, the classification module (100) can classify the image quality of the input first image dataset into high quality, medium quality, and low quality. Specifically, the classification module (100) can input the first image dataset into a deep learning model to classify it into high quality, medium quality, and low quality.
[0074] The adjustment module (200) can adjust the parameters of the first detection model using image data classified as high quality. The adjustment module (200) can adjust the parameters of the second detection model using image data classified as medium quality. The adjustment module (200) can adjust the parameters of the third detection model using image data classified as low quality.
[0075] Here, the first to third detection models may be trained to increase the similarity of correct image-text pairs and to decrease the similarity of incorrect image-text pairs. Here, the correct image-text pairs may be, for example, a deepfake image-"fake person" pair and a real image-"real person" pair, and the incorrect image-text pairs may be a deepfake image-"real person" pair and a real image-"fake person" pair.
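To make the image-text pairing concrete, the sketch below scores one image feature vector against the feature vectors of the two prompts and converts the similarities into probabilities. The three-dimensional vectors are made up, the temperature value is an assumption, and picking the prompt with the highest similarity at inference time is one plausible way to use such a model rather than a procedure stated in the source.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def prompt_probabilities(
    image_vec: Sequence[float],
    text_vecs: Dict[str, Sequence[float]],
    temperature: float = 0.1,
) -> Dict[str, float]:
    """Softmax over temperature-scaled image-text similarities."""
    logits = {k: cosine_similarity(image_vec, v) / temperature for k, v in text_vecs.items()}
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - peak) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


# Made-up 3-dimensional embeddings; real encoders produce far larger vectors.
text_vecs = {"fake person": [1.0, 0.0, 0.0], "real person": [0.0, 1.0, 0.0]}
image_vec = [0.8, 0.2, 0.1]

probs = prompt_probabilities(image_vec, text_vecs)
predicted = max(probs, key=probs.get)  # the prompt most similar to the image
```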
[0076] Specifically, the adjustment module (200) can generate a first adjusted detection model by updating the image encoder of the first detection model with image data classified as high quality. The adjustment module (200) can generate a second adjusted detection model by updating the image encoder of the second detection model with image data classified as medium quality. The adjustment module (200) can generate a third adjusted detection model by updating the image encoder of the third detection model with image data classified as low quality.
[0077] The measurement module (300) can measure the detection performance of each of the first adjusted detection model, the second adjusted detection model, and the third adjusted detection model using a second image dataset containing high-quality, medium-quality, and low-quality images. For example, the measurement module (300) can measure the performance of the adjusted detection models using AUC, ACC, or the like.
[0078] The generation module (400) can generate a meta-model based on the performance measured by the measurement module (300) and the first to third adjusted detection models generated by the adjustment module (200). Specifically, the generation module (400) can generate a meta-model by assigning different weights to the first to third adjusted detection models based on the measured performance.
[0079] The judgment module (500) can receive an external image and use the meta-model generated by the generation module (400) to determine whether the image is a deepfake. Specifically, the judgment module (500) can receive an external image together with a message requesting a determination of whether it is a deepfake, and can use the meta-model to determine whether it is a deepfake (T) or not (F).
[0080]
[0081] FIG. 3 is a diagram illustrating a method for training a previously trained detection model according to one embodiment.
[0082] Referring to Fig. 3, the previously trained detection model may have been trained using image data including false images and true images as training images. The previously trained detection model may have been trained using text data including false text and true text as training text.
[0083] For example, referring to Fig. 3, I1 and I3 may be values encoded from a deepfake image. I2 and IN may be values encoded from a real image. T1 may be a value encoded from "fake person", and T2 may be a value encoded from "real person".
[0084] The previously trained detection model may be trained to increase the similarity of correct image-text pairs and decrease the similarity of incorrect image-text pairs. Here, correct image-text pairs refer to false image and false text pairs and true image and true text pairs, and incorrect image-text pairs refer to false image and true text pairs and true image and false text pairs.
[0085] Specifically, the correct image-text pairs are deepfake image-"fake person" pairs and real image-"real person" pairs, and the incorrect image-text pairs can be deepfake image-"real person" pairs and real image-"fake person" pairs.
[0086] For example, referring to Fig. 3, I1T1, I2T2, and I3T1 may be ground truth image-text pairs. Here, the pre-trained detection model may be trained to increase the similarity of the I1T1, I2T2, and I3T1 pairs and decrease the similarity of the remaining pairs.
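The training objective described for FIG. 3 can be sketched as a cross-entropy loss over image-text similarities. This is an illustration under assumptions rather than the patented training procedure: the feature vectors are made-up unit vectors, the temperature is a hypothetical value, and the loss is taken only in the image-to-text direction because several images share the same text.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product; equals cosine similarity for L2-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))


def image_text_loss(
    image_vecs: List[Sequence[float]],
    text_vecs: List[Sequence[float]],
    targets: List[int],
    temperature: float = 0.1,
) -> float:
    """Mean cross-entropy of each image's similarities against its correct text.

    targets[i] is the index of the correct text for image i. Minimizing this
    raises the similarity of correct pairs relative to incorrect pairs.
    """
    total = 0.0
    for img, target in zip(image_vecs, targets):
        logits = [dot(img, txt) / temperature for txt in text_vecs]
        peak = max(logits)
        log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_norm - logits[target]  # -log softmax(logits)[target]
    return total / len(image_vecs)


# T1 = "fake person", T2 = "real person" (made-up unit vectors).
texts = [[1.0, 0.0], [0.0, 1.0]]
# I1 and I3 are deepfakes (target T1, index 0); I2 is real (target T2, index 1).
targets = [0, 1, 0]

aligned = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]     # I3 still leans toward T2
misaligned = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]  # every image matches the wrong text

loss_aligned = image_text_loss(aligned, texts, targets)
loss_misaligned = image_text_loss(misaligned, texts, targets)
```

The loss is lower when the I1T1, I2T2, and I3T1 similarities dominate, which matches the stated goal of increasing the similarity of correct pairs and decreasing that of the remaining pairs.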
[0087]
[0088] FIG. 4 is a flowchart of a deepfake detection method according to one embodiment.
[0089] Referring to FIG. 4, a deepfake detection method according to one embodiment may include a step of classifying the quality of a first image dataset (S100), a step of adjusting parameters of a previously trained detection model (S200), a step of measuring the detection performance of the detection model with a second image dataset (S300), a step of generating a meta-model (S400), and a step of determining whether it is a deepfake (S500).
[0090] Step S100 may be a step in which the classification module (100) inputs a first image dataset into a deep learning model to classify the quality of one or more images included in the first image dataset. For example, in Step S100, the classification module (100) may input the first image dataset into the deep learning model to classify each image as a first quality image, a second quality image, or a third quality image. Step S100 may include a step in which the classification module (100) divides each of the one or more images into frames and detects a face in the divided frames to extract face image data.
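As a non-limiting illustration, Step S100 may be sketched as two helpers: one that groups images by the quality label returned by a classifier, and one that splits an input into frames and keeps the detected faces. The classifier, the frame splitter, and the face detector are passed in as callables because this description does not specify them; the toy stand-ins and the "score" field below are purely hypothetical.

```python
QUALITY_LABELS = ("first", "second", "third")

def classify_quality(images, quality_model):
    """Group images by the quality label predicted by quality_model."""
    buckets = {label: [] for label in QUALITY_LABELS}
    for image in images:
        label = quality_model(image)
        if label not in buckets:
            raise ValueError(f"unexpected quality label: {label!r}")
        buckets[label].append(image)
    return buckets

def extract_face_data(video, split_frames, detect_face):
    """Split the input into frames and keep a face crop for each frame with a face."""
    faces = []
    for frame in split_frames(video):
        face = detect_face(frame)
        if face is not None:
            faces.append(face)
    return faces

def toy_quality_model(image):
    """Placeholder classifier: thresholds a made-up 'score' field."""
    if image["score"] >= 0.7:
        return "first"
    if image["score"] >= 0.4:
        return "second"
    return "third"

dataset = [
    {"id": "a", "score": 0.9},
    {"id": "b", "score": 0.5},
    {"id": "c", "score": 0.1},
    {"id": "d", "score": 0.8},
]
buckets = classify_quality(dataset, toy_quality_model)
bucket_ids = {label: [img["id"] for img in imgs] for label, imgs in buckets.items()}

# Toy "video": a list of frames, some of which contain no face.
toy_video = [{"face": "crop0"}, {"face": None}, {"face": "crop2"}]
faces = extract_face_data(
    toy_video,
    split_frames=lambda video: video,       # placeholder frame splitter
    detect_face=lambda frame: frame["face"],  # placeholder face detector
)
```

In practice the placeholders would be replaced by the deep learning model and a face detector of the implementer's choice.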
[0091] Step S200 may be a step in which the adjustment module (200) adjusts the parameters of a pre-trained detection model based on the one or more classified image data. Step S200 may include a step in which the adjustment module (200) updates the image encoder of the pre-trained detection model according to the quality of the classified image data. Specifically, in Step S200, the adjustment module (200) may create a first adjusted detection model by updating the image encoder of a first detection model with image data classified as a first quality. Likewise, the adjustment module (200) may create a second adjusted detection model by updating the image encoder of a second detection model with image data classified as a second quality, and a third adjusted detection model by updating the image encoder of a third detection model with image data classified as a third quality.
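As a non-limiting illustration, the per-quality adjustment of Step S200 may be sketched as follows. The sketch assumes that each detection model is a dictionary holding an image encoder and a text encoder, and that only the image encoder is changed while the text encoder is left as is; the latter is an assumption, since this description only states that the image encoder is updated. The update rule `toy_update` is a placeholder and is not a real fine-tuning procedure.

```python
import copy

def adjust_models(base_models, buckets, update_image_encoder):
    """Return one adjusted model per quality label.

    base_models maps a quality label to a pre-trained model; buckets maps
    the same label to the image data classified as that quality.
    """
    adjusted = {}
    for label, data in buckets.items():
        model = copy.deepcopy(base_models[label])  # keep the original intact
        model["image_encoder"] = update_image_encoder(model["image_encoder"], data)
        adjusted[label] = model
    return adjusted

def toy_update(encoder_weights, data):
    """Placeholder update: nudge every weight toward the mean of the data."""
    mean = sum(data) / len(data)
    return [w + 0.1 * (mean - w) for w in encoder_weights]

labels = ("first", "second", "third")
base_models = {
    label: {"image_encoder": [0.0, 1.0], "text_encoder": [0.5, 0.5]}
    for label in labels
}
# Hypothetical image data, already grouped by quality.
buckets = {"first": [1.0, 1.0], "second": [0.5], "third": [0.0, 0.2]}

adjusted = adjust_models(base_models, buckets, toy_update)
```

Each quality bucket yields its own adjusted detection model, and the pre-trained models themselves are not modified.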
[0092] Step S300 may be a step in which the measurement module (300) receives a second image dataset and measures the detection performance of the adjusted detection model. Specifically, in Step S300, the measurement module (300) may measure the performance of the first adjusted detection model, the second adjusted detection model, and the third adjusted detection model according to the quality of the second image dataset. For example, in Step S300, the measurement module (300) may measure the performance of the first adjusted detection model, the second adjusted detection model, and the third adjusted detection model using the first quality image data, the second quality image data, and the third quality image data.
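As a non-limiting illustration, the measurement of Step S300 may be sketched as a table of scores with one row per adjusted detection model and one column per quality subset of the second image dataset. Accuracy is used here only as an assumed metric; this description does not name one. The threshold models and the sample values are hypothetical.

```python
def accuracy(model, samples):
    """Fraction of (image, is_fake) samples that the model labels correctly."""
    if not samples:
        return 0.0
    correct = sum(1 for image, is_fake in samples if model(image) == is_fake)
    return correct / len(samples)

def measure(adjusted_models, eval_buckets):
    """Return performance[model_label][quality_label] = accuracy."""
    return {
        model_label: {
            quality: accuracy(model, samples)
            for quality, samples in eval_buckets.items()
        }
        for model_label, model in adjusted_models.items()
    }

# Placeholder models: each returns True (deepfake) above its own threshold.
adjusted_models = {
    "first": lambda x: x > 0.5,
    "second": lambda x: x > 0.3,
    "third": lambda x: x > 0.7,
}
# Hypothetical second image dataset, grouped by quality, as (image, is_fake).
eval_buckets = {
    "first": [(0.9, True), (0.1, False)],
    "second": [(0.4, True), (0.2, False)],
    "third": [(0.6, False), (0.8, True)],
}

performance = measure(adjusted_models, eval_buckets)
```

In this toy setup each model scores best on the quality it was adjusted for, which is the kind of pattern the measured performance is meant to expose.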
[0093] Step S400 may be a step in which the generation module (400) generates a meta-model based on the measured detection performance and the adjusted detection models. Specifically, in Step S400, the generation module (400) may generate the meta-model by assigning different weights to the adjusted detection models, using a weighted average or a weighted multi-classification method based on the measured detection performance. For example, in Step S400, the generation module (400) may generate the meta-model by assigning different weights to the first adjusted detection model, the second adjusted detection model, and the third adjusted detection model based on the measured detection performance.
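As a non-limiting illustration, the weighted-average variant of Step S400 may be sketched as follows. The sketch assumes that each adjusted detection model returns a deepfake probability between 0 and 1 and that the weights are the measured scores normalized to sum to 1; this description does not specify how the weights are derived, so the normalization is only one possible choice. All names and values are hypothetical.

```python
def performance_weights(scores):
    """Normalize per-model scores so that the weights sum to 1."""
    total = sum(scores.values())
    if total <= 0:
        # Fall back to equal weights if no model has a positive score.
        return {name: 1.0 / len(scores) for name in scores}
    return {name: score / total for name, score in scores.items()}

def make_meta_model(models, weights):
    """Build a meta-model that returns the weighted average of the models' outputs."""
    def meta_model(image):
        return sum(weights[name] * model(image) for name, model in models.items())
    return meta_model

# Hypothetical measured detection performance for three adjusted models.
scores = {"first": 0.9, "second": 0.6, "third": 0.5}
weights = performance_weights(scores)

# Placeholder models returning a fixed deepfake probability for any image.
models = {
    "first": lambda image: 0.8,
    "second": lambda image: 0.4,
    "third": lambda image: 0.2,
}
meta_model = make_meta_model(models, weights)
meta_score = meta_model("any image")
```

A model that performed better in Step S300 thereby contributes more to the combined score.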
[0094] Step S500 may be a step in which the judgment module (500) receives an external image as input and uses the meta-model to determine whether it is a deepfake. Specifically, in Step S500, the judgment module (500) may receive an external image together with a message for determining whether it is a deepfake, and may use the meta-model to determine whether the image is a deepfake.
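As a non-limiting illustration, the decision of Step S500 may be sketched as follows. The sketch assumes that each adjusted detection model yields two similarities for the external image, one to the text "fake person" and one to the text "real person", that a softmax turns them into a deepfake probability, and that the meta-model's weighted score is compared with a threshold of 0.5. These are assumptions for illustration only; the similarities, weights, temperature, and threshold are hypothetical.

```python
import math

def fake_probability(sim_fake, sim_real, temperature=0.1):
    """Softmax over the two image-text similarities, returning P(fake)."""
    a = sim_fake / temperature
    b = sim_real / temperature
    m = max(a, b)  # subtract the max for numerical stability
    exp_a, exp_b = math.exp(a - m), math.exp(b - m)
    return exp_a / (exp_a + exp_b)

def is_deepfake(similarities, weights, threshold=0.5):
    """Combine per-model probabilities with the meta-model weights and decide.

    similarities maps a model name to (similarity to "fake person",
    similarity to "real person") for one external image.
    """
    score = sum(
        weights[name] * fake_probability(sim_fake, sim_real)
        for name, (sim_fake, sim_real) in similarities.items()
    )
    return score >= threshold, score

# Hypothetical similarities from three adjusted models for one external image.
similarities = {
    "first": (0.30, 0.20),
    "second": (0.22, 0.25),
    "third": (0.28, 0.21),
}
weights = {"first": 0.5, "second": 0.3, "third": 0.2}

verdict, score = is_deepfake(similarities, weights)
```

Here two of the three models lean toward "fake person" and carry most of the weight, so the combined score exceeds the threshold and the image is reported as a deepfake.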
[0095] Meanwhile, the embodiments disclosed in this specification may be implemented in the form of a recording medium that stores instructions executable by a computer. The instructions may be stored in the form of program code and, when executed by a processor, may generate a program module to perform the operations of the disclosed embodiments. The recording medium may be implemented as a computer-readable recording medium. A computer-readable recording medium may include all types of recording media that store instructions readable by a computer. Examples include ROM, RAM, magnetic tape, magnetic disk, flash memory, and optical data storage devices.
[0096] The above descriptions are specific embodiments for carrying out the present disclosure. The present disclosure includes not only the embodiments described above, but also embodiments that can be obtained from them through simple design changes or easy modification. Furthermore, the present disclosure includes technologies that can be easily modified and implemented using the embodiments described above. Accordingly, the scope of the present disclosure should not be limited to the embodiments described above, but should be defined by the claims set forth below as well as equivalents to the claims of the present disclosure.
Claims
1. A deepfake detection method performed by at least one processor, the method comprising:
a step of inputting a first image dataset into a deep learning model to classify the quality of one or more images included in the first image dataset;
a step of adjusting the parameters of a pre-trained detection model based on the one or more classified image data;
a step of receiving a second image dataset and measuring the detection performance of the adjusted detection model;
a step of generating a meta-model based on the measured detection performance and the adjusted detection model; and
a step of receiving an external image as input and determining whether it is a deepfake using the meta-model.
2. The deepfake detection method of claim 1, wherein the step of classifying the quality of the images comprises a step of dividing each of the one or more images into frames, detecting a face in the divided frames, and extracting face image data.
3. The deepfake detection method of claim 1, wherein the pre-trained detection model measures the similarity between the feature vector of a training image and the feature vector of a training text, and is trained based on the measured similarity.
4. The deepfake detection method of claim 3, wherein the training image includes a false image and a true image, and the training text includes text corresponding to the false image and text corresponding to the true image.
5. The deepfake detection method of claim 3, wherein the pre-trained detection model is trained to increase the similarity of correct image-text pairs and decrease the similarity of incorrect image-text pairs.
6. The deepfake detection method of claim 1, wherein the pre-trained detection model includes one or more detection models with different patch sizes and numbers of parameters.
7. The deepfake detection method of claim 1, wherein the step of adjusting the parameters of the pre-trained detection model comprises a step of updating the image encoder of the pre-trained detection model according to the quality of the classified image data.
8. The deepfake detection method of claim 1, wherein the step of adjusting the parameters of the pre-trained detection model comprises a step of updating the image encoder of a first detection model with image data classified as a first quality, and updating the image encoder of a second detection model with image data classified as a second quality.
9. The deepfake detection method of claim 1, wherein the step of generating the meta-model comprises a step of generating the meta-model by assigning different weights to the adjusted detection model based on the measured detection performance.
10. The deepfake detection method of claim 1, wherein the step of generating the meta-model comprises a step of generating the meta-model by assigning a first weight to a first adjusted detection model and assigning a second weight to a second adjusted detection model based on the measured detection performance.
11. A non-transitory computer-readable recording medium having recorded thereon a computer program for executing the deepfake detection method of any one of claims 1 to 10.
12. A deepfake detection system comprising:
a first module that inputs a first image dataset into a deep learning model to classify the quality of one or more images included in the first image dataset;
a second module that adjusts the parameters of a pre-trained detection model using the one or more classified image data;
a third module that receives a second image dataset as input and measures the detection performance of the parameter-adjusted detection model;
a fourth module that generates a meta-model based on the measured detection performance and the parameter-adjusted detection model; and
a fifth module that receives an external image as input and uses the meta-model to determine whether it is a deepfake.