Learning program, information processing device, and inference program

By generating composite images with retained high-frequency components and consistent labels, the method addresses the issue of impaired features in existing training data methods, improving the accuracy of fake image detection models.

WO2026126436A1 · PCT designated stage · Publication Date: 2026-06-18 · FUJITSU LTD

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
FUJITSU LTD
Filing Date
2024-12-12
Publication Date
2026-06-18

AI Technical Summary

Technical Problem

Existing methods for generating training data for fake image detection, such as CutMix, can impair features useful for distinguishing real from fake images, leading to inaccurate machine learning models.

Method used

A method involving the generation of composite images by replacing low-frequency components of one image with those of another, while maintaining high-frequency components, and ensuring the composite images have consistent labels, to create accurate training data for machine learning models.

Benefits of technology

This approach retains important features for fake detection and prevents incorrect labeling, enhancing the accuracy of machine learning models in distinguishing between real and fake images.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure JP2024044065_18062026_PF_FP_ABST

Abstract

The present invention improves the accuracy of fake detection. A processing unit (12) acquires a pair of images from among a plurality of images. The processing unit (12) generates a composite image (40) by replacing the low-frequency components (21a), i.e., the frequency components below a first frequency, of a first partial image (21), which is part of a first image (20) of the pair, with the low-frequency components of a second partial image (31), which is the part of a second image (30) of the pair corresponding to the first partial image (21). The processing unit (12) assigns to the composite image (40) a label based on a first label indicating whether the first image (20) is fake and a second label indicating whether the second image (30) is fake. Through machine learning using the composite image generated for each pair acquired from the plurality of images and the label of each composite image, the processing unit (12) generates a machine learning model that, upon input of a third image, outputs an index indicating the likelihood that the third image is fake.

Description

【0001】 This invention relates to a learning program, an information processing device, and an inference program.
【0002】 Artificial intelligence (AI) technology can be used to generate fake images that depict people's faces or other subjects. For example, fake images generated by machine learning models based on deep learning are sometimes called "deepfakes." Models that can be used for this purpose include, for example, variational autoencoders, generative adversarial networks, and diffusion models.
【0003】 Fake images can depict people doing or saying things that they never actually did or said, yet to viewers the people in them appear real. Malicious use of fake images can therefore lead to the spread of misinformation, invasion of privacy, and even manipulation of political decision-making, potentially damaging the credibility of organizations and causing social disruption.
【0004】 Methods for detecting fake images have therefore been proposed. For example, one proposed device uses machine learning to generate a model that determines whether verification data is fake by comparing white data (images and video data created by registered users themselves) with simulated fake data created using the users' facial information.
【0005】 Another proposed learning device generates synthetic data by combining real data with fake data that mimics the real data at a desired synthesis ratio, and trains a discrimination model on data to be identified that includes the real data, the fake data, and the synthetic data.
【0006】 Furthermore, a device has been proposed that uses a method called CutMix to generate mixed data from existing training samples, thereby increasing the number of training samples.
【0007】 Devices have also been proposed that augment training data using adversarial sample generation algorithms and image editing algorithms, and then use the augmented training data to generate a learning model for recognizing road signs, traffic lights, lane markings, and the like contained in images.
【0008】 Japanese Patent Publication No. 2021-162965; International Publication No. 2021/220343; U.S. Patent Application Publication No. 2022/0270353; Japanese Patent Publication No. 2022-73495.
【0009】 One possible way to obtain training data for machine learning is to generate new images by pasting part of one image onto part of another, as in CutMix. However, the cutting and pasting may impair features that were present in the original images and are useful for fake detection. If machine learning is performed using images whose features have been impaired in this way as training data, the machine learning model may fail to properly learn the criteria for distinguishing real images from fake ones.
【0010】 In one aspect, the present invention aims to improve the accuracy of fake detection.
【0011】 In one embodiment, a learning program is provided that causes a computer to perform the following processes. The computer obtains pairs of images from a plurality of images. The computer generates a composite image by replacing the low-frequency components (the frequency components below a first frequency) of a first partial image, which is part of the first image of a pair, with the low-frequency components of a second partial image, which is the part of the second image of the pair that corresponds to the first partial image. The computer assigns to the composite image a label based on a first label indicating whether the first image is fake and a second label indicating whether the second image is fake. The computer then performs machine learning using the composite image generated for each pair and the label of each composite image, thereby generating a machine learning model that, when given a third image as input, outputs an index indicating the likelihood that the third image is fake.
【0012】 In one embodiment, an inference program is provided that causes a computer to perform the following process. The computer inputs an input image into a machine learning model and obtains the model's index for that image. The machine learning model has been generated by machine learning using a composite image generated for each pair of images among a plurality of images and a label for each composite image, and it outputs, for each input image, an index indicating the likelihood that the image is fake. The label of each composite image is based on a first label indicating whether the first image of the pair is fake and a second label indicating whether the second image of the pair is fake. The composite image is generated by replacing the low-frequency components, i.e., the frequency components below a first frequency, of a first partial image that is part of the first image with the low-frequency components of a second partial image, which is the part of the second image corresponding to the first partial image.
【0013】 In one embodiment, an information processing device having a storage unit and a processing unit is provided.
【0014】 In one aspect, the accuracy of fake detection can be improved. The above and other objects, features, and advantages of the present invention will become apparent from the following description taken in conjunction with the accompanying drawings, which illustrate preferred embodiments as examples of the present invention.
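For contrast with the method summarized above, the plain cut-and-paste mixing that 【0009】 identifies as the problem can be sketched as follows. This is a minimal illustration, not code from the cited publications. It assumes grayscale images stored as 2-D NumPy arrays, and `cutmix_paste` is a hypothetical helper name. Every pixel inside the box, including all of its fine high-frequency detail, comes from the second image, so whatever traces the first image carried there are lost.

```python
import numpy as np

def cutmix_paste(img_a, img_b, top, left, side):
    """Plain cut-and-paste (CutMix-style): copy a side x side box from img_b into img_a.

    Every frequency component inside the box comes from img_b, so any fine,
    high-frequency traces that img_a carried in that region are discarded.
    """
    mixed = img_a.astype(float)  # astype returns a copy, so img_a is untouched
    box = (slice(top, top + side), slice(left, left + side))
    mixed[box] = img_b[box]
    return mixed

# Usage with random stand-ins for two images
rng = np.random.default_rng(0)
a, b = rng.random((32, 32)), rng.random((32, 32))
m = cutmix_paste(a, b, top=4, left=8, side=16)
```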
【0015】 Figure 1 is a diagram illustrating the information processing device of the first embodiment.
Figure 2 is a diagram showing an example of the hardware of the information processing device of the second embodiment.
Figure 3 is a diagram showing an example of the functions of the information processing device.
Figure 4 is a diagram showing an example of the training data distribution.
Figure 5 is a diagram showing a first example of image synthesis.
Figure 6 is a diagram showing a second example of image synthesis.
Figure 7 is a flowchart showing an example of the learning process.
Figure 8 is a flowchart showing an example of the data augmentation process.
Figure 9 is a flowchart showing an example of the model training process.
Figure 10 is a flowchart showing an example of the inference process.
Figure 11 is a diagram showing a comparative example.
Figure 12 is a diagram showing an example of the accuracy of fake detection.
【0016】 Embodiments are described below with reference to the drawings. [First Embodiment] The first embodiment will be described.
【0017】 Figure 1 is a diagram illustrating an information processing device according to the first embodiment. The information processing device 10 generates training data from existing images and uses the generated training data to train, by machine learning, a model for detecting fake images. Fake images may be still images or videos. The model may also be called a machine learning model, AI model, learning model, prediction model, or inference model. The information processing device 10 has a storage unit 11 and a processing unit 12.
【0018】 The storage unit 11 may be a volatile semiconductor memory such as RAM (Random Access Memory), or non-volatile storage such as an HDD (Hard Disk Drive) or flash memory. The processing unit 12 is a processor such as a CPU (Central Processing Unit), GPU (Graphics Processing Unit), or DSP (Digital Signal Processor). The processing unit 12 may also include application-specific electronic circuits such as an ASIC (Application Specific Integrated Circuit) or FPGA (Field Programmable Gate Array). The processor executes programs stored in memory such as RAM (which may be the storage unit 11). A set of multiple processors is sometimes called a "multiprocessor" or simply a "processor."
【0019】 The storage unit 11 stores a plurality of images, which are used to generate new training data. For example, each of the images may include a person's face. However, the images need not represent a person's face; they may, for example, represent an animal or a predetermined object.
【0020】 Each of the images is labeled in advance to indicate whether it is fake. Specifically, if the face in an image belongs to a real person and has not been synthesized using AI technology, the image is labeled as not fake, i.e., real. Conversely, if the face in the image has been synthesized using AI technology, the image is labeled as fake.
【0021】 The storage unit 11 also stores model information used for detecting fake images, such as the model to be trained and the values of the parameters included in the model. The model is, for example, a neural network, such as a convolutional neural network (CNN).
【0022】 The processing unit 12 generates new training data, i.e., extended training data, from the plurality of images stored in the storage unit 11, as follows. First, the processing unit 12 obtains a pair consisting of a first image and a second image from the images stored in the storage unit 11. The first image includes the face of a first person, and the second image includes the face of a second person. Figure 1 shows a first image 20 and a second image 30 as examples.
【0023】 The processing unit 12 generates a composite image 40 by replacing the low-frequency components 21a of a first partial image 21 (a part of the first image 20) with the low-frequency components of a second partial image 31. The low-frequency components 21a are the frequency components of the first partial image 21 that lie below a first frequency. The second partial image 31 is the part of the second image 30 that corresponds to the first partial image 21.
【0024】 Specifically, the processing unit 12 first randomly selects, from the entire area of the first image 20, a sub-region to serve as the first partial image 21. For example, the processing unit 12 randomly draws a parameter λ, called the mixing ratio, with 0 < λ < 1 and λ ~ Beta(α, α), α = 1. Here Beta(α, α) denotes the beta distribution; with α = 1 it is the uniform distribution. The processing unit 12 then randomly selects from the first image 20 a square sub-region with side length √(1-λ) and designates this sub-region as the first partial image 21. In this description the side length of each image is taken to be 1, and √(1-λ) means (1-λ)^(1/2). The processing unit 12 designates the sub-region of the second image 30 that corresponds to the first partial image 21 as the second partial image 31.
【0025】 Next, the processing unit 12 removes the low-frequency components 21a, which correspond to frequencies lower than the first frequency, from the first partial image 21. It also removes the high-frequency components 31a, which correspond to frequencies higher than the first frequency, from the second partial image 31.
【0026】 For example, the processing unit 12 can retain only the low-frequency or only the high-frequency components of an image as follows. The processing unit 12 may perform a discrete Fourier transform on the target image and then remove the frequency components in a specific frequency range. Alternatively, the processing unit 12 may obtain the low-frequency components of an image by Gaussian smoothing. It may then obtain the high-frequency components by comparing the obtained low-frequency components with the original image, for example by taking their difference. Obtaining the low-frequency components of an image, i.e., low-pass filtering, is equivalent to removing its high-frequency components. Likewise, obtaining the high-frequency components, i.e., high-pass filtering, is equivalent to removing its low-frequency components.
【0027】 Here, the processing unit 12 may use as the first frequency, for example, the frequency at X percent of the full frequency range contained in each image, where X can be set arbitrarily, e.g., X = 25, 50, 75, …. That is, let H denote the maximum frequency of the frequency components contained in each image. The processing unit 12 may then treat components with frequencies lower than a first frequency f = H/4 (or H/2, 3H/4, etc.) as low-frequency components, and components with frequencies higher than f as high-frequency components.
【0028】 The processing unit 12 generates the composite image 40 within the first image 20. It combines the first partial image 21, from which the low-frequency components 21a have been removed, with the second partial image 31, from which the high-frequency components 31a have been removed. As a result, at the location of the first partial image 21 in the composite image 40, the original high-frequency components remain. Only the low-frequency components there are replaced with those of the second partial image 31.
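The frequency-split compositing of 【0023】 to 【0028】 can be sketched in NumPy as below. This is an illustrative reading, not Fujitsu's implementation, and several details are assumptions:

- the images are square grayscale 2-D arrays;
- the low-pass and high-pass filters are an ideal (hard) radial mask on the 2-D DFT;
- the first frequency is X% of the largest radial frequency present in the patch.

All function names are hypothetical.

```python
import numpy as np

def sample_mixing_ratio(rng, alpha=1.0):
    # lambda ~ Beta(alpha, alpha); alpha = 1 gives the uniform distribution
    return rng.beta(alpha, alpha)

def select_square(rng, size, lam):
    # Random square whose side is sqrt(1 - lam) times the image side (in pixels)
    side = max(1, int(round(np.sqrt(1.0 - lam) * size)))
    top = int(rng.integers(0, size - side + 1))
    left = int(rng.integers(0, size - side + 1))
    return top, left, side

def split_frequencies(patch, x_percent):
    # Assumed filter: ideal radial DFT mask, cutoff = X% of the largest radial frequency
    spectrum = np.fft.fft2(patch)
    fy = np.fft.fftfreq(patch.shape[0])[:, None]
    fx = np.fft.fftfreq(patch.shape[1])[None, :]
    radius = np.hypot(fx, fy)
    low_mask = radius < (x_percent / 100.0) * radius.max()
    low = np.real(np.fft.ifft2(spectrum * low_mask))
    high = patch - low  # complementary band, so low + high reconstructs the patch
    return low, high

def hf_preserving_mix(img_a, img_b, lam, x_percent, rng):
    # Inside the square: keep A's high frequencies, take the low frequencies from B
    top, left, side = select_square(rng, img_a.shape[0], lam)
    box = (slice(top, top + side), slice(left, left + side))
    _, high_a = split_frequencies(img_a[box], x_percent)
    low_b, _ = split_frequencies(img_b[box], x_percent)
    mixed = img_a.astype(float)  # astype returns a copy, so img_a is untouched
    mixed[box] = high_a + low_b
    return mixed, (top, left, side)

# Usage with random stand-ins for two same-label face crops
rng = np.random.default_rng(0)
a, b = rng.random((64, 64)), rng.random((64, 64))
lam = sample_mixing_ratio(rng)
mixed, box = hf_preserving_mix(a, b, lam, x_percent=25, rng=rng)
```

Because the high band is computed as the patch minus its low band, re-splitting the mixed square returns A's high band and B's low band unchanged. A Gaussian-smoothing low-pass, the alternative named in 【0026】, could replace `split_frequencies` without changing the rest of the sketch.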
【0029】 The processing unit 12 assigns to the composite image 40 a label L3 based on a first label L1 indicating whether the first image 20 is fake and a second label L2 indicating whether the second image 30 is fake.
【0030】 For example, a label is a pair of two numerical values [a, b], where real is represented by [1, 0] and fake by [0, 1]. Let $L_1 = [a_1, b_1]$ and $L_2 = [a_2, b_2]$. The processing unit 12 may then set label L3 according to the mixing ratio λ as follows:
$$L_3 = \lambda\,[a_1, b_1] + (1-\lambda)\,[a_2, b_2] = [\lambda a_1 + (1-\lambda)\,a_2,\ \lambda b_1 + (1-\lambda)\,b_2]$$
【0031】 However, the location of the first partial image 21 is selected at random. Mixing the two images may therefore remove local feature information useful for fake detection from the composite image 40, so the label L3 computed in this way may be incorrect.
【0032】 It is therefore preferable that the processing unit 12 obtain, as the first image 20 and the second image 30, a pair of images having the same label, and generate the composite image 40 from that pair. In this case, the label L3 is the same as the labels L1 and L2. When L1 and L2 are real, L3 is also real, and when L1 and L2 are fake, L3 is also fake. This prevents the incorrect labeling described above.
【0033】 The processing unit 12 generates a composite image by the above method for each pair of images obtained from the storage unit 11 and assigns a label to each composite image. It then performs machine learning using the generated composite images and their labels. This produces a machine learning model that, when given an image including a person's face, outputs an index indicating the likelihood that the image is fake. The index is, for example, the probability that the image is fake (fake prediction probability).
【0034】 The processing unit 12 can obtain this index for an image, for example one including a person's face, by inputting the image into the generated machine learning model. By referring to the fake prediction probability output by the information processing device 10, a user can, for example, gauge how likely an image is to be fake and efficiently identify images suspected of being fake. The machine learning model generated by the information processing device 10 may also be provided to another information processing device and used there for inference.
【0035】 As described above, the information processing device 10 obtains pairs of images from a plurality of images. For each pair, it generates a composite image by replacing the low-frequency components of the first partial image, which is part of the first image of the pair, with those of the second partial image. The second partial image is the part of the second image of the pair that corresponds to the first partial image. The device assigns to the composite image a label based on a first label indicating whether the first image is fake and a second label indicating whether the second image is fake. It then performs machine learning using the composite image and label generated for each pair. The result is a machine learning model that, given a third image as input, outputs an index indicating the likelihood that the third image is fake. In this way, the information processing device 10 can improve the accuracy of fake detection.
【0036】 In fake images such as deepfakes, for example, the fine details of the image are characteristic, and these fine details are reflected in the high-frequency components of the image. When the processing unit 12 combines the second partial image of the second image with the first partial image of the first image, it therefore retains the high-frequency components of the first partial image. Only the low-frequency components of the first partial image are replaced with those of the second partial image. The information processing device 10 can thus retain features useful for fake detection in the composite image. Furthermore, by using the composite image as training data, the information processing device 10 enables the machine learning model to properly learn the basis for judging whether an image is real or fake.
【0037】 Furthermore, as described above, the information processing device 10 may obtain, as the first and second images, a pair of images having the same label. It then generates a composite image from them and assigns that same label to the composite image. This prevents the information processing device 10 from assigning an incorrect label to the composite image. Using such composite images and labels as training data therefore lets the machine learning model learn the basis for real/fake judgments even more appropriately, further improving the accuracy of fake detection.
【0038】 [Second Embodiment] Next, a second embodiment will be described. Figure 2 is a diagram showing an example of the hardware of the information processing device of the second embodiment. The information processing device 100 generates training data from existing images and uses the generated training data to train, by machine learning, a model for detecting fake images. The information processing device 100 may also be called a computer.
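The label arithmetic of 【0030】 and the same-label rule of 【0032】 fit in a few lines. The sketch below assumes the two-element encoding given in the text (real = [1, 0], fake = [0, 1]); `mix_labels` is a hypothetical name. It also shows why same-label pairing avoids the mislabeling risk described in 【0031】. When L1 = L2, the λ-weighted sum collapses to that common label for every λ, so the composite's label no longer depends on which local features survived the mix.

```python
import numpy as np

REAL = np.array([1.0, 0.0])  # encoding from the text: real = [1, 0]
FAKE = np.array([0.0, 1.0])  # fake = [0, 1]

def mix_labels(label_1, label_2, lam):
    """L3 = lam * L1 + (1 - lam) * L2.

    lam is the mixing ratio; the replaced square covers 1 - lam of the image area.
    """
    l1 = np.asarray(label_1, dtype=float)
    l2 = np.asarray(label_2, dtype=float)
    return lam * l1 + (1.0 - lam) * l2

soft = mix_labels(REAL, FAKE, 0.7)   # mixed-label pair: a soft label between real and fake
hard = mix_labels(FAKE, FAKE, 0.7)   # same-label pair: equals FAKE up to rounding
```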
【0039】 The information processing device 100 includes a processor 101, RAM 102, HDD 103, GPU 104, input interface 105, media reader 106, and communication interface 107, which are connected to an internal bus. The processor 101 corresponds to the processing unit 12 of the first embodiment, and the RAM 102 or HDD 103 corresponds to the storage unit 11 of the first embodiment.
【0040】 The processor 101 is an arithmetic unit that executes program instructions, for example a CPU. The processor 101 loads at least part of the programs and data stored in the HDD 103 into the RAM 102 and executes the programs. The processor 101 may include multiple processor cores, and the information processing device 100 may have multiple processors. Different processes performed by the information processing device 100 may be executed by different processors. At least some of the processes described below may also be executed in parallel using multiple processors or processor cores. A set of multiple processors is sometimes called a "multiprocessor" or simply a "processor." A processor may also be called "processor circuitry."
【0041】 The RAM 102 is a volatile semiconductor memory that temporarily stores programs executed by the processor 101 and data used by the processor 101 for computation. The information processing device 100 may also include other types of memory, and may include multiple types of memory.
【0042】 The HDD 103 is a non-volatile storage device that stores software programs, such as an OS (Operating System), middleware, and application software, as well as data. The information processing device 100 may also include other types of storage devices, such as flash memory or an SSD (Solid State Drive), and may include multiple non-volatile storage devices.
【0043】 The GPU 104 outputs an image to the display 51 connected to the information processing device 100, according to instructions from the processor 101. Any type of display can be used as the display 51, such as a CRT (Cathode Ray Tube) display, a liquid crystal display (LCD), a plasma display, or an organic electro-luminescence (OEL) display.
【0044】 The input interface 105 acquires input signals from an input device 52 connected to the information processing device 100 and outputs them to the processor 101. The input device 52 can be a pointing device such as a mouse, touch panel, touchpad, or trackball, a keyboard, a remote controller, or a button switch. Furthermore, multiple types of input devices may be connected to the information processing device 100.
【0045】 The media reader 106 is a reading device that reads programs and data recorded on the recording medium 53. The recording medium 53 can be, for example, a magnetic disk, an optical disk, a magneto-optical disk (MO), or semiconductor memory. Magnetic disks include flexible disks (FD) and HDDs. Optical disks include CDs (Compact Discs) and DVDs (Digital Versatile Discs).
【0046】 The media reader 106 copies programs and data read from the recording medium 53 to other recording media such as RAM 102 or HDD 103. The read programs are executed by the processor 101, for example. The recording medium 53 may be a portable recording medium and may be used for distributing programs and data. The recording medium 53 and HDD 103 are sometimes referred to as computer-readable recording media.
【0047】 The communication interface 107 is connected to the network 54 and communicates with other information processing devices via the network 54. The communication interface 107 may be a wired communication interface connected to a wired communication device such as a switch or router, or a wireless communication interface connected to a wireless communication device such as a base station or access point.
【0048】 Figure 3 shows an example of the functions of an information processing device. The information processing device 100 includes a training data storage unit 110, an extended training data storage unit 120, a learned data storage unit 130, a mixing ratio acquisition unit 140, a pair sampling probability calculation unit 141, a pair acquisition unit 142, a square selection unit 143, a high-pass filter unit 144, a low-pass filter unit 145, a data mixing unit 146, a learning processing unit 147, and an inference processing unit 148. The training data storage unit 110, the extended training data storage unit 120, and the learned data storage unit 130 use the storage areas of RAM 102 and HDD 103. The mixing ratio acquisition unit 140, the pair sampling probability calculation unit 141, the pair acquisition unit 142, the square selection unit 143, the high-pass filter unit 144, the low-pass filter unit 145, the data mixing unit 146, the learning processing unit 147, and the inference processing unit 148 are realized by the execution of a program stored in RAM 102 by the processor 101. 【0049】 The training data storage unit 110 stores images containing human faces, which serve as the source for generating extended training data. In this example, the images targeted for fake detection are assumed to be videos. A video can also be described as a series of n frames, where n is approximately 16 to 32. The images targeted for fake detection may also be still images. 【0050】Each image stored in the training data storage unit 110 is pre-assigned a label indicating whether it is real or fake. The label "real" is represented as [1,0], indicating that the image is not fake. The label "fake" is represented as [0,1], indicating that the image is fake. In addition, each image is pre-assigned identification information for a group corresponding to the attribute to which the image belongs. 
The attribute or the group corresponding to the attribute may classify gender, such as male or female. The attribute or the group corresponding to the attribute may classify race, age, etc. 【0051】 The extended training data storage unit 120 stores composite images generated based on the images stored in the training data storage unit 110. The trained data storage unit 130 stores information about the trained machine learning model, i.e., the trained model. 【0052】 The mixing ratio acquisition unit 140 acquires the mixing ratio λ (0 < λ < 1). For example, the mixing ratio acquisition unit 140 acquires λ according to a uniform distribution (Beta(1,1)). The pair sampling probability calculation unit 141 calculates the pair sampling probability. The pair sampling probability is calculated for each group. The pair sampling probability indicates the probability of extracting an image belonging to the relevant group. The pair sampling probability is used when extracting another image to be synthesized with a given image, that is, the image that will be the partner in the pair (pair image), from the training data storage unit 110. The pair sampling probability is determined by a probability distribution that follows the reciprocal of the number of images belonging to the group. As a result, when acquiring pairs of images, the minority is preferentially sampled and is more likely to be mixed with the majority in the original image. This makes it possible to generate composite images based on diverse groups. 【0053】The pair acquisition unit 142 acquires image pairs based on a plurality of images stored in the training data storage unit 110 and the pair sampling probability calculated by the pair sampling probability calculation unit 141. At this time, the pair acquisition unit 142 creates pairs of two images that have the same label. For example, for an image labeled "real", the pair acquisition unit 142 selects an image labeled "real" as the pair image. 
For an image labeled "fake", the pair acquisition unit 142 selects an image labeled "fake" as the pair image. 【0054】 The square selection unit 143 selects a square region to be composited from two images in a pair acquired by the pair acquisition unit 142, based on the mixing ratio λ acquired by the mixing ratio acquisition unit 140. Here, the two images in the pair are images A and B. The length of one side of each image is 1. The square selection unit 143 randomly selects a square region with a side length of √(1-λ) from image A. The square selection unit 143 selects a square region of the same size in the same location in image B. 【0055】 The high-pass filter unit 144 performs high-pass filtering on a selected square region of image A. That is, the high-pass filter unit 144 removes low-frequency components and retains high-frequency components in the square region of image A. 【0056】 The low-pass filter unit 145 performs low-pass filtering on a selected square region of image B. That is, the low-pass filter unit 145 removes high-frequency components and retains low-frequency components in the square region of image B. 【0057】 Here, low frequencies and high frequencies are distinguished by predetermined specific frequencies. For example, high frequencies are frequencies above a certain frequency. Low frequencies are frequencies below a certain frequency. The specific frequency is predetermined as a frequency corresponding to X% of the maximum frequency contained in each image stored in the training data storage unit 110. For example, X = 25, 50, 75, etc. 【0058】The high-pass filter section 144 and the low-pass filter section 145, for example, perform a discrete Fourier transform on the target image and then remove frequency components outside the frequency range they each transmit. The low-pass filter section 145 may also acquire low-frequency components of the image by Gaussian smoothing. 
The high-pass filter section 144 may acquire high-frequency components of the image by comparing the low-frequency components acquired by the low-pass filter section 145 with the original image. 【0059】 The data mixing unit 146 generates a composite image by combining the square region of image B, which has been processed with a low-pass filter, with the square region of image A, which has been processed with a high-pass filter. At this time, the data mixing unit 146 retains the high-frequency components of the square region of image A and replaces the low-frequency components with the low-frequency components of the square region of image B. 【0060】 Furthermore, the data mixing unit 146 assigns the same label to the composite image as to each of the source images. For example, if the label of each of the source images is "Real," the data mixing unit 146 assigns the label "Real" to the composite image. If the label of each of the source images is "Fake," the data mixing unit 146 assigns the label "Fake" to the composite image. 【0061】 The data mixing unit 146 generates a composite image for each of the multiple pairs acquired by the pair acquisition unit 142 and assigns a label to the composite image. The data mixing unit 146 stores the generated composite image and label in the extended training data storage unit 120. 【0062】The learning processing unit 147 uses each composite image and the label of each composite image stored in the extended training data storage unit 120 to perform machine learning on a model (machine learning model) that outputs an index indicating the likelihood that an image containing a person's face is fake, in response to the input of such an image. That is, the learning processing unit 147 generates a machine learning model that outputs an index indicating the likelihood that an image containing a person's face is fake, in response to the input of such an image, in response to the input of such an image. 
The machine learning model is, for example, a convolutional neural network (CNN). The index indicating the likelihood of being fake is, for example, the probability that the image is fake, i.e., the fake prediction probability. The fake prediction probability can be expressed in [real, fake] form; for example, [0.4, 0.6] if the probability of being fake is 60%. The learning processing unit 147 stores information such as the parameters of the trained machine learning model in the trained data storage unit 130. The learning processing unit 147 may also perform additional training (retraining) of an already-trained model using the composite images and their labels stored in the extended training data storage unit 120. 【0063】 When the inference processing unit 148 receives an image to be predicted, it inputs the image into the machine learning model stored in the trained data storage unit 130, obtains the fake prediction probability for the image, and outputs it. For example, the inference processing unit 148 may display the fake prediction probability on the display 51, or it may transmit the fake prediction probability information to another information processing device via the network 54. 【0064】 The inference processing unit 148 may be provided by an information processing device other than the information processing device 100. In that case, the information of the trained machine learning model in the trained data storage unit 130 is provided to that other information processing device. 【0065】 Figure 4 shows an example of the training data distribution. Pie chart 60 shows the proportion of images in each group when the images of the training data pre-stored in the training data storage unit 110 are grouped into male and female. Within each group, pie chart 60 also distinguishes between real and fake images.
【0066】 If image pairs were selected at random from all groups, group-based spurious correlations would worsen. Specifically, samples centered on the majority group would be generated more often than samples involving minority groups, which could cause the model to overfit to the majority group and, as a result, bias prediction accuracy across groups. 【0067】 As illustrated in pie chart 60, the original training data contains a very large number of fake images of men. Therefore, if pairs are selected at random, composite images made only from fake images of men are likely to be generated. The resulting lack of diversity among the composite images across groups makes it difficult to improve the generalization performance of the model. 【0068】 Therefore, the pair sampling probability calculation unit 141 calculates the pair sampling probability in proportion to the reciprocal of the number of images belonging to each group. For example, when the numbers of images in three groups are x, y, and z, the image to be paired with a given image may be selected such that the selection ratio among the groups is 1/x : 1/y : 1/z. 【0069】 This allows images belonging to minority groups (e.g., women) to be sampled preferentially and makes them more likely to be mixed with images belonging to majority groups (e.g., men). The information processing device 100 can thus generate composite images from diverse groups, and training a model with these composite images can improve its generalization performance. 【0070】 Next, an example of image synthesis by the information processing device 100 will be explained. Figure 5 shows a first example of image synthesis, in which gender (female, male, etc.) is defined as the group of each image in the training data. First, the pair acquisition unit 142 acquires image 300 from the training data storage unit 110.
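The reciprocal weighting of 【0068】 can be sketched as below. The function names are hypothetical, and two details are assumptions not fixed by the text: group sizes are counted over the whole training set, and the partner is restricted to images with the same label (the same-label pairing rule described above). A group is drawn with probability proportional to 1/(group size), then a partner is drawn uniformly within that group.

```python
import random
from collections import Counter


def group_selection_probs(group_sizes):
    """Turn the ratio 1/x : 1/y : 1/z ... into probabilities that sum to 1."""
    inverse = {g: 1.0 / n for g, n in group_sizes.items() if n > 0}
    total = sum(inverse.values())
    return {g: w / total for g, w in inverse.items()}


def sample_partner(idx, labels, groups, rng):
    """Hypothetical helper: choose a partner image for image `idx`.

    Assumptions: the partner must share the label of `idx`; a group is drawn
    with probability proportional to 1 / (number of images in that group
    across the whole training set), then an image is drawn uniformly from
    that group's candidates.
    """
    sizes = Counter(groups)
    candidates = {}
    for j, (label, group) in enumerate(zip(labels, groups)):
        if j != idx and label == labels[idx]:
            candidates.setdefault(group, []).append(j)
    if not candidates:
        raise ValueError("no other image shares this label")
    probs = group_selection_probs({g: sizes[g] for g in candidates})
    names = list(probs)
    chosen = rng.choices(names, weights=[probs[g] for g in names], k=1)[0]
    return rng.choice(candidates[chosen])
```

For example, with 300 male and 100 female images, the selection ratio 1/300 : 1/100 normalizes to 0.25 : 0.75, so the minority group is drawn three times as often.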
The label of image 300 is "Real" (= [1, 0]), and image 300 belongs to the group "Female". 【0071】 Next, the pair acquisition unit 142 acquires image 400 from the training data storage unit 110 as the paired image for image 300. The label of image 400 is "Real" (= [1, 0]), and image 400 belongs to the group "Male". 【0072】 Based on the mixing ratio λ, the square selection unit 143 randomly selects from image 300 a partial image 310, which is a square region with side length √(1-λ), and cuts it out. The square selection unit 143 also selects from image 400 a partial image 410, which is the square region corresponding to partial image 310, and cuts it out. 【0073】 The high-pass filter unit 144 generates a partial image 311 by applying a high-pass filter to the partial image 310. That is, the high-pass filter unit 144 removes the low-frequency components from the partial image 310, thereby generating a partial image 311 that retains only the high-frequency components of the partial image 310. In this way, the high-pass filter unit 144 converts the portion of image 300 that was partial image 310 into partial image 311. 【0074】 The low-pass filter unit 145 generates a partial image 411 by applying a low-pass filter to the partial image 410. That is, the low-pass filter unit 145 removes the high-frequency components from the partial image 410, thereby generating a partial image 411 that retains only the low-frequency components of the partial image 410. 【0075】 The data mixing unit 146 generates a composite image 500 by combining partial image 411 with partial image 311 in the high-pass-filtered image 300. The data mixing unit 146 assigns to the composite image 500 the label "Real" (= [1, 0]), which is the same as the labels of images 300 and 400.
In the composite region of the composite image 500, the high-frequency components come from partial image 311 and the low-frequency components come from partial image 411. 【0076】 Figure 6 shows a second example of image synthesis, in which race (White, Black, etc.) is defined as the group of each image in the training data. First, the pair acquisition unit 142 acquires image 600 from the training data storage unit 110. The label of image 600 is "Fake" (= [0, 1]), and image 600 belongs to the group "White". 【0077】 Next, the pair acquisition unit 142 acquires image 700 from the training data storage unit 110 as the paired image for image 600. The label of image 700 is "Fake" (= [0, 1]), and image 700 belongs to the group "Black". 【0078】 Based on the mixing ratio λ, the square selection unit 143 randomly selects from image 600 a partial image 610, which is a square region with side length √(1-λ), and cuts it out. The square selection unit 143 also selects from image 700 a partial image 710, which is the square region corresponding to partial image 610, and cuts it out. 【0079】 The high-pass filter unit 144 generates a partial image 611 by applying a high-pass filter to the partial image 610. That is, the high-pass filter unit 144 removes the low-frequency components from the partial image 610, thereby generating a partial image 611 that retains only the high-frequency components of the partial image 610. In this way, the high-pass filter unit 144 converts the portion of image 600 that was partial image 610 into partial image 611. 【0080】 The low-pass filter unit 145 generates a partial image 711 by applying a low-pass filter to the partial image 710.
That is, the low-pass filter unit 145 removes the high-frequency components from the partial image 710, thereby generating a partial image 711 that retains only the low-frequency components of the partial image 710. 【0081】 The data mixing unit 146 generates a composite image 800 by combining partial image 711 with partial image 611 in the high-pass-filtered image 600. The data mixing unit 146 assigns to the composite image 800 the label "Fake" (= [0, 1]), which is the same as the labels of images 600 and 700. In the composite region of the composite image 800, the high-frequency components come from partial image 611 and the low-frequency components come from partial image 711. 【0082】 The information processing device 100 performs machine learning using the extended training data generated by the method illustrated in Figures 5 and 6. Next, the processing procedures of the information processing device 100 will be described, starting with the procedure for the learning process. 【0083】 Figure 7 is a flowchart showing an example of the learning process. (S10) The pair acquisition unit 142 acquires training data from the training data storage unit 110. The training data includes multiple images containing the faces of multiple people, a label for each image, and information about the group to which each image belongs. 【0084】 (S11) The pair acquisition unit 142 acquires a mini-batch from the training data. Specifically, the pair acquisition unit 142 randomly samples M images from the images in the training data. 【0085】 (S12) The information processing device 100 performs data augmentation processing, which will be described in detail later. The data augmentation processing generates augmented training data, which is stored in the extended training data storage unit 120. 【0086】 (S13) The learning processing unit 147 performs model training.
Details of the model training process will be described later. (S14) The learning processing unit 147 determines whether the loss value of the model trained in step S13 has converged to a local minimum. The loss value is defined by a predetermined loss function. If the loss value has converged, the process proceeds to step S15; otherwise, the process returns to step S11. 【0087】 (S15) The learning processing unit 147 evaluates the model obtained in step S14, i.e., the trained model, using test data. The test data consists of images that were not used in model training and whose real/fake labels are known. Based on this evaluation, the learning processing unit 147 confirms that the accuracy of the fake prediction probability output by the trained model meets the criteria. The learning processing unit 147 then stores the information of the trained model in the trained data storage unit 130 and makes it available for inference processing, and the learning process ends. The information processing device 100 may transmit the information of the trained model to other information processing devices so that they can perform inference processing based on the trained model. 【0088】 Figure 8 is a flowchart showing an example of the data augmentation processing, which corresponds to step S12. (S20) For the image set $I = (i_1, i_2, \ldots, i_M)$, the pair acquisition unit 142 selects the set of paired images $J = (j_1, j_2, \ldots, j_M)$, where the images $i_k$ and $j_k$ with the same index $k$ form a pair. 【0089】 (S21) The mixing ratio acquisition unit 140 samples the mixing ratio λ from Beta(1, 1). The square selection unit 143 randomly selects and cuts out a square with side length √(1-λ) from each image pair acquired in step S20.
The mixing ratio λ and the square region are selected separately for each image pair. 【0090】 (S22) The high-pass filter unit 144 applies a high-pass filter to the region of each image $i_k \in I$ selected in step S21. The low-pass filter unit 145 applies a low-pass filter to the region of each image $j_k \in J$ selected in step S21. 【0091】 (S23) The data mixing unit 146 mixes the data of each pair. Specifically, the data mixing unit 146 combines the low-pass-filtered region of image $j_k \in J$ into the high-pass-filtered region of image $i_k \in I$. The data mixing unit 146 performs step S23 for each pair $(i_k, j_k)$ to generate a composite image, assigns to the generated composite image the same label as that of images $i_k$ and $j_k$, and stores it in the extended training data storage unit 120. The data augmentation process then ends. 【0092】 Figure 9 is a flowchart showing an example of the model training process, which corresponds to step S13. (S30) The learning processing unit 147 inputs each composite image stored in the extended training data storage unit 120 into the model and obtains a prediction result, namely the fake prediction probability. 【0093】 (S31) The learning processing unit 147 calculates the loss value with the loss function from the correct fake probability given by the label of each composite image and the prediction result of step S30. (S32) The learning processing unit 147 calculates, by backpropagation, the gradient of the loss function with respect to each weight of the model. 【0094】 (S33) The learning processing unit 147 updates each weight of the model based on the gradients calculated in step S32, using a method such as stochastic gradient descent. The learning processing unit 147 then ends the model training process.
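As a worked check of step S21 (an illustration, not part of the original procedure): with the image side normalized to 1, the square of side √(1-λ) covers a fraction 1 - λ of the image, so λ is the fraction of the first image of the pair that stays untouched. Since Beta(1, 1) is the uniform distribution on [0, 1], the expected replaced fraction is one half. The last line uses λ = 0.6, the value in the example of Figure 11.

```latex
\begin{aligned}
\text{side} &= \sqrt{1-\lambda}, \qquad \text{area} = \bigl(\sqrt{1-\lambda}\bigr)^{2} = 1-\lambda,\\
\lambda &\sim \mathrm{Beta}(1,1) = \mathcal{U}(0,1) \;\Rightarrow\; \mathbb{E}[1-\lambda] = \tfrac{1}{2},\\
\lambda &= 0.6 \;\Rightarrow\; \text{side} = \sqrt{0.4} \approx 0.632,\quad \text{area} = 0.4.
\end{aligned}
```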
【0095】 Next, the procedure for inference processing by the information processing device 100 will be described. Figure 10 is a flowchart showing an example of inference processing. (S40) The inference processing unit 148 acquires the input image. The input image may be input to the information processing device 100 from another information processing device via the network 54. 【0096】 (S41) The inference processing unit 148 performs fake detection using the trained model. That is, the inference processing unit 148 inputs the input image into the trained model and obtains the fake prediction probability as the output of the trained model. 【0097】 (S42) The inference processing unit 148 outputs the fake prediction probability of the input image. The inference processing unit 148 may display the fake prediction probability on the display 51 or transmit the fake prediction probability information to other information processing devices via the network 54. The inference process then ends. 【0098】 As mentioned above, the inference processing using the trained model may be performed by an information processing device other than the information processing device 100. For example, a system that provides a fake detection service may perform inference processing with a trained model on an image submitted by a user and provide the user with the fake prediction probability. 【0099】 Next, a comparative example will be explained. Figure 11 shows the comparative example, which is plain CutMix. Figure 11 illustrates the case where a part of image 400 is pasted onto the same location in image 700. The label $y_A$ of image 700 is "Fake" (= [0, 1]), and the label $y_B$ of image 400 is "Real" (= [1, 0]). 【0100】 First, the mixing ratio λ is determined. In the example in Figure 11, λ = 0.6.
Next, a square region 720 with side length √(1-λ) is randomly selected from image 700, and the square region 420 at the same location as square region 720 is extracted from image 400. Image 900 is generated by cutting out the image of square region 420 and pasting it onto square region 720 of image 700. No filtering, such as high-pass or low-pass filtering, is applied to either square region 720 or square region 420; image 900 is simply obtained by replacing square region 720, which is part of image 700, with the image of square region 420, which is part of image 400. Image 900 is then assigned the label $\lambda y_A + (1 - \lambda) y_B = 0.6 \times [0, 1] + 0.4 \times [1, 0] = [0.4, 0.6]$. 【0101】 In the comparative example, mixing parts of two images together with their labels generates diverse synthetic data that can improve the generalization performance of the model. Training with synthetic data that carries intermediate labels lets the model form a smoother decision boundary and may prevent it from becoming overly dependent on specific image features. 【0102】 However, in the comparative example, the image of square region 420 is simply pasted onto square region 720 of image 700, so the information contained in square region 720 is lost from image 900. Typical deepfake features appear in fine details of the image, such as the eyes, nose, and mouth. In image 700, square region 720 contains such fine details, and when they are simply replaced with the image of square region 420, the typical deepfake features are lost from image 900. Even so, the fake ratio of the label for image 900 is 0.6, which is a mislabel: the label [0.4, 0.6] assigns too high a fake ratio to image 900.
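The label arithmetic of the comparative example can be reproduced in a few lines. This sketch uses the [real, fake] encoding from the text and hypothetical function names; it contrasts the mixed CutMix label with the same-label rule used by the information processing device 100.

```python
import math


def cutmix_label(lam, y_a, y_b):
    """Comparative CutMix label lam * y_A + (1 - lam) * y_B ([real, fake] order)."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]


def same_label_composite_label(y_a, y_b):
    """Same-label pairing: the composite simply keeps the shared label."""
    if y_a != y_b:
        raise ValueError("pair labels differ")
    return list(y_a)


lam = 0.6
side = math.sqrt(1.0 - lam)       # about 0.632 of the image side is replaced
fake, real = [0, 1], [1, 0]       # y_A (image 700) and y_B (image 400)
mixed = cutmix_label(lam, fake, real)          # approximately [0.4, 0.6]
kept = same_label_composite_label(fake, fake)  # [0, 1]
```

The mixed label weights image 700 by λ = 0.6 even though the pasted square may have removed exactly the facial details that made image 700 fake; the same-label rule avoids this mismatch.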
【0103】 Thus, when fine details of an image are locally lost through mixing, the model cannot properly learn the basis for judging whether an image is real or fake. Furthermore, mixing different labels can cause mislabeling. These factors reduce the fake prediction performance of the model in the comparative example method. 【0104】 Furthermore, in the comparative example, pairs of original images are selected at random regardless of group. As a result, as explained with reference to Figure 4, majority-centered samples may be generated more often than minority samples, and a model that overfits to the majority group may be produced. 【0105】 Figure 12 shows an example of fake detection accuracy. Bar graph 70 shows the fake detection accuracy when using the comparative example method of Figure 11 and when using the method of the information processing device 100. Series 71 is the average of the per-group fake detection accuracies (balanced accuracy) for the comparative example method. Series 72 is the average of the per-group fake detection accuracies (balanced accuracy) for the method of the information processing device 100. The fake detection accuracy for a given group may be measured, for example, as the fraction of test images belonging to that group that the model correctly classifies as real or fake according to a predetermined criterion. 【0106】 Under one such criterion, an input image labeled "Fake" in the test data is judged to be correctly identified as fake if the model outputs a fake probability of p% or higher (e.g., p = 70), and an input image labeled "Real" is judged to be correctly identified as real if the model outputs a fake probability of less than (100 - p)%.
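The per-group accuracy and balanced accuracy of 【0105】, combined with the p% criterion of 【0106】, can be sketched as follows. This assumes each test record carries its group, its label, and the model's fake probability as a fraction; the function names are hypothetical.

```python
from collections import defaultdict


def is_correct(label, p_fake, p=0.70):
    """Criterion sketched from the text (probabilities as fractions):
    "Fake" is correct if p_fake >= p; "Real" is correct if p_fake < 1 - p."""
    if label == "Fake":
        return p_fake >= p
    return p_fake < 1.0 - p


def balanced_accuracy(records, p=0.70):
    """records: iterable of (group, label, p_fake) tuples.

    Returns (mean of per-group accuracies, per-group accuracies), so every
    group carries equal weight regardless of how many images it has.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for group, label, p_fake in records:
        totals[group] += 1
        hits[group] += int(is_correct(label, p_fake, p))
    per_group = {g: hits[g] / totals[g] for g in totals}
    return sum(per_group.values()) / len(per_group), per_group
```

With p = 70, any image whose fake probability lies between 30% and 70% counts as misclassified whatever its label. Because each group is averaged separately, a large majority group cannot mask poor accuracy on a small minority group, as plain accuracy over all test images would.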
【0107】 Comparing series 71 and 72, the comparative example method has a lower balanced accuracy than the method of the information processing device 100. This is because, with the comparative example method, fake detection accuracy is high for majority groups such as men but low for minority groups such as women. In addition, the loss of deepfake features and the mislabeling in the synthesized images may have prevented the model from properly learning to distinguish real from fake images, which also contributes to the lower balanced accuracy. 【0108】 In contrast, the information processing device 100 uses a pair sampling probability that takes minority groups into account when acquiring image pairs. The information processing device 100 also pairs images A and B that have the same label. When combining a partial image of image B with a partial image of image A, the information processing device 100 retains the high-frequency components of image A and reflects only the low-frequency components of the partial image of image B in image A. The information processing device 100 can therefore generate augmented training data that accounts for minority groups and preserves deepfake features, and can train a model on this augmented training data. Consequently, as shown in bar graph 70, the information processing device 100 achieves a higher balanced accuracy than the comparative example method. Because the information processing device 100 suppresses mislabeling and appropriately retains deepfake features in each composite image of the augmented training data, the model can properly learn real/fake features. In this way, the information processing device 100 can improve the accuracy of fake detection. 【0109】 Currently, much development is under way to prevent the spread of deepfakes using data-driven, AI-based deepfake detection technologies. However, the generalization performance of these models is still insufficient.
One problem with the detection accuracy of these models is that it is biased with respect to attributes such as gender, age, and ethnicity. For example, deepfakes featuring individuals with dark skin tones may be detected less reliably than deepfakes featuring individuals with lighter skin tones. Attackers could exploit this to generate harmful deepfakes that target specific population groups and evade detection. 【0110】 The information processing device 100 preferentially samples from minority groups and combines into a part of one image only the low-frequency components of the corresponding part of another image. This improves balanced accuracy while preserving the deepfake-specific features that are prominent in the high-frequency components. As a result, the information processing device 100 can correct the bias of deepfake detection technology and build a model that generalizes better, thereby improving the detection accuracy for harmful deepfakes that target specific population groups, such as those described above. 【0111】 As explained above, the information processing device 100 performs, for example, the following processing. The processor 101 obtains pairs of images from multiple images. The processor 101 generates a composite image by replacing the low-frequency components corresponding to frequencies lower than the first frequency in the first partial image, which is a part of the first image belonging to the pair, with the low-frequency components of the second partial image, which is the part of the second image belonging to the pair that corresponds to the first partial image. The processor 101 assigns a label to the composite image based on a first label indicating whether the first image is fake and a second label indicating whether the second image is fake.
The processor 101 generates, by machine learning using the composite image generated for each pair of images obtained from the multiple images and the label of each composite image, a machine learning model that, given a third image as input, outputs an index indicating the likelihood that the third image is fake. This allows the information processing device 100 to improve the accuracy of fake detection. 【0112】 When generating a composite image, the information processing device 100 performs, for example, the following processing. The processor 101 removes the low-frequency components from the first partial image. The processor 101 removes the high-frequency components corresponding to frequencies at or above the first frequency from the second partial image. The processor 101 generates the composite image by combining the second partial image, from which the high-frequency components have been removed, with the first partial image, from which the low-frequency components have been removed. As a result, the information processing device 100 can retain in the composite image the features of the first image that are useful for judging whether an image is real or fake, and can enable a machine learning model to learn these features appropriately. 【0113】 For example, in generating a composite image, the processor 101 retains, in the portion of the composite image corresponding to the first partial image, the high-frequency components of the first partial image that correspond to frequencies at or above the first frequency. This allows the information processing device 100 to retain the features of the first image that are useful for judging whether an image is real or fake, and enables a machine learning model to learn these features appropriately. Alternatively, the processor 101 may retain in the composite image the high-frequency components of the first partial image that correspond to frequencies at or above a second frequency higher than the first frequency.
【0114】 Each of the multiple images may include a person's face. This allows the information processing device 100 to appropriately detect fake images used maliciously, for example, to spread misinformation, infringe on privacy, or manipulate political decision-making. However, the images may instead contain animals or other objects; for example, the fake image to be detected may be a fake image of an animal. 【0115】 When acquiring image pairs, the processor 101 may acquire two images with the same label as a pair and assign that same label to the composite image. This prevents mislabeling and allows the machine learning model to appropriately learn the features of real and fake images. As a result, the information processing device 100 can improve the accuracy of fake image detection. 【0116】 Each of the multiple images may be classified into one of several attributes. For example, if an image contains a person, the attributes may include the gender, race, and age of that person. In this case, when acquiring a pair of images, the processor 101 may preferentially acquire, as one image of the pair, an image of the attribute into which the fewest images are classified. This allows the information processing device 100 to generate composite images that combine images with diverse attributes, thereby improving the generalization performance of the model. One way to preferentially acquire images of attributes with few classified images is to select them in proportion to the reciprocal of the number of images classified into each attribute. Specifically, when the numbers of images for three attributes are x, y, and z, the processor 101 may acquire the image to be paired with a given image such that the selection ratio among the attributes is 1/x : 1/y : 1/z.
【0117】 Furthermore, the information processing device 100 may perform inference processing using the generated machine learning model. That is, the processor 101 may input an input image into a machine learning model and execute a process of obtaining, for that input image, the index indicating the likelihood that it is fake. This machine learning model is generated by machine learning using a composite image generated for each pair of images among a plurality of images and a label for each composite image. The input image is stored in a storage unit such as the RAM 102. The label of each composite image is assigned based on a first label indicating whether the first image of the pair is fake and a second label indicating whether the second image of the pair is fake. The composite image is generated by replacing the low-frequency components below the first frequency, among the frequency components contained in the first partial image (a part of the first image), with the low-frequency components of the second partial image, which is the part of the second image corresponding to the first partial image. This allows the information processing device 100 to improve the accuracy of fake detection. The inference processing using this machine learning model may also be performed by an information processing device other than the information processing device 100. 【0118】 Furthermore, the processor 101 may be a set of multiple processors, i.e., a multiprocessor. For example, the processes executed by the processor 101 may be divided among and executed by multiple processors. 【0119】 Furthermore, the information processing in the first embodiment can be achieved by having the processing unit 12 execute a program.
Similarly, the information processing in the second embodiment can be achieved by having the processor 101 execute a program. The program can be recorded on a computer-readable recording medium 53. 【0120】 For example, a program can be distributed by distributing a recording medium 53 on which the program is stored. Alternatively, the program may be stored on another computer and distributed via a network. A computer may, for example, store (install) a program stored on the recording medium 53 or a program received from another computer into a storage device such as RAM 102 or HDD 103, and then read and execute the program from that storage device. 【0121】 The above merely illustrates the principle of the present invention. Furthermore, numerous modifications and changes are possible for those skilled in the art, and the present invention is not limited to the exact configurations and applications shown and described above. All corresponding modifications and equivalents are considered to be within the scope of the present invention as defined by the appended claims and their equivalents. 【0122】 10 Information processing device 11 Storage unit 12 Processing unit 20 First image 21 First partial image 21a Low frequency component 30 Second image 31 Second partial image 31a High frequency component 40 Composite image

Claims

1. A learning program that causes a computer to perform processes of: acquiring pairs of images from multiple images; generating a composite image by replacing the low-frequency components corresponding to frequencies lower than a first frequency in a first partial image, which is a part of a first image belonging to the pair, with the low-frequency components of a second partial image, which is the part of a second image belonging to the pair that corresponds to the first partial image; assigning a label to the composite image based on a first label indicating whether the first image is fake and a second label indicating whether the second image is fake; and generating, by machine learning using the composite image generated for each pair acquired from the multiple images and the label of each composite image, a machine learning model that, upon input of a third image, outputs an index indicating the possibility that the third image is fake.

2. The learning program according to claim 1, wherein in generating the composite image, the low-frequency components are removed from the first partial image, high-frequency components corresponding to frequencies greater than or equal to the first frequency are removed from the second partial image, and the composite image is generated by combining the second partial image after the removal of the high-frequency components with the first partial image from the first image after the removal of the low-frequency components.

3. The learning program according to claim 1, wherein, in generating the composite image, the high-frequency components of the first partial image corresponding to frequencies greater than or equal to the first frequency are retained in the portion of the composite image corresponding to the first partial image.

4. The learning program according to claim 1, wherein each of the plurality of images includes a person's face.

5. The learning program according to claim 1, wherein when acquiring the pair, two images having the same label are acquired as the pair, and the same label is assigned to the composite image.

6. The learning program according to claim 1, wherein each of the plurality of images is classified into one of a plurality of attributes, and, in acquiring the pair, an image of an attribute into which fewer images are classified is preferentially acquired as one image of the pair.
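Claim 6 does not say how the preference is implemented. One simple assumption is to weight each image by the inverse of its attribute's count, so that every attribute is drawn about equally often regardless of its size. The attribute values and helper names below are hypothetical.

```python
import random
from collections import Counter

def inverse_frequency_weights(attributes):
    """Weight each image by 1 / (number of images sharing its attribute)."""
    counts = Counter(attributes)
    return [1.0 / counts[a] for a in attributes]

def pick_pair_members(attributes, k, rng):
    """Draw k image indices (with replacement), favouring images of rare attributes."""
    weights = inverse_frequency_weights(attributes)
    return rng.choices(range(len(attributes)), weights=weights, k=k)
```

For example, take 90 images of attribute "A" and 10 of "B". Each "B" image is then nine times as likely to be picked as an "A" image, so the two attributes end up roughly balanced in the sampled pairs.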

7. An information processing device comprising: a storage unit that stores a plurality of images; and a processing unit that acquires pairs of images from the plurality of images, generates a composite image by replacing low-frequency components, which correspond to frequencies lower than a first frequency among the frequency components contained in a first partial image that is a part of a first image belonging to the pair, with the low-frequency components of a second partial image that is the part of a second image belonging to the pair corresponding to the first partial image, assigns a label to the composite image based on a first label indicating whether the first image is fake and a second label indicating whether the second image is fake, and generates, by machine learning using the composite image generated for each pair acquired from the plurality of images and the label of each composite image, a machine learning model that, upon input of a third image, outputs an index indicating the possibility that the third image is fake.

8. An inference program that causes a computer to execute a process of inputting an input image into a machine learning model that outputs, in response to input of an image, an index indicating the possibility that the image is fake, and obtaining the index for the input image, wherein the machine learning model is generated by machine learning using a composite image generated for each pair of images among a plurality of images and a label for each composite image, the label is assigned to the composite image based on a first label indicating whether a first image belonging to the pair is fake and a second label indicating whether a second image belonging to the pair is fake, and the composite image is generated by replacing low-frequency components, which correspond to frequencies lower than a first frequency among the frequency components contained in a first partial image that is a part of the first image, with the low-frequency components of a second partial image that is the part of the second image corresponding to the first partial image.
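Claim 8 only requires a trained model that maps an image to a fake-likelihood index. The claims do not specify that model. The sketch below stands in for it with a logistic regression on one hand-picked feature, the share of spectral energy at or above a cutoff. The feature, the cutoff of 0.25 cycles per pixel, the training schedule and all function names are assumptions for illustration, not the applicant's model. In the claimed flow, `images` would be the composite images and `labels` their assigned labels. Soft labels in [0, 1] also work with this loss.

```python
import numpy as np

def high_frequency_share(image, cutoff=0.25):
    """Share of non-DC spectral energy at radial frequencies >= cutoff (hypothetical feature)."""
    centered = np.asarray(image, dtype=float)
    centered = centered - centered.mean()                  # drop the DC term
    power = np.abs(np.fft.fft2(centered)) ** 2
    fy = np.fft.fftfreq(centered.shape[0])[:, None]
    fx = np.fft.fftfreq(centered.shape[1])[None, :]
    total = power.sum()
    if total == 0:
        return 0.0
    return float(power[np.hypot(fy, fx) >= cutoff].sum() / total)

def features(image):
    return np.array([high_frequency_share(image), 1.0])   # feature plus bias input

def train_logistic(images, labels, lr=2.0, steps=5000):
    """Fit logistic-regression weights by full-batch gradient descent (stand-in model)."""
    X = np.stack([features(im) for im in images])
    y = np.asarray(labels, dtype=float)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)                   # gradient of mean log-loss
    return w

def fake_index(w, image):
    """Index in [0, 1]: higher means the input image is more likely fake."""
    return float(1.0 / (1.0 + np.exp(-(features(image) @ w))))
```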

9. An information processing device comprising: a storage unit that stores an input image; and a processing unit that inputs the input image into a machine learning model that outputs, in response to input of an image, an index indicating the possibility that the image is fake, and acquires the index for the input image, wherein the machine learning model is generated by machine learning using a composite image generated for each pair of images among a plurality of images and a label for each composite image, the label is assigned to the composite image based on a first label indicating whether a first image belonging to the pair is fake and a second label indicating whether a second image belonging to the pair is fake, and the composite image is generated by replacing low-frequency components, which correspond to frequencies lower than a first frequency among the frequency components contained in a first partial image that is a part of the first image, with the low-frequency components of a second partial image that is the part of the second image corresponding to the first partial image.