AI-generated image detection method based on high-frequency reconstruction and regional consistency perturbation
This invention combines high-frequency reconstruction feature extraction, regional consistency perturbation modeling, and forgery residual mapping. It addresses the limited accuracy and robustness of existing techniques for detecting images generated by diffusion models, and achieves more effective detection.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Application (China)
- Current Assignee / Owner: CHANGSHA UNIVERSITY OF SCIENCE AND TECHNOLOGY
- Filing Date: 2026-03-27
- Publication Date: 2026-06-19
Description
Technical Field
[0001] This invention relates to the fields of image forensics and multimedia security technology. Specifically, it relates to an AI-generated image detection method based on high-frequency reconstruction and regional consistency perturbation.
Background Technology
[0002] Generative techniques such as Generative Adversarial Networks (GANs) and diffusion models have developed rapidly. The quality of generated images has approached, and in some scenarios exceeded, the limits of human perception. These techniques expand information from a low-dimensional latent space into a high-resolution image through progressive feature recombination and filling. However, this process inevitably introduces a series of high-frequency artifacts, such as edge anomalies, unnatural over-smoothing, or abrupt changes in local features. These artifacts are usually absent from real images and have become an important basis for distinguishing generated images from natural ones.
[0003] Existing methods for detecting generated images rely mainly on modeling global features or on features tailored to specific generators. This can limit detection accuracy when the method must generalize across generation methods. For example, the paper "Chuangchuang Tan, Yao Zhao, Shikui Wei, Guanghua Gu and Yunchao Wei. 'Learning on Gradients: Generalized Artifacts Representation for GAN-Generated Images Detection.' IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023." discloses a method for detecting GAN-generated images. It uses gradients obtained from a pretrained GAN discriminator as a generalized artifact representation. The paper "Yonghyun Jeong, Doyeon Kim, Seungjai Min, Seongho Joe, Youngjune Gwon and Jongwon Choi. 'BiHPF: Bilateral High-Pass Filters for Robust Deepfake Detection.' IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2022." discloses a method that uses bilateral high-pass filters (BiHPF) to improve the robustness of deepfake image detection. The paper "Chuangchuang Tan, Huan Liu, Yao Zhao, Shikui Wei, Guanghua Gu, Ping Liu and Yunchao Wei. 'Rethinking the Up-Sampling Operations in CNN-based Generative Network for Generalizable Deepfake Detection.' IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024." discloses how to determine the authenticity of generated images by analyzing the local artifacts introduced by up-sampling operations common in generative models.
[0004] To address these limitations and support deeper forensic analysis of generated images, this invention proposes an AI-generated image detection method based on high-frequency reconstruction and region consistency perturbation. The method improves detection accuracy for AI-generated content (AIGC).
Summary of the Invention
[0005] This invention proposes an AI-generated image detection method based on high-frequency reconstruction and region consistency perturbation, addressing the problems of existing methods in detecting diffusion-model-generated images, such as difficulty in capturing high-frequency feature anomalies, insufficient region consistency modeling, and limited generalization ability. The proposed method comprehensively utilizes high-frequency reconstruction features, high-frequency perturbation features, and forgery residual mapping features. Through multi-scale modeling and feature fusion, it significantly improves the accuracy and robustness of AI-generated image detection. The method includes:
[0006] A high-frequency reconstruction feature extraction method is proposed. A high-frequency reconstruction process simulates the feature recombination and expansion performed by the diffusion model during image generation, so as to extract the high-frequency anomalies unique to generated images.
[0007] A regional consistency perturbation modeling method is proposed. Based on the difference between locally smoothed features and the original features, it captures inter-region discontinuities and consistency deviations in images generated by diffusion models.
[0008] A forgery residual mapping method is proposed. High-frequency reconstruction features and regional consistency features are fused, and a forgery residual map is generated through nonlinear mapping and residual modeling, further enhancing the saliency of forgery features.
[0009] The method of this invention achieves high detection accuracy and robustness and is applicable to detecting images generated by various diffusion models. It performs particularly well when generalizing to images produced by diffusion models not seen during training. The details are as follows:
(1) High-Frequency Reconstruction Feature Extraction
[0011] Diffusion models typically construct high-resolution images through feature recombination and progressive expansion during image generation. This process inevitably introduces a series of high-frequency artifacts, such as discontinuities in edge regions, distortion of texture details, excessive smoothing of detail regions, or unnatural enhancements. These high-frequency anomalies are generally not present in real images; therefore, capturing these high-frequency artifacts introduced during the generation process can more effectively distinguish between real images and AI-generated images. To effectively capture these high-frequency anomaly features, this invention proposes a high-frequency reconstruction feature extraction method.
[0012] First, gradient features are extracted from the input image I to describe the strength of intensity changes at edges and in detail regions, where I(i,j) denotes the pixel value at position (i,j). The horizontal gradient and the vertical gradient are computed with a Sobel filter, as shown in the following formulas:
[0013]
[0014]
[0015] where (m, n) denotes the offset within the filter window, and the X-direction and Y-direction weights of the Sobel filter are:
[0016]
[0017] The gradient magnitude G(X) is calculated using the horizontal and vertical gradients:
[0018]
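The equations for [0013], [0014], [0016], and [0018] did not survive extraction. For reference, a standard Sobel formulation consistent with the surrounding text is given below. It is not the patent's exact formulas. The symbols G_x, G_y, S_x, and S_y are our own notation. The filter is written in cross-correlation form, with m as the row offset and n as the column offset. Taking the L2 norm for the magnitude is an assumption, since some implementations use |G_x| + |G_y| instead.

```latex
G_x(i,j) = \sum_{m=-1}^{1}\sum_{n=-1}^{1} S_x(m,n)\, I(i+m,\, j+n), \qquad
G_y(i,j) = \sum_{m=-1}^{1}\sum_{n=-1}^{1} S_y(m,n)\, I(i+m,\, j+n)

S_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}, \qquad
S_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}

G(X) = \sqrt{G_x^2 + G_y^2} \quad \text{(element-wise)}
```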
[0019] To simulate the mechanism of feature expansion and reconstruction in the diffusion model during image generation, we first perform local average pooling (AvgPool) on the gradient magnitude map G(X) to smooth the gradient features and remove high-frequency details in local regions, thus simulating the compression and smoothing of detail information by the generative model during feature reconstruction. At the same time, by downsampling the gradient features through pooling, we can obtain a low-resolution feature representation, thereby more realistically mimicking the progressive feature reconstruction process of the diffusion model from low-dimensional latent space to high-dimensional image space.
[0020] Next, the downsampled low-resolution features are upsampled back to the original resolution, simulating the diffusion model's final reconstruction of a high-resolution image. During this expansion and reconstruction, the generated features may be inaccurate and the feature recombination operation has inherent limitations. Both tend to introduce high-frequency artifacts. Therefore, the residual between the original gradient magnitude map and the features reconstructed through pooling and upsampling is computed. This effectively extracts the high-frequency anomalous features (HFR) characteristic of diffusion-generated images.
[0021]
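The residual in [0021] can be sketched as follows. Pool_k denotes k×k average pooling with stride k, and Up_k denotes upsampling back to the original size. The text does not state the window size k or the interpolation method. It also does not say whether the residual is signed or taken in absolute value. These choices are assumptions here.

```latex
\mathrm{HFR}(X) = G(X) - \mathrm{Up}_k\!\big(\mathrm{Pool}_k(G(X))\big)
```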
[0022] To capture high-frequency anomalies at different scales, the input image is rescaled to multiple scales (size ∈ {1, 0.5, 0.25}), and the high-frequency residual features are computed separately at each scale:
[0023]
[0024] Finally, the multi-scale high-frequency features are concatenated to form a complete high-frequency reconstructed feature representation:
[0025]
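The high-frequency reconstruction branch of [0012] through [0025] can be sketched in NumPy as below. This is a minimal sketch under stated assumptions, not the patent's implementation. It assumes:
- a single-channel (grayscale) float image whose height and width are divisible by 8;
- edge-replicated borders for the Sobel filter;
- 2×2 average pooling (k = 2) with nearest-neighbour upsampling;
- rescaling by block averaging;
- nearest-neighbour upsampling of each scale's residual back to full resolution, so the three maps can be stacked along a channel axis.

All function names are hypothetical.

```python
import numpy as np

# Standard 3x3 Sobel weights; rows index the vertical offset m, columns the horizontal offset n.
SOBEL_X = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])
SOBEL_Y = SOBEL_X.T  # [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def correlate3x3(img, kernel):
    """3x3 cross-correlation with edge-replicated borders; output has the input's shape."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros((h, w))
    for m in range(3):
        for n in range(3):
            out += kernel[m, n] * padded[m:m + h, n:n + w]
    return out


def gradient_magnitude(img):
    """G = sqrt(Gx^2 + Gy^2); the L2 magnitude is an assumption."""
    gx = correlate3x3(img, SOBEL_X)
    gy = correlate3x3(img, SOBEL_Y)
    return np.sqrt(gx ** 2 + gy ** 2)


def avg_pool(x, k):
    """Non-overlapping k x k average pooling; H and W must be divisible by k."""
    h, w = x.shape
    return x.reshape(h // k, k, w // k, k).mean(axis=(1, 3))


def upsample_nearest(x, k):
    """Nearest-neighbour upsampling by an integer factor k."""
    return np.repeat(np.repeat(x, k, axis=0), k, axis=1)


def high_freq_residual(img, k=2):
    """Gradient magnitude minus its pooled-then-upsampled reconstruction."""
    g = gradient_magnitude(img)
    return g - upsample_nearest(avg_pool(g, k), k)


def multiscale_hfr(img, scales=(1.0, 0.5, 0.25), k=2):
    """Stack per-scale residuals, each resized back to full resolution, along axis 0."""
    maps = []
    for s in scales:
        f = int(round(1 / s))              # integer downscale factor: 1, 2, 4
        small = avg_pool(img, f)           # rescale by block averaging (assumption)
        r = high_freq_residual(small, k)
        maps.append(upsample_nearest(r, f))
    return np.stack(maps, axis=0)          # shape: (len(scales), H, W)
```

Each pooled block is reconstructed by its own mean, so the residual sums to zero within every k×k block and every output channel has zero mean. Only the within-block detail that pooling discards survives.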
[0026] The output of the high-frequency reconstruction feature extraction method amplifies the high-frequency artifacts unique to generated images, providing crucial support for subsequent feature fusion and classification. These artifacts typically manifest as unnatural texture distortion in edge regions or discontinuities in detail regions, which are rarely observed in real images. By explicitly modeling and highlighting these anomalous high-frequency patterns, this invention sharpens the discrimination between real and generated images. This improves both detection accuracy and generalization robustness.
[0027] (2) Regional Consistency Perturbation Modeling
[0028] Region consistency is a crucial property of natural images, typically manifested as smooth transitions between regions and high continuity of texture features. During generation, however, diffusion models may introduce discontinuities between regions, such as sharpened edges or unnatural texture enhancement in local areas. These arise from inaccuracies in feature recombination or context modeling. Such inconsistencies and local perturbations are significant forgery cues in diffusion-generated images. Capturing smoothness deviations and anomalous texture changes between regions therefore provides strong support for detecting generated images. This method models region consistency perturbations as follows. First, it performs local smoothing on the input image to obtain a smoothed region feature representation. Then it calculates the residual between the original image and the smoothed image. This extracts and quantifies deviations in texture and feature distribution between regions, capturing the region consistency anomalies characteristic of diffusion-generated images.
[0029] To simulate the smooth inter-region transitions of natural images, a Gaussian filter is applied to the input image I, yielding a smoothed feature map A(I):
[0030]
[0031] where the Gaussian kernel is a standard kernel used for local weighted averaging of the image, defined as:
[0032]
[0033] Here, σ controls the degree of smoothing. The difference between the original input image and the smoothed feature map A(I) is then calculated to quantify the inconsistency deviation between regions. For natural images this deviation is typically small. For generated images it is usually larger, because of discontinuities between regions:
[0034]
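A standard formulation consistent with [0029] through [0034] is sketched below. K_σ and D(I) are our own symbols for the Gaussian kernel and the region consistency deviation, and * denotes 2-D convolution. In a discrete implementation the kernel is truncated and renormalized to sum to 1. The text does not state whether the patent takes the signed or the absolute difference.

```latex
A(I) = K_\sigma * I, \qquad
K_\sigma(x,y) = \frac{1}{2\pi\sigma^2}\exp\!\left(-\frac{x^2+y^2}{2\sigma^2}\right), \qquad
D(I) = I - A(I)
```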
[0035] To capture region consistency perturbation features at different scales, the input image is rescaled to multiple resolutions (size ∈ {1, 0.5, 0.25}), and the region consistency features are computed at each scale:
[0036]
[0037] Finally, the multi-scale regional consistency features are concatenated to form a complete regional consistency perturbation feature representation:
[0038]
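The regional consistency branch of [0029] through [0038] can be sketched in NumPy as below. This is a minimal sketch, not the patent's code. It assumes:
- a grayscale float image with height and width divisible by 4;
- σ = 1, with the kernel truncated at about 3σ;
- edge-replicated borders;
- block-average rescaling;
- nearest-neighbour upsampling of each scale's deviation map back to full resolution before stacking.

The function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(sigma, radius=None):
    """Sampled 2-D Gaussian, truncated at about 3*sigma and renormalised to sum to 1."""
    if radius is None:
        radius = int(np.ceil(3 * sigma))
    ax = np.arange(-radius, radius + 1, dtype=np.float64)
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return k / k.sum()


def gaussian_smooth(img, sigma=1.0):
    """A(I): Gaussian-weighted local average with edge-replicated borders."""
    k = gaussian_kernel(sigma)
    r = k.shape[0] // 2
    h, w = img.shape
    padded = np.pad(img, r, mode="edge")
    out = np.zeros((h, w))
    for m in range(k.shape[0]):
        for n in range(k.shape[1]):
            out += k[m, n] * padded[m:m + h, n:n + w]
    return out


def region_deviation(img, sigma=1.0):
    """Signed difference between the image and its smoothed version, I - A(I)."""
    return img - gaussian_smooth(img, sigma)


def multiscale_region_consistency(img, scales=(1.0, 0.5, 0.25), sigma=1.0):
    """Per-scale deviations, each resized back to full resolution, stacked along axis 0."""
    h, w = img.shape
    maps = []
    for s in scales:
        f = int(round(1 / s))
        small = img.reshape(h // f, f, w // f, f).mean(axis=(1, 3))  # block-average downscale
        d = region_deviation(small, sigma)
        maps.append(np.repeat(np.repeat(d, f, axis=0), f, axis=1))   # nearest-neighbour upsample
    return np.stack(maps, axis=0)
```

A perfectly flat region produces zero deviation at every scale. An isolated bright pixel stands out with a positive deviation, because smoothing spreads its energy into the neighbourhood. This is the kind of local break in continuity that [0033] expects to be larger in generated images.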
[0039] The output of the regional consistency perturbation modeling method highlights the inter-region discontinuities and local anomalous enhancements unique to diffusion-generated images, providing crucial support for image detection. The method quantifies the difference between the original features and the locally smoothed features. In doing so, it explicitly characterizes the anomalous distribution of regional features that the diffusion model introduces during feature recombination and context expansion. This explicit modeling captures the inter-region perturbation patterns characteristic of generated images. It also provides a more discriminative basis for subsequent feature fusion and classification.
(3) Forgery Residual Mapping
[0041] The main goal of the forgery residual mapping method is to further enhance the saliency of the high-frequency reconstruction features and region consistency perturbation features. Feature fusion and nonlinear mapping produce high-dimensional feature representations that capture high-frequency artifacts and region consistency anomalies in the generated image. Residual modeling then generates a single-channel forgery residual map, improving the expressive power of the forgery features. High-frequency reconstruction features and region consistency features reveal the forgery characteristics of diffusion-generated images from different perspectives. However, the two features differ in their local and global properties, and using them directly may not fully capture the forgery characteristics of the image. The forgery residual mapping method therefore fuses the two features and applies nonlinear mapping and compression. This produces a more discriminative high-dimensional forgery residual representation and improves detection performance. The method includes the following steps:
[0042] First, the high-frequency reconstruction features and the regional consistency perturbation features are concatenated along the channel dimension to form preliminary fused features. To enhance their expressive power, a high-dimensional convolutional layer applies a nonlinear mapping to the fused features, expanding them into a high-dimensional feature representation:
[0043]
[0044] The features after high-dimensional mapping may contain redundant information. To remove this redundancy and make the features more compact, a compression convolutional layer reduces their dimensionality:
[0045]
[0046] where the compression convolution removes irrelevant information and retains the important features.
[0047] The compressed features are then mapped to a single-channel feature space by a residual modeling convolutional layer, generating the forgery residual map R(I):
[0048]
[0049] Finally, the forgery residual map is concatenated again with the preliminary fused features to form the final enhanced feature representation:
[0050]
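Written compactly, the steps of [0042] through [0050] take roughly the following form. F_HFR and F_RC denote the multi-scale features from the two branches, and Conv_h, Conv_c, and Conv_r denote the high-dimensional, compression, and residual-modeling convolutions. φ denotes a nonlinearity. All of these symbols are ours. The choice of activation, and whether one follows the compression layer, are assumptions because the text does not specify them.

```latex
F_0 = \mathrm{Concat}(F_{\mathrm{HFR}}, F_{\mathrm{RC}}), \qquad
F_1 = \phi\big(\mathrm{Conv}_h(F_0)\big), \qquad
F_2 = \phi\big(\mathrm{Conv}_c(F_1)\big)

R(I) = \mathrm{Conv}_r(F_2), \qquad
F = \mathrm{Concat}\big(F_0,\, R(I)\big)
```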
[0051] Combining the original multi-scale features with the forgery residual map further improves the overall expressiveness of the forgery features.
[0052] The final feature representation output by the forgery residual mapping method incorporates high-frequency anomaly information and captures regional consistency perturbation features. Residual mapping further strengthens the expressive power of the forgery features, giving the subsequent classifier strong discriminative support. Through forgery residual mapping, this invention integrates multiple forgery cues and improves detection performance on diffusion-generated images, particularly in forgery detection tasks involving complex scenes.
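A NumPy sketch of one forward pass through this fusion stage follows. It is an illustration, not the patent's network. It assumes:
- 1×1 convolutions, implemented as per-pixel channel mixing with einsum;
- ReLU after the expansion and compression layers;
- 64 and 16 intermediate channels;
- randomly initialized weights standing in for trained ones.

None of these values come from the text, and the function names are hypothetical.

```python
import numpy as np


def conv1x1(x, weight, bias):
    """1x1 convolution: x is (C_in, H, W), weight is (C_out, C_in), bias is (C_out,)."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]


def relu(x):
    return np.maximum(x, 0.0)


def init_params(c_in, c_high=64, c_comp=16, seed=0):
    """Random He-style weights standing in for trained parameters (illustration only)."""
    rng = np.random.default_rng(seed)

    def layer(c_out, c_prev):
        w = rng.normal(0.0, np.sqrt(2.0 / c_prev), size=(c_out, c_prev))
        return w, np.zeros(c_out)

    return {
        "expand": layer(c_high, c_in),      # high-dimensional mapping
        "compress": layer(c_comp, c_high),  # dimensionality reduction
        "residual": layer(1, c_comp),       # single-channel residual map
    }


def forgery_residual_mapping(hfr_feats, rc_feats, params):
    """Return (final fused features, residual map R) for (C, H, W) inputs."""
    f0 = np.concatenate([hfr_feats, rc_feats], axis=0)  # preliminary fused features
    f1 = relu(conv1x1(f0, *params["expand"]))           # expand to a high-dimensional space
    f2 = relu(conv1x1(f1, *params["compress"]))         # compress to remove redundancy
    r = conv1x1(f2, *params["residual"])                # forgery residual map, shape (1, H, W)
    return np.concatenate([f0, r], axis=0), r
```

The preliminary features pass through unchanged, and R(I) is appended as one extra channel. The classifier therefore sees both the raw multi-scale evidence and the learned residual summary, which is the combination [0051] motivates.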
[0053] Finally, the extracted multi-scale concatenated forgery features are fed into a classifier based on a lightweight ResNet architecture. The classifier performs binary classification to detect images generated by diffusion models efficiently.
Attached Figure Description
[0054] Figure 1 is a schematic diagram of the detection model of the "AI-generated image detection method based on frequency domain and region consistency modeling" of the present invention;
[0055] Figure 2 is a visualization of the forgery residual features of the "AI-generated image detection method based on frequency domain and regional consistency modeling" of the present invention.
Detailed Implementation
[0056] This invention relates to an AI-generated image detection method based on frequency domain and region consistency modeling. For ease of explanation, this embodiment uses the ForenSynths dataset. Four categories (car, cat, chair, and horse) are selected from this dataset for training. Each category contains 18,000 synthetic images generated by ProGAN and an equal number of real images. All images are resized to 256×256 pixels. DiffusionForensics, Ojha, and a set of images generated with 1,000 diffusion steps are used as test sets. The specific steps are as follows:
[0057] Step 1: Extraction of local high-frequency features.
[0058] For an input image to be detected, the high-frequency reconstruction module first extracts local high-frequency features. This module captures the high-frequency anomalies unique to generated images by simulating the feature recombination and expansion operations of the diffusion model during image generation. Specifically, it uses gradient operators to extract edge and texture variations in the image. It then simulates the diffusion model's low-resolution to high-resolution generation process through downsampling and upsampling. In this process, feature expansion errors in a generated image appear as high-frequency residuals. Modeling these residuals therefore reveals anomalous textures and artifacts that may be present in the generated image.
[0059] Step 2: Extraction of regional consistency features.
[0060] After the high-frequency features are extracted, the region consistency module extracts region consistency features from the image. This module analyzes the differences between smoothed and original features in local image regions to capture inter-region discontinuities and consistency deviations in the generated image. Specifically, Gaussian filtering smooths the image, simulating the natural transitions between regions in a real image. Because of feature recombination and expansion during generation, generated images often exhibit abnormal enhancements or breaks between regions. Calculating the difference between the smoothed and original features reveals these inter-region perturbations, further distinguishing real images from generated ones.
Step 3: Forgery residual enhancement and classification.
[0062] The high-frequency features extracted in Step 1 are fused with the region consistency features extracted in Step 2, and a forgery residual map is generated through nonlinear mapping. As shown in Figure 2, the forgery residual map further enhances the saliency of forgery features in the generated image. The fused features are then fed into a pruned and optimized ResNet classifier. The classifier normalizes, convolves, and compresses the input features and outputs the probability that the image is forged. If this probability is greater than 0.5, the image is classified as forged; otherwise, it is classified as real.
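The decision rule in [0062] amounts to thresholding a probability at 0.5. The sketch below assumes the classifier emits one logit per image, which a sigmoid converts to a forgery probability. The ResNet itself is not reproduced, and the function names are hypothetical.

```python
import numpy as np


def forgery_probability(logits):
    """Sigmoid mapping from classifier logits to forgery probabilities."""
    z = np.asarray(logits, dtype=np.float64)
    return 1.0 / (1.0 + np.exp(-z))


def decide(logits, threshold=0.5):
    """Label each image 'forged' if its probability exceeds the threshold, else 'real'."""
    probs = forgery_probability(logits)
    labels = np.where(probs > threshold, "forged", "real")
    return labels, probs
```

The rule is a strict inequality, so a probability of exactly 0.5 (a logit of 0) is labelled real.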
[0063] Step 4: Model training and testing.
[0064] This invention implements the proposed detection model using the PyTorch deep learning framework, and the entire network is optimized through end-to-end training. During the training phase, the Adam optimizer is used to iteratively update the network parameters, with the binary cross-entropy loss function serving as a supervision signal to measure the error between the network output and the true image label. The learning rate is set to 0.0002, the batch size to 32, and the maximum number of training iterations to 40.
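For reference, the two training ingredients named in [0064] can be written out in NumPy as below. In PyTorch they would normally come from torch.optim.Adam(model.parameters(), lr=2e-4) and torch.nn.BCELoss (or torch.nn.BCEWithLogitsLoss if the classifier outputs logits). The betas (0.9, 0.999) and eps = 1e-8 are PyTorch's defaults and are assumptions here. The text gives only the learning rate, batch size, and iteration count.

```python
import numpy as np

# Hyperparameters stated in [0064].
LEARNING_RATE = 2e-4
BATCH_SIZE = 32
MAX_TRAINING_ITERATIONS = 40


def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean BCE; labels are 1 for generated and 0 for real images."""
    p = np.clip(np.asarray(probs, dtype=np.float64), eps, 1.0 - eps)
    y = np.asarray(labels, dtype=np.float64)
    return float(np.mean(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))))


class Adam:
    """Adam update rule (Kingma and Ba); betas and eps are assumed PyTorch defaults."""

    def __init__(self, lr=LEARNING_RATE, betas=(0.9, 0.999), eps=1e-8):
        self.lr, (self.b1, self.b2), self.eps = lr, betas, eps
        self.m = None
        self.v = None
        self.t = 0

    def step(self, theta, grad):
        """Return updated parameters given the current parameters and their gradient."""
        if self.m is None:
            self.m = np.zeros_like(theta)
            self.v = np.zeros_like(theta)
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * grad          # first-moment estimate
        self.v = self.b2 * self.v + (1 - self.b2) * grad ** 2     # second-moment estimate
        m_hat = self.m / (1 - self.b1 ** self.t)                  # bias correction
        v_hat = self.v / (1 - self.b2 ** self.t)
        return theta - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
```

On the first step, bias correction makes the update roughly the learning rate times the sign of the gradient. Each parameter therefore moves by about 2e-4 regardless of the gradient's scale.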
[0065] After training, the model was evaluated on DiffusionForensics, Ojha, and images generated with 1,000 diffusion steps. This test verifies the method's ability to generalize to images generated by unseen diffusion models.
Claims
1. An AI-generated image detection method based on high-frequency reconstruction and region consistency perturbation, characterized in that the method comprises: (1) constructing a high-frequency reconstruction feature extraction model to extract the high-frequency anomalous features unique to generated images; (2) constructing a region consistency perturbation modeling network to capture inter-region consistency deviations in generated images; (3) constructing a forgery residual feature fusion network to nonlinearly fuse the high-frequency features and regional consistency features and to generate a forgery residual map that enhances the saliency of forgery features; and (4) constructing a detection module based on a lightweight classifier to classify the fused features and output the forgery detection result for the image.
2. The AI-generated image detection method based on high-frequency reconstruction and region consistency perturbation according to claim 1, characterized in that constructing the high-frequency reconstruction feature extraction model to extract the high-frequency anomalous features unique to generated images specifically comprises: extracting gradient features from the input image, the horizontal and vertical gradients being calculated with a Sobel filter; removing local high-frequency details by local average pooling and downsampling of the gradient magnitude map; upsampling the downsampled features to restore the original resolution; calculating the residual between the upsampled features and the original gradient magnitude map to extract multi-scale high-frequency anomaly features; and concatenating the high-frequency residual features at different scales to form a complete high-frequency reconstruction feature representation.
3. The AI-generated image detection method based on high-frequency reconstruction and region consistency perturbation according to claim 1, characterized in that constructing the region consistency perturbation modeling network to capture inter-region consistency deviations in generated images specifically comprises: applying a Gaussian filter to locally smooth the input image and generate a smoothed feature map; calculating the difference between the input image and the smoothed feature map to quantify texture and feature distribution deviations between regions; scaling the input image to different resolutions and calculating the region consistency perturbation features separately at each scale; and concatenating the multi-scale region consistency perturbation features to form a complete region consistency feature representation.
4. The AI-generated image detection method based on high-frequency reconstruction and region consistency perturbation according to claim 1, characterized in that constructing the forgery residual feature fusion network specifically comprises the following. The network nonlinearly fuses the high-frequency features and region consistency features and generates a forgery residual map that enhances the saliency of forgery features. Construction comprises: concatenating the high-frequency reconstruction features and the region consistency perturbation features along the channel dimension to form a preliminary fused feature; nonlinearly mapping the fused feature with a high-dimensional convolutional layer to expand it into a high-dimensional feature space; reducing the dimensionality of the high-dimensional feature with a compression convolutional layer to remove redundant information; mapping the reduced feature to a single-channel feature space with a residual modeling convolutional layer to generate the forgery residual map; and concatenating the forgery residual map with the preliminary fused feature again to form the final enhanced comprehensive feature representation, which is used to detect forgery features in the generated image.
5. The AI-generated image detection method based on high-frequency reconstruction and region consistency perturbation according to claim 1, characterized in that constructing the detection module based on a lightweight classifier to classify the fused features and output forgery detection results for the image specifically comprises: inputting the final enhanced comprehensive feature representation into a lightweight classifier based on a pruned and optimized ResNet architecture, which normalizes, convolves, and compresses the input features and uses binary classification logic to output a forgery probability; when the forgery probability is greater than 0.5, the input image is determined to be a forged image; otherwise, it is determined to be a real image.