Image tampering detection method based on multi-view feature extraction and bilateral edge contrast learning
By employing multi-view feature extraction and dual-edge contrast learning, this method addresses the issues of imprecise task definition and insufficient tampering clue mining in existing image tampering detection methods, achieving high-precision and high-generalization image tampering detection results.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- ZHEJIANG GONGSHANG UNIVERSITY
- Filing Date
- 2026-05-14
- Publication Date
- 2026-06-19
AI Technical Summary
Existing image tampering detection methods suffer from imprecise task definition and insufficient tampering clue mining, resulting in inadequate detection accuracy and generalization ability.
We adopt a method based on multi-view feature extraction and dual-edge contrast learning. By constructing a dual-branch feature encoder in the spatial domain and noise domain, and combining it with a hybrid expert gating network, we extract multi-scale and heterogeneous features. Through the dual-edge contrast learning training strategy, we narrow the distance between features in the same group and widen the distance between features in different groups, thus solving the problems of category label conflict and incomplete discovery of tampering clues.
It significantly improves the accuracy and edge localization capability of image tampering detection, enhances the generalization ability of the model, and enables high-precision localization of tampered areas in complex scenarios.
Figure CN122244655A_ABST
Description
Technical Field
[0001] This invention relates to the field of image content security technology in artificial intelligence, and in particular to an image tampering detection method based on multi-view feature extraction and dual-edge contrast learning.
Background Technology
[0002] The spread of misinformation remains a significant threat to social security. In recent years, continuous upgrades to image editing tools such as Adobe Photoshop and Meitu Xiu Xiu have significantly lowered the barrier to use. Even users without professional image processing knowledge can easily create fake images that are hard to distinguish from genuine ones. These forged images are widely posted on social media and are often maliciously used to spread misinformation and seek illicit gains. Such behavior not only greatly increases the risk of the public being deceived and causes economic losses to individuals, but also poses a serious challenge to social order and public safety. Therefore, developing an efficient and accurate image tampering detection technology has become an urgent research task with significant application value.
[0003] With the remarkable achievements of deep learning across various fields, a large number of image tampering detection technologies based on deep learning have emerged. Compared to traditional statistical methods, deep learning-based image tampering detection techniques often outperform traditional methods in both accuracy and applicability. In terms of problem modeling, some early non-deep learning methods employed a clustering approach. Existing deep learning methods define image tampering detection as a pixel-by-pixel classification problem, requiring the algorithm to classify each pixel of the input image into either a "tampered" or "untampered" category. Regarding tampering clue extraction, some methods have explored potential tampering clues in the image spatial domain, while others consider extracting potential tampering clues from different perspectives. These algorithms have explored the image tampering detection problem to some extent and achieved certain results.
[0004] However, current methods still have limitations. First, the definition of image tampering detection tasks in existing methods is not rigorous enough; simply defining it as a classification problem can lead to conflicts in category labels. Second, existing methods are insufficient in their ability to extract tampering clues. Although some methods propose ideas for extracting tampering clues from different aspects of the image, these approaches are still not comprehensive enough in practical design.
Summary of the Invention
[0005] The purpose of this invention is to address the shortcomings of existing technologies by proposing an image tampering detection method based on multi-view feature extraction and dual-edge contrast learning.
[0006] The objective of this invention is achieved through the following technical solution: an image tampering detection method based on multi-view feature extraction and dual-edge contrast learning, comprising the following steps:
[0007] Construct an image dataset containing tampered labels, and preprocess the images in the dataset;
[0008] The preprocessed image is input into the spatial domain branch and the noise domain branch of the multi-view feature encoder, respectively, to extract multi-scale features in the spatial domain and heterogeneous features in the noise domain, and then feature fusion is performed to obtain a fused feature map.
[0009] Dilation and erosion operations are applied to the tampered areas of the label mask, and the outer edge and inner edge masks are calculated against the original label mask. The fused feature map is then grouped and labeled accordingly. A dual-edge contrast learning training strategy is then adopted to pull features in the same group closer together and push features in different groups further apart, and the multi-view feature encoder is trained.
[0010] The image to be detected is input into the trained multi-view feature encoder. The clustering algorithm is used to classify the pixels according to the fused feature map. The predicted tampering mask is obtained according to the coordinates of the feature vector in the feature map.
[0011] Furthermore, the preprocessing of the dataset images includes: adjusting the three-channel images and the label mask images that can reflect the tampered areas in the dataset to a uniform size, normalizing the three-channel images, and performing data enhancement through random compression, scaling, blurring, and noise addition.
[0012] Furthermore, the spatial domain branch in the multi-view feature encoder includes: using a high-resolution network as the backbone network to extract multi-scale features of the image; using a hybrid expert gating network based on a multilayer perceptron structure to dynamically map the maximum scale features into weight coefficients for each scale; calculating feature residuals for features at adjacent scales based on the weight coefficients; and stacking the feature residuals with the maximum scale features according to the channel dimension to obtain the final spatial domain multi-scale features.
[0013] Furthermore, the noise domain branch in the multi-view feature encoder includes: extracting initial noise features of the image to be detected from multiple views, namely the noise fingerprint information of the image to be detected, SRM features obtained with the Steganalysis Rich Model (SRM) filter, Bayar features obtained by Bayar convolution, residual features before and after the max pooling operation, residual features before and after the average pooling operation, and high-frequency features obtained by low-frequency mask filtering in the Fourier frequency domain.
[0014] A hybrid expert gating network based on a multilayer perceptron structure maps the image to be detected into weight coefficients, which dynamically weight the last five initial noise features to obtain weighted noise features. The noise features from each view are stacked along the channel dimension to obtain the final noise domain heterogeneous features.
[0015] Furthermore, the outer edge mask is calculated by subtracting the original label mask from the dilation mask obtained by the dilation operation;
[0016] The inner edge mask is calculated by subtracting the erosion mask after the erosion operation from the original label mask.
[0017] Furthermore, the grouping and labeling of the fused feature map includes: using the erosion mask as the mask for non-edge tampered regions, and using the source image area minus the dilation mask portion as the mask for non-edge untampered regions.
[0018] Furthermore, in the dual-edge contrast learning training strategy, the loss function includes:
[0019] In the fusion feature map of the image to be detected, the feature vectors corresponding to the pixel positions with a pixel value of 1 in the outer edge mask are used as positive samples, and the feature vectors corresponding to the pixel positions with a pixel value of 1 in the inner edge mask are used as negative samples. A contrastive learning loss is constructed for the inner and outer edge pixels.
[0020] On the fused feature map of the image to be detected, the feature vectors corresponding to the pixel positions with a pixel value of 1 in the non-edge tampered region mask are used as positive samples, and the feature vectors corresponding to the pixel positions with a pixel value of 1 in the non-edge untampered region mask are used as negative samples. A contrastive learning loss is constructed for non-edge pixels in tampered and untampered regions.
[0021] The weighted contrastive learning loss for inner and outer edge pixels is added to the contrastive learning loss for non-edge pixels in tampered and untampered regions to form the total loss in the dual-edge contrastive learning training strategy.
[0022] On the other hand, this specification also discloses an image tampering detection device based on multi-view feature extraction and dual-edge contrast learning, including a memory and one or more processors. The memory stores executable code, and when the processor executes the executable code, it implements the image tampering detection method based on multi-view feature extraction and dual-edge contrast learning.
[0023] On the other hand, this specification also discloses a computer-readable storage medium having a program stored thereon, which, when executed by a processor, implements the image tampering detection method based on multi-view feature extraction and dual-edge contrast learning.
[0024] The beneficial effects of this invention are as follows: This invention employs a dual-edge contrastive learning training strategy. By calculating inner and outer edge masks and using a contrastive learning loss to pull features within the same group closer together and push features from different groups further apart, it solves the category label conflict problem caused by the imprecise task definition in existing methods. At the same time, by constructing a dual-branch multi-view feature encoder over the spatial and noise domains, combined with a hybrid expert gating network for adaptive extraction and fusion of multi-scale spatial features and multi-view heterogeneous noise features, it overcomes the insufficient and incomplete mining of tampering clues in existing methods. This invention significantly improves the accuracy of image tampering detection, edge localization capability, and model generalization, enabling high-precision tampering region localization in various complex tampering scenarios.
Attached Figure Description
[0025] Figure 1 is an overall framework diagram of the method provided in an embodiment of the present invention;
[0026] Figure 2 is a diagram of the spatial domain and noise domain feature extraction structure provided in an embodiment of the present invention;
[0027] Figure 3 is an example diagram of dual-edge mask determination provided in an embodiment of the present invention;
[0028] Figure 4 shows comparison images of image tampering detection results provided in an embodiment of the present invention;
[0029] Figure 5 is a schematic diagram of the device provided in an embodiment of the present invention.
Detailed Implementation
[0030] The specific embodiments of the present invention will be further described in detail below with reference to the accompanying drawings.
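[0031] As shown in Figure 1, the present invention provides an image tampering detection method based on multi-view feature extraction and dual-edge contrast learning, comprising the following steps: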
[0032] S1. Construct an image dataset. All three-channel images and the label mask images reflecting the tampered areas in the dataset are resized to 1024×1024, and the three-channel images are normalized. In addition, data augmentation through random compression, scaling, blurring, and noise addition is used to simulate image quality in real-world scenarios. Specific information about the training set is shown in Table 1. The training data mainly consists of the CASIA-v2, SP-COCO, CM-COCO, CM-RAISE, CM-C-RAISE, and IMD2020 datasets, totaling over 800,000 images. Among them, CASIA-v2 includes multi-source tampered image data, meaning that an image may contain multiple tampered regions from different source images; SP-COCO and CM-COCO are built on the large-scale object detection dataset COCO, featuring rich image scenes covering 80 categories of everyday objects and mainly covering splicing and copy-move tampering; CM-RAISE and CM-C-RAISE are built on the RAISE high-resolution image library and have relatively higher image resolution; IMD2020 collects a large number of tampered images from the internet, reflecting the distribution of tampered images in the real world. Specific information about the test set is shown in Table 2. The test data mainly consists of the NIST, Columbia, COVERAGE, CASIA-v1, and DSO datasets. These datasets have different focuses: NIST and CASIA-v1 cover multiple tampering types, Columbia mainly focuses on splicing tampering, and COVERAGE and DSO focus more on copy-move tampering. These datasets also contain tampered natural images as well as tampered human portraits, with a wide variety of scenes and tampering types, so they can thoroughly test the performance of the algorithms.
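The resizing and normalization in S1 can be sketched as follows. This is a minimal illustration, not the patented implementation: nearest-neighbour resizing, scaling to [0, 1], additive Gaussian noise as the only augmentation, and the function names `resize_nearest` and `preprocess` are all assumptions, since the text does not specify the interpolation method, normalization constants, or augmentation parameters.

```python
import numpy as np

def resize_nearest(arr, size):
    """Nearest-neighbour resize of an (H, W) or (H, W, C) array to (size, size)."""
    h, w = arr.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return arr[rows][:, cols]

def preprocess(image, mask, size=1024, noise_std=0.0, rng=None):
    """Resize image and mask together, scale the image to [0, 1], optionally add noise.

    The [0, 1] scaling and the Gaussian noise model are assumptions for illustration.
    """
    image = resize_nearest(image, size).astype(np.float32) / 255.0
    mask = (resize_nearest(mask, size) > 0).astype(np.uint8)
    if noise_std > 0:
        if rng is None:
            rng = np.random.default_rng()
        noisy = image + rng.normal(0.0, noise_std, image.shape)
        image = np.clip(noisy, 0.0, 1.0).astype(np.float32)
    return image, mask
```

The image and its label mask are resized with the same index grid so that tampered pixels stay aligned with their labels.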
[0033] Table 1. Information on the training dataset used.
[0034]
[0035] Table 2. Information on the test dataset used.
[0036]
[0037] S2. Multi-view feature extraction: the preprocessed image to be detected is input separately into the spatial domain branch and the noise domain branch of the multi-view feature encoder. First, the spatial domain branch uses HRNet (High-Resolution Network) as the backbone network to extract multi-scale features of the image. The specific processing procedure is shown in Figure 2(a):
[0038] Extract the multi-scale features $F_1$, $F_2$, $F_3$, and $F_4$ of the image to be detected $I$, corresponding to four scales. The specific formula is as follows:
[0039] $\{F_1, F_2, F_3, F_4\} = \mathcal{E}_{s}(I)$
[0040] where $F_1, F_2, F_3, F_4$ denote four features with different spatial resolutions whose feature scales decrease sequentially, $F_1$ being the largest-scale feature; $\mathcal{E}_{s}(\cdot)$ denotes the spatial domain feature extraction network; and $I$ denotes the image to be detected.
[0041] Using a hybrid expert gating network based on a multilayer perceptron structure, the largest-scale feature $F_1$ is dynamically mapped to the weight coefficients $w_i$ for each scale. The specific formula is as follows:
[0042]
[0043] Calculate the feature residuals for features at adjacent scales. The specific formula is as follows:
[0044]
[0045] where $R_i$ denotes the weighted feature residual, $w_i$ denotes the weight coefficient, $\mathrm{Up}(\cdot)$ denotes an upsampling operation, and $F_i$ denotes the multi-scale features of the image to be detected $I$.
[0046] The feature residuals are stacked with the largest-scale feature $F_1$ along the channel dimension to obtain the final spatial domain multi-scale feature $F_s$. The specific formula is as follows:
[0047]
[0048] where $\mathrm{Concat}(\cdot)$ denotes stacking along the channel dimension, $\mathrm{Up}(\cdot)$ denotes an upsampling operation, and $R_i$ denotes the weighted feature residual.
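One possible reading of the spatial domain branch is sketched below in NumPy. Because the formulas are not legible in this text, every concrete choice here is an assumption: the residual is taken as weight × (finer feature minus upsampled coarser feature), the gate is a two-layer perceptron with softmax output applied to the globally averaged largest-scale feature, all scales share one channel count, each scale is half the size of the previous one, and upsampling is nearest-neighbour. The function names are hypothetical.

```python
import numpy as np

def upsample(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) array by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def gate_weights(f1, w_hidden, w_out):
    """Assumed MLP gate: global-average-pool F1, then map to one weight per residual."""
    pooled = f1.mean(axis=(1, 2))                 # (C,)
    hidden = np.maximum(pooled @ w_hidden, 0.0)   # ReLU hidden layer
    return softmax(hidden @ w_out)                # (number of residuals,)

def spatial_multiscale(features, w_hidden, w_out):
    """features: [F1, F2, F3, F4], each (C, H_i, W_i), each scale half the previous."""
    weights = gate_weights(features[0], w_hidden, w_out)
    target_h = features[0].shape[1]
    stacked = [features[0]]
    for i, w in enumerate(weights):
        fine, coarse = features[i], features[i + 1]
        residual = w * (fine - upsample(coarse, 2))   # weighted adjacent-scale residual
        stacked.append(upsample(residual, target_h // residual.shape[1]))
    return np.concatenate(stacked, axis=0)            # stack along the channel axis
```

A real HRNet backbone produces a different channel count per branch, so a practical version would need a projection before subtracting adjacent scales; that step is omitted here.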
[0049] Next, the specific processing procedure for the noise domain branch is shown in Figure 2(b):
[0050] Extract the initial noise features of the image to be detected $I$ from multiple views. Specifically, the following six types of noise features are included: the noise fingerprint information $N_0$ of the image to be detected; the SRM features $N_1$ obtained with the Steganalysis Rich Model (SRM) filter; the Bayar features $N_2$ obtained by Bayar convolution; the residual features $N_3$ before and after the max pooling operation; the residual features $N_4$ before and after the average pooling operation; and the high-frequency features $N_5$ obtained by low-frequency mask filtering in the Fourier frequency domain. The specific formulas are as follows:
[0051]
[0052]
[0053]
[0054]
[0055]
[0056]
[0057] where $I$ denotes the image to be detected; $\mathcal{D}_{\theta}(\cdot)$ denotes the parameterized denoising operator, which can be implemented by a filtering model based on residual prediction or by a pre-trained neural network and is not limited to a specific network structure; $\sigma$ denotes the variance parameter that controls the capture intensity of high-frequency noise; $\mathrm{Concat}(\cdot)$ denotes stacking along the channel dimension; $K_j$ denotes the $j$-th convolution kernel in the SRM filter bank (including but not limited to first-order residual, second-order residual, and tangent residual operators); $J$ denotes the total number of convolution kernels in the SRM filter bank; $\mathrm{BayarConv}(\cdot)$ denotes the convolution function for Bayar noise modeling; $\mathrm{MaxPool}(\cdot)$ denotes the max pooling function; $\mathrm{AvgPool}(\cdot)$ denotes the average pooling function; $\mathrm{IFFT}(\cdot)$ denotes the inverse fast Fourier transform; $\mathrm{FFT}(\cdot)$ denotes the fast Fourier transform; and $M_{\phi}$ denotes the learnable mask parameters.
[0058] A hybrid expert gating network based on a multilayer perceptron structure is used to generate the weight coefficients $\alpha_k$. The last five initial noise features are then dynamically weighted to obtain the weighted noise features $\tilde{N}_k$. The specific formulas are as follows:
[0059]
[0060]
[0061] where $N_k$ ($k = 1, \ldots, 5$) denotes the last five initial noise features.
[0062] By stacking the noise features from the different views along the channel dimension, the final noise domain heterogeneous feature $F_n$ is obtained. The specific formula is as follows:
[0063] $F_n = \mathrm{Concat}(N_0, \tilde{N}_1, \ldots, \tilde{N}_5)$
[0064] where $\mathrm{Concat}(\cdot)$ denotes stacking along the channel dimension, $N_0$ denotes the noise fingerprint information of the image, and $\tilde{N}_k$ denotes the weighted noise features.
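Three of the six noise views (the two pooling residuals and the Fourier high-frequency feature) are simple enough to illustrate directly. The sketch below is a hedged approximation: the 3×3 stride-1 pooling window, the sign convention (image minus pooled image), and the fixed circular low-frequency mask are assumptions, whereas the text describes a learnable mask and does not state window sizes. The function names are hypothetical.

```python
import numpy as np

def pool3x3(x, reduce_fn):
    """3x3, stride-1 pooling of a 2-D array with edge padding (output keeps the input size)."""
    h, w = x.shape
    padded = np.pad(x, 1, mode="edge")
    windows = np.stack([padded[i:i + h, j:j + w] for i in range(3) for j in range(3)])
    return reduce_fn(windows, axis=0)

def pooling_residuals(x):
    """Residuals between the image and its max-pooled and average-pooled versions."""
    return x - pool3x3(x, np.max), x - pool3x3(x, np.mean)

def high_frequency(x, radius):
    """Zero a centred low-frequency disc in the Fourier domain, then transform back.

    A fixed disc stands in for the learnable low-frequency mask described in the text.
    """
    h, w = x.shape
    spectrum = np.fft.fftshift(np.fft.fft2(x))
    yy, xx = np.ogrid[:h, :w]
    low = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    spectrum[low] = 0
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum)))
```

On a perfectly flat image all three outputs are zero, which matches the intent of these views: they respond only to local deviations such as the noise inconsistencies left by tampering.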
[0065] Finally, the spatial domain multi-scale feature $F_s$ and the noise domain heterogeneous feature $F_n$ of the image to be detected $I$ are fused to obtain the fused feature map $F$ of the image to be detected. The specific formula is as follows:
[0066] $F = \mathrm{Conv}_{1\times1}\big(\mathrm{Concat}(F_s, F_n)\big)$
[0067] where $\mathrm{Conv}_{1\times1}(\cdot)$ denotes a 1×1 convolution operation and $\mathrm{Concat}(\cdot)$ denotes stacking along the channel dimension.
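The fusion step combines channel stacking with a 1×1 convolution, which is equivalent to applying the same linear map to the channel vector at every pixel. A minimal NumPy sketch, assuming both feature maps already share the same spatial size (the function name and the bias term are assumptions):

```python
import numpy as np

def fuse(spatial, noise, kernel, bias):
    """Concatenate (C_s, H, W) and (C_n, H, W) features, then apply a 1x1 convolution.

    kernel has shape (C_out, C_s + C_n); a 1x1 convolution is a per-pixel linear map.
    """
    stacked = np.concatenate([spatial, noise], axis=0)
    return np.einsum("oc,chw->ohw", kernel, stacked) + bias[:, None, None]
```

For example, with 2 spatial channels of ones, 3 noise channels of twos, and an all-ones kernel, every output value is 2·1 + 3·2 = 8.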
[0068] S3. Dual-edge contrast learning. A specific example of the process is shown in Figure 3. First, dual-edge masks are generated from the label mask image $M$ of the tampered area. A dilation operation is applied to the label mask image to obtain the dilation mask $M_d$, and the outer edge mask $M_{out}$ is calculated. The specific formula is as follows:
[0069] $M_{out} = M_d - M$
[0070] An erosion operation is applied to the label mask image to obtain the erosion mask $M_e$, and the inner edge mask $M_{in}$ is calculated. The specific formula is as follows:
[0071] $M_{in} = M - M_e$
[0072] At the same time, the non-edge tampered region mask $M_t$ is calculated from the label mask image $M$. The specific formula is as follows:
[0073] $M_t = M_e$
[0074] The non-edge untampered region mask $M_u$ is calculated as follows:
[0075] $M_u = 1 - M_d$
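The four masks can be computed with plain binary morphology. The sketch below follows the definitions given in the summary (outer edge = dilation minus label, inner edge = label minus erosion, non-edge tampered = erosion, non-edge untampered = everything outside the dilation). The 3×3 structuring element, the iteration count, and the dictionary key names are assumptions, since the text does not specify the kernel size.

```python
import numpy as np

def _shift_stack(mask, pad_value):
    """Stack the nine 3x3-neighbourhood shifts of a binary mask."""
    h, w = mask.shape
    padded = np.pad(mask, 1, mode="constant", constant_values=pad_value)
    return np.stack([padded[i:i + h, j:j + w] for i in range(3) for j in range(3)])

def dilate(mask):
    return _shift_stack(mask, 0).max(axis=0)

def erode(mask):
    # Padding with 1 means the image border itself does not erode the region.
    return _shift_stack(mask, 1).min(axis=0)

def dual_edge_masks(label, iterations=1):
    """Return the four binary masks used to group pixels for contrastive training."""
    label = (label > 0).astype(np.uint8)
    dil, ero = label.copy(), label.copy()
    for _ in range(iterations):
        dil, ero = dilate(dil), erode(ero)
    return {
        "outer_edge": dil - label,      # ring just outside the tampered region
        "inner_edge": label - ero,      # ring just inside the tampered region
        "tampered_core": ero,           # non-edge tampered pixels
        "untampered_core": 1 - dil,     # non-edge untampered pixels
    }
```

The four masks partition the image: every pixel belongs to exactly one of them, so no pixel receives two conflicting group labels.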
[0076] Then, on the fused feature map $F$ of the image to be detected $I$, feature vectors are extracted at the pixels where the outer edge mask $M_{out}$, the inner edge mask $M_{in}$, the non-edge tampered region mask $M_t$, and the non-edge untampered region mask $M_u$ take the value 1, and contrastive learning loss functions are constructed from the extracted feature vectors to constrain the feature distribution. Specifically, on the fused feature map $F$, the feature vectors at the pixel positions where the outer edge mask $M_{out}$ equals 1 are used as positive samples, and the feature vectors at the pixel positions where the inner edge mask $M_{in}$ equals 1 are used as negative samples, to construct the contrastive learning loss $\mathcal{L}_{edge}$ for inner and outer edge pixels, which increases the distance between the outer edge feature vectors and the inner edge feature vectors in the feature space. The specific formulas are as follows:
[0077]
[0078]
[0079] where $N_{out}$ denotes the number of pixels with value 1 in the outer edge mask $M_{out}$; $N_{in}$ denotes the number of pixels with value 1 in the inner edge mask $M_{in}$; $f_i^{out}$ denotes the feature vector on the fused feature map $F$ at the $i$-th pixel position where the outer edge mask $M_{out}$ equals 1; $f_j^{in}$ denotes the feature vector on the fused feature map $F$ at the $j$-th pixel position where the inner edge mask $M_{in}$ equals 1; $\tau$ denotes the temperature coefficient; $\log(\cdot)$ denotes the natural logarithm; and $\exp(\cdot)$ denotes the exponential function.
[0080] On the fused feature map $F$ of the image to be detected $I$, the feature vectors at the pixel positions where the non-edge tampered region mask $M_t$ equals 1 are used as positive samples, and the feature vectors at the pixel positions where the non-edge untampered region mask $M_u$ equals 1 are used as negative samples, to construct the contrastive learning loss $\mathcal{L}_{non}$ for non-edge pixels in tampered and untampered regions, which increases the distance between the feature vectors of non-edge tampered regions and non-edge untampered regions in the feature space. The specific formulas are as follows:
[0081]
[0082]
[0083] where $N_t$ denotes the number of pixels with value 1 in the non-edge tampered region mask $M_t$; $N_u$ denotes the number of pixels with value 1 in the non-edge untampered region mask $M_u$; $f_i^{t}$ denotes the feature vector on the fused feature map $F$ at the $i$-th pixel position where the non-edge tampered region mask $M_t$ equals 1; $f_j^{u}$ denotes the feature vector on the fused feature map $F$ at the $j$-th pixel position where the non-edge untampered region mask $M_u$ equals 1; $\tau$ denotes the temperature coefficient; $\log(\cdot)$ denotes the natural logarithm; and $\exp(\cdot)$ denotes the exponential function.
[0084] Finally, construct the total loss function $\mathcal{L} = \mathcal{L}_{non} + \lambda \mathcal{L}_{edge}$, where $\lambda$ is a weighting coefficient used to adjust the model's focus on features in edge regions.
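The exact loss formulas are not legible in this text, so the following is only one plausible instantiation: an InfoNCE-style supervised contrastive loss on cosine similarities, applied symmetrically in both directions for each pair of groups, with the edge term weighted before being added to the non-edge term as the description states. The function names, the symmetric treatment, the mask dictionary keys, and the default values of `tau` and `lam` are assumptions.

```python
import numpy as np

def group_contrastive_loss(anchors, negatives, tau=0.1):
    """Mean over anchors of -log(pos / (pos + neg)), where pos sums exp(sim / tau)
    over the other members of the anchor's group and neg over the opposite group."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    n = negatives / np.linalg.norm(negatives, axis=1, keepdims=True)
    pos = np.exp(a @ a.T / tau)
    np.fill_diagonal(pos, 0.0)                  # an anchor is not its own positive
    pos_sum = pos.sum(axis=1)
    neg_sum = np.exp(a @ n.T / tau).sum(axis=1)
    return float(np.mean(-np.log(pos_sum / (pos_sum + neg_sum))))

def dual_edge_loss(feat, masks, lam=1.0, tau=0.1):
    """feat: (C, H, W) fused feature map; masks: dict of four binary (H, W) masks."""
    def pick(name):
        return feat[:, masks[name].astype(bool)].T   # (N, C) feature vectors

    outer, inner = pick("outer_edge"), pick("inner_edge")
    core_t, core_u = pick("tampered_core"), pick("untampered_core")
    edge = group_contrastive_loss(outer, inner, tau) + group_contrastive_loss(inner, outer, tau)
    non_edge = group_contrastive_loss(core_t, core_u, tau) + group_contrastive_loss(core_u, core_t, tau)
    return lam * edge + non_edge                     # weighted edge term plus non-edge term
```

Each group needs at least two pixels so that every anchor has a positive. The pairwise similarity matrices grow quadratically with the number of pixels, so at 1024×1024 resolution a practical training loop would probably sample a subset of pixels per mask; that sampling is not described in the text and is left out here.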
[0085] S4. Tampering mask prediction. First, the image to be detected $I$ is resized to 1024×1024 according to the preprocessing used in the training phase and input into the trained multi-view feature encoder. Features are extracted and fused through the spatial domain and noise domain branches to obtain the fused feature map $F$. The K-means clustering algorithm then classifies the pixels based on the feature vector of each pixel in the fused feature map $F$, with the number of clusters set to 2. Owing to the feature differences enhanced by dual-edge contrast learning during the training phase, the clustering algorithm can identify and merge pixel regions with inconsistent features within the image, thereby generating the prediction mask $P$. Finally, the matching degree between the prediction mask $P$ and the label mask image $M$ is calculated. Because clustering is unsupervised, its result may exhibit label flipping (i.e., the predicted labels are exactly opposite to the true labels). This invention therefore adopts permuted F1 (pF1) and permuted IoU (pIoU) as quantitative evaluation metrics, taking the maximum matching degree over the different label mappings. The specific formulas are as follows:
[0086] $\mathrm{F1}(A, B) = \dfrac{2\,|A \cap B|}{|A| + |B|}$
[0087] $\mathrm{IoU}(A, B) = \dfrac{|A \cap B|}{|A \cup B|}$
[0088] $\mathrm{pF1} = \max\big(\mathrm{F1}(P, M),\ \mathrm{F1}(\bar{P}, M)\big)$
[0089] $\mathrm{pIoU} = \max\big(\mathrm{IoU}(P, M),\ \mathrm{IoU}(\bar{P}, M)\big)$
[0090] where $A$ and $B$ denote the prediction mask set and the label mask set involved in the calculation, respectively; $\cap$ denotes set intersection; $\cup$ denotes set union; $|\cdot|$ denotes the number of elements in a set; $P$ denotes the prediction mask generated by the clustering algorithm; $\bar{P}$ denotes the inverted prediction mask, which simulates the prediction result after label flipping; $M$ denotes the label mask image; and $\max(\cdot)$ denotes selecting the larger value.
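The inference step and its label-flip-tolerant metrics can be sketched as follows. The clustering here is a plain Lloyd's K-means with two clusters written in NumPy for self-containment; the initialization scheme, iteration count, and function names are assumptions, and a library implementation could be substituted.

```python
import numpy as np

def kmeans_two_clusters(features, iters=20, seed=0):
    """Plain Lloyd's k-means with k=2 on (N, C) features; returns labels in {0, 1}."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=2, replace=False)].astype(float)
    labels = np.zeros(len(features), dtype=np.uint8)
    for _ in range(iters):
        dists = ((features[:, None, :] - centers[None]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1).astype(np.uint8)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = features[labels == k].mean(axis=0)
    return labels

def f1_and_iou(pred, gt):
    """Set-based F1 and IoU between two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    total = pred.sum() + gt.sum()
    f1 = 2 * inter / total if total else 1.0
    iou = inter / union if union else 1.0
    return f1, iou

def permuted_scores(pred, gt):
    """pF1 and pIoU: the better score over the prediction and its inversion."""
    f1_a, iou_a = f1_and_iou(pred, gt)
    f1_b, iou_b = f1_and_iou(1 - pred, gt)
    return max(f1_a, f1_b), max(iou_a, iou_b)
```

For a fused feature map `feat` of shape (C, H, W), the pixel features would be `feat.reshape(C, -1).T` and the predicted mask `labels.reshape(H, W)`. Because the cluster indices carry no meaning, the same segmentation may come out as 0/1 or 1/0; the permuted metrics score both assignments identically.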
[0091] To verify the effectiveness of this invention, it was compared with high-performing image tampering detection methods in recent years on the same dataset. Table 3 shows the performance comparison between classic image tampering localization algorithms and this invention, including PSCC-Net, Mesorch, SparseViT, SAFIRE, MPC, CoDE, CAT-Net, MVSS-Net, TruFor, and FMAE. For fair comparison, the Adam algorithm was used to optimize the model, with an initial learning rate of 0.0001 and a batch size of 3. The entire training process was performed on a single NVIDIA 3090 24GB graphics card.
[0092] Table 3. Performance comparison between this invention and classic image tampering localization algorithms.
[0093]
[0094] As shown in Table 3, the effectiveness and advancement of the present invention on multiple publicly available forgery detection datasets are verified. The method of the present invention achieves excellent performance in the pF1 and pIoU metrics, realizing high-precision tamper location. Therefore, the method of the present invention has outstanding value for technical promotion and practical application.
[0095] Figure 4 compares the image tampering detection results of this invention with those of classic image tampering localization algorithms; the algorithms used for comparison include MVSS-Net, CAT-Net, TruFor, CoDE, SparseViT, and Mesorch. From Figure 4 it can be intuitively observed that the image tampering detection results of this invention are more accurate than those of the other classic image tampering localization algorithms, which further supports the effectiveness of the image tampering detection method of this invention.
[0096] Corresponding to the aforementioned embodiment of an image tampering detection method based on multi-view feature extraction and dual-edge contrast learning, the present invention also provides an embodiment of an image tampering detection device based on multi-view feature extraction and dual-edge contrast learning.
[0097] Referring to Figure 5, the present invention provides an image tampering detection device based on multi-view feature extraction and dual-edge contrast learning, comprising a memory and one or more processors. The memory stores executable code, and when the processor executes the executable code, it implements the image tampering detection method based on multi-view feature extraction and dual-edge contrast learning described in the above embodiment.
[0098] The embodiment of the image tampering detection device based on multi-view feature extraction and dual-edge contrast learning provided by this invention can be applied to any device with data processing capabilities, such as a computer. The device embodiment can be implemented in software, hardware, or a combination of both. Taking software implementation as an example, as a logical device, it is formed by the processor of the data processing device on which it resides loading the corresponding computer program instructions from non-volatile memory into memory for execution. From a hardware perspective, Figure 5 is a hardware structure diagram of a data processing device on which the image tampering detection device based on multi-view feature extraction and dual-edge contrast learning provided by this invention resides. In addition to the processor, memory, network interface, and non-volatile memory shown in Figure 5, the data processing device in the embodiment may also include other hardware depending on its actual function, which will not be described in detail here.
[0099] The specific implementation process of the functions and roles of each unit in the above device can be found in the implementation process of the corresponding steps in the above method, and will not be repeated here.
[0100] For the device embodiments, since they basically correspond to the method embodiments, the relevant parts can be referred to in the description of the method embodiments. The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of the present invention according to actual needs. Those skilled in the art can understand and implement this without creative effort.
[0101] This invention also provides a computer-readable storage medium storing a program thereon, which, when executed by a processor, implements an image tampering detection method based on multi-view feature extraction and dual-edge contrast learning as described in the above embodiments.
[0102] The computer-readable storage medium can be an internal storage unit of any data processing device described in any of the foregoing embodiments, such as a hard disk or memory. The computer-readable storage medium can also be an external storage device of any data processing device, such as a plug-in hard disk, smart media card (SMC), SD card, flash card, etc., equipped on the device. Furthermore, the computer-readable storage medium can include both internal storage units and external storage devices of any data processing device. The computer-readable storage medium is used to store the computer program and other programs and data required by the data processing device, and can also be used to temporarily store data that has been output or will be output.
[0103] The present invention also provides a computer program product, including a computer program that, when executed by a processor, implements the image tampering detection method based on multi-view feature extraction and dual-edge contrast learning.
[0104] Other embodiments of this application will readily occur to those skilled in the art upon consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of this application that follow the general principles of this application and include common knowledge or customary techniques in the art not disclosed herein. The specification and embodiments are to be considered exemplary only, and the true scope and spirit of this application are indicated by the claims.
[0105] It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only, and are not intended to limit this application. This application is not limited to the precise structures described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope. The scope of this application is limited only by the appended claims.
Claims
1. An image tampering detection method based on multi-view feature extraction and dual-edge contrast learning, characterized in that, Includes the following steps: Construct an image dataset containing tampered labels, and preprocess the images in the dataset; The preprocessed image is input into the spatial domain branch and the noise domain branch of the multi-view feature encoder, respectively, to extract multi-scale features in the spatial domain and heterogeneous features in the noise domain, and then feature fusion is performed to obtain a fused feature map. The image is subjected to dilation and erosion operations on the tampered areas respectively, and the outer edge and inner edge masks are calculated based on the original label mask. The fused feature map is then grouped and labeled accordingly. Then, a dual-edge contrast learning training strategy is adopted to bring the feature distance in the same group closer and push the feature distance in different groups further apart, and the multi-view feature encoder is trained. The image to be detected is input into the trained multi-view feature encoder. The clustering algorithm is used to classify the pixels according to the fused feature map. The predicted tampering mask is obtained according to the coordinates of the feature vector in the feature map.
2. The image tampering detection method based on multi-view feature extraction and dual-edge contrast learning according to claim 1, characterized in that the preprocessing of the dataset images includes: resizing the three-channel images in the dataset, and the label mask images that indicate the tampered areas, to a uniform size; normalizing the three-channel images; and performing data augmentation through random compression, scaling, blurring, and noise addition.
3. The image tampering detection method based on multi-view feature extraction and dual-edge contrast learning according to claim 1, characterized in that the spatial domain branch of the multi-view feature encoder includes: using a high-resolution network as the backbone network to extract multi-scale features of the image; using a hybrid expert gating network based on a multilayer perceptron structure to dynamically map the maximum-scale features into weight coefficients for each scale; calculating feature residuals for features at adjacent scales based on the weight coefficients; and stacking the feature residuals with the maximum-scale features along the channel dimension to obtain the final spatial domain multi-scale features.
4. The image tampering detection method based on multi-view feature extraction and dual-edge contrast learning according to claim 1, characterized in that the noise domain branch in the multi-view feature encoder includes: extracting initial noise features of the image to be detected from multiple views, including the noise fingerprint information of the image to be detected, SRM features obtained by filtering with steganalysis rich model (SRM) filters, Bayar features obtained by Bayar convolution, residual features before and after a max pooling operation, residual features before and after an average pooling operation, and high-frequency features obtained by filtering with a low-frequency mask in the Fourier frequency domain; adopting a hybrid expert gating network based on a multilayer perceptron structure to map the image to be detected into weight coefficients, and dynamically weighting the last five initial noise features to obtain weighted noise features; and stacking the noise features under each view along the channel dimension to obtain the final noise domain heterogeneous features.
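Three of the noise views named in claim 4 can be sketched without learned parameters: the max pooling residual, the average pooling residual, and the Fourier high-frequency feature. The sketch below is an illustration under stated assumptions, not the claimed implementation: it uses a single-channel image, 3x3 stride-1 pooling, and a square low-frequency mask of hypothetical half-width `radius`. The `weights` argument stands in for the output of the gating network, which is not modeled here, and the function names are hypothetical.

```python
import numpy as np

def pool3x3(img, reduce_fn):
    """3x3 stride-1 pooling with edge padding (pooling size is an assumption)."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    windows = np.stack([padded[dy:dy + h, dx:dx + w]
                        for dy in range(3) for dx in range(3)])
    return reduce_fn(windows, axis=0)

def fft_highpass(img, radius=2):
    """Zero a centered square of low frequencies, then transform back."""
    h, w = img.shape
    spec = np.fft.fftshift(np.fft.fft2(img))   # zero frequency moves to (h//2, w//2)
    cy, cx = h // 2, w // 2
    spec[max(cy - radius, 0):cy + radius + 1,
         max(cx - radius, 0):cx + radius + 1] = 0
    return np.fft.ifft2(np.fft.ifftshift(spec)).real

def weighted_noise_views(img, weights):
    """Stack three noise views along the channel axis and weight each one.

    weights: array of shape (3,), a stand-in for gating-network coefficients.
    """
    img = img.astype(np.float64)
    views = np.stack([
        img - pool3x3(img, np.max),    # residual before/after max pooling
        img - pool3x3(img, np.mean),   # residual before/after average pooling
        fft_highpass(img),             # high-frequency content
    ])
    return weights[:, None, None] * views
```

The SRM and Bayar views are omitted because they depend on specific or learned convolution kernels that the claim does not enumerate.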
5. The image tampering detection method based on multi-view feature extraction and dual-edge contrast learning according to claim 1, characterized in that the outer edge mask is calculated by subtracting the original label mask from the dilation mask obtained by the dilation operation, and the inner edge mask is calculated by subtracting the erosion mask obtained by the erosion operation from the original label mask.
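The two subtractions in claim 5 can be sketched for a binary label mask as follows. The square structuring element and its hypothetical half-width `radius` are assumptions, since the claim does not fix the kernel shape or size, and the function names are illustrative.

```python
import numpy as np

def dilate(mask, radius=1):
    """Binary dilation with a (2*radius+1) x (2*radius+1) square element."""
    mask = mask.astype(bool)
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros((h, w), dtype=bool)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

def erode(mask, radius=1):
    """Binary erosion as the dual of dilation: erode(M) = NOT dilate(NOT M).

    Pixels outside the image are treated as foreground, so the image border
    itself does not erode the mask.
    """
    return ~dilate(~mask.astype(bool), radius)

def edge_masks(label, radius=1):
    """Return (outer_edge, inner_edge) for a binary tampering label mask."""
    label = label.astype(bool)
    outer = dilate(label, radius) & ~label   # dilation mask minus label mask
    inner = label & ~erode(label, radius)    # label mask minus erosion mask
    return outer, inner
```

For a 3x3 tampered square with `radius=1`, the outer edge is the 16-pixel ring just outside the square and the inner edge is the 8-pixel ring just inside it.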
6. The image tampering detection method based on multi-view feature extraction and dual-edge contrast learning according to claim 1, characterized in that the grouping and labeling of the fused feature map includes: using the erosion mask as the mask for non-edge tampered regions, and using the portion of the source image outside the dilation mask as the mask for non-edge untampered regions.
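Taken together with the edge masks of claim 5, the grouping in claim 6 partitions every pixel into one of four groups. The sketch below assumes the label, dilation, and erosion masks are already available as binary arrays of the same size; the integer group codes and the name `group_map` are hypothetical.

```python
import numpy as np

# Hypothetical group codes used only for this illustration.
NON_EDGE_UNTAMPERED, OUTER_EDGE, INNER_EDGE, NON_EDGE_TAMPERED = 0, 1, 2, 3

def group_map(label, dilation_mask, erosion_mask):
    """Assign each pixel to one of four groups from three binary masks.

    Assumes erosion_mask is contained in label, and label in dilation_mask.
    """
    label = label.astype(bool)
    dilated = dilation_mask.astype(bool)
    eroded = erosion_mask.astype(bool)
    groups = np.full(label.shape, -1, dtype=np.int64)
    groups[~dilated] = NON_EDGE_UNTAMPERED      # source image outside the dilation mask
    groups[dilated & ~label] = OUTER_EDGE       # dilation mask minus label mask
    groups[label & ~eroded] = INNER_EDGE        # label mask minus erosion mask
    groups[eroded] = NON_EDGE_TAMPERED          # erosion mask
    return groups
```

If the fused feature map has a lower resolution than the label mask, the masks would first have to be resized to the feature map size, a step this sketch leaves out.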
7. The image tampering detection method based on multi-view feature extraction and dual-edge contrast learning according to claim 6, characterized in that, in the dual-edge contrast learning training strategy, the loss function includes: on the fused feature map of the image to be detected, taking the feature vectors at pixel positions where the outer edge mask has a value of 1 as positive samples and the feature vectors at pixel positions where the inner edge mask has a value of 1 as negative samples, and constructing a contrastive learning loss for the inner and outer edge pixels; on the fused feature map of the image to be detected, taking the feature vectors at pixel positions where the non-edge tampered region mask has a value of 1 as positive samples and the feature vectors at pixel positions where the non-edge untampered region mask has a value of 1 as negative samples, and constructing a contrastive learning loss for the non-edge pixels in tampered and untampered regions; and adding the weighted contrastive learning loss for the inner and outer edge pixels to the contrastive learning loss for the non-edge pixels in tampered and untampered regions to form the total loss of the dual-edge contrast learning training strategy.
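The structure of the total loss in claim 7 (a weighted edge term plus a non-edge region term) can be sketched as below. The claim does not give the form of each contrastive term, so the supervised-contrastive-style loss with cosine similarity and a temperature is an assumption. The names `contrastive_loss`, `dual_edge_loss`, `temperature`, and `edge_weight` are hypothetical, and the NumPy code only evaluates the loss without computing gradients.

```python
import numpy as np

def contrastive_loss(pos, neg, temperature=0.1):
    """Two-group contrastive loss on feature vectors pos (Np, C) and neg (Nn, C).

    Assumption: a supervised-contrastive-style form; the claim does not fix it.
    """
    feats = np.concatenate([pos, neg], axis=0).astype(np.float64)
    n = len(feats)
    if n < 2:
        return 0.0
    feats /= np.linalg.norm(feats, axis=1, keepdims=True) + 1e-12
    labels = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    not_self = ~np.eye(n, dtype=bool)
    same = (labels[:, None] == labels[None, :]) & not_self
    # Cosine similarity to every other sample; self-similarity is excluded.
    sim = np.where(not_self, feats @ feats.T / temperature, -np.inf)
    sim = sim - sim.max(axis=1, keepdims=True)
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    per_anchor = -np.where(same, log_prob, 0.0).sum(axis=1) / np.maximum(same.sum(axis=1), 1)
    has_partner = same.any(axis=1)   # anchors with at least one same-group sample
    return float(per_anchor[has_partner].mean()) if has_partner.any() else 0.0

def dual_edge_loss(features, outer, inner, tampered_core, clean_core, edge_weight=1.0):
    """features: fused feature map (C, H, W); each mask: binary array (H, W)."""
    def pick(mask):
        return features[:, mask.astype(bool)].T   # (N, C) vectors at mask pixels
    edge_term = contrastive_loss(pick(outer), pick(inner))
    region_term = contrastive_loss(pick(tampered_core), pick(clean_core))
    return edge_weight * edge_term + region_term
```

With this form, features that are tightly grouped within each mask and well separated across masks give a loss near zero, while mixed groups give a large loss.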
8. An image tampering detection device based on multi-view feature extraction and dual-edge contrast learning, comprising a memory and one or more processors, wherein the memory stores executable code, characterized in that, when the one or more processors execute the executable code, they implement the image tampering detection method based on multi-view feature extraction and dual-edge contrast learning as described in any one of claims 1-7.
9. A computer-readable storage medium having a program stored thereon, characterized in that, when the program is executed by a processor, it implements the image tampering detection method based on multi-view feature extraction and dual-edge contrast learning as described in any one of claims 1-7.