Image detection method, apparatus, computer device, and storage medium

Features are extracted from images by a feature extraction network that contains an attention convolutional module and a pooling layer, and a classification network then classifies them. This addresses the low accuracy of image tampering detection in existing technologies and achieves higher detection accuracy.

CN117197086B · Active · Publication Date: 2026-06-30 · INDUSTRIAL AND COMMERCIAL BANK OF CHINA


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: INDUSTRIAL AND COMMERCIAL BANK OF CHINA
Filing Date: 2023-09-11
Publication Date: 2026-06-30

Application Information

Patent Timeline

11 Sep 2023: Application
30 Jun 2026: Publication (CN117197086B)

IPC: G06T7/00; G06V10/40; G06V10/764; G06V10/82; G06N3/045; G06N3/0464; G06N3/08

AI Tagging

Technology Topics

Feature extraction; Image detection


AI Technical Summary

Technical Problem

In existing technologies, increasing the depth of neural networks to improve the accuracy of image tampering detection is not very effective.

Method used

A feature extraction network, including an attention convolutional module and pooling layers, is used to extract features from images. This is combined with a classification network for classification processing to improve detection accuracy.

Benefits of technology

By introducing an attention convolutional module and a pooling layer into the feature extraction network, image features can be reflected more accurately, thus improving the accuracy of image tampering detection.

✦ Generated by Eureka AI based on patent content.

Figure CN117197086B_ABST

Abstract

This application relates to an image detection method, apparatus, computer device, and storage medium, which can be applied in the field of artificial intelligence technology. The method acquires an image to be detected and extracts features from it using a feature extraction network to obtain a target feature map; the feature extraction network includes an attention convolutional module and pooling layers. A classification network then classifies the target feature map to obtain a detection result indicating whether the image to be detected has been tampered with. Because features are extracted through attention convolutional modules and pooling layers, the resulting target feature map reflects the image's features more accurately, which improves the accuracy of image tampering detection.


Description

Technical Field

[0001] This application relates to the field of artificial intelligence technology, and in particular to an image detection method, apparatus, computer device, and storage medium.

Background Technology

[0002] Currently, many transaction and business scenarios utilize a large amount of image data. If this image data is tampered with, it can significantly impact the security of transactions or business operations. With continuous technological advancements, image processing technology has become increasingly sophisticated, leading to the emergence of numerous image editing software programs. These programs possess powerful image editing capabilities, allowing for the manipulation of images to a level that is often indistinguishable from the real thing to the naked eye. Therefore, image tampering detection is necessary.

[0003] Current technologies typically use detection models to detect image tampering and try to improve accuracy by continually increasing the depth of the neural network. However, increasing the network depth does not significantly improve detection accuracy.

Summary of the Invention

[0004] Therefore, it is necessary to provide an image detection method, apparatus, computer equipment, and storage medium to address the aforementioned technical problems and improve the accuracy of image tampering detection.

[0005] Firstly, this application provides an image detection method. The method includes:

[0006] Acquire the image to be detected;

[0007] The feature extraction network extracts features from the image to be detected to obtain the target feature map of the image to be detected; wherein the feature extraction network includes an attention convolutional module and a pooling layer;

[0008] The target feature map is classified using a classification network to obtain the detection result of the image to be detected; wherein the detection result indicates that the image to be detected has been tampered with or has not been tampered with.

[0009] In one embodiment, the feature extraction network includes at least two convolutional pooling layers connected end-to-end, each convolutional pooling layer including an attention convolutional module and a pooling layer.

[0010] In one embodiment, the step of extracting features from the image to be detected using a feature extraction network to obtain a target feature map of the image to be detected includes:

[0011] Feature extraction is performed on the input information of each convolutional pooling layer, and the output of the last convolutional pooling layer is used as the target feature map of the image to be detected. The input information of the first convolutional pooling layer is the image to be detected, and the input information of any other convolutional pooling layer is the output of the preceding convolutional pooling layer.

[0012] In one embodiment, performing feature extraction on the input information through each convolutional pooling layer includes:

[0013] For each convolutional pooling layer, the attention convolutional module contained in that layer extracts features from the layer's input information to obtain a basic feature map;

[0014] The basic feature map is compressed by the pooling layer contained in the convolutional pooling layer to obtain the output result of the convolutional pooling layer.
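The chaining in [0011] to [0014] can be illustrated with a rough pure-Python sketch on a single-channel feature map. It is a minimal sketch under stated assumptions, not the patent's implementation:

- The attention convolutional module is replaced by a plain zero-padded 3×3 convolution (`conv3x3_same`, a hypothetical stand-in).
- The pooling layer is assumed to be 2×2 max pooling with stride 2.
- All function names are illustrative.

```python
from typing import Callable, List

Matrix = List[List[float]]  # single-channel feature map, H x W


def conv3x3_same(x: Matrix, kernel: Matrix) -> Matrix:
    """3x3 cross-correlation with zero padding, so H and W are preserved."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(-1, 2):
                for dj in range(-1, 2):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:  # zero padding outside the map
                        acc += x[ii][jj] * kernel[di + 1][dj + 1]
            out[i][j] = acc
    return out


def max_pool2x2(x: Matrix) -> Matrix:
    """2x2 max pooling, stride 2: halves H and W (an odd last row/column is dropped)."""
    return [
        [max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
         for j in range(0, len(x[0]) - 1, 2)]
        for i in range(0, len(x) - 1, 2)
    ]


def conv_pool_layer(x: Matrix, extract: Callable[[Matrix], Matrix]) -> Matrix:
    basic = extract(x)         # stand-in for the attention convolutional module -> basic feature map
    return max_pool2x2(basic)  # pooling layer compresses the basic feature map


def feature_extraction(image: Matrix, extractors: List[Callable[[Matrix], Matrix]]) -> Matrix:
    x = image                   # the first layer receives the image to be detected
    for extract in extractors:  # each later layer receives the previous layer's output
        x = conv_pool_layer(x, extract)
    return x                    # output of the last layer = target feature map
```

With an identity kernel, two such layers reduce an 8×8 image whose pixel (i, j) holds `8 * i + j` to a 2×2 map. Each entry of that map is the maximum of a 4×4 block: `[[27.0, 31.0], [59.0, 63.0]]`.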

[0015] In one embodiment, the attention convolution module includes a first convolution module, a second convolution module, a convolutional attention module, and a joint feature convolution module; the input information includes first input information and second input information; and the basic feature map includes a first basic feature map and a second basic feature map.

[0016] The attention convolution module included in the convolutional pooling layer extracts features from the input information of the convolutional pooling layer to obtain a basic feature map, including:

[0017] Features are extracted from the first input information of the convolutional pooling layer using the first convolutional module and the convolutional attention module to obtain the first basic feature map.

[0018] The second convolution module extracts features from the second input information of the convolutional pooling layer to obtain a first intermediate feature map.

[0019] The first basic feature map and the first intermediate feature map are fused using the joint feature convolution module to obtain the second basic feature map.

[0020] In one embodiment, the step of extracting features from the first input information of the convolutional pooling layer through the first convolutional module and the convolutional attention module to obtain a first basic feature map includes:

[0021] The first convolution module extracts features from the first input information of the convolutional pooling layer to obtain a second intermediate feature map.

[0022] The convolutional attention module extracts channel and spatial features from the second intermediate feature map to obtain the first basic feature map.

[0023] Secondly, this application also provides an image detection apparatus. The apparatus includes:

[0024] The acquisition module is used to acquire the image to be detected;

[0025] An extraction module is used to extract features from the image to be detected through a feature extraction network to obtain a target feature map of the image to be detected; wherein, the feature extraction network includes an attention convolution module and a pooling layer;

[0026] The classification module is used to classify the target feature map through a classification network to obtain the detection result of the image to be detected; wherein the detection result is that the image to be detected has been tampered with or the image to be detected has not been tampered with.

[0027] Thirdly, this application also provides a computer device. The computer device includes a memory and a processor, the memory storing a computer program, and the processor executing the computer program to perform the following steps:

[0028] Acquire the image to be detected;

[0029] The feature extraction network extracts features from the image to be detected to obtain the target feature map of the image to be detected; wherein the feature extraction network includes an attention convolutional module and a pooling layer;

[0030] The target feature map is classified using a classification network to obtain the detection result of the image to be detected; wherein the detection result indicates that the image to be detected has been tampered with or has not been tampered with.

[0031] Fourthly, this application also provides a computer-readable storage medium. The computer-readable storage medium stores a computer program thereon, which, when executed by a processor, performs the following steps:

[0032] Acquire the image to be detected;

[0033] The feature extraction network extracts features from the image to be detected to obtain the target feature map of the image to be detected; wherein the feature extraction network includes an attention convolutional module and a pooling layer;

[0034] The target feature map is classified using a classification network to obtain the detection result of the image to be detected; wherein the detection result indicates that the image to be detected has been tampered with or has not been tampered with.

[0035] Fifthly, this application also provides a computer program product. The computer program product includes a computer program that, when executed by a processor, performs the following steps:

[0036] Acquire the image to be detected;

[0037] The feature extraction network extracts features from the image to be detected to obtain the target feature map of the image to be detected; wherein the feature extraction network includes an attention convolutional module and a pooling layer;

[0038] The target feature map is classified using a classification network to obtain the detection result of the image to be detected; wherein the detection result indicates that the image to be detected has been tampered with or has not been tampered with.

[0039] The aforementioned image detection method, apparatus, computer device, and storage medium introduce a feature extraction network containing attention convolution modules and pooling layers to extract features from the image to be detected. As a result, the target feature map of the image to be detected reflects the features of the image more accurately. A classification network then classifies the target feature map and produces a more accurate result. This allows the method to determine accurately whether the image to be detected has been tampered with, which improves the accuracy of image tampering detection.

Attached Figure Description

[0040] Figure 1 is an application environment diagram of the image detection method in one embodiment;

[0041] Figure 2 is a flowchart of an image detection method in one embodiment;

[0042] Figure 3 is a schematic diagram of the feature extraction network structure in one embodiment;

[0043] Figure 4 is a schematic diagram of the attention convolution module in one embodiment;

[0044] Figure 5 is a schematic diagram of the process for obtaining a basic feature map in one embodiment;

[0045] Figure 6 is a schematic structural diagram of the convolutional attention module in one embodiment;

[0046] Figure 7 is a schematic structural diagram of an image detection model in one embodiment;

[0047] Figure 8 is a schematic structural diagram of an image detection apparatus in one embodiment;

[0048] Figure 9 is an internal structural diagram of a computer device in one embodiment.

Detailed Implementation

[0049] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.

[0050] The image detection method provided in this application applies to scenarios that require detecting whether an image has been tampered with. The method can be executed by a server, by a terminal with sufficient computing capability, or through interaction between the terminal and the server. For example, Figure 1 is an application environment diagram of the image detection method provided in this embodiment. The terminal 102 can send an image to be detected to the server 104 via a network. The server 104 may integrate an image detection model that includes a feature extraction network and a classification network, and can use these two networks to detect tampering in the image. Optionally, the terminal 102 can be, but is not limited to, a personal computer, laptop, smartphone, tablet, IoT device, or portable wearable device. IoT devices can include smart speakers, smart TVs, smart air conditioners, smart in-vehicle devices, etc. Portable wearable devices can include smartwatches, smart bracelets, head-mounted devices, etc. The server 104 can be implemented as a standalone server or as a server cluster consisting of multiple servers.

[0051] In one embodiment, as shown in Figure 2, an image detection method is provided. Taking the application of the method to the server 104 as an example, the method includes the following steps:

[0052] S201: Obtain the image to be detected.

[0053] In this embodiment of the application, the image to be detected is an image that is to be checked for tampering.

[0054] Optionally, the image to be detected can be acquired in many ways, and the embodiments of this application do not limit this. In one possible implementation, when the terminal 102 initiates an image tampering detection request to the server 104, the terminal 102 sends the image to be detected to the server 104. In another possible implementation, when the server 104 detects an image tampering detection request, it acquires the image to be detected from a specified storage path, and so on.

[0055] S202: Extract features from the image to be detected through the feature extraction network to obtain the target feature map of the image to be detected.

[0056] Optionally, a feature extraction network is used to extract feature information from the image to be detected. Further, the feature extraction network includes attention convolutional modules and pooling layers.

[0057] For example, after the image to be detected is obtained, the attention convolution module in the feature extraction network can extract features from it. The pooling layer in the feature extraction network can then compress the extracted features to obtain the target feature map of the image to be detected.

[0058] S203: Classify the target feature map through a classification network to obtain the detection result of the image to be detected.

[0059] Optionally, the classification network outputs, based on the input feature information, the classification result corresponding to that feature information.

[0060] For example, a classification network can be used to classify the target feature map to obtain the detection result of the image to be detected; further, the detection result can indicate whether the image to be detected has been tampered with or not. That is, the detection result is used to indicate whether the image to be detected is a tampered image or an untampered image.
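The classification step might be sketched as follows. This is a minimal sketch, and its structure is an assumption suggested by the fully connected layer in [0089] and the predicted probability y' in [0095]: the classifier flattens the target feature map, applies one fully connected unit and a sigmoid, and thresholds the resulting tampering probability. The weights, the bias and the 0.5 threshold are hypothetical parameters chosen for illustration, not values from the patent.

```python
import math
from typing import List, Tuple

FeatureMap = List[List[List[float]]]  # C x H x W target feature map


def classify(feature_map: FeatureMap, weights: List[float], bias: float,
             threshold: float = 0.5) -> Tuple[float, str]:
    """Flatten -> one fully connected unit -> sigmoid -> threshold (all hypothetical)."""
    flat = [v for channel in feature_map for row in channel for v in row]
    if len(flat) != len(weights):
        raise ValueError("weights must match the flattened feature-map size")
    logit = sum(w * v for w, v in zip(weights, flat)) + bias
    p_tampered = 1.0 / (1.0 + math.exp(-logit))  # plays the role of the predicted probability y'
    label = "tampered" if p_tampered >= threshold else "untampered"
    return p_tampered, label
```

In a trained model, the weights and bias would come from the training procedure described later in the embodiment. The threshold only converts the probability into the binary result "tampered" or "untampered".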

[0061] The image detection method described above introduces a feature extraction network containing attention convolutional modules and pooling layers to extract features from the image to be detected, so that the target feature map reflects the features of the image more accurately. The classification network then classifies the target feature map. This determines accurately whether the image to be detected has been tampered with and thereby improves the accuracy of image tampering detection.

[0062] In one embodiment, refer to Figure 3, which is a schematic diagram of a feature extraction network provided in an embodiment of this application. The feature extraction network includes at least two convolutional pooling layers connected end-to-end, and each convolutional pooling layer includes an attention convolutional module and a pooling layer. In this way, the convolutional pooling layers in the feature extraction network perform layer-by-layer feature extraction on the image to be detected. This makes the target feature map of the image more reflective of the image's features and improves the accuracy of the detection results.

[0063] Based on the embodiment shown in Figure 3, the embodiments of this application provide an implementation for obtaining the target feature map of the image to be detected. Specifically, features are extracted from the input information of each convolutional pooling layer, and the output of the last convolutional pooling layer is used as the target feature map of the image to be detected. The input information of the first convolutional pooling layer is the image to be detected, and the input information of any other convolutional pooling layer is the output of the previous convolutional pooling layer.

[0064] The following section will provide a detailed explanation using a feature extraction network with two convolutional pooling layers as an example.

[0065] The first convolutional pooling layer uses an attention convolutional module to extract features from the input image to be detected. The extracted features are then compressed by the pooling layer within the first convolutional pooling layer. This compressed feature information is used as the output of the first convolutional pooling layer and as the input to the second convolutional pooling layer. The attention convolutional module in the second convolutional pooling layer extracts features from the output of the first convolutional pooling layer and compresses the extracted features by the pooling layer, yielding the output. Since the second convolutional pooling layer is the last one in this embodiment, its output is used as the target feature map of the image to be detected.

[0066] In this embodiment, the convolutional pooling layers in the feature extraction network perform layer-by-layer feature extraction on the image to be detected, so that the target feature map of the image to be detected can better reflect the features of the image, thereby improving the accuracy of the detection result of the image to be detected.

[0067] In one embodiment, the convolutional pooling layer in this application embodiment may consist of an attention convolutional module and a pooling layer. When extracting features from the input information through each convolutional pooling layer, for each convolutional pooling layer, the attention convolutional module contained in that convolutional pooling layer can be used to extract features from the input information of that convolutional pooling layer to obtain a basic feature map; then, the pooling layer contained in that convolutional pooling layer can be used to compress the basic feature map to obtain the output result of that convolutional pooling layer.

[0068] Optionally, in combination with the embodiment shown in Figure 3, suppose the feature extraction network includes two consecutive convolutional pooling layers. First, the attention convolutional module in the first convolutional pooling layer extracts features from that layer's input information to obtain a basic feature map. The pooling layer in the first convolutional pooling layer then compresses this basic feature map to obtain the output of the first convolutional pooling layer. That output is input into the second convolutional pooling layer, whose attention convolutional module extracts features from it to obtain another basic feature map. Finally, the pooling layer in the second convolutional pooling layer compresses this basic feature map to obtain the output of the second convolutional pooling layer.

[0069] In this embodiment, the image features are extracted layer by layer by the attention convolution module in each convolution pooling layer, so that the target feature map of the image to be detected can better reflect the features of the image, thereby improving the accuracy of the detection result of the image to be detected.

[0070] To improve the accuracy of image detection results, the structure of the feature extraction network is further refined. In one embodiment, refer to Figure 4, which is a schematic structural diagram of an attention convolution module according to an embodiment of this application. The attention convolution module in this embodiment can be composed of a first convolution module, a second convolution module, a convolutional attention module, and a joint feature convolution module. The connection relationships between these modules are shown in Figure 4.

[0071] For example, the first convolutional module can be the VGG convolutional module of the VGG-16 model, with historical empirical parameters as its kernel parameters. The second convolutional module can be a regular convolutional module whose kernel parameters are randomly initialized. The kernel parameters of the convolutional attention module and the joint feature convolutional module can likewise be randomly initialized.

[0072] Optionally, in this embodiment, the input information fed into the convolutional pooling layer includes first input information and second input information, and the obtained basic feature map includes a first basic feature map and a second basic feature map. Further, based on Figure 4 and with reference to Figure 5, consider any convolutional pooling layer whose attention convolutional module extracts features from the layer's input information to obtain the basic feature map. This extraction can include the following steps:

[0073] S501: Extract features from the first input information of the convolutional pooling layer through the first convolution module and the convolutional attention module to obtain the first basic feature map.

[0074] S502: Extract features from the second input information of the convolutional pooling layer through the second convolution module to obtain the first intermediate feature map.

[0075] S503: Fuse the first basic feature map and the first intermediate feature map through the joint feature convolution module to obtain the second basic feature map.

[0076] For example, the first convolutional module and the convolutional attention module in the convolutional pooling layer can extract features from the layer's first input information to obtain a first basic feature map. At the same time, the second convolutional module in that layer can extract features from the layer's second input information to obtain a first intermediate feature map. The first basic feature map and the first intermediate feature map are then input into the joint feature convolution module, which fuses them to obtain the second basic feature map.

[0077] It should be noted that each convolutional pooling layer includes one attention convolutional module, so the number of convolutional pooling layers equals the number of attention convolutional modules. The attention convolutional module in the first convolutional pooling layer is the first attention convolutional module. Its first and second input information are the same: both are the image to be detected. For each other convolutional pooling layer, the first and second input information of its attention convolutional module differ:
- The first input information is obtained by taking the first basic feature map produced by the previous attention convolutional module and compressing it with the previous layer's pooling layer.
- The second input information is obtained by taking the second basic feature map produced by the previous attention convolutional module and compressing it with the same pooling layer.
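The wiring described in [0072] to [0077] can be traced with a small data-flow sketch. The four sub-modules and the pooling layer are passed in as plain callables, which is a hypothetical interface rather than the patent's. The example wires them with string-building functions so that the composition order is visible. The sketch returns the pooled outputs of both branches of the last layer, because the text does not state which of them forms the target feature map.

```python
from typing import Any, Callable, List, Tuple

Fn = Callable[[Any], Any]


class AttentionConvModule:
    """Two-branch attention convolution module; only the data flow is modelled."""

    def __init__(self, conv1: Fn, cbam: Fn, conv2: Fn, joint: Callable[[Any, Any], Any]):
        self.conv1, self.cbam, self.conv2, self.joint = conv1, cbam, conv2, joint

    def __call__(self, first_in: Any, second_in: Any) -> Tuple[Any, Any]:
        first_basic = self.cbam(self.conv1(first_in))               # S501: branch 1
        first_intermediate = self.conv2(second_in)                   # S502: branch 2
        second_basic = self.joint(first_basic, first_intermediate)   # S503: fusion
        return first_basic, second_basic


def extract(image: Any, modules: List[AttentionConvModule], pool: Fn) -> Tuple[Any, Any]:
    first_in = second_in = image  # first layer: both inputs are the image to be detected
    for module in modules:
        first_basic, second_basic = module(first_in, second_in)
        # Next layer's inputs: each branch's basic feature map after this layer's pooling.
        first_in, second_in = pool(first_basic), pool(second_basic)
    return first_in, second_in    # pooled outputs of the last layer's two branches


# Trace the wiring with string-building stand-ins for the real modules.
trace = AttentionConvModule(
    conv1=lambda x: f"c1({x})",
    cbam=lambda x: f"att({x})",
    conv2=lambda x: f"c2({x})",
    joint=lambda a, b: f"joint({a},{b})",
)
print(extract("img", [trace], lambda x: f"pool({x})"))
# -> ('pool(att(c1(img)))', 'pool(joint(att(c1(img)),c2(img)))')
```

Passing the same `trace` object twice in `modules` shows how the second layer consumes the two pooled maps of the first layer separately.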

[0078] In this embodiment, image features are extracted layer by layer through the attention convolution modules in the convolutional pooling layers. Each attention convolution module has two branches. The first branch obtains a first basic feature map through the first convolution module and the convolutional attention module. The second branch obtains a first intermediate feature map through the second convolution module. The joint feature convolution module then fuses the first basic feature map and the first intermediate feature map to obtain a second basic feature map. In this way, the first and second basic feature maps better reflect the features of the image, which improves the accuracy of the detection results for the image to be detected.

[0079] In one embodiment, to improve the accuracy of image detection results, the structure of the convolutional attention module is further refined. Refer to Figure 6, which is a schematic diagram of a convolutional attention module provided in an embodiment of this application. The convolutional attention module in this embodiment can be composed of a channel attention module and a spatial attention module.

[0080] Optionally, the channel attention module can use a 3D template with a height and width of 1 and a depth equal to the number of channels in the input feature map. The spatial attention module can also use a 3D template with a height and width equal to the height and width of the input feature map and a depth of 1.

[0081] Optionally, the expression for the convolutional attention module is as follows:

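[0082]

$$F' = M_c(F) \otimes F$$

[0083]

$$F'' = M_s(F') \otimes F'$$

[0084] Where:
- $F$ represents the input feature map.
- $M_c$ represents the channel attention operation.
- $M_s$ represents the spatial attention operation.
- $\otimes$ denotes multiplication of corresponding pixels (element-wise multiplication, with each attention map broadcast along the dimensions it lacks).
- $F'$ is the channel feature map, and $F''$ is the spatial feature map.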

[0085] Optionally, when extracting features from the first input information of the convolutional pooling layer through the first convolutional module and the convolutional attention module to obtain the first basic feature map, the first convolutional module can be used to extract features from the first input information of the convolutional pooling layer to obtain the second intermediate feature map; then, the convolutional attention module can be used to extract channel features and spatial features from the second intermediate feature map to obtain the first basic feature map.

[0086] Specifically, the first convolutional module extracts features from the first input information of the convolutional pooling layer to obtain a second intermediate feature map. Then, the second intermediate feature map is input into the channel attention module, and the feature map output by the channel attention module is multiplied by the second intermediate feature map at corresponding pixels to obtain a channel feature map. Then, the channel feature map is input into the spatial attention module, and the feature map output by the spatial attention module is multiplied by the channel feature map at corresponding pixels to obtain a spatial feature map. This spatial feature map is used as the first basic feature map.

[0087] In some embodiments, the feature map can be compressed in the spatial dimension using max pooling and average pooling before it is processed by the channel attention module; this improves the efficiency of feature extraction by the channel attention module. Similarly, the feature map can be compressed in the channel dimension using max pooling and average pooling before it is processed by the spatial attention module, which improves the efficiency of feature extraction by the spatial attention module.
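A minimal pure-Python sketch of the convolutional attention module of [0079] to [0087] follows. It uses the order given in [0086]: channel attention first, then spatial attention, each applied by element-wise multiplication. The pooling compressions follow [0087]. The following details are assumptions not stated in the patent:

- The channel attention uses a shared two-layer MLP with ReLU on the average-pooled and max-pooled channel descriptors, in the style of CBAM.
- The spatial attention replaces the usual 7×7 convolution over the two pooled maps with a per-pixel weighted sum (effectively a 1×1 convolution) to keep the code short.
- All weights are hypothetical inputs.

```python
import math
from typing import List

FMap = List[List[List[float]]]  # C x H x W feature map


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def channel_attention(f: FMap, w1: List[List[float]], w2: List[List[float]]) -> List[float]:
    """M_c: one weight per channel (a C x 1 x 1 template)."""
    avg = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in f]  # spatial average pooling
    mx = [max(max(row) for row in ch) for ch in f]                           # spatial max pooling

    def mlp(v: List[float]) -> List[float]:
        # Shared two-layer MLP: C -> hidden (ReLU) -> C; w1 is hidden x C, w2 is C x hidden.
        hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    return [sigmoid(a + m) for a, m in zip(mlp(avg), mlp(mx))]


def spatial_attention(f: FMap, a: float, b: float, bias: float) -> List[List[float]]:
    """M_s: one weight per pixel (a 1 x H x W template).

    Channel-wise average and max pooling, then a per-pixel weighted sum
    (a 1x1 convolution) stands in for the usual 7x7 convolution.
    """
    c, h, w = len(f), len(f[0]), len(f[0][0])
    return [[sigmoid(a * sum(f[k][i][j] for k in range(c)) / c
                     + b * max(f[k][i][j] for k in range(c)) + bias)
             for j in range(w)] for i in range(h)]


def convolutional_attention(f: FMap, w1, w2, a: float, b: float, bias: float) -> FMap:
    mc = channel_attention(f, w1, w2)
    f1 = [[[mc[k] * v for v in row] for row in ch] for k, ch in enumerate(f)]  # F' = M_c(F) (x) F
    ms = spatial_attention(f1, a, b, bias)
    return [[[ms[i][j] * v for j, v in enumerate(row)] for i, row in enumerate(ch)]
            for ch in f1]                                                   # F'' = M_s(F') (x) F'
```

With all weights at zero, both attention maps equal sigmoid(0) = 0.5, so the output is simply the input scaled by 0.25. This is a quick sanity check that the two multiplications are applied in sequence.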

[0088] In this embodiment, the convolutional attention module contains a channel attention module and a spatial attention module, which extract features from the second intermediate feature map. The resulting first basic feature map therefore has richer feature content and better reflects the features of the image. This allows a shallower network model to obtain more accurate detection results.

[0089] In one embodiment, refer to Figure 7, which is a schematic structural diagram of an image detection model provided in an embodiment of this application. The feature extraction network in this image detection model includes five convolutional pooling layers connected end-to-end. The model contains five attention convolutional modules, each followed by a pooling layer, and a final fully connected layer summarizes the features.

This embodiment uses the VGG-16 network model as the base network. The 16 in VGG-16 refers to its 16 weight layers: 13 convolutional layers and 3 fully connected layers. The structure of VGG-16 is simple and very regular. Several convolutional layers are followed by a pooling layer that reduces the image size, and the number of filters in the convolutional layers follows a fixed pattern, doubling from 64 to 128, then to 256 and 512. Because this model extracts image content well in its shallow layers, it is often used as a basic architecture. Its main drawback, however, is the very large number of parameters that must be trained.

Therefore, in this embodiment, only the first five convolution-and-pooling stages of VGG-16 are used as the basic architecture of the downsampling module. The convolutional layers in the downsampling module are then modified by adding an attention mechanism, and the result is called the attention convolutional module.
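The sketch below makes the VGG-16 figures concrete for the standard VGG-16 convolutional configuration. It computes the feature-map shape after each of the five conv-pool stages and the parameter count of the 13 convolutional layers. It assumes "same"-padded 3×3 convolutions, 2×2 stride-2 pooling and a 224×224 RGB input. The patent does not state the input size, and the attention modifications are not modeled.

```python
from typing import List, Tuple

# Standard VGG-16 convolutional stages: (number of 3x3 conv layers, output channels).
VGG16_STAGES: List[Tuple[int, int]] = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def stage_output_shapes(height: int, width: int,
                        stages: List[Tuple[int, int]] = VGG16_STAGES) -> List[Tuple[int, int, int]]:
    """(C, H, W) after each conv-pool stage: 'same' convs keep H, W; 2x2/stride-2 pooling halves them."""
    shapes = []
    for _, channels in stages:
        height, width = height // 2, width // 2
        shapes.append((channels, height, width))
    return shapes


def conv3x3_param_count(in_channels: int,
                        stages: List[Tuple[int, int]] = VGG16_STAGES) -> int:
    """Weights plus biases of every 3x3 conv layer in the given stages."""
    total, c_in = 0, in_channels
    for n_convs, c_out in stages:
        for _ in range(n_convs):
            total += 3 * 3 * c_in * c_out + c_out
            c_in = c_out
    return total


print(stage_output_shapes(224, 224))
print(conv3x3_param_count(3))
```

For a 224×224 input, the five stages produce feature maps of 112, 56, 28, 14 and 7 pixels per side, ending at 512×7×7. The 13 convolutional layers hold 14,714,688 weights and biases. This is a small fraction of the roughly 138 million parameters of the full VGG-16, most of which sit in its fully connected layers.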

[0090] It should be understood that the images in the dataset need to be cropped and resized to a standard size before the image detection model is trained. Furthermore, to improve the robustness of the trained model, enhance its generalization ability, and avoid overfitting, offline data augmentation can be applied to the training data; that is, the image data is processed before model training to form a fixed dataset. The processing strategy is as follows:

[0091] Rotation variations: Rotate the image by 90 degrees, 180 degrees, and 270 degrees to simulate images taken from different angles.

[0092] Color jittering: Randomly adjust the image's chroma, saturation, and contrast to simulate image data under different lighting conditions.

[0093] Sharpening: Enhance the image's edges and contours to simulate images with varying degrees of sharpness.
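The three offline augmentation strategies can be sketched with NumPy as follows. This is a minimal, hypothetical implementation: the jitter strength, the box-blur unsharp mask used for sharpening, and all function names are assumptions, and the color jitter adjusts only contrast and saturation (chroma shifts are omitted for brevity).

```python
import numpy as np

def rotations(img):
    """Rotation variants: 90, 180 and 270 degrees. img is an H x W x 3 float array in [0, 1]."""
    return [np.rot90(img, k, axes=(0, 1)) for k in (1, 2, 3)]

def color_jitter(img, rng, strength=0.2):
    """Randomly scale contrast and saturation (a simplified stand-in for color jittering)."""
    contrast = 1.0 + rng.uniform(-strength, strength)
    saturation = 1.0 + rng.uniform(-strength, strength)
    mean = img.mean()
    out = (img - mean) * contrast + mean           # contrast around the global mean
    gray = out.mean(axis=2, keepdims=True)
    out = (out - gray) * saturation + gray         # saturation around each pixel's gray level
    return np.clip(out, 0.0, 1.0)

def sharpen(img, amount=1.0):
    """Unsharp masking: add back the difference between the image and a 3x3 box blur."""
    h, w = img.shape[:2]
    padded = np.pad(img, ((1, 1), (1, 1), (0, 0)), mode="edge")
    blur = sum(padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)) / 9.0
    return np.clip(img + amount * (img - blur), 0.0, 1.0)

def augment_offline(images, seed=0):
    """Expand the image list once, before training, into a fixed augmented dataset."""
    rng = np.random.default_rng(seed)
    dataset = []
    for img in images:
        dataset.append(img)
        dataset.extend(rotations(img))
        dataset.append(color_jitter(img, rng))
        dataset.append(sharpen(img))
    return dataset
```

For a single input image, `augment_offline` yields six images: the original, three rotations, one jittered copy, and one sharpened copy. Because the dataset is built once with a fixed seed, every training epoch sees the same augmented images, which is what distinguishes offline from on-the-fly augmentation.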

[0094] When the image detection model is trained, a loss function can be used to measure the difference between the detection result and the ground-truth result; the model parameters are then adjusted accordingly to improve the detection accuracy of the model.

[0095] Specifically, in this embodiment, a binary cross-entropy loss function can be used to calculate the difference. The predicted probability output by the classification function is denoted y', so the predicted probability of the other class is 1 - y'. The difference is calculated as follows:

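[0096] $$L = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log y'_i + \left(1 - y_i\right)\log\left(1 - y'_i\right)\right]$$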

[0097] where L represents the difference between the predicted results and the actual results, y' represents the predicted probability, y represents the actual result (the true label, 1 or 0), N represents the number of samples, and the subscript i indexes the samples.
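For reference, the binary cross-entropy above can be computed as in this NumPy sketch. The label encoding (1 for tampered, 0 for not tampered) and the clipping constant `eps`, which keeps the logarithms finite, are assumptions rather than details from this description.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy over N samples.

    y_true: labels y (assumed encoding: 1 = tampered, 0 = not tampered)
    y_pred: predicted probabilities y' output by the classification function
    eps: clips y' away from 0 and 1 so that log() stays finite
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))

# Two samples: a tampered image predicted at 0.9 and an untampered one at 0.2.
loss = binary_cross_entropy([1, 0], [0.9, 0.2])
```

Here the loss equals -(log 0.9 + log 0.8) / 2, and it approaches zero as each prediction approaches its label.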

[0098] After the difference value is obtained, the parameters are adjusted by gradient descent, using gradients computed with the backpropagation algorithm, so as to reduce the difference value. The parameter update formula is as follows:

[0099] $$w' = w - \alpha \frac{\partial L}{\partial w}$$

[0100] where w' represents the adjusted parameters, α represents the learning rate, and w represents the parameters of the current detection model. The derivative of L with respect to w is as follows:

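[0101] $$\frac{\partial L}{\partial w} = \frac{\partial L}{\partial y'} \cdot \frac{\partial y'}{\partial s} \cdot \frac{\partial s}{\partial w}$$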

[0102] Where L represents the difference value, s represents the activation function, and y' represents the prediction probability.

[0103] In this way, the image detection model replaces its current parameters w with the adjusted parameters w', and repeating this process over the training data improves the detection accuracy of the image detection model.
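To make the update rule concrete, the following sketch applies w' = w - α·∂L/∂w to a single sigmoid output unit. It assumes, for illustration only, that s = x·w + b is the input to the Sigmoid classification function and y' = σ(s); under the mean binary cross-entropy loss the chain rule then gives ∂L/∂s = (y' - y)/N for each sample. In the full model, backpropagation would continue from s back through the attention convolution modules; that part, the toy data, and all names here are assumptions.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def bce(y, y_pred, eps=1e-12):
    """Mean binary cross-entropy, clipped so log() stays finite."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y * np.log(y_pred) + (1.0 - y) * np.log(1.0 - y_pred))

def gradient_step(w, b, x, y, alpha):
    """One update w' = w - alpha * dL/dw for a sigmoid output unit.

    x: (N, D) features (e.g. flattened target feature maps), y: (N,) labels.
    With s = x @ w + b and y' = sigmoid(s), the chain rule gives
    dL/ds = (y' - y) / N for the mean BCE loss, so dL/dw = x.T @ dL/ds.
    """
    y_pred = sigmoid(x @ w + b)
    dl_ds = (y_pred - y) / len(y)
    grad_w = x.T @ dl_ds
    grad_b = dl_ds.sum()
    return w - alpha * grad_w, b - alpha * grad_b

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 4))
y = (x[:, 0] > 0).astype(float)        # toy labels, for illustration only
w, b = np.zeros(4), 0.0
before = bce(y, sigmoid(x @ w + b))   # log(2) when all predictions are 0.5
for _ in range(100):
    w, b = gradient_step(w, b, x, y, alpha=0.5)
after = bce(y, sigmoid(x @ w + b))
```

With a sufficiently small learning rate each step lowers the loss, which is the effect described in [0103].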

[0104] The image detection method provided in the embodiments of this application is described below with reference to Figure 7. The method specifically includes the following steps:

[0105] Step 1: Obtain the image to be detected.

[0106] Optionally, the image to be detected can be obtained in many ways, which are not limited in the embodiments of this application. For example, in one possible implementation, when the terminal 102 initiates an image tampering detection request to the server 104, the terminal 102 sends the image to be detected to the server 104; in another possible implementation, when the server 104 detects an image tampering detection request, it obtains the image to be detected from a specified storage path.

[0107] Step 2: Input the image to be detected into attention convolution module 1 to obtain the first basic feature map and the second basic feature map.

[0108] The image to be detected is input into the attention convolution module 1 of the image detection model. The attention convolution module 1 includes a first convolution module, a second convolution module, a convolutional attention module, and a joint feature convolution module. The first convolution module in the attention convolution module 1 extracts features from the input image to obtain a second intermediate feature map. Then, the second intermediate feature map is input into the channel attention module in the convolutional attention module, and the feature map output by the channel attention module is multiplied by the second intermediate feature map at corresponding pixels to obtain a channel feature map. The channel feature map is then input into the spatial attention module in the convolutional attention module, and the feature map output by the spatial attention module is multiplied by the channel feature map at corresponding pixels to obtain a spatial feature map, which is used as the first basic feature map.

[0109] Similarly, the second convolution module extracts features from the image to be detected that is input into attention convolution module 1 to obtain the first intermediate feature map; the joint feature convolution module then fuses the first basic feature map and the first intermediate feature map to obtain the second basic feature map.
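The data flow of one attention convolution module can be sketched in NumPy as follows. The channel and spatial attention follow the widely used CBAM design (a shared MLP over average- and max-pooled channel descriptors, and a convolution over channel-wise average and max maps); this description does not specify those internals, so they, the 3x3 spatial kernel, the 1x1 convolution used as the joint feature convolution module after concatenation, and the random weights are all assumptions.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv3x3(x, w):
    """'Same'-padded 3x3 convolution (cross-correlation). x: (C_in, H, W), w: (C_out, C_in, 3, 3)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for dy in range(3):
        for dx in range(3):
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], xp[:, dy:dy + h, dx:dx + wd])
    return out

def channel_attention(f, w1, w2):
    """CBAM-style channel weights, shape (C, 1, 1): shared MLP on avg- and max-pooled channels."""
    avg, mx = f.mean(axis=(1, 2)), f.max(axis=(1, 2))
    return sigmoid(w2 @ relu(w1 @ avg) + w2 @ relu(w1 @ mx))[:, None, None]

def spatial_attention(f, w):
    """CBAM-style spatial weights, shape (1, H, W): conv over channel-wise avg and max maps."""
    pooled = np.stack([f.mean(axis=0), f.max(axis=0)])        # (2, H, W)
    return sigmoid(conv3x3(pooled, w))                         # w: (1, 2, 3, 3)

def attention_conv_module(x1, x2, p):
    """Return (first basic feature map, second basic feature map)."""
    inter2 = relu(conv3x3(x1, p["conv1"]))                     # second intermediate feature map
    channel_map = inter2 * channel_attention(inter2, p["mlp1"], p["mlp2"])  # element-wise product
    basic1 = channel_map * spatial_attention(channel_map, p["spatial"])     # spatial feature map
    inter1 = relu(conv3x3(x2, p["conv2"]))                     # first intermediate feature map
    fused = np.concatenate([basic1, inter1], axis=0)           # (2C, H, W)
    basic2 = relu(np.einsum("oc,chw->ohw", p["joint"], fused))  # 1x1 joint feature convolution
    return basic1, basic2

def init_params(c_in, c_out, reduction=2, seed=0):
    """Random weights for illustration; a trained model would learn these."""
    rng = np.random.default_rng(seed)
    return {
        "conv1": rng.standard_normal((c_out, c_in, 3, 3)) * 0.1,
        "conv2": rng.standard_normal((c_out, c_in, 3, 3)) * 0.1,
        "mlp1": rng.standard_normal((c_out // reduction, c_out)) * 0.1,
        "mlp2": rng.standard_normal((c_out, c_out // reduction)) * 0.1,
        "spatial": rng.standard_normal((1, 2, 3, 3)) * 0.1,
        "joint": rng.standard_normal((c_out, 2 * c_out)) * 0.1,
    }

image = np.random.default_rng(1).random((3, 16, 16))   # toy 3-channel input
basic1, basic2 = attention_conv_module(image, image, init_params(3, 8))
```

In attention convolution module 1 both inputs are the image to be detected, which is why `image` is passed twice; in later modules the two inputs would be the two pooled basic feature maps from the preceding layer.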

[0110] Step 3: Input the first basic feature map and the second basic feature map into pooling layer 1.

[0111] The first and second basic feature maps are input into pooling layer 1. Pooling layer 1 compresses the first and second basic feature maps so that they can be processed by the next convolutional pooling layer for feature extraction.

[0112] Step 4: The compressed result of the first basic feature map output by pooling layer 1 is input into attention convolution module 2, and so is the compressed result of the second basic feature map. Attention convolution module 2 likewise includes a first convolution module, a second convolution module, a convolutional attention module, and a joint feature convolution module. The first convolution module in attention convolution module 2 extracts features from the compressed result of the first basic feature map to obtain a third intermediate feature map. The third intermediate feature map is then input into the channel attention module in the convolutional attention module, and the feature map output by the channel attention module is multiplied by the third intermediate feature map at corresponding pixels to obtain a channel feature map. The channel feature map is then input into the spatial attention module in the convolutional attention module, and the feature map output by the spatial attention module is multiplied by the channel feature map at corresponding pixels to obtain a spatial feature map, which is used as the third basic feature map.

[0113] Similarly, the second convolution module in attention convolution module 2 extracts features from the compressed result of the second basic feature map output by pooling layer 1 to obtain the fourth intermediate feature map; the terminal then fuses the third basic feature map and the fourth intermediate feature map through the joint feature convolution module to obtain the fourth basic feature map.

[0114] Step 5: Input the third and fourth basic feature maps into pooling layer 2.

[0115] The terminal inputs the third and fourth basic feature maps into pooling layer 2. Pooling layer 2 compresses the third and fourth basic feature maps so that they can be processed by the next convolutional pooling layer for feature extraction.

[0116] Step 6: In the same way, the subsequent three convolutional pooling layers perform feature extraction on the output of pooling layer 2 to obtain the target feature map.

[0117] The feature extraction operations of the latter three convolutional pooling layers in this embodiment are the same as the operation logic of the first two convolutional pooling layers described in the above embodiment. Therefore, the specific operations of the latter three convolutional pooling layers will not be described in detail here.
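The chaining of convolutional pooling layers in Steps 2 to 6 can be sketched as below, using the 2x2 max pooling given as an example in [0121]. Each layer's two basic feature maps are pooled separately and passed to the matching inputs of the next layer. `make_toy_stage` is a hypothetical stand-in for an attention convolution module (1x1 convolutions, with summation as the fusion), and the channel widths and 64x64 input are assumptions. This description does not say which of the two final maps forms the target feature map, so the sketch returns both.

```python
import numpy as np

def max_pool2x2(f):
    """2x2 max pooling with stride 2 on a (C, H, W) map; odd trailing rows/cols are dropped."""
    c, h, w = f.shape
    f = f[:, : h // 2 * 2, : w // 2 * 2]
    return f.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def make_toy_stage(c_in, c_out, seed):
    """Hypothetical stand-in for an attention convolution module (1x1 convolutions only)."""
    rng = np.random.default_rng(seed)
    w1 = rng.standard_normal((c_out, c_in))
    w2 = rng.standard_normal((c_out, c_in))

    def stage(x1, x2):
        basic1 = np.maximum(np.einsum("oc,chw->ohw", w1, x1), 0.0)
        inter1 = np.maximum(np.einsum("oc,chw->ohw", w2, x2), 0.0)
        basic2 = basic1 + inter1          # crude stand-in for the joint feature fusion
        return basic1, basic2

    return stage

def extract_features(image, stages):
    """Chain convolutional pooling layers; the image feeds both inputs of the first layer."""
    x1 = x2 = image
    for stage in stages:
        basic1, basic2 = stage(x1, x2)
        # Each basic feature map is compressed, then fed to the matching input of the next layer.
        x1, x2 = max_pool2x2(basic1), max_pool2x2(basic2)
    return x1, x2   # outputs of the last convolutional pooling layer

channels = [3, 64, 128, 256, 512, 512]   # VGG-16-style widths, assumed
stages = [make_toy_stage(channels[i], channels[i + 1], seed=i) for i in range(5)]
out1, out2 = extract_features(np.random.default_rng(0).random((3, 64, 64)), stages)
```

After five layers the 64x64 toy input is reduced to 2x2 maps with 512 channels; replacing `make_toy_stage` with a real attention convolution module would not change this control flow.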

[0118] Step 7: Classify the target feature map using a classification network to obtain the detection result of the image to be detected.

[0119] The classification function of the classification network can be the Sigmoid function. The classification network classifies the target feature map to obtain the detection result of the image to be detected. This detection result is used to indicate whether the image to be detected has been tampered with.

[0120] Optionally, the detection results for the image to be detected include two categories: image tampered with and image not tampered with. A classification network is used to classify the target feature map to obtain the detection results for the image to be detected. Specifically, a classification function can be set to output the predicted probability corresponding to the classification result. Then, it is determined whether the predicted probability is greater than a preset probability threshold, and the detection result is output based on the comparison result. For example, the classification function can be set to output the predicted probability corresponding to image tampering, and the prediction probability threshold can be set to 95%. Thus, when the predicted probability corresponding to image tampering output by the classification function is greater than 95%, the detection result is that the image has been tampered with; otherwise, the detection result is that the image has not been tampered with.
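The thresholding rule in the example above can be sketched as follows: a fully connected layer reduces the flattened target feature map to one score, the Sigmoid function turns it into the predicted probability of tampering, and that probability is compared with the preset threshold (95% in the example). The single-output layer and the names `classify`, `weights`, and `bias` are assumptions for illustration.

```python
import numpy as np

def classify(target_map, weights, bias, threshold=0.95):
    """Fully connected layer + Sigmoid, then compare with a preset probability threshold."""
    s = float(np.dot(target_map.ravel(), weights) + bias)   # summarize features into one score
    p_tampered = 1.0 / (1.0 + np.exp(-s))                    # Sigmoid classification function
    label = "tampered" if p_tampered > threshold else "not tampered"
    return label, p_tampered

label, p = classify(np.ones((2, 2, 2)), np.full(8, 0.5), 0.0)
```

In this usage the score is 4, so the predicted probability is sigmoid(4) ≈ 0.982, which exceeds 0.95 and yields "tampered"; a probability of exactly 0.95 would yield "not tampered", because the comparison is strictly "greater than".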

[0121] For example, pooling layers 1-5 in the above embodiments can be 2x2 max pooling layers.

[0122] The specific processes of Steps 1 through 7 above can be found in the description of the above method embodiments. Their implementation principles and technical effects are similar and will not be repeated here.

[0123] It should be understood that although the steps in the flowcharts of the embodiments described above are shown sequentially according to the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowcharts of the embodiments described above may include multiple steps or multiple stages. These steps or stages are not necessarily completed at the same time, but can be executed at different times. The execution order of these steps or stages is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the steps or stages of other steps.

[0124] Based on the same inventive concept, this application also provides an image detection apparatus for implementing the image detection method described above. The solution provided by this apparatus is similar to the implementation described in the above method; therefore, the specific limitations in one or more image detection apparatus embodiments provided below can be found in the limitations of the image detection method described above, and will not be repeated here.

[0125] In one embodiment, as shown in Figure 8, an image detection device 1 is provided, comprising:

[0126] Acquisition module 10 is used to acquire the image to be detected;

[0127] The extraction module 20 is used to extract features from the image to be detected through a feature extraction network to obtain the target feature map of the image to be detected; wherein, the feature extraction network includes an attention convolution module and a pooling layer;

[0128] The classification module 30 is used to classify the target feature map through a classification network to obtain the detection result of the image to be detected, wherein the detection result indicates that the image to be detected has been tampered with or has not been tampered with.

[0129] The aforementioned image detection device, by introducing a feature extraction network including attention convolutional modules and pooling layers, extracts features from the image to be detected. This results in a target feature map that more accurately reflects the image's characteristics. Furthermore, a classification network performs more precise classification on the target feature map, thus accurately determining whether the image has been tampered with, thereby improving the accuracy of image tampering detection.

[0130] In one embodiment, the feature extraction network includes at least two convolutional pooling layers connected end-to-end, each convolutional pooling layer including an attention convolutional module and a pooling layer.

[0131] In one embodiment, the extraction module 20 is specifically used for:

[0132] Feature extraction is performed on the input information of each convolutional pooling layer, and the output of the last convolutional pooling layer is used as the target feature map of the image to be detected. The input information of the first convolutional pooling layer is the image to be detected, and the input information of any other convolutional pooling layer is the output of the convolutional pooling layer preceding it.

[0133] In one embodiment, the extraction module 20 specifically includes an extraction unit and a compression unit;

[0134] The extraction unit is used to extract features from the input information of each convolutional pooling layer through the attention convolutional module contained in that convolutional pooling layer to obtain a basic feature map.

[0135] The compression unit is used to compress the basic feature map through the pooling layer contained in the convolutional pooling layer to obtain the output result of the convolutional pooling layer.

[0136] In one embodiment, the attention convolution module includes a first convolution module, a second convolution module, a convolutional attention module, and a joint feature convolution module; the input information includes first input information and second input information; and the basic feature map includes a first basic feature map and a second basic feature map.

[0137] The extraction unit specifically includes a first extraction subunit, a second extraction subunit, and a fusion subunit;

[0138] The first extraction subunit is specifically used to extract features from the first input information of the convolutional pooling layer through the first convolutional module and the convolutional attention module to obtain the first basic feature map.

[0139] The second extraction subunit is specifically used to extract features from the second input information of the convolutional pooling layer through the second convolution module to obtain the first intermediate feature map.

[0140] The fusion subunit is specifically used to fuse the first basic feature map and the first intermediate feature map through the joint feature convolution module to obtain the second basic feature map.

[0141] In one embodiment, the first extraction subunit is specifically used for:

[0142] The first convolution module extracts features from the first input information of the convolutional pooling layer to obtain a second intermediate feature map; the convolutional attention module extracts channel features and spatial features from the second intermediate feature map to obtain a first basic feature map.

[0143] Each module in the aforementioned image detection device can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in or independent of the processor in a computer device, or stored in the memory of a computer device as software, so that the processor can call and execute the operations corresponding to each module.

[0144] In one embodiment, a computer device is provided, which may be a server and whose internal structure may be as shown in Figure 9. The computer device includes a processor, memory, and a network interface connected via a system bus. The processor provides computational and control capabilities. The memory includes a non-volatile storage medium and internal memory. The non-volatile storage medium stores the operating system, computer programs, and a database. The internal memory provides an environment for running the operating system and computer programs stored in the non-volatile storage medium. The database stores image data. The network interface communicates with external terminals via a network connection. When executed by the processor, the computer program implements an image detection method.

[0145] Those skilled in the art will understand that the structure shown in Figure 9 is merely a block diagram of part of the structure related to the present application and does not constitute a limitation on the computer device to which the present application is applied. Specific computer devices may include more or fewer components than those shown in the figure, combine certain components, or have a different arrangement of components.

[0146] In one embodiment, a computer device is provided, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to perform the following steps:

[0147] Acquire the image to be detected;

[0148] The feature extraction network extracts features from the image to be detected, resulting in a target feature map. The feature extraction network includes an attention convolutional module and a pooling layer.

[0149] The target feature map is classified using a classification network to obtain the detection result of the image to be detected; the detection result indicates whether the image to be detected has been tampered with or has not been tampered with.

[0150] In one embodiment, the feature extraction network involved when the processor executes a computer program includes at least two convolutional pooling layers connected end-to-end, each convolutional pooling layer including an attention convolutional module and a pooling layer.

[0151] In one embodiment, the processor, when executing a computer program, also performs the following steps:

[0152] Feature extraction is performed on the input information of each convolutional pooling layer, and the output of the last convolutional pooling layer is used as the target feature map of the image to be detected. The input information of the first convolutional pooling layer is the image to be detected, and the input information of any other convolutional pooling layer is the output of the convolutional pooling layer preceding it.

[0153] In one embodiment, the processor, when executing a computer program, also performs the following steps:

[0154] For each convolutional pooling layer, features are extracted from the input information of the convolutional pooling layer through the attention convolutional module contained in that layer to obtain a basic feature map; the basic feature map is then compressed through the pooling layer contained in the convolutional pooling layer to obtain the output result of the convolutional pooling layer.

[0155] In one embodiment, the processor, when executing a computer program, also performs the following steps:

[0156] The attention convolution module includes a first convolution module, a second convolution module, a convolutional attention module, and a joint feature convolution module. The input information includes first input information and second input information, and the basic feature maps include a first basic feature map and a second basic feature map. The first convolution module and the convolutional attention module extract features from the first input information of the convolutional pooling layer to obtain a first basic feature map. The second convolution module extracts features from the second input information of the convolutional pooling layer to obtain a first intermediate feature map. The joint feature convolution module fuses the first basic feature map and the first intermediate feature map to obtain a second basic feature map.

[0157] In one embodiment, the processor, when executing a computer program, also performs the following steps:

[0158] The first convolution module extracts features from the first input information of the convolutional pooling layer to obtain a second intermediate feature map; the convolutional attention module extracts channel features and spatial features from the second intermediate feature map to obtain a first basic feature map.

[0159] In one embodiment, a computer-readable storage medium is provided having a computer program stored thereon, the computer program performing the following steps when executed by a processor:

[0160] Acquire the image to be detected;

[0161] The feature extraction network extracts features from the image to be detected, resulting in a target feature map. The feature extraction network includes an attention convolutional module and a pooling layer.

[0162] The target feature map is classified using a classification network to obtain the detection result of the image to be detected; the detection result indicates whether the image to be detected has been tampered with or has not been tampered with.

[0163] In one embodiment, the feature extraction network involved when the computer program is executed by the processor includes at least two convolutional pooling layers connected end-to-end, each convolutional pooling layer including an attention convolutional module and a pooling layer.

[0164] In one embodiment, when the computer program is executed by a processor, it also performs the following steps:

[0165] Feature extraction is performed on the input information of each convolutional pooling layer, and the output of the last convolutional pooling layer is used as the target feature map of the image to be detected. The input information of the first convolutional pooling layer is the image to be detected, and the input information of any other convolutional pooling layer is the output of the convolutional pooling layer preceding it.

[0166] In one embodiment, when the computer program is executed by a processor, it also performs the following steps:

[0167] For each convolutional pooling layer, features are extracted from the input information of the convolutional pooling layer through the attention convolutional module contained in that layer to obtain a basic feature map; the basic feature map is then compressed through the pooling layer contained in the convolutional pooling layer to obtain the output result of the convolutional pooling layer.

[0168] In one embodiment, when the computer program is executed by a processor, it also performs the following steps:

[0169] The attention convolution module includes a first convolution module, a second convolution module, a convolutional attention module, and a joint feature convolution module. The input information includes first input information and second input information, and the basic feature maps include a first basic feature map and a second basic feature map. The first convolution module and the convolutional attention module extract features from the first input information of the convolutional pooling layer to obtain a first basic feature map. The second convolution module extracts features from the second input information of the convolutional pooling layer to obtain a first intermediate feature map. The joint feature convolution module fuses the first basic feature map and the first intermediate feature map to obtain a second basic feature map.

[0170] In one embodiment, when the computer program is executed by a processor, it also performs the following steps:

[0171] The first convolution module extracts features from the first input information of the convolutional pooling layer to obtain a second intermediate feature map; the convolutional attention module extracts channel features and spatial features from the second intermediate feature map to obtain a first basic feature map.

[0172] In one embodiment, a computer program product is provided, including a computer program that, when executed by a processor, performs the following steps:

[0173] Acquire the image to be detected;

[0174] The feature extraction network extracts features from the image to be detected, resulting in a target feature map. The feature extraction network includes an attention convolutional module and a pooling layer.

[0175] The target feature map is classified using a classification network to obtain the detection result of the image to be detected; the detection result indicates whether the image to be detected has been tampered with or has not been tampered with.

[0176] In one embodiment, the feature extraction network involved when the computer program is executed by the processor includes at least two convolutional pooling layers connected end-to-end, each convolutional pooling layer including an attention convolutional module and a pooling layer.

[0177] In one embodiment, when the computer program is executed by a processor, it also performs the following steps:

[0178] Feature extraction is performed on the input information of each convolutional pooling layer, and the output of the last convolutional pooling layer is used as the target feature map of the image to be detected. The input information of the first convolutional pooling layer is the image to be detected, and the input information of any other convolutional pooling layer is the output of the convolutional pooling layer preceding it.

[0179] In one embodiment, when the computer program is executed by a processor, it also performs the following steps:

[0180] For each convolutional pooling layer, features are extracted from the input information of the convolutional pooling layer through the attention convolutional module contained in that layer to obtain a basic feature map; the basic feature map is then compressed through the pooling layer contained in the convolutional pooling layer to obtain the output result of the convolutional pooling layer.

[0181] In one embodiment, when the computer program is executed by a processor, it also performs the following steps:

[0182] The attention convolution module includes a first convolution module, a second convolution module, a convolutional attention module, and a joint feature convolution module. The input information includes first input information and second input information, and the basic feature maps include a first basic feature map and a second basic feature map. The first convolution module and the convolutional attention module extract features from the first input information of the convolutional pooling layer to obtain a first basic feature map. The second convolution module extracts features from the second input information of the convolutional pooling layer to obtain a first intermediate feature map. The joint feature convolution module fuses the first basic feature map and the first intermediate feature map to obtain a second basic feature map.

[0183] In one embodiment, when the computer program is executed by a processor, it also performs the following steps:

[0184] The first convolution module extracts features from the first input information of the convolutional pooling layer to obtain a second intermediate feature map; the convolutional attention module extracts channel features and spatial features from the second intermediate feature map to obtain a first basic feature map.

[0185] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods. Any references to memory, databases, or other media used in the embodiments provided in this application can include at least one of non-volatile and volatile memory. Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetic random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, etc. Volatile memory can include random access memory (RAM) or external cache memory, etc. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases involved in the embodiments provided in this application may include at least one type of relational database and non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors involved in the embodiments provided in this application may be general-purpose processors, central processing units, graphics processing units, digital signal processors, programmable logic devices, quantum computing-based data processing logic devices, etc., and are not limited to these.

[0186] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.

[0187] The embodiments described above are merely illustrative of several implementation methods of this application, and while the descriptions are specific and detailed, they should not be construed as limiting the scope of this patent application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this application should be determined by the appended claims.

Claims

1. An image detection method, characterized in that the method includes: acquiring the image to be detected, including: receiving the image to be detected sent by the terminal, or acquiring the image to be detected from a specified storage path; extracting features from the image to be detected through a feature extraction network to obtain the target feature map of the image to be detected, wherein the feature extraction network includes an attention convolutional module and a pooling layer, and the feature extraction network includes at least two contiguous convolutional pooling layers, each convolutional pooling layer including an attention convolutional module and a pooling layer; wherein extracting features from the image to be detected through the feature extraction network to obtain the target feature map of the image to be detected includes: extracting features from the input information of each convolutional pooling layer, and using the output of the last convolutional pooling layer as the target feature map of the image to be detected, wherein the input information of the first convolutional pooling layer is the image to be detected, and the input information of any other convolutional pooling layer is the output of the convolutional pooling layer preceding it; wherein extracting features from the input information of each convolutional pooling layer includes: for each convolutional pooling layer, extracting features from the input information of the convolutional pooling layer through the attention convolutional module contained in the convolutional pooling layer to obtain a basic feature map, and compressing the basic feature map through the pooling layer contained in the convolutional pooling layer to obtain the output result of the convolutional pooling layer; and wherein the attention convolution module includes a first convolution module, a second convolution module, a convolutional attention module, and a joint feature convolution module.
The input information includes first input information and second input information. The basic feature map includes a first basic feature map and a second basic feature map. The step of extracting features from the input information of the convolutional pooling layer through the attention convolution module to obtain the basic feature map includes: extracting features from the first input information of the convolutional pooling layer through the first convolution module and the convolutional attention module to obtain a first basic feature map; extracting features from the second input information of the convolutional pooling layer through the second convolution module to obtain a first intermediate feature map; and fusing the first basic feature map and the first intermediate feature map through the joint feature convolution module to obtain a second basic feature map. The target feature map is classified using a classification network to obtain the detection result of the image to be detected; wherein the detection result indicates that the image to be detected has been tampered with or has not been tampered with.

2. The method according to claim 1, characterized in that extracting features from the first input information of the convolutional pooling layer through the first convolution module and the convolutional attention module to obtain the first basic feature map comprises:
extracting, through the first convolution module, features from the first input information of the convolutional pooling layer to obtain a second intermediate feature map; and
extracting, through the convolutional attention module, channel features and spatial features from the second intermediate feature map to obtain the first basic feature map.
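Claim 2 states only that the convolutional attention module extracts channel and spatial features. A widely used design that does this is CBAM-style attention (channel attention followed by spatial attention); the sketch below borrows that pattern as an assumption, not as the module the patent discloses, and further simplifies CBAM's shared MLP to a single C x C linear layer `w` and its 7x7 spatial convolution to two scalars `a` and `b`. Feature maps are nested lists indexed `[channel][row][col]`; every name here is hypothetical.

```python
import math

Tensor3 = list[list[list[float]]]  # feature map indexed [channel][row][col]


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def channel_attention(x: Tensor3, w: list[list[float]]) -> Tensor3:
    """CBAM-style channel attention: average- and max-pool each channel over
    space, pass both vectors through the same linear layer w (C x C, assumed),
    add them, and rescale every channel by the sigmoid of the result."""
    def shared_layer(v: list[float]) -> list[float]:
        return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]

    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]
    mx = [max(map(max, ch)) for ch in x]
    scales = [sigmoid(p + q) for p, q in zip(shared_layer(avg), shared_layer(mx))]
    return [[[v * s for v in row] for row in ch] for ch, s in zip(x, scales)]


def spatial_attention(x: Tensor3, a: float, b: float) -> Tensor3:
    """Spatial attention: at each position, combine the cross-channel mean and max
    with scalars a and b (standing in for CBAM's 7x7 convolution) and rescale
    all channels at that position by the sigmoid of the result."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    gate = [[sigmoid(a * sum(x[k][i][j] for k in range(c)) / c
                     + b * max(x[k][i][j] for k in range(c)))
             for j in range(w)]
            for i in range(h)]
    return [[[ch[i][j] * gate[i][j] for j in range(w)] for i in range(h)] for ch in x]


def convolutional_attention(x: Tensor3, w: list[list[float]], a: float, b: float) -> Tensor3:
    """Channel attention followed by spatial attention (CBAM ordering)."""
    return spatial_attention(channel_attention(x, w), a, b)


# Toy second intermediate feature map: 2 channels of 2x2 values.
second_intermediate = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
first_basic = convolutional_attention(second_intermediate,
                                      w=[[1.0, 0.0], [0.0, 1.0]], a=0.0, b=0.0)
```

Both gates lie strictly between 0 and 1, so the module can only re-weight the second intermediate feature map, emphasizing informative channels and positions, while leaving its shape unchanged.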

3. An image detection device, characterized in that the device comprises:
an acquisition module, configured to acquire an image to be detected, which includes receiving the image to be detected sent by a terminal, or acquiring the image to be detected from a specified storage path;
an extraction module, configured to extract features from the image to be detected through a feature extraction network to obtain a target feature map of the image to be detected, wherein the feature extraction network includes an attention convolutional module and a pooling layer, and specifically includes at least two consecutive convolutional pooling layers, each convolutional pooling layer including an attention convolutional module and a pooling layer;
wherein the extraction module is specifically configured to extract features from the input information of each convolutional pooling layer and to use the output of the last convolutional pooling layer as the target feature map of the image to be detected, wherein the input information of the first convolutional pooling layer is the image to be detected, and the input information of every other convolutional pooling layer is the output of the preceding convolutional pooling layer;
wherein the extraction module includes: an extraction unit, configured to extract, for each convolutional pooling layer, features from the input information of that layer through the attention convolutional module contained in it to obtain a basic feature map; and a compression unit, configured to compress the basic feature map through the pooling layer contained in that convolutional pooling layer to obtain the output of the convolutional pooling layer;
wherein the attention convolutional module includes a first convolution module, a second convolution module, a convolutional attention module, and a joint feature convolution module; the input information includes first input information and second input information; the basic feature map includes a first basic feature map and a second basic feature map; and the extraction unit includes: a first extraction subunit, configured to extract features from the first input information of the convolutional pooling layer through the first convolution module and the convolutional attention module to obtain the first basic feature map; a second extraction subunit, configured to extract features from the second input information of the convolutional pooling layer through the second convolution module to obtain a first intermediate feature map; and a fusion subunit, configured to fuse the first basic feature map and the first intermediate feature map through the joint feature convolution module to obtain the second basic feature map; and
a classification module, configured to classify the target feature map through a classification network to obtain a detection result for the image to be detected, wherein the detection result indicates either that the image to be detected has been tampered with or that it has not been tampered with.
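Claim 3 restates the method as a device assembled from modules, units, and subunits. One possible software mapping, sketched below, mirrors the claimed module names as Python classes and injects the per-layer extraction and compression units and the classifier as callables. The class names, the JSON storage format assumed for images read from a path, and the toy units in the usage lines are all hypothetical illustrations, not the patent's implementation.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional

Grid = list[list[float]]  # single-channel image or feature map


class AcquisitionModule:
    """Acquires the image to be detected from a terminal or from a storage path."""

    def acquire(self, received: Optional[Grid] = None, path: Optional[str] = None) -> Grid:
        if received is not None:          # image sent by a terminal
            return received
        if path is None:
            raise ValueError("need either a received image or a storage path")
        with open(path, encoding="utf-8") as f:  # assumed format: JSON 2D list
            return json.load(f)


@dataclass
class ConvPoolLayer:
    extraction_unit: Callable[[Grid], Grid]   # attention convolutional module -> basic feature map
    compression_unit: Callable[[Grid], Grid]  # pooling layer -> output of this layer


@dataclass
class ExtractionModule:
    layers: list[ConvPoolLayer]

    def extract(self, image: Grid) -> Grid:
        if len(self.layers) < 2:
            raise ValueError("at least two consecutive convolutional pooling layers are required")
        x = image
        for layer in self.layers:
            x = layer.compression_unit(layer.extraction_unit(x))
        return x  # target feature map


@dataclass
class ClassificationModule:
    classifier: Callable[[Grid], bool]  # True means "tampered"

    def classify(self, feature_map: Grid) -> str:
        return "tampered" if self.classifier(feature_map) else "not tampered"


@dataclass
class ImageDetectionDevice:
    acquisition: AcquisitionModule
    extraction: ExtractionModule
    classification: ClassificationModule

    def detect(self, received: Optional[Grid] = None, path: Optional[str] = None) -> str:
        image = self.acquisition.acquire(received, path)
        return self.classification.classify(self.extraction.extract(image))


def pool2(x: Grid) -> Grid:
    """Toy compression unit: 2x2 max pooling with stride 2."""
    return [[max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
             for j in range(0, len(x[0]) - 1, 2)]
            for i in range(0, len(x) - 1, 2)]


device = ImageDetectionDevice(
    AcquisitionModule(),
    ExtractionModule([ConvPoolLayer(lambda x: x, pool2)] * 2),  # identity stands in for the attention module
    ClassificationModule(lambda fm: max(map(max, fm)) > 5.0),   # toy threshold classifier
)
img = [[float(i * 4 + j) for j in range(4)] for i in range(4)]
print(device.detect(received=img))  # tampered
```

Injecting the units as callables keeps the device structure independent of any particular network, which fits the claim's functional description of each module and unit.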

4. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that, when the processor executes the computer program, it implements the steps of the method according to claim 1 or 2.

5. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the steps of the method according to claim 1 or 2.

6. A computer program product comprising a computer program, characterized in that, when the computer program is executed by a processor, it implements the steps of the method according to claim 1 or 2.