DCT perception-based remote sensing image change detection method
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- NANJING UNIV OF SCI & TECH
- Filing Date
- 2024-08-30
- Publication Date
- 2026-06-19
Figure CN119107562B_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of change detection and specifically relates to a remote sensing image change detection method based on DCT perception.
Background Technology
[0002] Change detection is a task that involves identifying differences in the state of a specific object or phenomenon by observing it at different points in time. Specifically, change detection aims to identify changes in images of the same geographic area taken at different times. Change detection technology has wide applications in various fields, including disaster assessment, environmental monitoring, land use management, and urban development analysis. With the increasing frequency of extreme weather events (such as droughts, floods, hurricanes, and heat waves) caused by climate change, the dynamic changes in building and land use are becoming increasingly prominent.
[0003] In cognitive science, because of information processing bottlenecks, humans selectively focus on certain key information while ignoring less important information. Similarly, to improve the performance of change detection and refine deep features, attention modules are widely used in change detection networks. Change detection needs not only to extract global semantic features but also to enhance detailed features. Many studies have focused on designing effective channel or spatial attention mechanisms while neglecting a fundamental problem: channel attention mechanisms represent each channel with a single scalar, which is inadequate because it causes significant information loss.
[0004] In recent years, many deep learning-based change detection methods have been proposed, increasingly incorporating traditional neural network architectures used for semantic segmentation to facilitate the extraction of deeper representational features. A significant limitation of these methods is the lack of comprehensive supervision over each sublayer during the training phase. Supervising only the last block leads to insufficient constraints on the intermediate layers.
Summary of the Invention
[0005] The purpose of this invention is to provide a remote sensing image change detection method based on DCT perception. It introduces an adaptive spectral feature extraction module, which parameterizes the discrete cosine transform filter into learnable weights, allowing the model to adaptively optimize frequency selection. This extends the fixed properties of the traditional DCT. In addition, the confidence-weighted feature fusion module integrates the outputs of multiple sub-networks. Through confidence quantification and attention weighting mechanisms, it achieves dynamic feature selection and weighting, thereby enhancing the model's ability to identify key change regions.
[0006] The technical solution for achieving the objective of this invention is as follows: In a first aspect, this invention provides a remote sensing image change detection method based on DCT perception, comprising the following steps:
[0007] The first step is to construct a dual-branch network based on four different levels of UNet to extract dual temporal features, concatenate features of the same level, and then connect the concatenated features to higher-level features in a skip connection to fuse features of different levels.
[0008] The second step is to design an adaptive spectral feature extraction module, select specific frequency components, generate a discrete cosine transform filter, and multiply the input data with the filter during forward propagation. The DCT filter corresponds to the response of different frequency components. Then, through spatial aggregation operations, the frequency domain representation of the features is obtained. The responses of each filter are integrated to generate a comprehensive frequency domain feature representation.
[0009] The third step is to design a confidence-weighted feature fusion module. The probability distribution of the output of each sub-network is calculated through a softmax layer, and the change probability is selected as the confidence level. The confidence level is combined with the attention weights and normalized to ensure that the sum of the weights is 1. The resulting feature map is then classified into background and foreground through a pointwise (1×1) convolution to achieve the change detection target.
[0010] In a second aspect, the present invention provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps of the method described in the first aspect.
[0011] In a third aspect, the present invention provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the method described in the first aspect.
[0012] In a fourth aspect, the present invention provides a computer program product, including a computer program that, when executed by a processor, implements the steps of the method described in the first aspect.
[0013] Compared with the prior art, the significant features of this invention are as follows:
(1) Adaptive spectral feature extraction module. This module integrates learnable parameters into the discrete cosine transform to achieve adaptive frequency domain feature extraction. It automatically optimizes frequency selection and enhances the model's sensitivity to key features in the signal. Its innovation lies in this adaptive frequency domain feature extraction capability, which improves the generalization of the model and enhances information integration through multispectral feature fusion. In addition, the module improves computation and storage efficiency by compressing frequency domain features, and may also provide an additional regularization effect that helps control model complexity and reduce overfitting.
(2) Confidence-weighted feature fusion module. The core innovation of this module lies in fusing the outputs of multiple sub-networks. Through confidence quantification and an attention weighting mechanism, dynamic feature selection and weighting are achieved, thereby enhancing the model's ability to identify key change regions. Normalization balances the contributions of the different sub-networks while maintaining the model's adaptability and generalization capability. This ensemble learning approach improves the accuracy of change detection and also increases the interpretability of model decisions.
[0014] The present invention will now be described in further detail with reference to the accompanying drawings.
Attached Figure Description
[0015] Figure 1 is a flowchart of the present invention.
[0016] Figure 2 shows the change detection network structure of the present invention.
[0017] Figure 3 shows the adaptive spectral feature extraction module of the present invention.
[0018] Figure 4 (a) is ImageA of the change detection dataset at time T1.
[0019] Figure 4 (b) is the image ImageB of the change detection dataset at time T2.
[0020] Figure 4 (c) is the ground truth map of the change detection dataset.
[0021] Figure 4 (d) is the change map of the change detection dataset obtained with the FC-EF method.
[0022] Figure 4 (e) is the change map of the change detection dataset obtained with the FC-Siam-diff method.
[0023] Figure 4 (f) is the change map of the change detection dataset obtained with the FC-Siam-conc method.
[0024] Figure 4 (g) is the change map of the change detection dataset obtained with the SNUNet method.
[0025] Figure 4 (h) is the change map of the change detection dataset obtained with the remote sensing image change detection method based on DCT perception.
Detailed Implementation
[0026] This invention proposes a remote sensing image change detection method based on DCT perception. A UNet-based deep supervised network ensures that each sub-network layer is supervised, thus more closely approximating the real target. A multispectral attention module is used to extract detailed features, and its channel attention module assigns differentiated weights to channels based on different frequency components. This allows the network to focus its limited attention on more critical regions during training, enhancing the representation of detailed features.
[0027] With reference to Figure 1, the implementation process of this invention is described in detail below. The steps are as follows:
[0028] The first step involves constructing a dual-branch network based on four different levels of UNet to extract dual-temporal features. Features at the same level are concatenated, and the concatenated features are then connected to higher-level features in a skip connection process to fuse features from different levels. Specifically:
[0029] A UNet-based dual-branch network is constructed to extract dual-temporal features. The network consists of four sub-networks. The input to the network is two dual-temporal remote sensing images. The encoder obtains features at different scales. After the dual-temporal features are concatenated, the high-resolution, fine-grained features in the encoder are directly passed to the corresponding layers of the decoder through a dense skip connection mechanism.
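The dual-branch extraction and same-level concatenation described above can be illustrated with a minimal NumPy sketch. It is not the patented network: the helper names (`avg_pool2`, `encode`, `bitemporal_features`), the use of 1×1 projections with ReLU in place of real convolution blocks, and the channel widths are all assumptions chosen for brevity, and the dense skip connections of the decoder are omitted. Both branches share the same weights, in line with the Siamese structure the description mentions.

```python
import numpy as np

def avg_pool2(x):
    """Halve the spatial size of a (C, H, W) array by 2x2 average pooling."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def encode(image, weights):
    """Toy four-level encoder: one 1x1 projection + ReLU per level (assumption)."""
    feats, x = [], image
    for level, w_mat in enumerate(weights):
        if level > 0:
            x = avg_pool2(x)                      # go one level deeper
        x = np.maximum(np.einsum("oc,chw->ohw", w_mat, x), 0.0)
        feats.append(x)
    return feats

def bitemporal_features(img_t1, img_t2, weights):
    """Run both dates through the same encoder and concatenate level by level."""
    f1, f2 = encode(img_t1, weights), encode(img_t2, weights)
    return [np.concatenate([a, b], axis=0) for a, b in zip(f1, f2)]

rng = np.random.default_rng(0)
# Hypothetical channel widths for the four levels.
weights = [rng.standard_normal((o, c)) for o, c in [(8, 3), (16, 8), (32, 16), (64, 32)]]
t1 = rng.standard_normal((3, 32, 32))
t2 = rng.standard_normal((3, 32, 32))
for level, f in enumerate(bitemporal_features(t1, t2, weights)):
    print(level, f.shape)
```

Each level doubles the channel count after concatenation because the two temporal feature maps are stacked along the channel axis.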
[0030] The second step involves designing an adaptive spectral feature extraction module. Specific frequency components are selected to generate discrete cosine transform (DCT) filters. These filters correspond to specific frequency components. During forward propagation, the input data is multiplied by the filters. The DCT filters correspond to the responses of different frequency components. Subsequently, through spatial aggregation operations, the frequency domain representation of the features is obtained. The responses of each filter are integrated to generate a comprehensive frequency domain feature representation.
[0031] As shown in Figure 3, an adaptive spectral feature extraction module is designed to select specific frequency components and generate discrete cosine transform filters. The basis function of the two-dimensional discrete cosine transform is:
[0032] $$B^{i,j}_{h,w} = \cos\left(\frac{\pi i}{H}\left(h + \frac{1}{2}\right)\right)\cos\left(\frac{\pi j}{W}\left(w + \frac{1}{2}\right)\right)$$
[0033] s.t. $h \in \{0, 1, \ldots, H-1\}$, $w \in \{0, 1, \ldots, W-1\}$
[0034] where $B^{i,j}_{h,w}$ denotes the two-dimensional discrete cosine transform coefficients, H and W represent the height and width of the image, respectively, C represents the channel, and i and j are the indices of the two-dimensional discrete cosine transform coefficients; each coefficient represents the amplitude of the signal at the corresponding image frequency.
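As a small illustration of the two-dimensional DCT basis function, the sketch below builds one basis filter and projects a tiny image onto it. It assumes the standard unnormalized DCT-II form, with (i, j) as frequency indices and (h, w) as spatial positions; the function names `dct_basis` and `dct_coefficient` are hypothetical.

```python
import math

def dct_basis(i, j, H, W):
    """H x W basis for frequency index (i, j), as a list of rows (DCT-II, unnormalized)."""
    return [
        [math.cos(math.pi * i * (h + 0.5) / H) * math.cos(math.pi * j * (w + 0.5) / W)
         for w in range(W)]
        for h in range(H)
    ]

def dct_coefficient(x, i, j):
    """Project a 2D image x (list of rows) onto the (i, j) basis."""
    H, W = len(x), len(x[0])
    basis = dct_basis(i, j, H, W)
    return sum(x[h][w] * basis[h][w] for h in range(H) for w in range(W))

image = [[1.0, 2.0], [3.0, 4.0]]
print(dct_coefficient(image, 0, 0))  # lowest frequency: plain sum of all pixels
print(dct_coefficient(image, 1, 0))  # first vertical frequency
```

The (0, 0) basis is constant, so its coefficient is the plain sum of the pixels, which is proportional to global average pooling. Higher indices respond to variation that a single average discards.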
[0035] First, the feature map $X \in \mathbb{R}^{C \times H \times W}$ extracted by the backbone network is split into n parts along the channel dimension, denoted $[X_0, X_1, \ldots, X_{n-1}]$, where $X_i \in \mathbb{R}^{C' \times H \times W}$ and $C' = C/n$. The weights $W_{uv}$ are initialized, where u and v represent the frequency component indices in two-dimensional space. Each part is assigned a corresponding two-dimensional discrete cosine transform frequency component, and the result of the two-dimensional discrete cosine transform is used as the compression result of channel attention. Thus, we have:
[0036] $$F_i = \sum_{h=0}^{H-1}\sum_{w=0}^{W-1} W_{u_i v_i}\, B^{u_i,v_i}_{h,w}\, X^{i}_{:,h,w}$$
[0037] s.t. $i \in \{0, 1, \ldots, n-1\}$
[0038] where $[u_i, v_i]$ is the two-dimensional frequency component index corresponding to $X_i$, $X^{i}_{:,h,w}$ is the pixel value at position (h, w) on each channel of the i-th part of the original data, $B^{u_i,v_i}_{h,w}$ are the basis functions of the discrete cosine transform, $W_{u_i v_i}$ are the learnable discrete cosine transform filter weights, and $F_i \in \mathbb{R}^{C'}$ is the compressed C′-dimensional vector.
[0039] The entire compression vector can be obtained by concatenation:
[0040] $F = \mathrm{cat}([F_0, F_1, \ldots, F_{n-1}])$
[0041] where $F \in \mathbb{R}^{C}$ is the resulting multispectral vector. The entire multispectral channel attention framework can be written as:
[0042] $\mathrm{MSCA\_att} = \mathrm{sigmoid}(f(F))$
[0043] Through the above process, channel attention is extended to a framework with multiple frequency components. Each filter corresponds to its assigned frequency component. During forward propagation, the input data is multiplied by the filters, and the DCT filters yield the responses of the different frequency components. Subsequently, aggregation over the spatial dimensions gives the frequency domain representation of the features. Integrating the responses of all filters produces a comprehensive frequency domain feature representation.
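The group-wise compression and attention steps above can be sketched in NumPy as follows. This is only one plausible reading of the description. Treating the learnable weight as a multiplier on each DCT filter, modelling the mapping f as a single linear layer (`fc_weight`), and rescaling the input channels by the attention vector are all assumptions, as are the function names.

```python
import numpy as np

def dct_filter(u, v, H, W):
    """Unnormalized DCT-II basis for frequency (u, v) as an (H, W) array."""
    h = np.arange(H)[:, None]
    w = np.arange(W)[None, :]
    return np.cos(np.pi * u * (h + 0.5) / H) * np.cos(np.pi * v * (w + 0.5) / W)

def multispectral_attention(X, freqs, filter_weights, fc_weight):
    """X: (C, H, W); freqs: n pairs (u_i, v_i); filter_weights: n scalars or (H, W) arrays."""
    C, H, W = X.shape
    groups = np.split(X, len(freqs), axis=0)            # n parts of shape (C', H, W)
    parts = []
    for X_i, (u, v), w_uv in zip(groups, freqs, filter_weights):
        filt = w_uv * dct_filter(u, v, H, W)            # learnable weight times DCT basis
        parts.append((X_i * filt).sum(axis=(1, 2)))     # spatial aggregation -> F_i in R^{C'}
    F = np.concatenate(parts)                           # multispectral vector in R^C
    att = 1.0 / (1.0 + np.exp(-(fc_weight @ F)))        # sigmoid(f(F)), f assumed linear
    return att, X * att[:, None, None]                  # per-channel rescaling (assumption)

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 4, 4))
freqs = [(0, 0), (0, 1), (1, 0), (1, 1)]                # hypothetical frequency choices
att, X_out = multispectral_attention(X, freqs, [1.0] * 4, 0.1 * rng.standard_normal((8, 8)))
print(att.shape, X_out.shape)
```

In a trainable version, `filter_weights` and `fc_weight` would be parameters updated by backpropagation. Here they are fixed arrays so the data flow stays visible.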
[0044] The third step involves designing a confidence-weighted feature fusion module. This module calculates the probability distribution of the output of each sub-network using a softmax layer and selects the change probability as the confidence level. These confidence levels are then combined with attention weights and normalized so that the weights sum to 1. This allows the model to assign different weights to outputs at different levels, focusing on the most useful features and thus improving performance. Finally, the resulting feature map is passed through a pointwise (1×1) convolution to classify background and foreground, achieving the change detection objective.
[0045] Define a confidence-normalized attention module, as follows:
[0046] 1) First, for the outputs of the four sub-networks, apply the softmax formula to obtain the probability distribution $P_n$ of each pixel belonging to each change category:
[0047] $$P_n(i, j) = \frac{\exp(L_n(i, j))}{\sum_{k=0}^{K-1} \exp(L_n(i, k))}$$
[0048] where $P_n(i, j)$ is the probability of the j-th category in the i-th batch, $L_n$ is the logits matrix output by the n-th sub-network, $L_n(i, j)$ is the corresponding logit value in $L_n$, K is the total number of categories, and $k \in \{0, 1, \ldots, K-1\}$. The denominator sums the exponentials of the logits over all categories k, which normalizes the probability distribution so that the probabilities of all categories sum to 1.
[0049] 2) Apply channel attention to the feature map of each sub-network to calculate the weight $A_n$, and multiply it by the confidence $C_n$ to obtain the weighted confidence level $W_n$:
[0050] $W_n = C_n \odot A_n$
[0051] where $\odot$ represents element-wise multiplication.
[0052] 3) Normalize the weighted confidence level $W_n$ so that the weights of all sub-networks sum to 1:
[0053] $$W'_n = \frac{W_n}{\sum_{m} W_m}$$
[0054] 4) Apply the normalized weighted confidence level $W'_n$ to the original feature map $F_n$ and merge to obtain the final feature map $F_{merged}$:
[0055] $$F_{merged} = \sum_{n} W'_n \odot F_n$$
[0056] Finally, the obtained feature map is classified into background and foreground through a pointwise (1×1) convolution to achieve the change detection target.
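The four fusion steps can be traced with a minimal NumPy sketch. The tensor shapes are assumptions, since the description does not fix them: each sub-network is taken to output logits of shape (K, H, W) and a feature map of shape (C, H, W), the change class is assumed to be index 1, and each attention weight is assumed to be broadcastable to the (H, W) confidence map. The function names and the small `eps` guard are hypothetical.

```python
import numpy as np

def softmax(logits, axis):
    """Numerically stable softmax along one axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def confidence_weighted_fusion(logits_list, attn_list, feats_list, change_class=1, eps=1e-8):
    weighted = []
    for L_n, A_n in zip(logits_list, attn_list):
        P_n = softmax(L_n, axis=0)            # step 1: per-pixel class probabilities
        C_n = P_n[change_class]               # change probability used as confidence
        weighted.append(C_n * A_n)            # step 2: W_n = C_n (element-wise) A_n
    W = np.stack(weighted)                    # (N, H, W)
    W_norm = W / (W.sum(axis=0, keepdims=True) + eps)   # step 3: sums to 1 over sub-networks
    F_merged = sum(w[None] * f for w, f in zip(W_norm, feats_list))  # step 4
    return W_norm, F_merged

rng = np.random.default_rng(0)
N, K, C, H, Wd = 4, 2, 5, 3, 3
logits = [rng.standard_normal((K, H, Wd)) for _ in range(N)]
attn = [rng.uniform(0.1, 1.0) for _ in range(N)]         # one scalar weight per sub-network
feats = [rng.standard_normal((C, H, Wd)) for _ in range(N)]
W_norm, F_merged = confidence_weighted_fusion(logits, attn, feats)
print(W_norm.sum(axis=0).round(6))
print(F_merged.shape)
```

A pointwise convolution over `F_merged` would then map the C channels to the two output classes. That final step is left out here.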
[0057] The core features of this invention lie in its network structure and attention-weighting mechanism, which enable it to effectively extract and enhance detailed features when processing remote sensing images and also increase the interpretability of model decisions. The method uses a Siamese network structure to process dual-temporal images, ensuring consistency in feature extraction. A key innovation is the adaptive spectral feature extraction module. This module adaptively selects important frequency components based on data and task requirements, rather than relying on fixed, preset frequency selections. It parameterizes the discrete cosine transform (DCT) filter into learnable weights, allowing the model to adaptively optimize frequency selection, which extends the fixed properties of the traditional DCT. Furthermore, the confidence-weighted feature fusion module fuses the outputs of multiple sub-networks. Through confidence quantification and attention-weighting mechanisms, it achieves dynamic feature selection and weighting, thereby enhancing the model's ability to identify key change regions.
[0058] The effects of this invention can be further illustrated by the following simulation experiments:
[0059] (1) Simulation conditions
[0060] The simulation experiments used change detection datasets. The comparative experiments were conducted on CDD and LEVIR-CD, the most commonly used evaluation datasets in the field of change detection. The CDD dataset consists of 11 pairs of multispectral (R, G, B) images acquired from Google Earth (DigitalGlobe) in different seasons, with spatial resolutions ranging from 3 cm/px to 100 cm/px. This invention implements SSUNet-CD using the PyTorch framework. After repeated trials and adjustments, the batch size was set to 16 and Adam was used as the optimizer. The learning rate was set to 1e-3 and decayed by a factor of 0.5 every 8 epochs. The experiments were run on an NVIDIA RTX 3090, and the model was trained for 100 epochs until convergence.
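For concreteness, the stated learning rate schedule (initial rate 1e-3, multiplied by 0.5 every 8 epochs, over 100 epochs) can be written as a small function. The zero-based epoch counting and the function name are assumptions. In PyTorch the same step decay is commonly expressed with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=8, gamma=0.5)`.

```python
def learning_rate(epoch, base_lr=1e-3, gamma=0.5, step=8):
    """Step decay: multiply the base rate by gamma once every `step` epochs."""
    return base_lr * gamma ** (epoch // step)

schedule = [learning_rate(epoch) for epoch in range(100)]
print(schedule[0], schedule[8], schedule[16])
```

Under these assumptions the rate is halved twelve times over the 100 epochs, so the final epochs run at roughly 2.4e-7.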
[0061] In terms of quantitative evaluation, this invention uses three standard metrics: precision, recall, and F1 score.
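The three metrics follow their standard definitions over the changed (foreground) class: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 · precision · recall / (precision + recall). A minimal sketch, assuming flattened binary masks where 1 marks a changed pixel (the function name is hypothetical):

```python
def change_metrics(pred, truth):
    """Precision, recall and F1 for the 'changed' class of two flat binary masks."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

pred = [1, 1, 0, 0, 1]
truth = [1, 0, 0, 1, 1]
print(change_metrics(pred, truth))
```

Because F1 is the harmonic mean of precision and recall, a method with high precision but low recall still receives a low F1 score.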
[0062] (2) Simulation content
[0063] This invention uses the CDD dataset and the LEVIR-CD dataset to evaluate the performance of the algorithm. Four representative change detection algorithms are used for comparison: FC-EF (Fully Convolutional Early Fusion), FC-Siam-diff (Fully Convolutional Siamese-Difference), FC-Siam-conc (Fully Convolutional Siamese-Concatenation), and SNUNet (the combination of Siamese network and NestedUNet).
[0064] (3) Analysis of simulation experiment results
[0065] Table 1 shows the evaluation metrics results on the CDD dataset, while Table 2 presents the evaluation metrics results on the LEVIR-CD dataset.
[0066] Table 1. Quantitative Evaluation of Different Change Detection Algorithms for CDD Datasets
[0067]
Methods        Pre(%)   Rec(%)   F1(%)
FC-EF          79.61    49.10    60.74
FC-Siam-diff   73.41    73.71    73.56
FC-Siam-conc   80.81    65.66    72.45
SNUNet         94.52    91.02    92.74
Ours           96.10    96.28    96.19
[0068] Table 2. Quantitative Evaluation of Different Change Detection Algorithms in the LEVIR-CD Dataset
[0069]
Methods        Pre(%)   Rec(%)   F1(%)
FC-EF          89.08    79.62    84.08
FC-Siam-diff   91.22    82.85    86.83
FC-Siam-conc   89.95    84.21    86.99
SNUNet         91.58    88.79    90.17
Ours           93.13    92.50    92.78
[0070] As shown in Table 1, compared with the four change detection algorithms, the remote sensing image change detection method based on DCT perception proposed in this invention performs best on the CDD dataset, achieving the highest precision, recall, and F1 score of 96.10%, 96.28%, and 96.19%, respectively. This performance is attributed to the deep supervision proposed in this invention, which gradually brings the feature map closer to the ground truth during training. The network also includes spatial and spectral attention modules, which make effective use of complex input information and in particular enhance the representation of detailed features. The results of this method on the change detection datasets are shown in Figure 4. As Figure 4 shows, the simulation results on real change detection datasets demonstrate the effectiveness of the method of this invention.
Claims
1. A remote sensing image change detection method based on DCT perception, characterized in that it includes the following steps: The first step is to construct a dual-branch network based on four different levels of UNet to extract dual temporal features, concatenate features of the same level, and then connect the concatenated features to higher-level features in a skip connection to fuse features of different levels. The second step is to design an adaptive spectral feature extraction module, select specific frequency components, generate a discrete cosine transform filter, and multiply the input data with the filter during forward propagation. The DCT filter corresponds to the response of different frequency components. Then, through spatial aggregation operations, the frequency domain representation of the features is obtained. The responses of each filter are integrated to generate a comprehensive frequency domain feature representation. The specific steps are as follows: The basis function of the two-dimensional discrete cosine transform is: $B^{i,j}_{h,w} = \cos\left(\frac{\pi i}{H}\left(h + \frac{1}{2}\right)\right)\cos\left(\frac{\pi j}{W}\left(w + \frac{1}{2}\right)\right)$; s.t. $h \in \{0, 1, \ldots, H-1\}$, $w \in \{0, 1, \ldots, W-1\}$; where $B^{i,j}_{h,w}$ represents the coefficients of the two-dimensional discrete cosine transform, $H$ and $W$ represent the height and width of the image, respectively, and $i$ and $j$ are the indices of the two-dimensional discrete cosine transform coefficients, representing the signal amplitude at the image frequency; the feature map $X \in \mathbb{R}^{C \times H \times W}$ extracted by the backbone network, where $C$ represents the channel, is split into n parts according to the channel dimension, represented as $[X_0, X_1, \ldots, X_{n-1}]$, where $X_i \in \mathbb{R}^{C' \times H \times W}$; the weights $W_{uv}$ are initialized, where $u$ and $v$ represent the frequency component indices in two-dimensional space; for each part, a corresponding two-dimensional discrete cosine transform frequency component is assigned, and the result of the two-dimensional discrete cosine transform is used as the compression result of channel attention; thus: $F_i = \sum_{h=0}^{H-1}\sum_{w=0}^{W-1} W_{u_i v_i}\, B^{u_i,v_i}_{h,w}\, X^{i}_{:,h,w}$; s.t. $i \in \{0, 1, \ldots, n-1\}$; where $[u_i, v_i]$ is the two-dimensional frequency component index corresponding to $X_i$, $X^{i}_{:,h,w}$ is the pixel value at position $(h, w)$ on each channel of the i-th part of the original image data, $B^{u_i,v_i}_{h,w}$ are the basis functions of the discrete cosine transform, $W_{u_i v_i}$ are the learnable discrete cosine transform filter weights, and $F_i \in \mathbb{R}^{C'}$ is the compressed C′-dimensional vector; the entire compression vector is obtained by concatenation: $F = \mathrm{cat}([F_0, F_1, \ldots, F_{n-1}])$; where $F \in \mathbb{R}^{C}$ is the resulting multispectral vector; through the above process, the channel attention is extended to a framework with multiple frequency components. These filters correspond to the corresponding frequency components. During forward propagation, the input data is multiplied by the filters. The DCT filters correspond to the responses of different frequency components. Subsequently, through aggregation operations in the spatial dimension, the frequency domain representation of the features is obtained. The responses of each filter are integrated to generate a comprehensive frequency domain feature representation. The third step is to design a confidence-weighted feature fusion module. The probability distribution of the output of each sub-network is calculated through a softmax layer, and the change probability is selected as the confidence level. The confidence level is combined with the attention weights and normalized to ensure that the sum of the weights is 1. The resulting feature map is then classified into background and foreground through a pointwise (1×1) convolution to achieve the change detection target.
2. The remote sensing image change detection method based on DCT perception according to claim 1, characterized in that the first step is to construct a UNet-based dual-branch network for extracting dual-temporal features. The network consists of four sub-networks. The input to the network is two dual-temporal remote sensing images. The encoder obtains features at different scales. After the dual-temporal features are concatenated, the high-resolution, fine-grained features in the encoder are directly passed to the corresponding layers of the decoder through a dense skip connection mechanism.
3. The remote sensing image change detection method based on DCT perception according to claim 1, characterized in that the third step defines the confidence-weighted feature fusion module, which operates as follows: (1) For the outputs of the four sub-networks, the softmax formula is applied to obtain the probability distribution $P_n$ of each pixel belonging to each change category: $P_n(i, j) = \frac{\exp(L_n(i, j))}{\sum_{k=0}^{K-1} \exp(L_n(i, k))}$; where $P_n(i, j)$ represents the probability of the j-th category in the i-th batch, $L_n$ is the logits matrix output by the n-th sub-network, $L_n(i, j)$ is the corresponding logit value in $L_n$, $K$ is the total number of categories, and the denominator is the sum of the exponentials of the logits over all categories $k$, which normalizes the probability distribution so that the probabilities of all categories sum to 1; (2) Channel attention is applied to the feature map of each sub-network to calculate the weight $A_n$, which is multiplied by the confidence $C_n$ to obtain the weighted confidence level $W_n$: $W_n = C_n \odot A_n$; where $\odot$ represents element-wise multiplication; (3) The weighted confidence level $W_n$ is normalized so that the weights of all sub-networks sum to 1: $W'_n = \frac{W_n}{\sum_{m} W_m}$; (4) The normalized weighted confidence level $W'_n$ is applied to the original feature map $F_n$ and the results are merged to obtain the final feature map $F_{merged}$: $F_{merged} = \sum_{n} W'_n \odot F_n$; Finally, the obtained feature map is classified into background and foreground through a pointwise (1×1) convolution to achieve the change detection target.
4. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, it implements the steps of the method according to any one of claims 1-3.
5. A computer-readable storage medium having stored thereon a computer program, characterized in that, when the program is executed by a processor, it implements the steps of the method according to any one of claims 1-3.
6. A computer program product comprising a computer program, characterized in that, when the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1-3.