Dual-branch multi-scale dynamic local convolution attention method based on remote sensing change detection

By employing a bi-branch, multi-scale dynamic local convolutional attention method, the problems of inaccurate detection of small change regions and false changes in remote sensing change detection are solved, achieving high-precision remote sensing change detection.

CN118736416B · Active · Publication Date: 2026-06-26 · XINJIANG UNIVERSITY

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
XINJIANG UNIVERSITY
Filing Date
2024-07-03
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing remote sensing change detection methods struggle to accurately capture detailed information in areas of small change, and suffer from spurious changes and sample imbalance, leading to low detection accuracy and decreased model precision.

Method used

A dual-branch, multi-scale, dynamic local convolutional attention method based on remote sensing change detection is adopted, which includes feature extraction, dynamic multi-scale convolutional attention module, parallel decoding aggregator module, and loss function. Features are extracted through a lightweight network, and feature fusion and detection capabilities are enhanced by combining multi-scale context aggregation and dynamic partial convolutional attention.

Benefits of technology

It improves the accuracy and robustness of remote sensing image change detection, effectively captures subtle changes in complex scenes, reduces spurious change interference, and enhances the model's detection capability in areas with small changes.

✦ Generated by Eureka AI based on patent content.

Figure CN118736416B_ABST

Abstract

The application discloses a dual-branch multi-scale dynamic local convolutional attention method based on remote sensing change detection, which relates mainly to the field of remote sensing change detection. The method includes the following steps: S1, collecting dual-temporal remote sensing images and building a remote sensing image dataset; S2, constructing a dual-branch multi-scale dynamic partial convolutional attention network architecture for remote sensing change detection; S3, extracting features from the dual-temporal remote sensing images with a pre-trained lightweight network; S4, inputting the extracted multi-layer features into a dynamic multi-scale convolutional attention module; S5, inputting the features produced by the dynamic multi-scale convolutional attention module into a parallel decoding aggregator module; S6, saving the trained model weights, testing with a test set, and obtaining the change detection result for the test images. The application can accurately detect changes in captured remote sensing images.

Description

Technical Field

[0001] This invention relates to the field of remote sensing change detection, specifically to a bi-branch multi-scale dynamic local convolutional attention method based on remote sensing change detection.

Background Technology

[0002] Remote sensing change detection (RSCD) aims to detect changes between a pair of remote sensing images of the same location taken at different times. It plays an important role in urban road planning, landslide disaster assessment, land resource utilization, and ecological environment protection. However, with rapid social development, change regions are becoming increasingly complex, which poses great challenges for change detection.

[0003] There are various change detection methods. Traditional change detection methods can be divided into image transformation-based methods, image algebra-based methods, and post-classification-based methods. These methods rely on handcrafted features and use different classifiers, including decision trees, change vector analysis, support vector machines, and clustering methods, to distinguish changed regions. However, these classical methods have various limitations and fail to capture rich semantic feature information, resulting in low change detection accuracy. Change detection methods based on convolutional neural networks often struggle to capture global contextual information effectively, which also reduces change detection accuracy. Change detection methods based on Transformers increase the number of parameters, memory usage, and floating-point operations (FLOPs).

[0004] Regarding the detection of small change regions, the types and magnitudes of the changes of interest vary across dual-temporal remote sensing images. Small change regions can be further divided into sparse and dense small change regions. Compared with larger change regions, small change regions are smaller and their change features are less distinct. During change detection, detailed information about small change regions may be lost in feature extraction and change feature reconstruction, so some small change regions are missed. Spurious changes are a further problem. First, during the acquisition of multi-temporal remote sensing images, different imaging conditions, such as changes in illumination angle or intensity, seasonal changes, and sensor type, can alter the appearance of the same semantic object in an image pair. Second, the definition of change varies with the application scenario: some visually apparent changes, such as those involving temporary objects like cars, may not be of interest in a given scenario. These irrelevant changes interfere with the detection of the changes of interest and are called spurious changes. Class imbalance also harms change detection networks. Deep learning-based remote sensing image change detection methods require a large amount of labeled data to train and optimize the network model. In the real world, however, changed areas are far less common than unchanged areas, so in change detection datasets the number of unchanged pixels far exceeds the number of changed pixels, a phenomenon known as class imbalance. This imbalance biases the trained network toward the frequently occurring class, which reduces model accuracy.

Summary of the Invention

[0005] The purpose of this invention is to solve the problems existing in the prior art and provide a bi-branch multi-scale dynamic local convolutional attention method based on remote sensing change detection, which can accurately detect changes in captured remote sensing images.

[0006] To achieve the above objectives, the present invention employs the following technical solution:

[0007] A two-branch, multi-scale dynamic local convolutional attention method based on remote sensing change detection includes the following steps:

[0008] S1. Collect dual-temporal remote sensing images, create a remote sensing image dataset, ensure that each dual-temporal remote sensing image has pixel-level annotations, divide the remote sensing image dataset into a training set, a validation set and a test set, and crop the dual-temporal remote sensing images in the remote sensing image dataset to the same size.

[0009] S2. Construct a dual-branch multi-scale dynamic partial convolutional attention network architecture based on remote sensing change detection, including a feature extractor, a dynamic multi-scale convolutional attention module, a parallel decoding aggregator module, and a loss function. The dynamic multi-scale convolutional attention module is formed by combining the dynamic partial convolutional attention module and the multi-scale context aggregation module.

[0010] S3. Use a pre-trained lightweight network to extract features from dual-temporal remote sensing images;

[0011] S4. Input the extracted multi-layer features into the dynamic multi-scale convolutional attention module;

[0012] S5. Input the features obtained in the dynamic multi-scale convolutional attention module into the parallel decoding aggregator module, where dilated convolution is used to expand the receptive field, and the multi-layer prediction map and loss function output by the parallel decoding aggregator module are used for deep supervision.

[0013] S6. Save the trained model weights, test them using the test set, and obtain the change detection results for the test image.

[0014] Preferably, the operation flow of the dual-branch multi-scale dynamic partial convolutional attention network architecture based on remote sensing change detection in step S2 is as follows:

[0015] First, multi-level feature maps are extracted from the dual-temporal remote sensing images in two parallel branches of a pre-trained weight-shared CNN, with sizes of 64×64, 32×32, 16×16, and 8×8.

[0016] Then, the multi-scale context aggregation module aggregates features across different layers, while the dynamic partial convolutional attention module uses concatenation and absolute-value subtraction to generate two parallel branches, both of which use dynamic convolution to generate queries, keys, and values.

[0017] Furthermore, by combining the multi-scale context aggregation module, the information from the two branches can be fully integrated to obtain different semantic change information;

[0018] Finally, a parallel decoding aggregator employs multi-path feature processing and dilated convolution strategies to generate a prediction map by combining feature information from different layers of the network.
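As a quick check of the sizes listed in [0015]: with the 256×256 crops used in the embodiment, four levels of 64×64, 32×32, 16×16 and 8×8 correspond to overall strides of 4, 8, 16 and 32, so each level halves the previous one. The sketch below assumes the first returned level sits at stride 4; the patent states the sizes but not the stride of the first level.

```python
def pyramid_sizes(input_size=256, first_stride=4, levels=4):
    """Side length of each backbone level, assuming the first returned
    level is at `first_stride` and each later level halves the resolution
    (the stride-2 layer added per level)."""
    sizes = []
    stride = first_stride
    for _ in range(levels):
        sizes.append(input_size // stride)
        stride *= 2  # stride-2 downsampling between consecutive levels
    return sizes


print(pyramid_sizes())  # [64, 32, 16, 8]
```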

[0019] Preferably, in step S3, the lightweight network is MobileNetV2. A convolutional layer with a stride of 2 is added to each layer of the backbone network, downsampling the feature map of each level to half the size of the previous level. By feeding a pair of bi-temporal T1 and T2 remote sensing images into the feature extractor MobileNetV2, four layers of bi-temporal feature maps are obtained, represented as follows:

[0020] Preferably, the operation flow of the multi-scale context aggregation module is as follows: using the T1 remote sensing image as input to the network model, the first three of the four extracted feature layers are fed into the multi-scale context aggregation module. The first-layer feature map is downsampled by max pooling to the same size as the second-layer feature map and then passed through a convolutional layer that increases its channel count to match the second layer. Similarly, the third-layer feature map first passes through a 3×3 convolutional layer so that its channel count matches the second layer, and is then upsampled by bilinear interpolation to the same size as the second layer. X1, X2, and X3 are then combined by concatenation to gather multi-level feature information, and residual learning is used to retain the original feature information while enriching the contextual feature information;

[0021] The entire representation process can be expressed as:

[0022]

[0023] $X = X_2 + \mathrm{Conv}_{3\times3}\left(\mathrm{Conv}_{3\times3}\left(\mathrm{Cat}(X_1, X_2, X_3)\right)\right)$

[0024] Where MaxPooling(·) represents max pooling, Cat(·) represents concatenation, Conv3×3 represents a function consisting of 3×3 convolution, batch normalization, and ReLU activation, and Interpolate represents bilinear upsampling.
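A minimal NumPy sketch of the MCA data flow, assuming channels-first arrays of shape (C, H, W) and 2×2 max pooling (the pooling size is inferred from the factor-of-two gap between adjacent levels). To keep it short, each Conv3×3 + BN + ReLU block is replaced by a randomly initialised 1×1 projection followed by ReLU (the hypothetical helper `conv_bn_relu_stub`), and the channel counts are toy values. The sketch illustrates shapes and the order of operations, not trained behaviour.

```python
import numpy as np


def conv_bn_relu_stub(x, c_out, rng):
    """Hypothetical stand-in for Conv3x3 + BN + ReLU: a random 1x1
    projection followed by ReLU. Only the channel bookkeeping matters here."""
    w = rng.standard_normal((c_out, x.shape[0])) * 0.1
    return np.maximum(np.einsum("oc,chw->ohw", w, x), 0.0)


def max_pool2x2(x):
    """2x2 max pooling with stride 2 on a (C, H, W) array (H, W even)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


def bilinear_resize(x, out_h, out_w):
    """Bilinear resize of a (C, H, W) array, sampling with the corner
    pixels aligned (one of several common bilinear conventions)."""
    _, h, w = x.shape
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[None, :, None]
    wx = xs - x0
    rows = x[:, y0, :] * (1 - wy) + x[:, y1, :] * wy        # interpolate along H
    return rows[:, :, x0] * (1 - wx) + rows[:, :, x1] * wx  # then along W


def mca(f1, f2, f3, rng):
    """X = X2 + Conv(Conv(Cat(X1, X2, X3))) on the grid of the middle level."""
    c2, h2, w2 = f2.shape
    x1 = conv_bn_relu_stub(max_pool2x2(f1), c2, rng)              # shallow: pool, then match channels
    x2 = f2
    x3 = bilinear_resize(conv_bn_relu_stub(f3, c2, rng), h2, w2)  # deep: match channels, then upsample
    y = np.concatenate([x1, x2, x3], axis=0)                      # (3 * c2, h2, w2)
    y = conv_bn_relu_stub(conv_bn_relu_stub(y, c2, rng), c2, rng)
    return x2 + y                                                 # residual keeps X2


rng = np.random.default_rng(0)
f1 = rng.standard_normal((24, 64, 64))   # toy channel counts
f2 = rng.standard_normal((32, 32, 32))
f3 = rng.standard_normal((96, 16, 16))
x = mca(f1, f2, f3, rng)                 # shape (32, 32, 32)
```

The middle level is kept as the residual base because the formula adds X2 directly; the other two levels are resampled onto its grid before concatenation.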

[0025] Preferably, a DSConv2d layer is introduced on the basis of the convolutional neural network to form a dynamic partial convolutional attention module. The DSConv2d layer combines dynamic weight adjustment and includes selective convolution bias and convolution bias weights. In DSConv2d, the input channel is divided into several blocks according to the block size, and each block undergoes independent dynamic weight adjustment.

[0026] Preferably, the dynamic partial convolutional attention module adopts a dual-branch strategy, and the operation flow of the dynamic partial convolutional attention module is as follows:

[0027] First, the multi-layer feature maps extracted from the dual-temporal T1 and T2 remote sensing images are concatenated, and their absolute difference is computed at the same time, yielding the feature maps F_cat and F_abs;

[0028] Then, the input features F_cat and F_abs are processed by DSConv2d to generate the spatially dependent Q, K, and V representations as follows:

[0029] $Q = \mathrm{DSConv2d}(F_{abs})$

[0030] $K = \mathrm{DSConv2d}(F_{abs})$

[0031] $V = \mathrm{DSConv2d}(F_{abs})$

[0032] After generating Q and K through dynamic convolution, they are fed into the split channel module respectively;

[0033] Then, a 3×3 convolution is performed on each separated feature map to extract unique feature information;

[0034] The two convolved feature maps are concatenated and then further refined by two 1×1 convolutions;

[0035] The learning process is enhanced by utilizing residual connections, ultimately producing refined features Q′ and K′;

[0036] The split channel module is represented as:

[0037] $Q' = \mathrm{SCM}(Q)$

[0038] $K' = \mathrm{SCM}(K)$

[0039] Here, SCM stands for Separate Channel Module, which extracts more diverse feature information by convolving different channel parts separately;
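The SCM steps in [0032] to [0035] can be sketched in NumPy as follows. Assumptions, none of which the patent specifies: channels-first (C, H, W) arrays with an even channel count, an even two-way channel split, zero-padded stride-1 3×3 convolutions, a ReLU between the two 1×1 convolutions, and no batch normalisation. The keys in `params` are hypothetical names.

```python
import numpy as np


def conv3x3(x, w):
    """Zero-padded, stride-1 3x3 convolution (cross-correlation).
    x: (C_in, H, W); w: (C_out, C_in, 3, 3) -> (C_out, H, W)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            # Shifted view of the padded input for kernel tap (i, j).
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out


def conv1x1(x, w):
    """1x1 convolution as a channel-mixing matrix product. w: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)


def scm(x, params):
    """Separate Channel Module sketch: split channels in half, convolve each
    half separately, concatenate, refine with two 1x1 convs, add residual."""
    c = x.shape[0]
    a = conv3x3(x[: c // 2], params["w_a"])  # first half of the channels
    b = conv3x3(x[c // 2:], params["w_b"])   # second half of the channels
    y = np.concatenate([a, b], axis=0)       # back to C channels
    y = np.maximum(conv1x1(y, params["p1"]), 0.0)
    y = conv1x1(y, params["p2"])
    return x + y                             # residual connection


rng = np.random.default_rng(0)
c = 8
params = {
    "w_a": rng.standard_normal((c // 2, c // 2, 3, 3)) * 0.1,
    "w_b": rng.standard_normal((c // 2, c // 2, 3, 3)) * 0.1,
    "p1": rng.standard_normal((c, c)) * 0.1,
    "p2": rng.standard_normal((c, c)) * 0.1,
}
q = rng.standard_normal((c, 16, 16))
q_refined = scm(q, params)  # Q' = SCM(Q), same shape as Q
```

Each half only sees its own channels in the 3×3 stage, which is where the parameter saving comes from; the 1×1 convolutions then mix the two halves again.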

[0040] Based on the obtained enhanced Q′ and K′, matrix multiplication is performed, and then Softmax is applied to compute attention weights, which are used to modulate V, producing the final feature output for change detection:

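[0041] $C = \mathrm{Softmax}\left(Q'(K')^{T}\right)$

[0042] $C \cdot V = \mathrm{Softmax}\left(Q'(K')^{T}\right) \cdot V$

[0043] $C \cdot V + F_{abs} = \mathrm{Softmax}\left(Q'(K')^{T}\right) \cdot V + F_{abs}$.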
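A NumPy sketch of this attention step, assuming Q′, K′ and V are flattened over spatial positions so that Softmax(Q′(K′)ᵀ) is an N×N map with N = H×W. The patent says Q, K and V are spatially dependent but does not state the token layout, so this is an assumption. No 1/√d scaling is applied because none appears in the formulas above.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def dpcatt_attention(q, k, v, f_abs):
    """q, k: (C_qk, H, W); v, f_abs: (C, H, W).
    Returns (Softmax(Q' K'^T) V + F_abs, attention map)."""
    c, h, w = v.shape
    q2 = q.reshape(q.shape[0], -1).T      # (N, C_qk), N = H * W
    k2 = k.reshape(k.shape[0], -1).T      # (N, C_qk)
    v2 = v.reshape(c, -1).T               # (N, C)
    attn = softmax(q2 @ k2.T, axis=-1)    # (N, N); each row sums to 1
    out = (attn @ v2).T.reshape(c, h, w)  # modulate V, back to (C, H, W)
    return out + f_abs, attn              # residual connection with F_abs


rng = np.random.default_rng(0)
q, k = rng.standard_normal((2, 4, 8, 8))
v, f_abs = rng.standard_normal((2, 16, 8, 8))
out, attn = dpcatt_attention(q, k, v, f_abs)
```

The N×N map grows quadratically with the number of pixels, which is one reason such attention is typically applied to the smaller, deeper feature levels.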

[0044] Preferably, the parallel decoding aggregator module employs dual-branch decoding, with one branch using concatenation and the other using feature subtraction; the parallel decoding aggregator module aggregates four layers of feature maps in a bottom-up manner.

[0045] Preferably, the operation flow of the parallel decoding aggregator module is as follows:

[0046] First, input two feature maps;

[0047] Then, bilinear interpolation is used to ensure that the feature sizes of the two feature maps are consistent;

[0048] Next, the two feature maps are combined as follows:

[0049]

[0050] Where Conv3×3 represents the combination of 3×3 convolution, batch normalization and ReLU activation function, and Up(·) is the bilinear upsampling operation;

[0051] The two feature maps are then concatenated, and another 3×3 convolution is used to adjust the number of channels after concatenation.

[0052] The resulting features are then divided into two branches: the first branch uses a 3×3 convolution with a dilation rate of 3; the second branch uses parallel 3×3 convolutions with dilation rates of 2 and 1.

[0053] Finally, these three feature maps are concatenated to obtain the final feature map.
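The three dilated branches in [0052] and [0053] can be sketched as below, assuming channels-first arrays, zero padding equal to the dilation rate so that the spatial size is preserved, and random weights; the preceding upsampling and concatenation, and any BN/ReLU, are omitted. A 3×3 kernel with dilation d spans 2d + 1 pixels, so rates 3, 2 and 1 cover 7×7, 5×5 and 3×3 neighbourhoods, each with the same nine weights.

```python
import numpy as np


def dilated_conv3x3(x, w, d=1):
    """Zero-padded, stride-1 3x3 convolution with dilation rate d.
    x: (C_in, H, W); w: (C_out, C_in, 3, 3) -> (C_out, H, W)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (d, d), (d, d)))  # padding d keeps H and W unchanged
    out = np.zeros((w.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            # Kernel tap (i, j) reads the input shifted by ((i-1)*d, (j-1)*d).
            patch = xp[:, i * d:i * d + h, j * d:j * d + wd]
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], patch)
    return out


def dilated_branches(x, weights):
    """Branch 1: dilation 3; branch 2: parallel dilations 2 and 1.
    The three outputs are concatenated along channels."""
    return np.concatenate([dilated_conv3x3(x, weights[d], d) for d in (3, 2, 1)], axis=0)


rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16, 16))
weights = {d: rng.standard_normal((8, 16, 3, 3)) * 0.1 for d in (1, 2, 3)}
fused = dilated_branches(x, weights)  # (24, 16, 16)
```

Mixing a large and small dilation keeps both wide context (rate 3) and dense local detail (rate 1), which is the stated motivation for using them on small change regions.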

[0054] Preferably, the loss function is:

[0055]

[0056] Where p represents the model's predicted change probability, y is the binary true label, and N is the number of pixels.
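Since [0055] defines only the symbols p, y and N, the following sketch assumes the standard unweighted binary cross-entropy, L = −(1/N) Σᵢ [yᵢ log pᵢ + (1 − yᵢ) log(1 − pᵢ)], which matches the binary cross-entropy named in [0079]. The clipping constant `eps` is an added numerical safeguard, not part of the patent.

```python
import numpy as np


def bce_loss(p, y, eps=1e-7):
    """Mean pixel-wise binary cross-entropy.
    p: predicted change probabilities; y: binary labels (0 or 1)."""
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


# An uninformative prediction of 0.5 costs log(2) per pixel.
print(bce_loss(np.array([0.5, 0.5]), np.array([1, 0])))
```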

[0057] Compared with the prior art, the beneficial effects of the present invention are as follows:

[0058] This invention proposes a change detection method based on Siamese convolutional neural networks: a dual-branch parallel multi-scale dynamic partial convolutional attention network (DMDPCANet) for remote sensing change detection. This method can accurately detect changes in captured remote sensing images. The main contributions of this invention are as follows:

[0059] 1) This invention provides a CNN-based remote sensing change detection network with dual-branch multi-scale dynamic partial convolutional attention, which can accurately detect semantic changes.

[0060] 2) This invention provides a multi-scale context aggregation module (MCA), which aggregates feature information between adjacent layers, not only retaining the original feature information but also supplementing it with information from other layers, so that the network of this invention maintains good detection performance.

[0061] 3) This invention provides a Dynamic Partial Convolutional Attention Module (DPCATT) that effectively integrates raw input features with features enhanced through an attention mechanism. This integration ensures the model maintains high accuracy and robustness when performing remote sensing change detection tasks. By leveraging attention weights to highlight important aspects of the input data, the model can better distinguish subtle changes in the environment.

[0062] 4) This invention provides a parallel decoding aggregator module (PDA) that fuses features at different scales through upsampling and dilated convolution, effectively capturing multi-layer information. It uses a dual-branch input, where one branch carries the concatenated features and the other carries the feature difference obtained by subtraction. Both are fed into the PDA, which uses upsampling, multiple 3×3 convolutions, and dilated convolutions to progressively upsample features from deep to shallow layers to the same size, ultimately outputting a change map. Furthermore, a loss function is used to perform deep supervision on the multi-layer output features.

Attached Figure Description

[0063] Figure 1 is the overall framework diagram of the present invention;

[0064] Figure 2 is a schematic diagram of the multi-scale context aggregation module;

[0065] Figure 3 is a schematic diagram of the dynamic partial convolutional attention module;

[0066] Figure 4 is a schematic diagram of the parallel decoding aggregator module.

Detailed Implementation

[0067] The present invention will be further illustrated below with reference to specific embodiments. It should be understood that these embodiments are for illustrative purposes only and are not intended to limit the scope of the invention. Furthermore, it should be understood that after reading the teachings of this invention, those skilled in the art can make various alterations or modifications to the invention, and these equivalent forms also fall within the scope defined in this application.

[0068] Example: The present invention describes a two-branch, multi-scale dynamic local convolutional attention method based on remote sensing change detection, comprising the following steps:

[0069] S1. Collect dual-temporal remote sensing images and create a remote sensing image dataset. Ensure that each dual-temporal remote sensing image has pixel-level annotations. Divide the remote sensing image dataset into a training set, a validation set, and a test set for use at different stages. At the same time, crop the dual-temporal remote sensing images in the remote sensing image dataset to the same size, for example 256×256.

[0070] S2. Construct a dual-branch multi-scale dynamic partial convolutional attention network architecture based on remote sensing change detection, including a feature extractor, a dynamic multi-scale convolutional attention module, a parallel decoding aggregator module, and a loss function. The dynamic multi-scale convolutional attention module is formed by combining the dynamic partial convolutional attention module and the multi-scale context aggregation module.

[0071] S3. Use a pre-trained lightweight network, for example MobileNetV2, to extract features from the dual-temporal remote sensing images.

[0072] S4. The extracted multi-layer features are input into the dynamic multi-scale convolutional attention module (composed of dynamic partial convolutional attention module (DPCATT) and multi-scale context aggregation (MCA)). The original features and enhanced features are combined to capture rich details and obtain the feature information of the change region.

[0073] S5. Input the features obtained from the dynamic multi-scale convolutional attention module into the parallel decoding aggregator module. Here, dilated convolution is used to expand the receptive field, which helps preserve the feature information of small change regions. The multi-layer prediction maps output by the parallel decoding aggregator module and the loss function are used for deep supervision so that richer information is learned.

[0074] S6. Save the trained model weights, test them using the test set, and obtain the change detection results for the test image.
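Step S1 can be sketched as follows, assuming co-registered images held as NumPy arrays, non-overlapping 256×256 crops (the size given in [0069]) and a random 7:1:2 train/validation/test split. The split ratio and the helper names `tile_pair` and `split_indices` are hypothetical; the patent does not state a ratio.

```python
import numpy as np


def tile_pair(img_t1, img_t2, label, patch=256):
    """Crop a co-registered bi-temporal pair and its change mask into
    non-overlapping patch x patch tiles; edge remainders are dropped."""
    assert img_t1.shape[:2] == img_t2.shape[:2] == label.shape[:2]
    h, w = label.shape[:2]
    tiles = []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            window = (slice(y, y + patch), slice(x, x + patch))
            tiles.append((img_t1[window], img_t2[window], label[window]))
    return tiles


def split_indices(n, ratios=(0.7, 0.1, 0.2), seed=0):
    """Shuffle n tile indices and split them into train/val/test index arrays.
    The 7:1:2 ratio is a hypothetical default, not taken from the patent."""
    idx = np.random.default_rng(seed).permutation(n)
    n_train = int(ratios[0] * n)
    n_val = int(ratios[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


t1 = np.zeros((1024, 1024, 3), dtype=np.uint8)
t2 = np.zeros((1024, 1024, 3), dtype=np.uint8)
mask = np.zeros((1024, 1024), dtype=np.uint8)
tiles = tile_pair(t1, t2, mask)  # 16 tiles of 256 x 256
train, val, test = split_indices(len(tiles))
```

Splitting at the tile level from a single scene can leak spatial context between subsets; splitting whole scenes is a common alternative.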

[0075] The following sections introduce in detail the dual-branch, multi-scale dynamic partial convolutional attention network architecture based on remote sensing change detection mentioned in the above steps, as well as the feature extractor (MobileNetV2), the dynamic multi-scale convolutional attention module (composed of the dynamic partial convolutional attention module (DPCATT) and multi-scale context aggregation (MCA)), the parallel decoding aggregator module (PDA), and the loss function.

[0076] 1) A dual-branch, multi-scale, dynamic partial convolutional attention network architecture based on remote sensing change detection

[0077] The overall structure of DMDPCANet is shown in Figure 1. A pre-trained neural network extracts multi-layer features from the dual-temporal images. A Dynamic Partial Convolutional Attention Module (DPCATT) and Multi-Scale Context Aggregation (MCA) are combined to form a Dynamic Multi-Scale Convolutional Attention Module (DMCA), which captures different semantic information about the changes. A Parallel Decoding Aggregator (PDA), consisting of upsampling and dilated convolutions, generates multi-layer prediction maps. A loss function is used for deep supervision of the multi-layer predictions. Finally, the final prediction maps output by the two branches are combined to generate a change map.

[0078] The dual-branch multi-scale dynamic partial convolutional attention network based on remote sensing change detection is a dual-branch attention network that combines global and local contextual information.

[0079] Specifically, it includes a feature extractor, multi-scale context aggregation (MCA), dynamic partial convolutional attention (DPCATT), and a parallel decoding aggregator (PDA). It also uses a binary cross-entropy loss function to perform deep supervision (DS) on the output multi-level prediction maps, finally generating a change map. First, multi-level feature maps of sizes 64×64, 32×32, 16×16, and 8×8 are extracted from the dual-temporal remote sensing images in the two parallel branches (MobileNetV2) of a pre-trained, weight-shared CNN. Then, the multi-scale context aggregation module aggregates features across different layers, which enhances the network's sensitivity to and recognition of small change regions in the remote sensing images, improves feature representation and discriminative power, and suppresses irrelevant interference. In the DPCATT module, concatenation and absolute-value subtraction are used to generate two parallel branches. Borrowing the concept of self-attention, the two branches both use dynamic convolution to generate queries, keys, and values, extracting richer and more meaningful spatial relationships and contextual information and enhancing the model's ability to capture subtle changes. The multi-scale context aggregation module then fully integrates the information from both branches, obtaining different semantic change information and significantly improving the model's robustness and adaptability in complex remote sensing scenarios. Finally, a parallel decoding aggregator employs multi-path feature processing and dilated convolution strategies to generate a prediction map by combining feature information from different layers of the network. Overall, the DMDPCANet network of this invention significantly improves change detection accuracy by fusing interactive information between the two temporal images.
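Deep supervision over the multi-level prediction maps can be sketched as a sum of per-level binary cross-entropy terms. Assumptions not stated in the patent: every level is compared with the full-resolution label after nearest-neighbour upsampling, maps are square with integer size ratios, and all levels are weighted equally by default.

```python
import numpy as np


def bce(p, y, eps=1e-7):
    """Mean pixel-wise binary cross-entropy; eps guards against log(0)."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def upsample_nearest(p, size):
    """Nearest-neighbour upsampling of a square (h, h) map to (size, size);
    assumes size is an integer multiple of h."""
    f = size // p.shape[0]
    return np.repeat(np.repeat(p, f, axis=0), f, axis=1)


def deep_supervision_loss(preds, label, weights=None):
    """Weighted sum of one BCE term per prediction level (equal weights by
    default), each level upsampled to the label's resolution first."""
    if weights is None:
        weights = [1.0] * len(preds)
    size = label.shape[0]
    return sum(w * bce(upsample_nearest(p, size), label)
               for w, p in zip(weights, preds))


label = np.zeros((8, 8))
label[:4, :4] = 1.0
preds = [np.full((8, 8), 0.5), np.full((4, 4), 0.5), np.full((2, 2), 0.5)]
loss = deep_supervision_loss(preds, label)  # three levels, each log(2)
```

An alternative design downsamples the label to each prediction's resolution instead; upsampling the predictions keeps every term at the label's native resolution.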

[0080] The specific implementation is as follows:

[0081] First, to process high-resolution dual-temporal remote sensing images effectively, this invention uses a lightweight network (MobileNetV2) to extract dual-temporal feature maps, capturing key visual features at each time point. By comparing the feature maps from the two time points, the model of this invention accurately and effectively distinguishes between changed and unchanged regions, and performs precise quantitative analysis of these changed regions.

[0082] Considering the complexity of change detection scenarios, this invention removes the Global Average Pooling (GAP) layer and the final fully connected layer from the backbone network to better suit this task. A convolutional layer with a stride of 2 is added to each layer of the backbone, downsampling the feature map of each level to half the size of the previous level. By feeding a pair of bi-temporal remote sensing images, T1 and T2, into the feature extractor (MobileNetV2), this invention obtains four layers of bi-temporal feature maps, represented as follows:

Compared to other CNN backbones, MobileNetV2 employs depthwise separable convolutions, significantly reducing the model's parameters and computational complexity. This design allows MobileNetV2 to run efficiently in resource-constrained environments while maintaining strong feature extraction capability and processing speed.
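The parameter saving of depthwise separable convolutions can be checked with a short calculation: a k×k standard convolution has k·k·C_in·C_out weights, while a k×k depthwise convolution followed by a 1×1 pointwise convolution has k·k·C_in + C_in·C_out. The 96-channel layer below is an arbitrary example, and biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a k x k standard convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """k x k depthwise conv (one filter per input channel) followed by a
    1 x 1 pointwise conv that mixes channels (biases ignored)."""
    return k * k * c_in + c_in * c_out


std = standard_conv_params(3, 96, 96)
dws = depthwise_separable_params(3, 96, 96)
print(std, dws, round(std / dws, 2))  # 82944 10080 8.23
```

For 3×3 kernels the ratio approaches 9 as the channel count grows, because the pointwise term C_in·C_out dominates the depthwise term.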

[0083] 2) Multi-scale context aggregation

[0084] The Multi-scale Context Aggregation (MCA) module is shown in Figure 2. It is generally believed that deep features contain rich semantic information, while shallow features contain detailed and texture information about objects. Therefore, this invention combines the extracted multi-level features to enhance their representational power.

[0085] This invention uses the T1 remote sensing image as input to the network model and feeds the first three of the four extracted feature layers into the MCA module. The first-layer feature map is downsampled by max pooling to the same size as the second-layer feature map and then passed through a convolutional layer that increases its channel count to match the second layer. Similarly, the third-layer feature map first passes through a 3×3 convolutional layer so that its channel count matches the second layer, and is then upsampled by bilinear interpolation to the same size as the second layer. Then, X1, X2, and X3 are combined by concatenation to gather multi-level feature information, and residual learning is employed to retain the original feature information while enriching the contextual feature information. The entire representation process can be expressed as:

[0086]

[0087] $X = X_2 + \mathrm{Conv}_{3\times3}\left(\mathrm{Conv}_{3\times3}\left(\mathrm{Cat}(X_1, X_2, X_3)\right)\right)$

[0088] Where MaxPooling(·) represents max pooling, Cat(·) represents concatenation, Conv3×3 represents a function consisting of 3×3 convolution, batch normalization, and ReLU activation, and Interpolate represents bilinear upsampling.

[0089] 3) Dynamic Partial Convolution Attention Module

[0090] The Dynamic Partial Convolutional Attention Module (DPCATT) is shown in Figure 3. In deep learning, Convolutional Neural Networks (CNNs) extract multi-scale features from images through stacked convolutional layers. However, standard convolution operations are computationally intensive and involve many parameters, especially when processing high-resolution images and large datasets. To improve the efficiency of convolution while maintaining strong feature extraction capability, this invention introduces an enhanced convolutional layer called DSConv2d. This layer incorporates dynamic weight adjustment and includes a selective convolution bias (KDSBias) and convolution bias weights (CDS). DSConv2d can be used in various complex neural network architectures, particularly in the query, key, and value layers of attention mechanisms. By introducing dynamic weight adjustment and selective convolution bias, DSConv2d significantly improves the model's performance in complex scenarios. The block size parameter defines the size of the blocks into which the input channels are divided during dynamic weight adjustment: in DSConv2d, the input channels are divided into several blocks according to the block size, and each block undergoes independent dynamic weight adjustment. This block-based processing allows finer control over convolution operations, increasing the model's flexibility and feature capture capability.
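The patent does not specify how DSConv2d computes its dynamic weights, nor the exact roles of KDSBias and CDS, so the sketch below illustrates only the block partition idea: channels are split into blocks of `block_size`, and each block is rescaled by its own input-dependent gate. The mean-plus-sigmoid gate and the parameters `gate_w` and `gate_b` are hypothetical choices, and the convolution that DSConv2d itself would apply is omitted.

```python
import numpy as np


def blockwise_dynamic_scale(x, block_size, gate_w, gate_b):
    """Hypothetical illustration of block-wise dynamic weighting.
    x: (C, H, W) with C divisible by block_size; gate_w, gate_b: one value
    per block. Each channel block is summarised by its mean activation and
    rescaled by its own sigmoid gate, independently of the other blocks."""
    c, h, w = x.shape
    n_blocks = c // block_size
    blocks = x.reshape(n_blocks, block_size, h, w)
    summary = blocks.mean(axis=(1, 2, 3))                      # (n_blocks,)
    gate = 1.0 / (1.0 + np.exp(-(gate_w * summary + gate_b)))  # (n_blocks,)
    return (blocks * gate[:, None, None, None]).reshape(c, h, w)


rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8, 8))
n_blocks = 16 // 4
y = blockwise_dynamic_scale(x, block_size=4,
                            gate_w=rng.standard_normal(n_blocks),
                            gate_b=np.zeros(n_blocks))
```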

[0091] Concatenation and absolute difference produce different feature information in the two branches. To enrich the feature information of each layer, this invention adopts a bi-branch strategy and uses the Dynamic Partial Convolutional Attention Module (DPCATT), which captures the boundary, semantic, and texture information of each layer.

[0092] As shown in Figure 3, this invention concatenates the multi-layer feature maps extracted from the dual-temporal remote sensing images T1 and T2 and simultaneously computes their absolute difference, obtaining the feature maps F_cat and F_abs. This allows each branch to capture different feature information, which is then fused and fed into separate paths. First, the input features F_cat and F_abs are processed by DSConv2d to generate the spatially dependent Q, K, and V, which can be expressed as:

[0093] $Q = \mathrm{DSConv2d}(F_{abs})$

[0094] $K = \mathrm{DSConv2d}(F_{abs})$

[0095] $V = \mathrm{DSConv2d}(F_{abs})$
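The two DPCATT branch inputs for one pyramid level can be sketched as follows, assuming channels-first arrays. The 1×1 projections in the hypothetical helper `qkv_stub` stand in for DSConv2d, whose internals are not fully specified; as in [0093] to [0095], Q, K and V are all computed from F_abs.

```python
import numpy as np


def bitemporal_inputs(f_t1, f_t2):
    """Branch inputs for one level: concatenation and absolute difference."""
    f_cat = np.concatenate([f_t1, f_t2], axis=0)  # (2C, H, W): both dates stacked
    f_abs = np.abs(f_t1 - f_t2)                   # (C, H, W): per-pixel change magnitude
    return f_cat, f_abs


def qkv_stub(f, c_qk, rng):
    """Hypothetical stand-in for DSConv2d: three random 1x1 projections."""
    c = f.shape[0]

    def project(c_out):
        w = rng.standard_normal((c_out, c)) * 0.1
        return np.einsum("oc,chw->ohw", w, f)

    return project(c_qk), project(c_qk), project(c)


rng = np.random.default_rng(0)
f_t1 = rng.standard_normal((32, 32, 32))
f_t2 = rng.standard_normal((32, 32, 32))
f_cat, f_abs = bitemporal_inputs(f_t1, f_t2)
q, k, v = qkv_stub(f_abs, c_qk=16, rng=rng)
```

F_cat keeps both dates intact (twice the channels), while F_abs is zero wherever the two dates agree, which is why the two branches carry different information.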

[0096] After Q and K are generated through dynamic convolution, they are fed into a split channel module (SCM). The SCM splits the channels of each input into two halves, which reduces both the number of parameters and the computational complexity. A 3×3 convolution is then applied to each half to extract distinct feature information. The two convolved feature maps are concatenated and further refined by two 1×1 convolutions. Residual connections are used to enhance the learning process, ultimately producing the refined features Q′ and K′. This approach lets the network capture feature representations across different ranges while remaining efficient. The SCM can be represented as:

[0097] $Q' = \mathrm{SCM}(Q)$

[0098] $K' = \mathrm{SCM}(K)$

[0099] Here, SCM stands for Separate Channel Module, which extracts more diverse feature information by convolving different channel groups separately; each group can focus on different types of features, capturing finer-grained details. Matrix multiplication is performed on the enhanced Q′ and K′, and Softmax is then applied to compute the attention weights. These weights are used to modulate V, producing the final feature output for change detection. Residual connections are also used to merge the output features with the original features. This enhances the model's ability to capture complex details and improves the overall feature representation.

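[0100] $C = \mathrm{Softmax}\left(Q'(K')^{T}\right)$

[0101] $C \cdot V = \mathrm{Softmax}\left(Q'(K')^{T}\right) \cdot V$

[0102] $C \cdot V + F_{abs} = \mathrm{Softmax}\left(Q'(K')^{T}\right) \cdot V + F_{abs}$.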

[0103] The resulting feature map effectively integrates the original input features with features enhanced through an attention mechanism. This integration ensures the model maintains high accuracy and robustness when performing remote sensing change detection tasks. By leveraging attention weights to highlight important aspects of the input data, the model can better distinguish subtle changes in the environment, ultimately leading to more accurate and reliable change detection results.

[0104] 4) Parallel decoding aggregator

[0105] To better capture the temporal changes of features, this invention designs a parallel decoding aggregator module, as shown in Figure 4. The two-branch decoding is retained: one branch uses concatenation, while the other uses feature subtraction. This fully exploits the different semantic information carried by the features obtained through concatenation and subtraction. The PDA module aggregates the four-layer feature maps in a bottom-up manner, fully integrating the change semantic information. Taking one pair of input feature maps as an example, bilinear interpolation is first used to make the sizes of the two feature maps consistent, and the two maps are then combined as follows:

[0106]

[0107] Here, Conv3×3 represents a combination of 3×3 convolution, batch normalization, and a ReLU activation function, and Up(·) is a bilinear upsampling operation.
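Because the equation in [0106] is not reproduced here, the sketch below shows only the two operations the text names for this step: bilinear upsampling Up(·) of the lower-resolution map to the size of the higher-resolution one, followed by channel concatenation. The subsequent Conv3×3 (convolution, batch normalization, ReLU) is omitted, and the align-corners coordinate mapping used for interpolation is an assumption; a half-pixel convention is also common and gives slightly different values. Function names are hypothetical.

```python
def bilinear_upsample(plane, out_h, out_w):
    """Bilinear resize of one [H][W] plane.
    Assumption: align-corners mapping, so corner pixels are preserved."""
    in_h, in_w = len(plane), len(plane[0])

    def src(i, n_out, n_in):
        # Map an output index to a fractional input coordinate
        return 0.0 if n_out == 1 else i * (n_in - 1) / (n_out - 1)

    out = []
    for i in range(out_h):
        y = src(i, out_h, in_h)
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        fy = y - y0
        row = []
        for j in range(out_w):
            x = src(j, out_w, in_w)
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            fx = x - x0
            top = plane[y0][x0] * (1 - fx) + plane[y0][x1] * fx
            bottom = plane[y1][x0] * (1 - fx) + plane[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out

def up_and_concat(high_res, low_res):
    """Concat(high_res, Up(low_res)) along channels; both are [C][H][W]."""
    h, w = len(high_res[0]), len(high_res[0][0])
    return high_res + [bilinear_upsample(p, h, w) for p in low_res]
```

For example, upsampling [[0, 1], [2, 3]] to 3×3 keeps the four corner values and places their average, 1.5, at the center.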

[0108] The concatenation of … and … is then refined by another 3×3 convolution that adjusts the number of channels. The resulting features are split into two branches: the first branch applies a 3×3 convolution with a dilation rate of 3, and the second applies parallel 3×3 convolutions with dilation rates of 2 and 1. The three resulting feature maps are then concatenated to obtain the final feature map. Dilated convolutions are used because they enlarge the receptive field without reducing spatial resolution, which lowers the chance of missing small changes and lets the model capture changes more comprehensively.

[0109]

[0110] $\mathrm{Out} = \mathrm{Conv}_{3\times3}\left(\mathrm{Concat}(X_1, X_2, X_3)\right)$
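To make the dilated-convolution step of [0108] and [0110] concrete, the sketch below applies single-channel 3×3 convolutions with dilation rates 3, 2, and 1, using padding equal to the dilation so that every branch keeps the input size, and then fuses the concatenated X1, X2, X3 with one 3×3 convolution. A 3×3 kernel with dilation rate d covers a (2d + 1) × (2d + 1) window, so rate 3 sees a 7 × 7 neighborhood with only nine weights. Single-channel input, a single output channel, and the omission of batch normalization and ReLU in the fusion step are simplifying assumptions; the function names are hypothetical.

```python
def dilated_conv3x3(plane, kernel, d):
    """Single-channel 3x3 convolution with dilation d and zero padding d,
    so the output keeps the input's H x W while the taps span (2d+1)^2 pixels."""
    h, w = len(plane), len(plane[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(3):
                for b in range(3):
                    ii, jj = i + (a - 1) * d, j + (b - 1) * d
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += kernel[a][b] * plane[ii][jj]
            out[i][j] = acc
    return out

def pda_head(feature, k3, k2, k1, fuse):
    """X1: dilation 3; X2, X3: parallel dilations 2 and 1.
    Out = 3x3 conv over Concat(X1, X2, X3); `fuse` holds one 3x3 kernel
    per concatenated channel (assumption: one output channel, no BN/ReLU)."""
    x1 = dilated_conv3x3(feature, k3, 3)
    x2 = dilated_conv3x3(feature, k2, 2)
    x3 = dilated_conv3x3(feature, k1, 1)
    stacked = [x1, x2, x3]  # channel concatenation
    h, w = len(feature), len(feature[0])
    out = [[0.0] * w for _ in range(h)]
    for c, plane in enumerate(stacked):  # sum per-channel 3x3 responses
        part = dilated_conv3x3(plane, fuse[c], 1)
        for i in range(h):
            for j in range(w):
                out[i][j] += part[i][j]
    return stacked, out
```

Feeding a single bright pixel through the rate-3 branch lights up a 3 × 3 grid of positions spaced three pixels apart, which is the enlarged, sparse sampling pattern that gives dilated convolution its wider receptive field.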

[0111] 5) Loss Function

[0112] The present invention trains DMDPCANet with a cross-entropy (CE) loss. CE measures the difference between the predicted probability distribution and the true distribution and is widely used for classification tasks. In change detection, the task is formulated as pixel-level binary classification, in which each pixel is classified as either "changed" or "unchanged".

[0113]

[0114] In this equation, p represents the model's predicted probability of change, y is the binary ground-truth label (0 or 1), and N is the number of pixels. Minimizing this loss pushes the predicted probabilities toward the true labels, with confident but wrong predictions penalized most heavily.
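The equation in [0113] is not reproduced here; assuming it is the standard pixel-averaged binary cross-entropy, L = -(1/N) Σ [y log p + (1 - y) log(1 - p)], a direct sketch is shown below. The clamping constant eps is an implementation detail added to avoid log(0) and is not taken from the text.

```python
import math

def bce_loss(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over N pixels (assumed standard form).
    probs: predicted change probabilities p; labels: ground truth y in {0, 1}."""
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and equally long")
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(probs)
```

A prediction of 0.5 everywhere costs log 2 per pixel regardless of the labels, while a confident wrong prediction (p near 0 for a changed pixel) costs about -log(eps), which illustrates why the loss penalizes overconfident errors most.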

Claims

1. A dual-branch, multi-scale dynamic local convolutional attention method based on remote sensing change detection, characterized by comprising the following steps: S1. Collect dual-temporal remote sensing images and create a remote sensing image dataset, ensuring that each dual-temporal remote sensing image pair has pixel-level annotations; divide the dataset into a training set, a validation set, and a test set, and crop the dual-temporal remote sensing images in the dataset to the same size; S2. Construct a dual-branch, multi-scale dynamic partial convolutional attention network architecture based on remote sensing change detection, which includes a feature extractor, a dynamic multi-scale convolutional attention module, a parallel decoding aggregator module, and a loss function, the dynamic multi-scale convolutional attention module being formed by combining a dynamic partial convolutional attention module and a multi-scale context aggregation module; S3. Use a pre-trained lightweight network to extract features from the dual-temporal remote sensing images; S4. Input the extracted multi-layer features into the dynamic multi-scale convolutional attention module; S5. Input the features obtained from the dynamic multi-scale convolutional attention module into the parallel decoding aggregator module, in which dilated convolution is used to expand the receptive field, and use the multi-layer prediction maps output by the parallel decoding aggregator module together with the loss function for deep supervision; S6. Save the trained model weights, test them on the test set, and obtain the change detection results for the test images.
The operation flow of the dual-branch multi-scale dynamic partial convolutional attention network architecture in step S2 is as follows. First, multi-level feature maps with sizes of 64×64, 32×32, 16×16, and 8×8 are extracted from the dual-temporal remote sensing images by two parallel branches of a pre-trained, weight-shared CNN. Then, the multi-scale context aggregation module aggregates features across different layers, while the dynamic partial convolutional attention module uses concatenation and absolute-value subtraction to form two parallel branches, each of which uses dynamic convolution to generate queries, keys, and values. Combined with the multi-scale context aggregation module, the information from the two branches is fully integrated, yielding different semantic change information. Finally, the parallel decoding aggregator uses multi-path feature processing and dilated convolution to combine feature information from different network layers and generate a prediction map.
The dynamic partial convolutional attention module adopts a two-branch strategy, and its execution flow is as follows: first, the multi-layer feature maps extracted from the dual-temporal T1 and T2 remote sensing images are concatenated, and their differences are computed at the same time, yielding the … and … feature maps; then, the input features … and … are processed by DSConv2d to generate spatially dependent Q, K, and V representations as follows: …; after Q and K are generated by dynamic convolution, each is fed into the split channel module; a 3×3 convolution is applied to each separated feature map to extract distinct feature information; the two convolved feature maps are concatenated and further refined by two 1×1 convolutions; and a residual connection enhances learning, yielding the refined features … and …. The split channel module is represented as: …, where SCM stands for Separate Channel Module, which extracts more diverse feature information by convolving different channel parts separately. Matrix multiplication is performed on the enhanced … and …, and Softmax is then applied to compute attention weights, which modulate V to produce the final feature output for change detection.
The parallel decoding aggregator module employs two-branch decoding, with one branch using concatenation and the other using feature subtraction, and aggregates four layers of feature maps in a bottom-up manner. Its operation flow is as follows: first, two feature maps are input; then, bilinear interpolation is used so that the two feature maps have the same size; next, the two feature maps are concatenated as follows: …, where Conv3×3 represents a combination of 3×3 convolution, batch normalization, and ReLU activation, and Up(·) is a bilinear upsampling operation; another 3×3 convolution then adjusts the number of channels after concatenation.
The resulting features are then divided into two branches: the first branch uses a 3×3 convolution with a dilation rate of 3, and the second branch uses parallel 3×3 convolutions with dilation rates of 2 and 1. Finally, these three feature maps are concatenated to obtain the final feature map.
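The two parallel branches of the dynamic partial convolutional attention module start from concatenated and absolute-difference versions of the T1 and T2 features. The sketch below forms both branch inputs at each pyramid level using nested lists of shape [C][H][W]; the DSConv2d layers that subsequently generate Q, K, and V are not modeled, and the function names are hypothetical.

```python
def concat_branch(f_t1, f_t2):
    """Concatenation branch: stack T1 and T2 maps along channels.
    Inputs are [C][H][W]; output is [2C][H][W]."""
    if len(f_t1) != len(f_t2):
        raise ValueError("T1 and T2 features must have the same channel count")
    return f_t1 + f_t2

def abs_diff_branch(f_t1, f_t2):
    """Absolute-difference branch: elementwise |F_T1 - F_T2|, shape [C][H][W]."""
    return [[[abs(a - b) for a, b in zip(r1, r2)] for r1, r2 in zip(c1, c2)]
            for c1, c2 in zip(f_t1, f_t2)]

def build_branch_inputs(levels_t1, levels_t2):
    """Pair up the pyramid levels (e.g. 64x64 down to 8x8) produced by the
    weight-shared backbone and form both branch inputs at every level."""
    return [(concat_branch(a, b), abs_diff_branch(a, b))
            for a, b in zip(levels_t1, levels_t2)]
```

The concatenation branch keeps both acquisition dates side by side, while the difference branch is zero wherever the two dates agree, so the two inputs carry complementary change cues.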

2. The dual-branch multi-scale dynamic local convolutional attention method based on remote sensing change detection according to claim 1, characterized in that, in step S3, the lightweight network is MobileNetV2; a convolutional layer with a stride of 2 is added to each layer of the backbone network, downsampling the feature map of each level to half the size of the previous level; and by feeding a pair of bi-temporal remote sensing images T1 and T2 into the MobileNetV2 feature extractor, four layers of bi-temporal feature maps are obtained, represented as follows: ….
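Claim 2 states that each added stride-2 convolution halves the feature map of the previous level, and claim 1 lists level sizes of 64×64, 32×32, 16×16, and 8×8. The sketch below checks this with the standard convolution output-size formula, floor((n + 2p - k) / s) + 1, assuming a 3×3 kernel with padding 1, which the text does not specify. The function names are hypothetical.

```python
def conv_out_size(size, kernel=3, stride=2, padding=1):
    """Standard convolution output size: floor((n + 2p - k) / s) + 1.
    Assumption: 3x3 kernel, padding 1 (not given in the text)."""
    return (size + 2 * padding - kernel) // stride + 1

def pyramid_sizes(first_level, levels=4):
    """Spatial sizes when each extra stride-2 layer halves the previous level."""
    sizes = [first_level]
    for _ in range(levels - 1):
        sizes.append(conv_out_size(sizes[-1]))
    return sizes
```

Starting from a 64×64 first level, pyramid_sizes(64) reproduces the four sizes listed in claim 1. With these settings an odd input size rounds up, for example 15 becomes 8.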

3. The dual-branch multi-scale dynamic local convolutional attention method based on remote sensing change detection according to claim 2, characterized in that, the operation flow of the multi-scale context aggregation module is as follows: with the T1 remote sensing image as input to the network model, the first three of the four extracted feature layers are input into the multi-scale context aggregation module; … is downsampled by max pooling to the same size as …, and is then passed through a convolutional layer that increases its number of channels to match …; similarly, the number of channels of … is first matched to … by a 3×3 convolutional layer, and … is then upsampled by bilinear interpolation to the same size as …; X1, X2, and X3 are then concatenated to gather multi-level feature information, and residual learning is used to retain the original feature information while enriching the contextual feature information. The entire process can be expressed as: …, where Maxpooling(·) represents the max pooling operation, Cat(·) represents the concatenation operation, Conv3×3 represents a combination of 3×3 convolution, batch normalization, and ReLU activation, and Interpolate represents the upsampling operation.
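The alignment step of the multi-scale context aggregation module can be sketched as below. The claim's symbols are elided in this text, so the sketch assumes, hypothetically, that the shallowest of the three levels is max-pooled and the deepest is bilinearly upsampled to the middle level's size before concatenation. The channel-matching convolutions and the residual connection are omitted, and the X1/X2/X3 labels in the comments are illustrative only.

```python
def max_pool2x2(plane):
    """2x2 max pooling with stride 2 (assumes even H and W)."""
    return [[max(plane[i][j], plane[i][j + 1], plane[i + 1][j], plane[i + 1][j + 1])
             for j in range(0, len(plane[0]), 2)]
            for i in range(0, len(plane), 2)]

def bilinear_resize(plane, out_h, out_w):
    """Bilinear resize (assumption: align-corners coordinate mapping)."""
    in_h, in_w = len(plane), len(plane[0])

    def coord(i, n_out, n_in):
        return 0.0 if n_out == 1 else i * (n_in - 1) / (n_out - 1)

    out = []
    for i in range(out_h):
        y = coord(i, out_h, in_h)
        y0, fy = int(y), y - int(y)
        y1 = min(y0 + 1, in_h - 1)
        row = []
        for j in range(out_w):
            x = coord(j, out_w, in_w)
            x0, fx = int(x), x - int(x)
            x1 = min(x0 + 1, in_w - 1)
            top = plane[y0][x0] + (plane[y0][x1] - plane[y0][x0]) * fx
            bot = plane[y1][x0] + (plane[y1][x1] - plane[y1][x0]) * fx
            row.append(top + (bot - top) * fy)
        out.append(row)
    return out

def aggregate_three_levels(shallow, middle, deep):
    """Hypothetical alignment: all inputs are [C][H][W] nested lists.
    shallow (2H x 2W) is max-pooled, deep is upsampled, both to middle's H x W,
    then the three are concatenated along channels."""
    h, w = len(middle[0]), len(middle[0][0])
    x1 = [max_pool2x2(p) for p in shallow]           # illustrative X1
    x2 = middle                                       # illustrative X2
    x3 = [bilinear_resize(p, h, w) for p in deep]     # illustrative X3
    return x1 + x2 + x3
```

After alignment every level shares one spatial grid, so plain channel concatenation is enough to gather the multi-level context.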

4. The dual-branch multi-scale dynamic local convolutional attention method based on remote sensing change detection according to claim 3, characterized in that, a DSConv2d layer is introduced on the basis of a convolutional neural network to form the dynamic partial convolutional attention module; the DSConv2d layer incorporates dynamic weight adjustment and includes a selective convolution bias and convolution bias weights; and in DSConv2d, the input channels are divided into blocks according to the block size, and each block undergoes independent dynamic weight adjustment.
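Claim 4 says only that DSConv2d divides the input channels into blocks according to a block size and adjusts each block's weights dynamically and independently. The sketch below is a hypothetical stand-in for that idea, not the DSConv2d definition: each block receives a sigmoid gate computed from its mean activation, and each input channel's static kernel is scaled by its block's gate. The gate formula, the parameter names, and the omission of the selective bias are all assumptions.

```python
import math

def blockwise_dynamic_gates(channel_means, block_size, gate_weights, gate_bias):
    """Hypothetical: one input-dependent gate in (0, 1) per channel block.
    channel_means: per-input-channel mean activation of the current input."""
    gates = []
    for b, start in enumerate(range(0, len(channel_means), block_size)):
        block = channel_means[start:start + block_size]
        summary = sum(block) / len(block)             # block descriptor
        z = gate_weights[b] * summary + gate_bias[b]  # per-block affine map
        gates.append(1.0 / (1.0 + math.exp(-z)))      # sigmoid -> (0, 1)
    return gates

def modulate_kernel(kernel_per_in_channel, gates, block_size):
    """Scale each input channel's static [kh][kw] kernel by its block's gate.
    Assumption: the selective bias terms from the claim are not modeled."""
    return [[[w * gates[c // block_size] for w in row] for row in k]
            for c, k in enumerate(kernel_per_in_channel)]
```

With five input channels and a block size of 2, channels are grouped as {0, 1}, {2, 3}, {4}, so three gates are produced and channel 4 alone is governed by the last one.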

5. The dual-branch multi-scale dynamic local convolutional attention method based on remote sensing change detection according to claim 1, characterized in that, the loss function is: …, where p represents the model's predicted change probability, y is the binary ground-truth label, and N is the number of pixels.