A lightweight remote sensing image change detection method based on binary neural networks and large-kernel strip convolution.
By constructing a lightweight remote sensing image change detection method based on binary neural networks and large kernel strip convolution, the problems of low computational efficiency and insufficient accuracy in existing technologies are solved, and efficient remote sensing image change detection is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- XIDIAN UNIV
- Filing Date
- 2024-11-20
- Publication Date
- 2026-06-30
AI Technical Summary
Existing deep learning-based remote sensing image change detection methods suffer from low computational efficiency, large parameter count, and insufficient detection accuracy when processing high-resolution images or large-scale scenes.
A lightweight remote sensing image change detection method based on binary neural networks and large-kernel strip convolution is adopted. Two parallel branch networks and a depthwise separable convolutional detection head are constructed, and a binarized weight-sharing ResNet18 network is used for feature extraction. Both the weights and the activations are quantized to +1 or -1, which reduces the amount of computation and improves the efficiency of feature extraction.
The method achieves a 32-fold memory saving and a 58-fold CPU speedup, and significantly improves detection accuracy and sensitivity to subtle changes. It makes up for the shortcomings of traditional methods in capturing a wide range of contextual information and recognizing complex structures, thus improving both detection efficiency and accuracy.
Figure CN119672295B_ABST
Description
Technical Field
[0001] This invention belongs to the field of remote sensing image processing technology and relates to a lightweight remote sensing image change detection method. Specifically, it relates to a lightweight remote sensing image change detection method based on binary neural networks and large kernel strip convolution, which can be used in fields such as geological disaster monitoring, land cover surveys, and urban planning.
Background Technology
[0002] Remote sensing images are images of the Earth's surface acquired from a distance by spaceborne or airborne sensors, capturing various information such as surface features, vegetation cover, land use, urban expansion, and water distribution. Due to their wide coverage, remote sensing images can quickly scan vast areas, making them widely used in geographic and environmental monitoring. These images typically require digital image processing techniques to extract key information.
[0003] Change detection in remote sensing images refers to processing two images of the same location taken at different times to identify differences between them. Specifically, a change detection system compares two registered remote sensing images and assigns a binary label to each pixel, indicating whether the area corresponding to that pixel has changed between the two acquisition times. In the final change detection map, white typically indicates change, while black indicates no change.
[0004] Image change detection methods can be categorized into traditional methods and deep learning-based methods. Traditional methods include those based on image algebraic operations, image transformations, and post-classification. However, traditional change detection methods suffer from several drawbacks, including the need for complex image preprocessing, sensitivity to noise, difficulty in feature extraction, lack of adaptability, computational complexity, difficulty in real-time application, and often limitation to a single data source, leading to challenges in interpreting the results. These factors restrict both detection accuracy and efficiency. In contrast, deep learning-based methods possess end-to-end characteristics and powerful nonlinear mapping capabilities, effectively avoiding the tedious process of manually designing features in traditional methods.
[0005] Existing deep learning-based remote sensing image change detection methods have relatively high parameter counts and computational complexity, which may lead to low computational efficiency, especially when processing high-resolution images or large-scale scenes. For example, patent application CN118334532A, entitled "Lightweight Remote Sensing Change Detection Method and System Based on Dual-Time Image Remote Sensing," discloses a lightweight remote sensing image change detection method based on dual-temporal remote sensing images. The change detection network used in that invention includes a feature extraction backbone RSShuffle, LightSEPP, a Transformer encoder-decoder, and a prediction head. The network is trained on a dataset of dual-temporal remote sensing images and outputs a predicted change map of the changed region. That invention makes the model lightweight by using a reduced network architecture, which lowers the number of model parameters to some extent. However, since full-precision floating-point parameters are still used during computation, the storage and computation cost of the parameters remains large, which limits further improvement in detection efficiency.
Summary of the Invention
[0006] The purpose of this invention is to overcome the shortcomings of the existing technology and propose a lightweight remote sensing image change detection method based on binary neural network and large kernel strip convolution, aiming to improve detection efficiency while ensuring detection accuracy.
[0007] To achieve the above objectives, the technical solution adopted by the present invention includes the following steps:
[0008] (1) Obtain the training sample set, validation sample set, and test sample set:
[0009] N pairs of dual-temporal remote sensing images containing multiple target categories are preprocessed, and the changed regions in the corresponding image patches of the preprocessed dual-temporal remote sensing images are labeled. Then, the preprocessed image patches and their labels corresponding to more than half of the dual-temporal remote sensing images are used to form a training sample set, the preprocessed image patches and their labels corresponding to the remaining dual-temporal remote sensing images are used to form a validation sample set, and the other half is used to form a test sample set, where N≥3000;
[0010] (2) Construct a lightweight remote sensing image change detection network model based on binary neural networks and large-kernel strip convolution:
[0011] A lightweight remote sensing image change detection network model O is constructed, consisting of two branch networks with identical structures and arranged in parallel, as well as a subtractor and a depthwise separable convolutional detection head cascaded with them; both branch networks include a cascaded large kernel strip convolutional network and a binarized weight-sharing ResNet18 network; the large kernel strip convolutional network includes a cascaded channel mapping module, feature extraction module and max pooling module;
[0012] (3) Iteratively train the lightweight remote sensing image change detection network model:
[0013] The lightweight remote sensing image change detection network model O is iteratively trained using the training and validation sample sets to obtain the trained detection network model $O^*$;
[0014] (4) Obtain lightweight remote sensing image change detection results:
[0015] The test sample set is used as the input of the trained detection network model $O^*$ and propagated forward to obtain the change detection result corresponding to each test sample.
[0016] Compared with the prior art, the present invention has the following advantages:
[0017] 1. In the process of iteratively training the lightweight remote sensing image change detection network model and obtaining change detection results, this invention uses the binarized ResNet18 network in the two branch networks to perform deep feature extraction on the feature maps of the two temporal phases. Quantizing the weights and activations to +1 and -1 achieves a 32-fold memory saving and a 58-fold CPU acceleration. This avoids the large parameter storage caused by the 32-bit floating-point parameters used in the prior art, and effectively improves detection efficiency.
[0018] 2. This invention utilizes a large-kernel strip convolutional network in two branch networks to perform preliminary feature extraction on dual-temporal training samples. This efficiently captures the contextual and directional features of the training samples, significantly improving detection accuracy and sensitivity to subtle changes. It avoids the shortcomings of existing technologies, such as difficulty in capturing large-scale contextual information, insufficient recognition of complex structures and large-scale changes, and low efficiency in feature extraction for linear or elongated structures (e.g., roads, plants). It effectively compensates for the accuracy deficiencies caused by binary quantization.
Attached Figure Description
[0019] Figure 1 is a flowchart illustrating the implementation of the present invention;
[0020] Figure 2 is a diagram showing the overall structure of the remote sensing image change detection network model of this invention.
Detailed Implementation
[0021] The present invention will now be described in further detail with reference to the accompanying drawings and specific embodiments.
[0022] Referring to Figure 1, the present invention includes the following steps:
[0023] Step 1) Obtain the training sample set, the validation sample set, and the test sample set:
[0024] N pairs of dual-temporal remote sensing images containing multiple target categories are preprocessed, and the changed regions in the corresponding image patches of the preprocessed dual-temporal remote sensing images are labeled. Then, the preprocessed image patches and their labels corresponding to more than half of the dual-temporal remote sensing images are used to form a training sample set, the preprocessed image patches and their labels corresponding to the remaining dual-temporal remote sensing images are used to form a validation sample set, and the other half are used to form a test sample set, where N≥3000; in this embodiment, N=3000;
[0025] (1a) Preprocess N pairs of bitemporal remote sensing images containing multiple target categories;
[0026] (1a1) Collect raw remote sensing images of buildings. Raw dual-temporal remote sensing images refer to remote sensing images of the same area or object collected at different time points. Before data preprocessing, raw images often contain sensor noise, and the radiation intensity of the images may be inconsistent.
[0027] (1a2) Remote sensing image preprocessing: each pair of dual-temporal remote sensing images is filtered, and each pair of filtered dual-temporal remote sensing images is radiometrically corrected. Then, each pair of radiometrically corrected dual-temporal remote sensing images is spatially aligned.
[0028] (1a3) The dual-temporal remote sensing image is cropped into n pairs of dual-temporal remote sensing image blocks with a size of H×W. In this embodiment, n=10000 and the cropping size is 256×256.
[0029] (1b) Labeling of change regions in image patches corresponding to dual-temporal remote sensing images after preprocessing: Threshold-based detection, change vector analysis (CVA), and principal component analysis (PCA) are used to automatically identify change regions and generate preliminary label maps, which are then fine-tuned by manual adjustment;
[0030] (1c) Based on this set of dual-temporal remote sensing images and change label maps, construct a dataset containing the first temporal image, the second temporal image, and the related label map;
[0031] (1c1) Organize the images and ensure consistent naming; then create a dataset structure to classify and store the first temporal images, the second temporal images, and the label images;
[0032] (1c2) The preprocessed image patches and their labels corresponding to more than half of the dual-temporal remote sensing images are used to form a training sample set, the preprocessed image patches and their labels corresponding to the remaining dual-temporal remote sensing images are used to form a validation sample set, and the other half is used to form a test sample set. In this embodiment, the dataset is randomly divided into a 60% training sample set, a 20% validation sample set, and a 20% test sample set.
[0033] (1c3) Generate a metadata file that records the file paths, to ensure that the dataset is available for subsequent analysis and model training;
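The 60% / 20% / 20% random split described in (1c2) can be sketched as follows. This is a minimal illustration only: the function name `split_dataset`, the fixed seed, and the use of integer patch identifiers are assumptions, not details given in the patent.

```python
import random

def split_dataset(sample_ids, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle sample identifiers and split them into train/val/test lists."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # fixed seed (assumed) for reproducibility
    n_train = round(len(ids) * train_frac)
    n_val = round(len(ids) * val_frac)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]      # the remainder forms the test set
    return train, val, test

# n = 10000 image-patch pairs, as in the embodiment
train, val, test = split_dataset(range(10000))
print(len(train), len(val), len(test))
```

Splitting identifiers rather than image data keeps the first temporal image, the second temporal image, and the label map of each patch in the same subset.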
[0034] (1d) Input the generated dataset into the data augmentation module and set the corresponding data augmentation parameters. To ensure the model's generalization ability, only random horizontal flipping, random vertical flipping, and random blurring operations were enabled during the data augmentation process. These augmentation strategies can effectively improve the robustness of the model without introducing too many complex transformations, while avoiding excessive changes to the original image structure. Let each pair of original dual-temporal remote sensing images be input1 and input2:
[0035] (1d1) Randomly flip the original images input1 and input2 horizontally: When random horizontal flipping is applied, the images are horizontally flipped with a certain probability (50% in this example). This means that there is a 50% probability that the input images will be mirrored, thereby increasing the diversity of the images; it is enabled by the parameter with_random_hflip=True; for some object categories (such as vehicles, animals, etc.), horizontal flipping will not change their category, so it can effectively improve the model's ability to recognize these objects;
[0036] (1d2) Randomly flip vertically the image that has already been randomly flipped horizontally: when random vertical flipping is applied, the image is flipped vertically with a certain probability (50% in this embodiment). It is enabled by the parameter with_random_vflip=True. In some application scenarios (such as satellite images or certain types of traffic signs), vertical flipping does not change the content of the image, so it can further increase the robustness of the model.
[0037] (1d3) The image, after random vertical flipping, is subjected to Gaussian blur to simulate motion blur during shooting. This helps the training model to better adapt to different shooting conditions. The image obtained after the above data augmentation is set as...
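The augmentation steps (1d1) to (1d3) can be sketched as below. The parameter names `with_random_hflip` and `with_random_vflip` come from the text. Everything else is an assumption made for illustration: the name `with_random_blur`, the blur probability, the 3x3 kernel, and the choice to apply the same flips to both images and to the label map while blurring only the images.

```python
import random

def hflip(img):
    """Mirror a 2-D image (list of rows) left to right."""
    return [row[::-1] for row in img]

def vflip(img):
    """Mirror a 2-D image top to bottom."""
    return [row[:] for row in img[::-1]]

def gaussian_blur3(img):
    """Blur with the 3x3 kernel [1,2,1] x [1,2,1] / 16, replicating border pixels."""
    h, w = len(img), len(img[0])
    k = (1, 2, 1)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += k[dy + 1] * k[dx + 1] * img[yy][xx]
            out[y][x] = acc / 16.0
    return out

def augment_pair(input1, input2, label, rng,
                 with_random_hflip=True, with_random_vflip=True,
                 with_random_blur=True, p=0.5):
    """Apply identical random flips to both images and the label; blur images only."""
    if with_random_hflip and rng.random() < p:
        input1, input2, label = hflip(input1), hflip(input2), hflip(label)
    if with_random_vflip and rng.random() < p:
        input1, input2, label = vflip(input1), vflip(input2), vflip(label)
    if with_random_blur and rng.random() < p:  # hypothetical flag and probability
        input1, input2 = gaussian_blur3(input1), gaussian_blur3(input2)
    return input1, input2, label

rng = random.Random(0)
a, b, lab = augment_pair([[1, 2], [3, 4]], [[5, 6], [7, 8]], [[0, 1], [0, 0]], rng)
```

Keeping the geometric transforms identical across the pair and the label preserves pixel correspondence, which a per-pixel change label depends on.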
[0038] Step 2) Construct a lightweight remote sensing image change detection network model O based on binary neural networks and large-kernel strip convolution:
[0039] Build the lightweight remote sensing image change detection network model O shown in Figure 2, which includes two branch networks with the same structure arranged in parallel, as well as a subtractor and a depthwise separable convolutional detection head cascaded with them; both branch networks include a cascaded large kernel strip convolutional network and a binarized weight-sharing ResNet18 network; the large kernel strip convolutional network includes a cascaded channel mapping module, feature extraction module and max pooling module;
[0040] (2a) Construct a large kernel strip convolutional network:
[0041] (2a1) Using the channel mapping module, the channels are mapped and the images are downsampled to obtain the low-resolution dual-temporal images $I_{1\_new}$ and $I_{2\_new}$; the size becomes …
[0042] (2a2) Input the dual-temporal images $I_{1\_new}$ and $I_{2\_new}$ into the large kernel convolution module, which is in fact a multi-scale parallel convolution module consisting of three key parts:
[0043] (2a21) First, by aggregating local information through depthwise convolution, the detailed features of the image can be effectively extracted;
[0044] (2a22) Secondly, the module uses multi-branch depthwise strip convolution to capture multi-scale contextual information, which is particularly important for handling complex structures and changes;
[0045] (2a23) Finally, convolution with a kernel size of 1 is used to model the relationship between different channels, so as to achieve the fusion and optimization of information from each channel;
[0046] (2a3) Let $F_i$ ($i = 1, 2$) be the feature extracted by depthwise convolution; then:
[0047] $F_i = \mathrm{Conv}(I_{i\_new}),\ i = 1, 2$;
[0048] (2a4) The features $F_i$ ($i = 1, 2$) extracted by depthwise convolution are input into the depthwise strip convolution module, whose convolutional layer has three branches. Each branch uses two depthwise strip convolutions to approximate a standard large-kernel depthwise convolution; the kernel sizes of the three branches are set to 7, 11, and 21 respectively.
[0049]
[0050] $Scale_j$ denotes a skip connection: in this network structure the input is passed directly to the next layer through an identity connection without any additional transformation, which ensures that information can cross multiple layers without being destroyed by the intermediate layers.
[0051] There are two main reasons for using depthwise strip convolution in this embodiment. First, strip convolution is lightweight: compared with standard convolution, it achieves similar results by decomposing, for example, a 7×7 convolution into a pair of 7×1 and 1×7 convolutions, while significantly reducing the amount of computation. Second, strip convolution is particularly effective in change detection tasks: for long, thin objects such as people and utility poles, it compensates for the shortcomings of conventional grid convolution and extracts these strip-like features better.
[0052] Furthermore, gradually increasing the kernel size allows for the capture of a wider range of contextual information, which helps to obtain global features at a larger scale, thereby improving the accuracy of change detection.
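The lightweight argument above can be made concrete with a little arithmetic. The sketch below counts the weights per channel of a full k×k depthwise kernel against a 1×k plus k×1 strip pair for the three branch sizes named in (2a4); the helper names are hypothetical.

```python
def depthwise_weights_full(k):
    """Weights per channel of a single k x k depthwise kernel."""
    return k * k

def depthwise_weights_strip(k):
    """Weights per channel of a 1 x k kernel followed by a k x 1 kernel."""
    return 2 * k

for k in (7, 11, 21):
    full = depthwise_weights_full(k)
    strip = depthwise_weights_strip(k)
    print(f"k={k}: full={full}, strip pair={strip}, ratio={full / strip:.1f}x")
```

The saving grows linearly with k, which is why the largest branch benefits most. Note that a strip pair can only represent a separable (rank-1) kernel, so it approximates a general k×k kernel rather than reproducing it.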
[0053] (2a5) The final step of the convolutional module is to weight the initial input of the module with the output of the 1×1 convolution. This residual connection better preserves the original feature information, enhances feature transfer, and ultimately improves the model's expressiveness and detection accuracy. Let the output of this module be $Out_i$ ($i = 1, 2$):
[0054] $Out_i = \mathrm{Conv}_{1\times 1}(F_i),\ i = 1, 2$;
[0055] (2a6) The retained input features and the output features of the large kernel convolution are multiplied element-wise to weight the original features, thereby highlighting important feature information:
[0056]
[0057] Through weighted operations, the model can amplify those strip-shaped target features that are easily ignored in the feature space, while suppressing unimportant or irrelevant information. This mechanism allows the network to focus more on the most critical parts of the image, such as edges, object outlines, or strip-shaped objects, thereby improving the detection accuracy and recognition ability of these important features.
[0058] (2a7) Construct a max pooling module to fuse features extracted at different scales to obtain feature maps;
[0059] (2b) Construct a binary-quantized ResNet18 and perform 1-bit quantization on it, including the following steps:
[0060] (2b1) For any convolutional layer, the input is defined as …, the weight as …, and the output of the convolutional layer as …. To distinguish them, let the full-precision weights and full-precision inputs be $\omega_f$ and $I_f$, and let the binarized weights and binarized inputs be $\omega_b$ and $I_b$, whose only possible values are +1 and -1;
[0061] (2b2) For a full-precision convolutional layer, the output is:
[0062]
[0063] where the operator in the formula above denotes the standard convolution operation;
[0064] (2b3) Next, the weights and activations are quantized using different methods. For the weights, the following steps are required:
[0065] (2b31) The weights are scaled using weight standardization. For the input full-precision weights $\omega_f$, the standardized weights after this step are denoted $\omega_f'$:
[0066] $\omega_f' = \dfrac{\omega_f - \mu}{\sigma}$
[0067] Where $\mu$ is the mean of the weights and $\sigma$ is the scaled standard deviation;
[0068] $\mu = \mathrm{mean}(\omega_f,\ \mathrm{dim}=[1,2,3],\ \mathrm{keepdim}=\mathrm{True})$
[0069]
[0070] First, mean removal is performed on the weights $\omega_f$: subtracting the mean of the weights makes the mean of the processed weights equal to 0. This helps improve training stability and convergence speed, because mean removal reduces bias.
[0071] Then the standard deviation of the weights $\omega_f$ is calculated, and a small constant 1e-5 is added during the calculation to avoid a zero denominator; this ensures that the standardization does not fail when the variance of the weights is zero.
[0072] Finally, the standard deviation is divided by a constant to bring it into a suitable range; in this embodiment, the constant is taken as … . This scaling keeps the subsequent training within an appropriate numerical range, thereby improving the model's learning ability;
[0073] (2b32) Clip the weights to restrict them to the range $[-Q_\tau, Q_\tau]$. The processing steps are as follows:
[0074] First, calculate the expected value of the weights:
[0075]
[0076] Where K is the number of elements of the weights $\omega_f'$;
[0077] Next, the quantization threshold $Q_\tau$ is calculated:
[0078] $Q_\tau = -E_\omega \cdot \log(2 - 2\tau)$
[0079] The weights $\omega_f'$ are then clipped to the range $[-Q_\tau, Q_\tau]$:
[0080] $\omega_f' = \mathrm{clamp}(\omega_f',\ -Q_\tau,\ Q_\tau)$
[0081] Here, `clamp(x,min,max)` is a function that restricts the input `x` to between the minimum value `min` and the maximum value `max`.
[0082] This method can eliminate unimportant weights, effectively reducing the computational load of the model and the time required for matrix operations.
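Steps (2b31) and (2b32) can be sketched for one filter, stored as a flat list of weights. Several details are assumptions because the source leaves them out: the divisor applied to the standard deviation (the hypothetical parameter `divisor`, defaulting to 1), the use of the population standard deviation, and taking the expected value $E_\omega$ as the mean absolute value of the standardized weights. Under that last assumption, $-E_\omega \log(2 - 2\tau)$ is the τ-quantile of a zero-mean Laplace distribution whose scale equals $E_\omega$, so τ must exceed 0.5 for the threshold to be positive.

```python
import math

def standardize(weights, eps=1e-5, divisor=1.0):
    """Remove the mean, then divide by sigma = (std + eps) / divisor."""
    k = len(weights)
    mu = sum(weights) / k
    var = sum((w - mu) ** 2 for w in weights) / k  # population variance (assumed)
    sigma = (math.sqrt(var) + eps) / divisor        # divisor is a placeholder value
    return [(w - mu) / sigma for w in weights]

def clip_threshold(weights, tau):
    """Q_tau = -E * log(2 - 2*tau); E is assumed to be the mean absolute weight."""
    e = sum(abs(w) for w in weights) / len(weights)
    return -e * math.log(2.0 - 2.0 * tau)

def clamp(x, lo, hi):
    """Restrict x to the interval [lo, hi]."""
    return max(lo, min(x, hi))

w_f = [0.5, -1.0, 2.0, -1.5]
w_std = standardize(w_f)
q = clip_threshold(w_std, tau=0.9)  # tau = 0.9 is an illustrative value
w_clipped = [clamp(w, -q, q) for w in w_std]
print(q, w_clipped)
```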
[0083] (2b33) Binarize the processed weights during forward propagation. The processing steps are as follows:
[0084]
[0085] (2b34) During backpropagation, because the sign function is non-differentiable, the gradient with respect to $\omega_b$ cannot be calculated directly, so a straight-through estimator (STE) is used to approximate it. The partial derivatives of the loss function with respect to the parameters are calculated during backpropagation as follows:
[0086]
[0087] where … represents the loss function;
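A one-weight toy example may help show how the sign function in the forward pass and a straight-through estimator in the backward pass fit together. The exact STE formula is missing from the source, so the variant below is an assumption: the gradient is passed through unchanged where the real-valued weight lies in [-1, 1] and zeroed elsewhere. The squared-error loss and all function names are hypothetical.

```python
def binarize(w):
    """Forward pass: map a real-valued weight to +1 or -1 (zero maps to +1 here)."""
    return 1.0 if w >= 0 else -1.0

def ste_backward(grad_wrt_binary, w, clip=1.0):
    """Backward pass: treat sign as identity inside [-clip, clip], zero outside."""
    return grad_wrt_binary if abs(w) <= clip else 0.0

def train_step(w, x, target, lr):
    """One SGD step on the loss (sign(w) * x - target)^2 for a single weight."""
    w_b = binarize(w)
    grad_wb = 2.0 * (w_b * x - target) * x  # gradient w.r.t. the binarized weight
    grad_w = ste_backward(grad_wb, w)       # approximate gradient w.r.t. w
    return w - lr * grad_w                  # the latent real-valued weight is updated

w = 0.3
w = train_step(w, x=1.0, target=-1.0, lr=0.2)
print(w, binarize(w))
```

The real-valued weight is kept only during training; at inference only its sign is needed.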
[0088] (2b4) For activation, the following steps are required:
[0089] (2b41) During training, activations should be standardized to maintain a stable scale and prevent excessive data variation. However, activations should not be standardized during inference and testing. The standardization steps are as follows:
[0090]
[0091] Where $\epsilon$ is the smoothing term, taken as 1e-5 in this embodiment, and $\mathrm{Var}(I_f)$ is the variance of the full-precision activations over dimensions [1, 2, 3]:
[0092]
[0093] (2b42) Binarize the processed activations during forward propagation. The processing steps are as follows:
[0094]
[0095] (2b43) During backpropagation, because the sign function is non-differentiable, the gradient with respect to $I_b$ cannot be calculated directly, so a polynomial approximation is used. The partial derivatives of the loss function with respect to the parameters are calculated during backpropagation as follows:
[0096]
[0097] Decomposing it, we have:
[0098]
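The source does not reproduce the polynomial used for the activation gradient. One well-known choice is the piecewise quadratic approximation of the sign function introduced with Bi-Real Net; the sketch below assumes that form purely for illustration, and the patent's own polynomial may differ.

```python
def approx_sign(x):
    """Piecewise quadratic surrogate for sign(x) (Bi-Real Net form, assumed)."""
    if x < -1.0:
        return -1.0
    if x < 0.0:
        return 2.0 * x + x * x
    if x < 1.0:
        return 2.0 * x - x * x
    return 1.0

def approx_sign_grad(x):
    """Derivative of approx_sign: a triangle peaking at 2 for x = 0."""
    if -1.0 <= x < 0.0:
        return 2.0 + 2.0 * x
    if 0.0 <= x < 1.0:
        return 2.0 - 2.0 * x
    return 0.0

def activation_backward(grad_wrt_binary, x):
    """Chain rule with the surrogate derivative in place of d sign(x) / dx."""
    return grad_wrt_binary * approx_sign_grad(x)

for x in (-1.5, -0.5, 0.0, 0.5, 1.5):
    print(x, approx_sign(x), approx_sign_grad(x))
```

Compared with a plain straight-through estimator, this surrogate gives larger gradients to activations near zero, where a small change is most likely to flip the sign.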
[0099] (2b5) After completing the weight and activation binarization, the convolution operation can be replaced by bitwise operations:
[0100]
[0101] Where … represents the binary dot product operation, implemented with XNOR and POPCOUNT;
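To illustrate why binarized convolution reduces to bitwise operations, the following sketch packs vectors of +1/-1 values into integers (one bit per value instead of a 32-bit float) and computes their dot product with XNOR and a population count. The bit encoding (+1 as 1, -1 as 0) and the function names are assumptions made for the example.

```python
def pack(values):
    """Pack a list of +1/-1 values into an int, one bit each (+1 -> 1, -1 -> 0)."""
    bits = 0
    for i, v in enumerate(values):
        if v == 1:
            bits |= 1 << i
    return bits

def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n via XNOR and popcount."""
    mask = (1 << n) - 1
    xnor = ~(a_bits ^ b_bits) & mask   # bit is 1 where the two signs agree
    matches = bin(xnor).count("1")     # popcount
    return 2 * matches - n             # agreements minus disagreements

a = [1, -1, 1, 1, -1]
b = [1, 1, -1, 1, -1]
reference = sum(x * y for x, y in zip(a, b))
print(binary_dot(pack(a), pack(b), len(a)), reference)
```

Each agreeing position contributes +1 and each disagreeing position contributes -1, so the dot product is `2 * matches - n`.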
[0102] (2b6) In previous studies, many binarized networks have multiplied the weights and activations by a scaling factor in order to reduce quantization error. Although the weights and activations are still binarized, they are actually represented by a small number of floating-point numbers. Although this method avoids a significant decrease in accuracy to some extent, it does not achieve a complete binarization effect.
[0103] This embodiment improves upon the method by introducing a scaling factor α after completing the binarized convolution operation. The output result is multiplied by this factor and returned to the next layer of the network. The convolutional layer only involves +1 and -1 in the operation. This not only improves the input accuracy of the next convolutional layer, but also accelerates the entire convolution operation. Through this optimization strategy, both computational efficiency and accuracy are maintained, resulting in better performance.
[0104]
[0105] (2b7) Using the quantized convolutional layer as the basic building block, construct a ResNet18 whose weights are shared between the two branches, denoted CNN_Backbone;
[0106] (2b8) After the images pass through the two weight-sharing ResNet18 feature extraction networks, two feature maps $X_i$ ($i = 1, 2$) of the same dimension are obtained:
[0107] $X_i = \mathrm{CNN\_Backbone}(I_i'),\ i = 1, 2$
[0108] (2c) Constructing a detection head based on depthwise separable convolution includes the following steps:
[0109] (2c1) Upsample the feature map to obtain a feature map of the same size as the input image;
[0110] (2c2) The transformation map is input into the prediction head, which consists of a set of depthwise convolutions:
[0111] (2c21) First, a depthwise convolution is performed, which is an operation performed only on each channel individually:
[0112]
[0113] Where $y_i$ is the output feature, $x_{i,k}$ is the input feature, and $w_{i,k}$ is the convolution kernel of each channel; the operation is completed through depthwise convolution;
[0114] (2c22) The second step is to perform pointwise convolution, using a convolution kernel of size 1 for each output channel to combine information from different channels and complete the task of mapping in_channels to out_channels:
[0115]
[0116] Where M is the number of channels and $K_i$ is a convolutional unit with a kernel size of 1;
[0117] (2c3) Using depthwise separable convolution effectively reduces computational complexity. The computational complexity of standard convolution is $O(C_{in} \times C_{out} \times K^2)$. This embodiment introduces a depthwise separable convolution module whose computational complexity is $O(C_{in} \times K^2 + C_{in} \times C_{out})$, which greatly reduces the amount of computation.
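The two complexity expressions in (2c3) can be compared numerically. The sketch below counts multiply-accumulate operations per output position; the channel counts and kernel size in the example are hypothetical and are not taken from the patent.

```python
def standard_conv_macs(c_in, c_out, k):
    """Multiply-accumulates per output position: C_in * C_out * K^2."""
    return c_in * c_out * k * k

def separable_conv_macs(c_in, c_out, k):
    """Depthwise (C_in * K^2) plus pointwise (C_in * C_out) multiply-accumulates."""
    return c_in * k * k + c_in * c_out

c_in, c_out, k = 64, 64, 3  # illustrative values only
std = standard_conv_macs(c_in, c_out, k)
sep = separable_conv_macs(c_in, c_out, k)
print(std, sep, round(std / sep, 2))
```

The ratio of the two costs simplifies to $1/C_{out} + 1/K^2$, so the saving is bounded by roughly $K^2$ when the number of output channels is large.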
[0118] Step 3) Iteratively train the lightweight remote sensing image change detection network model:
[0119] (3a) Let the iteration counter be t and the maximum number of iterations be T, T≥200; let the convolution kernel weights and bias parameters of the current change detection network model be $a_t$; and let t = 1;
[0120] (3b) The large-kernel strip convolutional network in the first branch performs preliminary feature extraction on the first-temporal image of each training sample, and the ResNet18 network performs deep feature extraction on the resulting feature map, yielding the first-temporal feature map $X_1$ of each training sample. Simultaneously, the large-kernel strip convolutional network in the second branch performs preliminary feature extraction on the second-temporal image of each training sample, and the ResNet18 network performs deep feature extraction on the resulting feature map, yielding the second-temporal feature map $X_2$ of each training sample;
[0121] (3b1) Channel mapping module: maps the channels of each first-temporal training sample to a higher dimension and downsamples the mapped image;
[0122] (3b2) Feature extraction module: Performs feature extraction at different scales on each downsampled image to capture rich spatial features at different scales;
[0123] (3b3) Max pooling module, which performs feature fusion on the extracted features at different scales to obtain feature maps;
[0124] (3c) The subtractor performs a difference operation on the feature maps $X_1$ and $X_2$; the detection head then performs detection on the resulting preliminary change map $X_{new} = X_1 - X_2$ to obtain the change detection prediction $y$;
[0125] (3d) From the real labels and the predicted result $y$, calculate the cross-entropy loss value $L_{ce}$ and the Dice loss value $L_{dice}$ of the change detection network model O, and calculate the total loss value (Loss) of O from $L_{ce}$ and $L_{dice}$; then use the Adam optimizer to update $a_t$ according to the loss, obtaining the change detection network model $O_t$ for this iteration;
[0126] (3d1) Change detection is a common binary classification problem; therefore, the cross-entropy loss formula for change detection is:
[0127]
[0128] where … denotes the real labels;
[0129] The cross-entropy loss function measures the difference between the probability distribution predicted by the model and the true class; the smaller the value, the closer the model prediction is to the true value.
[0130] (3d2) The Dice loss is used to address class imbalance. The Dice coefficient measures the similarity between two sample sets; a higher coefficient indicates that the prediction is closer to the true label. To optimize the network, the loss function returns 1 minus the Dice coefficient, so that the Dice coefficient is maximized by minimizing the loss:
[0131]
[0132] (3d3) Calculate the total loss value (Loss) of O from $L_{ce}$ and $L_{dice}$:
[0133] $Loss = L_{ce} + \lambda \times L_{dice}$
[0134] Where λ represents the weighting coefficient;
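The combined loss of step (3d) can be sketched on flattened per-pixel predictions. The loss formulas themselves are missing from the source, so this uses the usual binary cross-entropy and a Dice loss with a smoothing term. The smoothing constant, the probability clipping, and the default λ = 1 are assumptions.

```python
import math

def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy between probabilities and 0/1 labels."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(pred)

def dice_loss(pred, target, smooth=1.0):
    """1 - Dice coefficient; `smooth` (assumed) guards against empty masks."""
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    return 1.0 - (2.0 * inter + smooth) / (denom + smooth)

def total_loss(pred, target, lam=1.0):
    """Loss = L_ce + lambda * L_dice, with lambda as the weighting coefficient."""
    return bce_loss(pred, target) + lam * dice_loss(pred, target)

labels = [1, 0, 1, 0]
print(total_loss([0.9, 0.1, 0.8, 0.2], labels))
print(total_loss([0.5, 0.5, 0.5, 0.5], labels))
```

Cross-entropy scores every pixel equally, while the Dice term depends on the overlap with the changed region, which is typically a small fraction of the image.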
[0135] (3d4) Update the model parameters:
[0136]
[0137]
[0138] $m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t$
[0139]
[0140] Where $a_t$ denotes the convolution kernel weights and bias parameters of the current round, $a_{t+1}$ denotes the convolution kernel weights and bias parameters of the next round, and $\eta$ is the learning rate. $\hat{m}_t$ is the bias-corrected first-order momentum estimate, $\hat{v}_t$ is the bias-corrected exponentially weighted moving average of the squared gradient, $m_t$ is the first-order momentum estimate, $\beta_1$ is the exponential decay rate of the first-order momentum, $\beta_2$ is the exponential decay rate of the second-order moment estimate, and $v_t$ is the exponentially weighted moving average of the squared gradient.
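For reference, a single-parameter version of the standard Adam update is sketched below. Only the first-moment recursion survives in the source, so the remaining lines follow the commonly published form of Adam and are not confirmed by the patent. The default hyperparameters and the term `eps` are conventional values, also assumed.

```python
import math

def adam_step(a, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update of parameter `a` given gradient `g` at step t (t >= 1)."""
    m = beta1 * m + (1.0 - beta1) * g        # first-order momentum estimate
    v = beta2 * v + (1.0 - beta2) * g * g    # moving average of squared gradient
    m_hat = m / (1.0 - beta1 ** t)           # bias correction
    v_hat = v / (1.0 - beta2 ** t)           # bias correction
    a = a - lr * m_hat / (math.sqrt(v_hat) + eps)
    return a, m, v

# Toy usage: a few steps on f(a) = a^2, whose gradient is 2a
a, m, v = 1.0, 0.0, 0.0
for t in range(1, 4):
    a, m, v = adam_step(a, 2.0 * a, m, v, t)
print(a)
```

Because of the bias correction, the first step moves the parameter by almost exactly the learning rate regardless of the gradient's magnitude.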
[0141] (3e) Calculate the F1 score of this iteration's change detection network model $O_t$ on the validation set samples, save $O_t$, set the optimal model $O^* = O_t$, and save its corresponding F1_Score;
[0142] (3f) Determine whether t = T holds. If so, the trained detection network model $O^*$ is obtained; otherwise, let t = t + 1 and $O = O_t$, and return to step (3b);
[0143] (3g) After each training round, the F1 score of the current iteration's change detection network model $O_t$ is calculated on the validation set samples and compared with the previously saved best F1 score; if the current F1 score is higher than the previous best value, the current model $O_t$ is set as the best model $O^*$ and its corresponding F1 score is saved; otherwise, the original best model and F1 score are kept unchanged.
[0144] (3g1) The prediction results are analyzed quantitatively using five commonly used evaluation metrics for change detection: precision, intersection over union (IoU), recall, F1 score, and overall accuracy. They are computed from the counts TP (True Positive), TN (True Negative), FP (False Positive), and FN (False Negative); the higher the value of these evaluation metrics, the better the model's prediction results;
[0145] TP: The number of pixels predicted as positive and actually positive (i.e., the model correctly detected the actual area of change);
[0146]
[0147] TN: The number of pixels that were predicted as negative and were actually negative (i.e., the model correctly detected regions that did not actually change).
[0148]
[0149] FP: The number of pixels predicted as positive but actually negative (i.e., the model incorrectly detects unchanged regions as changed regions);
[0150]
[0151] FN: The number of pixels predicted as negative but actually positive (i.e., the model failed to detect the actual area of change);
[0152]
[0153] Precision: The proportion of pixels that are actually positive out of all pixels predicted to be positive; a higher precision indicates fewer false detections among the predicted changes. It is denoted P in this invention.
[0154] $$P = \frac{TP}{TP + FP}$$
[0155] Recall: The proportion of pixels that are correctly predicted to be positive out of all actually positive pixels; a higher recall indicates fewer missed changes. It is denoted R in this invention.
[0156] $$R = \frac{TP}{TP + FN}$$
[0157] F1 Score: The harmonic mean of precision and recall, a comprehensive metric that balances the two.
[0158] $$F1 = \frac{2 \times P \times R}{P + R}$$
[0159] Overall Accuracy: The proportion of pixels correctly predicted by the model out of the total number of pixels. It is denoted OA in this invention.
[0160] $$OA = \frac{TP + TN}{TP + TN + FP + FN}$$
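The metric definitions above can be sketched in code as follows. This is a minimal illustration on flattened binary change maps; the function name is hypothetical, the IoU is assumed to be the usual change-class form TP / (TP + FP + FN) since the text does not spell it out, and returning 0.0 for an empty denominator is a convention chosen here.

```python
def change_detection_metrics(pred, label):
    """Compute P, R, F1, IoU, and OA from equal-length sequences of 0/1 pixels.

    1 marks a changed pixel and 0 an unchanged pixel.
    """
    tp = sum(1 for p, y in zip(pred, label) if p == 1 and y == 1)
    tn = sum(1 for p, y in zip(pred, label) if p == 0 and y == 0)
    fp = sum(1 for p, y in zip(pred, label) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(pred, label) if p == 0 and y == 1)

    def safe_div(num, den):
        # Assumed convention: a metric with an empty denominator is 0.0.
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "iou": safe_div(tp, tp + fp + fn),  # assumed change-class IoU
        "oa": safe_div(tp + tn, tp + tn + fp + fn),
    }
```

In practice the counts would be accumulated over every pixel of every test image block before the ratios are taken, so that large and small blocks are weighted by their pixel counts.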
[0161] Step 4) Obtain the change detection results of the lightweight remote sensing image:
[0162] The test sample set is used as the input of the trained detection network model $O^*$, and forward propagation yields the change detection result corresponding to each test sample.
[0163] The technical effects of the present invention will be further illustrated by the following simulation experiments:
[0164] 1. Simulation conditions and content:
[0165] This experiment used three datasets: the LEVIR-CD Dataset, the CDD Dataset, and the WHU Dataset. These three datasets are introduced in turn below.
[0166] LEVIR-CD (Large-scale Earth Vision Infrared Remote Change Detection Dataset) is a widely used remote sensing change detection dataset, focusing on high-resolution urban building change detection and covering scenarios such as building expansion. This dataset contains 637 pairs of 1024x1024 resolution remote sensing images. Each pair of images represents the same area captured at different time points, used to capture changes in buildings.
[0167] The CDD Dataset (Change Detection Dataset) is a public dataset for remote sensing change detection. It contains multiple pairs of remote sensing images with a resolution of 256x256, and its data size is moderate. The images cover a variety of geographical scenes (e.g., urban, rural) and a wider variety of change types, including vegetation, disasters, and urbanization, which makes it a general-purpose change detection dataset that provides a diverse training and testing platform.
[0168] The WHU Dataset (Wuhan University Building Dataset) is a dataset released by Wuhan University specifically for building extraction and change detection. It includes multiple satellite images of different resolutions, most of which are high-resolution, and is mainly used to evaluate building change detection tasks, where its higher resolution makes it suitable for detecting building details.
[0169] The simulation experiments were conducted on a server with an NVIDIA A100 GPU. The operating system was Ubuntu 18.04, the deep learning framework was PyTorch, and the programming language was Python 3.10.
[0170] 2. Simulation Result Analysis
[0171] The present invention was compared with the change detection network proposed in the existing "Lightweight Remote Sensing Change Detection Method and System Based on Dual-Time Remote Sensing Images" through simulation. The results are shown in Table 1.
[0172] Experiments demonstrate that this invention achieves improved accuracy compared with existing technologies on three publicly available datasets. At the same time, it significantly reduces the number of parameters and reduces the number of floating-point operations by a factor of 80, resulting in memory savings and CPU acceleration. This avoids the drawback of existing technologies, in which parameters are stored as 32-bit floating-point numbers and the parameter storage is therefore large, and effectively improves detection efficiency.
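The storage argument can be illustrated with a short sketch: a weight quantized to +1 or -1 needs one bit, while a 32-bit floating-point weight needs 32 bits. The helper names and the sample weights below are hypothetical, and mapping a weight of exactly zero to +1 is an assumption made here rather than a detail stated in this document.

```python
def binarize(weights):
    """Quantize real-valued weights to +1 or -1 by sign (zero maps to +1 here)."""
    return [1 if w >= 0 else -1 for w in weights]


def pack_bits(signs):
    """Pack +1/-1 values into bytes, one bit per weight (+1 -> 1, -1 -> 0)."""
    packed = bytearray((len(signs) + 7) // 8)
    for i, s in enumerate(signs):
        if s == 1:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


weights = [0.31, -0.07, 1.2, -0.9] * 8     # 32 hypothetical float weights
signs = binarize(weights)
packed = pack_bits(signs)

float32_bytes = 4 * len(weights)           # storage as 32-bit floats
ratio = float32_bytes / len(packed)        # storage reduction factor
```

For the 32 sample weights the float storage is 128 bytes and the packed storage is 4 bytes, a ratio of 32. Real binary networks usually also keep a few full-precision scaling factors, so the achieved saving for a whole model depends on which layers are binarized.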
[0173]
Claims
1. A lightweight remote sensing image change detection method based on binary neural network and large kernel strip convolution, characterized in that it includes the following steps:
(1) Obtain the training sample set, validation sample set, and test sample set: The dual-temporal remote sensing image pairs are preprocessed, and the changed regions in the image blocks corresponding to each preprocessed dual-temporal remote sensing image pair are labeled. Then, the preprocessed image blocks corresponding to more than half of the dual-temporal remote sensing image pairs, together with their labels, form the training sample set; of the remaining pairs, the image blocks and labels of one half form the validation sample set, and those of the other half form the test sample set;
(2) Construct a lightweight remote sensing image change detection network model based on binary neural networks and large kernel strip convolution: A lightweight remote sensing image change detection network model is constructed, comprising two branch networks with identical structures arranged in parallel, followed by a cascaded subtractor and depthwise separable convolutional detection head. Each branch network includes a cascaded large-kernel strip convolutional network and a binarized weight-sharing ResNet18 network; the large-kernel strip convolutional network includes a cascaded channel mapping module, feature extraction module, and max pooling module;
(3) Iteratively train the lightweight remote sensing image change detection network model: The lightweight remote sensing image change detection network model is iteratively trained using the training sample set and the validation sample set to obtain a trained detection network model;
(4) Obtain the change detection results of lightweight remote sensing images: The test sample set is used as the input of the trained detection network model, and forward propagation yields the change detection result corresponding to each test sample.
2. The method according to claim 1, characterized in that the preprocessing of the dual-temporal remote sensing image pairs containing multiple target categories described in step (1) is implemented as follows: each pair of dual-temporal remote sensing images is filtered, each filtered pair is radiometrically corrected, each radiometrically corrected pair is spatially aligned, and each aligned pair is then cropped into multiple image blocks.
3. The method according to claim 1, characterized in that, in the lightweight remote sensing image change detection network model described in step (2): the ResNet18 network consists of stacked convolutional layers, residual blocks, global average pooling layers, and fully connected layers; the large-kernel strip convolutional network comprises a channel mapping module implemented by two convolutional layers, a feature extraction module comprising multiple parallel large-kernel strip convolutional layers, and a max pooling module comprising stacked max pooling layers and PReLU activation function layers; and the detection head consists of stacked depthwise convolutional layers, batch normalization layers, PReLU activation function layers, and pointwise convolutional layers.
4. The method according to claim 1, characterized in that the iterative training of the lightweight remote sensing image change detection network model described in step (3) is implemented as follows:
(3a) Initialize the iteration number $t$, the maximum number of iterations $T$, and the convolution kernel weights and bias parameters $\alpha_t$ of the current change detection network model $O_t$;
(3b) The large-kernel strip convolutional network in the first branch network performs preliminary feature extraction on the first-time-phase image of each training sample, and the ResNet18 network performs deep feature extraction on the resulting feature map to obtain the first-time-phase feature map of each training sample; simultaneously, the large-kernel strip convolutional network in the second branch network performs preliminary feature extraction on the second-time-phase image of each training sample, and the ResNet18 network performs deep feature extraction on the resulting feature map to obtain the second-time-phase feature map of each training sample;
(3c) The subtractor performs a difference operation on the two feature maps, and the detection head performs detection on the preliminary change map obtained from the difference operation to obtain the change detection prediction result;
(3d) Compute the cross-entropy loss value and the dice loss value of the change detection network model from the real labels and the prediction results, and compute the total loss value from these two loss values; then use the Adam optimizer to update the parameters $\alpha_t$ with the total loss value, obtaining the change detection network model $O_t$ of this iteration;
(3e) Calculate the F1 score of the change detection network model $O_t$ of this iteration using the validation set samples, and save the F1 score and the corresponding model $O_t$ as the best model $O^*$;
(3f) Judge whether $t = T$ holds. If so, the trained detection network model $O^*$ is obtained. Otherwise, let $t = t + 1$, calculate the F1 score of the change detection network model of the current iteration using the validation set samples, and judge whether it is higher than the saved best F1 score. If so, take the current model as the best model $O^*$ and update the saved best F1 score; otherwise, keep $O^*$ and the saved best F1 score unchanged; then proceed to step (3b).
5. The method according to claim 4, characterized in that the preliminary feature extraction performed in step (3b) by the large-kernel strip convolutional network in the first branch network on the first-time-phase training samples is implemented as follows: (3b1) The channel mapping module maps the channels of each first-time-phase training sample to a higher dimension and downsamples the mapped image; (3b2) The feature extraction module extracts spatial features at different scales from each downsampled image; (3b3) The max pooling module fuses the extracted spatial features at different scales to obtain a feature map.
6. The method according to claim 4, characterized in that the cross-entropy loss value, the dice loss value, and the total loss value of the change detection network model described in step (3d) are calculated by formulas that include a smoothing term and a weighting coefficient.
7. The method according to claim 6, characterized in that the update of the change detection network model described in step (3d) is performed using the update formulas of the Adam optimizer, in which $\alpha_t$ and $\alpha_{t+1}$ represent the convolution kernel weights and bias parameters of the $t$-th and $(t+1)$-th iterations, respectively, $\eta$ is the learning rate, $\hat{m}_t$ is the bias-corrected first-order momentum estimate, $\hat{v}_t$ is the bias-corrected exponentially weighted moving average of the squared gradient, $m_t$ is the momentum estimate, $\beta_1$ is the exponential decay rate of the first-order momentum, $\beta_2$ is the exponential decay rate of the second-order moment estimate, and $v_t$ is the exponentially weighted moving average of the squared gradient.
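The loss terms named in claims 4 and 6 can be sketched as follows. The claim formulas did not survive extraction, so this sketch assumes commonly used forms: mean binary cross-entropy, a dice loss with an additive smoothing term, and a total loss in which the weighting coefficient scales the dice term. The function names, the default values of `smooth` and `weight`, and the placement of the weighting coefficient are all assumptions.

```python
import math


def cross_entropy_loss(pred, label, tiny=1e-12):
    """Mean binary cross-entropy; pred holds change probabilities in [0, 1]."""
    total = 0.0
    for p, y in zip(pred, label):
        p = min(max(p, tiny), 1.0 - tiny)  # clamp to keep log() finite
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(pred)


def dice_loss(pred, label, smooth=1.0):
    """Dice loss with an additive smoothing term (assumed form)."""
    intersection = sum(p * y for p, y in zip(pred, label))
    return 1.0 - (2.0 * intersection + smooth) / (sum(pred) + sum(label) + smooth)


def total_loss(pred, label, weight=0.5, smooth=1.0):
    """Cross-entropy plus weighted dice loss; the weighting scheme is an assumption."""
    return cross_entropy_loss(pred, label) + weight * dice_loss(pred, label, smooth)
```

Combining the two terms is a common choice for change detection because changed pixels are usually a small minority: cross-entropy scores every pixel, while the dice term measures overlap on the changed region and is less dominated by the unchanged background.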