An optical remote sensing image slice-level change detection method and device
A twin encoder and multi-level feature compression are used to select the image slices that contain changed areas for subsequent detection. This addresses the limited computing resources and insufficient real-time performance of remote sensing change detection systems and enables efficient slice-level change detection in optical remote sensing images.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- TSINGHUA UNIVERSITY
- Filing Date
- 2023-10-30
- Publication Date
- 2026-06-26
Abstract
Description
Technical Field
[0001] This application relates to the field of remote sensing image processing, and in particular to a method and apparatus for detecting slice-level changes in optical remote sensing images.
Background Technology
[0002] The task of remote sensing change detection is to identify changes in land features in a region by comparing remote sensing images of the same area acquired at different times. In existing change detection systems, remote sensing images captured by satellites need to be downlinked to ground stations for processing and analysis. However, with the development of remote sensing technology, the resolution and data volume of acquired remote sensing images have increased rapidly, increasing the pressure on the data downlink and requiring more communication bandwidth and time.
[0003] For change detection in high-resolution remote sensing images, the detection framework shown in Figure 1(a) is commonly used: the large remote sensing image acquired by the satellite is cropped into many fixed-size image slices, pixel-level change detection is performed on every slice, and the per-slice results are stitched together to obtain the detection result for the large image. Although this approach works around limited computing resources, processing a large number of image slices takes considerable time, making it difficult to meet real-time requirements.
[0004] In practical applications, the changed regions in a large remote sensing image are usually sparsely distributed, and most image slices obtained by cropping contain no changed region at all. Applying a pixel-level change detection method to every slice is therefore unnecessary and wastes time. Instead, as shown in Figure 1(b), the image slices that contain changed regions are selected before the pixel-level model is applied, and pixel-level change detection is then performed only on those slices, which reduces time cost and computational resource consumption. How to select the image slices that have changed is thus a pressing technical problem.
Summary of the Invention
[0005] In view of the above problems, embodiments of this application provide a method and apparatus for detecting slice-level changes in optical remote sensing images, so as to overcome the above problems or at least partially solve the above problems.
[0006] A first aspect of this application discloses a method for detecting slice-level changes in optical remote sensing images, the method comprising:
[0007] Obtain image slice pairs of two temporal optical remote sensing images, wherein the two temporal optical remote sensing images include an optical remote sensing image before the change and an optical remote sensing image after the change;
[0008] Features are extracted from the image slice pairs by a twin encoder to obtain two-phase multi-level feature maps, wherein the twin encoder is obtained by a sensitivity-based network pruning method;
[0009] The two-phase multi-level feature maps are compressed using a multi-level feature compression module to obtain two-phase global feature vectors.
[0010] Based on the two-phase global feature vectors, a difference feature vector is obtained, and the difference feature vector is input into the decision network for change detection to obtain the change detection result of the image slice pair.
[0011] Optionally, the twin encoder includes multiple levels, and the twin encoder is obtained through the following sensitivity-based network pruning method:
[0012] Based on the initial pruning rate, each level of the pre-trained initial twin encoder is pruned to obtain the pruned initial twin encoder.
[0013] The network parameters of the pruned initial twin encoder are used as pre-training weights, and the pruned initial twin encoder is trained to obtain the performance loss after pruning at each level.
[0014] Calculate the sensitivity of each level based on the defined sensitivity function and the performance loss after pruning;
[0015] The initial pruning rate is corrected using the sensitivity of each level to obtain the corrected pruning rate for each level;
[0016] The initial twin encoder is pruned according to the modified pruning rate of each level to obtain the twin encoder.
[0017] Optionally, based on the initial pruning rate, each level of the pre-trained initial twin encoder is pruned to obtain the pruned initial twin encoder, including:
[0018] Based on the L1 norm criterion, each level of the initial twin encoder is pruned sequentially by removing the N channels with the smallest L1 norm values at that level, to obtain the pruned initial twin encoder;
[0019] Wherein, the value of N is equal to the product of the number of channels at the corresponding level and the initial pruning rate.
[0020] Optionally, based on the defined sensitivity function and the performance loss after pruning, the sensitivity of each level is calculated, including:
[0021] From the performance loss after pruning at multiple levels of the initial twin encoder after pruning, the minimum performance loss and the maximum performance loss after pruning are determined.
[0022] The sensitivity of each level is obtained based on the minimum performance loss after pruning, the maximum performance loss after pruning, and the performance loss after pruning at each level.
[0023] Optionally, the initial twin encoder is pruned according to the modified pruning rate of each level to obtain a twin encoder, including:
[0024] Based on the number of channels in each level of the initial twin encoder and the corrected pruning rate corresponding to that level, the number of channels after pruning in each level is obtained;
[0025] Based on the number of channels after pruning at each level, the channels at each level are pruned to obtain a twin encoder.
[0026] Optionally, the multi-level feature compression module compresses the feature map at each level; the multi-level feature compression module compresses the two-phase multi-level feature maps to obtain a two-phase global feature vector, including:
[0027] The multi-level feature compression module uses convolution and max pooling operations at each level to compress the feature map of that level in both channel and spatial dimensions, resulting in a two-phase multi-level compressed feature map.
[0028] Flatten the two-phase multi-level compressed feature maps to obtain one-dimensional feature vectors of different levels in the two phases;
[0029] The one-dimensional feature vectors of different levels in the two time phases are concatenated to obtain the global feature vectors of the two time phases.
[0030] Optionally, the pooling window size used for the max pooling operation in each level of the multi-level feature compression module is the same, and the spatial size of the output feature map of the pooling layer decreases sequentially as the level increases.
[0031] Optionally, the difference feature vector is input into a decision network for change detection, including:
[0032] The decision network uses two fully connected layers to detect changes in the differential feature vectors and generates binary labels representing changes and no changes. These binary labels represent the change detection results of the image slice pairs.
[0033] Optionally, image slice pairs of two temporal optical remote sensing images are acquired, including:
[0034] Based on the preset sliding window size and step size, the two temporal optical remote sensing images are sliced to obtain image slice pairs;
[0035] Wherein, the sliding window size represents the size of the image slice, and the step size determines the overlap rate of adjacent slices.
[0036] A second aspect of this application discloses an optical remote sensing image slice-level change detection device, the device comprising:
[0037] The slicing module is used to acquire image slice pairs of two temporal optical remote sensing images, wherein the two temporal optical remote sensing images include an optical remote sensing image before the change and an optical remote sensing image after the change;
[0038] The extraction module is used to extract features from the image slice pairs using a twin encoder to obtain two-phase multi-level feature maps. The twin encoder is obtained by a sensitivity-based network pruning method.
[0039] The compression module is used to compress the two-phase multi-level feature maps using the multi-level feature compression module to obtain the two-phase global feature vectors.
[0040] The detection module is used to obtain the difference feature vector based on the two-phase global feature vectors, and input the difference feature vector into the decision network for change detection to obtain the change detection result of the image slice pair.
[0041] The embodiments of this application have the following advantages:
[0042] In this embodiment, to minimize model complexity and improve inference speed while ensuring detection accuracy, multi-level feature compression and network pruning techniques are used to achieve slice-level change detection of optical remote sensing images. First, image slice pairs of two temporal optical remote sensing images, namely the image before the change and the image after the change, are acquired. A twin encoder is used to extract features from the image slice pairs, obtaining multi-level feature maps for both temporal phases. Then, a multi-level feature compression module compresses these multi-level feature maps, obtaining global feature vectors for both temporal phases. Finally, a difference feature vector is obtained from the two global feature vectors and input into a decision network for change detection, yielding the change detection result for the image slice pair. Because the twin encoder used for feature extraction is obtained through a sensitivity-based network pruning method, the network is more lightweight and easier to deploy. Fusing the compressed multi-level feature maps of each temporal phase into a global feature vector for change detection improves detection accuracy. A lightweight slice-level change detection method for optical remote sensing images is thereby achieved.
Description of the Drawings
[0043] To more clearly illustrate the technical solutions of the embodiments of this application, the drawings used in the description of the embodiments of this application will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0044] Figure 1 shows schematic diagrams of two change detection frameworks for large high-resolution remote sensing images;
[0045] Figure 2 is a flowchart of the steps of a method for detecting slice-level changes in optical remote sensing images provided in an embodiment of this application;
[0046] Figure 3 is a schematic diagram of the structure of a multi-level feature compression module provided in an embodiment of this application;
[0047] Figure 4 is a schematic diagram of the structure of a slice-level detection network provided in an embodiment of this application;
[0048] Figure 5 is a schematic diagram of the structure of an optical remote sensing image slice-level change detection device provided in an embodiment of this application.
Detailed Implementation
[0049] To make the above-mentioned objectives, features, and advantages of this application more apparent and understandable, the technical solutions in the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0050] To achieve slice-level change detection in optical remote sensing images, related techniques typically employ networks such as VGG and ResNet for feature extraction. These networks have a relatively large number of parameters and exhibit significant redundancy between channels. Redundant feature channels account for a large proportion of the parameter and computational costs, but contribute very little to detection accuracy. Therefore, to minimize model complexity and improve inference speed while maintaining high detection accuracy, this application employs multi-level feature compression and sensitivity-based network pruning to design an efficient and lightweight slice-level detection network for rapid slice-level change detection in optical remote sensing images. The slice-level change detection method for optical remote sensing images is described in detail below.
[0051] Referring to Figure 2, which is a flowchart of the steps of a method for detecting slice-level changes in optical remote sensing images provided in an embodiment of this application, the method may include steps S210 to S240:
[0052] Step S210: Obtain image slice pairs of two temporal optical remote sensing images, wherein the two temporal optical remote sensing images include the optical remote sensing image before the change and the optical remote sensing image after the change.
[0053] In this embodiment, the two temporal optical remote sensing images refer to a pair of high-resolution remote sensing images of the same region acquired at two different time points. An image slice pair refers to the two image slices at the same location in the optical remote sensing image before the change and the optical remote sensing image after the change. Once image slice pairs have been acquired from the two temporal optical remote sensing images, change detection can be performed on each pair to determine whether the region of the large image corresponding to that pair has changed.
[0054] In one optional implementation, obtaining image slice pairs from two temporal optical remote sensing images includes: slicing the two temporal optical remote sensing images according to a preset sliding window size and step size to obtain image slice pairs; wherein the sliding window size represents the size of the image slice, and the step size determines the overlap rate of adjacent slices.
[0055] In practice, the optical remote sensing image before and after the change is moved and segmented sequentially according to the preset sliding window size and step size to obtain image slices covering the entire optical remote sensing image. Then, two image slices corresponding to the same position in the optical remote sensing image before and after the change are combined into image slice pairs.
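The sliding-window slicing described above can be sketched as follows. This is a minimal illustration, not the implementation of this application: the function names, the nested-list image representation, and the rule of adding one extra window flush with the image border when the step size does not divide the image evenly are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), end-exclusive
Image = Sequence[Sequence[float]]  # hypothetical single-band image as nested rows


def overlap_rate(window: int, stride: int) -> float:
    """Fraction of a slice shared with its neighbour along one axis."""
    return max(0.0, 1.0 - stride / window)


def window_origins(length: int, window: int, stride: int) -> List[int]:
    """Start offsets along one axis so that the windows cover [0, length)."""
    if window > length or stride <= 0:
        raise ValueError("window must fit in the image and stride must be positive")
    origins = list(range(0, length - window + 1, stride))
    # Assumption: add a final window flush with the border so no pixels are skipped.
    if origins[-1] != length - window:
        origins.append(length - window)
    return origins


def slice_boxes(height: int, width: int, window: int, stride: int) -> List[Box]:
    """All slice positions, scanned row by row."""
    return [(top, left, top + window, left + window)
            for top in window_origins(height, window, stride)
            for left in window_origins(width, window, stride)]


def crop(image: Image, box: Box) -> List[List[float]]:
    top, left, bottom, right = box
    return [list(row[left:right]) for row in image[top:bottom]]


def slice_pairs(before: Image, after: Image, window: int, stride: int):
    """Pair the slices taken at the same position in both temporal images."""
    if len(before) != len(after) or len(before[0]) != len(after[0]):
        raise ValueError("the two temporal images must have the same size")
    boxes = slice_boxes(len(before), len(before[0]), window, stride)
    return [(box, crop(before, box), crop(after, box)) for box in boxes]
```

With a 128-pixel window and a 64-pixel step, adjacent slices overlap by half, and a 256×256 image yields a 3×3 grid of slice pairs. Keeping the box with each pair makes it straightforward to map a slice-level result back to its position in the large image.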
[0056] Step S220: Use a twin encoder to extract features from the image slice pairs to obtain two-phase multi-level feature maps. The twin encoder is obtained by a sensitivity-based network pruning method.
[0057] In this embodiment, the two encoders in the twin encoder share parameters. The twin encoder includes multiple levels, thus allowing the extraction of feature information from different levels for subsequent discrimination. The two-phase multi-level feature map refers to the multi-level feature map corresponding to two image slices. In specific implementation, the two encoders in the twin encoder each perform multi-level feature extraction on one image slice from the image slice pair to obtain the multi-level feature map corresponding to that slice. The multi-level feature map refers to feature maps at multiple levels from shallow to deep. Shallow feature maps contain more detailed information (e.g., edge information, contour information, etc.), while deep feature maps contain more overall information. Therefore, by extracting the two-phase multi-level feature map, the obtained feature information is more comprehensive.
[0058] Specifically, the twin (Siamese) encoder is implemented based on the ResNet18 network. Compared with commonly used networks such as VGG and UNet, ResNet18 uses residual connections, which mitigate the vanishing gradient problem and improve network performance. In addition, the ResNet18 network is smaller, contains fewer parameters, and requires fewer multiply-accumulate operations, resulting in better model efficiency.
[0059] Step S230: Compress the two-phase multi-level feature maps using a multi-level feature compression module to obtain two-phase global feature vectors.
[0060] In this embodiment, for slice-level detection tasks, the changed regions contained in image slices usually vary widely in size, so prediction using only the feature information of the deepest layer of the network limits detection accuracy. To solve this problem, features from multiple levels are fused for subsequent change detection. The two-phase global feature vectors refer to the global feature vectors of the two image slices in an image slice pair. A multi-level feature compression module compresses and fuses the multi-level feature maps of each of the two image slices to obtain the global feature vector of that slice. Because the global feature vector is obtained by compressing and fusing multi-level feature maps, it contains both rich detail and semantic information, making it more useful for change detection.
[0061] Step S240: Based on the two-phase global feature vectors, obtain the difference feature vector, and input the difference feature vector into the decision network for change detection to obtain the change detection result of the image slice pair.
[0062] In this embodiment, the absolute value of the difference between the two temporal global feature vectors is taken to generate a difference feature vector. The difference feature vector contains feature information related to the change region, and change detection can be performed based on the difference feature vector.
[0063] In one optional embodiment, the differential feature vector is input into a decision network for change detection, including: the decision network uses two fully connected layers to perform change detection on the differential feature vector and generates binary labels representing changes and no changes, the binary labels representing the change detection results of the image slice pair.
[0064] Specifically, the decision network uses the change region-related features contained in the difference feature vector to predict the probability that an image slice belongs to a change region or not. Image slices with a change probability greater than the change threshold are identified as containing change regions and are labeled as change regions; image slices with a change probability not greater than the change threshold are identified as not containing change regions and are labeled as not change regions, thus obtaining binary labels.
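Step S240 can be illustrated with a small sketch. The source specifies the absolute difference of the two global feature vectors, two fully connected layers, and a comparison of the change probability against a change threshold; everything else here is an assumption for illustration only: the ReLU between the two layers, the softmax output, the default threshold of 0.5, the function names, and the toy weights used in the usage note.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]  # one row of weights per output unit


def abs_difference(v_before: Vector, v_after: Vector) -> List[float]:
    """Difference feature vector: element-wise |v_before - v_after|."""
    if len(v_before) != len(v_after):
        raise ValueError("global feature vectors must have the same length")
    return [abs(a - b) for a, b in zip(v_before, v_after)]


def linear(x: Vector, weight: Matrix, bias: Vector) -> List[float]:
    """Fully connected layer: weight @ x + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def relu(x: Vector) -> List[float]:
    return [max(0.0, v) for v in x]


def softmax(x: Vector) -> List[float]:
    m = max(x)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def decide(v_before: Vector, v_after: Vector,
           w1: Matrix, b1: Vector, w2: Matrix, b2: Vector,
           threshold: float = 0.5) -> Tuple[int, float]:
    """Return (binary label, change probability) for one image slice pair."""
    diff = abs_difference(v_before, v_after)
    hidden = relu(linear(diff, w1, b1))      # first fully connected layer (ReLU assumed)
    logits = linear(hidden, w2, b2)          # second layer: [no-change, change]
    p_change = softmax(logits)[1]
    # Label 1 only when the probability is strictly greater than the threshold.
    return (1 if p_change > threshold else 0), p_change
```

For example, with the toy weights `w1 = [[1, 0], [0, 1]]`, `b1 = [0, 0]`, `w2 = [[-1, -1], [1, 1]]`, `b2 = [1, -1]`, two identical feature vectors give a zero difference vector and the label 0, while vectors that differ strongly give the label 1. In a trained network these weights would of course be learned rather than hand-set.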
[0065] In summary, this application embodiment aims to minimize model complexity and improve inference speed while ensuring detection accuracy. It utilizes multi-level feature compression and network pruning techniques to achieve slice-level change detection in optical remote sensing images. First, image slice pairs from two temporal optical remote sensing images are acquired, including the image before and after the change. A twin encoder is then used to extract features from the image slice pairs, resulting in multi-level feature maps for both temporal phases. Next, a multi-level feature compression module compresses these multi-level feature maps to obtain global feature vectors for both temporal phases. Finally, a difference feature vector is obtained based on these global feature vectors and input into a decision network for change detection, yielding the change detection result for the image slice pairs. Since the twin encoder used for feature extraction is processed using a sensitivity-based network pruning method, the network is more lightweight and easier to deploy. Furthermore, fusing the compressed multi-level feature maps from both temporal phases into a global feature vector for change detection improves detection accuracy. Therefore, a lightweight slice-level change detection method for optical remote sensing images is achieved.
[0066] In an optional embodiment, the feature channels of the Siamese encoder built on the ResNet18 network are considered to be highly redundant: visualizations of some channel feature maps are nearly identical for ground objects (e.g., buildings and roads) in the input image slices, which incurs additional parameter and computational costs. Furthermore, some channels carry very little ground-object-related feature information and contribute little to detection accuracy. Therefore, compressing the original backbone network by channel pruning is feasible without significant performance loss, and it further reduces model size and speeds up inference.
[0067] Furthermore, the twin encoder comprises multiple levels, and the twin encoder is obtained through the following sensitivity-based network pruning method, comprising steps A1 to A5:
[0068] Step A1: Based on the initial pruning rate, prune each level of the pre-trained initial twin encoder to obtain the pruned initial twin encoder.
[0069] In this embodiment, the initial Siamese encoder refers to a Siamese encoder constructed using a ResNet18 network with the last layer removed. This initial Siamese encoder includes four levels. Without introducing the multi-level feature compression module, the initial Siamese encoder is trained until convergence, and the corresponding trained parameters are stored to obtain the trained initial Siamese encoder. Then, an initial pruning rate is set, and each level of the initial twin encoder is pruned at that initial pruning rate.
[0070] Specifically, pruning each level of the pre-trained initial Siamese encoder based on the initial pruning rate to obtain the pruned initial Siamese encoder includes: based on the L1 norm criterion, sequentially pruning each level of the initial Siamese encoder by removing the N channels with the smallest L1 norm values at that level, where N is equal to the product of the number of channels at the corresponding level and the initial pruning rate. The value of N can be expressed as $N = C_i \times p_0$, where $C_i$ is the number of channels at level $i$, $i$ is the level index, and $p_0$ is the initial pruning rate; for example, $C_1$ is the number of channels at the first level. Because the levels do not all have the same number of channels, the number of channels removed also differs from level to level.
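The L1-norm criterion in step A1 can be sketched as below. This is an illustration under stated assumptions rather than the implementation of this application: each channel is represented by a flat list of its filter weights, N is rounded down to an integer, ties are broken by channel index, and the function names are hypothetical.

```python
from typing import List, Sequence

Filter = Sequence[float]  # hypothetical: all weights of one output channel, flattened


def l1_norm(channel: Filter) -> float:
    """Sum of absolute weight values of one channel."""
    return sum(abs(w) for w in channel)


def channels_to_prune(filters: Sequence[Filter], rate: float) -> List[int]:
    """Indices of the N channels with the smallest L1 norm at one level."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("pruning rate must be in [0, 1)")
    n = int(len(filters) * rate)  # N = number of channels x pruning rate (rounded down)
    norms = [l1_norm(f) for f in filters]
    # Sort by norm; the index is a tie-breaker so the result is deterministic.
    order = sorted(range(len(filters)), key=lambda i: (norms[i], i))
    return sorted(order[:n])


def prune_level(filters: Sequence[Filter], rate: float) -> List[Filter]:
    """Keep every channel that is not among the N smallest-norm channels."""
    removed = set(channels_to_prune(filters, rate))
    return [f for i, f in enumerate(filters) if i not in removed]
```

Applied to a level with four channels whose L1 norms are 1.0, 0.1, 3.0, and 0.3 at a pruning rate of 0.5, the sketch removes the second and fourth channels. In a real convolutional network the input channels of the following layer would also have to be removed to keep the shapes consistent, which this sketch does not attempt.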
[0071] Step A2: Use the network parameters of the pruned initial Siamese encoder as pre-training weights, and train the pruned initial Siamese encoder to obtain the performance loss after pruning at each level.
[0072] Specifically, the network parameters of the pruned initial Siamese encoder obtained in step A1 above are used as pre-training weights to retrain the network pruned at each level, and the performance loss relative to the original model (i.e., the initial Siamese encoder) is calculated on the validation set.
[0073] Step A3: Calculate the sensitivity of each level based on the defined sensitivity function and the performance loss after pruning.
[0074] In this embodiment, considering that different levels contain feature information of different granularities and have different sensitivities to pruning operations, the sensitivity of each level is calculated based on the sensitivity function and performance loss. This allows for the quantification of the sensitivity of each level to pruning operations, enabling subsequent correction of the initial pruning rate based on the sensitivity of each level.
[0075] Specifically, based on the defined sensitivity function and the performance loss after pruning, the sensitivity of each level is calculated, including: determining the minimum performance loss and the maximum performance loss after pruning from the performance losses after pruning of multiple levels of the initial twin encoder after pruning; and obtaining the sensitivity of each level based on the minimum performance loss after pruning, the maximum performance loss after pruning, and the performance loss after pruning of each level.
[0076] For example, the sensitivity of each level can be represented as:
[0077]
[0078] In this formula, the quantities are, in order: the sensitivity of the i-th level; a constant; the maximum performance loss after pruning; the minimum performance loss after pruning; and the performance loss after pruning of the i-th level.
[0079] Step A4: Correct the initial pruning rate using the sensitivity of each level to obtain the corrected pruning rate for each level.
[0080] Specifically, for levels sensitive to pruning operations (i.e., levels with high sensitivity), the initial pruning rate is reduced to minimize performance loss; while for levels insensitive to pruning operations (i.e., levels with low sensitivity), a pruning rate close to the initial pruning rate can be selected. Furthermore, by correcting the pruning rate for each level, a corrected pruning rate for each level is obtained, which is then used to prune the initial twin encoder subsequently.
[0081] Step A5: Prune the initial twin encoder according to the modified pruning rate of each level to obtain the twin encoder.
[0082] In this embodiment, each layer has a different sensitivity to pruning operations. Therefore, the pruning is adjusted according to the modified pruning rate of each layer. That is, the number of channels in each layer after pruning is calculated based on the modified pruning rate, and then the pruned twin encoder is constructed. After pruning, the twin encoder is a lightweight network, which is easier to deploy and has a faster inference speed.
[0083] Specifically, the initial twin encoder is pruned according to the modified pruning rate of each level to obtain a twin encoder, including: obtaining the number of channels after pruning at each level based on the number of channels at each level of the initial twin encoder and the modified pruning rate corresponding to that level; and pruning the channels at each level according to the number of channels after pruning at each level to obtain a twin encoder.
[0084] For example, the number of channels after pruning at each level is represented as:
[0085]
[0086] Here, C denotes the number of channels at each level of the initial twin encoder, and the remaining quantities are the corrected pruning rate of the i-th level, the initial pruning rate, and the sensitivity of the i-th level.
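Steps A3 to A5 can be made concrete with a small sketch. The exact sensitivity function and correction formula are not reproduced in this text, so the forms below are assumptions chosen only to match the qualitative description: sensitivity is taken as a constant `alpha` times the min-max normalized performance loss, the corrected rate as the initial rate scaled by one minus the sensitivity, and the number of removed channels is rounded down. The function names and the numbers in the usage note are hypothetical.

```python
from typing import List, Sequence


def sensitivities(losses: Sequence[float], alpha: float = 0.5) -> List[float]:
    """Assumed form: alpha * (loss_i - min) / (max - min), with alpha in [0, 1]."""
    lo, hi = min(losses), max(losses)
    if hi == lo:
        # All levels lose the same amount: treat them as equally insensitive.
        return [0.0] * len(losses)
    return [alpha * (loss - lo) / (hi - lo) for loss in losses]


def corrected_rates(initial_rate: float, sens: Sequence[float]) -> List[float]:
    """Assumed form: sensitive levels get a lower rate; insensitive ones keep ~initial."""
    return [initial_rate * (1.0 - s) for s in sens]


def channels_after_pruning(channels: Sequence[int], rates: Sequence[float]) -> List[int]:
    """Channels kept per level: C minus the (rounded-down) number removed."""
    return [c - int(c * r) for c, r in zip(channels, rates)]


def plan_pruning(channels: Sequence[int], losses: Sequence[float],
                 initial_rate: float, alpha: float = 0.5) -> List[int]:
    """Steps A3 to A5 in sequence: sensitivity, corrected rate, channel count."""
    sens = sensitivities(losses, alpha)
    rates = corrected_rates(initial_rate, sens)
    return channels_after_pruning(channels, rates)
```

With hypothetical per-level performance losses of 0.5, 1.0, 2.0, and 3.0, an initial pruning rate of 0.4, and `alpha = 0.5`, the assumed formulas give sensitivities of 0, 0.1, 0.3, and 0.5 and corrected rates of 0.40, 0.36, 0.28, and 0.20. For levels with 64, 128, 256, and 512 channels this keeps 39, 82, 185, and 410 channels, so the level that suffers most from pruning is pruned least.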
[0087] In this embodiment, a twin encoder is obtained through sensitivity-based network pruning. Compared with the simple approach of assigning a fixed pruning rate to the model to be pruned, this embodiment takes into account that different levels contain feature information of different granularities and have different sensitivities to pruning operations. Therefore, the initial pruning rate set manually is modified according to the sensitivity of each level, and the initial twin encoder is pruned based on the modified pruning rate, making the pruning operation more reasonable and thus compressing the model complexity to the greatest extent while ensuring detection accuracy.
[0088] In an optional implementation, the multi-level feature compression module compresses the feature map at each level; the multi-level feature compression module is used to compress the two-phase multi-level feature maps to obtain two-phase global feature vectors, including steps B1 to B3:
[0089] Step B1: The multi-level feature compression module employs convolution and max pooling operations at each level to compress the feature map of that level in both channel and spatial dimensions, resulting in a two-phase multi-level compressed feature map. The max pooling operation at each level of the multi-level feature compression module uses the same pooling window size, and the spatial size of the output feature image of the pooling layer decreases sequentially with increasing level.
[0090] Step B2: Flatten the two-phase multi-level compressed feature maps to obtain one-dimensional feature vectors of different levels in the two phases.
[0091] Step B3: Concatenate the one-dimensional feature vectors of different levels in the two time phases to obtain the global feature vectors of the two time phases.
[0092] In this embodiment, to minimize model complexity while ensuring detection accuracy, a max pooling layer is used to compress the multi-level feature maps before the multi-level feature maps of each phase are fused. Max pooling preserves important features while reducing computational cost, and it helps avoid overfitting, which improves the model's generalization ability. Furthermore, max pooling increases the translation invariance of the feature maps, reducing the impact of registration errors on the detection results to some extent. Considering the differences in feature granularity, using global pooling at each level is inappropriate, as it would lead to significant loss of low-level fine-grained features. Therefore, this embodiment assigns the same fixed-size pooling window to all pooling layers, so that the outputs of the lower-level pooling layers have larger spatial dimensions and retain sufficient detail information.
[0093] Specifically, 1×1 convolution and max pooling operations are used to compress the feature maps of all four levels of the Siamese encoder to obtain the corresponding global feature vectors. For example, the structure of the Multi-Level Feature Compression (MLFC) module is shown in Figure 3: each level uses 1×1 convolution and max pooling operations to compress the feature map of that level in the channel and spatial dimensions. Then, each compressed feature map is flattened into a one-dimensional feature vector. Finally, the feature vectors of all levels are concatenated to obtain the global feature vector.
[0094] For example, the global feature vector is represented as:
[0095] $$f_i = \mathrm{Flatten}\big(\mathrm{MaxPool}\big(\mathrm{Conv}_{1\times 1}^{C}(F_i)\big)\big), \quad i = 1, \dots, 4$$
[0096] $$V = \mathrm{Concat}(f_1, f_2, f_3, f_4)$$
[0097] where $F_i$ and $f_i$ denote the output feature map of the $i$-th level and the corresponding flattened one-dimensional feature vector, respectively; $\mathrm{MaxPool}$ denotes the max pooling operation; $\mathrm{Conv}_{1\times 1}^{C}$ denotes a 1×1 convolution with $C$ output channels, where $C$ is set to half the minimum number of channels across all levels; $V$ is the output global feature vector; and $\mathrm{Flatten}$ and $\mathrm{Concat}$ denote the flattening and concatenation operations, respectively. The sliding window size in the pooling layers is set to 8×8. With an input size of 128×128, the output sizes of the pooling layers in the four levels are 8×8, 4×4, 2×2, and 1×1, respectively.
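The size arithmetic of the multi-level feature compression module can be checked with a short sketch. Several inputs are assumptions rather than statements of this application: the pooling stride is taken to equal the 8×8 window, the per-level feature map sizes of 64, 32, 16, and 8 are inferred from the stated pooling outputs for a 128×128 input, and the channel widths of 64, 128, 256, and 512 are those of an unpruned ResNet18. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def pooled_size(size: int, window: int) -> int:
    """Output side length of non-overlapping max pooling (stride == window)."""
    if size % window != 0:
        raise ValueError("feature map size must be a multiple of the pooling window")
    return size // window


def mlfc_layout(feature_sizes: Sequence[int], channels: Sequence[int],
                window: int = 8) -> Tuple[int, List[int], List[int], int]:
    """Return (C, pooled side lengths, per-level vector lengths, global length)."""
    # C is half the minimum channel count across all levels, as described in the text.
    compressed_channels = min(channels) // 2
    out_sizes = [pooled_size(s, window) for s in feature_sizes]
    # After the 1x1 convolution every level has C channels, so flattening
    # yields C * side * side values per level.
    lengths = [compressed_channels * s * s for s in out_sizes]
    return compressed_channels, out_sizes, lengths, sum(lengths)
```

Under these assumptions, `mlfc_layout([64, 32, 16, 8], [64, 128, 256, 512])` gives C = 32, pooled sizes of 8, 4, 2, and 1, per-level vector lengths of 2048, 512, 128, and 32, and a global feature vector of length 2720. The shallowest level contributes most of the vector, which reflects the stated goal of retaining low-level detail; after pruning the channel widths, and therefore C, would be smaller.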
[0098] The method for detecting slice-level changes in optical remote sensing images provided in this application is based on a slice-level detection network. Figure 4 is a schematic diagram of the slice-level detection network provided in an embodiment of this application. The slice-level detection network consists of a twin encoder, a multi-level feature compression module, and a decision network. Compared with existing slice-level detection networks, the network provided in this application uses a backbone obtained by sensitivity-based pruning, which makes the model lighter, more efficient, and faster.
[0099] In practice, image slice pairs from two temporal optical remote sensing images are input into the slice-level detection network. The network uses the twin encoder to extract features from the slice pairs, obtaining multi-level feature maps for both temporal phases. The multi-level feature compression module then compresses and fuses these feature maps to obtain a global feature vector for each phase. Finally, a difference feature vector is obtained from the two global feature vectors and input into the decision network for change detection, yielding the change detection result for the image slice pair. In this way, while ensuring detection accuracy, the approach minimizes model complexity and maximizes inference speed: a highly efficient and lightweight slice-level detection network based on multi-level feature compression and network pruning is designed to achieve slice-level change detection in optical remote sensing images.
[0100] This application also provides an optical remote sensing image slice-level change detection device. Figure 5 is a schematic diagram of a device for detecting slice-level changes in optical remote sensing images according to an embodiment of this application. As shown in Figure 5, the device includes:
[0101] The slicing module 510 is used to acquire image slice pairs of two temporal optical remote sensing images, wherein the two temporal optical remote sensing images include an optical remote sensing image before the change and an optical remote sensing image after the change;
[0102] The extraction module 520 is used to extract features from the image slice pairs using a twin encoder to obtain two-phase multi-level feature maps, wherein the twin encoder is obtained through a sensitivity-based network pruning method.
[0103] The compression module 530 is used to compress the two-phase multi-level feature maps using a multi-level feature compression module to obtain two-phase global feature vectors.
[0104] The detection module 540 is used to obtain a difference feature vector based on the two-phase global feature vectors, and input the difference feature vector into the decision network for change detection to obtain the change detection result of the image slice pair.
[0105] In one alternative embodiment, the apparatus includes a pruning module for generating a twin encoder, the twin encoder comprising multiple levels, the pruning module comprising:
[0106] The initial pruning module is used to prune each level of the pre-trained initial twin encoder according to the initial pruning rate, so as to obtain the pruned initial twin encoder.
[0107] The performance loss calculation module is used to take the network parameters of the pruned initial twin encoder as pre-training weights and train the pruned initial twin encoder to obtain the performance loss after pruning at each level.
[0108] The sensitivity calculation module is used to calculate the sensitivity of each level based on the defined sensitivity function and the performance loss after pruning.
[0109] The pruning rate correction module is used to correct the initial pruning rate using the sensitivity of each level, so as to obtain the corrected pruning rate for each level.
[0110] The correction pruning module is used to prune the initial twin encoder according to the corrected pruning rate of each level to obtain the twin encoder.
[0111] In one optional embodiment, the initial pruning module includes:
[0112] The initial pruning submodule is used to prune each level of the initial twin encoder sequentially based on the L1 norm criterion, removing the N channels with the smallest L1 norm values at each level to obtain the pruned initial twin encoder; where the value of N is equal to the product of the number of channels at the corresponding level and the initial pruning rate.
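The L1 norm criterion can be sketched as follows. This is an illustrative example only: the names `l1_norm` and `prune_level` are hypothetical, each channel is represented by a flat list of its kernel weights, and truncating N to an integer is an assumption, since the text does not state how a fractional product is rounded.

```python
def l1_norm(kernel):
    """Sum of absolute weights of one output channel (flattened kernel)."""
    return sum(abs(w) for w in kernel)


def prune_level(channels, pruning_rate):
    """Return the indices of the channels kept after removing the N channels
    with the smallest L1 norm, where N = number of channels * pruning rate."""
    n_remove = int(len(channels) * pruning_rate)  # truncation is an assumption
    ranked = sorted(range(len(channels)), key=lambda i: l1_norm(channels[i]))
    removed = set(ranked[:n_remove])
    return [i for i in range(len(channels)) if i not in removed]


# Toy level with four channels; L1 norms are 1.0, 0.03, 3.0, and 0.2.
weights = [[0.5, -0.5], [0.01, 0.02], [1.0, 2.0], [-0.1, 0.1]]
kept = prune_level(weights, pruning_rate=0.5)
```

With an initial pruning rate of 0.5, N = 2, so the two channels with the smallest norms (indices 1 and 3) are removed and channels 0 and 2 are kept.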
[0113] In one optional embodiment, the sensitivity calculation module includes:
[0114] The determination module is used to determine the minimum and maximum performance loss after pruning from the performance losses after pruning of the multiple levels of the pruned initial twin encoder.
[0115] The calculation submodule is used to obtain the sensitivity of each level based on the minimum performance loss after pruning, the maximum performance loss after pruning, and the performance loss after pruning at each level.
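One way to combine these three quantities is min-max normalization, sketched below. The exact sensitivity function is not restated in this passage, so the formula in `sensitivities` is an assumption chosen only because it uses the minimum loss, the maximum loss, and each level's own loss; the loss values are made up for illustration.

```python
def sensitivities(losses):
    """Map each level's post-pruning performance loss to [0, 1] by
    min-max normalization (an assumed form of the sensitivity function)."""
    lowest, highest = min(losses), max(losses)
    if highest == lowest:
        # All levels lose the same performance: treat them as equally insensitive.
        return [0.0 for _ in losses]
    return [(loss - lowest) / (highest - lowest) for loss in losses]


# Hypothetical performance losses (e.g. accuracy drop in percentage points)
# measured after pruning each of the four levels in turn.
losses = [1.0, 2.0, 5.0, 3.0]
level_sensitivity = sensitivities(losses)
```

Under this assumed form, the level whose pruning hurts performance most receives sensitivity 1 and the least affected level receives 0.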
[0116] In one optional embodiment, the correction pruning module includes:
[0117] The channel module is used to obtain the number of channels after pruning at each level based on the number of channels at each level of the initial twin encoder and the corrected pruning rate corresponding to that level.
[0118] The pruning submodule is used to prune the channels of each level according to the number of channels after pruning at each level, so as to obtain a twin encoder.
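As a worked example of the channel computation, the sketch below derives the number of channels remaining at each level from the original channel count and the corrected pruning rate. The channel counts and rates are hypothetical, and rounding to the nearest integer with a floor of one channel is an assumption not stated in the text.

```python
def channels_after_pruning(channels_per_level, corrected_rates):
    """Channels kept per level: original count * (1 - corrected pruning rate).
    Rounding and the minimum of one channel are assumptions."""
    return [
        max(1, round(count * (1.0 - rate)))
        for count, rate in zip(channels_per_level, corrected_rates)
    ]


# Hypothetical four-level encoder and corrected pruning rates.
channels = [64, 128, 256, 512]
rates = [0.5, 0.25, 0.5, 0.75]
remaining = channels_after_pruning(channels, rates)
```

Here the four levels keep 32, 96, 128, and 128 channels, respectively; the channels to retain at each level could then be selected with the same L1 norm criterion used in the initial pruning.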
[0119] In one optional embodiment, the multi-level feature compression module compresses the feature map of each level; the compression module includes:
[0120] The computation module is used to perform convolution and max pooling operations at each level to compress the feature map of that level in the channel and spatial dimensions, respectively, to obtain a two-phase multi-level compressed feature map.
[0121] The flattening module is used to flatten the two-phase multi-level compressed feature maps to obtain one-dimensional feature vectors of different levels in the two phases.
[0122] The splicing module is used to splice the one-dimensional feature vectors of different levels in two time phases to obtain the global feature vectors of the two time phases.
[0123] In one optional embodiment, the pooling window size used for max pooling operations at each level of the multi-level feature compression module is the same, and the spatial size of the output feature map of the pooling decreases sequentially as the level increases.
[0124] In one optional embodiment, the detection module includes:
[0125] The detection submodule is used to perform change detection on the difference feature vector using two fully connected layers and to generate a binary label indicating change or no change; the binary label represents the change detection result of the image slice pair.
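The decision step can be sketched in plain Python as follows. This is a toy illustration under several assumptions: the difference feature vector is taken as the element-wise absolute difference (the text does not specify the operation here), a ReLU is placed between the two fully connected layers, and the weights are hand-picked rather than trained. All function and variable names are hypothetical.

```python
def linear(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def decide(f_before, f_after, w1, b1, w2, b2):
    """Return 1 (changed) or 0 (unchanged) for one image slice pair."""
    diff = [abs(a - b) for a, b in zip(f_before, f_after)]  # assumed difference op
    hidden = relu(linear(diff, w1, b1))   # first fully connected layer
    logits = linear(hidden, w2, b2)       # second layer: [no-change, change]
    return 1 if logits[1] > logits[0] else 0


# Hand-picked toy weights: hidden unit = sum of differences, logits = [1 - h, h].
w1, b1 = [[1.0, 1.0, 1.0]], [0.0]
w2, b2 = [[-1.0], [1.0]], [1.0, 0.0]

same = decide([0.0, 1.0, 2.0], [0.0, 1.0, 2.0], w1, b1, w2, b2)
different = decide([0.0, 1.0, 2.0], [1.0, 1.0, 4.0], w1, b1, w2, b2)
```

Only slice pairs labeled 1 would need to be passed on for further processing; pairs labeled 0 can be discarded at the slice level.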
[0126] In one optional embodiment, the slicing module includes:
[0127] The slicing submodule is used to slice the two temporal optical remote sensing images according to a preset sliding window size and step size to obtain image slice pairs; wherein the sliding window size determines the size of each image slice, and the step size determines the overlap rate of adjacent slices.
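The relationship between window size, step size, and overlap can be sketched as follows. The function names are hypothetical, and the sketch assumes that slices which would extend past the image border are simply skipped (no padding), which the text does not specify. Because both temporal images cover the same area, the same coordinates are applied to each image to form slice pairs.

```python
def slice_origins(length, window, step):
    """Top-left offsets of all windows that fit fully along one axis."""
    return list(range(0, length - window + 1, step))


def slice_boxes(height, width, window, step):
    """(top, left, bottom, right) boxes shared by both temporal images."""
    return [
        (top, left, top + window, left + window)
        for top in slice_origins(height, window, step)
        for left in slice_origins(width, window, step)
    ]


def overlap_rate(window, step):
    """Fraction of a slice shared with its neighbor along one axis."""
    return max(0.0, 1.0 - step / window)


# Hypothetical 512x512 image pair, 128x128 slices, step of 64 pixels.
boxes = slice_boxes(512, 512, window=128, step=64)
rate = overlap_rate(window=128, step=64)
```

In this example each axis yields 7 window positions, giving 49 slice pairs with an overlap rate of 0.5; setting the step equal to the window size gives 16 non-overlapping slice pairs.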
[0128] The various embodiments in this specification are described in a progressive manner, with each embodiment focusing on the differences from other embodiments. The same or similar parts between the various embodiments can be referred to each other.
[0129] This application is described with reference to flowchart illustrations and/or block diagrams of methods and apparatus according to embodiments of this application. It should be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing terminal device, produce a device for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
[0130] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means, which implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
[0131] These computer program instructions can also be loaded onto a computer or other programmable data processing terminal device, causing a series of operational steps to be performed on the computer or other programmable terminal device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable terminal device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
[0132] Although preferred embodiments of the present application have been described, those skilled in the art, upon learning the basic inventive concept, can make other changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments as well as all changes and modifications falling within the scope of the embodiments of the present application.
[0133] Finally, it should be noted that in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or terminal device that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or terminal device. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or terminal device that includes said element.
[0134] The above provides a detailed description of the method and apparatus for detecting slice-level changes in optical remote sensing images provided in this application. Specific examples have been used to illustrate the principles and implementation methods of this application. The descriptions of the above embodiments are only for the purpose of helping to understand the method and its core ideas. At the same time, for those skilled in the art, there will be changes in the specific implementation methods and application scope based on the ideas of this application. Therefore, the content of this specification should not be construed as a limitation of this application.
Claims
1. A method for detecting slice-level changes in optical remote sensing images, characterized in that the method comprises: obtaining image slice pairs of two temporal optical remote sensing images, wherein the two temporal optical remote sensing images include an optical remote sensing image before the change and an optical remote sensing image after the change; extracting features from the image slice pairs using a twin encoder to obtain two-phase multi-level feature maps, wherein the twin encoder is obtained by a sensitivity-based network pruning method; compressing the two-phase multi-level feature maps using a multi-level feature compression module to obtain two-phase global feature vectors; and obtaining a difference feature vector based on the two-phase global feature vectors, and inputting the difference feature vector into a decision network for change detection to obtain the change detection result of the image slice pair; wherein the twin encoder comprises multiple levels and is obtained through the following sensitivity-based network pruning method: pruning each level of the pre-trained initial twin encoder based on the initial pruning rate to obtain the pruned initial twin encoder; using the network parameters of the pruned initial twin encoder as pre-training weights and training the pruned initial twin encoder to obtain the performance loss after pruning at each level; calculating the sensitivity of each level based on the defined sensitivity function and the performance loss after pruning; correcting the initial pruning rate using the sensitivity of each level to obtain the corrected pruning rate for each level; and pruning the initial twin encoder according to the corrected pruning rate of each level to obtain the twin encoder.
2. The method according to claim 1, characterized in that pruning each level of the pre-trained initial twin encoder based on the initial pruning rate to obtain the pruned initial twin encoder comprises: pruning each level of the initial twin encoder sequentially based on the L1 norm criterion, removing the N channels with the smallest L1 norm values at each level to obtain the pruned initial twin encoder; wherein the value of N is equal to the product of the number of channels at the corresponding level and the initial pruning rate.
3. The method according to claim 1, characterized in that calculating the sensitivity of each level based on the defined sensitivity function and the performance loss after pruning comprises: determining the minimum performance loss after pruning and the maximum performance loss after pruning from the performance losses after pruning of the multiple levels of the pruned initial twin encoder; and obtaining the sensitivity of each level based on the minimum performance loss after pruning, the maximum performance loss after pruning, and the performance loss after pruning at each level.
4. The method according to claim 1, characterized in that pruning the initial twin encoder according to the corrected pruning rate of each level to obtain the twin encoder comprises: obtaining the number of channels after pruning at each level based on the number of channels at each level of the initial twin encoder and the corrected pruning rate corresponding to that level; and pruning the channels at each level based on the number of channels after pruning at each level to obtain the twin encoder.
5. The method according to claim 1, characterized in that the multi-level feature compression module compresses the feature map at each level, and compressing the two-phase multi-level feature maps using the multi-level feature compression module to obtain two-phase global feature vectors comprises: using convolution and max pooling operations at each level of the multi-level feature compression module to compress the feature map of that level in the channel and spatial dimensions, respectively, to obtain two-phase multi-level compressed feature maps; flattening the two-phase multi-level compressed feature maps to obtain one-dimensional feature vectors of different levels in the two phases; and concatenating the one-dimensional feature vectors of different levels in the two phases to obtain the global feature vectors of the two phases.
6. The method according to claim 5, characterized in that the pooling window size used in the max pooling operation at each level of the multi-level feature compression module is the same, and the spatial size of the output feature map of the pooling layer decreases sequentially as the level increases.
7. The method according to claim 1, characterized in that inputting the difference feature vector into a decision network for change detection comprises: using two fully connected layers of the decision network to perform change detection on the difference feature vector and generating a binary label indicating change or no change, wherein the binary label represents the change detection result of the image slice pair.
8. The method according to claim 1, characterized in that obtaining image slice pairs of two temporal optical remote sensing images comprises: slicing the two temporal optical remote sensing images based on a preset sliding window size and step size to obtain the image slice pairs; wherein the sliding window size determines the size of each image slice, and the step size determines the overlap rate of adjacent slices.
9. A device for detecting slice-level changes in optical remote sensing images, characterized in that the device comprises: a slicing module, used to acquire image slice pairs of two temporal optical remote sensing images, wherein the two temporal optical remote sensing images include an optical remote sensing image before the change and an optical remote sensing image after the change; an extraction module, used to extract features from the image slice pairs using a twin encoder to obtain two-phase multi-level feature maps, wherein the twin encoder is obtained through a sensitivity-based network pruning method; a compression module, used to compress the two-phase multi-level feature maps using a multi-level feature compression module to obtain two-phase global feature vectors; and a detection module, used to obtain a difference feature vector based on the two-phase global feature vectors and input the difference feature vector into a decision network for change detection to obtain the change detection result of the image slice pair; wherein the twin encoder comprises multiple levels and is obtained through the following sensitivity-based network pruning method: pruning each level of the pre-trained initial twin encoder based on the initial pruning rate to obtain the pruned initial twin encoder; using the network parameters of the pruned initial twin encoder as pre-training weights and training the pruned initial twin encoder to obtain the performance loss after pruning at each level; calculating the sensitivity of each level based on the defined sensitivity function and the performance loss after pruning; correcting the initial pruning rate using the sensitivity of each level to obtain the corrected pruning rate for each level; and pruning the initial twin encoder according to the corrected pruning rate of each level to obtain the twin encoder.