Deep mutual learning method for building extraction from remote sensing images based on geometric saliency

By introducing geometric saliency feature maps of buildings into a deep learning network and combining a deep mutual learning method with bidirectional guided attention and flow alignment modules, the problem of lack of prior knowledge in deep learning remote sensing image building extraction methods is solved, achieving high-precision and stable building extraction results.

CN118537747B · Active · Publication Date: 2026-06-12 · CHINA UNIV OF MINING & TECH

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: CHINA UNIV OF MINING & TECH
Filing Date: 2024-06-03
Publication Date: 2026-06-12

Application Information

IPC: G06V20/13; G06V20/10; G06V10/80; G06T3/4053; G06N3/0464; G06N3/048; G06N3/08; G06N3/0442

CPC: G06V20/13; G06V20/176; G06V10/806; G06T3/4053; G06N3/0464; G06N3/048; G06N3/08; G06N3/0442

AI Technical Summary

Technical Problem

Existing deep learning methods for extracting buildings from remote sensing images lack effective utilization of prior remote sensing knowledge, resulting in insufficient network specificity and difficulty in achieving high-precision extraction in complex environments.

Method used

A deep mutual learning method based on geometric saliency is employed. Geometric saliency feature maps of buildings are acquired and fed, together with the remote sensing image, into a GS semantic branch and an RGB semantic branch. A bidirectional guided attention module performs deep mutual learning between the two branches, an improved flow alignment module aligns adjacent feature layers, and the network is optimized with a multi-objective loss function to enhance its ability to learn building features.

Benefits of technology

The method improves the accuracy, completeness, and stability of building extraction, performs well on different datasets, and shows strong versatility and generalization ability.

✦ Generated by Eureka AI based on patent content.

Abstract

The application discloses a deep mutual learning method for building extraction from remote sensing images based on geometric saliency, belonging to the field of remote sensing image processing. A building geometric saliency feature map is obtained as prior knowledge and introduced into a deep learning network, which makes building feature extraction more targeted. A bidirectional guided attention module is constructed so that the building feature maps and the building geometric saliency feature maps in the two-branch network undergo deep mutual learning. An improved flow alignment module is used to obtain high-resolution feature maps with strong semantic information, and an MLP layer fuses the multi-level building feature maps to obtain the building prediction map of the remote sensing image. By introducing building geometric prior knowledge, constructing the bidirectional guided attention module and the improved flow alignment module to strengthen the network's ability to learn building features, and optimizing with a multi-objective loss function, the application can effectively improve the precision and stability of building extraction.

Description

Technical Field

[0001] This invention relates to the field of remote sensing image processing technology, and in particular to a method for extracting buildings from remote sensing images based on deep mutual learning of geometric saliency.

Background Technology

[0002] Buildings serve as carriers of urban information; their type and distribution reflect the city's construction and development. Timely access to building information facilitates precise urban governance and efficient urban management. Based on different methods of application, most current remote sensing image building extraction methods can be categorized into knowledge-driven and data-driven approaches.

[0003] Knowledge-driven building extraction utilizes the spectral characteristics and spatial features of buildings in remote sensing imagery to construct highly interpretable building extraction models. These methods primarily include pixel-based, feature-based, and object-oriented approaches. Pixel-based methods are simple to implement and suitable for low- to medium-resolution remote sensing imagery, but they have poor anti-interference capabilities. Feature-based methods extract and integrate multiple features of buildings from remote sensing imagery, compensating for the insufficient accuracy of single-feature extraction. Object-oriented methods effectively improve the "salt and pepper" noise phenomenon in pixel-level building extraction results; their main idea is to extract the spectral, geometric, and spatial features of buildings from objects. However, these methods are primarily based on prior knowledge and manually designed features, making them susceptible to influences such as remote sensing imagery shooting angle, lighting conditions, building shape, and building shadows, thus limiting the accuracy and applicability of building extraction results over large urban areas.

[0004] The rapid development of artificial intelligence has led to the rapid application of deep learning in building change detection in remote sensing imagery. Data-driven building extraction methods mostly improve accuracy by enhancing multi-scale feature extraction modules, attention mechanisms, and loss functions in existing networks. A multi-constraint fully convolutional network (MC-FCN) model is used for end-to-end building segmentation; the application of multiple constraints further enhances the model's multi-scale feature representation capabilities and improves building extraction performance. MCG-UNet and BCL-UNet are two novel deep convolutional models based on the U-Net series, which can preserve building boundary information even in complex urban scenes. CSA-UNet enhances the network's accuracy in extracting illegal blue-roofed buildings in rural areas by introducing a channel spatial attention mechanism module. Some researchers have added edge constraint optimization functions to building extraction networks, which can effectively improve the accuracy of building edge extraction. Although the above-mentioned building extraction methods have designed new deep learning network models for building extraction, they rely excessively on training samples, lack guidance from prior knowledge of buildings during model training, and have insufficient network specificity.

[0005] Knowledge-driven building extraction methods, which rely on manually designed prior knowledge of buildings, have limitations and are susceptible to external factors such as lighting, making them difficult to apply on a large scale. Data-driven building extraction methods, on the other hand, depend excessively on training samples and lack guidance from prior knowledge of buildings, resulting in insufficient network specificity. To address these issues and improve the specificity and effective utilization of prior remote sensing knowledge in deep learning-based high-resolution remote sensing image building extraction networks, and to meet the need for refined building acquisition, it is necessary to provide a deep mutual learning-based remote sensing image building extraction method that considers geometric saliency to solve these technical problems.

Summary of the Invention

[0006] This invention provides a deep mutual learning method for building extraction from remote sensing images based on geometric saliency, in order to solve the problem that existing deep learning building extraction networks lack effective utilization of prior remote sensing knowledge.

[0007] A first aspect of the present invention provides a deep mutual learning method for building extraction from remote sensing images based on geometric saliency, comprising the following steps:

[0008] Acquire remote sensing images and extract geometric saliency feature maps of buildings from the remote sensing images;

[0009] The remote sensing image and the geometric saliency feature map of the building are respectively input into the RGB semantic branch and the GS semantic branch to obtain building feature maps at different scales;

[0010] A bidirectional guided attention module is used to obtain the dependency relationship between the feature maps of buildings of the same scale in the RGB semantic branch and the GS semantic branch, and deep mutual learning is performed.

[0011] Adjacent building feature maps are input into an improved flow alignment module to obtain high-resolution building feature maps with strong semantic information.

[0012] The MLP layer is used to fuse multi-level building feature maps to obtain building prediction maps from remote sensing images.

[0013] Optionally, in one embodiment of the present invention, the building extraction method utilizes a multi-objective loss function to optimize the extraction process. The multi-objective loss function L consists of the loss of the final building prediction map and the loss of the four feature maps output by the bidirectional guided attention module, and is calculated using the following formula:

[0014]

[0015] Among them, $L_{final}$ is the loss function for the final building prediction map, $L_i$ is the loss function for the feature maps {BG1, BG2, BG3, BG4} output by the bidirectional guided attention module, and λ is the weight coefficient.
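The displayed formula for the multi-objective loss is not reproduced in this text, so the following is only a minimal sketch of one form consistent with the definitions in [0013] and [0015]: the final-prediction loss plus a λ-weighted sum of the four auxiliary losses. The use of binary cross-entropy, the placement of λ outside the sum, the default `lam=0.4`, and the function names are all assumptions made for illustration, not details taken from the patent.

```python
import math


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean binary cross-entropy over flat lists of probabilities and 0/1 labels."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(pred)


def multi_objective_loss(final_pred, aux_preds, target, lam=0.4):
    """Assumed form: L = L_final + lam * sum_i L_i, with i over the BG_i outputs.

    final_pred: probabilities from the final building prediction map.
    aux_preds:  list of four probability lists, one per BG_i (assumed to be
                already resized to the label resolution).
    lam:        hypothetical weight coefficient.
    """
    l_final = binary_cross_entropy(final_pred, target)
    l_aux = sum(binary_cross_entropy(p, target) for p in aux_preds)
    return l_final + lam * l_aux
```

Setting `lam=0.0` reduces the sketch to the final-prediction loss alone, which is a convenient way to check how much the auxiliary terms contribute.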

[0016] Optionally, in one embodiment of the present invention, acquiring remote sensing imagery and extracting geometric saliency feature maps of buildings from the remote sensing imagery includes:

[0017] Based on the characteristics of building intersections in remote sensing imagery, a geometric saliency algorithm is used to obtain geometric saliency feature maps of buildings. The confidence level and angle information of the intersections are used as first-order saliency, and the adjacency relationship of the intersections is used as second-order saliency. The geometric saliency feature map of a building is obtained by fusing the first-order and second-order saliency of the intersections on the remote sensing image. The calculation formula is as follows:

[0018]

[0019] Where p represents each pixel in the remote sensing image, J represents all intersections of the remote sensing image, j represents a single intersection, and $R_j$ represents the quadrilateral region formed by the branches of a single intersection and the included angle θ between the branches. The indicator function takes the value 1 if pixel p lies within $R_j$, and 0 otherwise.

[0020]

[0021] Where ρ is the confidence level of the intersection, and $P(j \in J_B \mid j_\theta)$ is the probability that intersection j belongs to the building intersections when its included angle is $j_\theta \in [0, \pi)$;

[0022]

[0023] Where z is the size of the neighborhood, $N_j$ represents the set of neighborhood intersections of j, j′ is an intersection within $N_j$, the center points are those of the regions associated with j and j′, and τ represents the normalized distance between these center points.

[0024] Optionally, in one embodiment of the present invention, the remote sensing image and the building geometric saliency feature map are respectively input into the RGB semantic branch and the GS semantic branch to obtain building feature maps at different scales, including:

[0025] The remote sensing image is represented as $I \in \mathbb{R}^{H \times W \times C}$, where H, W, and C represent the height, width, and number of channels of the remote sensing image, respectively. The remote sensing image is input into the RGB semantic branch to obtain four building feature maps {BF1, BF2, BF3, BF4} at different scales, with a size of [missing information].

[0026] The geometric saliency feature map of a building is obtained by the geometric saliency algorithm. This feature map is then input into the GS semantic branch to obtain four building feature maps {GS1, GS2, GS3, GS4} at different scales, each containing geometric saliency.

[0027] Optionally, in one embodiment of the present invention, a bidirectional guided attention module is used to obtain the dependency relationship between building feature maps of the same scale in the RGB semantic branch and the GS semantic branch, and deep mutual learning is performed, including:

[0028] Under the guidance of the building feature map $\mathrm{BF}_i$, the building geometric saliency feature map $\mathrm{GS}_i$ adaptively learns the features of the corresponding region, retaining the important region features $\mathrm{IGS}_i$ of $\mathrm{GS}_i$ that are strongly correlated with $\mathrm{BF}_i$. The calculation formula is as follows:

[0029] $\alpha_i = \mathrm{sigmoid}(\mathrm{BF}_i\,\mathrm{GS}_i)$

[0030] $\mathrm{IGS}_i = \alpha_i\,\mathrm{GS}_i$

[0031] In the formula, i = 1, 2, 3, 4, sigmoid(·) is the activation function, and $\alpha_i \in [0,1]$ represents the correlation between the i-th building feature map BF and the building geometric saliency feature map GS;

[0032] Under the guidance of the important region features $\mathrm{IGS}_i$, the building feature map $\mathrm{BF}_i$ adaptively learns the features of the corresponding region, yielding the features $\mathrm{IBF}_i$ of $\mathrm{BF}_i$ that are strongly correlated with $\mathrm{IGS}_i$. The calculation formula is as follows:

[0033] $\beta_i = \mathrm{sigmoid}(\mathrm{BF}_i + \mathrm{IGS}_i)$

[0034] $\mathrm{IBF}_i = \beta_i\,\mathrm{BF}_i$

[0035] In the formula, $\beta_i \in [0,1]$ represents the correlation between the i-th building feature map BF and the building geometric saliency important region feature map IGS;

[0036] The important region features $\mathrm{IGS}_i$ and $\mathrm{IBF}_i$ are fused, and the result is added to the input $\mathrm{BF}_i$ in the manner of a residual connection to obtain the building feature map $\mathrm{BG}_i$ after deep mutual learning. The calculation formula is as follows:

[0037] $\mathrm{BG}_i = \mathrm{BF}_i + \mathrm{Conv}_{3\times3}(\mathrm{IBF}_i \mathbin{P} \mathrm{IGS}_i)$

[0038] Where P represents the feature fusion operation along the channel dimension, and $\mathrm{Conv}_{3\times3}(\cdot)$ represents a convolution operation with a kernel size of 3×3.
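The bidirectional guided attention computation in paragraphs [0028] to [0038] can be illustrated with a small, dependency-free sketch on single-channel 2-D maps. Several points are assumptions rather than details from the patent: the operator inside the sigmoid for α is read as an element-wise product, the channel-dimension fusion P is treated as stacking the two maps as two input channels, and the 3×3 kernels `k0` and `k1` are hypothetical fixed weights standing in for learned ones.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def elementwise(fn, a, b):
    """Apply fn to matching entries of two equally sized 2-D lists."""
    return [[fn(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def conv3x3_two_channels(ch0, ch1, k0, k1):
    """3x3 convolution with zero padding over two stacked channels, one output."""
    h, w = len(ch0), len(ch0[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += k0[dr + 1][dc + 1] * ch0[rr][cc]
                        acc += k1[dr + 1][dc + 1] * ch1[rr][cc]
            out[r][c] = acc
    return out


def bgam(bf, gs, k0, k1):
    """Sketch of BG_i = BF_i + Conv3x3(IBF_i P IGS_i) on single-channel maps."""
    # alpha_i = sigmoid(BF_i * GS_i): element-wise product is an assumption
    alpha = elementwise(lambda x, y: sigmoid(x * y), bf, gs)
    igs = elementwise(lambda a, g: a * g, alpha, gs)          # IGS_i = alpha_i * GS_i
    beta = elementwise(lambda x, y: sigmoid(x + y), bf, igs)  # beta_i = sigmoid(BF_i + IGS_i)
    ibf = elementwise(lambda b, x: b * x, beta, bf)           # IBF_i = beta_i * BF_i
    fused = conv3x3_two_channels(ibf, igs, k0, k1)            # fuse the two channels
    return elementwise(lambda x, y: x + y, bf, fused)         # residual connection
```

With all-zero kernels the residual path returns `bf` unchanged, which makes the role of the skip connection easy to see; a real network would operate on multi-channel tensors with learned kernels.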

[0039] Optionally, in one embodiment of the invention, adjacent building feature maps are input into an improved flow alignment module to obtain high-resolution building feature maps with strong semantic information, including:

[0040] Calculate the semantic flow field $\Delta \in \mathbb{R}^{H \times W \times 2}$ for adjacent feature layers. The calculation formula is as follows:

[0041] $\Delta = \mathrm{Conv}_{3\times3}(\mathrm{Conv}_{1\times1}(\mathrm{BG}_i) \mathbin{P} \mathrm{Up}(\mathrm{Conv}_{1\times1}(\mathrm{BG}'_{i+1})))$

[0042] Among them, $\mathrm{BG}_i$ is a high-resolution building feature map with weak semantic information, $\mathrm{BG}'_{i+1}$ is a low-resolution building feature map with strong semantic information, $\mathrm{Conv}_{1\times1}(\cdot)$ represents a convolution operation with a kernel size of 1×1, and Up(·) represents an upsampling operation;

[0043] The adjacent feature layers are fused, and then a gating operation is performed to obtain the gated feature map g. The calculation formula is as follows:

[0044] $g = \mathrm{Conv}_G(\mathrm{Conv}_{3\times3}(\mathrm{Conv}_{1\times1}(\mathrm{BG}_i) \mathbin{P} \mathrm{Up}(\mathrm{Conv}_{1\times1}(\mathrm{BG}'_{i+1}))))$

[0045] Among them, $\mathrm{Conv}_G(\cdot)$ represents a gating operation, which includes a convolutional layer with a kernel size of 1×1 and a sigmoid activation function;

[0046] By applying the same semantic flow field to adjacent feature layers, two high-resolution feature maps with aligned semantic information are obtained. Then, the two semantically aligned feature maps are weighted and fused using the gated feature maps g and 1−g to obtain a high-resolution building feature map $\mathrm{BG}'_i$ with strong semantic information. The calculation formula is as follows:

[0047] $\mathrm{BG}'_i = g\,\mathrm{Conv}_{1\times1}(\mathrm{warp}(\mathrm{BG}'_{i+1}, \Delta)) + (1-g)\,\mathrm{warp}(\mathrm{BG}_i, \Delta)$

[0048] Here, warp(·) represents the warp operation.
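The warp and gated fusion steps of [0046] to [0048] can be sketched on single-channel 2-D maps as below. This is an illustration under stated assumptions, not the patented implementation: the flow field is taken to hold per-pixel `(dy, dx)` offsets in pixel units, warping is done by bilinear sampling with border clamping, the low-resolution map is assumed to be already upsampled to H×W, and the 1×1 convolution is reduced to a hypothetical scalar weight `w1x1`.

```python
import math


def bilinear_sample(fmap, y, x):
    """Bilinearly interpolate fmap at real-valued (y, x), clamping to the border."""
    h, w = len(fmap), len(fmap[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = (1 - wx) * fmap[y0][x0] + wx * fmap[y0][x1]
    bottom = (1 - wx) * fmap[y1][x0] + wx * fmap[y1][x1]
    return (1 - wy) * top + wy * bottom


def warp(fmap, flow):
    """Resample fmap at positions shifted by flow[r][c] = (dy, dx) in pixels."""
    return [
        [bilinear_sample(fmap, r + flow[r][c][0], c + flow[r][c][1])
         for c in range(len(fmap[0]))]
        for r in range(len(fmap))
    ]


def gated_fusion(bg_low_up, bg_high, flow, gate, w1x1=1.0):
    """BG'_i = g * Conv1x1(warp(BG'_{i+1}, flow)) + (1 - g) * warp(BG_i, flow)."""
    warped_low = warp(bg_low_up, flow)
    warped_high = warp(bg_high, flow)
    return [
        [gate[r][c] * w1x1 * warped_low[r][c]
         + (1.0 - gate[r][c]) * warped_high[r][c]
         for c in range(len(bg_high[0]))]
        for r in range(len(bg_high))
    ]
```

A gate of 1 everywhere selects the warped strong-semantic map and a gate of 0 selects the warped high-resolution map, so intermediate gate values blend the two per pixel.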

[0049] A second aspect of the present invention provides a deep mutual learning device for building extraction from remote sensing images based on geometric saliency, comprising:

[0050] The extraction module is used to acquire remote sensing images and extract geometric saliency feature maps of buildings in the remote sensing images;

[0051] The branch module is used to input the remote sensing image and the geometric saliency feature map of the building into the RGB semantic branch and the GS semantic branch respectively to obtain building feature maps at different scales;

[0052] The mutual learning module is used to obtain the dependency relationship between the building feature maps of the same scale in the RGB semantic branch and the GS semantic branch through the bidirectional guided attention module, and to perform deep mutual learning.

[0053] The input module is used to input adjacent building feature maps into the improved flow alignment module to obtain high-resolution building feature maps with strong semantic information;

[0054] The fusion module is used to fuse multi-level building feature maps using an MLP layer to obtain a building prediction map of the remote sensing image.

[0055] Optionally, in one embodiment of the present invention, the building extraction method utilizes a multi-objective loss function to optimize the extraction process. The multi-objective loss function L consists of the loss of the final building prediction map and the loss of the four feature maps output by the bidirectional guided attention module, and is calculated using the following formula:

[0056]

[0057] Among them, $L_{final}$ is the loss function for the final building prediction map, $L_i$ is the loss function for the feature maps {BG1, BG2, BG3, BG4} output by the bidirectional guided attention module, and λ is the weight coefficient.

[0058] Optionally, in one embodiment of the present invention, the extraction module is specifically used for:

[0059] Based on the characteristics of building intersections in remote sensing imagery, a geometric saliency algorithm is used to obtain geometric saliency feature maps of buildings. The confidence level and angle information of the intersections are used as first-order saliency, and the adjacency relationship of the intersections is used as second-order saliency. The geometric saliency feature map of a building is obtained by fusing the first-order and second-order saliency of the intersections on the remote sensing image. The calculation formula is as follows:

[0060]

[0061] Where p represents each pixel in the remote sensing image, J represents all intersections of the remote sensing image, j represents a single intersection, and $R_j$ represents the quadrilateral region formed by the branches of a single intersection and the included angle θ between the branches. The indicator function takes the value 1 if pixel p lies within $R_j$, and 0 otherwise.

[0062]

[0063] Where ρ is the confidence level of the intersection, and $P(j \in J_B \mid j_\theta)$ is the probability that intersection j belongs to the building intersections when its included angle is $j_\theta \in [0, \pi)$;

[0064]

[0065] Where z is the size of the neighborhood, $N_j$ represents the set of neighborhood intersections of j, j′ is an intersection within $N_j$, the center points are those of the regions associated with j and j′, and τ represents the normalized distance between these center points.

[0066] Optionally, in one embodiment of the present invention, the branch module is specifically used for:

[0067] The remote sensing image is represented as $I \in \mathbb{R}^{H \times W \times C}$, where H, W, and C represent the height, width, and number of channels of the remote sensing image, respectively. The remote sensing image is input into the RGB semantic branch to obtain four building feature maps {BF1, BF2, BF3, BF4} at different scales, with a size of [missing information].

[0068] The geometric saliency feature map of a building is obtained by the geometric saliency algorithm. This feature map is then input into the GS semantic branch to obtain four building feature maps {GS1, GS2, GS3, GS4} at different scales, each containing geometric saliency.

[0069] A third aspect of the present invention provides an electronic device, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the deep mutual learning method for building extraction from remote sensing images based on geometric saliency as described in the above embodiments.

[0070] A fourth aspect of the present invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above-described deep mutual learning method for building extraction from remote sensing images based on geometric saliency.

[0071] A fifth aspect of the present invention provides a computer program product, including a computer program that is executed to implement the deep mutual learning method for building extraction from remote sensing images based on geometric saliency as described in the above embodiments.

[0072] The method and apparatus for extracting buildings from remote sensing images based on geometric saliency in this invention have the following beneficial effects:

[0073] (1) This invention introduces the geometric saliency of buildings as prior knowledge into the deep learning network, guiding the network to focus on the geometrically salient regions of buildings, which makes the network's learning of building features more targeted.

[0074] (2) The present invention proposes a bidirectional guided attention module (BGAM) to effectively capture the dependency relationship between feature maps of the same scale in the RGB semantic branch and the GS semantic branch. The deep mutual learning between the two can enhance the feature extraction effect of the target region.

[0075] (3) This invention proposes an improved flow alignment module (FAM++) to obtain high-resolution building feature maps with strong semantic information and reduce the phenomenon of holes inside buildings. Based on FAM, FAM++ incorporates the idea of gating, which can better learn the semantic and spatial information between adjacent feature layers.

[0076] (4) In order to obtain more accurate building extraction results, the present invention constructs a multi-objective loss function to optimize the network. The multi-objective loss function consists of the loss of the final prediction map and the loss of the four feature maps output by the encoder.

[0077] (5) This invention performs excellently on three different datasets, achieving the highest accuracy in all evaluation metrics. For the WHU Building Dataset, GSDMLNet's Precision, Recall, F1, IoU, and OA are 97.89%, 97.57%, 97.73%, 95.61%, and 98.83%, respectively. For the Massachusetts Building Dataset, compared to the second-ranked SegFormer, GSDMLNet's Precision, Recall, F1, IoU, and OA are 0.35%, 2.77%, 2.55%, 2.71%, and 0.71% higher, respectively. GSDMLNet also performs well on the China Building Dataset, with its Precision, Recall, F1, IoU, and OA being 1.77%, 0.77%, 1.27%, 2.05%, and 1.13% higher than the better-performing SegFormer, respectively. GSDMLNet has a certain degree of versatility and strong generalization ability on different datasets.
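The five evaluation metrics named in (5) are not defined in this text. Assuming they follow the standard definitions for binary segmentation, they can be computed from pixel-level confusion counts as in the sketch below; the function name and the example counts are hypothetical and unrelated to the reported results.

```python
def building_metrics(tp, fp, fn, tn):
    """Standard binary-segmentation metrics from pixel counts.

    tp: building pixels predicted as building
    fp: background pixels predicted as building
    fn: building pixels predicted as background
    tn: background pixels predicted as background
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)              # intersection over union
    oa = (tp + tn) / (tp + fp + fn + tn)   # overall accuracy
    return {"precision": precision, "recall": recall,
            "f1": f1, "iou": iou, "oa": oa}


# Made-up counts for illustration only
metrics = building_metrics(tp=80, fp=20, fn=20, tn=880)
```

Because background pixels usually dominate, OA tends to be high even when IoU is modest, which is why IoU and F1 are the more informative figures for building extraction.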

[0078] Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Attached Figure Description

[0079] The above and / or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, wherein:

[0080] Figure 1 is a flowchart of a method for extracting buildings from remote sensing images based on geometric saliency, according to an embodiment of the present invention;

[0081] Figure 2 shows the network structure of GSDMLNet according to embodiments of the present invention: (a) overall structure of the GSDMLNet network; (b) structure of the E module in the encoder; (c) structure of the MLP layer in the decoder; (d) structure of the BGAM module in the encoder; (e) structure of the FAM++ module in the decoder;

[0082] Figure 3 is a flowchart of a specific embodiment of the deep mutual learning method for building extraction from remote sensing images based on geometric saliency;

[0083] Figure 4 shows building extraction results on the WHU Building Dataset: (a) Original image; (b) Labels; (c) PSPNet; (d) Deeplabv3+; (e) HRNet; (f) SegFormer; (g) GSDMLNet;

[0084] Figure 5 shows building extraction results on the Massachusetts Building Dataset: (a) Original image; (b) Labels; (c) PSPNet; (d) Deeplabv3+; (e) HRNet; (f) SegFormer; (g) GSDMLNet;

[0085] Figure 6 shows building extraction results on the China Building Dataset: (a) Original image; (b) Labels; (c) PSPNet; (d) Deeplabv3+; (e) HRNet; (f) SegFormer; (g) GSDMLNet;

[0086] Figure 7 is a schematic diagram of the structure of a deep mutual learning device for building extraction from remote sensing images based on geometric saliency, according to an embodiment of the present invention.

Detailed Implementation

[0087] Embodiments of the present invention are described in detail below. Examples of these embodiments are illustrated in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary and intended to explain the present invention, and should not be construed as limiting the present invention.

[0088] Figure 1 is a flowchart of a method for extracting buildings from remote sensing images based on geometric saliency, according to an embodiment of the present invention.

[0089] As shown in Figure 1, the deep mutual learning method for building extraction from remote sensing images based on geometric saliency includes the following steps:

[0090] Step S101: Acquire remote sensing images and extract geometric saliency feature maps of buildings in the remote sensing images.

[0091] In embodiments of the present invention, remote sensing images are acquired and a dataset is created. The dataset is randomly divided into a training set, a validation set, and a test set according to a set ratio. Geometric saliency feature maps of buildings are obtained using the remote sensing images and introduced as prior knowledge into the deep learning network.
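The random division into training, validation, and test sets can be sketched as follows. The patent only says the split follows "a set ratio", so the 70/15/15 ratio, the fixed seed, and the function name `split_dataset` are assumptions chosen for illustration.

```python
import random


def split_dataset(items, ratios=(0.7, 0.15, 0.15), seed=0):
    """Shuffle items reproducibly and split them into train/val/test lists.

    ratios: hypothetical (train, val, test) fractions; the test set takes
            whatever remains after the train and validation portions.
    """
    items = list(items)
    random.Random(seed).shuffle(items)  # local RNG, global state untouched
    n_train = round(len(items) * ratios[0])
    n_val = round(len(items) * ratios[1])
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


# Example with placeholder tile names
tiles = [f"tile_{k:03d}.tif" for k in range(100)]
train, val, test = split_dataset(tiles)
```

Fixing the seed keeps the split reproducible across runs, so the geometric saliency maps computed for each tile stay paired with the same subset.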

[0092] In embodiments of the present invention, based on the characteristics of building intersections in remote sensing imagery, a Geometric Saliency (GeoSay) algorithm is employed to obtain a geometric saliency feature map of the buildings. The confidence level and included angle information of the intersections are used as first-order saliency, and the adjacency relationship of the intersections is used as second-order saliency. The geometric saliency feature map of the building is obtained by fully integrating the first-order and second-order saliency of the intersections on the remote sensing image. The calculation formula is as follows:

[0093]

[0094] Where p represents each pixel in the remote sensing image, J represents all intersections obtained from the remote sensing image using the ASJ algorithm, j represents a single intersection, and $R_j$ represents the quadrilateral region formed by the branches of a single intersection and the included angle θ between the branches. The indicator function takes the value 1 if pixel p lies within $R_j$, and 0 otherwise.

[0095]

[0096] Where ρ is the confidence level of the intersection, and $P(j \in J_B \mid j_\theta)$ is the probability that intersection j belongs to the building intersections when its included angle is $j_\theta \in [0, \pi)$; since most buildings in remote sensing images are rectangular, in this invention

[0097]

[0098] Where z is the size of the neighborhood, $N_j$ represents the set of neighborhood intersections obtained by the τ-NN algorithm, j′ is an intersection within $N_j$, the center points are those of the regions associated with j and j′, and τ represents the normalized distance between these center points.
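The saliency formulas themselves are not reproduced in this text, so the sketch below only illustrates the role of the indicator over $R_j$: each intersection contributes a score to every pixel that lies inside its region. The summation form, the axis-aligned rectangles standing in for the quadrilateral regions, and the score values are all assumptions made for illustration; in the described method the score would come from the first-order and second-order saliency.

```python
def in_region(pixel, region):
    """Indicator: 1 if pixel (row, col) lies in the rectangle, else 0.

    region is (row_min, col_min, row_max, col_max), an axis-aligned
    stand-in for the quadrilateral R_j described in the text.
    """
    r, c = pixel
    r0, c0, r1, c1 = region
    return 1 if r0 <= r <= r1 and c0 <= c <= c1 else 0


def saliency_map(height, width, junctions):
    """Accumulate per-intersection scores over the pixels their regions cover.

    junctions: list of (region, score) pairs; the scores are placeholders.
    """
    sal = [[0.0] * width for _ in range(height)]
    for region, score in junctions:
        for r in range(height):
            for c in range(width):
                sal[r][c] += score * in_region((r, c), region)
    return sal


# Two overlapping hypothetical regions on a 4x4 image
example = saliency_map(4, 4, [((0, 0, 1, 1), 0.5), ((1, 1, 2, 2), 0.25)])
```

Pixels covered by several regions accumulate a higher value, which matches the intuition that areas supported by multiple building corners are more likely to belong to a building.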

[0099] Step S102: Input the remote sensing image and the geometric saliency feature map of the building into the RGB semantic branch and the GS semantic branch respectively to obtain building feature maps at different scales.

[0100] In embodiments of the present invention, as shown in Figure 2, the encoder constructs an RGB semantic branch and a GS semantic branch. Let the remote sensing image be $I \in \mathbb{R}^{H \times W \times C}$, where H, W, and C represent the height, width, and number of channels of the remote sensing image, respectively. The remote sensing image is input into the RGB semantic branch to obtain four building feature maps {BF1, BF2, BF3, BF4} at different scales, with a size of [missing information]. The GeoSay algorithm is used to obtain a single-band building geometric saliency feature map, which is then input into the GS semantic branch to obtain four building feature maps {GS1, GS2, GS3, GS4} at different scales, each containing geometric saliency.

[0101]

[0102] Step S103: A bi-directional guided attention module (BGAM) is used to obtain the dependency relationship between the feature maps of buildings of the same scale in the RGB semantic branch and the GS semantic branch, and deep mutual learning is performed. The deep mutual learning between the two can enhance the feature extraction effect of the target region.

[0103] In an embodiment of the invention, a bidirectional guided attention module is constructed that can effectively capture the dependencies between building feature maps of the same scale in both branches. First, under the guidance of the building feature map $\mathrm{BF}_i$, the building geometric saliency feature map $\mathrm{GS}_i$ adaptively learns the features of the corresponding region, retaining the important region features $\mathrm{IGS}_i$ in $\mathrm{GS}_i$ that are strongly correlated with $\mathrm{BF}_i$. The calculation formula is as follows:

[0104] $\alpha_i = \mathrm{sigmoid}(\mathrm{BF}_i\,\mathrm{GS}_i)$

[0105] $\mathrm{IGS}_i = \alpha_i\,\mathrm{GS}_i$

[0106] Where i = 1, 2, 3, 4, sigmoid(·) is the activation function, and $\alpha_i \in [0,1]$ represents the correlation between the i-th building feature map BF and the building geometric saliency feature map GS.

[0107] Next, under the guidance of the important-region features $IGS_i$, the building feature map $BF_i$ adaptively learns the features of the corresponding regions, yielding the features $IBF_i$ in $BF_i$ that are strongly correlated with $IGS_i$. The calculation formula is as follows:

[0108] $\beta_i = \mathrm{sigmoid}(BF_i + IGS_i)$

[0109] $IBF_i = \beta_i \, BF_i$

[0110] In the formula, $\beta_i \in [0,1]$ represents the correlation between the i-th building feature map BF and the building geometric saliency important-region feature map IGS.

[0111] Finally, the adaptively learned features $IGS_i$ and $IBF_i$ are fused to supplement the semantic information of the buildings. Following the residual idea, the fused result is then summed with the input $BF_i$ to obtain the building feature map $BG_i$ after deep mutual learning. The calculation formula is as follows:

[0112] $BG_i = BF_i + \mathrm{Conv}_{3\times3}(IBF_i \;\mathrm{P}\; IGS_i)$

[0113] In the formula, P represents the feature fusion operation along the channel dimension, and $\mathrm{Conv}_{3\times3}(\cdot)$ represents a convolution operation with a kernel size of 3×3.
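The three BGAM stages described above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the patented implementation: the operator between BF_i and GS_i inside the first sigmoid is not legible in this text, so the sketch assumes an element-wise product; it also assumes that BF_i and GS_i share the same shape and that the fusion operation P is channel-wise concatenation. The function names `bgam` and `conv3x3` and the random weights are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv3x3(x, weight):
    """3x3 convolution with zero padding.
    x: (C_in, H, W), weight: (C_out, C_in, 3, 3) -> (C_out, H, W)."""
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((weight.shape[0], h, w))
    for dy in range(3):
        for dx in range(3):
            patch = padded[:, dy:dy + h, dx:dx + w]
            out += np.einsum("oc,chw->ohw", weight[:, :, dy, dx], patch)
    return out

def bgam(bf, gs, weight):
    """One scale of bidirectional guided attention (sketch)."""
    # ASSUMPTION: element-wise product inside the sigmoid (operator unclear in the text).
    alpha = sigmoid(bf * gs)
    igs = alpha * gs                      # GS features guided by BF
    beta = sigmoid(bf + igs)
    ibf = beta * bf                       # BF features guided by IGS
    # ASSUMPTION: the fusion operation P is concatenation along the channel axis.
    fused = np.concatenate([ibf, igs], axis=0)
    return bf + conv3x3(fused, weight)    # residual connection with the input BF

rng = np.random.default_rng(0)
bf = rng.normal(size=(4, 8, 8))           # hypothetical BF_i with 4 channels
gs = rng.normal(size=(4, 8, 8))           # hypothetical GS_i with 4 channels
weight = rng.normal(scale=0.1, size=(4, 8, 3, 3))
bg = bgam(bf, gs, weight)
print(bg.shape)
```

Because of the residual connection, a zero convolution weight makes the module an identity on BF_i, which is a convenient sanity check when wiring it into a larger network.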

[0114] Step S104: Input the feature maps of adjacent buildings into the flow alignment module (FAM++) to obtain high-resolution building feature maps with strong semantic information.

[0115] In an embodiment of the present invention, to obtain high-resolution building feature maps with strong semantic information and to reduce holes inside buildings, an improved flow alignment module (FAM++) is constructed. FAM++ adds a gating mechanism to FAM (Flow Alignment Module), enabling it to better learn the semantic and spatial information between adjacent feature layers. First, the semantic flow field $\Delta \in \mathbb{R}^{H \times W \times 2}$ is calculated for adjacent feature layers. The calculation formula is as follows:

[0116] $\Delta = \mathrm{Conv}_{3\times3}(\mathrm{Conv}_{1\times1}(BG_i) \;\mathrm{P}\; \mathrm{Up}(\mathrm{Conv}_{1\times1}(BG'_{i+1})))$

[0117] Where $BG_i$ is a high-resolution building feature map with weak semantic information, $BG'_{i+1}$ is a low-resolution building feature map with strong semantic information, $\mathrm{Conv}_{1\times1}(\cdot)$ represents a convolution operation with a kernel size of 1×1, and Up(·) represents an upsampling operation.

[0118] Next, to highlight the features of the same regions in adjacent feature layers, a fusion operation is performed on the adjacent feature layers, and then a gating operation is used to obtain the gated feature map g, the calculation formula of which is as follows:

[0119] $g = \mathrm{Conv}_G(\mathrm{Conv}_{3\times3}(\mathrm{Conv}_{1\times1}(BG_i) \;\mathrm{P}\; \mathrm{Up}(\mathrm{Conv}_{1\times1}(BG'_{i+1}))))$

[0120] Where $\mathrm{Conv}_G(\cdot)$ represents a gating operation, which contains a convolutional layer with a kernel size of 1×1 and a sigmoid activation function.

[0121] Finally, warp operations are performed on the adjacent feature layers using the same semantic flow field to obtain two high-resolution feature maps with aligned semantic information. These two semantically aligned feature maps are then weighted and fused using the gated feature maps g and 1-g to obtain a high-resolution building feature map $BG'_i$ with strong semantic information. The calculation formula is as follows:

[0122] $BG'_i = g \, \mathrm{Conv}_{1\times1}(\mathrm{warp}(BG'_{i+1}, \Delta)) + (1-g) \, \mathrm{warp}(BG_i, \Delta)$

[0123] Here, warp(·) represents the warp operation.
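To make the data flow of FAM++ concrete, the following NumPy sketch chains the flow field, the gate, the warp, and the gated fusion for one pair of adjacent levels. It is a sketch under several assumptions that the text does not state: Up(·) is taken as nearest-neighbour 2× upsampling, warp(·) as bilinear sampling on the high-resolution grid with the flow expressed in output-pixel units, the flow and gate branches use separate 3×3 weights, and g has a single channel that is broadcast over all feature channels. All function names, parameter keys, and shapes are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1x1(x, w):
    """x: (C_in, H, W), w: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)

def conv3x3(x, w):
    """Zero-padded 3x3 convolution. x: (C_in, H, W), w: (C_out, C_in, 3, 3)."""
    _, h, wd = x.shape
    p = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for dy in range(3):
        for dx in range(3):
            out += conv1x1(p[:, dy:dy + h, dx:dx + wd], w[:, :, dy, dx])
    return out

def upsample2x(x):
    """ASSUMPTION: nearest-neighbour upsampling stands in for Up(.)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def warp(feat, flow):
    """Bilinearly sample feat (C, h, w) on the (H, W) grid of flow (2, H, W)."""
    _, h, w = feat.shape
    _, H, W = flow.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    sx = np.clip((xs + flow[0]) * (w - 1) / max(W - 1, 1), 0, w - 1)
    sy = np.clip((ys + flow[1]) * (h - 1) / max(H - 1, 1), 0, h - 1)
    x0, y0 = np.floor(sx).astype(int), np.floor(sy).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    wx, wy = sx - x0, sy - y0
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bottom = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy

def fam_pp(bg_i, bg_next, p):
    """Gated flow alignment of a high-res map bg_i and a low-res map bg_next."""
    stacked = np.concatenate(
        [conv1x1(bg_i, p["w_hi"]), upsample2x(conv1x1(bg_next, p["w_lo"]))], axis=0)
    delta = conv3x3(stacked, p["w_flow"])                               # (2, H, W)
    g = sigmoid(conv1x1(conv3x3(stacked, p["w_gate3"]), p["w_gate1"]))  # (1, H, W)
    low = conv1x1(warp(bg_next, delta), p["w_out"])  # aligned strong-semantic map
    high = warp(bg_i, delta)                         # aligned high-resolution map
    return g * low + (1 - g) * high

rng = np.random.default_rng(0)
bg_i = rng.normal(size=(4, 8, 8))
bg_next = rng.normal(size=(6, 4, 4))
params = {
    "w_hi": rng.normal(scale=0.1, size=(3, 4)),
    "w_lo": rng.normal(scale=0.1, size=(3, 6)),
    "w_flow": rng.normal(scale=0.1, size=(2, 6, 3, 3)),
    "w_gate3": rng.normal(scale=0.1, size=(3, 6, 3, 3)),
    "w_gate1": rng.normal(scale=0.1, size=(1, 3)),
    "w_out": rng.normal(scale=0.1, size=(4, 6)),
}
out = fam_pp(bg_i, bg_next, params)
print(out.shape)
```

Since g and 1-g sum to one at every pixel, the output is a per-pixel convex combination of the two aligned maps; the gate decides locally whether the strong-semantic or the high-resolution source dominates.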

[0124] Step S105: The MLP layer is used to fuse the multi-level building feature maps to obtain the building prediction map of the remote sensing image.

[0125] It is understood that the MLP layer in this embodiment of the invention is a conventional network structure, and therefore will not be described in detail.
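Because the MLP layer is not detailed here, the following NumPy sketch shows only one common way such a fusion head can be built, in the style of an all-MLP decoder: each level is projected to a common channel width by a per-pixel linear layer, resized to the finest resolution, concatenated, and mapped to a single building probability map. Every specific in it is an assumption for illustration: the channel widths, the halving of resolution between levels, nearest-neighbour resizing, and the names `mlp_fuse`, `linear`, and `upsample_to`.

```python
import numpy as np

def linear(x, w):
    """Per-pixel linear layer over channels. x: (C_in, H, W), w: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)

def upsample_to(x, size):
    """Nearest-neighbour resize of x (C, h, w) to (C, size, size).
    ASSUMPTION: size is an integer multiple of h and w."""
    fy, fx = size // x.shape[1], size // x.shape[2]
    return x.repeat(fy, axis=1).repeat(fx, axis=2)

def mlp_fuse(features, proj_weights, fuse_weight):
    """Fuse multi-level feature maps into one probability map (sketch)."""
    size = features[0].shape[1]                    # finest level sets the output size
    unified = [upsample_to(linear(f, w), size)
               for f, w in zip(features, proj_weights)]
    logits = linear(np.concatenate(unified, axis=0), fuse_weight)
    return 1.0 / (1.0 + np.exp(-logits))           # building probability per pixel

rng = np.random.default_rng(0)
# Hypothetical four-level pyramid standing in for {BG1', BG2', BG3', BG4}.
features = [rng.normal(size=(c, s, s)) for c, s in [(8, 16), (16, 8), (32, 4), (64, 2)]]
proj_weights = [rng.normal(scale=0.1, size=(8, f.shape[0])) for f in features]
fuse_weight = rng.normal(scale=0.1, size=(1, 32))
prediction = mlp_fuse(features, proj_weights, fuse_weight)
print(prediction.shape)
```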

[0126] Optionally, in one embodiment of the present invention, in order to obtain more accurate building extraction results, a multi-objective loss function is constructed to optimize the network. The multi-objective loss function L consists of the loss of the final predicted map and the loss of the four feature maps output by the encoder, and the calculation formula is as follows:

[0127]

[0128] In the formula, $L_{final}$ is the loss function for the final prediction map, $L_i$ is the loss function for the intermediate feature maps {BG1, BG2, BG3, BG4}, and λ is the weighting coefficient. In this invention, to reduce the impact of sample imbalance, $L_{final}$ uses Dice loss and Focal Loss, $L_i$ uses Focal Loss, and λ is set to 1.
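As an illustration of how such a multi-objective loss can be assembled, the NumPy sketch below combines Dice loss and Focal Loss for the final prediction with Focal Loss for four intermediate outputs. The combined formula is not reproduced in this text, so the sketch assumes the form L = L_final + λ·ΣL_i with L_final taken as the plain sum of Dice and Focal terms. The focal parameters `gamma` and `alpha`, the Dice smoothing constant, and the assumption that each intermediate map has already been converted to a probability map at label resolution are likewise not from the source.

```python
import numpy as np

def focal_loss(prob, target, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss averaged over pixels (gamma and alpha are assumed values)."""
    prob = np.clip(prob, eps, 1 - eps)
    pt = np.where(target == 1, prob, 1 - prob)     # probability of the true class
    at = np.where(target == 1, alpha, 1 - alpha)   # class-balancing weight
    return float(np.mean(-at * (1 - pt) ** gamma * np.log(pt)))

def dice_loss(prob, target, smooth=1.0):
    """Soft Dice loss over the whole map."""
    inter = np.sum(prob * target)
    return float(1 - (2 * inter + smooth) / (np.sum(prob) + np.sum(target) + smooth))

def total_loss(final_prob, side_probs, target, lam=1.0):
    """ASSUMED form: L = L_final + lam * sum_i L_i."""
    l_final = dice_loss(final_prob, target) + focal_loss(final_prob, target)
    l_side = sum(focal_loss(p, target) for p in side_probs)
    return l_final + lam * l_side

target = np.zeros((4, 4))
target[1:3, 1:3] = 1.0                             # a small square "building"
uncertain = np.full((4, 4), 0.5)                   # uninformative prediction
loss_perfect = total_loss(target, [target] * 4, target)
loss_uncertain = total_loss(uncertain, [uncertain] * 4, target)
print(loss_perfect, loss_uncertain)
```

The focal term down-weights pixels that are already classified confidently, which is why it is a common choice when background pixels far outnumber building pixels.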

[0129] The geometric-saliency-based deep mutual learning method for building extraction from remote sensing images is described in detail below by way of a specific embodiment.

[0130] The datasets used in the embodiments of the present invention are publicly available online datasets, namely the WHU Building Dataset, the Massachusetts Building Dataset, and the China Building Dataset.

[0131] As shown in Figure 3, the deep mutual learning method for building extraction from remote sensing images based on geometric saliency includes the following steps:

[0132] Step 1: Acquire remote sensing images and create the datasets. For the WHU Building Dataset, all remote sensing images were cropped to 256×256, giving 14,868 sets in total, of which 8,460 were randomly selected for training, 3,626 for validation, and 2,782 for testing. For the Massachusetts Building Dataset, all remote sensing images were cropped to 256×256, giving 3,775 sets in total, of which 3,425 were randomly selected for training, 100 for validation, and 250 for testing. The China Building Dataset contains 7,259 sets of 256×256 remote sensing images, of which 4,188 were randomly selected for training, 1,796 for validation, and 1,275 for testing.
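The split sizes quoted in Step 1 can be checked with a few lines of plain Python. The sketch below only verifies that the training, validation, and test counts add up to each stated total and reports the resulting fractions; the dictionary layout and the helper name `split_fractions` are illustrative.

```python
# (total, train, validation, test) as stated in Step 1
splits = {
    "WHU Building Dataset": (14868, 8460, 3626, 2782),
    "Massachusetts Building Dataset": (3775, 3425, 100, 250),
    "China Building Dataset": (7259, 4188, 1796, 1275),
}

def split_fractions(total, train, val, test):
    """Return the train/val/test fractions after checking the counts sum to total."""
    if train + val + test != total:
        raise ValueError("split sizes do not add up to the stated total")
    return tuple(round(n / total, 3) for n in (train, val, test))

for name, counts in splits.items():
    print(name, split_fractions(*counts))
```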

[0133] Step 2: Extraction of Building Geometric Saliency Feature Maps. The GeoSay algorithm is used to obtain single-band building geometric saliency feature maps, which are then introduced into the deep learning network as prior knowledge.

[0134] Step 3: Preliminary extraction of building feature maps at different scales. The remote sensing image and the building geometric saliency feature map are input into the RGB semantic branch and the GS semantic branch, respectively, to obtain building feature maps {BF1,BF2,BF3,BF4} and {GS1,GS2,GS3,GS4} at different scales. The size of both is...

[0135]

[0136] Step 4: Deep mutual learning is performed between the building feature map and the building geometric saliency feature map. A bidirectional guided attention module (BGAM) is used to effectively capture the dependencies between the two building feature maps of the same scale in Step 3. Deep mutual learning between the two enhances the feature extraction effect of the target region. After deep mutual learning with BGAM, the building feature map {BG1, BG2, BG3, BG4} is obtained, with a size of [missing information].

[0137] Step 5: Extraction of high-resolution building feature maps with strong semantic information. The adjacent building feature maps from Step 4 are input into the improved flow alignment module (FAM++) to obtain a high-resolution building feature map with strong semantic information {BG1′, BG2′, BG3′, BG4}, with a size of [missing information].

[0138] Step 6: Use an MLP layer to fuse the multi-level building feature maps to obtain a prediction map. Input the high-resolution, semantically rich building feature maps {BG1′, BG2′, BG3′, BG4} from Step 5 into the MLP layer to obtain the building prediction result map.

[0139] Step 7: Optimize the network using a multi-objective loss function. The network is optimized under supervision with a loss function that jointly covers the final prediction map and the intermediate feature maps {BG1, BG2, BG3, BG4}, which improves the performance of the GSDMLNet network.

[0140] Step 8: Use the trained model to extract buildings and evaluate accuracy. The experiments are run on an NVIDIA GeForce RTX 3090 GPU within the PyTorch framework, with Adaptive Moment Estimation (Adam) as the optimizer, a learning rate of 0.0001, a batch size of 30, and 100 training epochs. Five evaluation metrics, namely Precision, Recall, F1-Score (F1), intersection over union (IoU), and overall accuracy (OA), were selected to quantitatively evaluate the building extraction performance.
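The five metrics are not defined formally in this text; the sketch below assumes the standard pixel-wise definitions for binary segmentation, computed from the counts of true positives, false positives, false negatives, and true negatives for the building class. The function name `building_metrics` and the example counts are illustrative only.

```python
def building_metrics(tp, fp, fn, tn):
    """Standard pixel-wise metrics for the building (positive) class."""
    precision = tp / (tp + fp)                  # share of predicted building pixels that are correct
    recall = tp / (tp + fn)                     # share of true building pixels that are found
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)                   # intersection over union of the building class
    oa = (tp + tn) / (tp + fp + fn + tn)        # overall pixel accuracy
    return {"Precision": precision, "Recall": recall, "F1": f1, "IoU": iou, "OA": oa}

# Illustrative counts for a 1000-pixel tile (not taken from the experiments).
scores = building_metrics(tp=80, fp=20, fn=10, tn=890)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

OA includes the usually dominant background class, so it tends to be much higher than IoU on the same prediction; IoU and F1 are the more discriminating figures when buildings cover a small share of the image.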

[0141] The building extraction results of the five methods GSDMLNet, PSPNet, Deeplabv3+, HRNet, and SegFormer on the WHU Building Dataset are shown in Figure 4. When buildings in the remote sensing images are similar in color to the ground, PSPNet, Deeplabv3+, HRNet, and SegFormer extract buildings poorly. Because GSDMLNet incorporates building geometric saliency as prior knowledge, it can rely on structural features to extract buildings accurately even when their colors are similar to the ground. For large buildings, PSPNet, Deeplabv3+, HRNet, and SegFormer are prone to holes. The FAM++ module in GSDMLNet preserves the integrity of buildings and reduces the occurrence of holes in large buildings.

[0142] The building extraction results of the five methods GSDMLNet, PSPNet, Deeplabv3+, HRNet, and SegFormer on the Massachusetts Building Dataset are shown in Figure 5. Most of the buildings extracted by PSPNet and Deeplabv3+ exhibit adhesion, with neighboring buildings merged together, while HRNet shows less adhesion but produces more blurred building outlines. The building extraction results of PSPNet, Deeplabv3+, and HRNet are not ideal, whereas the results of SegFormer and GSDMLNet are close to the label images. In particular, for buildings with complex shapes, GSDMLNet extracts more refined and accurate outlines than SegFormer.

[0143] The building extraction results of the five methods GSDMLNet, PSPNet, Deeplabv3+, HRNet, and SegFormer on the China Building Dataset are shown in Figure 6. Deeplabv3+ clearly performs the worst, producing a large number of incorrect extractions. PSPNet can extract most buildings but lacks detail in the building outlines. Under shadow occlusion, HRNet and SegFormer are prone to missing buildings, while GSDMLNet extracts buildings correctly and maintains the integrity of their outlines. For buildings with complex shapes, GSDMLNet is more suitable than HRNet and SegFormer. In particular, guided by both the semantic and the geometric saliency features of buildings, GSDMLNet can eliminate interference from ground features that have similar colors but no structural features, which helps ensure the accuracy of building extraction.

[0144] Table 1 shows the accuracy evaluation results of each method on the WHU Building Dataset, Massachusetts Building Dataset, and China Building Dataset. GSDMLNet achieved the highest accuracy metrics on all three datasets. All methods performed well on the WHU Building Dataset, with GSDMLNet achieving Precision, Recall, F1, IoU, and OA of 97.89%, 97.57%, 97.73%, 95.61%, and 98.83%, respectively. On the Massachusetts Building Dataset, GSDMLNet's Precision, Recall, F1, IoU, and OA were 0.35%, 2.77%, 2.55%, 2.71%, and 0.71% higher, respectively, than those of the second-ranked SegFormer. GSDMLNet also performs well on the China Building Dataset, with its Precision, Recall, F1, IoU, and OA being 1.77%, 0.77%, 1.27%, 2.05%, and 1.13% higher, respectively, than those of the next-best SegFormer. GSDMLNet's consistent performance across different datasets demonstrates its versatility and generalization ability.

[0145] Table 1

[0146]

[0147] The deep mutual learning method for building extraction from remote sensing images based on geometric saliency, proposed according to embodiments of the present invention, first uses the GeoSay algorithm to obtain geometric saliency feature maps of buildings as prior knowledge and introduces them into a deep learning network to make building feature extraction more targeted. Second, a bidirectional guided attention module is constructed to perform deep mutual learning between the building feature maps and the geometric saliency feature maps in the dual-branch network. Next, an improved flow alignment module is used to obtain high-resolution feature maps with strong semantic information. Finally, a multi-objective loss function is constructed to optimize the network. By introducing prior knowledge of building geometry, constructing BGAM and FAM++ to further enhance the network's ability to learn building features, and jointly optimizing with a multi-objective loss function, this invention can effectively improve the accuracy and stability of building extraction. Accurate building extraction and timely access to building information contribute to precise urban governance and efficient urban management.

[0148] Next, referring to the accompanying drawings, a deep mutual learning remote sensing image building extraction device based on geometric saliency proposed according to an embodiment of the present invention is described.

[0149] Figure 7 is a schematic diagram of the structure of a deep mutual learning device for building extraction from remote sensing images based on geometric saliency according to an embodiment of the present invention.

[0150] As shown in Figure 7, the deep mutual learning device 10 for building extraction from remote sensing images based on geometric saliency includes: extraction module 100, branch module 200, mutual learning module 300, input module 400, and fusion module 500.

[0151] The system comprises the following modules: Extraction module 100, which acquires remote sensing images and extracts geometric saliency feature maps of buildings from them; Branching module 200, which inputs the remote sensing images and building geometric saliency feature maps into the RGB semantic branch and GS semantic branch, respectively, to obtain building feature maps at different scales; Mutual learning module 300, which uses a bidirectional guided attention module to obtain the dependencies between building feature maps of the same scale in the RGB and GS semantic branches for deep mutual learning; Input module 400, which inputs adjacent building feature maps into the flow alignment module to obtain high-resolution building feature maps with strong semantic information; and Fusion module 500, which uses an MLP layer to fuse multi-level building feature maps to obtain predicted building maps of the remote sensing images.

[0152] Optionally, in an embodiment of the present invention, the building extraction method utilizes a multi-objective loss function to optimize the extraction process. The multi-objective loss function L consists of the loss of the final building prediction map and the loss of the four feature maps output by the bidirectional guided attention module, and is calculated using the following formula:

[0153]

[0154] Where $L_{final}$ is the loss function for the final building prediction map, $L_i$ is the loss function for the feature maps {BG1, BG2, BG3, BG4} output by the bidirectional guided attention module, and λ is the weight coefficient.

[0155] Optionally, in embodiments of the present invention, the extraction module 100 is specifically used for:

[0156] Based on the characteristics of building intersections in remote sensing imagery, a geometric saliency algorithm is used to obtain geometric saliency feature maps of buildings. The confidence level and angle information of the intersections are used as first-order saliency, and the adjacency relationship of the intersections is used as second-order saliency. The geometric saliency feature map of a building is obtained by fusing the first-order and second-order saliency of the intersections in the remote sensing image. The calculation formula is as follows:

[0157]

[0158] Where p represents each pixel in the remote sensing image, J represents all intersections in the remote sensing image, j represents a single intersection, and $R_j$ represents the quadrilateral region formed by a single intersection, its branches, and the included angle θ between the branches. The indicator term takes the value 1 if pixel p lies within $R_j$, and 0 otherwise.

[0159]

[0160] Where ρ is the confidence level of the intersection, and $P(j \in J_B \mid j_\theta)$ is the probability that j is a building intersection when the included angle at the intersection is $j_\theta \in [0, \pi)$;

[0161]

[0162] Where z is the size of the neighborhood, $N_j$ represents the set of intersections in the neighborhood, j′ is an intersection within $N_j$, the two center-point terms denote the center points associated with j and j′ within the range of $R_j$, and τ normalizes the distance between them.

[0163] Optionally, in embodiments of the present invention, the branch module 200 is specifically used for:

[0164] The remote sensing image is represented as $I \in \mathbb{R}^{H \times W \times C}$, where H, W, and C represent the height, width, and number of channels of the remote sensing image, respectively. The remote sensing image is input into the RGB semantic branch to obtain four building feature maps {BF1, BF2, BF3, BF4} at different scales, with a size of [missing information].

[0165] The geometric saliency algorithm is used to obtain single-band building geometric saliency feature maps. These feature maps are then input into the GS semantic branch to obtain four building feature maps {GS1, GS2, GS3, GS4} at different scales, each containing geometric saliency.

[0166] Optionally, in embodiments of the present invention, the mutual learning module 300 is specifically used for:

[0167] Under the guidance of the building feature map $BF_i$, the building geometric saliency feature map $GS_i$ adaptively learns the features of the corresponding regions, preserving the important-region features $IGS_i$ in $GS_i$ that are strongly correlated with $BF_i$. The calculation formula is as follows:

[0168] $\alpha_i = \mathrm{sigmoid}(BF_i \, GS_i)$

[0169] $IGS_i = \alpha_i \, GS_i$

[0170] In the formula, i = 1, 2, 3, 4, sigmoid(·) is the activation function, and $\alpha_i \in [0,1]$ represents the correlation between the i-th building feature map BF and the building geometric saliency feature map GS;

[0171] Under the guidance of the important-region features $IGS_i$, the building feature map $BF_i$ adaptively learns the features of the corresponding regions, yielding the features $IBF_i$ in $BF_i$ that are strongly correlated with $IGS_i$. The calculation formula is as follows:

[0172] $\beta_i = \mathrm{sigmoid}(BF_i + IGS_i)$

[0173] $IBF_i = \beta_i \, BF_i$

[0174] In the formula, $\beta_i \in [0,1]$ represents the correlation between the i-th building feature map BF and the building geometric saliency important-region feature map IGS;

[0175] The adaptively learned important-region features $IGS_i$ and $IBF_i$ are fused, and, following the residual idea, the fused result is summed with the input $BF_i$ to obtain the building feature map $BG_i$ after deep mutual learning. The calculation formula is as follows:

[0176] $BG_i = BF_i + \mathrm{Conv}_{3\times3}(IBF_i \;\mathrm{P}\; IGS_i)$

[0177] Where P represents the feature fusion operation along the channel dimension, and $\mathrm{Conv}_{3\times3}(\cdot)$ represents a convolution operation with a kernel size of 3×3.

[0178] Optionally, in embodiments of the present invention, the input module 400 is specifically used for:

[0179] Calculate the semantic flow field $\Delta \in \mathbb{R}^{H \times W \times 2}$ for adjacent feature layers. The calculation formula is as follows:

[0180] $\Delta = \mathrm{Conv}_{3\times3}(\mathrm{Conv}_{1\times1}(BG_i) \;\mathrm{P}\; \mathrm{Up}(\mathrm{Conv}_{1\times1}(BG'_{i+1})))$

[0181] Where $BG_i$ is a high-resolution building feature map with weak semantic information, $BG'_{i+1}$ is a low-resolution building feature map with strong semantic information, $\mathrm{Conv}_{1\times1}(\cdot)$ represents a convolution operation with a kernel size of 1×1, and Up(·) represents an upsampling operation;

[0182] The adjacent feature layers are fused, and then a gating operation is performed to obtain the gated feature map g. The calculation formula is as follows:

[0183] $g = \mathrm{Conv}_G(\mathrm{Conv}_{3\times3}(\mathrm{Conv}_{1\times1}(BG_i) \;\mathrm{P}\; \mathrm{Up}(\mathrm{Conv}_{1\times1}(BG'_{i+1}))))$

[0184] Where $\mathrm{Conv}_G(\cdot)$ represents a gating operation, which includes a convolutional layer with a kernel size of 1×1 and a sigmoid activation function;

[0185] Warp operations are performed on the adjacent feature layers using the same semantic flow field to obtain two high-resolution feature maps with aligned semantic information. The two semantically aligned feature maps are then weighted and fused using the gated feature maps g and 1-g to obtain a high-resolution building feature map $BG'_i$ with strong semantic information. The calculation formula is as follows:

[0186] $BG'_i = g \, \mathrm{Conv}_{1\times1}(\mathrm{warp}(BG'_{i+1}, \Delta)) + (1-g) \, \mathrm{warp}(BG_i, \Delta)$

[0187] Here, warp(·) represents the warp operation.

[0188] It should be noted that the foregoing explanation of the embodiment of the deep mutual learning remote sensing image building extraction method based on geometric saliency also applies to the deep mutual learning remote sensing image building extraction device based on geometric saliency in this embodiment, and will not be repeated here.

[0189] The building extraction device based on geometric saliency-based deep mutual learning remote sensing imagery proposed in this invention first employs the GeoSay algorithm to acquire building geometric saliency feature maps as prior knowledge, which are then introduced into a deep learning network to enhance the targeting of building feature extraction. Second, a bidirectional guided attention module is constructed to perform deep mutual learning between the building feature maps and the building geometric saliency feature maps in the dual-branch network. Next, an improved flow alignment module is used to acquire high-resolution strong semantic feature maps. Finally, a multi-objective loss function is constructed to optimize the network. This invention, by introducing prior knowledge of building geometry, constructing BGAM and FAM++ to further enhance the network's ability to learn building features, and combining this with a multi-objective loss function for optimization, can effectively improve the accuracy and stability of building extraction. Accurate building extraction and timely access to building information contribute to precise urban governance and efficient urban management.

[0190] Furthermore, embodiments of the present invention also provide an electronic device, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the geometric saliency-based deep mutual learning remote sensing image building extraction method as described in the above embodiments.

[0191] Furthermore, embodiments of the present invention provide a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above-described method for extracting buildings from deep mutual learning remote sensing images based on geometric saliency.

[0192] In addition, embodiments of the present invention also provide a computer program product, including a computer program, which is executed to implement the geometric saliency-based deep mutual learning remote sensing image building extraction method of the above embodiments.

[0193] In the description of this specification, the references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., refer to specific features, structures, materials, or characteristics described in connection with that embodiment or example, which are included in at least one embodiment or example of the present invention. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples. Moreover, without contradiction, those skilled in the art can combine and integrate the different embodiments or examples described in this specification, as well as the features of different embodiments or examples.

[0194] Furthermore, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one of that feature. In the description of this invention, "N" means at least two, such as two, three, etc., unless otherwise explicitly specified.

[0195] Any process or method description in the flowchart or otherwise herein can be understood as representing a module, segment, or portion of code comprising one or N executable instructions for implementing custom logic functions or processes, and the scope of preferred embodiments of the invention includes additional implementations in which functions may be performed not in the order shown or discussed, including substantially simultaneously or in reverse order depending on the functions involved, as will be understood by those skilled in the art to which embodiments of the invention pertain.

Claims

1. A deep mutual learning method for building extraction from remote sensing images based on geometric saliency, characterized in that it includes the following steps: acquiring a remote sensing image and extracting a building geometric saliency feature map from the remote sensing image; inputting the remote sensing image and the building geometric saliency feature map into the RGB semantic branch and the GS semantic branch, respectively, to obtain building feature maps at different scales; using a bidirectional guided attention module to obtain the dependencies between building feature maps of the same scale in the RGB semantic branch and the GS semantic branch, and performing deep mutual learning; wherein using the bidirectional guided attention module to obtain the dependencies between building feature maps of the same scale in the RGB semantic branch and the GS semantic branch and performing deep mutual learning includes: under the guidance of the building feature map $BF_i$, the building geometric saliency feature map $GS_i$ adaptively learns the features of the corresponding regions, preserving the important-region features $IGS_i$ in $GS_i$ that are strongly correlated with $BF_i$, calculated as follows: $\alpha_i = \mathrm{sigmoid}(BF_i \, GS_i)$, $IGS_i = \alpha_i \, GS_i$, where i = 1, 2, 3, 4, sigmoid(·) is the activation function, and $\alpha_i$ is the correlation between the i-th building feature map BF and the building geometric saliency feature map GS; under the guidance of the important-region features $IGS_i$, the building feature map $BF_i$ adaptively learns the features of the corresponding regions, yielding the features $IBF_i$ in $BF_i$ that are strongly correlated with $IGS_i$, calculated as follows: $\beta_i = \mathrm{sigmoid}(BF_i + IGS_i)$, $IBF_i = \beta_i \, BF_i$, where $\beta_i$ is the correlation between the i-th building feature map BF and the building geometric saliency important-region feature map IGS; the adaptively learned important-region features $IGS_i$ and $IBF_i$ are fused, and, following the residual idea, the fused result is summed with the input $BF_i$ to obtain the building feature map $BG_i$ after deep mutual learning, calculated as follows: $BG_i = BF_i + \mathrm{Conv}_{3\times3}(IBF_i \;\mathrm{P}\; IGS_i)$, where P represents the feature fusion operation along the channel dimension and $\mathrm{Conv}_{3\times3}(\cdot)$ represents a convolution operation with a kernel size of 3×3; inputting adjacent building feature maps into the flow alignment module to obtain high-resolution building feature maps with strong semantic information; and using an MLP layer to fuse the multi-level building feature maps to obtain a building prediction map of the remote sensing image.

2. The method according to claim 1, characterized in that the building extraction method is optimized using a multi-objective loss function L, which consists of the loss of the final building prediction map and the loss of the four feature maps output by the bidirectional guided attention module. The calculation formula is as follows: [formula missing] where $L_{final}$ is the loss function for the final building prediction map, $L_i$ is the loss function for the feature maps output by the bidirectional guided attention module, and λ is the weight coefficient.

3. The method according to claim 1, characterized in that acquiring a remote sensing image and extracting a building geometric saliency feature map from the remote sensing image includes: based on the characteristics of building intersections in remote sensing imagery, using a geometric saliency algorithm to obtain the building geometric saliency feature map, wherein the confidence level and angle information of the intersections are used as first-order saliency, the adjacency relationship of the intersections is used as second-order saliency, and the building geometric saliency feature map is obtained by fusing the first-order and second-order saliency of the intersections in the remote sensing image. The calculation formula is as follows: [formula missing] where p represents each pixel in the remote sensing image, J represents all intersections in the remote sensing image, j represents a single intersection, $R_j$ represents the quadrilateral region formed by a single intersection, its branches, and the included angle θ between the branches, and the indicator term takes the value 1 if pixel p lies within $R_j$ and 0 otherwise; [formula missing] where ρ is the confidence level of the intersection and $P(j \in J_B \mid j_\theta)$ is the probability that j is a building intersection when the included angle at the intersection is $j_\theta \in [0, \pi)$; [formula missing] where z is the size of the neighborhood, $N_j$ represents the set of intersections in the neighborhood, j′ is an intersection within $N_j$, the two center-point terms denote the center points associated with j and j′ within the range of $R_j$, and τ normalizes the distance between them.

4. The method according to claim 1, characterized in that inputting the remote sensing image and the building geometric saliency feature map into the RGB semantic branch and the GS semantic branch, respectively, to obtain building feature maps at different scales includes: representing the remote sensing image as $I \in \mathbb{R}^{H \times W \times C}$, where H, W, and C are the height, width, and number of channels of the remote sensing image, and inputting the remote sensing image into the RGB semantic branch to obtain building feature maps {BF1, BF2, BF3, BF4} at four different scales, with a size of [missing information]; and obtaining a single-band building geometric saliency feature map with the geometric saliency algorithm, and inputting it into the GS semantic branch to obtain building feature maps {GS1, GS2, GS3, GS4} containing geometric saliency at four different scales, with a size of [missing information].

5. The method according to claim 1, characterized in that inputting adjacent building feature maps into the flow alignment module to obtain high-resolution building feature maps with strong semantic information includes: calculating the semantic flow field $\Delta \in \mathbb{R}^{H \times W \times 2}$ for adjacent feature layers, calculated as follows: $\Delta = \mathrm{Conv}_{3\times3}(\mathrm{Conv}_{1\times1}(BG_i) \;\mathrm{P}\; \mathrm{Up}(\mathrm{Conv}_{1\times1}(BG'_{i+1})))$, where $BG_i$ is a high-resolution building feature map with weak semantic information, $BG'_{i+1}$ is a low-resolution building feature map with strong semantic information, $\mathrm{Conv}_{1\times1}(\cdot)$ represents a convolution operation with a kernel size of 1×1, and Up(·) represents an upsampling operation; fusing the adjacent feature layers and then performing a gating operation to obtain the gated feature map g, calculated as follows: $g = \mathrm{Conv}_G(\mathrm{Conv}_{3\times3}(\mathrm{Conv}_{1\times1}(BG_i) \;\mathrm{P}\; \mathrm{Up}(\mathrm{Conv}_{1\times1}(BG'_{i+1}))))$, where $\mathrm{Conv}_G(\cdot)$ represents a gating operation, which includes a convolutional layer with a kernel size of 1×1 and a sigmoid activation function; and performing warp operations on the adjacent feature layers using the same semantic flow field to obtain two high-resolution feature maps with aligned semantic information, then weighting and fusing the two semantically aligned feature maps using the gated feature maps g and 1-g to obtain a high-resolution building feature map $BG'_i$ with strong semantic information, calculated as follows: $BG'_i = g \, \mathrm{Conv}_{1\times1}(\mathrm{warp}(BG'_{i+1}, \Delta)) + (1-g) \, \mathrm{warp}(BG_i, \Delta)$, where warp(·) represents the warp operation.

6. A deep mutual learning device for building extraction from remote sensing images based on geometric saliency, characterized by comprising: an extraction module, used to acquire a remote sensing image and extract the geometric saliency feature map of the buildings in the remote sensing image; a branch module, used to input the remote sensing image and the geometric saliency feature map of the building into the RGB semantic branch and the GS semantic branch, respectively, to obtain building feature maps at different scales; a mutual learning module, used to obtain, through the bidirectional guided attention module, the dependencies between building feature maps of the same scale in the RGB semantic branch and the GS semantic branch, and to perform deep mutual learning; an input module, used to input adjacent building feature maps into the flow alignment module to obtain high-resolution building feature maps with strong semantic information; and a fusion module, used to fuse the multi-level building feature maps using an MLP layer to obtain a building prediction map of the remote sensing image. Specifically, obtaining the dependencies between building feature maps of the same scale in the RGB semantic branch and the GS semantic branch through the bidirectional guided attention module and performing deep mutual learning comprises: under the guidance of the building feature map, the geometric saliency feature map of the building adaptively learns the features of the corresponding region, retaining the key region features in the geometric saliency feature map that are strongly correlated with the building feature map, by the formula [formula missing], in which [symbol missing] is the activation function and [symbol missing] is the correlation between the building feature map with index [index missing] and the geometric saliency feature map of the building; under the guidance of the important region features, the building feature map adaptively learns the features of the corresponding region, yielding the features in the building feature map that are strongly correlated with the important region features, by the formula [formula missing], in which [symbol missing] is the correlation between the building feature map with index [index missing] and the important region features of the geometric saliency feature map; and the two sets of adaptively learned features are fused and then, following the residual idea, summed with the input to obtain the building feature map after deep mutual learning, by the formula [formula missing], in which [symbol missing] denotes feature fusion along the channel dimension and [symbol missing] denotes a convolution operation with a kernel of size [size missing].
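
The guided attention steps described in claim 6 can be sketched in plain Python. This is an illustrative sketch only: the formulas are missing from this text, so the per-pixel dot product used as the correlation, the sigmoid activation, the 1x1 fusion weights, and all function names are assumptions. Feature maps are stored as flat lists of per-pixel channel vectors.

```python
import math

def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))

def pixel_correlation(a, b):
    """Assumed correlation: per-pixel dot product across channels."""
    return [sum(x * y for x, y in zip(pa, pb)) for pa, pb in zip(a, b)]

def guide(source, guide_map):
    """Keep the features of `source` that correlate strongly with `guide_map`."""
    corr = pixel_correlation(source, guide_map)
    return [[sigmoid(c) * v for v in pixel] for c, pixel in zip(corr, source)]

def fuse_1x1(a, b, weights):
    """Channel-wise concatenation followed by a 1x1 convolution (weights: C rows of 2C)."""
    fused = []
    for pa, pb in zip(a, b):
        cat = pa + pb
        fused.append([sum(w * v for w, v in zip(row, cat)) for row in weights])
    return fused

def guided_attention(f_main, f_other, weights):
    key_other = guide(f_other, f_main)   # key region features of the other branch
    key_main = guide(f_main, key_other)  # main-branch features tied to those regions
    fused = fuse_1x1(key_other, key_main, weights)
    # Residual connection: add the input feature map back.
    return [[f + r for f, r in zip(pf, pr)] for pf, pr in zip(fused, f_main)]

# Example with one pixel and two channels per branch.
f_rgb = [[1.0, 0.0]]
f_gs = [[0.0, 0.0]]
avg_weights = [[0.5, 0.0, 0.5, 0.0],
               [0.0, 0.5, 0.0, 0.5]]   # hypothetical weights that average the two inputs

rgb_out = guided_attention(f_rgb, f_gs, avg_weights)  # RGB branch guided by GS
gs_out = guided_attention(f_gs, f_rgb, avg_weights)   # assumed reverse direction
```

Calling the same function with the arguments swapped is one way to read "bidirectional"; the text above only spells out the direction that updates the building feature map.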

7. An electronic device, characterized by comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the deep mutual learning method for building extraction from remote sensing images based on geometric saliency according to any one of claims 1-6.

8. A computer-readable storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the deep mutual learning method for building extraction from remote sensing images based on geometric saliency according to any one of claims 1-6.

9. A computer program product comprising a computer program, characterized in that the computer program, when executed, implements the deep mutual learning method for building extraction from remote sensing images based on geometric saliency according to any one of claims 1-6.