Unmanned aerial vehicle cross-view positioning and navigation method and device based on semi-supervised feature enhancement

By employing a semi-supervised feature enhancement method, a Siamese network model was constructed using a variational autoencoder and a ResNet50-ibn-a network. This solved the accuracy problem of cross-view positioning and navigation for UAVs in complex environments, achieving high-precision image matching and navigation.

CN117746231B · Active · Publication Date: 2026-06-19 · NAT UNIV OF DEFENSE TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
NAT UNIV OF DEFENSE TECH
Filing Date
2023-12-01
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing cross-view positioning and navigation methods for UAVs struggle to achieve high-precision matching in complex environments or without GPS signals, and traditional methods are easily limited by their reliance on external equipment.

Method used

A semi-supervised feature enhancement method is adopted, which constructs a Siamese network model using a variational autoencoder and a ResNet50-ibn-a network. Through self-supervised feature enhancement and cross-domain hard sample sampling, the image matching accuracy is improved, and cross-view positioning and navigation of UAVs is realized.

Benefits of technology

In complex environments or without GPS signals, high-precision cross-view image matching and UAV positioning and navigation were achieved, improving the model's cross-view matching accuracy and recognition capability.

Generated by Eureka AI based on patent content.

Figure CN117746231B_ABST

Abstract

This application relates to a method and apparatus for UAV cross-view positioning and navigation based on semi-supervised feature enhancement. The method includes: training a variational autoencoder using a pre-built variational autoencoder loss function; preprocessing geographic information data and performing self-supervised feature enhancement on the preprocessed data using the trained variational autoencoder to obtain an enhanced image; constructing a Siamese network model based on a ResNet50-ibn-a network and a feature enhancement module; sampling from a UAV aerial photography database and a satellite remote sensing image database using a cross-domain hard sample sampling method; training the Siamese network model on the sampled hard samples to obtain a cross-view matching model; and performing UAV cross-view positioning and navigation using the cross-view matching model. This method enables UAV positioning and navigation in complex environments or in situations without GPS signals.

Description

Technical Field

[0001] This application relates to the field of remote sensing image technology, and in particular to a cross-view positioning and navigation method and device for unmanned aerial vehicles based on semi-supervised feature enhancement.

Background Technology

[0002] With the rapid development of satellite remote sensing imagery technology and the widespread application of unmanned aerial vehicle (UAV) platforms, acquiring image data of the same geographical location from different platforms has become increasingly easy. In particular, the continuous development and maturation of UAV technology has given UAV platforms high mobility and flexibility, enabling low-cost continuous operation and the acquisition of high-resolution, high-precision remote sensing image data at low and medium altitudes. At the same time, deep learning-based computer vision technology, which learns feature representations and matching rules from large amounts of image data, has demonstrated powerful image matching capabilities. These technologies have made cross-view positioning and navigation based on UAV and satellite remote sensing imagery possible, and continue to drive the development and improvement of such methods.

[0003] UAV cross-view positioning and navigation refers to completing both geographic positioning and navigation to a target location for a UAV based solely on image data and geographic coordinates; it has broad application prospects in agriculture, surveying, and other fields. UAVs equipped with various sensors and cameras can acquire high-resolution, multi-view oblique imagery containing rich geographic information. Satellite remote sensing images, by contrast, are captured from a bird's-eye view and often carry precise GPS coordinate information. Matching UAV images with satellite images therefore allows the UAV to be positioned indirectly in real time, which enables UAV navigation.

[0004] Previous cross-view geolocation methods primarily matched image pairs from different platforms, mainly street view images captured by mobile phones or cameras and satellite remote sensing images. These methods have the following drawbacks: first, the street view and satellite viewpoints differ greatly, making it difficult to match the same geographic target across the two perspectives; second, street view data carries limited representational information, so a single image from a single viewpoint can hardly describe a geographic target effectively. In contrast, UAVs can easily acquire data on the same geographic target from different perspectives while circling it, effectively capturing global representational information for that target; furthermore, UAV-view data and satellite remote sensing data share certain similarities, enabling better cross-view image matching.

[0005] Traditional positioning and navigation methods often rely on external devices such as the Global Positioning System (GPS), but these methods may not meet requirements in complex environments or when GPS signals are unavailable. Therefore, developing a UAV positioning and navigation method that relies solely on image data and geographic coordinates is of great significance.

Summary of the Invention

[0006] Therefore, to address the above technical problems, it is necessary to provide a method and apparatus for UAV cross-view positioning and navigation based on semi-supervised feature enhancement that can achieve UAV positioning and navigation in complex environments or in the absence of GPS signals.

[0007] A cross-view localization and navigation method for unmanned aerial vehicles (UAVs) based on semi-supervised feature enhancement, the method comprising:

[0008] A drone aerial photography database and a satellite remote sensing image database are constructed based on high-quality geographic information data acquired by drones equipped with high-resolution optoelectronic payloads.

[0009] The variational autoencoder is trained using a pre-constructed variational autoencoder loss function to obtain the trained variational autoencoder.

[0010] Geographic information data is preprocessed and then self-supervised feature enhancement is performed on the preprocessed geographic information data based on the trained variational autoencoder to obtain the enhanced image;

[0011] A Siamese network model is constructed based on the ResNet50-ibn-a network and the feature enhancement module; the UAV aerial photography database and the satellite remote sensing image database are sampled using the cross-domain hard sample sampling method, and the Siamese network model is trained on the sampled hard samples to obtain a cross-view matching model;

[0012] The cross-view matching model is used for UAV cross-view positioning and navigation.

[0013] In one embodiment, the pre-built variational autoencoder loss function is:

[0014]

[0015] where D is the data dimension; x_i is the value of the i-th pixel of the original input image x; x_rec_i is the value of the i-th pixel of the generated image x_rec; N is the dimension of the latent variable; μ_enc and σ_enc are the mean and standard deviation of the i-th dimension of the latent variable output by the encoder, respectively; and β is a weight parameter.
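The loss expression in [0014] did not survive extraction. The sketch below assumes the common β-VAE form suggested by the listed symbols: a squared-error reconstruction term summed over the D pixels, plus β times the closed-form KL divergence between N(μ_enc, σ_enc²) and a standard normal prior, summed over the N latent dimensions. The squared-error choice and the function name `vae_loss` are assumptions, not the patent's exact formula.

```python
import math


def vae_loss(x, x_rec, mu_enc, sigma_enc, beta=1.0):
    """Assumed beta-VAE loss: squared-error reconstruction + beta * KL term.

    x, x_rec: flattened original and reconstructed images (length D).
    mu_enc, sigma_enc: encoder mean and standard deviation (length N, sigma > 0).
    """
    if len(x) != len(x_rec) or len(mu_enc) != len(sigma_enc):
        raise ValueError("dimension mismatch")
    # Reconstruction error summed over the D pixel values.
    recon = sum((xi - ri) ** 2 for xi, ri in zip(x, x_rec))
    # Closed-form KL(N(mu, sigma^2) || N(0, 1)) summed over the N latent dimensions.
    kl = -0.5 * sum(1.0 + math.log(s * s) - m * m - s * s
                    for m, s in zip(mu_enc, sigma_enc))
    return recon + beta * kl


# A perfect reconstruction with a standard-normal posterior gives zero loss.
print(vae_loss([0.5, 0.5], [0.5, 0.5], [0.0, 0.0], [1.0, 1.0]))
```

Raising β weights the KL term more heavily, pushing the latent code toward the prior at the cost of reconstruction fidelity.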

[0016] In one embodiment, the preprocessing includes resizing, random flipping, rotation, and color jittering of the data from different perspectives; and performing self-supervised feature enhancement on the preprocessed geographic information data using the trained variational autoencoder to obtain an enhanced image includes:

[0017] The encoder in the variational autoencoder maps the preprocessed geographic information data to obtain the corresponding latent variables; the decoder in the variational autoencoder maps the latent variables back to the data space to obtain the reconstructed image; the reconstructed image is then normalized to obtain the enhanced image.

[0018] In one embodiment, the latent variables are mapped back to the data space using the decoder in the trained variational autoencoder to obtain a reconstructed image; the reconstructed image is then normalized to obtain an enhanced image, including:

[0019] The latent variables are decoded using the decoder in the trained variational autoencoder to obtain image samples. These image samples are then mapped back to the data space to obtain the reconstructed image.

[0020]

[0021] where x_rec represents an image sample, x_rec_min represents the minimum of the image sample, and x_rec_max represents the maximum of the image sample;

[0023] The reconstructed image is normalized to obtain the weight map Mask, and the enhanced image is obtained as:

[0024] $I_{enhanced} = I \times Mask$

[0025] where I represents the original image and Mask is the weight map obtained by normalizing the reconstructed image.
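The normalization formula in [0020] is also missing from the extracted text. Given the symbols x_rec, x_rec_min and x_rec_max defined in [0021], a min-max rescaling to [0, 1] is the natural reading; the sketch below assumes it and then applies I_enhanced = I × Mask element-wise. Images are plain nested lists of a single channel, and the all-ones fallback for a constant reconstruction is an added assumption to avoid division by zero.

```python
def minmax_mask(x_rec):
    """Min-max normalize a reconstructed image (list of rows) to [0, 1]."""
    flat = [v for row in x_rec for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # Assumed fallback: a constant reconstruction leaves the image unweighted.
        return [[1.0 for _ in row] for row in x_rec]
    span = hi - lo
    return [[(v - lo) / span for v in row] for row in x_rec]


def apply_mask(image, mask):
    """Element-wise product I_enhanced = I x Mask for single-channel images."""
    return [[p * m for p, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


mask = minmax_mask([[0.0, 2.0], [4.0, 0.0]])
enhanced = apply_mask([[10.0, 10.0], [10.0, 10.0]], mask)
```

Pixels the autoencoder reconstructs strongly keep most of their intensity, while weakly reconstructed background is attenuated toward zero; for an RGB image the same mask would be applied to each channel.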

[0026] In one embodiment, the feature enhancement module is:

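[0027] $F_{max} = \mathrm{MaxPool}(\mathrm{FC}(\mathrm{ReLU}(\mathrm{FC}(\mathrm{sigmoid}(F))))) \times F$

[0028] $F_{mean} = \mathrm{AvgPool}(\mathrm{FC}(\mathrm{ReLU}(\mathrm{FC}(\mathrm{sigmoid}(F))))) \times F$

[0029] $F' = \mathrm{Concat}(F_{mean}, F_{max})$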

[0030] Where F represents shared features, MaxPool represents global max pooling layer, AvgPool represents global average pooling layer, FC represents fully connected layer, ReLU and sigmoid are activation functions, and Concat represents concatenating features along the channel dimension.
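A minimal sketch of the feature enhancement (CWAM) module in [0026] to [0030], using plain Python lists. It assumes the conventional channel-attention ordering (global pooling, FC, ReLU, FC, sigmoid, then channel-wise rescaling of F), consistent with the description of the module as weighting feature channels; the printed formula nests the same operations in the reverse order, so this ordering is an interpretation. It also assumes the two FC layers are shared by the max and mean branches, as in CBAM-style modules. The output has 2C channels, matching the statement that the channel dimension doubles.

```python
import math


def fc(vec, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def relu(vec):
    return [max(0.0, v) for v in vec]


def sigmoid(vec):
    return [1.0 / (1.0 + math.exp(-v)) for v in vec]


def channel_weights(pooled, w1, b1, w2, b2):
    """Pooled descriptor (C) -> FC -> ReLU -> FC -> sigmoid -> weights (C)."""
    return sigmoid(fc(relu(fc(pooled, w1, b1)), w2, b2))


def cwam(F, w1, b1, w2, b2):
    """F: C channels, each a flat list of H*W activations.

    Returns 2C channels: the mean-weighted copy of F followed by the
    max-weighted copy, i.e. Concat(F_mean, F_max) along the channel axis.
    """
    max_desc = [max(ch) for ch in F]              # global max pooling
    avg_desc = [sum(ch) / len(ch) for ch in F]    # global average pooling
    w_max = channel_weights(max_desc, w1, b1, w2, b2)
    w_mean = channel_weights(avg_desc, w1, b1, w2, b2)
    F_max = [[w * v for v in ch] for w, ch in zip(w_max, F)]
    F_mean = [[w * v for v in ch] for w, ch in zip(w_mean, F)]
    return F_mean + F_max


# Toy example: C = 2 channels, H*W = 2, bottleneck of width 1, all-zero FC
# parameters so every channel weight is sigmoid(0) = 0.5.
F = [[1.0, 3.0], [2.0, 2.0]]
out = cwam(F, w1=[[0.0, 0.0]], b1=[0.0], w2=[[0.0], [0.0]], b2=[0.0, 0.0])
```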

[0031] In one embodiment, the Siamese network model is trained based on sampled hard samples to obtain a cross-view matching model, including:

[0032] The Siamese network model is jointly trained with the triplet loss function over the sampled hard samples and a cross-entropy loss function to obtain a cross-view matching model; the triplet loss function over the sampled hard samples is:

[0033]

[0034] where N represents the batch size, S1 and S2 represent data from different viewpoints, d_a,p and d_a,n represent the Euclidean distances between the anchor sample and the positive and negative samples, respectively, and α is the margin.
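The triplet-loss expression in [0033] is not recoverable from the extracted text. As a hedged illustration of the idea, the sketch below implements a common batch-hard formulation restricted to cross-view pairs: each anchor from view S1 is compared only with samples from view S2 (and vice versa), using its farthest same-ID sample as the hard positive and its closest different-ID sample as the hard negative, with hinge margin α. This is a widely used variant swapped in for illustration, not necessarily the patent's exact formula; the default margin of 0.3 is an arbitrary placeholder.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _hard_terms(anchors, anchor_ids, others, other_ids, margin):
    """Hinge terms for anchors in one view, mined against the other view."""
    terms = []
    for f, i in zip(anchors, anchor_ids):
        d_pos = [euclidean(f, g) for g, j in zip(others, other_ids) if j == i]
        d_neg = [euclidean(f, g) for g, j in zip(others, other_ids) if j != i]
        if not d_pos or not d_neg:
            continue  # no cross-view positive or negative for this anchor
        # Hardest positive: farthest same-ID sample; hardest negative: closest other-ID sample.
        terms.append(max(0.0, max(d_pos) - min(d_neg) + margin))
    return terms


def cross_domain_triplet(s1_feats, s1_ids, s2_feats, s2_ids, margin=0.3):
    """Average batch-hard cross-view triplet loss over both directions S1->S2 and S2->S1."""
    terms = (_hard_terms(s1_feats, s1_ids, s2_feats, s2_ids, margin)
             + _hard_terms(s2_feats, s2_ids, s1_feats, s1_ids, margin))
    return sum(terms) / len(terms) if terms else 0.0
```

Because positives and negatives always come from the other view, minimizing this loss directly pulls same-ID UAV and satellite features together in the shared feature space.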

[0035] In one embodiment, using the cross-view matching model for UAV cross-view positioning and navigation includes:

[0036] The target image and the image to be matched are input into the cross-view matching model, and the feature enhancement module produces the feature vectors of the target image and of the image to be matched. The Euclidean distance between the target feature vector and the feature vectors of all images to be matched is calculated to obtain the retrieval results;

[0037] Calculate the Mahalanobis distance and Jaccard distance between the feature vectors of the target image and the image to be matched, and then perform a weighted sum of the Mahalanobis distance and Jaccard distance to obtain the similarity result;

[0038] The retrieval results are reordered based on the similarity result to obtain an optimized matching sequence, and UAV cross-view positioning and navigation is then performed based on the optimized matching sequence.
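The re-ranking step in [0037] and [0038] leaves several details open: how the Mahalanobis covariance is estimated, how a Jaccard distance is defined between real-valued feature vectors, and what weights the sum uses. The sketch below fills these in with labeled assumptions: a diagonal covariance estimated from the gallery features, the generalized (weighted) Jaccard distance 1 − Σmin/Σmax, which is meaningful for non-negative features, and a hypothetical mixing weight `lam`. Because both terms are distances, a smaller combined value is treated as more similar.

```python
import math


def diag_variance(vectors, eps=1e-6):
    """Per-dimension variance of a set of vectors (assumed diagonal covariance)."""
    n, dim = len(vectors), len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dim)]
    # eps keeps every variance strictly positive so the division below is safe.
    return [sum((v[d] - means[d]) ** 2 for v in vectors) / n + eps for d in range(dim)]


def mahalanobis_diag(a, b, var):
    """Mahalanobis distance under a diagonal covariance matrix."""
    return math.sqrt(sum((x - y) ** 2 / s for x, y, s in zip(a, b, var)))


def generalized_jaccard_distance(a, b):
    """1 - sum(min)/sum(max); intended for non-negative feature vectors."""
    num = sum(min(x, y) for x, y in zip(a, b))
    den = sum(max(x, y) for x, y in zip(a, b))
    return 0.0 if den == 0 else 1.0 - num / den


def rerank(v_target, gallery, initial_order, lam=0.5):
    """Re-sort an initial retrieval order by a weighted sum of the two distances."""
    var = diag_variance(gallery)
    combined = {}
    for k in initial_order:
        d_m = mahalanobis_diag(v_target, gallery[k], var)
        d_j = generalized_jaccard_distance(v_target, gallery[k])
        combined[k] = lam * d_m + (1.0 - lam) * d_j  # smaller = more similar
    return sorted(initial_order, key=lambda k: combined[k])
```

In practice only the top of the preliminary retrieval list would typically be re-ranked, and `lam` would need tuning because the two distances live on different scales.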

[0039] In one embodiment, calculating the Euclidean distance between the target feature vector and all images to be matched includes:

[0040] The Euclidean distance between the target feature vector and the feature vector of each image to be matched is calculated as follows:

[0041] $d = \sqrt{\sum_{i} (v_{target}[i] - v_{query}[i])^2}$

[0042] where v_target[i] represents the i-th element of the feature vector of the target image, and v_query[i] represents the i-th element of the feature vector of the image to be matched.
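The Euclidean retrieval step in [0036] and [0039] to [0042] can be sketched as follows, assuming plain Python lists as feature vectors. `rank_by_distance` is a hypothetical helper that returns gallery indices ordered from most to least similar, i.e. the preliminary retrieval sequence that the Mahalanobis/Jaccard step later re-ranks.

```python
import math


def euclidean_distance(v_target, v_query):
    """d = sqrt(sum_i (v_target[i] - v_query[i])^2)."""
    if len(v_target) != len(v_query):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((t - q) ** 2 for t, q in zip(v_target, v_query)))


def rank_by_distance(v_target, gallery):
    """Return (indices sorted by ascending distance, list of distances)."""
    dists = [euclidean_distance(v_target, v) for v in gallery]
    order = sorted(range(len(gallery)), key=lambda k: dists[k])
    return order, dists


order, dists = rank_by_distance([0.0, 0.0], [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]])
```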

[0043] A cross-view localization and navigation device for unmanned aerial vehicles (UAVs) based on semi-supervised feature enhancement, the device comprising:

[0044] The database construction module is used to build a drone aerial photography database and a satellite remote sensing image database based on high-quality geographic information data acquired by drones equipped with high-resolution optoelectronic payloads.

[0045] The variational autoencoder training module is used to train the variational autoencoder using a pre-built variational autoencoder loss function to obtain the trained variational autoencoder.

[0046] The image enhancement module is used to preprocess geographic information data and perform self-supervised feature enhancement on the preprocessed geographic information data based on the trained variational autoencoder to obtain the enhanced image.

[0047] The cross-view matching model training module is used to construct a Siamese network model based on the ResNet50-ibn-a network and the feature enhancement module, sample the UAV aerial photography database and the satellite remote sensing image database according to the cross-domain hard sample sampling method, and train the Siamese network model on the sampled hard samples to obtain the cross-view matching model;

[0048] The UAV cross-view positioning and navigation module is used to perform UAV cross-view positioning and navigation using the cross-view matching model.

[0049] In the above UAV cross-view positioning and navigation method and device based on semi-supervised feature enhancement, a UAV aerial photography database and a satellite remote sensing image database are built from high-quality geographic information data acquired by UAVs carrying high-resolution optoelectronic payloads. The variational autoencoder is trained with a pre-constructed variational autoencoder loss function, and the self-supervised trained variational autoencoder is used to obtain a prior key-region mask. This enhances image features without introducing additional labels, which improves the accuracy of subsequent image matching. A Siamese network model is constructed from the ResNet50-ibn-a network and the feature enhancement module, and is trained on hard samples drawn from the UAV aerial photography database and the satellite remote sensing image database by the cross-domain hard sample sampling method. With ResNet50-ibn-a as the baseline network, shared feature representations of data from different sources are learned through shared weights, and the CWAM feature enhancement module further enhances these shared features, yielding more discriminative and generalizable features and improving the model's cross-view matching accuracy. The cross-domain hard sample sampling method, combined with a cross-domain triplet loss function, reduces the feature-space distance between images of the same ID taken from different viewpoints, strengthening the model's ability to recognize cross-view images. This application divides the cross-view image matching method into three stages: data preprocessing, model training, and post-processing. It enables UAV localization and navigation based on cross-view image matching and overcomes the difficulty previous methods had in effectively matching street view images with satellite remote sensing images. The method can achieve high-precision cross-view matching using multi-view UAV data in complex environments or without GPS signals.

Attached Figure Description

[0050] Figure 1 is a flowchart illustrating a cross-view localization and navigation method for unmanned aerial vehicles (UAVs) based on semi-supervised feature enhancement in one embodiment;

[0051] Figure 2 is a schematic diagram of the CWAM feature enhancement module in one embodiment;

[0052] Figure 3 is a structural block diagram of a UAV cross-view positioning and navigation device based on semi-supervised feature enhancement in one embodiment.

Detailed Implementation

[0053] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.

[0054] In one embodiment, as shown in Figure 1, a cross-view localization and navigation method for UAVs based on semi-supervised feature enhancement is provided, including the following steps:

[0055] Step 102: Construct a drone aerial photography database and a satellite remote sensing image database based on high-quality geographic information data acquired by drones equipped with high-resolution optoelectronic payloads.

[0056] To meet the positioning and navigation requirements of UAVs during long-duration flights, a large-scale UAV aerial imagery database and a corresponding regional satellite remote sensing imagery database are needed. This application uses UAVs equipped with high-resolution electro-optical payloads to acquire high-quality geographic information data. A UAV carrying an electro-optical payload is selected, and a flight path is designed to cover the region of interest with sufficient viewpoint variation and overlap. The path includes horizontal movement, altitude changes, and angle changes to simulate real-world navigation scenarios. Three altitudes representative of practical low-, medium-, and high-altitude operation are used: 200 m, 500 m, and 1000 m. Following the planned path, the UAV keeps a constant speed at each altitude so that enough images are acquired from different angles. Satellite imagery is obtained from open data sources and registered with the UAV-view data so that both are in the same geographic coordinate system. Ground-truth annotation is performed on the UAV aerial images and satellite remote sensing images, marking ground features and landmark information for verifying and evaluating the UAV positioning and navigation algorithm.

[0057] The database is divided into training and testing parts. The training set is first divided into a UAV aerial photography database training set and a remote sensing satellite imagery database training set according to different perspectives. In the training sets of different perspectives, the data are numbered and assigned ID information according to building landmark information. The test set includes a query set and a search set, and the IDs of the test set and the training set do not overlap. Corresponding query sets and search sets are established for different perspective databases to realize UAV localization and navigation functions based on image matching methods.
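The ID-disjoint split in [0057] can be illustrated as follows. The record layout `(view, location_id, path)`, the 50% test fraction, and the convention that UAV images form the query set while satellite images form the search (gallery) set are hypothetical; the property shown is that no location ID appears in both the training and test sets. The reverse direction (satellite queries against a UAV gallery) would simply swap the two view filters.

```python
import random


def split_by_id(samples, test_fraction=0.5, seed=0):
    """Split (view, location_id, path) records so train and test IDs never overlap.

    Returns (train, query, gallery); here UAV images query the satellite gallery.
    """
    ids = sorted({loc for _, loc, _ in samples})
    rng = random.Random(seed)
    rng.shuffle(ids)
    test_ids = set(ids[:int(len(ids) * test_fraction)])
    train = [s for s in samples if s[1] not in test_ids]
    query = [s for s in samples if s[1] in test_ids and s[0] == "uav"]
    gallery = [s for s in samples if s[1] in test_ids and s[0] == "satellite"]
    return train, query, gallery
```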

[0058] Step 104: Train the variational autoencoder using the pre-built variational autoencoder loss function to obtain the trained variational autoencoder.

[0059] By training the variational autoencoder with a pre-constructed loss function, the trained variational autoencoder can more effectively highlight the features of the target subject while suppressing environmental noise, thus achieving image feature enhancement.

[0060] Step 106: Preprocess the geographic information data and perform self-supervised feature enhancement on the preprocessed geographic information data based on the trained variational autoencoder to obtain the enhanced image.

[0061] Data preprocessing mainly includes two parts: transformation operations applied to data from different perspectives, such as resizing and data augmentation, and self-supervised feature enhancement. Input images are resized to 256×256 pixels, and data augmentation methods such as random flipping, rotation, and color jittering are used to improve the model's robustness and generalization ability.
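A dependency-free sketch of the transformation part of the preprocessing in [0061]: nearest-neighbour resizing to 256×256, followed by a random horizontal flip, a random rotation by a multiple of 90 degrees, and a brightness gain standing in for color jitter. The restriction to 90-degree rotations, the gain range 0.8 to 1.2, and the single-channel 0 to 255 image format are assumptions; a real pipeline would typically use an image library.

```python
import random


def resize_nearest(img, size=256):
    """Nearest-neighbour resize of a single-channel image (list of rows) to size x size."""
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)] for r in range(size)]


def augment(img, rng):
    """Random flip, random 90-degree rotation and brightness jitter (assumed ranges)."""
    if rng.random() < 0.5:
        img = [row[::-1] for row in img]              # horizontal flip
    for _ in range(rng.randrange(4)):
        img = [list(row) for row in zip(*img[::-1])]  # rotate 90 degrees clockwise
    gain = rng.uniform(0.8, 1.2)                      # brightness jitter
    return [[min(255.0, p * gain) for p in row] for row in img]


rng = random.Random(0)
out = augment(resize_nearest([[0, 100], [200, 50]], size=4), rng)
```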

[0062] Step 108: Construct a Siamese network model based on the ResNet50-ibn-a network and the feature enhancement module; sample the UAV aerial photography database and the satellite remote sensing image database using the cross-domain hard sample sampling method, and train the Siamese network model on the sampled hard samples to obtain a cross-view matching model.

[0063] Step 110: Use the cross-view matching model to perform cross-view positioning and navigation for the UAV.

[0064] This application employs a cross-domain hard sample sampling method, selecting images taken at the same geographical location from both the UAV aerial photography database and the satellite remote sensing image database as input to the model. Training on hard samples helps the model generalize better and improves its performance in practical applications.
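One way to read the cross-domain hard sample sampling of [0064] is as batch construction: each batch draws several location IDs present in both databases and, for each ID, several UAV images plus a satellite image of the same place, so that hard positives and negatives can then be mined inside the batch by the triplet loss. The sketch assumes dictionaries mapping location ID to image paths and hypothetical batch parameters `p` (IDs per batch) and `k` (UAV images per ID).

```python
import random


def sample_cross_domain_batch(uav_by_id, sat_by_id, p=4, k=2, seed=None):
    """Pick p location IDs present in both databases; for each, draw k UAV images
    and one satellite image so every anchor has a cross-view positive in the batch."""
    rng = random.Random(seed)
    shared = sorted(set(uav_by_id) & set(sat_by_id))
    chosen = rng.sample(shared, p)  # raises ValueError if fewer than p shared IDs
    uav_batch, sat_batch = [], []
    for loc in chosen:
        uav_imgs = uav_by_id[loc]
        # Sample with replacement only if a location has fewer than k UAV images.
        picks = rng.sample(uav_imgs, k) if len(uav_imgs) >= k else rng.choices(uav_imgs, k=k)
        uav_batch += [(img, loc) for img in picks]
        sat_batch.append((rng.choice(sat_by_id[loc]), loc))
    return uav_batch, sat_batch
```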

[0065] This application uses ResNet50-ibn-a as the baseline network and learns shared feature representations from different source data through shared weights. The CWAM feature enhancement module then further enhances the shared features to obtain more discriminative and generalizable features. Specifically, a weight-sharing ResNet50-ibn-a Siamese network is used as the baseline network to process data from different perspectives and perform cross-perspective feature extraction. After the baseline network, the CWAM feature enhancement module assigns weights to different channels of the features and then concatenates the weighted features. As a result, the channel dimension of the CWAM output is twice that of the baseline network's output features, increasing the network's representational and learning capacity. The structure of the CWAM feature enhancement module is shown in Figure 2.

[0066] Joint training is performed using the cross-domain triplet loss function and the cross-entropy loss function. The cross-domain triplet loss function uses the features output by the CWAM feature enhancement module as input, while the cross-entropy loss function is placed after the classification module to ensure training stability and accelerate model convergence.
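The joint objective in [0066] can be sketched as the sum of the cross-domain triplet term and a softmax cross-entropy over location-ID logits from the classification module. The equal default weighting (`ce_weight=1.0`) is an assumption; the text does not state the loss weights.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, using the log-sum-exp trick for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def joint_loss(triplet_value, batch_logits, batch_labels, ce_weight=1.0):
    """Triplet term plus the weighted mean cross-entropy over the batch."""
    ce = sum(cross_entropy(logits, y) for logits, y in zip(batch_logits, batch_labels))
    return triplet_value + ce_weight * ce / len(batch_labels)
```

The cross-entropy term gives every sample a direct ID-level gradient, which is why it is commonly paired with a metric loss to stabilize early training.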

[0067] In the testing phase, cross-domain image metric matching is performed, and the model's performance on the test set is evaluated to determine its cross-view matching performance. First, the target image and the image to be matched are input into the model, and the outputs of the feature enhancement module are used as their feature vectors. Next, similarity is measured by calculating the Euclidean distance between the target image's feature vector and the feature vectors of all images to be matched; the smaller the Euclidean distance, the more similar the two feature vectors. The results are sorted by these similarity values to obtain a preliminary retrieval sequence, i.e., the retrieval results. The Mahalanobis distance and the Jaccard distance between the target image's feature vector and the feature vector of each image to be matched are then calculated and combined by a weighted sum to obtain the similarity result. The retrieval results are re-sorted based on the similarity result to obtain an optimized matching sequence, which is the cross-view matching result. Based on the cross-view matching results between UAV aerial images and satellite remote sensing images, UAV localization and navigation based on image matching is achieved.

[0068] In the aforementioned UAV cross-view positioning and navigation method based on semi-supervised feature enhancement, this application constructs a UAV aerial photography database and a satellite remote sensing image database by acquiring high-quality geographic information data from a UAV carrying a high-resolution optoelectronic payload. A pre-built variational autoencoder loss function is used to train the variational autoencoder, resulting in a trained variational autoencoder. The self-supervised trained variational autoencoder is then used to obtain a priori key region masks. This enhances image features without introducing additional labels, improving the accuracy of subsequent image matching. Simultaneously, a Siamese network model is constructed based on the ResNet50-ibn-a network and the feature enhancement module. This application employs a cross-domain hard sample sampling method to sample data from UAV aerial photography databases and satellite remote sensing image databases. Based on the sampled hard samples, a Siamese network model is trained, using ResNet50-ibn-a as the baseline network. Shared feature representations from different source data are learned through shared weights, and the CWAM module is used to further enhance these shared features, resulting in more discriminative and generalizable features. This improves the model's cross-view matching accuracy. The cross-domain hard sample sampling method, combined with a cross-domain triplet loss function, reduces the distance between images with the same ID from different viewpoints in the feature space, enhancing the model's ability to recognize cross-view images. This application divides the cross-view image matching method into three stages: data preprocessing, model training, and post-processing. 
It enables UAV localization and navigation tasks based on cross-view image matching, solving the difficulties faced by previous cross-view matching methods in effectively matching street view images and satellite remote sensing image data. It can achieve high-precision cross-view matching using multi-view data from UAVs in complex environments or without GPS signals.

[0069] In one embodiment, the pre-built variational autoencoder loss function is:

[0070]

[0071] where D is the data dimension; x_i is the value of the i-th pixel of the original input image x; x_rec_i is the value of the i-th pixel of the generated image x_rec; N is the dimension of the latent variable; μ_enc and σ_enc are the mean and standard deviation of the i-th dimension of the latent variable output by the encoder, respectively; and β is a weight parameter.

[0072] In one embodiment, the preprocessing includes resizing, random flipping, rotation, and color jittering of the data from different perspectives; and performing self-supervised feature enhancement on the preprocessed geographic information data using the trained variational autoencoder to obtain an enhanced image includes:

[0073] The encoder in the variational autoencoder maps the preprocessed geographic information data to obtain the corresponding latent variables; the decoder in the variational autoencoder maps the latent variables back to the data space to obtain the reconstructed image; the reconstructed image is then normalized to obtain the enhanced image.

[0074] In one embodiment, the latent variables are mapped back to the data space using the decoder in the trained variational autoencoder to obtain a reconstructed image; the reconstructed image is then normalized to obtain an enhanced image, including:

[0075] The latent variables are decoded using the decoder in the trained variational autoencoder to obtain image samples. These image samples are then mapped back to the data space to obtain the reconstructed image.

[0076]

[0077] where x_rec represents an image sample, x_rec_min represents the minimum of the image sample, and x_rec_max represents the maximum of the image sample;

[0078] The reconstructed image is normalized to obtain the weight map Mask, and the enhanced image is obtained as:

[0079] $I_{enhanced} = I \times Mask$

[0080] where I represents the original image and Mask is the weight map obtained by normalizing the reconstructed image.

[0081] In a specific embodiment, the self-supervised feature enhancement step is completed using the trained variational autoencoder. The encoder maps the input data x to the conditional probability distribution q(z|x) of the latent variable z, parameterized by the mean μ_enc and the standard deviation σ_enc; both are vectors, each dimension of which corresponds to one dimension of the latent variable z.

[0082] $\mu_{enc}, \sigma_{enc} = \mathrm{Encoder}(x)$

[0083]

[0084] The decoder maps the latent variable z back to the data space, modeling the conditional probability distribution p(x|z) of the reconstructed image x_rec as given by the following formula. The decoder takes the latent variable z as input and outputs the mean vector μ_dec and the standard deviation vector σ_dec of the reconstructed image, as shown below:

[0085] $\mu_{dec}, \sigma_{dec} = \mathrm{Decoder}(z)$

[0086]

[0087] where μ_dec and σ_dec are both vectors, each dimension of which corresponds to one pixel value of the reconstructed image x_rec.

[0088] To achieve random sampling and data generation, sampling is performed from the latent-variable distribution output by the encoder. Specifically, the latent variable z is sampled from the Gaussian distribution N(μ_enc, σ_enc), which is expressed as:

[0089]

[0090] The sampled latent variable z is then input into the decoder to obtain the mean vector μ_dec and the standard deviation vector σ_dec of the reconstructed image x_rec. Sampling from the Gaussian distribution N(μ_dec, σ_dec) then yields the generated image sample x_rec, as shown in the following formula:

[0091]

[0092] After normalization, x_rec yields the weight map Mask. The original image I is then multiplied by Mask to obtain the enhanced image.
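The sampling steps in [0088] to [0092] (whose formulas are missing from the extracted text) can be sketched as follows. Drawing z from N(μ_enc, σ_enc) is written in the reparameterized form z = μ_enc + σ_enc · ε with ε ~ N(0, 1), which is the standard way to sample from that Gaussian; the decoder output is sampled the same way and then min-max normalized into the weight map Mask. `toy_encoder` and `toy_decoder` are hypothetical stand-ins for the trained networks, and images are flattened lists.

```python
import random


def gaussian_sample(mu, sigma, rng):
    """Draw from N(mu, sigma^2) per dimension as mu + sigma * eps, eps ~ N(0, 1)."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


def vae_mask(x, encoder, decoder, rng):
    """Encode, sample z, decode, sample x_rec, then min-max normalize into a mask."""
    mu_enc, sigma_enc = encoder(x)
    z = gaussian_sample(mu_enc, sigma_enc, rng)
    mu_dec, sigma_dec = decoder(z)
    x_rec = gaussian_sample(mu_dec, sigma_dec, rng)
    lo, hi = min(x_rec), max(x_rec)
    if hi == lo:
        return [1.0] * len(x_rec)  # assumed fallback for a constant reconstruction
    return [(v - lo) / (hi - lo) for v in x_rec]


def toy_encoder(x):
    # Hypothetical encoder: a 2-D standard-normal latent code regardless of x.
    return [0.0, 0.0], [1.0, 1.0]


def toy_decoder(z):
    # Hypothetical decoder for a 4-pixel image; sigma_dec = 0 makes it deterministic given z.
    return [z[0], z[1], 0.0, 1.0], [0.0] * 4


mask = vae_mask([0.2, 0.4, 0.6, 0.8], toy_encoder, toy_decoder, random.Random(42))
```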

[0093] In one embodiment, the feature enhancement module is:

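[0094] $F_{max} = \mathrm{MaxPool}(\mathrm{FC}(\mathrm{ReLU}(\mathrm{FC}(\mathrm{sigmoid}(F))))) \times F$

[0095] $F_{mean} = \mathrm{AvgPool}(\mathrm{FC}(\mathrm{ReLU}(\mathrm{FC}(\mathrm{sigmoid}(F))))) \times F$

[0096] $F' = \mathrm{Concat}(F_{mean}, F_{max})$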

[0097] Where F represents shared features, MaxPool represents global max pooling layer, AvgPool represents global average pooling layer, FC represents fully connected layer, ReLU and sigmoid are activation functions, and Concat represents concatenating features along the channel dimension.

[0098] In one embodiment, the Siamese network model is trained based on sampled hard samples to obtain a cross-view matching model, including:

[0099] The Siamese network model is jointly trained with the triplet loss function over the sampled hard samples and a cross-entropy loss function to obtain a cross-view matching model; the triplet loss function over the sampled hard samples is:

[0100]

[0101] where N represents the batch size, S1 and S2 represent data from different viewpoints, d_a,p and d_a,n represent the Euclidean distances between the anchor sample and the positive and negative samples, respectively, and α is the margin.

[0102] In one embodiment, using the cross-view matching model for UAV cross-view positioning and navigation includes:

[0103] The target image and the image to be matched are input into the cross-view matching model, and the feature enhancement module produces the feature vectors of the target image and of the image to be matched. The Euclidean distance between the target feature vector and the feature vectors of all images to be matched is calculated to obtain the retrieval results;

[0104] Calculate the Mahalanobis distance and Jaccard distance between the feature vectors of the target image and the image to be matched, and then perform a weighted sum of the Mahalanobis distance and Jaccard distance to obtain the similarity result;

[0105] The retrieval results are reordered based on the similarity result to obtain an optimized matching sequence, and UAV cross-view positioning and navigation is then performed based on the optimized matching sequence.

[0106] In one embodiment, calculating the Euclidean distance between the target feature vector and the feature vectors of all images to be matched includes:

[0107] Calculating the Euclidean distance between the target feature vector and the feature vector of each image to be matched:

[0108]

[0109] where $v_{\text{target}}[i]$ represents the feature vector of the target image $i$, and $v_{\text{query}}[i]$ represents the feature vector of the image $i$ to be matched.
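The distance formula itself is not reproduced in this text. A minimal sketch, assuming the standard definition $d = \sqrt{\sum_i (v_{\text{target}}[i] - v_{\text{query}}[i])^2}$ with $i$ running over the components of the two feature vectors, and ranking the images to be matched by ascending distance to produce the retrieval results (function names are hypothetical):

```python
import math


def euclidean_distance(v_target, v_query):
    # d = sqrt(sum_i (v_target[i] - v_query[i])^2), with i indexing vector components
    if len(v_target) != len(v_query):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((t - q) ** 2 for t, q in zip(v_target, v_query)))


def retrieve(v_target, gallery):
    """Return indices of the images to be matched, nearest first."""
    dists = [euclidean_distance(v_target, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: dists[i])
```

For example, `retrieve([0.0, 0.0], [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]])` ranks the three candidates at distances 5, 1 and 2 as `[1, 2, 0]`.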

[0110] It should be understood that, although the steps in the flowchart of Figure 1 are shown in the sequence indicated by the arrows, they are not necessarily executed in that sequence. Unless explicitly stated herein, there is no strict restriction on the order in which these steps are executed, and they can be performed in other orders. Moreover, at least some of the steps in Figure 1 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same time and can be executed at different times; nor are they necessarily executed sequentially, but can be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

[0111] In one embodiment, as shown in Figure 3, a UAV cross-view positioning and navigation device based on semi-supervised feature enhancement is provided, including: a database construction module 302, a variational autoencoder training module 304, an image enhancement module 306, a cross-view matching model training module 308, and a UAV cross-view positioning and navigation module 310, wherein:

[0112] The database construction module 302 is used to construct a drone aerial photography database and a satellite remote sensing image database based on high-quality geographic information data acquired by drones carrying high-resolution optoelectronic payloads.

[0113] The variational autoencoder training module 304 is used to train the variational autoencoder using a pre-built variational autoencoder loss function to obtain the trained variational autoencoder.

[0114] The image enhancement module 306 is used to preprocess the geographic information data and perform self-supervised feature enhancement on the preprocessed geographic information data according to the trained variational autoencoder to obtain the enhanced image;

[0115] The cross-view matching model training module 308 is used to construct a Siamese network model based on the ResNet50-ibn-a network and the feature enhancement module, sample the UAV aerial photography database and the satellite remote sensing image database according to the cross-domain hard sample sampling method, and train the Siamese network model based on the sampled hard samples to obtain the cross-view matching model;

[0116] The UAV cross-view positioning and navigation module 310 is used to perform UAV cross-view positioning and navigation using the cross-view matching model.

[0117] For specific details of the UAV cross-view positioning and navigation device based on semi-supervised feature enhancement, refer to the description of the corresponding method above; they are not repeated here. Each module in the device can be implemented wholly or partially in software, hardware, or a combination thereof. The modules can be embedded in, or independent of, a processor of a computer device in hardware form, or stored in the memory of a computer device in software form, so that the processor can invoke them and execute the corresponding operations.

[0118] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.

[0119] The embodiments described above are merely illustrative of several implementation methods of this application, and while the descriptions are relatively specific and detailed, they should not be construed as limiting the scope of the invention patent. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this patent application should be determined by the appended claims.

Claims

1. A method for cross-view positioning and navigation of a UAV based on semi-supervised feature enhancement, characterized in that the method includes: constructing a drone aerial photography database and a satellite remote sensing image database based on high-quality geographic information data acquired by drones equipped with high-resolution optoelectronic payloads; training a variational autoencoder using a pre-constructed variational autoencoder loss function to obtain a trained variational autoencoder; preprocessing the geographic information data and performing self-supervised feature enhancement on the preprocessed geographic information data based on the trained variational autoencoder to obtain an enhanced image; constructing a Siamese network model based on the ResNet50-ibn-a network and a feature enhancement module; sampling the UAV aerial photography database and the satellite remote sensing image database according to the cross-domain hard sample sampling method, and training the Siamese network model based on the sampled hard samples to obtain a cross-view matching model; and performing UAV cross-view positioning and navigation using the cross-view matching model. The feature enhancement module is... where F represents shared features, MaxPool represents the global max pooling layer, AvgPool represents the global average pooling layer, FC represents a fully connected layer, ReLU and sigmoid are activation functions, and Concat represents concatenating features along the channel dimension. Training the Siamese network model based on the sampled hard samples to obtain a cross-view matching model includes: jointly training the Siamese network model using the triplet loss function and the cross-entropy loss function of the sampled hard samples to obtain the cross-view matching model, the triplet loss function of the sampled hard samples being... where N represents the data size of a batch, S1 and S2 represent data from different viewpoints, $d_{a,p}$ and $d_{a,n}$ represent the Euclidean distances between the anchor sample and the positive and negative samples, respectively, and $\alpha$ is the margin (boundary value).

2. The method of claim 1, wherein the pre-built variational autoencoder loss function is: ... where the symbols denote, respectively: the data dimension; the i-th pixel value of the original input image; the i-th pixel value of the generated image; the dimension of the latent variables; the mean and standard deviation of the i-th dimension of the latent variables output by the encoder; and the weight parameters.
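The loss formula in claim 2 is not reproduced in this text, only its symbols. A common form built from exactly those quantities is a weighted sum of a pixel-wise reconstruction error over the D data dimensions and the KL divergence between the encoder's Gaussian $\mathcal{N}(\mu, \sigma^2)$ and a standard normal over the d latent dimensions. The sketch below assumes that form, with squared error as the reconstruction term and hypothetical weights `w_rec` and `w_kl`; it is not necessarily the patent's exact formula.

```python
import math


def vae_loss(x, x_hat, mu, sigma, w_rec=1.0, w_kl=1.0):
    """x, x_hat: flattened pixel values (length D) of the original and generated image.
    mu, sigma: encoder mean and standard deviation per latent dimension (length d).
    Assumed form: w_rec * squared error + w_kl * KL(N(mu, sigma^2) || N(0, 1))."""
    rec = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    # Closed-form KL for a diagonal Gaussian vs. a standard normal; requires sigma > 0.
    kl = -0.5 * sum(1 + math.log(s ** 2) - m ** 2 - s ** 2 for m, s in zip(mu, sigma))
    return w_rec * rec + w_kl * kl
```

When the reconstruction is perfect and the latent distribution is already $\mathcal{N}(0, 1)$, both terms vanish and the loss is zero.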

3. The method of claim 1, wherein the preprocessing includes resizing, random flipping, rotation, and color jittering of data from different viewpoints; and performing self-supervised feature enhancement on the preprocessed geographic information data using the trained variational autoencoder to obtain enhanced images includes: mapping the preprocessed geographic information data with the encoder of the variational autoencoder to obtain the corresponding latent variables; mapping the latent variables back to the data space with the decoder of the variational autoencoder to obtain a reconstructed image; and normalizing the reconstructed image to obtain an enhanced image.
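The preprocessing in claim 3 can be illustrated on a small grayscale image stored as nested lists with values in [0, 1]. Claim 3 does not specify output size, flip probability, rotation angles, or jitter strength, so the sketch assumes nearest-neighbour resizing to a hypothetical 4×4 size, a horizontal flip with probability 0.5, rotation by a random multiple of 90 degrees, and colour jitter reduced to a brightness factor. All names and parameters are illustrative.

```python
import random


def resize_nearest(img, out_h, out_w):
    # Nearest-neighbour resize of a 2-D list.
    h, w = len(img), len(img[0])
    return [[img[r * h // out_h][c * w // out_w] for c in range(out_w)] for r in range(out_h)]


def hflip(img):
    return [row[::-1] for row in img]


def rot90(img):
    # Rotate 90 degrees clockwise.
    return [list(row) for row in zip(*img[::-1])]


def jitter_brightness(img, max_delta, rng):
    # Simplified colour jitter: scale brightness, then clamp to [0, 1].
    factor = 1.0 + rng.uniform(-max_delta, max_delta)
    return [[min(1.0, max(0.0, v * factor)) for v in row] for row in img]


def preprocess(img, size=(4, 4), p_flip=0.5, max_delta=0.2, rng=None):
    if rng is None:
        rng = random.Random()
    out = resize_nearest(img, *size)
    if rng.random() < p_flip:
        out = hflip(out)
    for _ in range(rng.randrange(4)):  # rotate by 0, 90, 180 or 270 degrees
        out = rot90(out)
    return jitter_brightness(out, max_delta, rng)
```

Passing a seeded `random.Random` makes the augmentation reproducible, which helps when the same view must be compared across runs.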

4. The method of claim 3, wherein mapping the latent variables back to the data space using the decoder of the trained variational autoencoder to obtain a reconstructed image, and normalizing the reconstructed image to obtain an enhanced image, includes: decoding the latent variables with the decoder of the trained variational autoencoder to obtain image samples; mapping the image samples back to the data space to obtain the reconstructed image, in a mapping whose terms denote an image sample, the minimum image sample, and the maximum image sample; and normalizing the reconstructed image to obtain the enhanced image, in a normalization formula that also involves the original image.
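Claim 4 names an image sample together with the minimum and maximum image samples, which suggests min-max scaling when mapping decoded samples back to the data space. The sketch below assumes that reading, $(x - x_{\min}) / (x_{\max} - x_{\min})$ over a flattened sample. The subsequent normalization against the original image is not sketched, because its formula is not given in this text. The function name is hypothetical.

```python
def minmax_map(sample):
    """Assumed min-max mapping of a flattened decoded sample into [0, 1]."""
    lo, hi = min(sample), max(sample)
    if hi == lo:
        # Constant sample: avoid division by zero.
        return [0.0 for _ in sample]
    return [(v - lo) / (hi - lo) for v in sample]
```

For example, `minmax_map([2.0, 4.0, 6.0])` yields `[0.0, 0.5, 1.0]`.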

5. The method of claim 1, wherein using the cross-view matching model for UAV cross-view positioning and navigation includes: inputting the target image and the image to be matched into the cross-view matching model, respectively; enhancing the target image and the image to be matched with the feature enhancement module to obtain the feature vectors of the target image and the image to be matched; calculating the Euclidean distance between the target feature vector and the feature vectors of all images to be matched to obtain retrieval results; calculating the Mahalanobis distance and the Jaccard distance between the feature vectors of the target image and the image to be matched, and taking a weighted sum of the two to obtain a similarity result; and reordering the retrieval results based on the similarity result to obtain an optimized matching sequence, and performing UAV cross-view positioning and navigation based on the optimized matching sequence.

6. The method according to claim 5, characterized in that calculating the Euclidean distance between the target feature vector and the feature vectors of all images to be matched includes computing the Euclidean distance, in which the two terms denote the feature vector of the target image and the feature vector of the image to be matched, respectively.

7. An unmanned aerial vehicle cross-view positioning and navigation device based on semi-supervised feature enhancement, characterized in that the device includes: a database construction module, used to construct a drone aerial photography database and a satellite remote sensing image database based on high-quality geographic information data acquired by drones equipped with high-resolution optoelectronic payloads; a variational autoencoder training module, used to train the variational autoencoder using a pre-built variational autoencoder loss function to obtain the trained variational autoencoder; an image enhancement module, used to preprocess the geographic information data and perform self-supervised feature enhancement on the preprocessed geographic information data according to the trained variational autoencoder to obtain the enhanced image; a cross-view matching model training module, used to construct a Siamese network model based on the ResNet50-ibn-a network and a feature enhancement module, the feature enhancement module being... in which F represents shared features, MaxPool represents the global max pooling layer, AvgPool represents the global average pooling layer, FC represents the fully connected layer, ReLU and sigmoid are activation functions, and Concat represents concatenating features along the channel dimension; this module samples the UAV aerial photography database and the satellite remote sensing image database according to the cross-domain hard sample sampling method and trains the Siamese network model based on the sampled hard samples to obtain a cross-view matching model, which includes jointly training the Siamese network model using the triplet loss function and the cross-entropy loss function of the sampled hard samples, the triplet loss function of the sampled hard samples being... where N represents the data size of a batch, S1 and S2 represent data from different viewpoints, $d_{a,p}$ and $d_{a,n}$ represent the Euclidean distances between the anchor sample and the positive and negative samples, respectively, and $\alpha$ is the margin (boundary value); and a UAV cross-view positioning and navigation module, used to perform UAV cross-view positioning and navigation using the cross-view matching model.
