Multi-source remote sensing scene classification method based on dynamic sample selection
By combining dynamic sample selection with a residual network and a soft actor-critic (SAC) reinforcement learning framework, the method quantifies the transfer value and transfer risk of multi-source remote sensing images, addresses cross-domain distribution shift in multi-source remote sensing scene classification, and achieves adaptive, high-precision classification.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Application (China)
- Current Assignee / Owner: CHANGCHUN INST OF OPTICS FINE MECHANICS & PHYSICS CHINESE ACAD OF SCI
- Filing Date: 2026-05-14
- Publication Date: 2026-06-16
AI Technical Summary
Existing multi-source remote sensing scene classification methods cannot effectively integrate multi-source data when faced with differences in sensors, changes in imaging time phase, and heterogeneity of regional environments. This results in complex cross-domain distribution shifts and makes it difficult to achieve adaptive sample selection when the target domain is unknown or dynamically evolving, thus affecting the model's generalization performance.
The method adopts dynamic sample selection. A residual network extracts features from multi-source remote sensing images, and a target domain proxy validation set is constructed. A path evaluation network quantifies the transfer value and transfer risk of each sample. A data selector built on the soft actor-critic (SAC) reinforcement learning framework then dynamically selects samples to optimize the classification model.
Without requiring domain labels, the method improves classification accuracy and robustness in unknown or dynamic target domains, suppresses negative transfer, and achieves fine-grained, adaptive selection of multi-source remote sensing samples in complex remote sensing scenarios.
Figure CN122223451A_ABST (abstract drawing)
Description
Technical Field
[0001] This invention relates to the field of remote sensing scene classification technology, and in particular to a multi-source remote sensing scene classification method based on dynamic sample selection.
Background Technology
[0002] Remote sensing scene classification is one of the core tasks of intelligent remote sensing interpretation and is widely used in land use monitoring, disaster assessment, urban planning, and other fields. With the explosive growth of multi-source remote sensing data, effectively integrating heterogeneous data from different sensors, times, and regions to improve model generalization has become a research hotspot. However, multi-source remote sensing data are inherently heterogeneous in distribution: different sensors have different spectral response functions, spatial resolutions, and imaging mechanisms; when the same area is imaged in different seasons or weather conditions, the reflectance of ground objects, cloud cover, and vegetation status change drastically; and differences in geographical environment further exacerbate inter-domain shift. Together, these factors mean that the source and target domains are not independent and identically distributed (non-IID).
[0003] Against this backdrop, unsupervised domain adaptation has been widely used to alleviate the problem of cross-domain performance degradation. Early methods mainly focused on single-source domain adaptation, aligning the feature space by minimizing the distribution distance between the source domain and the target domain. However, when faced with multiple heterogeneous source domains, single-source methods cannot fully utilize the complementary information from multiple sources, and training each source domain separately is inefficient. To address this, researchers have proposed multi-source domain adaptation methods, with typical strategies including: (1) simply merging all source domain data and then performing unified domain alignment; (2) assigning fixed weights to each source domain for weighted fusion; and (3) setting static thresholds based on sample-level metrics to screen "reliable" samples.
[0004] However, the aforementioned approaches face fundamental limitations in dynamic and open remote sensing scenarios. First, directly merging multi-source data ignores inter-domain differences and easily introduces samples that contradict the target domain distribution. This leads to negative transfer, in which model performance may fall below that obtained with only the best single source. Second, static thresholds and fixed weighting strategies lack adaptability and cannot perceive the dynamic evolution of the target domain distribution, so the selection mechanism lags behind actual needs. More importantly, in real-world remote sensing applications the target domain is often unlabeled, and the domain to which each sample belongs is not clearly defined, which makes methods that rely on explicit domain labels or predefined domain divisions difficult to deploy.
[0005] To address the aforementioned issues, two main technical approaches can be adopted: The first is a multi-source domain adaptation method based on sample selection. For example, some works employ clustering or nearest-neighbor strategies to select samples from multi-source data that are close to the target domain's feature space for training; other methods use the confidence level of the target domain's pseudo-labels as a selection criterion, retaining only high-confidence samples for model updates. While these methods can partially mitigate negative transfer, their selection rules are fixed and heuristic, failing to consider the dynamic risks during sample transfer. For instance, a high-confidence sample might mislead the model in subsequent iterations due to feature shift. Furthermore, they typically assume a relatively stable target domain distribution, making them unable to handle the continuous distribution drift common in remote sensing data. The second approach is an adaptive sampling framework based on reinforcement learning. Recent research has attempted to introduce reinforcement learning into domain adaptation, allowing agents to learn when and where to sample source data to optimize the target task. However, existing reinforcement learning-based methods are mostly designed for general image classification, failing to model transfer risks specific to the multi-source heterogeneity of remote sensing data, and often employ discrete action spaces, resulting in coarse granularity and difficulty in achieving fine-grained sample-level control. More importantly, they lack a joint modeling mechanism for transferability and risk gradients, which leads to unstable policy learning and makes them prone to getting trapped in local optima in complex remote sensing scenarios.
[0006] In summary, although existing research has made progress in multi-source transfer and sample selection, there is still no method that achieves fine-grained, adaptive selection of multi-source remote sensing samples by dynamically sensing transfer risk and optimizing long-term benefit when the target domain is unknown.
Summary of the Invention
[0007] This invention aims to solve the technical problem of complex cross-domain distribution shifts caused by sensor differences, imaging time-phase changes, and regional environmental heterogeneity in multi-source remote sensing scene classification in the prior art, and provides a multi-source remote sensing scene classification method based on dynamic sample selection.
[0008] To solve the above-mentioned technical problems, the technical solution of the present invention is as follows:
[0009] A multi-source remote sensing scene classification method based on dynamic sample selection includes the following steps:
[0010] Step 1: Use a residual network to uniformly extract the features of source and target domain samples from multi-source remote sensing images, use high-confidence pseudo-labels to construct a target domain proxy validation set for evaluation on the unlabeled target domain, and calculate the feature mean of the target domain proxy validation set as the target domain context;
[0011] Step 2: Construct a path evaluation network to jointly quantify the transfer value and transfer risk of each multi-source remote sensing source domain sample with respect to the target domain, forming a joint state vector;
[0012] Step 3: Build a data selector based on the soft actor-critic (SAC) reinforcement learning framework, using the joint state vector output by the path evaluation network in Step 2 as input, and drive sample selection with long-term cumulative rewards to achieve sample-level adaptive screening;
[0013] Step 4: Use the source domain sample subset selected by the data selector in Step 3 and the target domain dataset to jointly train the scene classification model, where the training loss includes a supervised loss and a consistency regularization loss;
[0014] Step 5: Use the scene classification model trained in Step 4 to perform multi-source remote sensing scene classification: input the multi-source remote sensing image to be tested, and output the scene category of the image based on the classification score.
[0015] In the above technical solution, in step 2, the path evaluation network includes two parallel sub-modules: a transferability branch and a transfer risk branch;
[0016] The transferability branch quantifies the potential contribution of a single source domain sample to the classification performance of the target domain, based on the statistical relationship between the features of the source domain sample and the feature mean of the target domain proxy validation set.
[0017] The transfer risk branch explicitly assesses the degree of distribution shift by calculating the gradient norm of the maximum mean discrepancy (MMD) between the feature distribution of the source domain samples and the feature distribution of the target domain.
[0018] The transferability score and the transfer risk value, output by the transferability branch and the transfer risk branch respectively, are concatenated with the original source domain sample features to form a joint state vector.
[0019] In the above technical solution, in step 3, the data selector takes the joint state vector output by the path evaluation network as its state input and, through the policy network, outputs a continuous action that dynamically decides whether to retain the current source domain sample.
[0020] In the above technical solution, in step 3, the composite reward function that drives sample selection with long-term cumulative rewards is defined as follows:
[0021] The composite reward function combines three indicators evaluated on the target domain proxy validation set: accuracy gain, transfer risk value, and pseudo-label entropy.
[0022] The accuracy gain is the change in accuracy on the target domain proxy validation set after the data selector model is fine-tuned with the source domain samples; it serves as a positive incentive.
[0023] The transfer risk value is provided by the path evaluation network and serves as a negative penalty term that actively suppresses source domain samples with severe distribution shift.
[0024] The pseudo-label entropy term is the negative of the pseudo-label prediction entropy; it encourages the selection of source domain samples that the data selector model predicts with high confidence.
[0025] The accuracy gain, transfer risk value, and pseudo-label entropy are combined in a weighted sum with adjustable weights to form the final reward, expressed as:
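[0026] $R = \lambda_1 \Delta\mathrm{Acc} - \lambda_2 R_{\mathrm{risk}} - \lambda_3 H$
[0027] where $\Delta\mathrm{Acc}$ is the increment in accuracy on the target domain proxy validation set after using the source domain samples; $H$ is the pseudo-label entropy; $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the weighting coefficients; $R_{\mathrm{risk}}$ is the transfer risk value; and $R$ is the reward value.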
[0028] In the above technical solution, in step 4:
[0029] The supervised loss is the supervised cross-entropy loss on the selected source domain sample subset $\tilde{D}_S = \{x^s : a > 0.5\}$, which preserves the discriminative power of the source domain samples; $x^s$ denotes a source domain sample and $a$ denotes its selection probability.
[0030] The consistency regularization loss is computed on the samples of the target domain dataset; it improves cross-domain generalization by enforcing consistent predictions under strong and weak data augmentation.
[0031] In the above technical solution, the loss function used for training in step 4 is calculated as:
[0032] $\mathcal{L} = \sum_{x^s \in \tilde{D}_S} \mathcal{L}_{\mathrm{CE}}\big(p(x^s), y\big) + \beta \sum_{x^t \in D_T} \mathcal{L}_{\mathrm{cons}}(x^t)$
[0033] where $y$ is the true category label, $x^t$ is a target domain sample, $D_T$ is the target domain dataset, $\mathcal{L}_{\mathrm{CE}}$ is the supervised cross-entropy loss, $p(\cdot)$ is the predicted probability distribution for a sample, $\beta$ is the weighting coefficient, and $\mathcal{L}_{\mathrm{cons}}$ is the consistency regularization loss.
[0034] In the above technical solution, in step 3, the policy network outputs a continuous action that dynamically determines whether to retain the current source domain sample. Specifically:
[0035] The data selector employs an actor network based on the soft actor-critic (SAC) reinforcement learning framework, which receives the joint state vector as input and outputs $a \in [0, 1]$ as the action space. If the output $a$ is greater than 0.5, the source domain sample is included in the training set; otherwise, it is discarded. Here $a$ represents the probability of selecting the sample.
[0036] The present invention has the following beneficial effects:
[0037] The multi-source remote sensing scene classification method based on dynamic sample selection of this invention does not require domain labels and is adaptable to unknown open scenes. Existing methods typically assume that the boundaries between the source and target domains are well-defined and require explicit domain division; while this invention does not rely on domain labels at all and can still operate adaptively in open remote sensing scenes where the target domain distribution is unknown or dynamically evolving, significantly improving practical deployment capabilities.
[0038] The multi-source remote sensing scene classification method based on dynamic sample selection in this invention achieves sample-level risk perception and effectively suppresses negative transfer. Traditional multi-source methods often directly merge all source domain data, which is prone to negative transfer due to distribution differences. This invention instead designs a path evaluation network to quantify the transfer risk and benefit of each sample and combines it with reinforcement learning to achieve fine-grained screening. This avoids contaminating training with high-confidence but strongly shifted samples, and the method outperforms static fusion strategies.
[0039] The multi-source remote sensing scene classification method based on dynamic sample selection of this invention responds to distribution evolution through a dynamically adaptive data selector. Existing static threshold or fixed weighting strategies struggle to cope with the time-varying characteristics of remote sensing data. This invention instead adopts a soft actor-critic (SAC) reinforcement learning framework, which drives sample selection with long-term cumulative rewards and can actively perceive and adapt to the dynamic evolution of the target domain, maintaining robust classification performance in open and non-stationary environments.
[0040] Compared with multi-branch or multi-stage network architectures, the multi-source remote sensing scene classification method based on dynamic sample selection in this invention maintains a single-stream backbone network and achieves efficient knowledge transfer solely through intelligent data selection. Feature extraction, influence assessment, sample selection, and classification training are jointly optimized in a closed loop.
Description of the Drawings
[0041] The present invention will now be described in further detail with reference to the accompanying drawings and specific embodiments.
[0042] Figure 1 is a schematic diagram of the overall architecture of the multi-source remote sensing scene classification method based on dynamic sample selection of the present invention.
[0043] Figure 2 is a schematic diagram of the structure of the path evaluation network.
[0044] Figure 3 is a schematic diagram of the data selector based on the soft actor-critic (SAC) reinforcement learning framework.
Detailed Implementation
[0045] The inventive concept of this invention is as follows:
[0046] Existing methods typically involve simply merging data from multiple source domains for training or using static threshold strategies for sample selection. This ignores the dynamic differences in transfer difficulty between different source and target domains, easily leading to negative transfer and impairing the model's generalization performance in the target domain. Especially in real-world remote sensing applications where the target domain is unlabeled and its specific distribution is unknown, automatically identifying and selecting high-quality samples with the greatest transfer value to the current target domain from massive, heterogeneous multi-source data becomes a key bottleneck in improving adaptive classification accuracy.
[0047] To address this issue, this invention proposes a remote sensing scene classification method based on dynamic sample selection. Its core lies in abandoning fixed rules and constructing an intelligent selection mechanism capable of sensing transfer risk and making autonomous decisions. Inspired by reinforcement learning and data influence modeling, this mechanism uses a learnable "data selector" to dynamically evaluate the contribution of each source domain sample to optimizing the target domain task. The selector considers not only the feature quality of the sample itself but, more importantly, its potential alignment with the target domain data and its transfer uncertainty, thereby avoiding the introduction of noisy or harmful samples that contradict the target domain distribution. Through this refined, task-oriented sample selection, the invention mitigates mutual interference between multi-source data, improves the efficiency and robustness of cross-domain classification, and provides adaptive technical support for complex and open real-world remote sensing scene classification tasks.
[0048] The purpose of this invention is to propose a dynamic sample selection mechanism that does not require explicit domain labels and is risk-aware, enabling refined and adaptive use of multi-source remote sensing data. Specifically, a path evaluation network is constructed to jointly quantify the transferability and transfer risk of each source domain sample, and a data selector based on the soft actor-critic (SAC) reinforcement learning framework is designed to decide sample selection dynamically with the goal of improving long-term classification performance. This method suppresses negative transfer while preserving complementary information from multiple sources, improving the model's scene classification accuracy and robustness in unknown and dynamic target domains.
[0049] The present invention is described in detail below with reference to Figures 1-3. In Figures 1 and 2, the label "target domain proxy validation set" denotes "the feature mean of the target domain proxy validation set".
[0050] As shown in Figure 1, the multi-source remote sensing scene classification method based on dynamic sample selection of the present invention includes the following steps:
[0051] Step 1: Use a residual network to uniformly extract the features of source and target domain samples from multi-source remote sensing images, and use high-confidence pseudo-labels to construct a "target domain proxy validation set" that can be used for evaluation on the unlabeled target domain. Calculate the feature mean of the target domain proxy validation set as the target domain context. This step provides a supervisory signal for evaluating target domain performance and addresses the absence of target domain labels in real-world scenarios.
[0052] Step 2: Construct a path evaluation network to jointly quantify the transfer value and transfer risk of each multi-source remote sensing sample. The network takes as input the source domain sample features output by Step 1 and the feature mean of the target domain proxy validation set, and it contains two parallel sub-modules. The transferability branch predicts the potential improvement in target domain classification performance contributed by a source domain sample, based on the statistical relationship between the source domain sample features and the feature mean of the target domain proxy validation set, and outputs a transferability score. The transfer risk branch explicitly evaluates the distribution shift by calculating the gradient norm of the maximum mean discrepancy (MMD) between the source domain sample features and the target domain feature distribution, and outputs a transfer risk value. The transferability score and the transfer risk value are then concatenated with the original source domain sample features to obtain the joint state vector, which provides the basis for dynamic screening.
[0053] Step 3: Construct a data selector based on the soft actor-critic (SAC) reinforcement learning framework. Using the joint state vector output by the path evaluation network in Step 2 as its state input, and driven by long-term cumulative rewards, the data selector selects a subset of source domain samples for joint training with the target domain dataset, achieving sample-level adaptive selection. Through the policy network, it outputs continuous actions that dynamically determine whether to retain the current source domain sample. The reward function combines the accuracy gain, the transfer risk value, and the pseudo-label entropy on the target domain proxy validation set.
[0054] Step 4: Use the high-quality source domain sample subset selected by the data selector and the target domain dataset to jointly train the final scene classification model, combining a supervised loss and a consistency regularization loss to improve cross-domain generalization.
[0055] Step 5: Use the trained scene classification model to perform multi-source remote sensing scene classification: input the remote sensing image to be tested, and output the scene category of the image based on the classification score.
[0056] In step 1 of the multi-source remote sensing scene classification method based on dynamic sample selection of the present invention, the residual network is a basic deep neural network model used to uniformly extract the source domain and target domain features of multi-source remote sensing images. Its core function is to map heterogeneous remote sensing images from different sensors, different time phases, and different resolutions into a common, semantically aligned feature space, laying the foundation for subsequent cross-domain evaluation and transfer.
[0057] First, the labeled data $D_S$ from multiple heterogeneous source domains and the unlabeled data $D_T$ of the target domain are input into the residual network $F_\theta$. The features obtained in the unified semantic space can be expressed as:
[0058] $f^s = F_\theta(x^s)$ (1)
[0059] $f^t = F_\theta(x^t)$ (2)
[0060] where $x^s$ and $x^t$ are samples from the source and target domains, respectively; $f^s$ and $f^t$ are the features of the source and target domain samples, respectively; $\theta$ denotes the residual network parameters; and $F_\theta(\cdot)$ denotes feature extraction with the shared backbone network.
[0061] Subsequently, the class probabilities $p(y \mid x^t)$ of the target domain samples are predicted. Only the samples whose highest class probability exceeds the threshold $\tau$ are retained, and these samples together with their pseudo-labels constitute the target domain proxy validation set $\mathcal{V}$. The feature mean $\mu_T$ is then calculated as the target domain context:
[0062] $\mu_T = \frac{1}{N_v} \sum_{j=1}^{N_v} F_\theta(x_j^t)$ (3)
[0063] where $N_v$ is the number of samples in the target domain proxy validation set, $x_j^t$ is the $j$-th target domain sample in the target domain proxy validation set, $j$ is the target domain sample index, and $F_\theta(\cdot)$ denotes feature extraction with the shared backbone network.
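The proxy validation set construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `build_proxy_validation_set`, the default threshold of 0.9, and the toy feature and probability values are all hypothetical, and features are assumed to have been extracted already by the backbone.

```python
from typing import List, Sequence, Tuple


def build_proxy_validation_set(
    features: Sequence[Sequence[float]],
    probs: Sequence[Sequence[float]],
    threshold: float = 0.9,  # assumed value; the source does not state one
) -> Tuple[List[int], List[int], List[float]]:
    """Keep target samples whose top class probability exceeds `threshold`.

    Returns (kept indices, pseudo-labels, feature mean of the kept samples).
    """
    kept: List[int] = []
    labels: List[int] = []
    for idx, p in enumerate(probs):
        top = max(range(len(p)), key=p.__getitem__)  # index of the most likely class
        if p[top] > threshold:
            kept.append(idx)
            labels.append(top)
    if not kept:
        raise ValueError("no target sample passed the confidence threshold")
    dim = len(features[0])
    # Feature mean over the retained samples, used as the target domain context.
    mean = [sum(features[i][d] for i in kept) / len(kept) for d in range(dim)]
    return kept, labels, mean


# Toy target-domain features and predicted class probabilities (hypothetical).
target_features = [[1.0, 0.0], [3.0, 2.0], [10.0, 10.0]]
target_probs = [[0.95, 0.05], [0.02, 0.98], [0.6, 0.4]]
kept, pseudo_labels, mu_t = build_proxy_validation_set(target_features, target_probs)
```

In this toy run the third sample is dropped because its top probability (0.6) does not exceed the threshold, so the context vector is the mean of the first two feature vectors only.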
[0064] In step 2 of the multi-source remote sensing scene classification method based on dynamic sample selection of the present invention, the path evaluation network receives as input the source domain sample features output in step 1 and the feature mean of the target domain proxy validation set, and during the computation it dynamically accesses the feature set of all samples in the target domain proxy validation set. The transferability branch and the transfer risk branch each compute their output, and the outputs are then fused. The structure of the path evaluation network is shown in Figure 2.
[0065] (a) Construct a transferability branch to quantify the potential contribution of a single source domain sample to the classification performance of the target domain. This branch takes the source domain sample feature $f^s$ and the feature mean $\mu_T$ of the target domain proxy validation set as input, and a lightweight multilayer perceptron models the semantic alignment between the two domains. When the source domain sample is semantically and distributionally close to the target domain, a higher influence score is output; otherwise, a lower score is output. This can be expressed as:
[0066] $h = \phi(W_1 z + b_1)$ (4)
[0067] $S_{\mathrm{tr}} = \sigma(W_2 h + b_2)$ (5)
[0068] where $\phi$ is the activation function; $z = [f^s; \mu_T]$ is the vector obtained by concatenating the source domain sample feature $f^s$ and the feature mean $\mu_T$ of the target domain proxy validation set; $W_1$ and $b_1$ are the weights and biases of the first perceptron layer; $W_2$ and $b_2$ are the weights and biases of the second perceptron layer; $\sigma$ is the Sigmoid function, which ensures that the output satisfies $S_{\mathrm{tr}} \in (0, 1)$; $h$ is the activation output of the first hidden layer; and $S_{\mathrm{tr}}$ is the transferability score.
[0069] The transferability branch is trained with supervision. Its regression target is the change in accuracy on the target domain proxy validation set after fine-tuning with the source domain sample, $\Delta\mathrm{Acc}$, and the objective is to minimize the mean squared error $\mathcal{L}_{\mathrm{MSE}}$:
[0070] $\mathcal{L}_{\mathrm{MSE}} = \mathbb{E}_{x^s}\big[(S_{\mathrm{tr}} - \Delta\mathrm{Acc})^2\big]$ (6)
[0071] That is, the larger $S_{\mathrm{tr}}$ is, the more the source domain sample can improve target domain performance.
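The transferability branch described above (a two-layer perceptron over the concatenated source feature and target feature mean, trained by mean squared error regression) might look roughly like the following forward pass. This is a sketch only: ReLU is assumed for the hidden activation because the source says just "activation function", and the weights, function names, and inputs are hypothetical toy values.

```python
import math
from typing import List, Sequence


def affine(weights: Sequence[Sequence[float]], x: Sequence[float],
           bias: Sequence[float]) -> List[float]:
    """Compute W x + b for a weight matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def transferability_score(f_s, mu_t, w1, b1, w2, b2) -> float:
    """Two-layer perceptron on the concatenation [f_s; mu_t], Sigmoid output."""
    z = list(f_s) + list(mu_t)                      # concatenated input vector
    h = [max(0.0, v) for v in affine(w1, z, b1)]    # hidden layer, ReLU (assumed)
    logit = sum(w * v for w, v in zip(w2, h)) + b2  # single output unit
    return 1.0 / (1.0 + math.exp(-logit))           # Sigmoid keeps the score in (0, 1)


def mse_loss(scores: Sequence[float], accuracy_gains: Sequence[float]) -> float:
    """Mean squared error between predicted scores and measured accuracy gains."""
    return sum((s - g) ** 2 for s, g in zip(scores, accuracy_gains)) / len(scores)


# Hypothetical weights for a 2-D feature, 2-D target mean, and 2 hidden units.
w1 = [[0.5, -0.5, 0.5, -0.5], [0.0, 1.0, 0.0, -1.0]]
b1 = [0.0, 0.0]
w2 = [1.0, -1.0]
b2 = 0.0
score = transferability_score([1.0, 0.0], [1.0, 0.0], w1, b1, w2, b2)
```

In practice the weights would be learned by minimizing `mse_loss` against accuracy changes measured on the proxy validation set; the training loop is omitted here.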
[0072] (b) Construct a transfer risk branch, which focuses on identifying "harmful" samples: those that the model may predict with high confidence but whose actual distribution differs significantly from the target domain. The branch computes a measure of inconsistency between the source domain sample features and the target domain feature distribution. Specifically, it uses the maximum mean discrepancy (MMD) as the distribution distance and then calculates the gradient norm of this distance with respect to the features of the source domain sample. The calculation is as follows:
[0073] $R_{\mathrm{risk}} = \left\| \nabla_{f^s} \left[ k(f^s, f^s) - \frac{2}{N_v} \sum_{j=1}^{N_v} k(f^s, f_j^t) + \frac{1}{N_v^2} \sum_{j=1}^{N_v} \sum_{j'=1}^{N_v} k(f_j^t, f_{j'}^t) \right] \right\|_2$ (7)
[0074] where $k(\cdot, \cdot)$ is the Gaussian kernel function, $N_v$ is the number of samples in the target domain proxy validation set, $f^s$ is the source domain sample feature, $f_j^t$ is the feature of the $j$-th sample in the target domain proxy validation set, and $R_{\mathrm{risk}}$ is the transfer risk value. The gradient norm reflects the "contribution" of the source domain sample to the overall distribution shift: the larger the gradient norm, the more the source domain sample deviates from the mainstream distribution of the target domain, and the higher the transfer risk. After normalization, this gradient norm is used as the output of the transfer risk branch. This design enables the path evaluation network not only to perceive static shifts but also to capture the "sensitive directions" of samples in the feature space, achieving finer risk discrimination.
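As an illustration of the gradient-norm idea, the sketch below computes the squared maximum mean discrepancy between one source feature and a set of target features with a Gaussian kernel, and the norm of its analytic gradient with respect to the source feature. Several choices here are assumptions rather than details from the source: the single-sample form of the discrepancy, the kernel bandwidth `sigma`, and all function names. The normalization step mentioned in the text is not included because its form is not specified.

```python
import math
from typing import List, Sequence


def gaussian_kernel(x: Sequence[float], y: Sequence[float], sigma: float = 1.0) -> float:
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2(f: Sequence[float], targets: Sequence[Sequence[float]], sigma: float = 1.0) -> float:
    """Squared MMD between a single source feature and the target feature set."""
    n = len(targets)
    cross = sum(gaussian_kernel(f, g, sigma) for g in targets) / n
    within = sum(gaussian_kernel(g, h, sigma) for g in targets for h in targets) / n ** 2
    return gaussian_kernel(f, f, sigma) - 2.0 * cross + within


def mmd2_grad(f: Sequence[float], targets: Sequence[Sequence[float]],
              sigma: float = 1.0) -> List[float]:
    """Analytic gradient of mmd2 with respect to f.

    Only the cross term depends on f, giving
    (2 / (n * sigma^2)) * sum_j k(f, g_j) * (f - g_j).
    """
    n = len(targets)
    grad = [0.0] * len(f)
    for g in targets:
        k = gaussian_kernel(f, g, sigma)
        for d in range(len(f)):
            grad[d] += 2.0 * k * (f[d] - g[d]) / (n * sigma ** 2)
    return grad


def transfer_risk(f: Sequence[float], targets: Sequence[Sequence[float]],
                  sigma: float = 1.0) -> float:
    """Euclidean norm of the MMD^2 gradient, used here as the raw risk value."""
    return math.sqrt(sum(c * c for c in mmd2_grad(f, targets, sigma)))


# Hypothetical source feature and proxy-validation-set features.
source_feature = [1.0, 0.5]
target_set = [[0.0, 0.0], [1.0, 1.0]]
risk = transfer_risk(source_feature, target_set)
```

With an automatic differentiation framework the gradient would normally be obtained from the framework rather than derived by hand; the closed form is used here only to keep the sketch dependency-free.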
[0075] (c) After the transferability score $S_{\mathrm{tr}}$ and the transfer risk value $R_{\mathrm{risk}}$ are obtained, the path evaluation network concatenates these two quantitative metrics with the original source domain sample feature $f^s$ to form the joint state vector $s$. This can be expressed as:
[0076] $s = [f^s; S_{\mathrm{tr}}; R_{\mathrm{risk}}]$ (8)
[0077] This joint state vector contains both high-level semantic information and task-oriented influence and risk. This fusion strategy ensures that the subsequent reinforcement learning agent can both understand the sample content and weigh its transfer benefits and risks, avoiding the decision bias caused by relying on a single indicator.
[0078] Step 3 of the multi-source remote sensing scene classification method based on dynamic sample selection of the present invention is shown in Figure 3. Specifically, the joint state vector output by the path evaluation network is used as the state $s$ of the reinforcement learning agent (the data selector based on the soft actor-critic reinforcement learning framework). The "actor" network in the data selector is a multilayer perceptron that receives this joint state vector as input and outputs $a \in [0, 1]$ as the action space ($a$ is a continuous value). The action is interpreted as a "selection probability": $a$ represents the probability of selecting the sample and takes a value between 0 and 1. If the output $a$ is greater than 0.5, the source domain sample is included in the training set; otherwise, it is discarded.
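The state construction and the 0.5 decision rule described above are simple enough to show directly. In this sketch the ordering of the concatenated vector (feature first, then score, then risk) and the function names are assumptions, and the action values stand in for actor network outputs.

```python
from typing import List, Sequence


def joint_state(f_s: Sequence[float], score: float, risk: float) -> List[float]:
    """Concatenate the source feature with its transferability score and risk value."""
    return list(f_s) + [score, risk]


def select_samples(actions: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Return indices of source samples whose action value exceeds the threshold."""
    return [i for i, a in enumerate(actions) if a > threshold]


state = joint_state([0.2, 0.7], 0.9, 0.1)
# Hypothetical actor outputs for four source samples.
selected = select_samples([0.8, 0.5, 0.3, 0.51])
```

Note that a sample with an action of exactly 0.5 is discarded, because the rule in the text requires the value to be greater than 0.5.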
[0079] The multi-source remote sensing scene classification method based on dynamic sample selection in this invention designs a composite reward function that combines three indicators. First, the change in accuracy on the target domain proxy validation set after using the source domain samples to fine-tune the model (the data selector in the soft actor-critic reinforcement learning framework) is calculated as a positive incentive: the greater the performance improvement, the higher the reward. Second, the transfer risk value provided by the path evaluation network is introduced as a negative penalty term to actively suppress source domain samples with severe distribution shift. Finally, a pseudo-label entropy term is added: the negative of the pseudo-label prediction entropy encourages the selection of source domain samples that the model predicts with high confidence, avoiding noise interference. The three indicators are combined in a weighted sum with adjustable weights to form the final reward, which can be expressed as:
[0080] $R = \lambda_1 \Delta\mathrm{Acc} - \lambda_2 R_{\mathrm{risk}} - \lambda_3 H$ (9)
[0081] where $\Delta\mathrm{Acc}$ is the increment in accuracy on the target domain proxy validation set after using the source domain sample; $H$ is the pseudo-label entropy; $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the weighting coefficients; $R_{\mathrm{risk}}$ is the transfer risk value; and $R$ is the reward value, which is the contribution of the current source domain sample selection action to the target domain classification performance.
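A small sketch of the composite reward may make the sign conventions concrete: accuracy gain is rewarded, while transfer risk and prediction entropy are penalized. The function names, the default weights of 1.0, and the example numbers are hypothetical, and Shannon entropy with the natural logarithm is assumed.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def composite_reward(acc_gain: float, risk: float, probs: Sequence[float],
                     w_acc: float = 1.0, w_risk: float = 1.0,
                     w_ent: float = 1.0) -> float:
    """Weighted sum: accuracy gain minus transfer risk minus prediction entropy."""
    return w_acc * acc_gain - w_risk * risk - w_ent * entropy(probs)


# Same accuracy gain and risk; only the prediction confidence differs.
confident = composite_reward(0.02, 0.1, [0.9, 0.1])
uncertain = composite_reward(0.02, 0.1, [0.5, 0.5])
```

With equal accuracy gain and risk, the confidently predicted sample receives the higher reward, which is the behavior the entropy term is meant to encourage.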
[0082] The agent (i.e., the data selector in the soft actor-critic reinforcement learning framework) updates the "actor" and "critic" networks through policy gradients (see Figure 3) to maximize the cumulative discounted reward $J(\pi)$. The calculation formula is:
[0083] $J(\pi) = \mathbb{E}_{s \sim \mathcal{D},\, a \sim \pi}\big[ Q(s, a) - \alpha \log \pi(a \mid s) \big]$ (10)
[0084] where $Q(s, a)$ is the state-action value output by the "critic" network; $\alpha$ is the temperature coefficient, which controls the weight of the entropy regularization term; $\log \pi(a \mid s)$ is the log probability density of the policy; $s$ is the state; $\mathcal{D}$ is the empirical distribution of states; $a$ is the action taken in state $s$; and $\pi$ is the policy, i.e., the mapping from state $s$ to a probability distribution over actions. This objective encourages the agent to maintain policy diversity while pursuing high cumulative rewards, thereby enhancing its adaptability to dynamic remote sensing scenarios.
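The entropy-regularized objective described above can be estimated from samples as the batch mean of the critic value minus the temperature times the policy log density. The sketch below illustrates this with a Sigmoid-squashed Gaussian policy so that actions fall in (0, 1). That policy form is an assumption (the source only states that the action lies between 0 and 1), and the critic values in the example are made-up numbers.

```python
import math
import random
from typing import Sequence, Tuple


def squashed_log_prob(u: float, mean: float, std: float) -> Tuple[float, float]:
    """Map a Gaussian pre-activation u to an action in (0, 1) and its log density.

    With a = sigmoid(u), da/du = a * (1 - a), so
    log pi(a) = log N(u; mean, std) - log(a * (1 - a)).
    """
    a = 1.0 / (1.0 + math.exp(-u))
    log_normal = (-0.5 * ((u - mean) / std) ** 2
                  - math.log(std) - 0.5 * math.log(2.0 * math.pi))
    return a, log_normal - math.log(a * (1.0 - a))


def sample_action(mean: float, std: float, rng: random.Random) -> Tuple[float, float]:
    """Draw one action and its log density from the squashed Gaussian policy."""
    return squashed_log_prob(rng.gauss(mean, std), mean, std)


def actor_objective(q_values: Sequence[float], log_probs: Sequence[float],
                    alpha: float) -> float:
    """Monte Carlo estimate of E[Q(s, a) - alpha * log pi(a | s)]."""
    return sum(q - alpha * lp for q, lp in zip(q_values, log_probs)) / len(q_values)


rng = random.Random(0)
samples = [sample_action(0.0, 1.0, rng) for _ in range(4)]
q_values = [0.3, 0.1, 0.4, 0.2]  # hypothetical critic outputs
objective = actor_objective(q_values, [lp for _, lp in samples], alpha=0.2)
```

A larger temperature `alpha` gives more weight to the entropy term, pushing the policy toward more varied selection decisions.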
[0085] In step 4 of the multi-source remote sensing scene classification method based on dynamic sample selection of the present invention, the source domain sample subset selected by the data selector is used to jointly train the scene classification model with the target domain dataset. The training loss consists of two parts:
[0086] First, the supervised cross-entropy loss on the filtered source domain sample subset $\mathcal{S}'$, which ensures that the discriminative power of the source domain samples is preserved; $a_i$ denotes the selection probability of the $i$-th source domain sample $x_i^s$;
[0087] Second, the consistency regularization loss on the samples of the target domain dataset, which improves cross-domain generalization by enforcing prediction consistency under strong and weak data augmentation.
[0088] The loss function is therefore calculated as:
[0089] $L = \sum_{(x_i^s,\, y_i) \in \mathcal{S}'} a_i \, L_{CE}\big(p(x_i^s),\, y_i\big) + \beta \sum_{x^t \in \mathcal{D}_t} L_{con}(x^t)$ (11)
[0090] where $y_i$ denotes the true category label; $x^t$ denotes a target domain sample; $\mathcal{D}_t$ is the target domain dataset; $L_{CE}$ is the supervised cross-entropy loss; $p(\cdot)$ is the predicted probability distribution for a sample; $\beta$ is the weighting coefficient; and $L_{con}$ is the consistency regularization loss.
[0091] This strategy avoids the noise and conflicts that traditional multi-source methods introduce by directly merging heterogeneous data, and suppresses negative transfer at its source.
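The two-part training loss of step 4 can be sketched as follows. Several choices here are assumptions made for illustration only: the consistency term is taken as the mean squared difference between predictions on weakly and strongly augmented views, each source term is weighted by its selection probability, both parts are averaged over their batches, and the names `joint_loss` and `beta` are hypothetical.

```python
import math


def cross_entropy(probs, label):
    """Cross-entropy of one prediction against its true class index."""
    return -math.log(max(probs[label], 1e-12))


def consistency(p_weak, p_strong):
    """Mean squared difference between two predicted distributions."""
    return sum((a - b) ** 2 for a, b in zip(p_weak, p_strong)) / len(p_weak)


def joint_loss(source_batch, target_batch, beta=1.0):
    """Supervised loss on selected source samples plus weighted consistency loss.

    source_batch: list of (probs, label, selection_probability)
    target_batch: list of (probs_weak_view, probs_strong_view)
    """
    sup = sum(a * cross_entropy(p, y) for p, y, a in source_batch)
    sup /= max(len(source_batch), 1)
    con = sum(consistency(pw, ps) for pw, ps in target_batch)
    con /= max(len(target_batch), 1)
    return sup + beta * con


src = [([0.9, 0.1], 0, 1.0)]
tgt_same = [([0.8, 0.2], [0.8, 0.2])]  # views agree
tgt_diff = [([0.8, 0.2], [0.2, 0.8])]  # views disagree
loss_same = joint_loss(src, tgt_same)
loss_diff = joint_loss(src, tgt_diff)
```

When the two augmented views agree, the consistency term vanishes and only the supervised term remains; disagreement between views raises the loss in proportion to `beta`.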
[0092] In the multi-source remote sensing scene classification method based on dynamic sample selection of the present invention, step 1 extracts multi-source features with a residual network in the absence of target domain labels and dynamically constructs a target domain proxy validation set from high-confidence pseudo-labels. This proxy validation set generation method does not rely on real labels; it comprises pseudo-label threshold screening, feature space alignment, and a periodic update mechanism for the validation set, and the resulting set is used for subsequent transfer evaluation.
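The pseudo-label threshold screening of step 1 can be sketched in a few lines. The threshold of 0.9 and the function names are assumptions, feature space alignment and the periodic update are omitted, and the feature mean is included because claim 1 uses it as the target domain context.

```python
def build_proxy_validation_set(features, probs, threshold=0.9):
    """Keep target samples whose top predicted probability reaches the threshold.

    Returns a list of (feature_vector, pseudo_label) pairs.
    """
    proxy = []
    for feat, p in zip(features, probs):
        conf = max(p)
        if conf >= threshold:
            proxy.append((feat, p.index(conf)))
    return proxy


def feature_mean(proxy):
    """Per-dimension mean of the proxy set features (target domain context)."""
    if not proxy:
        raise ValueError("proxy validation set is empty; lower the threshold")
    dim = len(proxy[0][0])
    return [sum(f[d] for f, _ in proxy) / len(proxy) for d in range(dim)]


features = [[1.0, 0.0], [0.0, 1.0], [3.0, 2.0]]
probs = [[0.95, 0.05], [0.60, 0.40], [0.08, 0.92]]
proxy = build_proxy_validation_set(features, probs)
context = feature_mean(proxy)
```

In this toy input the second sample is dropped because its top probability of 0.60 is below the assumed threshold.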
[0093] In the multi-source remote sensing scene classification method based on dynamic sample selection of the present invention, step 2 designs a dual-branch structure that jointly models sample-level transferability and transfer risk and outputs a multi-dimensional state vector. In particular, the dual-branch architecture and training method of the path evaluation network quantify the distribution shift as a differentiable gradient-based risk and fuse it with the influence score to form the state input of the reinforcement learning framework in the data selector.
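A rough sketch of the two branches follows. It assumes a Gaussian (RBF) kernel with a hypothetical bandwidth `sigma` for the maximum mean discrepancy named in claim 2, and it uses cosine similarity to the target feature mean as a stand-in transferability score; the invention does not specify either choice, and the path evaluation network itself is a trained model rather than these closed-form functions.

```python
import math


def rbf(x, y, sigma):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def mmd_grad_norm(i, source, target, sigma=1.0):
    """Norm of the gradient of the biased MMD^2 estimate w.r.t. source[i]."""
    x, n, m = source[i], len(source), len(target)
    grad = [0.0] * len(x)
    for y in source:  # source-source term, weight 2 / n^2
        k = rbf(x, y, sigma)
        for d in range(len(x)):
            grad[d] -= (2.0 / (n * n)) * (x[d] - y[d]) / sigma ** 2 * k
    for t in target:  # source-target cross term, weight -2 / (n * m)
        k = rbf(x, t, sigma)
        for d in range(len(x)):
            grad[d] += (2.0 / (n * m)) * (x[d] - t[d]) / sigma ** 2 * k
    return math.sqrt(sum(g * g for g in grad))


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def joint_state(i, source, target, target_mean, sigma=1.0):
    """Concatenate raw features, transferability score, and risk value."""
    score = cosine(source[i], target_mean)
    risk = mmd_grad_norm(i, source, target, sigma)
    return list(source[i]) + [score, risk]


source = [[0.0, 1.0], [0.2, 1.0]]
target = [[1.0, 1.0], [1.2, 1.0]]
state = joint_state(0, source, target, target_mean=[1.1, 1.0])
```

The resulting vector has the feature dimension plus two entries. When the source and target sets coincide the gradient norm is zero, so the risk entry reflects only a mismatch between the two distributions.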
[0094] In the multi-source remote sensing scene classification method based on dynamic sample selection of the present invention, step 3 is based on the maximum entropy reinforcement learning framework, uses composite rewards to drive sample-level adaptive screening, and achieves a dynamic balance between exploration and exploitation. Its core lies in modeling the sample adoption decision as a continuous action control problem, using the flexible actor-critic reinforcement learning framework to optimize the policy network, and designing a composite reward function that integrates the performance gain on the target domain proxy validation set, the migration risk penalty, and pseudo-label confidence, so as to achieve adaptive sample filtering without domain labels.
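The continuous action can be turned into a keep-or-discard decision as sketched below. The 0.5 threshold follows claim 7; the assumption that the actor output lies in [0, 1], the function name, and the sample identifiers are illustrative only.

```python
def select_samples(samples, actions, threshold=0.5):
    """Keep samples whose actor output is strictly greater than the threshold."""
    if len(samples) != len(actions):
        raise ValueError("samples and actions must have the same length")
    for a in actions:
        if not 0.0 <= a <= 1.0:
            raise ValueError("each action is assumed to lie in [0, 1]")
    return [s for s, a in zip(samples, actions) if a > threshold]


kept = select_samples(["img_a", "img_b", "img_c"], [0.90, 0.50, 0.51])
```

A sample whose action equals exactly 0.5 is discarded here, since the claim wording requires the output to be greater than 0.5.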
[0095] In the multi-source remote sensing scene classification method based on dynamic sample selection of the present invention, step 4 combines supervised loss and consistency regularization, and uses only the high-quality source domain samples selected by the data selector in step 3 together with the target domain dataset for joint training. This is a main classifier training strategy that is resistant to negative transfer. Its training data is dynamically provided by the reinforcement learning agent and forms a closed feedback loop with the target domain proxy validation set and the path evaluation network to ensure the model's generalization ability on unknown target domains.
[0096] In the multi-source remote sensing scene classification method based on dynamic sample selection of the present invention, the preferred network structure for the feature extraction part is a residual network, Swin-Tiny, or a variant thereof, any of which can map multi-source remote sensing images into a unified feature space.
[0097] The multi-source remote sensing scene classification method based on dynamic sample selection of the present invention can be used not only in stand-alone software, but also in cloud software deployment, and supports the calling of software functions and modules on both local and web-based platforms.
[0098] Obviously, the above embodiments are merely illustrative examples for clear explanation and are not intended to limit the implementation. Those skilled in the art will recognize that other variations or modifications can be made based on the above description. It is neither necessary nor possible to exhaustively list all possible implementations here. However, obvious variations or modifications derived therefrom are still within the scope of protection of this invention.
Claims
1. A multi-source remote sensing scene classification method based on dynamic sample selection, characterized in that it comprises the following steps:
Step 1: use a residual network to uniformly extract the features of source domain and target domain samples from multi-source remote sensing images, use high-confidence pseudo-labels to construct a target domain proxy validation set for evaluation on the unlabeled target domain, and calculate the feature mean of the target domain proxy validation set as the target domain context;
Step 2: construct a path evaluation network to jointly quantify the migration value and migration risk of each source domain and target domain sample of the multi-source remote sensing images, forming a joint state vector;
Step 3: build a data selector based on the flexible actor-critic reinforcement learning framework, using the joint state vector output by the path evaluation network in Step 2 as input, and drive sample selection with long-term cumulative rewards to achieve sample-level adaptive screening;
Step 4: use the source domain sample subset selected by the data selector in Step 3 and the target domain dataset to jointly train the scene classification model, the training loss including a supervised loss and a consistency regularization loss;
Step 5: use the scene classification model trained in Step 4 to perform multi-source remote sensing scene classification, by inputting the multi-source remote sensing image to be tested and outputting the scene category of the image based on the classification score.
2. The multi-source remote sensing scene classification method based on dynamic sample selection according to claim 1, characterized in that, in step 2, the path evaluation network consists of two parallel sub-modules: a transferability branch and a migration risk branch; the transferability branch quantifies the potential contribution of a single source domain sample to the classification performance of the target domain based on the statistical relationship between the features of the source domain samples and the mean features of the target domain proxy validation set; the migration risk branch explicitly assesses the degree of distribution shift by calculating the gradient norm of the maximum mean discrepancy (MMD) between the feature distribution of the source domain samples and the feature distribution of the target domain; the transferability score and the migration risk value output by the transferability branch and the migration risk branch, respectively, are concatenated with the original source domain sample features to form the joint state vector.
3. The multi-source remote sensing scene classification method based on dynamic sample selection according to claim 1, characterized in that, in step 3, the data selector takes the joint state vector output by the path evaluation network as the state input, and outputs continuous actions through the policy network to dynamically determine whether to retain the current source domain sample.
4. The multi-source remote sensing scene classification method based on dynamic sample selection according to claim 1, characterized in that, in step 3, the composite reward function that drives sample selection with long-term cumulative rewards comprehensively considers three indicators: the accuracy gain on the target domain proxy validation set, the migration risk value, and the pseudo-label entropy; the accuracy gain is the change in accuracy on the target domain proxy validation set after fine-tuning the data selector model using source domain samples, and serves as a positive incentive; the migration risk value is provided by the path evaluation network and serves as a negative penalty term to actively suppress source domain samples with severe distribution shift; the pseudo-label entropy term is the negative of the pseudo-label prediction entropy, which encourages the selection of source domain samples that the data selector model predicts with high confidence; the accuracy gain, migration risk value, and pseudo-label entropy are weighted and summed using adjustable weights to form the final reward, expressed as: $R = \lambda_1 \, \Delta Acc - \lambda_2 \, r_{risk} - \lambda_3 \, H$, where $\Delta Acc$ is the increment in accuracy on the target domain proxy validation set after using source domain samples; $H$ is the pseudo-label entropy; $\lambda_1$, $\lambda_2$, $\lambda_3$ are the weighting coefficients; $r_{risk}$ is the migration risk value; and $R$ is the reward value.
5. The multi-source remote sensing scene classification method based on dynamic sample selection according to claim 1, characterized in that, in step 4, the supervised loss is the supervised cross-entropy loss on the filtered source domain sample subset $\mathcal{S}'$, which ensures that the discriminative power of the source domain samples is preserved, where $x_i^s$ denotes a source domain sample and $a_i$ denotes the probability of sample selection; the consistency regularization loss is the consistency regularization loss on the samples of the target domain dataset, which improves cross-domain generalization by enforcing prediction consistency under strong and weak data augmentation.
6. The multi-source remote sensing scene classification method based on dynamic sample selection according to claim 5, characterized in that, in step 4, the training loss function $L$ is calculated as: $L = \sum_{(x_i^s,\, y_i) \in \mathcal{S}'} a_i \, L_{CE}\big(p(x_i^s),\, y_i\big) + \beta \sum_{x^t \in \mathcal{D}_t} L_{con}(x^t)$, where $y_i$ denotes the true category label; $x^t$ denotes a target domain sample; $\mathcal{D}_t$ is the target domain dataset; $L_{CE}$ is the supervised cross-entropy loss; $p(\cdot)$ is the predicted probability distribution for a sample; $\beta$ is the weighting coefficient; and $L_{con}$ is the consistency regularization loss.
7. The multi-source remote sensing scene classification method based on dynamic sample selection according to claim 3, characterized in that, in step 3, the policy network outputs continuous actions to dynamically determine whether to retain the current source domain sample, specifically: the data selector employs an actor network based on the flexible actor-critic reinforcement learning framework, which receives the joint state vector as input and outputs a continuous value $a \in [0, 1]$ as the action; if the output $a$ is greater than 0.5, the source domain sample is included in the training set; otherwise, it is discarded; $a$ represents the probability of choosing the action.