A stopping method for spacecraft target access optimization tasks based on local search and exploration enhancement

By introducing a termination criterion based on local search and exploration enhancement into Bayesian optimization, and by using a GPR surrogate model together with the particle swarm optimization algorithm, the method addresses the excessive resource consumption and entrapment in local optima that Bayesian optimization exhibits in space target access tasks, yielding a more efficient optimization process.

CN120722745B · Active · Publication Date: 2026-06-26 · HARBIN INST OF TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
HARBIN INST OF TECH
Filing Date
2025-06-30
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing Bayesian optimization termination strategies suffer from excessive resource consumption or getting trapped in local optima in space target access tasks, lacking robustness and adaptability, which affects computational efficiency and optimization performance.

Method used

An optimization termination criterion based on local search and exploration enhancement is adopted. The prediction standard deviation of the GPR surrogate model and local evaluation are used to dynamically adjust the iteration process of Bayesian optimization. Particle swarm optimization is used to approximate the maximum uncertainty in the local region, and dense sampling of that region supports exploration enhancement and local evaluation.

Benefits of technology

It effectively avoids getting trapped in local optima, improves computational efficiency and optimization performance, reduces unnecessary computational overhead, and enhances global search capabilities and local convergence speed.

✦ Generated by Eureka AI based on patent content.


Abstract

A stopping method for spacecraft target access optimization tasks based on local search and exploration enhancement, belonging to the technical field of spacecraft control. To achieve optimization termination control that is both robust and adaptive, the present application designs the optimization termination criterion as follows: when the maximum prediction standard deviation of the Gaussian process regression (GPR) surrogate model in the current local region is lower than a preset threshold, the iterative optimization of Bayesian optimization (BO) is terminated. The designed termination criterion is embedded into BO. The prediction standard deviation of the GPR surrogate model is used to measure the model's epistemic uncertainty about the objective function at a given point, and judgment conditions are set to decide whether to enhance the exploration capability of the acquisition function and whether to trigger local evaluation. The present application thereby achieves robust and adaptive optimization termination control.

Description

Technical Field

[0001] This invention belongs to the field of spacecraft control technology, specifically relating to a stopping method for spacecraft target access optimization tasks based on local search and exploration enhancement.

Background Technology

[0002] In space target rendezvous mission planning, a spacecraft must achieve rapid rendezvous and transfer with multiple space targets under known origin and destination orbits. The total velocity increment during the transfer directly determines fuel consumption and therefore the overall cost and feasibility of the mission, so it must be optimized. With a multi-pulse transfer method, the introduction of multiple orbital change nodes means that the velocity increment between two points has no analytical form, and numerical optimization is required to obtain the optimal solution. Embedding such complex trajectory optimization into the rendezvous sequence planning process severely reduces computational efficiency. Therefore, when Bayesian optimization (BO) is used to optimize the total velocity increment, terminating the optimization iterations early and reliably plays a crucial role in improving the overall computational efficiency of this problem.

[0003] Bayesian optimization (BO) is a global optimization method designed for expensive, non-differentiable, or black-box functions without explicit expressions. BO uses a surrogate model (typically a Gaussian process regression (GPR) model) to approximate the objective function and uses an acquisition function to select, based on the surrogate model, the sample points with the greatest potential improvement, thereby approaching the global optimum within a finite number of function evaluations. This method has been widely applied to resource-sensitive tasks such as hyperparameter tuning, experimental design, reinforcement learning, and robot control.

[0004] In practical applications, Bayesian optimization (BO) typically relies on a preset maximum number of iterations or a time budget as its termination criterion. However, its optimization results are highly sensitive to the number of iterations: too many iterations lead to unnecessary resource overhead, while too few may leave the search trapped in local optima. Because of factors such as the complexity of the objective function, the selection strategy and guidance capability of the acquisition function, and the modeling accuracy and uncertainty representation capability of the surrogate model, it is difficult to preset a reasonable maximum number of iterations or time budget for BO. The termination of BO should therefore be dynamically adaptive rather than dependent on static settings.

[0005] Designing a reasonable and effective termination criterion is an underappreciated but highly practical problem in BO research. Current methods include the following: (1) An intuitive strategy: if the optimal value of the objective function does not improve significantly over several consecutive iterations, the optimization process is considered to have stabilized and can be terminated early. (2) and (3) Other studies have proposed termination criteria based on the acquisition function, using expected improvement (EI) and probability of improvement (PI) respectively; the optimization is terminated when the improvement value falls below a set threshold. (4) Some researchers have proposed a strategy based on the difference between the upper and lower confidence bounds of the GPR model, which measures whether high-potential regions remain in the space and so avoids premature termination. (5) Another method introduces a backtracking mechanism: when multiple consecutive sampling points are concentrated in a local convex region and the local regret value is lower than a preset threshold, the region is judged to have been fully explored and the optimization is terminated.

[0006] The existing technologies have the following shortcomings and limitations in practical applications: (1) The termination strategy is highly sensitive to the number of iterations. If the preset number of iterations is too high, it causes unnecessary resource consumption; if it is too low, the search may fall into a local optimum, degrading the optimization result. (2) The threshold setting lacks universality and adaptability. In early stopping strategies based on the change in the single-point optimal value or on the acquisition function value, the threshold is chosen by experience, does not adapt to the complexity of the objective function, and is prone to misjudging the convergence state. (3) Surrogate model error strongly interferes with the termination criterion. When the Gaussian process surrogate model fits poorly or oscillates, termination may be premature or delayed, affecting optimization performance. (4) The local convexity criterion is not applicable in high-dimensional space. Existing methods assist termination by judging how concentrated the local sampling distribution is and by the regret value, but in high-dimensional space it is difficult to determine whether a local region has been fully explored, and these methods cannot jump out of a local optimum, so overall optimization performance needs improvement. The applicability and stability of existing Bayesian optimization termination strategies in complex space target access tasks therefore remain limited. Optimization termination control methods that are both robust and adaptive are urgently needed to improve overall computational efficiency and convergence performance.

Summary of the Invention

[0007] The problem this invention aims to solve is to achieve optimization termination control that is both robust and adaptive. To that end, it proposes a termination method for spacecraft target access optimization tasks based on local search and exploration enhancement.

[0008] To achieve the above objectives, the present invention provides the following technical solution:

[0009] A termination method for spacecraft target access optimization tasks based on local search and exploration enhancement, in which the optimization termination criterion is designed as follows: when the maximum prediction standard deviation of the GPR surrogate model in the current local region is lower than a preset threshold, the iterative optimization of BO is terminated; the designed termination criterion is embedded into BO.

[0010] Furthermore, the prediction standard deviation of the GPR surrogate model is used to measure the model's epistemic uncertainty about the objective function at a given point; judgment conditions are set to determine whether to enhance the exploration capability of the acquisition function and whether to trigger local evaluation;

[0011] Condition 1: first, the current optimal solution must have shown no improvement over a specified number of consecutive iterations; second, the prediction standard deviation of the GPR surrogate model at the current optimal solution must satisfy the following constraint:

[0012] ;

[0013] where the constraint involves the preset threshold and the number of consecutive iterations in which the current optimal solution has not improved;

[0014] If Condition 1 is met, the exploration capability of the acquisition function is enhanced; if not, the termination check for this iteration ends.

[0015] Condition 2: given that Condition 1 has been met and the exploration capability of the acquisition function has been enhanced, determine whether to perform local evaluation;

[0016] Under Condition 2, a local evaluation is performed when the maximum uncertainty within the local region satisfies the following constraint:

[0017] ;

[0018] where max is the maximum-value function, taken over the decision vector within the local region of the domain; the maximum is approximated here using the particle swarm optimization algorithm;

[0019] If Condition 2 is met, a local evaluation of the neighborhood of the optimal solution is performed; otherwise, Condition 3 is checked.

[0020] Condition 3: if Condition 2 is not met, the GPR surrogate model is considered to have achieved sufficient confidence in the local region. In this case, if the predicted mean satisfies the following constraint, the subsequent BO iterations are terminated:

[0021] ;

[0022] where the constraint compares the maximum predicted mean in the local evaluation region with the maximum predicted mean over the entire domain, subject to a preset deviation.
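The three conditions form a decision cascade. The constraints themselves are not reproduced in this text, so the sketch below assumes simple threshold comparisons. The names `TerminationState`, `stall_limit`, `eps_sigma`, and `delta_mu`, and the direction of each inequality, are illustrative assumptions rather than the patented formulas.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"                  # Condition 1 not met: plain BO step
    LOCAL_EVALUATION = "local_evaluation"  # Condition 2 met: densely sample near the incumbent
    TERMINATE = "terminate"                # Condition 3 met: stop BO
    KEEP_EXPLORING = "keep_exploring"      # exploration enhanced, nothing else triggered


@dataclass
class TerminationState:
    stall_count: int        # consecutive iterations without improvement
    sigma_at_best: float    # GPR predictive std at the current optimal solution
    max_sigma_local: float  # max predictive std in the local region (e.g. found by PSO)
    max_mu_local: float     # max predictive mean in the local region
    max_mu_global: float    # max predictive mean over the whole domain


def termination_step(s, stall_limit, eps_sigma, delta_mu):
    """Return the action for this iteration. All inequalities are ASSUMED forms."""
    # Condition 1 (assumed): stalled long enough and the model is confident at the incumbent.
    if s.stall_count < stall_limit or s.sigma_at_best > eps_sigma:
        return Action.CONTINUE
    # From here on the caller is expected to have enhanced exploration.
    # Condition 2 (assumed): the local region is still uncertain somewhere.
    if s.max_sigma_local >= eps_sigma:
        return Action.LOCAL_EVALUATION
    # Condition 3 (assumed): the local optimum is within delta_mu of the global predicted optimum.
    if s.max_mu_global - s.max_mu_local <= delta_mu:
        return Action.TERMINATE
    return Action.KEEP_EXPLORING
```

Under these assumptions, termination is reached only after the model is confident both at the incumbent and throughout its neighborhood, which matches the criterion stated in [0009].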

[0023] Furthermore, embedding the designed termination criterion into BO includes the following steps:

[0024] S1. Specify the objective function, its domain, the initial sample size, the maximum number of evaluations, and the termination parameters;

[0025] S2. Collect the initial samples within the domain and evaluate them with the objective function to obtain the initial training dataset;

[0026] S3. Train the GPR surrogate model on the training dataset;

[0027] S4. Optimize the acquisition function based on the GPR surrogate model to obtain a candidate solution;

[0028] S5. Evaluate the objective function value at the candidate solution, update the training dataset, then return to step S2 for the next round of iterative optimization, repeating the above process until the termination criterion is met.
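The control flow of steps S1 to S5 can be sketched as a loop with the surrogate, the acquisition step, and the termination check passed in as callables. The interface (`fit`, `propose`, `should_stop`) is hypothetical. The demo stand-ins use random proposals and a simple stall rule, so they show only the loop structure, not a GPR or the criterion of this method. The sketch loops back to refitting the surrogate on the enlarged dataset, which is one interpretation of "the next round".

```python
import random


def bo_loop(f, bounds, n_init, n_max, fit, propose, should_stop, seed=0):
    """Skeleton of S1-S5 (maximization). `fit`, `propose`, `should_stop` are caller-supplied."""
    rng = random.Random(seed)
    # S2: initial samples inside the domain, evaluated with the objective.
    X = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_init)]
    y = [f(x) for x in X]
    while len(X) < n_max:
        model = fit(X, y)                    # S3: train the surrogate
        x_new = propose(model, bounds, rng)  # S4: optimize the acquisition function
        X.append(x_new)                      # S5: evaluate and update the dataset
        y.append(f(x_new))
        if should_stop(model, X, y):         # embedded termination criterion
            break
    i_best = max(range(len(y)), key=y.__getitem__)
    return X[i_best], y[i_best], len(X)


# Hypothetical stand-ins so the skeleton runs on its own (not a real surrogate).
def fit_stub(X, y):
    return None


def propose_random(model, bounds, rng):
    return [rng.uniform(lo, hi) for lo, hi in bounds]


def stop_after_stall(model, X, y, patience=10):
    # Stop when the last `patience` evaluations did not beat the earlier best.
    return len(y) > patience and max(y[-patience:]) <= max(y[:-patience])
```

A call such as `bo_loop(lambda x: -(x[0] - 0.3) ** 2, [(0.0, 1.0)], 5, 60, fit_stub, propose_random, stop_after_stall)` returns the best point, its value, and the number of evaluations used.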

[0029] Furthermore, the method for enhancing the exploration capability of the acquisition function includes the following steps:

[0030] If the current optimal value does not improve over a specified number of consecutive iterations, the exploration capability of the acquisition function is enhanced. With UCB as the acquisition function, and assuming that at most a fixed number of enhancements may be performed, each enhancement updates the exploration-exploitation balance coefficient to:

[0031] ;

[0032] where the expression uses the inverse of the cumulative distribution function of the standard normal distribution, the base confidence level, the maximum enhancement (the difference between the maximum confidence level and the base confidence level), and the number of exploration enhancements performed so far.
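The update formula is not reproduced in this text. As a minimal sketch, assume the confidence level rises linearly from the base level to the maximum level over the allowed enhancements and is then mapped through the inverse normal CDF. The function name, the linear schedule, and the default levels `0.90` and `0.99` are assumptions.

```python
from statistics import NormalDist


def enhanced_kappa(k, k_max, p_base=0.90, p_max=0.99):
    """Exploration coefficient after k of at most k_max enhancements (assumed linear schedule)."""
    if not 0 <= k <= k_max:
        raise ValueError("k must lie in [0, k_max]")
    # Confidence level grows from p_base to p_max as enhancements accumulate.
    p = p_base + (p_max - p_base) * k / k_max
    # Inverse CDF of the standard normal distribution turns the level into a UCB coefficient.
    return NormalDist().inv_cdf(p)
```

Under this assumed schedule the coefficient grows with every enhancement but stays bounded by its value at the maximum confidence level, consistent with the later statement that exploration enhancement has an upper limit.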

[0033] Furthermore, the method for local evaluation of the neighborhood of the optimal solution includes the following steps:

[0034] With the current optimal solution as the center, a hypercube of a given side length is constructed as the local evaluation space, and Latin hypercube sampling is used within it to obtain the sampling points;

[0035] The objective function is used to evaluate each sampling point, and the training sample set and the optimal data pair are updated;

[0036] The side length of the hypercube is:

[0037] ;

[0038] where the expression uses the upper and lower bounds of the domain;

[0039] The number of sampling points depends on the dimensionality of the decision variables and is rounded up.
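The local evaluation step can be sketched as a box centered on the incumbent followed by Latin hypercube sampling inside it. The side-length and sample-count formulas are not reproduced in this text, so the sketch takes a hypothetical fraction `frac` of the domain range per dimension and a caller-chosen `n`. Clipping the box to the domain is also an assumption.

```python
import random


def local_hypercube(x_best, lower, upper, frac):
    """Box centered on x_best with side frac * (upper - lower), clipped to the domain."""
    box = []
    for xb, lo, hi in zip(x_best, lower, upper):
        half = 0.5 * frac * (hi - lo)
        box.append((max(lo, xb - half), min(hi, xb + half)))
    return box


def latin_hypercube(box, n, rng):
    """n points in `box`; every dimension has exactly one point per 1/n stratum."""
    columns = []
    for lo, hi in box:
        # One uniform draw inside each stratum, then shuffle the strata across samples.
        strata = [(i + rng.random()) / n for i in range(n)]
        rng.shuffle(strata)
        columns.append([lo + u * (hi - lo) for u in strata])
    return [[col[i] for col in columns] for i in range(n)]
```

Each returned point would then be evaluated with the objective function and appended to the training set, as described in [0035].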

[0040] The beneficial effects of this invention are:

[0041] This invention presents a termination method for spacecraft target access optimization tasks based on local search and enhanced exploration. This method effectively avoids the risk of getting trapped in local optima during the optimization process and accelerates local convergence while maintaining global search capabilities. By dynamically enhancing the exploratory nature of the acquisition function when the optimal target value fails to improve for multiple consecutive rounds, and combining this with local evaluation of the neighborhood of the current optimal solution to determine the optimization trend, dynamic control of the optimization process is achieved. When the prediction uncertainty of the surrogate model in a local region falls below a set threshold, a termination mechanism is automatically triggered, thereby effectively reducing unnecessary computational overhead and improving overall optimization efficiency and performance.

Attached Figure Description

[0042] Figure 1 is a flowchart of the method of the present invention.

[0043] Figure 2 is a graph of the local optimum and local evaluation as determined by the GP;

[0044] Figure 3 is a graph of the local evaluation of the GP at the current optimal solution;

[0045] Figure 4 shows the termination status of the different algorithms, where (a) is the termination status with a low termination threshold; (b) is the termination status with a medium termination threshold; and (c) is the termination status with a high termination threshold.

Detailed Implementation

[0046] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are only for explaining the invention and are not intended to limit the invention; that is, the described specific embodiments are merely a part of the embodiments of the invention, and not all of them. The components of the specific embodiments of the invention described and shown in the accompanying drawings can generally be arranged and designed in various different configurations, and the invention may also have other embodiments.

[0047] Therefore, the following detailed description of specific embodiments of the invention provided in the accompanying drawings is not intended to limit the scope of the claimed invention, but merely to illustrate selected specific embodiments of the invention. All other specific embodiments obtained by those skilled in the art based on these specific embodiments without inventive effort are within the scope of protection of this invention.

[0048] To further explain the content, features, and effects of the invention, the following specific embodiments are described in detail with reference to Figures 1 to 4:

[0049] Example 1:

[0050] A termination method for spacecraft target access optimization tasks based on local search and exploration enhancement, in which the optimization termination criterion is designed as follows: when the maximum prediction standard deviation of the GPR surrogate model in the current local region is lower than a preset threshold, the iterative optimization of BO is terminated; the designed termination criterion is embedded into BO.

[0051] Furthermore, the prediction standard deviation of the GPR surrogate model is used to measure the model's epistemic uncertainty about the objective function at a given point; judgment conditions are set to determine whether to enhance the exploration capability of the acquisition function and whether to trigger local evaluation;

[0052] Condition 1: first, the current optimal solution must have shown no improvement over a specified number of consecutive iterations; second, the prediction standard deviation of the GPR surrogate model at the current optimal solution must satisfy the following constraint:

[0053] ;

[0054] where the constraint involves the preset threshold and the number of consecutive iterations in which the current optimal solution has not improved;

[0055] If Condition 1 is met, the exploration capability of the acquisition function is enhanced; if not, the termination check for this iteration ends.

[0056] Condition 2: given that Condition 1 has been met and the exploration capability of the acquisition function has been enhanced, determine whether to perform local evaluation;

[0057] Under Condition 2, a local evaluation is performed when the maximum uncertainty within the local region satisfies the following constraint:

[0058] ;

[0059] where max is the maximum-value function, taken over the decision vector within the local region of the domain; the maximum is approximated here using the particle swarm optimization algorithm;

[0060] If Condition 2 is met, a local evaluation of the neighborhood of the optimal solution is performed; otherwise, Condition 3 is checked.

[0061] Condition 3: if Condition 2 is not met, the GPR surrogate model is considered to have achieved sufficient confidence in the local region. In this case, if the predicted mean satisfies the following constraint, the subsequent BO iterations are terminated:

[0062] ;

[0063] where the constraint compares the maximum predicted mean in the local evaluation region with the maximum predicted mean over the entire domain, subject to a preset deviation.
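Condition 2 relies on particle swarm optimization to approximate the maximum predicted standard deviation over the local region. The following is a minimal sketch of a standard global-best PSO maximizer over a box. The function name and the inertia and acceleration values (`w`, `c1`, `c2`) are common textbook defaults chosen for illustration, not values taken from this patent. In use, `func` would be the GPR predictive standard deviation.

```python
import random


def pso_maximize(func, box, n_particles=20, n_iter=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO maximizing func over box = [(lo, hi), ...]."""
    rng = random.Random(seed)
    d = len(box)
    pos = [[rng.uniform(lo, hi) for lo, hi in box] for _ in range(n_particles)]
    vel = [[0.0] * d for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [func(p) for p in pos]
    g = max(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j, (lo, hi) in enumerate(box):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][j] = (w * vel[i][j]
                             + c1 * r1 * (pbest[i][j] - pos[i][j])
                             + c2 * r2 * (gbest[j] - pos[i][j]))
                # Keep the particle inside the local region.
                pos[i][j] = min(hi, max(lo, pos[i][j] + vel[i][j]))
            val = func(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

Because each PSO evaluation queries only the surrogate model and not the expensive objective, this inner search adds little cost to a BO iteration.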

[0064] Furthermore, embedding the designed termination criterion into BO includes the following steps:

[0065] S1. Specify the objective function, its domain, the initial sample size, the maximum number of evaluations, and the termination parameters;

[0066] Furthermore, given the objective function $f$, if the optimization objective is to maximize its evaluation value, the problem can be formally expressed as:

[0067] $$\max_{x \in \mathcal{X}} f(x);$$

[0068] where the decision vector $x$ is a 3D real vector, $\mathcal{X}$ is its domain, and $f$ is the objective function defined on $\mathcal{X}$;

[0069] S2. Collect the initial samples within the domain and evaluate them with the objective function to obtain the initial training dataset;

[0070] Furthermore, the observations in the training dataset $(X, y)$ are assumed to carry independent and identically distributed Gaussian random noise; the posterior distribution of GPR then also follows a Gaussian distribution. Therefore, given a new set of observation points $X_*$, the predicted mean and predicted variance of the objective function are, respectively:

[0071] $$\mu(X_*) = K(X_*, X)\left[K(X, X) + \sigma_n^2 I\right]^{-1} y, \qquad \sigma^2(X_*) = K(X_*, X_*) - K(X_*, X)\left[K(X, X) + \sigma_n^2 I\right]^{-1} K(X, X_*);$$

[0072] where $K(X_*, X)$ is the covariance between $X_*$ and the training point set, $K(X_*, X_*)$ is the covariance between the observation points, $K(X, X)$ is the covariance between the training points, and $\sigma_n^2$ is the estimated noise variance.
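The predicted mean and variance of a zero-mean Gaussian process can be computed with a Cholesky factorization. The sketch below is a dependency-free illustration for a single query point. The RBF kernel, its `length` scale, and the default `noise_var` are assumptions, since this text does not name the kernel.

```python
import math


def rbf(a, b, length=1.0):
    """Squared-exponential kernel (assumed; the source does not specify one)."""
    return math.exp(-0.5 * sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / length ** 2)


def cholesky(A):
    """Lower-triangular L with L L^T = A for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L x = b."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def solve_upper_t(L, b):
    """Back substitution: solve L^T x = b."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior(X, y, x_star, noise_var=1e-6, length=1.0):
    """Predictive mean and standard deviation of a zero-mean GP at x_star."""
    K = [[rbf(a, b, length) + (noise_var if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, y))  # (K + noise_var * I)^-1 y
    k_star = [rbf(x_star, a, length) for a in X]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)                   # so that v.v = k*^T (K + noise I)^-1 k*
    var = rbf(x_star, x_star, length) - sum(vi * vi for vi in v)
    return mean, math.sqrt(max(var, 0.0))
```

The returned standard deviation is the quantity that the termination conditions threshold: it is near zero at well-sampled points and approaches the prior value far from the data.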

[0073] S3. Train the GPR surrogate model on the training dataset;

[0074] Furthermore, a GPR surrogate model is built from the training dataset to approximate the objective function.

[0075] S4. Optimize the acquisition function based on the GPR surrogate model to obtain a candidate solution;

[0076] The acquisition function weighs exploration against exploitation and quantifies the potential benefit of candidate query points. BO determines the next evaluation point by maximizing the acquisition function $\alpha(x)$ over the domain $\mathcal{X}$:

[0077] $$x_{\text{next}} = \arg\max_{x \in \mathcal{X}} \alpha(x);$$

[0078] Common acquisition functions include the upper confidence bound (UCB), expected improvement (EI), probability of improvement (PI), and entropy search (ES), where UCB is expressed as:

[0079] $$\alpha_{\mathrm{UCB}}(x) = \mu(x) + \kappa\,\sigma(x);$$

[0080] where $\mu(x)$ and $\sigma(x)$ are the predicted mean and standard deviation of the GPR surrogate model, and $\kappa$ is a hyperparameter that balances exploration and exploitation.
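For illustration, UCB selection over a finite candidate set can be written in a few lines. The `predict` callable, which returns a (mean, standard deviation) pair, is a hypothetical stand-in for the GPR surrogate. A real implementation would usually optimize the acquisition function with a continuous optimizer rather than enumerate candidates.

```python
def ucb(mu, sigma, kappa):
    """Upper confidence bound: predicted mean plus kappa times predicted std."""
    return mu + kappa * sigma


def next_point(candidates, predict, kappa):
    """Return the candidate with the largest UCB value.

    `predict(x)` must return (mean, std); it stands in for the surrogate model.
    """
    return max(candidates, key=lambda x: ucb(*predict(x), kappa))
```

With `kappa = 0` the choice is purely exploitative (highest predicted mean). Larger values shift the choice toward points where the surrogate is uncertain, which is the lever that exploration enhancement adjusts.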

[0081] S5. Evaluate the objective function value at the candidate solution, update the training dataset, then return to step S2 for the next round of iterative optimization, repeating the above process until the termination criterion is met.

[0082] Furthermore, to update the training dataset, BO determines the next evaluation point, evaluates the objective function at that point to obtain the observed value, and adds the new data pair to the training dataset;

[0083] Furthermore, during BO iterations, if the current optimal value shows no improvement over multiple iterations, two situations are possible: (1) the algorithm is trapped in a local optimum of the objective function and has difficulty escaping; (2) the algorithm is close to, or has reached, the global optimum. In the first case, to keep the algorithm from repeatedly sampling and evaluating at the same point, the exploration capability of the acquisition function can be enhanced to help the algorithm escape the local region. Note that a local optimum here refers not only to a local optimum of the objective function itself but also to a false local optimum produced by the GPR surrogate model through inaccurate modeling. As shown in Figure 2, the global optimum estimated by the GPR model differs significantly from both the local optimum and the global optimum of the true objective function. This usually stems from too few evaluated sample points in the neighborhood, so the surrogate model fails to learn the local features of the objective function and fits poorly in that region. For this reason, as shown in Figure 3, a local dense sampling strategy is introduced: additional evaluations in the candidate optimal region improve the fitting accuracy of the surrogate model and thereby accelerate convergence. The enhancement of the acquisition function's exploration capability has an upper limit, and local dense sampling should not continue indefinitely; a theoretically grounded optimization termination criterion is therefore given.
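Detecting that the current optimal value has not improved over consecutive iterations needs only a small counter. The class name and the optional improvement tolerance `tol` in this sketch are illustrative assumptions.

```python
class StallTracker:
    """Counts consecutive evaluations that fail to improve the best value (maximization)."""

    def __init__(self, tol=0.0):
        self.best = float("-inf")
        self.stall = 0
        self.tol = tol  # minimum gain that counts as an improvement (assumed parameter)

    def update(self, y):
        """Record a new objective value and return the current stall count."""
        if y > self.best + self.tol:
            self.best = y
            self.stall = 0
        else:
            self.stall += 1
        return self.stall
```

The returned count is what would be compared against the stall limit before exploration enhancement or local evaluation is considered.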

[0084] Furthermore, the method for enhancing the exploration capability of the acquisition function includes the following steps:

[0085] If the current optimal value does not improve over a specified number of consecutive iterations, the exploration capability of the acquisition function is enhanced. With UCB as the acquisition function, and assuming that at most a fixed number of enhancements may be performed, each enhancement updates the exploration-exploitation balance coefficient to:

[0086] ;

[0087] where the expression uses the inverse of the cumulative distribution function of the standard normal distribution, the base confidence level, the maximum enhancement (the difference between the maximum confidence level and the base confidence level), and the number of exploration enhancements performed so far.

[0088] Furthermore, the method for local evaluation of the neighborhood of the optimal solution includes the following steps:

[0089] With the current optimal solution as the center, a hypercube of a given side length is constructed as the local evaluation space, and Latin hypercube sampling is used within it to obtain the sampling points;

[0090] The objective function is used to evaluate each sampling point, and the training sample set and the optimal data pair are updated;

[0091] The side length of the hypercube is: ;

[0092] where the expression uses the upper and lower bounds of the domain;

[0093] The number of sampling points depends on the dimensionality of the decision variables and is rounded up.

[0094] The following example illustrates this implementation:

[0095] First, a benchmark test set is used to illustrate the effectiveness of this invention.

[0096] The classic test functions Ackley, Levy, Schwefel, and Rastrigin, which are widely used in the optimization field, are selected and tested at several problem dimensions, with a fixed maximum number of evaluations and a fixed number of initial evaluations. The method is also tested on three hyperparameter optimization problems widely used in machine learning to verify BO performance: a support vector machine (SVM) with 2 hyperparameters, a multi-layer perceptron (MLP) with 5, and XGBoost with 8. Other settings are the same as for the test functions.

[0097] The effectiveness of the termination method is evaluated using the following two indicators:

[0098] (1) Relative computational cost:

[0099] ;

[0100] where the expression uses the number of BO evaluations performed when the termination method stops. The smaller the value, the less evaluation budget the termination method requires.

[0101] (2) Relative performance loss:

[0102] ;

[0103] where the expression uses the best and worst solutions found when BO completes the full evaluation budget, and the best solution found when stopping with the termination method. The smaller the value, the less optimization performance the termination method loses.
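The two indicator formulas are not reproduced in this text. One plausible reading of the definitions is a ratio of evaluations used to the maximum budget, and a loss normalized by the best-to-worst range of the full run. Both forms, the function names, and the maximization convention are assumptions.

```python
def relative_cost(n_stop, n_max):
    """Fraction of the evaluation budget consumed at termination (assumed form)."""
    return n_stop / n_max


def relative_loss(best_at_stop, best_full, worst_full):
    """Performance given up by stopping early, scaled by the full run's range (assumed form)."""
    span = best_full - worst_full
    if span == 0:
        return 0.0  # degenerate run: every evaluation returned the same value
    return (best_full - best_at_stop) / span
```

For example, stopping after 40 of 100 evaluations gives a cost of 0.4. Under this reading the loss becomes negative if the early-stopped run finds a better solution than the full-budget baseline.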

[0104] Three termination thresholds (low, medium, and high) are set for each method. Table 1 shows the cost and performance loss of the different termination methods at BO termination, measured by the two indicators above. As the table shows, Method 1 has the lowest cost, followed by the method in this embodiment, but the performance loss of the method in this embodiment is approximately 5% lower than that of Method 1. Although the median of Method 3's index is small, its cost is the most unstable and its performance loss is the greatest. Furthermore, the performance loss of the method in this embodiment is lower than that of all other methods, and owing to exploration enhancement and local evaluation, it can achieve better optimization results than the original BO.

[0105] Table 1. Significance assessment of the different methods

[0106]

[0107] Note: The significance level is set at 0.05. One symbol indicates that the method in this embodiment is significantly superior to the other method; the other symbol indicates that the other method is significantly superior to the method in this embodiment.

[0108] Second, the method of this embodiment is applied to the planning of space target access missions:

[0109] In this task, the number of applied pulses is set, and the decision variables include the time at which each pulse is applied and the applied pulse vector. The last two pulses can be obtained by solving a two-point boundary value problem based on the predetermined ... and ... The optimization objective is the total velocity increment:

[0110] ;

[0111] To accurately model this type of orbital transfer process, a spacecraft rendezvous and transfer simulation model capable of applying multiple velocity pulses was used to evaluate the total velocity increment and trajectory feasibility under different transfer strategies. The relevant parameters are shown in Table 2.
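The objective formula is not reproduced in this text. By the usual convention, the total velocity increment of an impulsive transfer is the sum of the magnitudes of the individual velocity impulses. The sketch below computes that quantity; the function name and the representation of each impulse as a tuple of components are assumptions.

```python
import math


def total_delta_v(impulses):
    """Sum of Euclidean norms of the impulse vectors (conventional total delta-v)."""
    return sum(math.sqrt(sum(c * c for c in dv)) for dv in impulses)
```

For instance, `total_delta_v([(3.0, 4.0, 0.0), (0.0, 0.0, 2.0)])` gives 7.0. Because the BO formulation earlier in the text maximizes its objective, one option is to pass the negative of this value to the optimizer.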

[0112] Table 2 Parameter Table for Spacecraft Rendezvous and Transfer Model

[0113]

[0114] With the number of applied pulses fixed, Figure 4 shows that Method 1 terminated earliest, followed by Method 4. Although the method in this embodiment terminated later, its optimization performance at termination was the best. The remaining methods used up the entire evaluation budget, and their optimization performance was inferior to that of the method in this embodiment, which is consistent with the analysis in the first part. This again verifies the advantage of the method in this embodiment in maintaining optimization performance while effectively reducing optimization cost.

[0115] It should be noted that relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0116] Although this application has been described above with reference to specific embodiments, various modifications can be made and components can be replaced with equivalents without departing from the scope of this application. In particular, as long as there is no structural conflict, the features in the specific embodiments disclosed in this application can be combined with each other in any way. The lack of an exhaustive description of these combinations in this specification is merely for the sake of brevity and resource conservation. Therefore, this application is not limited to the specific embodiments disclosed herein, but includes all technical solutions falling within the scope of the claims.

Claims

1. A stopping method for a spacecraft target access optimization mission based on local search and exploration enhancement, characterized in that: the optimization termination criterion is designed such that the iterative optimization of BO is terminated when the maximum prediction standard deviation of the GPR surrogate model in the current local region is lower than a preset threshold; the designed optimization termination criterion is embedded into BO; the prediction standard deviation of the GPR surrogate model is used to measure the cognitive uncertainty of the GPR surrogate model about the objective function at a given point; judgment conditions are set to determine whether to enhance the exploration capability of the acquisition function and whether to trigger local evaluation;
Condition 1: first, the current optimal solution is required to show no improvement over a specified number of consecutive iterations; second, the prediction standard deviation of the GPR surrogate model at the current optimal solution is required to satisfy the following constraint: [formula not reproduced]; where the constraint is stated in terms of the preset threshold and the number of consecutive iterations for the current optimal solution. If Condition 1 is met, the exploration capability of the acquisition function is enhanced; otherwise, the termination judgment for this iteration ends.
Condition 2: on the premise that Condition 1 is satisfied and the exploration capability of the acquisition function has been enhanced, determine whether to perform local evaluation. Condition 2 is that a local evaluation is performed when the maximum uncertainty within a local region satisfies the following constraint: [formula not reproduced]; where max is the maximum-value function, the decision vector ranges over its domain, and the maximum value within that domain is approximated using the particle swarm optimization algorithm. If Condition 2 is met, a local evaluation of the neighborhood of the optimal solution is performed; otherwise, Condition 3 is checked.
Condition 3: if Condition 2 is not met, the GPR surrogate model is considered to have achieved sufficient confidence in the local region. In this case, if the predicted mean satisfies the following constraint, the subsequent BO iterations are terminated: [formula not reproduced]; where the constraint relates the maximum predicted mean in the local evaluation region to the maximum predicted mean over the entire domain, subject to a preset deviation.
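The control flow of Conditions 1 to 3 in claim 1 can be sketched as below. This is a minimal illustration, not the claimed implementation: because the constraint formulas are not reproduced in this text, each constraint is passed in as an already-evaluated boolean, and the names `Decision`, `termination_check`, `stalled`, `sigma_at_best_ok`, `local_sigma_triggers`, and `mean_gap_ok` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    enhance_exploration: bool  # strengthen the acquisition function's exploration
    local_evaluation: bool     # sample the neighborhood of the current optimum
    terminate: bool            # stop all subsequent BO iterations


def termination_check(stalled: bool, sigma_at_best_ok: bool,
                      local_sigma_triggers: bool, mean_gap_ok: bool) -> Decision:
    """Decision flow of Conditions 1-3 (inputs are hypothetical booleans).

    stalled              -- no improvement over the required consecutive iterations
    sigma_at_best_ok     -- std-dev constraint at the current optimum holds
    local_sigma_triggers -- Condition 2 constraint on the local maximum std-dev holds
    mean_gap_ok          -- Condition 3 constraint on the predicted means holds
    """
    # Condition 1: both parts must hold, otherwise this iteration's check ends.
    if not (stalled and sigma_at_best_ok):
        return Decision(False, False, False)
    # Condition 1 holds, so exploration is enhanced in every branch below.
    if local_sigma_triggers:
        # Condition 2 met: evaluate the neighborhood of the optimum.
        return Decision(True, True, False)
    # Condition 2 not met: the surrogate is confident locally, so test Condition 3.
    return Decision(True, False, mean_gap_ok)
```

Keeping the constraints outside the function means the same flow can be reused once the actual inequalities and thresholds are supplied.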

2. The stopping method for a spacecraft target access optimization mission based on local search and exploration enhancement as described in claim 1, characterized in that the specific implementation of embedding the designed optimization termination criterion into BO includes the following steps: S1. Given the objective function, its domain, the initial sample size, the maximum number of evaluations, and the termination parameters; S2. Collect the initial samples within the domain and evaluate them using the objective function to obtain the initial training dataset; S3. Train the GPR surrogate model based on the training dataset; S4. Optimize the acquisition function based on the GPR surrogate model to obtain a candidate solution; S5. Evaluate the objective function value at the candidate solution, update the training dataset, then return to step S2 for the next round of iterative optimization, and repeat the above process until the termination criterion is met.
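Steps S1 to S5 of claim 2 can be sketched as a loop skeleton. This is an illustration under stated assumptions rather than the claimed implementation: the surrogate training, acquisition optimization, and termination test are supplied as hypothetical callables (`fit_surrogate`, `propose`, `should_stop`), initial samples are drawn uniformly at random, and the loop is assumed to resume at surrogate training with the updated dataset.

```python
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Dataset = List[Tuple[Point, float]]


def bo_loop(objective: Callable[[Point], float],
            bounds: Sequence[Tuple[float, float]],
            n_init: int,
            max_evals: int,
            fit_surrogate: Callable[[Dataset], object],
            propose: Callable[[object, Sequence[Tuple[float, float]], random.Random], Point],
            should_stop: Callable[[object, Dataset], bool],
            seed: int = 0) -> Dataset:
    """Skeleton of S1-S5; S1 corresponds to the arguments of this function."""
    rng = random.Random(seed)
    # S2: collect initial samples in the domain and evaluate them
    # (uniform random sampling is an assumption of this sketch).
    data: Dataset = []
    for _ in range(n_init):
        x = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        data.append((x, objective(x)))
    while len(data) < max_evals:
        model = fit_surrogate(data)             # S3: train the surrogate
        x_new = propose(model, bounds, rng)     # S4: optimize the acquisition function
        data.append((x_new, objective(x_new)))  # S5: evaluate and update the dataset
        if should_stop(model, data):            # embedded termination criterion
            break
    return data
```

In practice `fit_surrogate` would wrap a GPR library and `should_stop` would apply Conditions 1 to 3 of claim 1; both are left abstract here.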

3. The stopping method for a spacecraft target access optimization mission based on local search and exploration enhancement as described in claim 2, characterized in that the method of enhancing the exploration capability of the acquisition function includes the following steps: if the current optimal value is not improved over the specified number of consecutive iterations, the exploration capability of the acquisition function is enhanced; UCB is used as the acquisition function, with a maximum number of enhancements assumed; each time the exploration capability is enhanced, the exploration-exploitation balance coefficient is updated to: [formula not reproduced]; where the formula is stated in terms of the inverse of the cumulative distribution function of the standard normal distribution, the base confidence level, the maximum enhancement magnitude (the difference between the maximum confidence level and the base confidence level), and the number of exploration enhancements performed so far, subject to a bound [not reproduced].
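One way the coefficient update in claim 3 could look is sketched below. The update formula itself is not reproduced in this text, so the linear schedule used here (raising the confidence level from the base level toward the maximum level in equal steps, then mapping it through the inverse standard normal CDF) is purely an assumption, as are the names `ucb_beta`, `k`, `k_max`, `base_conf`, and `max_conf` and the default values.

```python
from statistics import NormalDist


def ucb_beta(k: int, k_max: int,
             base_conf: float = 0.90, max_conf: float = 0.99) -> float:
    """Exploration-exploitation coefficient after k enhancements.

    ASSUMPTION: the confidence level grows linearly with k; the source's
    actual formula is not available in this text.
    """
    if k_max <= 0 or not 0 <= k <= k_max:
        raise ValueError("require k_max > 0 and 0 <= k <= k_max")
    delta = max_conf - base_conf               # maximum enhancement magnitude
    confidence = base_conf + delta * k / k_max
    # Inverse CDF of the standard normal distribution.
    return NormalDist().inv_cdf(confidence)


def ucb(mean: float, std: float, beta: float) -> float:
    """Upper confidence bound for a maximization problem."""
    return mean + beta * std
```

With this schedule, each enhancement increases the weight on the predictive standard deviation, so points with high uncertainty score higher under UCB.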

4. The stopping method for a spacecraft target access optimization mission based on local search and exploration enhancement as described in claim 3, characterized in that the method for local evaluation of the neighborhood of the optimal solution includes the following steps: taking the current optimal solution as the center, construct a hypercube of a given side length as the local evaluation space, and obtain sampling points within it by Latin hypercube sampling; evaluate each sampling point using the objective function, and update the training sample set and the optimal data pair [definition not reproduced]; the side length of the hypercube is: [formula not reproduced], stated in terms of the upper and lower bounds of the domain [further parameters not reproduced]; the number of sampling points is: [formula not reproduced], stated in terms of the dimensionality of the decision variables, with the result rounded up.
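The local sampling step of claim 4 can be illustrated with a small standard-library Latin hypercube sampler. This is a sketch under assumptions: the side length and the number of points are taken as plain arguments because their formulas are not reproduced in this text, the hypercube is clipped to the domain bounds (an assumption), and the name `local_lhs` is hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def local_lhs(center: Sequence[float], side: float,
              bounds: Sequence[Tuple[float, float]],
              n: int, seed: int = 0) -> List[Tuple[float, ...]]:
    """Latin hypercube sample of n points in a hypercube around `center`."""
    rng = random.Random(seed)
    # Local evaluation space: hypercube of the given side length centered on
    # the current optimum, clipped to the domain (assumption of this sketch).
    box = [(max(lo, c - side / 2), min(hi, c + side / 2))
           for c, (lo, hi) in zip(center, bounds)]
    columns = []
    for lo, hi in box:
        strata = list(range(n))
        rng.shuffle(strata)        # independent permutation per dimension
        width = (hi - lo) / n
        # One uniformly placed point inside each of the n strata.
        columns.append([lo + (s + rng.random()) * width for s in strata])
    # Combine the per-dimension coordinates into n points.
    return [tuple(col[i] for col in columns) for i in range(n)]
```

Each returned point would then be evaluated with the objective function and appended to the training set before the surrogate is retrained.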