A multi-objective vehicle path optimization method, system, electronic device and medium

By employing deep reinforcement learning models and evolutionary transfer optimization methods, the problems of low efficiency and insufficient accuracy in large-scale multi-objective vehicle routing problems are solved, achieving efficient optimization of five-objective vehicle routing problems with time windows.

CN116384602B · Active · Publication Date: 2026-06-23 · HUAQIAO UNIVERSITY

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: HUAQIAO UNIVERSITY
Filing Date: 2022-12-26
Publication Date: 2026-06-23

AI Technical Summary

Technical Problem

Existing multi-objective vehicle routing optimization algorithms are inefficient and scale poorly to large problems, and heuristic algorithms have low accuracy, so neither can effectively solve five-objective vehicle routing problems with time windows.

Method used

By employing a deep reinforcement learning model combined with evolutionary transfer learning, a multi-objective optimization method is constructed through the decomposition and training of sub-problem models of the main task and auxiliary tasks. The offline training and online decision-making capabilities of the deep reinforcement learning model are utilized to optimize vehicle path planning.

Benefits of technology

It efficiently solves multi-objective vehicle routing problems with time windows, improves planning accuracy and efficiency, and optimizes five objectives simultaneously.


Figure CN116384602B_ABST

Abstract

The application discloses a multi-objective vehicle path optimization method, system, electronic device, and medium, relating to the field of vehicle scheduling and intelligent optimization. The method includes: inputting the acquired customer point sequence and vehicle data of the current time into each main task sub-problem model and each auxiliary task sub-problem model to obtain the planning scheme of the current main task and the planning scheme of the current auxiliary task; determining the current planning schemes as initial populations, and iteratively optimizing the initial population of the main task according to the initial population of the auxiliary task by means of evolutionary migration to obtain the optimal planning scheme. The sub-problem models are obtained by training deep reinforcement learning models based on decomposition models; the main task decomposition model is obtained by decomposing the main task model, and the auxiliary task decomposition model is obtained by decomposing the auxiliary task model; the main task model is constructed based on five objectives; the auxiliary task model is constructed based on two of the five objectives. The application can improve the planning efficiency and accuracy of vehicle paths.

Description

Technical Field

[0001] This invention relates to the field of vehicle scheduling and intelligent optimization, and in particular to a multi-objective vehicle route optimization method, system, electronic device, and medium.

Background Technology

[0002] The vehicle routing problem, proposed by Dantzig and Ramser in 1959, is a classic combinatorial optimization problem and is NP-hard. Due to its wide applicability across multiple scenarios such as logistics, vehicle scheduling, and traffic planning, and its significant economic value, it has always been a research hotspot. In recent years, with the rapid development of the social economy and the internet, different types of vehicle routing problems have emerged based on the basic problem and combined with different problem environments and settings. Among these, the vehicle routing problem with multiple objectives and time windows (MOVRPTW), which is more realistic and involves five mutually constraining optimization objectives, has received considerable attention.

[0003] Current optimization algorithms for multi-objective vehicle routing problems are mainly divided into exact algorithms and heuristic algorithms. Exact algorithms, including Dijkstra's algorithm, branch and bound, and dynamic programming, can obtain the globally optimal solution. However, as the problem size increases, the computational cost of exact algorithms grows exponentially, making them difficult to scale to large problems, so they struggle with five-objective vehicle routing problems with time windows. Heuristic algorithms, including genetic algorithms, tabu search, and simulated annealing, generally operate iteratively and can arrive at a solution within a bounded time. However, the number of iterations also increases significantly with the problem size. Furthermore, heuristic algorithms rely on intuition or expert experience, so the solutions they obtain have relatively low accuracy, and when a new problem is encountered the search has to start from scratch. For five-objective vehicle routing problems with time windows, heuristic algorithms are therefore not the best choice either.

[0004] With the rapid development of artificial intelligence technology and cloud computing platforms, deep learning technology has broken through the barriers of traditional methods in many fields and achieved groundbreaking results. Deep reinforcement learning is mainly used for sequential decision-making. Based on Markov decision processes, the agent acts as the main body, making corresponding action choices based on the current environmental state. The environment provides feedback based on the actions, and the agent continuously adjusts its strategy according to the quality of the feedback to achieve the set goal. The process of the agent's actions interacting with the environment is very similar to the selection of decision variables in combinatorial optimization within the decision space. Moreover, deep reinforcement learning can be "trained offline and decided online," making it possible to solve combinatorial optimization problems in real time. However, in existing deep reinforcement learning algorithms for solving vehicle routing problems, most address typical vehicle routing optimization problems with only one or two optimization objectives and relatively few constraints. Furthermore, the distance between customer points is calculated based on Euclidean distance, resulting in symmetric distance and time matrices. In contrast, for real-world problems, the distance and time between customer points are asymmetric.

[0005] The methods described above are all based on solving a single problem. However, in the real world, there are similarities between vehicle routing problems of the same type with different scales or conditions. Therefore, how to achieve multi-objective vehicle routing optimization has become an urgent problem to be solved.

Summary of the Invention

[0006] Based on this, embodiments of the present invention provide a multi-objective vehicle routing optimization method, system, electronic device, and medium to solve a vehicle routing problem with five objectives and a time window, that is, to achieve multi-objective vehicle routing optimization and improve the planning efficiency and accuracy of vehicle routes.

[0007] To achieve the above objectives, the present invention provides the following solution:

[0008] A multi-objective vehicle routing optimization method includes:

[0009] Obtain the current customer point sequence and vehicle data; the customer point sequence includes: the location of the customer point, the demand information of the customer point, and the time window of the customer point; the vehicle data includes: the location of the vehicle; each customer point is served by one vehicle within the corresponding time window;

[0010] The system invokes a first predetermined number of main task sub-problem models and a second predetermined number of auxiliary task sub-problem models. The main task sub-problem models are obtained by training a deep reinforcement learning model based on a main task decomposition model. The auxiliary task sub-problem models are obtained by training a deep reinforcement learning model based on an auxiliary task decomposition model. The main task decomposition model is obtained by decomposing the main task model by a third predetermined number of decompositions. The auxiliary task decomposition model is obtained by decomposing the auxiliary task model by a fourth predetermined number of decompositions. The first predetermined number is less than the third predetermined number. The second predetermined number is less than the fourth predetermined number. The main task model is constructed based on five objectives. The auxiliary task model is constructed based on two of the five objectives. The five objectives include: the number of vehicles assigned, the sum of the travel distances of all vehicles after completing services at all customer points, the maximum travel distance of all vehicles after completing services at assigned customer points, the sum of the waiting times of all vehicles in each group of paths in the planning scheme, and the sum of the delay times of all vehicles in each group of paths in the planning scheme.

[0011] The current customer point sequence and the current vehicle data are input into each of the main task sub-problem models to obtain the planning scheme for the current main task. The current customer point sequence and the current vehicle data are input into each of the auxiliary task sub-problem models to obtain the planning scheme for the current auxiliary task. The planning scheme includes: multiple planning paths; one planning path corresponds to one vehicle; each planning path includes the order in which the vehicle serves each customer point.

[0012] The planning scheme of the current main task is determined as the initial population of the main task, and the planning scheme of the current auxiliary task is determined as the initial population of the auxiliary task; each individual in the initial population corresponds to a planning path.

[0013] An evolutionary migration approach is used to iteratively optimize the initial population of the main task based on the initial population of the auxiliary task, thereby obtaining the optimal planning scheme for the customer point sequence at the current moment; the optimal planning scheme is the planning scheme of the main task at the iteration in which the set termination condition is met.
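The inputs and outputs named in the steps above can be pictured with a small data-structure sketch. The class and field names below (`CustomerPoint`, `Vehicle`, `PlanningScheme`, `serves_each_customer_once`) are illustrative assumptions, not identifiers from the patent; only the listed fields (location, demand, time window, vehicle location) and the "one path per vehicle, ordered customer visits" structure come from the text.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CustomerPoint:
    location: Tuple[float, float]     # position of the customer point
    demand: float                     # demand information
    time_window: Tuple[float, float]  # (earliest, latest) service time


@dataclass(frozen=True)
class Vehicle:
    location: Tuple[float, float]     # position of the vehicle


# One planning path: the ordered customer indices served by one vehicle.
PlanningPath = List[int]
# One planning scheme: one planning path per vehicle used.
PlanningScheme = List[PlanningPath]


def serves_each_customer_once(scheme: PlanningScheme, n_customers: int) -> bool:
    """Check that every customer index appears in exactly one path."""
    visited = sorted(c for path in scheme for c in path)
    return visited == list(range(n_customers))


customers = [
    CustomerPoint((2.0, 3.0), 4.0, (0.0, 50.0)),
    CustomerPoint((5.0, 1.0), 2.0, (10.0, 40.0)),
    CustomerPoint((1.0, 7.0), 6.0, (20.0, 90.0)),
]
vehicles = [Vehicle((0.0, 0.0)), Vehicle((0.0, 0.0))]
scheme: PlanningScheme = [[0, 2], [1]]  # vehicle 0 serves 0 then 2; vehicle 1 serves 1
```

Representing a path as a list of customer indices keeps the "order in which the vehicle serves each customer point" explicit and makes the one-vehicle-per-customer constraint easy to check.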

[0014] Optionally, the step of using evolutionary migration to iteratively optimize the initial population of the main task based on the initial population of the auxiliary task to obtain the optimal planning scheme for the customer point sequence at the current moment specifically includes:

[0015] The initial population for the main task is evaluated and ranked under a problem environment with five objectives, and the N individuals with the lowest ranking are selected.

[0016] The initial population for the auxiliary task is evaluated and ranked under the problem environment of two objectives, and the top N individuals are selected.

[0017] A temporary population of size 2N is formed by the N individuals ranked last in the initial population of the main task and the N individuals ranked first in the initial population of the auxiliary task.

[0018] Each individual in the temporary population is evaluated and ranked in a problem environment with five objectives. The top N individuals in the temporary population are used to replace the bottom N individuals in the initial population of the main task. The initial population of the main task after the replacement is the parent population.

[0019] Iterate through each individual in the parent population, randomly select a genetic operator to generate an offspring population, and combine the offspring population with the parent population to obtain a new population;

[0020] Elite selection is performed on the new population to obtain the selected new population;

[0021] A local search operation is performed on the selected new population to obtain the optimal planning scheme for the current customer point sequence.
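The knowledge-transfer part of these steps (forming the temporary population of size 2N and replacing the worst main-task individuals) might be sketched as follows. The text does not state how individuals are ranked under several objectives, so the ranking rule here (fewer dominating individuals first, ties broken by the sum of objective values) is an assumption, as are the function names.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Objectives = Tuple[float, ...]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def rank_population(pop: Sequence[T], evaluate: Callable[[T], Objectives]) -> List[T]:
    """Best-first ordering. Assumed rule: fewer dominators first, then smaller objective sum."""
    scores = [evaluate(ind) for ind in pop]

    def key(i: int) -> Tuple[int, float]:
        dominated_by = sum(dominates(scores[j], scores[i]) for j in range(len(pop)))
        return (dominated_by, sum(scores[i]))

    return [pop[i] for i in sorted(range(len(pop)), key=key)]


def transfer(main_pop: Sequence[T], aux_pop: Sequence[T], n: int,
             eval_main: Callable[[T], Objectives],
             eval_aux: Callable[[T], Objectives]) -> List[T]:
    """Build the parent population from the main and auxiliary initial populations."""
    if not 0 < n <= min(len(main_pop), len(aux_pop)):
        raise ValueError("n must be between 1 and the smaller population size")
    main_ranked = rank_population(main_pop, eval_main)   # five-objective environment
    aux_ranked = rank_population(aux_pop, eval_aux)      # two-objective environment
    kept, worst_main = main_ranked[:-n], main_ranked[-n:]
    temporary = worst_main + aux_ranked[:n]              # temporary population of size 2N
    best_temporary = rank_population(temporary, eval_main)[:n]
    return kept + best_temporary                         # parent population


# Toy individuals: each is already its own five-objective vector (f1..f5).
main_pop = [(1, 1, 1, 1, 1), (2, 2, 2, 2, 2), (9, 9, 9, 9, 9)]
aux_pop = [(3, 3, 3, 3, 3), (8, 0, 0, 8, 8)]
parents = transfer(main_pop, aux_pop, 1,
                   eval_main=lambda ind: ind,
                   eval_aux=lambda ind: (ind[1], ind[2]))
```

Because auxiliary individuals are re-evaluated under all five objectives before they replace anything, an individual that is good only on f2 and f3 enters the main population only if it also beats the main task's worst individuals.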

[0022] Optionally, the method for determining the first predetermined number of main task sub-problem models and the second predetermined number of auxiliary task sub-problem models is as follows:

[0023] Construct a main task model and an auxiliary task model;

[0024] The main task model is decomposed into a third predetermined number of main task decomposition models using a weighted summation method, and the auxiliary task model is decomposed into a fourth predetermined number of auxiliary task decomposition models; each main task decomposition model corresponds to a different weight vector; each auxiliary task decomposition model corresponds to a different weight vector.

[0025] A corresponding deep reinforcement learning model is constructed for each of the main task decomposition models, and a corresponding deep reinforcement learning model is constructed for each of the auxiliary task decomposition models.

[0026] Based on the main task decomposition model and the corresponding weight vector, determine the loss function of the deep reinforcement learning model corresponding to the main task decomposition model; based on the auxiliary task decomposition model and the corresponding weight vector, determine the loss function of the deep reinforcement learning model corresponding to the auxiliary task decomposition model.

[0027] Acquire training data; the training data includes: historical customer point sequences, historical vehicle data, and historical planning schemes;

[0028] The training data is input into each of the deep reinforcement learning models respectively, and the gradient descent method is used to train the models with the goal of minimizing the corresponding loss function, so as to obtain the trained deep reinforcement learning models.

[0029] The trained deep reinforcement learning model corresponding to the main task decomposition model is determined as the main task sub-problem model, thus obtaining a first predetermined number of main task sub-problem models.

[0030] The trained deep reinforcement learning model corresponding to the auxiliary task decomposition model is determined as the auxiliary task sub-problem model, thus obtaining a second predetermined number of auxiliary task sub-problem models.
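The weighted-summation decomposition above needs one distinct weight vector per decomposition model, but the text does not say how those vectors are produced. One common choice, assumed here purely for illustration, is a simplex lattice: all vectors whose components are multiples of 1/H and sum to 1. The function name and the parameter `divisions` (H) are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def simplex_lattice_weights(n_objectives: int, divisions: int) -> List[Tuple[float, ...]]:
    """All weight vectors with components in {0, 1/H, ..., 1} that sum to 1."""
    weights = []
    # Stars and bars: place (n_objectives - 1) bars among the available slots;
    # the gaps between bars give the integer numerators of each weight.
    slots = divisions + n_objectives - 1
    for bars in combinations(range(slots), n_objectives - 1):
        previous = -1
        parts = []
        for bar in bars + (slots,):
            parts.append(bar - previous - 1)
            previous = bar
        weights.append(tuple(p / divisions for p in parts))
    return weights


aux_weights = simplex_lattice_weights(n_objectives=2, divisions=4)   # auxiliary task
main_weights = simplex_lattice_weights(n_objectives=5, divisions=4)  # main task
```

Under this assumption the number of decomposition models per task is fixed by H: with H = 4 there are 5 weight vectors for the two-objective auxiliary task and 70 for the five-objective main task.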

[0031] Optionally, the main task model includes: a main task objective function;

[0032] The objective function of the main task is:

[0033] minF1={f1,f2,f3,f4,f5};

[0034] F1 represents the objective function of the main task; f1 represents the number of vehicles assigned; f2 represents the sum of the distances traveled by all vehicles after completing services to all customer points; f3 represents the maximum distance traveled by all vehicles after completing services to their assigned customer points; f4 represents the sum of the waiting times of all vehicles in each group of paths in the planning scheme; f5 represents the sum of the delay times of all vehicles in each group of paths in the planning scheme.

[0035] where:

[0036] $f_1 = M$

[0037] $f_2 = \sum_{j=1}^{M} Dis_j$

[0038] $f_3 = \max_{1 \le j \le M} T_j$

[0039] $f_4 = \sum_{j=1}^{M} W_j$

[0040] $f_5 = \sum_{j=1}^{M} D_j$

[0041] M represents the number of planned paths in the planning scheme; $Dis_j$ represents the distance traveled by the vehicle on the j-th planned path to serve all customer points on that path; $T_j$ represents the travel time for the vehicle on the j-th planned path to complete services to all customer points on that path; $W_j$ represents the waiting time incurred by the vehicle on the j-th planned path when it completes service to all customer points; $D_j$ represents the delay time incurred by the vehicle on the j-th planned path.
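As an illustration of how the five objective values could be computed from per-path quantities, here is a minimal sketch. The formula images are missing from the source text, so the forms used below are assumptions inferred from the symbol definitions: sums over paths for f2, f4 and f5, and the largest per-path value of T_j for f3. `RouteStats` and `main_task_objectives` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RouteStats:
    distance: float      # Dis_j: distance traveled on path j
    travel_time: float   # T_j: travel time on path j
    waiting_time: float  # W_j: waiting time on path j
    delay_time: float    # D_j: delay time on path j


def main_task_objectives(routes: List[RouteStats]) -> Tuple[float, float, float, float, float]:
    """Return (f1, f2, f3, f4, f5) for one planning scheme; all are minimized."""
    if not routes:
        raise ValueError("a planning scheme needs at least one path")
    f1 = float(len(routes))                   # number of vehicles (paths), M
    f2 = sum(r.distance for r in routes)      # total distance
    f3 = max(r.travel_time for r in routes)   # assumed: largest per-path T_j
    f4 = sum(r.waiting_time for r in routes)  # total waiting time
    f5 = sum(r.delay_time for r in routes)    # total delay time
    return (f1, f2, f3, f4, f5)


routes = [RouteStats(10.0, 12.0, 1.0, 0.0), RouteStats(7.5, 9.0, 0.0, 2.5)]
values = main_task_objectives(routes)
```

The auxiliary task uses only the second and third entries of this tuple.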

[0042] Optionally, the auxiliary task model includes: an auxiliary task objective function;

[0043] The objective function for the auxiliary task is:

[0044] minF2={f2,f3};

[0045] F2 represents the objective function of the auxiliary task; f2 represents the sum of the distances traveled by all vehicles after completing services to all customer points; f3 represents the maximum distance traveled by all vehicles after completing the services to their assigned customer points.

[0046] $f_2 = \sum_{j=1}^{M} Dis_j$

[0047] $f_3 = \max_{1 \le j \le M} T_j$

[0048] M represents the number of planned paths in the planning scheme; $Dis_j$ represents the distance traveled by the vehicle on the j-th planned path to serve all customer points on that path; $T_j$ represents the travel time for the vehicle on the j-th planned path to complete services to all customer points on that path.

[0049] Optionally, the deep reinforcement learning model includes: an encoder and a decoder connected in sequence;

[0050] The encoder includes: a linear layer, a first multi-head attention layer, a first residual network layer, a first feedforward neural network layer, and a second residual network layer connected in sequence;

[0051] The decoder includes: a second multi-head attention layer, a third residual network layer, a third multi-head attention layer, a fourth residual network layer, a second feedforward neural network layer, and a fifth residual network layer connected in sequence.
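To make the encoder's layer order concrete (linear layer, multi-head attention, residual, feedforward, residual), here is a dependency-free forward-pass sketch. It is only a structural illustration under several assumptions: each "residual network layer" is read as a plain skip connection, normalization is omitted, the weights are random and untrained, and `d_model`, `n_heads`, and all function names are hypothetical. A real implementation would use a deep learning framework.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def attention_head(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """Scaled dot-product self-attention for one head."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(q[0]))
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    return matmul([softmax(r) for r in scores], v)


def encoder_block(features: Matrix, d_model: int = 8, n_heads: int = 2, seed: int = 0) -> Matrix:
    rng = random.Random(seed)
    d_head = d_model // n_heads
    # Linear layer: embed raw customer features into d_model dimensions.
    x = matmul(features, rand_matrix(len(features[0]), d_model, rng))
    # First multi-head attention layer: run the heads, concatenate, project.
    heads = [attention_head(x,
                            rand_matrix(d_model, d_head, rng),
                            rand_matrix(d_model, d_head, rng),
                            rand_matrix(d_model, d_head, rng))
             for _ in range(n_heads)]
    concat = [sum((h[i] for h in heads), []) for i in range(len(x))]
    x = add(x, matmul(concat, rand_matrix(d_model, d_model, rng)))  # first residual
    # First feedforward layer (ReLU hidden layer), then second residual.
    hidden = [[max(0.0, v) for v in row]
              for row in matmul(x, rand_matrix(d_model, 4 * d_model, rng))]
    return add(x, matmul(hidden, rand_matrix(4 * d_model, d_model, rng)))


# Each row: x, y, demand, time-window start, time-window end (scaled to [0, 1]).
features = [[0.2, 0.3, 0.4, 0.0, 0.5], [0.5, 0.1, 0.2, 0.1, 0.4], [0.1, 0.7, 0.6, 0.2, 0.9]]
embeddings = encoder_block(features)
```

The output has one embedding per customer point, which is what a decoder would attend over when choosing the next customer to serve.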

[0052] Optionally, the loss function is:

[0053] $g^{ws}(\pi \mid \lambda_u) = \sum_{k=1}^{o} \lambda_{uk} f_k(\pi)$

[0054] $g^{ws}(\pi \mid \lambda_u)$ represents, for a customer point sequence π, the loss function of the deep reinforcement learning model corresponding to the u-th decomposition model, determined based on that decomposition model and its weight vector $\lambda_u$; the decomposition model is either the main task decomposition model or the auxiliary task decomposition model; $\lambda_{uk}$ represents the weight of the k-th objective function in the u-th decomposition model; o represents the number of objective functions; when the decomposition model is the main task decomposition model, k = 1, 2, 3, 4, 5; when the decomposition model is the auxiliary task decomposition model, k = 2, 3; $f_k(\pi)$ represents the value of the k-th objective function computed for the customer point sequence π.
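The weighted-sum scalarization described here reduces a vector of objective values to one number per decomposition model. A minimal sketch, with objectives keyed by their index k so that the main task (k = 1..5) and the auxiliary task (k = 2, 3) share one function; the function name and the example numbers are illustrative assumptions.

```python
from typing import Mapping


def weighted_sum_cost(objective_values: Mapping[int, float],
                      weights: Mapping[int, float]) -> float:
    """Sum of lambda_uk * f_k(pi) over the objectives k present in the weight vector."""
    return sum(w * objective_values[k] for k, w in weights.items())


# Objective values f_k(pi) of one planning scheme, keyed by k.
f = {1: 2.0, 2: 17.5, 3: 12.0, 4: 1.0, 5: 2.5}

main_weights = {1: 0.2, 2: 0.2, 3: 0.2, 4: 0.2, 5: 0.2}  # one main-task weight vector
aux_weights = {2: 0.5, 3: 0.5}                           # one auxiliary-task weight vector

main_cost = weighted_sum_cost(f, main_weights)
aux_cost = weighted_sum_cost(f, aux_weights)
```

Each decomposition model has its own weight vector, so the same planning scheme generally receives a different scalar cost under each sub-problem.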

[0055] The present invention also provides a multi-objective vehicle routing optimization system, comprising:

[0056] The data acquisition module is used to acquire the current customer point sequence and vehicle data; the customer point sequence includes: the location of the customer point, the demand information of the customer point, and the time window of the customer point; the vehicle data includes: the location of the vehicle; each customer point is served by one vehicle within the corresponding time window;

[0057] The model invocation module is used to invoke a first predetermined number of main task sub-problem models and a second predetermined number of auxiliary task sub-problem models. The main task sub-problem models are obtained by training a deep reinforcement learning model based on a main task decomposition model. The auxiliary task sub-problem models are obtained by training a deep reinforcement learning model based on an auxiliary task decomposition model. The main task decomposition model is obtained by decomposing the main task model by a third predetermined number of decompositions. The auxiliary task decomposition model is obtained by decomposing the auxiliary task model by a fourth predetermined number of decompositions. The first predetermined number is less than the third predetermined number. The second predetermined number is less than the fourth predetermined number. The main task model is constructed based on five objectives. The auxiliary task model is constructed based on two of the five objectives. The five objectives include: the number of assigned vehicles, the sum of the travel distances of all vehicles after completing services at all customer points, the maximum travel distance of all vehicles after completing services at assigned customer points, the sum of the waiting times of all vehicles in each group of paths in the planning scheme, and the sum of the delay times of all vehicles in each group of paths in the planning scheme.

[0058] The initial plan planning module is used to input the current customer point sequence and the current vehicle data into each of the main task sub-problem models to obtain the planning scheme for the current main task, and to input the current customer point sequence and the current vehicle data into each of the auxiliary task sub-problem models to obtain the planning scheme for the current auxiliary task. The planning scheme includes: multiple planning paths; one planning path corresponds to one vehicle; each planning path includes the order in which the vehicle serves each customer point.

[0059] The initial population determination module is used to determine the planning scheme of the current main task as the initial population of the main task and the planning scheme of the current auxiliary task as the initial population of the auxiliary task; each individual in the initial population corresponds to a planning path.

[0060] The optimization scheme determination module is used to iteratively optimize the initial population of the main task based on the initial population of the auxiliary task using an evolutionary migration method to obtain the optimal planning scheme for the customer point sequence at the current time; the optimal planning scheme is the planning scheme of the main task at the iteration in which the set termination condition is met.

[0061] The present invention also provides an electronic device, including a memory and a processor, wherein the memory is used to store a computer program, and the processor runs the computer program to enable the electronic device to perform the above-described multi-objective vehicle path optimization method.

[0062] The present invention also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above-described multi-objective vehicle path optimization method.

[0063] According to specific embodiments provided by the present invention, the present invention discloses the following technical effects:

[0064] This invention proposes a multi-objective vehicle routing optimization method, system, electronic device, and medium. The method involves inputting the current customer point sequence and vehicle data into the main task sub-problem models and the auxiliary task sub-problem models to obtain the current planning schemes for the main task and the auxiliary tasks. The current planning scheme is used as the initial population, and an evolutionary transfer approach is employed to iteratively optimize the initial population of the main task based on the initial population of the auxiliary tasks, yielding the optimal planning scheme. The sub-problem models are obtained by training a deep reinforcement learning model based on a decomposition model. The main task decomposition model is obtained by decomposing the main task model, and the auxiliary task decomposition model is obtained by decomposing the auxiliary task model. The main task model is constructed based on five objectives, and the auxiliary task model is constructed based on two of the five objectives. This invention combines deep learning, multi-objective optimization, and multi-task optimization to provide an efficient and feasible solution to the multi-objective vehicle routing optimization problem with time windows. This invention achieves the solution of the five-objective vehicle routing problem with time windows, thus realizing multi-objective vehicle routing optimization and improving the efficiency and accuracy of vehicle routing planning.

Attached Figure Description

[0065] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0066] Figure 1 is a flowchart of a multi-objective vehicle path optimization method provided in an embodiment of the present invention;

[0067] Figure 2 is a schematic diagram of the structure of a deep reinforcement learning model provided in an embodiment of the present invention;

[0068] Figure 3 is a flowchart of the method for calling and optimizing a model when solving a set of customer point sequences, as provided in an embodiment of the present invention;

[0069] Figure 4 is a flowchart of population evolution and migration in a multi-task optimization environment based on auxiliary tasks, provided in an embodiment of the present invention;

[0070] Figure 5 is a structural diagram of the multi-objective vehicle path optimization system provided in an embodiment of the present invention.

Detailed Implementation

[0071] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0072] To address the shortcomings and problems of existing technologies for solving multi-objective vehicle routing problems, this invention proposes a multi-objective vehicle routing optimization method. This method is an optimization approach based on auxiliary task construction to solve multi-objective vehicle routing problems, addressing the issues of existing algorithms being unsuitable for large-scale problems, inefficient, and lacking scalability. This invention not only solves a five-objective vehicle routing problem with time windows but also leverages knowledge transfer through auxiliary tasks to accelerate optimization and improve the accuracy and diversity of solutions. The main concept of this invention is as follows: based on the mathematical modeling of the five-objective vehicle routing problem with time windows, the optimization objectives are determined as the main task; an auxiliary task is constructed for the main task, and a knowledge transfer scheme is determined to promote the convergence of the main task solution; deep reinforcement learning models are trained for both the main task and the auxiliary task, and multi-task optimization based on the auxiliary task is performed.

[0073] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments.

[0074] Example 1

[0075] Referring to Figure 1, the multi-objective vehicle path optimization method in this embodiment includes:

[0076] Step 101: Obtain the current customer point sequence and vehicle data; the customer point sequence includes: the location of the customer point, the demand information of the customer point, and the time window of the customer point; the vehicle data includes: the location of the vehicle; each customer point is served by one vehicle within the corresponding time window.

[0077] Step 102: Invoke the first predetermined number of main task sub-problem models and the second predetermined number of auxiliary task sub-problem models.

[0078] The main task sub-problem model is obtained by training a deep reinforcement learning model based on the main task decomposition model; the auxiliary task sub-problem model is obtained by training a deep reinforcement learning model based on the auxiliary task decomposition model; the main task decomposition model is obtained by decomposing the main task model by a third predetermined number of decompositions; the auxiliary task decomposition model is obtained by decomposing the auxiliary task model by a fourth predetermined number of decompositions; the first predetermined number is less than the third predetermined number; the second predetermined number is less than the fourth predetermined number; the main task model is constructed based on five objectives; the auxiliary task model is constructed based on two of the five objectives; the five objectives include: the number of vehicles assigned, the sum of the driving distances of all vehicles after completing all customer point services, the maximum value of the driving distances of all vehicles after completing the assigned customer point services, the sum of the waiting times of all vehicles in each group of paths in the planning scheme, and the sum of the delay times of all vehicles in each group of paths in the planning scheme.

[0079] The method for determining the first predetermined number of main task sub-problem models and the second predetermined number of auxiliary task sub-problem models is as follows:

[0080] 1) Construct the main task model and the auxiliary task model.

[0081] 2) The main task model is decomposed into a third predetermined number of main task decomposition models and the auxiliary task model is decomposed into a fourth predetermined number of auxiliary task decomposition models using a weighted summation method; each main task decomposition model corresponds to a different weight vector; each auxiliary task decomposition model corresponds to a different weight vector.

[0082] 3) Construct a corresponding deep reinforcement learning model for each of the main task decomposition models, and construct a corresponding deep reinforcement learning model for each of the auxiliary task decomposition models.

[0083] 4) Based on the main task decomposition model and the corresponding weight vector, determine the loss function of the deep reinforcement learning model corresponding to the main task decomposition model, and based on the auxiliary task decomposition model and the corresponding weight vector, determine the loss function of the deep reinforcement learning model corresponding to the auxiliary task decomposition model.

[0084] 5) Obtain training data; the training data includes: customer point sequences at historical times, vehicle data at historical times, and planning schemes at historical times.

[0085] 6) Input the training data into each of the deep reinforcement learning models respectively, and use gradient descent to train them with the goal of minimizing the corresponding loss function, so as to obtain the trained deep reinforcement learning models.

[0086] 7) The trained deep reinforcement learning model corresponding to the main task decomposition model is determined as the main task sub-problem model, thus obtaining a first predetermined number of main task sub-problem models.

[0087] 8) The trained deep reinforcement learning model corresponding to the auxiliary task decomposition model is determined as the auxiliary task sub-problem model, thus obtaining a second predetermined number of auxiliary task sub-problem models.
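The training sub-steps above minimize each model's loss by gradient descent. The toy sketch below shows only that idea, under strong simplifying assumptions: the encoder-decoder model is replaced by a softmax policy over a fixed set of candidate planning schemes, each with a known scalar cost, and the expected cost is minimized with its exact gradient. The names, learning rate, and cost values are hypothetical; this is not the patent's training procedure.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_policy(costs: List[float], lr: float = 0.5, steps: int = 200) -> List[float]:
    """Gradient descent on J(theta) = sum_k p_k(theta) * cost_k, with p = softmax(theta)."""
    theta = [0.0] * len(costs)
    for _ in range(steps):
        probs = softmax(theta)
        expected = sum(p * c for p, c in zip(probs, costs))
        # Exact gradient of the expected cost: dJ/dtheta_i = p_i * (cost_i - J).
        grad = [p * (c - expected) for p, c in zip(probs, costs)]
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return softmax(theta)


# Scalar costs of three candidate schemes under one sub-problem's weight vector.
candidate_costs = [7.0, 14.75, 9.0]
probabilities = train_policy(candidate_costs)
```

After training, the policy concentrates probability on the cheapest candidate. In the patent's setting the same principle would apply per sub-problem, with the cost given by that sub-problem's weighted objectives.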

[0088] Step 103: Input the current customer point sequence and the current vehicle data into each of the main task sub-problem models to obtain the planning scheme for the current main task. Input the current customer point sequence and the current vehicle data into each of the auxiliary task sub-problem models to obtain the planning scheme for the current auxiliary task.

[0089] The planning scheme includes: multiple planning routes; one planning route corresponds to one vehicle; each planning route includes the order in which the vehicle serves each customer point.

[0090] Step 104: Determine the planning scheme of the current primary task as the initial population of the primary task, and determine the planning scheme of the current secondary task as the initial population of the secondary task; each individual in the initial population corresponds to a planning path.

[0091] Step 105: Using an evolutionary migration approach, iteratively optimize the initial population of the main task based on the initial population of the auxiliary task to obtain the optimal planning scheme for the customer point sequence at the current moment. The optimal planning scheme is the planning scheme of the main task at the iteration where the set termination condition is satisfied. The iteration stops once the set termination condition is met; the set termination condition is reaching the user-defined running time.

[0092] Step 105 specifically includes:

[0093] 1) Evaluate and rank the initial population of the main task under the problem environment of five objectives, and select the N individuals with the lowest ranking.

[0094] 2) Evaluate and rank the initial population of the auxiliary task under the problem environment of two objectives, and select the top N individuals.

[0095] 3) A temporary population of size 2N is formed by the N individuals ranked last in the initial population of the main task and the N individuals ranked first in the initial population of the auxiliary task.

[0096] 4) Evaluate and rank each individual in the temporary population under the five-objective problem environment, and replace the N individuals ranked last in the initial population of the main task with the N individuals ranked first in the temporary population. The initial population of the main task after the replacement is the parent population.

[0097] 5) Traverse each individual in the parent population, randomly select a genetic operator to generate an offspring population, and combine the offspring population with the parent population to obtain a new population.

[0098] 6) Perform elite selection on the new population to obtain the selected new population.

[0099] 7) Perform a local search operation on the selected new population to obtain the optimal planning scheme for the customer point sequence at the current time.
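
The replacement described in sub-steps 1) to 4) of Step 105 can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function name `transfer_parents` and the scalar ranking keys are hypothetical, and ranking by a single number (lower is better) is an assumption made here, since the text does not state how individuals are ranked under five or two objectives.

```python
from typing import Callable, List, Sequence, TypeVar

Plan = TypeVar("Plan")


def transfer_parents(
    main_pop: Sequence[Plan],
    aux_pop: Sequence[Plan],
    n: int,
    main_rank_key: Callable[[Plan], float],
    aux_rank_key: Callable[[Plan], float],
) -> List[Plan]:
    """Return the parent population after one knowledge-transfer step.

    Assumption of this sketch: a lower key value means a better rank.
    """
    if not 0 < n <= min(len(main_pop), len(aux_pop)):
        raise ValueError("n must be between 1 and the smaller population size")

    # Rank both populations in their own task environments, best first.
    main_sorted = sorted(main_pop, key=main_rank_key)
    aux_sorted = sorted(aux_pop, key=aux_rank_key)

    worst_main = main_sorted[-n:]   # N individuals ranked last in the main task
    best_aux = aux_sorted[:n]       # N individuals ranked first in the auxiliary task
    temporary = worst_main + best_aux  # temporary population of size 2N

    # Re-rank the temporary population under the main task and keep its best N,
    # which replace the N worst individuals of the main-task population.
    survivors = sorted(temporary, key=main_rank_key)[:n]
    return main_sorted[:-n] + survivors
```

With plain numbers standing in for individuals, `transfer_parents([5, 1, 9, 7], [2, 8, 3], 2, float, float)` keeps 1 and 5 from the main task and brings in 2 and 3 from the auxiliary task.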

[0100] The main task model in step 102 includes: a main task objective function; the main task objective function is:

[0101] minF1={f1,f2,f3,f4,f5};

[0102] F1 represents the objective function of the main task; f1 represents the number of vehicles assigned; f2 represents the sum of the distances traveled by all vehicles after completing services to all customer points; f3 represents the maximum distance traveled by all vehicles after completing services to their assigned customer points; f4 represents the sum of the waiting times of all vehicles in each group of paths in the planning scheme; f5 represents the sum of the delay times of all vehicles in each group of paths in the planning scheme.

[0103] where,

[0104] $f_1 = M$

[0105] $f_2 = \sum_{j=1}^{M} Dis_j$

[0106] $f_3 = \max_{1 \le j \le M} T_j$

[0107] $f_4 = \sum_{j=1}^{M} W_j$

[0108] $f_5 = \sum_{j=1}^{M} D_j$

[0109] M represents the number of planned paths in the planning scheme; $Dis_j$ represents the distance traveled by a vehicle along the j-th planned route to serve all customer points along that route; $T_j$ represents the travel time for a vehicle to complete services to all customer points along the j-th planned route; $W_j$ represents the waiting time incurred by a vehicle on the j-th planned route when it completes service to all customer points; $D_j$ represents the delay time generated by vehicles in the j-th planned path.
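
As an illustration of how the five objective values follow from per-route quantities, the sketch below aggregates them for one planning scheme. The names `RouteStats` and `five_objectives` are hypothetical, and f3 is taken as the maximum route distance, as stated in the definition of f3 above.

```python
from typing import NamedTuple, Sequence, Tuple


class RouteStats(NamedTuple):
    distance: float  # distance traveled on this planned route
    wait: float      # waiting time incurred on this planned route
    delay: float     # delay time incurred on this planned route


def five_objectives(routes: Sequence[RouteStats]) -> Tuple[float, float, float, float, float]:
    """Aggregate per-route statistics into (f1, f2, f3, f4, f5)."""
    if not routes:
        raise ValueError("a planning scheme needs at least one route")
    f1 = float(len(routes))                 # number of vehicles (one per route)
    f2 = sum(r.distance for r in routes)    # total distance
    f3 = max(r.distance for r in routes)    # longest single route
    f4 = sum(r.wait for r in routes)        # total waiting time
    f5 = sum(r.delay for r in routes)       # total delay time
    return f1, f2, f3, f4, f5
```

For two routes with distances 10 and 6, waiting times 2 and 0, and delays 0 and 1.5, the function returns `(2.0, 16.0, 10.0, 2.0, 1.5)`.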

[0110] The auxiliary task model in step 102 includes: an auxiliary task objective function; the auxiliary task objective function is:

[0111] minF2={f2,f3};

[0112] F2 represents the objective function of the auxiliary task; f2 represents the sum of the distances traveled by all vehicles after completing services to all customer points; f3 represents the maximum distance traveled by all vehicles after completing the services to their assigned customer points.

[0113] $f_2 = \sum_{j=1}^{M} Dis_j$

[0114] $f_3 = \max_{1 \le j \le M} T_j$

[0115] M represents the number of planned paths in the planning scheme; $Dis_j$ represents the distance traveled by a vehicle along the j-th planned route to serve all customer points along that route; $T_j$ represents the travel time for a vehicle to complete services to all customer points along the j-th planned route.

[0116] Furthermore, the main task model and the auxiliary task model include the same constraints: capacity constraints, time constraints, and service constraints. The capacity constraint states that the passenger load on each route cannot exceed the total vehicle capacity at any time. The time constraint states that the delay of a vehicle in arriving at a customer point cannot exceed the maximum allowed delay at that customer point, and that the time at which a vehicle returns to the depot after serving the last customer point is less than or equal to the depot's closing time. The service constraint states that the boarding and alighting points of a customer in any sequence must be on the same route.

[0117] The deep reinforcement learning model in step 102 includes an encoder and a decoder connected in sequence.

[0118] The encoder includes: a linear layer, a first multi-head attention layer, a first residual network layer, a first feedforward neural network layer, and a second residual network layer connected in sequence.

[0119] The decoder includes: a second multi-head attention layer, a third residual network layer, a third multi-head attention layer, a fourth residual network layer, a second feedforward neural network layer, and a fifth residual network layer connected in sequence.

[0120] The loss function in step 102 is:

[0121] $g^{ws}(\pi \mid \lambda_u) = \sum_{k} \lambda_{uk} f_k(\pi)$

[0122] $g^{ws}(\pi \mid \lambda_u)$ represents, for a customer point sequence π, the loss function of the deep reinforcement learning model corresponding to the u-th decomposition model, determined from that decomposition model and its weight vector $\lambda_u$; the decomposition model is either the main task decomposition model or the auxiliary task decomposition model; $\lambda_{uk}$ represents the weight of the k-th objective function in the u-th decomposition model; o represents the number of objective functions; when the decomposition model is the main task decomposition model, k = 1, 2, 3, 4, 5; when the decomposition model is the auxiliary task decomposition model, k = 2, 3; $f_k(\pi)$ represents the value of the k-th objective function calculated for the customer point sequence π.
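
A weighted-sum value of this kind can be computed as in the short sketch below. The function name `weighted_sum` is hypothetical; the only logic is multiplying each objective value by its weight and adding the results.

```python
from typing import Sequence


def weighted_sum(objectives: Sequence[float], weights: Sequence[float]) -> float:
    """Scalarize objective values f_k with one weight per objective."""
    if len(objectives) != len(weights):
        raise ValueError("one weight is required per objective")
    return sum(w * f for w, f in zip(weights, objectives))
```

For an auxiliary-task sub-problem with objective values (f2, f3) = (2.0, 10.0) and weights (0.25, 0.75), the scalarized value is 0.25 × 2.0 + 0.75 × 10.0 = 8.0.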

[0123] In practical applications, one implementation process of the above-mentioned multi-objective vehicle path optimization method is as follows:

[0124] 1) Based on the practical definition of the five-objective vehicle routing optimization problem with time windows, the parking lot needs to arrange a group of vehicles of the same type to provide services to multiple customers with known demand. Each customer has a time window (earliest service time and latest service time), and each customer is served by one and only one vehicle within their time window. No vehicle may serve customer points on a path whose demand exceeds its maximum capacity. Furthermore, the following five objectives need to be minimized while satisfying all constraints: 1. The total number of vehicles required by the solution; 2. The total distance traveled by vehicles after serving all customer points in the solution; 3. The maximum travel distance of vehicles used in the solution; 4. The total waiting time for all customer points during the service process in the solution; 5. The total delay time caused by vehicles arriving at each customer point in the solution. (See step 2) for a detailed introduction to the five-objective vehicle routing optimization problem with time windows.)

[0125] 2) In this embodiment, the five objectives mentioned in step 1) are represented by f1, f2, f3, f4, and f5 respectively. Therefore, the mathematical problem model (main task model) of the vehicle routing problem with time window for the five objectives can be defined as: minF1 = {f1, f2, f3, f4, f5}. The specific calculation formulas of the five objective functions f1, f2, f3, f4, and f5 of this mathematical model will not be repeated here.

[0126] The solution to the mathematical model requires a planning scheme to allocate vehicles so that they can serve n customer points in m planned routes, with each route being handled by one vehicle.

[0127] 3) Take the vehicle path optimization problem with time window of the five objectives in step 2) as the main task. On this basis, define an auxiliary task with two objectives according to the importance and priority of the objectives, that is, the auxiliary task model minF2={f2,f3}.

[0128] 4) Based on steps 2) and 3), the multi-objective problem is decomposed into multiple sub-problems using a weighted summation method, resulting in a third set number of main task decomposition models and a fourth set number of auxiliary task decomposition models. In this embodiment, the main task model is decomposed into 70 sub-problems, resulting in 70 main task decomposition models, and the auxiliary task model is decomposed into 100 sub-problems, resulting in 100 auxiliary task decomposition models. Corresponding deep reinforcement learning models (such as attention models) are then constructed for the main task decomposition models and auxiliary task decomposition models, respectively. Data with randomly generated customer points based on the actual problem (distance matrix and time matrix are both asymmetric matrices) are used as training data, and the constructed models are trained using the Actor-Critic algorithm.

[0129] 5) Determine if there are any unprocessed customer point sequences. If so, proceed to step 6); otherwise, the program enters a waiting state until an unprocessed customer point sequence appears, and then proceeds to step 6.

[0130] 6) After reading the customer point sequence and data such as the demand and time window of each customer point, process them to obtain the feature values required by the model as input, and call the models trained in step 4) for the main task and the auxiliary task respectively. Each task obtains solutions for the customer point sequence through its models. The obtained solutions are used as individuals to construct the initial populations of the two tasks.

[0131] 7) Evaluate and rank the initial populations of the two tasks in their respective task environments. Combine the top N individuals in the auxiliary task with the bottom N individuals in the initial population of the main task to form a temporary population C of size 2N. Proceed to step 8).

[0132] 8) Evaluate and rank the temporary population from step 7) in the main task environment, and replace the N worst-performing individuals in the main task population with the top-ranked individuals from the temporary population C as the new parents, then proceed to step 9).

[0133] 9) Based on the parent generation generated in step 8), perform crossover and mutation on them using genetic operators to generate offspring, evaluate the offspring in the main task environment, and update the external archive A. Then, perform environmental selection on the parent generation and the different offspring generated to form a new parent generation population, until all main task populations have been traversed; proceed to step 10).

[0134] 10) After the knowledge transfer stage in step 9), a local search is performed on the external archive A. In this invention, there are three local search operators, one of which is selected at random. The individuals after the local search are compared with the individuals in A to select the best one. Finally, multiple non-dominated allocation schemes are obtained. The optimal one is selected as the final solution. The processed customer point sequence is deleted from the customer point queue S, and the status of the vehicles with allocated customer points is marked as "performing a task". Proceed to step 11).

[0135] 11) Set external archive A to an empty set and return to step 5).

[0136] In practical applications, a logistics distribution center or delivery point needs to provide delivery services to a large number of customers every day. Each customer has different delivery volume requirements and different delivery time windows. This requires the distribution center or delivery point to arrange delivery vehicles according to its own fleet size. The multi-objective vehicle routing optimization method mentioned above can optimize the delivery vehicle routes to meet delivery requirements.

[0137] This embodiment of the multi-objective vehicle path optimization method, based on the definition and constraints of a multi-objective vehicle path problem with five objectives that better reflects real-world application scenarios, constructs a simulation environment that conforms to the problem, providing the environment and problem state for the deep reinforcement learning model; proposes a deep reinforcement learning model based on an attention mechanism to derive a planning scheme that meets the constraints; introduces the concept of auxiliary tasks, constructing a mathematical model for the auxiliary tasks of the main task; proposes a multi-task optimization environment and introduces local search to iteratively optimize the allocation scheme. This embodiment of the invention, by combining deep reinforcement learning, multi-objective optimization, and multi-task optimization, provides an efficient and feasible solution to the multi-objective vehicle path optimization problem with time windows.

[0138] To further explain the above multi-objective vehicle routing optimization method in detail, a more specific implementation process is provided below:

[0139] This embodiment addresses the multi-objective vehicle routing optimization problem and designs an optimization method based on auxiliary task construction. Its technical concept mainly includes the following four key points: 1. Based on the definition and constraints of a more realistic five-objective vehicle routing optimization problem with time windows, a simulation environment conforming to the problem is constructed to provide the environment and problem state for the deep reinforcement learning model; 2. An attention-based deep reinforcement learning model is proposed to derive a planning scheme that meets the constraints as the initial solution; 3. A multi-objective optimization problem containing two objectives is proposed as its auxiliary task, and the same model is trained; 4. Multi-task optimization is introduced to utilize the auxiliary task to improve the running speed of the main task (original problem) and the quality of the obtained allocation scheme.

[0140] The multi-objective vehicle routing optimization method in this specific example includes the following steps:

[0141] Step 1: Based on the more realistic definition and constraints of the five-objective vehicle routing problem with time windows, we can obtain a five-objective combinatorial optimization problem: minF1={f1,f2,f3,f4,f5}. This problem requires planning all customer points in a sequence according to the solution given by the algorithm. The n customer points in the customer point sequence are divided into m groups of different planning paths, and each group of paths is assigned one vehicle. That is, the solution X is a set of M paths, represented as $X = \{r_1, r_2, \ldots, r_M\}$, where $r_i = \{c_{i,1}, c_{i,2}, \ldots, c_{i,j}\}$ is the order in which the vehicle serves customers, given by the planning scheme, and $c_{i,j}$ represents the j-th customer point on the i-th path. Each customer has its own earliest and latest service times as its time window.

[0142] The objectives of this model are defined as follows:

[0143] 1) f1 represents the number of vehicles required in the current planning scheme:

[0144] f1 = |R| = M

[0145] 2) f2 represents the sum of the distances traveled by all vehicles in the current planning scheme after completing services to all customer points:

[0146] $f_2 = \sum_{j=1}^{M} Dis_j$

[0147] 3) f3 represents the maximum distance (or travel time) traveled by all vehicles in the current planning scheme after completing their assigned customer point services:

[0148] $f_3 = \max_{1 \le j \le M} T_j$

[0149] 4) If a vehicle arrives before the earliest service time set at a customer's location, it must wait until the earliest service time at that customer's location before it can provide service. Therefore, f4 represents the total waiting time for all vehicles in the current planning scheme after completing service to all customer locations.

[0150] $f_4 = \sum_{j=1}^{M} W_j$

[0151] 5) This embodiment uses a soft time window, meaning the vehicle provides service within the maximum allowable delay time md. If a vehicle arrives at the customer's location after the latest service time, it will cause a delay. Therefore, f5 represents the total delay time for all vehicles in the current planning scheme after completing service to all customer points.

[0152] $f_5 = \sum_{j=1}^{M} D_j$

[0153] The meanings of the variables involved in the five target definitions described above will be explained in detail below.

[0154] R: Represents the path set in the current customer sequence planning scheme, and its length is used to represent the number of vehicles.

[0155] Dis j : This represents the distance traveled by a vehicle along path j to serve all customer points on that path. The calculation process is as follows:

[0156] In the multi-objective vehicle routing optimization problem, this embodiment only considers the travel distance between all customer points located on the same path. Specifically, in this embodiment, the travel distance from the starting point of the vehicle on the j-th path to the coordinates of all customer points is calculated as follows:

[0157] $Dis_j = \sum_{i} d_{c(i,j)\,c(i+1,j)}$

[0158] Where $d_{c(i,j)\,c(i+1,j)}$ represents the distance from the i-th customer point to the (i+1)-th customer point for a vehicle on the j-th route. In this embodiment, all distances are given by asymmetric matrices, meaning that the distance from the i-th customer point to the (i+1)-th customer point may differ from the distance from the (i+1)-th customer point to the i-th customer point.
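
The effect of the asymmetric distance matrix can be seen in a small sketch. The function name `route_distance`, the use of node index 0 as the starting point, and the omission of a return leg are assumptions of this example; the text only describes summing the legs from the starting point through the customer points of the route.

```python
from typing import Sequence


def route_distance(route: Sequence[int], dist: Sequence[Sequence[float]], start: int = 0) -> float:
    """Sum the leg distances from the start node through the customers in order.

    dist[a][b] is the distance from node a to node b and need not equal dist[b][a].
    """
    stops = [start, *route]
    return sum(dist[a][b] for a, b in zip(stops, stops[1:]))


# Hypothetical 3-node asymmetric matrix: node 0 is the start, nodes 1 and 2 are customers.
example_dist = [
    [0.0, 2.0, 9.0],
    [3.0, 0.0, 4.0],
    [7.0, 5.0, 0.0],
]
forward = route_distance([1, 2], example_dist)   # legs 0->1 and 1->2
backward = route_distance([2, 1], example_dist)  # legs 0->2 and 2->1
```

Serving the same two customers in the opposite order gives a different distance (6.0 versus 14.0 here), which is why the visiting order within a route matters for f2 and f3.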

[0159] $T_j$: This represents the travel time for the vehicle to complete all orders along the j-th path. In this embodiment, the speed is set to 1 for ease of model training; therefore the travel time $T_j$ and the travel distance $Dis_j$ are numerically equal.

[0160] $W_j$: This represents the waiting time incurred by a vehicle on route j in planning scheme X when completing service to all customer points. Waiting time means that if a vehicle arrives at a customer point before its earliest service start time, it must wait until that customer point's earliest service start time to provide service. Therefore, waiting time affects the vehicle's arrival time at the next customer point. The calculation process is as follows:

[0161] $w_{c(i,j)} = \max\{0,\; b_{c(i,j)} - a_{c(i,j)}\}$

[0162] Among them, $b_{c(i,j)}$ is the earliest service time for the i-th customer point in the j-th path; $a_{c(i,j)}$ is the time at which the vehicle on the j-th path reaches the i-th customer point. It is calculated as follows:

[0163] $a_{c(i,j)} = l_{c(i-1,j)} + t_{c(i-1,j)\,c(i,j)}$

[0164] Among them, $l_{c(i-1,j)}$ represents the time when the vehicle leaves the (i-1)-th customer point in the j-th path, and $t_{c(i-1,j)\,c(i,j)}$ is the travel time of a vehicle on the j-th path from the (i-1)-th customer point to the i-th customer point (assuming a speed of 1, and the travel time exists in the form of an asymmetric matrix). When the vehicle leaves the parking lot, l = 0.

[0165] Then $W_j$ is calculated as follows:

[0166] $W_j = \sum_{i} w_{c(i,j)}$

[0167] Among them, $w_{c(i,j)}$ represents the waiting time incurred when a vehicle on the j-th path serves the i-th customer point.

[0168] $D_j$: This represents the delay time incurred by a vehicle on the j-th path. If a vehicle arrives at the customer's location later than the latest service time for the passenger, but not later than the maximum allowed lateness time md at the customer's location, then the vehicle will incur a delay time. The calculation process is as follows:

[0169] $D_j = \sum_{i} delay_{c(i,j)}$

[0170] Among them, $delay_{c(i,j)}$ is the delay time incurred when a vehicle on the j-th path serves the i-th customer point, calculated as follows:

[0171] $delay_{c(i,j)} = \max\{0,\; a_{c(i,j)} - e_{c(i,j)}\}$

[0172] Among them, $e_{c(i,j)}$ represents the latest service time for the i-th customer point in the j-th path.
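
The waiting and delay times of one route can be traced with a short simulation. This is a sketch under stated assumptions: the function name `route_wait_and_delay` is hypothetical, node 0 is taken as the parking lot, the vehicle leaves at time 0, and service itself is assumed to take no time (the text does not give a service duration), so the vehicle leaves a customer point as soon as any waiting ends.

```python
from typing import Sequence, Tuple


def route_wait_and_delay(
    route: Sequence[int],
    travel: Sequence[Sequence[float]],
    earliest: Sequence[float],
    latest: Sequence[float],
    depot: int = 0,
) -> Tuple[float, float]:
    """Return (total waiting time, total delay time) for one route."""
    clock = 0.0          # leave time from the previous node; 0 at the parking lot
    previous = depot
    total_wait = 0.0
    total_delay = 0.0
    for customer in route:
        arrival = clock + travel[previous][customer]          # arrival = leave time + travel time
        wait = max(0.0, earliest[customer] - arrival)         # early arrival must wait
        delay = max(0.0, arrival - latest[customer])          # late arrival incurs delay
        total_wait += wait
        total_delay += delay
        clock = arrival + wait   # assumption: zero service duration
        previous = customer
    return total_wait, total_delay


# Hypothetical data: node 0 is the parking lot, nodes 1 and 2 are customers.
example_travel = [
    [0.0, 2.0, 9.0],
    [3.0, 0.0, 4.0],
    [7.0, 5.0, 0.0],
]
example_earliest = [0.0, 5.0, 6.0]
example_latest = [0.0, 10.0, 8.0]
example_result = route_wait_and_delay([1, 2], example_travel, example_earliest, example_latest)
```

In this example the vehicle reaches customer 1 at time 2 and waits 3 until the window opens at 5, then reaches customer 2 at time 9, one unit after its latest service time of 8, so the result is a waiting time of 3.0 and a delay of 1.0.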

[0173] Step 2: In the multi-objective vehicle path optimization problem, specifically the five-objective multi-objective vehicle path optimization problem defined in this embodiment, there are also several constraints, which are defined as follows:

[0174] 1) Capacity constraints

[0175] The number of passengers carried by vehicles on each route must not exceed the total vehicle capacity at any time, as follows:

[0176]

[0177] Where Q is the vehicle capacity (7 in this embodiment); in different customer sequence planning schemes, when the vehicle on the j-th path reaches the i-th customer point $c(i,j)$, its current load is defined as $q_{c(i,j)}$, which must satisfy the following constraint:

[0178] $q_{c(i,j)} \le Q$

[0179] 2) Time constraints

[0180] In real-world scenarios, emergencies often arise, so vehicles are required to arrive within the maximum lateness that passengers can accept. To handle this situation while respecting vehicle capacity, each customer point has a maximum permissible vehicle delay time md, meaning the vehicle's arrival delay at the customer point cannot exceed this maximum permissible delay. Thus, the planning scheme for the five-objective vehicle routing problem with time windows must ensure that vehicles meet the time constraints at customer points as follows:

[0181] $delay_{c(i,j)} \le md$

[0182] Where md represents the maximum allowed lateness time for passengers;

[0183] In addition, service vehicles must return to the parking lot before it closes, and each parking lot has a designated closing time. Therefore, the planning scheme for each customer point sequence must also ensure that the vehicles meet the parking lot time window constraints:

[0184]

[0185] The two quantities compared are the time at which the vehicle on the j-th path returns to the depot after completing service to the last customer point, and the closing time of the parking lot; the former must not exceed the latter.

[0186] 3) Service Constraints

[0187] In a planning scheme, the boarding point and the alighting point of a customer in any sequence must be on the same path.
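
The three kinds of constraints can be checked with simple predicates, sketched below. The function names and the input layout are hypothetical: `load_changes` lists the passengers boarding (positive) or alighting (negative) at each stop of a route, `delays` lists the delay at each stop, and `pairs` lists (boarding point, alighting point) pairs. How these values are obtained is outside the scope of this example.

```python
from typing import Dict, Sequence, Tuple


def route_is_feasible(
    load_changes: Sequence[int],
    delays: Sequence[float],
    return_time: float,
    capacity: int,
    max_delay: float,
    closing_time: float,
) -> bool:
    """Check the capacity and time constraints for a single route."""
    load = 0
    for change in load_changes:
        load += change
        if load > capacity:          # capacity constraint at every stop
            return False
    if any(d > max_delay for d in delays):   # delay may not exceed md
        return False
    return return_time <= closing_time       # back before the parking lot closes


def pairs_on_same_route(routes: Sequence[Sequence[int]], pairs: Sequence[Tuple[int, int]]) -> bool:
    """Check the service constraint: boarding and alighting points share a route."""
    route_of: Dict[int, int] = {
        stop: index for index, route in enumerate(routes) for stop in route
    }
    return all(
        board in route_of and route_of.get(board) == route_of.get(alight)
        for board, alight in pairs
    )
```

With the capacity of 7 used in this embodiment, a route whose load goes 3, 7, 5, 6 is feasible with respect to capacity, while a route whose load reaches 8 at any stop is not.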

[0188] Step 3: Take the more realistic five-objective vehicle routing optimization problem from the above steps as the main task, and construct an auxiliary task with two objectives: minF2={f2,f3}.

[0189] The objectives of this model are defined as follows:

[0190] 1) f2 represents the sum of the distances traveled by all vehicles in the current planning scheme after completing services to all customer points:

[0191] $f_2 = \sum_{j=1}^{M} Dis_j$

[0192] 2) f3 represents the maximum distance (or time) traveled by all vehicles in the current planning scheme after completing their assigned customer point services:

[0193] $f_3 = \max_{1 \le j \le M} T_j$

[0194] The meanings, calculation methods, and constraints of the relevant variables involved in the two target definitions described above are the same as those in the main task, and will not be repeated here.

[0195] Step 4: Decompose the main task and auxiliary task described in Steps 2 and 3 into 70 and 100 sub-problems respectively using a decomposition strategy. This embodiment employs a weighted summation decomposition strategy, with each sub-problem corresponding to a weight vector. The weight values represent how strongly each sub-problem emphasizes the different objective functions; therefore, the objective function value of each sub-problem is obtained by weighting the actual objective function values with its weight values. The objective function calculation process for each sub-problem is as follows:

[0196] $g^{ws}(\pi \mid \lambda_u) = \sum_{k} \lambda_{uk} f_k(\pi)$

[0197] There are o objective components to be optimized (i.e., the number of objective functions summed in each sub-problem). In this embodiment, the vehicle routing problems with five objectives and two objectives are decomposed respectively; therefore o = 5 for the main task and o = 2 for the auxiliary task. π is the sequence of customer point visits in each path of the allocation scheme, $f_k(\pi)$ represents the k-th objective function value calculated on the sequence π, and $\lambda_{uk}$ is the weight value of the k-th objective function in the u-th sub-problem.
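
The text does not state how the weight vectors of the sub-problems are generated. One common choice, shown here purely as an assumption, is a simplex-lattice design in which every weight is a multiple of 1/H and the weights of a vector sum to 1. The function name `simplex_lattice_weights` and the parameter `divisions` (H) are hypothetical. With H = 4 this design yields 70 vectors for five objectives, and with H = 99 it yields 100 vectors for two objectives, which coincides with the sub-problem counts of this embodiment.

```python
from itertools import combinations
from typing import List, Tuple


def simplex_lattice_weights(num_objectives: int, divisions: int) -> List[Tuple[float, ...]]:
    """Enumerate all weight vectors whose components are k/divisions and sum to 1."""
    if num_objectives < 2 or divisions < 1:
        raise ValueError("need at least two objectives and one division")
    weights = []
    # Stars and bars: place (num_objectives - 1) bars among
    # (divisions + num_objectives - 1) slots; the gaps between bars are
    # non-negative integers that add up to `divisions`.
    slots = divisions + num_objectives - 1
    for bars in combinations(range(slots), num_objectives - 1):
        previous = -1
        parts = []
        for bar in bars:
            parts.append(bar - previous - 1)
            previous = bar
        parts.append(slots - previous - 1)
        weights.append(tuple(p / divisions for p in parts))
    return weights


main_weights = simplex_lattice_weights(5, 4)    # five-objective main task
aux_weights = simplex_lattice_weights(2, 99)    # two-objective auxiliary task
```

Each resulting vector defines one sub-problem; vectors that put most of their weight on one objective produce sub-problems that emphasize that objective.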

[0198] Step 5: Based on the process by which vehicles serve customer points in the five-objective vehicle routing problem with time windows, establish a simulation environment. Use the product of each sub-problem's weights and the objective function values as the loss function to construct a deep reinforcement learning model. This model determines the next customer point to be assigned to the vehicle based on the states of the vehicle and the customer points. Furthermore, parameters are passed between the sub-problem models using a neighborhood-based parameter transfer strategy. The model construction and training framework is shown in Figure 2. The model mainly consists of two parts: an encoder and a decoder. Both the encoder and decoder are composed of linear layers, multi-head attention layers (MHA), residual network layers (Add & Norm), feed-forward neural network layers, and residual network layers (Add & Norm). During training, randomly generated data for the five-objective vehicle routing problem with time windows are input in the form of a graph. Each customer point and parking lot is a node in the graph, and each node has corresponding location information, demand information, and time window information. The model first encodes the features (node information, edge information) of the training data through the encoder. The transformation applied to the data by the linear layer is as follows:

[0199]

[0200] Here $x_i$ denotes the input features of the i-th customer point (location coordinates, time window, and number of people), W and b are the neural network parameters, and the index i runs over the customer points in the input.

[0201] Afterwards, the context vector of the entire data is output through relevant calculations in the MHA, Add&Norm, and Feed-Forward layers. The decoder uses this vector and the information of currently unvisited nodes to calculate the probability of selecting the next customer order node for the vehicle, and then uses random sampling to select it until all customer points are selected. Each time a selection is made, the model updates the vehicle and customer point information through the defined simulation environment; this process is the decoding process. The problem state mainly includes the vehicle's current location coordinates, remaining vehicle capacity, remaining unassigned customer points, and the location and number of customers at the next customer point served by the vehicle.
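
The selection of the next customer point by random sampling can be sketched as follows. This is a simplified illustration: the function names are hypothetical, and the scores are fixed numbers here, whereas in the described model the decoder would recompute them after each selection from the context vector and the updated problem state. Visited nodes are excluded and the remaining scores are turned into probabilities with a softmax, which is an assumption about how the probabilities are formed.

```python
import math
import random
from typing import List, Sequence, Set


def sample_next_node(scores: Sequence[float], visited: Set[int], rng: random.Random) -> int:
    """Sample one unvisited node with probability proportional to exp(score)."""
    candidates = [i for i in range(len(scores)) if i not in visited]
    if not candidates:
        raise ValueError("all nodes have already been visited")
    peak = max(scores[i] for i in candidates)            # subtract the max for numerical stability
    weights = [math.exp(scores[i] - peak) for i in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def decode_order(scores: Sequence[float], rng: random.Random) -> List[int]:
    """Repeat the sampling until every node has been selected exactly once."""
    visited: Set[int] = set()
    order: List[int] = []
    while len(order) < len(scores):
        node = sample_next_node(scores, visited, rng)
        visited.add(node)
        order.append(node)
    return order
```

Calling `decode_order([0.1, 2.0, -1.0], random.Random(0))` returns some ordering of the nodes 0, 1 and 2; nodes with higher scores tend to be picked earlier, but any unvisited node can be chosen.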

[0202] Step 6: The model training process is shown in Figure 2. This embodiment employs the Actor-Critic reinforcement learning algorithm to train the model. The Actor part consists of an encoder and a decoder, while the Critic part consists of four 1D convolutional layers (1D-Conv). The solution obtained by the Actor is used to calculate the objective value based on the sub-problem's weights, which serves as the loss function. The objective value estimated by the Critic for that solution is used as a baseline when calculating the gradient. The algorithm minimizes the loss function through gradient descent, ultimately obtaining the desired model.
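
One common way to use a Critic estimate as a baseline is the policy-gradient (REINFORCE with baseline) form sketched below. Whether the embodiment uses exactly this form is not stated, so the sketch is an assumption; the function names are hypothetical. `costs` holds the weighted-sum objective value of each sampled solution, `baselines` holds the Critic's estimate for the same instances, and `log_probs` holds the log-probability the Actor assigned to each sampled solution.

```python
from typing import Sequence


def actor_surrogate_loss(
    costs: Sequence[float],
    baselines: Sequence[float],
    log_probs: Sequence[float],
) -> float:
    """Mean of (cost - baseline) * log_prob over a batch.

    In an automatic-differentiation framework the baseline would be treated
    as a constant, so the gradient flows only through log_prob.
    """
    if not costs or not (len(costs) == len(baselines) == len(log_probs)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for cost, baseline, log_prob in zip(costs, baselines, log_probs):
        total += (cost - baseline) * log_prob
    return total / len(costs)


def critic_loss(costs: Sequence[float], baselines: Sequence[float]) -> float:
    """Mean squared error between the Critic's estimates and the observed costs."""
    if not costs or len(costs) != len(baselines):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((c - b) ** 2 for c, b in zip(costs, baselines)) / len(costs)
```

Subtracting the baseline does not change which solutions are preferred; it reduces the variance of the gradient estimate, because only solutions that are better or worse than the Critic expected move the Actor's parameters noticeably.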

[0203] Step 7: After training the main task and auxiliary task models as in Step 6, 70 sub-models (main task sub-problem models) and 100 sub-models (auxiliary task sub-problem models) are obtained respectively. These are called directly when allocating orders. As shown in Figure 3, first determine if there are any unprocessed customer point sequences in the customer point sequence queue. If there are, proceed to step 8; otherwise, enter the waiting state until an unprocessed customer point sequence appears, and then proceed to step 8.

[0204] Step 8: When there is an unprocessed customer point sequence, randomly select 5 models from the 70 main task sub-problem models, use the customer point sequence as input data and solve the problem using the selected model to obtain the planning scheme S1 for the sequence; at the same time, randomly select 5 models from the 100 auxiliary task sub-problem models to obtain the planning scheme A2 for the sequence, and proceed to step 9.

[0205] Step 9: Use the planning schemes S1 and A2 obtained from the two sets of models as the initial populations P1 and P2 for the multi-task evolutionary algorithm. Each population consists of multiple individuals, and an individual refers to a planning scheme for that sequence, which is composed of paths, each giving the order in which a vehicle serves its customer points. The multi-task evolutionary process is shown in Figure 4. After obtaining the initial populations through the models, population P1 is evaluated and ranked under the five-objective problem environment of the main task, and population P2 is evaluated and ranked under the two-objective problem environment of the auxiliary task. Then, the bottom N individuals (N=15 in this embodiment) are selected from population P1, and the top N individuals are selected from population P2 to form a temporary population C with a population size of 2N. Each individual in population C is re-evaluated and ranked under the main task environment, and the best N individuals replace the worst N individuals in population P1 to obtain a new parent population P. Each individual in population P is traversed, and offspring are generated by randomly selecting a genetic operator (in this embodiment, the genetic operators include crossover, mutation, and inversion). These offspring are evaluated under the main task environment, and the generated offspring are combined with their parent population P to form a new population P′. Then, elite selection is performed on population P′ based on the main task environment, and the external archive A is updated. Proceed to step 10.
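
The random choice among crossover, mutation, and inversion can be sketched as below. The text names the three operators but not their exact form, so the specific variants used here (a segment-preserving order crossover, a swap mutation, and a segment inversion) are assumptions, as are all function names. For brevity an individual is represented as a flat visiting order of customer points rather than a full planning scheme with several paths.

```python
import random
from typing import List, Sequence


def inversion(order: List[int], rng: random.Random) -> List[int]:
    """Reverse a randomly chosen contiguous segment."""
    i, j = sorted(rng.sample(range(len(order)), 2))
    return order[:i] + order[i:j + 1][::-1] + order[j + 1:]


def swap_mutation(order: List[int], rng: random.Random) -> List[int]:
    """Exchange the customers at two random positions."""
    i, j = rng.sample(range(len(order)), 2)
    child = list(order)
    child[i], child[j] = child[j], child[i]
    return child


def order_crossover(first: List[int], second: List[int], rng: random.Random) -> List[int]:
    """Keep a segment of `first` in place and fill the rest in the order of `second`."""
    i, j = sorted(rng.sample(range(len(first)), 2))
    segment = first[i:j + 1]
    rest = [gene for gene in second if gene not in segment]
    return rest[:i] + segment + rest[i:]


def make_offspring(parents: Sequence[List[int]], rng: random.Random) -> List[List[int]]:
    """Traverse the parents and apply one randomly selected operator to each."""
    offspring = []
    for parent in parents:
        operator = rng.choice(("crossover", "mutation", "inversion"))
        if operator == "crossover":
            mate = rng.choice(list(parents))
            child = order_crossover(parent, mate, rng)
        elif operator == "mutation":
            child = swap_mutation(parent, rng)
        else:
            child = inversion(parent, rng)
        offspring.append(child)
    return offspring
```

The new population P′ would then be the parents together with the returned offspring, for example `list(parents) + make_offspring(parents, rng)`. All three operators return a permutation of the same customers, so no customer is lost or duplicated.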

[0206] Step 10: In step 9, the final population P′ is obtained. To further improve the quality and efficiency of the allocation scheme, this step performs a local search operation on population P′. Each individual in population P′ is traversed, a local search operator is randomly selected for each individual (in this embodiment, there are three different local search operators to choose from), and the external archive A is updated. Then, environmental selection is performed on the external archive A, ultimately yielding the optimal solution set for the main task. Based on the path combinations in this set, vehicles are allocated as needed to provide services to the assigned customer points, and the status of the vehicles with assigned customer points is marked as "performing a task".

[0207] In this embodiment, the comparison between different individuals in the population, i.e., different allocation schemes, is conducted through multi-objective dominance relationships, which are defined as follows:

[0208] Let there be different allocation schemes x1 and x2:

[0209] Condition 1: For all objective values, $f_j(x1) \le f_j(x2)$, j = 1, 2, 3, 4, 5;

[0210] Condition 2: There exists at least one objective value such that $f_j(x1) < f_j(x2)$.

[0211] If both conditions 1 and 2 are satisfied, then x1 is said to dominate x2. If neither x1 dominates x2 nor x2 dominates x1, then x1 and x2 are said to be mutually non-dominated, and both are non-dominated solutions.

[0212] Under this relationship, the external archive update strategy is as follows:

[0213] Case 1: If the external archive A is empty, the generated planning scheme x will be directly added to A;

[0214] Case 2: If the external archive A is not empty, then the newly generated planning scheme x′ is compared with all planning schemes in the external archive A for dominance: if there is already a scheme in A that dominates scheme x′, or is the same as scheme x′, then scheme x′ is discarded; if scheme x′ dominates one or more existing planning schemes in A, then the dominated schemes in A are deleted, and scheme x′ is added to the external archive A; if scheme x′ and every planning scheme in A are mutually non-dominated, then scheme x′ is added to the external archive A.
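
The dominance relation and the two archive-update cases can be sketched together as follows. The function names are hypothetical, and for brevity each planning scheme is represented only by its tuple of objective values (all to be minimized) rather than by its routes.

```python
from typing import List, Sequence, Tuple

Objectives = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_archive(archive: List[Objectives], candidate: Objectives) -> List[Objectives]:
    """Return the archive after offering it one newly generated scheme."""
    # Discard the candidate if an archived scheme dominates it or is identical to it.
    for member in archive:
        if member == candidate or dominates(member, candidate):
            return list(archive)
    # Otherwise remove every archived scheme the candidate dominates, then add it.
    kept = [member for member in archive if not dominates(candidate, member)]
    kept.append(candidate)
    return kept
```

An empty archive accepts the first scheme directly, because both loops have nothing to compare against. After every update the archive contains only mutually non-dominated schemes.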

[0215] After completing step 10, the customer point sequence is deleted from the customer point queue. At the same time, it is determined whether the customer point sequence queue is empty. If the queue is empty, the entire process ends; otherwise, proceed to step 11.

[0216] Step 11: Clear the external archive A and return to step 7.

[0217] The multi-objective vehicle path optimization method in this embodiment has the following advantages:

[0218] 1. The present invention provides an optimization method and system for solving multi-objective vehicle routing problems based on auxiliary task construction. For multi-objective vehicle routing problems that are closer to practical applications, a new method is proposed to solve the five-objective vehicle routing problem with time windows, which yields higher-quality solutions and scales better than traditional methods.

[0219] 2. For the five-objective vehicle routing problem with time windows, the main point is to allocate the customer point sequence into multiple different paths, and assign one vehicle to each path to complete the service to all customer points. This invention constructs the process of each vehicle departing from the depot, picking up each customer, and finally returning to the depot as the basic task, and further constructs a multi-objective deep reinforcement learning model by combining reinforcement learning, so that the model can effectively solve the problem.

[0220] 3. This invention takes the multi-objective vehicle path optimization problem with five objectives as the main task and constructs an auxiliary task with two objectives for it, thus forming a multi-task, multi-objective optimization scenario. Through evolutionary optimization with knowledge transfer, the auxiliary task helps the main task reach its optimal allocation scheme, thereby improving the performance and efficiency of the allocation method.

Embodiment 2

[0222] In order to implement the method corresponding to Embodiment 1 above and achieve the corresponding functions and technical effects, a multi-objective vehicle path optimization system is provided below.

[0223] Referring to Figure 5, the system includes:

[0224] The data acquisition module 501 is used to acquire the current customer point sequence and vehicle data; the customer point sequence includes: the location of each customer point, the demand information of each customer point, and the time window of each customer point; the vehicle data includes: the location of each vehicle; each customer point is served by one vehicle within the corresponding time window.

[0225] The model invocation module 502 is used to invoke a first predetermined number of main task sub-problem models and a second predetermined number of auxiliary task sub-problem models. The main task sub-problem models are obtained by training a deep reinforcement learning model based on main task decomposition models. The auxiliary task sub-problem models are obtained by training a deep reinforcement learning model based on auxiliary task decomposition models. The main task decomposition models are obtained by decomposing the main task model into a third predetermined number of decomposition models. The auxiliary task decomposition models are obtained by decomposing the auxiliary task model into a fourth predetermined number of decomposition models. The first predetermined number is less than the third predetermined number. The second predetermined number is less than the fourth predetermined number. The main task model is constructed based on five objectives. The auxiliary task model is constructed based on two of the five objectives. The five objectives include: the number of vehicles assigned, the sum of the driving distances of all vehicles after completing services at all customer points, the maximum value of the driving distances of all vehicles after completing services at assigned customer points, the sum of the waiting times of all vehicles in each group of paths in the planning scheme, and the sum of the delay times of all vehicles in each group of paths in the planning scheme.
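
As an illustration of how the five objectives listed above might be evaluated for one planning scheme, consider the following minimal sketch. It rests on several assumptions that are not stated in this embodiment: Euclidean distances, unit vehicle speed, a vehicle that arrives before a time window opens waits until it opens, a vehicle that arrives after the window closes still serves the customer and accrues the lateness as delay, and vehicle capacity is ignored. The function `evaluate_scheme` and all data values are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def evaluate_scheme(
    routes: List[List[int]],
    coords: Dict[int, Point],
    windows: Dict[int, Tuple[float, float]],
    service: Dict[int, float],
    depot: int = 0,
) -> Tuple[int, float, float, float, float]:
    """Return (f1, f2, f3, f4, f5) for a planning scheme given as a list of routes."""
    total_dist = max_dist = total_wait = total_delay = 0.0
    for route in routes:
        clock = 0.0   # current time of this vehicle
        dist = 0.0    # distance driven by this vehicle
        prev = depot
        for customer in route:
            leg = math.dist(coords[prev], coords[customer])
            dist += leg
            clock += leg  # unit speed assumed: travel time equals distance
            ready, due = windows[customer]
            if clock < ready:       # early arrival: wait for the window to open
                total_wait += ready - clock
                clock = ready
            elif clock > due:       # late arrival: accrue delay
                total_delay += clock - due
            clock += service[customer]
            prev = customer
        dist += math.dist(coords[prev], coords[depot])  # return to the depot
        total_dist += dist
        max_dist = max(max_dist, dist)
    return (len(routes), total_dist, max_dist, total_wait, total_delay)


# Hypothetical instance: depot 0 and three customer points.
coords = {0: (0.0, 0.0), 1: (3.0, 4.0), 2: (6.0, 8.0), 3: (0.0, 5.0)}
windows = {1: (0.0, 10.0), 2: (12.0, 20.0), 3: (0.0, 3.0)}
service = {1: 1.0, 2: 1.0, 3: 1.0}
scheme_objectives = evaluate_scheme([[1, 2], [3]], coords, windows, service)
```

For the auxiliary task, only the second and third entries of the returned tuple would be used.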

[0226] The initial plan planning module 503 is used to input the current customer point sequence and the current vehicle data into each of the main task sub-problem models to obtain the planning scheme for the current main task, and to input the current customer point sequence and the current vehicle data into each of the auxiliary task sub-problem models to obtain the planning scheme for the current auxiliary task. The planning scheme includes: multiple planning paths; one planning path corresponds to one vehicle; each planning path includes the order in which the vehicle serves each customer point.

[0227] The initial population determination module 504 is used to determine the planning scheme of the current main task as the initial population of the main task and the planning scheme of the current auxiliary task as the initial population of the auxiliary task; each individual in the initial population corresponds to a planning path.

[0228] The optimization scheme determination module 505 is used to iteratively optimize the initial population of the main task based on the initial population of the auxiliary task using an evolutionary migration method to obtain the optimal planning scheme of the customer point sequence at the current time; the optimal planning scheme is the planning scheme of the main task obtained when the set number of iterations meets the set termination condition.

Embodiment 3

[0230] This embodiment provides an electronic device, including a memory and a processor. The memory stores a computer program, and the processor runs the computer program to enable the electronic device to execute the multi-objective vehicle path optimization method of Embodiment 1.

[0231] Alternatively, the aforementioned electronic device may be a server.

[0232] In addition, embodiments of the present invention also provide a computer-readable storage medium storing a computer program that, when executed by a processor, implements the multi-objective vehicle path optimization method of Embodiment 1.

[0233] The various embodiments in this specification are described in a progressive manner, with each embodiment focusing on its differences from the other embodiments. For similar or identical parts, the embodiments may be referred to one another. Because the systems disclosed in the embodiments correspond to the methods disclosed in the embodiments, their descriptions are relatively brief; for relevant details, refer to the description of the method.

[0234] This document uses specific examples to illustrate the principles and implementation methods of the present invention. The descriptions of the above embodiments are only for the purpose of helping to understand the method and core ideas of the present invention. Furthermore, those skilled in the art will recognize that, based on the ideas of the present invention, there will be changes in the specific implementation methods and application scope. Therefore, the content of this specification should not be construed as a limitation of the present invention.

Claims

1. A multi-objective vehicle path optimization method, characterized in that it comprises: obtaining the current customer point sequence and vehicle data, wherein the customer point sequence includes: the location of each customer point, the demand information of each customer point, and the time window of each customer point; the vehicle data includes: the location of each vehicle; and each customer point is served by one vehicle within the corresponding time window; invoking a first predetermined number of main task sub-problem models and a second predetermined number of auxiliary task sub-problem models, wherein the main task sub-problem models are obtained by training a deep reinforcement learning model based on main task decomposition models; the auxiliary task sub-problem models are obtained by training a deep reinforcement learning model based on auxiliary task decomposition models; the main task decomposition models are obtained by decomposing the main task model into a third predetermined number of decomposition models; the auxiliary task decomposition models are obtained by decomposing the auxiliary task model into a fourth predetermined number of decomposition models; the first predetermined number is less than the third predetermined number; the second predetermined number is less than the fourth predetermined number; the main task model is constructed based on five objectives; the auxiliary task model is constructed based on two of the five objectives; and the five objectives include: the number of vehicles assigned, the sum of the travel distances of all vehicles after completing services at all customer points, the maximum travel distance of all vehicles after completing services at assigned customer points, the sum of the waiting times of all vehicles in each group of paths in the planning scheme, and the sum of the delay times of all vehicles in each group of paths in the planning scheme;
inputting the current customer point sequence and the current vehicle data into each of the main task sub-problem models to obtain the planning scheme for the current main task, and inputting the current customer point sequence and the current vehicle data into each of the auxiliary task sub-problem models to obtain the planning scheme for the current auxiliary task, wherein the planning scheme includes multiple planning paths; one planning path corresponds to one vehicle; and each planning path includes the order in which the vehicle serves each customer point; determining the planning scheme of the current main task as the initial population of the main task and the planning scheme of the current auxiliary task as the initial population of the auxiliary task, wherein each individual in the initial population corresponds to a planning path; and iteratively optimizing the initial population of the main task based on the initial population of the auxiliary task using an evolutionary migration method to obtain the optimal planning scheme for the customer point sequence at the current time, wherein the optimal planning scheme is the planning scheme of the main task obtained when the set number of iterations meets the set termination condition.

2. The multi-objective vehicle path optimization method according to claim 1, characterized in that iteratively optimizing the initial population of the main task based on the initial population of the auxiliary task using the evolutionary migration method, to obtain the optimal planning scheme for the customer point sequence at the current time, specifically includes: evaluating and ranking the initial population of the main task in a problem environment with five objectives, and selecting the N lowest-ranked individuals; evaluating and ranking the initial population of the auxiliary task in a problem environment with two objectives, and selecting the N highest-ranked individuals; forming a temporary population of size 2N from the N lowest-ranked individuals of the initial population of the main task and the N highest-ranked individuals of the initial population of the auxiliary task; evaluating and ranking each individual in the temporary population in the problem environment with five objectives; replacing the N lowest-ranked individuals in the initial population of the main task with the N highest-ranked individuals of the temporary population, the initial population of the main task after the replacement being the parent population; traversing each individual in the parent population, randomly selecting a genetic operator to generate an offspring population, and combining the offspring population with the parent population to obtain a new population; performing elite selection on the new population to obtain the selected new population; and performing a local search operation on the selected new population to obtain the optimal planning scheme for the current customer point sequence.
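
The replacement step of claim 2 (up to the formation of the parent population) may be illustrated by the following sketch. The claim does not state how individuals are ranked, so the ranking used here, by the number of population members that dominate an individual with the objective sum as a tie-break, is purely an assumption. Each individual is represented only by a hypothetical five-objective vector (f1..f5), and the auxiliary task is assumed to use f2 and f3.

```python
from typing import Callable, List, Sequence, Tuple

Vec = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Pareto dominance for minimization."""
    return all(p <= q for p, q in zip(a, b)) and any(p < q for p, q in zip(a, b))


def rank(pop: List[Vec], project: Callable[[Vec], Sequence[float]]) -> List[Vec]:
    """Sort best-first by number of dominators (assumed rule), then by objective sum."""
    def dominator_count(ind: Vec) -> int:
        return sum(dominates(project(other), project(ind)) for other in pop)
    return sorted(pop, key=lambda ind: (dominator_count(ind), sum(project(ind))))


def transfer(main_pop: List[Vec], aux_pop: List[Vec], n: int) -> List[Vec]:
    """Build the parent population by replacing the n worst main-task individuals."""
    if not 0 < n <= min(len(main_pop), len(aux_pop)):
        raise ValueError("n must be between 1 and the smaller population size")
    five = lambda v: v                # main task: all five objectives
    two = lambda v: (v[1], v[2])      # auxiliary task: f2 and f3 only
    main_sorted = rank(main_pop, five)
    worst_main = main_sorted[-n:]             # N lowest-ranked main individuals
    best_aux = rank(aux_pop, two)[:n]         # N highest-ranked auxiliary individuals
    temporary = worst_main + best_aux         # temporary population of size 2N
    survivors = rank(temporary, five)[:n]     # re-ranked under five objectives
    return main_sorted[:-n] + survivors


# Hypothetical populations.
A, B = (2, 100, 40, 5, 0), (3, 150, 60, 9, 4)
C, D = (2, 110, 45, 6, 1), (3, 160, 70, 9, 5)
E, F = (2, 90, 35, 7, 2), (3, 95, 50, 20, 9)
G, H = (4, 200, 90, 1, 0), (3, 180, 80, 3, 3)
parent_population = transfer([A, B, C, D], [E, F, G, H], n=2)
```

Here the two weakest main-task individuals (B and D) compete with the two strongest auxiliary individuals (E and F) under all five objectives, and the winners join the parent population.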

3. The multi-objective vehicle path optimization method according to claim 1, characterized in that the first predetermined number of main task sub-problem models and the second predetermined number of auxiliary task sub-problem models are determined as follows: constructing a main task model and an auxiliary task model; decomposing the main task model into a third predetermined number of main task decomposition models using a weighted summation method, and decomposing the auxiliary task model into a fourth predetermined number of auxiliary task decomposition models, wherein each main task decomposition model corresponds to a different weight vector and each auxiliary task decomposition model corresponds to a different weight vector; constructing a corresponding deep reinforcement learning model for each of the main task decomposition models and a corresponding deep reinforcement learning model for each of the auxiliary task decomposition models; determining, based on the main task decomposition model and the corresponding weight vector, the loss function of the deep reinforcement learning model corresponding to the main task decomposition model, and determining, based on the auxiliary task decomposition model and the corresponding weight vector, the loss function of the deep reinforcement learning model corresponding to the auxiliary task decomposition model; obtaining training data, wherein the training data includes: historical customer point sequences, historical vehicle data, and historical planning schemes; inputting the training data into each of the deep reinforcement learning models and training each model by gradient descent with the goal of minimizing the corresponding loss function, so as to obtain the trained deep reinforcement learning models;
determining the trained deep reinforcement learning models corresponding to the main task decomposition models as the main task sub-problem models, thus obtaining the first predetermined number of main task sub-problem models; and determining the trained deep reinforcement learning models corresponding to the auxiliary task decomposition models as the auxiliary task sub-problem models, thus obtaining the second predetermined number of auxiliary task sub-problem models.

4. The multi-objective vehicle path optimization method according to claim 3, characterized in that the main task model includes a main task objective function; the main task objective function is: min F1 = {f1, f2, f3, f4, f5}; F1 represents the objective function of the main task; f1 represents the number of vehicles assigned; f2 represents the sum of the distances traveled by all vehicles after completing services to all customer points; f3 represents the maximum distance traveled by all vehicles after completing services to their assigned customer points; f4 represents the sum of the waiting times of all vehicles in each group of paths in the planning scheme; f5 represents the sum of the delay times of all vehicles in each group of paths in the planning scheme; wherein f1 = M; M represents the number of planned paths in the planning scheme; Dis_j represents the distance traveled by the vehicle on the j-th planned path to serve all customer points on that path; T_j represents the travel time for the vehicle on the j-th planned path to complete services to all customer points on that path; W_j represents the waiting time incurred by the vehicle on the j-th planned path when it completes service to all customer points; D_j represents the delay time incurred by the vehicle on the j-th planned path.
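
Only the expression f1 = M survives in the text of this claim. One reading of the remaining objectives that is consistent with the symbol definitions is given below, with Dis_j, T_j, W_j and D_j denoting the distance, travel time, waiting time and delay time of the j-th planned path. These forms are an assumption for illustration, not a reproduction of the claimed formulas. In particular, f3 is written over T_j because T_j is the only additional symbol defined for the two-objective auxiliary task, although the prose describes f3 as a maximum distance.

```latex
\begin{aligned}
f_1 &= M, \\
f_2 &= \sum_{j=1}^{M} \mathrm{Dis}_j, \\
f_3 &= \max_{1 \le j \le M} T_j, \\
f_4 &= \sum_{j=1}^{M} W_j, \\
f_5 &= \sum_{j=1}^{M} D_j .
\end{aligned}
```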

5. The multi-objective vehicle path optimization method according to claim 3, characterized in that the auxiliary task model includes an auxiliary task objective function; the auxiliary task objective function is: min F2 = {f2, f3}; F2 represents the objective function of the auxiliary task; f2 represents the sum of the distances traveled by all vehicles after completing services to all customer points; f3 represents the maximum distance traveled by all vehicles after completing services to their assigned customer points; M represents the number of planned paths in the planning scheme; Dis_j represents the distance traveled by the vehicle on the j-th planned path to serve all customer points on that path; T_j represents the travel time for the vehicle on the j-th planned path to complete services to all customer points on that path.

6. The multi-objective vehicle path optimization method according to claim 3, characterized in that the deep reinforcement learning model includes an encoder and a decoder connected in sequence; the encoder includes: a linear layer, a first multi-head attention layer, a first residual network layer, a first feedforward neural network layer, and a second residual network layer connected in sequence; the decoder includes: a second multi-head attention layer, a third residual network layer, a third multi-head attention layer, a fourth residual network layer, a second feedforward neural network layer, and a fifth residual network layer connected in sequence.

7. The multi-objective vehicle path optimization method according to claim 3, characterized in that the loss function is g^ws(π|λ_u), where g^ws(π|λ_u) represents, for a customer point sequence π, the loss function of the deep reinforcement learning model corresponding to the u-th decomposition model, determined based on the u-th decomposition model and the corresponding weight vector λ_u; the decomposition model is either the main task decomposition model or the auxiliary task decomposition model; λ_uk represents the weight of the k-th objective function in the u-th decomposition model; o represents the number of objective functions; when the decomposition model is the main task decomposition model, k = 1, 2, 3, 4, 5; when the decomposition model is the auxiliary task decomposition model, k = 2, 3; f_k(π) represents the value of the k-th objective function computed for the customer point sequence π.
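
Assuming the loss takes the usual weighted-sum form implied by the weighted summation method of claim 3, that is, the sum over k of λ_uk · f_k(π), the scalarization can be sketched as below. The function name `weighted_sum`, the weight vectors, and the objective values are hypothetical; how the scalar enters the gradient-based training of the deep reinforcement learning model is not shown.

```python
import math
from typing import Sequence


def weighted_sum(objectives: Sequence[float], weights: Sequence[float]) -> float:
    """Scalarize objective values f_k(pi) with the weights lambda_uk of one sub-problem."""
    if len(objectives) != len(weights):
        raise ValueError("one weight is required per objective")
    return sum(w * f for w, f in zip(weights, objectives))


# Hypothetical objective values (f1..f5) of one customer point sequence.
f = (2, 30.0, 20.0, 1.0, 2.0)

# Main task sub-problem: k = 1..5, here with equal weights (an assumption).
main_weights = (0.2, 0.2, 0.2, 0.2, 0.2)
g_main = weighted_sum(f, main_weights)

# Auxiliary task sub-problem: k = 2, 3 only.
aux_weights = (0.5, 0.5)
g_aux = weighted_sum(f[1:3], aux_weights)
```

Each decomposition model would use its own weight vector, so the same planning scheme generally receives a different scalar value under each sub-problem.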

8. A multi-objective vehicle path optimization system, characterized in that it comprises: a data acquisition module, used to acquire the current customer point sequence and vehicle data, wherein the customer point sequence includes: the location of each customer point, the demand information of each customer point, and the time window of each customer point; the vehicle data includes: the location of each vehicle; and each customer point is served by one vehicle within the corresponding time window; a model invocation module, used to invoke a first predetermined number of main task sub-problem models and a second predetermined number of auxiliary task sub-problem models, wherein the main task sub-problem models are obtained by training a deep reinforcement learning model based on main task decomposition models; the auxiliary task sub-problem models are obtained by training a deep reinforcement learning model based on auxiliary task decomposition models; the main task decomposition models are obtained by decomposing the main task model into a third predetermined number of decomposition models; the auxiliary task decomposition models are obtained by decomposing the auxiliary task model into a fourth predetermined number of decomposition models; the first predetermined number is less than the third predetermined number; the second predetermined number is less than the fourth predetermined number; the main task model is constructed based on five objectives; the auxiliary task model is constructed based on two of the five objectives; and the five objectives include: the number of assigned vehicles, the sum of the travel distances of all vehicles after completing services at all customer points, the maximum travel distance of all vehicles after completing services at assigned customer points, the sum of the waiting times of all vehicles in each group of paths in the planning scheme, and the sum of the delay times of all vehicles in each group of paths in the planning scheme;
an initial plan planning module, used to input the current customer point sequence and the current vehicle data into each of the main task sub-problem models to obtain the planning scheme for the current main task, and to input the current customer point sequence and the current vehicle data into each of the auxiliary task sub-problem models to obtain the planning scheme for the current auxiliary task, wherein the planning scheme includes multiple planning paths; one planning path corresponds to one vehicle; and each planning path includes the order in which the vehicle serves each customer point; an initial population determination module, used to determine the planning scheme of the current main task as the initial population of the main task and the planning scheme of the current auxiliary task as the initial population of the auxiliary task, wherein each individual in the initial population corresponds to a planning path; and an optimization scheme determination module, used to iteratively optimize the initial population of the main task based on the initial population of the auxiliary task using an evolutionary migration method to obtain the optimal planning scheme for the customer point sequence at the current time, wherein the optimal planning scheme is the planning scheme of the main task obtained when the set number of iterations meets the set termination condition.

9. An electronic device, characterized in that the device includes a memory and a processor, the memory being used to store a computer program, and the processor running the computer program to cause the electronic device to perform the multi-objective vehicle path optimization method according to any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that it stores a computer program that, when executed by a processor, implements the multi-objective vehicle path optimization method according to any one of claims 1 to 7.