Space-time task scheduling method and device for port transportation and electronic equipment
By constructing a set of alternative storage locations for port transportation tasks and employing a reinforcement learning model, the method removes the rigid binding between tasks and locations. This enables seamless connection of vehicle tasks and efficient use of resources in the port transportation system, and improves overall transportation efficiency.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SHANGHAI MARITIME UNIVERSITY
- Filing Date
- 2026-04-10
- Publication Date
- 2026-06-16
AI Technical Summary
In existing port transportation systems, the rigid binding relationship between transportation tasks and operating locations limits the flexibility and continuity of vehicle tasks, resulting in frequent empty runs, low resource utilization, and limited system optimization efficiency.
By dynamically constructing a set of alternative storage locations for each transportation task, and utilizing a reinforcement learning model based on Markov decision processes and an Actor-Critic architecture, the task orchestration is optimized. Outbound and return task pairs that meet the time window matching and path reachability conditions are identified, generating vehicle transportation task orchestration schemes, which are then dynamically adjusted during execution.
The method achieves a seamless, continuous transport chain across the entire shoreline. It reduces empty vehicle runs, improves resource utilization and transport efficiency, and preserves the continuity and flexibility of tasks.

Figure CN121998390B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the fields of intelligent transportation systems and port logistics technology, and in particular to a spatiotemporal task scheduling method, apparatus and electronic equipment for port transportation.
Background Technology
[0002] In automated container terminals, horizontal transport systems are responsible for transferring containers between quay cranes and container yards, and their operational efficiency directly impacts the port's throughput capacity. The core challenge lies in minimizing the empty mileage of transport vehicles to improve the utilization rate of transport capacity and energy. To this end, the industry has developed numerous intelligent scheduling methods. Their common approach is to use optimization algorithms to allocate appropriate transport tasks to vehicles in real time and plan travel routes, aiming to achieve "vehicles not waiting for goods, and goods not waiting for vehicles."
[0003] However, in-depth analysis of existing solutions reveals a fundamental constraint that remains unresolved: the rigid binding between transportation tasks and operation locations. In mainstream scheduling models, every task, whether an unloading task (from a quay crane to a designated yard location) or a loading task (from a designated yard location to a quay crane), is assigned a unique, fixed operation location (a stacking location or a container retrieval location) when it is generated. The scheduling system performs task matching and route planning under this rigid constraint, and vehicles must go to that fixed location to complete the task. This model severely limits the flexibility of system optimization.
[0004] Specifically, because the target location of each task is single and fixed, the system has an extremely narrow selection space when searching for a sequence of consecutive tasks for a vehicle. For example, after a vehicle completes an unloading task, the starting point of its next task is locked to the same fixed yard location where the unloading just took place. The system can only look for loading tasks with matching time windows near that fixed point. If there are no suitable tasks nearby, or if the path to that point is temporarily congested, the vehicle is often forced to drive empty to another area, breaking the continuous cargo handling chain. The same applies, conversely, to loading tasks. Task connection based on fixed locations is essentially "nearest match" logic. While it can eliminate some empty runs, it cannot proactively build, at the global level, closed-loop transportation chains formed by the natural coupling of outbound and return tasks. Forming such a closed loop requires the two tasks to follow each other closely in both their spatial paths and their timing, and a single fixed location greatly reduces the probability that such sequential combinations are found and successfully matched.
[0005] Therefore, existing solutions fail to fundamentally overcome the rigid limitations of the task model itself, and their scheduling optimization is performed within a narrow solution space. As a result, empty vehicle runs cannot be fully eliminated, and further improvement of the system's overall transportation efficiency hits a bottleneck.
Summary of the Invention
[0006] Therefore, it is necessary to provide a spatiotemporal task orchestration method, device, and electronic equipment for port transportation that can enhance the flexibility of task execution from the source, thereby creating the possibility of dynamically orchestrating seamless continuous transport chains across the entire shoreline.
[0007] This invention provides a spatiotemporal task orchestration method for port transportation, the method comprising:
[0008] Obtain multiple transportation tasks to be executed, each of which includes a defined starting point, the original target storage location, and the task execution time window;
[0009] Based on the remaining capacity of the storage location and the operation distance threshold from the task start point, multiple candidate storage locations are dynamically selected for each transportation task to form a set of alternative storage locations for the task, so as to expand the transportation task from a single target location to multiple candidate target locations.
[0010] Based on the set of candidate storage locations, the expanded transportation tasks are analyzed across the entire shoreline. Outbound and return tasks that meet the time window matching and path reachability conditions are identified as connectable task pairs, forming a candidate set for connecting outbound and return tasks. The time window matching and path reachability conditions refer to the following: the sum of the estimated completion time of the preceding task and the empty transfer time from the end position of the preceding task to the starting point of the subsequent task does not exceed the deadline of the execution time window of the subsequent task, and there exists a feasible path connecting the two task locations.
[0011] The vehicle transportation task scheduling process is modeled as a Markov decision process, and a reinforcement learning model based on the Actor-Critic architecture is used for collaborative optimization decision-making. The reinforcement learning model outputs actions for the vehicles based on the current system state, which includes the tasks to be executed, vehicle status, and operation network status. The actions include selecting a task from the tasks to be executed, selecting a storage location from the set of candidate storage locations for the task, and planning a driving path. The reinforcement learning model updates its strategy by maximizing the expected cumulative reward, and its reward function is a multi-objective reward function that comprehensively optimizes the vehicle's empty driving mileage, operation waiting time, and task delay.
[0012] Based on the optimization decision results, a vehicle transportation task scheduling scheme containing task sequences, selected storage locations, and driving routes is generated and issued to the corresponding vehicles for execution. During the execution process, when deviations or environmental changes are detected, the connection relationship and driving routes of subsequent tasks are dynamically adjusted.
[0013] In one embodiment, the step of acquiring multiple transportation tasks to be executed further includes:
[0014] Abstract the terminal operation area into a directed operation network graph:
[0015] $$G = (\mathcal{V}, \mathcal{E})$$
[0016] where $\mathcal{V}$ is the set of operation nodes, including quay crane operation points, yard crane operation points, vehicle waiting points, and road intersections; $\mathcal{E}$ is the set of directed paths on which vehicles can travel, and each path $(u, v) \in \mathcal{E}$ is associated with an attribute vector $\mathbf{x}_{uv} = (\ell_{uv}, t_{uv}, \kappa_{uv})$ representing path length, estimated travel time, and congestion coefficient.
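A minimal Python sketch of the directed operation network, assuming each directed edge stores its length, estimated travel time, and a congestion coefficient, and assuming (hypothetically) that congestion inflates travel time linearly. `OperationNetwork`, `EdgeAttr`, and the node names are illustrative, not part of the disclosed method; Dijkstra's algorithm is used here only as one standard way to test path reachability and estimate empty-transfer time on such a graph.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass(frozen=True)
class EdgeAttr:
    length_m: float       # path length
    travel_time_s: float  # estimated free-flow travel time
    congestion: float     # congestion coefficient (0 = free flow; assumed scale)

class OperationNetwork:
    """Directed operation network: nodes are operation points, edges are lanes."""

    def __init__(self) -> None:
        self.adj: Dict[str, Dict[str, EdgeAttr]] = {}

    def add_node(self, node: str) -> None:
        self.adj.setdefault(node, {})

    def add_edge(self, u: str, v: str, attr: EdgeAttr) -> None:
        self.add_node(u)
        self.add_node(v)
        self.adj[u][v] = attr  # directed: only u -> v becomes drivable

    @staticmethod
    def effective_time(attr: EdgeAttr) -> float:
        # Assumption: congestion inflates travel time linearly.
        return attr.travel_time_s * (1.0 + attr.congestion)

    def shortest_path(self, src: str, dst: str) -> Optional[Tuple[float, List[str]]]:
        """Dijkstra on effective travel time; None means dst is unreachable."""
        if src not in self.adj or dst not in self.adj:
            return None
        dist = {src: 0.0}
        prev: Dict[str, str] = {}
        heap = [(0.0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if u == dst:
                path = [u]
                while u in prev:          # walk predecessors back to src
                    u = prev[u]
                    path.append(u)
                return d, path[::-1]
            if d > dist.get(u, float("inf")):
                continue                  # stale heap entry
            for v, attr in self.adj[u].items():
                nd = d + self.effective_time(attr)
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    prev[v] = u
                    heapq.heappush(heap, (nd, v))
        return None
```

For example, with one-way lanes `QC1 -> X1 -> YB3` and `YB3 -> X1`, a query from `YB3` back to `QC1` returns `None`, which is how a one-way restriction would surface as a failed reachability check.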
[0017] In one embodiment, the step of dynamically selecting multiple candidate storage locations for each transportation task specifically includes:
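[0018] For any transportation task $T_i$, its original attributes are $T_i = (o_i, g_i, [e_i, l_i])$, where $o_i$ is the starting point of the task, $g_i$ is the original target storage location, and $[e_i, l_i]$ is the task execution time window;
[0019] Construct for it a set of alternative storage locations $\Omega_i$, defined as:
[0020] $$\Omega_i = \{\, v \in \mathcal{V} \mid C(v) > 0,\ d(o_i, v) \le D_{\max} \,\}$$
[0021] where $\mathcal{V}$ is the set of operation nodes, $C(v)$ is the remaining capacity of storage location $v$, $d(o_i, v)$ is the distance from the starting point $o_i$ to storage location $v$, and $D_{\max}$ is the preset maximum operation distance threshold. By introducing $\Omega_i$, the transportation task $T_i$ is expanded to $\tilde{T}_i = (o_i, \Omega_i, [e_i, l_i])$.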
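The candidate-set construction can be sketched as a filter over a remaining-capacity table, assuming capacities are held in a dict keyed by node (nodes absent from it, such as road intersections, are treated as having zero capacity) and that the caller supplies a distance function. `Task`, `ExpandedTask`, `candidate_locations`, and `expand` are hypothetical names. Following the two stated criteria literally, the original target location is not forced into the set: if it is full, it drops out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Tuple

@dataclass(frozen=True)
class Task:
    task_id: str
    origin: str                  # task starting point
    target: str                  # originally assigned storage location
    window: Tuple[float, float]  # (earliest start, deadline)

@dataclass(frozen=True)
class ExpandedTask:
    task_id: str
    origin: str
    candidates: FrozenSet[str]   # alternative storage locations
    window: Tuple[float, float]

def candidate_locations(task: Task,
                        remaining_capacity: Dict[str, int],
                        distance: Callable[[str, str], float],
                        d_max: float) -> FrozenSet[str]:
    """Keep nodes with spare capacity that lie within d_max of the task origin.

    Nodes missing from remaining_capacity (e.g. intersections) are never
    candidates, which is equivalent to treating their capacity as zero.
    """
    return frozenset(
        v for v, cap in remaining_capacity.items()
        if cap > 0 and distance(task.origin, v) <= d_max
    )

def expand(task: Task, remaining_capacity: Dict[str, int],
           distance: Callable[[str, str], float], d_max: float) -> ExpandedTask:
    """Replace the single target with the candidate set (point-to-region task)."""
    return ExpandedTask(task.task_id, task.origin,
                        candidate_locations(task, remaining_capacity, distance, d_max),
                        task.window)
```

Recomputing `expand` whenever capacities change is what makes the selection dynamic in this sketch; how often the real system refreshes the set is not specified in the source.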
[0022] In one embodiment, identifying outbound and return tasks that meet the time window matching and path reachability conditions as connectable task pairs specifically involves:
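[0023] Task $T_i$ and task $T_j$ are defined as connectable when the following condition is met:
[0024] $$\hat{t}_i + \Delta_{ij} \le l_j$$
[0025] and there exists a feasible path $P_{ij}$ from the end location $v_i \in \Omega_i$ of task $T_i$ to the starting point $o_j$ of task $T_j$; where $\hat{t}_i$ is the estimated completion time of task $T_i$, $\Delta_{ij}$ is the empty transfer time required for a vehicle to drive from the end point of task $T_i$ to the starting point $o_j$, and $l_j$ is the deadline of the execution time window of task $T_j$. All task pairs $(T_i, T_j)$ that meet the above conditions constitute the candidate set for outbound and return task connections, $\mathcal{C}$.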
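The connection test reads directly as code: the preceding task's estimated completion time plus the empty transfer time must not exceed the subsequent task's deadline, and a feasible path must exist. The sketch below assumes the empty-transfer lookup returns `None` when no feasible path exists; `PlannedTask`, `connectable`, and `connection_candidates` are hypothetical names, and the brute-force pairing is for illustration rather than a claim about how the real system scales.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Set, Tuple

@dataclass(frozen=True)
class PlannedTask:
    task_id: str
    origin: str            # starting point of the task
    end: str               # chosen end location of the task
    deadline: float        # latest start time of the task's window
    est_completion: float  # estimated completion time of the task

# Returns the empty-transfer time between two nodes, or None when no
# feasible path exists (e.g. blocked road or one-way restriction).
EmptyTransfer = Callable[[str, str], Optional[float]]

def connectable(prev: PlannedTask, nxt: PlannedTask,
                empty_transfer: EmptyTransfer) -> bool:
    """Time-window matching plus path reachability for the pair (prev, nxt)."""
    dt = empty_transfer(prev.end, nxt.origin)
    if dt is None:                      # path reachability fails
        return False
    return prev.est_completion + dt <= nxt.deadline

def connection_candidates(tasks: Iterable[PlannedTask],
                          empty_transfer: EmptyTransfer) -> Set[Tuple[str, str]]:
    """All ordered pairs (preceding, subsequent) that can be chained."""
    tasks = list(tasks)
    return {
        (a.task_id, b.task_id)
        for a in tasks for b in tasks
        if a.task_id != b.task_id and connectable(a, b, empty_transfer)
    }
```

Pairs are ordered: an outbound task that can feed a return task does not imply the reverse connection.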
[0026] In one embodiment, modeling the vehicle transportation task scheduling process as a Markov decision process specifically includes:
[0027] At decision moment $t$, the system state $s_t$ is defined as $s_t = (\mathcal{T}_t, \mathcal{K}_t, \mathcal{N}_t)$, where $\mathcal{T}_t$ is the set of transportation tasks to be executed, $\mathcal{K}_t$ is the set of vehicle states, and $\mathcal{N}_t$ is the current network state. The action $a_t$ that the reinforcement learning model outputs for a vehicle is defined as $a_t = (T_i, v, P)$, where $T_i \in \mathcal{T}_t$ is the selected transportation task, $v \in \Omega_i$ is the storage location selected from the task's alternative storage location set, and $P$ is the planned vehicle travel route.
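To make the state and action structure concrete, the sketch below enumerates the (task, storage location, path) triples available to one vehicle. It assumes a hypothetical route planner that goes from the vehicle's node to the task origin and then to the chosen location, returning `None` when no route exists; `State`, `Action`, and `feasible_actions` are illustrative names. The policy would then choose among these triples, so the action space grows with the size of each task's candidate set.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional, Sequence, Tuple

@dataclass(frozen=True)
class State:
    # pending tasks: (task_id, origin, candidate storage locations)
    pending: Tuple[Tuple[str, str, FrozenSet[str]], ...]
    vehicles: Tuple[Tuple[str, str], ...]        # (vehicle_id, current node)
    network: Tuple[Tuple[str, str, float], ...]  # (u, v, congestion) snapshot

@dataclass(frozen=True)
class Action:
    task_id: str
    location: str            # chosen from the task's candidate set
    path: Tuple[str, ...]    # planned driving route

# Hypothetical route planner: vehicle node -> task origin -> storage location,
# returning None when no route exists.
PathPlanner = Callable[[str, str, str], Optional[Sequence[str]]]

def feasible_actions(state: State, vehicle_node: str,
                     plan_path: PathPlanner) -> List[Action]:
    """Enumerate (task, location, path) actions available to one vehicle."""
    actions = []
    for task_id, origin, candidates in state.pending:
        for loc in sorted(candidates):          # deterministic order
            path = plan_path(vehicle_node, origin, loc)
            if path is not None:
                actions.append(Action(task_id, loc, tuple(path)))
    return actions
```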
[0028] In one embodiment, the multi-objective reward function is specifically defined as:
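[0029] $$r_t = -\left(\lambda_1 m_t + \lambda_2 w_t + \lambda_3 \delta_t\right)$$
[0030] where $m_t$ is the vehicle's empty mileage within the decision step, $w_t$ is the vehicle's operation waiting time, $\delta_t$ is the task delay, and $\lambda_1, \lambda_2, \lambda_3$ are preset weights. The reinforcement learning model updates its decision policy by maximizing the expected cumulative reward $J(\pi) = \mathbb{E}_{\pi}\left[\sum_{t} \gamma^{t} r_t\right]$, where $\gamma \in (0, 1]$ is the discount factor.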
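The reward can be sketched as a negative weighted cost, so that maximizing reward drives empty mileage, waiting time, and delay down together. The units (kilometres, seconds), the default weights, and the discount factor below are placeholders chosen for illustration; the source states only that the weights are preset.

```python
from typing import Sequence, Tuple

def step_reward(empty_km: float, wait_s: float, delay_s: float,
                weights: Tuple[float, float, float] = (1.0, 0.01, 0.05)) -> float:
    """Negative weighted cost, so maximizing reward minimizes all three terms.

    The default weights are placeholders; the source only says they are preset.
    """
    w_empty, w_wait, w_delay = weights
    return -(w_empty * empty_km + w_wait * wait_s + w_delay * delay_s)

def discounted_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Cumulative reward sum_t gamma**t * r_t, accumulated backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

For instance, 2 km of empty driving plus 100 s of waiting with weights `(1.0, 0.01, 0.05)` gives a step reward of about -3.0; the relative weights decide how many seconds of waiting the policy will trade for a kilometre of empty running.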
[0031] In one embodiment, the use of a reinforcement learning model based on the Actor-Critic architecture for collaborative optimization decision-making specifically includes:
[0032] Each transport vehicle is modeled as an independent agent, and each agent is equipped with a neural network model based on the Actor-Critic architecture. For the current system state $s_t$, the Actor network outputs a probability distribution $\pi_\theta(a_t \mid s_t)$ over actions $a_t$, which is used to select tasks, storage locations, and paths; the Critic network evaluates the value $V_\phi(s_t)$ of state $s_t$, which is used to guide policy updates of the Actor network;
[0033] Through backpropagation of the multi-objective reward function $r_t$, the policies of all vehicle agents are optimized collaboratively to achieve the best global transportation efficiency.
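The toy below illustrates the Actor-Critic update pattern only. It replaces the neural Actor and Critic networks with a table of action preferences and a single value estimate, so the policy gradient is written by hand rather than obtained by backpropagation, and it treats each decision as a one-step episode. Having all vehicle agents learn from one shared global reward is an assumption about how "collaborative" optimization might be wired. `TabularActorCritic` and `train` are hypothetical names.

```python
import math
import random
from typing import Callable, List, Sequence

class TabularActorCritic:
    """Softmax actor plus scalar critic for a single decision state (toy)."""

    def __init__(self, n_actions: int, actor_lr: float = 0.1,
                 critic_lr: float = 0.1, seed: int = 0):
        self.prefs = [0.0] * n_actions   # actor parameters (action preferences)
        self.value = 0.0                 # critic's estimate of the state value
        self.actor_lr = actor_lr
        self.critic_lr = critic_lr
        self.rng = random.Random(seed)

    def policy(self) -> List[float]:
        m = max(self.prefs)                          # subtract max for stability
        exps = [math.exp(p - m) for p in self.prefs]
        z = sum(exps)
        return [e / z for e in exps]

    def act(self) -> int:
        probs = self.policy()
        return self.rng.choices(range(len(probs)), weights=probs, k=1)[0]

    def update(self, action: int, reward: float) -> float:
        # Each decision is treated as terminal, so the TD target is the reward.
        td_error = reward - self.value
        self.value += self.critic_lr * td_error      # critic step
        probs = self.policy()
        for a in range(len(self.prefs)):
            # Gradient of log softmax with respect to preference a.
            grad = (1.0 if a == action else 0.0) - probs[a]
            self.prefs[a] += self.actor_lr * td_error * grad   # actor step
        return td_error

def train(agents: Sequence[TabularActorCritic],
          shared_reward: Callable[[List[int]], float], steps: int) -> None:
    """Every vehicle agent acts, then all learn from one shared global reward."""
    for _ in range(steps):
        joint = [agent.act() for agent in agents]
        r = shared_reward(joint)
        for agent, a in zip(agents, joint):
            agent.update(a, r)
```

In a usage example where action 0 means "take the chained return task" (0.5 km empty) and action 1 means "drive elsewhere" (3.0 km empty), two agents trained on the negated total empty mileage learn to prefer action 0 with high probability. The critic's value estimate acts as a baseline that keeps the actor's updates centred.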
[0034] In one embodiment, the dynamic adjustment of the connection relationship and travel path of subsequent tasks specifically includes:
[0035] Dynamic adjustments are triggered when vehicle execution progress deviates from expectations, target storage location capacity changes abruptly, or critical path congestion occurs in the work network.
[0036] Based on the current system state $s_t$, the reinforcement learning model makes new decisions for the affected vehicles, calculates new task connections and travel routes, and updates the task scheduling schemes issued to the corresponding vehicles to maintain the continuity of the transportation chain.
[0037] The present invention also provides a spatiotemporal task scheduling device for port transportation, the device comprising:
[0038] The transportation task acquisition module is used to acquire multiple transportation tasks to be executed. Each transportation task includes a defined starting point, the original target storage location, and the task execution time window.
[0039] The candidate storage location filtering module is used to dynamically filter multiple candidate storage locations for each transportation task based on the remaining capacity of the storage location and the operation distance threshold from the task starting point, forming a set of alternative storage locations for the task, so as to expand the transportation task from a single target location to multiple candidate target locations.
[0040] The connectable task pair identification module is used to analyze the expanded transportation tasks within the entire shoreline based on the candidate storage location set, identify outbound and return tasks that meet the time window matching and path reachability conditions as connectable task pairs, and form a candidate set for connecting outbound and return tasks; wherein, the time window matching and path reachability conditions refer to: the sum of the estimated completion time of the preceding task and the empty transfer time from the end position of the preceding task to the starting point of the subsequent task does not exceed the deadline of the execution time window of the subsequent task, and there is a feasible path connecting the two task positions;
[0041] The decision-making process modeling module is used to model the vehicle transportation task scheduling process as a Markov decision process and adopts a reinforcement learning model based on the Actor-Critic architecture for collaborative optimization decision-making. The reinforcement learning model outputs actions to the vehicle based on the current system state, which includes the tasks to be executed, the vehicle state, and the operation network state. The actions include selecting a task from the tasks to be executed, selecting a storage location from the set of candidate storage locations for the task, and planning a driving path. The reinforcement learning model updates the strategy by maximizing the expected cumulative reward, and its reward function is a multi-objective reward function that comprehensively optimizes the vehicle's empty driving mileage, operation waiting time, and task delay.
[0042] The task orchestration scheme execution module is used to generate a vehicle transportation task orchestration scheme containing task sequences, selected storage locations, and driving routes based on the optimization decision results, and then distribute it to the corresponding vehicles for execution. During the execution process, when deviations or environmental changes are detected, the module dynamically adjusts the connection relationship and driving route of subsequent tasks.
[0043] The present invention also provides an electronic device, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the spatiotemporal task orchestration method for port transportation as described above.
[0044] The aforementioned spatiotemporal task orchestration method, apparatus, and electronic equipment for port transportation fundamentally breaks the rigid binding relationship between tasks and single fixed storage locations by dynamically constructing a set of candidate storage locations for each transportation task. This expands tasks from a single target location to multiple candidate target locations, thus providing spatial flexibility for task rearrangement and combination. Based on this, by analyzing the expanded tasks across the entire shoreline, pairs of outbound and return tasks that meet the time window matching and path reachability conditions are identified. A candidate set for connecting outbound and return tasks is proactively constructed, creating conditions for vehicles to form a continuous outbound-return transportation chain. Furthermore, a reinforcement learning model based on the Actor-Critic architecture is used for collaborative optimization decision-making. This model outputs actions to vehicles based on the system state, including task selection, selection of a location from the candidate location set, and driving path. It guides strategy updates through a multi-objective reward function aimed at comprehensively optimizing empty mileage, waiting time, and task delays. This achieves integrated collaborative orchestration of vehicle task sequences, storage locations, and driving paths from a global perspective, shifting the scheduling objective from single-task optimization to overall continuous optimization of the transportation chain. Ultimately, based on the optimization results, a task scheduling scheme is generated and dynamic adjustments are supported during execution, thereby ensuring the continuity and coordination of vehicle transportation operations. 
This scheme effectively overcomes the inherent defects of frequent empty runs and difficult task coordination caused by fixed task target locations, and realizes optimized scheduling of "cargo-crane-vehicle" relay continuous operations along the entire port shoreline.
Attached Figure Description
[0045] To more clearly illustrate the technical solutions in this invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.
[0046] Figure 1 is a flowchart of a spatiotemporal task orchestration method for port transportation according to an embodiment of the present invention;
[0047] Figure 2 is a schematic diagram of the connections between quay crane and yard operation area nodes according to an embodiment of the present invention;
[0048] Figure 3 is a spatial topology view of outbound and return task connection in an embodiment of the present invention;
[0049] Figure 4 is a collaborative view of multi-vehicle task time windows according to an embodiment of the present invention;
[0050] Figure 5 is a schematic diagram of the processing core architecture of the central server according to an embodiment of the present invention;
[0051] Figure 6 is a schematic diagram of a spatiotemporal task orchestration device for port transportation according to an embodiment of the present invention;
[0052] Figure 7 is an internal structural diagram of an electronic device according to one embodiment.
Detailed Implementation
[0053] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0054] The spatiotemporal task scheduling method, apparatus, and electronic equipment for port transportation provided by the present invention are described below with reference to Figures 1-7.
[0055] As shown in Figure 1, in one embodiment a spatiotemporal task orchestration method for port transportation is provided. It aims to solve problems of traditional port transportation such as the rigid coupling between tasks and locations, high empty-running rates, and resource waste caused by poor coordination between outbound and return tasks. The method achieves flexible task expansion by constructing a set of alternative storage locations, and uses a reinforcement learning model to achieve seamless connection and collaborative optimization of outbound and return tasks. Specifically, it includes the following steps:
[0056] Step S110: Obtain multiple transportation tasks to be executed. Each transportation task includes a defined starting point, the original target storage location, and the task execution time window.
[0057] Transportation tasks typically originate from instructions issued by the Terminal Operating System (TOS) or a higher-level scheduling system. For each transportation task, the starting point usually refers to the current location of the container, such as the receiving point under the quay crane or the pickup point in the yard; the original target storage location refers to the container storage location initially assigned by the system, such as a specific yard container area; the task execution time window specifies that vehicles must arrive and begin operations within this time frame to ensure the operational efficiency of the quay crane or yard crane. It should be understood that the task information obtained in this embodiment is not limited to the above, and may also include container attributes such as size, weight, and priority, to provide richer data support for subsequent optimization decisions.
[0058] Step S120: Based on the remaining capacity of the storage location and the operation distance threshold from the task starting point, multiple candidate storage locations are dynamically selected for each transportation task to form a set of alternative storage locations for the task, so as to expand the transportation task from a single target location to multiple candidate target locations.
[0059] In traditional scheduling models, each task, whether unloading or loading, is assigned a unique and fixed work location upon generation. This rigid binding forces vehicles to travel to that fixed location to complete the task, severely limiting the flexibility of system optimization. After a vehicle completes an unloading task, its next task's starting point is locked at the same fixed storage yard location where unloading just occurred. The system can only search for loading tasks with matching time windows near that fixed point. If there are no suitable tasks nearby or the path to that fixed point is temporarily congested, vehicles are often forced to drive empty to other areas, thus disrupting the continuous cargo handling chain. This fixed-location-based task connection is essentially a proximity matching logic. While it can reduce some empty runs, it cannot proactively build a closed-loop transportation chain naturally coupled by outbound and return tasks at the global level. This embodiment breaks this rigid binding relationship, allowing tasks to have multiple selectable target locations under certain constraints. The system monitors the remaining capacity of each storage location in real time to ensure that candidate locations have space to accommodate containers. Simultaneously, the distance between each candidate location and the task starting point is calculated, and locations within a preset threshold range are selected to avoid increasing transportation costs due to selecting locations that are too far away. By constructing a set of alternative storage locations, the originally simple point-to-point transportation task is expanded into a flexible point-to-region task, significantly expanding the task's executable space and providing the necessary spatial conditions for the sequential combination of outbound and return tasks. 
This greatly increases the solution space for subsequent task matching and path optimization, creating the preconditions for reducing empty-running rates.
[0060] Step S130: Based on the candidate storage location set, analyze the expanded transportation tasks within the entire shoreline area, identify outbound and return tasks that meet the time window matching and path reachability conditions as connectable task pairs, and form a candidate set for connecting outbound and return tasks.
[0061] The time window matching and path reachability conditions refer to the following: the sum of the estimated completion time of the preceding task and the empty transfer time from the end position of the preceding task to the starting point of the subsequent task does not exceed the deadline of the execution time window of the subsequent task, and there exists a feasible path connecting the two task positions.
[0062] Within the entire shoreline, the system analyzes the flexibly expanded task set to identify task pairs that can be executed consecutively. For example, if an AGV (Automated Guided Vehicle) completes its outbound unloading task, and a return loading task is about to begin near its end point, and the AGV can reach the starting point of that task before its time window expires, then these two tasks are identified as a connectable task pair. The time window matching condition here ensures that the vehicle has sufficient time to move to the subsequent task location after completing the preceding task, without delaying the execution of the subsequent task. The path reachability condition ensures that there is a physically accessible road between the two task locations, and that there is no severe congestion. By forming a candidate set of outbound and return task connections, the system plans a continuous outbound-return operation chain for the vehicle, fundamentally reducing the situation of vehicles traveling empty.
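A worked instance of the AGV example above, using hypothetical times in seconds from the start of a shift. Write $\hat{t}_i$ for the outbound task's estimated completion time, $\Delta_{ij}$ for the empty transfer time to the return task's starting point, and $l_j$ for the deadline of the return task's time window. The same pair passes on a free path and fails when congestion stretches the transfer:

```latex
\begin{aligned}
&\text{Free path:}      && \hat{t}_i + \Delta_{ij} = 600 + 90  = 690 \le 720 = l_j
                        && \Rightarrow \text{connectable (30 s slack)} \\
&\text{Congested path:} && \hat{t}_i + \Delta_{ij} = 600 + 150 = 750 > 720 = l_j
                        && \Rightarrow \text{not connectable}
\end{aligned}
```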
[0063] Step S140: The vehicle transportation task scheduling process is modeled as a Markov decision process, and a reinforcement learning model based on the Actor-Critic architecture is used for collaborative optimization decision-making.
[0064] The reinforcement learning model outputs actions to the vehicle based on the current system state, which includes the tasks to be executed, the vehicle state, and the job network state. These actions include selecting a task from the tasks to be executed, selecting a storage location from the set of candidate storage locations for that task, and planning a driving path. The reinforcement learning model updates its policy by maximizing the expected cumulative reward, and its reward function is a multi-objective reward function that comprehensively optimizes the vehicle's empty driving mileage, job waiting time, and task delay.
[0065] The port transportation environment is highly dynamic and uncertain, making it difficult for traditional rule-based algorithms to handle complex real-time changes. This embodiment employs reinforcement learning to transform the scheduling problem into a sequential decision-making problem. Under Markov Decision Process (MDP) modeling, the learned policy maps system states (e.g., current tasks, vehicle locations, road conditions) to vehicle actions (e.g., choosing a task, choosing a storage location, choosing a route). The Actor-Critic architecture combines policy gradient methods with value function approximation: the Actor network generates specific scheduling actions, while the Critic network evaluates the quality of the current state and thereby reduces the variance of the Actor's updates. Training the two together helps the model converge toward an effective policy. The design of the reward function directly determines the model's behavior. This embodiment jointly considers three key indicators, namely empty mileage, waiting time, and task delay, guiding the model to reduce all three together and thereby optimize overall transportation efficiency.
[0066] Step S150: Based on the optimization decision results, generate a vehicle transportation task scheduling scheme that includes task sequence, selected storage location and driving route, and send it to the corresponding vehicle for execution; during the execution process, when deviation or environmental change is detected, dynamically adjust the connection relationship and driving route of subsequent tasks.
[0067] After decision-making by the reinforcement learning model, the system generates specific execution plans and issues them to AGVs or container trucks for execution. However, various disturbances inevitably occur during actual operations, such as vehicle malfunctions, temporary congestion in the yard, and changes in the rhythm of quay crane operations. This embodiment introduces a dynamic adjustment mechanism to monitor the execution progress and environmental status in real time. Once the deviation between the actual state and the expected plan exceeds a preset threshold, the system will re-trigger the decision-making process based on the current real-time status, calculate new task connections and travel paths, and update and issue them to the vehicles in a timely manner. This closed-loop feedback control mechanism ensures the robustness and continuity of the transportation chain in the face of emergencies, avoiding the interruption of the entire transportation chain due to local anomalies.
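The trigger logic for dynamic adjustment can be sketched as a set of threshold checks covering the three triggers named in this method: execution progress deviating from the plan, an abrupt change in target-location capacity, and congestion on a planned path. All threshold values, and the rule that a full target location always triggers, are hypothetical; `ReplanThresholds` and `replan_triggers` are illustrative names. An empty result keeps the current plan; a non-empty one would re-invoke the decision model for the affected vehicle.

```python
from dataclasses import dataclass
from typing import List, Sequence

@dataclass(frozen=True)
class ReplanThresholds:
    """Hypothetical preset thresholds; the source does not give values."""
    max_delay_s: float = 60.0        # tolerated lateness against the plan
    max_capacity_drop: int = 3       # abrupt drop in target-location capacity
    max_congestion: float = 0.8      # congestion coefficient on a planned edge

def replan_triggers(planned_arrival_s: float, current_eta_s: float,
                    capacity_at_plan: int, capacity_now: int,
                    path_congestion: Sequence[float],
                    th: ReplanThresholds = ReplanThresholds()) -> List[str]:
    """Return the reasons (if any) to re-run the decision step for a vehicle."""
    reasons = []
    if current_eta_s - planned_arrival_s > th.max_delay_s:
        reasons.append("progress_deviation")
    if capacity_now <= 0 or capacity_at_plan - capacity_now >= th.max_capacity_drop:
        reasons.append("capacity_change")
    if max(path_congestion, default=0.0) > th.max_congestion:
        reasons.append("path_congestion")
    return reasons
```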
[0068] This embodiment realizes a complete closed loop from task acquisition, flexible expansion, connection identification to optimization decision-making and dynamic execution, effectively solving the problems of high empty-load rate and low resource utilization in port horizontal transportation, and significantly improving the overall operational efficiency of the entire shoreline.
[0069] In one embodiment, the present invention further describes in detail the specific mathematical principles and implementation process of task modeling and connection recognition.
[0070] First, regarding the construction of the operation network graph. Before the multiple transportation tasks to be executed are obtained, the process also includes abstracting the terminal operation area into a directed operation network graph $G = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ is the set of operation nodes, including quay crane operation points, yard crane operation points, vehicle waiting points, and road intersections; $\mathcal{E}$ is the set of directed paths on which vehicles can travel, and each path $(u, v) \in \mathcal{E}$ is associated with an attribute vector $\mathbf{x}_{uv} = (\ell_{uv}, t_{uv}, \kappa_{uv})$ representing path length, estimated travel time, and congestion level. Referring to Figure 2, the quay crane operation area and the yard operation area are connected by a road network; the nodes represent specific operation locations, and the edges between nodes represent paths passable by vehicles. By constructing this directed graph, the complex physical port environment is transformed into a topology that a computer can process, providing a mathematical foundation for subsequent path planning and task matching. It should be understood that the node set $\mathcal{V}$ is not limited to the operation points listed above and may also include auxiliary facility nodes such as charging stations and maintenance stations. The edge attribute vector may further include parameters such as road width and speed limit to simulate the actual operating environment more accurately.
[0071] Secondly, regarding the dynamic selection of candidate storage locations. For any transportation task $T_i$, its original attributes are $T_i = (o_i, g_i, [e_i, l_i])$, where $o_i$ is the starting point of the task, $g_i$ is the original target storage location, and $[e_i, l_i]$ is the task execution time window. A set of alternative storage locations $\Omega_i$ is constructed for it, defined as $\Omega_i = \{\, v \in \mathcal{V} \mid C(v) > 0,\ d(o_i, v) \le D_{\max} \,\}$, where $\mathcal{V}$ is the set of operation nodes, $C(v)$ is the remaining capacity of storage location $v$, $d(o_i, v)$ is the distance from the starting point $o_i$ to storage location $v$, and $D_{\max}$ is the preset maximum operation distance threshold. By introducing $\Omega_i$, the transportation task $T_i$ is expanded to $\tilde{T}_i = (o_i, \Omega_i, [e_i, l_i])$.
[0072] Starting from the task origin $o_i$, the system does not lock onto the original target $d_i$ alone; instead, it searches for all nodes that satisfy the criteria. This design directly overcomes the fundamental limitation of traditional scheduling modes, in which the target location of a task is single and fixed. In the traditional mode, because the target location of each task is uniquely determined, the system has an extremely narrow selection space when searching for sequences of consecutive tasks for a vehicle, and a single fixed location greatly limits the probability that sequential combinations are discovered and successfully matched. By constructing the candidate storage location set $S_i$, this embodiment removes the constraint of a single fixed location. In the formula, the condition $\mathrm{cap}(k) > 0$ ensures that the candidate location has the physical space to receive containers, avoiding the situation in which a vehicle cannot unload after arriving because the yard is full; the condition $\mathrm{dist}(o_i, k) \le L_{\max}$ limits the search scope and excludes locations so far away that transportation costs would be excessive, thus balancing search efficiency against transportation benefit. In this way, the originally rigid point-to-point task is expanded into a flexible point-to-region task, which significantly enlarges the task execution space and provides the spatial conditions needed to combine outbound and return tasks sequentially. This greatly increases the solution space for subsequent task matching and path optimization and overcomes the limitations of traditional nearest-neighbor matching logic.
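The definition of $S_i$ translates almost directly into a set comprehension. The following is a minimal sketch, assuming remaining capacity is available as a dictionary and distance as a callable; the `Task`/`ExpandedTask` records, node names, and numbers are hypothetical illustrations rather than the embodiment's interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    task_id: str
    origin: str    # o_i, task starting point
    target: str    # d_i, original target storage location
    window: tuple  # (t_min, t_max), task execution time window in seconds


@dataclass(frozen=True)
class ExpandedTask:
    task_id: str
    origin: str
    candidates: frozenset  # S_i, candidate storage location set
    window: tuple


def candidate_set(task, storage_nodes, remaining_capacity, dist, l_max):
    """S_i = {k in V | cap(k) > 0 and dist(o_i, k) <= L_max}."""
    return frozenset(
        k for k in storage_nodes
        if remaining_capacity.get(k, 0) > 0 and dist(task.origin, k) <= l_max
    )


def expand(task, storage_nodes, remaining_capacity, dist, l_max):
    """Turn a point-to-point task into a point-to-region task (o_i, S_i, window)."""
    s_i = candidate_set(task, storage_nodes, remaining_capacity, dist, l_max)
    return ExpandedTask(task.task_id, task.origin, s_i, task.window)


distances = {("QC1", "YB1"): 300.0, ("QC1", "YB2"): 200.0,
             ("QC1", "YB3"): 700.0, ("QC1", "YB4"): 1200.0}
capacity = {"YB1": 5, "YB2": 0, "YB3": 2, "YB4": 10}
task = Task("U1", "QC1", "YB2", (0.0, 300.0))
expanded = expand(task, ["YB1", "YB2", "YB3", "YB4"], capacity,
                  lambda a, b: distances[(a, b)], l_max=800.0)
print(sorted(expanded.candidates))  # ['YB1', 'YB3']: YB2 is full, YB4 is too far
```

In this toy case the original target `YB2` is already full, which is exactly the situation the capacity condition guards against: the task still has two reachable alternatives instead of none.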
[0073] Finally, regarding the identification of connectable task pairs: outbound and return tasks that satisfy the time window matching and path reachability conditions are identified as connectable task pairs. Specifically, task $T_i$ and task $T_j$ are defined as connectable when $\hat{t}_i^{\mathrm{fin}} + \tau_{ij} \le t_j^{\max}$ and there exists a feasible path $p_{ij}$ from the end location of task $T_i$ to the starting point $o_j$ of task $T_j$, where $\hat{t}_i^{\mathrm{fin}}$ is the estimated completion time of task $T_i$ and $\tau_{ij}$ is the empty transfer time required for a vehicle to drive from the end point of task $T_i$ to the starting point $o_j$ of task $T_j$. All task pairs $(T_i, T_j)$ that satisfy these conditions constitute the candidate set $\mathcal{P}$ for outbound and return task connections. Referring to Figure 3 and Figure 4, Figure 3 shows the spatial topological relationship between an outbound task $T_i$ and a return task $T_j$, where dashed arrows indicate empty transfer runs, and Figure 4 illustrates the time window matching logic. The inequality $\hat{t}_i^{\mathrm{fin}} + \tau_{ij} \le t_j^{\max}$ is the key constraint for a successful connection: the completion time of the preceding task plus the empty transfer time must not exceed the latest start time of the subsequent task. This ensures that the vehicle has enough time to move between the two tasks without delaying the subsequent task. At the same time, the path reachability condition ensures spatial connectivity and excludes connections that are infeasible because of road blockages or one-way restrictions. It should be emphasized that, because the preceding step has already constructed a candidate storage location set $S_i$ for each task, the end location of task $T_i$ is no longer locked to a single fixed location but can be selected from multiple candidate locations. This allows the system to search for matching opportunities over a much wider space when identifying connectable task pairs, rather than being limited to the rigid nearest-neighbor matching of the traditional model.
Through this dual-condition screening, the system can accurately identify task pairs that form a loaded-outbound, loaded-return closed loop, actively constructing a closed-loop transportation chain from the natural coupling of outbound and return tasks and providing high-quality input for the subsequent collaborative optimization decisions.
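The dual-condition screening can be sketched as a nested loop over outbound and return tasks. This is a minimal sketch under stated assumptions: `est_finish(task, k)` and `empty_transfer(k, o)` are hypothetical callbacks (the latter returns `None` when no feasible path exists), the predecessor may end at any location in its candidate set, and keeping the end location with the largest time slack is an illustrative tie-break rather than the embodiment's rule.

```python
from collections import namedtuple

# Same shape as an expanded task (o_i, S_i, [t_min, t_max]); fields are illustrative.
ExpandedTask = namedtuple("ExpandedTask", "task_id origin candidates window")


def connectable_pairs(outbound, returns, est_finish, empty_transfer):
    """Return (i, j, k, slack) for every pair satisfying both conditions.

    est_finish(task, k): estimated completion time if `task` ends at location k.
    empty_transfer(k, o): empty transfer time from k to o, or None if no feasible path.
    For each pair, the end location k in S_i with the largest time slack is kept.
    """
    pairs = []
    for ti in outbound:
        for tj in returns:
            best = None
            for k in sorted(ti.candidates):
                tau = empty_transfer(k, tj.origin)
                if tau is None:  # path reachability condition fails
                    continue
                slack = tj.window[1] - (est_finish(ti, k) + tau)  # t_j^max - (t_i^fin + tau_ij)
                if slack >= 0 and (best is None or slack > best[1]):
                    best = (k, slack)
            if best is not None:
                pairs.append((ti.task_id, tj.task_id, best[0], best[1]))
    return pairs


finish = {("U1", "YB1"): 150, ("U1", "YB3"): 200}
transfer = {("YB1", "YB2"): 60, ("YB1", "YB5"): 90, ("YB3", "YB5"): 30}  # (YB3, YB2) blocked

u1 = ExpandedTask("U1", "QC1", {"YB1", "YB3"}, (0, 300))
l1 = ExpandedTask("L1", "YB2", {"QC2"}, (100, 250))
l2 = ExpandedTask("L2", "YB5", {"QC2"}, (0, 180))

pairs = connectable_pairs([u1], [l1, l2],
                          lambda t, k: finish[(t.task_id, k)],
                          lambda k, o: transfer.get((k, o)))
print(pairs)  # [('U1', 'L1', 'YB1', 40)]
```

`L1` connects only through `YB1` (150 + 60 = 210 ≤ 250), because the path from `YB3` is blocked; `L2` fails the time window from both candidates, so it is not paired.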
[0074] In one embodiment, the construction and decision-making process of the reinforcement learning model is described in detail.
[0075] First, regarding Markov decision process (MDP) modeling. Modeling the vehicle transportation task scheduling process as a Markov decision process specifically includes the following. At decision time $t$, the system state is defined as $s_t = (\mathcal{T}_t, \mathcal{V}_t, \mathcal{N}_t)$, where $\mathcal{T}_t$ is the set of transportation tasks to be executed, $\mathcal{V}_t$ is the set of vehicle states, and $\mathcal{N}_t$ is the current operation network state. The system state $s_t$ is a comprehensive description of the complex operating environment of the port: $\mathcal{T}_t$ contains the attribute information of all tasks to be executed at the current moment, such as start point, end point, time window, and priority, reflecting the dynamic changes of the task pool; $\mathcal{V}_t$ contains information such as the location, load status, and battery level of all vehicles, reflecting the real-time distribution of transportation capacity; and $\mathcal{N}_t$ contains information on road network congestion and road closures, reflecting the constraints of the operating environment. By combining these three elements, the model can comprehensively perceive the current environment, which provides accurate input for subsequent decisions. The action that the reinforcement learning model outputs to a vehicle is defined as $a_t = (T_i, k, p)$, where $T_i$ is the transportation task selected from $\mathcal{T}_t$, $k$ is the storage location selected from the task's candidate storage location set $S_i$, and $p$ is the planned vehicle travel path.
[0076] The action $a_t$ is a composite action: it determines not only which task the vehicle takes ($T_i$) but also which storage location ($k$) it delivers to and which path ($p$) it follows. Defining a composite action space in this way allows the model to solve the three coupled problems of task allocation, location selection, and path planning simultaneously within a single decision step. This avoids the error accumulation caused by hierarchical decision-making and improves decision consistency and global optimality. It should be emphasized that, because the preceding step has already constructed a candidate storage location set $S_i$ for each task, the location selection dimension of the action space is significantly expanded. In traditional scheduling models, the target location of a task is uniquely determined, and the decision space is limited to choosing which task and which path to take. This embodiment instead makes location selection a decision variable, enabling the model to search for the optimal solution in a larger solution space. This expanded decision space gives the model the ability to construct closed-loop transportation chains proactively, rather than being limited to the rigid nearest-neighbor matching logic of the traditional model.
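The composite action $a_t = (T_i, k, p)$ can be made concrete by enumerating the joint choices available in one state. In this sketch the `SystemState` fields, the `route_options` generator (two routes per location), and all task and location names are hypothetical; it only illustrates how candidate storage sets enlarge the action space compared with a fixed target.

```python
from dataclasses import dataclass, field


@dataclass
class SystemState:
    """s_t = (T_t, V_t, N_t); the field contents are illustrative placeholders."""
    pending_tasks: list                            # T_t
    vehicles: dict = field(default_factory=dict)   # V_t: vehicle_id -> status
    network: dict = field(default_factory=dict)    # N_t: edge -> congestion / closure


def composite_actions(state, candidate_sets, paths):
    """Enumerate every a_t = (task, storage location, path) available in state s_t."""
    actions = []
    for task in state.pending_tasks:
        for k in sorted(candidate_sets[task]):
            for p in paths(task, k):
                actions.append((task, k, tuple(p)))
    return actions


def route_options(task, k):
    # Hypothetical: two alternative routes to every location.
    return [["start", "X1", k], ["start", "X2", k]]


state = SystemState(pending_tasks=["T1", "T2"])
flexible = composite_actions(state, {"T1": {"YB1", "YB3"}, "T2": {"YB4"}}, route_options)
fixed = composite_actions(state, {"T1": {"YB1"}, "T2": {"YB4"}}, route_options)
print(len(fixed), len(flexible))  # 4 6
```

Exhaustive enumeration is only for illustration; a policy network would normally output a distribution over these joint choices instead of listing them.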
[0077] Secondly, regarding the design of the multi-objective reward function. The multi-objective reward function is defined as $r_t = -(\lambda_1 E_t + \lambda_2 W_t + \lambda_3 D_t)$, where $E_t$ is the vehicle's empty mileage within the decision step, $W_t$ is the vehicle's operation waiting time, $D_t$ is the amount of task delay, and $\lambda_1, \lambda_2, \lambda_3$ are preset weight coefficients. The reward function is the core mechanism that guides the optimization direction of the reinforcement learning model. This embodiment uses negative rewards (penalties) to steer the model away from undesirable behavior: the $E_t$ term penalizes unloaded driving distance, which corresponds directly to the goals of reducing energy consumption and improving capacity utilization; the $W_t$ term penalizes unnecessary waiting time at work sites, corresponding to the goal of improving work efficiency; and the $D_t$ term penalizes tasks that run past their deadlines, corresponding to the goals of on-time shipping and operational efficiency. The weight coefficients $\lambda_1, \lambda_2, \lambda_3$ allow port managers to tune the strategy to actual operational needs. For example, during periods of tight vessel schedules, $\lambda_3$ can be increased so that the model prioritizes timely task completion; when low-cost operation is the goal, $\lambda_1$ can be increased so that the model prioritizes reducing empty runs. The reinforcement learning model updates its decision policy by maximizing the expected cumulative reward $\mathbb{E}_\pi\left[\sum_{t} \gamma^{t} r_t\right]$, where $\gamma \in (0, 1]$ is a discount factor. This means the model does not focus only on the reward at the current moment; by maximizing the expected long-term return, it learns a policy that balances current benefit against future impact, thereby optimizing overall transportation efficiency.
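The reward and the discounted return can be written in a few lines. This is a minimal sketch in which the units (kilometres for empty mileage, seconds for waiting and delay) and both weight profiles are hypothetical values chosen for illustration, not parameters from the embodiment.

```python
def step_reward(empty_km, wait_s, delay_s, weights):
    """r_t = -(lambda_1 * E_t + lambda_2 * W_t + lambda_3 * D_t)."""
    lam1, lam2, lam3 = weights
    return -(lam1 * empty_km + lam2 * wait_s + lam3 * delay_s)


def discounted_return(rewards, gamma):
    """Sum over t of gamma^t * r_t, accumulated backwards (Horner-style)."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical weight profiles: a "tight schedule" profile raises lambda_3 (delay),
# a "low cost" profile raises lambda_1 (empty mileage).
TIGHT_SCHEDULE = (1.0, 0.5, 2.0)
LOW_COST = (4.0, 0.5, 0.5)

r_tight = step_reward(empty_km=2.0, wait_s=30.0, delay_s=10.0, weights=TIGHT_SCHEDULE)
r_cost = step_reward(empty_km=2.0, wait_s=30.0, delay_s=10.0, weights=LOW_COST)
print(r_tight, r_cost)  # -37.0 -28.0
print(discounted_return([-1.0, -1.0, -1.0], gamma=0.5))  # -1.75
```

The same step is penalized differently under the two profiles: under `TIGHT_SCHEDULE` the 10 s delay contributes 20 of the 37 penalty points, while under `LOW_COST` the 2 km of empty running dominates relative to delay.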
[0078] Finally, regarding collaborative optimization decision-making based on the Actor-Critic architecture: a reinforcement learning model based on the Actor-Critic architecture is used for collaborative optimization decision-making. Specifically, each transport vehicle is modeled as an independent agent, and each agent is equipped with a neural network model based on the Actor-Critic architecture. As shown in Figure 7, the Actor network outputs, based on the current system state $s_t$, a probability distribution $\pi(a_t \mid s_t)$ over actions $a_t$, which is used to select tasks, storage locations, and paths. The Actor network (policy network) acts as the decision-maker: it takes the environment state $s_t$ as input, extracts and maps features through the neural network, and outputs an action probability distribution. For task selection and storage location selection, the output is typically a discrete probability distribution; for path selection, it may output continuous path parameters or probabilities over discrete path indices. During training, the model samples actions according to these probabilities to explore the environment; during application, it selects the action with the highest probability as the optimal decision. The Critic network evaluates the value $V(s_t)$ of the state $s_t$, which is used to guide the policy updates of the Actor network. The Critic network acts as an evaluator: it does not output actions directly but estimates the value of the current state, that is, it predicts the expected cumulative reward obtainable from the current state. Comparing the reward actually obtained with the Critic's predicted value yields the temporal-difference (TD) error, which reflects the quality of the action taken. Using the multi-objective reward function $r_t$ described above and backpropagation, the policies of all vehicle agents are collaboratively optimized to achieve optimal global transportation efficiency.
In a multi-agent environment, each vehicle's decisions not only affect itself but also influence other vehicles by changing environmental states (such as path occupancy and yard occupancy). This embodiment uses a shared Critic network or a centralized training mechanism to enable each vehicle agent to consider the impact on other vehicles while optimizing its own strategy, thereby achieving global-level collaborative optimization and avoiding deadlock or congestion caused by local optima.
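The division of labor between the Actor and the Critic can be illustrated with the textbook one-step actor-critic update with a softmax policy. This sketch deliberately swaps the per-vehicle neural networks for lookup tables and uses a single toy decision with hypothetical rewards; it does not model multiple agents, the shared Critic, or the composite action structure. It shows only how the Critic's TD error $\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$ scales the Actor's log-policy gradient.

```python
import math
import random


class TabularActorCritic:
    """One-step actor-critic: softmax policy (actor) plus state-value table (critic).

    Lookup tables stand in for the Actor and Critic neural networks.
    """

    def __init__(self, n_actions, alpha_actor=0.1, alpha_critic=0.1, gamma=0.95, seed=0):
        self.n = n_actions
        self.alpha_actor = alpha_actor
        self.alpha_critic = alpha_critic
        self.gamma = gamma
        self.prefs = {}   # actor: state -> action preferences
        self.values = {}  # critic: state -> V(s)
        self.rng = random.Random(seed)

    def policy(self, s):
        """pi(a | s) as a numerically stable softmax over the preferences."""
        h = self.prefs.setdefault(s, [0.0] * self.n)
        m = max(h)
        exps = [math.exp(x - m) for x in h]
        z = sum(exps)
        return [e / z for e in exps]

    def act(self, s, greedy=False):
        probs = self.policy(s)
        if greedy:  # application: pick the most probable action
            return max(range(self.n), key=lambda a: probs[a])
        return self.rng.choices(range(self.n), weights=probs)[0]  # training: sample

    def update(self, s, a, r, s_next, done):
        v_s = self.values.get(s, 0.0)
        v_next = 0.0 if done else self.values.get(s_next, 0.0)
        delta = r + self.gamma * v_next - v_s             # TD error
        self.values[s] = v_s + self.alpha_critic * delta  # critic step
        probs = self.policy(s)
        h = self.prefs[s]
        for b in range(self.n):
            grad_log_pi = (1.0 if b == a else 0.0) - probs[b]
            h[b] += self.alpha_actor * delta * grad_log_pi  # actor step
        return delta


# Toy single-decision episodes: action 1 stands for a connected task pair,
# actions 0 and 2 for choices that cause empty running or waiting (hypothetical rewards).
REWARDS = [-5.0, -1.0, -3.0]
agent = TabularActorCritic(n_actions=3, seed=42)
for _ in range(2000):
    a = agent.act("s0")
    agent.update("s0", a, REWARDS[a], s_next=None, done=True)
print(agent.act("s0", greedy=True), agent.policy("s0"))
```

Because the Critic's value acts as a baseline, actions that do better than the current estimate are reinforced and worse ones are suppressed, even though every reward in this toy example is negative.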
[0079] In one embodiment, dynamically adjusting the connection relationships and travel paths of subsequent tasks specifically includes: triggering a dynamic adjustment when a vehicle's execution progress deviates from expectations, the capacity of the target storage location changes abruptly, or a critical path of the operation network becomes congested; and, based on the current system state $s_t$, using the reinforcement learning model to re-determine new task connections and travel paths for the affected vehicles and updating the task scheduling schemes issued to those vehicles, so as to maintain the continuity of the transportation chain.
[0080] Specifically, the port operating environment is highly dynamic and uncertain, and static preset plans are often insufficient to cope with emergencies. This embodiment sets multiple trigger conditions that activate a dynamic adjustment mechanism to ensure the robustness of the system. A deviation of vehicle execution progress from expectations refers to a situation in which a vehicle's actual arrival time lags significantly behind the planned time because of malfunctions, obstacle avoidance, or traffic interference, which may delay subsequent tasks or even break the outbound-return chain. An abrupt capacity change at the target storage location refers to a situation in which the originally selected yard no longer has enough remaining capacity because of a sudden surge in unloading operations and therefore cannot receive the container of the current task; if no adjustment is made in time, vehicles cannot unload on arrival and are left waiting. Congestion on a critical path of the operation network refers to a situation in which the main road connecting the quay cranes and the yard is blocked by excessive traffic density or an accident, so that the original route becomes infeasible or the travel time becomes excessively long, severely impacting transportation efficiency.
[0081] When the system detects any of the above situations (for example, through vehicle status reports, yard sensor data, or road network monitoring data), it re-runs inference with the pre-trained reinforcement learning model based on the current system state $s_t$. At this point, $s_t$ has been updated to the latest status, including the anomaly information: for example, expired tasks may have been removed from the task set $\mathcal{T}_t$, and the operation network state $\mathcal{N}_t$ may mark congested road sections. The model quickly assesses the affected vehicles and their associated tasks and outputs a new action $a_t$, that is, it reselects the task, the storage location, or the path. For example, if the target yard's capacity is insufficient, the model reselects a location with remaining capacity from the task's candidate storage location set $S_i$; if the original route is congested, the model plans a new route that avoids the congested section. This dynamic adjustment is not a simple local correction but a re-optimization based on the global state. By responding to environmental disturbances in real time, the system maintains the continuity of the transportation chain and prevents a local anomaly from interrupting or idling the entire chain.
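The trigger-and-replan loop can be sketched as follows. Everything here is a hypothetical stand-in: the `Assignment` record, the 30 s delay tolerance, and especially `reselect_location`, which applies a nearest-feasible-location rule in place of re-running the trained reinforcement learning model on the updated state. The sketch shows only how each trigger condition can be checked and how the candidate storage set gives the system a fallback when the selected yard fills up; route re-planning is not shown.

```python
from dataclasses import dataclass


@dataclass
class Assignment:
    vehicle: str
    task_id: str
    location: str            # currently selected storage location k
    path: list               # planned node sequence p
    planned_arrival_s: float


def detect_triggers(asg, eta_s, capacity, congested_edges, delay_tol_s=30.0):
    """Return the trigger conditions that currently hold for one assignment."""
    triggers = []
    if eta_s - asg.planned_arrival_s > delay_tol_s:
        triggers.append("progress_deviation")
    if capacity.get(asg.location, 0) <= 0:
        triggers.append("capacity_change")
    if any(edge in congested_edges for edge in zip(asg.path, asg.path[1:])):
        triggers.append("path_congestion")
    return triggers


def reselect_location(candidates, capacity, dist_from_vehicle):
    """Stand-in for model re-inference: nearest candidate that still has capacity."""
    options = [k for k in candidates if capacity.get(k, 0) > 0]
    return min(options, key=lambda k: (dist_from_vehicle(k), k)) if options else None


asg = Assignment("AGV07", "U1", "YB1", ["QC1", "X1", "YB1"], planned_arrival_s=90.0)
capacity = {"YB1": 0, "YB3": 2, "YB4": 5}
congested = {("X1", "YB1")}
triggers = detect_triggers(asg, eta_s=100.0, capacity=capacity, congested_edges=congested)
print(triggers)  # ['capacity_change', 'path_congestion']
if "capacity_change" in triggers:
    asg.location = reselect_location({"YB1", "YB3", "YB4"}, capacity,
                                     {"YB3": 400.0, "YB4": 250.0}.get)
print(asg.location)  # YB4
```

A 10 s lag stays under the tolerance, so only the capacity and congestion triggers fire; if no candidate had capacity left, `reselect_location` would return `None` and the task itself would need to be reassigned.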
[0082] It should be understood that the triggering conditions are not limited to the situations listed above, but may also include equipment failure (such as quay crane failure), extreme weather effects, etc. The adjustment strategy can also be divided into local fine-tuning and global replanning according to the degree of disturbance; this embodiment does not impose such limitations. Through this closed-loop feedback control mechanism, the present invention can effectively cope with the complex and ever-changing operating environment of ports and ensure the efficient execution of transportation tasks.
[0083] In a specific embodiment, to verify the effectiveness and superiority of the method of the present invention in a real engineering environment, the above-described spatiotemporal task orchestration method for port transportation was applied to the full-shoreline, multi-quay-crane-to-yard horizontal transportation system of a large automated container terminal. As shown in Figure 5, the system uses a central server as the processing core, which communicates with the Terminal Operating System (TOS), the vehicle controllers, and the road network infrastructure.
[0084] Specifically, the terminal has a shoreline of approximately 2.35 km and seven deep-water berths. In terms of engineering configuration, the site is equipped with 26 dual-trolley quay cranes, 61 yard blocks, approximately 120 automated guided vehicles (AGVs), and approximately 130 AGVs. The AGVs are lithium-battery driven, with a rated load of 65 tons and a maximum operating speed of approximately 6 m/s. In terms of control strategy, the AGVs follow a standard deceleration-queueing-alignment-transfer-departure rhythm when entering and leaving quay crane work positions and yard handover positions. The system interfaces with the TOS (Terminal Operating System), the quay crane control system, the yard AGV system, and the V2X (Vehicle-to-Everything) infrastructure to link task, equipment, and network status in real time.
[0085] To ensure real-time decision-making and execution, this embodiment sets the time base for scheduling control as follows: global decision step Δt = 2 s, road network weight refresh cycle 1 s, task set refresh cycle 2 s, and vehicle status reporting cycle 0.5 s. In addition, key triggering events, such as the completion of container unloading by a quay crane, the opening of a yard handover window, and road blockage or restoration, are reported immediately.
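The multi-rate time base can be pictured as one merged, time-ordered job list. This is a minimal sketch assuming the periods above and integer milliseconds; the job names and the event name are hypothetical, and it makes no claim about how the central server actually dispatches these updates.

```python
# Periods from the time base above, in integer milliseconds to avoid float drift.
PERIODS_MS = {
    "decision_step": 2000,    # global decision step, 2 s
    "network_weights": 1000,  # road network weight refresh, 1 s
    "task_set": 2000,         # task set refresh, 2 s
    "vehicle_status": 500,    # vehicle status reporting, 0.5 s
}


def build_timeline(horizon_ms, events=()):
    """Merge periodic jobs with immediately reported events into one ordered list.

    events: iterable of (time_ms, name) for key events such as a quay crane
    unload completion; each is placed at its own time rather than on a tick.
    """
    jobs = [(t, name) for name, period in PERIODS_MS.items()
            for t in range(0, horizon_ms + 1, period)]
    jobs += [(t, "event:" + name) for t, name in events]
    return sorted(jobs)


timeline = build_timeline(2000, events=[(1300, "qc_unload_done")])
for t, job in timeline:
    if 1000 <= t <= 1500:
        print(t, job)
```

Between 1.0 s and 1.5 s this prints the network-weight refresh and status report at 1000 ms, the unload event at 1300 ms, and the next status report at 1500 ms; the event is not delayed to the next periodic tick.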
[0086] Under the aforementioned hardware environment and time base, the specific execution process of this embodiment is as follows:
[0087] First, the system performs the steps of constructing the operation network and status awareness. The quay crane operation points, the seaside handover points of the storage yard, the forward waiting area, and port intersections are uniformly modeled as operation nodes. The lanes between nodes serve as directed paths, forming a horizontal transportation operation network covering the entire shoreline. Each path is configured with distance, real-time traffic latency, and concurrent capacity parameters. The maximum allowed concurrent number of AGVs on a single main path is 6-8 vehicles. The network status is updated every 1 second to reflect the current road network capacity.
[0088] Secondly, the system executes the flexible expansion step for transportation tasks. The system obtains real-time quay crane unloading and loading transportation tasks from the TOS (Terminal Operating System). For each unloading task, while the original storage location is retained, 3-6 alternative yard blocks are dynamically generated based on yard load and distance thresholds; the expansion conditions are that the additional travel distance does not exceed approximately 600 meters and that the target yard has handover capability within the next 5 minutes. Loading tasks expand their container pickup locations in the same way, providing room for task rearrangement.
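The two expansion conditions of this embodiment can be sketched as a filter followed by a ranking. Several details here are assumptions: "additional travel distance" is taken as the distance to the alternative minus the distance to the original block, handover capability is modeled as an expected wait in seconds, alternatives are ranked by least additional distance, and only the upper bound of six alternatives is enforced (when fewer qualify, fewer are returned). All block names and numbers are hypothetical.

```python
def expand_unloading_task(origin, original, yards, dist, handover_wait_s,
                          max_extra_m=600.0, horizon_s=300.0, max_alternatives=6):
    """Keep the original yard block and add up to `max_alternatives` alternatives.

    An alternative y qualifies if dist(origin, y) - dist(origin, original) <= max_extra_m
    and its handover is available within horizon_s (5 minutes by default).
    """
    base = dist(origin, original)
    feasible = []
    for y in yards:
        if y == original:
            continue
        extra = dist(origin, y) - base
        if extra <= max_extra_m and handover_wait_s.get(y, float("inf")) <= horizon_s:
            feasible.append((extra, y))
    feasible.sort()  # least additional travel first
    return [original] + [y for _, y in feasible[:max_alternatives]]


dist_from_qc = {"YB1": 400.0, "YB2": 700.0, "YB3": 1100.0, "YB4": 450.0, "YB5": 500.0}
handover = {"YB2": 120.0, "YB3": 30.0, "YB4": 400.0, "YB5": 60.0}
blocks = expand_unloading_task("QC1", "YB1", list(dist_from_qc),
                               lambda o, y: dist_from_qc[y], handover)
print(blocks)  # ['YB1', 'YB5', 'YB2']
```

`YB3` is excluded for its 700 m detour and `YB4` because its handover is not available within 5 minutes. The same function could serve loading tasks by passing pickup blocks instead of delivery blocks.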
[0089] Subsequently, the system performs the task connection identification step. The system automatically identifies combinations of unloading and loading tasks in the task pool that can be executed consecutively. After an AGV completes an unloading task, if a nearby yard has a loading task with a matching time window and minimal route detour, the two tasks are bound together as a loaded-outbound, loaded-return transport chain, preventing the vehicle from returning to the quay crane empty. In actual operation, approximately 45% to 55% of tasks form effective connections during peak hours.
[0090] Next, the system executes a task orchestration decision-making step based on the Actor-Critic architecture. The scheduling system employs a multi-agent reinforcement learning model with an Actor-Critic architecture, where each AGV participates in the decision-making process as an independent agent. Model inputs include the task time window, candidate locations in the storage yard, the real-time location of the AGV, and the road network occupancy status, while simultaneously incorporating the quay crane's operating cycle time. The Actor network outputs the AGV's task selection, storage location, and travel path, while the Critic network evaluates the long-term benefits of the decisions. The model is trained offline using historical operation data and performs real-time inference during field operation, supporting rolling adjustments to task order and path selection under operational disturbances.
[0091] Finally, the system performs transportation task scheduling and execution feedback steps. The scheduling system issues the current task and the next connecting task to the AGV every 2 seconds. After unloading the container, the AGV directly drives to the matching container yard, forming a continuous container loading route. When yard congestion or changes in quay crane cycle time are detected, the system immediately adjusts the storage position or replaces the connecting task to prevent vehicles from entering a waiting or empty running state.
[0092] After continuous production testing and verification, statistical results show that after applying the method of this invention, the proportion of AGVs operating without load decreased to 6%–9%, the average waiting time per container decreased to 55–65 seconds, the average waiting time for transport vehicles on quay cranes decreased by 18%–25%, and the number of road network congestion triggers during peak hours decreased by approximately 30%. These data indicate that this invention fundamentally breaks through the rigid binding relationship between tasks and work locations in traditional scheduling models. Through flexible expansion of transport tasks and seamless arrangement of outbound and return trips, it breaks the limitations of the traditional "nearest matching" logic, proactively constructing a closed-loop transport chain formed by the natural coupling of outbound and return tasks. This effectively realizes continuous container-carrying operation of AGVs during the transport-loading and unloading process, significantly reducing idle time and waiting, and improving the operational efficiency and stability of the horizontal transport system at the fully automated quay terminal.
[0093] The spatiotemporal task scheduling apparatus for port transportation provided by the present invention will be described below. The spatiotemporal task scheduling apparatus for port transportation described below can be referred to in correspondence with the spatiotemporal task scheduling method for port transportation described above.
[0094] As shown in Figure 6, in one embodiment, a spatiotemporal task orchestration device for port transportation includes a transportation task acquisition module 610, a candidate storage location screening module 620, a connectable task pair identification module 630, a decision process modeling module 640, and a task orchestration scheme execution module 650.
[0095] The transportation task acquisition module 610 is used to acquire multiple transportation tasks to be executed. Each transportation task includes a defined starting point, the original target storage location, and the task execution time window.
[0096] The candidate storage location filtering module 620 is used to dynamically filter multiple candidate storage locations for each transportation task based on the remaining capacity of the storage location and the operation distance threshold from the task starting point, forming a set of alternative storage locations for the task, so as to expand the transportation task from a single target location to multiple candidate target locations.
[0097] The connectable task pair identification module 630 is used to analyze the expanded transportation tasks within the entire shoreline range based on the candidate storage location set, identify outbound and return tasks that meet the time window matching and path reachability conditions as connectable task pairs, and form a candidate set for outbound and return task connection; wherein, the time window matching and path reachability conditions refer to: the sum of the estimated completion time of the preceding task and the empty transfer time from the end position of the task to the starting point of the subsequent task does not exceed the deadline of the execution time window of the subsequent task, and there is a feasible path connecting the two task positions;
[0098] The decision process modeling module 640 is used to model the vehicle transportation task scheduling process as a Markov decision process and adopts a reinforcement learning model based on the Actor-Critic architecture for collaborative optimization decision-making. The reinforcement learning model outputs actions to the vehicle based on the current system state, which includes the tasks to be executed, the vehicle state, and the operation network state. These actions include selecting a task from the tasks to be executed, selecting a storage location from the set of candidate storage locations for the task, and planning a driving path. The reinforcement learning model updates its strategy by maximizing the expected cumulative reward. Its reward function is a multi-objective reward function that comprehensively optimizes the vehicle's empty driving mileage, operation waiting time, and task delay.
[0099] The task orchestration scheme execution module 650 is used to generate a vehicle transportation task orchestration scheme containing task sequences, selected storage locations and driving routes based on the optimization decision results, and distribute it to the corresponding vehicles for execution; during the execution process, when deviations or environmental changes are detected, the connection relationship and driving route of subsequent tasks are dynamically adjusted.
[0100] Figure 7 illustrates a schematic diagram of the physical structure of an electronic device, which may be a smart terminal; its internal structure may be as shown in Figure 7. The electronic device includes a processor, a memory, and a network interface connected via a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and internal memory. The non-volatile storage medium stores an operating system and computer programs. The internal memory provides an environment for running the operating system and computer programs stored in the non-volatile storage medium. The network interface is used to communicate with external terminals via a network connection. When the computer program is executed by the processor, it implements the spatiotemporal task scheduling method for port transportation according to any of the above embodiments.
[0101] Those skilled in the art will understand that the structure shown in Figure 7 is merely a block diagram of part of the structure related to the present invention and does not constitute a limitation on the electronic device to which the present invention is applied. A specific electronic device may include more or fewer components than those shown in the figure, combine certain components, or have a different arrangement of components.
[0102] On the other hand, the present invention also provides a computer storage medium storing a computer program, which, when executed by a processor, implements the spatiotemporal task scheduling method for port transportation according to any of the above embodiments.
[0103] In another aspect, a computer program product or computer program is provided, which includes computer instructions stored in a computer-readable storage medium. A processor of an electronic device reads the computer instructions from the computer-readable storage medium, and when the processor executes the computer instructions, it implements the spatiotemporal task scheduling method for port transportation according to any of the above embodiments.
[0104] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. This computer program can be stored in a non-volatile computer-readable storage medium. When executed, the computer program can include the processes of the embodiments of the above methods. Any references to memory, storage, databases, or other media used in the embodiments provided by this invention can include non-volatile and / or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory.
[0105] By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), RAMbus direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), etc.
[0106] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.
[0107] The above-described embodiments are merely illustrative of several implementations of the present invention, and while the descriptions are specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of the present invention, and these modifications and improvements all fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the appended claims.
Claims
1. A spatiotemporal task orchestration method for port transportation, characterized in that, The method includes: Obtain multiple transportation tasks to be executed, each of which includes a defined starting point, the original target storage location, and the task execution time window; Based on the remaining capacity of the storage location and the operation distance threshold from the task start point, multiple candidate storage locations are dynamically selected for each transportation task to form a set of alternative storage locations for the task, so as to expand the transportation task from a single target location to multiple candidate target locations. Based on the set of candidate storage locations, the expanded transportation tasks are analyzed across the entire shoreline. Outbound and return tasks that meet the time window matching and path reachability conditions are identified as connectable task pairs, forming a candidate set for connecting outbound and return tasks. The time window matching and path reachability conditions refer to the following: the sum of the estimated completion time of the preceding task and the empty transfer time from the end position of the preceding task to the starting point of the subsequent task does not exceed the deadline of the execution time window of the subsequent task, and there exists a feasible path connecting the two task locations. The vehicle transportation task scheduling process is modeled as a Markov decision process, and a reinforcement learning model based on the Actor-Critic architecture is used for collaborative optimization decision-making. The reinforcement learning model outputs actions for the vehicles based on the current system state, which includes the tasks to be executed, vehicle status, and operation network status. 
The actions include selecting a task from the tasks to be executed, selecting a storage location from the set of candidate storage locations for the task, and planning a driving path. The reinforcement learning model updates its strategy by maximizing the expected cumulative reward, and its reward function is a multi-objective reward function that comprehensively optimizes the vehicle's empty driving mileage, operation waiting time, and task delay. Based on the optimization decision results, a vehicle transportation task scheduling scheme containing task sequences, selected storage locations, and driving routes is generated and issued to the corresponding vehicles for execution. During the execution process, when deviations or environmental changes are detected, the connection relationship and driving routes of subsequent tasks are dynamically adjusted.
2. The spatiotemporal task scheduling method for port transportation according to claim 1, characterized in that, before acquiring the multiple transportation tasks to be executed, the method further includes: abstracting the dock operation area into a directed operation network graph $G = (V, E)$, where $V$ is the set of work nodes, including quay crane work points, container yard crane work points, vehicle waiting points, and road intersections, and $E$ is the set of directed paths on which vehicles can travel; each path $e \in E$ is associated with an attribute vector representing its path length, estimated travel time, and congestion coefficient.
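A minimal Python sketch of the directed operation network in claim 2, assuming a plain adjacency-dictionary representation. The class and field names (`OperationNetwork`, `EdgeAttr`) and the node labels are hypothetical, and the path cost used for the search (estimated travel time multiplied by the congestion coefficient, minimized with Dijkstra's algorithm) is an illustrative assumption, not something the claims specify.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeAttr:
    """Attribute vector of one directed path (hypothetical field names)."""
    length_m: float       # path length
    travel_time_s: float  # estimated travel time
    congestion: float     # congestion coefficient, assumed >= 1.0


class OperationNetwork:
    """Directed operation network: work nodes plus directed, attributed paths."""

    def __init__(self):
        self.adj = {}  # node -> {successor node: EdgeAttr}

    def add_edge(self, u, v, attr):
        self.adj.setdefault(u, {})[v] = attr
        self.adj.setdefault(v, {})

    def fastest_path(self, src, dst):
        """Dijkstra's algorithm on travel_time_s * congestion (assumed cost).

        Returns (cost, node list), or (inf, []) when dst is unreachable.
        """
        best = {src: 0.0}
        prev = {}
        heap = [(0.0, src)]
        while heap:
            cost, u = heapq.heappop(heap)
            if u == dst:
                path = [u]
                while u in prev:
                    u = prev[u]
                    path.append(u)
                return cost, path[::-1]
            if cost > best.get(u, float("inf")):
                continue  # stale heap entry
            for v, a in self.adj.get(u, {}).items():
                new_cost = cost + a.travel_time_s * a.congestion
                if new_cost < best.get(v, float("inf")):
                    best[v] = new_cost
                    prev[v] = u
                    heapq.heappush(heap, (new_cost, v))
        return float("inf"), []


# Hypothetical fragment: quay crane point QC1, intersection X1, yard crane point YC3.
net = OperationNetwork()
net.add_edge("QC1", "X1", EdgeAttr(200.0, 40.0, 1.0))
net.add_edge("X1", "YC3", EdgeAttr(300.0, 60.0, 1.2))
net.add_edge("QC1", "YC3", EdgeAttr(600.0, 120.0, 1.0))
cost, path = net.fastest_path("QC1", "YC3")  # 40 + 60 * 1.2 = 112 < 120, so via X1
```

An infinite cost with an empty path is one simple way to represent the "no feasible path" case that the path reachability condition checks.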
3. The spatiotemporal task scheduling method for port transportation according to claim 1, characterized in that dynamically selecting multiple candidate storage locations for each transportation task specifically includes: for any transportation task $T_i$, its original attributes are $T_i = (o_i, g_i, [a_i, b_i])$, where $o_i$ is the task starting point, $g_i$ is the original target storage location, and $[a_i, b_i]$ is the task execution time window; a set of alternative storage locations $\mathcal{G}_i$ is constructed for it, defined as $\mathcal{G}_i = \{\, v \in V \mid C_v > 0,\ d(o_i, v) \le D_{\max} \,\}$, where $V$ is the set of work nodes, $C_v$ is the remaining capacity of storage location $v$, $d(o_i, v)$ is the distance from the starting point $o_i$ to storage location $v$, and $D_{\max}$ is the preset maximum operation distance threshold; by introducing $\mathcal{G}_i$, the transportation task $T_i = (o_i, g_i, [a_i, b_i])$ is expanded to $\tilde{T}_i = (o_i, \mathcal{G}_i, [a_i, b_i])$.
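The candidate storage location filtering of claim 3 can be sketched in Python as below. This is a sketch under assumptions: the names `Task` and `candidate_locations`, the yard block labels, and the distances are hypothetical, and "has remaining capacity" is read as "more than zero free slots".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """A transportation task with its original, single target (names hypothetical)."""
    task_id: str
    origin: str     # task starting point
    target: str     # original target storage location
    window: tuple   # (earliest start, deadline) of the execution time window


def candidate_locations(task, remaining_capacity, distance, d_max):
    """Alternative storage locations: remaining capacity > 0 and within d_max.

    remaining_capacity: dict mapping storage location -> free slots
    distance: callable (origin, location) -> operation distance
    Returns locations sorted by distance (nearest first) for determinism.
    """
    feasible = [
        v for v, cap in remaining_capacity.items()
        if cap > 0 and distance(task.origin, v) <= d_max
    ]
    return sorted(feasible, key=lambda v: (distance(task.origin, v), v))


# Hypothetical yard blocks B01..B04, distances in metres from quay crane point QC1.
dist = {("QC1", "B01"): 300, ("QC1", "B02"): 800, ("QC1", "B03"): 450, ("QC1", "B04"): 200}
cap = {"B01": 0, "B02": 12, "B03": 5, "B04": 3}
task = Task("T1", "QC1", "B01", (0, 600))
cands = candidate_locations(task, cap, lambda o, v: dist[(o, v)], d_max=500)
# B01 is full and B02 is too far, so the task is widened to ["B04", "B03"].
```

Here the original target B01 has no capacity left, so a task rigidly bound to it would stall; the expanded set still offers two reachable locations.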
4. The spatiotemporal task scheduling method for port transportation according to claim 3, characterized in that identifying outbound and return tasks that meet the time window matching and path reachability conditions as connectable task pairs specifically includes: task $T_i$ and task $T_j$ are defined as connectable when $\hat{t}_i + \tau_{ij} \le b_j$ and there exists a feasible path $p_{ij}$ from the end location of task $T_i$ to the starting point $o_j$ of task $T_j$, where $\hat{t}_i$ is the estimated completion time of task $T_i$, $\tau_{ij}$ is the empty transfer time required for a vehicle to travel from the end point of task $T_i$ to the starting point $o_j$, and $b_j$ is the deadline of the execution time window of task $T_j$; all task pairs $(T_i, T_j)$ that meet the above conditions constitute the candidate set $\mathcal{C}$ for outbound and return task connections.
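The connectability test of claim 4 can be sketched as a brute-force check over ordered task pairs and over each candidate end location of the preceding task. The names `ExpandedTask` and `connectable_pairs`, the callables for completion and transfer times, and all numbers are hypothetical; returning `None` from the transfer-time callable stands in for "no feasible path".

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class ExpandedTask:
    """A task after expansion to several candidate targets (names hypothetical)."""
    task_id: str
    origin: str
    candidates: tuple  # alternative storage locations
    window: tuple      # (earliest start, deadline)


def connectable_pairs(tasks, est_finish, empty_time):
    """Return (i, j, g) triples: task i, ending at candidate g, can be chained to j.

    est_finish(task_id, g) -> estimated completion time of the task when it ends at g
    empty_time(g, origin_j) -> empty transfer time, or None if no feasible path exists
    """
    pairs = []
    for ti, tj in permutations(tasks, 2):
        deadline_j = tj.window[1]
        for g in ti.candidates:
            tau = empty_time(g, tj.origin)
            if tau is None:  # path reachability fails
                continue
            if est_finish(ti.task_id, g) + tau <= deadline_j:  # time window matching
                pairs.append((ti.task_id, tj.task_id, g))
    return pairs


# Hypothetical outbound task T1 (quay -> yard) and return task T2 (yard block B05 -> quay).
t1 = ExpandedTask("T1", "QC1", ("B03", "B04"), (0, 600))
t2 = ExpandedTask("T2", "B05", ("QC2",), (300, 700))
finish = {("T1", "B03"): 400, ("T1", "B04"): 350, ("T2", "QC2"): 800}
transfer = {("B03", "B05"): 200, ("B04", "B05"): 400}
pairs = connectable_pairs([t1, t2], lambda tid, g: finish[(tid, g)],
                          lambda g, o: transfer.get((g, o)))
# Only ending T1 at B03 works: 400 + 200 = 600 <= 700, whereas 350 + 400 = 750 > 700.
```

Recording which candidate location makes the chain feasible shows why the expansion of claim 3 matters: the earlier-finishing location B04 is not the one that connects. The sketch checks every ordered pair, so its cost grows quadratically with the number of tasks.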
5. The spatiotemporal task scheduling method for port transportation according to claim 1, characterized in that modeling the vehicle transportation task scheduling process as a Markov decision process specifically includes: at decision time $t$, the system state $s_t$ is defined as $s_t = (\mathcal{T}_t, \mathcal{M}_t, \mathcal{N}_t)$, where $\mathcal{T}_t$ is the set of transportation tasks to be executed, $\mathcal{M}_t$ is the set of vehicle states, and $\mathcal{N}_t$ is the current operation network state; the action $a_t$ output by the reinforcement learning model for a vehicle is defined as $a_t = (T_i, g, p)$, where $T_i$ is the transportation task selected from $\mathcal{T}_t$, $g$ is the storage location selected from the task's set of alternative storage locations $\mathcal{G}_i$, and $p$ is the planned vehicle travel route.
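The state and action structure of claim 5 can be sketched with plain data classes. Enumerating every feasible (task, storage location, route) combination for one vehicle is an assumption about how the discrete action space might be built before a policy chooses among the options; all class names, fields, and the `plan_path` route planner are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VehicleState:
    vehicle_id: str
    position: str       # current node in the operation network
    busy_until: float   # time at which the vehicle becomes free


@dataclass
class SystemState:
    """State at one decision time: pending tasks, vehicle states, network state."""
    time: float
    pending: dict       # task_id -> (origin, tuple of alternative storage locations)
    vehicles: dict      # vehicle_id -> VehicleState
    network: dict       # e.g. directed edge -> congestion coefficient


@dataclass(frozen=True)
class Action:
    """Action = (selected task, selected storage location, planned route)."""
    task_id: str
    location: str
    path: tuple


def feasible_actions(state, vehicle_id, plan_path):
    """Enumerate every (task, location, route) combination one vehicle could take.

    plan_path(position, origin, location) is a hypothetical route planner that
    returns a node list, or an empty list when no route exists.
    """
    pos = state.vehicles[vehicle_id].position
    actions = []
    for task_id, (origin, locations) in sorted(state.pending.items()):
        for loc in locations:
            route = plan_path(pos, origin, loc)
            if route:  # drop combinations without a feasible route
                actions.append(Action(task_id, loc, tuple(route)))
    return actions


state = SystemState(
    time=0.0,
    pending={"T1": ("QC1", ("B03", "B04")), "T2": ("B05", ("QC2",))},
    vehicles={"V1": VehicleState("V1", "X1", 0.0)},
    network={},
)
acts = feasible_actions(state, "V1", lambda p, o, g: [p, o, g])
```

With two candidate locations for T1 and one for T2, the vehicle at X1 has three actions to choose from, which illustrates how the alternative storage location sets enlarge the action space.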
6. The spatiotemporal task scheduling method for port transportation according to claim 5, characterized in that the multi-objective reward function is specifically defined as $r_t = -\left(w_1 L_t + w_2 W_t + w_3 D_t\right)$, where $L_t$ is the vehicle's empty driving mileage within the decision step, $W_t$ is the vehicle's operation waiting time, $D_t$ is the amount of task delay, and $w_1, w_2, w_3$ are preset weights; the reinforcement learning model updates its decision-making policy by maximizing the expected cumulative reward $\mathbb{E}\left[\sum_t r_t\right]$.
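A small sketch of the multi-objective reward of claim 6, assuming the reward is the negative weighted sum of empty mileage, waiting time, and task delay, so that maximizing reward minimizes all three. The function names, units, and default weights are hypothetical placeholders for the preset weights.

```python
def step_reward(empty_km, wait_min, delay_min, weights=(1.0, 0.5, 2.0)):
    """Negative weighted sum of empty mileage, waiting time and task delay.

    The weights are hypothetical placeholders for the preset weights in the claim.
    Lower cost -> higher (less negative) reward.
    """
    w_empty, w_wait, w_delay = weights
    return -(w_empty * empty_km + w_wait * wait_min + w_delay * delay_min)


def cumulative_reward(rewards, gamma=1.0):
    """Sum of per-step rewards; gamma < 1 gives the usual discounted return."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))


r1 = step_reward(2.0, 4.0, 1.5)      # -(2.0 + 2.0 + 3.0) = -7.0
r2 = step_reward(3.0, 0.0, 0.0)      # -3.0
total = cumulative_reward([r1, r2])  # -10.0
```

Raising one weight relative to the others shifts the learned policy toward reducing that term, for example a larger delay weight favors on-time completion over shorter empty runs.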
7. The spatiotemporal task scheduling method for port transportation according to claim 6, characterized in that the collaborative optimization decision-making using the reinforcement learning model based on the Actor-Critic architecture is specifically as follows: each transport vehicle is modeled as an independent agent, and each agent is equipped with a neural network model based on the Actor-Critic architecture; the Actor network outputs, according to the current system state $s_t$, a probability distribution $\pi(a_t \mid s_t)$ over actions $a_t$, which is used to select the task, storage location, and path; the Critic network evaluates the value $v(s_t)$ of the state $s_t$, which is used to guide policy updates of the Actor network; and the policies of all vehicle agents are collaboratively optimized by back-propagating the multi-objective reward function $r_t$, so as to achieve globally optimal transportation efficiency.
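A heavily simplified, pure-Python stand-in for the per-vehicle Actor-Critic agents of claim 7. The neural networks are replaced by tables (softmax action preferences for the actor, a state-value table for the critic), and each update is the standard one-step actor-critic rule driven by the critic's TD error. The class name, learning rates, state label, and toy rewards are hypothetical, and coordination through a shared global reward is not modelled; the sketch only shows how the critic's evaluation steers the actor's probability distribution.

```python
import math
import random


class TabularActorCritic:
    """One-step actor-critic; tables stand in for the Actor and Critic networks."""

    def __init__(self, n_actions, alpha_actor=0.1, alpha_critic=0.1, gamma=0.95, seed=0):
        self.n_actions = n_actions
        self.prefs = {}   # actor: state -> action preferences (softmax logits)
        self.value = {}   # critic: state -> estimated state value
        self.alpha_actor = alpha_actor
        self.alpha_critic = alpha_critic
        self.gamma = gamma
        self.rng = random.Random(seed)

    def policy(self, state):
        """Probability distribution over actions (softmax of preferences)."""
        prefs = self.prefs.setdefault(state, [0.0] * self.n_actions)
        top = max(prefs)
        exps = [math.exp(p - top) for p in prefs]
        total = sum(exps)
        return [e / total for e in exps]

    def act(self, state):
        return self.rng.choices(range(self.n_actions), weights=self.policy(state))[0]

    def update(self, state, action, reward, next_state, done):
        v = self.value.get(state, 0.0)
        v_next = 0.0 if done else self.value.get(next_state, 0.0)
        td_error = reward + self.gamma * v_next - v           # critic's evaluation signal
        self.value[state] = v + self.alpha_critic * td_error  # critic update
        probs = self.policy(state)
        prefs = self.prefs[state]
        for b in range(self.n_actions):                       # gradient of log-softmax
            grad = (1.0 if b == action else 0.0) - probs[b]
            prefs[b] += self.alpha_actor * td_error * grad    # actor update
        return td_error


# Toy one-decision episode: action 0 = far storage location (more empty mileage,
# reward -5), action 1 = near storage location (reward -1). Hypothetical numbers.
toy_reward = {0: -5.0, 1: -1.0}
agents = {vid: TabularActorCritic(n_actions=2, seed=i) for i, vid in enumerate(["V1", "V2"])}
for _ in range(500):
    for agent in agents.values():  # each vehicle is an independent agent
        a = agent.act("idle")
        agent.update("idle", a, toy_reward[a], None, done=True)
```

Once the critic's value lies between the two rewards, choosing the near location produces a positive TD error and choosing the far one a negative error, so both updates push each agent's policy toward the lower-cost action.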
8. The spatiotemporal task scheduling method for port transportation according to claim 1, characterized in that dynamically adjusting the connection relationships and travel routes of subsequent tasks specifically includes: triggering a dynamic adjustment when vehicle execution progress deviates from expectations, the capacity of a target storage location changes abruptly, or critical-path congestion occurs in the operation network; and, based on the current system state, having the reinforcement learning model make new decisions for the affected vehicles, calculate new task connections and travel routes, and update the task scheduling schemes issued to the corresponding vehicles, so as to maintain the continuity of the transportation chain.
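The three triggers of claim 8 can be sketched as a monitoring check that flags which vehicles need a new decision. The observation fields, the 120 s deviation tolerance, the congestion limit of 2.0, and reading "capacity changes abruptly" as "the target location has no remaining capacity" are all assumptions; the flagged vehicles would then be handed back to the policy together with the current system state.

```python
from dataclasses import dataclass


@dataclass
class VehicleObservation:
    """What the monitor sees for one vehicle during execution (hypothetical fields)."""
    vehicle_id: str
    planned_progress_s: float  # elapsed task time the schedule expected by now
    actual_progress_s: float   # elapsed task time actually achieved
    target_capacity: int       # current remaining capacity of the target location
    path_edges: tuple          # directed edges on the remaining planned route


def replan_reasons(obs, congestion, delay_tol_s=120.0, congestion_limit=2.0):
    """Return which of the three trigger conditions fire (thresholds are assumptions)."""
    reasons = []
    if abs(obs.actual_progress_s - obs.planned_progress_s) > delay_tol_s:
        reasons.append("progress_deviation")
    if obs.target_capacity <= 0:
        reasons.append("capacity_change")
    if any(congestion.get(e, 1.0) > congestion_limit for e in obs.path_edges):
        reasons.append("path_congestion")
    return reasons


def vehicles_to_replan(observations, congestion, **thresholds):
    """Map each affected vehicle to its trigger reasons; others are left alone."""
    affected = {}
    for obs in observations:
        reasons = replan_reasons(obs, congestion, **thresholds)
        if reasons:
            affected[obs.vehicle_id] = reasons
    return affected


congestion = {("X1", "YC3"): 2.5}
obs = [
    VehicleObservation("V1", 300.0, 500.0, 4, (("QC1", "X1"),)),
    VehicleObservation("V2", 300.0, 310.0, 0, (("QC1", "X1"),)),
    VehicleObservation("V3", 300.0, 300.0, 4, (("X1", "YC3"),)),
    VehicleObservation("V4", 300.0, 290.0, 4, (("QC1", "X1"),)),
]
flagged = vehicles_to_replan(obs, congestion)
```

Only V1, V2, and V3 are re-decided; V4 keeps its issued plan, which limits re-planning to the vehicles actually affected.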
9. A spatiotemporal task scheduling device for port transportation, characterized in that the device includes: a transportation task acquisition module, configured to acquire multiple transportation tasks to be executed, each including a defined starting point, an original target storage location, and a task execution time window; a candidate storage location filtering module, configured to dynamically select multiple candidate storage locations for each transportation task based on the remaining capacity of each storage location and an operation distance threshold from the task starting point, forming a set of alternative storage locations for the task and thereby expanding the transportation task from a single target location to multiple candidate target locations; a connectable task pair identification module, configured to analyze the expanded transportation tasks across the entire shoreline based on the sets of alternative storage locations, identify outbound and return tasks that meet the time window matching and path reachability conditions as connectable task pairs, and form a candidate set for connecting outbound and return tasks, wherein the time window matching and path reachability conditions are that the sum of the estimated completion time of the preceding task and the empty transfer time from the end position of the preceding task to the starting point of the subsequent task does not exceed the deadline of the execution time window of the subsequent task, and that a feasible path connects the two task positions; a decision process modeling module, configured to model the vehicle transportation task scheduling process as a Markov decision process and to adopt a reinforcement learning model based on the Actor-Critic architecture for collaborative optimization decision-making;
wherein the reinforcement learning model outputs actions for the vehicles based on the current system state, which includes the tasks to be executed, the vehicle states, and the operation network state; the actions include selecting a task from the tasks to be executed, selecting a storage location from the task's set of alternative storage locations, and planning a driving path; and the reinforcement learning model updates its policy by maximizing the expected cumulative reward, its reward function being a multi-objective reward function that jointly optimizes the vehicle's empty driving mileage, operation waiting time, and task delay; and a task scheduling scheme execution module, configured to generate, based on the optimization decision results, a vehicle transportation task scheduling scheme containing task sequences, selected storage locations, and driving routes, distribute it to the corresponding vehicles for execution, and, when deviations or environmental changes are detected during execution, dynamically adjust the connection relationships and driving routes of subsequent tasks.
10. An electronic device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that, when the processor executes the computer program, it implements the spatiotemporal task scheduling method for port transportation according to any one of claims 1 to 8.