Freight demand prediction and dispatch optimization method based on artificial intelligence
By integrating adaptive edge weight calculation based on geographical proximity and business relevance strength with a spatiotemporal graph attention network, and combining deep reinforcement learning strategies, the uncertainty of prediction is quantified and the scheduling scheme is adaptively adjusted. This solves the problems of insufficient capacity and insufficient response sensitivity in existing freight demand prediction and scheduling, and improves the robustness and real-time response capability of the scheduling scheme.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- HAIHEZI SUPPLY CHAIN (SHENZHEN) CO LTD
- Filing Date
- 2026-05-07
- Publication Date
- 2026-06-19
AI Technical Summary
Existing freight demand forecasting methods struggle to simultaneously model inter-regional spatial correlations and temporal dynamic dependencies, and scheduling schemes lack robustness against demand fluctuations, leading to insufficient transport capacity and decreased service quality.
An adaptive edge weight calculation that integrates geographical proximity and business association strength is combined with a spatiotemporal graph attention network and a deep reinforcement learning strategy to quantify prediction uncertainty and adaptively adjust the scheduling scheme.
By quantifying prediction uncertainty and triggering rescheduling adaptively, the method addresses insufficient capacity and insufficient response sensitivity, and improves the robustness and real-time response capability of the scheduling scheme.
Figure CN122243119A_ABST
Description
Technical Field
[0001] This invention relates to the fields of artificial intelligence and logistics scheduling technology, and more specifically to an artificial intelligence-based method for freight demand forecasting and scheduling optimization.
Background Technology
[0002] The freight logistics industry faces complex fluctuations in demand across time and space. Order demand in different regions and at different times is affected by multiple factors such as weather, holidays, and promotional activities, exhibiting significant non-stationarity and inter-regional correlation. Accurately predicting future demand in various regions and developing reasonable vehicle dispatching plans accordingly is key to improving logistics operational efficiency and customer service levels.
[0003] Existing freight demand forecasting methods mostly employ time series models or traditional machine learning methods, which struggle to simultaneously model spatial correlations and temporal dynamic dependencies between regions. Furthermore, they typically only output demand point estimates, failing to quantify the uncertainty of the forecast. Regarding scheduling optimization, existing methods largely construct optimization models based on deterministic demand assumptions, failing to incorporate forecast uncertainty into the objective function. This leads to insufficient capacity and decreased service quality when demand fluctuates upwards.
[0004] Regarding dynamic response, existing systems mostly use fixed time intervals to trigger rescheduling, failing to adaptively adjust the trigger sensitivity according to the degree of demand fluctuation in each region. This results in excessive intervention in stable regions and untimely response to highly volatile regions. These shortcomings make it difficult for existing technologies to meet the needs of actual freight scenarios in terms of demand forecasting accuracy, scheduling robustness, and real-time dynamic response capabilities.
Summary of the Invention
[0005] This invention provides an artificial intelligence-based method for freight demand forecasting and scheduling optimization, which solves the technical problems in related technologies such as the inability to effectively quantify uncertainty in freight demand forecasting, the lack of robustness of scheduling schemes to demand fluctuations, and the difficulty of static scheduling to respond to real-time demand changes.
[0006] This invention provides a method for freight demand forecasting and scheduling optimization based on artificial intelligence, comprising the following steps:
S1: Obtain historical order records, historical weather records, holiday information, and regional static attribute data; construct a spatiotemporal feature map using an adaptive edge weight calculation that integrates geographical proximity and business association strength, together with static features, dynamic features, and historical demand features.
S2: Input the spatiotemporal feature map into the spatiotemporal graph attention network, train it using a multi-task learning framework that combines a gated-fusion spatiotemporal representation with a joint loss weighted by learnable uncertainty weights, and output the mean and standard deviation of future demand for each region.
S3: Based on the mean and standard deviation of demand in each region, use the upper bound of the demand quantile to quantify the risk penalty for capacity gaps, combine it with transportation costs in a weighted objective function, and solve by adaptive large neighborhood search to generate an initial scheduling scheme.
S4: During execution of the initial scheduling scheme, collect real-time orders, vehicle status, and the predicted standard deviation obtained from real-time inference; use the predicted standard deviation as an adaptive trigger threshold and incorporate it into the state vector; and use a pre-trained deep reinforcement learning policy network to output the dynamically adjusted scheduling scheme.
[0007] In a preferred embodiment, S1 includes: calculating the Euclidean distance matrix between regions from the coordinates of each region's center point; establishing initial edge connections for region pairs whose distance is less than the preset geographical proximity distance threshold; defining geographical proximity as a negative exponential function of distance and normalizing the outgoing-edge proximity of each node; counting the order flow between every pair of regions from historical orders, and dividing the total number of historical orders from region i to region j by the total number of historical orders originating from region i to obtain the business association strength; and combining geographical proximity and business association strength by weighting, where the comprehensive edge weight equals the geographical weight coefficient multiplied by the normalized geographical proximity plus the business weight coefficient multiplied by the business association strength, and the two coefficients sum to one. The comprehensive edge weight matrix is then threshold filtered so that each node retains at most a preset maximum number of outgoing edges, namely those with the largest weights, and the retained weights are normalized again to obtain the adjacency matrix A.
[0008] In a preferred embodiment, S1 further includes: For static features, the region type is encoded using one-hot encoding, and the numerical features of population density, number of commercial outlets, and region area are standardized using z-scores. The one-hot encoded vector is then concatenated with the standardized numerical features to form a static feature vector. For dynamic features, the day of the week, month, hour, holiday, and weekday time attributes are extracted. The hour is encoded using sine and cosine periodicity. In the weather features, temperature and precipitation are normalized, and the weather type is encoded using one-hot encoding. The time attribute features are then concatenated with the weather features to form a dynamic feature vector. For each region at each time step, the static feature vector, dynamic feature vector, and historical demand value are sequentially concatenated to form the complete feature vector of the node at the corresponding time step. The sliding window method is used to construct training samples, and the data of a historical continuous time window with a preset time window length threshold is selected as the input sequence. All node feature vector sequences are combined with the adjacency matrix A to construct a complete spatiotemporal feature map.
[0009] In a preferred embodiment, S2 includes: The spatial encoding module propagates information along the spatial dimension based on the graph attention mechanism. For the target node i and a neighboring node j, their respective feature vectors are mapped to the hidden dimension space through a learnable linear transformation matrix W. The attention score is calculated using an additive attention mechanism: the transformed features of the two nodes are concatenated, the inner product with the learnable attention parameter vector is taken, and the raw attention score is obtained by LeakyReLU activation. The attention weights are obtained by softmax normalization of the scores of all neighboring nodes. The spatial context features are obtained by aggregating the transformed features of the neighboring nodes weighted by the attention weights, followed by a nonlinear activation. A multi-head graph attention mechanism is adopted: the outputs of the heads are concatenated and mapped to a unified hidden dimension through a linear projection layer to obtain the final spatial context features.
[0010] In a preferred embodiment, S2 further includes: The temporal encoding module employs a multi-head self-attention mechanism. After fixed trigonometric positional encoding is applied to the historical time-step feature sequence, a query matrix, a key matrix, and a value matrix are generated through three independent linear transformations. The dot product of the query matrix and the key matrix is calculated and divided by the square root of the key vector dimension for scaling. After softmax normalization, the time-step attention weight matrix is obtained. Based on this matrix, the value matrix is weighted and summed to obtain the temporal encoding features. The multi-head outputs are concatenated, then subjected to linear transformation for dimensionality reduction and aligned to a unified hidden dimension through a linear projection layer. During gated fusion, the spatial context features and temporal encoding features are concatenated and input into a fully connected layer, and a gating vector g is generated by sigmoid activation. The fusion result is the element-wise product of g and the spatial context features, plus the element-wise product of (1 - g), that is, one minus each element of the gating vector, and the temporal encoding features, which gives the spatiotemporal representation vector.
[0011] In a preferred embodiment, S2 further includes: A shared feature extraction layer and task-specific output layers are used. The output layer for the mean prediction task uses linear activation to output the demand mean, while the output layer for the standard deviation prediction task uses softplus activation, which ensures that the output is always positive, to output the demand standard deviation. The joint loss consists of the mean squared error loss of the mean task and the negative log-likelihood loss of the standard deviation task. A learnable uncertainty weight is introduced for each of the two tasks. The joint loss is the sum, over the two tasks, of the task loss divided by twice the square of the corresponding uncertainty weight parameter, plus the natural logarithm of each of the two uncertainty weight parameters as a regularization term. The two uncertainty weight parameters participate in backpropagation along with the network weights and are automatically adjusted according to the relative magnitude of the loss of each task, thereby achieving dynamic balance among tasks.
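By way of non-limiting illustration, the joint loss described above can be sketched as follows. The sketch assumes a Gaussian likelihood for the negative log-likelihood term (the text does not name the distribution), and the names `u_mean` and `u_std` for the two uncertainty weight parameters are hypothetical. In an actual training framework these two parameters would be trainable tensors, often parameterized through their logarithm to keep them positive.

```python
import math

def mse(targets, means):
    """Mean squared error of the mean-prediction task."""
    return sum((y - m) ** 2 for y, m in zip(targets, means)) / len(targets)

def gaussian_nll(targets, means, stds):
    """Negative log-likelihood of the standard-deviation task.
    Assumption: Gaussian likelihood; the constant 0.5 * ln(2 * pi) is dropped."""
    terms = [math.log(s) + (y - m) ** 2 / (2.0 * s * s)
             for y, m, s in zip(targets, means, stds)]
    return sum(terms) / len(terms)

def joint_loss(targets, means, stds, u_mean, u_std):
    """Each task loss divided by twice the squared uncertainty weight,
    plus the natural logarithm of each uncertainty weight as a regularizer."""
    return (mse(targets, means) / (2.0 * u_mean ** 2)
            + gaussian_nll(targets, means, stds) / (2.0 * u_std ** 2)
            + math.log(u_mean) + math.log(u_std))
```

A larger uncertainty weight down-weights its task loss but is penalized by the logarithm term, which is what lets the two parameters settle at a balance rather than growing without bound.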
[0012] In a preferred embodiment, S3 includes: The objective function is a weighted sum of transportation costs and uncertainty risk costs, with the sum of the weight coefficients being one, controlled by preset transportation cost weight thresholds and preset risk cost weight thresholds respectively. The uncertainty risk cost item is constructed as follows: based on the preset demand quantile confidence level threshold, the quantile value z is obtained by looking up the standard normal distribution table, and the upper bound of the demand quantile is obtained by adding the mean demand of each region to the product of z and the standard deviation of demand. The potential capacity gap is obtained by taking the positive part of the difference between the upper bound of the demand quantile and the allocated capacity. The allocated capacity is expressed as a linear function of the allocation decision variable. The potential capacity gap of each region is multiplied by the unit service failure penalty cost and weighted according to the regional priority weight to obtain the total uncertainty risk cost.
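By way of non-limiting illustration, the uncertainty risk cost term can be sketched as below. The quantile value z is taken from the standard normal distribution using the Python standard library instead of a printed table. The default confidence level of 0.95 and the default transportation weight of 0.5 are assumptions for the example only, and the function names are hypothetical.

```python
from statistics import NormalDist

def risk_cost(means, stds, capacities, unit_penalty, priorities, confidence=0.95):
    """Total uncertainty risk cost over all regions.
    `capacities` holds the capacity already allocated to each region."""
    z = NormalDist().inv_cdf(confidence)          # standard normal quantile
    total = 0.0
    for mu, sigma, cap, weight in zip(means, stds, capacities, priorities):
        upper_bound = mu + z * sigma              # upper bound of the demand quantile
        gap = max(upper_bound - cap, 0.0)         # positive part of the shortfall
        total += weight * unit_penalty * gap
    return total

def objective(transport_cost, risk, w_transport=0.5):
    """Weighted sum of transportation cost and risk cost; the weights sum to one."""
    return w_transport * transport_cost + (1.0 - w_transport) * risk
```

For example, a region with mean 100 and standard deviation 10 has an upper bound of roughly 116.4 at the 0.95 level, so an allocated capacity of 120 incurs no risk cost while a capacity of 90 does.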
[0013] In a preferred embodiment, S3 further includes: The nearest neighbor heuristic is used to generate an initial feasible solution. In each iteration, a destruction operation and a repair operation are performed. The destruction operation removes tasks from the current solution into a pool of tasks to be assigned, according to a preset destruction degree threshold. The destruction strategies include random removal, removal of related task groups, and removal of tasks with a high cost contribution. The algorithm adaptively adjusts the selection probability of each strategy based on how often it has historically produced improvements. The repair operation evaluates the incremental cost of inserting each task to be assigned into each vehicle at each position. The incremental cost includes the distance increment, the time increment, and the change in risk cost. Subject to the capacity constraints and time window constraints, the insertion with the minimum total incremental cost is selected. New solutions are accepted according to the simulated annealing criterion, in which the probability of accepting an inferior solution is controlled by a temperature parameter that decreases with the number of iterations. After the maximum number of iterations, the best solution found is output as the initial scheduling scheme.
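By way of non-limiting illustration, the destroy, repair, adaptive selection, and simulated annealing steps can be sketched on a toy problem. The toy problem (ordering distinct stops on a line starting from a depot at position 0) is an assumption made only to keep the example small: it omits vehicles, capacity constraints, time windows, the risk cost, and the nearest neighbor start. All function names, the cooling rate, and the weight update rule are hypothetical.

```python
import math
import random

def route_cost(route):
    """Toy cost: total travel along a line, starting from a depot at position 0."""
    stops = [0] + list(route)
    return sum(abs(b - a) for a, b in zip(stops, stops[1:]))

def random_removal(route, n, rng):
    removed = rng.sample(route, n)
    return [s for s in route if s not in removed], removed

def costly_removal(route, n, rng):
    # Remove the n stops whose removal saves the most cost.
    def saving(s):
        return route_cost(route) - route_cost([x for x in route if x != s])
    removed = sorted(route, key=saving, reverse=True)[:n]
    return [s for s in route if s not in removed], removed

def greedy_insert(partial, removed):
    # Repair: insert each removed stop at its cheapest position.
    route = list(partial)
    for stop in removed:
        pos = min(range(len(route) + 1),
                  key=lambda p: route_cost(route[:p] + [stop] + route[p:]))
        route.insert(pos, stop)
    return route

def alns(initial, cost, destroy_ops, repair, iterations=200, degree=0.3,
         t_start=10.0, cooling=0.98, reaction=0.2, seed=0):
    rng = random.Random(seed)
    current, best = list(initial), list(initial)
    weights = [1.0] * len(destroy_ops)            # one selection weight per strategy
    temperature = t_start
    for _ in range(iterations):
        k = rng.choices(range(len(destroy_ops)), weights=weights)[0]
        n_remove = max(1, int(degree * len(current)))
        partial, removed = destroy_ops[k](current, n_remove, rng)
        candidate = repair(partial, removed)
        delta = cost(candidate) - cost(current)
        improved = delta < 0
        # Simulated annealing: always accept improvements, sometimes accept worse.
        if improved or rng.random() < math.exp(-delta / temperature):
            current = candidate
        if cost(current) < cost(best):
            best = list(current)
        # Strategies that improved the solution become more likely to be chosen.
        weights[k] = (1 - reaction) * weights[k] + reaction * (1.0 if improved else 0.1)
        temperature *= cooling
    return best
```

A call such as `alns([9, 2, 7, 4, 1, 8, 3], route_cost, [random_removal, costly_removal], greedy_insert)` returns a reordering of the same stops whose cost is never higher than that of the starting route, because the best solution found so far is tracked separately from the current one.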
[0014] In a preferred embodiment, S4 includes: The system collects real-time data at a preset monitoring time interval threshold. At each interval, the latest historical demand sequence and real-time external features are input into the spatiotemporal graph attention network to infer the mean and standard deviation of demand in each region in real time. The demand deviation of each region is the absolute value of the difference between its actual demand and its predicted mean. When the demand deviation of a region exceeds its predicted standard deviation multiplied by the preset deviation triggering multiple threshold, the deviation is judged to be significant and dynamic scheduling optimization is triggered. Dynamic scheduling is also triggered when the remaining vehicle capacity in an area is about to become unable to cover new orders. The state vector contains, for each area, the actual demand, the predicted mean demand, the demand deviation, and the predicted standard deviation, together with the status of each vehicle, the current time encoding, and global statistical information. These parts are concatenated and feature-normalized to form the complete state vector.
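By way of non-limiting illustration, the deviation-based trigger can be sketched as below. The default multiple of 2.0 is an assumption for the example, the function name is hypothetical, and the capacity-based trigger is omitted.

```python
def regions_to_reschedule(actual, pred_mean, pred_std, multiple=2.0):
    """Return the indices of regions whose demand deviation exceeds
    `multiple` times their own predicted standard deviation."""
    triggered = []
    for region, (y, mu, sigma) in enumerate(zip(actual, pred_mean, pred_std)):
        deviation = abs(y - mu)
        if deviation > multiple * sigma:
            triggered.append(region)
    return triggered
```

With `actual=[100, 100]`, `pred_mean=[90, 90]`, and `pred_std=[3, 10]`, only region 0 is returned: the same deviation of 10 is significant for the stable region but within the expected range for the volatile one.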
[0015] In a preferred embodiment, S4 further includes: The pre-trained deep reinforcement learning policy network adopts an actor-critic architecture. The action space includes task reallocation actions, local vehicle path adjustment actions, inter-delivery-center collaborative support actions, and a no-operation action. For actions that do not meet the capacity constraints or time window constraints, the corresponding output value is set to negative infinity before softmax normalization to mask the infeasible actions. Training employs a proximal policy optimization algorithm, constructing the actor network objective function from the product of the probability ratio of the new and old policies and the advantage function. The probability ratio is clipped to an interval defined by a preset policy clipping radius threshold, and the smaller of the values before and after clipping is taken to prevent excessively large single updates. The reward function is the negative total cost, including additional travel distance and time costs, time window violation penalties, and the estimated transportation cost of completing the remaining tasks. During online application, the action with the highest probability is selected for execution; if the constraints are not met after execution, the system backtracks and selects the second-best action.
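By way of non-limiting illustration, the action masking and the clipped objective term can be sketched for a single decision. The clipping radius of 0.2 is an assumption for the example, the function names are hypothetical, and the sketch assumes at least one action (for instance the no-operation action) is always feasible.

```python
import math

def masked_softmax(logits, feasible):
    """Softmax over action scores with infeasible actions set to negative infinity."""
    masked = [x if ok else -math.inf for x, ok in zip(logits, feasible)]
    top = max(masked)                              # subtract the max for stability
    exps = [math.exp(x - top) for x in masked]     # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

def ppo_clip_term(new_prob, old_prob, advantage, clip=0.2):
    """Clipped surrogate for one sample: the smaller of the unclipped and
    clipped products of probability ratio and advantage."""
    ratio = new_prob / old_prob
    clipped_ratio = min(max(ratio, 1.0 - clip), 1.0 + clip)
    return min(ratio * advantage, clipped_ratio * advantage)
```

For a positive advantage the term stops growing once the ratio exceeds one plus the clipping radius, which removes the incentive to move the policy further in a single update. For a negative advantage the unclipped, more pessimistic value is kept.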
[0016] The beneficial effects of this invention are as follows: By jointly predicting the mean and standard deviation of demand in each region through a spatiotemporal graph attention network, the prediction uncertainty is explicitly incorporated into the scheduling objective function in the form of the upper bound of the demand quantile. This enables the initial scheduling scheme to reserve capacity margin for regions with high uncertainty and maintain the feasibility of the scheme when demand fluctuates upwards. This solves the problem of insufficient capacity caused by the neglect of prediction uncertainty in the existing scheduling scheme. The system uses the prediction standard deviation obtained from real-time inference as the adaptive trigger threshold for dynamic scheduling and incorporates it into the state vector of the deep reinforcement learning policy network. This enables the system to adaptively adjust the trigger sensitivity and adjustment range according to the current uncertainty level of each region, thus solving the problem of mismatch between response sensitivity and regional demand characteristics caused by the fixed trigger threshold in existing dynamic scheduling methods.
Attached Figure Description
[0017] Figures 1 to 3 are flowcharts of the artificial intelligence-based freight demand forecasting and scheduling optimization method of the present invention.
Detailed Implementation
[0018] The subject matter described herein will now be discussed with reference to exemplary embodiments. It should be understood that these embodiments are discussed only to enable those skilled in the art to better understand and implement the subject matter described herein, and changes may be made to the function and arrangement of the elements discussed without departing from the scope of this specification. Various processes or components may be omitted, substituted, or added as needed in the examples. Furthermore, some features described in the examples may be combined in other examples.
[0019] At least one embodiment of the present invention discloses a freight demand forecasting and scheduling optimization method based on artificial intelligence which, as shown in Figures 1 to 3, includes the following steps: S1: Obtain historical order records, historical weather records, holiday information, and regional static attribute data; construct a spatiotemporal feature map using an adaptive edge weight calculation that integrates geographical proximity and business association strength, together with static features, dynamic features, and historical demand features. Historical operational data is obtained from the business management systems of logistics companies. The data collection time span ranges from a lower limit threshold to an upper limit threshold of past data, with default values of three months and twelve months respectively, to ensure sufficient data volume to support model training while keeping the data timely. The core of the historical operational data is the order record table. Each record includes fields such as order generation time, origin delivery area number, destination delivery area number, cargo weight, cargo volume, cargo type label, and customer-required delivery time window. The delivery time window includes two subfields, earliest delivery time and latest delivery time, which are the source of the time window constraints in subsequent scheduling optimization. The raw order data is preprocessed to ensure data quality: records with missing key fields are removed or completed, timestamps are converted to a standard format, area number codes are mapped, and outliers in cargo weight and volume are detected and handled.
External environmental data is obtained from public data sources or meteorological service interfaces, including historical weather records (temperature, precipitation, wind speed, weather type), holiday information, and static attribute data of each delivery area (area type, area, population density, number of commercial outlets, etc.).
[0020] Historical order data is aggregated at hourly granularity to calculate the order demand for each delivery region within each hourly time period. This embodiment uses shipping demand as an example. The specific aggregation method is as follows: traverse the order record table, grouping and counting by the hour of the order generation time and the origin delivery region number, to obtain the order count for each region within each hourly time period, forming a two-dimensional demand matrix indexed by region number and timestamp. For missing time slots in the aggregated time series, linear interpolation is used by preference, estimating the missing value from the demand values of the time slots before and after it. For long runs of consecutive missing slots, the historical average of the same period is used to fill the gap so that periodic characteristics are preserved. Moving window mean smoothing is applied to anomalous abrupt changes. The deviation of each time slot from the data in its adjacent window is calculated, and if the deviation exceeds the abrupt change smoothing threshold, the value is replaced by the window mean. The abrupt change smoothing threshold equals the abrupt change multiple threshold multiplied by the standard deviation of the data within the window, and the default value of the abrupt change multiple threshold is two, which suppresses isolated spikes in the time series. After the above processing, a continuous and complete demand time series is obtained for each delivery region, serving as the data foundation for subsequent spatial graph construction and model training.
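By way of non-limiting illustration, the gap filling and spike smoothing can be sketched for one region's hourly series. Two details are assumptions of the sketch and are not stated in the text: missing slots are represented by `None`, and the window statistics exclude the slot being tested (with a short window, a single spike would otherwise inflate the window's own standard deviation and mask itself). The same-period historical average for long gaps is omitted, and the function names and the window half-width are hypothetical.

```python
from statistics import mean, pstdev

def fill_gaps(series):
    """Linearly interpolate runs of None that lie between two known values."""
    filled = list(series)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for k in range(left + 1, right):
            frac = (k - left) / span
            filled[k] = filled[left] + frac * (filled[right] - filled[left])
    return filled

def smooth_spikes(series, half_window=2, multiple=2.0):
    """Replace a slot with the mean of its neighbours when it deviates from
    that mean by more than `multiple` times the neighbours' standard deviation."""
    out = list(series)
    for i, value in enumerate(series):
        lo, hi = max(0, i - half_window), min(len(series), i + half_window + 1)
        neighbours = [series[k] for k in range(lo, hi) if k != i]
        if len(neighbours) < 2:
            continue
        centre, spread = mean(neighbours), pstdev(neighbours)
        if abs(value - centre) > multiple * spread:
            out[i] = centre
    return out
```

For example, `fill_gaps([4, None, None, 10])` fills the two missing hours with 6 and 8, and `smooth_spikes([10, 11, 100, 9, 10])` replaces the isolated value 100 with the mean of its four neighbours.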
[0021] A spatial graph structure of delivery areas is constructed, with the node set encompassing all delivery areas. Edge construction employs an adaptive method that integrates geographical proximity and business association strength. First, the Euclidean distance matrix between areas is calculated based on the coordinates of each area's center point. Initial edge connections are established between area pairs whose distance is less than a geographical proximity threshold, which is set by default to the median of the distance distribution between areas in the urban delivery network. Geographical proximity is defined as a negative exponential function of distance, i.e., proximity equals exp(-d / s), where d is the distance between the two areas and s is a scale parameter. The outgoing-edge proximity of each node is normalized so that it sums to one for each node. Second, the order flow between every pair of areas is counted from historical orders. For area i and area j, the total number of historical orders from i to j is divided by the total number of historical orders originating from area i to obtain the business association strength. This ratio naturally lies between zero and one and reflects the actual impact of area i's demand on area j.
[0022] Geographical proximity and business association strength are combined by weighting to give the edge weights. The overall edge weight equals the geographical weight coefficient multiplied by the normalized geographical proximity plus the business weight coefficient multiplied by the business association strength, with the two weight coefficients summing to one. The business weight coefficient can be slightly higher than the geographical weight coefficient to highlight order-driven associations: the geographical weight coefficient is the geographical weight threshold, with a default value of 0.4, and the business weight coefficient is the business weight threshold, with a default value of 0.6. A threshold filter is applied to the overall edge weight matrix, controlling the graph sparsity by retaining, for each node, only the outgoing edges with the highest weights. The number of retained edges is the node outgoing edge retention threshold, with a default value of five. The retained edges are then normalized again to obtain the adjacency matrix A of the adaptive spatial graph. The matrix element A_ij represents the normalized overall association strength from region i to region j, and is zero where there is no edge. This adjacency matrix integrates spatial proximity and business flow patterns, providing the topological foundation for the subsequent spatial information propagation of the graph neural network.
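By way of non-limiting illustration, the construction of the adjacency matrix A can be sketched with NumPy. The text leaves open whether business association strength may create edges between regions that are not geographically adjacent; this sketch assumes that it may, and it removes self-loops. The function names and the `scale` argument are hypothetical, and `top_k` is assumed to be at least one.

```python
import numpy as np

def _row_normalise(m):
    """Scale each row to sum to one; all-zero rows stay zero."""
    totals = m.sum(axis=1, keepdims=True)
    return np.divide(m, totals, out=np.zeros_like(m), where=totals > 0)

def build_adjacency(coords, flows, dist_threshold, scale,
                    w_geo=0.4, w_biz=0.6, top_k=5):
    """coords: (N, 2) region centre points; flows[i, j]: historical orders i -> j."""
    coords = np.asarray(coords, dtype=float)
    flows = np.asarray(flows, dtype=float)
    n = len(coords)

    # Pairwise Euclidean distances and the initial distance-threshold edges.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    near = (dist < dist_threshold) & ~np.eye(n, dtype=bool)

    # Geographical proximity exp(-d / scale), normalised over outgoing edges.
    geo = _row_normalise(np.where(near, np.exp(-dist / scale), 0.0))

    # Business association strength: share of region i's orders going to j.
    biz = _row_normalise(flows)

    weights = w_geo * geo + w_biz * biz
    np.fill_diagonal(weights, 0.0)

    # Keep only the top_k largest outgoing weights per node, then renormalise.
    if top_k < n:
        smallest = np.argsort(weights, axis=1)[:, :-top_k]
        np.put_along_axis(weights, smallest, 0.0, axis=1)
    return _row_normalise(weights)
```

Each row of the result sums to one (or to zero for a region with no retained edge), which matches the description of A_ij as a normalized association strength from region i to region j.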
[0023] Node features of each region are extracted. For static features, region type is encoded using one-hot encoding, converting the classification label into a binary vector of the same length as the number of categories, with the position of the corresponding category set to one and the rest to zero. Numerical features such as population density, number of commercial outlets, and region area are standardized using z-scores: the mean of the feature over all regions is subtracted from the original value and the result is divided by the standard deviation, so that each feature has zero mean and unit standard deviation, eliminating differences in scale. The one-hot encoded vector and the standardized numerical features are concatenated sequentially to form the static feature vector of each region, which does not change over time. Regarding dynamic features, for each time step, time attributes such as day of the week, month, hour, whether it is a holiday, and whether it is a weekday are extracted. The day of the week and month are encoded using one-hot encoding, while the hour is encoded using periodic encoding. Specifically, the hour value h is mapped to the two-dimensional vector (sin(2πh/24), cos(2πh/24)), that is, the sine and cosine of 2 times pi multiplied by the hour value and divided by 24. This ensures that adjacent hours, including hour 23 and hour 0, are close to each other in the encoding space. Holidays and weekdays are coded as binary features of zero or one. Weather features for the corresponding time step are extracted. Temperature and precipitation are normalized, weather type is encoded using one-hot encoding, and wind force level is encoded as a normalized value. The time attribute features and weather features are concatenated to form the dynamic feature vector for that time step, and the dynamic feature vector is shared across all regions.
For each region at each time step, the static feature vector of the region, the dynamic feature vector of the time step, and the historical demand value of the region at that time step are concatenated in sequence to form the complete feature vector of the node at that time step. The sequential concatenation of the above three types of features is the feature fusion operation in this step, which comprehensively describes the inherent attributes of the region, external time factors, and historical demand information.
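By way of non-limiting illustration, the encodings and the sequential concatenation can be sketched as below. The function names are hypothetical, and the population standard deviation is assumed for the z-score.

```python
import math

def one_hot(index, size):
    """Binary vector with a one at `index` and zeros elsewhere."""
    return [1.0 if k == index else 0.0 for k in range(size)]

def encode_hour(hour):
    """Periodic encoding of the hour: (sin(2*pi*h/24), cos(2*pi*h/24))."""
    angle = 2.0 * math.pi * hour / 24.0
    return [math.sin(angle), math.cos(angle)]

def z_scores(values):
    """Standardize one numerical feature across all regions."""
    mu = sum(values) / len(values)
    sd = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]

def node_feature(static_vec, dynamic_vec, demand):
    """Static features, then dynamic features, then the historical demand value."""
    return list(static_vec) + list(dynamic_vec) + [float(demand)]
```

With this encoding the distance between hour 23 and hour 0 equals the distance between hour 0 and hour 1, whereas a raw integer hour would place 23 and 0 at opposite ends of the range.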
[0024] For the prediction task, a sliding window method is used to construct training samples. Data from T consecutive historical time steps are selected as the input sequence, where T is the time window length, i.e., the time window length threshold. The default time window length threshold is 24 hours, meaning that the historical data of the past 24 hours is used. The subsequent time steps are used as the prediction target, and the sliding window moves chronologically to generate multiple training samples. The feature vector sequences of all nodes are combined with the adjacency matrix A of the adaptive spatial graph to construct the complete spatiotemporal feature map. Spatially, the adjacency matrix A expresses the topological relationships and influence weights between regions, while temporally, the feature vector sequence of each node expresses the historical evolution of demand and the dynamic changes of external factors. Together they serve as the input representation for the subsequent prediction model.
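By way of non-limiting illustration, the sliding window sample construction can be sketched as below. The text does not state how many subsequent time steps form the prediction target, so the `horizon` argument and its default of one step are assumptions, and the function name is hypothetical.

```python
def sliding_windows(sequence, window=24, horizon=1):
    """Split a chronological sequence into (input, target) pairs.
    Each input holds `window` consecutive steps; the target holds the
    `horizon` steps that immediately follow."""
    samples = []
    for start in range(len(sequence) - window - horizon + 1):
        inputs = sequence[start:start + window]
        targets = sequence[start + window:start + window + horizon]
        samples.append((inputs, targets))
    return samples
```

A sequence of 30 hourly values with the default settings yields six samples, the first covering hours 0 to 23 with hour 24 as the target.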
[0025] S2: Input the spatiotemporal feature map into the spatiotemporal graph attention network, train it using a multi-task learning framework that combines a gated-fusion spatiotemporal representation with a joint loss weighted by learnable uncertainty weights, and output the mean and standard deviation of future demand for each region. The spatiotemporal feature map constructed in step S1 is input into the spatiotemporal graph attention network for encoding and prediction. The network adopts a multi-layer stacked structure, with each layer containing a spatial encoding module and a temporal encoding module. The number of layers is the network stacking layer threshold, with a default value of three.
[0026] The spatial encoding module handles information propagation in the spatial dimension based on a graph attention mechanism. For each target node, its neighborhood node set consists of the nodes connected to it in the adjacency matrix A. For target node i and each neighboring node j, their feature vectors at the current time step t are obtained and mapped to the hidden dimension space through a learnable linear transformation matrix W to obtain the transformed feature representation. When calculating the attention score between the target node and its neighboring nodes, an additive attention mechanism is used: the transformed features of the two nodes are concatenated, and then the inner product is taken with the learnable attention parameter vector. The original attention score is obtained by applying the LeakyReLU activation function. The higher the score, the greater the information contribution of the neighboring node to the target node. The original attention scores of all neighboring nodes of the target node are normalized by softmax to obtain attention weights, and the sum of the weights is one. The transformed features of the neighboring nodes are weighted and aggregated based on the attention weights, and the spatial context features of the target node are obtained through a non-linear activation function. To enhance the ability to characterize different spatial association patterns, a multi-head graph attention mechanism is adopted, which sets up several independent attention heads. Each head is calculated independently using different parameters. The outputs of each head are concatenated and mapped to a unified hidden dimension through a linear projection layer to obtain the final spatial context features.
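By way of non-limiting illustration, one attention head and the multi-head combination can be sketched with NumPy. The text does not name the final nonlinear activation, so `tanh` is an assumption, as is the LeakyReLU slope of 0.2. The sketch also assumes every node has at least one neighbor in A, and all function and argument names are hypothetical.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_head(H, A, W, a):
    """One graph attention head.
    H: (N, F) node features at one time step; A: (N, N) adjacency, where
    A[i, j] > 0 means j is a neighbour of i; W: (F, D); a: (2 * D,)."""
    Z = H @ W                                      # transformed features, (N, D)
    D = Z.shape[1]
    # a . [z_i || z_j] splits into a target-node part and a neighbour part.
    scores = leaky_relu((Z @ a[:D])[:, None] + (Z @ a[D:])[None, :])
    scores = np.where(A > 0, scores, -np.inf)      # attend to neighbours only
    scores = scores - scores.max(axis=1, keepdims=True)
    alpha = np.exp(scores)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)   # softmax over neighbours
    return np.tanh(alpha @ Z), alpha

def multi_head_gat(H, A, heads, W_out):
    """Concatenate the head outputs and project to the unified hidden dimension.
    heads: list of (W, a) pairs; W_out: (num_heads * D, hidden)."""
    outputs = [gat_head(H, A, W, a)[0] for W, a in heads]
    return np.concatenate(outputs, axis=1) @ W_out
```

Because the inner product with the concatenated vector is linear, it can be computed as the sum of two per-node terms, which avoids materializing every concatenated pair.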
[0027] The temporal encoding module employs a multi-head self-attention mechanism to handle temporal dependencies. The feature sequence of the node over its T historical time steps is taken as input, with the features of each time step having already undergone spatial encoding and been fused with spatial context information. Positional encoding is applied to the feature sequence using a fixed trigonometric function encoding method, superimposing positional information onto the feature vectors so that the model can distinguish the positions of different time steps. The feature sequence with positional information is then passed through three independent linear transformations to generate a query matrix, a key matrix, and a value matrix. The dot product of the query matrix and the key matrix is calculated, divided by the square root of the key vector dimension for scaling to stabilize the gradient, and then normalized using softmax to obtain the attention weight matrix between time steps. Each element of the matrix reflects the correlation between two time steps. The value matrix is weighted and summed based on the attention weight matrix to obtain the temporal encoding feature for each time step. This feature integrates information from all time steps in the sequence, capturing long-range dependencies and periodic patterns. A multi-head mechanism is again used: the outputs of the heads are concatenated and a linear transformation reduces the dimensionality, giving the final temporal encoding feature. The concatenated multi-head output of the temporal encoding module is mapped through a linear projection layer to the same hidden dimension as the output of the spatial encoding module, ensuring that the two types of features have the same dimension before entering the gated fusion.
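By way of non-limiting illustration, the positional encoding and a single self-attention head can be sketched with NumPy. The text only says that fixed trigonometric functions are used; the specific frequency schedule with base 10000 follows the common Transformer convention and is an assumption here, as are the function names.

```python
import numpy as np

def positional_encoding(T, d):
    """Fixed sine/cosine encoding for T time steps and feature width d.
    Assumption: Transformer-style frequencies with base 10000."""
    pos = np.arange(T)[:, None]
    i = np.arange(d)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over the time axis.
    X: (T, d) features with positional encoding already added."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])         # scale by sqrt of key dimension
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over time steps
    return weights @ V, weights
```

A typical call adds the encoding before attention, for example `self_attention(X + positional_encoding(24, 8), Wq, Wk, Wv)` for a 24-step window. Running several such heads with separate parameters and concatenating their outputs gives the multi-head form.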
[0028] Spatial context features and temporal encoding features are fused using a gating mechanism. The multi-head concatenated outputs of the spatial encoding module and of the temporal encoding module are each aligned to a unified hidden dimension by a linear projection layer, so that the two feature vectors have consistent dimensions. The two aligned outputs are concatenated and fed into a fully connected layer, where a sigmoid activation function generates a gate vector. Each element of the gate vector takes a value between zero and one. The fusion result is the element-wise product of the gate vector and the spatial context features, plus the element-wise product of one minus the gate vector (each element subtracted from one) and the temporal encoding features, yielding the fused spatiotemporal representation vector. The gate vector is learned from data. In scenarios dominated by spatial correlation the gate values tend to be larger, while in scenarios dominated by temporal trends the gate values tend to be smaller, allowing the model to adaptively adjust the contribution ratio of the two types of information for each sample.
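The gated fusion can be written compactly as fused = g * h_s + (1 - g) * h_t with g = sigmoid(W_g [h_s || h_t] + b_g). The sketch below illustrates this under the assumption that both inputs already share the hidden dimension; the names `Wg` and `bg` are hypothetical stand-ins for the parameters of the fully connected layer.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(h_spatial, h_temporal, Wg, bg):
    """Return (gate, fused) where fused = g * h_s + (1 - g) * h_t element-wise."""
    concat = h_spatial + h_temporal  # list concatenation [h_s || h_t]
    gate = [sigmoid(sum(w * v for w, v in zip(row, concat)) + b)
            for row, b in zip(Wg, bg)]
    fused = [g * s + (1.0 - g) * t
             for g, s, t in zip(gate, h_spatial, h_temporal)]
    return gate, fused

# With zero weights and biases every gate element is 0.5, so the result is
# the plain average of the two feature vectors.
gate, fused = gated_fusion([2.0, 4.0], [0.0, 0.0],
                           Wg=[[0.0] * 4, [0.0] * 4], bg=[0.0, 0.0])
```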
[0029] A multi-task learning framework is constructed based on the spatiotemporal representation vectors, employing a shared feature extraction layer and task-specific output layers. The shared layer is a multi-layer fully connected network that performs nonlinear transformations on the spatiotemporal representation vectors to extract high-level abstract features, providing a common feature foundation for both prediction tasks. The output layer of the mean prediction task uses linear activation to output the mean demand for each region in each prediction period, serving as a point estimate of future demand. The output layer of the standard deviation prediction task uses a softplus activation function, which takes the natural logarithm of one plus the natural constant raised to the power of the input value, ensuring the output is always positive. It outputs the standard deviation of demand for each region in each prediction period, quantifying the degree of uncertainty in the prediction.
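For reference, softplus(x) = ln(1 + e^x). The sketch below shows a numerically stable way to evaluate it and how the two task heads might turn shared features into a mean and a standard deviation. The helper names, the head parameters, and the small `eps` floor (added because softplus can underflow to zero in floating point for very negative inputs) are assumptions for illustration.

```python
import math

def softplus(x):
    """ln(1 + e^x), rearranged to avoid overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def output_heads(shared, w_mean, b_mean, w_std, b_std, eps=1e-6):
    """Linear head for the mean, softplus head for the standard deviation."""
    mu = sum(w * v for w, v in zip(w_mean, shared)) + b_mean
    raw = sum(w * v for w, v in zip(w_std, shared)) + b_std
    sigma = softplus(raw) + eps  # eps floor is an assumption, not in the text
    return mu, sigma

# Hypothetical shared features and head parameters.
mu, sigma = output_heads([1.0, 2.0], [3.0, 1.0], 0.5, [-4.0, -4.0], 0.0)
```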
[0030] The joint loss function consists of two parts. The loss for the mean prediction task uses the mean squared error: for all prediction targets, the square of the difference between the actual demand value and the predicted mean is calculated and averaged to measure the overall deviation between the predicted mean and the actual demand value. The loss for the standard deviation prediction task uses the negative log-likelihood under the normal distribution assumption. It is calculated as follows: for each prediction target, the output layer of the standard deviation prediction task is activated by softplus to obtain the predicted standard deviation, and the natural logarithm of this predicted standard deviation is taken to obtain the logarithmic standard deviation. The negative log-likelihood contribution for that target is the logarithmic standard deviation plus the square of the difference between the actual value and the predicted mean divided by twice the square of the predicted standard deviation (the constant term of the normal log-density is omitted). Summing these contributions over all targets yields the standard deviation task loss. This loss function encourages the model to output a larger standard deviation to reflect high uncertainty when the predicted mean deviates significantly from the actual value; conversely, when the predicted mean is close to the actual value, the model is encouraged to output a smaller standard deviation to reflect low uncertainty. This ensures that the quality of the standard deviation prediction matches that of the mean prediction, avoiding a disconnect between the standard deviation output and the actual prediction error.
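The two task losses can be illustrated as below. Per target, the negative log-likelihood term is ln(sigma) + (y - mu)^2 / (2 sigma^2). For a fixed error, this term is smallest when sigma equals the absolute error, which is the property the paragraph relies on. The function names are assumptions for the example.

```python
import math

def mse_loss(y, mu):
    """Mean squared error between actual demand and predicted mean."""
    return sum((a - m) ** 2 for a, m in zip(y, mu)) / len(y)

def gaussian_nll_loss(y, mu, sigma):
    """Sum over targets of ln(sigma) + (y - mu)^2 / (2 sigma^2)."""
    return sum(math.log(s) + (a - m) ** 2 / (2.0 * s * s)
               for a, m, s in zip(y, mu, sigma))

# A prediction that is off by 2 units, scored under three candidate sigmas.
too_confident = gaussian_nll_loss([12.0], [10.0], [1.0])
matched = gaussian_nll_loss([12.0], [10.0], [2.0])
too_vague = gaussian_nll_loss([12.0], [10.0], [4.0])
```

In this toy case `matched` is the smallest of the three values, so both overconfident and overly wide standard deviations are penalized.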
[0031] To balance the losses of the two tasks, learnable uncertainty weights are introduced, denoted as the mean task uncertainty weight and the standard deviation task uncertainty weight, corresponding to the homoscedastic uncertainty level of each task. The joint loss is constructed as follows: the mean task loss divided by twice the square of the mean task uncertainty weight, plus the standard deviation task loss divided by twice the square of the standard deviation task uncertainty weight, plus the natural logarithm of each uncertainty weight as a regularization term. This regularization term prevents the uncertainty weights from increasing indefinitely, which would otherwise drive the weighted task losses toward zero, and thus ensures effective optimization of both tasks. The two uncertainty weights participate in backpropagation along with the network weights and adjust automatically during training according to the relative magnitude of the task losses: when a task loss is relatively large, the corresponding weight tends to increase, reducing the contribution of that task to the joint loss, thus achieving dynamic balance between tasks and preventing a single task from dominating the training process. Training employs a backpropagation algorithm and an adaptive optimizer, with historical data divided into training, validation, and test sets. Parameters are updated on the training set, and generalization performance is monitored on the validation set with an early stopping strategy to prevent overfitting. After training, the error of the predicted mean and the calibration quality of the predicted standard deviation are evaluated on the test set. The calibration quality of the standard deviation is evaluated using a coverage index, which is the proportion of actual demand values that fall within a confidence interval centered on the predicted mean with the predicted standard deviation as the radius. The closeness of this proportion to the theoretical confidence level reflects the calibration quality of the standard deviation prediction.
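A small sketch of the uncertainty-weighted joint loss and the coverage index follows. The names `s_mean` and `s_std` stand for the two learnable uncertainty weights and are hypothetical. For a fixed task loss L, the term L / (2 s^2) + ln(s) is minimized at s^2 = L, which matches the statement that a larger task loss pulls its weight upward. With a radius of one standard deviation, the theoretical coverage under a normal distribution is about 68.3%.

```python
import math

def joint_loss(l_mean, l_std, s_mean, s_std):
    """L_mean / (2 s_mean^2) + L_std / (2 s_std^2) + ln(s_mean) + ln(s_std)."""
    return (l_mean / (2.0 * s_mean ** 2) + l_std / (2.0 * s_std ** 2)
            + math.log(s_mean) + math.log(s_std))

def coverage(y, mu, sigma, radius=1.0):
    """Share of actual values inside [mu - radius*sigma, mu + radius*sigma]."""
    hits = sum(1 for a, m, s in zip(y, mu, sigma) if abs(a - m) <= radius * s)
    return hits / len(y)

# For a mean-task loss of 4.0 the joint loss is lowest near s_mean = 2.0.
at_optimum = joint_loss(4.0, 1.0, 2.0, 1.0)
below = joint_loss(4.0, 1.0, 1.5, 1.0)
above = joint_loss(4.0, 1.0, 3.0, 1.0)

# Hypothetical test-set values: two of three targets fall inside the interval.
cov = coverage([10.0, 12.0, 20.0], [10.0, 10.0, 10.0], [1.0, 3.0, 5.0])
```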
[0032] During model inference, the mean and standard deviation of demand for each region are output, resulting in a probability distribution description of future demand for each region under the assumption of a normal distribution. The mean demand provides a point estimate of future demand and forms the basis for scheduling decisions; the standard deviation quantifies the uncertainty of the forecast, with a larger standard deviation indicating lower forecast confidence and a greater likelihood of demand fluctuations. This probability distribution description enables subsequent scheduling to address uncertainty risk costs within the same probabilistic language, providing a quantitative basis for robust scheduling scheme generation and dynamic triggering mechanisms.
[0033] S3: Based on the mean and standard deviation of demand in each region, the risk penalty of the capacity gap is quantified using the upper bound of the demand quantile, an objective function is constructed by weighting this penalty together with transportation costs, and the objective is solved by adaptive large neighborhood search to generate an initial scheduling scheme. The mean and standard deviation vectors of future demand for each region output from step S2 are obtained. Vehicle resource information for each distribution center is obtained, including the number of available vehicles, load capacity, volumetric capacity, and vehicle type. The distance matrix between distribution regions is obtained, where matrix elements represent the shortest path distance or historical mileage statistics between regions. The estimated travel time matrix between regions is obtained, where matrix elements represent the estimated travel time from one region to another, which can be calculated by combining historical average speed and distance or obtained from a map service interface.
[0034] A multi-objective optimization model for collaborative scheduling of multiple distribution centers is constructed. The decision variables include three categories: the allocation relationship between tasks and vehicles (binary variables, indicating whether a certain delivery task is executed by a certain vehicle), the order of vehicle access to tasks (binary variables, indicating whether a vehicle goes directly from one task to another), and the time when vehicles arrive at each task point (continuous variables).
[0035] The objective function is a weighted sum of the total transportation cost and the uncertainty risk cost. The weight of the transportation cost is the transportation cost weight threshold, with a default value of 0.6, and the weight of the risk cost is the risk cost weight threshold, with a default value of 0.4; the two weights sum to one. By adjusting these two weight coefficients, a balance is achieved between economic efficiency and the scheme's resilience to demand fluctuations. The transportation cost component includes three parts: distance cost, time-related cost, and fixed vehicle cost. These are obtained by linearly combining the corresponding decision variables with the coefficients for unit distance cost, unit time cost, and unit vehicle fixed cost, respectively.
[0036] The uncertainty risk cost term is constructed based on the demand mean and standard deviation output in step S2. Its core idea is to calculate the upper bound of the demand quantile at a given confidence level for each delivery region, and to quantify the gap between the allocated capacity and this upper bound as a risk penalty. The specific calculation process is as follows. For each delivery region, first obtain the demand mean and standard deviation for the corresponding forecast period from step S2. Then determine the confidence level, which is the demand quantile confidence level threshold, with a default value of 90%. Based on this confidence level, look up the corresponding quantile value in the standard normal distribution table, denoted as the z-value. Add the demand mean to the product of the z-value and the demand standard deviation to obtain the upper bound of the demand quantile for that region at this confidence level. Under the normal distribution assumption, the probability that actual demand does not exceed this upper bound equals the confidence level, so the upper bound reflects the upward risk boundary of demand.
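The table lookup described above can be reproduced with the Python standard library. The sketch assumes the confidence level is interpreted as a one-sided quantile of the standard normal distribution, for which the 90% z-value is approximately 1.2816; the function names are hypothetical.

```python
from statistics import NormalDist

def z_value(confidence=0.90):
    """One-sided standard normal quantile for the given confidence level."""
    return NormalDist().inv_cdf(confidence)

def demand_upper_bound(mean, std, confidence=0.90):
    """Upper bound of the demand quantile: mean + z * std."""
    return mean + z_value(confidence) * std

# Hypothetical region with mean demand 100 and standard deviation 10.
upper = demand_upper_bound(100.0, 10.0)  # roughly 112.8
```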
[0037] After obtaining the upper bound of the demand quantiles for each region, the potential capacity gap for each region is calculated. For each delivery region, the allocated capacity equals the sum of the cargo volume of all tasks assigned to that region in the current scheduling plan. Since the allocation relationship between tasks and vehicles is an optimization decision variable, the allocated capacity changes dynamically with the value of the decision variable. Therefore, it can be expressed as a linear function of the allocation decision variable, making the capacity gap a computable function of the decision variable. The entire risk cost term remains optimizable with respect to the decision variable and is updated in real time as the plan is adjusted during the solution process. The difference between the upper bound of the demand quantiles and the allocated capacity is taken, and the positive part of the difference is recorded. That is, when the allocated capacity is greater than or equal to the upper bound of the demand quantiles, the gap is zero; when the allocated capacity is less than the upper bound of the demand quantiles, the gap is the difference between the two, thus obtaining the potential capacity gap for that region. The capacity gap reflects the amount of insufficient capacity that may occur at a given confidence level. The larger the gap, the higher the risk of service failure when demand increases in that region. Multiplying the potential capacity gap in each region by the unit service failure penalty cost yields the risk penalty value for each region. These values are then weighted and accumulated according to regional priority to obtain the total uncertainty risk cost. Regional priority weights can be set based on customer level, service contract level, or regional strategic importance, with higher-priority regions receiving larger weights to ensure priority service quality in key regions. 
By explicitly incorporating the demand standard deviation output from step S2 into the optimization objective via the upper bound of the demand quantiles, the scheduling scheme, while pursuing minimum transportation costs, can reserve more capacity margin for regions with high standard deviations. This maintains the feasibility of the scheme even when demand fluctuates upwards, achieving a fusion of predictive uncertainty and scheduling decisions within the same optimization objective.
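The risk cost term and the weighted objective can be sketched as follows. The dictionary-based interface, the region labels, and the numeric values are assumptions for illustration; only the default weights 0.6 and 0.4 come from the description.

```python
def risk_cost(upper_bound, allocated, priority, unit_penalty):
    """Priority-weighted sum of unit_penalty * max(0, upper_bound - allocated)."""
    total = 0.0
    for region, ub in upper_bound.items():
        gap = max(0.0, ub - allocated.get(region, 0.0))  # potential capacity gap
        total += priority.get(region, 1.0) * unit_penalty * gap
    return total

def objective(transport_cost, risk, w_transport=0.6, w_risk=0.4):
    """Weighted sum of transportation cost and uncertainty risk cost."""
    return w_transport * transport_cost + w_risk * risk

# Hypothetical data: region A is under-covered by 12.8 units, region B is not.
risk = risk_cost(upper_bound={"A": 112.8, "B": 50.0},
                 allocated={"A": 100.0, "B": 60.0},
                 priority={"A": 1.5, "B": 1.0},
                 unit_penalty=2.0)
total = objective(transport_cost=1000.0, risk=risk)
```

Because the gap is computed from the quantile upper bound, a region with a larger predicted standard deviation incurs a penalty sooner for the same allocated capacity, which is how the optimization is steered toward reserving margin there.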
[0038] The constraints include: vehicle load and volume capacity constraints, requiring that the total weight of the goods in the tasks assigned to each vehicle does not exceed its load capacity and that their total volume does not exceed its volumetric capacity; customer time window constraints, requiring that the time at which a vehicle arrives at each task point lies between the earliest and latest service times required by the customer; unique task assignment constraints, requiring that each delivery task is assigned to exactly one vehicle; path validity constraints, requiring that each vehicle departs from its own distribution center, visits its assigned tasks in sequence, and returns to that distribution center, with each task visited only once on the path, which is enforced through flow balance conditions and subtour elimination constraints; and vehicle quantity constraints, requiring that the total number of vehicles used does not exceed the upper limit of available vehicles.
[0039] An adaptive large neighborhood search algorithm is used to solve this optimization model. The algorithm first uses a nearest neighbor heuristic to generate an initial feasible solution: for each delivery task, vehicles that meet the capacity constraints and are closest to the distribution center are selected, and an initial path is constructed in geographical order. In the iterative search phase, each iteration includes two steps: a destroy (disruption) operation and a repair operation.
[0040] The disruption operation removes a certain proportion of tasks from the current solution and places them into a task pool awaiting reallocation. This proportion is controlled by a disruption severity threshold, which defaults to 30%. Disruption strategies include random removal, related removal (removing a group of tasks that are geographically or temporally similar to a seed task), and worst removal (removing the tasks that contribute the most to the current objective function value). The algorithm maintains a selection probability for each disruption strategy and adjusts it adaptively according to how often each strategy has brought an improvement in past iterations, increasing the probability of well-performing strategies and decreasing the probability of poorly performing ones.
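One common way to realize this adaptive selection is roulette-wheel sampling over operator weights that are smoothed toward recent rewards. The sketch below uses that scheme as an assumption; the description itself only states that probabilities follow historical improvement frequency. The class name, the reaction factor of 0.1, and the reward value are hypothetical.

```python
import random

class OperatorSelector:
    """Roulette-wheel selection with exponentially smoothed operator weights."""

    def __init__(self, names, reaction=0.1):
        self.weights = {n: 1.0 for n in names}
        self.reaction = reaction  # smoothing factor, an assumed hyperparameter

    def probabilities(self):
        total = sum(self.weights.values())
        return {n: w / total for n, w in self.weights.items()}

    def choose(self, rng=random):
        names = list(self.weights)
        return rng.choices(names, weights=[self.weights[n] for n in names], k=1)[0]

    def update(self, name, reward):
        # Blend the old weight with the reward earned in this iteration.
        old = self.weights[name]
        self.weights[name] = (1.0 - self.reaction) * old + self.reaction * reward

def removal_count(n_tasks, severity=0.30):
    """Number of tasks to remove for the default 30% disruption severity."""
    return max(1, round(severity * n_tasks))

selector = OperatorSelector(["random", "related", "worst"])
selector.update("worst", reward=10.0)  # e.g. worst removal found a better solution
probs = selector.probabilities()
picked = selector.choose(random.Random(0))
```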
[0041] For each task in the task pool, the repair operation evaluates the incremental cost of inserting that task at each position in each vehicle's path. The incremental cost includes three parts: distance increment, time increment, and change in risk cost. The change in risk cost is calculated as follows: after a task is inserted into a vehicle, the allocated capacity of the destination region increases by the cargo volume of that task. The capacity gap of that region is recalculated, and the difference between the recalculated gap and the pre-insertion gap is multiplied by the unit penalty cost and the region's priority weight. The result is the change in risk cost caused by the insertion, which is then combined with the distance and time increments to form the total incremental cost. Provided that the capacity and time window constraints are satisfied, the insertion with the minimum total incremental cost is selected and executed, and this is repeated until all tasks have been re-inserted, resulting in a repaired solution.
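The change in risk cost for a single insertion can be sketched as below. Since inserting a task only adds allocated capacity to its destination region, the change is never positive. The function names and the cost coefficients `c_dist` and `c_time` used to combine the three increments are assumptions; the description does not fix how the parts are scaled.

```python
def risk_cost_change(upper_bound, allocated, volume, unit_penalty, priority):
    """Change in risk cost when a task of the given volume is inserted."""
    gap_before = max(0.0, upper_bound - allocated)
    gap_after = max(0.0, upper_bound - (allocated + volume))
    return priority * unit_penalty * (gap_after - gap_before)

def total_increment(d_dist, d_time, d_risk, c_dist=1.0, c_time=1.0):
    """Combine the three increments; c_dist and c_time are hypothetical."""
    return c_dist * d_dist + c_time * d_time + d_risk

def best_insertion(candidates):
    """Pick the feasible candidate with the lowest total incremental cost."""
    feasible = [c for c in candidates if c["feasible"]]
    return min(feasible, key=lambda c: c["cost"]) if feasible else None

d_risk = risk_cost_change(upper_bound=112.8, allocated=100.0, volume=5.0,
                          unit_penalty=2.0, priority=1.5)
options = [
    {"vehicle": "v1", "pos": 2, "feasible": True,
     "cost": total_increment(8.0, 3.0, d_risk)},
    {"vehicle": "v2", "pos": 0, "feasible": True,
     "cost": total_increment(4.0, 2.0, 0.0)},
    {"vehicle": "v3", "pos": 1, "feasible": False, "cost": -100.0},
]
choice = best_insertion(options)
```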
[0042] A simulated annealing acceptance criterion is used for new solutions: if the objective function value of the new solution is better than the current solution, it is accepted directly; if the new solution is worse than the current solution, it is accepted with a probability controlled by a temperature parameter that decreases with the number of iterations, allowing the algorithm to escape local optima. Simultaneously, adaptive weights are updated based on the effects of each operation in the current iteration: the highest reward is given for discovering a new global optimum, a medium reward for accepting an improved solution, a small reward for accepting a worse solution, and no reward for not accepting it. After iterating to the maximum number of iterations or the running time limit, the historical best solution is output as the initial scheduling scheme. The scheme includes a task allocation table for each distribution center, a complete access sequence for each vehicle, and the estimated arrival time for each task point.
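A minimal sketch of the acceptance test is shown below. It assumes the usual Metropolis form exp(-delta / T) for the acceptance probability and a geometric cooling schedule; the description only states that the probability is controlled by a temperature that decreases with the iteration count, so both choices, and the cooling rate, are assumptions.

```python
import math
import random

def accept(new_cost, current_cost, temperature, rng=random):
    """Accept improvements always; accept worse solutions with prob exp(-delta/T)."""
    if new_cost < current_cost:
        return True
    if temperature <= 0.0:
        return False
    delta = new_cost - current_cost
    return rng.random() < math.exp(-delta / temperature)

def cool(temperature, rate=0.995):
    """Geometric cooling; the rate is a hypothetical hyperparameter."""
    return temperature * rate

# Worse-by-1 solutions are accepted often when hot and almost never when cold.
rng = random.Random(0)
hot = sum(accept(11.0, 10.0, 1000.0, rng) for _ in range(1000))
cold = sum(accept(11.0, 10.0, 0.01, rng) for _ in range(1000))
```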
[0043] S4: During the execution of the initial scheduling scheme, real-time orders, vehicle status and the predicted standard deviation obtained from real-time inference are collected. The predicted standard deviation is used as the adaptive trigger threshold and incorporated into the state vector. A pre-trained deep reinforcement learning policy network is used to output the dynamically adjusted scheduling scheme. After the initial scheduling plan is generated, it is distributed to each distribution center for execution through the scheduling management system. During execution, the system collects real-time data at fixed time intervals, with each time interval being a monitoring time interval threshold. The default value for the monitoring time interval threshold is set to fifteen minutes. Real-time order information is obtained through the order management system, and the fields are consistent with historical orders. Vehicle status information is obtained through the vehicle's GPS terminal and the driver's mobile application, including the current location area code, remaining load capacity, number of completed tasks, number of incomplete tasks, and the estimated time of completion of the current task.
[0044] At each monitoring interval, the system takes the current time as the baseline, inputs the latest historical demand sequence and real-time external features into the spatiotemporal graph attention network trained in step S2, and infers in real time the mean and standard deviation of demand for each region during the current prediction period, which serve as the basis for the trigger judgment and for state vector construction. Simultaneously, real-time orders are aggregated by delivery region, and the actual order demand of each region within the current time window is counted using the same aggregation method as in step S1. The absolute value of the difference between the actual demand of each region and the predicted mean obtained through real-time inference is taken as the demand deviation of that region. The demand deviation is compared with a preset multiple of the predicted standard deviation obtained through real-time inference. When the demand deviation of a region exceeds its predicted standard deviation multiplied by the deviation trigger multiple threshold, the deviation is judged to be significant and dynamic scheduling optimization is triggered. The default value of the deviation trigger multiple threshold is 1.5. For regions with larger predicted standard deviations, the absolute trigger threshold is correspondingly higher, avoiding excessive intervention in normal fluctuations; for regions with smaller predicted standard deviations, the absolute trigger threshold is lower, so that abnormal changes are responded to promptly. Furthermore, even if a region does not meet the deviation trigger criterion, the system still triggers dynamic scheduling when there is a hard constraint risk that the remaining vehicle capacity will soon be insufficient to cover new orders, in order to ensure service feasibility. The predicted standard deviation is used as an adaptive trigger threshold and is incorporated into the state vector, so that the real-time response at the tactical layer and the robust scheme at the strategic layer form a closed loop within the same probabilistic language framework.
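The trigger rule reduces to a comparison of the absolute deviation with 1.5 times the predicted standard deviation, plus the capacity override. A minimal sketch follows; the function name and the boolean `capacity_at_risk` flag are assumptions standing in for the hard constraint check.

```python
def should_trigger(actual, predicted_mean, predicted_std,
                   multiple=1.5, capacity_at_risk=False):
    """True if |actual - mean| > multiple * std, or if capacity is at risk."""
    deviation = abs(actual - predicted_mean)
    return capacity_at_risk or deviation > multiple * predicted_std

# The same deviation of 30 triggers in a stable region but not in a volatile one.
stable_region = should_trigger(130.0, 100.0, 10.0)    # threshold 15.0
volatile_region = should_trigger(130.0, 100.0, 25.0)  # threshold 37.5
forced = should_trigger(100.0, 100.0, 10.0, capacity_at_risk=True)
```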
[0045] Dynamic scheduling optimization employs a deep reinforcement learning strategy of offline training and online inference. During the offline training phase, a simulation environment is constructed to simulate the dynamic process of freight scheduling: the demand generation model samples and generates diverse demand scenarios based on historical demand distribution and the prediction model trained in step S2, covering situations such as normal fluctuations, demand surges, and demand drops; the vehicle operation simulator advances the simulation time based on the vehicle's current location, remaining tasks, and driving speed, updates the vehicle status, and generates feedback consistent with the real environment.
[0046] The policy network employs an actor-critic architecture. The state vector comprises the following components: the current actual demand for each region (obtained through real-time aggregation), the mean predicted demand for each region (from real-time inference in step S2), the demand deviation for each region (the difference between actual demand and the predicted mean), the prediction standard deviation for each region (from real-time inference in step S2, providing uncertainty information), the status of each vehicle (current location region number, remaining capacity, number of assigned tasks, number of unfinished tasks, estimated completion time of the current task), the current time code (hour, day of the week, whether it is a holiday, etc.), and global statistics (total number of completed tasks, total number of remaining tasks, etc.). These components are concatenated and processed through feature normalization to form the complete state vector. The state vector explicitly includes the prediction standard deviation output from real-time inference in step S2, enabling the policy network to perceive the uncertainty level of each region and thus tend to choose a wider range of adjustment actions when deviations occur in high-uncertainty regions.
[0047] The action space is a discrete set that includes the following types of actions: task reallocation actions (transferring a task from one vehicle to another, with candidate vehicles limited to those in adjacent areas or those with sufficient remaining capacity in order to control the effective action scale), local vehicle path adjustment actions (swapping the visiting order of two tasks in a vehicle's path), cross-distribution-center collaborative support actions (requesting vehicles from other distribution centers to support the current area), and no-operation actions (no adjustment is made, and the original plan continues to be executed). In the actor network output layer, for actions that do not meet the capacity or time window constraints in the current state, the corresponding output value is set to negative infinity before softmax normalization, so that its probability becomes zero after softmax. This masks infeasible actions at the level of the probability distribution, ensuring that actions chosen by sampling or by greedy selection always satisfy the hard constraints. By limiting candidate vehicles to adjacent areas or to vehicles with sufficient capacity and combining this with the masking mechanism, the number of effective actions is kept within a reasonable range, enabling the proximal policy optimization algorithm to converge normally.
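The masking step can be illustrated with a few lines of plain Python. The function name and the example logits are hypothetical; the sketch assumes at least one action remains feasible and raises an error otherwise.

```python
import math

def masked_softmax(logits, feasible):
    """Softmax over logits with infeasible actions set to negative infinity."""
    masked = [l if ok else -math.inf for l, ok in zip(logits, feasible)]
    m = max(masked)
    if m == -math.inf:
        raise ValueError("no feasible action in this state")
    exps = [math.exp(v - m) for v in masked]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

# The third action has the largest logit but violates a hard constraint.
probs = masked_softmax([2.0, 1.0, 5.0], [True, True, False])
greedy_action = max(range(len(probs)), key=probs.__getitem__)
```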
[0048] The reward function is designed as the negative of a total cost, which includes three parts: the additional travel distance and time cost caused by the adjustment, the time window violation penalty, and the estimated transportation cost of completing the remaining tasks. By maximizing the cumulative reward, the policy network learns to select the scheduling adjustment strategy with the lowest cost in an uncertain environment.
[0049] Training employs the proximal policy optimization algorithm, which stabilizes the training process by clipping the magnitude of policy updates. The training process is as follows. First, the parameters of the actor network and the critic network are initialized. The input layer dimension of the actor network is the state vector dimension, and its output layer, after infeasible actions are masked, is activated by softmax to obtain the probability distribution over actions. The input layer dimension of the critic network is also the state vector dimension, and its output layer is a single value representing the value estimate of the current state. In the simulation environment, the agent interacts with the environment according to the current actor network policy: it observes the current state and samples an action, and after the action is executed, the environment transitions to a new state and returns a reward. The transition samples of state, action, reward, and new state are recorded, and this process is repeated several times to collect a batch of trajectory data.
[0050] The advantage function is calculated from the collected trajectory data. It reflects how much better an action is in a given state than the average level of the current policy. The advantage is estimated by accumulating discounted rewards and then subtracting the state value estimate output by the critic network. A positive value indicates that the action is better than average, while a negative value indicates that it is worse. The objective function of the actor network is constructed from the product of the advantage function and the probability ratio, which is the ratio of the probability assigned to the same action by the new policy to that assigned by the old policy. The probability ratio is clipped to the interval from one minus to one plus the policy clipping radius threshold (default value 0.2). The smaller of the products before and after clipping is taken as the final objective to prevent large single updates from causing training instability; this is the core mechanism of the proximal policy optimization algorithm. The actor network maximizes the objective function through gradient ascent, while the critic network is updated by minimizing the mean squared error between the predicted state value and the target value. This process of collecting trajectories, calculating the advantage, and updating the networks is repeated until the average cumulative reward of the policy in the validation scenarios converges. Finally, the actor network weights are fixed and deployed to the online scheduling system.
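The advantage estimate and the clipped objective can be sketched as follows. The discount factor is not specified in the description, so the value of `gamma` is an assumption, as are the function names; the clipping radius of 0.2 is the stated default.

```python
import math

def discounted_returns(rewards, gamma=0.99):
    """Accumulate discounted rewards backward along one trajectory."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]

def advantages(rewards, values, gamma=0.99):
    """Discounted return minus the critic's state value estimate."""
    return [g - v for g, v in zip(discounted_returns(rewards, gamma), values)]

def ppo_clip_objective(new_logp, old_logp, adv, eps=0.2):
    """Mean of min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)."""
    total = 0.0
    for nl, ol, a in zip(new_logp, old_logp, adv):
        ratio = math.exp(nl - ol)  # new policy probability / old policy probability
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * a, clipped * a)
    return total / len(adv)

returns = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
adv = advantages([1.0, 1.0, 1.0], [1.0, 1.0, 1.0], gamma=0.5)
# The ratio doubles on a good action: the gain is capped at 1.2 * A.
capped_gain = ppo_clip_objective([math.log(2.0)], [0.0], [1.0])
# The ratio halves on a bad action: the objective uses 0.8 * A, the smaller value.
capped_loss = ppo_clip_objective([math.log(0.5)], [0.0], [-1.0])
```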
[0051] In online applications, when a significant demand deviation is detected, triggering dynamic scheduling, the system extracts information from real-time data and combines it to form a current state vector. This vector is then input into the actor network for forward propagation. Unsuitable actions are masked and then processed using a softmax function to obtain the probability distribution of each action. The action with the highest probability is selected as the deterministic decision. The system executes the selected action: for task reassignment, the system updates the task assignment and the path planning for both vehicles, recalculates the travel distance and arrival time, and verifies capacity and time window constraints. If these constraints are not met, the system backtracks and selects a suboptimal action. For local path adjustment, the system modifies the task access order of the designated vehicles and updates the arrival time. For collaborative support, the system sends a support request to the designated distribution center. After the support distribution center responds, it assigns vehicles to the support area, and the system updates the global task allocation scheme. For no-operation actions, the original scheme continues to be executed. After the adjustment is completed, the updated scheme is distributed to the relevant vehicles, and the system continues to monitor the subsequent status at time intervals to achieve continuous dynamic optimization.
[0052] The initial scheduling scheme generated in step S3 is based on a risk cost term constructed from the predicted demand mean and standard deviation. During the optimization phase, capacity margin has already been reserved for high-uncertainty areas, providing a stable base scheme. The dynamic scheduling in step S4 is based on real-time orders and demand deviations, uses the standard deviation output by the real-time inference of step S2 as an adaptive trigger threshold, and incorporates this standard deviation into the policy network state vector, providing flexible tactical response capability. The two are connected in a closed loop through the probability distribution description output in step S2, with the strategic layer providing direction and the tactical layer providing corrections, constituting a complete integrated prediction-scheduling collaborative mechanism.
[0053] This embodiment takes as an example the application of multi-distribution-center collaborative delivery in a large urban express logistics company. The company has established several distribution centers within the city, located in the core business district, northern industrial park, eastern residential area, southern university town, and western transportation hub, covering multiple delivery areas that include core business districts, high-density residential areas, industrial parks, university towns, and suburbs. The company is equipped with different types of freight vehicles, including light, medium, and heavy trucks, and handles a large daily order volume. The company faces significant spatiotemporal fluctuations in demand. On weekdays, delivery demand is concentrated in the business and industrial areas in the morning, while receiving demand is concentrated in residential areas in the afternoon. Order volumes fluctuate during promotional activities, and severe weather causes changes in demand distribution and delivery delays in some areas. Traditional static scheduling methods based on historical average demand are insufficient to cope with such complex demand changes, leading to imbalances in vehicle resource allocation between peak and off-peak periods and unstable customer service timeliness.
[0054] The enterprise uses the method of this invention to construct an intelligent freight demand forecasting and scheduling optimization system, collecting historical order data and the corresponding weather and holiday information from the past several months. In step S1, the system constructs a spatiotemporal feature graph: it calculates the geographical distance matrix between delivery areas, analyzes historical order flows to calculate the business correlation strength matrix, and performs weighted combination and threshold filtering to form an adaptive spatial graph containing both geographical and business correlations; it then extracts the static attribute features and temporal dynamic features of each region and constructs the complete spatiotemporal feature graph input. In step S2, the system trains a spatiotemporal graph attention network, using a multi-task learning framework to jointly predict the mean and standard deviation of demand. After training, the model can output a description of the probability distribution of demand in each region and time period. In step S3, the system generates an initial scheduling plan based on the prediction results, incorporating the predicted standard deviation into the optimization objective by constructing a risk cost term from the upper bound of the demand quantile, and solving the model with an adaptive large neighborhood search algorithm, thereby reserving capacity margin for high-uncertainty areas. During execution, step S4 performs real-time inference to obtain the latest predicted mean and standard deviation at each monitoring interval, monitors real-time order deviations, uses the predicted standard deviation as an adaptive trigger threshold, and calls the pre-trained policy network for dynamic adjustment after triggering.
[0055] Table 1 shows example data of predicted demand in different areas at different times on a weekday:
Table 1. Example data for forecasted demand
As shown in Table 1, the deviation between actual demand and the predicted mean in each region lies within the predicted standard deviation range, which supports the accuracy of the prediction model and the reasonableness of the uncertainty quantification. The predicted standard deviation for University Town E is relatively large, reflecting higher demand fluctuation in this region; accordingly, step S3 allocates more capacity margin to this region when generating the scheduling plan.
[0056] Table 2 shows part of the vehicle allocation and route overview data for the corresponding scheduling scheme:
Table 2. Vehicle allocation and route overview data
As shown in Table 2, the scheduling scheme allocates vehicles and tasks sensibly to each distribution center, and each vehicle's coverage area matches the location of its distribution center, indicating that the scheme is economical and feasible. During execution, between 20:00 and 21:00, actual order demand in University Town E rose above the predicted mean. The ratio of the demand deviation to the predicted standard deviation exceeded the deviation trigger multiple threshold, which triggered dynamic scheduling. The policy network output a combination of cross-distribution-center support and route rearrangement, dispatching vehicles from adjacent distribution centers to the University Town E area so that orders in this period were delivered within their time windows, which supports the economic rationality of the dynamic scheduling decision.
[0057] The embodiments of the present invention have been described above. However, the invention is not limited to the specific implementations described, which are merely illustrative and not restrictive. Those skilled in the art may derive further equivalent embodiments under the guidance of these embodiments, all of which fall within the protection scope of the present invention.
Claims
1. A freight demand forecasting and scheduling optimization method based on artificial intelligence, characterized in that it comprises the following steps:
S1: obtain historical order records, historical weather records, holiday information, and regional static attribute data, and construct a spatiotemporal feature map using an adaptive edge weight calculation that integrates geographical proximity and business correlation strength, together with static, dynamic, and historical demand features;
S2: input the spatiotemporal feature map into the spatiotemporal graph attention network, train the network using a multi-task learning framework that combines a gated-fusion spatiotemporal representation with a joint loss weighted by learnable uncertainty weights, and output the mean and standard deviation of future demand for each region;
S3: based on the mean and standard deviation of demand in each region, quantify the risk penalty of the capacity gap using the upper bound of the demand quantile, construct an objective function by weighting this penalty together with transportation costs, and solve it by adaptive large neighborhood search to generate an initial scheduling scheme;
S4: during execution of the initial scheduling scheme, collect real-time orders, vehicle status, and the predicted standard deviation obtained from real-time inference; use the predicted standard deviation as an adaptive trigger threshold and incorporate it into the state vector; and use a pre-trained deep reinforcement learning policy network to output the dynamically adjusted scheduling scheme.
2. The freight demand forecasting and scheduling optimization method based on artificial intelligence according to claim 1, characterized in that S1 includes: calculating the Euclidean distance matrix between regions from the coordinates of each region's center point; establishing initial edge connections for region pairs whose distance is less than a preset geographical proximity distance threshold; defining geographical proximity as a negative exponential function of distance and normalizing the proximity over the outgoing edges of each node; counting the order flow between every pair of regions from historical orders, and dividing the total number of historical orders from region i to region j by the total number of historical orders originating from region i to obtain the business association strength; combining geographical proximity and business association strength by weighting, where the comprehensive edge weight equals the geographical weight coefficient multiplied by the normalized geographical proximity plus the business weight coefficient multiplied by the business association strength, and the two coefficients sum to one; and threshold-filtering the comprehensive edge weight matrix, retaining for each node at most a preset maximum number of outgoing edges with the largest weights, and normalizing again to obtain the adjacency matrix A.
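The edge weight construction of claim 2 can be sketched roughly as follows. This is a minimal illustration, not the applicant's implementation: the function and parameter names (`build_adjacency`, `geo_weight`, `decay`) are hypothetical, the decay rate of the negative exponential is an assumed parameter, and the sketch assumes that the combined weight is evaluated for every ordered pair of distinct regions, so a strong business association can contribute an edge even without geographical proximity.

```python
import math

def normalize_rows(matrix):
    """Scale each row to sum to one; all-zero rows are left unchanged."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([v / total for v in row] if total > 0 else list(row))
    return result

def build_adjacency(coords, flows, dist_threshold, geo_weight,
                    weight_threshold, max_out_edges, decay=1.0):
    """coords[i] is the center point of region i; flows[i][j] counts orders i -> j."""
    n = len(coords)
    # Geographical proximity: exp(-decay * distance) for pairs under the threshold.
    proximity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d = math.dist(coords[i], coords[j])
                if d < dist_threshold:
                    proximity[i][j] = math.exp(-decay * d)
    proximity = normalize_rows(proximity)

    # Business association strength: share of region i's orders that go to j.
    business = [[0.0] * n for _ in range(n)]
    for i in range(n):
        total_out = sum(flows[i])
        if total_out > 0:
            for j in range(n):
                if i != j:
                    business[i][j] = flows[i][j] / total_out

    adjacency = []
    for i in range(n):
        # Weighted combination; the two coefficients sum to one.
        combined = [geo_weight * proximity[i][j] + (1 - geo_weight) * business[i][j]
                    for j in range(n)]
        # Threshold filtering, then keep only the heaviest outgoing edges.
        kept = [w if w >= weight_threshold else 0.0 for w in combined]
        top = sorted(range(n), key=lambda j: kept[j], reverse=True)[:max_out_edges]
        adjacency.append([kept[j] if j in top else 0.0 for j in range(n)])
    return normalize_rows(adjacency)
```

Each row of the returned matrix sums to one whenever the node keeps at least one outgoing edge, which is the row-normalized form the graph attention stage can consume.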
3. The freight demand forecasting and scheduling optimization method based on artificial intelligence according to claim 1, characterized in that S1 further includes: for static features, encoding the region type with one-hot encoding, standardizing the numerical features of population density, number of commercial outlets, and region area using z-scores, and concatenating the one-hot vector with the standardized numerical features to form a static feature vector; for dynamic features, extracting the time attributes of day of the week, month, hour, holiday flag, and workday flag, with the hour encoded cyclically using sine and cosine; for the weather features, normalizing temperature and precipitation and one-hot encoding the weather type; concatenating the time attribute features with the weather features to form a dynamic feature vector; for each region at each time step, concatenating in order the static feature vector, the dynamic feature vector, and the historical demand value to form the complete feature vector of the node at that time step; constructing training samples with a sliding window, selecting as the input sequence the data of a continuous historical window whose length equals the preset time window length threshold; and combining the feature vector sequences of all nodes with the adjacency matrix A to construct the complete spatiotemporal feature map.
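The feature encodings named in claim 3 might look like the sketch below. All helper names are hypothetical, the 24-hour period for the cyclic encoding is an assumption, and the `horizon` argument (how many future steps form the training target) is not specified in the claim and is added here only to make the sliding window produce input/target pairs.

```python
import math

def encode_hour(hour):
    """Cyclic sine/cosine encoding so that 23:00 and 00:00 end up close together."""
    angle = 2 * math.pi * hour / 24
    return [math.sin(angle), math.cos(angle)]

def one_hot(value, categories):
    """One-hot vector over a fixed, ordered category list."""
    return [1.0 if value == c else 0.0 for c in categories]

def z_score(values):
    """Standardize a numeric column to zero mean and unit (population) variance."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std if std > 0 else 0.0 for v in values]

def node_features(static_vec, dynamic_vec, demand):
    """Concatenate static features, dynamic features, and the demand value, in order."""
    return list(static_vec) + list(dynamic_vec) + [demand]

def sliding_windows(sequence, window, horizon=1):
    """Split a time-ordered sequence into (input window, target) training samples."""
    samples = []
    for start in range(len(sequence) - window - horizon + 1):
        inputs = sequence[start:start + window]
        targets = sequence[start + window:start + window + horizon]
        samples.append((inputs, targets))
    return samples
```

The sine/cosine pair is what keeps the encoded distance between late-night and early-morning hours small, which a plain integer hour would not.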
4. The method for freight demand forecasting and scheduling optimization based on artificial intelligence according to claim 1, characterized in that S2 includes: a spatial encoding module propagates information along the spatial dimension using a graph attention mechanism. For a target node i and a neighboring node j, their feature vectors are mapped to the hidden dimension space by a learnable linear transformation matrix W. The attention score is calculated with an additive attention mechanism: the transformed features of the two nodes are concatenated, their inner product with a learnable attention parameter vector is taken, and LeakyReLU activation is applied to obtain the raw attention score. The attention weights are obtained by softmax normalization of the scores over all neighboring nodes. The spatial context features are obtained by aggregating the transformed features of the neighboring nodes, weighted by the attention weights, followed by a nonlinear activation. A multi-head graph attention mechanism is adopted: the outputs of the heads are concatenated and mapped to a unified hidden dimension through a linear projection layer to obtain the final spatial context features.
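A single attention head of the spatial encoding in claim 4 can be illustrated as below. This is a sketch under stated assumptions: the function names are hypothetical, the LeakyReLU slope of 0.2 and the use of tanh as the final nonlinear activation are choices made for the example (the claim does not fix either), and the multi-head concatenation and projection are omitted.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def gat_attention(h_i, neighbor_feats, W, a):
    """Single-head additive graph attention for target node i.

    h_i: feature vector of the target node.
    neighbor_feats: feature vectors of its neighbors.
    W: learnable linear transformation (hidden_dim x in_dim).
    a: learnable attention vector of length 2 * hidden_dim.
    Returns (attention weights, spatial context vector).
    """
    z_i = matvec(W, h_i)
    z_neighbors = [matvec(W, h_j) for h_j in neighbor_feats]
    # Raw score: LeakyReLU of the inner product of a with [z_i || z_j].
    scores = [leaky_relu(sum(p * q for p, q in zip(a, z_i + z_j)))
              for z_j in z_neighbors]
    # Softmax over the neighbors (shifted by the max for numerical stability).
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Attention-weighted aggregation followed by a nonlinearity (tanh assumed).
    context = [math.tanh(sum(al * z[k] for al, z in zip(alphas, z_neighbors)))
               for k in range(len(z_i))]
    return alphas, context
```

In a multi-head setting, this function would be evaluated once per head with separate `W` and `a`, and the resulting context vectors concatenated before the linear projection.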
5. The freight demand forecasting and scheduling optimization method based on artificial intelligence according to claim 1, characterized in that S2 further includes: a temporal encoding module employs a multi-head self-attention mechanism. After fixed trigonometric (sinusoidal) positional encoding is applied to the feature sequence of historical time steps, a query matrix, a key matrix, and a value matrix are generated through three independent linear transformations. The dot product of the query matrix and the key matrix is calculated and scaled by dividing by the square root of the key vector dimension; softmax normalization then yields the time step attention weight matrix, which is used to compute a weighted sum of the value matrix, giving the temporal encoding features. The outputs of the heads are concatenated, reduced in dimension by a linear transformation, and aligned to the unified hidden dimension through a linear projection layer. For gated fusion, the spatial context features and temporal encoding features are concatenated and fed into a fully connected layer, and sigmoid activation produces a gating vector. The fusion result, which is the spatiotemporal representation vector, is the element-wise product of the gating vector and the spatial context features plus the element-wise product of one minus the gating vector and the temporal encoding features.
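The gated fusion step of claim 5 can be sketched as follows. The names `gated_fusion`, `W_g`, and `b_g` are hypothetical, and the sketch reads the weight on the temporal features as the complement of the gate, 1 - g, which is the usual convex-combination form of a gate.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(h_spatial, h_temporal, W_g, b_g):
    """Fuse spatial and temporal features with a learned element-wise gate.

    W_g has one row per hidden unit and 2 * hidden_dim columns; b_g is the bias.
    Returns (gate vector, fused spatiotemporal representation).
    """
    concat = list(h_spatial) + list(h_temporal)
    # Fully connected layer on the concatenation, then sigmoid.
    gate = [sigmoid(sum(w * v for w, v in zip(row, concat)) + b)
            for row, b in zip(W_g, b_g)]
    # Convex combination per element: g * spatial + (1 - g) * temporal.
    fused = [g * s + (1.0 - g) * t
             for g, s, t in zip(gate, h_spatial, h_temporal)]
    return gate, fused
```

With zero weights and bias the gate sits at 0.5 and the fusion is a plain average; training moves the gate toward whichever source is more informative for each hidden unit.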
6. The freight demand forecasting and scheduling optimization method based on artificial intelligence according to claim 1, characterized in that S2 further includes: a shared feature extraction layer and task-specific output layers are used. The output layer of the mean prediction task uses linear activation and outputs the demand mean; the output layer of the standard deviation prediction task uses softplus activation, which keeps the output strictly positive, and outputs the demand standard deviation. The joint loss consists of the mean squared error loss of the mean task and the negative log-likelihood loss of the standard deviation task. A learnable uncertainty weight parameter is introduced for each of the two tasks. The joint loss is the sum, over the two tasks, of each task's loss divided by twice the square of its uncertainty weight parameter, plus the natural logarithms of the two uncertainty weight parameters as a regularization term. The two uncertainty weight parameters take part in backpropagation together with the network weights and are adjusted automatically according to the relative magnitude of each task's loss, thereby balancing the tasks dynamically.
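The joint loss of claim 6 can be written out numerically as below. This is an illustrative sketch only: the names `s_mean` and `s_std` for the two uncertainty weight parameters are hypothetical, the negative log-likelihood is assumed to be that of a Gaussian, and the parameters are passed in as positive floats rather than being trained.

```python
import math

def softplus(x):
    """Numerically stable softplus, log(1 + exp(x)); always positive."""
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)

def mse_loss(targets, means):
    return sum((y - m) ** 2 for y, m in zip(targets, means)) / len(targets)

def gaussian_nll(targets, means, stds):
    """Mean negative log-likelihood of targets under N(mean, std^2) (assumed form)."""
    total = 0.0
    for y, m, s in zip(targets, means, stds):
        total += 0.5 * math.log(2 * math.pi * s * s) + (y - m) ** 2 / (2 * s * s)
    return total / len(targets)

def joint_loss(targets, means, stds, s_mean, s_std):
    """Uncertainty-weighted sum of the two task losses.

    Each task loss is divided by twice the square of its uncertainty weight,
    and the logarithms of both weights are added as a regularizer.
    """
    loss_mean = mse_loss(targets, means)
    loss_std = gaussian_nll(targets, means, stds)
    return (loss_mean / (2 * s_mean ** 2)
            + loss_std / (2 * s_std ** 2)
            + math.log(s_mean) + math.log(s_std))
```

The logarithmic terms stop the optimizer from driving the loss down simply by inflating both uncertainty weights, so a task's weight grows only when its loss is persistently large.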
7. The freight demand forecasting and scheduling optimization method based on artificial intelligence according to claim 1, characterized in that S3 includes: the objective function is a weighted sum of the transportation cost and the uncertainty risk cost, whose weight coefficients sum to one and are set by a preset transportation cost weight threshold and a preset risk cost weight threshold respectively. The uncertainty risk cost term is constructed as follows: for the preset demand quantile confidence level threshold, the quantile value z is looked up in the standard normal distribution table, and the upper bound of the demand quantile for each region is its mean demand plus the product of z and its demand standard deviation. The potential capacity gap is the positive part of the difference between the upper bound of the demand quantile and the allocated capacity, where the allocated capacity is expressed as a linear function of the allocation decision variables. The potential capacity gap of each region is multiplied by the unit service failure penalty cost and weighted by the regional priority weight, and the results are summed to obtain the total uncertainty risk cost.
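The risk cost term of claim 7 reduces to a short calculation, sketched here under some assumptions: the function names are hypothetical, the table lookup for z is replaced by the inverse CDF from Python's standard library, and the allocated capacity of each region is passed in directly instead of being derived from allocation decision variables.

```python
from statistics import NormalDist

def demand_upper_bound(mean, std, confidence):
    """Upper demand quantile: mean + z * std, with z from the standard normal."""
    z = NormalDist().inv_cdf(confidence)
    return mean + z * std

def risk_cost(means, stds, capacities, priorities, unit_penalty, confidence):
    """Total uncertainty risk cost over all regions."""
    total = 0.0
    for mean, std, cap, prio in zip(means, stds, capacities, priorities):
        # Potential capacity gap: positive part of (upper bound - allocated capacity).
        gap = max(0.0, demand_upper_bound(mean, std, confidence) - cap)
        total += prio * unit_penalty * gap
    return total

def objective(transport_cost, risk, transport_weight):
    """Weighted sum of the two cost terms; the weights sum to one."""
    return transport_weight * transport_cost + (1.0 - transport_weight) * risk
```

Because the gap grows with the predicted standard deviation, regions with more uncertain demand attract a larger penalty unless the solver assigns them extra capacity, which is how the capacity margin described in the embodiment arises.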
8. The method for freight demand forecasting and scheduling optimization based on artificial intelligence according to claim 1, characterized in that S3 further includes: a nearest neighbor heuristic generates an initial feasible solution, and each iteration performs a destruction operation followed by a repair operation. The destruction operation removes tasks to a pool of tasks awaiting allocation according to a preset destruction degree threshold. The destruction strategies include random removal, removal of related task groups, and removal of tasks with high cost contribution, and the algorithm adaptively adjusts the selection probability of each strategy according to its historical improvement frequency. The repair operation evaluates the incremental cost of inserting each task awaiting allocation into each position of each vehicle, where the incremental cost includes the distance increment, the time increment, and the change in risk cost, and selects the insertion with the minimum total incremental cost that satisfies the capacity and time window constraints. New solutions are accepted according to the simulated annealing criterion, in which the probability of accepting an inferior solution is controlled by a temperature parameter that decreases with the number of iterations. When the maximum number of iterations is reached, the best solution found is output as the initial scheduling scheme.
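Two of the control mechanisms in claim 8, adaptive strategy selection and simulated annealing acceptance, can be sketched independently of the routing problem itself. Everything below is illustrative: the class and function names are hypothetical, the geometric cooling schedule is one common choice among several, and the smoothed improvement frequency used as the selection weight is an assumption about how "historical improvement frequency" is turned into a probability.

```python
import math
import random

def temperature(initial, cooling_rate, iteration):
    """Geometric cooling (assumed schedule): temperature shrinks every iteration."""
    return initial * cooling_rate ** iteration

def sa_accept(current_cost, new_cost, temp, rng):
    """Simulated annealing criterion: always accept improvements,
    accept a worse solution with probability exp(-delta / temp)."""
    if new_cost <= current_cost:
        return True
    return rng.random() < math.exp(-(new_cost - current_cost) / temp)

class AdaptiveSelector:
    """Roulette-wheel choice among destroy strategies, weighted by how often
    each strategy has led to an improvement so far."""

    def __init__(self, names, rng):
        self.names = list(names)
        self.uses = {name: 0 for name in self.names}
        self.successes = {name: 0 for name in self.names}
        self.rng = rng

    def weights(self):
        # Smoothed improvement frequency, so unused strategies keep a nonzero weight.
        return [(self.successes[n] + 1) / (self.uses[n] + 2) for n in self.names]

    def pick(self):
        return self.rng.choices(self.names, weights=self.weights(), k=1)[0]

    def record(self, name, improved):
        self.uses[name] += 1
        if improved:
            self.successes[name] += 1
```

In a full search loop, `pick` would choose the destroy strategy, the repair step would rebuild the solution, `sa_accept` would decide whether to keep it, and `record` would update the statistics that steer later picks.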
9. The freight demand forecasting and scheduling optimization method based on artificial intelligence according to claim 1, characterized in that S4 includes: the system collects real-time data at a preset monitoring time interval threshold. At each interval, the latest historical demand sequence and real-time external features are input into the spatiotemporal graph attention network to infer the mean and standard deviation of demand in each region in real time. The demand deviation of each region is the absolute value of the difference between its actual demand and its predicted mean. When the demand deviation of a region exceeds its predicted standard deviation multiplied by the preset deviation trigger multiple threshold, the deviation is judged significant and dynamic scheduling optimization is triggered. Dynamic scheduling is also triggered when the remaining vehicle capacity in an area is about to become insufficient to cover new orders. The state vector contains, for each area, the actual demand, the predicted mean demand, the demand deviation, and the predicted standard deviation, together with the status of each vehicle, the current time encoding, and global statistical information. These parts are concatenated and feature-normalized to form the complete state vector.
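The adaptive trigger of claim 9 amounts to a per-region comparison, sketched below. The function names are hypothetical, and the capacity condition is simplified to an assumption: an area is flagged when its remaining vehicle capacity falls below the expected volume of new orders, since the claim does not define "about to be unable to cover" more precisely.

```python
def deviation_triggers(actuals, means, stds, trigger_multiple):
    """Indices of regions whose demand deviation |actual - mean| exceeds
    trigger_multiple times the predicted standard deviation."""
    return [i for i, (y, m, s) in enumerate(zip(actuals, means, stds))
            if abs(y - m) > trigger_multiple * s]

def capacity_triggers(remaining_capacity, expected_new_orders):
    """Indices of regions where remaining capacity cannot cover expected new
    orders (assumed reading of the capacity condition)."""
    return [i for i, (cap, need) in enumerate(zip(remaining_capacity,
                                                  expected_new_orders))
            if cap < need]

def should_reschedule(actuals, means, stds, trigger_multiple,
                      remaining_capacity, expected_new_orders):
    """True if either trigger fires in at least one region."""
    return bool(deviation_triggers(actuals, means, stds, trigger_multiple)
                or capacity_triggers(remaining_capacity, expected_new_orders))
```

Scaling the threshold by the predicted standard deviation means a volatile region needs a larger absolute deviation to trigger rescheduling than a stable one, which avoids constant re-planning where fluctuation is already expected.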
10. The method for freight demand forecasting and scheduling optimization based on artificial intelligence according to claim 1, characterized in that S4 further includes: the pre-trained deep reinforcement learning policy network adopts an actor-critic architecture. The action space includes task reallocation actions, local vehicle route adjustment actions, collaborative support actions between distribution centers, and a no-operation action. For actions that violate the capacity constraints or time window constraints, the corresponding output value is set to negative infinity before softmax normalization so that infeasible actions are masked. Training uses the proximal policy optimization algorithm: the actor network objective is built from the product of the probability ratio between the new and old policies and the advantage function, the probability ratio is clipped to an interval defined by a preset policy clipping radius threshold, and the smaller of the unclipped and clipped values is taken to prevent excessively large single updates. The reward function is the negative total cost, which includes additional travel distance and time costs, time window violation penalties, and the estimated transportation cost of completing the remaining tasks. During online application, the action with the highest probability is selected for execution; if the constraints are not satisfied after execution, the action is rolled back and the action with the next-highest probability is selected.
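The action masking and clipped objective of claim 10 can be illustrated with a few small functions. The names are hypothetical, and the sketch assumes that at least one action (for example the no-operation action) is always feasible; it covers only the per-sample clipped term, not the full training loop of proximal policy optimization.

```python
import math

def masked_softmax(logits, feasible):
    """Softmax over action logits with infeasible actions set to -inf first,
    so that they receive probability zero."""
    if not any(feasible):
        raise ValueError("at least one action must be feasible")
    masked = [l if ok else float("-inf") for l, ok in zip(logits, feasible)]
    peak = max(masked)
    exps = [math.exp(l - peak) for l in masked]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

def ppo_clipped_term(ratio, advantage, clip_radius):
    """Per-sample clipped surrogate: the smaller of the unclipped and clipped
    products of probability ratio and advantage."""
    clipped_ratio = min(max(ratio, 1.0 - clip_radius), 1.0 + clip_radius)
    return min(ratio * advantage, clipped_ratio * advantage)

def ranked_actions(probabilities):
    """Action indices ordered from most to least probable, for online use with
    fallback to the next-best action."""
    return sorted(range(len(probabilities)),
                  key=lambda i: probabilities[i], reverse=True)
```

At inference time, the first index from `ranked_actions` would be tried; if executing it turns out to violate a constraint, the caller rolls back and tries the next index in the list.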