A coal sea shipping cost prediction method based on multi-period decomposition and fusion

By employing a multi-period decomposition and fusion method, combining Fast Fourier Transform and two-dimensional convolutional networks to extract the periodic characteristics of coal shipping costs, and utilizing a gated residual mechanism to enhance information on sudden events, this approach solves the problem of insufficient multi-period separation and non-stationary fluctuation capture capabilities in existing technologies, thereby improving the accuracy and stability of predictions.

CN122198225A · Pending · Publication Date: 2026-06-12 · FUJIAN HUADIAN FURUI ENERGY DEVELOPMENT CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Application (China)
Current Assignee / Owner
FUJIAN HUADIAN FURUI ENERGY DEVELOPMENT CO LTD
Filing Date
2026-03-06
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing time series forecasting techniques struggle to separate dynamically superimposed multiple cycles in coal shipping cost forecasting scenarios, have limited ability to capture non-stationary and sudden fluctuations, and exhibit weak model generalization ability in small sample scenarios.

Method used

We adopt a multi-period decomposition and fusion approach, extract the main periods through fast Fourier transform, extract feature components using a two-dimensional convolutional network, enhance the information of sudden events by combining a gated residual mechanism, and introduce a training strategy of input random mask to improve the model's generalization ability.

Benefits of technology

It improves the accuracy and robustness of coal shipping cost forecasting, effectively captures multiple cyclical characteristics, reduces the impact of sudden events on forecasting, and enhances the model's stability under noise and unseen conditions.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122198225A_ABST

Abstract

The application discloses a coal sea freight forecasting method based on multi-period decomposition and fusion, which comprises the following steps: constructing a historical time series of coal sea freight data; inputting the historical time series into a time series forecasting module based on multi-period decomposition and fusion; extracting main periods by fast Fourier transform, reshaping the data into two dimensions, extracting feature components of different periods by using a two-dimensional convolution network, and generating a preliminary forecasting result after weighted fusion; inputting external sudden event information into a sudden information enhancement module based on a gated residual mechanism; embedding and encoding the sudden information, and then performing feature fusion with the preliminary forecasting result to generate a correction amount for the preliminary forecasting result; and performing dimension alignment on the correction amount and the preliminary forecasting result in a prediction window, and realizing residual correction by element-level addition to obtain a final forecasting result. The application improves the accuracy of coal sea freight forecasting by combining multi-period decomposition and fusion with external sudden event information.

Description

Technical Field

[0001] This invention belongs to the field of deep learning and coal shipping cost prediction technology, and specifically relates to a coal shipping cost prediction method based on multi-period decomposition and fusion.

Background Technology

[0002] Coal is a crucial component of the global energy mix, and its cross-regional transportation primarily relies on maritime transport. Therefore, fluctuations in coal shipping costs directly impact energy costs, making accurate forecasting of these costs economically significant. However, forecasting shipping costs is a challenging task. Historical price data typically exhibits strong volatility, nonlinearity, and non-stationarity, making it difficult for traditional forecasting models to capture their inherent patterns. This complexity stems primarily from the fact that the shipping market is influenced by multiple external factors. For example, fluctuations in international oil prices directly alter ship operating costs, changes in the global coal market's supply and demand dynamics profoundly affect shipping demand, and unforeseen events such as geopolitical conflicts can lead to sharp short-term fluctuations in freight rates.

[0003] To effectively predict coal shipping costs, the collected data mainly includes: (1) coal shipping costs: historical shipping costs for multiple international and domestic routes; (2) market environment indicators: comprehensive freight rate indices such as the China Coastal and Baltic Dry Index; (3) real-time supply and demand data: operational dynamics such as inventory, throughput, and number of ships at anchor at major ports; and (4) coal price index: CCI price index for domestic and imported coal. These data are crucial for predicting coal shipping costs. The historical price of specific routes is the core objective of the prediction, while the macro shipping index reflects the overall market trend. Port operation data directly shows the real-time supply and congestion status of logistics nodes, which often leads to short-term fluctuations in freight rates. The coal price index reflects the supply and demand of the goods themselves and directly affects transportation demand. Only by integrating information from these dimensions can the prediction model simultaneously capture long-term trends, medium-term patterns, and short-term sudden fluctuations, thereby drawing more accurate and reliable conclusions.

[0004] Initially, forecasting relied mainly on traditional statistical methods, the most common being the Autoregressive Integrated Moving Average (ARIMA) model. This model assumes a linear relationship within the data and requires the time series to be stationary. However, coal shipping costs are influenced by many factors, and their data often exhibit dramatic and sudden fluctuations that violate both the linearity and the stationarity assumptions. Therefore, the ARIMA model struggles to capture such complex fluctuation patterns. Another commonly used method is exponential smoothing, which predicts trends through weighted averages but lags behind sudden changes in market prices. Furthermore, while the Vector Autoregressive (VAR) model can analyze multiple related variables simultaneously, the number of parameters to be estimated grows quadratically with the number of variables, which significantly increases computational complexity and poses challenges for practical applications.

[0005] In recent years, deep learning has demonstrated significant advantages in nonlinear modeling and feature learning and has become a key paradigm for complex time series prediction tasks. As a classic architecture for sequence modeling, recurrent neural networks (RNNs) can effectively capture short-term dependencies in data, but their inherent vanishing-gradient problem makes it difficult for them to model long-term dependencies. Long Short-Term Memory (LSTM) networks were proposed to address this difficulty. LSTMs alleviate the long-term dependency problem of RNNs through gating mechanisms, but their structure is much more complex than that of RNNs, with approximately four times the number of parameters, which results in slower training and prediction and higher computational resource consumption. Furthermore, LSTMs require a large amount of training data; if data are insufficient, overfitting is likely to occur.
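The "approximately four times" figure follows from the structure of the LSTM cell. It has four gate blocks (input, forget, candidate, output), and each block has the same weight shapes as a vanilla RNN cell. Below is a minimal count in Python. It assumes single-layer cells with one bias vector per block, and the function names are illustrative only:

```python
def rnn_params(input_size: int, hidden_size: int) -> int:
    """Parameters of one vanilla (Elman) RNN cell with a single bias vector."""
    # input-to-hidden weights + hidden-to-hidden weights + bias
    return hidden_size * input_size + hidden_size * hidden_size + hidden_size


def lstm_params(input_size: int, hidden_size: int) -> int:
    """Parameters of one LSTM cell: four gate blocks, each shaped like an RNN cell."""
    # input, forget, candidate and output blocks each have their own W, U and b
    return 4 * rnn_params(input_size, hidden_size)


if __name__ == "__main__":
    x, h = 16, 64
    print(rnn_params(x, h), lstm_params(x, h), lstm_params(x, h) / rnn_params(x, h))  # 5184 20736 4.0
```

Some frameworks keep two bias vectors per block, for example PyTorch's `nn.RNN` and `nn.LSTM`. This shifts the absolute counts, but the ratio stays at four.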

[0006] The Transformer, by capturing global dependencies in sequences through its self-attention mechanism, overcomes the limitations of traditional models in modeling long-distance associations. However, the computational complexity of self-attention is quadratic in sequence length, which makes extremely long sequences expensive to process. To address the limitations of the Transformer in time series prediction, a series of efficient improved models have been proposed, significantly enhancing prediction performance and computational efficiency. Among them, LogSparse sparse attention, combined with local convolution enhancement and global attention with a logarithmic step, breaks through memory bottlenecks and enhances the model's awareness of local context. Reformer uses Locality Sensitive Hashing (LSH) to replace traditional dot-product attention and introduces a reversible residual network, significantly reducing memory usage and computational overhead when processing long sequences. Informer introduces the ProbSparse sparse attention mechanism, reducing complexity to O(L log L), and uses a generative decoder to generate results in one step, solving the efficiency bottleneck of long sequence prediction. Autoformer innovatively embeds a seasonal-trend decomposition architecture at the model level and proposes an autocorrelation mechanism based on subsequence matching to replace point-to-point attention, significantly improving the ability to capture periodicity. FEDformer combines sequence decomposition with Fourier transform and wavelet transform for sparse attention computation in the frequency domain, effectively capturing global features and further reducing complexity. Pyraformer constructs a pyramid-shaped attention graph structure, capturing both short-term local features and extremely long-term dependencies at different resolutions with linear complexity.
Crossformer employs a two-stage attention mechanism across dimensions and time, explicitly capturing the interdependencies between different variables. PatchTST uses a channel-independent strategy and time-slice blocking technique, preserving local semantic information while significantly reducing computational cost.

[0007] However, existing time series forecasting techniques have the following shortcomings in the scenario of coal shipping cost forecasting: (1) Difficulty in separating multiple dynamically superimposed cycles in freight data. Ocean freight has multiple cycles, and these cycle signals superimpose each other, with the cycle length changing dynamically due to factors such as market activities and climate. Existing models lack the ability to explicitly model and separate multi-scale, non-fixed-cycle signals, resulting in incomplete learning of key periodic patterns.

[0008] (2) Limited ability to capture non-stationary and sudden fluctuations. Coal shipping costs are often affected by sudden events such as policies, weather, and geopolitics, exhibiting strong non-stationarity and violent fluctuations. Most existing technologies are based on the statistical regularities of historical data for prediction, which makes it difficult to effectively model sudden changes, resulting in prediction lag or large deviations when prices change abruptly.

[0009] (3) Weak generalization ability of models under data limitations. Shipping data usually cover a short time span, have a limited sample size, and contain noise and missing values. Deep learning models have a large number of parameters and are prone to overfitting in small-sample scenarios, which reduces prediction stability and generalization performance on new data in practical applications; their performance may even be worse than that of simple models.

Summary of the Invention

[0010] To overcome the shortcomings of existing time series forecasting methods, such as difficulty in separating multiple dynamically superimposed periods in freight data, limited ability to capture non-stationary and sudden fluctuations, and poor generalization ability to new data, this invention provides a coal ocean freight forecasting method based on multi-period decomposition and fusion, so as to improve the accuracy of coal ocean freight forecasting and reduce coal transportation costs.

[0011] To solve the above technical problems, the present invention adopts the following technical solution. In a first aspect, the present invention provides a method for predicting coal shipping costs based on multi-period decomposition and fusion, the method comprising: constructing a historical time series of coal shipping cost data; inputting the historical time series into the time series prediction module based on multi-period decomposition and fusion, in which the main periods are extracted by fast Fourier transform, the data are reshaped into two dimensions, feature components of different periods are extracted by a two-dimensional convolutional network, and preliminary prediction results are generated after weighted fusion; inputting external sudden-event information into the sudden-event information enhancement module based on the gated residual mechanism, where it is embedded, encoded, and fused with the preliminary prediction results to generate a correction amount for the preliminary prediction results; and aligning the correction amount with the preliminary prediction results over the prediction window and applying residual correction through element-level addition to obtain the final prediction result.

[0012] Following the above technical solution, the historical time series $X = \{x_1, x_2, \dots, x_T\}$ includes coal shipping costs, market environment indicators, real-time supply and demand data, and a coal price index, where $x_t \in \mathbb{R}^{C}$ is the data at time $t$ and contains $C$ indicator features, and $T$ is the length of the time series. Each indicator feature of the historical time series $X$ is reversibly normalized and then input into the time series prediction module based on multi-period decomposition and fusion; after residual correction, the result is inversely normalized to obtain the final prediction result.

[0013] Following the above technical solution, the specific steps of the time series prediction module based on multi-period decomposition and fusion include: embedding and encoding the historical time series to obtain the encoded representation $X_{1D}$; performing a fast Fourier transform on $X_{1D}$ to extract the main periods, and reshaping the one-dimensional embedded coding data into two dimensions according to the main periods to obtain the multi-period two-dimensional tensors $\{X_{2D}^{1}, \dots, X_{2D}^{k}\}$; using a two-dimensional convolutional network to extract features from the two-dimensional tensor of each period to obtain the feature components of different periods; and computing a weighted sum of the feature components of different periods to obtain the multi-period aggregated features, from which the preliminary prediction results are obtained.

[0014] Following the above technical solution, a deep linear network is used to embed and encode the historical time series, and the output of the last layer is taken as the encoded representation, wherein the deep linear network includes a multilayer perceptron.

[0015] Following the above technical solution, a fast Fourier transform (FFT) is performed on the encoded representation $X_{1D} \in \mathbb{R}^{T \times C}$ to transform it from the time domain to the frequency domain. Periodic characteristics are identified by calculating the amplitude of each frequency component. The period lengths corresponding to the $k$ frequency components with the largest amplitudes are selected as the main periods, denoted $\{p_1, \dots, p_k\}$. The calculation process is expressed as follows:

$$A = \mathrm{Avg}\left(\mathrm{Amp}\left(\mathrm{FFT}(X_{1D})\right)\right), \qquad \{f_1, \dots, f_k\} = \underset{f \in \{1, \dots, \lfloor T/2 \rfloor\}}{\arg\mathrm{Topk}}(A), \qquad p_i = \left\lceil \frac{T}{f_i} \right\rceil, \quad i \in \{1, \dots, k\}$$

In the formula:
- $\mathrm{FFT}(\cdot)$ is the fast Fourier transform.
- $\mathrm{Amp}(\cdot)$ is the amplitude calculation.
- $\mathrm{Avg}(\cdot)$ takes the average across the variable dimension.
- $A_{f_i}$ is the amplitude corresponding to the $i$-th period, and $A$ is the set of amplitudes.
- $\arg\mathrm{Topk}(\cdot)$ finds the $k$ frequency components with the largest amplitudes.
- $f_i$ is the frequency component corresponding to the $i$-th period, and $T$ is the length of the time series.
- $p_i$ is the $i$-th period.
- $w_i$ is the weight corresponding to the $i$-th period, used for subsequent feature aggregation.
- $\tau$ is a hyperparameter of the weight calculation.

The one-dimensional embedded coding data is then reshaped into two dimensions according to each main period:

$$X_{2D}^{i} = \mathrm{Reshape}_{p_i, f_i}\left(X_{1D}\right) \in \mathbb{R}^{p_i \times f_i \times C}, \qquad i \in \{1, \dots, k\}$$

where $C$ is the number of indicator features of the encoded representation $X_{1D}$. That is, each indicator feature of $X_{1D}$ is reshaped into two dimensions according to the main period, giving the multi-period two-dimensional tensors $\{X_{2D}^{1}, \dots, X_{2D}^{k}\}$. A two-dimensional convolutional network is then applied to the two-dimensional tensor of each period to extract the feature components of different periods:

$$\hat{X}_{2D}^{i} = \mathrm{Conv2D}\left(X_{2D}^{i}\right), \qquad i \in \{1, \dots, k\}$$

The feature components of different periods are flattened back into one-dimensional sequences and combined by adaptive weighted summation to obtain the multi-period aggregated features $X_{\mathrm{agg}}$. A linear projection then generates the preliminary prediction results $\hat{Y}_{\mathrm{pre}}$. The calculation process is as follows:

$$X_{\mathrm{agg}} = \sum_{i=1}^{k} w_i \cdot \mathrm{Flatten}\left(\hat{X}_{2D}^{i}\right), \qquad \hat{Y}_{\mathrm{pre}} = W X_{\mathrm{agg}} + b$$

In the formula, $\mathrm{Flatten}(\cdot)$ is the flattening operation, and $W$ and $b$ are the learnable linear weight matrix and bias vector, respectively.
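The sketch below illustrates the period-extraction, reshaping and weighted-aggregation steps of [0015] in dependency-free Python. It is minimal and rests on several assumptions:
- A naive DFT stands in for the FFT; a real implementation would call an FFT routine such as `numpy.fft.rfft`.
- The zero-frequency (mean) component is excluded from the top-k search.
- The weights are a temperature softmax, w_i = exp(A_{f_i}/τ) / Σ_j exp(A_{f_j}/τ). This formula is an assumption.
- Each series is zero-padded and folded into rows that each hold one full period. This is the transpose of a p_i × f_i layout, which is purely a layout choice.
- A fixed 3×3 kernel replaces the learned two-dimensional convolution.

The final linear projection is omitted, and all function names are hypothetical.

```python
import cmath
import math


def amplitude_spectrum(series):
    """|DFT(series)[f]| for f = 0..T//2 (naive O(T^2) DFT, for illustration only)."""
    T = len(series)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * f * t / T) for t, x in enumerate(series)))
        for f in range(T // 2 + 1)
    ]


def top_k_periods(X, k, tau=1.0):
    """X is a T x C list of rows. Returns (periods, frequencies, weights) of the k strongest cycles."""
    T, C = len(X), len(X[0])
    spectra = [amplitude_spectrum([row[c] for row in X]) for c in range(C)]
    # Avg over the C variables: one amplitude per frequency
    A = [sum(spec[f] for spec in spectra) / C for f in range(T // 2 + 1)]
    # argTopk over f = 1..T//2 (the f = 0 mean component is skipped)
    freqs = sorted(range(1, T // 2 + 1), key=lambda f: A[f], reverse=True)[:k]
    periods = [math.ceil(T / f) for f in freqs]
    # ASSUMPTION: w_i = softmax(A_{f_i} / tau), shifted by the max for numerical stability
    m = max(A[f] for f in freqs)
    exps = [math.exp((A[f] - m) / tau) for f in freqs]
    total = sum(exps)
    weights = [e / total for e in exps]
    return periods, freqs, weights


def reshape_by_period(column, p):
    """Zero-pad a 1D series to a multiple of p and fold it into rows of length p (one cycle per row)."""
    n_rows = math.ceil(len(column) / p)
    padded = list(column) + [0.0] * (n_rows * p - len(column))
    return [padded[r * p:(r + 1) * p] for r in range(n_rows)]


def conv2d_same(grid, kernel):
    """Zero-padded 'same'-size 2D cross-correlation; a fixed-kernel stand-in for a learned Conv2D."""
    H, W, kh, kw = len(grid), len(grid[0]), len(kernel), len(kernel[0])
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            for a in range(kh):
                for b in range(kw):
                    y, x = i + a - kh // 2, j + b - kw // 2
                    if 0 <= y < H and 0 <= x < W:
                        out[i][j] += kernel[a][b] * grid[y][x]
    return out


def aggregate(column, periods, weights, kernel):
    """Fold per period, convolve, flatten back to length T and take the weighted sum."""
    T = len(column)
    agg = [0.0] * T
    for p, w in zip(periods, weights):
        flat = [v for row in conv2d_same(reshape_by_period(column, p), kernel) for v in row][:T]
        agg = [a + w * v for a, v in zip(agg, flat)]
    return agg


# Usage: two synthetic indicators over T = 56 days with 7-day and 28-day cycles
T = 56
X = [[math.sin(2 * math.pi * t / 7) + 0.5 * math.sin(2 * math.pi * t / 28),
      math.cos(2 * math.pi * t / 7)] for t in range(T)]
periods, freqs, weights = top_k_periods(X, k=2, tau=5.0)
print(periods, freqs)  # [7, 28] [8, 2]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
col0 = [row[0] for row in X]
agg0 = aggregate(col0, periods, weights, identity)
```

With an identity kernel, folding and then flattening returns the input unchanged, which is a convenient check that padding and truncation line up. The 7-day and 28-day cycles in the synthetic data are recovered as the two main periods.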

[0016] Following the above technical solution, the specific steps of the burst information enhancement module based on the gated residual mechanism are as follows. The external sudden-event information is embedded and encoded to generate an external event embedding vector $e$. The preliminary prediction results are flattened and concatenated with this vector to generate a joint feature vector $z = \left[\mathrm{Flatten}(\hat{Y}_{\mathrm{pre}});\, e\right]$. The joint feature vector is input into a shared multilayer perceptron backbone $\mathrm{MLP}(\cdot)$, whose output feeds two parallel output heads:
(1) a gating head, which uses the Sigmoid activation function to generate gating coefficients $g$ between 0 and 1 that measure the intensity of the impact of external sudden events on the prediction results;
(2) a residual head, which uses the Tanh activation function to generate the original residual amplitude $r$, representing the direction and magnitude of the correction under the assumption of full influence.

The calculation process is expressed as:

$$g = \sigma\left(W_g \, \mathrm{MLP}(z) + b_g\right), \qquad r = \tanh\left(W_r \, \mathrm{MLP}(z) + b_r\right)$$

In the formula, $z$ is the joint feature vector; $W_g$, $b_g$ and $W_r$, $b_r$ are the learnable weight matrices and bias vectors of the gating branch and the residual branch, respectively; and $\sigma(\cdot)$ is the Sigmoid activation function. The gating coefficients $g$ then weight the original residual amplitude $r$ to obtain the correction amount for the preliminary prediction results:

$$\Delta Y = g \odot r$$

where $\odot$ denotes the Hadamard (element-wise) product.
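The following is a minimal plain-Python sketch of the gated residual head of [0016] for a single sample. It assumes:
- the shared backbone is one ReLU layer;
- the event embedding is already a numeric vector;
- the weights are random rather than trained.

`init_params` and `gated_residual_correction` are hypothetical names.

```python
import math
import random


def linear(W, b, x):
    """Affine map W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]


def init_params(in_dim, hidden, horizon, seed=0):
    """Hypothetical random initialisation for the shared trunk and the two heads."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_h": mat(hidden, in_dim), "b_h": [0.0] * hidden,
        "W_g": mat(horizon, hidden), "b_g": [0.0] * horizon,
        "W_r": mat(horizon, hidden), "b_r": [0.0] * horizon,
    }


def gated_residual_correction(y_prelim, event_emb, params):
    """Return (gate g, raw residual r, correction g * r) for one sample."""
    z = list(y_prelim) + list(event_emb)  # flattened forecast concatenated with event embedding
    h = [max(0.0, v) for v in linear(params["W_h"], params["b_h"], z)]  # shared trunk, ReLU assumed
    g = [1.0 / (1.0 + math.exp(-v)) for v in linear(params["W_g"], params["b_g"], h)]  # in (0, 1)
    r = [math.tanh(v) for v in linear(params["W_r"], params["b_r"], h)]  # in (-1, 1)
    delta = [gi * ri for gi, ri in zip(g, r)]  # Hadamard product
    return g, r, delta


# Usage: a 4-step preliminary forecast and a hypothetical 3-dim event embedding
y_prelim = [0.2, -0.1, 0.4, 0.0]
event = [1.0, 0.0, 0.0]  # illustrative one-hot event flag
g, r, delta = gated_residual_correction(y_prelim, event, init_params(7, 8, 4))
```

The Tanh head is bounded to (-1, 1) and the gate to (0, 1). As a result, the correction can never exceed the raw residual in magnitude. A gate near 0 lets the module fall back to the preliminary forecast when an event is irrelevant.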

[0017] Following the above technical solution, the correction amount is extended to the same dimension as the preliminary prediction result through a broadcast mechanism, and the correction amount is superimposed on the preliminary prediction result using element-level addition to achieve residual correction.
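The sketch below shows the dimension alignment and element-level addition of [0017]. It assumes the correction comes in one of two forms:
- an H × C matrix that is already aligned with the forecast, or
- one value per forecast step, which is broadcast across the C indicator channels.

The source does not fix the correction's exact shape, and `apply_correction` is a hypothetical name.

```python
def apply_correction(y_prelim, delta):
    """Residual correction Y = Y_pre + delta on an H x C forecast window.

    delta is either H x C (already aligned) or length H with scalar entries, in which case
    each step's correction is broadcast across the C channels.
    """
    if len(delta) != len(y_prelim):
        raise ValueError("correction must cover the same prediction window (H steps)")
    out = []
    for row, d in zip(y_prelim, delta):
        d_row = list(d) if isinstance(d, (list, tuple)) else [d] * len(row)
        if len(d_row) != len(row):
            raise ValueError("correction and forecast are not dimension-aligned")
        out.append([y + c for y, c in zip(row, d_row)])
    return out


# Usage: H = 2 steps, C = 2 channels, one correction value per step
print(apply_correction([[1.0, 2.0], [3.0, 4.0]], [0.5, -1.0]))  # [[1.5, 2.5], [2.0, 3.0]]
```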

[0018] Following the above technical solution, the method further includes introducing a training strategy based on random input masks during the model training phase, as follows:
- A binary mask matrix is generated according to a preset mask ratio and multiplied element-wise with the historical time series to generate a masked time series.
- The masked time series is input into the prediction model, which outputs reconstruction results, and the reconstruction loss is calculated.
- The prediction loss and reconstruction loss are weighted and fused to construct a joint loss function, on the basis of which the model parameters are optimized.
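Below is a minimal sketch of the input random-mask strategy of [0018]. It makes four assumptions:
- The mask ratio is applied exactly, with a fixed number of zeros rather than an independent coin flip per entry.
- The reconstruction loss is a mean squared error over the masked positions only.
- The joint loss is L = L_pred + λ·L_rec, with an illustrative λ = 0.5.
- A trivial identity stands in for the prediction model so that the arithmetic is easy to follow.

All helper names are hypothetical.

```python
import random


def make_mask(T, C, mask_ratio, rng):
    """Binary T x C mask with round(mask_ratio * T * C) zeros placed uniformly at random."""
    n = T * C
    hidden = set(rng.sample(range(n), round(mask_ratio * n)))
    return [[0 if t * C + c in hidden else 1 for c in range(C)] for t in range(T)]


def apply_mask(X, M):
    """Element-wise product of mask and series: masked entries become 0."""
    return [[x * m for x, m in zip(xr, mr)] for xr, mr in zip(X, M)]


def masked_mse(X_rec, X, M):
    """Reconstruction loss averaged over the masked (M == 0) positions only (an assumption)."""
    errs = [(r - x) ** 2
            for rr, xr, mr in zip(X_rec, X, M)
            for r, x, m in zip(rr, xr, mr) if m == 0]
    return sum(errs) / len(errs) if errs else 0.0


def joint_loss(pred_loss, rec_loss, lam=0.5):
    """Hypothetical weighting L = L_pred + lam * L_rec; lam is an illustrative value."""
    return pred_loss + lam * rec_loss


# Usage with a trivial "model" that returns its masked input unchanged
rng = random.Random(42)
X = [[1.0] * 3 for _ in range(10)]  # T = 10, C = 3
M = make_mask(10, 3, mask_ratio=0.3, rng=rng)
X_masked = apply_mask(X, M)
rec_loss = masked_mse(X_masked, X, M)  # every hidden entry is off by exactly 1
loss = joint_loss(pred_loss=0.2, rec_loss=rec_loss)
```

In practice the mask is applied only during training; at inference the full series is fed to the model.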

[0019] In a second aspect, the present invention provides a computer device / apparatus / system, including a memory, a processor, and a computer program stored in the memory, wherein the processor executes the computer program to implement the steps of the method described in the first aspect.

[0020] In a third aspect, the present invention provides a computer-readable storage medium having a computer program / instructions stored thereon, which, when executed by a processor, implement the steps of the method described in the first aspect.

[0021] In summary, compared with the prior art, the above technical solutions conceived by this invention achieve the following beneficial effects: (1) This invention proposes a time series prediction module based on multi-period decomposition and fusion. The module takes historical time series as input, extracts the main periods by fast Fourier transform, uses a two-dimensional convolutional network to extract the features of each period, and fuses these period features by weighting. This lets it capture the complex long-term dependencies and multi-period superposition patterns in the data, thereby generating preliminary prediction results and improving the accuracy of coal shipping cost prediction.

[0022] (2) This invention breaks through the limitation of existing methods that mainly rely on historical data. It uses an external information enhancement module based on a gated residual mechanism, in which the encoded features of sudden events interact with the preliminary prediction results to generate adaptive prediction corrections. This effectively integrates external sudden event information, reduces prediction deviations caused by unexpected shocks, and improves the accuracy and robustness of predictions.

[0023] (3) The present invention constructs an auxiliary reconstruction task in the training stage through a training strategy based on input random masking. The strategy introduces data perturbation that forces the model to learn to use context information. This aims to improve the generalization performance of the model under noise interference and unseen sudden situations, and ultimately to enhance the stability and reliability of the model in real, complex scenarios.

Attached Figure Description

[0024] Figure 1 is an overall flowchart of a coal shipping cost prediction method based on multi-period decomposition and fusion according to an embodiment of the present invention; Figure 2 is a framework diagram of a time series prediction module based on multi-period decomposition and fusion according to an embodiment of the present invention; Figure 3 is a framework diagram of a burst information enhancement module based on a gated residual mechanism according to an embodiment of the present invention; Figure 4 is a flowchart illustrating the implementation of a training strategy based on an input random mask according to an embodiment of the present invention.

Detailed Implementation

[0025] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the invention. All other embodiments obtained by those skilled in the art based on the embodiments provided by this invention without inventive effort are within the scope of protection of this invention.

[0026] Obviously, the accompanying drawings described below are merely some examples or embodiments of the present invention. Those skilled in the art can apply the present invention to other similar scenarios based on these drawings without any inventive effort. Furthermore, it is understood that although the efforts made in this development process may be complex and lengthy, for those skilled in the art related to the content disclosed in this invention, modifications to design, manufacturing, or production based on the technical content disclosed in this invention are merely conventional technical means and should not be construed as insufficient disclosure of the present invention.

[0027] In this invention, the reference to "embodiment" means that a specific feature, structure, or characteristic described in connection with an embodiment may be included in at least one embodiment of the invention. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor is it a mutually exclusive, independent, or alternative embodiment. It will be explicitly and implicitly understood by those skilled in the art that the embodiments described in this invention may be combined with other embodiments without conflict.

[0028] Unless otherwise defined, the technical or scientific terms used in this invention shall have the ordinary meaning understood by one of ordinary skill in the art to which this invention pertains. The terms "a," "an," "the," and similar words used in this invention do not indicate quantity limitation and may indicate singular or plural. The terms "comprising," "including," "having," and any variations thereof used in this invention are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or modules (units) is not limited to the listed steps or units, but may also include steps or units not listed, or may include other steps or units inherent to these processes, methods, products, or devices. The terms "connected," "linked," "coupled," and similar words used in this invention are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "A plurality" used in this invention refers to two or more. "And / or" describes the relationship between related objects, indicating that three relationships may exist; for example, "A and / or B" can represent: A alone, A and B simultaneously, and B alone. The character " / " generally indicates that the preceding and following objects have an "or" relationship. The terms "first," "second," and "third" used in this invention are merely to distinguish similar objects and do not represent a specific ordering of the objects.

[0029] This invention provides a coal shipping cost forecasting method based on multi-period decomposition and fusion with burst information enhancement, specifically a method applicable to forecasting coal shipping costs on multiple international and domestic shipping routes. This method mainly includes the following four components: Content 1: Using historical time series as input, design a time series prediction module based on multi-period decomposition and fusion. By using fast Fourier transform and two-dimensional convolutional network to capture the long-term dependence and periodic characteristics of the data, preliminary prediction results are generated.

[0030] Content 2: Using external emergency information as input, design an emergency information enhancement module based on a gated residual mechanism. By encoding the event and interacting with its features, a correction amount for the preliminary prediction result is generated.

[0031] Content 3: Align the preliminary prediction results of Content 1 with the correction values of Content 2 on the prediction window, and fuse them using an adaptive weighting method to obtain the final prediction results.

[0032] Content 4 proposes a training strategy based on input random mask, which introduces data perturbation through auxiliary reconstruction task to improve the generalization of the model under noise interference and sudden situations.

[0033] Content 1 is achieved through the following steps: (1) The collected data is preprocessed and then embedded and encoded; (2) Perform a fast Fourier transform on the encoded data, extract the main period, and reshape the data according to the main period; (3) A two-dimensional convolutional network is used to process the reshaped data to obtain feature components of different periods; (4) Weighted summation of the feature components of different periods is performed to obtain multi-period aggregated features, and then preliminary prediction results are obtained.

[0034] Content 2 is achieved through the following steps: (1) Embedding and encoding information about external emergencies; (2) The preliminary prediction results obtained from Content 1 are fused with the sudden event embedding to generate a joint feature vector; (3) Use a multilayer perceptron network to process the joint feature vector and calculate the gating coefficients and the original residual amplitude in parallel; (4) Weight the gating coefficients and the original residual magnitude to obtain the correction amount of the preliminary prediction result.

[0035] Content 3 is achieved through the following steps: (1) Align the correction amount output by Content 2 with the preliminary prediction result output by Content 1 in terms of dimensions; (2) The element-level addition operation is used to superimpose the aligned correction amount onto the preliminary prediction result to achieve residual correction; (3) Perform inverse normalization on the corrected data to obtain the final prediction result.

[0036] Content 4 is achieved through the following steps: (1) Generate a binary mask matrix according to the mask ratio, and perform element-wise multiplication to obtain the mask time series; (2) Input the mask time series into the prediction model for reconstruction and calculate the reconstruction loss; (3) Construct a joint loss function based on the prediction loss and reconstruction loss, optimize the model parameters based on the joint loss function, force the model to use context information to infer missing content, and improve generalization ability.

[0037] Figure 1 is an overall flowchart of the coal shipping cost forecasting method based on multi-period decomposition and fusion proposed in this invention. The collected data comprise (1) coal shipping costs, (2) market environment indicators, (3) real-time supply and demand data, and (4) the coal price index. These data are first preprocessed, for example by missing value imputation and timestamp alignment, to generate a standardized historical time series. The historical time series is then reversibly normalized and input into the forecasting module based on multi-period decomposition and fusion. This module generates a preliminary forecast result through period decomposition, feature extraction, and feature fusion. Subsequently, this result and the preprocessed burst information are input into the burst information enhancement module, which fuses the features of the two inputs to learn a correction to the preliminary forecast result. Finally, the correction and the preliminary forecast result are dimensionally aligned and superimposed, and the superimposed result is inversely normalized to obtain the final forecast result.

[0038] Content 1: Using historical time series as input, design a time series prediction module based on multi-period decomposition and fusion. By using fast Fourier transform and two-dimensional convolutional network to capture the long-term dependence and periodic characteristics of the data, preliminary prediction results are generated.

[0039] In coal shipping cost forecasting, the data exhibits a complex, multi-period superposition characteristic. Influenced by seasonal climate change, macroeconomic rhythms, and maritime trade practices, shipping cost series simultaneously exhibit short-term weekly trading fluctuations, medium-term monthly / quarterly supply and demand fluctuations, and long-term annual seasonal trends. Only by accurately capturing and separating these periodic features at different scales and establishing deep dependencies between the data across short- and long-term dimensions can accurate predictions of shipping cost trends be achieved. Therefore, this invention proposes a time series forecasting module based on multi-period decomposition and fusion. This module is no longer limited to a single time dimension perspective but identifies the main cycles of the data through frequency domain transformation, reshaping the one-dimensional time series into a two-dimensional multi-period tensor. Utilizing the powerful feature extraction capabilities of a two-dimensional convolutional network, it captures the evolutionary patterns of the data from different periodic scales, ultimately generating preliminary prediction results. The framework of the module is shown in Figure 2.

[0040] (1) The collected data is preprocessed and then embedded and encoded.

[0041] First, missing values are filled by linear interpolation. The collection frequency of all indicators is then standardized to once per day: indicators collected more frequently are downsampled, and indicators collected less frequently are resampled to fill the gaps. Non-numerical indicators are discretized with one-hot encoding. After preprocessing, the coal shipping cost data form a standard historical time series $X = \{x_1, x_2, \ldots, x_T\} \in \mathbb{R}^{T \times C}$, where $x_t \in \mathbb{R}^{C}$ is the data at time $t$ and contains $C$ indicator features.
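
The interpolation and one-hot steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names are hypothetical, gaps at the start or end of a column take the nearest observed value (the default edge behavior of np.interp), and resampling to a daily frequency is omitted.

```python
import numpy as np

def fill_missing_linear(x):
    """Fill NaNs in a (T, C) array by per-column linear interpolation over the time index.

    Leading/trailing NaNs take the nearest observed value (np.interp edge behavior).
    Columns with no observed values are left unchanged.
    """
    x = np.asarray(x, dtype=float).copy()
    t = np.arange(x.shape[0])
    for c in range(x.shape[1]):
        col = x[:, c]                      # a view, so assignments write into x
        observed = ~np.isnan(col)
        if observed.any() and not observed.all():
            col[~observed] = np.interp(t[~observed], t[observed], col[observed])
    return x

def one_hot(labels, categories):
    """One-hot encode a sequence of non-numeric labels against a fixed category list."""
    index = {cat: i for i, cat in enumerate(categories)}
    out = np.zeros((len(labels), len(categories)))
    for row, label in enumerate(labels):
        out[row, index[label]] = 1.0
    return out
```

The filled numeric columns and the one-hot columns can then be stacked side by side (for example with np.hstack) to form the $T \times C$ series $X$.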

[0042] Because different indicators usually have different physical meanings, units, and value ranges, reversible instance normalization is first applied to the historical time series to improve the model's adaptability to data at different scales. For each sequence sample $X$ and each feature $c$, the mean $\mu_c$ and standard deviation $\sigma_c$ are computed, and the sequence is normalized to obtain $\hat{X}$. The calculation is expressed as follows: $\mu_c = \frac{1}{T}\sum_{t=1}^{T} x_{t,c}$; $\sigma_c = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(x_{t,c} - \mu_c\right)^2}$; $\hat{x}_{t,c} = \frac{x_{t,c} - \mu_c}{\sigma_c}$. In the formulas, $T$ represents the length of the time series.
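
A minimal NumPy sketch of the per-instance normalization, assuming one window $X$ of shape (T, C). The small constant eps inside the square root is an assumption added for numerical stability (it guards against constant features) and does not appear in the formulas above; the returned mu and sigma are the statistics that must be kept for the inverse step.

```python
import numpy as np

def instance_normalize(x, eps=1e-5):
    """Normalize each feature of one (T, C) window by its own mean and standard deviation.

    eps is an assumed stabilizer, not part of the text's formulas.
    """
    x = np.asarray(x, dtype=float)
    mu = x.mean(axis=0)                      # mu_c, shape (C,)
    sigma = np.sqrt(x.var(axis=0) + eps)     # sigma_c, population variance (divides by T)
    x_hat = (x - mu) / sigma                 # broadcast over the T time steps
    return x_hat, mu, sigma
```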

[0043] A deep linear network then performs preliminary processing, projecting all indicators into a common representation space to obtain the embedded encoded representation of the state data. In this scheme the deep linear network is implemented as a multilayer perceptron, computed as follows: $H^{(l)} = \sigma\left(H^{(l-1)} W^{(l)} + b^{(l)}\right)$, with $H^{(0)} = \hat{X}$, where $\sigma(\cdot)$ is a non-linear activation function that limits its output to the interval (0, 1), and $W^{(l)}$ and $b^{(l)}$ are the learnable linear weight matrix and bias vector of layer $l$, respectively. The output of the last layer is taken as the encoded representation, denoted $X_{\mathrm{enc}}$, which serves as the input to the time series prediction module based on multi-period decomposition and fusion.
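
The encoder can be sketched as a stack of dense layers, each followed by a sigmoid so that outputs fall in (0, 1) as stated above. The layer sizes and the random initialization in the usage below are illustrative assumptions; in the described scheme the weights are learned.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_encode(x_hat, weights, biases):
    """Apply H^(l) = sigmoid(H^(l-1) W^(l) + b^(l)) to each time step; return the last layer."""
    h = np.asarray(x_hat, dtype=float)       # H^(0): normalized series, shape (T, C)
    for w, b in zip(weights, biases):
        h = sigmoid(h @ w + b)               # shape (T, d_l)
    return h                                 # X_enc, shape (T, d_last)

# Usage with hypothetical sizes: 6 input indicators -> 16 hidden units -> 8 encoded features.
rng = np.random.default_rng(1)
weights = [rng.normal(scale=0.3, size=(6, 16)), rng.normal(scale=0.3, size=(16, 8))]
biases = [np.zeros(16), np.zeros(8)]
x_enc = mlp_encode(rng.normal(size=(96, 6)), weights, biases)
```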

[0044] (2) Perform a fast Fourier transform on the encoded data, extract the main period, and reshape the data according to the main period.

[0045] Taking the encoded representation $X_{\mathrm{enc}}$ as input, the data are first transformed from the time domain to the frequency domain with the Fast Fourier Transform (FFT). Significant periodic features are identified from the amplitude of each frequency component, and the period lengths corresponding to the $k$ frequency components with the largest amplitudes are selected as the main periods, denoted $\{p_1, \ldots, p_k\}$. The calculation is expressed as follows: $\mathbf{A} = \mathrm{Avg}\left(\mathrm{Amp}\left(\mathrm{FFT}(X_{\mathrm{enc}})\right)\right)$; $\{f_1, \ldots, f_k\} = \arg\mathrm{Topk}(\mathbf{A})$; $p_i = \left\lceil T / f_i \right\rceil$; $w_i = \frac{\exp\left(A_{f_i}/\tau\right)}{\sum_{j=1}^{k}\exp\left(A_{f_j}/\tau\right)}$. Here $\mathrm{FFT}(\cdot)$ is the Fast Fourier Transform, $\mathrm{Amp}(\cdot)$ computes amplitudes, and $\mathrm{Avg}(\cdot)$ averages over the variable dimension; $\mathbf{A}$ is the set of amplitudes and $A_{f_i}$ is the amplitude corresponding to the $i$-th period; $f_i$ is the frequency corresponding to the $i$-th period and $p_i$ is the $i$-th period; $w_i$ is the weight corresponding to the $i$-th period, used for the subsequent feature aggregation; and $\tau$ is a hyperparameter.
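
The period-selection step can be sketched with np.fft.rfft. This sketch assumes a real-valued (T, C) input, discards the zero-frequency (DC) bin before ranking, and implements $w_i$ as a softmax over the selected amplitudes with $\tau$ as temperature; that last choice is one reading of how $\tau$ enters, not something the text spells out.

```python
import numpy as np

def select_main_periods(x_enc, k=2, tau=1.0):
    """Pick the k dominant periods of a real (T, C) sequence from its FFT amplitudes.

    Returns frequency indices f_i, periods p_i = ceil(T / f_i), and weights w_i.
    Using tau as a softmax temperature is an interpretation, not stated in the text.
    """
    T = x_enc.shape[0]
    amp = np.abs(np.fft.rfft(x_enc, axis=0)).mean(axis=1)  # A: mean amplitude over C features
    amp[0] = 0.0                                           # ignore the DC (zero-frequency) bin
    freqs = np.argsort(amp)[::-1][:k]                      # arg-top-k frequency indices
    periods = np.ceil(T / freqs).astype(int)               # p_i = ceil(T / f_i)
    logits = amp[freqs] / tau
    w = np.exp(logits - logits.max())                      # numerically stable softmax
    return freqs, periods, w / w.sum()

# Synthetic check: components with periods 24 and 8 over T = 96 samples.
t = np.arange(96)
x = np.stack([np.sin(2 * np.pi * t / 24) + 0.5 * np.sin(2 * np.pi * t / 8),
              np.cos(2 * np.pi * t / 24)], axis=1)
freqs, periods, w = select_main_periods(x, k=2, tau=10.0)
```

For this synthetic input the two strongest bins are 4 and 12 (96/24 and 96/8), which map back to periods 24 and 8.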

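[0046] The one-dimensional embedded encoding is then reshaped into two-dimensional tensors according to the main periods. The calculation is expressed as follows: $X_{\mathrm{2D}}^{i} = \mathrm{Reshape}_{p_i, f_i}\left(\mathrm{Padding}(X_{\mathrm{enc}})\right), \; i = 1, \ldots, k$, with $X_{\mathrm{2D}}^{i} \in \mathbb{R}^{p_i \times f_i \times C}$, where $\mathrm{Padding}(\cdot)$ zero-pads the sequence along the time axis to length $p_i \times f_i$ (which is at least $T$ because $p_i = \lceil T / f_i \rceil$), and $C$ is the number of indicator features of the encoded representation $X_{\mathrm{enc}}$.

[0047] Here $X_{\mathrm{2D}}^{i}$ denotes the tensor obtained by reshaping each indicator feature of the encoded data two-dimensionally according to the $i$-th main period; together, $\{X_{\mathrm{2D}}^{1}, \ldots, X_{\mathrm{2D}}^{k}\}$ form the multi-period two-dimensional tensor.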
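
A sketch of the reshaping for one main period, assuming zero-padding at the end of the sequence. Consecutive chunks of length $p_i$ become the $f_i$ columns, so each column holds one full period and each row holds the same phase across successive periods.

```python
import numpy as np

def fold_by_period(x_enc, p, f):
    """Reshape a (T, C) sequence into a (p, f, C) tensor for one main period.

    The sequence is zero-padded at the end to length p * f (assumed padding scheme);
    column j then holds the j-th period and row r holds phase r of every period.
    """
    T, C = x_enc.shape
    if p * f < T:
        raise ValueError("p * f must cover the sequence length T")
    padded = np.zeros((p * f, C), dtype=float)
    padded[:T] = x_enc
    return padded.reshape(f, p, C).transpose(1, 0, 2)

# T = 10, f = 3 gives p = ceil(10 / 3) = 4, so two zero rows pad the last column.
x = np.arange(20, dtype=float).reshape(10, 2)
out = fold_by_period(x, p=4, f=3)
```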

[0048] (3) A two-dimensional convolutional network is used to process the reshaped data to obtain feature components of different periods.

[0049] Each reshaped two-dimensional tensor $X_{\mathrm{2D}}^{i}$ is processed by a two-dimensional convolutional network to obtain the feature representation of the corresponding period. The calculation is expressed as follows: $\hat{X}_{\mathrm{2D}}^{i} = \mathrm{Conv2D}\left(X_{\mathrm{2D}}^{i}\right), \; i = 1, \ldots, k$. The two-dimensional convolution kernel slides along both the time dimension (columns) and the period dimension (rows), so it simultaneously aggregates the variation within each period and the variation across successive periods.
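
To show how one kernel mixes both directions, here is a naive single-layer "same"-padded 2-D convolution in NumPy. It is only an illustration: the text does not specify kernel size, depth, or padding, and a practical model would use a deep-learning framework's convolution layers rather than explicit loops.

```python
import numpy as np

def conv2d_same(x2d, kernel, bias):
    """Naive 'same'-padded 2-D convolution (cross-correlation, as in deep-learning libraries).

    x2d:    (p, f, C_in); axis 0 = position within a period, axis 1 = period index
    kernel: (kh, kw, C_in, C_out) with odd kh and kw
    bias:   (C_out,)
    """
    rows, cols, _ = x2d.shape
    kh, kw, _, c_out = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(x2d, ((ph, ph), (pw, pw), (0, 0)))     # zero-pad both spatial axes
    out = np.empty((rows, cols, c_out))
    for i in range(rows):
        for j in range(cols):
            patch = padded[i:i + kh, j:j + kw, :]           # neighbours along both axes
            out[i, j] = np.tensordot(patch, kernel, axes=3) + bias
    return out
```

A 3 x 3 kernel therefore combines neighbouring phases inside a period (axis 0) with the same phase in adjacent periods (axis 1) in a single weighted sum.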

[0050] (4) Weighted summation of the feature components of different periods is performed to obtain multi-period aggregated features and preliminary prediction results.

[0051] The features $\hat{X}_{\mathrm{2D}}^{i}$ corresponding to the different periods are flattened back into one-dimensional sequences and combined by adaptive weighted summation to obtain the multi-period aggregated feature, and a linear projection then generates the preliminary prediction result. The calculation is as follows: $X_{\mathrm{agg}} = \sum_{i=1}^{k} w_i \cdot \mathrm{Flatten}\left(\hat{X}_{\mathrm{2D}}^{i}\right)$; $\hat{Y}_{\mathrm{pre}} = W_o X_{\mathrm{agg}} + b_o$, where $\mathrm{Flatten}(\cdot)$ is the flattening operation and $W_o$ and $b_o$ are the learnable linear weight matrix and bias vector, respectively. The preliminary prediction result $\hat{Y}_{\mathrm{pre}}$ serves as the input to the emergency information enhancement module.
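
The aggregation and projection can be sketched as follows, assuming each $\hat{X}_{\mathrm{2D}}^{i}$ keeps the (p_i, f_i, C) layout, that flattening reads the periods back in time order and truncates to the original length $T$ (dropping the padding), and that $W_o$ maps the $T$ input steps to the $L$ forecast steps.

```python
import numpy as np

def aggregate_and_project(components, weights, T, W_o, b_o):
    """Weighted multi-period aggregation followed by a linear projection over time.

    components: list of k arrays, the i-th of shape (p_i, f_i, C) (column j = j-th period)
    weights:    (k,) period weights w_i
    W_o:        (L, T) projection matrix; b_o: (L, 1) bias, broadcast over the C features
    """
    flat = []
    for comp in components:
        c = comp.shape[-1]
        seq = comp.transpose(1, 0, 2).reshape(-1, c)   # read periods back in time order
        flat.append(seq[:T])                           # drop zero-padding (assumed)
    x_agg = sum(w * s for w, s in zip(weights, flat))  # X_agg, shape (T, C)
    y_pre = W_o @ x_agg + b_o                          # Y_pre, shape (L, C)
    return x_agg, y_pre
```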

[0052] Content 2: Taking external emergency event information as input, an emergency information enhancement module based on a gated residual mechanism is designed. It encodes the events and lets them interact with the features of the preliminary prediction to generate a correction to the preliminary prediction result. This module, also referred to in this invention as the burst information enhancement module, mainly consists of an embedding coding layer, a feature fusion layer, and a multilayer perceptron network. Its framework is shown in Figure 3.

[0053] (1) Embedding and encoding information on external emergencies.

[0054] External emergency events $E$ usually exist in discrete form. For a neural network to process this unstructured information, they are first mapped into a continuous vector space to obtain the emergency event embedding. The calculation is expressed as follows: $\mathbf{e} = \mathrm{Embed}(E), \; \mathbf{e} \in \mathbb{R}^{d_e}$, where $\mathrm{Embed}(\cdot)$ denotes the embedding operation and $d_e$ is the embedding dimension.
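
As a minimal sketch, the embedding can be a learnable lookup table indexed by event type. The event vocabulary, the random initialization, and averaging the vectors when several events are active in the same window are all assumptions made for illustration.

```python
import numpy as np

class EventEmbedding:
    """Lookup table mapping discrete event-type IDs to d_e-dimensional vectors.

    The vocabulary (e.g. 0 = no event, 1 = typhoon, ...) is hypothetical; the random
    table stands in for weights that would be learned during training.
    """

    def __init__(self, num_event_types, d_e, seed=0):
        rng = np.random.default_rng(seed)
        self.table = rng.normal(scale=0.1, size=(num_event_types, d_e))

    def __call__(self, event_ids):
        # Averaging the vectors of simultaneous events is an assumption.
        return self.table[np.asarray(event_ids)].mean(axis=0)   # shape (d_e,)
```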

[0055] (2) The preliminary prediction results obtained from Content 1 are fused with the sudden event embedding to generate a joint feature vector.

[0056] To assess the specific impact of an emergency on the current prediction, the model needs to know not only what emergency occurred (the event embedding) but also what was originally predicted (the preliminary prediction result). The preliminary prediction result $\hat{Y}_{\mathrm{pre}}$ output by Content 1 is therefore flattened from a time series tensor into a one-dimensional feature vector and concatenated with the event embedding vector $\mathbf{e}$ along the feature dimension, yielding a joint feature vector that integrates the historical trend context with the information about the sudden event. The calculation is expressed as follows: $\mathbf{z} = \mathrm{Concat}\left(\mathrm{Flatten}(\hat{Y}_{\mathrm{pre}}), \mathbf{e}\right)$, where $\mathrm{Flatten}(\cdot)$ is the flattening operation and $\mathrm{Concat}(\cdot)$ is the concatenation operation.

[0057] (3) Use a multilayer perceptron network to process the joint feature vector and calculate the gating coefficient and the original residual amplitude in parallel.

[0058] The joint feature vector $\mathbf{z}$ is fed into a shared multilayer perceptron backbone, which then branches into two parallel output heads: (1) a gating head, which uses the Sigmoid activation function to generate a gating coefficient $\mathbf{g} \in (0, 1)$ that measures how strongly the sudden event affects the prediction result, where 0 indicates no impact; and (2) a residual head, which uses the Tanh activation function to generate the original residual amplitude $\mathbf{r}$, representing the direction and magnitude of the correction under the assumption of full impact. The two are computed as follows: $\mathbf{g} = \sigma\left(W_g \, \mathrm{MLP}(\mathbf{z}) + b_g\right)$; $\mathbf{r} = \tanh\left(W_r \, \mathrm{MLP}(\mathbf{z}) + b_r\right)$, where $W_g$, $b_g$ and $W_r$, $b_r$ are the learnable weight matrices and bias vectors of the gating branch and the residual branch, respectively, and $\sigma(\cdot)$ is the Sigmoid activation function. The gating mechanism lets the model adaptively filter noise and produce substantial correction signals only for major emergencies.

[0059] (4) Weight the gating coefficients and the original residual magnitude to obtain the correction amount of the preliminary prediction result.

[0060] The gating coefficient is applied to the original residual amplitude by element-wise multiplication to generate the correction amount. The calculation is expressed as follows: $\Delta = \mathbf{g} \odot \mathbf{r}$, where $\odot$ denotes the Hadamard product. Because both factors are smooth, differentiable functions of the input, this gating keeps the correction process smooth and differentiable and helps avoid abrupt changes in the prediction.
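
Paragraphs [0056] to [0060] can be combined into one forward pass. The sketch below assumes a single ReLU layer as the shared backbone and heads that output one value per target variable; the parameter names in params are hypothetical. Driving the gate toward 0 reproduces the behavior described above: the correction vanishes whatever the residual head outputs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_correction(y_pre, e, params):
    """Compute Delta = g * r (element-wise) from y_pre (L, C) and event embedding e (d_e,).

    params (hypothetical names): W_h, b_h for one shared ReLU backbone layer (depth and
    ReLU are assumptions); W_g, b_g for the gating head; W_r, b_r for the residual head.
    """
    z = np.concatenate([np.ravel(y_pre), np.ravel(e)])      # joint feature vector
    h = np.maximum(0.0, params["W_h"] @ z + params["b_h"])   # shared MLP backbone
    g = sigmoid(params["W_g"] @ h + params["b_g"])           # gate in (0, 1): impact strength
    r = np.tanh(params["W_r"] @ h + params["b_r"])           # raw residual in (-1, 1)
    return g * r, g, r                                       # Hadamard product

rng = np.random.default_rng(2)
L, C, d_e, hidden = 5, 2, 4, 8
params = {
    "W_h": rng.normal(scale=0.1, size=(hidden, L * C + d_e)), "b_h": np.zeros(hidden),
    "W_g": rng.normal(scale=0.1, size=(C, hidden)), "b_g": np.zeros(C),
    "W_r": rng.normal(scale=0.1, size=(C, hidden)), "b_r": np.zeros(C),
}
y_pre, e = rng.normal(size=(L, C)), rng.normal(size=d_e)
delta, g, r = gated_correction(y_pre, e, params)
```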

[0061] Content 3: The preliminary prediction result of Content 1 and the correction amount of Content 2 are aligned over the prediction window and fused by an adaptive weighting method to obtain the final prediction result.

[0062] (1) Align the correction amount output by Content 2 with the preliminary prediction result output by Content 1 in terms of dimensions.

[0063] The preliminary prediction result $\hat{Y}_{\mathrm{pre}}$ output by Content 1 is a multidimensional array covering the prediction window length $L$, whereas the correction amount $\Delta$ output by Content 2 is a one-dimensional vector holding one value per target variable, so the two cannot be superimposed directly. The correction amount $\Delta$ is therefore first extended by a broadcast mechanism to the same dimensions as $\hat{Y}_{\mathrm{pre}}$.

[0064] (2) The element-level addition operation is used to superimpose the aligned correction amount onto the preliminary prediction result to achieve residual correction.

[0065] The impact of the sudden event is superimposed as a residual onto the preliminary prediction result of the prediction model, yielding a prediction sequence $\tilde{Y}$ that is still in the normalized numerical range. The calculation is expressed as follows: $\tilde{Y} = \hat{Y}_{\mathrm{pre}} + \mathrm{Broadcast}(\Delta)$. This implements a decoupled modeling concept that combines the basic trend with sudden disturbances. Through this residual connection, the model degenerates into a pure prediction model when no sudden event occurs (the correction is then close to zero) and adaptively adjusts its trajectory when events occur, which improves its robustness.
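
In NumPy, the alignment and residual addition reduce to broadcasting a per-variable correction of shape (C,) across the L forecast steps. This sketch assumes the correction holds exactly one value per target variable, as described in [0063].

```python
import numpy as np

def apply_residual_correction(y_pre, delta):
    """Return Y_tilde = Y_pre + Broadcast(delta) for y_pre of shape (L, C) and delta of shape (C,)."""
    y_pre = np.asarray(y_pre, dtype=float)
    delta_b = np.broadcast_to(np.asarray(delta, dtype=float), y_pre.shape)  # repeat delta at every step
    return y_pre + delta_b
```

With a zero correction the output equals the preliminary forecast, which is the degenerate "pure prediction model" case mentioned above.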

[0066] (3) Perform inverse normalization on the corrected data to obtain the final prediction result.

[0067] Because the input data underwent instance normalization during preprocessing, the model output does not contain the statistical distribution of the original data, and inverse normalization is required to obtain the final prediction in original units. The mean vector $\mu$ and standard deviation vector $\sigma$ computed and retained during preprocessing of the input sample sequence are retrieved, and the inverse transformation is applied to $\tilde{Y}$. This reinjects the non-stationary trend information back into the prediction, yielding the final prediction. The calculation is expressed as follows: $\hat{y}_{t,c} = \sigma_c \cdot \tilde{y}_{t,c} + \mu_c$, where $t = 1, \ldots, L$ and $c = 1, \ldots, C$.
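
The inverse transform is a per-feature affine map that reuses the statistics retained from the input window. A minimal sketch, assuming the forecast covers the same $C$ features whose mean and standard deviation were stored:

```python
import numpy as np

def inverse_normalize(y_tilde, mu, sigma):
    """Map a normalized (L, C) forecast back to original units: y = sigma_c * y_tilde + mu_c."""
    return np.asarray(y_tilde, dtype=float) * np.asarray(sigma) + np.asarray(mu)
```

For example, a normalized value of 1.0 for a feature with mean 5 and standard deviation 2 maps back to 7.0.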

[0068] Instance normalization directly transmits distribution information from the input to the output, allowing the intermediate model layers to focus only on local fluctuation patterns after removing trends. This is key to solving the problem of significant price shifts in ocean freight data across different years.

[0069] Content 4: A training strategy based on random input masking is proposed. Data perturbation is introduced through an auxiliary reconstruction task to improve the model's generalization under noise interference and sudden situations. Random input masking is a common self-supervised enhancement technique: while learning the primary prediction task, the model also solves an auxiliary task of reconstructing the masked historical input. This improves its understanding of contextual dependencies in the time series, prevents overfitting to specific noise, and improves robustness. The framework is shown in Figure 4.

[0070] (1) Generate a binary mask matrix according to the mask ratio, and perform element-wise multiplication to obtain the mask time series.

[0071] First, based on a preset mask ratio $r$, a Bernoulli distribution is used to generate a binary mask matrix $M \in \{0, 1\}^{T \times C}$ with the same shape as the input time series $X$, where $M_{t,c} \sim \mathrm{Bernoulli}(1 - r)$, 1 marks a retained entry, and 0 marks a masked entry. This introduces a source of random perturbation into the data. Unlike adding fixed noise, random masking simulates real-world unstructured data loss such as sensor malfunctions and transmission gaps, forcing the model to infer information from the global context rather than relying on a few fixed time points or features. The input sequence is multiplied element-wise with the mask matrix (the Hadamard product) to generate the masked time series $X_{\mathrm{mask}}$. The calculation is expressed as follows: $X_{\mathrm{mask}} = X \odot M$.
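
A sketch of the mask generation, using NumPy's Generator.binomial with a single trial as the Bernoulli sampler and the convention implied by [0073] that $M = 1$ marks kept entries and $M = 0$ masked ones, so each entry is kept with probability $1 - r$. The function name is hypothetical.

```python
import numpy as np

def make_masked_series(x, mask_ratio, rng):
    """Return (X_mask, M) with M ~ Bernoulli(1 - mask_ratio) element-wise and X_mask = X * M."""
    x = np.asarray(x, dtype=float)
    m = rng.binomial(n=1, p=1.0 - mask_ratio, size=x.shape).astype(float)  # 1 = keep, 0 = mask
    return x * m, m
```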

[0072] (2) Input the mask time series into the prediction model for reconstruction and calculate the reconstruction loss.

[0073] The masked time series is fed into the backbone network of the prediction model. In addition to the forecast, the model uses an additional reconstruction head to map the latent features back to the original input dimensions, producing the reconstruction $\tilde{X}$. The reconstruction loss $\mathcal{L}_{\mathrm{rec}}$ considers only the recovery of the masked portion: it is the mean squared error (MSE) between the original and reconstructed sequences at the masked positions, computed as follows: $\mathcal{L}_{\mathrm{rec}} = \frac{\left\lVert (\mathbf{1} - M) \odot (X - \tilde{X}) \right\rVert_F^2}{\sum_{t,c}\left(1 - M_{t,c}\right)}$, where $\mathbf{1}$ is a matrix of all ones, so that $\mathbf{1} - M$ selects the masked portion. To reconstruct the erased data accurately, the model must learn the inherent periodicity of the coal shipping cost data and the correlations between its variables.

[0074] (3) Construct a joint loss function based on the prediction loss and reconstruction loss, optimize the model parameters based on the joint loss function, force the model to use context information to infer missing content, and improve generalization ability.

[0075] The prediction loss of the main task $\mathcal{L}_{\mathrm{pred}}$ and the reconstruction loss of the auxiliary task $\mathcal{L}_{\mathrm{rec}}$ are combined by weighting to construct the final joint loss function. The calculation is expressed as follows: $\mathcal{L} = \lambda_1 \mathcal{L}_{\mathrm{pred}} + \lambda_2 \mathcal{L}_{\mathrm{rec}}$, where $\lambda_1$ and $\lambda_2$ are hyperparameters that balance the weights of the two tasks. All model parameters are updated by backpropagation based on this joint loss. The strategy thus realizes multi-task learning: the prediction loss steers the model toward future trends, while the reconstruction loss steers it toward the structure of the history. Together they give the model both prediction accuracy and stable performance under noise and sudden conditions.
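
The masked reconstruction loss and the joint objective can be sketched as follows. Using MSE for the prediction loss and the default values of lam1 and lam2 are assumptions; the text only states that the two losses are combined with balancing hyperparameters. In a real training loop the same expression would be computed inside an automatic-differentiation framework so that backpropagation can update the parameters.

```python
import numpy as np

def masked_reconstruction_loss(x, x_rec, m):
    """MSE restricted to masked entries; (1 - M) is 1 exactly at the masked positions."""
    hidden = 1.0 - np.asarray(m, dtype=float)
    n_hidden = hidden.sum()
    if n_hidden == 0:
        return 0.0                                    # nothing was masked
    err = hidden * (np.asarray(x) - np.asarray(x_rec))
    return float((err ** 2).sum() / n_hidden)

def joint_loss(y_true, y_pred, x, x_rec, m, lam1=1.0, lam2=0.5):
    """L = lam1 * L_pred + lam2 * L_rec (MSE prediction loss and lam values are assumptions)."""
    l_pred = float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))
    l_rec = masked_reconstruction_loss(x, x_rec, m)
    return lam1 * l_pred + lam2 * l_rec, l_pred, l_rec
```

Errors at retained positions do not contribute to $\mathcal{L}_{\mathrm{rec}}$, so the auxiliary task rewards only the recovery of erased values.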

[0076] Furthermore, the present invention also provides a computer device / apparatus / system, including a memory, a processor, and a computer program stored in the memory, wherein the processor executes the computer program to implement the steps of the above-described method.

[0077] The present invention also provides a computer-readable storage medium having a computer program / instructions stored thereon, which, when executed by a processor, implement the steps of the above-described method.

[0078] This invention is described with reference to flowchart illustrations and / or block diagrams of methods, systems, and computer program products according to embodiments of the invention. It should be understood that each block of the flowchart illustrations and / or block diagrams, as well as combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing apparatus to produce a machine such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in one or more blocks of the flowchart illustrations and / or block diagrams.

[0079] These computer program instructions may also be stored in a computer-readable storage medium capable of directing a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flowcharts and / or one or more blocks in a block diagram.

[0080] These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable data processing apparatus to produce a computer-implemented process, thereby providing steps for implementing the functions specified in one or more flowcharts and / or one or more blocks in a block diagram.

[0081] It should be noted that, depending on the implementation needs, the various steps / components described in this invention can be broken down into more steps / components, or two or more steps / components or parts of the operation of steps / components can be combined into new steps / components to achieve the purpose of this invention.

[0082] Those skilled in the art will readily understand that the above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.

Claims

1. A method for predicting coal shipping costs based on multi-period decomposition and fusion, characterized in that the method includes: constructing a historical time series of coal shipping cost data; inputting the historical time series into the time series prediction module based on multi-period decomposition and fusion, in which the main periods are extracted by fast Fourier transform, the data are reshaped two-dimensionally, the feature components of different periods are extracted by a two-dimensional convolutional network, and a preliminary prediction result is generated after weighted fusion; inputting external emergency information into the emergency information enhancement module based on the gated residual mechanism, in which, after embedding and encoding, it is fused with the preliminary prediction result to generate a correction amount for the preliminary prediction result; and aligning the correction amount and the preliminary prediction result dimensionally over the prediction window and achieving residual correction through element-level addition to obtain the final prediction result.

2. The coal shipping cost prediction method based on multi-period decomposition and fusion according to claim 1, characterized in that the historical time series $X$ includes coal shipping costs, market environment indicators, real-time supply and demand data, and a coal price index, expressed as $X = \{x_1, x_2, \ldots, x_T\} \in \mathbb{R}^{T \times C}$, where $x_t$ is the data at time $t$ and contains $C$ indicator features, and $T$ is the length of the time series; each indicator feature of the historical time series $X$ is reversibly normalized before being input into the time series prediction module based on multi-period decomposition and fusion, and after residual correction the result is inversely normalized to obtain the final prediction result.

3. The coal shipping cost prediction method based on multi-period decomposition and fusion according to claim 1, characterized in that the time series prediction module based on multi-period decomposition and fusion specifically performs: embedding and encoding the historical time series to obtain the encoded representation $X_{\mathrm{enc}}$; performing a Fast Fourier Transform on $X_{\mathrm{enc}}$ to extract the main periods, and reshaping the one-dimensional embedded encoding data into two dimensions according to the main periods to obtain the multi-period two-dimensional tensors $X_{\mathrm{2D}}^{i}$; performing feature extraction on the two-dimensional tensor $X_{\mathrm{2D}}^{i}$ of each period with a two-dimensional convolutional network to obtain the feature components of the different periods; and obtaining the multi-period aggregated feature by weighted summation of the feature components of the different periods, from which the preliminary prediction result is obtained.

4. The coal shipping cost prediction method based on multi-period decomposition and fusion according to claim 3, characterized in that a deep linear network is used to embed and encode the historical time series, the output of its last layer is taken as the encoded representation, and the deep linear network includes a multilayer perceptron.

5. The coal shipping cost prediction method based on multi-period decomposition and fusion according to claim 3, characterized in that a fast Fourier transform is performed on the encoded representation $X_{\mathrm{enc}}$ to convert it from the time domain to the frequency domain, and periodic characteristics are identified from the amplitude of each frequency component; the period lengths corresponding to the $k$ frequency components with the largest amplitudes are selected as the main periods, denoted $\{p_1, \ldots, p_k\}$, calculated as follows: $\mathbf{A} = \mathrm{Avg}\left(\mathrm{Amp}\left(\mathrm{FFT}(X_{\mathrm{enc}})\right)\right)$; $\{f_1, \ldots, f_k\} = \arg\mathrm{Topk}(\mathbf{A})$; $p_i = \left\lceil T / f_i \right\rceil$; $w_i = \frac{\exp\left(A_{f_i}/\tau\right)}{\sum_{j=1}^{k}\exp\left(A_{f_j}/\tau\right)}$, where $\mathrm{FFT}(\cdot)$ is the Fast Fourier Transform, $\mathrm{Amp}(\cdot)$ is the amplitude calculation, $\mathrm{Avg}(\cdot)$ takes the average over the variable dimension, $A_{f_i}$ is the amplitude corresponding to the $i$-th period, $\mathbf{A}$ is the set of amplitudes, $\arg\mathrm{Topk}(\cdot)$ finds the $k$ frequency components with the largest amplitudes, $f_i$ is the frequency component corresponding to the $i$-th period, $T$ is the length of the time series, $p_i$ is the $i$-th period, $w_i$ is the weight corresponding to the $i$-th period, used for subsequent feature aggregation, and $\tau$ is a hyperparameter; the one-dimensional embedded encoding data are reshaped into two dimensions according to the main periods, calculated as $X_{\mathrm{2D}}^{i} = \mathrm{Reshape}_{p_i, f_i}\left(\mathrm{Padding}(X_{\mathrm{enc}})\right) \in \mathbb{R}^{p_i \times f_i \times C}$, where $C$ is the number of indicator features of the encoded representation $X_{\mathrm{enc}}$, and $X_{\mathrm{2D}}^{i}$ is obtained by reshaping each indicator feature of the encoded data two-dimensionally according to its main period, giving the multi-period two-dimensional tensor; a two-dimensional convolutional network performs feature extraction on the two-dimensional tensor of each period to obtain the feature components of the different periods, $\hat{X}_{\mathrm{2D}}^{i} = \mathrm{Conv2D}\left(X_{\mathrm{2D}}^{i}\right)$; the feature components $\hat{X}_{\mathrm{2D}}^{i}$ of the different periods are flattened into one-dimensional form and combined by adaptive weighted summation to obtain the multi-period aggregated feature, and a linear projection then generates the preliminary prediction result, calculated as $X_{\mathrm{agg}} = \sum_{i=1}^{k} w_i \cdot \mathrm{Flatten}\left(\hat{X}_{\mathrm{2D}}^{i}\right)$; $\hat{Y}_{\mathrm{pre}} = W_o X_{\mathrm{agg}} + b_o$, where $\mathrm{Flatten}(\cdot)$ is the flattening operation and $W_o$ and $b_o$ are the learnable linear weight matrix and bias vector, respectively.

6. The coal shipping cost prediction method based on multi-period decomposition and fusion according to claim 1, characterized in that the burst information enhancement module based on the gated residual mechanism specifically performs: embedding and encoding the external emergency information to generate an external emergency embedding vector $\mathbf{e}$; flattening the preliminary prediction result and concatenating it with the external event embedding vector to generate a joint feature vector $\mathbf{z}$; inputting the joint feature vector into a shared multilayer perceptron backbone, which then branches into two parallel output heads: (1) a gating head, which uses the Sigmoid activation function to generate a gating coefficient $\mathbf{g} \in (0, 1)$ measuring how strongly the external emergency affects the prediction result, and (2) a residual head, which uses the Tanh activation function to generate the original residual amplitude $\mathbf{r}$, representing the direction and magnitude of the correction under the assumption of full impact, calculated as $\mathbf{g} = \sigma\left(W_g \, \mathrm{MLP}(\mathbf{z}) + b_g\right)$; $\mathbf{r} = \tanh\left(W_r \, \mathrm{MLP}(\mathbf{z}) + b_r\right)$, where $\mathbf{z}$ is the joint feature vector, $W_g$, $b_g$ and $W_r$, $b_r$ are the learnable weight matrices and bias vectors of the gating branch and the residual branch, respectively, and $\sigma(\cdot)$ is the Sigmoid activation function; and weighting the original residual amplitude $\mathbf{r}$ by the gating coefficient $\mathbf{g}$ to obtain the correction amount for the preliminary prediction result, calculated as $\Delta = \mathbf{g} \odot \mathbf{r}$, where $\odot$ denotes the Hadamard product.

7. The coal shipping cost prediction method based on multi-period decomposition and fusion according to claim 1, characterized in that the correction amount is extended by a broadcast mechanism to the same dimensions as the preliminary prediction result and is superimposed on the preliminary prediction result by element-level addition to achieve residual correction.

8. The coal shipping cost prediction method based on multi-period decomposition and fusion according to claim 1, characterized in that the method further includes introducing a training strategy based on random input masking in the model training phase, as follows: generating a binary mask matrix according to a preset mask ratio and multiplying it element-wise with the historical time series to generate a masked time series; inputting the masked time series into the prediction model, outputting the reconstruction result, and calculating the reconstruction loss; and combining the prediction loss and the reconstruction loss by weighting to construct a joint loss function, and optimizing the model parameters based on the joint loss function.

9. A computer device / equipment / system, comprising a memory, a processor, and a computer program stored in the memory, characterized in that the processor executes the computer program to implement the steps of the method according to any one of claims 1 to 8.

10. A computer-readable storage medium having a computer program / instructions stored thereon, characterized in that, when the computer program / instructions are executed by a processor, they implement the steps of the method according to any one of claims 1 to 8.