Retail sales forecasting method based on multi-model fusion perspective

By adopting a multi-model fusion perspective, a time calendar and price feature system is constructed. Combining SARIMA, Prophet, N-BEATS and XGBoost models, the problem of prediction stability and error control of complex retail sales data across different time spans is solved, achieving more accurate retail sales forecasting.

CN122243542A · Pending · Publication Date: 2026-06-19 · NANTONG UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
NANTONG UNIV
Filing Date
2026-02-06
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

When faced with massive retail sales data with complex nonlinear interactions, existing technologies struggle to effectively distinguish the applicable boundaries and advantages of models across different time spans, resulting in significant differences in the stability and error control of prediction tasks.

Method used

We adopt a multi-model fusion perspective to construct a time calendar feature and price feature system. By combining SARIMA, Prophet, N-BEATS and XGBoost models, we use feature engineering and supervised regression to peel away the trend and seasonal components in the signal layer by layer, capture short-term fluctuations and long-term trends, prevent overfitting, and improve prediction accuracy by using multi-layer difference and change point detection mechanisms.

Benefits of technology

By clarifying the applicable boundaries and advantageous ranges of different algorithms, we can provide more valuable reference for model selection, assist in making scientific quantitative decisions, and improve the accuracy and stability of retail sales forecasting.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention discloses a retail sales forecasting method based on a multi-model fusion perspective, comprising the following steps: constructing a time-calendar feature system and a price feature system respectively; fusing multiple models to predict the results; wherein, the time-calendar feature system is constructed by extracting and encoding the following features from the date field: basic time features, weekend identifiers, and special event features; the price feature system is constructed, namely relative price features, to reflect the difference between the current price and the historical average price; wherein, the multi-model prediction results are fused, and the models include one or more of SARIMA, Prophet, N-BEATS, and XGBoost models. This invention can provide more valuable model selection criteria for complex time series forecasting tasks, assisting relevant practitioners in making more scientific quantitative decisions.

Description

Technical Field

[0001] This invention belongs to the field of artificial intelligence and computer technology, and more specifically, relates to a retail sales forecasting method based on a multi-model fusion perspective.

Background Technology

[0002] With the explosive growth of e-commerce data collection dimensions, leveraging statistical and machine learning techniques to uncover the evolutionary trends behind sales time series has become crucial for improving retail decision-making efficiency. The evolution from early statistical methods to modern machine learning paradigms has not only reshaped the ability to process complex sequence data but also greatly expanded the application boundaries of prediction technology in the e-commerce environment. When faced with sales data exhibiting nonlinear and highly volatile characteristics, selecting appropriate algorithms based on the length of the prediction period has become key to improving decision-making quality.

[0003] Statistical inference-based models long dominated the literature. Many scholars have used the SARIMA model to analyze the seasonality and stationarity of data, demonstrating its robustness in handling linear dependencies. Meanwhile, the Prophet algorithm was developed to address the strong cyclical patterns commonly found in business and economic data. This model employs a decomposable time series structure, exhibits unique advantages in handling abrupt trend changes and holiday effects, and has been widely applied in various industrial forecasting scenarios.

[0004] However, the limitations of traditional methods are becoming increasingly apparent when faced with complex nonlinear interactions within massive datasets, prompting the rapid development of ensemble learning and deep learning technologies. Among these, the XGBoost model, with its efficient gradient boosting framework and regularization mechanism, has demonstrated predictive accuracy surpassing traditional statistical methods in numerous data science competitions and empirical studies. In the field of deep learning, the proposed N-BEATS architecture, through stacking fully connected layers and residual connections, has successfully broken through the bottleneck of traditional recurrent neural networks in long-sequence modeling, further demonstrating the enormous potential of deep neural networks in extracting high-dimensional features from time series data.

[0005] While various algorithms have made achievements in specific fields, the performance differences between statistical models, ensemble learning models, and deep learning models across different forecast horizons within a unified experimental framework still need further investigation. In particular, in multi-step medium- to long-term prediction tasks, models using different mechanisms often exhibit significant differences in stability and error control, a point that has not been fully explored in existing literature.

Summary of the Invention

[0006] To address the aforementioned issues, this invention proposes a retail sales forecasting method based on a multi-model fusion perspective. By comparing the differences in capturing short-term fluctuations and long-term trends among four representative models—SARIMA, Prophet, N-BEATS, and XGBoost—this invention clarifies the applicable boundaries and advantageous ranges of different algorithms. This provides a more valuable basis for model selection for complex time series forecasting tasks, assisting practitioners in making more scientific quantitative decisions.

[0007] To address at least one of the aforementioned technical problems, according to one aspect of the present invention, a retail sales forecasting method based on a multi-model fusion perspective is provided, comprising the following steps:

[0008] Constructing a time calendar feature system and a price feature system respectively, and fusing the prediction results of multiple models.

[0009] Specifically, a time calendar feature system is constructed to extract and encode the following features from the date field:

[0010] Basic time characteristics: year, month, day of the month, and day of the week;

[0011] Weekend identifier: a boolean variable is constructed to capture the weekend shopping peak;

[0012] Special event features: holidays are classified and encoded, and variables are introduced to capture the pull effect of coupon issuance days on the sales of discounted products;

[0013] Among them, a price feature system, namely relative price features, is constructed to reflect the difference between the current price and the historical average price;

[0014] The results are fused from multiple models, including one or more of the following models: SARIMA, Prophet, N-BEATS, and XGBoost.

[0015] Furthermore, the relative price features constructed in the price feature system include: the rate of price change, which is used to capture the non-linear stimulating effect of promotions and discounts on sales.

[0016] Furthermore, the XGBoost model adopts a "feature engineering + supervised regression" paradigm, continuously adding new decision trees through an additive model to fit the residuals from the previous step. The algorithm iteration process is as follows:

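[0017] Additive model assumption: suppose the predicted value obtained after $t-1$ rounds of iteration is $\hat{y}_i^{(t-1)}$; in round $t$, a new tree $f_t$ is trained to minimize the objective function, in which $\Omega$ penalizes the number of leaf nodes $T$ and the leaf weights $w_j$:

[0018] $$\mathcal{L}^{(t)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t), \qquad \Omega(f_t) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2$$

[0019] Second-order Taylor expansion optimization: the XGBoost model applies a second-order Taylor approximation to the loss function, $\mathcal{L}^{(t)} \approx \sum_{i=1}^{n}\left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t)$;

[0020] where $g_i = \partial l\left(y_i, \hat{y}_i^{(t-1)}\right) / \partial \hat{y}_i^{(t-1)}$ is the first-order gradient and $h_i = \partial^2 l\left(y_i, \hat{y}_i^{(t-1)}\right) / \partial \left(\hat{y}_i^{(t-1)}\right)^2$ is the second-order derivative (Hessian term);

[0021] Structure scoring and splitting strategy: the XGBoost model defines a structure score; for a given leaf node $j$, the optimal weight $w_j^*$ and the corresponding minimum loss are:

[0022] $$w_j^* = -\frac{G_j}{H_j + \lambda}, \qquad \mathcal{L}^* = -\frac{1}{2}\sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T, \qquad G_j = \sum_{i \in I_j} g_i,\quad H_j = \sum_{i \in I_j} h_i$$

[0023] where $I_j$ is the set of samples that fall into leaf node $j$; during training, the algorithm traverses all possible feature split points, searching for the split that reduces $\mathcal{L}^*$ the most;

[0024] Mechanisms to prevent overfitting: in the formula, $\lambda$ (L2 regularization of the leaf weights) and $\gamma$ (penalty on the number of leaf nodes) directly penalize the model's complexity.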

[0025] Furthermore, the SARIMA model eliminates nonstationarity through multi-level differencing and combines autoregressive and moving average components for modeling.

[0026] Stationarity processing: a Box-Cox transformation is used to stabilize the variance; to eliminate trend and seasonality, the original sequence $y_t$ undergoes ordinary differencing of order $d$ and seasonal differencing of order $D$ with seasonal period $s$, according to the formula:

[0027] $$w_t = (1 - B)^d \,(1 - B^s)^D \, y_t$$

[0028] where $B$ is the lag operator ($B y_t = y_{t-1}$) and $w_t$ is the residual sequence after stationarization;

[0029] Model fitting: the stationary sequence $w_t$ is modeled by the following linear equation:

[0030] $$\phi_p(B)\,\Phi_P(B^s)\, w_t = \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t$$

[0031] Non-seasonal part: $\phi_p(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$, through which the current value is affected by the past $p$ days;

[0032] Seasonal part: $\Phi_P(B^s) = 1 - \Phi_1 B^s - \cdots - \Phi_P B^{Ps}$, through which the current value is affected by the same period in the past $P$ weeks;

[0033] Similarly, $\theta_q(B)$ and $\Theta_Q(B^s)$ model the past error terms, non-seasonal and seasonal respectively;

[0034] Parameter estimation: the optimal parameter combination is found by maximum likelihood estimation, and the orders $(p, d, q)$ and $(P, D, Q)$ are determined by minimizing the Akaike information criterion.

[0035] Furthermore, the Prophet model treats time $t$ as the sole regression variable, and predicted values are generated by decomposing the series into components and superimposing them; the core generation formula is:

[0036] $$y(t) = g(t) + s(t) + h(t) + \varepsilon_t$$

[0037] Nonlinear trend term $g(t)$: because e-commerce sales have an upper limit and growth rates vary over time, a logistic growth model is adopted:

[0038] $$g(t) = \frac{C}{1 + \exp\left(-k\,(t - m)\right)}$$

[0039] where $C$ is the carrying capacity, $k$ is the growth rate, and $m$ is the offset parameter; Prophet introduces a change point detection mechanism that allows the growth rate $k$ to change abruptly at potential change points $s_j$, thereby capturing trend shifts resulting from adjustments in promotional strategies;

[0040] Seasonal term $s(t)$: a Fourier series is used to fit multi-period cycles; for a cycle of period $P$, the model is constructed in the following matrix form:

[0041] $$s(t) = \sum_{n=1}^{N} \left( a_n \cos\frac{2\pi n t}{P} + b_n \sin\frac{2\pi n t}{P} \right) = X(t)\,\beta$$

[0042] where $X(t)$ is the feature vector of cosine and sine terms and $\beta = (a_1, b_1, \ldots, a_N, b_N)^\top$ are the coefficients to be fitted;

[0043] Holiday effect $h(t)$: an indicator function matrix $Z(t)$ is constructed, in which each column represents a holiday:

[0044] $$h(t) = Z(t)\,\kappa, \qquad \kappa \sim \mathcal{N}(0, \nu^2)$$

[0045] where the regularization parameter $\nu$ controls the sparsity of the holiday effects.

[0046] Furthermore, the N-BEATS model is a deep neural network architecture that uses stacked residual connections to progressively remove trend and seasonal components from the signal; the deep stacking process is as follows:

[0047] The model consists of $M$ stacks, and each stack contains multiple blocks; the workflow of the $\ell$-th block is as follows:

[0048] Feature encoding: the input is the historical data of the lookback window, $x_\ell$ (initially $x_1 = x$); nonlinear features are extracted using four fully connected layers with the ReLU activation function:

[0049] $$h_{\ell,1} = \mathrm{FC}_{\ell,1}(x_\ell), \qquad h_{\ell,k} = \mathrm{FC}_{\ell,k}(h_{\ell,k-1}),\quad k = 2, 3, 4$$

[0050] Basis function expansion coefficients: the final layer of the network outputs two projection vectors, the "backward coefficients" $\theta_\ell^b$ and the "forward coefficients" $\theta_\ell^f$:

[0051] $$\theta_\ell^b = \mathrm{Linear}_\ell^b(h_{\ell,4}), \qquad \theta_\ell^f = \mathrm{Linear}_\ell^f(h_{\ell,4})$$

[0052] Waveform synthesis and decomposition: the coefficients are multiplied by the preset basis functions $v_i^b$ and $v_i^f$ to generate the fitted curves:

[0053] Backcast, used for fitting history: $\hat{x}_\ell = \sum_i \theta_{\ell,i}^b \, v_i^b$

[0054] Forecast, used to predict the future: $\hat{y}_\ell = \sum_i \theta_{\ell,i}^f \, v_i^f$

[0055] Bidirectional residual propagation; downward propagation: the historical signal residual not explained by the current block, $x_\ell - \hat{x}_\ell$, is used as the input of the next block, $x_{\ell+1}$;

[0056] Forward output: the future signal predicted by the current block, $\hat{y}_\ell$, is added to the final result;

[0057] $$x_{\ell+1} = x_\ell - \hat{x}_\ell, \qquad \hat{y} = \sum_{\ell} \hat{y}_\ell$$

[0058] The first stack can focus on capturing macro trends, while subsequent stacks can focus on capturing high-frequency details.

[0059] According to another aspect of the present invention, a computer-readable storage medium is provided having a computer program stored thereon that, when executed by a processor, implements the steps of the retail sales forecasting method based on a multi-model fusion perspective of the present invention.

[0060] According to another aspect of the present invention, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps of the retail sales forecasting method based on a multi-model fusion perspective of the present invention.

[0061] Compared with existing technologies, the beneficial effects of the above-described method of the present invention are as follows:

[0062] This invention clarifies the applicable boundaries and advantageous ranges of different algorithms, thereby providing a more valuable basis for model selection for complex time series prediction tasks and assisting relevant practitioners in making more scientific quantitative decisions.

Attached Figure Description

[0063] To more clearly illustrate the technical solutions of the embodiments of the present invention, the accompanying drawings of the embodiments will be briefly described below. Obviously, the drawings described below only relate to some embodiments of the present invention and are not intended to limit the present invention.

[0064] Figure 1 is a comparison chart of the prediction results of each model and the actual results in a 7-day prediction window according to a preferred embodiment of the present invention;

[0065] Figure 2 is a comparison chart of the prediction results of each model and the actual results in a 14-day prediction window according to a preferred embodiment of the present invention;

[0066] Figure 3 is a comparison chart of the prediction results of each model and the actual results in a 28-day prediction window according to a preferred embodiment of the present invention;

[0067] Figure 4 is a flowchart of a preferred embodiment of the present invention.

Detailed Implementation

[0068] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention.

[0069] Unless otherwise defined, the technical or scientific terms used herein shall have the ordinary meaning as understood by one of ordinary skill in the art to which this invention pertains.

[0070] Example 1:

[0071] As shown in Figures 1-4, this invention provides a retail sales forecasting method based on a multi-model fusion perspective, comprising the following steps:

[0072] Construct two types of feature systems:

[0073] (1) Calendar Features

[0074] Retail sales exhibit a strong cyclical pattern. This invention extracts and encodes the following features from the date field:

[0075] Basic time characteristics: year, month, day of the month, day of the week;

[0076] Weekend identifier: Construct a boolean variable to capture the weekend shopping peak;

[0077] Special event characteristics: Holidays are categorized and coded, and variables are introduced to capture the boosting effect of food voucher issuance days on food sales.

[0078] (2) Price Features

[0079] Price is the most sensitive factor affecting sales volume. This invention constructs a relative price feature to reflect the difference between the current price and the historical average price.

[0080] Price change rate: This feature can effectively capture the non-linear stimulating effect of promotional discounts (negative values) on sales.
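The two feature systems above could be computed along the following lines. This is a minimal sketch rather than the patent's implementation: the function name `build_features`, the output field names, the `holidays` and `coupon_days` lookups, and the definition of the price change rate as (current price − historical mean) / historical mean are all assumptions made for illustration.

```python
from datetime import date

def build_features(day, price, past_prices, holidays=None, coupon_days=None):
    """Build calendar and relative-price features for one (date, price) record.

    Hypothetical helper: `holidays` maps a date to a category label and
    `coupon_days` is a set of dates on which coupons are issued.
    """
    holidays = holidays or {}
    coupon_days = coupon_days or set()
    # Historical average price over the supplied past observations.
    mean_price = sum(past_prices) / len(past_prices)
    return {
        # Basic time features
        "year": day.year,
        "month": day.month,
        "day_of_month": day.day,
        "day_of_week": day.weekday(),        # Monday = 0 ... Sunday = 6
        # Weekend identifier (boolean)
        "is_weekend": day.weekday() >= 5,
        # Special event features
        "holiday_type": holidays.get(day, "none"),
        "is_coupon_day": day in coupon_days,
        # Relative price: negative values indicate a discount versus history
        "price_change_rate": (price - mean_price) / mean_price,
    }

features = build_features(
    date(2016, 2, 7), price=8.0, past_prices=[10.0, 10.0, 10.0],
    holidays={date(2016, 2, 7): "sporting"},
)
```

In this example the record falls on a Sunday and the price is 20% below its historical mean, so the weekend flag is set and the price change rate is negative, which is the sign the text associates with promotional discounts.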

[0081] This embodiment selects four representative prediction models, covering classical statistics, industry generalized additive models, cutting-edge deep learning, and ensemble machine learning paradigms.

[0082] SARIMA is a cornerstone of univariate time series modeling and is particularly well suited to the pronounced weekly cycle found in the M5 dataset. SARIMA eliminates nonstationarity through multi-level differencing and combines autoregressive (AR) and moving average (MA) components for modeling. Calculation process and formula derivation:

[0083] Stationarity processing. First, the variance is stabilized using a Box-Cox transformation. Then, to eliminate trend and seasonality, the original sequence $y_t$ undergoes ordinary differencing of order $d$ and seasonal differencing of order $D$ with seasonal period $s$:

[0084] $$w_t = (1 - B)^d \,(1 - B^s)^D \, y_t$$

[0085] where $B$ is the lag operator ($B y_t = y_{t-1}$) and $w_t$ is the stationary residual sequence.

[0086] Model fitting. The stationary sequence $w_t$ is modeled by the following linear equation:

[0087] $$\phi_p(B)\,\Phi_P(B^s)\, w_t = \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t$$

[0088] Non-seasonal part: $\phi_p(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$ (the current value is affected by the past $p$ days).

[0089] Seasonal part: $\Phi_P(B^s) = 1 - \Phi_1 B^s - \cdots - \Phi_P B^{Ps}$ (the current value is affected by the same period in the past $P$ weeks).

[0090] Similarly, $\theta_q(B)$ and $\Theta_Q(B^s)$ model the past error terms, non-seasonal and seasonal respectively.

[0091] Parameter estimation. Maximum likelihood estimation (MLE) is used to find the optimal parameter combination. The orders $(p, d, q)$ and $(P, D, Q)$ are determined by minimizing the Akaike Information Criterion (AIC).
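The differencing step of SARIMA can be illustrated in isolation. The sketch below applies the operator (1 − B)^d (1 − B^s)^D to a synthetic series; the function name `difference`, the synthetic data, and the choice of a weekly period s = 7 are assumptions for illustration, and the Box-Cox transform and the ARMA fitting step are deliberately left out.

```python
import numpy as np

def difference(y, d=1, D=1, s=7):
    """Apply (1 - B)^d (1 - B^s)^D to the series y and return the result."""
    w = np.asarray(y, dtype=float)
    for _ in range(D):          # seasonal difference: w_t - w_{t-s}
        w = w[s:] - w[:-s]
    for _ in range(d):          # ordinary difference: w_t - w_{t-1}
        w = w[1:] - w[:-1]
    return w

# Synthetic series: a linear trend plus a fixed weekly pattern.
t = np.arange(56)
weekly = np.array([0, 1, 2, 3, 4, 9, 8], dtype=float)
y = 0.5 * t + weekly[t % 7]

seasonal_only = difference(y, d=0, D=1, s=7)   # weekly pattern cancels out
w = difference(y, d=1, D=1, s=7)               # remaining trend is removed too
```

One seasonal difference removes the weekly pattern and leaves the constant weekly increment of the trend; the additional ordinary difference removes that as well, so `w` is flat. Each seasonal difference shortens the series by `s` points and each ordinary difference by one point.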

[0092] Prophet abandons the strict lag-dependency structure of traditional ARIMA and instead adopts a generalized additive model (GAM) approach akin to curve fitting. It treats time $t$ as the sole regression variable and generates predicted values by decomposing the series into components and superimposing them. Calculation process and formula derivation. Core generation formula:

[0093] $$y(t) = g(t) + s(t) + h(t) + \varepsilon_t$$

[0094] Nonlinear trend term $g(t)$. Because e-commerce sales have an upper limit (such as inventory capacity) and growth rates vary over time, a logistic growth model is employed:

[0095] $$g(t) = \frac{C}{1 + \exp\left(-k\,(t - m)\right)}$$

[0096] where $C$ is the carrying capacity, $k$ is the growth rate, and $m$ is the offset parameter. Prophet introduces a change-point detection mechanism that allows the growth rate $k$ to change abruptly at potential change points $s_j$, thereby capturing trend shifts resulting from adjustments in promotional strategies.

[0097] Seasonal term $s(t)$. A Fourier series is used to fit multi-period cycles. For a cycle of period $P$, the model is constructed in the following matrix form:

[0098] $$s(t) = \sum_{n=1}^{N} \left( a_n \cos\frac{2\pi n t}{P} + b_n \sin\frac{2\pi n t}{P} \right) = X(t)\,\beta$$

[0099] where $X(t)$ is the feature vector of cosine and sine terms and $\beta = (a_1, b_1, \ldots, a_N, b_N)^\top$ are the coefficients to be fitted. The order $N$ is chosen so that the weekly fluctuations are fit adequately.

[0100] Holiday effect $h(t)$. An indicator function matrix $Z(t)$ is constructed, in which each column represents a holiday or event (e.g., SuperBowl):

[0101] $$h(t) = Z(t)\,\kappa, \qquad \kappa \sim \mathcal{N}(0, \nu^2)$$

[0102] where the regularization parameter $\nu$ controls the sparsity of the holiday effects.
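The additive decomposition can be made concrete with a small numerical sketch. It builds a logistic trend g(t) and a Fourier design matrix X(t) by hand instead of calling the Prophet library; the function names, the Fourier order of 3, and all coefficient values are illustrative assumptions, and the holiday term h(t) and change points are omitted.

```python
import numpy as np

def logistic_trend(t, C, k, m):
    """Logistic growth g(t) = C / (1 + exp(-k (t - m)))."""
    return C / (1.0 + np.exp(-k * (t - m)))

def fourier_features(t, period, order):
    """Return X(t) with columns [cos(2*pi*n*t/P), sin(2*pi*n*t/P)] for n = 1..order."""
    t = np.asarray(t, dtype=float)
    cols = []
    for n in range(1, order + 1):
        angle = 2.0 * np.pi * n * t / period
        cols.append(np.cos(angle))
        cols.append(np.sin(angle))
    return np.column_stack(cols)

t = np.arange(28)
X = fourier_features(t, period=7, order=3)           # shape (28, 6)
beta = np.array([1.0, 0.5, 0.0, 0.2, 0.1, 0.0])      # illustrative coefficients
seasonal = X @ beta                                   # s(t) = X(t) beta
trend = logistic_trend(t, C=100.0, k=0.3, m=14.0)     # saturating trend g(t)
y_hat = trend + seasonal                              # holiday term omitted here
```

Because every column of `X` has period 7, the seasonal component repeats exactly from week to week, which is consistent with the smooth, regular prediction curves attributed to Prophet later in the text.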

[0103] N-BEATS is a deep neural network architecture that does not use recurrent neural networks (RNNs); instead, it peels away the trend and seasonal components of a signal layer by layer through stacked residual connections. This design mitigates the vanishing gradient problem in deep networks and gives the network interpretability similar to statistical decomposition. Deep stacking process:

[0104] The model consists of $M$ stacks, and each stack contains multiple blocks. The workflow of the $\ell$-th block is as follows:

[0105] (1) Feature Encoding. The input is the historical data of the lookback window, $x_\ell$ (initially $x_1 = x$). Nonlinear features are extracted using four fully connected (FC) layers with the ReLU activation function:

[0106] $$h_{\ell,1} = \mathrm{FC}_{\ell,1}(x_\ell), \qquad h_{\ell,k} = \mathrm{FC}_{\ell,k}(h_{\ell,k-1}),\quad k = 2, 3, 4$$

[0107] (2) Basis function expansion coefficients. The last layer of the network outputs two projection vectors, the "backward coefficients" $\theta_\ell^b$ and the "forward coefficients" $\theta_\ell^f$:

[0108] $$\theta_\ell^b = \mathrm{Linear}_\ell^b(h_{\ell,4}), \qquad \theta_\ell^f = \mathrm{Linear}_\ell^f(h_{\ell,4})$$

[0109] (3) Waveform Synthesis. This is the core of N-BEATS. The coefficients are multiplied by the preset basis functions $v_i^b$ and $v_i^f$ to generate the fitted curves:

[0110] Backcast (used for fitting history): $\hat{x}_\ell = \sum_i \theta_{\ell,i}^b \, v_i^b$

[0111] Forecast (used to predict the future): $\hat{y}_\ell = \sum_i \theta_{\ell,i}^f \, v_i^f$

[0112] (4) Double Residual

[0113] Downward propagation: the historical signal residual not explained by the current block, $x_\ell - \hat{x}_\ell$, is used as the input of the next block, $x_{\ell+1}$.

[0114] Forward output: the future signal predicted by the current block, $\hat{y}_\ell$, is added to the final result.

[0115] $$x_{\ell+1} = x_\ell - \hat{x}_\ell, \qquad \hat{y} = \sum_{\ell} \hat{y}_\ell$$

[0116] This mechanism allows the first stack to focus on capturing macro trends, while subsequent stacks focus on capturing high-frequency details.
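The double-residual data flow can be traced with a small untrained forward pass. This is a sketch under several assumptions: weights and basis matrices are random placeholders (a real model would learn the weights and use trend or seasonality bases), biases are omitted, and names such as `make_block` and `nbeats_forward` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_block(lookback, horizon, hidden=16, n_theta=4):
    """Randomly initialised (untrained) parameters for one block."""
    sizes = [lookback, hidden, hidden, hidden, hidden]
    return {
        "fc": [rng.normal(0, 0.1, (sizes[i], sizes[i + 1])) for i in range(4)],
        "lin_b": rng.normal(0, 0.1, (hidden, n_theta)),      # backward coefficients
        "lin_f": rng.normal(0, 0.1, (hidden, n_theta)),      # forward coefficients
        "basis_b": rng.normal(0, 0.1, (n_theta, lookback)),  # placeholder backcast basis
        "basis_f": rng.normal(0, 0.1, (n_theta, horizon)),   # placeholder forecast basis
    }

def block_forward(block, x):
    h = x
    for W in block["fc"]:                    # four FC layers with ReLU
        h = np.maximum(h @ W, 0.0)
    theta_b = h @ block["lin_b"]
    theta_f = h @ block["lin_f"]
    backcast = theta_b @ block["basis_b"]    # reconstruction of the history
    forecast = theta_f @ block["basis_f"]    # contribution to the future
    return backcast, forecast

def nbeats_forward(blocks, x):
    residual = np.asarray(x, dtype=float)
    total = 0.0
    for block in blocks:
        backcast, forecast = block_forward(block, residual)
        residual = residual - backcast       # input of the next block
        total = total + forecast             # accumulated final forecast
    return total, residual

lookback, horizon = 28, 7
blocks = [make_block(lookback, horizon) for _ in range(3)]
x = np.sin(2 * np.pi * np.arange(lookback) / 7)
y_hat, final_residual = nbeats_forward(blocks, x)
```

Each block only sees what earlier blocks failed to explain, and the final forecast is the sum of the per-block forecasts; with trained weights this is what lets earlier blocks absorb the trend and later ones the finer detail.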

[0117] XGBoost is presented as a representative of ensemble learning in this invention. Unlike the end-to-end time-series models mentioned above, XGBoost adopts a "feature engineering + supervised regression" paradigm. Its core lies in continuously adding new decision trees through an additive model to fit the residuals from the previous step. Algorithm iteration flow:

[0118] Additive model assumption. Suppose the predicted value obtained after $t-1$ rounds of iteration is $\hat{y}_i^{(t-1)}$. In round $t$, a new tree $f_t$ is trained to minimize the objective function, in which $\Omega$ penalizes the number of leaf nodes $T$ and the leaf weights $w_j$:

[0119] $$\mathcal{L}^{(t)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t), \qquad \Omega(f_t) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2$$

[0120] Second-order Taylor expansion optimization. To find the optimal tree structure quickly, XGBoost applies a second-order Taylor approximation to the loss function. This is the main improvement over traditional GBDT, which uses only the first-order derivative:

[0121] $$\mathcal{L}^{(t)} \approx \sum_{i=1}^{n}\left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t)$$

[0122] where $g_i = \partial l\left(y_i, \hat{y}_i^{(t-1)}\right) / \partial \hat{y}_i^{(t-1)}$ is the first-order gradient and $h_i = \partial^2 l\left(y_i, \hat{y}_i^{(t-1)}\right) / \partial \left(\hat{y}_i^{(t-1)}\right)^2$ is the second-order derivative (Hessian term). This step allows XGBoost to converge quickly.

[0123] Structure scoring and splitting strategy. For each tree, XGBoost defines a structure score to determine how to split a leaf node. For a given leaf node $j$, the optimal weight $w_j^*$ and the corresponding minimum loss are:

[0124] $$w_j^* = -\frac{G_j}{H_j + \lambda}, \qquad \mathcal{L}^* = -\frac{1}{2}\sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T, \qquad G_j = \sum_{i \in I_j} g_i,\quad H_j = \sum_{i \in I_j} h_i$$

[0125] where $I_j$ is the set of samples that fall into leaf node $j$. During training, the algorithm traverses all possible feature split points, searching for the split that reduces $\mathcal{L}^*$ the most (that is, maximizes the gain).

[0126] Mechanisms to prevent overfitting. In the formula, $\lambda$ (L2 regularization of the leaf weights) and $\gamma$ (penalty on the number of leaf nodes) directly penalize the model's complexity. Combined with column sampling and shrinkage, XGBoost exhibits strong generalization ability on the high-dimensional and sparse M5 dataset.
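The structure score can be checked by hand on a toy example. The sketch below assumes the squared-error loss l = ½(y − ŷ)², for which the gradient is g = ŷ − y and the second derivative is h = 1; the helper names and the toy data are hypothetical, and this is not the XGBoost library's code.

```python
def leaf_stats(y, y_pred):
    """Return (G, H): summed gradients and second derivatives for squared error."""
    G = sum(p - t for t, p in zip(y, y_pred))   # g_i = y_pred_i - y_i
    H = float(len(y))                           # h_i = 1 for every sample
    return G, H

def leaf_weight(G, H, lam):
    """Optimal leaf weight w* = -G / (H + lambda)."""
    return -G / (H + lam)

def leaf_score(G, H, lam):
    return G * G / (H + lam)

def split_gain(y, y_pred, left_idx, lam=1.0, gamma=0.0):
    """Loss reduction obtained by splitting one leaf into left and right children."""
    left = set(left_idx)
    yl = [y[i] for i in range(len(y)) if i in left]
    pl = [y_pred[i] for i in range(len(y)) if i in left]
    yr = [y[i] for i in range(len(y)) if i not in left]
    pr = [y_pred[i] for i in range(len(y)) if i not in left]
    GL, HL = leaf_stats(yl, pl)
    GR, HR = leaf_stats(yr, pr)
    return 0.5 * (leaf_score(GL, HL, lam) + leaf_score(GR, HR, lam)
                  - leaf_score(GL + GR, HL + HR, lam)) - gamma

y = [1.0, 1.0, 5.0, 5.0]
y_pred = [3.0, 3.0, 3.0, 3.0]            # current prediction before the new tree
good_gain = split_gain(y, y_pred, left_idx=[0, 1])   # separates low from high targets
bad_gain = split_gain(y, y_pred, left_idx=[0, 2])    # mixes them in both children
```

The split that separates the low targets from the high ones yields a positive gain, while the mixed split yields none, so a greedy search over candidate splits would pick the former. Raising `lam` or `gamma` shrinks the gain, which is how the two penalties discourage unnecessary splits.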

[0127] This invention uses typical stores from the M5 dataset as examples to demonstrate the visualization results and quantitative evaluation of four models—SARIMA, Prophet, N-BEATS, and XGBoost—under 7-day (short-term), 14-day (medium-term), and 28-day (long-term) prediction windows. The prediction curves of the four models at different time spans are shown below. In the figure, the gray area represents historical training data, the light gray shaded area represents the prediction window, the black dashed line (Actual Ground Truth) represents actual sales, and the colored solid line represents the model's predicted value.

[0128] The results for the 7-day forecast window are shown in Figure 1. This is the most challenging window to forecast because the data in this period contain sharp, erratic fluctuations (a steep drop followed by a rapid rebound), which makes it a key test of each model's sensitivity to recent features. XGBoost is the only model that effectively captures the "sharp drop and sharp rise" pattern: relying on the tree model's use of lagged features, it accurately identifies the turning point. In contrast, Prophet's prediction curve is too smooth and responds sluggishly, while SARIMA captures the downward trend but significantly underestimates the strength of the subsequent rebound. Therefore, in short-term, high-volatility scenarios, the feature-engineered machine learning model (XGBoost) significantly outperforms time series models that rely on extrapolating their own sequences.

[0129] The results for the 14-day forecast window are shown in Figure 2. Covering a two-week forecast period, this test focuses on each model's ability to transition from high volatility (week one) to a stable period (week two). The first half of the test set shows a significant drop in sales (a V-shaped fluctuation). As can be seen from Figure 2, XGBoost again delivers the strongest performance: it fits the trough of the first week most accurately, returns to normal quickly in the second week, and accurately predicts the weekend peak. N-BEATS shows some dynamic adjustment capability, attempting to follow the jagged fluctuations of the data, but exhibits lag and underfitting at the peaks. Prophet performs worst in this time window: it completely ignores the downward trend of the first week, indicating that its additive model structure is insensitive to short-term, sudden market changes.

[0130] The results for the 28-day forecast window are shown in Figure 3, which presents the performance of the four models on a four-week (28-day) test set covering four complete weekly sales cycles. XGBoost demonstrates strong robustness: in long-term forecasting, its predicted curve consistently follows the peaks and troughs of the actual values without visible error accumulation over time, indicating that the model captures stable weekly seasonal characteristics. While SARIMA is not as refined as XGBoost, it outperforms the deep learning model N-BEATS in capturing long-term trends, demonstrating the stability of traditional statistical models when handling highly cyclical data. Prophet exhibits significant rigidity: its predicted curve resembles a standard sine wave, systematically underestimating weekend peak sales and failing to adapt to the non-linear fluctuations of real data.

[0131] In addition, this invention records the evaluation metrics of the SARIMA, Prophet, N-BEATS, and XGBoost models during the training process, including mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE); the results are shown in Table 1.

[0132]
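For reference, the three evaluation metrics named above can be computed as follows. This is a plain-Python sketch with made-up numbers, not the values of Table 1; the function names are illustrative, and the MAPE helper assumes that no actual value is zero.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Illustrative values only.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

scores = {
    "MAE": mae(actual, predicted),
    "RMSE": rmse(actual, predicted),
    "MAPE": mape(actual, predicted),
}
```

MAE and RMSE are expressed in sales units, with RMSE weighting large misses more heavily, whereas MAPE is scale-free, which is why the text uses it to compare models across forecast windows.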

[0133] A comprehensive analysis of the sales forecast results for the different forecast windows (7 days, 14 days, and 28 days) on the M5 dataset shows the following:

[0134] In summary, XGBoost, by efficiently utilizing lag features, balances sensitivity to short-term sudden fluctuations with stability under long-term cyclical changes, achieving the lowest MAPE (4.47%-6.12%) across all prediction horizons. As the prediction window expands from 7 days to 28 days, the data characteristics gradually shift from irregular oscillations to regular weekly seasonality, and the overall accuracy of each model improves. During this process, the traditional statistical model SARIMA exhibits strong long-term adaptability, outperforming the deep learning model N-BEATS in 28-day predictions. Although N-BEATS performs well on large-scale homogeneous data, SARIMA demonstrates stronger robustness on the M5 dataset used in this invention. This is primarily because SARIMA models each univariate sequence independently, which avoids negative transfer of noise between different products. When the M5 data exhibit high sparsity and random fluctuations, the simple statistical model tends toward mean reversion, avoiding the tendency of deep learning models to overfit noise. Furthermore, the differencing mechanism of statistical methods is more advantageous than unoptimized deep learning models when extrapolating nonlinear data trends. In contrast, Prophet, because of the structural rigidity caused by its smoothness assumption, consistently struggles to characterize the fine-grained texture and extreme values of sales, and its performance across all time spans is inferior to that of the other models.

[0135] Comprehensive analysis of multi-step prediction results:

[0136] In the 7-day short-term forecast, the test set data exhibited sharp, irregular fluctuations (i.e., a "V-shaped" drop and rebound on days 4-6). XGBoost, with its strong sensitivity to recent lag features, was the only model that accurately captured this sharp turn and quickly returned to high levels, achieving the best MAPE (6.12%). In contrast, Prophet, because of its smoothness assumption based on Fourier series, was highly rigid in the face of such high-frequency abrupt changes, completely smoothing out the fluctuations and producing the largest error. SARIMA and N-BEATS fell between the two: both identified the downward trend, but both lagged noticeably in the strength and phase of the rebound and failed to approximate the extreme points of the true values as closely as XGBoost.

[0137] In the 14-day mid-term forecast, as the forecast window expands to two weeks, the model needs to handle the transition from initial sharp fluctuations to a subsequent stable weekly pattern. XGBoost demonstrated excellent adaptability (MAPE 5.15%), not only correcting for initial volatility but also quickly identifying regular peaks and troughs in the second week. At this point, N-BEATS showed slightly unstable performance, with its forecast curve exhibiting some oscillations at the peaks, and its accuracy was approached by traditional statistical models. SARIMA, on the other hand, gradually began to demonstrate its advantage in handling periodic data, with its error relatively narrowing. Only Prophet remained constrained by the structural limitations of its additive model, struggling to capture local changes and consistently underestimating the weekend sales peak, its performance still primarily capturing overall trends rather than details.

[0138] In the 28-day long-term forecast, strong weekly seasonality became the dominant feature. XGBoost further consolidated its lead, closely tracking four complete cycles with a MAPE of 4.47% and no visible error accumulation over time. Notably, in this long-cycle stationary scenario, the traditional statistical model SARIMA (5.61%) outperformed the deep learning model N-BEATS (6.42%), suggesting that in long, highly regular sequences, capturing linear relationships may be more stable than the complex fitting of neural networks. Although Prophet's error decreased with increasing periodicity, its smooth curve shape could not accurately depict the fine texture of real sales, and its performance across all time spans lagged behind feature-engineered machine learning models.

[0139] In summary, this invention compares the predictive performance of four models—SARIMA, Prophet, N-BEATS, and XGBoost—over different prediction windows (7 days, 14 days, and 28 days). The analysis highlights the significant advantages of ensemble learning and deep learning-based methods in processing time series data, enabling models to maintain accurate capture of trends and fluctuations over longer time spans.

[0140] Across all the time spans examined, XGBoost outperformed the traditional statistical models (SARIMA and Prophet), particularly in capturing nonlinear dependencies and dynamic features in complex sequences. N-BEATS outperformed Prophet throughout but was overtaken by SARIMA in the 28-day window (MAPE 6.42% versus 5.61%). Comparison of data from different time windows showed that as the prediction period lengthened (from 7 to 28 days), XGBoost did not exhibit performance degradation; its MAPE (mean absolute percentage error) fell from 6.12% to 4.47%.

[0141] XGBoost showed high adaptability and accuracy across all test windows, especially in long-term prediction: in the 28-day prediction window it reached a MAPE of 4.47% and an MAE (mean absolute error) of 210.41, outperforming all other models.
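
The two error metrics quoted above can be computed as in the following minimal sketch. The function names and the sample numbers are illustrative assumptions and are not taken from the patent's data set.

```python
from typing import Sequence


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error, in the same units as the sales series."""
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent; undefined for zero actuals."""
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Illustrative numbers only; these are not the patent's data.
actual_sales = [100.0, 200.0, 400.0, 500.0]
forecast = [110.0, 190.0, 380.0, 525.0]
print(f"MAE  = {mae(actual_sales, forecast):.2f}")
print(f"MAPE = {mape(actual_sales, forecast):.2f}%")
```

MAPE is scale-free, which makes it convenient for comparing windows of different lengths, while MAE stays in sales units and is easier to relate to stock or staffing decisions.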

[0142] These findings therefore underscore the need to go beyond traditional statistical methods in real-world prediction tasks. For applications that prioritize accuracy and stability, the XGBoost model is strongly recommended, while N-BEATS is a competitive option for scenarios that require an end-to-end deep learning framework. This data-driven model selection strategy provides more precise quantitative evidence for business planning and helps reduce the operational risks that arise from prediction bias, offering solid support for long-term decision-making.
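
One way to read the data-driven selection strategy is as a lookup over per-window error tables. The sketch below is an assumption about how such a rule could be coded, not the patent's procedure; it contains only the MAPE figures quoted in this section, and `REPORTED_MAPE`, `rank_models`, and `select_model` are hypothetical names.

```python
from typing import Dict, List

# MAPE values (percent) quoted in this section. Models and windows whose
# figures are not quoted here are intentionally left out.
REPORTED_MAPE: Dict[int, Dict[str, float]] = {
    14: {"XGBoost": 5.15},
    28: {"XGBoost": 4.47, "SARIMA": 5.61, "N-BEATS": 6.42},
}


def rank_models(mape_by_model: Dict[str, float]) -> List[str]:
    """Return model names ordered from lowest to highest MAPE."""
    return sorted(mape_by_model, key=mape_by_model.get)


def select_model(horizon_days: int,
                 table: Dict[int, Dict[str, float]] = REPORTED_MAPE) -> str:
    """Pick the lowest-MAPE model for a forecast window present in the table."""
    if horizon_days not in table:
        raise KeyError(f"no evaluation results for a {horizon_days}-day window")
    return rank_models(table[horizon_days])[0]


print(rank_models(REPORTED_MAPE[28]))
print(select_model(28))
```

In practice the table would be filled from a backtest on the retailer's own data, since the ranking can differ from one data set to another.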

[0143] Example 2:

[0144] The computer-readable storage medium of this embodiment stores a computer program that, when executed by a processor, implements the steps in the retail sales forecasting method based on a multi-model fusion perspective of Embodiment 1.

[0145] The computer-readable storage medium in this embodiment can be an internal storage unit of the terminal, such as the terminal's hard disk or memory; the computer-readable storage medium in this embodiment can also be an external storage device of the terminal, such as a plug-in hard disk, smart memory card, secure digital card, flash memory card, etc. equipped on the terminal; furthermore, the computer-readable storage medium can include both the terminal's internal storage unit and external storage devices.

[0146] The computer-readable storage medium of this embodiment is used to store computer programs and other programs and data required by the terminal. The computer-readable storage medium can also be used to temporarily store data that has been output or will be output.

[0147] Example 3:

[0148] The computer device of this embodiment includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, it implements the steps in the retail sales forecasting method based on a multi-model fusion perspective of Embodiment 1.

[0149] In this embodiment, the processor can be a central processing unit or another general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor can be a microprocessor or any conventional processor. The memory can include read-only memory and random access memory, and provides instructions and data to the processor. A portion of the memory can also include non-volatile random access memory. For example, the memory can also store device type information.

[0150] Those skilled in the art will understand that the content disclosed in the embodiments can be provided as a method, system, or computer program product. Therefore, this solution can take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this solution can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) containing computer-usable program code.

[0151] This solution is described with reference to flowcharts and/or block diagrams of methods and computer program products according to embodiments of this solution. It should be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing device, produce a device for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0152] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0153] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0154] Those skilled in the art will understand that all or part of the processes in the above embodiments can be implemented by a computer program instructing related hardware. The program can be stored in a computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods. The storage medium can be a magnetic disk, optical disk, read-only memory (ROM), or random access memory (RAM), etc.

[0155] The examples described herein are merely preferred embodiments of the invention and are not intended to limit the concept and scope of the invention. Any modifications and improvements made by those skilled in the art to the technical solutions of the invention without departing from the design concept of the invention should fall within the protection scope of the invention.

[0156] The foregoing has shown and described the basic principles, main features, and advantages of the present invention. Those skilled in the art should understand that the present invention is not limited to the specific embodiments described above. The specific embodiments and descriptions in the specification are merely for further illustrating the principles of the invention. Various changes and modifications can be made to the present invention without departing from its spirit and scope, and all such changes and modifications fall within the scope of the present invention as claimed. The scope of protection of the present invention is defined by the claims and their equivalents.

Claims

1. A retail sales forecasting method based on a multi-model fusion perspective, characterized in that the steps include: constructing a time calendar feature system and a price feature system respectively; and fusing the prediction results of multiple models; wherein constructing the time calendar feature system comprises extracting and encoding the following features from the date field: basic time features, namely year, month, day of the month, and day of the week; a weekend identifier, namely a boolean variable constructed to capture the weekend shopping peak; and special event features, in which holidays are classified and encoded and variables are introduced to capture the pulling effect of coupon issuance days on the sales of discounted products; wherein the price feature system is constructed from relative price features, which reflect the difference between the current price and the historical average price; and wherein the fused prediction results come from models including one or more of the following: SARIMA, Prophet, N-BEATS, and XGBoost.

2. The method as described in claim 1, characterized in that the relative price features constructed in the price feature system include the rate of price change, which is used to capture the non-linear stimulating effect of promotions and discounts on sales.
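
The calendar and price features named in claims 1 and 2 could be derived roughly as in the sketch below. It rests on several assumptions: `HOLIDAY_CODES` and `COUPON_DAYS` are hypothetical lookup tables, the relative price is taken as the ratio of the current price to the mean of all earlier prices minus one, and the rate of price change is taken as the day-over-day ratio minus one. The claims do not fix these exact definitions.

```python
from datetime import date
from statistics import mean
from typing import Dict, List, Sequence

# Hypothetical lookup tables; a real system would load these from a calendar.
HOLIDAY_CODES: Dict[date, int] = {date(2025, 1, 1): 1}  # date -> holiday category
COUPON_DAYS = {date(2025, 1, 4)}                        # coupon issuance days


def calendar_features(d: date) -> Dict[str, object]:
    """Encode basic time, weekend, and special-event features for one date."""
    return {
        "year": d.year,
        "month": d.month,
        "day": d.day,
        "weekday": d.weekday(),          # Monday = 0 ... Sunday = 6
        "is_weekend": d.weekday() >= 5,  # Saturday or Sunday
        "holiday_code": HOLIDAY_CODES.get(d, 0),  # 0 means "not a holiday"
        "is_coupon_day": d in COUPON_DAYS,
    }


def price_features(prices: Sequence[float]) -> List[Dict[str, float]]:
    """For each day t >= 1, compare the price with its history (assumed form)."""
    rows = []
    for t in range(1, len(prices)):
        historical_mean = mean(prices[:t])
        rows.append({
            "relative_price": prices[t] / historical_mean - 1.0,
            "price_change_rate": prices[t] / prices[t - 1] - 1.0,
        })
    return rows


print(calendar_features(date(2025, 1, 4)))
print(price_features([10.0, 10.0, 8.0]))
```

Using only earlier prices for the historical mean keeps the feature free of look-ahead leakage when it is fed to a supervised model.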

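3. The method as described in claim 2, characterized in that the XGBoost model employs a "feature engineering + supervised regression" paradigm, continuously adding new decision trees through an additive model to fit the residuals from the previous step; the algorithm iteration process is as follows:
Additive model assumption: assume that the predicted value obtained after round $t-1$ of iterations is $\hat{y}_i^{(t-1)}$; in round $t$, a new tree $f_t$ is trained to minimize the objective function:
$$\mathrm{Obj}^{(t)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)$$
where $\Omega(f_t)$ is the regularization term;
Second-order Taylor expansion optimization: the XGBoost model applies a second-order Taylor approximation to the loss function:
$$\mathrm{Obj}^{(t)} \approx \sum_{i=1}^{n} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t)$$
where $g_i$ is the first-order gradient and $h_i$ is the second-order (Hessian) term of the loss with respect to $\hat{y}_i^{(t-1)}$;
Structure scoring and splitting strategy: the XGBoost model defines a structure score formula; for a given leaf node $j$, the optimal weight $w_j^*$ and the corresponding minimum loss are:
$$w_j^* = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}, \qquad \mathrm{Obj}^* = -\frac{1}{2} \sum_{j=1}^{T} \frac{\left(\sum_{i \in I_j} g_i\right)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T$$
where $I_j$ is the set of samples that fall into leaf node $j$ and $T$ is the number of leaf nodes; during training, the algorithm traverses all possible feature split points, searching for the split that reduces $\mathrm{Obj}^*$ the most;
Mechanisms to prevent overfitting: in the formula, $\lambda$ is the L2 regularization coefficient, which penalizes the leaf node weights, and $\gamma T$ directly penalizes the model's complexity.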
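
The structure score and split search can be illustrated on a single feature. The sketch below uses the standard XGBoost closed forms, with leaf weight $-G/(H+\lambda)$ and split gain $\frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{G^2}{H+\lambda}\right] - \gamma$, where $G$ and $H$ are sums of gradients and Hessian terms. It assumes a squared-error loss $\frac{1}{2}(y - \hat{y})^2$, so $g_i = \hat{y}_i - y_i$ and $h_i = 1$. It is a toy exact search, not the xgboost library's implementation, and all names are illustrative.

```python
from typing import List, Optional, Sequence, Tuple


def leaf_weight(G: float, H: float, lam: float) -> float:
    """Optimal leaf weight for gradient sum G and Hessian sum H."""
    return -G / (H + lam)


def structure_score(G: float, H: float, lam: float) -> float:
    """Per-leaf term G^2 / (H + lambda) used in the structure score."""
    return G * G / (H + lam)


def best_split(x: Sequence[float], g: Sequence[float], h: Sequence[float],
               lam: float = 1.0, gamma: float = 0.0
               ) -> Tuple[float, Optional[float]]:
    """Scan every threshold on one feature; return (best gain, threshold)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    G, H = sum(g), sum(h)
    GL = HL = 0.0
    best_gain, best_threshold = float("-inf"), None
    for pos in range(len(order) - 1):
        i, nxt = order[pos], order[pos + 1]
        GL += g[i]
        HL += h[i]
        if x[i] == x[nxt]:
            continue  # cannot split between identical feature values
        GR, HR = G - GL, H - HL
        gain = 0.5 * (structure_score(GL, HL, lam)
                      + structure_score(GR, HR, lam)
                      - structure_score(G, H, lam)) - gamma
        if gain > best_gain:
            best_gain, best_threshold = gain, (x[i] + x[nxt]) / 2
    return best_gain, best_threshold


# Toy data: current prediction is 0 for every sample.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 1.0, 3.0, 3.0]
g = [0.0 - yi for yi in y]  # gradient of squared-error loss at prediction 0
h = [1.0] * len(y)          # Hessian term of squared-error loss
gain, threshold = best_split(x, g, h, lam=1.0, gamma=0.0)
print(gain, threshold)
print(leaf_weight(sum(g[:2]), sum(h[:2]), 1.0), leaf_weight(sum(g[2:]), sum(h[2:]), 1.0))
```

Raising `lam` shrinks the leaf weights toward zero, and raising `gamma` makes every split pay a fixed cost, so fewer splits clear the bar; these are the two overfitting controls the claim refers to.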

4. The method as described in claim 2, characterized in that the SARIMA model eliminates nonstationarity through multi-level differencing and combines autoregressive and moving average components for modeling:
Stationarization: a Box-Cox transformation is used to stabilize the variance; to eliminate trend and seasonality, the original sequence $y_t$ undergoes ordinary differencing of order $d$ and seasonal differencing of order $D$, according to the formula:
$$w_t = (1 - B)^d (1 - B^s)^D y_t$$
where $B$ is the lag operator ($B y_t = y_{t-1}$), $s$ is the seasonal period, and $w_t$ is the residual sequence after stationarization;
Model fitting: the stationary sequence $w_t$ is modeled by the following linear equation:
$$\phi_p(B)\,\Phi_P(B^s)\,w_t = \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t$$
Non-seasonal part: $\phi_p(B)$ describes how the current value is affected by the past $p$ days; seasonal part: $\Phi_P(B^s)$ describes how the current value is affected by the same period in the past $P$ weeks; similarly, $\theta_q(B)$ and $\Theta_Q(B^s)$ model the past error terms, respectively;
Parameter estimation: the optimal parameter combination is found by maximum likelihood estimation, and the orders $(p, d, q)$ and $(P, D, Q)$ are determined by minimizing the Akaike information criterion.
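
The stationarization step can be sketched without any forecasting library. The example below assumes a weekly season ($s = 7$) and one round each of seasonal and ordinary differencing; the helper names and the synthetic series are illustrative, and model fitting itself is out of scope here.

```python
import math
from typing import List, Sequence


def box_cox(y: float, lam: float) -> float:
    """Box-Cox transform of a positive value; lam = 0 gives the natural log."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive values")
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam


def difference(series: Sequence[float], lag: int = 1) -> List[float]:
    """Apply (1 - B^lag): subtract the value observed `lag` steps earlier."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def stationarize(series: Sequence[float], d: int = 1, D: int = 1,
                 s: int = 7) -> List[float]:
    """Apply (1 - B)^d (1 - B^s)^D to the series."""
    out = list(series)
    for _ in range(D):
        out = difference(out, lag=s)  # seasonal differencing
    for _ in range(d):
        out = difference(out, lag=1)  # ordinary differencing
    return out


# Synthetic three-week series: linear trend plus a fixed weekly pattern.
weekly_pattern = [5, 3, 4, 6, 8, 15, 12]
series = [2 * t + weekly_pattern[t % 7] for t in range(21)]
print(stationarize(series, d=1, D=1, s=7))
```

On this synthetic series the seasonal difference removes the weekly pattern and leaves a constant, and the ordinary difference then removes that constant, so nothing but zeros remains. Real sales data would leave a noisy residual for the ARMA terms to model.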

5. The method as described in claim 2, characterized in that the Prophet model treats time $t$ as the sole regression variable and generates predicted values by decomposing and superimposing components; the core generation formula is:
$$y(t) = g(t) + s(t) + h(t) + \varepsilon_t$$
Nonlinear trend term $g(t)$: to address the characteristics of e-commerce sales having an upper limit and growth rates varying over time, a logistic growth model is adopted:
$$g(t) = \frac{C}{1 + \exp\left(-k\,(t - m)\right)}$$
where $C$ is the carrying capacity, $k$ is the growth rate, and $m$ is the offset parameter; Prophet introduces a change point detection mechanism that allows the growth rate $k$ to change abruptly at potential change points, so that trend shifts resulting from adjustments in promotional strategies can be captured;
Seasonal term $s(t)$: a Fourier series is used to fit multi-period cycles; for a cycle of period $P$, the model is constructed in the following matrix form:
$$s(t) = \sum_{n=1}^{N} \left( a_n \cos\frac{2\pi n t}{P} + b_n \sin\frac{2\pi n t}{P} \right) = X(t)\,\beta$$
where $X(t)$ is the feature vector and $\beta$ contains the coefficients to be fitted;
Holiday effect $h(t)$: an indicator function matrix $Z(t)$ is constructed in which each column represents a festival:
$$h(t) = Z(t)\,\kappa, \qquad \kappa \sim \mathrm{Normal}(0, \nu^2)$$
where the regularization parameter $\nu$ controls the sparsity of the impact of holidays.
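
The trend and seasonal components can be evaluated directly from their textbook forms, a logistic curve $C / (1 + e^{-k(t-m)})$ and a truncated Fourier series. The sketch below is an illustration of those two formulas under assumed parameter values; it does not call the Prophet library and omits change points and holiday effects.

```python
import math
from typing import List, Sequence


def logistic_trend(t: float, C: float, k: float, m: float) -> float:
    """Saturating growth curve with capacity C, growth rate k, and offset m."""
    return C / (1.0 + math.exp(-k * (t - m)))


def fourier_features(t: float, period: float, order: int) -> List[float]:
    """Feature vector [cos, sin] for harmonics n = 1..order of one period."""
    feats = []
    for n in range(1, order + 1):
        angle = 2.0 * math.pi * n * t / period
        feats.extend([math.cos(angle), math.sin(angle)])
    return feats


def seasonal(t: float, period: float, beta: Sequence[float]) -> float:
    """Seasonal term as the dot product of Fourier features and coefficients."""
    order = len(beta) // 2
    return sum(b * f for b, f in zip(beta, fourier_features(t, period, order)))


# Hypothetical coefficients; in practice they are fitted to the sales history.
beta = [1.0, 0.5, -0.3, 0.2]
for day in (3, 10):
    value = logistic_trend(day, C=1000.0, k=0.3, m=5.0) + seasonal(day, 7.0, beta)
    print(day, round(value, 3))
```

Because every Fourier feature repeats after one period, the seasonal term on day 10 equals the one on day 3 for a weekly period, and only the trend differs between the two printed values.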

6. The method as described in claim 2, characterized in that the N-BEATS model is a deep neural network architecture that uses stacked residual connections to progressively remove trend and seasonal components from a signal; the deep stacking process is as follows:
The model consists of $M$ stacks, each of which contains multiple blocks; the workflow of the $l$-th block is as follows:
Feature encoding: the input is the historical data $x_l$ from the lookback window (initially $x_1 = x$, the original input); nonlinear features are extracted using four fully connected layers with the ReLU activation function;
Basis function expansion coefficients: the final layer of the network outputs two projection vectors, the "backward coefficients" $\theta_l^b$ and the "forward coefficients" $\theta_l^f$;
Waveform synthesis and decomposition: the coefficients are multiplied by preset basis functions $V_l^b$ and $V_l^f$ to generate the fitted curves: the backcast, used for fitting history, $\hat{x}_l = V_l^b\,\theta_l^b$; and the forecast, used to predict the future, $\hat{y}_l = V_l^f\,\theta_l^f$;
Bidirectional residual propagation: downward propagation: the historical signal residual not described by the current block, $x_l - \hat{x}_l$, is used as the input of the next block, $x_{l+1}$; backward output: the future signal $\hat{y}_l$ predicted by the current block is added to the final result, $\hat{y} = \sum_l \hat{y}_l$;
The first stack can focus on capturing macro trends, while subsequent stacks can focus on capturing high-frequency details.
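
The bidirectional residual propagation can be shown in isolation from the neural network. In the sketch below the learned fully connected layers and basis expansions are replaced by two hand-written stand-in blocks (`mean_block` and `repeat_block`, both hypothetical), so only the wiring between blocks reflects the architecture: each block subtracts its backcast from the input passed on, and the partial forecasts are summed.

```python
from typing import Callable, List, Sequence, Tuple

# A block maps (input window, horizon) to (backcast, partial forecast).
Block = Callable[[Sequence[float], int], Tuple[List[float], List[float]]]


def mean_block(x: Sequence[float], horizon: int) -> Tuple[List[float], List[float]]:
    """Stand-in 'trend' block: explains the window by its mean level."""
    level = sum(x) / len(x)
    return [level] * len(x), [level] * horizon


def repeat_block(x: Sequence[float], horizon: int) -> Tuple[List[float], List[float]]:
    """Stand-in 'detail' block: explains the window fully, repeats its tail."""
    return list(x), list(x[-horizon:])


def run_stack(x: Sequence[float], blocks: Sequence[Block],
              horizon: int) -> Tuple[List[float], List[float]]:
    """Chain blocks with doubly residual connections."""
    residual = list(x)
    forecast = [0.0] * horizon
    for block in blocks:
        backcast, partial = block(residual, horizon)
        # Downward path: pass on only what this block failed to explain.
        residual = [r - b for r, b in zip(residual, backcast)]
        # Output path: accumulate this block's share of the forecast.
        forecast = [f + p for f, p in zip(forecast, partial)]
    return residual, forecast


window = [10.0, 12.0, 10.0, 12.0]
residual, forecast = run_stack(window, [mean_block, repeat_block], horizon=2)
print(residual, forecast)
```

The first block removes the overall level and the second handles the alternating detail left in the residual, which mirrors the division of labor between earlier and later stacks described in the claim.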

7. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the program is executed by a processor, it implements the steps in the retail sales forecasting method based on a multi-model fusion perspective as described in any one of claims 1 to 6.

8. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, it implements the steps in the retail sales forecasting method based on a multi-model fusion perspective as described in any one of claims 1 to 6.