KAN network-based method and system for predicting outlet moisture of tobacco drying process
This method, based on a KAN (Kolmogorov-Arnold network), predicts the outlet moisture content of tobacco shreds and addresses two problems: traditional models struggle to capture the internal mechanism of the drying process, and deep learning models lack interpretability. It predicts the outlet moisture content with high precision and is suited to process control at industrial sites.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- CHINA TOBACCO YUNNAN IND
- Filing Date
- 2026-03-24
- Publication Date
- 2026-06-26
AI Technical Summary
Traditional soft-sensor models struggle to model the intrinsic mechanism of the tobacco drying process, and deep learning methods lack interpretability, so neither provides a reliable basis for process analysis and operation adjustment.
A KAN-based method is adopted for predicting the outlet moisture content of tobacco shreds during drying. Key process variables are screened with the maximum-relevance minimum-redundancy (mRMR) algorithm. A sliding-window dynamic normalization strategy and a sparsity regularization constraint are combined with a KAN prediction network that integrates SiLU basis functions and parameterized B-spline functions. The network parameters are iteratively optimized and the generalization performance is verified, enabling real-time prediction of the outlet moisture content.
The method achieves high-precision, high-reliability prediction of the outlet moisture content during tobacco drying, provides an advance reference for process control, compensates for the lag of direct measurement, and suits the continuous production needs of industrial sites.
Figure CN122286147A_ABST
Description
Technical Field
[0001] This application relates to the field of tobacco technology, and in particular to a KAN-based method and system for predicting the outlet moisture content of the tobacco drying process.
Background Technology
[0002] Accurate prediction of moisture content at the outlet of the tobacco drying process is of great significance for ensuring the stability of product quality and the safety of production operations in the cigarette manufacturing process.
[0003] However, the tobacco drying process exhibits strong multivariate coupling, significant nonlinearity, and complex dynamics, so traditional soft-sensor models struggle to model its intrinsic mechanism. At the same time, most existing deep learning methods lack interpretability and cannot readily provide a reliable basis for process analysis and operation adjustment.
Summary of the Invention
[0004] The main purpose of this application is to provide a KAN-based method and system for predicting the outlet moisture content of the tobacco drying process. It addresses two problems in the prior art: traditional soft-sensor models struggle to model the internal mechanism of the drying process, and most existing deep learning methods lack interpretability and cannot readily provide a reliable basis for process analysis and operation adjustment.
[0005] To achieve the above objectives, this application provides the following technical solution: a KAN-based method for predicting the outlet moisture content of the tobacco drying process, the method comprising:
Step S1: Collect the full-process time-series process data of the tobacco drying process, and screen key process variables with the maximum-relevance minimum-redundancy (mRMR) algorithm to obtain the initial input feature sequence and the outlet moisture content target sequence;
Step S2: Based on a sliding-window dynamic normalization strategy, standardize the initial input feature sequence and the outlet moisture content target sequence and slice them with a sliding time window to obtain a standardized time-series feature sample set;
Step S3: Divide the standardized time-series feature sample set into a training set and a test set by production batch, and construct a KAN prediction network that integrates SiLU basis functions and parameterized B-spline functions;
Step S4: Introduce sparsity regularization constraints and an adaptive grid update strategy into the KAN prediction network, and iteratively optimize the network parameters using the training set to obtain the converged KAN prediction model;
Step S5: Verify the generalization performance of the converged KAN prediction model using the test set to obtain the tobacco shred outlet moisture prediction model;
Step S6: Feed the drying process time-series data collected in real time into the tobacco shred outlet moisture prediction model and perform forward inference to obtain the predicted outlet moisture of the tobacco shreds during drying.
[0006] Beneficial effects of steps S1 to S6: By screening core process variables and constructing sequences from them, the method anchors the key process parameters related to the outlet moisture content, avoids interference from redundant variables during modeling, and adapts to the strong multivariate coupling of the drying process. Specifically, step S1 accurately screens the key process variables related to the outlet moisture content, eliminates redundant information, and ensures a strong correlation between the input features and the prediction target. Step S2, through standardization adapted to the non-stationary nature of industrial data, strengthens the model's robustness to fluctuations across raw-material batches and operating conditions, alleviating the performance degradation that traditional methods suffer when operating conditions shift. Step S3 constructs a network structure that efficiently represents the complex nonlinear mapping between process variables and outlet moisture, overcoming the limited nonlinear expressiveness of traditional networks with fixed activation functions. Step S4, through the regularization constraints and the grid update strategy, completes the iterative optimization of the model parameters while improving generalization, reducing the risk of overfitting, and providing structural support for the model's inherent interpretability. Step S5 systematically verifies the model's generalization performance, ensuring stable predictions under unseen operating conditions and meeting the needs of continuous production in industrial settings. Step S6 performs online real-time inference of outlet moisture, providing an advance reference for process control in the drying process, compensating for the lag of direct measurement, and, by relying on the model's interpretability, providing reliable support for on-site process analysis and operation adjustment. Overall, the method achieves high-precision, high-reliability prediction of outlet moisture in the drying process, balancing predictive performance with practical value in industrial scenarios.
[0007] As a further improvement to this application, step S1, collecting the full-process time-series process data of the tobacco drying process and screening key process variables with the maximum-relevance minimum-redundancy (mRMR) algorithm to obtain the initial input feature sequence and the outlet moisture content target sequence, includes:
Step S1.1: Collect the full-process time-series process data during tobacco drying to obtain the original process time-series dataset;
Step S1.2: Perform time alignment and standardization on the original process time-series dataset to obtain a standardized time-series process dataset;
Step S1.3: Extract the process-variable time series and the outlet moisture content time series from the standardized time-series process dataset, and define them as the feature candidate variable set and the target variable sequence, respectively;
Step S1.4: Use the mRMR algorithm to compute the relevance of each candidate variable to the target variable sequence and its redundancy with the other candidate variables, obtaining an mRMR score for each candidate variable;
Step S1.5: Sort and filter the feature candidate variable set by mRMR score to obtain the set of key process variables related to the outlet moisture content;
Step S1.6: Extract the time series corresponding to the key process variable set from the standardized time-series process dataset to obtain the initial input feature sequence;
Step S1.7: Match the initial input feature sequence and the target variable sequence along the time dimension to obtain the initial input feature sequence and the outlet moisture content target sequence.
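The scoring and selection in steps S1.4 and S1.5 can be illustrated with a minimal greedy mRMR sketch. The text does not state how relevance and redundancy are measured; mRMR is usually defined with mutual information, and this sketch substitutes absolute Pearson correlation as a simple stand-in. The variable names, the toy values, and the "relevance minus mean redundancy" score are all assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def mrmr_select(candidates, target, k):
    """Greedy mRMR: candidates maps name -> series; returns k names in pick order."""
    # Relevance: |correlation| with the target (assumed stand-in for mutual information).
    relevance = {name: abs(pearson(s, target)) for name, s in candidates.items()}
    selected = []
    remaining = set(candidates)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            # Redundancy: mean |correlation| with the variables already picked.
            redundancy = sum(abs(pearson(candidates[name], candidates[s]))
                             for s in selected) / len(selected)
            return relevance[name] - redundancy
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Centered toy values with hypothetical variable names.
hot_air = [1.0, -1.0, 1.0, -1.0, 2.0, -2.0]
drum = [1.0, 1.0, -1.0, -1.0, 0.0, 0.0]
candidates = {
    "hot_air_temp": hot_air,
    "hot_air_temp_copy": [2.0 * v + 3.0 for v in hot_air],  # redundant duplicate
    "drum_speed": drum,
}
target = [a + b for a, b in zip(hot_air, drum)]  # toy outlet moisture deviation
picked = mrmr_select(candidates, target, k=2)
print(picked)
```

In this toy case the duplicated temperature signal is the most relevant variable, but once one copy is picked the other is penalized for redundancy, so the less relevant but independent `drum_speed` is chosen second.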
[0008] Beneficial effects of steps S1.1 to S1.7: Step S1.1 collects time-series process data for the entire tobacco drying process, providing a data source that matches the production process for subsequent feature selection and data processing. Step S1.2 aligns and standardizes the raw data to eliminate timing misalignment and differences in units and scale, improving overall data quality and providing a standardized foundation for subsequent variable extraction and quantitative calculation. Step S1.3 separates the process variables from the target variable (outlet moisture content) in the standardized data, fixing the candidate range and prediction target for feature selection and establishing a clear variable correspondence for the subsequent relevance calculations. Step S1.4 quantifies relevance and redundancy between the candidate variables and the target variable with the mRMR algorithm, giving feature selection an objective, quantitative criterion and avoiding the subjective bias of manual selection. Step S1.5 uses the scores to rank and select key process variables, eliminating redundant and irrelevant variables and retaining the core features strongly correlated with outlet moisture content, which reduces the computational complexity and noise in subsequent modeling. Step S1.6 extracts the time series of the key process variables from the standardized time-series data, forming an initial input feature sequence consistent with the selection results. Step S1.7 matches the initial input feature sequence with the target variable sequence along the time dimension, eliminating temporal misalignment, ensuring that input features and prediction targets correspond in time, and providing temporally matched samples for subsequent data processing and model construction.
[0009] As a further improvement of this application, step S2, standardizing the initial input feature sequence and the outlet moisture content target sequence and slicing them with a sliding time window based on a sliding-window dynamic normalization strategy to obtain a standardized time-series feature sample set, includes:
Step S2.1: Align the initial input feature sequence and the outlet moisture content target sequence along the time dimension and match their timestamps;
Step S2.2: Slice the time-aligned initial input feature sequence and outlet moisture content target sequence with a sliding time window to obtain a time-series feature slice sequence of fixed window length and the corresponding label sequence;
Step S2.3: Compute window-by-window statistics on the time-series feature slice sequence and the corresponding label sequence to obtain the mean vector and standard deviation vector of each sliding window;
Step S2.4: Based on the sliding-window dynamic normalization strategy, use the mean vector and standard deviation vector of each sliding window to standardize, window by window, the fixed-length time-series feature slice sequence and the corresponding label sequence, obtaining the standardized time-series feature slice sequence and the standardized label sequence;
Step S2.5: Perform sample dimension normalization on the standardized time-series feature slice sequence and the standardized label sequence to obtain the standardized time-series feature sample set.
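Steps S2.2 to S2.4 can be sketched as follows: slice the aligned series into fixed-length windows, compute each window's own mean and standard deviation, and z-score the window with those statistics. This is a minimal sketch under assumptions the text does not fix: the label is taken as the outlet moisture at the last timestamp of the window, the population standard deviation is used, and `eps` guards against constant columns. All function and field names are hypothetical.

```python
import math

def window_stats(rows):
    """Column-wise mean and (population) standard deviation over one window."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n)
            for j in range(d)]
    return means, stds

def make_samples(features, target, window, eps=1e-8):
    """Slice aligned series into windows and z-score each window with its own stats.

    features: list of rows, one row of process variables per timestamp.
    target:   list of outlet moisture values aligned with the rows.
    """
    samples = []
    for start in range(len(features) - window + 1):
        rows = features[start:start + window]
        y_win = target[start:start + window]
        mu, sd = window_stats(rows)
        y_mu, y_sd = window_stats([[v] for v in y_win])
        x_norm = [[(r[j] - mu[j]) / (sd[j] + eps) for j in range(len(mu))]
                  for r in rows]
        # Assumed labelling convention: moisture at the window's last timestamp.
        y_norm = (y_win[-1] - y_mu[0]) / (y_sd[0] + eps)
        # Keep the label statistics so predictions can be de-normalized later.
        samples.append({"x": x_norm, "y": y_norm,
                        "y_mean": y_mu[0], "y_std": y_sd[0]})
    return samples

# Toy data: two process variables and the outlet moisture, four timestamps.
features = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
target = [12.0, 12.5, 13.0, 13.5]
samples = make_samples(features, target, window=3)
print(len(samples), samples[0]["y_mean"])
```

Because every window carries its own statistics, a slow drift in the raw values between batches does not change the scale of what the network sees, which is the robustness argument made for this strategy.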
[0010] Beneficial effects of steps S2.1 to S2.5: Step S2.1 achieves precise temporal correspondence between the input feature sequence and the target sequence, eliminating temporal misalignment and providing a sound data foundation for subsequent window slicing. Step S2.2 uses sliding time window slicing to accommodate the large time delays of the drying process, transforming continuous time-series data into structured samples that meet the model's input requirements and providing a uniform sample unit for subsequent standardization. Step S2.3 quantifies the statistical characteristics of the data within each sliding window, giving dynamic normalization statistics that track the in-window data distribution and avoiding the poor adaptation to changing operating conditions that global statistics cause. Step S2.4 standardizes the data window by window under the sliding-window dynamic normalization strategy, alleviating the performance degradation caused by the non-stationarity of industrial data, strengthening the model's robustness to fluctuations across raw-material batches and operating conditions, and providing standardized data for the subsequent sample dimension normalization. Step S2.5 normalizes the dimensions of the standardized samples and verifies that they match, ensuring consistent sample dimensions and data integrity and providing a structurally regular, temporally matched sample foundation for model training and testing.
[0011] As a further improvement to this application, step S3, dividing the standardized time-series feature sample set into a training set and a test set by production batch and constructing a KAN prediction network that integrates SiLU basis functions and parameterized B-spline functions, includes:
Step S3.1: Extract the production batch identifiers and time-series data corresponding to the standardized time-series feature sample set;
Step S3.2: Sort all time-series data in the standardized time-series feature sample set by the chronological order of the production batch identifiers to obtain the time-sorted feature sample set;
Step S3.3: Divide the time-sorted feature sample set by the chronological order of the production batch identifiers to obtain the training set and the test set;
Step S3.4: Determine the network topology based on the Kolmogorov-Arnold representation theorem, and define the basic KAN network topology framework from the feature dimensions and output dimensions of the training set and the test set;
Step S3.5: On the connection edges of the basic KAN network topology framework, introduce SiLU basis functions and parameterized B-spline functions based on the Cox-de Boor recursion formula to obtain the hybrid-basis-function KAN network layer;
Step S3.6: Stack multiple hybrid-basis-function KAN network layers and define the dimensions of the input layer, hidden layers, and output layer to obtain a KAN prediction network that integrates SiLU basis functions and parameterized B-spline functions.
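The hybrid edge function of step S3.5 can be sketched in a few lines. This is a minimal sketch, not the patent's implementation: it assumes the form commonly used for KANs, in which each edge computes a weighted SiLU term plus a weighted B-spline term, with the spline basis evaluated by the Cox-de Boor recursion on a uniform grid extended by `k` knots on each side. The weights `w_base` and `w_spline`, the grid layout, and all function names are assumptions for illustration.

```python
import math

def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

def bspline_basis(i, k, knots, x):
    """Cox-de Boor recursion for the i-th B-spline basis function of degree k."""
    if k == 0:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left = right = 0.0
    d1 = knots[i + k] - knots[i]
    if d1 > 0.0:
        left = (x - knots[i]) / d1 * bspline_basis(i, k - 1, knots, x)
    d2 = knots[i + k + 1] - knots[i + 1]
    if d2 > 0.0:
        right = (knots[i + k + 1] - x) / d2 * bspline_basis(i + 1, k - 1, knots, x)
    return left + right

def make_knots(lo, hi, intervals, k):
    """Uniform grid on [lo, hi], extended by k extra knots on each side."""
    h = (hi - lo) / intervals
    return [lo + (j - k) * h for j in range(intervals + 2 * k + 1)]

def edge_function(x, coeffs, knots, k, w_base=1.0, w_spline=1.0):
    """phi(x) = w_base * silu(x) + w_spline * sum_i coeffs[i] * B_i(x)."""
    spline = sum(c * bspline_basis(i, k, knots, x) for i, c in enumerate(coeffs))
    return w_base * silu(x) + w_spline * spline

def kan_layer(inputs, layer_coeffs, knots, k):
    """One KAN layer: output q is the sum of its edge functions over all inputs p.

    layer_coeffs[q][p] holds the spline coefficients of the edge from input p
    to output q.
    """
    return [sum(edge_function(x, layer_coeffs[q][p], knots, k)
                for p, x in enumerate(inputs))
            for q in range(len(layer_coeffs))]

# Cubic splines (k=3) on 4 intervals over [-1, 1] give 4 + 3 = 7 basis functions.
knots = make_knots(-1.0, 1.0, intervals=4, k=3)
coeffs = [0.0, 0.1, -0.2, 0.3, 0.0, 0.1, 0.0]  # arbitrary illustrative values
print(edge_function(0.3, coeffs, knots, k=3))
```

The SiLU term gives every edge a smooth global trend, while the spline coefficients shape the function locally. Stacking such layers, as in step S3.6, amounts to feeding the outputs of one `kan_layer` call into the next.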
[0012] Beneficial effects of steps S3.1 to S3.6: Step S3.1 clarifies the batch membership and temporal attributes of the samples, providing a clear basis for subsequent dataset partitioning and avoiding mismatch between sample timing and batch information. Step S3.2 preserves the temporal continuity of the sample set, follows the batch order of industrial production, avoids the temporal information leakage that random ordering would risk, and gives the subsequent partitioning a time order that matches the actual production scenario. Step S3.3 ensures that the test set is fully independent of the training set in both batch and time, matching how the model is actually applied on site, supporting an objective check of cross-batch generalization, and preventing data leakage from distorting performance evaluation. Step S3.4 constructs a basic network structure suited to the prediction task of the tobacco drying process, providing theoretically grounded structural support for the subsequent introduction of basis functions and network construction. Step S3.5 balances the ability to represent global smooth trends and local nonlinear characteristics, overcoming the limitations of a single basis function in expressing complex industrial nonlinearities and accommodating the differing effects of different process variables in the tobacco drying process. Step S3.6 constructs a complete prediction network with multi-dimensional nonlinear mapping capability, effectively expressing the complex coupling between process variables and outlet moisture and providing a structurally complete network for subsequent training and optimization.
As a further improvement to this application, step S4, introducing sparsity regularization constraints and an adaptive grid update strategy into the KAN prediction network and iteratively optimizing the network parameters using the training set to obtain a converged KAN prediction model, includes:
Step S4.1: Construct a mean squared error loss function based on the KAN prediction network and the training set, and introduce a sparsity regularization constraint composed of L1 regularization and entropy regularization to obtain a total loss function with the sparsity regularization constraint;
Step S4.2: Introduce an adaptive grid update strategy into the total loss function to obtain the iterative optimization objective function adapted to the KAN prediction network;
Step S4.3: Use the Adam optimizer to run forward and backward propagation of the iterative optimization objective function over the training set, obtaining the network parameter update values after each iteration;
Step S4.4: Synchronously update the edge function parameters and grid node parameters of the KAN prediction network according to the network parameter update values to obtain the iteratively updated KAN prediction network;
Step S4.5: Check the iteratively updated KAN prediction network for convergence, stop iterating when the preset convergence condition is met, and obtain the converged KAN prediction model.
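The total loss of step S4.1 can be sketched as mean squared error plus a sparsity penalty made of an L1 term and an entropy term. The text names the two terms but gives no formula, so this sketch assumes the form commonly used for KANs: the L1 norm of an edge is its mean absolute activation over the batch, and the entropy is taken over the edges' normalized L1 norms. The weights `lam`, `mu_l1`, and `mu_entropy` and all function names are hypothetical.

```python
import math

def mse(pred, true):
    """Mean squared error between predictions and labels."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)

def edge_l1_norms(edge_activations):
    """Mean absolute activation of each edge over the batch."""
    return [sum(abs(v) for v in acts) / len(acts) for acts in edge_activations]

def sparsity_penalty(edge_activations, mu_l1=1.0, mu_entropy=1.0, eps=1e-12):
    """mu_l1 * (sum of edge L1 norms) + mu_entropy * (entropy of their shares)."""
    norms = edge_l1_norms(edge_activations)
    total = sum(norms)
    entropy = 0.0
    for n in norms:
        p = n / (total + eps)
        if p > 0.0:
            entropy -= p * math.log(p)
    return mu_l1 * total + mu_entropy * entropy

def total_loss(pred, true, edge_activations, lam=0.01):
    """Prediction loss plus the weighted sparsity regularization term."""
    return mse(pred, true) + lam * sparsity_penalty(edge_activations)

# Two edges, two samples each: one active edge versus two equally active edges.
one_active = [[1.0, -1.0], [0.0, 0.0]]
both_active = [[1.0, -1.0], [1.0, -1.0]]
print(sparsity_penalty(one_active), sparsity_penalty(both_active))
```

The L1 term shrinks edge activations overall, and the entropy term favors concentrating the remaining activity on a few edges. Both push redundant connections toward zero, which is what makes the trained network easier to read.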
[0013] Beneficial effects of steps S4.1 to S4.5: Step S4.1 establishes an optimization objective that balances fitting accuracy against parameter sparsity. While constraining redundant connections in the network, it supports the model's inherent interpretability at the loss level and keeps the optimization process matched to the requirements of the prediction task. Step S4.2 adapts the network's spline grid to the data distribution, improving the network's adaptability to nonlinear changes in the process, preserving the ability to capture local data features during iterative optimization, and avoiding the expressive limitations of a fixed grid. Step S4.3 updates the network parameters efficiently and stably, damping gradient fluctuations during optimization, keeping the parameter updates convergent, and providing a reliable computational path for continued optimization. Step S4.4 jointly optimizes the edge function parameters and the grid node parameters, keeping the network's nonlinear expressiveness consistent with the data distribution and avoiding the fitting bias that unsynchronized parameter updates would cause. Step S4.5 terminates the iteration once the model reaches a stable fit, reducing the risk of overfitting from excessive iteration, protecting generalization performance, and outputting a converged model with stable predictive capability.
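One way to adapt a spline grid to the data distribution, as described for step S4.2, is to place grid points at sample quantiles of an edge's input and blend them with a uniform grid. The text does not specify the update rule, so the following is only a sketch of that common approach. The function name, the `blend` parameter, and the toy samples are assumptions.

```python
def update_grid(samples, intervals, blend=0.02):
    """Return intervals + 1 grid points spanning the range of the samples.

    blend = 1.0 yields a uniform grid; blend = 0.0 follows the sample quantiles.
    """
    xs = sorted(samples)
    lo, hi = xs[0], xs[-1]
    n = len(xs)
    grid = []
    for j in range(intervals + 1):
        # Quantile-based point: the sample at fraction j / intervals of the sorted data.
        idx = min(n - 1, round(j * (n - 1) / intervals))
        adaptive = xs[idx]
        uniform = lo + (hi - lo) * j / intervals
        grid.append(blend * uniform + (1.0 - blend) * adaptive)
    return grid

# Most samples lie in [0, 0.7]; one outlier stretches the range to 10.
samples = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 10.0]
print(update_grid(samples, intervals=4, blend=0.0))
print(update_grid(samples, intervals=4, blend=1.0))
```

With a small `blend`, most grid points fall where the data is dense, so the spline has more resolution there. After the grid moves, the spline coefficients would need to be refitted so that each edge function stays approximately unchanged, which corresponds to the synchronized update of edge function and grid node parameters in step S4.4; that refit is not shown here.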
As a further improvement to this application, step S5, verifying the generalization performance of the converged KAN prediction model using the test set to obtain the tobacco shred outlet moisture prediction model, includes:
Step S5.1: Run forward-propagation inference with the converged KAN prediction model on the test set to obtain the predicted outlet moisture content sequence corresponding to the test set;
Step S5.2: Align the predicted outlet moisture content sequence with the actual outlet moisture content label sequence along the time dimension to obtain the aligned prediction sequence and the true label sequence;
Step S5.3: Calculate the prediction error between the aligned prediction sequence and the true label sequence to obtain the generalization performance evaluation indices of the converged KAN prediction model;
Step S5.4: Check the generalization performance evaluation indices against the performance requirements to obtain the model performance verification results;
Step S5.5: Based on the model performance verification results, freeze the parameters of the converged KAN prediction model that meets the performance requirements to obtain the tobacco shred outlet moisture prediction model.
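Steps S5.3 and S5.4 can be illustrated with a short evaluation sketch. The text does not name the evaluation indices or the acceptance thresholds; MAE, RMSE, and the coefficient of determination are assumed here as typical regression metrics, and the threshold values are placeholders, not values from the application.

```python
import math

def regression_metrics(pred, true):
    """MAE, RMSE, and R^2 between aligned prediction and label sequences."""
    n = len(true)
    errors = [p - t for p, t in zip(pred, true)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_true = sum(true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in true)
    ss_res = sum(e * e for e in errors)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0.0 else float("nan")
    return {"mae": mae, "rmse": rmse, "r2": r2}

def meets_requirements(metrics, max_rmse=0.2, min_r2=0.9):
    """Acceptance check with placeholder thresholds (assumed, in % moisture)."""
    return metrics["rmse"] <= max_rmse and metrics["r2"] >= min_r2

# Toy aligned sequences of outlet moisture (%), for illustration only.
true = [12.0, 13.0, 14.0]
pred = [12.1, 12.9, 14.1]
metrics = regression_metrics(pred, true)
print(metrics, meets_requirements(metrics))
```

Only a model that passes such a check on held-out batches would have its parameters frozen in step S5.5.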
[0014] Beneficial effects of steps S5.1 to S5.5: Step S5.1 produces predictions under unseen operating conditions as the basis for validating generalization, matching the cross-batch application scenario of industrial production and giving the subsequent performance evaluation prediction data that reflects on-site operating conditions. Step S5.2 removes the effect of temporal misalignment on performance evaluation, ensuring that predictions and true labels correspond in time and providing correctly matched data pairs for the subsequent error calculation. Step S5.3 evaluates the model's prediction performance objectively and quantitatively, reflecting its predictive capability and generalization under unseen operating conditions and providing quantifiable criteria for the subsequent performance verification. Step S5.4 determines whether the model's generalization performance meets the requirements, identifying models that fall short, avoiding the risk of deploying substandard models, and ensuring that the output model is reliable for industrial use. Step S5.5 freezes the model parameters, eliminating the risk of parameter drift during later use and providing a stable, reusable, standardized prediction model for online prediction in industrial settings.
As a further improvement to this application, step S6, feeding the drying process time-series data collected in real time into the tobacco shred outlet moisture prediction model and performing forward inference to obtain the predicted outlet moisture of the tobacco shreds during drying, includes:
Step S6.1: Collect real-time process time-series data of the tobacco drying process;
Step S6.2: Align the real-time process time-series data stream with the input requirements of the tobacco shred outlet moisture prediction model along the time dimension, and slice it with a sliding time window to obtain a real-time time-series feature slice sequence of fixed window length;
Step S6.3: Based on the sliding-window dynamic normalization strategy, standardize the fixed-length real-time time-series feature slice sequence to obtain standardized real-time time-series feature samples;
Step S6.4: Input the standardized real-time time-series feature samples into the tobacco shred outlet moisture prediction model and run forward-propagation inference to obtain the normalized outlet moisture prediction;
Step S6.5: Apply inverse normalization to the normalized outlet moisture prediction to obtain the outlet moisture prediction sequence in physical units;
Step S6.6: Extract the value at the corresponding prediction time from the outlet moisture prediction sequence in physical units to obtain the predicted outlet moisture of the tobacco drying process.
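The online loop of steps S6.1 to S6.6 can be sketched as a small class that buffers the latest window, standardizes it with its own statistics, calls the model, and maps the output back to physical units. This is a sketch under stated assumptions: the model is any callable returning a normalized value, and the inverse normalization uses the mean and standard deviation of the most recent measured (lagging) moisture readings, a choice the text does not specify. The class and method names are hypothetical.

```python
import math
from collections import deque

class OnlinePredictor:
    def __init__(self, model, window, eps=1e-8):
        self.model = model                    # callable: normalized window -> normalized moisture
        self.window = window
        self.eps = eps
        self.rows = deque(maxlen=window)      # latest process-variable rows
        self.moisture = deque(maxlen=window)  # latest measured (lagging) moisture values

    def push(self, row, measured_moisture=None):
        """Append one timestamp of process data and, if available, a measurement."""
        self.rows.append(list(row))
        if measured_moisture is not None:
            self.moisture.append(measured_moisture)

    @staticmethod
    def _stats(values):
        mean = sum(values) / len(values)
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        return mean, std

    def predict(self):
        """Return the predicted outlet moisture in physical units, or None if not ready."""
        if len(self.rows) < self.window or not self.moisture:
            return None
        # Per-window statistics for each process variable (column).
        stats = [self._stats(col) for col in zip(*self.rows)]
        x_norm = [[(v - m) / (s + self.eps) for v, (m, s) in zip(row, stats)]
                  for row in self.rows]
        y_norm = self.model(x_norm)
        # Inverse normalization with statistics of recent measurements (assumed).
        y_mean, y_std = self._stats(self.moisture)
        return y_norm * (y_std + self.eps) + y_mean

# Placeholder model that always predicts the window mean (normalized value 0).
predictor = OnlinePredictor(model=lambda x: 0.0, window=3)
for row, m in [([100.0, 5.0], 12.0), ([101.0, 5.5], 12.5), ([102.0, 6.0], 13.0)]:
    predictor.push(row, m)
print(predictor.predict())
```

Using the same windowing and standardization online as during training is the point of steps S6.2 and S6.3; a mismatch between the two would shift the input distribution the model sees.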
[0015] Beneficial effects of steps S6.1 to S6.6: Step S6.1 provides a data source synchronized in real time with the actual operating conditions, keeping the input data consistent with the production process and providing a timely, valid data foundation for subsequent time-series slicing. Step S6.2 adapts the real-time data to the model's input requirements, keeping online input samples structurally consistent with offline training samples, eliminating inference bias caused by mismatched sequence lengths, and providing structured real-time samples for subsequent standardization. Step S6.3 keeps the preprocessing of online data consistent with that of offline training data, removes differences in numerical distribution caused by operating-condition fluctuations, avoids the loss of prediction performance that distribution shift would cause, and provides standardized, consistent input samples for model inference. Step S6.4 performs efficient inference on real-time data and outputs predictions consistent with the offline training logic, ensuring the stability and real-time performance of online prediction and providing the raw predictions for subsequent restoration to physical units. Step S6.5 restores the predictions to the physical units required on site, ensuring that the results are interpretable and usable in the process and providing a sequence in consistent units from which the final prediction is extracted. Step S6.6 outputs the prediction for the corresponding prediction time, compensating for the lag of direct on-site measurement, providing an advance reference for feedforward control of the tobacco drying process, and meeting the real-time prediction needs of continuous industrial production.
To achieve the above objectives, this application also provides the following technical solution: a KAN-based system for predicting the outlet moisture content of the tobacco drying process. The system applies the aforementioned method for predicting the outlet moisture content of the tobacco drying process and includes:
The tobacco drying data acquisition and screening module, used to collect the full-process time-series process data of tobacco drying and to screen key process variables with the maximum-relevance minimum-redundancy (mRMR) algorithm to obtain the initial input feature sequence and the outlet moisture content target sequence;
The tobacco drying data processing module, used to standardize the initial input feature sequence and the outlet moisture content target sequence and slice them with a sliding time window, based on a sliding-window dynamic normalization strategy, to obtain a standardized time-series feature sample set;
The sample partitioning and model building module, used to partition the standardized time-series feature sample set into a training set and a test set by production batch and to build a KAN prediction network that integrates SiLU basis functions and parameterized B-spline functions;
The KAN prediction network iteration module, used to introduce sparsity regularization constraints and an adaptive grid update strategy into the KAN prediction network and to iteratively optimize the network parameters using the training set to obtain the converged KAN prediction model;
The KAN prediction network verification module, used to verify the generalization performance of the converged KAN prediction model using the test set to obtain the tobacco shred outlet moisture prediction model;
The tobacco drying process outlet moisture prediction module, used to feed the drying process time-series data collected in real time into the tobacco shred outlet moisture prediction model and perform forward inference to obtain the predicted outlet moisture of the tobacco drying process.
[0016] To achieve the above objectives, this application also provides the following technical solutions: An electronic device includes a processor and a memory coupled to the processor, the memory storing program instructions executable by the processor; when the processor executes the program instructions stored in the memory, it implements the above-described method for predicting the outlet moisture content of tobacco drying process based on a KAN network.
[0017] To achieve the above objectives, this application also provides the following technical solution: a computer-readable storage medium storing program instructions which, when executed by a processor, implement the above-described KAN-based method for predicting the outlet moisture content of tobacco shreds during the drying process.
Description of the Drawings
[0018] Figure 1 is a schematic flowchart of the steps of an embodiment of the KAN-based method for predicting the outlet moisture content of the tobacco drying process according to this application; Figure 2 is a model index diagram of an embodiment of the KAN-based method for predicting the outlet moisture content of the tobacco drying process according to this application; Figure 3 is a schematic diagram of the functional modules of an embodiment of the KAN-based system for predicting the outlet moisture content of the tobacco drying process according to this application; Figure 4 is a schematic structural diagram of an embodiment of the electronic device of this application; Figure 5 is a schematic structural diagram of an embodiment of the storage medium of this application.
Detailed Description
[0019] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of the embodiments. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of this application.
[0020] The terms "first," "second," and "third" in this application are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Therefore, a feature defined as "first," "second," or "third" may explicitly or implicitly include at least one of that feature. In the description of this application, "multiple" means at least two, such as two, three, etc., unless otherwise explicitly specified. All directional indications (such as up, down, left, right, front, back, etc.) in the embodiments of this application are only used to explain the relative positional relationships and movements between components in a specific orientation (e.g., as shown in the figures). If the specific orientation changes, the directional indications also change accordingly. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but may optionally include steps or units not listed, or may optionally include other steps or units inherent to these processes, methods, products, or devices.
[0021] In this document, the term "embodiment" means that a particular feature, structure, or characteristic described in connection with an embodiment may be included in at least one embodiment of this application. The appearance of this phrase in various places throughout the specification does not necessarily refer to the same embodiment, nor is it a mutually exclusive, independent, or alternative embodiment. It will be explicitly and implicitly understood by those skilled in the art that the embodiments described herein can be combined with other embodiments.
[0022] It should be noted that, due to the limited types and number of symbols or letters that can represent specific meanings, for embodiments with many formulas or codes, there may be situations where symbols or letters cannot meet the usage requirements. Therefore, the interpretation of formula symbols in the steps or sub-steps of the embodiments is only valid for the current step or sub-step.
[0023] If the same symbol has different interpretations in different steps or sub-steps, the interpretation in the current step or sub-step shall prevail; if the same symbol appears in different steps or sub-steps, but no interpretation is given in subsequent steps or sub-steps after its first appearance, the interpretation in the first step or sub-step shall be used.
[0024] As shown in Figure 1, this embodiment provides a method for predicting the outlet moisture content of tobacco shreds during the drying process based on a KAN network. In this embodiment, the method includes the following steps: Step S1: Collect the full-process time-series process data of the tobacco drying process, and use the maximum correlation minimum redundancy algorithm to screen key process variables to obtain the initial input feature sequence and the outlet moisture content target sequence.
[0025] Furthermore, step S1 specifically includes the following steps: Step S1.1: Collect the full-process time sequence data of tobacco drying process to obtain the original process time sequence dataset.
[0026] Preferably, the data is collected from the real-time historical database of the distributed control system (DCS) of the tobacco dryer, covering the entire drying process from preheating and start-up, feeding, and stable production to discharge and shutdown.
[0027] Preferably, the sampling frequency is set to 1Hz (1 sampling point / second), the single batch acquisition time is not less than 4500s, covering the entire production cycle; the timestamp accuracy is at the millisecond level, and all acquired data is bound to the timestamp of a globally unified clock.
[0028] Preferably, the range of collected variables covers 28 process variables in four categories: energy supply, material state, airflow and environment, and process feedback. All variables are continuous numerical variables, and the online detection data of the moisture content of the material at the outlet of the drying machine is collected simultaneously as the raw data for the prediction target.
[0029] Preferably, a sensor range validity threshold can be set to remove invalid data that exceeds the upper and lower limits of the corresponding sensor range. For example, the effective range of cylinder wall temperature is 80℃ to 200℃, and the effective range of material moisture content is 8% to 20%. Sampling points that exceed the range are marked as invalid values, and finally the original process time series dataset is formed.
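The range check described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the variable names, the dictionary `SENSOR_RANGES`, and the helper `mask_out_of_range` are hypothetical, and only the two example ranges given in the text are listed. Out-of-range samples are marked as NaN so that the later imputation step can handle them.

```python
import numpy as np

# Hypothetical range table; only the two examples from the text are included.
SENSOR_RANGES = {
    "cylinder_wall_temp_c": (80.0, 200.0),   # cylinder wall temperature, deg C
    "material_moisture_pct": (8.0, 20.0),    # material moisture content, %
}

def mask_out_of_range(columns, ranges=SENSOR_RANGES):
    """Return a copy of `columns` (name -> 1-D sequence) with samples
    outside the sensor range replaced by NaN (marked invalid)."""
    cleaned = {}
    for name, values in columns.items():
        arr = np.asarray(values, dtype=float).copy()
        if name in ranges:
            low, high = ranges[name]
            arr[(arr < low) | (arr > high)] = np.nan
        cleaned[name] = arr
    return cleaned
```

Variables without an entry in the range table are passed through unchanged.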
[0030] Step S1.2: Perform timing alignment and standardization on the original process timing dataset to obtain a standardized time-series process dataset.
[0031] Preferably, for timing alignment processing, the timing data of all variables can be resampled and aligned based on the 1-second step of the DCS global clock; for variables with misaligned timestamps, linear interpolation is used to complete the resampling.
[0032] Preferably, the standardization process in this step mainly addresses the issues of missing values and outliers in the original data. Specifically, missing value imputation can be achieved by setting a missing value handling threshold. For sequence segments with ≤5 consecutive missing points for a single variable, forward imputation is used; for sequence segments with >5 consecutive missing points, the mean of the same batch during stable production is used. Outlier removal can be accomplished using the 3σ criterion. For a single-variable time series, the mean μ and standard deviation σ of the entire series are calculated. Sampling points exceeding the range [μ-3σ, μ+3σ] are marked as outliers and replaced with the median of the five valid sampling points before and after that point.
[0033] For example, this step can be implemented using the following pseudocode:

    import numpy as np
    import pandas as pd

    # Timing alignment and resampling (raw_data must have a DatetimeIndex)
    def time_align_process(raw_data, resample_freq='1s'):
        # Resample onto the global 1-second grid
        aligned_data = raw_data.resample(resample_freq).mean()
        # Linear interpolation to fill short gaps left by misaligned timestamps
        aligned_data = aligned_data.interpolate(method='linear', limit=5)
        return aligned_data

    # Outlier and missing-value handling
    def data_clean_process(aligned_data):
        cleaned = aligned_data.copy()
        for col in cleaned.columns:
            # 3σ criterion: mark points outside [μ-3σ, μ+3σ] as missing
            mu = cleaned[col].mean()
            sigma = cleaned[col].std()
            outliers = (cleaned[col] > mu + 3 * sigma) | (cleaned[col] < mu - 3 * sigma)
            cleaned[col] = cleaned[col].mask(outliers)
            # Forward-fill up to 5 consecutive missing points;
            # any remaining gaps fall back to the column mean
            cleaned[col] = cleaned[col].ffill(limit=5)
            cleaned[col] = cleaned[col].fillna(cleaned[col].mean())
        return cleaned

Step S1.3: Extract the time series data of process variables and the time series data of outlet moisture content from the standardized time series process dataset, and define them as the feature candidate variable set and the target variable sequence, respectively.
[0034] Preferably, for the feature candidate variable set, the full time-series data of the 28 process variables other than the outlet moisture content can be extracted from the standardized time-series process dataset to construct the candidate variable set $X \in \mathbb{R}^{N \times D}$, where N is the total length of the time-series sample, D = 28 is the dimension of the candidate features, and each row corresponds to all candidate feature values at one time step.
[0035] Preferably, for the target variable sequence, the full time-series data of the outlet material moisture content can be extracted from the standardized time-series process dataset to construct the target variable sequence $Y \in \mathbb{R}^{N \times 1}$. Each element corresponds to the measured outlet moisture content at one time step and serves as the target variable for subsequent prediction tasks.
[0036] Step S1.4: The maximum correlation and minimum redundancy algorithm is used to calculate the correlation and redundancy of the feature candidate variable set and the target variable sequence, and the maximum correlation and minimum redundancy score results of each candidate variable are obtained.
[0037] Preferably, this step uses the maximum correlation minimum redundancy (mRMR) algorithm to complete the quantitative calculation of the correlation between candidate features and target variables and the redundancy between features.
[0038] The core principle of the mRMR algorithm is to maximize the mutual information between candidate features and the target variable (maximum correlation term) while minimizing the average mutual information between candidate features and selected features (minimum redundancy term), thereby achieving quantitative evaluation of features. Its core scoring formula is as follows:
$$\mathrm{mRMR}(x_i) = I(x_i; Y) - \frac{1}{|S|} \sum_{x_j \in S} I(x_i; x_j)$$
[0039] where $x_i$ is the candidate feature variable to be evaluated; $Y$ is the target variable sequence; $I(\cdot\,;\cdot)$ is the mutual information between two continuous variables, calculated by kernel density estimation; $S$ is the set of selected feature variables; and $|S|$ is the number of selected features.
[0040] Preferably, the calculation process is as follows: first, calculate the mutual information between all candidate features and the target variable Y to obtain the maximum correlation term for each feature; then, iteratively calculate the average mutual information between each candidate feature and the selected feature set to obtain the minimum redundancy term; finally, calculate the mRMR score result for each candidate variable.
[0041] For example, this step can be implemented using the following pseudocode:

    import numpy as np
    from sklearn.feature_selection import mutual_info_regression

    def mrmr_score_calc(X, y, selected_features):
        # Maximum correlation term: mutual information between each
        # candidate feature and the target variable
        relevance = mutual_info_regression(X, y, discrete_features=False)
        # Minimum redundancy term: average mutual information between each
        # candidate feature and the already selected features
        redundancy = np.zeros(X.shape[1])
        if len(selected_features) > 0:
            for i in range(X.shape[1]):
                redundancy[i] = np.mean(
                    mutual_info_regression(X[:, selected_features], X[:, i],
                                           discrete_features=False))
        # mRMR score = relevance - redundancy
        mrmr_score = relevance - redundancy
        return mrmr_score

Step S1.5: Sort and filter the set of feature candidate variables according to the maximum correlation and minimum redundancy score of each candidate variable to obtain the set of key process variables related to the outlet moisture content.
[0042] Preferably, this step completes feature sorting and filtering based on mRMR scoring results.
[0043] For feature ranking, all 28 candidate feature variables are sorted in descending order of mRMR score from high to low. The higher the score, the stronger the correlation between the feature and the outlet moisture content and the lower the redundancy.
[0044] In terms of setting the screening threshold, a dual screening threshold is set: first, the features with the highest cumulative scores are selected; second, low-relevance features with mRMR scores less than 0.05 are removed. Finally, 13 key process variables are selected to form a set of key process variables.
[0045] Among them, the key variable set details are divided into four categories: energy supply (drying machine cylinder wall temperature, cylinder wall steam pressure, hot air initial temperature, working steam pressure, HT steam flow rate, condensate temperature), material state (inlet material cumulative flow rate, inlet material moisture content, drying machine quantitative feeding), airflow and environment (hot air volume, exhaust air velocity, exhaust pressure change), and process feedback (outlet material temperature).
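The ranking and screening in step S1.5 can be illustrated with a greedy selection loop. The sketch below is an assumption about how the dual threshold might be applied, not the application's own procedure: the function `greedy_mrmr_select` is hypothetical, it takes a precomputed relevance vector and pairwise mutual-information matrix as inputs, and it reuses the 0.05 score threshold and the count of 13 features from the text as stop conditions.

```python
import numpy as np

def greedy_mrmr_select(relevance, pairwise_mi, min_score=0.05, max_features=13):
    """Greedy mRMR: repeatedly add the candidate with the highest
    relevance-minus-mean-redundancy score until a stop condition is met.

    relevance   : (D,) mutual information between each feature and the target
    pairwise_mi : (D, D) mutual information between feature pairs
    """
    relevance = np.asarray(relevance, dtype=float)
    pairwise_mi = np.asarray(pairwise_mi, dtype=float)
    selected, scores = [], []
    remaining = list(range(len(relevance)))
    while remaining and len(selected) < max_features:
        best_idx, best_score = None, -np.inf
        for i in remaining:
            # Mean redundancy with the features selected so far (0 for the first pick)
            redundancy = pairwise_mi[i, selected].mean() if selected else 0.0
            score = relevance[i] - redundancy
            if score > best_score:
                best_idx, best_score = i, score
        if best_score < min_score:
            break  # every remaining candidate falls below the score threshold
        selected.append(best_idx)
        scores.append(best_score)
        remaining.remove(best_idx)
    return selected, scores
```

The returned indices are in selection order, so they can be used directly as the column indices of the key process variable set.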
[0046] Step S1.6: Extract time series data corresponding to the key process variable set from the standardized time series process dataset to obtain the initial input feature sequence.
[0047] Preferably, the full time-series data of the corresponding variables can be extracted from the standardized time-series process dataset based on the column indices of each variable in the key process variable set, and the initial input feature sequence $X_{in} \in \mathbb{R}^{N \times d}$ can be constructed, where N is the total length of the time series sample, d = 13 is the key feature dimension after screening, each row corresponds to the preprocessed values of the 13 key process variables at one time step, and the timestamps of the sequence correspond one-to-one with the timestamps of the target variable sequence.
[0048] Step S1.7: Perform time-series dimension matching between the initial input feature sequence and the target variable sequence to obtain the initial input feature sequence and the target sequence of outlet moisture content.
[0049] Preferably, this step addresses the large time delay of the tobacco drying process by matching the input features and the target variable in time.
[0050] Regarding the lag time setting, based on the material transmission mechanism of the drying process, the material transmission lag time from the inlet to the outlet of the drying machine is set to τ=12s, meaning that changes in the inlet process parameters will be reflected in the change of the outlet moisture content after 12s.
[0051] Specifically, for the time-series matching rule, the feature vector of the initial input feature sequence at time t is matched with the outlet moisture content value of the target variable sequence at time t+τ to construct a one-to-one time-series sample pair.
[0052] For sequence regularization, invalid segments that cannot be matched at the beginning and end of the sequence are removed, and the initial input feature sequence and the target moisture content sequence at the outlet are finally obtained with complete temporal dimension matching and one-to-one correspondence between sample pairs.
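The lag matching rule above can be sketched in a few lines. This is a minimal illustration under the assumption that both sequences are already on the same 1 Hz grid, so the lag τ = 12 s corresponds to 12 samples; the function name `match_with_lag` is hypothetical.

```python
import numpy as np

def match_with_lag(features, target, lag_steps=12):
    """Pair the feature vector at time t with the target at time t + lag_steps.

    The last `lag_steps` feature rows and the first `lag_steps` target
    values have no partner and are dropped.
    """
    features = np.asarray(features)
    target = np.asarray(target)
    if lag_steps < 0 or lag_steps >= len(target):
        raise ValueError("lag_steps must be in [0, len(target))")
    n_pairs = len(target) - lag_steps
    return features[:n_pairs], target[lag_steps:]
```

After the call, row `j` of the returned features and element `j` of the returned target form one time-series sample pair.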
[0053] Beneficial effects of steps S1.1 to S1.7: Step S1.1 collects complete time-series process data for the entire tobacco drying process, providing a basic data source that matches the production process for subsequent feature selection and data processing. Step S1.2 performs time-series alignment and standardization on the raw data to eliminate time-series misalignments and dimensional differences, improving overall data quality and providing a standardized data foundation for subsequent variable extraction and quantitative calculation. Step S1.3 separates and extracts the process variables and the target variable (outlet moisture content) from the standardized data, clarifying the candidate range and prediction target for feature selection and establishing a clear variable correspondence for subsequent correlation calculations. Step S1.4 quantifies the correlation and redundancy between candidate variables and the target variable using the maximum correlation minimum redundancy algorithm, giving feature selection objective, quantitative evaluation criteria and avoiding the subjective bias of manual selection. Step S1.5 uses the quantitative scoring results to sort and select key process variables, eliminating redundant and irrelevant variables and retaining core features strongly correlated with outlet moisture content, thereby reducing the computational complexity and noise interference of subsequent modeling. Step S1.6 accurately extracts the time series data of the key process variables from the standardized time series data to form an initial input feature sequence consistent with the selection results. Step S1.7 matches the initial input feature sequence and the target variable sequence in the time dimension, eliminating temporal misalignment and ensuring accurate correspondence between input features and prediction targets over time, which provides a temporally matched sample basis for subsequent data processing and model construction.
[0054] Step S2: Based on the dynamic normalization strategy of sliding window, the initial input feature sequence and the target moisture content sequence at the outlet are standardized and sliced by sliding time window to obtain a standardized time series feature sample set.
[0055] Furthermore, step S2 specifically includes the following steps: Step S2.1: Perform time-series dimension alignment and timestamp matching on the initial input feature sequence and the target moisture content sequence at the outlet.
[0056] Preferably, this step uses the initial input feature sequence and the outlet moisture content target sequence output in step S1 as the processing objects to complete the precise alignment of the time dimension of the two sequences, eliminate the time misalignment problem caused by sensor acquisition delay and data transmission deviation, and provide a basic sequence with strict time matching for subsequent sliding window slicing.
[0057] The timestamp of the outlet moisture content target sequence is taken as the sole alignment reference. Matching the 1 Hz sampling frequency of the drying process, the timestamp matching tolerance is set to ±50 ms; sampling points outside this tolerance are treated as temporally misaligned points.
[0058] For feature sampling points with temporal misalignment, piecewise linear interpolation is used to resample and correct them in the time dimension. The interpolation formula is as follows:
$$\hat{x}_i(t) = x_i(t_k) + \frac{t - t_k}{t_{k+1} - t_k}\,\bigl(x_i(t_{k+1}) - x_i(t_k)\bigr)$$
[0059] where $t$ is the target reference timestamp; $t_k$ and $t_{k+1}$ are the valid sampling timestamps immediately before and after the misaligned point; $x_i$ is the sampled value of the i-th feature; and $\hat{x}_i(t)$ is the feature value after interpolation correction.
[0060] Preferably, after completing the interpolation correction, the time step lengths of the initial input feature sequence and the target moisture content sequence at the outlet are verified. For sequences with inconsistent lengths, the intersection of their timestamps is taken as the effective interval, and redundant sequence segments with no matching at the beginning and end are trimmed to ensure that the total number of time steps of the two sequences are completely consistent.
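The tolerance check and interpolation correction of step S2.1 can be sketched as follows. This is an illustrative sketch under stated assumptions: timestamps are floats in seconds and sorted in ascending order, the feature series has at least two samples, and the helper name `align_to_reference` is hypothetical. NumPy's `np.interp` performs the piecewise linear interpolation.

```python
import numpy as np

def align_to_reference(ref_ts, feat_ts, feat_values, tol=0.050):
    """Align one feature series to the reference (target) timestamps.

    A reference point keeps the nearest feature sample when that sample
    lies within `tol` seconds; otherwise it is flagged as misaligned and
    filled by linear interpolation between neighbouring samples.
    """
    ref_ts = np.asarray(ref_ts, dtype=float)
    feat_ts = np.asarray(feat_ts, dtype=float)
    feat_values = np.asarray(feat_values, dtype=float)
    # Indices of the feature samples just before and after each reference time
    right = np.clip(np.searchsorted(feat_ts, ref_ts), 1, len(feat_ts) - 1)
    left = right - 1
    nearest = np.where(ref_ts - feat_ts[left] <= feat_ts[right] - ref_ts, left, right)
    misaligned = np.abs(feat_ts[nearest] - ref_ts) > tol
    aligned = feat_values[nearest].copy()
    aligned[misaligned] = np.interp(ref_ts[misaligned], feat_ts, feat_values)
    return aligned, misaligned
```

Note that `np.interp` holds the end values constant outside the sampled range, so unmatched head and tail segments should still be trimmed as the text describes.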
[0061] Step S2.2: Perform sliding time window slicing on the time-aligned initial input feature sequence and the outlet moisture content target sequence to obtain a time-series feature slice sequence with a fixed window length and a corresponding label sequence.
[0062] Preferably, this step takes the time-aligned initial input feature sequence and the target moisture content sequence at the outlet output from step S2.1 as the processing objects. In view of the large time delay dynamic characteristics of the drying process, the continuous time series data is converted into structured samples that meet the input requirements of the KAN network through a sliding time window, so as to realize the correspondence and matching between historical process information and future outlet moisture content.
[0063] For the window parameter settings, based on the material transfer lag time and the process dynamics of the tobacco drying process, the core window parameters are set as follows: historical window length seq_len = 12 (12 seconds of historical process data, covering the entire material transport delay cycle), prediction step pred_step = 1 (predict the outlet moisture content 1 second ahead), and stride = 1 (the window slides one step at a time to preserve the temporal continuity of the samples).
[0064] The sliding slice is defined mathematically as follows. For a time series of total length N, the feature slice and corresponding label of the k-th sample generated by the sliding window are:
$$X_k = [x_k, x_{k+1}, \ldots, x_{k+seq\_len-1}] \in \mathbb{R}^{seq\_len \times d}$$
[0065]
$$y_k = y_{k+seq\_len-1+pred\_step} \in \mathbb{R}^{1}$$
[0066] where d = 13 is the input feature dimension, $x_k$ is the feature vector at time step k, $y_k$ is the outlet moisture content label value for the corresponding time step, and k ranges from 1 to N - seq_len - pred_step + 1.
[0067] Preferably, according to the above sliding rule, the time-aligned feature sequence and target sequence are sliced window by window to generate a time-series feature slice sequence and corresponding label sequence with a fixed window length, ensuring that the time correspondence between each feature slice and label strictly conforms to the lag characteristics of the tobacco drying process.
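The sliding rule can be sketched as follows, using 0-based indices instead of the 1-based k in the definition above. The function name `make_windows` is hypothetical; the default parameters are the values stated in the text (seq_len = 12, pred_step = 1, stride = 1).

```python
import numpy as np

def make_windows(features, target, seq_len=12, pred_step=1, stride=1):
    """Slice (N, d) features and (N,) targets into samples.

    Each X[k] has shape (seq_len, d); its label is the target value
    pred_step steps after the last row of the window.
    """
    features = np.asarray(features)
    target = np.asarray(target)
    n_samples = len(features) - seq_len - pred_step + 1
    if n_samples <= 0:
        raise ValueError("sequence too short for the requested window")
    starts = np.arange(0, n_samples, stride)
    X = np.stack([features[s:s + seq_len] for s in starts])
    y = target[starts + seq_len - 1 + pred_step].reshape(-1, 1)
    return X, y
```

With N time steps and a stride of 1 this yields N - seq_len - pred_step + 1 samples, matching the range of k given in the definition.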
[0068] Step S2.3: Perform window-by-window statistical calculations on the temporal feature slice sequence and the corresponding label sequence to obtain the mean vector and standard deviation vector corresponding to each sliding window.
[0069] Preferably, this step takes the time-series feature slice sequence with a fixed window length and the corresponding label sequence output in step S2.2 as the processing object, and provides window-specific statistical characteristic parameters for subsequent dynamic normalization, completely abandoning the dependence on global statistics and fundamentally adapting to the non-stationary characteristics of industrial data.
[0070] Specifically, for the statistical calculation rules, for each independent time-series feature slice window, the statistical measure is calculated separately according to the feature dimension, ensuring that the standardization of each window depends only on the data distribution within its own window and is not related to other windows or the global data distribution.
[0071] For the k-th feature slice window $X_k \in \mathbb{R}^{seq\_len \times d}$, the mean $\mu_{k,i}$ and standard deviation $\sigma_{k,i}$ of its i-th feature dimension are calculated as follows, where $x_{t,i}$ denotes the i-th component of the feature vector $x_t$:
$$\mu_{k,i} = \frac{1}{seq\_len} \sum_{t=k}^{k+seq\_len-1} x_{t,i}, \qquad \sigma_{k,i} = \sqrt{\frac{1}{seq\_len} \sum_{t=k}^{k+seq\_len-1} \left(x_{t,i} - \mu_{k,i}\right)^2} + \theta$$
Here $\theta = 10^{-8}$ is an extremely small constant that prevents division by zero, in particular when the standard deviation is 0 because the values in the feature window do not fluctuate.
[0072] Preferably, according to the above formula, statistical calculations are performed on all feature slice windows window by window and dimension by dimension to generate the mean vector $\mu_k \in \mathbb{R}^{d}$ and standard deviation vector $\sigma_k \in \mathbb{R}^{d}$ corresponding to each sliding window. Ultimately, a mean vector sequence and a standard deviation vector sequence are formed that correspond one-to-one with the feature slice sequence.
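The window-by-window statistics can be computed in vectorized form. The sketch below assumes the population standard deviation (NumPy's default) and adds the small constant after the square root; both choices are assumptions, and the function name `window_statistics` is hypothetical.

```python
import numpy as np

def window_statistics(feature_slices, eps=1e-8):
    """Per-window, per-feature statistics.

    feature_slices : (M, seq_len, d) array of feature windows
    returns        : mean vectors (M, d) and standard deviation vectors (M, d)
    """
    feature_slices = np.asarray(feature_slices, dtype=float)
    mean_vectors = feature_slices.mean(axis=1)
    # eps keeps the later division finite when a window is constant
    std_vectors = feature_slices.std(axis=1) + eps
    return mean_vectors, std_vectors
```

Each window's statistics depend only on its own rows, so no global statistics enter the later normalization.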
[0073] Step S2.4: Based on the dynamic normalization strategy of sliding window, perform window-by-window data standardization on the mean vector and standard deviation vector corresponding to each sliding window, as well as the time-series feature slice sequence and corresponding label sequence with fixed window length, to obtain the standardized time-series feature slice sequence and standardized label sequence.
[0074] Preferably, this step takes the mean vector and standard deviation vector corresponding to each sliding window output in step S2.3, as well as the time-series feature slice sequence and corresponding label sequence with fixed window length output in step S2.2, as the processing objects to complete the window-by-window standardization process, eliminate the difference in the units of different features, and adapt to the data distribution shift caused by the fluctuation of the drying process, thereby improving the robustness of the model to cross-batch and variable working condition scenarios.
[0075] Dynamic normalization of the feature sequence uses the conventional standardization formula: the window mean is subtracted from each value and the result is divided by the window standard deviation. This maps the feature data in each window to a distribution with a mean of 0 and a variance of 1, preserving the relative fluctuation trend within the window rather than the absolute values.
[0076] Specifically, for the label sequence standardization process, the same window-by-window standardization rule as the feature sequence is adopted for the label sequence. Each label value is standardized using the label statistics of the corresponding feature window to ensure the logical consistency between label standardization and feature standardization.
[0077] Specifically, for the domain mapping, in order to adapt to the [-1,1] domain requirement of the B-spline basis function in the subsequent KAN network, a linear mapping is performed on the standardized features and label data, constraining the data range to the [-1,1] interval, thereby further improving the fitting accuracy of the B-spline function.
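One possible form of the linear mapping to [-1, 1] is a per-window min-max rescaling. The text does not specify which linear map is used, so the sketch below is an assumption; the function name `linear_map_to_unit_interval` and the small constant `eps` are hypothetical.

```python
import numpy as np

def linear_map_to_unit_interval(window, eps=1e-8):
    """Linearly rescale each feature column of a (seq_len, d) window to [-1, 1].

    The column minimum maps to -1 and the column maximum to (almost) +1;
    eps avoids division by zero for constant columns, which map to -1.
    """
    window = np.asarray(window, dtype=float)
    lo = window.min(axis=0)
    hi = window.max(axis=0)
    return 2.0 * (window - lo) / (hi - lo + eps) - 1.0
```

Unlike clipping, this map keeps the ordering and relative spacing of all values within the window.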
[0078] For example, this step can be implemented using the following pseudocode:

    import numpy as np

    def window_dynamic_normalize(feature_slices, label_sequence, mean_vectors, std_vectors):
        # Window-by-window standardization of the feature slices:
        # (M, seq_len, d) slices with per-window (M, d) statistics
        normalized_features = (feature_slices - mean_vectors[:, None, :]) / std_vectors[:, None, :]
        # Label standardization (this pseudocode uses the statistics
        # of the whole label sequence)
        label_mean = np.mean(label_sequence)
        label_std = np.std(label_sequence) + 1e-8
        normalized_labels = (label_sequence - label_mean) / label_std
        # Constrain to the interval [-1, 1] (done here by clipping)
        normalized_features = np.clip(normalized_features, -1, 1)
        normalized_labels = np.clip(normalized_labels, -1, 1)
        return normalized_features, normalized_labels

Step S2.5: Perform sample dimension normalization on the standardized temporal feature slice sequence and the standardized label sequence to obtain a standardized temporal feature sample set.
[0079] Preferably, this step uses the standardized temporal feature slice sequence and standardized label sequence output in step S2.4 as the processing objects to complete the dimension regularization, validity verification and format unification of the sample set, construct a standardized temporal feature sample set that meets the requirements of subsequent batch partitioning and KAN network training, and eliminate invalid samples that interfere with model training.
[0080] For dimension consistency regularization, the dimensions of all feature slices can be verified to ensure that the feature matrix of each sample has exactly the shape [seq_len, d] = [12, 13] and each label has exactly the shape [1]; invalid samples with mismatched dimensions or incomplete data are removed.
[0081] For the removal of invalid samples, a sample validity threshold can be set: samples whose feature variance within the window is less than 1e-6 (no fluctuation or all-zero values) are judged invalid and removed, to avoid the risk of model overfitting caused by samples that carry no information.
[0082] Preferably, the verified feature slice sequence and label sequence can be converted into a tensor format adapted to the deep learning framework, with a unified data type of float32, while saving the mean and standard deviation of each sample to provide a basis for inverse normalization of subsequent online prediction.
[0083] Preferably, the final standardized time-series feature sample set contains three core parts: the feature tensor $X_{dataset} \in \mathbb{R}^{M \times 12 \times 13}$, the label tensor $Y_{dataset} \in \mathbb{R}^{M \times 1}$, and the window statistics set $S_{dataset} \in \mathbb{R}^{M \times 2 \times 13}$, where M is the total number of valid samples.
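The validity filtering and packing of step S2.5 can be sketched as follows. The text does not say whether the variance threshold applies to any feature or to all features of a window; this sketch assumes a window is invalid only when every feature's variance is below the threshold. The function name `build_sample_set` is hypothetical, and NumPy arrays stand in for framework tensors.

```python
import numpy as np

def build_sample_set(norm_features, norm_labels, mean_vectors, std_vectors,
                     var_threshold=1e-6):
    """Drop uninformative or incomplete windows and pack the rest as float32.

    norm_features : (M0, seq_len, d) standardized windows
    norm_labels   : (M0,) or (M0, 1) standardized labels
    mean_vectors, std_vectors : (M0, d) per-window statistics of the raw data
    """
    norm_features = np.asarray(norm_features)
    mean_vectors = np.asarray(mean_vectors)
    std_vectors = np.asarray(std_vectors)
    # Assumption: a window is flat when all feature variances are below the threshold
    flat = (std_vectors ** 2 < var_threshold).all(axis=1)
    # Incomplete windows contain NaN or infinite values
    complete = np.isfinite(norm_features).all(axis=(1, 2))
    keep = ~flat & complete
    X = norm_features[keep].astype(np.float32)                             # (M, seq_len, d)
    Y = np.asarray(norm_labels)[keep].reshape(-1, 1).astype(np.float32)    # (M, 1)
    S = np.stack([mean_vectors[keep], std_vectors[keep]], axis=1).astype(np.float32)  # (M, 2, d)
    return X, Y, S
```

The statistics array keeps the mean in slot 0 and the standard deviation in slot 1 of its second axis, which is one way to realize the M×2×13 layout described above.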
[0084] Beneficial effects of steps S2.1 to S2.5: Step S2.1 achieves precise temporal correspondence between the input feature sequence and the target sequence, eliminating temporal misalignment and providing an effective data foundation for subsequent window slicing processing. Step S2.2 adapts to the large time delay dynamics of the drying process by using sliding time window slicing, transforming continuous time-series data into structured samples that meet model input requirements, providing a unified sample unit for subsequent data standardization. Step S2.3 accurately quantifies the statistical characteristics of data within each sliding window, providing a calculation basis for dynamic normalization to adapt to the data distribution within the window, avoiding the problem of insufficient adaptability to operating conditions caused by global statistics. Step S2.4 completes window-by-window data standardization based on the dynamic normalization strategy of sliding windows, alleviating the model performance degradation problem caused by the non-stationarity of industrial data, enhancing the model's robustness to fluctuations in different batches of raw materials and operating conditions, and providing standardized data for subsequent sample regularization. Step S2.5 completes the dimensional regularization and matching verification of standardized samples, ensuring the consistency of sample set dimensions and data integrity, providing a standardized sample foundation with structural regularity and temporal matching for subsequent model training and testing.
[0085] Step S3: Divide the standardized time-series feature sample set into training set and test set according to the production batch, and construct a KAN prediction network that integrates SiLU basis function and parameterized B spline function.
[0086] Furthermore, step S3 specifically includes the following steps: Step S3.1: Extract the production batch identifier and time series data corresponding to the standardized time series feature sample set.
[0087] Preferably, this step uses the standardized time-series feature sample set output by S2 as the processing object to complete the association and binding of production batch identifiers with corresponding time-series samples, providing a clear batch attribution basis for subsequent time-series sorting and leak-free dataset partitioning.
[0088] For the traceability and binding of batch identifiers, the production batch metadata recorded by the DCS system of the tobacco drying machine can be used as the basis. Each production batch corresponds to a unique batch ID, which includes core identifiers such as batch start time, stop time, cigarette brand, and tobacco batch number. Through timestamp matching, each sample in the standardized time-series feature sample set is bound to its unique batch ID, thus clarifying the production batch affiliation of the sample.
[0089] For batch data validity verification, a batch validity threshold can be set. A batch with ≥3000 valid samples during the stable production stage is considered a complete and valid batch. Non-steady-state samples during the batch start-up preheating and shutdown closing stages are removed, and only valid samples during the stable operation stage of the drying machine are retained to avoid interference from non-steady-state data on model building.
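The batch validity rule can be sketched as a simple count per batch. How a sample is flagged as belonging to the stable production stage is not specified in the text, so the boolean array `is_steady` is a hypothetical input; the function name `valid_batches` is likewise an assumption.

```python
import numpy as np

def valid_batches(batch_ids, is_steady, min_samples=3000):
    """Return the IDs of batches with at least `min_samples` samples
    in the stable production stage."""
    batch_ids = np.asarray(batch_ids)
    is_steady = np.asarray(is_steady, dtype=bool)
    # Count only steady-state samples per batch
    ids, counts = np.unique(batch_ids[is_steady], return_counts=True)
    return set(ids[counts >= min_samples].tolist())
```

Samples from batches outside the returned set, and non-steady samples inside valid batches, would then be dropped before modeling.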
[0090] Specifically, for the structured output of associated data, the batch ID, batch time interval, and valid sample set within the batch can be structured and integrated to output a standardized time-series feature sample set with batch labels, ensuring that each sample can be traced back to its production batch and production time node.
[0091] For example, this step can be implemented using the following pseudocode:

    import numpy as np
    import pandas as pd

    def batch_label_bind(standardized_dataset, batch_meta_data):
        # standardized_dataset: dict of equal-length NumPy arrays, including "timestamps"
        # batch_meta_data: DataFrame with batch_id, start_time and end_time columns
        sample_timestamps = standardized_dataset["timestamps"]
        batch_labels = np.full(len(sample_timestamps), None, dtype=object)
        # Match the batch ID to each sample by timestamp
        for _, batch in batch_meta_data.iterrows():
            batch_mask = ((sample_timestamps >= batch["start_time"])
                          & (sample_timestamps <= batch["end_time"]))
            batch_labels[batch_mask] = batch["batch_id"]
        # Keep only the samples that fall inside a known batch
        valid_batch_mask = np.array([label is not None for label in batch_labels])
        labeled_dataset = {k: v[valid_batch_mask] for k, v in standardized_dataset.items()}
        # Bind batch labels to the dataset
        labeled_dataset["batch_id"] = batch_labels[valid_batch_mask]
        return labeled_dataset

Step S3.2: Sort all time series data in the standardized time series feature sample set according to the time sequence of the production batch identifiers to obtain the time series sorted feature sample set.
[0092] Preferably, this step uses the standardized time-series feature sample set with batch labels output in step S3.1 as the processing object, strictly follows the time-series logic of industrial production to complete the sample sorting, avoids leakage of time-series information, and ensures that the dataset partitioning conforms to the real application logic of industrial site.
[0093] The sorting benchmark is the actual production time of each production batch, used as the only global sorting key. Sorting strictly follows the rule that batches put into production earlier come first and batches put into production later come last. Random shuffling is not used at all, which matches the core industrial requirement of "modeling on historical data and applying to future data".
[0094] Within a single batch, the production time order of the samples is strictly preserved during sorting, so that samples within the batch are arranged in ascending order of time and the dynamic time-sequence dependency of continuous production in the tobacco drying process is not disrupted.
[0095] Preferably, after sorting, the monotonicity of the batch production time is verified to ensure that the time sequence of the sorted batches is strictly increasing and without overlap; at the same time, the continuity of the timestamps of the samples within the batch is verified to avoid time sequence disorder caused by sorting.
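The sorting and verification described above can be sketched as follows. This is a minimal illustration, assuming each sample carries the start time of its batch and its own acquisition timestamp as plain numbers; the function names `sort_by_batch_time` and `check_time_order` are hypothetical.

```python
import numpy as np


def sort_by_batch_time(batch_start, timestamps):
    """Return indices ordering samples by batch start time, then sample time."""
    # np.lexsort uses the LAST key as the primary key, so batch_start comes last.
    return np.lexsort((np.asarray(timestamps), np.asarray(batch_start)))


def check_time_order(batch_start, timestamps):
    """Check batch order is non-decreasing and sample time strictly increasing.

    A strictly increasing global timestamp sequence also implies that
    consecutive batches do not overlap in time.
    """
    batch_ok = np.all(np.diff(np.asarray(batch_start)) >= 0)
    time_ok = np.all(np.diff(np.asarray(timestamps)) > 0)
    return bool(batch_ok and time_ok)


# Toy example: two batches (start times 100 and 200) stored out of order.
batch_start = np.array([200, 100, 200, 100])
timestamps = np.array([210, 105, 205, 101])
order = sort_by_batch_time(batch_start, timestamps)
is_sorted = check_time_order(batch_start[order], timestamps[order])
```

The same index array `order` would be applied to features, labels, and batch labels so that all arrays stay aligned after sorting.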
[0096] Preferably, the data structure of the sample set, such as features, labels, and batch labels, is reconstructed according to the sorted batch order and the temporal order of samples within the batch, ensuring that the indexes of all data are completely corresponding, and finally outputting the temporally sorted feature sample set.
[0097] Step S3.3: Divide the time-series sorted feature sample set according to the time sequence of the production batch identifiers to obtain the training set and the test set.
[0098] Preferably, this step uses the time-series sorted feature sample set output in step S3.2 as the processing object to complete the dataset partitioning without time-series leakage, ensuring the objectivity and authenticity of the model generalization performance verification, and conforming to the actual scenario of cross-batch application in industrial fields.
[0099] The division rules and parameter settings strictly follow the "train on the past, test on the future" principle of time-series industrial modeling, with the batch-level division ratio set to 8:2. Based on the total number of valid batches, all samples of the first 80% of complete production batches in time order are selected as the training set, and all samples of the remaining 20% of complete production batches are selected as a completely independent test set.
[0100] For example, taking 10 valid complete batches as an example, the first 8 batches of samples form the training set, and the last 2 batches of samples form the test set. The division boundary is strictly based on the batch, and the internal samples of a single batch are not split.
[0101] In particular, to enforce the no-data-leakage constraint, the production time of every sample in the test set is guaranteed to be later than the production time of every sample in the training set. This avoids the "future information leakage" problem caused by random partitioning and time sequence disruption, so that the model performance evaluation reflects the actual cross-batch generalization ability.
[0102] Preferably, after the partitioning is completed, the feature dimensions and label dimensions of the training set and the test set are verified to be completely consistent, wherein the feature dimensions are [number of samples, 12, 13] and the label dimensions are [number of samples, 1]. At the same time, the training set and the test set are verified to have no batch overlap and no sample duplication. Finally, the training set and the test set are output.
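The batch-level 8:2 division can be sketched as follows. This is a minimal illustration, assuming the samples are already sorted in time and carry a batch ID each; the function name `split_by_batch` is hypothetical.

```python
import numpy as np


def split_by_batch(batch_ids_sorted, train_ratio=0.8):
    """Split time-sorted samples into train/test masks at a batch boundary."""
    batch_ids_sorted = np.asarray(batch_ids_sorted)
    # Preserve first-appearance (time) order; np.unique alone would sort the IDs.
    _, first_idx = np.unique(batch_ids_sorted, return_index=True)
    ordered_batches = batch_ids_sorted[np.sort(first_idx)]
    n_train = int(len(ordered_batches) * train_ratio)
    train_mask = np.isin(batch_ids_sorted, ordered_batches[:n_train])
    # Whole batches go to one side, so no batch is ever split.
    return train_mask, ~train_mask


# Toy example: 10 batches with 2 samples each, already in time order.
ids = np.repeat([f"B{i:02d}" for i in range(10)], 2)
train_mask, test_mask = split_by_batch(ids)
no_overlap = set(ids[train_mask]).isdisjoint(ids[test_mask])
```

With 10 batches, the first 8 batches land in the training set and the last 2 in the test set, matching the example in the text.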
[0103] Step S3.4: Determine the network topology based on the Kolmogorov-Arnold representation theorem, and define the basic KAN network topology framework by combining the feature dimensions and output dimensions of the training and test sets.
[0104] Preferably, this step is based on the feature dimensions and output dimensions of the training and test sets output in step S3.3, and combines the Kolmogorov-Arnold representation theorem to define a basic KAN network topology adapted to the task of predicting the outlet moisture content of the tobacco drying process, providing theoretical support and a structural foundation for the subsequent introduction of hybrid basis functions and network construction.
[0105] The Kolmogorov-Arnold representation theorem, which provides the core theoretical support, states that any multivariable continuous function $f:[0,1]^n \to \mathbb{R}$ defined on the unit cube can be expressed as a finite composition and sum of single-variable continuous functions. The core formula is: $f(x_1,\dots,x_n)=\sum_{q=1}^{2n+1}\Phi_q\left(\sum_{p=1}^{n}\phi_{q,p}(x_p)\right)$.
[0106] Here, $\phi_{q,p}$ is an inner univariate continuous function and $\Phi_q$ is an outer univariate continuous function.
[0107] This theorem provides the theoretical basis for KAN networks. Unlike traditional neural networks that set fixed activation functions at neuron nodes, KAN explicitly deploys learnable nonlinear functions on the network connection edges, making it naturally suited to the interpretability modeling needs of industrial processes.
[0108] For the topology parameter settings, the hierarchical structure of the basic KAN network can be set according to the input and output characteristics of the prediction task: the number of nodes in the input layer is determined by the sliding window length and the feature dimension, that is, the input layer dimension = window length × feature dimension = 12 × 13 = 156; two hidden layers are set, and the number of nodes in each hidden layer is set to 64; the output layer is a single node, corresponding to the predicted value of the outlet moisture content, and the basic topology is finally determined to be [156, 64, 64, 1].
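As a quick check of the topology arithmetic above, the following sketch derives the layer widths and counts the connection edges, each of which carries its own learnable function in a KAN. The variable names are illustrative only.

```python
WINDOW_LENGTH = 12   # sliding window length
NUM_FEATURES = 13    # feature dimension per time step

# Layer widths: flattened input, two hidden layers, single output node.
layer_widths = [WINDOW_LENGTH * NUM_FEATURES, 64, 64, 1]

# In a fully connected KAN, every (input node, output node) pair is one edge.
edges_per_layer = [a * b for a, b in zip(layer_widths[:-1], layer_widths[1:])]
total_edges = sum(edges_per_layer)
```

The edge count, rather than the node count, determines how many learnable univariate functions the network holds, which is why the sparsity constraints of step S4 act on edges.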
[0109] Among them, the core rules of the basic framework can define the basic logic of inter-layer propagation. The output of each layer consists of the sum of the activation functions of the corresponding edges of all nodes in the previous layer. The fixed activation functions at the node level of traditional neural networks are not introduced. All nonlinear transformations are implemented by learnable functions on the network connection edges, and the final output is the basic KAN network topology framework.
[0110] Step S3.5: Introduce SiLU basis functions and parameterized B-spline functions based on the Cox-de Boor recursive formula on the network connection edges of the basic KAN network topology framework to obtain the hybrid basis function KAN network layer.
[0111] Preferably, this step is based on the basic KAN network topology framework output in step S3.4. A hybrid basis function that integrates global smoothness characteristics and local fitting ability is introduced on the network connection edges to solve the problem that a single basis function is difficult to simultaneously capture the global trend of the drying process and the local nonlinearity, thus obtaining a hybrid basis function KAN network layer.
[0112] For the core design of the hybrid edge activation function, a hybrid edge activation function that integrates the SiLU global basis function and the parameterized B-spline function can be designed to address the differentiated nonlinear characteristics of different process variables in the tobacco drying process. The edge activation function $B_{j,i}(x)$ connecting the i-th node in the l-th layer to the j-th node in the (l+1)-th layer is defined as: $B_{j,i}(x)=w_{base}\cdot\mathrm{SiLU}(x)+w_{spline}\cdot\text{B-spline}(x)$. Here, $w_{base}$ is the trainable weight of the SiLU basis function, $w_{spline}$ is the trainable scaling factor of the B-spline function, and $x$ is the input signal of the edge.
[0113] In the implementation, the SiLU basis function provides a globally smooth nonlinear transformation, fitting the overall trend between the process variables and the outlet moisture content. Its formula is: $\mathrm{SiLU}(x)=x\cdot\sigma(x)=\frac{x}{1+e^{-x}}$. This function is differentiable everywhere, smooth and non-monotonic, unbounded above and bounded below. It helps compensate for the weak generalization of B-spline functions outside the data domain and improves the robustness of the model to fluctuations in operating conditions.
[0114] For the implementation of the parameterized B-spline functions, third-order B-spline basis functions defined by the Cox-de Boor recursive formula can be used to characterize local nonlinear properties. The recursive formula is as follows: $B_{i,0}(x)=1$ if $t_i\le x<t_{i+1}$ and $0$ otherwise; $B_{i,k}(x)=\frac{x-t_i}{t_{i+k}-t_i}B_{i,k-1}(x)+\frac{t_{i+k+1}-x}{t_{i+k+1}-t_{i+1}}B_{i+1,k-1}(x)$.
[0115] Here, k=3 is the order of the B-spline and $t_i$ are the preset grid nodes. The number of grid nodes is set to 8, and the domain is mapped to the interval [-1,1], which matches the standardized output range of step S2 and supports the fitting accuracy of the B-spline function.
[0116] Preferably, the hybrid basis functions can be optimized for adaptability. The SiLU basis function is responsible for capturing the global influence trend of process variables on outlet moisture, while the B-spline function is responsible for characterizing the local nonlinear sensitivity characteristics within a specific operating range. The combination of the two not only ensures the global generalization ability of the model, but also improves the fitting accuracy of local operating fluctuations in the drying process, thus completing the core structure definition of the hybrid basis function KAN network layer.
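The hybrid edge function can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: "8 grid nodes" is read as 8 uniform grid intervals on [-1,1] extended by k knots on each side (a common convention in KAN implementations, not confirmed by the text), and the names `silu`, `bspline_basis`, and `hybrid_edge` are hypothetical.

```python
import numpy as np


def silu(x):
    """SiLU(x) = x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))


def bspline_basis(x, knots, k):
    """Evaluate all degree-k B-spline basis functions at x (Cox-de Boor).

    Returns an array of shape (len(x), len(knots) - k - 1).
    """
    x = np.asarray(x, dtype=float)[:, None]
    t = np.asarray(knots, dtype=float)
    # Degree 0: indicator of the half-open knot interval [t_i, t_{i+1}).
    basis = ((x >= t[:-1]) & (x < t[1:])).astype(float)
    for d in range(1, k + 1):
        left = (x - t[: -(d + 1)]) / (t[d:-1] - t[: -(d + 1)]) * basis[:, :-1]
        right = (t[d + 1:] - x) / (t[d + 1:] - t[1:-d]) * basis[:, 1:]
        basis = left + right
    return basis


def hybrid_edge(x, w_base, w_spline, coef, knots, k=3):
    """B_{j,i}(x) = w_base * SiLU(x) + w_spline * sum_m coef_m * B_{m,k}(x)."""
    return w_base * silu(x) + w_spline * (bspline_basis(x, knots, k) @ coef)


# Assumed grid: G = 8 intervals on [-1, 1], extended by k knots per side.
G, k = 8, 3
h = 2.0 / G
knots = -1.0 + h * np.arange(-k, G + k + 1)
x = np.linspace(-1.0, 0.99, 50)
basis = bspline_basis(x, knots, k)
y = hybrid_edge(x, 1.0, 1.0, np.zeros(G + k), knots, k)
```

With all spline coefficients at zero the edge reduces to plain SiLU, which shows how the SiLU term carries the global trend while the spline term adds local corrections on top of it.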
[0117] Step S3.6: Stack multiple layers of the hybrid basis function KAN network and define the dimensional parameters of the input layer, hidden layer, and output layer to obtain a KAN prediction network that integrates SiLU basis functions and parameterized B-spline functions.
[0118] This step uses the hybrid basis function KAN network layer output in step S3.5 and the basic topology framework defined in step S3.4 as the basis to complete the multi-layer stacking and dimensional adaptation of the network, and builds a structurally complete KAN prediction network adapted to the tobacco drying prediction task.
[0119] For multi-layer stacking, the hybrid basis function KAN network layers can be stacked in a topology of [156,64,64,1]. The layers are fully connected, that is, each node in the upper layer is connected to each node in the lower layer through the hybrid basis function edge activation function defined in step S3.5, so as to ensure the integrity of the nonlinear expressive power.
[0120] For inter-layer propagation, the forward-propagation computation of the network is defined as follows. For the j-th neuron in the (l+1)-th layer, its input value $x_j^{(l+1)}$ is obtained by summing the edge activation functions of all nodes in the l-th layer: $x_j^{(l+1)}=\sum_{i=1}^{n_{in}}B_{j,i}\left(x_i^{(l)}\right)$. Here, $n_{in}$ is the number of nodes in the l-th layer, $x_i^{(l)}$ is the output value of the i-th node in the l-th layer, and $B_{j,i}(\cdot)$ is the hybrid basis activation function of the corresponding connection edge.
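The propagation rule can be illustrated with a deliberately small sketch in which each edge is an arbitrary Python callable. The function name `kan_layer_forward` and the two example edge functions are hypothetical stand-ins for the hybrid edge functions of step S3.5.

```python
import math


def kan_layer_forward(x, edge_funcs):
    """Compute one KAN layer: each output node sums its incoming edge functions.

    x          : list of n_in input values
    edge_funcs : edge_funcs[j][i] is the univariate function on edge i -> j
    """
    # No node-level activation is applied; all nonlinearity lives on the edges.
    return [sum(phi(x_i) for phi, x_i in zip(row, x)) for row in edge_funcs]


# Toy layer with 2 inputs and 1 output.
edges = [[math.sin, lambda v: v * v]]
out = kan_layer_forward([0.0, 3.0], edges)
```

Here the single output node receives sin(0.0) from the first edge plus 3.0 squared from the second edge.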
[0121] Preferably, the dimensions and functions of each layer are configured as follows: For the input layer, the temporal feature slices [12,13] can be flattened into a 156-dimensional one-dimensional vector to complete the input of the temporal features and adapt to the input format requirements of the KAN network.
[0122] For the hidden layer, there are two stacked KAN network layers with hybrid basis functions, each with a fixed number of 64 nodes. All edge activation functions adopt a unified SiLU+B spline hybrid structure to ensure the consistency of the network's nonlinear expression.
[0123] For the output layer, a single-node output can be set, and the edge activation function can be a linear function to directly output the standardized predicted outlet moisture content, avoiding prediction deviations caused by additional nonlinear transformations.
[0124] Preferably, the trainable parameters of the hybrid basis functions can be initialized, the weights of the SiLU basis functions are initialized to 1.0, and the control coefficients of the B-splines are initialized to a normal distribution with a mean of 0 and a standard deviation of 0.1, to ensure the stability of the forward propagation of the network in the early stage of training.
[0125] For example, this step can be implemented using the following pseudocode:
import torch
import torch.nn as nn
# KANLayer and its keyword arguments follow the interface assumed by this
# pseudocode; adapt the import and argument names to the KAN library in use.
from kan import KANLayer

class HybridKANPredictor(nn.Module):
    def __init__(self, input_dim=156, hidden_dims=(64, 64), output_dim=1, spline_order=3, grid_size=8):
        super().__init__()
        # Stack the hybrid basis function KAN layers
        self.layers = nn.ModuleList()
        # Input layer → first hidden layer
        self.layers.append(KANLayer(in_features=input_dim, out_features=hidden_dims[0], spline_order=spline_order, grid_size=grid_size, base_activation=nn.SiLU()))
        # First hidden layer → second hidden layer
        self.layers.append(KANLayer(in_features=hidden_dims[0], out_features=hidden_dims[1], spline_order=spline_order, grid_size=grid_size, base_activation=nn.SiLU()))
        # Second hidden layer → output layer (linear base activation)
        self.output_layer = KANLayer(in_features=hidden_dims[1], out_features=output_dim, spline_order=spline_order, grid_size=grid_size, base_activation=nn.Identity())

    def forward(self, x):
        # Flatten each [12, 13] time-series slice into a 156-dimensional vector
        x = x.flatten(start_dim=1)
        # Forward propagation through the hidden layers, then the output layer
        for layer in self.layers:
            x = layer(x)
        x = self.output_layer(x)
        return x

# Instantiate the complete KAN prediction network
kan_predictor = HybridKANPredictor()
Beneficial effects of steps S3.1 to S3.6:
Step S3.1: Clarifies the batch affiliation and temporal attributes of the samples, providing a clear classification basis for subsequent dataset partitioning and avoiding misalignment between sample time and batch information.
Step S3.2: Ensures the temporal continuity of the sample set in line with the batch sequence logic of industrial production, avoids the risk of temporal information leakage caused by random sorting, and provides a temporal basis that matches the actual production scenario for subsequent dataset partitioning.
Step S3.3: Ensures that the test set is completely independent of the training set in both the batch and temporal dimensions, matching the actual working conditions of industrial field model applications, objectively supporting verification of the model's cross-batch generalization ability, and avoiding interference from data leakage in model performance evaluation.
Step S3.4: Constructs a basic network structure adapted to the prediction task of the tobacco drying process, providing theoretically sound structural support for the subsequent introduction of basis functions and network construction, and ensuring that the network structure fits the prediction task.
Step S3.5: Balances the ability to characterize global smooth trends and local nonlinear characteristics, overcoming the limitations of a single basis function in expressing the nonlinearity of complex industrial processes, and adapting to the differentiated effects of different process variables in the tobacco drying process.
Step S3.6: Constructs a complete prediction network with multi-dimensional nonlinear mapping capability, expressing the complex coupling relationship between process variables and outlet moisture, and providing a structurally complete prediction network for subsequent model training and optimization.
[0126] Step S4: Introduce sparsity regularization constraints and adaptive grid update strategy into the KAN prediction network, and perform iterative optimization of network parameters using the training set to obtain the converged KAN prediction model.
[0127] Furthermore, step S4 specifically includes the following steps: Step S4.1: Construct the mean squared error loss function based on the KAN prediction network and training set, and introduce the sparsity regularization constraint composed of L1 regularization and entropy regularization to obtain the total loss function with sparsity regularization constraint.
[0128] Preferably, this step uses the KAN prediction network that integrates SiLU basis functions and parameterized B spline functions output in step S3 and the training set obtained in step S3 as a basis to construct a total loss function that takes into account fitting accuracy, parameter sparsity and model complexity for the regression prediction task of moisture at the drying outlet, providing a clear optimization objective for subsequent iterative optimization.
[0129] For the main loss term, mean squared error (MSE) can be used as the main loss function of the prediction task to constrain the deviation between the model's predicted values and the true labels. For the $N_{batch}$ samples within a batch, the main loss term is defined as follows: $L_{MSE}=\frac{1}{N_{batch}}\sum_{i=1}^{N_{batch}}\left(\hat{y}_i-y_i\right)^2$, where $\hat{y}_i$ is the standardized predicted outlet moisture content of the i-th sample output by the KAN network and $y_i$ is the true label value of the corresponding sample.
[0130] Among them, for the sparsity regularization constraint design, for the trainable parameters of the KAN network edge functions, the dual constraints of L1 regularization and entropy regularization are introduced. During the training process, the parameters of unimportant connection edges are forced to approach 0, thereby achieving sparsity of the network structure, providing support for the intrinsic interpretability of the model, and suppressing overfitting.
[0131] The L1 regularization term constrains the sparsity of the trainable control coefficients and basis function weights of the B-spline functions, and is expressed by the following formula: $L_{L1}=\lambda_1\sum_{l=1}^{L}\sum_{j}\sum_{i}\left(\left|w_{base}^{(l,j,i)}\right|+\left|w_{spline}^{(l,j,i)}\right|+\left\|c^{(l,j,i)}\right\|_1\right)$. Here, L is the total number of network layers; $w_{base}^{(l,j,i)}$ and $w_{spline}^{(l,j,i)}$ are the basis weight and spline scaling factor of the edge function from the i-th input node to the j-th output node in the l-th layer; $c^{(l,j,i)}$ is the trainable control coefficient vector of the corresponding edge's B-spline function; and $\lambda_1$ is the L1 regularization weight, calibrated to $10^{-4}$.
[0132] The entropy regularization term acts on the output distribution of the edge activation functions to reduce the complexity of the activation functions and avoid fitting noise. The formula is as follows: $L_{entropy}=\lambda_2\sum_{l=1}^{L}\sum_{j}\sum_{i}H\left(B_{j,i}^{(l)}(x)\right)$. Here, $H(\cdot)$ is the entropy calculation function, $B_{j,i}^{(l)}(x)$ is the output value of the corresponding edge activation function on the training set, and $\lambda_2$ is the entropy regularization weight, calibrated to $10^{-5}$.
[0133] In summary, linearly combining the main loss term and the regularization terms gives the final total loss function: $L_{total}=L_{MSE}+L_{L1}+L_{entropy}$.
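The structure of the total loss can be sketched in NumPy as follows. This is a minimal illustration, not the definitive form: the text does not fully specify the entropy function H, so the sketch assumes an entropy over the normalized mean absolute output of the edges in each layer, a form used in some KAN implementations. All function names are hypothetical.

```python
import numpy as np

LAMBDA_L1 = 1e-4       # L1 regularization weight from the description
LAMBDA_ENTROPY = 1e-5  # entropy regularization weight from the description


def mse_loss(y_pred, y_true):
    """Mean squared error over one batch."""
    return float(np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2))


def l1_penalty(params):
    """Sum of absolute values over all edge weights and spline coefficients."""
    return float(sum(np.abs(p).sum() for p in params))


def entropy_penalty(edge_outputs, eps=1e-12):
    """Entropy of the normalized mean |output| per edge, summed over layers.

    edge_outputs: one array per layer, shape (n_samples, n_out, n_in).
    ASSUMED definition of H; the source does not give its exact form.
    """
    total = 0.0
    for act in edge_outputs:
        scale = np.abs(act).mean(axis=0).ravel()
        p = scale / (scale.sum() + eps)
        total += float(-(p * np.log(p + eps)).sum())
    return total


def total_loss(y_pred, y_true, params, edge_outputs):
    """L_total = L_MSE + lambda_1 * L1 penalty + lambda_2 * entropy penalty."""
    return (mse_loss(y_pred, y_true)
            + LAMBDA_L1 * l1_penalty(params)
            + LAMBDA_ENTROPY * entropy_penalty(edge_outputs))


# Toy example: 2 samples, one parameter array, one layer with 2 equal edges.
loss = total_loss([1.0, 2.0], [1.0, 4.0],
                  [np.array([1.0, -2.0])],
                  [np.ones((4, 1, 2))])
```

Because both weights are small, the MSE term dominates early in training and the regularizers mainly decide which edges shrink toward zero among otherwise similar fits.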
[0134] Step S4.2 introduces an adaptive grid update strategy into the total loss function to obtain the iterative optimization objective function adapted to the KAN prediction network.
[0135] Preferably, this step is based on the total loss function with sparsity regularization constraints output in step S4.1. Because a fixed B-spline grid has difficulty adapting to changes in the data distribution during the tobacco drying process, an adaptive grid update strategy is introduced to optimize the grid node distribution of the B-spline basis functions, improve the model's fitting accuracy for nonlinear relationships, and finally determine the complete iterative optimization objective function.
[0136] Among them, the core principle of adaptive grid update is that the fitting accuracy of B-spline function is highly dependent on the distribution of grid nodes. Fixed grid is prone to problems such as underfitting in dense data areas and overfitting in sparse areas. Adaptive grid update dynamically adjusts the position of grid nodes according to the distribution of input data, so that the grid points match the actual distribution of data, while ensuring the monotonicity and non-overlapping of the grid.
[0137] The core formula and parameter calibration for grid updates are as follows: ① Grid update triggering rules: Set the grid update to be performed once every 2 training epochs to avoid training instability caused by frequent updates; during the update, new grid nodes are calculated based on the input data distribution of the current training batch.
[0138] ② Grid update calculation: For the B-spline function of a single edge, the grid nodes are updated based on the quantiles of the input data x. The update formula is: $t_{new}=\mathrm{quantile}(x,\mathrm{linspace}(0,1,G+k))$, where $t_{new}$ is the updated grid node vector, G is the number of grid nodes (calibrated to 8), k is the B-spline order (calibrated to 3), quantile() is the quantile calculation function, and linspace(0,1,G+k) denotes G+k equally spaced quantile levels between 0 and 1.
[0139] ③ Smooth grid update constraint: To avoid training oscillations caused by sudden grid changes, a smooth update strategy is adopted. The updated grid nodes are obtained as a weighted combination of the old grid and the newly computed grid, as shown in the formula: $t_{updated}=(1-\epsilon_{grid})\cdot t_{old}+\epsilon_{grid}\cdot t_{new}$, where $t_{old}$ is the grid node vector before the update and $\epsilon_{grid}$ is the grid update weight, set to 0.02 to ensure a smooth transition during grid updates.
[0140] ④ Grid domain constraint: The updated grid nodes are strictly constrained within the range [-1,1], which is consistent with the standardized data range of step S2, thus avoiding fitting deviation caused by domain offset.
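Rules ② to ④ above can be sketched together as follows. This is a minimal illustration: the function name `smooth_grid_update` is hypothetical, the number of quantile levels is taken from the length of the old grid so that both vectors have the same size, and falling back to the old grid when monotonicity would break is an assumption of this sketch.

```python
import numpy as np


def smooth_grid_update(t_old, x, eps_grid=0.02, lo=-1.0, hi=1.0):
    """Blend old grid nodes toward data quantiles, then clip to [lo, hi]."""
    t_old = np.asarray(t_old, dtype=float)
    # Rule 2: quantile-based candidate grid with as many nodes as the old grid.
    levels = np.linspace(0.0, 1.0, len(t_old))
    t_new = np.quantile(np.asarray(x, dtype=float), levels)
    # Rule 3: small weighted step from the old grid toward the candidate grid.
    t_updated = (1.0 - eps_grid) * t_old + eps_grid * t_new
    # Rule 4: keep the nodes inside the standardized data range.
    t_updated = np.clip(t_updated, lo, hi)
    if np.any(np.diff(t_updated) <= 0):
        # Assumed fallback: keep the old grid if strict monotonicity would break.
        return t_old
    return t_updated


# Toy example: G + k = 11 uniform nodes, inputs concentrated in [-0.5, 0.5].
t_old = np.linspace(-1.0, 1.0, 11)
x = np.linspace(-0.5, 0.5, 101)
t_updated = smooth_grid_update(t_old, x)
```

With a weight of 0.02 each update moves the nodes only 2% of the way toward the data quantiles, so the grid drifts toward dense data regions over many updates rather than jumping.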
[0141] Preferably, the complete iterative optimization objective is determined by combining the adaptive grid update strategy with the total loss function of step S4.1. The core objective is to minimize $L_{total}$: in each iteration the trainable parameters of the edge functions are optimized, adaptive grid updates are performed at the set period, and the grid nodes of the B-splines are adjusted together with the corresponding control coefficients. This yields the iterative optimization objective function adapted to the KAN prediction network.
[0142] Step S4.3: The Adam optimizer is used to perform forward and backward propagation calculations on the iterative optimization objective function and training set to obtain the updated network parameters after each iteration.
[0143] Preferably, this step is based on the iterative optimization objective function of the adapted KAN prediction network output in S4.2 and the training set obtained in S3. The Adam optimizer is used to complete the batch-level forward and backward propagation calculations, solve the gradient of the loss function with respect to the trainable parameters of the network, and obtain the updated network parameters after each iteration.
[0144] Specifically, the core training hyperparameters are calibrated based on the dataset size of the tobacco drying process and the KAN network structure: ① Batch size = 128, to balance the stability of gradient calculation and training efficiency.
[0145] ② The total number of training epochs is 30, covering the number of iterations required for model convergence.
[0146] ③ Adam optimizer core parameters: learning rate 0.01, first-moment exponential decay rate 0.9, second-moment exponential decay rate 0.999, numerical stability constant $10^{-8}$, and weight decay $10^{-6}$.
[0147] ④ Training set and validation set splitting: From the 8 batches of the training set, the last batch of the time series is selected as the validation set during the training process, accounting for 12.5% of the total training set. This is used to monitor the generalization performance of the model and avoid overfitting.
[0148] For data loading and batch processing, a time-preserving batch loading method is adopted so that the temporal order of the training set is not disrupted, the samples within a batch remain temporally continuous, and the loading conforms to the dynamic characteristics of the tobacco drying process. The feature tensor and label tensor of each training batch are moved to the available computing device (GPU or CPU) to ensure computational efficiency.
[0149] For the forward propagation calculation process, based on each training batch, the batch feature tensor is input into the KAN prediction network, and forward propagation calculation is performed according to the inter-layer propagation rules defined in S3. The process goes through input layer flattening, double hidden layer mixed basis function transformation, and output layer linear mapping in sequence, and finally outputs the standardized outlet moisture content prediction value of all samples in the batch.
[0150] Specifically, for backpropagation and gradient calculation, the batch total loss is calculated based on the predicted value output by the forward propagation and the batch true label using the total loss function defined in step S4.1; based on the automatic differentiation mechanism, backpropagation calculation is performed to solve the gradient value of the total loss function with respect to all trainable parameters of the network (including SiLU basis weights, B-spline scaling factors, and B-spline control coefficients).
[0151] Specifically, for parameter update calculation, the Adam optimizer calculates the update step size and update value for each trainable parameter based on the calculated gradient value and the deviation correction of the first and second moments, providing a basis for subsequent network parameter updates.
[0152] For example, this step can be implemented as the following pseudocode:
import torch
from torch.utils.data import DataLoader, TensorDataset

# total_loss_func and adaptive_grid_update are helper functions implementing
# steps S4.1 and S4.2; they are not defined in this pseudocode.

# Load the training data
train_features = torch.tensor(train_set["features"], dtype=torch.float32)
train_labels = torch.tensor(train_set["labels"], dtype=torch.float32)
train_dataset = TensorDataset(train_features, train_labels)
# shuffle=False preserves the time order of the samples
train_loader = DataLoader(train_dataset, batch_size=128, shuffle=False)

# Optimizer initialization
optimizer = torch.optim.Adam(kan_predictor.parameters(), lr=0.01, betas=(0.9, 0.999), eps=1e-8, weight_decay=1e-6)

# Training logic for a single epoch
def train_one_epoch(model, train_loader, optimizer, total_loss_func, epoch):
    model.train()
    epoch_loss = 0.0
    for batch_idx, (batch_x, batch_y) in enumerate(train_loader):
        # Zero the gradients
        optimizer.zero_grad()
        # Forward propagation
        batch_preds = model(batch_x)
        # Loss calculation
        total_loss, mse_loss, l1_loss = total_loss_func(batch_preds, batch_y, model)
        # Backpropagation
        total_loss.backward()
        # Gradient clipping to avoid gradient explosion
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        # Optimizer step: compute and apply the parameter updates
        optimizer.step()
        # Accumulate the sample-weighted loss
        epoch_loss += total_loss.item() * batch_x.size(0)
        # Adaptive grid update (every 2 epochs, on the first batch)
        if epoch % 2 == 0 and batch_idx == 0:
            with torch.no_grad():
                # Each layer's grid is updated from that layer's own input
                layer_input = batch_x.flatten(start_dim=1)
                for layer in model.layers:
                    adaptive_grid_update(layer, layer_input)
                    layer_input = layer(layer_input)
    avg_epoch_loss = epoch_loss / len(train_loader.dataset)
    return avg_epoch_loss, model
Step S4.4: Synchronously update the edge function parameters and grid node parameters of the KAN prediction network according to the network parameter update values to obtain the iteratively updated KAN prediction network.
[0153] Preferably, this step is based on the network parameter update values after each iteration output in step S4.3 and the adaptive grid update rules defined in step S4.2, to complete the synchronous update of all trainable parameters and grid node parameters of the KAN network, ensuring the consistency of network structure and parameters, and obtaining the iteratively updated KAN prediction network.
[0154] Specifically, the classified synchronous update rule updates the two core classes of KAN network parameters together to avoid fitting bias caused by asynchronous parameters: ① Trainable edge function parameter update: Based on the parameter update values calculated by the Adam optimizer, the SiLU basis function weights $w_{base}$, the B-spline scaling factors $w_{spline}$, and the B-spline control coefficients $c$ of all network layers are updated synchronously, covering the trainable parameters of all inter-layer connection edges and ensuring the consistency of the forward propagation logic.
[0155] ② Synchronous update of grid node parameters: According to the update cycle set in step S4.2, in the corresponding training round, after the grid nodes are smoothly updated based on the input data distribution, the numerical calculation matrix of the B-spline basis function is updated synchronously to ensure that the calculation of the B-spline function is completely matched with the updated grid nodes, and to avoid calculation errors caused by misalignment between the basis function and the grid.
[0156] Among them, for parameter update constraints: ① Gradient clipping constraint: Set the maximum norm of the gradient to 1.0 to clip the gradients obtained from backpropagation, avoid abnormal parameter updates caused by gradient explosion, and ensure the numerical stability of the training process.
[0157] ② Grid monotonicity constraint: The updated grid nodes must be strictly monotonically increasing to avoid overlapping or crossing grid nodes and to keep the Cox-de Boor recursive formula valid.
[0158] ③ Numerical range constraints: The update values of all trainable parameters are strictly constrained within a reasonable range to avoid training crashes caused by parameter overflow.
[0159] Preferably, after each parameter update, a forward propagation check is performed to ensure that the network can output predicted values normally without abnormal values such as NaN or Inf. If an anomaly occurs, the parameter state before the current update is rolled back, and gradient calculation and updates are re-executed. After completing parameter updates and grid updates for all batches in this round, the updated KAN prediction network for this iteration is obtained and used for the next iteration training or convergence check.
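The post-update check and rollback can be sketched as follows. This is a minimal framework-agnostic illustration using NumPy arrays as parameters; the names `guarded_update`, `apply_step`, and `forward` are hypothetical, and the re-execution of the gradient calculation after a rollback is left to the caller.

```python
import numpy as np


def guarded_update(params, apply_step, forward):
    """Apply one in-place update and roll back if the output is not finite.

    params     : list of NumPy arrays, modified in place by apply_step
    apply_step : callable performing the parameter update
    forward    : callable returning the network output for a check batch
    """
    backup = [p.copy() for p in params]
    apply_step(params)
    if np.all(np.isfinite(forward(params))):
        return True
    # NaN or Inf detected: restore the state from before this update.
    for p, saved in zip(params, backup):
        p[...] = saved
    return False


def forward(params):
    return params[0] * 2.0


def good_step(params):
    params[0] -= 0.1


def bad_step(params):
    params[0] *= np.inf


weights = [np.array([1.0, -1.0])]
ok_first = guarded_update(weights, good_step, forward)
ok_second = guarded_update(weights, bad_step, forward)
```

After the second call the weights hold the values from after the first, successful update, since the overflowing step was rolled back.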
[0160] Step S4.5: Perform iterative convergence verification on the updated KAN prediction network. Stop the iteration when the preset convergence condition is met, and obtain the converged KAN prediction model.
[0161] Preferably, this step takes the iteratively updated KAN prediction network output in step S4.4 as the processing object, and uses multi-dimensional convergence verification rules to determine whether the model has reached a stable convergence state. When the convergence condition is met, the iteration is terminated to avoid overfitting caused by overtraining, and finally the converged KAN prediction model is output.
[0162] For the convergence condition calibration, a three-level convergence verification rule can be set, and training will stop when any one of the termination conditions is met: ① Maximum epoch termination condition: When the training iterations reach the preset maximum epochs=30, training is forcibly terminated to avoid meaningless continuous iterations.
[0163] ② Loss convergence termination condition: Monitor the MSE loss of the validation set. If the decrease in validation set loss is less than $10^{-6}$ for 5 consecutive epochs, the model is judged to have converged and training is terminated early.
[0164] ③ Overfitting warning termination condition: If the training set loss continues to decrease while the validation set loss continues to increase for three consecutive epochs, the model is judged to be overfitting, training is terminated early, and the model weight with the lowest validation set loss is retained.
[0165] Preferably, during training, after each epoch of training is completed, forward propagation is performed on the validation set to calculate the validation set MSE loss; if the current validation set loss is the lowest historical value, the current model weights are saved as the optimal weights to avoid losing the optimal model parameters when the training is terminated prematurely.
[0166] Preferably, after each training round, the model is switched to evaluation mode, and gradient-free forward propagation is performed on the validation set to calculate the validation set loss and evaluation metric. The model is then compared with historical training records to verify whether the aforementioned preset convergence termination condition is met. If the termination condition is met, iterative training stops, and the saved optimal model weights are loaded; otherwise, the next round of iterative training continues.
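The three termination rules and the best-weight bookkeeping above can be sketched as a small monitor class. This is a minimal sketch under stated assumptions: the class name, the returned reason strings, and the reading of "decrease" as the epoch-to-epoch drop in validation loss are not taken from the description.

```python
class ConvergenceMonitor:
    """Tracks the three termination rules: max epochs, loss plateau, overfitting."""

    def __init__(self, max_epochs=30, min_delta=1e-6, patience=5, overfit_patience=3):
        self.max_epochs = max_epochs
        self.min_delta = min_delta
        self.patience = patience
        self.overfit_patience = overfit_patience
        self.epoch = 0
        self.plateau = 0            # consecutive epochs with a drop below min_delta
        self.overfit = 0            # consecutive epochs: train loss down, val loss up
        self.prev_train = None
        self.prev_val = None
        self.best_val = float("inf")
        self.best_epoch = None

    def step(self, train_loss, val_loss):
        """Record one epoch; return (stop_reason or None, is_best)."""
        self.epoch += 1
        is_best = val_loss < self.best_val
        if is_best:                 # the caller would save the weights here
            self.best_val, self.best_epoch = val_loss, self.epoch
        if self.prev_val is not None:
            drop = self.prev_val - val_loss
            self.plateau = self.plateau + 1 if drop < self.min_delta else 0
            diverging = train_loss < self.prev_train and val_loss > self.prev_val
            self.overfit = self.overfit + 1 if diverging else 0
        self.prev_train, self.prev_val = train_loss, val_loss
        if self.overfit >= self.overfit_patience:
            return "overfitting", is_best
        if self.plateau >= self.patience:
            return "loss_converged", is_best
        if self.epoch >= self.max_epochs:
            return "max_epochs", is_best
        return None, is_best
```

In use, `step` would be called once per epoch after the gradient-free validation pass; when a reason is returned, training stops and the weights saved at `best_epoch` are reloaded.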
[0167] Preferably, for a KAN prediction network that meets the convergence condition, all trainable parameters and grid node parameters are fixed, and the gradient calculation function is turned off to obtain a converged KAN prediction model that can be used for subsequent generalization performance verification and online inference.
[0168] Beneficial effects of steps S4.1 to S4.5: Step S4.1 establishes an optimization objective that balances fitting accuracy and parameter sparsity for model iterative optimization. While constraining redundant connections in the network, it provides underlying support at the loss level for the model's inherent interpretability, ensuring that the optimization process matches the requirements of the prediction task. Step S4.2 adapts the network's spline grid to the data distribution characteristics, improving the network's adaptability to nonlinear changes in the process, ensuring that the iterative optimization can capture local data features, and avoiding the expression limitations caused by fixed grids. Step S4.3 achieves efficient and stable iterative updates of network parameters, avoiding gradient fluctuations during the optimization process, ensuring the convergence stability of the parameter update process, and providing a reliable computational path for continuous optimization of network parameters. Step S4.4 achieves collaborative optimization of the network's edge function parameters and grid node parameters, ensuring consistency between the network's nonlinear expressive ability and the data distribution, and avoiding fitting bias caused by asynchronous parameter updates. Step S4.5 terminates the iteration when the model reaches a stable fitting state, avoiding the risk of overfitting caused by excessive iteration, ensuring the model's generalization performance, and outputting a converged model with stable prediction capabilities.
[0169] Step S5: Verify the generalization performance of the converged KAN prediction model using a test set to obtain the tobacco shred outlet moisture prediction model.
[0170] Furthermore, step S5 specifically includes the following steps: Step S5.1: Perform forward propagation inference calculations on the converged KAN prediction model and the test set to obtain the predicted outlet moisture content sequence corresponding to the test set.
[0171] Preferably, this step takes the converged KAN prediction model output in step S4 and the independent test set obtained in step S3 as input. In the pure inference mode without training interference, the prediction calculation of the full set of samples is completed, and the outlet moisture content prediction sequence corresponding to the test set time series is generated, providing basic data for subsequent performance evaluation.
[0172] Specifically, regarding inference mode and environment locking, the converged KAN prediction model can be permanently switched to the evaluation mode (eval mode), and all training-specific mechanisms, including gradient calculation, parameter update, adaptive grid update, Dropout regularization, etc., can be turned off. All trainable parameters and B-spline grid nodes of the network are fixed to ensure the determinism and reproducibility of the inference process and completely eliminate the interference of the training mechanism on the prediction results.
[0173] Among these, the feature data format of the test set can be verified to ensure that its dimension is [M_test, 12, 13] (where M_test is the total number of valid samples in the test set) and that the feature dimension order and data preprocessing logic are completely consistent with the training set, without any additional feature transformations, thus avoiding prediction bias caused by preprocessing differences.
[0174] For time-series-preserving batch inference, the same batch size as the training phase (batch_size=128) can be used for inference. The batch loading process strictly preserves the original production time sequence of the test set samples, without scrambling or rearranging the samples, to ensure the temporal continuity of the final predicted sequence and the correspondence between the samples.
[0175] For the forward propagation inference process, the forward propagation logic can be strictly implemented in the same way as the training phase for each test batch: first, the input temporal feature slices are flattened into 156-dimensional vectors, and then the nonlinear transformation is completed through the two-layer hybrid basis function KAN hidden layer. Finally, the standardized outlet moisture content prediction value is generated through the linear output layer.
[0176] Preferably, the prediction results of all batches can be sequentially spliced according to the original sample order of the test set to generate a complete outlet moisture content prediction sequence that corresponds one-to-one with the test set samples. The output sequence dimension is [M_test, 1], which matches the dimension of the true label sequence in the test set.
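The order-preserving batch inference described in step S5.1 can be illustrated with the following minimal sketch. The `model` argument is a hypothetical stand-in (any callable mapping a batch of 156-element vectors to one value per sample), and nested Python lists replace tensors to keep the example self-contained.

```python
def flatten_window(window):
    """Flatten a [12][13] feature slice row by row into a 156-element vector."""
    return [value for time_step in window for value in time_step]

def predict_in_order(samples, model, batch_size=128):
    """Run batched inference without shuffling.

    samples : list of [12][13] windows in original production order
    model   : hypothetical callable, batch of vectors -> list of floats
    Returns a list of shape [M_test][1] in the same order as `samples`.
    """
    predictions = []
    for start in range(0, len(samples), batch_size):
        batch = [flatten_window(w) for w in samples[start:start + batch_size]]
        # Extending in loop order splices the batches in their original sequence.
        predictions.extend([[y] for y in model(batch)])
    return predictions
```

Because batches are taken as consecutive slices and appended in loop order, the i-th prediction always belongs to the i-th test sample.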
[0177] Step S5.2: Align the predicted outlet moisture content sequence with the actual outlet moisture content label sequence in the time series dimension to obtain the aligned predicted sequence and the actual label sequence.
[0178] Preferably, this step takes the predicted outlet moisture content sequence and the actual outlet moisture content label sequence output in step S5.1 as input to eliminate the error interference caused by sample time sequence misalignment, index mismatch, and outliers, so as to achieve a precise one-to-one correspondence between the predicted value and the actual value in the time dimension, and provide effective data pairs for time sequence matching for subsequent error calculation.
[0179] For the establishment of timestamp-level alignment benchmarks, the original production timestamp bound to each sample in the test set can be used as the unique alignment benchmark. The target prediction timestamp (sample window end time + 1s prediction step) corresponding to each predicted value and the collection timestamp corresponding to each real label value are clearly defined, and a one-to-one mapping relationship between "predicted value - timestamp" and "real value - timestamp" is established.
[0180] Specifically, for matching tolerance and misalignment correction, the timestamp matching tolerance threshold can be set to ±50ms. For samples whose timestamp deviation exceeds the tolerance, they are marked as time-series mismatched samples, and the predicted sequence and the real label sequence are synchronously removed. For samples with slight misalignment, linear interpolation is used to complete time-series rematching to ensure that each pair of predicted values and real values correspond to the same physical time of the outlet moisture content.
[0181] Preferably, after completing temporal matching, redundant segments where the beginning and end of the predicted sequence and the true label sequence do not match are trimmed. Simultaneously, outlier samples containing NaN or Inf in the predicted sequence, along with their corresponding true labels, are removed. This ensures that the aligned predicted sequence and the true label sequence are completely identical in sample count, temporal length, and dimension, both being of dimension [M_valid, 1], where M_valid represents the total number of valid samples after time-series matching.
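A minimal sketch of the timestamp matching in step S5.2 is given below, assuming both sequences are lists of (timestamp in milliseconds, value) pairs sorted by time. The function name is hypothetical, and the linear-interpolation correction for slightly misaligned samples is deliberately omitted: pairs within the ±50 ms tolerance are kept as they are, and all other pairs are dropped.

```python
import math

def align_by_timestamp(pred, label, tolerance_ms=50):
    """Pair each prediction with the nearest label within the tolerance.

    pred, label : lists of (timestamp_ms, value), each sorted by timestamp
    Returns (aligned_pred_values, aligned_label_values) of equal length.
    """
    if not label:
        return [], []
    aligned_pred, aligned_label = [], []
    j = 0
    for t_pred, y_pred in pred:
        # Advance to the label whose timestamp is closest to this prediction.
        while (j + 1 < len(label)
               and abs(label[j + 1][0] - t_pred) <= abs(label[j][0] - t_pred)):
            j += 1
        t_lab, y_lab = label[j]
        if abs(t_lab - t_pred) > tolerance_ms:
            continue                      # time-series mismatched sample
        if not (math.isfinite(y_pred) and math.isfinite(y_lab)):
            continue                      # NaN/Inf outlier, drop the pair
        aligned_pred.append(y_pred)
        aligned_label.append(y_lab)
    return aligned_pred, aligned_label
```

With 1 Hz sampling and a 50 ms tolerance, no label can fall within tolerance of two different predictions, so the pairing stays one-to-one.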
[0182] Step S5.3: Calculate the prediction error between the aligned predicted sequence and the true label sequence to obtain the generalization performance evaluation index of the converged KAN prediction model.
[0183] Preferably, this step uses the aligned predicted sequence and the true label sequence output from step S5.2 as input, and employs widely recognized quantitative evaluation indicators in the field of industrial soft measurement to objectively calculate the model's prediction accuracy and generalization ability, providing a quantifiable basis for subsequent model performance verification. The evaluation indicator system can be calibrated for the regression task of predicting the moisture content at the outlet of the tobacco drying process, selecting three core evaluation indicators commonly used in the field of industrial process soft measurement. These indicators cover the three core dimensions of model fit, error amplitude, and average deviation: the coefficient of determination (R²), the Root Mean Square Error (RMSE), and the Mean Absolute Error (MAE).
[0184] It is worth noting that the evaluation indicators in this step are all based on mature existing technologies and conventional calculations, so the specific calculation process will not be described in detail.
[0185] For example, this step can be implemented using the following pseudocode:

    import numpy as np
    from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

    def model_performance_calc(aligned_pred, aligned_label):
        # Flatten both [M_valid, 1] sequences to 1-D arrays
        y_true = np.asarray(aligned_label).ravel()
        y_pred = np.asarray(aligned_pred).ravel()
        # Core indicator calculation
        r2 = r2_score(y_true, y_pred)
        rmse = np.sqrt(mean_squared_error(y_true, y_pred))
        mae = mean_absolute_error(y_true, y_pred)
        # Encapsulate the metric results
        performance_result = {
            "overall_R2": r2,
            "overall_RMSE": rmse,
            "overall_MAE": mae
        }
        return performance_result

Step S5.4: Perform convergence verification on the generalization performance evaluation index results to obtain the model performance verification results.
[0186] Preferably, this step takes the generalization performance evaluation index results output in step S5.3 as input, combines the industrial application requirements of cigarette manufacturing process, sets quantitative performance qualification thresholds and multi-dimensional verification rules, completes the systematic verification of model generalization ability and overfitting risk, and outputs clear model performance verification conclusions.
[0187] Among them, combining the process control precision requirements for outlet moisture in the drying process and the engineering application standards of industrial soft measurement, three levels of performance qualification thresholds are calibrated: ① Core mandatory thresholds: R² ≥ 0.90 for the entire test set and RMSE ≤ 0.05% for the entire test set. Both must be met, and they directly determine whether the model is qualified for industrial deployment.
[0188] ② Robustness thresholds for different operating conditions: MAE ≤ 0.03% for steady-state production and R² ≥ 0.85 for fluctuating operating conditions, to ensure the performance stability of the model under different production scenarios.
[0189] ③ Overfitting prevention threshold: The R² difference between the training set and the test set should be ≤ 0.05, to avoid the risk of the model overfitting the training set and suffering a sudden drop in performance in practical applications.
[0190] Preferably, this step can perform hierarchical verification: ① First-level mandatory verification: Prioritize the verification of core mandatory thresholds. If any one of them is not met, it is directly judged as "performance not up to standard, does not meet industrial application requirements".
[0191] ② Second-level robustness verification: After the core threshold is met, the robustness threshold of the working condition is verified. If it is not met, it is determined that "the working condition adaptability is insufficient and needs to be optimized before deployment".
[0192] ③ Third-level overfitting check: After the first two levels of checks pass, the overfitting prevention threshold is checked. If the threshold is exceeded, it is determined that "there is a risk of overfitting and the generalization ability is insufficient".
[0193] If all thresholds are met, the system is deemed to have "qualified performance and meets the requirements for industrial field applications".
[0194] In summary, based on the above hierarchical verification process, clear model performance verification results are output, including the final qualification judgment, the compliance status of each indicator, and a detailed explanation of the non-compliance items, providing a clear basis for subsequent model solidification or optimization.
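The hierarchical verification above can be sketched as a single function. This is an illustration under stated assumptions: the dictionary keys and check names are hypothetical, the verdict strings are shortened paraphrases, and RMSE and MAE are taken to be expressed in percentage points of moisture content.

```python
def verify_performance(m):
    """Three-level qualification check on a dict of evaluation metrics.

    Expected keys (hypothetical names): test_r2, test_rmse, steady_mae,
    fluct_r2, train_r2. RMSE/MAE are in percentage points of moisture.
    """
    checks = {
        "core_r2": m["test_r2"] >= 0.90,
        "core_rmse": m["test_rmse"] <= 0.05,
        "steady_mae": m["steady_mae"] <= 0.03,
        "fluct_r2": m["fluct_r2"] >= 0.85,
        "overfit_gap": abs(m["train_r2"] - m["test_r2"]) <= 0.05,
    }
    if not (checks["core_r2"] and checks["core_rmse"]):
        verdict = "performance not up to standard"           # level 1
    elif not (checks["steady_mae"] and checks["fluct_r2"]):
        verdict = "insufficient operating-condition adaptability"  # level 2
    elif not checks["overfit_gap"]:
        verdict = "overfitting risk"                          # level 3
    else:
        verdict = "qualified"
    failed = [name for name, passed in checks.items() if not passed]
    return {"verdict": verdict, "checks": checks, "failed": failed}
```

The returned `checks` and `failed` fields correspond to the per-indicator compliance status and the list of non-compliant items mentioned in the summary.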
[0195] Step S5.5: Based on the model performance verification results, the parameters of the converged KAN prediction model that meets the performance requirements are solidified to obtain the tobacco shred outlet moisture prediction model.
[0196] Preferably, this step takes the model performance verification result output in step S5.4 and the converged KAN prediction model output in step S4 as input, completes full parameter solidification, deployment adaptation, and metadata binding for the model that meets the performance standards, and outputs a standardized tobacco shred outlet moisture prediction model that can be directly used for online deployment in industrial sites, ensuring consistency between offline verification and online inference.
[0197] In this process, only converged KAN prediction models with a performance verification result of "qualified" will undergo subsequent solidification operations; for models that fail to meet the performance standards, this step will be terminated directly, and the process will return to the model training and optimization stage.
[0198] To irreversibly fix all parameters of the model, the following operations can be performed: ① Lock all trainable parameters of the network, including SiLU basis function weights, B-spline scaling factors, and B-spline control coefficients, fix the parameter values, and prohibit any form of modification or update.
[0199] ② Permanently fix the grid nodes of the B-spline function and disable the adaptive grid update function to ensure that the basis function structure during online inference is completely consistent with that during the offline verification stage.
[0200] ③ Permanently disable all training-related functions such as gradient calculation, backpropagation, and parameter optimization of the model, locking the model in pure inference mode to eliminate uncertainty in online application.
[0201] Preferably, the solidified model can be bound to core rule metadata from the offline training phase, including key information such as sliding window length, feature dimension order, dynamic normalization logic based on the sliding window, statistics required for label denormalization, and prediction step size. This ensures complete consistency between online preprocessing and offline training logic, avoiding preprocessing deviations after deployment. The solidified model structure, fixed parameters, bound metadata, generalization performance evaluation results, and training configuration information are integrated and serialized to generate a standardized model file that can be loaded across platforms and is traceable, meeting the deployment, operation, and auditing requirements of industrial sites.
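One possible way to bind the metadata to the fixed parameters and serialize them into a traceable file is sketched below. All field names, the placeholder variable names, the JSON format, and the SHA-256 checksum are assumptions made for illustration; the description does not specify a file format.

```python
import hashlib
import json

def build_model_bundle(parameters, metrics, label_mean, label_std):
    """Serialize frozen parameters plus preprocessing metadata to a JSON string."""
    bundle = {
        "metadata": {
            "seq_len": 12,                      # sliding window length
            "pred_step_s": 1,                   # prediction step in seconds
            "feature_order": [f"var_{i:02d}" for i in range(1, 14)],  # placeholder names
            "normalization": "per-window z-score",
            "label_mean": label_mean,           # needed for label denormalization
            "label_std": label_std,
        },
        "parameters": parameters,
        "metrics": metrics,
        "frozen": True,
    }
    payload = json.dumps(bundle, sort_keys=True)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return payload, digest

def load_model_bundle(payload, digest):
    """Reload a bundle, refusing files that do not match their checksum."""
    if hashlib.sha256(payload.encode("utf-8")).hexdigest() != digest:
        raise ValueError("model file does not match its recorded checksum")
    return json.loads(payload)
```

Storing the checksum alongside the file is one simple way to support the auditing requirement, since any later modification of parameters or metadata becomes detectable at load time.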
[0202] Beneficial effects of steps S5.1 to S5.5: Step S5.1 provides a foundation for predictive output under unseen operating conditions to verify the model's generalization performance, aligning with the actual scenario of cross-batch applications in industrial production, and providing predictive data support that matches the on-site operating conditions for subsequent performance evaluation; Step S5.2 eliminates the interference of time-series misalignment on performance evaluation, ensuring accurate correspondence between the prediction results and the true labels in the time dimension, and providing effective time-series matching data pairs for subsequent prediction error calculation; Step S5.3 achieves objective quantitative evaluation of the model's predictive performance, comprehensively reflecting the model's predictive ability and generalization performance under unseen operating conditions, and providing quantifiable evaluation criteria for subsequent model performance verification; Step S5.4 completes the compliance determination of the model's generalization performance, accurately identifying models that do not meet application requirements, avoiding the application risks of substandard models, and ensuring the reliability of the output model for industrial applications; Step S5.5 completes the fixing of model parameters, eliminating the risk of parameter fluctuations in subsequent applications, and providing a stable, reusable, and standardized prediction model for online prediction applications in industrial settings.
[0203] Step S6: Connect the tobacco shred outlet moisture prediction model to the real-time collected drying process time series data and perform forward inference calculation to obtain the predicted value of tobacco shred outlet moisture during the drying process.
[0204] Furthermore, step S6 specifically includes the following steps: Step S6.1: Collect real-time process timing data of the tobacco drying process.
[0205] Preferably, this step involves collecting real-time data that matches the input requirements of the prediction model and is highly synchronous with the real-time operation data of the tobacco dryer production process, providing a real-time data source of the same origin and dimension as offline training for subsequent online preprocessing and model inference.
[0206] The data acquisition source can be connected to the distributed control system (DCS) of the tobacco dryer. It adopts the industrial standard OPC UA communication protocol to complete real-time data reading. The communication cycle is set to 100 ms to ensure the real-time performance and stability of data acquisition and avoid timing misalignment caused by data transmission delay.
[0207] Preferably, the range of collected variables is strictly limited to the 13 key process variables selected in step S1, which are completely consistent with the input feature dimensions and variable order of offline training. These variables include four major categories of core variables: energy supply, material state, airflow and environment, and process feedback. No additional variables are introduced, and the variable order is not adjusted, thus ensuring the consistency between the input features and the model training requirements from the source.
[0208] Preferably, the sampling frequency is strictly calibrated to 1Hz, perfectly matching the sampling frequency of the offline training data. Each sampling point is bound to a millisecond-level timestamp of the DCS global unified clock. A sensor range validity threshold consistent with the offline preprocessing is synchronously set to remove invalid data exceeding the upper and lower limits of the sensor range. Sampling points exceeding the range are marked as outliers, providing valid raw data for subsequent time-series alignment. Simultaneously, a first-in-first-out (FIFO) real-time data buffer queue is set, with a queue length of 20 sampling points, covering the historical data length required by the sliding window, ensuring the continuity and traceability of real-time data and preventing data loss.
[0209] Step S6.2: Align the real-time process time-series data stream with the tobacco shred outlet moisture prediction model in terms of time-series dimensions and slice the data using a sliding time window to obtain a real-time time-series feature slice sequence with a fixed window length.
[0210] Preferably, the sliding window parameters are identical to the offline window parameters in step S2.2: the historical window length seq_len = 12 (corresponding to 12 seconds of historical process data, covering the entire material transport delay cycle of the tobacco drying process), the sliding step stride = 1 s, and the prediction step pred_step = 1 s. This is completely consistent with the construction rules of the offline training samples and eliminates structural bias between offline and online processing. At the same time, based on the FIFO cache queue, when the number of valid continuous sampling points in the queue is ≥ 12, the 13-dimensional feature data of the latest 12 consecutive time steps in the queue are extracted to construct a real-time time series feature slice with dimensions [12, 13], which matches the dimensions and structure of the offline training samples. For each new real-time sampling point, the window slice is updated synchronously to achieve second-by-second sliding update and ensure the real-time performance of the prediction.
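The FIFO queue of step S6.1 and the window extraction of step S6.2 can be sketched together as follows. The class name, the per-variable `limits` list, and the `jitter_ms` allowance used to decide whether samples are consecutive are assumptions for illustration only.

```python
from collections import deque

class RealTimeWindowBuffer:
    """FIFO buffer of recent samples that yields the latest valid [12][13] slice."""

    def __init__(self, limits, seq_len=12, maxlen=20, period_ms=1000, jitter_ms=50):
        self.limits = limits          # list of (low, high) sensor ranges per variable
        self.seq_len = seq_len
        self.period_ms = period_ms    # 1 Hz sampling
        self.jitter_ms = jitter_ms    # assumed allowance on the sampling interval
        self.queue = deque(maxlen=maxlen)   # oldest samples are discarded first

    def push(self, timestamp_ms, values):
        """Append one sample, flagging it invalid if any value is out of range."""
        valid = len(values) == len(self.limits) and all(
            lo <= v <= hi for v, (lo, hi) in zip(values, self.limits)
        )
        self.queue.append((timestamp_ms, list(values), valid))

    def latest_window(self):
        """Return (window_end_timestamp, slice) or None if no valid window exists."""
        if len(self.queue) < self.seq_len:
            return None
        tail = list(self.queue)[-self.seq_len:]
        if not all(valid for _, _, valid in tail):
            return None
        for (t0, _, _), (t1, _, _) in zip(tail, tail[1:]):
            if abs((t1 - t0) - self.period_ms) > self.jitter_ms:
                return None           # gap in the stream, samples not consecutive
        return tail[-1][0], [vals for _, vals, _ in tail]
```

Calling `push` for every new sampling point and then `latest_window` gives the second-by-second sliding update; the returned end timestamp is what the later steps use to derive the target prediction time.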
[0211] Step S6.3: Based on the dynamic normalization strategy of sliding window, the real-time time series feature slice sequence with fixed window length is subjected to data standardization processing to obtain standardized real-time time series feature samples.
[0212] Preferably, this step takes the real-time temporal feature slice sequence with a fixed window length output in step S6.2 as the processing object, strictly follows the sliding window dynamic normalization strategy in step S2, completes the standardization processing of real-time feature slices, ensures that the distribution of online input data is completely consistent with the distribution logic of offline training samples, and avoids the data distribution offset problem caused by operating condition fluctuations.
[0213] Specifically, for the current real-time feature slice window, the mean vector and standard deviation vector within the window are calculated strictly according to the calculation rules of S2.3, based on the feature dimensions. The calculation formula is completely consistent with that in the offline stage, so it will not be repeated here.
[0214] Specifically, based on the calculated mean vector and standard deviation vector within the window, the real-time feature slices are standardized dimension by dimension. The formula is completely consistent with step S2.4, so it will not be repeated here.
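Since the formulas of steps S2.3 and S2.4 are not repeated here, the following is only a minimal sketch of per-window standardization, assuming a z-score with the population standard deviation and a small constant `eps` to avoid division by zero. Both of these choices are assumptions and may differ from the offline definition.

```python
import math

def normalize_window(window, eps=1e-8):
    """Standardize a [seq_len][n_vars] slice using its own mean and std.

    Returns (normalized_window, means, stds), computed per feature dimension.
    """
    n_steps = len(window)
    n_vars = len(window[0])
    means = [sum(row[j] for row in window) / n_steps for j in range(n_vars)]
    stds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in window) / n_steps)
        for j in range(n_vars)
    ]
    normalized = [
        [(row[j] - means[j]) / (stds[j] + eps) for j in range(n_vars)]
        for row in window
    ]
    return normalized, means, stds
```

Because the statistics come from the current window only, a slow drift in operating conditions shifts the mean vector rather than the standardized inputs seen by the model.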
[0215] Step S6.4: Input the standardized real-time time-series feature samples into the tobacco shred outlet moisture prediction model to perform forward propagation inference calculation and obtain the normalized outlet moisture prediction value.
[0216] Preferably, the prediction model that has been solidified in step S5 and permanently locked in the evaluation mode can be used. All trainable parameters and grid nodes of the model have been fixed, the gradient calculation function has been permanently turned off, and there is no interference from any training-related mechanisms, ensuring the determinism and reproducibility of each inference.
[0217] For input format adaptation, the standardized real-time temporal feature samples can be converted into a tensor format suitable for the model. A batch dimension is added so that the tensor has shape [1, 12, 13], which matches the input format of offline training and validation. Then, following the logic of the training stage, the temporal feature slice is flattened into a one-dimensional vector of length 156 (12 × 13) to meet the input layer dimension requirement of the KAN network.
[0218] For forward propagation inference, the inter-layer propagation rules defined in step S3 are followed, and the forward propagation process is completely consistent with that of offline inference: the flattened feature vectors are sequentially transformed through the two-layer hybrid basis function KAN hidden layer, and finally the prediction result is generated through the linear output layer, without any additional feature transformation or structural adjustment.
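To make the hybrid basis function concrete, the sketch below evaluates one KAN layer in plain Python: each edge applies a weighted SiLU term plus a weighted cubic B-spline term, and each output node sums its incoming edges. The uniform extended knot vector, the `grid_size=5` default, and all function names are assumptions for illustration; the description's actual grid settings are defined in step S3.

```python
import math

def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

def make_knots(grid_size=5, order=3, lo=-1.0, hi=1.0):
    """Uniform knots on [lo, hi], extended by `order` intervals on each side."""
    h = (hi - lo) / grid_size
    return [lo + (i - order) * h for i in range(grid_size + 2 * order + 1)]

def bspline_basis(x, knots, order=3):
    """Cox-de Boor recursion; returns grid_size + order basis values at x."""
    b = [1.0 if knots[i] <= x < knots[i + 1] else 0.0 for i in range(len(knots) - 1)]
    for d in range(1, order + 1):
        b = [
            (x - knots[i]) / (knots[i + d] - knots[i]) * b[i]
            + (knots[i + d + 1] - x) / (knots[i + d + 1] - knots[i + 1]) * b[i + 1]
            for i in range(len(knots) - d - 1)
        ]
    return b

def kan_layer(x, layer, knots, order=3):
    """Evaluate one KAN layer.

    layer[j][i] = (w_base, w_spline, coeffs) for the edge from input i to output j.
    """
    bases = [bspline_basis(v, knots, order) for v in x]
    activations = [silu(v) for v in x]
    outputs = []
    for edges in layer:
        total = 0.0
        for i, (w_base, w_spline, coeffs) in enumerate(edges):
            spline = sum(c * b for c, b in zip(coeffs, bases[i]))
            total += w_base * activations[i] + w_spline * spline
        outputs.append(total)
    return outputs
```

Stacking two such layers of widths 156 → 64 → 64 and then applying a linear map to a single output would reproduce the [156, 64, 64, 1] topology described for this embodiment.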
[0219] Finally, inference is completed in a gradient-free computation environment, and the output is a standardized outlet moisture prediction value with dimension [1, 1], providing basic data for subsequent inverse normalization processing.
[0220] Step S6.5: Perform inverse normalization calculation on the normalized outlet moisture prediction value to obtain the outlet moisture prediction sequence under physical dimensions.
[0221] Preferably, the inverse normalization strictly follows the statistical parameters of the label standardization during the offline training phase, namely the global mean μ_y and the global standard deviation σ_y of the training set labels. These parameters have been bound to the model metadata in the model solidification stage of step S5, ensuring that the denormalization logic is the strict inverse of the offline standardization logic.
[0222] Preferably, the inverse normalization formula is the exact inverse of the offline label normalization formula. The inverse normalization formula is as follows: y_physical = ŷ_norm × σ_y + μ_y, where ŷ_norm is the standardized prediction value and y_physical is the predicted moisture content at the outlet after restoring the physical dimension.
[0223] Preferably, the predicted values after inverse normalization can be constrained within a reasonable range [8%, 20%] for the tobacco drying process, eliminating abnormal predicted values that exceed common process knowledge, thus ensuring the industrial applicability of the output results. The predicted values of physical dimensions generated second by second are spliced together in chronological order to form an outlet moisture prediction sequence under physical dimensions, providing a temporal basis for the extraction of the final predicted values.
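The inverse normalization and range constraint of step S6.5 amount to a two-line computation, sketched below. The function name is hypothetical, and clipping to the range boundaries is an assumption: the description only states that predictions are constrained to [8%, 20%] and does not say whether out-of-range values are clipped or discarded.

```python
def denormalize(y_norm, label_mean, label_std, low=8.0, high=20.0):
    """Map a standardized prediction back to moisture content in percent.

    label_mean, label_std : global statistics of the training-set labels
    low, high             : plausible process range in percent (clipping assumed)
    """
    y_physical = y_norm * label_std + label_mean   # inverse of the z-score
    return min(max(y_physical, low), high)         # constrain to the process range
```

For instance, with a label mean of 12.5 and a standard deviation of 0.4 (illustrative numbers only), a standardized prediction of 0.5 maps to about 12.7% moisture.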
[0224] Step S6.6: Extract the values at the corresponding prediction time from the predicted moisture content at the outlet in the physical dimension to obtain the predicted moisture content at the outlet during the tobacco drying process.
[0225] Preferably, based on the sliding window timestamp in step S6.2, the target prediction time corresponding to the current prediction value can be determined as the latest timestamp of the window + 1s (which perfectly matches the pred_step=1s of the offline training), accurately corresponding to the physical time when the material in the drying process arrives at the outlet, ensuring that the prediction value matches the timing of the actual production process.
[0226] Beneficial effects of steps S6.1 to S6.6: Step S6.1 provides a data source that is synchronized with the actual operating conditions in real time for online prediction, ensuring the consistency of input data with the production process and providing a real-time and effective data foundation for subsequent time-series slicing processing. Step S6.2 achieves the adaptation of real-time data with model input requirements, ensuring the structural consistency between online input samples and offline training samples, eliminating inference bias caused by time series length mismatch, and providing structured real-time samples that meet model requirements for subsequent standardization processing. Step S6.3 ensures the consistency of preprocessing logic between online data and offline training data, eliminates numerical distribution differences caused by operating condition fluctuations, avoids prediction performance degradation caused by data distribution offset, and provides a standardized and unified basis for model inference. 
Step S6.4 implements efficient inference calculations for real-time data, outputting prediction results consistent with offline training logic, ensuring the stability and real-time performance of the online prediction process, and providing basic prediction data for subsequent dimensional restoration. Step S6.5 restores the prediction results to physical dimensions that meet the requirements of industrial field applications, ensuring the interpretability and process adaptability of the prediction results on-site, and providing a dimensionally unified sequence basis for the extraction of the final prediction value. Step S6.6 outputs accurate prediction results corresponding to the prediction time, compensating for the lag of direct on-site measurement, providing a preliminary reference for the feedforward control of the tobacco drying process, and adapting to the real-time prediction needs of continuous production in industrial fields.
[0227] For example, the data used in this embodiment comes from the historical operating data of the actual production process of drying tobacco leaves for a certain brand in a cigarette factory. All data was collected from the process control system at the production site and exhibits typical characteristics of industrial process data.
[0228] All collected data are numerical time series data, sampled once per second (1 Hz). Under normal production conditions, each batch's production process lasts approximately 1 hour and 20 minutes, corresponding to approximately 4772 sample points. This data reflects relatively completely the dynamic changes in the tobacco drying process from start-up to stable operation.
[0229] This embodiment acquired data from 10 complete production batches, with a total of 47,698 records in the original state, including 28 process variables, covering multiple aspects such as material status, thermal parameters, and ventilation and dehumidification.
[0230] After completing the dataset partitioning, based on the feature selection results of step S1, 13 key process variables closely related to the moisture content at the outlet of the drying process were selected from the original 28 variables as model input features, and the moisture content of the outlet material was used as the prediction target variable.
[0231] To comprehensively evaluate the performance of this embodiment in the task of predicting moisture content at the tobacco drying outlet, this embodiment selects a variety of typical soft sensing models as comparison objects, covering traditional machine learning methods, convolutional and recurrent neural network models, and deep models based on attention mechanisms.
[0232] All comparative models were trained and tested under the same dataset partitioning, input variables, and time window conditions to ensure the fairness of the comparison.
[0233] Specific comparison models include: MLP: Multilayer feedforward neural network, as a basic nonlinear modeling method, is used to verify the effect of deep structures on improving prediction performance; CNN: One-dimensional convolutional neural network, which extracts local temporal features within a time window through convolution operations; LSTM: Long Short-Term Memory network, which can model long-term dependencies in time series; TCN: Temporal Convolutional Network, which uses dilated convolutional structures to model long-term dependencies. Transformer: A time series modeling model based on self-attention mechanism, capable of capturing global dependencies in long sequences; KAN: Kolmogorov-Arnold network based on learnable side functions.
[0234] In terms of model architecture, the KAN network adopts a deep stacked structure with a topology of [156, 64, 64, 1]. The input layer dimension is determined by the sliding window length (seq). len =12) and the number of feature variables (input) dim =13) The output layer is a single-dimensional moisture prediction value. For the core activation function configuration, a 3rd-order B-spline curve is used as the edge connection function, with the spline domain mapped to −1,1, and SiLU used as the residual basis activation function. Regarding training hyperparameters, the Adam optimizer is used, with a learning rate (LR) of 0.01, a batch size of 128, and a total of 30 training iterations. To ensure model sparsity and generalization performance, the regularization loss weight is set to 10. -4 The grid update weight is set to 0.02.
[0235] Referring to Figure 2, after training, the metrics for each model are shown in Table 1 below. Table 1: Model Indicator Comparison Table.
[0236]
[0237] As can be seen from the table, the MLP and CNN models have relatively limited prediction accuracy in this task and cannot fully characterize the complex nonlinear and dynamic characteristics of the tobacco drying process. After a time-series modeling mechanism is introduced, the prediction performance of models such as LSTM and TCN improves significantly, indicating that historical time-series information plays an important role in predicting the outlet moisture content.
[0238] The attention-based Transformer model further improves prediction accuracy, demonstrating its advantage in capturing long-term dependencies. However, the KAN model proposed in this application achieves the best results across all evaluation metrics, with the highest R² and the lowest RMSE on the test set. Notably, even with a significantly smaller parameter count than the Transformer, KAN still achieves superior prediction performance, demonstrating its clear advantages in function representation efficiency and generalization ability.
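For reference, the two metrics named above can be computed as in the following sketch. The function names are illustrative, and the moisture values in the usage lines are made-up numbers for demonstration only; they are not results from this application.

```python
import math
from typing import Sequence

def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between measured and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Illustrative values only (percent moisture), not measured data.
measured = [12.0, 12.5, 13.0, 12.8]
predicted = [12.1, 12.4, 12.9, 12.9]
print(rmse(measured, predicted), r2_score(measured, predicted))
```

A lower RMSE and an R² closer to 1 both indicate a closer fit on the test set.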
[0239] Beneficial effects of steps S1 to S6: Through the screening and sequence construction of core process variables, this method anchors the key process parameters related to the outlet moisture content of the drying process, avoids interference from redundant variables in the modeling process, and adapts to the strong multi-variable coupling of the drying process. Specifically, step S1 accurately screens the key process variables related to the outlet moisture content, eliminates redundant interference information, and ensures a strong correlation between the input features and the prediction target; step S2, through standardization adapted to the non-stationary characteristics of industrial data, strengthens the model's robustness to fluctuations across raw material batches and operating conditions, alleviating the performance degradation of traditional methods under operating condition shifts; step S3 constructs a network structure that efficiently characterizes the complex nonlinear mapping between process variables and outlet moisture, overcoming the limitations of traditional fixed-activation-function networks in nonlinear expression; step S4, through regularization constraints and the grid update strategy, completes the iterative optimization of model parameters while improving the model's generalization performance, reducing the risk of overfitting, and providing structural support for the model's inherent interpretability; step S5 completes the systematic verification of the model's generalization performance, ensuring predictive stability under unseen operating conditions and adapting to the needs of continuous production in industrial settings; step S6 realizes online real-time inference of outlet moisture, providing a preliminary reference for process control in the drying process, compensating for the lag of direct measurement, and, relying on the model's interpretability, providing reliable support for on-site process analysis and operation adjustment. Overall, the method achieves high-precision and high-reliability prediction of outlet moisture in the drying process, balancing the model's predictive performance with its practical value in industrial scenarios.
[0240] As shown in Figure 3, this embodiment provides a KAN network-based system for predicting the outlet moisture of the tobacco drying process. In this embodiment, the system implements the tobacco drying process outlet moisture prediction method described in the above embodiment.
[0241] Specifically, the tobacco drying process outlet moisture prediction system includes a tobacco drying data acquisition and screening module 1, a tobacco drying data processing module 2, a sample division and model construction module 3, a KAN prediction network iteration module 4, a KAN prediction network verification module 5, and a tobacco drying process outlet moisture prediction module 6, which are connected electrically or through communication in sequence.
[0242] The tobacco drying data acquisition and screening module 1 is used to collect the full-process time-series process data of the tobacco drying process, and to screen key process variables using the maximum correlation minimum redundancy algorithm to obtain the initial input feature sequence and the outlet moisture content target sequence. The tobacco drying data processing module 2 is used to standardize the initial input feature sequence and the outlet moisture content target sequence using a sliding window dynamic normalization strategy, and to slice the data into sliding time windows to obtain a standardized time-series feature sample set. The sample division and model construction module 3 is used to divide the standardized time-series feature sample set into a training set and a test set according to the production batch, and to construct a KAN prediction network that integrates SiLU basis functions and parameterized B-spline functions. The KAN prediction network iteration module 4 is used to introduce sparsity regularization constraints and an adaptive grid update strategy into the KAN prediction network, and to perform iterative optimization of network parameters through the training set to obtain the converged KAN prediction model. The KAN prediction network verification module 5 is used to verify the generalization performance of the converged KAN prediction model through the test set to obtain the tobacco shred outlet moisture prediction model. The tobacco drying process outlet moisture prediction module 6 is used to connect the tobacco shred outlet moisture prediction model to the real-time collected drying process time-series data and perform forward inference calculation to obtain the predicted value of the outlet moisture of the tobacco drying process.
[0243] Figure 4 is a schematic diagram of the structure of an electronic device according to an embodiment of this application. As shown in Figure 4, the electronic device 7 includes a processor 71 and a memory 72 coupled to the processor 71.
[0244] The memory 72 stores program instructions for implementing the KAN network-based method for predicting the outlet moisture content of tobacco drying process according to any of the above embodiments.
[0245] The processor 71 is used to execute program instructions stored in the memory 72 to perform KAN network-based prediction of the outlet moisture content of the tobacco drying process.
[0246] The processor 71 can also be referred to as a CPU (Central Processing Unit). The processor 71 may be an integrated circuit chip with signal processing capabilities. The processor 71 can also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. A general-purpose processor can be a microprocessor or any conventional processor.
[0247] Furthermore, Figure 5 is a schematic diagram of the structure of a storage medium according to an embodiment of this application. Referring to Figure 5, the storage medium 8 in this embodiment stores program instructions 81 capable of implementing all the above methods. These program instructions 81 can be stored in the storage medium as a software product, including several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) or processor to execute all or part of the steps of the methods in each embodiment of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks, or terminal devices such as computers, servers, mobile phones, and tablets.
[0248] In the several embodiments provided in this application, it should be understood that the disclosed systems, methods, and approaches can be implemented in other ways. For example, the system embodiments described above are merely illustrative; for instance, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection between systems or units may be electrical, mechanical, signal, or other forms.
[0249] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated units described above can be implemented in hardware or as software functional units. The above are merely embodiments of this application and do not limit the patent scope of this application. Any equivalent structural or procedural transformations made based on the description and drawings of this application, or direct or indirect applications in other related technical fields, are similarly included within the patent protection scope of this application.
Claims
1. A method for predicting the outlet moisture content of tobacco shreds during the drying process based on KAN networks, characterized in that the method includes: Step S1: Collect the full-process time-series process data of the tobacco drying process, and use the maximum correlation minimum redundancy algorithm to screen key process variables to obtain the initial input feature sequence and the outlet moisture content target sequence; Step S2: Based on the dynamic normalization strategy of the sliding window, perform data standardization and sliding time window slicing on the initial input feature sequence and the outlet moisture content target sequence to obtain a standardized time-series feature sample set; Step S3: Divide the standardized time-series feature sample set into a training set and a test set according to the production batch, and construct a KAN prediction network that integrates SiLU basis functions and parameterized B-spline functions; Step S4: Introduce sparsity regularization constraints and an adaptive grid update strategy into the KAN prediction network, and perform iterative optimization of network parameters using the training set to obtain the converged KAN prediction model; Step S5: Verify the generalization performance of the converged KAN prediction model using the test set to obtain the tobacco shred outlet moisture prediction model; Step S6: Connect the tobacco shred outlet moisture prediction model to the real-time collected drying process time-series data and perform forward inference calculation to obtain the predicted value of tobacco shred outlet moisture during the drying process.
2. The method for predicting the outlet moisture content of tobacco shreds during the drying process according to claim 1, characterized in that, Step S1: Collect the entire time-series process data of the tobacco drying process, and use the maximum correlation minimum redundancy algorithm to screen key process variables to obtain the initial input feature sequence and the target moisture content sequence at the outlet, including: Step S1.1: Collect the full-process time-series process data during the tobacco drying process to obtain the original process time-series dataset; Step S1.2: Perform timing alignment and standardization on the original process timing dataset to obtain a standardized timing process dataset; Step S1.3: Extract the time series data of process variables and the time series data of outlet moisture content from the standardized time series process dataset, and define them as the feature candidate variable set and the target variable sequence, respectively. Step S1.4: The maximum correlation and minimum redundancy algorithm is used to calculate the correlation and redundancy of the feature candidate variable set and the target variable sequence to obtain the maximum correlation and minimum redundancy score result for each candidate variable. Step S1.5: Sort and filter the set of feature candidate variables according to the maximum correlation and minimum redundancy score of each candidate variable to obtain the set of key process variables related to the outlet moisture content; Step S1.6: Extract time series data corresponding to the key process variable set from the standardized time series process dataset to obtain the initial input feature sequence; Step S1.7: Perform time-series dimension matching between the initial input feature sequence and the target variable sequence to obtain the initial input feature sequence and the outlet moisture content target sequence.
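Steps S1.4 and S1.5 can be illustrated with a minimal greedy sketch. It assumes that relevance and redundancy are both measured by the absolute Pearson correlation, used here as a simple stand-in because the claim does not state the measure (maximum correlation minimum redundancy screening is often formulated with mutual information). The variable names `drum_temp`, `drum_temp_copy`, and `air_flow` and their values are hypothetical.

```python
import math
from typing import Dict, List, Sequence

def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)

def mrmr_select(candidates: Dict[str, Sequence[float]],
                target: Sequence[float], k: int) -> List[str]:
    """Greedy selection: maximize relevance minus mean redundancy."""
    relevance = {name: abs(pearson(col, target)) for name, col in candidates.items()}
    selected: List[str] = []
    remaining = list(candidates)

    def score(name: str) -> float:
        if not selected:
            return relevance[name]
        redundancy = sum(abs(pearson(candidates[name], candidates[s]))
                         for s in selected) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical candidate variables (illustrative values only).
x1 = [1, -1, 1, -1, 1, -1, 1, -1]
x2 = [1, 1, -1, -1, 1, 1, -1, -1]
candidates = {
    "drum_temp": x1,
    "drum_temp_copy": [1, -1, 1, -1, 1, -1, 1, -0.5],  # nearly duplicates drum_temp
    "air_flow": x2,
}
target = [2 * a + b for a, b in zip(x1, x2)]
selected = mrmr_select(candidates, target, k=2)
print(selected)
```

In this toy example the near-duplicate variable is passed over in favor of a less relevant but non-redundant one, which is the behavior the screening step is intended to produce.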
3. The method for predicting the outlet moisture content of tobacco shreds during the drying process according to claim 1, characterized in that, Step S2, based on the dynamic normalization strategy of the sliding window, performs data standardization and sliding time window slicing on the initial input feature sequence and the target moisture content sequence at the outlet, to obtain a standardized time-series feature sample set, including: Step S2.1: Perform time-series dimension alignment and timestamp matching on the initial input feature sequence and the target moisture content sequence at the outlet; Step S2.2: Perform sliding time window slicing on the time-aligned initial input feature sequence and the outlet moisture content target sequence to obtain a time-series feature slice sequence with a fixed window length and a corresponding label sequence; Step S2.3: Perform window-by-window statistical calculations on the temporal feature slice sequence and the corresponding label sequence to obtain the mean vector and standard deviation vector corresponding to each sliding window; Step S2.4: Based on the dynamic normalization strategy of sliding window, perform window-by-window data standardization on the mean vector and standard deviation vector corresponding to each sliding window, as well as the temporal feature slice sequence and corresponding label sequence of the fixed window length, to obtain the standardized temporal feature slice sequence and standardized label sequence. Step S2.5: Perform sample dimension normalization on the standardized temporal feature slice sequence and standardized label sequence to obtain a standardized temporal feature sample set.
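Steps S2.2 to S2.5 can be sketched as follows. The sketch assumes per-window z-score normalization of each feature, a label taken as the target value at the last time step of the window, and label normalization with the mean and standard deviation of the target over the same window. These choices, the small constant `eps`, and the function names are illustrative assumptions; the claim does not fix them.

```python
import math
from typing import List, Sequence, Tuple

def mean_std(values: Sequence[float]) -> Tuple[float, float]:
    """Population mean and standard deviation of one window column."""
    m = sum(values) / len(values)
    return m, math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def make_samples(features: Sequence[Sequence[float]], target: Sequence[float],
                 seq_len: int = 12, eps: float = 1e-8
                 ) -> List[Tuple[List[float], float, float, float]]:
    """Slice sliding windows and standardize each window with its own statistics.

    Returns (flattened_features, normalized_label, label_mean, label_std) tuples.
    """
    samples = []
    n_feat = len(features[0])
    for start in range(len(features) - seq_len + 1):
        rows = features[start:start + seq_len]
        y_win = target[start:start + seq_len]
        stats = [mean_std([r[j] for r in rows]) for j in range(n_feat)]
        # Row-major flattening: seq_len * n_feat values per sample.
        flat = [(r[j] - stats[j][0]) / (stats[j][1] + eps)
                for r in rows for j in range(n_feat)]
        mu_y, sd_y = mean_std(y_win)
        label = (y_win[-1] - mu_y) / (sd_y + eps)
        samples.append((flat, label, mu_y, sd_y))
    return samples
```

With a window length of 12 and 13 selected variables, each flattened sample has 156 values, which matches the input layer dimension given in the description.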
4. The method for predicting the outlet moisture content of tobacco shreds during the drying process according to claim 1, characterized in that, Step S3: Divide the standardized time-series feature sample set into a training set and a test set according to the production batch, and construct a KAN prediction network that integrates SiLU basis functions and parameterized B-spline functions, including: Step S3.1: Extract the production batch identifier and time series data corresponding to the standardized time series feature sample set; Step S3.2: Sort all time-series data in the standardized time-series feature sample set according to the time sequence of the production batch identifiers to obtain the time-series sorted feature sample set; Step S3.3: Divide the time-series sorted feature sample set according to the time sequence of the production batch identifiers to obtain the training set and the test set; Step S3.4: Determine the network topology based on the Kolmogorov-Arnold representation theorem, and define the basic KAN network topology framework by combining the feature dimensions and output dimensions of the training set and the test set; Step S3.5: Introduce SiLU basis functions and parameterized B-spline functions based on the Cox-de Boor recursive formula on the network connection edges of the basic KAN network topology framework to obtain the hybrid basis function KAN network layer; Step S3.6: Stack the hybrid basis function KAN network layers in multiple layers and define the dimensional parameters of the input layer, hidden layer, and output layer to obtain a KAN prediction network that integrates SiLU basis functions and parameterized B-spline functions.
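The batch-based division in steps S3.1 to S3.3 can be sketched as below. The sketch assumes that batch identifiers sort chronologically (for example because they embed a date) and that the earliest 80% of batches form the training set. The 0.8 ratio, the `batch_id` key, and the identifier format are hypothetical and are not stated in the claims.

```python
from typing import Dict, List, Sequence, Tuple

def split_by_batch(samples: Sequence[Dict], train_ratio: float = 0.8
                   ) -> Tuple[List[Dict], List[Dict]]:
    """Split samples so that no production batch appears in both sets.

    Earlier batches go to the training set and later batches to the test set.
    """
    batch_ids = sorted({s["batch_id"] for s in samples})  # assumed chronological
    n_train = max(1, int(len(batch_ids) * train_ratio))
    train_batches = set(batch_ids[:n_train])
    train = [s for s in samples if s["batch_id"] in train_batches]
    test = [s for s in samples if s["batch_id"] not in train_batches]
    return train, test

# Hypothetical sample records: five batches with three windows each.
demo_samples = [{"batch_id": f"B2026010{b}", "window": w}
                for b in range(1, 6) for w in range(3)]
train_set, test_set = split_by_batch(demo_samples)
print(len(train_set), len(test_set))
```

Splitting by whole batches rather than by individual windows keeps overlapping windows from the same batch out of the test set, so the test score reflects behavior on batches the model has not seen.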
5. The method for predicting the outlet moisture content of tobacco shreds during the drying process according to claim 1, characterized in that, Step S4 involves introducing sparsity regularization constraints and an adaptive grid update strategy into the KAN prediction network, and iteratively optimizing the network parameters using the training set to obtain a converged KAN prediction model, including: Step S4.1: Construct a mean squared error loss function based on the KAN prediction network and the training set, and introduce a sparsity regularization constraint composed of L1 regularization and entropy regularization to obtain a total loss function with sparsity regularization constraint; Step S4.2: An adaptive grid update strategy is introduced into the total loss function to obtain the iterative optimization objective function adapted to the KAN prediction network; Step S4.3: The Adam optimizer is used to perform forward and backward propagation calculations on the iterative optimization objective function and the training set to obtain the updated network parameters after each iteration; Step S4.4: Synchronously update the edge function parameters and grid node parameters of the KAN prediction network according to the network parameter update value to obtain the iteratively updated KAN prediction network; Step S4.5: Perform iterative convergence verification on the iteratively updated KAN prediction network. Stop the iteration when the preset convergence condition is met, and obtain the converged KAN prediction model.
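The total loss of step S4.1 can be sketched as below, under the assumption that each layer is summarized by one non-negative magnitude per edge (for example the mean absolute activation of that edge), that the L1 term is the sum of these magnitudes, and that the entropy term is the Shannon entropy of the magnitudes normalized to sum to one. The overall weight of 1e-4 is the regularization loss weight given in the description; the relative weights `mu_l1` and `mu_entropy` and the function names are assumptions.

```python
import math
from typing import Sequence, Tuple

def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def sparsity_penalty(edge_norms: Sequence[float], eps: float = 1e-12
                     ) -> Tuple[float, float]:
    """Return (L1 term, entropy term) for the edge magnitudes of one layer."""
    l1 = sum(abs(v) for v in edge_norms)
    if l1 <= eps:
        return 0.0, 0.0
    probs = [abs(v) / l1 for v in edge_norms]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return l1, entropy

def total_loss(y_true: Sequence[float], y_pred: Sequence[float],
               layers_edge_norms: Sequence[Sequence[float]],
               lam: float = 1e-4, mu_l1: float = 1.0, mu_entropy: float = 1.0
               ) -> float:
    """MSE plus lam * sum over layers of (mu_l1 * L1 + mu_entropy * entropy)."""
    reg = 0.0
    for norms in layers_edge_norms:
        l1, entropy = sparsity_penalty(norms)
        reg += mu_l1 * l1 + mu_entropy * entropy
    return mse(y_true, y_pred) + lam * reg
```

The entropy term is largest when all edges carry equal magnitude and zero when a single edge dominates, so minimizing it together with the L1 term pushes the network toward a small number of active edges.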
6. The method for predicting the outlet moisture content of tobacco shreds during the drying process according to claim 1, characterized in that, Step S5: Verify the generalization performance of the converged KAN prediction model using the test set to obtain the tobacco shred outlet moisture prediction model, including: Step S5.1: Perform forward propagation inference calculation between the converged KAN prediction model and the test set to obtain the predicted outlet moisture content sequence corresponding to the test set; Step S5.2: Align the predicted outlet moisture content sequence with the actual outlet moisture content label sequence in terms of time series dimension to obtain the aligned predicted sequence and the actual label sequence; Step S5.3: Calculate the prediction error between the aligned prediction sequence and the real label sequence to obtain the generalization performance evaluation index of the converged KAN prediction model; Step S5.4: Perform convergence verification on the generalization performance evaluation index results to obtain the model performance verification results; Step S5.5: Based on the model performance verification results, the parameters of the converged KAN prediction model that meets the performance requirements are solidified to obtain the tobacco shred outlet moisture prediction model.
7. The method for predicting the outlet moisture content of tobacco shreds during the drying process according to claim 1, characterized in that, Step S6: Integrate the tobacco shred outlet moisture prediction model with the real-time collected drying process time-series data and perform forward inference calculations to obtain the predicted moisture value at the tobacco shred outlet during the drying process, including: Step S6.1: Collect real-time process time-series data of the tobacco drying process; Step S6.2: Align the real-time process time-series data stream with the tobacco shred outlet moisture prediction model in terms of time-series dimensions and perform sliding time window slicing to obtain a real-time time-series feature slice sequence with a fixed window length; Step S6.3: Based on the dynamic normalization strategy of the sliding window, the real-time time series feature slice sequence with fixed window length is subjected to data standardization processing to obtain standardized real-time time series feature samples; Step S6.4: Input the standardized real-time time series feature samples into the tobacco shred outlet moisture prediction model to perform forward propagation inference calculation and obtain the normalized outlet moisture prediction value; Step S6.5: Perform inverse normalization calculation on the normalized outlet moisture prediction value to obtain the outlet moisture prediction sequence under physical dimensions; Step S6.6: Extract the values at the corresponding prediction time from the predicted moisture content at the outlet in the physical dimension to obtain the predicted moisture content at the outlet of the tobacco drying process.
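Steps S6.2 to S6.5 can be sketched as a small streaming wrapper. The sketch assumes that the model is any callable mapping a flattened, window-normalized feature vector to a normalized moisture value, and that the statistics used for inverse normalization (`y_mean`, `y_std`) are supplied by the caller, since the claim does not state where they come from. The class name, its parameters, and the stub model in the usage lines are hypothetical.

```python
import math
from collections import deque
from typing import Callable, List, Optional, Sequence

class OnlinePredictor:
    """Buffers the latest seq_len rows, normalizes the window, and predicts."""

    def __init__(self, model: Callable[[List[float]], float], seq_len: int = 12,
                 y_mean: float = 0.0, y_std: float = 1.0, eps: float = 1e-8):
        self.model = model          # assumed: flat normalized window -> normalized value
        self.seq_len = seq_len
        self.y_mean = y_mean        # assumed inverse-normalization statistics
        self.y_std = y_std
        self.eps = eps
        self.buffer = deque(maxlen=seq_len)

    def push(self, row: Sequence[float]) -> Optional[float]:
        """Add one time step; return a prediction once the window is full."""
        self.buffer.append(list(row))
        if len(self.buffer) < self.seq_len:
            return None
        rows = list(self.buffer)
        n_feat = len(rows[0])
        means = [sum(r[j] for r in rows) / len(rows) for j in range(n_feat)]
        stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / len(rows))
                for j in range(n_feat)]
        flat = [(r[j] - means[j]) / (stds[j] + self.eps)
                for r in rows for j in range(n_feat)]
        z = self.model(flat)
        return z * self.y_std + self.y_mean   # back to physical units

# Usage with a stub model that always returns 0.5 (illustrative only).
predictor = OnlinePredictor(model=lambda x: 0.5, y_mean=12.5, y_std=0.2)
outputs = [predictor.push([float(t + j) for j in range(13)]) for t in range(12)]
print(outputs[-1])
```

The first eleven calls return `None` because the window is not yet full; from the twelfth time step onward each new row yields one prediction.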
8. A KAN network-based system for predicting the moisture content at the outlet of tobacco drying process, wherein the system implements the method for predicting the moisture content at the outlet of tobacco drying process as described in any one of claims 1 to 7, characterized in that, The tobacco drying process outlet moisture prediction system includes: The tobacco drying data acquisition and screening module is used to collect the full-process time-series process data of tobacco drying, and to screen key process variables through the maximum correlation and minimum redundancy algorithm to obtain the initial input feature sequence and the target sequence of outlet moisture content. The tobacco drying data processing module is used to perform data standardization and sliding time window slicing on the initial input feature sequence and the outlet moisture content target sequence based on a dynamic normalization strategy using a sliding window, so as to obtain a standardized time-series feature sample set. The sample division and model construction module is used to divide the standardized time-series feature sample set into a training set and a test set according to the production batch, and to build a KAN prediction network that integrates SiLU basis functions and parameterized B-spline functions. The KAN prediction network iteration module is used to introduce sparsity regularization constraints and an adaptive grid update strategy into the KAN prediction network, and to perform iterative optimization of network parameters through the training set to obtain the converged KAN prediction model. The KAN prediction network verification module is used to verify the generalization performance of the converged KAN prediction model through the test set, and obtain the tobacco shred outlet moisture prediction model.
The tobacco drying process outlet moisture prediction module is used to connect the tobacco outlet moisture prediction model to the real-time collected drying process time sequence data and perform forward inference calculations to obtain the predicted value of the tobacco drying process outlet moisture.
9. An electronic device, characterized in that, The electronic device includes a processor and a memory coupled to the processor, the memory storing program instructions executable by the processor; when the processor executes the program instructions stored in the memory, it implements the method for predicting the outlet moisture content of the tobacco drying process as described in any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores program instructions that, when executed by a processor, implement the method for predicting the moisture content at the outlet of the tobacco drying process as described in any one of claims 1 to 7.