Gas concentration prediction method based on static spectral characteristics
By combining time-domain and frequency-domain features in a deep learning model, the problems of environmental interference and inconsistent data length in traditional gas concentration monitoring methods are solved, achieving high-precision and robust gas concentration prediction.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- HARBIN NORMAL UNIVERSITY
- Filing Date
- 2025-07-17
- Publication Date
- 2026-06-23
AI Technical Summary
Traditional gas concentration monitoring methods are greatly affected by environmental conditions, making it difficult to effectively distinguish concentration changes in multi-gas mixtures. Furthermore, inconsistent sensor data lengths lead to information loss and decreased model performance.
Multiple measurements were performed using a gas-sensitive resistor sensor. By combining time-domain and frequency-domain features, a deep learning model was used to predict gas concentration. Missing value imputation and outlier handling were employed, a multi-head attention mechanism was introduced, and a deep learning network with no fixed length limit was designed.
The method improves the accuracy and robustness of gas concentration prediction, can handle variable-length time series data, and enhances the model's adaptability and generalization ability, performing especially well on complex signals and in multi-gas scenarios.
Abstract
Description
Technical Field
[0001] This invention relates to the field of gas concentration prediction.
Background Technology
[0002] Traditional gas concentration monitoring methods have been widely used in environmental protection and industrial production, relying primarily on chemical or physical sensors. However, these methods are easily affected by environmental conditions (such as temperature, humidity, and pressure), resulting in poor accuracy and stability of the monitoring results. Particularly in the monitoring of multi-gas mixtures, traditional techniques struggle to effectively distinguish concentration changes between different gases, and their real-time response capabilities are also significantly insufficient.
[0003] In recent years, deep learning technology has been gradually applied to gas concentration prediction. However, existing models are mainly based on time-domain features, neglecting the implicit information in the frequency domain of the signal. This single-perspective modeling approach struggles to capture the periodic changes of complex signals, thus limiting the model's applicability in diverse scenarios. Furthermore, in practical applications, sensor data lengths often vary, and conventional zero-padding or pruning strategies may introduce unnecessary noise and information loss, further reducing the model's predictive ability.
[0004] Furthermore, most current gas concentration prediction models typically require input data of uniform length. Since real-world sensor data is usually time-series data, and different sensors have varying acquisition periods and data lengths, directly processing variable-length data often requires zero-padding or pruning. This can lead to information loss or degraded model performance, resulting in poor accuracy in predicting gas concentrations.
Summary of the Invention
[0005] The purpose of this invention is to address the problem of poor accuracy in current methods for predicting gas concentrations, and to propose a gas concentration prediction method based on static spectral characteristics.
[0006] A gas concentration prediction method based on static spectral characteristics, the method comprising the following:
[0007] Step 1: Use a gas-sensitive resistor sensor to perform multiple repeated measurements on various single historical gases and various mixed historical gases. Each measurement of each single historical gas and each mixed historical gas outputs a set of physical quantity characteristics.
[0008] Each set of physical quantity features is combined to form a multidimensional time series, and the gas name and concentration label of each multidimensional time series are obtained. All multidimensional time series have the same dimension.
[0009] Step 2, Data Processing: Each multidimensional time series is processed using methods for missing value imputation and outlier handling to obtain each processed multidimensional time series;
[0010] Step 3, Feature Fusion: Based on the multiple processed multidimensional time series of each type of gas obtained from multiple measurements of each single gas and mixed gas, the time domain features and frequency domain features of the corresponding type of gas are obtained. The time domain features and frequency domain features of the corresponding type of gas are fused to obtain the multidimensional feature matrix of the corresponding type of gas.
[0011] Step 4: Use each multidimensional feature matrix as input data and the corresponding gas name and concentration label as output data to train the deep learning model and obtain the trained deep learning model.
[0012] Step 5: Use a gas-sensitive resistor sensor to repeatedly measure the gas in the test environment. The output multi-dimensional time series are processed and feature fused in sequence to obtain the multi-dimensional feature matrix of the test. The matrix is then input into the trained deep learning model to predict the gas name and concentration.
[0013] Preferably, all gas multidimensional time series have a dimension of 3, including the resistance value output by the gas resistive sensor, the voltage value applied to the gas resistive sensor, and the current value.
[0014] Preferably, in step 2, each multidimensional time series is processed using methods for missing value imputation and outlier handling, specifically as follows:
[0015] The system detects whether there are missing time points in each multidimensional time series. If so, it uses linear interpolation to obtain resistance interpolation based on the two resistance values at the two adjacent time points of the missing time point, voltage interpolation based on the two adjacent voltage values at the two adjacent time points of the missing time point, and current interpolation based on the two adjacent current values at the two adjacent time points of the missing time point. The resistance interpolation, voltage interpolation, and current interpolation are then added to the missing time point.
[0016] The algorithm detects whether a physical quantity feature in each multidimensional time series is an anomaly or missing value. If the voltage value at a certain moment is an anomaly or missing value, the voltage value is removed, the mean of the voltage values at all moments is calculated, and the mean is added to the position of the removed voltage value. If the resistance value at a certain moment is an anomaly or missing value, the resistance value is removed, the mean of the resistance values at all moments is calculated, and the mean is added to the position of the removed resistance value. If the current value at a certain moment is an anomaly or missing value, the current value is removed, the mean of the current values at all moments is calculated, and the mean is added to the position of the removed current value.
[0017] Preferably, the linear interpolation method is as follows:
[0018] $$X'_{i,m} = X_{i,j} + \frac{X_{i,j+1} - X_{i,j}}{t_{j+1} - t_j}\,(t_m - t_j)$$
[0019] In the formula, $X_{i,j}$ is the value of feature $X_i$ at time $j$, where $X_i$ represents the resistance, voltage, or current value; $X_{i,j+1}$ is the value of feature $X_i$ at time $j+1$; $t_{j+1}$ is time $j+1$; $t_j$ is time $j$; $t_m$ is the time corresponding to the missing or outlier value; and $X'_{i,m}$ is the interpolated value.
[0020] Preferably, the process of detecting whether a certain physical quantity feature in each multidimensional time series is an anomaly or missing value:
[0021] The resistance, voltage, and current values at each moment in each multidimensional time series are sequentially checked to see if they exceed the corresponding preset fluctuation range. If they do, they are determined to be abnormal values; otherwise, they are determined to be normal values.
[0022] Preferably, the upper and lower bounds of the preset fluctuation range are as follows:
[0023] Upper bound = Q3 + 1.5 × IQR, lower bound = Q1 - 1.5 × IQR
[0024] In the formula, IQR = Q3 - Q1, where IQR is the interquartile range, Q1 is the lower quartile, and Q3 is the upper quartile.
[0025] Preferably, in step 3, the time-domain features include obtaining the mean resistance value, mean voltage value, mean current value, standard deviation of resistance value, standard deviation of voltage value, standard deviation of current value, maximum resistance value, maximum voltage value, and maximum current value at each moment from multiple multidimensional time series obtained from multiple measurements.
[0026] Preferably, in step 3, the frequency domain features are expressed as:
[0027] $$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N}, \quad k = 0, 1, \ldots, N-1$$
[0028] In the formula, $X(k)$ represents the spectral component of the frequency domain signal at frequency index $k$, including amplitude-frequency and phase-frequency characteristics; $x(n)$ represents the sequence of physical quantities in the time-domain feature matrix, including the resistance value output by the gas-sensitive resistor sensor, the voltage value applied to the gas-sensitive resistor sensor, and the current value; $N$ represents the total number of time-domain data points; $k$ represents the frequency index, corresponding to different frequency components; and $e^{-j 2\pi k n / N}$ is the complex basis function.
[0029] Preferably, in step 4, the deep learning model includes convolutional layers, bidirectional LSTM layers, multi-head attention layers, flattening layers, fully connected layers, and an output layer.
[0030] Convolutional layers are used to convert multidimensional feature matrices into 128-dimensional vector sequences, which are then passed to bidirectional LSTM layers.
[0031] A bidirectional LSTM layer is used to convert a 128-dimensional vector sequence into a 256-dimensional vector sequence and pass it to the multi-head attention layer;
[0032] A multi-head attention layer is used to convert a 256-dimensional vector sequence into a three-dimensional temporal feature tensor, which is then passed to the flattening layer.
[0033] The flattening layer is used to convert the three-dimensional temporal feature tensor into a two-dimensional feature matrix and pass it to the fully connected layer;
[0034] Fully connected layers are used to convert two-dimensional feature matrices into 128-dimensional vector sequences, which are then passed to the output layer.
[0035] The output layer is used to establish a mapping relationship between the 128-dimensional vector sequence and the corresponding label.
[0036] Preferably, the method further includes step 6:
[0037] Model optimization: Calculate the loss function between the predicted gas name and concentration and the labels used to train the deep learning model, and use the Adam optimizer to optimize the parameters within the deep learning model so that the loss function reaches the threshold.
[0038] The beneficial effects of this invention are:
[0039] This invention has significant advantages in terms of technology and application, overcoming many limitations of traditional methods and existing technologies, as detailed below:
[0040] 1. This invention achieves comprehensive and multi-layered analysis of gas signals by combining in-depth mining of time-domain and frequency-domain features. Compared with traditional models that rely solely on time-domain features, the introduction of frequency-domain features effectively reveals the periodic patterns in the signal, enabling the model to more accurately predict concentration changes in complex gas mixtures. In experiments, the model incorporating frequency-domain features improved prediction accuracy by more than 25%, especially in non-stationary data scenarios.
[0041] 2. The dynamic feature weighting mechanism (frequency domain and time domain) designed in this invention dynamically adjusts the weight allocation of time domain and frequency domain features according to the actual distribution of the input data. This innovation not only improves the model's adaptability to different scenarios, but also significantly enhances its ability to capture key features, thereby greatly improving the model's robustness and generalization ability in multi-gas monitoring.
[0042] 3. The deep learning network architecture proposed in this invention has no fixed length limit and thus avoids the information loss that traditional zero-padding or pruning operations may cause. The architecture can directly process variable-length sequence data, effectively solving the problem of inconsistent data lengths from multiple sensors, thereby improving the prediction accuracy and stability of the model.
[0043] 4. In terms of model design, this invention further enhances the ability to focus on key features by introducing a multi-head attention mechanism. Compared with existing technologies, the multi-head attention mechanism shows significant advantages in complex multi-gas scenarios, enabling the model to maintain high-precision predictions even under conditions of high signal-noise or high data diversity.
[0044] 5. The overall technical solution of this invention has demonstrated superior performance in multiple experiments. Whether in prediction accuracy, robustness, or computational efficiency, this invention significantly outperforms existing technologies. In scenarios involving multi-gas mixtures, this method can accurately capture the concentration change trends of different gases, providing efficient and reliable technical support for environmental protection, industrial safety, and public health.
[0045] In summary, this invention has significant innovation and superiority in both the theoretical depth and practical application value of gas concentration prediction.
Attached Figure Description
[0046] Figure 1 is a flowchart of a gas concentration prediction method based on static spectral characteristics;
[0047] Figure 2 is a schematic diagram of the deep learning model structure.
Detailed Implementation
[0048] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0049] It should be noted that, unless otherwise specified, the embodiments and features described in the present invention can be combined with each other.
[0050] The present invention will be further described below with reference to the accompanying drawings and specific embodiments, but this is not intended to limit the scope of the invention.
[0051] Example:
[0052] A gas concentration prediction method based on static spectral characteristics, the method comprising the following:
[0053] Step 1: Use a gas-sensitive resistor sensor to perform multiple repeated measurements on various single historical gases and various mixed historical gases. Each measurement of each single historical gas and each mixed historical gas outputs a set of physical quantity characteristics.
[0054] Each set of physical quantity features is combined to form a multidimensional time series, and the gas name and concentration label of each multidimensional time series are obtained. All multidimensional time series have the same dimension.
[0055] Step 2, Data Processing: Each multidimensional time series is processed using methods for missing value imputation and outlier handling to obtain each processed multidimensional time series;
[0056] Step 3, Feature Fusion: Based on the multiple processed multidimensional time series of each type of gas obtained from multiple measurements of each single gas and mixed gas, the time domain features and frequency domain features of the corresponding type of gas are obtained. The time domain features and frequency domain features of the corresponding type of gas are fused to obtain the multidimensional feature matrix of the corresponding type of gas.
[0057] Step 4: Use each multidimensional feature matrix as input data and the corresponding gas name and concentration label as output data to train the deep learning model and obtain the trained deep learning model.
[0058] Step 5: Use a gas-sensitive resistor sensor to repeatedly measure the gas in the test environment. The output multi-dimensional time series are processed and feature fused in sequence to obtain the multi-dimensional feature matrix of the test. The matrix is then input into the trained deep learning model to predict the gas name and concentration.
[0059] Specifically, in step 1, each measurement is repeated multiple times. For example, oxygen is measured 3 times to obtain 3 sets of physical quantity characteristics. For mixed gases, such as a mixture of oxygen and hydrogen, the measurement is performed 3 times to obtain 3 sets of physical quantity characteristics.
[0060] Step 1 involves preparing training data. The collected gases are pre-prepared and can be single gases or mixtures of two or more gases. For example, we may need to pass the gas into the bottle containing the sensor for 5-20 seconds, or longer. The labels indicate the concentration and gas name; for example, a single gas label might indicate 5 ppm oxygen, and a mixed gas label might indicate 5 ppm 50% oxygen and 5 ppm 50% hydrogen.
[0061] Step 3 refers to the process of repeatedly collecting data on a single gas three times, and then processing the data to obtain three multidimensional time series. These three multidimensional time series are then used to obtain the time-domain and frequency-domain characteristics of the single gas. The same method applies to obtaining the time-domain and frequency-domain characteristics of other types of gases.
[0062] Furthermore, all multidimensional time series have a dimension of 3, including the resistance value output by the gas resistive sensor, the voltage value applied to the gas resistive sensor, and the current value.
[0063] Specifically, in the gas concentration prediction task, the input dataset originates from multiple measurements of different gases by multiple sensors. Each sensor performs 3-10 repeated measurements for a single gas (e.g., gas 1, gas 2 to gas n). For example, sensor 1 measures gas 1 3-10 times, generating the time-series data sequences {t1,R1,U1,I1}, {t2,R2,U2,I2}, …, {tn,Rn,Un,In}; sensor 2 performs 3-10 measurements on gas 2 using the same logic and forms the corresponding data structure. For mixed gas scenarios, the concentration of each component is labeled at measurement time (e.g., gas x at m% concentration mixed with gas y at n% concentration), and 3-10 repeated samplings are performed using the same sensor array. All raw data includes physical quantities such as resistance (R), voltage (U), and current (I) at different time points, with the gas concentration labels (single gas concentration or concentration of each component in a mixed gas) encoded implicitly in the file name. The output dataset is a standardized feature matrix formed after preprocessing, with each row corresponding to the multidimensional features of a single time point on a unified time axis (e.g., {t,R,U,I}), matched with the corresponding single gas concentration value or mixed gas concentration label, and used for training and prediction of the deep learning model.
[0064] Further specifying, in step 2, missing value imputation and outlier handling methods are used to process each multidimensional time series, specifically as follows:
[0065] The system detects whether there are missing time points in each multidimensional time series. If so, it uses linear interpolation to obtain resistance interpolation based on the two resistance values at the two adjacent time points of the missing time point, voltage interpolation based on the two adjacent voltage values at the two adjacent time points of the missing time point, and current interpolation based on the two adjacent current values at the two adjacent time points of the missing time point. The resistance interpolation, voltage interpolation, and current interpolation are then added to the missing time point.
[0066] The algorithm detects whether a physical quantity feature in each multidimensional time series is an anomaly or missing value. If the voltage value at a certain moment is an anomaly or missing value, the voltage value is removed, the mean of the voltage values at all moments is calculated, and the mean is added to the position of the removed voltage value. If the resistance value at a certain moment is an anomaly or missing value, the resistance value is removed, the mean of the resistance values at all moments is calculated, and the mean is added to the position of the removed resistance value. If the current value at a certain moment is an anomaly or missing value, the current value is removed, the mean of the current values at all moments is calculated, and the mean is added to the position of the removed current value.
[0067] Further specifying, the linear interpolation method is as follows:
[0068] $$X'_{i,m} = X_{i,j} + \frac{X_{i,j+1} - X_{i,j}}{t_{j+1} - t_j}\,(t_m - t_j)$$
[0069] In the formula, $X_{i,j}$ is the value of feature $X_i$ at time $j$, where $X_i$ represents the resistance, voltage, or current value; $X_{i,j+1}$ is the value of feature $X_i$ at time $j+1$; $t_{j+1}$ is time $j+1$; $t_j$ is time $j$; $t_m$ is the time corresponding to the missing or outlier value; and $X'_{i,m}$ is the interpolated value.
[0070] Specifically, whether it is a single gas or a gas mixture, the characteristics obtained at each moment consist of three features: resistance value, voltage value, and current value.
[0071] Suppose the time sequence is T = {t1, t2, ..., tN}, and each time point has three characteristic values: resistance, voltage, and current. If a time point tm is missing, the resistance values at the two adjacent time points on either side of it are substituted into the linear interpolation formula to obtain the interpolated resistance value, which is then added at the missing time point. The interpolated voltage and current values are obtained and added in the same way, so that the data for this time point is complete.
[0072] If any of the R, U, or I values at a time point ti is missing or out of range, it can be filled with the average of the sensor's historical data. For example, for a missing resistance value R, the average μR of all valid R values for the sensor is calculated and used to fill the missing position.
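The interpolation of a missing time point described in [0071] can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `interpolate_at` and `fill_missing_point`, the dictionary layout, and the sample values are all hypothetical.

```python
from bisect import bisect_left


def interpolate_at(times, values, t_m):
    """Linearly interpolate one physical quantity at the missing time t_m.

    Implements X' = X_j + (X_{j+1} - X_j) * (t_m - t_j) / (t_{j+1} - t_j),
    where t_j and t_{j+1} are the recorded neighbours of t_m.
    """
    right = bisect_left(times, t_m)           # index of t_{j+1}
    if right == 0 or right == len(times):
        raise ValueError("t_m must lie inside the recorded time range")
    left = right - 1                          # index of t_j
    slope = (values[right] - values[left]) / (times[right] - times[left])
    return values[left] + slope * (t_m - times[left])


def fill_missing_point(times, series, t_m):
    """Return interpolated R, U and I values for the missing time t_m."""
    return {name: interpolate_at(times, values, t_m)
            for name, values in series.items()}


# Hypothetical record sampled at t = 0, 1 and 3 s; the sample at t = 2 s is missing.
times = [0.0, 1.0, 3.0]
series = {
    "R": [100.0, 110.0, 130.0],   # resistance
    "U": [5.0, 5.0, 5.0],         # voltage
    "I": [0.050, 0.046, 0.038],   # current
}
filled = fill_missing_point(times, series, 2.0)
print(filled)
```

Each quantity is interpolated independently from its own two neighbours, which matches the per-feature wording of the description.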
[0073] Further, the process of detecting whether a certain physical quantity feature in each multidimensional time series is an anomaly or missing value:
[0074] The resistance, voltage, and current values at each moment in each multidimensional time series are sequentially checked to see if they exceed the corresponding preset fluctuation range. If they do, they are determined to be abnormal values; otherwise, they are determined to be normal values.
[0075] Further defining the upper and lower bounds of the preset fluctuation range, they are as follows:
[0076] Upper bound = Q3 + 1.5 × IQR, lower bound = Q1 - 1.5 × IQR
[0077] In the formula, IQR = Q3 - Q1, where IQR is the interquartile range, Q1 is the lower quartile, and Q3 is the upper quartile.
[0078] Specifically, Q1 is the lower quartile (25th percentile), the value below which 25% of the data fall after the data is sorted from smallest to largest; Q3 is the upper quartile (75th percentile), the value below which 75% of the sorted data fall; IQR is the interquartile range, used to measure the dispersion of the data, reflecting the fluctuation range of the middle 50% of the data. The boundary value refers to the value at a specific position in the sorted data (it may be a single value from the original data or a weighted average of two adjacent values).
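The IQR rule above, followed by mean replacement of flagged values, might look like the following sketch. It assumes the quartiles are computed by linear interpolation between sorted values (`method="inclusive"` in Python's `statistics.quantiles`) and that the fill value is the mean of the valid, non-outlier samples, as [0072] suggests; the source does not fix either choice. The function names and readings are hypothetical.

```python
from statistics import mean, quantiles


def iqr_bounds(values):
    """Return (lower, upper) = (Q1 - 1.5 * IQR, Q3 + 1.5 * IQR)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def replace_outliers_with_mean(values):
    """Replace values outside the IQR bounds with the mean of the valid values."""
    lower, upper = iqr_bounds(values)
    valid = [v for v in values if lower <= v <= upper]
    fill = mean(valid)  # assumption: mean over valid samples only
    return [v if lower <= v <= upper else fill for v in values]


# Hypothetical resistance readings with one spike.
resistance = [100, 101, 102, 103, 104, 105, 106, 107, 500]
lower, upper = iqr_bounds(resistance)
cleaned = replace_outliers_with_mean(resistance)
print(lower, upper, cleaned[-1])
```

For these readings Q1 = 102 and Q3 = 106, so the accepted range is 96 to 112 and the spike at 500 is replaced by 103.5.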
[0079] Further specifying step 3, the time-domain features include the mean resistance value, mean voltage value, mean current value, standard deviation of resistance value, standard deviation of voltage value, standard deviation of current value, maximum resistance value, maximum voltage value, and maximum current value at each moment from multiple multidimensional time series obtained from multiple measurements.
[0080] Specifically, each data collection yields a set of multidimensional time series data. After data processing, each processed multidimensional time series is obtained, with the following format:
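[0081] $$\begin{bmatrix} t_1 & R_1 & U_1 & I_1 \\ t_2 & R_2 & U_2 & I_2 \\ \vdots & \vdots & \vdots & \vdots \\ t_N & R_N & U_N & I_N \end{bmatrix}$$
[0082] Each row represents the complete features of a sampling point: $t_i$ is the time point, and $R_i$, $U_i$, and $I_i$ are the resistance, voltage, and current values at that time point, respectively.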
[0083] For example, if oxygen is measured three times, three processed multidimensional time series are obtained. The average of the three resistance values, three voltage values, and three current values at time t1 in these three processed multidimensional time series is taken. The maximum and minimum values of the three resistance values, the maximum and minimum values of the three current values, and the maximum and minimum values of the three voltage values at time t1 are also taken. The standard deviations of the voltage, current, and resistance values at time t1 are then calculated. These calculated values are used as the time-domain features at time t1. The process for calculating the time-domain features at other times is the same.
[0084] The mean, $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$, reflects the average level of the signal, where $x_i$ are the time-domain data points and $N$ is the total number of data points.
[0085] The standard deviation measures the amplitude of signal fluctuations;
[0086] The minimum value min(x) and the maximum value max(x) define the range of signal variation.
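As a rough sketch of the per-time-point statistics described in [0083] to [0086], the snippet below computes the mean, standard deviation, minimum, and maximum across repeated measurements that share a time axis. It assumes the population form of the standard deviation (`pstdev`), which the source does not specify; `time_domain_features` and the sample runs are hypothetical.

```python
from statistics import mean, pstdev


def time_domain_features(runs):
    """Compute per-time-point statistics across repeated measurements.

    runs: list of repeated measurements of one quantity (e.g. resistance),
          each a list of values on the same time axis.
    """
    features = []
    for samples in zip(*runs):        # values of all runs at one time point
        features.append({
            "mean": mean(samples),
            "std": pstdev(samples),   # assumption: population standard deviation
            "min": min(samples),
            "max": max(samples),
        })
    return features


# Hypothetical resistance values from three repeated measurements at t1 and t2.
runs = [
    [100.0, 104.0],
    [102.0, 106.0],
    [104.0, 108.0],
]
feats = time_domain_features(runs)
print(feats[0])
```

The same function would be applied separately to the voltage and current series to obtain the full set of time-domain features.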
[0087] Further specifying, in step 3, the frequency domain features are represented as:
[0088] $$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N}, \quad k = 0, 1, \ldots, N-1$$
[0089] In the formula, $X(k)$ represents the spectral component of the frequency domain signal at frequency index $k$, including amplitude-frequency and phase-frequency characteristics; $x(n)$ represents the sequence of physical quantities in the time-domain feature matrix, including the resistance value output by the gas-sensitive resistor sensor, the voltage value applied to the gas-sensitive resistor sensor, and the current value; $N$ represents the total number of time-domain data points; $k$ represents the frequency index, corresponding to different frequency components; and $e^{-j 2\pi k n / N}$ is the complex basis function.
[0090] Specifically, the frequency domain features are obtained using the Discrete Fourier Transform (DFT).
[0091] After frequency domain feature extraction, the positive frequency portion is combined with time domain statistical features (mean, standard deviation, etc.) to form a complete feature vector. Among them, the amplitude frequency feature reveals the periodicity of the signal (such as the frequency characteristics of gas concentration fluctuations), and the phase frequency feature reflects the phase shift of the signal. The combination of the two can more comprehensively describe the dynamic change trend of gas concentration.
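To make the amplitude-frequency and phase-frequency features concrete, here is a minimal pure-Python sketch that evaluates the DFT sum directly and keeps the positive-frequency part. The test signal and the function names `dft` and `spectral_features` are hypothetical; in practice an FFT routine would normally replace the direct sum.

```python
import cmath
import math


def dft(x):
    """Direct DFT: X(k) = sum_n x(n) * exp(-j * 2 * pi * k * n / N)."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]


def spectral_features(x):
    """Return amplitude |X(k)| and phase angle of X(k) for k = 0 .. N/2."""
    X = dft(x)
    half = len(x) // 2 + 1            # positive-frequency portion
    amplitude = [abs(c) for c in X[:half]]
    phase = [cmath.phase(c) for c in X[:half]]
    return amplitude, phase


# Hypothetical signal: constant offset 2 plus one cosine cycle over 8 samples.
x = [2 + math.cos(2 * math.pi * n / 8) for n in range(8)]
amp, ph = spectral_features(x)
print([round(a, 6) for a in amp])
```

For this signal the energy sits in bin 0 (the offset) and bin 1 (the single cosine cycle), and the phase of bin 1 is approximately zero because the component is a pure cosine.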
[0092] In the input data for building deep learning models, the fused matrix is a multi-dimensional feature matrix that combines time-domain and frequency-domain features, and its structure is as follows:
[0093] Time-domain feature matrix (including statistical features):
[0094]
[0095] Frequency domain feature matrix (including amplitude and phase frequency features):
[0096]
[0097] $|X_i(f_k)|$ represents the amplitude value (amplitude-frequency characteristic) at frequency $f_k$ of the physical quantities (R, U, I) corresponding to time point $t_i$; $\angle X_i(f_k)$ represents the phase value (phase-frequency characteristic) at frequency $f_k$. Both are calculated by the Discrete Fourier Transform (DFT): $$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N}$$ where the magnitude $|X(k)|$ of $X(k)$ is the amplitude-frequency characteristic, the argument $\angle X(k)$ is the phase-frequency characteristic, and $x(n)$ is the sequence of time-domain physical quantities. $f_1, f_2, \ldots, f_k$ are the extracted discrete frequency points (such as the first k main frequency components).
[0098] Each row of the fusion matrix corresponds to a single time point on a unified time axis, and each column contains the time-domain statistical characteristics, frequency-domain amplitude-frequency characteristics, and phase-frequency characteristics of that time point.
[0099] Further specifying, in step 4, the deep learning model includes convolutional layers, bidirectional LSTM layers, multi-head attention layers, flattening layers, fully connected layers, and an output layer.
[0100] Convolutional layers are used to convert multidimensional feature matrices into 128-dimensional vector sequences, which are then passed to bidirectional LSTM layers.
[0101] A bidirectional LSTM layer is used to convert a 128-dimensional vector sequence into a 256-dimensional vector sequence and pass it to the multi-head attention layer;
[0102] A multi-head attention layer is used to convert a 256-dimensional vector sequence into a three-dimensional temporal feature tensor, which is then passed to the flattening layer.
[0103] The flattening layer is used to convert the three-dimensional temporal feature tensor into a two-dimensional feature matrix and pass it to the fully connected layer;
[0104] Fully connected layers are used to convert two-dimensional feature matrices into 128-dimensional vector sequences, which are then passed to the output layer.
[0105] The output layer is used to establish a mapping relationship between the 128-dimensional vector sequence and the corresponding label.
[0106] Specifically, deep learning model construction:
[0107] The model input is a preprocessed fusion feature matrix, which includes time-domain statistical features and frequency-domain features:
[0108] Dimension: 3D tensor of shape (batch size, time steps, feature dimension), where the batch size is the number of samples processed in a single forward / backward propagation
[0109] Matrix form:
[0110]
[0111] Data Stream: Batch processing example of 128 samples (actual shape: (128, n, d), where n = time step, d = feature dimension).
[0112] 1. Convolutional Layer (Conv1D)
[0113] Input:
[0114] Shape: (128, n, d) (128 samples × n time steps × d-dimensional features)
[0115] Data type: 3D floating-point tensor
[0116] Operation:
[0117] 128 convolutional kernels slide along the time axis
[0118] Kernel size = 6, stride = 1, activation function ReLU
[0119] Calculation example (single time step):
[0120]
[0121] Output:
[0122] Shape: (128, n-5, 128)
[0123] (The number of time steps is reduced by 5 because the kernel size is 6)
[0124] Matrix example (single sample):
[0125]
[0126] Interlayer interaction: Extracting local feature patterns and converting the original features into a high-dimensional representation, providing BiLSTM with spatially rich input.
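The output length n-5 follows from sliding a kernel of size 6 with stride 1 and no padding. The sketch below shows that arithmetic for a single filter with ReLU activation; the model described above uses 128 such filters. The function name `conv1d_valid`, the constant weights, and the input sizes are hypothetical, and the absence of padding is an assumption inferred from the stated output shape.

```python
def conv1d_valid(sequence, kernel, bias=0.0):
    """Apply one 1D convolution filter along the time axis, then ReLU.

    sequence: list of n feature vectors (each of length d)
    kernel:   list of k weight vectors (each of length d)
    Returns n - k + 1 activations (stride 1, no padding).
    """
    k = len(kernel)
    outputs = []
    for start in range(len(sequence) - k + 1):
        total = bias
        for offset in range(k):
            total += sum(w * v for w, v in zip(kernel[offset], sequence[start + offset]))
        outputs.append(max(0.0, total))   # ReLU activation
    return outputs


# Hypothetical sizes: n = 20 time steps, d = 3 features, kernel size 6.
n, d, k = 20, 3, 6
sequence = [[1.0] * d for _ in range(n)]
kernel = [[0.1] * d for _ in range(k)]
out = conv1d_valid(sequence, kernel)
print(len(out))   # n - 5 time steps remain
```

Stacking the outputs of 128 independent filters at each position gives the 128-dimensional vector per time step that is passed to the BiLSTM layer.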
[0127] 2. Bidirectional LSTM layer (BiLSTM)
[0128] Input:
[0129] Shape: (128, n-5, 128)
[0130] Data (single time step): [0.24, 1.72, ..., 0.98] (128-dimensional vector)
[0131] Operation:
[0132] Forward LSTM: Processes sequences from left to right
[0133]
[0134] Backward LSTM: Processes sequences from right to left
[0135]
[0136] Output splicing:
[0137]
[0138] Output:
[0139] Shape: (128, n-5, 256)
[0140] (Forward 128 dimensions + Backward 128 dimensions)
[0141] Matrix example (single sample):
[0142]
[0143] Interlayer effects: Capture long-term temporal dependencies, expanding the local features of CNNs into high-level representations that include bidirectional context.
[0144] 3. Multi-head attention layer
[0145] Input:
[0146] Shape: (128, n-5, 256)
[0147] Data: 256-dimensional vector sequence output by BiLSTM
[0148] Operation:
[0149] Linear projection generates the Q / K / V matrix:
[0150] $Q = XW^Q$, $K = XW^K$, $V = XW^V$, where X is the input feature matrix, $W^Q$ is the projection weight matrix for the query, $W^K$ is the projection weight matrix for the key, and $W^V$ is the projection weight matrix for the value;
[0151] 8-head attention parallel computation:
[0152] $\mathrm{head}_i = \mathrm{softmax}\left(\frac{Q_i K_i^T}{\sqrt{32}}\right) V_i$, where $\mathrm{head}_i$ is the output of the i-th attention head, softmax is the function used to normalize the weights, $Q_i$ is the query matrix corresponding to the i-th attention head, $K_i$ is the key matrix corresponding to the i-th attention head, $V_i$ is the value matrix corresponding to the i-th attention head, $\sqrt{32}$ is the scaling factor, and 32 is the feature dimension of each attention head;
[0153] Multi-head output concatenation:
[0154] $Z = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_8)W^O$, where Concat is the concatenation operation used to merge the outputs of the 8 attention heads, $W^O$ is the output projection weight matrix, and Z is the final output of the multi-head attention mechanism.
[0155] Output:
[0156] Shape: (128, n-5, 256)
[0157] (Same number of time steps as the input)
[0158] Example matrix (single time step): [0.18, 0.02, ..., -0.11] (256-dimensional weighted features)
[0159] Interlayer interaction: Focusing on key features, suppressing noise, and providing purified feature representations for fully connected layers.
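The projection, per-head scaled dot-product attention, and concatenation steps can be sketched in NumPy as follows. The sketch assumes self-attention over the BiLSTM output, 8 heads of dimension 256 / 8 = 32, random projection matrices, and a reduced batch of 4 samples; the function name `multi_head_attention` is hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, Wq, Wk, Wv, Wo, num_heads=8):
    """Self-attention over x of shape (batch, steps, d_model)."""
    batch, steps, d_model = x.shape
    d_head = d_model // num_heads  # 256 / 8 = 32

    def split_heads(t):
        # (batch, steps, d_model) -> (batch, heads, steps, d_head)
        return t.reshape(batch, steps, num_heads, d_head).transpose(0, 2, 1, 3)

    q, k, v = split_heads(x @ Wq), split_heads(x @ Wk), split_heads(x @ Wv)
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)       # (batch, heads, steps, steps)
    heads = weights @ v                      # (batch, heads, steps, d_head)
    concat = heads.transpose(0, 2, 1, 3).reshape(batch, steps, d_model)
    return concat @ Wo, weights

rng = np.random.default_rng(0)
batch, steps, d_model = 4, 15, 256           # small batch for illustration
x = rng.normal(size=(batch, steps, d_model))
Wq, Wk, Wv, Wo = (rng.normal(scale=0.05, size=(d_model, d_model)) for _ in range(4))

attn_out, attn_weights = multi_head_attention(x, Wq, Wk, Wv, Wo)
print(attn_out.shape)  # (4, 15, 256): same time steps and width as the input
```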
[0160] 4. Flatten
[0161] Input:
[0162] Shape: (128, n-5, 256)
[0163] Data: 3D attention output tensor
[0164] Operation:
[0165] Dimensional transformation: (batch, timesteps, features) → (batch, timesteps × features)
[0166] Mathematical expression: reshape(Z, (128, (n-5)×256))
[0167] Output:
[0168] Shape: (128, (n-5)×256)
[0169] Example matrix (single sample): [0.18, 0.02, ..., -0.11, 0.15, ...] (length = (n-5) × 256)
[0170] Interlayer effect: Spatiotemporal feature vectorization to adapt to the processing format of fully connected layers.
[0171] 5. Fully Connected Layer (Dense)
[0172] Input:
[0173] Shape: (128, (n-5)×256)
[0174] Data: Flattened one-dimensional feature vector
[0175] Operation:
[0176] Nonlinear transformation:
[0177] $y = \max(\theta, WX + b)$, where y is the output after the nonlinear transformation, i.e. the feature representation after activation (a ReLU-like operation in which max implements threshold activation); θ is the threshold parameter that controls the starting point of activation, analogous to the "0" threshold in ReLU; WX is the matrix product of the weight matrix W and the input feature X (the linear transformation part); and b is the bias term, which shifts the result of the linear transformation to help the model fit more complex data;
[0178] Weight matrix:
[0179] Here m is the number of rows in the weight matrix, and n is the feature dimension (or feature sequence length, depending on the specific model input) before the fully connected layer; n is the basic parameter from which the number of rows m is derived.
[0180] Output:
[0181] Shape: (128, 128)
[0182] Matrix example:
[0183]
[0184] Interlayer interaction: Feature space compression and abstraction, extracting the core pattern for concentration prediction.
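The flattening and fully connected steps reduce to a reshape followed by a thresholded affine map. The sketch below assumes n-5 = 15 time steps, θ = 0 (plain ReLU), random weights, and a row-vector convention (`flat @ W`) rather than the column convention WX used in the text; the two are equivalent up to a transpose.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, steps, feat = 128, 15, 256        # steps = n-5 with an illustrative n = 20
z = rng.normal(size=(batch, steps, feat))

# Flatten: (batch, timesteps, features) -> (batch, timesteps * features)
flat = z.reshape(batch, steps * feat)

# Dense layer with threshold activation y = max(theta, XW + b)
W = rng.normal(scale=0.01, size=(steps * feat, 128))
b = np.zeros(128)
theta = 0.0                              # assumed threshold, equivalent to ReLU
dense_out = np.maximum(theta, flat @ W + b)

print(flat.shape)       # (128, 3840)
print(dense_out.shape)  # (128, 128)
```

Because the weight matrix has (n-5)×256 rows, its size depends on the number of time steps that reach the flatten layer.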
[0185] 6. Output Layer (Dense)
[0186] Input:
[0187] Shape: (128, 128)
[0188] Data: 128-dimensional features output by the fully connected layer
[0189] Operation:
[0190] Linear regression:
[0191] $\hat{y} = W_{out}x + b_{out}$, where $\hat{y}$ is the predicted value output by the model, i.e. the prediction of the true label y in a regression task, computed from the input x and the learned parameters $W_{out}$ and $b_{out}$. $W_{out}$ is the weight matrix (or weight vector) of the output layer; when x is a vector, the product reduces to a vector multiplication in simple scenarios. The model learns and adjusts it during training to perform a linear transformation on the input x, reflecting the linear relationship between the input features and the output prediction. x is the feature vector (or feature matrix, depending on the specific dimensions) input to the output layer, i.e. the feature representation passed on by the preceding network layers. $b_{out}$ is the bias term of the output layer; it participates in the linear calculation together with the weight $W_{out}$, makes the linear transformation more flexible, and is learned and updated synchronously during training.
[0192] Weight matrix:
[0193] (p = output dimension)
[0194] Output:
[0195] Single gas: shape (128, 1), scalar concentration values $[0.85, 0.92, \dots, 0.78]^T$;
[0196] Gas mixture: Shape (128, 2), Two-component concentration:
[0197]
[0198] Interlayer interaction: completes the final mapping from features to concentration.
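The only difference between the single-gas and mixed-gas cases is the output dimension p of the final linear map. A minimal NumPy sketch, assuming random weights and the hypothetical helper name `output_layer`:

```python
import numpy as np

def output_layer(features, W_out, b_out):
    """Linear regression head: no activation is applied."""
    return features @ W_out + b_out

rng = np.random.default_rng(0)
features = rng.normal(size=(128, 128))   # output of the fully connected layer

# Single gas: p = 1
single = output_layer(features, rng.normal(size=(128, 1)), np.zeros(1))
# Two-component mixture: p = 2
mixture = output_layer(features, rng.normal(size=(128, 2)), np.zeros(2))

print(single.shape)   # (128, 1)
print(mixture.shape)  # (128, 2)
```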
[0199] Through hierarchical feature transformation, the model achieves an end-to-end mapping from raw sensor data to concentration prediction. The output of each layer serves as the structured input for the next layer, forming a progressively abstract feature processing pipeline.
[0200] Further specifying, the method also includes step 6:
[0201] Model optimization: Using the predicted gas names and concentrations and the labels used to train the deep learning model, the loss function is calculated, and the Adam optimizer is used to optimize the parameters within the deep learning model so that the loss function reaches the threshold.
[0202] Specifically, the loss function:
[0203] The loss function quantifies the difference between the model-predicted gas concentration value and the actual concentration label. It is primarily calculated using the mean squared error (MSE), and the formula is as follows:
[0204] $$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2$$
[0205] where $\hat{y}_i$ is the predicted concentration (a scalar for a single gas and a component concentration vector for a gas mixture), output by the model based on the fusion of time-domain and frequency-domain features (such as amplitude/phase frequency features extracted by Fourier transform and time-domain statistical features); $y_i$ is the actual concentration value (single gas concentration or concentrations of each component in a gas mixture) extracted from the sensor data file name and used as the label during preprocessing; and N is the number of samples processed in a batch (e.g., batch size 128). By calculating the mean of the squared differences, this function transforms the prediction error into an optimizable scalar value that directly guides the model to learn the concentration mapping relationship.
[0206] Furthermore, to avoid overfitting during high-dimensional feature training, an L2 regularization term is introduced into the loss function:
[0207] $$L = \mathrm{MSE} + \lambda \sum_{w} w^2$$
[0208] Where λ is the regularization strength, and w represents the weight parameters of network layers such as convolutional layers and BiLSTM layers. By penalizing the sum of squared weights, the complexity of the model is reduced, and the ability to generalize to data in different scenarios is improved.
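The regularized loss can be illustrated with a few lines of NumPy. This is a sketch under assumptions: the penalty is taken as λ times the plain sum of squared weights, for mixtures the squared error is averaged over all components, and the toy predictions, labels, and weights are made up for the example.

```python
import numpy as np

def mse_loss(y_pred, y_true):
    """Mean of the squared differences over all samples (and components)."""
    return float(np.mean((y_pred - y_true) ** 2))

def l2_penalty(weights, lam):
    """lam times the sum of squared entries over all weight arrays."""
    return lam * sum(float(np.sum(w ** 2)) for w in weights)

def regularized_loss(y_pred, y_true, weights, lam):
    return mse_loss(y_pred, y_true) + l2_penalty(weights, lam)

# Toy values, chosen only to make the arithmetic easy to follow.
y_pred = np.array([0.8, 0.9])
y_true = np.array([1.0, 0.5])
weights = [np.array([1.0, 2.0]), np.array([3.0])]

loss = regularized_loss(y_pred, y_true, weights, lam=0.01)
print(loss)  # MSE 0.10 plus penalty 0.01 * 14 = 0.14, about 0.24 in total
```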
[0209] The core objectives and implementation mechanisms of optimization:
[0210] The core of the optimization is to iteratively adjust the learnable parameters in the neural network (such as convolutional kernel weights, BiLSTM unit parameters, fully connected layer weights and biases) to gradually reduce the loss function value, ultimately achieving a precise mapping from multi-dimensional sensor features to gas concentration. This is specifically implemented using the Adam optimizer.
[0211] First, the gradient $g_t$ is obtained by differentiating the loss function with respect to the parameters, which indicates the direction of the parameter update; then the first moment $m_t$ and the second moment $v_t$ accumulate the mean and variance of the gradient ($m_t = \beta_1 m_{t-1} + (1-\beta_1) g_t$, $v_t = \beta_2 v_{t-1} + (1-\beta_2) g_t^2$, with β1 = 0.9 and β2 = 0.999 to suppress noise), and through
[0212] $$\hat{m}_t = \frac{m_t}{1-\beta_1^t}, \quad \hat{v}_t = \frac{v_t}{1-\beta_2^t}$$
[0213] the initial bias is corrected; finally,
[0214] $$w_t = w_{t-1} - \eta \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \varepsilon}$$
[0215] updates the parameters (η is the learning rate, and $\varepsilon = 10^{-8}$ prevents division by zero). This gives different parameters independent adaptive learning rates and efficiently handles the convergence requirements of high-dimensional inputs such as frequency-domain features.
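A single Adam step, following the standard update rule with the stated β1 = 0.9, β2 = 0.999, and ε = 10⁻⁸, can be sketched as below. The quadratic toy objective (w − 3)², the learning rate of 0.1, and the function name `adam_step` are assumptions chosen only to show the update converging.

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step counter."""
    m = beta1 * m + (1 - beta1) * g          # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * g ** 2     # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Toy problem: minimize (w - 3)^2, whose gradient is 2 * (w - 3).
w = np.array([0.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 501):
    g = 2.0 * (w - 3.0)
    w, m, v = adam_step(w, g, m, v, t, lr=0.1)

print(w)  # close to the minimizer 3.0
```

On the first step the bias-corrected ratio has magnitude close to 1, so the parameter moves by roughly the learning rate regardless of the gradient scale.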
[0216] The optimization process is also supplemented by an early stopping mechanism. By monitoring the loss of the validation set, training is terminated when the loss does not decrease for K consecutive rounds, thus avoiding overfitting to noise in the training data. At the same time, the dynamic feature weighting mechanism in the model architecture works in conjunction with the optimization process to automatically adjust the weight allocation of time-domain and frequency-domain features according to the characteristics of the input data. This makes the optimization direction more focused on the feature dimensions that play a key role in concentration prediction, further improving the prediction accuracy and model robustness in multi-gas mixing and variable-length data scenarios.
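The early stopping rule can be expressed as a small patience counter. This sketch assumes that "does not decrease" means no strict improvement over the best validation loss seen so far; the function name and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the index of the epoch at which training stops, or None.

    Training stops once the validation loss has failed to improve on the
    best value seen so far for `patience` consecutive epochs.
    """
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None

# Hypothetical validation losses: the best value (0.60) occurs at epoch 2.
losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.60, 0.63, 0.64]
print(early_stopping_epoch(losses, patience=5))  # 7
print(early_stopping_epoch(losses, patience=3))  # 5
```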
[0217] Model evaluation:
[0218] The model evaluation phase employs the classic metric system for regression tasks, including mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), and coefficient of determination (R²), to quantify prediction performance; these metrics are consistent with the design of the loss function. MSE quantifies the overall prediction deviation and is more sensitive to large errors; RMSE restores the error to the original concentration scale, which facilitates intuitive evaluation in engineering scenarios;
[0219] MAE computes the mean absolute error and is more robust to outliers; R² measures the model's ability to explain the variation in the data, and the closer its value is to 1, the better the prediction performance.
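The four metrics can be computed directly from predictions and labels. The sketch below uses the standard definitions, with R² taken as 1 minus the ratio of the residual sum of squares to the total sum of squares; the concentration values are made up for illustration.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    err = y_pred - y_true
    mse = float(np.mean(err ** 2))
    rmse = float(np.sqrt(mse))               # same unit as the concentration
    mae = float(np.mean(np.abs(err)))
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    return {"mse": mse, "rmse": rmse, "mae": mae, "r2": r2}

# Hypothetical labels and predictions, each off by 0.05.
y_true = np.array([0.2, 0.4, 0.6, 0.8])
y_pred = np.array([0.25, 0.35, 0.65, 0.75])

metrics = regression_metrics(y_true, y_pred)
print(metrics)  # mse 0.0025, rmse 0.05, mae 0.05, r2 0.95 (up to rounding)
```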
[0220] The experimental data used the processed sensor sequences, divided into training, validation, and test sets in a 7:1:2 ratio. Data preprocessing strictly followed the procedure of linear interpolation to align the time axis, the IQR method to detect outliers, and mean filling of missing values. Batch processing used a batch size of 128, and the input 3D tensor (number of samples × time steps × feature dimension) matched the requirements of the Conv1D (convolutional) layer. During training, the Adam optimizer (β1 = 0.9, β2 = 0.999) was used with an initial learning rate of $10^{-3}$. In addition, the dynamic feature weighting mechanism automatically integrated time-domain (mean, standard deviation, etc.) and frequency-domain (amplitude-frequency, phase-frequency) features, and training was terminated if the validation loss did not decrease for 5 consecutive rounds.
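A 7:1:2 split can be sketched as an index partition. The text does not say whether samples were shuffled before splitting, so the random permutation, the seed, the sample count of 1000, and the function name `split_indices` are all assumptions.

```python
import numpy as np

def split_indices(num_samples, ratios=(0.7, 0.1, 0.2), seed=0):
    """Partition sample indices into training, validation, and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(num_samples)       # assumed: shuffle before splitting
    n_train = int(round(ratios[0] * num_samples))
    n_val = int(round(ratios[1] * num_samples))
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]             # remainder goes to the test set
    return train, val, test

train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))  # 700 100 200
```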
[0221] By employing time-frequency feature fusion, a multi-head attention mechanism, and a variable-length input design, the model reaches an R² of 0.982 in single-gas concentration prediction, an average accuracy above 95% for two-component mixed gases, a cross-scenario generalization ability more than 30% higher than that of traditional methods, and a real-time response latency below 50 ms in industrial tests. Its end-to-end feature processing flow provides a high-precision and robust technical solution for real-time monitoring of multiple gases, effectively solving the problems described in the background technology: environmental interference, insufficient ability to distinguish multiple gases, and inefficient processing of variable-length data in traditional methods.
[0222] While the invention has been described herein with reference to specific embodiments, it should be understood that these embodiments are merely examples of the principles and applications of the invention. Therefore, it should be understood that many modifications can be made to the exemplary embodiments, and other arrangements can be designed without departing from the spirit and scope of the invention as defined by the appended claims. It should be understood that different dependent claims and features described herein can be combined in ways different from those described in the original claims. It is also understood that features described in conjunction with individual embodiments can be used in other described embodiments.
Claims
1. A method for gas concentration prediction based on static spectral features, characterized in that, The method includes the following: Step 1: Use a gas-sensitive resistor sensor to perform multiple repeated measurements on various single historical gases and various mixed historical gases. Each measurement of each single historical gas and each mixed historical gas outputs a set of physical quantity characteristics. Each set of physical quantity features is combined to form a multidimensional time series, and the gas name and concentration label of each multidimensional time series are obtained. All multidimensional time series have the same dimension. Step 2, Data Processing: Each multidimensional time series is processed using methods for missing value imputation and outlier handling to obtain each processed multidimensional time series; Step 3, Feature Fusion: Based on the multiple processed multidimensional time series of each type of gas obtained from repeated measurements of each single gas and mixed gas, the time domain features and frequency domain features of the corresponding type of gas are obtained. The time domain features and frequency domain features of the corresponding type of gas are fused to obtain the multidimensional feature matrix of the corresponding type of gas. Step 4: Use each multidimensional feature matrix as input data and the corresponding gas name and concentration label as output data to train the deep learning model and obtain the trained deep learning model. Step 5: Use a gas-sensitive resistor sensor to repeatedly measure the gas in the test environment. The output multi-dimensional time series are processed and feature fused in sequence to obtain the multi-dimensional feature matrix of the test. The matrix is then input into the trained deep learning model to predict the gas name and concentration. 
In step 3, the time-domain features include obtaining the mean resistance value, mean voltage value, mean current value, standard deviation of resistance value, standard deviation of voltage value, standard deviation of current value, maximum resistance value, maximum voltage value, and maximum current value at each moment from multiple multidimensional time series obtained from multiple measurements; In step 3, the frequency domain features are represented as: $X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N}$, In the formula, $X(k)$ is a spectral component of the frequency-domain signal at frequency index $k$, containing amplitude and phase characteristics, $x(n)$ is a sequence of physical quantities in a time-domain feature matrix, including resistance values of a gas-sensitive resistance sensor output, voltage values applied to the gas-sensitive resistance sensor, and current values, $N$ is the total number of time-domain data points, $k$ is a frequency index, corresponding to different frequency components, $e^{-j 2\pi k n / N}$ is a complex basis function; In step 4, the deep learning model includes convolutional layers, bidirectional LSTM layers, multi-head attention layers, flattening layers, fully connected layers, and an output layer. Convolutional layers are used to convert multidimensional feature matrices into 128-dimensional vector sequences, which are then passed to bidirectional LSTM layers. A bidirectional LSTM layer is used to convert a 128-dimensional vector sequence into a 256-dimensional vector sequence and pass it to the multi-head attention layer; A multi-head attention layer is used to convert a 256-dimensional vector sequence into a three-dimensional temporal feature tensor, which is then passed to the flattening layer. The flattening layer is used to convert the three-dimensional temporal feature tensor into a two-dimensional feature matrix, which is then passed to the fully connected layer.
Fully connected layers are used to convert two-dimensional feature matrices into 128-dimensional vector sequences, which are then passed to the output layer. The output layer is used to establish a mapping relationship between the 128-dimensional vector sequence and the corresponding label.
2. The gas concentration prediction method based on static spectral characteristics according to claim 1, characterized in that, All multidimensional time series have a dimension of 3, including the resistance value output by the gas resistive sensor, the voltage value applied to the gas resistive sensor, and the current value.
3. The gas concentration prediction method based on static spectral characteristics according to claim 2, characterized in that, In step 2, missing value imputation and outlier handling methods are used to process each multidimensional time series, specifically as follows: The system detects whether there are missing time points in each multidimensional time series. If so, it uses linear interpolation to obtain resistance interpolation based on the two resistance values at the two adjacent time points of the missing time point, voltage interpolation based on the two adjacent voltage values at the two adjacent time points of the missing time point, and current interpolation based on the two adjacent current values at the two adjacent time points of the missing time point. The resistance interpolation, voltage interpolation, and current interpolation are then added to the missing time point. The algorithm detects whether a physical quantity feature in each multidimensional time series is an anomaly or missing value. If the voltage value at a certain moment is an anomaly or missing value, the voltage value is removed, the mean of the voltage values at all moments is calculated, and the mean is added to the position of the removed voltage value. If the resistance value at a certain moment is an anomaly or missing value, the resistance value is removed, the mean of the resistance values at all moments is calculated, and the mean is added to the position of the removed resistance value. If the current value at a certain moment is an anomaly or missing value, the current value is removed, the mean of the current values at all moments is calculated, and the mean is added to the position of the removed current value.
4. The gas concentration prediction method based on static spectral characteristics according to claim 3, characterized in that, The linear interpolation method is as follows: $x(t) = x_i + \frac{x_{i+1} - x_i}{t_{i+1} - t_i}(t - t_i)$, In the formula, $x_i$ is the feature at the $i$-th moment, $x$ being a resistance, voltage, or current value, $x_{i+1}$ is the feature at the $(i+1)$-th moment, $t_i$ is the $i$-th time, $t_{i+1}$ is the $(i+1)$-th time, $t$ is the time corresponding to the missing or outlier value, and $x(t)$ is the interpolated value.
5. The gas concentration prediction method based on static spectral characteristics according to claim 3, characterized in that, The process of detecting whether a certain physical quantity feature in each multidimensional time series is an anomaly or missing value: The resistance, voltage, and current values at each moment in each multidimensional time series are sequentially checked to see if they exceed the corresponding preset fluctuation range. If they do, they are determined to be abnormal values; otherwise, they are determined to be normal values.
6. The gas concentration prediction method based on static spectral characteristics according to claim 5, characterized in that, The upper and lower bounds of the preset fluctuation range are as follows: $[Q_1 - 1.5 \cdot IQR,\ Q_3 + 1.5 \cdot IQR]$, In the formula, $IQR = Q_3 - Q_1$, $IQR$ is the interquartile range, $Q_1$ is the lower quartile, and $Q_3$ is the upper quartile.
7. The gas concentration prediction method based on static spectral characteristics according to claim 1, characterized in that, The method further includes step 6: Model optimization: Calculate the loss function between the predicted gas name and concentration and the labels used to train the deep learning model, and use the Adam optimizer to optimize the parameters within the deep learning model so that the loss function reaches the threshold.