Time sequence classification analysis method and system for glass quality influence factors
A technology in the field of time series and influencing-factor analysis, applied in manufacturing computing systems, data processing applications, instruments, etc., which addresses problems such as energy consumption, quality impact, and reduced oxygen content in the kiln
Active Publication Date: 2021-05-04
WUHAN UNIV OF TECH
AI-Extracted Technical Summary
Problems solved by technology
On the other hand, glass production quality is also affected by fluctuations in furnace pressure.
If the kiln pressure is too high, the oxygen content in the kiln will be reduced, causing a corresponding impact. If the kiln pressure is too low, cold air will enter the kiln, and the preset temperature cannot be rea...
In a specific implementation, the features with higher scores can be analyzed further for feature selection, and the corresponding variables can be controlled manually or by automatic control technology, so as to better control the factors on the glass production line and improve glass quality.
Step 5, model construction: build time series classification models with the random forest, XGBoost, and LightGBM algorithms respectively. In the data set of the present invention, glass quality has 5 categories in total, and each input time series corresponds to one output category. The present invention uses the training set to train the models and the validation set to verify the performance of the trained models. The three models are trained multiple times with different training parameters until the best-performing model, i.e. the one with the highest accuracy on the validation set, is obtained.
The present invention provides a model for time series classification and analysis of the factors affecting glass quality. It combines machine learning models with the glass manufacturing process, analyzes and models time series data such as furnace temperature, pressure, natural gas flow, and oxygen flow, and identifies the key factors affecting glass quality, thereby improving glass quality and production yield. The random forest algorithm and the tree-based ensemble models XGBoost and LightGBM of the present invention classify the time series data against the glass quality data. ...
The invention provides a time series classification analysis method and system for factors influencing glass quality. The method comprises: obtaining the raw time series data collected by each sensor on a glass production line together with the corresponding glass quality data, and labeling the glass quality data according to a glass quality index; segmenting the time series data so that the segments correspond to the glass quality labels; constructing features from the processed time series data through data analysis and identifying relatively important time series features; dividing the data into a training set and a validation set, building time series classification models with random forest, XGBoost, and LightGBM respectively, and training the models iteratively; and combining the importance scores obtained with the Permutation Importance feature selection method with the importance scores obtained from each model's feature importance function, weighted by the prediction accuracies of the random forest, XGBoost, and LightGBM models, to obtain an analysis of the factors influencing glass quality, so that the corresponding factors on the glass production line can be controlled.
The technical solution of the present invention will be specifically described below with reference to the accompanying drawings and examples.
The present invention provides a model for time series classification and analysis of the factors affecting glass quality. It combines machine learning models with the glass manufacturing process, analyzes and models temperature, pressure, natural gas flow, oxygen flow, and other data, and identifies the key factors affecting glass quality, thereby increasing the glass production yield. The random forest algorithm and the tree-based ensemble models XGBoost and LightGBM of the present invention classify the time series data against the glass quality data. During classification, a tree-based model evaluates splits on each feature and chooses the optimal split point to grow a tree, thereby classifying the time series. After training, the three models also rank the features by importance, and the present invention can analyze and control the important features selected by the models, thereby better improving glass quality.
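To make the idea of an "optimal split point" concrete, the sketch below finds the best threshold on a single feature by minimizing the weighted Gini impurity, the criterion used by CART-style decision trees such as those in a typical random forest. This is an illustrative assumption, not the method of the text: XGBoost and LightGBM choose splits with a gradient-based gain instead, and gini and best_split are hypothetical helpers.

```python
from typing import Dict, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity 1 - sum(p_k^2) of a set of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts: Dict[int, int] = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(x: Sequence[float], y: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, weighted Gini) of the best rule 'x <= threshold' on one feature."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best = (float("nan"), float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal values cannot be separated by a threshold
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            # Place the threshold midway between the two neighbouring values.
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, score)
    return best
```

For example, best_split([3.0, 1.0, 4.0, 2.0], [1, 0, 1, 0]) returns (2.5, 0.0): splitting at 2.5 separates the two classes perfectly. A tree repeats this search over every feature and keeps the best split at each node.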
Referring to Figure 1, the embodiment of the present invention provides a time series classification method for analyzing the factors influencing glass quality, comprising the following steps:
Step 1, data acquisition: obtain the raw time series data and the glass quality data, and label the glass quality data according to the glass quality indicators:
Data acquisition is divided into acquisition of raw time series data and acquisition of glass quality data.
Time series data acquisition: the temperature, gas flow, pressure, etc. of the melting kiln, annealing kiln, and other equipment are measured every 10 s, and the data from each sensor are collected into a database. Variables with a relatively large influence on glass quality, such as the temperature, pressure, and gas flow of the melting kiln, are then exported from the database.
Glass quality data acquisition: glass quality data, such as the defect types of each glass sheet and the number of defects per sheet, are obtained to make the labels. Because the data obtained by the present invention are parameters describing glass quality, the glass quality must be graded according to these parameters; the embodiment preferably uses five grades, i.e. 5 categories.
The main glass quality defect variables are bubble length, bubbles, two inclusion variables, and point defects. In the embodiment, glass quality is graded according to these five variables: the fewer the defects, the better the glass quality. Each of the five variables is first min-max normalized over all glass sheets to the [0, 1] interval; the five normalized variables of each sheet are then summed, and the sums are normalized once more, so that the defects of each sheet are represented by a single value in [0, 1]. A defect value below 0.2 is grade 1, 0.2 to 0.4 is grade 2, 0.4 to 0.6 is grade 3, 0.6 to 0.8 is grade 4, and 0.8 to 1.0 is grade 5; the higher the grade, the worse the glass quality.
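A minimal sketch of this grading rule, assuming min-max normalization for both normalization steps (the text only says values are scaled to [0, 1]) and half-open grade intervals [0, 0.2), [0.2, 0.4), ..., with grade 5 covering [0.8, 1.0]. The names min_max, grade_glass, and GRADE_BOUNDS are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence

GRADE_BOUNDS = [0.2, 0.4, 0.6, 0.8]  # upper edges of grades 1-4; grade 5 covers [0.8, 1.0]


def min_max(values: Sequence[float]) -> List[float]:
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def grade_glass(defects: Sequence[Sequence[float]]) -> List[int]:
    """defects[i] holds the five defect variables of glass sheet i; returns grades 1-5."""
    n_sheets, n_vars = len(defects), len(defects[0])
    # Normalize each defect variable across all sheets.
    columns = [min_max([row[j] for row in defects]) for j in range(n_vars)]
    # Sum the normalized variables of each sheet, then normalize the sums once more.
    totals = min_max([sum(col[i] for col in columns) for i in range(n_sheets)])
    # bisect_right counts how many bounds are <= t, so t = 0.2 falls into grade 2.
    return [bisect_right(GRADE_BOUNDS, t) + 1 for t in totals]
```

Comparing against explicit bounds, rather than computing int(t / 0.2), avoids floating-point surprises at the boundaries (for instance 0.6 / 0.2 evaluates slightly below 3).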
In Step 1, because the sensors collect many kinds of data, some of which (such as switch states, control valve signals, and device fault signals) are irrelevant, these can be removed first.
Step 2, data processing: segment the time series data into many time series fragments, each corresponding to a glass quality label:
First, the time series data such as temperature, pressure, and gas flow are analyzed, for example their maximum, minimum, and mean values and whether any values are missing. Missing data in a series are then filled with the value at the previous or next adjacent time step, and abnormal data are replaced with the mean. After filling, the time series are segmented into fragments, each corresponding to a glass quality label. The glass quality labels are the quality grades assigned in Step 1.
In the embodiment, the temperature, gas flow, pressure, and other time series measured and stored by the sensors contain some missing and abnormal data. Missing values are preferably filled with the value from the previous time step, because these variables do not change much between adjacent samples, and abnormal values are filled with the mean. In addition, the glass quality labels may only be stored every 10 minutes, whereas the temperature, pressure, gas flow, and other sensor data are saved every 10 s, so the time series must be segmented so that each sequence fragment corresponds to one glass quality label.
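The cleaning and segmentation steps can be sketched as follows, under stated assumptions: missing readings are represented as None, an "abnormal" value is taken to be one outside a plausible range [lo, hi] supplied by the caller (the text does not define the outlier rule), and each label covers 10 min / 10 s = 60 consecutive samples. clean_series, segment, and SAMPLES_PER_LABEL are hypothetical names.

```python
from typing import List, Optional, Sequence

SAMPLES_PER_LABEL = 60  # 10-minute label interval / 10-second sampling period


def clean_series(raw: Sequence[Optional[float]], lo: float, hi: float) -> List[float]:
    """Fill gaps from adjacent samples, then replace out-of-range values with the mean."""
    filled = list(raw)
    # Forward fill: a missing sample takes the previous sample's value.
    for i in range(1, len(filled)):
        if filled[i] is None:
            filled[i] = filled[i - 1]
    # Backward fill: leading gaps take the next available value.
    for i in range(len(filled) - 2, -1, -1):
        if filled[i] is None:
            filled[i] = filled[i + 1]
    # Mean of the in-range values, used to replace abnormal readings.
    valid = [v for v in filled if lo <= v <= hi]
    mean = sum(valid) / len(valid)
    return [v if lo <= v <= hi else mean for v in filled]


def segment(series: Sequence[float], window: int = SAMPLES_PER_LABEL) -> List[List[float]]:
    """Cut a series into consecutive non-overlapping windows; a trailing partial window is dropped."""
    n_windows = len(series) // window
    return [list(series[k * window:(k + 1) * window]) for k in range(n_windows)]
```

For example, clean_series([None, 1.0, None, 3.0, 999.0], 0.0, 100.0) yields [1.0, 1.0, 1.0, 3.0, 1.5]. In practice the windows would be aligned to the label timestamps rather than simply counted from the first sample.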
Step 3, feature engineering: construct features from the time series data processed in Step 2, and identify relatively important features through data analysis.
Preferably, the embodiment of the present invention uses the tsfresh time series feature library to extract and analyze a larger set of time series features.
tsfresh is a Python package for extracting features from time series data. It automatically computes a large number of time series features, ranging from basic descriptors such as the number of peaks, the mean, or the maximum to more complex ones such as time-reversal symmetry statistics.
In the embodiment, features are constructed from the processed data and the more important ones are identified, such as the maximum, minimum, mean, variance, and number of peaks of each time series. For example, the tsfresh function extract_features() can extract more than 1000 features from each time series; select_features() then selects among the constructed features, and in the end only 300 important features are retained.
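For illustration, the basic features named above can be computed per segment with the Python standard library. This is a simplified stand-in for tsfresh, not its implementation: here a "peak" is any sample strictly greater than both neighbours, whereas tsfresh's number_peaks calculator takes a support parameter n. basic_features is a hypothetical name.

```python
import statistics
from typing import Dict, Sequence


def basic_features(x: Sequence[float]) -> Dict[str, float]:
    """Compute a few simple descriptors of one time series segment."""
    # A "peak" here is a sample strictly greater than both of its neighbours.
    peaks = sum(1 for i in range(1, len(x) - 1) if x[i - 1] < x[i] > x[i + 1])
    return {
        "maximum": float(max(x)),
        "minimum": float(min(x)),
        "mean": float(statistics.mean(x)),
        "variance": float(statistics.pvariance(x)),  # population variance
        "number_peaks": float(peaks),
    }
```

With tsfresh installed, extract_features computes these and many more descriptors for every segment, and select_features(X, y) filters them with hypothesis tests against the labels.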
Step 4, data set division: to evaluate the model's predictions, the data produced by feature engineering are divided into a training set and a validation set at a ratio of 8:2. The 80% training set is used to train the models, and the 20% validation set is used to verify the training results.
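A minimal sketch of the 8:2 division, assuming a seeded random shuffle (the text does not say whether the split is random, stratified by grade, or chronological). scikit-learn's train_test_split(..., test_size=0.2) does the same job; train_val_split here is a hypothetical standard-library helper.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_val_split(samples: Sequence[T], train_ratio: float = 0.8,
                    seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle sample indices with a fixed seed and cut them at train_ratio."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    cut = round(len(samples) * train_ratio)
    return [samples[i] for i in indices[:cut]], [samples[i] for i in indices[cut:]]
```

Fixing the seed makes the division reproducible, so repeated training runs in Step 5 are compared on the same validation set.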
Step 5, model construction: build time series classification models with the random forest, XGBoost, and LightGBM algorithms respectively. In the data set of the present invention, glass quality has 5 categories in total, and each input time series corresponds to one output category. The training set is used to train the models and the validation set to evaluate their performance. Each of the three models is trained multiple times with different training parameters until the model with the best accuracy on the validation set is obtained.
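The repeated training described above amounts to a search over parameter settings scored on the validation set. A model-agnostic sketch, assuming each candidate model is exposed as a function train_fn(X, y, **params) that returns a predictor; with XGBoost, for instance, train_fn might wrap XGBClassifier(**params).fit(X, y). select_best, train_fn, and the parameter grid are hypothetical, and the text does not say which parameters were tuned.

```python
from typing import Any, Callable, Dict, Iterable, List, Sequence, Tuple

Row = List[float]
Model = Callable[[Row], int]


def accuracy(model: Model, X: Sequence[Row], y: Sequence[int]) -> float:
    """Fraction of rows whose predicted grade equals the true grade."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def select_best(param_grid: Iterable[Dict[str, Any]],
                train_fn: Callable[..., Model],
                X_tr: Sequence[Row], y_tr: Sequence[int],
                X_va: Sequence[Row], y_va: Sequence[int]) -> Tuple[Dict[str, Any], float]:
    """Retrain once per parameter setting and keep the one with the best validation accuracy."""
    best_params: Dict[str, Any] = {}
    best_acc = -1.0
    for params in param_grid:
        model = train_fn(X_tr, y_tr, **params)  # fit on the training set only
        acc = accuracy(model, X_va, y_va)       # score on the validation set
        if acc > best_acc:
            best_params, best_acc = params, acc
    return best_params, best_acc
```

The same loop is run separately for random forest, XGBoost, and LightGBM, giving one best model (and its validation accuracy) per algorithm.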
Among them, the random forest algorithm is an extended variant of Bagging that further introduces random attribute selection into the training of each decision tree. XGBoost is a boosting ensemble method that improves on GBDT by adding a regularization term for model complexity. GBDT uses the negative gradient of the loss as an approximation of the residual and fits that residual. XGBoost also fits the residuals, but models the loss with a Taylor expansion and adds regularization to the loss function. LightGBM is an efficient distributed framework implementing the GBDT algorithm: it grows decision trees with a leaf-wise splitting strategy, searches for feature split points using histograms, supports parallel learning, and can process large data sets more efficiently. Training with these three tree-based models makes the results of the present invention more robust.
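For reference, the second-order Taylor approximation of the objective that XGBoost minimizes when adding the t-th tree f_t is shown below, in the notation of the XGBoost documentation rather than of this text: l is the training loss, ŷ_i^(t-1) the prediction after t-1 trees, T the number of leaves of f_t, w_j its leaf weights, and γ and λ regularization parameters. The constant term l(y_i, ŷ_i^(t-1)) is omitted.

```latex
\mathcal{L}^{(t)} \simeq \sum_{i=1}^{n}\Big[g_i\, f_t(x_i) + \tfrac{1}{2}\, h_i\, f_t(x_i)^2\Big] + \Omega(f_t),
\qquad
g_i = \frac{\partial\, l\big(y_i, \hat{y}_i^{(t-1)}\big)}{\partial \hat{y}_i^{(t-1)}},
\quad
h_i = \frac{\partial^2 l\big(y_i, \hat{y}_i^{(t-1)}\big)}{\partial \big(\hat{y}_i^{(t-1)}\big)^2},
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2}\lambda \sum_{j=1}^{T} w_j^2
```

The g_i term plays the role of the GBDT negative-gradient residual, while h_i and Ω(f_t) are what XGBoost adds: curvature information and the complexity penalty.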
Step 6, feature importance selection: compute feature importance scores with the Permutation Importance method and with the models' own feature importance functions, and take the several features with the highest scores as the factors affecting glass quality:
(1) Permutation Importance feature selection method:
1) After training in Step 5, the embodiment of the present invention uses the XGBoost model obtained in Step 5 (the random forest or LightGBM model can also be used) to score the original validation set.
2) The values of one feature column in the validation set are then randomly shuffled, and the shuffled validation set is scored again with the model.
3) The difference between the scores obtained in 1) and 2) gives the contribution of this feature to the prediction; the larger the difference, the more important the feature.
This procedure is applied to every feature, and the resulting importance score of each feature is denoted score0.
The Permutation Importance feature selection method observes how much the model's prediction accuracy drops when a feature is shuffled: the larger the drop, the more important the feature. For example, suppose the embodiment has a trained XGBoost glass quality classification model whose accuracy on the validation set is 100%. The temperature time series at a certain position in the melting kiln is shuffled within the validation set, and the trained model predicts again on the shuffled data. If the model's accuracy on the validation set falls to 30%, the importance of the temperature feature at that position in the melting kiln is recorded as 70.
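The three steps above can be sketched in plain Python as follows. Assumptions: the model is any callable mapping a feature row to a grade, accuracy is the score, and each column is shuffled n_repeats times with the accuracy drops averaged (the text shuffles once; averaging, as scikit-learn's sklearn.inspection.permutation_importance also does, reduces noise). All names here are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]
Model = Callable[[Row], int]


def accuracy(model: Model, X: Sequence[Row], y: Sequence[int]) -> float:
    """Fraction of rows whose predicted grade equals the true grade."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model: Model, X: Sequence[Row], y: Sequence[int],
                           n_repeats: int = 5, seed: int = 0) -> List[float]:
    """score0 for each feature: mean accuracy drop after shuffling that feature's column."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)  # step 1): score the untouched validation set
    scores: List[float] = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # step 2): shuffle only column j
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))  # step 3): accuracy drop
        scores.append(sum(drops) / n_repeats)
    return scores
```

A feature the model never uses always gets a score of exactly 0, because shuffling it cannot change any prediction; this matches the worked example, where a large accuracy drop (100% to 30%) signals an important feature.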
(2) The present invention selects the best trained random forest, XGBoost, and LightGBM models, records the prediction accuracy of each, and uses these accuracies to weight the feature importance scores.
Let the prediction accuracies of the random forest, XGBoost, and LightGBM models be ACC1, ACC2, and ACC3. The present invention defines the weight of the random forest model as weight1 = ACC1 / (ACC1 + ACC2 + ACC3), the weight of the XGBoost model as weight2 = ACC2 / (ACC1 + ACC2 + ACC3), and the weight of the LightGBM model as weight3 = ACC3 / (ACC1 + ACC2 + ACC3). The importance score of each feature obtained from the random forest model's feature importance function feature_importances_ is denoted score1; the scores obtained in the same way from the feature_importances_ of the XGBoost model and of the LightGBM model are denoted score2 and score3.
In a specific implementation, the model's feature importance function can be taken from the prior art. For example, XGBoost offers the following feature importance measures, any one of which may be used:
Weight: the number of times a feature is used to split the data across all trees;
Cover: as above, first count the number of times a feature is used to split the data across all trees, then weight the count by the number of training samples passing through those split points;
Gain: the average reduction in training loss when splitting on the feature.
(3) The final score of each feature is calculated as follows:
Score = score0 + score1 × weight1 + score2 × weight2 + score3 × weight3
In the embodiment, the number of important features to select is set in advance, and the 20 features with the highest scores are selected as the factors affecting glass quality.
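The scoring formula and the top-20 selection can be sketched as follows, assuming score0 and the three per-model importance vectors are already computed and listed in the same feature order. The function names are hypothetical; with scikit-learn, XGBoost, or LightGBM models, the per-model vectors would come from each fitted model's feature_importances_ attribute.

```python
import heapq
from typing import List, Sequence


def combined_scores(score0: Sequence[float],
                    model_scores: Sequence[Sequence[float]],
                    accuracies: Sequence[float]) -> List[float]:
    """Score = score0 + score1*weight1 + score2*weight2 + score3*weight3 for each feature."""
    total = sum(accuracies)
    # weight_k = ACC_k / (ACC1 + ACC2 + ACC3)
    weights = [acc / total for acc in accuracies]
    return [
        s0 + sum(w * scores[i] for w, scores in zip(weights, model_scores))
        for i, s0 in enumerate(score0)
    ]


def top_k_features(names: Sequence[str], scores: Sequence[float], k: int = 20) -> List[str]:
    """Return the k feature names with the highest combined score, best first."""
    best = heapq.nlargest(k, zip(scores, names))
    return [name for _, name in best]
```

The sketch applies the formula literally: it does not rescale score0 (an accuracy drop) relative to the model importances, which may be on a different scale.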
In a specific implementation, the selected features can be analyzed further, and the corresponding variables can be controlled manually or by automatic control technology, so as to better control the factors on the glass production line and improve glass quality.
For example, suppose the candidate factors affecting glass quality include the temperature at 100 positions, the humidity at 100 positions, and other variables at 100 positions. If feature selection identifies the temperature features of 10 positions, the temperatures at those 10 positions are considered to have a relatively large influence on glass quality, and they can be controlled.
In a specific implementation, the method proposed by the technical solution of the present invention can be implemented with computer software technology. System devices for implementing the method, such as a computer-readable storage medium storing a computer program corresponding to the technical solution of the present invention, and a computer device that includes and runs that program, should also fall within the scope of protection of the present invention.
In some possible embodiments, a time series classification analysis system for factors affecting glass quality is provided, including the following modules:
The first module is used to obtain the raw time series data collected by each sensor on the glass production line and the corresponding glass quality data, and to label the glass quality data according to the glass quality indicators;
The second module is used to segment the time series data so that each time series fragment corresponds to a glass quality label;
The third module is used to construct features from the time series data processed by the second module and to identify relatively important time series features;
The fourth module is used to divide the output of the third module into a training set and a validation set;
The fifth module is used to build time series classification models with random forest, XGBoost, and LightGBM, training them on the training set and validating them on the validation set to obtain the best-performing models;
The sixth module is used for feature importance selection: it combines the Permutation Importance feature selection method with the results of each model's feature importance function, weighted by the prediction accuracies of the random forest, XGBoost, and LightGBM models from the fifth module, to obtain the final feature importance scores, and takes the several features with the highest scores as the factors affecting glass quality.
In some possible embodiments, a time series classification analysis system for factors affecting glass quality is provided, including a processor and a memory; the memory stores program instructions, and the processor calls the instructions in the memory to execute the time series classification analysis method for factors influencing glass quality described above.
In some possible embodiments, a time series classification analysis system for factors affecting glass quality is provided, including a readable storage medium on which a computer program is stored; when executed, the computer program implements the time series classification analysis method for factors influencing glass quality described above.
The specific embodiments described herein merely illustrate the spirit of the invention. Those skilled in the art may make various modifications or additions to the described embodiments, or substitute them in a similar manner, without departing from the spirit of the invention or going beyond the scope defined by the appended claims.