Method for predicting corrosion depth of aircraft wing skin based on GAN and improved LightGBM algorithm
By employing a data augmentation method based on GAN and an improved LightGBM algorithm, the problem of insufficient data in aircraft wing skin corrosion prediction was solved, achieving high-precision prediction even with small sample sizes, and improving the model's data feature extraction capability and prediction accuracy.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHINA AERO POLYTECH ESTAB
- Filing Date
- 2025-06-06
- Publication Date
- 2026-06-26
AI Technical Summary
Existing technologies for predicting corrosion of aircraft wing skin suffer from high data acquisition costs and poor data transferability, resulting in insufficient model prediction accuracy and difficulty in effectively predicting metal corrosion loss.
The method combines GAN-based data augmentation with an improved LightGBM algorithm. A generative adversarial network augments the data, and the LightGBM algorithm is improved with K-Means clustering and weight center point calculation to produce the predicted corrosion depth of aircraft wing skin.
It improves the accuracy of aircraft wing skin corrosion depth prediction under small sample conditions, effectively captures complex spatiotemporal dependencies and nonlinear interactions between variables in atmospheric data, and enhances the accuracy and completeness of data feature extraction.
Figure CN120744862B_ABST (abstract drawing)
Abstract
Description
Technical Field
[0001] This invention relates to the field of aircraft wing skin corrosion prediction, specifically to a method for predicting the corrosion depth of aircraft wing skin based on GAN and an improved LightGBM algorithm.
Background Technology
[0002] During their service life, aircraft wing skins are constantly exposed to the high humidity of the atmosphere, making metal components highly susceptible to corrosion. This corrosion can lead to component failure, reduced load-bearing area, and decreased performance, threatening flight safety and increasing aircraft maintenance costs. Therefore, predicting the corrosion behavior and corrosion loss of metals under specific times and environments has considerable application value in the aerospace industry.
[0003] Machine learning models, due to their powerful nonlinear fitting and data learning capabilities, have a natural advantage in predicting metal corrosion behavior under complex atmospheric environments. This involves using environmental parameters that significantly influence corrosion behavior, along with outdoor exposure time, as model inputs, and the metal corrosion loss under these conditions as the output, designing a machine learning model as a supervised learning regression problem. However, unlike machine learning applications in other fields, the metal corrosion dataset used to train the model for metal corrosion loss prediction requires extensive outdoor exposure experiments. That is, the dataset...
[0004] Acquiring data is costly and difficult, and the models have poor transferability. The lack of data has therefore become a significant factor limiting the predictive accuracy of models. Data augmentation techniques can effectively mitigate the problem of insufficient data. Data augmentation, also known as data enlargement, applies specific algorithms or strategies to existing data so that limited data yield value equivalent to a larger dataset, without substantially increasing the amount of collected data. It can effectively improve the accuracy of machine learning models under data constraints. For example, a machine learning regression model designed around a data augmentation strategy can significantly improve the accuracy of the original machine learning algorithm in predicting corrosion and wear of wing skin under atmospheric conditions during its service life.
Summary of the Invention
[0005] To address the shortcomings of the existing technology, the present invention aims to provide a method for predicting the corrosion depth of aircraft wing skin based on GAN and an improved LightGBM algorithm. This method first uses a generative adversarial network for data augmentation and then uses a gradient boosting regression model for sample regression.
[0006] Specifically, on the one hand, this invention provides a method for predicting the corrosion depth of aircraft wing skin based on GAN and an improved LightGBM algorithm, which includes the following steps:
[0007] S1. Data preprocessing: Collect atmospheric corrosion environmental data of aircraft skin, normalize the continuous data, and map the value range to the interval [-1,1].
[0008] S2. Construct a generative adversarial network to augment the atmospheric corrosion environmental data of aircraft skin, and obtain the augmented dataset.
[0009] S3. Improve the LightGBM algorithm and use the enhanced dataset to predict the corrosion depth of aircraft wing skin, specifically including:
[0010] S31. Search for the weight center point: the output of LightGBM is divided into n subsets using the K-Means clustering algorithm, and the cluster center of each subset is found. The weight center point of the i-th column is calculated by the following formula:
[0011] [equation not reproduced];
[0012] S32. Calculate the threshold: the threshold of the output in the i-th column is calculated by the following formula:
[0013] [equation not reproduced];
[0014] where the remaining term is the average of the weights in the i-th column;
[0015] S33. Generate the weight vector and weight matrix of the predicted values: for a set of predicted values, the weight in the i-th row and j-th column of the weight matrix is calculated by the following formula:
[0016] [equation not reproduced];
[0017] where the terms denote, in order: the weight in the i-th row and j-th column of the weight matrix; the predicted value for the data in the i-th row and j-th column; two model hyperparameters; the total number of rows in the prediction results data table; and the total number of columns in the prediction results data table;
[0018] S34. Calculate the predicted value of the corrosion depth of the aircraft wing skin by the following formula:
[0019] [equation not reproduced];
[0020] Where Y is the predicted value of the initial LightGBM output, and T is the matrix transpose symbol.
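The clustering part of step S31 can be illustrated with a minimal sketch. It covers only the K-Means split of a column of LightGBM outputs into n subsets and the resulting cluster centers; the weight center point, threshold, and weight matrix formulas are not available in this text and are not implemented. The function name `kmeans_1d`, the quantile-style initialization, and the sample prediction values are assumptions made for illustration.

```python
def kmeans_1d(values, n_clusters, n_iter=50):
    """Plain Lloyd's algorithm on scalar values; returns (centers, labels)."""
    ordered = sorted(values)
    # Assumed initialization: evenly spaced picks from the sorted values.
    centers = [ordered[(2 * k + 1) * len(ordered) // (2 * n_clusters)]
               for k in range(n_clusters)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assignment step: each value joins its nearest center.
        labels = [min(range(n_clusters), key=lambda k: abs(v - centers[k]))
                  for v in values]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k in range(n_clusters):
            members = [v for v, lab in zip(values, labels) if lab == k]
            new_centers.append(sum(members) / len(members) if members else centers[k])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Hypothetical LightGBM outputs for one column (values in micrometres).
predictions = [10.1, 10.4, 9.8, 25.0, 24.6, 25.3, 40.2, 39.9]
centers, labels = kmeans_1d(predictions, n_clusters=3)
```

In practice a library implementation of K-Means could be substituted; the hand-written version is used here only to keep the sketch dependency-free.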
[0021] Preferably, in step S2, the generative adversarial network includes a conditional generator and a discriminator. The conditional generator includes four fully connected layers, and the discriminator includes two fully connected layers. Through the fully connected layers of the generator, a scalar of continuous data is generated using the hyperbolic tangent function tanh. The parameters of the generator and discriminator are initialized, and the training hyperparameters are set. In the training iteration, the parameters of the discriminator and the generator are updated alternately, and the feature information between columns of the aircraft skin sample data is extracted as enhanced output data and a training set is constructed.
[0022] Preferably, in step S2, the specific steps for updating the discriminator parameters during the training iteration are as follows:
[0023] Sampling and generation: Sample a batch of random noise vectors from the noise distribution, input them into the generator to generate a batch of virtual data samples, and at the same time, obtain a batch of real data samples;
[0024] Loss calculation: Input real data and fake data into the discriminator respectively, and use the binary cross-entropy loss function to calculate the discriminator's loss on real data and virtual data. The goal of the discriminator is to maximize its ability to correctly classify real data and virtual data, that is, to make the discriminator output close to 1 for real data and close to 0 for virtual data.
[0025] Backpropagation update: Based on the calculated loss, the gradient of the discriminator parameters with respect to the loss function is calculated using the backpropagation algorithm, and the discriminator parameters are updated according to the specified learning rate.
[0026] Preferably, in step S2, the specific steps for updating the generator parameters during the training iteration are as follows:
[0027] Resampling and Generation: A batch of random noise vectors is resampled from the noise distribution, and a new batch of virtual data samples is generated by the generator.
[0028] Calculate the loss: Input the generated virtual data into the discriminator. The generator's goal is to make the discriminator misclassify the generated virtual data as real data, that is, to make the discriminator output a value close to 1 for the generated virtual data. Based on this goal, calculate the discriminator's loss on the generated virtual data.
[0029] Backpropagation update: Based on the calculated loss, the gradient of the generator parameters with respect to the loss function is calculated using the backpropagation algorithm, and the generator parameters are updated according to the specified learning rate.
[0030] Preferably, the first and second fully connected layers of the conditional generator use the ReLU activation function, the third and fourth fully connected layers use the LeakyReLU activation function, and the fourth fully connected layer performs Dropout processing.
[0031] Preferably, all four fully connected layers of the conditional generator are processed using batch normalization.
[0032] Preferably, the first and second fully connected layers of the discriminator use the ReLU activation function and have 128 neurons; the third and fourth fully connected layers of the conditional generator have 512 neurons, and the Dropout process of the fourth fully connected layer of the conditional generator deactivates neurons with a probability of 0.3.
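The Dropout processing with a deactivation probability of 0.3 can be sketched as follows. This is a minimal illustration, assuming the common "inverted dropout" convention in which surviving activations are rescaled by 1/(1 - p) during training; the source does not state which convention is used, and the function name `dropout` is hypothetical.

```python
import random


def dropout(activations, p=0.3, training=True, rng=random):
    """Zero each activation with probability p; rescale survivors (assumed convention)."""
    if not training or p == 0.0:
        # At inference time the layer passes values through unchanged.
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# Example: apply dropout to a hypothetical layer output of 1000 units.
layer_output = [1.0] * 1000
dropped_output = dropout(layer_output, p=0.3, rng=random.Random(0))
```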
[0033] Preferably, the improved conditional generator is described by the following formula:
[0034] [equation not reproduced];
[0035] where the first symbol denotes the input layer and the next four denote the first, second, third, and fourth layers of the generator model, respectively; z is a random noise vector; ReLU is the ReLU activation function; BN is batch normalization; FC is a fully connected layer; drop is dropout regularization; LeakyReLUn is the LeakyReLU activation function, where n is the probability of retaining neurons in the neural network; gumbel softmax is the reparameterization of a discrete distribution; and the final two symbols denote, respectively, the numerical scalar value of the i-th data item in a given row and the one-hot vector of the i-th data item in that row.
[0036] Preferably, the improved discriminator is described by the following formula:
[0037] [equation not reproduced];
[0038] where r is the sample data input to the discriminator, and the remaining symbol denotes the output of the discriminator.
[0039] Preferably, the ReLU activation function expression is f(x) = max(0,x), and the LeakyReLU activation function expression is:
[0040] $f(x) = \begin{cases} x, & x > 0 \\ \alpha x, & x \le 0 \end{cases}$;
[0041] where $\alpha$ is a positive number less than 1.
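The two activation functions can be written directly from their definitions. In this minimal sketch the negative-side slope of LeakyReLU is called `alpha`, and its default of 0.2 is an assumed illustrative value; the source only requires it to be positive and less than 1.

```python
def relu(x):
    """ReLU: f(x) = max(0, x)."""
    return max(0.0, x)


def leaky_relu(x, alpha=0.2):
    """LeakyReLU: x for positive inputs, alpha * x otherwise (0 < alpha < 1)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be positive and less than 1")
    return x if x > 0 else alpha * x


# ReLU discards negative inputs, while LeakyReLU keeps a small fraction of them.
inputs = [-2.0, 0.0, 3.0]
relu_out = [relu(x) for x in inputs]
leaky_out = [leaky_relu(x) for x in inputs]
```

Because LeakyReLU keeps a small non-zero gradient for negative inputs, units are less likely to become permanently inactive during training.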
[0042] Compared with the prior art, the beneficial effects of the present invention are as follows:
[0043] This invention provides a method for predicting the corrosion depth of aircraft wing skin based on GAN and an improved LightGBM algorithm. It first uses an improved generative adversarial network for data augmentation and then uses a gradient boosting regression model to perform sample regression for predicting the corrosion depth of aircraft wing skin, which ensures prediction accuracy even with small sample sizes.
[0044] The generative adversarial network of this invention constructs a four-layer fully connected network, which, compared to the traditional two-layer structure, can more deeply mine the potential feature information between data columns, effectively capture the complex spatiotemporal dependencies and nonlinear interactions between variables in atmospheric data, such as the mutual influence between indicators such as temperature, humidity, and pollutant concentration, thereby improving the accuracy and completeness of data feature extraction.
[0045] This invention improves the LightGBM gradient boosting regression model, making it more suitable for predicting the corrosion depth of aircraft wing skin and significantly improving the prediction accuracy.
Attached Figure Description
[0046] Figure 1 is an overall flowchart of the present invention;
[0047] Figure 2 is a schematic diagram of the workflow of the present invention;
[0048] Figure 3 is a schematic diagram of the LightGBM algorithm model in a specific embodiment of the present invention.
Detailed Implementation
[0049] Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings.
[0050] This invention provides a method for predicting corrosion depth of aircraft wing skin based on GAN and an improved LightGBM algorithm. As shown in Figure 1 and Figure 2, it includes the following steps:
[0051] S1. Data preprocessing: Collect atmospheric corrosion environment data of aircraft skin, normalize the continuous data, and map the value range to the interval [-1,1] to obtain the atmospheric corrosion environment dataset of aircraft skin. Each dataset contains five input attributes: average temperature, average relative humidity, annual precipitation, sunshine duration, and average air pressure.
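The normalization in step S1 can be sketched as below. The source states only that continuous data are mapped to [-1,1]; a per-column min-max mapping is assumed here, and the function names and sample temperatures are hypothetical.

```python
def fit_min_max(column):
    """Return the (minimum, maximum) of one continuous attribute."""
    return min(column), max(column)


def to_unit_range(x, lo, hi):
    """Map x from [lo, hi] to [-1, 1]; a constant column maps to 0."""
    if hi == lo:
        return 0.0
    return 2.0 * (x - lo) / (hi - lo) - 1.0


def from_unit_range(y, lo, hi):
    """Inverse mapping, e.g. for converting generated samples back to physical units."""
    return (y + 1.0) / 2.0 * (hi - lo) + lo


# Hypothetical average temperatures (degrees Celsius) from three stations.
temperatures = [4.5, 15.0, 25.5]
lo, hi = fit_min_max(temperatures)
scaled = [to_unit_range(t, lo, hi) for t in temperatures]
restored = [from_unit_range(s, lo, hi) for s in scaled]
```

A range of [-1,1] matches the output range of the tanh function that the generator uses for continuous values, so generated samples can be mapped back with the inverse function.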
[0052] S2. Construct a generative adversarial network to augment the atmospheric corrosion environmental data of aircraft skin, and obtain the augmented dataset.
[0053] The generative adversarial network consists of a conditional generator and a discriminator. The conditional generator comprises four fully connected layers, and the discriminator comprises two fully connected layers. Through the fully connected layers of the generator, a scalar of continuous data is generated using the hyperbolic tangent function tanh, and a reparameterization method for discrete random variables is used to generate model vectors for discrete data. The parameters of the generator and discriminator are initialized, and the training hyperparameters are set. During training iterations, the parameters of the discriminator and generator are updated alternately, and the feature information between columns is extracted as augmented output data to construct the training set.
[0054] The discriminator parameter update process is as follows:
[0055] Sampling and Generation: A batch of random noise vectors is sampled from the noise distribution and input into the generator to generate a batch of virtual data samples. At the same time, a batch of real tabular data samples is obtained.
[0056] Loss calculation: Real and fake data are input into the discriminator separately, and the binary cross-entropy loss function is used to calculate the discriminator's loss on real and fake data. The goal of the discriminator is to maximize its ability to correctly classify real and fake data, that is, to make the discriminator output close to 1 for real data and close to 0 for fake data.
[0057] Backpropagation update: Based on the calculated loss, the gradient of the discriminator parameters with respect to the loss function is calculated using the backpropagation algorithm, and the discriminator parameters are updated according to a certain learning rate.
[0058] The generator parameter update process is as follows:
[0059] Resampling and Generation: A batch of random noise vectors is resampled from the noise distribution, and a new batch of virtual data samples is generated by the generator.
[0060] Loss calculation: The generated virtual data is input into the discriminator. The generator's goal is to make the discriminator misclassify the generated virtual data as real data, that is, to make the discriminator output a value close to 1 for the generated virtual data. Based on this, the discriminator's loss on the generated virtual data is calculated.
[0061] Backpropagation update: Based on the loss, the gradient of the generator parameters with respect to the loss function is calculated using the backpropagation algorithm, and the generator parameters are updated according to a certain learning rate.
[0062] By repeatedly performing the above two steps and alternately updating the parameters of the discriminator and the generator, the generator gradually generates more realistic tabular data, and the discriminator correspondingly improves its ability to distinguish between them. Eventually, a balance is reached where the discriminator can no longer accurately distinguish between the generated data and the real data.
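The two loss calculations described above can be sketched with binary cross-entropy on the discriminator's output probabilities. This is a minimal illustration, not the patented implementation: the function names are hypothetical, the losses are averaged over the batch, and the gradient computation and parameter updates are omitted.

```python
import math


def bce(prob, target, eps=1e-12):
    """Binary cross-entropy for one predicted probability and a 0/1 target."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def discriminator_loss(d_real, d_fake):
    """Discriminator target: output 1 on real samples and 0 on virtual samples."""
    real_term = sum(bce(p, 1.0) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0.0) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Generator target: make the discriminator output 1 on virtual samples."""
    return sum(bce(p, 1.0) for p in d_fake) / len(d_fake)


# Hypothetical discriminator outputs for a batch of real and virtual samples.
d_real = [0.9, 0.8]
d_fake = [0.1, 0.3]
loss_d = discriminator_loss(d_real, d_fake)
loss_g = generator_loss(d_fake)
```

At the balance point described above, the discriminator outputs values near 0.5 for both kinds of sample, and the generator loss approaches ln 2.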
[0063] The conditional generator uses ReLU activation in its first and second fully connected layers, and LeakyReLU activation in its third and fourth fully connected layers. The fourth fully connected layer also undergoes Dropout. All four fully connected layers of the conditional generator are processed using batch normalization. The discriminator uses ReLU activation in its first and second fully connected layers, with 128 neurons each. The third and fourth fully connected layers of the conditional generator have 512 neurons each, and the Dropout process in the fourth fully connected layer deactivates neurons with a probability of 0.3.
[0064] In a specific embodiment, the ReLU activation function expression is f(x) = max(0,x), and the LeakyReLU activation function expression is:
[0065] $f(x) = \begin{cases} x, & x > 0 \\ \alpha x, & x \le 0 \end{cases}$;
[0066] where $\alpha$ is a positive number less than 1.
[0067] Specifically, the generator uses four fully connected layers to extract feature information between columns. The first two fully connected layers use the ReLU activation function. To improve the generator's output accuracy, the third and fourth fully connected layers use the LeakyReLU activation function, and the number of neurons is increased to 512. To prevent overfitting, Dropout is applied to the fourth layer. Dropout is a regularization technique used in neural network training: it randomly deactivates a portion of the neurons with a given probability, which reduces complex co-adaptation between neurons and forces each neuron in the network to learn useful features independently rather than relying on a specific combination of the remaining neurons. All four fully connected layers are processed using batch normalization (BN). Scalars are generated using the hyperbolic tangent function tanh, which limits the neuron's output to between -1 and 1 and provides better gradients. The model vector is generated using the Gumbel-Softmax reparameterization of discrete random variables. Gumbel-Softmax adds samples from the Gumbel distribution to the log probabilities of discrete variables and then applies the Softmax function, which approximates drawing samples directly from a discrete distribution and makes it well suited to generating discrete data. Finally, the generation layer of the one-hot vectors of discrete value columns in the original framework is removed. The improved conditional generator is described by the following equation:
[0068] [equation not reproduced];
[0069] where the first symbol denotes the input layer and the next four denote the first, second, third, and fourth layers of the generator model, respectively; z is a random noise vector; ReLU is the ReLU activation function; BN is batch normalization; FC is a fully connected layer; drop is dropout regularization; LeakyReLUn is the LeakyReLU activation function, where n is the probability of retaining neurons in the neural network; gumbel softmax is the reparameterization of a discrete distribution; and the final two symbols denote, respectively, the numerical scalar value of the i-th data item in a given row and the one-hot vector of the i-th data item in that row.
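The Gumbel-Softmax step described above (Gumbel noise added to log probabilities, followed by Softmax) can be sketched as follows. The temperature `tau` and its default of 0.2 are assumptions, since the source does not give a value, and the function name is hypothetical.

```python
import math
import random


def gumbel_softmax(log_probs, tau=0.2, rng=random):
    """Return a relaxed one-hot sample for a discrete variable."""
    noisy = []
    for lp in log_probs:
        # Gumbel(0, 1) noise is -log(-log(U)) for U uniform on (0, 1).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append((lp - math.log(-math.log(u))) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled values.
    top = max(noisy)
    exps = [math.exp(v - top) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical three-category variable with probabilities 0.7, 0.2, 0.1.
log_probs = [math.log(0.7), math.log(0.2), math.log(0.1)]
sample = gumbel_softmax(log_probs, rng=random.Random(0))
```

A lower temperature pushes each sample closer to a one-hot vector, while the largest component still falls on each category with roughly its original probability.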
[0070] Two considerations motivate a further change for this dataset: first, the generator is more complex than in the traditional CTGAN framework; second, the data volume is small in small-sample scenarios, which makes the original PacGAN setting redundant. Specifically, the Pac value is changed from 10 to 2 to accelerate the convergence of CTGAN. Like the improved generator, the discriminator also removes the one-hot vector generation layer for discrete value columns from the original framework. Furthermore, compared to the original discriminator, the improved version still consists of two fully connected layers, but the second layer uses ReLU instead of LeakyReLU as its activation function, and the number of neurons is reduced from 256 to 128.
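The effect of the Pac value can be illustrated with a small sketch. It assumes PacGAN-style packing, in which the discriminator judges groups of pac samples concatenated into a single input; the function name `pack` and the sample rows are hypothetical.

```python
def pack(rows, pac=2):
    """Concatenate every `pac` consecutive rows into one discriminator input."""
    if len(rows) % pac != 0:
        raise ValueError("batch size must be divisible by pac")
    packed = []
    for start in range(0, len(rows), pac):
        group = rows[start:start + pac]
        # Flatten the group into one long feature vector.
        packed.append([value for row in group for value in row])
    return packed


# Four hypothetical samples with two features each become two packed inputs.
batch = [[1, 2], [3, 4], [5, 6], [7, 8]]
packed_batch = pack(batch, pac=2)
```

Under this assumption, a batch of 20 rows yields only 2 discriminator inputs with a Pac value of 10 but 10 inputs with a Pac value of 2, which suits small datasets better.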
[0071] The improved discriminator is described by the following formula:
[0072] [equation not reproduced];
[0073] where r is the sample data input to the discriminator, and the remaining symbol denotes the output of the discriminator.
[0074] S3. Improve the LightGBM algorithm and predict the corrosion depth of aircraft wing skin. The LightGBM algorithm model is shown in Figure 3. This step specifically includes the following sub-steps:
[0075] S31. Search for the weight center point: the output of LightGBM is divided into n subsets using the K-Means clustering algorithm, and the cluster center of each subset is found. The weight center point of the i-th column is calculated by the following formula:
[0076] [equation not reproduced];
[0077] S32. Calculate the threshold: the threshold of the output in the i-th column is calculated by the following formula:
[0078] [equation not reproduced];
[0079] where the remaining term is the average of the weights in the i-th column;
[0080] S33. Generate the weight vector and weight matrix of the predicted values: for a set of predicted values, the weight in the i-th row and j-th column of the weight matrix is calculated by the following formula:
[0081] [equation not reproduced];
[0082] where the terms denote, in order: the weight in the i-th row and j-th column of the weight matrix; the predicted value for the data in the i-th row and j-th column; two model hyperparameters; the total number of rows in the prediction results data table; and the total number of columns in the prediction results data table;
[0083] S34. Calculate the predicted value of the corrosion depth of the aircraft wing skin by the following formula:
[0084] [equation not reproduced];
[0085] where Y is the predicted value of the initial LightGBM output, and T is the matrix transpose symbol.
Specific Implementation
[0087] S1. Data preprocessing: Collect atmospheric corrosion environmental data of aircraft skin, normalize the continuous data, and map the value range to the interval [-1,1].
[0088] This embodiment selects the atmospheric corrosion database for steel from the National Data Center for Materials Corrosion and Protection, China. This database contains atmospheric corrosion experimental results for three steels—Q235, 09Cu, and 10Cr—from 331 regions in China. Each dataset contains five input attributes: average temperature, average relative humidity, annual precipitation, sunshine duration, and average air pressure. The output consists of two attributes: the annual loss and the four-year loss for each of the three steels (loss data were measured using an ultrasonic corrosion thickness gauge during the experiments), totaling six attributes, i.e., a 6-dimensional vector.
[0089] The following table extracts environmental data from 15 test stations in the dataset:
[0090] Table 1 shows the atmospheric environmental data of some test stations.
[0091]
[0092] S2. Construct a generative adversarial network to augment the atmospheric corrosion environmental data of aircraft skin, resulting in an augmented dataset. The original training set is then fed into an improved CTGAN network to expand the data to five times its original size. After convergence, the data is used for prediction on the test set.
[0093] S3. The corrosion depth of the aircraft wing skin was predicted using the enhanced test set. The prediction results for the above 15 sites are shown in the table below (taking Q235 steel as an example):
[0094] Table 2. True labeled values and predicted values of corrosion thickness of Q235 steel at each test site in the dataset (unit: μm)
[0095]
[0096] The embodiments described above are merely preferred embodiments of the present invention and are not intended to limit the scope of the present invention. Various modifications and improvements made by those skilled in the art to the technical solutions of the present invention without departing from the spirit of the present invention should fall within the protection scope defined by the claims of the present invention.
Claims
1. A method for predicting corrosion depth of aircraft wing skin based on GAN and an improved LightGBM algorithm, characterized in that it includes the following steps: S1. Data preprocessing: collect atmospheric corrosion environmental data of aircraft skin, normalize the continuous data, and map the value range to the interval [-1,1]. S2. Construct a generative adversarial network to augment the atmospheric corrosion environmental data of aircraft skin, and obtain the augmented dataset. The generative adversarial network consists of a conditional generator and a discriminator. The conditional generator consists of four fully connected layers, and the discriminator consists of two fully connected layers. Through the fully connected layers of the generator, a scalar of continuous data is generated using the hyperbolic tangent function tanh. The parameters of the generator and discriminator are initialized, and the training hyperparameters are set. During the training iteration, the parameters of the discriminator and generator are updated alternately, and the feature information between columns of aircraft skin sample data is extracted as enhanced output data and a training set is constructed. S3. Improve the LightGBM algorithm and use the enhanced dataset to predict the corrosion depth of aircraft wing skin, specifically including: S31. Search for the weight center point: the output of LightGBM is divided into n subsets using the K-Means clustering algorithm, and the cluster center of each subset is found. The weight center point of the i-th column is calculated by the following formula: [equation not reproduced]; S32. Calculate the threshold: the threshold of the output in the i-th column is calculated by the following formula: [equation not reproduced]; where the remaining term is the average of the weights in the i-th column; S33. Generate the weight vector and weight matrix of the predicted values: for a set of predicted values, the weight in the i-th row and j-th column of the weight matrix is calculated by the following formula: [equation not reproduced]; where the terms denote, in order: the weight in the i-th row and j-th column of the weight matrix; the predicted value for the data in the i-th row and j-th column; two model hyperparameters; the total number of rows in the prediction results data table; and the total number of columns in the prediction results data table; S34. Calculate the predicted value of the corrosion depth of the aircraft wing skin by the following formula: [equation not reproduced]; where Y is the predicted value of the initial LightGBM output, and T is the matrix transpose symbol.
2. The method for predicting aircraft wing skin corrosion depth based on GAN and improved LightGBM algorithm according to claim 1, characterized in that: In step S2, the specific steps for updating the discriminator parameters during the training iteration are as follows: Sampling and generation: Sample a batch of random noise vectors from the noise distribution, input them into the generator to generate a batch of virtual data samples, and at the same time, obtain a batch of real data samples; Loss calculation: Input real data and fake data into the discriminator respectively, and use the binary cross-entropy loss function to calculate the discriminator's loss on real data and virtual data. The goal of the discriminator is to maximize its ability to correctly classify real data and virtual data, that is, to make the discriminator output close to 1 for real data and close to 0 for virtual data. Backpropagation update: Based on the calculated loss, the gradient of the discriminator parameters with respect to the loss function is calculated using the backpropagation algorithm, and the discriminator parameters are updated according to the specified learning rate.
3. The method for predicting aircraft wing skin corrosion depth based on GAN and improved LightGBM algorithm according to claim 1, characterized in that: In step S2, the specific steps for updating the generator parameters during the training iteration are as follows: Resampling and Generation: A batch of random noise vectors is resampled from the noise distribution, and a new batch of virtual data samples is generated by the generator. Calculate the loss: Input the generated virtual data into the discriminator. The generator's goal is to make the discriminator misclassify the generated virtual data as real data, that is, to make the discriminator output a value close to 1 for the generated virtual data. Based on this goal, calculate the discriminator's loss on the generated virtual data. Backpropagation update: Based on the calculated loss, the gradient of the generator parameters with respect to the loss function is calculated using the backpropagation algorithm, and the generator parameters are updated according to the specified learning rate.
4. The method for predicting aircraft wing skin corrosion depth based on GAN and improved LightGBM algorithm according to claim 1, characterized in that: The first and second fully connected layers of the conditional generator use the ReLU activation function, while the third and fourth fully connected layers use the LeakyReLU activation function. The fourth fully connected layer also undergoes Dropout processing.
5. The method for predicting aircraft wing skin corrosion depth based on GAN and improved LightGBM algorithm according to claim 1, characterized in that: All four fully connected layers of the conditional generator are processed using batch normalization.
6. The method for predicting aircraft wing skin corrosion depth based on GAN and improved LightGBM algorithm according to claim 4, characterized in that: The first and second fully connected layers of the discriminator use the ReLU activation function and have 128 neurons. The third and fourth fully connected layers of the conditional generator have 512 neurons. The Dropout process of the fourth fully connected layer of the conditional generator deactivates neurons with a probability of 0.3.
7. The method for predicting aircraft wing skin corrosion depth based on GAN and improved LightGBM algorithm according to claim 1, characterized in that: The improved conditional generator is described by the following formula: [equation not reproduced]; where the first symbol denotes the input layer and the next four denote the first, second, third, and fourth layers of the generator model, respectively; z is a random noise vector; ReLU is the ReLU activation function; BN is batch normalization; FC is a fully connected layer; drop is dropout regularization; LeakyReLUn is the LeakyReLU activation function, where n is the probability of retaining neurons in the neural network; gumbel softmax is the reparameterization of a discrete distribution; and the final two symbols denote, respectively, the numerical scalar value of the i-th data item in a given row and the one-hot vector of the i-th data item in that row.
8. The method for predicting corrosion depth of aircraft wing skin based on GAN and improved LightGBM algorithm according to claim 1, characterized in that: The improved discriminator is described by the following formula: [equation not reproduced]; where r is the sample data input to the discriminator, and the remaining symbol denotes the output of the discriminator.
9. The method for predicting corrosion depth of aircraft wing skin based on GAN and improved LightGBM algorithm according to claim 6, characterized in that: The ReLU activation function expression is f(x) = max(0,x), and the LeakyReLU activation function expression is: $f(x) = \begin{cases} x, & x > 0 \\ \alpha x, & x \le 0 \end{cases}$; where $\alpha$ is a positive number less than 1.