A wind turbine multi-fault diagnosis method based on GRU
By using a GRU-based fault diagnosis method, combined with manual feature extraction and an ESN model, and utilizing time-domain, frequency-domain, and time-frequency-domain analysis, the accuracy problem of multi-fault diagnosis in complex wind turbine systems was solved, achieving efficient fault identification and classification.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHONGQING TECH & BUSINESS UNIV
- Filing Date
- 2023-06-20
- Publication Date
- 2026-06-23
AI Technical Summary
Existing technologies struggle to effectively combine vibration and current signals for accurate diagnosis of multiple faults in wind turbines, especially in complex systems with numerous fault types and complex data. Traditional methods are unable to fully reflect the inherent fault information of the system.
A GRU-based fault diagnosis method is adopted that combines manual feature extraction with neural network feature extraction by an ESN model. The signals are monitored by acceleration and current sensors, features are extracted through time-domain, frequency-domain, and time-frequency-domain analysis, and the resulting feature matrix is input into the GRU model for fault identification.
It improves the accuracy of multi-fault diagnosis for wind turbines, achieves better identification of fault categories and multi-fault classification, and enhances the intelligence and stability of feature extraction.
Figure CN116792264B_ABST
Description
Technical Field
[0001] This invention relates to the field of fault diagnosis, and in particular to a GRU-based method for diagnosing multiple faults in wind turbines.
Background Technology
[0002] As energy problems grow more severe and the number of wind turbines increases on a large scale, effective condition monitoring and fault diagnosis methods for complex systems such as wind turbines are becoming increasingly important for managing this critical equipment. Numerous scholars have studied the problem. Some have described the functions and component schemes of online fault monitoring systems for wind turbine transmission systems, and used such a system to perform abnormal spectrum analysis on real fault cases and accurately locate the fault points. Others have analyzed wind turbine vibration signals by plotting time-frequency diagrams and Welch periodograms (WP), showing through a series of comparative experiments that WP presents the signal-to-noise ratio of vibration signals more effectively when noisy vibration signals are used as the raw input during monitoring. Still others have taken bearings, a complex component, as the application object, studied how bearing faults induce stator current modulation, and used stator current detection to analyze rolling bearing defects, providing a more effective fault diagnosis method for the bearings of rotating machinery.
[0003] Considering that the internal components of a complex system are a black box and their fault states are unknown, the problem of multiple faults in complex systems is explored here. Vibration signals can largely reflect the faults of a complex system, but because vibration sensors can only be installed on the outside of the equipment, the measured vibration signals cannot contain all the fault information when the faulty key component is unknown. The current waveform output by the system in its operating state is an internal signal of the system. Although it is much simpler than the vibration signal, it still reflects the operating state of the system to a certain extent and contains partial information about the distribution of faults. Therefore, how to combine these signals with a GRU model to realize multi-fault diagnosis of wind turbines is a technical challenge for wind turbine fault classification.
Summary of the Invention
[0004] The purpose of this invention is to provide a GRU-based multi-fault diagnosis method for wind turbines. To address the increasingly diverse fault types and increasingly complex data in complex systems, the invention builds on manual feature extraction and proposes a neural network feature extraction method based on the ESN model, in which the neural network starts directly from the low-level input signals. With the resulting feature matrix as input, the GRU fault diagnosis method can better identify fault categories, achieve multi-fault classification, and reach high accuracy.
[0005] To achieve the above objectives, the technical solution adopted by this invention is: a multi-fault diagnosis method for wind turbines based on GRU, comprising the following steps:
[0006] Step 1: Use an accelerometer to monitor the original vibration signals in the x, y, and z axes under 12 different conditions: healthy, slight pitting of the gear ring, moderate pitting of the gear ring, partial breakage of a gear ring tooth, partial breakage of a sun gear tooth, complete breakage of a sun gear tooth, moderate double pitting of the sun gear, slight pitting of the planetary gears, moderate pitting of the planetary gears, moderate double pitting of the planetary gears, pitting + breakage of the planetary gears, and moderate double breakage of the planetary gears. Use a current sensor to monitor the original current signals in the x, y, and z axes under the same 12 conditions.
[0007] Step 2: Using manual feature extraction methods, feature extraction is performed on the monitored vibration and current signals in the time domain, frequency domain, and time-frequency domain to obtain vibration feature vectors T1, F1, TF1 and current feature vectors T2, F2, TF2 with dimensions of 36, 81, and 48 (12×3, 27×3, 16×3), respectively; neural network feature extraction is then performed on the monitored vibration and current signals using ESN to obtain vibration feature vector E1 and current feature vector E2.
[0008] Step 3: Use the feature vector matrix {T1 F1 TF1 E1 T2 F2 TF2 E2} as the final feature of each sample, and input the feature matrix formed by it into the GRU to complete the fault diagnosis of the gearbox.
[0009] Preferably, the GRU used in step three has only two gates: the update gate $z_t$ and the reset gate $r_t$;
[0010] The update gate $z_t$ controls how much state information is carried over from the previous time step to the current state. The smaller $z_t$ is, the more information from the previous neuron is retained and the less information from the current neuron is retained. The update gate therefore provides a good way to capture long-interval dependencies in the monitored data and to address the gradient decay problem in RNNs. The reset gate $r_t$ controls the degree to which information from the previous time step is ignored; when the information from the previous neuron contains a lot of useless information, $r_t$ can be relatively small, which means that mostly the information from the current neuron is used as input and only a small part of the information from the previous neuron is retained. For the input vector $x_t$, the transformation functions in the GRU hidden unit are as follows:
[0011] $$z_t = \delta(W_z x_t + V_z H_{t-1})$$
[0012] $$r_t = \delta(W_r x_t + V_r H_{t-1})$$
[0013] $$\tilde{h}_t = \tanh(W_c x_t + V_c (r_t \odot H_{t-1}))$$
[0014] $$H_t = (1 - z_t) \odot H_{t-1} + z_t \odot \tilde{h}_t$$
[0015] where $W_z$ and $W_r$ are the weight matrices of $z_t$ and $r_t$; $W_c$ is the weight matrix of the output state; $H_{t-1}$ is the state input from time t-1; $\odot$ denotes the element-wise product; $\tilde{h}_t$ and $H_t$ are the candidate state and the output state at time t; $\delta$ and tanh are the activation functions of the update and reset gates and of the candidate state, respectively; and $V_z$, $V_r$, and $V_c$ are parameter matrices. From the principles of the GRU model, it can be seen that the neurons in the GRU are interconnected and interdependent.
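To make the gate arithmetic concrete, the following is a minimal NumPy sketch of a single GRU time step. It assumes the standard GRU form for the candidate state and the output state (candidate computed with tanh from the reset-gated previous state, output as a $z_t$-weighted blend of the previous state and the candidate), takes $\delta$ to be the sigmoid function, and omits bias terms as in the gate equations above. The names `gru_step`, `init_params`, and `V_c` are illustrative and do not come from the patent.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, p):
    """One GRU time step; p holds input weights W_* and recurrent weights V_*."""
    z_t = sigmoid(p["W_z"] @ x_t + p["V_z"] @ h_prev)              # update gate
    r_t = sigmoid(p["W_r"] @ x_t + p["V_r"] @ h_prev)              # reset gate
    h_cand = np.tanh(p["W_c"] @ x_t + p["V_c"] @ (r_t * h_prev))   # candidate state
    # A small z_t keeps more of the previous state, as described in the text.
    return (1.0 - z_t) * h_prev + z_t * h_cand

def init_params(n_in, n_hidden, seed=0):
    """Small random weights; the shapes are the only thing that matters here."""
    rng = np.random.default_rng(seed)
    shapes = {"W": (n_hidden, n_in), "V": (n_hidden, n_hidden)}
    return {f"{k}_{g}": 0.1 * rng.standard_normal(shapes[k])
            for k in ("W", "V") for g in ("z", "r", "c")}

params = init_params(n_in=4, n_hidden=3)
h = np.zeros(3)
for x in np.ones((5, 4)):      # five dummy time steps of a 4-dimensional input
    h = gru_step(x, h, params)
```

Because the output state is a convex combination of the previous state and a tanh-bounded candidate, every component of `h` stays strictly inside (-1, 1) when the initial state is zero.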
[0016] The GRU model is trained with the Back-Propagation Through Time (BPTT) algorithm. First, the weighted inputs and the error term at time t+1 are defined. The error is then propagated backward in time to obtain the error term at time t, from which the gradients of the objective function with respect to the weights are calculated. These gradients are used to update the model parameters, so that the model is updated iteratively.
[0017] Preferably, manual feature extraction is a time-domain analysis method. Commonly used statistical feature parameters in time-domain analysis can be divided into dimensional and dimensionless parameters. Dimensional parameters include mean, skewness, peak value, standard deviation, variance, root mean square (RMS), and kurtosis. RMS is effective for detecting wear-related faults, while peak value is effective for detecting pitting damage on bearing surfaces. Skewness and kurtosis are sensitive to impulsive faults in the signal.
[0018] Dimensionless parameters include waveform factor, kurtosis factor, peak factor, impulse factor, and margin factor; kurtosis factor and margin factor are more advantageous for detecting early-stage impact-type faults.
[0019] Dimensional parameters depend not only on the operating state of the equipment but also on its operating parameters such as speed and load, whereas dimensionless parameters are insensitive to changes in signal amplitude and frequency distribution and are largely independent of the equipment's operating conditions. Therefore, when diagnosing faults in complex systems, dimensional and dimensionless parameters are usually combined to take advantage of both and improve the accuracy of the diagnosis.
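The seven dimensional and five dimensionless parameters named above add up to 12, which matches the 12 time-domain features per axis used in step two. The sketch below computes one plausible version of each. The patent's own formula table did not survive in this text, so the definitions here (for example, skewness and kurtosis as raw central moments, and the kurtosis factor as the fourth central moment over the fourth power of the standard deviation) are common textbook conventions and should be treated as assumptions; `time_domain_features` is an illustrative name.

```python
import numpy as np

def time_domain_features(x):
    """Return 7 dimensional and 5 dimensionless statistics of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    centered = x - mean
    abs_mean = np.abs(x).mean()
    rms = np.sqrt(np.mean(x ** 2))
    peak = np.abs(x).max()
    kurtosis = np.mean(centered ** 4)          # fourth central moment (dimensional)
    return {
        # dimensional parameters: scale with the signal amplitude
        "mean": mean, "std": x.std(), "variance": x.var(), "rms": rms,
        "peak": peak, "skewness": np.mean(centered ** 3), "kurtosis": kurtosis,
        # dimensionless parameters: ratios that cancel the amplitude scale
        "waveform_factor": rms / abs_mean,
        "peak_factor": peak / rms,
        "impulse_factor": peak / abs_mean,
        "margin_factor": peak / np.mean(np.sqrt(np.abs(x))) ** 2,
        "kurtosis_factor": kurtosis / x.std() ** 4,
    }

t = np.arange(1024) / 1024.0
feats = time_domain_features(np.sin(2 * np.pi * 8 * t))
scaled = time_domain_features(3.0 * np.sin(2 * np.pi * 8 * t))
```

Tripling the amplitude triples `rms` but leaves `peak_factor` and the other ratios unchanged, which is the amplitude insensitivity that motivates combining the two groups.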
[0020] Preferably, the Fourier Transform (FT) is used to extract frequency-domain features. After a Fourier transform of the time-domain signal, the horizontal axis of the resulting spectrum represents the frequency of the vibration signal and the vertical axis represents the amplitude. To improve computational efficiency, the Fast Fourier Transform (FFT) improves on the Discrete Fourier Transform (DFT) algorithm by exploiting its periodicity and symmetry, greatly reducing the computational load while remaining simple and fast. The FFT calculates the amplitude and phase of the sinusoidal components of different frequencies in the signal, thereby transforming the signal into frequency-domain samples. Sampling in the time domain corresponds to periodicity in the frequency domain, which facilitates data processing.
[0021] For a sequence $x_n$, the DFT is defined as:
[0022] $$X_i = \sum_{n=0}^{N-1} x_n e^{-j 2\pi i n / N}$$
[0023] where N is the length of the sequence. The calculation of the DFT can be divided into two parts, one over the even-indexed samples and one over the odd-indexed samples; therefore, the above formula can be expressed as
[0024] $$X_i = \sum_{n=0}^{M-1} x_{2n} e^{-j 2\pi i (2n) / N} + \sum_{n=0}^{M-1} x_{2n+1} e^{-j 2\pi i (2n+1) / N}$$
[0025] where i = 0, 1, ..., N-1, and 0 ≤ n < M ≡ N/2;
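The even/odd split can be checked numerically. The sketch below evaluates the two partial sums directly for every frequency index i and compares the result with NumPy's FFT; `dft_even_odd` is an illustrative name, and the direct evaluation is deliberately O(N²) so that the decomposition stays visible. An actual FFT applies the same split recursively to each half.

```python
import numpy as np

def dft_even_odd(x):
    """DFT written as the sum of an even-indexed and an odd-indexed partial sum."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    M = N // 2
    i = np.arange(N).reshape(-1, 1)    # frequency index i = 0..N-1
    n = np.arange(M).reshape(1, -1)    # summation index 0 <= n < M
    even = (x[0::2] * np.exp(-2j * np.pi * i * (2 * n) / N)).sum(axis=1)
    odd = (x[1::2] * np.exp(-2j * np.pi * i * (2 * n + 1) / N)).sum(axis=1)
    return even + odd

rng = np.random.default_rng(0)
signal = rng.standard_normal(16)       # N must be even for this split
spectrum = dft_even_odd(signal)
```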
[0026] The feature parameters are the root mean square values of each sub-band after the signal undergoes FFT; the specific implementation process is shown below:
[0027] $$X(f) = \mathrm{FFT}(x(t))$$
[0028] $$\mathrm{rms}_{\mathrm{FFT}}(i) = \sqrt{\frac{1}{S} \sum_{m=1}^{S} \left| \mathrm{FFT}_i(m) \right|^2}$$
[0029] where x(t) is the acquired signal; S is the number of spectral lines; X(f) is the spectrum of x(t) after the Fast Fourier Transform; $\mathrm{rms}_{\mathrm{FFT}}(i)$ is the root mean square value of the i-th sub-band after the Fast Fourier Transform; and $\mathrm{FFT}_i(m)$ is the m-th spectral line of the i-th sub-band.
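A minimal sketch of the sub-band RMS feature follows. It assumes a one-sided amplitude spectrum and equal-width, contiguous sub-bands; the choice of 27 bands simply mirrors the 27 frequency-domain features per axis reported in the text, and the band layout is an assumption, since the source does not describe how the spectrum is partitioned. `subband_rms` is an illustrative name.

```python
import numpy as np

def subband_rms(x, n_bands=27):
    """RMS of the FFT amplitude spectrum within each of n_bands sub-bands."""
    spectrum = np.abs(np.fft.rfft(x))              # one-sided amplitude spectrum
    bands = np.array_split(spectrum, n_bands)      # contiguous sub-bands
    return np.array([np.sqrt(np.mean(b ** 2)) for b in bands])

t = np.arange(1024) / 1024.0
x = np.sin(2 * np.pi * 50 * t)      # 50 cycles per window: energy sits at bin 50
f_features = subband_rms(x)
```

For a 1024-point sample, `rfft` returns 513 spectral lines, so 27 bands hold 19 lines each, and the test tone at bin 50 falls in the third band (index 2).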
[0030] Preferably, after time-domain feature extraction and FFT frequency-domain feature extraction, WPT is used to extract the time-frequency feature parameters of the monitoring signal. To extract the WPT energy spectrum feature parameters, a suitable mother wavelet is first selected to perform an n-level wavelet packet decomposition of the vibration signal, and the obtained coefficient vector is shown below:
[0031] $$C = \left[ C_n^0, C_n^1, \ldots, C_n^{2^n - 1} \right]$$
[0032] where $C_n^i$ is the coefficient of the i-th node in the n-th layer of the decomposition.
[0033] Then, the wavelet packet decomposition coefficients are reconstructed, and the reconstructed total signal S is shown in the following equation:
[0034] $$S = \sum_{i=0}^{2^n - 1} S_n^i$$
[0035] where $S_n^i$ is the reconstructed signal of the i-th node in the n-th layer. The energy value corresponding to each sub-band is then calculated:
[0036] $$E_n^i = \sum_{k=1}^{N} \left| x_{ik} \right|^2$$
[0037] In the above formula, $x_{ik}$ (i = 0, 1, 2, ..., $2^n - 1$; k = 1, 2, ..., N) is the amplitude of the reconstructed signal $S_n^i$, and N is the number of vibration signal sample points. The energy spectrum eigenvector F is then constructed as shown in the following equation:
[0038] $$F = \left[ E_n^0, E_n^1, \ldots, E_n^{2^n - 1} \right]$$
[0039] Preferably, the ESN in step two is an echo state network composed of three groups of neurons: an input layer, a reservoir (storage layer), and an output layer. It uses the reservoir as the information processing medium to map the input signal from a low-dimensional space to a high-dimensional state space. A linear regression method is then used to train part of the network's connection weights, while the other weights of the network are generated randomly and remain unchanged during training. For a reservoir with N internal units, the internal connection weight $w_m$ is an N×N matrix; at each time t, the input is u(t), the activation of the reservoir neurons is $x(t) = (x_1(t), x_2(t), \ldots, x_N(t))$, and the output is y(t).
[0040] The execution of the network is controlled by two state-space equations:
[0041] $$x(t) = f(w_{in} u(t) + w_m x(t-1) + v(t))$$
[0042] $$y(t) = g(w_{out} [x(t), u(t)])$$
[0043] where $w_{in}$ are the input weights, i.e. the connections from the input units to the reservoir; $w_{out}$ are the output weights, i.e. the connections from the reservoir to the output; f is the activation function, g is the nonlinear output unit, and the tanh function is used here; $[x(t), u(t)]$ denotes vector concatenation; and v(t) is an optional noise vector;
[0044] The final performance of the ESN depends on the parameters of the reservoir, including the scaling of the input weights $w_{in}$, the sparsity of the reservoir (SP), the size of the reservoir (N), and the spectral radius $\varepsilon_{max}$ of the internal connection weight matrix. When an ESN is built, only $w_{out}$ is learned; the matrices $w_m$ and $w_{in}$ are initialized as sparse random connections before learning begins. The reservoir is sparsely connected, and the sparsity SP refers to the proportion of interconnected neurons in the reservoir: the larger this proportion, the stronger the nonlinear approximation ability; typically, 1% to 5% connectivity is maintained. The more neurons in the reservoir, the more accurate the ESN's description of a given system. The echo state property means that, if the network runs long enough, the current state of the network is determined only by the network's input and output. The reservoir matrix $w_m$ is further scaled by a parameter called the spectral radius $\varepsilon_{max}$, the largest absolute eigenvalue of the connection weight matrix $w_m$; $\varepsilon_{max} < 1$ is a necessary condition to ensure network stability and achieve the echo state property;
[0045] The goal is to make the actual output y(t) of the network approximate the expected output y'(t). In short, only the output layer is optimized. Therefore, the cost function of this model, which minimizes the training error, is specifically expressed as follows:
[0046] $$E_{rror}(t) = g^{-1}(y'(t)) - w_{out} (u'(t), x(t))$$
[0047] where y'(t) and u'(t) are the training values corresponding to y(t) and u(t).
[0048] The technical effects of this invention are as follows:
[0049] This study applies EGRU, a GRU model combined with manual and neural network feature extraction, to the field of fault diagnosis to investigate its capabilities in feature learning and pattern recognition. To address the increasingly diverse fault types and increasingly complex data in complex systems, a neural network feature extraction method based on the ESN (echo state network) model is proposed on top of manual feature extraction. The neural network starts directly from the low-level input signals and obtains better feature representations through layer-by-layer learning, which makes feature extraction more intelligent. Combining this method with traditional manual feature extraction, which is highly stable and reliable, shows that the GRU fault diagnosis method, with the feature matrix constructed in this way as input, can better identify fault categories, achieve multi-fault classification, and reach high accuracy.
Attached Figure Description
[0050] Figure 1 is a flowchart of the GRU-based multi-fault diagnosis method for wind turbines.
[0051] Figure 2 shows the data distribution of the top two features in the feature ranking sets obtained from the two models.
[0052] Figure 3 shows the confusion matrix structures of the two models.
Detailed Implementation
[0053] To enable those skilled in the art to better understand the technical solution of the present invention, the present invention will be described in detail below. The description in this part is only exemplary and explanatory, and should not be used to limit the scope of protection of the present invention in any way.
[0054] As shown in Figure 1, a specific implementation example of this application is a GRU-based multi-fault diagnosis method for wind turbines, which includes the following steps:
[0055] Step 1: Build a wind turbine simulation experimental platform;
[0056] Step 2: Use an accelerometer to monitor the original vibration signals in the x, y, and z axes of 12 different states: healthy, slight pitting of the gear ring, moderate pitting of the gear ring, partial breakage of the gear ring, partial breakage of the sun gear, complete breakage of the sun gear, moderate double pitting of the sun gear, slight pitting of the planetary gear, moderate pitting of the planetary gear, moderate double pitting of the planetary gear, pitting of the planetary gear + breakage of the planetary gear, and moderate double breakage of the planetary gear.
[0057] Step 3: Using manual feature extraction methods, feature extraction is performed on the monitored vibration signal in the time domain, frequency domain, and time-frequency domain to obtain feature vectors T1, F1, and TF1 with dimensions of 36, 81, and 48 (12×3, 27×3, and 16×3), respectively; neural network feature extraction is performed on the monitored vibration signal using ESN to obtain feature vector E1.
[0058] Step 4: Use current sensors to monitor the original current signals in the x, y, and z axes under 12 different conditions: healthy, slight pitting of the gear ring, moderate pitting of the gear ring, partial breakage of the gear ring, partial breakage of the sun gear, complete breakage of the sun gear, moderate double pitting of the sun gear, slight pitting of the planetary gear, moderate pitting of the planetary gear, moderate double pitting of the planetary gear, pitting of the planetary gear + breakage of the planetary gear, and moderate double breakage of the planetary gear.
[0059] Step 5: Using manual feature extraction methods, extract features from the monitored current signal in the time domain, frequency domain, and time-frequency domain to obtain feature vectors T2, F2, and TF2, respectively. Use ESN to perform neural network feature extraction on the monitored current signal to obtain feature vector E2.
[0060] Step 6: Use the feature vector matrix {T1 F1 TF1 E1 T2 F2 TF2 E2} as the final feature of each sample, and input the resulting feature matrix into the GRU to complete the fault diagnosis of the gearbox; its structure and flow are shown in Figure 1.
[0061] Sample collection
[0062] The multi-fault diagnosis experiment for complex systems collected vibration and current signals from the gear ring, sun gear, and planetary gears in 12 different states. During data collection, the sampling frequency was 1 MHz (far exceeding the requirement of the sampling theorem), and the duration of each sample was 20 seconds. The sensitivity of the accelerometer was 100 mV/g. From the collected data, 12,000 samples were selected (6,000 vibration signal samples + 6,000 current signal samples), each with a length of 1024 points. 80% of the samples (9,600) were used as the training set, and the remaining 20% (2,400) were used as the test set. The sample state labels are shown in Table 4.1, with 1,000 samples per category (800 for training + 200 for testing). The data was randomly shuffled during the experiment.
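The split described above can be sketched as a per-class shuffle followed by an 80/20 cut, which reproduces the stated 800 training and 200 test samples per category. The source only says the data was randomly shuffled, so the per-class (stratified) procedure and the name `stratified_split` are assumptions made for illustration.

```python
import numpy as np

def stratified_split(labels, train_fraction=0.8, seed=0):
    """Shuffle within each class, then cut each class at train_fraction."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        cut = int(round(train_fraction * len(idx)))
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    # Shuffle again so that classes are interleaved in each set.
    return rng.permutation(train_idx), rng.permutation(test_idx)

labels = np.repeat(np.arange(12), 1000)    # 12 states x 1,000 samples each
train, test = stratified_split(labels)
```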
[0063] Table 4.1 12 Status Labels for Ring Gear, Sun Gear, and Planetary Gears
[0064]
[0065]
[0066] Feature extraction
[0067] Table 4.2 Number of vibration signal features extracted in multi-fault diagnosis experiment
[0068]
[0069] In this experiment, 12 time-domain features (the calculation formula for each time-domain index is given in Table 4.31), 27 frequency-domain features, and 16 time-frequency-domain features were extracted from the vibration signal along each of the x, y, and z axes, and ESN was then used to extract 200 features; that is, the feature dimension of each sample is (12+27+16)×3+200=365 (stated as 465 in the original text, which does not follow from these counts). Similarly, 12 time-domain features, 27 frequency-domain features, and 16 time-frequency-domain features were extracted from the monitored current signal along each of the x, y, and z axes, and ESN was then used to extract 200 features, again giving a feature dimension of (12+27+16)×3+200=365 per sample. These features were used as input for fault classification; the model feature extraction structure settings are shown in Tables 4.2 and 4.3.
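As a quick bookkeeping check, the sketch below tallies the feature dimensions from the per-axis counts given in the text (12, 27, and 16 manual features on each of three axes, plus 200 ESN features per signal). The variable names are illustrative only.

```python
manual_per_axis = {"time": 12, "frequency": 27, "time_frequency": 16}
n_axes = 3      # x, y, z
n_esn = 200     # ESN features per signal

# Manual feature vectors T, F, TF: per-axis count times the number of axes.
manual_dims = {name: count * n_axes for name, count in manual_per_axis.items()}
manual_total = sum(manual_dims.values())
per_signal_total = manual_total + n_esn
```

The manual part gives the 36, 81, and 48 dimensions quoted for T, F, and TF (165 in total), and adding the 200 ESN features yields 365 per signal.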
[0070] Table 4.3 Number of Current Signal Features Extracted in Multi-Fault Diagnosis Experiment
[0071]
[0072] Manual Feature Extraction
[0073] Fault feature extraction is crucial for fault diagnosis. Single fault diagnosis experiments based on original samples and GRUs have demonstrated that without in-depth mining of information in monitoring signals, a high fault identification rate is impossible in complex systems. In signal analysis of complex systems, the simplest and most commonly used method is time-domain analysis. Although time-domain statistical analysis cannot accurately determine the fault type or location, it is computationally simple and can intuitively display the operating status of the equipment. The statistical characteristic parameters commonly used in time-domain analysis can be further divided into dimensional parameters and dimensionless parameters.
[0074] Dimensional parameters include mean, skewness, peak value, standard deviation, variance, root mean square (RMS), and kurtosis. Generally, the RMS value is more effective for detecting wear-related faults, while the peak value is more effective for detecting pitting damage on bearing surfaces. The magnitude of skewness and kurtosis is more sensitive to impulsive faults in the signal.
[0075] Dimensionless parameters include waveform factor, kurtosis factor, peak factor, impulse factor, and margin factor; kurtosis factor and margin factor are more advantageous for detecting early-stage impact-related faults.
[0076] Dimensional parameters depend not only on the operating state of the equipment but also on its operating parameters such as speed and load, whereas dimensionless parameters are insensitive to changes in signal amplitude and frequency distribution and are largely independent of the equipment's operating conditions. Therefore, when diagnosing faults in complex systems, dimensional and dimensionless parameters are often combined to leverage the advantages of both and improve diagnostic accuracy.
[0077] In signal analysis of complex systems, time-domain analysis alone is often insufficient. The Fourier Transform (FT) is frequently used in signal processing to extract frequency-domain features. According to the principle of the Fourier transform, a continuously measured signal can be represented as a superposition of infinitely many sinusoidal signals of different frequencies. For example, the spectrum of a signal may consist of a few vertical lines, which makes its properties easier to examine. After a Fourier transform of the time-domain signal, the horizontal axis of the resulting spectrum represents the frequency of the vibration signal and the vertical axis represents the amplitude. To improve computational efficiency, the Fast Fourier Transform (FFT) improves on the Discrete Fourier Transform (DFT) algorithm by exploiting its periodicity and symmetry, significantly reducing the computational load while remaining simple and fast. The FFT calculates the amplitude and phase of the sinusoidal components of different frequencies in the signal, thus transforming the signal into frequency-domain samples. Sampling in the time domain corresponds to periodicity in the frequency domain, which facilitates data processing.
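[0078] For a sequence $x_n$, the DFT is defined as:
[0079] $$X_i = \sum_{n=0}^{N-1} x_n e^{-j 2\pi i n / N}$$
[0080] where N is the length of the sequence. Researchers have demonstrated that the computation of the DFT can be divided into two parts, one over the even-indexed samples and one over the odd-indexed samples. Therefore, the above formula can be expressed as:
[0081] $$X_i = \sum_{n=0}^{M-1} x_{2n} e^{-j 2\pi i (2n) / N} + \sum_{n=0}^{M-1} x_{2n+1} e^{-j 2\pi i (2n+1) / N}$$
[0082] where i = 0, 1, ..., N-1, and 0 ≤ n < M ≡ N/2.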
[0083] The feature parameters used in this paper are the root mean square values of each sub-band of the signal after FFT; the specific implementation process is shown below:
[0084] $$X(f) = \mathrm{FFT}(x(t))$$
[0085] $$\mathrm{rms}_{\mathrm{FFT}}(i) = \sqrt{\frac{1}{S} \sum_{m=1}^{S} \left| \mathrm{FFT}_i(m) \right|^2}$$
[0086] where x(t) is the acquired signal; S is the number of spectral lines; X(f) is the spectrum of x(t) after the Fast Fourier Transform; $\mathrm{rms}_{\mathrm{FFT}}(i)$ is the root mean square value of the i-th sub-band after the Fast Fourier Transform; and $\mathrm{FFT}_i(m)$ is the m-th spectral line of the i-th sub-band.
[0087] For early-stage faults or relatively obvious fault characteristics in complex systems, time-domain and frequency-domain analysis methods are effective. However, as the complexity of the fault increases, the fault state becomes difficult to identify, and the acquired signal exhibits strongly nonlinear and non-stationary characteristics. The Wavelet Transform (WT) decomposes only the low-frequency part of the signal and leaves the high-frequency part undecomposed, so its frequency resolution decreases as the frequency increases. The Wavelet Packet Transform (WPT) overcomes WT's poor frequency resolution in the high-frequency band by dividing the time-frequency plane more finely, which improves the signal analysis capability. WPT represents wavelet packets with an analysis tree. After dividing the frequency band into different levels, it adaptively selects the basis function that best matches the signal according to the characteristics of the divided signals. Wavelet packet decomposition is therefore a finer and more effective way to extract feature information from signals.
[0088] Therefore, after time-domain feature extraction and FFT frequency-domain feature extraction, WPT is used to extract the time-frequency feature parameters of the monitoring signal. To extract the WPT energy spectrum feature parameters, a suitable mother wavelet is first selected to perform an n-level wavelet packet decomposition of the vibration signal, and the obtained coefficient vector is shown below:
[0089] $$C = \left[ C_n^0, C_n^1, \ldots, C_n^{2^n - 1} \right]$$
[0090] where $C_n^i$ is the coefficient of the i-th node in the n-th layer of the decomposition.
[0091] Then, the wavelet packet decomposition coefficients are reconstructed, and the reconstructed total signal S is shown in the following equation:
[0092] $$S = \sum_{i=0}^{2^n - 1} S_n^i$$
[0093] where $S_n^i$ is the reconstructed signal of the i-th node in the n-th layer. The energy value corresponding to each sub-band is then calculated:
[0094] $$E_n^i = \sum_{k=1}^{N} \left| x_{ik} \right|^2$$
[0095] In the above formula, $x_{ik}$ (i = 0, 1, 2, ..., $2^n - 1$; k = 1, 2, ..., N) is the amplitude of the reconstructed signal $S_n^i$, and N is the number of vibration signal sample points. The energy spectrum eigenvector F is then constructed as shown in the following equation:
[0096] $$F = \left[ E_n^0, E_n^1, \ldots, E_n^{2^n - 1} \right]$$
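The wavelet packet energy spectrum can be illustrated with a small self-contained sketch. It makes several assumptions that are not taken from the source: the Haar wavelet stands in for the unspecified mother wavelet, n = 4 levels are used because 2^4 = 16 matches the 16 time-frequency features per axis, and the energy of each node is computed from its coefficients rather than from the reconstructed sub-band signal (for an orthonormal wavelet the two energies coincide). Nodes are returned in natural tree order, not sorted by frequency. `haar_wpt_energies` is an illustrative name.

```python
import numpy as np

def haar_wpt_energies(x, level=4):
    """Energy of each of the 2**level terminal nodes of a Haar wavelet packet tree."""
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(level):
        next_nodes = []
        for c in nodes:
            even, odd = c[0::2], c[1::2]
            next_nodes.append((even + odd) / np.sqrt(2.0))   # low-pass branch
            next_nodes.append((even - odd) / np.sqrt(2.0))   # high-pass branch
        nodes = next_nodes
    return np.array([np.sum(c ** 2) for c in nodes])

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)          # one 1024-point sample
energy = haar_wpt_energies(x, level=4)
```

Because every Haar split is orthonormal, the 16 node energies sum to the energy of the original sample, which is a convenient sanity check on any implementation.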
[0097] Table 4.31 Calculation formula for time-domain feature extraction index
[0098]
[0099] The Echo State Network (ESN) consists of three distinct groups of neurons: an input layer, a reservoir (storage layer), and an output layer. Its main idea is to use the reservoir as the information processing medium, mapping the input signal from a low-dimensional space to a high-dimensional state space. A linear regression method is then used to train part of the network's connection weights, while the other weights of the network are generated randomly and remain unchanged during training. For a reservoir with N internal units, the internal connection weight $w_m$ is an N×N matrix; at each time t, the input is u(t), the activation of the reservoir neurons is $x(t) = (x_1(t), x_2(t), \ldots, x_N(t))$, and the output is y(t).
[0100] The execution of the network is controlled by two state-space equations:
[0101] $$x(t) = f(w_{in} u(t) + w_m x(t-1) + v(t))$$
[0102] $$y(t) = g(w_{out} [x(t), u(t)])$$
[0103] where $w_{in}$ are the input weights, i.e. the connections from the input units to the reservoir; $w_{out}$ are the output weights, i.e. the connections from the reservoir to the output; f is the activation function, g is the nonlinear output unit, and the tanh function is used here; $[x(t), u(t)]$ denotes vector concatenation; and v(t) is an optional noise vector;
[0104] The final performance of the ESN depends on the parameters of the reservoir, including the scaling of the input weights $w_{in}$, the sparsity of the reservoir (SP), the size of the reservoir (N), and the spectral radius $\varepsilon_{max}$ of the internal connection weight matrix. When an ESN is built, only $w_{out}$ is learned; the matrices $w_m$ and $w_{in}$ are initialized as sparse random connections before learning begins. The reservoir is sparsely connected, and the sparsity SP refers to the proportion of interconnected neurons in the reservoir: the larger this proportion, the stronger the nonlinear approximation ability; typically, 1% to 5% connectivity is maintained. Naturally, the more neurons in the reservoir, the more accurate the ESN's description of a given system; however, an excessively large reservoir may lead to overfitting, so choosing an appropriate number of neurons N is crucial. The echo state property means that, if the network runs long enough, the current state of the network is determined solely by the network's input and output. The reservoir matrix $w_m$ is further scaled by a parameter called the spectral radius $\varepsilon_{max}$, the largest absolute eigenvalue of the connection weight matrix $w_m$; $\varepsilon_{max} < 1$ is a necessary condition to ensure network stability and achieve the echo state property;
[0105] The goal is to make the actual output y(t) of the network approximate the expected output y'(t). In short, only the output layer is optimized. Therefore, the cost function of this model, which minimizes the training error, is specifically expressed as follows:
[0106] $$E_{rror}(t) = g^{-1}(y'(t)) - w_{out} (u'(t), x(t))$$
[0107] where y'(t) and u'(t) are the training values corresponding to y(t) and u(t).
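The reservoir mechanics described above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the patent's implementation: the reservoir size of 200 is chosen only to echo the 200 ESN features, the 5% connectivity and spectral radius of 0.9 are example values within the ranges mentioned in the text, the noise term v(t) is set to zero, the output unit g is taken as the identity so that the readout reduces to ridge regression, and the one-step-ahead sine prediction is a toy task. All function names are hypothetical.

```python
import numpy as np

def build_reservoir(n_res, n_in, sparsity=0.05, spectral_radius=0.9, seed=0):
    """Random sparse reservoir, rescaled so its spectral radius is below 1."""
    rng = np.random.default_rng(seed)
    w_m = rng.uniform(-1, 1, (n_res, n_res))
    w_m *= rng.random((n_res, n_res)) < sparsity           # keep ~5% of connections
    w_m *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w_m)))
    w_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
    return w_in, w_m

def run_reservoir(u, w_in, w_m):
    """Iterate x(t) = tanh(w_in u(t) + w_m x(t-1)) and collect [x(t), u(t)]."""
    x = np.zeros(w_m.shape[0])
    states = []
    for u_t in u:
        x = np.tanh(w_in @ u_t + w_m @ x)
        states.append(np.concatenate([x, u_t]))
    return np.array(states)

def train_readout(states, targets, ridge=1e-4):
    """Only w_out is learned: ridge regression from states to targets."""
    a = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(a, states.T @ targets).T

t = np.arange(600)
u = np.sin(0.1 * t).reshape(-1, 1)
target = np.sin(0.1 * (t + 1)).reshape(-1, 1)              # one-step-ahead target
w_in, w_m = build_reservoir(n_res=200, n_in=1)
states = run_reservoir(u, w_in, w_m)
w_out = train_readout(states[100:], target[100:])          # drop the washout period
pred = states[100:] @ w_out.T
mse = float(np.mean((pred - target[100:]) ** 2))
```

For feature extraction rather than prediction, the rows of `states` (or a summary of them per sample) would play the role of the ESN feature vector; how the patent derives its 200 features from the reservoir is not stated in the text.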
[0108] Parameter Design
[0109] For artificial intelligence algorithms, parameter selection is both crucial and challenging. Among the key parameters, the size of the feature subset is determined by 5-fold cross-validation (5-CV): the optimal feature subset is the one with the highest average F1 score in the cross-validation results, which reduces the impact of useless features and improves the fault diagnosis process. First, the original data is divided into 5 groups. Each group in turn is used as the validation set, and the remaining 4 groups are used as the training set, resulting in 5 models. The average classification accuracy of these 5 models on their validation sets is used as the performance metric of the classifier under 5-CV. 5-CV helps avoid overfitting and underfitting, and the final results are convincing. The parameter s in the 5-fold cross-validation method is selected within the range $\{s \mid 1 \le s \le 300,\ s \in \mathbb{N}^+\}$. Cross-validation, as a cyclic estimation method, makes the obtained parameters more stable and reliable. However, because of its large computational load and the limited experimental conditions, this paper uses cross-validation only when selecting the optimal feature subset. After the F1 values of the different subsets are obtained, the feature set corresponding to the highest F1 value is selected as the optimal feature subset. The parameter settings are shown in Table 4.4.
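The selection loop can be sketched as follows. The sketch assumes the feature columns are already sorted by some ranking, so that a subset of size s is simply the first s columns; it uses a nearest-centroid classifier as a lightweight stand-in for the GRU, macro-averaged F1 as the score, and synthetic data. None of these choices come from the source, and all names are illustrative.

```python
import numpy as np

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in np.unique(y_true):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return float(np.mean(scores))

def nearest_centroid_predict(x_train, y_train, x_test):
    """Stand-in classifier: assign each test row to the closest class mean."""
    classes = np.unique(y_train)
    centroids = np.array([x_train[y_train == c].mean(axis=0) for c in classes])
    dist = ((x_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]

def cv_f1(x, y, s, k=5, seed=0):
    """Average macro-F1 over k folds, using only the first s feature columns."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    scores = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        pred = nearest_centroid_predict(x[train][:, :s], y[train], x[val][:, :s])
        scores.append(macro_f1(y[val], pred))
    return float(np.mean(scores))

rng = np.random.default_rng(1)
y = np.repeat(np.arange(3), 60)
x = rng.standard_normal((180, 10))
x[:, 0] += 3.0 * y            # only the two top-ranked columns carry class information
x[:, 1] -= 3.0 * y
candidates = [1, 2, 5, 10]    # candidate subset sizes s
f1_by_s = {s: cv_f1(x, y, s) for s in candidates}
best_s = max(f1_by_s, key=f1_by_s.get)
```

In the experiment described above, the candidate sizes would range over 1 ≤ s ≤ 300 and the classifier would be the GRU or LSTM model, which is what makes the procedure computationally expensive.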
[0110] Table 4.4 Gear Ring Status Recognition Parameter Structure Settings
[0111]
[0112] Experimental results
[0113] The F1 values of the two models were calculated and compared. It can be seen that the F1 value of GRU is the highest, close to 70%, while the F1 value of LSTM is close to 65%, indicating that the selected features are more correlated with faults.
[0114] The number of feature subsets at which each of the two models reached its highest F1 score was counted. Since LSTM and GRU have similar structures, the numbers of feature subsets are about the same, at 151 and 156 respectively, as shown in Table 4.5, indicating that the models have strong robustness.
[0115] Table 4.5 Gear Ring Status Recognition Parameter Structure Settings
[0116]
[0117] The experimental results of multi-fault diagnosis for complex systems based on vibration signals are shown in Table 4.6. When diagnosing faults based on vibration signals, GRU achieved the best classification accuracy (60.396%) among the five models SVM, RF, LSTM, ESN, and GRU, while SVM performed the worst, with an accuracy of only 22.726%. ESN's performance was still not as outstanding as when based on the original signal. The GRU model that incorporates ESN-processed features (denoted EGRU) achieved a classification accuracy of 70.975%, a relative improvement of 17.516% over the GRU model. This indicates that the EGRU model has the highest classification accuracy for a single signal type, namely vibration signals.
[0118] In the multi-fault diagnosis experiment of complex systems based on vibration signals, the classification accuracy of GRU was improved by 8% compared with LSTM. Table 4.7 lists the running times of the LSTM and GRU models in this experiment. The average running time of the LSTM model is 170.636 s and that of the GRU model is 156.131 s, which is shorter than the running time of an LSTM model with the same structure. This shows that the GRU model is computationally more efficient than LSTM.
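The quoted improvement is consistent with a relative difference between the two accuracies rather than a difference in percentage points. A short check, with the helper name `relative_improvement` chosen here for illustration:

```python
def relative_improvement(baseline, improved):
    """Relative gain of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0

# GRU (60.396%) versus EGRU (70.975%) on vibration signals.
print(round(relative_improvement(60.396, 70.975), 3))  # 17.516
```

The absolute gain in this case is 70.975 - 60.396 = 10.579 percentage points.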
[0119] Table 4.6 Accuracy of Multi-Fault Diagnosis for Complex Systems Based on Vibration Signals
[0120]
[0121] Table 4.7 Experimental times for LSTM and GRU
[0122]
[0123] The experimental results of multi-fault diagnosis of complex systems based on current signals are shown in Table 4.8. The classification performance of GRU is still better than that of LSTM, with an accuracy of 53.988%. The classification accuracy of the EGRU model reached 63.788%, a relative improvement of 18.152% over the GRU model.
[0124] In the experiment of multi-fault diagnosis of complex systems based on current signals, the classification accuracy of GRU was improved by 24.984% compared with LSTM. Table 4.9 lists the running times of the LSTM and GRU models in this experiment. The average computation time of the LSTM model is 187.67 s and that of the GRU model is 162.223 s, which is again shorter than that of the LSTM model.
[0125] Table 4.8 Accuracy of Multi-Fault Diagnosis for Complex Systems Based on Current Signals
[0126]
[0127]
[0128] Table 4.9 Five Experimental Times for LSTM and GRU
[0129]
[0130] The experimental results of multi-fault diagnosis of complex systems based on feature layer fusion are shown in Table 4.10. The EGRU classification accuracy is the highest in Tables 4.6, 4.8, and 4.10. This indicates that manually extracted features, which rely heavily on professional signal processing knowledge and fault diagnosis experience, cannot fully capture the fault characteristics on their own. It further demonstrates that combining manually extracted features with features extracted by neural networks can overcome the dependence on fault diagnosis knowledge and experience, improve the stability and intelligence of recognition, and increase the accuracy of feature classification to a certain extent. The results in Table 4.10 are better than those of the corresponding models in Tables 4.6 and 4.8. For example, the accuracy of the EGRU model based on feature layer fusion is 80.058%, a relative improvement of 12.798% over the EGRU model based on vibration signals and of 25.506% over the EGRU model based on current signals. This shows that information fusion technology can increase the reliability of the data and improve the classification accuracy of the model to a certain extent.
[0131] Table 4.10 Accuracy of Multi-Fault Diagnosis for Complex Systems Based on Feature Layer Fusion
[0132]
[0133] In the experiment of multi-fault diagnosis of complex systems based on feature layer fusion, the classification accuracy of GRU was improved by 24.984% compared with LSTM. Table 4.11 lists the running times of the LSTM and GRU models in the multi-fault diagnosis experiment based on feature layer fusion. The average computation time of the LSTM model is 281.657 s and that of the GRU model is 218.497 s, which is again shorter than that of the LSTM model. Meanwhile, the average computation time of the EGRU model is 267.325 s: because ESN neural network feature extraction is added on top of the GRU model, the larger model input leads to a corresponding increase in time. However, the EGRU model still runs faster than the LSTM model, indicating better computational efficiency.
[0134] Table 4.11 Experimental times for LSTM and GRU
[0135]
[0136] Neural Network Feature Extraction Evaluation Analysis
[0137] To evaluate the performance of the feature extraction method of the proposed EGRU model, the data distributions of the first two features learned from the vibration signal were selected and are presented in Figure 2, where each shape represents a state (there are twelve different states in this experiment). The comparison in Figure 2 shows that the EGRU model has a greater ability to distinguish between data points of different categories, indicating that combining manual feature extraction with neural network feature extraction can greatly improve the ability to classify faults.
[0138] Therefore, this experiment shows that neural networks can directly start from low-level input signals and obtain better feature representations through layer-by-layer intelligent learning, thus enhancing the intelligence of feature extraction. In summary, the experimental results of EGRU show that when performing multi-fault diagnosis, combining manual feature extraction and neural network adaptive feature extraction and applying them to the GRU model will improve the model's ability to diagnose multiple faults in complex systems.
[0139] Model evaluation analysis
[0140] Depending on the distribution of each class in the training data, the final fault diagnosis accuracy may be very high for some classes (with good training data) and relatively low for others (with less or poorer data). The confusion matrix exposes the information hidden in the overall classification accuracy: it summarizes the prediction results of a classification problem by breaking them down per class. Figure 3 shows the confusion matrices of the GRU and EGRU models of Table 4.10 at the highest accuracy in five experiments, and Table 4.12 lists the numbers of correct and incorrect predictions for each category of EGRU in the multi-fault diagnosis experiment. In terms of per-category diagnostic accuracy, the diagnostic performance of the EGRU model is significantly improved compared with the other models, and the number of correct fault diagnoses is relatively even across categories, indicating that the model treats samples of each category fairly in the classification process.
[0141] Table 4.12 EGRU Confusion Matrix for Multi-Fault Diagnosis Experiment
[0142]
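A confusion matrix and the per-category counts of correct and incorrect predictions can be computed as in the following sketch. The function names are hypothetical, and the three-class labels are toy data for brevity (the experiment itself has twelve states).

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """matrix[t][p] counts samples of true class t predicted as class p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

def per_class_counts(matrix):
    """Return (correct, incorrect) prediction counts for each true class."""
    return [(row[i], sum(row) - row[i]) for i, row in enumerate(matrix)]

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
print(cm)                    # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
print(per_class_counts(cm))  # [(1, 1), (2, 0), (1, 1)]
```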
[0143] In summary, neural networks can directly start from low-level input signals and obtain better feature representations through layer-by-layer intelligent learning, thus enhancing the intelligence of feature extraction. Compared with feature extraction by neural network algorithms, traditional manual feature extraction is characterized by strong stability and high reliability. Neural network feature extraction, however, does not depend on signal processing technology or diagnostic experience, and it eliminates the tedious and complex process of manually extracting fault features. In addition, manually computed statistical feature parameters and fault features extracted by neural networks describe faults from different perspectives, so combining the two can more effectively improve the classification accuracy of the classifier.
[0144] Based on the traditional fault diagnosis of complex systems which relies on manual feature extraction, this paper proposes a fault diagnosis method that combines neural network feature extraction with manual feature extraction. This method first uses time domain, frequency domain, and time-frequency domain methods to extract fault features. Then, it combines the features extracted by the neural network with the time domain features extracted by manual methods to form an augmented feature vector, which is then input into a GRU for fault diagnosis. Finally, an application example of multi-fault diagnosis of complex systems demonstrates that the proposed method EGRU can effectively improve the diagnostic accuracy.
[0145] It should be noted that, in this document, the terms “comprising,” “including,” or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus.
[0146] This article uses specific examples to illustrate the principles and implementation methods of the present invention. The above examples are only for the purpose of helping to understand the method and core ideas of the present invention. The above descriptions are only preferred embodiments of the present invention. It should be noted that due to the limitations of textual expression, while there are objectively infinite specific structures, those skilled in the art can make several improvements, modifications, or changes without departing from the principles of the present invention, and can also combine the above technical features in an appropriate manner. These improvements, modifications, changes, or combinations, or the direct application of the inventive concept and technical solution to other situations without modification, should all be considered within the scope of protection of the present invention.
Claims
1. A multi-fault diagnosis method for wind turbines based on GRU, characterized in that it includes the following steps: Step 1: Use an accelerometer to monitor the original vibration signals in the x, y, and z axes under 12 different conditions: healthy, slight pitting of the gear ring, moderate pitting of the gear ring, partial breakage of the gear ring tooth, partial breakage of the sun gear tooth, complete breakage of the sun gear tooth, moderate double pitting of the sun gear, slight pitting of the planetary gears, moderate pitting of the planetary gears, moderate double pitting of the planetary gears, pitting of the planetary gears + breakage, and moderate double breakage of the planetary gears; use a current sensor to monitor the original current signals in the x, y, and z axes under the same 12 conditions. Step 2: Using manual feature extraction methods, perform feature extraction on the monitored vibration and current signals in the time domain, frequency domain, and time-frequency domain to obtain vibration feature vectors T1, F1, TF1 and current feature vectors T2, F2, TF2 with dimensions of 36, 81, and 48 (12×3, 27×3, 16×3), respectively; then perform neural network feature extraction on the monitored vibration and current signals using an ESN to obtain vibration feature vector E1 and current feature vector E2.
Step 3: Use the feature vector matrix {T1 F1 TF1 E1 T2 F2 TF2 E2} as the final feature of each sample, and input the feature matrix thus formed into the GRU to complete the fault diagnosis of the gearbox. The GRU used in Step 3 has only two gates: the update gate z_t and the reset gate r_t. The update gate z_t controls how much state information from the previous time step is carried into the current state: the smaller z_t is, the more information of the previous neuron is retained and the less information of the current neuron is retained. The reset gate r_t controls the degree to which information from the previous time step is ignored: when the information of the previous neuron contains useless information, a smaller r_t means that the current neuron's information is used as input and only a portion of the information of the previous neuron is retained. For the input vector x_t, the transformation functions in the GRU hidden unit involve the following quantities: W_z and W_r are the weight matrices of z_t and r_t; W_c is the weight matrix of the output state; h_{t-1} is the input at time t-1; ⊙ denotes the element-wise product; h̃_t and h_t are the candidate state and the output state at time t, respectively; σ and tanh are the activation functions used for the gates and the candidate state, respectively; and V_z and V_r are parameter matrices and vectors. By analyzing the principle of the GRU model, it is found that the neurons in the GRU are interconnected and interdependent. The backward training of the GRU model uses the Back-Propagation Through Time (BPTT) algorithm: first, the weighted inputs and the error at time t+1 are defined; then the error is propagated backward in time, that is, the error and output weights at time t are calculated, the weights at time t+1 are obtained, and the parameter gradients of the objective function are calculated. These gradients are used as the basis for updating the model's parameters, so that the model is updated iteratively.
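For orientation, a single forward step of a GRU cell can be sketched as follows. This uses the widely published GRU formulation in the convention where a small z_t keeps more of the previous state, which matches the gate behaviour described above; it is an assumption, not the exact formula of the claim. The bias vectors b_z, b_r, b_c, the recurrent matrix V_c, and the function names are introduced here for illustration only.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, p):
    """One GRU step; `p` maps parameter names to NumPy arrays."""
    z_t = sigmoid(p["W_z"] @ x_t + p["V_z"] @ h_prev + p["b_z"])  # update gate
    r_t = sigmoid(p["W_r"] @ x_t + p["V_r"] @ h_prev + p["b_r"])  # reset gate
    # Candidate state: the reset gate scales how much of h_prev is used.
    h_cand = np.tanh(p["W_c"] @ x_t + p["V_c"] @ (r_t * h_prev) + p["b_c"])
    # Small z_t keeps the previous state; large z_t takes the candidate.
    return (1.0 - z_t) * h_prev + z_t * h_cand

n_in, n_hidden = 4, 3
rng = np.random.default_rng(0)
params = {name: rng.normal(scale=0.1, size=(n_hidden, n_in))
          for name in ("W_z", "W_r", "W_c")}
params.update({name: rng.normal(scale=0.1, size=(n_hidden, n_hidden))
               for name in ("V_z", "V_r", "V_c")})
params.update({name: np.zeros(n_hidden) for name in ("b_z", "b_r", "b_c")})

h = np.zeros(n_hidden)
for x in rng.normal(size=(5, n_in)):  # run over a short input sequence
    h = gru_step(x, h, params)
print(h.shape)
```

Training with BPTT would differentiate a loss on the outputs with respect to these parameters across all time steps; that part is omitted here.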
2. The method for multi-fault diagnosis of wind turbines based on GRU according to claim 1, characterized in that the manual feature extraction is a time-domain analysis method. The statistical feature parameters commonly used in time-domain analysis methods are divided into dimensional parameters and dimensionless parameters. Dimensional parameters include mean, skewness, peak value, standard deviation, variance, root mean square (RMS), and kurtosis. The RMS value corresponds to the detection of wear-related faults, while the peak value corresponds to the detection of pitting damage on the bearing surface. The magnitudes of skewness and kurtosis correspond to impulsive faults in the signal. Dimensionless parameters include waveform factor, kurtosis factor, peak factor, impulse factor, and margin factor; the kurtosis factor and margin factor are used to detect early-stage faults of the impact type.
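The twelve time-domain statistics named in this claim can be computed per signal axis as in the sketch below. The formulas are common textbook definitions and are an assumption here, since the claim does not spell them out; the function name `time_domain_features` is hypothetical, and the signal is assumed to be non-empty and not identically zero.

```python
import math

def time_domain_features(signal):
    """Seven dimensional and five dimensionless statistics of one signal."""
    n = len(signal)
    mean = sum(signal) / n
    abs_mean = sum(abs(v) for v in signal) / n
    variance = sum((v - mean) ** 2 for v in signal) / n
    rms = math.sqrt(sum(v * v for v in signal) / n)
    peak = max(abs(v) for v in signal)
    root_amplitude = (sum(math.sqrt(abs(v)) for v in signal) / n) ** 2
    skewness = sum((v - mean) ** 3 for v in signal) / n  # third central moment
    kurtosis = sum((v - mean) ** 4 for v in signal) / n  # fourth central moment
    return {
        # dimensional parameters
        "mean": mean, "skewness": skewness, "peak": peak,
        "std": math.sqrt(variance), "variance": variance,
        "rms": rms, "kurtosis": kurtosis,
        # dimensionless parameters
        "waveform_factor": rms / abs_mean,
        "kurtosis_factor": kurtosis / rms ** 4,
        "peak_factor": peak / rms,
        "impulse_factor": peak / abs_mean,
        "margin_factor": peak / root_amplitude,
    }

features = time_domain_features([0.1, -0.4, 0.9, -0.2, 0.3])
print(len(features))  # 12
```

Applying the function to each of the three axes yields 12 × 3 = 36 values, which agrees with the dimension stated for T1 and T2 in claim 1.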
3. The method for multi-fault diagnosis of wind turbines based on GRU according to claim 2, characterized in that the Fourier Transform (FT) is used to extract frequency-domain features; by performing the Fourier Transform on the time-domain signal, a signal spectrum is obtained in which the horizontal axis represents the frequency of the vibration signal and the vertical axis represents the amplitude. For a sequence x_n, the DFT is defined as: $$X(i) = \sum_{n=0}^{N-1} x_n \, e^{-j 2\pi i n / N}$$ where N is the length of the sequence; the DFT calculation is divided into two parts, an odd-numbered part and an even-numbered part, so the above formula can be expressed as $$X(i) = \sum_{n=0}^{N/2-1} x_{2n} \, e^{-j 2\pi i (2n) / N} + \sum_{n=0}^{N/2-1} x_{2n+1} \, e^{-j 2\pi i (2n+1) / N}$$ where i = 0, 1, …, N-1, and 0 ≤ n < N/2. The feature parameters are the root mean square values of each sub-band after the signal undergoes the FFT. The specific implementation uses the following quantities: the acquired signal; the number of spectral lines S; the spectrum of the signal after the Fast Fourier Transform; the root mean square value of the i-th sub-band after the Fast Fourier Transform; and the m-th spectral line of the i-th sub-band.
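The sub-band RMS features can be sketched with NumPy as follows. Several choices are assumptions made for the example: the one-sided amplitude spectrum from `np.fft.rfft`, equal-width sub-bands, and the default of 27 sub-bands, which is inferred from the 27×3 frequency-domain dimension in claim 1. The function name `subband_rms` is hypothetical.

```python
import numpy as np

def subband_rms(signal, n_bands=27):
    """RMS of the FFT amplitude spectrum within each of n_bands sub-bands."""
    spectrum = np.abs(np.fft.rfft(signal)) / len(signal)  # one-sided amplitudes
    bands = np.array_split(spectrum, n_bands)             # near-equal sub-bands
    return np.array([np.sqrt(np.mean(band ** 2)) for band in bands])

# Synthetic test tone: 100 Hz sine sampled at 256 Hz for one second.
t = np.arange(256) / 256.0
tone = np.sin(2 * np.pi * 100 * t)
features = subband_rms(tone)
print(features.shape)
```

With three axes per sensor this gives 27 × 3 = 81 frequency-domain features per signal type.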