Voltage transformer error prediction method based on modal decomposition and gated recurrent unit
By combining mode decomposition with gated recurrent units, the method addresses error prediction for voltage transformers under complex operating conditions, enabling accurate prediction over multiple time steps and improving the operational reliability of the power grid.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: SOUTHEAST UNIV
- Filing Date: 2025-04-27
- Publication Date: 2026-06-26
AI Technical Summary
Existing error prediction methods for voltage transformers fall short in dynamic adaptability and suffer from accumulating prediction error; in particular, they struggle to meet the accuracy requirements of the power grid under complex operating conditions.
A method based on mode decomposition and gated recurrent unit is adopted. Through adaptive empirical mode decomposition and nonlocal attention mechanism, the voltage transformer signal is decomposed into multiple mode components. Feature fusion is performed by combining gated recurrent unit and nonlocal attention mechanism to achieve multi-step prediction.
It effectively suppresses noise interference, improves the signal-to-noise ratio, dynamically balances local time-series features and global contextual information, solves the error accumulation problem in long-term series prediction using traditional methods, and improves the accuracy and stability of prediction.
[Figure: CN120449093B_ABST, abstract drawing]
Description
Technical Field
[0001] This invention relates to the field of online monitoring technology for voltage transformers, and specifically discloses a voltage transformer error prediction method based on mode decomposition and gated recurrent units.
Background Technology
[0002] Voltage transformers are core measuring devices in power systems. Their core function is to achieve accurate conversion and electrical isolation between the primary high-voltage system and secondary equipment, providing necessary voltage signals for key functions such as power metering, relay protection, and system monitoring. With the construction of ultra-high-voltage power grids and large-scale grid connection of energy sources, capacitive voltage transformers (CVTs) face increasingly severe operating environment challenges. On the one hand, harmonic pollution and DC bias caused by AC / DC hybrid systems significantly increase the complexity of the electromagnetic environment. On the other hand, during long-term operation, their internal key components (such as capacitive voltage dividers and electromagnetic units) are subjected to multi-physical field coupling effects such as temperature fluctuations and mechanical disturbances, resulting in insulation aging and deterioration of core magnetic properties. This leads to a gradual increase in measurement errors, seriously affecting the measurement accuracy and operational reliability of the power system.
[0003] Currently, fault diagnosis of capacitive voltage transformers (CVTs) mainly relies on a passive approach of periodic calibration or post-event maintenance, and lacks online monitoring and early warning mechanisms for early performance degradation. Since CVTs typically operate under high voltage, strong electromagnetic interference, and extreme temperatures, the performance degradation of their internal components is a gradual process. If not detected and addressed promptly, it can lead to serious consequences such as metering inaccuracies and protection malfunctions, directly impacting the safe and stable operation of the power system. Therefore, studying the error evolution patterns of voltage transformers and developing high-precision signal state prediction methods are of great significance for achieving intelligent equipment operation and maintenance, reducing sudden faults, and improving power grid reliability.
[0004] Existing methods for predicting the state of voltage transformers have certain limitations and shortcomings, including the following:
[0005] 1) Mechanism Modeling Method: Based on the law of electromagnetic induction and equivalent circuit theory, a differential equation or state-space model is established, incorporating primary / secondary winding parameters, core excitation impedance, distributed capacitance, and other elements, to quantitatively analyze the error characteristics of the voltage transformer. However, this method relies too heavily on idealized assumptions and fails to consider practical factors such as core nonlinearity and insulation aging, resulting in limited model accuracy. Furthermore, this method completely ignores transient interference factors such as lightning surges and switching operations, causing the model to fail to accurately reflect the dynamic characteristics of the device in complex actual operating environments, leading to significant deviations between the error prediction results and the actual situation.
[0006] 2) Statistical regression method: This method establishes a linear relationship model between single variables such as temperature and load and error using historical data. While easy to implement, this method only reflects the static relationship between a single environmental variable and error and cannot handle the coupling effects of multiple factors. Furthermore, the model's predictive accuracy drops sharply when the operating environment changes significantly. When facing new operating conditions, a large amount of new data needs to be collected for modeling, resulting in poor adaptability.
[0007] 3) Shallow machine learning methods: These methods employ algorithms such as BP neural networks and support vector machines to establish error prediction models through training. However, the limitations of these network structures leave them with insufficient capacity to model time-series dependencies, so they must be retrained for each operating condition. Furthermore, error accumulates markedly in multi-step prediction, and prediction accuracy decreases noticeably once the prediction step length exceeds a certain limit.
[0008] The differences compared to existing technologies are as follows:
[0009] Comparison with the technology of patent CN115438576A "An electronic voltage transformer error prediction method based on Prophet, self-attention mechanism and time series convolutional network".
[0010] Patent CN115438576A proposes an error prediction method for electronic voltage transformers based on Prophet, a self-attention mechanism, and time-series convolutional networks. That method focuses on predicting the ratio error of electronic voltage transformers. By combining the periodicity analysis capability of the Prophet model, the local feature extraction capability of the time-series convolutional network, and the feature enhancement effect of the self-attention mechanism, it predicts the long-term error trend of electronic voltage transformers. That technical solution is applicable to the condition monitoring and maintenance of electronic voltage transformers in power systems, and is particularly suitable for handling error data with obvious periodic characteristics. The present patent, by contrast, proposes an error prediction method for capacitive voltage transformers based on mode decomposition and gated recurrent units. It addresses the challenge of error prediction for capacitive voltage transformers under complex operating conditions: it decomposes non-stationary signals into multiple mode components through an improved mode decomposition technique and achieves dynamic feature fusion by combining gated recurrent units with a non-local attention mechanism. This solution is particularly suitable for processing voltage transformer output signals containing transient responses and noise interference, and can effectively solve the error accumulation problem that traditional methods exhibit in multi-step prediction. The two have fundamentally different application scenarios and target objects.
[0011] Patent CN115438576A standardizes the ratio difference data of electronic voltage transformers and inputs it into the Prophet model, decomposing it into three basic components: a trend term, a periodic term, and a noise term. It extracts local temporal features using dilated causal convolutions and residual connections in a time-series convolutional network and weights them with the periodic term from the Prophet model. A self-attention mechanism is used to calculate feature weights, enhancing key information. Finally, a fully connected layer reduces the dimensionality of the output prediction result. By employing a cascaded structure of Prophet and a time-series convolutional network, it emphasizes the combination of the periodicity of the time series and local features. This patent employs an improved mode decomposition method combined with frequency-domain adaptive noise adjustment technology to decompose the output signal of a capacitive voltage transformer into multiple intrinsic mode function components. A feature evaluation system is constructed based on multi-scale information entropy theory to classify the components. Adaptive wavelet packet thresholding is used for high-entropy noise modes, preserving effective signal features while reducing noise. A hybrid architecture combining gated recurrent units and non-local attention is designed to dynamically balance local temporal features and global contextual information through a gating mechanism, enabling parallel prediction of each mode component. The prediction results of each component are then superimposed to obtain the final predicted output signal value. This patent decomposes the signal into multi-mode components through mode decomposition and employs a parallel modeling strategy, emphasizing multi-scale decomposition and global feature fusion. The two technical solutions are fundamentally different.
[0012] Patent CN115438576A utilizes the Prophet model, employing piecewise linear functions to fit the trend term and Fourier series to fit the periodic term. This model exhibits strong robustness to missing and outlier values and can identify and adapt to periodic changes across different time scales. In the time-series convolutional network, dilated causal convolutions expand the receptive field, and residual connections prevent gradient vanishing, enhancing local feature extraction capabilities. A self-attention mechanism calculates feature weights, strengthening the model's focus on key information, primarily using fixed-structure convolutions and attention mechanisms. The improved mode decomposition method in this patent dynamically adjusts the noise amplitude coefficient through frequency domain analysis, achieving adaptive optimization of signal decomposition results and significantly improving decomposition quality. Wavelet packet transform combined with a dynamic thresholding strategy is used for noise reduction, effectively suppressing noise while fully preserving the transient characteristics of the signal. When building the prediction model, gating coefficients are calculated in real-time, dynamically adjusting the fusion weights of local temporal features (GRU output) and global contextual information (attention output), enabling the model to automatically adjust its processing strategy based on the characteristics of the input signal. These two approaches differ fundamentally in their technical methods.
[0013] The Prophet model in patent CN115438576A demonstrates excellent fitting ability for error data with obvious periodic characteristics, capturing periodic variation patterns. Time-series convolutional networks, through multi-level feature abstraction, effectively extract nonlinear features from error signals, making them suitable for medium- to long-term predictions, but their ability to handle high-frequency noise and transient interference signals is relatively limited. This patent, through multi-modal signal decomposition, maintains stable performance even under complex conditions such as voltage fluctuations and harmonic interference; wavelet packet denoising technology can precisely suppress noise in different frequency bands, preserving key signal features while denoising; and the non-local attention mechanism overcomes the "memory length limitation" of traditional time-series models, establishing feature associations across time periods and significantly improving the accuracy of multi-step predictions. The two technologies differ fundamentally in their technical effects.
[0014] In summary, existing methods share common problems such as poor dynamic adaptability and severe accumulation of prediction errors. Particularly when dealing with complex nonlinear systems like CVTs, the prediction reliability of existing methods fails to meet the accuracy requirements of smart grids. Therefore, there is an urgent need to develop a data-driven voltage transformer error prediction method that deeply mines the time-correlation characteristics of the signal, effectively mitigates error accumulation during prediction, and achieves accurate prediction of error signals over multiple future time steps.
Summary of the Invention
[0015] To address the problem of prediction error accumulation in CVTs under actual variable operating conditions, this invention proposes a voltage transformer error prediction method based on mode decomposition and gated recurrent units. The method decomposes the transformer output signal into multiple modes using an adaptive empirical mode decomposition method, classifies the modes based on signal entropy, establishes separate prediction models for the different modes, and superimposes the per-component predictions to form the prediction result, thereby achieving intelligent prediction of voltage transformer signals.
[0016] To achieve the above objectives, the technical solution adopted by the present invention is as follows:
[0017] A voltage transformer error prediction method based on mode decomposition and gated recurrent units, characterized by the following steps:
[0018] Step 1: Read the original output signal of the voltage transformer by equal-interval sampling to establish a historical dataset;
[0019] Step 2: Adopt an improved complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) method, which decomposes the non-stationary signal into several intrinsic mode function (IMF) components through an adaptive adjustment mechanism for the white-noise spectrum;
[0020] Step 3: Construct a feature evaluation system based on multi-scale information entropy, achieve accurate separation of feature modes through joint analysis of modal entropy value and frequency band, and use adaptive wavelet packet thresholding to denoise high-entropy noise modes while preserving the transient response characteristics of feature modes;
[0021] Step 4: Perform attention-enhanced signal prediction, design a hybrid architecture of gated recurrent unit and attention mechanism, capture local dynamic characteristics through spatiotemporal gating unit, and capture dependencies over long time intervals using nonlocal attention mechanism to achieve adaptive weight allocation of key features;
[0022] Step 5: Input each modal component in parallel into the signal prediction model for training and predict the future trend data of each component. The prediction results are superimposed as the output signal prediction value.
[0023] Step 6: Validate the model on the validation set and use multiple evaluation metrics to comprehensively evaluate the model performance.
[0024] As a further improvement of the present invention, the improved CEEMDAN method in step 2 includes the following process:
[0025] (2-1) In the improved mode decomposition process, adaptive noise is added to the signal. The noise amplitude adjustment coefficient $\beta$ [value missing] scales the added noise, and the generated white noise $\omega_i(t)$ has mean 0 and variance 1, where $i = 1, 2, \ldots, N$;
[0026] The noise is then added to the signal $x(t)$ to generate the perturbed signals $x_i(t)$:
[0027] $$x_i(t) = x(t) + \beta\,\omega_i(t)$$
[0028] The choice of the noise amplitude $\beta$ directly affects the purity of the modal components and the stability of the decomposition. It is set by frequency-domain adaptive noise adjustment:
[0029] [formula missing]
[0030] where $f$ denotes the signal frequency and $\sigma$ denotes the standard deviation of the signal;
[0031] (2-2) Perform EMD on each noise-perturbed signal to obtain a set of IMF components and a residual:
[0032] $$x_i(t) = \sum_{k=1}^{K_i} \mathrm{IMF}_k^{i}(t) + r_i(t)$$
[0033] where $K_i$ denotes the number of IMFs in the $i$-th decomposition, $\mathrm{IMF}_k^{i}(t)$ denotes the $k$-th intrinsic mode component, and $r_i(t)$ denotes the residual component. The first IMF component of every noise-perturbed signal is taken and averaged, $\overline{\mathrm{IMF}}_1(t) = \frac{1}{N}\sum_{i=1}^{N}\mathrm{IMF}_1^{i}(t)$; the signal is then updated and the iteration repeated to extract the remaining modal components.
[0034] As a further improvement of the present invention, step 3 achieves accurate separation of characteristic modes through joint analysis of modal entropy value and frequency band, and adaptive wavelet packet thresholding is used to denoise high-entropy noise modes, including the following process:
[0035] (3-1) For the extracted IMF components, calculate the signal entropy of each component, using a Gaussian kernel function for kernel density estimation:
[0036] $$\hat{f}_h(x) = \frac{1}{nh}\sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right)$$
[0037] where $\hat{f}_h(x)$ denotes the density estimate at position $x$, $x_i$ are the samples, $n$ is the sample size, $h$ is the bandwidth, and $K(\cdot)$ is the kernel function;
[0038] (3-2) For high-entropy mode components, wavelet packet transform combined with an adaptive thresholding method is used for denoising. Adaptive thresholding preserves the detailed features of the signal during denoising, achieving a better balance between signal continuity and effective noise removal. The calculation formula is:
[0039] [formula missing]
[0040] In the formula, $w$ denotes the wavelet packet coefficients, $\lambda$ is a global threshold calculated from the noise level, and $\alpha$ is an adaptive factor [range missing];
[0041] Each wavelet packet coefficient $w$ is compared with the calculated threshold $\varphi$: if $|w| \geq \varphi$, the coefficient is retained; if $|w| < \varphi$, the coefficient is set to zero to remove noise.
[0042] As a further improvement to the present invention, the attention-enhanced signal prediction model in step 4 includes the following process:
[0043] (4-1) A non-local attention mechanism helps the model capture dependencies over long time intervals. The non-local attention operation computes the non-local block response from the relationships between nodes at different positions, i.e.:
[0044] $$y_i = \frac{1}{C(x)} \sum_{\forall j} f(x_i, x_j)\, g(x_j)$$
[0045] where $x$ is the input time-series signal; $y_i$ is the enhanced feature at position $i$; $f(x_i, x_j)$ computes the strength of association between positions $i$ and $j$, quantifying the dependency between the two time points; $g(x_j)$ transforms the input signal into a feature representation better suited to attention aggregation; and $C(x)$ is a normalization factor that prevents the attention weights from becoming ineffective due to differing scales;
[0046] (4-2) Gating fusion allocates weights dynamically, adaptively balancing the contributions of non-local attention and the GRU. Based on the input features at the current time step, the model determines in real time whether to rely on the global context or the local state. During modeling, through ... gating, a smooth transition is achieved. The gating fusion unit adaptively mixes local and global features by computing a gating value: the local and global features are concatenated, then mapped to the gating space through a weight matrix and a bias vector.
[0047] Beneficial effects:
[0048] By employing an improved mode decomposition method, multi-scale feature extraction of the signal is achieved. Through signal entropy-guided modal component analysis and noise reduction, the influence of noise in strong electromagnetic interference environments is suppressed, improving the signal-to-noise ratio while preserving effective signal features. A gated fusion mechanism dynamically balances local temporal features and global contextual information, solving the error accumulation problem in long-term series prediction using traditional methods. Multimodal parallel processing can adapt to different operating conditions and maintain stable prediction performance even under complex conditions. This method is applicable to the field of online monitoring technology for voltage transformers, solving the problem of error prediction for voltage transformers under complex operating conditions.
Attached Figure Description
[0049] Figure 1 is a flowchart of the voltage transformer signal prediction method provided by the present invention;
[0050] Figure 2 shows the GRU prediction model based on a non-local attention mechanism proposed in this invention.
Detailed Implementation
[0051] The present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments:
[0052] This invention discloses a voltage transformer error prediction method based on mode decomposition and gated recurrent units. The method flow is shown in Figure 1, and the GRU prediction model based on the non-local attention mechanism is shown in Figure 2. The specific steps are as follows:
[0053] Step 1: Historical Output Signal Acquisition. Acquire the output signals of the voltage transformer. Label and classify these signals for subsequent analysis and diagnosis.
[0054] Step 2: Adaptive Signal Decomposition. Adaptive noise based on the signal standard deviation is introduced to improve complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), which decomposes the noisy signal into multiple intrinsic mode components. Specific steps include:
[0055] (2-1) During the CEEMDAN decomposition process, adaptive noise is added to the signal. The noise amplitude adjustment coefficient $\beta$ [value missing] scales the added noise, and the generated white noise $\omega_i(t)$ has mean 0 and variance 1, where $i = 1, 2, \ldots, N$.
[0056] The noise is then added to the signal $x(t)$ to generate the perturbed signals $x_i(t)$:
[0057] $$x_i(t) = x(t) + \beta\,\omega_i(t)$$
[0058] The choice of the noise amplitude $\beta$ directly affects the purity of the modal components and the stability of the decomposition. The present invention sets it by frequency-domain adaptive noise adjustment:
[0059] [formula missing]
[0060] where $f$ denotes the signal frequency and $\sigma$ denotes the standard deviation of the signal.
[0061] (2-2) Perform EMD on each noise-perturbed signal to obtain a set of IMF components and a residual:
[0062] $$x_i(t) = \sum_{k=1}^{K_i} \mathrm{IMF}_k^{i}(t) + r_i(t)$$
[0063] where $K_i$ denotes the number of IMFs in the $i$-th decomposition, $\mathrm{IMF}_k^{i}(t)$ denotes the $k$-th intrinsic mode component, and $r_i(t)$ denotes the residual component. The first IMF component of every noise-perturbed signal is taken and averaged, $\overline{\mathrm{IMF}}_1(t) = \frac{1}{N}\sum_{i=1}^{N}\mathrm{IMF}_1^{i}(t)$; the signal is then updated and the iteration repeated to extract the remaining modal components.
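The ensemble-averaging idea in (2-1) and (2-2) can be sketched in a few lines. This is only an illustration under stated assumptions: `first_imf_stand_in` is a hypothetical moving-average detrender standing in for a real EMD sifting routine, and the choice `beta = 0.2 * np.std(x)` is an assumed placeholder, since the frequency-domain formula for the noise amplitude is not reproduced in the text.

```python
import numpy as np

def first_imf_stand_in(x, window=5):
    """Hypothetical stand-in for EMD's first IMF: signal minus a moving average.
    A real implementation would use envelope-based sifting instead."""
    kernel = np.ones(window) / window
    trend = np.convolve(x, kernel, mode="same")
    return x - trend

def ensemble_first_imf(x, beta, n_trials=50, extract=first_imf_stand_in, seed=0):
    """Average the first extracted component over noise-perturbed copies of x."""
    rng = np.random.default_rng(seed)
    imfs = []
    for _ in range(n_trials):
        omega = rng.standard_normal(x.size)   # white noise: mean 0, variance 1
        x_i = x + beta * omega                # perturbed signal
        imfs.append(extract(x_i))
    imf1 = np.mean(imfs, axis=0)              # ensemble-averaged first component
    residual = x - imf1                       # updated signal for the next round
    return imf1, residual

# Synthetic test signal: slow 5 Hz component plus a weaker 60 Hz component.
t = np.linspace(0.0, 1.0, 500)
x = np.sin(2 * np.pi * 5 * t) + 0.3 * np.sin(2 * np.pi * 60 * t)
beta = 0.2 * np.std(x)                        # assumed amplitude rule, not the patent's
imf1, residual = ensemble_first_imf(x, beta)
```

Repeating the same call on `residual` would extract the next component; a production implementation would substitute a proper EMD routine for the stand-in.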
[0064] Step 3: Multimodal Feature Extraction and Classification. A feature evaluation system is constructed based on information entropy theory. The information entropy of each component is calculated, identifying the high-entropy component of the noise-dominated mode, the medium-entropy component of the feature-information mode, and the low-entropy component of the trend component. Wavelet thresholding is used for noise reduction of the high-entropy component. Specific steps include:
[0065] (3-1) For the extracted IMF components, calculate the signal entropy of each component, using a Gaussian kernel function for kernel density estimation:
[0066] $$\hat{f}_h(x) = \frac{1}{nh}\sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right)$$
[0067] where $\hat{f}_h(x)$ denotes the density estimate at position $x$, $x_i$ are the samples, $n$ is the sample size, $h$ is the bandwidth, and $K(\cdot)$ is the kernel function.
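As a rough illustration of (3-1), the sketch below estimates the amplitude density of one component with a Gaussian kernel and integrates $-p \ln p$ numerically to obtain an entropy value. The bandwidth rule (a normal-reference rule of thumb), the grid size, and the function names are assumptions made for the example; the text does not specify them.

```python
import numpy as np

def gaussian_kde(samples, grid, h):
    """Kernel density estimate on `grid` with a Gaussian kernel of bandwidth h."""
    u = (grid[:, None] - samples[None, :]) / h
    k = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return k.sum(axis=1) / (samples.size * h)

def kde_entropy(samples, n_grid=512):
    """Differential entropy of the sample amplitudes, via the KDE density.
    Assumes the samples are not all identical (non-zero standard deviation)."""
    n = samples.size
    h = 1.06 * np.std(samples) * n ** (-0.2)      # assumed bandwidth rule
    grid = np.linspace(samples.min() - 3 * h, samples.max() + 3 * h, n_grid)
    p = np.clip(gaussian_kde(samples, grid, h), 1e-12, None)
    dx = grid[1] - grid[0]
    return float(-np.sum(p * np.log(p)) * dx)     # Riemann sum of -p ln p

rng = np.random.default_rng(0)
wide_mode = rng.standard_normal(1000)             # broad amplitude distribution
narrow_mode = 0.1 * rng.standard_normal(1000)     # concentrated distribution
h_wide = kde_entropy(wide_mode)
h_narrow = kde_entropy(narrow_mode)
```

Components could then be ranked by these values to separate high-, medium-, and low-entropy modes; the cut-off values for that split are not given in the text and would need to be chosen separately.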
[0068] (3-2) For high-entropy mode components, wavelet packet transform combined with an adaptive thresholding method is used for denoising. Adaptive thresholding preserves the detailed features of the signal during denoising, achieving a better balance between signal continuity and effective noise removal. The calculation formula is:
[0069] [formula missing]
[0070] In the formula, $w$ denotes the wavelet packet coefficients, $\lambda$ is a global threshold calculated from the noise level, and $\alpha$ is an adaptive factor [range missing].
[0071] Each wavelet packet coefficient $w$ is compared with the calculated threshold $\varphi$: if $|w| \geq \varphi$, the coefficient is retained; if $|w| < \varphi$, the coefficient is set to zero to remove noise.
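The keep-or-zero comparison against the threshold φ can be illustrated on a plain coefficient array. Because the adaptive threshold formula is not reproduced in the text, this sketch assumes the simplest combination, φ = α·λ, with λ taken as the widely used universal threshold and the noise level estimated from the median absolute deviation; both choices, and the function names, are assumptions rather than the patented formula.

```python
import numpy as np

def universal_threshold(coeffs):
    """Global threshold from the noise level (MAD-based sigma estimate)."""
    sigma = np.median(np.abs(coeffs)) / 0.6745
    return sigma * np.sqrt(2.0 * np.log(coeffs.size))

def hard_threshold(coeffs, alpha=1.0):
    """Keep coefficients with |w| >= phi and zero the rest."""
    lam = universal_threshold(coeffs)
    phi = alpha * lam                       # assumed form of the adaptive threshold
    kept = np.where(np.abs(coeffs) >= phi, coeffs, 0.0)
    return kept, phi

# Synthetic coefficients: small noise plus two large "signal" coefficients.
rng = np.random.default_rng(1)
coeffs = 0.05 * rng.standard_normal(256)
coeffs[[10, 100]] = [2.0, -1.5]
denoised, phi = hard_threshold(coeffs, alpha=1.0)
```

In a full pipeline the coefficients would come from a wavelet packet decomposition of each high-entropy component, and the thresholded coefficients would be passed to the inverse transform.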
[0072] Step 4: Construct an attention-enhanced signal prediction model. A gated recurrent unit network is used, introducing a non-local attention mechanism to capture dependencies over long time intervals, thereby improving the model's ability to focus on important features and its long-term memory capacity. The specific steps are as follows:
[0073] (4-1) A non-local attention mechanism helps the model capture dependencies over long time intervals, thereby improving prediction accuracy. The non-local attention operation computes the non-local block response from the relationships between nodes at different positions, i.e.:
[0074] $$y_i = \frac{1}{C(x)} \sum_{\forall j} f(x_i, x_j)\, g(x_j)$$
[0075] where $x$ is the input time-series signal; $y_i$ is the enhanced feature at position $i$; $f(x_i, x_j)$ computes the strength of association between positions $i$ and $j$, quantifying the dependency between the two time points; $g(x_j)$ transforms the input signal into a feature representation better suited to attention aggregation; and $C(x)$ is a normalization factor that prevents the attention weights from becoming ineffective due to differing scales.
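One common way to realize a non-local operation of this kind is the embedded-Gaussian form, where the pairwise association is the exponential of a dot product between two linear embeddings and the normalization is the row sum. The sketch below uses that form as an assumption; the text does not say which pairwise function the invention uses, and the projection matrices `w_q`, `w_k`, `w_g` are hypothetical names.

```python
import numpy as np

def nonlocal_attention(x, w_q, w_k, w_g):
    """Non-local response for a sequence x of shape (T, d).

    Returns the enhanced features (T, d_v) and the normalized weights (T, T).
    """
    q = x @ w_q                                   # embedding of position i
    k = x @ w_k                                   # embedding of position j
    g = x @ w_g                                   # transformed input features
    scores = q @ k.T                              # pairwise dot products, (T, T)
    # Subtracting the row maximum only rescales each row; it cancels in the ratio.
    scores = scores - scores.max(axis=1, keepdims=True)
    f = np.exp(scores)                            # pairwise association strength
    c = f.sum(axis=1, keepdims=True)              # normalization factor per position
    weights = f / c
    return weights @ g, weights

rng = np.random.default_rng(0)
T, d = 6, 4
x = rng.standard_normal((T, d))
w_q, w_k, w_g = (rng.standard_normal((d, d)) for _ in range(3))
y, weights = nonlocal_attention(x, w_q, w_k, w_g)
```

Every output position is a weighted average over all time steps, which is what lets distant samples influence the current feature directly.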
[0076] (4-2) The Gated Recurrent Unit (GRU) is a simplified variant of the recurrent neural network derived from LSTM. In the basic network architecture of the improved GRU prediction model, gating fusion allocates weights dynamically, adaptively balancing the contributions of non-local attention (global features) and the GRU (local temporal features). Based on the input features at the current time step, the model decides in real time whether to rely on the global context or the local state. During modeling, through ... gating, a smooth transition is achieved, avoiding the instability caused by hard switching. The gating fusion unit adaptively blends local and global features by computing a gating value: the local and global features are concatenated, then mapped to the gating space through a weight matrix and a bias vector.
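A minimal sketch of the gated fusion described in (4-2) follows. It assumes a sigmoid as the gating nonlinearity and a convex blend of the two feature sets; the text elides the gating function, so both are illustrative assumptions, and `w_gate` and `b_gate` are hypothetical parameter names.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_fusion(h_local, h_global, w_gate, b_gate):
    """Blend local (GRU) and global (attention) features with a learned gate.

    h_local, h_global: (T, d); w_gate: (2d, d); b_gate: (d,).
    """
    concat = np.concatenate([h_local, h_global], axis=-1)   # (T, 2d)
    gate = sigmoid(concat @ w_gate + b_gate)                 # values in (0, 1)
    fused = gate * h_local + (1.0 - gate) * h_global         # soft, not hard, switch
    return fused, gate

rng = np.random.default_rng(0)
T, d = 5, 3
h_local = rng.standard_normal((T, d))     # stand-in for GRU hidden states
h_global = rng.standard_normal((T, d))    # stand-in for attention output
w_gate = rng.standard_normal((2 * d, d))
b_gate = np.zeros(d)
fused, gate = gated_fusion(h_local, h_global, w_gate, b_gate)
```

Because the gate is continuous, each fused value lies between the local and global values, which matches the smooth-transition behaviour the paragraph describes.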
[0077] Step 5: Multimodal model training and optimization. Input each modal component into the signal prediction model for training and predict the future trend data of each component. The prediction results are superimposed as the output signal prediction value.
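The superposition in Step 5 amounts to forecasting each component separately and summing the forecasts element-wise. In the sketch below, `predict_component` is a deliberately trivial persistence predictor used only as a placeholder for the trained NL-GRU model, and all names are hypothetical.

```python
import numpy as np

def predict_component(history, horizon):
    """Placeholder predictor: repeat the last observed value (not the NL-GRU)."""
    return np.full(horizon, float(history[-1]))

def predict_signal(components, horizon, predictor=predict_component):
    """Forecast each modal component independently, then superimpose."""
    forecasts = [predictor(c, horizon) for c in components]
    return np.sum(forecasts, axis=0)

components = [np.array([1.0, 2.0, 3.0]), np.array([10.0, 20.0, 30.0])]
forecast = predict_signal(components, horizon=2)   # -> [33. 33.]
```

Since the components are independent inputs, the per-component predictions can be computed in parallel before the final sum.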
[0078] Step 6: Performance Evaluation and Validation. Validate the model on the validation set, using multiple evaluation metrics to comprehensively assess its performance and ensure its accuracy and stability. The specific steps are as follows:
[0079] (6-1) The reliability of the model is evaluated using the mean absolute error (MAE) and the root mean square error (RMSE).
[0080] MAE measures the average absolute deviation between predicted and actual values. It directly reflects the magnitude of the prediction error and is suitable for evaluating the overall stability of a prediction model. The calculation formula is:
[0081] $$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
[0082] where $y_i$ denotes the true value of the $i$-th sample, $\hat{y}_i$ denotes the predicted value of the $i$-th sample, and $n$ denotes the number of samples.
[0083] RMSE also measures the deviation between predicted and true values. Because it squares each deviation, it penalizes large errors more heavily and therefore gives a stricter assessment of model stability. The calculation formula is:
[0084] $$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
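Both metrics are short to compute. The example below uses made-up numbers, chosen so that a single large error shows how RMSE reacts more strongly than MAE.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Root mean square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 8.0]     # one outlying prediction, off by 4
mae_value = mae(y_true, y_pred)   # 4 / 4 = 1.0
rmse_value = rmse(y_true, y_pred) # sqrt(16 / 4) = 2.0
```

With three exact predictions and one miss of 4, MAE is 1.0 while RMSE is 2.0, so the outlier dominates the RMSE.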
[0085] The improved NL-GRU model effectively strengthens the ability to capture long-term dependencies in the series; its prediction curve shows smaller deviation and gradually stabilizes in the later stages of prediction.
[0086] The above description is merely a preferred embodiment of the present invention and is not intended to limit the present invention in any other way. Any modifications or equivalent changes made based on the technical essence of the present invention shall still fall within the scope of protection claimed by the present invention.
Claims
1. A voltage transformer error prediction method based on mode decomposition and gated recurrent units, characterized in that it includes the following steps:
Step 1: Read the original output signal of the voltage transformer by equal-interval sampling to establish a historical dataset;
Step 2: Adopt an improved complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) method, which decomposes the non-stationary signal into several intrinsic mode function (IMF) components through an adaptive adjustment mechanism for the white-noise spectrum. The improved CEEMDAN method in step 2 includes the following process:
(2-1) In the improved mode decomposition process, adaptive noise is added to the signal. The noise amplitude adjustment coefficient $\beta$ [value missing] scales the added noise, and the generated white noise $\omega_i(t)$ has mean 0 and variance 1, where $i = 1, 2, \ldots, N$. The noise is then added to the signal $x(t)$ to generate the perturbed signals $x_i(t)$: $x_i(t) = x(t) + \beta\,\omega_i(t)$. The choice of the noise amplitude $\beta$ directly affects the purity of the modal components and the stability of the decomposition; it is set by frequency-domain adaptive noise adjustment: [formula missing], where $f$ denotes the signal frequency and $\sigma$ denotes the standard deviation of the signal;
(2-2) Perform EMD on each noise-perturbed signal to obtain a set of IMF components and a residual: $x_i(t) = \sum_{k=1}^{K_i} \mathrm{IMF}_k^{i}(t) + r_i(t)$, where $K_i$ denotes the number of IMFs in the $i$-th decomposition, $\mathrm{IMF}_k^{i}(t)$ denotes the $k$-th intrinsic mode component, and $r_i(t)$ denotes the residual component. The first IMF component of every noise-perturbed signal is taken and averaged, $\overline{\mathrm{IMF}}_1(t) = \frac{1}{N}\sum_{i=1}^{N}\mathrm{IMF}_1^{i}(t)$; the signal is then updated and the iteration repeated to extract the remaining modal components;
Step 3: Construct a feature evaluation system based on multi-scale information entropy, achieve accurate separation of feature modes through joint analysis of modal entropy value and frequency band, and use adaptive wavelet packet thresholding to denoise high-entropy noise modes while preserving the transient response characteristics of feature modes. Step 3 includes the following process:
(3-1) For the extracted IMF components, calculate the signal entropy of each component, using a Gaussian kernel function for kernel density estimation: $\hat{f}_h(x) = \frac{1}{nh}\sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right)$, where $\hat{f}_h(x)$ denotes the density estimate at position $x$, $x_i$ are the samples, $n$ is the sample size, $h$ is the bandwidth, and $K(\cdot)$ is the kernel function;
(3-2) For high-entropy mode components, wavelet packet transform combined with an adaptive thresholding method is used for denoising. Adaptive thresholding preserves the detailed features of the signal during denoising, achieving a better balance between signal continuity and effective noise removal. The calculation formula is: [formula missing], where $w$ denotes the wavelet packet coefficients, $\lambda$ is a global threshold calculated from the noise level, and $\alpha$ is an adaptive factor [range missing]. Each wavelet packet coefficient $w$ is compared with the calculated threshold $\varphi$: if $|w| \geq \varphi$, the coefficient is retained; if $|w| < \varphi$, the coefficient is set to zero to remove noise;
Step 4: Perform attention-enhanced signal prediction, design a hybrid architecture of gated recurrent unit and attention mechanism, capture local dynamic characteristics through a spatiotemporal gating unit, and capture dependencies over long time intervals using a non-local attention mechanism to achieve adaptive weight allocation of key features;
Step 5: Input each modal component in parallel into the signal prediction model for training and predict the future trend data of each component; the prediction results are superimposed as the output signal prediction value;
Step 6: Validate the model on the validation set and use multiple evaluation metrics to comprehensively evaluate the model performance.
2. The voltage transformer error prediction method based on mode decomposition and gated recurrent units according to claim 1, characterized in that the attention-enhanced signal prediction model in step 4 includes the following process:
(4-1) A non-local attention mechanism helps the model capture dependencies over long time intervals. The non-local attention operation computes the non-local block response from the relationships between nodes at different positions, i.e.: $y_i = \frac{1}{C(x)} \sum_{\forall j} f(x_i, x_j)\, g(x_j)$, where $x$ is the input time-series signal; $y_i$ is the enhanced feature at position $i$; $f(x_i, x_j)$ computes the strength of association between positions $i$ and $j$, quantifying the dependency between the two time points; $g(x_j)$ transforms the input signal into a feature representation better suited to attention aggregation; and $C(x)$ is a normalization factor that prevents the attention weights from becoming ineffective due to differing scales;
(4-2) Gating fusion allocates weights dynamically, adaptively balancing the contributions of non-local attention and the GRU. Based on the input features at the current time step, the model determines in real time whether to rely on the global context or the local state. During modeling, through ... gating, a smooth transition is achieved. The gating fusion unit adaptively mixes local and global features by computing a gating value: the local and global features are concatenated, then mapped to the gating space through a weight matrix and a bias vector.