Distribution box operation fault monitoring system based on specific perception reinforcement learning

A specific perception reinforcement learning system, combining a TCN-LSTM hybrid coding network with a specific perception dual-branch decoupled autoencoder, addresses the accurate identification of ultra-early faults in distribution boxes and aims at accurate, stable fault monitoring.

CN122243470A · Pending · Publication Date: 2026-06-19 · ZHEJIANG ZHENGYE ELECTRIC POWER TECHNOLOGY CO LTD

Cites: 0 · Cited by: 0

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: ZHEJIANG ZHENGYE ELECTRIC POWER TECHNOLOGY CO LTD
Filing Date: 2026-03-23
Publication Date: 2026-06-19


Figure CN122243470A_ABST


Abstract

This invention discloses a distribution box operation fault monitoring system based on specific perception reinforcement learning, comprising a data acquisition and processing module, a feature construction module, a coupling/decoupling module, and a decision optimization module. The data acquisition and processing module preprocesses the collected data. The feature construction module extracts associated features with a hybrid coding network. The coupling/decoupling module uses a specific perception dual-branch decoupled autoencoder, whose two branches decouple the associated features and apply dual adaptive enhancement along the channel and spatial dimensions. To improve the decoupling accuracy of the autoencoder, an independence constraint loss is calculated with the Hilbert-Schmidt independence criterion, and a specificity enhancement loss is calculated following the contrastive loss concept. The decision optimization module customizes the state space, action space, and reward function of the specific perception reinforcement learning network for the distribution box fault monitoring scenario and optimizes the network with the PPO algorithm to output the optimal decision action, enabling accurate identification of distribution box faults at a very early stage.


Description

Technical Field

[0001] This invention relates to the field of distribution boxes, and in particular to a distribution box operation fault monitoring system based on specific perception reinforcement learning.

Background Technology

[0002] Distribution boxes are core equipment at the end of power distribution systems. Their operational stability directly determines the reliability of downstream power supply and is a key link in preventing electrical fires and equipment damage. With the advancement of new power system construction, distributed power sources and nonlinear loads are being connected on a large scale, making the operating conditions of low-voltage distribution networks increasingly complex and the fault modes more diverse. This places extremely high demands on the accuracy, real-time performance, and dynamic adaptability of fault monitoring technology.

[0003] Most existing technologies rely on traditional threshold monitoring and conventional deep learning schemes, which cannot capture the weak features of very early faults such as loose connection points and conductor microcracks. Harmonic features and fault features are not fully decoupled, resulting in high false-alarm and missed-detection rates under strong interference. In addition, existing technologies do not tailor their deep learning models to power distribution scenarios, which leads to poor model adaptability.

[0004] Therefore, it is necessary to design a distribution box operation fault monitoring system based on specific perception reinforcement learning to improve the operational stability of power distribution equipment and the accuracy of fault monitoring.

Summary of the Invention

[0005] To address the shortcomings of existing technologies, this invention proposes a fault monitoring system for distribution boxes based on specific perception reinforcement learning, in order to improve the operational stability of power distribution equipment and the accuracy of fault monitoring.

[0006] The technical solution to achieve the purpose of this invention is as follows:

[0007] A fault monitoring system for distribution boxes based on specific perception reinforcement learning, comprising:

[0008] The data acquisition and processing module triggers smart sensors at fixed intervals to collect multi-source data from the distribution box over consecutive periods during operation. The multi-source data collected in each period includes electrical data and non-electrical data. The collected multi-source data are preprocessed: outliers are removed with box plots, noise is reduced with Gaussian filtering, and Z-score standardization unifies the data dimensions, yielding the standard multi-source data of each period. The standard multi-source data of the consecutive periods form a multi-dimensional time-series feature matrix;
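The preprocessing chain in [0008] can be illustrated for a single signal channel with the minimal pure-Python sketch below. The 1.5 × IQR box-plot whiskers, the Gaussian kernel (sigma = 1.0, radius 2), and replacing outliers with the median rather than deleting them (so every channel keeps the same number of samples) are assumptions; the patent does not give these parameters.

```python
import math
import statistics

def remove_outliers_iqr(x, k=1.5):
    """Box-plot rule: values outside [Q1 - k*IQR, Q3 + k*IQR] are outliers.
    Assumption: outliers are replaced by the median so the series keeps its length."""
    q1, _, q3 = statistics.quantiles(x, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    med = statistics.median(x)
    return [v if lo <= v <= hi else med for v in x]

def gaussian_filter(x, sigma=1.0, radius=2):
    """1-D Gaussian smoothing (kernel size 2*radius + 1) with edge replication."""
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in offsets]
    total = sum(weights)
    weights = [w / total for w in weights]  # normalize so the kernel sums to 1
    n = len(x)
    out = []
    for t in range(n):
        acc = 0.0
        for j, w in zip(offsets, weights):
            idx = min(max(t + j, 0), n - 1)  # clamp indices at the borders
            acc += w * x[idx]
        out.append(acc)
    return out

def zscore(x):
    """Z-score standardization: zero mean, unit population standard deviation."""
    mu = statistics.fmean(x)
    sd = statistics.pstdev(x)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in x]

def preprocess(x):
    return zscore(gaussian_filter(remove_outliers_iqr(x)))

# Hypothetical phase-current samples containing one spike
current = [10.1, 10.3, 9.9, 10.0, 55.0, 10.2, 10.1, 9.8, 10.0, 10.2]
clean = preprocess(current)
```

In a multi-channel setting the same three steps would be applied per channel (voltage, current, temperature, and so on) before the periods are stacked into the feature matrix.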

[0009] The feature construction module uses a TCN-LSTM hybrid coding network to extract time-domain features from the multi-dimensional time-series feature matrix, converts the electrical data in the matrix into a time-frequency spectrum by short-time Fourier transform (STFT), and extracts frequency-domain features from the spectrum with a convolutional neural network. The time-domain and frequency-domain features are concatenated and mapped through a fully connected layer to obtain the associated features;

[0010] The coupling/decoupling module employs a specific perception dual-branch decoupled autoencoder. A shared coding layer encodes the associated features into a shared feature vector; the decoupling branch of the dual-branch coding layer then extracts a background harmonic component, a normal operating condition component, and a pure fault component, and the specificity enhancement branch enhances the pure fault component to obtain an enhanced pure fault component. The background harmonic component, the normal operating condition component, and the enhanced pure fault component are concatenated along the feature dimension and processed by a fully connected layer to obtain the reconstructed features. Based on the enhanced pure fault component, an initial fault monitoring result is generated, and a fault specificity score and a fault specificity level are calculated; a multi-dimensional constraint loss function is introduced to optimize the autoencoder;

[0011] The decision optimization module feeds the initial fault monitoring result into the specific perception reinforcement learning network, whose state space, action space, and reward function are designed specifically for distribution box fault monitoring, and optimizes the network with the PPO algorithm. The network outputs the optimal decision action, comprising the final fault identification result, the fault evolution trend prediction result, and differentiated handling instructions.

[0012] Furthermore, the feature construction module extracts the associated features from the multi-dimensional time-series feature matrix through the following steps:

[0013] The multi-dimensional time-series feature matrix is input into the TCN-LSTM hybrid coding network. The three causal convolutional layers of the TCN part extract local temporal features, while the LSTM part, a two-layer bidirectional LSTM, extracts global correlation features; the local temporal features and global correlation features are concatenated to obtain the time-domain features;

[0014] The electrical data in the multi-dimensional time-series feature matrix are converted into a time-frequency spectrum by STFT, which is input into a convolutional neural network; frequency-domain features are obtained through a 3-layer convolution-pooling cascade;

[0015] The time-domain and frequency-domain features are concatenated and mapped through a two-layer MLP to obtain the associated features;

[0016] Furthermore, after the multi-dimensional time-series feature matrix is input into the TCN-LSTM hybrid coding network, the TCN causal convolutional layer applies a 3-layer dilated causal convolution structure to the standard multi-source data of each period to extract local temporal features;

[0017] The multi-dimensional time-series feature matrix is then input into the LSTM layer, where a bidirectional LSTM produces a forward hidden state and a backward hidden state; the two are concatenated to obtain the global correlation features;

[0018] The local temporal features and global correlation features are concatenated and reduced in dimensionality by a linear layer to obtain the time-domain features;

[0019] Furthermore, the electrical data in the multi-dimensional time-series feature matrix are converted into a time-frequency spectrum by STFT. Specifically, for the standard multi-source data of each period, its standard electrical data are converted by STFT into the time-frequency spectrum of that period. The spectrum is input into a convolutional neural network built from a 3-layer convolution-pooling cascade with an embedded specificity enhancement attention module. The frequency-domain feature map produced by the three convolution-pooling layers is flattened along the spatial and channel dimensions into a one-dimensional feature vector, whose dimensionality is then reduced by a fully connected layer to obtain the frequency-domain features of that period. Repeating these steps yields the frequency-domain features of all periods;
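As an illustration of the STFT step, the sketch below computes a magnitude spectrogram with a naive DFT in pure Python. The periodic Hann window, frame length of 8 samples, and hop of 4 samples are illustrative assumptions, not values from the patent. Each output row is one time frame and each column one frequency bin, which is the kind of time-frequency image the CNN would take as input.

```python
import cmath
import math

def stft_magnitude(x, frame_len=8, hop=4):
    """Magnitude STFT with a periodic Hann window and a naive DFT.
    Returns a list of frames, each holding frame_len // 2 + 1 frequency bins."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = [x[start + n] * window[n] for n in range(frame_len)]
        bins = []
        for k in range(frame_len // 2 + 1):  # one-sided spectrum for real input
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            bins.append(abs(acc))
        frames.append(bins)
    return frames

# Hypothetical signal: a tone completing 2 cycles per 8-sample frame
signal = [math.cos(2 * math.pi * 2 * n / 8) for n in range(32)]
spec = stft_magnitude(signal)
peak_bins = [max(range(len(f)), key=f.__getitem__) for f in spec]
```

The energy concentrates in bin 2 of every frame, with the Hann window spreading a smaller share into the neighbouring bins. A production system would use an FFT routine instead of this O(N²) loop.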

[0020] The time-domain and frequency-domain features are concatenated, and a two-layer MLP mines the coupling correlation between the time and frequency domains to obtain the associated features.

[0021] Furthermore, the coupling/decoupling module calculates the fault specificity score and the fault specificity level with the specific perception dual-branch decoupled autoencoder through the following steps:

[0022] The specific perception dual-branch decoupled autoencoder consists of a shared coding layer, a dual-branch coding layer, a fusion reconstruction layer, and a fault prediction layer. The associated features are input into the autoencoder and first pass through the shared coding layer, which compresses them and eliminates redundant information to yield a shared feature vector;

[0023] The shared feature vector is then input into the dual-branch coding layer, which comprises a decoupling branch and a specificity enhancement branch. Following the idea of latent space separation, the decoupling branch extracts from the shared feature vector a background harmonic component, a normal operating condition component, and a pure fault component. The specificity enhancement branch uses a convolutional neural network to apply dual adaptive enhancement to the pure fault component along the channel and spatial dimensions, yielding the enhanced pure fault component;

[0024] The background harmonic component, the normal operating condition component, and the enhanced pure fault component are concatenated along the feature dimension and fed into the fusion reconstruction layer, where multiple fully connected layers map them to the reconstructed features;

[0025] In the fault prediction layer, the initial fault monitoring result is generated from the enhanced pure fault component, the fault specificity score and fault specificity level are calculated, and a multi-dimensional constraint loss function is introduced to optimize the autoencoder;

[0026] Furthermore, the specific perception dual-branch decoupled autoencoder adopts a shared coding layer, dual-branch coding layer, fusion reconstruction layer, and fault prediction layer architecture to separate harmonic features from fault features and to enhance ultra-early fault features, while a multi-dimensional constraint loss function optimizes the autoencoder parameters. Specifically, the associated features are first input into the shared coding layer, where convolution operations and nonlinear mappings compress them into the shared feature vector;

[0027] The shared feature vector is then input into the dual-branch coding layer, whose decoupling branch and specificity enhancement branch process it jointly. Based on the latent space separation idea of the autoencoder, the decoupling branch decouples the shared feature vector into the background harmonic component, the normal operating condition component, and the pure fault component;

[0028] The specificity enhancement branch uses a convolutional neural network with an embedded specificity enhancement attention module to process the pure fault component. For channel attention, global average pooling and global max pooling extract a global descriptor for each feature channel of the pure fault component; a multilayer perceptron turns these descriptors into channel attention weights, which weight the pure fault component to give the channel fault features. For spatial attention, global average pooling and global max pooling are applied to the channel fault features along the channel dimension and fused by a convolution to produce spatial attention weights. These weights are applied to the channel fault features, strengthening the spatial regions that correspond to distribution box faults and weakening background interference regions, which yields the enhanced pure fault component;

[0029] The fusion reconstruction layer receives the background harmonic component and the normal operating condition component output by the decoupling branch, together with the enhanced pure fault component output by the specificity enhancement branch, concatenates the three components, and reconstructs them through multiple fully connected layers to obtain the reconstructed features;

[0030] The enhanced pure fault component is input into the fault prediction layer, where an MLP performs fault prediction to obtain the initial fault monitoring result;

[0031] Furthermore, to improve the encoding and decoding capability of the specific perception dual-branch decoupled autoencoder and achieve accurate feature decoupling, a multi-dimensional constraint loss function optimizes the autoencoder synchronously along four dimensions. The total loss is the weighted sum of the reconstruction loss, the independence constraint loss, the specificity enhancement loss, and the physical constraint loss.

[0032] Specifically, the reconstruction loss is the mean squared error between the reconstructed features and the associated features; it constrains the autoencoder to reconstruct the associated features accurately and avoid losing useful information;
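A minimal sketch of the reconstruction loss and of the weighted total loss described in [0031] follows. The loss weights are hypothetical placeholders, since the patent does not state them, and the independence, specificity, and physical constraint terms are passed in as precomputed numbers.

```python
def mse(a, b):
    """Mean squared error between reconstructed and associated feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def total_loss(l_rec, l_ind, l_spec, l_phys, weights=(1.0, 0.1, 0.1, 0.05)):
    """Weighted sum of the four constraint losses; the default weights are hypothetical."""
    w_rec, w_ind, w_spec, w_phys = weights
    return w_rec * l_rec + w_ind * l_ind + w_spec * l_spec + w_phys * l_phys

associated = [0.2, -0.5, 1.0, 0.3]
reconstructed = [0.1, -0.4, 0.8, 0.3]
l_rec = mse(reconstructed, associated)  # about (0.01 + 0.01 + 0.04 + 0) / 4 = 0.015
```

In training, the weights would be tuned so that no single term dominates the gradient of the shared encoder.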

[0033] The independence constraint loss uses the Hilbert-Schmidt independence criterion to measure, pairwise, the dependence among the background harmonic component, the normal operating condition component, and the pure fault component, enforcing their independence and hence the accuracy of the decoupling operation;
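The pairwise independence constraint can be sketched with the standard biased empirical HSIC estimator, tr(KHLH) / (n - 1)^2, where K and L are Gram matrices over a batch of n samples and H is the centering matrix. The Gaussian kernel, its bandwidth sigma = 1.0, and the unweighted sum of the three pairwise terms are assumptions; the patent names the criterion but not these details.

```python
import math

def gaussian_gram(samples, sigma=1.0):
    """Gram matrix K[i][j] = exp(-||x_i - x_j||^2 / (2 sigma^2)) over a batch of vectors."""
    n = len(samples)
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(samples[i], samples[j]))
                      / (2 * sigma ** 2))
             for j in range(n)] for i in range(n)]

def center(K):
    """Double centering: returns H K H with H = I - (1/n) * ones."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + total for j in range(n)] for i in range(n)]

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2, computed as tr((H K H) L)."""
    n = len(x)
    Kc = center(gaussian_gram(x, sigma))
    L = gaussian_gram(y, sigma)
    return sum(Kc[i][j] * L[j][i] for i in range(n) for j in range(n)) / (n - 1) ** 2

def independence_loss(harmonic, normal, fault, sigma=1.0):
    """Sum of pairwise HSIC values between the three decoupled components."""
    return (hsic(harmonic, normal, sigma) + hsic(harmonic, fault, sigma)
            + hsic(normal, fault, sigma))

# Hypothetical batch of four 1-D samples and a constant (independent) component
x = [[0.0], [1.0], [2.0], [3.0]]
const = [[5.0]] * 4
```

HSIC is zero when one component carries no variation related to the other and grows as they become dependent, so minimizing the sum pushes the three components apart.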

[0034] For the specificity enhancement loss, the enhanced pure fault component is first mapped by a single-layer MLP to the fault specificity score, and the fault specificity level is assigned according to the score. For example, a score below 0.5 gives level 0; a score of at least 0.5 and below 0.7 gives level 1; a score of at least 0.7 and below 0.9 gives level 2; and a score of at least 0.9 gives level 3. The specificity enhancement loss is then constructed from the fault specificity scores using a contrastive loss;
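The score-to-level mapping in [0034] can be written as a small function. The sigmoid on the single-layer mapping is an assumption (the thresholds imply a score in [0, 1]), and the contrastive specificity enhancement loss is not sketched because the patent does not give its form.

```python
import math

def specificity_score(features, weights, bias):
    """Single-layer mapping to a score in (0, 1); the sigmoid output is an assumption."""
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def specificity_level(score):
    """Thresholds from [0034]: <0.5 -> 0, [0.5, 0.7) -> 1, [0.7, 0.9) -> 2, >=0.9 -> 3."""
    if score < 0.5:
        return 0
    if score < 0.7:
        return 1
    if score < 0.9:
        return 2
    return 3
```

Each lower threshold is inclusive, so a score of exactly 0.7 falls in level 2.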

[0035] The physical constraint loss constrains the background harmonic component and the pure fault component according to the physical prior laws of distribution box faults, so that the decoupled components conform to physical laws and meaningless feature mappings are avoided.

[0036] Furthermore, the decision optimization module obtains the optimal decision action from the specific perception reinforcement learning network through the following steps:

[0037] To meet the needs of distribution box fault monitoring and handling, the state space, action space, and reward function are customized. The state vector of the state space is obtained by concatenating the enhanced pure fault component, the fault specificity score, the fault specificity level, the initial fault monitoring result, and the normal operating condition component; three action sets are designed in the action space; and a multi-dimensional dynamic reward function driven by specific perception is constructed.
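The state construction in [0037] amounts to flattening and concatenating the listed quantities. In this sketch the container class `DecouplerOutputs`, the function `build_state`, and all action labels in `ACTION_SETS` are hypothetical; the patent names the three action sets but not their members.

```python
from dataclasses import dataclass

# Hypothetical action labels; only the three set names come from the description.
ACTION_SETS = {
    "fault_identification": ["normal", "loose_connection", "conductor_microcrack"],
    "trend_prediction": ["stable", "slow_degradation", "rapid_degradation"],
    "handling_instruction": ["keep_monitoring", "schedule_inspection", "isolate_circuit"],
}

@dataclass
class DecouplerOutputs:
    enhanced_fault: list      # enhanced pure fault component
    specificity_score: float  # fault specificity score
    specificity_level: int    # fault specificity level (0 to 3)
    initial_result: list      # initial fault monitoring result, e.g. class probabilities
    normal_component: list    # normal operating condition component

def build_state(o: DecouplerOutputs) -> list:
    """Concatenate the state-space inputs listed in [0037] into one flat vector."""
    return (list(o.enhanced_fault)
            + [o.specificity_score, float(o.specificity_level)]
            + list(o.initial_result)
            + list(o.normal_component))

outputs = DecouplerOutputs([0.3, 0.8], 0.76, 2, [0.1, 0.9], [0.5, 0.4])
state = build_state(outputs)
```

A policy over these sets could emit one action per set at each step, giving the identification result, trend prediction, and handling instruction together.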

[0038] The specific perception reinforcement learning network consists of a policy network and a value network: the value network evaluates the value of the current state, and the policy network then optimizes over the action space to determine the optimal decision action.

[0039] Furthermore, to give reinforcement learning full-dimensional state information while keeping the state space consistent with the actual scenario of the current distribution box, the state vector is obtained by concatenating the four features from the coupling/decoupling module described above. An action space is then designed for reinforcement learning: based on the needs of distribution box fault monitoring and handling, three action sets are customized, namely a fault identification action set, a trend prediction action set, and a handling instruction action set. Once the state space and action space are designed, a multi-dimensional dynamic reward function driven by specific perception is constructed; the total reward balances fault identification specificity, identification accuracy, physical compliance, and handling optimization.

[0040] The specificity reward is calculated from the change in the fault specificity score and strengthens the ability of the reinforcement learning network to identify very early, subtle faults;

[0041] The recognition accuracy reward is calculated from the degree of match between the fault identification result and the true fault label, improving the accuracy with which the specific perception reinforcement learning network identifies fault type and fault location;

[0042] The physical compliance reward is calculated according to whether the decision actions output by the specific perception reinforcement learning network conform to physical laws and safety regulations;

[0043] The handling optimization penalty penalizes over-handling, ineffective handling, and non-compliant handling in the decision actions output by the specific perception reinforcement learning network, balancing the effectiveness of distribution box fault handling against power supply reliability.
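The balance of rewards and penalty in [0039] to [0043] can be sketched as a weighted sum. Every function form and weight below is a hypothetical choice for illustration (for example, scoring the "matching degree" as half credit for fault type and half for fault location); the patent describes what each term measures but not how it is computed.

```python
def specificity_reward(prev_score, curr_score):
    """Hypothetical: reward the change in fault specificity score between steps."""
    return curr_score - prev_score

def accuracy_reward(pred_type, pred_location, true_type, true_location):
    """Hypothetical matching degree: 0.5 for correct fault type, 0.5 for correct location."""
    return 0.5 * (pred_type == true_type) + 0.5 * (pred_location == true_location)

def compliance_reward(is_physically_valid, is_safe):
    """Hypothetical: +1 if the action obeys physical laws and safety rules, else -1."""
    return 1.0 if (is_physically_valid and is_safe) else -1.0

def handling_penalty(over_handled, ineffective, non_compliant, unit=0.5):
    """Hypothetical: a fixed penalty unit per kind of improper handling."""
    return unit * (int(over_handled) + int(ineffective) + int(non_compliant))

def total_reward(r_spec, r_acc, r_phys, p_handle, w=(1.0, 1.0, 0.5, 1.0)):
    """Weighted balance of the three rewards minus the handling optimization penalty."""
    return w[0] * r_spec + w[1] * r_acc + w[2] * r_phys - w[3] * p_handle

r = total_reward(
    specificity_reward(0.6, 0.8),
    accuracy_reward("loose_connection", "A", "loose_connection", "B"),
    compliance_reward(True, True),
    handling_penalty(False, False, False),
)
```

Here the agent earns 0.2 for sharpening the specificity score, 0.5 for the correct type at the wrong location, and 0.5 for a compliant action, about 1.2 in total.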

[0044] Furthermore, the value network evaluates the state value of the current state vector, the advantage function of the current state vector is calculated from that state value, and the PPO algorithm is used in the policy network to optimize over the action space and output the optimal decision action. Specifically, the TCN-LSTM hybrid coding network is reused to receive the state vector of the current period, extract temporal correlation information and core decision features from it, and output a state feature vector. The state feature vector of the period is input into the value network, where multiple fully connected layers evaluate it to obtain the state value. The advantage function of the current state vector is then calculated, and the clipped objective function of the PPO algorithm is determined from it;
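For [0044], the sketch below shows the standard PPO clipped surrogate objective together with a one-step temporal-difference advantage. The TD advantage estimator, gamma = 0.99, and the clipping range eps = 0.2 are assumptions; the patent states that an advantage function and the clipped objective are used but not which estimator or hyperparameters.

```python
import math

def td_advantage(reward, value_s, value_next, gamma=0.99, done=False):
    """One-step advantage A = r + gamma * V(s') - V(s) (an assumed estimator)."""
    return reward + (0.0 if done else gamma * value_next) - value_s

def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Mean of min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A) over a batch.
    This objective is maximized; the policy loss is its negative."""
    total = 0.0
    for lp_new, lp_old, a in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * a, clipped * a)
    return total / len(advantages)

# Two hypothetical samples: ratio 1.5 with A = +1 and ratio 0.5 with A = -1
objective = ppo_clip_objective([math.log(1.5), math.log(0.5)], [0.0, 0.0], [1.0, -1.0])
```

Both samples are clipped here (to 1.2 and 0.8 respectively), so the mean objective is about 0.2; clipping is what keeps each policy update close to the previous policy.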

[0045] After the specific perception reinforcement learning network is optimized through the above steps, the optimal decision action is determined, comprising the final fault identification result, the fault evolution trend prediction result, and the differentiated handling instructions.

[0046] Compared with existing technologies, this invention preprocesses data in the data acquisition and processing module and extracts associated features with a hybrid coding network in the feature construction module. In the coupling/decoupling module, it designs a specific perception dual-branch decoupled autoencoder that decouples the associated features through its dual-branch structure, applies dual adaptive enhancement along the channel and spatial dimensions, and computes the independence constraint loss and the specificity enhancement loss based on the Hilbert-Schmidt independence criterion and contrastive loss, respectively, to improve decoupling accuracy. In the decision optimization module, the state space, action space, and reward function of the specific perception reinforcement learning network are customized for distribution box fault monitoring, and the network is optimized with the PPO algorithm to output the optimal decision action, thereby achieving accurate identification of ultra-early faults in distribution boxes.

Attached Figure Description

[0047] Figure 1 is a framework diagram of the distribution box operation fault monitoring system based on specific perception reinforcement learning;

[0048] Figure 2 is a flowchart of the operation of the feature construction module;

[0049] Figure 3 is a flowchart of the operation of the coupling/decoupling module;

[0050] Figure 4 is a flowchart of the operation of the decision optimization module.

Detailed Implementation

[0051] The present invention will be further described in detail below with reference to the accompanying drawings and embodiments.

[0052] Example 1

[0053] As shown in Figure 1, a specific embodiment of the present invention, a distribution box operation fault monitoring system based on specific perception reinforcement learning, comprises:

[0054] The data acquisition and processing module triggers smart sensors at fixed intervals to collect multi-source data from the distribution box over consecutive periods during operation. The multi-source data collected in each period includes electrical data and non-electrical data. The electrical data are the three-phase voltage, three-phase current, zero-sequence voltage, and zero-sequence current of the distribution box during operation; the non-electrical data are the temperature and humidity measured by the sensors inside the distribution box. The collected multi-source data are preprocessed: outliers are removed with box plots, noise is reduced with Gaussian filtering, and the data are standardized with Z-score normalization, yielding the standard multi-source data of each period. The standard multi-source data of the consecutive periods are then integrated into a multi-dimensional time-series feature matrix;

[0055] The feature construction module uses a TCN-LSTM hybrid coding network to extract time-domain features from the multi-dimensional time-series feature matrix, converts the electrical data in the matrix into a time-frequency spectrum by short-time Fourier transform (STFT), and extracts frequency-domain features from the spectrum with a convolutional neural network. The time-domain and frequency-domain features are concatenated and mapped through a fully connected layer to obtain the associated features;

[0056] The coupling/decoupling module employs a specific perception dual-branch decoupled autoencoder. A shared coding layer encodes the associated features into a shared feature vector; the decoupling branch of the dual-branch coding layer then extracts a background harmonic component, a normal operating condition component, and a pure fault component, and the specificity enhancement branch enhances the pure fault component to obtain an enhanced pure fault component. The background harmonic component, the normal operating condition component, and the enhanced pure fault component are concatenated along the feature dimension and processed by a fully connected layer to obtain the reconstructed features. Based on the enhanced pure fault component, an initial fault monitoring result is generated, and a fault specificity score and a fault specificity level are calculated; a multi-dimensional constraint loss function is introduced to optimize the autoencoder;

[0057] The decision optimization module feeds the initial fault monitoring result into the specific perception reinforcement learning network, whose state space, action space, and reward function are designed specifically for distribution box fault monitoring, and optimizes the network with the PPO algorithm. The network outputs the optimal decision action, comprising the final fault identification result, the fault evolution trend prediction result, and differentiated handling instructions.

[0058] Furthermore, as shown in Figure 2, the feature construction module extracts the associated features from the multi-dimensional time-series feature matrix through the following steps:

[0059] The multi-dimensional time-series feature matrix is input into the TCN-LSTM hybrid coding network, where the three causal convolutional layers of the TCN part capture local temporal correlations between adjacent periods and extract local temporal features. The matrix is also input into the LSTM layer, where a two-layer bidirectional LSTM captures long-range dependencies across consecutive periods, extracts the progressive evolution of fault degradation, and yields global correlation features. The local temporal features and global correlation features are concatenated to obtain the time-domain features;

[0060] The electrical data in the multi-dimensional time-series feature matrix are converted into a time-frequency spectrum by STFT and input into a convolutional neural network, in which a specificity enhancement attention module is embedded in the 3-layer convolution-pooling cascade to strengthen the fault specificity of the features. The output of the cascade is passed through a fully connected layer that reduces its dimensionality, giving the frequency-domain features;

[0061] The time-domain and frequency-domain features are concatenated and mapped through a two-layer MLP to obtain the associated features;

[0062] Furthermore, after the multi-dimensional time-series feature matrix is input into the TCN-LSTM hybrid coding network, the TCN causal convolutional layer first extracts local temporal correlations between adjacent periods from the standard multi-source data of each period. Specifically, a three-layer dilated causal convolution structure is adopted, whose receptive field grows through exponentially increasing dilation factors. For period t, the l-th dilated causal convolution layer is computed as:

[0063] $$h_t^{(l)} = \sum_{k=0}^{K-1} W_k^{(l)}\, h_{t - d_l k}^{(l-1)} + b^{(l)},$$

[0064] where W_k^{(l)} is the weight matrix of the k-th tap of the convolution kernel in layer l, K is the kernel size, h^{(l-1)} is the output of layer l-1 and serves as the input of layer l, d_l is the dilation factor of layer l, and b^{(l)} is the bias vector of layer l. Applying the formula to every period and concatenating the results period by period gives the output of layer l over the whole multi-dimensional time-series feature matrix. The input of the first layer is the multi-dimensional time-series feature matrix itself, and the output of layer 3, after a ReLU activation, is the local temporal feature output by the TCN causal convolutional layer;
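The dilated causal convolution described in [0062] to [0064] can be checked with a single-channel sketch. The kernel size of 2, the dilation factors 1, 2, 4, and the all-ones weights are hypothetical. With these choices an impulse at t = 0 influences exactly the first 1 + (2 - 1)(1 + 2 + 4) = 8 outputs and never an earlier one, which illustrates both the causal constraint and the growth of the receptive field.

```python
def dilated_causal_conv(x, weights, dilation, bias=0.0):
    """y[t] = sum_k weights[k] * x[t - dilation * k] + bias, treating x[i < 0] as zero."""
    out = []
    for t in range(len(x)):
        acc = bias
        for k, w in enumerate(weights):
            idx = t - dilation * k
            if idx >= 0:  # causal: only current and past samples contribute
                acc += w * x[idx]
        out.append(acc)
    return out

def relu(x):
    return [max(0.0, v) for v in x]

def tcn_stack(x, layer_weights, dilations=(1, 2, 4)):
    """Three stacked dilated causal convolutions, ReLU on the final output as in [0064]."""
    h = x
    for weights, d in zip(layer_weights, dilations):
        h = dilated_causal_conv(h, weights, d)
    return relu(h)

# Impulse at t = 0 with hypothetical all-ones kernels of size 2
impulse = [1.0] + [0.0] * 9
response = tcn_stack(impulse, [[1.0, 1.0]] * 3)
```

In a real TCN each tap would be a weight matrix over all input channels, as in the formula, rather than a single scalar.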

[0065] The multi-dimensional time-series feature matrix is then input into the LSTM layer, whose gating mechanism captures long-range dependencies across consecutive periods. A bidirectional LSTM uses temporal information in both the past-to-future and the future-to-past directions to extract the progressive evolution of fault degradation more comprehensively. Specifically, the forward LSTM uses its input, forget, and output gates to produce the forward hidden state of the current time step from the current input and the hidden state of the previous time step; the backward LSTM likewise produces the backward hidden state of the current time step. The forward and backward hidden states are concatenated to obtain the global correlation features;
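The bidirectional step in [0065] can be sketched with a scalar LSTM (hidden size 1) in pure Python. The gate equations follow the standard LSTM cell; the hidden size, the shared weight values, and the use of a single layer (the description uses two stacked bidirectional layers) are simplifications for illustration, and the parameter names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_pass(xs, p):
    """Scalar LSTM over a sequence; p holds hypothetical gate weights and biases."""
    h, c, hs = 0.0, 0.0, []
    for x in xs:
        i = sigmoid(p["wi"] * x + p["ui"] * h + p["bi"])    # input gate
        f = sigmoid(p["wf"] * x + p["uf"] * h + p["bf"])    # forget gate
        o = sigmoid(p["wo"] * x + p["uo"] * h + p["bo"])    # output gate
        g = math.tanh(p["wg"] * x + p["ug"] * h + p["bg"])  # candidate cell state
        c = f * c + i * g
        h = o * math.tanh(c)
        hs.append(h)
    return hs

def bilstm(xs, p_fwd, p_bwd):
    """Concatenate forward and backward hidden states at every time step."""
    fwd = lstm_pass(xs, p_fwd)
    bwd = lstm_pass(xs[::-1], p_bwd)[::-1]  # run on the reversed sequence, then realign
    return [[hf, hb] for hf, hb in zip(fwd, bwd)]

names = ("wi", "ui", "bi", "wf", "uf", "bf", "wo", "uo", "bo", "wg", "ug", "bg")
params = {name: 0.5 for name in names}
seq = [0.1, 0.4, -0.2, 0.3]
features = bilstm(seq, params, params)
```

At the first time step the backward state has already seen the whole sequence, which is what lets the concatenated feature describe both past and future context of each period.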

[0066] The local temporal features and global correlation features are concatenated and reduced in dimensionality by a linear layer to obtain the time-domain features;

[0067] Furthermore, the electrical data in the multi-dimensional time-series feature matrix is converted into a time-frequency spectrum using the short-time Fourier transform. Specifically, for the standard multi-source data of each period, the standard electrical data of that period is converted by the short-time Fourier transform into the time-frequency spectrum of that period. The time-frequency spectrum is fed into a convolutional neural network (CNN) consisting of three cascaded convolution-pooling layers. In each convolution-pooling layer, intermediate features are first extracted by two-dimensional convolution. A specificity attention enhancement module then performs global average pooling and global max pooling on the intermediate features to obtain a channel average vector and a channel maximum vector, respectively. These two vectors are input into a single-layer MLP, and the results are summed to obtain the channel attention weights, which are multiplied channel by channel with the intermediate features to obtain the channel attention output. Next, global average pooling and global max pooling are performed on the channel attention output along the channel dimension to obtain a spatial average feature and a spatial maximum feature, respectively. These two features are concatenated along the channel dimension and processed by a convolutional layer to obtain a spatial attention weight map. This map is multiplied position by position with the intermediate features, and the result is passed to the pooling layer of the convolution-pooling cascade, where max pooling reduces the feature map size. Following these steps, the three-layer convolution-pooling cascade extracts the frequency-domain feature map. The map is flattened along the spatial and channel dimensions into a one-dimensional feature vector, and a fully connected layer reduces its dimensionality to give the frequency-domain features of that period. Repeating these steps for every period yields the frequency-domain features of all periods;
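The per-period conversion of electrical data into a time-frequency spectrum can be illustrated with a plain magnitude STFT. This is a sketch only. The patent does not state the window, frame length, hop size, or sampling rate, so the following choices are assumptions:

- a periodic Hann window;
- 64-sample frames with a 32-sample hop;
- a 1600 Hz sampling rate.

The helpers `hann` and `stft_magnitude` are hypothetical names.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def stft_magnitude(signal, frame_len, hop):
    """Magnitude STFT: one row per frame, one column per bin 0..frame_len // 2."""
    window = hann(frame_len)
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + k] * window[k] for k in range(frame_len)]
        row = []
        for f in range(frame_len // 2 + 1):
            # Direct DFT of the windowed frame evaluated at bin f.
            acc = sum(frame[k] * cmath.exp(-2j * math.pi * f * k / frame_len)
                      for k in range(frame_len))
            row.append(abs(acc))
        rows.append(row)
    return rows


# Assumed example: 50 Hz fundamental plus a 20% third harmonic, sampled at 1600 Hz.
FS = 1600
signal = [math.sin(2 * math.pi * 50 * n / FS) + 0.2 * math.sin(2 * math.pi * 150 * n / FS)
          for n in range(320)]
spec = stft_magnitude(signal, frame_len=64, hop=32)  # 9 frames x 33 bins, 25 Hz per bin
```

With 64-sample frames at 1600 Hz, each bin spans 25 Hz. The 50 Hz fundamental therefore dominates bin 2, and its third harmonic appears in bin 6. A production system would more likely call a library STFT routine than this direct DFT.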

[0068] The time-domain features and frequency-domain features are concatenated, and a two-layer MLP mines the coupling correlation between the time and frequency domains to obtain the associated features.

[0069] Furthermore, as shown in Figure 3, the coupling/decoupling module calculates the fault specificity score and the fault specificity level through a specificity-aware dual-branch decoupled autoencoder, through the following steps:

[0070] The specificity-aware dual-branch decoupled autoencoder consists of a shared coding layer, a dual-branch coding layer, a fusion reconstruction layer, and a fault prediction layer. The associated features are input into the autoencoder and first pass through the shared coding layer, which compresses them and eliminates redundant information to yield a shared feature vector;

[0071] The shared feature vector is then input into the dual-branch coding layer, which includes a decoupling branch and a specificity enhancement branch. Based on the idea of latent space separation, the decoupling branch extracts the background harmonic component, the normal operating condition component, and the pure fault component from the shared feature vector. The specificity enhancement branch uses a convolutional neural network to apply dual adaptive enhancement to the pure fault component along the channel and spatial dimensions. This amplifies the weak specific characteristics of ultra-early faults and yields an enhanced pure fault component;

[0072] Finally, the background harmonic component, the normal operating condition component, and the enhanced pure fault component are concatenated along the feature dimension and fed into the fusion reconstruction layer, where multiple fully connected layers perform feature mapping to obtain the reconstructed features;

[0073] For the specificity-aware dual-branch decoupled autoencoder, the initial fault monitoring result is generated from the enhanced pure fault component, and the fault specificity score and fault specificity level are calculated. A multi-dimensional constraint loss function is introduced to optimize the autoencoder;

[0074] Furthermore, the specificity-aware dual-branch decoupled autoencoder adopts a four-stage architecture:

1. a shared coding layer;
2. a dual-branch coding layer;
3. a fusion reconstruction layer;
4. a fault prediction layer.

This architecture separates harmonic features from fault features accurately and enhances ultra-early fault features, while a multi-dimensional constraint loss function optimizes the autoencoder parameters. Specifically, the associated features are first input into the shared coding layer, where convolution operations and nonlinear mappings compress them into the shared feature vector. The calculation formula is as follows:

[0075] ,

[0076] where the nonlinear mapping function of the shared coding layer consists of convolutional layers, nonlinear activation functions, and fully connected layers.

[0077] The shared feature vector is then input into the dual-branch coding layer, which includes a decoupling branch and a specificity enhancement branch. The two branches jointly process the shared feature vector to achieve feature decoupling and enhancement of early-stage fault features. Specifically, based on the latent space separation idea of autoencoders, the decoupling branch decouples the shared feature vector into the background harmonic component, the normal operating condition component, and the pure fault component. The calculation formula is as follows:

[0078] ,

[0079] where the terms are defined as follows:

- the encoding mapping function of the decoupling branch consists of multiple fully connected layers and a nonlinear activation function;
- the background harmonic component represents grid-side background harmonic interference that is unrelated to distribution box faults;
- the normal operating condition component describes how the distribution box fluctuates under varying operating conditions, such as load fluctuations and ambient temperature changes;
- the pure fault component represents the characteristic changes caused by a fault in the distribution box;

[0080] The specificity enhancement branch uses a convolutional neural network to apply dual adaptive enhancement, along the channel and spatial dimensions, to the pure fault component output by the decoupling branch. This amplifies the weak specific characteristics of ultra-early faults and yields the enhanced pure fault component. Specifically, channel attention weighting is first applied to the pure fault component in three steps:

1. Global average pooling and global max pooling extract a global feature for each feature channel.
2. A multilayer perceptron maps these pooled features to channel attention weights.
3. The pure fault component is weighted by these channel attention weights to obtain the channel fault features.

The calculation formula is as follows:

[0081]

[0082] where the terms denote, respectively:

- the channel attention weights;
- the Sigmoid activation function, which normalizes the weights;
- the multilayer perceptron that maps the pooled features;
- global average pooling, which extracts the global average feature of each feature channel;
- global max pooling, which extracts the global maximum feature of each feature channel;
- element-wise multiplication.

Spatial attention weighting is then applied to the channel fault features. Global average pooling and global max pooling are performed along the channel dimension, and the two results are fused by a convolution to obtain spatial attention weights. These weights are applied to the channel fault features to strengthen the spatial regions corresponding to distribution box faults and to weaken background interference regions, ultimately yielding the enhanced pure fault component. The calculation formulas are as follows:

[0083] ,

[0084] ,

[0085] where the terms denote the spatial attention weights, a two-dimensional convolution operation, and the feature concatenation operation, respectively;
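The channel-then-spatial attention described in [0080] to [0085] follows the familiar CBAM pattern. The pure-Python sketch below rests on these assumptions:

- a bottleneck MLP with ReLU, shared by the average-pooled and max-pooled vectors;
- a k x k spatial convolution with zero padding;
- weights supplied by the caller rather than learned.

The function names are hypothetical. The sketch shows only the data flow, not the patent's trained module.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def shared_mlp(vec, w1, w2):
    """Bottleneck MLP w2 @ relu(w1 @ vec), shared by the avg- and max-pooled vectors."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(x, w1, w2):
    """x is a list of C feature maps (H x W). Returns the channel-weighted maps."""
    avg = [sum(map(sum, fm)) / (len(fm) * len(fm[0])) for fm in x]  # global average pool
    mx = [max(map(max, fm)) for fm in x]                             # global max pool
    weights = [sigmoid(a + m) for a, m in zip(shared_mlp(avg, w1, w2), shared_mlp(mx, w1, w2))]
    return [[[wc * v for v in row] for row in fm] for wc, fm in zip(weights, x)]


def spatial_attention(x, kernel):
    """kernel is 2 x k x k, applied to the stacked [channel-avg, channel-max] maps."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    avg = [[sum(x[ch][i][j] for ch in range(c)) / c for j in range(w)] for i in range(h)]
    mx = [[max(x[ch][i][j] for ch in range(c)) for j in range(w)] for i in range(h)]
    k, pad = len(kernel[0]), len(kernel[0]) // 2
    weight_map = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for plane, pooled in zip(kernel, (avg, mx)):
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - pad, j + dj - pad
                        if 0 <= ii < h and 0 <= jj < w:  # zero padding keeps H x W
                            acc += plane[di][dj] * pooled[ii][jj]
            weight_map[i][j] = sigmoid(acc)
    return [[[weight_map[i][j] * fm[i][j] for j in range(w)] for i in range(h)] for fm in x]


def dual_attention(x, w1, w2, kernel):
    """Channel attention followed by spatial attention (CBAM-style ordering)."""
    return spatial_attention(channel_attention(x, w1, w2), kernel)
```

With all weights set to zero, both attention maps equal sigmoid(0) = 0.5. The output is then exactly one quarter of the input, which makes a convenient sanity check.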

[0086] The fusion reconstruction layer receives the background harmonic component and the normal operating condition component output by the decoupling branch, together with the enhanced pure fault component output by the specificity enhancement branch. The three components are concatenated and reconstructed through multiple fully connected layers to obtain the reconstructed features. The calculation formula is as follows:

[0087] ,

[0088] where the nonlinear mapping function of the fusion reconstruction layer consists of multiple fully connected layers;

[0089] The enhanced pure fault component is input into the fault prediction layer, where an MLP performs fault prediction to obtain the initial fault monitoring result;

[0090] Furthermore, a multi-dimensional constraint loss function is introduced to improve the encoding and decoding capability of the specificity-aware dual-branch decoupled autoencoder and to decouple features accurately. This loss optimizes the autoencoder jointly along four dimensions. The total loss is the weighted sum of the reconstruction loss, the independence constraint loss, the specificity enhancement loss, and the physical constraint loss, calculated as follows:

[0091] ,

[0092] where the four terms are the reconstruction loss, the independence constraint loss, the specificity enhancement loss, and the physical constraint loss, each multiplied by its corresponding weighting coefficient. The reconstruction loss uses the mean squared error to measure the deviation between the reconstructed features and the associated features. It constrains the specificity-aware dual-branch decoupled autoencoder to reconstruct the associated features accurately so that no useful information is lost. The calculation formula is as follows:

[0093] ,

[0094] where ‖·‖₂² denotes the squared L2 norm;
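The reconstruction loss and the weighted total loss can be sketched as follows, under three assumptions:

- the mean squared error is the per-period squared L2 error averaged over periods;
- the weights 1, 0.5, 0.8, and 0.3 given in [0132] follow the order in which [0090] lists the four losses;
- the function names are hypothetical.

```python
def reconstruction_loss(reconstructed, associated):
    """Mean over periods of the squared L2 norm of the reconstruction error."""
    per_period = [sum((r - a) ** 2 for r, a in zip(rec, ref))
                  for rec, ref in zip(reconstructed, associated)]
    return sum(per_period) / len(per_period)


# Weights from [0132], assumed to follow the order in which [0090] lists the losses.
LOSS_WEIGHTS = (1.0, 0.5, 0.8, 0.3)


def total_loss(l_rec, l_ind, l_spe, l_phy, weights=LOSS_WEIGHTS):
    """Weighted sum of reconstruction, independence, specificity and physical losses."""
    return sum(w * l for w, l in zip(weights, (l_rec, l_ind, l_spe, l_phy)))
```

For example, a reconstruction loss of 2.5 combined with independence, specificity, and physical losses of 0.2, 0.1, and 1.0 gives 2.5 + 0.1 + 0.08 + 0.3 = 2.98.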

[0095] For the independence constraint loss, the Hilbert-Schmidt independence criterion (HSIC) measures the pairwise independence among the background harmonic component, the normal operating condition component, and the pure fault component, which ensures that the decoupling is accurate. Taking the background harmonic component and the normal operating condition component as an example, their independence is calculated as follows:

[0096] ,

[0097] ,

[0098] where the terms denote, respectively:

- the centering matrix;
- the number of data acquisition periods;
- an identity matrix whose order equals the number of periods;
- an all-ones column vector;
- the matrix transpose;
- the Gram matrices of the background harmonic component and the normal operating condition component, computed with a Gaussian kernel function;
- the trace operation, i.e., the sum of the elements on the main diagonal.

The independence between the background harmonic component and the pure fault component, and between the normal operating condition component and the pure fault component, is calculated in the same way. The independence constraint loss is then computed as follows:

[0099] ;
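The definitions in [0098] match the common biased HSIC estimator, tr(KHLH) / (n - 1)^2. In that estimator, K and L are Gaussian-kernel Gram matrices and H = I - (1/n) 1 1^T is the centering matrix. The following are assumptions not stated in the text:

- the (n - 1)^2 normalization;
- the kernel bandwidth `sigma`;
- summing the three pairwise terms into the total independence loss.

All function names are hypothetical.

```python
import math


def gaussian_gram(xs, sigma):
    """Gram matrix K[i][j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    n = len(xs)
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(xs[i], xs[j])) / (2 * sigma ** 2))
             for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def hsic(xs, ys, sigma=1.0):
    """Biased HSIC estimate between two per-period component sequences."""
    n = len(xs)
    h = [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)] for i in range(n)]
    k, l = gaussian_gram(xs, sigma), gaussian_gram(ys, sigma)
    khlh = matmul(matmul(k, h), matmul(l, h))
    return sum(khlh[i][i] for i in range(n)) / (n - 1) ** 2


def independence_loss(harmonic, normal_condition, pure_fault, sigma=1.0):
    """Sum of the three pairwise HSIC terms (aggregation by summation is assumed)."""
    return (hsic(harmonic, normal_condition, sigma)
            + hsic(harmonic, pure_fault, sigma)
            + hsic(normal_condition, pure_fault, sigma))
```

A constant component carries no information about any other component, so its HSIC with anything is zero. Minimizing the loss pushes the three decoupled components toward statistical independence.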

[0100] For the specificity enhancement loss, a single-layer MLP first maps the enhanced pure fault component to the fault specificity score. The fault specificity level is then assigned according to the magnitude of the score:

- level 0 when the score is below 0.5;
- level 1 when it is at least 0.5 and below 0.7;
- level 2 when it is at least 0.7 and below 0.9;
- level 3 when it is at least 0.9.

The specificity enhancement loss is then constructed from the fault specificity score using a contrastive loss, calculated as follows:

[0101] ,

[0102] where the terms denote, respectively, the label of the standard multi-source data for the corresponding period, the standard threshold for the fault specificity score, and the margin of the contrastive loss used to separate positive and negative samples;
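The level mapping below follows the thresholds in [0100] exactly. The exact contrastive formula is not preserved in this text. The loss function is therefore one hypothetical margin-based form that uses the standard threshold of 0.6 and margin of 0.25 from [0132]. Under this form, fault-labelled periods are pushed above threshold + margin and normal periods below threshold - margin. Labels are assumed to be 1 for fault and 0 for normal. Both function names are hypothetical.

```python
def specificity_level(score):
    """Map a fault specificity score to levels 0-3 using the thresholds in [0100]."""
    if score < 0.5:
        return 0
    if score < 0.7:
        return 1
    if score < 0.9:
        return 2
    return 3


def specificity_contrastive_loss(scores, labels, threshold=0.6, margin=0.25):
    """Hypothetical squared-hinge contrastive form, averaged over periods."""
    total = 0.0
    for s, y in zip(scores, labels):
        if y == 1:
            # Fault period: penalize scores below threshold + margin.
            total += max(0.0, threshold + margin - s) ** 2
        else:
            # Normal period: penalize scores above threshold - margin.
            total += max(0.0, s - (threshold - margin)) ** 2
    return total / len(scores)
```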

[0103] For the physical constraint loss, the background harmonic component and the pure fault component are constrained according to physical prior laws of distribution box faults. This ensures that the decoupled components conform to physical laws and prevents meaningless feature mappings. The calculation formula is as follows:

[0104] ,

[0105] where the first penalty term is calculated on the background harmonic component with respect to the standard harmonic frequency range. The second penalty term is calculated on features of the pure fault component that violate the temporal order of equipment degradation; it is based on the irreversibility of equipment degradation and the monotonically increasing evolution of contact resistance.
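The two penalty terms of [0105] can be expressed as simple hinge penalties. This is a hypothetical sketch, and its inputs are assumptions:

- the harmonic frequencies implied by the background harmonic component;
- a per-period contact-resistance proxy derived from the pure fault component.

The default band of 50 Hz to 2500 Hz (fundamental through the 50th harmonic of a 50 Hz grid) is an assumed "standard harmonic frequency range". The unweighted sum of the two terms is also an assumption.

```python
def range_penalty(values, low, high):
    """Sum of hinge distances of values that fall outside [low, high]."""
    return sum(max(0.0, low - v) + max(0.0, v - high) for v in values)


def monotonic_penalty(trend):
    """Sum of decreases between consecutive periods (degradation should not reverse)."""
    return sum(max(0.0, prev - cur) for prev, cur in zip(trend, trend[1:]))


def physical_loss(harmonic_freqs_hz, resistance_trend, low_hz=50.0, high_hz=2500.0):
    """Unweighted sum of the harmonic-range and degradation-monotonicity penalties."""
    return (range_penalty(harmonic_freqs_hz, low_hz, high_hz)
            + monotonic_penalty(resistance_trend))
```

A contact-resistance trend of 1.0, 1.2, 1.1, 1.5 is penalized only for the 0.1 drop between the second and third periods. That drop would contradict irreversible degradation.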

[0106] Furthermore, as shown in Figure 4, the decision optimization module outputs the optimal decision action through the specificity-aware reinforcement learning network, through the following steps:

[0107] To meet the needs of distribution box fault monitoring and handling, the state space, action space, and reward function are customized as follows:

- the state vector of the state space is the concatenation of the enhanced pure fault component, the fault specificity score, the fault specificity level, the initial fault monitoring result, and the normal operating condition component;
- the action space contains three action sets;
- a multi-dimensional dynamic reward function driven by specificity awareness is constructed.

[0108] The specificity-aware reinforcement learning network consists of a policy network and a value network. The value network evaluates the value of the current state, and the policy network then optimizes over the action space to determine the optimal decision action.

[0109] Furthermore, the state vector is obtained by concatenating the features listed above from the coupling/decoupling module. This gives the reinforcement learning network full-dimensional state information and ensures that the state space reflects the actual condition of the current distribution box. The calculation formula is as follows:

[0110] ,

[0111] where the operator denotes feature concatenation. Next, the action space is designed for reinforcement learning. According to the needs of distribution box fault monitoring and handling, three action sets are customized:

- the fault identification action set, which covers the fault type and fault location;
- the trend prediction action set, which covers the predicted fault evolution trend of the distribution box over future operating cycles;
- the handling instruction action set, which covers differentiated fault handling instructions.

Once the state space and action space are designed, a multi-dimensional dynamic reward function driven by specificity awareness is constructed. It balances fault identification specificity, identification accuracy, physical compliance, and handling optimization. The total reward for each cycle is calculated as follows:

[0112] ,

[0113] where the terms denote, respectively:

- the total reward obtained after the specificity-aware reinforcement learning network executes an action in the current cycle;
- the state vector of the current cycle;
- the state vector of the next cycle after the action is executed;
- the specificity reward, the recognition accuracy reward, the physical compliance reward, and the handling optimization penalty.

[0114] The specificity reward is calculated from the change in the fault specificity score. It reflects the ability of the reinforcement learning network to identify very early, subtle faults. The calculation formula is as follows:

[0115] ,

[0116] where the terms denote the gain coefficient of the specificity reward and the fault specificity scores of the two consecutive cycles being compared;

[0117] The recognition accuracy reward measures how well the fault identification results match the true fault labels. It improves the accuracy with which the network identifies fault type and fault location. The calculation formula is as follows:

[0118] ,

[0119] where the terms denote, respectively:

- the gain coefficient of the recognition accuracy reward;
- the number of fault cycles the network correctly identifies as faulty;
- the number of normal cycles it correctly identifies as normal;
- the number of normal cycles it incorrectly identifies as faulty;
- the number of fault cycles it incorrectly identifies as normal.

The higher the recognition accuracy, the larger the positive reward to the network; the lower the accuracy, the smaller the positive reward.

[0120] The physical compliance reward depends on whether the decision actions output by the specificity-aware reinforcement learning network conform to physical laws and safety regulations. Its value is set by the gain coefficient of the physical compliance reward: one value is assigned when a decision action conforms to physical laws and safety regulations, and a different value when it does not;

[0121] The handling optimization penalty penalizes excessive, ineffective, and rule-violating handling in the decision actions output by the specificity-aware reinforcement learning network. It balances the effectiveness of distribution box fault handling against power supply reliability. The calculation formula is as follows:

[0122] ,

[0123] where the terms denote the penalty coefficient of the handling optimization penalty, the number of excessive actions, the number of ineffective actions, and the number of rule-violating actions, respectively. The more excessive, ineffective, or rule-violating actions are taken, the larger the penalty.
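The four reward terms of [0113] to [0123] can be combined in code as a rough sketch. The exact formulas are not preserved in this text, so the following are assumptions:

- the specificity reward is gain times the score change;
- the accuracy reward is gain times the fraction of cycles classified correctly;
- the compliance reward is plus or minus its gain;
- the penalty is subtracted from the total.

The gains 10, 20, 50, and 30 come from [0133] and are assumed to follow the listing order in [0113]. All function names are hypothetical.

```python
# Gains from [0133], assumed to follow the order in which [0113] lists the terms.
GAIN_SPEC, GAIN_ACC, GAIN_PHYS, PENALTY_COEF = 10.0, 20.0, 50.0, 30.0


def specificity_reward(score_prev, score_curr, gain=GAIN_SPEC):
    """Positive when the fault specificity score rises between consecutive cycles."""
    return gain * (score_curr - score_prev)


def accuracy_reward(tp, tn, fp, fn, gain=GAIN_ACC):
    """Gain times the fraction of cycles classified correctly."""
    total = tp + tn + fp + fn
    return gain * (tp + tn) / total if total else 0.0


def compliance_reward(compliant, gain=GAIN_PHYS):
    """+gain for physically compliant, safe actions, -gain otherwise (assumed form)."""
    return gain if compliant else -gain


def handling_penalty(n_excessive, n_invalid, n_violating, coef=PENALTY_COEF):
    """Penalty grows linearly with excessive, ineffective and rule-violating actions."""
    return coef * (n_excessive + n_invalid + n_violating)


def total_reward(score_prev, score_curr, counts, compliant, handling_counts):
    """counts = (tp, tn, fp, fn); handling_counts = (excessive, invalid, violating)."""
    return (specificity_reward(score_prev, score_curr)
            + accuracy_reward(*counts)
            + compliance_reward(compliant)
            - handling_penalty(*handling_counts))
```

Consider a cycle where:

- the specificity score rises from 0.6 to 0.7;
- 18 of 20 cycles are correctly classified;
- the action is compliant;
- one ineffective action is taken.

This cycle scores 1 + 18 + 50 - 30 = 39.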

[0124] Furthermore, the value network evaluates the state value of the current state vector, and the advantage function of the current state vector is calculated from that state value. The PPO algorithm is then applied in the policy network to optimize over the action space and output the optimal decision action. The procedure is as follows:

1. The TCN-LSTM hybrid coding network is reused to receive the state vector of the current period, extract temporal correlation information and core decision features from it, and output a state feature vector.
2. The state feature vector of the current period is input into the value network, where multiple fully connected layers evaluate it to obtain the state value.
3. The advantage function of the current state vector is calculated as follows:

[0125] ,

[0126] where the coefficient denotes the discount factor used to weight future rewards. Based on the advantage function of the current state vector, the clipped objective function of the PPO algorithm is determined as follows:

[0127] ,

[0128] where the terms denote, respectively:

- the trainable parameters of the policy network;
- the expectation over all periods;
- the probability ratio;
- the clipping function that limits the range of the probability ratio;
- the clipping threshold that limits the magnitude of policy network parameter updates;
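The advantage and the clipped PPO surrogate can be sketched with the discount factor 0.99 and clipping threshold 0.2 from [0133]. The exact advantage formula is not preserved in this text, so the one-step temporal-difference estimate used here is an assumption. Generalized advantage estimation would be a common alternative. Function names are hypothetical, and probabilities are passed as log-probabilities.

```python
import math


def one_step_advantage(reward, value_curr, value_next, gamma=0.99):
    """One-step TD advantage: r + gamma * V(s_next) - V(s_curr)."""
    return reward + gamma * value_next - value_curr


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate; the policy is updated to maximize this value."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # probability ratio pi_new / pi_old
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # The pessimistic minimum stops large ratios from inflating the objective.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

Two cases show how clipping bounds each sample's contribution:

- A ratio of about 1.65 on a positive advantage is capped at 1.2.
- A ratio of about 0.61 on a negative advantage is held at -0.8.

The loss minimized by an optimizer would be the negative of this objective.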

[0129] Once the specificity-aware reinforcement learning network has been optimized through the above steps, it determines the optimal decision action. This action comprises the final fault identification result, the fault evolution trend prediction result, and the differentiated handling instruction.

[0130] Preferably, the distribution box operation fault monitoring system based on specificity-aware reinforcement learning of the present invention collects distribution box operating data at 50 Hz, i.e., one acquisition cycle every 20 ms. The data collected in each cycle has 10 dimensions, so the multi-dimensional time-series feature matrix has dimensions …. In the TCN-LSTM hybrid coding network, the TCN causal convolutional layer contains three dilated causal convolutional layers, each with a kernel size of 3:

- the first has 32 output channels and a dilation factor of 1;
- the second has 64 output channels and a dilation factor of 2;
- the third has 64 output channels and a dilation factor of 4.

The local temporal features output by the TCN causal convolutional layer have dimensions …. The LSTM layer contains two bidirectional LSTM layers with 32 hidden units each, and it outputs the global correlation features with dimensions …. The local temporal features and global correlation features are then concatenated and reduced in dimensionality through a linear layer of dimensions …, giving the time-domain features with dimensions …;
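The TCN configuration in [0130] fixes how much history the causal convolutions can see. The sketch assumes one convolution per dilation level, as the text describes. It does not model residual blocks with two convolutions each, which would enlarge the receptive field. Under that assumption, the receptive field is 1 + (3 - 1) x (1 + 2 + 4) = 15 acquisition cycles, i.e., 300 ms at 20 ms per cycle. The single-channel helper `causal_dilated_conv` is hypothetical. It illustrates causality and receptive field only, not the multi-channel layers themselves.

```python
SAMPLING_HZ = 50
CYCLE_MS = 1000 / SAMPLING_HZ  # 20 ms per acquisition cycle


def receptive_field(kernel_size, dilations):
    """Cycles visible to the top of a stack of dilated causal convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def causal_dilated_conv(x, taps, dilation):
    """Single-channel causal conv: y[t] = sum_k taps[k] * x[t - k * dilation]."""
    return [sum(w * x[t - k * dilation] for k, w in enumerate(taps) if t - k * dilation >= 0)
            for t in range(len(x))]


rf = receptive_field(3, [1, 2, 4])  # 15 cycles
rf_ms = rf * CYCLE_MS               # 300 ms of history

# Propagate a unit impulse through the three layers to see how far its influence reaches.
y = [1.0] + [0.0] * 19
for d in (1, 2, 4):
    y = causal_dilated_conv(y, [1.0, 1.0, 1.0], d)
```

The impulse at cycle 0 affects outputs only from cycle 0 through cycle 14, never earlier ones. This is the causal property that prevents future data from leaking into the features of the current cycle.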

[0131] The three-layer convolution-pooling cascade that extracts the frequency-domain features from the time-frequency spectrum is configured as follows:

- the first convolution-pooling layer uses a kernel size of [size missing], 32 channels, and a max-pooling kernel of …, giving a first-layer output of dimensions …;
- the second uses a kernel size of [size missing], 64 channels, and a max-pooling kernel of …, giving a second-layer output of dimensions …;
- the third uses a kernel size of [size missing], 128 channels, and a max-pooling kernel of …, giving a third-layer output of dimensions ….

The output is flattened and passed through fully connected layers of dimensions … and … to reduce its dimensionality, giving the frequency-domain features with dimensions …. After the time-domain and frequency-domain features are concatenated, an MLP with dimensions … and … mines the coupling between the time and frequency domains to obtain the associated features, with dimensions …;

[0132] In the coupling/decoupling module, the layers are configured as follows:

- The shared coding layer consists of two convolutional layers and two fully connected layers. The two convolutional layers use a kernel size of [missing information], with 64 and 128 output channels respectively. The two fully connected layers have dimensions … and …. The resulting shared feature vector has dimensions ….
- The decoupling branch of the dual-branch coding layer uses the VAE reparameterization technique with a standard Gaussian distribution to obtain the background harmonic component, the normal operating condition component, and the pure fault component. All three have dimensions ….
- The specificity enhancement branch uses a convolutional neural network to apply dual adaptive enhancement to the pure fault component along the channel and spatial dimensions. The resulting enhanced pure fault component has dimensions ….
- The background harmonic component, normal operating condition component, and enhanced pure fault component are concatenated along the feature dimension and input into the fusion reconstruction layer. This layer contains two fully connected layers of dimensions [missing information] and …, and its feature mapping produces reconstructed features of dimensions ….
- The enhanced pure fault component is input into the fault prediction layer, where an MLP of dimensions … performs fault prediction to obtain the initial fault monitoring result, with dimensions ….

In addition, the four loss weighting coefficients are 1, 0.5, 0.8, and 0.3, respectively. The standard threshold for the fault specificity score is 0.6, and the margin of the contrastive loss is 0.25;
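The decoupling step in [0132] relies on the VAE reparameterization trick, z = mu + sigma * eps with eps drawn from a standard Gaussian, which keeps sampling differentiable with respect to mu and sigma. The sketch makes two assumptions not stated in the text. First, the decoupling branch emits one mean vector and one log-variance vector of length 3 x d. Second, the sampled latent is split evenly into the three d-dimensional components; separate heads per component would be an equally plausible design. The names `reparameterize` and `decouple` are hypothetical.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """z = mu + exp(0.5 * log_var) * eps, with eps drawn from a standard Gaussian."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def decouple(mu, log_var, comp_dim, rng):
    """Sample a 3 * comp_dim latent and split it into the three components."""
    z = reparameterize(mu, log_var, rng)
    harmonic = z[:comp_dim]
    normal_condition = z[comp_dim:2 * comp_dim]
    pure_fault = z[2 * comp_dim:3 * comp_dim]
    return harmonic, normal_condition, pure_fault


rng = random.Random(0)
h, n, f = decouple([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [-100.0] * 6, 2, rng)
```

With a very negative log-variance, the sampled noise becomes negligible and each component collapses to its mean. This makes the split easy to check by eye.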

[0133] In the decision optimization module, the specificity-aware reinforcement learning network is configured as follows:

- The enhanced pure fault component, fault specificity score, fault specificity level, initial fault monitoring result, and normal operating condition component are concatenated into a state vector of dimensions ….
- The TCN-LSTM hybrid coding network is reused to extract temporal correlation information and core decision features from the state vector. It outputs a state feature vector of dimensions [missing value].
- The value network consists of three fully connected layers of dimensions …, …, and …, and produces a state value of dimensions ….
- The policy network also contains three fully connected layers, of dimensions [dimensions to be filled in], …, and …, and outputs the optimal decision action with dimensions ….
- In the policy network, the discount factor is 0.99 and the clipping threshold is 0.2.
- The gain coefficients of the multi-dimensional dynamic reward function are 10, 20, 50, and 30, respectively;

[0134] Furthermore, with a data acquisition interval of 20 ms, multi-source data from 20,000 consecutive historical acquisition cycles of the distribution box were used for training. The dataset was divided into training, validation, and test sets in a 7:2:1 ratio. The training set was used to learn model parameters, the validation set to monitor overfitting and tune hyperparameters during training, and the test set to finally evaluate the model's generalization ability.

The neural networks in the feature construction module and the coupling/decoupling module were trained as follows:

- the Adam optimizer was used with an initial learning rate of 0.001;
- the learning rate was multiplied by a decay factor of 0.1 every 20 training epochs;
- the batch size was 100 and the maximum number of training epochs was 300;
- early stopping was adopted: training stopped if the validation loss did not decrease for 10 consecutive epochs.

In the specificity-aware reinforcement learning network of the decision optimization module:

- the policy network used a learning rate of 0.0001 with a weight decay of 0.001;
- the value network used a learning rate of 0.0003 with a weight decay of 0.001;
- the initial exploration strategy was a mixed strategy of … and Gaussian noise, with an initial exploration rate of 0.3 that decays by 0.001 per round;
- the Gaussian noise, with a mean of 0 and a variance of 0.02, is superimposed on the policy network output to increase exploration diversity;
- the number of training rounds was 1000, and the network was evaluated every 50 training rounds.

The loss function is the aforementioned multi-dimensional dynamic reward function. Training stops, and the optimal decision action is output, when the validation loss does not decrease for 10 consecutive rounds.
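The training schedule in [0134] can be made concrete with a few helpers. The 7:2:1 split of 20,000 cycles gives 14,000 training, 4,000 validation, and 2,000 test cycles. The following interpretations are assumptions:

- "adjusted every 20 epochs with a decay of 0.1" is read as a step decay that multiplies the rate by 0.1;
- "decays by 0.001 per round" is read as linear subtraction clamped at zero;
- a variance of 0.02 corresponds to a noise standard deviation of about 0.141.

The class and function names are hypothetical.

```python
import math


def split_sizes(n_cycles, ratios=(7, 2, 1)):
    """Train/validation/test sizes for a contiguous ratio split."""
    total = sum(ratios)
    train = n_cycles * ratios[0] // total
    val = n_cycles * ratios[1] // total
    return train, val, n_cycles - train - val


def step_lr(epoch, base_lr=1e-3, step=20, gamma=0.1):
    """Step decay: multiply the learning rate by gamma every `step` epochs."""
    return base_lr * gamma ** (epoch // step)


def exploration_rate(round_idx, start=0.3, decay=0.001):
    """Linear decay of the exploration rate, clamped at zero (assumed reading)."""
    return max(0.0, start - decay * round_idx)


NOISE_STD = math.sqrt(0.02)  # variance 0.02 -> standard deviation of about 0.141


class EarlyStopping:
    """Signal a stop after `patience` consecutive checks without improvement."""

    def __init__(self, patience=10):
        self.patience, self.best, self.bad = patience, float("inf"), 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best, self.bad = val_loss, 0
        else:
            self.bad += 1
        return self.bad >= self.patience
```

Under the step-decay reading, the feature networks train at 0.001 for epochs 0 to 19, at 0.0001 for epochs 20 to 39, and so on.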

[0135] This invention discloses a distribution box operation fault monitoring system based on specificity-aware reinforcement learning. The system comprises a data acquisition and processing module, a feature construction module, a coupling and decoupling module, and a decision optimization module:

- The data acquisition and processing module preprocesses the data.
- The feature construction module extracts associated features through a hybrid coding network.
- The coupling and decoupling module uses a specificity-aware dual-branch decoupled autoencoder, whose dual-branch structure decouples the associated features and applies dual adaptive enhancement along the channel and spatial dimensions. The independence constraint loss is calculated with the Hilbert-Schmidt independence criterion and the specificity enhancement loss with the contrastive loss concept; together they improve the decoupling accuracy of the autoencoder.
- The decision optimization module customizes the state space, action space, and reward function of the specificity-aware reinforcement learning network for distribution box fault monitoring. It uses the PPO algorithm to optimize the network and output the optimal decision action.

The result is accurate identification of distribution box faults at an early stage.

[0136] The above description is merely a preferred embodiment of the present invention. The scope of protection of the present invention is not limited to the above embodiments. All technical solutions falling within the scope of the present invention's concept are within the scope of protection of the present invention. It should be noted that for those skilled in the art, any improvements and modifications made without departing from the principles of the present invention should also be considered within the scope of protection of the present invention.

Claims

1. A distribution box operation fault monitoring system based on specificity-aware reinforcement learning, characterized by comprising:
a data acquisition and processing module, which triggers smart sensors at a fixed period to collect multi-source data, including electrical and non-electrical data, during operation of the distribution box, preprocesses the multi-source data to obtain standard multi-source data, and integrates the standard multi-source data into a multi-dimensional time-series feature matrix;
a feature construction module, which uses a TCN-LSTM hybrid coding network to extract time-domain features from the multi-dimensional time-series feature matrix, converts the electrical data in the matrix into a time-frequency spectrum using a short-time Fourier transform, uses a convolutional neural network to extract frequency-domain features from the time-frequency spectrum, and finally concatenates the time-domain and frequency-domain features and maps them through a fully connected layer to obtain associated features;
a coupling and decoupling module based on a specificity-aware dual-branch decoupled autoencoder, wherein the decoupling branch of the dual-branch coding layer decouples the associated features to obtain a pure fault component, and the specificity enhancement branch applies dual adaptive enhancement to the pure fault component along the channel and spatial dimensions to obtain an enhanced pure fault component; an initial fault monitoring result is generated from the enhanced pure fault component, and a fault specificity score and a fault specificity level are calculated; an independence constraint loss is calculated based on the Hilbert-Schmidt independence criterion, a specificity enhancement loss is calculated using the contrastive loss concept, and the specificity-aware dual-branch decoupled autoencoder is optimized through a multi-dimensional constraint loss function; and
a decision optimization module, which customizes a state space, an action space, and a reward function for distribution box fault monitoring and uses the PPO algorithm to optimize the specificity-aware reinforcement learning network so that it outputs the optimal decision action.

2. The distribution box operation fault monitoring system based on specificity-aware reinforcement learning according to claim 1, characterized in that the specificity-aware dual-branch decoupled autoencoder comprises a shared coding layer, a dual-branch coding layer, a fusion reconstruction layer, and a fault prediction layer; the shared coding layer compresses the associated features into a shared feature vector; the dual-branch coding layer includes a decoupling branch and a specificity enhancement branch, wherein the decoupling branch performs latent space separation on the shared feature vector to extract a background harmonic component, a normal operating condition component, and a pure fault component, and the specificity enhancement branch applies dual adaptive enhancement to the pure fault component to generate an enhanced pure fault component; the fusion reconstruction layer concatenates the background harmonic component, the normal operating condition component, and the enhanced pure fault component and performs feature mapping to generate reconstructed features; the enhanced pure fault component is input into the fault prediction layer to generate the initial fault monitoring result, calculate the fault specificity score, and classify the fault specificity level; and a multi-dimensional constraint loss function is introduced to optimize the specificity-aware dual-branch decoupled autoencoder.

3. The distribution box operation fault monitoring system based on specific perception reinforcement learning according to claim 2, characterized in that, to meet the needs of power distribution box fault monitoring and handling, the state space, action space, and reward function of the specific perception reinforcement learning network are custom-designed: the enhanced pure fault component, the fault specificity score, the fault specificity level, the initial fault monitoring result, and the normal operating condition component are concatenated to generate a state vector corresponding to the state space; an action space containing a fault identification action set, a trend prediction action set, and a handling instruction action set is constructed; and a multi-dimensional dynamic reward function driven by specificity perception is constructed. The specific perception reinforcement learning network consists of a policy network and a value network: the value network evaluates the state value of the state vector, and the policy network optimizes its action selection over the action space to determine and output the optimal decision action.
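The state and action construction in claim 3 can be sketched as follows. This is a hypothetical illustration: the claim names the three action sets but not their members, so every action label below is invented, the one-hot encoding of the specificity level is an assumed choice, and flattening the three sets into one joint discrete space is only one possible design.

```python
import numpy as np

# Hypothetical members of the three action sets; the claim names the sets, not their contents.
FAULT_ID_ACTIONS = ["no_fault", "fault_type_a", "fault_type_b", "fault_type_c"]
TREND_ACTIONS = ["stable", "slow_degradation", "fast_degradation"]
HANDLING_ACTIONS = ["keep_monitoring", "raise_alarm", "schedule_inspection", "isolate_circuit"]

# Assumption: one joint action = one choice from each set, flattened into a single
# discrete space so that a categorical policy head can select it.
ACTION_SPACE = [(f, t, h) for f in FAULT_ID_ACTIONS
                for t in TREND_ACTIONS for h in HANDLING_ACTIONS]


def build_state(enhanced_fault, spec_score, spec_level, n_levels, initial_result, normal_comp):
    """Concatenate the five inputs named in claim 3 into one state vector.
    The specificity level is one-hot encoded (an assumed encoding)."""
    level_onehot = np.eye(n_levels)[spec_level]
    return np.concatenate([enhanced_fault, [spec_score], level_onehot,
                           initial_result, normal_comp])


state = build_state(enhanced_fault=np.zeros(16), spec_score=0.7, spec_level=2,
                    n_levels=4, initial_result=np.full(4, 0.25), normal_comp=np.zeros(16))
```

With 16-d components, a scalar score, four levels, and a four-class initial result, the state vector has 41 entries and the joint action space has 4 × 3 × 4 = 48 actions.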

4. The distribution box operation fault monitoring system based on specific perception reinforcement learning according to claim 1, characterized in that the associated features are generated as follows: the TCN-LSTM hybrid coding network consists of TCN causal convolutional layers and LSTM layers; the multi-dimensional time-series feature matrix is input into the TCN causal convolutional layers, whose multi-layer causal convolutional structure captures local temporal correlations between adjacent periods and extracts local temporal features; the multi-dimensional time-series feature matrix is also input into the LSTM layers, whose multi-layer bidirectional LSTM structure captures long-range dependencies across consecutive periods, extracts the progressive evolution pattern of fault degradation, and generates global correlation features; and the local temporal features and global correlation features are concatenated and reduced in dimensionality to generate time-domain features. Electrical data is extracted from the multi-dimensional time-series feature matrix and converted into a time-frequency spectrum by short-time Fourier transform; a convolutional neural network with an embedded specific attention enhancement module extracts frequency-domain features from the time-frequency spectrum; and the time-domain and frequency-domain features are concatenated and mapped by a multilayer perceptron to generate the associated features.
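Two building blocks of claim 4 can be illustrated in plain NumPy: a causal dilated 1-D convolution (the core operation of a TCN layer, where the output at period t depends only on periods t, t-d, t-2d, ...) and a short-time Fourier transform magnitude spectrum. This is a sketch under stated assumptions: the kernel size, dilation, Hann window, `n_fft`, `hop`, 1600 Hz sampling rate, and synthetic 50 Hz current are all illustrative, and the LSTM and CNN stages are omitted.

```python
import numpy as np


def causal_conv1d(x, kernel, dilation=1):
    """Causal dilated 1-D convolution over time.
    x: (T, C_in); kernel: (K, C_in, C_out). Output at t uses x[t], x[t-d], ..."""
    k_size = kernel.shape[0]
    pad = (k_size - 1) * dilation
    xp = np.vstack([np.zeros((pad, x.shape[1])), x])  # left padding only, so no look-ahead
    steps = x.shape[0]
    out = np.zeros((steps, kernel.shape[2]))
    for k in range(k_size):
        # tap k reads x[t - (K - 1 - k) * dilation]
        start = k * dilation
        out += xp[start:start + steps] @ kernel[k]
    return out


def stft_magnitude(signal, n_fft=64, hop=16):
    """Magnitude STFT with a Hann window; returns (freq_bins, frames)."""
    window = np.hanning(n_fft)
    frames = [signal[i:i + n_fft] * window
              for i in range(0, len(signal) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=-1)).T


rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))                 # 100 periods, 3 channels (assumed)
w = rng.normal(size=(3, 3, 8))                # kernel size 3, 3 -> 8 channels
y = causal_conv1d(x, w, dilation=2)

fs = 1600.0                                   # assumed sampling rate in Hz
t = np.arange(1024) / fs
current = np.sin(2 * np.pi * 50.0 * t)        # synthetic 50 Hz current waveform
spec = stft_magnitude(current, n_fft=64, hop=16)
```

With these assumed settings each frequency bin spans 1600 / 64 = 25 Hz, so the 50 Hz fundamental appears in bin 2; harmonics and transients related to a developing fault would show up in higher bins of `spec`.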

5. The distribution box operation fault monitoring system based on specific perception reinforcement learning according to claim 2, characterized in that the multi-dimensional constraint loss function comprises a reconstruction loss, an independence constraint loss, a specificity enhancement loss, and a physical constraint loss; the reconstruction loss is the mean squared error between the reconstructed features and the associated features; the independence constraint loss is obtained by using the Hilbert-Schmidt independence criterion to measure, separately, the dependence between the background harmonic component and the normal operating condition component, between the background harmonic component and the pure fault component, and between the normal operating condition component and the pure fault component; the enhanced pure fault component is input into a multilayer perceptron that maps it to a fault specificity score, fault specificity levels are assigned according to the fault specificity score, and the specificity enhancement loss is constructed using a contrastive loss; constraint rules are set according to physical prior laws of distribution box faults and applied to the background harmonic component and the pure fault component, respectively, to generate the physical constraint loss; and the multi-dimensional constraint loss function is the weighted sum of the reconstruction loss, independence constraint loss, specificity enhancement loss, and physical constraint loss.
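The independence constraint loss of claim 5 can be sketched with the standard biased empirical HSIC estimator, trace(K H L H) / (n - 1)^2, where K and L are kernel Gram matrices and H is the centering matrix. The Gaussian kernel with a median-heuristic bandwidth and the loss weights below are assumptions; the contrastive specificity loss and the physical constraint rules are not specified in enough detail to sketch and are passed in as plain numbers.

```python
import numpy as np


def rbf_gram(x):
    """Gaussian-kernel Gram matrix; bandwidth from the median heuristic (an assumed choice)."""
    sq = np.sum(x ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    positive = d2[d2 > 0]
    scale = np.median(positive) if positive.size else 1.0
    return np.exp(-d2 / scale)


def hsic(x, y):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2, with centering matrix H."""
    n = x.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n
    return np.trace(rbf_gram(x) @ h @ rbf_gram(y) @ h) / (n - 1) ** 2


def independence_loss(harmonic, normal, fault):
    # Sum of pairwise dependence between the three decoupled components
    return hsic(harmonic, normal) + hsic(harmonic, fault) + hsic(normal, fault)


def total_loss(l_rec, l_ind, l_spec, l_phys, weights=(1.0, 0.1, 0.1, 0.05)):
    # Hypothetical weights; the claim states only that the four losses are weighted and summed
    return float(np.dot(weights, [l_rec, l_ind, l_spec, l_phys]))


rng = np.random.default_rng(0)
a = rng.normal(size=(200, 2))
b = rng.normal(size=(200, 2))          # drawn independently of a
```

Minimizing `independence_loss` pushes HSIC toward zero, i.e. toward statistically independent components: here `hsic(a, a)` is much larger than `hsic(a, b)`.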

6. The distribution box operation fault monitoring system based on specific perception reinforcement learning according to claim 3, characterized in that the multi-dimensional dynamic reward function comprises a specificity reward, a recognition accuracy reward, a physical compliance reward, and a handling optimization penalty; the specificity reward is calculated from the change in fault specificity score between periods; the recognition accuracy reward is calculated from the degree of match between the fault recognition result and the true fault label; the decision action output by the specific perception reinforcement learning network is checked against physical laws and safety regulations, and the physical compliance reward is calculated from the degree of match; over-handling, ineffective handling, and illegal handling behaviors are identified in the handling instructions of the decision action, and the handling optimization penalty is calculated from the identification result; and the specificity reward, recognition accuracy reward, physical compliance reward, and handling optimization penalty are summed to obtain the multi-dimensional dynamic reward function of the specific perception reinforcement learning network.
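A minimal sketch of the four-term reward of claim 6. The claim does not give formulas, so the score-difference specificity reward, the ±1 accuracy reward, the 0..1 `physics_match` input, and all weights and penalty magnitudes are hypothetical. The penalty is returned as a negative number so that simply adding the four terms, as the claim describes, lowers the reward.

```python
from dataclasses import dataclass


@dataclass
class StepInfo:
    prev_score: float            # fault specificity score in the previous period
    curr_score: float            # fault specificity score in the current period
    predicted_label: int
    true_label: int
    physics_match: float         # hypothetical 0..1 degree of compliance with physics/safety rules
    over_handling: bool
    ineffective_handling: bool
    illegal_handling: bool


def reward(info, w_spec=1.0, w_acc=1.0, w_phys=0.5, penalties=(0.5, 0.5, 2.0)):
    r_spec = w_spec * (info.curr_score - info.prev_score)         # specificity reward
    r_acc = w_acc * (1.0 if info.predicted_label == info.true_label else -1.0)
    r_phys = w_phys * (2.0 * info.physics_match - 1.0)            # maps 0..1 to -w..+w
    p_over, p_ineff, p_illegal = penalties
    r_pen = -(p_over * info.over_handling + p_ineff * info.ineffective_handling
              + p_illegal * info.illegal_handling)                # handling optimization penalty
    return r_spec + r_acc + r_phys + r_pen


example = StepInfo(prev_score=0.4, curr_score=0.6, predicted_label=1, true_label=1,
                   physics_match=1.0, over_handling=False,
                   ineffective_handling=False, illegal_handling=False)
r = reward(example)
```

Weighting illegal handling more heavily than over-handling reflects the assumption that unsafe instructions should dominate the reward signal; the actual weighting is not stated in the claims.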

7. The distribution box operation fault monitoring system based on specific perception reinforcement learning according to claim 4, characterized in that the specific attention enhancement module embedded in the convolutional neural network operates as follows: two-dimensional convolution is applied to the time-frequency spectrum to generate intermediate features; global average pooling and global max pooling are applied to the intermediate features to generate a channel average vector and a channel maximum vector, respectively; both vectors are input into a multilayer perceptron with a single hidden layer to generate channel attention weights, which are multiplied channel by channel with the intermediate features to generate the channel attention output; average pooling and max pooling are applied to the channel attention output along the channel dimension to generate a spatial average feature and a spatial maximum feature, which are concatenated along the channel dimension and passed through a convolution to generate a spatial attention weight map; and the spatial attention weight map is multiplied position-wise with the intermediate features, after which the result is max-pooled by a pooling layer.
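The attention module of claim 7 closely resembles a CBAM-style channel-then-spatial attention block. The NumPy sketch below assumes a shared single-hidden-layer MLP with ReLU, sigmoid gates, a 7 × 7 "same"-padded convolution for the spatial map, and 2 × 2 max pooling; these sizes are assumptions, and the initial 2-D convolution is omitted (its output is taken as given). It follows the claim wording by applying the spatial map to the intermediate features; in standard CBAM the map is applied to the channel-attention output instead.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(feat, w1, w2):
    """feat: (C, H, W). Shared MLP (one hidden layer, ReLU) on avg- and max-pooled vectors."""
    def mlp(v):
        return np.maximum(v @ w1, 0.0) @ w2
    weights = sigmoid(mlp(feat.mean(axis=(1, 2))) + mlp(feat.max(axis=(1, 2))))  # (C,)
    return feat * weights[:, None, None]


def spatial_attention(feat, kernel):
    """kernel: (2, k, k), applied with 'same' padding to the [avg; max] channel-pooled maps."""
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])       # (2, H, W)
    k = kernel.shape[-1]
    p = k // 2
    padded = np.pad(pooled, ((0, 0), (p, p), (p, p)))
    height, width = feat.shape[1:]
    out = np.zeros((height, width))
    for i in range(height):
        for j in range(width):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)                                             # (H, W) weight map


def attention_block(intermediate, w1, w2, kernel):
    ch_out = channel_attention(intermediate, w1, w2)
    s_map = spatial_attention(ch_out, kernel)
    weighted = intermediate * s_map[None]       # claim 7: map applied to intermediate features
    c, h, w = weighted.shape
    h2, w2_ = h // 2, w // 2                    # 2x2 max pooling (assumed pool size)
    return weighted[:, :h2 * 2, :w2_ * 2].reshape(c, h2, 2, w2_, 2).max(axis=(2, 4))


rng = np.random.default_rng(0)
inter = rng.normal(size=(8, 16, 16))            # 8 channels over a 16x16 time-frequency grid
w1, w2 = rng.normal(size=(8, 2)), rng.normal(size=(2, 8))   # reduction ratio 4 (assumed)
kern = rng.normal(size=(2, 7, 7))
pooled_out = attention_block(inter, w1, w2, kern)
```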

8. The distribution box operation fault monitoring system based on specific perception reinforcement learning according to claim 1, characterized in that optimizing the specific perception reinforcement learning network with the PPO algorithm comprises: receiving the state vector through a TCN-LSTM hybrid coding network to generate a state feature vector; inputting the state feature vector into the value network, which evaluates it through multiple fully connected layers to generate the state value of the corresponding state; obtaining the reward function result for the corresponding state, calculating the advantage function of the current state vector from the state value and the reward, and determining the clipped objective function of the PPO algorithm from that advantage function; and optimizing the parameters of the policy network with the clipped objective function, wherein the policy network maps the state to a probability distribution over the actions in the action space, the clipped objective function optimizes that distribution, and the policy network determines and outputs the optimal decision action.
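The advantage and clipped-objective steps of claim 8 can be sketched as below. The claim does not say which advantage estimator is used, so generalized advantage estimation (GAE) over a single non-terminating trajectory is an assumption, as are `gamma`, `lam`, and the clip range `eps`. The clipped surrogate itself, mean(min(r * A, clip(r, 1 - eps, 1 + eps) * A)) with probability ratio r, is the standard PPO objective to be maximized.

```python
import numpy as np


def gae_advantages(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one trajectory (assumed estimator, no terminal states)."""
    steps = len(rewards)
    adv = np.zeros(steps)
    next_value, running = last_value, 0.0
    for t in reversed(range(steps)):
        delta = rewards[t] + gamma * next_value - values[t]   # TD error from the value network
        running = delta + gamma * lam * running
        adv[t] = running
        next_value = values[t]
    return adv


def ppo_clip_objective(new_logp, old_logp, adv, eps=0.2):
    """PPO clipped surrogate objective (to be maximized) for the chosen actions."""
    ratio = np.exp(np.asarray(new_logp) - np.asarray(old_logp))
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return float(np.mean(np.minimum(ratio * adv, clipped * adv)))


adv = gae_advantages([1.0, 1.0], [0.0, 0.0], last_value=0.0, gamma=1.0, lam=1.0)
obj = ppo_clip_objective([np.log(2.0)], [0.0], np.array([1.0]))
```

With a positive advantage, a ratio of 2 is capped at 1 + eps = 1.2, which is what keeps each policy update small.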

9. The distribution box operation fault monitoring system based on specific perception reinforcement learning according to claim 2, characterized in that the specific enhancement branch operates as follows: channel attention weighting is applied to the pure fault component, in which global average pooling and global max pooling extract the global feature of each feature channel of the pure fault component, the extracted global features are input into a multilayer perceptron for nonlinear mapping to generate channel attention weights, and the pure fault component is weighted channel by channel with these weights to generate channel fault features; global average pooling and global max pooling are then applied to the channel fault features along the channel dimension and the results are concatenated; a convolution extracts spatial attention weights from the concatenated features; and the channel fault features are weighted at each spatial position with the spatial attention weights to generate the enhanced pure fault component.

10. The distribution box operation fault monitoring system based on specific perception reinforcement learning according to claim 1, characterized in that the preprocessing that produces the standard multi-source data comprises: removing outliers from the multi-source data using the box-plot method; denoising the outlier-free multi-source data according to Gaussian filtering rules; and standardizing the denoised multi-source data according to Z-score standardization rules to eliminate differences in units and scale, thereby generating the standard multi-source data.
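The three preprocessing steps of claim 10 can be sketched per channel in NumPy. The box-plot rule is taken as the usual interquartile-range fence [Q1 - 1.5·IQR, Q3 + 1.5·IQR]. Replacing outliers by linear interpolation instead of deleting rows is an assumption, made so that all channels keep the same time axis; the Gaussian `sigma` and the synthetic three-channel data (standing in for, e.g., current, voltage, and temperature) are also illustrative.

```python
import numpy as np


def remove_outliers_iqr(x, k=1.5):
    """Box-plot rule per channel; outliers are replaced by linear interpolation (assumed policy)."""
    x = np.asarray(x, dtype=float).copy()
    q1, q3 = np.percentile(x, [25, 75], axis=0)
    iqr = q3 - q1
    mask = (x < q1 - k * iqr) | (x > q3 + k * iqr)
    t = np.arange(len(x))
    for c in range(x.shape[1]):
        bad = mask[:, c]
        if bad.any() and (~bad).any():
            x[bad, c] = np.interp(t[bad], t[~bad], x[~bad, c])
    return x


def gaussian_filter_1d(x, sigma=1.0, radius=None):
    """Gaussian smoothing along time with edge padding, so the output length equals the input."""
    if radius is None:
        radius = int(3 * sigma)
    taps = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (taps / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(x, ((radius, radius), (0, 0)), mode="edge")
    return np.stack([np.convolve(padded[:, c], kernel, mode="valid")
                     for c in range(x.shape[1])], axis=1)


def zscore(x, eps=1e-8):
    """Per-channel Z-score standardization."""
    return (x - x.mean(axis=0)) / (x.std(axis=0) + eps)


rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3))      # 200 periods x 3 hypothetical sensor channels
data[50, 0] = 100.0                   # injected spike
no_outliers = remove_outliers_iqr(data)
standard = zscore(gaussian_filter_1d(no_outliers, sigma=1.5))
```

Running the steps in the claimed order matters: removing the spike before smoothing prevents the Gaussian filter from spreading it into neighboring samples.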