Method for identifying power distribution network operation risks based on optimized surrogate neural architecture search
An optimized surrogate neural architecture search addresses the reliance on manual design and the high computational cost of deep learning models for distribution network risk identification. It generates a neural network architecture suited to edge devices and enables real-time risk identification and safety management at distribution network operation sites.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- XIAN UNIV OF TECH
- Filing Date
- 2026-05-22
- Publication Date
- 2026-06-19
[Figure: abstract drawing, CN122244641A_ABST]
Abstract
Description
Technical Field
[0001] This application relates to the field of power visual inspection technology, and in particular to a method for identifying distribution network operation risks based on optimized surrogate neural architecture search.
Background Technology
[0002] In recent years, deep learning models have demonstrated good performance in power distribution network risk identification tasks, indicating that deep learning can serve as a reliable, efficient, and fast risk identification method. However, the structure of deep learning models often relies on manual design, requiring extensive manual debugging of the network structure for different application scenarios, making it difficult to obtain the optimal model structure, and the model design process is inefficient. To address the problem of deep neural network structure design relying on human experience, researchers have proposed Neural Architecture Search (NAS), which automatically searches for the optimal neural network structure in a predefined search space, thereby achieving automatic design of neural network structures.
[0003] Traditional neural architecture search methods typically require repeated training and evaluation of a large number of candidate network structures, resulting in extremely high computational costs. To reduce this cost, predictor-based methods train predictive models to estimate the performance of candidate neural networks, while supernet-based methods construct supernets that turn the architecture search problem into a differentiable optimization problem, so that gradient descent can be used for the search. However, predictor-based methods suffer from unstable predictive capability and biased evaluation results, and supernet-based methods struggle to directly incorporate hardware-related constraints such as inference latency and power consumption.
Summary of the Invention
[0004] This application provides a method for identifying distribution network operation risks based on optimized surrogate neural architecture search. By introducing an optimized surrogate model, the neural architecture search problem is transformed into an optimization problem that can be solved based on gradients, providing an implementation approach for deploying distribution network risk identification models suitable for edge devices.
[0005] To achieve the above objectives, the technical solution of this invention is as follows. This invention provides a method for identifying distribution network operation risks based on optimized surrogate neural architecture search, including:
Collecting image data of power distribution network operation sites and manually annotating the image data to construct a distribution network operation risk identification dataset;
Constructing a neural architecture search space, which is represented by a directed acyclic graph and consists of multiple network nodes and the connection relationships between the nodes, where different network nodes transfer feature information through candidate operations;
Applying probability distribution relaxation to the neural architecture search space, mapping the discrete architecture variables to a continuous, optimizable surrogate space. Specifically, the candidate operation selection variable between network nodes is represented as a categorical probability distribution, an operation selection probability vector describes the probability of each candidate operation being selected, and the probability vector satisfies a normalization constraint. The topological connection relationship between network nodes is represented as a Bernoulli distribution, and a topological connection probability parameter represents the probability that the corresponding connection between nodes exists;
Constructing a surrogate sampling function that satisfies differentiability and unbiasedness, making the sampling process of the discrete architecture continuously differentiable by means of reparameterization, and iteratively updating the operation selection probability vector and the topological connection probability parameters by gradient descent to complete the end-to-end search optimization of the neural network architecture;
Introducing hardware performance constraints into the end-to-end search process and constructing a multi-objective optimization function that includes recognition accuracy, model inference latency, and model parameter count. The multi-objective optimization function serves as the objective function of the search; candidate network architectures are comprehensively evaluated, and the optimal neural network architecture suitable for deployment on edge computing devices is selected;
Training the optimal neural network architecture on the distribution network operation risk identification dataset, and deploying the trained risk identification model on an edge computing device to perform real-time risk identification based on images of distribution network operation sites.
[0006] In some possible implementations, the labeled content of the distribution network operation risk identification dataset includes at least safe operation behavior, dangerous operation behavior, and illegal operation behavior; the image data of the distribution network operation site includes at least the operation behavior information of the operators and the boundary information of the operation area.
[0007] In some possible implementations, candidate operations include convolution operations, depthwise separable convolution operations, pooling operations, skip connection operations, and no operations.
[0008] In some possible implementations, the normalization constraint for the probability vector is that the sum of the selection probabilities of all candidate operations corresponding to the same network node is 1.
[0009] In some possible implementations, the surrogate sampling function includes a surrogate sampling function for the operation feature variables corresponding to the candidate operation selection, and a surrogate sampling function for the topological structure variables corresponding to the node connection relationship. Both sets of sampling functions achieve continuous differentiable transformation of the discrete sampling process through reparameterization technology.
[0010] In some possible implementations, the reparameterization process of the surrogate sampling function for the operation feature variables includes: for each candidate operation corresponding to each network node, an independent and identically distributed standard Gumbel random noise variable is introduced, which is independent of the operation selection probability vector to be optimized; based on the log probability values of the operation selection probability vector, the standard Gumbel random noise variables, and a preset temperature parameter, a continuously differentiable surrogate sampling mapping is constructed through the Gumbel-softmax distribution, so that the discrete sampling process becomes a continuously differentiable function of the operation selection probability vector, and the surrogate sample values of the operation feature variables are calculated. The temperature parameter has a range of values of [value missing], and the smoothness of the sampling distribution is controlled by adjusting it. When the temperature parameter approaches 0, the probability distribution of the surrogate sample values is consistent with that of sampling in the discrete space according to the softmax distribution defined by the operation selection probability vector, which ensures the unbiasedness of the sampling process.
[0011] In some possible implementations, the temperature parameter adopts a linear decay strategy during the architecture search iteration process, with the initial value set to 1 and gradually decreasing to close to 0 as the number of iterations increases.
[0012] In some possible implementations, the reparameterization process of the surrogate sampling function for the topological structure variables includes: for each pair of network nodes that may be connected, independent and identically distributed standard Gumbel random noise variables are introduced, which are independent of the topological connection probability parameters to be optimized; based on the log-odds values corresponding to the topological connection probability parameters and the standard Gumbel random noise variables, a continuously differentiable surrogate sampling mapping is constructed using the Sigmoid function, so that the discrete Bernoulli sampling process becomes a continuously differentiable function of the topological connection probability parameters, and the surrogate sample values of the topological structure variables are calculated. When the topological connection probability parameters have converged, the node connection decision corresponding to the surrogate sample value is consistent with the sampling result of the preset discrete Bernoulli distribution, ensuring the unbiasedness of the topological connection sampling process.
[0013] In some possible implementations, the iterative update process based on gradient descent employs a two-layer alternating optimization strategy, including: Within a preset iteration interval, the operation selection probability vector and topology connection probability parameters are first fixed, and the weight parameters of the neural network are optimized; then the weight parameters of the neural network are fixed, and the operation selection probability vector and topology connection probability parameters are updated through the gradient returned by the surrogate sampling function, until the preset iteration rounds are completed.
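The alternating schedule described in this paragraph can be sketched on a toy problem. The sketch below is only an illustration under stated assumptions: the scalar `w` stands in for the network weights, the scalar `alpha` stands in for the architecture parameters (operation selection probabilities and topological connection probabilities), and `loss`, `grad_w`, `grad_alpha`, the step counts, and the learning rates are hypothetical. Analytic gradients replace the gradients that the method obtains through the surrogate sampling function.

```python
def loss(w, alpha):
    # Toy objective with its minimum at w = 3, alpha = 2 (not the patent's objective).
    return (w - 3.0) ** 2 + (alpha * w - 6.0) ** 2

def grad_w(w, alpha):
    # Partial derivative of the toy loss with respect to the weight stand-in.
    return 2.0 * (w - 3.0) + 2.0 * alpha * (alpha * w - 6.0)

def grad_alpha(w, alpha):
    # Partial derivative of the toy loss with respect to the architecture stand-in.
    return 2.0 * w * (alpha * w - 6.0)

def alternating_search(rounds=300, weight_steps=5, lr_w=0.02, lr_alpha=0.02):
    w, alpha = 0.0, 0.0
    history = []
    for _ in range(rounds):
        # Phase 1: architecture parameters fixed, optimize the weights.
        for _ in range(weight_steps):
            w -= lr_w * grad_w(w, alpha)
        # Phase 2: weights fixed, update the architecture parameters.
        alpha -= lr_alpha * grad_alpha(w, alpha)
        history.append(loss(w, alpha))
    return w, alpha, history

w, alpha, history = alternating_search()
print(round(w, 3), round(alpha, 3))
```

In a real search, phase 1 would run ordinary training steps on the training set and phase 2 would back-propagate through the relaxed samples; only the ordering of the two phases is taken from the text.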
[0014] In some possible implementations, the multi-objective optimization function takes maximizing the accuracy of distribution network operation risk identification as the optimization objective, and uses model inference delay and the number of model parameters not exceeding the preset threshold of the target edge computing device as hard constraints to construct a constrained multi-objective optimization objective function; during the architecture search iteration process, when a candidate network architecture meets the hard constraints of inference delay and the number of parameters, it is included in the accuracy optimization ranking range.
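The hard-constraint step described here amounts to filtering candidates by the device thresholds before ranking by accuracy. A minimal sketch follows, assuming a hypothetical `Candidate` record and made-up accuracy, latency, and parameter figures that serve only as illustrative inputs; none of these names or numbers come from the application.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    name: str
    accuracy: float    # validation accuracy in [0, 1]
    latency_ms: float  # measured or estimated inference latency
    params_m: float    # parameter count in millions

def rank_feasible(candidates, max_latency_ms, max_params_m):
    """Keep candidates that meet both hard constraints, then rank them by accuracy."""
    feasible = [
        c for c in candidates
        if c.latency_ms <= max_latency_ms and c.params_m <= max_params_m
    ]
    return sorted(feasible, key=lambda c: c.accuracy, reverse=True)

# Illustrative inputs only; the numbers are invented for the example.
candidates = [
    Candidate("arch_a", 0.95, 48.0, 6.2),
    Candidate("arch_b", 0.93, 21.0, 2.9),
    Candidate("arch_c", 0.90, 15.0, 1.8),
]
ranking = rank_feasible(candidates, max_latency_ms=30.0, max_params_m=4.0)
print([c.name for c in ranking])  # ['arch_b', 'arch_c']
```

Here `arch_a` has the highest accuracy but violates both thresholds, so it never enters the accuracy ranking.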
[0015] One or more technical solutions provided in the embodiments of the present invention have at least the following technical effects or advantages. First, the discrete architecture search space is mapped to a continuous surrogate space through probability distribution relaxation. Combined with a differentiable surrogate sampling function constructed by reparameterization, this addresses the core problems of traditional neural architecture search: discrete sampling is non-differentiable, so end-to-end gradient optimization is not possible. It removes the need to repeatedly train and evaluate a massive number of candidate architectures, reduces the computational cost of architecture search, and frees network structure design from heavy reliance on human experience, improving the efficiency of model design for distribution network risk identification tasks. Second, hardware performance constraints are introduced during the architecture search phase, and a multi-objective optimization function covering recognition accuracy, inference latency, and parameter count is constructed. Unlike existing approaches that design the model first and lightweight it afterwards, this achieves end-to-end joint optimization of recognition accuracy and edge hardware adaptability. The generated model is suited to deployment on edge computing devices, addressing the limited computing power and insufficient real-time performance at the edge of the distribution network. Third, the entire process is tailored to distribution network operation risk identification. Model search and training are carried out on a dedicated labeled dataset, enabling identification of the various operational risk behaviors that occur on site. Real-time edge detection is achieved while recognition accuracy is maintained, improving the level of intelligence of distribution network operation safety management.
Description of the Drawings
[0016] To more clearly illustrate the embodiments of the present invention, the accompanying drawings used in the embodiments of the present invention will be briefly introduced below. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0017] Figure 1 is a schematic flowchart of an embodiment of the method for identifying distribution network operation risks based on optimized surrogate neural architecture search provided by an embodiment of the present invention; Figure 2 is a schematic diagram of the gradient calculation process during the search in an embodiment of the present invention.
Detailed Implementation
[0018] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by those skilled in the art based on the embodiments of this application without creative effort are within the scope of protection of this application.
[0019] In the relevant descriptions of this embodiment, the terms "including," "containing," and "possessing" are all open terms and are generally understood to include but not be limited to; the term "at least one" is generally understood to mean one or more, where "multiple" refers to two or more; the term "at least one of the following" or similar expressions refer to any combination of these items, including any combination of single or multiple items, for example, "at least one of a, b, or c", or "at least one of a, b, and c", which can all mean: a, b, c, ab (i.e., a and b), ac, bc, or abc, where a, b, and c can be single or multiple; the symbol "A / B" is used to describe the selection relationship of associated objects, generally indicating an "or" relationship.
[0020] In the following description of the embodiments, the terminology used in the embodiments of this application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. The singular forms "a" and "the" as used in the embodiments of this application and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise.
[0021] Those skilled in the art should understand that, in the following description of the embodiments of this application, the sequence of numbers does not imply the order of execution. Some or all steps may be executed in parallel or sequentially. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application.
[0022] Those skilled in the art will understand that the numerical ranges in the embodiments of this application should be understood to specifically disclose each intermediate value between the upper and lower limits of the range. Any stated value or intermediate value within a stated range, as well as any other stated value or each smaller range between intermediate values within a range, are also included within this invention. The upper and lower limits of these smaller ranges may be independently included or excluded from the range.
[0023] Unless otherwise stated, the technical / scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application pertains. While this application describes only preferred methods and materials, any methods and materials similar or equivalent to those described herein may be used in the implementation or testing of this application. All references to this specification are incorporated by way of citation to disclose and describe the methods and / or materials associated with those references. In the event of any conflict with any incorporated reference, the content of this specification shall prevail.
[0024] To illustrate the technical solution of the present invention, specific embodiments are described below.
[0025] Figure 1 is a schematic flowchart of an embodiment of the distribution network operation risk identification method based on optimized surrogate neural architecture search provided by the present invention. As shown in Figure 1, the method may include:
S101. Collect image data from the power distribution network operation site, manually annotate the image data, and construct a power distribution network operation risk identification dataset.
In some embodiments, the image data can cover typical operational scenarios such as distribution network inspection, maintenance, and live-line or outage repair. Sources can include edge-deployable acquisition devices such as on-site monitoring devices, inspection robots, and individual worker terminals, and the samples cover different lighting conditions, weather, and shooting angles. The core content includes at least worker actions, work area boundaries, the wearing of safety protective equipment, and core distribution network equipment. Dataset annotation follows power industry safety operation standards, and the core annotation content includes at least three categories: safe operation behaviors, hazardous operation behaviors, and violations. Additional annotations can be added for violations such as improper wearing of safety protective equipment and crossing work area boundaries, achieving comprehensive risk coverage for distribution network operations.
[0026] After annotation, the dataset can be further preprocessed with standardization steps such as resolution normalization and noise reduction, and data augmentation can be used to mitigate the imbalance of risk samples. The dataset is divided into training, validation, and test sets according to a preset ratio, to support model training, performance evaluation during the neural architecture search stage, and quantitative testing of the final model.
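The split into training, validation, and test sets can be sketched as follows. This is a minimal illustration, not the application's procedure: the 70/15/15 ratio, the fixed seed, and the function name `split_dataset` are assumptions, since the text only specifies "a preset ratio". The split is a plain random one and does not stratify by risk category.

```python
import random

def split_dataset(samples, ratios=(0.7, 0.15, 0.15), seed=0):
    """Shuffle samples and split them into training, validation, and test subsets."""
    if len(ratios) != 3 or abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must be three values that sum to 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * ratios[0])
    n_val = round(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test

# Integers stand in for annotated image records.
train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```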
[0027] S102. Construct the neural architecture search space. The search space is represented by a directed acyclic graph, which consists of multiple network nodes and the connection relationships between nodes. Different network nodes transfer feature information through candidate operations. In the directed acyclic graph, each network node corresponds to a feature map of the power distribution network operation image. From shallow to deep, the node hierarchy corresponds to the extraction stages of basic visual features, operation behavior features, and high-level semantic features of violation scenarios. The connection relationships between nodes and the candidate operations together define the complete range of searchable network structures.
[0028] In some embodiments, candidate operations may include convolution operations, depthwise separable convolution operations, pooling operations, skip connection operations, and no-operation operations. Convolution operations are used for core visual feature extraction; depthwise separable convolution operations balance feature extraction capabilities with model lightweighting; pooling operations are used for feature dimensionality reduction to compress computation; skip connection operations alleviate the gradient vanishing problem in deep networks; and no-operation operations are used to flexibly adjust network depth and connection paths. All types of operations can simultaneously adapt to the accuracy requirements of risk identification and the computational constraints of edge deployment. In this embodiment of the invention, through modular search space design, the network architecture can be decomposed into two types of parameterizable discrete variables: candidate operation selection and node topology connections. This provides standardized support for the probability distribution relaxation of the discrete search space and the construction of differentiable proxy sampling functions in subsequent steps.
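To make the two kinds of discrete variables concrete, the sketch below describes a small search space as plain Python data and counts how many discrete architectures it contains. The string labels for the operations, the dictionary layout, and the helper names are illustrative assumptions. The count assumes one operation choice per node and an on/off choice per forward node pair, following the parameterization used in the later steps.

```python
from itertools import combinations

# Labels for the five candidate operation types listed above (names are illustrative).
CANDIDATE_OPS = ("conv", "depthwise_separable_conv", "pooling", "skip_connection", "none")

def build_search_space(num_nodes, ops=CANDIDATE_OPS):
    """Describe a DAG search space: nodes, possible forward edges, candidate operations."""
    nodes = list(range(num_nodes))
    # Only edges (h, k) with h < k are allowed, which keeps the graph acyclic.
    edges = list(combinations(nodes, 2))
    return {"nodes": nodes, "edges": edges, "ops": tuple(ops)}

def count_architectures(space):
    """One operation per node, and each possible edge either present or absent."""
    return len(space["ops"]) ** len(space["nodes"]) * 2 ** len(space["edges"])

space = build_search_space(4)
print(len(space["edges"]), count_architectures(space))  # 6 40000
```

Even four nodes give 40000 discrete architectures, which illustrates why exhaustive training and evaluation of candidates is costly.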
[0029] S103. Apply probability distribution relaxation to the neural architecture search space, mapping the discrete architecture variables to a continuous, optimizable surrogate space. Specifically, the candidate operation selection variables between network nodes are represented as categorical probability distributions, and operation selection probability vectors describe the probability of each candidate operation being selected, with the probability vectors satisfying a normalization constraint; the topological connection relationships between network nodes are represented as Bernoulli distributions, and topological connection probability parameters represent the probability that the corresponding connection between nodes exists. In some embodiments, the normalization constraint is that the selection probabilities of all candidate operations corresponding to the same network node sum to 1.
[0030] Specifically, traditional neural architecture search (NAS) can be formulated as a two-level optimization problem in a discrete space:
$$a^* = \arg\max_{a \in \mathcal{S}} \mathrm{Acc}\big(a, w^*(a); D_{\text{val}}\big), \qquad w^*(a) = \arg\min_{w} \mathcal{L}_{\text{train}}\big(a, w; D_{\text{train}}\big);$$
where $a$ denotes an architecture, $\mathcal{S}$ is the constructed search space, $\mathcal{L}_{\text{train}}$ is the loss function for training the architecture weights, $w$ denotes the trainable weight parameters of architecture $a$, and $w^*(a)$ denotes the optimal weight parameters obtained after architecture $a$ has been fully trained on the training set $D_{\text{train}}$. The lower-level optimization is the training of a candidate architecture on the training dataset, and the goal of the upper-level optimization is to find the neural architecture $a^*$ that maximizes the evaluation accuracy on the validation dataset $D_{\text{val}}$. Denote the mapping from an architecture to its evaluation accuracy by $f$. A model $\hat{f}_\theta$ (with parameters $\theta$) can be used to approximate the distribution characteristics of $f$, and the neural architecture search problem can be approximated as:
$$a^* \approx \arg\max_{a \in \mathcal{S}} \hat{f}_{\theta^*}(a);$$
where $\theta^*$ denotes the optimal parameters for fitting $f$. This invention considers only differentiable models, and because the combinatorial space of neural architecture search is extremely large, the ideal model parameters $\theta^*$ are difficult to obtain. The problem is therefore approximated by empirical risk minimization, which can be formalized as:
$$a^* = \arg\max_{a \in \mathcal{S}} \hat{f}_{\theta^*}(a);$$
$$\theta^* = \arg\min_{\theta} \frac{1}{B} \sum_{i=1}^{B} \mathcal{L}_{\text{fit}}\big(\hat{f}_\theta(a_i), f(a_i)\big);$$
where $\mathcal{L}_{\text{fit}}$ is the fitting loss function of the surrogate model, $D = \{a_i\}_{i=1}^{B}$ is the dataset sampled from the search space $\mathcal{S}$, $a_i$ is the $i$-th candidate architecture, and $B$ is the total number of architectures contained in the current dataset $D$. That is, the upper-level optimization maximizes $\hat{f}_{\theta^*}$ over the architectures $a$, and the lower-level optimization fits the model $\hat{f}_\theta$ on the dataset $D$.
[0031] Furthermore, in the neural architecture search space, a network architecture can be parameterized by its operation features (candidate operations) $o$ and its topology $t$. Even though the surrogate model can fit the distribution characteristics in continuous form, $o$ and $t$ are still discrete variables, so the entire search space remains discrete. Therefore, let the operation features $o$ follow a categorical distribution with probability vector $P \in \mathbb{R}^{N \times M}$ ($N$ is the number of feature nodes and $M$ is the number of candidate operations). For node $i$:
$$o_i \sim \mathrm{Categorical}(P_i), \qquad P_i = (P_{i,0}, P_{i,1}, \dots, P_{i,M-1});$$
where $P_i$ represents the sampling probabilities of the $M$ candidate operations and $P_{i,j}$ is the probability of selecting the $j$-th candidate operation, $0 \le j \le M-1$, satisfying $P_{i,j} \ge 0$ and $\sum_{j=0}^{M-1} P_{i,j} = 1$.
[0032] Let the topology $t$ follow a Bernoulli distribution with parameter $\beta$. For a specific connection:
$$t_{h,k} \sim \mathrm{Bernoulli}(\beta_{h,k});$$
where $\beta_{h,k}$ represents the probability that a connection exists between node $h$ and node $k$, with values in $[0, 1]$.
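Drawing one discrete architecture from these two distributions can be sketched with the standard library alone. This is a hedged illustration: the function name `sample_architecture`, the list-of-lists layout for the operation probabilities, and the dictionary keyed by node pairs are assumptions, and the probability values are invented.

```python
import random

def sample_architecture(op_probs, edge_probs, rng):
    """Draw one discrete architecture (operations, connections) from the relaxed parameters."""
    # One categorical draw per node: index of the selected candidate operation.
    ops = [rng.choices(range(len(p)), weights=p, k=1)[0] for p in op_probs]
    # One Bernoulli draw per node pair: 1 if the connection exists, otherwise 0.
    edges = {pair: int(rng.random() < beta) for pair, beta in edge_probs.items()}
    return ops, edges

# Illustrative parameters: N = 2 nodes, M = 3 candidate operations, one possible edge.
op_probs = [[0.2, 0.5, 0.3], [0.6, 0.3, 0.1]]
edge_probs = {(0, 1): 0.8}
ops, edges = sample_architecture(op_probs, edge_probs, random.Random(0))
print(ops, edges)
```

These draws are discrete, so no gradient flows from the sampled architecture back to the probabilities; that is the gap the reparameterized surrogate sampling in S104 is meant to close.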
[0033] Let $o$ and $t$ follow the probability distributions $p(o; P)$ and $p(t; \beta)$ respectively, which relaxes the architecture variables. Taking the continuous space spanned by $P$ and $\beta$ as the surrogate space, the optimization problem can be reformulated as:
$$(P^*, \beta^*) = \arg\max_{P, \beta} \hat{f}_{\theta^*}(b);$$
$$b = s(P, \beta);$$
$$\theta^* = \arg\min_{\theta} \frac{1}{B} \sum_{i=1}^{B} \mathcal{L}_{\text{fit}}\big(\hat{f}_\theta(a_i), f(a_i)\big);$$
$$a_i = s(P, \beta), \quad i = 1, \dots, B;$$
where $s$ is the architecture sampling function and $b$ is a candidate neural network architecture sampled from the probability distributions of the continuous surrogate space. The goal of the lower-level optimization is to solve for the model parameters $\theta^*$, which depend on the dataset $D$ collected by the sampling function $s$ from the distributions $p(o; P)$ and $p(t; \beta)$. The upper-level optimization solves for $P^*$ and $\beta^*$. Although this formulation still has the form of a two-level optimization, both levels are based on the same model $\hat{f}_\theta$ and differ only in what is optimized: the upper level optimizes the input variables and the lower level optimizes the weights. The problem can therefore be solved end-to-end based on gradients.
[0034] S104. Construct a surrogate sampling function that satisfies differentiability and unbiasedness. Use reparameterization to make the sampling process of the discrete architecture continuously differentiable. Iteratively update the operation selection probability vector and the topological connection probability parameters by gradient descent to complete the end-to-end search optimization of the neural network architecture. In some embodiments, the surrogate sampling function includes a surrogate sampling function for the operation feature variables corresponding to candidate operation selection, and a surrogate sampling function for the topological structure variables corresponding to the node connection relationships. Both sampling functions turn the discrete sampling process into a continuously differentiable transformation through reparameterization. Figure 2 is a schematic diagram of the gradient calculation process during the search in an embodiment of the present invention.
[0035] The reparameterization process of the surrogate sampling function for the operation feature variables includes: for each candidate operation corresponding to each network node, an independent and identically distributed standard Gumbel(0, 1) random noise variable is introduced, which is independent of the operation selection probability vector $P$ to be optimized; based on the log probability values of the operation selection probability vector $P$, the standard Gumbel(0, 1) random noise variables, and a preset temperature parameter $\tau$, a continuously differentiable surrogate sampling mapping is constructed through the Gumbel-softmax distribution, so that the discrete sampling process becomes a continuously differentiable function of the operation selection probability vector $P$, and the surrogate sample values of the operation feature variables are calculated. The temperature parameter $\tau$ has a range of values of [value missing], and the smoothness of the sampling distribution is controlled by adjusting $\tau$. When $\tau$ approaches 0, the probability distribution of the surrogate sample values is consistent with that of sampling in the discrete space according to the softmax distribution defined by $P$, ensuring the unbiasedness of the sampling process.
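[0036] Specifically, as shown in step S103 above, the objective function maximized by the neural architecture search is $\hat{f}_{\theta^*}\big(s(P, \beta)\big)$. The gradient of the objective function with respect to $P$ and $\beta$ is:
$$\nabla_{(P, \beta)} \hat{f}_{\theta^*}\big(s(P, \beta)\big) = \frac{\partial \hat{f}_{\theta^*}(b)}{\partial b} \cdot \frac{\partial s(P, \beta)}{\partial (P, \beta)};$$
where $\partial \hat{f}_{\theta^*}(b) / \partial b$ is the gradient of the output of $\hat{f}_{\theta^*}$ with respect to its input features, which is determined by the parameters $\theta^*$.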
[0037] Furthermore, since the sampling function $s$ is defined on a discrete space, the above gradient cannot be calculated directly. This invention therefore constructs a surrogate sampling function that is differentiable and that approximates the discrete sampling function $s$ as an unbiased relaxation (i.e., it satisfies unbiasedness). Specifically, the surrogate sampling function is implemented with a reparameterization technique. The surrogate sample value $\tilde{o}$ of the operation features is sampled from the Gumbel-softmax distribution. For node $i$:
$$\tilde{o}_{i,j} = \frac{\exp\big((\log P_{i,j} + g_{i,j}) / \tau\big)}{\sum_{m=0}^{M-1} \exp\big((\log P_{i,m} + g_{i,m}) / \tau\big)};$$
where $g_{i,j}$ and $g_{i,m}$ are independent and identically distributed Gumbel(0, 1) samples; the temperature parameter $\tau$ has a range of values of [value missing]; $\log P_{i,j}$ is the unnormalized log probability value of candidate operation $j$ at node $i$; $\log P_{i,m}$ is the unnormalized log probability value of candidate operation $m$ at node $i$, $0 \le m \le M-1$; and $g_{i,m}$ is the independent and identically distributed Gumbel(0, 1) sample for the $m$-th candidate operation of the $i$-th node.
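For readers who want to see the relaxation numerically, here is a minimal pure-Python sketch of a Gumbel-softmax sample for a single node. It is an illustration only: the names `sample_gumbel` and `gumbel_softmax`, the clipping constant, and the example probabilities are assumptions, and a practical search would use an automatic differentiation framework so that gradients can flow to the probabilities.

```python
import math
import random

def sample_gumbel(rng):
    """Draw one standard Gumbel(0, 1) sample by inverse transform sampling."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # clip to avoid log(0)
    return -math.log(-math.log(u))

def gumbel_softmax(probs, tau, rng):
    """Relaxed one-hot sample for one node; probs must be strictly positive."""
    logits = [(math.log(p) + sample_gumbel(rng)) / tau for p in probs]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = [0.5, 0.3, 0.2]  # illustrative selection probabilities for M = 3 operations
soft = gumbel_softmax(probs, tau=1.0, rng=random.Random(0))
sharp = gumbel_softmax(probs, tau=0.1, rng=random.Random(0))
print(soft, sharp)
```

Both calls reuse the same seed and therefore the same noise, so the only difference is the temperature: the lower temperature concentrates more mass on the largest entry, moving the sample toward a one-hot vector.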
[0038] Furthermore, the operation feature variable $\tilde{o}$ satisfies unbiasedness:
$$\lim_{\tau \to 0} \Pr\big(\tilde{o}_{i,j} = 1\big) = \frac{\exp(\log P_{i,j})}{\sum_{m=0}^{M-1} \exp(\log P_{i,m})};$$
where the left-hand side is the probability that the surrogate sample value $\tilde{o}_{i,j}$ equals 1 as the temperature coefficient $\tau$ approaches 0. This indicates that, as $\tau \to 0$ in the continuous space, the probability that the model selects operation $j$ at node $i$ equals the probability of sampling directly from the discrete space according to the softmax distribution.
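The limiting behaviour can be checked empirically. As the temperature goes to 0, a Gumbel-softmax sample collapses onto the index with the largest perturbed log probability, so the sketch below counts how often each index wins that argmax and compares the frequencies with the selection probabilities. The function names, sample count, seed, and probabilities are illustrative assumptions.

```python
import math
import random

def sample_gumbel(rng):
    """Draw one standard Gumbel(0, 1) sample by inverse transform sampling."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # clip to avoid log(0)
    return -math.log(-math.log(u))

def gumbel_argmax_frequencies(probs, num_samples, rng):
    """Empirical frequency with which each operation wins the perturbed argmax."""
    counts = [0] * len(probs)
    for _ in range(num_samples):
        scores = [math.log(p) + sample_gumbel(rng) for p in probs]
        counts[scores.index(max(scores))] += 1
    return [c / num_samples for c in counts]

probs = [0.5, 0.3, 0.2]
freqs = gumbel_argmax_frequencies(probs, 20000, random.Random(0))
print(freqs)  # each entry should lie close to the corresponding probability
```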
[0039] In some embodiments, the temperature parameter $\tau$ follows a linear decay strategy during the architecture search iterations: its initial value is set to 1, and it is gradually reduced to close to 0 as the number of iterations increases.
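A linear decay schedule of this kind can be written in a few lines. In this sketch the function name and the floor value `tau_end = 1e-3` are assumptions: the text only says the temperature decreases to "close to 0", and a small positive floor avoids dividing by zero in the Gumbel-softmax mapping.

```python
def linear_temperature(step, total_steps, tau_start=1.0, tau_end=1e-3):
    """Temperature at a given iteration, decaying linearly from tau_start to tau_end."""
    if total_steps < 2:
        return tau_end
    # Fraction of the schedule completed, clamped to [0, 1].
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return tau_start + frac * (tau_end - tau_start)

schedule = [linear_temperature(t, 5) for t in range(5)]
print(schedule)
```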
[0040] It will be appreciated that this dynamic control strategy balances exploration and convergence during the architecture search while preserving both the differentiability of the surrogate sampling function and the unbiasedness of the sampling results. In the early iterations, the higher temperature makes the Gumbel-softmax sampling distribution smoother, so that the search can explore the preset search space broadly and avoid being trapped in a local optimum too early. As the iterations progress, the temperature decays linearly and the sampling distribution gradually sharpens, so that the surrogate sample values approach discrete (e.g., one-hot) samples, ensuring the unbiasedness of the architecture parameter optimization. When the temperature approaches 0 at the end of the iterations, the optimization result in the continuous surrogate space aligns with a network architecture in the original discrete search space, so that the model obtained by the search meets both the accuracy requirements of distribution network operation risk identification and the deployment constraints of edge devices.
[0041] In some embodiments, the reparameterization process of the proxy sampling function for the topology variables includes: for each pair of network nodes that may be connected, introducing independent and identically distributed standard Gumbel(0, 1) random noise variables, which are independent of the topology connection probability parameter to be optimized; and, based on the log-odds value corresponding to the topology connection probability parameter and the standard Gumbel(0, 1) random noise variables, constructing a continuously differentiable proxy sampling map through the Sigmoid function, thereby transforming the discrete Bernoulli sampling process into a continuously differentiable function of the topology connection probability parameter and calculating the proxy sample value of the topology variable. When the topology connection probability parameter has converged, the node connection decision corresponding to the proxy sample value is consistent with the sampling result of the preset discrete Bernoulli distribution, which ensures the unbiasedness of the topology connection sampling process.
[0042] Specifically, the proxy sample value of the topology variable is obtained by applying the Sigmoid function, which maps its input to the interval (0, 1), to the log-odds value of the implicit connection probability variable perturbed by two independent and identically distributed Gumbel(0, 1) samples. In addition, the topology variable satisfies unbiasedness: when the continuous parameter is optimized to its limit, the model's decision on whether nodes h and k are connected is consistent with the discrete Bernoulli (0-1) distribution originally specified in this invention.
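The relaxed Bernoulli sampling of a connection can be illustrated with the sketch below. It assumes the commonly used form sigmoid((logit(p) + g1 - g2) / tau), where g1 and g2 are independent Gumbel(0, 1) samples; the use of a temperature tau in the topology branch, the difference of the two noise terms, and all function names are assumptions made for this illustration rather than details confirmed by the description.

```python
import math
import random


def sample_gumbel(rng):
    """Draw one standard Gumbel(0, 1) value by inverse transform sampling."""
    u = max(rng.random(), 1e-12)  # clamp away from 0 so log(u) stays finite
    return -math.log(-math.log(u))


def sigmoid(x):
    """Numerically stable logistic function mapping the reals to (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def relaxed_edge_sample(p_connect, tau, rng):
    """Continuous proxy for a Bernoulli(p_connect) draw on one candidate edge."""
    log_odds = math.log(p_connect) - math.log(1.0 - p_connect)
    noise = sample_gumbel(rng) - sample_gumbel(rng)  # assumed combination of the two samples
    return sigmoid((log_odds + noise) / tau)


def connection_frequency(p_connect, tau, n_draws, seed=0):
    """Fraction of proxy samples that decide 'connected' (value above 0.5)."""
    rng = random.Random(seed)
    hits = sum(relaxed_edge_sample(p_connect, tau, rng) > 0.5 for _ in range(n_draws))
    return hits / n_draws


# Assumed connection probability between two nodes h and k.
freq = connection_frequency(0.7, tau=0.1, n_draws=20000)
```

Because the difference of two independent Gumbel(0, 1) variables follows a standard logistic distribution, thresholding the proxy value at 0.5 yields a connection with probability equal to the connection probability parameter, which matches the consistency with the Bernoulli distribution stated above.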
[0043] Based on this, when , the surrogate gradient of the objective function with respect to ( , ) can be approximated by the following formula: .
[0044] In some embodiments, the iterative update process based on gradient descent employs a two-layer alternating optimization strategy: within each preset iteration interval, the operation selection probability vector and the topology connection probability parameter are first fixed and the weight parameters of the neural network are optimized; the weight parameters are then fixed, and the operation selection probability vector and the topology connection probability parameter are updated using the gradient returned through the proxy sampling function. This alternation continues until the preset number of iterations is completed.
[0045] Specifically, given a proxy model, in one search the proxy model takes the output of the proxy sampling function as its input, performs forward propagation, and uses the model output directly as the maximization term for gradient calculation. Gradient updates are then performed alternately within the predefined iteration intervals. Finally, after the specified number of optimization rounds, a batch of high-quality architectures sampled from the currently optimized architecture parameters is returned as the output of the search algorithm.
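The alternating update pattern can be illustrated on a deliberately small toy problem. Everything in the sketch below is an assumption made for illustration: the two candidate operations (a linear and a quadratic map), the toy data, the learning rate, and the use of a noise-free softmax over the architecture parameters in place of the Gumbel-based proxy sample so that the example is deterministic. The same data is used for both phases because the description does not specify a data split for the two update steps.

```python
import math

XS = [-1.0, -0.5, 0.0, 0.5, 1.0]
YS = [2.0 * x for x in XS]  # toy target that only the linear operation can fit


def softmax(alpha):
    top = max(alpha)
    exps = [math.exp(a - top) for a in alpha]
    total = sum(exps)
    return [e / total for e in exps]


def loss_and_grads(w, alpha):
    """Mean squared error of the mixed output and its gradients w.r.t. w and alpha."""
    p = softmax(alpha)
    n = len(XS)
    loss, grad_w, grad_p = 0.0, [0.0, 0.0], [0.0, 0.0]
    for x, y in zip(XS, YS):
        feats = [x, x * x]                       # inputs seen by the two operations
        outs = [w[0] * feats[0], w[1] * feats[1]]
        err = p[0] * outs[0] + p[1] * outs[1] - y
        loss += err * err / n
        for j in range(2):
            grad_w[j] += 2.0 * err * p[j] * feats[j] / n
            grad_p[j] += 2.0 * err * outs[j] / n
    # Chain rule through the softmax: dp_j/dalpha_k = p_j * (delta_jk - p_k).
    dot = sum(grad_p[j] * p[j] for j in range(2))
    grad_alpha = [p[k] * (grad_p[k] - dot) for k in range(2)]
    return loss, grad_w, grad_alpha


def alternating_search(rounds=200, inner_steps=5, lr=0.1):
    w, alpha = [0.1, 0.1], [0.0, 0.0]
    initial_loss = loss_and_grads(w, alpha)[0]
    for _round in range(rounds):
        for _step in range(inner_steps):         # phase 1: architecture fixed, update weights
            grad_w = loss_and_grads(w, alpha)[1]
            w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        grad_alpha = loss_and_grads(w, alpha)[2]  # phase 2: weights fixed, update architecture
        alpha = [ai - lr * gi for ai, gi in zip(alpha, grad_alpha)]
    return initial_loss, loss_and_grads(w, alpha)[0], softmax(alpha)


initial_loss, final_loss, op_probs = alternating_search()
```

In this toy setting the loss decreases over the alternation and the selection probability shifts toward the linear operation, which is the only candidate able to fit the target.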
[0046] S105: Hardware performance constraints are introduced into the end-to-end search process of the neural network architecture. A multi-objective optimization function covering recognition accuracy, model inference latency, and model parameter quantity is constructed and used as the objective function for search optimization; candidate network architectures are comprehensively evaluated, and the optimal neural network architecture suitable for deployment on edge computing devices is selected. In some embodiments, the multi-objective optimization function takes maximization of the distribution network operation risk identification accuracy as the optimization objective, and takes the requirements that model inference latency and model parameter quantity not exceed the preset thresholds of the target edge computing device as hard constraints, thereby forming a constrained multi-objective optimization objective function. The preset thresholds can be calibrated in advance based on the hardware computing power and memory capacity of edge devices commonly used in the distribution network (such as edge terminals of inspection robots, smart boxes at the monitoring front end, and body-worn law enforcement recorders carried by operators), as well as the real-time alarm response requirements of on-site risk identification. For example, for mainstream ARM-based edge terminals, the single-frame image inference latency is preset to no more than 50 ms and the model parameter size to no more than 10 MB, to ensure that the generated model runs stably on on-site equipment.
[0047] In some embodiments, operation-level performance calibration can be completed in advance on the target edge device, establishing a lookup table of inference latency and parameter quantity for each candidate operation, such as convolution and depthwise separable convolution. The total latency and total parameter quantity of any candidate network architecture can then be computed quickly from the table, without measuring each architecture on the device in every round, which greatly improves search efficiency. At the same time, the multi-objective optimization function remains differentiable throughout and is compatible with gradient backpropagation through the proxy sampling function, realizing end-to-end joint optimization of architecture parameters, recognition accuracy, and hardware performance.
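One way such a lookup table can be turned into a differentiable cost estimate is to take the expectation of the per-operation cost under the selection and connection probabilities. The sketch below is illustrative only: the table entries are made-up numbers rather than measurements, and the dictionary layout and the function name expected_cost are hypothetical.

```python
# Hypothetical calibration table; the values are invented for illustration
# and do not come from any measurement on a real edge device.
LOOKUP = {
    "conv3x3": {"latency_ms": 4.0, "params_mb": 0.60},
    "dwsep_conv3x3": {"latency_ms": 1.5, "params_mb": 0.10},
    "max_pool": {"latency_ms": 0.5, "params_mb": 0.0},
    "skip": {"latency_ms": 0.1, "params_mb": 0.0},
    "none": {"latency_ms": 0.0, "params_mb": 0.0},
}


def expected_cost(edges, key):
    """Expected total cost of an architecture distribution.

    Each edge contributes connect_prob * sum_j op_prob_j * cost_j, which is
    linear in the probabilities and therefore differentiable with respect to them.
    """
    total = 0.0
    for edge in edges:
        per_op = sum(prob * LOOKUP[op][key] for op, prob in edge["op_probs"].items())
        total += edge["connect_prob"] * per_op
    return total


# Two candidate edges with assumed connection and operation probabilities.
edges = [
    {"connect_prob": 1.0, "op_probs": {"conv3x3": 0.5, "skip": 0.5}},
    {"connect_prob": 0.5, "op_probs": {"dwsep_conv3x3": 1.0}},
]
latency = expected_cost(edges, "latency_ms")
params = expected_cost(edges, "params_mb")
```

Because the estimate is a sum of table entries weighted by probabilities, no on-device measurement is needed during the search, and its gradient with respect to each probability is simply the corresponding table entry scaled by the other factor.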
[0048] In each iteration of the architecture search, the candidate network architectures generated by the proxy sampling function are first checked against the hard constraints. Only when a candidate network architecture satisfies both the inference latency constraint and the parameter quantity constraint is it included in the recognition accuracy ranking and in the gradient update of the architecture parameters. For candidate architectures that violate the hard constraints, the sampling probability is reduced through a gradient penalty mechanism, so that the search stays focused on feasible, edge-deployable architectures, avoids wasted search overhead, and becomes more targeted and efficient.
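The feasibility check and penalty can be sketched as follows. The thresholds reuse the example values given in the description (50 ms and 10 MB); the hinge-style form of the penalty, its weight, and the function names are assumptions, since the description does not specify the exact form of the gradient penalty mechanism.

```python
MAX_LATENCY_MS = 50.0  # example threshold from the description
MAX_PARAMS_MB = 10.0   # example threshold from the description


def is_feasible(latency_ms, params_mb):
    """Hard-constraint check: both budgets must be respected."""
    return latency_ms <= MAX_LATENCY_MS and params_mb <= MAX_PARAMS_MB


def constraint_penalty(latency_ms, params_mb, weight=1.0):
    """Assumed hinge penalty: zero inside the budget, growing linearly outside it."""
    over_latency = max(0.0, latency_ms / MAX_LATENCY_MS - 1.0)
    over_params = max(0.0, params_mb / MAX_PARAMS_MB - 1.0)
    return weight * (over_latency + over_params)


def search_objective(task_loss, latency_ms, params_mb, weight=1.0):
    """Loss minimized during the search: recognition loss plus constraint penalty."""
    return task_loss + constraint_penalty(latency_ms, params_mb, weight)


def rank_feasible(candidates):
    """Keep only feasible candidates and rank them by accuracy, best first."""
    feasible = [c for c in candidates if is_feasible(c["latency_ms"], c["params_mb"])]
    return sorted(feasible, key=lambda c: c["accuracy"], reverse=True)


# Hypothetical candidates; the numbers are illustrative, not test results.
candidates = [
    {"name": "a", "accuracy": 0.95, "latency_ms": 62.0, "params_mb": 8.0},
    {"name": "b", "accuracy": 0.93, "latency_ms": 41.0, "params_mb": 7.5},
    {"name": "c", "accuracy": 0.91, "latency_ms": 30.0, "params_mb": 4.0},
]
ranking = [c["name"] for c in rank_feasible(candidates)]
```

In this sketch the most accurate candidate is excluded from the ranking because it exceeds the latency budget, while the penalty term gives the search a nonzero gradient that pushes infeasible architectures back toward the budget.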
[0049] S106: The optimal neural network architecture is fully trained using the distribution network operation risk identification dataset. The trained risk identification model is then deployed on an edge computing device to perform real-time risk identification on images of distribution network operation sites.
[0050] In the full training phase, the topology and operation configuration of the optimal neural network architecture obtained through multi-objective optimization are fixed, and end-to-end fully supervised training is carried out on the training set and validation set constructed and divided in step S101. During training, a cross-entropy loss function adapted to the class imbalance of risk samples in the distribution network is used, together with weight regularization, dynamic learning rate decay, and an early stopping strategy to avoid overfitting. The risk identification accuracy and single-frame inference latency of the model are verified on the validation set at the same time, to complete hyperparameter tuning and confirm model convergence. After training, the independent test set divided in step S101 is used to quantitatively test the model's recognition accuracy for each type of operation behavior, its generalization ability, inference speed, memory usage, and other indicators, to ensure that the model performance meets the preset multi-objective optimization requirements and the deployment thresholds of the target edge devices.
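One common way to adapt cross-entropy to imbalanced classes is to weight each class by its inverse frequency. The description does not state which adaptation is used, so the weighting scheme below, the class counts, and the function names are assumptions for illustration only; the three classes follow the safe, dangerous, and illegal behavior labels of the dataset.

```python
import math


def inverse_frequency_weights(class_counts):
    """Weight each class by total / (num_classes * count), so rare classes weigh more."""
    total = sum(class_counts)
    num_classes = len(class_counts)
    return [total / (num_classes * count) for count in class_counts]


def weighted_cross_entropy(logits, label, class_weights):
    """Class-weighted cross-entropy for a single sample, using a stable log-sum-exp."""
    top = max(logits)
    log_norm = top + math.log(sum(math.exp(v - top) for v in logits))
    log_prob = logits[label] - log_norm
    return -class_weights[label] * log_prob


# Hypothetical sample counts for safe, dangerous, and illegal operation behavior.
counts = [800, 150, 50]
weights = inverse_frequency_weights(counts)

# Loss of an uninformative prediction on a rare "illegal" sample versus a "safe" one.
loss_rare = weighted_cross_entropy([0.0, 0.0, 0.0], 2, weights)
loss_common = weighted_cross_entropy([0.0, 0.0, 0.0], 0, weights)
```

With these weights, a misclassified sample from a rare risk class contributes more to the loss than one from the dominant class, which counteracts the tendency of the model to favor the majority class.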
[0051] In the edge deployment and real-time identification stage, the trained model is adapted and optimized through compilation for mainstream edge computing devices in the distribution network (including intelligent terminals at the monitoring front end, edge computing units of inspection robots, body-worn law enforcement recorders carried by operators, edge gateways in power distribution rooms, etc.). Building on the lightweight nature of the architecture itself, model quantization and operator optimization adapted to the edge inference framework can be applied, further compressing the model size and improving edge inference efficiency without sacrificing the core recognition accuracy.
[0052] In some embodiments, to comprehensively verify the overall performance of the method of the present invention, multi-dimensional comparative performance tests can be conducted on the independent test set for distribution network operation risk identification constructed in step S101. The test environment covers the GPU server environment used for model training, as well as the deployment environments of mainstream ARM-based edge computing terminals, inspection robot edge units, and body-worn law enforcement recorder terminals used in the distribution network field. Multiple schemes are set as comparison baselines, for example methods commonly used in distribution network risk identification such as manually designed lightweight models, the traditional differentiable neural architecture search method (DARTS), and hardware-aware NAS methods. The effectiveness of the method of the present invention is verified along five dimensions: identification accuracy, inference performance, computational complexity, search efficiency, and edge deployment adaptability.
[0053] Test results show that the neural network model obtained by the method of this invention through optimized proxy neural architecture search has excellent comprehensive performance in the task of risk identification in power distribution network operations. This method can automatically generate a dedicated neural network model that balances the accuracy of risk identification in power distribution network operations, low inference latency, and low computational complexity without requiring extensive manual debugging of the network structure. The model is naturally adapted to the deployment requirements of edge computing devices in power distribution network sites and can be stably deployed and run without additional post-processing such as model pruning and quantization. This provides an efficient, reliable, and feasible technical approach for intelligent safety management and control in power distribution network operations.
[0054] The various embodiments in this specification are described in a progressive manner. For the same or similar parts between the various embodiments, please refer to each other. Each embodiment focuses on describing the differences from other embodiments.
[0055] The above embodiments are only used to illustrate the technical solutions of this application, and are not intended to limit this application. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some or all of the technical features therein. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the scope of the technical solutions of this application.
Claims
1. A method for identifying distribution network operation risks based on optimized agent neural architecture search, characterized in that it comprises: collecting image data of power distribution network operation sites and manually annotating the image data to construct a power distribution network operation risk identification dataset; constructing a neural architecture search space, which is represented by a directed acyclic graph structure and consists of multiple network nodes and the connection relationships between the nodes, wherein different network nodes transfer feature information through candidate operations; subjecting the neural architecture search space to probability distribution relaxation, which maps the discrete architecture variables into a continuous, optimizable proxy space, wherein the candidate operation selection variable between network nodes is represented as a categorical probability distribution, the probability of each candidate operation being selected is described by an operation selection probability vector that satisfies a normalization constraint, the topological connection relationship between network nodes is represented as a Bernoulli distribution, and the topology connection probability parameter represents the probability that the corresponding connection between nodes exists; constructing a proxy sampling function that satisfies differentiability and unbiasedness, making the sampling process of the discrete architecture continuously differentiable by means of reparameterization, and iteratively updating the operation selection probability vector and the topology connection probability parameter by gradient descent to complete the end-to-end search optimization of the neural network architecture; introducing hardware performance constraints in the end-to-end search process of the neural network architecture, constructing a multi-objective optimization function that includes recognition accuracy, model inference latency, and model parameter quantity, using the multi-objective optimization function as the objective function for search optimization, comprehensively evaluating candidate network architectures, and selecting the optimal neural network architecture suitable for deployment on edge computing devices; and fully training the optimal neural network architecture using the power distribution network operation risk identification dataset, and deploying the trained risk identification model on an edge computing device to perform real-time risk identification on images of power distribution network operation sites.
2. The method for identifying distribution network operation risks based on optimized agent neural architecture search according to claim 1, characterized in that the labeled content of the power distribution network operation risk identification dataset includes at least safe operation behavior, dangerous operation behavior, and illegal operation behavior; and the image data of the power distribution network operation site includes at least the operation behavior information of the operators and the boundary information of the operation area.
3. The method for identifying distribution network operation risks based on optimized agent neural architecture search according to claim 1, characterized in that the candidate operations include convolution operations, depthwise separable convolution operations, pooling operations, skip connection operations, and no operation.
4. The method for identifying distribution network operation risks based on optimized agent neural architecture search according to claim 1, characterized in that the normalization constraint of the probability vector is that the selection probabilities of all candidate operations corresponding to the same network node sum to 1.
5. The method for identifying distribution network operation risks based on optimized agent neural architecture search according to claim 1, characterized in that the proxy sampling function includes a proxy sampling function for the operation feature variables corresponding to candidate operation selection and a proxy sampling function for the topology variables corresponding to node connection relationships, and both sampling functions make the discrete sampling process continuously differentiable through reparameterization.
6. The method for identifying distribution network operation risks based on optimized agent neural architecture search according to claim 5, characterized in that the reparameterization process of the proxy sampling function for the operation feature variables includes: for each candidate operation corresponding to each network node, introducing an independent and identically distributed standard Gumbel random noise variable, which is independent of the operation selection probability vector to be optimized; and, based on the log probability value of the operation selection probability vector, the standard Gumbel random noise variable, and a preset temperature parameter, constructing a continuously differentiable proxy sampling map through the Gumbel-softmax distribution, thereby transforming the discrete sampling process into a continuously differentiable function of the operation selection probability vector and calculating the proxy sample value of the operation feature variable; wherein the temperature parameter has a range of values of [value missing], and the smoothness of the sampling distribution is controlled by adjusting the value of the temperature parameter; and when the temperature parameter approaches 0, the probability distribution of the proxy sample value is consistent with the probability distribution obtained by sampling the operation selection probability vector in the discrete space according to the softmax function, which ensures the unbiasedness of the operation sampling process.
7. The method for identifying distribution network operation risks based on optimized agent neural architecture search according to claim 6, characterized in that the temperature parameter follows a linear decay strategy during the architecture search iterations: its initial value is set to 1, and it gradually decreases toward 0 as the number of iterations increases.
8. The method for identifying distribution network operation risks based on optimized agent neural architecture search according to claim 5, characterized in that the reparameterization process of the proxy sampling function for the topology variables includes: for each pair of network nodes that may be connected, introducing independent and identically distributed standard Gumbel random noise variables, which are independent of the topology connection probability parameter to be optimized; and, based on the log-odds value corresponding to the topology connection probability parameter and the standard Gumbel random noise variables, constructing a continuously differentiable proxy sampling map through the Sigmoid function, thereby transforming the discrete Bernoulli sampling process into a continuously differentiable function of the topology connection probability parameter and calculating the proxy sample value of the topology variable; wherein, when the topology connection probability parameter has converged, the node connection decision corresponding to the proxy sample value is consistent with the sampling result of the preset discrete Bernoulli distribution, which ensures the unbiasedness of the topology connection sampling process.
9. The method for identifying distribution network operation risks based on optimized agent neural architecture search according to claim 1, characterized in that the iterative update process based on gradient descent employs a two-layer alternating optimization strategy, including: within each preset iteration interval, first fixing the operation selection probability vector and the topology connection probability parameter and optimizing the weight parameters of the neural network; and then fixing the weight parameters of the neural network and updating the operation selection probability vector and the topology connection probability parameter using the gradient returned through the proxy sampling function, until the preset number of iteration rounds is completed.
10. The method for identifying distribution network operation risks based on optimized agent neural architecture search according to claim 1, characterized in that the multi-objective optimization function takes maximization of the distribution network operation risk identification accuracy as the optimization objective, and takes the requirements that model inference latency and model parameter quantity not exceed the preset thresholds of the target edge computing device as hard constraints, thereby forming a constrained multi-objective optimization objective function; and during the architecture search iterations, a candidate network architecture is included in the accuracy optimization ranking when it satisfies the hard constraints on inference latency and parameter quantity.