Hierarchical-feedback mechanism and TEG-guided search method for differentiable neural architectures
By employing a hierarchical-feedback mechanism and a TEG-guided neural architecture search method, combined with a phased depth increment and a cyclic feedback mechanism, the high computational resource consumption and insufficient correlation of architecture performance in existing methods are addressed, achieving efficient neural architecture optimization and improved accuracy.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ANHUI UNIV
- Filing Date
- 2024-07-25
- Publication Date
- 2026-06-30
AI Technical Summary
Existing neural architecture search methods are bottlenecked by computation time and resource consumption. Because the search process is separated from the evaluation process, the performance of an architecture in the search network correlates only weakly with its performance in the evaluation network, which makes it difficult to optimize deep convolutional neural network architectures efficiently.
We employ a differentiable neural architecture search method based on a hierarchical-feedback mechanism and TEG guidance. By combining phased depth increments and a cyclic feedback mechanism with TEG metrics, we optimize search efficiency and architecture performance, and we integrate the search and evaluation networks to improve model accuracy.
We have achieved efficient neural architecture search on the CIFAR and ImageNet datasets, which significantly reduces computational costs and improves the accuracy and architecture performance of the search model.
Figure CN119150955B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of neural network architecture search technology, and specifically to a differentiable neural architecture search method based on a hierarchical-feedback mechanism and TEG guidance.
Background Technology
[0002] The development of deep convolutional neural networks (DCNNs) has played a crucial role in the success of computer vision tasks. However, manually designing new network architectures is not only time-consuming and labor-intensive, but also requires extensive experience in network training and is difficult to scale. Neural Architecture Search (NAS) has been used in recent years to replace manual design, automatically discovering efficient network architectures within a given search space, thereby reducing manpower and cost.
[0003] Despite its significant advantages in automation, NAS still consumes substantial computation time and resources. Most NAS methods rely heavily on validation sets and perform accuracy-based architecture optimization, so frequent training and evaluation of sampled architectures become the major bottlenecks for search efficiency and interpretability. Even with various effective heuristics for channel approximation or architecture sampling, the training of the supernet still converges very slowly. Approximate surrogate inference techniques, such as truncated training and early stopping, accelerate the search process but often introduce significant search bias.
[0004] Differentiable Architecture Search (DARTS) has become one of the most popular NAS methods due to its relatively low computational cost and competitive performance. Unlike traditional methods that search over a set of discrete candidate architectures, DARTS relaxes the search space into a continuous space, allowing for optimization of the architecture through gradient descent. This gradient-based optimization efficiency reduces the search cost from thousands of GPU-days to just a few. Recent NAS surveys show a wealth of research on DARTS due to its simplicity and elegance. Furthermore, the application of gradient optimization in continuous search strategies has become a significant trend in the NAS field.
[0005] However, existing DARTS methods split the process into two steps: search and evaluation. The search step uses a shallow network to discover the optimal cell structure, while the evaluation step stacks these cells to build a deep network for final evaluation. As a result, the search process is optimized independently of the target evaluation network. PDARTS attempts to mitigate this depth gap by progressively deepening the search network. EnTranNAS constructs its search network by combining evaluation and search network modules to narrow this gap. Furthermore, SNAS and GDAS employ Gumbel-Softmax and an improved straight-through Gumbel-Softmax, respectively, to alleviate the gap caused by discretization, while AutoHAS enhances GDAS with an entropy term to simultaneously search for hyperparameters and architectures. Although these methods optimize the search and evaluation processes to some extent, they still separate the two processes, so the performance of architectures discovered in the search network correlates only weakly with their actual performance in the evaluation network.
[0006] On the other hand, some recent studies have questioned the effectiveness of DARTS. Li and Talwalkar observed that even simple random search can find architectures superior to those of the original DARTS. Zela et al. and Liang et al. showed that DARTS easily degenerates into networks filled with parameter-free operations (such as skip connections), leading to poor performance of the searched architectures. To alleviate these problems, Yu et al. proposed a cyclic differentiable architecture search method called CDARTS. CDARTS integrates the search and evaluation networks into a unified architecture and jointly trains the two networks in a cyclic manner, but it still suffers from high search and training costs.
[0007] Recently, researchers have addressed this issue by proposing training-free NAS. Studies have found that even at initialization (i.e., without gradient descent), metrics such as sample Jacobian, Neural Tangent Kernel, and "synflow" are highly correlated with network accuracy. This significantly reduces search costs. However, these works only validate some highly customized search methods and exploit the limited properties of deep networks empirically or in a specific way. Furthermore, these training-free metrics still only pursue final search performance, offering limited benefits for interpreting and understanding search trajectories and different search spaces. To address these issues, Chen et al. proposed a unified, visual, training-free NAS framework called TEG (Trainability, Expressivity, Generalization), which improves search time while simultaneously enhancing the accuracy of the search model.
[0008] While PDARTS narrows the gap between search and evaluation by gradually increasing network depth, and CDARTS jointly optimizes the search and evaluation networks by introducing a cyclic feedback mechanism, there is still room for improvement in terms of architecture selection and performance optimization.
Summary of the Invention
[0009] To address the aforementioned technical problems, this invention provides a differentiable neural architecture search method based on a hierarchical-feedback mechanism and TEG guidance. The method combines the advantages of PDARTS and CDARTS and further introduces the TEG metric to optimize search efficiency and architecture performance. In the first stage, the phased depth increments of PDARTS are employed, combined with the TEG metric to optimize architecture selection. In the second stage, a cyclic feedback mechanism is introduced, and the TEG metric is used to further optimize the performance of the final deep network. Through this integration of hierarchical and feedback mechanisms, the method of this invention significantly improves search efficiency while maintaining high accuracy of the searched model.
[0010] To solve the above-mentioned technical problems, the present invention adopts the following technical solution:
[0011] A search method for differentiable neural architectures based on a hierarchical-feedback mechanism and TEG guidance includes the following steps:
[0012] Step 1, Hierarchical Search Phase: Neural architecture search is performed based on the differentiable architecture search method. The differentiable architecture search method includes a search step and an evaluation step. The search step contains multiple search phases, each search phase corresponds to a search network, and as the number of search phases increases, the number of units in the corresponding search network increases accordingly, gradually approaching the number of units in the evaluation network used in the evaluation step.
[0013] At the beginning of each search phase, the TEG metric of the search network for the current search phase is calculated, and the TEG metric is used to guide the optimization of the search network parameters $\omega_S$ and the architecture weights $\alpha$ for the current search phase.
[0014] Step 2, Feedback Search Phase:
[0015] The evaluation network is an extension of the search network from the last search phase. The search network and the evaluation network are integrated, and the search network of the last search phase and the evaluation network are jointly trained in a cyclic manner.
[0016] During joint training, the candidate operations on each candidate edge are gradually reduced until only the two operations with the highest weights remain. Finally, the set of operations with the highest recognition accuracy on the training set is selected to form the final network.
[0017] Furthermore, the search step includes three search phases, with the number of units in the search network corresponding to the three search phases being 5, 8, and 11, respectively; the evaluation step uses an evaluation network with 20 units.
[0018] Furthermore, in step one, when the TEG metric is used to guide the optimization of the search network parameters $\omega_S$ and the architecture weights $\alpha$ in the current search phase, the corresponding two-level optimization problem is:
[0019]
[0020] Where $r_{TEG}$ is the TEG metric of the search network in the current search phase; $\lambda_\omega$ and $\lambda_\alpha$ are weighting parameters used to balance the loss and the TEG metric in the total loss; $\mathcal{L}_{val}$ and $\mathcal{L}_{train}$ represent the validation loss function and the training loss function, respectively; and $\omega_S$ represents the network parameters.
[0021] Furthermore, the integration of the search network and the evaluation network, and the joint training of the search network and the evaluation network in the final search stage in a cyclical manner, specifically includes:
[0022] The system architecture search is modeled as a joint optimization problem of the search network and the evaluation network, and the objective function of the joint optimization problem is:
[0023]
[0024]
[0025] Where $\omega_E$ and $\omega_E^*$ are the evaluation network parameters, and $\omega_S$ and $\omega_S^*$ are the search network parameters; $\mathcal{L}_{val}$ and $\mathcal{L}_{train}$ represent the validation loss function and the training loss function, respectively; $\lambda$ represents the corresponding weight parameter; and $r_{TEG}$ is the TEG metric of the search network at the current search stage.
[0026] The objective function of the joint optimization problem is optimized in two stages: individual learning and joint learning. In the individual learning stage, a weight-sharing strategy is used to update $\omega_E$; the structure of the evaluation network is updated based on the architecture weights of the search network, and the weights of the evaluation network are initialized using the parameters of the search network.
[0027] In the joint learning phase, the search algorithm updates the architecture weights α using introspective distillation and feedback from the evaluation network's features; the objective function of the joint optimization problem is further expressed as:
[0028]
[0029] Here, one term uses fixed network weights to optimize the architecture weights $\alpha$ in the search network; another uses fixed architecture weights $\alpha$ to optimize the weights $\omega_E$ in the evaluation network; and a third represents the introspective distillation process, which transfers knowledge from the evaluation network to the search network and uses features obtained from the evaluation network as supervision signals to guide the update of the architecture weights $\alpha$ in the search network.
[0030] Compared with the prior art, the beneficial technical effects of the present invention are:
[0031] This invention proposes a differentiable neural architecture search framework based on a hierarchical-feedback mechanism and TEG guidance, termed DARTS-HF-TEG. The method consists of two phases: the first, a hierarchical search phase, increases network depth in stages and uses TEG metrics to guide architecture selection; the second, a feedback search phase, introduces a cyclic feedback mechanism and further optimizes the performance of the final deep network using TEG metrics. Experiments and analyses on CIFAR, ImageNet, and NATS-Bench demonstrate the effectiveness of this method. Specifically, in the DARTS search space, this invention achieves an average top-1 accuracy of 97.50% on CIFAR10 (requiring only 0.16 GPU-days) and a top-1 accuracy of 75.9% on ImageNet (requiring only 0.8 GPU-days).
Attached Figure Description
[0032] Figure 1 is a flowchart of the differentiable neural architecture search method proposed in this invention;
[0033] Figure 2 is a schematic diagram of the overall framework of the hierarchical search phase of the present invention;
[0034] Figure 3 is a schematic diagram of the feedback search phase of the present invention.
Detailed Implementation
[0035] A preferred embodiment of the present invention will now be described in detail with reference to the accompanying drawings.
[0036] 1. Differentiable Architecture Search (DARTS)
[0037] The goal of differentiable architecture search is to find a cell motif that can be repeatedly stacked to construct a convolutional network. A cell is a directed acyclic graph consisting of an ordered sequence of N nodes, represented as $\{x_0, x_1, \dots, x_{N-1}\}$. Each node $x_i$ is a latent representation (e.g., a feature map), and each directed edge (i,j) is associated with an operation $o^{(i,j)}$ that transforms $x_i$ into $x_j$. Let $\mathcal{O}$ denote the operation space, a set of candidate operations such as convolution, pooling, and skip connections. Each operation represents a function o(·) applied to $x_i$. To make the search space continuous, DARTS relaxes the choice of a specific operation to a normalized exponential function (softmax) over all candidate operations:
[0038] $$\bar{o}^{(i,j)}(x_i) = \sum_{o \in \mathcal{O}} \frac{\exp\left(\alpha_o^{(i,j)}\right)}{\sum_{o' \in \mathcal{O}} \exp\left(\alpha_{o'}^{(i,j)}\right)} \, o(x_i)$$
[0039] where $o$ denotes a candidate operation, and the operation mixing weights for a pair of nodes $(x_i, x_j)$ are parameterized by a vector $\alpha^{(i,j)}$ of dimension $|\mathcal{O}|$. The parameter $\alpha$ is the architecture encoding to be optimized. Each intermediate node of a cell is computed from all of its preceding nodes. The output node $x_{N-1}$ is obtained by concatenating all intermediate nodes along the channel dimension.
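The continuous relaxation described above can be illustrated with a minimal Python sketch. The three scalar operations and the names `CANDIDATE_OPS` and `mixed_op` are hypothetical stand-ins chosen for illustration; a real cell would apply convolution, pooling, and skip connections to feature maps.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy candidate operations acting on a scalar "feature".
CANDIDATE_OPS = {
    "skip": lambda x: x,
    "double": lambda x: 2.0 * x,
    "zero": lambda x: 0.0,
}

def mixed_op(x, alpha):
    """Softmax-weighted sum of every candidate operation applied to x."""
    names = list(CANDIDATE_OPS)
    weights = softmax([alpha[name] for name in names])
    return sum(w * CANDIDATE_OPS[name](x) for name, w in zip(names, weights))

# With equal architecture weights, the edge output is the plain average
# of the three operation outputs: (3 + 6 + 0) / 3.
uniform_alpha = {"skip": 0.0, "double": 0.0, "zero": 0.0}
print(mixed_op(3.0, uniform_alpha))
```

As one entry of `alpha` grows relative to the others, the mixed output approaches the output of that single operation, which is what later allows the continuous encoding to be discretized.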
[0040] With the definition of a cell, the search for the optimal architecture becomes a two-level optimization problem:
[0041] $$\min_{\alpha} \; \mathcal{L}_{val}\left(\omega_S^*(\alpha), \alpha\right) \quad \text{s.t.} \quad \omega_S^*(\alpha) = \arg\min_{\omega_S} \mathcal{L}_{train}(\omega_S, \alpha)$$
[0042] Where the architecture weights $\alpha$ are optimized on the validation data (val), and $\omega_S$ denotes the search network parameters learned on the training data (train). This expression represents the alternating optimization of the search network parameters $\omega_S$ and the architecture weights $\alpha$. After the optimal structure is obtained, the discovered cells are stacked to construct a new evaluation network, which is then retrained on the target task.
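The alternating optimization can be sketched on a toy problem. This is only an illustration under stated assumptions: the model has one weight `w` and one architecture parameter `alpha` that mixes an identity operation with a zero operation through a sigmoid gate, the target is y = 2x, and the updates use the first-order approximation (the dependence of the optimal weights on `alpha` is ignored). All function names are hypothetical.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def mse_loss(w, alpha, xs):
    """Mean squared error of the toy model w * p * x against the target 2x."""
    p = sigmoid(alpha)
    return sum((w * p * x - 2.0 * x) ** 2 for x in xs) / len(xs)

def grad_w(w, alpha, xs):
    """Gradient of the loss with respect to the network weight w."""
    p = sigmoid(alpha)
    return sum(2.0 * (w * p * x - 2.0 * x) * p * x for x in xs) / len(xs)

def grad_alpha(w, alpha, xs):
    """Gradient of the loss with respect to the architecture parameter alpha."""
    p = sigmoid(alpha)
    dp = p * (1.0 - p)  # derivative of the sigmoid gate
    return sum(2.0 * (w * p * x - 2.0 * x) * w * x * dp for x in xs) / len(xs)

def alternating_search(train_xs, val_xs, steps=300, lr=0.05):
    """Alternate a weight step on training data with an alpha step on validation data."""
    w, alpha = 0.5, 0.0
    for _ in range(steps):
        w -= lr * grad_w(w, alpha, train_xs)        # lower level: network weights
        alpha -= lr * grad_alpha(w, alpha, val_xs)  # upper level: architecture weights
    return w, alpha

w, alpha = alternating_search([1.0, 2.0], [1.5, 2.5])
print(w * sigmoid(alpha))  # effective slope of the searched model
```

In this toy setting the gate opens toward the identity operation because it is the only operation that can fit the target, mirroring how the architecture weights come to favor useful operations.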
[0043] In DARTS, the optimization of the evaluation network is separated from the architecture search. As a result, the magnitude of an architecture parameter is not strongly correlated with the contribution of the corresponding operation to the performance of the searched network, so the searched architecture is not optimal for the final evaluation network.
[0044] 2. TEG
[0045] Trainability, expressivity, and generalization are three important, distinct, and complementary properties for characterizing and understanding neural networks. Specifically, trainability relates to the convergence speed during the optimization process; expressivity relates to the functional complexity of the network; and generalization indicates the model's error on unseen data.
[0046] The greater the difference in learning speed among different feature modules, the more difficult it is to optimize the network. Therefore, Chen et al. used the empirical condition number of the NTK (neural tangent kernel) to represent trainability:
[0047]
[0048] In the formula, the network parameters $\theta$ are drawn from the Kaiming normal initialization $\mathcal{N}(0, 2/N_l)$ (where $N_l$ is the width of the l-th layer), so the condition number $\kappa$ is computed at network initialization. $\kappa$ is negatively correlated with both the training and testing accuracy of the network. Therefore, minimizing $\kappa$ during the search encourages the discovery of high-performing architectures.
[0049] For given training examples $x_{train}$ and parameters $\theta$, the number of linear regions $R$ can be approximated by the number of unique activation patterns across all ReLU layers of the network. Therefore, Chen et al. used the empirical number of linear regions to represent expressivity:
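As a rough illustration of the trainability metric, the sketch below computes the condition number (largest eigenvalue divided by smallest) of an empirical kernel built from per-sample parameter gradients. It assumes those gradients are already available as rows of numbers; `gram_matrix`, `dominant_eigenvalue`, and `condition_number` are hypothetical helper names, and power iteration is used here only to keep the example dependency-free.

```python
def gram_matrix(jacobian_rows):
    """Empirical kernel: inner products between per-sample parameter gradients."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in jacobian_rows]
            for ri in jacobian_rows]

def dominant_eigenvalue(matrix, iters=500):
    """Largest eigenvalue of a symmetric positive semi-definite matrix (power iteration)."""
    n = len(matrix)
    v = [1.0 + 0.1 * i for i in range(n)]  # arbitrary non-zero start vector
    eig = 0.0
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(c * c for c in w) ** 0.5
        if norm == 0.0:
            return 0.0
        v = [c / norm for c in w]
        # Rayleigh quotient of the normalized vector.
        eig = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n))
    return eig

def condition_number(matrix):
    """Ratio of the largest to the smallest eigenvalue of a symmetric PSD matrix."""
    n = len(matrix)
    lam_max = dominant_eigenvalue(matrix)
    # The shifted matrix lam_max * I - K has dominant eigenvalue lam_max - lam_min.
    shifted = [[(lam_max if i == j else 0.0) - matrix[i][j] for j in range(n)]
               for i in range(n)]
    lam_min = lam_max - dominant_eigenvalue(shifted)
    if lam_min <= 0.0:
        return float("inf")
    return lam_max / lam_min

kernel = gram_matrix([[2.0, 0.0], [0.0, 1.0]])
print(condition_number(kernel))
```

A smaller value indicates that the eigenvalues of the kernel are closer together, which the text associates with easier optimization.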
[0050]
[0051] $R$ is positively correlated with both the training accuracy and the testing accuracy of the network. Furthermore, its correlation with training accuracy is stronger than that with testing accuracy, which confirms that $R$ indicates how well the network fits the training data, but not its generalization ability.
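Counting unique activation patterns can be sketched as follows. The tiny fully connected ReLU network, its weight layout (a list of `(weights, biases)` pairs), and the function names are assumptions made for illustration only.

```python
def activation_pattern(x, layers):
    """Return the on/off state of every ReLU unit for the input vector x."""
    pattern = []
    hidden = list(x)
    for weights, biases in layers:
        pre = [sum(w * v for w, v in zip(row, hidden)) + b
               for row, b in zip(weights, biases)]
        pattern.extend(1 if p > 0 else 0 for p in pre)
        hidden = [max(p, 0.0) for p in pre]  # ReLU
    return tuple(pattern)

def count_linear_regions(samples, layers):
    """Approximate the number of linear regions by the number of distinct patterns."""
    return len({activation_pattern(x, layers) for x in samples})

# One hidden layer with two units: one fires for positive inputs, one for negative.
toy_layers = [([[1.0], [-1.0]], [0.0, 0.0])]
toy_samples = [[-2.0], [-1.0], [1.0], [2.0]]
print(count_linear_regions(toy_samples, toy_layers))
```

In the toy case the negative inputs share one pattern and the positive inputs share another, so the sampled inputs fall into two linear regions.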
[0052] Generalization is empirically estimated by calculating the test MSE error of the network's NTK kernel regression:
[0053]
[0054] Here, the kernel is the NTK of the last layer of the deep network. NTK kernel regression correlates $x_{test}$ with $x_{train}$ and propagates the given training labels $y_{train}$ to the test data. If the prediction of the deep neural network becomes data-independent, the network fails to generalize and the MSE becomes very large.
[0055] Combining $\kappa$, $R$, and the MSE, the TEG reward metric can be expressed as $r_{TEG} = r_\kappa + r_R + r_{MSE}$. Taking $r_\kappa$ as an example:
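The kernel-regression estimate of generalization can be sketched as below. A plain dot-product kernel stands in for the network's NTK, which is an assumption made to keep the example self-contained; the helper names are hypothetical, and the linear solver assumes the training kernel matrix is invertible.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def kernel(u, v):
    """Stand-in kernel (dot product); the text uses the network's NTK instead."""
    return sum(a * b for a, b in zip(u, v))

def kernel_regression_mse(x_train, y_train, x_test, y_test):
    """Propagate training labels to test points via the kernel and report the test MSE."""
    k_train = [[kernel(a, b) for b in x_train] for a in x_train]
    coef = solve_linear_system(k_train, y_train)  # K_train^{-1} y_train
    preds = [sum(kernel(x, xt) * c for xt, c in zip(x_train, coef)) for x in x_test]
    return sum((p - y) ** 2 for p, y in zip(preds, y_test)) / len(y_test)

print(kernel_regression_mse([[1.0, 0.0], [0.0, 1.0]], [3.0, 5.0], [[1.0, 1.0]], [10.0]))
```

A large test MSE signals that the kernel carries little information linking test inputs to training inputs, which is the failure mode the text describes as data-independent prediction.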
[0056]
[0057]
[0058] where $\kappa_t$ is the trainability evaluated for the architecture sampled at step t; $r_R$ and $r_{MSE}$ are calculated in the same way.
[0059] 3. The differentiable neural architecture search method in this invention
[0060] The proposed differentiable neural architecture search method based on a hierarchical-feedback mechanism and TEG guidance consists of two stages. The first is a hierarchical search stage, which progressively increases network depth and incorporates TEG metrics for architecture selection. The second is a feedback search stage, which introduces a cyclic feedback mechanism and further optimizes the performance of the final deep network using TEG metrics. As shown in Figure 1, by integrating the hierarchical and feedback mechanisms, this invention greatly improves search efficiency while maintaining high accuracy of the searched model.
[0061] 3.1 Hierarchical Search Phase
[0062] In DARTS, architecture search is performed on an 8-unit network, while the discovered architectures are evaluated on a 20-unit network. However, shallow and deep networks behave very differently, meaning that the structure selected during the search process is not necessarily optimal.
[0063] This invention also gradually increases the network depth during the search process, so that at the end of the search the depth is as close as possible to the number of units in the evaluation network. However, unlike the prior method, because the feedback mechanism in the feedback search phase can further bridge the depth gap, this invention slows the rate of depth increase in the hierarchical search phase to save computational cost, as shown in Figure 2. Furthermore, this invention does not immediately perform a search upon obtaining the final-stage network. Instead, it performs a cyclic feedback search together with the evaluation network, as detailed in the description of the feedback search phase. In addition, at the beginning of each search epoch, this invention calculates the TEG metric of the current search network and uses it to guide the optimization of the search network parameters $\omega_S$ and the architecture weights $\alpha$ in the current search epoch. This two-level optimization problem can be expressed as:
[0064]
[0065] Where $r_{TEG}$ is the TEG metric of the network at the current epoch, and $\lambda_\omega$ and $\lambda_\alpha$ are weighting parameters used to balance the loss and the TEG metric in the total loss. Compared with the PDARTS method, the method of this invention saves computation time and reduces the gap between networks of different depths.
[0066] Figure 2 shows the overall framework of the hierarchical search phase: the depth of the search network is increased from 5 in the initial phase to 8 and 11 in the intermediate and final phases, respectively, while the number of candidate operations (represented by connections of different colors) is reduced accordingly by removing the operations with the lowest scores in the previous phase.
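The phased schedule can be sketched as a simple loop. The depths 5, 8, and 11 come from the text; the number of operations dropped between phases (one per transition here), the scoring callback `score_fn`, and the operation names are assumptions for illustration, since the text does not state them. In practice the score would come from the trained architecture weights of that phase.

```python
STAGE_DEPTHS = [5, 8, 11]  # units per search phase, as stated in the text

def hierarchical_search(candidate_ops, score_fn, drops=(1, 1)):
    """Run the phased schedule and return (log of phases, surviving operations).

    score_fn(op, depth) is a hypothetical callback that returns the score of
    an operation after searching at the given depth.
    """
    ops = list(candidate_ops)
    log = []
    for stage, depth in enumerate(STAGE_DEPTHS):
        scores = {op: score_fn(op, depth) for op in ops}
        log.append((depth, list(ops)))
        if stage < len(drops):
            # Remove the lowest-scoring operations before the next, deeper phase.
            ranked = sorted(ops, key=scores.get, reverse=True)
            ops = ranked[:max(len(ranked) - drops[stage], 1)]
    return log, ops

# Hypothetical fixed scores, independent of depth, just to exercise the loop.
toy_scores = {"conv3": 0.4, "conv5": 0.3, "pool": 0.1, "skip": 0.2}
log, final_ops = hierarchical_search(toy_scores, lambda op, depth: toy_scores[op])
for depth, ops in log:
    print(depth, ops)
```

The candidate set shrinks as the depth grows, which keeps the cost of the deeper phases manageable.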
[0067] 3.2 Feedback Search Phase
[0068] In the feedback search phase, the evaluation network is extended from the search network in the final phase, integrating the search and evaluation networks into a unified architecture and jointly training the two networks in a cyclical manner. The architecture search is modeled as a joint optimization problem of these two networks:
[0069]
[0070] Where $\omega_E$ denotes the evaluation network weights. This invention also optimizes the objective function in two stages, individual learning and joint learning. For the evaluation network, the internal cell structure is generated by discretizing the learned weights α: thresholding the learned values in α preserves the top-k (k=2) operations of each node in the cell. During the individual learning stage, a weight-sharing strategy is used to update $\omega_E$; the architecture of the evaluation network is updated based on the architecture weights of the search network, and the weights of the evaluation network are initialized using the parameters of the search network.
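The top-k discretization can be sketched as follows. The sketch assumes the common DARTS-style convention of first picking the strongest operation on each incoming edge and then keeping the k strongest edges per node; the text only states that the top-k (k=2) operations of each node are preserved, so this convention, the data layout, and the name `discretize_cell` are assumptions.

```python
def discretize_cell(alpha, k=2):
    """Keep the k strongest incoming operations for every node.

    alpha maps node -> {(previous_node, op_name): weight}.
    Returns node -> list of (previous_node, op_name).
    """
    cell = {}
    for node, edge_weights in alpha.items():
        # Strongest operation on each incoming edge.
        best_per_edge = {}
        for (prev, op), weight in edge_weights.items():
            if prev not in best_per_edge or weight > best_per_edge[prev][1]:
                best_per_edge[prev] = (op, weight)
        # Keep the k edges whose strongest operation has the largest weight.
        ranked = sorted(best_per_edge.items(), key=lambda kv: kv[1][1], reverse=True)
        cell[node] = [(prev, op) for prev, (op, _) in ranked[:k]]
    return cell

toy_alpha = {
    2: {(0, "conv"): 0.6, (0, "skip"): 0.2, (1, "conv"): 0.1, (1, "pool"): 0.5},
    3: {(0, "conv"): 0.3, (1, "skip"): 0.9, (2, "pool"): 0.4},
}
print(discretize_cell(toy_alpha))
```

The resulting discrete cell is what would be stacked to form the evaluation network.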
[0071] During the joint learning phase, the search algorithm updates the architecture weights α using introspective distillation and feedback from the evaluation network's features. More specifically, the joint optimization of the two networks is expressed as:
[0072]
[0073] Here, one term uses fixed network weights to optimize the architecture weights $\alpha$ of the search network; another uses fixed architecture weights $\alpha$ to optimize the weights $\omega_E$ of the evaluation network; and a third represents the introspective distillation process, which transfers knowledge from the evaluation network to the search network. Features obtained from the evaluation network are used as supervision signals to guide the update of the architecture weights $\alpha$ in the search network.
[0074] As shown in Figure 3, the feedback search phase consists of two branches: a search network and an evaluation network. To facilitate information transfer, connections are established between the two branches. Specifically, an architecture transfer path transmits the discovered cell graph from the search branch to the evaluation branch, as indicated by the bold arrow at the top of Figure 3. Following previous work, this invention retains the top-k (k=2) strongest operations among all candidate operations from all preceding nodes. In the other direction, a feedback distillation path passes feature feedback from the evaluation branch to the search branch, as indicated by the solid arrows at the bottom of Figure 3. This feedback acts as a supervisory signal that helps the search network find better cell structures.
[0075] Specifically, multi-level features of the evaluation network are used as feedback signals because they are representative in capturing image semantics. As shown in Figure 3, the lateral embedding connections combine low-resolution, semantically strong features with high-resolution, semantically weak features at multiple levels. Features are taken from the output of each stage, and the corresponding feature logits are generated by the embedding module, whose function is to project dense feature maps onto a low-dimensional subspace. The resulting logits of the evaluation network are passed through a soft cross-entropy layer and serve as the supervision signal for the search network.
[0076] Figure 3 illustrates the feedback search phase, which includes two networks: a search network (left) and an evaluation network (right). The Embedding module maps the features of each stage to a one-dimensional vector.
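The soft cross-entropy supervision can be sketched as below, assuming the usual definition in which the softmax of the evaluation-network logits acts as a soft target for the log-softmax of the search-network logits. The function names are hypothetical, and details such as temperature scaling or per-stage weighting are not specified in the text and are omitted.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_total for v in logits]

def soft_cross_entropy(search_logits, eval_logits):
    """Cross-entropy of the search-network prediction against soft evaluation targets."""
    teacher = [math.exp(v) for v in log_softmax(eval_logits)]  # soft targets
    student_log = log_softmax(search_logits)
    return -sum(t * s for t, s in zip(teacher, student_log))

matched = soft_cross_entropy([2.0, 0.0], [2.0, 0.0])
mismatched = soft_cross_entropy([0.0, 2.0], [2.0, 0.0])
print(matched, mismatched)
```

The loss is smallest when the search network reproduces the evaluation network's distribution, so its gradient pushes the architecture weights toward cells whose features agree with the deeper network.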
[0077] It will be apparent to those skilled in the art that the present invention is not limited to the details of the exemplary embodiments described above, and that the invention can be implemented in other specific forms without departing from its spirit or essential characteristics. Therefore, the embodiments should be considered in all respects as exemplary and non-limiting, and the scope of the invention is defined by the appended claims rather than the foregoing description. Thus, all variations falling within the meaning and scope of equivalents of the claims are intended to be included within the present invention, and no reference numerals in the claims should be construed as limiting the scope of the claims.
[0078] Furthermore, it should be understood that although this specification describes embodiments, not every embodiment contains only one independent technical solution. This narrative style is merely for clarity. Those skilled in the art should consider the specification as a whole, and the technical solutions in each embodiment can also be appropriately combined to form other embodiments that can be understood by those skilled in the art.
Claims
1. A search method for differentiable neural architectures based on a hierarchical-feedback mechanism and TEG guidance, characterized in that it includes the following steps:
Step 1, Hierarchical Search Phase: Neural architecture search is performed based on the differentiable architecture search method. The differentiable architecture search method includes a search step and an evaluation step. The search step contains multiple search phases, each search phase corresponds to a search network, and as the number of search phases increases, the number of units in the corresponding search network increases accordingly, gradually approaching the number of units in the evaluation network used in the evaluation step. At the beginning of each search phase, the TEG metric of the search network for the current search phase is calculated, and the TEG metric is used to guide the optimization of the search network parameters $\omega_S$ and the architecture weights $\alpha$ for the current search phase.
Step 2, Feedback Search Phase: The evaluation network is an extension of the search network from the last search phase. The search network and the evaluation network are integrated, and the search network of the last search phase and the evaluation network are jointly trained in a cyclic manner. During joint training, the candidate operations on each candidate edge are gradually reduced until only the two operations with the highest weights remain. Finally, the set of operations with the highest recognition accuracy on the training set is selected to form the final network.
In step one, when the TEG metric is used to guide the optimization of the search network parameters $\omega_S$ and the architecture weights $\alpha$ for the current search phase, the corresponding bi-level optimization problem is: ; ; where $r_{TEG}$ is the TEG metric of the search network at the current search stage; $\lambda_\omega$ and $\lambda_\alpha$ are weighting parameters used to balance the loss and the TEG metric in the total loss; $\mathcal{L}_{val}$ and $\mathcal{L}_{train}$ represent the validation loss function and the training loss function, respectively; and $\omega_S$ represents the network parameters.
The integration of the search network and the evaluation network, and the joint training of the search network of the last search phase and the evaluation network in a cyclic manner, specifically includes: the architecture search is modeled as a joint optimization problem of the search network and the evaluation network, and the objective function of the joint optimization problem is: ; ; ; where $\omega_E$ and $\omega_E^*$ are the evaluation network parameters, and $\omega_S$ and $\omega_S^*$ are the search network parameters; $\mathcal{L}_{val}$ and $\mathcal{L}_{train}$ represent the validation loss function and the training loss function, respectively; $\lambda$ represents the corresponding weight parameter; and $r_{TEG}$ is the TEG metric of the search network at the current search stage.
The objective function of the joint optimization problem is optimized in two stages: individual learning and joint learning. During the individual learning stage, a weight-sharing strategy is used to update $\omega_E$; the structure of the evaluation network is updated based on the architecture weights of the search network, and the weights of the evaluation network are initialized using the parameters of the search network. During the joint learning stage, the search algorithm updates the architecture weights $\alpha$ through introspective distillation, utilizing feature feedback from the evaluation network. The objective function of the joint optimization problem is further expressed as: ; where one term uses fixed network weights to optimize the architecture weights $\alpha$ in the search network; another uses fixed architecture weights $\alpha$ to optimize the weights $\omega_E$ in the evaluation network; and a third represents the introspective distillation process, which transfers knowledge from the evaluation network to the search network and uses features obtained from the evaluation network as supervision signals to guide the update of the architecture weights $\alpha$ in the search network.
2. The search method for differentiable neural architectures based on a hierarchical-feedback mechanism and TEG guidance according to claim 1, characterized in that the search step includes three search phases, with 5, 8, and 11 units in the search networks corresponding to the three search phases, respectively; the evaluation step uses an evaluation network with 20 units.