Computational complexity penalty term design method for target detection neural architecture search
By dynamically generating complexity constraint thresholds and penalty coefficients and embedding them into a single-objective genetic algorithm, the problem of wasted computational resources and difficulty in balancing accuracy and complexity in target detection neural architecture search is solved, thus achieving efficient neural architecture search.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- XIDIAN UNIV
- Filing Date
- 2026-05-06
- Publication Date
- 2026-06-12
AI Technical Summary
Existing neural architecture search methods for object detection lack targeted computational complexity penalty mechanisms, resulting in wasted computing resources, prolonged search cycles, difficulty in balancing accuracy and computational complexity, and poor adaptability of penalty mechanisms, making it difficult to adapt to different deployment devices and scenarios.
We design a computational complexity penalty term for neural architecture search in object detection. By dynamically generating complexity constraint thresholds and penalty coefficients, we embed them into a single-objective genetic algorithm and combine elite initialization and iterative screening mechanisms to optimize detection accuracy and computational complexity.
It achieves precise constraints on computational complexity, retains a high-precision architecture, adapts to different devices, shortens search time, improves search efficiency, and reduces deployment costs.
Description
Technical Field
[0001] This invention belongs to the field of deep learning and computer vision technology, and specifically relates to a method, system, electronic device and storage medium for designing a computational complexity penalty term for object detection neural architecture search, which is used to balance model accuracy and computational complexity in the process of automated design of neural network architecture.
Background Technology
[0002] Neural Architecture Search (NAS) technology can automatically design high-performance neural network architectures, which has significant application value in the field of object detection. However, current NAS schemes for object detection based on single-objective genetic algorithms suffer from the following core problems: First, they lack a targeted computational complexity penalty mechanism. Existing NAS solutions only control computational overhead through simple parameter constraints or post-pruning methods and do not eliminate invalid high-FLOPs candidate architectures at the source of the search, which wastes a large amount of computing resources and significantly extends the search cycle.
[0003] Second, the challenge of balancing accuracy and computational complexity remains unresolved. Existing solutions either prioritize accuracy optimization while neglecting complexity control, resulting in excessively high FLOPs in the searched architectures, making them unsuitable for edge devices; or they excessively constrain the number of parameters, sacrificing feature extraction capabilities, leading to a decrease in detection accuracy.
[0004] Third, existing penalty mechanisms have poor adaptability and high implementation costs. The complexity constraints introduced by some NAS solutions are designed for image classification tasks and do not adapt to the feature extraction and multi-scale prediction requirements of object detection. Furthermore, the penalty mechanisms are complex and difficult to directly embed into NAS frameworks based on single-objective genetic algorithms.
[0005] Among the existing neural architecture search methods for object detection, one scheme introduces an infinite penalty term into object detection NAS, but it has the following inherent defects: (1) It adopts a fixed, manually preset threshold, which can neither adapt automatically to the differing computing power of deployment devices nor adjust the constraint strength dynamically as the search proceeds; (2) It applies a hard penalty throughout the process, directly eliminating all architectures that exceed the threshold, so slightly over-threshold architectures that appear early and contribute greatly to accuracy are easily lost, causing the search to fall into a local optimum; (3) It uses only FLOPs as the complexity indicator, which cannot accurately reflect the real inference latency on different hardware; (4) It initializes the population completely at random, resulting in slow convergence.
Summary of the Invention
[0006] To address the aforementioned problems in the existing technology, this invention provides a method for designing a computational complexity penalty term for neural architecture search in object detection. This method can accurately constrain the computational complexity of candidate network architectures during the neural architecture search process, effectively eliminate high-complexity invalid architectures, and retain potentially high-precision architectures, thus balancing model detection accuracy and computational overhead.
[0007] The technical problem to be solved by this invention is achieved through the following technical solution: In a first aspect, the present invention provides a method for designing a computational complexity penalty term for target detection neural architecture search, which embeds the design of the computational complexity penalty term into a target detection neural architecture search method based on a single-objective genetic algorithm and includes the following core steps:
Step S1: Determine the quantitative index of computational complexity, and dynamically generate a complexity constraint threshold based on the computing power information of the target deployment equipment, as a dynamic threshold;
Step S2: Using the dynamic threshold, construct a mathematical model of the computational complexity penalty term that includes a dynamic penalty coefficient, wherein the dynamic penalty coefficient gradually increases from its initial value according to a preset strategy during the search iteration process;
Step S3: Embed the mathematical model of the computational complexity penalty term into the two-layer optimization objective function of the neural architecture search to form an upper-layer optimization objective function with a penalty term, which is used to simultaneously optimize detection accuracy and computational complexity in architecture search;
Step S4: In the population initialization phase, decode each candidate architecture, calculate its actual computational complexity based on the quantification index, and output each candidate architecture and its actual computational complexity;
Step S5: In the population iteration and update phase, based on the upper-level optimization objective function with the penalty term and the actual computational complexity of each candidate architecture, evaluate and screen the candidate architectures using the computational complexity penalty term. In the early stage of the search, candidate architectures whose actual complexity slightly exceeds the dynamic threshold are allowed to participate in evolution with reduced fitness. In the later stage of the search, candidate architectures whose actual complexity exceeds the dynamic threshold are removed, and the population is updated.
[0008] Furthermore, in step S1, the quantification index of computational complexity includes floating-point operations (FLOPs) or hardware inference latency; when the quantification index is hardware inference latency, it is estimated by using a pre-built target device operation-level latency lookup table.
[0009] Furthermore, in step S1, the method for dynamically generating the complexity constraint threshold is as follows: obtain the peak floating-point computing power and target inference frame rate of the target deployment device, and calculate the dynamic threshold based on the preset time margin coefficient.
[0010] Furthermore, in step S2, the dynamic penalty coefficient increases from its initial value according to an exponential growth strategy during the coarse search phase, and is set to infinity during the fine training phase to achieve hard constraints.
[0011] Furthermore, in step S4, the population initialization adopts a hybrid strategy that combines elite initialization with random generation, where elite individuals are derived from a pre-trained object detection model architecture.
[0012] Furthermore, in step S5, the early search stage refers to the coarse search stage of population iteration, in which candidate architectures with actual computational complexity within 1.2 times the dynamic threshold are allowed to participate in evolution; the late search stage refers to the fine training stage, in which the dynamic penalty coefficient is made to approach infinity, and any candidate architecture with actual computational complexity exceeding the dynamic threshold is hard-removed.
[0013] Furthermore, in step S3, the upper-level optimization objective function with a penalty term is specifically the sum of the detection loss on the validation set and the dynamic penalty coefficient multiplied by the penalty term. The lower the value of this objective function, the higher the fitness of the candidate architecture. The two-layer optimization objective function also includes a lower-level optimization objective function, which is a composite loss function containing classification loss, regression loss, and confidence loss. It is used to optimize the network weight parameters under a given candidate architecture. The minimization result of the lower-level optimization objective function provides the computational basis for the detection loss on the validation set.
[0014] In a second aspect, the present invention provides a computational complexity penalty term design system for target detection neural architecture search, comprising:
a threshold generation module, used to determine the quantitative index of computational complexity and to dynamically generate a complexity constraint threshold, based on the computing power information of the target deployment equipment, as the dynamic threshold;
a penalty term construction module, used to construct, from the dynamic threshold, a mathematical model of the computational complexity penalty term containing a dynamic penalty coefficient, wherein the dynamic penalty coefficient increases according to a preset strategy during the search iteration process;
an objective function embedding module, used to embed the mathematical model of the computational complexity penalty term into the two-layer optimization objective function of the neural architecture search, forming the upper-layer optimization objective function with a penalty term;
an initialization and statistics module, used to decode candidate architectures and calculate their actual computational complexity during the population initialization phase;
an iterative screening and update module, used to dynamically evaluate and screen candidate architectures based on the upper-level objective function with a penalty term and the actual computational complexity during the population iterative update stage, update the population, and finally output a high-precision object detection network architecture that meets the complexity constraints.
[0015] In a third aspect, the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the above-mentioned method for designing a computational complexity penalty term for target detection neural architecture search.
[0016] In a fourth aspect, the present invention provides a computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the steps of the above-described method for designing a computational complexity penalty term for target detection neural architecture search.
[0017] Compared with the prior art, the present invention has the following advantages: (1) Significantly improved threshold adaptability: The device-aware dynamic threshold generation mechanism can automatically adapt to different deployed devices without manual parameter adjustment, while also supporting scenario-based manual adjustment, covering the full scenario requirements from edge terminals to servers.
[0018] (2) Avoid losing high-precision architectures: The iterative dynamic penalty intensity allows slightly over-threshold architectures to participate in the competition during the coarse search stage, preserving potential high-precision architectures; the fine training stage switches to hard penalty to ensure that the final model is lightweight.
[0019] (3) Enhanced hardware deployment: Hardware inference latency is added as an optional complexity metric and is estimated quickly through a pre-built hardware lookup table, so that the searched architecture achieves genuinely low latency on the target device.
[0020] (4) Improved search efficiency: The elite population initialization strategy accelerates convergence, and the early elimination mechanism of dynamic penalty shortens the overall search time.
[0021] (5) Extremely low implementation cost: The core process of the original NAS framework is completely retained. Only the penalty calculation and threshold setting modules need to be modified. It can be directly embedded into the existing system without large-scale code reconstruction.
[0022] The present invention will now be described in further detail with reference to the accompanying drawings.
Attached Figure Description
[0023] Figure 1 is an overall flowchart of the computational complexity penalty term design method for target detection neural architecture search provided in this embodiment of the invention;
Figure 2 is a schematic diagram of the single-objective, two-layer optimization NAS framework on which the present invention is based, illustrating the alternating iterative relationship between upper-layer architecture optimization and lower-layer weight optimization;
Figure 3 is a flowchart of the encoding / decoding and complexity statistics of the candidate architecture in this embodiment of the invention;
Figure 4 is a comparison chart of the population FLOPs distribution changes under dynamic penalties and existing fixed penalties in this embodiment of the invention.
Detailed Implementation
[0024] The present invention will be further described in detail below with reference to specific embodiments, but the implementation of the present invention is not limited thereto.
[0025] Example 1
Referring to Figure 1, this invention proposes a method for designing a computational complexity penalty term for neural architecture search in object detection. Within a single-objective genetic algorithm-based NAS framework for object detection, precise control of computational complexity is achieved through the following steps:
S1. Determine the quantitative indicators and dynamic thresholds for computational complexity.
[0026] This step first determines the quantification metrics for computational complexity. These metrics include two options: floating-point operations (FLOPs) and hardware inference latency, which can be selected based on deployment requirements.
[0027] If the FLOPs metric is selected: it should be consistent with the FLOPs statistical standard in the performance evaluation stage of the basic NAS method to ensure that the evaluation of the penalty item is seamlessly integrated with the overall search process.
[0028] If the hardware latency metric is selected: an operation-level latency lookup table for the target device is built in advance, recording the measured single-operator latency on the target hardware for different convolutional kernel sizes, numbers of channels, and module types.
[0029] Then, a complexity constraint threshold is dynamically generated based on the computing power information of the target deployment equipment, denoted as the dynamic threshold ε.
[0030] The constraint threshold ε is generated as follows: obtain the peak floating-point computing power of the target deployment device, the target inference frame rate, and the time margin coefficient (recommended value: 0.8-1.2), and calculate ε from these three quantities.
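The formula for ε does not survive in the text above, so the following is only a minimal sketch under a stated assumption: that ε is the per-frame operation budget, i.e. the peak computing power divided by the target frame rate and scaled by the time margin coefficient. The function name, argument names, and the 600 GFLOP/s device are hypothetical.

```python
def dynamic_threshold(peak_flops_per_s: float, target_fps: float,
                      margin: float = 1.0) -> float:
    """Per-frame FLOPs budget.

    ASSUMPTION (not confirmed by the source): eps = margin * peak / fps.
    The source only recommends a margin between 0.8 and 1.2.
    """
    if peak_flops_per_s <= 0 or target_fps <= 0 or margin <= 0:
        raise ValueError("all inputs must be positive")
    return margin * peak_flops_per_s / target_fps


# Hypothetical device: 600 GFLOP/s peak, 30 frames per second, margin 1.0.
eps = dynamic_threshold(600e9, 30.0, 1.0)
print(f"eps = {eps / 1e9:.1f} GFLOPs per frame")
```

Under this reading, a margin below 1.0 tightens the budget and a margin above 1.0 relaxes it.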
[0031] Based on the actual deployment scenario of the target detection task and the upper limit of the device's computing resources, the computational complexity constraint threshold ε can be flexibly adjusted according to requirements: for edge terminal devices with limited computing resources (such as embedded industrial control computers and mobile chips), ε can be adjusted to 10G~15G FLOPs; for server-side deployment scenarios with sufficient computing resources, ε can be adjusted to 20G~30G FLOPs. In this embodiment, the preferred value is ε=20G FLOPs to balance the dual requirements of model accuracy and deployment efficiency in industrial defect detection scenarios.
[0032] This dynamic threshold will be used to construct subsequent penalty terms.
[0033] S2. Construct a mathematical model that includes a computational complexity penalty term with dynamic penalty coefficients.
[0034] Using the dynamic threshold ε generated in step S1, a computational complexity penalty term is constructed. For any candidate network architecture a, its actual computational complexity is denoted f(a), measured either as FLOPs, i.e., the total number of floating-point operations (including pooling and other operations) during forward propagation of the neural network model corresponding to that architecture, or as inference latency. The computational complexity penalty term P(a) is defined as:
$$P(a) = \max\bigl(0,\; f(a) - \varepsilon\bigr)$$
that is, P(a) = 0 when f(a) ≤ ε, and P(a) = f(a) - ε when f(a) > ε.
[0035] Simultaneously, a dynamic penalty coefficient λ is set, which gradually increases from its initial value according to a preset strategy during the search iteration process. Specifically, the penalty coefficient λ employs an iterative dynamic adjustment strategy: (1) In the coarse search phase (e.g., the first 80 iterations): starting from its initial value, λ increases exponentially with the iteration number k, where K is the total number of coarse search iterations (K=80 in this example). In the first iteration, λ=10; in the 40th iteration, λ≈148; and in the 80th iteration, λ≈22026.
[0036] (2) During the fine training phase (e.g., 300 iterations): λ is set to infinity to ensure that the final output architecture strictly meets the complexity constraints.
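The growth formula itself is not recoverable from the text, so the schedule below is an assumption rather than the patented formula. It uses λ(k) = exp(10·k/K), which reproduces the quoted values at k=40 (e^5 ≈ 148) and k=80 (e^10 ≈ 22026) but yields about 1.13, not 10, at k=1. The function and parameter names are hypothetical.

```python
import math


def penalty_coefficient(k: int, K: int = 80, growth: float = 10.0,
                        fine_stage: bool = False) -> float:
    """Dynamic penalty coefficient.

    ASSUMPTION: lambda(k) = exp(growth * k / K) during coarse search.
    In the fine training stage the coefficient is infinite (hard constraint).
    """
    if fine_stage:
        return math.inf
    if not 0 <= k <= K:
        raise ValueError("k must lie in [0, K] during coarse search")
    return math.exp(growth * k / K)


for k in (1, 40, 80):
    print(k, round(penalty_coefficient(k), 2))
```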
[0037] The penalty mechanism operates in two ways: Scenario 1: When the actual computational complexity of the candidate architecture f(a) ≤ ε, P(a) = 0, the penalty term does not take effect. In this case, the upper objective function is equivalent to the original NAS objective function. The optimization process only focuses on improving the model detection accuracy and does not impose additional constraints on the candidate architecture.
[0038] Scenario 2: When the actual computational complexity of the candidate architecture f(a) > ε, P(a) is the positive difference between f(a) and ε, which triggers the penalty mechanism. Its effect depends on the search stage:
[0039] (1) Coarse search stage: λ gradually increases with iteration, and the fitness of an over-threshold architecture decreases linearly with the amount by which it exceeds the threshold, allowing slightly over-threshold architectures to participate in the competition; (2) Fine training stage: the dynamic penalty coefficient λ is set to infinity, so the penalty term λ·P(a) tends to infinity. This causes the value of the upper-level objective function to tend to infinity, thus directly determining the candidate architecture as an invalid individual and achieving a hard constraint on candidate architectures with high computational complexity.
[0040] The above penalty term is designed in a simple form and has a clear physical meaning: it allows architectures with FLOPs below the threshold to freely participate in the accuracy competition, while imposing a decisive penalty on architectures with FLOPs exceeding the threshold, without the need to introduce a complex multi-objective weight balancing mechanism.
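The two scenarios can be sketched as below, taking P(a) to be zero within the threshold and the positive excess f(a) - ε beyond it, as the scenarios describe. The helper names and the choice of GFLOPs as the unit are assumptions for illustration. One practical detail is that an infinite λ multiplied by a zero penalty is NaN in floating point, so the within-threshold case is handled explicitly.

```python
import math


def penalty(f_a: float, eps: float) -> float:
    """P(a): zero within the threshold, positive excess beyond it."""
    return max(0.0, f_a - eps)


def penalized_value(val_loss: float, f_a: float, eps: float, lam: float) -> float:
    """Validation loss plus lam * P(a); lower is better."""
    p = penalty(f_a, eps)
    if p == 0.0:
        # Scenario 1: skip the product, since inf * 0.0 would be NaN.
        return val_loss
    # Scenario 2: soft penalty for finite lam, hard rejection for lam = inf.
    return val_loss + lam * p


print(penalized_value(0.30, 18.0, 20.0, math.inf))  # within threshold
print(penalized_value(0.30, 22.0, 20.0, 0.01))      # soft penalty
print(penalized_value(0.30, 22.0, 20.0, math.inf))  # hard penalty
```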
[0041] S3. Embed the computational complexity penalty term into the single-objective bi-layer optimization objective function.
[0042] As shown in Figure 2, the NAS framework upon which this invention is based adopts a single-objective, two-layer optimization structure: the upper-layer optimizer is responsible for optimizing the target detection network architecture, and the lower-layer optimizer is responsible for optimizing all weight parameters under this architecture. The two run alternately and iteratively until a target detection model with excellent performance and a compact structure is obtained.
[0043] The computational complexity penalty term constructed in step S2 is embedded into the upper-level single-objective optimization objective function to construct the upper-level optimization objective function with the penalty term, specifically defined as:
$$\min_{a \in A}\; L_{val}\bigl(N(a, \omega^{*}(a))\bigr) + \lambda \cdot P(a), \qquad \omega^{*}(a) = \arg\min_{\omega} L_{train}\bigl(N(a, \omega)\bigr)$$
[0044] Here A is the search space constructed by the basic NAS method, N(a,ω) is the neural network model corresponding to candidate architecture a, ω is the network weight parameter, and ω*(a) is the optimal weight parameter obtained by training candidate architecture a on the training set. L_val is the detection loss on the validation set, L_train is the lower-level training loss, λ is the penalty coefficient, and P(a) is the computational complexity penalty term designed in this invention.
[0045] The lower-level optimization objective function uses a composite loss function that includes classification loss, regression loss, and confidence loss to optimize the network weights. The loss function is defined as follows:
$$L_{train} = \alpha L_{cls} + \beta L_{reg} + \gamma L_{conf}$$
where L_cls, L_reg, and L_conf are the classification loss, bounding box regression loss, and target confidence loss, and α, β, and γ are dynamically adjusted weighting coefficients that control the contribution ratios of the classification task, bounding box regression task, and target confidence task to the total loss, respectively.
[0046] By embedding penalty terms, the detection accuracy and computational complexity can be managed in a coordinated manner during the architecture optimization phase.
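The two-layer structure can be sketched as follows, as an illustration only. The training, validation, and complexity routines are passed in as callables because the source does not specify them, and all names and the unit default weights are hypothetical placeholders (the source says α, β, and γ are adjusted dynamically).

```python
from typing import Any, Callable


def composite_loss(l_cls: float, l_reg: float, l_conf: float,
                   alpha: float = 1.0, beta: float = 1.0,
                   gamma: float = 1.0) -> float:
    """Lower-level loss: weighted sum of the three detection losses."""
    return alpha * l_cls + beta * l_reg + gamma * l_conf


def upper_objective(arch: Any,
                    train_fn: Callable[[Any], Any],
                    val_loss_fn: Callable[[Any, Any], float],
                    complexity_fn: Callable[[Any], float],
                    eps: float, lam: float) -> float:
    """Upper-level value: validation loss plus lam times the excess complexity."""
    excess = max(0.0, complexity_fn(arch) - eps)
    weights = train_fn(arch)              # lower level: fit weights for this arch
    val = val_loss_fn(arch, weights)      # detection loss on the validation set
    return val if excess == 0.0 else val + lam * excess


# Toy stand-ins so the sketch runs end to end.
toy_arch = {"gflops": 25.0}
value = upper_objective(
    toy_arch,
    train_fn=lambda a: "trained-weights",
    val_loss_fn=lambda a, w: composite_loss(0.2, 0.2, 0.1),
    complexity_fn=lambda a: a["gflops"],
    eps=20.0, lam=2.0,
)
print(value)
```

In a real search the complexity check would normally run before training, so that over-threshold candidates never reach the lower level.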
[0047] S4. In the population initialization and encoding / decoding stages, the computational complexity statistics of candidate architectures are completed simultaneously.
[0048] In the population initialization phase, this step employs a hybrid initialization strategy that combines elite seeding with random generation: the initial population size is set to 20, of which 4 individuals are the architecture encodings of pre-trained high-performance models (YOLOv5s, YOLOv8s, TMNet-1, TMNet-2) and the remaining 16 individuals are randomly generated within the search space. This strategy accelerates population convergence and avoids starting the search from scratch.
[0049] As shown in Figure 3, the encoding and decoding of the candidate architecture is performed together with the computational complexity statistics. When each module is encoded, the parameter encoding information for the number of layers, width factor, and depth factor is embedded first. The encoding rules are: (1) Module type encoding: the four module types C3, C3X, C3SE and C3GH are encoded as 1, 2, 3 and 4 respectively; (2) Convolution kernel size encoding: the five kernel sizes 1×1, 3×3, 5×5, 7×7 and 9×9 are encoded as 1, 3, 5, 7 and 9 respectively; (3) Channel number encoding: the five channel configurations 32, 64, 128, 256 and 512 are encoded by their own values.
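A minimal sketch of the hybrid initialization, using the module, kernel, and channel codes listed above. Representing each module as a (module code, kernel size, channels) triple, the number of modules per individual, and the elite encodings are all assumptions; the elites here are placeholders and not the real encodings of YOLOv5s, YOLOv8s, TMNet-1, or TMNet-2.

```python
import random

MODULE_CODES = {"C3": 1, "C3X": 2, "C3SE": 3, "C3GH": 4}
KERNEL_SIZES = (1, 3, 5, 7, 9)
CHANNELS = (32, 64, 128, 256, 512)


def random_individual(num_modules: int, rng: random.Random) -> list:
    """One candidate: a list of (module code, kernel size, channels) genes."""
    codes = list(MODULE_CODES.values())
    return [(rng.choice(codes), rng.choice(KERNEL_SIZES), rng.choice(CHANNELS))
            for _ in range(num_modules)]


def init_population(elites: list, pop_size: int = 20,
                    num_modules: int = 4, seed: int = 0) -> list:
    """Elite seeds first, then random individuals up to pop_size."""
    if len(elites) > pop_size:
        raise ValueError("more elites than population slots")
    rng = random.Random(seed)
    population = [list(e) for e in elites]
    while len(population) < pop_size:
        population.append(random_individual(num_modules, rng))
    return population


# Placeholder elite encodings (hypothetical, for illustration only).
elites = [[(1, 3, 64)] * 4, [(1, 3, 128)] * 4, [(3, 5, 64)] * 4, [(4, 3, 256)] * 4]
population = init_population(elites)
print(len(population), population[4])
```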
[0050] For each candidate architecture, key parameters such as module type, convolutional kernel size, number of channels, network depth, and width are extracted after decoding. Then, based on the quantification metric determined in step S1, the actual computational complexity f(a) of each candidate architecture is calculated. If the FLOPs metric is used, the FLOPs value of the architecture is calculated through an operation-counting interface for the deep learning framework (such as the thop tool for PyTorch). If the latency metric is used, the inference latency estimate for the architecture is obtained by summing the latency values of all its operations from the pre-built hardware operation latency lookup table.
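Both complexity measures can be illustrated without any framework dependency. The sketch below assumes a decoded architecture is a list of (module code, kernel size, channels) keys, that the lookup table holds measured milliseconds per key, and that one multiply-add counts as two FLOPs. All three are assumptions, and the counting convention should match whatever the base NAS method uses. The table values are made up.

```python
def conv_flops(kernel: int, c_in: int, c_out: int, h_out: int, w_out: int) -> int:
    """FLOPs of one standard convolution (bias ignored).

    ASSUMPTION: a multiply-add is counted as 2 FLOPs.
    """
    return 2 * kernel * kernel * c_in * c_out * h_out * w_out


def estimate_latency_ms(arch: list, lut: dict) -> float:
    """Sum the measured per-operation latencies of a decoded architecture."""
    total = 0.0
    for op in arch:
        if op not in lut:
            raise KeyError(f"no measured latency for operation {op}")
        total += lut[op]
    return total


# Hypothetical lookup table: (module code, kernel size, channels) -> milliseconds.
lut = {(1, 3, 64): 0.5, (2, 5, 128): 1.25}
arch = [(1, 3, 64), (2, 5, 128), (1, 3, 64)]
print(estimate_latency_ms(arch, lut), conv_flops(3, 64, 128, 80, 80))
```

Summing per-operation latencies ignores effects such as operator fusion and memory transfers, so the result is an estimate rather than a measurement.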
[0051] This statistical analysis is performed synchronously during population initialization and after each offspring generation to ensure the real-time nature and accuracy of computational complexity information. This step outputs each candidate architecture and its corresponding actual computational complexity.
[0052] S5. Iterative selection and population update based on penalty terms, matching coarse search and fine training strategy.
[0053] In this step, during the population iteration and update phase, based on the upper-level objective function with a penalty term constructed in step S3 and the actual complexity of each candidate architecture calculated in step S4, the candidate architectures are evaluated and screened using a computational complexity penalty term. Simultaneously, this invention employs a two-stage training strategy of "coarse search and fine training," with the penalty term taking effect throughout the entire process. The specific rules are as follows: Coarse Search Phase (80 rounds of training in this embodiment): If f(a) > 1.2ε, the candidate architecture is directly excluded; if 1.0ε < f(a) ≤ 1.2ε, it is retained but with a reduced fitness (allowing slightly over-threshold architectures to participate in the competition); if f(a) ≤ ε, it is determined as a valid architecture and participates in the evolution normally. In this phase, high-complexity invalid architectures are quickly excluded through the penalty term, significantly reducing the computational overhead, accelerating the population convergence speed, and concentrating the search resources of the genetic algorithm on valid candidate architectures that meet the complexity constraints.
[0054] Fine Training Phase (300 rounds of training in this embodiment): At this time, the dynamic penalty coefficient λ → ∞, and the penalty term λ·P(a) → ∞, making the upper-layer objective function value tend to infinity, so that any candidate architecture with f(a) > ε is directly determined as an invalid individual and excluded. In this phase, full-scale training is performed on the preferred architectures that meet the complexity constraints screened out in the coarse search phase to fully optimize the detection accuracy and generalization ability of the model.
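The screening rules of the two phases reduce to a small decision function. This sketch uses the 1.2ε tolerance stated for the coarse search phase; the function name and the string labels are hypothetical.

```python
def screen(f_a: float, eps: float, stage: str, slack: float = 1.2) -> str:
    """Classify a candidate by its complexity f_a against the threshold eps."""
    if stage == "coarse":
        if f_a > slack * eps:
            return "reject"            # far over threshold: excluded directly
        if f_a > eps:
            return "keep_penalized"    # slightly over: competes with reduced fitness
        return "keep"                  # within threshold: evolves normally
    if stage == "fine":
        # Hard constraint: anything above eps is an invalid individual.
        return "reject" if f_a > eps else "keep"
    raise ValueError(f"unknown stage: {stage!r}")


for f_a in (18.0, 22.0, 25.0):
    print(f_a, screen(f_a, 20.0, "coarse"), screen(f_a, 20.0, "fine"))
```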
[0055] In each generation of population update, the above process is alternately iterated: The lower-layer optimizer trains the optimal weights for each candidate architecture in the current population, the upper-layer optimizer uses these weights to calculate the validation loss and evaluate the fitness in combination with the penalty term, and then genetic operations such as selection, simulated binary crossover (SBX), and Gaussian mutation are performed according to the fitness to generate a new generation of offspring population. The hyperparameters during the training process (including the initial learning rate, momentum, weight decay coefficient, loss coefficient, etc.) are kept consistent with the original settings of the basic NAS method and do not require additional tuning. Repeat this process until the preset termination condition (such as the maximum number of iteration rounds or fitness convergence) is reached. Finally, a high-precision object detection network architecture that meets the computational complexity constraints is output.
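The two variation operators named above can be sketched in their textbook forms. The distribution index eta, the mutation rate, and sigma are hypothetical values, and the genes are treated as real numbers; with the discrete codes used here, offspring would additionally need to be snapped back to the nearest allowed value, a step this sketch omits.

```python
import random


def sbx_crossover(p1: list, p2: list, eta: float = 15.0, rng=random) -> tuple:
    """Simulated binary crossover applied gene by gene."""
    c1, c2 = [], []
    for x1, x2 in zip(p1, p2):
        u = rng.random()  # u lies in [0, 1)
        if u <= 0.5:
            beta = (2.0 * u) ** (1.0 / (eta + 1.0))
        else:
            beta = (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta + 1.0))
        # Children are spread symmetrically around the parents' midpoint.
        c1.append(0.5 * ((1.0 + beta) * x1 + (1.0 - beta) * x2))
        c2.append(0.5 * ((1.0 - beta) * x1 + (1.0 + beta) * x2))
    return c1, c2


def gaussian_mutation(ind: list, sigma: float = 0.1, rate: float = 0.1,
                      rng=random) -> list:
    """Add zero-mean Gaussian noise to each gene with probability `rate`."""
    return [x + rng.gauss(0.0, sigma) if rng.random() < rate else x for x in ind]


rng = random.Random(0)
child_a, child_b = sbx_crossover([1.0, 2.0, 3.0], [3.0, 4.0, 5.0], rng=rng)
print(child_a, child_b)
print(gaussian_mutation(child_a, rng=rng))
```

A useful sanity check for SBX is that each pair of child genes has the same sum as the corresponding pair of parent genes.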
[0056] Through the above mechanism, as the number of iteration rounds increases, the proportion of high-complexity invalid architectures in the population gradually decreases, the proportion of valid architectures continues to increase, the computational overhead of the search process is significantly reduced, and the search efficiency is greatly improved.
[0057] Experimental Verification
Next, the performance of the computational complexity penalty term design method provided by the present invention is verified and analyzed through simulation experiments.
[0058] I. Experimental Datasets
In this embodiment, experiments are carried out on the Crack dataset, an object detection dataset based on crack annotation that contains a single category (cracks), with 879 training images, 80 validation images, and 37 test images. Its targets are tiny and irregular in shape, which makes detection relatively difficult and makes the dataset suitable for verifying the robustness of the model in small-object detection.
[0059] II. Experimental Settings
This embodiment uses YOLOv5s as the baseline network and NAS as the basic template for architecture search. The initial population size is set to 20, the computational complexity constraint threshold ε is set to 20G FLOPs, the coarse search phase is trained for 80 epochs, the fine training phase is trained for 300 epochs, the batch size is set to 16, the Adam optimizer (β1=0.9, β2=0.999) is used, the initial learning rate is set to 0.01, a cosine annealing learning rate decay strategy is adopted, and the L2 regularization coefficient is set to 0.0005.
[0060] III. Performance Comparison Experiment
The results of the two sets of experiments shown in Figure 4 are analyzed below.
[0061] Left figure (no penalty): 200 candidate architectures are evenly distributed across the entire horizontal axis range of 0–3.0 GPU days. 45 architectures exceeding the threshold (red dots, FLOPs>20G) appear randomly in various time periods, each undergoing a complete training and evaluation process, consuming the same amount of time as the effective architectures. The total search time covers the entire 0–3.0 GPU days interval.
[0062] The right figure (with penalty term): While searching 200 candidate architectures, all 46 architectures exceeding the threshold are concentrated in the extremely narrow range of 0–0.10 GPU days on the far left of the horizontal axis. This indicates that these architectures were immediately deemed invalid by the penalty term after their calculated FLOPs exceeded the threshold ε=20G, and weight training was skipped. The search time for the 154 effective architectures (green dots) is concentrated between 0 and 1.6 GPU days, compressing the overall search upper limit from 3.0 GPU days to approximately 1.6 GPU days.
[0063] Therefore, with the same total number of searched architectures, introducing the computational complexity penalty term cuts the evaluation time of each over-threshold architecture from a level comparable to that of ordinary architectures to less than 0.10 GPU days, and shortens the overall effective search time by approximately 47% (from 3.0 to about 1.6 GPU days). This verifies that the penalty term significantly improves NAS search efficiency through a "judgment-and-rejection" mechanism, concentrating computational resources on effective architectures that meet complexity constraints.
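The roughly 47% figure follows directly from the two search-time upper limits read from Figure 4. A quick check, using only the numbers quoted above:

```python
baseline_gpu_days = 3.0    # search upper limit without the penalty term
penalized_gpu_days = 1.6   # search upper limit with the penalty term

reduction = (baseline_gpu_days - penalized_gpu_days) / baseline_gpu_days
skipped_share = 46 / 200   # over-threshold architectures whose training was skipped

print(f"search time reduction: {reduction:.1%}")
print(f"share of candidates rejected before training: {skipped_share:.0%}")
```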
[0064]
[0065] This invention, by employing dynamic penalty strength, retains some high-precision architectures that slightly exceed the threshold. It accelerates convergence through elite initialization and, combined with the early rejection mechanism of dynamic penalty, significantly reduces the overall search time while effectively improving search efficiency and accuracy. Compared to other metrics, while using hardware latency as the quantification metric slightly increases FLOPs, the measured latency on the Jetson Nano is successfully reduced, truly achieving hardware-oriented lightweight design. Experiments show that the present invention improves mAP on the Crack dataset compared to existing NAS methods, reduces search time, and significantly reduces inference latency on Jetson Nano edge devices. Furthermore, the method is simple in form, has extremely low implementation cost, requires no modification to the core code of the original search framework, and can be directly embedded into existing NAS systems based on single-objective genetic algorithms. It has strong engineering practical value and is particularly suitable for resource-constrained applications such as industrial defect detection and edge device deployment.
[0066] As can be seen from the above embodiments, the present invention provides an improved method for designing a computational complexity penalty term for target detection neural architecture search. While fully retaining the core process of the original NAS framework, it introduces four major improvements: a device-aware dynamic threshold, an iteration-dependent dynamic penalty strength, hardware-aware complexity quantification, and elite population initialization. These address the inherent defects of existing methods, namely rigid fixed thresholds, the easy loss of high-precision architectures under global hard penalties, the distortion of FLOPs as a complexity indicator, and slow convergence.
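The device-aware threshold and the iteration-dependent penalty strength can be illustrated with a short sketch. The functional forms below are assumptions made for illustration only: the threshold is taken as margin × peak compute / target frame rate, the penalty term as the relative overshoot above the threshold, and the coefficient as `lam0 * exp(rate * iteration)`. The names `lam0`, `rate`, and `coarse_iters` are hypothetical.

```python
import math


def dynamic_threshold(peak_flops_per_s: float, target_fps: float, margin: float) -> float:
    """Per-frame FLOPs budget: margin * peak compute / frame rate (assumed form)."""
    if target_fps <= 0 or not 0 < margin <= 1:
        raise ValueError("target_fps must be > 0 and margin must be in (0, 1]")
    return margin * peak_flops_per_s / target_fps


def penalty_coefficient(iteration: int, coarse_iters: int,
                        lam0: float = 0.1, rate: float = 0.2) -> float:
    """Exponential growth in the coarse search phase, infinite afterwards."""
    if iteration >= coarse_iters:
        return math.inf  # fine training phase: hard constraint
    return lam0 * math.exp(rate * iteration)


def penalized_objective(val_loss: float, complexity: float,
                        threshold: float, lam: float) -> float:
    """Upper-level objective: validation loss + lam * penalty (lower is fitter)."""
    excess = max(0.0, complexity - threshold) / threshold  # assumed penalty form
    if excess == 0.0:
        return val_loss  # avoids inf * 0 = nan in the hard-constraint phase
    return val_loss + lam * excess


if __name__ == "__main__":
    # Hypothetical device figures, chosen only so the budget equals 20 GFLOPs.
    eps = dynamic_threshold(peak_flops_per_s=400e9, target_fps=10, margin=0.5)
    for it in (0, 5, 10):
        lam = penalty_coefficient(it, coarse_iters=10)
        print(it, lam, penalized_objective(0.30, 22e9, eps, lam))
```

With this shape, an architecture that overshoots the budget by 10% is only mildly penalized in early iterations and receives an infinite objective value once the fine training phase begins.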
[0067] Embodiment 2: This embodiment provides a computational complexity penalty term design system for target detection neural architecture search, including: a threshold generation module, used to determine the quantification index of computational complexity and to dynamically generate a complexity constraint threshold, serving as the dynamic threshold, based on the computing power information of the target deployment device; a penalty term construction module, used to construct, using the dynamic threshold, a mathematical model of the computational complexity penalty term containing a dynamic penalty coefficient, wherein the dynamic penalty coefficient increases according to a preset strategy during the search iterations; an objective function embedding module, used to embed the mathematical model of the computational complexity penalty term into the two-layer (bilevel) optimization objective function of the neural architecture search, forming an upper-level optimization objective function with a penalty term; an initialization and statistics module, used to decode candidate architectures and calculate their actual computational complexity during the population initialization phase; and an iterative screening and update module, used, during the population iteration and update phase, to dynamically evaluate and screen candidate architectures based on the upper-level objective function with the penalty term and their actual computational complexity, to update the population, and finally to output a high-precision target detection network architecture that meets the complexity constraint.
[0068] When the above system modules are executed, they implement the steps of the method described in Embodiment 1.
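The roles of the initialization and statistics module and of the iterative screening and update module can be sketched as below. This is a hedged illustration rather than the system's actual code: `init_population`, `screen`, and the use of a plain number as an architecture encoding are hypothetical, while the 1.2× allowance in the coarse search phase follows the value stated in claim 6.

```python
import random
from typing import Callable, List, Sequence, TypeVar

Arch = TypeVar("Arch")

TOLERANCE = 1.2  # coarse-phase allowance relative to the dynamic threshold


def init_population(elites: Sequence[Arch],
                    random_arch: Callable[[random.Random], Arch],
                    size: int, rng: random.Random) -> List[Arch]:
    """Hybrid initialization: keep the elite encodings, fill the rest randomly."""
    population = list(elites[:size])
    while len(population) < size:
        population.append(random_arch(rng))
    return population


def screen(population: Sequence[Arch], complexity: Callable[[Arch], float],
           threshold: float, fine_phase: bool) -> List[Arch]:
    """Keep candidates within 1.2x the threshold (coarse) or within it (fine)."""
    limit = threshold if fine_phase else TOLERANCE * threshold
    return [arch for arch in population if complexity(arch) <= limit]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy encoding (hypothetical): each architecture is just its GFLOPs value.
    population = init_population([18.0, 21.0], lambda r: r.uniform(5.0, 40.0), 6, rng)
    print(screen(population, lambda a: a, threshold=20.0, fine_phase=False))
    print(screen(population, lambda a: a, threshold=20.0, fine_phase=True))
```

The filter only decides which candidates remain in the population. In the coarse phase, the reduced fitness of candidates that slightly exceed the threshold comes from the penalty term in the upper-level objective, not from this function.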
[0069] Embodiment 3: This embodiment provides an electronic device including a memory and a processor. The memory stores a computer program, and when the processor runs the program, it executes the steps of the method described in Embodiment 1 to output a lightweight, high-precision target detection network model. Specifically, the electronic device can be a desktop computer, a portable computer, a smart mobile terminal, a server, etc. No limitation is made here; any electronic device that can implement the present invention falls within its scope of protection.
[0070] Embodiment 4: This embodiment provides a computer-readable storage medium storing a computer program. When the program is called by a processor, it executes the steps of the method described in Embodiment 1, which can be used for automated training and lightweight model generation in target detection neural architecture search.
[0071] Optionally, the computer-readable storage medium may be non-volatile memory (NVM), such as at least one disk storage device.
[0072] Optionally, the computer-readable storage medium may also be at least one storage device located remotely from the aforementioned processor.
[0073] The above description, in conjunction with specific preferred embodiments, provides a further detailed explanation of the present invention. It should not be construed that the specific implementation of the present invention is limited to these descriptions. For those skilled in the art, various simple deductions or substitutions can be made without departing from the concept of the present invention, and all such modifications and substitutions should be considered within the scope of protection of the present invention.
Claims
1. A method for designing a computational complexity penalty term for neural architecture search in object detection, characterized in that it includes the following steps:
Step S1: Determine the quantification index of computational complexity, and dynamically generate a complexity constraint threshold based on the computing power information of the target deployment device, as a dynamic threshold;
Step S2: Using the dynamic threshold, construct a mathematical model of the computational complexity penalty term that includes a dynamic penalty coefficient, wherein the dynamic penalty coefficient gradually increases from its initial value according to a preset strategy during the search iterations;
Step S3: Embed the mathematical model of the computational complexity penalty term into the two-layer optimization objective function of the neural architecture search to form an upper-level optimization objective function with a penalty term, which is used to simultaneously optimize detection accuracy and computational complexity in the architecture search;
Step S4: In the population initialization phase, decode each candidate architecture, calculate its actual computational complexity based on the quantification index, and output each candidate architecture together with its actual computational complexity;
Step S5: In the population iteration and update phase, based on the upper-level optimization objective function with the penalty term and the actual computational complexity of each candidate architecture, evaluate and screen the candidate architectures using the computational complexity penalty term; in the early stage of the search, candidate architectures whose actual complexity slightly exceeds the dynamic threshold are allowed to participate in evolution with reduced fitness; in the later stage of the search, candidate architectures whose actual complexity exceeds the dynamic threshold are removed, and the population is updated.
2. The method according to claim 1, characterized in that, in step S1, the quantification index of computational complexity includes the number of floating-point operations or the hardware inference latency; when the quantification index is the hardware inference latency, it is estimated using a pre-built operation-level latency lookup table for the target device.
3. The method according to claim 1, characterized in that, in step S1, the complexity constraint threshold is dynamically generated as follows: obtain the peak floating-point computing power and the target inference frame rate of the target deployment device, and calculate the dynamic threshold based on a preset time margin coefficient.
4. The method according to claim 1, characterized in that, in step S2, the dynamic penalty coefficient increases from its initial value according to an exponential growth strategy during the coarse search phase, and is set to infinity during the fine training phase to implement a hard constraint.
5. The method according to claim 1, characterized in that, in step S4, the population initialization adopts a hybrid strategy combining elite initialization and random generation, wherein the elite individuals are derived from pre-trained target detection model architectures.
6. The method according to claim 1, characterized in that, in step S5, the early stage of the search refers to the coarse search stage of the population iteration, in which candidate architectures whose actual computational complexity is within 1.2 times the dynamic threshold are allowed to participate in evolution; the later stage of the search refers to the fine training stage, in which the dynamic penalty coefficient is made to approach infinity and any candidate architecture whose actual computational complexity exceeds the dynamic threshold is hard-removed.
7. The method according to any one of claims 1-6, characterized in that, in step S3, the upper-level optimization objective function with the penalty term is specifically the sum of the detection loss on the validation set and the product of the dynamic penalty coefficient and the penalty term; the lower the value of this objective function, the higher the fitness of the candidate architecture. The two-layer optimization objective function also includes a lower-level optimization objective function, which is a composite loss function containing classification loss, regression loss, and confidence loss, and is used to optimize the network weight parameters under a given candidate architecture. The result of minimizing the lower-level optimization objective function provides the basis for computing the detection loss on the validation set.
8. A system for designing a computational complexity penalty term for neural architecture search in object detection, characterized in that it includes:
a threshold generation module, used to determine the quantification index of computational complexity and to dynamically generate a complexity constraint threshold, serving as the dynamic threshold, based on the computing power information of the target deployment device;
a penalty term construction module, used to construct, using the dynamic threshold, a mathematical model of the computational complexity penalty term containing a dynamic penalty coefficient, wherein the dynamic penalty coefficient increases according to a preset strategy during the search iterations;
an objective function embedding module, used to embed the mathematical model of the computational complexity penalty term into the two-layer optimization objective function of the neural architecture search, forming an upper-level optimization objective function with a penalty term;
an initialization and statistics module, used to decode candidate architectures and calculate their actual computational complexity during the population initialization phase; and
an iterative screening and update module, used, during the population iteration and update phase, to dynamically evaluate and screen candidate architectures based on the upper-level objective function with the penalty term and their actual computational complexity, to update the population, and finally to output a high-precision target detection network architecture that meets the complexity constraint.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, it implements the computational complexity penalty term design method for target detection neural architecture search as described in any one of claims 1 to 7.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the program is executed by a processor, it implements the steps of the computational complexity penalty term design method for target detection neural architecture search as described in any one of claims 1 to 7.