Lightweight target detection method, lightweight method and device of target detection model
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents (China)
- Current Assignee / Owner
- CHONGQING UNIV OF POSTS & TELECOMM
- Filing Date
- 2025-01-09
- Publication Date
- 2026-06-23

Figure CN119942137B_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of computer vision, and specifically relates to a lightweight object detection method and to a lightweight method and apparatus for object detection models.
Background Technology
[0002] Object detection is a core area of computer vision, concerned primarily with object localization and classification. As one of the fundamental problems in computer vision, it forms the basis for many other visual tasks, such as instance segmentation, image annotation, and object tracking. In recent years, the rapid development of deep learning has greatly advanced object detection techniques, leading to significant breakthroughs and widespread applications in fields such as autonomous driving, video surveillance, medical image analysis, and industrial inspection.
[0003] Before the advent of deep learning, traditional image processing relied on manual feature extraction, which required designing a feature extractor for each detection task based on prior knowledge. In complex scenes, the prior features matched real targets poorly, resulting in poor model robustness. As deep learning research advanced, convolutional neural network (CNN) models were increasingly applied to tasks such as image recognition, classification, and detection and became a major focus of research. CNN models offer stronger generalization and discrimination than manually extracted features and have become the mainstream technology in object detection.
[0004] However, as computer vision tasks grow more complex and convolutional neural networks continue to evolve, their structures become increasingly complex through recursion and reuse, with ever more network layers and parameters. This places ever-higher demands on computing and storage hardware. Although deep learning-based object detection algorithms perform well, their high computational cost makes it difficult for them to meet real-time requirements, especially on edge devices with limited computing power.
Summary of the Invention
[0005] To address the aforementioned issues, this invention provides a lightweight target detection method, a lightweight method for target detection models, and corresponding devices. This invention combines a lightweight feature extraction module with cross-layer ranking pruning, which can significantly reduce the number of model parameters and the computational load while maintaining detection accuracy, thereby increasing the detection frame rate and meeting the real-time detection needs of target detection models on edge devices.
[0006] In a first aspect, the present invention provides a lightweight target detection method, the method comprising:
[0007] Acquire the target image;
[0008] The target image is input into the backbone network of the target detection model to obtain the target feature information of the target image; the backbone network adopts the lightweight feature extraction module PGELAN; the lightweight feature extraction module PGELAN includes two convolutional layers and multiple lightweight convolutional modules PGConv; after the first convolutional layer, multiple lightweight convolutional modules PGConv are connected in progressively decreasing order, and after the multiple lightweight convolutional modules PGConv, a second convolutional layer is connected;
[0009] The target image is input into the neck network of the target detection model to obtain the target fusion information of the target image;
[0010] The target image is input into the head network of the target detection model to obtain the target object information of the target image.
[0011] In a second aspect, the present invention also provides a lightweight method for object detection models, the method comprising:
[0012] Obtain an object detection model comprising a backbone network, a neck network, and a head network. The backbone network employs a lightweight feature extraction module, PGELAN, which includes two convolutional layers and multiple lightweight convolutional modules PGConv: the first convolutional layer is followed by multiple lightweight convolutional modules PGConv in progressively decreasing order, and the multiple PGConv modules are followed by a second convolutional layer. The backbone network, neck network, and head network each include multiple modules; each module includes multiple convolutional layers, each convolutional layer includes multiple filters, each filter includes multiple convolutional kernels, and each convolutional kernel includes multiple weight parameters.
[0013] The first importance of each filter in the current module is calculated based on the values of all filters in the current module and the average filter weight value of the convolutional layer.
[0014] Based on the global pruning rate of the current module, the filters with the lowest first importance are removed through soft pruning;
[0015] Determine the size of the convolution kernel of the removed filter, as well as the number of input and output channels of the convolutional layer in which it is located;
[0016] The number of weight parameters reduced after soft pruning is calculated based on the size of the convolution kernel of the removed filter and the number of input and output channels of the convolutional layer.
[0017] The pruning rate of each convolutional layer in the current module is calculated based on whether the filters of each convolutional layer have been removed and the number of filters in each convolutional layer.
[0018] Based on the difference in feature maps of the current module before and after soft pruning, the second importance of each filter in the current module that was soft pruned is calculated.
[0019] Based on the pruning rate of each convolutional layer in the current module, filters with lower second importance are removed through hard pruning.
[0020] In a third aspect, the present invention also provides a lightweight target detection device, the device comprising:
[0021] The first acquisition module is used to acquire the target image;
[0022] The first processing unit is used to input the target image into the backbone network of the target detection model to obtain the target feature information of the target image; the backbone network adopts the lightweight feature extraction module PGELAN; the lightweight feature extraction module PGELAN includes two convolutional layers and multiple lightweight convolutional modules PGConv; the first convolutional layer is followed by multiple lightweight convolutional modules PGConv in progressively decreasing order, and the multiple lightweight convolutional modules PGConv are followed by a second convolutional layer;
[0023] The second processing unit is used to input the target image into the neck network of the target detection model to obtain the target fusion information of the target image;
[0024] The third processing unit is used to input the target image into the head network of the target detection model to obtain the target object information of the target image.
[0025] In a fourth aspect, the present invention also provides a lightweight device for a target detection model, the device comprising:
[0026] The second acquisition module is used to acquire the target detection model, which includes a backbone network, a neck network, and a head network. The backbone network uses a lightweight feature extraction module PGELAN. The lightweight feature extraction module PGELAN includes two convolutional layers and multiple lightweight convolutional modules PGConv. After the first convolutional layer, multiple lightweight convolutional modules PGConv are connected in progressively decreasing order. After the multiple lightweight convolutional modules PGConv, a second convolutional layer is connected. The backbone network, neck network, and head network each include multiple modules, each module includes multiple convolutional layers, each convolutional layer includes multiple filters, each filter includes multiple convolutional kernels, and each convolutional kernel includes multiple weight parameters.
[0027] The first calculation unit is used to calculate the first importance of each filter in the current module based on the values of all filters in the current module and the average filter weight value of the convolutional layer.
[0028] The first pruning unit is used to remove, through soft pruning and based on the global pruning rate of the current module, the filters with the lowest first importance.
[0029] The second calculation unit is used to determine the size of the convolution kernel of the removed filter and the number of input channels and output channels of the convolutional layer in which it is located.
[0030] The third calculation unit is used to calculate the number of weight parameters reduced after soft pruning based on the size of the convolution kernel of the removed filter and the number of input and output channels of the convolutional layer.
[0031] The fourth calculation unit is used to calculate the pruning rate of each convolutional layer in the current module based on whether the filters of each convolutional layer in the current module have been removed and the number of filters in each convolutional layer.
[0032] The fifth calculation unit is used to calculate the second importance of each filter that has been soft pruned in the current module based on the difference in feature maps before and after soft pruning.
[0033] The second pruning unit is used to remove, through hard pruning and based on the pruning rate of each convolutional layer in the current module, the filters with lower second importance.
[0034] The beneficial effects of this invention are:
[0035] In the lightweight object detection method and apparatus of this invention, the feature extraction module of a standard object detection model is replaced with the lightweight feature extraction module PGELAN. PGELAN uses the lightweight convolution PGConv, which reduces the number of parameters while outputting convolutional features from different receptive fields to obtain richer feature information, thereby maintaining strong feature extraction capability. The lightweight method and apparatus for object detection models of this invention prune the optimized object detection model using a global filter-weight cross-layer ranking algorithm and a minimum-norm-change filter selection algorithm. With only a slight decrease in accuracy, redundant features are substantially pruned, reducing the number of model parameters and the computational overhead and significantly improving the detection efficiency of the object detection model.
Attached Figure Description
[0036] Figure 1 is a flowchart of the lightweight target detection method according to an embodiment of the present invention;
[0037] Figure 2 is a schematic diagram of the target detection model structure according to an embodiment of the present invention;
[0038] Figure 3 is a schematic diagram of the structure of the PGELAN lightweight feature extraction module according to an embodiment of the present invention;
[0039] Figure 4 is a schematic diagram of the structure of the lightweight convolution PGConv according to an embodiment of the present invention;
[0040] Figure 5 is a flowchart of the lightweight method for a target detection model according to an embodiment of the present invention;
[0041] Figure 6 is a schematic diagram of the lightweight target detection device according to an embodiment of the present invention;
[0042] Figure 7 is a schematic diagram of the structure of the lightweight device for a target detection model according to an embodiment of the present invention.
Detailed Implementation
[0043] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0044] The lightweight target detection method and target detection model provided in this application can be implemented based on Artificial Intelligence (AI) technology. Artificial intelligence is the theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology within computer science that attempts to understand the essence of intelligence and produce a new type of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, enabling them to have perception, reasoning, and decision-making functions. Artificial intelligence technology is a comprehensive discipline involving a wide range of fields, including both hardware and software technologies. Basic AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing technology, operating / interactive systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision technology, speech processing technology, natural language processing technology, and machine learning / deep learning.
[0045] To facilitate understanding of the embodiments of this application, the following takes an image target detection task as an example to describe in detail the specific implementation of the lightweight target detection method, the lightweight method for target detection models, and the apparatus of the present invention.
[0046] Figure 1 is a flowchart of a lightweight target detection method according to an embodiment of the present invention. As shown in Figure 1, the method includes:
[0047] 101. Obtain the target image;
[0048] The target image is an image containing information about the target object to be detected. Target images can come from many different fields and acquisition methods. For example, in security monitoring, target images typically originate from widely distributed surveillance cameras, and each target image may contain important information about the target object, such as its physical characteristics, actions, and movement trajectory. In medical image processing, target images can be image data acquired through medical imaging equipment such as X-ray, CT, and MRI scanners. These images present the internal structure and physiological condition of the human body in a visually intuitive way, and each pixel may carry important information about the target object, such as a patient's internal lesions or organ conditions.
[0049] 102. Input the target image into the backbone network of the target detection model to obtain the target feature information of the target image; the backbone network adopts the lightweight feature extraction module PGELAN; the lightweight feature extraction module PGELAN includes two convolutional layers and multiple lightweight convolutional modules PGConv; after the first convolutional layer, multiple lightweight convolutional modules PGConv are connected in progressively decreasing order, and after the multiple lightweight convolutional modules PGConv, a second convolutional layer is connected;
[0050] In embodiments of the present invention, as shown in Figure 2, the target detection model can adopt the standard network structure of the YOLO series of target detection algorithms, which includes a backbone network, a neck network, and a head network. Taking YOLOv7-tiny as an example, the backbone network contains four feature extraction modules and three downsampling modules, which extract features at different scales and generate multi-scale feature maps. This feature information is then fused in the neck network and head network to obtain the final target detection result. The neck network and head network contain four feature extraction modules, two upsampling modules, and one detection module, which extract deeper feature information and thereby improve detection performance.
[0051] The backbone network performs deep feature extraction on the input image, capturing feature information at various levels, from basic texture and color distribution to more abstract object contours and structures, which provides a foundation for subsequent target detection. The neck network further processes and fuses the features extracted by the backbone network, taking feature information at different levels into account. It combines and adjusts features from different stages of the backbone network to generate richer, more comprehensive feature representations with stronger semantic expressive power, providing higher-quality input for the head network to detect targets accurately. Based on the feature information provided by the backbone and neck networks, the head network completes the target detection task, predicting key information such as each target's category, location, and size. It also applies techniques such as the non-maximum suppression (NMS) algorithm to remove duplicate detection results, ensuring that each target is detected only once.
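The duplicate-removal step attributed to the head network can be illustrated with a minimal, class-agnostic NMS sketch in Python. This is a generic textbook version, not the patent's implementation; the box format (x1, y1, x2, y2), the function names, and the default IoU threshold of 0.45 are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.45):
    """Class-agnostic greedy NMS; returns indices of the boxes that are kept."""
    order = sorted(range(len(boxes)), key=lambda k: scores[k], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # highest-scoring remaining box
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        order = [k for k in order if iou(boxes[best], boxes[k]) < iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

Here the second box is suppressed because its IoU with the first is about 0.68. Production YOLO pipelines usually run NMS per class on batched tensors rather than on Python lists.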
[0052] Because the feature extraction modules of a traditional backbone network perform redundant computation when processing complex image information, this invention replaces the feature extraction modules in the backbone network with the lightweight feature extraction module PGELAN. PGELAN captures the key feature information in an image in less time, so compared with the original module it can process more image data in the same amount of time, improving the feature extraction efficiency of the whole model and providing more timely feature support for subsequent tasks such as object detection and classification.
[0053] Figure 3 is a schematic diagram of the lightweight feature extraction module PGELAN according to an embodiment of the present invention. As shown in Figure 3, the lightweight feature extraction module PGELAN includes two convolutional layers and multiple lightweight convolutional modules PGConv. The first convolutional layer is followed by multiple progressively smaller lightweight convolutional modules PGConv, and these PGConv modules are followed by a second convolutional layer. The input to this module has size W×H×Cin, where W×H is the feature map size and Cin is the number of channels. The module contains four convolutional operations. First, Conv1, with a kernel size of 1, halves the number of channels to C_ = Cin/2 output channels; it integrates input information, reduces dimensionality, and extracts shallow features. The feature map is then processed sequentially by two lightweight convolutional modules: PGConv2 has a kernel size of 3 and a receptive field of 3, and PGConv3 has a kernel size of 3 and a receptive field of 5, so that the different receptive fields yield multi-scale feature information. The outputs of Conv1, PGConv2, and PGConv3 are then fused by a Concat operation into a fused feature map, which serves as the input to Conv4. Conv4 has a kernel size of 1 and Cout output channels; by linearly combining the features of different channels, it learns the interrelationships between channels and thereby integrates information across channels.
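The channel bookkeeping through PGELAN can be traced with a small Python sketch. Beyond what the text states, it assumes that each PGConv module keeps its input width C_ (so the Concat output has 3·C_ channels), and it ignores biases and normalization layers; the function name is hypothetical.

```python
def pgelan_channels(c_in: int, c_out: int) -> dict:
    """Trace channel widths and 1x1 weight counts through PGELAN (assumed layout)."""
    c_ = c_in // 2  # Conv1 (1x1) halves the channels: C_ = Cin/2
    trace = {
        "conv1": c_,
        "pgconv2": c_,  # assumption: PGConv preserves its input width
        "pgconv3": c_,
    }
    # Concat fuses the outputs of Conv1, PGConv2 and PGConv3 along the channel axis.
    trace["concat"] = trace["conv1"] + trace["pgconv2"] + trace["pgconv3"]
    trace["conv4"] = c_out  # Conv4 (1x1) maps the fused map to Cout channels
    # Weight counts of the two 1x1 convolutions (kernel area is 1).
    trace["conv1_weights"] = c_in * c_
    trace["conv4_weights"] = trace["concat"] * c_out
    return trace


print(pgelan_channels(128, 128))
```

With Cin = Cout = 128, the fused map has 192 channels, and the two 1×1 convolutions hold 8,192 and 24,576 weights respectively under these assumptions.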
[0054] Figure 4 is a schematic diagram of the lightweight convolution PGConv structure according to an embodiment of the present invention. As shown in Figure 4, PGConv consists of three convolutions. The input feature map has size W×H×Cin. First, convolution Conv1, with a kernel size of 3, halves the number of channels; it extracts the main channel features and integrates information. The result then passes through two partial convolutions (PConv): PConv2 has a kernel size of 3 and a receptive field of 5, and PConv3 has a kernel size of 3 and a receptive field of 7. The outputs of PConv2 and PConv3 are added, and the sum is concatenated with the output of Conv1 (Concat operation) to form the final output feature map. PGConv thus reduces the number of parameters while enlarging the receptive field of some channels, yielding richer hierarchical feature representations.
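To see why PGConv saves parameters, the following Python sketch counts convolution weights under stated assumptions: Conv1 is a 3×3 convolution from Cin to Cin/2 channels, PConv2 and PConv3 each apply a 3×3 convolution to a fraction `ratio` of those Cin/2 channels (0.25, borrowing the preset ratio from the example in [0061]), the PGConv output has Cin channels, and biases and normalization are ignored. The baseline, an ordinary 3×3 convolution from Cin to Cin channels, is likewise an illustrative choice, and both function names are hypothetical.

```python
def pgconv_weights(c_in: int, k: int = 3, ratio: float = 0.25) -> int:
    """Weight count of PGConv under the assumed layout (no bias, no BN)."""
    c_half = c_in // 2
    conv1 = k * k * c_in * c_half        # Conv1: Cin -> Cin/2, k x k
    c_part = int(c_half * ratio)         # channels touched by each partial conv
    pconv = k * k * c_part * c_part      # PConv2 / PConv3: c_part -> c_part
    return conv1 + 2 * pconv


def standard_conv_weights(c_in: int, c_out: int, k: int = 3) -> int:
    """Weight count of an ordinary k x k convolution (no bias)."""
    return k * k * c_in * c_out


print(pgconv_weights(64), standard_conv_weights(64, 64))  # 19584 36864
```

With Cin = 64 the sketch gives 19,584 weights for PGConv versus 36,864 for the ordinary 3×3 convolution, roughly half, while the two stacked partial convolutions still widen the receptive field of part of the channels.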
[0055] Based on the aforementioned lightweight feature extraction module PGELAN and the lightweight convolution PGConv, step 102 may include:
[0056] The first feature map of the target image is extracted using the first convolutional layer;
[0057] Multiple lightweight convolutional modules, PGConv, are used to extract multiple second feature maps corresponding to the first feature map of the target image.
[0058] In this embodiment, the first feature map or the second feature map of the target image is split into a first channel feature map and a second channel feature map according to the number of input channels, based on a preset ratio; the channel features of the first channel feature map are extracted using the current lightweight convolution module PGConv; and the channel features of the first channel feature map are concatenated with the second channel feature map to obtain one of the second feature maps of the target image.
[0059] The first feature map and the multiple second feature maps of the target image are concatenated using the Concat operation to obtain the third feature map of the target image.
[0060] The fourth feature map corresponding to the third feature map of the target image is extracted using a second convolutional layer; the fourth feature map is used to indicate the target feature information of the target image.
[0061] It should be understood that the first channel feature map and the second channel feature map of the target image can be the same size or different sizes. For example, when the preset ratio is 0.25 and the first feature map to be split has 64 channels, the first 16 channels are convolved using 16 filters to obtain the first channel feature map, and the remaining 48 channels are used directly as the second channel feature map. The first channel feature map and the second channel feature map are then concatenated to obtain one of the second feature maps. A second feature map is split in the same way, which is not described again in this embodiment.
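The split, convolve, and concatenate step in [0058] and [0061] follows the idea of a partial convolution. A minimal NumPy sketch reproducing the 64-channel, ratio 0.25 example follows; the naive `conv2d_same` helper (stride 1, zero padding, no bias), the random weights, and both function names are illustrative assumptions, not the patent's trained layers.

```python
import numpy as np


def conv2d_same(x, w):
    """Naive stride-1 convolution (cross-correlation) with zero 'same' padding, no bias.

    x: input of shape (C_in, H, W); w: filters of shape (C_out, C_in, k, k), k odd.
    """
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # zero padding on H and W
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for dy in range(k):
        for dx in range(k):
            patch = xp[:, dy:dy + h, dx:dx + wd]  # shifted view, (C_in, H, W)
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], patch)
    return out


def partial_conv(x, w, ratio=0.25):
    """Convolve the first `ratio` of the channels and pass the rest through unchanged."""
    c_part = int(x.shape[0] * ratio)
    assert w.shape[:2] == (c_part, c_part), "filters must map c_part -> c_part channels"
    first = conv2d_same(x[:c_part], w)  # first channel feature map
    second = x[c_part:]                 # second channel feature map, untouched
    return np.concatenate([first, second], axis=0)


rng = np.random.default_rng(0)
x = rng.standard_normal((64, 8, 8))      # 64-channel input feature map
w = rng.standard_normal((16, 16, 3, 3))  # 16 filters over the first 16 channels
y = partial_conv(x, w)
print(y.shape)  # (64, 8, 8)
```

Only 16 × 16 × 3 × 3 = 2,304 weights are used here, compared with 36,864 for a full 3×3 convolution over all 64 channels.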
[0062] In a preferred embodiment of the present invention, the activation function used in the lightweight feature extraction module PGELAN is the HardSwish function. The performance of HardSwish is similar to that of SiLU, but because it uses piecewise linear calculation, it avoids the sigmoid operation in SiLU. Therefore, the calculation is simpler, faster, and has a lower memory footprint, making it more suitable for edge devices and embedded systems.
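The difference between the two activations can be made concrete in a few lines of Python. HardSwish is commonly defined as x·ReLU6(x + 3)/6 and SiLU as x·sigmoid(x); the function names below are illustrative.

```python
import math


def hardswish(x: float) -> float:
    """HardSwish: x * ReLU6(x + 3) / 6, a piecewise linear gate with no exponential."""
    return x * min(max(x + 3.0, 0.0), 6.0) / 6.0


def silu(x: float) -> float:
    """SiLU (Swish): x * sigmoid(x), which needs an exponential."""
    return x / (1.0 + math.exp(-x))


for v in (-4.0, -1.0, 0.0, 1.0, 2.0, 4.0):
    print(f"x={v:+.1f}  hardswish={hardswish(v):+.4f}  silu={silu(v):+.4f}")
```

Both functions are 0 at x = 0. For x ≤ -3 HardSwish is exactly 0 and for x ≥ 3 it equals x, while in between it tracks SiLU closely (about 1.67 versus 1.76 at x = 2).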
[0063] 103. Input the target image into the neck network of the target detection model to obtain the target fusion information of the target image;
[0064] In this embodiment of the invention, the neck network of the target detection model can be a common structure including PANFPN, SPPCSPC, Elan-w, upsample, and so on; this embodiment does not specifically limit it. In this embodiment, the neck network is used to further fuse the target feature information extracted by the backbone network of the target detection model to obtain the target fusion information of the target image.
[0065] 104. Input the target image into the head network of the target detection model to obtain the target object information of the target image.
[0066] In this embodiment of the invention, the head network of the target detection model can be a common multi-branch head that uses 1×1 convolutions to produce outputs for the corresponding number of categories; this invention does not specifically limit it. In this embodiment, the head network is used to further classify the target fusion information produced by the neck network of the target detection model to obtain the target object information of the target image.
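As a point of reference, an anchor-based YOLOv7-style detection head applies a 1×1 convolution at each scale whose output channel count depends on the number of categories. The sketch below assumes 3 anchors per scale and 5 box terms (4 coordinates plus 1 objectness score) per anchor; both are assumptions about the concrete head, which the text leaves open, and the function name is hypothetical.

```python
def head_out_channels(num_classes: int, anchors_per_scale: int = 3) -> int:
    """Output channels of a 1x1 detection conv in an anchor-based YOLO-style head."""
    box_terms = 5  # 4 box coordinates + 1 objectness score per anchor
    return anchors_per_scale * (box_terms + num_classes)


print(head_out_channels(80))  # 255, e.g. for COCO's 80 categories
```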
[0067] This invention employs a lightweight feature extraction module, PGELAN, in the backbone network of the target detection model. This reduces the number of parameters while outputting convolutional features from different receptive fields, thereby obtaining richer feature information and maintaining strong feature extraction capabilities.
[0068] Figure 5 is a flowchart of a lightweight method for target detection models according to an embodiment of the present invention. As shown in Figure 5, the method includes:
[0069] 201. Obtain an object detection model, which includes a backbone network, a neck network, and a head network. The backbone network uses a lightweight feature extraction module PGELAN. The lightweight feature extraction module PGELAN includes two convolutional layers and multiple lightweight convolutional modules PGConv. After the first convolutional layer, multiple lightweight convolutional modules PGConv are connected in progressively decreasing order. After the multiple lightweight convolutional modules PGConv, a second convolutional layer is connected. The backbone network, neck network, and head network each include multiple modules, each module includes multiple convolutional layers, each convolutional layer includes multiple filters, each filter includes multiple convolutional kernels, and each convolutional kernel includes multiple weight parameters.
[0070] In this embodiment of the invention, the target detection model is the same as the one used in the lightweight target detection method. Because the target detection model contains many redundant parameters, this embodiment prunes the filters in the target detection model so that the pruned model is lightweight.
[0071] 202. Based on the values of all filters in the current module and the average filter weight value of the convolutional layer, calculate the first importance of each filter in the current module;
[0072] In this embodiment of the invention, for each filter j of each convolutional layer i in the target detection model, the first importance is determined using the following formula:
[0073]
[0074] Here, the first importance of filter j in convolutional layer i is computed from the original weight values of filter j in convolutional layer i and from AvgW_i, the average filter weight of convolutional layer i. Including AvgW_i means the importance reflects not only the magnitude of the weights but also their proportion relative to the layer in which they reside, which makes the importance estimate more accurate.
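Because the formula in [0073] is not reproduced in this text, the Python sketch below is only one plausible reading: it assumes the first importance of a filter is its L1 weight norm divided by AvgW_i, taken here as the mean L1 norm of the filters in layer i. The norm choice, this definition of AvgW_i, the `eps` guard, and the function name are all assumptions.

```python
import numpy as np


def first_importance(layer_weights, eps=1e-12):
    """Layer-normalized filter importance (assumed form).

    layer_weights: list of arrays, one per conv layer, each shaped
    (n_filters, c_in, k, k). Returns one importance array per layer.
    """
    scores = []
    for w in layer_weights:
        norms = np.abs(w).reshape(w.shape[0], -1).sum(axis=1)  # L1 norm of each filter
        avg_w = norms.mean()                                   # assumed AvgW_i
        scores.append(norms / (avg_w + eps))                   # relative to the layer
    return scores


w = np.stack([np.ones((2, 3, 3)), 2.0 * np.ones((2, 3, 3))])  # two filters, 2 input channels
print(first_importance([w])[0])  # approximately [0.6667, 1.3333]
```

Dividing by a per-layer average puts filters from layers with very different weight scales on a common footing, which is what makes a single global ranking across layers meaningful.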
[0075] 203. Based on the global pruning rate of the current module, remove the filters with the lowest first importance through soft pruning;
[0076] In this embodiment, the first importance values of the filters are sorted globally, and the filters with lower importance are removed through soft pruning, that is, their weights are reset to 0. This process is repeated until all convolutional layers in the current module have undergone soft pruning. Hard pruning is then performed to remove the least important filters.
204. Determine the size of the convolutional kernel of the removed filters and the number of input and output channels of the convolutional layer in which they reside;
[0077] In this embodiment of the invention, when a filter is removed by soft pruning, the size of the corresponding convolutional kernel, as well as the number of input and output channels of the convolutional layer containing the filter, can be determined. This information can be used to calculate the number of weight parameters.
[0078] 205. Based on the size of the convolution kernel of the removed filter and the number of input and output channels of the convolutional layer, calculate the number of weight parameters reduced after soft pruning.
[0079] In this embodiment of the invention, the calculation method for the reduction in the number of weight parameters after soft pruning includes:
[0080] The first weight-parameter count is calculated by multiplying the square of the kernel size of the removed filter by the number of input channels of the convolutional layer in which it is located.
[0081] The second weight-parameter count is calculated by multiplying the square of the kernel size of the next convolutional layer by the number of output channels of that next layer, because each of its filters loses the weights that read the removed channel.
[0082] The number of weight parameters reduced after soft pruning is the sum of the first and second weight-parameter counts.
[0083] For example, the number of parameters that can be removed after the current filter is pruned is expressed as:
[0084] $\text{Params} = C^{in}_{i} \times K_{i} \times K_{i} + C^{out}_{i+1} \times K_{i+1} \times K_{i+1}$
[0085] where $C^{in}_{i} \times K_{i} \times K_{i}$ is the first weight-parameter count and $C^{out}_{i+1} \times K_{i+1} \times K_{i+1}$ is the second weight-parameter count; $C^{in}_{i}$ is the number of input channels of convolutional layer i, in which the removed filter is located; $C^{out}_{i+1}$ is the number of output channels of the following convolutional layer i+1, which consumes the output channel of the removed filter; $K_{i}$ is the kernel size of convolutional layer i; and $K_{i+1}$ is the kernel size of convolutional layer i+1.
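The parameter count in [0084] translates directly into code. This Python sketch ignores biases and batch-normalization parameters, and its name is hypothetical.

```python
def params_reduced(cin_i: int, k_i: int, cout_next: int, k_next: int) -> int:
    """Weights saved by pruning one filter of layer i (biases and BN ignored)."""
    first = cin_i * k_i * k_i              # the removed filter's own weights in layer i
    second = cout_next * k_next * k_next   # weights in layer i+1 that read the removed channel
    return first + second


print(params_reduced(64, 3, 128, 3))  # 1728
```

For a removed 3×3 filter in a layer with 64 input channels, followed by a 3×3 layer with 128 output channels, 64·9 + 128·9 = 1,728 weights are saved.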
[0086] Since the embodiments of the present invention consider the current module as a whole, filters from different convolutional layers in the same module can be removed across layers. This can significantly reduce redundant features, thereby reducing the number of model parameters and computational overhead, and significantly improving the detection efficiency of the target detection model.
[0087] It should be understood that the global pruning rate is the proportion of parameters removed from the entire network during pruning, relative to the total number of parameters in the original network. If the original object detection model has N parameters in total and soft pruning removes M parameters, the global pruning rate P is M/N. In this embodiment, steps 202-205 are executed repeatedly, removing filters with smaller weights and thus continuously removing parameters, until the number of parameters reduced by the soft-pruned filters meets the parameter count corresponding to the global pruning rate, at which point the process stops.
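The stopping rule in steps 202-205 can be sketched as a greedy pass over a global ranking. The Python sketch assumes that each filter's first importance and its parameter saving (from the Params formula) have already been computed; `FilterInfo` and `select_soft_pruned` are hypothetical names, and the pass only records which filters would be zeroed rather than modifying any weights.

```python
from dataclasses import dataclass


@dataclass
class FilterInfo:
    layer: int         # index i of the convolutional layer in the module
    index: int         # index j of the filter within layer i
    importance: float  # first importance
    params: int        # weights saved if this filter is pruned (Params formula)


def select_soft_pruned(filters, total_params, global_rate):
    """Rank filters across all layers and pick the least important ones
    until the parameter saving reaches global_rate * total_params."""
    budget = global_rate * total_params
    removed, reduced = [], 0
    for item in sorted(filters, key=lambda item: item.importance):  # ascending importance
        if reduced >= budget:
            break
        removed.append((item.layer, item.index))
        reduced += item.params
    return removed, reduced


filters = [FilterInfo(0, 0, 0.5, 10), FilterInfo(0, 1, 0.1, 20),
           FilterInfo(1, 0, 0.9, 10), FilterInfo(1, 1, 0.3, 20)]
print(select_soft_pruned(filters, total_params=100, global_rate=0.3))  # ([(0, 1), (1, 1)], 40)
```

Because the ranking spans all layers of the module, one layer may lose several filters while another loses none, which is the cross-layer behavior described in [0086].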
[0088] The global pruning rate can be preset according to actual conditions. For example, if the total number of parameters in the original object detection model is 10M and the global pruning rate is set to 30%, filters with smaller weights are removed through steps 202-205 until the number of parameters reduced by the soft-pruned filters reaches 3M, at which point soft pruning can be stopped.
206. Based on whether the filters in each convolutional layer of the current module have been removed and the number of filters in each convolutional layer, calculate the pruning rate of each convolutional layer in the current module;
[0089] In this embodiment of the invention, the pruning rate of each convolutional layer in the current module is calculated as follows:
[0090] The first indicator count is calculated from whether the filters of each convolutional layer in the current module have been removed;
[0091] The second indicator count is calculated from the number of filters in each convolutional layer of the current module;
[0092] The pruning rate of each convolutional layer in the current module is the ratio of the first indicator count to the second indicator count.
[0093] For example, the pruning rate $p_i$ of convolutional layer $i$ in the current module is calculated as:
[0094] $$p_i = \frac{1}{n_i}\sum_{j=1}^{n_i} \mathbb{1}\left(F_{i,j}\right)$$
[0095] where $\mathbb{1}(\cdot)$ is an indicator function that equals 1 if filter $F_{i,j}$ (filter $j$ of convolutional layer $i$) was removed in the global sort and 0 otherwise, and $n_i$ is the number of filters in convolutional layer $i$. The number of removed filters among the $n_i$ filters is the first indicator, and the number of filters $n_i$ in convolutional layer $i$ is the second indicator. The ratio of the first indicator to the second indicator gives the pruning rate of each convolutional layer in the current module.
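The pruning-rate formula can be written as a short Python function. `layer_pruning_rates` is a hypothetical name. For illustration, removed filters are represented as (layer, filter) index pairs, and the number of filters per layer is given as a dictionary.

```python
from typing import Dict, Set, Tuple


def layer_pruning_rates(removed: Set[Tuple[int, int]],
                        filters_per_layer: Dict[int, int]) -> Dict[int, float]:
    """p_i = (number of removed filters in layer i) / n_i."""
    rates = {}
    for layer, n_i in filters_per_layer.items():
        # first indicator count: sum of the indicator function over the layer's filters
        first = sum(1 for j in range(n_i) if (layer, j) in removed)
        # second indicator count: the number of filters n_i in the layer
        rates[layer] = first / n_i
    return rates


print(layer_pruning_rates({(0, 1), (1, 0), (1, 1)}, {0: 2, 1: 3}))
# {0: 0.5, 1: 0.6666666666666666}
```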
[0096] 207. Based on the difference in feature maps of the current module before and after soft pruning, calculate the second importance of each filter in the current module that has been soft pruned;
[0097] In this embodiment of the invention, the calculation method for the second importance of each soft-pruned filter includes:
[0098] The input image is passed through the current module before soft pruning to obtain the first feature map;
[0099] The input image is passed through the current module after soft pruning to obtain the second feature map;
[0100] The second importance of each filter that has been soft-pruned is obtained based on the similarity distance between the first feature map and the second feature map.
[0101] For example, for each module, the feature map $\mathrm{Fea}_o$ produced by the current module is computed first. Then, for each filter $j$ of convolutional layer $i$, working upward from the bottom of the module, the filter is soft-pruned by setting its parameters to 0, and the feature map produced by the current module is recomputed as $\mathrm{Fea}_{i,j}$. The magnitude of the change in the feature map is used as the second importance of the current filter, calculated as:
[0102] $$S_{i,j} = D\left(\mathrm{Fea}_o, \mathrm{Fea}_{i,j}\right)$$
[0103] where $S_{i,j}$ is the second importance of the soft-pruned filter $j$ in convolutional layer $i$, and $D(\cdot,\cdot)$ is a distance function. The Manhattan, Euclidean, Chebyshev, or Minkowski distance can be used.
[0104] For example, suppose the current module consists of convolutional layers 20 through 25 and the soft-pruned filter is the first filter of the 20th convolutional layer. The input image passes through convolutional layers 1 through 19 to produce the output feature map of the 19th layer. Before that filter is soft-pruned, passing this feature map through convolutional layers 20 through 25 yields the feature map Feao. After the filter is soft-pruned, passing the same feature map through layers 20 through 25 yields a different feature map, because the contribution of the first filter of the 20th layer is now missing. The distance between Feao and this new feature map therefore reflects the second importance of the soft-pruned first filter of the 20th convolutional layer.
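Below is a toy Python sketch of the second-importance computation. It assumes each module is a chain of 1x1 convolutions with ReLU evaluated at a single spatial position, so every feature map is a plain list of channel values. Following the layer-20 example, each candidate filter is zeroed in isolation, and the result is compared with the unpruned module output (Fea_o). All function names are hypothetical. The Euclidean distance is the default here, but it is only one of the distance functions the text allows.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Layer = List[Vector]          # layer[j] = weights of filter j, one per input channel


def layer_forward(layer: Layer, x: Sequence[float]) -> Vector:
    # 1x1 convolution at a single spatial position followed by ReLU
    return [max(0.0, sum(w * xi for w, xi in zip(f, x))) for f in layer]


def module_forward(layers: List[Layer], x: Sequence[float]) -> Vector:
    for layer in layers:
        x = layer_forward(layer, x)
    return list(x)


def minkowski(a: Sequence[float], b: Sequence[float], p: float) -> float:
    # p = 1 gives the Manhattan distance, p = 2 the Euclidean distance
    return sum(abs(u - v) ** p for u, v in zip(a, b)) ** (1.0 / p)


def chebyshev(a: Sequence[float], b: Sequence[float]) -> float:
    return max(abs(u - v) for u, v in zip(a, b))


def second_importance(
    layers: List[Layer],
    x: Sequence[float],
    candidates: List[Tuple[int, int]],
    dist: Callable[[Sequence[float], Sequence[float]], float] = lambda a, b: minkowski(a, b, 2),
) -> Dict[Tuple[int, int], float]:
    fea_o = module_forward(layers, x)                          # output before soft pruning
    scores = {}
    for i, j in candidates:
        pruned = [[list(f) for f in layer] for layer in layers]  # copy the weights
        pruned[i][j] = [0.0] * len(pruned[i][j])                 # soft prune filter j of layer i
        scores[(i, j)] = dist(fea_o, module_forward(pruned, x))  # S_ij = D(Fea_o, Fea_ij)
    return scores


layers = [[[1.0, 0.0], [0.0, 1.0]],   # layer 0: two filters (identity mapping)
          [[1.0, 1.0]]]               # layer 1: one filter summing both channels
print(second_importance(layers, [1.0, 2.0], [(0, 0), (0, 1)]))
# {(0, 0): 1.0, (0, 1): 2.0}
```

Filter (0, 1) carries the larger input value, so zeroing it moves the module output further and it receives the higher second importance.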
[0105] 208. Based on the pruning rate of each convolutional layer in the current module, remove the filters with the lowest second importance through hard pruning.
[0106] This embodiment can globally sort the second importance of filters and remove filters with lower second importance through hard pruning.
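One possible reading of step 208 is sketched below in Python with nested lists. In layer i, round(p_i × n_i) filters with the lowest second importance are physically deleted, together with the matching input-channel slices of layer i+1. The selection rule, the rounding, and the assumption that every filter of the layer has a second-importance score are illustrative choices, not details confirmed by the text. `hard_prune_layer`, `zeros`, and `count` are hypothetical helpers.

```python
from typing import List, Sequence, Tuple


def zeros(*shape: int):
    """Nested-list tensor of zeros, e.g. zeros(C_out, C_in, K, K)."""
    if len(shape) == 1:
        return [0.0] * shape[0]
    return [zeros(*shape[1:]) for _ in range(shape[0])]


def count(t) -> int:
    """Total number of scalar weights in a nested-list tensor."""
    return sum(count(x) for x in t) if isinstance(t, list) else 1


def hard_prune_layer(w_i: list, w_next: list, scores: Sequence[float],
                     rate: float) -> Tuple[list, list, List[int]]:
    """Delete round(rate * n_i) filters of layer i with the lowest second importance.

    w_i:    layer i weights, shape [C_out_i][C_in_i][K_i][K_i]
    w_next: layer i+1 weights, shape [C_out_next][C_out_i][K_next][K_next]
    """
    n_i = len(w_i)
    k = round(rate * n_i)                                  # filters to delete (assumed rounding)
    drop = sorted(sorted(range(n_i), key=lambda j: scores[j])[:k])
    keep = [j for j in range(n_i) if j not in drop]
    new_w_i = [w_i[j] for j in keep]                       # remove the filters themselves
    new_w_next = [[f[c] for c in keep] for f in w_next]    # remove their input channels in layer i+1
    return new_w_i, new_w_next, drop


w_i, w_next = zeros(4, 3, 3, 3), zeros(2, 4, 3, 3)       # 108 and 72 weights
new_i, new_next, dropped = hard_prune_layer(w_i, w_next, [0.7, 0.1, 0.9, 0.3], rate=0.5)
print(dropped, count(w_i) + count(w_next) - count(new_i) - count(new_next))
# [1, 3] 90
```

The 90 removed weights equal 2 × (3·3² + 2·3²), which matches the per-filter count in [0085]. Unlike soft pruning, the tensors actually shrink, so the savings are realized at inference time.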
[0107] This invention combines soft and hard pruning to reduce the computational cost involved in pruning operations; it can significantly prune redundant features, thereby reducing the number of model parameters and computational overhead, and significantly improving the detection efficiency of the target detection model.
[0108] Figure 6 is a schematic diagram of the lightweight target detection device according to an embodiment of the present invention. As shown in Figure 6, the device includes:
[0109] The first acquisition module 301 is used to acquire the target image;
[0110] The first processing unit 302 is used to input the target image into the backbone network of the target detection model to obtain the target feature information of the target image; the backbone network adopts the lightweight feature extraction module PGELAN; the lightweight feature extraction module PGELAN includes two convolutional layers and multiple lightweight convolutional modules PGConv; after the first convolutional layer, multiple lightweight convolutional modules PGConv are connected in progressively decreasing order, and after the multiple lightweight convolutional modules PGConv, a second convolutional layer is connected;
[0111] The second processing unit 303 is used to input the target image into the neck network of the target detection model to obtain the target fusion information of the target image;
[0112] The third processing unit 304 is used to input the target image into the head network of the target detection model to obtain the target object information of the target image.
[0113] Figure 7 is a schematic diagram of the structure of the lightweight device for the target detection model according to an embodiment of the present invention. As shown in Figure 7, the device includes:
[0114] The second acquisition module 401 is used to acquire an object detection model, which includes a backbone network, a neck network, and a head network. The backbone network uses a lightweight feature extraction module PGELAN. The lightweight feature extraction module PGELAN includes two convolutional layers and multiple lightweight convolutional modules PGConv. After the first convolutional layer, multiple lightweight convolutional modules PGConv are connected in progressively decreasing order. After the multiple lightweight convolutional modules PGConv, a second convolutional layer is connected. The backbone network, neck network, and head network each include multiple modules, each module includes multiple convolutional layers, each convolutional layer includes multiple filters, each filter includes multiple convolutional kernels, and each convolutional kernel includes multiple weight parameters.
[0115] The first calculation unit 402 is used to calculate the first importance of each filter in the current module based on the values of all filters in the current module and the average filter weight value of the convolutional layer.
[0116] The first pruning unit 403 is used to remove, by soft pruning, the filters with lower first importance based on the global pruning rate of the current module.
[0117] The second calculation unit 404 is used to determine the size of the convolution kernel of the removed filter and the number of input channels and output channels of the convolution layer.
[0118] The third calculation unit 405 is used to calculate the number of weight parameters reduced after soft pruning based on the size of the convolution kernel of the removed filter and the number of input channels and output channels of the convolutional layer.
[0119] The fourth calculation unit 406 is used to calculate the pruning rate of each convolutional layer in the current module based on whether the filters of each convolutional layer in the current module have been removed and the number of filters in each convolutional layer.
[0120] The fifth calculation unit 407 is used to calculate the second importance of each filter that has been soft pruned in the current module based on the difference in feature maps before and after soft pruning.
[0121] The second pruning unit 408 is used to remove, by hard pruning, the filters with lower second importance based on the pruning rate of each convolutional layer in the current module.
[0122] Those skilled in the art will understand that all or part of the steps in the various methods of the above embodiments can be implemented by a program instructing related hardware. The program can be stored in a computer-readable storage medium, which may include ROM, RAM, disk, or optical disk, etc.
[0123] Although embodiments of the invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.
Claims
1. A lightweight target detection method, characterized in that, The method includes: Acquire the target image; The target image is input into the backbone network of the target detection model to obtain the target feature information of the target image; the backbone network adopts the lightweight feature extraction module PGELAN; the lightweight feature extraction module PGELAN includes two convolutional layers and multiple lightweight convolutional modules PGConv; the first convolutional layer is followed by multiple lightweight convolutional modules PGConv connected in progressively decreasing order, and the multiple lightweight convolutional modules PGConv are followed by a second convolutional layer; the process of inputting the target image into the backbone network of the target detection model to obtain the target feature information of the target image includes: The first feature map of the target image is extracted using the first convolutional layer; Extracting multiple second feature maps corresponding to the first feature map of the target image using the series of progressively decreasing lightweight convolutional modules PGConv includes: According to a preset ratio of the number of input channels, the first feature map or a second feature map of the target image is split into a first channel feature map and a second channel feature map; The channel features of the first channel feature map are extracted using the current lightweight convolutional module PGConv and are concatenated with the second channel feature map to obtain one of the second feature maps of the target image; The first feature map and the multiple second feature maps of the target image are concatenated using the concat operation to obtain the third feature map of the target image.
A second convolutional layer is used to extract a fourth feature map corresponding to the third feature map of the target image; the fourth feature map indicates the target feature information of the target image. PGConv consists of three convolutions. First, Conv1, with a kernel size of 3, halves the number of channels and is used to extract the main channel features and integrate information. The result then passes through two partial convolutions (PConv): PConv2 has a kernel size of 3 and a receptive field of 5, and PConv3 has a kernel size of 3 and a receptive field of 7. The outputs of PConv2 and PConv3 are added together and fused with the output of Conv1 through the Concat operation to form the final output feature map. The target image is input into the neck network of the target detection model to obtain the target fusion information of the target image; The target image is input into the head network of the target detection model to obtain the target object information of the target image.
2. The lightweight target detection method according to claim 1, characterized in that, The lightweight convolutional module PGConv uses the HardSwish activation function.
3. A lightweight method for object detection models, characterized in that, The method includes: An object detection model is obtained, comprising a backbone network, a neck network, and a head network. The backbone network employs a lightweight feature extraction module, PGELAN. The PGELAN module includes two convolutional layers and multiple lightweight convolutional modules, PGConv. The first convolutional layer is followed by a series of progressively decreasing lightweight convolutional modules, PGConv, and then by a second convolutional layer. The backbone network, neck network, and head network each comprise multiple modules, each module containing multiple convolutional layers, and each convolutional layer containing multiple... Each filter contains multiple convolutional kernels, and each kernel contains multiple weight parameters. PGConv consists of three convolutions. First, Conv1, with a kernel size of 3, halves the number of channels and is used to extract the main channel features and integrate information. The result then passes through two partial convolutions, PConv2 and PConv3: PConv2 has a kernel size of 3 and a receptive field of 5, while PConv3 has a kernel size of 3 and a receptive field of 7. The outputs of PConv2 and PConv3 are added together and fused with the output of Conv1 through a Concat operation to form the final output feature map. The first importance of each filter in the current module is calculated based on the values of all filters in the current module and the average filter weight value of the convolutional layer. Based on the global pruning rate of the current module, the filters with the lowest first importance are removed through soft pruning; The kernel size of the removed filter and the number of input and output channels of the convolutional layer in which it is located are determined.
The number of weight parameters reduced after soft pruning is calculated based on the size of the convolution kernel of the removed filter and the number of input and output channels of the convolutional layer. The pruning rate of each convolutional layer in the current module is calculated based on whether the filters of each convolutional layer have been removed and the number of filters in each convolutional layer. Based on the difference in feature maps of the current module before and after soft pruning, the second importance of each filter in the current module that was soft pruned is calculated. Based on the pruning rate of each convolutional layer in the current module, filters with lower second importance are removed through hard pruning.
4. The lightweight method for a target detection model according to claim 3, characterized in that, The calculation method for the reduction in the number of weight parameters after soft pruning includes: The number of first weight parameters is calculated by multiplying the square of the convolution kernel size of the removed filter by the number of input channels of the convolutional layer. The number of second weight parameters is calculated by multiplying the square of the convolution kernel size of the removed filter by the number of output channels of the convolutional layer. The number of weight parameters reduced after soft pruning is the sum of the number of first weight parameters and the number of second weight parameters.
5. A lightweight method for a target detection model according to claim 3, characterized in that, The calculation method for the pruning rate of each convolutional layer in the current module includes: The first indicator count is calculated based on whether the filters of each convolutional layer in the current module have been removed; The second indicator count is calculated based on the number of filters in each convolutional layer of the current module; The pruning rate of each convolutional layer in the current module is calculated as the ratio of the first indicator count to the second indicator count.
6. A lightweight target detection device, characterized in that, The apparatus is used to perform the lightweight target detection method according to any one of claims 1 to 2, the apparatus comprising: The first acquisition module is used to acquire the target image; The first processing unit is used to input the target image into the backbone network of the target detection model to obtain the target feature information of the target image; the backbone network adopts the lightweight feature extraction module PGELAN; the lightweight feature extraction module PGELAN includes two convolutional layers and multiple lightweight convolutional modules PGConv; the first convolutional layer is followed by multiple lightweight convolutional modules PGConv in progressively decreasing order, and the multiple lightweight convolutional modules PGConv are followed by a second convolutional layer; The second processing unit is used to input the target image into the neck network of the target detection model to obtain the target fusion information of the target image; The third processing unit is used to input the target image into the head network of the target detection model to obtain the target object information of the target image.
7. A lightweight device for a target detection model, characterized in that, The apparatus is used to perform a lightweight method for the target detection model according to any one of claims 3 to 5, the apparatus comprising: The second acquisition module is used to acquire the target detection model, which includes a backbone network, a neck network, and a head network. The backbone network uses a lightweight feature extraction module PGELAN. The lightweight feature extraction module PGELAN includes two convolutional layers and multiple lightweight convolutional modules PGConv. After the first convolutional layer, multiple lightweight convolutional modules PGConv are connected in progressively decreasing order. After the multiple lightweight convolutional modules PGConv, a second convolutional layer is connected. The backbone network, neck network, and head network each include multiple modules, each module includes multiple convolutional layers, each convolutional layer includes multiple filters, each filter includes multiple convolutional kernels, and each convolutional kernel includes multiple weight parameters. The first calculation unit is used to calculate the first importance of each filter in the current module based on the values of all filters in the current module and the average filter weight value of the convolutional layer. The first pruning unit is used to remove, by soft pruning, the filters with lower first importance based on the global pruning rate of the current module. The second calculation unit is used to determine the size of the convolution kernel of the removed filter and the number of input channels and output channels of the convolutional layer in which it is located. The third calculation unit is used to calculate the number of weight parameters reduced after soft pruning based on the size of the convolution kernel of the removed filter and the number of input and output channels of the convolutional layer.
The fourth calculation unit is used to calculate the pruning rate of each convolutional layer in the current module based on whether the filters of each convolutional layer in the current module have been removed and the number of filters in each convolutional layer. The fifth calculation unit is used to calculate the second importance of each filter that has been soft pruned in the current module based on the difference in feature maps before and after soft pruning. The second pruning unit is used to remove, by hard pruning, the filters with the lowest second importance based on the pruning rate of each convolutional layer in the current module.
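The PGConv layout recited in claims 1 and 3 can be sanity-checked with a small Python sketch. It assumes the three convolutions are chained (Conv1, then PConv2, then PConv3) with stride 1 and no dilation. Under that assumption, stacking 3×3 kernels reproduces the stated receptive fields of 5 and 7. The channel bookkeeping further assumes that the partial convolutions preserve their input width and that the input channel count is even. The stage names come from the claims; the functions are hypothetical.

```python
from typing import Dict, List, Tuple

# (stage name, kernel size, name of the stage it consumes): sequential chain assumed
PGCONV_STAGES: List[Tuple[str, int, str]] = [
    ("Conv1", 3, "input"),
    ("PConv2", 3, "Conv1"),
    ("PConv3", 3, "PConv2"),
]


def receptive_fields(stages: List[Tuple[str, int, str]]) -> Dict[str, int]:
    """Receptive field of each stage for stride-1, dilation-1 convolutions."""
    rf = {"input": 1}
    for name, k, src in stages:
        rf[name] = rf[src] + (k - 1)   # each 3x3 stage widens the field by 2 pixels
    return rf


def pgconv_out_channels(c: int) -> int:
    """Channel count after PGConv for c input channels (c even, width-preserving PConvs)."""
    conv1 = c // 2                 # Conv1 halves the channel count
    summed = conv1                 # PConv2 + PConv3: element-wise sum of equal-width outputs
    return conv1 + summed          # Concat with the Conv1 output along the channel axis


print(receptive_fields(PGCONV_STAGES))  # {'input': 1, 'Conv1': 3, 'PConv2': 5, 'PConv3': 7}
print(pgconv_out_channels(64))          # 64
```

That the chained reading yields exactly 5 and 7 is some evidence for it. A parallel arrangement of two 3×3 convolutions on the Conv1 output would give 5 for both.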