Deep convolutional model hybrid compression method and device based on multi-agent cooperation

By constructing a dependency graph between layers of a deep convolutional model and performing reinforcement learning for multi-agent collaboration, a target compression action is generated, which solves the problem of poor compression performance of deep convolutional models and enables efficient deployment and performance optimization on edge devices.

CN122242612A · Pending · Publication Date: 2026-06-19 · CHONGQING JIAOTONG UNIV

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Application (China)
Current Assignee / Owner: CHONGQING JIAOTONG UNIV
Filing Date: 2026-03-16
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Existing techniques struggle to compress deep convolutional models effectively, which makes the models difficult to deploy on resource-constrained edge devices. The main reason is that these techniques ignore the complex coupling and dependency relationships within the neural network, resulting in low compression efficiency.

Method used

A dependency graph is constructed between the layers of a deep convolutional model, and the layers are grouped based on the graph. Agents are assigned to the groups and trained with multi-agent cooperative reinforcement learning to generate target compression actions. Pruning and quantization are then applied to produce a lightweight model, whose performance is further optimized through multi-teacher distillation.

Benefits of technology

The method compresses deep convolutional models efficiently while preserving their structural integrity and inference capability. It reduces the number of parameters and the computational cost, retains the model's feature extraction capability and accuracy as far as possible, and makes the model suitable for deployment on resource-constrained edge devices.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention provides a method and apparatus for hybrid compression of deep convolutional models based on multi-agent collaboration, relating to the field of artificial intelligence technology. The method includes: acquiring the deep convolutional model to be compressed; constructing a dependency graph between its layers and dividing the layers into multiple dependency groups; assigning agents to the dependency groups and performing reinforcement learning to obtain reinforced agents; generating target compression actions based on the reinforced agents; pruning and quantizing the deep convolutional model to generate a target lightweight model; and performing multi-teacher knowledge distillation on the target lightweight model. By grouping layers according to the dependency graph, training one agent per dependency group, compressing the model with the resulting target compression actions, and distilling the compressed model from multiple teachers, the method and apparatus achieve joint optimization of channel pruning and weight quantization, thereby improving the efficiency of model compression.

Description

Technical Field

[0001] This invention relates to the field of artificial intelligence technology, and in particular to a method and apparatus for hybrid compression of deep convolutional models based on multi-agent collaboration.

Background Technology

[0002] Convolutional neural networks have achieved remarkable success in artificial intelligence fields such as computer vision and natural language processing. As task complexity and performance requirements increase, the depth and width of network models continue to grow, leading to an exponential increase in the number of parameters and floating-point operations. However, in practical applications, the large amount of computing resources consumed and storage requirements make it difficult to directly deploy deep network models on resource-constrained edge devices (such as mobile terminals and embedded devices).

[0003] To address the aforementioned model deployment challenges, existing technologies primarily employ model compression techniques to reduce network parameters and computational load. Mainstream methods include network pruning and quantization. Network pruning sparsifies the model by removing redundant connections or channels; quantization reduces the model size by lowering the numerical precision of network weights and activation values. In current practice, structured pruning is typically used to suit general-purpose hardware acceleration, or post-training quantization and quantization-aware training are used to further compress the model. However, existing research is mostly limited to a single method and relies on manual intervention, so it cannot automatically determine the optimal sparsity and quantization bit width. Furthermore, the complex coupling relationships within the neural network are ignored during pruning, making it difficult to achieve optimal model compression results.

[0004] Therefore, how to improve the compression efficiency of deep convolutional models and ensure the performance of the compressed models has become a technical problem that the industry urgently needs to solve.

Summary of the Invention

[0005] This invention provides a hybrid compression method and apparatus for deep convolutional models based on multi-agent collaboration, which addresses the shortcomings of existing technologies in achieving optimal compression results for deep convolutional models and improves the compression efficiency of deep convolutional models.

[0006] This invention provides a hybrid compression method for deep convolutional models based on multi-agent cooperation, comprising: Obtain the deep convolutional model to be compressed, construct the dependency graph between the layers of the deep convolutional model, and group the layers based on the dependency graph to obtain multiple dependency groups; Based on the dependency groups, agents are assigned, and reinforcement learning is performed on the agents to obtain reinforced agents; Based on each reinforced agent, a target compression action is generated for the dependency group corresponding to that reinforced agent; Based on the target compression actions, the deep convolutional model is pruned and quantized to generate a target lightweight model.

[0007] In some embodiments, performing reinforcement learning on the agent to obtain a reinforced agent includes: Each agent is controlled to observe its own state and the channel pruning rate information of all other agents, and to generate a compression action for the corresponding dependency group; the compression action includes the channel pruning rate parameter and the weight quantization bit width parameter. Based on the compression action, the deep convolutional model is compressed, and the performance feedback index of the compressed deep convolutional model is obtained. The agent's policy network is updated based on the performance feedback metrics until the agent's output meets the preset convergence condition, thus obtaining the reinforced agent.

[0008] In some embodiments, the method further includes: Obtain multiple pre-trained teacher models; Based on the multiple pre-trained teacher models, the target lightweight model is trained by distillation.

[0009] In some embodiments, the distillation training of the target lightweight model based on the plurality of pre-trained teacher models includes: Obtain a training set; the training set includes input samples and the corresponding ground truth labels of the input samples; The input sample is input into the target lightweight model to obtain the student output probability distribution and the student intermediate layer feature map; The input samples are input into each teacher model to obtain the teacher output probability distribution and teacher intermediate layer feature map corresponding to each teacher model; Based on the student output probability distribution, determine the student loss of the target lightweight model on the current training samples; Based on the student output probability distribution, the teacher output probability distribution, the student intermediate layer feature map, and the teacher intermediate layer feature map, the first distillation loss and the second distillation loss are determined; Based on the student loss, the first distillation loss, and the second distillation loss, a total loss function is constructed. The parameters of the target lightweight model are updated based on the total loss function.

[0010] In some embodiments, the first distillation loss is determined based on the following steps: Based on the teacher output probability distribution, determine the prediction distribution entropy of each teacher model for the current training sample; Based on the student loss and the predicted distribution entropy, calculate the distillation weights of each teacher model in the current training batch; The teacher output probability distributions corresponding to each teacher model are weighted and fused according to the distillation weights to obtain the target teacher distribution, and the first distillation loss between the student output probability distribution and the target teacher distribution is calculated.
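The three steps above (per-teacher entropy, distillation weights, weighted fusion followed by a loss against the student) can be illustrated with a minimal sketch. The passage does not state how the weights are computed from the student loss and the entropies, nor which divergence is used, so the sketch makes two assumptions that are not taken from the source: the weights are a softmax over negative entropies with the student loss acting as a temperature, and the first distillation loss is a KL divergence. All function names are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy of one teacher's predicted distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def teacher_weights(teacher_probs, student_loss):
    """Distillation weight per teacher.

    Assumption (not from the source): softmax over negative entropies,
    with (1 + student_loss) used as a temperature.
    """
    temperature = 1.0 + student_loss
    scores = [-entropy(p) / temperature for p in teacher_probs]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(teacher_probs, weights):
    """Weighted fusion of the teacher distributions into one target distribution."""
    classes = len(teacher_probs[0])
    return [sum(w * p[c] for w, p in zip(weights, teacher_probs))
            for c in range(classes)]

def kl_divergence(target, student):
    """KL(target || student); the choice of KL is an assumption."""
    return sum(t * math.log(t / s) for t, s in zip(target, student) if t > 0.0)

# Two hypothetical teachers and one student on a 3-class sample (true class 0).
teachers = [[0.9, 0.05, 0.05], [0.4, 0.3, 0.3]]
student = [0.6, 0.2, 0.2]
student_loss = -math.log(student[0])  # cross-entropy against the true label

weights = teacher_weights(teachers, student_loss)
target = fuse(teachers, weights)
first_loss = kl_divergence(target, student)
```

Under these assumptions the more confident (lower-entropy) teacher receives the larger weight, and the fused target remains a valid probability distribution because the weights sum to one.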

[0011] In some embodiments, the second distillation loss is determined based on the following steps: Global average pooling is performed on the student intermediate layer feature map to obtain the student feature vector; The student feature vector is input into the classifier of each teacher model to obtain the mapping output of the student features in each teacher's classification space; Based on the mapping output and the ground truth label, calculate the feature adaptability loss; Based on the feature adaptability loss, calculate the hierarchical confidence of each teacher model for the student intermediate layer feature map; Calculate the variance of the student intermediate layer feature map, and determine the feature importance weights based on the variance; The second distillation loss is calculated based on the hierarchical confidence and the feature importance weights.

[0012] This invention provides a hybrid compression device for deep convolutional models based on multi-agent cooperation, comprising: The grouping module is used to obtain the deep convolutional model to be compressed, construct the dependency graph between the layers of the deep convolutional model, and group the layers based on the dependency graph to obtain multiple dependency groups. The reinforcement module is used to allocate agents based on the dependency group, perform reinforcement learning on the agents, and obtain reinforced agents. The decision module is used to generate a target compression action for the dependency group corresponding to the reinforced agent based on the reinforced agent. The compression module is used to prune and quantize the deep convolutional model based on the target compression action to generate the target lightweight model.

[0013] The present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and running on the processor, wherein the processor executes the computer program to implement the hybrid compression method of deep convolutional models based on multi-agent cooperation.

[0014] The present invention provides a non-transitory computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the hybrid compression method for deep convolutional models based on multi-agent cooperation.

[0015] The present invention also provides a computer program product, including a computer program that, when executed by a processor, implements the hybrid compression method for deep convolutional models based on multi-agent cooperation.

[0016] The present invention provides a hybrid compression method and apparatus for deep convolutional models based on multi-agent collaboration. By constructing a dependency graph between layers of the deep convolutional model and grouping the layers based on the dependency graph, the method automatically identifies complex inter-layer coupling relationships in the neural network (such as residual connections and concatenation). This ensures that layers with parameter dependencies are placed in the same group for unified processing, avoids network structure breakage or dimension mismatch caused by local pruning, and guarantees the structural integrity and inference capability of the compressed model. Assigning agents to the dependency groups and performing reinforcement learning decomposes the large and complex problem of searching for compression parameters across the entire network into multiple low-dimensional local optimization sub-problems, which reduces the optimization cost. This reduces the complexity of the search space, overcomes the difficulty a single agent has in converging over a high-dimensional action space, and enables automated search for compression strategies without extensive human intervention. Generating target compression actions with the reinforced agents and then performing pruning and quantization achieves joint optimization of channel pruning (structural simplification) and weight quantization (precision adjustment). The compressed model is further optimized through knowledge distillation. Together, these steps significantly reduce the number of model parameters and the computational cost while preserving the model's feature extraction capability and accuracy as far as possible. The result is a lightweight target model suitable for deployment on resource-constrained edge devices, with improved model compression efficiency.

Description of the Drawings

[0017] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with the invention and, together with the description, serve to explain the principles of the invention.

[0018] To more clearly illustrate the technical solutions in this invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.

[0019] Figure 1 is a flowchart of the hybrid compression method for deep convolutional models based on multi-agent collaboration provided by the present invention.

[0020] Figure 2 is a schematic structural diagram of the hybrid compression device for deep convolutional models based on multi-agent cooperation provided by the present invention.

[0021] Figure 3 is a schematic structural diagram of the electronic device provided by the present invention.

Detailed Description

[0022] To enable those skilled in the art to better understand the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention.

[0023] It should be noted that the terms "first," "second," etc., used in this invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that comprises a series of steps, units, or modules is not necessarily limited to those explicitly listed, but may include other steps, units, or modules not explicitly listed or inherent to such processes, methods, products, or devices.

[0024] Figure 1 is a flowchart of the hybrid compression method for deep convolutional models based on multi-agent collaboration provided by the present invention. As shown in Figure 1, the method includes steps 110, 120, 130 and 140.

[0025] Step 110: Obtain the deep convolutional model to be compressed, construct the dependency graph between each layer of the deep convolutional model, and group the layers based on the dependency graph to obtain multiple dependency groups.

[0026] Specifically, the execution entity of the multi-agent collaborative deep convolutional model hybrid compression method provided in this embodiment of the invention is a multi-agent collaborative deep convolutional model hybrid compression device. This device can be implemented in software, such as a multi-agent collaborative deep convolutional model hybrid compression program running on a computer; or it can be implemented in hardware, such as a computer or server that executes the multi-agent collaborative deep convolutional model hybrid compression method.

[0027] The hybrid compression method for deep convolutional models based on multi-agent collaboration provided in this invention adopts the idea of centralized training and decentralized execution to perform model compression based on multi-agent reinforcement learning (MARL). Through cooperation between the agents, high-precision model compression is achieved without destroying the dependencies within the network.

[0028] In the first step, the network to be compressed is grouped according to the dependency graph, and an agent is assigned to each group. In the second step, each agent obtains its own observation and reward from the environment and then makes a pruning and quantization decision (its action). After the compression action is performed, the result is fed back to the environment. During training, the agents communicate with a central controller, which scores the actions performed and feeds the scores back to the corresponding agents to help them optimize their compression strategies.

[0029] In the third step, multiple rounds of iterative exploration are performed, ultimately yielding a high-performance, lightweight network.

[0030] After compression with the above framework, the performance of the network model can be recovered quickly by fine-tuning the compressed model with a simple stochastic gradient descent (SGD) optimizer combined with a cosine annealing learning rate scheduler.

A dependency graph is a directed graph structure or matrix representation used to characterize the input-output coupling relationships between layers of a deep neural network.

[0031] A dependency group is an independent decision-making unit: a set of network layers with strong coupling dependencies, identified by connectivity analysis of the dependency graph.

[0032] In this embodiment of the invention, the deep convolutional model to be compressed can be a pre-trained convolutional neural network. Because deep convolutional neural networks have complex inter-layer connection structures (such as residual connections and skip connections), directly pruning a single layer independently often leads to network structure destruction or tensor dimension mismatch. Therefore, before compression, it is necessary to first analyze the network's topology.

[0033] The process of constructing a dependency graph involves identifying inter-layer and intra-layer dependencies in the network to ensure structural consistency during pruning operations. Inter-layer dependencies refer to the input-output correspondence between adjacent convolutional or fully connected layers; for example, if the output of layer i is directly used as the input of layer j, then the number of output channels of layer i must match the number of input channels of layer j. Intra-layer dependencies refer to constraints introduced by residual structures or element-wise addition operations; for example, in a residual block of a Residual Network (ResNet), when the output of the main branch is added to the output of a skip connection, the tensor dimensions (especially the number of channels) of both must be completely identical.

[0034] Obtain the deep convolutional model to be compressed, and use an automated scanning algorithm to construct a dependency matrix (or adjacency matrix) that represents the dependency graph. After constructing the dependency graph, use a graph traversal algorithm (such as breadth-first search or depth-first search) to find all connected subgraphs. All network layers that are interconnected in the dependency graph are placed in the same dependency group.
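The grouping step can be sketched as a breadth-first search for connected components over an adjacency matrix. This is only an illustrative sketch: the function name `dependency_groups` and the example matrix are hypothetical, and the matrix is treated as symmetric (a coupling in either direction joins two layers), which is an assumption rather than something the text specifies.

```python
from collections import deque

def dependency_groups(adjacency):
    """Group layer indices into connected components of the dependency graph.

    `adjacency` is a square 0/1 matrix given as a list of lists;
    adjacency[i][j] == 1 means layers i and j are coupled.
    Assumption: a coupling in either direction places both layers in one group.
    """
    n = len(adjacency)
    visited = [False] * n
    groups = []
    for start in range(n):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        group = []
        while queue:
            node = queue.popleft()
            group.append(node)
            for neighbor in range(n):
                coupled = adjacency[node][neighbor] or adjacency[neighbor][node]
                if coupled and not visited[neighbor]:
                    visited[neighbor] = True
                    queue.append(neighbor)
        groups.append(sorted(group))
    return groups

# Hypothetical 5-layer network: layers 0, 1 and 3 are coupled
# (for example through a residual addition); layers 2 and 4 are coupled.
adjacency = [
    [0, 1, 0, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
]
groups = dependency_groups(adjacency)
```

For the example matrix the sketch yields two dependency groups, one per connected subgraph.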

[0035] It's important to note that all convolutional layers within the same dependency group must share the same channel pruning rate. This means that when one layer in the group is pruned, all other layers in the group must be pruned simultaneously with the same proportion to ensure the network's physical connectivity and tensor dimension alignment, thus avoiding inference errors caused by dimension mismatch.
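The shared-rate constraint can be illustrated as follows. In this sketch the shared rate is expressed as a keep ratio (the fraction of channels retained), which is an assumption about how the rate is encoded; the function name and the layer names are hypothetical.

```python
def apply_shared_ratio(channels_per_layer, keep_ratio):
    """Return the channel count kept in every layer of one dependency group.

    The same keep_ratio is applied to all layers, so layers that start with
    equal channel counts stay dimension-aligned after pruning.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    kept = {}
    for name, channels in channels_per_layer.items():
        # Keep at least one channel so the layer is never removed entirely.
        kept[name] = max(1, round(channels * keep_ratio))
    return kept

# Hypothetical residual block: both branches feed one element-wise addition,
# so their output channel counts must remain identical.
group = {"conv1": 64, "conv2": 64, "shortcut": 64}
kept = apply_shared_ratio(group, 0.75)
```

Because every layer in the group receives the same ratio, the main branch and the shortcut end up with the same number of channels and the element-wise addition remains valid.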

[0036] In this embodiment of the invention, an agent is assigned to each dependency group. Each agent makes compression decisions for its group based on the information it observes, and within the MARL framework the agents collaborate to compress the model.

[0037] For example, dependency graphs can be modeled and grouped in the following way. First, the neural network is decomposed into a sequence of layers that covers both parameterized and non-parameterized layers. The input and the output of each layer are labeled separately, so that the decomposed network becomes a sequence of layer inputs and outputs. In the reconstructed network, a connection symbol between adjacent elements characterizes the dependency between adjacent layers. From this, two core dependency types can be identified: inter-layer dependencies and intra-layer dependencies.

[0038] The dependency graph (DepGraph) is a directed graph structure used to characterize the dependencies between layers of a neural network. An entry of the graph indicates that the output of one layer depends on the input of another layer. Each entry is constructed from indicator functions, each of which returns "True" if its condition holds, combined with the logical "OR" and logical "AND" operations. A graph traversal algorithm is then used to identify all network layers with coupling relationships, ultimately forming the dependency groups.

[0039] Inter-layer dependency: if two layers of the neural network are connected, so that a dependency exists between them, an inter-layer dependency is defined between them.

[0040] Intra-layer dependency: if the input and the output of a layer use the same pruning scheme, an intra-layer dependency is defined for that layer.

[0041] Step 120: Assign agents based on the dependency groups, perform reinforcement learning on the agents, and obtain reinforced agents.

[0042] Specifically, based on the number of dependency groups obtained from the preceding steps, an equal number of independent reinforcement learning agents are initialized. Each agent corresponds one-to-one with a dependency group, serving as the dedicated compression controller for that dependency group.

[0043] After agent allocation, the system constructs a multi-agent reinforcement learning environment configured to simulate the inference process of the deep convolutional model to be compressed on the hardware device. This environment includes the initialization of the following core components. State space definition: the environment defines an observation interface through which each agent obtains features of its current dependency group (such as kernel size, number of input/output channels, and share of floating-point operations (FLOPs)) together with global context information. Action space definition: the environment defines a continuous action output interface for each agent, allowing the agent to output values in the range [0, 1] that represent the pruning rate and the quantization policy, respectively. Each agent has one policy network and two value networks. The policy network generates compression actions based on the current observations; the value networks evaluate the expected reward of the current action, and the smaller of the two estimates is used to reduce overestimation bias.
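Taking the smaller of two value estimates can be sketched as below. The text only states that the minimum of the two value networks is used; the bootstrapped target form (reward plus discounted next value) and the `discount` parameter are assumptions borrowed from common twin-critic practice, and the function name is hypothetical.

```python
def conservative_target(reward, next_q1, next_q2, discount=0.99, done=False):
    """Learning target for the value networks of one agent.

    Uses the smaller of the two next-state value estimates so that an
    over-optimistic estimate from either network is not propagated.
    Assumption: a one-step bootstrapped target with a discount factor.
    """
    if done:
        # No future value after a terminal step.
        return reward
    next_value = min(next_q1, next_q2)
    return reward + discount * next_value

# Hypothetical numbers: the two value networks disagree about the next state.
target = conservative_target(reward=1.0, next_q1=2.0, next_q2=3.0, discount=0.5)
```

In the example the target is built from the lower estimate (2.0), not the higher one (3.0).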

[0044] After completing the allocation and environment setup, the system officially starts the reinforcement learning process. This is a trial-and-error learning process.

[0045] During the initialization phase, the policy network parameters of all agents are randomly initialized (or pre-trained parameters are loaded). With random initialization, the compression actions output by the agents are initially random.

[0046] During the interactive iteration phase, the agents interact with the environment in multiple rounds. In each round, all agents simultaneously give actions, and the environment "simulates and compresses" the model based on these actions and provides feedback on the results.

[0047] The goal of the system is to find an optimal set of agent policy parameters that minimizes the accuracy loss of the compressed model while satisfying preset hardware constraints (such as target FLOPs and target latency).

[0048] After several rounds of training, when the policy network parameters of the agents stabilize and the cumulative reward no longer increases significantly, the system outputs the current set of agents as the "reinforced agents". These reinforced agents have the decision-making ability to efficiently compress the specific deep convolutional model at hand.
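One way to test whether the cumulative reward "no longer increases significantly" is to compare the mean reward of two consecutive windows of training rounds. This is a sketch under stated assumptions: the window size and tolerance are hypothetical values, the function name is hypothetical, and the check covers only the reward criterion, not the stability of the policy parameters.

```python
def has_converged(episode_rewards, window=5, tolerance=0.01):
    """Return True when the recent mean reward has stopped improving.

    Compares the mean of the latest `window` rounds with the mean of the
    `window` rounds before them. Both parameters are illustrative choices.
    """
    if len(episode_rewards) < 2 * window:
        return False  # not enough history to compare two windows
    recent = episode_rewards[-window:]
    previous = episode_rewards[-2 * window:-window]
    improvement = sum(recent) / window - sum(previous) / window
    return improvement <= tolerance

# Hypothetical reward histories.
rising = [0.1 * i for i in range(10)]             # still improving
plateau = [0.1, 0.3, 0.5, 0.7, 0.8] + [0.9] * 10  # flat for the last 10 rounds
```

With these histories, `has_converged(rising)` is false and `has_converged(plateau)` is true.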

[0049] Step 130: Based on the reinforced agent, generate a target compression action for the dependency group corresponding to the reinforced agent.

[0050] Specifically, after completing reinforcement learning training and obtaining a set of stable reinforced agents, the system uses these reinforced agents to generate the final compression scheme for the model to be compressed.

[0051] For each dependency group, the system invokes its corresponding reinforced agent. The agent first acquires the feature state information of the current dependency group, such as the weight distribution statistics of the convolutional kernels, the layer position index, and the number of input/output channels. Simultaneously, the agent also acquires collaborative information from other reinforced agents (such as pruning decisions from preceding dependency groups) to maintain global consistency. Based on this input information, the reinforced agent outputs a high-dimensional continuous action vector through its pre-trained policy network. This vector typically includes channel pruning rate parameters and weight quantization bit width parameters.

[0052] It should be noted that the continuous values (such as 0.735) directly output by reinforced agents often cannot be used directly for hardware deployment. They need to be parsed and mapped into physically meaningful target compression actions.

[0053] Step 140: Based on the target compression action, prune and quantize the deep convolutional model to generate a target lightweight model, and perform multi-teacher distillation on the target lightweight model.

[0054] Specifically, after generating the target compression action containing the specific pruning rate and quantization bit width for each dependency group, the system performs actual physical pruning and quantization operations on the original deep convolutional model, and performs multi-teacher distillation on the target lightweight model. This process is essentially a model structure reconstruction process.

[0055] After the pruning, quantization, and knowledge distillation operations described above, the system outputs a lightweight target model with a simplified structure, significantly reduced number of parameters, and compatibility with the target hardware instruction set. This model retains the core feature extraction capabilities of the original model, but significantly reduces computational complexity and resource consumption, providing a foundation for subsequent deployment.

[0056] The hybrid compression method for deep convolutional models based on multi-agent collaboration provided in this invention constructs a dependency graph between layers of the deep convolutional model and groups the layers based on the dependency graph. This automatically identifies complex inter-layer coupling relationships in the neural network (such as residual connections and concatenation) and ensures that layers with parameter dependencies are placed in the same group for unified processing. It avoids network structure breakage or dimension mismatch caused by local pruning and guarantees the structural integrity and inference capability of the compressed model. Assigning agents to the dependency groups and performing reinforcement learning decomposes the large and complex problem of searching for compression parameters across the entire network into multiple low-dimensional local optimization sub-problems, which reduces the optimization cost. This reduces the complexity of the search space, overcomes the difficulty a single agent has in converging over a high-dimensional action space, and enables automated search for compression strategies without extensive human intervention. Generating target compression actions with the reinforced agents and then performing pruning and quantization achieves joint optimization of channel pruning (structural simplification) and weight quantization (precision adjustment). The compressed model is further optimized through knowledge distillation. Together, these steps significantly reduce the number of model parameters and the computational cost while preserving the model's feature extraction capability and accuracy as far as possible. The result is a lightweight target model suitable for deployment on resource-constrained edge devices, with improved model compression efficiency.

[0057] In some embodiments, performing reinforcement learning on the agent to obtain a reinforced agent includes: Each agent is controlled to observe its own state and the channel pruning rate information of all other agents, and to generate a compression action for the corresponding dependency group; the compression action includes the channel pruning rate parameter and the weight quantization bit width parameter. Based on the compression action, the deep convolutional model is compressed, and the performance feedback index of the compressed deep convolutional model is obtained. The agent's policy network is updated based on the performance feedback metrics until the agent's output meets the preset convergence condition, thus obtaining the reinforced agent.

[0058] Specifically, each dependency group corresponds to an agent. Due to the tight coupling between layers in a deep convolutional neural network (i.e., the number of output channels in the previous layer determines the number of input channels in the next layer), isolated decisions by a single agent often lead to network structure breakage or a sharp decline in performance. To address this, this invention employs a unique state observation and cooperation mechanism.

[0059] The state observation information received by each agent consists of two parts: local state information (its own state) and global cooperation information. Local state information includes the attributes of the network layers within the current dependency group, such as layer type (convolutional or fully connected), kernel size, current number of input and output channels, and current statistical features (such as the weight distribution). Global cooperation information includes the pruning decisions of other agents. Specifically, when making a decision, the current agent obtains the channel pruning rates output by all other agents (or by the associated neighboring agents).

[0060] It should be noted that when generating a compression action, the agent not only outputs an initial action based on its own policy network but also applies a cooperative correction mechanism. For example, the agent adjusts its own pruning magnitude based on the average pruning rate of the other agents, so that sparsity is distributed evenly throughout the network and no single over-pruned layer becomes a bottleneck.

[0061] In this embodiment of the invention, the compression action is defined in a hybrid action space and specifically includes a channel pruning rate parameter and a weight quantization bit width parameter. The channel pruning rate parameter (a continuous value) represents the proportion of channels that need to be retained in the current dependency group (e.g., retaining 70% of the channels). This parameter directly determines the number of convolutional kernels that are masked in physical memory. The weight quantization bit width parameter (a discrete mapped value) represents the data bit width (e.g., 4-bit, 8-bit, 16-bit) used for the weights and activation values of the current dependency group. Since hardware instruction sets typically support only specific discrete bit widths, the continuous value output by the agent is converted into a valid quantization bit width supported by the hardware through a hardware-aware mapping function (such as nearest neighbor mapping).

[0062] After all agents output their actions, the environment performs actual pruning and quantization operations on the deep convolutional model, resulting in a temporary compressed model. Subsequently, this compressed model is quickly evaluated on a validation set, and performance feedback metrics are calculated.

[0063] The policy networks of the agents are updated based on the performance feedback metric until the outputs of all agents stabilize and the corresponding performance feedback metric no longer increases significantly (i.e., the preset convergence condition is met). The agents obtained at this point are called "reinforced agents"; they have learned the optimal hybrid compression strategy for the deep convolutional model.

[0064] The model compression process can be modeled as a Markov Decision Process (MDP) within a multi-agent reinforcement learning (MARL) framework. Table 1 shows the main symbols used in the multi-agent collaborative deep convolutional model hybrid compression method provided in this embodiment of the invention.

[0065] Table 1. Key symbols and their meanings in the method.

[0066] The MDP is defined by a five-tuple comprising the set of observations, the set of actions, the observation transition probability, the reward, and the discount factor. Its core definitions of observation, action, and reward are as follows. Observation space: the observation of each agent is divided into two parts. In addition to observing the current state of its own group, each agent also observes the compression state of the model at the current stage, so the agent's observation is composed of the group observation and the model compression state observation. The group observation contains the following components: the group index; a layer type identifier (1 represents a convolutional layer, 0 represents a fully connected layer); the number of input channels, the number of output channels, and the kernel size; the group convolution ratio, which reflects the relationship between the number of convolution groups and the number of input channels; the current pruning rate; and the quantization bit width.

[0067] The model compression state observation contains the following components: the group index; the number of agents; the current best accuracy; the FLOPs reduction ratio; the percentage reduction in the number of parameters; the current pruning step; and the overall quantization error of the model. Each dimension of the observation vector is normalized to [0,1] so that all dimensions are on the same scale.
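
The normalization step can be sketched as per-dimension min-max scaling. The bounds used here are hypothetical; the source only states that every dimension is mapped to [0,1].

```python
def normalize_observation(values, lower, upper):
    """Min-max scale each observation dimension into [0, 1].

    lower/upper are assumed per-dimension bounds (not given in the source).
    Values outside the bounds are clamped.
    """
    scaled = []
    for v, lo, hi in zip(values, lower, upper):
        if hi == lo:
            scaled.append(0.0)  # degenerate range: no information to scale
        else:
            x = (v - lo) / (hi - lo)
            scaled.append(min(1.0, max(0.0, x)))
    return scaled

# Example: pruning step, output channels, bit width with assumed ranges.
obs = normalize_observation([3, 64, 8], lower=[0, 0, 2], upper=[10, 256, 32])
```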

[0068] Action space: the action space of the i-th agent is the set of actions that the agent can execute. The model compression actions are defined within a hybrid, continuous action space. The agent iteratively explores the optimal lightweight model based on preset minimum step sizes for the pruning rate and the quantization rate. Each agent's action includes a pruning rate decision and a quantization decision.

[0069] 1. Pruning action decision. The pruning action is defined as the pruning rate that the current agent selects from its set of selectable pruning rates; it relates the number of channels after pruning to the original channel count of the network. To avoid agents getting trapped in local optima during pruning exploration, the actions of other agents are also taken into account when the pruning action is actually performed. The pruning rate actually applied is subject to a global maximum pruning rate constraint and is computed from the product of the pruning rate chosen by the current agent and its proportion among the pruning rates chosen by all agents, which keeps each agent's pruning rate reasonable under the global constraint. To ensure effective pruning without disrupting the necessary network structure, the following constraints are applied. First, the number of channels to prune in each layer is calculated from the total number of prunable channels: the number of channels to be pruned is rounded down, while ensuring that at least one channel is pruned. Then, the dependency graph is used to check whether the pruning operation is valid. If it is valid, the pruning is performed; otherwise, pruning is skipped for that layer.

[0070] Under this constraint, pruning will not disrupt the model's structure and dependencies.
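
The channel-count rule (round down, but prune at least one channel) and the skip-if-invalid behavior can be sketched as follows. The validity test shown here, requiring a minimum number of remaining channels, is only a stand-in assumption for the dependency-graph check, whose details the source does not give.

```python
import math

def channels_to_prune(total_channels, pruning_rate):
    """Round down the number of channels to prune, but prune at least one."""
    return max(1, math.floor(total_channels * pruning_rate))

def apply_pruning(channel_counts, pruning_rate, min_remaining=1):
    """Prune every layer of a group, skipping layers where pruning is invalid.

    channel_counts: dict mapping layer name to its current channel count.
    min_remaining: hypothetical validity rule standing in for the
                   dependency-graph check described in the text.
    """
    new_counts = {}
    for layer, total in channel_counts.items():
        n = channels_to_prune(total, pruning_rate)
        if total - n >= min_remaining:
            new_counts[layer] = total - n   # valid: perform the pruning
        else:
            new_counts[layer] = total       # invalid: skip this layer
    return new_counts

result = apply_pruning({"conv1": 64, "conv2": 1}, pruning_rate=0.3)
```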

[0071] 2. Quantization action decision. (1) The quantization action controls the number of quantization bits and selects the quantization method; both are adjusted based on the values in the action vector. The quantization bit width is discretized from the continuous action value to a hardware-supported bit width by a mapping function: the action value is mapped onto B, the list of quantization bit widths supported by the hardware, and rounded so that the result is always a valid quantization bit width. Here round() refers to the rounding function, which rounds to the nearest integer, and b∈B is the selected quantization bit width.

[0072] The action value for the quantization method is scaled and rounded, then mapped to an index into a predefined list of quantization strategies, thereby selecting a specific quantization method. The index mapping function takes the list of methods and the number of methods in that list as parameters, and maps continuous action values precisely onto the predefined discrete quantization strategies.
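
One way to realize such a scale-and-round index mapping is sketched below. It assumes the continuous action lies in [0, 1] and that the index is `round(action * (len(choices) - 1))`; both are assumptions for illustration, since the exact mapping function is not recoverable from the text. The same helper serves for the bit width list and the method list.

```python
def map_action_to_choice(action, choices):
    """Map a continuous action in [0, 1] to one entry of a discrete list.

    Note: Python's round() resolves exact .5 ties to the nearest even integer.
    """
    clipped = min(1.0, max(0.0, action))
    index = round(clipped * (len(choices) - 1))
    return choices[index]

SUPPORTED_BITS = [4, 8, 16]             # hypothetical hardware bit widths
METHODS = ["symmetric", "asymmetric"]   # hypothetical strategy list

bits = map_action_to_choice(0.45, SUPPORTED_BITS)   # 0.45 * 2 = 0.9 -> index 1
method = map_action_to_choice(0.8, METHODS)         # 0.8 * 1 = 0.8 -> index 1
```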

[0073] (2) Hardware-aware quantizer. A hardware-aware quantizer can be designed to perform quantization; it adaptively selects the quantization strategy based on the configuration of the target hardware.

[0074] ① The quantizer first verifies whether the target bit width is supported by the hardware. If the target bit width is not in the list of hardware-supported bit widths, the supported bit width closest to the target quantization bit width is selected according to the nearest neighbor principle.
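
The nearest neighbor selection can be sketched in a few lines. Breaking ties toward the smaller bit width is an assumption of this example; the source does not say how ties are resolved.

```python
def nearest_supported_bit_width(target_bits, supported):
    """Return the supported bit width closest to the target.

    Ties are broken toward the smaller bit width (an assumption).
    """
    return min(supported, key=lambda b: (abs(b - target_bits), b))

SUPPORTED = [4, 8, 16]  # hypothetical hardware-supported list
chosen = nearest_supported_bit_width(7, SUPPORTED)
```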

[0075] ② Next, a quantization strategy is selected. The choice depends strongly on the characteristics of the target hardware platform and is made based on the hardware configuration and computing power. If the hardware environment or data distribution is better suited to symmetric processing, the framework prioritizes symmetric quantization, which offers superior computational efficiency and strong hardware adaptability; otherwise, asymmetric quantization, which can effectively handle asymmetric weight distributions, is used. In the symmetric quantization scenario, the scaling factor and the quantized weights are computed from the original weights, with the clip function restricting tensor elements to the permitted range.

[0076] In the asymmetric quantization scenario, the scaling factor, the zero-point parameter, and the quantized weights are calculated from the original weights.

[0077] ③ After quantization, the mean absolute error between the original and quantized weights, averaged over the number of weight elements, is calculated as the quantization error. The quantizer also directly evaluates the quantization benefit, which is used in the reward function to guide the agent toward efficient compression strategies that realize hardware benefits, including reduced memory usage, increased speed, and energy savings. These benefits are calculated from the bit width and comprise the percentage of energy savings, which uses an energy consumption factor; the theoretical speedup multiplier; and the percentage reduction in memory usage. The inference speedup is inversely proportional to the bit width; to account for hardware efficiency, an efficiency factor (the inference speed factor) is introduced.
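
The following sketch shows commonly used forms of symmetric and asymmetric uniform quantization, together with the mean absolute error used as the quantization error. The specific scale and zero-point formulas are the textbook ones and are assumptions here; the patent's own formulas are not recoverable from the text.

```python
def quantize_symmetric(weights, bits):
    """Symmetric uniform quantization; returns (dequantized weights, scale)."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the integer grid, then clip to the signed range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return [v * scale for v in q], scale

def quantize_asymmetric(weights, bits):
    """Asymmetric quantization; returns (dequantized weights, scale, zero point)."""
    qmin, qmax = 0, 2 ** bits - 1
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / (qmax - qmin) if w_max > w_min else 1.0
    zero_point = round(qmin - w_min / scale)
    q = [max(qmin, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return [(v - zero_point) * scale for v in q], scale, zero_point

def mean_absolute_error(original, quantized):
    """Quantization error averaged over all weight elements."""
    return sum(abs(a - b) for a, b in zip(original, quantized)) / len(original)

w_sym = [-1.0, -0.5, 0.0, 0.5, 1.0]
deq_sym, scale_sym = quantize_symmetric(w_sym, bits=8)
err_sym = mean_absolute_error(w_sym, deq_sym)

w_asym = [0.0, 0.1, 0.4, 1.0]
deq_asym, scale_asym, zp = quantize_asymmetric(w_asym, bits=4)
err_asym = mean_absolute_error(w_asym, deq_asym)
```

With a skewed, non-negative weight range such as `w_asym`, the asymmetric scheme uses the whole unsigned integer range, which is why it suits asymmetric weight distributions.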

[0078] Reward function: A scalar reward function can be established by weighting and integrating multi-dimensional indicators, so that the model is rewarded and penalized from multiple perspectives such as performance and efficiency. Furthermore, the impact of each agent's action on the global model is considered, and rewards are further allocated based on each agent's contribution to model compression.

[0079] The terms of the reward function are as follows: the change in accuracy and the accuracy retention rate; the ratio of the change in FLOPs between the two steps before and after the compression action; the ratio of the change in latency between the two steps before and after the compression action, calculated in the same way as the FLOPs ratio; the weighting coefficients for accuracy, FLOPs, and latency, which can be changed according to the application scenario; the hardware-aware quantization benefit; and a penalty term that constrains over-pruning and over-quantization, consisting of a pruning penalty and a quantization penalty. The penalties depend on the user-preset maximum pruning rate and maximum quantization rate allowed for a single-step operation. The quantization rate measures the degree of compression of the average quantization bit width of the quantizable layers in the model relative to the original 32-bit floating-point representation; it is defined in terms of the total number of quantized layers in the model and the quantization bit width of the i-th quantized layer. When the sum of the pruning rate and quantization rate selected by the agent exceeds the preset maximum pruning rate and quantization rate thresholds, a penalty mechanism is triggered.
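
A rough sketch of how these pieces could fit together is given below. Every formula in it is an assumption made for illustration: the quantization rate is taken as one minus the average bit width divided by 32, the penalty as a hinge on the amount by which a rate exceeds its limit, and the reward as a weighted sum plus benefit minus penalty. The source's exact expressions and signs are not recoverable.

```python
def quantization_rate(bit_widths, full_precision_bits=32):
    """Assumed form: 1 - (average bit width / 32)."""
    average_bits = sum(bit_widths) / len(bit_widths)
    return 1.0 - average_bits / full_precision_bits

def over_limit_penalty(rate, max_rate):
    """Assumed hinge penalty: zero until the rate exceeds its preset limit."""
    return max(0.0, rate - max_rate)

def scalar_reward(acc_term, flops_term, latency_term, weights,
                  hardware_benefit, penalty):
    """Assumed weighted sum of the indicators, plus benefit, minus penalty."""
    w_acc, w_flops, w_latency = weights
    return (w_acc * acc_term + w_flops * flops_term
            + w_latency * latency_term + hardware_benefit - penalty)

q_rate = quantization_rate([8, 8, 4, 16])        # average bit width is 9
penalty = over_limit_penalty(q_rate, max_rate=0.5)
reward = scalar_reward(0.1, 0.2, 0.3, (1.0, 0.5, 0.5),
                       hardware_benefit=0.05, penalty=0.1)
```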

[0080] Each agent shares the common goal of exploring a high-precision, lightweight model, thus forming a collaborative relationship. Based on the reward mechanism and action strategy of reinforcement learning, the collaborative relationship between agents is deeply integrated and optimized.

[0081] The reward distribution mechanism uses the model pruning rate and quantization rate as core evaluation indicators to construct a dynamic incentive framework. This matches the reward distribution precisely to individual contributions, effectively suppressing "free-riding" behavior in group collaboration and realizing "more work, more reward." Each agent's reward is allocated according to the ratio of its pruning rate value and quantization rate value to those of all agents, which incentivizes agents to consider each other's performance during the compression process. An additional term is introduced to ensure that the network maintains high accuracy throughout the compression process.

[0082] Agents collaborate by sharing experience. Specifically, each agent adjusts its pruning action based on the pruning rates of the other agents, using a collaboration weight and the average of the other agents' pruning rates. Incorporating this average pruning rate into the action update improves cooperation between agents. During the update phase, the calculation of the cooperation loss must also consider the impact of the pruning rate on the cooperation among agents. The cooperation loss is the mean squared error (MSE) between the Q-value of the i-th agent, that is, the value network's evaluation for that agent, and the Q-values of the other agents; it measures the difference in policy value between the individual and the group. Because the quantization rate depends on hardware information and only the layer-wise quantization bit width is considered during execution, it does not need to take the actions of other agents into account. Therefore, in agent action collaboration, only the pruning rates of the different agents are considered.
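
The cooperative adjustment and the cooperation loss can be illustrated as follows. The convex blend between an agent's own pruning rate and the mean of the others is an assumed form; the source names only a collaboration weight and the average pruning rate of the other agents.

```python
def cooperative_pruning_rate(own_rate, other_rates, coop_weight):
    """Blend the agent's own rate with the mean rate of the other agents.

    The convex-combination form is an assumption for illustration.
    """
    mean_others = sum(other_rates) / len(other_rates)
    return (1.0 - coop_weight) * own_rate + coop_weight * mean_others

def cooperation_loss(own_q, other_qs):
    """Mean squared error between one agent's Q-value and the others'."""
    return sum((own_q - q) ** 2 for q in other_qs) / len(other_qs)

adjusted = cooperative_pruning_rate(0.6, [0.2, 0.4], coop_weight=0.5)
loss = cooperation_loss(1.0, [0.0, 2.0])
```

A collaboration weight of 0 leaves the agent's own decision unchanged, while a weight of 1 replaces it with the group average.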

[0083] By leveraging action outputs to collaboratively integrate group information and optimize individual decisions, and by using collaborative loss to drive optimization, each agent can overcome its own local information limitations.

[0084] For example, agents can be trained using Multi-Agent Twin Delayed Deep Deterministic Policy Gradient (MATD3), yielding a compression scheme that combines centralized training with decentralized execution of the agents.

[0085] Centralized training: During training, the agents communicate with a central controller, which uses the data provided by each agent to train the critic value networks. In the MATD3 algorithm, the central controller maintains 2n critic value networks, with each pair of critics corresponding to one agent. The central controller feeds the evaluation results back to the corresponding agents to guide the update and optimization of the compression strategy of each policy network (actor). The loss function of the value network is defined as the expectation, over the observation o, action a, reward r, and next state, of the error between the i-th critic network's value estimate of the current observation-action pair and the target value. The target value is computed by the target network from the next state and the next action.
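
For orientation, the sketch below shows the clipped double-Q target that TD3-style methods use with a pair of critics, and a mean squared error critic loss. It is a scalar toy version under stated assumptions: target policy smoothing noise and delayed actor updates are omitted, and it is not claimed to reproduce the patent's exact formula.

```python
def td3_target(reward, gamma, next_q1, next_q2, done=False):
    """Clipped double-Q target: bootstrap from the smaller twin estimate."""
    if done:
        return reward
    return reward + gamma * min(next_q1, next_q2)

def critic_loss(q_values, targets):
    """Mean squared error between critic estimates and target values."""
    return sum((q - y) ** 2 for q, y in zip(q_values, targets)) / len(q_values)

# One transition evaluated by an agent's pair of target critics.
y = td3_target(reward=1.0, gamma=0.9, next_q1=2.0, next_q2=3.0)
loss = critic_loss([2.0, 3.0], [y, y])
```

Taking the minimum of the two target estimates is what counteracts value overestimation, which is the reason each agent is paired with two critics.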

[0086] Decentralized execution: Each agent makes decisions using its own policy network. The agent obtains its own observation and then generates a compression decision action based on the policy network. The quality of the current action is determined from the Q-value fed back to the agent by the central controller.

[0087] The hybrid compression method for deep convolutional models based on multi-agent collaboration provided in this invention achieves global optimization through information sharing and reward allocation. Agents perform compression actions while sharing pruning rates and quantization information, and rewards are allocated based on each agent's contribution, enabling better coordination among agents and effectively solving the technical problem that layer-by-layer compression in existing technologies easily falls into local optima. By sharing pruning information, agents achieve collaborative compression with a global perspective. At the same time, incorporating pruning rates and quantization bit widths into a unified action space for joint search avoids the performance loss caused by step-by-step optimization, achieving automated, high-performance model compression under specific hardware constraints.

[0088] In some embodiments, the multi-teacher distillation of the target lightweight model includes: Obtain multiple pre-trained teacher models; Based on the multiple pre-trained teacher models, the target lightweight model is trained by distillation.

[0089] Specifically, after the pruning and quantization operations described above, the generated lightweight target model significantly reduces the number of parameters and computational complexity, but its feature extraction capability is often weakened to some extent, leading to a decrease in classification accuracy. In order to restore or even improve the model's recognition accuracy while maintaining the advantages of lightweight design, this embodiment of the invention introduces a multi-teacher knowledge distillation technique.

[0090] Knowledge distillation is a model compression technique that trains a small model (the student model, i.e., the target lightweight model in this embodiment) to mimic the output behavior of a large model (the teacher model), thereby transferring the "hidden knowledge" contained in the large model to the small model. Compared to traditional single-teacher distillation, this embodiment employs a multi-teacher distillation strategy, aiming to utilize the complementary knowledge provided by multiple pre-trained teacher models to more comprehensively and robustly guide the training of the target lightweight model, avoiding the overfitting or knowledge blind spots that may exist with a single teacher model.

[0091] In traditional multi-teacher knowledge distillation methods, teacher weights are typically fixed or based on fixed criteria. However, the learning process of a student model changes as training progresses. Therefore, the method provided in this invention dynamically adjusts teacher weights based on the student's adaptability to different teacher knowledge. At each training stage, the student determines the weight of each teacher based on their loss on the current training samples, in order to receive more guidance from the teacher best suited for the current learning stage.

[0092] In this embodiment of the invention, multiple pre-trained teacher models are obtained; based on the multiple pre-trained teacher models, the target lightweight model, i.e., the student model, is subjected to distillation training.

[0093] The hybrid compression method for deep convolutional models based on multi-agent collaboration provided in this invention uses a multi-teacher model to distill and train the target lightweight model, enabling the target lightweight model to quickly correct feature biases caused by pruning quantization, learn more robust classification boundaries, and ultimately achieve performance levels close to or even exceeding those of the original full-precision model with extremely low computational resource consumption.

[0094] In some embodiments, the distillation training of the target lightweight model based on the plurality of pre-trained teacher models includes: obtaining a training set, where the training set includes input samples and the ground-truth labels corresponding to the input samples; inputting the input samples into the target lightweight model to obtain the student output probability distribution and the student intermediate layer feature map; inputting the input samples into each teacher model to obtain the teacher output probability distribution and the teacher intermediate layer feature map corresponding to each teacher model; determining, based on the student output probability distribution, the student loss of the target lightweight model on the current training samples; determining the first distillation loss and the second distillation loss based on the student output probability distribution, the teacher output probability distribution, the student intermediate layer feature map, and the teacher intermediate layer feature map; constructing a total loss function based on the student loss, the first distillation loss, and the second distillation loss; and updating the parameters of the target lightweight model based on the total loss function.

[0095] Specifically, the system acquires a training set for distillation training. This training set contains a large number of input samples (e.g., image data) and the ground truth label for each sample.

[0096] During training, the system performs a forward propagation operation: The student model forward propagates by feeding the input samples into the target lightweight model (the student model). The student model performs layer-by-layer computation, outputting two key pieces of information: the student output probability distribution and the student intermediate layer feature map. The student output probability distribution represents the student model's predicted probability of the current sample belonging to each category (usually processed using softmax). The student intermediate layer feature map represents the feature tensors extracted by the student model from intermediate layers of the network (such as the outputs of some convolutional layers).

[0097] The same input sample is fed into multiple pre-trained teacher models. Each teacher model also outputs its corresponding teacher output probability distribution and teacher intermediate layer feature map. Since the teacher model is usually an uncompressed, complete model, its feature extraction ability and prediction accuracy are often superior to the student model.

[0098] To provide comprehensive guidance for the learning of student models, the system constructs a composite loss function consisting of three parts: student loss, first distillation loss, and second distillation loss.

[0099] The student loss is calculated by directly using the student's output probability distribution and the true label to determine the cross-entropy loss. This loss term ensures the student model's basic classification ability, making its predictions as close as possible to the true answers.

[0100] The first distillation loss is calculated by comprehensively utilizing the probability distributions of student outputs and all teacher outputs. This part of the loss is mainly used to enable the student model to learn the "soft label" knowledge of the teacher model, that is, to learn the probability prediction of the teacher model for incorrect categories (for example, although the picture is a cat, the teacher model believes that it has a 5% chance of looking like a dog; this "looks like a dog" information is hidden knowledge). The specific calculation involves the dynamic allocation of weights.

[0101] The second distillation loss is calculated by the system using the student's intermediate layer feature map and the teacher's intermediate layer feature map. This part of the loss is mainly used to enable the student model to learn the teacher model's "intermediate layer representation," that is, to learn how the teacher model extracts features (such as texture, edges, etc.). The specific calculation involves feature layer matching and alignment.

[0102] In this embodiment of the invention, the training set consists of N samples, each comprising an input sample (such as an image) and its corresponding ground-truth label (true category).

[0103] The teacher models are n pre-trained teacher models, indexed by k, so that the k-th teacher is one of the n teachers.

[0104] The logits are the raw scores output by a network before softmax, written here as $z = (z_1, \dots, z_C)$, where $C$ is the number of categories.

[0105] Softmax function: $p_c = \exp(z_c / T) / \sum_{j=1}^{C} \exp(z_j / T)$, where $T$ is a temperature parameter used to control the "smoothness" of the softmax output. The larger the value of $T$, the smoother the output and the more category-related information it contains.

[0106] The learning progress of the student model can be measured by calculating the student loss in the current batch. A higher student loss indicates poorer learning in that batch, and vice versa. Specifically, the student loss can be the difference between the student model's prediction and the true label, typically represented by the cross-entropy loss $L_{CE} = -\sum_{c=1}^{C} y_c \log p_c$, where $L_{CE}$ is the loss of the student model in the current batch, $C$ is the number of categories, $y_c$ is the true label of the sample ($y_c = 1$ if the sample belongs to category $c$, otherwise $y_c = 0$), and $p_c$ is the student model's predicted probability for category $c$, that is, the probability distribution obtained by applying softmax to the student model's predicted output.
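
The temperature-scaled softmax and the cross-entropy student loss can be sketched with the standard library alone. The function names are chosen for this example; the max-subtraction is a standard numerical-stability step and does not change the result.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with temperature; larger temperatures give smoother outputs."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                      # subtract the max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_class):
    """Cross-entropy against a one-hot label reduces to -log p[true_class]."""
    return -math.log(probs[true_class])

logits = [4.0, 1.0, 0.5]
sharp = softmax(logits, temperature=1.0)
smooth = softmax(logits, temperature=4.0)
student_loss = cross_entropy(sharp, true_class=0)
```

Comparing `sharp` and `smooth` shows the effect of the temperature: the probability mass of the top class shrinks and the remaining classes become more visible to the student.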

[0107] The system weights and sums the three loss components to construct a total loss function. Based on this total loss function, the system uses the backpropagation algorithm to calculate the gradients of each parameter in the student model and uses an optimizer to update the parameters of the target lightweight model. The parameters of the teacher model remain fixed (frozen) during training and are not updated. Through multiple rounds of iterative training, the accuracy of the student model will gradually improve, ultimately resulting in a high-performance lightweight model.

[0108] In this embodiment of the invention, the overall loss consists of three parts, whose contributions are balanced by hyperparameters. The first part is the cross-entropy loss of the student model itself (computed against the ground-truth labels), which ensures the basic classification ability of the student model. One hyperparameter weights the teacher prediction distillation loss, and another weights the intermediate-layer feature loss, balancing the transfer of feature knowledge.

[0109] The hybrid compression method for deep convolutional models based on multi-agent collaboration provided in this invention distills and trains the target lightweight model using multiple pre-trained teacher models, significantly improving the classification accuracy and generalization performance of the final lightweight model. By dynamically adjusting teacher weights according to the student's adaptability to different teacher knowledge, in each training stage, the student determines the weight of each teacher based on their loss on the current training samples, so as to obtain more guidance from the teacher most suitable for the current learning stage. By utilizing the knowledge of the teacher's intermediate layer structure and allowing students to prioritize imitating teachers with similar feature spaces, training instability is avoided.

[0110] In some embodiments, the first distillation loss is determined based on the following steps: Based on the teacher output probability distribution, determine the prediction distribution entropy of each teacher model for the current training sample; Based on the student loss and the predicted distribution entropy, calculate the distillation weights of each teacher model in the current training batch; The teacher output probability distributions corresponding to each teacher model are weighted and fused according to the distillation weights to obtain the target teacher distribution, and the first distillation loss between the student output probability distribution and the target teacher distribution is calculated.

[0111] Specifically, to fully utilize the integrated knowledge of multiple teacher models, the system first needs to evaluate the confidence of each teacher model on the current input sample. A teacher model with high confidence often exhibits a "sharp" output probability distribution (i.e., the predicted probability for the correct class is close to 1 and for other classes close to 0); conversely, a teacher model with low confidence often has a relatively "flat" output probability distribution. Therefore, this embodiment of the invention introduces the prediction distribution entropy as an indicator of the uncertainty of each teacher model.

[0112] For the k-th teacher model and its predicted output probability distribution on the current input sample, the information entropy of this distribution is calculated as $H_k = -\sum_{c} p_{k,c} \log p_{k,c}$. The prediction distribution entropy $H_k$ represents the uncertainty of the k-th teacher's prediction on the current sample. The smaller the entropy, the more stable the teacher's prediction; the larger the entropy, the lower the teacher's prediction confidence and the smaller the assigned weight. This is equivalent to scoring the teachers: teachers with smaller entropy score higher, and teachers with larger entropy score lower. Here $p_{k,c}$ is the k-th teacher's predicted probability that the current sample belongs to the c-th class, obtained by converting the teacher's output logits using the softmax function with temperature T.
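
The entropy-based confidence measure is straightforward to compute. The example distributions below are made up to contrast a confident teacher with an uncertain one.

```python
import math

def prediction_entropy(probs):
    """Shannon entropy (natural log) of one predicted distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

sharp_teacher = [0.98, 0.01, 0.01]   # confident prediction
flat_teacher = [0.40, 0.30, 0.30]    # uncertain prediction
h_sharp = prediction_entropy(sharp_teacher)
h_flat = prediction_entropy(flat_teacher)
```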

[0113] The system also needs to perceive the learning status of the current student model (i.e., the target lightweight model) and calculate the student loss. If the student model performs poorly on the current sample (i.e., the prediction error or bias is large), it indicates that the sample is a "difficult sample" and requires stronger guidance from the teacher; conversely, if the student performs well, the intensity of guidance should be appropriately reduced to avoid overfitting.

[0114] The teacher weights are dynamically adjusted based on the student loss and the teachers' prediction distribution entropy, yielding a weight for the k-th teacher; the higher the weight, the more the student model imitates that teacher.

[0115] A higher student loss on a sample indicates that the student is having difficulty learning from that sample. In this case, a correction value greater than 1 proportionally increases the weights of all teachers on that sample, strengthening the intensity of teacher guidance. Conversely, a smaller student loss indicates that the student has mastered the knowledge in that sample, and the correction value approaches 1, so the teacher weights are reduced, which effectively prevents overfitting to the teachers. The loss on a single sample is normalized by the maximum loss within the batch to ensure the stability of the adaptive weight adjustment.

[0116] Finally, the system uses these weights to weight the output probability distributions of the teacher models, thereby constructing the first distillation loss for the student model. That is, the weighted average of all teachers' predictions, computed from the teachers' logits, is compared with the prediction computed from the student's logits to yield the teacher prediction loss. According to this formulation, teachers whose predictions are closer to the true labels are given greater weight. The student model prioritizes learning from the predictions of highly weighted teachers, which reduces interference from low-quality teachers.
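
A possible shape for the first distillation loss is sketched below. All concrete formulas are assumptions for illustration: teacher scores are taken as `exp(-entropy)` and normalized, the correction value as `1 + student_loss / max_batch_loss`, and the distance as the KL divergence from the fused teacher distribution to the student distribution. The source's exact weighting formula is not recoverable from the text.

```python
import math

def teacher_weights(entropies):
    """Lower entropy gives a higher normalized weight (assumed exp(-H) score)."""
    scores = [math.exp(-h) for h in entropies]
    total = sum(scores)
    return [s / total for s in scores]

def fuse_teachers(teacher_probs, weights):
    """Weighted average of the teacher distributions, class by class."""
    n_classes = len(teacher_probs[0])
    return [sum(w * p[c] for w, p in zip(weights, teacher_probs))
            for c in range(n_classes)]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) with a small epsilon to avoid log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0.0)

def first_distillation_loss(student_probs, teacher_probs, entropies,
                            student_loss, max_batch_loss):
    weights = teacher_weights(entropies)
    target = fuse_teachers(teacher_probs, weights)
    # Assumed correction value in [1, 2]: harder samples get stronger guidance.
    correction = 1.0 + student_loss / max_batch_loss
    return correction * kl_divergence(target, student_probs)

teachers = [[0.9, 0.1], [0.6, 0.4]]
weights = teacher_weights([0.33, 0.67])
loss = first_distillation_loss([0.5, 0.5], teachers, [0.33, 0.67],
                               student_loss=1.2, max_batch_loss=2.0)
```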

[0117] The hybrid compression method for deep convolutional models based on multi-agent collaboration provided in this invention calculates the first distillation loss, enabling the compressed target lightweight model to learn the prediction probability distribution information of multiple pre-trained teacher models for samples. This not only mimics the correct prediction results of the teacher models but also captures the implicit correlations between different categories, making up for the deficiency of insufficient training information in single hard labels and significantly improving the generalization ability and classification accuracy of the target lightweight model.

[0118] In some embodiments, the second distillation loss is determined based on the following steps: global average pooling is performed on the student intermediate layer feature map to obtain a student feature vector; the student feature vector is input into the classifier of each teacher model to obtain the mapping output of the student features in each teacher's classification space; a feature fit loss is calculated based on the mapping output and the true label; the hierarchical confidence of each teacher model for the student intermediate layer feature map is calculated based on the feature fit loss; the variance of the student intermediate layer feature map is calculated, and a feature importance weight is determined based on the variance; and the second distillation loss is calculated based on the hierarchical confidence and the feature importance weight.

[0119] Specifically, this embodiment of the invention makes use of the structural knowledge in the teachers' intermediate layers. Traditional distillation uses only the teacher's final output (logits), but intermediate layer features contain structural knowledge that gives the student model more complete knowledge to learn from. The idea of "dynamic weighting" is therefore extended to the intermediate layers, so that the student prioritizes imitating teachers whose feature spaces are more similar to its own, avoiding the training instability caused by imitating mismatched features.

[0120] First, global average pooling is performed. The output feature map of the s-th layer of the student model is globally average-pooled to convert it into a feature vector suitable for the teacher classifier. Here, the pooling operator is the global average pooling operation, and the dimension referred to is that of the feature vector after pooling.
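As an illustration of this pooling step, the sketch below averages each channel of a feature map over its spatial positions. The nested-list layout `[channels][height][width]` and the name `global_average_pool` are assumptions made for the example.

```python
def global_average_pool(feature_map):
    """Reduce a [C][H][W] feature map to a length-C vector of channel means."""
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


# Hypothetical 2-channel, 2x2 student feature map.
fmap = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
]
vec = global_average_pool(fmap)
print(vec)
```

The pooled vector has one entry per channel, so its length no longer depends on the spatial resolution of the layer.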

[0121] Then, teacher classification space mapping is performed. The student feature vector is input into the classifier of each teacher model; that is, the student feature vector is fed into the classifier of the k-th teacher to obtain the logits output of the student features in that teacher's classification space. The teacher classifier is a weight matrix whose two dimensions are the number of categories in the classification task and the dimension of the pooled high-level feature vector output by the teacher model. In essence, it is the final learnable weight matrix with which the teacher model completes category prediction: its input is the high-level feature vector extracted by the teacher model, and its output is the category logits for the corresponding task. It is the core of the teacher model's mapping from feature space to category space. Its parameters are fixed after pre-training, are not updated during distillation, and serve as an evaluation tool for how well the student features fit.
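The mapping into a teacher's classification space can be sketched as a plain matrix-vector product. The example assumes a bias-free classifier and assumes the student vector already has the dimension the classifier expects; the names `teacher_logits` and `W_teacher` are hypothetical.

```python
def teacher_logits(classifier_weights, student_vector):
    """Map a student feature vector to class logits with a frozen classifier.

    classifier_weights: C rows, each of length d (one row per category).
    student_vector: length-d pooled student feature vector.
    """
    if any(len(row) != len(student_vector) for row in classifier_weights):
        raise ValueError("feature dimension does not match the classifier")
    return [sum(w * x for w, x in zip(row, student_vector))
            for row in classifier_weights]


# Tuples are used so the pre-trained weights cannot be modified in place,
# mirroring the fact that the classifier is not updated during distillation.
W_teacher = ((1.0, 0.0), (0.0, 1.0), (0.5, 0.5))
logits = teacher_logits(W_teacher, [2.5, 2.0])
print(logits)
```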

[0122] Next, the feature fit loss is calculated based on the mapped output and the true labels. This is the loss produced after the student features are mapped by a teacher classifier. It is defined in terms of the true label and the logits value of each class for the student's s-th layer features in the classification space of the k-th teacher, and the measure of feature fit is obtained by passing the student features through each teacher classifier. The smaller the loss, the better the student's s-th layer features fit the classifier of the k-th teacher, and the higher the knowledge transfer value of the features at that layer.
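One way to realise such a fit loss is softmax cross-entropy between the mapped logits and the true class. This is an assumption: the formula is not reproduced in this text, and cross-entropy is used here only because the paragraph describes the loss in terms of per-class logits and a true label. The name `feature_fit_loss` is hypothetical.

```python
import math


def feature_fit_loss(logits, true_class):
    """Softmax cross-entropy of the mapped logits against the true class index."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[true_class]


# Student features that a teacher classifier maps to a confident, correct
# prediction fit that teacher better than features mapped to a wrong one.
good_fit = feature_fit_loss([5.0, 0.0, 0.0], true_class=0)
poor_fit = feature_fit_loss([0.0, 5.0, 0.0], true_class=0)
print(good_fit < poor_fit)
```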

[0123] Based on this feature fit loss, the hierarchical confidence of each teacher model for the student's intermediate feature map is calculated. That is, the intermediate-level confidence is obtained from the aforementioned fit loss combined with the discriminative ability of the student features themselves. The result is the level confidence of the k-th teacher for the s-th layer of the student. The larger the value, the higher the knowledge transfer value of the teacher's corresponding level feature, which enables accurate knowledge transfer for the student's intermediate-level features.
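The sketch below shows one hedged way to turn per-teacher fit losses into confidences for a single student layer. The patent's formula is not reproduced in this text, so a softmax over negative fit losses is an assumption; it only captures the stated property that a lower fit loss should yield a higher confidence, and it omits the discriminative-ability term mentioned above. The name `layer_confidence` is hypothetical.

```python
import math


def layer_confidence(fit_losses):
    """Confidence of each teacher for one student layer.

    fit_losses[k] is the fit loss of the layer under teacher k.
    Assumed form: softmax over negative losses, so confidences sum to 1.
    """
    smallest = min(fit_losses)
    exps = [math.exp(-(loss - smallest)) for loss in fit_losses]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical fit losses of one student layer under three teachers.
conf = layer_confidence([0.2, 1.5, 3.0])
print(conf)
```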

[0124] Inter-layer importance weights are introduced for the student in order to select the key feature layers of the student model. The weight of the s-th layer is computed from the variance of the features of that layer, which quantifies the layer's discriminative power: the larger the variance, the more critical the layer's contribution to model performance. This variance is normalized by the sum of the variances of all feature layers in the student model. The weighting lets the student concentrate on the more valuable layers, that is, the layers with strong discriminative power, which reduces ineffective learning in redundant layers, improves distillation efficiency, and strengthens the extraction of advantageous features.
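The variance-based weighting can be sketched as each layer's feature variance divided by the sum of variances over all layers. The ratio form is inferred from the description of the normalizing sum; the flattened-list representation of a layer and the fallback to uniform weights when every variance is zero are assumptions added for the example.

```python
def variance(values):
    """Population variance of a flat list of feature values."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def layer_importance(layer_features):
    """Importance weight per layer: variance normalized by the total variance."""
    variances = [variance(f) for f in layer_features]
    total = sum(variances)
    if total == 0.0:
        # Assumption: with no variance anywhere, treat all layers equally.
        return [1.0 / len(variances)] * len(variances)
    return [v / total for v in variances]


# Three hypothetical student layers, flattened; variances are 0, 1 and 4.
layers = [
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 2.0, 0.0, 2.0],
    [0.0, 4.0, 0.0, 4.0],
]
weights = layer_importance(layers)
print(weights)
```

A constant layer receives zero weight under this scheme, which is consistent with the stated aim of reducing learning effort on redundant layers.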

[0125] In this way, compatibility between the student features and the teacher classification space is ensured, while the discriminative ability of the student features themselves is also taken into account.

[0126] The intermediate layer feature distillation loss transfers the structural knowledge of the teachers' intermediate layers, improving the feature extraction ability of the student model. The final second distillation loss is calculated based on the hierarchical confidence and the feature importance weight, together with two further components: a dimension alignment function, which resolves the mismatch in feature dimensions between the student and teacher intermediate layers, and a distance between the teacher features and the student features, which measures the similarity of the feature maps; the smaller the distance, the closer the features.
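A sketch of how these pieces might combine is shown below. The formula itself is not reproduced in this text, so several choices are assumptions: the alignment function is a linear projection, the distance is squared Euclidean distance, and the loss is a double sum over student layers and teachers weighted by the product of layer importance and teacher confidence. All names and the data layout are hypothetical.

```python
def align(student_vec, projection):
    """Assumed dimension alignment: linear projection to the teacher's size."""
    return [sum(w * x for w, x in zip(row, student_vec)) for row in projection]


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def second_distillation_loss(student_feats, teacher_feats, projections,
                             confidence, importance):
    """Weighted feature distance over student layers s and teachers k.

    student_feats[s], teacher_feats[k][s]: feature vectors.
    projections[k][s]: matrix aligning student layer s to teacher k.
    confidence[k][s]: confidence of teacher k for layer s.
    importance[s]: importance weight of student layer s.
    """
    loss = 0.0
    for s, s_vec in enumerate(student_feats):
        for k in range(len(teacher_feats)):
            aligned = align(s_vec, projections[k][s])
            distance = squared_l2(aligned, teacher_feats[k][s])
            loss += importance[s] * confidence[k][s] * distance
    return loss


# One student layer (dimension 2), two teachers (dimension 3).
proj = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
loss = second_distillation_loss(
    student_feats=[[1.0, 2.0]],
    teacher_feats=[[[1.0, 2.0, 3.0]], [[0.0, 0.0, 0.0]]],
    projections=[[proj], [proj]],
    confidence=[[0.75], [0.25]],
    importance=[1.0],
)
print(loss)
```

In the example the aligned student vector coincides with the first teacher's features, so only the second, lower-confidence teacher contributes to the loss.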

[0127] The hybrid compression method for deep convolutional models based on multi-agent collaboration provided in this invention calculates the second distillation loss, enabling the compressed lightweight model to not only learn the final classification result of the teacher model, but also effectively mimic the ability of the intermediate layers of the teacher model to extract image features (such as texture and edges). This achieves knowledge transfer in both deep semantics and shallow details, significantly compensating for the decrease in feature extraction ability caused by model pruning and quantization, accelerating model convergence, and further improving the final recognition accuracy.

[0128] The apparatus provided in the embodiments of the present invention will be described below. The apparatus described below can be referred to in correspondence with the method described above.

[0129] Figure 2 is a schematic diagram of the structure of the hybrid compression device for deep convolutional models based on multi-agent cooperation provided by the present invention. As shown in Figure 2, the device includes a grouping module 210, a reinforcement module 220, a decision module 230, and a compression module 240 connected in sequence.

[0130] The grouping module 210 is used to obtain the deep convolutional model to be compressed, construct the dependency graph between the layers of the deep convolutional model, and group the layers based on the dependency graph to obtain multiple dependency groups; the reinforcement module 220 is used to allocate agents based on the dependency groups and perform reinforcement learning on the agents to obtain reinforced agents; the decision module 230 is used to generate, based on each reinforced agent, a target compression action for the dependency group corresponding to that reinforced agent; and the compression module 240 is used to prune and quantize the deep convolutional model based on the target compression actions to generate a target lightweight model, and to perform multi-teacher distillation on the target lightweight model.

[0131] The hybrid compression device for deep convolutional models based on multi-agent cooperation provided in this invention constructs a dependency graph between the layers of the deep convolutional model and groups the layers based on that graph. This automatically identifies complex inter-layer coupling relationships in the neural network (such as residual connections and concatenation) and ensures that layers with parameter dependencies are placed in the same group for unified processing, which avoids the broken network structure or dimension mismatch caused by local pruning and preserves the structural integrity and inference capability of the compressed model. Furthermore, allocating agents and performing reinforcement learning per dependency group decomposes the large, complex search for compression parameters across the entire network into multiple low-dimensional local optimization sub-problems, reducing the optimization cost. This reduces the complexity of the search space, overcomes the difficulty a single agent has in converging over a high-dimensional action space, and enables automated search for compression strategies without extensive human intervention. Generating target compression actions from the reinforced agents and then performing pruning and quantization achieves joint optimization of channel pruning (structural simplification) and weight quantization (precision adjustment), and the compressed model is further optimized through knowledge distillation. The approach significantly reduces the number of model parameters and the computational cost while preserving the model's feature extraction capability and accuracy as far as possible. Ultimately, it generates a target lightweight model suitable for deployment on resource-constrained edge devices and improves the efficiency of model compression.
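The grouping idea can be illustrated with a small union-find sketch: layers whose output channels must be pruned together (for example, the two inputs of a residual addition) are merged into one dependency group, and each resulting group would then be handled by one agent. The layer names, the `couplings` list, and the use of union-find are assumptions for illustration; the patent text does not specify the grouping algorithm.

```python
def group_layers(layers, couplings):
    """Merge coupled layers into dependency groups using union-find.

    layers: list of layer names.
    couplings: pairs of layers whose channel counts must stay equal.
    """
    parent = {name: name for name in layers}

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in couplings:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for name in layers:
        groups.setdefault(find(name), []).append(name)
    return list(groups.values())


# Hypothetical network: conv1 and conv3 feed the same residual addition,
# so their output channels have to be pruned consistently.
layers = ["conv1", "conv2", "conv3", "conv4", "fc"]
couplings = [("conv1", "conv3")]
groups = group_layers(layers, couplings)
print(groups)
```

Pruning one shared rate per group, rather than per layer, is what keeps the two branches of a residual addition dimensionally compatible after compression.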

[0132] Figure 3 is a schematic diagram of the structure of the electronic device provided by the present invention. As shown in Figure 3, the electronic device may include a processor 310, a communications interface 320, a memory 330, and a communications bus 340, wherein the processor 310, the communications interface 320, and the memory 330 communicate with each other via the communications bus 340. The processor 310 can call logical instructions stored in the memory 330 to execute the methods described in the above embodiments, for example: obtaining a deep convolutional model to be compressed, constructing a dependency graph between the layers of the deep convolutional model, and grouping the layers based on the dependency graph to obtain multiple dependency groups; allocating agents based on the dependency groups and performing reinforcement learning on the agents to obtain reinforced agents; generating, based on each reinforced agent, a target compression action for the dependency group corresponding to that reinforced agent; and pruning and quantizing the deep convolutional model based on the target compression actions to generate a target lightweight model, and performing multi-teacher distillation on the target lightweight model.

[0133] Furthermore, the logical instructions in the aforementioned memory can be implemented as software functional units and sold or used as independent products, and can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0134] The processor in the electronic device provided in this embodiment of the invention can call logical instructions in the memory to implement the above method. Its specific implementation method is the same as the aforementioned method implementation method and can achieve the same beneficial effects, which will not be repeated here.

[0135] This invention also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, performs the methods provided in the above embodiments.

[0136] The specific implementation method is the same as the aforementioned method implementation method and can achieve the same beneficial effects, so it will not be repeated here.

[0137] This invention provides a computer program product, including a computer program that, when executed by a processor, implements the method described above.

[0138] The system embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.

[0139] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.

[0140] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A hybrid compression method for deep convolutional models based on multi-agent cooperation, characterized in that the method comprises: obtaining a deep convolutional model to be compressed, constructing a dependency graph between the layers of the deep convolutional model, and grouping the layers based on the dependency graph to obtain multiple dependency groups; allocating agents based on the dependency groups, and performing reinforcement learning on the agents to obtain reinforced agents; generating, based on each reinforced agent, a target compression action for the dependency group corresponding to that reinforced agent; and pruning and quantizing the deep convolutional model based on the target compression actions to generate a target lightweight model, and performing multi-teacher distillation on the target lightweight model.

2. The hybrid compression method for deep convolutional models based on multi-agent cooperation according to claim 1, characterized in that performing reinforcement learning on the agents to obtain reinforced agents comprises: controlling each agent to observe its own state and the channel pruning rate information of all other agents, and to generate a compression action for the corresponding dependency group, the compression action including a channel pruning rate parameter and a weight quantization bit width parameter; compressing the deep convolutional model based on the compression action, and obtaining a performance feedback index of the compressed deep convolutional model; and updating the policy network of the agent based on the performance feedback index until the output of the agent meets a preset convergence condition, thereby obtaining the reinforced agent.

3. The hybrid compression method for deep convolutional models based on multi-agent cooperation according to claim 1, characterized in that performing multi-teacher distillation on the target lightweight model comprises: obtaining multiple pre-trained teacher models; and performing distillation training on the target lightweight model based on the multiple pre-trained teacher models.

4. The hybrid compression method for deep convolutional models based on multi-agent cooperation according to claim 3, characterized in that performing distillation training on the target lightweight model based on the multiple pre-trained teacher models comprises: obtaining a training set, the training set including input samples and the true labels corresponding to the input samples; inputting the input samples into the target lightweight model to obtain a student output probability distribution and a student intermediate layer feature map; inputting the input samples into each teacher model to obtain the teacher output probability distribution and the teacher intermediate layer feature map corresponding to each teacher model; determining, based on the student output probability distribution, the student loss of the target lightweight model on the current training samples; determining a first distillation loss and a second distillation loss based on the student output probability distribution, the teacher output probability distributions, the student intermediate layer feature map, and the teacher intermediate layer feature maps; constructing a total loss function based on the student loss, the first distillation loss, and the second distillation loss; and updating the parameters of the target lightweight model based on the total loss function.

5. The hybrid compression method for deep convolutional models based on multi-agent cooperation according to claim 4, characterized in that the first distillation loss is determined based on the following steps: determining, based on the teacher output probability distributions, the prediction distribution entropy of each teacher model for the current training sample; calculating, based on the student loss and the prediction distribution entropy, the distillation weight of each teacher model in the current training batch; and performing weighted fusion of the teacher output probability distributions corresponding to the teacher models according to the distillation weights to obtain a target teacher distribution, and calculating the first distillation loss between the student output probability distribution and the target teacher distribution.

6. The hybrid compression method for deep convolutional models based on multi-agent cooperation according to claim 4, characterized in that the second distillation loss is determined based on the following steps: performing global average pooling on the student intermediate layer feature map to obtain a student feature vector; inputting the student feature vector into the classifier of each teacher model to obtain the mapping output of the student features in each teacher's classification space; calculating a feature fit loss based on the mapping output and the true label; calculating, based on the feature fit loss, the hierarchical confidence of each teacher model for the student intermediate layer feature map; calculating the variance of the student intermediate layer feature map, and determining a feature importance weight based on the variance; and calculating the second distillation loss based on the hierarchical confidence and the feature importance weight.

7. A hybrid compression device for deep convolutional models based on multi-agent cooperation, characterized in that the device comprises: a grouping module, used to obtain the deep convolutional model to be compressed, construct the dependency graph between the layers of the deep convolutional model, and group the layers based on the dependency graph to obtain multiple dependency groups; a reinforcement module, used to allocate agents based on the dependency groups and perform reinforcement learning on the agents to obtain reinforced agents; a decision module, used to generate, based on each reinforced agent, a target compression action for the dependency group corresponding to that reinforced agent; and a compression module, used to prune and quantize the deep convolutional model based on the target compression actions to generate a target lightweight model, and to perform multi-teacher distillation on the target lightweight model.

8. An electronic device comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, characterized in that, when the processor executes the computer program, it implements the hybrid compression method for deep convolutional models based on multi-agent cooperation as described in any one of claims 1 to 6.

9. A non-transitory computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the hybrid compression method for deep convolutional models based on multi-agent cooperation as described in any one of claims 1 to 6.

10. A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, it implements the hybrid compression method for deep convolutional models based on multi-agent cooperation as described in any one of claims 1 to 6.