A heterogeneous large model federated fine-tuning method and system based on structural bias compensation

By introducing a lightweight compensation module, sparse distribution, and dynamic quantization uploading into the federated fine-tuning of heterogeneous large models, the problems of structural bias and communication bottlenecks are solved, and efficient model collaborative training and accuracy improvement are achieved.

CN122047516B · Active · Publication Date: 2026-06-16 · THE CHINESE UNIV OF HONG KONG (SHENZHEN) +1

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: THE CHINESE UNIV OF HONG KONG (SHENZHEN)
Filing Date: 2026-04-16
Publication Date: 2026-06-16

AI Technical Summary

⚠Technical Problem

Existing technologies suffer from structural biases, redundant communication overhead, and feature collapse risks in federated fine-tuning of heterogeneous large models, leading to performance degradation and communication bottlenecks.

⚗Method used

A federated fine-tuning method for heterogeneous large models based on structural bias compensation is adopted. The server pre-trains lightweight compensation modules with semantic contrast enhancement and distributes them sparsely, and clients upload dynamically quantized updates with error feedback. Together these mechanisms optimize both the transmission of model parameters and local fine-tuning.

🎯Benefits of technology

It effectively eliminates structural biases, prevents feature collapse, reduces communication overhead, enables efficient model co-training, and improves training accuracy and stability on edge devices.

✦ Generated by Eureka AI based on patent content.

Figure CN122047516B_ABST

Abstract

The application discloses a method and system for federated fine-tuning of heterogeneous large models based on structural bias compensation. The method comprises the following steps: S1, a heterogeneous large model federated fine-tuning architecture based on structural bias compensation is provided; S2, the server performs feature distillation based on semantic contrast enhancement; S3, the server performs mask-based sparse delivery; S4, the client performs two-stage local fine-tuning that is aware of structural bias; and S5, the client performs INT8 dynamic quantization upload with error feedback. The application greatly reduces the effective payload of federated communication while preserving model accuracy.

Description

Technical Field

[0001] This invention relates to the field of federated learning, and in particular to a method and system for federated fine-tuning of heterogeneous large models based on structural bias compensation.

Background Technology

[0002] Large-scale foundation models (such as the GPT series and ViT) perform well in a wide range of downstream tasks, but in privacy-sensitive fields such as healthcare and finance they require federated fine-tuning (FT) techniques, so that training is collaborative while each party's data stays within its own domain. However, there is a significant conflict between the massive number of parameters in large models and the limited computing and storage resources of edge devices (such as mobile phones and IoT devices).

[0003] Model-heterogeneous federated fine-tuning (MHFT), and in particular deep partial training (DPT), allows each client to randomly retain only some of the layers of the global model for training, according to its own computing power, and is a mainstream solution to this conflict. However, existing technologies have the following serious drawbacks:

[0004] 1) Severe “structural bias” leads to client drift: Existing techniques attribute the performance degradation of federated learning mainly to the statistical bias caused by non-independent and identically distributed (Non-IID) data. In reality, however, a client in the DPT method skips certain layers, so the Jacobian matrix of each missing layer is effectively treated as the identity matrix (I) during backpropagation. This introduces a significant structural bias: the dimensionality of the error subspace of the gradient distortion grows linearly with the number of missing layers.
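The identity-Jacobian effect described above can be written out as follows. This is only a sketch in notation chosen here, not notation taken from the patent: $h_l$ is the hidden state after layer $l$, $J_l$ is that layer's Jacobian, $L$ is the number of layers, $\mathcal{L}$ is the task loss, and $\mathcal{S}$ is the set of layers a client skips. It also ignores the fact that skipping layers changes the forward activations at which the remaining Jacobians are evaluated.

```latex
% Full model: gradient with respect to the input, through all L layers
\frac{\partial \mathcal{L}}{\partial h_0}
  = J_1^{\top} J_2^{\top} \cdots J_L^{\top}\,
    \frac{\partial \mathcal{L}}{\partial h_L},
\qquad J_l = \frac{\partial h_l}{\partial h_{l-1}}

% Client that skips the layers in S: every skipped Jacobian becomes I
\tilde{J}_l =
\begin{cases}
  I,   & l \in \mathcal{S} \\
  J_l, & l \notin \mathcal{S}
\end{cases}
\qquad
\widetilde{\frac{\partial \mathcal{L}}{\partial h_0}}
  = \tilde{J}_1^{\top} \tilde{J}_2^{\top} \cdots \tilde{J}_L^{\top}\,
    \frac{\partial \mathcal{L}}{\partial h_L}
```

Under these assumptions, the gap between the two gradients is the structural bias: each skipped layer replaces one factor $J_l$ with $I$, and a compensation module is meant to supply an approximation of that missing factor.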

[0005] 2) The communication overhead of existing compensation modules is still redundant: Although some studies have proposed inserting auxiliary lightweight modules to compensate for missing layers, transmitting all of these module parameters in every one of the hundreds of federated rounds still consumes expensive uplink and downlink bandwidth on the edge network.

[0006] 3) Traditional feature distillation carries the risk of "feature collapse": When pre-training the compensation module on the server side, existing methods typically use only a simple mean squared error (MSE) to align the module's output. However, an MSE loss on its own can easily lead the model to learn overly smooth, poorly discriminative features (i.e., feature collapse or homogenization), making it impossible for the compensation module to accurately capture the unique semantic function of a specific layer.

Summary of the Invention

[0007] The purpose of this invention is to overcome the shortcomings of the prior art and provide a method and system for federated fine-tuning of heterogeneous large models based on structural bias compensation.

[0008] The objective of this invention is achieved through the following technical solution: a federated fine-tuning method for heterogeneous large models based on structural bias compensation, comprising the following steps:

[0009] S1. Providing a heterogeneous large model federated fine-tuning architecture based on structural bias compensation, including one server and K heterogeneous clients;

[0010] S2. Server-side feature distillation based on semantic contrast enhancement: before each round $t$ of federated communication starts, the server uses a public proxy dataset to pre-train a lightweight compensation module $B_l$ for each layer $l$ of the global large model;

[0011] S3. Server-side sparse distribution based on a mask: when the server distributes the current round's backbone network subset and the distilled compensation modules $B_l$ to the clients, it executes a spatiotemporally sparse delivery mechanism;

[0012] S4. The client performs two-stage local fine-tuning that is aware of structural bias;

[0013] S5. The client performs INT8 dynamic quantization upload with error feedback: after local fine-tuning is completed, the client uploads the updated model parameters, using an INT8 dynamic quantization mechanism with local error compensation during the upload.

[0014] A heterogeneous large model federated fine-tuning system based on structural bias compensation includes a server and K heterogeneous clients;

[0015] The server includes:

[0016] A feature distillation module, which performs feature distillation based on semantic contrast enhancement: before each round $t$ of federated communication starts, it uses a public proxy dataset to pre-train a lightweight compensation module $B_l$ for each layer $l$ of the global large model;

[0017] A sparse delivery module, which performs sparse delivery based on a mask: when it distributes the current round's backbone network subset and the distilled compensation modules $B_l$ to the clients, it executes a spatiotemporally sparse delivery mechanism.

[0018] The client includes:

[0019] A local fine-tuning module, which performs two-stage local fine-tuning that is aware of structural bias.

[0020] A dynamic quantization upload module, which performs INT8 dynamic quantization upload with error feedback: after local fine-tuning is completed, the updated model parameters are uploaded using an INT8 dynamic quantization mechanism with local error compensation.

[0021] The beneficial effects of this invention are: 1. This invention introduces a lightweight structural compensation kernel (BRICK) that reconstructs the complete gradient flow on the client side with minimal parameter overhead (0.3%-0.5% of the original layer), eliminating the structural bias caused by skipping layers;

[0022] 2. This invention proposes a feature distillation algorithm based on semantic contrast enhancement, which not only enables the compensation module to accurately fit the numerical output of the original layer, but also introduces a contrastive learning mechanism that forces the compensation module to keep its per-layer representation unique and distinguishable in high-dimensional semantic space, preventing feature collapse;

[0023] 3. This invention proposes a residual dynamic sparse quantization communication mechanism with error feedback, which compresses the parameters transmitted in both directions between the edge devices and the cloud from continuous floating-point numbers to low-bit sparse matrices, thus breaking through the communication bottleneck.

Attached Figure Description

[0024] Figure 1 is a schematic diagram illustrating the principle of the present invention.

Detailed Implementation

[0025] The technical solution of the present invention will be further described in detail below with reference to the accompanying drawings, but the scope of protection of the present invention is not limited to the following description.

[0026] As shown in Figure 1, a federated fine-tuning method for heterogeneous large models based on structural bias compensation includes the following steps:

[0027] S1. Providing a heterogeneous large model federated fine-tuning architecture based on structural bias compensation, including one server and K heterogeneous clients;

[0028] The server stores a structurally complete global foundation model (the global large model). The original model is a pre-trained, complete ViT or MLP-Mixer model.

[0029] S2. Server-side feature distillation based on semantic contrast enhancement: before each round $t$ of federated communication starts, the server uses a public proxy dataset to pre-train a lightweight compensation module $B_l$ for each layer $l$ of the global large model;

[0030] In deep partial training (DPT) federated fine-tuning, because client computing power is limited, each client can only truncate the model and train a subset of its layers (a sub-network). The "original model," however, possesses all of the complete, untruncated network layers. Let $f_l$ denote the forward network structure and parameters of the $l$-th layer as it originally exists in this complete global model.

[0031] The public proxy dataset consists of publicly available sample feature information that the server can access, and has the following characteristics:

[0032] Publicly available sample feature information: This is a small batch of publicly available data that represents the natural domains that the various clients may be involved in.

[0033] Non-privacy: It does not contain the local private raw data of any particular client, thus ensuring privacy and security when it is processed on the server side.

[0034] Unlabeled or weakly labeled information: The operation here is "feature distillation" rather than supervised learning. The public proxy dataset is therefore used mainly as the forward input stimulus for the network and is not required to contain accurate labels for downstream tasks.

[0035] On the server side, the parameters of the original model are frozen (not updated), and the goal is to train the parameters of the lightweight compensation module $B_l$. The specific training process is as follows:

[0036] Step 1: Forward propagation to acquire features. Samples are drawn from the public proxy dataset and input into the model. When the data reaches layer $l$, the output features of the previous layer, $h_{l-1}$, are fed into:

[0037] 1. The $l$-th layer $f_l$ of the original model, to obtain the standard output $y_l = f_l(h_{l-1})$;

[0038] 2. The lightweight compensation module $B_l$, to obtain the simulated output $\hat{y}_l = B_l(h_{l-1})$;

[0039] 3. At the same time, the outputs $y_j$ ($j \neq l$) of all other layers of the original model are obtained.

[0040] Step 2: Calculate the composite loss function. Using the three sets of outputs obtained in Step 1, calculate the following composite loss:

[0041] Step 3: Backpropagation and parameter update. The feature alignment loss and the semantic contrast loss are combined in a weighted sum (with balancing coefficient β) to obtain the overall optimization objective. The gradient of this objective with respect to the parameters of the lightweight compensation module $B_l$ is computed by backpropagation, and an optimizer (such as Adam or SGD) updates only the parameters of $B_l$.

[0042] To ensure that $B_l$ accurately simulates the output of layer $f_l$ of the original model while capturing the unique semantic information of that layer, this invention designs a composite loss function $\mathcal{L}$ comprising two parts:

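[0043] Feature alignment loss (MSE loss): Ensures that the output $\hat{y}_l$ of the compensation module $B_l$ is as close as possible, in numerical space, to the output $y_l$ of the original layer $f_l$.

[0044] $\mathcal{L}_{align} = \frac{1}{d} \lVert \hat{y}_l - y_l \rVert_2^2$, where $d$ is the feature dimension.

[0045] Semantic contrastive loss (InfoNCE): To prevent compensation modules in different layers from learning similar, collapsed feature representations, contrastive learning is introduced. For the compensation module $B_l$ of the $l$-th layer, its output $\hat{y}_l$ and the output $y_l$ of the corresponding layer of the original model are treated as a positive sample pair, while $\hat{y}_l$ and the outputs $y_j$ ($j \neq l$) of all other layers of the original model are treated as negative sample pairs.

[0046] $\mathcal{L}_{con} = -\log \frac{\exp\left(\mathrm{sim}(\hat{y}_l, y_l)/\tau\right)}{\sum_{j=1}^{L} \exp\left(\mathrm{sim}(\hat{y}_l, y_j)/\tau\right)}$

[0047] where $\mathrm{sim}(\cdot,\cdot)$ is the cosine similarity, $\tau$ is the temperature coefficient, and $L$ is the number of layers. This loss forces the representation $\hat{y}_l$ to be close to $y_l$ while keeping it far from the representations $y_j$ of the other layers.

[0048] Overall optimization objective: $\mathcal{L} = \mathcal{L}_{align} + \beta\,\mathcal{L}_{con}$, where $\beta$ is the balance coefficient.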

[0049] A traditional MSE loss only constrains the compensation module to approximate the target layer numerically, which is a "point-to-point" low-dimensional constraint. In large models, the level of feature abstraction increases gradually with depth (semantic transfer). Using only MSE can easily lead a compensation module to learn an average feature, so that the outputs of compensation modules at different layers become too similar (feature collapse). The semantic contrastive loss (InfoNCE) introduced in this invention imposes a high-dimensional topological constraint: it forces the feature representation of each compensation module not only to be close to its target layer (maximizing the numerator) but also to move away from the feature representations of all other layers (minimizing the denominator). This elevates the distillation process from simple "numerical simulation" to "semantic-functional decoupling." It ensures that each lightweight compensation module accurately captures the unique semantic function at a specific depth of the original large model, preventing information confusion between layers. This is crucial for maintaining the expressive power of the deep architecture of the large model and significantly improves the generalization performance of the compensation module in complex downstream tasks.
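The composite loss described above can be sketched in plain Python. This is a minimal illustration rather than the patent's implementation: the function names, the default values of `beta` and `tau`, the use of flat feature vectors, and the choice to include the positive pair in the InfoNCE denominator (the standard form) are all assumptions made here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def mse(u, v):
    """Feature alignment loss: mean squared error between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def info_nce(pred, layer_outputs, target_idx, tau=0.1):
    """Semantic contrastive loss for one compensation module.

    pred          -- output of the compensation module for layer target_idx
    layer_outputs -- outputs of every layer of the frozen original model
    """
    logits = [cosine(pred, y) / tau for y in layer_outputs]
    peak = max(logits)  # subtract the max for numerical stability
    log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_denominator - logits[target_idx]


def composite_loss(pred, layer_outputs, target_idx, beta=0.5, tau=0.1):
    """Alignment loss plus beta times the contrastive loss."""
    align = mse(pred, layer_outputs[target_idx])
    contrast = info_nce(pred, layer_outputs, target_idx, tau)
    return align + beta * contrast
```

With three mutually orthogonal layer outputs, a prediction that matches its own target layer gets a loss near zero, while a prediction that matches a different layer is penalized by both terms.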

[0050] S3. Server-side sparse distribution based on a mask: when the server distributes the current round's backbone network subset and the distilled compensation modules $B_l$ to the clients, it executes a spatiotemporally sparse delivery mechanism.

[0051] Increment calculation: The server computes the change in the compensation module parameters relative to the previous round, $\Delta\Theta^{t} = \Theta^{t} - \Theta^{t-1}$, where $\Theta^{t}$ is the matrix formed by the parameters of the compensation modules $B_l$ of all layers of the global large model in the current round $t$, and $\Theta^{t-1}$ is the corresponding matrix in round $t-1$;

[0052] Mask generation: A quantile of the absolute values of $\Delta\Theta^{t}$ is computed to generate a binary mask matrix $M^{t}$. Only the top $p\%$ of parameters by absolute value are retained, and the rest are set to zero.

[0053] Sparse transmission: The server sends only the non-zero increment values and their corresponding sparse indices to the client. The client then reconstructs the parameters for the current round using its local cache from the previous round.

[0054] Local cache (baseline): The client's memory or hard drive stores the parameter matrix $\Theta^{t-1}$ from the end of the previous round (round $t-1$) of federated fine-tuning.

[0055] Receiving sparse information: The client receives two very small data packets from the server:

[0056] The non-zero increment values (i.e., the magnitudes of the parameter values that have changed significantly).

[0057] The corresponding sparse indices (mask indices), which record the specific coordinates or positions of these values in the parameter matrix.

[0058] Parameter reconstruction:

[0059] The client initializes an all-zero matrix locally and, based on the received sparse indices, fills the corresponding coordinates with the non-zero increment values to reconstruct the complete sparse increment matrix $\widetilde{\Delta\Theta}^{t} = M^{t} \odot \Delta\Theta^{t}$.

[0060] It then executes the formula $\Theta^{t} = \Theta^{t-1} + \widetilde{\Delta\Theta}^{t}$.

[0061] In this way, the parameters that were not transmitted (those that changed very little) keep their values from the previous round by default, so the client assembles the complete parameters $\Theta^{t}$ required for the current round without receiving the entire parameter file.
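The increment, mask, and reconstruction steps above can be sketched as follows. This is a minimal illustration under assumptions made here: parameters are flattened into Python lists, the top-p% selection is done by sorting instead of an explicit quantile, and the function names are hypothetical.

```python
def top_p_sparse_delta(current, previous, p):
    """Server side: keep the top p percent of increments by magnitude.

    Returns the sparse indices and the matching non-zero increment values.
    """
    delta = [c - q for c, q in zip(current, previous)]
    keep = max(1, round(len(delta) * p / 100))
    # Rank positions by absolute change, largest first.
    order = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    indices = sorted(order[:keep])
    values = [delta[i] for i in indices]
    return indices, values


def reconstruct(cached, indices, values):
    """Client side: rebuild this round's parameters from the local cache."""
    sparse_delta = [0.0] * len(cached)  # all-zero increment matrix
    for i, v in zip(indices, values):
        sparse_delta[i] = v
    return [c + d for c, d in zip(cached, sparse_delta)]


previous = [0.0, 1.0, 2.0, 3.0]
current = [0.5, 1.01, 2.0, 2.0]
indices, values = top_p_sparse_delta(current, previous, p=50)
rebuilt = reconstruct(previous, indices, values)
```

In this toy run only positions 0 and 3 are transmitted, so the small change at position 1 (from 1.0 to 1.01) is not applied on the client: the reconstruction is an approximation of the server's parameters, not an exact copy.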

[0062] S4. The client performs two-stage local fine-tuning that is aware of structural bias;

[0063] After receiving the model, the client uses its local private data to conduct training:

[0064] Phase 1 (Structural Compensation Training): Insert corresponding compensation modules at the locations of missing layers that are skipped by the client due to insufficient computing power. After connecting the computation graph, the weights of the retained backbone layer and the weights of the inserted compensation modules are jointly fine-tuned to minimize the loss of downstream tasks.

[0065] In the embodiments of this application, due to computing power limitations, the client cannot run the complete 12-layer model and may only retain layers 1, 3, 5, and 7 (skipping layers 2, 4, 6, etc.). This forced truncation will result in missing gradient flow during backpropagation. The specific process includes:

[0066] Plug-in filling: Insert the newly reconstructed, corresponding lightweight compensation modules (BRICKs) into the locations of skipped or missing layers in the network (such as layers 2, 4, and 6).

[0067] Connecting the computation graph: The original backbone layers and the inserted compensation modules are linked together to form a complete end-to-end network with an unbroken computation graph. Because the compensation modules are extremely small, the client's computing power can still handle the network even after the missing layers are filled.

[0068] Joint fine-tuning and optimization: Local private data (such as local medical images) is fed into the network, and the task loss (such as cross-entropy loss) between the true labels and the model predictions is calculated. Backpropagation is then performed, updating both the backbone layer weights retained by the client and the weights of the newly inserted compensation modules.
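The plug-in filling step of Phase 1 can be sketched as a forward pass that falls back to a compensation module wherever the client has no backbone layer. This is a toy illustration under assumptions made here: layers are plain Python callables keyed by depth, the function name is hypothetical, and the training step itself is omitted.

```python
def build_forward(num_layers, retained_layers, compensation_modules):
    """Return a forward function over depths 1..num_layers.

    retained_layers      -- dict: depth -> backbone layer kept by the client
    compensation_modules -- dict: depth -> lightweight stand-in for a skipped layer
    """
    def forward(x):
        for depth in range(1, num_layers + 1):
            if depth in retained_layers:
                x = retained_layers[depth](x)       # real backbone layer
            else:
                x = compensation_modules[depth](x)  # fills the missing layer
        return x
    return forward


# Toy 4-layer model: the client keeps layers 1 and 3 and fills 2 and 4.
retained = {1: lambda x: x + 1, 3: lambda x: x * 2}
compensation = {2: lambda x: x - 3, 4: lambda x: x + 10}
forward = build_forward(4, retained, compensation)
result = forward(0)
```

Because every depth now has some module, the computation graph is unbroken, and in a real framework the weights of both dictionaries would receive gradients during joint fine-tuning.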

[0069] Phase Two (Parallel Soft Distillation): For the backbone layer retained on the client side, it is connected in parallel with the corresponding compensation module. Lightweight soft distillation is performed using a linear annealing coefficient, enabling the compensation module to quickly absorb domain-specific knowledge from local data, providing better initialization for subsequent uploads to the server.

[0070] Phase one addressed the "missing layer" issue. Phase two aims to enable the compensation modules corresponding to the retained backbone layers to quickly learn the unique distribution (domain knowledge) of local data, so that it can be transmitted back to the server in the next round, including:

[0071] Parallel Structure: For backbone layers that the client can retain (e.g., layers 1 and 3), the corresponding compensation module for that layer is connected in parallel with the backbone layer for computation. For the same feature input, both the backbone layer and the compensation module produce an output.

[0072] Linear annealing fusion (core technique): A coefficient $\lambda$ that decreases with the number of training steps (falling linearly from 1 to 0) is used:

[0073] Final output features = $\lambda$ × backbone layer output + $(1 - \lambda)$ × compensation module output;

[0074] At the beginning of training ($\lambda = 1$), the network fully trusts the large backbone layer (teacher).

[0075] As training progresses ($\lambda \to 0$), the network relies increasingly on the output of the compensation module (student).

[0076] Calculate distillation loss: Calculate the mean square error (MSE) between the compensation module output and the backbone output, prompting the lightweight compensation module to mimic the behavior of the backbone.
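The parallel structure, the linear annealing fusion, and the distillation loss of Phase 2 can be sketched as below. This is a minimal illustration: the function names, the exact schedule (reaching 0 on the last step), and the use of flat feature vectors are assumptions made here.

```python
def annealing_coefficient(step, total_steps):
    """Coefficient that falls linearly from 1 (first step) to 0 (last step)."""
    if total_steps <= 1:
        return 0.0
    return max(0.0, 1.0 - step / (total_steps - 1))


def fused_output(backbone_out, comp_out, lam):
    """Blend the backbone (teacher) and compensation (student) outputs."""
    return [lam * b + (1.0 - lam) * c for b, c in zip(backbone_out, comp_out)]


def distillation_loss(backbone_out, comp_out):
    """MSE that pushes the compensation module to mimic the backbone layer."""
    diffs = [(b - c) ** 2 for b, c in zip(backbone_out, comp_out)]
    return sum(diffs) / len(diffs)


# Halfway through an 11-step schedule the two branches are weighted equally.
lam_mid = annealing_coefficient(5, 11)
blended = fused_output([2.0, 4.0], [0.0, 0.0], lam_mid)
```

Early in training the fused output is dominated by the backbone layer, so the downstream layers see reliable features while the compensation module is still learning.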

[0077] S5. Client performs INT8 dynamic quantization upload with error feedback: After local fine-tuning is completed, the client uploads the updated model parameters, using the INT8 dynamic quantization mechanism with local error compensation during the upload.

[0078] The current update is summed with the accumulated quantization error from the previous cache. The sum is then dynamically quantized into INT8 format, and a new quantization error is calculated and stored in the local cache for compensation in the next round. The client uploads only the quantized INT8 matrix and the scaling factor. This quantization function supports flexible on/off control, so users can enable or disable it according to real-time network conditions and performance requirements: it is enabled to reduce data transmission when communication overhead becomes a bottleneck, and disabled to maximize accuracy for precision-sensitive tasks.

[0079] Quantization error (denoted $e^{t}$):

[0080] Time point: It arises in the current round (round $t$), when the quantization operation ends.

[0081] Meaning: When a high-precision floating-point number is forcibly rounded to a low-precision INT8 integer, some of its fractional part is inevitably "erased". This portion of the actual value, which is lost to quantization in the current round and not sent to the server, is called the "quantization error" and is stored in the local cache.

[0082] Accumulated quantization error from the previous cache (denoted $e^{t-1}$):

[0083] Time point: It is used in the current round (round $t$), before the quantization operation starts.

[0084] Meaning: This is the numerical residual that was not sent in the previous round (round $t-1$) because of precision truncation. It is retrieved from the cache in this round as historical information and added to this round's update for processing.

[0085] In the embodiments of this application, the physical meaning of the scaling factor is "how many floating-point units one INT8 unit represents". After receiving the INT8 data, the server only needs to multiply it by this factor to recover an approximate floating-point matrix.

[0086] The scaling factor uses the most common scheme, symmetric linear quantization, and is calculated as follows:

[0087] Determine the data to be quantized: Let the accumulated matrix to be quantized in this round be $X^{t}$ (in FP32 format).

[0088] Find the maximum absolute value: The largest absolute value among the elements of $X^{t}$ is denoted $R = \max_i |X^{t}_i|$.

[0089] Calculate the scaling factor $s$: The INT8 range used is [-127, 127] (-128 is usually reserved or left unused so that the range stays symmetric). To map floating-point numbers in the range [-R, R] to [-127, 127], the scaling factor is calculated as $s = R / 127$.

[0090] With the scaling factor, the "error" generated in the current round can be computed.

[0091] Perform dynamic quantization (to obtain the INT8 matrix $Q^{t}$ to be uploaded): $Q^{t} = \mathrm{round}(X^{t} / s)$, where $\mathrm{round}(\cdot)$ denotes rounding to the nearest integer. $Q^{t}$ is the INT8 matrix that is ultimately transmitted to the server.

[0092] Simulate server-side dequantization (to obtain the approximate floating-point matrix $\hat{X}^{t}$): If the server receives $Q^{t}$ and $s$, the value it recovers is $\hat{X}^{t} = s \cdot Q^{t}$.

[0093] Calculate the current quantization error $e^{t}$: Subtract the dequantized approximate matrix $\hat{X}^{t}$ from the original floating-point matrix to be quantized, $X^{t}$. What remains is the portion that was truncated by rounding: $e^{t} = X^{t} - \hat{X}^{t}$. The computed $e^{t}$ is immediately stored in the client's local memory or cache, and the current round ends.
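As a worked example of the steps above, with made-up numbers: the three-element vector is purely illustrative, and the symbols are labels chosen here for the matrix to be quantized ($X$), its maximum absolute value ($R$), the scaling factor ($s$), the INT8 matrix ($Q$), the dequantized matrix ($\hat{X}$), and the quantization error ($e$).

```latex
X = (1.27,\; 0.004,\; -0.64), \qquad R = \max_i |X_i| = 1.27, \qquad s = R / 127 = 0.01

Q = \mathrm{round}(X / s) = \mathrm{round}(127,\; 0.4,\; -64) = (127,\; 0,\; -64)

\hat{X} = s \cdot Q = (1.27,\; 0,\; -0.64)

e = X - \hat{X} = (0,\; 0.004,\; 0)
```

The small entry 0.004 is below half a quantization step (0.005), so it is transmitted as 0 and survives only in the error $e$, which the client keeps for the next round.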

[0094] In the embodiments of this application, in the next round (round $t+1$), the client completes the local two-stage fine-tuning and computes a new model update (gradient), denoted $\Delta^{t+1}$ (FP32 format).

[0095] Without compensation: If $\Delta^{t+1}$ is quantized directly, parameter updates with very small values (below the quantization step) are rounded to 0 again. Over time, updates in these small-magnitude directions never reach the server, and the model may fail to converge.

[0096] With compensation (error feedback):

[0097] 1. Accumulate historical errors: Before quantization, retrieve the previous round's error $e^{t}$ from the local cache and add it to this round's update to obtain a new matrix to be quantized: $X^{t+1} = \Delta^{t+1} + e^{t}$;

[0098] 2. Requantization: For $X^{t+1}$, repeat the above steps of calculating the scaling factor, quantizing, and calculating the new error.

[0099] The core meaning of compensation: The fractional values that were truncated to 0 in the previous round because they were too small do not disappear; they are preserved in the error. In the next round or in subsequent rounds, as this error accumulates, it eventually exceeds the minimum quantization step and is quantized as 1 or -1 and sent to the server. Mathematically, this mechanism ensures that, in the long run, the quantized parameters uploaded by the client are consistent in expectation with the full-precision parameters (an unbiased estimate).
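The error feedback loop can be sketched as a single function that the client calls once per round. This is a minimal illustration under assumptions made here: updates are flat Python lists, the function name is hypothetical, and Python's built-in `round` (which rounds exact halves to the nearest even integer) stands in for round-to-nearest.

```python
def quantize_with_feedback(update, residual):
    """Symmetric INT8 quantization of (update + cached residual).

    Returns the INT8 values, the scaling factor, and the new residual
    that the client should cache for the next round.
    """
    to_quantize = [u + r for u, r in zip(update, residual)]
    max_abs = max(abs(x) for x in to_quantize)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the symmetric INT8 range.
    q = [max(-127, min(127, round(x / scale))) for x in to_quantize]
    dequantized = [scale * v for v in q]  # what the server will recover
    new_residual = [x - d for x, d in zip(to_quantize, dequantized)]
    return q, scale, new_residual


update = [1.27, 0.004, -0.64]
q1, s1, r1 = quantize_with_feedback(update, [0.0, 0.0, 0.0])
q2, s2, r2 = quantize_with_feedback(update, r1)
```

In the first round the small entry 0.004 is sent as 0 and stored in the residual. In the second round the same update plus the residual gives 0.008, which crosses half a quantization step and is sent as 1.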

[0100] A heterogeneous large model federated fine-tuning system based on structural bias compensation includes a server and K heterogeneous clients;

[0101] The server includes:

[0102] A feature distillation module, which performs feature distillation based on semantic contrast enhancement: before each round $t$ of federated communication starts, it uses a public proxy dataset to pre-train a lightweight compensation module $B_l$ for each layer $l$ of the global large model;

[0103] A sparse delivery module, which performs sparse delivery based on a mask: when it distributes the current round's backbone network subset and the distilled compensation modules $B_l$ to the clients, it executes a spatiotemporally sparse delivery mechanism.

[0104] The client includes:

[0105] A local fine-tuning module, which performs two-stage local fine-tuning that is aware of structural bias.

[0106] A dynamic quantization upload module, which performs INT8 dynamic quantization upload with error feedback: after local fine-tuning is completed, the updated model parameters are uploaded using an INT8 dynamic quantization mechanism with local error compensation.

[0107] This invention makes full use of two core characteristics in the deep learning training process: sparsity and redundancy.

[0108] In the downlink (the sparse distribution link in S3), since the model parameters usually change little between adjacent rounds, transmitting only the top $p\%$ of parameters with the largest changes (the sparse increments) is sufficient to carry most of the optimization information, thereby significantly reducing downlink bandwidth.

[0109] In the uplink (the dynamic quantization upload link in S5), directly compressing 32-bit floating-point numbers into 8-bit integers introduces significant quantization noise, which can prevent the model from converging. The core of this invention lies in introducing an "error feedback loop": the client's local memory "remembers" the precision residuals discarded in each round of quantization and compensates for them first in the next round. Mathematically, this ensures that, in the long run, the cumulative expectation of the quantized gradient is an unbiased estimate of the full-precision gradient.
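A rough payload estimate makes the two savings concrete. The byte sizes here are assumptions made for illustration, not figures from the patent: FP32 values (4 bytes), 32-bit sparse indices (4 bytes), one byte per INT8 value, and a single FP32 scaling factor per upload.

```python
def dense_bytes(num_params, value_bytes=4):
    """Payload when every parameter is sent as a full-precision value."""
    return num_params * value_bytes


def downlink_bytes(num_params, p, value_bytes=4, index_bytes=4):
    """Sparse downlink: one value plus one index for each kept parameter."""
    kept = max(1, round(num_params * p / 100))
    return kept * (value_bytes + index_bytes)


def uplink_bytes(num_params, scale_bytes=4):
    """Quantized uplink: one byte per INT8 value plus the scaling factor."""
    return num_params + scale_bytes


n = 1_000_000
full = dense_bytes(n)
down = downlink_bytes(n, p=10)
up = uplink_bytes(n)
```

Under these assumptions, sending the top 10% of increments costs 800,000 bytes against 4,000,000 bytes for the dense matrix, and the INT8 upload costs about a quarter of the dense size. Because each kept value also carries an index, the sparse format only pays off here when fewer than half of the parameters are kept.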

[0110] This innovative two-way combination completely breaks down the "bandwidth wall" of federated learning in the era of large models. It is not just simple data compression, but a gradient-preserving communication mechanism. It makes it possible to conduct collaborative training of large models in Internet of Things (IoT) devices with extremely limited bandwidth or in unstable cellular network environments, greatly expanding the application boundaries of federated learning.

[0111] In the embodiments of this application, non-independent and identically distributed (Non-IID) experiments were conducted on benchmark datasets such as DomainNet and NICO++. Using the ViT-Base large model, the average test accuracy of this invention reached 70.33% and 88.11%, respectively, surpassing advanced comparative methods such as DepthFL, InclusiveFL, and FedRA, with an accuracy improvement of up to 5.02%. In extremely heterogeneous scenarios, where client computing power differs vastly and some devices can only run models with very few layers (e.g., only 4 layers), the performance of traditional methods drops sharply. The compensation mechanism of this invention ensures that low-end devices can still contribute high-quality gradient updates. Experiments show that even under extremely high system heterogeneity, the accuracy of this invention is still higher than that of the second-best method, demonstrating stability and robustness. By combining downlink sparse mask transmission with uplink INT8 error feedback quantization, this invention significantly reduces the effective payload of federated communication while maintaining model accuracy. This makes it possible to deploy large model federated fine-tuning in edge network environments.

[0112] The foregoing description illustrates and describes a preferred embodiment of the present invention. However, as previously stated, it should be understood that the present invention is not limited to the forms disclosed herein and should not be construed as excluding other embodiments. It can be used in various other combinations, modifications, and environments, and can be altered within the scope of the inventive concept described herein through the foregoing teachings or techniques or knowledge in related fields. Any modifications and variations made by those skilled in the art that do not depart from the spirit and scope of the present invention should be within the protection scope of the appended claims.

Claims

1. A federated fine-tuning method for heterogeneous large models based on structural deviation compensation, characterized in that it comprises the following steps:
S1. Providing a heterogeneous large model federated fine-tuning architecture based on structural deviation compensation, comprising one server and K heterogeneous clients;
S2. Server-side feature distillation based on semantic contrast enhancement: before each round of federated communication begins, the server uses a public proxy dataset to pre-train a lightweight compensation module for each layer of the global large model, wherein the original model is assumed to contain all of its complete, unpruned network layers, and the function of each original layer denotes the forward network structure and parameters of that layer as it exists in the original model; in step S2, so that each compensation module accurately simulates the output of its corresponding original layer while capturing the semantic information unique to that layer, a composite loss function with two parts is designed: feature alignment loss, which ensures that the output of the compensation module is as close as possible to the output of the original layer in numerical space; and semantic contrastive loss, which introduces contrastive learning to prevent the compensation modules of different layers from learning similar, collapsed feature representations: for the compensation module of a given layer, its output and the output of the corresponding layer of the original model are treated as a positive sample pair, while its output and the outputs of all other layers of the original model are treated as negative sample pairs, the semantic contrastive loss being computed from cosine similarity scaled by a temperature coefficient, so that the loss forces the representation of the compensation module to be close to that of its corresponding original layer and far from the representations of the other layers; the overall optimization objective combines the feature alignment loss and the semantic contrastive loss through a balance coefficient;
S3. Server-side sparse distribution based on a mask: when the server distributes the backbone network subset for the current round and the distilled compensation modules to a client, a spatiotemporally sparse delivery mechanism is executed, step S3 comprising: increment calculation: computing the change in the parameters of the server-side compensation modules relative to the previous round, namely the difference between the matrix formed by the compensation-module parameters of all layers of the global large model in the current round t and the corresponding matrix in round t-1; mask generation: computing a quantile of the absolute values of the increment to generate a binary mask matrix, so that only the specified top percentage of increments by absolute value is retained and the remaining increments are set to zero; sparse transmission: the server sends only the non-zero increment values and their corresponding sparse indices to the client, and the client reconstructs the parameters of the current round using its local cache from the previous round;
S4. The client performs two-stage local fine-tuning based on structural deviation perception;
S5. The client performs INT8 dynamic quantization upload with error feedback: after local fine-tuning is completed, the client uploads the updated model parameters, using an INT8 dynamic quantization mechanism with local error compensation during the upload.
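The composite loss of step S2 can be illustrated with a minimal sketch. The claim text does not give the exact formulas, so the forms below are assumptions: mean squared error for the feature alignment loss, an InfoNCE-style softmax over cosine similarities for the semantic contrastive loss, and a weighted sum for the overall objective. The names `alignment_loss`, `contrastive_loss`, `composite_loss`, `tau`, and `lam` are hypothetical and chosen only for illustration.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two 1-D feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def alignment_loss(comp_out, orig_out):
    """Assumed form: mean squared error between the compensation-module
    output and the output of the original layer."""
    return float(np.mean((comp_out - orig_out) ** 2))

def contrastive_loss(comp_out, orig_outs, layer, tau=0.1):
    """Assumed InfoNCE-style form: orig_outs[layer] is the positive sample,
    the outputs of all other original layers are negatives."""
    logits = np.array([cosine_sim(comp_out, o) / tau for o in orig_outs])
    logits -= logits.max()  # subtract the max for numerical stability
    log_prob = logits[layer] - np.log(np.exp(logits).sum())
    return float(-log_prob)

def composite_loss(comp_out, orig_outs, layer, lam=0.5, tau=0.1):
    """Alignment loss plus the contrastive loss weighted by a balance coefficient."""
    return (alignment_loss(comp_out, orig_outs[layer])
            + lam * contrastive_loss(comp_out, orig_outs, layer, tau))

# Toy usage: 4 original layers, each producing a 16-dimensional feature.
rng = np.random.default_rng(0)
orig_outs = rng.normal(size=(4, 16))
loss_value = composite_loss(orig_outs[2] + 0.01, orig_outs, layer=2)
```

In this sketch, a compensation module whose output matches its own layer receives a low contrastive loss, while one whose output resembles another layer is penalized, which is the anti-collapse behaviour the claim describes.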

2. The heterogeneous large model federated fine-tuning method based on structural deviation compensation according to claim 1, characterized in that: the server stores a structurally complete global model, and the original model is a pre-trained, complete ViT or MLP-Mixer model.

3. The heterogeneous large model federated fine-tuning method based on structural deviation compensation according to claim 1, characterized in that step S4 comprises: after the client receives and reconstructs the model, it uses its local private data to conduct two-stage training, thereby achieving local fine-tuning; phase one is structural compensation training: the corresponding compensation modules are inserted at the positions of the missing layers that the client skips due to insufficient computing power, and, once the computation graph is connected, the weights of the retained backbone layers and the weights of the inserted compensation modules are jointly fine-tuned to minimize the downstream task loss; phase two is parallel soft distillation: each backbone layer retained by the client is connected in parallel with its corresponding compensation module, and a lightweight soft distillation is performed using a linear annealing coefficient, enabling the compensation module to quickly absorb the domain-specific knowledge of the local data and providing a better initialization for the subsequent upload to the server.
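The two phases of step S4 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: layers are modelled as plain callables, the distillation term is assumed to be a mean squared error between the parallel backbone layer and its compensation module, and the annealing coefficient is assumed to decay linearly from 1 to 0. The claim fixes neither the direction of the annealing nor the form of the distillation loss. All function names are hypothetical.

```python
import numpy as np

def client_forward(x, backbone_layers, comp_modules, kept):
    """Phase one: run retained backbone layers and substitute the compensation
    module wherever the client skipped a layer.
    backbone_layers: dict {layer index: callable} for retained layers.
    comp_modules: list of callables, one per layer of the full model.
    kept: set of layer indices the client retains."""
    for i in range(len(comp_modules)):
        layer = backbone_layers[i] if i in kept else comp_modules[i]
        x = layer(x)
    return x

def linear_anneal(step, total_steps, start=1.0, end=0.0):
    """Linearly interpolate a coefficient from start (first step) to end (last step)."""
    frac = min(max(step / max(total_steps - 1, 1), 0.0), 1.0)
    return start + (end - start) * frac

def soft_distill_loss(backbone_out, comp_out, task_loss, step, total_steps):
    """Phase two (assumed form): task loss plus an annealed MSE term that pulls
    the compensation module toward the backbone layer running in parallel."""
    alpha = linear_anneal(step, total_steps)
    distill = float(np.mean((comp_out - backbone_out) ** 2))
    return task_loss + alpha * distill
```

For example, a client that retains layers 0 and 2 of a three-layer model would call `client_forward(x, {0: f0, 2: f2}, [c0, c1, c2], kept={0, 2})`, so that only the skipped layer 1 is served by its compensation module `c1`.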

4. The heterogeneous large model federated fine-tuning method based on structural deviation compensation according to claim 1, characterized in that step S5 comprises: adding the current update to the cumulative quantization error cached from the previous round, and dynamically quantizing the result into INT8 format; at the same time, computing the new quantization error and storing it in the local cache for compensation in the next round, wherein the quantization error is the difference between the value after dynamic quantization into INT8 format and the value before quantization, and the cumulative quantization error cached from the previous round is the quantization error stored in the cache in the previous round; the client uploads only the quantized INT8 matrix and the scaling factor to the server, wherein the physical meaning of the scaling factor is the floating-point value represented by one INT8 unit.
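The error feedback loop of step S5 can be sketched as below. The sketch assumes symmetric per-tensor quantization with a scaling factor of max|x| / 127, and stores the residual as the pre-quantization value minus the dequantized value so that adding it to the next update compensates the loss; the claim does not fix the quantizer or the sign convention, and the class and function names are hypothetical.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor dynamic quantization (assumed scheme).
    Returns the INT8 matrix and the scaling factor, i.e. the floating-point
    value represented by one INT8 unit."""
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Server-side reconstruction of the floating-point update."""
    return q.astype(np.float32) * scale

class ErrorFeedbackUploader:
    """Client-side cache of the quantization error carried between rounds."""

    def __init__(self, shape):
        self.residual = np.zeros(shape, dtype=np.float32)

    def encode(self, update):
        # Add the error cached in the previous round to the current update.
        compensated = update.astype(np.float32) + self.residual
        q, scale = quantize_int8(compensated)
        # Cache the new quantization error for the next round.
        self.residual = compensated - dequantize(q, scale)
        # Only the INT8 matrix and the scaling factor are uploaded.
        return q, scale
```

With this convention the errors telescope: over any number of rounds, the sum of the dequantized uploads plus the residual still held by the client equals the sum of the true updates, so quantization error does not accumulate on the server side.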

5. The heterogeneous large model federated fine-tuning method based on structural deviation compensation according to claim 4, characterized in that: the INT8 dynamic quantization upload supports flexible on/off control, allowing the user to enable or disable it according to real-time network conditions and performance requirements; when communication overhead becomes a bottleneck, the INT8 dynamic quantization upload function is enabled to reduce the data transmission volume; when precision-sensitive tasks are performed, the INT8 dynamic quantization upload function is disabled to maximize accuracy.
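The on/off control can be pictured as a single flag on the upload path. The sketch below is an assumption-laden illustration: `encode_update` and `use_int8` are hypothetical names, the quantizer is the same assumed symmetric per-tensor scheme, error feedback is omitted for brevity, and the byte counts cover only the tensor data plus one float32 scaling factor.

```python
import numpy as np

def encode_update(update, use_int8):
    """Return (payload, payload size in bytes) for one upload.
    use_int8=False sends the update as float32 for maximum accuracy;
    use_int8=True sends an INT8 matrix plus one scaling factor."""
    if not use_int8:
        data = update.astype(np.float32)
        return {"dtype": "float32", "data": data}, data.nbytes
    max_abs = float(np.max(np.abs(update)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(update / scale), -127, 127).astype(np.int8)
    # 4 extra bytes account for transmitting the scale as a float32.
    return {"dtype": "int8", "data": q, "scale": scale}, q.nbytes + 4

update = np.linspace(-1.0, 1.0, 1000)
_, bytes_fp32 = encode_update(update, use_int8=False)
_, bytes_int8 = encode_update(update, use_int8=True)
```

Since float32 uses four bytes per element and INT8 uses one, enabling the flag cuts the upload of a large tensor to roughly a quarter of its size, which is the trade-off the claim leaves to the user.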

6. A heterogeneous large model federated fine-tuning system based on structural deviation compensation, for implementing the method according to any one of claims 1 to 5, characterized in that it comprises one server and K heterogeneous clients; the server comprises: a feature distillation module, which performs feature distillation based on semantic contrast enhancement: before each round of federated communication begins, it uses a public proxy dataset to pre-train a lightweight compensation module for each layer of the global large model; and a sparse delivery module, which performs mask-based sparse delivery: when distributing the backbone network subset for the current round and the distilled compensation modules to a client, it executes a spatiotemporally sparse delivery mechanism; each client comprises: a local fine-tuning module, which performs two-stage local fine-tuning based on structural deviation perception; and a dynamic quantization upload module, which performs INT8 dynamic quantization upload with error feedback: after local fine-tuning is completed, the updated model parameters are uploaded using an INT8 dynamic quantization mechanism with local error compensation.
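The sparse delivery module's three operations (increment calculation, quantile-based mask generation, and client-side reconstruction from the cached previous round) can be sketched as follows. The function names and the `keep_percent` parameter are hypothetical, and the flat-index encoding of the sparse indices is an assumption; the claims only require that non-zero increments and their indices be sent.

```python
import numpy as np

def make_sparse_delta(curr, prev, keep_percent):
    """Server side: compute the parameter increment between rounds and keep
    only the top keep_percent% of entries by absolute value.
    Returns flat indices and the corresponding non-zero increment values."""
    delta = curr - prev
    # Threshold at the (1 - keep_percent/100) quantile of |delta|.
    thr = np.quantile(np.abs(delta), 1.0 - keep_percent / 100.0)
    mask = (np.abs(delta) >= thr) & (delta != 0)  # binary mask matrix
    idx = np.flatnonzero(mask)
    return idx, delta.ravel()[idx]

def apply_sparse_delta(cached, idx, values):
    """Client side: rebuild the current-round parameters from the local cache
    of the previous round plus the received sparse increments."""
    out = cached.copy().ravel()
    out[idx] += values
    return out.reshape(cached.shape)

# Toy usage: 10 parameters, only the 20% largest changes are delivered.
prev = np.zeros(10)
curr = np.arange(10, dtype=float)
idx, values = make_sparse_delta(curr, prev, keep_percent=20)
rebuilt = apply_sparse_delta(prev, idx, values)
```

In this sketch the increments that fall below the threshold are simply not delivered, so the client's reconstruction matches the server's parameters only at the transmitted positions; how the remaining difference is handled in later rounds is not specified in the claims.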