An industrial internet of things-oriented semantic consistent federated knowledge distillation method
By generating proxy data through image-to-text multi-agent and text conditional diffusion models, and dynamically selecting teacher clients for adaptive knowledge distillation, the problems of semantic misalignment and high communication overhead under non-independent and identically distributed conditions in the Industrial Internet of Things are solved, thereby improving the convergence stability and generalization performance of the model.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications (China)
- Current Assignee / Owner
- QINGDAO UNIV OF TECH
- Filing Date
- 2026-05-15
- Publication Date
- 2026-06-12
Abstract
Description
Technical Field
[0001] This invention belongs to the fields of the Industrial Internet of Things (IIoT) and federated learning, and specifically relates to a semantically consistent federated knowledge distillation method for the Industrial Internet of Things.
Background Technology
[0002] In the Industrial Internet of Things (IIoT) scenario, numerous sensors, edge controllers, industrial vision devices, and predictive maintenance systems continuously collect data on equipment operation, fault diagnosis, and quality inspection. This data is typically distributed across different factories, production lines, or equipment nodes, and possesses high commercial sensitivity and privacy attributes. Directly aggregating and using this data for model training can easily lead to data leaks and cross-entity data compliance risks.
[0003] Federated learning, due to its ability to retain original data locally and upload only model parameters or gradients for collaborative training, has become an important technical path for distributed intelligent modeling in the Industrial Internet of Things (IIoT). However, the application of federated learning in IIoT still faces significant technical challenges. First, the differences in equipment types, operating conditions, sensor configurations, and task objectives among different industrial nodes result in the local data of each client typically exhibiting non-independent and identically distributed (Non-IID) characteristics. Under these conditions, traditional federated learning methods based on model parameter or gradient aggregation (such as FedAvg) are prone to client drift, local model bias, and global average information loss, thereby reducing model convergence stability and generalization ability. Second, industrial edge environments generally suffer from constraints such as limited bandwidth, limited communication rounds, and limited terminal computing resources. Frequent transmission of complete model parameters or high-dimensional intermediate features can create a significant communication burden.
[0004] To mitigate the communication overhead and heterogeneous data impact of parameter aggregation, existing techniques incorporate knowledge distillation into federated learning. This involves transmitting knowledge representations such as soft labels, category probability distributions, feature embeddings, or historical model outputs to achieve knowledge transfer between clients. Compared to full parameter transmission, this knowledge distillation-based federated learning method has lower communication costs and can support heterogeneous model collaboration to some extent. However, existing knowledge distillation methods typically rely on numerical knowledge carriers (such as logits, probability distributions, or hidden features). In severe Non-IID scenarios, the numerical outputs of different clients often correspond to their respective local data distributions and local decision boundaries. The same category or similar samples lack consistent semantic representation across different clients. Directly aggregating or distilling these outputs can easily lead to semantic misalignment, knowledge drift, and negative transfer, affecting the stable recognition capability of the global model.
[0005] Furthermore, some knowledge distillation-based federated learning methods rely on public or auxiliary datasets on the server side as a shared distillation medium to align outputs from different clients and train a global model. However, in an Industrial Internet of Things (IIoT) environment, representative publicly available industrial data is difficult to obtain, and there may be distributional discrepancies between publicly available data and actual operating conditions, failing to adequately cover specific production lines, equipment, or failure modes. Other data-free distillation methods employ generative models to construct surrogate samples to replace real public data for knowledge transfer; however, existing generative surrogate data methods suffer from unstable generation quality, insufficient sample reliability in the early training phase, heavy server-side computational burden, and insufficient semantic controllability of synthesized samples, making it difficult to form a reliable, compact, and consistent shared knowledge space with multiple clients.
[0006] In recent years, the development of large language models, multi-agent collaborative models, and diffusion models has provided new solutions to the aforementioned problems. Multi-agent mechanisms can extract semantics, assess quality, and screen consistency from local samples from different perspectives; diffusion models can generate controllable and diverse proxy data under semantic constraints. However, existing federated learning and knowledge distillation schemes typically fail to effectively combine multi-agent semantic consistency construction, text-conditional diffusion-based proxy data generation, and dynamic teacher knowledge construction tailored to client capability differences. This makes it difficult to simultaneously address issues such as semantic misalignment caused by Non-IID, lack of common data, insufficient proxy data quality, high communication overhead, and static bias in the distillation target.
[0007] In summary, while existing federated learning methods for the Industrial Internet of Things (IIoT) have made some progress in terms of privacy protection, communication efficiency, and heterogeneous data collaboration, there is still a lack of technical solutions that can establish a consistent semantic space across clients using lightweight semantic knowledge without transmitting raw industrial data, and improve the model's convergence stability, generalization performance, and robustness through high-quality proxy data and dynamic knowledge distillation.
Summary of the Invention
[0008] To address the aforementioned technical issues, this invention provides a semantically consistent federated knowledge distillation method for the Industrial Internet of Things (IIoT). This method aims to establish a consistent semantic knowledge space across clients, generate high-quality and controllable proxy data, and achieve dynamic adaptive distillation without relying on public datasets or transmitting raw industrial data. This, in turn, improves the convergence stability and generalization ability of federated models in heterogeneous IIoT environments.
[0009] To achieve the above objectives, the technical solution of the present invention is as follows. A semantically consistent federated knowledge distillation method for the Industrial Internet of Things (IIoT) includes the following steps:
Step 1: Build a federated learning system comprising a server and multiple industrial clients, each client storing local private data;
Step 2: Each client uses an image-to-text multi-agent module to extract candidate semantic texts from its local private data; after quality evaluation and consistency screening, it generates unified semantic labels or semantic prototypes and uploads them to the server together with the outputs of its lightweight model;
Step 3: The server receives and aggregates the semantic labels or semantic prototypes uploaded by all clients to form a global semantic condition set;
Step 4: The server takes the global semantic condition set as input and generates a proxy dataset with a text-conditional diffusion model; the proxy dataset serves as the shared medium for cross-client knowledge distillation;
Step 5: The server uses the proxy dataset to evaluate the capability of each client model on each category, dynamically selects teacher clients for each student client based on the evaluation results, and thereby constructs dynamic teacher targets;
Step 6: Each client receives the dynamic teacher targets sent by the server, performs adaptive knowledge distillation on the proxy dataset, and updates its local model in combination with its local private data;
Step 7: Repeat Steps 2 to 6 until the model converges or the preset number of communication rounds is reached.
[0010] In the above scheme, the image-to-text multi-agent module in step two includes multiple text extraction agents and at least one evaluation agent; the text extraction agents are used to generate candidate semantic texts of the samples from different semantic perspectives; the evaluation agent is used to score the quality of the candidate semantic texts according to the correspondence between the samples and the candidate semantic texts, and to remove texts that are below a preset quality threshold.
[0011] In the above scheme, the consistency screening in step two specifically involves: mapping candidate semantic texts to a shared semantic vector space, calculating the similarity between multiple candidate semantic descriptions of the same sample, and filtering out descriptions that are below a preset semantic consistency threshold; for samples that meet the consistency requirements, generating unified semantic labels through majority attribute fusion, template fusion, or constrained large language model summarization.
[0012] In the above scheme, when the server aggregates the semantic tags or semantic prototypes uploaded by the client in step three, it also performs deduplication, clustering and frequency-aware selection on the semantic information of multiple clients, and uses the lightweight model output uploaded by the client to calculate the confidence weight of each semantic condition, forming a semantic condition sampling distribution to guide the generation of proxy data.
[0013] In the above scheme, the text conditional diffusion model described in step four takes the global semantic condition set as the condition input, performs a reverse denoising process starting from random noise, and generates proxy samples that are consistent with the semantic conditions; the generated proxy samples and the corresponding semantic labels together constitute the proxy dataset of the current communication round.
[0014] In the above scheme, the dynamic selection of teacher clients in step five specifically involves: for the target student client, the server selects other clients whose ability indicators in this category are higher than those of the student client and meet the preset threshold as candidate teachers, and assigns aggregation weights according to the ability indicators of each teacher in the corresponding category, and aggregates the outputs of multiple teacher clients into the dynamic teacher target of the student client.
[0015] In the above scheme, the adaptive knowledge distillation in step six includes: the client simultaneously optimizes the soft target knowledge distillation loss and the semantic label supervision loss on the proxy dataset; and adaptively adjusts the weights of the knowledge distillation loss and the semantic supervision loss according to the difference between the client's own performance on the proxy dataset and the global average performance.
[0016] In a further technical solution, in step five, when there is a lack of teacher clients that meet the conditions under a certain category, the server uses the global aggregation result of the lightweight output of all clients as a substitute teacher signal to ensure that the distillation process is executed continuously.
[0017] In a further technical solution, the text conditional diffusion model is a pixel space diffusion model or a latent space diffusion model; the lightweight model output uploaded by the client is logits, soft labels, or low-dimensional feature summaries.
[0018] In a further technical solution, the client performs semantic knowledge generation and uploading at intervals during multiple rounds of communication, and the server recalculates dynamic teacher objectives or reuses historical proxy datasets at intervals during multiple rounds of communication.
[0019] Through the above technical solution, the semantically consistent federated knowledge distillation method for the Industrial Internet of Things provided by this invention has the following beneficial effects: First, this invention improves the semantic alignment capability of federated learning in heterogeneous industrial IoT scenarios. It elevates client-side knowledge representation from simple numerical logits or feature vectors to semantic knowledge generated, evaluated, and screened for consistency by multiple agents. This reduces interference from local data distribution and local decision boundaries on knowledge representation, mitigating the risks of semantic misalignment, knowledge drift, and negative transfer under non-independent and identically distributed conditions.
[0020] Second, it reduces reliance on public datasets and raw data sharing. This invention utilizes lightweight semantic information uploaded by the client as a condition to generate a proxy dataset on the server side through a text conditional diffusion model. It eliminates the need for centralized collection of real industrial data and does not require a pre-existing public dataset that perfectly matches actual working conditions, thus helping to meet the requirements of industrial data privacy protection, cross-entity collaboration, and data compliance.
[0021] Third, it improves the controllability of proxy data and the quality of distillation media. Since proxy samples are generated under the guidance of globally aggregated semantic conditions, the generation process can revolve around semantic knowledge jointly confirmed by multiple clients. Compared with generative proxy data methods that lack semantic constraints, it has stronger semantic interpretability, category coverage, and generation controllability, which helps to form a stable cross-client shared knowledge space.
[0022] Fourth, this invention improves the stability and adaptability of the knowledge distillation process. It evaluates the capabilities of different clients across different categories based on the proxy dataset and dynamically selects teachers, assigns teacher weights, and adjusts the distillation intensity for student clients. This avoids biased supervision caused by fixed teachers or uniform aggregation strategies, allowing different clients to absorb globally effective knowledge while maintaining the adaptability of their local features.
[0023] Fifth, it reduces communication burden and enhances applicability to industrial deployment. During communication, this invention primarily transmits semantic tags, semantic prototypes, and lightweight model outputs, reducing reliance on complete model parameters, high-dimensional gradients, or large-scale intermediate features. Simultaneously, the proxy dataset is generated on the server side and reused as a shared distillation medium, which helps reduce communication rounds and single-round transmission overhead, making it suitable for industrial IoT environments with limited bandwidth and equipment resources.
[0024] Sixth, it improves the model's convergence performance, generalization ability, and robustness. Through the combined effect of semantically consistent knowledge construction, text-conditional diffusion-based proxy data generation, and dynamic knowledge distillation, this invention can improve the convergence stability of model training under highly heterogeneous multi-client data conditions, enabling the federated model to better adapt to industrial intelligent recognition tasks across different equipment, working conditions, and category distributions.
Attached Figure Description
[0025] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the accompanying drawings used in the description of the embodiments or the prior art will be briefly introduced below.
[0026] Figure 1 is a schematic diagram of the system architecture of the semantically consistent federated knowledge distillation method for the Industrial Internet of Things disclosed in an embodiment of the present invention;
Figure 2 is a flowchart of the semantically consistent federated knowledge distillation method for the Industrial Internet of Things disclosed in an embodiment of the present invention;
Figure 3 compares the convergence performance of various federated learning methods on Fashion-MNIST under different data heterogeneity conditions, in panels (a), (b), and (c);
Figure 4 compares the convergence performance of various federated learning methods on CIFAR-10 under different data heterogeneity conditions, in panels (a), (b), and (c);
Figure 5 compares the convergence performance of various federated learning methods on CIFAR-100 under different data heterogeneity conditions, in panels (a), (b), and (c).
Detailed Implementation
[0027] This invention provides a semantically consistent federated knowledge distillation method for the Industrial Internet of Things (IIoT). To enable those skilled in the art to clearly and completely understand and implement this invention, the specific embodiments of the invention are described below using an IIoT image classification or fault identification scenario. These embodiments are only for illustrating the technical solutions of this invention and do not constitute a limitation on the scope of protection of this invention.
[0028] In this embodiment, as shown in Figure 1, an Industrial Internet of Things (IIoT) system comprises a server and multiple industrial clients. These industrial clients can be factory edge servers, production line control nodes, industrial camera nodes, equipment monitoring terminals, or other edge computing nodes with local training capabilities. Each client stores a local private dataset, which may include equipment images, fault images, operational status images, or other data suitable for industrial intelligent identification. The server does not receive raw data from the clients; it only receives the semantic texts, semantic prototypes, and lightweight model outputs uploaded by the clients, and is responsible for generating proxy data, constructing teacher knowledge, and coordinating communication rounds.
[0029] As shown in Figure 2, the method includes the following steps:
S1: Build a server-client collaborative federated learning system.
[0030] In this embodiment, the server initializes the number of clients participating in federated training, the number of communication rounds, the number of task categories, the semantic text quality threshold, the semantic consistency threshold, the size of the proxy dataset, the number of denoising steps of the text-conditional diffusion model, the distillation temperature, and the upper and lower bounds of the dynamic distillation weight. Each client initializes a local model and performs initial training on its local private data. Local data is never uploaded to the server; the server acts only as a coordinating node for semantic knowledge aggregation, proxy data generation, and distillation target construction.
[0031] S2: The client extracts local samples and generates candidate semantic text.
[0032] In communication round $r$, client $i$ extracts a subset of samples from its local private dataset. For each sample $x_j$ in the subset, the client invokes an image-to-text multi-agent module comprising multiple text extraction agents and at least one evaluation agent. The text extraction agents generate candidate semantic texts for the sample from different semantic perspectives; the evaluation agent scores the candidate texts according to their correspondence with the sample and removes texts below a preset quality threshold. In this embodiment, the module includes $M$ text extraction agents, and the $m$-th agent $\mathcal{A}_m$ generates one candidate semantic text from sample $x_j$, denoted $t_j^{(m)} = \mathcal{A}_m(x_j)$, $m = 1, \dots, M$. Each text extraction agent may use a different image captioning model, prompt template, sampling parameters, or visual attention strategy, so that the agents produce complementary semantic descriptions. The semantic descriptions express the equipment type, component state, fault characteristics, appearance attributes, operating-condition cues, or other interpretable semantic information in the sample that is relevant to the recognition task.
[0033] S3: The client evaluates and filters candidate semantic texts.
[0034] To reduce the amount of erroneous, irrelevant, or hallucinated descriptions entering the subsequent distillation process, the client configures an evaluation agent. For a sample $x_j$ and the candidate semantic text $t_j^{(m)}$ generated for it by the $m$-th text extraction agent, the evaluation agent assesses the quality of the text from the correspondence between the two and outputs a quality score $q_j^{(m)} = \mathcal{E}\big(x_j, t_j^{(m)}\big)$, where $\mathcal{E}$ denotes the evaluation agent. When $q_j^{(m)}$ is not lower than the preset semantic text quality threshold $\tau_q$, the candidate text is retained; when it is below $\tau_q$, the candidate text is removed. Through this step, the client forms a set of retained candidate semantic texts for each sample, preventing low-reliability semantic knowledge from affecting cross-client knowledge aggregation.
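The retain/remove rule in [0034] reduces to a threshold comparison on the evaluation agent's score. The following is a minimal Python sketch, assuming the evaluation agent can be modeled as a callable that returns a numeric score; `filter_by_quality` and the keyword-overlap `toy_keyword_agent` are hypothetical stand-ins for illustration, not the scoring model the method would actually use.

```python
from typing import Any, Callable, Dict, List, Set

# Hypothetical interface: an evaluation agent maps (sample, candidate text) to a score.
EvaluationAgent = Callable[[Any, str], float]


def filter_by_quality(sample: Any, candidates: List[str],
                      evaluate: EvaluationAgent,
                      quality_threshold: float) -> Dict[str, float]:
    """Return the retained candidates with their scores."""
    kept: Dict[str, float] = {}
    for text in candidates:
        score = evaluate(sample, text)
        if score >= quality_threshold:  # "not lower than" the threshold -> retained
            kept[text] = score
    return kept


def toy_keyword_agent(sample_attributes: Set[str], text: str) -> float:
    """Toy stand-in for an evaluation agent: share of words found in the sample's attributes."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in sample_attributes for w in words) / len(words)


# The sample is represented here by a set of ground-truth attribute words (toy assumption).
kept = filter_by_quality({"gear", "crack"},
                         ["gear crack", "gear surface crack", "blue sky"],
                         toy_keyword_agent, quality_threshold=0.6)
```

With a threshold of 0.6, "gear crack" (score 1.0) and "gear surface crack" (score about 0.67) are retained and "blue sky" (score 0.0) is removed.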
[0035] S4: The client performs semantic consistency calculations and generates unified semantic tags.
[0036] For the candidate semantic texts that pass quality screening, the client uses a semantic embedding function to map them into a shared semantic vector space, obtaining semantic embedding vectors. For each sample $x_j$, the client computes the average pairwise similarity among the embeddings of its candidate semantic descriptions and uses it as the sample's semantic consistency index. If the index is not lower than the preset semantic consistency threshold $\tau_c$, the candidate texts of the sample are judged semantically consistent; if it is below $\tau_c$, the sample is judged to have semantic ambiguity or conflict and is not included in the upload set.
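The consistency index in [0036] is an average of pairwise similarities between one sample's candidate-text embeddings. This sketch assumes cosine similarity over already-computed embedding vectors, because the text names neither the similarity measure nor the embedding function. It also treats a sample with only one surviving description as inconsistent, which is likewise an assumption.

```python
import math
from itertools import combinations
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def consistency_index(embeddings: List[Vector]) -> float:
    """Average pairwise cosine similarity among one sample's candidate-text embeddings."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        # Assumption: a single description gives no cross-agent evidence, so score 0.
        return 0.0
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)


def is_semantically_consistent(embeddings: List[Vector], threshold: float) -> bool:
    """Consistent when the index is not lower than the consistency threshold."""
    return consistency_index(embeddings) >= threshold
```

Averaging over all pairs, rather than comparing each description with a single reference, means that one outlier agent lowers the index for the whole sample instead of being silently ignored.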
[0037] For samples that meet the consistency requirements, the client aggregates the candidate semantic text to generate unified semantic tags. Aggregation methods can include keyword attribute voting, templated fusion, semantic embedding clustering, or constrained large language model summarization. The aggregation process is constrained to retain entity, attribute, state, and category cues relevant to the industrial recognition task, and does not introduce semantic information that does not correspond to the samples.
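Of the aggregation options listed in [0037], keyword attribute voting is the simplest to illustrate. This sketch shows only that option and assumes naive lowercase whitespace tokenization; a practical version would also need normalization, synonym handling, and stop-word removal. The function name is hypothetical.

```python
from collections import Counter
from typing import List


def fuse_by_attribute_voting(candidates: List[str], min_votes: int) -> str:
    """Keep keywords that occur in at least `min_votes` candidate texts.

    Counter preserves first-insertion order, so kept words follow their first appearance.
    """
    votes: Counter = Counter()
    for text in candidates:
        # dict.fromkeys de-duplicates words within one text while keeping their order,
        # so each candidate casts at most one vote per word.
        votes.update(dict.fromkeys(text.lower().split(), 1))
    return " ".join(word for word, count in votes.items() if count >= min_votes)


label = fuse_by_attribute_voting(
    ["bearing outer race crack", "outer race crack on bearing", "bearing with rust"],
    min_votes=2)
```

Here "bearing", "outer", "race", and "crack" each appear in at least two candidates, so the fused label is "bearing outer race crack". The unsupported words "on", "with", and "rust" are dropped, which matches the requirement not to introduce semantics that do not correspond to the sample.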
[0038] S5: The client forms a semantic knowledge set and uploads lightweight knowledge.
[0039] Client $i$ collects all unified semantic labels that satisfy the quality and consistency conditions in round $r$ into a semantic knowledge set $S_i^{r}$. To further reduce communication overhead, the client can embed the semantic labels belonging to the same category and take their mean or cluster center as a category-level semantic prototype $p_{i,k}^{r}$, i.e., the class-$k$ semantic prototype generated by client $i$ in round $r$. The client uploads $S_i^{r}$ or the prototypes $\{p_{i,k}^{r}\}$ to the server, together with the lightweight model output $z_i^{r}$ corresponding to this semantic knowledge. The lightweight model output can be logits, soft labels, or low-dimensional feature summaries. The client does not upload original samples, its complete local dataset, or high-dimensional model parameters.
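A category-level prototype as described in [0039] can be computed as a per-class mean of embedding vectors. This sketch covers only the mean option (not cluster centers) and assumes that embeddings arrive as `(class label, vector)` pairs. The function name is hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def class_prototypes(
        items: List[Tuple[Hashable, Sequence[float]]]) -> Dict[Hashable, List[float]]:
    """Mean embedding per category: one prototype vector per class label."""
    sums: Dict[Hashable, List[float]] = {}
    counts: Dict[Hashable, int] = {}
    for label, embedding in items:
        if label not in sums:
            sums[label] = [0.0] * len(embedding)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], embedding)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


protos = class_prototypes([(0, [1.0, 0.0]), (0, [0.0, 1.0]), (1, [2.0, 2.0])])
```

The communication saving comes from the upload size: a client sends one d-dimensional vector per category instead of one per semantic label.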
[0040] S6: Server aggregates global semantic conditions.
[0041] After receiving the semantic knowledge sets uploaded by all clients, the server performs deduplication, semantic embedding clustering, and frequency-aware selection on the semantic texts or semantic prototypes to obtain a global semantic condition set. The server further uses the lightweight model outputs uploaded by the clients to compute a confidence weight for each semantic condition. For a given semantic condition, the server can determine a client's confidence in that condition from the entropy of the client's output distribution: the more concentrated (lower-entropy) the distribution, the higher the confidence weight. Based on the confidence weights from all clients, the server forms a global semantic condition sampling distribution, which guides the subsequent generation of proxy data.
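The entropy-to-weight step in [0041] can be sketched as follows. The exact formulas are assumptions, since the text only states that a lower-entropy output yields a higher weight. Here, confidence is one minus the Shannon entropy normalized by log K, each condition's weight is the sum of the confidences of the clients that reported it, and the weights are normalized into a sampling distribution. All names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def output_confidence(probs: Sequence[float]) -> float:
    """1 minus normalized Shannon entropy: 1.0 for one-hot, 0.0 for uniform outputs."""
    k = len(probs)
    if k < 2:
        return 1.0
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return max(0.0, 1.0 - entropy / math.log(k))  # clamp tiny negative rounding error


def condition_sampling_distribution(
        reports: Dict[str, List[Sequence[float]]]) -> Dict[str, float]:
    """`reports` maps each semantic condition to the probability outputs of the
    clients that uploaded it; returns normalized sampling probabilities."""
    weights = {cond: sum(output_confidence(p) for p in outputs)
               for cond, outputs in reports.items()}
    if not weights:
        return {}
    total = sum(weights.values())
    if total <= 0.0:
        # Every output is uniform: fall back to sampling conditions uniformly.
        return {cond: 1.0 / len(weights) for cond in weights}
    return {cond: w / total for cond, w in weights.items()}


dist = condition_sampling_distribution({
    "gear crack": [[0.9, 0.1], [0.8, 0.2]],  # confident clients
    "belt wear": [[0.5, 0.5]],               # maximally uncertain client
})
```

In this toy input, "belt wear" is backed only by a uniform output and therefore receives (numerically) zero sampling probability. In practice a small floor may be wanted so that rare conditions are not starved; that floor is not part of the described method.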
[0042] S7: The server generates a proxy dataset based on a text conditional diffusion model.
[0043] The server is configured with a text-conditional diffusion model that takes the global semantic condition set as its conditional input and comprises a forward noising process and a reverse denoising process. The forward process progressively perturbs a sample into Gaussian noise, and the reverse process progressively recovers a proxy sample from random noise under the semantic constraints.
[0044] During the proxy data generation phase, the server draws a semantic condition $c_n$ from the global semantic condition sampling distribution, starts from Gaussian noise $x_T \sim \mathcal{N}(0, I)$, and performs $T$ steps of reverse denoising to obtain the proxy sample $\tilde{x}_n$ corresponding to $c_n$. The generated proxy samples and their semantic labels together form the proxy dataset of round $r$:
$$\mathcal{D}_p^{r} = \{(\tilde{x}_n, c_n)\}_{n=1}^{N_p},$$
where $n$ is the index of the proxy sample, ranging from 1 to $N_p$; $\tilde{x}_n$ is the $n$-th generated proxy sample; $c_n$ is the text condition of the $n$-th sample, i.e., the semantic condition used to generate it; and $N_p$ is the total number of samples in the proxy dataset.
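The reverse denoising loop of [0044] can be illustrated with a standard DDPM ancestral sampler that uses a linear beta schedule. The document does not specify the sampler, schedule, or network, so all three are assumptions here. The noise predictor is a hypothetical closed-form toy oracle (exact only when every sample under a condition is the same point), standing in for a trained text-conditioned network in pixel or latent space.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Vector = List[float]
# Hypothetical interface for a trained text-conditional noise predictor.
NoisePredictor = Callable[[Vector, int, str], Vector]


def make_schedule(steps: int, beta_start: float = 1e-4,
                  beta_end: float = 0.02) -> Tuple[Vector, Vector, Vector]:
    """Linear beta schedule, alphas = 1 - beta, and cumulative products alpha_bar."""
    if steps == 1:
        betas = [beta_start]
    else:
        betas = [beta_start + (beta_end - beta_start) * i / (steps - 1)
                 for i in range(steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars


def ddpm_sample(eps_model: NoisePredictor, condition: str, dim: int,
                steps: int, rng: random.Random) -> Vector:
    """Ancestral DDPM sampling: start from x_T ~ N(0, I) and denoise step by step."""
    betas, alphas, alpha_bars = make_schedule(steps)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in reversed(range(steps)):  # t = steps-1 ... 0
        eps = eps_model(x, t, condition)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:  # add noise with variance beta_t, except at the final step
            sigma = math.sqrt(betas[t])
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean
    return x


def make_point_mass_oracle(targets: Dict[str, Vector], steps: int) -> NoisePredictor:
    """Toy stand-in for a trained network: if every sample for a condition equals
    targets[condition], the true noise inside x_t has a closed form."""
    _, _, alpha_bars = make_schedule(steps)

    def eps_model(x: Vector, t: int, condition: str) -> Vector:
        mu, ab = targets[condition], alpha_bars[t]
        return [(xi - math.sqrt(ab) * mi) / math.sqrt(1.0 - ab) for xi, mi in zip(x, mu)]

    return eps_model


oracle = make_point_mass_oracle({"gear crack": [1.0, -1.0]}, steps=50)
proxy_sample = ddpm_sample(oracle, "gear crack", dim=2, steps=50, rng=random.Random(0))
```

Because the toy oracle predicts the noise exactly, the final noise-free step lands on the conditioned target `[1.0, -1.0]` up to rounding. With a real network, the output would instead be a new sample drawn from the learned conditional distribution.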
[0045] When downstream tasks require discrete category supervision, the server maps the semantic label $c_n$ of each proxy sample $\tilde{x}_n$ to a category label $y_n = g(c_n)$ using a text-to-category mapping function $g$. The proxy dataset serves as a distillation medium shared among clients and contains no raw industrial data uploaded by the clients.
S8: The server evaluates each client's per-category capability on the proxy data.
[0047] In communication round $r$, the server evaluates the local model $f_i^{r}$ of each client $i$ uniformly on the proxy dataset $\mathcal{D}_p^{r}$. For the set $\mathcal{D}_{p,k}^{r}$ of class-$k$ proxy samples, it computes the capability metric $A_{i,k}^{r}$ of client $i$'s model on class-$k$ proxy samples in round $r$.
[0048] $$A_{i,k}^{r} = \frac{1}{\lvert \mathcal{D}_{p,k}^{r} \rvert} \sum_{\tilde{x} \in \mathcal{D}_{p,k}^{r}} \mathbb{I}\!\left[\arg\max_{c}\, \sigma\!\big(f_i^{r}(\tilde{x})\big)_c = k\right],$$
where $A_{i,k}^{r}$ is the classification accuracy of client $i$ on class $k$ in round $r$; $\mathcal{D}_{p,k}^{r}$ is the set of all samples labeled $k$ in the proxy dataset generated by the server in round $r$, and $\lvert \mathcal{D}_{p,k}^{r} \rvert$ is the total number of class-$k$ samples in the proxy dataset; $f_i^{r}(\tilde{x})$ is the raw logit output of client $i$'s round-$r$ model for input sample $\tilde{x}$; $\sigma(\cdot)$ is the activation function, usually softmax, which converts the model output into a probability distribution; $\arg\max_c$ selects the class with the highest predicted probability; and $\mathbb{I}[\cdot]$ is the indicator function, equal to 1 if the condition holds and 0 otherwise.
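The per-class accuracy in [0048] can be computed directly. This sketch assumes that a client model is a callable returning logits and that the proxy data are `(sample, class index)` pairs. Classes with no proxy samples are omitted rather than assigned a score, which is a design assumption. The softmax is kept only to mirror the formula; it does not change the argmax.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Logits = Sequence[float]


def softmax(logits: Logits) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def per_class_accuracy(model: Callable[[object], Logits],
                       proxy_data: List[Tuple[object, int]],
                       num_classes: int) -> Dict[int, float]:
    """Share of class-k proxy samples whose softmax argmax equals k, per class k."""
    correct = [0] * num_classes
    total = [0] * num_classes
    for x, k in proxy_data:
        total[k] += 1
        probs = softmax(model(x))
        predicted = max(range(len(probs)), key=probs.__getitem__)
        correct[k] += int(predicted == k)  # indicator function
    return {k: correct[k] / total[k] for k in range(num_classes) if total[k] > 0}


# Toy model whose logits are the input itself.
scores = per_class_accuracy(lambda x: x,
                            [([2.0, 0.1], 0), ([0.3, 1.0], 0), ([0.0, 3.0], 1)],
                            num_classes=2)
```

In the toy call, one of the two class-0 samples is misclassified and the class-1 sample is correct, so the per-class scores are 0.5 and 1.0.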
[0049] This capability metric can be classification accuracy, average confidence score, the inverse of cross-entropy loss, or a weighted combination thereof. Through this step, the server obtains the relative strength of each client under different categories or semantic conditions, providing a basis for subsequent dynamic teacher selection.
[0050] S9: The server selects a category-aware teacher for each student client.
[0051] For a student client $i$ to be updated and a category $k$, the server selects candidate teachers from the clients other than client $i$. A candidate teacher must satisfy two conditions on category $k$: its capability metric is higher than that of the student client, and it reaches the preset category threshold. From the candidates, the server selects several clients with the highest capability metrics as the teacher set $\mathcal{T}_{i,k}^{r}$ of student client $i$ on category $k$ in round $r$. The category threshold can be lowered gradually as communication rounds proceed, so that highly reliable teacher knowledge is prioritized early in training and the range of eligible knowledge sources is expanded gradually later in training.
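The two eligibility conditions of [0051] (strictly better than the student, and at least the category threshold) can be expressed as a filter followed by a top-N cut. The linear threshold decay below is a hypothetical schedule, because the text only says that the threshold is lowered gradually. `max_teachers` is likewise an assumed parameter for "several clients".

```python
from typing import Dict, Hashable, List

Ability = Dict[Hashable, Dict[int, float]]  # ability[client][class] -> capability metric


def select_teachers(ability: Ability, student: Hashable, k: int,
                    threshold: float, max_teachers: int) -> List[Hashable]:
    """Clients other than the student whose class-k metric beats the student's
    and reaches the threshold, strongest first, truncated to max_teachers."""
    own = ability[student].get(k, 0.0)
    candidates = [c for c in ability
                  if c != student
                  and ability[c].get(k, 0.0) > own
                  and ability[c].get(k, 0.0) >= threshold]
    candidates.sort(key=lambda c: ability[c].get(k, 0.0), reverse=True)
    return candidates[:max_teachers]


def category_threshold(initial: float, round_idx: int, decay: float, floor: float) -> float:
    """Hypothetical linear schedule: lower the threshold each round, never below floor."""
    return max(floor, initial - decay * round_idx)


ability = {0: {0: 0.4}, 1: {0: 0.9}, 2: {0: 0.7}, 3: {0: 0.5}}
teachers = select_teachers(ability, student=0, k=0,
                           threshold=category_threshold(0.8, 3, 0.05, 0.5),
                           max_teachers=2)
```

In round 3 the threshold has decayed to about 0.65, so clients 1 and 2 qualify as teachers for client 0 on class 0 and client 3 does not. For the strongest client (client 1), the candidate list is empty, which is exactly the case the fallback in [0053] covers.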
[0052] S10: The server uses the dynamic knowledge creation module to build dynamic teacher objectives.
[0053] For the teacher set of student client $i$ on category $k$, the dynamic knowledge creation module assigns aggregation weights according to each teacher client's capability metric on category $k$, so that teachers with higher capability metrics receive larger weights. For each proxy sample of category $k$, the module aggregates the selected teacher clients' model outputs on that sample according to these weights to obtain the dynamic teacher target of student client $i$. If no teacher client meets the criteria for a category, the module uses the global aggregate of all clients' lightweight outputs as a substitute teacher target so that the distillation process can continue.
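The weighted aggregation and fallback of [0053] can be sketched for a single proxy sample. Weights proportional to each teacher's class-level capability metric are an assumed form that satisfies "higher metric, larger weight". The `fallback` argument stands in for the global aggregate of all clients' lightweight outputs. All names are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence


def dynamic_teacher_target(teacher_probs: Dict[Hashable, Sequence[float]],
                           teacher_ability: Dict[Hashable, float],
                           fallback: Sequence[float]) -> List[float]:
    """Capability-weighted average of teacher outputs on one proxy sample.

    Weights are proportional to each teacher's class-level metric (an assumed form);
    with no usable teacher, the global aggregate `fallback` is returned instead.
    """
    total = sum(teacher_ability[c] for c in teacher_probs)
    if not teacher_probs or total <= 0.0:
        return list(fallback)
    dim = len(next(iter(teacher_probs.values())))
    target = [0.0] * dim
    for c, probs in teacher_probs.items():
        weight = teacher_ability[c] / total
        for j, p in enumerate(probs):
            target[j] += weight * p
    return target


target = dynamic_teacher_target({1: [0.8, 0.2], 2: [0.6, 0.4]},
                                {1: 0.9, 2: 0.3},
                                fallback=[0.5, 0.5])
```

The stronger teacher (metric 0.9) receives weight 0.75 and the weaker one (0.3) receives 0.25, giving a target of about `[0.75, 0.25]`. Because the result is a convex combination of probability vectors, it remains a valid distribution.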
[0054] S11: The client performs adaptive distillation updates based on dynamic knowledge.
[0055] The server sends the proxy dataset, the corresponding semantic labels, and the dynamic teacher targets to the relevant clients, or makes the proxy data and distillation targets accessible to them. Client $i$ then optimizes its local model on the proxy dataset with a training objective comprising a soft-target knowledge distillation loss $\mathcal{L}_{\mathrm{KD}}$ and a semantic label supervision loss $\mathcal{L}_{\mathrm{sem}}$. The total loss is a weighted combination of the two terms, in which the distillation term is weighted by the distillation weight $\lambda_i^{r}$ of client $i$ in round $r$. $\mathcal{L}_{\mathrm{KD}}$ constrains the consistency between the client output and the dynamic teacher target, and $\mathcal{L}_{\mathrm{sem}}$ constrains the consistency between the client output and the semantic labels or mapped category labels. The weight $\lambda_i^{r}$ is adjusted adaptively according to client $i$'s performance on the proxy dataset: when the client's performance is below the average of all clients, the distillation weight is increased to strengthen its absorption of teacher knowledge; when its performance is close to or above the average, the distillation weight is decreased to avoid over-distillation and preserve local adaptability.
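The loss and weight-update logic of [0055] can be sketched as follows, under several assumptions that the text does not fix:
- The two terms are combined convexly, as λ·KD + (1 − λ)·CE.
- The distillation term is the common temperature-scaled KL divergence multiplied by T².
- The teacher target is already a (softened) probability vector.
- The weight moves by a fixed step, with a tolerance band that models "close to the average", and is clipped to the configured bounds.

All function names are hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_norm for s in scaled]


def kd_loss(student_logits: Sequence[float], teacher_probs: Sequence[float],
            temperature: float) -> float:
    """Soft-target distillation: T^2 * KL(teacher || student softened at T)."""
    log_q = log_softmax(student_logits, temperature)
    kl = sum(p * (math.log(p) - lq) for p, lq in zip(teacher_probs, log_q) if p > 0.0)
    return temperature ** 2 * kl


def semantic_ce_loss(student_logits: Sequence[float], label: int) -> float:
    """Cross-entropy against the (mapped) semantic category label."""
    return -log_softmax(student_logits)[label]


def total_loss(student_logits: Sequence[float], teacher_probs: Sequence[float],
               label: int, weight: float, temperature: float) -> float:
    # Assumed convex combination; the text only states that both terms are weighted.
    return (weight * kd_loss(student_logits, teacher_probs, temperature)
            + (1.0 - weight) * semantic_ce_loss(student_logits, label))


def update_distill_weight(weight: float, own_score: float, mean_score: float,
                          step: float, low: float, high: float,
                          tolerance: float = 0.0) -> float:
    """Raise the weight when clearly below the cross-client mean, lower it otherwise,
    then clip to [low, high]."""
    if own_score < mean_score - tolerance:
        weight += step
    else:
        weight -= step
    return min(high, max(low, weight))
```

Computing the losses through `log_softmax`, rather than taking the logarithm of softmax probabilities, avoids `log(0)` errors when a logit gap is large.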
[0056] S12: The client combines local private data to complete the local model update.
[0057] After receiving the dynamic distillation knowledge, the client can combine distillation training on the proxy data with supervised training on its local private data. For local private data, the client updates the model according to the conventional supervised learning objective; for proxy data, the client updates the model according to the dynamic distillation objective. In this way, the client model maintains its adaptability to the local industrial scenario while also absorbing effective knowledge from other clients in the relevant categories.
[0058] S13: Repeat the communication rounds and obtain the final model.
[0059] The server and clients repeatedly execute steps S2 to S12 until the preset number of communication rounds is reached, the global model performance meets the requirements, or the model converges. After training, each client retains its updated local model for local industrial image classification, equipment status recognition, fault diagnosis, or other industrial IoT intelligent analysis tasks. Since the original industrial samples are not transmitted during training, this embodiment achieves cross-client knowledge collaboration while protecting data privacy.
[0060] Experimental data description: To verify the effectiveness of the proposed method (hereinafter FedSKD), comparative experiments were conducted on several publicly available image classification datasets with the following setup. Datasets: the three benchmark datasets Fashion-MNIST, CIFAR-10, and CIFAR-100 are used to simulate Industrial Internet of Things image classification tasks.
[0061] Data heterogeneity settings: the differences in data distribution among clients are controlled with a Dirichlet distribution, and the heterogeneity parameter is set to 0.05, 0.1, and 0.3 in separate experiments, with larger values indicating greater differences in data distribution among clients.
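The text does not specify how the Dirichlet partition is drawn. A common construction, sketched here as an assumption, draws per-class client proportions from a symmetric Dirichlet distribution with concentration `alpha` and splits each class's sample indices accordingly. In this common parameterization a smaller `alpha` yields more skewed client label distributions, so how the heterogeneity parameter above maps onto `alpha` is not fixed by the text. The function `dirichlet_partition` is hypothetical; Dirichlet draws are obtained by normalizing Gamma variates from Python's `random.gammavariate`.

```python
import random
from collections import defaultdict

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with per-class Dirichlet(alpha) proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    clients = [[] for _ in range(num_clients)]
    for _, idxs in sorted(by_class.items()):
        rng.shuffle(idxs)
        # A Dirichlet sample is a vector of Gamma(alpha, 1) draws normalized to sum to 1.
        g = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(g)
        props = [v / total for v in g]
        # Convert proportions into cumulative cut points over this class's indices.
        cuts, acc = [], 0.0
        for p in props[:-1]:
            acc += p
            cuts.append(int(round(acc * len(idxs))))
        start = 0
        for k, end in enumerate(cuts + [len(idxs)]):
            clients[k].extend(idxs[start:end])
            start = end
    return clients
```

Every index is assigned to exactly one client; only the per-class shares vary with `alpha`.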
[0062] Comparison methods: FedAvg (traditional federated averaging), FedGKD (federated knowledge distillation), and ProxyFL (proxy-based federated learning) were selected as baselines.
[0063] Evaluation metric: The classification accuracy of the global model on the test set.
[0064] Experimental results: Figure 3 compares convergence performance on the Fashion-MNIST dataset. The convergence curves of the three sets of experiments in Figure 3 show that, under every data heterogeneity setting, all four algorithms improve rapidly at first and then converge stably. The traditional FedAvg baseline has the lowest accuracy in all settings, and both its convergence speed and its steady-state performance are weaker than those of the other three federated learning methods. The FedSKD algorithm of this invention performs best in both convergence speed and final steady-state accuracy, followed by ProxyFL and FedGKD.
[0065] Figure 4 compares convergence performance on the CIFAR-10 dataset. Under all three heterogeneity settings on CIFAR-10, the accuracy of every method rises gradually and then stabilizes. FedSKD performs best throughout, converging faster and reaching higher steady-state accuracy than FedAvg, FedGKD, and ProxyFL, and its advantage persists for the whole training process. Overall accuracy is highest and convergence best when the heterogeneity parameter is 0.05, intermediate at 0.1, and lowest at 0.3. FedSKD outperforms the compared methods in every setting, demonstrating stronger adaptability and superior performance.
[0066] Figure 5 compares convergence performance on the CIFAR-100 dataset. Under the three heterogeneity settings on CIFAR-100, the accuracy of every compared method rises gradually with the number of iterations and tends toward convergence. Compared with the three baselines FedAvg, FedGKD, and ProxyFL, FedSKD achieves significantly higher accuracy in all three settings throughout training, with faster convergence and the highest steady-state accuracy. This indicates that FedSKD is more robust to data heterogeneity and demonstrates the effectiveness and superiority of the proposed method in complex federated learning scenarios.
[0067] Experimental conclusions: The above experimental results show that the semantically consistent federated knowledge distillation method for the Industrial Internet of Things proposed in this invention (FedSKD) achieves faster convergence and higher global model accuracy than the compared existing methods across different datasets and data heterogeneity conditions, and exhibits good robustness and generalization ability.
[0068] The above description of the disclosed embodiments enables those skilled in the art to make or use the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the invention is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims
1. A semantically consistent federated knowledge distillation method for the Industrial Internet of Things, characterized in that it includes the following steps:
Step 1: Build a federated learning system comprising a server and multiple industrial clients, each client storing local private data;
Step 2: Each client uses the image-to-text multi-agent module to extract candidate semantic texts from its local private data; after quality evaluation and consistency screening, it generates a unified semantic label or semantic prototype and uploads it to the server together with its lightweight model output;
Step 3: The server receives and aggregates the semantic labels or semantic prototypes uploaded by all clients to form a global semantic condition set;
Step 4: The server takes the global semantic condition set as input and generates a proxy dataset through a text conditional diffusion model, which serves as the shared medium for cross-client knowledge distillation;
Step 5: The server uses the proxy dataset to evaluate the capability of each client model on different categories and, based on the evaluation results, dynamically selects teacher clients for each student client, thereby constructing a dynamic teacher objective;
Step 6: Each client receives the dynamic teacher objective sent by the server, performs adaptive knowledge distillation on the proxy dataset, and updates its local model in combination with its local private data;
Step 7: Repeat Steps 2 to 6 until the model converges or the preset number of communication rounds is reached.
2. The method according to claim 1, characterized in that the image-to-text multi-agent module in Step 2 includes multiple text extraction agents and at least one evaluation agent; the text extraction agents generate candidate semantic texts for each sample from different semantic perspectives; and the evaluation agent scores the quality of each candidate semantic text based on its correspondence with the sample and removes texts scoring below a preset quality threshold.
3. The method according to claim 1, characterized in that the consistency screening in Step 2 specifically comprises: mapping the candidate semantic texts into a shared semantic vector space, calculating the similarity among the multiple candidate semantic descriptions of the same sample, and filtering out descriptions below a preset semantic consistency threshold; and, for samples that meet the consistency requirement, generating a unified semantic label through majority attribute fusion, template fusion, or constrained large language model summarization.
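Claims 2 and 3 describe a per-sample filter-then-fuse pipeline, sketched below under several assumptions. Each candidate is a dict with an evaluation-agent `score`, an embedding `emb` in the shared semantic space (produced by some text encoder not specified here), and a list of attribute phrases `attrs`. As a simplification of pairwise comparison, consistency is measured as each candidate's cosine similarity to the centroid of the surviving embeddings, and majority attribute fusion keeps attributes named by more than half of the consistent candidates. The function `unify_semantics` and its threshold values are hypothetical, and at least two candidates are required for a consistency check to be meaningful.

```python
import math
from collections import Counter

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def unify_semantics(candidates, quality_thr=0.5, consistency_thr=0.7):
    """Return a unified label string for one sample, or None if it fails screening."""
    # Claim 2: the evaluation agent removes texts below the quality threshold.
    kept = [c for c in candidates if c["score"] >= quality_thr]
    if len(kept) < 2:
        return None
    # Claim 3: consistency screening in the shared vector space (centroid similarity).
    dim = len(kept[0]["emb"])
    centroid = [sum(c["emb"][d] for c in kept) / len(kept) for d in range(dim)]
    consistent = [c for c in kept if cosine(c["emb"], centroid) >= consistency_thr]
    if not consistent:
        return None
    # Majority attribute fusion: keep attributes named by more than half of the survivors.
    counts = Counter(a for c in consistent for a in set(c["attrs"]))
    majority = [a for a, n in counts.items() if n * 2 > len(consistent)]
    if not majority:
        return None
    return ", ".join(sorted(majority, key=lambda a: (-counts[a], a)))
```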
4. The method according to claim 1, characterized in that, in Step 3, when aggregating the semantic labels or semantic prototypes uploaded by the clients, the server also performs deduplication, clustering, and frequency-aware selection on the semantic information from multiple clients, and uses the lightweight model outputs uploaded by the clients to calculate a confidence weight for each semantic condition, forming a semantic condition sampling distribution that guides proxy data generation.
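One way to turn claim 4 into a sampling distribution is sketched below; every weighting choice is hypothetical. Deduplication is exact matching on case- and whitespace-normalized label text (the clustering step is omitted), confidence is the maximum softmax probability of each uploaded logit vector, and a condition's weight is assumed to be the number of distinct clients that uploaded it (the frequency term) times its mean confidence.

```python
import math
from collections import defaultdict

def max_softmax(logits):
    """Confidence of one lightweight output: its largest softmax probability."""
    m = max(logits)
    e = [math.exp(z - m) for z in logits]
    return max(e) / sum(e)

def build_condition_distribution(uploads):
    """uploads: list of (client_id, label_text, logits). Returns {condition: probability}."""
    clients = defaultdict(set)
    confs = defaultdict(list)
    for cid, text, logits in uploads:
        key = " ".join(text.lower().split())  # deduplicate by normalized text
        clients[key].add(cid)
        confs[key].append(max_softmax(logits))
    # Hypothetical weight: client frequency times mean confidence.
    weights = {k: len(clients[k]) * sum(confs[k]) / len(confs[k]) for k in clients}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}
```

Conditions can then be drawn in proportion to these probabilities when composing prompts for the diffusion model.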
5. The method according to claim 1, characterized in that the text conditional diffusion model in Step 4 takes the global semantic condition set as input, performs a reverse denoising process starting from random noise, and generates proxy samples consistent with the semantic conditions; the generated proxy samples and their corresponding semantic labels together constitute the proxy dataset of the current communication round.
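The reverse denoising process of claim 5 can be illustrated with standard DDPM ancestral sampling, named here as a concrete swapped-in choice; the patent does not fix the sampler or the noise schedule. Each step computes x_{t-1} = (x_t − β_t/√(1 − ᾱ_t)·ε̂)/√α_t plus Gaussian noise with variance β_t (omitted at the last step). The noise predictor `eps_model` and the conditioning vector `cond` are hypothetical stand-ins for a trained text-conditioned network and a text embedding, and the linear β schedule is a common default rather than something taken from the text.

```python
import math
import random

def linear_betas(steps, beta_start=1e-4, beta_end=0.02):
    return [beta_start + (beta_end - beta_start) * i / (steps - 1) for i in range(steps)]

def ddpm_sample(eps_model, cond, dim, steps=50, seed=0):
    """Start from Gaussian noise and denoise step by step, conditioned on `cond`.

    eps_model(x, t, cond) is a hypothetical noise predictor returning a list of length dim.
    """
    rng = random.Random(seed)
    betas = linear_betas(steps)
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in reversed(range(steps)):
        eps = eps_model(x, t, cond)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        # Posterior mean of x_{t-1} given x_t and the predicted noise.
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(betas[t])
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean
    return x
```

With a noise predictor that is exact for data concentrated at a single point c, the final step returns c, which makes a convenient sanity check of the update rule.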
6. The method according to claim 1, characterized in that the dynamic selection of teacher clients in Step 5 is as follows: for a target student client and a given category, the server selects as candidate teachers the other clients whose ability indicators in that category are higher than the student client's and meet a preset threshold, assigns aggregation weights according to each teacher's ability indicator in that category, and aggregates the outputs of the multiple teacher clients into the dynamic teacher target of the student client.
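Claim 6's selection and weighting can be sketched as below. The ability indicator is assumed to be a per-category score such as proxy-set accuracy stored as `ability[client][category]`, teacher outputs are assumed to be probability vectors, and aggregation weights are assumed to be proportional to each teacher's ability on that category; the function `dynamic_teacher_target` and the `threshold` value are hypothetical. The function returns `(None, None)` when no client qualifies as a teacher.

```python
def dynamic_teacher_target(student, cls, ability, outputs, threshold=0.5):
    """Build one student's dynamic teacher target for one category.

    ability[k][cls]: ability indicator of client k on category cls.
    outputs[k]: client k's soft output (probability vector) for that category.
    """
    s_ability = ability[student][cls]
    # Candidate teachers: stronger than the student and above the preset threshold.
    teachers = [k for k in ability
                if k != student and ability[k][cls] > s_ability and ability[k][cls] >= threshold]
    if not teachers:
        return None, None
    # Aggregation weights proportional to each teacher's ability on this category.
    total = sum(ability[k][cls] for k in teachers)
    n_cls = len(outputs[teachers[0]])
    target = [sum(ability[k][cls] / total * outputs[k][c] for k in teachers)
              for c in range(n_cls)]
    return teachers, target
```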
7. The method according to claim 1, characterized in that the adaptive knowledge distillation in Step 6 comprises: the client simultaneously optimizing the soft-target knowledge distillation loss and the semantic label supervision loss on the proxy dataset, and adaptively adjusting the weights of the knowledge distillation loss and the semantic supervision loss based on the difference between the client's own performance on the proxy dataset and the global average performance.
8. The method according to claim 1, characterized in that, in Step 5, when no client meets the teacher conditions for a certain category, the server uses the global aggregation of the lightweight outputs of all clients as a substitute teacher signal so that the distillation process can continue.
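Claim 8's fallback can be sketched as a thin wrapper around any teacher selector that returns `(teacher_ids, target)`, or `(None, None)` when no client qualifies. The "global aggregation" is assumed here to be an unweighted mean of all clients' soft outputs, since the text does not specify the aggregation rule; both function names are hypothetical.

```python
def global_fallback_target(outputs):
    """Unweighted mean of all clients' soft outputs, used as a substitute teacher signal."""
    n = len(outputs)
    n_cls = len(next(iter(outputs.values())))
    return [sum(p[c] for p in outputs.values()) / n for c in range(n_cls)]

def teacher_signal(student, cls, ability, outputs, select):
    """Use the selected teachers when any exist; otherwise fall back to the global mean."""
    _, target = select(student, cls, ability, outputs)
    if target is None:
        return "global", global_fallback_target(outputs)
    return "teachers", target
```

Because the fallback always yields a valid probability vector, every student receives a distillation target in every round.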
9. The method according to claim 1, characterized in that the text conditional diffusion model is a pixel-space diffusion model or a latent-space diffusion model, and the lightweight model output uploaded by each client is logits, soft labels, or low-dimensional feature summaries.
10. The method according to claim 1, characterized in that the clients generate and upload semantic knowledge at intervals across multiple communication rounds, and the server recalculates the dynamic teacher objectives or reuses historical proxy datasets at intervals across multiple communication rounds.