An asynchronous federated learning aggregation method and system based on clustering cache

An asynchronous federated learning method, which uses intermediate feature clustering and caching mechanisms on the client side, solves the problems of training speed and model obsolescence in heterogeneous environments, improves the efficiency and accuracy of federated learning, adapts to changes in data distribution, and ensures stable convergence of the model.

CN122087484B · Active · Publication Date: 2026-06-26 · HEFEI UNIV OF TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
HEFEI UNIV OF TECH
Filing Date
2026-04-23
Publication Date
2026-06-26


Abstract

The present application relates to the technical field of artificial intelligence and federated machine learning, and particularly relates to an asynchronous federated learning aggregation method and system based on clustering cache. The present application clusters clients by intermediate features of the clients. In global aggregation, intra-cluster aggregation is firstly performed, and then global aggregation is performed based on client clusters. The client clusters are dynamically updated. In intra-cluster aggregation, an active set and a slow set are introduced to realize an asynchronous participation mechanism of intra-cluster aggregation. Weighted aggregation of the active set and the slow set solves the problem of system heterogeneity. The present application solves the defect of insufficient timeliness of existing federated learning methods in a data heterogeneous scene, and guarantees the timeliness and model training accuracy of federated learning in the data heterogeneous scene.

Description

Technical Field

[0001] This invention relates to the fields of artificial intelligence and federated machine learning technology, and in particular to an asynchronous federated learning aggregation method and system based on clustering caching.

Background Technology

[0002] Traditional synchronous federated learning (such as the FedAvg algorithm) requires the server to wait for all selected clients to complete their local training before aggregation. In heterogeneous environments, the computing power of devices varies significantly, leading to the "weakest link" effect: training speed is limited by the slowest device, resulting in low overall training efficiency.

[0003] To address the synchronous waiting problem, existing asynchronous federated learning methods (such as FedAsync) allow clients to upload updates asynchronously. However, the asynchronous mechanism causes some clients to use outdated global models for training, resulting in gradients with "staleness." Direct aggregation of these gradients reduces the model's convergence accuracy.

[0004] Some studies (such as EAFL) attempt to improve aggregation quality by grouping clients with similar data distributions using clustering methods. However, gradients themselves are highly volatile and affected by factors such as learning rate and batch sampling, leading to unstable clustering results. Frequent clustering changes can actually negatively impact training performance.

[0005] While existing caching or reuse mechanisms (such as CaBaFL) have proposed the idea of gradient caching, they do not adequately assess the timeliness of cached gradients and fail to effectively quantify feature drift caused by model evolution, which may lead to the misuse of outdated gradients.

Summary of the Invention

[0006] To overcome the shortcomings of existing federated learning methods in terms of timeliness in heterogeneous data scenarios, this invention proposes an asynchronous federated learning aggregation method based on clustering caching, which ensures the timeliness of federated learning and the accuracy of model training in heterogeneous data scenarios.

[0007] This invention proposes an asynchronous federated learning aggregation method based on clustering caching, which is used for federated learning of agents on the client side in a visual classification scenario. The agents perform target classification in the client's scene. The method includes the following steps:

[0008] S1. Deploy and initialize the intelligent agent on the client;

[0009] S2. Train the client on the local dataset until convergence, and calculate the client's intermediate features; the client's intermediate features come from the features output by the client's intermediate layer for the last batch of training samples trained locally.

[0010] S3. Clustering is performed based on the intermediate features of the client, and a cluster head is specified for each client cluster. The cluster head cache space is set and initialized. After clustering, each client stores or updates its cache entry in the cluster head cache space after completing each round of local training. The cache entry includes the client's parameter update amount, the number of cache entry updates, and the client's intermediate features.

[0011] S4. After the idle client loads the global model, it restarts local training. The m clients that complete local training first in the client cluster constitute the active set, and the clients outside the cluster active set that have cache entries stored in the cluster head cache space constitute the slow set. Idle clients refer to clients that are not in local training when they receive the global model. The initial value of the global model is a random value.

[0012] S5. Each client cluster aggregates the parameter update amounts from this round's local training of the active-set clients and the parameter update amounts in the cache entries of the slow-set clients to obtain an aggregated update value; the aggregated update value is then superimposed on the most recently acquired global model to obtain the intra-cluster aggregation model.

[0013] S6. Aggregate the aggregation models within each cluster according to the ratio of the number of clients in each cluster, update the aggregation results to the global model, and broadcast the global model to each client in the cluster through each cluster head;

[0014] In each cluster state, repeat steps S4-S6 to perform global aggregation multiple times; whenever the number of global aggregations reaches the set value R, return to step S2 to re-cluster.

[0015] Repeat the above steps until the global model converges.

[0016] Preferably, in step S5, the aggregation method within each client cluster is as follows: the ratio of the amount of data in the client's local dataset to the total number of clients in the cluster is used as the reference coefficient for the client; for the active set, the reference coefficient is used as the aggregation coefficient; for the slow set, the product of the reference coefficient and the feature weight of the client is used as the aggregation coefficient; then, the parameter update amounts of the clients participating in the aggregation are weighted and summed using the aggregation coefficient to obtain the aggregated update amount value.

[0017] The feature weight of a slow-set client characterizes the client's computation speed and the difference in scene distribution between that client and the active-set clients; the difference in scene distribution is characterized by dataset features.

[0018] Preferably, the difference in scene distribution between slow-set client j and the active-set clients is characterized by a feature drift factor, which is taken as the cosine distance between the intermediate feature of client j and the mean of the intermediate features of the active-set clients.

[0019] Preferably, the feature weight of slow-set client j is the product of the client's staleness factor, its cache expiration factor, and a transition term, where the transition term uses a set weight; the staleness factor is the smaller of 1 and the reciprocal of the difference between the global aggregation count in the current clustering state and the number of local training rounds of client j; the cache expiration factor is a power of the attenuation coefficient.

[0020] Preferably, the number m of active clients in each client cluster is proportional to the size of the client cluster, and m is the larger of the product of the number of clients in the client cluster and a set ratio and 1.

[0021] Preferably, in step S4, idle clients are restarted for local training, while the remaining clients continue local training for the current round; clients are added to the active set in the order in which they complete local training, until the required number m is met; in step S5, when the number of clients that have completed local training reaches a set proportion of the total number of clients in the cluster, cluster aggregation is performed.

[0022] Preferably, the slow set is constructed as follows: after the active set is constructed, the slow set is initialized as an empty set; clients in the cluster outside the active set that have cache entries stored in the cluster head cache space are identified as slow ends; a slow end is added to the slow set if the difference between the global aggregation count in the current clustering state and the client's number of local training rounds is less than a set threshold.

[0023] Preferably, the intermediate features are the mean of the input features of the agent's output layer over the latest training batch, or those input features after pooling and averaging over the latest training batch.

[0024] The present invention proposes a vehicle network monitoring method, which first deploys monitoring equipment as clients at various monitoring locations within the target area, then trains a global model using the asynchronous federated learning aggregation method based on clustering caching and deploys it on each client as an agent; then the client collects scene images in real time and identifies vehicles through the agent; the monitoring locations include one or more of the following: highways, urban and rural roads, intersections, and bridges.

[0025] The present invention proposes an asynchronous federated learning aggregation system based on clustering caching, comprising a memory and a processor. The memory stores a computer program, and the processor is connected to the memory. The processor is used to execute the computer program to implement the asynchronous federated learning aggregation method based on clustering caching.

[0026] The advantages of this invention are:

[0027] (1) This invention introduces active and slow sets to achieve an asynchronous participation mechanism for intra-cluster aggregation. Each training round does not require all clients to complete synchronously. Instead, after a certain proportion of clients complete local training in the current round, the parameters are updated by combining the current parameter update amount of the active set clients and the historical parameter update amount of the slow set clients. This avoids global aggregation delays caused by client training speeds and ensures that all clients can receive the latest global aggregation model in real time, preventing slower clients from being affected by outdated global models. This invention introduces a dynamic re-clustering mechanism, enabling it to adapt to data distribution drift during training and further guaranteeing model performance.

[0028] (2) The present invention addresses system heterogeneity through weighted aggregation of the active set and the slow set. With the asynchronous participation mechanism, a training round does not require all clients to finish synchronously; instead, a subset of clients is selected for training according to the client participation ratio.

[0029] (3) After the global model is distributed, only idle clients load the global model to start new local training; clients that are already training locally continue their current local training. In this way, the difference in training speed among clients can be ignored, the aggregation speed can be improved, and it can be ensured that each client loads a brand new global model before starting new local training, thus solving the problem of global model obsolescence.

[0030] (4) This invention extracts intermediate features, and during federated learning, only a small amount of data needs to be forward-propagated, without the need for complete training and gradient backpropagation. The communication volume in the clustering stage is significantly reduced compared to traditional methods. At the same time, the caching mechanism eliminates the need for non-participating clients to frequently interact with the server, further reducing the overall communication overhead. The K-Means clustering method based on intermediate feature activation values is adopted, and dynamic re-clustering is performed. Intermediate features reflect the essence of the data distribution and are more stable than gradients. They are not directly affected by training hyperparameters such as learning rate and batch sampling, so the clustering results remain stable between adjacent training rounds. The dynamic re-clustering mechanism enables the grouping to adapt to model evolution, achieving a balance between stability and adaptability, ensuring more accurate clustering, reducing oscillations, and accelerating convergence.

[0031] (5) The intermediate features of the client are always based on the last batch of training samples that have just been completed in this training. The intermediate features combine the parameter features of the agent on the client and the features of the collected samples in the client's scene. This is beneficial for considering the characteristics of the client's scene in federated learning and ensuring the reliability of asynchronous federated learning.

[0032] (6) For slow ends, this invention introduces a feature weight that combines a staleness factor, a cache expiration factor, and a feature drift factor. This implements a staleness awareness mechanism that prevents the staleness of slow clients from excessively affecting the global model, enabling efficient training in heterogeneous environments. The feature weight uses this triple mechanism to determine cache effectiveness, comprehensively evaluating cache quality and reducing computation while accelerating model convergence. This invention automatically reduces the weights of clients with low update frequencies, preventing slow devices from excessively influencing the global model.

[0033] (7) In this invention, the cached entries of clients outside the active set are filtered in two stages, first into slow ends and then into the slow set, which keeps excessively stale clients out of global aggregation and further reduces the adverse effects of outdated parameters on global aggregation.

[0034] (8) The present invention adopts a gradient cache reuse mechanism through the cluster head cache space. During a client's local training, its gradient and other data features are added to the cache. Even if the client does not participate in the current local training, its cache entry is still valid and participates in aggregation, which reduces the computational overhead and accelerates model convergence.

Attached Figure Description

[0035] Figure 1 is a flowchart of the asynchronous federated learning aggregation method based on clustering caching proposed in this invention;

[0036] Figure 2 shows the training convergence trends of the algorithms on the MNIST dataset;

[0037] Figure 3 shows the training convergence trends of the algorithms on the CIFAR-10 dataset.

Detailed Implementation

[0038] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of the present invention.

[0039] As shown in Figure 1, this invention proposes an asynchronous federated learning aggregation method based on clustering caching for federated learning of agents on the client side in visual classification scenarios. The agents are used to classify targets within the client-side environment. For example, in a vehicle-to-everything (V2X) environment, the agents identify all vehicles based on traffic images acquired by the client. The asynchrony arises from differences between clients: for instance, some clients have datasets with more categories than others, and some compute faster than others. Examples include highway monitoring modules, urban and rural road monitoring modules, and intersection monitoring modules that restrict vehicle categories within a V2X environment.

[0040] Due to varying road speed limits and permitted vehicle types, the local datasets (i.e., local monitored vehicle datasets associated with vehicle type and speed) differ between different clients, making it difficult for existing asynchronous federated learning methods to meet accuracy requirements. This invention proposes an asynchronous federated learning aggregation method based on clustering caching to address this problem.

[0041] The asynchronous federated learning aggregation method based on clustering caching proposed in this invention includes the following steps:

[0042] S1. Deploy and initialize the intelligent agent on the client;

[0043] All clients have the same agent structure, which performs target recognition based on acquired images.

[0044] S2. Train the client on the local dataset until convergence, and calculate the intermediate features of each client on the last training batch of each client (i.e. the training batch at convergence).

[0045] The intermediate features can be taken as the input features of the agent's output layer, or as those input features after pooling and averaging within the training batch. For example, let B be the training sample set of the last training batch of client i, and let the intermediate feature of client i be denoted as $v_i$; then:

[0046] $$v_i = \frac{1}{|B|} \sum_{x \in B} \mathrm{Pool}\big(h(x; \theta_i)\big) \quad (1)$$

[0047] where B is the last batch of training samples used in local training on the client; $|B|$ is the training batch size; $h(x; \theta_i)$ is the input feature of the output layer of the agent on client i for training sample x; $\theta_i$ denotes the parameters of client i; and Pool denotes average pooling.

[0048] In this way, the intermediate features of the client are always obtained based on the last batch of training samples of the latest completed local training. These intermediate features integrate the parameter features of the agent on the client and the features of the collected samples in the client's scenario. This is beneficial for considering the characteristics of the client's scenario in federated learning and ensures the reliability of asynchronous federated learning.
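The computation described above (average pooling per sample, then a mean over the last batch) can be sketched in plain Python. This is only an illustration under stated assumptions: the names `average_pool` and `intermediate_feature` and the channels-by-positions layout of each sample's features are hypothetical and do not come from the patent.

```python
from typing import List, Sequence

# Assumed layout: one sample's features are channels x positions.
FeatureMap = Sequence[Sequence[float]]


def average_pool(feature_map: FeatureMap) -> List[float]:
    """Average each channel over its positions (average pooling)."""
    return [sum(channel) / len(channel) for channel in feature_map]


def intermediate_feature(batch: Sequence[FeatureMap]) -> List[float]:
    """Mean over the batch of the pooled per-sample features."""
    pooled = [average_pool(sample) for sample in batch]
    return [sum(column) / len(pooled) for column in zip(*pooled)]


# Hypothetical last batch: 2 samples, 2 channels, 2 positions each.
last_batch = [
    [[1.0, 3.0], [0.0, 2.0]],
    [[3.0, 5.0], [2.0, 4.0]],
]
v_i = intermediate_feature(last_batch)
print(v_i)
```

In a real client the per-sample features would be captured from the layer feeding the output layer during the final training batch; the sketch only shows the pooling and averaging arithmetic.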

[0049] S3. Clustering is performed on the intermediate features of the clients to obtain client clusters. A cluster head is specified for each cluster, and the cluster head cache space is initialized. Based on its local training, client i stores a cache entry in the cache space of its cluster head. The cache entry consists of three items: the cached parameter update amount of client i (i.e., the difference in agent parameters before and after the most recent round of local training); the cache entry update count (i.e., the number of local training rounds of client i after the most recent clustering); and the intermediate feature of client i. After each round of local training, each client updates its cache entry.

[0050] Specifically, after the cluster head's cache space is initialized, the update count in each cache entry starts from 1 and is incremented by 1 after each completed round of local training.
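A minimal sketch of such a cache follows, assuming an in-memory dictionary keyed by a client identifier. The class and method names (`CacheEntry`, `ClusterHeadCache`, `store_or_update`) are hypothetical; the patent does not prescribe a data structure.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class CacheEntry:
    update: List[float]   # parameter update amount of the latest local round
    update_count: int     # local rounds completed since the latest clustering
    feature: List[float]  # intermediate feature of the client


class ClusterHeadCache:
    """Cache space held by one cluster head (illustrative only)."""

    def __init__(self) -> None:
        self.entries: Dict[str, CacheEntry] = {}

    def store_or_update(self, client_id: str, update: Sequence[float],
                        feature: Sequence[float]) -> CacheEntry:
        entry = self.entries.get(client_id)
        if entry is None:
            # First entry after clustering: the count starts from 1.
            entry = CacheEntry(list(update), 1, list(feature))
            self.entries[client_id] = entry
        else:
            # Later rounds overwrite the update and feature and add 1.
            entry.update = list(update)
            entry.feature = list(feature)
            entry.update_count += 1
        return entry

    def reset(self) -> None:
        """Clear all entries, e.g. when the clients are re-clustered."""
        self.entries.clear()


cache = ClusterHeadCache()
cache.store_or_update("client-1", [0.1, -0.2], [0.5, 0.5])
entry = cache.store_or_update("client-1", [0.3, 0.0], [0.6, 0.4])
```

Overwriting rather than appending mirrors the statement that each round's upload replaces the previously cached parameter update amount.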

[0051] Within the client cluster, nodes with sufficient energy and high computational stability are selected as cluster heads to ensure adequate computing resources. A cluster head cache space is set up at each cluster head to store cached entries from each client within the cluster, facilitating subsequent intra-cluster aggregation computations.

[0052] Specifically, in step S3, the intermediate features $v_i$ of each client are extracted and clustered, and the clients are divided into clusters according to the clustering results. The clustering method can be K-Means clustering, which uses Euclidean distance as the similarity metric.
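As an illustration of the clustering step, here is a small K-Means sketch with Euclidean distance written with the standard library only. The initialisation (first k points as centroids) and the fixed iteration count are assumptions made to keep the example deterministic; a production system would more likely rely on an existing K-Means implementation.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_euclidean(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Vector], k: int,
           iterations: int = 20) -> Tuple[List[int], List[List[float]]]:
    """Plain Lloyd iterations; returns (cluster label per point, centroids)."""
    # Assumption: centroids start at the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: squared_euclidean(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical intermediate features of four clients.
features = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0]]
labels, centroids = kmeans(features, k=2)
print(labels, centroids)
```

Clients that share a label form one client cluster; the cluster head would then be chosen among them by the resource criteria given above.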

[0053] S4. After the idle client loads the global model, it restarts local training. The m clients that complete local training first in the client cluster constitute the active set, and the clients outside the cluster active set that have cache entries stored in the cluster head cache space constitute the slow set. Idle clients refer to clients that are not in local training when they receive the global model. The initial value of the global model is a random value.

[0054] In other words, if the client is idle after the global model is distributed, it loads the global model and begins new local training; if the client is already training locally after the global model is distributed, it continues its current local training; once the client's local training is complete, it enters an idle state. This approach ignores differences in client training speeds and ensures that each client loads a completely new global model before starting new local training, thus resolving the issue of global model obsolescence.
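The idle-client rule can be pictured as a two-state client: a broadcast model is loaded only when the client is idle, and a busy client keeps training on the model it already has. The sketch below is an assumption-laden illustration; the `Client` class, its fields, and the integer model version are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Client:
    name: str
    training: bool = False
    model_version: int = 0  # version of the global model last loaded

    def on_global_model(self, version: int) -> bool:
        """Load the broadcast model only when idle; True if a new round starts."""
        if self.training:
            return False  # busy: continue the current local training
        self.model_version = version
        self.training = True
        return True

    def finish_local_training(self) -> None:
        """Local training done: the client becomes idle again."""
        self.training = False


fast, slow = Client("fast"), Client("slow")
fast.on_global_model(1)
slow.on_global_model(1)
fast.finish_local_training()          # fast finishes round 1, slow does not
started_fast = fast.on_global_model(2)
started_slow = slow.on_global_model(2)
```

Here the slow client skips version 2 and will pick up whichever global model is broadcast after it becomes idle.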

[0055] S5. Combine the parameter update amounts from this round's local training of the active-set clients with the parameter update amounts in the cache entries of the slow-set clients to perform intra-cluster aggregation, and obtain the aggregated update amount of each client cluster (i.e., the aggregation result of the parameter update amounts) and the intra-cluster aggregation model.

[0056] In step S5, when the number of clients that have completed local training reaches a set proportion of the total number of clients within the cluster, intra-cluster aggregation is performed to further reduce the staleness of the aggregation parameters. For example, suppose a client cluster has 10 clients, of which 5 clients have fast computing speeds (referred to as fast clients), which can ensure that they keep up with the global aggregation speed, that is, they can load the global model and restart local training after each global aggregation; the remaining 5 clients have slow computing speeds (referred to as slow clients), which cannot guarantee that they will be in an idle state after each global aggregation, and cannot guarantee that they will load the global model in each round. In this scenario, the ratio is set at 70%, meaning that after each round of global model distribution, if 7 clients complete local training, the next aggregation begins. At this time, after each global aggregation, the 5 fast clients load the global model for this round, while the remaining 5 slow clients may still be stuck in the local training of previous rounds, or they may be currently in an idle state and start loading the global model for this round to restart local training. When 7 clients have completed local training, a new round of global aggregation begins. Of these 7 clients that have completed local training, 5 are fast clients and the remaining 2 are slow clients. The slow clients have actually completed local training after loading the global model previously.

[0057] It is worth noting that, in order to ensure the stability of the iteration, a specified value m can be set to be less than or equal to the product of the total number of clients in the client cluster and a set ratio.

[0058] The specified value m is related to the size of the client cluster. Denote the k-th client cluster as $C_k$, its specified value as $m_k$, and the number of clients in it as $|C_k|$; ρ is the set completion rate threshold. The value $m_k$ is the larger of the product $\rho\,|C_k|$ and 1.
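The 10-client example above can be reproduced with a short sketch. Two assumptions are made here that the text does not fix: the product of cluster size and ratio is rounded down, and the ratio is passed as an integer percentage to avoid floating-point rounding. The function names and finish times are hypothetical.

```python
from typing import Dict, List


def active_set_size(cluster_size: int, ratio_percent: int) -> int:
    """Larger of 1 and the rounded-down product of cluster size and ratio."""
    return max(1, cluster_size * ratio_percent // 100)


def first_finishers(finish_times: Dict[str, float], m: int) -> List[str]:
    """Names of the m clients that complete local training first."""
    return sorted(finish_times, key=finish_times.get)[:m]


# Hypothetical finish times (arbitrary units): 5 fast and 5 slow clients.
finish_times = {f"fast{i}": 1.0 + 0.1 * i for i in range(5)}
finish_times.update({f"slow{i}": 3.0 + 0.5 * i for i in range(5)})

m = active_set_size(len(finish_times), 70)
active = first_finishers(finish_times, m)
print(m, active)
```

With a 70% ratio the active set holds 7 clients: the 5 fast clients plus the 2 slow clients that finish earliest, which matches the worked example.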

[0059] Let the first m clients of the client cluster that complete local training constitute the active set, and let the remaining clients within the cluster constitute the slow set. The objects of intra-cluster aggregation are the active set and the slow set. The aggregated update amount and the intra-cluster aggregation model of the client cluster are then calculated as follows:

[0060] (2)

[0061] (3)

[0062] In the above formulas, the update term of active-set client i is the parameter update amount of client i in the current training round, i.e., the difference in agent parameters before and after this round of local training; the update term of slow-set client j is the parameter update amount in the cache entry of client j. It is worth noting that after each round of local training, each client uploads the parameter update amount for that round to the cluster head cache space, overwriting the parameter update amount in its cache entry. Therefore, the update term of client i is essentially also the parameter update amount currently stored in the cache entry of client i.

[0063] The remaining quantities are: the number of samples in the local dataset held by client i; the number of samples in the local dataset held by client j; the set drift threshold; the feature drift factor of client j; the feature weight of client j; the most recently acquired global model, whose initial value is random; and a set weight whose value range is [0.001, 0.1] and which can be set to 0.005.
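The weighted aggregation described in step S5 can be sketched as follows. This is not a reproduction of formulas (2) and (3): it assumes the reference coefficient is a client's sample count divided by a cluster-wide sample total, applies the feature weight only to slow-set clients, and leaves out the drift threshold and the small set weight mentioned above because their exact roles are not recoverable from the text. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def aggregate_cluster(
    global_model: Sequence[float],
    active: Sequence[Tuple[int, Sequence[float]]],
    slow: Sequence[Tuple[int, float, Sequence[float]]],
    total_samples: int,
) -> Tuple[Vector, Vector]:
    """Return (aggregated update, intra-cluster model).

    active: (sample count, this round's update) per active-set client.
    slow:   (sample count, feature weight, cached update) per slow-set client.
    """
    aggregated = [0.0] * len(global_model)
    for n_i, update in active:
        coef = n_i / total_samples  # reference coefficient (assumed form)
        aggregated = [a + coef * u for a, u in zip(aggregated, update)]
    for n_j, weight_j, cached in slow:
        coef = (n_j / total_samples) * weight_j  # scaled by the feature weight
        aggregated = [a + coef * u for a, u in zip(aggregated, cached)]
    # Superimpose the aggregated update on the latest global model.
    cluster_model = [g + a for g, a in zip(global_model, aggregated)]
    return aggregated, cluster_model


update, model = aggregate_cluster(
    global_model=[1.0, 1.0],
    active=[(50, [2.0, 0.0])],
    slow=[(50, 0.5, [0.0, 4.0])],
    total_samples=100,
)
print(update, model)
```

In this toy case the slow client's cached update counts half as much as it would in the active set, because its feature weight is 0.5.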

[0064] The feature drift factor $d_j$ is the cosine distance between the intermediate feature of client j and the mean of the intermediate features of the active-set clients in the client cluster to which client j belongs, i.e.:

[0065] $$d_j = 1 - \frac{v_j \cdot \bar{v}_a}{\lVert v_j \rVert_2 \, \lVert \bar{v}_a \rVert_2} \quad (4)$$

[0066] where $v_j$ is the intermediate feature of client j, which can be extracted from its cache entry; $\bar{v}_a$ is the aggregated feature of the active set of the client cluster in which client j is located, i.e., the mean of the intermediate features of the clients in the active set; and $\lVert \cdot \rVert_2$ denotes the L2 norm.

[0067] The aggregated feature of the active set $S_a$ is $\bar{v}_a = \frac{1}{|S_a|} \sum_{i \in S_a} v_i$, where $|S_a|$ is the number of clients in the active set $S_a$.
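A small sketch of the feature drift computation, taking cosine distance to mean one minus cosine similarity (the usual convention, assumed here). The function names and the example vectors are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Element-wise mean, used as the aggregated feature of the active set."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; raises if either vector has zero norm."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical intermediate features of two active-set clients.
active_features = [[1.0, 0.0], [3.0, 0.0]]
active_mean = mean_vector(active_features)

drift_aligned = cosine_distance([2.0, 0.0], active_mean)     # same direction
drift_orthogonal = cosine_distance([0.0, 5.0], active_mean)  # orthogonal
print(drift_aligned, drift_orthogonal)
```

A cached feature pointing in the same direction as the active-set mean gives a drift of 0, and an orthogonal one gives 1, regardless of vector length.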

[0068] The feature weight of client j is:

[0069] (5)

[0070] The first factor is the staleness factor, which is computed from the difference between the global aggregation count in the current clustering state and the number of local training rounds of client j; the latter can be obtained directly from the cache entry of client j.

[0071] The second factor is the cache expiration factor, which is defined by the set attenuation coefficient, whose value range is (0, 1), and the update count of the cache entry of client j (i.e., the number of local training rounds of client j after the most recent clustering); the larger the update count, the lower the weight.

[0072] The third factor is the transition term, which is defined by a set weight, whose value range is [0.5, 2] and which can be set to 1, and the feature drift factor of client j.
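The three factors can be combined as in the sketch below, which is one possible reading rather than the patent's formula (5). The staleness factor follows the stated rule (the smaller of 1 and the reciprocal of the round gap). The cache expiration factor is assumed to be the attenuation coefficient raised to the cache update count. The exponential form of the transition term is purely an assumption, since the text gives only its inputs (a set weight and the drift factor). All function names are hypothetical.

```python
import math


def staleness_factor(global_count: int, local_rounds: int) -> float:
    """min(1, 1 / gap); a gap of 1 or less is treated as fully fresh (assumed)."""
    gap = global_count - local_rounds
    return 1.0 if gap <= 1 else 1.0 / gap


def cache_expiration_factor(attenuation: float, update_count: int) -> float:
    """Attenuation coefficient in (0, 1) raised to the update count (assumed)."""
    return attenuation ** update_count


def transition_term(set_weight: float, drift: float) -> float:
    """ASSUMED form exp(-set_weight * drift); the source does not give it."""
    return math.exp(-set_weight * drift)


def feature_weight(global_count: int, local_rounds: int, attenuation: float,
                   update_count: int, set_weight: float, drift: float) -> float:
    """Product of the three factors, as stated for the slow-set feature weight."""
    return (staleness_factor(global_count, local_rounds)
            * cache_expiration_factor(attenuation, update_count)
            * transition_term(set_weight, drift))


# Hypothetical values: 5 global aggregations, 3 local rounds, no drift.
w_j = feature_weight(global_count=5, local_rounds=3, attenuation=0.5,
                     update_count=3, set_weight=1.0, drift=0.0)
print(w_j)
```

Whatever form the transition term takes, the product structure means that any single factor near zero is enough to suppress a stale or drifted cache entry.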

[0073] S6. Aggregate the aggregation models within each cluster according to the ratio of the number of clients in each cluster, update the aggregation results to the global model, and broadcast the global model to each client in the cluster through each cluster head;

[0074] The updated global model is denoted as $W$:

[0075] $$W = \sum_{k=1}^{K} \frac{|C_k|}{N} W_k \quad (6)$$

[0076] where K is the current number of client clusters; $|C_k|$ is the number of clients in the k-th client cluster, i.e., the size of the k-th client cluster; N is the total number of clients participating in federated learning; and $W_k$ is the intra-cluster aggregation model of the k-th client cluster.
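Step S6 weights each intra-cluster model by its cluster's share of the clients. A minimal sketch, assuming models are flat lists of floats and that the total client count equals the sum of the cluster sizes; the function name is hypothetical.

```python
from typing import List, Sequence


def aggregate_global(cluster_models: Sequence[Sequence[float]],
                     cluster_sizes: Sequence[int]) -> List[float]:
    """Average the intra-cluster models, weighted by cluster size / total clients."""
    total_clients = sum(cluster_sizes)
    dim = len(cluster_models[0])
    return [
        sum(size / total_clients * model[d]
            for model, size in zip(cluster_models, cluster_sizes))
        for d in range(dim)
    ]


# Two hypothetical clusters holding 1 and 3 clients.
global_model = aggregate_global([[1.0, 2.0], [3.0, 6.0]], [1, 3])
print(global_model)
```

The larger cluster pulls the result toward its own model: here the global model lies three quarters of the way from the first cluster's model to the second's.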

[0077] S7. Determine whether the number of global model updates after the most recent client clustering has reached the set value R;

[0078] If yes, proceed to step S8;

[0079] If not, initialize the active and slow sets of each client cluster to empty sets, let each idle client train locally, let the idle client that completes the local training first be added to the active set, and let the remaining clients in the cluster be added to the slow set; then return to step S5; an idle client refers to a client that is not currently training locally.

[0080] Since local training can be fast or slow, it is very likely that while client A has completed one round of local training or has even started its second round, client B is still in its first round of local training. In this case, client B directly joins the slow set and skips the second round of local training. Only after client B's local training is completed can it join subsequent local training as an idle client.

[0081] S8. Determine whether the global model has converged;

[0082] The convergence condition of the global model can be set as follows: whether the number of clustering operations on the client reaches the set value M, or whether the evaluation metrics of the global model on each client (such as test accuracy, loss function, etc.) all reach the set value.

[0083] If yes, client training is complete;

[0084] If not, return to step S2.
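The control flow of steps S2 to S8 amounts to a nested loop: cluster, run R global aggregations, test for convergence, and re-cluster if needed. The skeleton below only records the order of events; the actual training and aggregation are omitted, and `metrics_reached` is a hypothetical stand-in for the evaluation-metric check.

```python
from typing import Callable, List, Tuple


def run_schedule(recluster_interval: int, max_clusterings: int,
                 metrics_reached: Callable[[int], bool]) -> List[Tuple[str, int]]:
    """Record the order of clustering and aggregation events.

    recluster_interval: R, global aggregations per clustering state.
    max_clusterings:    M, clustering count treated as convergence (M >= 1).
    """
    events: List[Tuple[str, int]] = []
    clustering = 0
    while True:
        clustering += 1
        events.append(("cluster", clustering))      # S2-S3
        for aggregation in range(1, recluster_interval + 1):
            events.append(("aggregate", aggregation))  # S4-S6, counted by S7
        # S8: stop when either convergence condition holds.
        if clustering >= max_clusterings or metrics_reached(clustering):
            return events


events = run_schedule(recluster_interval=2, max_clusterings=3,
                      metrics_reached=lambda c: c == 2)
print(events)
```

In this run the metric check succeeds after the second clustering state, so the third clustering never happens.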

[0085] The following specific embodiments illustrate and verify the above-mentioned asynchronous federated learning aggregation method based on clustering caching.

[0086] In this embodiment, simulation experiments were conducted on the MNIST and CIFAR-10 datasets. A Dirichlet distribution was used to perform non-independent and identically distributed (non-IID) partitioning in order to simulate data heterogeneity. The chosen concentration parameter generates a highly skewed label distribution and imbalanced sample sizes among clients, thereby constructing a rigorous heterogeneity evaluation environment.
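A common way to build such a split is to draw, for each class, a Dirichlet-distributed share per client and cut that class's samples accordingly. The sketch below uses only the standard library (Dirichlet samples obtained by normalising gamma draws). The function names, the seed, and the concentration value 0.5 in the example are hypothetical and are not the values used in the embodiment.

```python
import random
from typing import Dict, List, Sequence


def dirichlet_sample(alpha: float, size: int, rng: random.Random) -> List[float]:
    """Symmetric Dirichlet sample via normalised gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def dirichlet_partition(labels: Sequence[int], num_clients: int,
                        alpha: float, seed: int = 0) -> List[List[int]]:
    """Split sample indices across clients, class by class."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    clients: List[List[int]] = [[] for _ in range(num_clients)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        shares = dirichlet_sample(alpha, num_clients, rng)
        start, cumulative = 0, 0.0
        for c in range(num_clients):
            cumulative += shares[c]
            # The last client takes the remainder so no sample is dropped.
            end = len(indices) if c == num_clients - 1 else min(
                len(indices), int(round(cumulative * len(indices))))
            clients[c].extend(indices[start:end])
            start = end
    return clients


labels = [i % 3 for i in range(30)]  # toy dataset: 3 classes, 10 samples each
parts = dirichlet_partition(labels, num_clients=4, alpha=0.5)
print([len(p) for p in parts])
```

Smaller concentration values make each class concentrate on fewer clients, which produces both skewed label distributions and unequal client dataset sizes.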

[0087] For the MNIST dataset, the model structure uses a convolutional neural network (CNN) consisting of two convolutional layers and one 512-dimensional fully connected layer; for CIFAR-10, the LeNet-5 network is used.

[0088] The experiment was implemented using the PyTorch framework, with a global iteration count of 100 and a client participation rate of 70% per round (i.e., after each round of global aggregation, if the number of clients completing local training reaches 70% of the total number of clients within the cluster, then the cluster head performs intra-cluster aggregation). Local training used the SGD optimizer with a learning rate of 0.005 and a local training batch size of 32.

[0089] To simulate system heterogeneity, clients are divided into fast and slow clients, with the slow client's computation latency being 3 to 5 times that of the fast client. Core algorithm parameters include the number of clusters K (i.e., the number of client clusters), the clustering interval R (i.e., the number of global aggregations between two adjacent clustering operations), and the cache threshold T.

[0090] Specifically, the cache threshold T defines the maximum staleness allowed when cached updates participate in aggregation. Formally, a local update generated in round t is valid for a given global aggregation round only if the staleness constraint defined by T is satisfied. That is, a slow-set client participates in global aggregation only while this constraint holds.
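The two-stage filtering (slow ends first, then the slow set) can be sketched as below. It assumes the constraint compares the gap between the current global aggregation count and a client's completed local rounds against T, and treats a gap equal to T as still valid; the text elsewhere says "less than a set threshold", so the boundary case is an assumption. Names and values are hypothetical.

```python
from typing import Dict, Iterable, List


def build_slow_set(cached_rounds: Dict[str, int], active_set: Iterable[str],
                   global_count: int, threshold: int) -> List[str]:
    """Select cached clients outside the active set that are fresh enough.

    cached_rounds: client -> local rounds completed in this clustering state,
                   for every client that has a cache entry.
    """
    active = set(active_set)
    # Stage 1: slow ends are cached clients outside the active set.
    slow_ends = {c: r for c, r in cached_rounds.items() if c not in active}
    # Stage 2: keep only slow ends whose gap is within the threshold.
    return sorted(c for c, r in slow_ends.items()
                  if global_count - r <= threshold)


cached_rounds = {"a": 9, "b": 8, "c": 3, "d": 6}
slow_set = build_slow_set(cached_rounds, active_set=["a"],
                          global_count=10, threshold=5)
print(slow_set)
```

Client "c" is excluded because its cached update lags the global count by 7 rounds, more than the threshold of 5.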

[0091] In this embodiment, a cache decay factor is introduced to mitigate the impact of historical information.

[0092] In this embodiment, the existing federated learning methods WKAFL, FedAvg, FedProx, and FedSA are used as comparison algorithms. The comparison algorithms and the algorithm of this invention, AHFL-Cache, are each trained on the two datasets, with accuracy as the evaluation metric. The training results are shown in Figure 2 and Figure 3. The experimental results demonstrate that AHFL-Cache outperforms the compared algorithms: its convergence speed is significantly faster than that of asynchronous methods such as FedSA and WKAFL, effectively overcoming the latency issues caused by heterogeneous devices. In terms of final accuracy, AHFL-Cache achieves the best test accuracy, and its convergence process is smoother and more stable than that of the other algorithms.

[0093] In this embodiment, the sensitivity of three core parameters of the AHFL-Cache algorithm of the present invention was also evaluated: the number of clusters K, the clustering interval R, and the cache threshold T. Each parameter was adjusted independently, while the remaining parameters remained at their default values.

[0094] Table 1: Parameter Testing

[0095]

[0096] Optimal parameters for the MNIST dataset: K=5; R=10; T=5;

[0097] Optimal parameters for the CIFAR-10 dataset: K=10; R=10; T=10;

[0098] As can be seen from Table 1:

[0099] Accuracy increases with increasing K, then tends to stabilize. Smaller K leads to higher heterogeneity within clusters, making it difficult for the model to learn fine-grained features. As K increases, the clusters become more uniform, thus achieving stable performance.

[0100] Accuracy peaks at R=10 for both datasets and decreases as R increases. This indicates that a smaller R value allows cluster assignments to be corrected promptly in response to client-side data lag and distribution drift during training. If R is too large, re-clustering lags behind these dynamic changes, leaving outdated cluster memberships that hinder convergence.

[0101] The parameter T balances data utilization and update timeliness. MNIST performs best at T=5, while CIFAR-10 is more accurate at T=10. This indicates that complex tasks can still benefit more from high-contribution, lagging nodes even with higher latency.

[0102] In this embodiment, ablation experiments were also conducted, with the ablation models configured as follows:

[0103] Ablation Model 1: Removes clustering based on intermediate features and directly adopts gradient clustering.

[0104] Ablation Model 2: Removes the cache and does not distinguish between the active set and the slow set during intra-cluster aggregation. After all clients have completed local training, the average of the parameter updates of all clients in the cluster is taken directly as the intra-cluster aggregated update.

[0105] Ablation Model 3: Removes the two-tier architecture, eliminates clustering, and directly aggregates all clients as a single cluster for global aggregation.
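The difference between Ablation Model 2 and the full intra-cluster aggregation can be sketched as below. Updates are represented as flat lists of floats, and the reference coefficients and feature weights are supplied by the caller; both choices, and the function names, are assumptions for illustration rather than the patented implementation.

```python
def plain_average(updates):
    """Ablation Model 2: unweighted mean of all client updates in the cluster."""
    n = len(updates)
    dim = len(updates[0])
    return [sum(u[i] for u in updates) / n for i in range(dim)]


def weighted_intra_cluster_update(active, slow):
    """Full method: weighted sum over active-set and slow-set updates.

    `active` holds (reference_coefficient, update) pairs.
    `slow` holds (reference_coefficient, feature_weight, update) triples,
    so a slow client's aggregation coefficient is the product of the two.
    """
    dim = len(active[0][1]) if active else len(slow[0][2])
    result = [0.0] * dim
    for coeff, update in active:
        for i in range(dim):
            result[i] += coeff * update[i]
    for coeff, weight, update in slow:
        for i in range(dim):
            result[i] += coeff * weight * update[i]
    return result
```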

[0106] The core parameters for the ablation experiment are set as follows:

[0107] Optimal parameters for the MNIST dataset: K=5; R=10; T=5;

[0108] Optimal parameters for the CIFAR-10 dataset: K=10; R=10; T=10;

[0109] The results of the ablation experiment are shown in Table 2.

[0110] Table 2: Ablation Experiment

[0111]

[0112] The ablation experiments show that the accuracy of Ablation Model 1 decreased to 92.48% on MNIST and 37.41% on CIFAR-10. This indicates that in non-independent and identically distributed (Non-IID) scenarios, feature fingerprints can accurately quantify the deviation between local updates and the global target. Without this identification mechanism, updates with severe feature shifts interfere with the aggregation process, making it difficult for the model to converge to the optimal solution.

[0113] With Ablation Model 2, system performance fluctuates. This suggests that, in an environment with uneven device computing power, the caching mechanism acts as a "data buffer": it fills the update gaps caused by stragglers and ensures that the global model can absorb a sufficient amount of knowledge in each round of aggregation.

[0114] Ablation model 3 had the most significant impact on system performance, with the accuracy of CIFAR-10 dropping sharply to 32.62%. This fully demonstrates the crucial role of hierarchical aggregation in complex heterogeneous environments: the pre-aggregation of cluster head nodes not only filters out local noise but also ensures the orderly integration of large-scale updates at the global level by reducing the communication pressure on the central server.
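The global level of the two-tier aggregation, which Ablation Model 3 removes, weights each intra-cluster model by that cluster's share of the clients (step S6 of claim 1). A minimal sketch, assuming models are flat lists of floats and using a hypothetical function name:

```python
def aggregate_clusters(cluster_models, cluster_sizes):
    """Combine intra-cluster models, weighting each by its share of all clients."""
    total = sum(cluster_sizes)
    if total <= 0:
        raise ValueError("cluster sizes must sum to a positive number")
    dim = len(cluster_models[0])
    return [
        sum(size / total * model[i] for model, size in zip(cluster_models, cluster_sizes))
        for i in range(dim)
    ]
```

For two clusters with 1 and 3 clients, the second cluster's model receives three quarters of the weight.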

[0115] Of course, those skilled in the art will recognize that the present invention is not limited to the details of the exemplary embodiments described above, but also includes the same or similar structures that can be implemented in other specific forms without departing from the spirit or essential characteristics of the invention. Therefore, the embodiments should be considered illustrative and non-limiting in all respects, and the scope of the invention is defined by the appended claims rather than the foregoing description. Thus, all variations falling within the meaning and scope of equivalents of the claims are intended to be included within the present invention. No reference numerals in the claims should be construed as limiting the scope of the claims.

[0116] Furthermore, it should be understood that although this specification describes embodiments, not every embodiment contains only one independent technical solution. This narrative style is merely for clarity. Those skilled in the art should consider the specification as a whole, and the technical solutions in each embodiment can also be appropriately combined to form other embodiments that can be understood by those skilled in the art.

[0117] The technologies, shapes, and structures not described in detail in this invention are all known technologies.

Claims

1. An asynchronous federated learning aggregation method based on clustering caching, characterized in that the method is used for federated learning of agents on clients in visual classification scenarios, where the agents are used to classify objects in the client-side scene, and the method comprises:

S1. Deploy and initialize the agent on each client;

S2. Train each client on its local dataset until convergence, and calculate the client's intermediate features; the client's intermediate features are the features output by the client's intermediate layer for the last batch of locally trained samples;

S3. Perform clustering based on the intermediate features of the clients, and designate a cluster head for each client cluster; set up and initialize a cache space at the cluster head; after clustering, each client stores or updates its cache entry in the cluster head cache space after completing each round of local training; the cache entry includes the client's parameter update amount, the number of cache entry updates, and the client's intermediate features;

S4. After an idle client loads the global model, it restarts local training; the m clients in a client cluster that complete local training first constitute the active set, and the clients outside the active set of the cluster that have cache entries stored in the cluster head cache space constitute the slow set; idle clients are clients that are not in local training when they receive the global model; the initial value of the global model is a random value;

S5. Each client cluster aggregates the parameter update amounts from this round of local training of the active-set clients and the parameter update amounts in the cache entries of the slow-set clients to obtain an aggregated update value; the aggregated update value is then superimposed on the most recently acquired global model to obtain the intra-cluster aggregated model;

S6. Aggregate the intra-cluster aggregated models according to the ratio of the number of clients in each cluster, update the global model with the aggregation result, and broadcast the global model to each client in the cluster through each cluster head;

In each clustering state, steps S4-S6 are repeated to perform global aggregation multiple times; whenever the number of global aggregations reaches the set value R, the method returns to step S2 to re-cluster; the above steps are repeated until the global model converges;

In step S5, the aggregation within each client cluster is performed as follows: the ratio of the amount of data in the client's local dataset to the total number of clients in the cluster is used as the client's reference coefficient; for the active set, the reference coefficient is used as the aggregation coefficient; for the slow set, the product of the reference coefficient and the client's feature weight is used as the aggregation coefficient; the parameter update amounts of the clients participating in the aggregation are then weighted and summed using the aggregation coefficients to obtain the aggregated update value;

The feature weight of a slow-set client characterizes the client's computation speed and the difference in scene distribution between the client and the active-set clients; the difference in scene distribution is characterized by dataset features;

The difference in scene distribution between slow-set client j and the active-set clients is characterized by a feature drift factor; the feature drift factor is the cosine distance between the intermediate features of client j and the mean of the intermediate features of the active-set clients;

The feature weight of slow-set client j is the product of the client's obsolescence factor, the cache expiration factor, and a transition term, the transition term including a set weight; the obsolescence factor is the smaller of 1 and the reciprocal of the difference between the global aggregation count in the current clustering state and the local training round number of client j; the cache expiration factor is a power of the attenuation coefficient;

When the asynchronous federated learning aggregation method based on clustering caching is applied to vehicle network monitoring, monitoring equipment is first deployed at each monitoring location within the target area as clients; the method is then used to train a global model, which is deployed to each client as the agent; each client then collects scene images in real time and identifies vehicles through the agent; the monitoring locations include one or more of the following: highways, urban and rural roads, intersections, and bridges.

2. The asynchronous federated learning aggregation method based on clustering caching as described in claim 1, characterized in that the number of clients m in the active set of each client cluster is proportional to the size of the client cluster; m is the larger of 1 and the product of the number of clients in the client cluster and a set ratio.

3. The asynchronous federated learning aggregation method based on clustering caching as described in claim 1, characterized in that, in step S4, idle clients restart local training while the remaining clients continue their current round of local training; clients are added to the active set in the order in which they complete local training until the required number m is reached; in step S5, intra-cluster aggregation is performed once the number of clients that have completed local training reaches a set proportion of the total number of clients in the cluster.

4. The asynchronous federated learning aggregation method based on clustering caching as described in claim 1, characterized in that the slow set is constructed as follows: after the active set is constructed, the slow set is initialized as an empty set; clients within the cluster that have cache entries stored in the cluster head cache space and are not in the active set are identified as slow clients; slow clients for which the difference between the global aggregation count in the current clustering state and the client's number of local training rounds is less than a set threshold are added to the slow set.

5. The asynchronous federated learning aggregation method based on clustering caching as described in claim 1, characterized in that the intermediate features are the mean of the input features of the agent's output layer over the latest training batch, or the result of pooling and averaging the input features of the agent's output layer over the latest training batch.

6. An asynchronous federated learning aggregation system based on clustering caching, characterized in that it includes a memory and a processor, wherein the memory stores a computer program, the processor is connected to the memory, and the processor is used to execute the computer program to implement the asynchronous federated learning aggregation method based on clustering caching as described in any one of claims 1-5.