A recommendation method based on graph hypernetwork and heterogeneous federated learning

By employing graph hypernetworks and heterogeneous federated learning, the problem of device and data heterogeneity is solved, enabling efficient collaborative training of heterogeneous devices, improving the accuracy and stability of personalized recommendations, and resolving the issues of device resource waste and noise interference in existing technologies.

CN122242653A · Pending · Publication Date: 2026-06-19 · NANJING UNIV OF AERONAUTICS & ASTRONAUTICS

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: NANJING UNIV OF AERONAUTICS & ASTRONAUTICS
Filing Date: 2026-04-10
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Existing federated learning schemes face challenges in terms of device and data heterogeneity, resulting in limited accuracy and wasted resources for personalized recommendations. Furthermore, they are prone to introducing noise in the early stages of training, which affects the model's convergence stability.

Method used

The method combines a graph hypernetwork with heterogeneous federated learning. The graph hypernetwork generates personalized local model parameters, and a hybrid warm-up strategy together with a one-way knowledge distillation mechanism enables collaborative training of heterogeneous devices. A residual bridging network and counting-mask gradient calculation further improve the robustness of the model.

Benefits of technology

It enables efficient collaborative training of heterogeneous devices, improves the accuracy and stability of personalized recommendations, avoids feature collapse and noise interference, and significantly improves the performance of the recommendation system.

✦ Generated by Eureka AI based on patent content.

Figure CN122242653A_ABST

Abstract

This invention discloses a recommendation method and system based on graph hypernetworks and heterogeneous federated learning. The method includes: a server acquires item interaction features and model topologies from heterogeneous clients; a graph hypernetwork generates personalized local model parameters for each heterogeneous client based on the interaction features; each client loads the parameters, trains locally, and uploads the results; the server executes a hybrid warm-up strategy to update the global item embedding matrix, performing block aggregation based on a unified dimension in the warm-up phase and switching to heterogeneous bridging fusion based on independent dimensions in the fine-tuning phase, where a residual bridging network performs cross-dimensional one-way knowledge distillation; finally, the graph hypernetwork is updated by backpropagation based on the difference between the parameters it generated and the parameters trained by the clients. Through the hybrid warm-up and one-way distillation mechanisms, this invention effectively addresses the feature interference and cold-start problems in heterogeneous federated learning and significantly improves the accuracy and robustness of the recommendation system.

Description

Technical Field

[0001] This invention relates to the fields of artificial intelligence and internet information recommendation technology, and more specifically, to a recommendation method and system based on graph hypernetworks and heterogeneous federated learning that supports collaborative training of heterogeneous devices.

Background Technology

[0002] With the rapid development of mobile internet technology, personalized recommendation systems have been widely applied in various online services such as e-commerce, social media, and streaming media, aiming to help users quickly discover content of interest from massive amounts of information. Traditional recommendation systems typically employ a centralized training model, requiring the uploading of interaction data scattered across various user terminals to a central server for unified storage and model training. However, this centralized data processing method not only raises growing concerns about user privacy leaks but also faces compliance challenges from increasingly stringent data protection laws and regulations. To address the privacy and data silo issues, federated learning has emerged. Its core idea is "data remains still, model moves," meaning that without exchanging local data, a global model is collaboratively built by aggregating the local model parameters of each client.

[0003] Despite its superior privacy performance, federated learning still faces two major technical bottlenecks in practical recommender system applications: data heterogeneity and device heterogeneity. On one hand, user interests in recommender scenarios are extremely discrete and personalized, with vastly different data distributions among users. Traditional federated averaging algorithms often struggle to accurately capture each user's unique preferences, limiting the accuracy of personalized recommendations. On the other hand, the hardware performance of user terminal devices in real-world edge environments varies greatly, ranging from high-performance flagship smartphones to low-performance IoT devices. Existing federated learning frameworks typically require all clients participating in training to deploy neural network models with identical structures. This leads to devices with weaker computing power being excluded from training due to their inability to support large models, or powerful devices being forced to accommodate smaller models, resulting in wasted resources and failing to fully utilize the computational potential of heterogeneous devices.

[0004] To address the challenges of heterogeneous devices, existing technologies have developed a heterogeneous federated learning method based on parameter slicing. This method mandates that small-scale models are subsets of medium-scale models, and medium-scale models are subsets of large-scale models, achieving collaborative training by sharing some underlying parameters. However, this simple, rigid parameter-sharing strategy has significant drawbacks. First, due to the strong coupling of parameter spaces, gradient updates from large-scale models often dominate the evolution direction of shared parameters, causing the feature representations of small-scale models to be "submerged" or "biased," resulting in a "blind men and the elephant" effect. Second, this method restricts the degrees of freedom for models of different sizes to explore their respective optimal feature spaces, preventing small models from learning independent semantic representations suitable for their own dimensions. Furthermore, in the early stages of training, the lack of an effective alignment mechanism can easily introduce significant noise into parameter aggregation between heterogeneous models, leading to slow model convergence or even performance degradation.

[0005] Therefore, the industry urgently needs a new federated recommendation scheme that can break down the barriers of device computing power, support heterogeneous collaboration of models of different scales, and solve the feature interference problem caused by parameter slicing through an effective knowledge fusion mechanism, achieving high-precision personalized recommendations while protecting privacy. This invention arose in this context.

Summary of the Invention

[0006] Purpose of the Invention: This invention aims to address the dual challenges of data heterogeneity and device heterogeneity faced by existing recommender systems in federated learning scenarios. Specifically, traditional federated learning requires consistent client model structures, making it unsuitable for terminal devices with varying computing power. Existing heterogeneous solutions based on hard parameter slicing force smaller models to be subsets of larger models, resulting in the smaller models' feature representations being dominated by the larger models, leading to a "blind men and the elephant" effect. Furthermore, this approach easily introduces noise in the early stages of training, severely impacting the accuracy and convergence stability of personalized recommendations. This invention proposes a novel method to achieve efficient collaborative training across devices with different computing power, significantly improving the performance of recommender systems while protecting privacy.

[0007] Technical Solution: This invention provides a recommendation method based on graph hypernetworks and heterogeneous federated learning, the overall implementation steps of which are as follows:

[0008] (1) The server obtains the item interaction features of each heterogeneous client participating in the training, as well as the topology diagram of the local model corresponding to each client;

[0009] (2) The server uses a graph hypernetwork to generate personalized local model parameters for each heterogeneous client based on the item interaction features and topology graph;

[0010] (3) Each client loads the corresponding local model parameters and item embedding features, performs local training using local private data, and uploads the trained parameters to the server.

[0011] (4) The server executes a hybrid warm-up strategy based on the number of training rounds, aggregates the item embedding features uploaded by each client, and updates the item embedding matrix on the server side.

[0012] (5) The server performs backpropagation to update the graph hypernetwork based on the difference between the initial parameters generated by the graph hypernetwork and the parameters trained by the client.

[0013] Further, in step (1), the method for obtaining the item interaction features is as follows: extract positive sample items from the client's local historical interaction data, obtain the embedding vectors corresponding to these items in the current global embedding matrix, and perform mean aggregation calculation. The specific calculation formula is as follows:

[0014] $$c_i = \frac{1}{|P_i|} \sum_{j \in P_i} e_j$$

[0015] Here, $c_i$ represents the item interaction feature vector of the i-th client, $P_i$ represents the set of positive sample items for this client, $|P_i|$ represents the number of positive sample items, and $e_j$ represents the embedding vector of item j. This approach transforms the high-dimensional, sparse interaction history into a low-dimensional, dense feature vector that is convenient for the graph hypernetwork to process.

[0016] Further, in step (2), the process by which the graph hypernetwork generates parameters includes: parsing the client's model topology into a graph structure, and using a shape encoder to encode the parameter shapes of each network layer to obtain the initial states of the nodes. During message passing between graph nodes, the feature $c_i$ obtained in step (1) is dynamically fused into the hidden state of each node, and the fusion formula is as follows:

[0017] $$x_k = h_k + \alpha \cdot c_i$$

[0018] Here, $x_k$ represents the input feature of the k-th node after fusion, $h_k$ represents the hidden state of the k-th node after message passing in the graph neural network, and $\alpha$ is a preset feature amplification factor. Finally, a decoder maps $x_k$ to personalized network weights that fit the scale of the client model.

[0019] Further, in step (4), the hybrid warm-up strategy includes a warm-up stage and a fine-tuning stage. If the current number of training rounds is less than a preset threshold, block aggregation based on a unified dimension is performed, maintaining a single unified large matrix to eliminate initialization noise; once the threshold is reached, the server switches to heterogeneous bridging fusion and maintains mutually independent item embedding matrices. In this stage, the residual bridging network Bridge(·) maps the low-dimensional student matrix $E_S$ to a higher-dimensional space, and the mapping formula is:

[0020] $$\mathrm{Bridge}(E_S) = W_{proj} E_S + \mathrm{ReLU}(W_{refine}(W_{proj} E_S))$$

[0021] Based on this, one-way knowledge distillation is performed with the loss function:

[0022] $$\mathrm{Loss}_{distill} = \mathrm{MSE}(\mathrm{Bridge}(E_S),\ E_T.\mathrm{detach}())$$

[0023] Here, $E_T$.detach() blocks gradient backpropagation into the high-dimensional teacher matrix $E_T$, so that only the bridging network and the student matrix are updated. The small model can thus absorb the general knowledge of the large model while maintaining its independent feature representation.

[0024] Furthermore, in step (5), the graph hypernetwork update does not directly aggregate parameters, but instead calculates the difference between the generated parameters $W_{GHN}$ and the post-training parameters $W_{local\_trained}$. Considering that different parameter positions are updated at different frequencies in heterogeneous scenarios, this invention introduces a counting mask matrix M to calculate the block-level average gradient signal:

[0025]

[0026] The gradient signal ΔW is used to update the weights of the graph hypernetwork, so that the generator improves in a closed loop from round to round.

[0027] Beneficial effects: Through the above technical solutions, this invention breaks through the device computing power barrier and uses graph hypernetworks to achieve collaborative training and zero-shot parameter generation for heterogeneous devices, effectively alleviating the cold start problem. It proposes a hybrid warm-up strategy and a one-way knowledge distillation mechanism, which use the full data for rapid convergence in the early stage of training and remove the forced constraint of the large model on the small model in the fine-tuning stage, avoiding feature collapse and the "blind men and the elephant" effect. In addition, the residual bridging network and counting-mask gradient calculation significantly enhance the robustness of the system under non-independent and identically distributed (non-IID) data.

Attached Figure Description

[0028] Figure 1 is a schematic diagram of the overall architecture of a recommendation system based on graph hypernetworks and heterogeneous federated learning, provided by an embodiment of the present invention. The diagram illustrates the data interaction relationship between the server and the heterogeneous client cluster.

[0029] Figure 2 is a flowchart of a recommendation method based on graph hypernetworks and heterogeneous federated learning provided in an embodiment of the present invention. The flowchart details the complete closed-loop process: feature acquisition, graph hypernetwork (GHN) parameter generation, local training, execution of the hybrid warm-up strategy according to the number of training rounds, and the backpropagation update of the GHN parameters.

Detailed Implementation

[0030] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort are within the scope of protection of the present invention.

[0031] Step 1: Referring to the system architecture shown in Figure 1, a recommender system environment based on graph hypernetworks and heterogeneous federated learning is constructed. At the physical level, the system consists of a cloud server and a cluster of heterogeneous clients at the edge. The client cluster comprises several user terminals with varying computing capabilities, pre-divided into at least three model scales based on their hardware performance: first-scale clients, second-scale clients, and third-scale clients. The server side deploys a global item embedding pool, a graph hypernetwork, and a heterogeneous bridging and fusion module. The global item embedding pool adjusts its maintenance strategy at different training stages, the graph hypernetwork generates network weights, and the heterogeneous bridging and fusion module performs cross-dimensional feature alignment.

[0032] Step 2: Referring to the process flow shown in Figure 2, when the federated learning training task begins, the server first performs feature extraction and topology acquisition. The server randomly selects the active clients participating in the current round of training and obtains the local model topology graphs and item interaction features uploaded by these clients. Because the raw interaction data in recommendation systems is usually extremely sparse, using it directly as input makes it difficult for the neural network to converge.

[0033] Step 201: To address the aforementioned sparsity problem, this embodiment employs mean aggregation to calculate the item interaction features. Specifically, the server or client extracts the set $P_i$ of positive sample items from the local historical interaction data of the i-th client, obtains the embedding vectors $e_j$ of these items in the current global item embedding matrix, and calculates the item interaction feature vector $c_i$ of this client using the following formula:

[0034] $$c_i = \frac{1}{|P_i|} \sum_{j \in P_i} e_j$$

[0035] Here, $|P_i|$ represents the number of positive sample items. This step transforms the variable-length interaction history into a fixed-dimensional dense feature vector $c_i$, which serves as the input condition for the graph hypernetwork.
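The mean aggregation in Step 201 can be illustrated with a minimal NumPy sketch. The function name `item_interaction_feature`, the toy embedding values, and the choice to raise an error for a client with no positive samples are assumptions made for illustration; the source does not specify them.

```python
from typing import Sequence

import numpy as np


def item_interaction_feature(global_embeddings: np.ndarray,
                             positive_items: Sequence[int]) -> np.ndarray:
    """Return c_i: the mean embedding of a client's positive-sample items."""
    if len(positive_items) == 0:
        # Assumption: the source does not say how an empty history is handled.
        raise ValueError("client has no positive-sample items")
    # Rows of global_embeddings are item vectors e_j; average those in P_i.
    return global_embeddings[list(positive_items)].mean(axis=0)


# Toy global item embedding matrix: 4 items, dimension 3 (illustrative values).
E = np.array([[1.0, 0.0, 2.0],
              [3.0, 4.0, 0.0],
              [5.0, 2.0, 4.0],
              [0.0, 0.0, 0.0]])

c_i = item_interaction_feature(E, [0, 2])  # client interacted with items 0 and 2
print(c_i)  # [3. 1. 3.]
```

Whatever the length of the interaction history, the result has the embedding dimension, which is what makes it usable as a fixed-size conditioning input.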

[0036] Step 3: The server uses a graph hypernetwork to dynamically generate personalized local model parameters for each heterogeneous client based on the item interaction features obtained in step 201. GHN treats the neural network uploaded by the client as a graph structure, where the layers of the neural network correspond to the nodes of the graph, and the data flow between layers corresponds to the edges of the graph. GHN first uses a shape encoder to encode the tensor shape of each layer parameter into the initial state of the node, and then uses a gated graph neural network to pass messages between nodes.

[0037] Step 301: During message passing, in order to generate personalized parameters for a specific user, this invention dynamically fuses the client's feature $c_i$ into the hidden states of the graph nodes. The fusion formula is as follows:

[0038] $$x_k = h_k + \alpha \cdot c_i$$

[0039] Here, $x_k$ represents the input feature of the k-th node after fusion, $h_k$ represents the hidden state of the k-th node after message passing in the graph neural network, and $\alpha$ is a preset feature amplification factor used to control the intensity of the personalized signal. Finally, a decoder maps $x_k$ to personalized network weights that fit the scale of the client model.
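The fusion formula of Step 301 amounts to adding the same scaled client vector to every node's hidden state. The sketch below assumes that the client feature and the node hidden states share one dimension; the source does not state how a mismatch would be handled, and the name `fuse_client_feature` is hypothetical.

```python
import numpy as np


def fuse_client_feature(hidden_states: np.ndarray,
                        client_feature: np.ndarray,
                        alpha: float) -> np.ndarray:
    """Compute x_k = h_k + alpha * c_i for every node k (one node per row)."""
    if hidden_states.shape[1] != client_feature.shape[0]:
        # Assumption of this sketch: c_i has the same size as each h_k.
        raise ValueError("client feature must match the node hidden size")
    # NumPy broadcasting adds the scaled c_i to each row of hidden_states.
    return hidden_states + alpha * client_feature


h = np.array([[0.0, 1.0],
              [2.0, 3.0],
              [4.0, 5.0]])          # three layer-nodes, hidden size 2
c_i = np.array([1.0, -1.0])         # client interaction feature
x = fuse_client_feature(h, c_i, alpha=0.5)
print(x)
```

A larger `alpha` makes the generated weights depend more strongly on the individual client, while `alpha = 0` reduces the hypernetwork to a purely topology-driven generator.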

[0040] Step 4: Each client receives the personalized local model parameters $W_{GHN}$ sent by the server, together with the corresponding item embedding features, and loads them into its local model. The client then performs several rounds of local training on its private user-item interaction data using an optimization algorithm such as stochastic gradient descent, obtaining the updated network parameters $W_{local\_trained}$ and the updated item embedding features.

[0041] Step 5: The server receives the updated data uploaded by each client and, based on the current training round number t, executes the hybrid warm-up strategy to update the global item embedding matrix. This strategy uses a warm-up threshold T.

[0042] Step 501: If the current training round number t < T, the server determines that the system is in the warm-up phase and performs block aggregation based on a unified dimension. In this phase, the server maintains a unified item embedding matrix of the maximum dimension in memory. Since the feature dimensions uploaded by Small, Medium, and Large clients differ (32, 64, and 128 respectively), the server divides the matrix into three column blocks: [0, 32), [32, 64), and [64, 128). For each block, the server calculates the mean of the updates uploaded by all clients whose embeddings cover that block, and updates the unified matrix by concatenating the block means. The unified matrix is then sliced and distributed to each client according to its dimension. This step aims to quickly eliminate initialization noise using the full dataset.
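A minimal sketch of the warm-up block aggregation in Step 501 follows. It assumes each client uploads an item embedding matrix whose columns are a prefix of the unified 128-dimensional space, so a per-column sum and count gives the same result as averaging each of the three blocks separately. The function name `blockwise_aggregate` and the constant-filled toy uploads are illustrative, and leaving never-uploaded columns at zero is an assumption of the sketch.

```python
from typing import List

import numpy as np


def blockwise_aggregate(uploads: List[np.ndarray], max_dim: int) -> np.ndarray:
    """Average uploaded embeddings column by column over the clients covering each column."""
    n_items = uploads[0].shape[0]
    total = np.zeros((n_items, max_dim))
    count = np.zeros(max_dim)
    for emb in uploads:
        d = emb.shape[1]            # 32, 64 or 128 in the embodiment
        total[:, :d] += emb         # client covers columns [0, d)
        count[:d] += 1
    # Assumption: columns that no client uploaded are left at zero.
    return total / np.maximum(count, 1)


n_items = 3
small = np.full((n_items, 32), 1.0)    # Small client upload
medium = np.full((n_items, 64), 2.0)   # Medium client upload
large = np.full((n_items, 128), 4.0)   # Large client upload

unified = blockwise_aggregate([small, medium, large], max_dim=128)

# Slice the unified matrix back to each client's dimension for distribution.
to_small, to_medium, to_large = unified[:, :32], unified[:, :64], unified
print(unified[0, 0], unified[0, 32], unified[0, 64])
```

In this toy run the first block averages all three uploads, the second block averages the Medium and Large uploads, and the last block contains the Large upload alone.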

[0043] Step 502: If the current training round number t ≥ T, the server determines that the system has entered the fine-tuning stage and switches to heterogeneous bridging fusion based on independent dimensions. The server decouples the unified matrix, maintains three physically independent item embedding matrices for Small, Medium, and Large clients respectively, and first calculates the aggregate mean within each size group. The heterogeneous bridging fusion module then establishes connections between matrices of different dimensions. Taking learning from low-dimensional to high-dimensional as an example, this invention constructs a residual bridging network Bridge(·) that maps the low-dimensional student matrix $E_S$ to a higher-dimensional space.

[0044] Step 503: The mapping formula of the residual bridging network is:

[0045] $$\mathrm{Bridge}(E_S) = W_{proj} E_S + \mathrm{ReLU}(W_{refine}(W_{proj} E_S))$$

[0046] Here, $W_{proj}$ represents the weight matrix of the linear projection layer, used for dimension alignment; $W_{refine}$ represents the weight matrix of the nonlinear refinement layer; and ReLU is the activation function. The residual structure x + f(x) makes it easier for the network to learn the identity mapping between feature spaces of different dimensions.
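The forward pass of the residual bridging network can be sketched as below. The sketch stores one item per row, so the formula's product of $W_{proj}$ with an item vector becomes `E_s @ W_proj.T`. That layout, the absence of bias terms, and the toy weights are assumptions for illustration only.

```python
import numpy as np


def bridge(E_s: np.ndarray, W_proj: np.ndarray, W_refine: np.ndarray) -> np.ndarray:
    """Bridge(E_S) = W_proj E_S + ReLU(W_refine (W_proj E_S)), one item per row."""
    Z = E_s @ W_proj.T                 # linear projection: (n_items, d_teacher)
    refined = np.maximum(Z @ W_refine.T, 0.0)   # nonlinear refinement with ReLU
    return Z + refined                 # residual connection x + f(x)


# Toy example: student dimension 2, teacher dimension 3.
E_s = np.array([[1.0, -2.0]])          # one item embedding
W_proj = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [0.0, 0.0]])        # pads the 2-d vector to 3-d
W_refine = np.eye(3)                   # identity refinement, for readability

out = bridge(E_s, W_proj, W_refine)
print(out)  # [[ 2. -2.  0.]]
```

With `W_refine` set to zeros the bridge reduces to the plain projection, which shows how the residual form lets the refinement layer start as a small correction on top of dimension alignment.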

[0047] Step 504: Based on this, the server performs one-way knowledge distillation. The high-dimensional independent item embedding matrix is used as the teacher matrix $E_T$, the low-dimensional independent item embedding matrix is used as the student matrix $E_S$, and the distillation loss function is calculated as:

[0048] $$\mathrm{Loss}_{distill} = \mathrm{MSE}(\mathrm{Bridge}(E_S),\ E_T.\mathrm{detach}())$$

[0049] Here, MSE represents the mean squared error function. The key operation $E_T$.detach() blocks gradient backpropagation into the teacher matrix $E_T$, so backpropagation updates only the parameters of the bridging network and the student matrix $E_S$, while the teacher matrix remains unchanged. This mechanism ensures that high-quality features of the high-dimensional model can guide the low-dimensional model, while strictly preventing noise from the low-dimensional model from interfering with the high-dimensional model, effectively solving the "blind men and the elephant" problem.
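The one-way character of the distillation can be shown with a small NumPy sketch that writes the gradients out by hand. The `.detach()` notation in the source suggests an autograd framework; here the same effect is obtained by never forming a gradient for the teacher matrix. The row-per-item layout, the learning rate, the random toy data, and the name `distill_step` are all assumptions of the sketch.

```python
import numpy as np


def distill_step(E_s, E_t, W_proj, W_refine, lr):
    """One gradient-descent step on MSE(Bridge(E_s), E_t) with E_t held constant."""
    Z = E_s @ W_proj.T                      # projection to teacher dimension
    H = Z @ W_refine.T                      # refinement pre-activation
    out = Z + np.maximum(H, 0.0)            # Bridge(E_s)
    diff = out - E_t
    loss = float(np.mean(diff ** 2))

    d_out = 2.0 * diff / diff.size          # dLoss/d(out)
    d_H = d_out * (H > 0)                   # back through ReLU
    d_Z = d_out + d_H @ W_refine            # residual path + refinement path
    grad_refine = d_H.T @ Z
    grad_proj = d_Z.T @ E_s
    grad_E_s = d_Z @ W_proj
    # No gradient is formed for E_t: this plays the role of E_T.detach().
    return (E_s - lr * grad_E_s,
            W_proj - lr * grad_proj,
            W_refine - lr * grad_refine,
            loss)


rng = np.random.default_rng(0)
E_s = rng.normal(size=(5, 4))               # student: 5 items, dimension 4
E_t = rng.normal(size=(5, 8))               # teacher: 5 items, dimension 8
E_t_before = E_t.copy()
W_proj = rng.normal(scale=0.1, size=(8, 4))
W_refine = rng.normal(scale=0.1, size=(8, 8))

losses = []
for _ in range(20):
    E_s, W_proj, W_refine, loss = distill_step(E_s, E_t, W_proj, W_refine, lr=0.1)
    losses.append(loss)
```

After the loop the student matrix and both bridge weight matrices have moved, while `E_t` is bit-for-bit identical to its initial copy, which is the one-way property the step describes.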

[0050] Step 6: After updating the item embeddings, the server updates the parameters of the graph hypernetwork. To adapt to heterogeneous environments, this invention does not directly aggregate client model parameters. Instead, the server calculates the difference between the initial parameters $W_{GHN}$ generated by the GHN and the parameters $W_{local\_trained}$ obtained after local training on the client.

[0051] Step 601: Considering the different sizes of parameters in different layers of the heterogeneous model and the different numbers of clients participating in the update, this invention constructs a counting mask matrix M and calculates the block-level average gradient signal ΔW:

[0052]

[0053] Using this gradient signal ΔW, the internal weights of the server-side GHN generator are updated via backpropagation. Through this closed-loop update, GHN can gradually learn to generate initial parameters that are closer to the "optimal solution" for each heterogeneous client, thereby improving the performance of the next round of training.
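The formula for ΔW is not reproduced in this text, so the following is only one plausible reading of Step 601, under clearly stated assumptions: each client's layer tensor occupies the leading corner of the largest layer shape, the per-client difference is taken as $W_{GHN} - W_{local\_trained}$, the differences are summed position by position, and each position is divided by the count in M. The function name `masked_average_delta`, the alignment rule, and the sign convention are all hypothetical.

```python
from typing import List, Tuple

import numpy as np


def masked_average_delta(generated: List[np.ndarray],
                         trained: List[np.ndarray],
                         max_shape: Tuple[int, int]) -> Tuple[np.ndarray, np.ndarray]:
    """Average (W_GHN - W_local_trained) per position over the clients that cover it."""
    delta_sum = np.zeros(max_shape)
    M = np.zeros(max_shape)                 # counting mask: clients per position
    for w_ghn, w_loc in zip(generated, trained):
        r, c = w_ghn.shape
        # Assumption: smaller tensors align with the top-left corner.
        delta_sum[:r, :c] += w_ghn - w_loc
        M[:r, :c] += 1
    # Positions no client touched keep a zero signal instead of dividing by 0.
    delta_w = np.divide(delta_sum, M, out=np.zeros(max_shape), where=M > 0)
    return delta_w, M


# Toy layer: a small client with a 1x2 tensor, a large client with a 2x2 tensor.
generated = [np.array([[1.0, 1.0]]),
             np.array([[3.0, 3.0], [3.0, 3.0]])]
trained = [np.array([[0.0, 0.0]]),
           np.array([[0.0, 1.0], [2.0, 3.0]])]

delta_w, M = masked_average_delta(generated, trained, max_shape=(2, 2))
print(M)        # [[2. 2.] [1. 1.]]
print(delta_w)  # [[2.  1.5] [1.  0. ]]
```

Dividing by M rather than by the total number of clients keeps positions that only large clients update from being scaled down by clients that never touched them.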

[0054] Step 7: This embodiment of the invention also provides a computer system, including a memory and a processor. The memory stores a computer program, and the processor executes the computer program to implement the methods described in steps 1 to 6 above. In summary, through the hybrid warm-up strategy and one-way knowledge distillation, this invention achieves collaborative training of heterogeneous devices without sacrificing privacy, significantly improving the recall rate and normalized discounted cumulative gain (NDCG) of the recommendation system.

Claims

1. A recommendation method based on graph hypernetworks and heterogeneous federated learning, characterized in that it includes the following steps:
(1) The server obtains the item interaction features of each client and the topology graph of the local model corresponding to each client, where the clients include at least two heterogeneous clients with different model scales.
(2) The server uses a graph hypernetwork to perform message passing and feature fusion between nodes in the topology graph, and generates personalized local model parameters for each heterogeneous client based on the item interaction features.
(3) Each client loads the corresponding local model parameters and item embedding features, performs local training using local private data, and uploads the trained local model parameters and updated item embedding features to the server.
(4) According to a phased strategy based on the number of training rounds, the server performs block aggregation based on a unified dimension in the warm-up phase and heterogeneous bridging fusion based on independent dimensions in the fine-tuning phase, in order to update the item embedding matrix on the server side.
(5) The server constructs a gradient signal based on the difference between the local model parameters generated by the graph hypernetwork and the local model parameters trained by the client, and updates the parameters of the graph hypernetwork.

2. The method according to claim 1, characterized in that, in step (1), the calculation formula for the item interaction features is: $$c_i = \frac{1}{|P_i|} \sum_{k \in P_i} e_k$$ where $c_i$ represents the item interaction feature vector of the i-th client, $P_i$ represents the set of positive sample items in the local historical interaction data of the i-th client, $|P_i|$ represents the number of the positive sample items, and $e_k$ represents the embedding vector of item k in the current global item embedding matrix.

3. The method according to claim 1, characterized in that, in step (2), the specific process of generating local model parameters by the graph hypernetwork includes: each network layer in the topology graph is parsed into a graph node, and a shape encoder is used to encode the parameter shape of each network layer to obtain the initial hidden state of the node; during message passing in the graph hypernetwork, the client's item interaction features are fused into the node's hidden state, and the fusion formula is as follows: $$x_k = h_k + \alpha \cdot c_i$$ where $x_k$ represents the input feature of the k-th node after fusion, $h_k$ represents the hidden state of the k-th node after message passing in the graph neural network, $c_i$ represents the item interaction features of the client, and $\alpha$ is a preset feature amplification factor; using a decoder network, the fused feature $x_k$ is decoded and mapped to the local model parameter matrix of the corresponding network layer.

4. The method according to claim 1, characterized in that, in step (5), the specific process of updating the parameters of the graph hypernetwork is as follows: for each layer of the graph hypernetwork output, the block-level average gradient signal is calculated using the following formula: Here, ΔW represents the difference gradient tensor used for backpropagation to update the graph hypernetwork, $W_{GHN}$ represents the initial parameter tensor generated by the graph hypernetwork, $W_{local\_trained}$ represents the parameter tensor uploaded by the client after local training, and M is a counting mask matrix, each element of which represents the total number of heterogeneous clients participating in updating the parameter at that position; the weights of the graph hypernetwork are updated using the backpropagation algorithm based on ΔW.

5. The method according to claim 1, characterized in that, in step (4), the specific judgment logic of the phased strategy is as follows: a warm-up round threshold T is set; if the current training round number t < T, the system is determined to be in the warm-up stage, and the block aggregation based on the unified dimension is executed; if the current training round number t ≥ T, the system is determined to be in the fine-tuning stage, and the heterogeneous bridging fusion based on independent dimensions is performed.

6. The method according to claim 5, characterized in that the block aggregation based on a unified dimension specifically includes: maintaining a unified maximum-dimensional item embedding matrix on the server side; dividing the item embedding features uploaded by the heterogeneous clients into multiple blocks according to dimensional ranges; for each block, calculating the mean of the uploaded features of all heterogeneous clients that contain that block; concatenating the mean values of the blocks to update the unified maximum-dimensional item embedding matrix; and truncating and slicing the updated matrix according to the model size of each heterogeneous client, with the slices used as the item embedding features distributed to each client.

7. The method according to claim 5, characterized in that the heterogeneous bridging fusion based on independent dimensions specifically includes: maintaining mutually independent item embedding matrices on the server side for clients of different model sizes; using the high-dimensional independent item embedding matrix as the teacher matrix $E_T$ and the low-dimensional independent item embedding matrix as the student matrix $E_S$; and constructing a residual bridging network Bridge(·) to map the student matrix to a high-dimensional space, with the mapping formula: $$\mathrm{Bridge}(E_S) = W_{proj} E_S + \mathrm{ReLU}(W_{refine}(W_{proj} E_S))$$ where $W_{proj}$ represents the weight matrix of the linear projection layer, $W_{refine}$ represents the weight matrix of the nonlinear refinement layer, and ReLU is the activation function.

8. The method according to claim 7, characterized in that the heterogeneous bridging fusion also includes performing one-way knowledge distillation, where the one-way knowledge distillation loss function is defined as: $$\mathrm{Loss}_{distill} = \mathrm{MSE}(\mathrm{Bridge}(E_S),\ E_T.\mathrm{detach}())$$ where MSE represents the mean squared error function, and $E_T$.detach() blocks gradient backpropagation into the teacher matrix $E_T$ during gradient computation; by minimizing $\mathrm{Loss}_{distill}$, the residual bridging network and the student matrix $E_S$ are updated simultaneously, without updating the teacher matrix $E_T$.

9. A recommender system based on graph hypernetworks and heterogeneous federated learning, characterized in that it includes: a memory, used to store a computer program; and a processor, configured to implement the steps of the recommendation method based on graph hypernetworks and heterogeneous federated learning as described in any one of claims 1 to 8 when executing the computer program.

10. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the steps of the recommendation method based on graph hypernetworks and heterogeneous federated learning as described in any one of claims 1 to 8.