Model parameter adjustment method and device, equipment and readable storage medium

By dynamically integrating model parameters based on similarity relationships in multi-center collaborative learning, the problem of low accuracy caused by model parameter differences is solved, and the accuracy and generalization performance of the target model are improved.

CN114298330B · Active · Publication Date: 2026-06-26 · TENCENT TECHNOLOGY (SHENZHEN) CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
TENCENT TECHNOLOGY (SHENZHEN) CO LTD
Filing Date
2021-08-06
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

In multi-center collaborative learning, the differences in model parameters among training nodes are ignored, resulting in low accuracy of the generated model parameters and affecting the data processing effect.

Method used

By acquiring the model parameters of multiple model training nodes, weight factors are determined based on similarity relationships for weighted integration, and the results are fed back to the nodes for iterative training until the target model is obtained.

Benefits of technology

It improves the accuracy and generalization performance of the target model, solves the problem of low accuracy caused by differences in model parameters, and enhances the effect of multi-node data integration.

✦ Generated by Eureka AI based on patent content.

Figure CN114298330B_ABST

Abstract

The application discloses a model parameter adjustment method and device, equipment and a storage medium, and relates to the field of artificial intelligence. The method comprises the following steps: obtaining n model parameters from n model training nodes; determining weight factors corresponding to the n model parameters based on the similarity relationship between the n model parameters; performing weighted integration on the n model parameters based on the weight factors to obtain updated model parameters; and feeding back the updated model parameters to the n model training nodes for cyclic iterative training until a target model is obtained, the target model being a model used for data prediction. While data privacy remains protected, the model parameters are adjusted based on dynamic weights, the differences between the model parameters trained by the model training nodes are integrated in a targeted manner, and the generalization performance of the target model is improved.

Description

TECHNICAL FIELD

[0001] Embodiments of the present application relate to the field of artificial intelligence, and in particular to a model parameter adjustment method and device, equipment and a readable storage medium.

BACKGROUND

[0002] Multi-center collaborative learning refers to a learning form in which multiple centers jointly conduct research programs to obtain satisfactory research results for studying a major issue. The significance of multi-center collaborative learning lies in overcoming the limitations of research results and avoiding the occurrence of biased results.

[0003] In related technologies, to avoid the problem of data privacy leakage caused by data transmission, a method of establishing a local model at each center, obtaining local model weights, and transmitting the average weight result obtained by averaging each model weight to the server end to update the global model is adopted to realize multi-center collaborative learning.

[0004] However, averaging the model weights in this manner ignores the differences between them, so the accuracy of the finally generated model parameters is low, which degrades the effect of data processing.

SUMMARY

[0005] Embodiments of the present application provide a model parameter adjustment method, device, equipment and readable storage medium, which can improve the generalization performance of the model. The technical solution is as follows:

[0006] In one aspect, a model parameter adjustment method is provided, which comprises:

[0007] Obtaining n model parameters from n model training nodes, wherein the i-th model parameter is a model parameter obtained by the i-th model training node through internal sample data training, the internal sample data of at least two model training nodes is different, n is a positive integer, and 0 < i ≤ n;

[0008] Determining weight factors corresponding to the n model parameters based on similarity relationships between the n model parameters, wherein the i-th similarity relationship and the weight factor corresponding to the i-th model parameter are in a negative correlation relationship, and the i-th similarity relationship is a similarity relationship between the i-th model parameter and other model parameters.

[0009] Performing weighted integration of the n model parameters based on the weight factors to obtain updated model parameters;

[0010] Feeding back the updated model parameters to the n model training nodes for cyclic iterative training until a target model is obtained, wherein the target model is a model used for data prediction.

[0011] In another aspect, a model parameter adjustment device is provided, the device comprising:

[0012] The acquisition module is used to acquire n model parameters from n model training nodes. The i-th model parameter is the model parameter obtained by the i-th model training node through internal sample data. There are at least two model training nodes with different internal sample data. n is a positive integer, 0 < i ≤ n.

[0013] The determination module is used to determine the weight factors corresponding to the n model parameters based on the similarity relationships between the n model parameters, wherein the i-th similarity relationship is negatively correlated with the weight factor corresponding to the i-th model parameter, and the i-th similarity relationship is the similarity relationship between the i-th model parameter and other model parameters;

[0014] The weighting module is used to weight and integrate the n model parameters based on the weighting factors to obtain updated model parameters;

[0015] The feedback module is used to feed back the updated model parameters to the n model training nodes for iterative training until the target model is obtained, which is a model used for data prediction.

[0016] In another aspect, a computer device is provided, the computer device including a processor and a memory, the memory storing at least one instruction, at least one program, code set or instruction set, the at least one instruction, the at least one program, the code set or instruction set being loaded and executed by the processor to implement the model parameter adjustment method as described in any of the embodiments of this application above.

[0017] In another aspect, a computer-readable storage medium is provided, wherein at least one instruction, at least one program, code set, or instruction set is stored therein, wherein the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by a processor to implement the model parameter adjustment method as described in any of the embodiments of this application above.

[0018] In another aspect, a computer program product or computer program is provided, which includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions, causing the computer device to perform the model parameter adjustment method described in any of the above embodiments.

[0019] The beneficial effects of the technical solutions provided in this application include at least the following:

[0020] When data from multiple nodes cannot be shared, dynamic weights are used to integrate the model parameters trained on each node. During integration, the weights are determined from the similarity between the model parameters, which improves the accuracy of the target model's parameters. The differences between the model parameters trained on the model training nodes are integrated in a targeted manner, which enhances the generalization performance of the target model.

BRIEF DESCRIPTION OF THE DRAWINGS

[0021] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0022] Figure 1 is a schematic diagram of an implementation environment provided by an exemplary embodiment of this application;

[0023] Figure 2 is a schematic diagram illustrating the adjustment process of model parameters provided in an exemplary embodiment of this application;

[0024] Figure 3 is a flowchart of a method for adjusting model parameters provided in an exemplary embodiment of this application;

[0025] Figure 4 is a flowchart of a method for adjusting model parameters provided in another exemplary embodiment of this application;

[0026] Figure 5 is a schematic diagram of a model parameter adjustment scheme provided in an exemplary embodiment of this application;

[0027] Figure 6 is a structural block diagram of a model parameter adjustment device provided in an exemplary embodiment of this application;

[0028] Figure 7 is a structural block diagram of a model parameter adjustment device provided in another exemplary embodiment of this application;

[0029] Figure 8 is a structural block diagram of a model parameter adjustment device provided in another exemplary embodiment of this application;

[0030] Figure 9 is a structural block diagram of a server provided in an exemplary embodiment of this application.

DETAILED DESCRIPTION

[0031] To make the objectives, technical solutions, and advantages of this application clearer, the embodiments of this application will be described in further detail below with reference to the accompanying drawings.

[0032] First, a brief introduction to the terms used in the embodiments of this application will be given.

[0033] Artificial Intelligence (AI) is the theory, methods, technology, and application systems that use digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to achieve optimal results. In other words, AI is a comprehensive technology within computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a way similar to human intelligence. AI studies the design principles and implementation methods of various intelligent machines, enabling them to possess the functions of perception, reasoning, and decision-making.

[0034] Artificial intelligence (AI) is a comprehensive discipline encompassing a wide range of fields, including both hardware and software technologies. Fundamental AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operating / interactive systems, and mechatronics. AI software technologies primarily include computer vision, speech processing, natural language processing, and machine learning / deep learning.

[0035] Convolutional Neural Network (CNN): A type of feedforward neural network, it is one of the most representative network architectures in deep learning technology and a current research hotspot in the fields of speech analysis and image recognition. A CNN consists of convolutional layers, activation layers, and pooling layers, and its output is a specific feature space for each image's domain. The most important function of a CNN is to iteratively adjust the network weights through training data, which can significantly improve the classification accuracy of the data.

[0036] Multi-institutional collaboration refers to the collaboration of staff from multiple institutions or professions within a single organization or leadership to jointly execute a research plan and task towards a common goal, aiming to achieve satisfactory research results within a relatively short period. Multi-institutional collaboration can effectively avoid individual bias and cognitive problems, thus mitigating the limitations of research findings.

[0037] Centered Kernel Alignment (CKA) is a similarity metric function used to measure the similarity between representations learned by neural network layers. The algorithm takes two layers of representations as input and outputs a similarity score between 0 (completely dissimilar) and 1 (identical representations).

[0038] Hilbert-Schmidt Independence Criterion (HSIC): This is a statistical measure that is calculated by defining a cross-covariance operator on the reproducing kernel Hilbert space and deriving a suitable statistic from the operator to determine the magnitude of independence. It is used to measure the independence of two variables.

[0039] Due to the need to protect patient privacy within hospitals, hospitals cannot exchange case-related data. However, the high-performance deep learning networks in the field of automated assisted diagnosis currently require a large amount of training data. Therefore, patient privacy protection policies greatly limit the application of deep learning on multi-center training data. In related technologies, to achieve multi-center collaborative learning, the weights of the models trained at each center are averaged, and the average weights are transmitted instead of traditional data transmission. This allows for interaction between deep learning models while protecting patient privacy. However, because the model parameters at each training node are different, the method of averaging model weights often ignores model parameters with low similarity to others at each training node, limiting the generalization ability of deep learning models.

[0040] First, the implementation environment involved in the embodiments of this application is described for illustrative purposes. Referring to Figure 1, the implementation environment includes n model training nodes 110 and a server 120, and the model training nodes 110 and the server 120 are connected through a communication network 130.

[0041] In this embodiment, n model training nodes 110 are deployed independently, and each of the n model training nodes 110 includes internal sample data. The internal sample data of each model training node 110 is different or partially different; alternatively, at least two model training nodes 110 have different internal sample data. Optionally, the sample data in each model training node 110 is used to train its respective internal data processing model to obtain model parameters. Since the internal sample data of each model training node 110 has its own characteristics, the generalization ability of the trained model parameters is relatively weak. Therefore, in this embodiment, the model parameters trained by each model training node 110 are integrated while the internal sample data of each model training node 110 is not interconnected.

[0042] In some embodiments, the model training nodes 110 send their trained model parameters to the server 120 via the communication network 130. The server 120 integrates the received model parameters.

[0043] Illustratively, as shown in Figure 2, the n model training nodes 110 each train on their own internal sample data to obtain corresponding model parameters 210, and then send the model parameters 210 to the server 120. The server 120 performs similarity analysis on the obtained model parameters 210 to obtain the weight factor 220 corresponding to each model parameter 210. After the model parameters 210 are weighted and integrated according to the weight factors 220, the updated model parameters 230 are obtained as the integrated model parameters.

[0044] In some embodiments, the analysis of the updated model parameters 230 is performed iteratively. That is, after the server 120 obtains the updated model parameters 230 after the kth iteration, it feeds the updated model parameters 230 back to each model training node 110. The model training node 110 continues to iteratively train the updated model parameters 230 after the kth iteration using internal sample data to obtain the model parameters in the (k+1)th iteration.
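The feedback loop between the server and the nodes can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of this application: the names `run_rounds`, `make_trainer` and `mean_aggregate`, the flat-list representation of model parameters, the fixed number of rounds, and the toy node behavior are all hypothetical. The integration rule is deliberately left as a plain average placeholder so that the sketch only shows the control flow of the k-th and (k+1)-th iterations.

```python
from typing import Callable, List

Params = List[float]


def run_rounds(
    initial: Params,
    node_trainers: List[Callable[[Params], Params]],
    aggregate: Callable[[List[Params]], Params],
    num_rounds: int,
) -> Params:
    """Alternate local training and server-side integration for num_rounds."""
    global_params = list(initial)
    for _ in range(num_rounds):
        # Each node continues training from the parameters fed back by the server.
        local_params = [train(list(global_params)) for train in node_trainers]
        # The server integrates the n parameter sets into updated parameters.
        global_params = aggregate(local_params)
    return global_params


def make_trainer(target: float, lr: float = 0.5) -> Callable[[Params], Params]:
    """Hypothetical node: one step pulling every value toward its own target."""
    def train(params: Params) -> Params:
        return [p - lr * (p - target) for p in params]
    return train


def mean_aggregate(param_sets: List[Params]) -> Params:
    """Placeholder integration rule (plain average), used only in this sketch."""
    n = len(param_sets)
    return [sum(column) / n for column in zip(*param_sets)]


trainers = [make_trainer(1.0), make_trainer(3.0)]
final = run_rounds([0.0, 0.0], trainers, mean_aggregate, num_rounds=20)
```

In practice the loop would stop on a convergence or quality criterion rather than a fixed round count; the fixed count is used here only to keep the sketch short.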

[0045] In some embodiments, the target model described above is a model for data processing, such as a data analysis model, a data classification model, a data recognition model, etc., and this application embodiment does not limit it.

[0046] To illustrate, taking a medical image recognition model as an example, the above model training nodes are implemented as device nodes corresponding to various medical institutions. Since medical image data between different medical institutions cannot be shared, each device node corresponding to a medical institution uses its own internal medical image data as sample data for training the image recognition model and obtains model parameters. The model parameters are then sent to the server for parameter integration.

[0047] The aforementioned model training nodes can be implemented as various types of devices such as mobile phones, tablets, desktop computers, laptops, and node servers, and this application embodiment does not limit them.

[0048] It is worth noting that the aforementioned servers can be independent physical servers, server clusters or distributed systems composed of multiple physical servers, or cloud servers that provide basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms.

[0049] Cloud technology refers to a hosting technology that unifies hardware, software, and network resources within a wide area network (WAN) or local area network (LAN) to achieve data computation, storage, processing, and sharing. Based on the cloud computing business model, cloud technology encompasses network technology, information technology, integration technology, management platform technology, and application technology. It can form resource pools that are used on demand, flexibly and conveniently. Cloud computing technology will become an important support: the backend services of technical network systems, such as video websites, image websites, and many portal websites, require substantial computing and storage resources. With the rapid development and application of the internet industry, every item may have its own identification mark in the future, which must be transmitted to backend systems for logical processing. Data at different levels will be processed separately, and all kinds of industry data will require robust system support, which can only be achieved through cloud computing.

[0050] In some embodiments, the server described above can also be implemented as a node in a blockchain system. Blockchain is a novel application model of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and cryptographic algorithms. Essentially, a blockchain is a decentralized database, a chain of data blocks linked together using cryptographic methods. Each data block contains information about a batch of network transactions, used to verify the validity of the information (anti-counterfeiting) and generate the next block. A blockchain can include a blockchain underlying platform, a platform product service layer, and an application service layer.

[0051] The application scenarios provided in the embodiments of this application are introduced. Illustratively, the model parameter adjustment method provided in the embodiments of this application can be applied to at least one of the following scenarios:

[0052] First, in medical image recognition scenarios, lesion classification models are used to classify and identify medical images. The training process for these lesion classification models involves independent training by multiple medical institutions and integrated training by a server. Specifically, each medical institution trains and adjusts the model parameters of the lesion classification model using internal sample medical images. After obtaining their respective model parameters, they send these parameters to the server. The server determines the weight factor for each model parameter based on the similarity relationship between the parameters, thereby updating the model parameters. These updated parameters are then fed back to each medical institution for further iterative training, ultimately resulting in a trained lesion classification model for classifying and recognizing medical images.

[0053] Second, in online shopping applications, user interest tags are predicted based on their historical shopping behavior and personal information. This is achieved through an interest tag prediction model. Since data is not shared between multiple shopping applications, each application's backend server trains its own interest tag prediction model using user data (including historical shopping data and personal information data). The model parameters are then sent to the server, which determines the weight factor for each parameter based on the similarity relationship between them. This updated model parameter is then fed back to the backend servers of each shopping application for further iterative training, ultimately resulting in a fully trained interest tag prediction model.

[0054] Third, in short video social applications, predictions are made based on the number of times users click on short videos and the types of videos they follow. Specifically, a video follower prediction model is used to predict user interest tags. Since data is not shared between multiple short video social applications, each application's backend server trains its interest tag prediction model using user data (including the number of times users click on short videos and the types of short videos they follow). The model parameters are then sent to the server, which determines the weight factor for each parameter based on the similarity relationship between them. This updated model parameter is then fed back to the backend servers of each short video application for further iterative training, ultimately resulting in a fully trained interest tag prediction model.

[0055] It is worth noting that the above application scenarios are merely illustrative examples, and the specific application scenarios of the model parameter adjustment methods in this application embodiment are not limited.

[0056] Based on the above terminology and process flow, the model parameter adjustment method provided in this application is explained below, taking execution of the method by the server shown in Figure 1 as an example. As shown in Figure 3, the method includes:

[0057] Step 301: Obtain n model parameters from n model training nodes.

[0058] Here, the i-th model parameter is the model parameter obtained by training the i-th model training node through internal sample data. There are at least two model training nodes with different internal sample data. n is a positive integer, 0 < i ≤ n.

[0059] In some embodiments, the n model parameters corresponding to the n model training nodes are model parameters obtained by training models of the same type, such as: all n model parameters are model parameters obtained by training a classification model; or, all n model parameters are model parameters obtained by training a recognition model.

[0060] In some embodiments, the n model parameters are model parameters obtained by training with the same type of sample data. That is, the internal sample data used to train the model parameters in the n model training nodes are of the same type. Taking medical imaging data as an example, the medical scan image data used to train the model parameters in the n model training nodes are medical scan image data for the same body part, such as: computed tomography (CT) images of the lungs. Illustratively, the i-th model training node trains the lung disease classification model using internally stored patient lung CT images to obtain the i-th model parameter, and then sends the i-th model parameter to the server.

[0061] Among them, the internal sample data of the n model training nodes belong to the same data type, such as medical images of diseases, disease description data, patient personal information data, etc., and this application embodiment does not limit this.

[0062] In some embodiments, the n model parameters are obtained through training with internal sample data. Illustratively, taking supervised training of a classification model using internal sample data by a model training node as an example, that is, the internal sample data of the model training node corresponds to sample labels. An initial classification model is obtained within the model training node. The feature extraction layer in the initial classification model extracts the sample features from the internal sample data, and classification prediction is performed on the sample features to obtain the prediction result. Based on the difference between the prediction result and the sample labels, the model parameters of the classification model are adjusted to obtain the adjusted model parameters. Specifically, the model parameters of the classification model are iteratively adjusted using internal sample data to obtain the adjusted model parameters.
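The local supervised training described above can be sketched with a very small model. The sketch assumes a hypothetical single-layer logistic classifier in place of the classification model (the feature extraction layer is omitted), and the names `local_train` and `node_data`, the learning rate, and the epoch count are illustrative choices rather than anything specified in this application.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def local_train(
    weights: List[float],
    data: List[Tuple[List[float], int]],
    lr: float = 0.1,
    epochs: int = 200,
) -> List[float]:
    """Adjust the parameters from the gap between prediction and sample label."""
    w = list(weights)
    for _ in range(epochs):
        for features, label in data:
            prediction = sigmoid(sum(wi * xi for wi, xi in zip(w, features)))
            error = prediction - label  # difference between prediction and label
            w = [wi - lr * error * xi for wi, xi in zip(w, features)]
    return w


# Hypothetical internal sample data: (features with a leading bias term, label).
node_data = [([1.0, 0.0], 0), ([1.0, 1.0], 0), ([1.0, 3.0], 1), ([1.0, 4.0], 1)]
trained = local_train([0.0, 0.0], node_data)
predictions = [
    int(sigmoid(sum(w * x for w, x in zip(trained, features))) >= 0.5)
    for features, _ in node_data
]
```

Only the list `trained` would leave the node; the sample data itself stays local.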

[0063] In some embodiments, the model parameters are adjusted in batches using internal sample data. That is, the internal sample data is randomly divided into m batches, and the model parameters are adjusted using one batch of internal sample data each time, where m is a positive integer.
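A random division into m batches can be sketched as below. The function name `split_into_batches`, the fixed seed, and the near-equal batch sizes are assumptions made for illustration; the text only requires that the data be divided randomly into m batches.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_batches(samples: Sequence[T], m: int, seed: int = 0) -> List[List[T]]:
    """Randomly partition samples into m batches of near-equal size."""
    if m <= 0:
        raise ValueError("m must be a positive integer")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    # Batch j takes every m-th element of the shuffled list, starting at offset j.
    return [shuffled[j::m] for j in range(m)]


batches = split_into_batches(range(10), m=3)
```

Each batch would then drive one parameter adjustment at the node.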

[0064] Since the internal sample data in the n model training nodes are different, that is, the model parameters obtained by the n model training nodes are independent and have weak generalization ability, it is necessary to integrate the model parameters obtained by the n model training nodes.

[0065] Step 302: Based on the similarity relationship between the n model parameters, determine the weight factors corresponding to the n model parameters respectively.

[0066] Here, the i-th similarity relationship is negatively correlated with the weight factor corresponding to the i-th model parameter. The i-th similarity relationship is the similarity relationship between the i-th model parameter and other model parameters. Other model parameters refer to model parameters other than the i-th model parameter among the n model parameters.

[0067] In some embodiments, determining the similarity relationship among n model parameters includes at least one of the following methods:

[0068] 1. Select the i-th model parameter from n model parameters, compare the similarity between the other model parameters and the i-th model parameter, and take the similarity comparison result as the similarity value of the i-th model parameter;

[0069] 2. Set different similarity levels and set a similarity threshold range for each similarity level. For the i-th model parameter, calculate the similarity between the other model parameters and the i-th model parameter. Based on the similarity between the other model parameters and the i-th model parameter, determine the similarity level corresponding to the i-th model parameter, which is used as the similarity relationship between the i-th model parameter and other model parameters.

[0070] It is worth noting that the above similarity relationship analysis method is only an illustrative example, and the specific method for analyzing the similarity relationship between n model parameters is not limited in the embodiments of this application.
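The second method, which maps similarities to discrete levels, might look like the following sketch. The level boundaries in `LEVEL_BOUNDS` are hypothetical, and reducing the pairwise similarities to their mean before assigning a level is an assumption; the text does not say how the individual similarities are combined.

```python
from bisect import bisect_right
from typing import Sequence

# Hypothetical threshold ranges:
# [0, 0.3) -> level 0, [0.3, 0.7) -> level 1, [0.7, 1] -> level 2.
LEVEL_BOUNDS = [0.3, 0.7]


def similarity_level(
    similarities_to_others: Sequence[float],
    bounds: Sequence[float] = LEVEL_BOUNDS,
) -> int:
    """Return the similarity level of one model parameter."""
    mean_similarity = sum(similarities_to_others) / len(similarities_to_others)
    # bisect_right counts how many boundaries lie at or below the mean.
    return bisect_right(bounds, mean_similarity)


levels = [similarity_level(s) for s in ([0.9, 0.8], [0.5, 0.4], [0.1, 0.2])]
```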

[0071] Illustratively, after the n model parameters are obtained, the similarity of each model parameter relative to the other model parameters is calculated. For example, the centered kernel alignment (CKA) algorithm is used to calculate the similarity between any two of the n model parameters, as given by Formula 1:

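[0072] Formula 1: $$\mathrm{CKA}(K, L) = \frac{\mathrm{HSIC}(K, L)}{\sqrt{\mathrm{HSIC}(K, K)\,\mathrm{HSIC}(L, L)}}$$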

[0073] Here, K and L are the two model parameters whose similarity is to be measured, each being one of the n model parameters, and HSIC is the Hilbert-Schmidt independence criterion (hereinafter referred to as the HSIC operator), which is used to measure whether the two model parameters are independent of each other.
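A small sketch of the CKA computation is given below, using the standard definition CKA(K, L) = HSIC(K, L) / sqrt(HSIC(K, K) · HSIC(L, L)) with a linear kernel. Several things here are assumptions rather than details from the text: each model parameter is treated as a matrix with the same number of rows, the kernel is linear, HSIC is estimated as tr(KHLH) / (m − 1)², and the function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def linear_gram(x: Matrix) -> Matrix:
    """Gram matrix X X^T of a row-wise matrix (linear kernel)."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in x] for r1 in x]


def center(k: Matrix) -> Matrix:
    """Double-center a square matrix: H K H with H = I - (1/m) 1 1^T."""
    m = len(k)
    row_means = [sum(row) / m for row in k]
    col_means = [sum(k[i][j] for i in range(m)) / m for j in range(m)]
    total_mean = sum(row_means) / m
    return [
        [k[i][j] - row_means[i] - col_means[j] + total_mean for j in range(m)]
        for i in range(m)
    ]


def hsic(k: Matrix, l: Matrix) -> float:
    """Empirical HSIC estimate tr(K H L H) / (m - 1)^2."""
    m = len(k)
    kc, lc = center(k), center(l)
    return sum(kc[i][j] * lc[i][j] for i in range(m) for j in range(m)) / (m - 1) ** 2


def cka(x: Matrix, y: Matrix) -> float:
    """Linear CKA between two matrices that share their number of rows."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs need the same number of rows (at least 2)")
    k, l = linear_gram(x), linear_gram(y)
    denom = math.sqrt(hsic(k, k) * hsic(l, l))
    if denom == 0.0:
        raise ValueError("CKA is undefined when an input has identical rows")
    return hsic(k, l) / denom


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 7.0]]
Y = [[2.0, 1.0], [0.0, 1.0], [1.0, 5.0]]
score = cka(X, Y)
```

With this estimator the score is unchanged when either input is multiplied by a nonzero constant, which is one reason CKA is convenient for comparing parameters trained on different data.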

[0074] In some embodiments, a similarity analysis is performed on any two model parameters out of the n model parameters to obtain a similarity matrix. The elements in the similarity matrix indicate the similarity between the model parameters corresponding to a row and a column. The average value of the elements in the i-th row of the similarity matrix is taken as the similarity value corresponding to the i-th model parameter; or, the average value of the elements in the i-th column of the similarity matrix is taken as the similarity value corresponding to the i-th model parameter. That is, the average similarity between the i-th model parameter and other model parameters is taken as the similarity value corresponding to the i-th model parameter.

[0075] Illustratively, based on the HSIC operator results, the centered kernel alignment algorithm is applied to obtain a K×K similarity matrix, where each element of the similarity matrix represents the similarity between one model parameter and another. Optionally, the elements in the i-th row of the similarity matrix are averaged to obtain the average element value of the i-th row, and this average element value is used as the similarity value of the i-th model parameter; alternatively, the elements in the i-th column of the similarity matrix are averaged to obtain the average element value of the i-th column, and this average element value is used as the similarity value of the i-th model parameter.
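Building the similarity matrix and reducing each row to one similarity value can be sketched as follows. Cosine similarity is used here purely as a stand-in pairwise measure to keep the example short (the text uses CKA), and all function names are hypothetical. The row average includes the diagonal self-similarity, following the wording that the elements of the i-th row are averaged.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Stand-in pairwise similarity (an assumption; the text uses CKA)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(
    params: Sequence[Vector],
    similarity: Callable[[Vector, Vector], float],
) -> List[List[float]]:
    """Element (i, j) is the similarity between parameter i and parameter j."""
    return [[similarity(a, b) for b in params] for a in params]


def row_mean_similarity(matrix: List[List[float]]) -> List[float]:
    """Average each row to get one similarity value per model parameter."""
    return [sum(row) / len(row) for row in matrix]


params = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
matrix = similarity_matrix(params, cosine_similarity)
values = row_mean_similarity(matrix)
```

Because the matrix is symmetric for a symmetric similarity measure, averaging columns instead of rows gives the same values.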

[0076] It is worth noting that the above-mentioned method for obtaining the similarity value corresponding to the i-th model parameter is only an illustrative example, and the specific method for obtaining the similarity value corresponding to the i-th model parameter in this application embodiment is not limited.

[0077] Secondly, after obtaining the similarity values of the model parameters, the similarity values of each model parameter are normalized to obtain the weight factor corresponding to the selected model parameter. The weight factor represents the weight ratio of the corresponding model parameter in the n model parameters. The larger the weight factor, the higher the weight ratio of the model parameter corresponding to the weight factor. For illustration, as shown in Formula 2:

[0078] Formula 2: $$a_i = 1 - \frac{c_i}{\sum_{j=1}^{n} c_j}$$

[0079] Here, $a_i$ represents the weight factor corresponding to the i-th model parameter, and $c_i$ represents the similarity value of the i-th model parameter. The similarity values of the n model parameters are summed, the ratio of the similarity value of the i-th model parameter to this sum is computed, and 1 minus this ratio is used as the weight factor for the i-th model parameter. As shown in Formula 2, the higher the similarity value of the selected model parameter, i.e., the more similar it is to the other model parameters, the smaller the resulting weight factor. Therefore, the similarity value of a model parameter and its weight factor are negatively correlated. That is, the similarity values corresponding to the n model parameters are summed to obtain the total similarity, and the weight factor corresponding to the i-th model parameter is determined based on the ratio of the similarity value of the i-th model parameter to the total similarity.
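The weight computation can be sketched as follows, reading Formula 2 as a_i = 1 − c_i / Σ_j c_j, which is an interpretation of the description above. The function name `weight_factors` is hypothetical. Under this reading the raw factors sum to n − 1 rather than 1, so the sketch also offers an optional rescaling to unit sum; that rescaling is an assumption and is not stated in the text.

```python
from typing import List, Sequence


def weight_factors(similarity_values: Sequence[float], rescale: bool = False) -> List[float]:
    """Weight factor per parameter: 1 minus its share of the total similarity."""
    if len(similarity_values) < 2:
        raise ValueError("at least two model parameters are needed")
    total = sum(similarity_values)
    if total <= 0:
        raise ValueError("total similarity must be positive")
    raw = [1.0 - c / total for c in similarity_values]
    if not rescale:
        return raw
    raw_sum = sum(raw)  # equals n - 1 up to rounding
    return [a / raw_sum for a in raw]


factors = weight_factors([0.9, 0.8, 0.3])
unit_sum_factors = weight_factors([0.9, 0.8, 0.3], rescale=True)
```

The parameter with the lowest similarity value (0.3) receives the largest factor, which is the negative correlation the text describes.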

[0080] In some embodiments, in addition to the methods described above, at least one of the following methods may be used to obtain the weighting factors:

[0081] 1. Perform threshold clustering on the model parameters based on their similarity values. Group model parameters with similarity values within the same threshold range into the same cluster, and assign weight factors to the model parameters based on the similarity values corresponding to each cluster.

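[0082] 2. Sort the model parameters in descending order based on their similarity values, and assign preset weight factor values to the sorted model parameters in ascending order, so that the model parameter with the highest similarity value receives the smallest weight factor.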

[0083] It is worth noting that the above-mentioned method for determining the weighting factors is merely an illustrative example, and the specific method for determining the weighting factors is not limited in the embodiments of this application.
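As a non-limiting illustration, the two alternative methods above can be sketched as follows. The function names, the threshold bounds, and the preset weight values are hypothetical. Assigning one preset weight factor per cluster is an assumption of this sketch, since this application does not specify how the weight factor of a cluster is chosen.

```python
from bisect import bisect_right


def threshold_cluster_weights(sim_values, bounds, cluster_weights):
    """Method 1: group similarity values by threshold range.

    `bounds` are ascending thresholds that split the similarity axis into
    len(bounds) + 1 ranges; `cluster_weights` holds one weight per range.
    """
    if len(cluster_weights) != len(bounds) + 1:
        raise ValueError("need one weight per threshold range")
    return [cluster_weights[bisect_right(bounds, c)] for c in sim_values]


def rank_based_weights(sim_values, preset_weights):
    """Method 2: sort by similarity (descending), assign presets (ascending)."""
    if len(sim_values) != len(preset_weights):
        raise ValueError("need one preset weight per model parameter")
    order = sorted(range(len(sim_values)),
                   key=lambda i: sim_values[i], reverse=True)
    ascending = sorted(preset_weights)
    weights = [0.0] * len(sim_values)
    for rank, index in enumerate(order):
        weights[index] = ascending[rank]
    return weights


# Hypothetical similarity values for three model parameters.
sims = [0.6, 0.7, 0.5]
clustered = threshold_cluster_weights(sims, [0.55, 0.65], [0.5, 0.3, 0.2])
ranked = rank_based_weights(sims, [0.2, 0.3, 0.5])
```

In both variants of this sketch, lower similarity leads to a larger weight factor, consistent with the negative correlation between similarity value and weight factor.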

[0084] Step 303: Based on the weighting factors, the n model parameters are weighted and integrated to obtain the updated model parameters.

[0085] In some embodiments, the weighted sum of n model parameters is used as the updated model parameter; or, the weighted product of n model parameters is used as the updated model parameter; or, the weighted average of n model parameters is used as the updated model parameter. In this embodiment, the weighted sum of n model parameters is used as the updated model parameter for illustration.

[0086] As an illustration, Formula 3 is used to obtain the updated model parameters. Formula 3 is shown below:

[0087]

[0088] In Formula 3, the left-hand side represents the updated model parameters obtained from training with the internal sample data of the q-th batch, and the summed term contains the i-th model parameter obtained from training on the internal sample data of the q-th batch. The product of the i-th model parameter and its corresponding weight factor a_i is used as the i-th update parameter, and the n update parameters are summed to obtain the updated model parameters. That is, the product of the weight factor corresponding to the i-th model parameter and the i-th model parameter is determined as the i-th update parameter; the sum of the n update parameters is determined as the updated model parameters.
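As a non-limiting illustration of the weighted sum described above, the following sketch assumes that each model parameter is represented as a flat list of floating-point values of equal length. The function name `integrate` and the example values are hypothetical.

```python
def integrate(params, weights):
    """Weighted sum of n parameter vectors.

    The i-th update parameter is weights[i] * params[i]; the updated model
    parameters are the element-wise sum of the n update parameters.
    """
    if not params or len(params) != len(weights):
        raise ValueError("need one weight factor per model parameter")
    dim = len(params[0])
    if any(len(p) != dim for p in params):
        raise ValueError("all model parameters must have the same length")
    updated = [0.0] * dim
    for a_i, theta_i in zip(weights, params):
        for j, value in enumerate(theta_i):
            updated[j] += a_i * value
    return updated


# Two hypothetical model parameters with weight factors 0.25 and 0.75.
updated_example = integrate([[1.0, 2.0], [3.0, 4.0]], [0.25, 0.75])
# updated_example is [2.5, 3.5]
```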

[0089] The i-th update parameter represents the share of the i-th model parameter in the updated model parameters. This share depends on the weight factor: a larger weight factor gives the model parameter a larger share in the updated model parameters, and a smaller weight factor gives it a smaller share. When the weight factor is larger, that is, when the corresponding model parameter is less similar to the others, the share of this model parameter in the updated model parameters is higher, meaning that it has a greater influence on the updated model parameters, and the server pays more attention to model parameters with higher shares.

[0090] Step 304: Feed back the updated model parameters to the n model training nodes for iterative training until the target model is obtained.

[0091] The target model is a model used for data prediction.

[0092] In some embodiments, iterative training of updated model parameters refers to the model training node using updated model parameters to train the model in the next round, such as continuing to adjust the updated model parameters using internal sample data.

[0093] In summary, when data cannot be shared among multiple nodes, the method provided in this embodiment integrates the model parameters trained by each node using dynamic weights. During integration, the weights are determined based on the similarity relationship between the model parameters, which improves the accuracy of the target model's parameters. The differences between the model parameters trained by the model training nodes are integrated in a targeted manner, which improves the generalization performance of the target model.

[0094] In an optional embodiment, the updated model parameters are the model parameters obtained in the k-th iteration. Figure 4 is a flowchart of a method for adjusting model parameters provided in an exemplary embodiment of this application. In this embodiment, the method is described using an example in which it is executed by a server. As shown in Figure 4, the method includes:

[0095] Step 401: Obtain n model parameters from n model training nodes.

[0096] Here, the i-th model parameter is the model parameter obtained by the i-th model training node through training on its internal sample data. At least two model training nodes have different internal sample data. n is a positive integer, 0 < i ≤ n.

[0097] It is worth noting that the process of obtaining n model parameters has been explained in step 301 above, and will not be repeated here.

[0098] Step 402: Based on the similarity relationship between the n model parameters, determine the weight factors corresponding to the n model parameters respectively.

[0099] Here, the i-th similarity relationship is negatively correlated with the weight factor corresponding to the i-th model parameter. The i-th similarity relationship is the similarity relationship between the i-th model parameter and other model parameters. Other model parameters refer to model parameters other than the i-th model parameter among the n model parameters.

[0100] Step 403: Based on the weighting factors, the n model parameters are weighted and integrated to obtain the updated model parameters.

[0101] In some embodiments, the weighted sum of n model parameters is used as the updated model parameter; or, the weighted product of n model parameters is used as the updated model parameter; or, the weighted average of n model parameters is used as the updated model parameter. In this embodiment, the weighted sum of n model parameters is used as the updated model parameter for illustration.

[0102] Step 404: After feeding back the updated model parameters to the n model training nodes, obtain n iterative model parameters from the n model training nodes.

[0103] Here, the n iterative model parameters are the model parameters obtained by the n model training nodes through training on internal sample data in the (k+1)-th iteration.

[0104] In some embodiments, the feedback of updated model parameters includes single-line broadcast and multi-broadcast, and the updated model parameters are synchronized to n model training nodes in real time via wireless communication.

[0105] In some embodiments, after the updated model parameters are fed back to the n model training nodes, the n model training nodes perform a new round of iterative training based on the updated model parameters. The training methods include at least one of the following:

[0106] 1. The model parameters obtained in the k-th iteration of the model training node are replaced with the updated model parameters fed back by the server, and the updated model parameters are trained using internal sample data to obtain the iterative model parameters in the (k+1)-th iteration. The iterative model parameters from the (k+1)-th iteration are fed back to the server for further weighted integration.

[0107] 2. The model parameters obtained in the k-th iteration of the model training node are combined with the updated model parameters fed back from the server to obtain the model parameters to be adjusted in the (k+1)-th iteration. For example, the average of the model parameters obtained in the k-th iteration and the updated model parameters fed back from the server is calculated as the model parameters to be adjusted in the (k+1)-th iteration. Internal sample data is used to train these adjusted model parameters to obtain the iterative model parameters in the (k+1)-th iteration. The iterative model parameters from the (k+1)-th iteration are then fed back to the server for further weighted integration. In the (k+1)-th iteration, the accuracy of the model parameters obtained in the k-th iteration is preserved, while the generalization ability of the updated model parameters fed back from the server is integrated, thus improving the model training efficiency.

[0108] It is worth noting that the above training methods are merely illustrative examples, and the embodiments of this application do not limit the cyclic training method of the model training nodes.
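As a non-limiting illustration of the two training methods above, the following sketch shows how a model training node could choose its starting parameters for the (k+1)-th iteration. The function name `starting_parameters` and the `mode` argument are hypothetical, and model parameters are assumed to be flat lists of floating-point values.

```python
def starting_parameters(local_params, server_params, mode="replace"):
    """Choose the parameters a node starts from in the (k+1)-th iteration.

    mode="replace": discard the local k-th iteration parameters and use
                    the updated model parameters fed back by the server.
    mode="average": use the element-wise average of the local k-th
                    iteration parameters and the updated model parameters.
    """
    if len(local_params) != len(server_params):
        raise ValueError("parameter vectors must have the same length")
    if mode == "replace":
        return list(server_params)
    if mode == "average":
        return [(local + server) / 2.0
                for local, server in zip(local_params, server_params)]
    raise ValueError("mode must be 'replace' or 'average'")


# Hypothetical local and server-side parameters.
replaced = starting_parameters([1.0, 3.0], [2.0, 5.0], mode="replace")
averaged = starting_parameters([1.0, 3.0], [2.0, 5.0], mode="average")
# replaced is [2.0, 5.0]; averaged is [1.5, 4.0]
```

The returned parameters are then trained on the internal sample data of the node, and the resulting iterative model parameters are fed back to the server.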

[0109] Step 405: Based on the weight factors corresponding to the n iterative model parameters, determine the updated model parameters obtained in the (k+1)th iteration, until the model parameters of the n model training nodes converge.

[0110] When the n iterative model parameters meet the cyclic iteration conditions, the updated model parameters obtained in the (k+1)th cyclic iteration are determined based on the weight factors corresponding to the n iterative model parameters.

[0111] In some embodiments, after the n model training nodes obtain n iterative model parameters through k iterations, similarity analysis is performed on any two iterative model parameters to obtain a similarity matrix of the iterative model parameters. The average similarity of each iterative model parameter is obtained by averaging the corresponding elements of the similarity matrix, and this average similarity is taken as its similarity value. The similarity values corresponding to the n iterative model parameters are determined and summed to obtain the total similarity of the n iterative model parameters. The weight factor corresponding to each iterative model parameter is then determined according to Formula 2 above. Based on these weight factors, the iterative model parameters are weighted and integrated to obtain the corresponding updated model parameters; that is, the updated model parameters obtained in the (k+1)-th iteration are determined.

[0112] The iterative process stops after the model parameters of the n model training nodes converge. Whether the model parameters of each model training node have converged can be determined by at least one of the following methods:

[0113] 1. Set a convergence threshold. If a model training node exceeds this threshold, the model training node is considered to have converged.

[0114] 2. Plot the model parameters as a curve on coordinate axes. If the curve gradually flattens out and no longer rises or falls, the model parameters are considered to have converged.

[0115] It is worth noting that the above convergence methods are merely illustrative examples, and the specific methods for model training node convergence in the embodiments of this application are not limited.

[0116] Optionally, the model parameters of the n model training nodes may converge completely or not, depending on the actual situation, and no limitation is made here.
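As a non-limiting illustration of the second convergence method above, the following sketch treats a curve as flattened when several consecutive changes of a monitored value stay within a tolerance. The monitored quantity, the function name `has_flattened`, and the default `tolerance` and `window` values are assumptions of this sketch and are not specified in this application.

```python
def has_flattened(history, tolerance=1e-3, window=3):
    """Return True when the last `window` changes are all within tolerance.

    `history` is a list of scalar values recorded once per iteration,
    for example a norm of the model parameters (hypothetical choice).
    """
    if len(history) < window + 1:
        return False  # not enough iterations recorded yet
    recent = history[-(window + 1):]
    return all(abs(later - earlier) <= tolerance
               for earlier, later in zip(recent, recent[1:]))


# Hypothetical monitored values over six iterations.
flat_curve = has_flattened([1.0, 0.5, 0.30, 0.2995, 0.2993, 0.2992])
steep_curve = has_flattened([1.0, 0.9, 0.8, 0.7, 0.6])
```

In this sketch the server would stop the iterative process once the check returns True for the monitored value of every model training node.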

[0117] In summary, when data cannot be shared among multiple nodes, the method provided in this embodiment integrates the model parameters trained by each node using dynamic weights. During integration, the weights are determined based on the similarity relationship between the model parameters, which improves the accuracy of the target model's parameters. The differences between the model parameters trained by the model training nodes are integrated in a targeted manner, which improves the generalization performance of the target model.

[0118] In some embodiments, after the updated model parameters are fed back to the n model training nodes for iterative training until the target model is obtained, the target model is applied to target data. The target model is used to classify the target data, in which case the target model is a data classification model; or, the target model is used to perform content recognition on the target data, in which case the target model is a data recognition model. For example, the target model may be used for lesion classification or lesion detection tasks, but is not limited to these tasks.

[0119] In some embodiments, please refer to Figure 5, which is a schematic diagram of a model parameter adjustment scheme provided in an exemplary embodiment of this application. As shown in Figure 5, taking the model training nodes of medical institutions as an example, the model training nodes 510 of n medical institutions have n model parameters. Taking model parameters for lung CT images as an example, each medical institution obtains its own model parameters for lung CT images by training on the lung CT images within its own institution, including feature extraction. The n model parameters for lung CT images are sent to the server 520. On the server 520, a similarity calculation 530 is performed on the n model parameters to obtain the similarity results. Based on the similarity results, the weight factors 540 corresponding to the model parameters are determined. The model parameters are weighted and integrated using their corresponding weight factors 540 to obtain the updated model parameters 550 for lung CT images. The updated model parameters 550 are sent to the model training nodes 510 for iterative training, and the model parameters of the medical institutions are updated continuously until the model parameters of each medical institution converge.
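As a non-limiting illustration of the overall flow between the model training nodes and the server, the following toy sketch runs several rounds of local training and server-side integration. It makes several assumptions that are not taken from this application: cosine similarity is used in place of the HSIC and centered kernel alignment computation, local training is replaced by a step toward a fixed local optimum, the weight factors follow one possible reading of Formula 2 and are rescaled to sum to 1, and all names and numeric values are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity, used here as a simple stand-in for CKA."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def server_round(node_params):
    """Similarity calculation, weight factors, and weighted integration."""
    n = len(node_params)
    sim = [[cosine(p, q) for q in node_params] for p in node_params]
    values = [sum(row) / n for row in sim]      # row averages
    total = sum(values)
    raw = [1.0 - c / total for c in values]     # assumed reading of Formula 2
    raw_total = sum(raw)
    weights = [a / raw_total for a in raw]      # rescaling is an assumption
    dim = len(node_params[0])
    updated = [sum(w * p[j] for w, p in zip(weights, node_params))
               for j in range(dim)]
    return updated, weights


def local_training(start, local_optimum, step=0.5):
    """Toy stand-in for training on a node's internal sample data."""
    return [s + step * (t - s) for s, t in zip(start, local_optimum)]


# Three hypothetical nodes; the third has clearly different local data.
local_optima = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
node_params = [local_training([0.5, 0.5], opt) for opt in local_optima]
for _ in range(5):
    updated, weights = server_round(node_params)
    # Feed back: each node restarts from the updated model parameters.
    node_params = [local_training(updated, opt) for opt in local_optima]
```

In this toy setting the third node, whose parameters are least similar to the others, receives the largest weight factor in every round, which reflects the negative correlation between similarity and weight factor described in the embodiments.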

[0120] In summary, the method provided in this embodiment integrates the model parameters trained on each node using dynamic weights when multi-node data is not shared. During the integration process, weights are determined based on the similarity between model parameters, ultimately improving the accuracy of the target model's parameters. The target model's parameters are integrated in a targeted manner to address the differences between model parameters trained on different nodes, thus enhancing the generalization performance of the target model. Hospitals only need to upload their local model parameters and receive global model parameters to update their local models, resolving sensitive issues such as uploading training data. This allows for the training of deep learning models with good generalization performance on multi-center data without violating patient privacy regulations, increasing the likelihood of deep learning being applied in specific scenarios (such as healthcare).

[0121] Figure 6 is a structural block diagram of a model parameter adjustment device provided in an exemplary embodiment of this application. As shown in Figure 6, the device includes the following parts:

[0122] The acquisition module 610 is used to acquire n model parameters from n model training nodes, wherein the i-th model parameter is the model parameter obtained by the i-th model training node through internal sample data, and there are at least two model training nodes with different internal sample data, where n is a positive integer and 0 < i ≤ n;

[0123] The determining module 620 is used to determine the weight factors corresponding to the n model parameters based on the similarity relationship between the n model parameters, wherein the i-th similarity relationship is negatively correlated with the weight factor corresponding to the i-th model parameter, and the i-th similarity relationship is the similarity relationship between the i-th model parameter and other model parameters;

[0124] The weighting module 630 is used to perform weighted integration on the n model parameters based on the weighting factors to obtain updated model parameters;

[0125] The feedback module 640 is used to feed back the updated model parameters to the n model training nodes for iterative training until the target model is obtained, which is a model used for data prediction.

[0126] In an optional embodiment, the weighting module 630 is further configured to determine the product of the weight factor corresponding to the i-th model parameter and the i-th model parameter as the i-th update parameter; and to determine the sum of the n update parameters as the updated model parameters.

[0127] In an optional embodiment, as shown in Figure 7, the determining module 620 further includes:

[0128] The value taking unit 621 is used to take the average similarity between the i-th model parameter and other model parameters as the similarity value corresponding to the i-th model parameter;

[0129] The first determining unit 622 is used to determine the similarity values corresponding to the n model parameters respectively, and to obtain the sum of similarities;

[0130] The second determining unit 623 is used to determine the weight factor corresponding to the i-th model parameter based on the ratio of the similarity value corresponding to the i-th model parameter to the sum of the similarities.

[0131] In an optional embodiment, the value-taking unit 621 is further configured to perform similarity analysis on any two model parameters among the n model parameters to obtain a similarity matrix, wherein the elements in the similarity matrix are used to indicate the similarity between the model parameters corresponding to the row and column; the average value of the elements in the i-th row is obtained from the similarity matrix as the similarity value corresponding to the i-th model parameter; or, the average value of the elements in the i-th column is obtained from the similarity matrix as the similarity value corresponding to the i-th model parameter.

[0132] In an optional embodiment, the updated model parameters are the model parameters obtained in the k-th iteration;

[0133] As shown in Figure 8, the feedback module 640 further includes:

[0134] The acquisition unit 641 is used to feed back the updated model parameters to the n model training nodes, and then acquire n iterative model parameters from the n model training nodes. The n iterative model parameters are the model parameters obtained by the n model training nodes through the internal sample data in the (k+1)th iteration.

[0135] Convergence unit 642 is used to determine the updated model parameters obtained in the (k+1)th iteration based on the weight factors corresponding to the n iterative model parameters, until the model parameters of the n model training nodes converge.

[0136] The convergence unit 642 is further configured to, in response to the n iterative model parameters meeting the cyclic iteration conditions, determine the updated model parameters obtained in the (k+1)th cyclic iteration based on the weight factors corresponding to the n iterative model parameters.

[0137] In an optional embodiment, the feedback module 640 is further configured to classify the target data using the target model, wherein the target model is a data classification model; or, to perform content recognition on the target data using the target model, wherein the target model is a data recognition model.

[0138] It should be noted that the model parameter adjustment device provided in the above embodiments is only an example of the division of the above functional modules. In practical applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the model parameter adjustment device and the model parameter adjustment method embodiments provided in the above embodiments belong to the same concept, and their specific implementation process can be found in the method embodiments, which will not be repeated here.

[0139] Figure 9 is a schematic structural diagram of a server provided in an exemplary embodiment of this application. The server may be the server shown in Figure 1.

[0140] Specifically, server 900 includes a central processing unit (CPU) 901, a system memory 904 including random access memory (RAM) 902 and read-only memory (ROM) 903, and a system bus 905 connecting system memory 904 and CPU 901. Server 900 also includes a mass storage device 906 for storing operating system 913, application programs 914, and other program modules 915.

[0141] Mass storage device 906 is connected to central processing unit 901 via a mass storage controller (not shown) connected to system bus 905. Mass storage device 906 and its associated computer-readable media provide non-volatile storage for server 900. That is, mass storage device 906 may include computer-readable media (not shown) such as hard disk or compact disc read-only memory (CD-ROM) drives.

[0142] Without loss of generality, computer-readable media can include computer storage media and communication media. Computer storage media includes volatile and non-volatile, removable and non-removable media implemented using any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include RAM, ROM, erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other solid-state storage technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic tape cassettes, magnetic tape, disk storage, or other magnetic storage devices. Of course, those skilled in the art will recognize that computer storage media are not limited to the above-mentioned types. The system memory 904 and mass storage device 906 described above can be collectively referred to as memory.

[0143] According to various embodiments of this application, server 900 can also be connected to a remote computer on a network, such as the Internet. That is, server 900 can be connected to network 912 via network interface unit 911 connected to system bus 905, or it can also use network interface unit 911 to connect to other types of networks or remote computer systems (not shown).

[0144] The aforementioned memory also includes one or more programs, which are stored in the memory and configured to be executed by the CPU.

[0145] Embodiments of this application also provide a computer device that can be implemented as the terminal or server shown in Figure 1. The computer device includes a processor and a memory, the memory storing at least one instruction, at least one program, code set, or instruction set, wherein the at least one instruction, at least one program, code set, or instruction set is loaded and executed by the processor to implement the model parameter adjustment method provided in the above-described method embodiments.

[0146] Embodiments of this application also provide a computer-readable storage medium storing at least one instruction, at least one program, code set, or instruction set, wherein the at least one instruction, at least one program, code set, or instruction set is loaded and executed by a processor to implement the model parameter adjustment method provided in the above-described method embodiments.

[0147] Embodiments of this application also provide a computer program product or computer program, which includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions, causing the computer device to perform the model parameter adjustment method described in any of the above embodiments.

[0148] Optionally, the computer-readable storage medium may include: read-only memory (ROM), random access memory (RAM), solid-state drives (SSDs), or optical discs, etc. The random access memory may include resistive random access memory (ReRAM) and dynamic random access memory (DRAM). The sequence numbers of the embodiments in this application are merely descriptive and do not represent the superiority or inferiority of the embodiments.

[0149] Those skilled in the art will understand that all or part of the steps of the above embodiments can be implemented by hardware or by a program instructing related hardware. The program can be stored in a computer-readable storage medium, such as a read-only memory, a disk, or an optical disk.

[0150] The above description is merely an optional embodiment of this application and is not intended to limit this application. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the protection scope of this application.

Claims

1. A method for adjusting model parameters, characterized in that the method is performed by a computer device and comprises: obtaining n model parameters from n model training nodes, wherein the i-th model parameter is the model parameter obtained by the i-th model training node through internal sample data, at least two model training nodes have different internal sample data, n is a positive integer, 0 < i ≤ n, and the internal sample data is the medical image data of the model training node; taking the average similarity between the i-th model parameter and other model parameters as the similarity value corresponding to the i-th model parameter, wherein a similarity analysis is performed on any two model parameters among the n model parameters to obtain a similarity matrix, the elements in the similarity matrix indicate the similarity between the model parameters corresponding to the row and column, and the average value of the elements in the i-th row or the i-th column of the similarity matrix is obtained as the similarity value corresponding to the i-th model parameter; performing threshold clustering on the n model parameters based on the similarity values of the model parameters, grouping model parameters with similarity values within the same threshold range into the same cluster, and assigning weight factors to the model parameters based on the similarity value corresponding to each cluster; performing weighted integration on the n model parameters based on the weight factors to obtain updated model parameters; and feeding back the updated model parameters to the n model training nodes for iterative training until a target model is obtained, the target model being a model used for data prediction; wherein the model training nodes are used to calculate the average of the model parameters obtained in the k-th iteration and the updated model parameters as the model parameters to be adjusted in the (k+1)-th iteration, and the target model includes a lesion classification model, which is used to classify and identify lesions in medical images.

2. The method according to claim 1, characterized in that the updated model parameters are the model parameters obtained in the k-th iteration, and the step of feeding back the updated model parameters to the n model training nodes for iterative training includes: after feeding back the updated model parameters to the n model training nodes, obtaining n iterative model parameters from the n model training nodes, wherein the n iterative model parameters are the model parameters obtained by the n model training nodes through the internal sample data in the (k+1)-th iteration; and, based on the weight factors corresponding to the n iterative model parameters, determining the updated model parameters obtained in the (k+1)-th iteration, until the model parameters of the n model training nodes converge.

3. The method according to claim 2, characterized in that determining the updated model parameters obtained in the (k+1)-th iteration based on the weight factors corresponding to the n iterative model parameters includes: when the n iterative model parameters meet the cyclic iteration conditions, determining the updated model parameters obtained in the (k+1)-th cyclic iteration based on the weight factors corresponding to the n iterative model parameters.

4. The method according to claim 1, characterized in that, after feeding back the updated model parameters to the n model training nodes for iterative training until the target model is obtained, the method further includes: classifying the target data using the target model, wherein the target model is a data classification model; or, performing content recognition on the target data using the target model, wherein the target model is a data recognition model.

5. A device for adjusting model parameters, characterized in that the device includes: an acquisition module, used to acquire n model parameters from n model training nodes, wherein the i-th model parameter is the model parameter obtained by the i-th model training node through internal sample data, at least two model training nodes have different internal sample data, n is a positive integer, 0 < i ≤ n, and the internal sample data is the medical image data of the model training node; a determination module, used to take the average similarity between the i-th model parameter and other model parameters as the similarity value corresponding to the i-th model parameter, wherein a similarity analysis is performed on any two model parameters among the n model parameters to obtain a similarity matrix, the elements in the similarity matrix indicate the similarity between the model parameters corresponding to the row and column, and the average value of the elements in the i-th row or the i-th column of the similarity matrix is obtained as the similarity value corresponding to the i-th model parameter; and further used to perform threshold clustering on the n model parameters based on the similarity values of the model parameters, group model parameters with similarity values within the same threshold range into the same cluster, and assign weight factors to the model parameters based on the similarity value corresponding to each cluster; a weighting module, used to perform weighted integration on the n model parameters based on the weight factors to obtain updated model parameters; and a feedback module, used to feed back the updated model parameters to the n model training nodes for iterative training until a target model is obtained, the target model being a model used for data prediction; wherein the model training nodes are used to calculate the average of the model parameters obtained in the k-th iteration and the updated model parameters as the model parameters to be adjusted in the (k+1)-th iteration, and the target model includes a lesion classification model, which is used to classify and identify lesions in medical images.

6. A computer device, characterized in that the computer device includes a processor and a memory, the memory storing at least one program, which is loaded and executed by the processor to implement the method for adjusting model parameters as described in any one of claims 1 to 4.

7. A computer-readable storage medium, characterized in that the storage medium stores at least one program segment, which is loaded and executed by a processor to implement the method for adjusting model parameters as described in any one of claims 1 to 4.