Horizontal federated learning system optimization method and device, and readable storage medium
A learning-system optimization technology applied in the field of machine learning, which can solve the problems of high computing resource consumption and high model training time cost, and achieves the effects of reducing the interaction between neuron nodes and improving generalization ability
Active Publication Date: 2020-06-12
WEBANK (CHINA)
Cites: 6 · Cited by: 1
AI-Extracted Technical Summary
Problems solved by technology
[0005] The main purpose of the present invention is to provide a horizontal federated learning system optimization method, device and readable storage medium, aiming to solve the problem that e...
Abstract
The invention discloses a horizontal federated learning system optimization method and device, and a readable storage medium. The method comprises the steps of: randomly determining a neuron on-off mode of a to-be-trained neural network model; sending the neuron on-off mode to each participating device, so that each participating device carries out on-off processing on the neurons in its local neural network model according to the neuron on-off mode and carries out local training on the processed neural network model to obtain a local model parameter update; and fusing the local model parameter updates, and sending the global model parameter update obtained by fusion to the participating devices, so that the participating devices update their local neural network models according to the global model parameter update. Compared with existing schemes for avoiding over-fitting, the strategy of randomly selecting neurons to close adopted in the method can be well combined with federated learning, and does not bring excessive extra time cost or computing resource consumption.
Application Domain
Physical realisation; Neural learning methods
Technology Topic
Machine learning; Federated learning
Examples
- Experimental program(1)
Example Embodiment
[0035] It should be understood that the specific embodiments described herein are only used to explain the present invention, but not to limit the present invention.
[0036] As shown in Figure 1, Figure 1 is a schematic diagram of the device structure of the hardware operating environment involved in the solution of the embodiment of the present invention.
[0037] It should be noted that the optimization device of the horizontal federated learning system in the embodiment of the present invention may be devices such as a smart phone, a personal computer, and a server, which is not specifically limited herein.
[0038] As shown in Figure 1, the horizontal federated learning system optimization device may include: a processor 1001, such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. Among them, the communication bus 1002 is used to implement connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and the optional user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface). The memory 1005 may be a high-speed RAM memory, or a stable memory (non-volatile memory), such as a magnetic disk memory. Optionally, the memory 1005 may also be a storage device independent of the aforementioned processor 1001.
[0039] Those skilled in the art can understand that the device structure shown in Figure 1 does not constitute a limitation on the horizontal federated learning system optimization device, which may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
[0040] As shown in Figure 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a horizontal federated learning system optimization program.
[0041] When the device shown in Figure 1 is the coordination device participating in horizontal federated learning, the user interface 1003 is mainly used for data communication with the client; the network interface 1004 is mainly used to establish a communication connection with each participating device participating in the horizontal federated learning; and the processor 1001 can be used to call the horizontal federated learning system optimization program stored in the memory 1005 and perform the following operations:
[0042] Randomly determining the neuron opening and closing mode of the neural network model to be trained, wherein in the neuron opening and closing mode, some neurons of the neural network model are in a closed state;
[0043] Sending the neuron opening and closing mode to each participating device, so that each participating device opens and closes the neurons in its local neural network model according to the neuron opening and closing mode, performs local training on the processed neural network model to obtain a local model parameter update, and returns the local model parameter update;
[0044] Fusing the local model parameter updates received from each participating device, and sending the global model parameter update obtained by the fusion to each participating device, so that each participating device updates its local neural network model according to the global model parameter update.
[0045] Further, the step of randomly determining the neuron opening and closing mode of the neural network model to be trained includes:
[0046] Randomly determining the neuron opening and closing mode used when the neural network model to be trained is trained with each small batch of training data under each traversal (period) in each global model update, wherein the local training data of each participating device is divided into multiple small batches of training data with the same number of batches, one traversal of the local training data by a participating device is one period, and the number of local training periods is the same for each participating device.
[0047] Further, the step of sending the neuron opening and closing pattern to each participating device includes:
[0048] The neuron opening and closing mode is distributed to each participating device in the form of a K*M*N dimensional matrix, where K is the number of local training periods for each participating device, and M is the number of small batches of training data for each participating device, N is the number of neurons in the neural network model, and the value of each element in the matrix is used to indicate the opening and closing state of the corresponding neuron.
[0049] Further, before the step of randomly determining the neuron opening and closing mode of the neural network model to be trained in each period of traversal in each global model update, the method further includes:
[0050] Obtain the data volume of a small batch of local training data of each participating device;
[0051] The learning rate of each participating device's local model update is set according to the data amount, so that each participating device can update the local model according to the learning rate, wherein the learning rate is proportional to the data amount.
[0052] Further, the step of fusing the local model parameter updates received from each participating device includes:
[0053] Performing a weighted average on the local model parameter updates received from each participating device to obtain the global model parameter update, wherein the weight of each participating device used in the weighted average operation is calculated based on the learning rate corresponding to each participating device.
[0054] When the device shown in Figure 1 is a participating device participating in horizontal federated learning, the user interface 1003 is mainly used for data communication with the client; the network interface 1004 is mainly used to establish a communication connection with the coordination device participating in horizontal federated learning; and the processor 1001 can be used to call the horizontal federated learning system optimization program stored in the memory 1005 and perform the following operations:
[0055] Inputting the generator parameters into a random number generator, and determining the neuron opening and closing mode of the neural network model to be trained according to the output result of the random number generator, wherein, in the neuron opening and closing mode, some neurons of the neural network model are in a closed state, and each participating device inputs the same generator parameters into the same random number generator in each local training of the neural network model;
[0056] Open and close the neurons in the local neural network model according to the neuron opening and closing mode, and perform local training on the processed neural network model to obtain local model parameter updates and send them to the coordination device;
[0057] Updating the local neural network model using the global model parameter update received from the coordination device, wherein the global model parameter update is obtained by the coordination device fusing the local model parameter updates received from each participating device.
[0058] Further, the generator parameters include the iteration index of the global model update, the period index of the local training, the batch index of the small batch of training data, and the neuron index of the neural network model, wherein the local training data of each participating device is divided into multiple small batches of training data with the same number of batches, one traversal of the local training data by a participating device is one period, and the number of local training periods is the same for each participating device.
[0059] Further, the step of opening and closing the neurons in the local neural network model according to the neuron opening and closing mode includes:
[0060] Determine the neuron to be closed in the local neural network model according to the neuron opening and closing mode;
[0061] The output of the neuron to be turned off is set to zero to turn off the neuron to be turned off.
[0062] Based on the above structure, various embodiments of the optimization method of the horizontal federated learning system are proposed.
[0063] Referring to Figure 2, Figure 2 is a schematic flowchart of the first embodiment of the horizontal federated learning system optimization method of the present invention.
[0064] The embodiment of the present invention provides an embodiment of a method for optimizing a horizontal federated learning system. It should be noted that although a logical sequence is shown in the flowchart, in some cases the steps shown or described may be executed in an order different from the one shown here.
[0065] The first embodiment of the method for optimizing the horizontal federated learning system of the present invention is applied to a coordination device participating in horizontal federated learning. The coordination device is communicatively connected with multiple participating devices participating in horizontal federated learning. The coordination device and the participating devices involved in the embodiment of the present invention may be devices such as smartphones, personal computers and servers. In this embodiment, the optimization method of the horizontal federated learning system includes:
[0066] Step S10, randomly determining the neuron opening and closing mode of the neural network model to be trained, wherein in the neuron opening and closing mode, some neurons of the neural network model are in a closed state;
[0067] In this embodiment, the coordination device and each participating device can establish a communication connection in advance through inquiry handshake authentication and identity authentication, and determine the neural network model to be trained in this federated learning. The neural network model of the same or similar structure may be constructed locally by each participating device, or the neural network model may be constructed by the coordination device and sent to each participating device. Each participating device has the training data used to train the neural network model locally.
[0068] In horizontal federated learning, the coordination device and the participating devices cooperate with each other to perform multiple global model updates for the neural network model to be trained. A model update refers to updating the model parameters of the neural network model, such as the connection weight values between neurons, and finally obtaining a neural network model that meets the quality requirements. In a global model update, each participating device uses its local training data to perform local training on its local neural network model to obtain a local model parameter update; the local model parameter update can be the gradient information used to update the model parameters, or the locally updated model parameters themselves. Each participating device sends its own local model parameter update to the coordination device; the coordination device fuses the local model parameter updates, for example by a weighted average, to obtain the global model parameter update, and sends it to each participating device; each participating device uses the global model parameter update to update the model parameters of its local neural network model, that is, to update the local neural network model, completing one global model update. After each global model update, the model parameters of the local neural network models of the participating devices are synchronized.
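The following is a minimal Python sketch of one such global model update round, assuming gradient-style local model parameter updates and fusion weights proportional to each device's local data volume; the function and variable names (local_train, data_volumes, and so on) are illustrative assumptions rather than part of the described system.

```python
import numpy as np

def global_model_round(global_params, participants, data_volumes, lr=0.01):
    """One global model update: local training, fusion, and synchronization (illustrative)."""
    # 1. Each participating device trains locally from the current global parameters
    #    and returns its local model parameter update (here: gradient information).
    local_updates = [device.local_train(global_params) for device in participants]

    # 2. The coordination device fuses the local updates, e.g. by a weighted average
    #    with weights proportional to each device's local training data volume.
    weights = np.asarray(data_volumes, dtype=float)
    weights /= weights.sum()
    global_update = sum(w * u for w, u in zip(weights, local_updates))

    # 3. Each participating device applies the global model parameter update,
    #    so all local models are synchronized again.
    return global_params - lr * global_update
```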
[0069] In this embodiment, in order to avoid overfitting of the neural network model obtained by the federated learning training, the coordination device randomly determines the neuron opening and closing mode used by the local neural network model when each participating device trains locally. There are multiple neurons in the neural network model, and the neuron opening and closing mode is a combined mode that indicates whether each neuron in the neural network model is open or closed; in the neuron opening and closing mode, some neurons of the neural network model are in a closed state. Closing a neuron can mean that the output of the neuron is set to 0, that the neuron does not output to the next neuron, or that the neuron is disconnected from its downstream neurons. If a neuron is in the closed state, that neuron does not play a role in the neural network model. That is, the coordination device randomly determines which neurons in the neural network model should be closed and which should remain open when each participating device trains locally.
[0070] As shown in Figure 3, which is a schematic diagram of the result of randomly selecting neurons to close in a neural network model, randomly selecting some neurons in the neural network model to close reduces the interaction between feature detectors (the hidden-layer neuron nodes); detector interaction means that some detectors rely on other detectors to function. As a result, the trained neural network model does not rely too much on certain local features, the generalization ability of the model is improved, and over-fitting is avoided. It should be noted that regardless of whether the number of neurons in the output layer (also called the last layer) of the neural network model is one or more, the output layer does not take part in the random selection, that is, the above-mentioned neuron opening and closing mode does not include the output-layer neurons; the neurons in the input layer (also called the first layer) of the neural network model can take part in the random selection, that is, which features are used as input is randomly selected.
[0071] The coordination device can randomly determine the neuron opening and closing mode of the neural network model used in a global model update before that global model update starts; it can also randomly determine, before the federated learning starts, the neuron opening and closing modes of the neural network model for the subsequent global model updates. It should be noted that each determination is random, so the neuron opening and closing modes of the neural network model in different global model updates are not necessarily the same.
[0072] There are many ways to determine randomly. For example, in order to determine the opening and closing state of the first neuron in the neural network model during the local training of each participating device in a global model update, a random number generator can be used to generate a random number. The random number is compared with a preset value. If it is greater than the preset value, it is determined to turn off the neuron, and if it is not greater than the preset value, it is determined not to turn off the neuron.
[0073] Step S20: Sending the neuron opening and closing mode to each participating device, so that each participating device opens and closes the neurons in its local neural network model according to the neuron opening and closing mode, performs local training on the processed neural network model to obtain a local model parameter update, and returns the local model parameter update;
[0074] After determining the neuron opening and closing mode, the coordination device sends the neuron opening and closing mode to each participating device. The form of the neuron opening and closing mode is not limited; for example, the coordination device and the participating devices may agree on a number for each neuron in the neural network model, and the coordination device sends the numbers of the neurons that need to be closed to the participating devices. After receiving the neuron opening and closing mode, each participating device first performs opening and closing processing on each neuron in the neural network model according to the neuron opening and closing mode before performing local training on its local neural network model. Specifically, a neuron that the mode instructs to close is closed, and a neuron that the mode does not instruct to close, or that it instructs to open, is not closed. After the opening and closing processing, the participating device then performs local training on the processed neural network model to obtain a local model parameter update. Specifically, the participating device can input the local training data into the current neural network model to obtain the model output, calculate the loss function according to the model output and the local data labels of the participating device, then calculate the gradient information of the loss function with respect to the model parameters, and send the gradient information to the coordination device as the local model parameter update. Alternatively, the participating device can input the local training data into the current neural network model to obtain the model output, calculate the loss function according to the model output and the local data labels, calculate the gradient of the loss function with respect to the model parameters, use the gradient to update the model parameters, and send the updated model parameters to the coordination device as the local model parameter update.
[0075] A participating device can close a neuron by disconnecting this neuron from its downstream neurons, by not passing the output of the neuron to the next neuron, or by setting the output of the activation function of this neuron to 0. During a local model parameter update, the connection weights corresponding to the connections that are selected to be disconnected are also set to 0 (that is, the corresponding model parameters are set to 0), and the gradients corresponding to the disconnected connections are also set to 0. If the participating device sends gradient information to the coordination device, the gradient information set to 0 may be omitted from the transmission to the coordination device.
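As a rough numpy sketch of such a local training step on a single-hidden-layer network, the received neuron opening and closing mode can be applied as a 0/1 mask over the hidden neurons, so that closed neurons contribute neither outputs nor gradients; the network shape, loss function and names below are assumptions for illustration only.

```python
import numpy as np

def local_training_step(W1, W2, x, y, mask):
    """One local model parameter update with the received on/off mask applied.

    mask: 0/1 vector over hidden neurons from the neuron opening and closing mode
    (0 = closed). Returns the gradient information that would be sent to the
    coordination device as the local model parameter update.
    """
    # Forward pass: closed neurons have their output set to 0
    h = np.maximum(0.0, x @ W1) * mask        # ReLU hidden layer, masked
    pred = h @ W2
    err = pred - y                            # gradient of squared-error loss w.r.t. prediction

    # Backward pass: gradients through closed neurons are also 0
    grad_W2 = np.outer(h, err)                # rows for closed neurons are all zero
    dh = (W2 @ err) * (h > 0) * mask
    grad_W1 = np.outer(x, dh)                 # columns for closed neurons are all zero
    return grad_W1, grad_W2
```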
[0076] It should be noted that the coordination device sends the neuron opening and closing mode to be used in a local training to each participating device, and the neuron opening and closing mode used by each participating device in that local training is the same, so the opening and closing states of the neurons in the neural network models are also aligned. As shown in Figure 4, the open and closed states of the neurons in the processed neural network models of participating device A (participant A in the figure) and participating device B (participant B in the figure) are aligned. This ensures that all participating devices in the same global model update use the same random selection result, and avoids inconsistent random selection results across participating devices causing the strategy of randomly selecting neurons to close to lose statistical significance.
[0077] It should also be noted that, after a participating device performs the opening and closing processing on the neural network model and completes one or more local model updates, the closed neurons in the neural network model need to be restored before the neural network model is opened and closed again for the next local model update; that is, multiple rounds of opening and closing processing are not superimposed.
[0078] In step S30, the local model parameter updates received from each participating device are fused, and the global model parameter update obtained by the fusion is sent to each participating device, so that each participating device updates its local neural network model according to the global model parameter update.
[0079] The coordination device receives the local model parameter updates sent by each participating device, and merges the local model parameter updates to obtain a global model parameter update. Specifically, the coordination device may perform a weighted average on each local model parameter update, and the weight value may be set according to the specific conditions of each participating device, for example, may be set according to the proportion of the local training data of each participating device. The coordinating device sends the global model parameter update to each participating device. Each participating device updates its local neural network model according to the global model parameter update. Specifically, if the received global model parameter update is gradient information, the participating device uses the gradient information and the current model parameters of the local neural network model to calculate the updated model parameters, and uses the updated model parameters as the latest model parameters. This completes a global model update. If the received global model parameter update is a model parameter, the participating device uses the model parameter as the latest model parameter, that is, a global model update is completed.
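A small sketch of how a participating device could apply the received global model parameter update in the two cases just described (fused gradient information versus fused model parameters); the parameter dictionary layout and learning rate are illustrative assumptions.

```python
def apply_global_update(local_params, global_update, is_gradient, lr=0.01):
    """Update the local neural network model with the global model parameter update."""
    if is_gradient:
        # Case 1: the global update is fused gradient information -> take a gradient step
        return {name: value - lr * global_update[name] for name, value in local_params.items()}
    # Case 2: the global update already contains the fused model parameters -> adopt them
    return dict(global_update)
```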
[0080] Perform multiple global model updates in a loop. When the coordination device detects that the preset stopping conditions are met, the training can be stopped to obtain the final trained neural network model. The preset stopping condition may be a condition set in advance according to needs, and training is stopped when the condition is met, for example, it may be the convergence of the loss function, the number of iterations greater than a set number, or the training time greater than a set time.
[0081] In this embodiment, the neuron opening and closing mode of the neural network model is randomly determined by the coordination device and sent to each participating device, so that each participating device opens and closes the neurons in the neural network model according to the neuron opening and closing mode and then performs local training on the processed neural network model. As a result, in each global model update of federated learning, neurons in the neural network model are randomly closed, reducing the interaction between neuron nodes, so that the trained neural network model does not rely too much on certain local features, which improves the generalization ability of the model. In addition, since the coordination device randomly determines the neuron opening and closing mode and sends it to each participating device uniformly, the closing of neurons during the local training of the participating devices is aligned, thereby avoiding inconsistent random selection results across participating devices causing the strategy of randomly selecting neurons to close to lose statistical significance. Moreover, compared with existing solutions for avoiding over-fitting, the strategy of randomly selecting neurons to close adopted in the embodiment of the present invention can be well combined with federated learning, and does not bring excessive additional time cost or computing resource consumption.
[0082] Further, the neural network model to be trained can be used to predict credit risk for banks. The input of the neural network model can be a user's characteristic data, the output can be the user's risk score, and the participating devices can be the devices of multiple banks, each of which locally holds sample data of multiple users, while the coordination device is a third-party server independent of the banks. The coordination device and each participating device train the neural network model according to the federated learning process in the above embodiment, and finally obtain a neural network model used for credit risk prediction. Each bank can use the trained neural network model to predict a user's credit risk by inputting the user's characteristic data into the trained model to obtain the user's risk score. Since the coordination device randomly determines the neuron opening and closing mode of the neural network model during training and sends it to each participating device, each participating device performs neuron opening and closing processing on its local neural network according to the neuron opening and closing mode before performing local training, and then completes the model training, so that the generalization ability of the trained neural network model is improved and it also has better credit risk prediction ability for new user data other than the training data. Moreover, the federated learning process does not bring more time cost to each bank, and it also saves the computing resources of each bank's devices.
[0083] It should be noted that the neural network model to be trained can also be used in application scenarios other than credit risk estimation, such as performance level prediction and paper value evaluation, which are not limited in the embodiment of the present invention.
[0084] Further, based on the foregoing first embodiment, a second embodiment of the method for optimizing a horizontal federated learning system of the present invention is proposed. In this embodiment, the step S10 includes:
[0085] Step S101: Randomly determining the neuron opening and closing mode used when the neural network model to be trained is trained with each small batch of training data under each traversal (period) in each global model update, wherein the local training data of each participating device is divided into multiple small batches of training data with the same number of batches, one traversal of the local training data by a participating device is one period, and the number of local training periods is the same for each participating device.
[0086] In this embodiment, the local training data of each participating device can be divided into multiple small batches of training data, and the number of small batches of training data into which each participating device divides its data is guaranteed to be the same. When a participating device trains locally, multiple periods of local training can be performed: one traversal of the local training data by the participating device is one period, and the number of periods of local training performed by each participating device in the same global model update is the same. During the traversal, the participating device uses one small batch of training data to update the local neural network model once, so in one global model update, the number of local model updates performed by a participating device is the number of local small batches of training data multiplied by the number of local training periods. The participating devices can negotiate to determine the number of batches of training data and the number of local training periods; alternatively, the coordination device can determine the number of batches and periods according to the data volume of the local training data of each participating device and then send them to each participating device.
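For example, if the local training data of each participating device is divided into M = 5 small batches and each participating device performs K = 2 periods of local training, then within one global model update each participating device performs 5 × 2 = 10 local model updates.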
[0087] By dividing the local training data of a participating device into multiple small batches, the amount of data to be processed in each local model update of the participating device will not be too large, which reduces the computing pressure on the device and avoids the device crashing or the calculation taking too long due to an excessive amount of data. Multi-period local training on the participating devices can make full use of their local training data, thereby reducing the number of global model parameter updates and hence the communication consumption between the coordination device and the participating devices.
[0088] Based on this, the coordination device can randomly determine the neuron opening and closing mode used when the neural network model to be trained is trained with each small batch of training data under each traversal in each global model update, and then send each neuron opening and closing mode to each participating device.
[0089] The coordination device may send one neuron opening and closing mode at a time, with the sending occurring before each participating device adopts that neuron opening and closing mode to perform a certain local model update. It is also possible to send, at one time, all the neuron opening and closing modes to be used in the next global model update to each participating device, that is, the neuron opening and closing modes required for the local model updates with each small batch of training data under each traversal in the next global model update. It should be noted that the coordination device may carry instruction information when sending the neuron opening and closing modes, to indicate which local model update each neuron opening and closing mode is used for by each participating device. When different small batches of training data are used for local model updates, the same neuron opening and closing mode can be used, or different neuron opening and closing modes can be used; that is, the coordination device can determine that two or more small batches of training data correspond to one neuron opening and closing mode, and it is not necessary to determine a separate neuron opening and closing mode for each small batch of training data under each traversal.
[0090] After receiving the neuron opening and closing mode, the participating device uses the neuron opening and closing mode to open and close the neurons in the local neural network model, and uses the small batch of training data corresponding to the neuron opening and closing mode to perform a local update on the processed neural network model. If the participating device receives all the neuron opening and closing modes required in one global model update, the participating device uses each neuron opening and closing mode in turn to perform the corresponding local model update.
[0091] In this embodiment, by dividing the local training data of each participating device into the same number of small batches of training data and having each participating device perform the same number of local training periods, the coordination device can easily unify the neuron opening and closing mode used by the participating devices in each local model update, thereby avoiding inconsistent random selection results across participating devices causing the strategy of randomly closing neurons to lose statistical significance, and ensuring the generalization ability of the trained neural network model.
[0092] Further, the step S20 includes:
[0093] The neuron opening and closing mode is distributed to each participating device in the form of a K*M*N dimensional matrix, where K is the number of local training periods for each participating device, and M is the number of small batches of training data for each participating device, N is the number of neurons in the neural network model, and the value of each element in the matrix is used to indicate the opening and closing state of the corresponding neuron.
[0094] Further, the coordination device can distribute the neuron opening and closing mode in the form of a K*M*N dimensional matrix to each participating device, where K is the number of local training periods of each participating device, M is the number of small batches of training data in each participating device, and N is the number of neurons in the neural network model. The value of each element in the matrix is used to indicate the opening and closing state of the corresponding neuron. The coordination device and the participating devices can negotiate in advance to determine the possible values of the elements in the matrix and specify the meaning of the different values; for example, the value of each element in the matrix can be 0 or 1, where 0 indicates that the corresponding neuron is closed and 1 indicates that the corresponding neuron is open.
[0095] A column of the third dimension of the matrix can be regarded as a bitmap of length N, which is used to indicate the open and close states of N neurons in the neural network model. The second dimension of the matrix has M columns, which correspond to local model updates using M small batches of training data. The first dimension of the matrix has K rows, which correspond to K periods of local training.
[0096] After receiving the matrix, the participating device can determine, according to the values of the elements in the matrix, the opening and closing state of each neuron when each small batch of training data is used for local training under each period of traversal in a global model update, and open and close the neurons of the neural network model accordingly to complete each local training. For example, in a global model update, when the first small batch of training data is used for a local model parameter update during the first period of traversal, the participating device obtains the values of the (1,1,1)-th element to the (1,1,N)-th element of the matrix, that is, N element values, and determines the opening and closing states of the N neurons correspondingly according to these N values.
[0097] Further, the coordination device can randomly generate the matrix with a certain probability P before the start of a global model update. For example, for the (k, m, n)-th element, the coordination device can generate a random value between 0 and 1; if the generated random number is greater than the probability P, the coordination device sets the (k, m, n)-th element of the matrix to 1, otherwise it sets the (k, m, n)-th element of the matrix to 0, where k=1, 2, ..., K; m=1, 2, ..., M; n=1, 2, ..., N.
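A minimal numpy sketch of this matrix generation, following the element-wise rule above (a uniform random number greater than P gives 1, i.e. the neuron stays open); numpy and the function name are assumptions for illustration.

```python
import numpy as np

def generate_open_close_matrix(K, M, N, P, seed=None):
    """Generate the K*M*N neuron opening and closing matrix (1 = open, 0 = closed)."""
    rng = np.random.default_rng(seed)
    random_values = rng.random((K, M, N))        # one uniform value per (k, m, n) element
    return (random_values > P).astype(np.int8)   # greater than P -> element set to 1

# Example: 2 local training periods, 5 small batches, 100 neurons, probability P = 0.5
mode_matrix = generate_open_close_matrix(K=2, M=5, N=100, P=0.5)
```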
[0098] In this embodiment, the coordination device sends the neuron opening and closing mode to each participating device in the form of a matrix. Because the matrix is a simple form with a small amount of data, the coordination device does not need to incur excessive communication overhead to transmit the neuron opening and closing mode, while still ensuring that the neuron opening and closing modes of the participating devices are aligned.
[0099] Further, the coordination device may also generate K*M N-dimensional bitmaps, or separately generate K M*N bitmap matrices, and send the generated bitmaps to each participating device to instruct each participating device to perform the neuron opening and closing operations on the neural network model.
[0100] Further, based on the above-mentioned first and second embodiments, a third embodiment of the method for optimizing a horizontal federated learning system of the present invention is proposed. In this embodiment, before the step S101, the method further includes:
[0101] Step S40: Obtain the data volume of the local small batch of training data of each participating device;
[0102] Since the data volume of the local training data of each participating device is not necessarily the same, while the number of batches into which the local training data is divided is the same, the data volume of a local small batch of training data may differ between participating devices. In this case, the coordination device can set different learning rates for the participating devices, so that the progress of the local training of the participating devices can be kept synchronized.
[0103] Specifically, the coordination device obtains the data volume of a local small batch of training data of each participating device; for example, each participating device may send the data volume of its local small batch of training data to the coordination device.
[0104] In step S50, the learning rate of the local model update of each participating device is set according to the amount of data, so that each participating device can update the local model according to the learning rate, wherein the learning rate is proportional to the amount of data.
[0105] The coordination device sets the learning rate of the local model update of each participating device according to each acquired data volume, and may correspondingly send the determined learning rate to each participating device. Specifically, the learning rate can be proportional to the amount of data. For example, the coordination device can set a learning rate, such as 0.01, for one of the participating devices, then calculate the ratio of the data volume of each other participating device to the data volume of that participating device, and multiply the calculated ratio by the learning rate to obtain the learning rate of the other participating devices. For example, if the data volume of the small batch of training data in participating device 1 is 1000 and the data volume of the small batch of training data in participating device 2 is 2000, and the learning rate of participating device 1 is set to 0.01, the calculated learning rate of participating device 2 is 0.02. The participating devices update their local models according to the learning rates set by the coordination device.
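A short sketch of this proportional learning rate setting, reusing the figures from the example above (device 1 with 1000 samples per small batch as the reference device and a base learning rate of 0.01); the function and dictionary names are illustrative.

```python
def set_learning_rates(batch_data_volumes, reference_device, base_lr):
    """Set each device's learning rate proportional to its small-batch data volume."""
    reference_volume = batch_data_volumes[reference_device]
    return {device: base_lr * volume / reference_volume
            for device, volume in batch_data_volumes.items()}

rates = set_learning_rates({"device_1": 1000, "device_2": 2000}, "device_1", 0.01)
# rates == {"device_1": 0.01, "device_2": 0.02}
```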
[0106] Further, the step of fusing the local model parameter updates received from each participating device in the step S30 includes:
[0107] Step S301: Performing a weighted average on the local model parameter updates received from each participating device to obtain the global model parameter update, wherein the weight of each participating device used in the weighted average operation is calculated based on the learning rate corresponding to that participating device.
[0108] After the coordination device sets the learning rates for the participating devices, when the coordination device performs the weighted average of the local model parameter updates sent by the participating devices, it can also add the influence of the learning rate to the weights; that is, the weight of each participating device used in the weighted average operation may be calculated according to the learning rate corresponding to that participating device.
[0109] Specifically, the coordination device may first set the weight value of each participating device according to other weight-setting factors, then multiply the weight value of each participating device by the corresponding learning rate and perform normalization to obtain, for each participating device, a weight associated with the learning rate, and finally use the weights associated with the learning rates to perform a weighted average on the local model parameter updates to obtain the global model parameter update.
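A compact sketch of the weight adjustment described above: preset weights are multiplied by the corresponding learning rates, normalized, and then used for the weighted average; numpy and the names are illustrative assumptions.

```python
import numpy as np

def fuse_with_learning_rate_weights(local_updates, preset_weights, learning_rates):
    """Weighted-average fusion with learning-rate-adjusted, normalized weights."""
    weights = np.asarray(preset_weights, dtype=float) * np.asarray(learning_rates, dtype=float)
    weights /= weights.sum()                       # normalization
    return sum(w * update for w, update in zip(weights, local_updates))
```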
[0110] In this embodiment, by adding the influence of the different learning rates of the participating devices to the weights when performing the weighted average of the local model parameter updates, the global model parameter update obtained by fusion can better reflect the contribution of each participating device's local training data to the federated learning, thereby improving the overall quality of the trained neural network model.
[0111] Further, based on the above-mentioned first, second and third embodiments, a fourth embodiment of the method for optimizing a horizontal federated learning system of the present invention is proposed. In this embodiment, the method for optimizing the horizontal federated learning system is applied to a participating device participating in horizontal federated learning, and the participating device is communicatively connected with the coordination device participating in horizontal federated learning. The coordination device and the participating devices in the embodiment of the present invention may be devices such as smartphones, personal computers and servers. In this embodiment, the optimization method of the horizontal federated learning system includes the following steps:
[0112] Step A10: Inputting the generator parameters into the random number generator, and determining the neuron opening and closing mode of the neural network model to be trained according to the output result of the random number generator, wherein, in the neuron opening and closing mode, some neurons of the neural network model are in a closed state, and each participating device inputs the same generator parameters into the same random number generator in each local training of the neural network model;
[0113] In this embodiment, the coordination device and each participating device can establish a communication connection in advance through inquiry handshake authentication and identity authentication, and determine the neural network model to be trained in this federated learning. The neural network model of the same or similar structure may be constructed locally by each participating device, or the neural network model may be constructed by the coordination device and then sent to each participating device. Each participating device locally has training data for training the neural network model.
[0114] In horizontal federated learning, the coordination device and the participating devices cooperate with each other to perform multiple global model updates for the neural network model to be trained. A model update refers to updating the model parameters of the neural network model, such as the connection weight values between neurons, and finally obtaining a neural network model that meets the quality requirements. In a global model update, each participating device uses its local training data to perform local training on its local neural network model to obtain a local model parameter update; the local model parameter update can be the gradient information used to update the model parameters, or the locally updated model parameters themselves. Each participating device sends its own local model parameter update to the coordination device; the coordination device fuses the local model parameter updates, for example by a weighted average, to obtain the global model parameter update, and sends it to each participating device; each participating device uses the global model parameter update to update the model parameters of its local neural network model, that is, to update the local neural network model, completing one global model update. After each global model update, the model parameters of the local neural network models of the participating devices are synchronized.
[0115] In this embodiment, in order to avoid overfitting of the neural network model obtained by federated learning training, each participating device may randomly select part of the neurons in the neural network model to be trained to be closed during each local training. When the neuron is in the off state, the output of the neuron is set to 0, or the neuron does not output to the next neuron, or the neuron is disconnected from the downstream neuron.
[0116] The strategy of selecting neurons to close by random numbers can reduce the interaction between feature detectors (the hidden-layer neuron nodes); detector interaction means that some detectors rely on other detectors to function. As a result, the trained neural network model does not rely too much on certain local features, the generalization ability of the model is improved, and over-fitting is avoided.
[0117] Specifically, the same random number generator can be set locally on each participating device. The participating devices can negotiate the random number generator with each other, or the coordination device can distribute a random number generator to each participating device, to ensure that the random number generator in each participating device is the same.
[0118] Each participating device inputs generator parameters into a random number generator, and the random number generator generates one or more random numbers. Among them, the generator parameter is the input parameter of the random number generator, and the random number generator generates a random number according to the generator parameter. It should be noted that if two identical random number generators respectively input the same generator parameters, the random numbers generated are the same.
[0119] The participating device determines the neuron opening and closing mode of the neural network model to be trained according to the random numbers generated by the random number generator. There are multiple neurons in the neural network model, and the neuron opening and closing mode is a combined mode indicating whether each neuron in the neural network model is open or closed; in the neuron opening and closing mode, some neurons of the neural network model are closed. If a neuron is in the closed state, that neuron does not play a role in the neural network model.
[0120] It should be noted that, in the federated learning process, each participating device performs local training on its local neural network model in each global model update. Each participating device inputs the same generator parameters into its respective local random number generator in each local training of the neural network model, which ensures that, within a global model update, the neuron opening and closing modes used by the participating devices for the local training of their local neural network models are the same, that is, each participating device adopts the same random selection result in the same global model update. This avoids inconsistent random selection results across participating devices causing the strategy of randomly selecting neurons to close to lose statistical significance.
[0121] Each participating device can determine the neuron opening and closing mode according to the random numbers generated by the random number generator. For example, before local training, each participating device uses the random number generator to generate N random numbers, one for each of the N neurons, and compares the N random numbers with a preset value: if a random number is greater than the preset value, the corresponding neuron is determined to be closed; if it is not greater than the preset value, the corresponding neuron is determined not to be closed.
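A minimal sketch of how each participating device could obtain the same result from the same generator parameters, with a seed derived from the global iteration index, period index and batch index standing in for the shared random number generator; all names are illustrative, and the rule follows the preceding paragraph (a random number greater than the preset value closes the corresponding neuron).

```python
import numpy as np

def neuron_open_close_mode(iteration, period, batch, num_neurons, preset_value):
    """Derive the 0/1 opening and closing vector (1 = open) for one local model update."""
    # Identical generator parameters give an identical seed on every participating device,
    # so every device reproduces the same N random numbers and hence the same mode.
    seed = iteration * 1_000_003 + period * 1_009 + batch
    rng = np.random.default_rng(seed)
    random_values = rng.random(num_neurons)                  # one random number per neuron
    return (random_values <= preset_value).astype(np.int8)   # greater than preset value -> closed
```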
[0122] Step A20: Open and close the neurons in the local neural network model according to the neuron opening and closing mode, and perform local training on the processed neural network model to obtain local model parameter updates and send them to the coordination device ;
[0123] After determining the neuron opening and closing mode, the participating device first performs opening and closing processing on each neuron in the neural network model according to the neuron opening and closing mode before performing local training on its local neural network model. Specifically, a neuron that the mode instructs to close is closed, and a neuron that the mode does not instruct to close, or that it instructs to open, is not closed. After the opening and closing processing, the participating device then performs local training on the processed neural network model to obtain a local model parameter update. Specifically, the participating device can input the local training data into the current neural network model to obtain the model output, calculate the loss function according to the model output and the local data labels of the participating device, then calculate the gradient information of the loss function with respect to the model parameters, and send the gradient information to the coordination device as the local model parameter update. Alternatively, the participating device can input the local training data into the current neural network model to obtain the model output, calculate the loss function according to the model output and the local data labels, calculate the gradient of the loss function with respect to the model parameters, use the gradient to update the model parameters, and send the updated model parameters to the coordination device as the local model parameter update.
[0124] The coordination device receives the local model parameter updates sent by each participating device, and merges the local model parameter updates to obtain a global model parameter update. Specifically, the coordination device may perform a weighted average on each local model parameter update, and the weight value may be set according to the specific conditions of each participating device, for example, may be set according to the proportion of the local training data of each participating device. The coordinating device sends the global model parameter update to each participating device.
[0125] Further, in step A20, the step of opening and closing the neurons in the local neural network model according to the neuron opening and closing mode includes:
[0126] Step A201: Determine the neuron to be closed in the local neural network model according to the neuron opening and closing mode;
[0127] Step A202: Set the output of the neuron to be turned off to zero to turn off the neuron to be turned off.
[0128] The participating equipment can determine the neurons to be closed in the local neural network model according to the neuron opening and closing mode, that is, determine which neurons to close. Set the output of the neuron to be turned off to zero to achieve the purpose of turning off the neuron.
[0129] In addition, the process of shutting down the neuron by the participating device can also be to disconnect the neuron from the downstream neuron, or it can be that the output of the neuron is not transmitted to the next neuron.
[0130] Step A30, using the global model parameter update received from the coordination device to update the local neural network model, wherein the coordination device merges the local model parameter updates received from each participating device to obtain the global model Parameter update.
[0131] The participating device receives the global model parameter update sent by the coordination device and updates its local neural network model according to the global model parameter update. Specifically, if the received global model parameter update is gradient information, the participating device uses the gradient information and the current model parameters of the local neural network model to calculate the updated model parameters and uses the updated model parameters as the latest model parameters, completing one global model update. If the received global model parameter update consists of model parameters, the participating device uses those model parameters as the latest model parameters, completing one global model update.
[0132] Multiple global model updates are performed in a loop. When the coordination device or one of the participating devices detects that a preset stopping condition is met, training is stopped and the final trained neural network model is obtained. The preset stopping condition is a condition set in advance as needed, such that training stops once it is satisfied; for example, it may be convergence of the loss function, the number of iterations exceeding a set number, or the training time exceeding a set duration.
[0133] In this embodiment, each participating device inputs the same generator parameters into the same random number generator, determines the neuron opening and closing mode from the generator's output, opens and closes the neurons in its neural network model according to that mode, and then trains the processed neural network model locally. In each global model update of federated learning, neurons in the neural network model are therefore closed at random, which reduces the interaction between neuron nodes, prevents the trained neural network model from depending too heavily on particular local features, and improves the generalization ability of the model. Moreover, because every participating device uses the same generator parameters with the same random number generator in each round of local training and derives the neuron opening and closing mode from the same output, the neuron-closing processing of all participating devices is aligned; this avoids inconsistent random selections across participating devices, which would deprive the strategy of randomly closing neurons of its statistical significance. Furthermore, the strategy of randomly closing neurons adopted in this embodiment combines well with federated learning and, compared with existing schemes for avoiding overfitting, does not introduce excessive additional time cost or computing resource consumption.
[0134] Further, the local training data of each participating device may be divided into multiple mini-batches, with every participating device dividing its data into the same number of mini-batches. When training locally, each participating device may perform multiple epochs of local training, one epoch being a single pass over its local training data, and the number of epochs performed within the same global model update is the same for every participating device. During each pass, the participating device uses one mini-batch to update the local neural network model once, so within one global model update the number of local model updates performed by a participating device equals the number of mini-batches multiplied by the number of local training epochs. The participating devices may negotiate the number of mini-batches and the number of local training epochs among themselves; alternatively, the coordination device may determine these numbers according to the amount of local training data held by each participating device and send them to the participating devices.
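The following brief sketch illustrates the mini-batch division and the resulting number of local updates; the helper name and the use of numpy are illustrative assumptions.

```python
import numpy as np

def split_into_batches(local_data, num_batches):
    """Split local training data into the agreed number of mini-batches."""
    return np.array_split(local_data, num_batches)

# With 4 mini-batches and 3 local training epochs, each participating device
# performs 4 * 3 = 12 local model updates within one global model update.
```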
[0135] Dividing the local training data of a participating device into multiple mini-batches keeps the amount of data processed in each local model update from becoming too large, which reduces the computing load on the device and avoids processor overload or excessively long computation times caused by handling too much data at once. Multi-epoch local training allows the participating device to make full use of its local training data, which reduces the number of global model parameter updates and thus the communication overhead between the coordination device and the participating devices.
[0136] On this basis, the generator parameters that a participating device inputs into the random number generator may include the iteration index of the global model update, the epoch index of the local training, the batch index of the mini-batch training data, and the neuron index within the neural network model. That is, the participating device inputs the global model update iteration index, the local training epoch index, the mini-batch index, and the neuron index into the random number generator to obtain a random number, and uses that random number to determine the open or closed state of the neuron corresponding to these indices.
[0137] For example, when a participating device performs local training with the m-th mini-batch of training data during the k-th pass of the t-th global parameter update, in order to determine the open or closed state of the n-th neuron it inputs t, k, m, and n as a parameter group into the random number generator, which generates a random number ρ between 0 and 1. If ρ is greater than a set value P, such as 0.5, the participating device closes the n-th neuron; if ρ is not greater than P, the participating device leaves the n-th neuron open.
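A minimal sketch of this decision, assuming the four indices are non-negative integers and using numpy's seeded generator as the shared random number generator (an illustrative choice, not prescribed by the description), is:

```python
import numpy as np

def neuron_is_closed(t, k, m, n, p_threshold=0.5):
    """Decide whether neuron n is closed for global round t, epoch k, mini-batch m.

    Every participating device seeds the same generator with the same
    (t, k, m, n) indices, so all of them reach the same decision without
    any extra communication.
    """
    rng = np.random.default_rng([t, k, m, n])  # deterministic seed from the four indices
    rho = rng.random()                         # random number in [0, 1)
    return rho > p_threshold                   # close the neuron if rho exceeds P
```

Calling neuron_is_closed(3, 1, 2, 7), for instance, returns the same Boolean value on every participating device.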
[0138] In this embodiment, by dividing the local training data of each participating device into the same number of mini-batches and having each participating device perform the same number of local training epochs, all participating devices can use the same random number generator with the same generator parameters, which makes it easy to unify the neuron opening and closing modes of the participating devices in every local model update. This avoids inconsistent random neuron selections across participating devices, which would deprive the strategy of randomly closing neurons of its statistical significance, and thereby improves the generalization ability of the trained neural network model.
[0139] In addition, an embodiment of the present invention further provides a computer-readable storage medium. The storage medium stores a horizontal federated learning system optimization program which, when executed by a processor, implements the steps of the horizontal federated learning system optimization method described above.
[0140] For the embodiments of the horizontal federated learning system optimization device and the computer-readable storage medium of the present invention, reference may be made to the embodiments of the horizontal federated learning system optimization method of the present invention, which are not repeated here.
[0141] It should be noted that, as used herein, the terms "comprise" and "include", and any variants thereof, are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to that process, method, article, or device. Unless further restricted, an element defined by the phrase "including a..." does not exclude the existence of other identical elements in the process, method, article, or device that includes that element.
[0142] The sequence numbers of the foregoing embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
[0143] Through the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform; they can of course also be implemented by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present invention.
[0144] The above are only preferred embodiments of the present invention and do not limit the scope of the present invention. Any equivalent structural or process transformation made using the contents of the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present invention.