A third-party-free federated gradient boosting decision tree model training method

A decision tree and gradient boosting technology, applied in character and pattern recognition, data processing applications, finance, etc. It addresses problems such as data leakage, excessive space consumption, and the difficulty of finding a trusted third party, thereby protecting data security, reducing storage space, and ensuring training accuracy.

Active Publication Date: 2022-04-26
蓝象智联(杭州)科技有限公司

AI Technical Summary

Problems solved by technology

[0004] 1. The first batch of performance evaluation data from the privacy computing institute directly under the Ministry of Industry and Information Technology shows that federated tree modeling on 900 features and 400,000 samples takes an industry average of 2 hours, 23 minutes and 47 seconds, which is difficult to meet industry needs;
[0005] 2. Schemes exist in which a third-party assistant participates in training and distributes synchronized model parameters, but a trusted third party is difficult to find for actual commercial deployment, and there is a risk of data leakage;
[0006] 3. Existing feature value storage is inefficient: a data set with 900 features and 400,000 samples occupies 3.9 GB of space, and if the intermediate results of federated gradient boosting decision tree model training are stored on the local disk, a single training run consumes more than 10 GB of space (a storage sketch follows this list).
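The binning and bit-slice storage described in the abstract below targets problem 3. As a rough, hypothetical illustration (not the patent's exact encoding): quantile-binning each feature into at most 256 bins lets every value be stored as a one-byte bin index instead of an 8-byte float, shrinking a 400,000-sample, 900-feature matrix from roughly 2.9 GB to about 0.36 GB, and packing the bin indices bit-plane by bit-plane ("bit slices") compresses further when fewer bins are needed. A minimal Python sketch with hypothetical helper names:

```python
import numpy as np

def quantile_bin(feature: np.ndarray, n_bins: int = 256):
    """Map raw feature values to bin indices via quantile cut points.

    Hypothetical helper; the patent's exact binning rule is not quoted here.
    Returns (bin indices as uint8, bin edges).
    """
    edges = np.quantile(feature, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    idx = np.searchsorted(edges, feature, side="right").astype(np.uint8)
    return idx, edges

def bit_slices(bin_idx: np.ndarray, n_bits: int = 8) -> np.ndarray:
    """Store bin indices as n_bits packed bit planes ("bit slices").

    With fewer bins, fewer planes are needed, so storage drops below one
    byte per value.
    """
    planes = [((bin_idx >> b) & 1).astype(np.uint8) for b in range(n_bits)]
    return np.stack([np.packbits(p) for p in planes])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    raw = rng.normal(size=400_000)                 # one feature column, float64
    idx, _ = quantile_bin(raw, n_bins=256)
    packed = bit_slices(idx, n_bits=8)
    print(raw.nbytes, idx.nbytes, packed.nbytes)   # 3,200,000 vs 400,000 bytes
```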


Examples


Embodiment

[0100] Embodiment: The third-party-free federated gradient boosting decision tree model training method of this embodiment is used for joint risk control modeling between banks and operators and, as shown in Figure 1, includes the following steps:

[0101] S1: The training initiator and the training participants synchronously initialize the model parameters of their respective federated gradient boosting decision tree models. The model parameters include the depth of each federated gradient boosting decision tree, the number of trees, the large-gradient sample sampling rate, the small-gradient sample sampling rate, the per-tree column sampling rate, the per-tree row sampling rate, the learning rate, the maximum number of leaves, the minimum number of node samples after splitting, the minimum split gain, the number of bins, the L2 regularization term, the L1 regularization term, the termination threshold, and the modeling method; a configuration sketch is given below.
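For readability, the S1 parameter list can be pictured as a single configuration object that both sides initialize identically. The following sketch is illustrative only; the field names and default values are assumptions, not taken from the patent:

```python
from dataclasses import dataclass

@dataclass
class FedGBDTParams:
    """Hyperparameters both parties initialize synchronously in S1.

    Field names and defaults are illustrative, not the patent's exact ones.
    """
    max_depth: int = 5                 # depth of each federated tree
    n_trees: int = 100                 # number of federated gradient boosting trees
    top_rate: float = 0.2              # sampling rate of large-gradient samples
    other_rate: float = 0.1            # sampling rate of small-gradient samples
    col_sample_rate: float = 0.8       # per-tree column (feature) sampling rate
    row_sample_rate: float = 0.8       # per-tree row (sample) sampling rate
    learning_rate: float = 0.1
    max_leaves: int = 31
    min_child_samples: int = 20        # minimum node size after a split
    min_split_gain: float = 0.0
    n_bins: int = 256
    reg_lambda: float = 1.0            # L2 regularization
    reg_alpha: float = 0.0             # L1 regularization
    tol: float = 1e-4                  # termination threshold
    objective: str = "binary"          # modeling method (e.g. classification)
```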

[0102] S2: The training initiator samples d sample data sets x fro...
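S2 is truncated in this extract, but the large-gradient and small-gradient sampling rates initialized in S1 suggest a GOSS-style (gradient-based one-side sampling) step when the d sample data sets are drawn. A minimal sketch under that assumption; the patent's actual sampling rule may differ:

```python
import numpy as np

def goss_sample(grad: np.ndarray, top_rate: float, other_rate: float,
                rng: np.random.Generator):
    """Gradient-based one-side sampling (GOSS), as popularized by LightGBM.

    Keeps the top_rate fraction of samples with the largest |gradient| plus a
    random other_rate fraction of the rest, up-weighting the latter so that
    gradient sums stay unbiased. Assumed here from the S1 sampling-rate
    parameters; not quoted from the patent.
    """
    n = grad.shape[0]
    n_top = int(n * top_rate)
    n_other = int(n * other_rate)
    order = np.argsort(-np.abs(grad))
    top_idx = order[:n_top]
    other_idx = rng.choice(order[n_top:], size=n_other, replace=False)
    idx = np.concatenate([top_idx, other_idx])
    weights = np.ones(idx.shape[0])
    weights[n_top:] = (1.0 - top_rate) / other_rate   # compensate the sub-sampling
    return idx, weights
```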


Abstract

The invention discloses a third-party-free federated gradient boosting decision tree model training method. It includes the following steps: the training initiator and the training participants synchronously initialize; the training initiator and the training participants synchronously sample d sample data sets; the training initiator and the training participants bin each feature of their respective sample data sets, record the binning information, and store it as bit slices; the training initiator calculates the first-order and second-order gradient sums corresponding to each bin of each feature of each of its sample data sets, and the training initiator and the training participants jointly calculate, via a secure multiplication protocol, the first-order and second-order gradient sums corresponding to each bin of each feature of the training participants' sample data sets; the training initiator searches for the optimal split point and synchronizes the result to the training participants; the above steps are repeated until the termination condition is met. The invention protects data security, reduces storage space, and greatly compresses communication traffic.
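The abstract does not spell out how the optimal split point is scored from the per-bin first-order and second-order gradient sums. A minimal sketch, assuming the standard second-order (XGBoost-style) gain with the L2 regularization term lambda over per-bin sums G_k and H_k; this is a common choice for gradient boosting trees and not necessarily the patent's exact formula:

```python
import numpy as np

def best_split_from_histogram(G: np.ndarray, H: np.ndarray,
                              reg_lambda: float = 1.0,
                              min_split_gain: float = 0.0):
    """Pick the best bin boundary for one feature from per-bin gradient sums.

    G[k] and H[k] are the first- and second-order gradient sums of bin k (in
    the federated setting these may be reconstructed from shares exchanged
    under the secure multiplication protocol). Uses the assumed gain
        gain = 1/2 * (G_L^2/(H_L+lam) + G_R^2/(H_R+lam)
                      - (G_L+G_R)^2/(H_L+H_R+lam)).
    """
    G_total, H_total = G.sum(), H.sum()
    G_left = np.cumsum(G)[:-1]              # candidate split after each bin
    H_left = np.cumsum(H)[:-1]
    G_right, H_right = G_total - G_left, H_total - H_left
    gain = 0.5 * (G_left**2 / (H_left + reg_lambda)
                  + G_right**2 / (H_right + reg_lambda)
                  - G_total**2 / (H_total + reg_lambda))
    k = int(np.argmax(gain))
    return (k, float(gain[k])) if gain[k] > min_split_gain else (-1, 0.0)
```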

Description

Technical field

[0001] The invention relates to the technical field of gradient boosting decision tree model training, in particular to a third-party-free federated gradient boosting decision tree model training method.

Background technique

[0002] The federated gradient boosting decision tree model can solve both classification and regression problems and has good interpretability, so it is widely used in the field of federated learning, especially in bank risk control; it is a very practical tree model. In the federated gradient boosting decision tree model, each participant calculates the first- and second-order derivatives of the decision tree based on local data, and the best split is decided from them. In this process, the first- and second-order derivatives of different participants need to be added together. Additive homomorphic encryption can be used to protect the data privacy of each participant from being leaked to the tree model during t...
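The background refers to additive homomorphic encryption as the existing way to add the first- and second-order derivatives of different participants without revealing them. A minimal sketch of that prior approach using the python-paillier (`phe`) package; it illustrates the background technique only, not the patent's own protocol, which instead relies on a secure multiplication protocol and no third party:

```python
# Sketch of the prior additive-homomorphic approach mentioned above, using
# the python-paillier (`phe`) package. Toy values; NOT the patent's protocol.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

# Each party's first-order gradient sum for one bin (toy values).
g_bin_party_a = 0.37
g_bin_party_b = -1.12

enc_a = public_key.encrypt(g_bin_party_a)   # encrypted at party A
enc_b = public_key.encrypt(g_bin_party_b)   # encrypted at party B
enc_sum = enc_a + enc_b                     # additive homomorphism: add ciphertexts

print(private_key.decrypt(enc_sum))         # -0.75; only the key holder can decrypt
```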


Application Information

Patent Type & Authority: Patent (China)
IPC (8): G06K9/62, G06Q40/02, G06F30/27
CPC: G06Q40/02, G06F30/27, G06F18/214
Inventors: 郭梁, 徐时峰, 刘洋, 裴阳, 毛仁歆, 宋鎏屹
Owner: 蓝象智联(杭州)科技有限公司