Gradient boosting tree modeling method and device and terminal

A gradient boosting tree modeling technology applied in the field of machine learning. It addresses problems such as the huge cost of integrating scattered data, and achieves the effects of breaking down data silos, promoting data circulation, and providing effective privacy protection.

Pending Publication Date: 2020-12-08
BEIJING BAIDU NETCOM SCI & TECH CO LTD


Problems solved by technology

Due to industry competition, privacy and security concerns, and complex administrative procedures, even data integration between different departments of the same company faces many obstacles. In practice, integrating data scattered across different places and institutions is almost impossible, or comes at huge cost.



Examples


Embodiment 1

[0069] In a specific embodiment, a gradient boosting tree modeling method is provided. As shown in Figure 1, the method includes:

[0070] Step S10: Perform an intersection operation, keyed on the identifiers, between the first sample data set (which carries label values) and multiple second sample data sets, to obtain a first data intersection with label values and multiple second data intersections.

[0071] In an example, taking two-party data as an example, a first sample data set and a second sample data set may be acquired. The first sample data set includes a plurality of first identifiers forming a column, and a first label name and a plurality of first feature names forming a row. For example, the first identifiers may include persons such as Zhang San, Li Si, Wang Wu, and Zhao Liu; the first label name may be whether insurance was purchased; and the first feature names may include weight, income, height, and the like. The first sample data set also includes a first label ...
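Step S10 above can be sketched as follows. This is a minimal illustration with hypothetical plain-text data; in the multi-party setting the patent targets, the intersection would be computed with a privacy-preserving protocol (e.g. private set intersection) so that neither party learns identifiers outside the overlap.

```python
# Sketch of Step S10: align two parties' sample sets on shared identifiers.
# All names and features below are hypothetical examples from the embodiment.

party_a = {  # first sample data set: identifier -> (label, features)
    "Zhang San": (1, {"weight": 70, "income": 9000, "height": 175}),
    "Li Si":     (0, {"weight": 60, "income": 5000, "height": 168}),
    "Wang Wu":   (1, {"weight": 80, "income": 12000, "height": 180}),
}
party_b = {  # second sample data set: identifier -> features only (no labels)
    "Li Si":    {"age": 30, "city_tier": 2},
    "Wang Wu":  {"age": 45, "city_tier": 1},
    "Zhao Liu": {"age": 28, "city_tier": 3},
}

# Identifiers present on both sides define the training population.
shared_ids = sorted(party_a.keys() & party_b.keys())

first_intersection = {i: party_a[i] for i in shared_ids}   # keeps label values
second_intersection = {i: party_b[i] for i in shared_ids}  # features only

print(shared_ids)  # ['Li Si', 'Wang Wu']
```

Only the rows for the shared identifiers participate in training; each party keeps its own feature columns locally.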

Embodiment 2

[0102] In one embodiment, multiple machines are provided, and a model training program runs on each machine. The sample data held by each machine is fed into its local training program, and the programs are started at the same time. While the programs run, they exchange certain encrypted information to jointly generate a model. The whole process is called multi-party joint gradient boosting tree modeling.

[0103] Take machine A and machine B as a specific example. A and B hold the first sample data set Qa and the second sample data set Qb, respectively. Qa includes a plurality of first identifiers forming a column, a label name and a plurality of first feature names forming a row, and a plurality of first label values and first feature values corresponding to the first identifiers; Qb includes a plurality of second identifiers forming a column, a...

Embodiment 3

[0125] In another specific embodiment, a gradient boosting tree modeling device is provided. As shown in Figure 8, the device includes:

[0126] a data set intersection module 10, configured to perform an intersection operation, keyed on the identifiers, between the first sample data set with label values and a plurality of second sample data sets, to obtain a first data intersection with label values and a plurality of second data intersections;

[0127] a target value encryption module 20, configured to obtain the target value of the first decision tree from the label values and the predicted values of the previous decision tree, and to encrypt the target value of the first decision tree to obtain the encrypted target value of the first decision tree;

[0128] an optimal split point determining module 30, configured to determine the optimal split point of the first decision tree according to the target value of the first decision tree, the first data intersection, the encrypted target va...
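The work of module 20 can be sketched as follows, under two stated assumptions not fixed by the patent: a squared-error loss (so the target values are simply the residuals, label minus previous prediction), and a toy Paillier cryptosystem with tiny, insecure parameters. The sketch shows why an additively homomorphic scheme fits this setting: the party without labels can aggregate encrypted target values without ever decrypting them.

```python
import math
import random

# Toy Paillier cryptosystem (tiny primes, insecure; illustration only).
p, q = 47, 59
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)  # valid because we fix the generator g = n + 1

def encrypt(m):
    """Paillier encryption: c = (1 + n)^m * r^n mod n^2."""
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(n + 1, m % n, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    """Paillier decryption, mapping the result back to a signed range."""
    m = ((pow(c, lam, n2) - 1) // n * mu) % n
    return m - n if m > n // 2 else m

# Module 20's first step: target values of the current tree. With squared-error
# loss these are the residuals label - previous_prediction (our assumption).
labels = [1, 0, 1]
prev_pred = [0.4, 0.3, 0.6]
SCALE = 10  # fixed-point scaling so residuals become small integers
targets = [round((y - f) * SCALE) for y, f in zip(labels, prev_pred)]

# Second step: encrypt the target values before sending them to the other party.
enc_targets = [encrypt(t) for t in targets]

# Additive homomorphism: the product of ciphertexts decrypts to the sum of the
# plaintexts, so per-bucket sums for split finding need no decryption key.
enc_sum = math.prod(enc_targets) % n2
assert decrypt(enc_sum) == sum(targets)
```

In a real deployment the key sizes would be thousands of bits and a vetted library would be used; the point here is only the shape of the protocol.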



Abstract

The embodiments of the invention provide a gradient boosting tree modeling method, device, and terminal. The method comprises the steps of: performing an intersection operation, keyed on an identifier, between a first sample data set with label values and a plurality of second sample data sets, to obtain a first data intersection with label values and a plurality of second data intersections; obtaining a target value of the first decision tree according to the label values, and encrypting the target value of the first decision tree to obtain an encrypted target value of the first decision tree; determining an optimal split point of the first decision tree according to the target value of the first decision tree, the first data intersection, the encrypted target value of the first decision tree, and the second data intersections; splitting the node at the optimal split point of the first decision tree to obtain a second decision tree; after iterating the first decision tree for a preset number of training rounds, generating an Nth decision tree, where N is greater than or equal to 2; and obtaining a gradient boosting tree model from the first through Nth decision trees. Because multi-party joint gradient boosting tree modeling is adopted, no party's private data is leaked.

Description

Technical Field [0001] The present invention relates to the technical field of machine learning, and in particular to a gradient boosting tree modeling method, device, and terminal. Background [0002] The gradient boosting decision tree (GBDT) algorithm is an iterative decision tree algorithm consisting of multiple decision trees, whose outputs are accumulated to form the final prediction. Among traditional machine learning algorithms, it is one of the best at fitting real data distributions. With the development of algorithms and big data, algorithms and computing power are no longer the bottlenecks hindering the development of AI; real and effective data sources in various fields are the most valuable resources. However, barriers between data sources are hard to break, and in most industries data exists in silos. Due to issues such as industry competition, privacy security, and complex administra...
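The accumulation described in the background ("the conclusions of all trees are accumulated") can be illustrated with a minimal single-feature gradient boosting sketch using decision stumps and squared-error loss. This shows only the plain, single-party algorithm, not the patent's federated protocol; all data is made up.

```python
def fit_stump(x, residuals):
    """Find the one-feature split threshold minimizing squared error."""
    best = None
    for t in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        if not left or not right:
            continue
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    _, t, lv, rv = best
    return lambda xi: lv if xi <= t else rv

def fit_gbdt(x, y, rounds=20, lr=0.5):
    """Each round fits a stump to the residuals (the negative gradient of
    squared-error loss) and adds its shrunken output to the ensemble."""
    trees, pred = [], [0.0] * len(x)
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        trees.append(stump)
        pred = [pi + lr * stump(xi) for pi, xi in zip(pred, x)]
    # Final prediction: the accumulated sum over all trees.
    return lambda xi: sum(lr * t(xi) for t in trees)

x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 1.2, 3.0, 3.1]
model = fit_gbdt(x, y)
```

Each added stump strictly reduces the training error, and the model's output is nothing but the sum of the stumps' leaf values, which is the property the federated protocol must preserve while keeping labels and features split across parties.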

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06N20/00
CPC: G06N20/00
Inventors: 宋传园, 冯智, 张宇
Owner BEIJING BAIDU NETCOM SCI & TECH CO LTD