Decision tree binning computation optimization method, device, medium, and computer program product

By partitioning and parallelizing the memory storage data and index table of decision tree binning, the problem of high computational complexity in decision tree binning in existing technologies is solved, thereby improving computational efficiency.

CN114637591B · Active · Publication Date: 2026-06-12 · WEBANK (CHINA)

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
WEBANK (CHINA)
Filing Date
2020-12-15
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

In existing technologies, decision tree binning is performed by CPU-based serial computation; because the large numbers involved have an extremely large number of bits, the computational complexity is extremely high and the computational efficiency is low.

Method used

The memory storage data and index table of the target decision tree are acquired, and the node bin labels are segmented by decision tree bin size into bin label segmentation batches. The decision tree bin batch data for each segmentation batch is then extracted according to the index table, a number of parallel computing threads is matched to each batch according to its bin sizes, and the binning calculation is performed in parallel.

Benefits of technology

It reduces computational complexity, improves the computational efficiency of decision tree binning, and overcomes the problem of low computational efficiency under the serial computation method.

✦ Generated by Eureka AI based on patent content.

Figure CN114637591B_ABST

Abstract

The application discloses a decision tree binning calculation optimization method, equipment, medium and computer program product. The decision tree binning calculation optimization method comprises the following steps: obtaining memory storage data corresponding to a target decision tree and an index table corresponding to the memory storage data, wherein the index table at least comprises a node binning label corresponding to each decision tree binning; then, based on the size of each decision tree binning, the node binning label is segmented to obtain a segmented batch of each binning label; then, based on the index table, each decision tree binning batch data is extracted; then, based on the decision tree binning size information corresponding to each segmented batch of binning label, a corresponding number of parallel computing threads is matched for each decision tree binning batch data; then, based on the number of parallel computing threads, parallel decision tree binning calculation is performed on each decision tree binning batch data to obtain a target decision tree binning calculation result. The application solves the technical problem of low calculation efficiency during decision tree binning.

Description

Technical Field

[0001] This application relates to the field of artificial intelligence in financial technology (Fintech), and more particularly to a decision tree binning computation optimization method, device, medium, and computer program product.

Background Technology

[0002] With the continuous development of fintech, especially internet fintech, more and more technologies (such as distributed systems, blockchain, artificial intelligence, etc.) are being applied in the financial field. However, the financial industry is also placing higher demands on technology, such as on the distribution of tasks to be completed.

[0003] With the continuous development of computer software and artificial intelligence, machine learning is being applied in more and more fields. Currently, decision tree binning based on federated learning usually has to handle large numbers in the encrypted (ciphertext) state, which requires summing the multiple large numbers in each bin. This large number summation for each decision tree bin is usually performed by CPU-based serial computation. However, since the number of bits in each large number is extremely large, the computational complexity of this approach is extremely high, resulting in extremely low computational efficiency.

Summary of the Invention

[0004] The main objective of this application is to provide a method, device, medium, and computer program product for optimizing decision tree binning computation, aiming to solve the technical problem of low computational efficiency in decision tree binning in the prior art.

[0005] To achieve the above objectives, this application provides a decision tree binning calculation optimization method, which is applied to a decision tree binning calculation optimization device. The decision tree binning calculation optimization method includes:

[0006] Obtain the memory storage data corresponding to the target decision tree and the index table corresponding to the memory storage data, wherein the index table includes at least one node bin label;

[0007] Based on the size of each decision tree bin, the bin labels of each node are segmented to obtain the segmentation batch of each bin label;

[0008] Based on the index table, the decision tree binning batch data corresponding to each binning label segmentation batch is extracted from the memory storage data;

[0009] Based on the decision tree bin size information corresponding to each bin label segmentation batch, match a corresponding number of parallel computing threads to each decision tree binning batch data;

[0010] Based on the number of parallel computing threads, decision tree binning calculations are performed in parallel on each batch of decision tree binning data to obtain the target decision tree binning calculation results.

[0011] This application also provides a decision tree binning computation optimization device, which is a virtual device and is applied to a decision tree binning computation optimization equipment. The decision tree binning computation optimization device includes:

[0012] The acquisition module is used to acquire the memory storage data corresponding to the target decision tree and the index table corresponding to the memory storage data, wherein the index table includes at least one node bin label;

[0013] The segmentation module is used to segment the bin labels of each node based on the size of each bin in the decision tree, and obtain the segmentation batch of each bin label;

[0014] The extraction module is used to extract the decision tree binning batch data corresponding to each binning label segmentation batch from the memory storage data according to the index table;

[0015] The matching module is used to match, based on the decision tree bin size information corresponding to each bin label segmentation batch, the number of parallel computing threads corresponding to each decision tree binning batch data;

[0016] The parallel computing module is used to perform decision tree binning calculations in parallel on each batch of decision tree binning data based on the number of parallel computing threads, and to obtain the target decision tree binning calculation result.

[0017] This application also provides a decision tree binning computation optimization device, which is a physical device. The decision tree binning computation optimization device includes: a memory, a processor, and a program of the decision tree binning computation optimization method stored in the memory and executable on the processor. When the program of the decision tree binning computation optimization method is executed by the processor, it can implement the steps of the decision tree binning computation optimization method as described above.

[0018] This application also provides a medium, which is a readable storage medium, on which a program implementing the decision tree binning computation optimization method is stored. When the program is executed by a processor, it implements the steps of the decision tree binning computation optimization method as described above.

[0019] This application also provides a computer program product, including a computer program that, when executed by a processor, implements the steps of the decision tree binning computation optimization method described above.

[0020] This application provides a method, device, medium, and computer program product for optimizing decision tree binning computation. Compared with the existing technology, which uses CPU-based serial computation to perform large number summation on each decision tree bin, this application first obtains the memory storage data corresponding to the target decision tree and an index table corresponding to the memory storage data, where the index table includes at least one node bin label. Based on the size of each decision tree bin, the node bin labels are segmented to obtain the bin label segmentation batches. Then, according to the index table, the decision tree bin batch data corresponding to each bin label segmentation batch is extracted from the memory storage data. It should be noted that a larger decision tree bin requires more parallel computing threads, so if computation is not performed in batches, the computing system will always run with the maximum number of threads required by the largest decision tree bin, which increases computational complexity. Therefore, based on the decision tree bin size information corresponding to each bin label segmentation batch, a number of parallel computing threads is matched to each decision tree bin batch data. That is, an appropriate number of threads is allocated to each batch according to the sizes of the decision tree bins it contains, so that the number of parallel computing threads is not excessive, thereby reducing computational complexity. Furthermore, replacing the original CPU-based serial computation with parallel computation reduces the computational complexity further. Finally, based on the number of parallel computing threads, decision tree binning calculations are performed in parallel on each decision tree bin batch data to obtain the target decision tree binning calculation result. This overcomes the technical defect of the existing technology, in which CPU-based serial large number summation on each decision tree bin has extremely high computational complexity and low computational efficiency because of the extremely large number of bits in each large number, and thus improves the computational efficiency of decision tree binning.

Attached Figure Description

[0021] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application.

[0022] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, for those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0023] Figure 1 is a flowchart of the first embodiment of the decision tree binning calculation optimization method of this application;

[0024] Figure 2 is a schematic diagram of the index table in the decision tree binning calculation optimization method of this application;

[0025] Figure 3 is a flowchart of the second embodiment of the decision tree binning calculation optimization method of this application;

[0026] Figure 4 is a schematic diagram of the bitonic co-sorting process in the decision tree binning computation optimization method of this application;

[0027] Figure 5 is a schematic diagram of the large number reduction summation process in the decision tree binning computation optimization method of this application;

[0028] Figure 6 is a schematic diagram of the device structure of the hardware operating environment involved in the embodiments of this application.

[0029] The purpose, features, and advantages of this application will be further explained in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed Implementation

[0030] It should be understood that the specific embodiments described herein are for illustrative purposes only and are not intended to limit the scope of this application.

[0031] This application provides a method for optimizing decision tree binning computation. In the first embodiment of this method, referring to Figure 1, the decision tree binning computation optimization method includes:

[0032] Step S10: Obtain the memory storage data corresponding to the target decision tree and the index table corresponding to the memory storage data, wherein the index table includes at least one node bin label;

[0033] In this embodiment, it should be noted that the decision tree binning calculation optimization method is applied to decision tree binning based on federated learning. In this case, the data in the decision tree bins are all homomorphically encrypted large numbers. For example, each data in the decision tree bin needs to be stored using 1024 bits, so each data in the decision tree bin can be considered as a large number. Moreover, the decision tree bin may include multiple decision tree bin data, which results in a large amount of computation and high computational complexity during decision tree binning calculation.

[0034] Additionally, it should be noted that the memory storage data refers to the decision tree binning data within each decision tree bin of the target decision tree, stored in GPU memory in binary form. The index table includes at least one index number and one node binning label, where each index number corresponds to one node binning label. The index number is the storage location label of the decision tree binning data in GPU memory and is used to look up the decision tree binning data in the memory storage data. The node binning label is the identifier of a decision tree bin and consists of the tree node number, the feature number, and the bin number. The tree node number identifies the tree node to which the decision tree bin belongs, the feature number identifies the feature to which it belongs, and the bin number identifies the bin itself. For example, a node bin label of 123 denotes the decision tree bin with tree node number 1, feature number 2, and bin number 3.

[0035] Obtain the memory storage data corresponding to the target decision tree and the index table corresponding to the memory storage data, where the index table includes at least one node bin label. Specifically, by traversing the target decision tree, the decision tree bin data of the target decision tree is collected in list form and stored in GPU memory to obtain the memory storage data. The target decision tree includes at least one decision tree bin, and each decision tree bin includes at least one decision tree bin data. Then, the node bin label corresponding to each decision tree bin data and the storage location label of each decision tree bin data in the memory storage data are obtained, and the index table is constructed from the correspondence between the node bin labels and the storage location labels. Since a decision tree bin may contain one or more decision tree bin data, and one node bin label corresponds to one decision tree bin, one node bin label can correspond to one or more decision tree bin data. Figure 2 shows the index table, where the vertical axis represents the node binning label and the horizontal axis represents the storage location label, i.e., the index number. 001, 002, 003, and 004 are node binning labels, and the remaining numbers in the index table are storage location labels, with -1 indicating that the corresponding decision tree binning data does not exist in the memory storage data. "Node id" is the tree node number, "Feature id" is the feature number, and "Value id" is the bin number.

[0036] The step of obtaining the memory storage data corresponding to the target decision tree and the index table corresponding to the memory storage data, wherein the index table includes at least one node bin label, includes:

[0037] Step S11: Traverse the target decision tree to obtain the node bin labels and memory storage data corresponding to each decision tree bin.

[0038] In this embodiment, the target decision tree is traversed to obtain the node bin labels corresponding to each decision tree bin and the memory storage data. Specifically, by traversing the target decision tree, a list of decision tree bin data and the node bin labels corresponding to each decision tree bin data are obtained, and the list of decision tree bin data is stored in GPU memory to obtain the memory storage data.

[0039] The step of traversing the target decision tree to obtain the node bin labels corresponding to each decision tree bin and the memory-stored data includes:

[0040] Step S111: Traverse the target decision tree to obtain the bin list data jointly corresponding to the decision tree bins and the bin label information corresponding to each decision tree bin;

[0041] In this embodiment, specifically, the target decision tree is traversed to collect the decision tree bin data of each decision tree bin in list form as the bin list data, and to obtain the tree node number, feature number, and bin number of each decision tree bin data in the target decision tree as its bin label information.

[0042] Step S112: Convert the bin list data and store it in GPU memory to obtain the memory storage data;

[0043] In this embodiment, the bin list data is converted to GPU memory to obtain the memory storage data. Specifically, the bin list data is converted to binary data form and stored in GPU memory to obtain the memory storage data.
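
As a rough illustration of the conversion in step S112, the sketch below packs a list of large integers into one flat binary buffer with a fixed slot width. It only models the layout on the host side; the 128-byte slot width follows the 1024-bit example given earlier, and the names `WIDTH_BYTES`, `pack_bin_list`, and `unpack_at` are hypothetical, not taken from the application.

```python
# Sketch only: a host-side stand-in for the binary layout copied to GPU memory.
WIDTH_BYTES = 128  # assumed slot width: 1024 bits per value


def pack_bin_list(values):
    """Serialize each integer to WIDTH_BYTES little-endian bytes, back to back."""
    buf = bytearray()
    for v in values:
        buf += v.to_bytes(WIDTH_BYTES, "little")
    return bytes(buf)


def unpack_at(buf, index):
    """Read back the value stored in slot `index` (its storage location label)."""
    start = index * WIDTH_BYTES
    return int.from_bytes(buf[start:start + WIDTH_BYTES], "little")


values = [2**1023 + 5, 7, 2**512]
buf = pack_bin_list(values)
print(len(buf), unpack_at(buf, 1))
```

With a fixed slot width, the slot index alone is enough to locate a value, which is what lets an index number serve as a storage location label.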

[0044] Step S113: Generate each node bin label based on the tree node number, feature number, and bin number in each bin label information.

[0045] In this embodiment, based on the tree node number, feature number, and bin number in each bin label information, each node bin label is generated. Specifically, the tree node number, feature number, and bin number in each bin label information are arranged and combined in a preset order to obtain the node bin label corresponding to each decision tree bin data.
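
One possible reading of "arranged and combined in a preset order" is a fixed-width concatenation of the three numbers. The sketch below assumes the order node, feature, bin and a fixed number of decimal digits per field; both the encoding and the helper names `make_label` and `parse_label` are assumptions for illustration.

```python
def make_label(node_id, feature_id, bin_id, width=1):
    """Concatenate node, feature, and bin numbers, each padded to `width` digits."""
    parts = (node_id, feature_id, bin_id)
    if any(p < 0 or p >= 10 ** width for p in parts):
        raise ValueError("each number must fit in `width` decimal digits")
    return "".join(f"{p:0{width}d}" for p in parts)


def parse_label(label, width=1):
    """Split a label back into (node_id, feature_id, bin_id)."""
    return tuple(int(label[i:i + width]) for i in range(0, 3 * width, width))


print(make_label(1, 2, 3))   # label for node 1, feature 2, bin 3
print(parse_label("123"))
```

A fixed field width keeps the label unambiguous once any of the three numbers exceeds one digit.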

[0046] Step S12: Based on the storage location of the decision tree binning data in each decision tree binning in the memory storage data, generate a storage location label corresponding to each decision tree binning data.

[0047] In this embodiment, based on the storage location of the decision tree binning data in each decision tree binning in the memory storage data, a storage location label corresponding to each decision tree binning data is generated. Specifically, based on the storage location of the decision tree binning data in each decision tree binning in the memory storage data, an index number corresponding to each decision tree binning data is generated, and each index number is used as the storage location label corresponding to the decision tree binning data.

[0048] Step S13: Construct the index table based on the association between the node binning labels and the corresponding storage location labels of each decision tree binning data.

[0049] In this embodiment, an index table is constructed based on the association between the node binning labels and the corresponding storage location labels corresponding to each decision tree binning data. Specifically, the association between each node binning label and each storage location label is determined based on the correspondence between each decision tree binning data and each node binning label, and the correspondence between each decision tree binning data and each storage location label, and the index table is constructed accordingly.
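
Steps S12 and S13 can be sketched as follows, assuming the storage location label of an item is simply its slot position in the stored list and that rows are padded with -1 as in the index table described for Figure 2. The function name `build_index_table` and the sample labels are hypothetical.

```python
from collections import defaultdict


def build_index_table(labels_per_item):
    """labels_per_item[i] is the node bin label of the item stored in slot i.

    Returns {label: [slot, ...]} with rows padded to equal length using -1.
    """
    rows = defaultdict(list)
    for slot, label in enumerate(labels_per_item):
        rows[label].append(slot)
    width = max((len(slots) for slots in rows.values()), default=0)
    return {label: slots + [-1] * (width - len(slots))
            for label, slots in rows.items()}


labels_per_item = ["001", "002", "001", "003", "001", "002"]
table = build_index_table(labels_per_item)
for label, slots in table.items():
    print(label, slots)
```

The number of entries in a row that are not -1 is the size of that decision tree bin, so the same table can also supply the sizes used later for segmentation.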

[0050] Step S20: Based on the size of each decision tree bin, segment the bin labels of each node to obtain the segmentation batch of each bin label;

[0051] In this embodiment, it should be noted that the size of the decision tree bin is the number of decision tree bin data contained in the decision tree bin.

[0052] Based on the size of each decision tree bin, the node bin labels are segmented to obtain the bin label segmentation batches. Specifically, when traversing the target decision tree, the number of decision tree bin data contained in each decision tree bin is counted to obtain the size of each decision tree bin. Then, based on these sizes, the node bin labels in the index table are divided into different batches of node bin label sets, giving the bin label segmentation batches. For example, if the target decision tree includes 100 decision tree bins, the 50 decision tree bins with sizes from 0 to 16 are taken as one bin label segmentation batch, and the 50 decision tree bins with sizes from 17 to 32 are taken as another.
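
The size-range example above can be sketched as a simple bucketing of labels. The ranges 0 to 16, 17 to 32, and so on are taken from the example; the fixed step of 16, the function name `segment_by_size`, and the sample sizes are assumptions.

```python
def segment_by_size(bin_sizes, step=16):
    """Group node bin labels into batches by size range: 0..16, 17..32, ..."""
    batches = {}
    for label, size in bin_sizes.items():
        bucket = 0 if size == 0 else (size - 1) // step
        batches.setdefault(bucket, []).append(label)
    # Return the batches ordered from the smallest size range to the largest.
    return [batches[b] for b in sorted(batches)]


bin_sizes = {"001": 3, "002": 20, "003": 16, "004": 32, "005": 17}
batches = segment_by_size(bin_sizes)
print(batches)
```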

[0053] Step S30: Based on the index table, extract the decision tree binning batch data corresponding to each binning label segmentation batch from the memory storage data;

[0054] In this embodiment, based on the index table, the decision tree binning batch data corresponding to each binning label segmentation batch is extracted from the memory storage data. Specifically, the index table is queried for the storage location labels corresponding to each node binning label in each binning label segmentation batch, and then, based on these storage location labels, the decision tree binning data corresponding to each node binning label in the batch is extracted from the memory storage data to obtain the decision tree binning batch data corresponding to that binning label segmentation batch.
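
A minimal sketch of this lookup, assuming the index table is a mapping from label to a row of slot numbers padded with -1 and the memory storage data is modeled as a plain list. The name `extract_batch` and the sample values are hypothetical.

```python
def extract_batch(index_table, storage, batch_labels):
    """Collect the stored values for every label in one segmentation batch."""
    batch = {}
    for label in batch_labels:
        # -1 marks a slot with no data, so it is skipped.
        batch[label] = [storage[slot] for slot in index_table[label] if slot != -1]
    return batch


storage = [10, 20, 30, 40, 50, 60]
index_table = {"001": [0, 2, 4], "002": [1, 5, -1], "003": [3, -1, -1]}
batch = extract_batch(index_table, storage, ["002", "003"])
print(batch)
```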

[0055] Step S40: Based on the bin size information of the decision tree corresponding to each bin label segment batch, the number of parallel computing threads corresponding to each decision tree bin batch data is matched.

[0056] In this embodiment, it should be noted that the decision tree bin size information includes the size of each decision tree bin corresponding to the bin label segmentation batch.

[0057] Based on the bin size information of the decision tree corresponding to each bin label segmentation batch, the number of parallel computing threads corresponding to each decision tree bin batch data is matched. Specifically, based on the size of each decision tree bin corresponding to each bin label segmentation batch, the number of threads corresponding to each bin label segmentation batch is queried to obtain the number of parallel computing threads corresponding to each decision tree bin batch data.

[0058] The decision tree bin size information includes a decision tree bin size range.

[0059] The step of matching the number of parallel computing threads corresponding to each decision tree binning batch data, based on the decision tree bin size information corresponding to each bin label segmentation batch, includes:

[0060] Step S41: Obtain the decision tree bin size range corresponding to each bin label segmentation batch;

[0061] In this embodiment, specifically, the size of each decision tree bin in each bin label segmentation batch is obtained, and then the range of decision tree bin sizes corresponding to each bin label segmentation batch is determined based on the size of each decision tree bin in each bin label segmentation batch.

[0062] Step S42: Match a corresponding number of threads to the upper limit of each decision tree bin size range to obtain the number of parallel computing threads for each decision tree binning batch data.

[0063] In this embodiment, it should be noted that, in one feasible solution, the number of parallel computing threads is set to the smallest even number greater than half the upper limit of the decision tree bin size range; in another feasible solution, it is set to the smallest even number of the form 2^N greater than half the upper limit of the decision tree bin size range, where N is a positive integer.
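
The two thread-count rules can be written out as follows. The first function follows the text literally ("greater than" is treated as strict); the second reads the other variant as the smallest power of two 2^N, N ≥ 1, greater than half the upper limit, which is an interpretation of the text. Both function names are hypothetical.

```python
def threads_even(upper):
    """Smallest even number strictly greater than upper / 2."""
    n = upper // 2 + 1          # smallest integer strictly greater than upper / 2
    return n if n % 2 == 0 else n + 1


def threads_pow2(upper):
    """Smallest power of two 2**N (N >= 1) strictly greater than upper / 2."""
    n = 2
    while 2 * n <= upper:       # n <= upper / 2, so n is not yet large enough
        n *= 2
    return n


for upper in (16, 17, 32):
    print(upper, threads_even(upper), threads_pow2(upper))
```

Integer arithmetic is used throughout so that odd upper limits such as 17 do not depend on floating-point rounding.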

[0064] Step S50: Based on the number of parallel computing threads, perform decision tree binning calculations in parallel on each batch of decision tree binning data to obtain the target decision tree binning calculation result.

[0065] In this embodiment, based on the number of parallel computing threads, decision tree binning calculations are performed in parallel on each batch of decision tree binning data to obtain the target decision tree binning calculation result. Specifically, parallel computing threads corresponding to the number of parallel computing threads for each batch of decision tree binning data are allocated sequentially, and decision tree binning calculations are performed until all batches of decision tree binning data have completed the calculation and the target decision tree binning calculation result is obtained.

[0066] The step of performing decision tree binning calculations in parallel on each batch of decision tree binning data based on the number of parallel computing threads to obtain the target decision tree binning calculation result includes:

[0067] Step S51: Based on the number of parallel computing threads for each decision tree binning batch, allocate corresponding GPU parallel computing thread groups for each decision tree binning batch data.

[0068] In this embodiment, it should be noted that the GPU parallel computing thread group includes the GPU parallel computing threads corresponding to the number of parallel computing threads.

[0069] Step S52: Based on each GPU parallel computing thread group, perform large number reduction summation on the decision tree binning data in each decision tree binning batch data to obtain the target decision tree binning calculation result.

[0070] In this embodiment, based on each GPU parallel computing thread group, the decision tree binning data in each decision tree binning batch data is subjected to large number reduction summation to obtain the target decision tree binning calculation result. Specifically, for each GPU parallel computing thread group, the following steps are performed:

[0071] Using the GPU parallel computing threads in the thread group, large number reduction summation is performed in turn on each decision tree bin in the decision tree bin batch data corresponding to that thread group. Once all decision tree bin batch data have been processed, the large number reduction summation result for every decision tree bin, i.e., the target decision tree bin calculation result, is obtained. The large number reduction summation proceeds as follows. Suppose decision tree bin M contains four decision tree bin data A, B, C, and D, and the GPU parallel computing thread group consists of parallel computing threads X and Y. Thread X computes A+C while thread Y computes B+D. One of the threads in the group then adds the two partial sums to obtain A+B+C+D, which completes the large number reduction summation of decision tree bin M.
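
The reduction pattern in this example can be simulated serially as below. Each pass of the inner loop stands for one round in which the listed "threads" would run at the same time on a GPU; Python's built-in integers stand in for the large numbers. The name `reduce_sum` is hypothetical, and this is a sketch of the pattern rather than the GPU kernel itself.

```python
def reduce_sum(values):
    """Strided pairwise reduction: in each round, slot t absorbs slot t + half."""
    data = list(values)
    if not data:
        return 0
    n = len(data)
    while n > 1:
        half = (n + 1) // 2
        # On a GPU, each t below would be handled by its own thread in parallel.
        for t in range(n - half):
            data[t] += data[t + half]
        n = half
    return data[0]


A, B, C, D = (2**1023 + i for i in range(4))
total = reduce_sum([A, B, C, D])   # round 1: A+C and B+D; round 2: their sum
print(total == A + B + C + D)
```

For four values the first round uses two threads and the second round one, matching the X and Y example; a bin of size s never needs more than about s/2 threads in any round.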

[0072] This application provides an optimized method for decision tree binning computation. Compared with the existing technology, which uses CPU-based serial computation to sum large numbers for each decision tree bin, this application first obtains the memory storage data corresponding to the target decision tree and an index table corresponding to the memory storage data, where the index table includes at least one node bin label. Based on the size of each decision tree bin, the node bin labels are segmented to obtain the bin label segmentation batches. Then, according to the index table, the decision tree bin batch data corresponding to each bin label segmentation batch is extracted from the memory storage data. It should be noted that a larger decision tree bin requires more parallel computing threads, so if computation is not performed in batches, the computing system will always run with the maximum number of threads required by the largest decision tree bin, which increases computational complexity. Therefore, based on the decision tree bin size information corresponding to each bin label segmentation batch, a number of parallel computing threads is matched to each decision tree binning batch data. That is, an appropriate number of threads is allocated to each batch according to the sizes of the decision tree bins it contains, so that the number of parallel computing threads is not excessive, thereby reducing computational complexity. Furthermore, replacing the original CPU-based serial computation with parallel computation reduces the computational complexity further. Finally, based on the number of parallel computing threads, decision tree binning calculations are performed in parallel on each decision tree binning batch data to obtain the target decision tree binning calculation result. This overcomes the technical defect of the existing technology, in which CPU-based serial large number summation on each decision tree bin has extremely high computational complexity and low computational efficiency because of the extremely large number of bits in the large numbers, and thus improves the computational efficiency of decision tree binning.
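
A small worked comparison makes the batching argument concrete. It assumes, purely for illustration, that a bin of size s needs about s/2 threads and that every bin in a batch is given the thread count of the largest bin in that batch; `threads_for`, `total_threads`, and the sample sizes are hypothetical.

```python
def threads_for(size):
    """Assumed rule: one thread per pair of values, at least one thread."""
    return max(1, (size + 1) // 2)


def total_threads(batches):
    """Every bin in a batch is launched with the thread count of the batch's largest bin."""
    return sum(threads_for(max(batch)) * len(batch) for batch in batches)


sizes = [4, 6, 8, 30, 32]
unbatched = total_threads([sizes])                  # all bins sized for the largest bin
batched = total_threads([[4, 6, 8], [30, 32]])      # two batches of similar sizes
print(unbatched, batched)
```

Under these assumptions the unbatched run allocates 80 threads and the batched run 44, because the three small bins no longer inherit the thread count of the size-32 bin.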

[0073] Furthermore, referring to Figure 3, based on the first embodiment of this application, in another embodiment of this application, the step of segmenting the node bin labels based on the size of each decision tree bin to obtain the bin label segmentation batches includes:

[0074] Step S21: Based on the size of each decision tree bin, sort the bin labels of each node to obtain a label sorting sequence;

[0075] In this embodiment, based on the size of each decision tree bin, the node bin labels are sorted to obtain a label sorting sequence. Specifically, a bitonic co-sort is applied to the sizes of the decision tree bins together with the node bin labels, so that the node bin labels are sorted by decision tree bin size to obtain the label sorting sequence. Figure 4 illustrates the bitonic co-sorting process, where “NID FID VAL” represents the node bin labels, “Node size” represents the decision tree bin sizes, and “BitionicSort” represents the bitonic co-sort. The left side of “BitionicSort” shows the decision tree bin sizes and node bin labels before the bitonic co-sort, and the right side shows them after the co-sort, i.e., the label sorting sequence.
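
A bitonic co-sort can be sketched as a standard bitonic sorting network in which the labels are swapped together with the sizes. The version below runs serially and assumes the input length is a power of two (a real implementation would pad the input); the name `bitonic_cosort` is hypothetical and the sketch does not reproduce the GPU kernel.

```python
def bitonic_cosort(sizes, labels):
    """Sort `sizes` ascending with a bitonic network, carrying `labels` along."""
    n = len(sizes)
    if n & (n - 1):
        raise ValueError("length must be a power of two (pad the input first)")
    keys, vals = list(sizes), list(labels)
    k = 2
    while k <= n:                  # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:              # compare distance within this merge step
            # On a GPU, each index i would be handled by its own thread.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    out_of_order = (keys[i] > keys[partner]) if ascending \
                        else (keys[i] < keys[partner])
                    if out_of_order:
                        keys[i], keys[partner] = keys[partner], keys[i]
                        vals[i], vals[partner] = vals[partner], vals[i]
            j //= 2
        k *= 2
    return keys, vals


sorted_sizes, sorted_labels = bitonic_cosort([5, 1, 8, 3], ["a", "b", "c", "d"])
print(sorted_sizes, sorted_labels)
```

The compare-and-swap pattern depends only on the indices, not on the data, which is why this kind of sort maps well onto a fixed group of parallel threads.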

[0076] Step S22: Based on the information on the degree of change in bin size corresponding to the label sorting sequence, segment the label sorting sequence to obtain each bin label segmentation batch.

[0077] In this embodiment, it should be noted that the bin size change information indicates how much the decision tree bin sizes change along the label sorting sequence.

[0078] Based on the bin size change information corresponding to the label sorting sequence, the label sorting sequence is segmented into the bin label segmentation batches such that the change in decision tree bin size within each bin label segmentation batch is less than a preset size change threshold. This ensures that the decision tree bins belonging to the same bin label segmentation batch do not differ too much in size. Consequently, when computing memory is allocated simultaneously for all decision tree bins in the same bin label segmentation batch, excessive idle computing memory is not allocated, which reduces the proportion of idle computing memory during binning calculation and reduces the calculation spent on idle computing memory. Therefore, the computational efficiency of binning calculation is improved.

[0079] The information regarding the degree of change in bin size includes information on the gradient of bin size arrangement.

[0080] The step of segmenting the label sorting sequence based on the information on the degree of change in bin size corresponding to the label sorting sequence to obtain each bin label segmentation batch includes:

[0081] Step S221: Calculate the bin size arrangement gradient information corresponding to the label sorting sequence based on the size of each decision tree bin.

[0082] In this embodiment, it should be noted that the bin size arrangement gradient information represents the degree of size change between the decision tree bins corresponding to every two adjacent node bin labels in the label sorting sequence.

[0083] Based on the size of each decision tree bin, the bin size arrangement gradient information corresponding to the label sorting sequence is calculated. Specifically, based on the size of each decision tree bin and a preset bin size arrangement gradient calculation formula, each bin size arrangement gradient of the label sorting sequence is calculated to obtain the bin size arrangement gradient information. The preset bin size arrangement gradient calculation formula is as follows:

[0084]

[0085] where $\tau$ is the bin size arrangement gradient, $node_{n+1}$ is the size of the decision tree bin corresponding to the (n+1)th node bin label in the label sorting sequence, and $node_n$ is the size of the decision tree bin corresponding to the nth node bin label.

[0086] Step S222: Based on the bin size arrangement gradient information, select each segmentation point in the label sorting sequence;

[0087] In this embodiment, segmentation points are selected in the label sorting sequence based on the bin size arrangement gradient information. Specifically, each bin size arrangement gradient is compared with a preset bin size arrangement gradient threshold, and every gradient greater than the threshold is selected as a target bin size arrangement gradient. Each target bin size arrangement gradient is computed from two decision tree bin sizes, and the boundary in the label sorting sequence between the two node bin labels corresponding to those two sizes is taken as a segmentation point. For example, if gradient A is greater than the preset bin size arrangement gradient threshold, the two node bin labels X and Y corresponding to the two decision tree bin sizes of gradient A are determined, and the boundary between X and Y in the label sorting sequence is taken as a segmentation point.
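
The threshold comparison can be sketched in a few lines of Python. Because the preset gradient formula is not reproduced in this text, the sketch assumes the gradient is the relative change between two adjacent bin sizes; that definition, the function name `select_segmentation_points`, and the sample sizes and thresholds are illustrative assumptions rather than the formula of this application.

```python
def select_segmentation_points(sorted_sizes, gradient_threshold):
    """Return the positions in the label sorting sequence where a new batch starts.

    ASSUMPTION: the gradient between neighbours n and n+1 is taken as the
    relative change (node[n+1] - node[n]) / node[n]. Sizes are assumed to
    be positive and already sorted in ascending order.
    """
    points = []
    for n in range(len(sorted_sizes) - 1):
        tau = (sorted_sizes[n + 1] - sorted_sizes[n]) / sorted_sizes[n]
        if tau > gradient_threshold:
            # The boundary lies between label n and label n + 1.
            points.append(n + 1)
    return points


sorted_sizes = [1, 2, 3, 4, 7, 9, 60, 64]
coarse_points = select_segmentation_points(sorted_sizes, gradient_threshold=1.0)
fine_points = select_segmentation_points(sorted_sizes, gradient_threshold=0.7)
```

With the higher threshold only the jump from 9 to 60 produces a segmentation point; lowering the threshold yields more, smaller batches.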

[0088] Step S223: Based on the segmentation points, segment the label sorting sequence to obtain each bin label segmentation batch.

[0089] In this embodiment, Figure 5 is a schematic diagram of the large number reduction summation process. Here, bignm represents decision tree binning data, the lines connecting the decision tree binning data represent parallel computing threads, 001, 002, 003, and 004 are node bin labels, and -1 indicates a free memory block that holds no decision tree binning data. Because the large number reduction summation runs over all allocated blocks, it always includes calculations on free memory blocks, which consume memory and computing resources.
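
To make the cost of free memory blocks concrete, the following Python sketch emulates a tree-style reduction summation over the blocks of one bin and counts the additions that only pull in a free block. It is a sequential stand-in for the parallel threads under stated assumptions: the names `EMPTY` and `reduction_sum`, the power-of-two block count, and the treatment of a free block as zero are choices made for this example.

```python
EMPTY = -1  # marker for a free memory block, as in Figure 5


def reduction_sum(slots):
    """Tree-style reduction over the blocks allocated to one bin.

    ASSUMPTIONS: the block count is a power of two, real values are
    non-negative, and a free block contributes 0 to the sum. Each pass of
    the `for t` loop stands in for `stride` threads working in parallel.
    Returns the sum and the number of additions whose second operand
    carried no data.
    """
    n = len(slots)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("block count must be a power of two")
    values = [0 if s == EMPTY else s for s in slots]
    occupied = [s != EMPTY for s in slots]
    wasted = 0
    stride = n // 2
    while stride > 0:
        for t in range(stride):
            if not occupied[t + stride]:
                wasted += 1            # work spent on a free block
            values[t] += values[t + stride]
            occupied[t] = occupied[t] or occupied[t + stride]
        stride //= 2
    return values[0], wasted


# Python integers are arbitrary precision, so they can stand in for large numbers.
full_total, full_wasted = reduction_sum([2**70, 2**70 + 1, 5, EMPTY])
sparse_total, sparse_wasted = reduction_sum([2**70, EMPTY, EMPTY, EMPTY])
```

In the second call every one of the three additions is spent on free blocks, which is the overhead that grouping similarly sized bins is meant to limit.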

[0090] Furthermore, in this embodiment, since each segmentation point is determined based on the bin size arrangement gradient information, the amounts of decision tree binning data contained in the decision tree bins of the same bin label segmentation batch will not differ too much. This reduces the proportion of free memory blocks when performing decision tree binning calculations. In other words, there will not be too many free memory blocks when allocating memory for the decision tree binning data in each decision tree bin, thereby reducing memory resource consumption and reducing the calculation spent on free memory blocks. Therefore, the computational efficiency of decision tree binning is improved.

[0091] This application provides a method for segmenting the label sorting sequence into bin label segmentation batches. Specifically, based on the size of each decision tree bin, the node bin labels are sorted to obtain a label sorting sequence. Based on the size of each decision tree bin, the bin size arrangement gradient information corresponding to the label sorting sequence is calculated. Then, the label sorting sequence is segmented into bin label segmentation batches based on the bin size arrangement gradient information. Since each segmentation point is determined based on the bin size arrangement gradient information, the amounts of decision tree binning data contained in the decision tree bins of the same bin label segmentation batch will not differ significantly. This reduces the proportion of free memory blocks during decision tree binning calculations. In other words, there will not be excessive free memory blocks when allocating memory for the decision tree binning data in each decision tree bin, thereby reducing memory resource consumption and the calculation spent on free memory blocks. Therefore, the computational efficiency of decision tree binning is improved.
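
The effect on free memory blocks can be checked with a small Python sketch that splits a sorted size sequence at given segmentation points and counts the blocks left unused. It assumes, for illustration only, that every bin in a batch is allocated as many blocks as the largest bin of that batch; the function names and the sample values are hypothetical.

```python
def split_into_batches(sequence, points):
    """Cut a sorted sequence at the given segmentation points."""
    bounds = [0, *points, len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


def free_blocks(size_batches):
    """Return (free, allocated) block counts over all batches.

    ASSUMPTION: each bin in a batch is allocated as many blocks as the
    largest bin of that batch, so smaller bins leave free blocks behind.
    """
    allocated = sum(max(batch) * len(batch) for batch in size_batches)
    used = sum(sum(batch) for batch in size_batches)
    return allocated - used, allocated


sorted_sizes = [1, 2, 3, 4, 7, 9, 60, 64]
one_batch = split_into_batches(sorted_sizes, [])
two_batches = split_into_batches(sorted_sizes, [6])

free_unsegmented, allocated_unsegmented = free_blocks(one_batch)
free_segmented, allocated_segmented = free_blocks(two_batches)
```

For these sample sizes, cutting once at the large jump reduces the free blocks from 362 of 512 to 32 of 182.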

[0092] Referring to Figure 6, Figure 6 is a schematic diagram of the device structure of the hardware operating environment involved in the embodiments of this application.

[0093] As shown in Figure 6, the decision tree binning computation optimization device may include a processor 1001 (such as a CPU), a memory 1005, and a communication bus 1002. The communication bus 1002 is used to establish communication between the processor 1001 and the memory 1005. The memory 1005 may be a high-speed RAM or a stable, non-volatile memory, such as a disk drive. Optionally, the memory 1005 may also be a storage device independent of the aforementioned processor 1001.

[0094] Optionally, the decision tree binning optimization device may also include a rectangular user interface, a network interface, a camera, RF (Radio Frequency) circuitry, sensors, audio circuitry, a WiFi module, etc. The rectangular user interface may include a display screen and an input submodule such as a keyboard; optionally, the rectangular user interface may also include standard wired or wireless interfaces. The network interface may optionally include standard wired or wireless interfaces (such as a Wi-Fi interface).

[0095] Those skilled in the art will understand that the structure of the decision tree binning computation optimization device shown in Figure 6 does not constitute a limitation on the device, which may include more or fewer components than shown, combine certain components, or have a different arrangement of components.

[0096] As shown in Figure 6, the memory 1005, which serves as a computer storage medium, may include an operating system, a network communication module, and a decision tree binning optimization program. The operating system is a program that manages and controls the hardware and software resources of the decision tree binning optimization device, supporting the operation of the decision tree binning optimization program and other software and / or programs. The network communication module is used to enable communication between the various components within the memory 1005, as well as communication with other hardware and software in the decision tree binning optimization system.

[0097] In the decision tree binning computation optimization device shown in Figure 6, the processor 1001 is used to execute the decision tree binning computation optimization program stored in the memory 1005 to implement the steps of the decision tree binning computation optimization method described above.

[0098] The specific implementation of the decision tree binning calculation optimization device in this application is basically the same as the various embodiments of the decision tree binning calculation optimization method described above, and will not be repeated here.

[0099] An embodiment of this application also provides a decision tree binning computation optimization device, which is applied to decision tree binning computation optimization equipment, and the decision tree binning computation optimization device includes:

[0100] The acquisition module is used to acquire the memory storage data corresponding to the target decision tree and the index table corresponding to the memory storage data, wherein the index table includes at least one node bin label;

[0101] The segmentation module is used to segment the bin labels of each node based on the size of each bin in the decision tree, and obtain the segmentation batch of each bin label;

[0102] The extraction module is used to extract the decision tree binning batch data corresponding to each binning label segmentation batch from the memory storage data according to the index table;

[0103] The matching module is used to match, based on the decision tree bin size information corresponding to each bin label segmentation batch, the number of parallel computing threads corresponding to each decision tree binning batch data;

[0104] The parallel computing module is used to perform decision tree binning calculations in parallel on each batch of decision tree binning data based on the number of parallel computing threads, and to obtain the target decision tree binning calculation result.

[0105] Optionally, the segmentation module is further configured to:

[0106] Based on the size of each decision tree bin, the bin labels of each node are sorted to obtain a label sorting sequence;

[0107] Based on the information on the degree of change in bin size corresponding to the label sorting sequence, the label sorting sequence is segmented to obtain each bin label segmentation batch.

[0108] Optionally, the segmentation module is further configured to:

[0109] Based on the size of each decision tree bin, calculate the bin size arrangement gradient information corresponding to the label sorting sequence;

[0110] Based on the bin size arrangement gradient information, each segmentation point is selected in the label sorting sequence;

[0111] Based on the segmentation points, the label sorting sequence is segmented to obtain each bin label segmentation batch.

[0112] Optionally, the acquisition module is further configured to:

[0113] Traverse the target decision tree to obtain the node bin labels and memory storage data corresponding to each decision tree bin;

[0114] Based on the storage location of the decision tree binning data in each decision tree binning in the memory storage data, a storage location label corresponding to each decision tree binning data is generated;

[0115] The index table is constructed based on the association between the node bin labels and the corresponding storage location labels of each decision tree bin data.
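
One way to picture the index table is as a mapping from each node bin label to the place where that bin's data sits in one contiguous store. The Python sketch below assumes the storage location label is an (offset, length) pair into a flat list; that representation, the function names, and the sample labels and values are hypothetical and only illustrate the association described above.

```python
def build_index_table(tree_bins):
    """Lay the data of all bins out contiguously and record where each bin sits.

    tree_bins: iterable of (node_bin_label, data_list) pairs, as a tree
    traversal might yield them.
    ASSUMPTION: the storage location label is an (offset, length) pair.
    """
    storage = []
    index_table = {}
    for label, data in tree_bins:
        index_table[label] = (len(storage), len(data))
        storage.extend(data)
    return storage, index_table


def extract_batch(storage, index_table, batch_labels):
    """Look up the data of every bin whose label belongs to one batch."""
    batch = {}
    for label in batch_labels:
        offset, length = index_table[label]
        batch[label] = storage[offset:offset + length]
    return batch


tree_bins = [("001", [11, 12, 13]), ("002", [21]), ("003", [31, 32])]
storage, index_table = build_index_table(tree_bins)
batch_data = extract_batch(storage, index_table, ["003", "001"])
```

Because only labels are sorted and segmented, the stored data never has to move; a batch is materialized by lookups through the index table.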

[0116] Optionally, the acquisition module is further configured to:

[0117] Traverse the target decision tree to obtain the bin list data that is common to each bin of the decision tree and the bin label information corresponding to each bin of the decision tree.

[0118] The bin list data is converted into GPU memory storage to obtain the memory storage data;

[0119] Based on the tree node number, feature number, and bin number in the bin label information, generate bin labels for each node.
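
A node bin label built from the three numbers could, for instance, be packed into a single integer so that it is cheap to sort and compare. The sketch below is one possible encoding, not the one used by this application: the field widths `FEATURE_BITS` and `BIN_BITS` and both function names are hypothetical.

```python
FEATURE_BITS = 16  # hypothetical width of the feature number field
BIN_BITS = 16      # hypothetical width of the bin number field


def make_node_bin_label(node_id, feature_id, bin_id):
    """Pack (tree node number, feature number, bin number) into one integer."""
    if node_id < 0:
        raise ValueError("node_id must be non-negative")
    if not 0 <= feature_id < (1 << FEATURE_BITS):
        raise ValueError("feature_id does not fit its field")
    if not 0 <= bin_id < (1 << BIN_BITS):
        raise ValueError("bin_id does not fit its field")
    return (node_id << (FEATURE_BITS + BIN_BITS)) | (feature_id << BIN_BITS) | bin_id


def split_node_bin_label(label):
    """Recover the three numbers from a packed label."""
    bin_id = label & ((1 << BIN_BITS) - 1)
    feature_id = (label >> BIN_BITS) & ((1 << FEATURE_BITS) - 1)
    node_id = label >> (FEATURE_BITS + BIN_BITS)
    return node_id, feature_id, bin_id


label = make_node_bin_label(3, 5, 7)
parts = split_node_bin_label(label)
```

A plain tuple would serve equally well in Python; a packed integer is shown because fixed-width keys are the more common choice for GPU-side sorting.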

[0120] Optionally, the matching module is further configured to:

[0121] Obtain the decision tree bin size range corresponding to each bin label segmentation batch;

[0122] The number of threads corresponding to the upper limit of each decision tree bin size range is matched to obtain each number of parallel computing threads.
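
The matching rule itself is not spelled out beyond using the upper limit of the range, so the following Python sketch shows just one plausible rule: round the upper limit up to the next power of two, which suits a tree-style reduction, and cap it at a per-group maximum. The cap `MAX_THREADS_PER_GROUP`, the rounding rule, and the function name are assumptions made for this example.

```python
MAX_THREADS_PER_GROUP = 1024  # hypothetical cap on the size of one thread group


def match_thread_count(size_range):
    """Pick a thread count for a batch from its (lower, upper) bin size range.

    ASSUMPTION: one thread per element of the largest bin, rounded up to a
    power of two and capped at MAX_THREADS_PER_GROUP.
    """
    lower, upper = size_range
    if upper < 1 or lower > upper:
        raise ValueError("invalid bin size range")
    threads = 1 << (upper - 1).bit_length()   # next power of two >= upper
    return min(threads, MAX_THREADS_PER_GROUP)


batch_ranges = [(1, 9), (60, 64), (3000, 5000)]
thread_counts = [match_thread_count(r) for r in batch_ranges]
```

Under this rule a batch of small bins gets a small thread group instead of inheriting the thread count required by the largest bin in the whole tree.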

[0123] Optionally, the parallel computing module is further configured to:

[0124] Based on the number of parallel computing threads, a corresponding GPU parallel computing thread group is allocated to each of the decision tree binning batch data.

[0125] Based on the GPU parallel computing thread groups, large number reduction summation is performed on the decision tree binning data in each decision tree binning batch data to obtain the target decision tree binning calculation result.

[0126] The specific implementation of the decision tree binning calculation optimization device in this application is basically the same as the various embodiments of the decision tree binning calculation optimization method described above, and will not be repeated here.

[0127] This application provides a medium that is a readable storage medium, and the readable storage medium stores one or more programs, which can be executed by one or more processors to implement the steps of the decision tree binning computation optimization method described in any of the above claims.

[0128] The specific implementation of the readable storage medium in this application is basically the same as the embodiments of the decision tree binning calculation optimization method described above, and will not be repeated here.

[0129] This application provides a computer program product, which includes one or more computer programs. The one or more computer programs can be executed by one or more processors to implement the steps of the decision tree binning computation optimization method described in any of the above claims.

[0130] The specific implementation of the computer program product in this application is basically the same as the embodiments of the decision tree binning calculation optimization method described above, and will not be repeated here.

[0131] The above are merely preferred embodiments of this application and do not limit the patent scope of this application. Any equivalent structural or procedural transformations made using the content of this application's specification and drawings, or direct or indirect applications in other related technical fields, are similarly included within the patent scope of this application.

Claims

1. A decision tree binning computation optimization method, characterized in that the decision tree binning computation optimization method includes: obtaining the memory storage data corresponding to the target decision tree and the index table corresponding to the memory storage data, wherein the index table includes at least one node bin label; sorting the node bin labels based on the size of each decision tree bin to obtain a label sorting sequence; calculating, based on the size of each decision tree bin, the bin size arrangement gradient information corresponding to the label sorting sequence; selecting each segmentation point in the label sorting sequence based on the bin size arrangement gradient information; segmenting the label sorting sequence based on the segmentation points to obtain each bin label segmentation batch; extracting, according to the index table, the decision tree binning batch data corresponding to each bin label segmentation batch from the memory storage data; obtaining the decision tree bin size range corresponding to each bin label segmentation batch; matching the number of threads corresponding to the upper limit of each decision tree bin size range to obtain the number of parallel computing threads; allocating, based on the number of parallel computing threads, a corresponding GPU parallel computing thread group to each decision tree binning batch data; and performing, based on the GPU parallel computing thread groups, large number reduction summation on the decision tree binning data in each decision tree binning batch data to obtain the target decision tree binning calculation result.

2. The decision tree binning computation optimization method as described in claim 1, characterized in that the step of obtaining the memory storage data corresponding to the target decision tree and the index table corresponding to the memory storage data includes: traversing the target decision tree to obtain the node bin labels and the memory storage data corresponding to each decision tree bin; generating, based on the storage location in the memory storage data of the decision tree binning data of each decision tree bin, a storage location label corresponding to each decision tree binning data; and constructing the index table based on the association between the node bin labels and the storage location labels corresponding to each decision tree binning data.

3. The decision tree binning computation optimization method as described in claim 2, characterized in that the step of traversing the target decision tree to obtain the node bin labels and the memory storage data corresponding to each decision tree bin includes: traversing the target decision tree to obtain the bin list data that is common to each decision tree bin and the bin label information corresponding to each decision tree bin; converting the bin list data into GPU memory storage to obtain the memory storage data; and generating each node bin label based on the tree node number, feature number, and bin number in the bin label information.

4. A decision tree binning computation optimization device, characterized in that the decision tree binning computation optimization device includes: a memory, a processor, and a program stored in the memory for implementing the decision tree binning computation optimization method; the memory is used to store the program that implements the decision tree binning computation optimization method; and the processor is used to execute the program that implements the decision tree binning computation optimization method, so as to implement the steps of the decision tree binning computation optimization method as described in any one of claims 1 to 3.

5. A medium, said medium being a readable storage medium, characterized in that the readable storage medium stores a program that implements the decision tree binning computation optimization method, and the program is executed by a processor to implement the steps of the decision tree binning computation optimization method as described in any one of claims 1 to 3.

6. A computer program product, comprising a computer program, characterized in that, when executed by a processor, the computer program implements the steps of the decision tree binning computation optimization method as described in any one of claims 1 to 3.