Data processing method and apparatus, and storage medium
By optimizing the random forest model, combining attention mechanisms and dimensionality reduction techniques, and integrating N decision tree models with weight adjustments, the problem of insufficient prediction accuracy of the random forest model was solved, achieving higher data prediction accuracy.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- BEIJING WODONG TIANJUN INFORMATION TECH CO LTD
- Filing Date
- 2022-01-05
- Publication Date
- 2026-06-16
Figure CN116451153B_ABST
Abstract
Description
Technical Field
[0001] The present invention relates to the field of data processing technology, and in particular to a data processing method, apparatus and storage medium.
Background Technology
[0002] Existing random forest models employ relatively simplistic strategies for combining decision tree models. For classification problems, majority voting is commonly used, while for regression problems, averaging is commonly used. These strategies often produce wrong predictions when only a small number of decision trees in the random forest produce predictions close to the true value while the majority deviate significantly. The combined effect of the multiple decision trees is therefore not fully utilized, leading to poor prediction accuracy for the random forest model.
Summary of the Invention
[0003] The data processing method, apparatus, and storage medium provided in this invention can improve the accuracy of data prediction.
[0004] The technical solution of this invention is implemented as follows:
[0005] This invention provides a data processing method, including:
[0006] Obtaining multiple data items to be processed;
[0007] Processing the multiple data items to be processed using an optimized random forest model to obtain processing results of the multiple data items to be processed; wherein,
[0008] The optimized random forest model is obtained by combining N decision tree models using an attention mechanism together with the dimensionality-reduced original training set; N is a positive integer greater than 1.
[0009] In the above scheme, before processing the multiple data items to be processed using the optimized random forest model to obtain the processing results, the method further includes:
[0010] Based on the N training sets in the original training set, the N decision tree models are trained and formed; the original training set includes: multiple training data, and each of the N training sets includes: at least one training data;
[0011] Based on the N decision tree models, the original training set is predicted to obtain a prediction matrix. Based on the prediction matrix, the N weights corresponding to the N decision tree models are iteratively updated to obtain the N weights.
[0012] The optimized random forest model is obtained by combining the N decision tree models and the N weights.
[0013] In the above scheme, the step of predicting the original training set based on the N decision tree models to obtain a prediction matrix, and iteratively updating based on the prediction matrix to obtain the N weights corresponding to the N decision tree models, includes:
[0014] In the original training set, multiple sets of feature data corresponding to the multiple training data are extracted; each set of feature data includes: multiple feature data corresponding to the training data.
[0015] The prediction matrix is obtained by using the N decision tree models to predict the multiple sets of feature data respectively;
[0016] The original feature matrix corresponding to the original training set is subjected to dimensionality reduction processing to obtain a dimensionality-reduced matrix;
[0017] Obtain the N original weights from the N decision tree models;
[0018] Based on the prediction matrix, the N original weights, and the dimensionality reduction matrix, multiple prediction results are calculated;
[0019] The multiple prediction results are combined with N loss functions, and the training is iterated multiple times until the N loss functions converge. The gradient descent method is used to solve the N loss functions to obtain multiple gradient values. The multiple gradient values and the multiple original weights are combined to obtain the N weights.
[0020] In the above scheme, the step of using the N decision tree models to predict the multiple sets of feature data to obtain the prediction matrix includes:
[0021] The multiple sets of feature data are predicted using each of the N decision tree models, resulting in predicted values for each decision tree model corresponding to the multiple training data.
[0022] The multiple predicted values of the i-th decision tree model for the multiple training data are used, in order, as the i-th column of the prediction matrix, until all N columns of the prediction matrix are completed, thereby forming the prediction matrix; i is a positive integer greater than or equal to 1 and less than or equal to N.
[0023] In the above scheme, before training N decision tree models based on the N training sets in the original training set, the method further includes:
[0024] Multiple initial data sets are obtained, and the multiple initial data sets are processed using a data warehouse method to obtain multiple training data sets, which are then used to form the original training set; each training data set includes: encoding information corresponding to the target and multiple feature data sets.
[0025] The values of the multiple feature data corresponding to each training data are sequentially used as each row of the original feature matrix until all rows of the original feature matrix are completed, thereby forming the original feature matrix.
[0026] In the above scheme, the step of reducing the dimensionality of the original feature matrix corresponding to the original training set to obtain the dimensionality-reduced matrix includes:
[0027] Calculate the covariance matrix of the original feature matrix of the original training set, and determine the feature matrix corresponding to the covariance matrix;
[0028] The original feature matrix is multiplied by the transpose of the feature matrix to obtain the dimensionality-reduced matrix.
[0029] In the above scheme, calculating the covariance matrix of the original feature matrix of the original training set and determining the feature matrix corresponding to the covariance matrix includes:
[0030] Calculate the covariance matrix of the original feature matrix, and calculate the multiple eigenvalues of the covariance matrix and the multiple eigenvectors corresponding to those eigenvalues;
[0031] Among the multiple eigenvalues, the d largest eigenvalues are determined; the sum of the d largest eigenvalues is greater than or equal to a preset value, and the sum of the d-1 largest eigenvalues is less than the preset value; d is a positive integer greater than or equal to 1.
[0032] The d eigenvectors corresponding to the d largest eigenvalues are used as the d rows of the feature matrix, thereby forming the feature matrix.
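As an illustration only, the eigenvalue selection and projection described above can be sketched in Python with NumPy. The function name `reduce_dimensions`, the argument name `preset_value`, and the decision not to center the data before projecting are assumptions of this sketch; the source does not specify them.

```python
import numpy as np

def reduce_dimensions(X, preset_value):
    """Return (feature_matrix, reduced_matrix) for an m x p original feature matrix X.

    Assumes p >= 2 so that np.cov returns a p x p matrix.
    """
    # Covariance matrix of the feature columns: shape p x p.
    cov = np.cov(X, rowvar=False)
    # eigh is intended for symmetric matrices; eigenvalues come back ascending.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]            # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Smallest d such that the sum of the d largest eigenvalues reaches the preset value.
    cumulative = np.cumsum(eigvals)
    d = int(np.searchsorted(cumulative, preset_value)) + 1
    d = min(d, len(eigvals))                     # cap if the preset value is never reached
    # The d selected eigenvectors form the d rows of the feature matrix: shape d x p.
    feature_matrix = eigvecs[:, :d].T
    # Original feature matrix times the transpose of the feature matrix: shape m x d.
    reduced = X @ feature_matrix.T
    return feature_matrix, reduced
```

The preset value is treated here as an absolute threshold on the eigenvalue sum, as the text states; a ratio of total variance would be a different design choice.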
[0033] In the above scheme, the calculation of multiple prediction results based on the prediction matrix, the N original weights, and the dimensionality reduction matrix includes:
[0034] The values of the N rows of the parameter matrix are constructed using the N original weights;
[0035] Multiply the dimensionality-reduced matrix by the transpose of the parameter matrix to obtain the matching degree matrix;
[0036] The exponential function is applied to each value in the matching degree matrix to obtain the first intermediate matrix;
[0037] Calculate the quotient of each value in the first intermediate matrix and the sum of the values in its corresponding row to obtain the second intermediate matrix;
[0038] The multiple prediction results are calculated based on the second intermediate matrix and the prediction matrix.
[0039] In the above scheme, calculating the multiple prediction results based on the second intermediate matrix and the prediction matrix includes:
[0040] Each value in each row of the second intermediate matrix is multiplied by the corresponding value in the corresponding row of the prediction matrix to obtain multiple intermediate values for each row.
[0041] The multiple intermediate values of each row are combined to obtain the vector of each row, thereby obtaining the multiple prediction results.
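The matching degree, exponentiation, row normalization, and row-wise combination steps can be sketched as follows. This is a minimal sketch under stated assumptions: `reduced` is the m x d dimensionality-reduced matrix, `params` is the N x d parameter matrix, `pred_matrix` is the m x N prediction matrix, and the intermediate values of each row are combined by summation (the source only says they are "combined"). Subtracting the row maximum before exponentiating is an addition for numerical stability and does not change the quotient.

```python
import numpy as np

def attention_predict(reduced, params, pred_matrix):
    """Combine per-tree predictions with row-wise attention weights.

    reduced: m x d, params: N x d, pred_matrix: m x N.
    Returns (predictions of length m, second intermediate matrix m x N).
    """
    # Matching degree matrix: one score per (sample, tree) pair.
    matching = reduced @ params.T
    # First intermediate matrix: element-wise exponential (shifted for stability).
    first = np.exp(matching - matching.max(axis=1, keepdims=True))
    # Second intermediate matrix: each value divided by the sum of its row.
    second = first / first.sum(axis=1, keepdims=True)
    # Multiply element-wise with the prediction matrix and sum each row (assumed).
    predictions = (second * pred_matrix).sum(axis=1)
    return predictions, second
```

With an all-zero parameter matrix every tree receives the same weight, so the result reduces to the plain mean of the tree predictions.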
[0042] In the above scheme, the step of combining the multiple prediction results with N loss functions, performing multiple iterations of training until the N loss functions converge, solving the N loss functions using gradient descent to obtain multiple gradient values, and combining the multiple gradient values and the multiple original weights to obtain the N weights includes:
[0043] Calculate the derivatives of the N loss functions with respect to the multiple prediction results, and obtain N expressions for the N gradient values based on the derivatives and the chain rule. Solve the N expressions to obtain the N gradient values.
[0044] The N original weights are updated using the N gradient values, and the multiple prediction results are iteratively trained using the N loss functions until the N loss functions converge to obtain the N gradient values.
[0045] The N intermediate weights obtained in the training iteration preceding convergence of the N loss functions are acquired. The N gradient values and the N intermediate weights are then added together to obtain the N weights.
[0046] In the above scheme, the step of calculating the derivatives of the N loss functions with respect to the multiple prediction results, obtaining N expressions for the N gradient values based on the derivatives and the chain rule, and solving the N expressions to obtain the N gradient values, includes:
[0047] Calculate the N first derivatives of the N loss functions with respect to the multiple prediction results;
[0048] Calculate the multiple second derivatives of the multiple prediction results with respect to the second intermediate matrix;
[0049] Find the third derivative of the second intermediate matrix with respect to the first intermediate matrix;
[0050] Find the fourth derivative of the first intermediate matrix with respect to the matching degree matrix;
[0051] Find the fifth derivative of the matching degree matrix with respect to the parameter matrix;
[0052] According to the chain rule, the N first derivatives, the plurality of second derivatives, the third derivative, the fourth derivative, and the fifth derivative are combined to obtain the N expressions;
[0053] Solve the N expressions to obtain the N gradient values.
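Multiplying the five derivatives together is an application of the chain rule for composite functions. The sketch below works this out for one concrete case and checks it against a finite-difference estimate. It assumes a single squared-error loss summed over the training data (the source does not name the loss functions), and all function and variable names (`forward`, `gradient`, `Z`, `W`, `P`, `y`) are hypothetical.

```python
import numpy as np

def forward(Z, W, P):
    """Z: m x d reduced matrix, W: N x d parameter matrix, P: m x N prediction matrix."""
    S = Z @ W.T                                    # matching degree matrix
    E = np.exp(S - S.max(axis=1, keepdims=True))   # first intermediate matrix
    A = E / E.sum(axis=1, keepdims=True)           # second intermediate matrix
    return (A * P).sum(axis=1), A                  # prediction results

def loss(Z, W, P, y):
    y_hat, _ = forward(Z, W, P)
    return 0.5 * np.sum((y_hat - y) ** 2)          # assumed squared-error loss

def gradient(Z, W, P, y):
    y_hat, A = forward(Z, W, P)
    dL_dyhat = y_hat - y                           # loss w.r.t. prediction results
    dL_dA = dL_dyhat[:, None] * P                  # predictions w.r.t. second matrix
    # Second matrix w.r.t. first matrix and first matrix w.r.t. matching degree,
    # folded together as the usual row-wise softmax Jacobian.
    dL_dS = A * (dL_dA - (A * dL_dA).sum(axis=1, keepdims=True))
    return dL_dS.T @ Z                             # matching degree w.r.t. parameters

def numerical_gradient(Z, W, P, y, eps=1e-6):
    """Central finite differences, used only to check the analytic gradient."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        step = np.zeros_like(W)
        step[idx] = eps
        grad[idx] = (loss(Z, W + step, P, y) - loss(Z, W - step, P, y)) / (2 * eps)
    return grad
```

Comparing `gradient` with `numerical_gradient` on random inputs is a quick way to confirm that the derivative factors were combined in the right order.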
[0054] In the above scheme, training the N decision tree models based on the N training sets in the original training set includes:
[0055] The original training set is sampled N times with replacement to obtain N training sets.
[0056] Extract at least one feature data corresponding to at least one training data point from each of the N training sets;
[0057] The original decision tree model is trained using at least one feature data corresponding to at least one training data in each training set to obtain the decision tree model corresponding to each training set, and thus the N decision tree models are obtained.
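Sampling with replacement and per-set training can be sketched as below. This is an illustration under assumptions: each training set is drawn with the same size as the original training set, and `fit_stump` is a deliberately tiny stand-in (a depth-1 regression tree on the first feature) for whatever decision tree learner is actually used. All names are hypothetical.

```python
import numpy as np

def bootstrap_training_sets(num_samples, n_trees, seed=0):
    """Return n_trees index arrays, each drawn with replacement."""
    rng = np.random.default_rng(seed)
    return [rng.integers(0, num_samples, size=num_samples) for _ in range(n_trees)]

def fit_stump(X, y):
    """Stand-in learner: split feature 0 at its median, predict the mean on each side."""
    threshold = np.median(X[:, 0])
    left = X[:, 0] <= threshold
    left_value = y[left].mean()
    # If every sample falls on the left side, reuse the left mean on the right.
    right_value = y[~left].mean() if (~left).any() else left_value
    return lambda Q: np.where(Q[:, 0] <= threshold, left_value, right_value)

def train_forest(X, y, n_trees, seed=0):
    """Train one model per bootstrap training set and return the N models."""
    trees = []
    for idx in bootstrap_training_sets(len(X), n_trees, seed):
        trees.append(fit_stump(X[idx], y[idx]))
    return trees
```

Each returned model is a callable that maps a feature array to one predicted value per row, which is the only interface the later steps rely on.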
[0058] In the above scheme, combining the N decision tree models and the N weights to obtain the optimized random forest model includes:
[0059] Multiply the N decision tree models by their corresponding weights to obtain N intermediate models;
[0060] The N intermediate models are combined to form the optimized random forest model.
[0061] This invention also provides a data processing apparatus, comprising:
[0062] The data acquisition unit is used to acquire multiple pieces of data that are currently pending processing.
[0063] The processing unit is used to process the multiple data items to be processed using an optimized random forest model to obtain the processing results for the multiple data items to be processed; wherein,
[0064] The optimized random forest model is obtained by combining N decision tree models using an attention mechanism together with the dimensionality-reduced original training set; N is a positive integer greater than 1.
[0065] This invention also provides a data processing apparatus, including a memory and a processor. The memory stores a computer program that can run on the processor, and the processor executes the program to implement the steps in the above-described method.
[0066] This invention also provides a computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements the steps in the above-described method.
[0067] In this embodiment of the invention, multiple data items to be processed are acquired, and an optimized random forest model is used to process them to obtain processing results. The optimized random forest model is obtained by combining N decision tree models using an attention mechanism together with the dimensionality-reduced original training set; N is a positive integer greater than 1. This combination scheme takes into account the preference characteristics of the N decision tree models for the multiple training data in the original training set. The final combined random forest model is therefore more focused on the original training set, which effectively improves its prediction accuracy.
Attached Figure Description
[0068] Figure 1 is an optional flowchart illustrating a data processing method provided in an embodiment of the present invention.
[0069] Figure 2 is an optional flowchart illustrating a data processing method provided in an embodiment of the present invention.
[0070] Figure 3 is an optional flowchart illustrating a data processing method provided in an embodiment of the present invention.
[0071] Figure 4 is an optional flowchart illustrating a data processing method provided in an embodiment of the present invention.
[0072] Figure 5 is an optional flowchart illustrating a data processing method provided in an embodiment of the present invention.
[0073] Figure 6 is an optional flowchart illustrating a data processing method provided in an embodiment of the present invention.
[0074] Figure 7 is an optional flowchart illustrating a data processing method provided in an embodiment of the present invention.
[0075] Figure 8 is an optional flowchart illustrating a data processing method provided in an embodiment of the present invention.
[0076] Figure 9 is an optional flowchart illustrating a data processing method provided in an embodiment of the present invention.
[0077] Figure 10 is an optional flowchart illustrating a data processing method provided in an embodiment of the present invention.
[0078] Figure 11 is an optional flowchart illustrating a data processing method provided in an embodiment of the present invention.
[0079] Figure 12 is an optional flowchart illustrating a data processing method provided in an embodiment of the present invention.
[0080] Figure 13 is a schematic diagram of the structure of the data processing device provided in an embodiment of the present invention.
[0081] Figure 14 is a schematic diagram of a hardware entity of a data processing device provided in an embodiment of the present invention.
Detailed Implementation
[0082] To make the objectives, technical solutions, and advantages of the present invention clearer, the technical solutions of the present invention will be further described in detail below with reference to the accompanying drawings and embodiments. The described embodiments should not be regarded as limitations on the present invention. All other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0083] In the following description, references are made to “some embodiments,” which describe a subset of all possible embodiments. However, it is understood that “some embodiments” may be the same subset or different subsets of all possible embodiments and may be combined with each other without conflict.
[0084] In the following description, the terms "first / second / third" are used only to distinguish similar objects and do not represent a specific order of objects. It is understood that "first / second / third" may be interchanged in a specific order or sequence where permitted, so that the embodiments of the invention described herein can be implemented in an order other than that illustrated or described herein.
[0085] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains. The terminology used herein is for the purpose of describing embodiments of the invention only and is not intended to limit the invention.
[0086] Figure 1 is an optional flowchart illustrating a data processing method provided in an embodiment of the present invention, which is described below with reference to the steps shown in Figure 1.
[0087] S101. Obtain multiple data items to be processed.
[0088] In this embodiment of the invention, the server obtains multiple data to be processed from multiple clients through pre-established communication lines with multiple clients.
[0089] The multiple data points to be processed can be multiple attribute data corresponding to multiple targets. The targets can include any one of users, stores, and items. For example, the multiple data points to be processed can be multiple attribute data for multiple users. The attribute data can be information such as the user's transaction habits, gender, age, education level, and location. The multiple data points to be processed can also be attribute data corresponding to multiple stores. The multiple data points to be processed can also be attribute data corresponding to multiple items.
[0090] In this embodiment of the invention, the server can obtain multiple pieces of data to be processed through a third-party storage device. Specifically, the target object of the server uploads the data containing the multiple pieces of data to the server via the server's interface.
[0091] In this embodiment of the invention, the server can obtain multiple data to be processed from other servers through a pre-established communication line with other servers.
[0092] S102. Use the optimized random forest model to process the multiple data items to be processed to obtain their processing results. The optimized random forest model is obtained by combining N decision tree models using an attention mechanism together with the dimensionality-reduced original training set. N is a positive integer greater than 1.
[0093] In this embodiment of the invention, the server inputs the multiple data items to be processed into the optimized random forest model, and the optimized random forest model outputs their processing results. The optimized random forest model is obtained by combining N decision tree models using an attention mechanism together with the dimensionality-reduced original training set; N is a positive integer greater than 1.
[0094] The optimized random forest model is used to perform classification or regression on the multiple data items to be processed.
[0095] In this embodiment of the invention, the server inputs multiple data sets to be processed into an optimized random forest model, and the optimized random forest model outputs classification results for the multiple data sets to be processed. The classification results represent the category corresponding to each data set to be processed.
[0096] Random forest is a type of ensemble learning algorithm that trains multiple weak classifiers (usually decision trees) and then combines them to obtain the final result, resulting in a model with high accuracy and generalization performance.
[0097] Random forests have many advantages, including (1) good noise resistance and are less prone to overfitting; (2) the ability to handle high-dimensional data without feature selection; (3) easy parallelization and fast training speed; and (4) relatively good error balancing for imbalanced datasets. These advantages have led to their widespread application in the field of machine learning.
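For comparison, the two conventional combination strategies named in the background (majority voting for classification and averaging for regression) can be sketched in a few lines. The function names and the sample numbers are illustrative assumptions, chosen so that two trees are close to an assumed true value of 10 while three deviate strongly.

```python
from collections import Counter
from statistics import mean

def majority_vote(tree_labels):
    """Classification: return the label predicted by the most trees."""
    return Counter(tree_labels).most_common(1)[0][0]

def mean_prediction(tree_values):
    """Regression: return the unweighted mean of the tree predictions."""
    return mean(tree_values)

# Two trees are near the assumed true value of 10; three are far from it.
tree_values = [10.1, 9.9, 25.0, 24.0, 26.0]
combined = mean_prediction(tree_values)   # pulled toward the inaccurate majority
```

Because every tree counts equally, the accurate minority cannot outweigh the inaccurate majority, which is the weakness that per-tree weighting is meant to address.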
[0098] In this embodiment of the invention, multiple data items to be processed are acquired, and an optimized random forest model is used to process them to obtain processing results. The optimized random forest model is obtained by combining N decision tree models using an attention mechanism together with the dimensionality-reduced original training set; N is a positive integer greater than 1. This combination scheme takes into account the preference characteristics of the N decision tree models for the multiple training data in the original training set. The final combined random forest model is therefore more focused on the original training set, which effectively improves its prediction accuracy.
[0099] In some embodiments, referring to Figure 2, which is an optional flowchart illustrating the data processing method provided in an embodiment of the present invention, S102 shown in Figure 1 is preceded by S103 to S105, which are described step by step below.
[0100] S103. Based on the N training sets in the original training set, train N decision tree models.
[0101] In this embodiment of the invention, the server trains N decision tree models based on N training sets in the original training set.
[0102] The server can use each training set to train and form a decision tree model corresponding to that training set.
[0103] In this embodiment of the invention, the server can perform N sampling operations on the multiple training data in the original training set, with each sampling extracting the same number of training data, thereby forming N training sets that each contain the same number of training data. The original training set includes multiple training data, and each of the N training sets includes at least one training data. The server trains the original decision tree model using each of the N training sets until N decision tree models are obtained.
[0104] In this embodiment of the invention, each training data point in the plurality of training data points is training data for a corresponding target. The target can be any one of a user, a store, or an item. Each training data point includes: encoding information and feature data corresponding to the target.
[0105] S104. Based on the N decision tree models, predict the original training set to obtain the prediction matrix. Based on the prediction matrix, iteratively update to obtain the N weights corresponding to the N decision tree models.
[0106] In this embodiment of the invention, the server predicts the multiple training data in the original training set based on the N decision tree models, obtaining the prediction matrix corresponding to the N decision tree models. Based on the prediction matrix, the server iteratively updates to calculate the N weights corresponding to the N decision tree models.
[0107] In this embodiment of the invention, the server uses N decision tree models to predict multiple training data points. This yields a predicted value for each training data point corresponding to each decision tree model. Based on the predicted values for each training data point, the server constructs prediction matrices for the N decision tree models. The server then constructs a base matrix based on multiple feature data points included in the training data. The server calculates the covariance matrix of the base matrix and determines the corresponding feature matrix. The server multiplies the base matrix by the transpose of the feature matrix to obtain a dimension-reduced matrix. Based on the prediction matrix and the dimension-reduced matrix, the server calculates multiple prediction results. The server combines these prediction results with N loss functions, performing iterative training until the N loss functions converge. Gradient descent is then applied to the N loss functions to obtain multiple gradient values. These gradient values are combined with the original weights to obtain N weights.
[0108] S105. Combine N decision tree models and N weights to obtain the optimized random forest model.
[0109] In this embodiment of the invention, the server combines N decision tree models and N weights to obtain an optimized random forest model.
[0110] In this embodiment of the invention, the server multiplies each decision tree model by its corresponding weight to obtain an intermediate model for each decision tree model. This results in N intermediate models corresponding to N decision tree models. The server then combines these N intermediate models to form an optimized random forest model.
[0111] In this embodiment of the invention, the server obtains a prediction matrix based on the prediction results of N decision tree models on the original training set. Then, based on the prediction matrix, iterative updates are performed to calculate N weights. These N weights are multiplied by their corresponding decision tree models, and then combined to obtain an optimized random forest model. Since the prediction matrix is obtained based on predictions made by the decision tree models on the original training set, it takes into account the preference characteristics of the N decision tree models for multiple training data within the original training set. This makes the final combined random forest model more focused on the original training set, thereby effectively improving the prediction accuracy of the random forest model.
[0112] In some embodiments, referring to Figure 3, which is an optional flowchart illustrating the data processing method provided in an embodiment of the present invention, S104 shown in Figure 2 can be implemented through S106 to S111, which are described step by step below.
[0113] S106. In the original training set, extract multiple sets of feature data corresponding to multiple training data; each set of feature data includes multiple feature data corresponding to the training data.
[0114] In this embodiment of the invention, the server extracts multiple sets of feature data corresponding to multiple training data in the original training set. Each set of feature data includes multiple feature data corresponding to the training data.
[0115] In this embodiment of the invention, each training data includes: encoding information corresponding to the target and multiple feature data. The server extracts one or more feature data from the multiple feature data of each training data to form a set of feature data for that training data. This results in multiple sets of feature data for multiple training data sets.
[0116] S107. Use N decision tree models to predict multiple sets of feature data to obtain the prediction matrix.
[0117] In this embodiment of the invention, the server uses N decision tree models to predict multiple sets of feature data, and obtains a prediction matrix.
[0118] In this embodiment of the invention, the server can sort the N decision tree models according to a certain relationship, and can also sort the multiple sets of feature data according to a certain relationship. The server inputs the multiple sets of feature data, in order, into the N decision tree models. This yields the predicted value of each decision tree model for each training data. Following the order of the N decision tree models, the server uses the multiple predicted values of each decision tree model for the multiple training data to form the corresponding column of the prediction matrix. This results in the prediction matrix.
[0119] S108. Perform dimensionality reduction processing on the original feature matrix corresponding to the original training set to obtain the dimensionality-reduced matrix.
[0120] In this embodiment of the invention, the server first constructs an original feature matrix based on multiple training data in the original training set. The server then performs dimensionality reduction processing on the original feature matrix to obtain a dimensionality-reduced matrix.
[0121] S109. Obtain the N original weights from the N decision tree models.
[0122] In this embodiment of the invention, the server operator pre-sets N original weights corresponding to the N decision tree models and inputs them into the server. The server then obtains these N original weights.
[0123] S110. Based on the prediction matrix, N original weights, and the dimensionality reduction matrix, calculate multiple prediction results.
[0124] In this embodiment of the invention, the server calculates multiple prediction results based on the prediction matrix, N original weights, and the dimensionality reduction matrix.
[0125] In this embodiment of the invention, the server constructs a parameter matrix based on the N original weights. The server multiplies the dimensionality-reduced matrix by the transpose of the parameter matrix to obtain a matching degree matrix. The server applies the exponential function to each value in the matching degree matrix to obtain a first intermediate matrix. The server divides each value in the first intermediate matrix by the sum of the values in its row to obtain a second intermediate matrix. The server multiplies each value in each row of the second intermediate matrix by the corresponding value in the corresponding row of the prediction matrix, and combines the resulting intermediate values of each row to obtain multiple prediction results.
[0126] The values in the i-th row of the parameter matrix correspond to the original weights of the i-th decision tree model.
[0127] The prediction result can be a prediction vector.
[0128] S111. Combine multiple prediction results with N loss functions, perform multiple iterations of training until the N loss functions converge, solve the N loss functions using gradient descent to obtain multiple gradient values, and combine the multiple gradient values with multiple original weights to obtain N weights.
[0129] In this embodiment of the invention, the server combines the multiple prediction results with the N loss functions and performs multiple training iterations until the N loss functions converge. The server solves the N loss functions using gradient descent, obtaining multiple gradient values. The server combines the multiple gradient values with the multiple original weights to obtain the N weights.
[0130] In this embodiment of the invention, the N loss functions are the loss functions corresponding to the N decision tree models.
[0131] In this embodiment of the invention, the server combines multiple prediction results with N loss functions and performs multiple iterative training until the predetermined number of training iterations is reached. The server solves for the multiple loss functions using gradient descent, obtaining multiple gradient values. The server adds these multiple gradient values to the intermediate weights obtained from the previous training iteration, thereby obtaining N weights.
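The iterative update described above can be sketched as a plain gradient descent loop that stops either when the loss stops changing or when a preset number of iterations is reached. The sketch rests on several assumptions: a single squared-error loss, a learning rate `lr`, a tolerance `tol`, and the reading that the "gradient value" added to the intermediate weights is the descent step, that is, the negative gradient scaled by the learning rate. None of these specifics are stated in the source.

```python
import numpy as np

def loss_and_grad(Z, W, P, y):
    """Z: m x d reduced matrix, W: N x d weights, P: m x N predictions, y: m targets."""
    S = Z @ W.T
    E = np.exp(S - S.max(axis=1, keepdims=True))
    A = E / E.sum(axis=1, keepdims=True)          # row-wise attention weights
    y_hat = (A * P).sum(axis=1)
    err = y_hat - y
    # Chain rule through the row-wise normalization, then through S = Z W^T.
    dL_dS = A * err[:, None] * (P - y_hat[:, None])
    return 0.5 * np.sum(err ** 2), dL_dS.T @ Z

def fit_weights(Z, P, y, W0, lr=0.05, max_iter=500, tol=1e-8):
    """Start from the original weights W0 and return the updated weights."""
    W = np.array(W0, dtype=float)
    prev_loss = np.inf
    for _ in range(max_iter):                     # preset iteration limit
        cur_loss, grad = loss_and_grad(Z, W, P, y)
        if abs(prev_loss - cur_loss) < tol:       # treated as convergence
            break
        step = -lr * grad                         # assumed meaning of "gradient value"
        W = W + step                              # added to the intermediate weights
        prev_loss = cur_loss
    return W
```

In a toy case where one tree is exact and another is consistently off, the loop shifts attention toward the exact tree and the loss falls accordingly.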
[0132] In this embodiment of the invention, the server first obtains a prediction matrix by applying the N decision tree models to the multiple sets of feature data of the multiple training data. Then, it calculates the dimensionality-reduced matrix of the original feature matrix corresponding to the original training set. The server combines the dimensionality-reduced matrix with the prediction matrix to calculate multiple prediction results. Since these multiple prediction results take into account the preference characteristics of different decision tree models for the training data, the N weights that the server calculates from them yield a random forest model with improved data prediction accuracy.
[0133] In some embodiments, see Figure 4 , Figure 4 This is an optional flowchart illustrating the data processing method provided in an embodiment of the present invention. Figure 3 The shown S107 can be implemented through S112 to S113, which will be explained in conjunction with each step.
[0134] S112. Use each of the N decision tree models to predict multiple sets of feature data, and obtain the predicted values of each decision tree model corresponding to multiple training data.
[0135] In this embodiment of the invention, the server uses each of the N decision tree models to predict multiple sets of feature data, thereby obtaining the predicted values of the multiple sets of feature data corresponding to each decision tree model.
[0136] In this embodiment of the invention, the server sorts N decision tree models. The server sorts multiple sets of feature data. The server inputs each set of feature data into the N decision tree models in sequence, obtaining the predicted value for each set of feature data for each decision tree model.
[0137] S113. Use the multiple predicted values of the i-th decision tree model for the multiple training data, in sequence, as the i-th column of the prediction matrix, until all N columns of the prediction matrix are completed, thus forming the prediction matrix.
[0138] In this embodiment of the invention, the server sequentially uses the multiple predicted values of the i-th decision tree model for the multiple training data as the i-th column of the prediction matrix, until all N columns of the prediction matrix are completed, thereby forming the prediction matrix.
[0139] Where i is a positive integer greater than or equal to 1 and less than or equal to N.
[0140] In this embodiment of the invention, the server obtains a prediction matrix based on the prediction values of each decision tree model for each training data. This allows the prediction matrix to take into account the preference characteristics of different decision tree models for the training data.
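As a minimal sketch of S112 to S113, the prediction matrix can be assembled column by column. The names `TreeModel`, `build_prediction_matrix`, `toy_models` and `toy_features` are hypothetical, and each trained decision tree is assumed to be representable as a callable that maps one set of feature data to a predicted value.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for a trained decision tree model: any callable that
# maps one set of feature data to a single predicted value.
TreeModel = Callable[[Sequence[float]], float]


def build_prediction_matrix(
    models: Sequence[TreeModel],
    feature_sets: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Return an M x N matrix whose i-th column holds model i's predictions."""
    matrix = [[0.0] * len(models) for _ in feature_sets]
    for i, model in enumerate(models):               # column index: model
        for m, features in enumerate(feature_sets):  # row index: training data
            matrix[m][i] = model(features)
    return matrix


# Two toy "trees" and three training data items (illustrative values only).
toy_models = [lambda x: x[0] + x[1], lambda x: 2.0 * x[0]]
toy_features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
prediction_matrix = build_prediction_matrix(toy_models, toy_features)
```

With M training data and N models the result has M rows and N columns, which matches the shape used later for the second intermediate matrix.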
[0141] In some embodiments, see Figure 5, which is an optional flowchart of the data processing method provided in an embodiment of the present invention. S114 to S115 are performed before S103 shown in Figure 3, and are explained step by step below.
[0142] S114. Obtain multiple initial data, process the multiple initial data using the data warehouse method to obtain multiple training data, and then form the original training set using the multiple training data.
[0143] In this embodiment of the invention, the server acquires multiple initial data sets. The server processes the multiple initial data sets using the data warehouse method (Extract-Transform-Load, ETL) to obtain multiple training data sets. The server combines the multiple training data sets to obtain the original training set.
[0144] Each training data point includes: the encoding information of the corresponding target and multiple feature data.
[0145] In this embodiment of the invention, the multiple initial data are data corresponding to multiple targets. Each of the multiple initial data may include: transaction habit information, gender information, age information, education information, and location information of the corresponding target.
[0146] S115. Sequentially, the values of multiple feature data corresponding to each training data are used as each row of the original feature matrix until all rows of the original feature matrix are completed, thus forming the original feature matrix.
[0147] In this embodiment of the invention, the server sequentially uses the values of multiple feature data corresponding to each training data as each row of the original feature matrix until all rows of the original feature matrix are completed, thereby forming the original feature matrix.
[0148] In this embodiment of the invention, the server arranges the multiple feature data corresponding to the multiple training data in the order of the multiple training data, thereby obtaining the original feature matrix.
[0149] Each row of the original feature matrix corresponds to a feature vector of the corresponding training data.
[0150] In this embodiment of the invention, the server processes the initial data to obtain multiple training data, and combines the feature data corresponding to the multiple training data to obtain the original feature matrix, which makes it convenient to perform quantization processing on the data in the original training set.
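The row-by-row construction in S115 can be sketched as follows. The record layout `TrainingItem`, its field names, and the sample values are assumptions made for illustration; the source only states that each training data includes the encoding information of the target and multiple feature data.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class TrainingItem:
    """Hypothetical layout of one training data item."""
    target_code: str       # encoding information of the corresponding target
    features: List[float]  # values of the multiple feature data, in a fixed order


def build_original_feature_matrix(training_set: Sequence[TrainingItem]) -> List[List[float]]:
    """Use each item's feature values, in training-set order, as one matrix row."""
    return [list(item.features) for item in training_set]


# Illustrative values only.
toy_training_set = [
    TrainingItem("u001", [1.0, 0.0, 35.0]),
    TrainingItem("u002", [0.0, 1.0, 42.0]),
]
original_feature_matrix = build_original_feature_matrix(toy_training_set)
```

Each row of the result is the feature vector of one training data item, so the matrix has as many rows as the original training set has items.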
[0151] In some embodiments, see Figure 5, which is an optional flowchart of the data processing method provided in an embodiment of the present invention. S108 shown in Figure 3 can be implemented through S116 to S117, which are explained step by step below.
[0152] S116. Calculate the covariance matrix of the original feature matrix of the original training set, and determine the feature matrix corresponding to the covariance matrix.
[0153] In this embodiment of the invention, the server calculates the covariance matrix of the original feature matrix and determines the feature matrix corresponding to the covariance matrix.
[0154] In this embodiment of the invention, the server can calculate the average value of each dimension of the original feature matrix. The server subtracts the corresponding average value from each column of the original feature matrix, obtaining a centered matrix corresponding to the original feature matrix. The server then calculates the covariance matrix of this centered matrix using the covariance calculation formula. The server calculates multiple eigenvalues of the covariance matrix, selects d eigenvalues from them, and uses the d eigenvectors corresponding to these d eigenvalues to form the feature matrix.
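The centering and covariance steps can be sketched in plain Python as below. The function names are hypothetical, and dividing by M - 1 (the sample covariance) is an assumption, since the source does not state which covariance formula is used. The eigenvalue decomposition itself is left out of this sketch.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def center_columns(matrix: Sequence[Sequence[float]]) -> Matrix:
    """Subtract each column's mean from every entry in that column."""
    rows, cols = len(matrix), len(matrix[0])
    means = [sum(row[c] for row in matrix) / rows for c in range(cols)]
    return [[row[c] - means[c] for c in range(cols)] for row in matrix]


def covariance_matrix(matrix: Sequence[Sequence[float]]) -> Matrix:
    """Covariance of the columns; divisor M - 1 is an assumed convention."""
    centered = center_columns(matrix)
    rows, cols = len(centered), len(centered[0])
    return [
        [sum(r[a] * r[b] for r in centered) / (rows - 1) for b in range(cols)]
        for a in range(cols)
    ]


# Illustrative 3 x 2 feature matrix.
toy_matrix = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
toy_covariance = covariance_matrix(toy_matrix)
```

The result is a square, symmetric matrix with one row and one column per feature dimension.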
[0155] S117. Multiply the original feature matrix by the transpose of the feature matrix to obtain the dimension-reduced matrix.
[0156] In this embodiment of the invention, the server multiplies the original feature matrix with the transpose of the feature matrix to obtain the dimension-reduced matrix.
[0157] For example, the server can calculate the dimension reduction matrix Q using formula (1).
[0158] $Q = DP^{T}$ (1)
[0159] Where D is the original feature matrix and $P^{T}$ is the transpose of the feature matrix. The server multiplies D by $P^{T}$ to obtain the dimensionality reduction matrix Q.
[0160] In this embodiment of the invention, the server multiplies the original feature matrix with the transpose of the feature matrix to obtain a dimensionality-reduced matrix, which in turn reduces the dimensionality of the original training set, facilitating the calculation of weights.
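Formula (1) can be illustrated with a small pure-Python sketch. It assumes D has one row per training data and P has d rows, one per retained eigenvector, so that Q has one row per training data and d columns. The helper names and the toy values are hypothetical, and the rows of `toy_P` are not real eigenvectors.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def transpose(matrix: Sequence[Sequence[float]]) -> Matrix:
    return [list(column) for column in zip(*matrix)]


def matmul(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def reduce_dimensions(D: Sequence[Sequence[float]], P: Sequence[Sequence[float]]) -> Matrix:
    """Q = D P^T: project each row of D onto the d rows of P."""
    return matmul(D, transpose(P))


toy_D = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # 2 training data, 3 features
toy_P = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # d = 2 rows, illustrative only
toy_Q = reduce_dimensions(toy_D, toy_P)
```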
[0161] In some embodiments, see Figure 6, which is an optional flowchart of the data processing method provided in an embodiment of the present invention. S116 shown in Figure 5 can be implemented through S118 to S120, which are explained step by step below.
[0162] S118. Calculate the covariance matrix of the original feature matrix, and calculate the multidimensional eigenvalues of the covariance matrix and the multiple eigenvectors corresponding to the multidimensional eigenvalues.
[0163] In this embodiment of the invention, the server calculates the covariance matrix of the original feature matrix, and calculates the multidimensional eigenvalues of the covariance matrix and the multiple eigenvectors corresponding to the multidimensional eigenvalues.
[0164] S119. Identify the top d largest eigenvalues among the multidimensional eigenvalues.
[0165] In this embodiment of the invention, the server determines the d largest eigenvalues among the multidimensional eigenvalues.
[0166] In this embodiment of the invention, the server sorts the multiple eigenvalues in descending order to obtain a sorted set. The server then takes the first d eigenvalues, which are the d largest, from the sorted set.
[0167] Here, the sum of the first d eigenvalues is greater than or equal to a preset value, and the sum of the first d-1 eigenvalues is less than the preset value; d is a positive integer greater than or equal to 1. The preset value can range from 0 to 1.
[0168] S120. Use the first d eigenvectors, which correspond to the first d eigenvalues among the multiple eigenvectors, as the d rows of the feature matrix, thereby forming the feature matrix.
[0169] In this embodiment of the invention, the server uses the first d eigenvectors, which correspond to the first d eigenvalues among the multiple eigenvectors, as the d rows of the feature matrix, thereby forming the feature matrix.
[0170] In this embodiment of the invention, the server determines the d eigenvectors corresponding to the d largest eigenvalues and obtains the feature matrix from them. This fully integrates the feature information of each data point, so that the dimensionality reduction matrix matches the original training set more closely.
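The selection rule of S119 to S120 can be sketched as follows. Because the preset value ranges from 0 to 1, this sketch assumes the eigenvalues are compared as shares of their total; that normalization is an assumption, not something the source states. The function names and toy values are hypothetical.

```python
from typing import List, Sequence


def select_component_count(eigenvalues: Sequence[float], preset: float) -> int:
    """Smallest d whose d largest eigenvalue shares sum to at least `preset`."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)  # assumed normalization so shares lie in [0, 1]
    running = 0.0
    for d, value in enumerate(ordered, start=1):
        running += value / total
        if running >= preset:
            return d
    return len(ordered)


def build_feature_matrix(
    eigenvalues: Sequence[float],
    eigenvectors: Sequence[Sequence[float]],
    preset: float,
) -> List[List[float]]:
    """Rows are the eigenvectors of the d largest eigenvalues, largest first."""
    d = select_component_count(eigenvalues, preset)
    order = sorted(range(len(eigenvalues)), key=lambda i: eigenvalues[i], reverse=True)
    return [list(eigenvectors[i]) for i in order[:d]]


# Illustrative values; eigenvectors[i] is paired with eigenvalues[i].
toy_eigenvalues = [0.5, 4.0, 1.5, 2.0]
toy_eigenvectors = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
toy_feature_matrix = build_feature_matrix(toy_eigenvalues, toy_eigenvectors, preset=0.7)
```

By construction the first d-1 shares sum to less than the preset value, which mirrors the condition on d stated above.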
[0171] In some embodiments, see Figure 7, which is an optional flowchart of the data processing method provided in an embodiment of the present invention. S110 shown in Figure 4 can be implemented through S121 to S125, which are explained step by step below.
[0172] S121. Construct the values of N rows of the parameter matrix using N original weights.
[0173] In this embodiment of the invention, the server pre-constructs a parameter matrix, and the i-th row of the parameter matrix corresponds to the original weights of the i-th decision tree model.
[0174] In this embodiment of the invention, the server presets the parameter matrix of Attention to be K, which is an N x d matrix. The i-th row of K contains the parameters of the i-th decision tree model in Attention.
[0175] S122. Multiply the dimension reduction matrix by the transpose of the parameter matrix to obtain the matching degree matrix.
[0176] In this embodiment of the invention, the server multiplies the dimensionality reduction matrix with the transpose of the parameter matrix to obtain the matching degree matrix.
[0177] In this embodiment of the invention, the server can calculate the matching degree matrix Z using formula (2).
[0178] $Z = QK^{T}$ (2)
[0179] Where Q is the dimensionality reduction matrix and $K^{T}$ is the transpose of the parameter matrix. The server multiplies Q by $K^{T}$ to obtain the matching degree matrix Z.
[0180] S123. Take the exponent of each value in the matching degree matrix to obtain the first intermediate matrix.
[0181] In this embodiment of the invention, the server takes the exponent of each value in the matching degree matrix to obtain the first intermediate matrix.
[0182] In this embodiment of the invention, the server can calculate the first intermediate matrix A using formula (3).
[0183] $A = e^{Z}$ (3)
[0184] Where e is the base of the natural logarithm. The server raises e to the power of each value in the matching degree matrix, obtaining an intermediate value for each value. The server arranges the intermediate values according to the arrangement of the values in the matching degree matrix, obtaining the first intermediate matrix A.
[0185] S124. Calculate the quotient of each value in the first intermediate matrix and the sum of the values in its corresponding row to obtain the second intermediate matrix.
[0186] In this embodiment of the invention, the server calculates the quotient of each value in the first intermediate matrix and the sum of the values in the corresponding row to obtain the second intermediate matrix.
[0187] In this embodiment of the invention, the server divides each value in the i-th row of the first intermediate matrix A by the sum of multiple values in the row to which it belongs, that is, normalizes the first intermediate matrix A by row to obtain the second intermediate matrix W.
[0188] In this embodiment of the invention, the server can calculate the second intermediate matrix W using formula (4).
[0189] $W_{ji} = \dfrac{A_{ji}}{S_{j}}$ (4)
[0190] Where $W_{ji}$ is the i-th value in the j-th row of the second intermediate matrix, $A_{ji}$ is the i-th value in the j-th row of the first intermediate matrix, and $S_{j}$ is the sum of the j-th row of the first intermediate matrix. The server divides $A_{ji}$ by $S_{j}$ to obtain $W_{ji}$, which yields the second intermediate matrix W.
[0191] S125. Based on the second intermediate matrix and the prediction matrix, calculate multiple prediction results.
[0192] In this embodiment of the invention, the server calculates multiple prediction results based on the second intermediate matrix and the prediction matrix.
[0193] In this embodiment of the invention, the server multiplies each value in each row of the second intermediate matrix with the corresponding value in the prediction matrix to obtain multiple prediction results.
[0194] In this embodiment of the invention, a parameter matrix is first constructed based on N original weights. Then, the parameter matrix and the dimensionality reduction matrix are combined for calculation to obtain a second intermediate matrix. Finally, the second intermediate matrix is combined with the prediction matrix to obtain multiple prediction results. The process of the server calculating multiple prediction results combines the prediction values of the decision tree model on the training data with the dimensionality reduction matrix of the original training set. This ensures that the calculated multiple prediction results also take into account the preference characteristics of the decision tree model for the training data, thereby improving the prediction accuracy of the final random forest model.
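A minimal sketch of S121 to S125 for one pass over the training data is given below. It assumes Q has M rows and d columns, K has N rows and d columns, and the prediction matrix has M rows and N columns. The function name `attention_predictions` and the toy values are hypothetical.

```python
import math
from typing import List, Sequence


def attention_predictions(
    Q: Sequence[Sequence[float]],
    K: Sequence[Sequence[float]],
    predictions: Sequence[Sequence[float]],
) -> List[float]:
    """Return one prediction result per training data row."""
    results = []
    for q_row, p_row in zip(Q, predictions):
        # Formula (2): one matching degree per decision tree model.
        z_row = [sum(q * k for q, k in zip(q_row, k_row)) for k_row in K]
        # Formula (3): element-wise exponential.
        a_row = [math.exp(z) for z in z_row]
        # Formula (4): normalize by the row sum.
        row_sum = sum(a_row)
        w_row = [a / row_sum for a in a_row]
        # S125: weight each model's predicted value and add the products.
        results.append(sum(w * p for w, p in zip(w_row, p_row)))
    return results


# With an all-zero parameter matrix every model gets the same weight,
# so each prediction result reduces to the plain average of the row.
toy_Q = [[1.0, 2.0], [0.5, -1.0]]
toy_K = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
toy_predictions = [[1.0, 2.0, 3.0], [4.0, 4.0, 10.0]]
toy_results = attention_predictions(toy_Q, toy_K, toy_predictions)
```

The all-zero case is a useful sanity check because it recovers the mean-based combination of an ordinary random forest; non-zero rows of K shift weight toward particular models depending on the reduced features of each row.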
[0195] In some embodiments, see Figure 8, which is an optional flowchart of the data processing method provided in an embodiment of the present invention. Steps S125 to S111 shown in Figure 7 can be implemented through S126 to S130, which are explained step by step below.
[0196] S126. Multiply each value in each row of the second intermediate matrix with the corresponding value in the corresponding row of the prediction matrix to obtain multiple intermediate values for each row.
[0197] In this embodiment of the invention, the server multiplies each value in each row of the second intermediate matrix with the corresponding value in the corresponding row of the prediction matrix to obtain multiple intermediate values for each row.
[0198] In this embodiment of the invention, both the second intermediate matrix and the prediction matrix can be matrices with M rows and N columns. The server multiplies each value in each row of the second intermediate matrix with the corresponding value in the corresponding row of the prediction matrix to obtain N intermediate values corresponding to the N values in that row.
[0199] S127. Combine multiple intermediate values of each row to obtain a vector for each row, and then obtain multiple prediction results.
[0200] In this embodiment of the invention, the server then combines multiple intermediate values of each row to obtain a vector for each row, thereby obtaining multiple prediction results.
[0201] In this embodiment of the invention, the server can combine the N intermediate values of each row to obtain a vector for each row. This results in M prediction results.
[0202] For example, the server can calculate the vector of the i-th row using formula (5).
[0203] Where one factor is the j-th value in the i-th row of the second intermediate matrix W, and the other factor is the j-th value in the i-th row of the prediction matrix. The server multiplies the two for each j, then adds the resulting intermediate values together to obtain the prediction result of the i-th row.
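One form of formula (5) that is consistent with this description is sketched below. W is the second intermediate matrix named in the text; the symbols $Y$ for the prediction matrix and $\hat{y}_i$ for the prediction result of the i-th row are assumptions introduced here, since the text does not name them.

```latex
\hat{y}_i = \sum_{j=1}^{N} W_{ij}\, Y_{ij}, \qquad i = 1, \dots, M
```

Because each row of W sums to 1, every prediction result is a weighted average of the N predicted values in the same row of the prediction matrix.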
[0204] S128. Calculate the derivatives of the N loss functions with respect to the multiple prediction results, obtain N expressions for the N gradient values based on the derivatives and the chain rule, and solve the N expressions to obtain the N gradient values.
[0205] In this embodiment of the invention, the server calculates the derivatives of the N loss functions with respect to the multiple prediction results, and obtains N expressions for the N gradient values based on the derivatives and the chain rule. The N expressions are then solved to obtain the N gradient values.
[0206] S129. Update the N original weights with N gradient values, and then iteratively train the multiple prediction results with N loss functions until the N loss functions converge to obtain N gradient values.
[0207] In this embodiment of the invention, the server updates N original weights with N gradient values, and then iteratively trains multiple prediction results with N loss functions until the N loss functions converge to obtain N gradient values.
[0208] In this embodiment of the invention, after the server iteratively trains multiple prediction results using N loss functions, it obtains a new gradient value. The server adds the new gradient value to the previous weight to obtain the current weight. This process continues until the N loss functions converge, resulting in N gradient values.
[0209] S130. Obtain N intermediate weights from the N intermediate gradient values of the previous training iteration, before the N loss functions converge, and add the N gradient values to the N intermediate weights to obtain the N weights.
[0210] In this embodiment of the invention, the server obtains N intermediate weights from the N intermediate gradient values of the previous training iteration, before the N loss functions converge. The server then adds the N gradient values to their corresponding N intermediate weights to obtain the N weights.
[0211] In this embodiment of the invention, the server updates N original weights using N gradient values, and then iteratively trains multiple prediction results using N loss functions to obtain N intermediate gradient values before convergence. The N gradient values and N intermediate weights are then added together to obtain N weights. By iteratively training with N loss functions, the server obtains N weights, strengthening the correlation between the weights and the corresponding decision tree model.
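The iterative update of S128 to S130 can be sketched as a plain gradient descent loop over the parameter matrix K. Several things here are assumptions rather than statements of the source: a single squared-error loss stands in for the N unspecified loss functions, the quantity added to the weights is taken to be the negative gradient scaled by a learning rate, and the names `forward`, `squared_error`, `gradient_wrt_K`, `train`, `lr`, `max_iter` and `tol` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def forward(Q: Matrix, K: Matrix, P: Matrix) -> Tuple[Matrix, List[float]]:
    """Return the row-normalized weights W and the prediction results."""
    W, y_hat = [], []
    for q_row, p_row in zip(Q, P):
        a_row = [math.exp(sum(q * k for q, k in zip(q_row, k_row))) for k_row in K]
        row_sum = sum(a_row)
        w_row = [a / row_sum for a in a_row]
        W.append(w_row)
        y_hat.append(sum(w * p for w, p in zip(w_row, p_row)))
    return W, y_hat


def squared_error(y_hat: Sequence[float], y: Sequence[float]) -> float:
    """Assumed loss: half the sum of squared differences."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(y_hat, y))


def gradient_wrt_K(Q: Matrix, K: Matrix, P: Matrix, y: Sequence[float]) -> Matrix:
    """Gradient of the assumed loss with respect to every entry of K."""
    W, y_hat = forward(Q, K, P)
    grad = [[0.0] * len(K[0]) for _ in K]
    for m, q_row in enumerate(Q):
        error = y_hat[m] - y[m]
        for i in range(len(K)):
            # Derivative of the loss with respect to the matching degree Z[m][i].
            dz = error * W[m][i] * (P[m][i] - y_hat[m])
            for c, q in enumerate(q_row):
                grad[i][c] += dz * q
    return grad


def train(Q, K0, P, y, lr=0.1, max_iter=500, tol=1e-10) -> Matrix:
    """Update K until the loss stops changing or max_iter is reached."""
    K = [row[:] for row in K0]  # leave the original weights untouched
    previous = squared_error(forward(Q, K, P)[1], y)
    for _ in range(max_iter):
        grad = gradient_wrt_K(Q, K, P, y)
        K = [[k - lr * g for k, g in zip(k_row, g_row)] for k_row, g_row in zip(K, grad)]
        current = squared_error(forward(Q, K, P)[1], y)
        if abs(previous - current) < tol:
            break
        previous = current
    return K


# Illustrative data: 2 training data, 2 models, d = 2.
toy_Q = [[1.0, 0.0], [0.0, 1.0]]
toy_P = [[1.0, 3.0], [2.0, 0.0]]
toy_y = [2.5, 0.5]
toy_K0 = [[0.0, 0.0], [0.0, 0.0]]
toy_K = train(toy_Q, toy_K0, toy_P, toy_y)
loss_before = squared_error(forward(toy_Q, toy_K0, toy_P)[1], toy_y)
loss_after = squared_error(forward(toy_Q, toy_K, toy_P)[1], toy_y)
```

The loop stops either when the loss change falls below `tol` or when `max_iter` is reached, which corresponds to the two stopping conditions mentioned in the embodiments (convergence or a predetermined number of iterations).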
[0212] In some embodiments, see Figure 9, which is an optional flowchart of the data processing method provided in an embodiment of the present invention. S128 shown in Figure 8 can be implemented through S131 to S137, which are explained step by step below.
[0213] S131. Calculate the N first derivatives of N loss functions with respect to multiple prediction results.
[0214] In this embodiment of the invention, the server calculates the N first derivatives of N loss functions with respect to multiple prediction results.
[0215] In this embodiment of the invention, the true label value of the sample and the prediction result are given, and the loss function is L. Once the gradient of L with respect to the parameters K is obtained, gradient descent can be used for training to obtain the optimal values of the parameters K, which are also the weight values. The gradient calculation process is as follows:
[0216] In this embodiment of the invention, the server can use formula (6) to calculate the first derivative of a loss function L with respect to the corresponding prediction result:
[0217]
[0218] S132. Find the multiple second derivatives of the multiple prediction results with respect to the second intermediate matrix.
[0219] In this embodiment of the invention, the server calculates the multiple second derivatives of the multiple prediction results with respect to the second intermediate matrix.
[0220] In this embodiment of the invention, the server can use formula (7) to calculate the second derivative of a prediction result with respect to W:
[0221]
[0222] S133. Find the third derivative of the second intermediate matrix with respect to the first intermediate matrix.
[0223] In this embodiment of the invention, the server calculates the third derivative of the second intermediate matrix with respect to the first intermediate matrix.
[0224] In this embodiment of the invention, the server can calculate the third derivative of W with respect to A using formula (8):
[0225]
[0226] S134. Find the fourth derivative of the first intermediate matrix with respect to the matching degree matrix.
[0227] In this embodiment of the invention, the server calculates the fourth derivative of the first intermediate matrix with respect to the matching degree matrix.
[0228] In this embodiment of the invention, the server can calculate the fourth derivative of A with respect to Z using formula (9):
[0229]
[0230] S135. Find the fifth derivative of the matching degree matrix with respect to the parameter matrix.
[0231] In this embodiment of the invention, the server calculates the fifth derivative of the matching degree matrix with respect to the parameter matrix.
[0232] In this embodiment of the invention, the server can calculate the fifth derivative of Z with respect to K using formula (10):
[0233]
[0234] S136. According to the chain rule, combine the N first derivatives, the multiple second derivatives, the third derivative, the fourth derivative, and the fifth derivative to obtain N expressions.
[0235] In this embodiment of the invention, the server obtains N expressions by combining the N first derivatives, the multiple second derivatives, the third derivative, the fourth derivative, and the fifth derivative according to the chain rule.
[0236] In this embodiment of the invention, the server can obtain formula (11) based on the chain rule:
[0237]
[0238] S137. Solve the N expressions to obtain N gradient values.
[0239] In this embodiment of the invention, the server calculates the derivatives linking a loss function to the parameter K, and then obtains the expression for the gradient value of the loss function using the chain rule. Solving this expression yields the corresponding gradient value, which in turn determines the corresponding weight. This process strengthens the correlation between the gradient value and the corresponding decision tree model.
[0240] In this embodiment of the invention, the server solves N expressions to obtain N gradient values.
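As a hedged sketch of how the five derivatives of S131 to S135 combine, the gradient of L with respect to K can be written as their product, which is the chain rule of calculus. The symbols $\hat{y}$ for a prediction result and $Y$ for the prediction matrix are assumptions introduced here, and the factor $\partial L / \partial \hat{y}$ is left unexpanded because the text does not specify the loss function. The index j runs over rows (training data), i and l over decision tree models, and c over the d reduced dimensions.

```latex
\frac{\partial L}{\partial K}
  = \frac{\partial L}{\partial \hat{y}}
    \cdot \frac{\partial \hat{y}}{\partial W}
    \cdot \frac{\partial W}{\partial A}
    \cdot \frac{\partial A}{\partial Z}
    \cdot \frac{\partial Z}{\partial K}

% Individual factors implied by Z = QK^{T}, A = e^{Z}, W_{ji} = A_{ji}/S_{j}
% and \hat{y}_j = \sum_i W_{ji} Y_{ji}:
\frac{\partial \hat{y}_j}{\partial W_{ji}} = Y_{ji}, \qquad
\frac{\partial W_{ji}}{\partial A_{jl}} = \frac{\delta_{il}\, S_j - A_{ji}}{S_j^{2}}, \qquad
\frac{\partial A_{ji}}{\partial Z_{ji}} = e^{Z_{ji}} = A_{ji}, \qquad
\frac{\partial Z_{ji}}{\partial K_{ic}} = Q_{jc}
```

Here $\delta_{il}$ equals 1 when i = l and 0 otherwise, and $S_j = \sum_{l} A_{jl}$ is the row sum used in the normalization step.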
[0241] In some embodiments, see Figure 10, which is an optional flowchart of the data processing method provided in an embodiment of the present invention. S103 shown in Figure 5 can be implemented through S131 to S133, which are explained step by step below.
[0242] S131. Perform N samplings with replacement on multiple training data in the original training set to obtain N training sets.
[0243] In this embodiment of the invention, the server performs N samplings with replacement on the multiple training data in the original training set. The N samplings yield N training sets.
[0244] In this embodiment of the invention, the server performs N samplings with replacement on M training data points, resulting in N training sets. Each training set may include M training data points.
[0245] In this embodiment of the invention, the server performs N rounds of sampling from the original sample set using a bootstrap (sampling with replacement) method, sampling M samples in each round to obtain N sub-training sets {D1,D2,...,DN} of size M.
[0246] In this embodiment of the invention, during sampling, some samples may be drawn repeatedly and others may not be drawn at all. The probability that a given sample is not selected in a single draw is $1 - \frac{1}{M}$. If M draws are performed in each round, the probability that the sample is never drawn in that round is $\left(1 - \frac{1}{M}\right)^{M}$. As M approaches infinity, this value approaches $1/e$, which is approximately 0.37. That is, only about 63% of the samples in the original sample set D appear in each sub-training set.
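The limiting value quoted above can be checked numerically. This short sketch evaluates the probability that a sample is never drawn for a few values of M and compares it with 1/e; the function name is hypothetical.

```python
import math


def never_drawn_probability(m: int) -> float:
    """Probability that one fixed sample is missed in m draws with replacement."""
    return (1.0 - 1.0 / m) ** m


limit = math.exp(-1)  # the value approached as M grows without bound

for m in (10, 100, 10_000):
    gap = abs(never_drawn_probability(m) - limit)
    print(f"M={m}: probability={never_drawn_probability(m):.4f}, gap to 1/e={gap:.6f}")
```

The gap to 1/e shrinks as M grows, which is why roughly 37% of the original samples are left out of each sub-training set for large M.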
[0247] S132. Extract at least one feature data corresponding to each training data in each of the N training sets.
[0248] In an embodiment of the present invention, the server extracts at least one feature data corresponding to each training data in each of the N training sets.
[0249] In an embodiment of the present invention, each training data corresponds to multiple feature data, and the server uses each training set Di to generate one decision tree model. During generation, t features (t < T) are randomly selected from the T feature data to form a sub-feature set, and each decision tree model is trained using its randomly selected sub-feature set. Both the training sets and the features for these N decision tree models are selected at random, and the correlation between the N decision tree models increases with the value of t.
[0250] S133. Use at least one feature data corresponding to each training data in each training set to train the original decision tree model, obtain the decision tree model corresponding to each training set, and thus obtain N decision tree models.
[0251] In an embodiment of the present invention, the server uses at least one feature data corresponding to each training data in each training set to train the original decision tree model, obtain the decision tree model corresponding to each training set, and thus obtain N decision tree models.
[0252] In an embodiment of the present invention, one training pass of each decision tree model can be represented by the following stages: a forward propagation stage, a backward propagation stage, and a weight update stage. In the forward propagation stage, the text information is fed into the input layer and passed onward to the output layer. In the backward propagation stage, it is passed from the output layer back to the input layer. In the data processing method proposed in an embodiment of the present invention, in the forward propagation stage, the sub-training set Di is input into the network structure of the original decision tree model to be trained. The network structure of the original decision tree model calculates the loss corresponding to the sub-training set Di through the loss function based on the text information.
[0253] In this embodiment of the invention, the network structure of the original decision tree model calculates the loss of the corresponding sub-training set Di based on the loss function. If this loss is greater than the loss threshold, the network structure of the decision tree model will backpropagate the loss layer by layer through the output layer to the intermediate layers and the input layer, correcting the weights of each layer in a gradient descent manner. After the weights of each layer of the original decision tree model network structure are corrected, the original decision tree model network structure will continue to train on the newly acquired sub-training set Di+1. The process of training the original decision tree model to obtain the decision tree model continues until the current loss calculated by the original decision tree model is not greater than the loss threshold, or until the number of training iterations of the original decision tree model reaches the preset number of training iterations, thus obtaining the decision tree model.
[0254] In this embodiment of the invention, the server obtains the training set through sampling with replacement. This diversifies the data within different training sets, thereby training corresponding decision tree models. Furthermore, by training the decision tree models with different samples, N decision tree models are obtained, resulting in a diversity of models.
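The two sources of randomness described for S131 to S133, sampling with replacement and random feature selection, can be sketched with the standard `random` module. The function names, the fixed seed, and the toy sizes are assumptions made for illustration; fitting the decision trees themselves is outside this sketch.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def bootstrap_training_sets(
    original: Sequence[T], n_sets: int, rng: random.Random
) -> List[List[T]]:
    """Draw n_sets training sets of size M from `original`, with replacement."""
    m = len(original)
    return [[original[rng.randrange(m)] for _ in range(m)] for _ in range(n_sets)]


def random_feature_subset(total_features: int, t: int, rng: random.Random) -> List[int]:
    """Pick t distinct feature indices out of total_features, with t < total_features."""
    if not 0 < t < total_features:
        raise ValueError("t must satisfy 0 < t < total_features")
    return sorted(rng.sample(range(total_features), t))


rng = random.Random(0)           # fixed seed only to make the sketch repeatable
toy_original = list(range(8))    # stand-ins for M = 8 training data
toy_sets = bootstrap_training_sets(toy_original, n_sets=3, rng=rng)
toy_subsets = [random_feature_subset(5, 2, rng) for _ in toy_sets]
```

Each entry of `toy_sets` would feed one decision tree, restricted to the feature indices in the matching entry of `toy_subsets`.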
[0255] In some embodiments, see Figure 11, which is an optional flowchart of the data processing method provided in an embodiment of the present invention. S105 shown in Figure 2 can be implemented through S134 to S135, which are explained step by step below.
[0256] S134. Multiply each of the N decision tree models by its corresponding weight to obtain N intermediate models.
[0257] S135. Combine the N intermediate models to form an optimized random forest model.
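S134 to S135 can be sketched as a weighted sum of the individual model outputs. This sketch assumes one fixed scalar weight per decision tree model and models that are plain callables; the name `combine_models` and the toy values are hypothetical.

```python
from typing import Callable, Sequence

TreeModel = Callable[[Sequence[float]], float]


def combine_models(models: Sequence[TreeModel], weights: Sequence[float]) -> TreeModel:
    """Return a model whose output is the weighted sum of the given models."""
    if len(models) != len(weights):
        raise ValueError("each decision tree model needs exactly one weight")

    def forest(features: Sequence[float]) -> float:
        # Each product corresponds to one "intermediate model" of S134.
        return sum(w * model(features) for model, w in zip(models, weights))

    return forest


# Two toy models with illustrative weights.
toy_forest = combine_models([lambda x: x[0], lambda x: 10.0 * x[0]], [0.25, 0.75])
toy_output = toy_forest([2.0])
```

When the weights are non-negative and sum to 1, the combined output stays within the range spanned by the individual model outputs.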
[0258] In some embodiments, see Figure 12, which is an optional flowchart of the data processing method provided in the embodiments of the present invention and is described step by step below.
[0259] S201. Obtain the training data processed by the data warehouse.
[0260] In this embodiment of the invention, the server obtains training data that has been processed by the data warehouse.
[0261] S202. Form N training sets by sampling with replacement.
[0262] In this embodiment of the invention, the server performs N samplings with replacement on the training data, forming N training sets. The server trains N decision trees using the N training sets. The server then predicts the training data using each of the N decision trees, obtaining N sets of predicted values corresponding to each of the N decision trees. Each set of predicted values includes multiple predicted values from the corresponding decision tree. The server then constructs a prediction matrix using these multiple sets of predicted values.
[0263] S203. Perform dimensionality reduction.
[0264] In this embodiment of the invention, the server performs dimensionality reduction processing on N training sets to obtain a dimensionality reduction matrix.
[0265] S204. Combine the dimensionality reduction matrix, the prediction matrix, and the preset parameter matrix to calculate N K values.
[0266] In this embodiment of the invention, the server combines the dimensionality reduction matrix, the prediction matrix, and the preset parameter matrix to calculate N K values. This yields the weights corresponding to the N decision trees.
[0267] S205. Combine the N K values and the N decision trees to obtain the optimized random forest model.
[0268] In this embodiment of the invention, the server multiplies each decision tree by its corresponding weight K, and finally adds them together to obtain the optimized random forest model.
[0269] See Figure 13, which is a schematic diagram of the structure of the data processing device provided in an embodiment of the present invention.
[0270] In an embodiment of the present invention, a data processing apparatus 800 is provided, including: a data acquisition unit 803 and a processing unit 804.
[0271] Data acquisition unit 803 is used to acquire multiple pieces of data to be processed at present;
[0272] Processing unit 804 is used to process the multiple datasets to be processed using an optimized random forest model to obtain the processing results of the multiple datasets to be processed; wherein,
[0273] The optimized random forest model is obtained by combining N decision tree models, which are combined with attention mechanism and the original training set after dimensionality reduction; N is a positive integer greater than 1.
[0274] In this embodiment of the invention, the data processing device 800 is used to train N decision tree models based on N training sets in the original training set; the original training set includes multiple training data, and each of the N training sets includes at least one training data; the original training set is predicted based on the N decision tree models to obtain a prediction matrix, and the prediction matrix is iteratively updated to obtain N weights corresponding to the N decision tree models; the N decision tree models and the N weights are combined to obtain the optimized random forest model.
[0275] In this embodiment of the invention, the prediction result includes a prediction matrix; the data processing device 800 is used to extract multiple sets of feature data corresponding to the multiple training data in the original training set; each set of feature data includes multiple feature data corresponding to the training data; the multiple sets of feature data are predicted using the N decision tree models respectively to obtain the prediction matrix; the original feature matrix corresponding to the original training set is subjected to dimensionality reduction processing to obtain a dimensionality reduction matrix; N original weights in the N decision tree models are obtained; multiple prediction results are calculated based on the prediction matrix, the N original weights and the dimensionality reduction matrix; the multiple prediction results are combined with N loss functions, and multiple iterations of training are performed until the N loss functions converge; the gradient descent method is used to solve the N loss functions to obtain multiple gradient values; the multiple gradient values and the multiple original weights are combined to obtain the N weights.
[0276] In this embodiment of the invention, the data processing device 800 is used to predict the multiple sets of feature data using each of the N decision tree models, to obtain the predicted value of each decision tree model corresponding to the multiple training data; the multiple predicted values of the i-th decision tree model corresponding to the multiple training data are sequentially used as the i-th column of the prediction matrix, until all N columns of the prediction matrix are completed, thereby forming the prediction matrix; i is a positive integer greater than or equal to 1 and less than or equal to N.
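The column-wise construction of the prediction matrix described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation of the embodiment: the function name `build_prediction_matrix`, the `Tree` type alias, and the toy callables standing in for trained decision tree models are all hypothetical.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in type: any callable mapping one set of feature data to a predicted value.
Tree = Callable[[Sequence[float]], float]


def build_prediction_matrix(trees: List[Tree],
                            feature_groups: List[Sequence[float]]) -> List[List[float]]:
    """Return an M x N matrix whose i-th column holds the predictions of the i-th tree."""
    # Predict every training datum with each tree in turn: one column per tree.
    columns = [[tree(features) for features in feature_groups] for tree in trees]
    # Transpose the list of columns into rows (one row per training datum).
    return [list(row) for row in zip(*columns)]


# Toy "decision trees" (assumed purely for illustration).
trees = [lambda f: f[0], lambda f: f[0] + f[1], lambda f: 2.0 * f[1]]
feature_groups = [[1.0, 2.0], [3.0, 4.0]]
prediction_matrix = build_prediction_matrix(trees, feature_groups)
print(prediction_matrix)
```

With M training data and N trees, the result has M rows and N columns, matching the layout in which the i-th column belongs to the i-th decision tree model.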
[0277] In this embodiment of the invention, the data acquisition unit 803 in the data processing device 800 is used to acquire multiple initial data, process the multiple initial data using a data warehouse method to obtain multiple training data, and then form the original training set using the multiple training data; each training data includes: encoding information corresponding to the target and multiple feature data; the values of the multiple feature data corresponding to each training data are sequentially used as each row of the original feature matrix until all rows of the original feature matrix are completed, thereby forming the original feature matrix.
[0278] In this embodiment of the invention, the data processing device 800 is used to calculate the covariance matrix of the original feature matrix of the original training set, and determine the feature matrix corresponding to the covariance matrix; multiply the original feature matrix by the transpose of the feature matrix to obtain the dimensionality reduction matrix.
[0279] In this embodiment of the invention, the data processing device 800 is used to calculate the covariance matrix of the original feature matrix, and to calculate the multidimensional eigenvalues of the covariance matrix and the multiple eigenvectors corresponding to the multidimensional eigenvalues; to determine the largest d eigenvalues among the multidimensional eigenvalues; the sum of the first d eigenvalues is greater than or equal to a preset value, and the sum of the first d-1 eigenvalues is less than the preset value; d is a positive integer greater than or equal to 1; and the first d eigenvectors corresponding to the first d eigenvalues in the multiple eigenvectors are used to form the d rows of the feature matrix, thereby forming the feature matrix.
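The dimensionality reduction described in the two preceding paragraphs can be sketched with NumPy as below. This is an illustrative sketch, not the implementation of the embodiment: the function name `reduce_dimensions` and the sample data are assumptions, and the preset value is treated as an absolute threshold on the cumulative eigenvalue sum, as the text states. The text does not mention centering the features before projection, so the sketch does not center them either.

```python
import numpy as np


def reduce_dimensions(X: np.ndarray, preset_value: float):
    """Project X (M samples x P features, P >= 2) onto the d leading eigenvectors
    of its covariance matrix. Returns (dimensionality reduction matrix, feature matrix)."""
    # Covariance matrix of the original feature matrix (columns are the variables).
    cov = np.cov(X, rowvar=False)
    # eigh is intended for symmetric matrices and returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]          # sort in descending order
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # Smallest d such that the sum of the d largest eigenvalues reaches the preset value.
    cumulative = np.cumsum(eigenvalues)
    reached = np.nonzero(cumulative >= preset_value)[0]
    d = int(reached[0]) + 1 if reached.size else len(eigenvalues)
    # The d leading eigenvectors form the d rows of the feature matrix.
    feature_matrix = eigenvectors[:, :d].T
    # Original feature matrix multiplied by the transpose of the feature matrix.
    return X @ feature_matrix.T, feature_matrix


# Hypothetical data: 4 training data with 3 feature values each.
X = np.array([[2.0, 0.0, 1.0],
              [4.0, 0.1, 1.0],
              [6.0, -0.1, 1.0],
              [8.0, 0.0, 1.0]])
reduced, feature_matrix = reduce_dimensions(X, preset_value=6.0)
print(reduced.shape, feature_matrix.shape)
```

If the preset value exceeds the sum of all eigenvalues, the sketch falls back to keeping every eigenvector; the text does not say how that case is handled, so this fallback is an assumption.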
[0280] In this embodiment of the invention, the data processing device 800 is used to construct the N rows of a parameter matrix using the N original weights; multiply the dimensionality reduction matrix by the transpose of the parameter matrix to obtain a matching degree matrix; apply the exponential function to each value in the matching degree matrix to obtain a first intermediate matrix; calculate the quotient of each value in the first intermediate matrix and the sum of the values in its corresponding row to obtain a second intermediate matrix; and calculate the multiple prediction results based on the second intermediate matrix and the prediction matrix.
[0281] In this embodiment of the invention, the data processing device 800 is used to multiply each value of each row of the second intermediate matrix with the corresponding value of the corresponding row of the prediction matrix to obtain a plurality of intermediate values for each row; combine the plurality of intermediate values of each row to obtain a vector for each row, and then obtain the plurality of prediction results.
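The attention-style weighting of the two preceding paragraphs amounts to a row-wise softmax over the matching degree matrix followed by a weighted combination of the tree predictions. The sketch below is illustrative only. The function name `attention_predictions` is hypothetical, the text does not say how the intermediate values of a row are combined, so summation is assumed, and subtracting the row maximum before exponentiation is a common numerical-stability step added here that cancels in the quotient.

```python
import numpy as np


def attention_predictions(reduced: np.ndarray, params: np.ndarray,
                          prediction_matrix: np.ndarray) -> np.ndarray:
    """reduced: M x d, params: N x d, prediction_matrix: M x N -> M prediction results."""
    # Matching degree matrix: dimensionality reduction matrix times transposed parameter matrix.
    matching = reduced @ params.T
    # First intermediate matrix: element-wise exponential (shifted per row for stability;
    # the shift is an addition of this sketch and does not change the quotient below).
    first = np.exp(matching - matching.max(axis=1, keepdims=True))
    # Second intermediate matrix: each value divided by the sum of the values in its row.
    second = first / first.sum(axis=1, keepdims=True)
    # Multiply element-wise with the prediction matrix, then combine each row
    # (summation assumed) to obtain one prediction result per training datum.
    return (second * prediction_matrix).sum(axis=1)


# Hypothetical inputs: M = 2 training data, d = 2 reduced features, N = 3 trees.
reduced = np.array([[1.0, 2.0], [3.0, 4.0]])
params = np.zeros((3, 2))            # all-zero parameters give uniform weights
prediction_matrix = np.array([[1.0, 3.0, 4.0], [3.0, 7.0, 8.0]])
results = attention_predictions(reduced, params, prediction_matrix)
print(results)
```

With an all-zero parameter matrix every tree receives the same weight for every row, so the result reduces to the plain mean used by a conventional random forest regressor.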
[0282] In this embodiment of the invention, the data processing device 800 is used to calculate the derivatives of the N loss functions with respect to the plurality of prediction results, and obtain N expressions for the N gradient values based on the derivatives and the chain rule. The N expressions are then solved to obtain N gradient values. The N original weights are updated using the N gradient values, and the plurality of prediction results are iteratively trained using the N loss functions until the N loss functions converge, to obtain the N gradient values. The N intermediate gradient values from the previous training are converged using the N loss functions to obtain N intermediate weights. The N gradient values and the N intermediate weights are then added together to obtain the N weights.
[0283] In this embodiment of the invention, the data processing device 800 is used to calculate N first derivatives of the N loss functions with respect to the plurality of prediction results; calculate a plurality of second derivatives of the plurality of prediction results with respect to the second intermediate matrix; calculate a third derivative of the second intermediate matrix with respect to the first intermediate matrix; calculate a fourth derivative of the first intermediate matrix with respect to the matching degree matrix; calculate a fifth derivative of the matching degree matrix with respect to the parameter matrix; obtain the N expressions by combining the N first derivatives, the plurality of second derivatives, the third derivative, the fourth derivative, and the fifth derivative according to the chain rule; and solve the N expressions to obtain the N gradient values. Here, "first" to "fifth" label the successive factors of the chain rule, not the order of differentiation.
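The chained derivatives described above can be illustrated with a short NumPy sketch. It rests on several assumptions that are not in the text: a single squared-error loss stands in for the N loss functions, the intermediate values of each row are combined by summation, the update uses a conventional gradient descent step with a hypothetical learning rate, and all function names are invented for illustration. The third and fourth derivatives are merged into the standard row-wise softmax Jacobian.

```python
import numpy as np


def forward(X, W, P):
    """X: M x d reduced features, W: N x d parameter matrix, P: M x N prediction matrix."""
    Z = X @ W.T                                    # matching degree matrix
    E = np.exp(Z - Z.max(axis=1, keepdims=True))   # first intermediate matrix (shifted)
    A = E / E.sum(axis=1, keepdims=True)           # second intermediate matrix
    return A, (A * P).sum(axis=1)                  # prediction results


def loss(X, W, P, y):
    """Assumed loss: half the sum of squared errors against the targets y."""
    _, y_hat = forward(X, W, P)
    return 0.5 * np.sum((y_hat - y) ** 2)


def gradient(X, W, P, y):
    """Gradient of the assumed loss with respect to the parameter matrix W."""
    A, y_hat = forward(X, W, P)
    d_yhat = y_hat - y                             # loss w.r.t. prediction results
    d_A = d_yhat[:, None] * P                      # predictions w.r.t. second intermediate matrix
    # Second intermediate matrix back to the matching degree matrix (softmax Jacobian).
    d_Z = A * (d_A - (d_A * A).sum(axis=1, keepdims=True))
    return d_Z.T @ X                               # matching degree matrix w.r.t. parameters


def fit_weights(X, W0, P, y, learning_rate=0.005, steps=200):
    """Plain gradient descent; learning_rate and steps are illustrative values."""
    W = W0.astype(float).copy()
    for _ in range(steps):
        W -= learning_rate * gradient(X, W, P, y)
    return W
```

A finite-difference comparison against `loss` is a simple way to confirm that the chained factors have been multiplied in the right order.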
[0284] In this embodiment of the invention, the data processing device 800 is used to perform N samplings with replacement on the plurality of training data in the original training set to obtain N training sets; extract at least one feature data corresponding to at least one training data in each of the N training sets; and train the original decision tree model using the at least one feature data corresponding to at least one training data in each training set to obtain a decision tree model corresponding to each training set, thereby obtaining the N decision tree models.
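Sampling with replacement to obtain the N training sets can be sketched as follows. The sketch is illustrative: the function name `bootstrap_training_sets` and the `seed` parameter are hypothetical, and each training set is assumed to have the same size as the original training set, which the text does not specify. Training a decision tree model on each resulting set is left to whichever tree learner is used.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def bootstrap_training_sets(training_data: Sequence[T], n_sets: int,
                            seed: int = 0) -> List[List[T]]:
    """Draw n_sets training sets from training_data by sampling with replacement."""
    rng = random.Random(seed)          # seeded for reproducibility (illustrative choice)
    size = len(training_data)
    # Each set is built from `size` independent draws, so a datum may appear more than once.
    return [[training_data[rng.randrange(size)] for _ in range(size)]
            for _ in range(n_sets)]


# Hypothetical original training set of 10 records, N = 4.
training_sets = bootstrap_training_sets(list(range(10)), n_sets=4, seed=1)
print(len(training_sets), [len(s) for s in training_sets])
```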
[0285] In this embodiment of the invention, the data processing device 800 is used to multiply the N decision tree models by their corresponding weights to obtain N intermediate models; and to combine the N intermediate models to form the optimized random forest model.
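The weighted combination of the N decision tree models can be sketched as below. This is a simplified illustration with one fixed weight per tree: the function name `combine_models` and the toy trees are hypothetical, and summation is assumed as the rule for combining the N intermediate models, which the text leaves open.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in type: any callable mapping feature data to a predicted value.
Tree = Callable[[Sequence[float]], float]


def combine_models(trees: List[Tree], weights: Sequence[float]) -> Tree:
    """Scale each tree by its weight and combine the results into one model."""
    if len(trees) != len(weights):
        raise ValueError("one weight per decision tree model is required")
    # Each intermediate model is a tree whose output is multiplied by its weight.
    intermediate = [lambda f, t=tree, w=weight: w * t(f)
                    for tree, weight in zip(trees, weights)]
    # Combine the intermediate models (summation assumed).
    return lambda features: sum(model(features) for model in intermediate)


# Toy trees and weights (assumed purely for illustration).
forest = combine_models([lambda f: 1.0, lambda f: 3.0], [0.25, 0.75])
print(forest([0.0]))
```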
[0286] In this embodiment of the invention, a data acquisition unit 803 acquires multiple data sets to be processed; a processing unit 804 processes these multiple data sets using an optimized random forest model to obtain processing results. The optimized random forest model is obtained by combining N decision tree models, which are formed by combining an attention mechanism with the original training set after dimensionality reduction; N is a positive integer greater than 1. This combination scheme takes into account the preference characteristics of the N decision tree models for multiple training data sets within the original training set. This makes the final combined random forest model more focused on the original training set, thereby effectively improving the prediction accuracy of the random forest model.
[0287] It should be noted that, in the embodiments of the present invention, if the above-described data processing method is implemented as a software functional module and sold or used as an independent product, it can also be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of the present invention, or the part that contributes to related technologies, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a data processing device (which may be a personal computer, etc.) to execute all or part of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a portable hard drive, a read-only memory (ROM), a magnetic disk, or an optical disk. Thus, the embodiments of the present invention are not limited to any specific hardware and software combination.
[0288] Correspondingly, embodiments of the present invention provide a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps in the above-described method.
[0289] Correspondingly, this embodiment of the invention provides a data processing device, including a memory 802 and a processor 801. The memory 802 stores a computer program that can run on the processor 801. When the processor 801 executes the program, it implements the steps in the above-described method.
[0290] It should be noted that the descriptions of the storage medium and device embodiments above are similar to those of the method embodiments above, and have similar beneficial effects. For technical details not disclosed in the storage medium and device embodiments of the present invention, please refer to the descriptions of the method embodiments of the present invention for understanding.
[0291] It should be noted that Figure 14 is a schematic diagram of a hardware entity of a data processing device provided in an embodiment of the present invention. As shown in Figure 14, the hardware entity of the data processing device 800 includes a processor 801 and a memory 802, wherein:
[0292] The processor 801 typically controls the overall operation of the data processing device 800.
[0293] The memory 802 is configured to store instructions and applications executable by the processor 801, and can also cache data to be processed or already processed by the processor 801 and the various modules in the data processing device 800 (e.g., image data, audio data, voice communication data, and video communication data). The memory 802 can be implemented by flash memory or random access memory (RAM).
[0294] It should be understood that the phrase "one embodiment" or "an embodiment" throughout the specification means that a specific feature, structure, or characteristic related to the embodiment is included in at least one embodiment of the invention. Therefore, "in one embodiment" or "in an embodiment" appearing throughout the specification does not necessarily refer to the same embodiment. Furthermore, these specific features, structures, or characteristics can be combined in any suitable manner in one or more embodiments. It should be understood that in the various embodiments of the invention, the sequence numbers of the above-described processes do not imply a sequential order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the invention. The sequence numbers of the above-described embodiments of the invention are merely descriptive and do not represent the superiority or inferiority of the embodiments.
[0295] It should be noted that, in this document, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Unless otherwise specified, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes that element.
[0296] In the several embodiments provided by this invention, it should be understood that the disclosed apparatus and methods can be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division of units is only a logical functional division, and in actual implementation, there may be other division methods, such as: multiple units or components can be combined, or integrated into another system, or some features can be ignored or not executed. In addition, the coupling, direct coupling, or communication connection between the various components shown or discussed can be through some interfaces, and the indirect coupling or communication connection of the apparatus or units can be electrical, mechanical, or other forms.
[0297] The units described above as separate components may or may not be physically separate. The components shown as units may or may not be physical units. They may be located in one place or distributed across multiple network units. Some or all of the units may be selected to achieve the purpose of this embodiment according to actual needs.
[0298] In addition, in the various embodiments of the present invention, each functional unit can be integrated into one processing unit, or each unit can be a separate unit, or two or more units can be integrated into one unit; the integrated unit can be implemented in hardware or in the form of hardware plus software functional units.
[0299] Those skilled in the art will understand that all or part of the steps of the above method embodiments can be implemented by hardware related to program instructions. The aforementioned program can be stored in a computer-readable storage medium. When the program is executed, it performs the steps of the above method embodiments. The aforementioned storage medium includes various media that can store program code, such as mobile storage devices, read-only memory (ROM), magnetic disks, or optical disks.
[0300] Alternatively, if the integrated units of this invention are implemented as software functional modules and sold or used as independent products, they can also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of this invention, or the parts that contribute to related technologies, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the methods described in the various embodiments of this invention. The aforementioned storage medium includes various media capable of storing program code, such as mobile storage devices, ROMs, magnetic disks, or optical disks.
[0301] The above description is merely an embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in the present invention should be included within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.
Claims
1. A data processing method, characterized in that it comprises: predicting the original training set based on N decision tree models to obtain a prediction matrix; performing dimensionality reduction on the original feature matrix corresponding to the original training set to obtain a dimensionality reduction matrix; obtaining N original weights of the N decision tree models; calculating multiple prediction results based on the prediction matrix, the N original weights, and the dimensionality reduction matrix; combining the multiple prediction results with N loss functions, and performing multiple iterations of training until the N loss functions converge; solving the N loss functions using the gradient descent method to obtain multiple gradient values; combining the multiple gradient values and the N original weights to obtain N weights, N being a positive integer greater than 1; combining the N decision tree models and the N weights to obtain an optimized random forest model; acquiring multiple data to be processed; and processing the multiple data to be processed using the optimized random forest model to obtain processing results; wherein the data to be processed are attribute data of a target, and the target includes any one of users, stores, and items.
2. The data processing method according to claim 1, characterized in that, before processing the multiple data to be processed using the optimized random forest model to obtain the processing results, the method further includes: training the N decision tree models based on N training sets in the original training set; wherein the original training set includes multiple training data, and each of the N training sets includes at least one training data.
3. The data processing method according to claim 2, characterized in that, The prediction of the original training set based on the N decision tree models to obtain the prediction matrix includes: In the original training set, multiple sets of feature data corresponding to the multiple training data are extracted; each set of feature data includes: multiple feature data corresponding to the training data. The prediction matrix is obtained by using the N decision tree models to predict the multiple sets of feature data respectively; The step of reducing the dimensionality of the original feature matrix corresponding to the original training set to obtain the dimensionality-reduced matrix includes: Calculate the covariance matrix of the original feature matrix of the original training set, determine the feature matrix corresponding to the covariance matrix, and multiply the original feature matrix with the transpose of the feature matrix to obtain the dimensionality reduction matrix.
4. The data processing method according to claim 3, characterized in that the step of using the N decision tree models to predict the multiple sets of feature data to obtain the prediction matrix includes: predicting the multiple sets of feature data using each of the N decision tree models to obtain the predicted values of each decision tree model corresponding to the multiple training data; and sequentially using the multiple predicted values of the i-th decision tree model corresponding to the multiple training data as the i-th column of the prediction matrix, until all N columns of the prediction matrix are completed, thereby forming the prediction matrix; i is a positive integer greater than or equal to 1 and less than or equal to N.
5. The data processing method according to claim 3, characterized in that, Before training N decision tree models based on the N training sets in the original training set, the method further includes: Multiple initial data sets are obtained, and the multiple initial data sets are processed using a data warehouse method to obtain multiple training data sets, which are then used to form the original training set; each training data set includes: encoding information corresponding to the target and multiple feature data sets. The values of the multiple feature data corresponding to each training data are sequentially used as each row of the original feature matrix until all rows of the original feature matrix are completed, thereby forming the original feature matrix.
6. The data processing method according to claim 3, characterized in that the step of calculating the covariance matrix of the original feature matrix of the original training set and determining the feature matrix corresponding to the covariance matrix includes: calculating the covariance matrix of the original feature matrix, and calculating the multidimensional eigenvalues of the covariance matrix and the multiple eigenvectors corresponding to the multidimensional eigenvalues; determining the d largest eigenvalues among the multidimensional eigenvalues, wherein the sum of the first d eigenvalues is greater than or equal to a preset value, the sum of the first d-1 eigenvalues is less than the preset value, and d is a positive integer greater than or equal to 1; and using the first d eigenvectors, which correspond to the first d eigenvalues among the multiple eigenvectors, to form the d rows of the feature matrix, thereby forming the feature matrix.
7. The data processing method according to claim 3, characterized in that, The calculation of multiple prediction results based on the prediction matrix, the N original weights, and the dimensionality reduction matrix includes: The values of the N rows of the parameter matrix are constructed using the N original weights; Multiply the dimensionality reduction matrix by the transpose of the parameter matrix to obtain the matching degree matrix; By taking the exponent of each value in the matching degree matrix, the first intermediate matrix is obtained; Calculate the quotient of each value in the first intermediate matrix and the sum of the values in its corresponding row to obtain the second intermediate matrix; The multiple prediction results are calculated based on the second intermediate matrix and the prediction matrix.
8. The data processing method according to claim 7, characterized in that, The calculation of the multiple prediction results based on the second intermediate matrix and the prediction matrix includes: Each value in each row of the second intermediate matrix is multiplied by the corresponding value in the corresponding row of the prediction matrix to obtain multiple intermediate values for each row. The multiple intermediate values of each row are combined to obtain the vector of each row, thereby obtaining the multiple prediction results.
9. The data processing method according to claim 8, characterized in that the process of combining the multiple prediction results with N loss functions, performing multiple iterations of training until the N loss functions converge, solving the N loss functions using gradient descent to obtain multiple gradient values, and combining the multiple gradient values with the N original weights to obtain the N weights includes: calculating the derivatives of the N loss functions with respect to the multiple prediction results, and obtaining N expressions for the N gradient values based on the derivatives and the chain rule; solving the N expressions to obtain the N gradient values; updating the N original weights using the N gradient values, and iteratively training the multiple prediction results using the N loss functions until the N loss functions converge, to obtain the N gradient values; converging the N intermediate gradient values from the previous training using the N loss functions to obtain N intermediate weights; and adding the N gradient values and the N intermediate weights together to obtain the N weights.
10. The data processing method according to claim 9, characterized in that the step of calculating the derivatives of the N loss functions with respect to the multiple prediction results, obtaining N expressions for the N gradient values based on the derivatives and the chain rule, and solving the N expressions to obtain the N gradient values includes: calculating the N first derivatives of the N loss functions with respect to the multiple prediction results; calculating the multiple second derivatives of the multiple prediction results with respect to the second intermediate matrix; calculating the third derivative of the second intermediate matrix with respect to the first intermediate matrix; calculating the fourth derivative of the first intermediate matrix with respect to the matching degree matrix; calculating the fifth derivative of the matching degree matrix with respect to the parameter matrix; combining the N first derivatives, the multiple second derivatives, the third derivative, the fourth derivative, and the fifth derivative according to the chain rule to obtain the N expressions; and solving the N expressions to obtain the N gradient values.
11. The data processing method according to claim 2, characterized in that the step of training the N decision tree models based on the N training sets in the original training set includes: sampling the multiple training data in the original training set N times with replacement to obtain the N training sets; extracting at least one feature data corresponding to at least one training data in each of the N training sets; and training the original decision tree model using the at least one feature data corresponding to the at least one training data in each training set to obtain the decision tree model corresponding to each training set, thereby obtaining the N decision tree models.
12. The data processing method according to any one of claims 1-11, characterized in that, The process of combining the N decision tree models and the N weights to obtain the optimized random forest model includes: Multiply the N decision tree models by their corresponding weights to obtain N intermediate models; The N intermediate models are combined to form the optimized random forest model.
13. A data processing apparatus, characterized in that it comprises: a processing unit, used to predict the original training set based on N decision tree models to obtain a prediction matrix; perform dimensionality reduction on the original feature matrix corresponding to the original training set to obtain a dimensionality reduction matrix; obtain N original weights of the N decision tree models; calculate multiple prediction results based on the prediction matrix, the N original weights, and the dimensionality reduction matrix; combine the multiple prediction results with N loss functions, and perform multiple iterations of training until the N loss functions converge; solve the N loss functions using the gradient descent method to obtain multiple gradient values; combine the multiple gradient values and the N original weights to obtain N weights, N being a positive integer greater than 1; and combine the N decision tree models and the N weights to obtain an optimized random forest model; and a data acquisition unit, used to acquire multiple data to be processed; wherein the processing unit is further used to process the multiple data to be processed using the optimized random forest model to obtain the processing results of the multiple data to be processed; the data to be processed are attribute data of a target, and the target includes any one of users, stores, and items.
14. A data processing apparatus, characterized in that, It includes a memory and a processor, the memory storing a computer program that can run on the processor, the processor executing the program to implement the steps of the method according to any one of claims 1 to 12.
15. A computer-readable storage medium having a computer program stored thereon, characterized in that, When executed by a processor, the computer program implements the steps of the method according to any one of claims 1 to 12.