Data processing method and apparatus, storage medium, device, and product

A transparent model is fitted to the sample features of deep neural network sample data under different feature dimensions, producing a fitting result. This addresses the difficulty of interpreting the feature transformation process of deep neural networks and enables interpretability analysis of that process.

CN115130573B · Active · Publication Date: 2026-06-23 · TENCENT TECHNOLOGY (SHENZHEN) CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: TENCENT TECHNOLOGY (SHENZHEN) CO LTD
Filing Date: 2022-06-24
Publication Date: 2026-06-23

AI Technical Summary

Technical Problem

The feature transformation process of complex models such as deep neural networks is difficult to explain, and their decision-making mechanisms are incomprehensible.

Method used

By obtaining category labels for sample data, the features of the sample data under different feature dimensions are fitted, and a transparent model is used to generate fitting results, indicating the importance of different feature dimensions. These results are then used as the basis for feature transformation, generating interpretable analysis results.

Benefits of technology

It enables interpretability analysis of the feature transformation process of deep neural networks, reveals the importance of feature dimensions in decision-making, and provides a transparent decision-making mechanism.



Abstract

A data processing method, device, storage medium, equipment and product, the method comprises: obtaining at least one sample data and the category annotation label of any sample data; one sample data contains sample features in one or more feature dimensions, and the category annotation label of any sample data is obtained after feature conversion of sample features of any sample data in different feature dimensions; fitting processing is performed on sample features of each sample data in different feature dimensions according to the corresponding category annotation label, and a fitting result is obtained to indicate the importance of different feature dimensions in the process of adding the corresponding category annotation label to each sample data; according to the importance indicated by the fitting result, the feature dimension meeting the selection condition is used as the conversion basis in the feature conversion process; the conversion basis is used to generate an explainability analysis result of feature conversion. Through the method of the present application, explainability analysis of the feature conversion process can be realized.

Description

Technical Field

[0001] This application relates to the field of computer technology, and in particular to a data processing method, apparatus, storage medium, device, and product.

Background Technology

[0002] Complex models like deep neural networks can abstract sample data into a new vector space through feature transformation, increasing the ability to extract original information and thus exhibiting superior performance on most tasks. However, complex models like deep neural networks are essentially black boxes; people cannot understand why these black-box models make certain decisions. Therefore, interpretability analysis of the feature transformation process of such black-box models is essential.

Summary of the Invention

[0003] This application provides a data processing method, apparatus, storage medium, device, and product that can enable interpretable analysis of the feature conversion process.

[0004] In one aspect, embodiments of this application provide a data processing method, the method comprising:

[0005] Obtain at least one sample data and a category label for any sample data; a sample data contains sample features under one or more feature dimensions, and the category label for any sample data is obtained by feature transformation of the sample features of the sample data under different feature dimensions;

[0006] For each sample data under different feature dimensions, the sample features are fitted according to the corresponding category labels to obtain the fitting results, which indicate the importance of different feature dimensions during the process of adding corresponding category labels to each sample data.

[0007] Based on the importance indicated by the fitting results, the feature dimensions that meet the selection criteria are used as the basis for transformation in the feature transformation process; the basis for transformation is used to generate interpretability analysis results of feature transformation.

[0008] In another aspect, embodiments of this application provide a data processing apparatus, the apparatus comprising:

[0009] An acquisition unit is used to acquire at least one sample data and a category label for any sample data; a sample data contains sample features under one or more feature dimensions, and the category label for any sample data is obtained by feature transformation of the sample features of the sample data under different feature dimensions;

[0010] The processing unit is used to fit the sample features of each sample data under different feature dimensions according to the corresponding category labels to obtain the fitting results, which indicate the importance of different feature dimensions during the process of adding corresponding category labels to each sample data.

[0011] The processing unit is further configured to use the feature dimensions that meet the selection criteria as the basis for transformation in the feature transformation process based on the importance indicated by the fitting result; the transformation basis is used to generate interpretability analysis results of feature transformation.

[0012] In another aspect, embodiments of this application provide a computer device, which includes a processor, a communication interface, and a memory. The processor, the communication interface, and the memory are interconnected. The memory stores a computer program, and the processor is used to call the computer program to execute the data processing method of any of the above possible implementations.

[0013] In another aspect, embodiments of this application provide a computer-readable storage medium storing a computer program that, when executed by a processor, implements the data processing method of any of the above possible implementations.

[0014] In another aspect, embodiments of this application also provide a computer program product, which includes a computer program or computer instructions, and the computer program or computer instructions are executed by a processor to implement the steps of the data processing method provided in embodiments of this application.

[0015] In another aspect, embodiments of this application also provide a computer program. The computer program includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the data processing method provided in embodiments of this application.

[0016] In this embodiment, at least one sample data and a category label for any sample data can be obtained. The category label for any sample data is obtained by feature transformation of the sample features of the sample data under different feature dimensions. By fitting the sample features of each sample data under different feature dimensions according to the corresponding category label, a fitting result can be obtained. This fitting result can indicate the importance of different feature dimensions during the process of adding corresponding category labels to each sample data. Therefore, based on the importance indicated by the fitting result, the feature dimensions that meet the selection conditions can be used as the transformation basis in the feature transformation process. That is, the feature dimensions that meet the selection conditions are important factors used to distinguish sample data in the feature transformation process, and interpretability analysis results of feature transformation can be generated based on this transformation basis. Through this embodiment, interpretability analysis of the feature transformation process can be achieved.

Brief Description of the Drawings

[0017] To more clearly illustrate the technical methods of the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0018] Figure 1 is a schematic diagram of a data processing method provided in an embodiment of this application;

[0019] Figure 2 is a flowchart of a data processing method provided in an embodiment of this application;

[0020] Figure 3 is a flowchart of another data processing method provided in an embodiment of this application;

[0021] Figure 4 is a schematic diagram of a target network model provided in an embodiment of this application;

[0022] Figure 5 is a schematic diagram of acquiring at least one sample data according to an embodiment of this application;

[0023] Figure 6 is a flowchart of another data processing method provided in an embodiment of this application;

[0024] Figure 7 is a schematic diagram of the decision path of a target network model provided in an embodiment of this application;

[0025] Figure 8 is a schematic diagram of the decision path of another target network model provided in an embodiment of this application;

[0026] Figure 9 is a schematic diagram of the structure of a data processing apparatus provided in an embodiment of this application;

[0027] Figure 10 is a schematic diagram of the structure of a computer device provided in an embodiment of this application.

Detailed Description

[0028] The technical methods in the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of this application.

[0029] This application proposes a data processing method that enables interpretability analysis of feature transformation processes and can be applied to various fields or scenarios such as cloud technology, artificial intelligence, blockchain, vehicle networking, smart transportation, and smart homes. In one embodiment, this data processing method can be implemented based on machine learning technology within artificial intelligence. Artificial intelligence is a comprehensive discipline involving a wide range of fields, encompassing both hardware and software technologies. Fundamental AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, large-scale video processing, operating / interactive systems, and mechatronics. AI software technologies mainly include computer vision, speech processing, natural language processing, and several major directions such as machine learning / deep learning, autonomous driving, and smart transportation. Machine learning (ML) is a multidisciplinary field involving probability theory, statistics, approximation theory, convex analysis, and algorithm complexity theory. It specifically studies how computers can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their performance. Machine learning is the core of artificial intelligence and the fundamental way to give computers intelligence; its applications span all areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and instructional learning. 
For example, in this embodiment of the application, machine learning techniques are used to fit sample data and corresponding category labels, so as to determine the conversion basis in the feature transformation process based on the obtained fitting results.

[0030] The data processing method proposed in this application is executed by a computer device, which may include one or more of a terminal and a server. That is, the data processing method proposed in the embodiments of this application can be executed by a terminal, by a server, or jointly executed by a terminal and a server capable of communicating with each other.

[0031] The terminal can be a smartphone, tablet, laptop, desktop computer, smart voice interaction device, smart home appliance, in-vehicle terminal, etc. The server can be a standalone physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.

[0032] Complex models like deep neural networks contain numerous non-linear network layers. While combinations of these layers can extract representations from raw data at various levels of abstraction, their high complexity and numerous parameters make it difficult to understand how this "end-to-end" approach makes decisions. Transparent models, on the other hand, are simple in structure and intuitively understandable. Examples include logistic regression, decision trees, and Naive Bayes models, which can explain all stages from data input to output prediction. Therefore, as shown in Figure 1, in the data processing method proposed in this application, after the computer device performs feature transformation on the sample data to obtain sample representations, it uses the category attributes of the sample data reflected by the sample representations to add category labels to the sample data. It then uses a transparent model to fit the sample data to the corresponding category labels, obtaining a fitting result. This fitting result is in effect the interpretability analysis result of the transparent model: it reflects the degree of influence that the different feature dimensions of the sample data have when the sample data is fitted to the corresponding category labels. Therefore, this application uses the fitting result to indicate the importance of different feature dimensions during the process of adding corresponding category labels to each sample data. From the importance indicated by the fitting result, it can be known which feature dimensions of a sample data are important factors in the feature transformation process. This yields the transformation basis of the feature transformation process, from which interpretability analysis results of feature transformation are generated, realizing interpretability analysis of the feature transformation process.

[0033] Refer to Figure 2, which is a flowchart of a data processing method provided in an embodiment of this application. The method can be applied to the computer device in Figure 1 described above, and includes:

[0034] S201. Obtain at least one sample data and a category label for any sample data; a sample data contains sample features under one or more feature dimensions, and the category label for any sample data is obtained by feature transformation of the sample features of any sample data under different feature dimensions.

[0035] In this embodiment, a sample data point can be used to describe an object, which refers to an objectively existing thing, such as a person, a flower, or an animal. One or more feature dimensions are obtained by dividing the object into multiple dimensions. For example, a person can be divided according to height and weight, a flower can be divided according to color, size, and growth stage, and an animal can be divided according to markings and body temperature. The sample feature under a feature dimension refers to the value corresponding to that feature dimension. For example, a person's height is 156 centimeters, a flower's color is red, and an animal's body temperature is constant. It should be noted that each sample data point in the at least one sample data point contains one or more of the same feature dimensions.

[0036] In one embodiment, a computer device can perform feature transformation on any sample data under different feature dimensions to obtain a feature vector corresponding to any sample data. Then, based on the feature vector corresponding to any sample data, sample data with highly similar corresponding feature vectors are grouped into one dataset, while sample data with significantly different corresponding feature vectors are grouped into different datasets. Finally, a category label is determined based on the dataset to which any sample data belongs, wherein one category label can correspond to (indicate) one dataset.

[0037] The process of feature transformation on sample data is essentially a black-box model process. This black-box model analyzes the sample features across various feature dimensions to determine which category the sample data tends to belong to, generating feature vectors that make the sample data more likely to be classified as that category. Taking a common image classification model as an example, if the true label of an image is "cat," then image features that easily classify the image as a cat will be generated. These image features will also differ significantly from image features with true labels of other categories, allowing the model to perform classification. Therefore, the higher the similarity between corresponding feature vectors, the greater the likelihood that the sample data will be classified into the same category; conversely, the greater the difference between corresponding feature vectors, the lower the likelihood. The process of classifying at least one sample data point using feature vectors is essentially determining which category the feature transformation process tends to assign each sample data point to (indicated by the category label). Because feature transformation often involves numerous non-linear mappings, it's impossible to know which feature dimensions of the sample data are analyzed during feature transformation.

[0038] S202. For each sample data under different feature dimensions, perform fitting processing according to the corresponding category label to obtain the fitting result, which indicates the importance of different feature dimensions during the process of adding corresponding category labels to each sample data.

[0039] In this embodiment, the sample features of each sample data under different feature dimensions are fitted according to the corresponding category labels to obtain the fitting result. This fitting process is equivalent to inputting the sample features of any sample data under different feature dimensions and outputting the category label of that sample data. Therefore, the fitting process and the feature transformation process are essentially the same: by analyzing the sample features under each feature dimension in the sample data, it is determined which category label the sample data tends to belong to. This fitting process can be implemented using a transparent model. Since a transparent model can explain all the steps in the process from data input to output prediction, the interpretability of the feature transformation process can be analyzed through the interpretability results of the transparent model. Specifically, the interpretability analysis of the transparent model, or the fitting process, is based on the fitting results. These fitting results can reflect the degree of influence of different feature dimensions of any sample data when fitting the corresponding category label. This degree of influence can reflect the importance of each feature dimension during the process of adding the corresponding category label to each sample data.

[0040] Specifically, this fitting process can determine a fitting model. The sample features of any sample data under different feature dimensions are used as the input of the fitting model and are transformed by the target weights $w_{jm}$ in the fitting model to obtain the output of the fitting model, which indicates the category label of that sample data. For example, when any sample data is input into the fitting model, the output of the fitting model is the predicted probability of adding each of the different category labels to that sample data, and the category label with the highest predicted probability is the category label of that sample data. In this case, the absolute value of the target weight $w_{jm}$ reflects the degree of influence of the j-th feature dimension of any sample data on the predicted probability, determined by the fitting model, of adding the m-th category label. In addition, a positive $w_{jm}$ indicates that the j-th feature dimension is positively correlated with the predicted probability; that is, the larger the sample feature in the j-th feature dimension, the higher the predicted probability. A negative $w_{jm}$ indicates that the j-th feature dimension is negatively correlated with the predicted probability; that is, the larger the sample feature in the j-th feature dimension, the lower the predicted probability. Here, j and m are integers greater than 0.

[0041] It follows from the target weight $w_{jm}$ that: if $w_{jm}$ is positive, then the larger its value, the more strongly a larger sample feature in the j-th feature dimension of a sample data makes that sample data likely to be labeled with the m-th category; if $w_{jm}$ is negative, then the smaller its value (that is, the larger its magnitude), the more strongly a larger sample feature in the j-th feature dimension makes that sample data unlikely to be labeled with the m-th category; and the smaller the absolute value of $w_{jm}$, the less impact the sample feature in the j-th feature dimension of a sample data has on whether the sample data is labeled with the m-th category. Therefore, by taking the target weights $w_{jm}$ as the fitting result, it can be known which feature dimensions are more important during the feature transformation process and how the feature dimensions affect the classification of the sample data.
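
As an illustration of paragraphs [0040] and [0041], the following is a minimal sketch of one possible transparent fitting model. It assumes a softmax (multinomial logistic) regression trained by plain gradient descent, which the description does not prescribe. The function names, the toy data, and the hyperparameters are hypothetical. Indices are 0-based in the code, whereas j and m in the text start from 1.

```python
import math


def softmax(scores):
    """Convert a list of raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_probabilities(x, weights, bias):
    """Predicted probability of each category label for one sample x."""
    num_labels = len(bias)
    scores = [
        bias[m] + sum(x[j] * weights[j][m] for j in range(len(x)))
        for m in range(num_labels)
    ]
    return softmax(scores)


def fit_transparent_model(samples, labels, num_labels, epochs=500, lr=0.1):
    """Fit a softmax regression by batch gradient descent (an assumed choice).

    Returns (weights, bias), where weights[j][m] plays the role of the
    target weight w_jm: feature dimension j, category label m.
    """
    d = len(samples[0])
    n = len(samples)
    weights = [[0.0] * num_labels for _ in range(d)]
    bias = [0.0] * num_labels
    for _ in range(epochs):
        grad_w = [[0.0] * num_labels for _ in range(d)]
        grad_b = [0.0] * num_labels
        for x, y in zip(samples, labels):
            probs = predict_probabilities(x, weights, bias)
            for m in range(num_labels):
                # Gradient of the cross-entropy loss w.r.t. the score of label m.
                err = probs[m] - (1.0 if m == y else 0.0)
                grad_b[m] += err / n
                for j in range(d):
                    grad_w[j][m] += err * x[j] / n
        for m in range(num_labels):
            bias[m] -= lr * grad_b[m]
            for j in range(d):
                weights[j][m] -= lr * grad_w[j][m]
    return weights, bias


# Toy data (hypothetical): only feature dimension 0 determines the label.
samples = [[x0, x1] for x0 in (-1.0, -0.5, 0.5, 1.0) for x1 in (-1.0, 1.0)]
labels = [1 if x0 > 0 else 0 for x0, _ in samples]
weights, bias = fit_transparent_model(samples, labels, num_labels=2)
for j, row in enumerate(weights):
    print("feature dimension", j, [round(w, 3) for w in row])
```

In this toy data the label depends only on feature dimension 0, so the weights of that dimension should dominate in magnitude, positive for label 1 and negative for label 0, while the weights of dimension 1 stay near zero.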

[0042] S203. Based on the importance indicated by the fitting results, the feature dimensions that meet the selection conditions are used as the basis for transformation in the feature transformation process; the basis for transformation is used to generate interpretability analysis results of feature transformation.

[0043] Feature transformation can abstract sample data into a high-dimensional space (such as a vector space), but it is difficult to interpret the information learned during the feature transformation process. People cannot know which feature dimensions play an important role in the feature transformation process and how they play that role; that is, they do not know the basis for the transformation process.

[0044] Since the fitting process also involves determining which category label the sample data tends to be given based on the sample features under different feature dimensions, fitting the sample data enables interpretability analysis of the aforementioned feature transformation step. In other words, this application uses the importance indicated by the fitting result to select the feature dimensions that meet the selection condition as the transformation basis in the feature transformation process. The importance indicated by the fitting result can be quantified by the target weight $w_{jm}$. Understandably, when $w_{jm}$ is positive, the larger its value and the larger the sample feature under the j-th feature dimension of a sample data, the easier it is for the m-th category label to be added to that sample data. Since each category label corresponds to a dataset, during feature transformation, the larger the sample feature under the j-th feature dimension of a sample data, the easier it is for that sample data to generate features similar to those of the sample data in the dataset corresponding to the m-th category label.

[0045] In one implementation, feature dimensions are selected in descending order of the importance indicated by the fitting result, and the selected feature dimensions are used as the transformation basis in the feature transformation process. Specifically, assuming a sample data contains sample features under d feature dimensions, there is a target weight set $W_m = \{w_{1m}, w_{2m}, \ldots, w_{dm}\}$ for adding the m-th category label. A target quantity (which can be set manually) of target weights can be selected from $W_m$ in descending order, and the feature dimensions corresponding to the selected target weights are used as the transformation basis in the feature transformation process. In the feature transformation process, the larger the sample features of a sample data under the selected feature dimensions, the easier it is for that sample data to generate features similar to those of the sample data in the dataset corresponding to the m-th category label. This is the interpretability analysis result of the feature transformation.
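
The selection in paragraph [0045] can be sketched as follows. This is only an illustrative example: the function name `select_top_dimensions` and the example weights are hypothetical, and dimension indices are 0-based here.

```python
def select_top_dimensions(weights_for_label, target_quantity):
    """Return the indices of the feature dimensions whose target weights
    are the largest, taken in descending order of weight.

    weights_for_label stands for the set W_m = {w_1m, ..., w_dm};
    target_quantity is the manually chosen number of dimensions to keep.
    """
    ranked = sorted(
        range(len(weights_for_label)),
        key=lambda j: weights_for_label[j],
        reverse=True,
    )
    return ranked[:target_quantity]


# Hypothetical target weights for one category label m over d = 4 dimensions.
W_m = [0.1, 2.3, -1.7, 0.9]
basis = select_top_dimensions(W_m, target_quantity=2)
print(basis)  # [1, 3]
```

Sorting by the raw weight follows the wording of the paragraph. Ranking by absolute value instead would also surface strongly negative dimensions, which is a different design choice that the description does not state.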

[0046] In another implementation, based on the importance indicated by the fitting result, feature dimensions whose importance is greater than an importance threshold are selected, and the selected feature dimensions are used as the transformation basis in the feature transformation process. Specifically, assuming a sample data contains sample features under d feature dimensions, there is a target weight set $W_m = \{w_{1m}, w_{2m}, \ldots, w_{dm}\}$ for adding the m-th category label. The target weights in $W_m$ that are greater than a weight threshold (which can be set manually) can be obtained, and the feature dimensions corresponding to those target weights are used as the transformation basis in the feature transformation process. In the feature transformation process, the larger the sample features of a sample data under the selected feature dimensions, the easier it is for that sample data to generate features similar to those of the sample data in the dataset corresponding to the m-th category label. This is the interpretability analysis result of the feature transformation.
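
The threshold-based selection in paragraph [0046] can be sketched in the same way. The function name `select_dimensions_above_threshold`, the example weights, and the threshold value are hypothetical, and dimension indices are 0-based here.

```python
def select_dimensions_above_threshold(weights_for_label, weight_threshold):
    """Return the indices of the feature dimensions whose target weight
    exceeds the manually chosen weight threshold.

    weights_for_label stands for the set W_m = {w_1m, ..., w_dm}.
    """
    return [
        j for j, weight in enumerate(weights_for_label)
        if weight > weight_threshold
    ]


# Hypothetical target weights for one category label m over d = 4 dimensions.
W_m = [0.1, 2.3, -1.7, 0.9]
basis = select_dimensions_above_threshold(W_m, weight_threshold=0.5)
print(basis)  # [1, 3]
```

Unlike a fixed target quantity, a threshold lets the number of selected dimensions vary between category labels, and it can return an empty list when no weight is large enough.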

[0047] In this embodiment, after performing feature transformation on sample data to obtain sample representations (i.e., feature vectors), category labels are added to the sample data using the category attributes reflected by the sample representations. Then, by fitting the sample data to the corresponding category labels, a fitting result is obtained. This fitting result indicates the importance of different feature dimensions during the process of adding corresponding category labels to each sample data point. By using the importance indicated by the fitting result, it is possible to determine which feature dimensions are more important during the feature transformation process and how these feature dimensions affect the classification of the sample data's category labels. This allows for the determination of how feature dimensions influence the feature transformation of the sample data, obtaining the basis for the feature transformation process and the results of interpretability analysis, thus enabling interpretability analysis of the feature transformation process.

[0048] Refer to Figure 3, which is a flowchart of another data processing method provided in an embodiment of this application. The method can be applied to the computer device in Figure 1 described above, and includes:

[0049] S301. Obtain at least one sample data and a category label for any sample data; a sample data contains sample features under one or more feature dimensions, and the category label for any sample data is obtained by feature transformation of the sample features of any sample data under different feature dimensions.

[0050] In one embodiment, feature transformation can be performed on the sample features of each sample data under different feature dimensions to obtain the feature vector corresponding to each sample data. The feature vector corresponding to each sample data is obtained by calling the i-th hidden layer of the target network model containing classification function for feature transformation. Here, the target network model can contain N hidden layers, where N is a positive integer greater than or equal to 1, and i is a positive integer greater than 0 and less than or equal to N.

[0051] The target network model can be a simple classification model, where the sample data can involve multiple feature dimensions such as flower size, color, and height, allowing the target network model to predict the type of flower. Alternatively, it can be a recommendation model, where the sample data can involve multiple feature dimensions such as the number of times an object purchases goods each month, the average price of goods purchased, the price of recommended goods, and the number of views of recommended goods. The target network model can then use this sample data to predict the likelihood of an object purchasing a recommended product. The number of times an object purchases goods each month and the average price of goods purchased are only obtained after obtaining the user's permission or consent.

[0052] As shown in Figure 4, the target network model can be a deep neural network containing N hidden layers. The sample features of each sample data under different feature dimensions can be input into the target network model. The first hidden layer of the target network model transforms and abstracts the input sample features into a new vector space, outputting a feature vector $h_1$ corresponding to each sample data. This feature vector $h_1$ is then input into the next hidden layer, which outputs a feature vector $h_2$ corresponding to each sample data. By stacking N hidden layers, the feature vector $h_N$ corresponding to each sample data can be obtained. In one implementation, invoking the i-th hidden layer of the target network model containing classification functionality for feature transformation to obtain the feature vector corresponding to each sample data includes: inputting the sample features of each sample data under different feature dimensions into the target network model; obtaining the feature vector $h_{i-1}$ through layer-by-layer transformation and abstraction by the hidden layers preceding the i-th hidden layer; and inputting the feature vector $h_{i-1}$ into the i-th hidden layer to obtain the feature vector $h_i$ corresponding to each sample data.

[0053] It should be noted that the process by which the i-th hidden layer of the network transforms the input feature vector h_{i-1} into the feature vector h_i is called feature transformation. Therefore, the interpretability analysis result obtained for the feature transformation that the i-th hidden layer performs on the at least one sample data serves as the interpretability analysis result of the i-th hidden layer. The subsequent steps therefore also amount to an interpretability analysis of the i-th hidden layer.

[0054] Furthermore, based on the feature vector corresponding to each sample data, a clustering operation is performed on the at least one sample data to divide it into different datasets. Clustering algorithms are unsupervised methods: the true labels of the sample data are not used during training; instead, sample data with similar representations are grouped into the same dataset based on the representations themselves. One feasible implementation is to cluster the feature vectors corresponding to the sample data with a clustering algorithm such as the K-means algorithm (a hard clustering algorithm) or DBSCAN (Density-Based Spatial Clustering of Applications with Noise). Taking the K-means algorithm as an example: ① select M (an integer greater than 0) initial cluster centers {v_1, v_2, ..., v_M}; ② calculate the distance between the feature vector corresponding to each sample data and each of the M cluster centers, and assign the feature vector to the nearest cluster; ③ for each cluster, calculate the average of the feature vectors in the current cluster and use it as the new cluster center; ④ repeat ② and ③ until a termination condition is met, such as reaching a preset upper limit on the number of iterations or the feature vectors in each cluster no longer changing.
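The four K-means steps can be sketched in plain Python as below. The text does not say how the initial centers are chosen, so picking evenly spaced vectors is an assumption made here to keep the example deterministic; the function name `kmeans` and the toy vectors are likewise illustrative.

```python
def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def kmeans(vectors, m, max_iters=100):
    """Cluster feature vectors into m clusters; returns (assignments, centers)."""
    # Step 1: initial centers. Evenly spaced vectors are an assumption of this sketch.
    step = len(vectors) // m
    centers = [list(vectors[c * step]) for c in range(m)]
    assignments = None
    for _ in range(max_iters):  # termination: iteration cap
        # Step 2: assign every vector to its nearest center.
        new_assignments = [
            min(range(m), key=lambda c: squared_distance(v, centers[c]))
            for v in vectors
        ]
        if new_assignments == assignments:  # termination: clusters unchanged
            break
        assignments = new_assignments
        # Step 3: move each center to the mean of its members.
        for c in range(m):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centers

vectors = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
labels, centers = kmeans(vectors, m=2)
```

The returned `labels` list gives the cluster index of each feature vector, which is what later paragraphs use as the category label of the corresponding sample data.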

[0055] Through the clustering process described above, the feature vectors corresponding to the sample data are divided into different clusters. Using the correspondence between feature vectors and sample data, the at least one sample data can then be clustered: sample data whose feature vectors belong to the same cluster are grouped into the same dataset, which divides the at least one sample data into different datasets. Understandably, since there are M cluster centers, M clusters are created, resulting in M datasets.

[0056] Each dataset corresponds to a clustering label. In one implementation, after the clustering operation is performed on the at least one sample data, the total number of datasets obtained can be determined, and a set number can be assigned to each dataset based on this total. For example, three datasets can be given the set numbers 1, 2, and 3. The set number of a dataset can be used as its clustering label, and the clustering label of the dataset containing any sample data can be used as the category label of that sample data. The category label of any sample data therefore indicates the dataset to which it belongs. For example, if the category label of a sample data is 3, the set number and clustering label of the dataset containing that sample data are both 3.

[0057] In feasible embodiments, at least one reference sample data used for the interpretability analysis of the (i-1)-th hidden layer of the target network model can be obtained, together with the at least two reference sample sets produced by clustering that reference sample data during the interpretability analysis of the (i-1)-th hidden layer. The reference sample data contained in any one of these reference sample sets can be used as the at least one sample data obtained above. As shown in Figure 5, the interpretability analysis of the (i-1)-th hidden layer yields n (an integer greater than 1) reference sample sets. When performing interpretability analysis on the i-th hidden layer, any one of these reference sample sets can be used as the at least one sample data for the analysis of the i-th hidden layer. Therefore, during the interpretability analysis of the i-th hidden layer, any reference sample set corresponding to the (i-1)-th hidden layer can be further divided into multiple (n) reference sample sets corresponding to the i-th hidden layer.

[0058] Each sample data point can have a true label, which can be determined based on the classification task of the target network model. For example, if the target network model is to determine whether a person buys a product, the true label can include "buy" or "don't buy"; if the target network model is to determine the group to which an animal belongs, the true label can include amphibian, mammal, or reptile. In one implementation, M can be the number of true labels. For example, if the true labels include "buy" and "don't buy," M is 2; if the true labels include amphibian, mammal, and reptile, M is 3. The true label of each sample data point in any dataset can be obtained, and the number of samples corresponding to the same true label can be determined. Then, based on the number of samples corresponding to the same true label and the total number of samples in any dataset, the percentage of samples corresponding to the same true label is calculated, i.e., percentage of samples corresponding to the same true label = number of samples corresponding to the same true label / total number of samples in any dataset, thus obtaining the percentage of samples corresponding to different true labels. Furthermore, the true label with the largest percentage of samples in any dataset can be obtained. If the true labels of the maximum sample percentages for each of the M datasets are different, and their percentages are greater than a threshold (which can be set manually), then it can be determined that the target network model has good feature transformation capabilities by the time it reaches the i-th hidden layer, and can perform the classification task well. In this case, the target network model can be pruned, retaining only the i-th hidden layer and the hidden layers before the i-th hidden layer. 
For example, if the true label of the maximum sample percentage for dataset 1 is amphibian and the amphibian percentage is 98%, the true label of the maximum sample percentage for dataset 2 is mammal and the mammal percentage is 98%, and the true label of the maximum sample percentage for dataset 3 is reptile and the reptile percentage is 98%, then it is considered that the target network model has good feature transformation capabilities by the time it reaches the i-th hidden layer.
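The purity check described in this paragraph can be sketched as follows. The function names and the toy label lists are assumptions for illustration; the threshold value is whatever the analyst sets manually, as the text states.

```python
from collections import Counter

def dominant_label(true_labels):
    """Return (label, proportion) of the most frequent true label in one dataset."""
    label, count = Counter(true_labels).most_common(1)[0]
    return label, count / len(true_labels)

def layer_separates_classes(datasets, threshold):
    """True if every dataset has a different dominant true label and each
    dominant proportion is greater than the threshold."""
    dominant = [dominant_label(labels) for labels in datasets]
    labels_differ = len({label for label, _ in dominant}) == len(datasets)
    return labels_differ and all(p > threshold for _, p in dominant)

# Three datasets, each 98% dominated by a different true label (made-up data).
datasets = [
    ["amphibian"] * 49 + ["mammal"],
    ["mammal"] * 49 + ["reptile"],
    ["reptile"] * 49 + ["amphibian"],
]
good_enough = layer_separates_classes(datasets, threshold=0.9)
```

When the check returns true for the datasets produced at the i-th hidden layer, the text treats that layer as already sufficient for the classification task, which is the condition under which later layers may be pruned.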

[0059] S302. Obtain the target algorithm for interpretability analysis.

[0060] The target algorithm can be a transparent model with self-explanatory capabilities. A transparent model is a type of model with a simple structure that can be intuitively understood. This application uses a logistic regression model as an example for illustration. Assume a sample data includes sample features x = (x_1, x_2, ..., x_d) across d (an integer greater than 0) feature dimensions. The logistic regression model can be expressed as shown in equation (1); it learns the mapping from sample data x to the predicted probability f(x) by training a set of parameters θ = (W, b).

[0061] $$f(x) = \sigma\left(W^{T}x + b\right) \tag{1}$$

[0062] Where W = (w_1, w_2, ..., w_d) represents the prediction weights and b represents the bias term; W and b are adjustable model parameters that can be obtained through parameter initialization. T denotes the transpose operation, i.e., W^T is obtained by transposing W; x represents the input sample data; σ() represents the sigmoid function, which limits the range of f(x) to [0, 1]. f(x) represents the predicted probability that a sample data x is a positive sample, and 1 - f(x) represents the predicted probability that it is a negative sample. The absolute value of w_j in W reflects the magnitude of the influence of the j-th feature dimension on the predicted probability f(x). In addition, if w_j is positive, the corresponding feature dimension is positively correlated with the predicted probability f(x): the larger the sample feature under the j-th feature dimension in a sample data, the larger f(x). If w_j is negative, the relationship is reversed.
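To make the role of the sign of w_j concrete, the following sketch evaluates equation (1) for a toy two-dimensional input. The weights, bias, and inputs are made-up values and the function names are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    """f(x) = sigmoid(W^T x + b) for a single sample x."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)

weights = [2.0, -0.5]  # w_1 > 0, w_2 < 0 (made-up values)
bias = 0.0

base = predict_proba(weights, bias, [1.0, 1.0])
raised_first = predict_proba(weights, bias, [2.0, 1.0])   # larger x_1
raised_second = predict_proba(weights, bias, [1.0, 2.0])  # larger x_2
```

Increasing the feature with the positive weight raises f(x), while increasing the feature with the negative weight lowers it; the change is larger for the first feature because |w_1| > |w_2|.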

[0063] Equation (1) above is an expression for a binary logistic regression model, and Equation (2) below is an expression for a multi-class logistic regression model based on multiple binary logistic regression models.

[0064]

[0065] Where P(y=m|x,W) represents the predicted probability that a sample data x is classified into the m-th class, and W_m = (w_{1m}, w_{2m}, ..., w_{dm}) is the prediction weight for the m-th class. The absolute value of w_{jm} in W_m reflects the degree of influence of the j-th feature dimension on the predicted probability P(y=m|x,W) that a sample data x is classified into the m-th class.

[0066] S303. Using the target algorithm, the sample features of each sample data under different feature dimensions are fitted according to the corresponding category labels to obtain the fitting results, which indicate the importance of different feature dimensions during the process of adding corresponding category labels to each sample data.

[0067] In one embodiment, a target algorithm is used to fit the sample features of each sample data under different feature dimensions according to the corresponding category labels to obtain the fitting result. This includes: using the target algorithm to perform label prediction processing on the sample features of each sample data under different feature dimensions to obtain the category prediction result of each sample data. Specifically, equation (2) can be converted into the following equation (3).

[0068]

[0069] The sample features of each sample data under different feature dimensions are used as the input x = (x_1, x_2, ..., x_d) of the multi-class logistic regression model shown in equation (3), which outputs the category prediction result for each sample data: P(y=1|x,W), P(y=2|x,W), ..., P(y=M|x,W). The category prediction result indicates the predicted probability that the sample data is assigned each of the category labels. Since there are M datasets and each dataset corresponds to one category label, there are M category labels.

[0070] Furthermore, the prediction weights of the target algorithm, that is, the prediction weights W_1, W_2, ..., W_M in equation (3), can be adjusted using the category prediction result and the category label of each sample data. For example, a loss function suitable for multi-class classification can be used, such as the cross-entropy loss function, whose expression is shown in equation (4) below.

[0071]

[0072] Where loss1 represents the first loss value, S represents the number of samples in the at least one sample data, x_j represents the j-th sample data, P(y=m|x_j,W) represents the predicted probability that the j-th sample data is assigned the m-th category label, and y_{jm} is an indicator function that takes the value 0 or 1: if the category label of the sample data x_j is the m-th category label, then y_{jm} is 1; otherwise it is 0.
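The body of equation (4) is not reproduced in this text. Using only the symbols defined in this paragraph, the standard multi-class cross-entropy loss would take the form below; the 1/S averaging factor and the natural logarithm are assumptions, and the original equation may differ in normalization.

```latex
\text{loss1} = -\frac{1}{S}\sum_{j=1}^{S}\sum_{m=1}^{M} y_{jm}\,\log P\left(y=m \mid x_j, W\right)
```

Because y_{jm} is 1 only for the category label actually assigned to x_j, each sample contributes the negative log of the probability predicted for its own category label.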

[0073] Then, the category prediction result of each sample data is substituted for P(y=m|x_j,W) in equation (4), and y_{jm} in equation (4) is determined from the category label of each sample data, so that equation (4) outputs the first loss value. The prediction weights W_1, W_2, ..., W_M in equation (3) are then adjusted using this first loss value and the stochastic gradient descent method. The adjustment is repeated until a stopping condition is met, such as reaching a specified number of adjustments or the cross-entropy loss function converging. The adjusted prediction weights are then used as the fitting result. The adjusted prediction weights W_1 = (w_{11}, w_{21}, ..., w_{d1}), W_2 = (w_{12}, w_{22}, ..., w_{d2}), ..., W_M = (w_{1M}, w_{2M}, ..., w_{dM}) are the target weights in S202 and S203 above, and the multi-class logistic regression model containing these target weights is the fitting model in S202 above.
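The fitting loop can be sketched as follows. Several choices here are assumptions rather than details from the text: a softmax over per-class scores stands in for the multi-class model (the bodies of equations (2) and (3) are not reproduced), no bias term is used, the stopping condition is a fixed number of passes, and the learning rate, data, and function names are illustrative.

```python
import math
import random

def softmax(scores):
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(W, x):
    """Class probabilities for x; W[m] is the weight vector of class m."""
    return softmax([sum(w * v for w, v in zip(W_m, x)) for W_m in W])

def cross_entropy(W, X, y):
    """Mean of -log P(y_j | x_j, W) over all samples."""
    return -sum(math.log(predict(W, x)[label]) for x, label in zip(X, y)) / len(X)

def fit(X, y, num_classes, lr=0.1, epochs=200, seed=0):
    """Adjust the prediction weights by stochastic gradient descent."""
    rng = random.Random(seed)
    d = len(X[0])
    W = [[0.0] * d for _ in range(num_classes)]
    order = list(range(len(X)))
    for _ in range(epochs):  # stopping condition: fixed number of passes
        rng.shuffle(order)
        for j in order:
            probs = predict(W, X[j])
            for m in range(num_classes):
                # Gradient of the per-sample loss with respect to the class-m score.
                grad = probs[m] - (1.0 if y[j] == m else 0.0)
                for k in range(d):
                    W[m][k] -= lr * grad * X[j][k]
    return W

# Category labels depend almost entirely on the first feature (made-up data).
X = [[1.0, 0.3], [2.0, -0.2], [1.5, 0.1], [-1.0, 0.2], [-2.0, -0.3], [-1.5, 0.0]]
y = [1, 1, 1, 0, 0, 0]
W = fit(X, y, num_classes=2)
```

After fitting, the weight of the first feature dominates in magnitude, which is how the fitting result signals that this feature dimension matters most for the category labels.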

[0074] In another embodiment, if the number of label types obtained for the category labels is at least one, a target algorithm is used to fit the sample features of each sample data under different feature dimensions according to the corresponding category labels to obtain the fitting result. This includes: selecting a category label of a target label type from the obtained category labels, and using the selected category label of the target label type as the fitting target. Specifically, one label type can be selected sequentially from at least one label type as the target label type to select the category label of the target label type.

[0075] Furthermore, based on the fitting target, the target algorithm is used to fit the sample features of each sample data under different feature dimensions according to the corresponding category labels to obtain the fitting result. Assuming that category label 1 is the fitting target, the sample features of a sample data under different feature dimensions are used as the input x = (x_1, x_2, ..., x_d) of the binary logistic regression model shown in equation (1), which outputs the predicted probability f(x) of the sample data x. Further, the log loss function applicable to binary classification is obtained, whose expression is shown in equation (5) below.

[0076]

[0077] Where loss2 represents the second loss value, f(x_j) represents the predicted probability that the j-th sample data is a positive sample, and y_j represents the label of the j-th sample data: y_j is 1 if the category label of the j-th sample data is the fitting target, and 0 otherwise. For example, when category label 1 is the fitting target, y_j is 1 if the category label of the j-th sample data is category label 1, and 0 otherwise.
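The body of equation (5) is not reproduced in this text. With the symbols defined in this paragraph, the standard binary log loss would read as follows; S denotes the number of sample data as in equation (4), and the 1/S averaging factor is an assumption.

```latex
\text{loss2} = -\frac{1}{S}\sum_{j=1}^{S}\Big[\, y_j \log f(x_j) + (1 - y_j)\log\big(1 - f(x_j)\big) \Big]
```

For a sample data whose category label is the fitting target (y_j = 1), only the first term contributes; for all other sample data only the second term does.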

[0078] In one embodiment, the predicted probability of each sample data can be substituted for f(x_j) in equation (5), and y_j in equation (5) can be determined according to whether the category label of each sample data is the fitting target, so that equation (5) outputs the second loss value. The prediction weights W = (w_1, w_2, ..., w_d) in equation (1) are then adjusted using this second loss value and the stochastic gradient descent method. This is repeated until a stopping condition is met, such as reaching a specified number of adjustments or the log loss function converging. The adjusted prediction weights W = (w_1, w_2, ..., w_d) are the fitting result obtained based on the fitting target and are associated with the fitting target; in this example, they are the fitting result associated with category label 1. Category label 2 can then be used as the fitting target, and the fitting result associated with category label 2 can be obtained in the same way. Understandably, assuming there are M category labels and each category label has an associated fitting result, the fitting result associated with the m-th category label is W = (w_1, w_2, ..., w_d), which is equivalent to W_m in equation (4) above, i.e., the target weights in S202 and S203 above. At the same time, multiple binary logistic regression models are obtained, which are the fitting models in S202 above.
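The per-label fitting procedure can be sketched as a one-vs-rest loop. The sketch assumes per-sample gradient steps in a fixed order, a fixed number of passes as the stopping condition, and made-up data; the function names `fit_binary` and `fit_one_vs_rest` are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_binary(X, targets, lr=0.1, epochs=200):
    """Fit W and b of f(x) = sigmoid(W^T x + b) by per-sample gradient steps on the log loss."""
    d = len(X[0])
    W, b = [0.0] * d, 0.0
    for _ in range(epochs):  # stopping condition: fixed number of passes
        for x, t in zip(X, targets):
            error = sigmoid(sum(w * v for w, v in zip(W, x)) + b) - t
            W = [w - lr * error * v for w, v in zip(W, x)]
            b -= lr * error
    return W, b

def fit_one_vs_rest(X, category_labels):
    """One fitting result per category label; y_j = 1 iff sample j carries the fitting target."""
    results = {}
    for target in sorted(set(category_labels)):
        y = [1.0 if label == target else 0.0 for label in category_labels]
        results[target] = fit_binary(X, y)
    return results

# Label 2 goes with a large first feature, label 3 with a large second feature.
X = [[0.0, 0.0], [0.1, 0.2], [1.0, 0.0], [1.1, 0.1], [0.0, 1.0], [0.2, 1.1]]
category_labels = [1, 1, 2, 2, 3, 3]
results = fit_one_vs_rest(X, category_labels)
```

Each entry of `results` pairs one category label with its own weight vector, so the weights for label 2 can be inspected independently of those for label 3.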

[0079] S304. Based on the importance indicated by the fitting results, the feature dimensions that meet the selection criteria are used as the basis for transformation in the feature transformation process; the basis for transformation is used to generate interpretability analysis results of feature transformation.

[0080] In one implementation, feature dimensions are selected in descending order of the importance indicated by the fitting result, and the selected feature dimensions are used as the transformation basis in the feature transformation process. This includes: assuming a sample data contains sample features under d feature dimensions, there is a target weight set W_m = {w_{1m}, w_{2m}, ..., w_{dm}} for the m-th category label. A target number (which can be set manually) of target weights is selected from W_m in descending order of value, and the feature dimensions corresponding to the selected target weights are used as the transformation basis in the feature transformation process. That is, in the feature transformation process, the larger the sample features of a sample data under these feature dimensions, the more easily that sample data generates features similar to those of the sample data in the dataset corresponding to the m-th category label. This is the interpretability analysis result of the feature transformation.

[0081] In another implementation, based on the importance indicated by the fitting result, feature dimensions whose importance is greater than an importance threshold are selected, and the selected feature dimensions are used as the transformation basis in the feature transformation process. This includes: assuming a sample data contains sample features under d feature dimensions, there is a target weight set W_m = {w_{1m}, w_{2m}, ..., w_{dm}} for the m-th category label. The target weights in W_m that are greater than a weight threshold (which can be set manually) are obtained, and the feature dimensions corresponding to these target weights are used as the transformation basis in the feature transformation process. That is, in the feature transformation process, the larger the sample features of a sample data under these feature dimensions, the more easily that sample data generates features similar to those of the sample data in the dataset corresponding to the m-th category label. This is the interpretability analysis result of the feature transformation.
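The two selection rules described in [0080] and [0081] can be sketched as follows. The feature names and weight values are hypothetical, and the target number and weight threshold are the manually set parameters mentioned in the text.

```python
def select_top_k(target_weights, k):
    """Indices of the k largest target weights, in descending order of weight."""
    order = sorted(range(len(target_weights)),
                   key=lambda j: target_weights[j], reverse=True)
    return order[:k]

def select_above_threshold(target_weights, weight_threshold):
    """Indices of all target weights strictly greater than the threshold."""
    return [j for j, w in enumerate(target_weights) if w > weight_threshold]

# Hypothetical feature dimensions and target weights W_m for one category label.
feature_names = ["capital_gain", "education_years", "capital_loss", "hours_per_week"]
W_m = [1.8, 0.9, -1.2, 0.3]

top_two = [feature_names[j] for j in select_top_k(W_m, 2)]
above = [feature_names[j] for j in select_above_threshold(W_m, 0.2)]
```

Both rules rank by the signed weight rather than its absolute value, matching the reading that a larger sample feature under a selected dimension pushes the sample data toward the m-th category label.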

[0082] The interpretability analysis result of the above feature transformation is the interpretability analysis result of the i-th hidden layer of the network. In feasible embodiments, the decision logic of the i-th hidden layer of the target network model can also be analyzed (another form of interpretability analysis). Specifically, the true label of each sample data in any dataset is obtained, and the number of samples corresponding to each true label is determined. The proportion of samples corresponding to each true label is then calculated as the number of samples with that true label divided by the total number of sample data in the dataset, which gives the proportions of samples corresponding to the different true labels. From these proportions, the true label with the largest sample proportion in the dataset is determined. Suppose that the larger the sample feature of a sample data under a reference feature dimension, the more easily that sample data generates features similar to those of the sample data in the dataset. Then the larger that sample feature, the more the i-th hidden layer tends to predict the true label of that sample data as the true label with the largest sample proportion in the dataset. Thus, during feature transformation, the i-th hidden layer ensures that the similarity between the feature vector generated for that sample data and the feature vectors of the sample data carrying the true label with the largest sample proportion is less than a threshold. The true label with the largest sample proportion therefore serves to indicate this relationship between the generated feature vector and the feature vectors of the sample data carrying that label.

[0083] In summary, the flow of the data processing method provided in the embodiments of this application is shown in Figure 6 and includes: ① Obtain a target network model, which may include multiple hidden layers. ② Input at least one sample data into the target network model to obtain the sample representation set X = {x_1, x_2, ..., x_n} output by the L-th hidden layer, where x_j is the sample representation of the j-th sample data. ③ Use a clustering algorithm, such as K-means, to cluster the sample representation set into two clusters, and, based on the cluster to which each sample representation belongs, divide the at least one sample data into dataset X0 and dataset X1 (more datasets are possible; two are used here as an example). ④ Determine the category label of each sample data based on the cluster label of the dataset to which it belongs. ⑤ Use the target algorithm to fit the sample data to their category labels to obtain the fitting result, and obtain from the fitting result the important feature dimensions of the L-th hidden layer, i.e., the feature dimensions that meet the selection condition. ⑥ Based on the important feature dimensions of the L-th hidden layer, determine the transformation basis of the L-th hidden layer during feature transformation, and determine the interpretability analysis result of the L-th hidden layer based on the transformation basis. ⑦ Use the sample data in dataset X0 as the at least one sample data for interpretability analysis of the (L+1)-th hidden layer, and likewise use the sample data in dataset X1 as the at least one sample data for interpretability analysis of the (L+1)-th hidden layer, thereby obtaining the subdivision logic of datasets X0 and X1. In this way, the interpretability analysis results of each hidden layer in the target network model can be obtained.
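The layer-by-layer recursion in this flow can be sketched as a small generator. The interface is hypothetical: `layers`, `cluster`, `fit`, and `select` are placeholders for the hidden layers, the clustering algorithm, the target algorithm, and the selection condition, and the toy stand-ins below exist only so the sketch runs.

```python
def analyze(samples, layers, cluster, fit, select, layer_index=0, path=()):
    """Yield (path, layer number, transformation basis per cluster label)."""
    if layer_index >= len(layers) or len(samples) < 2:
        return
    # Sample representations output by the current hidden layer.
    reps = samples
    for layer in layers[:layer_index + 1]:
        reps = [layer(r) for r in reps]
    # Cluster labels become the category labels of the sample data.
    labels = cluster(reps)
    # Fit the original sample features to the category labels, then select dimensions.
    weights = fit(samples, labels)  # {cluster label: list of target weights}
    basis = {c: select(w) for c, w in weights.items()}
    yield path, layer_index + 1, basis
    # Each resulting dataset is analyzed again at the next hidden layer.
    for c in sorted(set(labels)):
        subset = [s for s, l in zip(samples, labels) if l == c]
        yield from analyze(subset, layers, cluster, fit, select,
                           layer_index + 1, path + (c,))

# Toy stand-ins; none of them is the patent's actual component.
layers = [lambda r: [r[0]], lambda r: [r[0] * 2.0]]

def cluster(reps):
    mean = sum(r[0] for r in reps) / len(reps)
    return [0 if r[0] < mean else 1 for r in reps]

def fit(samples, labels):
    return {c: [1.0 if c == 1 else -1.0] for c in set(labels)}

def select(weights):
    return [j for j, w in enumerate(weights) if w > 0]

samples = [[0.0], [1.0], [2.0], [3.0]]
results = list(analyze(samples, layers, cluster, fit, select))
```

Each yielded `path` records the sequence of cluster labels that leads to a node, so the collected results form the tree of per-layer analyses from which decision paths are read.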

[0084] In this embodiment, after using the hidden layers of the target network model to perform feature transformation on the sample data to obtain sample representations, the sample representations are clustered based on the category attributes reflected by the sample data, thereby dividing the sample data into different datasets. Then, the category labels of the sample data are determined based on the dataset to which the sample data belongs. Furthermore, the sample data is fitted to the corresponding category labels using a logistic regression model to obtain the target weights. Through these target weights, the importance (influence) of different feature dimensions can be determined during the process of adding corresponding category labels to each sample data. This allows us to determine which feature dimensions are more important during the feature transformation process of the hidden layers of the network, and how feature dimensions affect the classification of sample data. Thus, we can determine how feature dimensions affect the feature transformation of sample data, obtain the transformation basis and interpretability analysis results of the feature transformation process, and realize interpretability analysis of the feature transformation process.

[0085] In one embodiment, after interpretability analysis results are obtained for all hidden layers in the target network model, the decision path of the target network model can be determined based on the interpretability analysis results of each hidden layer.

[0086] Taking Figure 7 as an example, the census-income dataset includes sample data drawn from the Census Bureau database. Each sample data has a true label indicating whether an individual's annual electronic resource acquisition exceeds $50,000. Feature dimensions include basic attributes such as gender, as well as information such as job type and years of education. Each sample data was obtained with the corresponding user's permission or consent. Furthermore, the target network model includes a first hidden layer and a second hidden layer.

[0087] After each sample data in the census-income dataset is input into the first hidden layer of the network for feature transformation, the feature vector corresponding to each sample data is obtained. A clustering operation, specifically the k-means algorithm, is then performed on the sample data using these feature vectors to divide the sample data into different datasets. Here the sample data are divided into two datasets: in dataset 1, positive samples (annual electronic resource acquisition greater than $50,000) account for 23.53%, while in dataset 2, positive samples account for 100%. The portion indicated by 71 in Figure 7 shows the transformation basis for the feature transformation of the first hidden layer, i.e., the important feature dimensions of the first hidden layer. The larger the sample features of a sample data under the feature dimensions indicated for dataset 1, the more easily that sample data is assigned to dataset 1; similarly, the larger the sample features under the feature dimensions indicated for dataset 2, the more easily it is assigned to dataset 2. Therefore, if a person is unmarried, has a large capital loss, a private salary type, a high school diploma, and is female, the first hidden layer tends to consider that this person's electronic resource acquisition does not exceed $50,000. Conversely, if a sample data shows a high capital gain, a long period of education, married status, male gender, a bachelor's degree, or a master's degree, the first hidden layer tends to consider that this person's electronic resource acquisition exceeds $50,000.

[0088] Furthermore, since dataset 2 consists entirely of positive samples, only dataset 1 needs further analysis. That is, the sample data in dataset 1 serve as the at least one sample data for the interpretability analysis of the second hidden layer. Similarly, by dividing the data into datasets and fitting a logistic regression model, datasets 3 and 4 are obtained, together with the transformation basis for the feature transformation of the second hidden layer (the portion indicated by 72 in Figure 7). The proportion of positive samples is 0.7% in dataset 3 and 36.4% in dataset 4. If a person has never been married, is a cleaner, works in agriculture, and has 11th grade as the highest level of education, the target network model tends to consider it less likely that this person's electronic resource acquisition exceeds $50,000. If a person works longer hours per week, is male, and is a professional technician, the target network model tends to consider it more likely that this person's electronic resource acquisition exceeds $50,000.

[0089] As can be seen from the above examples, by understanding the transformation criteria of each hidden layer of the network, we can know how each hidden layer predicts whether a person's electronic resource acquisition will exceed $50,000. Therefore, we can obtain the decision path of the target network model, which can solve the problem of the lack of transparency in complex models such as deep neural networks and improve the interpretability of the target network model.

[0090] In addition, Figure 8 is used as a further example. The target network model used in Figure 8 determines whether a user will purchase a candidate item (e.g., a fund). A positive sample indicates that a user purchased a candidate item, while a negative sample indicates that the user did not. Each sample data used by the target network model is acquired only after obtaining the user's permission or consent. The larger the sample features of a sample data under the feature dimensions indicated for cluster 0, the more likely that sample data is to be assigned to cluster 0; similarly, the larger the sample features under the feature dimensions indicated for cluster 1, the more likely it is to be assigned to cluster 1.

[0091] Specifically, take path 1 in Figure 8 as an example. In the first hidden layer of the target network model, important positive factors include the number of days on which the user visited the "My Assets" page in the past 14 days, the number of visits to the main site in the past 92 days, and the amount subscribed yesterday. These reflect the user's activity level; generally, more active users are more likely to subscribe. Meanwhile, the average Sharpe ratio of the funds managed by the fund company and the fund's return over the past year reflect the quality of the candidate item; higher-quality items are more likely to interest users. Furthermore, the number of times a user searched for a fund within a day directly reflects the user's preference for that fund, and higher is better. If a user has low values on these positive indicators, while the fund company's maximum drawdown is high, the fund manager's downside volatility is high, and the user has a high number of visits to insurance fund pages, the current fund may be performing poorly and exceeding the user's risk tolerance, so the user's probability of subscribing may be relatively low. (However, this does not mean that all users assigned to cluster 0 after this layer are non-subscribing users; a finer-grained classification is needed.)

[0092] After the first hidden layer of the network, the positive sample ratio of cluster 0 is 0.4448, significantly lower than that of cluster 1 (0.8849), indicating that the model has already separated out some high-conversion samples. After the second hidden layer, users in cluster 0 can be further divided into two user groups. If the fund has been shown to the user many times within 30 days (the user has some awareness of it), and the fund has high upward volatility, a high Calmar ratio, and a high subscription amount yesterday, then the fund is performing well and the user still has a high probability of purchasing it. Such a user is assigned to cluster 1 in the second layer of this path, with a positive sample ratio of 0.6354. Conversely, if these indicators are low, and the fund has a high maximum drawdown in the past year, high fees, and a high return ranking value (the lower the return, the higher the return ranking value), then the user's probability of purchasing is low, and the user enters cluster 0 in this path, with a positive sample ratio of only 0.1013.

[0093] After passing through the second hidden layer of the network, the positive sample ratio of the current cluster 0 is already very low, but the target network model still needs to further distinguish. If the user has been exposed to the fund a lot in the past 7 days (recent impression), the fund manager's average displayed return of managed funds is high (fund manager level), the user has searched and clicked on the fund a lot in the past 7 days (user interest), and the probability of profit after holding the fund for 3 months is high, then the user still has a certain probability of buying it. Conversely, if the fund has a high fee rate, the user has a high number of redemptions in the past day (the market may be declining and the user wants to exit), and the user has a lot of loan repayments in the past 12 months (the user lacks funds to replenish), then the probability of the user buying the candidate fund will be extremely low, reaching the endpoint of path 1, with a positive sample ratio of only 0.006.

[0094] Other paths can be analyzed in the same way; here we will only analyze paths 6 and 8 as examples.

[0095] For path 6, users with higher positive feature values in the first hidden layer of the network enter cluster 1 of the first layer (the first node's differentiation logic is the same as in path 2). If the fund has higher yesterday's click UV, higher yesterday's subscription amount, and higher yesterday's exposure / conversion number, then the fund is performing well. At the same time, if the user's historical subscription amount is high and the average subscription amount of users who subscribe to the fund is also high, this indicates that high-subscription users may be inclined to buy the fund. Users with a high matching degree enter cluster 1 of the second layer, with a positive sample ratio of 0.9461. Conversely, if these feature values are low, but the fund's recent 15-day sliding yield ranking is high (low yield), and the user has a high number of subscriptions in the index in the past 31 days, then they enter cluster 0, with a positive sample ratio of 0.7693.

[0096] The current node still has a high proportion of positive samples, indicating that users still have a high willingness to subscribe. The model needs to further distinguish users' asset preferences: factors such as the user's high number of visits to Wei Securities in the past 30 days, high historical holdings of TengAn Fund, and high preference scores in the stock and securities RFM model indicate that the user tends to be more advanced in wealth management. If the fund to be sorted is an advanced wealth management product, and the user has held the fund for a long time in the past 6 months and has recently searched and clicked on it, the probability of the user purchasing it will be greatly increased, leading to the endpoint of path 6, where the proportion of positive samples reaches 0.8833.

[0097] For path 8, users with higher positive feature values in the first layer enter cluster 1 of the first layer (the first node is distinguished by the same logic as paths 2 and 6). Users with higher positive feature values in the second layer enter cluster 1 of the second layer, with the same distinction logic as path 6 (the fund performs well and the user's subscription amount is large). In the third layer, if the user's electronic resource volume + cumulative return is high (indicating a high holding amount), and the average number of days that the user's electronic resource volume + holding amount is above 10,000 is relatively long, then the user's subscription probability is high. At the same time, if the fund's return rate in the past year is high, the user has a large number of stable bond fund holdings, and the fund is also a stable wealth management product, then the user's purchase probability is high (the user's asset preference matches the asset).

[0098] As can be seen, this application provides strong interpretability for the target network model used in financial scenarios, enabling the output of a decision path from input to output. Furthermore, by examining the decision path of the target network model, we can identify which feature dimensions it primarily uses for classification and prediction, and determine whether the model has captured meaningful features. On this basis, the feature selection algorithm for the target network model can be improved. For example, Figure 8 shows that the hidden layers of the target network model mainly focus on the feature dimensions that appear in the decision path, while other feature dimensions may be of little concern to the target network model. Therefore, when the target network model makes classification predictions, it can use only the feature dimensions included in the decision path of the target network model.

[0099] It is understood that in the specific implementation of this application, sample data and other related data are involved. When the above embodiments of this application are applied to specific products or technologies, user permission or consent is required, and the collection, use and processing of related data must comply with the relevant laws, regulations and standards of the relevant countries and regions.

[0100] The methods of the embodiments of this application have been described in detail above. To facilitate better implementation of these methods, the apparatus of the embodiments of this application is provided below. Please refer to Figure 9, which is a schematic diagram of the structure of a data processing device provided in an embodiment of this application. The data processing device 90 may include:

[0101] The acquisition unit 901 is used to acquire at least one sample data and a category label for any sample data; a sample data contains sample features under one or more feature dimensions, and the category label for any sample data is obtained by feature transformation of the sample features of the sample data under different feature dimensions;

[0102] The processing unit 902 is used to fit the sample features of each sample data under different feature dimensions according to the corresponding category label to obtain the fitting result, so as to indicate the importance of different feature dimensions in the process of adding the corresponding category label to each sample data.

[0103] The processing unit 902 is further configured to use the feature dimensions that meet the selection criteria as the basis for transformation in the feature transformation process based on the importance indicated by the fitting result; the transformation basis is used to generate interpretability analysis results of feature transformation.

[0104] In one embodiment, the acquisition unit 901 is specifically used to: acquire a target algorithm for performing interpretability analysis;

[0105] The processing unit 902 is specifically used to: use the target algorithm to fit the sample features of each sample data under different feature dimensions according to the corresponding category label to obtain the fitting result.

[0106] In one embodiment, the processing unit 902 is specifically used to: perform feature transformation on the sample features of each sample data under different feature dimensions to obtain the feature vector corresponding to each sample data; perform a clustering operation on the at least one sample data based on the feature vector corresponding to each sample data to divide the at least one sample data into different datasets, where one dataset corresponds to one clustering label; and use the clustering label corresponding to the dataset in which any sample data is located as the category label of that sample data.
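The labelling step described above can be sketched as follows. This is a minimal illustration only: the toy `hidden_layer` (a ReLU layer with hand-picked weights), the plain k-means routine, and the farthest-point initialization are assumptions made for the example, since the source does not fix a particular clustering algorithm or network.

```python
import math
from typing import List, Sequence


def hidden_layer(x: Sequence[float], weights: List[List[float]],
                 bias: List[float]) -> List[float]:
    """Hypothetical stand-in for one hidden layer: ReLU(Wx + b)."""
    return [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def kmeans_labels(vectors: List[List[float]], k: int = 2,
                  iters: int = 20) -> List[int]:
    """Plain k-means returning one cluster label per feature vector."""
    # Deterministic farthest-point initialization (an assumption of this sketch).
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(math.dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(v, centroids[c]))
                  for v in vectors]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Four toy samples with two feature dimensions each.
samples = [[0.1, 0.2], [0.2, 0.1], [0.9, 1.0], [1.0, 0.9]]
W, b = [[1.0, 1.0], [1.0, -1.0]], [0.0, 0.0]
feature_vectors = [hidden_layer(s, W, b) for s in samples]
category_labels = kmeans_labels(feature_vectors, k=2)
```

Each sample then carries the cluster label of the dataset it fell into, which is what the later fitting step treats as its category label.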

[0107] In one embodiment, the feature vector corresponding to each sample data in the at least one sample data is obtained by calling the i-th hidden layer of the target network model containing classification function for feature transformation. The target network model contains N hidden layers; where N is a positive integer greater than or equal to 1, and i is a positive integer greater than 0 and less than or equal to N. The interpretability analysis result obtained after performing feature transformation on the at least one sample data based on the i-th hidden layer is used as the interpretability analysis result of the i-th hidden layer.

[0108] In one embodiment, the acquisition unit 901 is specifically used to: acquire at least one reference sample data used for interpretable analysis of the (i-1)th hidden layer of the target network model, and at least two reference sample sets obtained after performing clustering operations on the at least one reference sample data during the interpretable analysis of the (i-1)th hidden layer of the network.

[0109] The processing unit 902 is specifically used to: take the reference sample data contained in any reference sample set as at least one sample data obtained.
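The layer-by-layer reuse of clustered sets can be sketched as a recursive drill-down. In this sketch, each `Splitter` is a hypothetical stand-in for "transform with hidden layer i, then cluster"; simple thresholds are used here only so the example stays short and deterministic.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Sequence[float]
# A splitter returns one cluster label per sample (stand-in for layer i + clustering).
Splitter = Callable[[List[Sample]], List[int]]


def drill_down(samples: List[Sample], splitters: List[Splitter],
               path: Tuple[int, ...] = ()) -> Dict[Tuple[int, ...], List[Sample]]:
    """Map each decision path (one cluster label per layer) to its samples."""
    if not splitters or not samples:
        return {path: samples}
    labels = splitters[0](samples)
    groups: Dict[int, List[Sample]] = {}
    for sample, label in zip(samples, labels):
        groups.setdefault(label, []).append(sample)
    result: Dict[Tuple[int, ...], List[Sample]] = {}
    for label in sorted(groups):
        # The set produced at layer i-1 becomes the input of the layer-i analysis.
        result.update(drill_down(groups[label], splitters[1:], path + (label,)))
    return result


layer1 = lambda batch: [1 if s[0] > 0.5 else 0 for s in batch]
layer2 = lambda batch: [1 if s[1] > 0.5 else 0 for s in batch]
data = [(0.1, 0.1), (0.2, 0.9), (0.8, 0.2), (0.9, 0.8)]
paths = drill_down(data, [layer1, layer2])
```

Each key of `paths` is one route through the layers, comparable to the numbered paths discussed in the worked fund example.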

[0110] In one embodiment, the acquisition unit 901 is specifically used to: acquire the total amount of the dataset obtained after performing clustering operation on the at least one sample data, and determine the set number corresponding to each dataset based on the total amount of the dataset;

[0111] The processing unit 902 is specifically used to: use the set number corresponding to a dataset as the cluster label corresponding to the dataset.

[0112] In one embodiment, the acquisition unit 901 is specifically used to: acquire the real label of each sample data contained in any dataset, and determine the number of samples corresponding to the same real label;

[0113] The processing unit 902 is specifically used to: calculate the sample proportion corresponding to the same real label based on the number of samples corresponding to the same real label and the total amount of sample data in any dataset, and obtain the sample proportions corresponding to different real labels respectively; determine the real label with the largest sample proportion according to the sample proportions corresponding to different real labels respectively; the real label with the largest sample proportion is used to indicate that the similarity between the feature vector generated by feature transformation and the feature vector corresponding to the sample data under the real label with the largest sample proportion is less than a threshold.
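The proportion calculation above amounts to counting real labels inside one clustered dataset. A minimal sketch, with hypothetical function names and toy labels (1 for a positive sample, 0 for a negative one):

```python
from collections import Counter
from typing import Dict, Hashable, List, Tuple


def label_proportions(true_labels: List[Hashable]) -> Dict[Hashable, float]:
    """Share of each real label among the samples of one dataset."""
    if not true_labels:
        raise ValueError("dataset is empty")
    counts = Counter(true_labels)
    total = len(true_labels)
    return {label: n / total for label, n in counts.items()}


def dominant_label(true_labels: List[Hashable]) -> Tuple[Hashable, float]:
    """Real label with the largest sample proportion, and that proportion."""
    props = label_proportions(true_labels)
    label = max(props, key=props.get)
    return label, props[label]


cluster_true_labels = [1, 1, 1, 0]
print(label_proportions(cluster_true_labels))  # {1: 0.75, 0: 0.25}
print(dominant_label(cluster_true_labels))     # (1, 0.75)
```

The positive sample ratios quoted for each cluster in the worked example are proportions of this kind for the positive label.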

[0114] In one embodiment, the processing unit 902 is specifically used to: perform label prediction processing on the sample features of each sample data under different feature dimensions using the target algorithm to obtain the category prediction result of each sample data; adjust the prediction weight of the target algorithm based on the category prediction result of each sample data and the corresponding category label; and use the adjusted prediction weight as the fitting result.
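As an illustration of predicting, comparing with the category label, and adjusting the prediction weights, the sketch below fits a logistic regression by batch gradient descent. The learning rate, epoch count, and toy data are assumptions made for the example; reading the absolute value of each weight as the importance of its feature dimension also assumes the features are on comparable scales.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: List[List[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 500) -> Tuple[List[float], float]:
    """Return (weights, bias) after batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Label prediction for one sample, then its error against the label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        # Adjust the prediction weights against the averaged gradient.
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Feature 0 separates the two cluster labels; feature 1 is mostly noise.
X = [[0.0, 0.3], [0.1, 0.7], [0.2, 0.2], [0.8, 0.6], [0.9, 0.1], [1.0, 0.8]]
y = [0, 0, 0, 1, 1, 1]
weights, bias = fit_logistic(X, y)
importance = [abs(wj) for wj in weights]  # fitting result read as importance
```

On this toy data the first feature dimension ends up with the larger weight magnitude, so it would be the one reported as the basis of the transformation.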

[0115] In one embodiment, the acquisition unit 901 is specifically used to: select category labels of the target label type from the acquired category labels, and use the selected category labels of the target label type as the fitting target;

[0116] The processing unit 902 is specifically used to: based on the fitting target and using the target algorithm, perform fitting processing on the sample features of each sample data under different feature dimensions according to the corresponding category label to obtain the fitting result, wherein the fitting result obtained based on the fitting target is the fitting result associated with the fitting target.

[0117] In one embodiment, the number of label types of the obtained category label is at least one, and the processing unit 902 is specifically used to: sequentially select one label type from at least one label type as the target label type, so as to select the category label of the target label type.
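One possible reading of selecting each label type in turn as the fitting target is a one-vs-rest scheme: for each target label type, samples carrying that category label become positives and all others negatives. The helper name below is hypothetical, and the source does not state that the binary targets are built exactly this way.

```python
from typing import Dict, Hashable, List


def one_vs_rest_targets(category_labels: List[Hashable]) -> Dict[Hashable, List[int]]:
    """For each label type in turn, build a binary fitting target."""
    return {
        target: [1 if label == target else 0 for label in category_labels]
        for target in sorted(set(category_labels))
    }


targets = one_vs_rest_targets([0, 1, 2, 1])
for target_type, binary_target in targets.items():
    # Each binary target would be fitted separately, giving one fitting
    # result associated with that fitting target.
    print(target_type, binary_target)
```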

[0118] In one embodiment, the processing unit 902 is specifically used to: sequentially select the feature dimensions of the target number according to the order of importance indicated by the fitting results from large to small, and use the selected feature dimensions as the basis for transformation in the feature transformation process; or, select the feature dimensions whose importance is greater than the importance threshold according to the importance indicated by the fitting results, and use the selected feature dimensions as the basis for transformation in the feature transformation process.
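The two selection conditions (a target number of dimensions, or an importance threshold) can be sketched as below. The feature names and importance values are invented for illustration and do not come from the source.

```python
from typing import Dict, List


def select_top_k(importance: Dict[str, float], k: int) -> List[str]:
    """Pick the k feature dimensions with the highest importance."""
    ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def select_above_threshold(importance: Dict[str, float],
                           threshold: float) -> List[str]:
    """Pick every feature dimension whose importance exceeds the threshold."""
    return [name for name, score in importance.items() if score > threshold]


# Hypothetical importance values, e.g. absolute fitted weights.
importance = {
    "exposure_30d": 0.9,
    "fee_rate": 0.4,
    "max_drawdown_1y": 0.7,
    "search_clicks_7d": 0.1,
}
print(select_top_k(importance, 2))              # ['exposure_30d', 'max_drawdown_1y']
print(select_above_threshold(importance, 0.5))  # ['exposure_30d', 'max_drawdown_1y']
```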

[0119] It is understood that the functions of each functional unit of the data processing device described in the embodiments of this application can be specifically implemented according to the methods in the above method embodiments, and the specific implementation process can be referred to the relevant descriptions in the above method embodiments, which will not be repeated here.

[0120] In this embodiment, at least one sample data and a category label for any sample data can be obtained. The category label for any sample data is obtained by performing feature transformation on the sample features of the sample data under different feature dimensions. By fitting the sample features of each sample data under different feature dimensions according to the corresponding category label, a fitting result can be obtained. This fitting result can indicate the importance of different feature dimensions during the process of adding corresponding category labels to each sample data. Therefore, based on the importance indicated by the fitting result, the feature dimensions that meet the selection conditions can be used as the transformation basis in the feature transformation process. This transformation basis can be used to generate interpretability analysis results of feature transformation. Through this embodiment, interpretability analysis of the feature transformation process can be achieved.

[0121] Figure 10 is a schematic diagram of the structure of a computer device provided in an embodiment of this application. As shown in Figure 10, the computer device 100 includes one or more processors 1001, a memory 1002, and a communication interface 1003. The processors 1001, memory 1002, and communication interface 1003 can be connected via a bus 1004 or by other means. This embodiment of the application takes the connection via the bus 1004 as an example.

[0122] The processor 1001 (or CPU, Central Processing Unit) is the computing and control core of the computer device 100. It can parse various instructions within the computer device 100 and process various data. For example, the CPU can parse power-on / off commands sent by the user to the computer device 100 and control the computer device 100 to perform power-on / off operations; it can also transmit various interactive data between internal structures of the computer device 100, and so on. The communication interface 1003 may optionally include standard wired interfaces or wireless interfaces (such as Wi-Fi, mobile communication interfaces, etc.), and is controlled by the processor 1001 for sending and receiving data. The memory 1002 is the storage device in the computer device 100, used to store computer programs and data. It is understood that the memory 1002 here can include both the computer device 100's built-in memory and extended memory supported by the computer device 100. The memory 1002 provides storage space for the operating system of the computer device 100, which may include, but is not limited to, Windows, Linux, Android, iOS, etc., and this application does not limit this to any particular system. The processor 1001 performs the following operations by running the computer program stored in the memory 1002:

[0123] Obtain at least one sample data and a category label for any sample data; a sample data contains sample features under one or more feature dimensions, and the category label for any sample data is obtained by feature transformation of the sample features of the sample data under different feature dimensions;

[0124] For each sample data under different feature dimensions, the sample features are fitted according to the corresponding category labels to obtain the fitting results, which indicate the importance of different feature dimensions during the process of adding corresponding category labels to each sample data.

[0125] Based on the importance indicated by the fitting results, the feature dimensions that meet the selection criteria are used as the basis for transformation in the feature transformation process; the basis for transformation is used to generate interpretability analysis results of feature transformation.

[0126] In one embodiment, the processor 1001 is specifically used to: acquire a target algorithm for interpretability analysis; and use the target algorithm to fit the sample features of each sample data under different feature dimensions according to the corresponding category labels to obtain the fitting result.

[0127] In one embodiment, the processor 1001 is specifically configured to: perform feature transformation on the sample features of each sample data under different feature dimensions to obtain the feature vector corresponding to each sample data; perform a clustering operation on the at least one sample data based on the feature vector corresponding to each sample data to divide the at least one sample data into different datasets, where one dataset corresponds to one clustering label; and use the clustering label corresponding to the dataset in which any sample data is located as the category label of that sample data.

[0128] In one embodiment, the feature vector corresponding to each sample data in the at least one sample data is obtained by calling the i-th hidden layer of the target network model containing classification function for feature transformation. The target network model contains N hidden layers; where N is a positive integer greater than or equal to 1, and i is a positive integer greater than 0 and less than or equal to N. The interpretability analysis result obtained after performing feature transformation on the at least one sample data based on the i-th hidden layer is used as the interpretability analysis result of the i-th hidden layer.

[0129] In one embodiment, the processor 1001 is specifically configured to: acquire at least one reference sample data for interpretable analysis of the (i-1)th hidden layer of the target network model, and at least two reference sample sets obtained after performing clustering operations on the at least one reference sample data during the interpretable analysis of the (i-1)th hidden layer of the network; and use the reference sample data contained in any one of the reference sample sets as the at least one sample data acquired.

[0130] In one embodiment, the processor 1001 is specifically configured to: obtain the total amount of the dataset obtained after performing clustering operation on the at least one sample data, and determine the set number corresponding to each dataset based on the total amount of the dataset; and use the set number corresponding to a dataset as the clustering label corresponding to the dataset.

[0131] In one embodiment, the processor 1001 is specifically configured to: obtain the true label of each sample data contained in any dataset, and determine the number of samples corresponding to the same true label; calculate the sample proportion corresponding to the same true label based on the number of samples corresponding to the same true label and the total number of sample data in the dataset, and obtain the sample proportions corresponding to different true labels respectively; determine the true label corresponding to the maximum sample proportion according to the sample proportions corresponding to different true labels respectively; the true label with the maximum sample proportion is used to indicate that the similarity between the feature vector generated by feature transformation and the feature vector corresponding to the sample data under the true label with the maximum sample proportion is less than a threshold.

[0132] In one embodiment, the processor 1001 is specifically used to: perform label prediction processing on the sample features of each sample data under different feature dimensions using the target algorithm to obtain the category prediction result of each sample data; adjust the prediction weight of the target algorithm based on the category prediction result of each sample data and the corresponding category label; and use the adjusted prediction weight as the fitting result.

[0133] In one embodiment, the processor 1001 is specifically configured to: select a category label of a target label type from the acquired category labels, and use the selected category label of the target label type as a fitting target; based on the fitting target and using the target algorithm, perform fitting processing on the sample features of each sample data under different feature dimensions according to the corresponding category labels to obtain a fitting result, wherein the fitting result obtained based on the fitting target is a fitting result associated with the fitting target.

[0134] In one embodiment, the number of label types for the obtained category label is at least one; the processor 1001 is specifically used to: sequentially select one label type from at least one label type as the target label type, so as to select the category label of the target label type.

[0135] In one embodiment, the processor 1001 is specifically used to: sequentially select the feature dimensions of the target number according to the order of importance indicated by the fitting result from large to small, and use the selected feature dimensions as the basis for transformation in the feature transformation process; or, select the feature dimensions whose corresponding importance is greater than the importance threshold according to the importance indicated by the fitting result, and use the selected feature dimensions as the basis for transformation in the feature transformation process.

[0136] In specific implementations, the processor 1001, memory 1002, and communication interface 1003 described in the embodiments of this application can execute the implementation method described in the data processing method provided in the embodiments of this application, or they can execute the implementation method described in the data processing device provided in the embodiments of this application, which will not be repeated here.

[0137] In this embodiment, at least one sample data and a category label for any sample data can be obtained. The category label for any sample data is obtained by performing feature transformation on the sample features of the sample data under different feature dimensions. By fitting the sample features of each sample data under different feature dimensions according to the corresponding category label, a fitting result can be obtained. This fitting result can indicate the importance of different feature dimensions during the process of adding corresponding category labels to each sample data. Therefore, based on the importance indicated by the fitting result, the feature dimensions that meet the selection conditions can be used as the transformation basis in the feature transformation process. This transformation basis can be used to generate interpretability analysis results of feature transformation. Through this embodiment, interpretability analysis of the feature transformation process can be achieved.

[0138] This application also provides a computer-readable storage medium storing a computer program that, when run on a computer device, causes the computer device to perform the data processing method described in any of the possible implementations above. Specific implementations are described above and will not be repeated here.

[0139] This application also provides a computer program product, which includes a computer program or computer instructions. When executed by a processor, the computer program or computer instructions implement the steps of the data processing method provided in this application. The specific implementation method can be found in the foregoing description and will not be repeated here.

[0140] This application also provides a computer program comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions, causing the computer device to perform the data processing method provided in this application. Specific implementation details are provided above and will not be repeated here.

[0141] It should be noted that, for the sake of simplicity, the foregoing method embodiments are all described as a series of actions. However, those skilled in the art should understand that this application is not limited to the described order of actions, as some steps may be performed in other orders or simultaneously according to this application. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and the actions and modules involved are not necessarily essential to this application.

[0142] Those skilled in the art will understand that all or part of the steps in the various methods of the above embodiments can be implemented by a program instructing related hardware. The program can be stored in a computer-readable storage medium, which may include: a flash drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk, etc.

[0143] The above-disclosed embodiments are only some of the embodiments of this application, and should not be construed as limiting the scope of this application. Therefore, any equivalent changes made in accordance with the claims of this application shall still fall within the scope of this application.

Claims

1. A data processing method, characterized in that, The method includes: Obtain object data for at least one object, and category labels for any object data. An object data set contains object features across one or more feature dimensions. The category label for any object data set is obtained by transforming the object features across different feature dimensions. The object features across multiple feature dimensions include: the number of times the object purchased goods, the average price of the purchased goods, the price of the goods recommended to the object, and the number of views of the recommended goods. The label indicates whether the object was purchased or not. The object features are obtained by calling the i-th hidden layer of the target network model to perform identification processing on the object data. The category labels are obtained by performing clustering operations on the at least one object data set and dividing the at least one object data set into different datasets. The target network model contains N hidden layers, where the object features output from the previous hidden layer are input to the next hidden layer to obtain the object features corresponding to the object data in the next hidden layer. N is a positive integer greater than or equal to 1, and i is a positive integer greater than 0 and less than or equal to N. 
For each object data in the i-th hidden layer of the network, the object features under different feature dimensions are fitted according to the corresponding category labels to obtain the fitting results, which indicate the importance of different feature dimensions during the process of adding corresponding category labels to each object data; the fitting process is performed by calling the logistic regression model; Based on the importance indicated by the fitting results, the feature dimensions that meet the selection criteria are used as the transformation basis in the feature transformation process of the i-th hidden layer of the network; based on the transformation basis of the N hidden layers of the network, the decision path of the target network model is generated, and the decision path is used as the interpretability analysis result of the feature transformation of the target network model; Obtain the true label of each object data in any dataset and determine the number of objects corresponding to the same true label; based on the number of objects corresponding to the same true label and the total amount of object data in any dataset, calculate the proportion of objects corresponding to the same true label to obtain the proportion of objects corresponding to different true labels; obtain the true label of the largest object proportion corresponding to any dataset; if the true labels of the largest object proportions corresponding to M datasets are different and their proportions are greater than the proportion threshold, then prune the target network model and retain the i-th hidden layer of the target network model and the hidden layers of the network before the i-th hidden layer, where M is a positive integer.

2. The method according to claim 1, characterized in that, The process of fitting the object features of each object data under different feature dimensions according to the corresponding category labels to obtain the fitting results includes: Obtain the target algorithm for interpretability analysis; Using the target algorithm, the object features of each object data under different feature dimensions are fitted according to the corresponding category labels to obtain the fitting results.

3. The method according to claim 1, characterized in that, The methods for obtaining category labels for any object data include: For each object data, feature transformation is performed on the object features under different feature dimensions to obtain the feature vector corresponding to each object data. Based on the feature vector corresponding to each object data, a clustering operation is performed on the at least one object data to divide the at least one object data into different datasets, and one dataset corresponds to one clustering label; Use the clustering label corresponding to the dataset containing any object data as the category label for that object data.

4. The method according to claim 3, characterized in that, The feature vector corresponding to each object data in the at least one object data is obtained by performing feature transformation by calling the i-th hidden layer of the target network model that includes classification functionality. The interpretability analysis result obtained by performing feature transformation on the at least one object data based on the i-th hidden layer of the network is used as the interpretability analysis result of the i-th hidden layer of the network.

5. The method according to claim 4, characterized in that, The acquisition of at least one object data includes: Obtain at least one reference object data for interpretable analysis of the (i-1)th hidden layer of the target network model, and at least two reference object sets obtained after clustering the at least one reference object data during the interpretable analysis of the (i-1)th hidden layer of the network; The reference object data contained in any one of the reference object sets is used as the at least one object data obtained.

6. The method according to claim 3, characterized in that, The ways to determine a cluster label for a dataset include: Obtain the total amount of the dataset obtained after performing clustering operation on the at least one object data, and determine the set number corresponding to each dataset based on the total amount of the dataset; The set number corresponding to a dataset is used as the cluster label for that dataset.

7. The method according to claim 3, characterized in that, The method further includes: Based on the proportion of objects corresponding to different real labels, the real label with the largest object proportion is determined; the real label with the largest object proportion is used to indicate that the similarity between the feature vector generated by feature transformation and the feature vector corresponding to the object data under the real label with the largest object proportion is less than a threshold.

8. The method according to claim 2, characterized in that, The aforementioned target algorithm is used to fit the object features of each object data under different feature dimensions according to the corresponding category labels to obtain the fitting results, including: The target algorithm is used to perform label prediction processing on the object features of each object data under different feature dimensions to obtain the category prediction result of each object data. Based on the category prediction results and corresponding category labels for each object data, the prediction weights of the target algorithm are adjusted. The adjusted prediction weights are used as the fitting result.

9. The method according to claim 2, characterized in that using the target algorithm to fit the object features of each object data under different feature dimensions according to the corresponding category labels to obtain the fitting result includes: selecting, from the obtained category labels, the category labels of a target label type, and using the selected category labels as the fitting target; and fitting, based on the fitting target and using the target algorithm, the object features of each object data under different feature dimensions according to the corresponding category labels to obtain the fitting result; wherein the fitting result obtained based on the fitting target is associated with the fitting target.

10. The method according to claim 9, characterized in that the obtained category labels have at least one label type, and selecting the category labels of the target label type from the obtained category labels includes: selecting, in sequence, one label type from the at least one label type as the target label type, so as to select the category labels of the target label type.
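One way to read claims 9 and 10 is as a one-vs-rest scheme: each label type is taken in turn as the target, and the category labels are reduced to "target type or not" before fitting. The sketch below shows only the construction of the fitting targets. The one-vs-rest binarization, the function name `fitting_targets`, and the sorted iteration order are assumptions, since the claims state only that label types are selected in sequence.

```python
def fitting_targets(category_labels):
    """Yield (target_label_type, binary_target) for each label type in turn."""
    label_types = sorted(set(category_labels))  # assumed selection order
    for target_type in label_types:
        # 1 where the object's category label is the target type, else 0.
        binary = [1 if label == target_type else 0 for label in category_labels]
        yield target_type, binary


# Hypothetical cluster labels for four objects, three label types.
category_labels = [0, 2, 1, 2]
targets = dict(fitting_targets(category_labels))
```

Each binary target would then be fitted separately, giving one fitting result per label type.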

11. The method according to claim 1, characterized in that using the feature dimensions that meet the selection condition as the conversion basis in the feature transformation process according to the importance indicated by the fitting result includes: selecting a target number of feature dimensions in descending order of the importance indicated by the fitting result, and using the selected feature dimensions as the conversion basis in the feature transformation process; or selecting, based on the importance indicated by the fitting result, the feature dimensions whose importance is greater than an importance threshold, and using the selected feature dimensions as the conversion basis in the feature transformation process.
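The two alternative selection conditions in claim 11 can be sketched as follows. The function names, the dimension names, and the importance scores are hypothetical values chosen for illustration.

```python
def select_top_k(importance, k):
    """Pick the k feature dimensions with the highest importance."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    return ranked[:k]


def select_above_threshold(importance, threshold):
    """Pick every feature dimension whose importance exceeds the threshold."""
    return [dim for dim, score in importance.items() if score > threshold]


# Hypothetical importance scores derived from a fitting result.
importance = {
    "purchase_count": 1.8,
    "avg_price": 0.3,
    "recommended_price": 0.9,
    "view_count": 0.05,
}
top_two = select_top_k(importance, 2)
above_half = select_above_threshold(importance, 0.5)
```

The first variant always returns a fixed number of dimensions, while the second may return none or all of them, depending on the threshold.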

12. A data processing apparatus, characterized in that the apparatus includes: an acquisition unit, configured to acquire object data of at least one object and a category label of any object data, wherein one object data contains object features in one or more feature dimensions, and the category label of any object data is obtained by performing feature transformation on the object features of that object data under different feature dimensions; the object features in multiple feature dimensions include the number of times the object purchased goods, the average price of the purchased goods, the price of the goods recommended to the object, and the number of views of the goods recommended to the object; the label indicates whether or not the object makes a purchase; the object features are obtained by calling the i-th hidden layer of the target network model to perform identification processing on the object data; the category label is obtained by performing a clustering operation on the at least one object data to divide the at least one object data into different datasets; the target network model contains N hidden layers, and the object features output by one hidden layer are input to the next hidden layer to obtain the object features of the object data at that next hidden layer; N is an integer greater than or equal to 1, and i is an integer greater than 0 and less than or equal to N.
The apparatus further includes a processing unit, configured to fit the object features of each object data under different feature dimensions at the i-th hidden layer according to the corresponding category labels to obtain a fitting result, wherein the fitting result indicates the importance of the different feature dimensions in the process of adding the corresponding category label to each object data, and the fitting is performed by calling a logistic regression model. The processing unit is further configured to use, according to the importance indicated by the fitting result, the feature dimensions that meet the selection condition as the conversion basis in the feature transformation process of the i-th hidden layer, and to generate a decision path of the target network model from the conversion bases of the N hidden layers, wherein the decision path serves as the interpretability analysis result of the feature transformation of the target network model. The processing unit is further configured to: obtain the true label of each object data contained in any dataset and determine the number of objects corresponding to the same true label; calculate, from the number of objects corresponding to the same true label and the total number of object data in that dataset, the proportion of objects corresponding to the same true label, so as to obtain the proportions of objects corresponding to different true labels; obtain the true label with the largest object proportion for any dataset; and, if the true labels with the largest object proportions for M datasets differ from one another and those proportions are all greater than a proportion threshold, prune the target network model so as to retain the i-th hidden layer and the hidden layers before it, where M is a positive integer.
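The pruning condition at the end of claim 12 can be sketched as below. This is an illustrative reading only: the function names are hypothetical, and the sketch assumes that the M datasets are all of the datasets produced by clustering at the i-th hidden layer.

```python
from collections import Counter


def dominant_label(true_labels):
    """Return (true label with the largest object proportion, that proportion)."""
    counts = Counter(true_labels)            # objects per true label
    label, count = counts.most_common(1)[0]
    return label, count / len(true_labels)   # proportion within the dataset


def should_prune_after_layer(datasets_true_labels, proportion_threshold):
    """Check whether the layers after the i-th hidden layer can be pruned."""
    dominant = [dominant_label(labels) for labels in datasets_true_labels]
    dominant_labels = [label for label, _ in dominant]
    # Each dataset must be dominated by a different true label ...
    all_different = len(set(dominant_labels)) == len(dominant_labels)
    # ... and each dominant proportion must exceed the threshold.
    all_above = all(p > proportion_threshold for _, p in dominant)
    return all_different and all_above


# Hypothetical true labels (1 = purchased, 0 = not) for two datasets.
separated = [[1, 1, 1, 0], [0, 0, 0, 0]]
mixed = [[1, 1, 0, 0], [0, 0, 0, 0]]
prune_separated = should_prune_after_layer(separated, 0.7)
prune_mixed = should_prune_after_layer(mixed, 0.7)
```

When the check passes, the clusters at the i-th hidden layer already line up with the true labels, which is the situation in which the claim retains only the i-th hidden layer and the layers before it.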

13. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program that, when executed by a processor, implements the data processing method according to any one of claims 1 to 11.

14. A computer device, characterized in that the computer device includes a memory, a communication interface, and a processor that are interconnected, wherein the memory stores a computer program, and the processor calls the computer program stored in the memory to implement the data processing method according to any one of claims 1 to 11.

15. A computer program product, characterized in that the computer program product includes a computer program or computer instructions that, when executed by a processor, implement the data processing method according to any one of claims 1 to 11.