Information processing device, information processing method, and program
The proposed method addresses the issue of feature interaction neglect in SHAP by calculating the expected difference in marginal contributions, enhancing the accuracy of model comparison and prediction evaluation.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- NEC CORP
- Filing Date
- 2024-12-10
- Publication Date
- 2026-06-22
AI Technical Summary
Existing methods for evaluating the interpretability of machine learning models, such as SHAP, fail to account for interactions between features, leading to inaccuracies in comparing predictions across different cases.
A method that calculates the 'difference in marginal contribution' of each feature for all intervention sequences in predictive models, considering the expected value of this difference to account for interactions between features.
Enables accurate comparison and evaluation of model behavior across cases by incorporating interaction effects, providing a more precise understanding of prediction differences.
Abstract
Description
[Technical Field]
[0001] This disclosure relates to a method for evaluating the behavior of predictions made by machine learning models.
[Background Art]
[0002] When machine learning models are used for tasks involving decision-making, interpretability is required in addition to predictive performance. In recent years, post-hoc explanation methods, which explain a model's prediction for a given case, have attracted attention. SHAP (SHapley Additive exPlanations) is one example of a post-hoc explanation method. Patent Document 1 discloses a method for calculating the contribution of data to prediction results using SHAP in a device that supports physicians' diagnoses using machine learning models.
[Prior Art Documents]
[Patent Documents]
[0003] [Patent Document 1] Japanese Patent Publication No. 2023-5697
[Summary of the Invention]
[Problems to Be Solved by the Invention]
[0004] On the other hand, there are situations where we need to explain not only individual cases, but also the differences in predictions when multiple cases are given. However, simply evaluating the differences in feature importance between cases of interest using explanatory methods such as SHAP is insufficient because it does not take into account the differences due to interactions between features.
[0005] One objective of this disclosure is to provide an information processing device that can compare and evaluate the behavior of models during prediction between cases of interest, taking into account the differences resulting from the interaction between features.
[Means for Solving the Problem]
[0006] In one aspect of the present disclosure, an information processing apparatus includes: input means for obtaining a set, feature amounts included in the set, and two or more functions that return values for any subset of the set; marginal contribution degree calculation means for calculating, as a marginal contribution degree, a difference between a first output value output by the function when a first subset of the set is input and a second output value output by the function when a second subset obtained by adding a feature amount to the first subset is input; and difference output means for calculating and outputting an index indicating a difference between the functions based on the marginal contribution degree.
[0007] In another aspect of the present disclosure, an information processing method executed by a computer includes: obtaining a set, feature amounts included in the set, and two or more functions that return values for any subset of the set; calculating, as a marginal contribution degree, a difference between a first output value output by the function when a first subset of the set is input and a second output value output by the function when a second subset obtained by adding a feature amount to the first subset is input; calculating and outputting an index indicating a difference between the functions based on the marginal contribution degree.
[0008] In still another aspect of the present disclosure, a program causes a computer to execute a process of: obtaining a set, feature amounts included in the set, and two or more functions that return values for any subset of the set; calculating, as a marginal contribution degree, a difference between a first output value output by the function when a first subset of the set is input and a second output value output by the function when a second subset obtained by adding a feature amount to the first subset is input; calculating and outputting an index indicating a difference between the functions based on the marginal contribution degree.
Advantages of the Invention
[0009] According to the present disclosure, it is possible to compare and evaluate the behavior of a model at the time of prediction between target cases in consideration of the differences due to the interaction between feature amounts.
Brief Description of the Drawings
[0010]
[Figure 1] A diagram explaining an example of a cooperative game.
[Figure 2] An example of evaluating a prediction while ignoring differences in the interaction between feature amounts.
[Figure 3] A diagram explaining how a prediction model is evaluated by the existing method.
[Figure 4] A diagram explaining how a prediction model is evaluated by the proposed method.
[Figure 5] The overall configuration of an information processing apparatus according to an embodiment.
[Figure 6] A block diagram showing the hardware configuration of the information processing apparatus.
[Figure 7] A block diagram showing the functional configuration of the information processing apparatus.
[Figure 8] A flowchart of the function comparison processing executed by the information processing apparatus.
[Figure 9] An example of a display explaining the difference in prediction between cases.
[Figure 10] A block diagram showing the functional configuration of another information processing apparatus.
[Figure 11] A flowchart of processing by the other information processing apparatus.
Mode for Carrying Out the Invention
[0011] Hereinafter, preferred embodiments of the present disclosure will be described with reference to the drawings.
<Related Art>
Prior to the description of the embodiments, the related art will be described.
[Shapley value]
The Shapley value is a method in cooperative game theory for fairly distributing the payoff of the overall game according to each player's contribution. The Shapley value of a player is expressed as the expected value (i.e., average) of the marginal contribution that the player makes to the overall game, taken over the orders in which the players join (hereinafter also referred to as the "intervention order").
[0012] In a cooperative game with a set of players N = {1, ..., n}, a characteristic function v(S) is defined that returns a real number for any subset S ⊆ N of players. The marginal contribution of player i to the subset S is the contribution that arises when player i joins S, and is expressed by the following formula.
Marginal contribution of player i = v(S ∪ {i}) - v(S)
Furthermore, the Shapley value φ_i of player i ∈ N is defined as the expected value of the marginal contribution of player i when players are added in a uniformly random order, and is expressed by the following formula:
[0013]
$$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n-|S|-1)!}{n!}\,\bigl(v(S \cup \{i\}) - v(S)\bigr) \quad (1)$$
[0014] The following provides a concrete example. As an example of a cooperative game, consider a part-time job game played by three players, A, B, and C. Figure 1(A) shows the rewards each player can earn if they participate in the target part-time job. For example, if only player A performs the target part-time job, they will earn 80,000 yen, and if players A and B perform it together, they will earn 140,000 yen.
[0015] Now, regard the set of players who have already joined as the subset S described above, and consider the reward attributable to player A when player A joins this subset. If player A joins when there are no prior participants, player A's reward is 80,000 - 0 = 80,000 yen. If player A joins when player B is the only prior participant, player A's reward is 140,000 - 30,000 = 110,000 yen. If player A joins when player C is the only prior participant, player A's reward is 180,000 - 60,000 = 120,000 yen. If player A joins when players B and C have already joined, player A's reward is 200,000 - 100,000 = 100,000 yen.
[0016] Calculating player A's reward relative to the prior participants, i.e., the marginal contribution, for all participation orders yields Figure 1(B). Player A's Shapley value φ_A is the expected (average) reward over all participation orders, and is calculated from Figure 1(B) as follows.
φ_A = (80,000 + 80,000 + 110,000 + 100,000 + 120,000 + 100,000) / 6 ≈ 98,000 yen
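The calculation in paragraphs [0014] to [0016] can be reproduced by enumerating all participation orders. The following is a minimal sketch, not the applicant's implementation. The reward table is an assumption assembled from the figures quoted in the text; in particular, the rewards for {B, C} and {A, B, C} are inferred from the arithmetic "20 - 10" (in units of 10,000 yen) rather than read from Figure 1(A). The function names are illustrative.

```python
from itertools import permutations

# Rewards in yen for each coalition of the part-time job game.
# Assumption: values for {B, C} and {A, B, C} are inferred from the text.
REWARD = {
    frozenset(): 0,
    frozenset("A"): 80_000,
    frozenset("B"): 30_000,
    frozenset("C"): 60_000,
    frozenset("AB"): 140_000,
    frozenset("AC"): 180_000,
    frozenset("BC"): 100_000,
    frozenset("ABC"): 200_000,
}


def marginal_contributions(player, players, v):
    """Return {order: marginal contribution of `player`} for every order."""
    result = {}
    for order in permutations(players):
        # Players who joined before `player` in this order.
        before = frozenset(order[: order.index(player)])
        result[order] = v[before | {player}] - v[before]
    return result


def shapley_value(player, players, v):
    """Average of the marginal contributions over all participation orders."""
    contribs = marginal_contributions(player, players, v)
    return sum(contribs.values()) / len(contribs)


for order, delta in marginal_contributions("A", "ABC", REWARD).items():
    print("".join(order), delta)
print(shapley_value("A", "ABC", REWARD))  # about 98,333 yen
```

Enumerating all n! orders is only practical for small games such as this one; it is used here because it mirrors the table in Figure 1(B) directly.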
[0017] [SHAP] SHAP applies the Shapley value to the interpretability of machine learning: it calculates the contribution of each feature in order to explain the prediction of a machine learning model. For a prediction model f: R^n → R and a case of interest x ∈ R^n, SHAP gives a local explanation of the prediction f(x) in terms of feature importance. SHAP is an instance of the Shapley value, and the correspondence between the two is as follows.
• Each player i is regarded as feature i, and the set of players N is regarded as the set of features.
• A game is defined for the case of interest x, and the characteristic function of this game is denoted v_x. For a subset S ⊆ N of features, the characteristic function v_x(S) is given by equation (2) below.
[0018]
[Equation (2)]
[0019]
[Equation (3)]
[0020] The marginal contribution of feature i to subset S is represented by the following formula.
Marginal contribution of feature i = v_x(S ∪ {i}) - v_x(S)
This is the change in the prediction that occurs when, after the input has been changed to the feature values x_S corresponding to subset S, feature i is additionally changed to its value x_i.
[0021] The SHAP value φ_i of feature i is defined as the expected value of the marginal contribution of feature i when the features are added in a uniformly random order, and is given by the following formula.
[0022]
$$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n-|S|-1)!}{n!}\,\bigl(v_x(S \cup \{i\}) - v_x(S)\bigr) \quad (4)$$
[0023] The vector φ = (φ_1, ..., φ_n) obtained by arranging the SHAP values for all features i is hereinafter referred to as the SHAP vector. The following literature proposes a method of clustering using the SHAP vector obtained from the features of each sample. With this method, clusters can be analyzed from the perspective of which features are important for the target variable.
(Literature 1) Explainable AI for Trees: From Local Explanations to Global Understanding, Scott M. Lundberg, Gabriel Erion, Hugh Chen, Alex DeGrave, Jordan M. Prutkin, Bala Nair, Ronit Katz, Jonathan Himmelfarb, Nisha Bansal, Su-In Lee, https://doi.org/10.48550/arXiv.1905.04610
[0024] [Problem] Evaluation using the SHAP values described above has the drawback that information about interactions between features is lost. Because the marginal contribution of a feature is influenced by the other features, the marginal contribution of a given feature in intervention order A can differ from its marginal contribution in intervention order B. For example, let intervention orders A and B be as follows.
Intervention order A: Feature 1 → Feature 3 → Feature 2
Intervention order B: Feature 2 → Feature 1 → Feature 3
[0025] Here, if feature 1 interacts with feature 2, the marginal contribution of feature 1 in intervention order B will differ from the marginal contribution of feature 1 in intervention order A. However, since the Shapley value is an average over all intervention orders, it carries no information about how the marginal contribution fluctuates with the intervention order (hereinafter this fluctuation is also called the "variance"). This variance becomes large when there is interaction between features. Therefore, if the difference in Shapley values is simply used to explain the difference in predictions between multiple cases, as in the clustering described in Literature 1 above, the difference in predictions is evaluated while ignoring differences in the interaction between features.
[0026] Figure 2 illustrates an example in which evaluation is performed while ignoring differences in the interaction between features. In the example in Figure 2(A), the characteristic function f corresponding to the prediction model is an OR function, and the inputs are features x1 and x2. For simplicity, the initial value of each feature is set to 0, and each feature is then switched to 1. When the features are input in the order x1 → x2 according to intervention order A, the marginal contribution of feature x1 is "+1". On the other hand, when the features are input in the order x2 → x1 according to intervention order B, the marginal contribution of feature x1 is "0". Therefore, the SHAP value, which is the average of the marginal contributions of feature x1, is "0.5".
[0027] In the example in Figure 2(B), the characteristic function f corresponding to the prediction model is an AND function, and the inputs are features x1 and x2. Under the same setting, when the features are input in the order x1 → x2 according to intervention order A, the marginal contribution of feature x1 is "0". On the other hand, when the features are input in the order x2 → x1 according to intervention order B, the marginal contribution of feature x1 is "+1". Therefore, the SHAP value, which is the average of the marginal contributions of feature x1, is again "0.5".
[0028] Thus, although different characteristic functions yield different marginal contributions for each intervention order, the SHAP value averages the marginal contributions of each feature, and the same SHAP value results for both characteristic functions. In other words, the difference in interaction that depends on the intervention order is ignored.
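The OR and AND examples of Figure 2 can be checked with a few lines of code. This is an illustrative sketch under the assumptions that both features start at 0 and are switched to 1 one at a time; the function names are hypothetical and not taken from the source.

```python
from itertools import permutations


def marginals_for_feature(f, x, baseline, feature):
    """Marginal contribution of `feature` under every intervention order.

    All features start at `baseline` and are switched to their value in `x`
    one at a time, following the intervention order.
    """
    out = {}
    for order in permutations(range(len(x))):
        current = list(baseline)
        for j in order:
            before = f(current)
            current[j] = x[j]
            if j == feature:
                out[order] = f(current) - before
                break
    return out


def f_or(z):
    return int(z[0] or z[1])


def f_and(z):
    return int(z[0] and z[1])


x, baseline = (1, 1), (0, 0)  # assumed case of interest and initial values
or_m = marginals_for_feature(f_or, x, baseline, feature=0)
and_m = marginals_for_feature(f_and, x, baseline, feature=0)

print(or_m)   # {(0, 1): 1, (1, 0): 0}
print(and_m)  # {(0, 1): 0, (1, 0): 1}
print(sum(or_m.values()) / 2, sum(and_m.values()) / 2)  # 0.5 0.5
```

The per-order values are mirror images of each other, yet both average to 0.5, which is the information loss the paragraph describes.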
[0029] <Proposed method> [Conceptual explanation] As described above, the Shapley value is the expected value, or mean, of the marginal contribution when any intervention order is assumed to have equal probability. Therefore, when evaluating the model's behavior during prediction between the cases of interest, simply comparing Shapley values does not allow for consideration of differences due to interactions between features.
[0030] Specifically, let's consider a cooperative game as a predictive model and evaluate the difference (also called "dissimilarity") between cooperative games A and B. A simple method of comparing Shapley values would calculate the expected value of the marginal contribution to the intervention order for cooperative games A and B and take the difference between them. However, this method cannot take into account the differences due to the interaction between features caused by the intervention order.
[0031] Therefore, the proposed method calculates the "difference in marginal contribution" of each feature for all intervention sequences in cooperative games A and B, which correspond to the predictive models, and then calculates the expected value of the calculated "difference in marginal contribution." This makes it possible to consider the differences due to the interaction between features caused by the intervention sequence.
[0032] In the following explanation, when comparing and evaluating two prediction models, the method of comparing average values of marginal contributions, such as Shapley values and SHAP values, will be called the "existing method," and the method of comparing the average value of the "difference in marginal contributions" will be called the "proposed method."
[0033] [Specific example] Figure 3 illustrates the evaluation method for predictive models using existing techniques. We consider cooperative games X and Y as the predictive models to be evaluated. For cooperative game X, Table T1 shows the relationship between each participant and their reward, and Table T2 shows the relationship between the order of participation (intervention order) and participant A's marginal contribution. Similarly, for cooperative game Y, Table T3 shows the relationship between each participant and their reward, and Table T4 shows the relationship between the order of participation (intervention order) and participant A's marginal contribution.
[0034] Existing methods first calculate the average marginal contribution of participant A for all intervention sequences for each cooperative game and then compare them. As shown in Figure 3, existing methods compare the average marginal contribution of participant A for cooperative game X (98,000 yen) with the average marginal contribution of participant A for cooperative game Y (90,000 yen) to evaluate cooperative games X and Y. However, as mentioned earlier, averaging the marginal contributions causes the interactions between features resulting from the intervention sequence to be buried and not reflected in the average, thus not being considered in the comparative evaluation.
[0035] Figure 4 illustrates the evaluation method for the predictive model using the proposed method. Similar to existing methods, cooperative games X and Y are considered as the predictive models to be evaluated. For cooperative game X, the relationship between each participant and their reward is shown in Table T1, and the relationship between the order of participation (intervention order) and participant A's marginal contribution is shown in Table T2. Similarly, for cooperative game Y, the relationship between each participant and their reward is shown in Table T3, and the relationship between the order of participation (intervention order) and participant A's marginal contribution is shown in Table T4. Tables T1 to T4 are the same as in Figure 3.
[0036] The proposed method first calculates the difference between participant A's marginal contribution in cooperative game X and participant A's marginal contribution in cooperative game Y for each intervention sequence. Then, the proposed method calculates the average of the obtained "differences in marginal contributions" and evaluates cooperative games X and Y based on the obtained average. Since the "differences in marginal contributions" include the interactions between features that appear in the marginal contributions for each intervention sequence, the final average value of the "differences in marginal contributions" includes the interactions between features. Therefore, the proposed method makes it possible to compare predictive models while considering the interactions between features.
[0037] [Calculation method] Next, the method of calculating the difference in marginal contribution in the proposed method is described.
(i) In the case of Shapley values
First, the Shapley value is rewritten as an expected value over intervention orders. Any permutation π: {1, ..., n} → {1, ..., n} is called an intervention order, and the set of all intervention orders is denoted Π. The marginal contribution of player i in cooperative game A under intervention order π is defined by the following formula, where v_A is the characteristic function of game A.
[0038]
$$\Delta^{(i)}_{\pi,A} = v_A\bigl(\mathrm{Pre}_i(\pi) \cup \{i\}\bigr) - v_A\bigl(\mathrm{Pre}_i(\pi)\bigr) \quad (5)$$
[0039] Here, given an intervention order π and a player i, the set Pre_i(π) of players that precede player i in π is defined by the following formula, reading π(j) as the position of player j in the order.
[0040]
$$\mathrm{Pre}_i(\pi) = \{\, j \in N : \pi(j) < \pi(i) \,\} \quad (6)$$
[0041] The Shapley value of player i is expressed by the following formula as the expected value over intervention orders π.
[0042]
$$\phi^{A}_i = \mathbb{E}_{\pi \in \Pi}\bigl[\Delta^{(i)}_{\pi,A}\bigr] = \frac{1}{n!} \sum_{\pi \in \Pi} \Delta^{(i)}_{\pi,A} \quad (7)$$
[0043] Equation (7) shows that the Shapley value is the single contribution value that minimizes the expected squared error against the marginal contribution observed under an actual intervention order π, assuming that all intervention orders occur with equal probability (i.e., it is the expected value).
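The minimization property stated in this paragraph follows from a standard decomposition of the expected squared error. The derivation below is a sketch: the symbol c (a candidate contribution value) is introduced here for illustration and does not appear in the source, and Δ^(i)_π denotes the marginal contribution of player i under intervention order π.

```latex
\begin{aligned}
\mathbb{E}_{\pi}\bigl[(\Delta^{(i)}_{\pi} - c)^2\bigr]
 &= \mathbb{E}_{\pi}\bigl[(\Delta^{(i)}_{\pi} - \phi_i)^2\bigr]
  + 2(\phi_i - c)\,\mathbb{E}_{\pi}\bigl[\Delta^{(i)}_{\pi} - \phi_i\bigr]
  + (\phi_i - c)^2 \\
 &= \mathbb{E}_{\pi}\bigl[(\Delta^{(i)}_{\pi} - \phi_i)^2\bigr] + (\phi_i - c)^2,
 \qquad \text{where } \phi_i = \mathbb{E}_{\pi}\bigl[\Delta^{(i)}_{\pi}\bigr].
\end{aligned}
```

The cross term vanishes because the expectation of Δ^(i)_π minus its mean is zero, so the expression is smallest at c = φ_i. The remaining term is the variance of the marginal contribution over intervention orders, which is the quantity the averaged value discards.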
[0044] From this perspective, the explanation of the difference between cooperative games A and B is defined as the expected value, over intervention orders, of the squared error between the marginal contributions Δ^(i)_{π,A} and Δ^(i)_{π,B}, and is expressed by the following formula.
[0045]
$$\mathbb{E}_{\pi \in \Pi}\Bigl[\bigl(\Delta^{(i)}_{\pi,A} - \Delta^{(i)}_{\pi,B}\bigr)^2\Bigr] \quad (8)$$
[0046]
[Equation (9)]
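The index described in paragraph [0044], the expected squared difference between the marginal contributions in games A and B, can be sketched by enumerating intervention orders. This is an illustration under assumptions, not the disclosed implementation: the two toy games (an OR-like and an AND-like game on two players) and all function names are hypothetical.

```python
from itertools import permutations


def marginal(v, order, player):
    """Marginal contribution of `player` when joining after its predecessors."""
    before = frozenset(order[: order.index(player)])
    return v(before | {player}) - v(before)


def shapley_difference(v_a, v_b, players, player):
    """Existing method: difference of the averaged marginal contributions."""
    orders = list(permutations(players))
    mean_a = sum(marginal(v_a, o, player) for o in orders) / len(orders)
    mean_b = sum(marginal(v_b, o, player) for o in orders) / len(orders)
    return mean_a - mean_b


def expected_squared_difference(v_a, v_b, players, player):
    """Proposed index: average over orders of the squared per-order difference."""
    orders = list(permutations(players))
    return sum(
        (marginal(v_a, o, player) - marginal(v_b, o, player)) ** 2 for o in orders
    ) / len(orders)


# Hypothetical toy games on players {1, 2}.
def game_or(s):
    return 1.0 if len(s) >= 1 else 0.0


def game_and(s):
    return 1.0 if len(s) == 2 else 0.0


players = (1, 2)
print(shapley_difference(game_or, game_and, players, 1))           # 0.0
print(expected_squared_difference(game_or, game_and, players, 1))  # 1.0
```

In this toy pair the Shapley values coincide, so the existing comparison reports no difference, while the per-order comparison reports a nonzero value.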
[0047] (ii) In the case of SHAP The marginal contribution of feature i in the background data b∈B, with a target case x and intervention order π, is defined by the following formula.
[0048]
[Equation (10)]
[0049]
$$\phi_i = \mathbb{E}_{\pi \in \Pi,\, b \in B}\bigl[\Delta^{(i)}_{\pi,b,x}\bigr] \quad (11)$$
[0050] This formula shows that the SHAP value is the contribution value that minimizes the expected squared error against the marginal contribution observed under an intervention order π and background data b, assuming that all intervention orders and background data are selected with equal probability (i.e., it is the expected value).
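[0051] From this perspective, the explanation of the difference between cases of interest x_A and x_B is defined as the expected value, over intervention orders and background data, of the squared error between the marginal contributions Δ^(i)_{π,b,x_A} and Δ^(i)_{π,b,x_B}, and is expressed by the following formula.
[0052]
$$\mathbb{E}_{\pi \in \Pi,\, b \in B}\Bigl[\bigl(\Delta^{(i)}_{\pi,b,x_A} - \Delta^{(i)}_{\pi,b,x_B}\bigr)^2\Bigr] \quad (12)$$
[0053]
$$\mathbb{E}_{\pi,b}\Bigl[\bigl(\Delta^{(i)}_{\pi,b,x_A} - \Delta^{(i)}_{\pi,b,x_B}\bigr)^2\Bigr] = \Bigl(\mathbb{E}_{\pi,b}\bigl[\Delta^{(i)}_{\pi,b,x_A} - \Delta^{(i)}_{\pi,b,x_B}\bigr]\Bigr)^2 + \mathrm{Var}_{\pi,b}\bigl[\Delta^{(i)}_{\pi,b,x_A} - \Delta^{(i)}_{\pi,b,x_B}\bigr] \quad (13)$$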
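The SHAP variant of the index, together with its split into a bias part (squared difference of SHAP values) and a variance part, can be sketched as follows. Several points are assumptions made for illustration: the hybrid input takes the values of x for features that precede i in the order and the values of the background point b for the rest, the model and the data are hypothetical, and the function names are not from the source.

```python
from itertools import permutations, product
from statistics import fmean


def marginal(f, x, b, order, i):
    """Marginal contribution of feature i for case x, background b, and order.

    Assumed construction: features preceding i in `order` come from x,
    all remaining features come from the background point b.
    """
    z = list(b)
    for j in order[: order.index(i)]:
        z[j] = x[j]
    before = f(z)
    z[i] = x[i]
    return f(z) - before


def explain_difference(f, x_a, x_b, background, i):
    """Return (squared_error, bias_term, variance_term) for feature i."""
    n = len(x_a)
    diffs = [
        marginal(f, x_a, b, order, i) - marginal(f, x_b, b, order, i)
        for order, b in product(permutations(range(n)), background)
    ]
    mean_diff = fmean(diffs)
    squared_error = fmean([d * d for d in diffs])
    bias = mean_diff ** 2
    variance = fmean([(d - mean_diff) ** 2 for d in diffs])
    return squared_error, bias, variance


# Hypothetical model in which features 0 and 1 interact.
def model(z):
    return z[0] * z[1] + z[2]


background = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
se, bias, var = explain_difference(
    model, (1.0, 1.0, 0.0), (1.0, 0.0, 1.0), background, i=0
)
print(se, bias, var)
```

The squared error equals the bias part plus the variance part, and a nonzero variance part indicates that the per-order differences are not constant, which in this sketch arises from the product term in the model.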
[0054] In the above explanation, we described the case where feature i is changed according to the intervention order selected with uniform probability. However, the proposed method is equally applicable to cases where feature i is changed according to the intervention order selected according to a specific probability distribution.
[0055] <Information Processing Device>
Next, an information processing device to which the proposed method is applied is described.
[First Embodiment]
(Overall structure) Figure 5 shows the overall configuration of the information processing device according to the first embodiment. Two or more functions to be compared are input to the information processing device 100. The information processing device 100 compares the two or more input functions using the proposed method described above and outputs the difference between the functions.
[0056] (Hardware configuration) Figure 6 is a block diagram showing the hardware configuration of the information processing device 100. As shown in the figure, the information processing device 100 comprises a processor 11, an interface (IF) 12, a ROM (Read Only Memory) 13, a RAM (Random Access Memory) 14, a database (DB) 15, and a recording medium 16. Each component is connected to the others, for example, via a bus 18.
[0057] The processor 11 is a computer such as a CPU (Central Processing Unit) and controls the entire information processing device 100 by executing a pre-prepared program. Specifically, the processor 11 can be a CPU, GPU (Graphics Processing Unit), DSP (Digital Signal Processor), MPU (Micro Processing Unit), FPU (Floating Point Number Processing Unit), PPU (Physics Processing Unit), TPU (Tensor Processing Unit), quantum processor, microcontroller, or a combination thereof.
[0058] Furthermore, the processor 11 loads the program stored in the ROM 13 and recording medium 16 into the RAM 14 and executes each process coded in the program. The processor 11 functions as part or all of the information processing device 100. Specifically, the processor 11 performs the function comparison process described later.
[0059] IF12 transmits and receives data to and from external devices. Specifically, the information processing device 100 obtains two or more functions through IF12 and outputs an index showing the difference between the functions obtained by calculation to a display device or other external device.
[0060] ROM 13 stores various programs executed by processor 11. RAM 14 is used as working memory while processor 11 is executing various processes.
[0061] DB15 stores various algorithms, data, machine learning models, etc., that the information processing device 100 uses when it performs the function comparison process described later.
[0062] The recording medium 16 is a non-volatile, non-temporary recording medium such as a disk-shaped recording medium or semiconductor memory. The recording medium 16 may be configured to be detachable from the information processing device 100. The recording medium 16 stores various programs executed by the processor 11.
[0063] In addition to the above, the information processing device 100 may also be equipped with a display device such as a liquid crystal display, and an input device such as a keyboard or mouse. These display devices and input devices are used, for example, by the operator of the information processing device 100.
[0064] (Functional configuration and processing) Figure 7 is a block diagram showing the functional configuration of the information processing device 100. The information processing device 100 comprises a function input unit 21, a marginal contribution calculation unit 22, a difference calculation unit 23, and an output unit 24. Figure 8 is a flowchart of the function comparison process performed by the information processing device 100. This process is realized by the processor 11 shown in Figure 6 executing a pre-prepared program.
[0065] The function input unit 21 obtains two or more functions corresponding to the prediction model (step S11). The marginal contribution calculation unit 22 calculates the marginal contribution for each function for each intervention order of feature i included in the set N (step S12). For example, the marginal contribution calculation unit 22 calculates the marginal contribution using equation (5) or (10) described above.
[0066] The difference calculation unit 23 calculates the expected value of the difference in marginal contribution for each intervention order as the difference between functions (step S13). Specifically, the difference calculation unit 23 calculates the difference between functions using equation (8) or (12) described above. The output unit 24 outputs the obtained difference between functions to a display device or external device (step S14). Then the function comparison process is completed.
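Steps S11 to S14 can be sketched as a small pipeline that accepts two or more functions and reports the index for every pair. This is an assumption-laden illustration: the functions are given as set functions on players, the three example functions are hypothetical, and the names `marginal_table` and `compare_functions` do not come from the source.

```python
from itertools import combinations, permutations


def marginal_table(v, players):
    """Step S12: marginal contribution of every player under every order."""
    table = {}
    for order in permutations(players):
        seen = frozenset()
        for p in order:
            table[(order, p)] = v(seen | {p}) - v(seen)
            seen = seen | {p}
    return table


def compare_functions(functions, players):
    """Step S13: expected squared difference per player for each function pair."""
    tables = {name: marginal_table(v, players) for name, v in functions.items()}
    orders = list(permutations(players))
    result = {}
    for a, b in combinations(functions, 2):
        result[(a, b)] = {
            p: sum((tables[a][(o, p)] - tables[b][(o, p)]) ** 2 for o in orders)
            / len(orders)
            for p in players
        }
    return result


# Step S11: hypothetical input functions on players {1, 2}.
functions = {
    "or": lambda s: float(len(s) >= 1),
    "and": lambda s: float(len(s) == 2),
    "sum": lambda s: 0.5 * len(s),
}

result = compare_functions(functions, (1, 2))
for pair, index in result.items():  # Step S14: output
    print(pair, index)
```

Computing each function's marginal contribution table once and reusing it across pairs avoids re-evaluating the functions for every comparison.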
[0067] (Example display) The proposed method can visualize and present to the user the explanation for the difference in predictions between cases obtained. Figure 9 shows an example of how the explanation for the difference in predictions between cases is displayed. In this example, Case 1, which compares cases a and b, and Case 2, which compares cases c and d, are displayed.
[0068] In Figure 9(A), "Square Error (Individual)" shows the individual squared errors for each of features x1 to x3, and "Square Error (Total)" shows the sum of the squared errors over all features x1 to x3. "Expected Difference" shows the difference in SHAP values and corresponds to the bias term in equation (13) above; it represents the average magnitude of the index that shows the difference in predictions between cases. "Standard Deviation" shows the difference arising from differences in the interaction between features x1 to x3 and corresponds to the variance term in equation (13) above; it is the square root of the variance of the index that shows the difference in predictions between cases, and represents the fluctuation of that index.
[0069] In Case 1, which compares cases a and b, and in Case 2, which compares cases c and d, the "expected value of the difference" (= difference in SHAP values) is the same. However, because the "standard deviation" (= square root of the variance) differs, the actual distance between the cases (= sum of squared errors) also differs.
[0070] Graph 50 in Figure 9(B) shows Case 1, comparing cases a and b. The end 51a of bar 51, corresponding to features x1 and x2, indicates the "expected difference." Note that the "expected difference" for feature x3 is 0, so bar 51 is not displayed. Additionally, bar 52, corresponding to features x1 and x2, shows the "standard deviation," indicating that there is interaction between the features in Graph 50.
[0071] Similarly, Graph 60 in Figure 9(B) shows Case 2, comparing cases c and d, where the end 61a of bar 61, corresponding to features x1 and x2, indicates the "expected difference." Note that the "expected difference" for feature x3 is 0, so bar 61 is not displayed. In Graph 60, the "standard deviation" corresponding to features x1 to x3 is "0," so the bar corresponding to bar 52 in Graph 50 is not shown. This indicates that there is no interaction between the features.
[0072] Thus, in Case 1, which compares cases a and b, and in Case 2, which compares cases c and d, the "expected difference" corresponding to the difference in SHAP values is the same, but the "standard deviation" indicating the interaction between features is different. Therefore, by referring to such display examples, it is possible to determine whether or not there is interaction between features.
[0073] [Variations of the proposed method] (Variation 1) The proposed method described above can also be applied between two sets of cases. Given a prediction model, the difference in predictions between sets of cases X_A and X_B can be explained by the following index.
[0074]
[Equation (14)]
[0075] (Variation 2) When using the proposed method, computation can be sped up by focusing on the features that the model actually uses for prediction. The following observations hold regarding those features. (1) If a feature does not affect the model's prediction, its marginal contribution is 0. (2) If the marginal contribution of feature i is the same for multiple feature sets S, the marginal contribution can be calculated for those sets together.
[0076] Noting that the two situations above are particularly likely to occur in tree-structured models, Reference 2 below proposes an algorithm for quickly calculating SHAP values for tree-structured models. (Reference 2) Explainable AI for Trees: From Local Explanations to Global Understanding, Scott M. Lundberg et al., arXiv:1905.04610, 2019.
[0077] In a tree-structured model, when the branching condition of an internal node tests feature j, all sets S that reach that node and contain feature j proceed to the same destination, and all sets S that do not contain feature j likewise proceed to the same destination. Processing can therefore be sped up by handling the sets that reach the node as two groups, those containing feature j and those not containing it. The proposed method can be extended in the same way: based on observations (1) and (2), processing can be sped up by handling multiple feature sets S together for each of the cases of interest A and B.
[0078] [Examples of application of the proposed method] (Application Example 1) The proposed method can be used to quickly find similar cases. There is often a need to find the case most similar to a case of interest. For example, in credit screening, an applicant who has failed the screening may want to find the closest case among those who passed. As another example, in medical diagnosis, a person diagnosed as pre-diabetic can set treatment goals by finding the closest person among healthy individuals. Such needs can be addressed by using the index of the proposed method as a similarity or dissimilarity function and performing a nearest neighbor search for the case of interest.
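A nearest neighbor search with the index as a dissimilarity function can be sketched as below. This is a brute-force illustration, not the fast search the paragraph alludes to. The assumptions are: the dissimilarity is the sum over features of the expected squared difference in marginal contribution, the hybrid input takes preceding features from the case and the rest from the background point, and the scoring function, cases, and names are all hypothetical.

```python
from itertools import permutations, product


def marginal(f, x, b, order, i):
    """Marginal contribution of feature i (preceding features from x, rest from b)."""
    z = list(b)
    for j in order[: order.index(i)]:
        z[j] = x[j]
    before = f(z)
    z[i] = x[i]
    return f(z) - before


def dissimilarity(f, x_a, x_b, background):
    """Sum over features of the expected squared difference in marginal contribution."""
    n = len(x_a)
    pairs = list(product(permutations(range(n)), background))
    total = 0.0
    for i in range(n):
        total += sum(
            (marginal(f, x_a, b, o, i) - marginal(f, x_b, b, o, i)) ** 2
            for o, b in pairs
        ) / len(pairs)
    return total


def nearest_case(f, query, candidates, background):
    """Return the candidate with the smallest dissimilarity to `query`."""
    return min(candidates, key=lambda c: dissimilarity(f, query, c, background))


# Hypothetical screening score and cases.
def score(z):
    return z[0] * z[1] + 0.5 * z[2]


background = [(0.0, 0.0, 0.0)]
rejected = (1.0, 0.0, 0.0)
approved = [(1.0, 1.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 2.0)]

nearest = nearest_case(score, rejected, approved, background)
print(nearest)  # (1.0, 0.0, 1.0)
```

In this toy data the nearest approved case differs from the rejected one only in the additive third feature, so its dissimilarity is 0.25, against 1.25 and 1.0 for the other two candidates.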
[0079] (Application Example 2) The proposed method can be applied to clustering. As mentioned earlier, simply using the difference in Shapley values to explain the difference in predictions between multiple cases has the problem of ignoring differences in interactions. In contrast, for the proposed method the following inequality holds:
Difference in predicted values ≤ Difference in SHAP values ≤ Squared error of marginal contribution
A fast algorithm based on the branch-and-bound method that exploits this inequality can therefore be applied. For example, in predicting product purchases, when valuable customers are analyzed from the purchase prediction results, grouping similar customers allows efficient strategies to be developed for each group.
[0080] <Second Embodiment> Figure 10 is a block diagram showing the functional configuration of the information processing device according to the second embodiment. The information processing device 70 of the second embodiment includes an input means 71, a marginal contribution calculation means 72, and a difference output means 73.
[0081] Figure 11 is a flowchart of the processing performed by the information processing device of the second embodiment. The input means 71 obtains a set, feature quantities included in the set, and two or more functions that return values for any subset of the set (step S71). The marginal contribution calculation means 72 outputs the difference between a first output value output by the function when a first subset of the set is input and a second output value output by the function when a second subset obtained by adding feature quantities to the first subset is input, as the marginal contribution (step S72). The difference output means 73 calculates and outputs an index showing the difference between the functions based on the marginal contribution (step S73).
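Steps S71 to S73 can be illustrated with the following sketch. It is a sketch under assumptions rather than the implementation of the embodiment: the index is taken to be, for each feature, the mean squared difference of the marginal contributions of two set functions over all orders in which features are added, and the names `marginal_contributions`, `difference_index`, `v_a`, and `v_b` are hypothetical. Enumerating every order grows factorially, so this is practical only for small sets.

```python
from itertools import permutations

def marginal_contributions(v, order):
    """Marginal contribution of each feature when features are added in `order`."""
    contributions = {}
    current = frozenset()
    for feature in order:
        extended = current | {feature}
        # Second output value (with the feature) minus first output value (without it).
        contributions[feature] = v(extended) - v(current)
        current = extended
    return contributions

def difference_index(v_a, v_b, features):
    """Per-feature mean squared difference of marginal contributions over all orders."""
    totals = {f: 0.0 for f in features}
    n_orders = 0
    for order in permutations(features):
        delta_a = marginal_contributions(v_a, order)
        delta_b = marginal_contributions(v_b, order)
        for f in features:
            totals[f] += (delta_a[f] - delta_b[f]) ** 2
        n_orders += 1
    return {f: total / n_orders for f, total in totals.items()}

# Hypothetical set functions: v_b adds an interaction between features 0 and 1.
features = [0, 1, 2]
v_a = lambda s: float(len(s))
v_b = lambda s: float(len(s)) + (2.0 if {0, 1} <= s else 0.0)
index = difference_index(v_a, v_b, features)
```

The two toy functions differ only through the interaction between features 0 and 1, and the index is 2.0 for those two features and 0.0 for feature 2.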
[0082] Some or all of the above embodiments may also be described as follows, but are not limited to the following:
[0083] (Note 1) An information processing device comprising: an input means for obtaining a set, feature quantities included in the set, and two or more functions that return a value for any subset of the set; a marginal contribution calculation means for calculating, as a marginal contribution, the difference between a first output value output by the function when a first subset of the set is input and a second output value output by the function when a second subset obtained by adding a feature quantity to the first subset is input; and a difference output means for calculating and outputting, based on the marginal contribution, an index indicating the difference between the functions.
[0084] (Note 2) The information processing device according to Note 1, wherein the difference output means calculates, as the index indicating the difference between the functions, the expected value of the squared error of the marginal contributions calculated for each of the functions.
[0085] (Note 3) The information processing device according to Note 1, wherein the difference output means calculates, as the index indicating the difference between the functions, the expected value of the difference between the marginal contributions calculated for each of the functions in each order in which the feature quantities are input.
[0086] (Note 4) The information processing device according to any one of Notes 1 to 3, wherein the marginal contribution calculation means calculates, as the marginal contribution, the difference between the value output by the function when a first subset selected uniformly from the set is input and the value output by the function when a second subset obtained by adding a feature quantity to the first subset is input.
[0087] (Note 5) The information processing device according to any one of Notes 1 to 3, wherein the marginal contribution calculation means calculates, as the marginal contribution, the difference between the value output by the function when a first subset selected from the set according to a certain probability distribution is input and the value output by the function when a second subset obtained by adding a feature quantity to the first subset is input.
[0088] (Note 6) The information processing device according to any one of Notes 1 to 5, wherein the difference output means visualizes and displays a value indicating the average magnitude of the index representing the difference between the functions and a variance value indicating the fluctuation of the index.
[0089] (Note 7) The information processing device according to any one of Notes 1 to 6, wherein the marginal contribution calculation means groups together subsets of the set whose marginal contributions are equal, and calculates the marginal contribution for each group.
[0090] (Note 8) An information processing method performed by a computer, the method comprising: obtaining a set, feature quantities included in the set, and two or more functions that return a value for any subset of the set; calculating, as a marginal contribution, the difference between a first output value output by the function when a first subset of the set is input and a second output value output by the function when a second subset obtained by adding a feature quantity to the first subset is input; and calculating and outputting, based on the marginal contribution, an index indicating the difference between the functions.
[0091] (Note 9) A program that causes a computer to execute processing comprising: obtaining a set, feature quantities included in the set, and two or more functions that return a value for any subset of the set; calculating, as a marginal contribution, the difference between a first output value output by the function when a first subset of the set is input and a second output value output by the function when a second subset obtained by adding a feature quantity to the first subset is input; and calculating and outputting, based on the marginal contribution, an index indicating the difference between the functions.
[0092] Furthermore, some or all of the configurations described in Notes 2 to 7, which depend on Note 1 above, may also depend on Notes 8 and 9 in the same manner as they depend on Note 1. Moreover, not limited to Notes 1, 8, and 9, some or all of the configurations described as notes may also be made dependent on various hardware, software, various recording means for recording software, or systems, without departing from the embodiments described above.
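The sampling described in Notes 4 and 5 and the mean and variance described in Note 6 can be illustrated by the following sketch. It rests on assumptions not stated in the notes: the first subset is drawn by including each remaining feature independently with probability `include_prob` (0.5 gives a uniform choice among subsets, other values give one example of a non-uniform distribution), and the index is the squared difference of the marginal contributions of two functions. The names `sample_first_subset` and `sampled_index` are hypothetical.

```python
import random
from statistics import fmean, pvariance

def sample_first_subset(others, include_prob, rng):
    """Draw a first subset by including each feature independently (assumption)."""
    return frozenset(f for f in others if rng.random() < include_prob)

def sampled_index(v_a, v_b, features, target, n_samples=200, include_prob=0.5, seed=0):
    """Estimate the mean and variance of the squared marginal-contribution difference."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    others = [f for f in features if f != target]
    squared = []
    for _ in range(n_samples):
        first = sample_first_subset(others, include_prob, rng)
        second = first | {target}  # second subset: first subset plus the target feature
        delta_a = v_a(second) - v_a(first)
        delta_b = v_b(second) - v_b(first)
        squared.append((delta_a - delta_b) ** 2)
    return fmean(squared), pvariance(squared)

# Hypothetical set functions whose marginal contributions always differ by exactly 1.
v_a = lambda s: float(len(s))
v_b = lambda s: 2.0 * len(s)
mean_index, var_index = sampled_index(v_a, v_b, features=[0, 1, 2, 3], target=0)
```

Reporting the variance alongside the mean shows how strongly the index depends on which first subset was drawn; in this toy case the difference is constant, so the variance is zero.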
[0093] Although the present disclosure has been described above with reference to embodiments and examples, the present disclosure is not limited to the above embodiments and examples. Various modifications that can be understood by those skilled in the art may be made to the structure and details of the present disclosure within its scope. [Explanation of Symbols]
[0094]
11 Processor
21 Function input unit
22 Marginal contribution calculation unit
23 Difference calculation unit
24 Output unit
100 Information processing device
Claims
1. An information processing device comprising: an input means for obtaining a set, feature quantities included in the set, and two or more functions that return a value for any subset of the set; a marginal contribution calculation means for calculating, as a marginal contribution, the difference between a first output value output by the function when a first subset of the set is input and a second output value output by the function when a second subset obtained by adding a feature quantity to the first subset is input; and a difference output means for calculating and outputting, based on the marginal contribution, an index indicating the difference between the functions.
2. The information processing device according to claim 1, wherein the difference output means calculates, as the index indicating the difference between the functions, the expected value of the squared error of the marginal contributions calculated for each of the functions.
3. The information processing device according to claim 1, wherein the difference output means calculates, as the index indicating the difference between the functions, the expected value of the difference between the marginal contributions calculated for each of the functions in each order in which the feature quantities are input.
4. The information processing device according to claim 1, wherein the marginal contribution calculation means calculates, as the marginal contribution, the difference between the value output by the function when a first subset selected uniformly from the set is input and the value output by the function when a second subset obtained by adding a feature quantity to the first subset is input.
5. The information processing device according to claim 1, wherein the marginal contribution calculation means calculates, as the marginal contribution, the difference between the value output by the function when a first subset selected from the set according to a certain probability distribution is input and the value output by the function when a second subset obtained by adding a feature quantity to the first subset is input.
6. The information processing device according to claim 1, wherein the difference output means visualizes and displays a value indicating the average magnitude of the index representing the difference between the functions and a variance value indicating the fluctuation of the index.
7. The information processing device according to claim 1, wherein the marginal contribution calculation means groups together subsets of the set whose marginal contributions are equal, and calculates the marginal contribution for each group.
8. An information processing method performed by a computer, the method comprising: obtaining a set, feature quantities included in the set, and two or more functions that return a value for any subset of the set; calculating, as a marginal contribution, the difference between a first output value output by the function when a first subset of the set is input and a second output value output by the function when a second subset obtained by adding a feature quantity to the first subset is input; and calculating and outputting, based on the marginal contribution, an index indicating the difference between the functions.
9. A program that causes a computer to execute processing comprising: obtaining a set, feature quantities included in the set, and two or more functions that return a value for any subset of the set; calculating, as a marginal contribution, the difference between a first output value output by the function when a first subset of the set is input and a second output value output by the function when a second subset obtained by adding a feature quantity to the first subset is input; and calculating and outputting, based on the marginal contribution, an index indicating the difference between the functions.