Learning device and program

The learning device and program improve the explainability of machine learning models by combining CNNs and feature vector networks with methods like LIME and Grad-CAM, enabling users to understand the basis for model judgments.

JP7875489B2 · Active · Publication Date: 2026-06-18 · NIKON CORP

Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Patents
Current Assignee / Owner: NIKON CORP
Filing Date: 2024-05-14
Publication Date: 2026-06-18

Application Information

IPC: G06T7/00

CPC: G06T7/00; G06V10/40; G06V10/82; G06V20/698; G06V10/766; G06V10/764; G06V10/774; G06V10/454

AI Tagging

Application Domain

Image analysis; Character and pattern recognition

AI Technical Summary

Technical Problem

Existing machine learning models, particularly deep learning models, provide accurate diagnostic and differential diagnoses but lack sufficient explainability, making it difficult for users to interpret the basis for their judgments.

Method used

A learning device and program that utilizes a multimodal model combining convolutional neural networks (CNN) and feature vector neural networks to extract interpretable features, accompanied by methods like LIME and Grad-CAM to visualize and quantify the contributions of these features to the judgment process, ensuring both accuracy and explainability.

Benefits of technology

The solution enhances the interpretability of machine learning models by providing clear visualizations and quantitative contributions of features, allowing users to understand the reasoning behind the model's decisions effectively.

✦ Generated by Eureka AI based on patent content.

Abstract

This training device includes: a storage unit that stores a trained model that is trained to receive input of a training image and a training feature amount obtained by digitizing a predetermined interpretable feature related to a subject of the training image, and output a result of determination for the training image and the training feature amount; a determination unit that uses the trained model stored in the storage unit to output a result of determination for an explanation target image and a first feature amount obtained by digitizing a predetermined interpretable feature related to a subject of the explanation target image; and an explanation output unit that outputs the degree of contribution of the explanation target image and the degree of contribution of the first feature amount in relation to the result of determination for the explanation target image and the first feature amount by the trained model.

Description

[Technical Field]

[0001] This invention relates to a learning device and a program.

[Background Technology]

[0002] Patent Document 1 states that "the basis calculation unit 114 can use algorithms such as Grad-CAM (Gradient-weighted Class Activation Mapping), LIME (Local Interpretable Model-agnostic Explanations), SHAP (SHapley Additive exPlanations), and TCAV (Testing with Concept Activation Vectors), which are advanced forms of LIME, to calculate an image that visualizes the basis for each of the diagnostic and differential diagnoses using the trained machine learning model in the inference unit 112."

[0003] Non-Patent Document 1 describes a learning model that classifies skin lesions by inputting skin images and metadata (such as the patient's age and sex).

[Prior Art Documents]

[Patent Documents]

[Patent Document 1] International Publication No. 2022/176396

[Non-Patent Document 1] Nils Gessert et al., "Skin lesion classification using ensembles of multi-resolution EfficientNets with meta data," MethodsX, Volume 7, 2020

[General Disclosure]

[0004] In a first aspect of the present invention, the learning device includes a storage unit that stores a learned learning model, which takes a learning image and a learning feature quantity that quantifies predetermined interpretable features related to the subject of the learning image as input and outputs the results of judgments on the learning image and the learning feature quantity; a judgment unit that outputs the results of judgments on an image to be explained and a first feature quantity that quantifies predetermined interpretable features related to the subject of the image to be explained using the learning model stored in the storage unit; and an explanation output unit that outputs the contribution of the image to be explained and the contribution of the first feature quantity to the results of judgments on the image to be explained and the first feature quantity by the learning model.

[0005] The predetermined interpretable features may include parameters that can quantitatively represent the shape or characteristics of the subject. The determination unit may use a learning model to extract a second feature from the image to be explained, and the explanation output unit may calculate the contribution by performing linear regression on the neighborhood of the data consisting of the first and second features related to the subject in the feature space that includes the second and first features. The determination unit may use a learning model to extract a second feature from the image to be explained, and the explanation output unit may compress the dimension of the feature vector, which is the second feature, to calculate the contribution of the second feature as the contribution of the image to be explained. The explanation output unit may compress the feature vector to one dimension.

[0006] The system may further include a feature calculation unit that calculates at least one of the first features based on the image to be explained and inputs it into the learning model.

[0007] The learning model may include a convolutional neural network.

[0008] The explanation output unit may further output an image showing the basis for the judgment within the image to be explained, corresponding to the judgment result by the learning model. The explanation output unit may display the contribution of the image to be explained and the contribution of the first feature side by side. The explanation output unit may display the contribution of the image to be explained, the contribution of the first feature, the image showing the basis for the judgment, and the judgment result side by side. The explanation output unit may display text corresponding to the contribution of the image to be explained and the contribution of the first feature, respectively.

[0009] In a second aspect of the present invention, a program is provided that includes a storage function that stores a trained learning model in a storage unit, the model taking as input a training image and a training feature quantity that quantifies predetermined interpretable features related to the subject of the training image, and outputting the results of judgments made on the training image and the training feature quantity; a judgment function that outputs the results of judgments made on an image to be explained and a first feature quantity that quantifies predetermined interpretable features related to the subject of the image to be explained, using the trained learning model stored in the storage unit; and an explanation output function that outputs the contribution of the image to be explained and the contribution of the first feature quantity to the results of judgments made on the image to be explained and the first feature quantity by the trained learning model.

[0010] In a third aspect of the present invention, a learning device is provided that takes a training image and a training feature quantity, which is a numerical representation of a predetermined interpretable feature related to the subject of the training image, as inputs, and uses a learning model that has been trained to output the results of judgments on the training image and the training feature quantity, to output the results of judgments on an image to be explained and a first feature quantity, which is a numerical representation of a predetermined interpretable feature related to the subject of the image to be explained, and comprises an explanation output unit that outputs the contribution of the image to be explained and the contribution of the first feature quantity to the results of the learning model's judgment on the image to be explained and the first feature quantity.

[0011] It should be noted that the above summary of the invention does not enumerate all of its features. Furthermore, subcombinations of these features may also constitute an invention.

[Brief Description of the Drawings]

[0012]
[Figure 1] Shows the functional blocks of the learning device 10 according to this embodiment.
[Figure 2] Shows the operation flow of the learning device 10.
[Figure 3] Shows an outline of an example of the learning model 120.
[Figure 4] A distribution diagram illustrating the outline of a method for linear regression on a feature space.
[Figure 5] A distribution diagram explaining the readable feature space.
[Figure 6] A distribution diagram illustrating a method for compressing the dimension of the second feature to one dimension.
[Figure 7] Shows an example of an image 200 displayed on the display by the explanation output unit 104.
[Figure 8] Shows the text template 230 that is displayed in text area 220.
[Figure 9] A list of other application examples.
[Figure 10] Shows an example of a computer 2200 in which multiple aspects of the present invention may be embodied in whole or in part.

[Modes for Carrying Out the Invention]

[0013] The present invention will be described below through embodiments, but these embodiments are not intended to limit the scope of the claims. Furthermore, not all combinations of features described in the embodiments are necessarily essential to the solution of the invention.

[0014] Figure 1 shows the functional blocks of the learning device 10 according to this embodiment. The learning device 10 outputs the result of some determination for a target input. The determination may identify which of several predetermined categories the input belongs to, or it may calculate the likelihood of belonging to each category as a prediction probability. In this embodiment, the target input is an image of a cell and a feature quantity (first feature quantity) obtained by quantifying a predetermined interpretable feature related to the cell shown in the image, and the output determination is the prediction probability that the cell described by the image and the first feature quantity is in each of the four phases of the cell cycle. This output may also be called a decision, determination, prediction, speculation, estimation, etc.

[0015] For such determination, a learning model such as deep learning is used. In a learning model, there is often a trade-off between the accuracy of the determination and the understandability of the basis for the determination. For example, deep learning produces relatively accurate determinations, but it is difficult for users to interpret the model even by looking at the nodes and their weights in each layer. Explanation tools for the basis of judgment such as Grad-CAM, LIME, SHAP, and TCAV have therefore been proposed, but none of them can be said to be sufficient. The purpose of this embodiment is to present the basis for the determination in an easy-to-understand manner while ensuring the accuracy of the determination. Note that a basis that is easy to understand is sometimes described as having high explainability.

[0016] The learning device 10 includes a feature calculation unit 100, a determination unit 102, an explanation output unit 104, and a storage unit 106. The storage unit 106 stores the learning model 120. The learning device 10 may be an information device such as a personal computer, a tablet, or a smartphone, and may be realized by installing a program or an application on such a device. Alternatively, the learning device 10 may be a web server that outputs the result of some determination for a target input by cloud computing, and may be realized by installing a program or an application on that server. In that case, the web server is connected to a microscope or the like via a network.

[0017] The feature calculation unit 100 acquires an input image from an external source such as a microscope. The feature calculation unit 100 may instead read out an input image stored in the storage unit 106 in advance. The feature calculation unit 100 calculates a feature quantity (first feature quantity) obtained by quantifying interpretable features determined in advance from the input image, and inputs it to the learning model 120.

[0018] The determination unit 102 performs determination on the input image using the learning model 120 stored in the storage unit 106. The determination unit 102 outputs the result of the determination to the outside such as a display.

[0019] When the learning model 120 obtains the determination result, the explanation output unit 104 outputs in a comparable manner the contribution degree of the feature quantity (second feature quantity, to be described later) extracted from the pixel values of the input image, and the contribution degree of the feature quantity (first feature quantity) obtained by quantifying interpretable features determined in advance and related to the subject of the input image. The output destination is, for example, a display, the same as that of the determination unit 102.

[0020] FIG. 2 shows the operation flow of the learning device 10. The operation flow starts, for example, when the user starts up the learning device 10. The operation flow includes a learning stage S100 for training the learning model 120, a determination stage S110 for determining an object to be explained using the learning model 120, and an explanation stage S120 for calculating contribution degrees and the like to explain the result of the determination.

[0021] FIG. 3 schematically shows an example of the learning model 120. In the learning stage of step S100, first, the learning model 120 is set. In the present embodiment, as the learning model 120, a model that uses both machine learning using the input image 20 itself, in other words, pixel values, and machine learning using a feature quantity obtained by quantifying interpretable features determined in advance and related to the subject of the input image 20 is set. From the viewpoint of using different types of inputs, the model may be called a multimodal model.

[0022] The learning model 120 includes an image CNN 132 as a machine learning method using pixel values. The image CNN 132 is a CNN (Convolutional Neural Network) that takes images as input. There are no restrictions on the number of layers in the CNN or the number of nodes (also called filters, kernels, etc.) in each layer, but in this embodiment, a model up to the fully connected layer in VGG16 (i.e., VGGNet 16 layers) is used. For example, the image CNN 132 repeatedly performs convolution and pooling on an input image 20 having pixel values (96 vertical × 96 horizontal × 3 color), calculating 512 nodes and the weights for each node. These weights for the 512 nodes are used in subsequent stages as 512-dimensional second features.

[0023] The learning model 120 includes a feature vector NN 134 as a machine learning method that uses quantified features. The feature vector NN 134 is a single-layer or multi-layer neural network, and there are no restrictions on the number of layers or the number of nodes in each layer. In this embodiment, the feature vector NN 134 accepts a first feature vector input containing 13 features, and calculates 8 nodes and the weights for each node. These weights for the 8 nodes are used in subsequent stages as an 8-dimensional third feature vector.

[0024] The first features are calculated from the input image 20 by the feature calculation unit 100. OpenCV is used as an example of the feature calculation unit 100, but other image processing engines may also be used. Alternatively, instead of using the feature calculation unit 100, the user may identify the first features and directly input them into the feature vector NN 134.

[0025] The first feature is a numerical representation of a predetermined, interpretable feature related to the subject of the input image 20. This predetermined, interpretable feature is a feature that the user can understand quantitatively and intuitively. In this embodiment, because the task is to determine the cell cycle phase of the cells depicted in the input image 20, the predetermined, interpretable features include features related to the shape of the cells. There are 13 such features in total: cell area, convex hull, perimeter, maximum Feret diameter, minimum Feret diameter, diameter, rectangle length, rectangle width, roundness, convexity, elongation, roughness, and unevenness. The first feature may be something other than these (for example, cell density), and there is no limit to the number of features.
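As a rough illustration of how such shape features can be quantified, the sketch below computes a few of them from a cell outline given as polygon vertices. It uses only the Python standard library rather than OpenCV. The function name `shape_features`, the polygon input format, and the formula used for roundness (4πA/P²) are assumptions for illustration; the text does not state the definitions it uses.

```python
import math

def shape_features(contour):
    """Compute a few shape descriptors from a closed polygon.

    contour: list of (x, y) vertices in order (hypothetical input format).
    The formulas are common textbook definitions, assumed for illustration.
    """
    n = len(contour)
    area2 = 0.0       # twice the signed area (shoelace formula)
    perimeter = 0.0
    for i in range(n):
        x0, y0 = contour[i]
        x1, y1 = contour[(i + 1) % n]
        area2 += x0 * y1 - x1 * y0
        perimeter += math.hypot(x1 - x0, y1 - y0)
    area = abs(area2) / 2.0
    # Maximum Feret diameter: largest distance between any two vertices.
    max_feret = max(math.dist(p, q) for p in contour for q in contour)
    # Roundness: 1.0 for a perfect circle, smaller for less circular shapes.
    roundness = 4.0 * math.pi * area / perimeter ** 2
    return {"area": area, "perimeter": perimeter,
            "max_feret": max_feret, "roundness": roundness}

# Example: a 2 x 2 square outline.
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
features = shape_features(square)
```

Each value is a single number with a direct geometric meaning, which is what makes a contribution assigned to it readable by a user.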

[0026] The learning model 120 further includes a classification NN 136. The classification NN 136 is a fully connected layer that receives the outputs of the last layer of the image CNN 132 and the last layer of the feature vector NN 134, and performs four-class classification. In other words, the classification NN 136 can be described as a classifier that determines four classes from a 520-dimensional input consisting of a 512-dimensional second feature and an 8-dimensional third feature. The four classes to be classified are the stages of the cell cycle: "G1: DNA synthesis preparation" (Class 1), "S: DNA synthesis and replication" (Class 2), "G2: two sets of chromosomes" (Class 3), and "M: cell division" (Class 4).
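The dimensions described above can be sketched as follows: a 512-dimensional second feature and an 8-dimensional third feature are concatenated into a 520-dimensional vector and passed through one fully connected layer with four outputs. This is a minimal sketch, not the patented model. The random weights stand in for trained ones, the function names are hypothetical, and the use of softmax to obtain class probabilities is an assumption.

```python
import math
import random

CLASSES = ["G1", "S", "G2", "M"]

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(second_feature, third_feature, weights, biases):
    """Fully connected layer over the concatenated 520-dimensional input."""
    x = list(second_feature) + list(third_feature)
    logits = [sum(w * v for w, v in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)

rng = random.Random(0)
second = [rng.random() for _ in range(512)]  # stand-in for the image CNN output
third = [rng.random() for _ in range(8)]     # stand-in for the feature vector NN output
weights = [[rng.gauss(0.0, 0.01) for _ in range(520)] for _ in CLASSES]
biases = [0.0] * len(CLASSES)
probs = classify(second, third, weights, biases)  # one probability per class
```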

[0027] The learning model 120 before training is initialized, for example, by user input. In this case, the learning device 10 may store an outline of the model in the storage unit 106, accept input from the user regarding the number of network layers and the number of nodes, and initialize the learning model 120 based on these.

[0028] In the learning phase of step S100, for example, 100,000 sets are prepared, each consisting of an input image 20 (also called a training image), a first training feature, and a correct class. The learning model 120 is trained by reducing the error between the judgment result and the correct answer when these training images and first training features are input. For example, backpropagation can be used to reduce the error. Other learning methods may be used, and additional techniques such as dropout may be used in combination.

[0029] As described above, the learning model 120 is trained using the pixel values of the input image 20 and the first training feature as inputs, and the result of a predetermined judgment (cell cycle phase) on the subject (cell) of the input image 20 as output. The trained learning model 120 is stored in the storage unit 106. This completes the operation of step S100. Note that the contents of the 512 nodes of the second feature change as training progresses, but even if the trained nodes are displayed visually, it is not intuitively clear to the user what they represent.

[0030] Next, in the determination stage of step S110, the determination unit 102 reads the trained learning model 120 from the storage unit 106, inputs the input image 20 to be explained and a feature quantity (first feature quantity) which is a numerical representation of a predetermined interpretable feature related to the subject (cell) of the input image 20 to be explained into the learning model 120, and performs a predetermined determination (cell cycle) of the subject. As a result of the determination, the determination unit 102 outputs a predicted probability for each of the four classes, indicating the likelihood of belonging to each class.

[0031] Furthermore, simply outputting the judgment result may not allow users to understand why that result was reached, and they may not be able to utilize the result effectively.

[0032] Next, the explanation stage of step S120 performs two processes in parallel. The first process (step S122) calculates, for the judgment made by the learning model 120, the contribution of the image itself and the contribution of the feature quantities (first features) that are numerical representations of predetermined interpretable features related to the subject of the image. The second process (step S124) visualizes the regions in the image that contributed to the judgment result (visualizes the basis for the judgment). Both processes are performed in the explanation output unit 104. The first process is explained first. In this embodiment, the first process is based on the known method LIME (Local Interpretable Model-agnostic Explanations), and uses a method that estimates an explanation model in the vicinity of the data to be explained in the feature space.

[0033] The data to be explained consists of the feature quantities (second feature quantities) of the input image 20 to be explained, and the feature quantities (first feature quantities) that are related to the subject (cells) of the input image 20 to be explained and are numerical representations of predetermined interpretable features.

[0034] Figure 4 is a distribution diagram illustrating the overview of a method for linear regression on the feature space. In the example in Figure 4, a two-dimensional feature space is depicted with feature x0 and feature x1 as the spatial axes. In this figure, the data to be explained is indicated by an X, other input images whose highest predicted probability is for the same class as the data to be explained are shown as black circles, and other input images whose highest predicted probability is for a class different from that of the data to be explained are shown as white circles. The solid line is the discriminant surface (boundary surface) that the learning model 120 learned to determine whether an image belongs to the same class as the data to be explained or to a different class.

[0035] As shown by the solid line, the discriminant surface of the learning model 120 is complex. However, if we focus on the "neighborhood of the data to be explained" and perform linear regression (linear approximation) on the discriminant surface, we can consider the slope of that line to correspond to the weights of feature x0 and feature x1 in that neighborhood. In other words, by focusing on the neighborhood of the data to be explained, the weights of each feature when the learning model is approximated by a linear classifier in that neighborhood can be considered as the contribution of each feature of the learning model 120 to the judgment result.

[0036] Therefore, the training data is extracted, and the prediction probability is recalculated using the trained learning model 120. The training data consists of the features of the training input images 20 and the features (first training features) which are numerical representations of predetermined interpretable features related to the subject of the training input images.

[0037] Here, in order to define the "neighborhood of the data to be explained," we transform the feature space shown in Figure 4 into a "readable feature space."

[0038] The "readable feature space" is defined as follows:
(1) Divide each feature into independent regions (one region corresponds to one square in Figure 5) according to the density of the data to be explained and the training data (collectively referred to as data).
(2) For a given feature, assign a value of 1 to data that belong to the same region as the data to be explained, and a value of 0 to data that do not. In other words, project the features of each data point into a binary space. Perform this assignment for each data point and each feature, and set the cost function of the linear regression based on these.

[0039] Regarding the above definition (1) of the readable feature space, the region division is such that each region contains a predetermined number or range of data points. As a result, as shown in Figure 5, the width of the region, i.e., the numerical range, becomes narrower in areas with high data density for each feature. Conversely, the width of the region becomes wider in areas with low data density.

[0040] Regarding the above definition (2) of the readable feature space, a readable feature amount z_i is assigned to each data point using this region division. If the feature amount x_i of a data point falls within the same region as that of the data to be explained, the readable feature amount z_i = 1 is assigned. Conversely, if the feature amount x_i does not fall within the same region as that of the data to be explained, the readable feature amount z_i = 0 is assigned. A readable feature amount is assigned in this way for each feature amount of the data point. This defines a space in which the components of z are mostly 1 for data close to the data to be explained, and mostly 0 for data far from it.
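The two-part definition can be sketched in code: bin edges are chosen per feature so that each region holds roughly the same number of points (narrow regions where data is dense), and each point then receives 1 for a feature when it shares a region with the data to be explained. The helper names, the quantile-based edge selection, and the default number of bins are assumptions for illustration.

```python
def quantile_edges(values, n_bins):
    """Bin edges chosen so each region holds roughly the same number of points."""
    ordered = sorted(values)
    return [ordered[(len(ordered) * k) // n_bins] for k in range(1, n_bins)]

def region_index(value, edges):
    """Index of the region that contains value."""
    return sum(1 for e in edges if value >= e)

def readable_features(data, target, n_bins=4):
    """Map each data point to a binary vector z (1 = same region as target)."""
    n_features = len(target)
    all_points = data + [target]
    edges = [quantile_edges([p[i] for p in all_points], n_bins)
             for i in range(n_features)]
    target_regions = [region_index(target[i], edges[i])
                      for i in range(n_features)]
    return [[1 if region_index(p[i], edges[i]) == target_regions[i] else 0
             for i in range(n_features)]
            for p in data]

# Toy example with two features and two regions per feature.
data = [[0.1, 5.0], [0.2, 1.0], [0.9, 5.2], [0.15, 9.0]]
target = [0.12, 5.1]
z = readable_features(data, target, n_bins=2)
```

In this toy run, a point gets z = 1 for a feature only when it lands on the same side of that feature's median edge as the target.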

[0041] FIG. 5 is a distribution diagram for explaining the readable feature space. In the example of FIG. 5, a two-dimensional readable feature space with the feature amount x0 and the feature amount x1 as the spatial axes is depicted. In the figure, the data to be explained is indicated by a cross, and the extracted learning data is represented by black circles. Furthermore, the number of data at each feature amount, that is, the density of the data, is schematically shown by a curve.

[0042] Next, linear regression is performed in the readable feature space. First, for the i-th data, it is formulated as follows.

[Equation 1]

[0043] For each of the data to be explained and the learning data, y i and z i are calculated, and under the above formulation, w and b that minimize the squared error between the left side and the right side of the above "Equation 1" are estimated as follows.

[Equation 2]
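The two equations referenced here are not reproduced in this text. A form consistent with the surrounding description (a linear model of the predicted probability y_i in the readable features z_i, with w and b estimated by minimizing the squared error) would be the following. This is an assumed reconstruction, not the patent's exact notation.

```latex
% Equation 1 (assumed form): linear model for the i-th data point
y_i = \mathbf{w}^{\top} \mathbf{z}_i + b

% Equation 2 (assumed form): least-squares estimate of w and b
(\hat{\mathbf{w}}, \hat{b}) = \operatorname*{arg\,min}_{\mathbf{w},\, b}
  \sum_i \left( y_i - \left( \mathbf{w}^{\top} \mathbf{z}_i + b \right) \right)^2
```

The original LIME formulation additionally weights each sample by its proximity to the data being explained. The description here mentions only the squared error, so no weighting term is shown.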

[0044] The i-th component w_i of the coefficient w calculated by linear regression in the readable feature space corresponds to the contribution of the feature x_i to the predicted probability. The relative magnitudes of the components w_i can be said to reflect the relative magnitudes of the contributions of the features x_i in the learning model 120.

[0045] Note that a positive value of the coefficient w_i contributes to increasing the prediction probability, while a negative value contributes to decreasing it. The constant b is the model bias, which corresponds to the prediction probability when random data is input.
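A minimal sketch of this estimation step, assuming an ordinary least-squares fit solved through the normal equations in pure Python. The function name `fit_linear` and the tiny `ridge` stabilizer are assumptions added here so the system stays solvable when columns of z are collinear; neither comes from the source.

```python
def fit_linear(z_rows, y, ridge=1e-8):
    """Least-squares fit of y ~ w.z + b via the normal equations.

    ridge is a small stabilizer (an assumption, not part of the source).
    """
    n = len(z_rows[0]) + 1                       # weights plus bias
    rows = [list(z) + [1.0] for z in z_rows]     # constant 1 column for b
    # Build A = X^T X + ridge * I and rhs = X^T y.
    a = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    rhs = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            rhs[r] -= f * rhs[col]
    # Back substitution.
    sol = [0.0] * n
    for i in reversed(range(n)):
        s = sum(a[i][j] * sol[j] for j in range(i + 1, n))
        sol[i] = (rhs[i] - s) / a[i][i]
    return sol[:-1], sol[-1]                     # (w, b)

# Toy data generated from y = 0.2 + 0.5*z0 - 0.1*z1.
z = [[1, 1], [1, 0], [0, 1], [0, 0]]
y = [0.6, 0.7, 0.1, 0.2]
w, b = fit_linear(z, y)
```

Here the recovered w[0] is positive (the first feature raises the predicted probability) and w[1] is negative (the second lowers it), matching the sign interpretation given above.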

[0046] The explanation of the readable feature space shown in Figure 5 above, and the linear regression within that space, is a general explanation of the known LIME. In this embodiment, different types of features, namely an image and a first feature (i.e., a feature that quantifies a predetermined interpretable feature), are input to LIME, and the following processing is performed to calculate the contribution of each feature.
(1) The first feature is converted into a readable feature.
(2) The image features (second features) are reduced in dimensionality and converted into readable features.

[0047] The transformation described in (1) above is a transformation to the readable feature space shown in Figure 5 above.

[0048] On the other hand, regarding the transformation in (2) above, since the second feature quantities extracted from the neural network into which the image is input have 512 dimensions in this embodiment, if these are used directly as the dimensions (i.e., spatial axes) of the readable feature space, 512 numerical values are obtained as the contributions of the second feature quantities. However, as mentioned above, each feature of the second feature quantities itself is not intuitively understandable, so the explainability does not improve much with these contributions.

[0049] Therefore, in this embodiment, the dimension of the second feature is reduced and the contribution is calculated using the method described above. For example, the dimension of the second feature is compressed to one dimension, so that the contribution after compression can be considered as the contribution of the "image itself".

[0050] Figure 6 is a distribution diagram illustrating a method for compressing the dimension of the second feature in the feature space shown in Figure 4 to one dimension. In Figure 6, the data to be explained is shown with an "X," and the training data is shown with black circles.

[0051] The method shown in Figure 6 compresses the original 512 dimensions into a single feature quantity that represents the distance from the data to be explained in that 512-dimensional space. That is, the distance d is calculated as follows, where x^ev_v and x_v represent the value of the v-th of the 512 dimensions for the input image being explained and for an input image used for training, respectively.

[Equation 3]

[0052] Furthermore, corresponding to the first feature, the compressed feature is also assigned to a readable feature of either 0 or 1. In this embodiment, if the distance d is greater than or equal to the threshold, a readable feature z=0 is assigned. Conversely, if the distance d is less than the threshold, a readable feature z=1 is assigned.
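The compression and thresholding can be sketched as below. The distance formula itself is not reproduced in this text, so Euclidean distance is assumed here; the function name and the threshold value are likewise hypothetical.

```python
import math

def image_readable_feature(target_vec, train_vecs, threshold):
    """Compress each high-dimensional vector to one distance, then binarize.

    Euclidean distance is an assumption; the source formula is not shown.
    """
    distances = [math.dist(target_vec, v) for v in train_vecs]
    # z = 1 when the point is closer than the threshold, otherwise z = 0.
    return [1 if d < threshold else 0 for d in distances]

target_vec = [0.0] * 512    # stand-in for the second feature of the explained image
near = [0.01] * 512         # distance is about 0.23
far = [1.0] * 512           # distance is about 22.6
z_image = image_readable_feature(target_vec, [near, far], threshold=1.0)
```

The result is one binary value per training image, which then sits alongside the 13 binary values derived from the first feature.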

[0053] This results in the second feature being converted to one dimension and projected into binary space as a readable feature. In other words, a readable feature of the image itself, reflecting the second feature, is defined.

[0054] A total of 14 dimensions of readable feature space is established by combining the 1-dimensional readable feature of the image itself and the 13-dimensional readable feature of the first feature, and linear regression as described in "Equation 2" above is performed. The resulting contribution is the contribution of the feature corresponding to the 14 dimensions, so the user can refer to both the contribution of the image itself and the contribution of the first feature in a comparable manner. Since the first feature is a numerical representation of predetermined interpretable features, the obtained results not only improve interpretability but also reveal the contribution of the image itself, allowing the user to evaluate the meaning of the heatmap created in the second process described later.

[0055] Next, the second process (step S124) will be explained. After the judgment stage in step S110, in step S124, for example, a heatmap is created. The heatmap is an output that allows comparison of how much each region of the image to be explained contributed to the judgment result obtained by the learning model 120. The output is not limited to a heatmap; any method that visualizes the regions in the image to be explained that contributed to the judgment result (visualizes the basis for the judgment) is acceptable. For example, the contributing regions (the regions that formed the basis; all of them if there are multiple) may be extracted from the image to be explained according to their degree of contribution to the judgment result. In this embodiment, the second process is performed using only the information from the image CNN 132 within the learning model 120.

[0056] There are various methods for generating heatmaps. These include CAM (Class Activation Mapping) and its derivatives Grad-CAM and Grad-CAM++, which use gradients for weighting; Score-CAM, which uses forward propagation for weighting without gradients; and Guided Backpropagation and Integrated Gradients.

[0057] In a typical Grad-CAM, the gradients with respect to the output of the final convolutional layer of the CNN are used to estimate how strongly each region of the input image influences the prediction probability of each class. Instead, the gradients at the output of an intermediate layer, or the average of the gradients at the outputs of the individual layers, may be used. Furthermore, instead of creating a heatmap, the input image may be divided into several superpixels and perturbed, and the aforementioned LIME may then be applied to visualize the regions of the input image that serve as the basis for the decision. In this embodiment, any of these methods may be used.
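The core Grad-CAM computation can be sketched as follows. The sketch assumes that the activations of one convolutional layer and the gradients of the class score with respect to them have already been obtained from the network (for example from the image CNN 132); how they are obtained depends on the framework and is not shown. The function name and the normalization to [0, 1] are illustrative choices.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Compute a Grad-CAM map for one convolutional layer.

    activations: (C, H, W) feature maps of the chosen layer.
    gradients:   (C, H, W) gradients of the class score w.r.t. those maps.
    Returns an (H, W) map scaled to [0, 1].
    """
    # Channel weights: global average of the gradients over space.
    weights = gradients.mean(axis=(1, 2))              # shape (C,)
    # Weighted sum of the feature maps over channels.
    cam = np.tensordot(weights, activations, axes=1)   # shape (H, W)
    # ReLU keeps only regions that raise the class score.
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

The resulting map has the spatial resolution of the chosen layer and would normally be upsampled to the input image size before being overlaid as a heatmap.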

[0058] After the first process (step S122) and the second process (step S124) described above are completed, the explanation output unit 104 outputs the result of the first process (contribution) and the result of the second process (hereinafter referred to as the processing result) to a display or the like. Instead of outputting the processing result to the display, or in addition to doing so, the explanation output unit 104 may store the processing result in the storage unit 106. Alternatively, the processing result may be output together with the result of the determination by the determination unit 102.

[0059] Figure 7 shows an example of a display image 200 that the explanation output unit 104 shows on the display. In Figure 7, the input image, which is one input to the learning model 120, is displayed in the target image area 202. Similarly, for the first feature, which is the other input to the learning model 120, the names of the 13 features and bar graphs showing their magnitudes are displayed in the first feature area 204.

[0060] The contributions calculated in step S122 (first processing) are displayed in the display image 200 in multiple ways. First, each contribution area 208 displays the name of the feature and a corresponding bar graph showing the magnitude of its contribution. The first feature set shows the same 13 features as the input. On the other hand, the second feature set is shown as a single feature, corresponding to the one-dimensional compression. Furthermore, these are displayed side by side vertically. This allows the user to easily recognize the contributions of the first feature set and the second feature set, improving the explainability of the judgment. Also, since there is only one contribution for the second feature set, it can be interpreted as the contribution of the "image itself," further improving explainability.

[0061] In the cumulative contribution area 210, bar graphs for the predicted probability, features that increase the predicted probability, and features that decrease the predicted probability are displayed vertically, allowing for comparison with each other. The bar graph for features that increase the predicted probability is arranged in a series from left to right, in descending order of positive contribution values. The bar graph for features that decrease the predicted probability is arranged in a series from left to right, in descending order of negative contribution values. The right end of each bar graph is aligned with the right end of the bar graph above it. In addition, the name of the feature is appended above each contribution that is longer than a predetermined length. These displays further improve explainability.

[0062] Furthermore, the contribution is displayed in text area 220 using text. Text area 220 displays the file name to be explained, the predicted probability, the predicted class, and the text report. The text report may display text corresponding to the contribution of the second feature and the contribution of the first feature, respectively.

[0063] Figure 8 shows the text template 230 displayed in the text area 220. The template 230 is stored in the storage unit 106.

[0064] Template 230 contains pre-configured text and variables to be inserted into that text. Variables are indicated by [], and the values corresponding to the symbols written within the brackets are assigned by the determination unit 102 and the explanation output unit 104 and displayed in the text area 220.

[0065] The number of features to be displayed in text area 220 may be predetermined, or it may be those that are greater than the threshold for contribution or the threshold for the absolute value of contribution. These rules may also be stored in storage unit 106.
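The template substitution and the feature selection rules described above can be sketched as follows. Figure 8 is not reproduced here, so the placeholder names, the template wording, and both function names are hypothetical; only the `[]` variable syntax and the count or threshold rules come from the text.

```python
import re

def select_features(contributions, threshold=None, top_n=None):
    """Choose features for the text report.

    contributions: dict mapping feature name to contribution value.
    threshold:     keep features whose |contribution| exceeds this value.
    top_n:         keep at most this many features.
    Features are ordered by absolute contribution, largest first.
    """
    items = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    if threshold is not None:
        items = [(name, c) for name, c in items if abs(c) > threshold]
    if top_n is not None:
        items = items[:top_n]
    return items

def fill_template(template, values):
    """Replace each [name] variable with its value; unknown names stay as-is."""
    return re.sub(r"\[(\w+)\]",
                  lambda m: str(values.get(m.group(1), m.group(0))),
                  template)
```

For instance, with a hypothetical template `"Predicted class: [predicted_class] (probability [probability])"`, `fill_template` would insert the values supplied for `predicted_class` and `probability` and leave any other bracketed name untouched.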

[0066] Furthermore, the heatmap created in step S124 (second process) is displayed in heatmap area 206. These displays further improve explainability.

[0067] As described above, this embodiment ensures the accuracy of the judgment while clearly presenting the basis for the judgment. In particular, it can improve the explainability of the judgment in so-called multimodal models that use different types of feature inputs for judgment.

[0068] Modifications of the above embodiment are described below. The image CNN 132 may be another CNN, such as AlexNet, ResNet, or ResNeXt, instead of VGGNet. Neural networks other than CNNs may also be used. In addition, although the 512-dimensional output of the final layer of the image CNN 132 was used as the second feature vector, features from intermediate layers before the final layer may be used instead of or in addition to it.

[0069] The dimensionality reduction of the second feature is not limited to one-dimensionalization using distance. Alternatively, the second feature may be reduced in dimension using principal component analysis or related nonlinear dimensionality reduction algorithms. Even further, it may be reduced to one dimension through statistical processing such as taking a simple average of the second feature or its maximum value.
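The alternative one-dimensional reductions mentioned above can be sketched as follows. The PCA variant needs a set of reference vectors from which to estimate the principal axis; using second features collected from other images for this purpose is an assumption, as are the function names.

```python
import numpy as np

def reduce_mean(feature):
    """One-dimensional summary: simple average of the feature vector."""
    return float(np.mean(feature))

def reduce_max(feature):
    """One-dimensional summary: maximum element of the feature vector."""
    return float(np.max(feature))

def reduce_pca_1d(reference, feature):
    """Project a feature vector onto the first principal component.

    reference: (N, D) array of feature vectors used to estimate the axis
               (assumed here to come from other images).
    feature:   (D,) vector to reduce.
    The sign of the result is arbitrary, as with any principal axis.
    """
    mean = reference.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(reference - mean, full_matrices=False)
    return float((feature - mean) @ vt[0])
```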

[0070] Explanatory models are not limited to linear regression.

[0071] The above embodiment describes an example in which the learning model 120 is used to determine the cell cycle from a cell image. The applications of the learning model 120 are not limited to this. Examples of other applicable applications are listed in Figure 9, along with the input image and the first feature.

[0072] The first features may be features that can be calculated, automatically or manually, from the input image, or features that cannot be calculated from it. Examples of features that can be calculated include shape features of the subject (an object or living being) in the image, such as radius and length, as well as color features and other characteristic features. Examples of features that cannot be calculated include attribute information such as the gender, age, and race of the living being that is the subject of the input image, or of the owner of the object that is the subject (these correspond to predetermined, interpretable features related to the subject). For example, location information is quantified using coordinates or indices.

[0073] Furthermore, various embodiments of the present invention may be described with reference to flowcharts and block diagrams, where a block may represent (1) a stage in a process in which an operation is performed or (2) a section of a device having the role of performing an operation. Specific stages and sections may be implemented by dedicated circuits, programmable circuits supplied with computer-readable instructions stored on a computer-readable medium, and / or processors supplied with computer-readable instructions stored on a computer-readable medium. Dedicated circuits may include digital and / or analog hardware circuits, and may include integrated circuits (ICs) and / or discrete circuits. Programmable circuits may include reconfigurable hardware circuits, such as field-programmable gate arrays (FPGAs) and programmable logic arrays (PLAs), that comprise logical AND, logical OR, logical XOR, logical NAND, logical NOR, and other logic operations, as well as flip-flops, registers, memory elements, etc.

[0074] Computer-readable media may include any tangible device capable of storing instructions to be executed by a suitable device, and as a result, computer-readable media having instructions stored therein will comprise a product containing instructions that can be executed to create means for performing operations specified in a flowchart or block diagram. Examples of computer-readable media may include electronic storage media, magnetic storage media, optical storage media, electromagnetic storage media, semiconductor storage media, etc. More specific examples of computer-readable media may include floppy disks, diskettes, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), electrically erasable programmable read-only memory (EEPROM), static random access memory (SRAM), compact disk read-only memory (CD-ROM), digital versatile disk (DVD), Blu-ray (RTM) disk, memory stick, integrated circuit card, etc.

[0075] Computer-readable instructions may include assembler instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk®, Java®, C++, and traditional procedural programming languages such as the C programming language or similar programming languages.

[0076] Computer-readable instructions may be provided locally, or via a local area network (LAN) or a wide area network (WAN) such as the Internet, to a processor or programmable circuit of a general-purpose computer, a special-purpose computer, or other programmable data processing device, and these instructions may be executed to create means for performing operations specified in a flowchart or block diagram. Examples of processors include computer processors, processing units, microprocessors, digital signal processors, controllers, microcontrollers, and the like.

[0077] Figure 10 shows an example of a computer 2200 in which multiple aspects of the present invention may be embodied in whole or in part. A program installed on the computer 2200 can cause the computer 2200 to function as an apparatus according to an embodiment of the present invention or as one or more sections of that apparatus, to perform the operations associated with that apparatus or those sections, and / or to execute a process according to an embodiment of the present invention or a stage of that process. Such a program may be executed by the CPU 2212 to cause the computer 2200 to perform particular operations associated with some or all of the blocks in the flowcharts and block diagrams described herein.

[0078] The computer 2200 according to this embodiment includes a CPU 2212, RAM 2214, a graphics controller 2216, and a display device 2218, which are interconnected by a host controller 2210. The computer 2200 also includes input / output units such as a communication interface 2222, a hard disk drive 2224, a DVD-ROM drive 2226, and an IC card drive, which are connected to the host controller 2210 via an input / output controller 2220. The computer also includes legacy input / output units such as a ROM 2230 and a keyboard 2242, which are connected to the input / output controller 2220 via an input / output chip 2240.

[0079] The CPU 2212 operates according to programs stored in the ROM 2230 and RAM 2214, thereby controlling each unit. The graphics controller 2216 retrieves image data generated by the CPU 2212 from a frame buffer provided in RAM 2214 or from itself, and displays the image data on the display device 2218.

[0080] The communication interface 2222 communicates with other electronic devices via a network. The hard disk drive 2224 stores programs and data used by the CPU 2212 in the computer 2200. The DVD-ROM drive 2226 reads programs or data from the DVD-ROM 2201 and provides them to the hard disk drive 2224 via the RAM 2214. The IC card drive reads programs and data from the IC card and / or writes programs and data to the IC card.

[0081] The ROM 2230 stores boot programs and / or programs that depend on the computer 2200's hardware, which are executed by the computer 2200 when activated. The input / output chip 2240 may also connect various input / output units to the input / output controller 2220 via parallel ports, serial ports, keyboard ports, mouse ports, etc.

[0082] The program is provided on a computer-readable medium such as a DVD-ROM 2201 or an IC card. The program is read from the computer-readable medium and installed on a hard disk drive 2224, RAM 2214, or ROM 2230, which are also examples of computer-readable medium, and executed by the CPU 2212. The information processing described within these programs is read by the computer 2200, resulting in coordination between the program and the various types of hardware resources described above. The apparatus or method may be configured to realize the manipulation or processing of information in accordance with the use of the computer 2200.

[0083] For example, when communication is performed between a computer 2200 and an external device, the CPU 2212 may execute a communication program loaded into RAM 2214 and, based on the processing described in the communication program, instruct the communication interface 2222 to perform communication processing. Under the control of the CPU 2212, the communication interface 2222 reads transmission data stored in a transmission buffer processing area provided in a recording medium such as RAM 2214, a hard disk drive 2224, a DVD-ROM 2201, or an IC card, transmits the read transmission data to the network, or writes received data received from the network to a reception buffer processing area provided on the recording medium.

[0084] Furthermore, the CPU 2212 may read all or necessary parts of files or databases stored on external storage media such as the hard disk drive 2224, DVD-ROM drive 2226 (DVD-ROM 2201), or IC card into the RAM 2214, and perform various types of processing on the data in the RAM 2214. The CPU 2212 then writes the processed data back to the external storage media.

[0085] Various types of information, such as various types of programs, data, tables, and databases, may be stored on the recording medium and subjected to information processing. The CPU 2212 may perform various types of processing on the data read from RAM 2214, including various types of operations, information processing, conditional judgments, conditional branching, unconditional branching, information retrieval / replacement, etc., as described throughout this disclosure and specified by the program instruction sequence, and write the results back to RAM 2214. The CPU 2212 may also retrieve information in files, databases, etc., within the recording medium. For example, if multiple entries are stored in the recording medium, each having an attribute value of a first attribute associated with an attribute value of a second attribute, the CPU 2212 may search among the multiple entries for an entry that matches the condition for which the attribute value of the first attribute is specified, read the attribute value of the second attribute stored in that entry, and thereby obtain the attribute value of the second attribute associated with the first attribute that satisfies a predetermined condition.

[0086] The programs or software modules described above may be stored on or near computer 2200 on a computer-readable medium. Alternatively, recording media such as hard disks or RAM provided within a server system connected to a dedicated communication network or the Internet can be used as computer-readable media, thereby providing programs to computer 2200 via the network.

[0087] Although the present invention has been described above using embodiments, the technical scope of the present invention is not limited to the scope described in the above embodiments. It will be apparent to those skilled in the art that various modifications or improvements can be made to the above embodiments. It will be clear from the claims that such modified or improved forms may also be included in the technical scope of the present invention.

[0088] It should be noted that the operations, procedures, steps, and stages in the devices, systems, programs, and methods shown in the claims, specification, and drawings can be performed in any order, as long as the order is not explicitly indicated by terms such as "before" or "prior to" and the output of a previous process is not used in a later process. Even if the operation flow in the claims, specification, and drawings is described using phrases such as "first" or "next" for convenience, this does not mean that the operations must be performed in that order.

[Explanation of symbols]

[0089] 10 Learning device, 20 Image, 100 Feature calculation unit, 102 Determination unit, 104 Explanation output unit, 106 Storage unit, 120 Learning model, 132 Image CNN, 134 Feature NN, 136 Classification NN, 200 Display image, 202 Target image area, 204 First feature area, 206 Heatmap area, 208 Each contribution area, 210 Cumulative contribution area, 220 Text area, 230 Template, 2200 Computer, 2201 DVD-ROM, 2210 Host controller, 2212 CPU, 2214 RAM, 2216 Graphics controller, 2218 Display device, 2220 Input / output controller, 2222 Communication interface, 2224 Hard disk drive, 2226 DVD-ROM drive, 2230 ROM, 2240 Input / output chip, 2242 Keyboard

Claims

1. A learning device comprising: a storage unit that stores a learning model trained to take, as input, a training image and a training feature that quantifies a predetermined interpretable feature related to the subject of the training image, and to output a result of determination on the training image and the training feature; a determination unit that outputs a determination result, using the learning model stored in the storage unit, for an image to be explained and a first feature that is a numerical representation of a predetermined interpretable feature related to the subject of the image to be explained; an explanation output unit that outputs the contribution of the image to be explained and the contribution of the first feature to the result of the determination made by the learning model on the image to be explained and the first feature; and a feature calculation unit that calculates at least one of the first features based on the image to be explained and inputs it to the learning model.

2. The learning device according to claim 1, wherein the predetermined interpretable features include parameters that can quantitatively represent the shape or characteristics of the subject.

3. The learning device according to claim 1, wherein the determination unit extracts a second feature from the image to be explained using the learning model, and the explanation output unit calculates the contribution by performing linear regression, in a feature space including the second feature and the first feature, on the neighborhood of the data consisting of the first feature and the second feature related to the image to be explained.

4. The learning device according to claim 1, wherein the determination unit extracts a second feature from the image to be explained using the learning model, and the explanation output unit compresses the dimension of the feature vector that is the second feature and calculates the contribution of the second feature as the contribution of the image to be explained.

5. The learning device according to claim 4, wherein the explanation output unit compresses the feature vector into one dimension.

6. The learning device according to claim 1, wherein the learning model includes a convolutional neural network.

7. The learning device according to claim 1, wherein the explanation output unit further outputs an image showing the basis for the determination within the image to be explained, corresponding to the result of the determination by the learning model.

8. The learning device according to claim 1, wherein the explanation output unit displays the contribution of the image to be explained and the contribution of the first feature side by side.

9. The learning device according to claim 7, wherein the explanation output unit displays the contribution of the image to be explained, the contribution of the first feature, an image showing the basis for the determination, and the result of the determination side by side.

10. The learning device according to claim 8, wherein the explanation output unit displays text corresponding to the contribution of the image to be explained and the contribution of the first feature.

11. A program that causes a computer to realize: a storage function that stores, in a storage unit, a learning model trained to take, as input, a training image and a training feature that is a numerical representation of a predetermined interpretable feature related to the subject of the training image, and to output a result of determination on the training image and the training feature; a determination function that outputs a determination result, using the learning model stored in the storage unit, for an image to be explained and a first feature that is a numerical representation of a predetermined interpretable feature related to the subject of the image to be explained; an explanation output function that outputs the contribution of the image to be explained and the contribution of the first feature to the result of the determination made by the learning model on the image to be explained and the first feature; and a feature calculation function that calculates at least one of the first features based on the image to be explained and inputs it to the learning model.

12. A learning device that outputs a result of determination on an image to be explained and a first feature, which is a numerical representation of a predetermined interpretable feature related to the subject of the image to be explained, using a learning model trained to take, as input, a training image and a training feature that is a numerical representation of a predetermined interpretable feature related to the subject of the training image, and to output a result of determination on the training image and the training feature, the learning device comprising: an explanation output unit that outputs the contribution of the image to be explained and the contribution of the first feature to the result of the determination made by the learning model on the image to be explained and the first feature; and a feature calculation unit that calculates at least one of the first features based on the image to be explained and inputs it to the learning model.