System and method for evaluating machine learning model behavior over data segments

A machine learning and data segment technology, applied in the field of systems and methods for evaluating machine learning model behavior over data segments, addresses problems such as difficulty meeting the requirements of classification and regression, model behavior that is legally impermissible or legally questionable, and accuracy and precision that may not be achievable.

Pending Publication Date: 2022-01-13
TRUERA INC

AI Technical Summary

Benefits of technology

The patent text describes a method for evaluating machine learning models by comparing the output of the model applied to different data segments. This method helps to identify which features in the model are most important for the output and how much they contribute to the overall difference between the two data segments. This analysis can be useful for improving the performance of the machine learning model and ensuring that it is accurate and useful in making decisions based on the data it receives. The patent also includes a machine-readable medium and an apparatus for performing this method.

Problems solved by technology

In some cases, these machine learning models may operate differently on different data segments, potentially in ways that are impermissible or questionable from a legal or ethical perspective.
Two common types of problems in machine learning are classification problems and regression problems.
Each model develops a rule or algorithm over several epochs by varying the values of one or more variables affecting the inputs to more closely map to a desired result, but as the training dataset may be varied, and is preferably very large, perfect accuracy and precision may not be achievable.
When performing analysis of complex data, one of the major problems stems from the number of variables involved.
Analysis with a large number of variables generally requires a large amount of memory and computational power, and it may cause a classification algorithm to overfit to training samples and generalize poorly to new samples.
The challenge is that for a typical neural network, there may be millions of parameters to be optimized.
Trying to optimize all these parameters from scratch may take hours, days, or even weeks, depending on the amount of computing resources available and the amount of data in the training set.
Some embodiments remediate using methods based on understanding which features lead to undesirable model outcomes.
Remediation methods include, but are not limited to, feature engineering to adjust features that cause problems such as instability or unwanted differential treatment (e.g., by bucketizing feature values differently or dropping features), followed by retraining the model.
This fragility arises in part because ML models can learn more complex relationships from the training data than linear scorecard models.
As used in this document, a consequential data drift has occurred if the distribution of data has changed from train/test to out-of-time (or live production) data in a way that significantly reduces the utility of the model.
The higher this metric, the more significant the measured drift, and hence the stronger the signal that the model may have become unstable and requires careful examination.
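As a rough illustration of a drift metric of this kind, the sketch below computes the one-dimensional Wasserstein (earth mover's) distance between model scores on training data and on live data. The choice of Wasserstein distance, the function name `wasserstein_1d`, and the sample scores are assumptions made for illustration; the text here does not name the metric.

```python
def wasserstein_1d(a, b):
    """Empirical 1-D Wasserstein-1 distance between two equal-sized samples."""
    if len(a) != len(b) or not a:
        raise ValueError("samples must be non-empty and of equal size")
    # For equal-sized samples, the optimal transport plan pairs sorted values.
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


# Hypothetical model scores on training data and on live production data.
train_scores = [0.10, 0.20, 0.30, 0.40]
live_scores = [0.30, 0.40, 0.50, 0.60]

drift = wasserstein_1d(train_scores, live_scores)
print(drift)  # larger values indicate a larger shift between the distributions
```

The value is symmetric in its two arguments, so it reports how far the score distribution moved but not in which direction.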
Some machine learning models may, in some situations, make decisions based on unjust bias.
In this context, a model may exhibit a form of unjust bias if its decisions result in a protected group being treated unfavorably for reasons that are not justifiable.
Unfavorable treatment: A protected group (e.g., women) is treated unfavorably relative to its complement group (e.g., men) if the model's outputs are “significantly worse” for the protected group.
Direct use: A protected feature may be directly used by the model (e.g., gender) and cause unfavorable treatment.
Indirect or proxy use: Even if the protected feature is not directly used as a feature to the model, it may end up using a “proxy” (an associated feature or feature group) that causes the unfavorable treatment.
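One simple way to look for unfavorable treatment is to compare the mean model score of a protected group with that of its complement. The sketch below does this under loud assumptions: the function name `mean_score_gap`, the sample scores, and the convention that higher scores are more favorable are all hypothetical, and the text does not define a threshold for “significantly worse”.

```python
from statistics import mean


def mean_score_gap(scores, is_protected):
    """Mean model score of the protected group minus that of its complement."""
    protected = [s for s, p in zip(scores, is_protected) if p]
    complement = [s for s, p in zip(scores, is_protected) if not p]
    if not protected or not complement:
        raise ValueError("both groups must be non-empty")
    return mean(protected) - mean(complement)


# Hypothetical scores; higher is assumed to be more favorable.
scores = [0.25, 0.5, 0.75, 1.0]
is_protected = [True, True, False, False]

gap = mean_score_gap(scores, is_protected)
print(gap)  # a negative gap means lower average scores for the protected group
```

A gap like this only flags a difference in outcomes. Deciding whether the difference is justifiable, and whether it stems from direct or proxy use of a protected feature, needs further analysis of which features drive it.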
However, as a result it is not a directional metric.

Embodiment Construction

[0014] The following description and the drawings sufficiently illustrate specific embodiments to enable those skilled in the art to practice them. Other embodiments may incorporate structural, logical, electrical, process, and other changes. Portions and features of some embodiments may be included in, or substituted for, those of other embodiments. Embodiments set forth in the claims encompass all available equivalents of those claims.

[0015] As discussed above, machine learning models are oftentimes used to make consequential decisions. For example, in consumer banking, a machine learning model may be used to make a preliminary decision to approve or disapprove a customer for a loan. In retail, a machine learning model may be used to identify customers to target with a promotion or advertisement. In spam filtering, a machine learning model may be used to identify an email message as spam or legitimate. The machine learning model may operate as a “black box,” providing an output of “...

Abstract

A computing machine receives a representation of a machine learning model, a representation of a first data segment, and a representation of a second data segment. The computing machine computes an output difference between an output of the machine learning model applied to the first data segment and an output of the machine learning model applied to the second data segment. The computing machine determines a set of reasons for the computed output difference based on a set of metrics defining distance between feature importance distributions, the set of reasons identifying a set of features from a feature vector of the machine learning model along with a relative contribution of each feature to the computed output difference. The computing machine provides an output representing the set of reasons.
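The flow in the abstract can be sketched for the simplest possible case. The example below assumes a hypothetical linear model, for which each feature's importance is its coefficient times its value, and it uses the difference in mean per-feature importance between the two segments as a simplified stand-in for the distance metrics the abstract mentions. The function name `explain_segment_difference`, the feature names, and the data are all illustrative assumptions, not the patented method.

```python
from statistics import mean


def explain_segment_difference(weights, segment_a, segment_b):
    """Attribute the mean output gap of a linear model to its features.

    weights: dict mapping feature name -> coefficient (hypothetical linear model)
    segment_a, segment_b: lists of dicts mapping feature name -> value
    Returns (output_difference, {feature: (contribution, relative_share)}).
    """
    def predict(row):
        return sum(weights[f] * row[f] for f in weights)

    output_diff = (mean([predict(r) for r in segment_a])
                   - mean([predict(r) for r in segment_b]))

    # For a linear model, the gap in mean output decomposes exactly into
    # per-feature gaps in mean importance (coefficient * feature value).
    contributions = {
        f: weights[f] * (mean([r[f] for r in segment_a])
                         - mean([r[f] for r in segment_b]))
        for f in weights
    }
    total = sum(abs(c) for c in contributions.values())
    reasons = {f: (c, abs(c) / total if total else 0.0)
               for f, c in contributions.items()}
    return output_diff, reasons


# Hypothetical model and data segments.
weights = {"income": 2.0, "age": 0.5}
segment_a = [{"income": 3.0, "age": 40.0}, {"income": 5.0, "age": 60.0}]
segment_b = [{"income": 1.0, "age": 40.0}, {"income": 3.0, "age": 60.0}]

output_diff, reasons = explain_segment_difference(weights, segment_a, segment_b)
print(output_diff)
print(reasons)
```

In this toy case the two segments differ only in `income`, so that feature accounts for the entire output difference. Non-linear models would need an attribution method to obtain per-feature importances, and a distribution-level distance rather than a difference of means.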

Description

PRIORITY CLAIM

[0001] This application claims the benefit of priority under 35 U.S.C. 119(e) to U.S. Provisional Patent Application Ser. No. 63/049,689, filed on Jul. 9, 2020, entitled “SYSTEM AND METHOD FOR EVALUATING MACHINE LEARNING MODEL BEHAVIOR OVER DATA SEGMENTS,” the entire content of which is incorporated herein by reference.

TECHNICAL FIELD

[0002] Embodiments pertain to computer architectures for machine learning. Some embodiments relate to a system and method for evaluating machine learning model behavior over data segments.

BACKGROUND

[0003] In the last decade, machine learning models have become more and more common. These machine learning models are sometimes used to make decisions. For example, in consumer banking, a machine learning model may be used to make a preliminary decision to approve or disapprove a customer for a loan. In some schemes, the machine learning model operates as a black box, providing an output of “approve” or “disapprove,” without any explanation. In so...

Application Information

Patent Type & Authority: Applications (United States)
IPC(8): G06N5/04, G06N20/00
CPC: G06N5/041, G06N20/00, G06N20/20, G06N3/045
Inventors: DATTA, ANUPAM; SEN, SHAYAK; GUPTA, APOORV; KUROKAWA, DAVID SANDAI
Owner: TRUERA INC