
Generating training data for machine learning models

Techniques for generating training data sets for machine learning models.

Pending Publication Date: 2022-05-27
AMERICAN EXPRESS TRAVEL RELATED SERVICES CO INC

AI Technical Summary

Problems solved by technology

For example, tracking infrequently occurring events can yield small datasets, because few occurrences of the event are available to record.


Detailed Description

[0014] Various methods are disclosed for generating additional data for training machine learning models, in order to supplement small or noisy datasets that would otherwise be insufficient for training. When only a small dataset is available, data scientists can try to expand it by collecting more data. However, this is not always possible. For example, a dataset representing infrequently occurring events can only be supplemented by waiting an extended period of time for additional occurrences of the event. As another example, a dataset based at least in part on a small population (e.g., data representing a small group of people) cannot be meaningfully scaled by simply adding more members to the population.

[0015] Additional records can be added to these small datasets, but there are disadvantages. For example, one may have to wait a significant amount of time to collect enough data related to infrequent events in order to have a ...


Abstract

Various embodiments for generating training data for a machine learning model are disclosed. A plurality of original records is analyzed to identify a probability distribution function (PDF), where the sample space of the PDF includes the plurality of original records. A plurality of new records is generated using the PDF. An expanded data set is created that includes the plurality of new records. A machine learning model is then trained using the expanded data set.

Description

[0001] CROSS-REFERENCE TO RELATED APPLICATIONS

[0002] This application claims priority to and the benefit of US Patent Application No. 16/562,972, filed September 6, 2019, entitled "Generating Training Data for Machine-Learning Models".

Background

[0003] Machine learning models often require large amounts of training data in order to make accurate predictions, classifications, or inferences about new data. When a dataset is not large enough, a machine learning model can be trained to make incorrect inferences. For example, small datasets can lead to overfitting of the machine learning model to the available data. Because smaller datasets may omit certain types of records, this can produce machine learning models that are biased toward particular outcomes. As another example, outliers in a small dataset may have a disproportionate effect on the machine learning model by increasing the variance of its performance.

[0004...


Application Information

IPC(8): G06K9/62, G06N3/04, G06N20/00, G06F17/18
CPC: G06F17/18, G06N20/00, G06N3/045, G06F18/214, G06N3/047, G06N3/094, G06N3/0475, G06N3/084, G06N20/20, G06N3/08, G06N7/01
Inventors: S·班纳吉, J·S·乔杜里, P·霍尔, R·乔希, S·S·萨胡
Owner: AMERICAN EXPRESS TRAVEL RELATED SERVICES CO INC