Systems and methods for deriving and optimizing classifiers from multiple datasets

A technique in the field of systems and methods for deriving and optimizing classifiers from multiple datasets, applied to training datasets and computer systems and addressing problems such as poor model performance.

Pending Publication Date: 2021-11-05
Inflammatix, Inc.

AI Technical Summary

Problems solved by technology

Existing 'omics'-based models have been reported to perform poorly in validation.

Examples

Embodiment 1

[0164] Systematic search and inclusion criteria for gene expression studies in clinical infections

[0165] IMX training datasets meeting defined inclusion criteria for clinical infection studies were obtained from the NCBI GEO (www.ncbi.nlm.nih.gov/geo/) and EMBL-EBI ArrayExpress (www.ebi.ac.uk/arrayexpress) databases. Specifically, the inclusion criteria required that patients in a study 1) had been physician-adjudicated for the presence and type of infection (e.g., strictly bacterial infection, strictly viral infection, or noninfectious inflammatory disease), 2) had gene expression measurements of 29 previously identified diagnostic markers (Sweeney et al., 2015, Sci Transl Med 7(287), pp.287ra71; Sweeney et al., 2016, Sci Transl Med 8(346), pp.346ra91; and Sweeney et al., 2018, Nature Communications 9, p.694), 3) were over 18 years of age, 4) had been seen in a hospital setting (e.g., emergency department, intensive care), 5) had a community- or hospital-acquired infection, a...

Embodiment 2

[0167] Normalization of expression data and COCONUT co-normalization

[0168] Normalization was then performed within each study, using one of two methods depending on the platform. For Affymetrix arrays, expression data were normalized using Robust Multi-array Average (RMA) (Irizarry et al., 2003, Biostatistics, 4(2):249-64) or gcRMA (Wu et al., 2004, Journal of the American Statistical Association, 99: 909–17). Expression data from other platforms were normalized using exponential convolution for background correction followed by quantile normalization.
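The quantile normalization step mentioned above can be illustrated with a minimal sketch. It is not the patent's implementation: the function name `quantile_normalize`, the list-of-lists layout, and the tie-breaking rule are assumptions made for this example, and background correction is not shown.

```python
def quantile_normalize(samples):
    """Quantile-normalize a list of samples (each a list of expression values).

    Assumption: every sample measures the same features in the same order.
    Ties are broken by position, a simplification compared with
    rank-averaging implementations.
    """
    n_features = len(samples[0])
    if any(len(s) != n_features for s in samples):
        raise ValueError("all samples must have the same number of features")

    # Reference distribution: mean of the k-th smallest value across samples.
    sorted_samples = [sorted(s) for s in samples]
    reference = [
        sum(s[k] for s in sorted_samples) / len(samples)
        for k in range(n_features)
    ]

    normalized = []
    for s in samples:
        # order[k] is the index of the k-th smallest value in this sample.
        order = sorted(range(n_features), key=lambda i: s[i])
        out = [0.0] * n_features
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


# Toy data: two samples, three features each.
samples = [
    [5.0, 2.0, 3.0],
    [4.0, 1.0, 6.0],
]
normalized = quantile_normalize(samples)
```

After the call, every sample contains the same set of values (the reference distribution), arranged according to that sample's original ranking.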

[0169] After the raw expression data were normalized, the COCONUT algorithm (Sweeney et al., 2016, Sci Transl Med 8(346), pp.346ra91; and Abouelhoda et al., 2008, BMC Bioinformatics 9, p.476) was used to co-normalize these measurements and ensure they are comparable across studies. Based on the empirical Bayesian batch correction method of ComBat (Johnson et al., 2007, Biostatistics, 8, pp.118-127), COCONUT calculates ...
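As a rough illustration of what co-normalization across studies aims for, the sketch below applies a plain location-scale adjustment per study for a single gene. It is a simplified stand-in, not COCONUT and not ComBat's empirical Bayes procedure. It assumes each study contains reference samples that are presumed comparable across studies; that assumption, the `conormalize` function, and the dictionary layout are all hypothetical.

```python
from statistics import mean, pstdev


def conormalize(studies):
    """Put the values of ONE gene from several studies on a common scale.

    `studies` maps a study name to {"reference": [...], "cases": [...]}.
    Hypothetical layout for illustration; the reference samples are
    assumed to be comparable across studies.
    """
    ref_means = {n: mean(s["reference"]) for n, s in studies.items()}
    ref_sds = {n: pstdev(s["reference"]) for n, s in studies.items()}
    if any(sd == 0 for sd in ref_sds.values()):
        raise ValueError("a study has constant reference values")

    # Common target: average location and scale of the reference groups.
    target_mean = mean(ref_means.values())
    target_sd = mean(ref_sds.values())

    adjusted = {}
    for name, s in studies.items():
        m, sd = ref_means[name], ref_sds[name]

        def shift(v, m=m, sd=sd):
            # Standardize against this study's references, then rescale.
            return (v - m) / sd * target_sd + target_mean

        adjusted[name] = {
            "reference": [shift(v) for v in s["reference"]],
            "cases": [shift(v) for v in s["cases"]],
        }
    return adjusted


# Toy data: study B is offset by +10 relative to study A.
studies = {
    "A": {"reference": [1.0, 3.0], "cases": [5.0]},
    "B": {"reference": [11.0, 13.0], "cases": [15.0]},
}
adjusted = conormalize(studies)
```

The correction is estimated only from the reference samples and then applied to all samples in the study, so differences between cases and references within a study are preserved while the study-level offset is removed.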

Embodiment 3

[0171] Developing a sepsis classifier with machine learning

[0172] To develop a sepsis classifier, a machine learning approach was employed. The method involves specifying candidate models, evaluating the performance of different classifiers using training data and specified performance statistics, and then selecting the best performing model to evaluate on independent data.
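The workflow in this paragraph (specify candidate models, score each on training data with a chosen statistic, select the best, then evaluate once on independent data) can be sketched as follows. The two candidate models, the accuracy statistic, the fold scheme, and the toy data are all assumptions chosen to keep the example small; the source does not specify them here.

```python
from statistics import mean


def fit_majority(X, y):
    """Baseline candidate: always predict the most common training label."""
    label = max(set(y), key=y.count)
    return lambda x: label


def fit_nearest_centroid(X, y):
    """Candidate: predict the label whose training mean is closest (1-D)."""
    centroids = {c: mean(x for x, t in zip(X, y) if t == c) for c in set(y)}
    return lambda x: min(centroids, key=lambda c: abs(x - centroids[c]))


def accuracy(predict, X, y):
    """Performance statistic: fraction of correct predictions."""
    return mean(1.0 if predict(x) == t else 0.0 for x, t in zip(X, y))


def cross_validate(fit, X, y, k=3):
    """Mean accuracy over k folds; fold i holds out every k-th sample."""
    scores = []
    for i in range(k):
        train = [j for j in range(len(X)) if j % k != i]
        held_out = [j for j in range(len(X)) if j % k == i]
        predict = fit([X[j] for j in train], [y[j] for j in train])
        scores.append(
            accuracy(predict, [X[j] for j in held_out], [y[j] for j in held_out])
        )
    return mean(scores)


# Toy training data: one feature, two classes.
X_train = [0.1, 0.3, 0.2, 0.9, 1.1, 1.0]
y_train = [0, 0, 0, 1, 1, 1]
# Independent data, used only once after the model has been selected.
X_test, y_test = [0.0, 1.2], [0, 1]

candidates = {"majority": fit_majority, "nearest_centroid": fit_nearest_centroid}
cv_scores = {
    name: cross_validate(fit, X_train, y_train) for name, fit in candidates.items()
}
best_name = max(cv_scores, key=cv_scores.get)
final_model = candidates[best_name](X_train, y_train)
test_accuracy = accuracy(final_model, X_test, y_test)
```

Keeping the independent data out of the selection loop is the point of the design: the test score then estimates performance on unseen samples rather than rewarding the model that best fits the data used to choose it.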

[0173] In this context, a model refers to a machine learning algorithm such as logistic regression, a neural network, or a decision tree (similar to the models used in statistics). A classifier, in turn, refers to a model with fixed (locked) parameters (weights) and thresholds that is ready to be applied to previously unseen samples. Classifiers use two types of parameters: weights learned by a core learning algorithm (such as XGBoost), and additional user-supplied parameters that are input to the core learner. These additional parameters are called hyperparameters. Classifier developme...
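The distinction between learned weights, user-supplied hyperparameters, and a locked classifier can be made concrete with a small sketch. It uses plain logistic regression trained by gradient descent rather than XGBoost, purely to stay dependency-free; the names `LockedClassifier` and `train`, the hyperparameter values, and the toy data are assumptions for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LockedClassifier:
    """A model whose learned weights and decision threshold are fixed."""
    weights: tuple     # learned from training data
    bias: float        # learned from training data
    threshold: float   # chosen during development, then locked

    def score(self, features):
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, features):
        return int(self.score(features) >= self.threshold)


def train(X, y, learning_rate, n_iterations, threshold=0.5):
    """Fit logistic-regression weights by gradient descent, then lock them.

    `learning_rate` and `n_iterations` are hyperparameters: they are
    supplied by the user and are not learned from the data.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(n_iterations):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - learning_rate * g / len(X) for wj, g in zip(w, grad_w)]
        b -= learning_rate * grad_b / len(X)
    return LockedClassifier(tuple(w), b, threshold)


# Toy training data: one feature, two classes.
X = [(0.0,), (1.0,), (3.0,), (4.0,)]
y = [0, 0, 1, 1]
hyperparameters = {"learning_rate": 0.5, "n_iterations": 1000}

clf = train(X, y, **hyperparameters)
low = clf.predict((0.5,))    # previously unseen sample
high = clf.predict((3.5,))   # previously unseen sample
```

Because the dataclass is frozen, the weights and threshold cannot be reassigned after training, which mirrors the idea of a locked classifier applied unchanged to new samples.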

Abstract

Systems and methods for subject clinical condition evaluation using a plurality of modules are provided. Modules comprise features whose corresponding feature values associate with an absence, presence or stage of phenotypes associated with the clinical condition. A first dataset is obtained having feature values, acquired through a first technical background from respective subjects in transcriptomic, proteomic, or metabolomic form, for at least a first of the plurality of modules. A second training dataset is obtained having feature values, acquired through a technical background other than the first technical background, from training subjects of the second dataset, in the same form as for the first dataset, of at least the first module. Inter-dataset batch effects are removed by co-normalizing feature values across the training datasets, thereby calculating co-normalized feature values used to train a classifier for clinical condition evaluation of the test subject.

Description

[0001] Cross References to Related Applications

[0002] This application claims priority to U.S. Provisional Patent Application 62/822,730, filed March 22, 2019, which is hereby incorporated by reference in its entirety for all purposes.

Technical Field

[0003] The present disclosure relates to the training and implementation of machine learning classifiers for assessing a subject's clinical condition.

Background

[0004] Biological modeling approaches that rely on transcriptomics and/or other 'omics'-based data (e.g., genomics, proteomics, metabolomics, lipidomics, glycomics, etc.) can provide meaningful and actionable diagnosis and prognosis. For example, some commercial genomic diagnostic tests are used to guide cancer treatment decisions. The Oncotype IQ test kit (Genomic Health) is an example of such a genome-based test that provides diagnostic information to guide treatment for various cancers. For example, one of these tests, ONCOTYPE for breast cancer (Genomic H...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G16B20/00; G16B45/00; G06N20/00
CPC: G16B20/00; G16H50/20; G16H50/30; G06N20/20; G06N20/10; G06N3/082; G16B25/10; G16B40/20; G16B40/30; G06N5/01; G06N7/01; G06N3/08; G16B40/00
Inventor: M. B. Mayhew, L. Buturovic, T. E. Sweeney, R. Luethy, P. Khatri
Owner: Inflammatix, Inc.