Clinical decision support system for cancer diagnosis
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- CANON KK
- Filing Date
- 2025-11-05
- Publication Date
- 2026-06-11
AI Technical Summary
Current clinical diagnosis of breast cancer relies heavily on invasive procedures like needle biopsies due to the low sensitivity and specificity of mammograms, and there is a lack of integrated diagnostic solutions combining imaging and blood test data.
A method integrating protein biomarker analysis from liquid biopsies with electronic medical record (EMR) data and imaging data using a fusion framework and machine learning to enhance diagnostic accuracy.
Improves diagnostic accuracy by distinguishing between benign and malignant breast cancer lesions with reduced false positives, reducing the need for unnecessary biopsies.
Abstract
Description
CLINICAL DECISION SUPPORT SYSTEM FOR CANCER DIAGNOSIS
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to U.S. Provisional Application Serial No. 63/716,675, filed November 5, 2024, to U.S. Provisional Application Serial No. 63/849,514, filed July 23, 2025, and to U.S. Provisional Application Serial No. 63/910,663, filed November 3, 2025, the disclosures of each of which are incorporated by reference herein in their entirety.
FIELD
[0002] This application generally concerns using liquid biopsy combined with machine learning for cancer assessment.
BACKGROUND
[0003] Currently, the gold standard for clinical diagnosis of breast cancer is medical imaging using mammograms. Mammograms can detect morphological lesions in the breast that may be related to breast cancer, such as masses, calcifications, and architectural distortions. However, it is difficult to distinguish benign from malignant lesions using mammographic images alone. Patients with questionable mammography results are therefore referred for pathological diagnosis of the breast tissue to determine whether it is benign or malignant. However, pathological diagnosis requires needle biopsies, which are painful and invasive. Additionally, because benign lesions are highly prevalent, multiple biopsies must be performed to diagnose only a few malignant findings. One challenge in clinical practice is therefore the low sensitivity and specificity of diagnosis by mammography, the gold standard. When a tumor-like lesion is found, a needle biopsy is performed even if the lesion appears benign; when the biopsy result is benign, the biopsy was unnecessary.
[0004] Liquid biopsy is an emerging, less invasive technology for cancer diagnosis that determines whether a subject has cancer by analyzing biological analytes circulating in the peripheral blood. Breast cancer diagnostic applications that analyze biomarkers such as proteins and DNA / RNA in drawn blood are increasing.
[0005] A challenge in diagnosis, as well as in other assessments of cancer including prognosis, treatment, stratification, and / or monitoring, is that diagnostic imaging (mammography as well as ultrasound, MRI, and other modalities) and diagnostic blood tests exist independently, and an integrated diagnostic solution has not yet been established. Therefore, there is a great need for an analysis platform that integrates clinical information obtained from two or more data sources to improve diagnostic accuracy.
[0006] Thus, new methods and systems for assessing cancers are needed that can integrate data from two or more data sources to improve the assessment of cancers such as breast cancer.
SUMMARY
[0007] Accordingly, the present disclosure provides systems, apparatuses, and methods that address shortcomings of the conventional systems.
[0008] An aspect of the present disclosure provides a method of assessing a biological sample from a subject for a cancer such as breast cancer, the method comprising: determining a concentration for each of at least three protein biomarkers selected from Table 1, from Table 2, or from Table 4; analyzing the concentrations of the at least three protein biomarkers using a trained biomarker model to obtain a model output, wherein the biomarker model has been trained to discriminate between cancer samples and benign samples; and providing the model output for the assessment of the cancer.
[0009] An aspect of the present disclosure provides a method of assessing a biological sample from a subject, the method comprising: providing a sample, preferably a sample comprising peripheral blood from a subject, optionally a subject who is suspected of having breast cancer, determining, in the biological sample, a concentration for each of at least four or five protein biomarkers selected from Table 2 or from the sets listed in Table 4.
[0010] In other aspects of the present disclosure, electronic medical record (EMR) data and / or image data, such as mammography data, of the subject are also obtained and analyzed, and the resulting model output is fused with the model output for the biomarker data to provide the assessment of the cancer.
[0011] In other aspects of the present disclosure, methods of assisting in the diagnosis of breast cancer are provided. Such methods include determining, in a biological sample obtained from the subject, concentrations of MMP8, IL-8, and at least two (2) of CD69, HER3, PTX3, CYR61, p21, FN1, and CD25; and using the determined concentrations to provide information about the probability of the subject having breast cancer.
[0012] Provided herein are methods for assessing a biological sample from a subject, e.g., for diagnosing breast cancer, distinguishing between benign and malignant masses, or determining need for further biopsy. The methods comprise providing a sample, preferably a sample comprising peripheral blood from a subject, optionally a subject who is suspected of having breast cancer, and determining, in the biological sample, a concentration for each of at least four or five protein biomarkers selected from the sets listed in Table 4.
[0013] In some embodiments, the protein biomarkers are selected from the group consisting of MMP8; IL-8; CD69; HER3; PTX3; and CYR61. In some embodiments, the protein biomarkers are MMP8; IL-8; CD69; HER3; PTX3; and CYR61. In some embodiments, the concentration of each of the protein biomarkers is determined by a method selected from digital ELISA, optionally Single-Molecule Arrays (SIMOA); Molecular On-bead Signal Amplification for Individual Counting (MOSAIC); Meso Scale Discovery (MSD); Single-Molecule Counting (SMC); LUMINEX; SOMAscan Assays; mass spectrometry (e.g., MALDI-MS), and / or mass cytometry (e.g., CyTOF).
[0014] In some embodiments, the subject has a positive result on imaging, e.g., an identified lesion or mass, optionally identified on a mammogram or ultrasound, before or after the sample is assessed. In some embodiments, the subject is identified based on a positive result on imaging.
[0015] In some embodiments, the methods further comprise calculating a score for the subject based on the level of the biomarkers, wherein a score above a threshold score indicates that the subject has or is at risk of developing cancer.
[0016] In some embodiments, the methods further comprise calculating a score for the subject based on the level of the biomarkers and comparing the score to subtype reference scores for known subtypes of breast cancer and identifying a subject who has a score that is comparable to the subtype reference as having that subtype of breast cancer.
[0017] In some embodiments, the methods further comprise identifying a subject who has a score above a threshold score, or a score that is comparable to a reference score, and recommending or sending the subject for additional evaluation, optionally by imaging and / or biopsy.
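The scoring steps in paragraphs [0015] to [0017] can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the threshold, the subtype reference scores, and the tolerance used to decide that a score is "comparable" to a reference are all hypothetical parameters, and the names `assess_score` and `Assessment` are introduced here only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Assessment:
    score: float
    above_threshold: bool           # score > threshold ([0015])
    matched_subtype: Optional[str]  # closest subtype reference within tolerance ([0016])
    refer_for_evaluation: bool      # flag for imaging and / or biopsy follow-up ([0017])


def assess_score(score: float, threshold: float,
                 subtype_refs: Dict[str, float], tolerance: float) -> Assessment:
    """Hypothetical helper: threshold a panel score and match it to subtype references.

    "Comparable" is assumed to mean |score - reference| <= tolerance;
    the source does not define the comparison.
    """
    above = score > threshold
    matched: Optional[str] = None
    best_gap = float("inf")
    for subtype, reference in subtype_refs.items():
        gap = abs(score - reference)
        if gap <= tolerance and gap < best_gap:
            matched, best_gap = subtype, gap
    # Refer when the score exceeds the threshold or matches a subtype reference.
    refer = above or matched is not None
    return Assessment(score, above, matched, refer)


# Illustrative call with made-up numbers (not data from this disclosure).
result = assess_score(0.72, threshold=0.5,
                      subtype_refs={"subtype_A": 0.65, "subtype_B": 0.90},
                      tolerance=0.1)
```

Keeping the threshold test and the subtype match separate mirrors the fact that paragraphs [0015] and [0016] describe them as distinct embodiments that paragraph [0017] may combine.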
[0018] In some embodiments, the methods further comprise administering a treatment for breast cancer to a subject who has been identified as having or at risk of developing breast cancer. In some embodiments, treatment comprises one or more of chemotherapy, hormone therapy, immunotherapy, radiation, or surgical resection.
[0019] These and other objects, features, and advantages of the present disclosure will become apparent upon reading the following detailed description of exemplary embodiments of the present disclosure, when taken in conjunction with the appended drawings and provided claims.
BRIEF DESCRIPTION OF THE DRAWINGS
[0020] For the purposes of illustrating various aspects of the disclosure, wherein like numerals indicate like elements, there are shown in the drawings simplified forms that may be employed, it being understood, however, that the disclosure is not limited by or to the precise arrangements and instrumentalities shown. To assist those of ordinary skill in the relevant art in making and using the subject matter hereof, reference is made to the appended drawings and figures.
[0021] FIG. 1 illustrates an example embodiment of a medical system.
[0022] FIG. 2(a) illustrates an exemplary clinical decision support system.
[0023] FIG. 2(b) illustrates another exemplary clinical decision support system.
[0024] FIG. 3(a) shows an equation for the weighted sum, and FIG. 3(b) shows an embodiment of a fusion strategy illustrating how to fuse multiple modalities with calculated weights.
[0025] FIG. 4 shows a medical image model illustrating logic for integrating multiple predictions from the image / ROI level to the patient or study level.
[0026] FIGS. 5(a)-5(h) provide biomarker measurements for healthy (label=0), benign (label=1), and malignant (label=3) samples, showing their concentrations for the identified biomarkers. FIGS. 5(i)-5(m) provide biomarker measurements for healthy (label=0) and malignant (label=3) samples, showing their concentrations for the identified biomarkers.
[0027] FIG. 6(a) and FIG. 6(b) show the performance of different biomarker models for healthy versus malignant samples (FIG. 6(a)) and benign versus malignant samples (FIG. 6(b)).
[0028] FIG. 7(a) and FIG. 7(b) show the performance of different biomarker models for benign versus invasive cancer (FIG. 7(a)) and DCIS versus invasive cancer (FIG. 7(b)).
[0029] FIG. 8 shows the performance for a six-biomarker model.
[0030] FIG. 9 shows the performance for a five-biomarker model.
[0031] FIG. 10(a) and FIG. 10(b) show the performance (ROC) of different biomarker panels for benign versus cancer (FIG. 10(a)) and for DCIS versus invasive carcinoma (FIG. 10(b)).
[0032] FIG. 11(a) - (i) show PCA plots and ROC curves demonstrating improved classifier performance with the combination of IgM and IgG biomarkers.
[0033] FIG. 12(a) illustrates an example embodiment of a neural network. FIG. 12(b) illustrates an example embodiment of a convolutional neural network (CNN). FIG. 12(c) illustrates an example of implementing a convolution layer for one neuronal node of the convolution layer, according to an example embodiment.
[0034] FIG. 13 shows an exemplary data processing device.
[0035] Throughout the figures, the same reference numerals and characters, unless otherwise stated, are used to denote like features, elements, components, or portions of the illustrated embodiments. Moreover, while the subject disclosure will now be described in detail with reference to the figures, it is done so in connection with the illustrative exemplary embodiments. It is intended that changes and modifications can be made to the described exemplary embodiments without departing from the true scope and spirit of the subject disclosure as defined by the appended claims.
DETAILED DESCRIPTION
[0036] The present embodiments provide a method of assessing a cancer in a subject and a clinical decision support system. Conventionally in breast cancer diagnosis, physicians review the clinical findings from each of multiple modalities, such as medical imaging and blood tests, independently and then make their own integrated diagnosis, which results in low sensitivity and specificity.
[0037] The proposed method provides a new biomarker panel and a framework for appropriately fusing diagnosis-related information from multiple diagnostic modalities to achieve higher diagnostic accuracy and a low false positive rate.
[0038] FIG. 1 illustrates an example embodiment of a medical-data-processing system. The medical-data-processing system includes at least one data-processing device 100, one or more imaging devices 204, 206, and 208, a biomarker-analysis device 302, and a server 402. In this embodiment, these include one or more imaging devices that can be used for cancer assessment, such as a C-arm computed tomography (CT) scanner 204, a mammography device 206, and a magnetic-resonance-imaging (MRI) scanner 208. These are examples of imaging devices, and other embodiments may include more imaging devices, fewer imaging devices, different imaging devices (e.g., ultrasound devices, photoacoustic-imaging devices), a single imaging device (e.g., only a mammography device 206), and different combinations of imaging devices (e.g., both a mammography device 206 and an ultrasound device (not shown)). When the imaging devices 202-208 perform an imaging operation on a subject (e.g., a patient), the imaging devices 202-208 generate and output groups of image data 200 that define one or more two-dimensional (2D) and / or three-dimensional (3D) images.
[0039] The biomarker-analysis device 302 performs liquid biopsies on one or more patient samples 304 (blood, plasma, serum, or urine) that were collected from one or more patients. Analysis of the liquid biopsies provides biomarker data 300. The biomarker data 300 preferably combine data from a number of different biomarkers and are sent to the data-processing device 100.
[0040] Electronic-medical-record (EMR) data 400 are collected from a server 402 and sent to the data-processing device 100. The EMR data 400 include electronic medical records such as the medical and / or treatment history of a patient. At the data-processing device 100, the image data 200, the biomarker data 300, and the EMR data 400 are combined to provide diagnosis information.
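FIG. 3 describes fusing the outputs of the different modalities through a weighted sum with calculated weights. The exact equation of FIG. 3(a) is not reproduced in this text, so the sketch below assumes the simplest form: each modality contributes a malignancy probability p_m, and the fused output is p_fused = Σ_m w_m·p_m / Σ_m w_m over the modalities that are present. The modality keys, the example weights, and the renormalization over missing modalities are hypothetical choices for illustration.

```python
from typing import Mapping


def fuse_weighted_sum(outputs: Mapping[str, float],
                      weights: Mapping[str, float]) -> float:
    """Weighted average of per-modality malignancy probabilities.

    Only modalities present in both `outputs` and `weights` are used, and their
    weights are renormalized to sum to 1 (an assumption, not from the source).
    """
    available = [m for m in outputs if m in weights]
    if not available:
        raise ValueError("no modality has both an output and a weight")
    for m in available:
        if not 0.0 <= outputs[m] <= 1.0:
            raise ValueError(f"output for {m!r} is not a probability")
    total_weight = sum(weights[m] for m in available)
    if total_weight <= 0:
        raise ValueError("weights of the available modalities must sum to > 0")
    return sum(weights[m] * outputs[m] for m in available) / total_weight


# Hypothetical weights; in the disclosed framework the weights are calculated.
weights = {"biomarker": 0.5, "image": 0.3, "emr": 0.2}
fused_all = fuse_weighted_sum({"biomarker": 0.8, "image": 0.6, "emr": 0.4}, weights)
fused_no_emr = fuse_weighted_sum({"biomarker": 0.8, "image": 0.6}, weights)
```

Renormalizing over the available modalities lets a patient without, for example, EMR data still receive a fused output; whether the disclosed framework handles missing modalities this way is not stated.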
[0041] Together, these components provide a clinical decision support system that includes a new liquid-based protein biomarker panel and a new fusion framework. The fusion framework fuses information from different diagnostic modalities, using information from the panel as one of the input sources, to discriminate among healthy, benign, and malignant samples. The new blood-based protein biomarker panel includes multiple biomarkers whose concentration levels differ between healthy, benign, and malignant samples. These samples can be from patients who have, or are suspected of having, breast cancer. Provided herein are new classification algorithms that use these biomarkers, and a new fusion framework that fuses the data from the different modalities with appropriate weights to provide high classification performance.
Biomarker Analysis
[0042] The biomarker-analysis device 302 performs liquid biopsies on one or more samples 304 (e.g., blood, plasma, serum, or urine) that were collected from one or more patients. Liquid biopsies analyze biological analytes (biomarkers) that circulate in a patient’s fluids, such as blood, plasma, serum, or urine. For example, some liquid biopsies analyze biomarkers such as proteins, DNA, and RNA.
[0043] In some embodiments, the biomarker-analysis device 302 is an automated assay device including at least one controller, an assay-consumable handler, a sample loader, a sealer, and an imaging system. The assay-consumable handler can be operatively coupled to an assay consumable, which includes a plurality of assay sites. The sample loader is configured to load an assay sample (e.g., comprising a plurality of analyte molecules or particles) into the assay sites of the assay consumable. The sealer is configured to apply a sealing component to the surface of the assay consumable. The imaging system is configured to acquire one or more images of the assay sites of the assay consumable. The biomarker-analysis device 302 may also include a bead loader that is separate from or associated with the sample loader, a rinser that is configured to rinse the surface of the assay consumable, a reagent loader that is configured to load a reagent into the assay sites of the assay consumable, and a wiper that is configured to remove excess beads from the surface of an assay substrate. The one or more controllers may control the other components of the biomarker-analysis device 302 (the assay-consumable handler, the sample loader, the sealer, the imaging system, the bead loader, the rinser, and the wiper).
[0044] In some embodiments, the biomarker-analysis device 302 is a standard biomarker analysis system. The biomarker-analysis device may be an immunoassay device, such as an ELISA (enzyme-linked immunosorbent assay) or digital ELISA device. One digital ELISA platform that may be used is Single-Molecule Arrays (SIMOA). SIMOA assays have several advantages over conventional ELISA, the current gold standard for protein detection in blood. SIMOA is 1000 times more sensitive than ELISA and allows quantification of analytes present at low concentrations. It can detect protein concentrations as low as 10^-19 M, compared with conventional ELISA's ability to detect only 10^-12 M. Further, the high sensitivity of SIMOA allows the use of a more dilute liquid sample, which reduces non-specific binding arising from matrix effects, and its wide dynamic range, spanning four orders of magnitude in concentration, allows the detection of both low- and high-abundance markers. In some embodiments, the SIMOA technique achieves this high sensitivity by digitally counting the number of molecules in a sample, labeling each immunocomplex and physically isolating it in a femtoliter-sized well. These advantages enable the detection and quantification of blood biomarkers for developing a robust biomarker panel. See, for example, U.S. Pat. Pub. 2024/0036045, incorporated herein by reference in its entirety. See also D. M. Rissin et al., “Single-molecule enzyme-linked immunosorbent assay detects serum proteins at subfemtomolar concentrations,” Nature Biotechnology 28(6), 595-599, 2010.
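In digital assays such as SIMOA, each well is read as either "on" (signal) or "off", and the fraction of "on" wells is converted into an average number of molecules per well. A common way to do this, assumed here because this document does not describe the calculation, is a Poisson correction: if molecules distribute randomly among wells, the fraction of empty wells is e^(-λ), so λ = -ln(1 - f_on). The function name is hypothetical, and converting λ into a molar concentration would additionally require a calibration curve.

```python
import math


def molecules_per_well(on_wells: int, total_wells: int) -> float:
    """Estimate mean molecules per well from a digital on/off readout.

    Assumes Poisson loading: P(well empty) = exp(-lam), so lam = -ln(1 - f_on).
    """
    if total_wells <= 0:
        raise ValueError("total_wells must be positive")
    if not 0 <= on_wells < total_wells:
        # f_on == 1 would make the estimate infinite (saturated readout).
        raise ValueError("on_wells must be in [0, total_wells)")
    f_on = on_wells / total_wells
    return -math.log1p(-f_on)  # log1p(-f) == ln(1 - f), accurate for small f


# Example: 1,000 "on" wells out of 10,000 interrogated wells.
lam = molecules_per_well(1000, 10000)
naive = 1000 / 10000  # ignores wells that hold more than one molecule
```

The correction matters more as occupancy rises: at 10% "on" wells the Poisson estimate is only slightly above the naive fraction, but the gap widens quickly as f_on approaches 1.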
[0045] Another digital ELISA platform that may be used is Molecular On-bead Signal Amplification for Individual Counting (MOSAIC). This platform can attain low-attomolar limits of detection, an order-of-magnitude improvement in sensitivity over other ELISA methods. MOSAIC uses a rapid, automatable flow cytometric readout that provides high throughput and easy integration. Because MOSAIC has a solution-based signal readout, it expands the number of analytes that can be measured simultaneously, enabling higher-order multiplexing with femtomolar or sub-femtomolar sensitivity. This platform is further described in C. Wu, T. J. Dougan, and D. R. Walt, “High-Throughput, High-Multiplex Digital Protein Detection with Attomolar Sensitivity,” ACS Nano 2022 Jan 25;16(1):1025-1035.
[0046] Other methods that can be used include highly sensitive or ultrasensitive and preferably multiplex detection methods including Meso Scale Discovery (MSD); Single-Molecule Counting (SMC); LUMINEX; SOMAscan Assays; mass spectrometry (e.g., MALDI-MS) and mass cytometry (e.g., CyTOF) (see, e.g., Cohen and Walt, Chem. Rev. 2019, 119, 293-321).
[0047] Other methods and biomarker-analysis devices 302 that can be used include chemiluminescence immunoassay (CLIA), which combines the principles of immunoassay (antigen-antibody reactions) with chemiluminescence. CLIA may be performed on the CL-JACK system, a fully automated chemiluminescence analyzer made by Minaris Medical Co., Ltd. (see Kato, H., et al., “Performance evaluation of the new chemiluminescent intact FGF23 assay relative to the existing assay system,” J Bone Miner Metab 40, 101-108 (2022)).
[0048] The biomarker-analysis device 302 may use other highly sensitive or ultrasensitive, and preferably multiplex, detection methods, including chemiluminescence enzyme immunoassay (CLEIA); western blot; Proximity Extension Assay (PEA); Meso Scale Discovery (MSD); Single-Molecule Counting (SMC); LUMINEX; SOMAscan Assays; mass spectrometry (e.g., MALDI-MS); and mass cytometry (e.g., CyTOF) (see, e.g., Cohen and Walt, Chem. Rev. 2019, 119, 293-321). In some embodiments, mass spectrometry, particularly matrix-assisted laser desorption / ionization mass spectrometry (MALDI-MS) and surface-enhanced laser desorption / ionization mass spectrometry (SELDI-MS), is used for the detection of biomarkers (see U.S. Pat. Nos. 5,118,937; 5,045,694; 5,719,060; 6,225,047). In some embodiments, other methods can be used, e.g., standard electrophoretic and quantitative immunoassay methods for proteins, including but not limited to, Western blot; Enzyme-Linked Immunospot (ELISPOT); biotin / avidin type assays; protein array detection, e.g., protein microarrays; radio-immunoassay; immunohistochemistry (IHC); immune-precipitation assay; flow cytometry / FACS (fluorescence-activated cell sorting); Proximity Ligation Assay (PLA); lateral flow assay; surface plasmon resonance (SPR); optical imaging; and mass spectrometry. The methods typically include revealing labels, such as fluorescent, chemiluminescent, radioactive, enzymatic, or dye molecules, that provide a signal either directly or indirectly. U.S. Pat. Pub. 2024/0036045 provides various biomarkers, combinations thereof, and methods of analysis, each of which is incorporated herein by reference.
[0049] The biomarker-analysis device 302 completes a biomarker analysis on a sample from a subject (e.g., a patient) and outputs biomarker data 300 (e.g., groups of biomarker data 300) that indicate the results of the biomarker analysis (e.g., a group of biomarker data 300 may indicate the results of a respective liquid biopsy). In some examples, such as those using SIMOA, the biomarker data 300 are based on the images acquired by the imaging system of the biomarker-analysis device 302. In other examples, the biomarker-analysis device 302 can perform a liquid biopsy, which determines whether a subject has cancer by analyzing biological analytes (biomarkers, such as proteins, DNA, and RNA) circulating in the peripheral blood, optionally blood circulating near a lesion. When performing a liquid biopsy, the biomarker-analysis device 302 measures the amount of the biological analyte (differential level, absolute concentration, or relative concentration), detects the presence of any mutations (in the case of nucleic acid (NA) analysis), or measures the copy number variation (CNV).
[0050] The measured amount of the biological analyte provides the concentration or the differential level of each measured biomarker. Biomarker data may indicate a quantity (e.g., a count or concentration, such as mass concentration, molar concentration, number concentration, or volume concentration) of respective biomarkers. This may be an absolute concentration of the biomarker, such as is provided by an assay described herein (e.g., a SIMOA assay). Alternatively, it may be a count or a relative concentration. Relative values are used when an absolute concentration is not available (for example, when the signal derived from the target cannot be converted to an absolute concentration because no standard curve is available). Relative values are obtained in western blot, protein array, antibody array, PEA assay screening (Olink), label-free LC-MS / MS, and similar methods. In some embodiments, relative values can be interpreted semi-quantitatively when normalized using an internal reference (e.g., a housekeeping protein) or a spike-in standard. Relative values can be compared between patients’ samples.
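Paragraph [0050] notes that relative values can be interpreted semi-quantitatively after normalization to an internal reference such as a housekeeping protein. A minimal sketch, assuming each sample provides a signal per protein plus a signal for the reference protein: linear-scale signals are divided by the reference signal, while log2-scale values (Olink NPX values are reported on a log2 scale) are normalized by subtraction, which is the same operation in log space. The protein keys, the `REF` label, and the function name are hypothetical.

```python
from typing import Dict


def normalize_to_reference(signals: Dict[str, float], reference: str,
                           log2_scale: bool = False) -> Dict[str, float]:
    """Express each protein signal relative to an internal reference protein."""
    if reference not in signals:
        raise KeyError(f"reference protein {reference!r} missing from sample")
    ref = signals[reference]
    if log2_scale:
        # log2(a / r) = log2(a) - log2(r)
        return {p: v - ref for p, v in signals.items() if p != reference}
    if ref <= 0:
        raise ValueError("reference signal must be positive on a linear scale")
    return {p: v / ref for p, v in signals.items() if p != reference}


sample = {"MMP8": 200.0, "IL8": 50.0, "REF": 100.0}  # made-up signals
ratios = normalize_to_reference(sample, "REF")
```

Spike-in standards, the other option named in the text, would follow the same pattern with the spiked analyte taking the place of the housekeeping protein.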
[0051] The biomarker data 300 as used herein are preferably data from protein biomarkers. Examples of protein biomarkers include the biomarkers listed in Table 1:
TABLE 1
[0052] In some embodiments, the biomarkers comprise at least two, three, four, five, six, seven, eight, or more biomarkers selected from MMP8; IL-8; CD69; HER3; PTX3; CYR61; CXCL9; p21; ADAM8; FN1; CA15-3; CD25; EGFR; HER4; PR; SCGB2A2 / Mammaglobin; HSP70; HGF; TFF1; TNFa; COMP; and FABP4. Additional biomarkers that may be included are: CA125; CEACAM1; CEACAM6; CXCL10; EGF; ER; FABP4; GDF15; He4; HER2; IL-1b; IL-6; LCN2; Ml; MICA; MMP1; PRL; TFF2; TFF3; VEGF; CEACAM5; CA27-29; ICAM1 (CD54); GDF3; SPATS1; UBQLN1; SMC1B; SFRP1; SERPINB3; SPDEF; TPBG; SPATA19; PLAC1; CCDC33; IGF2; and CASP8. Other biomarkers that may be included are: CXCL16; PIP; ZAG; IGKC; FTL; CCL5; IL4; IFNg; IL10; IL5; IL12p70; IL22; eCAD; OPN; TIMP; Leptin; CA19-9; NCAM1; and ORF1p. The biomarkers may be selected from DNAH10, FAM177B, NUDT15, PCDH7, SHOX2, AKR1B10, TGFA, C1QTNF9, FIG4, C9orf131, MMEL1, UBTD2, PLEKHA8P1, ENTHD1, TTC5, GAS8, LCORL, SPESP1, GFAP, GAL, AGAP9_AGAP4, DDIT3, ABHD8, ABO, FAM47B, ACRV1, PSCA, C2CD4C, SPINK4, ITGA11, and ANKRD34C. The biomarkers may also be selected from the IgM antibodies CDH3; CDK7; GDF3; IFNW1; MMP1; PRF1; RHOXF2; SCF; SRP54; UBQLN1; and TTR. The biomarkers may also be selected from the IgG antibodies CASP8_pro; CDK2; COL1A1 FL; COL6A1; ETS2; IMPDH1; MAGEA2; MAGEA4V3; S100A; SELE-ext; SPDEF; SSX2B; TP53; and TUBAsC. In some embodiments, the biomarkers that are particularly useful for assessing a breast cancer are 2, 3, 4, 5, 6, 7, 8, or more biomarkers selected from Table 2. In some embodiments, the training and / or the assessment is provided with biomarkers exclusively selected from Table 2. In some embodiments, the training and / or the assessment is provided with 2, 3, 4, 5, 6, 7, 8, or more biomarkers from Table 2 and an additional 1, 2, 3, or 4 protein biomarkers that are not listed in Table 2. The effectiveness of the additional biomarkers may be assessed by the methods exemplified herein.
TABLE 2
[0053] Other exemplary protein biomarkers that can be used include other cancer-specific biomarkers, tumor-associated autoantibodies, immune-system-specific biomarkers, etc. Some biomarkers that can be used are described in U.S. Pat. Pub. 2024/0036045. For purposes of describing the biomarkers, the summary, drawings, and detailed description of U.S. Pat. Pub. 2024/0036045 are incorporated by reference herein, except for any definitions, subject matter disclaimers or disavowals, and except to the extent that the incorporated material is inconsistent with the express disclosure herein, in which case the language in this disclosure controls.
[0054] In some embodiments, one or more biomarkers or candidate biomarkers are selected by integrating single-cell RNA sequencing (scRNA-seq) data, gene expression profiles from The Cancer Genome Atlas (TCGA), clinical pathologist input, and literature review. The protein panel may be selected from a non-exclusive list of:
- Interleukins (IL-1β, IL-4, IL-5, IL-6, IL-8, IL-10, IL-12p70, IL-22)
- TNFa, IFNγ
- LINE1 ORF1p, HERV-K, MMP8, CXCL9, CXCL10, CXCL16, CD25, CD69
- DNAJB1, HSPA8
- SCGB2A, PIP, AGR2, AZGP1, KRT18
- IGKC, FTL, CSTB, CCL5
- ADAM8, CA15-3, CA125, CA19-9, CYR61, CEACAM1, EGF, HER1 / EGFR, ER, GDF15, He4, HER2, HER3, HER4, HSP70, LCN2, MICA, P21, PR, PTX3, VEGF
- NCAM1, TIMP1, COMP, FTL, OPN (Osteopontin), FABP4, eCadherin, Prolactin (PRL), TFF1, TFF2, TFF3, HGF, CEACAM6, MMP1.
[0055] Multiple biomarkers were assessed with the Olink Explore HT platform, a commercial implementation of the PEA assay available from Olink (part of Thermo Fisher Scientific, Waltham, MA, USA) that provides highly multiplexed proteome analysis. Approximately 5,400 proteins were analyzed. For each serum sample, 40 µL was transferred into a 0.75 mL tube labeled with the sample ID. The samples were frozen and shipped on dry ice to Broad Clinical Labs (Burlington, MA, USA) for processing and data generation for the Olink screen.
[0056] From the Olink screen, NPX data, which represent the relative concentration of target proteins in biofluids, were obtained for each target protein and compared between patients with benign lesions and patients with breast cancer. Of the 172 samples, 87 were benign and 85 had breast cancer. Only one or two samples had missing values, which were imputed using median imputation.
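The median imputation mentioned in paragraph [0056] can be sketched as follows, assuming the NPX values are stored per protein as a list over samples with `None` marking a missing value, and assuming the median is taken over all observed samples of that protein (pooled across benign and malignant). The source does not say whether the median is pooled or computed within each group; the function name and toy values are hypothetical.

```python
from statistics import median
from typing import Dict, List, Optional


def median_impute(npx: Dict[str, List[Optional[float]]]) -> Dict[str, List[float]]:
    """Replace missing NPX values with the per-protein median of observed values."""
    imputed: Dict[str, List[float]] = {}
    for protein, values in npx.items():
        observed = [v for v in values if v is not None]
        if not observed:
            raise ValueError(f"no observed values for {protein}")
        fill = median(observed)
        imputed[protein] = [fill if v is None else v for v in values]
    return imputed


table = {"MMP8": [1.0, None, 3.0, 5.0], "IL8": [2.0, 2.5, 3.0, 3.5]}  # toy values
filled = median_impute(table)
```

When the imputed data feed a classifier, computing the median on training samples only avoids leaking information from held-out samples.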
[0057] Biomarkers can be found by comparing protein concentrations between benign and malignant patients. For example, a t test of the mean difference in NPX values between benign and malignant patients is conducted for each protein, and the results are visualized and interpreted using a volcano plot. Biomarkers can also be found through a classifier model built on the NPX data set that takes into account the combinations of, and correlations among, the biomarkers.
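The per-protein comparison and volcano plot described in paragraph [0057] can be sketched with SciPy. Because NPX values are on a log2 scale, the difference in group means approximates a log2 fold change, which is the usual x-axis of a volcano plot; the y-axis is -log10 of the p-value. Welch's unequal-variance t test (`scipy.stats.ttest_ind` with `equal_var=False`) is an assumption here, since the source only says "t test"; the function name and dictionary layout are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, List, Tuple

from scipy.stats import ttest_ind


def volcano_points(benign: Dict[str, List[float]],
                   malignant: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Return {protein: (mean NPX difference, -log10 p)} for a volcano plot."""
    points = {}
    for protein, benign_values in benign.items():
        malignant_values = malignant[protein]
        # Welch's t test: does not assume equal variances in the two groups.
        result = ttest_ind(malignant_values, benign_values, equal_var=False)
        diff = mean(malignant_values) - mean(benign_values)  # ~log2 fold change
        points[protein] = (diff, -math.log10(result.pvalue))
    return points
```

Proteins far from zero on the x-axis and high on the y-axis are the candidates; with thousands of proteins, a multiple-testing correction would normally be applied before reading significance off the plot.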
[0058] The selection of biomarkers from the Olink dataset included: selecting biomarkers whose data generally follow a normal distribution according to the Shapiro-Wilk test (p > 0.01); removing biomarkers with univariate AUC values between 0.45 and 0.55, computed using the roc_auc_score function in scikit-learn (sklearn.metrics); and removing biomarkers with p-values greater than 0.05 in the Mann-Whitney U test, computed using the mannwhitneyu function in scipy.stats. This process yielded 61 biomarkers.
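The three filtering steps of paragraph [0058] can be sketched as one function. Several details are assumptions not stated in the source: the Shapiro-Wilk test is applied within each group and both groups must pass; the AUC band is treated as inclusive; and the AUC is computed directly as the probability that a malignant value exceeds a benign one (ties counting one half), which equals the ROC AUC that `sklearn.metrics.roc_auc_score` returns for binary labels. The function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

from scipy.stats import mannwhitneyu, shapiro


def pairwise_auc(benign: Sequence[float], malignant: Sequence[float]) -> float:
    """P(malignant value > benign value), ties counted as 1/2 (= ROC AUC)."""
    wins = 0.0
    for m in malignant:
        for b in benign:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(benign) * len(malignant))


def select_biomarkers(benign: Dict[str, List[float]],
                      malignant: Dict[str, List[float]],
                      normal_p: float = 0.01,
                      auc_band: Tuple[float, float] = (0.45, 0.55),
                      mwu_alpha: float = 0.05) -> List[str]:
    selected = []
    for protein, a in benign.items():
        b = malignant[protein]
        # 1) Keep roughly normal proteins (assumed: both groups must pass).
        if shapiro(a).pvalue <= normal_p or shapiro(b).pvalue <= normal_p:
            continue
        # 2) Drop near-chance univariate discriminators.
        if auc_band[0] <= pairwise_auc(a, b) <= auc_band[1]:
            continue
        # 3) Drop proteins without a significant rank difference.
        if mannwhitneyu(a, b, alternative="two-sided").pvalue > mwu_alpha:
            continue
        selected.append(protein)
    return selected
```

An AUC well below 0.45 is kept on purpose: it indicates a protein that is lower in malignant samples, which is as informative as one that is higher.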
[0059] In some embodiments, the biomarker data 300 further include data from DNA and / or RNA biomarkers. DNA, RNA, and / or protein biomarkers that may be used to generate data for the biomarker data 300 are found, for example, in U.S. Patents and U.S. Patent Publications 10,001,484; 10,416,164; 10,436,787; 10,473,662; 8,420,333; 8,455,200; 2020/0182874; 2021/0072245; US2021/0072246; US2021/0041440; and US2017/0089903, and in WO2021/202351 and WO2021/222220. The biomarkers found in “Liquid Biopsy in Breast Cancer: A Focused Review” (Timothy Kwang Yong Tay and Puay Hoon Tan, Archives of Pathology & Laboratory Medicine, Vol. 145, Issue 6, 2021) may also be used.
[0060] A plurality of protein biomarkers creates the biomarker data 300 as used herein. For example, a panel of at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 24, 30 or more biomarkers may be assessed as a biomarker panel to create the biomarker data 300 for a patient. In some embodiments, the panel has less than 100, 50, 30, 25, 20, 15, or 10 different biomarkers. The size and identity of the biomarkers used in the panel may depend on the assessment needed for the particular patient. In some embodiments, the biomarker data 300 obtained from the biomarker-analysis device 302 contains information regarding a set of biomarkers that are not used in the analysis to assess the cancer. In other embodiments, the panel of biomarkers used for a patient is commensurate in scope with the panel of biomarkers used to assess the cancer. The number of biomarkers that are incorporated into the panel of biomarkers may be dependent on the particular assay.
[0061] In some embodiments, the biomarker panel is analyzed using the biomarker-analysis device 302. In each sample, each of the plurality of biomarkers in the panel has a concentration level, which may differ among the biomarkers.
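Paragraph [0008] describes analyzing the panel concentrations with a trained biomarker model to obtain a model output. The source does not name the algorithm at this point, so the sketch below uses L2-regularized logistic regression fitted by plain gradient descent with NumPy, one common choice for a small panel. The function names, the hyperparameters, and the synthetic six-column data (standing in for a six-biomarker panel such as MMP8, IL-8, CD69, HER3, PTX3, and CYR61 from paragraph [0013]) are illustrative assumptions.

```python
import numpy as np


def fit_logistic(X, y, lr=0.1, epochs=2000, l2=1e-3):
    """Fit L2-regularized logistic regression on standardized features."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant biomarkers
    Z = (X - mu) / sigma
    w = np.zeros(Z.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))  # predicted malignancy probability
        err = p - y
        # Gradient of mean log-loss plus (l2 / 2) * ||w||^2.
        w -= lr * (Z.T @ err / len(y) + l2 * w)
        b -= lr * err.mean()
    return {"mu": mu, "sigma": sigma, "w": w, "b": b}


def predict_proba(model, X):
    """Model output: probability that each sample is malignant."""
    Z = (np.asarray(X, dtype=float) - model["mu"]) / model["sigma"]
    return 1.0 / (1.0 + np.exp(-(Z @ model["w"] + model["b"])))


# Synthetic demonstration data only (not measurements from this disclosure):
# 6 columns standing in for a six-biomarker panel, label 1 = malignant.
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0.0, 1.0, (50, 6)), rng.normal(1.0, 1.0, (50, 6))])
y_demo = np.array([0] * 50 + [1] * 50)
model = fit_logistic(X_demo, y_demo)
probs = predict_proba(model, X_demo)
```

In practice, performance would be estimated on held-out samples (for example, cross-validated ROC AUC, as in the ROC figures), not on the training data used in this demonstration.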
[0062] In some embodiments, the biomarkers in the panel have different concentration levels between patients with benign lesions and patients with malignant breast cancer, between healthy subjects and patients with malignant breast cancer, and / or between healthy subjects and patients with benign lesions. Because a single biomarker is often not sufficient for the assessment of cancer, such as a classification of healthy vs. non-healthy, the biomarker panel is analyzed with classification algorithms to determine the overall assessment.
EMR Data
[0063] The server stores electronic-medical-record (EMR) data 400, which define one or more electronic medical records, such as the medical and/or treatment history of a patient. The EMR data 400 may be part of the patient’s stored health record, part of the record for the image data 200, information obtained from a different source, or some combination thereof. In some embodiments, the EMR data 400 are part of a patient’s electronic health record (EHR). Storing the EMR data 400 as part of an EHR may be advantageous because EHR data may go beyond the standard clinical data collected in the provider’s office and can provide a broader view of a patient’s care. Both EMR and EHR data should be stored securely and may be stored in a variety of formats.
[0064] For assessing a cancer, the EMR data 400 will include one or more items of information about the patient, from either their history or their current status. For assessing breast cancer, the EMR data 400 will include at least some information from the EMR, such as the non-limiting breast cancer-related data described in Table 3:
TABLE 3
[0065] In some instances, some of the features listed above or in other EMR datasets may be combined into a single feature. For example, whether a patient has had prior or current hormonal therapy may not matter in the analysis, whereas extended use of hormone therapy, or a high dosage of hormone therapy over a time period, may be included as a feature.
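Collapsing several raw EMR fields into one model feature, as described above, can be sketched as follows. Table 3 is not reproduced in this section, so the field names (`on_therapy`, `duration_years`, `daily_dose_mg`) and both cut-off values are hypothetical placeholders, not values from this application.

```python
from dataclasses import dataclass

# Hypothetical cut-offs; the application does not define "extended" or "high".
EXTENDED_USE_YEARS = 5.0   # assumed threshold for "extended use"
HIGH_DOSE_MG = 1.0         # assumed threshold for "high dosage"


@dataclass
class HormoneTherapyHistory:
    """Hypothetical EMR fields describing hormonal therapy."""
    on_therapy: bool        # any prior or current hormonal therapy
    duration_years: float   # cumulative duration of use
    daily_dose_mg: float    # typical daily dose


def combined_hormone_feature(history: HormoneTherapyHistory) -> int:
    """Return 1 for extended or high-dose use, otherwise 0.

    Prior or current use on its own is treated as uninformative, so it
    only gates the two more specific conditions.
    """
    if not history.on_therapy:
        return 0
    extended = history.duration_years >= EXTENDED_USE_YEARS
    high_dose = history.daily_dose_mg >= HIGH_DOSE_MG
    return int(extended or high_dose)
```

A model then sees one binary column instead of three raw fields.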
[0066] For assessing a cancer that may be one of a plurality of cancers in addition to breast cancer, the EMR data 400 may include some of the information from Table 3, as relevant, as well as other EMR data that may be particularly relevant for the assessment of other types of cancer (e.g., lung cancer, gastric cancer, liver cancer, pancreatic cancer, colorectal cancer, prostate cancer, hepatocellular cancer, cervical cancer, ovarian cancer, bladder cancer, cancer of the urinary tract, thyroid cancer, renal cancer, carcinoma, melanoma, leukemia, or brain cancer). The use of natural language processing of EMR data for analyzing various cancers is described in Chengtai, Li et al., Diagnostics (Basel) 2023 Jan 12;13(2):286, and such processes and data may be used in conjunction with the present invention. Other examples of EMR data include, for brain cancer, the mutational status of IDH1 and, for lung cancer, the patient’s smoking history. Other exemplary EMR data 400 that are particularly useful for lung cancer may be found, for example, in Dadhania et al. JCO 43, 8053-8053 (2025) or in Raghu, VK et al., JAMA Netw Open 2022 Dec 2855(12). For prostate cancer, Gleason scoring and EMR-derived data such as those described in Nagpal, K, et al., npj Digit. Med. 2, 48 (2019) may be used.
Image Data
[0067] The present methods can include assessment based on image data. Image data may come from one or more of a variety of imaging modalities. Examples of image modalities include an x-ray modality (e.g., two-dimensional (2D) x-ray images, computed-tomography (CT) images), a magnetic-resonance-imaging (MRI) modality, a fluoroscopic modality, an ultrasound modality, a tomosynthesis modality, a positron-emission-tomography (PET) modality, an endoscope modality, and a digital-pathology-scanner modality.
[0068] The data in the image modality include image data 200 that define one or more of the following: an x-ray image (e.g., a mammogram), a computed-tomography image, a magnetic-resonance-imaging image, a fluoroscopic image, an ultrasound image, and a positron-emission-tomography image. The images may be two-dimensional (2D) or three-dimensional (3D) images. The image data may be read by a radiologist, by a machine learning system, or by a combination thereof.
[0069] The image data may be a single image, or multiple images may be used. For example, for breast images, the image data may comprise one or more cross-sectional images, such as mediolateral oblique (MLO) and craniocaudal (CC) images. The image data may comprise multiple types of images, including, for example, 2D images, 2D composite images, and 3D tomosynthesis images. Multiple images from a single breast may be used. Multiple images from different imaging modalities, such as a mammogram image, an ultrasound image, and an MRI image, may also be used. In some embodiments, the image data may be annotated image data. A radiologist or other expert may optionally annotate one or more images to create the image data. Alternatively, or in addition, an AI or other machine learning algorithm may provide the annotation. The annotation can indicate whether a mass or lesion was found and/or where in the image a mass or lesion is located.
[0070] The imaging devices 202 are exemplified as a C-arm computed tomography (CT) scanner 204, a mammography device 206, and a magnetic-resonance-imaging (MRI) scanner 208. These are examples of imaging devices, and other embodiments may include more imaging devices, fewer imaging devices, different imaging devices (e.g., ultrasound devices, photoacoustic-imaging devices), a single imaging device, or different combinations of imaging devices.
Machine Learning Module | Fusion Framework
[0071] The fusion framework may fuse data from different diagnostic modalities and will include two or more of image data 200, biomarker data 300, and EMR data 400. This information can be used as an input source to discriminate among healthy, benign, and malignant patients. The fusion framework supports data fusion for at least two, or at least three, different modalities, with appropriate weights assigned to provide high classification performance.
[0072] As exemplified in FIG. 2(a), the data-acquisition device 100 sends one or more of image data 200, biomarker data 300, EMR data 400, and optional other modality data 500 to a machine learning module 102. In this module, a model building element 110 will build one or more of a medical image model 112 for medical image data 200, a biomarker model 114 for biomarker data 300, an EMR model 116 for EMR data 400, and potentially other models 118 for other modality data 500. Optionally, the model building element 110 will build one or more models that use more than one type of data.
[0073] After the model building element 110, the data are moved to the data/model fusion element 120. In this element, data preprocessing 122 and calculation 124 occur. In data preprocessing, the datasets to be input to the fusion model are combined and normalized according to the fusion method determined by the data/model fusion element. In the calculation, the method of integrating the results from the selected models and the weight for each model are set.
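The preprocessing step can be illustrated with a minimal early-fusion sketch. It assumes that "normalizing" means per-feature z-scoring within each modality and that "combining" means concatenating the normalized feature vectors patient by patient; the application does not specify the normalization, z-scoring is only one common choice, and the function names are hypothetical.

```python
from itertools import chain
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each feature (column) to zero mean and unit variance.

    Constant features are mapped to 0.0 to avoid dividing by zero.
    """
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, (m, s) in zip(row, stats)]
        for row in rows
    ]


def combine_modalities(*modality_rows):
    """Normalize each modality separately, then concatenate per patient.

    Each argument is a list of per-patient feature lists, with patients
    in the same order across modalities.
    """
    normalized = [zscore_columns(rows) for rows in modality_rows]
    return [list(chain.from_iterable(parts)) for parts in zip(*normalized)]


biomarker = [[1.0, 10.0], [3.0, 10.0]]  # toy values, two patients
emr = [[0.0], [2.0]]
print(combine_modalities(biomarker, emr))  # [[-1.0, 0.0, -1.0], [1.0, 0.0, 1.0]]
```

Normalizing per modality before concatenation keeps a modality with large raw units (e.g., concentrations) from dominating one with small ones.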
[0074] When more than one data type (e.g., image data 200, biomarker data 300, and/or EMR data 400) is used, the data fusion unit 120a of the machine learning module 102 performs a data fusion step. In this data fusion step, the data-processing device 100 obtains data in an image modality (the groups of image data 200); data in a biomarker modality (the biomarker data 300); in some embodiments, EMR data 400; and, in some embodiments, data from an additional modality 500. The data-processing device 100 can then use one or more of the groups of image data 200, the biomarker data 300, and the EMR data 400 to assess the cancer. The assessment may provide one or more classification results. Examples of classification results include diagnoses and medical predictions; thus, the classification results may indicate the presence or absence of a medical condition. A classification result may also indicate the respective presence or absence of multiple diagnoses, and a classification result may indicate a respective likelihood of each diagnosis that is included in the classification result. For example, a classification result may indicate at least some of the following: the presence or absence of a cancer, such as breast cancer; whether a tumor is benign or malignant; a type of cancer, for example, invasive or non-invasive; and a cancer molecular subtype (e.g., Luminal type, HER2-enriched type, or Triple Negative Breast Cancer subtype). Other categories may also be used for classification, or the classification may be defined differently; for example, the molecular subtype may be defined more specifically or differently, such as by using the most recent WHO classification of breast tumors.
[0075] Next, the prediction execution element 130 predicts an outcome for assessing a cancer. The predictions may include the diagnosis, prognosis, treatment, stratification, and/or monitoring of the cancer. For example, the prediction execution element 130 may diagnose a patient with a specific type of breast cancer, or it may stratify the patient into a risk category according to whether they have breast cancer or a specific type of breast cancer.
[0076] After the machine learning module 102, the information may be sent to a GUI 104, which may then provide a classification result and/or a prediction probability 106. For example, the prediction probability may indicate a 40% probability that breast cancer is present. This prediction probability is not necessarily categorical and can be a continuous assessment.
[0077] FIG. 2(b) shows another embodiment of the invention, which provides for a situation where the input of the different types of data is more limited. This system exemplifies a system where only the biomarker data are used to assess a cancer. In FIG. 2(b), the data-acquisition device 100 obtains biomarker data 300. These data are sent to a machine learning module 102, where a model building element 110 builds a first biomarker sub-model 114 for the biomarker data 300. In this embodiment, a second biomarker sub-model 115 is also provided. This second biomarker sub-model may be absent or present (if absent, the first biomarker sub-model and the biomarker model are synonymous). Additionally, further biomarker models (e.g., a third biomarker model) may also be used. The second (and, optionally, further) biomarker models can, for example, evaluate a different clinical condition. The data from the first biomarker sub-model 114 and the second biomarker sub-model 115 are moved to the data/model fusion element 120. In this element, data preprocessing 122 and calculation 124 occur. The next step involves the prediction execution element 130, where predictions are made to assess a cancer. The predictions may include any assessment of the cancer, such as diagnosis, prognosis, treatment, stratification, and/or monitoring. After the machine learning module 102, the information may be sent to a GUI 104, which may then provide a classification result and/or a prediction probability 106.
[0078] An exemplary fusion strategy is shown in FIGS. 3(a) and 3(b). In this strategy, the decisions of a first model 310 and a second model 312 are integrated using a weighted average, $H(x) = \sum_{i} w_i C_i(x)$, where $H(x)$ is the classification result, $w_i$ is the weight assigned to classifier $C_i(x)$, and the weights satisfy $w_i > 0$ and $\sum_{i} w_i = 1$. In this embodiment, the first model 310 is weighted with weight $w_1$ and the second model 312 is weighted with weight $w_2$. Each weight ($w_1$ and $w_2$) can be dynamically set to an optimal value based on the performance indicators output from Model 310 and Model 312. In the basic logic, the AUC values are used to calculate the weights. A weighted average fusion 314 is then calculated to provide the predicted results 316. In an exemplary embodiment, the AUCs of Model 310 and Model 312 are 0.8 and 0.6, respectively, and the weights are set as $w_1 = 0.57$ and $w_2 = 0.43$. In some embodiments, based on the preference of a user or site, one or more of sensitivity, specificity, accuracy, or a combination thereof can be used for the calculation instead of AUC. In the exemplary classification of benign vs. malignant breast cancer using a non-image model and a medical image model, the best performance was obtained by assigning weights of 0.6 and 0.4 to the non-image model and the medical image model, respectively.
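The weighted-average fusion can be sketched in Python as follows. The application states that AUC values drive the weights but does not give the exact rule; dividing each AUC by the sum of all AUCs is an assumption that reproduces the example above (0.8 / 1.4 ≈ 0.57 and 0.6 / 1.4 ≈ 0.43). The function names are hypothetical, and each model output $C_i(x)$ is assumed to be a malignancy probability in [0, 1].

```python
import math


def performance_weights(scores):
    """Weights proportional to each model's performance indicator (e.g., AUC).

    The returned weights are positive and sum to 1, as required for H(x).
    """
    if not scores or any(s <= 0 for s in scores):
        raise ValueError("performance indicators must be positive")
    total = sum(scores)
    return [s / total for s in scores]


def weighted_average_fusion(probabilities, weights):
    """Compute H(x) = sum_i w_i * C_i(x) over the per-model outputs."""
    if len(probabilities) != len(weights):
        raise ValueError("one weight is needed per model")
    if any(w <= 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be positive and sum to 1")
    return sum(w * p for w, p in zip(weights, probabilities))


w = performance_weights([0.8, 0.6])      # AUCs of Model 310 and Model 312
print([round(x, 2) for x in w])          # [0.57, 0.43]
print(round(weighted_average_fusion([0.9, 0.3], w), 3))  # 0.643
```

Using sensitivity, specificity, or accuracy instead of AUC only changes the list passed to `performance_weights`; fixed weights such as 0.6 and 0.4 can be passed directly to `weighted_average_fusion`.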
[0079] Similarly, as illustrated in FIG. 3(b), two or more biomarker models can be combined within the workflow of FIG. 3(a), where, along with the two or more biomarker models (114 and 115), the biomarker data are combined with a medical imaging model 112, with an EMR model 116, or with both a medical imaging model 112 and an EMR model 116.
[0080] FIG. 4 provides an exemplary assessment using a medical image model 112 and shows the integration of multiple predictions to the patient or study level. This example shows the logic for integrating multiple predictions from the image/ROI level to the patient or study level, using ROI 410 from an MLO image, ROI 412 from a different MLO image, ROI 414 from a CC image, and ROI 416 from another CC image. The images containing the ROIs (410, 412, 414, and 416) are exemplified as MLO and CC images, but they may be any combination of multiple cross-sectional images. They may also include multiple types of images, including 2D images, 2D composite images, and 3D tomosynthesis images, and/or multiple images from the right breast and/or the left breast. The multiple images may also be from different imaging modalities, such as mammography, ultrasound, and MRI. In this example, the ROIs (410, 412, 414, and 416) are input into the deep learning (DL) model 420, whose output includes the probabilities 430, 432, 434, and 436 for the respective ROIs. The probabilities are then combined to calculate the average 440, and a prediction 450 is output.
Biomarker Models within the Machine Learning Module
[0081] As discussed above, the biomarker model 114 may be used alone or combined with a second biomarker sub-model 115. The first biomarker sub-model 114, or the first and second biomarker sub-models 114 and 115, may be used alone for the assessment of cancer or may be combined with other models. Each of the first and second biomarker sub-models or models discriminates between two or more of a variety of sample types.
[0082] Examples of these different sample types that are used for cancer analysis and model development include invasive cancer samples, benign samples, healthy samples, and samples of distinct types or sub-categories of cancer. For clarity, we note that healthy samples are samples where the patient does not have cancer, or does not have breast cancer. This definition of healthy does not inform on any other disease or illness the patient may have and does not assess the overall patient health. Further, when an assessment of cancer is the assessment of, for example, breast cancer, a healthy sample is indicative of the patient not having breast cancer. Other cancers are not necessarily assessed.
[0083] Nonlimiting examples of these distinct types of discrimination for cancer analysis include:
- Benign vs. Invasive Cancer
- Benign vs. Cancer in situ
- Benign vs. Microinvasive Cancer
- Healthy vs. Invasive Cancer
- Healthy vs. Cancer in situ
- Benign vs. Healthy
[0084] For the assessment of breast cancer, examples of the different sample types that may be used for discrimination and model development include invasive breast cancer, benign, healthy, Invasive Ductal Carcinoma (IDC), Ductal Carcinoma in situ (DCIS), and Invasive Lobular Carcinoma (ILC).
[0085] In these biomarker models, it is often the case that the concentration of a particular biomarker increases with the occurrence of a cancer. However, for some biomarkers, the concentration (or differential level) is lower than in a healthy sample. Specifically, it has been found that the concentration of the biomarker TFF1 is lower in a malignant sample than in a healthy control. It has also been found that the concentration of the biomarker MMP8 is lower in a malignant sample than in a healthy control.
Multiple Biomarker Models
[0086] In some embodiments, it is advantageous to have multiple biomarker models 114. While a single model can address, for example, whether or not a patient has a cancer, the use of several different models provides discrimination within the cancer assessment not previously provided. There are multiple classes of biomarkers from which multiple models can be constructed. From the same biological sample, we can obtain information on DNA (genomics), methylated DNA (epigenomics), RNA (transcriptomics), proteins (proteomics), and lipids, sugars, and metabolites (metabolomics). In some embodiments, two or more of these biomarker models are combined to improve discrimination performance.
[0087] Thus, a combination of a first biomarker sub-model for analyzing a first subset of biomarkers and a second biomarker sub-model for analyzing a second subset of biomarkers may be used. The first and the second model can discriminate between distinct types of samples. For example, a first biomarker sub-model, which analyzes a first subset of biomarkers (a first biomarker panel), will discriminate between a first and a second sample type and a second biomarker sub-model, which analyzes a second subset of biomarkers (a second biomarker panel), will discriminate between a third and a fourth sample type.
[0088] Thus, a first biomarker sub-model and a second biomarker sub-model may be combined in the model building element 110 for the assessment of cancer. The first and second biomarker sub-models may optionally be combined with models from other modalities, such as a medical imaging model 112 and/or an EMR model 116. In some embodiments, a first biomarker sub-model, a second biomarker sub-model, and a third (and, optionally, a fourth or further) biomarker model may be combined. Each of these biomarker models assesses a different panel of biomarkers and discriminates between different sample types.
[0089] When multiple biomarker sub-models are used, they can be run sequentially or simultaneously. Some models degrade performance when integrated, and such models can be excluded in advance as candidates for merging in order to achieve high discrimination. The use and integration of the sub-models may occur early or late in the process and may be informed by the outcome of a first-run model. For example, in some embodiments, a first sub-model for benign vs. malignant classification is run, and then, if the sample is found to be in the malignant class, breast cancer subtyping is run (e.g., this sample is malignant; next, which subtype is it: Luminal A, Luminal B, HER2-enriched, or Triple Negative?).
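The sequential use of sub-models in the example above can be sketched as a two-stage cascade. The sub-models here are stand-ins: any callable that maps a sample to a malignancy probability (first stage) or to a subtype label (second stage) will do. The 0.5 threshold, the `marker_a`/`marker_b` names, and the toy models are assumptions for illustration, not values or models from the application.

```python
from typing import Callable, Dict, Optional, Tuple

Sample = Dict[str, float]  # hypothetical: biomarker name -> concentration


def run_cascade(
    sample: Sample,
    malignancy_model: Callable[[Sample], float],
    subtype_model: Callable[[Sample], str],
    threshold: float = 0.5,
) -> Tuple[str, float, Optional[str]]:
    """Run benign-vs-malignant first; run subtyping only for malignant samples."""
    p_malignant = malignancy_model(sample)
    if p_malignant < threshold:
        return "benign", p_malignant, None
    # Second stage, e.g., Luminal A, Luminal B, HER2-enriched, Triple Negative.
    return "malignant", p_malignant, subtype_model(sample)


# Toy stand-in models, for illustration only.
def toy_malignancy(sample: Sample) -> float:
    return 0.9 if sample.get("marker_a", 0.0) > 1.0 else 0.1


def toy_subtype(sample: Sample) -> str:
    return "HER2-enriched" if sample.get("marker_b", 0.0) > 1.0 else "Luminal A"


print(run_cascade({"marker_a": 2.0, "marker_b": 3.0}, toy_malignancy, toy_subtype))
```

Because the subtyping model runs only on samples that the first stage calls malignant, its inputs are conditioned on the first stage's decision; running both sub-models simultaneously and combining them afterwards is the alternative the text also allows.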
[0090] Some of these scenarios are particularly useful when using a low- to mid-multiplex assay such as ELISA, chemiluminescence immunoassay, or SIMOA. Alternatively, when using a high-multiplex assay such as Olink or mass spectrometry (MS), the assays for the first subset and the second subset may be conducted either simultaneously or sequentially. For clarity, any biomarker sub-model may be run as a biomarker model, and any model may be run as a sub-model.
Assessment
[0091] The results provide an assessment of a cancer. In some embodiments, the cancer is breast cancer, and the assessment is a diagnosis. The results of the biomarker model(s) and/or the fusion model may be provided as a numerical value representing the probability that the sample is cancerous. Alternatively, a predefined value may be provided such that, if the result is above the predefined value, the subject is defined as having breast cancer. Other ways of providing information for assessing the cancer that are known in the art may also be used.
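A probability output combined with a predefined cut-off, as described above, can be sketched as follows. The example first averages several per-ROI probabilities, as in the study-level integration of FIG. 4, and assumes those probabilities have already been produced by the DL model. The ROI keys and the 0.5 cut-off are hypothetical; the application does not state a numeric predefined value.

```python
from statistics import mean
from typing import Dict, Tuple

DEFAULT_CUTOFF = 0.5  # hypothetical predefined value, not from the application


def study_level_probability(roi_probabilities: Dict[str, float]) -> float:
    """Average per-ROI malignancy probabilities into one study-level score."""
    if not roi_probabilities:
        raise ValueError("at least one ROI probability is required")
    return mean(roi_probabilities.values())


def assess(probability: float, cutoff: float = DEFAULT_CUTOFF) -> Tuple[str, float]:
    """Report the continuous probability plus a label from the predefined cut-off.

    The subject is labeled positive only when the probability is above the cut-off.
    """
    label = "breast cancer" if probability > cutoff else "no breast cancer"
    return label, probability


rois = {"MLO_410": 0.8, "MLO_412": 0.6, "CC_414": 0.7, "CC_416": 0.5}
label, p = assess(study_level_probability(rois))
print(label, round(p, 2))  # breast cancer 0.65
```

Returning both the label and the probability keeps the continuous assessment available to the GUI even when a categorical call is also shown.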
[0092] In some embodiments, once the information for assessing the cancer has been provided, the system will also provide a treatment plan. This treatment plan can be obtained from medical best practice as known in the art and may include information on the next step or steps of treatment, such as monitoring, additional imaging, surgery, chemotherapy, endocrine therapy, molecular targeted therapy, immune checkpoint inhibitor therapy, and/or radiation.
Examples of Biomarker Models/Subsets
[0093] The following paragraphs describe certain explanatory embodiments. Other embodiments may include alternatives, equivalents, and modifications. Additionally, the explanatory embodiments may include several novel features, and a particular feature may not be essential to some embodiments of the devices, systems, and methods that are described herein. Furthermore, some embodiments include features from two or more of the following explanatory embodiments. Thus, features from various embodiments may be combined and substituted as appropriate.
[0094] A number of different models were run to create different biomarker panels, where the assessment of the concentrations of the biomarkers can discriminate between cancerous, benign, and normal tissues and provide an assessment of a cancer.
[0095] Various biomarkers were measured in healthy, benign, and malignant samples and showed varying concentrations, as shown in FIGS. 5(a)-5(h). For the biomarker PTX3, multiple samples of each of the healthy (0), benign (1), and malignant (2) types were measured for their biomarker concentration. The concentration of PTX3 for each of the three sample types is provided in FIG. 5(a). FIGS. 5(b) to 5(h) provide the same data for seven additional biomarkers, and FIGS. 5(i) to 5(m) provide the healthy (0) and malignant (3) data. These data show that, for some biomarkers, the difference in concentration between the healthy, benign, and/or malignant samples is visible for the single biomarker.
[0096] Various biomarkers were measured in healthy, benign, and malignant samples with varying concentrations. FIG. 6(a) is an ROC curve showing the performance for healthy versus malignant samples for a subset of biomarkers, and FIG. 6(b) shows the performance for benign versus malignant samples. These figures show that the sensitivity increases non-linearly with 1 - specificity at varying threshold values. The AUC for this analysis was 0.82 for healthy versus malignant, whereas the AUC for benign versus malignant was 0.66.
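The ROC curves and AUC values reported above follow the standard construction: the decision threshold is swept across the classifier scores, and each distinct threshold yields one (1 - specificity, sensitivity) point. A minimal standard-library sketch, with trapezoidal integration for the AUC, is shown below; the toy labels and scores are illustrative and are not the data behind FIGS. 6(a) and 6(b).

```python
def roc_points(labels, scores):
    """Return ROC points as (1 - specificity, sensitivity) pairs.

    labels: 1 for the positive class (e.g., malignant), 0 otherwise.
    The threshold moves from the highest score downwards; tied scores
    are processed together so each distinct threshold adds one point.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    ranked = sorted(zip(scores, labels), reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


pts = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc_trapezoid(pts))  # 0.75
```

An AUC of 0.5 corresponds to a chance-level diagonal and 1.0 to perfect separation, which is the scale on which the 0.82 and 0.66 values above are read.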
[0097] Similarly, FIG. 7(a) shows the performance for healthy versus malignant samples with the selected biomarkers PTX3, TNF-a, FABP4, HGF, FN1, and COMP. The AUC was calculated as 0.7042. FIG. 7(b) shows the performance for DCIS versus invasive carcinoma with the selected biomarkers PTX3, CYR61, TFF1, and HER3. The AUC was calculated as 0.7588. These figures show that the sensitivity increases non-linearly with 1 - specificity at varying threshold values.
[0098] FIG. 8 shows the performance of a classification model (BM model 0) built with six biomarkers: MMP8, IL-8, CD69, HER3, PTX3, and CYR61. This model is a Light Gradient Boosting Machine (LightGBM) classifier trained and validated through 5-fold cross-validation on the concentrations of the six biomarkers in serum from benign and malignant patients. It shows an accuracy of 0.74, a sensitivity of 0.76, a specificity of 0.714, and an AUC of 0.76. The feature ranking for these six biomarkers is: MMP8 > IL-8 > CD69 > CYR61 > HER3 > PTX3.
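The accuracy, sensitivity, and specificity reported for BM model 0 are confusion-matrix quantities, and 5-fold cross-validation assigns every sample to exactly one held-out fold. The sketch below shows both pieces using only the standard library. It assumes a stratified fold split (the application does not say whether the folds were stratified), and it omits the LightGBM classifier itself, which in each round would be fitted on four folds and evaluated on the fifth.

```python
import random


def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels (1 = malignant)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # true-positive rate
        "specificity": tn / (tn + fp),   # true-negative rate
    }


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        # Deal each class round-robin so every fold gets a similar share.
        for j, idx in enumerate(members):
            folds[j % k].append(idx)
    return folds


print(confusion_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
```

In each of the k rounds, one fold is held out for validation and the remaining folds are used for training; the predictions from all held-out folds can then be pooled into a single call to `confusion_metrics`.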
[0099] Biomarker panels with fewer biomarkers are also possible. Thus, a new panel was devised. FIG. 9 shows the performance of a classification model (BM model 0) built with five biomarkers: MMP8, IL8, FN1, p21, and CD25. This model is a LightGBM classifier trained and validated through 5-fold cross-validation on biomarker concentrations in serum from benign and malignant patients, and it shows an accuracy of 0.72, a sensitivity of 0.74, a specificity of 0.68, and an AUC of 0.74. The feature ranking for these five biomarkers is: MMP8 > IL8 > FN1 > p21 > CD25.
[0100] In another embodiment, a classification model is built with the four biomarkers MMP8, IL8, FN1, and CD25. Other biomarkers can be added to this set to create a classification model with five or six biomarkers. The fifth (or fifth and sixth) biomarker may be selected from the biomarkers of Table 2. In some embodiments, the fifth (or fifth and sixth) biomarker is one (or two) of DNAH10, SHOX2, OVGP1, ITGA11, TGFA, ENTHD1, RHOXF2, SPDEF, Pro-CASP8, COL1A1, TFF1, CYR61, FABP4, HGF, HER3, TNFa, and COMP.
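The four-marker embodiment defines a family of candidate panels rather than a single panel: the core set plus one or two additions from the listed candidates. The sketch below enumerates that family and picks the best panel under a scoring function. The 17 candidate names are copied from the paragraph above (with CYR61 spelled as elsewhere in this document); `score_panel` is a hypothetical stand-in for training and cross-validating a model on a panel and returning a metric such as AUC.

```python
from itertools import combinations
from typing import Callable, Iterable, Iterator, Tuple

Panel = Tuple[str, ...]

CORE: Panel = ("MMP8", "IL8", "FN1", "CD25")
CANDIDATES: Panel = (
    "DNAH10", "SHOX2", "OVGP1", "ITGA11", "TGFA", "ENTHD1", "RHOXF2",
    "SPDEF", "Pro-CASP8", "COL1A1", "TFF1", "CYR61", "FABP4", "HGF",
    "HER3", "TNFa", "COMP",
)


def extended_panels(core: Panel = CORE, candidates: Panel = CANDIDATES,
                    max_extra: int = 2) -> Iterator[Panel]:
    """Yield the core set plus every combination of 1..max_extra candidates."""
    for n_extra in range(1, max_extra + 1):
        for extra in combinations(candidates, n_extra):
            yield core + extra


def best_panel(panels: Iterable[Panel],
               score_panel: Callable[[Panel], float]) -> Panel:
    """Return the highest-scoring panel; the first one encountered wins ties."""
    return max(panels, key=score_panel)


panels = list(extended_panels())
print(len(panels))  # 153: 17 five-marker panels + 136 six-marker panels
```

Exhaustive scoring is feasible here because the family is small; the count grows combinatorially if more extra markers are allowed.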
[0101] A panel was also devised having the biomarkers FN1, IL-8, PTX3, CYR61, and TNFa. This model was shown to have an AUC of 0.68. A panel was also devised having the biomarkers FN1, IL-8, CD25, MMP8, p21, and TFF1. This model was shown to have an AUC of 0.74.
[0102] Multiple specific subsets of biomarkers for use in one or more biomarker models for invasiveness have been analyzed and can include:
TABLE 4
[0103] For the biomarker panels in Table 4, all or some subset of the biomarkers may be incorporated into the panel. Additional biomarkers from Table 2, Table 1, or other biomarkers known in the art may be added to the panel. These panels may be used as part of the biomarker model for assessing a cancer in a subject. Alternatively, these panels may be used for training to create a biomarker model or biomarker sub-model. One, two, or more of the panels may be used in the methods and kits as provided herein.
[0104] The combination of protein biomarkers that makes up the panel of biomarkers, e.g., for a biomarker model, for the analysis of healthy sample(s) compared to malignant breast cancer sample(s) may include, or may substantially contain, at least 4, 5, 6, 7, 8, 9, 10, 11, or 12 biomarkers selected from PTX3, CYR61, HSP70, TNFa, TFF1, HGF, FN1, HER3, p21, CXCL9, COMP, and FABP4. This new biomarker panel of the invention allows diagnosing and even stratifying various cancer diseases.
[0105] The combination of protein biomarkers that makes up the panel of biomarkers, e.g., for a biomarker model, for the analysis of healthy sample(s) compared to malignant breast cancer sample(s) may include, or may substantially contain, at least 4, 5, or 6 of: PTX3, TNFa, FN1, TFF1, HGF, and HER3.
[0106] In some embodiments, the combination of protein biomarkers that makes up the panel of biomarkers, e.g., for a biomarker model, for analysis of benign sample(s) compared to malignant breast cancer sample(s) may include or may substantially contain at least 3 or 4 of: PTX3, TNFa, FN1, and CYR61.
[0107] The combination of protein biomarkers that makes up the panel of biomarkers, e.g., for a biomarker model, for the analysis of healthy sample(s) compared to malignant breast cancer sample(s) may include, or may substantially contain, at least 4, 5, 6, 7, 8, 9, 10, 11, 12, 15, or more biomarkers selected from p21, CD69, IL-8, MMP8, SCGB2A2/Mammaglobin, TFF1, CYR61, FN1, PTX3, TNFa, HER3, CXCL9, HGF, ADAM8, CA15-3, CD25, EGFR, HER4, and PR. This new biomarker panel of the invention allows diagnosing and even stratifying various cancer diseases.
[0108] The combination of protein biomarkers that makes up the panel of biomarkers, e.g., for a biomarker model, for the analysis of healthy sample(s) compared to malignant breast cancer sample(s) may include, or may substantially contain, at least 3, 4, or 5 of: p21, CD69, IL-8, MMP8, and SCGB2A2/Mammaglobin.
[0109] In some embodiments, the combination of protein biomarkers that makes up the panel of biomarkers, e.g., for a biomarker model, for the analysis of benign sample(s) compared to malignant breast cancer sample(s) may include, or may substantially contain, at least 3, 4, 5, or 6 of: TFF1, CYR61, FN1, PTX3, TNFa, and HER3.
[0110] The combination of protein biomarkers that makes up the panel of biomarkers, e.g., for a biomarker model, for the analysis of benign sample(s) compared to malignant breast cancer sample(s) may include, or may substantially contain at least 3, 4, 5, or 6 of: MMP8, IL-8, CD69, HER3, PTX3, and CYR61.
[0111] The combination of protein biomarkers that makes up the panel of biomarkers, e.g., for a biomarker model, for the analysis of benign sample(s) compared to malignant breast cancer sample(s) may include, or may substantially contain, at least 3, 4, or 5 of: MMP8, IL-8, p21, FN1, and CD25.
[0112] The combination of protein biomarkers that makes up the panel of biomarkers for the analysis of benign sample(s) compared to malignant breast cancer sample(s) may include, or may substantially contain at least 3, 4, 5, or 6 of: PTX3, TNFa, FABP4, HGF, FN1, and COMP.
[0113] The combination of protein biomarkers that makes up the panel of biomarkers for the analysis of DCIS samples compared to malignant breast cancer sample(s) may include, or may substantially contain, at least 3, 4, 5, or 6 of: PTX3, CYR61, HSP70, TFF1, FN1, and HER3.
[0114] The combination of protein biomarkers that makes up the panel of biomarkers for the analysis of benign sample(s) compared to malignant breast cancer sample(s) may include, or may substantially contain, at least 3, 4, or 5 of: PTX3, CYR61, TNFa, and FN1; MMP8, IL8, CD69, HER3, PTX3, and CYR61; or MMP8, IL8, p21, FN1, and CD25.
[0115] The combination of protein biomarkers that makes up the panel of biomarkers for the analysis of healthy sample(s) compared to malignant breast cancer sample(s) may include, or may substantially contain, at least two or all of FN1, p21, and IL-8.
[0116] The combination of protein biomarkers that makes up the panel of biomarkers for the analysis of healthy sample(s) compared to invasive lobular carcinoma sample(s) may include at least 3, 4, 5, 6, or more biomarkers from one of the panels described above, together with E-cadherin (eCAD).
[0117] In addition, various classification models can be built depending on the classification objectives. Consistent with the models described above, to build a model that classifies a first sample type versus a second sample type, the concentrations of the biomarker candidates are obtained for the first and second sample types, model candidates are built based on various combinations of the biomarker candidates, and the model candidate showing the highest performance is selected as the classifier for the first sample type vs. the second sample type.
Examples of Biomarkers and Models from Olink Data
[0118] The 61 biomarkers selected from the Olink data set were analyzed for relevance to inclusion in the panels for the assessment of cancer. Classification models were created with several classifiers using 5-fold cross-validation. As a result, Logistic Regression, KNN, and Support Vector Machine demonstrated high baseline performance. In addition, a Multi-Layer Perceptron (MLP; a neural network model) was evaluated. The biomarkers (BMs) were narrowed down according to a feature selection algorithm using the classifiers Logistic Regression, Support Vector Machine, KNN, and MLP. The biomarkers provided in Table 5 are those selected by two or three classifiers and may be particularly useful in the assessment of a cancer when comparing benign vs. cancer.
TABLE 5
Selected by three classifiers: ARID1A, KRI1, BOLA1, MATN3, DOC2B, MOCS2, GAL, RND3, GCG, SPRING1, ITGA11
Selected by two classifiers: B3GNT7, PDE2A, BPIFA2, SRGAP3, GSTA3, PRPH, INPP5D, IL24, LHB, STX2, PCM1
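Table 5 groups biomarkers by how many of the classifier-specific feature-selection runs picked them. A minimal vote-counting sketch is shown below. The per-classifier selections in the example are toy placeholders (BM1 to BM4), not the Olink results, and the application does not state which feature-selection algorithm produced each classifier's list.

```python
from collections import Counter
from typing import Dict, Iterable


def consensus_biomarkers(
    selections: Dict[str, Iterable[str]], min_votes: int = 2
) -> Dict[str, int]:
    """Return biomarkers chosen by at least `min_votes` classifiers, with vote counts."""
    votes = Counter(
        marker
        for chosen in selections.values()
        for marker in set(chosen)  # count each classifier at most once per marker
    )
    return {m: n for m, n in votes.most_common() if n >= min_votes}


toy = {
    "logistic_regression": ["BM1", "BM2", "BM3"],
    "svm": ["BM1", "BM2"],
    "knn": ["BM1", "BM4"],
    "mlp": ["BM3"],
}
print(consensus_biomarkers(toy))
```

Splitting the result by vote count (three vs. two) reproduces the two rows of a table laid out like Table 5.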
[0119] FIG. 10(a) shows the performance of a classification model built with six biomarkers: DNAH10, SHOX2, OVGP1, ITGA11, TGFA, and ENTHD1. This model is a logistic regression classifier trained and validated through 5-fold cross-validation on the concentrations of the six biomarkers in serum from benign and malignant patients. The AUC was calculated as 0.80. Similarly, FIG. 10(b) shows the performance of a classification model that differentiates DCIS from invasive carcinoma using a panel that includes one or more of seven biomarkers: FGD6, ZWILCH, NOP56, TECPR1, AKT3, LCP1, and MINDY4B. This model shows a mean ROC curve with an AUC of 0.96.
[0120] The combination of protein biomarkers that makes up the panel of biomarkers may include FN1, IL-8, CD25, MMP8, and one or both of DNAH10 and SHOX2. Another biomarker panel includes FN1, IL-8, CD25, MMP8, and one or more of DNAH10, ITGA11, MOCS2, STK36, GAS8, EFCAB8, GATA3, ABHD8, ABO, CLHC1, DSG3, and SIGLEC15.
[0121] Another biomarker panel includes FN1, IL-8, CD25, MMP8, and one or more of ARID1A; KRI1; BOLA1; MATN3; DOC2B; MOCS2; GAL; RND3; GCG; SPRING1; and ITGA11.
Another biomarker panel includes FN1, IL-8, CD25, MMP8, and one or more of B3GNT7; PDE2A; BPIFA2; SRGAP3; GSTA3; PRPH; INPP5D; IL24; LHB; STX2; and PCM1.
[0122] Another biomarker panel includes two or more of ARID1A, KRI1, BOLA1, MATN3, DOC2B, MOCS2, GAL, RND3, GCG, SPRING1, and ITGA11.
Examples of TAAb Biomarkers and Models
[0123] In some embodiments, one or more tumor-associated antibodies (TAAbs) are used for the assessment of cancer. These TAAbs may be combined with one or more of the protein biomarkers as described above. Alternatively, several TAAbs may be assessed in a model containing only TAAbs. The TAAb analysis may be done in parallel with the assessments discussed above. Alternatively, the TAAb may be assessed separately and the information combined into the model for the assessment of the cancer.
[0124] In this study, we conducted an extensive screening of TAAbs in serum samples from 26 donors with breast cancer and 24 donors with benign breast disease. Using the Sengenics i-Ome Cancer Array, we quantified the IgM and IgG responses to 525 TAAs. We observed that each donor mounts a distinct antibody response to their specific disease, and that more benign donors than cancer donors had an activated immune response, characterized by many high-expression antibodies. This observation indicates that immune activation is highly individualized, with some donors exhibiting hyperactive or hypoactive responses. Meaningful comparison across donors therefore requires self-relative normalization, an approach essential for broad diagnostic applicability.
[0125] The i-Ome Cancer Array (Sengenics, Malaysia) was used to screen for both IgM and IgG responses to a panel of 525 TAAs. This platform uses KREX Technology to perform high-throughput, multiplexed antibody profiling against a panel of cancer-related antigens. Relative fluorescence intensity (RFI) was used to quantify the concentration of each antibody, calculated as the difference between the foreground and background signal intensities of each spot. Each antibody was quantified in technical quadruplicate, and the RFIs of replicate spots were averaged to calculate the mean Net Intensity (NetI) for each antigen in each sample.
[0126] Table 6 summarizes the significant TAAbs (with their antibody class) identified by differential expression analysis of the raw data. These biomarkers meet both the p-value and fold-change thresholds for significance, defined by a Mann-Whitney U test (p < 0.05) and an absolute fold-change > 1.2. A positive fold-change indicates a higher mean relative expression level in the cancer group, while a negative fold-change indicates a higher mean relative expression level in the benign group.
TABLE 6
[0127] To assess the potential differences in total immune activity between donors, we plotted the number of IgM and IgG biomarkers at a given relative concentration for each donor, with distributions sorted by peak height. Some donors displayed globally elevated immune responses indicative of overactive immune systems, while others exhibited narrow distributions, indicative of reduced immune activity. These differences affected the mean and standard deviation of each individual's IgM and IgG distributions, creating a latent variable, proportional to immune activity, that influenced the analysis.
[0128] To account for this variation in immune activity between patients, z-score normalization was applied to each donor's TAAb distribution. Z-score normalization centers the mean of each person's antibody distribution at zero and scales its standard deviation to one, enabling comparison of relative antibody expression across individuals with different baseline IgM and IgG activities. After this normalization, differential expression analysis identified TAAbs elevated relative to each donor's personal baseline immune profile. A two-tailed Mann-Whitney U test identified 11 IgM antibodies and 14 IgG antibodies that differed significantly between the benign and cancer groups (p < 0.05, α = 0.05, |fold-change| > 1.2).
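Per-donor z-score normalization can be sketched as follows: each donor's profile is standardized against that donor's own mean and standard deviation rather than the cohort's. Assumptions: the population standard deviation is used (the text does not say which), and the nested-dictionary layout and the function name are hypothetical.

```python
from statistics import fmean, pstdev
from typing import Dict

Profile = Dict[str, float]  # antigen -> intensity (e.g., NetI)


def zscore_per_donor(profiles: Dict[str, Profile]) -> Dict[str, Profile]:
    """Standardize each donor's antibody profile to mean 0 and SD 1 within that donor."""
    normalized = {}
    for donor, values in profiles.items():
        vals = list(values.values())
        mu, sd = fmean(vals), pstdev(vals)
        if sd == 0:
            raise ValueError(f"donor {donor!r} has a constant profile")
        normalized[donor] = {ag: (v - mu) / sd for ag, v in values.items()}
    return normalized


# Two donors whose raw intensities differ tenfold but whose relative profiles match.
raw = {"donor_1": {"TAA_1": 1.0, "TAA_2": 3.0}, "donor_2": {"TAA_1": 10.0, "TAA_2": 30.0}}
print(zscore_per_donor(raw))
# {'donor_1': {'TAA_1': -1.0, 'TAA_2': 1.0}, 'donor_2': {'TAA_1': -1.0, 'TAA_2': 1.0}}
```

After normalization the two donors become identical, which illustrates how self-relative scaling removes differences in overall immune activity while preserving each donor's relative antibody pattern.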
[0129] Table 7 summarizes these significant TAAbs (with their antibody class) identified by differential expression analysis of the z-score normalized data for benign vs. cancer. Displayed are biomarkers meeting both the p-value and fold-change thresholds for significance, defined by a Mann-Whitney U test (p < 0.05) and an absolute fold-change > 1.2. A positive fold-change indicates a higher mean relative expression level in the cancer group, while a negative fold-change indicates a higher mean relative expression level in the benign group.
TABLE 7
[0130] An unsupervised principal component analysis (PCA) was applied to the z-score normalized datasets to visualize overall patterns of TAAb expression and to assess whether benign and cancer samples exhibited distinct clustering in the IgM (FIG. 11(a)), IgG (FIG. 11(b)), and combined IgM and IgG (FIG. 11(c)) datasets.
[0131] FIGS. 11(a)-11(i) show classifier performance with a combination of IgM and IgG biomarkers. PCA plots and ROC curves show improved classifier performance with the combination of IgM and IgG biomarkers. PCA plots are shown for the total IgM (FIG. 11(a)), IgG (FIG. 11(b)), and combined IgM and IgG (FIG. 11(c)) normalized NetI datasets. The cancer class is shown on the plot as crosses, while the benign class is shown as green circles. Cancer subtypes are differentiated by color, with orange showing luminal A breast cancer, blue showing luminal B, and pink showing triple-negative breast cancer. The corresponding ROC curves show the performance of logistic regression classifiers built from the top-performing group of biomarkers selected via recursive feature elimination in the IgM panel (FIG. 11(d)), the IgG panel (FIG. 11(e)), and a combined group of differentially expressed IgM and IgG biomarkers (FIG. 11(f)). Each fold of the 5-fold cross-validation is shown in a different color, with the bold black ROC curve reflecting the mean of the five folds. The shaded grey area represents the 95% confidence interval of this estimate. SHAP plots show the impact of each selected feature on the prediction of the corresponding model for the IgM classifier (FIG. 11(g)), the IgG classifier (FIG. 11(h)), and the combined IgM and IgG classifier (FIG. 11(i)). Red reflects a high relative normalized NetI and blue a low relative normalized NetI. Data to the right of the vertical line are predicted to be in the cancer group, while data to the left of the vertical line are predicted to be in the benign group.
[0132] Limiting further analysis to the differentially expressed antibodies, we evaluated whether combinations of the most significant TAAbs could distinguish benign from cancer samples using supervised multivariate classification. To prevent overfitting given the limited sample size and the large number of antibodies analyzed, only the TAAbs identified as differentially expressed and the absolute number of high-titer antibodies were included in building the classifiers (27 total features; n = 12 IgM, n = 15 IgG). Clinical features other than age were excluded, as they would not be available at the time of a mammogram or biopsy. Recursive feature elimination (RFE) was employed to select the top-performing features for an IgM, an IgG, and a combined IgM and IgG classifier. Through this analysis, 8 features were selected for the IgM classifier (CDH3, CDK7, IFNW1, MMP1, RHOXF2, SCF, SRP54, UBQLN1), 11 features were selected for the IgG classifier (CDK2, COL1A1, COL6A1, ETS2, IMPDH1, MAGEA4V3, SPDEF, SSX2B, TP53, TUBA3C, number of high-titer IgG antibodies), and 8 features were selected for the combined classifier (IgM: CDH3, IFNW1, RHOXF2, SCF, UBQLN1; IgG: CDK2, MAGEA2, SPDEF). Logistic regression classifiers were trained on the selected features from each antibody panel with 5-fold cross-validation. The resulting models demonstrated promising discriminatory performance despite the limited sample size, as shown by the receiver operating characteristic (ROC) curves. Classifiers built from the IgM panel (FIG. 11(d)) and the IgG panel (FIG. 11(e)) each achieved moderate separation between diagnostic groups (AUC-ROC = 0.68 and 0.72, respectively), whereas a combined model incorporating both IgM and IgG biomarkers yielded the best overall performance (AUC-ROC = 0.75; FIG. 11(f)).
These findings indicate that an integrated, self-relative IgM and IgG TAAb signature can provide diagnostic value and may serve as a foundation for serological classifiers for early-stage breast cancer detection.
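The recursive feature elimination loop can be sketched as follows: refit an importance measure on the surviving features and drop the weakest one until the target count remains. The study ranked features with its classifiers; here, as a plainly swapped-in stand-in, importance is the absolute difference in class means divided by the feature's overall (population) standard deviation. With such a univariate score, refitting never changes the ranking, so the loop becomes truly recursive only when `importance` is replaced by a multivariate model's weights (for example, logistic regression coefficient magnitudes). All names are hypothetical.

```python
from statistics import fmean, pstdev
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]
Importance = Callable[[List[Row], Sequence[int], Sequence[str]], Dict[str, float]]


def mean_diff_importance(X: List[Row], y: Sequence[int], features: Sequence[str]) -> Dict[str, float]:
    """Stand-in importance: |mean(class 1) - mean(class 0)| / overall SD, per feature."""
    scores = {}
    for f in features:
        col = [row[f] for row in X]
        pos = [v for v, label in zip(col, y) if label == 1]
        neg = [v for v, label in zip(col, y) if label == 0]
        sd = pstdev(col) or 1.0  # guard against constant features
        scores[f] = abs(fmean(pos) - fmean(neg)) / sd
    return scores


def recursive_feature_elimination(
    X: List[Row],
    y: Sequence[int],
    features: Sequence[str],
    n_keep: int,
    importance: Importance = mean_diff_importance,
) -> List[str]:
    """Refit `importance` on the surviving features and drop the weakest until `n_keep` remain."""
    remaining = list(features)
    while len(remaining) > n_keep:
        scores = importance(X, y, remaining)
        remaining.remove(min(remaining, key=lambda f: scores[f]))
    return remaining
```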
[0133] In some embodiments, one or more IgM biomarkers including CDH3, GDF3, RHOXF2, and UBQLN1, and/or one or more IgG biomarkers including CDK2, ETS2, IMPDH1, MAGEA2, MAGEA4V3, S100A7, SELE, SPDEF, SSX2B, TP53, and TUBA3C, are used as part of the biomarker panel for the assessment of cancer.
[0134] An additional biomarker panel includes FN1, IL-8, CD25, MMP8, and one or more of RHOXF2, SPDEF, Pro-CASP8, and COL1A1. In other classification models, one of the biomarkers provided in this paragraph is replaced with a different biomarker from Table 2.
Example for Distinguishing Benign and Malignant Masses
[0135] Tubes of serum and of EDTA and heparin plasma were prospectively collected from 600 women receiving a screening mammogram, a diagnostic mammogram, or an image-guided biopsy or localization procedure at Brigham and Women's Faulkner Hospital or Chestnut Hill Healthcare Center. All women between the ages of 18 and 89 were eligible. The study was approved by the Mass General Brigham IRB, and written informed consent was obtained from all patients. Two tubes each of serum and plasma were collected, processed within 2 hours, aliquoted, and stored at -80 °C until use.
[0136] Quantification of protein biomarkers was performed using single-molecule arrays (SIMOA) in an initial 250 samples, followed by down-selection to the top eight biomarkers, which were measured in the remaining 350 samples. Operators of SIMOA were blind to the diagnosis associated with the sample. All samples were analyzed in technical duplicate. Simultaneously, 2,252 mammographic images from 229 participants (127 normal, 67 benign, and 105 cancerous cases) were annotated by a radiologist.
[0137] Using data from blood-based biomarkers, electronic medical records (EMR), and/or mammographic images, we constructed machine learning models for the classification tasks of normal vs. cancer (breast cancer screening) and benign vs. cancer (reducing breast biopsies). Feature importance, recursive feature elimination, permutation importance, and SHAP values, which indicate a feature's (i.e., biomarker's) contribution to classification, were calculated to identify the biomarkers with the greatest contribution to the classification algorithm.
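Permutation importance, one of the measures listed above, can be sketched model-agnostically: shuffle one feature's values across samples, re-score the fitted model, and record how much performance drops. Assumptions: `predict` is any fitted model wrapped as a function over a list of feature dictionaries, `metric` is higher-is-better (accuracy is used here for brevity; AUC works equally well), and all names are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[List[Row]], List[int]],
    X: List[Row],
    y: Sequence[int],
    features: Sequence[str],
    metric: Callable[[Sequence[int], Sequence[int]], float] = accuracy,
    n_repeats: int = 10,
    seed: int = 0,
) -> Dict[str, float]:
    """Mean drop in `metric` when one feature's column is shuffled across samples."""
    rng = random.Random(seed)
    baseline = metric(y, predict(X))
    importances = {}
    for f in features:
        drops = []
        for _ in range(n_repeats):
            column = [row[f] for row in X]
            rng.shuffle(column)
            X_perm = [{**row, f: v} for row, v in zip(X, column)]  # only feature f changes
            drops.append(baseline - metric(y, predict(X_perm)))
        importances[f] = sum(drops) / n_repeats
    return importances
```

A feature the model ignores scores exactly zero, while a feature the model relies on shows a positive mean drop.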
[0138] Candidate biomarkers were selected by integrating single-cell RNA sequencing (scRNA-seq) data, gene expression profiles from The Cancer Genome Atlas (TCGA), clinical pathologist input, and literature review. The protein panel was selected from a non-exclusive list including: a) interleukins (IL-1β, IL-4, IL-5, IL-6, IL-8, IL-10, IL-12p70, IL-22); b) TNFα, IFNγ; h) NCAM1, TIMP1, COMP, FTL, OPN (osteopontin), FABP4, E-cadherin, prolactin (PRL), TFF1, TFF2, TFF3, HGF, CEACAM6, MMP1.
[0139] The stand-alone blood-based biomarker panels could effectively distinguish between normal donors and those with growths (benign or cancerous), achieving an AUC of 0.82. The stand-alone Phase 1 blood-based biomarker panels, however, were less effective in distinguishing between benign and malignant breast lesions, with an AUC of 0.66. When the blood-based biomarker analysis was limited to benign versus invasive cancers, the AUC improved to 0.71, suggesting that biomarkers from tumors confined to the ductal lumen may be less detectable in peripheral blood. Classifiers combining radiological EMR data and mammographic imaging data could effectively differentiate between benign and cancerous findings, with an AUC of 0.91.
Exemplary Fusion Operations
[0140] The fusion operations, operational flow, and grouping and fusion of multiple modalities of data may be as described in U.S. Pat. Ser. No. 19/020,833, filed Jan. 14, 2025, which is herein incorporated by reference in its entirety.
[0141] The data-processing device 100 integrates and interpolates diagnostic information obtained from two or more modalities to improve classification accuracy (e.g., diagnostic accuracy, prediction accuracy) compared to a classification that uses only a mammogram. This can reduce unnecessary biopsies. In addition, because different data are available at different facilities, the data-processing device 100 is able to adjust to the available data at a facility.
[0142] To generate classification results, the data-processing device 100 is configured to use early data fusion (early fusion), intermediate data fusion (intermediate fusion), and late data fusion (late fusion), and is also configured to determine when to use early fusion, intermediate fusion, and late fusion.
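The difference between early and late fusion can be sketched in a few lines. Early fusion concatenates the per-modality feature vectors into one input for a single model; late fusion runs a separate model per modality and combines their output probabilities, which also lets a facility skip modalities it does not have. This is a simplified illustration, not the procedure of the incorporated application: models are stand-in callables returning a malignancy probability, late fusion uses an unweighted mean, intermediate fusion (combining learned intermediate features) is omitted, and all names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Model = Callable[[Sequence[float]], float]  # returns a malignancy probability


def early_fusion(modalities: Dict[str, Sequence[float]], order: List[str], model: Model) -> float:
    """Concatenate per-modality features in a fixed order and run a single model.

    Every modality named in `order` must be present (a KeyError is raised otherwise).
    """
    fused = [v for name in order for v in modalities[name]]
    return model(fused)


def late_fusion(modalities: Dict[str, Sequence[float]], models: Dict[str, Model]) -> float:
    """Run one model per available modality and average the probabilities.

    Modalities without data at a facility are simply absent from `modalities`.
    """
    probs = [models[name](feats) for name, feats in modalities.items() if name in models]
    if not probs:
        raise ValueError("no available modality has a matching model")
    return sum(probs) / len(probs)
```

Because early fusion needs every modality at once while late fusion degrades gracefully when one is missing, the choice between them naturally depends on which data a facility can provide.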
[0143] An output set may define all the possible classification results that can be output by the ML models (in one or both of early fusion and late fusion) and the late-fusion models. For example, some output sets include only the following classification results: healthy, benign, and malignant. Thus, in such embodiments, when an ML model or late-fusion model outputs a classification result, the output classification result is one (and, in some embodiments, only one) of the following: healthy, benign, and malignant. Also, for example, other output sets include different cancers, such as breast cancer, lung cancer, gastric cancer, liver cancer, pancreatic cancer, colorectal cancer, prostate cancer, hepatocellular cancer, cervical cancer, ovarian cancer, bladder cancer, cancer of the urinary tract, thyroid cancer, renal cancer, carcinoma, melanoma, leukemia, or brain cancer. The classification result may also include the probability of occurrence, for example, breast cancer = 75%, lung cancer = 20%, etc.
[0144] For example, if the classification objective is diagnosing the presence or absence of breast cancer, then the output set may be constituted by the following: healthy, benign, and malignant. Or the output set may be constituted by breast cancer, lung cancer, other cancer, and no cancer. Also, as noted above, the classification result may include each member of the output set along with a respective probability of occurrence.
[0145] The following describes an operational flow for generating a classification result from multiple modalities of data. Although this operational flow and the other operational flows that are described herein are each presented in a certain order, some embodiments may perform at least some of the operations in different orders than the presented orders. Examples of different orders include concurrent, parallel, overlapping, reordered, simultaneous, incremental, and interleaved orders.
[0146] Furthermore, although the operational flows in the embodiments that are described herein are performed by a data-processing device 100, in some embodiments, the operational flows are performed by two or more data-processing devices 100 or by one or more other specially configured computing devices.
[0147] The data-processing device 100 obtains data, one or more machine-learning models (ML models), one or more late-fusion models, and classification settings.
[0148] The classification settings include one or more grouping criteria (which may be defined by one or more grouping policies). The classification settings may also include, for example, one or more classification objectives, such as determining a presence or absence of a disease (e.g., breast cancer) or condition, or one or more output sets. The classification settings may also include selections of the ML models and the late-fusion models that are to be used.
Exemplary Machine Learning Operations
[0149] In FIG. 12(a) (and similarly in FIG. 12(b)), the neurons (i.e., nodes) are depicted by circles around a threshold function. For the non-limiting example shown in FIG. 12(a), the inputs are depicted as circles around a linear function, and the arrows indicate directed connections between neurons. In certain implementations, the neural network is a feedforward network, as exemplified in FIG. 12(a) and FIG. 12(b) (e.g., it can be represented as a directed acyclic graph).
[0150] In some embodiments, the machine learning uses a neural network that operates to achieve a specific task, such as resampling data or refining a pathlength, by searching within the class of functions $F$ to learn, using a set of observations, to find $m^* \in F$, which solves the specific task in some optimal sense (e.g., the stopping criteria discussed above). For example, in certain implementations, this can be achieved by defining a cost function $C: F \to \mathbb{R}$ such that, for the optimal solution $m^*$, $C(m^*) \le C(m)\ \forall m \in F$ (i.e., no solution has a cost less than the cost of the optimal solution). The cost function is a measure of how far a particular solution is from an optimal solution to the problem to be solved (e.g., the error). Learning algorithms iteratively search through the solution space to find a function that has the smallest possible cost. In certain implementations, the cost is minimized over a sample of the data (i.e., the training data).
[0151] In some embodiments, the neural network is a convolutional neural network (CNN), and FIG. 12(b) illustrates an example embodiment of a CNN. CNNs are a type of feed-forward artificial neural network (ANN) in which the connectivity pattern between neurons can represent convolutions. For example, CNNs can be used for image-processing optimization by using multiple layers of small neuron collections that process portions of the input data (e.g., projection data), called receptive fields. The outputs of these collections can then be tiled so that they overlap. This processing pattern can be repeated over multiple layers having alternating convolution and pooling layers.
[0152] FIG. 12(c) illustrates an example of implementing a convolution layer for one neuronal node of the convolution layer, according to an example embodiment. FIG. 12(c) shows an example of a 4 x 4 kernel being applied to map values from an input layer representing a two-dimensional image (e.g., a sinogram) to a first hidden layer, which is a convolution layer. The kernel maps respective 4 x 4 pixel regions to corresponding neurons of the first hidden layer.
[0153] Following a convolution layer, a CNN can include local or global pooling layers, which combine the outputs of neuron clusters in the convolution layers. Additionally, in certain implementations, the CNN can also include various combinations of convolution and fully connected layers, with pointwise nonlinearity applied at the end of or after each layer.
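The 4 x 4 receptive-field mapping of FIG. 12(c), followed by a pointwise nonlinearity and a pooling layer, can be sketched in plain Python. Assumptions not specified in the figure: a single channel, stride 1, no padding, ReLU as the nonlinearity, 2 x 2 non-overlapping max pooling, and a hypothetical averaging kernel and toy 6 x 6 input. As in common CNN libraries, the "convolution" is computed as cross-correlation (the kernel is not flipped).

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Single-channel convolution, stride 1, no padding.

    Each output neuron sums one kh x kw receptive field of the input,
    weighted by the kernel.
    """
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(w - kw + 1)
        ]
        for i in range(h - kh + 1)
    ]


def relu(fmap: Matrix) -> Matrix:
    """Pointwise nonlinearity applied after the convolution layer."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling; edge rows/columns that do not fill a window are dropped."""
    return [
        [
            max(fmap[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]


image = [[float(r * 6 + c) for c in range(6)] for r in range(6)]  # toy 6 x 6 "image"
kernel = [[1.0 / 16] * 4 for _ in range(4)]  # 4 x 4 averaging kernel
features = max_pool(relu(conv2d_valid(image, kernel)))
print(features)  # [[17.5]]
```

The 4 x 4 kernel turns the 6 x 6 input into a 3 x 3 feature map, and 2 x 2 pooling then keeps the largest response in the single complete window.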
[0154] In some embodiments, machine learning or other algorithms that are not neural networks are used. For example, a supervised learning algorithm such as a gradient boosting machine (GBM), a random forest, or a support vector machine (SVM) may be used. In other embodiments, an unsupervised learning algorithm such as principal component analysis (PCA) may be used. In some embodiments, the levels of the biomarkers are used to calculate a score, e.g., along with one or more additional variables, e.g., a variable from the EMR. The score can be calculated, e.g., using an algorithm such as summation, or weighted summation, of the (normalized) levels of the biomarkers. Specific algorithms may be those as described herein or in U.S. Pat. Ser. No. 19/020,833, filed Jan. 14, 2025. Similarly, specific algorithms can be identified using known statistical methods including PCA, linear regression, SVM (support vector machine), decision tree, KNN (K-nearest neighbors), K-means, gradient boosting, or random forest methods. Other algorithms that may be useful include CatBoost, LightGBM, extreme gradient boosted classifier (XGBoost), logistic regression, linear regression, TabPFN, Mitra, KNN, ridge, etc. For example, in some embodiments, an exemplary model uses a logistic regression analysis wherein each variable (biomarker, $X_i$) gets a weight ($\beta_i$). In the exemplary equation below, the weights ($\beta$) are calculated for each marker, and there can be a unique $\beta$ value for each of the biomarkers:
$$P = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n)}}$$
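Applying the logistic regression model described in this paragraph to one patient can be sketched as below: the intercept plus the weighted sum of biomarker levels is passed through the logistic (sigmoid) function to give a probability. The biomarker names `BM1`/`BM2`, the weights, and the intercept in the usage are hypothetical placeholders, not fitted values from this disclosure; in practice they come from training, typically on normalized biomarker levels.

```python
import math
from typing import Dict


def cancer_probability(levels: Dict[str, float], weights: Dict[str, float], intercept: float) -> float:
    """Logistic model P = 1 / (1 + exp(-(b0 + sum_i b_i * x_i))).

    `levels` maps biomarker name -> measured level (X_i); `weights` maps the
    same names -> fitted weight (beta_i); `intercept` is beta_0.
    """
    missing = sorted(set(weights) - set(levels))
    if missing:
        raise KeyError(f"missing biomarker measurements: {missing}")
    z = intercept + sum(w * levels[name] for name, w in weights.items())
    # Numerically stable sigmoid: avoids exp() overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


weights = {"BM1": 0.5, "BM2": -0.25}  # hypothetical fitted weights
print(round(cancer_probability({"BM1": 4.0, "BM2": 2.0}, weights, intercept=-1.0), 4))  # 0.6225
```

Raising an error for a missing measurement, rather than silently treating it as zero, keeps an incomplete panel from producing a misleading probability.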
[0155] In the clinic, the measured biomarker values (X values) can be used to obtain a probability score that a patient has or will develop cancer by plugging the measured biomarker values (X) into the equation and calculating a probability value (P).
Exemplary Fusion Framework Models
[0156] In some embodiments of the fusion framework, the clinical decision for the assessment of a cancer is made using a combination of a biomarker model, an EMR model, and a medical imaging model. This involves the following steps or models: (a) Providing data sources from different modalities, where these data sources include at least information from a blood-based protein biomarker panel, EMRs, and medical imaging such as mammograms; (b) Preparing standalone models, namely a biomarker model, an EMR model, a non-image model, and a medical image model, where the biomarker model comprises two or more different models using different subsets of biomarkers; (c) Fusing data from at least two different modalities, where the multi-modality data is fused by late fusion, which integrates the results of the biomarker model, EMR model, non-image model, and medical image model, and where the fusion strategy is characterized by fusing weights calculated from performance indicators, such as the AUC, of the stand-alone models; and (d) Determining whether the result, compared to that of a healthy subject or a benign subject, is indicative of the presence of a malignant cancer disease in the subject.
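The AUC-based fusing weights in step (c) can be sketched as a weighted mean of the stand-alone models' probabilities. Assumption: each weight is taken to be proportional to the model's validation AUC; the text says only that the weights are calculated from performance indicators such as the AUC, so other rules (for example, weighting by AUC minus 0.5) are equally plausible. Model names and numbers in the usage are hypothetical.

```python
from typing import Dict


def auc_weighted_fusion(probs: Dict[str, float], aucs: Dict[str, float]) -> float:
    """Weighted mean of stand-alone model probabilities, weights proportional to AUC.

    Models with no probability for this patient (e.g., modality unavailable)
    are skipped, and the remaining weights are renormalized.
    """
    available = [name for name in probs if name in aucs]
    if not available:
        raise ValueError("no model has both a probability and an AUC")
    total = sum(aucs[name] for name in available)
    return sum(aucs[name] * probs[name] for name in available) / total


probs = {"biomarker": 0.62, "emr": 0.40, "image": 0.85}  # hypothetical model outputs
aucs = {"biomarker": 0.80, "emr": 0.70, "image": 0.90}   # hypothetical validation AUCs
print(round(auc_weighted_fusion(probs, aucs), 3))  # 0.642
```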
[0157] In some embodiments of the fusion framework, the clinical decision for the assessment of a cancer includes data from the non-image model. The following steps or models are considered for this non-image model: (a) Using a multivariate machine learning classification algorithm with a decision tree-based classifier or a Support Vector Machine classifier; (b) Combining biomarker data (where the biomarker model comprises two or more different models using different subsets of biomarkers) with EMR data to build the classification model, using at least the following features in the dataset for benign vs. malignant discrimination (non-image selected features; BM+EMR): personal history of breast cancer, lesion size, no new symptoms, breast tissue density, age, personal history of benign findings, diagnostic BI-RADS score (Breast Imaging Reporting and Data System score), non-use of hormone-based contraceptives, imaging findings other than mass, and FN1.
[0158] In some embodiments of the fusion framework, the clinical decision for the assessment of a cancer includes data from a medical image model combined with a biomarker model. The following steps or models are considered for this medical image model: (a) Using a deep learning algorithm with a CoaT-Lite network structure for the medical imaging model. (b) Having logic to integrate the prediction probability calculated for each image into a study-level probability to support patient- or study-level decisions when multiple images are available for the same patient or exam. (c) Integrating multiple cross-sectional images, such as mediolateral oblique (MLO) and craniocaudal (CC) images, including 2D images, 2D composite images, and 3D tomosynthesis images; multiple images from the right breast and/or the left breast; and/or multiple images from different imaging modalities such as mammography, ultrasound, and MRI. (d) Preparing standalone biomarker models comprising two or more different models using different subsets of biomarkers. (e) Fusing data from the medical imaging model and the two or more biomarker models, where the multi-modality data is fused by late fusion. (f) Determining whether the result, compared to that of a healthy subject or a benign subject, is indicative of the presence of a malignant cancer disease in the subject.
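The step of integrating per-image probabilities into a study-level probability can be sketched as follows. The aggregation rule is an assumption, not taken from this disclosure: views of the same breast (e.g., CC and MLO) are averaged, and the study takes the maximum over breasts, on the reasoning that a malignancy in either breast makes the study positive. The tuple layout and the function name are hypothetical.

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, Iterable, List, Tuple


def study_probability(image_probs: Iterable[Tuple[str, str, float]]) -> float:
    """Aggregate per-image malignancy probabilities into one study-level value.

    `image_probs` holds (laterality, view, probability) tuples, e.g. ("L", "CC", 0.31).
    Views of the same breast are averaged; the study takes the maximum over breasts.
    """
    by_breast: Dict[str, List[float]] = defaultdict(list)
    for laterality, _view, p in image_probs:
        by_breast[laterality].append(p)
    if not by_breast:
        raise ValueError("no images for this study")
    return max(fmean(ps) for ps in by_breast.values())


exam = [("L", "CC", 0.2), ("L", "MLO", 0.4), ("R", "CC", 0.1), ("R", "MLO", 0.1)]
print(round(study_probability(exam), 2))  # 0.3
```

Averaging within a breast smooths view-to-view noise, while taking the maximum across breasts keeps a unilateral finding from being diluted by a normal contralateral breast.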
[0159] In some embodiments, ensembling the image analysis improved scoring. In these embodiments, the image analysis ensemble was composed of several pre-trained CNNs: InceptionV3, VGG16, MobileNetV2, DenseNet121, and InceptionResNetV2. This ensemble may outperform the individual CNNs because it combines the features extracted by each CNN, which mitigates feature memorization by a single CNN. In this model, the features were concatenated end-to-end and normalized to form a single vector, which was fed to a linear classifier to obtain the final prediction. In one embodiment, the model was trained for 30 epochs with a batch size of 32 using the Adam optimizer with a learning rate of 1e-3.
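The feature-level ensemble described in this paragraph can be sketched as follows: feature vectors from each backbone are concatenated end-to-end, normalized, and scored by a linear head. Assumptions: L2 normalization (the text says only "normalized"), toy vectors standing in for real InceptionV3/VGG16/MobileNetV2/DenseNet121/InceptionResNetV2 embeddings, the convention that a positive score favors the malignant class, and hypothetical function names. Training of the linear head is not shown.

```python
import math
from typing import List, Sequence


def fuse_cnn_features(feature_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate per-backbone feature vectors end-to-end, then L2-normalize."""
    fused = [v for vec in feature_vectors for v in vec]
    norm = math.sqrt(sum(v * v for v in fused))
    if norm == 0:
        raise ValueError("cannot normalize an all-zero feature vector")
    return [v / norm for v in fused]


def linear_score(features: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> float:
    """Linear classifier head; a positive score favors the malignant class."""
    if len(features) != len(weights):
        raise ValueError("feature and weight lengths differ")
    return bias + sum(f * w for f, w in zip(features, weights))


# Toy stand-ins for two backbones' pooled embeddings (real ones are far longer).
fused = fuse_cnn_features([[3.0], [4.0]])
print(fused)  # [0.6, 0.8]
```

Normalizing after concatenation puts the combined vector on a common scale, so no single backbone's feature magnitude dominates the linear classifier simply because its activations are larger.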
[0160] In some embodiments, a vision transformer has been found to provide superior performance. Specifically, the CoaT-Lite algorithm may be used. In other embodiments, a variety of large language models (LLMs) and other machine learning algorithms may be used to create the fusion framework.
Exemplary Data Processing Operations
[0161] FIG. 13 illustrates an example embodiment of a data-processing device 100. The data-processing device 100 may be a computer system or may include common components of a computer system. As shown in FIG. 13, the data-processing device 100 includes processing circuitry 150, one or more input interface circuits 152, memory 154, and storage 156. Also, the hardware components of the data-processing device 100 communicate via one or more buses 158 or other electrical connections. Examples of buses 158 include a universal serial bus (USB), an IEEE 1394 bus, a PCI bus, an Accelerated Graphics Port (AGP) bus, a Serial AT Attachment (SATA) bus, and a Small Computer System Interface (SCSI) bus. The data-processing device 100 may comprise one or more processors (e.g., a central processing unit (CPU) or a micro processing unit (MPU)) and may include a network of separate computers or separate processors to read out and execute the computer-executable instructions. The computer-executable instructions may be provided to the data-processing device 100, for example, from a network or the storage medium.
[0162] The one or more input interface circuits 152 include communication components (e.g., a GPU, a network-interface controller, electrical interfaces) that communicate with a display, a network (not illustrated), and other input or output devices (not illustrated), which may include a keyboard, a mouse, a printing device, a touch screen, a light pen, an optical-storage device, a scanner, a microphone, a drive, a joystick, and a control pad, for example. An I/O interface can be used to provide communication interfaces to the input and output devices.
[0163] The memory 154 includes one or more computer-readable storage media (e.g., RAM), and the memory 154 provides a work area for the processing circuitry 150.
[0164] The storage 156 includes one or more computer-readable storage media. As used herein, a computer-readable storage medium is a computer-readable medium that includes an article of manufacture, for example a magnetic disk (e.g., a floppy disk, a hard disk), an optical disc (e.g., a CD, a DVD, a Blu-ray), a magneto-optical disk, magnetic tape, and semiconductor memory (e.g., a non-volatile memory card, flash memory, a solid-state drive, SRAM, DRAM, EPROM, EEPROM), and which may also be referred to more fully as a 'non-transitory computer-readable storage medium'. A server connected to the data-processing device 100 may also serve as the storage 156. The storage 156, which may include ROM and/or RAM, can store computer-readable data or computer-executable instructions. Also, the storage 156 is an example of a storage unit.
[0165] The data-processing device 100 as described in this example additionally includes a data-acquisition module 160, a data-grouping module 162, an ML-model-selection module 164, an ML-model-execution module 166, a calculation module 168, and a communication module 170. A module includes logic, computer-readable data, or computer-executable instructions. In the embodiment shown in FIG. 13, the modules are implemented in software (e.g., Assembly, C, C++, C#, Java, BASIC, Perl, Visual Basic, Python). However, in some embodiments, the modules are implemented in hardware (e.g., customized circuitry) or, alternatively, a combination of software and hardware. When the modules are implemented, at least in part, in software, then the software can be stored in the storage 156. Also, in some embodiments, the data-processing device 100 includes additional or fewer modules, the modules are combined into fewer modules, or the modules are divided into more modules. Furthermore, the storage 156 includes a model repository 172, which stores machine-learning models and late-fusion models; a grouping-criteria repository 174, which stores grouping criteria (e.g., grouping policies); and a data, classification results, and features repository 176, which stores obtained and generated data, classification results, and features that are extracted from data using ML models.
[0166] The data-acquisition module 160 includes instructions that cause the applicable components (e.g., the processing circuitry 150, the input interface circuit 152, the memory 154) of the data-processing device 100 to obtain data in multiple modalities. For example, some embodiments of the data-acquisition module 160 include instructions that cause the applicable components of the data-processing device 100 to acquire image data 200, biomarker data 300, and/or EMR data 400. The applicable components operating according to the data-acquisition module 160 realize an example of a data-acquisition unit.
[0167] The data-grouping module 162 includes instructions that cause the applicable components (e.g., the processing circuitry 150, the input interface circuit 152, the memory 154) of the data-processing device 100 to group data into multi-modality (MM) groups according to one or more grouping criteria, which may be included in grouping policies; to remove data with low intra-group, inter-modality correlations from MM groups; and to generate groups (e.g., high-correlation groups) that include multiple intermediate-data groups or that include at least one intermediate-data group with at least one single-modality (SM) group. For example, some embodiments of the data-grouping module 162 include instructions that cause the applicable components of the data-processing device 100 to perform grouping operations. The applicable components operating according to the data-grouping module 162 realize an example of a data-grouping unit.
[0168] The model-selection module 164 includes instructions that cause the applicable components (e.g., the processing circuitry 150, the input interface circuit 152, the memory 154) of the data-processing device 100 to select one or more machine-learning models or one or more late-fusion models based on one or more criteria and to add late-fusion flags to data based on one or more criteria. Examples of the criteria include the following: obtained data (e.g., the modalities of obtained data, the content of obtained data), a classification objective, previously selected ML models, and outputs of other ML models. For example, some embodiments of the model-selection module 164 include instructions that cause the applicable components of the data-processing device 100 to select one or more specified models. Applicable components operating according to the model-selection module 164 realize an example of a model-selection unit.
[0169] The model-execution module 166 includes instructions that cause the applicable components (e.g., the processing circuitry 150, the input interface circuit 152, the memory 154) of the data-processing device 100 to input data into one or more ML models or into one or more late-fusion models, to execute the ML models or late-fusion models, to obtain the outputs of the ML models or late-fusion models, and to store the outputs of the ML models or late-fusion models in the data, classification results, and features repository 176. For example, some embodiments of the model-execution module 166 include instructions that cause the applicable components of the data-processing device 100 to execute one or more models. The applicable components operating according to the model-execution module 166 realize an example of a model-execution unit.
[0170] The calculation module 168 includes instructions that cause the applicable components (e.g., the processing circuitry 150, the input interface circuit 152, the memory 154) of the data-processing device 100 to calculate intra-group, inter-modality correlations between data in groups and to calculate inter-group correlations between data in different groups (intermediate-data groups and single-modality groups). For example, some embodiments of the calculation module 168 include instructions that cause the applicable components of the data-processing device 100 to perform at least some of the calculations as described hereinabove as well as other known calculations. The applicable components operating according to the calculation module 168 realize an example of a calculation unit.
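How the inter-group correlation is computed is not specified in this passage. One plausible reading, sketched below as an assumption only, summarizes each group (intermediate-data or single-modality) as the element-wise mean of its members' per-subject values and then correlates those summaries pairwise using Pearson's r. The names `group_summary` and `inter_group_correlations` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def group_summary(members):
    """Element-wise mean of a group's per-subject value vectors."""
    return [sum(column) / len(members) for column in zip(*members)]


def inter_group_correlations(groups):
    """groups: group name -> list of per-subject value vectors.

    Returns {(name_a, name_b): correlation} for every unordered pair of groups.
    """
    summaries = {name: group_summary(members) for name, members in groups.items()}
    return {(a, b): pearson(summaries[a], summaries[b])
            for a, b in combinations(sorted(summaries), 2)}


if __name__ == "__main__":
    groups = {
        "mm_1": [[1, 2, 3, 4], [3, 4, 5, 6]],   # intermediate-data group
        "sm_emr": [[2, 4, 6, 8]],               # single-modality group
        "sm_img": [[4, 3, 2, 1]],               # single-modality group
    }
    for pair, r in inter_group_correlations(groups).items():
        print(pair, round(r, 3))
```

Averaging is only one way to summarize a group; a per-member maximum or a learned embedding would be equally compatible with the text above.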
[0171] The communication module 170 includes instructions that cause the applicable components (e.g., the processing circuitry 150, the input interface circuit 152, the memory 154) of the data-processing device 100 to communicate with other devices, such as other computing devices (e.g., consoles, servers, databases, terminals), input devices, and output devices; to obtain grouping criteria, ML models, late-fusion models, and classification settings; and to output second classification results. The applicable components operating according to the communication module 170 realize an example of a communication unit. Also, when controlling a display on a display device (e.g., to display classification results), the applicable components operating according to the communication module 170 realize an example of a display-control unit. Furthermore, when receiving classification settings (e.g., grouping criteria, classification objectives) from an input device, the applicable components operating according to the communication module 170 realize an example of a configuration unit.
Definitions
[0172] In the description, specific details are set forth to provide a thorough understanding of the embodiments disclosed. However, well-known methods, procedures, components, and circuits may not have been described in detail to avoid unnecessarily lengthening the present disclosure.
[0173] Throughout the figures, where possible, the same reference numerals and characters, unless otherwise stated, are used to denote like features, elements, components, or portions of the illustrated embodiments. In addition, while the subject disclosure is described in detail with reference to the enclosed figures, it is done so in connection with illustrative exemplary embodiments. It is intended that changes and modifications can be made to the described exemplary embodiments without departing from the true scope of the subject disclosure as defined by the appended claims. Although the drawings represent some possible configurations and approaches, the drawings are not necessarily to scale and certain features may be exaggerated, removed, or partially sectioned to better illustrate and explain certain aspects of the present disclosure. The descriptions set forth herein are not intended to be exhaustive or otherwise limit or restrict the claims to the precise forms and configurations shown in the drawings and disclosed in the following detailed description.
[0174] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting. The use of the terms “a” and “an” and “the” and similar referents in the context of describing the disclosure (especially in the context of the following claims) are to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. The terms “comprising,” “having,” “includes,” “including,” and “containing” are to be construed as open-ended terms (i.e., meaning “including, but not limited to,”) unless otherwise noted. Specifically, these terms, when used in the present specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof not explicitly stated. Recitation of ranges of values herein is merely intended to serve as a shorthand method of referring individually to each separate value falling within the range, unless otherwise indicated herein, and each separate value is incorporated into the specification as if it were individually recited herein. For example, if the range 10-15 is disclosed, then 11, 12, 13, and 14 are also disclosed. All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g., “such as”) provided herein, is intended merely to better illuminate the disclosure and does not pose a limitation on the scope of the disclosure unless otherwise claimed. No language in the specification should be construed as indicating any non-claimed element as essential to the practice of the disclosure.
[0175] It should be understood that if an element or part is referred to herein as being "on", "against", "connected to", or "coupled to" another element or part, then it can be directly on, against, connected to, or coupled to the other element or part, or intervening elements or parts may be present. In contrast, if an element is referred to as being "directly on", "directly connected to", or "directly coupled to" another element or part, then there are no intervening elements or parts present. When used, the term "and/or" includes any and all combinations of one or more of the associated listed items, if so provided.
[0176] Spatially relative terms, such as “under”, “beneath”, "below", "lower", "above", "upper", “proximal”, “distal”, and the like, may be used herein for ease of description to describe one element or feature's relationship to another element(s) or feature(s) as illustrated in the various figures. It should be understood, however, that the spatially relative terms are intended to encompass different orientations of the device in use or operation in addition to the orientation depicted in the figures. For example, if the device in the figures is turned over, elements described as "below" or "beneath" other elements or features would then be oriented "above" the other elements or features. Thus, a relative spatial term such as "below" can encompass both an orientation of above and below. The device may be otherwise oriented (rotated 90 degrees or at other orientations) and the spatially relative descriptors used herein are to be interpreted accordingly. Similarly, the relative spatial terms “proximal” and “distal” may also be interchangeable, where applicable.
[0177] The term “about,” as used herein, means, for example, within 10%, within 5%, or less. In some embodiments, the term “about” may mean within measurement error. Similarly, the terms “equal” or “being equal” refer to being approximately equal (e.g., not statistically different, or not different within measurement error).
[0178] Also, as used herein, the conjunction “or” generally refers to an inclusive “or,” although “or” may refer to an exclusive “or” if expressly indicated or if the context indicates that the “or” must be an exclusive “or.”
[0179] Moreover, as used herein, the terms “first,” “second,” and so on, do not necessarily denote any ordinal, sequential, or priority relation and may be used to more clearly distinguish one member, operation, element, group, collection, set, region, section, etc. from another without expressing any ordinal, sequential, or priority relation. Thus, a first member, operation, element, group, collection, set, region, section, etc. discussed below could be termed a second member, operation, element, group, collection, set, region, section, etc. without departing from the teachings herein.
[0180] Various terms are used to describe position/orientation in a three-dimensional space. As used herein, the term “position” refers to the location of an object or a portion of an object in the three-dimensional space (e.g., three degrees of translational freedom along Cartesian X, Y, Z coordinates); the term “orientation” refers to the rotational placement of an object or a portion of an object (three degrees of rotational freedom, e.g., roll, pitch, and yaw); the term “posture” refers to the position of an object or a portion of an object in at least one degree of translational freedom and to the orientation of that object or portion of the object in at least one degree of rotational freedom (up to a total of six degrees of freedom); the term "shape" refers to a set of postures, positions, and/or orientations measured along the elongated body of the object. The terms “proximal” and “distal” are used with reference to the manipulation of an end of an instrument (such as a surgical instrument) extending from the user to a site or region of interest. In this regard, the term “proximal” refers to the portion of the instrument closer to the user, and the term “distal” refers to the portion of the instrument farther away from the user and closer to a site or region of interest.
[0181] As used herein, the term “biological sample” or “sample”, when referring to the material to be tested for the presence of a biological marker using the method of the invention, includes, inter alia, whole blood, plasma, urine, cerebrospinal fluid, lymphatic fluid, pleural fluid, saliva, synovial fluid, or serum. If needed, various methods are well known within the art for the identification and/or isolation and/or purification of a biological marker from a sample. The sample may be from a standard venous blood sample.
[0182] An “isolated” or “purified” biological marker is substantially free of cellular material or other contaminants from the cell or tissue source from which the biological marker is derived, i.e., partially or completely altered or removed from the natural state through human intervention. For example, proteins contained in the sample can be isolated according to standard methods, for example using lytic enzymes or chemical solutions, or isolated by protein-binding resins following the manufacturer's instructions.
[0183] As used herein, the term “label” may refer to the coupling (i.e., physical linkage) of a detectable substance, such as a radioactive agent or fluorophore (e.g., phycoerythrin (PE) or indocyanine (Cy5)), to an antibody or probe, as well as indirect labeling of the probe or antibody (e.g., with horseradish peroxidase (HRP)) by reactivity with a detectable substance. As used herein, the term “label” may alternatively refer to a “part of a class”; for example, label = 0 can be used to define a class of healthy patients.
[0184] Breast cancer includes various invasive cancer types, including Invasive Ductal Carcinoma (IDC), Invasive Lobular Carcinoma (ILC), Invasive Ductal and Lobular Carcinoma (IDLC), and Inflammatory Breast Cancer. The term breast cancer also includes Ductal Carcinoma in situ (DCIS) and microinvasive carcinomas.
[0185] As used herein, “benign” or “benign breast disease” refers to a group of conditions that have changes in breast tissue that are benign (not cancerous). Benign breast disease includes conditions caused by an increase in the number of cells or by the growth of abnormal cells in the breast ducts or lobes. The most common benign breast disease causing a lump in the breast is fibroadenoma. Other common benign breast diseases include fibrosis and simple cysts, hyperplasia (ductal or lobular), lobular carcinoma in situ (LCIS), adenosis, fibroadenomas, phyllodes tumors, intraductal papillomas, fat necrosis and oil cysts, mastitis, duct ectasia, flat epithelial atypia, fibroepithelial lesion, microcalcifications and calcifications, radial scars and other non-cancerous breast conditions.
[0186] As used herein, “assessing” a cancer includes one or more of the diagnosis, prognosis, treatment, stratification, and monitoring of the cancer. The method may assess whether a subject has a cancer, such as breast cancer, or a benign breast disease or condition, such as various fibrocystic breast changes. As used herein, the term “diagnosis” includes the identification of the nature or type of cancer. As used herein, the term “prognosis” includes providing information as to the likely course of the cancer, the likely response to or effect of a treatment such as, for example, whether a chemotherapeutic will work, whether a CDK4/6 inhibitor will work, whether an immunotherapy (e.g., a checkpoint inhibitor) will work, or whether a mutation is likely to occur. As used herein, the term “treatment” includes providing a pharmaceutical, providing a radiopharmaceutical, surgery, radiation, etc.
[0187] The term “stratification” or “risk stratification” refers to the organization of a patient into a subgroup, such as by age. More specifically, “stratification” may also refer to “risk stratification,” which is the organization of a patient into a subgroup based on, e.g., cancer history, stage, cancer subtype, and/or the level of care they need. Risk stratification can be used to identify patients who are at higher risk of progression or death. It can be used to tailor treatment paths and screening intervals and to provide more resources to patients at higher risk.
[0188] “Monitoring” a cancer involves one or more instances in which data are obtained to understand the current status of a known disease or treatment regimen.
[0189] An “antibody” is a protein which is any polypeptide or complex of polypeptides having an immunoglobulin domain sequence. The immunoglobulin domain sequence is as found in natural antibodies, whether such polypeptide is naturally produced (e.g., generated by an organism reacting to an antigen), or produced by recombinant engineering, chemical synthesis, or other artificial system or methodology. The antibody may be polyclonal or monoclonal and may have constant region sequences that are characteristic of mouse, rabbit, primate, or human antibodies. The antibody as described herein may be, but is not limited to, intact IgA, IgG, IgE or IgM antibodies and antibody fragments such as Fab fragments and F(ab')2 fragments. In some embodiments, an antibody may lack a covalent modification (e.g., attachment of a glycan) that it would have if produced naturally.
[0190] It will be appreciated that the methods and compositions of the instant disclosure can be incorporated in the form of a variety of embodiments, only a few of which are disclosed herein. Variations of those embodiments may become apparent to those of ordinary skill in the art upon reading the foregoing description. Skilled artisans are expected and understood to employ such variations as appropriate, and the present disclosure is intended to be practiced otherwise than as specifically described herein. Accordingly, this disclosure includes all modifications and equivalents of the subject matter recited in the claims appended hereto as permitted by applicable law. Moreover, any combination of the above-described elements in all possible variations thereof is encompassed by the disclosure unless otherwise indicated herein or otherwise clearly contradicted by context.
[0191] While the present disclosure has been described with reference to exemplary embodiments, it is to be understood that the disclosure is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.
[0192] While certain embodiments have been described, these embodiments have been presented by way of example only and are not intended to limit the scope of every embodiment. Indeed, the novel embodiments described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions, and changes in the form of the embodiments described herein may be made without departing from the spirit of the disclosure. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the disclosure.
Claims
1. A method of assessing a biological sample from a subject for a cancer comprising: determining, in the biological sample, a concentration for each of at least three protein biomarkers selected from Table 1; analyzing the concentrations of the at least three protein biomarkers using a trained biomarker model to obtain a model output; wherein the biomarker model has been trained to discriminate between cancer samples and benign samples; and providing the model output for the assessment of a cancer.
2. The method of Claim 1, wherein the assessment of a cancer is a diagnosis of a breast cancer.
3. The method of Claim 1, wherein the assessment of a cancer is a stratification of a breast cancer.
4. The method of Claim 1, wherein the biological sample is a blood sample.
5. The method of Claim 1, wherein the concentrations of at least four (4) protein biomarkers selected from Table 1 are analyzed.
6. The method of Claim 5, wherein the concentrations of at least six (6) protein biomarkers are analyzed.
7. The method of Claim 1, wherein the at least three protein biomarkers selected from Table 1 are selected from Table 2.
8. The method of Claim 7, wherein the at least three protein biomarkers include at least three (3) of MMP8; IL-8; CD69; HER3; PTX3; CYR61.
9. The method of Claim 7, wherein the at least three protein biomarkers include at least three (3) of MMP8; IL-8; p21; FN1; and CD25.
10. The method of Claim 7, wherein the at least three protein biomarkers include at least one
11. The method of Claim 7, wherein the biomarkers include at least two of CD25, MMP8, IL-8, and FN1 and at least one biomarker selected from DNAH10, SHOX2, OVGP1, ITGA11, TGFA, and ENTHD1.
12. The method of Claim 7, wherein the at least three protein biomarkers include at least one (1) of RHOXF2, SPDEF, Pro-CASP8, and COL1A1.
13. The method of Claim 7, wherein the biomarkers include at least two of CD25, MMP8, IL-8, and FN1 and at least one biomarker selected from RHOXF2, SPDEF, Pro-CASP8, and COL1A1.
14. The method of Claim 7, wherein the at least three protein biomarkers include at least three (3) of one of the following panels: a. p21, CD69, IL-8, MMP8, SCGB2A2/Mammaglobin, TFF1, CYR61, FN1, PTX3, TNFa, HER3, CXCL9, HGF, ADAM8, CA15-3, CD25, EGFR, HER4, and PR; b. p21, CD69, IL-8, MMP8, and SCGB2A2/Mammaglobin; c. TFF1, CYR61, FN1, PTX3, TNFa, and HER3; d. PTX3, TNFa, FABP4, HGF, FN1, and COMP; e. PTX3, CYR61, HSP70, TFF1, FN1, and HER3; f. PTX3, CYR61, TNFa, and FN1; g. FN1, p21, and IL-8; h. DNAH10, SHOX2, OVGP1, ITGA11, TGFA, ENTHD1; and i. FGD6, ZWILCH, NOP56, TECPR1, AKT3, LCP1, MINDY4B.
15. The method of Claim 14, wherein the at least three protein biomarkers include at least four (4) biomarkers from one of panels (a)-(f) or (h)-(i).
16. The method of Claim 1, wherein analyzing the concentrations of the at least three protein biomarkers comprises using an enzyme-linked immunosorbent assay (ELISA) or a digital ELISA.
17. The method of Claim 16, wherein the digital ELISA is a Single-Molecule Array (SIMOA).
18. The method of Claim 1, wherein the concentrations are concentrations that are relative to a test standard or a housekeeping protein concentration.
19. The method of Claim 1, further comprising: obtaining image data of the subject; analyzing the image data using an image model to obtain an image model output; fusing the model output and the image model output to obtain a fused result; and providing information for assessment of a cancer by using the fused result.
20. The method of Claim 19, wherein the image model comprises a deep learning algorithm with a CoaT-Lite network structure.
21. The method of Claim 19, wherein the image model includes logic to integrate the prediction probabilities calculated from the individual images into a study-level probability, and wherein the image data comprises a plurality of images from the subject.
22. The method of Claim 19, wherein the image data comprises one or more of: multiple cross-sectional images; multiple types of images, multiple images from right breast, left breast; and multiple images from different image modalities.
23. The method of Claim 19, further comprising: obtaining annotated image data from a radiologist based on a radiological analysis of the image data.
24. The method of Claim 1, further comprising: obtaining electronic medical record (EMR) data of the subject; analyzing the EMR data using a record model to obtain a record model output; fusing the model output and the record model output to obtain a fused result; and providing information for assessment of a cancer by using the fused result.
25. The method of Claim 1, further comprising: obtaining image data of the subject; obtaining electronic medical record (EMR) data of the subject; analyzing the concentrations of the at least three protein biomarkers, the image data, and the EMR data using the trained biomarker model to obtain a model output.
26. The method of Claim 25, wherein the EMR data analyzed by the record model comprises at least three (3) of: personal history of breast cancer, lesion size, no new symptoms, breast tissue density, age, personal history of benign finding, diagnostic Breast Imaging Reporting and Data System (BI-RADS) score, and non-use of hormone-based contraceptive.
27. The method of Claim 25, wherein the EMR data analyzed by the record model comprises at least three (3) of: lesion size, no new symptoms, breast tissue density, age, diagnostic BI-RADS score, non-use of hormone-based contraceptive, imaging findings other than mass, age at menopause, imaging findings other than calcification, call back by mass finding, and biopsy recommendation by radiologist in the dataset for benign vs. malignant discrimination.
28. The method of Claim 1, wherein the biomarker model comprises a first biomarker sub-model and a second biomarker sub-model, and wherein the first biomarker sub-model has been trained to differentiate between cancer samples and benign samples; and the second biomarker sub-model has been trained to differentiate between a third sample type and a fourth sample type, wherein at least one of the third sample type and the fourth sample type is different from the first sample type and the second sample type.
29. The method of Claim 28, wherein the first sample type is a healthy sample, the second sample type is a malignant sample, the third sample type is a benign sample, and the fourth sample type is a malignant sample.
30. The method of Claim 28, wherein the first biomarker sub-model and the second biomarker sub-model are run simultaneously.
31. The method of Claim 1, wherein analyzing the concentrations using the trained biomarker model comprises characterizing a classification result for the cancer based on the determined concentration for each of the at least three protein biomarkers by applying a multivariate machine learning classification model with a classifier.
32. The method of Claim 1, wherein the trained biomarker model comprises a multivariate machine learning classification algorithm with a decision tree-based classifier or a Support Vector Machine classifier.
33. The method of Claim 29, wherein the classifier is a Light Gradient Boosting Machine classifier or a Support Vector Machine classifier.
34. The method of Claim 1, wherein providing the model output for assessing the cancer further comprises providing a treatment plan based on the information.
35. The method of Claim 1, further comprising, for a subject who has been assessed as likely having a breast cancer, recommending or sending the subject for additional evaluation.
36. The method of Claim 1, comprising administering a treatment for breast cancer to a subject who has been assessed as having a breast cancer, wherein the treatment comprises chemotherapy, hormone therapy, immunotherapy, radiation, or surgical resection.
37. A method of assessing a biological sample from a subject for a breast cancer, the method comprising: determining, in the biological sample, a concentration for each of at least three protein biomarkers selected from Table 2; analyzing the concentrations of the at least three protein biomarkers using a trained biomarker model to obtain a model output; wherein the biomarker model has been trained to discriminate between cancer samples and benign samples; and providing the model output for the assessment of breast cancer.
38. The method of Claim 37, wherein the biological sample is a blood sample.
39. The method of Claim 37, wherein the concentrations of at least six (6) protein biomarkers selected from Table 2 are analyzed.
40. The method of Claim 37, wherein the at least three protein biomarkers include at least three (3) of MMP8; IL-8; CD69; HER3; PTX3; CYR61.
41. The method of Claim 37, wherein the at least three protein biomarkers include at least three (3) of MMP8; IL-8; p21; FN1; and CD25.
42. The method of Claim 37, wherein the at least three protein biomarkers include at least one (1) of DNAH10, SHOX2, OVGP1, ITGA11, TGFA, and ENTHD1.
43. The method of Claim 37, wherein the biomarkers include at least two of CD25, MMP8, IL-8, and FN1 and at least one biomarker selected from DNAH10, SHOX2, OVGP1, ITGA11, TGFA, and ENTHD1.
44. The method of Claim 37, wherein the at least three protein biomarkers include at least one (1) of RHOXF2, SPDEF, Pro-CASP8, and COL1A1.
45. The method of Claim 44, wherein the biomarkers include at least two of CD25, MMP8, IL-8, and FN1 and at least one biomarker selected from RHOXF2, SPDEF, Pro-CASP8, and COL1A1.
46. The method of Claim 37, wherein the at least three protein biomarkers include at least three (3) of one of the following panels: a. MMP8, IL-8, CD69, HER3, PTX3, CYR61; b. MMP8, IL-8, p21, FN1, and CD25; c. p21, CD69, IL-8, MMP8, SCGB2A2/Mammaglobin, TFF1, CYR61, FN1, PTX3, TNFa, HER3, CXCL9, HGF, ADAM8, CA15-3, CD25, EGFR, HER4, and PR; d. p21, CD69, IL-8, MMP8, and SCGB2A2/Mammaglobin; e. TFF1, CYR61, FN1, PTX3, TNFa, and HER3; f. PTX3, TNFa, FABP4, HGF, FN1, and COMP; g. PTX3, CYR61, HSP70, TFF1, FN1, and HER3; h. PTX3, CYR61, TNFa, and FN1; i. FN1, p21, and IL-8; j. DNAH10, SHOX2, OVGP1, ITGA11, TGFA, ENTHD1; and k. FGD6, ZWILCH, NOP56, TECPR1, AKT3, LCP1, MINDY4B.
47. The method of Claim 37, wherein analyzing the concentrations of the at least three protein biomarkers comprises using an enzyme-linked immunosorbent assay (ELISA) or a digital ELISA.
48. The method of Claim 47, wherein the digital ELISA is a Single-Molecule Array (SIMOA).
49. The method of Claim 37, wherein the concentrations are concentrations that are relative to a test standard or a housekeeping protein concentration.
50. The method of Claim 37, further comprising: obtaining image data of the subject; analyzing the image data using an image model to obtain an image model output; fusing the model output and the image model output to obtain a fused result; and providing information for assessment of a cancer by using the fused result.
51. The method of Claim 50, wherein the image model comprises a deep learning algorithm with a CoaT-Lite network structure.
52. The method of Claim 50, wherein the image model includes logic to integrate the prediction probabilities calculated from the individual images into a study-level probability, and wherein the image data comprises a plurality of images from the subject.
53. The method of Claim 50, wherein the image data comprises one or more of: multiple cross-sectional images; multiple types of images, multiple images from right breast, left breast; and multiple images from different image modalities.
54. The method of Claim 50, further comprising: obtaining annotated image data from a radiologist based on a radiological analysis of the image data.
55. The method of Claim 37, further comprising: obtaining electronic medical record (EMR) data of the subject; analyzing the EMR data using a record model to obtain a record model output; fusing the model output and the record model output to obtain a fused result; and providing information for assessment of a cancer by using the fused result.
56. The method of Claim 37, further comprising: obtaining image data of the subject; obtaining electronic medical record (EMR) data of the subject; analyzing the concentrations of the at least three protein biomarkers, the image data, and the EMR data using the trained biomarker model to obtain a model output.
57. The method of Claim 56, wherein the EMR data analyzed by the record model comprises at least three (3) of: personal history of breast cancer, lesion size, no new symptoms, breast tissue density, age, personal history of benign finding, diagnostic Breast Imaging Reporting and Data System (BI-RADS) score, and non-use of hormone-based contraceptive.
58. The method of Claim 56, wherein the EMR data analyzed by the record model comprises at least three (3) of: lesion size, no new symptoms, breast tissue density, age, diagnostic BI-RADS score, non-use of hormone-based contraceptive, imaging findings other than mass, age at menopause, imaging findings other than calcification, call back by mass finding, and biopsy recommendation by radiologist in the dataset for benign vs. malignant discrimination.
59. The method of Claim 37, wherein the biomarker model comprises a first biomarker sub-model and a second biomarker sub-model, and wherein the first biomarker sub-model has been trained to differentiate between cancer samples and benign samples; and the second biomarker sub-model has been trained to differentiate between a third sample type and a fourth sample type, wherein at least one of the third sample type and the fourth sample type is different from the first sample type and the second sample type.
60. The method of Claim 59, wherein the first sample type is a healthy sample, the second sample type is a malignant sample, the third sample type is a benign sample, and the fourth sample type is a malignant sample.
61. The method of Claim 59, wherein the first biomarker sub-model and the second biomarker sub-model are run simultaneously.
62. The method of Claim 37, further comprising, for a subject who has been assessed as likely having a breast cancer, recommending or sending the subject for additional evaluation.
63. The method of Claim 37, comprising administering a treatment for breast cancer to a subject who has been assessed as having a breast cancer, wherein the treatment comprises chemotherapy, hormone therapy, immunotherapy, radiation, or surgical resection.
64. A method of assessing a biological sample from a subject, the method comprising: providing a sample, preferably a sample comprising peripheral blood from a subject, optionally a subject who is suspected of having breast cancer; and determining, in the biological sample, a concentration for each of at least four protein biomarkers selected from the biomarkers of Table 2 or from the panels of Table 4.
65. The method of Claim 64, wherein at least three of the protein biomarkers are selected from the group consisting of MMP8; IL-8; CD69; HER3; PTX3; and CYR61.
66. The method of Claim 65, wherein the protein biomarkers include MMP8; IL-8; CD25 and FN1.
67. The method of Claim 64, wherein the at least three protein biomarkers include at least three (3) of MMP8; IL-8; CD69; HER3; PTX3; CYR61.
68. The method of Claim 64, wherein the at least three protein biomarkers include at least three (3) of MMP8; IL-8; p21; FN1; and CD25.
69. The method of Claim 64, wherein the at least three protein biomarkers include at least one (1) of DNAH10, SHOX2, OVGP1, ITGA11, TGFA, and ENTHD1.
70. The method of Claim 64, wherein the biomarkers include at least two of CD25, MMP8, IL-8, and FN1 and at least one biomarker selected from DNAH10, SHOX2, OVGP1, ITGA11, TGFA, and ENTHD1.
71. The method of Claim 64, wherein the at least three protein biomarkers include at least one (1) of RHOXF2, SPDEF, Pro-CASP8, and COL1A1.
72. The method of Claim 71, wherein the biomarkers include at least two of CD25, MMP8, IL-8, and FN1 and at least one biomarker selected from RHOXF2, SPDEF, Pro-CASP8, and COL1A1.
73. The method of any of Claims 64 to 72, wherein the concentration of each of the protein biomarkers is determined by a method selected from digital ELISA, optionally Single-Molecule Arrays (SIMOA); Molecular On-bead Signal Amplification for Individual Counting (MOSAIC); Meso Scale Discovery (MSD); Single-Molecule Counting (SMC); LUMINEX; SOMAscan Assays; mass spectrometry (e.g., MALDI-MS), mass cytometry (e.g., CyTOF) and / or a chemiluminescence immunoassay (CLIA).
74. The method of any of Claims 64 to 72, wherein the subject has a positive result on imaging, optionally a lesion or mass, optionally identified on a mammogram or ultrasound, before or after the sample is assessed, optionally wherein the subject is identified based on a positive result on imaging.
75. The method of any of Claims 64 to 72, further comprising calculating a score for the subject based on the level of the biomarkers, wherein a score above a threshold score indicates that the subject has or is at risk of developing cancer.
76. The method of Claim 75, further comprising calculating a score for the subject based on the level of the biomarkers, comparing the score to subtype reference scores for known subtypes of breast cancer, and identifying a subject who has a score that is comparable to a subtype reference score as having that subtype of breast cancer.
77. The method of any of Claims 64 to 72, further comprising identifying a subject who has a score above a threshold score, or a score that is comparable to a reference score, and recommending or sending the subject for additional evaluation, optionally by imaging and / or biopsy.
78. The method of any of Claims 64 to 72, further comprising administering a treatment for breast cancer to a subject who has been identified as having or at risk of developing breast cancer.
79. The method of Claim 78, wherein the treatment comprises chemotherapy, hormone therapy, immunotherapy, radiation, or surgical resection.
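Claims 75 to 77 describe calculating a score from the biomarker levels, comparing it to a threshold, and matching it against subtype reference scores, but they do not specify the scoring function. The Python sketch below illustrates one way such a step could be organized. The logistic scoring of log-transformed concentrations is an assumed choice, and the weights, bias, threshold, tolerance, and subtype reference values are hypothetical placeholders, not values disclosed in the application.

```python
import math
from typing import Dict, Optional

# Hypothetical placeholder parameters; the application does not disclose them.
WEIGHTS: Dict[str, float] = {"MMP8": 0.8, "IL-8": 0.6, "CD25": 0.5, "FN1": 0.4}
BIAS = -3.0
THRESHOLD = 0.5


def risk_score(levels: Dict[str, float]) -> float:
    """Logistic score in (0, 1) computed from log1p-transformed concentrations."""
    z = BIAS + sum(w * math.log1p(levels[name]) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def exceeds_threshold(score: float, threshold: float = THRESHOLD) -> bool:
    """A score above the threshold flags the subject for follow-up (cf. Claim 75)."""
    return score > threshold


def closest_subtype(
    score: float, subtype_refs: Dict[str, float], tolerance: float = 0.05
) -> Optional[str]:
    """Return the subtype whose reference score is nearest, if within tolerance."""
    if not subtype_refs:
        return None
    name, ref = min(subtype_refs.items(), key=lambda item: abs(item[1] - score))
    return name if abs(ref - score) <= tolerance else None
```

For example, `closest_subtype(0.80, {"luminal A": 0.78, "HER2-enriched": 0.90})` returns `"luminal A"` under these placeholder values; a real reference table would have to be derived from labeled cohorts.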
80. A method of assisting in the diagnosis of a breast cancer in a subject, comprising: determining, in a biological sample obtained from the subject, concentrations of at least three of MMP8, IL-8, CD25, and FN1, and at least two (2) of CD69, HER3, PTX3, CYR61, p21, DNAH10, SHOX2, OVGP1, ITGA11, TGFA, ENTHD1, RHOXF2, SPDEF, Pro-CASP8, and COL1A1; and using the determined concentrations to provide information about the probability of the subject having breast cancer.
81. The method of Claim 80, wherein the biological sample is a blood sample.
82. The method of Claim 80, wherein analyzing the concentrations of the at least three protein biomarkers comprises using an enzyme-linked immunosorbent assay (ELISA) or a digital ELISA.
83. The method of Claim 82, wherein the digital ELISA is a Single-Molecule Array (SIMOA).
84. The method of Claim 80, wherein the concentrations are concentrations that are relative to a test standard or a housekeeping protein concentration.
85. The method of Claim 80, further comprising: obtaining image data of the subject; analyzing the image data using an image model to obtain an image model output; fusing the model output and the image model output to obtain a fused result; and providing information for assessment of a cancer by using the fused result.
86. The method of Claim 85, further comprising: obtaining annotated image data from a radiologist based on radiological analysis of the image data.
87. The method of Claim 80, further comprising: obtaining electronic medical record (EMR) data of the subject; analyzing the EMR data using a record model to obtain a record model output; fusing the model output and the record model output to obtain a fused result; and providing information for assessment of a cancer by using the fused result.
88. The method of Claim 80, further comprising: obtaining image data of the subject; obtaining electronic medical record (EMR) data of the subject; analyzing the concentrations of the at least three protein biomarkers, the image data, and the EMR data using the trained biomarker model to obtain a model output.
89. The method of Claim 80, wherein providing the model output for assessing the cancer further comprises providing a treatment plan based on the information.
90. The method of Claim 80, further comprising recommending additional evaluation for, or sending for additional evaluation, the subject who has been assessed as likely having a breast cancer.
91. The method of Claim 80, comprising administering a treatment for breast cancer to a subject who has been assessed as having a breast cancer, wherein the treatment comprises chemotherapy, hormone therapy, immunotherapy, radiation, or surgical resection.
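Claims 85 to 87 fuse the biomarker model output with an image model output and/or an EMR record model output to obtain a fused result, without stating how the fusion is performed. A minimal late-fusion sketch follows, assuming each model already emits a malignancy probability and that a weighted average is an acceptable fusion rule. The function name `fuse_outputs` and the default weights are hypothetical.

```python
from typing import Optional, Sequence


def fuse_outputs(
    biomarker_p: float,
    image_p: Optional[float] = None,
    emr_p: Optional[float] = None,
    weights: Sequence[float] = (0.5, 0.3, 0.2),  # hypothetical biomarker/image/EMR weights
) -> float:
    """Late fusion: weighted average of the model probabilities that are available.

    Outputs passed as None are skipped and the remaining weights are renormalized.
    """
    pairs = [
        (p, w)
        for p, w in zip((biomarker_p, image_p, emr_p), weights)
        if p is not None
    ]
    for p, _ in pairs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    total = sum(w for _, w in pairs)
    if total <= 0:
        raise ValueError("weights of the available outputs must sum to a positive value")
    return sum(p * w for p, w in pairs) / total
```

Renormalizing over the outputs that are present lets the same function cover biomarker plus image (Claim 85), biomarker plus EMR (Claim 87), and the three-way case; a learned fusion layer or a stacked classifier would be a natural alternative.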
92. A method of assessing a biological sample from a subject for a cancer, comprising: determining, in the biological sample, a concentration for each of at least three protein biomarkers selected from Table 1; analyzing the concentrations of a plurality of protein biomarkers using a trained biomarker model to obtain a model output, wherein the biomarker model has been trained using: a first biomarker sub-model trained to differentiate between a first sample type and a second sample type using a first subset of the plurality of protein biomarkers; and a second biomarker sub-model trained to differentiate between a third sample type and a fourth sample type using a second subset of the plurality of protein biomarkers, wherein at least one of the third sample type and the fourth sample type is different from the first sample type and the second sample type; and providing the model output for the assessment of the cancer.
93. The method of Claim 92, wherein the first sample type and the second sample type are a healthy sample and a malignant sample.
94. The method of Claim 93, wherein the first subset of the plurality of protein biomarkers includes at least four (4) of: PTX3, TNFa, FN1, TFF1, HGF, and HER3; or at least four (4) of: CYR61, TFF1, FN1, PTX3, TNFa, and HER3.
95. The method of Claim 92, wherein the third sample type and the fourth sample type are a benign sample and a malignant sample.
96. The method of Claim 95, wherein the second subset of the plurality of protein biomarkers includes at least three (3) of: PTX3, TNFa, FN1, and CYR61; or at least three (3) of: MMP8, IL-8, CD69, HER3, PTX3, CYR61, CXCL9, p21, ADAM8, FN1, CA15-3, CD25, EGFR, HER4, PR, SCGB2A2 / Mammaglobin, HGF, TFF1, and TNFa.
97. The method of Claim 95, wherein the second subset of the plurality of protein biomarkers includes at least three (3) of: MMP8, IL-8, p21, FN1, CD25, DNAH10, SHOX2, OVGP1, ITGA11, TGFA, ENTHD1, RHOXF2, SPDEF, Pro-CASP8, and COL1A1.
98. The method of Claim 92, wherein the at least three protein biomarkers include at least four (4) of: PTX3, CYR61, HSP70, TNFa, FN1, and HER3, and the concentration is a level higher than that of the healthy control or the reference value.
99. The method of Claim 92, wherein the assessment of a cancer is a diagnosis of a breast cancer.
100. The method of Claim 92, wherein the biological sample is a blood sample.
101. The method of Claim 92, wherein the first biomarker sub-model and the second biomarker sub-model are run simultaneously.
102. The method of Claim 92, wherein analyzing a trained biomarker model comprises characterizing a classification result for the cancer based on the determined concentration for each of the at least three protein biomarkers by attaching a multivariate machine learning classification model with a classifier.
103. The method of Claim 92, wherein the trained biomarker model comprises a multivariate machine learning classification algorithm with a decision tree-based classifier or a Support Vector Machine classifier.
104. The method of Claim 103, wherein the classifier is a Light Gradient Boosting Machine classifier or a Support Vector Machine classifier.
105. The method of Claim 92, wherein providing the model output for assessing the cancer further comprises providing a treatment plan based on the information.
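Claims 92 to 101, together with the training method of Claim 106, build the biomarker model from two sub-models: one trained to separate healthy from malignant samples and one trained to separate benign from malignant samples, each on its own biomarker subset. The sketch below shows that structure in plain Python. It deliberately swaps in a simple logistic regression fitted by batch gradient descent in place of the Light Gradient Boosting Machine or Support Vector Machine classifiers named in Claims 103 and 104, and the geometric-mean rule used to combine the two sub-model probabilities is a hypothetical choice that the application does not specify. The names `LogisticSubModel`, `train_submodel`, and `BiomarkerModel` are illustrative.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence

Sample = Dict[str, float]  # biomarker name -> (standardized) concentration


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


@dataclass
class LogisticSubModel:
    """Binary sub-model restricted to its own biomarker subset."""
    features: List[str]
    weights: List[float]
    bias: float = 0.0

    def predict(self, sample: Sample) -> float:
        z = self.bias + sum(w * sample[f] for f, w in zip(self.features, self.weights))
        return _sigmoid(z)


def train_submodel(
    features: Sequence[str],
    samples: Sequence[Sample],
    labels: Sequence[int],  # 1 = malignant, 0 = the other sample type
    lr: float = 0.1,
    epochs: int = 500,
) -> LogisticSubModel:
    """Fit logistic regression on the given feature subset by batch gradient descent."""
    model = LogisticSubModel(list(features), [0.0] * len(features))
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * len(features)
        grad_b = 0.0
        for x, y in zip(samples, labels):
            err = model.predict(x) - y  # gradient of log-loss with respect to the logit
            for j, f in enumerate(model.features):
                grad_w[j] += err * x[f]
            grad_b += err
        model.weights = [w - lr * g / n for w, g in zip(model.weights, grad_w)]
        model.bias -= lr * grad_b / n
    return model


@dataclass
class BiomarkerModel:
    healthy_vs_malignant: LogisticSubModel  # first vs. second sample type
    benign_vs_malignant: LogisticSubModel   # third vs. fourth sample type

    def predict(self, sample: Sample) -> Dict[str, float]:
        p1 = self.healthy_vs_malignant.predict(sample)
        p2 = self.benign_vs_malignant.predict(sample)
        # Hypothetical combination rule: geometric mean of the two malignancy probabilities.
        return {
            "healthy_vs_malignant": p1,
            "benign_vs_malignant": p2,
            "combined": math.sqrt(p1 * p2),
        }
```

With a library such as LightGBM or scikit-learn, each `LogisticSubModel` would be replaced by a fitted classifier restricted to its feature subset, while the two-sub-model wrapper would stay the same.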
106. A training method comprising: training a first biomarker sub-model to analyze the concentrations of a first subset of the plurality of proteins to differentiate between a first sample type and a second sample type; training a second biomarker sub-model to analyze the concentrations of a second subset of the plurality of proteins to differentiate between a third sample type and a fourth sample type, wherein at least one of the third sample type and the fourth sample type is different from the first sample type and the second sample type; and combining the first biomarker sub-model and the second biomarker sub-model to create a biomarker model.
107. The method of Claim 106, further comprising: fusing an output of the biomarker model with an output of an image model to obtain a fused result, wherein the image model is trained using annotated image data from a radiologist based on radiological analysis of image data.
108. The method of Claim 107, wherein the image model comprises a deep learning algorithm with a CoaT-Lite network structure.
109. The method of Claim 107, wherein the image model has a logic to integrate the prediction probability calculated from each image into a study-level probability, and wherein the image data comprises a plurality of images from the subject.
110. The method of Claim 106, wherein the image data comprises one or more of: multiple cross-sectional images; multiple types of images; multiple images from a right breast and a left breast; and multiple images from different image modalities.
111. The method of Claim 106, further comprising: fusing an output of the biomarker model with an output of a record model to obtain a fused result, wherein the record model is trained using electronic medical record (EMR) data.
112. The method of Claim 111, wherein the EMR data comprises at least three (3) of: personal history of breast cancer, lesion size, no new symptoms, breast tissue density, age, personal history of benign finding, diagnostic Breast Imaging Reporting and Data System (BI-RADS) score, and non-use of hormone-based contraceptive.
113. The method of Claim 111, wherein the EMR data comprises at least three (3) of: lesion size, no new symptoms, breast tissue density, age, diagnostic BI-RADS score, non-use of hormone-based contraceptive, imaging findings other than mass, age at menopause, imaging findings other than calcification, callback by mass finding, and biopsy recommendation by radiologist in the dataset for benign vs. malignant discrimination.
114. The method of Claim 107, wherein the first sample type is a healthy sample, the second sample type is a malignant sample, the third sample type is a benign sample, and the fourth sample type is a malignant sample.
115. The method of Claim 107, wherein the first biomarker sub-model and the second biomarker sub-model are run simultaneously.
116. The method of Claim 107, wherein analyzing a trained biomarker model comprises characterizing a classification result for the cancer based on the determined concentration for each of the at least three protein biomarkers by attaching a multivariate machine learning classification model with a classifier.
117. The method of Claim 107, wherein the trained biomarker model comprises a multivariate machine learning classification algorithm with a decision tree-based classifier or a Support Vector Machine classifier.
118. The method of Claim 107, wherein the classifier is a Light Gradient Boosting Machine classifier or a Support Vector Machine classifier.
119. The method of Claim 107, wherein providing the model output for assessing the cancer further comprises providing a treatment plan based on the information.
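Claim 109 recites logic that integrates the prediction probability calculated from each image into a study-level probability, and Claim 110 lists multi-image inputs such as images of the right and left breasts. One plausible aggregation, assumed here rather than taken from the application, averages the per-image probabilities within each breast and reports the highest breast-level value, so that a suspicious finding in one breast is not diluted by a normal contralateral side. The function name `study_probability` and the `(laterality, view, probability)` input format are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def study_probability(image_preds: List[Tuple[str, str, float]]) -> float:
    """Aggregate per-image malignancy probabilities into one study-level value.

    image_preds holds hypothetical (laterality, view, probability) tuples,
    e.g. ("L", "CC", 0.2). Images are averaged within each breast, and the
    study-level probability is the maximum over breasts.
    """
    if not image_preds:
        raise ValueError("at least one image prediction is required")
    by_side: Dict[str, List[float]] = defaultdict(list)
    for side, _view, prob in image_preds:
        if not 0.0 <= prob <= 1.0:
            raise ValueError(f"probability out of range: {prob}")
        by_side[side].append(prob)
    # Mean within a breast smooths view-to-view noise; max across breasts
    # keeps a unilateral finding from being averaged away.
    return max(mean(probs) for probs in by_side.values())
```

For a four-view mammogram with left CC/MLO probabilities of 0.2 and 0.4 and right CC/MLO probabilities of 0.7 and 0.9, this rule yields a study-level probability of about 0.8, driven by the right breast.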
120. A diagnostic kit for performing the method of Claim 1, Claim 37, Claim 64, Claim 80, or Claim 92.
121. A system comprising: an immunoassay device; one or more processors; and one or more memories, wherein the one or more processors and the one or more memories are configured to perform the method of Claim 1, Claim 37, Claim 64, Claim 80, or Claim 92.
122. One or more computer-readable media storing instructions that, when executed by one or more computing devices, cause the one or more computing devices to perform operations comprising the processes described in Claim 105.