Methods, compositions, and systems to detect non-small cell lung cancer

PhIP-Seq and machine learning classifiers enable effective detection of NSCLC by profiling autoantibody signatures, addressing limitations of existing methods and enhancing early-stage diagnosis.

WO2026136958A1 · PCT designated stage · Publication Date: 2026-06-25 · CZ BIOHUB SF LLC +1

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
CZ BIOHUB SF LLC
Filing Date
2025-12-19
Publication Date
2026-06-25

AI Technical Summary

Technical Problem

Current screening methods for non-small cell lung cancer (NSCLC) are inadequate, particularly for never-smokers and early-stage patients, with low adoption rates due to false positives, high costs, and limited sensitivity of existing tests like LDCT and liquid biopsies.

Method used

A method using programmable phage immunoprecipitation sequencing (PhIP-Seq) to profile autoantibody repertoires in NSCLC patients, combined with machine learning classifiers to identify biomarkers and detect NSCLC through autoantibody signatures in biological samples.

Benefits of technology

The approach provides robust, early detection of NSCLC, maintaining performance across stages and in asymptomatic patients, and offers a novel and scalable solution for identifying individuals at risk and improving prognosis.

✦ Generated by Eureka AI based on patent content.


Abstract

Provided are embodiments of methods to measure the presence and/or amount of one or more biomarkers associated with NSCLC in a subject, methods of identifying a subject at risk for NSCLC, methods of identifying a subject with NSCLC and treating the individual, and methods of identifying a subject with NSCLC and monitoring the individual. Also provided are kits, systems, and devices for detecting NSCLC biomarkers. Also provided are methods of detecting presence of one or more antibodies to one or more biomarkers associated with NSCLC in a subject, as well as kits, devices and systems for detecting one or more antibodies to one or more biomarkers associated with NSCLC. Also provided are methods of detecting antigens of autoantibodies associated with NSCLC.

Description

Attorney Docket No. 110221-1532974-011210WO

METHODS, COMPOSITIONS, AND SYSTEMS TO DETECT NON-SMALL CELL LUNG CANCER

SEQUENCE LISTING

[0001] The instant application contains a Sequence Listing which has been submitted herewith and is hereby incorporated by reference in its entirety. Said .xml copy, created on December 18, 2025, is named 110221-1532974-011210WO, and is 11,755 bytes in size.

BACKGROUND

[0002] Non-small cell lung cancer (NSCLC) is the leading cause of cancer deaths in the United States, accounting for an estimated 238,340 new cases and 127,070 deaths in 2023 (American Cancer Society 2023). Although the prognosis of NSCLC improves significantly when diagnosed at an earlier stage (5-year survival of 61% at Stage I, as compared to 6% at Stage IV), up to half of NSCLC patients have Stage IV disease at the time of diagnosis (Siegel et al. 2023). Low-dose chest computed tomography (LDCT) scans are effective for detecting lung cancer in patients with a significant smoking history (Mazzone et al. 2021). However, less than 5% of the eligible patients undergo LDCT screening, with cost and convenience serving as barriers to widespread adoption (Jonas et al. 2021; Meza et al. 2021; de Koning et al. 2020). Furthermore, current LDCT screening recommendations do not extend to patients who have never smoked, while the incidence of never-smoker lung cancer continues to increase in the United States, particularly in women (Pelosof et al. 2017; Jemal et al. 2023). Never-smoker patients now account for up to one in four new NSCLC diagnoses (Cho et al. 2017; Zhang et al. 2021). Widespread adoption of LDCT presents its own challenges, given the rate and burden of false positives and incidental findings (Ding, Eisenberg, and Pandharipande 2011; Novellis et al. 2021). Liquid biopsy tests detect cancer by measuring analytes in the blood, such as circulating tumor cells and nucleic acids. While these tests can identify actionable mutations in late-stage disease, issues of sensitivity have limited their use in the de novo detection of early-stage cancers (Fernandez-Uriarte et al. 2022). Novel screening approaches are needed that can identify individuals at risk for NSCLC and early-stage NSCLC patients, and that can also be deployed at a large scale.

BRIEF SUMMARY

[0003] The terms “invention,” “the invention,” “this invention” and “the present invention,” as used in this document, are intended to refer broadly to all of the subject matter of this patent application and the claims below. Statements containing these terms should be understood not to limit the subject matter described herein or to limit the meaning or scope of the patent claims below. Covered embodiments of the invention are defined by the claims, not this summary. This summary is a high-level overview of various aspects of the invention and introduces some of the concepts that are described and illustrated in the present document and the accompanying figures. This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used in isolation to determine the scope of the claimed subject matter. The subject matter should be understood by reference to appropriate portions of the entire specification, any or all figures and each claim. Some of the exemplary embodiments of the present invention are discussed below.

[0004] Included among the exemplary embodiments of the present invention are methods to measure presence and/or amount of one or more biomarkers associated with NSCLC in a subject, comprising a step of measuring in a biological sample obtained from the subject presence or amount of an expression product from one or more genes encoding the one or more biomarkers associated with NSCLC, wherein the one or more genes are shown in Table 5A, Table 5B, and/or Table 5C, wherein the measured presence or amount of the expression product in the biological sample identifies the subject as being at risk for or having NSCLC. Also included among the exemplary embodiments of the present invention are methods of identifying a subject at risk for NSCLC, comprising a step of measuring in a biological sample obtained from the subject an amount of an expression product from one or more genes encoding the biomarkers associated with NSCLC, wherein the one or more genes are shown in Table 5A, Table 5B, and/or Table 5C, and wherein presence of an altered level of the expression product from the one or more genes, as compared to a healthy control or to a pre-established amount, identifies the subject as being at risk for NSCLC. Also included among the exemplary embodiments of the present invention are methods of identifying a subject with NSCLC and treating the subject, the methods comprising the steps of: (a) measuring in a biological sample obtained from the subject an altered amount of an expression product, as compared to an amount measured in a healthy control or a pre-established amount, from one or more genes encoding the biomarkers associated with NSCLC, wherein the one or more genes are shown in Table 5A, Table 5B, and/or Table 5C; and, (b) administering to the subject one or more NSCLC treatments. In some exemplary embodiments of the above methods, the one or more NSCLC treatments comprise one or more of surgery, chemotherapy, targeted therapy, or radiation therapy. Also included among the exemplary embodiments of the present invention are methods of identifying a subject with Non-Small Cell Lung Cancer (NSCLC), and monitoring the subject, the methods comprising the steps of: (a) measuring in a biological sample obtained from the subject an altered amount, as compared to an amount measured in a healthy control or a pre-established amount, of an expression product from one or more genes encoding one or more biomarkers associated with NSCLC, wherein the one or more genes are shown in Table 5A, Table 5B, and/or Table 5C; and, (b) repeating step (a) at a later time point to determine if an altered amount measured at the later time point increased or decreased, as compared to the altered amount measured in step (a).

[0005] In some exemplary embodiments of the methods summarized in the previous paragraph, the one or more genes comprise at least one of PRRC2A, LAX1, QTRT1, or NRAC. In some exemplary embodiments, the one or more genes are all genes shown in Table 5A, Table 5B, and/or Table 5C. In some exemplary embodiments, the expression product is a protein and/or an mRNA. In some exemplary embodiments, the step of measuring comprises performing one or more laboratory methods. In some exemplary embodiments, the one or more laboratory methods comprise polymerase chain reaction, an immunoassay, or using an array of expression products. In some exemplary embodiments, the biological sample comprises at least one biological fluid or a tissue. In some exemplary embodiments, the biological fluid comprises blood, lymph, plasma, serum, interstitial fluid, phlegm, sputum, or pleural fluid. In some exemplary embodiments, the tissue sample comprises one or more of a lung tissue or a bronchial tissue.

[0006] Included among the exemplary embodiments of the present invention are kits comprising at least one reagent and/or at least one device for detecting an expression product of the one or more genes shown in Table 5A, Table 5B, or Table 5C. In some exemplary embodiments of such kits, the one or more genes comprise at least one of PRRC2A, LAX1, QTRT1, or NRAC. In some exemplary embodiments of the kits, the one or more genes are all genes shown in Table 5A, Table 5B, and/or Table 5C. In some exemplary embodiments of such kits, the expression product is a protein and/or an mRNA. In some exemplary embodiments of such kits, the at least one reagent comprises one or more reagents for performing PCR and/or immunoassay. In some exemplary embodiments of such kits, the at least one device comprises an array of expression products.

[0007] Also included among the exemplary embodiments of the present invention are systems for performing one or more of the steps of the methods according to the exemplary embodiments of the present invention and summarized above, and/or using the kits according to the exemplary embodiments of the present invention and summarized above. In some exemplary embodiments, a system comprises a computer-program product tangibly embodied in a non-transitory machine-readable storage medium, including instructions configured to run the system and/or perform one or more of the steps of the methods discussed above.

[0008] Included among the exemplary embodiments of the present invention are methods of detecting presence and/or amount of one or more antibodies to one or more biomarkers associated with NSCLC in a subject, wherein the one or more biomarkers are one or more polypeptides encoded by one or more genes shown in Table 5A, Table 5B, or Table 5C, the methods comprising the steps of: (a) contacting a biological sample obtained from the subject with the one or more polypeptides encoded by the one or more genes shown in Table 5A, Table 5B, or Table 5C; and, (b) detecting binding of the one or more polypeptides with the one or more antibodies in the biological sample. Also included among the exemplary embodiments of the present invention are methods of identifying a subject at risk for NSCLC, the methods comprising the steps of: (a) contacting a biological sample obtained from the subject with one or more polypeptides encoded by one or more genes shown in Table 5A, Table 5B, or Table 5C; and, (b) detecting binding of the one or more polypeptides with one or more antibodies in the biological sample, wherein the presence of binding of the one or more polypeptides with the one or more antibodies in the biological sample identifies the subject as being at risk for NSCLC. Also included among the exemplary embodiments of the present invention are methods of detecting presence and/or amount of one or more antibodies to one or more biomarkers associated with NSCLC in a subject, wherein the one or more biomarkers are one or more polypeptides encoded by one or more genes shown in Table 5A, Table 5B, or Table 5C, and treating the subject, the methods comprising the steps of: (a) contacting a biological sample obtained from the subject with the one or more polypeptides encoded by the one or more genes shown in Table 5A, Table 5B, or Table 5C; (b) detecting presence of binding of the one or more polypeptides to the one or more antibodies in the biological sample; and, (c) administering to the subject one or more NSCLC treatments. In some exemplary embodiments of the above methods, the one or more NSCLC treatments comprise one or more of surgery, chemotherapy, targeted therapy, or radiation therapy.

[0009] In some exemplary embodiments of the methods summarized in the previous paragraph, the one or more genes comprise at least one of PRRC2A, LAX1, QTRT1, or NRAC. In some exemplary embodiments, the one or more genes are all genes shown in Table 5A, Table 5B, and/or Table 5C. In some exemplary embodiments, the one or more polypeptides comprise at least one sequence with at least 90% sequence similarity to SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, or SEQ ID NO: 4. In some exemplary embodiments, the one or more polypeptides are one or more heterologously expressed polypeptides. In some exemplary embodiments, the one or more heterologously expressed polypeptides are expressed on a surface of a cell, on a phage, or on a virus. In some exemplary embodiments, the one or more heterologously expressed polypeptides are expressed in a phage display or eukaryotic cell display library. In some exemplary embodiments, the one or more polypeptides are isolated polypeptides. In some exemplary embodiments, the one or more polypeptides are immobilized on a solid carrier. In some exemplary embodiments of the methods summarized in the previous paragraph, step (b) comprises performing at least one of immunoprecipitation, microarray analysis, enzyme-linked immunosorbent assay (ELISA), or Western blot analysis. In some exemplary embodiments, the biological sample comprises at least one biological fluid or a tissue. In some exemplary embodiments, the biological fluid comprises blood, lymph, plasma, serum, interstitial fluid, phlegm, sputum, or pleural fluid. In some exemplary embodiments, the tissue comprises a lung tissue or a bronchial tissue. In some exemplary embodiments, the one or more antibodies in the biological sample comprise autoantibodies.

[0010] Included among the exemplary embodiments of the present invention are kits for detecting one or more antibodies to one or more biomarkers associated with NSCLC, comprising one or more polypeptides encoded by one or more genes shown in Table 5A, Table 5B, or Table 5C, and one or more other reagents for detecting the one or more antibodies in the sample. Also included among the embodiments of the present invention are devices for detecting one or more antibodies to one or more biomarkers associated with NSCLC, comprising one or more polypeptides encoded by one or more genes shown in Table 5A, Table 5B, or Table 5C, wherein the one or more polypeptides are immobilized on a surface of a solid carrier included in the device. In some exemplary embodiments of such devices, the solid carrier is a slide, a chip, a plate, a plurality of fibers, a plurality of beads, a chromatography column, or a membrane. Also included among the embodiments of the present invention are systems for detecting one or more antibodies to one or more biomarkers associated with NSCLC, comprising at least one device according to the exemplary embodiments discussed above, and one or more reagents, other than the one or more polypeptides. Also included among the embodiments of the present invention are kits according to the embodiments summarized above, devices according to the embodiments summarized above, or systems according to the embodiments summarized above, wherein the one or more genes are all genes shown in Table 5A, Table 5B, and/or Table 5C. Also included among the embodiments of the present invention are kits according to the embodiments summarized above, devices according to the embodiments summarized above, or systems according to the embodiments summarized above, wherein the one or more genes comprise at least one of PRRC2A, LAX1, QTRT1, or NRAC. Also included among the embodiments of the present invention are kits according to the embodiments summarized above, devices according to the embodiments summarized above, or systems according to the embodiments summarized above, wherein the one or more polypeptides comprise at least one sequence with at least 90% sequence similarity to SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, or SEQ ID NO: 4.

[0011] Included among the exemplary embodiments of the present invention are methods of detecting NSCLC, comprising the steps of: (a) contacting a first biological sample obtained from a healthy human subject with a peptide display library under conditions sufficient to permit binding of a first plurality of antibodies from the first biological sample to a first plurality of corresponding polypeptides within the peptide display library to generate a first plurality of antibody-polypeptide complexes; (b) contacting a second biological sample obtained from a human subject who has NSCLC with the peptide display library under conditions sufficient to permit binding of a second plurality of antibodies from the second biological sample to a second plurality of corresponding polypeptides within the peptide display library to generate a second plurality of antibody-polypeptide complexes; (c) identifying the first plurality of the polypeptides and the second plurality of the polypeptides; and, (d) using a computer system, identifying a third plurality of polypeptides as the polypeptides found in the second plurality of the polypeptides but not in the first plurality of polypeptides, wherein the third plurality of the polypeptides are the antigens of the autoantibodies associated with the risk of NSCLC. In some exemplary embodiments, the peptide display library is a phage display library. In some exemplary embodiments, a nucleic acid sequence encoding each peptide of the phage display library comprises a unique nucleic acid barcode sequence. In some exemplary embodiments, the methods further comprise the steps of: subjecting the first plurality of complexes and the second plurality of complexes to nucleic acid amplification under conditions sufficient to amplify the unique nucleic acid barcode sequence to generate a plurality of amplified nucleic acid sequences; determining sequences of the amplified nucleic acid sequences to generate determined sequences; and, in step (d), using the determined sequences for identifying the first plurality of the polypeptides and the second plurality of the polypeptides. In some exemplary embodiments, the third plurality of the polypeptides comprises polypeptides with at least 90% sequence similarity to SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, or SEQ ID NO: 4.

BRIEF DESCRIPTION OF THE DRAWINGS

[0012] FIG. 1 is a schematic illustration of an embodiment of the cancer diagnostic pipeline according to the present disclosure. A serum sample is subjected to the PhIP-Seq assay. The results are fed into a bagged machine learning model (chosen after evaluation with 10-fold cross-validation), which outputs a probability of tumor presence.

[0013] FIG. 2 is a bar graph illustrating the comparison of individual component model sensitivity at various specificity thresholds for various models on gene and peptide levels.

[0014] FIG. 3 shows receiver-operator-characteristic (ROC) curves for the classification of NSCLC patients vs. healthy controls based on a bagged model, the best performing classifier (bootstrapped across 1000 iterations, with an average Receiver Operating Characteristic Area Under the Curve (ROC-AUC) = 0.94).

[0015] FIG. 4 shows ROC curves of varying feature space sizes, determined by maximum RP100K threshold across all samples, NSCLC and healthy control.

[0016] FIG. 5 shows ROC curves of varying train-test split sizes, across NSCLC and healthy control, cross-validated.

[0017] FIG. 6 is a dot plot illustrating the results of split-luciferase binding assay of PRRC2A protein (Mann-Whitney U test: p > 0.0001).

[0018] FIG. 7 is a dot plot illustrating the results of split-luciferase binding assay of QTRT1 protein (Mann-Whitney U test: p > 0.0001).

[0019] FIG. 8 is a dot plot illustrating the results of split-luciferase binding assay of LAX1 protein (Mann-Whitney U test: p = 0.0005).

[0020] FIG. 9 is a dot plot illustrating the results of split-luciferase binding assay of NRAC protein (Mann-Whitney U test: p > 0.0001).

[0021] FIG. 10 shows a group of line plots generated based on the data obtained from The Cancer Genome Atlas for Lung Adenocarcinoma (TCGA-LUAD) and illustrating that the expression of PRRC2A gene is significantly enriched in NSCLC tissue compared to healthy lung tissue from the same patient (Wilcoxon signed-rank test: p < 1e-5).

[0022] FIG. 11 shows a group of line plots generated based on the data obtained from TCGA-LUAD and illustrating that the expression of QTRT1 gene is significantly enriched in NSCLC tissue compared to healthy lung tissue from the same patient (Wilcoxon signed-rank test: p < 0.002).

[0023] FIG. 12 shows a group of line plots generated based on the data obtained from TCGA-LUAD and illustrating that the expression of LAX1 gene is significantly enriched in NSCLC tissue as compared to healthy lung tissue from the same patient (Wilcoxon signed-rank test: p < 0.002).

[0024] FIG. 13 is a bar graph illustrating that gene expression in various healthy tissues from the Genotype-Tissue Expression (GTEx) indicates potential lung-specificity, when comparing average Z-score across all available tissue types in sets of top features (25, 50, 75, and 100) to relevant comparative tissue types.

[0025] FIG. 14 shows ROC curves of classification of an independent blinded validation cohort (137 NSCLC patients and 96 healthy subjects, all previously unseen) with a model trained on the entire initial discovery cohort (301 NSCLC patients and 352 healthy subjects). Predictive power decreased compared to cross-validation, as expected, but remained quite robust (ROC-AUC = 0.84).

[0026] FIG. 15 shows ROC curves illustrating that manipulated input testing performed as expected, with no effect from input re-ordering (ROC-AUC = 0.84), a small decrease when a constant input of 50 reads was systematically added to all samples (ROC-AUC = 0.83), and complete signal ablation with random inputs (ROC-AUC = 0.44).

[0027] FIG. 16 shows box plots illustrating the distributions of classifier predictions (0 represents healthy, 1 represents NSCLC) for all samples in both training and validation cohorts. Significant differences were seen in both cohorts (Mann-Whitney U test).

[0028] FIG. 17 shows box plots illustrating the distributions of classifier predictions split by smoking status. The predictions for smokers were consistently higher, significantly so in the validation cohort.

[0029] FIG. 18 shows box plots illustrating the distributions of classifier predictions split by NSCLC stage (Stage I vs. Stages II-IV). No significant differences were observed.

[0030] FIG. 19 shows box plots illustrating the distributions of classifier predictions split by symptomatic status at time of diagnosis. No significant differences were observed.

[0031] FIG. 20 shows ROC curves illustrating “drop-out” analysis for NSCLC classification according to some embodiments of the present disclosure.

DETAILED DESCRIPTION

Introduction

[0032] The inventors used programmable phage immunoprecipitation sequencing (PhIP-Seq) to profile the autoantibody repertoire in non-small cell lung cancer (NSCLC) patients. Using these autoantibody profiles, the inventors then trained a machine learning-based classifier to distinguish NSCLC patients from healthy controls using 301 primarily early-stage, asymptomatic NSCLC patients and 352 healthy controls. The classifier performed well in cross-validation (average ROC-AUC = 0.94) and in an independently analyzed clinical validation cohort of 137 NSCLC patients and 96 healthy controls (ROC-AUC = 0.84). Classification performance was maintained with only a few hundred target peptides, provided a sufficiently large cohort was used for optimal training. Based on the above studies, the inventors discovered the existence of a measurable autoreactive humoral profile in NSCLC. The discovery of the measurable autoreactive humoral profile in NSCLC led the inventors to conclude that a number of protein antigens may be overexpressed in NSCLC. This, in turn, led to the discovery of the biomarkers and biomarker signatures associated with NSCLC, which are described in the present disclosure. As a result, the inventors conceived the methods, compositions, kits, devices, and systems for NSCLC detection.
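The ROC-AUC values quoted above can be read as the probability that a randomly chosen NSCLC sample receives a higher classifier score than a randomly chosen healthy sample. The following minimal sketch computes that quantity directly from pairwise comparisons. The function name `roc_auc` and the toy labels and scores are illustrative assumptions and are not taken from the disclosure.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for NSCLC, 0 for healthy (hypothetical encoding).
    scores: classifier outputs, higher meaning more likely NSCLC.
    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data only: three hypothetical NSCLC samples and three hypothetical controls.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

In the toy data, eight of the nine positive-negative pairs are ranked correctly, so `toy_auc` is 8/9. A value of 0.5 corresponds to chance-level ranking.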

[0033] Accordingly, described in the present disclosure are embodiments of methods to measure the presence and/or amount of one or more biomarkers associated with NSCLC in a subject, methods of identifying a subject at risk for NSCLC, methods of identifying a subject with NSCLC and treating the individual, and methods of identifying a subject with NSCLC and monitoring the individual. Also described in the present disclosure are kits, systems, and devices for detecting NSCLC biomarkers. Also described in the present disclosure are methods of detecting presence of one or more antibodies to one or more biomarkers associated with NSCLC in a subject, as well as kits, devices and systems for detecting one or more antibodies to one or more biomarkers associated with NSCLC. Also described in the present disclosure are methods of detecting antigens of autoantibodies associated with NSCLC.

Overview

[0034] As described in the present disclosure, the inventors used machine learning techniques to derive cancer-specific signatures following PhIP-Seq autoantibody profiling of plasma from patients with NSCLC, as well as healthy individuals, and developed a highly predictive classifier model of NSCLC. The inventors used PhIP-Seq to systematically profile the proteome-wide humoral response to NSCLC. These autoantibody profiles served as the basis of the classifier that reliably distinguished NSCLC patients from healthy controls. Classifier performance was maintained in a blinded independent clinical validation cohort. The inventors identified biomarkers and biomarker combinations that can be used for detecting NSCLC by testing biological samples obtained from the individuals for such biomarkers. In some embodiments, the samples may be biological fluids. A recurring issue among the existing nucleic acid-based early detection approaches for cancer is a significant decrease in test performance in early-stage and asymptomatic patients. In contrast, the classifier model used by the inventors maintained its performance in the patients with Stage I disease as well as in the patients with Stage II-IV disease, and also in the patients who were asymptomatic at the time of sample collection. Coupled with iterative rounds of enrichment using PhIP-Seq, autoantibody profiling offers a unique way to capitalize on the immune response to cancer, which is present in the earliest stages of the disease, and thus a means to identify patients with occult cancers. This is an important advantage of the various embodiments of the invention according to the present disclosure, as earlier diagnosis is associated with markedly improved outcomes. The methods of NSCLC detection conceived by the inventors are novel, robust, and suitable for early cancer detection.

[0035] The adaptive immune system plays a significant role in the earliest immune responses to cancer, from the removal of aberrant cells before they become cancerous (immunosurveillance) to the equilibrium-like maintenance of more advanced lesions (immunoediting) (O’Donnell, Teng, and Smyth 2019). The majority of previous attempts at profiling the antibody response to cancer-specific antigens or self-antigens have relied on a candidate approach to build antibody panels for cancer screening and prognosis (Patel et al. 2022; Yang et al. 2022; Lastwika et al. 2023). While some of these approaches have shown early promise, validation on larger cohorts has faced challenges (Schrag et al. 2023).

[0036] Programmable phage display and phage immunoprecipitation sequencing (PhIP-Seq) is a powerful tool for comprehensively profiling the humoral immune repertoire with proteome-wide resolution (Larman et al. 2011; O’Donovan et al. 2020). As opposed to classical phage display, wherein short random peptides are displayed, programmable phage display leverages large-scale oligonucleotide synthesis to create libraries of long (40+ amino acid) overlapping peptides spanning a given proteome of protein-coding sequences, encoded and displayed by T7 bacteriophage. These phage libraries are applied to iterative cycles of immunoprecipitation pull-downs with patient immunoglobulins, followed by next-generation sequencing to quantify individual peptide enrichments relative to the pre-selected library, thereby generating an antibody “autoreactome” signature from each patient sample (Bodansky et al. 2024). PhIP-Seq has been used to identify autoimmune targets associated with a wide range of diseases, including, but not limited to, genetic immune dysregulation (Vazquez et al. 2020), paraneoplastic diseases (Mandel-Brehm et al. 2019; O’Donovan et al. 2020), rare childhood disorders (Mandel-Brehm et al. 2022), and cancer immunotherapy-related adverse events (Mandel-Brehm et al. 2023). Outside of cancer associated with paraneoplastic disease, however, this technique has not been applied to identify a broader antibody signature associated with cancer and use such a signature for detection and patient classification.
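The quantification of peptide enrichment relative to the pre-selected library can be sketched as a depth normalization followed by a ratio. The sketch below is a hedged illustration only: it assumes that the RP100K unit named in the description of FIG. 4 denotes reads per 100,000 sequenced reads, and the peptide identifiers, read counts, and pseudocount of 1.0 are hypothetical choices that the disclosure does not specify.

```python
from typing import Dict


def rp100k(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw per-peptide read counts to reads per 100,000 reads (assumed meaning of RP100K)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {peptide: 100_000 * n / total for peptide, n in counts.items()}


def fold_enrichment(
    sample_counts: Dict[str, int],
    library_counts: Dict[str, int],
    pseudocount: float = 1.0,  # assumption: avoids division by zero for rare peptides
) -> Dict[str, float]:
    """Ratio of normalized abundance after immunoprecipitation to the pre-selected library."""
    sample = rp100k(sample_counts)
    library = rp100k(library_counts)
    return {
        peptide: (sample.get(peptide, 0.0) + pseudocount) / (library[peptide] + pseudocount)
        for peptide in library
    }


# Hypothetical two-peptide library and one immunoprecipitated sample.
library_reads = {"pepA": 500, "pepB": 500}
sample_reads = {"pepA": 900, "pepB": 100}
enrichment = fold_enrichment(sample_reads, library_reads)
```

With these toy counts, `pepA` rises from half of the library reads to 90% of the sample reads, giving a fold enrichment above 1, while `pepB` falls below 1. A per-sample vector of such values is one plausible form of the "autoreactome" signature that a downstream classifier could consume.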

[0037] Several PhIP-Seq analysis approaches have been described, generally relying on various statistical techniques to determine a set of differentially enriched peptides or proteins (Mohan et al. 2018; O’Donovan et al. 2020; Vazquez et al. 2020; Chen et al. 2022; Raghavan et al. 2023). Machine learning, particularly deep learning, has been successfully applied in some contexts of cancer immunology, including HLA-specific neoantigen mass-spectrometry (Bulik-Sullivan et al. 2018) and T-cell receptor sequencing (Sidhom et al. 2021). Machine learning with VirScan, PhIP-Seq on viral epitopes, has been used to highlight at-risk patients for hepatocellular carcinoma through a signature primarily driven by hepatitis C infection (J. Liu et al. 2020). Basic machine learning approaches have been previously applied to PhIP-Seq data (Vazquez et al. 2022), classifying patients with a known monogenic autoimmune disorder, autoimmune polyendocrine syndrome type 1 (APS-1), versus healthy controls solely based on PhIP-Seq signatures. These models identified previously known and novel antigens that were subsequently validated through orthogonal assays. While diseases such as APS-1 represent dramatic shifts in the immune repertoire, early cancers present comparatively fewer and more subtle changes. Therefore, more advanced machine learning models may be necessary to interrogate such phenotypes thoroughly. Analysis of top autoantibody targets with a classifier according to the present disclosure revealed some proteins that are preferentially enriched in tumor tissue, when compared to normal lung tissue within the same patients, and that may have greater lung-specific expression compared to other tissue types. Increased expression of some of the genes corresponding to the validated peptides has been previously implicated in poor prognosis in lung cancer for QTRT1 (Ma and He 2020) and immune infiltration for PRRC2A (X. Liu et al. 2021).
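The description of FIG. 1 refers to a bagged machine learning model that outputs a probability of tumor presence. As a rough illustration of bagging (bootstrap aggregating) in general, the sketch below fits several copies of a deliberately simple linear base learner on class-stratified bootstrap resamples and averages their predicted probabilities. The base learner, the number of models, the stratified resampling, and the toy feature vectors are all assumptions made for this sketch; they are not the component models of the disclosure.

```python
import math
import random
from typing import List, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias) of a linear scorer


def fit_centroid_model(X: Sequence[Sequence[float]], y: Sequence[int]) -> Model:
    """Stand-in base learner: weights point from the healthy centroid to the NSCLC centroid."""
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    n_feat = len(X[0])
    mu_pos = [sum(x[j] for x in pos) / len(pos) for j in range(n_feat)]
    mu_neg = [sum(x[j] for x in neg) / len(neg) for j in range(n_feat)]
    w = [p - n for p, n in zip(mu_pos, mu_neg)]
    # The bias puts the decision boundary midway between the two class centroids.
    b = -sum(wj * (p + n) / 2 for wj, p, n in zip(w, mu_pos, mu_neg))
    return w, b


def predict_proba(model: Model, x: Sequence[float]) -> float:
    """Logistic squashing of the linear score into a value between 0 and 1."""
    w, b = model
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def fit_bagged(X, y, n_models: int = 25, seed: int = 0) -> List[Model]:
    """Fit one base learner per bootstrap resample, resampling within each class."""
    rng = random.Random(seed)
    pos_idx = [i for i, label in enumerate(y) if label == 1]
    neg_idx = [i for i, label in enumerate(y) if label == 0]
    models = []
    for _ in range(n_models):
        idx = rng.choices(pos_idx, k=len(pos_idx)) + rng.choices(neg_idx, k=len(neg_idx))
        models.append(fit_centroid_model([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagged_proba(models: List[Model], x: Sequence[float]) -> float:
    """Average the component probabilities to obtain the ensemble output."""
    return sum(predict_proba(m, x) for m in models) / len(models)


# Hypothetical enrichment features for two peptides; 1 = NSCLC, 0 = healthy.
X_toy = [[5, 1], [6, 0], [4, 1], [0, 1], [1, 0], [0, 2]]
y_toy = [1, 1, 1, 0, 0, 0]
ensemble = fit_bagged(X_toy, y_toy)
```

Calling `bagged_proba(ensemble, [6, 1])` yields a value above 0.5 and `bagged_proba(ensemble, [0, 1])` a value below 0.5 on this toy data. Averaging over resamples reduces the variance of any single fit, which is the usual motivation for bagging when cohorts are modest in size.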

[0038] Moving from a hit-calling analytical paradigm to a signature-based biomarker approach, the methods provided in the present disclosure offer tremendous advantages for NSCLC diagnostics. While a single linear model on the protein level was sufficient for the classification of APS-1 versus healthy controls (Vazquez et al. 2022), the improved performance of an ensemble model indicates that the component models provide complementary and nonredundant information. Given the tiling and isoform redundancy built into a peptide library (O’Donovan et al. 2020), collapsing by protein may boost the signal from lower-abundance single peptides with shared sequences (both exact matches and biochemical similarities). Similarly, neural networks may recognize meaningful interactions between seemingly unrelated peptides, potentially indicative of amino acid sequence similarity and biological relationships (such as shared pathways) unheeded by linear models. The robust performance of an ensemble model with relatively standard machine learning components, as described in the present disclosure, is an important advantage achieved by the inventors. The use of linear model components allowed the extraction of features that were experimentally validated and biologically meaningful, while the non-linear components appear to add latent value. The results described in the present disclosure show the presence of a distinct humoral autoreactive signal in patients with NSCLC that can be used to distinguish them from healthy controls using machine learning applied to PhiP-Seq autoantibody profiles. The ability to detect tumors using a serological autoantibody-based signature before the development of overt disease is highly advantageous for early detection of NSCLC and can be used instead of or together with the existing liquid-biopsy approaches.

Terms and concepts

[0039] A number of terms and concepts are discussed below. They are intended to facilitate the understanding of various embodiments of the invention in conjunction with the rest of the present document and the accompanying figures. These terms and concepts may be further clarified and understood based on the accepted conventions in the fields of the present invention, as well as the description provided throughout the present document and / or the accompanying figures. Some other terms can be explicitly or implicitly defined in other sections of this document and in the accompanying figures, and may be used and understood based on the accepted conventions in the fields of the present invention, the description provided throughout the present document and / or the accompanying figures. The terms not explicitly defined can also be defined and understood based on the accepted conventions in the fields of the present invention and interpreted in the context of the present document and / or the accompanying figures.

A. General

[0040] Unless otherwise dictated by context, singular terms include pluralities, and plural terms include the singular. Generally, nomenclatures used in connection with, and techniques of, cell and tissue culture, molecular biology, immunology, microbiology, genetics and protein and nucleic acid chemistry are those well-known and commonly used. Known methods and techniques are generally performed according to conventional methods well-known and as described in various general and more specific references, unless otherwise indicated. The nomenclatures used in connection with the laboratory procedures and techniques described in the present disclosure are those well-known and commonly used.

[0041] As used herein, the terms “a”, “an”, and “the” can refer to one or more unless specifically noted otherwise.

[0042] The term “or” is used to mean “and / or,” unless explicitly indicated to refer to alternatives only, or unless the alternatives are mutually exclusive, although the disclosure supports a definition that refers to only alternatives and “and / or.” As used herein, “another” can mean at least a second or more.

[0043] The terms “about” and “approximately” as used herein shall generally mean an acceptable degree of error for the quantity measured given the nature or precision of the measurements. Exemplary degrees of error are within 20 percent (%); preferably, within 10%; and more preferably, within 5% of a given value or range of values. Any reference to “about X” or “approximately X” specifically indicates at least the values X, 0.95X, 0.96X, 0.97X, 0.98X, 0.99X, 1.01X, 1.02X, 1.03X, 1.04X, and 1.05X. Thus, expressions “about X” or “approximately X” are intended to teach and provide written support for a claim limitation of, for example, “0.98X.” Alternatively, in biological systems, the terms “about” and “approximately” may mean values that are within an order of magnitude, preferably within 5-fold, and more preferably within 2-fold of a given value. Numerical quantities given herein are approximate unless stated otherwise, meaning that the term “about” or “approximately” can be inferred when not expressly stated. When “about” is applied to the beginning of a numerical range, it applies to both ends of the range.
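
The percentage tolerances above can be made concrete with a short Python sketch. The helper name `within_about` and its default tolerance of 5% are hypothetical choices for illustration; the sketch does not cover the alternative fold-based meaning described for biological systems.

```python
def within_about(value, x, tolerance=0.05):
    """Return True if `value` lies within `tolerance` (a fraction) of `x`.

    A tiny relative slack absorbs floating-point rounding at the boundary.
    """
    if x == 0:
        return value == 0
    return abs(value - x) <= tolerance * abs(x) * (1 + 1e-9)


# With X = 100, the value 98 (0.98X) falls within the 5% tolerance,
# while 94 falls only within the wider 10% tolerance.
print(within_about(98, 100))                   # True
print(within_about(94, 100))                   # False
print(within_about(94, 100, tolerance=0.10))   # True
```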

[0044] A receiver operating characteristic (ROC) curve is a graphical plot that illustrates the performance of a binary classifier model (it can be used for multi-class classification as well) by plotting the true positive rate (sensitivity) against the false positive rate (1 - specificity) of the model at varying threshold values. Area under the curve (AUC) is a measure of performance, representing the probability that the model will rank a randomly chosen positive input higher than a randomly chosen negative input. AUC = 1 represents perfect classification, while AUC = 0.5 represents random chance. For example, in the present disclosure ROC curves are shown in FIG. 3, FIG. 4, FIG. 5, FIG. 14, FIG. 15, and FIG. 20.
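
The two descriptions of AUC given above (area under the ROC curve, and probability that a positive input is scored above a negative input) can be checked against each other with a small Python sketch. The scores and labels are made-up values, and the function names are assumptions for illustration; this is not the evaluation code of the present disclosure.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs obtained by
    sweeping the decision threshold from high to low."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_pairwise(scores, labels):
    """Fraction of (positive, negative) pairs in which the positive sample
    receives the higher score; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores; 1 = case, 0 = control.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]

print(auc_trapezoid(roc_points(scores, labels)))
print(auc_pairwise(scores, labels))
```

Both calculations give 8/9 for this toy input, because eight of the nine case-control pairs are ranked correctly.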

[0045] As used herein, the terms “biomarker,” “marker,” and the related terms and expressions refer to one or more nucleic acids (such as mRNA, DNA or other nucleic acids), polypeptides and / or other biomolecules (such as carbohydrates, cholesterol, lipids) that can be used to diagnose, or to aid in the diagnosis or prognosis of a disease or a condition of interest, either alone or in combination with other biomarkers; monitor the progression of a disease or syndrome of interest; and / or monitor the effectiveness of a treatment for a syndrome or a disease of interest. The expressions “biomarker signature,” “biomarker ensemble,” and the related terms and expressions refer to a combination of biomarkers that can be used for the above purposes. For example, in the present disclosure, biomarkers may be one or more expression products (mRNA or polypeptides) encoded by the one or more genes shown in Table 5A, Table 5B, or Table 5C. A “biomarker signature” may be any combination of two or more of the above biomarkers.

[0046] The terms “individual,” “subject,” “person,” and “patient” can be used interchangeably in the present disclosure to refer to a non-human animal or a human. Examples of subjects include, but are not limited to: humans and other primates, including non-human primates, such as chimpanzees and other apes and monkey species; farm animals, such as cattle, sheep, pigs, goats and horses; domestic mammals such as dogs and cats; laboratory animals including rodents, such as mice, rats and guinea pigs; birds, including domestic, wild and game birds, such as chickens, turkeys and other gallinaceous birds, ducks, geese, and the like. The terms individual, subject, and patient, by themselves, do not denote a particular age, sex, race, or clinical status. Thus, subjects of any age, whether male or female, are intended to be covered by the present disclosure. Likewise, the methods of the present invention can be applied to any human race, including, for example, Caucasian (white), African-American (black), Native American, Native Hawaiian, Hispanic, Latino, Asian, and European. For example, in the present disclosure, an “individual,” “subject,” “person,” and “patient” may be a human having NSCLC, suspected of having NSCLC, in treatment for NSCLC, or in remission from NSCLC.

[0047] The terms “administering” or “administration,” when used in the context of the present disclosure (and the related terms and expressions), encompass the act of physically delivering a substance as it exists outside the body (for example, an immunogenic composition described in the present disclosure) into a subject. Administration can be by mucosal, intradermal, intravenous, intramuscular, or subcutaneous delivery and / or by any other known methods of physical delivery. Administration encompasses direct administration, such as administration to a subject by a medical professional or self-administration, or indirect administration, which may be the act of prescribing a composition described in the present disclosure. In the above context, administration may refer to administration of pharmaceuticals, including, but not limited to, a chemotherapeutic or immunotherapeutic pharmaceutical, to a patient, such as an NSCLC patient. The terms “administering” or “administration,” when used in the context of the present disclosure, also encompass performing surgery or radiation treatment on the patient, such as an NSCLC patient.

[0048] As used herein, the term “biological sample” encompasses any sample obtained from a biological source. A biological sample can, by way of non-limiting example, include blood, lymph, serum, plasma, tissue biopsy, interstitial fluid, phlegm, pleural fluid, cerebrospinal fluid, sputum, bronchial washings, urine, feces, epidermal sample, skin sample, cheek swab, amniotic fluid, cultured cells, and bone marrow sample. The term biological sample encompasses samples which have been processed to release or otherwise make available a protein for detection as described herein. A biological sample may be processed prior to use in a detection assay, including by dilution, addition of buffer or preservative, concentration, purification, or partial purification. Fixed or frozen tissues also may be used.

[0049] The term “non-small cell lung cancer” or “NSCLC” refers to a disease in which malignant cancer cells form in the tissues of the lung. NSCLC typically encompasses any type of epithelial lung cancer other than small-cell lung cancer (SCLC). Examples of non-small cell lung cancers include, but are not limited to, squamous cell carcinoma, large cell carcinoma, and adenocarcinoma. NSCLC can be characterized by a stage. A process of establishing an NSCLC stage may be referred to as “staging.” The staging helps determine a treatment plan and lung cancer prognosis. NSCLC stages range from zero to four, usually expressed in numerals (0 through IV). The lower the lung cancer stage, the less the cancer has spread. Stage 0 (carcinoma / tumor in-situ) is an early-stage lung cancer that is only in the top lining of the lung or bronchus and has not spread. Stage I is divided into two sub-stages, IA and IB, based on the size of the tumor. In Stage I, the cancer has not spread to the lymph nodes or other parts of the body. Stage II is divided into stage IIA and IIB, with each stage then broken into additional sections, depending on the size of the tumor, where it is found, and whether or not the cancer has spread to the lymph nodes. These tumors may be larger than those in stage I and / or have begun to spread to nearby lymph nodes. In stage II, the cancer has not spread to distant organs. Stage III is divided into IIIA, IIIB or IIIC, depending on the size and location of the tumor and how far it has spread. Most commonly, the cancer has spread to the lymph nodes in the mediastinum (the area in the chest between the lungs). Stage IV is the most advanced form of NSCLC. In this stage, the cancer has metastasized, or spread, to the lining of the lung or other areas of the body. A so-called “occult” NSCLC is NSCLC that cannot be detected by imaging or bronchoscopy, although the cancer may have spread to other parts of the body. Cancer cells may be found in sputum or bronchial washings (a sample of cells taken from inside the airways that lead to the lungs). Some of the NSCLC treatments are discussed elsewhere in the present disclosure.

B. Polypeptides etc.

[0050] The terms “protein,” “peptide,” and “polypeptide” are used interchangeably to refer to a polymer of amino acid residues. The terms apply to naturally occurring amino acid polymers and non-natural amino acid polymers, as well as to amino acid polymers in which one (or more) amino acid residue is an artificial chemical mimetic of a corresponding naturally occurring amino acid. The terms encompass amino acid chains of any length, including full-length proteins, wherein the amino acid residues are linked by covalent peptide bonds.

[0051] An “isolated” or “purified” polypeptide or protein, or biologically active portion of a polypeptide or a protein, is substantially or essentially free from components that normally accompany or interact with the polypeptide or protein as found in its naturally occurring environment. Thus, an isolated or purified polypeptide or protein is substantially free of other cellular material, or culture medium when produced by recombinant techniques, or substantially free of chemical precursors or other chemicals when chemically synthesized. A protein that is substantially free of cellular material includes preparations of protein having less than about 30%, 20%, 10%, 5%, 1%, 0.5%, or 0.1% (total protein) of contaminating protein.

[0052] The term “amino acid” refers to any monomeric unit that can be incorporated into a peptide, polypeptide, or protein. Amino acids include naturally occurring α-amino acids and their stereoisomers, as well as unnatural (non-naturally occurring) amino acids and their stereoisomers. “Stereoisomers” of a given amino acid refer to isomers having the same molecular formula and intramolecular bonds but different three-dimensional arrangements of bonds and atoms (e.g., an L-amino acid and the corresponding D-amino acid).

[0053] Naturally occurring amino acids are those encoded by the genetic code, as well as those amino acids that are later modified, e.g., hydroxyproline, γ-carboxyglutamate, and O-phosphoserine. Naturally occurring α-amino acids include, without limitation, alanine (Ala), cysteine (Cys), aspartic acid (Asp), glutamic acid (Glu), phenylalanine (Phe), glycine (Gly), histidine (His), isoleucine (Ile), arginine (Arg), lysine (Lys), leucine (Leu), methionine (Met), asparagine (Asn), proline (Pro), glutamine (Gln), serine (Ser), threonine (Thr), valine (Val), tryptophan (Trp), tyrosine (Tyr), and their combinations. Stereoisomers of naturally occurring α-amino acids include, without limitation, D-alanine (D-Ala), D-cysteine (D-Cys), D-aspartic acid (D-Asp), D-glutamic acid (D-Glu), D-phenylalanine (D-Phe), D-histidine (D-His), D-isoleucine (D-Ile), D-arginine (D-Arg), D-lysine (D-Lys), D-leucine (D-Leu), D-methionine (D-Met), D-asparagine (D-Asn), D-proline (D-Pro), D-glutamine (D-Gln), D-serine (D-Ser), D-threonine (D-Thr), D-valine (D-Val), D-tryptophan (D-Trp), D-tyrosine (D-Tyr), and their combinations.

[0054] Unnatural (non-naturally occurring) amino acids include, without limitation, amino acid analogs, amino acid mimetics, synthetic amino acids, N-substituted glycines, and N-methyl amino acids in either the L- or D-configuration that function in a manner similar to the naturally occurring amino acids. For example, “amino acid analogs” can be unnatural amino acids that have the same basic chemical structure as naturally occurring amino acids (i.e., an α-carbon that is bonded to a hydrogen, a carboxyl group, an amino group) but have modified side-chain groups or modified peptide backbones, e.g., homoserine, norleucine, methionine sulfoxide, methionine methyl sulfonium. “Amino acid mimetics” refer to chemical compounds that have a structure that is different from the general chemical structure of an amino acid, but that function in a manner similar to a naturally occurring amino acid. Amino acids may be referred to by either the commonly known three-letter symbols or by the one-letter symbols recommended by the IUPAC-IUB Biochemical Nomenclature Commission.

[0055] The expression “conservatively modified variant” and related expressions may apply to amino acid sequences, as well as to nucleic acid sequences encoding amino acid sequences. A substitution, deletion or addition to a nucleic acid, peptide, polypeptide, or protein sequence which alters, adds or deletes a single amino acid or a small percentage of amino acids in the encoded sequence is a “conservatively modified variant” where the alteration results in the substitution of an amino acid with a chemically similar amino acid. Conservative substitution tables providing functionally similar amino acids are well known in the art. Such conservatively modified variants are in addition to and do not exclude polymorphic variants, interspecies homologs, and alleles of the invention. The following eight groups each contain amino acids that are conservative substitutions for one another:
1) Alanine (A), Glycine (G);
2) Aspartic acid (D), Glutamic acid (E);
3) Asparagine (N), Glutamine (Q);
4) Arginine (R), Lysine (K);
5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V);
6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W);
7) Serine (S), Threonine (T); and
8) Cysteine (C), Methionine (M).
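
The eight substitution groups listed above can be encoded directly. The following Python sketch, with hypothetical function names and made-up example sequences, reports whether each difference between two equal-length sequences is a conservative substitution under those groups; it is an illustration only and does not handle insertions or deletions.

```python
# The eight conservative substitution groups, in one-letter code.
CONSERVATIVE_GROUPS = [
    set("AG"), set("DE"), set("NQ"), set("RK"),
    set("ILMV"), set("FYW"), set("ST"), set("CM"),
]


def is_conservative(a, b):
    """True if residues a and b differ but share at least one group."""
    a, b = a.upper(), b.upper()
    if a == b:
        return False  # identical residues are not a substitution
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def classify_substitutions(reference, variant):
    """List (position, reference residue, variant residue, conservative?)
    for every mismatch between two equal-length sequences (1-based)."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    return [
        (pos, r, v, is_conservative(r, v))
        for pos, (r, v) in enumerate(zip(reference, variant), start=1)
        if r != v
    ]


# Made-up four-residue example: K->R is conservative, L->P is not.
print(classify_substitutions("MKDL", "MRDP"))
```

Methionine appears in two groups (5 and 8), so the check tests membership group by group; cysteine and leucine, for example, are not treated as interchangeable even though each pairs with methionine.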

[0056] The terms “identity,” “substantial identity,” “similarity,” “substantial similarity,” “homology” and the related terms and expressions used in the context of describing nucleic acid or amino acid sequences refer to a sequence that has at least 60% sequence identity to a reference sequence. Examples include at least: 60%, 65%, 70%, 75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity, as compared to a reference sequence using the programs for comparison of nucleic acid or amino acid sequences, such as BLAST using standard parameters. For sequence comparison, typically one sequence acts as a reference sequence to which test sequences are compared. When using a sequence comparison algorithm, test and reference sequences are entered into a computer, subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated. Default (standard) program parameters can be used, or alternative parameters can be designated. The sequence comparison algorithm then calculates the percent sequence identities for the test sequences relative to the reference sequence, based on the program parameters. A “comparison window” includes reference to a segment of any one of the number of contiguous positions (from 20 to 600, usually about 50 to about 200, more commonly about 100 to about 150), in which a sequence may be compared to a reference sequence of the same number of contiguous positions after the two sequences are optimally aligned. Methods of alignment of sequences for comparison are well-known. Optimal alignment of sequences for comparison may be conducted, for example, by the local homology algorithm of Smith and Waterman (Smith and Waterman 1981), by the homology alignment algorithm of Needleman and Wunsch (Needleman and Wunsch 1970), by the search for similarity method of Pearson and Lipman (Pearson and Lipman 1988), by computerized implementations of these algorithms (for example, BLAST), or by manual alignment and visual inspection.
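
As a minimal sketch of how a percent identity can be derived from an optimal alignment, the following Python code implements a basic Needleman-Wunsch global alignment and counts identical aligned positions. The scoring values (match +1, mismatch -1, linear gap -2), the choice of alignment length as the denominator, and the example sequences are assumptions for illustration; programs such as BLAST use substitution matrices, different gap models, and their own identity conventions.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a, b):
    """Identical aligned positions as a percentage of alignment length."""
    aligned_a, aligned_b = needleman_wunsch(a, b)
    if not aligned_a:
        return 0.0
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


# Made-up sequences: one substitution in ten residues.
print(percent_identity("ACDEFGHIKL", "ACDQFGHIKL"))
# A single deleted residue is aligned against a gap.
print(needleman_wunsch("ACDEFG", "ACDFG"))
```

Under these assumptions, a test sequence would meet the 60% threshold described above when `percent_identity` returns a value of at least 60.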

[0057] Algorithms that are suitable for determining percent sequence identity and sequence similarity include the BLAST and BLAST 2.0 algorithms, which are described in Altschul et al. 1990 and Altschul et al. 1997, respectively. Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information (NCBI) web site. The algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold. These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are then extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always >0) and N (penalty score for mismatching residues; always <0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction is halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached. The BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences) uses as defaults a word size (W) of 28, an expectation (E) of 10, M=1, N=-2, and a comparison of both strands. For amino acid sequences, the BLASTP program uses as defaults a word size (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix (Henikoff and Henikoff 1989). The BLAST algorithm also performs a statistical analysis of the similarity between two sequences (Karlin and Altschul 1993). One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance. For example, a nucleic acid is considered similar to a reference sequence if the smallest sum probability in a comparison of the test nucleic acid to the reference nucleic acid is less than about 0.01, more preferably less than about 10⁻⁵, and most preferably less than about 10⁻²⁰.

C. Antibodies and antigens

[0058] The term “antibody” and the related terms refer to an immunoglobulin or its fragment that binds to a particular spatial and polar organization of another molecule. Immunoglobulins include various classes and isotypes, such as IgA, IgD, IgE, IgG1, IgG2a, IgG2b, IgG3, IgG4, IgM, etc. Naturally occurring antibodies are encoded by immunoglobulin genes. The recognized immunoglobulin genes include the kappa, lambda, alpha, gamma, delta, epsilon and mu constant region genes, as well as myriad immunoglobulin variable region genes. Light chains are typically classified as either kappa or lambda. Heavy chains are typically classified as gamma, mu, alpha, delta, or epsilon, which in turn define the immunoglobulin classes. A typical immunoglobulin (antibody) structural unit is known to comprise a tetramer. Each tetramer is composed of two identical pairs of polypeptide chains, each pair having one “light” (about 25 kD) and one “heavy” chain (about 50-70 kD). The N-terminus of each chain defines a variable region of about 100 to 110 or more amino acids primarily responsible for antigen recognition. The terms “variable light chain” (VL) and “variable heavy chain” (VH) refer to these light and heavy chains respectively. An antibody can be specific for a particular antigen. The antibody or its antigen can be either an analyte or a binding partner. The term “antibody” encompasses autoantibodies, which are produced by an immune system of a subject against the subject’s own antigens.

[0059] The term “antigen” refers to a molecule, such as a polypeptide, containing one or more epitopes (either linear, conformational or both) that can stimulate a subject’s immune system to produce an antigen-specific immune response. A polypeptide epitope may include between about 7 and 15 amino acids, such as 9, 10, 12 or 15 amino acids. The term “antigen” may be used interchangeably with the term “immunogen.” The expression “antigenic polypeptide,” which can be used interchangeably with the expression “immunogenic polypeptide,” is used in the present disclosure to refer to a polypeptide containing one or more epitopes that can stimulate a subject’s immune system to produce an antigen-specific immune response to the antigenic polypeptide. The term “epitope” or other related terms and expressions may be used in the present disclosure to refer to polypeptides and / or amino acid sequences (which need not be contiguous) that are specifically recognized by antibodies. In other words, an antibody can specifically bind to an epitope, although an epitope, by itself, may not necessarily stimulate a subject’s immune system to produce an antigen-specific immune response.

Methods of detection

A. Detecting expression products

[0060] Embodiments of the present invention include methods for diagnosing the presence of NSCLC or increased risk of developing NSCLC. The methods may be embodied in a variety of ways. Some embodiments of the methods involve detecting or measuring (measuring the presence and / or amount of) one or more biomarkers associated with NSCLC. Examples of such biomarkers are expression products of one or more genes encoding the biomarkers associated with NSCLC. Examples of these genes include the genes that are listed in Table 5A, Table 5B, and / or Table 5C of the present disclosure. In the context of the present disclosure, an expression product may be a polypeptide or a nucleic acid, such as mRNA. In the context of the present disclosure, such polypeptides and nucleic acids may be referred to as “polypeptides of interest” or “nucleic acids of interest,” respectively. A variety of techniques may be used to detect or measure the expression product or products. In some examples, the techniques provide quantitative results. Some suitable techniques are discussed elsewhere in the present disclosure.

[0061] In some embodiments, biomarker combinations (meaning two or more biomarkers) are detected. For example, 2 or more, 3 or more, 4 or more, 5 or more, 10 or more, 20 or more, 30 or more, 40 or more, or 50 or more expression products from genes shown in Table 5A, Table 5B, and / or Table 5C may be detected. In another example, 2 or more, 3 or more, 4 or more, 5 or more, 10 or more, 20 or more, 30 or more, 40 or more, 50 or more, 100 or more, or 150 or more expression products from genes shown in Table 5A and / or Table 5B may be measured or detected. In another example, 2 or more, 3 or more, 4 or more, 5 or more, 10 or more, 20 or more, 30 or more, 40 or more, 50 or more, 100 or more, 150 or more, 200 or more, 250 or more, 300 or more, 350 or more, 400 or more, or 450 or more expression products from genes shown in Table 5A may be measured or detected. In some embodiments, expression products of all the genes shown in Table 5A, Table 5B, and / or Table 5C are measured or detected. In some embodiments, expression products of at least one (one, two, three, or four) of PRRC2A, LAX1, QTRT1, or NRAC are measured or detected.

[0062] Some embodiments described in the present disclosure are methods to measure the presence and / or amount of one or more biomarkers associated with NSCLC in a subject, which involve measuring in a biological sample obtained from the subject an amount of an expression product from one or more genes encoding the biomarkers associated with NSCLC. Some other embodiments described in the present disclosure are methods of identifying a subject at risk for NSCLC, which involve measuring in a biological sample obtained from the subject an amount of an expression product from one or more genes encoding the biomarkers associated with NSCLC, wherein the presence of an altered level of the expression product from the biomarker associated with NSCLC, as compared to an amount measured in a healthy control or a pre-established amount, identifies the subject as being at risk for NSCLC. Some other embodiments described in the present disclosure are methods of identifying a subject with NSCLC and treating the subject, which involve measuring in a biological sample obtained from the subject an altered amount of an expression product, as compared to an amount measured in a healthy control or a pre-established amount, from one or more genes encoding the biomarkers associated with NSCLC, and administering to the subject one or more NSCLC treatments, which are discussed elsewhere in the present disclosure. Some other embodiments described in the present disclosure are methods of identifying subjects at risk for NSCLC and monitoring the subjects. Such methods involve measuring in a biological sample obtained from the subject an altered amount, as compared to an amount measured in a healthy control or a pre-established amount, of an expression product from one or more genes encoding one or more biomarkers associated with NSCLC, and then repeating the measurements at a later time point to determine if an altered amount measured at the later time point increased or decreased, as compared to the initially altered amount. Various biological samples may be used in the methods according to the present disclosure. A biological sample may be or may include (comprise) at least one biological fluid or tissue. Some non-limiting examples of biological fluids are blood, lymph, plasma, serum, interstitial fluid, phlegm, sputum, or pleural fluid. Some non-limiting examples of biological tissues are lung tissue or bronchial tissue. A biological sample may be a liquid sample. Kits, compositions, systems and devices useful for performing the methods described in this section are described elsewhere in the present disclosure.

B. Detecting antibodies

[0063] Some embodiments of the methods according to the present disclosure involve detecting or measuring (measuring the presence and / or amount of) one or more antibodies to one or more biomarkers associated with NSCLC in a subject. In such methods, biomarkers are one or more polypeptides (biomarker polypeptides) encoded by one or more genes encoding the biomarkers associated with NSCLC. Examples of these genes include the genes that are shown in Table 5A, Table 5B, and / or Table 5C of the present disclosure. In some embodiments, antibody combinations (meaning antibodies to two or more biomarkers) are detected. For example, antibodies to 2 or more, 3 or more, 4 or more, 5 or more, 10 or more, 20 or more, 30 or more, 40 or more, or 50 or more polypeptides encoded by genes shown in Table 5A, Table 5B, and / or Table 5C may be detected. In another example, antibodies to 2 or more, 3 or more, 4 or more, 5 or more, 10 or more, 20 or more, 30 or more, 40 or more, 50 or more, 100 or more, or 150 or more polypeptides encoded by genes shown in Table 5A and / or Table 5B may be measured or detected. In another example, antibodies to 2 or more, 3 or more, 4 or more, 5 or more, 10 or more, 20 or more, 30 or more, 40 or more, 50 or more, 100 or more, 150 or more, 200 or more, 250 or more, 300 or more, 350 or more, 400 or more, or 450 or more polypeptides encoded by genes shown in Table 5A may be measured or detected. In some embodiments, antibodies to polypeptides encoded by all the genes shown in Table 5A, Table 5B, and / or Table 4D are measured or detected. In some embodiments, antibodies to one or more polypeptides encoded by at least one (one, two, three, or four) of PRRC2A, LAX1, QTRT1, or NRAC are measured or detected. Various biomarker polypeptides are described elsewhere in the present disclosure.

[0064] Some embodiments described in the present disclosure are methods of detecting presence and / or amount of one or more antibodies (which can be autoantibodies) that specifically bind to one or more biomarker polypeptides associated with NSCLC in a subject. Such methods involve contacting a biological sample, which was obtained from the subject, with the one or more biomarker polypeptides, and detecting binding of the one or more biomarker polypeptides with the one or more antibodies (which can be autoantibodies) in the biological sample. Some embodiments described in the present disclosure are methods of identifying a subject at risk for NSCLC. Such methods involve contacting a biological sample, which was obtained from the subject, with one or more biomarker polypeptides, and detecting binding of the one or more biomarker polypeptides with one or more antibodies in the biological sample. In this situation, the presence of binding of the one or more polypeptides with the one or more antibodies in the biological sample identifies the subject as being at risk for NSCLC. Some embodiments described in the present disclosure are methods of detecting presence and / or amount of one or more antibodies to one or more biomarker polypeptides associated with NSCLC in a subject and treating the subject. Such methods involve contacting a biological sample, which was obtained from the subject, with the one or more biomarker polypeptides, detecting presence of binding of the one or more biomarker polypeptides to the one or more antibodies in the biological sample, and administering to the subject one or more NSCLC treatments. Some NSCLC treatments are described elsewhere in the present disclosure. Various biological samples may be used in the methods according to the present disclosure. A biological sample may be or may include (comprise) at least one biological fluid or tissue. Some non-limiting examples of biological fluids are blood, lymph, plasma, serum, interstitial fluid, phlegm, sputum, or pleural fluid. Some non-limiting examples of biological tissues are lung tissue or bronchial tissue. A biological sample may be a liquid sample. Kits, compositions, systems and devices useful for performing the methods described in this section are described elsewhere in the present disclosure.

[0065] A step of contacting a biological sample comprising antibodies associated with a risk of NSCLC may be carried out by incubating an immobilized form of a biomarker polypeptide according to the present disclosure, in the presence of a biological sample and under conditions that are compatible with the formation of an antibody-polypeptide complex, such as a complex comprising a biomarker polypeptide and an antibody (which can be an autoantibody) that specifically binds to the biomarker polypeptide. Optionally, one or more washing steps may be contemplated.

[0066] In some embodiments, a biological sample is contacted with one or more biomarker polypeptides according to the present disclosure and a secondary antibody. The secondary antibody is an antibody raised against the IgG of the animal species in which the primary antibody originated. Secondary antibodies bind to the primary antibody to assist in detection, sorting and purification of target antigens to which a specific primary antibody is first bound. The secondary antibody must have specificity both for the antibody species and for the isotype of the primary antibody being used. For example, if an antibody to the biomarker polypeptide is present in the biological sample, under appropriate conditions, a complex is formed between the biomarker polypeptide, the antibody to the biomarker polypeptide in the biological sample (primary antibody), and the secondary antibody.

[0067] In some embodiments, a method according to the present disclosure involves contacting one or more polypeptides identified by the inventors, such as a biomarker polypeptide, with a biological sample from a subject and a secondary antibody having a suitable label under conditions in which an antibody-polypeptide complex is formed, such as a complex between the biomarker polypeptide and a corresponding antibody to the biomarker polypeptide in the biological sample, if present, and the secondary antibody; and detecting the complex formed, if formed, by detecting the label of the secondary antibody, wherein the presence of the secondary antibody is indicative of the presence of an antibody associated with a risk of NSCLC in the biological sample, and wherein the absence of the secondary antibody is indicative of the absence of an antibody associated with a risk of NSCLC in the biological sample. In some instances, the secondary antibody is detectably labeled.

[0068] Immobilization of the appropriate biomarker polypeptides on a solid carrier can facilitate the method of antibody detection. In some instances, the method comprises contacting one or more biomarker polypeptides having a suitable label thereon with a biological sample from a subject, immunoprecipitating any complex formed between the polypeptide and the antibody in the biological sample, and monitoring for said label on any of said complexes, wherein the presence of said label is indicative of the presence of an antibody associated with a risk of NSCLC in the biological sample and the absence of said label is indicative of the absence of an antibody associated with a risk of NSCLC in the biological sample. In some instances, the method comprises a combination of immunoprecipitation and Western blot analysis to detect the presence of antibodies associated with a risk of NSCLC in a biological sample. For example, the method may comprise contacting one or more biomarker polypeptides with a biological sample from a subject under conditions in which a complex is formed between the polypeptide and the antibody in the biological sample, if present; immunoprecipitating any complex formed to produce an immunoprecipitate comprising any such complex formed; separating components of the immunoprecipitate from each other (e.g., by electrophoresis), said components comprising the polypeptide and the corresponding antibody in the biological sample, if present; contacting the components of the immunoprecipitate with a secondary antibody having a suitable label thereon that specifically binds to a constant region of the antibody, if present; and detecting the complex formed, if formed, by detecting the label of the secondary antibody, wherein the presence of the secondary antibody is indicative of the presence of an antibody associated with a risk of NSCLC in the biological sample, and wherein the absence of the secondary antibody is indicative of the absence of an antibody associated with a risk of NSCLC in the biological sample. For example, an immunoprecipitation assay may be performed to detect the presence of antibodies associated with a risk of NSCLC in a subject by contacting a biomarker polypeptide with a biological sample from the subject. Exemplary labels include any of the detectable labels described in this disclosure including, for example, but not limited to, fluorescent dyes and radioactive labels.

[0069] In some embodiments of the methods according to the present disclosure, a biomarker polypeptide is heterologously expressed. Heterologous expression is accomplished by introduction of a nucleic acid encoding a protein of interest from one species into the cell of another species (“host cell”), such that the host cell’s cellular machinery expresses the foreign protein. For example, a heterologously expressed biomarker polypeptide is a biomarker polypeptide expressed by a host cell, into which a nucleic acid encoding the biomarker polypeptide from a species different from the host cell species was introduced by appropriate molecular biology techniques.

[0070] In some embodiments of the methods according to the present disclosure, a biomarker polypeptide is heterologously expressed on the surface of a cell. For example, a vector comprising the coding sequence of a biomarker polypeptide operably linked to a promoter can be introduced into a cell. The vector may comprise elements that cause the biomarker polypeptide to be expressed on the surface of the cell. For example, the biomarker polypeptide may be expressed as a fusion protein with a membrane protein on the surface of the cell. In some instances, the cell is a bacterial cell or a eukaryotic cell. For example, the eukaryotic cell may be a yeast cell or a mammalian cell such as a human cell. Methods of transfection and transduction of cells to introduce recombinant nucleic acids are well known. For example, a 293T cell-based expression assay can be used to detect antibodies associated with a risk of NSCLC in a biological sample.

[0071] In instances where the biomarker polypeptide used in the methods according to the present disclosure is in a phage display or eukaryotic cell display library, the presence of an antibody associated with a risk of NSCLC in a biological sample from a subject is assessed by contacting the biological sample with a phage display or eukaryotic cell display library. An appropriate display library includes a plurality of eukaryotic cells or phage that express a plurality of peptides, including various biomarker polypeptides according to the present disclosure, on the surface of the eukaryotic cells or phage. For example, various biomarker polypeptides may be expressed as fusion proteins with a membrane protein on the surface of the eukaryotic cells or phage. Each cell or phage in the library expresses a different biomarker polypeptide. In some instances, the eukaryotic cell may be a yeast cell or a mammalian cell such as a human cell. The biological sample can be assayed to detect whether there is specific protein-protein interaction with any of the peptides expressed on the surface of the eukaryotic cells or phage. Methods of detecting protein-protein interactions using phage display are well known in the art. For example, a polypeptide comprising a putative epitope may be bound to a solid support and the phage library applied thereto. After washing the solid support, any phage that remain bound to the solid support may express an antibody that can bind specifically to a corresponding biomarker polypeptide. The phage DNA is isolated (after bacterial amplification) and sequenced to identify the sequence of the peptide expressed by the phage. Such peptides may then be further assessed for specific binding to the putative immunogen such as, for example, by immunoprecipitation, Western blot, or other immunoassay. In some instances, where the display library comprises eukaryotic cells, specific protein-protein interaction with any of the peptides may be assessed by flow cytometry. In some instances, the eukaryotic cells of the display library may be yeast cells. In some instances, the eukaryotic cells of the binding pool may be mammalian cells such as human cells. The peptides expressed on the cells of the display library may be fluorescently labeled (see discussion above regarding detectably labeled secondary antibodies for exemplary fluorescent labels). The biological sample and the display library may be combined, and FACS analysis performed to identify cells that express peptides that are bound specifically to an antibody associated with a risk of NSCLC. In some instances, the identified cells may then be expanded in vitro, and the DNA or the RNA analyzed, such as by next generation sequencing. In some instances, single cell PCR may be performed followed by RNA and / or DNA sequence analysis. Other exemplary methods for assessing protein-protein interactions between a biological sample that contains an antibody associated with a risk of NSCLC and a display library include those described in Jardine et al. 2013 and McGuire et al. 2014. See also WO 2020/190700, which is incorporated by reference for the teaching of the phage display system.
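As an illustration only, sequencing read-outs from a display library of the kind described above can be summarized by comparing per-peptide read counts in a sample immunoprecipitation against a control. The following Python sketch is not taken from the present disclosure: the function names, the peptide identifiers, the reads-per-100,000 normalization, the pseudocount, and the fold-change cutoff are all assumptions chosen for the example.

```python
def reads_per_100k(counts):
    """Scale raw per-peptide read counts to reads per 100,000 total reads."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads in this library")
    return {peptide: 1e5 * n / total for peptide, n in counts.items()}


def fold_enrichment(sample_counts, control_counts, pseudocount=1.0):
    """Fold change of normalized reads, sample over control.

    The pseudocount (an assumption) avoids division by zero for peptides
    that are absent from the control.
    """
    sample = reads_per_100k(sample_counts)
    control = reads_per_100k(control_counts)
    peptides = set(sample) | set(control)
    return {
        p: (sample.get(p, 0.0) + pseudocount) / (control.get(p, 0.0) + pseudocount)
        for p in peptides
    }


def enriched_peptides(sample_counts, control_counts, min_fold=10.0):
    """Peptides whose fold enrichment meets a hypothetical cutoff."""
    fold = fold_enrichment(sample_counts, control_counts)
    return sorted(p for p, f in fold.items() if f >= min_fold)


# Hypothetical read counts for three displayed peptides.
sample = {"pepA": 900, "pepB": 50, "pepC": 50}
control = {"pepA": 10, "pepB": 495, "pepC": 495}
hits = enriched_peptides(sample, control)
```

In this toy input only `pepA` is strongly over-represented in the sample relative to the control; real analyses typically use replicate controls and a statistical model rather than a fixed cutoff.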

[0072] In some instances, more than one of the detection methods described above or elsewhere in the present disclosure may be used in a complementary manner for more reliable results. In some embodiments, other immunoassays can be performed either as an alternative to or before and / or after the immunohistochemistry methods. For example, a Western blot may be performed using, for example, a panel of known antigens associated with antibodies, the panel including one or more biomarker polypeptides according to the present disclosure. The results of such a Western blot may warrant further evaluation using, for example, the immunohistochemistry methods described herein. In another example, an immunohistochemistry method as described herein may be performed, followed by a Western blot in order to, for example, further confirm the specific antigens, including one or more biomarker polypeptides according to the present disclosure, recognized by the antibodies in the biological sample. In another example, a phage or eukaryotic cell display library that includes a plurality of eukaryotic cells or phage that express a plurality of peptides, including one or more biomarker polypeptides according to the present disclosure, on the surface of the eukaryotic cells or phage may be used to assess for the presence of antibodies associated with a risk of NSCLC in a biological sample, and then followed by a suitable radioligand binding assay method or an immunohistochemistry method. In another example, the biological sample may be assessed by a radioligand binding assay method first, with confirmation by assessing the sample using a phage or eukaryotic cell display library.

[0073] Any data demonstrating the presence or absence of an antibody associated with a risk of NSCLC may be correlated with reference data. For example, detection of an antibody to a biomarker protein in a biological sample may indicate that the subject from whom the sample was obtained may have NSCLC or be at risk for NSCLC. In another example, detection of an altered amount of an antibody to a biomarker protein in a biological sample, as compared to a reference amount of the antibody, may indicate that the subject from whom the sample was obtained may have NSCLC or be at risk for NSCLC. The reference amount may be an amount of an antibody in a biological sample obtained from a healthy subject and / or a subject known to be free of NSCLC. A reference amount may be an amount calculated based on the data obtained from biological samples obtained from a group of healthy subjects and / or subjects known to be free of NSCLC. In some cases, the reference amount may be obtained from a pre-established source or database. If the subject has been previously identified as being at risk of NSCLC or having NSCLC, an amount of an antibody to a biomarker protein in a biological sample obtained from the subject and detected at the time of prior diagnosis may be compared with the amount detected at the present time to assess the progression of the risk and / or the success of a treatment. For example, if the amount of an antibody to a biomarker protein in a biological sample obtained from the subject is found to increase, it may be concluded that the risk of NSCLC is increasing, that NSCLC is progressing, and / or that a treatment attempted is unsuccessful.

C. Laboratory methods for detecting polypeptides

[0074] As discussed above and elsewhere in the present disclosure, some embodiments of the present invention involve detection of polypeptides. For example, some methods of the present disclosure involve detection of polypeptides encoded by one or more genes listed in Table 5A, Table 5B, or Table 5C. Some methods of the present disclosure involve detection of antibodies, which are also polypeptides. Various laboratory methods may be used for polypeptide detection.

[0075] For example, specific antibodies that recognize one or more polypeptides of interest can be used. Antibodies against particular epitopes, polypeptides, and / or proteins can be generated using any of a variety of known methods. For example, an epitope, polypeptide, or protein against which an antibody is desired can be produced and injected into an animal, typically a mammal (such as a donkey, mouse, rabbit, horse, chicken, etc.), and antibodies produced by the animal can be collected from the animal. Monoclonal antibodies can also be produced by generating hybridomas that express an antibody of interest with an immortal cell line. In some embodiments, antibodies can be labeled with detectable moieties, which are described elsewhere in the present disclosure.

[0076] Antibody-based detection methods are well known in the art and include, but are not limited to, enzyme-linked immunosorbent assays (ELISAs) and Western blots. Some such methods are amenable to being performed in an array format. For example, in some embodiments, a polypeptide of interest is detected using a first antibody (or antibody fragment) that specifically recognizes the polypeptide of interest. The antibody may be labeled with a detectable moiety (e.g., a chemiluminescent molecule), an enzyme, or a second binding agent (e.g., streptavidin). The first antibody may be detected using a second antibody.

[0077] In certain embodiments of antibody-based detection methods, a capture support may be used. In some examples, a capture support includes at least one capture support binding agent that recognizes and binds to a polypeptide of interest so as to immobilize it on the capture support. A second binding agent that can specifically recognize and bind to at least some of the plurality of binding agent molecules and / or the polypeptides immobilized on the capture support may also be used. For example, a binding agent that can specifically recognize and bind to at least some of the binding agent molecules and / or the polypeptide of interest immobilized on the capture support may be a soluble binding agent (e.g., a secondary antibody). The second binding agent may be labeled with a detectable moiety, such as an enzyme, such that binding of the polypeptide of interest is measured by adding a substrate for the enzyme and quantifying the amount of product formed. In some examples, a capture solid support may be an assay well (e.g., a well of a microtiter plate). A capture solid support may be a location on an array, or a mobile support, such as a bead. A capture support may be a filter.

[0078] In some cases, a polypeptide of interest may be allowed to complex with a first binding agent (e.g., a primary antibody specific for a polypeptide of interest and labeled with a detectable moiety) and a second binding agent (e.g., a secondary antibody that recognizes the primary antibody or a second primary antibody), where the second binding agent is complexed to a third binding agent (e.g., biotin) that can then interact with a capture support (e.g., magnetic bead) having a reagent (e.g., streptavidin) that recognizes the third binding agent linked to the capture support. The complex (labeled primary antibody: biomarker: second primary antibody-biotin: streptavidin-bead) may then be captured using a magnet (e.g., a magnetic probe) to measure the amount of the complex.

[0079] A variety of binding agents may be used in the methods, devices, systems and kits according to the present disclosure. For example, a binding agent attached to the capture support, or the second antibody, may be either an antibody or an antibody fragment that binds a polypeptide of interest. In certain embodiments, a capture support may be treated with a passivating agent. For example, in certain embodiments a polypeptide of interest may be captured on a passivated surface (i.e., a surface that has been treated to reduce non-specific binding). One such passivating agent is bovine serum albumin (BSA). Additionally and / or alternatively, where the binding agent used is an antibody, a capture support may be coated with protein A, protein G, protein A / G, protein L, or another agent that binds with high affinity to the binding agent (e.g., antibody). These proteins bind the Fc domain of antibodies and thus can orient the binding of antibodies that recognize a polypeptide of interest.

[0080] An antibody-polypeptide complex, such as a complex comprising an antibody to a polypeptide of interest and the polypeptide of interest, may be detected using a variety of methods, such as, but not limited to, immunofluorescence microscopy or spectroscopy, luminescence, nuclear magnetic resonance (NMR) spectroscopy, immunodiffusion, radioactivity, chemical crosslinking, surface plasmon resonance, native gel electrophoresis, or enzymatic activity. Depending on the nature of the sample, either or both immunoassays and immunocytochemical staining techniques may be used. Enzyme-linked immunosorbent assays (ELISA), Western blot, and radioimmunoassays can be used to detect antibodies associated with a risk of NSCLC in a biological sample. While some of these methods allow for direct detection of the antibody-polypeptide complex, such as a complex comprising an antibody to a biomarker polypeptide and the biomarker polypeptide, in some embodiments, the secondary antibody is labeled such that the complex may be detected specifically owing to intrinsic properties of the label such as, for example, fluorescence, radioactivity, enzymatic activity, visibility in NMR or magnetic resonance imaging (MRI) spectra or the like. In some embodiments, the detection method may include, but is not limited to, performing any one or more of Western blot, dot blot, protein microarray, ELISA, line blot, radioimmune assay, immunoprecipitation, indirect immunofluorescence microscopy, radioimmunoassay, radioimmunodiffusion, Ouchterlony immunodiffusion, rocket immunoelectrophoresis, immunohistostaining, complement fixation assay, fluorescence-activated cell sorting (FACS), and protein chip. Antibodies associated with a risk of NSCLC in a biological sample may be detected by immunohistochemistry. Immunohistochemical methods are well known, and non-limiting exemplary methods are described in U.S. Pat. Nos. 5,073,504; 5,225,325; and 6,855,552. See also Dabbs, Diagnostic Immunohistochemistry, 2nd Ed., 2006, Churchill Livingstone; and Chu & Weiss, Modern Immunohistochemistry, 2009, Cambridge University Press. It is to be understood that immunohistochemistry routinely includes steps that are not necessarily discussed herein in detail, such as washing the tissue samples to remove unbound secondary antibodies and the parallel staining experiments with proper controls.

D. Laboratory methods for detecting nucleic acids

[0081] As discussed above and elsewhere in the present disclosure, some embodiments of the present invention involve detection of nucleic acid. For example, some methods of the present disclosure involve detection of mRNAs encoded by one or more genes listed in Table 5A, Table 5B, or Table 5C. Various laboratory methods may be used for nucleic acid detection. For example, nucleic acid can be analyzed by sequencing, hybridization, PCR amplification, restriction enzyme digestion, etc. In some laboratory methods for nucleic acid detection, nucleic acids are extracted from a biological sample. In some embodiments, nucleic acids are analyzed without having been amplified. In some embodiments, nucleic acids are amplified using various techniques (such as generating cDNA that is amplified using the polymerase chain reaction (PCR)), and amplified nucleic acids are used in subsequent analyses. Multiplex PCR, in which several amplicons (e.g., from different genomic regions) are amplified at once using multiple sets of primer pairs, may be employed.

[0082] In certain embodiments, mRNA can be analyzed using droplet digital PCR, e.g., duplex ddPCR or multiplex ddPCR. In digital PCR, individual PCR reactions are partitioned into several hundred to millions of individual wells or, as in droplet digital PCR (ddPCR), small-volume water-oil emulsion droplets. Following PCR amplification, each partition is counted as either positive or negative. The ratio of positive partitions (k) over the total number of partitions (n) is used to calculate the initial concentration (C), assuming a Poisson distribution of target molecules among the partitions, as C = -ln(1 - k / n), where C is expressed as the mean number of target copies per partition.
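The Poisson correction described above can be sketched in a few lines of Python. This is a minimal illustration, not a procedure from the present disclosure: the function names are hypothetical, and the partition volume passed in the example (0.85 nL) is an assumed value that depends on the instrument used. Dividing the mean copies per partition by the partition volume converts the result to copies per unit volume.

```python
import math


def copies_per_partition(positive, total):
    """Poisson-corrected mean target copies per partition: -ln(1 - k/n)."""
    if total <= 0:
        raise ValueError("total number of partitions must be positive")
    if not 0 <= positive < total:
        # If every partition is positive, the estimate diverges.
        raise ValueError("need 0 <= positive < total")
    return -math.log(1.0 - positive / total)


def copies_per_microliter(positive, total, partition_volume_nl):
    """Convert copies per partition to copies per microliter of reaction.

    partition_volume_nl is the partition (droplet or well) volume in
    nanoliters; it is instrument specific and must be supplied by the user.
    """
    lam = copies_per_partition(positive, total)
    return lam / (partition_volume_nl * 1e-3)  # 1 nL = 1e-3 uL


# Hypothetical run: 5,000 positive partitions out of 20,000.
lam = copies_per_partition(5000, 20000)
conc = copies_per_microliter(5000, 20000, partition_volume_nl=0.85)
```

The correction matters because a positive partition may contain more than one target molecule; the simple ratio k / n therefore underestimates the input as the fraction of positive partitions grows.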

[0083] In certain embodiments, mRNA is analyzed using real-time and / or reverse-transcriptase PCR using known methods and / or commercial reagents and / or kits. “Real-time PCR” or rPCR is a method for detecting and measuring products generated during each cycle of a PCR, which are proportionate to the amount of template nucleic acid prior to the start of PCR. The information obtained, such as an amplification curve, can be used to determine the presence of a target nucleic acid and / or quantitate the initial amounts of a target nucleic acid sequence. The term “real-time PCR” is used to denote a subset of PCR techniques that allow for detection of PCR product throughout the PCR reaction, or in real time. In some embodiments, rPCR is reverse transcriptase (RT) real-time PCR (rRT-PCR).

[0084] Reverse transcriptase PCR is used when the starting material is RNA and / or mRNA. RNA is first transcribed into complementary DNA (cDNA) by reverse transcriptase. In rRT-PCR, the cDNA is then used as the template for the qPCR reaction. rRT-PCR can be performed in a one-step method, which combines reverse transcription and PCR in a single tube and buffer, using a reverse transcriptase along with a DNA polymerase. In one-step rRT-PCR, both RNA and DNA targets are amplified using sequence-specific primers. The term “quantitative PCR” encompasses all PCR-based techniques that allow for quantitative or semi-quantitative determination of the initially present target nucleic acid sequences.

[0085] The principles of real-time PCR (rPCR) are generally described, for example, in Heid et al. “Real Time Quantitative PCR” Genome Research 6:986-994 (1996). Generally, rPCR measures a signal at each amplification cycle. Some rPCR techniques rely on fluorophores that emit a signal at the completion of every amplification cycle. Examples of such fluorophores are fluorescent dyes that emit fluorescence at a defined wavelength upon binding to double-stranded DNA, such as SYBR green. An increase in double-stranded DNA during each amplification cycle thus leads to an increase in fluorescence intensity due to accumulation of PCR product. Another example of fluorophores used for detection in rPCR is sequence-specific fluorescent reporter probes. Examples of such probes are TAQMAN® probes. The use of a sequence-specific reporter probe provides for detection of a target sequence with high specificity, and enables quantification even in the presence of non-specific DNA amplification. Fluorescent probes can also be used in multiplex assays (for detection of several genes in the same reaction) based on specific probes with different-colored labels. For example, a multiplex assay can use several sequence-specific probes, labeled with a variety of fluorophores, including, but not limited to, FAM, JA270, CY5.5, and / or HEX, in the same PCR reaction mixture.

[0086] rPCR relies on detection of a measurable parameter, such as fluorescence, during the course of the PCR reaction. The amount of the measurable parameter is proportional to the amount of the PCR product, which allows one to observe the increase of the PCR product “in real time.” Some rPCR methods allow for quantification of the input DNA template based on the observable progress of the PCR reaction. A “growth curve” or “amplification curve” in the context of a nucleic acid amplification assay is a graph of a function, where an independent variable is the number of amplification cycles and a dependent variable is an amplification-dependent measurable parameter measured at each cycle of amplification, such as fluorescence emitted by a fluorophore. As discussed above, the amount of amplified target nucleic acid can be detected using a fluorophore-labeled probe. Typically, the amplification-dependent measurable parameter is the amount of fluorescence emitted by the probe upon hybridization, or upon the hydrolysis of the probe by the nuclease activity of the nucleic acid polymerase. The increase in fluorescence emission is measured in real time and is directly related to the increase in target nucleic acid amplification. In some examples, the change in fluorescence (dRn) is calculated using the equation dRn = (Rn+) - (Rn-), with Rn+ being the fluorescence emission of the product at each time point and Rn- being the fluorescence emission of the baseline. The dRn values are plotted against cycle number, resulting in amplification plots. In a typical polymerase chain reaction, a growth curve contains a segment of exponential growth followed by a plateau, resulting in a sigmoidal-shaped amplification plot when using a linear scale. A growth curve is characterized by a “cross point” value or “CP” value, which can be also termed “threshold value” or “cycle threshold” (Ct), which is the number of cycles at which a predetermined magnitude of the measurable parameter is achieved. For example, when a fluorophore-labeled probe is employed, the threshold value (Ct) is the PCR cycle number at which the fluorescence emission (dRn) exceeds a chosen threshold, which is typically 10 times the standard deviation of the baseline (this threshold level can, however, be changed if desired). A lower Ct value represents more rapid completion of amplification, while a higher Ct value represents slower completion of amplification. Where efficiency of amplification is similar, the lower Ct value is reflective of a higher starting amount of the target nucleic acid, while the higher Ct value is reflective of a lower starting amount of the target nucleic acid. Where a control nucleic acid of known concentration is used to generate a “standard curve,” or a set of “control” Ct values at various known concentrations of a control nucleic acid, it becomes possible to determine the absolute amount of the target nucleic acid in the sample by comparing Ct values of the target and control nucleic acids.
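The Ct determination and standard-curve quantification described above can be sketched as follows. This is an illustrative Python example under stated assumptions, not the procedure of any particular instrument: the function names, the baseline window (cycles 3 to 15), the linear interpolation between cycles, and the synthetic fluorescence curves are all hypothetical. The threshold of 10 times the baseline standard deviation follows the text.

```python
import numpy as np


def cycle_threshold(rn, baseline=(3, 15), k=10.0):
    """Return the fractional cycle at which dRn first exceeds the threshold.

    rn       : fluorescence reading per cycle (index 0 is cycle 1)
    baseline : first and last baseline cycle, 1-based and inclusive (assumed)
    k        : threshold as a multiple of the baseline standard deviation
    Returns None if the threshold is never exceeded.
    """
    rn = np.asarray(rn, dtype=float)
    first, last = baseline
    base = rn[first - 1:last]
    drn = rn - base.mean()            # dRn = (Rn+) - (Rn-)
    threshold = k * base.std(ddof=1)
    for i in range(last, len(drn)):   # index i corresponds to cycle i + 1
        if drn[i] > threshold:
            # Linear interpolation between cycle i and cycle i + 1.
            frac = (threshold - drn[i - 1]) / (drn[i] - drn[i - 1])
            return i + frac
    return None


def fit_standard_curve(quantities, cts):
    """Fit Ct = slope * log10(quantity) + intercept to control reactions."""
    slope, intercept = np.polyfit(np.log10(quantities), cts, 1)
    return slope, intercept


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the starting quantity."""
    return 10 ** ((ct - intercept) / slope)


def synthetic_curve(midpoint, n_cycles=40):
    """Hypothetical sigmoidal amplification curve with small baseline noise."""
    c = np.arange(1, n_cycles + 1)
    noise = 0.001 * (-1.0) ** c
    return 1.0 + noise + 5.0 / (1.0 + np.exp(-(c - midpoint) / 1.5))


ct_high_input = cycle_threshold(synthetic_curve(midpoint=25))
ct_low_input = cycle_threshold(synthetic_curve(midpoint=30))

# Hypothetical ten-fold dilution series of a control nucleic acid.
std_quantities = np.array([1e2, 1e3, 1e4, 1e5, 1e6])
std_cts = 40.0 - 3.32 * np.log10(std_quantities)
slope, intercept = fit_standard_curve(std_quantities, std_cts)
estimate = quantity_from_ct(30.0, slope, intercept)
```

In this sketch the curve that rises earlier yields the lower Ct, consistent with a higher starting amount of template. The slope of the fitted standard curve also indicates amplification efficiency: a slope near -3.32 corresponds to a doubling of product per cycle.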

[0087] In some embodiments, an oligonucleotide ligation assay (“OLA” or “OL”) may be used for nucleic acid detection. OLA employs two oligonucleotides that are designed to be capable of hybridizing to abutting sequences of a single strand of a target molecule. Typically, one of the oligonucleotides is biotinylated, and the other is detectably labeled, e.g., with a streptavidin-conjugated fluorescent moiety. If the precise complementary sequence is found in a target molecule, the oligonucleotides will hybridize such that their termini abut, and create a ligation substrate that can be captured and detected. See, e.g., Nickerson et al. (1990) Proc. Natl. Acad. Sci. U.S.A. 87:8923-8927; Landegren, U. et al. (1988) Science 241:1077-1080; and U.S. Pat. No. 4,998,617.

[0088] In some embodiments, nucleic acids can be analyzed by hybridization using one or more oligonucleotide probes specific for a nucleic acid of interest and under conditions sufficiently stringent to disallow a single nucleotide mismatch. Nucleic acid hybridization techniques are well known, and it is well established how to estimate and adjust the stringency of hybridization conditions such that sequences having at least a desired level of complementarity will stably hybridize, while those having lower complementarity will not. For examples of hybridization conditions and parameters, see, e.g., Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Press, Plainview, N.Y.; Ausubel, F. M. et al., 1994, Current Protocols in Molecular Biology, John Wiley & Sons, Secaucus, N.J. In some embodiments, probe molecules that hybridize to nucleic acid sequences of interest can be used for detecting such sequences in the amplified product by solution phase or, more preferably, solid phase hybridization. Solid phase hybridization can be achieved, for example, by attaching probes to a microchip.

[0089] Nucleic acid probes may comprise ribonucleic acids and / or deoxyribonucleic acids. In some embodiments, provided nucleic acid probes are oligonucleotides (i.e., “oligonucleotide probes”). Generally, oligonucleotide probes are long enough to bind specifically to a homologous region of the gene of interest, but short enough such that a difference of one nucleotide between the probe and the nucleic acid sample being tested disrupts hybridization. Typically, the sizes of oligonucleotide probes vary from approximately 10 to 100 nucleotides. In some embodiments, oligonucleotide probes vary from 15 to 90, 15 to 80, 15 to 70, 15 to 60, 15 to 50, 15 to 40, 15 to 35, 15 to 30, 18 to 30, or 18 to 26 nucleotides in length. An optimal length of an oligonucleotide probe may depend on the particular methods and / or conditions in which the oligonucleotide probe may be employed. In some embodiments, nucleic acid probes can be used as primers, e.g., for nucleic acid amplification and / or extension reactions. In some embodiments, nucleic acid probes and / or primers are labeled with one or more detectable moieties as described in the present disclosure.

E. Arrays

[0090] A variety of the methods according to the present disclosure may be adapted for use as arrays that allow sets of polypeptides or nucleic acids to be analyzed and / or detected in a single experiment. For example, methods that involve use of nucleic acid reagents (e.g., probes, primers, oligonucleotides, etc.) are amenable for adaptation to an array-based platform (e.g., microarray). In another example, protein arrays, which can also be referred to as “protein chips,” may be used to detect polypeptides of interest according to the present disclosure. A protein microarray (or protein chip) includes a support surface, such as a glass slide, nitrocellulose membrane, bead, or microtiter plate, to which an array of capture proteins is bound. Capture proteins may be antibodies. Capture proteins may be polypeptides according to the present disclosure, such as one or more polypeptides (biomarker polypeptides) encoded by one or more genes encoding the biomarkers associated with NSCLC. Examples of these genes include the genes that are shown in Table 5A, Table 5B, and / or Table 5C of the present disclosure. Probe molecules, typically labeled with a fluorescent dye, are added to the array. Any reaction between the probe and the immobilized protein emits a fluorescent signal that can be read by a laser scanner.

F. Detectable moieties

[0091] Certain molecules (e.g., nucleic acid probes, polypeptides, antibodies, etc.) used in the embodiments of the present invention are labeled with detectable entities or moieties, i.e., such molecules are “labeled” with such entities or moieties, which can be referred to as “labels” or “detectable labels.” A variety of detectable moieties can be used. Suitable detectable moieties include, but are not limited to: various ligands; radionuclides; fluorescent dyes; chemiluminescent agents (such as acridinium esters, stabilized dioxetanes, and the like); bioluminescent agents; spectrally resolvable inorganic fluorescent semiconductor nanocrystals (e.g., quantum dots); microparticles; metal nanoparticles (e.g., gold, silver, copper, platinum); nanoclusters; paramagnetic metal ions; enzymes; colorimetric labels (such as, for example, dyes, colloidal gold, and the like); biotin; digoxigenin; haptens; and proteins for which antisera or monoclonal antibodies are available. Another labeling technique which may result in greater sensitivity is coupling the antibodies to low molecular weight haptens. These haptens can then be specifically altered by means of a second reaction. For example, it is common to use haptens such as biotin, which reacts with avidin, or dinitrophenol, pyridoxal, or fluorescein, which can react with specific anti-hapten antibodies. Biotin can be bound to avidins (such as streptavidin), which are typically conjugated (directly or indirectly) to other moieties (e.g., fluorescent moieties) that are detectable themselves.

[0092] In certain embodiments, a detectable moiety is a fluorescent dye. Numerous fluorescent dyes of a wide variety of chemical structures and physical characteristics are known. A fluorescent detectable moiety can be stimulated by a laser with the emitted light captured by a detector. The detector can be a charge-coupled device (CCD) or a confocal microscope, which records its intensity. Suitable fluorescent dyes include, but are not limited to, fluorescein and fluorescein dyes (e.g., fluorescein isothiocyanate or FITC, naphthofluorescein, 4’,5’-dichloro-2’,7’-dimethoxyfluorescein, 6-carboxyfluorescein or FAM), hexachloro-fluorescein (HEX), carbocyanine, merocyanine, styryl dyes, oxonol dyes, phycoerythrin, erythrosin, eosin, rhodamine dyes (e.g., carboxytetramethylrhodamine or TAMRA, carboxyrhodamine 6G, carboxy-X-rhodamine (ROX), lissamine rhodamine B, rhodamine 6G, rhodamine Green, rhodamine Red, tetramethylrhodamine (TMR)), coumarin and coumarin dyes (e.g., methoxycoumarin, dialkylaminocoumarin, hydroxycoumarin, aminomethylcoumarin (AMCA)), Q-DOTS, Oregon Green Dyes (e.g., Oregon Green 488, Oregon Green 500, Oregon Green 514), Texas Red, Texas Red-X, SPECTRUM RED, SPECTRUM GREEN, cyanine dyes (e.g., CY-3, CY-5, CY-3.5, CY-5.5), ALEXA FLUOR dyes (e.g., ALEXA FLUOR 350, ALEXA FLUOR 488, ALEXA FLUOR 532, ALEXA FLUOR 546, ALEXA FLUOR 568, ALEXA FLUOR 594, ALEXA FLUOR 633, ALEXA FLUOR 660, ALEXA FLUOR 680), BODIPY dyes (e.g., BODIPY FL, BODIPY R6G, BODIPY TMR, BODIPY TR, BODIPY 530 / 550, BODIPY 558 / 568, BODIPY 564 / 570, BODIPY 576 / 589, BODIPY 581 / 591, BODIPY 630 / 650, BODIPY 650 / 665), IRDyes (e.g., IRD40, IRD 700, IRD 800), and the like. For more examples of suitable fluorescent dyes and methods for coupling fluorescent dyes to other chemical entities such as proteins and peptides, see, for example, “Handbook of Molecular Probes and Research Reagents,” 8th ed. (2002), Molecular Probes, Eugene, Oregon; and U.S. Patent Nos. 6,191,278, 6,372,907, 6,096,723, 5,945,526, 4,997,928, and 4,318,846. Favorable properties of fluorescent labeling agents include high molar absorption coefficient, high fluorescence quantum yield, and photostability. In some embodiments, labeling fluorophores exhibit absorption and emission wavelengths in the visible (i.e., between 400 and 750 nm) rather than in the ultraviolet range of the spectrum (i.e., lower than 400 nm).

[0093] A detectable moiety may include more than one chemical entity, such as in fluorescence resonance energy transfer (FRET). Resonance transfer results in an overall enhancement of the emission intensity. To achieve resonance energy transfer, the first fluorescent molecule (the "donor" fluor) absorbs light and transfers it through the resonance of excited electrons to the second fluorescent molecule (the "acceptor" fluor). In one approach, both the donor and acceptor dyes can be linked together and attached to the oligo primer. Methods to link donor and acceptor dyes to a nucleic acid have been described, for example, in U.S. Pat. No. 5,945,526. Donor / acceptor pairs of dyes that can be used include, for example, fluorescein / tetramethylrhodamine, IAEDANS / fluorescein, EDANS / DABCYL, fluorescein / fluorescein, BODIPY FL / BODIPY FL, and fluorescein / QSY 7 dye. See, e.g., U.S. Pat. No. 5,945,526 to Lee et al. Many of these dyes also are commercially available, for instance, from Molecular Probes Inc. (Eugene, Oregon). Suitable donor fluorophores include 6-carboxyfluorescein (FAM), tetrachloro-6-carboxyfluorescein (TET), 2’-chloro-7’-phenyl-1,4-dichloro-6-carboxyfluorescein (VIC), and the like.

[0094] A detectable moiety may be an enzyme. Examples of suitable enzymes include, but are not limited to, those used in an ELISA, e.g., horseradish peroxidase, beta-galactosidase, luciferase, alkaline phosphatase, etc. Other examples include beta-glucuronidase, beta-D-glucosidase, urease, glucose oxidase, etc. An enzyme may be conjugated to a molecule using a linker group such as a carbodiimide, a diisocyanate, a glutaraldehyde, and the like.

[0095] In certain embodiments, a detectable moiety is a radioactive isotope. For example, a molecule may be isotopically labeled (i.e., may contain one or more atoms that have been replaced by an atom having an atomic mass or mass number different from the atomic mass or mass number usually found in nature) or an isotope may be attached to the molecule. Nonlimiting examples of isotopes that can be incorporated into molecules include isotopes of hydrogen, carbon, fluorine, phosphorus, copper, gallium, yttrium, technetium, indium, iodine, rhenium, thallium, bismuth, astatine, samarium, and lutetium (e.g., 3H, 13C, 14C, 18F, 19F, 32P, 35S, 64Cu, 67Cu, 67Ga, 90Y, 99mTc, 111In, 125I, 123I, 129I, 131I, 135I, 186Re, 187Re, 201Tl, 212Bi, 213Bi, 211At).

[0096] A detectable moiety can be a heterologous polypeptide, for example, FLAG, polyhistidine, hemagglutinin (HA), glutathione-S-transferase (GST), or maltose-binding protein (MBP). In some instances, the detectable label can be a heterologous polypeptide that is useful as a diagnostic or detectable marker such as, for example, luciferase, a fluorescent protein (such as a green fluorescent protein (GFP)), or chloramphenicol acetyl transferase (CAT). Signal amplification may also be achieved using labeled dendrimers as the detectable moiety (see, e.g., Stears et al., Physiol Genomics 3:93-99, 2000).

Biomarkers

[0097] As discussed above, biomarkers according to the embodiments of the present disclosure may be expression products of the genes that are shown in Table 5A, Table 5B, and / or Table 5C of the present disclosure. Examples of such genes are PRRC2A, LAX1, QTRT1, and NRAC. An expression product may be a polypeptide (“biomarker polypeptide”) or a nucleic acid, such as mRNA (“biomarker mRNA”). In the context of the present disclosure, when an expression product is mRNA, it is understood to contain one or more RNA sequences that may serve as a template, in a translation process, for one or more biomarker polypeptides according to the present disclosure.

[0098] In various embodiments according to the present disclosure, an entire protein (full-length polypeptide) encoded by one of the genes that are shown in Table 5A, Table 5B, and / or Table 5C may serve as a biomarker polypeptide. A fragment of such a protein may be used, a variant of such a protein may be used, or any combination of two or more of the full-length polypeptide, a fragment, a variant of the full-length polypeptide, or a variant of a fragment of a protein encoded by one of the genes that are shown in Table 5A, Table 5B, and / or Table 5C may be used. In reference to the embodiments of the present invention, an amino acid sequence of a biomarker polypeptide may contain a naturally occurring (or “wild-type”) amino acid sequence of a biomarker protein or a portion (fragment) thereof. Examples of wild-type amino acid sequences are the sequences of human PRRC2A, LAX1, QTRT1, and NRAC.
It is to be understood that, in some examples of the polypeptides according to the present disclosure, various features (such as posttranslational modifications) and mutations of the wild-type amino acid sequences may be present.

Table 1. Exemplary wild-type sequences.

Protein: PRRC2A (SEQ ID NO:1)*
Amino acid sequence:
MSDRSGPTAKGKDGKKYSSLNLFDTYKGKSLEIQKPAVAPRHGLQSLGKVAIARRMPPPANLPSLK
AENKGNDPRWSLVPKPGTGWASKQEQSD'PKSSDASTAQPPESQPLPASQTPASNQPKRPPAAPENT
PLVPSGVKSWAQASVTHGAHGDGGRASSLLSRFSREEFPTLQAAGDQDKAAKERESAEQSSGPGPS
LRPQNSTTWRDGGGRGPDELEGPDSKLHHGHDPRGGLQPSGPPQFPPYRGMMPPFMYPPYLPFPPP
YGPQGPYRYPTPDGPSRFPRVAGPRGSGPPMRLVEPVGRPSILKEDNLKEFDQLD'QENDDGWAGAH
EEVDYTEKLKFSDEEDGRDSDEEGAEGHRDSQSASGEERPPEADGKKGNSPNSEPPTPKTAWAETS
RPPETEPGPPAPKPPLPPPHRGPAGNWGPPGDYPDRGGPPCKPPAPEDEDEAWRQRRKQSSSEISL
AVERARRRREEEERRMQEERRAACAEKLKRLDEKFGAPDKRLKAEPAAPPAAPSTPAPPPAVPKEL
PAPPAPPPASAPTPETEPEEPAQAPPAQSTPTPGVAAAPTLVSGGGSTSSTSSGSFEASPVEPQLP
SKEGPEPPEEVPPPTTPPVPKVEPKGDGIGPTRQPPSQGLGYPKYQKSLPPRFQRQQQEQLLKQQQ
QHQWQQHQQGSAPPTPVPPSPPQPVTLGAVPAPQAPPPPPKALYPGALGRPPPMPPMtFDPRWT^II
PPYVDPRLLQGRPPLDFYPPGVHPSGLVPRERSDSGGSSSEPFDRHAPAMLRERGTPPVDPKLAWV
GD'VFTATPAEPRPLTSPLRQAADEDDKGMRSETPPVPPPPPYLASYPGFPENGAPGPPISRFPLEE
PGPRPLPWPPGSDEVAKIQTPPPKKEPPKEETAQLTGPEAGRKPARGVGSGGQGPPPPRRESRTET
RWGPRPGSSRRGIPPEEPGAPPRRAGPIKKPPPPTKVEELPPKPLEQGDETPKPPKPPPLKITKGK
LGGPKETPPNGNLSPAPRLRRDYSYERVGPTSCRGRGRGEYFARGRGFRGTYGGRGRGARSREFRS
YREFRGDDGRGGGTGGPNHPPAPRGRTASETRSEGSEYEEIPKRRRQRGSETGSETHESDLAPSDK
EAPTPKEGTLTQVPLAPPPPGAPPSPAPARFTARGGRVFTPRGVPSRRGRGGGRPPPQVCPGWSPP
AKSLAPKKPPTGPLPPSKEPLKEKLIPGPLSPVARGGSNGGSNVGMEDGERPRRRRHGRAQQQDKP
PRFRRLKQERENAARGSEGKPSLTLPASAPGPEEALTTVTVAPAPRRAAAKSPDLSNQNSDQANEE
WETASESSDFTSERRGDKEAPPPVLLTPKAVGTPGGGGGGAVPGISAMSRGDLSQRAKDLSKRSFS
SQRPGMERQNRRPGPGGKAGSSGSSSGGGGGGPGGRTGPGRGDKRSWPSPKNRSRPPEERPPGLPL
PPPPPSSSAVFRLDQVIHSNPAGIQQALAQLSSRQGSVTAPGGHPRHKPGLPQAPQGPSPRPPTRY
EPQRVNSGLSSDPHFEEPGPMVRGVGGTPRDSAGVSPFPPKRRERPPRKPELLQEESLPPPHSSGF
LGSKPEGPGPQAESRDTGTEALTPHIWNRLHTATSRKSYRPSSMEPWMEPLSPFEDVAGTEMSQSD
SGVDLSGDSQVSSGPCSQRSSPDGGLKGAAEGPPKRPGGSSPLNAVPCEGPPGSEPPRRPPPAPHD
GDRKEIiPREQPLPPGPIGTERSQRTDRGTEPGPIRPSHRPGPPVQFGTSDKDSDLRLVVGDSLKAE
KELTASVTEAIPVSRDWELLPSAAASAEPQSKNLDSGHCVPEPSSSGQRLYPEVFYGSAGPSSSQI
SGGAMDSQLHPNSGGFRPGTPSLHPYRSQPLYLPPGPAPPSALLSGIALKGQFLDFSTMQATELGK
LPAGGVLYPPPSFLYSPAFCPSPLPDTSLLQVRQDLPSPSDFYSTPLQPGGQSGFLPSGAPAQQML
LPWDSQLPVVNFGSLPPAPPPAPPPLSLLPVGPALQPPSLAVRPPPAPATRVLPSPARPFPASLG
RAELHPVELKPFQDYQKLSSNLGGPGSSRTPPTGRSFSGLNSRLKATPSTYSGVFRTQRVDLYQQA
SPPDALRWIPKPWERTGPPPREGPSRRAEEPGSRGDKEPGLPPPR

Protein: LAX1 (SEQ ID NO:2)**
Amino acid sequence:
MDGVTPTLSTIRGRTLESSTLHVTPRSLDRNKDQITNIFSGFAGLIAILLVVAVFCILWWNKRKK
RQVPYLRVTVMPLLTLPQTRQRAKNIYDILPWRQEDLGRHESRSMRIFSTESLLSRNSESPEHVPS
QAGNAFQEHTAHIHATEYAVGIYDNAIWPQMCGNLTPSAHCIWRASRDCASISSEDSHDYVWPT
AEEIAETLASTKSPSRNLFVLPSTQKLEFTEERDEGCGDAGDCTSLYSPGAEDSDSLSNGEGSSQI
SNDYVNMTGLDLSAIQERQLWVAFQCCRDYENVPAADPSGSQQQAEKDVPSSNIGHVEDKTDDPGT
HVQCVKRTFLASGDYADFQPFTQSEDSQMKHREEMSNEDSSDYENVLTAKLGGRDSEQGPGTQLLP
DE

Protein: QTRT1 (SEQ ID NO:3)***
Amino acid sequence:
MKLSETKYWNGCRLGKIKNLGKTGDHTMDIPGCELYTKTGSAPHLTHHTLHNIHGVPAMAQLTLSS
LAEHHEVLTEYKEGVGKFIGMPESLLYCSLHDPVSPCPAGYWNKSVSWSVAGRVEMTVSKFMAI
QKALQPDWFQCLSDGEVSCKEATSIKRVRKSVDRSLLFLDNCLREQEESEVLQKSVIIGVIEGGDV
MEERLRSARETAKRPVGGFLLDGFQGNPTTLEARLRLLSSVTAELPEDKPRLISGVSRPDEVLECI
ERGVDLFESFFPYQVTERGCALTFSFDYQPNPEETLLQQNGTQEEIKCMDQIKKIETTGCNQEITS
FEINLKEKKYQEDFNPLVRGCSCYCCKNHTRAYIHHLLVTNELLAGVLLMMHNFEHYFGFFHYIRE
ALKSDKLAQLKELIHRQAS

Protein: NRAC (SEQ ID NO:4)
Amino acid sequence:
MRTAAGAVSPDSRPETRRQTRKNEEAAWGPRVCRAEREDNRKCPPSILKRSRPEHHRPEAKPQRTS
RRWFREPPAVTVHYIADKNATATVRVPGRPRPHGGSLLLQLCVCVDLVLALGLYCGRAJIPVATALEDLRARLLGLVLHLRHVALTCWRGLLRL

*P48634-1 isoform identified as the canonical sequence in Universal Protein Resource (UniProt) database
**Q8IWV1-1 isoform identified as the canonical sequence in UniProt database
***Q9H974-1 isoform identified as the canonical sequence in UniProt database

[0099] Some embodiments of biomarker polypeptides may contain artificially modified amino acid sequences of a biomarker protein or fragment thereof. Artificially modified amino acid sequences of biomarker polypeptides may contain various mutations and / or amino acid modifications, as compared to wild-type sequences. Biomarker polypeptide amino acid sequences containing mutations with respect to wild-type biomarker polypeptide sequences can be referred to as “variants” and are discussed in more detail below. In one example, an artificially modified amino acid sequence of a biomarker polypeptide may contain mutations removing or adding glycosylation sites. In another example, an artificially modified amino acid sequence of a biomarker polypeptide may contain one or more mutations stabilizing a desirable conformation of the biomarker polypeptide. A biomarker polypeptide may be artificially chemically modified, for example, with glutaraldehyde. In some embodiments according to the present disclosure, a biomarker polypeptide includes an epitope that is recognized by an antibody to the biomarker polypeptide. It is to be understood that epitopes, while recognized by antibodies to biomarker polypeptides, may or may not stimulate a subject’s immune system to produce antibodies to the biomarker polypeptides.

[0100] The term “variant” of a biomarker polypeptide or protein refers to a polypeptide comprising an amino acid residue sequence that is at least 70, 75, 80, 85, 90, 92, 94, 95, 96, 97, 98 or 99% identical to a corresponding wild-type sequence of the biomarker polypeptide. In certain embodiments according to the present disclosure, a variant of the biomarker protein retains its ability to be bound specifically under appropriate conditions by an antibody that would specifically bind to the corresponding full-length biomarker polypeptide under appropriate conditions. In some instances, variants are modified at amino acid residues other than those essential for the biological activity, for example the ability of the polypeptide to bind specifically to a biomarker polypeptide-specific antibody. In some instances, one or more such essential amino acid residues may optionally be replaced in a conservative manner or additional amino acid residues may be inserted such that the biological activity of the variant polypeptide is preserved.
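
The percent-identity thresholds recited above can be illustrated with a minimal sketch. The present disclosure does not specify how percent identity is calculated, so the definition used here (identical residues divided by aligned columns, computed on two sequences that have already been aligned, with "-" marking gaps) is an assumption. The function names and the example variant are hypothetical; the reference peptide is the first ten residues of the LAX1 sequence in Table 1.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of aligned columns holding the same residue in both sequences.

    Inputs are pre-aligned strings of equal length; '-' marks a gap.
    Columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_reference, aligned_candidate, threshold=70.0):
    """True when the candidate is at least `threshold` percent identical."""
    return percent_identity(aligned_reference, aligned_candidate) >= threshold


reference = "MDGVTPTLST"   # first ten residues of LAX1 (Table 1)
candidate = "MDGVTPTLSA"   # hypothetical variant with one substitution
print(percent_identity(reference, candidate))            # 9 of 10 columns match
print(meets_identity_threshold(reference, candidate, 90.0))
```

In practice the alignment itself would come from a dedicated alignment tool, and different tools define the denominator differently, so reported identity values can vary with the chosen convention.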

[0101] The term “fragment” or “portion” with regard to a biomarker polypeptide refers to an amino acid residue sequence of a portion of the full-length biomarker protein, encompassing, for example, an amino acid residue sequence that is truncated at one or both termini by one or more amino acids. In certain embodiments, a biomarker polypeptide fragment retains its ability to be bound specifically, under appropriate binding conditions, by a biomarker polypeptide-specific antibody that would bind specifically to the corresponding full-length biomarker protein under appropriate binding conditions. A portion of a biomarker protein can be a polypeptide that is, for example, 10, 25, 50, 100, 150, 200, 250 or more amino acid residues in length of the full-length biomarker protein. Alternatively or in addition, such peptide sequence may comprise one or more internal deletions of one or more amino acid residues. Thereby, the residual length of the fragment equals or exceeds the length of one or more continuous or conformational epitopes, e.g., 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100 or more amino acid residues.

[0102] In some embodiments, a fragment comprises at least 6 contiguous amino acid residues of SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, or SEQ ID NO:5, or an amino acid sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence similarity to SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, or SEQ ID NO:5. In some embodiments, a fragment comprises at least 8 contiguous amino acid residues of SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, or SEQ ID NO:5, or an amino acid sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence similarity to SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, or SEQ ID NO:5. In some embodiments, a fragment comprises at least 12 contiguous amino acid residues of SEQ ID NO:1 or an amino acid sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence similarity to SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, or SEQ ID NO:5. In some embodiments, a fragment comprises 8-20 contiguous amino acid residues of SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, or SEQ ID NO:5, or an amino acid sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence similarity to SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, or SEQ ID NO:5. In some embodiments, a fragment comprises 30-60 contiguous amino acid residues of SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, or SEQ ID NO:5, or an amino acid sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence similarity to SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, or SEQ ID NO:5. In some embodiments, a fragment comprises 16 contiguous amino acid residues of SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, or SEQ ID NO:5, or an amino acid sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence similarity to SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, or SEQ ID NO:5. In some instances, a plurality of fragments is provided.
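
The notion of a fragment comprising a stated number of contiguous residues of a reference sequence can be illustrated with a minimal sketch. The helper names below are hypothetical, and the reference string is only the first twenty residues of the LAX1 sequence in Table 1, used as a stand-in for a full-length SEQ ID NO. The sketch covers exact contiguous matches only and does not address the sequence-similarity alternative.

```python
def contiguous_fragments(sequence, length):
    """Return every window of `length` contiguous residues, in order."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


def is_contiguous_fragment(candidate, sequence, min_length=6):
    """True when `candidate` has at least `min_length` residues and occurs
    as an uninterrupted stretch of `sequence`."""
    return len(candidate) >= min_length and candidate in sequence


lax1_prefix = "MDGVTPTLSTIRGRTLESST"  # first 20 residues of LAX1 (Table 1)

windows = contiguous_fragments(lax1_prefix, 6)
print(len(windows), windows[0], windows[-1])

print(is_contiguous_fragment("TPTLST", lax1_prefix))   # contiguous stretch of 6
print(is_contiguous_fragment("MDGTP", lax1_prefix))    # too short, not contiguous
```

A plurality of fragments, as mentioned above, could be generated by calling `contiguous_fragments` with each length of interest and pooling the results.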

[0103] In some embodiments, a fragment comprises at least 6 contiguous amino acid residues of SEQ ID NO:6, SEQ ID NO:7, or SEQ ID NO:8, as shown in Table 3 of the present disclosure, or an amino acid sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence similarity to SEQ ID NO:6, SEQ ID NO:7, or SEQ ID NO:8. In some embodiments, a fragment comprises at least 7 contiguous amino acid residues of SEQ ID NO:6, SEQ ID NO:7, or SEQ ID NO:8, or an amino acid sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence similarity to SEQ ID NO:6, SEQ ID NO:7, or SEQ ID NO:8. In some embodiments, a fragment comprises at least 8 contiguous amino acid residues of SEQ ID NO:6, SEQ ID NO:7, or SEQ ID NO:8, or an amino acid sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence similarity to SEQ ID NO:6, SEQ ID NO:7, or SEQ ID NO:8. In some embodiments, a fragment comprises at least 9 contiguous amino acid residues of SEQ ID NO:6, SEQ ID NO:7, or SEQ ID NO:8, or an amino acid sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence similarity to SEQ ID NO:6, SEQ ID NO:7, or SEQ ID NO:8. In some embodiments, a fragment comprises at least 10 contiguous amino acid residues of SEQ ID NO:6, SEQ ID NO:7, or SEQ ID NO:8, or an amino acid sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence similarity to SEQ ID NO:6, SEQ ID NO:7, or SEQ ID NO:8. In some embodiments, a fragment comprises at least 11 contiguous amino acid residues of SEQ ID NO:6, SEQ ID NO:7, or SEQ ID NO:8, or an amino acid sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence similarity to SEQ ID NO:6, SEQ ID NO:7, or SEQ ID NO:8. In some embodiments, a fragment comprises at least 12 contiguous amino acid residues of SEQ ID NO:6, SEQ ID NO:7, or SEQ ID NO:8, or an amino acid sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence similarity to SEQ ID NO:6, SEQ ID NO:7, or SEQ ID NO:8. In some embodiments, a fragment comprises at least 13 contiguous amino acid residues of SEQ ID NO:6, SEQ ID NO:7, or SEQ ID NO:8, or an amino acid sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence similarity to SEQ ID NO:6, SEQ ID NO:7, or SEQ ID NO:8. In some embodiments, a fragment comprises at least 14 contiguous amino acid residues of SEQ ID NO:6, SEQ ID NO:7, or SEQ ID NO:8, or an amino acid sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence similarity to SEQ ID NO:6, SEQ ID NO:7, or SEQ ID NO:8. In some embodiments, a fragment comprises at least 15 contiguous amino acid residues of SEQ ID NO:6, SEQ ID NO:7, or SEQ ID NO:8, or an amino acid sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence similarity to SEQ ID NO:6, SEQ ID NO:7, or SEQ ID NO:8. In some embodiments, a fragment comprises SEQ ID NO:6, SEQ ID NO:7, or SEQ ID NO:8, or an amino acid sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence similarity to SEQ ID NO:6, SEQ ID NO:7, or SEQ ID NO:8.

[0104] Variants and fragments according to the present disclosure may be prepared, for example, by introducing deletions, insertions or substitutions in nucleic acid sequences encoding them, or by chemical synthesis or modification. Moreover, variants and fragments may also be generated by fusion with other known polypeptides or variants thereof and encompass active portions or domains, preferably having a sequence identity of at least 70, 75, 80, 85, 90, 92, 94, 95, 96, 97, 98, or 99% when aligned with the active portion of the reference sequence, wherein the term “active portion”, as used herein, refers to an amino acid sequence, which is less than the full-length amino acid sequence or, in the case of a nucleic acid sequence, codes for less than the full-length amino acid sequence, respectively, but retains at least some of the biological activity. For example, an active portion of a polypeptide retains the ability to bind to an antibody to a biomarker polypeptide.

[0105] In some embodiments, a biomarker polypeptide is an isolated, purified polypeptide. Protein expression and purification methods are well known. Biomarker polypeptides according to the present disclosure may be provided in any form and at any degree of purification, from tissues or cells comprising said polypeptides in an endogenous form, such as cells overexpressing the polypeptide and crude or enriched lysates of such cells, to purified and / or isolated polypeptides that are essentially pure. In some embodiments, a biomarker polypeptide is included in a phage display or eukaryotic cell display library. In some embodiments, a biomarker polypeptide is heterologously expressed on the surface of a cell. In some embodiments, biomarker polypeptides have a native configuration, wherein the term “native configuration,” as used herein, refers to a folded polypeptide, such as a folded polypeptide purified from tissues or cells, such as mammalian cells or tissues or from non-recombinant tissues or cells. In some embodiments, biomarker polypeptides are recombinant proteins, wherein the term “recombinant,” as used herein, refers to a polypeptide produced using genetic engineering approaches at any stage of the production process, for example by fusing a nucleic acid encoding the polypeptide to a strong promoter for overexpression in cells or tissues or by engineering the sequence of the polypeptide itself. Such techniques are well known in the art.

[0106] In some instances, biomarker polypeptides can be denatured, such as by heating, freezing, or ultraviolet irradiation, or by chemical treatment, such as with a surfactant or a denaturant. For example, such a denatured form of a biomarker polypeptide may be prepared by treating it with sodium dodecyl sulfate (SDS) or dithiothreitol (DTT). Biomarker polypeptides that are included in a kit or a panel as described in the present disclosure can be provided within a cell, in a solution in which they are soluble, or in a lyophilized form.

[0107] In some embodiments, biomarker polypeptides can be immobilized on a solid carrier insoluble in an aqueous solution, such as via a covalent bond, electrostatic interactions, encapsulation or entrapment (for example, by denaturing a globular polypeptide in a gel), or hydrophobic interactions; in some examples, immobilization is via one or more covalent bonds. Various suitable carriers, for example paper, metal, silicon or glass surfaces, microfluidic channels, membranes, beads such as magnetic beads, column chromatography media, biochips, polyacrylamide gels and the like have been described in the literature, for example in Kim et al. 2013. This way, the immobilized molecule, together with the insoluble carrier, may be separated from an aqueous solution in a straightforward manner, for example by filtration, centrifugation or decanting. An immobilized molecule may be immobilized in a reversible or irreversible manner. For example, the immobilization is reversible if the molecule interacts with the carrier via ionic interactions that can be masked by addition of a high concentration of salt or if the molecule is bound via a cleavable covalent bond such as a disulfide bridge which may be cleaved by addition of thiol-containing reagents. By contrast, the immobilization is irreversible if the molecule is tethered to the carrier via a covalent bond that cannot be cleaved in aqueous solution, for example a bond formed by reaction of an epoxide group and an amine group as frequently used to couple lysine side chains to affinity columns. A biomarker polypeptide may be indirectly immobilized, for example by immobilizing an antibody or other entity having affinity to the molecule, followed by formation of a complex to the effect that the molecule-antibody complex is immobilized. Various reagents and kits for immobilization reactions are commercially available such as, for example, from Pierce Biotechnology.

Kits, devices, and systems

[0108] In some embodiments, the present disclosure provides kits, devices, and systems for use in accordance with the methods and compositions according to the embodiments of the present invention. Kits, devices, and systems according to the present disclosure may include one or more reagents for detecting expression products of one or more genes encoding the biomarkers associated with NSCLC. Examples of these genes include the genes that are shown in Table 5A, Table 5B, and / or Table 5C of the present disclosure. In some examples, the genes comprise one or more of PRRC2A, LAX1, QTRT1, or NRAC. As discussed elsewhere in the present disclosure, an expression product may be a protein or a nucleic acid. In certain embodiments, one or more expression products are mRNA. In the relevant embodiments, kits, devices, and systems may include the reagents for measuring mRNA. In certain embodiments, one or more expression products are polypeptides, and kits, devices, and systems include reagents for detecting proteins, for example, by using an immunoassay. Thus, kits, devices, and systems according to the present disclosure may include nucleic acid primers and / or probes, antibodies and / or antibody fragments. In some embodiments, suitable reagents are provided in a form of an array such as a microarray.

[0109] Kits, devices, and systems according to the present disclosure may include one or more reagents for detecting one or more antibodies to one or more biomarkers associated with NSCLC in a subject. In the context of kits, devices, and systems for antibody detection, the biomarkers are one or more polypeptides (biomarker polypeptides) encoded by one or more genes encoding the biomarkers associated with NSCLC. Examples of these genes include the genes that are shown in Table 5A, Table 5B, and / or Table 5C of the present disclosure. In some examples, the genes comprise one or more of PRRC2A, LAX1, QTRT1, or NRAC. Thus, provided in this disclosure are kits, devices, and systems containing one or more polypeptides identified by the inventors (some of which are listed in Table 3 of the present disclosure), such as one or more polypeptides encoded by the genes listed in Table 5A, Table 5B, or Table 5C, to which antibodies associated with NSCLC can specifically bind. The polypeptides used in the kits, devices, and systems according to the present disclosure need not be full-length polypeptides encoded by the genes listed in Table 5A, Table 5B, or Table 5C. The polypeptides used in the kits, devices, and systems used for antibody detection need to be able to specifically bind to antibodies associated with a risk of NSCLC.

[0110] In some embodiments, the provided kits, devices, and systems may further comprise reagents for carrying out various detection methods according to the present disclosure, such as, but not limited to, RT-PCR, sequencing, hybridization, primer extension, immunoassays, etc. For example, kits may optionally contain buffers, enzymes, and / or reagents for use in methods according to the present disclosure for amplifying nucleic acids via PCR (for example, duplex or multiplex PCR and / or RT-PCR (i.e., real-time RT-PCR)), for primer-directed amplification, for performing immunoassay experiments, etc. Kits, devices, and systems may, in certain embodiments, include various primers and / or probes labeled with detectable moieties described elsewhere in the present disclosure. In some examples, a kit, a device, or a system may include detectably labeled antibodies. Reagents for the detection of the detectably labeled antibodies can also be included in kits, devices, or systems according to the present disclosure. In some examples, a kit, a device, or a system may include one or more solubilizing agents for polypeptides or nucleic acids, for example, one or more buffer solutions.

[0111] In addition, a kit, a device, or a system can include directions for practicing a method described in the present disclosure, such as detecting expression products or antibodies associated with NSCLC, in a biological sample. A concentration or amount of one or more antibodies or expression products associated with NSCLC, contained in the biological sample may be indirectly measured, for example, by measuring an amount of a detectable label. The obtained measurement value may be converted to a relative or absolute concentration, amount, activity, etc. using a calibration curve or the like.
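By way of non-limiting illustration, the conversion of a measured label signal to a concentration using a calibration curve may be sketched as follows. The sketch assumes a piecewise-linear curve through calibrator points; the function name, the argument names, and the choice of linear interpolation are hypothetical and are not specified in the present disclosure.

```python
from bisect import bisect_left


def concentration_from_signal(signal, standards):
    """Interpolate a concentration from a detectable-label signal.

    standards: (signal, concentration) pairs measured on calibrators.
    Piecewise-linear interpolation is an assumption of this sketch.
    """
    points = sorted(standards)
    signals = [s for s, _ in points]
    if not signals[0] <= signal <= signals[-1]:
        raise ValueError("signal lies outside the calibrated range")
    i = bisect_left(signals, signal)
    if signals[i] == signal:
        # The measurement coincides with a calibrator point.
        return points[i][1]
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (signal - x0) / (x1 - x0)


# Hypothetical calibrators: (label signal, concentration in arbitrary units).
standards = [(0.0, 0.0), (100.0, 10.0), (200.0, 40.0)]
estimate = concentration_from_signal(150.0, standards)
```

Signals outside the calibrated range are rejected rather than extrapolated, which is one possible design choice among several.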

[0112] In some embodiments, a kit, a device, or a system may include a control indicative of a healthy individual, such as a nucleic acid and / or protein sample from a subject who does not have NSCLC. Alternatively, the kit may comprise a positive control comprising a known amount of one (or more) of the expression products or antibodies measured. A kit, a device, or a system may contain instructions on how to determine if a subject has NSCLC, or is at risk of developing NSCLC.

[0113] Some embodiments of the present invention include prognostic or diagnostic devices containing one or more reagents for detecting NSCLC-associated expression products and / or antibodies according to the present disclosure. In some exemplary devices, one or more reagents for detecting expression products of one or more genes encoding the biomarkers associated with NSCLC polypeptides, or reagents for detecting NSCLC-associated antibodies may be immobilized on the surface of a carrier. Such carriers include, but are not limited to, glass plates or slides, chips, microtiter plates, fibers, beads, for example magnetic beads, chromatography columns, membranes or the like. Exemplary devices include line blots, microtiter plates and biochips.

[0114] In some exemplary embodiments, the present disclosure provides systems for detecting or measuring (measuring the presence and / or amount of) one or more expression products or antibodies associated with NSCLC in a biological sample obtained from a subject. Some embodiments of such systems may include a station and / or component for obtaining a biological sample from a subject. Some embodiments of such systems may include a station and / or component for detecting or measuring in the biological sample presence and / or amount of expression products from one or more genes shown in Table 5A, Table 5B, and / or Table 5C of the present disclosure. In some examples, the one or more genes comprise one or more of PRRC2A, LAX1, QTRT1, or NRAC. Some embodiments of such systems may include a station and / or component for detecting or measuring in the biological sample presence and / or amount of antibodies to polypeptides encoded by one or more genes shown in Table 5A, Table 5B, and / or Table 5C of the present disclosure. Some embodiments of such systems may include a station and / or component for detecting or measuring in the biological sample presence and / or amount of antibodies to polypeptides encoded by one or more of PRRC2A, LAX1, QTRT1, or NRAC genes. Some embodiments of such systems may include a station and / or component for detecting or measuring in the biological sample presence and / or amount of antibodies to polypeptides shown in Table 3 of the present disclosure.

[0115] In some embodiments, a kit, a device, or a system according to the present disclosure may include one or various components of computer systems, which are described elsewhere in the present disclosure.

Methods of treatment

[0116] Provided in the present disclosure are methods of treating subjects identified as having NSCLC using methods of detection described in the present disclosure, which are useful for screening subjects to identify those at risk of or having NSCLC. The presence, absence, or amount of one or more biomarkers according to the present disclosure, or one or more antibodies associated with NSCLC, such as antibodies to one or more biomarkers according to the present disclosure, in a biological sample obtained from a subject may be detected by using at least one or any combination of the methods, kits, or devices described elsewhere in the present disclosure. Other suitable NSCLC diagnostic techniques, such as X-ray diagnostics, computed tomography (CT), positron emission tomography (PET), magnetic resonance imaging (MRI), biopsies, and / or liquid biopsies may be used in conjunction with the detection methods according to the present disclosure.

[0117] The detection methods according to the present disclosure may be useful as an indicator in determining suitable NSCLC treatment or treatments for the subject. Accordingly, in some embodiments, the methods according to the present disclosure may include performing one or more treatments on or administering such treatments to the subject having NSCLC, who may be referred to as an NSCLC patient. An example of an NSCLC treatment is surgery. In most cases, the goal of lung cancer surgery is to remove the entire tumor, including a small amount of normal tissue (about 2 cm) at the margin. The general name for lung cancer surgery that enters the chest is thoracotomy, and specific types of surgical interventions may be performed as part of the thoracotomy, such as wedge resection, segmentectomy, sleeve resection, lobectomy, or pneumonectomy, depending on the tumor and patient characteristics. Another example of an NSCLC treatment is radiation treatment or radiation therapy (radiotherapy), which uses ionizing radiation to kill or control the growth of cancer cells. One more example of an NSCLC treatment is chemotherapy, which uses one or more chemotherapeutic agents (drugs). For example, NSCLC patients may be treated with cisplatin or carboplatin as the backbone of all NSCLC chemotherapy treatment protocols. Pemetrexed may be provided with platinum-based chemotherapy to patients with nonsquamous NSCLC. Gemcitabine is provided with a platinum-based drug to patients with squamous NSCLC. Other examples of chemotherapeutic agents that may be administered to NSCLC patients are taxanes, such as paclitaxel or docetaxel. One more example of an NSCLC treatment is targeted therapy, which uses agents specifically designed to selectively target molecular pathways responsible for, or that substantially drive, the malignant phenotype of NSCLC cells. For example, targeted therapy agents may be inhibitors of epidermal growth factor receptor (EGFR), which include tyrosine kinase inhibitors (TKIs) (for instance, erlotinib (Tarceva), gefitinib (Iressa), and osimertinib (Tagrisso)) and monoclonal antibodies against EGFR (for instance, cetuximab (Erbitux)). In another example, targeted therapy agents may be inhibitors of vascular endothelial growth factor (VEGF), such as bevacizumab (Avastin). In one more example, targeted therapy agents may be inhibitors of EML4-ALK, such as crizotinib.

[0118] In accordance with the present disclosure, there may be employed conventional molecular biology, microbiology, biochemical, and recombinant DNA techniques within the skill of the art. Such techniques are explained fully in the literature. The provided methods and compositions will be further described in the following examples, which do not limit the scope of the methods and compositions of matter described in the claims.

Methods of identifying autoantibodies associated with a risk of NSCLC

[0119] Methods of identifying autoantibodies associated with a risk of NSCLC are provided in the present disclosure. Embodiments of such methods may use PhIP-Seq as described, for example, in WO2020190700, WO2021138273, WO2021046466, and WO2022146837. Also provided are methods of identifying and / or detecting antigens of autoantibodies associated with a risk of NSCLC. Examples of such antigens are biomarker polypeptides according to the present disclosure, including polypeptides encoded by one or more genes shown in Table 5A, Table 5B, and / or Table 5C. Embodiments of the methods may involve the following steps: (a) contacting a first biological sample obtained from a human subject who is healthy and / or known to be NSCLC-free with a peptide display library under conditions sufficient to permit binding of a first plurality of antibodies from the first biological sample to a first plurality of corresponding polypeptides within the peptide display library to generate a first plurality of antibody-polypeptide complexes; (b) contacting a second biological sample obtained from a human subject who has NSCLC with the peptide display library under conditions sufficient to permit binding of a second plurality of antibodies from the second biological sample to a second plurality of corresponding polypeptides within the peptide display library to generate a second plurality of antibody-polypeptide complexes; (c) identifying the first plurality of the polypeptides and the second plurality of the polypeptides; and (d) using a computer system, identifying a third plurality of polypeptides as the polypeptides found in the second plurality of the polypeptides but not in the first plurality of polypeptides. The third plurality of the polypeptides are the antigens of the autoantibodies associated with the risk of NSCLC.
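By way of non-limiting illustration, step (d) above amounts to a set difference between the polypeptides identified with NSCLC samples and those identified with NSCLC-free samples. The following sketch assumes that each sample has already been reduced to a set of polypeptide identifiers; the function name, the sample names, and the identifiers are hypothetical.

```python
def nsclc_associated_antigens(control_hits, case_hits):
    """Return polypeptides bound in NSCLC samples but in no control sample.

    control_hits / case_hits: dict mapping a sample name to the set of
    polypeptide identifiers recovered from that sample (hypothetical format).
    """
    first_plurality = set().union(*control_hits.values())   # healthy / NSCLC-free
    second_plurality = set().union(*case_hits.values())     # NSCLC
    # Third plurality: found with NSCLC samples, absent from controls.
    return second_plurality - first_plurality


control_hits = {"control_1": {"pepA", "pepB"}}
case_hits = {"case_1": {"pepB", "pepC"}, "case_2": {"pepC", "pepD"}}
third_plurality = nsclc_associated_antigens(control_hits, case_hits)
```

In practice, a threshold on enrichment would likely be applied before a polypeptide is counted as identified in a sample; that thresholding is outside the scope of this sketch.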

[0120] In some embodiments of identifying autoantibodies associated with a risk of NSCLC, the peptide display library is a phage display library, and a nucleic acid sequence encoding each peptide of the phage display library includes a unique nucleic acid barcode sequence. The first plurality of complexes and the second plurality of complexes are subjected to nucleic acid amplification under conditions sufficient to amplify the unique nucleic acid barcode sequences, and the sequences of the amplified nucleic acids are then determined by a suitable sequencing method. The term “sequencing,” as used herein, and the related terms and expressions, generally refer to methods and technologies for determining the sequence of nucleotide bases in one or more polynucleotides. The polynucleotides can be, for example, nucleic acid molecules such as deoxyribonucleic acid (DNA) or ribonucleic acid (RNA), including variants or derivatives thereof (e.g., single stranded DNA). Sequencing can be performed by various systems currently available, such as, without limitation, a sequencing system by Illumina®, Pacific Biosciences (PacBio®), Oxford Nanopore®, or Life Technologies (Ion Torrent®). Alternatively or in addition, sequencing may be performed using nucleic acid amplification, polymerase chain reaction (PCR) (e.g., digital PCR, quantitative PCR, or real time PCR), or isothermal amplification. Such systems may provide a plurality of raw genetic data corresponding to the genetic information of a subject (e.g., human individual), as generated by the systems from a sample provided by the subject. In some examples, such systems provide sequencing reads. A read may include a string of nucleic acid bases corresponding to a sequence of a nucleic acid molecule that has been sequenced. The determined sequences are then used for identifying the first plurality of the polypeptides and the second plurality of the polypeptides. As discussed above, the polypeptides found in the second plurality of the polypeptides but not in the first plurality of polypeptides are the antigens of the autoantibodies associated with the risk of NSCLC.

Computer Systems

[0121] Any of the computer systems mentioned in the present disclosure may utilize any suitable number of subsystems. In some embodiments, a computer system includes a single computer apparatus, where the subsystems can be the components of the computer apparatus. In other embodiments, a computer system can include multiple computer apparatuses, each being a subsystem, with internal components. The subsystems can be interconnected via a system bus. Additional subsystems such as a printer, keyboard, storage device(s), monitor, which is coupled to display adapter, and others are shown. Peripherals and input / output (I / O) devices, which couple to I / O controller, can be connected to the computer system by any number of means known in the art, such as serial port. For example, serial port or external interface (e.g. Ethernet, Wi-Fi, etc.) can be used to connect computer system to a wide area network such as the Internet, a mouse input device, or a scanner. The interconnection via system bus allows the central processor to communicate with each subsystem and to control the execution of instructions from system memory or the storage device(s) (e.g., a fixed disk, such as a hard drive or optical disk), as well as the exchange of information between subsystems. The system memory and / or the storage device(s) may embody a computer readable medium. Any of the data mentioned herein can be output from one component to another component and can be output to the user.

[0122] A computer system can include a plurality of the same components or subsystems, e.g., connected together by external interface or by an internal interface. In some embodiments, computer systems, subsystems, or apparatuses can communicate over a network. In such instances, one computer can be considered a client and another computer a server, where each can be part of the same computer system. A client and a server can each include multiple systems, subsystems, or components.

[0123] It should be understood that any of the embodiments of the present invention can be implemented in the form of control logic using hardware (e.g. an application specific integrated circuit or field programmable gate array) and / or using computer software with a generally programmable processor in a modular or integrated manner. As used herein, a processor includes a multi-core processor on a same integrated chip, or multiple processing units on a single circuit board or networked. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will know and appreciate other ways and / or methods to implement embodiments of the present invention using hardware and a combination of hardware and software.

[0124] Any of the software components or functions described in this application may be implemented as software code to be executed by a processor using any suitable computer language such as, for example, Java, C++ or Perl using, for example, conventional or object-oriented techniques. The software code may be stored as a series of instructions or commands on a computer readable medium for storage and / or transmission. Suitable media include random access memory (RAM), a read only memory (ROM), a magnetic medium such as a hard-drive or a floppy disk, or an optical medium such as a compact disk (CD) or DVD (digital versatile disk), flash memory, and the like. The computer readable medium may be any combination of such storage or transmission devices.

[0125] Such programs may also be encoded and transmitted using carrier signals adapted for transmission via wired, optical, and / or wireless networks conforming to a variety of protocols, including the Internet. As such, a computer readable medium according to an embodiment of the present invention may be created using a data signal encoded with such programs. Computer readable media encoded with the program code may be packaged with a compatible device or provided separately from other devices (e.g., via Internet download). Any such computer readable medium may reside on or within a single computer product (e.g. a hard drive, a CD, or an entire computer system), and may be present on or within different computer products within a system or network. A computer system may include a monitor, printer, or other suitable display for providing any of the results mentioned herein to a user.

EXAMPLES

[0126] The following examples are offered to illustrate, but not to limit, the claimed invention.

Example 1: Methods

A. Sample acquisition

[0127] The primary (n = 301) and validation (n = 134) cohorts of primarily early-stage NSCLC plasma samples were collected from the UCSF Thoracic Oncology Center (San Francisco, CA). All patients gave informed consent for sample collection. Samples were collected in accordance with UCSF ethical guidelines and regulations for the conduct of responsible research. The study was approved by the UCSF Institutional Review Board. Deidentified healthy control plasma was collected from four sources: provided by New York Blood Center (New York, New York); purchased from SeraCare (Milford, Massachusetts); purchased from BioIVT (Westbury, New York); and provided by donors from a UCSF community drive.

B. Phage immunoprecipitation sequencing (PhIP-Seq) assay

[0128] PhIP-Seq was run on the vacuum-based high-throughput assay described in Vazquez et al., 2022. Plasma samples across all patient cohorts were randomly arrayed across 96-well plates. In each well, 1 µL of patient plasma was incubated with 500 µL of the custom T7 phage display library of 731,724 unique species spanning the normal human proteome from NCBI RefSeq (Agilent Technologies, previously described in O’Donovan et al., 2020). After overnight incubation, antibodies were pulled down with Protein A / G beads (DynaMax). After three washes, the remaining bound phage were inoculated into Escherichia coli culture (strain BLT5403) and allowed to lyse. The process was then repeated with the resultant phage culture and fresh patient plasma twice more for a total of three iterative rounds. The final phage DNA was extracted and sequenced on the NovaSeq platform (Illumina).

C. PhIP-Seq data analysis

[0129] The sequencing reads were aligned with the RAPSearch2 search tool (Zhao et al., 2012) and tabulated and log-transformed with Python. Analysis was performed in Python with the following software packages: pandas, numpy, scipy, PyTorch, and scikit-learn. The figures were made with matplotlib, seaborn, Adobe Illustrator, and BioRender software.

D. Machine learning implementation and evaluation

[0130] Various machine learning models were implemented in Python with scikit-learn and PyTorch, and then evaluated through 10-fold cross-validation on the training set (301 NSCLC, 352 healthy subjects), varying model parameters to maximize classification ability using ROC-AUC (scikit-learn) as the output metric. The following models and parameters were evaluated: logistic regression (regularization, solver, iterations), random forests (number of trees, depth, criterion, iterations), support vector machine (regularization, kernel, iterations), and neural network (activation function, loss function, number of hidden layers, layer size, iterations). The final ensemble model consists of logistic regression (L2 regularization, liblinear solver) on the peptide and protein levels and a fully-connected feed-forward neural network (ReLU activation function on 2 hidden layers of sizes 100 and 50).

E. Split luciferase binding assay (SLBA)

[0131] SLBA was run as described in Rackaityte et al., 2023.

Example 2: Classification of NSCLC versus healthy control

[0132] To ascertain the potential of autoantibody profiling for cancer classification, PhIP-Seq immunological profiling was performed on a cohort of plasma samples previously collected from 301 patients with NSCLC at the University of California, San Francisco (UCSF) Thoracic Oncology Laboratory and plasma from 352 healthy volunteers (see Table 2). An important feature of this NSCLC cohort is its early-stage composition, with more than 90 percent of patients having Stage I or II disease. Similarly, over two-thirds of patients in this training cohort were asymptomatic at the time of diagnosis.

Table 2. Summary of patient clinical and pathological characteristics.

Characteristic      Training NSCLC   Training Healthy   Validation NSCLC   Validation Healthy
Patients            301              352                137                96
Age                 68.7 (9.8)§      38.7 (12.6)§       68.6 (10.3)§       29.3 (9.0)§
Females             189 (62.8)       176 (50.3)         83 (60.5)          54 (56.2)
Smoking History
  Yes               191 (63.5)       -                  80 (58.4)          -
  No                110 (36.5)       -                  57 (41.6)          -
Stage
  0                 1 (0.3)          -                  3 (2.0)            -
  I                 231 (76.7)       -                  71 (51.8)          -
    IA1             67 (22.3)        -                  12 (8.7)           -
    IA2             67 (22.3)        -                  22 (16.1)          -
    IA3             22 (7.3)         -                  16 (11.7)          -
    IB              75 (24.9)        -                  21 (15.3)          -
  II                43 (14.3)        -                  39 (28.5)          -
    IIA             12 (4.0)         -                  10 (7.3)           -
    IIB             31 (10.3)        -                  29 (21.2)          -
  III               25 (8.3)         -                  20 (14.6)          -
    IIIA            23 (7.6)         -                  15 (10.9)          -
    IIIB            2 (0.7)          -                  5 (3.6)            -
    IIIC            0 (0.0)          -                  0                  -
  IV                1 (0.3)          -                  3 (2.2)            -

Numbers in parentheses represent the percentage of patients except where stated.
§Cohort Mean (Standard Deviation)

[0133] Following our previously described high-throughput PhIP-Seq protocol (Mann et al. 2022; Vazquez et al. 2022), autoreactive serological profiles of the entire cohort were captured. Briefly, plasma from patients and controls was randomly arrayed in 96-well plates, then incubated with a previously validated custom library of 730,000+ unique phage species spanning the entire annotated “healthy” human proteome, including known isoforms, tiled across 49 amino acid fragments with a 25 amino acid overlap (O’Donovan et al. 2020). Following three rounds of enrichment through immunoprecipitation of patient or healthy control antibodies bound by protein A / G beads, the resultant enriched phage species were sequenced using the Illumina NovaSeq platform (median read depth of 1,276,108 reads per sample) and aligned to the reference with RAPSearch2 (Zhao, Tang, and Ye 2012). To ensure consistency and to check for potential batch effects, a subset of the samples was run in duplicate on separate plates. Each duplicate correlated strongly with its corresponding replicate (average sample-wise Pearson coefficient > 0.80), as well as with a separate PhIP-Seq run on an earlier date, and showed limited correlation with other samples in the set. The complete PhIP-Seq data are available on Dryad.
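By way of non-limiting illustration, tiling a protein into 49 amino acid fragments with a 25 amino acid overlap corresponds to advancing the fragment start by 24 residues at a time. The sketch below is not the library design code; the function name and the handling of the C-terminal remainder (a final fragment flush with the end of the sequence) are assumptions.

```python
def tile_protein(sequence, tile_len=49, overlap=25):
    """Split a protein sequence into overlapping fixed-length fragments."""
    step = tile_len - overlap  # 24 residues for 49-mers overlapping by 25
    if step <= 0:
        raise ValueError("overlap must be smaller than tile_len")
    if len(sequence) <= tile_len:
        return [sequence]
    last_start = len(sequence) - tile_len
    tiles = [sequence[i:i + tile_len] for i in range(0, last_start + 1, step)]
    # Assumption: cover any C-terminal remainder with one extra fragment
    # aligned to the end of the sequence.
    if last_start % step != 0:
        tiles.append(sequence[-tile_len:])
    return tiles


# Hypothetical 100-residue sequence built from a repeating alphabet.
example_sequence = "".join(chr(65 + i % 26) for i in range(100))
example_tiles = tile_protein(example_sequence)
```

Because adjacent fragments share 25 residues, a single antibody epitope is typically represented in more than one fragment, which is the redundancy that protein-level collapsing later accounts for.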

[0134] Enriched peptide counts, scaled for sequencing depth to reads-per-100,000 (RP100K), were used as input into a range of machine learning classifier models, both on the individual peptide level (731,724 unique features), as well as collapsed by associated protein to account for tiling and isoform redundancy (19,998 unique features), similar to previous analyses with the same library (Vazquez et al. 2020). Classifiers, described below, were initially evaluated using 10-fold cross-validation, systematically withholding 10% of the samples (both cases and controls), training on the remainder of the set, and then predicting the status of the withheld samples (FIG. 1).
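By way of non-limiting illustration, the depth scaling, protein-level collapsing, and 10-fold withholding described above may be sketched in plain Python as follows. The analysis described in the present disclosure used pandas, numpy, and scikit-learn; the function names, the summation used to collapse peptides into proteins, and the unshuffled, unstratified fold assignment are assumptions made only for this sketch.

```python
from collections import defaultdict


def to_rp100k(counts):
    """Scale one sample's raw peptide read counts to reads per 100,000."""
    total = sum(counts.values())
    if total == 0:
        return {peptide: 0.0 for peptide in counts}
    return {peptide: 100_000 * n / total for peptide, n in counts.items()}


def collapse_to_protein(rp100k, peptide_to_protein):
    """Aggregate peptide-level values per parent protein.

    Summation is an assumption; the aggregation rule is not stated here.
    """
    protein_level = defaultdict(float)
    for peptide, value in rp100k.items():
        protein_level[peptide_to_protein[peptide]] += value
    return dict(protein_level)


def k_fold_indices(n_samples, k=10):
    """Yield (train, withheld) index lists; each fold withholds about 1/k."""
    for fold in range(k):
        withheld = [i for i in range(n_samples) if i % k == fold]
        train = [i for i in range(n_samples) if i % k != fold]
        yield train, withheld


# Hypothetical sample with three peptides mapping to two proteins.
raw = {"pep1": 30, "pep2": 10, "pep3": 60}
scaled = to_rp100k(raw)
proteins = collapse_to_protein(
    scaled, {"pep1": "GENE_A", "pep2": "GENE_A", "pep3": "GENE_B"}
)
folds = list(k_fold_indices(653))  # 301 NSCLC + 352 healthy samples
```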

[0135] A range of machine learning classifiers, implemented using the popular scikit-learn and PyTorch Python packages, were evaluated, including logistic regression, neural networks, support vector machines, and random forests (FIG. 2). After hyperparameter optimization on individual levels, we found that an ensemble model incorporating predictions from multiple models yielded the best classification performance, with an average area under the receiver operating characteristic curve (AUC-ROC) of 0.94 after 1000 bootstrapped iterations (FIG. 3). Ensemble models have previously been successfully implemented in other biological classification tasks (REF). The three components of the ensemble model were logistic regression on the peptide level, logistic regression on the protein level, and a fully connected feed-forward neural network on the protein level (see Methods and GitHub repository [URL here] for model details and specific packages used). Although not necessarily optimal, an ensemble model provided the biological comprehensibility of the linear components and latent benefits captured by the nonlinear components. Indeed, performance was superior to that of the individual models in isolation, with a sensitivity of 70% at a specificity of 95% (Supplemental Table I). More sophisticated models have the potential to further increase performance, but this combination of tractable components serves as a proof-of-concept.

Example 2: Model parameterization suggests a robust and specific signal

[0136] The best-performing ensemble model was further characterized through parameterization. While the full model utilized all values in the dataset, we evaluated the model training and cross-validation performance by progressively limiting the number of peptide-level input features from 100 to 100,000 features relative to the complete dataset (FIG. 4). As expected for zero-inflated data structures like PhIP-Seq peptide / protein counts, removing low-level and zero counts did not affect the model's performance. Indeed, removing more than 99% of available features (5,000 features remaining) still yielded a comparable AUC (0.92 average). However, model performance suffered progressively more degradation when restricting input features to 1,000 or fewer. At 100 features, the average AUC had decreased to 0.72. This suggests that the underlying information content separating case and healthy control is distributed across a plurality of autoreactive protein targets. This is in contrast to cancer-related autoimmune disorders, such as seminoma-associated paraneoplastic encephalitis, where a single autoreactive target (KLHL11) is sufficiently diagnostic (Mandel-Brehm et al. 2019).
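By way of non-limiting illustration, restricting the input to a fixed number of peptide-level features may be sketched as follows. The ranking criterion is not stated in this paragraph; the sketch assumes that features are ranked by total RP100K signal across samples, so that low-level and zero-count features are dropped first. The function name and data layout are hypothetical.

```python
def keep_top_features(samples, n_keep):
    """Keep the n_keep features with the largest total signal across samples.

    samples: list of dicts mapping feature name -> RP100K value.
    Ranking by summed abundance is an assumption of this sketch.
    """
    totals = {}
    for sample in samples:
        for feature, value in sample.items():
            totals[feature] = totals.get(feature, 0.0) + value
    # Sort by descending total, breaking ties by name for reproducibility.
    ranked = sorted(totals, key=lambda f: (-totals[f], f))
    kept = set(ranked[:n_keep])
    return [{f: v for f, v in sample.items() if f in kept} for sample in samples]


# Hypothetical two-sample dataset with three peptide features.
dataset = [{"pepA": 5, "pepB": 0, "pepC": 1}, {"pepA": 3, "pepB": 0, "pepC": 4}]
reduced = keep_top_features(dataset, 2)
```

The reduced dataset would then be passed through the same training and cross-validation procedure at each feature count (for example, 100, 1,000, 5,000, and 100,000) to trace how the AUC changes.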

[0137] To further characterize the specificity of the classifier, the train-test split sizes of the model were systematically varied, training on 90%, 80%, 50%, 20%, and 10% of the samples (both NSCLC and healthy, cross-validated) and testing on the remainder (FIG. 5). As expected, model performance progressively degraded as the training split size decreased. However, the model maintained reasonable classification ability (AUC > 0.80) with as little as 20% of the samples (FIG. 4). These data suggest the presence of a shared humoral autoreactive signal across multiple individuals among cases and not controls while simultaneously reinforcing the need for sufficiently large cohorts.

Example 3: Top features identified by model are orthogonally validated and indicate potential tissue and cancer specificity

[0138] Although the logistic regression model alone did not have the best classification performance among the models tested here, it has the distinct advantage of being a linear model. As such, it can extract and quantify the contributions of the input features. Four polypeptides associated with top coefficients from the model (PRRC2A, QTRT1, NRAC, and LAX1 polypeptides shown in Table 3) were chosen for separate experimental validation via a previously described split-luciferase binding assay (SLBA) (Rackaityte et al. 2023). Plasma from NSCLC patients yielded significantly higher binding to polypeptides encoded by these four genes than plasma from non-cancer controls (FIG. 6, FIG. 7, FIG. 8, FIG. 9). Similarly, RNA expression profiles from paired cancer-normal samples within The Cancer Genome Atlas (TCGA-LUAD) for these genes (PRRC2A, QTRT1, and LAX1; NRAC was not available) all indicate a statistically significant increase in expression of these genes in the tumor compared to the paired normal lung tissue from the same patients (FIG. 10, FIG. 11, and FIG. 12). When mapped to normal gene expression in the genotype-tissue expression atlas (GTEx), the average Z-score across all available tissue types of a broader set of top features (25, 50, 75, and 100 genes) suggests lung-specific enrichment compared to other relevant tissue types (FIG. 13). The enrichment appears to decrease as more features, which have lower coefficients and presumably contribute less to the signal, are included.

Table 3. Sequences of the polypeptides associated with top coefficients from the model.

Protein   Amino Acid Sequence
PRRC2A    FPASLGRAELHPVELKPFQDYQKLSSNLGGPGSSRTPPTGRSFSGLNS (SEQ ID NO: 5)
LAX1      GFAGLLAILLVVAVFCILWNWNKRKKRQVPYLRVTVMPLLTLPQTRQRAKNIYDILPWRQEDLGRHESRSMR (SEQ ID NO: 6)
QTRT1     RSRARAGEIAW. PHGWATPVFMPVGTQATMKGITTEQLDAI. GCRICI^G NGYHLGLRFAIPF. LIQKANGLHGFM (SEQ ID NO: 7)
NRAC      KRSRPEHHRPEAKPQRTSRRVWFREPPAV1VHYIADKNATATVRVPGR (SEQ ID NO: 8)

Example 4: Independent cohort validation

[0139] To test the robustness of the cancer signature, the model was trained with the entire discovery cohort (301 NSCLC samples and 352 healthy samples) and then subsequently used to evaluate a blinded cohort consisting of 137 NSCLC samples and 96 healthy samples (Table 1), all previously unseen by the model. Samples were processed in the same manner as before, and sequencing data was then transmitted, completely blinded with no pre-processing or evaluation, to an independent validator from an outside institution well-versed in PhIP-Seq data and analysis but otherwise previously uninvolved in the project. Using a web server, the independent validator could submit anonymous files containing peptide counts from the blinded samples for classification by our model. Our validator reported robust classification (ROC-AUC = 0.84, FIG. 14) and predictive value at various prespecified classifier thresholds in our independent validation cohort (see Table 4). Various test manipulations performed by the independent validator produced predictable results from the model that were in line with expectations, underscoring the validity of the independent evaluation. Shuffling samples and count orders had no effect. Introducing a uniform background distribution by adding 50 to all counts slightly blunted the signal, while random peptide values completely ablated the signal (FIG. 15). This blinded independent third-party evaluation on separate samples, never used in training, suggests the model is reproducible for early-stage cancer.

Table 4. Sensitivity and specificity for validation cohort at various classification thresholds.

Classifier Threshold   Sensitivity   Specificity
0.50                   0.79          0.70
0.60                   0.74          0.76
0.70                   0.69          0.80
0.80                   0.60          0.86
0.90                   0.44          0.91
0.95                   0.39          0.95

Example 5: Correlation to relevant clinical features

[0140] Certain clinical features in NSCLC are linked to disease development, progression, and outcome. When sub-setting the training and validation cohorts by these features, the distribution of prediction values from the classifier (0 as healthy, 1 as NSCLC) offers insights into the captured signal. Consistent with the ROC curves, there were robust and significant differences between NSCLC and healthy for both the cross-validated training and independent clinical validation cohorts (FIG. 16). Tobacco use is a well-established risk factor for NSCLC, and smoking history (self-reported current or former) was indeed correlated with higher prediction scores, significantly so in the validation cohort (FIG. 17). Tumor stage represents the size and invasiveness of the disease, with later stages representing a larger global tumor footprint. Interestingly, we did not observe significant differences in classifier performance in patients with early versus late-stage disease in either cohort (FIG. 18). This starkly contrasts with most nucleic acid-based early-detection modalities, which typically see a loss of signal in earlier-stage disease due to the lower amount of circulating tumor DNA (ctDNA) present in early-stage disease. Similarly, there were no significant differences in classifier performance in patients who were symptomatic (defined as any of the following: chest pain, cough, hemoptysis, dyspnea, fatigue, weight loss) versus asymptomatic patients at the time of diagnosis in either the training or validation cohorts (FIG. 19).

Example 6: “Drop-out” analysis of the model for NSCLC classification

[0141] “Drop-out” analysis of the model for NSCLC classification was performed. The data set used in the model was manipulated to remove set numbers (100, 200, or 500 biomarkers shown in Table 5) of top biomarkers, which were selected based on top peptide and gene-level logistic regression coefficients. The model was then re-trained using the manipulated data sets to test the model’s performance. Surprisingly, as illustrated in FIG. 20, the model’s performance dropped considerably and was nearly eliminated when biomarkers were removed. Accordingly, it was concluded that the 500 biomarkers shown in Table 5A, the 200 biomarkers shown in Table 5B, and the 100 biomarkers shown in Table 5C are top sets of biomarkers useful for NSCLC classification.

Table 5. Top biomarker genes, listed in alphabetical order, removed from the model during “drop-out” analysis.

A. Top 500 biomarker genes

ABL1, ABLIM2, ACIN1, ACVR2A, ADCK5, ADH1B, AFAP1L1, AGAP3, AGBL2, AGBL3, AGPAT6, AHCTF1, AHR, ALMS1, AMACR, ANGEL1, ANGPTL2, ANK1, ANKRD12, ANKRD32, ANXA2R, APOBEC3A_B, ARAP1, ARHGAP31, ARID1B, ARID2, ARID5B, ASAH2, ASMTL, ATAD5, ATP10A, ATP2B3, ATP5J2-PTCD1, ATP6V1G3, ATRX, B4GALNT4, BANK1, BBS9, BCLAF1, BDH1, BIN2, BIRC7, BOD1L1, BPIFB4, BPTF, BRF1, BRPF1, BTG4, C10orf88, C11orf80, C14orf180, C16orf78, C19orf47, C1orf106, C1orf159, C2orf42, C4BPA, CABIN1, CABP4, CACNA1E, CACTIN, CAGE1, CARD6, CCDC146, CCDC17, CCDC28A, CCDC42B, CCDC64, CCDC92, CD320, CDC6, CDH26, CDHR3, CDHR5, CDK5RAP1, CDYL2, CEP128, CEP192, CEP95, CERKL, CFAP54, CHID1, COL15A1, COL5A3, COL6A3, COL7A1, CORO1C, CORO7, CREB3L4, CRTC1, CRY1, CWF19L1, CXorf57, CYSRT1, DAB2, DDX24, DDX55, DEFA1, DENND1C, DGCR6, DHX57, DIP2A, DIS3L2, DISC1, DLGAP2, DLGAP3, DMD, DNAH7, DNMBP, DOC2B, DOHH, DRD3, DSC3, DTNB, DTX4, DUOXA1, DYRK1A, ECHDC3, EDC4, EHD2, EIF2D, ELL2, EMCN, ENO1, ENOX2, ENTPD8, EPB41L3, EPHB4, EPS8L1, ERBB2IP, ERCC6L, ESRRG, EXOSC10, FAM154A, FAM171B, FAM193A, FAM207A, FAM214A, FAM21A, FAM73B, FAT2, FCRL2, FGF17, FE, FHDC1, FHOD3, FILIP1L, FLG, FLG2, FLI1, FLNA, FOXJ3, FOXP3, FRMD4A, FSIP2, GAB2, GALNTL6, GATAD2B, GFAP, GIGYF1, GLG1, GMEB2, GNL2, GNPTAB, GPATCH2, GPATCH4, GPC1, GPSM1, GRAMD1B, GRIK1, GRIP2, GTF2A1, GTSE1, H1FNT, HDAC7, HDX, HEATR4, HEG1, HES6, HOGA1, HOTS, HPS1, HSPG2, HTATSF1, HUWE1, IGSF10, IGSF21, INADL, INO80D, INPP4B, IQCJ-SCHIP1, IQSEC1, ISLR2, ITGAM, ITGB1BP1, ITPRIPL1, KALRN, KAT6A, KCNH8, KCNJ11, KCNMA1, KCNQ4, KCP, KDM1B, KDM6A, KIAA0100, KIAA1210, KIAA1614, KIR2DL5B, KLF11, KLHL35, KLK12, KLKB1, KMT2A, KMT2C, KRT5, LAMA3, LAX1, LCA5, LDLR, LEMD1, LENG1, LIAS, LIG3, LILRB4, LMNA, LOC100996413, LOC101927260, LOC101929601, LOC102723822, LOC102724642, LOC102725241, LOC105370295, LOC105376351, LOC400682, LRCH3, LRFN5, LRRC16B, LSR, MACF1, MAMLD1, MAP3K15, MAP3K3, MAP7D3, MARK2, MB21D1, MCF2L, MCM3AP, MDM1, MDN1, MED12L, MED8, MEF2A, MEF2D, MEPE, MFF, MFGE8, MGAT5B, MICAL3, MKRN2, MLPH, MPHOSPH9, MPP6, MRPS11, MS4A14, MS4A18, MTA1, MTFR2, MUC16, MUC17, MUC21, MYH4, MYO19, MYRIP, NAB1, NACAD, NBPF20, NCAM2, NCOA1, NEDD4, NEK1, NFRKB, NHSL1, NHSL2, NINL, NLRP6, NOL8, NOP56, NOSIP, NPBWR2, NR1D1, NR1I3, NR5A1, NRF1, NRXN1, NRXN2, NT5C1B, NUGGC, NUP54, OBSCN, OIP5, OPA3, OSBPL6, OVOL3, P2RX1, PALLD, PALMD, PATL2, PAX3, PCDH7, PDSS2, PDXK, PDZD2, PDZD7, PDZD8, PEAK1, PGAP2, PIK3R6, PIKFYVE, PJA2, PKNOX2, PLA2G4E, PLAGL2, PLCH1, PLEKHA4, PLEKHG3, PLEKHM3, PLEKHS1, PLIN4, PNN, PNPT1, PODNL1, POLRMT, PPAN-P2RY11, PPFIA4, PPL, PPP1R13B, PPP1R9A, PPP4R4, PRC1, PRDM15, PRDM2, PRKCH, PRPF38B, PRRC2A, PRRC2B, PRSS54, PRSS56, PTGDS, PTPN12, PTPRK, PTPRN2, QTRT1, RAB3IL1, RAB42, RECQL5, REXO1, RFPL3, RFWD2, RGS3, RHBDD3, RIMS1, RIMS2, RIN3, RIPK1, RMI1, RNASE2, RNF123, RNF165, RNF34, ROBO3, RPRD2, RSPH1, RSPH3, RSRC1, RSRP1, RTEL1, RTL1, RUFY3, RUSC2, RWDD3, RXFP4, SAMD4A, SAP25, SART1, SCUBE3, SEMA4G, SENP7, SH2B3, SH2D3C, SH2D4A, SH3BP5L, SHROOM2, SLC11A2, SLC12A5, SLC28A3, SLC35G6, SLC43A2, SLC4A2, SLC4A3, SLC4A4, SLC6A19, SMARCAL1, SMC2, SMOC2, SNX8, SOS1, SPACA7, SPAG17, SPATA31A3, SPEN, SPRTN, SRSF4, SS18, SS18L1, STMN4, SUN1, SUV420H1, SYNE1, SYNM, SYTL1, TAMM41, TANC1, TANGO2, TBC1D14, TBC1D8B, TCEAL3, TCP10L2, TDRP, TG, THBS2, TINAG, TLL1, TMEFF1, TMEM143, TMEM265, TNNI3K, TNRC6A, TOM1L1, TONSL, TOP2B, TOX2, TPH1, TPM4, TRAF3IP1, TRIM3, TRIO, TROAP, TSC2, TSGA13, TTC24, TTC37, TTC9C, TTN, UBE2F, UNC80, USP11, USP17L2, USP36, USP39, UTY, VPS13B, VPS54, WHSC1L1, WNK1, XDH, XIAP, XPO7, YLPM1, YTHDF2, ZAR1L, ZBTB1, ZC3H13, ZC3H18, ZC3H3, ZCCHC6, ZDHHC17, ZEB2, ZFHX3, ZFYVE9, ZNF169, ZNF174, ZNF177, ZNF232, ZNF410, ZNF439, ZNF496, ZNF507, ZNF514, ZNF529, ZNF552, ZNF570, ZNF585A, ZNF608, ZNF618, ZNF688, ZNF750, ZNF766, ZNF821, ZSCAN20

B. Top 200 biomarker genes

ABLIM2, ACIN1, AGAP3, AGBL2, AHR, A\L\(T;, ANK1, ANKRD12, ARHGAP31, ARID5B, ASAH2, ATRX, B4GALNT4, BDH1, BIN2, BIRC7, BOD1L1, BPTF, BRPF1, C10orf88, C14orf180, C16orf78, C1orf106, CABIN1, CACNA1E, CARD6, CCDC146, CCDC17, CCDC42B, CD320, CEP192, CORO7, CREB3L4, CRTC1, CWF19L1, CYSRT1, DAB2, DDX24, DENND1C, DIP2A, DMD, DNAH7, DNMBP, DOHH, DUOXA1, EHD2, EMCN, ENO1, ENTPD8, EPB41L3, EPS8L1, FAM193A, FAM207A, FAM214A, FAM21A, FLI1, FOXJ3, FRMD4A, GATAD2B, GIGYF1, GMEB2, GNL2, GPATCH2, GPSM1, GRIP2, GTSE1, HDAC7, HPS1, HSPG2, IGSF21, INADL, INPP4B, IQSEC1, ITGAM, KALRN, KAT6A, KCP, KDM1B, KDM6A, KIR2DL5B, KMT2A, KMT2C, LAMA3, LAX1, LIG3, LOC100996413, LOC101927260, LOC101929601, LOC102723822, LOC102724642, LOC102725241, LOC105370295, LOC105376351, LRFN5, LRRC16B, LSR, MAMLD1, MAP3K15, MCF2L, MCM3AP, MDM1, MED12L, MEF2D, MLPH, MTA1, MTFR2, MUC16, NEK1, NHSL1, NHSL2, NINL, NOL8, NOSIP, NR1D1, NR1I3, NRF1, NRXN2, OBSCN, OPA3, OSBPL6, PCDH7, PDXK, PDZD2, PDZD8, PEAK1, PIK3R6, PLCH1, PLEKHG3, PODNL1, PPAN-P2RY11, PPL, PRC1, PRDM2, PRRC2A, PRRC2B, PTGDS, PTPN12, PTPRK, QTRT1, RAB3IL1, RECQL5, RIMS2, RIPK1, RNF165, RNF34, ROBO3, RPRD2, RUFY3, RUSC2, SCUBE3, SEMA4G, SENP7, SH2B3, SH2D4A, SH3BP5L, SLC28A3, SLC43A2, SLC4A3, SLC6A19, SMOC2, SPEN, SRSF4, SS18L1, STMN4, SYNE1, TANGO2, TBC1D14, TDRP, TG, THBS2, TINAG, TMEFF1, TONSL, TOX2, TTN, UNC80, USP36, WHSC1L1, WNK1, XIAP, YLPM1, YTHDF2, ZAR1L, ZBTB1, ZC3H13, ZC3H3, ZCCHC6, ZDHHC17, ZEB2, ZNF169, ZNF410, ZNF439, ZNF496, ZNF570, ZNF608, ZNF618, ZNF688, ZNF750, ZNF766, ZNF821

C. Top 100 biomarker genes

ABLIM2, AGAP3, AGBL2, AHR, ANK1, ANKRD12, ARHGAP31, ASAH2, ATRX, BIN2, BIRC7, BOD1L1, C10orf88, C14orf180, C16orf78, C1orf106, CACNA1E, CCDC146, CCDC17, CCDC42B, CD320, CORO7, DENND1C, DMD, DNAH7, EMCN, ENTPD8, EPB41L3, FAM207A, FAM21A, FLI1, GATAD2B, GMEB2, GPATCH2, GTSE1, INADL, IQSEC1, KDM6A, LAMA3, LAX1, LOC100996413, LOC101927260, LOC101929601, LOC102724642, LOC102725241, LRRC16B, MAMLD1, MAP3K15, MCF2L, MDM1, MEF2D, NHSL1, NHSL2, NINL, NOSIP, NR1D1, OPA3, PCDH7, PDZD2, PDZD8, PLCH1, PPAN-P2RY11, PRRC2A, PRRC2B, PTGDS, PTPRK, QTRT1, RAB3IL1, RECQL5, RIPK1, RNF34, RUFY3, RUSC2, SEMA4G, SH2B3, SH2D4A, SLC4A3, SLC6A19, SPEN, SRSF4, SS18L1, STMN4, SYNE1, TDRP, TMEFF1, TTN, UNC80, WNK1, YTHDF2, ZAR1L, ZC3H13, ZC3H3, ZCCHC6, ZEB2, ZNF496, ZNF608, ZNF618, ZNF688, ZNF750, ZNF821
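As a non-limiting illustration, the drop-out procedure of Example 6 (rank features by logistic regression coefficient, remove the top-ranked features, re-train, and compare performance) might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the disclosed model: the data are synthetic, the logistic regression is a plain gradient-descent fit, and every function name and parameter value (`train_logreg`, `drop_out_auc`, the cohort sizes, the learning rate) is hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    # Clip the argument so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def score(row, w, b):
    """Predicted probability of the positive class for one sample."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, row)))


def train_logreg(X, y, epochs=150, lr=0.1, l2=0.01):
    """L2-regularized logistic regression fit by batch gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, label in zip(X, y):
            err = score(row, w, b) - label
            grad_b += err
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def roc_auc(y, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, label in zip(scores, y) if label == 1]
    neg = [s for s, label in zip(scores, y) if label == 0]
    wins = sum((a > c) + 0.5 * (a == c) for a in pos for c in neg)
    return wins / (len(pos) * len(neg))


def make_data(n, p, informative, rng):
    """Synthetic matrix: only the first `informative` features differ by class."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(1.0 if (label and j < informative) else 0.0, 1.0)
                  for j in range(p)])
        y.append(label)
    return X, y


def drop_out_auc(X_tr, y_tr, X_te, y_te, k):
    """Drop the k features with the largest |coefficient|, re-train, return test AUC."""
    w, _ = train_logreg(X_tr, y_tr)
    ranked = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
    keep = sorted(set(range(len(w))) - set(ranked[:k]))

    def subset(X):
        return [[row[j] for j in keep] for row in X]

    w2, b2 = train_logreg(subset(X_tr), y_tr)
    return roc_auc(y_te, [score(row, w2, b2) for row in subset(X_te)])


rng = random.Random(0)
X_tr, y_tr = make_data(160, 12, 4, rng)   # hypothetical training cohort
X_te, y_te = make_data(160, 12, 4, rng)   # hypothetical held-out cohort
auc_by_k = {k: drop_out_auc(X_tr, y_tr, X_te, y_te, k) for k in (0, 4)}
print(auc_by_k)
```

In this toy setting, the entry for `k = 0` is the baseline model and the entry for `k = 4` is the model re-trained after removing the four top-ranked features. A large gap between the two values indicates that the removed features carried most of the classification signal.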

[0142] It is understood that the examples and embodiments described in the present disclosure are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and scope of the appended claims. All publications, patents, and patent applications cited in the present disclosure are hereby incorporated by reference in their entirety for all purposes.

Claims

WHAT IS CLAIMED IS:

1. A method to measure presence and / or amount of one or more biomarkers associated with Non-Small Cell Lung Cancer (NSCLC) in a subject, comprising measuring in a biological sample obtained from the subject presence or amount of an expression product from one or more genes encoding the one or more biomarkers associated with NSCLC, wherein the one or more genes are shown in Table 5A, Table 5B, and / or Table 5C, wherein the measured presence or amount of the expression product in the biological sample identifies the subject as being at risk for or having NSCLC.

2. A method of identifying a subject at risk for Non-Small Cell Lung Cancer (NSCLC), comprising measuring in a biological sample obtained from the subject an amount of an expression product from one or more genes encoding the biomarkers associated with NSCLC, wherein the one or more genes are shown in Table 5A, Table 5B, and / or Table 5C, and wherein presence of an altered level of the expression product from the one or more genes, as compared to a healthy control or to a pre-established amount, identifies the subject as being at risk for NSCLC.

3. A method of identifying a subject with Non-Small Cell Lung Cancer (NSCLC) and treating the subject, the method comprising:
(a) measuring in a biological sample obtained from the subject an altered amount of an expression product, as compared to an amount measured in a healthy control or a pre-established amount, from one or more genes encoding the biomarkers associated with NSCLC, wherein the one or more genes are shown in Table 5A, Table 5B, and / or Table 5C; and,
(b) administering to the subject one or more NSCLC treatments.

4. The method of claim 3, wherein the one or more NSCLC treatments comprise one or more of surgery, chemotherapy, targeted therapy, or radiation therapy.

5. A method of identifying a subject with Non-Small Cell Lung Cancer (NSCLC), and monitoring the subject, the method comprising:
(a) measuring in a biological sample obtained from the subject an altered amount, as compared to an amount measured in a healthy control or a pre-established amount, of an expression product from one or more genes encoding one or more biomarkers associated with NSCLC, wherein the one or more genes are shown in Table 5A, Table 5B, and / or Table 5C; and,
(b) repeating step (a) at a later time point to determine if an altered amount measured at the later time point increased or decreased, as compared to the altered amount measured in step (a).

6. The method of any one of claims 1 to 5, wherein the one or more genes comprise at least one of PRRC2A, LAX1, QTRT1, or NRAC.

7. The method of any one of claims 1 to 6, wherein the one or more genes are all genes shown in Table 5A, Table 5B, and / or Table 5C.

8. The method of any one of claims 1 to 7, wherein the expression product is a protein and / or an mRNA.

9. The method of any one of claims 1 to 8, wherein the step of measuring comprises performing one or more laboratory methods.

10. The method of claim 9, wherein the one or more laboratory methods comprise polymerase chain reaction, an immunoassay, or using an array of expression products.

11. The method of any one of claims 1 to 10, wherein the biological sample comprises at least one biological fluid or a tissue.

12. The method of claim 11, wherein the biological fluid comprises blood, lymph, plasma, serum, interstitial fluid, phlegm, sputum, or pleural fluid.

13. The method of claim 11, wherein the tissue sample comprises one or more of a lung tissue or a bronchial tissue.

14. A kit comprising at least one reagent and / or at least one device for detecting an expression product of the one or more genes shown in Table 5A, Table 5B, or Table 5C.

15. The kit of claim 14, wherein the one or more genes comprise at least one of PRRC2A, LAX1, QTRT1, or NRAC.

16. The kit of claim 14 or 15, wherein the one or more genes are all genes shown in Table 5A, Table 5B, and / or Table 5C.

17. The kit of any of claims 14 to 16, wherein the expression product is a protein and / or an mRNA.

18. The kit of any one of claims 14 to 17, wherein the at least one reagent comprises one or more reagents for performing PCR and / or immunoassay.

19. The kit of any one of claims 14 to 18, wherein the at least one device comprises an array of expression products.

20. A system for performing one or more of the steps of the methods of claims 1 to 13 and / or using the kit of any of claims 14 to 18.

21. The system of claim 20, comprising a computer-program product tangibly embodied in a non-transitory machine-readable storage medium, including instructions configured to run the system and / or perform one or more of the steps of the methods of claims 1 to 12.

22. A method of detecting presence and / or amount of one or more antibodies to one or more biomarkers associated with Non-Small Cell Lung Cancer (NSCLC) in a subject, wherein the one or more biomarkers are one or more polypeptides encoded by one or more genes shown in Table 5A, Table 5B, or Table 5C, the method comprising:
(a) contacting a biological sample obtained from the subject with the one or more polypeptides encoded by the one or more genes shown in Table 5A, Table 5B, or Table 5C; and,
(b) detecting binding of the one or more polypeptides with the one or more antibodies in the biological sample.

23. A method of identifying a subject at risk for Non-Small Cell Lung Cancer (NSCLC), comprising:
(a) contacting a biological sample obtained from the subject with one or more polypeptides encoded by one or more genes shown in Table 5A, Table 5B, or Table 5C; and,
(b) detecting binding of the one or more polypeptides with one or more antibodies in the biological sample,
wherein the presence of binding of the one or more polypeptides with the one or more antibodies in the biological sample identifies the subject as being at risk for NSCLC.

24. A method of detecting presence and / or amount of one or more antibodies to one or more biomarkers associated with Non-Small Cell Lung Cancer (NSCLC) in a subject, wherein the one or more biomarkers are one or more polypeptides encoded by one or more genes shown in Table 5A, Table 5B, or Table 5C, and treating the subject, the method comprising:
(a) contacting a biological sample obtained from the subject with the one or more polypeptides encoded by the one or more genes shown in Table 5A, Table 5B, or Table 5C;
(b) detecting presence of binding of the one or more polypeptides to the one or more antibodies in the biological sample; and,
(c) administering to the subject one or more NSCLC treatments.

25. The method of claim 24, wherein the one or more NSCLC treatments comprise one or more of surgery, chemotherapy, targeted therapy, or radiation therapy.

26. The method of any one of claims 22 to 25, wherein the one or more genes comprise at least one of PRRC2A, LAX1, QTRT1, or NRAC.

27. The method of any one of claims 22 to 26, wherein the one or more genes are all genes shown in Table 5A, Table 5B, and / or Table 5C.

28. The method of any one of claims 22 to 27, wherein the one or more polypeptides comprise at least one sequence with at least 90% sequence similarity to SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, or SEQ ID NO:4.

29. The method of any one of claims 22 to 28, wherein the one or more polypeptides are one or more heterologously expressed polypeptides.

30. The method of claim 29, wherein the one or more heterologously expressed polypeptides are expressed on a surface of a cell, on a phage, or on a virus.

31. The method of claim 29 or 30, wherein the one or more heterologously expressed polypeptides are expressed in a phage display or eukaryotic cell display library.

32. The method of any one of claims 22 to 31, wherein the one or more polypeptides are isolated polypeptides.

33. The method of any one of claims 22 to 32, wherein the one or more polypeptides are immobilized on a solid carrier.

34. The method of any one of claims 22 to 33, wherein step (b) comprises performing at least one of immunoprecipitation, microarray analysis, enzyme-linked immunosorbent assay (ELISA), or Western blot analysis.

35. The method of any one of claims 22 to 34, wherein the biological sample comprises at least one biological fluid or a tissue.

36. The method of claim 35, wherein the biological fluid comprises blood, lymph, plasma, serum, interstitial fluid, phlegm, sputum, or pleural fluid.

37. The method of claim 35, wherein the tissue comprises a lung tissue or a bronchial tissue.

38. The method of any one of claims 22 to 37, wherein the one or more antibodies in the biological sample comprise autoantibodies.

39. A kit for detecting one or more antibodies to one or more biomarkers associated with Non-Small Cell Lung Cancer (NSCLC), comprising one or more polypeptides encoded by one or more genes shown in Table 5A, Table 5B, or Table 5C, and one or more other reagents for detecting the one or more antibodies in the sample.

40. A device for detecting one or more antibodies to one or more biomarkers associated with Non-Small Cell Lung Cancer (NSCLC), comprising one or more polypeptides encoded by one or more genes shown in Table 5A, Table 5B, or Table 5C.

41. The device of claim 40, wherein the one or more polypeptides are immobilized on a surface of a solid carrier included in the device.

42. The device of claim 41, wherein the solid carrier is a slide, a chip, a plate, a plurality of fibers, a plurality of beads, a chromatography column, or a membrane.

43. A system for detecting one or more antibodies to one or more biomarkers associated with Non-Small Cell Lung Cancer (NSCLC), comprising a device of any one of claims 40 to 42 and one or more reagents, other than the one or more polypeptides.

44. The kit of claim 39, the device of any one of claims 40 to 42, or the system of claim 43, wherein the one or more genes are all genes shown in Table 5A, Table 5B, and / or Table 5C.

45. The kit of claim 39 or 44, the device of any one of claims 40 to 42, or the system of claim 43, wherein the one or more genes comprise at least one of PRRC2A, LAX1, QTRT1, or NRAC.

46. The kit of claim 39, the device of any one of claims 40 to 42, or the system of claim 43, wherein the one or more polypeptides comprise at least one sequence with at least 90% sequence similarity to SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, or SEQ ID NO:4.

47. A method of detecting antigens of autoantibodies associated with Non-Small Cell Lung Cancer (NSCLC), comprising:
(a) contacting a first biological sample obtained from a healthy human subject with a peptide display library under conditions sufficient to permit binding of a first plurality of antibodies from the first biological sample to a first plurality of corresponding polypeptides within the peptide display library to generate a first plurality of antibody-polypeptide complexes;
(b) contacting a second biological sample obtained from a human subject who has NSCLC with the peptide display library under conditions sufficient to permit binding of a second plurality of antibodies from the second biological sample to a second plurality of corresponding polypeptides within the peptide display library to generate a second plurality of antibody-polypeptide complexes;
(c) identifying the first plurality of the polypeptides and the second plurality of the polypeptides; and,
(d) using a computer system, identifying a third plurality of polypeptides as the polypeptides found in the second plurality of the polypeptides but not in the first plurality of polypeptides,
wherein the third plurality of the polypeptides are the antigens of the autoantibodies associated with the risk of NSCLC.

48. The method of claim 47, wherein the peptide display library is a phage display library.

49. The method of claim 48, wherein a nucleic acid sequence encoding each peptide of the phage display library comprises a unique nucleic acid barcode sequence.

50. The method of claim 49, further comprising:
subjecting the first plurality of complexes and the second plurality of complexes to nucleic acid amplification under conditions sufficient to amplify the unique nucleic acid barcode sequence to generate a plurality of amplified nucleic acid sequences;
determining sequences of the amplified nucleic acid sequences to generate determined sequences; and,
in step (d), using the determined sequences for identifying the first plurality of the polypeptides and the second plurality of the polypeptides.

51. The method of any one of claims 47 to 50, wherein the third plurality of the polypeptides comprises polypeptides with at least 90% sequence similarity to SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, or SEQ ID NO:4.