Diagnostic cancer signature

The integration of Raman spectral analysis with machine learning enhances cancer detection by identifying specific wavenumber ranges and signal intensity differences in biological samples, addressing the limitations of current methods and improving diagnostic accuracy for multiple cancer types and precancerous conditions.

JP2026518969A · Pending · Publication Date: 2026-06-11 · UNIVERSITY COLLEGE OF SWANSEA

Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Applications
Current Assignee / Owner: UNIVERSITY COLLEGE OF SWANSEA
Filing Date: 2024-05-10
Publication Date: 2026-06-11

AI Technical Summary

Technical Problem

Current cancer detection methods, such as flexible bronchoscopy, CT scans, and X-rays, are ineffective in early detection, and existing Raman spectroscopy techniques lack reliability and diagnostic resolution for distinguishing cancer types and precancerous conditions.

Method used

An improved Raman spectral waveform signature analysis combined with machine learning methods to identify discriminative features in biological samples, particularly blood, for diagnosing multiple cancer types and distinguishing between them, using specific wavenumber ranges and signal intensity differences.

Benefits of technology

Provides a robust and highly accurate pan-cancer diagnostic method with unprecedented sensitivity and specificity, capable of identifying various cancer types and precancerous conditions, and monitoring cancer progression.

Abstract

The present invention relates to a method for determining cancer in a subject, comprising performing Raman spectral analysis of a biological sample obtained from the subject, which may also include comparing a portion of the spectrum from the sample with a portion of the spectrum of a control; the use of the aforementioned method for determining the likelihood of cancer, particularly breast cancer, pancreatic cancer, lung cancer and/or colorectal cancer; the use of the aforementioned method for further selecting a course of action for cancer, and methods of action including the same; and a kit of parts for use in the aforementioned method.

Description

[Technical Field]
【0001】 The present invention relates to a method for determining cancer in a subject, comprising performing Raman spectral analysis of a biological sample obtained from the subject, which may also include comparing a portion of the spectrum from the sample with a portion of the spectrum of a control; the use of the aforementioned method for determining the likelihood of cancer, particularly breast cancer, pancreatic cancer, lung cancer and/or colorectal cancer; the use of the aforementioned method for further selecting a course of action for cancer, and methods of action including the same; and a kit of parts for use in the aforementioned method.
[Background Art]
【0002】 Cancer is the leading cause of death worldwide, accounting for 10 million deaths annually. Currently available methods for detecting cancer include flexible bronchoscopy, computed tomography (CT) scans, and X-rays. However, as evidenced by the extremely low rate of diagnosis at an early stage of disease, these techniques are not very effective for early detection.
【0003】 Pan-cancer diagnostic testing has become a rapidly growing field of research in recent years, with the potential to fundamentally transform the current path to diagnosis. Raman spectroscopy offers a non-destructive, highly sensitive, and label-free tool that exploits the biochemical response of a sample to laser exposure. Incident laser light induces the Raman effect, which is the inelastic scattering of light, so a Raman spectrum contains a rich molecular fingerprint of the sample. The change in energy of the scattered light is related to the chemical composition of the sample and its response to the laser energy.
Owing to its inherent ease of use, high reproducibility, and non-invasiveness, Raman spectroscopy has long been applied successfully to a wide range of biofluid and tissue samples. It has been seen as a promising, highly sensitive diagnostic tool, particularly in the form of surface-enhanced Raman spectroscopy (SERS), for distinguishing neoplasms from normal cells in cancers such as colorectal cancer, prostate cancer, breast cancer, cervical cancer, gastric cancer, oral cancer, and esophageal cancer. However, many of these earlier studies only demonstrated the potential of Raman spectroscopy, using complex systems and methods to chemically/plasmonically enhance Raman scattering signatures from small sample sets, and they therefore lacked the reliability and diagnostic resolution needed to be of particular significance or to have an impact in a clinical context.
[Summary of the Invention]
[Problem to Be Solved by the Invention]
【0004】 Therefore, there is a clear unmet need for improved cancer diagnostic tools and vibrational spectroscopy in clinical settings that can precisely identify and distinguish patients/animals with cancer and that, especially when patients/animals present with ambiguous symptoms, can further distinguish the specific nature of the cancer affecting the individual/animal. Furthermore, there is a great need for simple, highly powerful, non-invasive techniques that can achieve early diagnosis and thus improve the survival rate of patients/animals.
[Means for Solving the Problem]
【0005】 Accordingly, the inventors herein disclose an improved Raman spectral waveform signature and analysis that can be used to diagnose multiple cancer types, preferably by standard or surface-enhancement-free Raman techniques, and that, importantly and advantageously, can even distinguish the specific type of cancer with which an individual is afflicted.
In particular, through an exhaustive study of a large dataset, the inventors combined exploration of the complex biological Raman spectra with machine learning methods to provide a robust analysis for determining discriminative features among patient/animal groups, yielding reliable key diagnostic waveform signatures in multiple cancer types. Through subsequent validation, the inventors found that such signatures provide a robust and superior diagnostic assay that raises the diagnostic power of Raman analysis in cancer to the levels of sensitivity and specificity, across multiple cancer types, that are a prerequisite for any clinical diagnostic test.
【0006】 According to a first aspect of the present invention, there is provided a method for determining cancer in a subject, comprising:
i) performing Raman spectral analysis of a biological sample obtained from the subject to generate a test sample spectrum;
ii) comparing the test sample spectrum obtained in step i) with at least one control sample Raman spectrum from a control subject; and
iii) wherein a difference between the test sample spectrum and the control sample spectrum at one or more wavenumbers within one or more ranges selected from the group including about 622 cm⁻¹, about 644 cm⁻¹, about 666 cm⁻¹, about 682 cm⁻¹, about 700 cm⁻¹, about 719 cm⁻¹, about 745 cm⁻¹, about 758 cm⁻¹, about 803 cm⁻¹, about 829 cm⁻¹, about 853 cm⁻¹, about 880 cm⁻¹, about 943 cm⁻¹, about 957 cm⁻¹, about 1004 cm⁻¹, about 1031 cm⁻¹, about 1051 cm⁻¹, about 1062 cm⁻¹, about 1084 cm⁻¹, about 1127 cm⁻¹, about 1158 cm⁻¹, about 1175 cm⁻¹, about 1272 cm⁻¹, about 1341 cm⁻¹, about 1402 cm⁻¹, about 1411 cm⁻¹, about 1449 cm⁻¹, about 1498 cm⁻¹, about 1521 cm⁻¹, about 1556 cm⁻¹, about 1587 cm⁻¹, about 1607 cm⁻¹, and about 1657 cm⁻¹ indicates a subject suffering from cancer.
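The comparison in steps ii) and iii) of the first aspect can be sketched as follows. This is a minimal illustration, not the inventors' implementation: it assumes both spectra are sampled on the same ascending wavenumber axis, and the helper names and the `threshold` value are hypothetical. The specification does not state how large a difference must be to count.

```python
from bisect import bisect_left

# Wavenumbers (cm^-1) listed in step iii) of the first aspect.
KEY_WAVENUMBERS = [
    622, 644, 666, 682, 700, 719, 745, 758, 803, 829, 853, 880, 943, 957,
    1004, 1031, 1051, 1062, 1084, 1127, 1158, 1175, 1272, 1341, 1402, 1411,
    1449, 1498, 1521, 1556, 1587, 1607, 1657,
]


def intensity_at(axis, intensities, target):
    """Return the intensity at the sampled wavenumber closest to `target`.

    `axis` must be sorted in ascending order.
    """
    i = bisect_left(axis, target)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(axis)]
    nearest = min(candidates, key=lambda j: abs(axis[j] - target))
    return intensities[nearest]


def spectral_differences(axis, test, control, targets=KEY_WAVENUMBERS):
    """Map each target wavenumber to (test intensity - control intensity)."""
    return {
        t: intensity_at(axis, test, t) - intensity_at(axis, control, t)
        for t in targets
    }


def flagged(differences, threshold):
    """Wavenumbers whose absolute difference exceeds `threshold`.

    `threshold` is an illustrative assumption, not a value from the source.
    """
    return sorted(t for t, d in differences.items() if abs(d) > threshold)


# Synthetic data only: a flat control and a test spectrum altered at two points.
axis = list(range(600, 1701))
control = [1.0] * len(axis)
test = list(control)
test[axis.index(1004)] = 1.5
test[axis.index(666)] = 0.6
print(flagged(spectral_differences(axis, test, control), threshold=0.2))
```

With this synthetic input, only the two altered wavenumbers are reported. Real spectra would first need the preprocessing the specification describes before such a comparison is meaningful.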
【0007】 In a preferred embodiment, the subject is a mammal, more preferably a human, horse, dog, cat, pig, or any other domestic or agricultural species. In a particularly preferred embodiment, the subject is a human.
【0008】 Notably, when analyzing biological samples, particularly blood, these specific wavenumbers can accurately predict and diagnose multiple cancer types, thus providing a robust and highly accurate pan-cancer diagnostic method with unprecedented levels of sensitivity and specificity.
【0009】 In preferred embodiments, the biological sample may be a tissue or biopsy sample, or a processed derivative thereof. In preferred methods, the biological sample is a blood sample. A reference to a blood sample includes liquid or dried whole blood, or fractions thereof including serum and/or plasma, or processed derivatives thereof. Preferably, the blood sample is a liquid sample. A reference to a processed derivative typically means a blood sample that has been processed in order to prepare it for the method of the present invention or to preserve it before the method is carried out, and includes the use of conventional techniques well known to those skilled in the art for collecting, preparing, or preserving biological samples. Such techniques include, but are not limited to, freezing, thawing, and/or dehydrating the sample.
【0010】 References to control samples herein refer to samples obtained from at least one individual/control subject who has been shown to be cancer-free using any one or more conventional techniques for identifying cancer, including but not limited to bronchoscopy, sigmoidoscopy, colonoscopy, CT scan, X-ray, ultrasound, MRI, and biopsy.
As disclosed herein, rigorous analysis and machine learning have yielded improved Raman spectral waveform signatures that can be used to diagnose and predict various cancer types and, importantly and advantageously, not only to predict the presence of cancer but also to diagnose the specific cancer that a subject may have.
【0011】 Preferably, multiple control spectra are obtained for each control subject, ideally from multiple control subjects, and each spectrum preferably undergoes one or more, preferably two or more, of wavenumber correction, baseline correction, and vector normalization.
【0012】 As used herein, the term "cancer" refers to cells having an abnormal condition or state characterized by autonomous proliferation, i.e., uncontrolled cell growth. This term is intended to include all types of cancerous growth or carcinogenic processes, metastatic tissue, or malignantly transformed cells, tissues, or organs, regardless of histopathological type or invasive stage.
【0013】 Most preferably, cancers referred to herein include any one or more of the following: nasopharyngeal cancer, synovial cancer, hepatocellular carcinoma, renal cancer, connective tissue cancer, melanoma, lung cancer, intestinal cancer, colorectal cancer, brain cancer, throat cancer, oral cancer, liver cancer, bone cancer, pancreatic cancer, choriocarcinoma, gastrin-producing tumors, pheochromocytoma, prolactin-producing tumors, T-cell leukemia/lymphoma, tonsil, spleen, neuroma, von Hippel-Lindau disease, Zollinger-Ellison syndrome, adrenal cancer, anal cancer, bile duct cancer, bladder cancer, ureteral cancer, glioma, oligodendroglioma, neuroblastoma, meningioma, spinal cord tumor, osteochondroma, chondrosarcoma, Ewing's sarcoma, carcinoid, gastrointestinal carcinoid, fibrosarcoma, breast cancer, muscle cancer, Paget's disease, cervical cancer, rectal cancer, esophageal cancer, gallbladder cancer, cholangiocarcinoma, head cancer, eye cancer, nasopharyngeal cancer, cervical cancer, kidney cancer, Wilms' tumor, liver cancer, Kaposi's sarcoma, prostate cancer, testicular cancer, Hodgkin's disease, non-Hodgkin lymphoma, skin cancer, mesothelioma, myeloma, multiple myeloma, ovarian cancer, endocrine and pancreatic cancer, glucagon-producing tumors, parathyroid cancer, penile cancer, pituitary cancer, soft tissue sarcoma, retinoblastoma, small intestine cancer, gastric cancer, thymic cancer, thyroid cancer, trophoblastic carcinoma, hydatidiform mole, uterine cancer, endometrial cancer, vaginal cancer, vulvar cancer, acoustic neuroma, mycosis fungoides, insulinoma, carcinoid syndrome, somatostatin-producing tumors, gingival cancer, heart cancer, lip cancer, meningeal cancer, oral cancer, nerve cancer, palatine cancer, parotid gland cancer, peritoneal cancer, pharyngeal cancer, pleural cancer, salivary gland cancer, tongue cancer, and tonsil cancer.
【0014】 More preferably, the cancer is selected from the group including colorectal cancer, lung cancer, pancreatic cancer, and breast cancer.
【0015】 Furthermore, the methods described herein may also be able to identify precancerous conditions and distinguish them from cancer. As used herein, reference to a precancerous condition means a sample provided from an individual confirmed to be cancer-free using any one or more conventional techniques for identifying cancer, such as those listed above, and/or from a subject who does not yet have confirmed cancer but is observed to have a premalignant or precancerous condition or disorder that, if left untreated, has an increased potential to progress to cancer, as determined using any one or more of the conventional techniques described above for identifying cancer. For example, this includes, but is not limited to, ductal carcinoma in situ, lobular carcinoma in situ, sclerosing adenosis, and tubular papilloma as pre-malignant neoplasms of breast cancer; colorectal adenoma, sessile serrated lesion, Lynch syndrome, and familial adenomatous polyposis as pre-malignant neoplasms of colorectal cancer; or mucinous cystic neoplasms (MCNs), intraductal papillary mucinous neoplasms (IPMNs), or neuroendocrine neoplasms as pre-malignant neoplasms of pancreatic cancer.
【0016】 As can be understood, this method thus involves comparing the spectrum of a test sample with the spectrum of a control sample obtained from a healthy individual, from an individual with a precancerous condition as defined herein, or from both. In this way, it is possible to observe the difference in the test spectrum (where disease or a precancerous condition is confirmed or suspected) compared with the healthy/precancerous spectrum, i.e., to determine the difference indicating cancer.
Furthermore, an additional level of analysis can optionally be added by comparing the spectrum of an individual with a precancerous condition with that of a healthy control, thereby specifically determining the spectral differences associated with the precancerous condition.
【0017】 In a more preferred embodiment, the control and/or test samples are matched for age, sex, and weight.
【0018】 In the preferred method of the present invention, Raman analysis is performed on a liquid blood or blood product sample obtained from the subject. This minimizes additional drying processes. The blood or blood product sample is preferably fresh.
【0019】 Blood or blood product samples may also be analyzed after being dried. Therefore, the method may include a step of drying the sample. The drying step may include drying the sample at room temperature or via assisted drying (e.g., vacuum drying). Beneficially, the sample may be dried on a sample holder.
【0020】 In the present invention, Raman analysis involves irradiating a sample with a light source, preferably a laser light source, more preferably a laser light source configured to emit light in the infrared wavelength band, most preferably at 785 nm. In a particularly preferred embodiment, the laser light source generates laser light having an output power of about 10 mW to about 1 W, more preferably about 50 mW to about 500 mW.
【0021】 The output spectrum is preferably recorded and analyzed over the spectral range of about 600 to about 1700 cm⁻¹. This range was determined to encompass the maximum spectral output that allows for reproducible identification.
【0022】 In this specification, the term "about" (or "approximately") means plus or minus 5%, most preferably plus or minus 2%. For example, owing to sample variability inherent to the technique, those skilled in the art will understand that there may be variations around the listed wavenumbers of, e.g., ±5 cm⁻¹, ±4 cm⁻¹, ±3 cm⁻¹, ±2 cm⁻¹, or ±1 cm⁻¹.
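The recorded range in paragraph 0021 and the tolerance around each listed wavenumber in paragraph 0022 can be illustrated with a short sketch. It assumes an absolute tolerance in cm⁻¹ (the ±5 cm⁻¹ example from the text) rather than the percentage reading of "about". The function names are hypothetical and the numbers in the example are synthetic.

```python
def crop(axis, intensities, low=600.0, high=1700.0):
    """Keep only the points whose wavenumber lies in [low, high] cm^-1."""
    kept = [(w, y) for w, y in zip(axis, intensities) if low <= w <= high]
    return [w for w, _ in kept], [y for _, y in kept]


def window(center, tolerance=5.0):
    """Return the (low, high) bounds of the tolerance window around `center`."""
    return (center - tolerance, center + tolerance)


def peak_in_window(axis, intensities, center, tolerance=5.0):
    """Return (wavenumber, intensity) of the strongest point in the window.

    Returns None when no sampled point falls inside the window.
    """
    low, high = window(center, tolerance)
    inside = [(y, w) for w, y in zip(axis, intensities) if low <= w <= high]
    if not inside:
        return None
    y, w = max(inside)
    return w, y


# Synthetic example: the band listed at about 1004 cm^-1 appears at 1006.
axis = [598, 600, 1002, 1004, 1006, 1700, 1702]
intensities = [0.0, 1.0, 2.0, 3.0, 5.0, 1.0, 0.0]
axis, intensities = crop(axis, intensities)
print(peak_in_window(axis, intensities, center=1004))
```

Searching a window rather than a single point lets a band that is shifted by a few wavenumbers still be matched to its listed position.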
【0023】 As will be understood by those skilled in the art, in a preferred method of the present invention, the or each spectrum preferably undergoes one or more conventional preprocessing steps, before or after the comparison step, to reduce noise and provide a processed spectrum. The preprocessing step or steps may include one or more of the following: data binning, smoothing, background removal (e.g., extended multiplicative scattering correction), and/or normalization such as vector normalization, and/or baseline correction, or other methods known to those skilled in the art. Preferably, multiple output spectra are obtained, and each spectrum preferably undergoes one or more, preferably two or more, of the following: wavenumber correction, baseline correction, and vector normalization.
【0024】 In a preferred embodiment, the or each processed spectrum may be further processed to provide one or more processed or differential spectra, such as, but not limited to, dimensionality-reduced spectra or feature-reduced spectra. The or each differential spectrum is then compared to a similarly derived control spectrum. Likewise, such derived spectra may be prepared by comparing the sample to a control to facilitate the identification of additional features of interest. For example, a random forest Gini importance derived spectrum or trace may be prepared by comparing the sample and control spectra to aid the visual identification of spectral regions of significant difference or importance, and may be used to evaluate changes in the spectrum or region of interest as defined herein.
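Three of the preprocessing steps named in paragraph 0023 (smoothing, baseline correction, and vector normalization) can be sketched in a few lines. The specification does not say which algorithms were used, so the choices here are assumptions made for brevity: a moving average for smoothing and a straight line through the end points as the baseline. Practical baseline correction of Raman spectra typically uses more elaborate fits.

```python
import math


def smooth(y, half_width=1):
    """Moving average over 2 * half_width + 1 points (shorter at the edges)."""
    out = []
    for i in range(len(y)):
        lo, hi = max(0, i - half_width), min(len(y), i + half_width + 1)
        out.append(sum(y[lo:hi]) / (hi - lo))
    return out


def subtract_linear_baseline(x, y):
    """Subtract the straight line joining the first and last points.

    A deliberately crude stand-in for baseline correction.
    """
    slope = (y[-1] - y[0]) / (x[-1] - x[0])
    return [yi - (y[0] + slope * (xi - x[0])) for xi, yi in zip(x, y)]


def vector_normalize(y):
    """Scale the spectrum so that its Euclidean norm equals 1."""
    norm = math.sqrt(sum(v * v for v in y))
    if norm == 0:
        raise ValueError("cannot normalize an all-zero spectrum")
    return [v / norm for v in y]


def preprocess(x, y, half_width=1):
    """Smooth, remove the linear baseline, then vector-normalize."""
    return vector_normalize(subtract_linear_baseline(x, smooth(y, half_width)))


# Synthetic example: one band sitting on a constant offset.
x = [600, 601, 602, 603, 604]
y = [1.0, 2.0, 6.0, 2.0, 1.0]
processed = preprocess(x, y)
print(processed)
```

Vector normalization puts every spectrum on the same overall scale, so that intensity differences at a given wavenumber reflect changes in spectral shape rather than in total signal.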
【0025】 As can be understood, the method of the present invention, in particular the identification of spectral wavenumber regions (key features) that are diagnostically appropriate for determining cancerous conditions, was derived by post-processing and comparing a large spectral dataset obtained from healthy subjects and from subjects known to have cancer or precancerous conditions. Such post-processing steps involve the application of machine learning methods aimed at generating mathematical models that enable the classification of a given condition. This is achieved using supervised learning, in which examples of each class (e.g., affected or unaffected) are fed into a machine learning algorithm and patterns within each class are identified to enable the classification of unseen data, which is known as generalization.
【0026】 To tackle large datasets with over 1000 variables and significantly reduce their size, feature selection (FS) techniques, sometimes called feature reduction techniques, were employed. This is particularly useful with Raman spectra, where the data are highly correlated, so that some wavenumbers provide no additional information and can be removed. Many feature reduction methods known to those skilled in the art exist, such as principal component analysis, factor analysis, ElasticNet, and random forest feature selection. The preferred method used in this study was random forest feature selection (RFFS).
【0027】 Random forests (RF) tend to correct the tendency of individual decision trees to overfit the training set. In RF, the rule is to maximize the reduction in Gini impurity that occurs at each split. Gini impurity gives the probability that a new data point is misclassified, given the current split performance. The resulting Gini importance indicates how frequently a variable is selected at each split and the magnitude of the variable's discriminative power for the given classification problem.
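The Gini impurity criterion described above can be made concrete. For a node whose samples fall into classes with proportions p_k, the Gini impurity is 1 − Σ p_k², and a split is scored by how much it lowers the size-weighted impurity of the two child nodes. The sketch below evaluates a single split on one feature (for example, the intensity at one wavenumber). It is not a random forest, and the function names and data are illustrative assumptions. A full RFFS would accumulate such decreases over many trees using an existing random forest implementation.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(values, labels, threshold):
    """Impurity reduction from splitting on `value <= threshold`."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted


def best_split(values, labels):
    """Return (threshold, decrease) for the best midpoint threshold.

    Returns (None, 0.0) when all values are identical.
    """
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    if not candidates:
        return None, 0.0
    scored = [(t, gini_decrease(values, labels, t)) for t in candidates]
    return max(scored, key=lambda pair: pair[1])


# Synthetic intensities at one wavenumber for four labelled spectra.
values = [0.9, 0.8, 0.2, 0.1]
labels = ["cancer", "cancer", "control", "control"]
print(gini(labels), best_split(values, labels))
```

In this synthetic case a single threshold separates the two classes completely, so the split removes all of the parent node's impurity. A feature that splits the classes poorly would yield a small decrease and thus a low importance.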
The inventors then selected the most important features, a procedure referred to as random forest feature selection (RFFS).
【0028】 For each cancer type, either a single-cell cross-validation or a leave-one-out cross-validation (LOOCV) procedure was used to evaluate and optimize the RFFS algorithm using relatively small datasets.
【0029】 In a preferred method, step iii) includes or consists of observing a difference in which an increase or decrease in signal intensity at the same wavenumber, or a shift in the position of the maximum or minimum signal value between wavenumbers, between the test sample spectrum and the control sample spectrum indicates a subject affected by cancer.
【0030】 Most preferably, an increase or decrease in signal intensity between wavenumbers is observed between the spectrum generated from the sample and the control.
【0031】 The method described above is preferably a pan-cancer diagnostic method, and the method comprises comparing the signal intensity at one or any combination of wavenumbers selected from the group including or consisting of about 666 cm⁻¹, about 682 cm⁻¹, about 829 cm⁻¹, about 853 cm⁻¹, about 700 cm⁻¹, about 719 cm⁻¹, about 745 cm⁻¹, about 803 cm⁻¹, about 1004 cm⁻¹, about 1127 cm⁻¹, about 1175 cm⁻¹, about 1341 cm⁻¹, about 1402 cm⁻¹, about 1411 cm⁻¹, about 1498 cm⁻¹, and about 1657 cm⁻¹, wherein a difference compared to a control indicates a subject with cancer; more preferably a combination of at least two wavenumbers is used, and still more preferably a combination of 3, 4, 5, 6, 7, 8, 9, 10 and/or 11 wavenumbers. Preferably, observing an increased signal intensity at one or more, ideally all, of the wavenumbers selected from about 1004 cm⁻¹, about 1127 cm⁻¹, about 1341 cm⁻¹, about 1402 cm⁻¹, and about 1411 cm⁻¹, and/or observing a reduced signal intensity at one or more, ideally all, of the wavenumbers selected from about 666 cm⁻¹, about 682 cm⁻¹, about 700 cm⁻¹, about 719 cm⁻¹, about 745 cm⁻¹, and about 803 cm⁻¹, indicates a subject affected by cancer. In a particularly preferred embodiment, observing an increased signal intensity at one or, ideally, both of the wavenumbers about 1004 cm⁻¹ and about 1402 cm⁻¹, and/or observing a reduced signal intensity at one or more, ideally all, of the wavenumbers selected from about 666 cm⁻¹, about 682 cm⁻¹, and about 745 cm⁻¹, indicates a subject with cancer.
【0032】 In an alternative embodiment, the method comprises comparing the signal intensity at one or any combination of wavenumbers selected from the group including or consisting of about 644 cm⁻¹, about 666 cm⁻¹, about 682 cm⁻¹, about 745 cm⁻¹, about 853 cm⁻¹, about 1004 cm⁻¹, about 1031 cm⁻¹, about 1062 cm⁻¹, about 1084 cm⁻¹, about 1127 cm⁻¹, about 1158 cm⁻¹, about 1175 cm⁻¹, about 1341 cm⁻¹, about 1402 cm⁻¹, about 1411 cm⁻¹, and about 1498 cm⁻¹, wherein a difference compared to a control indicates a subject suffering from pancreatic cancer; more preferably a combination of at least two wavenumbers is used, and still more preferably a combination of 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 and/or 16 wavenumbers. Preferably, observing an increased signal intensity at one or more, ideally all, of the wavenumbers selected from about 644 cm⁻¹, about 853 cm⁻¹, about 1004 cm⁻¹, about 1031 cm⁻¹, about 1062 cm⁻¹, about 1084 cm⁻¹, about 1127 cm⁻¹, about 1175 cm⁻¹, about 1341 cm⁻¹, about 1402 cm⁻¹, about 1411 cm⁻¹, and about 1498 cm⁻¹, and/or observing a decreased signal intensity at one or more, ideally all, of the wavenumbers selected from about 666 cm⁻¹, about 682 cm⁻¹, about 745 cm⁻¹, and about 1158 cm⁻¹, indicates a subject suffering from pancreatic cancer.
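The direction-of-change signatures in paragraphs 0031 and 0032 lend themselves to a simple lookup. The sketch below encodes the preferred pan-cancer directions from paragraph 0031 and sorts observed test-minus-control differences into three groups. The function name, the `min_change` parameter, and the example differences are hypothetical. How many concordant wavenumbers would support a determination is not specified in the text and is not decided here.

```python
# Preferred pan-cancer directions from paragraph 0031 (wavenumbers in cm^-1).
PAN_CANCER_INCREASED = (1004, 1127, 1341, 1402, 1411)
PAN_CANCER_DECREASED = (666, 682, 700, 719, 745, 803)


def compare_directions(differences,
                       increased=PAN_CANCER_INCREASED,
                       decreased=PAN_CANCER_DECREASED,
                       min_change=0.0):
    """Sort signature wavenumbers by whether the observed change agrees.

    `differences` maps wavenumber -> (test intensity - control intensity).
    Changes with magnitude <= `min_change` (an assumed noise floor) and
    wavenumbers missing from `differences` are reported as not observed.
    """
    expected_sign = {w: 1 for w in increased}
    expected_sign.update({w: -1 for w in decreased})
    report = {"as_expected": [], "opposite": [], "not_observed": []}
    for w in sorted(expected_sign):
        d = differences.get(w)
        if d is None or abs(d) <= min_change:
            report["not_observed"].append(w)
        elif (d > 0) == (expected_sign[w] > 0):
            report["as_expected"].append(w)
        else:
            report["opposite"].append(w)
    return report


# Synthetic differences for illustration only.
observed = {1004: 0.3, 1402: 0.1, 666: -0.2, 682: 0.05}
print(compare_directions(observed))
```

Passing different `increased` and `decreased` tuples allows the same check to be run against the pancreatic signature of paragraph 0032 or any of the other cancer-specific signatures.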
【0033】 In a further alternative embodiment, the method comprises comparing the signal intensity at one or any combination of wavenumbers selected from the group including or consisting of about 622 cm⁻¹, about 644 cm⁻¹, about 666 cm⁻¹, about 682 cm⁻¹, about 700 cm⁻¹, about 719 cm⁻¹, about 745 cm⁻¹, about 803 cm⁻¹, about 829 cm⁻¹, about 853 cm⁻¹, about 880 cm⁻¹, about 1051 cm⁻¹, about 1175 cm⁻¹, about 1272 cm⁻¹, about 1402 cm⁻¹, about 1411 cm⁻¹, about 1498 cm⁻¹, and about 1657 cm⁻¹, wherein a difference compared to a control indicates a subject with lung cancer; more preferably a combination of at least two wavenumbers is used, and still more preferably a combination of 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17 and/or 18 wavenumbers. Preferably, observing an increased signal intensity at one or more, ideally all, of the wavenumbers selected from about 829 cm⁻¹, about 853 cm⁻¹, about 880 cm⁻¹, about 1051 cm⁻¹, about 1402 cm⁻¹, and about 1411 cm⁻¹, and/or observing a reduced signal intensity at one or more, ideally all, of the wavenumbers selected from about 622 cm⁻¹, about 644 cm⁻¹, about 666 cm⁻¹, about 682 cm⁻¹, about 700 cm⁻¹, about 719 cm⁻¹, about 745 cm⁻¹, about 803 cm⁻¹, about 1175 cm⁻¹, about 1272 cm⁻¹, about 1498 cm⁻¹, and about 1657 cm⁻¹, indicates a subject with lung cancer.
【0034】 In a further alternative embodiment, the method comprises comparing the signal intensity at one or any combination of wavenumbers selected from the group including or consisting of about 622 cm⁻¹, about 644 cm⁻¹, about 666 cm⁻¹, about 682 cm⁻¹, about 700 cm⁻¹, about 719 cm⁻¹, about 745 cm⁻¹, about 758 cm⁻¹, about 803 cm⁻¹, about 829 cm⁻¹, about 853 cm⁻¹, about 1004 cm⁻¹, about 1127 cm⁻¹, about 1175 cm⁻¹, about 1449 cm⁻¹, about 1498 cm⁻¹, and about 1657 cm⁻¹, wherein a difference compared to a control indicates a subject with breast cancer; more preferably a combination of at least two wavenumbers is used, and still more preferably a combination of 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 and/or 17 wavenumbers. Preferably, observing an increased signal intensity at one or more, ideally all, of the wavenumbers selected from about 622 cm⁻¹, about 758 cm⁻¹, about 1004 cm⁻¹, about 1127 cm⁻¹, and about 1175 cm⁻¹, and/or observing a reduced signal intensity at one or more, ideally all, of the wavenumbers selected from about 644 cm⁻¹, about 666 cm⁻¹, about 682 cm⁻¹, about 700 cm⁻¹, about 719 cm⁻¹, about 745 cm⁻¹, about 803 cm⁻¹, about 829 cm⁻¹, about 853 cm⁻¹, about 1449 cm⁻¹, and about 1498 cm⁻¹, indicates a subject with breast cancer.
【0035】 In a further alternative embodiment, the method comprises comparing the signal intensity at one or any combination of wavenumbers selected from the group including or consisting of about 622 cm⁻¹, about 829 cm⁻¹, about 943 cm⁻¹, about 957 cm⁻¹, about 1004 cm⁻¹, about 1341 cm⁻¹, about 1402 cm⁻¹, about 1498 cm⁻¹, about 1521 cm⁻¹, about 1556 cm⁻¹, about 1587 cm⁻¹, about 1607 cm⁻¹, and about 1657 cm⁻¹, wherein a difference compared to a control indicates a subject with colorectal cancer; more preferably a combination of at least two wavenumbers is used, and still more preferably a combination of 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 and/or 13 wavenumbers. Preferably, observing an increased signal intensity at one or more, ideally all, of the wavenumbers selected from about 829 cm⁻¹, about 1004 cm⁻¹, about 1341 cm⁻¹, about 1402 cm⁻¹, about 1498 cm⁻¹, about 1556 cm⁻¹, about 1587 cm⁻¹, about 1607 cm⁻¹, and about 1657 cm⁻¹, and/or observing a reduced signal intensity at one or more, ideally all, of the wavenumbers selected from about 622 cm⁻¹, about 943 cm⁻¹, about 957 cm⁻¹, and about 1521 cm⁻¹, indicates a subject with colorectal cancer.
【0036】 According to a second aspect of the present invention, there is provided a method for determining cancer in a subject, comprising:
i) performing Raman spectral analysis of a biological sample obtained from the subject to generate a test sample spectrum;
ii) comparing the test sample spectrum obtained in step i) with a control sample Raman spectrum from a control subject; and
iii) wherein a difference between the test sample spectrum and the control sample spectrum at one or more wavenumbers within one or more ranges, ideally all ranges, selected from the group including about 644 cm⁻¹, about 666 cm⁻¹, about 682 cm⁻¹, about 745 cm⁻¹, about 853 cm⁻¹, about 1004 cm⁻¹, about 1031 cm⁻¹, about 1062 cm⁻¹, about 1084 cm⁻¹, about 1127 cm⁻¹, about 1158 cm⁻¹, about 1175 cm⁻¹, about 1341 cm⁻¹, about 1402 cm⁻¹, about 1411 cm⁻¹, and about 1498 cm⁻¹ indicates a subject with pancreatic cancer.
【0037】 In a preferred method of the second aspect of the present invention, observing an increased signal intensity at one or more, ideally all, of the wavenumbers selected from about 644 cm⁻¹, about 853 cm⁻¹, about 1004 cm⁻¹, about 1031 cm⁻¹, about 1062 cm⁻¹, about 1084 cm⁻¹, about 1127 cm⁻¹, about 1175 cm⁻¹, about 1341 cm⁻¹, about 1402 cm⁻¹, about 1411 cm⁻¹, and about 1498 cm⁻¹, and/or observing a reduced signal intensity at one or more, ideally all, of the wavenumbers selected from about 666 cm⁻¹, about 682 cm⁻¹, about 745 cm⁻¹, and about 1158 cm⁻¹, indicates a subject with pancreatic cancer.
【0038】 According to a third aspect of the present invention, there is provided a method for determining lung cancer in a subject, comprising:
i) performing Raman spectral analysis of a biological sample obtained from the subject to generate a test sample spectrum;
ii) comparing the test sample spectrum obtained in step i) with a control sample Raman spectrum from a control subject; and
iii) wherein a difference between the test sample spectrum and the control sample spectrum at one or more wavenumbers within one or more ranges, ideally all ranges, selected from the group including about 622 cm⁻¹, about 644 cm⁻¹, about 666 cm⁻¹, about 682 cm⁻¹, about 700 cm⁻¹, about 719 cm⁻¹, about 745 cm⁻¹, about 803 cm⁻¹, about 829 cm⁻¹, about 853 cm⁻¹, about 880 cm⁻¹, about 1051 cm⁻¹, about 1175 cm⁻¹, about 1272 cm⁻¹, about 1402 cm⁻¹, about 1411 cm⁻¹, about 1498 cm⁻¹, and about 1657 cm⁻¹ indicates a subject with lung cancer.
【0039】 In a preferred method of the third aspect of the present invention, observing an increased signal intensity at one or more, ideally all, of the wavenumbers selected from about 829 cm⁻¹, about 853 cm⁻¹, about 880 cm⁻¹, about 1051 cm⁻¹, about 1402 cm⁻¹, and about 1411 cm⁻¹, and/or observing a reduced signal intensity at one or more, ideally all, of the wavenumbers selected from about 622 cm⁻¹, about 644 cm⁻¹, about 666 cm⁻¹, about 682 cm⁻¹, about 700 cm⁻¹, about 719 cm⁻¹, about 745 cm⁻¹, about 803 cm⁻¹, about 1175 cm⁻¹, about 1272 cm⁻¹, about 1498 cm⁻¹, and about 1657 cm⁻¹, indicates a subject with lung cancer.
【0040】 According to a fourth aspect of the present invention, a method for determining breast cancer in a subject, i) Perform Raman spectral analysis of the biological sample obtained from the subject to generate a test sample spectrum. ii) Compare the test sample spectrum obtained in step i) with the control sample Raman spectrum from the control object. iii) Approximately 622cm -1 , approx. 644cm -1 , approx. 666cm -1 Approximately 682 cm -1 , about 700cm -1 , approx. 719cm -1 , about 745cm -1 , approx. 758cm -1 , about 803cm -1 , approx. 829cm -1 , approx. 853cm -1 , about 1004cm -1 , approx. 1127cm -1 , approx. 1175cm -1 Approximately 1449cm -1 , approx. 1498cm -1 , approx. 1657cm -1 One or more ranges selected from the group including, ideally, the difference between the test sample spectrum and the control sample spectrum in one or more wavenumbers within all ranges indicates a subject with breast cancer. A method is provided that includes this. 【0041】 In a preferred method according to a fourth aspect of the present invention, about 622 cm -1 , approx. 758cm -1 , about 1004cm -1 , approx. 1127cm -1 , approx. 1175cm -1Observe the increased signal intensity at one or more wavenumbers, ideally selected from all of them, and / or approximately 644 cm⁻¹. -1 , approx. 666cm -1 Approximately 682 cm -1 , about 700cm -1 , approx. 719cm -1 , about 745cm -1 , about 803cm -1 , approx. 829cm -1 , approx. 853cm -1 Approximately 1449cm -1 , approx. 1498cm -1 Observing a reduced signal intensity at one or more wavenumbers, ideally selected from all of them, indicates a subject with breast cancer. 【0042】 According to a fifth aspect of the present invention, a method for determining colorectal cancer in a subject, i) Perform Raman spectral analysis of the biological sample obtained from the subject to generate a test sample spectrum. ii) Compare the test sample spectrum obtained in step i) with the control sample Raman spectrum from the control object. 
iii) wherein a difference between the test sample spectrum and the control sample spectrum at one or more wavenumbers within one or more ranges, ideally all ranges, selected from the group comprising about 622 cm⁻¹, about 829 cm⁻¹, about 943 cm⁻¹, about 957 cm⁻¹, about 1004 cm⁻¹, about 1341 cm⁻¹, about 1402 cm⁻¹, about 1498 cm⁻¹, about 1521 cm⁻¹, about 1556 cm⁻¹, about 1587 cm⁻¹, about 1607 cm⁻¹, and about 1657 cm⁻¹ indicates a subject with colorectal cancer.

【0043】 In a preferred method of the fifth aspect of the present invention, observing an increased signal intensity at one or more wavenumbers, ideally all of them, selected from about 829 cm⁻¹, about 1004 cm⁻¹, about 1341 cm⁻¹, about 1402 cm⁻¹, about 1498 cm⁻¹, about 1556 cm⁻¹, about 1587 cm⁻¹, about 1607 cm⁻¹, and about 1657 cm⁻¹, and/or observing a reduced signal intensity at one or more wavenumbers, ideally all of them, selected from about 622 cm⁻¹, about 943 cm⁻¹, about 957 cm⁻¹, and about 1521 cm⁻¹, indicates a subject with colorectal cancer.

【0044】 A further aspect of the present invention provides a method for monitoring the progression of cancer in a subject, comprising periodically repeating one or more of the above methods, ideally the same method or methods.

【0045】 Ideally, the Raman analysis is correlated with known cancer staging techniques, such as simple in vitro assays or biopsies, so that it reliably informs clinicians not only of the presence of a cancer but also of its stage or progression.
【0046】 As will be understood by those skilled in the art, the above methods of the present invention may be used, for example, to evaluate how effectively a treatment regimen is working. Signal intensity levels are assayed during the course of a given treatment to determine whether signal intensity changes in response to the treatment, by comparing signal intensity with respect to wavenumber from the test sample spectrum, and/or by analyzing one or more relative spectral differences between the test sample spectrum and the control sample spectrum at one or more wavenumbers within one or more of the ranges disclosed herein.

【0047】 In addition, or alternatively, a method for treating cancer is provided, comprising performing any one of the aforementioned methods and then, depending on the outcome of that method, carrying out an appropriate or selected course of action for treatment.

【0048】 According to a further aspect of the present invention, there is provided a kit for use in determining cancer in a biological sample from a subject, comprising:
a. a Raman spectrometer for performing spectral analysis of a biological sample obtained from the subject;
b. a processing unit that processes the test sample spectrum in the manner defined herein; and
c. an output unit configured to provide an output indicating that the subject is affected by cancer, in accordance with the determination made by the processing unit in b).

【0049】 In a preferred kit of the present invention, the Raman spectrometer comprises a laser light source, more preferably a laser light source configured to emit light in the infrared wavelength band, most preferably at 785 nm. Alternatively or in addition, the light source preferably generates laser light having an output power of about 10 mW to about 1 W, more preferably about 50 mW to about 500 mW.
【0050】 A particularly suitable Raman spectrometer comprises a long-pass or Raman edge filter, a dispersion device, and a charge-coupled device configured to produce a spectral response of intensity versus energy (relative Raman shift in wavenumbers).

【0051】 Throughout this specification and in the claims, the terms "comprise" and "contain," and their variations, such as "comprising" and "comprises," mean "including but not limited to," and do not exclude other parts, additives, components, integers, or processes. Throughout this specification and in the claims, unless otherwise required by context, the singular form includes the plural form. In particular, where the indefinite article is used, unless otherwise required by context, this specification should be understood to contemplate both the singular and the plural.

【0052】 All references, including any patents or patent applications, cited herein are incorporated herein by reference. No admission is made that any reference constitutes prior art. Furthermore, no admission is made that any prior art constitutes part of the common general knowledge in the art.

【0053】 Preferred features of each aspect of the present invention may be as described in relation to any of the other aspects.

【0054】 Other features of the present invention will become apparent from the following examples. Generally speaking, the present invention extends to any novel one, or any novel combination, of the features disclosed herein (including the appended claims and drawings). Accordingly, any features, integers, properties, compounds or chemical moieties described in relation to a particular aspect, embodiment or example of the present invention should be understood to be applicable to any other aspect, embodiment or example described herein, unless incompatible therewith.

【0055】 Furthermore, unless otherwise specified, any feature disclosed herein may be replaced by an alternative feature that serves the same or a similar purpose.
【0056】 The present invention will now be described, by way of example only, with reference to the following embodiments and drawings.

[Brief Description of the Drawings]

【0057】
[Figure 1] a) ROC curve from LOOCV for propensity score-matched controls, where all cancers are included as a single class. The red dots on the plot represent model performance when the probability threshold for cancer is 0.5. The green x marks show model performance when the probability threshold is adjusted to achieve a minimum sensitivity of 90%. b) Feature importance (Gini importance) for the general cancer versus control model, normalized between 0 and 1.
[Figure 2] Random forest Gini importance (i.e., wavenumber importance) for each cancer type, superimposed on a typical serum spectrum.
[Figure 3] ROC curves for each cancer type from LOOCV. The red dots on the plot represent model performance when the probability threshold for cancer is 0.5. The green x marks show model performance when the probability threshold is adjusted to achieve a minimum sensitivity of 90%.
[Figure 4] Feature importance for a model trained to distinguish pancreatic cancer from colorectal cancer, superimposed on a typical human serum spectrum.
[Figure 5] Feature importance for a model trained to distinguish pancreatic cancer from lung cancer, superimposed on a typical human serum spectrum.
[Figure 6] Feature importance for a model trained to distinguish pancreatic cancer from breast cancer, superimposed on a typical human serum spectrum.
[Figure 7] Feature importance for a model trained to distinguish lung cancer from breast cancer, superimposed on a typical human serum spectrum.
[Figure 8] Feature importance for a model trained to distinguish lung cancer from colorectal cancer, superimposed on a typical human serum spectrum.
[Figure 9] Feature importance for a model trained to distinguish breast cancer from colorectal cancer, superimposed on a typical human serum spectrum.

【0058】 Table 1. Breakdown of the number of samples by cancer type, with controls propensity score-matched on age and sex. This gives a total of 263 cancers, together with the matched control samples.
【0059】 Table 2. Model performance for each cancer type using LOOCV with a probability threshold of 0.5.
【0060】 Table 3. Model performance for each cancer type using LOOCV, with model probability thresholded using AUC to achieve a minimum sensitivity of 90%.
【0061】 Table 4. Tentative peak assignments for pancreatic cancer.
【0062】 Table 5. Metabolite pathways from the most important model features in pancreatic cancer, as obtained from the online KEGG pathway mapper.
【0063】 Table 6. Raman peaks consistent with mass spectrometry for the top 100 features observed in lung cancer.
【0064】 Table 7. Metabolite pathways from the most important model features in lung cancer, as obtained from the online KEGG pathway mapper.
【0065】 Table 8. Tentative peak assignments for breast cancer.
【0066】 Table 9. Metabolite pathways from the most important model features in breast cancer, as obtained from the online KEGG pathway mapper.
【0067】 Table 10. Tentative peak assignments from Raman spectroscopy and mass spectrometry for the top 100 features observed in colorectal cancer.
【0068】 Table 11. Metabolite pathways from the most important model features in colorectal cancer, as obtained from the online KEGG pathway mapper.
【0069】 Table 12. Performance for pancreatic cancer versus colorectal cancer with and without a probability threshold set to achieve a minimum sensitivity of 90%.
【0070】 Table 13. Performance metrics for pancreatic cancer versus lung cancer in cross-validation.
【0071】 Table 14. Performance metrics for pancreatic cancer versus breast cancer in cross-validation.
【0072】 Table 15. Performance metrics for lung cancer versus breast cancer in cross-validation.
【0073】 Table 16. Performance metrics for lung cancer versus colorectal cancer in cross-validation.
【0074】 Table 17. Performance metrics for breast cancer versus colorectal cancer in cross-validation.
【0075】 Table 18. Confusion matrix and the evaluation metrics calculated from the actual clinical "gold standard" diagnosis and the diagnosis predicted from Raman measurements.

【0076】 Methods
Serum collection
Patient characteristics at the time of sample collection may affect the accuracy of the spectrum obtained. Patients fasted for 4 hours prior to sample collection and had no liver disease. Details of medication administered to the patient were also recorded. Blood samples were collected by skilled hematology professionals using standard operating procedures. Blood was collected using Vacutainer® Serum Separator collection tubes. The collection tubes were then handled according to the manufacturer's best practice protocol to prepare liquid serum. Samples were typically frozen and stored before analysis. Samples were analyzed as liquid samples by Raman spectroscopy.

【0077】 Raman spectroscopy of liquid samples (785 nm laser)
A liquid sample was pipetted into a container in the form of a stainless steel sample holder with multiple wells. This was then placed in the spectrometer on a stainless steel cooling plate. Using the objective lens, a 785 nm laser beam was focused into the liquid sample, 1.2 mm above the bottom of the well. Then, using a laser power of 165–175 mW and an exposure time of 5 seconds, data points were acquired in the spectral region 610 cm⁻¹ to 1718 cm⁻¹. These were then averaged over 30 acquisitions to generate a single spectrum.
This process was then repeated to generate five replicates per sample, which were used in the diagnostic model to examine the degree of spectral variation related to "sample collection" reproducibility.

【0078】 Spectral data preprocessing
The analysis was performed using a computational package developed to help optimize diagnostic model performance by testing preprocessing steps for Raman spectral data and the order in which they are applied. While good diagnostic performance can be achieved using various other preprocessing methods and/or alternative configurations or sequences of preprocessing steps, a preferred sequence of steps (binning, data smoothing, baseline correction, and normalization) was found to achieve superior diagnostic performance. Each preprocessing step can be described as follows.

【0079】 Data binning: This method minimizes the impact of observational errors within a dataset and, in spectral data, increases the signal-to-noise ratio. Binning has the additional advantage of reducing data dimensionality, which can decrease the computational burden of processing highly multivariate data. In the data analysis here, binning was set to 1.

【0080】 Data smoothing: This process minimizes noise in the dataset by fitting a curve to replace the raw data, thereby increasing the signal-to-noise ratio. In this study, a Savitzky-Golay (SG) filter is applied when the preprocessing sequence includes a smoothing step. The SG filter works by moving a sliding window along the dataset and fitting a polynomial of degree n within that window, so that the spectrum is replaced by the fitted polynomial of each window. In the preferred analysis here, the filter window length was set to 9 and the degree of the polynomial was 4.

【0081】 Background removal: This process attempts to remove any background contribution to the spectrum from fluorescence without removing features (peaks) from the spectrum. Background removal can be achieved by several methods.
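The smoothing step of paragraph 0080 can be sketched as follows. Only the window length (9) and polynomial degree (4) come from the text. The weights are the standard tabulated Savitzky-Golay convolution coefficients for a 9-point quartic fit; the function name and the choice to leave the four points at each edge unsmoothed are assumptions of this sketch, not details from the specification.

```python
# Standard Savitzky-Golay smoothing weights for a 9-point window and a
# degree-4 (quartic) polynomial; dividing by 429 normalizes them to sum to 1.
SG_WEIGHTS = (15, -55, 30, 135, 179, 135, 30, -55, 15)
SG_NORM = 429


def savgol_smooth_9_4(intensities):
    """Smooth a list of intensities with a 9-point, degree-4 SG filter.

    Assumption of this sketch: the first and last four points, where a full
    window does not fit, are returned unchanged.
    """
    half = len(SG_WEIGHTS) // 2
    smoothed = [float(y) for y in intensities]
    for centre in range(half, len(intensities) - half):
        window = intensities[centre - half:centre + half + 1]
        smoothed[centre] = sum(w * y for w, y in zip(SG_WEIGHTS, window)) / SG_NORM
    return smoothed


# A quartic curve is reproduced unchanged, which is the defining property
# of a degree-4 Savitzky-Golay filter.
xs = list(range(-10, 11))
quartic = [x**4 - 3 * x**2 + 2 * x + 5 for x in xs]
print(savgol_smooth_9_4(quartic)[4:-4] == [float(y) for y in quartic[4:-4]])

# An isolated single-point spike is attenuated.
spike = [0.0] * 21
spike[10] = 429.0
print(savgol_smooth_9_4(spike)[10])
```

In practice a library routine would normally be used instead, for example `scipy.signal.savgol_filter(y, window_length=9, polyorder=4)`, which also handles the edge points.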
【0082】 Rolling Circle Filter (RCF): The RCF is a high-pass filter for removing background fluorescence. It works by rolling a circle underneath the spectrum and recording the minimum distance between the circle and the data. This is repeated for a window shifted along the spectrum, producing a list of minimum distances between the circumference of the circle and the data, which are then removed from the spectrum as the background contribution. The RCF removes broad features from the spectrum while preserving sharp, characteristic Raman peaks. The tunable parameter in the RCF is the radius of the circle, which is typically chosen so that broad features are effectively removed and the circle cannot "roll" up into the Raman peaks.

【0083】 Polynomial baseline removal: This method aims to remove broader features that are not characteristic of Raman scattering while preserving peak structure, by fitting an n-th degree polynomial to the spectrum using least squares. The method works by smoothing the spectrum so that Raman peaks are preserved while the background is fitted.

【0084】 Extended multiplicative scatter correction: This is a powerful preprocessing technique that isolates and removes complex multiplicative effects caused by physical phenomena, making it easier to determine spectral features.

【0085】 Derivative baseline removal: By taking the derivative of the measured response with respect to wavenumber (or index), the derivative of the spectrum can be evaluated to highlight the maxima and minima that characterize the Raman spectrum. Either the first or the second derivative can be used for this purpose, and both act as high-pass filters that remove low-frequency content from the spectrum.

The preferred background removal method used here was polynomial background removal of order 5.

【0086】 Normalization: Because of the inherent day-to-day variation in intensity counts in spectra, this is generally considered an essential step when analyzing spectroscopic data.
Normalizing spectra aims to eliminate this variation in intensity, making it easier for analysts to compare spectra. Various methods can be used, such as min-max normalization, vector normalization, normalization to major peaks, and standard normalization, but the preferred method used here was standard normalization.

【0087】 Samples
Table 1 shows the number of patients for each cancer type. Patients are matched using age- and sex-based propensity score matching. Controls are determined by a normal CT scan (or colonoscopy for CRC), and cancer is confirmed by histopathology from tumor biopsy.

【0088】 Machine learning classification
The classification of Raman spectra was performed using the Random Forest (RF) machine learning algorithm. Briefly, Random Forest works by constructing n decision trees, each tree starting with a random variable (here, a wavenumber), and the resulting model combines all n trees into an ensemble result. An advantage of Random Forest is its ability to extract the variables that are important in decision-making; in the case of Raman spectroscopy, these wavenumbers can be mapped back to molecular vibrational modes for identification. Furthermore, RF is readily available, performs well, and, because of bagging, tends not to overfit in the way that other classical machine learning methods can. The information on important features can also be used to rebuild the model with fewer variables, removing everything that does not contribute to classification.

【0089】 The performance of each model is described using the evaluation metrics sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy, as defined in Table 18.

【0090】 Furthermore, the inventors report the area under the receiver operating characteristic curve (AUC, or AUROC), which is, by definition, the area under the plot that indicates how well a binary classifier performs as the discrimination threshold is varied.
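For readers unfamiliar with the AUC, the following minimal Python sketch computes it through its rank interpretation: the probability that a randomly chosen cancer sample receives a higher score than a randomly chosen control, with ties counted as one half. This is a general property of the ROC curve, not code from the specification; the function name and the example scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for cancer, 0 for control; scores: model probabilities.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # cancer sample ranked above control
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for illustration only.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

A value of 1 corresponds to every cancer sample being scored above every control, and 0.5 to a classifier that ranks no better than chance. Library implementations such as scikit-learn's `roc_auc_score` compute the same quantity more efficiently.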
Figure 1a) shows an exemplary ROC curve, where optimal performance lies in the upper left of the plot and AUC = 1 corresponds to a perfect classifier.

【0091】 Uncertainty in model performance
The uncertainty in the performance of the machine learning algorithm, in this case where LOOCV is used, is calculated using the following formula:

【Math. 1】

In the formula, 1.96 is derived from the 95% confidence interval, the evaluation metric is the metric in question (e.g., the sensitivity of the model), the prevalence is the proportion of cases in the dataset, and N is the total number of samples in the dataset. In this study, the prevalence is 50%, or 0.5, for each cancer type tested.

【0092】 The datasets range from a minimum of N=61 (31 lung cancer patients) for lung cancer to a maximum of N=300 (150 CRC patients) for CRC, so a high degree of uncertainty is expected, with the uncertainty on some evaluation metrics ranging from 5% to 17%. This is clearly significant and will be addressed in the discussion section.

【0093】 Raman spectroscopy provides a fingerprint of the vibrational modes of a sample, which can lead to multiple possible assignments for the same peak. To ensure that Raman peaks were matched correctly, 61 serum samples underwent both Raman and mass spectrometry analysis. A database of metabolites with Raman peaks was constructed using the correlation between Raman wavenumbers and mass spectrometry measurements, combined with an existing Raman peak library.

【0094】 Results
First, the inventors developed a machine learning model for general cancer detection using the complete dataset containing 263 cancers along with 263 matched controls. Figure 1 shows the receiver operating characteristic (ROC) curve for this model. With the probability threshold set to 0.5, the model achieves a sensitivity of 78.3% and a specificity of 89.4%; when the threshold is set to achieve a minimum sensitivity of 90% (probability cutoff = 0.374), it achieves a sensitivity of 90.1% and a specificity of 83.3%.
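The formula of paragraph 0091 is not reproduced in this text, so the following Python sketch is an assumption: it uses the common normal-approximation form, in which the 95% half-width is 1.96 times the square root of metric × (1 − metric) divided by the number of samples in the relevant class, taken here as prevalence × N. The function name is hypothetical, and the specification's own formula may differ.

```python
import math


def metric_uncertainty(metric, prevalence, n_total, z=1.96):
    """Assumed 95% half-width for a proportion-type metric.

    ASSUMPTION: normal (Wald) approximation with the class size taken as
    prevalence * n_total; this is not quoted from the specification.
    With a prevalence of 0.5 the class size is the same for cases and
    controls, so the same expression serves sensitivity and specificity.
    """
    n_class = prevalence * n_total
    return z * math.sqrt(metric * (1.0 - metric) / n_class)


# Values quoted in the text for lung cancer: N = 61, prevalence 0.5,
# specificity 64.5%.
lung = metric_uncertainty(0.645, 0.5, 61)
print(f"lung specificity: +/- {100 * lung:.1f} percentage points")

# General cancer model: 263 cancers plus 263 controls, sensitivity 90.1%.
general = metric_uncertainty(0.901, 0.5, 526)
print(f"general sensitivity: +/- {100 * general:.1f} percentage points")
```

Under this assumed form the lung cancer specificity carries an uncertainty of roughly 17 percentage points, which is in line with the upper end of the range quoted in paragraph 0092, while the larger general cancer dataset gives a much narrower interval.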
Figure 1b shows the feature importance for all cancers versus controls, superimposed on a typical serum spectrum. The feature importance from the RF algorithm indicates which wavenumbers give the least "impure" splits in the decision trees. This means that a peak in feature importance corresponds to a wavenumber where the clearest difference is seen between cancer and control. Using the feature importance, wavenumbers that discriminate poorly can be removed, simplifying the model. As prominent features become better established, it is expected that the more data there are, the less noise there will be in the feature importance.

【0095】 These results demonstrate high sensitivity and specificity and are extremely promising. For example, if general cancer screening without specificity for cancer type is a viable solution, then Raman spectroscopy (RS) could potentially be used. However, if the specific location of the cancer could be identified via a blood-based Raman test, this would streamline the current NHS pathway.

【0096】 Tables 2 and 3 show the model performance for cancer versus control samples, for the pairs given in Table 1, in leave-one-out cross-validation (LOOCV). Figure 2 shows the feature importance for the cancer versus control models in these pairs. This shows clear differences between cancer types in many regions, along with similarities. A metabolomic understanding of the differences between cancer types could be helpful for detection, treatment, and outcomes if cancer is detected at an earlier stage.

【0097】 Figure 3 shows the ROC curve for each cancer model, together with the performance of the model at a probability threshold of 0.5, corresponding to the performance in Table 2 (red dots), and the performance with the probability threshold set to achieve a minimum sensitivity of 90%, corresponding to the performance in Table 3 (green x).
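The idea of moving the probability cutoff until a minimum sensitivity of 90% is reached can be sketched in Python as follows. The function names and the example probabilities are hypothetical, and the sketch assumes a sample is called "cancer" when its predicted probability is at or above the cutoff; the specification does not state how its cutoff search was implemented.

```python
import math


def cutoff_for_min_sensitivity(labels, probabilities, min_sensitivity=0.90):
    """Highest cutoff at which sensitivity is still >= min_sensitivity."""
    cancer_probs = sorted(
        (p for label, p in zip(labels, probabilities) if label == 1), reverse=True
    )
    if not cancer_probs:
        raise ValueError("at least one cancer sample is required")
    # Number of cancer samples that must be called positive; the small
    # epsilon guards against floating-point round-up in the product.
    needed = max(1, math.ceil(min_sensitivity * len(cancer_probs) - 1e-9))
    return cancer_probs[needed - 1]


def sensitivity_specificity(labels, probabilities, cutoff):
    """Sensitivity and specificity when p >= cutoff is called cancer."""
    pairs = list(zip(labels, probabilities))
    tp = sum(1 for label, p in pairs if label == 1 and p >= cutoff)
    fn = sum(1 for label, p in pairs if label == 1 and p < cutoff)
    tn = sum(1 for label, p in pairs if label == 0 and p < cutoff)
    fp = sum(1 for label, p in pairs if label == 0 and p >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical probabilities for 10 cancers and 10 controls (not study data).
labels = [1] * 10 + [0] * 10
probs = [0.95, 0.9, 0.85, 0.8, 0.7, 0.6, 0.55, 0.45, 0.4, 0.2,
         0.5, 0.42, 0.3, 0.3, 0.25, 0.2, 0.15, 0.1, 0.1, 0.05]

cutoff = cutoff_for_min_sensitivity(labels, probs)
print("default 0.5:", sensitivity_specificity(labels, probs, 0.5))
print(f"cutoff {cutoff}:", sensitivity_specificity(labels, probs, cutoff))
```

Lowering the cutoff trades specificity for sensitivity, which is the same pattern reported for the general cancer model when the cutoff is moved from 0.5 to 0.374.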
【0098】 Pancreatic cancer
As shown in Figure 2, the most important regions for random forest classification of pancreatic cancer lie primarily in the wavenumber regions 620–800 cm⁻¹, 1000–1200 cm⁻¹, and 1350–1550 cm⁻¹. Tentative peak assignments are given in Table 4, where each wavenumber is matched to the nearest peak. This accounts for several important features that lie on peak shoulders, which are combined for brevity. From the random forest feature importance, the top 10 metabolites for Raman-based pancreatic cancer diagnosis are L-phenylalanine, L-proline, pantothenate, D-fructose, sarcosine, alanine, L-tryptophan, citrate, phenol, and L-glutamine.

【0099】 Table 5 shows the output obtained by feeding the top metabolites for pancreatic cancer determined from RS into the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway mapper. Pathways with the most hits include alanine, aspartate, and glutamate metabolism, as well as valine, leucine, and isoleucine biosynthesis/degradation. Statistically significant differences in various amino acids have previously been observed between pancreatic ductal adenocarcinoma and other pancreatic cancers plus chronic pancreatitis. Of these highlighted amino acids, the present results for L-phenylalanine, L-proline, alanine, L-tryptophan, and L-glutamine are consistent, and this is also consistent with the KEGG pathways.

【0100】 Lung cancer
In lung cancer, the top 100 features mapped to metabolites by Raman and mass spectrometry correspond to L-phenylalanine, pyruvate, citrate, alanine, D-fructose, L-proline, sarcosine, L-tryptophan, phenol, and pantothenate.

【0101】 The most important metabolite in the Raman lung cancer model is L-phenylalanine, which, in the inventors' research, is generally reduced in lung cancer patients. Pyruvate was identified as the second most important metabolite in the Raman-based lung cancer model and is known as a major replenishing (anaplerotic) source for the TCA cycle.
This also relates to citrate, the third most important metabolite in the lung cancer model. Table 7 shows the KEGG pathways consistent with the top metabolites for lung cancer. The alanine, aspartate, and glutamate metabolism pathway contains the highest number of hits among the top Raman metabolites and is associated with the affected TCA cycle.

【0102】 Breast cancer
The top 100 features mapped by Raman spectroscopy and mass spectrometry correspond to the metabolites L-lysine, L-methionine, L-proline, L-valine, sucrose, L-phenylalanine, alanine, L-tryptophan, phenol, and pantothenate.

【0103】 The inventors have found that L-methionine is the second most important metabolite for breast cancer detection, and that it is elevated in breast cancer patients. Table 9 shows the top metabolite pathways from KEGG for the Raman model for breast cancer. As with lung cancer and pancreatic cancer, alanine, aspartate, and glutamate metabolism has the highest number of matches with Raman metabolites. This commonality may indicate pathways that are affected across these cancer types.

【0104】 Colorectal cancer
In colorectal cancer, the top 10 metabolites consistent with the key Raman features and mass spectrometry are L-proline, sucrose, methylmalonate, D-fructose, sarcosine, L-phenylalanine, alanine, L-tryptophan, citrate, and pantothenate.

【0105】 The most important metabolite in the Raman-based model is L-proline, which the inventors have found to be elevated in patients with colorectal cancer.

【0106】 In the Raman model, sucrose was the second most important metabolite identified for colorectal cancer, but apart from the association between a high-sucrose diet and colorectal cancer, no other associations with cancer have been found in the literature. Table 11 shows the KEGG pathways consistent with the top Raman metabolites from this model.
Unlike pancreatic, lung, and breast cancers, where the alanine, aspartate, and glutamate pathway showed the greatest statistical significance, CRC showed the greatest significance in valine, leucine, and isoleucine biosynthesis. However, alanine, aspartate, and glutamate metabolism is the second most important pathway identified for CRC. This may indicate that this pathway is affected in most or all cancers.

【0107】 Paired cancer analysis
Here, the inventors move on to analyzing cancers against each other, excluding control samples from the analysis. By constructing an ML model between two cancer types, for example colorectal cancer and pancreatic cancer, the inventors can establish whether Raman peaks reveal differences between sera from patients with different cancer types. The inventors can then compare any differences in Raman peaks with a spectral library to establish which metabolites differ between cancer types. By comparing and understanding the specific metabolites and pathways affected by different cancers, it may be possible to uncover more information about the fundamental changes occurring in cancer metabolism and to inform treatment decisions.

【0108】 Pancreatic cancer vs. colorectal cancer
The same methodology is used as for each cancer type versus controls: feature importance is obtained from 100 RF models and the resulting mean Gini importance plot.

【0109】 Next, the top 100 features are given to an RF algorithm with LOOCV to evaluate its ability to distinguish one cancer type from another.

【0110】 The model performance for pancreatic cancer versus colorectal cancer is 90.9% sensitivity (when colorectal cancer is considered the positive case), 70.5% specificity, 75.5% PPV, 88.6% NPV, 80.7% accuracy, and an AUC of 0.815. Note that this corresponds to the case where the probability is thresholded to achieve at least 90% sensitivity.
Table 12 shows the performance of the pancreatic cancer versus colorectal cancer model with and without probability thresholding.

【0111】 Figure 4 shows the feature importance for the pancreatic cancer vs. colorectal cancer model superimposed on a typical serum spectrum. The most important features selected correspond to the wavenumbers 644 cm⁻¹ (↑), 850 cm⁻¹ (↑), 800 cm⁻¹ (↑), and 1050 cm⁻¹ (↑), which are tentatively attributed to L-lysine, L-phenylalanine, L-methionine, L-proline, phenol, alanine, L-tryptophan, uridine, pantothenate, L-alanine, GABA, L-glutamic acid, L-glutamine, L-leucine, and D-fructose. The arrows indicate how each peak differs for pancreatic cancer relative to CRC; here, every peak is increased for pancreatic cancer.

【0112】 Pancreatic cancer vs. lung cancer
Table 13 shows the model performance in cross-validation for the pancreatic cancer vs. lung cancer model using 31 age- and sex-matched patients. With the probability threshold adjusted to ensure a minimum sensitivity of 90%, a sensitivity of 90.3%, a specificity of 64.5%, a PPV of 71.8%, and an NPV of 87.0% were achieved.

【0113】 Figure 5, which shows the feature importance plotted over a typical serum spectrum, reveals a clear difference between pancreatic cancer and lung cancer. The random forest selected 1004 cm⁻¹ (↑), 1020 cm⁻¹ (↑), 625 cm⁻¹ (↑), and 1704 cm⁻¹ (↑) as the most important features between pancreatic cancer and lung cancer, tentatively assigned to L-phenylalanine, L-tryptophan, pyruvate, methionine, L-proline, L-valine, citrate, L-leucine, D-fructose, phenol, and creatinine.

【0114】 Pancreatic cancer vs. breast cancer
Table 14 shows the performance in cross-validation of the pancreatic cancer versus breast cancer model using 38 matched patients. With probability thresholding, the model achieved a sensitivity of 91.9%, a specificity of 67.6%, a PPV of 73.9%, an NPV of 89.3%, an accuracy of 79.7%, and an AUC of 0.892.
Note that because the breast cancer group contains only female participants, age and sex propensity score matching could not be performed when breast cancer was paired with other cancers, except for CRC. Where sex matching was not possible, matching was still performed on age.

【0115】 Figure 6 shows the feature importance for the machine learning model of pancreatic cancer versus breast cancer. The most important wavenumbers highlighted by the random forest are 1600 cm⁻¹ (↑), 644 cm⁻¹ (↓), and 1200 cm⁻¹ (↑), which are tentatively attributed to L-phenylalanine, L-methionine, L-tryptophan, L-lysine, L-proline, phenol, alanine, uridine, pantothenate, L-alanine, L-proline, L-leucine, D-fructose, pyruvate, and hypoxanthine.

【0116】 Lung cancer vs. breast cancer
The model performance for lung cancer versus breast cancer is 94.6% sensitivity, 62.2% specificity, 71.4% PPV, 92.0% NPV, 78.4% accuracy, and 0.893 AUC at the 90% sensitivity threshold. Table 15 shows the complete performance from cross-validation with and without probability thresholding.

【0117】 Figure 7 shows the feature importance for the lung cancer vs. breast cancer model. The most important wavenumbers are 1600 cm⁻¹ (↑), 644 cm⁻¹ (↑), and 1195 cm⁻¹ (↑), which are tentatively attributed to L-phenylalanine, L-methionine, L-tryptophan, L-lysine, L-proline, phenol, alanine, uridine, pantothenate, L-alanine, L-proline, L-leucine, D-fructose, pyruvate, and hypoxanthine. Notably, these wavenumbers are the same as the most important wavenumbers in the paired model of pancreatic cancer versus breast cancer. The common factor is breast cancer, which indicates a similarity between pancreatic cancer and lung cancer.

【0118】 Lung cancer vs. colorectal cancer
The model performance for lung cancer versus colorectal cancer, as shown in Table 16, is as follows: at the 90% sensitivity threshold, the model achieves a sensitivity of 90.3%, a specificity of 83.9%, a PPV of 84.9%, an NPV of 89.7%, an accuracy of 87.1%, and an AUC of 0.898.
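The evaluation metrics used throughout these paired comparisons follow directly from the four cells of a confusion matrix. The Python sketch below uses the standard definitions; the function name is hypothetical, and the counts are illustrative values for two groups of 31 patients rather than figures taken from Table 16, so the results come out close to, but not identical with, the percentages reported above.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from confusion-matrix counts.

    tp/fn: positive cases called positive/negative;
    tn/fp: negative cases called negative/positive.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Illustrative counts only (31 patients per class); not data from the study.
metrics = diagnostic_metrics(tp=28, fp=5, tn=26, fn=3)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
```

Note that PPV and NPV depend on the balance between the two classes, so values obtained from matched, equally sized groups would not carry over directly to a screening population with a different prevalence.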
【0119】 Figure 8 shows the feature importance for the lung cancer vs. colorectal cancer model. The most important wavenumbers are 644 cm⁻¹ (↓), 850 cm⁻¹ (↑), 800 cm⁻¹ (↓), and 1050 cm⁻¹ (↓), which are tentatively attributed to L-lysine, L-phenylalanine, L-methionine, L-proline, phenol, alanine, L-tryptophan, uridine, pantothenate, L-alanine, GABA, L-glutamic acid, L-glutamine, L-leucine, and D-fructose.

【0120】 Breast cancer vs. colorectal cancer
Table 17 shows that the model performance for colorectal cancer versus breast cancer is 97.3% sensitivity, 83.8% specificity, 85.7% PPV, 96.9% NPV, 90.5% accuracy, and 0.954 AUC at the 90% sensitivity threshold.

【0121】 Figure 9 shows the feature importance for the breast cancer vs. colorectal cancer model. The most important wavenumbers are 644 cm⁻¹ (↓), 1280 cm⁻¹ (↓), 1004 cm⁻¹ (↓), and 1600 cm⁻¹ (↑), which are tentatively attributed to L-lysine, L-phenylalanine, L-methionine, L-proline, phenol, alanine, L-tryptophan, uridine, pantothenate, L-alanine, sarcosine, and citrate.

【0122】 Discussion
The inventors have presented the application of Raman spectroscopy combined with machine learning for detecting multiple cancer types from human serum. For cancer in general, the inventors' model achieved a sensitivity of 90.1% and a specificity of 83.3%. These results alone are positive and suggest that Raman spectroscopy could be used as a screening procedure to flag patients or animals requiring further investigation, or as a step before secondary care to prevent unnecessary and expensive exploratory investigations.

【0123】 As a tool for detecting specific cancers, the inventors found that their model performs very well for pancreatic cancer (when a threshold is applied to ensure a minimum sensitivity of 90%), with a sensitivity of 90.9% and a specificity of 77.3%. For lung cancer, the model achieves a sensitivity of 90.3% and a specificity of 64.5%.
For breast cancer, it achieves a sensitivity of 92.1% and a specificity of 65.8%. Finally, for colorectal cancer, it achieves a sensitivity of 91.3% and a specificity of 44.0%. These results are positive as a proof of principle.

【0124】 There are clear differences in feature importance across the different cancer types. Notably, for pancreatic cancer, the most important features are attributed to L-phenylalanine, L-proline, alanine, L-tryptophan, and L-glutamine, consistent with another study that showed statistical significance in pancreatic cancer. For lung cancer, the most important metabolites from the Raman model are L-phenylalanine, pilbert, and citrate. For breast cancer, the most important features from Raman are attributed to L-lysine, L-methionine, and L-proline. For colorectal cancer, the most important features from Raman include L-proline and sucrose.

【0125】 Raman spectroscopy combined with machine learning has great potential for detecting multiple cancer types.

[Table 1] [Table 2] [Table 3] [Table 4] [Table 5] [Table 6] [Table 7] [Table 8] [Table 9] [Table 10] [Table 11] [Table 12] [Table 13] [Table 14] [Table 15] [Table 16] [Table 17] [Table 18]
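The per-cancer results above are reported after applying a threshold that ensures a minimum sensitivity of 90%. One simple way to pick such a threshold from predicted probabilities is sketched below. This is an illustrative assumption about how probability thresholding can be done, not the inventors' stated procedure; the function names and the scores are hypothetical.

```python
import math


def threshold_for_min_sensitivity(labels, scores, min_sensitivity=0.90):
    """Highest score threshold whose sensitivity is at least min_sensitivity.

    labels: 1 for cancer samples, 0 for controls.
    scores: predicted probability of cancer; a sample is called positive
            when its score is >= the returned threshold.
    """
    positives = sorted((s for y, s in zip(labels, scores) if y == 1), reverse=True)
    if not positives:
        raise ValueError("at least one positive sample is required")
    # Number of positives that must be detected (small epsilon guards float error).
    needed = max(1, math.ceil(min_sensitivity * len(positives) - 1e-9))
    return positives[needed - 1]


def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when calling score >= threshold positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical scores: 10 cancer samples followed by 10 controls.
labels = [1] * 10 + [0] * 10
scores = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.35, 0.20,
          0.10, 0.15, 0.20, 0.25, 0.30, 0.40, 0.45, 0.05, 0.12, 0.50]

threshold = threshold_for_min_sensitivity(labels, scores)
sens, spec = sensitivity_specificity(labels, scores, threshold)
print(f"threshold={threshold}, sensitivity={sens:.0%}, specificity={spec:.0%}")
```

Lowering the threshold to guarantee sensitivity admits more false positives, which is why specificity can fall sharply for cancer types whose score distributions overlap heavily with those of controls.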

Claims

[Claim 1] A method for determining cancer in a subject, the method comprising:
i) performing Raman spectral analysis of a biological sample obtained from the subject to generate a test sample spectrum; and
ii) comparing the test sample spectrum obtained in step i) with a Raman spectrum of at least one control sample from a control subject;
iii) wherein a difference between the test sample spectrum and the control sample spectrum at one or more wavenumbers within one or more ranges selected from the group including approximately 622 cm⁻¹, approximately 644 cm⁻¹, approximately 666 cm⁻¹, approximately 682 cm⁻¹, approximately 700 cm⁻¹, approximately 719 cm⁻¹, approximately 745 cm⁻¹, approximately 758 cm⁻¹, approximately 803 cm⁻¹, approximately 829 cm⁻¹, approximately 853 cm⁻¹, approximately 880 cm⁻¹, approximately 943 cm⁻¹, approximately 957 cm⁻¹, approximately 1004 cm⁻¹, approximately 1031 cm⁻¹, approximately 1051 cm⁻¹, approximately 1062 cm⁻¹, approximately 1084 cm⁻¹, approximately 1127 cm⁻¹, approximately 1158 cm⁻¹, approximately 1175 cm⁻¹, approximately 1272 cm⁻¹, approximately 1341 cm⁻¹, approximately 1402 cm⁻¹, approximately 1411 cm⁻¹, approximately 1449 cm⁻¹, approximately 1498 cm⁻¹, approximately 1521 cm⁻¹, approximately 1556 cm⁻¹, approximately 1587 cm⁻¹, approximately 1607 cm⁻¹, and approximately 1657 cm⁻¹ indicates a subject suffering from cancer.

[Claim 2] The method according to claim 1, wherein the subject is a mammal, optionally a human, a horse, a dog, a cat, a pig, or any other domestic or agricultural species.

[Claim 3] The method according to claim 1 or 2, wherein the biological sample is a blood sample, optionally liquid blood or a blood product sample.

[Claim 4] The method according to any one of claims 1 to 3, wherein the cancer is selected from nasopharyngeal cancer, synovial cancer, hepatocellular carcinoma, renal cancer, connective tissue cancer, melanoma, lung cancer, intestinal cancer, colorectal cancer, brain cancer, throat cancer, oral cancer, liver cancer, bone cancer, pancreatic cancer, choriocarcinoma, gastrin-producing tumors, pheochromocytoma, prolactin-producing tumors, T-cell leukemia/lymphoma, tonsil, spleen, neuroma, von Hippel-Lindau disease, Zollinger-Ellison syndrome, adrenal cancer, anal cancer, bile duct cancer, bladder cancer, ureteral cancer, glioma, oligodendroglioma, neuroblastoma, meningioma, spinal cord tumor, osteochondroma, chondrosarcoma, Ewing's sarcoma, carcinoid, gastrointestinal carcinoid, fibrosarcoma, breast cancer, muscle cancer, Paget's disease, cervical cancer, rectal cancer, esophageal cancer, gallbladder cancer, bile duct cancer, head cancer, eye cancer, nasopharyngeal cancer, cervical cancer, kidney cancer, Wilms' tumor, liver cancer, Kaposi's sarcoma, prostate cancer, testicular cancer, Hodgkin's disease, non-Hodgkin lymphoma, skin cancer, mesothelioma, myeloma, multiple myeloma, ovarian cancer, endocrine pancreatic cancer, glucagon-producing tumor, parathyroid cancer, penile cancer, pituitary cancer, soft tissue sarcoma, retinoblastoma, small intestine cancer, gastric cancer, thymic cancer, thyroid cancer, choriocarcinoma, hydatidiform mole, uterine cancer, endometrial cancer, vaginal cancer, vulvar cancer, acoustic neuroma, mycosis fungoides, insulinoma, carcinoid syndrome, somatostatin-producing tumor, gingival cancer, cardiac cancer, lip cancer, meningeal cancer, oral cancer, nerve cancer, palatine cancer, parotid gland cancer, peritoneal cancer, pharyngeal cancer, pleural cancer, salivary gland cancer, tongue cancer, and tonsil cancer.

[Claim 5] The method according to claim 4, wherein the cancer is selected from colorectal cancer, lung cancer, pancreatic cancer, and breast cancer.

[Claim 6] The method according to any one of claims 1 to 5, wherein the control and/or test sample is matched for age, sex, and weight.

[Claim 7] The method according to any one of claims 1 to 6, wherein the Raman spectral analysis comprises irradiating the sample with a laser light source, optionally wherein the laser light source is configured to emit light in the infrared wavelength band, most preferably at 785 nm.

[Claim 8] The method according to claim 7, wherein the laser light source generates laser light having an output of approximately 10 mW to approximately 1 W.

[Claim 9] The method according to any one of claims 1 to 8, wherein the spectrum or each of the spectra undergoes one or more conventional pre-processing steps before or after the comparison step, preferably in order to reduce noise associated with the one or more spectra, to provide a processed spectrum or processed spectra.

[Claim 10] The method according to claim 9, wherein the one or more pre-processing steps are selected from the group including data binning, smoothing, background removal such as extended multiplicative scattering correction, and/or normalization such as vector normalization and/or baseline correction.

[Claim 11] The method according to any one of claims 1 to 10, wherein step iii) includes or consists of observing a difference, wherein an increase or decrease in signal intensity at the same wavenumber, or a shift in the position of a maximum or minimum signal value between wavenumbers, between the test sample spectrum and the control sample spectrum indicates a subject suffering from cancer.

[Claim 12] The method according to any one of claims 1 to 11, wherein the method comprises or consists of comparing the signal intensity at one or any combination of wavenumbers selected from the group including approximately 666 cm⁻¹, approximately 682 cm⁻¹, approximately 829 cm⁻¹, approximately 853 cm⁻¹, approximately 700 cm⁻¹, approximately 719 cm⁻¹, approximately 745 cm⁻¹, approximately 803 cm⁻¹, approximately 1004 cm⁻¹, approximately 1127 cm⁻¹, approximately 1175 cm⁻¹, approximately 1341 cm⁻¹, approximately 1402 cm⁻¹, approximately 1411 cm⁻¹, approximately 1498 cm⁻¹, and approximately 1657 cm⁻¹, wherein a difference compared to a control indicates a subject suffering from cancer.

[Claim 13] The method according to claim 12, wherein observing an increased signal intensity at one or, ideally, both of the wavenumbers approximately 1004 cm⁻¹ and approximately 1402 cm⁻¹, and/or a reduced signal intensity at one or more, ideally all, of the wavenumbers approximately 666 cm⁻¹, approximately 682 cm⁻¹, and approximately 745 cm⁻¹, indicates a subject suffering from cancer.

[Claim 14] The method according to any one of claims 1 to 11, wherein the method comprises or consists of comparing the signal intensity at one or any combination of wavenumbers selected from the group including approximately 644 cm⁻¹, approximately 666 cm⁻¹, approximately 682 cm⁻¹, approximately 745 cm⁻¹, approximately 853 cm⁻¹, approximately 1004 cm⁻¹, approximately 1031 cm⁻¹, approximately 1062 cm⁻¹, approximately 1084 cm⁻¹, approximately 1127 cm⁻¹, approximately 1158 cm⁻¹, approximately 1175 cm⁻¹, approximately 1341 cm⁻¹, approximately 1402 cm⁻¹, approximately 1411 cm⁻¹, and approximately 1498 cm⁻¹, wherein a difference compared to a control indicates a subject suffering from pancreatic cancer.

[Claim 15] The method according to claim 14, wherein observing an increased signal intensity at one or more, ideally all, of the wavenumbers approximately 644 cm⁻¹, approximately 853 cm⁻¹, approximately 1004 cm⁻¹, approximately 1031 cm⁻¹, approximately 1062 cm⁻¹, approximately 1084 cm⁻¹, approximately 1127 cm⁻¹, approximately 1175 cm⁻¹, approximately 1341 cm⁻¹, approximately 1402 cm⁻¹, approximately 1411 cm⁻¹, and approximately 1498 cm⁻¹, and/or a reduced signal intensity at one or more, ideally all, of the wavenumbers approximately 666 cm⁻¹, approximately 682 cm⁻¹, approximately 745 cm⁻¹, and approximately 1158 cm⁻¹, indicates a subject suffering from pancreatic cancer.

[Claim 16] The method according to any one of claims 1 to 11, wherein the method comprises or consists of comparing the signal intensity at one or any combination of wavenumbers selected from the group consisting of or including approximately 622 cm⁻¹, approximately 644 cm⁻¹, approximately 666 cm⁻¹, approximately 682 cm⁻¹, approximately 700 cm⁻¹, approximately 719 cm⁻¹, approximately 745 cm⁻¹, approximately 803 cm⁻¹, approximately 829 cm⁻¹, approximately 853 cm⁻¹, approximately 880 cm⁻¹, approximately 1051 cm⁻¹, approximately 1175 cm⁻¹, approximately 1272 cm⁻¹, approximately 1402 cm⁻¹, approximately 1411 cm⁻¹, approximately 1498 cm⁻¹, and approximately 1657 cm⁻¹, wherein a difference compared to a control indicates a subject suffering from lung cancer.

[Claim 17] The method according to claim 16, wherein observing an increased signal intensity at one or more, ideally all, of the wavenumbers approximately 829 cm⁻¹, approximately 853 cm⁻¹, approximately 880 cm⁻¹, approximately 1051 cm⁻¹, approximately 1402 cm⁻¹, and approximately 1411 cm⁻¹, and/or a reduced signal intensity at one or more, ideally all, of the wavenumbers approximately 622 cm⁻¹, approximately 644 cm⁻¹, approximately 666 cm⁻¹, approximately 682 cm⁻¹, approximately 700 cm⁻¹, approximately 719 cm⁻¹, approximately 745 cm⁻¹, approximately 803 cm⁻¹, approximately 1175 cm⁻¹, approximately 1272 cm⁻¹, approximately 1498 cm⁻¹, and approximately 1657 cm⁻¹, indicates a subject suffering from lung cancer.

[Claim 18] The method according to any one of claims 1 to 11, wherein the method comprises or consists of comparing the signal intensity at one or any combination of wavenumbers selected from the group including approximately 622 cm⁻¹, approximately 644 cm⁻¹, approximately 666 cm⁻¹, approximately 682 cm⁻¹, approximately 700 cm⁻¹, approximately 719 cm⁻¹, approximately 745 cm⁻¹, approximately 758 cm⁻¹, approximately 803 cm⁻¹, approximately 829 cm⁻¹, approximately 853 cm⁻¹, approximately 1004 cm⁻¹, approximately 1127 cm⁻¹, approximately 1175 cm⁻¹, approximately 1449 cm⁻¹, approximately 1498 cm⁻¹, and approximately 1657 cm⁻¹, wherein a difference compared to a control indicates a subject suffering from breast cancer.

[Claim 19] The method according to claim 18, wherein observing an increased signal intensity at one or more, ideally all, of the wavenumbers approximately 622 cm⁻¹, approximately 758 cm⁻¹, approximately 1004 cm⁻¹, approximately 1127 cm⁻¹, and approximately 1175 cm⁻¹, and/or a reduced signal intensity at one or more, ideally all, of the wavenumbers approximately 644 cm⁻¹, approximately 666 cm⁻¹, approximately 682 cm⁻¹, approximately 700 cm⁻¹, approximately 719 cm⁻¹, approximately 745 cm⁻¹, approximately 803 cm⁻¹, approximately 829 cm⁻¹, approximately 853 cm⁻¹, approximately 1449 cm⁻¹, and approximately 1498 cm⁻¹, indicates a subject suffering from breast cancer.

[Claim 20] The method according to any one of claims 1 to 11, wherein the method comprises or consists of comparing the signal intensity at one or any combination of wavenumbers selected from the group including approximately 622 cm⁻¹, approximately 829 cm⁻¹, approximately 943 cm⁻¹, approximately 957 cm⁻¹, approximately 1004 cm⁻¹, approximately 1341 cm⁻¹, approximately 1402 cm⁻¹, approximately 1498 cm⁻¹, approximately 1521 cm⁻¹, approximately 1556 cm⁻¹, approximately 1587 cm⁻¹, approximately 1607 cm⁻¹, and approximately 1657 cm⁻¹, wherein a difference compared to a control indicates a subject suffering from colorectal cancer.

[Claim 21] The method according to claim 20, wherein observing an increased signal intensity at one or more, ideally all, of the wavenumbers approximately 829 cm⁻¹, approximately 1004 cm⁻¹, approximately 1341 cm⁻¹, approximately 1402 cm⁻¹, approximately 1498 cm⁻¹, approximately 1556 cm⁻¹, approximately 1587 cm⁻¹, approximately 1607 cm⁻¹, and approximately 1657 cm⁻¹, and/or a reduced signal intensity at one or more, ideally all, of the wavenumbers approximately 622 cm⁻¹, approximately 943 cm⁻¹, approximately 957 cm⁻¹, and approximately 1521 cm⁻¹, indicates a subject suffering from colorectal cancer.

[Claim 22] A method for monitoring the progression of cancer in a subject, comprising periodically repeating one or more of the methods according to any one of claims 1 to 21, ideally the same method or methods.

[Claim 23] A method for treating cancer, comprising carrying out the method according to any one or more of claims 1 to 21, and then carrying out an appropriate or selected course of treatment depending on the results of the method.

[Claim 24] A kit for use in determining cancer in a biological sample from a subject, the kit comprising:
a. a Raman spectrometer for performing spectral analysis of a biological sample obtained from the subject;
b. a processing unit that processes the test sample spectrum according to the method of any one or more of claims 1 to 21; and
c. an output unit configured to provide an output indicating that the subject is suffering from cancer, in accordance with the determination made by the processing unit in b).

[Claim 25] The kit according to claim 24, wherein the Raman spectrometer is equipped with a laser light source, and the laser light source is:
a. configured to emit light in the infrared wavelength band, most preferably at 785 nm; and/or
b. configured to generate a laser beam with an output of approximately 10 mW to approximately 1 W.
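
The intensity comparison recited in the claims (an increase or decrease in signal intensity at a given wavenumber relative to a control, after pre-processing such as vector normalization) can be illustrated with a short sketch. The band directions below are those listed for pancreatic cancer in claim 15. Everything else is an assumption made for illustration: the function names, the 5 cm⁻¹ matching tolerance used to interpret "approximately", and the toy spectra. The sketch is not the machine learning model described in the application.

```python
import math

# Band directions from claim 15: +1 = increased, -1 = reduced versus control.
PANCREATIC_DIRECTIONS = {
    644: +1, 853: +1, 1004: +1, 1031: +1, 1062: +1, 1084: +1,
    1127: +1, 1175: +1, 1341: +1, 1402: +1, 1411: +1, 1498: +1,
    666: -1, 682: -1, 745: -1, 1158: -1,
}


def vector_normalize(intensities):
    """Scale a spectrum to unit Euclidean length (vector normalization)."""
    norm = math.sqrt(sum(v * v for v in intensities))
    if norm == 0:
        raise ValueError("spectrum has zero norm")
    return [v / norm for v in intensities]


def intensity_near(wavenumbers, intensities, target, tolerance):
    """Intensity at the sampled wavenumber closest to target, or None if too far."""
    i = min(range(len(wavenumbers)), key=lambda j: abs(wavenumbers[j] - target))
    if abs(wavenumbers[i] - target) > tolerance:
        return None
    return intensities[i]


def bands_matching_signature(wavenumbers, test, control, directions, tolerance=5.0):
    """Return the signature bands where test differs from control in the listed direction."""
    test_n = vector_normalize(test)
    control_n = vector_normalize(control)
    matched = []
    for target, sign in sorted(directions.items()):
        t = intensity_near(wavenumbers, test_n, target, tolerance)
        c = intensity_near(wavenumbers, control_n, target, tolerance)
        if t is None or c is None:
            continue  # band not covered by the measured spectrum
        if (t - c) * sign > 0:
            matched.append(target)
    return matched


# Toy spectra sampled at four wavenumbers only (hypothetical values).
wavenumbers = [644, 666, 1004, 1158]
control = [1.0, 1.0, 1.0, 1.0]
test = [1.2, 0.8, 1.3, 0.7]
print(bands_matching_signature(wavenumbers, test, control, PANCREATIC_DIRECTIONS))
```

In practice, a spectrum would cover the full wavenumber range and the decision would come from a trained classifier rather than a simple count of matching bands.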