System for predicting primary therapeutic prognosis of colorectal cancer patient

An AI-based system predicts colorectal cancer treatment regimens by analyzing clinical and genetic data, addressing the inefficiencies of current methods and enabling personalized treatment optimization.

WO2026135246A1 | PCT designated stage | Publication Date: 2026-06-25 | ONCOMASTER INC +1

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
ONCOMASTER INC
Filing Date
2025-12-17
Publication Date
2026-06-25

AI Technical Summary

Technical Problem

Current first-line treatment regimens for colorectal cancer lack individualized selection based on clinical and genetic factors, leading to ineffective drug choices and toxicity issues, with conventional prediction methods being inefficient in time, cost, and labor.

Method used

A system using an artificial intelligence model that predicts the prognosis of primary treatment regimens for colorectal cancer patients by analyzing clinical and genetic information, generating genomic feature vectors, and applying them to a learned prognosis prediction model to classify patients into high-resistance or excellent treatment responsiveness groups.

Benefits of technology

Enables rapid and accurate prediction of treatment response and prognosis, optimizing personalized treatment plans and reducing the need for extensive drug experimentation.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure KR2025022026_25062026_PF_FP_ABST

Abstract

The present invention relates to a system for predicting a primary therapeutic prognosis of a colorectal cancer patient. According to the present invention, the system comprises: a data input unit for receiving clinical information and genetic information of a subject; a feature extraction unit for generating a genomic feature vector reflecting the characteristics of a molecular subtype by selecting and processing genetic features according to a preset molecular subtype reflection criterion among a plurality of pieces of candidate genetic mutation information; and a prognosis prediction unit for predicting a primary therapeutic prognosis of the subject by applying the input clinical information and the generated genomic feature vector to a trained prognosis prediction model, and may further comprise a training unit for training the prognosis prediction model to predict the prognosis of a patient on the basis of pieces of clinical information, pieces of genetic information, and genomic feature vectors of a plurality of patients under preset cohort conditions.

Description

Prognosis prediction system for primary treatment regimens in colorectal cancer patients

[0001] The present invention relates to a system for predicting the prognosis of a primary treatment regimen for colorectal cancer patients, and more specifically, to a system for predicting the prognosis of one or more primary treatment regimens for colorectal cancer patients through an artificial intelligence model by considering the subject's clinical information and genetic information.

[0002] Colorectal cancer accounts for 12.7% of all cancer patients (309,761 people), making it the third most common type of cancer. The number of patients is increasing globally and is expected to reach approximately 2.5 million by 2035. Additionally, metastasis is known to be confirmed in about 25% of patients at the time of diagnosis, and metastatic lesions appear in 40 to 50% of patients.

[0003] In addition, the average survival time for patients with metastatic colorectal cancer is less than 30 months, so it is most important to completely remove the tumor and metastatic sites through surgery. However, up to about 50% of colorectal cancer patients may experience subsequent metastasis, so the primary treatment strategy is to suppress tumor growth as much as possible using first-line therapy.

[0004] However, the effectiveness of each drug in first-line treatment regimens has not been clearly established with respect to a patient's individual clinical or genetic factors. Because no biomarkers are available to guide the selection of a first-line regimen, a drug must be chosen essentially arbitrarily, and the toxicity of first-line regimens limits their long-term use.

[0005] Therefore, there is a need for technology that can predict the effectiveness of each treatment in advance so that a treatment regimen suitable for each individual patient can be selected.

[0006] Accordingly, an object of the present invention is to provide a system for predicting the prognosis of a primary treatment regimen for colorectal cancer patients, which predicts the prognosis of one or more primary treatment regimens through an artificial intelligence model by considering the clinical information and genetic information of the subject.

[0007] According to an embodiment of the present invention for achieving this technical object, a system for predicting the prognosis of a primary treatment regimen for colorectal cancer patients comprises: a data input unit that receives clinical information and genetic information of a subject; a feature extraction unit that selects and processes genetic features from among a plurality of pieces of candidate gene mutation information in the genetic information, according to a pre-set molecular subtype reflection criterion, to generate a genomic feature vector that reflects the characteristics of a molecular subtype; and a prognosis prediction unit that applies the input clinical information and the generated genomic feature vector to a learned prognosis prediction model to predict the prognosis of the subject's primary treatment regimen. The system may further include a learning unit that trains the prognosis prediction model to predict the prognosis of a patient based on the clinical information, genetic information, and genomic feature vectors of a plurality of patients under a pre-set cohort condition.

[0008] The above prognosis prediction unit can classify a subject into either a group with excellent treatment responsiveness or a group with high resistance based on predefined prognosis criteria by applying the genomic feature vector generated based on the clinical information and genetic information of the input subject to a learned prognosis prediction model.

[0009] The above prognosis prediction unit can predict a negative prognosis if the subject is classified into a high-resistance group, and a positive prognosis if the subject is classified into a group with excellent treatment responsiveness.

[0010] The above clinical information includes at least one of the subject's gender, age at the time the patient started the first-line treatment regimen, stage of colorectal cancer at diagnosis, and whether combination therapy was used during the first-line treatment, and the above genetic information can be used as source data to derive genomic features from at least one of genetic variation, hotspot mutation, microsatellite instability, and tumor mutation burden based on the subject's tumor genome data.

[0011] The above genomic feature vector can reflect the degree of chromosomal instability by analyzing the presence of mutations in gene groups belonging to the corresponding pathways, mutation frequency, and overall mutation distribution patterns based on multiple molecular pathways including mismatch repair (MMR), homologous recombination (HR), and DNA damage repair (DDR).

[0012] As such, according to the present invention, a system can be provided to support decision-making for selecting the optimal treatment method for colorectal cancer patients by predicting the prognosis of the first-line treatment regimen.

[0013] In addition, by predicting the prognosis of each individual patient's primary treatment regimen in advance, it can help optimize personalized treatment plans.

[0014] In addition, conventional methods for predicting responsiveness to anticancer chemotherapy are very inefficient in terms of time, cost, and labor because the number of experiments they require grows with the square of the number of drugs. In contrast, by applying the subject's clinical and genetic information to a pre-established artificial intelligence model that predicts the duration of treatment, the response to and prognosis of first-line therapy can be predicted quickly, simply, and reliably.

[0015] FIG. 1 is a configuration diagram of a system for predicting the prognosis of a primary treatment regimen for colorectal cancer patients according to one embodiment of the present invention.

[0016] FIG. 2 is a diagram illustrating the framework of a prognosis prediction model according to one embodiment of the present invention.

[0017] FIG. 3 is a diagram illustrating the results of analyzing the survival period using a validation dataset according to one embodiment of the present invention.

[0018] Preferred embodiments according to the present invention will be described in detail below with reference to the attached drawings. In this process, the thickness of lines or the size of components shown in the drawings may be exaggerated for clarity and convenience of explanation.

[0019] Throughout the specification, when a part is described as "including" a certain component, this means that, unless specifically stated otherwise, it does not exclude other components but may include additional components.

[0020] Furthermore, the terms described below are defined in consideration of their functions within the present invention, and these may vary depending on the intent or practice of the user or operator. Therefore, the definitions of these terms should be based on the content throughout this specification.

[0021] In the embodiments of the present invention described below, the primary treatment prognosis prediction system (100) is specifically described as predicting the prognosis of FOLFOX among primary treatment regimens for colorectal cancer. However, the present invention is not limited thereto.

[0022] Additionally, the primary treatment prognosis prediction system (100) may be performed by a computing device comprising one or more memories or one or more processors capable of performing the following processes.

[0023] FIG. 1 is a configuration diagram of a system for predicting the prognosis of a primary treatment regimen for colorectal cancer patients according to one embodiment of the present invention.

[0024] As illustrated in FIG. 1, a prognosis prediction system (100) for a primary treatment regimen for colorectal cancer patients may be configured to include a data input unit (110), a feature extraction unit (120), a learning unit (130), and a prognosis prediction unit (140).

[0025] Here, each component may be implemented by one or more processors that perform the following operations: receiving clinical information and genetic information of a subject; selecting and processing, from the genetic information, genetic features that have a significant correlation with treatment responsiveness among multiple candidate genetic variants according to pre-established criteria for reflecting molecular subtypes; generating a genomic feature vector that reflects the characteristics of the molecular subtype; applying the input clinical information and the generated genomic feature vector to a pre-trained prognosis prediction model to predict the prognosis of the subject's first-line treatment regimen; and training the prognosis prediction model to predict a patient's prognosis based on the clinical information, genetic information, and genomic feature vectors reflecting the molecular subtype characteristics of multiple patients under pre-established cohort conditions. The prognosis prediction model executed by the processor may be implemented as instructions or a program stored in memory.

[0026] Additionally, the data input unit (110), feature extraction unit (120), learning unit (130), and prognosis prediction unit (140) may be configured to perform their functions sequentially, and may be implemented so that a processor performs each function in parallel or a single processor performs each function sequentially.
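
As an illustrative sketch only, the flow from feature extraction to prediction could be wired as follows. The class name PrognosisSystem, its fields, and the stub callables are hypothetical and do not appear in the specification; the learning unit (130) is represented only by the already-trained model that is passed in, and the labels 0 (high resistance) and 1 (excellent treatment responsiveness) follow the convention stated in [0043].

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class PrognosisSystem:
    # Stand-in for the feature extraction unit (120): genetic info -> genomic feature vector.
    extract_features: Callable[[dict], Sequence[float]]
    # Stand-in for the trained prognosis prediction model: features -> 0 or 1.
    model: Callable[[Sequence[float]], int]

    def predict(self, clinical: Sequence[float], genetic: dict) -> str:
        """Prognosis prediction unit (140): combine inputs, classify, map to a prognosis."""
        genomic_vector = self.extract_features(genetic)
        label = self.model(list(clinical) + list(genomic_vector))
        # 1 = excellent responsiveness -> positive; 0 = high resistance -> negative.
        return "positive" if label == 1 else "negative"


# Toy stubs, purely for demonstration (not a real extractor or model).
def toy_extract(genetic: dict) -> list:
    return [1.0 if genetic.get("pathway_mutation") else 0.0]


def toy_model(features: Sequence[float]) -> int:
    return 0 if features[-1] == 1.0 else 1


system = PrognosisSystem(extract_features=toy_extract, model=toy_model)
```

Passing the units in as callables keeps the sketch neutral about whether they run on one processor or several.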

[0027] In addition, the method for predicting the prognosis of a primary treatment regimen for a colorectal cancer patient, which is performed by the system (100) for predicting the prognosis of a primary treatment regimen for a colorectal cancer patient, can be implemented as computer-readable code on a computer-readable recording medium, and the computer-readable recording medium (e.g., ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.) includes all types of recording devices in which data that can be read by a computing device is stored. In addition, the computer-readable recording medium can be distributed to networked computing devices so that computer-readable code can be stored and executed in a distributed manner.

[0028] First, the data input unit (110) can receive clinical information and genetic information of the subject.

[0029] Here, clinical information includes at least one of the subject's gender, age at the time the patient started the first-line treatment regimen, stage of colorectal cancer at diagnosis, and type of combination therapy during the first-line treatment. The stage of colorectal cancer is binarized to 0 for stages 1, 2, and 3 and to 1 for stage 4. The combination therapy field indicates whether the first-line treatment was monotherapy using only one type of treatment regimen or combination therapy using different types of drugs.
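
A minimal sketch of how such clinical fields might be encoded is given below. Only the stage binarization (stages 1 to 3 as 0, stage 4 as 1) comes from the description; the record layout, the coding of gender, and the integer codes for the therapy field are assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical integer codes for the first-line therapy field.
THERAPY_CODES = {
    "FOLFOX": 0,
    "FOLFOX+CETUXIMAB": 1,
    "FOLFOX+BEVACIZUMAB": 2,
}


@dataclass
class ClinicalRecord:
    sex: str                 # "M" or "F" (assumed coding)
    age_at_start: float      # age when the first-line regimen started
    stage_at_diagnosis: int  # 1, 2, 3, or 4
    therapy: str             # a key of THERAPY_CODES


def encode_clinical(rec: ClinicalRecord) -> list:
    """Return [sex, age, binarized stage, therapy code] as numeric features."""
    if rec.stage_at_diagnosis not in (1, 2, 3, 4):
        raise ValueError("stage must be 1, 2, 3, or 4")
    if rec.therapy not in THERAPY_CODES:
        raise ValueError(f"unknown therapy: {rec.therapy}")
    stage_bin = 1.0 if rec.stage_at_diagnosis == 4 else 0.0  # rule from the description
    sex_bin = 1.0 if rec.sex == "M" else 0.0                 # assumed coding
    return [sex_bin, float(rec.age_at_start), stage_bin, float(THERAPY_CODES[rec.therapy])]
```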

[0030] According to one embodiment of the present invention, in the case of combination therapy during primary treatment, it can be expressed as any one of the use of FOLFOX therapy alone, the use of FOLFOX therapy and CETUXIMAB in combination, or the use of FOLFOX therapy and BEVACIZUMAB in combination.

[0031] Genetic information includes information on multiple candidate gene mutations and is used as source data to derive genomic features from at least one of mutation, hotspot mutation, microsatellite instability (MSI), and tumor mutation burden (TMB) based on the subject's tumor genome data. Mutation and hotspot mutation are each binarized to indicate whether a mutation is present, microsatellite instability is a continuous numerical value calculated from the tumor genome data, and tumor mutation burden is a numerical value representing the number of detected mutations per million base pairs.
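
The binarization and the per-megabase normalization described here can be sketched as follows. The function names, the gene panel used in the usage note, and the idea of dividing by the number of sequenced bases covered are illustrative assumptions; the specification does not state how the denominator is obtained.

```python
def binarize_mutations(mutated_genes: set, panel: list) -> list:
    """Return 1 for each panel gene that carries a mutation, else 0."""
    return [1 if gene in mutated_genes else 0 for gene in panel]


def tumor_mutation_burden(n_mutations: int, covered_bases: int) -> float:
    """Detected mutations per million base pairs of covered sequence."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    return n_mutations / (covered_bases / 1_000_000)
```

For example, with a hypothetical three-gene panel ["APC", "KRAS", "TP53"] and mutations found in KRAS and TP53, binarize_mutations returns [0, 1, 1]; 30 mutations over 1.5 million covered bases give a burden of 20 mutations per million base pairs.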

[0032] Specifically, the data input unit (110) can receive clinical information and genetic information of the subject from an electronic medical record (EMR), a hospital information system (HIS), a clinical pathology testing device, or a digital pathology system.

[0033] According to one embodiment of the present invention, the data input unit (110) can receive genetic information obtained from a subject who is a patient with metastatic colorectal cancer (i.e., genetic mutation information of tumor genome data).

[0034] Next, the feature extraction unit (120) can generate a genomic feature vector that reflects the characteristics of a molecular subtype by selecting and processing genetic features that have a significant correlation with therapeutic responsiveness among multiple candidate genetic variant information from the input subject's genetic information according to a pre-set molecular subtype reflection criterion (e.g., molecular subtype classification based on whether there is a variant and the number of variants for genes within a DNA repair-related signaling pathway).

[0035] Here, the genomic feature vector is a vector that reflects the degree of chromosomal instability (e.g., chromosomal unstable, chromosomal intermediate, and chromosomal stable types) by analyzing the presence of mutations, mutation frequency, and overall mutation distribution patterns of gene groups belonging to the respective pathways, based on multiple molecular pathways including mismatch repair (MMR), homologous recombination (HR), and DNA damage repair (DDR).

[0036] At this time, the chromosomal unstable type is one in which the overall mutation distribution pattern satisfies the established subtype criteria (e.g., the top 20% of the cohort for the overall gene mutation distribution pattern) and at least one gene group belonging to the mismatch repair pathway, homologous recombination repair pathway, or DNA damage repair pathway possesses a mutation. The chromosomal intermediate type is one in which either the overall mutation distribution pattern satisfies the established subtype criteria (e.g., the top 20% of the cohort for the overall number of gene mutations) or at least one gene group belonging to one of these pathways possesses a mutation. The chromosomal stable type is one in which the overall mutation distribution pattern does not satisfy the established subtype criteria and no gene group belonging to these pathways possesses a mutation.
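
One possible reading of these subtype rules is sketched below. It assumes that "top 20% of the cohort" means a patient's total mutation count is at or above the count of the patient ranked at the 20% mark, and that the stable type is the case where neither condition holds. The pathway gene lists are small illustrative examples, since the specification does not enumerate the member genes.

```python
# Illustrative pathway gene groups (assumed; not taken from the specification).
PATHWAYS = {
    "MMR": {"MLH1", "MSH2", "MSH6", "PMS2"},
    "HR": {"BRCA1", "BRCA2", "RAD51"},
    "DDR": {"ATM", "ATR", "CHEK2"},
}


def top_fraction_cutoff(cohort_counts: list, fraction: float = 0.2) -> int:
    """Smallest mutation count that still falls in the top `fraction` of the cohort."""
    ranked = sorted(cohort_counts, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return ranked[k - 1]


def classify_instability(mutated_genes: set, cutoff: int) -> str:
    """Map a patient's mutated genes to unstable / intermediate / stable."""
    high_burden = len(mutated_genes) >= cutoff
    pathway_hit = any(mutated_genes & genes for genes in PATHWAYS.values())
    if high_burden and pathway_hit:
        return "unstable"
    if high_burden or pathway_hit:
        return "intermediate"
    return "stable"
```

The returned category could then be one-hot encoded as part of the genomic feature vector, although the exact vector layout is not specified.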

[0037] Next, the learning unit (130) can train a prognosis prediction model to predict the prognosis of a patient based on the clinical information, genetic information, and genomic feature vectors of the patient in the pre-set cohort condition.

[0038] Specifically, the learning unit (130) can train the prognosis prediction model on patients who satisfy a pre-set cohort condition (e.g., patients with metastatic colorectal cancer who use FOLFOX therapy as a first-line treatment for palliative care). The model is trained to predict a patient's prognosis according to predefined prognostic criteria based on time to next treatment (TTNT), using the patient's clinical information, genetic information, and genomic feature vector reflecting the molecular subtype.

[0039] According to one embodiment of the present invention, the learning unit (130) classifies patients who satisfy a pre-set cohort condition (e.g., patients who use FOLFOX therapy as a first-line treatment regimen for palliative care, who have genetic variation data available, and for whom TTNT can be measured) into a high-resistance group if the measured TTNT period is 9 months or less, and into a group with excellent treatment responsiveness if the TTNT period is 15 months or more. It then extracts a pre-set ratio (80%) of the patient information corresponding to the high-resistance group or the group with excellent treatment responsiveness as a learning dataset and extracts the remainder as a validation dataset to train a prognosis prediction model. At this time, the patient information includes the patient's clinical information, genetic information, and genomic feature vector.

[0040] At this time, the validation dataset includes the remaining proportion (20%) of patient information from the high-resistance group or the group with excellent treatment responsiveness that is left after the training dataset is extracted, and the same proportion (20%) of patient information from the intermediate response group.
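
The TTNT-based labeling and the 80/20 split can be sketched as follows. The thresholds (9 months or less, 15 months or more) and the ratios come from the description. Treating patients between the two thresholds as the intermediate response group, splitting each labeled group separately, and shuffling with a fixed seed are assumptions of this sketch.

```python
import random


def label_by_ttnt(ttnt_months: float) -> str:
    """Assign a prognosis group from time to next treatment (months)."""
    if ttnt_months <= 9:
        return "resistant"
    if ttnt_months >= 15:
        return "responsive"
    return "intermediate"  # assumed: between the two thresholds


def split_cohort(patients: list, train_ratio: float = 0.8, seed: int = 0):
    """patients is a list of (patient_id, ttnt_months); returns (train_ids, valid_ids)."""
    rng = random.Random(seed)
    groups = {"resistant": [], "responsive": [], "intermediate": []}
    for patient_id, ttnt in patients:
        groups[label_by_ttnt(ttnt)].append(patient_id)

    train, valid = [], []
    for name in ("resistant", "responsive"):
        ids = groups[name][:]
        rng.shuffle(ids)
        cut = int(len(ids) * train_ratio)
        train += ids[:cut]   # 80% of each labeled group for training
        valid += ids[cut:]   # remaining 20% for validation

    # The same proportion of the intermediate group joins the validation set only.
    mid = groups["intermediate"][:]
    rng.shuffle(mid)
    n_mid = len(mid) - int(len(mid) * train_ratio)
    valid += mid[:n_mid]
    return train, valid
```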

[0041] That is, the learning unit (130) pre-labels patients into a good prognosis group and a poor prognosis group based on the TTNT index, and trains the model to classify and predict which group a target patient belongs to.

[0042] FIG. 2 is a diagram illustrating the framework of a prognosis prediction model according to one embodiment of the present invention.

[0043] As illustrated in FIG. 2, the learning unit (130) can train a prognosis prediction model by selecting a predetermined number of key factors (e.g., 50) from a learning dataset through pharmacogenomics analysis and feature selection techniques, and labeling groups with high resistance as 0 and groups with excellent treatment responsiveness as 1.
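
The specification does not name the feature selection technique. As one simple stand-in, the sketch below ranks features by the absolute difference between their mean in the responsive group (label 1) and their mean in the high-resistance group (label 0), and keeps the top k. The function name and the scoring rule are assumptions, not the method of the invention.

```python
def select_top_features(X: list, y: list, names: list, k: int = 50) -> list:
    """Return the names of the k features whose group means differ the most."""
    pos = [row for row, label in zip(X, y) if label == 1]  # excellent responsiveness
    neg = [row for row, label in zip(X, y) if label == 0]  # high resistance
    if not pos or not neg:
        raise ValueError("both label groups must be non-empty")
    scores = []
    for j, name in enumerate(names):
        mean_pos = sum(row[j] for row in pos) / len(pos)
        mean_neg = sum(row[j] for row in neg) / len(neg)
        scores.append((abs(mean_pos - mean_neg), name))
    # Highest score first; ties broken alphabetically for reproducibility.
    scores.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scores[:k]]
```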

[0044] Additionally, the learning unit (130) can train a prognosis prediction model that receives the patient's genetic information, clinical information, and genomic feature vector as input and predicts whether the prognosis will be positive (increased benefit) or negative (decreased benefit) when a primary treatment regimen is used. As one of various possible algorithms, the model may be constructed by sequentially combining several decision trees.
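
A model built by sequentially combining decision trees is commonly realized with boosting. The specification does not name a specific algorithm, so the sketch below uses AdaBoost over one-level trees (decision stumps) as a simplified, swapped-in example; all function names are hypothetical. Labels follow [0043]: 0 for high resistance and 1 for excellent treatment responsiveness.

```python
import math


def stump_predict(stump: tuple, row: list) -> int:
    """Predict +1 or -1 from a (error, feature, threshold, polarity) stump."""
    _, j, thr, pol = stump
    return pol if row[j] > thr else -pol


def fit_stump(X: list, y: list, w: list) -> tuple:
    """Exhaustively pick the stump with the lowest weighted error (y in {-1, +1})."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                candidate = (0.0, j, thr, pol)
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(candidate, row) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best


def fit_adaboost(X: list, y01: list, n_rounds: int = 10) -> list:
    """Add stumps one after another, reweighting misclassified patients each round."""
    y = [1 if v == 1 else -1 for v in y01]
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble: list, row: list) -> int:
    """Return 1 (excellent responsiveness) or 0 (high resistance)."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score > 0 else 0
```

In practice a library implementation of gradient-boosted trees would likely replace this hand-written loop; the sketch only shows the sequential structure.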

[0045] FIG. 3 is a diagram illustrating the results of analyzing the survival period using a validation dataset according to one embodiment of the present invention.

[0046] As shown in FIG. 3, the learning unit (130) applies the validation dataset to the prognosis prediction model to predict positive and negative groups, and performs validation by comparing the survival periods of the two groups using a pre-specified analysis method (e.g., Kaplan-Meier analysis).
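
For reference, the Kaplan-Meier estimate multiplies, at each distinct event time, the running survival probability by (1 - d/n), where d is the number of events at that time and n is the number of patients still at risk. A minimal sketch is shown below; the function name and input layout are assumptions, and a statistical comparison of the two curves (such as a log-rank test) is not included.

```python
def kaplan_meier(durations: list, events: list) -> list:
    """Return [(time, survival_probability)] at each time with an observed event.

    events[i] is 1 if the event was observed for patient i and 0 if censored.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0
        leaving = 0
        # Gather every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed > 0:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored patients both leave the risk set
    return curve
```

Running the function once on the predicted positive group and once on the predicted negative group yields the two curves to compare.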

[0047] Based on this, the learning unit (130) can train the prognosis prediction model to make positive or negative predictions by identifying the factors (key gene mutations and clinical characteristics) that affect the predicted prognosis of the primary treatment regimen depending on the presence or absence of specific gene mutations.

[0048] Next, the prognosis prediction unit (140) can predict the prognosis of the subject's primary treatment regimen by applying the input subject's clinical information and genomic feature vector to a learned prognosis prediction model.

[0049] Specifically, the prognosis prediction unit (140) can predict the prognosis of a subject positively or negatively by applying the input clinical information and genomic feature vector of the subject to a learned prognosis prediction model and classifying the subject into either a group with excellent treatment responsiveness or a group with high resistance according to predefined prognosis criteria based on TTNT.

[0050] According to one embodiment of the present invention, the prognosis prediction unit (140) applies the clinical information and genomic feature vector of the input subject to a learned prognosis prediction model, and if the subject is classified into a group with high resistance, the prognosis is predicted to be negative, and if the subject is classified into a group with excellent treatment responsiveness, the prognosis is predicted to be positive.

[0051] According to the embodiments of the present invention described above, a system can be provided to support decision-making for selecting the optimal treatment method for colorectal cancer patients by predicting the prognosis of the first-line treatment regimen.

[0052] In addition, by predicting the prognosis of each individual patient's primary treatment regimen in advance, it can help optimize personalized treatment plans.

[0053] In addition, conventional methods for predicting responsiveness to anticancer chemotherapy are very inefficient in terms of time, cost, and labor because the number of experiments they require grows with the square of the number of drugs. In contrast, by applying the subject's clinical and genetic information to a pre-established artificial intelligence model that predicts the duration of treatment, the response to and prognosis of first-line therapy can be predicted quickly, simply, and reliably.

[0054] The present invention has been described with reference to the embodiments illustrated in the drawings, but this is merely illustrative, and those skilled in the art will understand that various modifications and equivalent alternative embodiments are possible therefrom. Accordingly, the true technical scope of protection of the present invention should be determined by the technical spirit of the following claims.

Claims

1. A prognosis prediction system for a first-line treatment regimen of a colorectal cancer patient, comprising: a data input unit that receives clinical information and genetic information of a subject; a feature extraction unit that generates a genomic feature vector reflecting the characteristics of a molecular subtype by selecting and processing genetic features, according to pre-established molecular subtype reflection criteria, from among multiple pieces of candidate genetic variant information; and a prognosis prediction unit that predicts the prognosis of the subject's first-line treatment regimen by applying the input clinical information and the generated genomic feature vector to a learned prognosis prediction model.

2. The system of claim 1, wherein the prognosis prediction unit applies the genomic feature vector generated based on the clinical and genetic information of the input subject to the learned prognosis prediction model to classify the subject into either a group with excellent treatment responsiveness or a high-resistance group according to predefined prognosis criteria based on time to next treatment (TTNT).

3. The system of claim 2, wherein the prognosis prediction unit predicts a negative prognosis if the subject is classified into the high-resistance group and a positive prognosis if the subject is classified into the group with excellent treatment responsiveness.

4. The system of claim 1, wherein the clinical information includes at least one of the subject's gender, age at the time the patient started the first-line treatment regimen, stage of colorectal cancer at diagnosis, and whether combination therapy was used during the first-line treatment, and wherein the genetic information is used as source data to derive genomic features from at least one of genetic variation, hotspot mutation, microsatellite instability, and tumor mutation burden based on the subject's tumor genome data.

5. The system of claim 1, wherein the genomic feature vector reflects the degree of chromosomal instability by analyzing the presence of mutations in gene groups belonging to the respective pathways, mutation frequency, and overall mutation distribution patterns based on multiple molecular pathways including mismatch repair (MMR), homologous recombination (HR), and DNA damage repair (DDR).

6. The system of claim 1, further comprising a learning unit that trains the prognosis prediction model to predict the prognosis of patients based on clinical information, genetic information, and genomic feature vectors of multiple patients under a pre-set cohort condition.