A method for predicting the prognosis of breast cancer treatment based on cell-free DNA in the blood.

The method using cfDNA analysis and breast tissue image information addresses the challenge of predicting breast cancer prognosis across subtypes, enhancing treatment accuracy and reducing side effects by identifying patients at risk of recurrence.

JP7883583B2 · Active · Publication Date: 2026-07-01 · GREEN CROSS GENOME CORP

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Patents
Current Assignee / Owner
GREEN CROSS GENOME CORP
Filing Date
2022-12-05
Publication Date
2026-07-01

AI Technical Summary

Technical Problem

Current breast cancer treatment methods face challenges in accurately predicting patient prognosis, particularly for early-stage patients, leading to unnecessary side effects from chemotherapy and radiation therapy, and existing prognostic indicators are limited in their applicability across different breast cancer subtypes.

Method used

A method utilizing cell-free DNA (cfDNA) analysis through normalization calibration and regression analysis of chromosomal regions, combined with breast tissue image information, to predict breast cancer prognosis, involving steps such as sequence alignment, quality control, and I-score calculation.

Benefits of technology

This approach allows for high-sensitivity prediction of breast cancer prognosis, enabling more accurate treatment selection and monitoring, reducing unnecessary treatments and improving patient outcomes.

✦ Generated by Eureka AI based on patent content.


Abstract

The present invention relates to a method for predicting the prognosis of breast cancer treatment based on cell-free DNA in blood. More specifically, the method includes extracting cell-free DNA (cfDNA) from a biological sample collected before anti-cancer treatment, obtaining its sequence information, obtaining an I score through normalization calibration and regression analysis of chromosomal regions, and analyzing the I score together with breast image information obtained after anti-cancer treatment. The method according to the present invention improves the accuracy of prognosis prediction for breast cancer patients using next-generation sequencing (NGS) technology, including from very low concentrations of cell-free DNA that are difficult to detect, thereby increasing its commercial applicability. The method of the present invention is therefore useful for determining the prognosis of breast cancer patients.

Description

Technical Field

[0001] The present invention relates to a method for predicting the prognosis of breast cancer treatment based on cell-free DNA in blood. More specifically, it relates to a cell-free DNA-based method in which cell-free DNA (cfDNA) is extracted from a biological sample collected before anti-cancer treatment, its sequence information is obtained, an I score is calculated using normalization calibration and regression analysis of chromosomal regions, and the I score is analyzed together with breast image information obtained after anti-cancer treatment.

Background Art

[0002] Breast cancer is a tumor composed of cancer cells arising in the breast. It is the second most common cancer worldwide after lung cancer and is known as the fifth leading cause of cancer death, after lung, gastric, liver, and colon cancer. Among women, breast cancer is the most common cancer and the second leading cause of cancer death.

[0003] Risk factors for breast cancer onset include race, age, and mutations in tumor suppressor genes BRCA-1, BRCA-2, and p53. Alcohol intake, high-fat diet, lack of exercise, exogenous hormones after menopause, and ionizing radiation also increase the risk of breast cancer onset. Breast cancer is classified into four subtypes: luminal A type, luminal B type, HER2 type, and triple-negative breast cancer (TNBC) according to the expression status of hormone receptors (estrogen receptor or progesterone receptor) and HER2 (human epidermal growth factor receptor 2). Each subtype of breast cancer has distinct molecular characteristics.

[0004] Current breast cancer treatment often requires adjuvant therapy after tumor removal surgery, such as chemotherapy, anti-hormone therapy, targeted therapy, or radiation therapy, to reduce the risk of future recurrence. Although 70% to 80% of early-stage breast cancer patients do not need chemotherapy because their risk of metastasis to other organs is very low, existing treatment guidelines make it difficult to identify these patients accurately, so most patients receive chemotherapy and radiation therapy after surgery. Continuing to administer anticancer drugs to patients who respond poorly to chemotherapy only increases side effects and causes unnecessary suffering. It is therefore necessary to predict the future prognosis of early-stage breast cancer patients reliably, select the most appropriate current treatment, and prepare for poor outcomes such as metastatic recurrence.

[0005] Once breast cancer treatment begins, the progression of the cancer must be monitored periodically. However, diagnostic methods are costly and time-consuming, and it is very difficult to detect and diagnose cancer when the patient's tumor is small or the number of cancer cells is low. While some products exist that can predict prognosis, they are still expensive, cannot confirm the patient's condition during the treatment process, and can only predict a simple prognosis at a single point in time.

[0006] Existing prognostic indicators for breast cancer have focused primarily on proliferation and cell cycle signaling, so proliferation and cell cycle regulatory genes have been used as markers in gene expression-based prognostic assays. Representative products such as Oncotype DX, MammaPrint, PAM50, and EndoPredict are commercial assays based on multigene expression profiling of proliferation genes in frozen or formalin-fixed paraffin-embedded (FFPE) samples. However, each of these kits targets a specific breast cancer subtype, which makes it difficult to apply them universally across all molecular subtypes of breast cancer. Oncotype DX, MammaPrint, PAM50, and EndoPredict primarily target ER-positive breast cancer: they can predict prognosis only for hormone receptor-positive subtypes, and no commercial kit yet exists for hormone receptor-negative subtypes.

[0007] Considering the current situation, improvements to existing analytical methods used for predicting breast cancer prognosis are required to more accurately predict patient survival outcomes and responses to adjuvant chemotherapy. There is a need for a prognostic analysis method that can be universally applied to various types of breast cancer.

[0008] Recently, research based on liquid biopsy techniques has progressed toward detecting chromosomal abnormalities using cell-free DNA (cfDNA) in plasma, which is released through cell necrosis, apoptosis, and secretion. In particular, tumor-derived cell-free DNA in blood carries tumor-specific chromosomal abnormalities and mutations that do not appear in normal cells, and because its half-life is short (about 2 hours), it reflects the current state of the tumor. Furthermore, because it can be collected non-invasively and repeatedly, cell-free DNA in blood is attracting attention as a tumor-specific biomarker in cancer diagnosis, monitoring, and prognosis.

[0009] With the advancement of molecular diagnostic technologies, studies have shown that tumor-specific chromosomal abnormalities can be detected in cell-free DNA from the blood of cancer patients through digital karyotyping, PARE analysis, and NGS, and clinically confirmed results have been published (Leary RJ et al., Sci Transl Med. Vol.4, Issue 162, 2012). Daniel G. Stover analyzed tissue-specific copy number alterations (CNAs) in cfDNA from 164 patients with metastatic triple-negative breast cancer (TNBC) (Stover DG. et al., J Clin Oncol. Vol.36(6):543-553). He found that copy number gain of specific genes such as NOTCH2, AKT2, and AKT3 was higher in metastatic TNBC than in primary TNBC, and that survival was statistically significantly lower in metastatic TNBC patients with duplication of the chromosomal regions 18q11 and 19p13.

[0010] Against this technical background, the inventors worked diligently to develop a method for predicting breast cancer prognosis based on cell-free DNA in blood. They confirmed that performing normalization calibration and regression analysis of chromosomal regions on cell-free DNA from blood samples obtained before anti-cancer treatment, and integrating the results with image interpretation information obtained after treatment, makes it possible to predict the prognosis of breast cancer patients with high sensitivity, thus completing the present invention.

Summary of Invention

Problem to Be Solved by the Invention

[0011] The object of the present invention is to provide a cell-free DNA (cfDNA)-based method for predicting the prognosis of breast cancer.

[0012] Another object of the present invention is to provide a device for predicting the prognosis of breast cancer.

[0013] A further object of the present invention is to provide a computer-readable medium containing instructions configured to be executed by a processor that predicts the prognosis of breast cancer in the manner described above.

[0014] A further object of the present invention is to provide a method for providing information for determining the prognosis of breast cancer, including the method described above.

[0015] A further object of the present invention is to provide a method for determining the prognosis of breast cancer, including the method described above.

Means for Solving the Problem

[0016] To achieve the above objects, the present invention provides a cell-free DNA (cfDNA)-based method for predicting the prognosis of breast cancer, comprising: a) obtaining sequence information of cell-free DNA isolated from a biological sample collected before anti-cancer treatment; b) aligning the sequence information (reads) to a reference genome database of a reference population; c) checking the quality of the aligned sequence information (reads) and selecting only the sequence information whose quality is equal to or greater than a cut-off value; d) dividing the standard chromosomes into fixed intervals (bins) and normalizing the selected sequence information (reads) based on the amount in each interval; e) determining the mean and standard deviation of the reads mapped to each normalized interval (bin) in the reference population, and calculating Z scores for the values normalized in step d); f) segmenting the chromosomes using the Z scores and calculating an I score; g) obtaining breast tissue image interpretation information after anti-cancer treatment; and h) determining that the prognosis for breast cancer is poor if the calculated I score is equal to or greater than a cut-off value and the breast tissue image interpretation information is positive.

[0017] The present invention also provides a cfDNA-based breast cancer prognosis prediction device comprising: a decoding unit that decodes the sequence information of cell-free DNA isolated from a biological sample collected before anti-cancer treatment; an alignment unit that aligns the decoded sequences to a standard chromosome sequence database of a reference population; a quality control unit that selects only aligned sequence information (reads) whose quality is equal to or greater than a cut-off value; an I-score calculation unit that calculates Z scores by comparing the selected reads with reference population samples and then calculates an I score from them; an image interpretation information receiving unit that acquires breast tissue image interpretation information after anti-cancer treatment; and a determination unit that determines that the prognosis for breast cancer is poor if the I score is equal to or greater than a cut-off value and the image interpretation information is positive.

[0018] The present invention further provides a computer-readable medium containing instructions configured to be executed by a processor to predict the prognosis of breast cancer, the instructions comprising: a) obtaining sequence information of cell-free DNA isolated from a biological sample collected before anti-cancer treatment; b) aligning the sequence information (reads) to a reference genome database of a reference population; c) checking the quality of the aligned sequence information (reads) and selecting only the sequence information whose quality is above a cut-off value; d) dividing the standard chromosomes into fixed intervals (bins) and normalizing the selected sequence information (reads) based on the amount in each interval; e) obtaining the mean and standard deviation of the reads mapped to each normalized interval (bin) in the reference population, and calculating Z scores for the values normalized in step d); f) segmenting the chromosomes using the Z scores and calculating an I score; g) obtaining breast tissue image interpretation information after anti-cancer treatment; and h) determining that the prognosis for breast cancer is poor if the calculated I score is equal to or greater than a cut-off value and the breast tissue image interpretation information is positive.

[0019] The present invention further provides a method for providing information for determining the prognosis of breast cancer, and a method for determining the prognosis of breast cancer, each including the method described above.

Brief Description of Drawings

[0020] [Figure 1] Overall flowchart for predicting the prognosis of breast cancer based on cfDNA according to the present invention.
[Figure 2] Number of sequencing reads before and after GC correction with the LOESS algorithm during quality control (QC) of the read data.
[Figure 3] Results of Kaplan-Meier analysis for predicting breast cancer progression and survival by the method of the present invention: (A) discovery cohort; (B) validation cohort.
[Figure 4] Results of risk analysis for breast cancer progression and survival by the method of the present invention: (A) discovery cohort; (B) validation cohort.
[Figure 5] Relationship between the I score of the present invention and pathological complete response (pCR), assessed by Kaplan-Meier analysis: (A) discovery cohort; (B) validation cohort.
[Figure 6] Prognosis prediction for the survival of breast cancer patients in groups subdivided by the I score of the present invention and pCR.
[Figure 7] Risk assessment in the prognosis prediction for the survival of breast cancer patients in groups subdivided by the I score of the present invention and pCR.

Mode for Carrying Out the Invention

[0021] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. In general, the nomenclature used herein and the experimental methods described below are well known and commonly used in the art.

[0022] Terms such as "first", "second", "A", and "B" can be used to describe various components, but the components are not limited by these terms, which serve only to distinguish one component from another. For example, the first component may be named the second component, and similarly the second component may be named the first component, without departing from the technical scope described below. The term "and/or" includes any combination of multiple related listed items or any one of those items.

[0023] As used herein, singular expressions include plural expressions unless the context clearly indicates otherwise, and terms such as "includes" indicate the presence of the described features, numbers, steps, operations, components, parts, or combinations thereof, without precluding the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

[0024] Before providing a detailed explanation of the drawings, it should be made clear that the classifications of components in this specification are merely based on the main function each component performs. That is, two or more components described below may be combined into one component, or one component may be further subdivided into two or more components based on more specific functions. Furthermore, each component described below may additionally perform some or all of the functions performed by other components, in addition to its own main function, and it is also possible that some of the main functions performed by each component may be exclusively performed by other components.

[0025] Furthermore, in carrying out a method or operation, each step of the method may occur in an order different from the specified order, unless the context clearly indicates a specific order. That is, each step may occur in the same order as specified, substantially simultaneously, or in the opposite order.

[0026] In the present invention, sequencing data obtained from breast cancer patient samples are normalized, aligned to the reference sequence, and divided into fixed intervals (bins); the read count in each bin is normalized; Z scores are calculated relative to reference population samples; the chromosomes are segmented again based on the resulting Z scores; and an I score is calculated from the segments. It was confirmed that an I score above the reference value indicates a poor prognosis and an I score below the reference value indicates a good prognosis. Specifically, patients can be classified into risk groups for death or progression due to breast cancer based on the I-score reference value and the presence or absence of pathological complete response (pCR): a patient whose I score is above the reference value and whose image interpretation information is negative is classified into a moderate-risk group; a patient whose I score is below the reference value and whose image interpretation information is positive, into a high-risk group; and a patient whose I score is above the reference value and whose image interpretation information is positive, into a very high-risk group.

[0027] Specifically, in one embodiment of the present invention, DNA extracted from the blood of 20 healthy individuals and of 456 breast cancer patients before anti-cancer treatment was sequenced, and quality control was performed using the LOESS algorithm. The chromosomes were divided into fixed intervals (bins), the number of reads mapped to each bin was normalized by GC ratio, and the mean and standard deviation of mapped reads per bin were obtained from the healthy individual samples. Z scores were calculated from the normalized values, chromosomal regions where the Z score changes abruptly were further segmented, and the I score was calculated from these segments and analyzed together with the pCR information of the breast cancer patients after anti-cancer treatment. On this basis, a method was developed that determines the prognosis of a breast cancer patient to be poor if the I score is 7.81 or higher and pCR is not achieved (Figure 1).

[0028] In this invention, the term "reads" means a single nucleic acid fragment whose sequence information has been analyzed using various methods known in the art. Therefore, in this specification, the terms "sequence information" and "reads" have the same meaning in that they are the result of obtaining sequence information through a sequencing process.

[0029] In the present invention, the term "prognosis prediction" is used interchangeably with "prognosis" and refers to predicting the course and outcome of a disease in advance. More specifically, because the course of a disease after treatment can vary with the patient's physiological or environmental state, prognosis prediction can be interpreted as any act of predicting the post-treatment course of the disease in light of the patient's condition.

[0030] For the purposes of the present invention, the prognosis prediction can be interpreted as the act of predicting the course of the disease in advance after treatment for breast cancer, and predicting the risk of cancer progression, cancer recurrence, and / or cancer metastasis. For example, the term "good prognosis" means that after treatment for breast cancer, the risk of cancer progression, cancer recurrence, and / or cancer metastasis in the patient is less than 1, and the breast cancer patient has a high probability of survival, and can also be expressed as "positive prognosis" in other senses. The term "bad prognosis" means that after treatment for breast cancer, the risk of cancer progression, cancer recurrence, and / or cancer metastasis in the patient is greater than 1, and the breast cancer patient has a high probability of death, and can also be expressed as "negative prognosis" in other senses.

[0031] In this invention, the term "risk" refers to the odds ratio, risk ratio, etc., for the probability that a patient will experience cancer progression, recurrence, and / or metastasis after treatment for breast cancer.
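As a minimal illustration of the ratios named in [0030] and [0031], the sketch below computes a risk ratio and an odds ratio from a hypothetical 2×2 table of event counts (for example, progression) in an exposed group versus a reference group. The `TwoByTwo` record, the function names, and the counts are illustrative assumptions, not data from this document. A ratio above 1 corresponds to the "bad prognosis" reading of [0030]; hazard ratios from survival models, as in the risk analyses of Figures 4 and 7, are not computed here.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Hypothetical event counts: exposed group (e.g. high I score) vs reference group."""
    events_exposed: int
    no_events_exposed: int
    events_reference: int
    no_events_reference: int


def risk_ratio(t: TwoByTwo) -> float:
    # Risk = events / total within each group; the risk ratio is their quotient.
    risk_exposed = t.events_exposed / (t.events_exposed + t.no_events_exposed)
    risk_reference = t.events_reference / (t.events_reference + t.no_events_reference)
    return risk_exposed / risk_reference


def odds_ratio(t: TwoByTwo) -> float:
    # Odds = events / non-events within each group; the odds ratio is their quotient.
    odds_exposed = t.events_exposed / t.no_events_exposed
    odds_reference = t.events_reference / t.no_events_reference
    return odds_exposed / odds_reference


table = TwoByTwo(events_exposed=20, no_events_exposed=80,
                 events_reference=10, no_events_reference=90)
print(round(risk_ratio(table), 2))  # 2.0
print(round(odds_ratio(table), 2))  # 2.25
```

The two ratios diverge as events become common, which is why [0031] names both rather than treating them as interchangeable.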

[0032] Accordingly, in one aspect, the present invention relates to a cell-free DNA (cfDNA)-based method for predicting the prognosis of breast cancer, comprising: a) obtaining sequence information of cell-free DNA isolated from a biological sample collected before anti-cancer treatment; b) aligning the sequence information (reads) to a reference genome database of a reference population; c) checking the quality of the aligned sequence information (reads) and selecting only the sequence information whose quality is equal to or greater than a cut-off value; d) dividing the standard chromosomes into fixed intervals (bins) and normalizing the selected sequence information (reads) based on the amount in each interval; e) determining the mean and standard deviation of the reads mapped to each normalized interval (bin) in the reference population, and calculating Z scores for the values normalized in step d); f) segmenting the chromosomes using the Z scores and calculating an I score; g) obtaining breast tissue image interpretation information after anti-cancer treatment; and h) determining that the prognosis for breast cancer is poor if the calculated I score is equal to or greater than the cut-off value and the breast tissue image interpretation information is positive.

[0033] In the present invention, the anti-cancer treatment may be any method capable of treating cancer, without limitation, and is preferably selected from the group consisting of neoadjuvant therapy, neoadjuvant chemotherapy, adjuvant chemotherapy, surgical treatment, and radiotherapy, but is not limited thereto.

[0034] In the present invention, step a) may comprise: (a-i) removing proteins, fats, and other residues from the collected cell-free DNA by a salting-out method, column chromatography, or a bead-based method to obtain purified nucleic acids; (a-ii) preparing a single-end or paired-end sequencing library from the purified nucleic acids; (a-iii) running the prepared library on a next-generation sequencer; and (a-iv) obtaining nucleic acid sequence information (reads) with the next-generation sequencer.

[0035] In the present invention, between step (a-i) and step (a-ii), the nucleic acids purified in step (a-i) may be randomly fragmented by enzymatic cleavage, pulverization, or the HydroShear method to produce a single-end or paired-end sequencing library.

[0036] In the present invention, step a) of acquiring sequence information may be characterized by sequencing the isolated cell-free DNA by whole-genome sequencing at a depth of 0.01× to 100×.
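Depths such as 0.01× to 100× are conventionally derived from read count N, read length L, and genome size G as the mean coverage C = N·L/G (the Lander-Waterman estimate). The sketch below assumes a haploid human genome of about 3.1 × 10^9 bp and counts each read once (single-end); the genome size default and function names are illustrative assumptions, not parameters from this document.

```python
def mean_depth(n_reads: int, read_length_bp: int, genome_size_bp: float = 3.1e9) -> float:
    """Mean coverage C = N * L / G; the genome size is an assumed default."""
    return n_reads * read_length_bp / genome_size_bp


def reads_for_depth(target_depth: float, read_length_bp: int,
                    genome_size_bp: float = 3.1e9) -> int:
    """Invert C = N * L / G to estimate how many reads a target depth requires."""
    return round(target_depth * genome_size_bp / read_length_bp)


# Shallow whole-genome sequencing examples within the 0.01x-100x range
print(round(mean_depth(10_000_000, 100), 3))  # 0.323
print(reads_for_depth(0.01, 100))             # 310000
```

The low end of the range is what makes shallow sequencing of cfDNA inexpensive: about 0.3 million 100 bp reads already give 0.01× coverage under these assumptions.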

[0037] In the present invention, the next-generation sequencer may include, but is not limited to, Illumina's HiSeq system, MiSeq system, or Genome Analyzer (GA) system, Roche's 454 FLX, Applied Biosystems' SOLiD system, or Life Technologies' Ion Torrent system.

[0038] In the present invention, the biological sample means any substance, biological fluid, tissue, or cell obtained from or derived from an individual, and includes, for example, whole blood, blood including leukocytes, peripheral blood mononuclear cells, leukocyte buffy coat, plasma, and serum, sputum, tears, mucus, nasal washes, nasal aspirate, breath, urine, semen, saliva, peritoneal washings, pelvic fluids, cystic fluid, meningeal fluid, amniotic fluid, glandular fluid, and pancreatic fluid. This may include, but is not limited to, fluids such as lymph fluid, pleural fluid, nipple aspirate, bronchial aspirate, synovial fluid, joint aspirate, organ secretions, cells, cell extract, semen, hair, saliva, urine, oral cells, placental cells, cerebrospinal fluid, and mixtures thereof.

[0039] In the present invention, the term "reference population" refers to a comparable reference group, such as a standard nucleotide sequence database, and means a population of people who do not currently have a specific disease or condition. The standard nucleotide sequences in the standard chromosome sequence database of the reference population may be reference chromosome sequences registered with a public institution such as the NCBI.

[0040] In the present invention, the alignment step may be performed using the BWA algorithm and the hg19 reference sequence, but is not limited thereto.

[0041] In the present invention, the alignment algorithm may include, but is not limited to, BWA-ALN, BWA-SW, or Bowtie2.

[0042] In the present invention, checking the quality of the aligned sequence information in step c) means determining how well the actual sequencing reads match the reference chromosome sequence, using the mapping quality score as an index.

[0043] In the present invention, step c) may comprise: (c-i) identifying the region of each aligned nucleic acid sequence; and (c-ii) selecting sequences that satisfy a mapping quality score cut-off and a GC ratio range within the region.

[0044] In the present invention, in step (c-i), the nucleic acid sequence region may be, but is not limited to, 20 kb to 1 Mb.

[0045] In the present invention, in step (c-ii), the mapping quality score cut-off may vary with the desired criteria but may specifically be 15 to 70, more specifically 30 to 65, and most specifically 60. The GC ratio may likewise vary with the desired criteria but may specifically be 20% to 70%, and more specifically 30% to 60%.
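Steps (c-i) and (c-ii) can be sketched as a simple filter over aligned reads. The sketch assumes each read is available as a hypothetical `AlignedRead` record (chromosome, position, MAPQ, sequence) rather than as a BAM file, and uses the most specific thresholds of [0045] (MAPQ of 60 or more, GC ratio of 30% to 60%). Evaluating the GC ratio per read, rather than per region, is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class AlignedRead:
    """Hypothetical minimal record for one aligned read."""
    chrom: str
    pos: int       # 0-based leftmost aligned position
    mapq: int      # mapping quality score
    sequence: str


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among A/C/G/T bases; ambiguous bases such as N are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def passes_qc(read: AlignedRead, min_mapq: int = 60,
              gc_min: float = 0.30, gc_max: float = 0.60) -> bool:
    """Keep reads at or above the MAPQ cut-off whose GC fraction lies in [gc_min, gc_max]."""
    return read.mapq >= min_mapq and gc_min <= gc_fraction(read.sequence) <= gc_max


def filter_reads(reads: Iterable[AlignedRead], **thresholds) -> List[AlignedRead]:
    return [r for r in reads if passes_qc(r, **thresholds)]


reads = [
    AlignedRead("chr1", 10_000, 60, "ACGTACGTAC"),  # GC 0.5, MAPQ 60: kept
    AlignedRead("chr1", 20_000, 12, "ACGTACGTAC"),  # MAPQ below cut-off
    AlignedRead("chr2", 30_000, 60, "GGGGCCCCGA"),  # GC 0.9, above range
]
print(len(filter_reads(reads)))  # 1
```

Passing keyword thresholds through `filter_reads` makes it easy to try the broader ranges of [0045] (for example `min_mapq=15`) without changing the filter itself.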

[0046] In the present invention, step c) may be performed excluding data from the centromeric or telomeric regions of the chromosomes.

[0047] In the present invention, the term "centromere" may refer to a region approximately 1 Mb from the starting point of each chromosome's long arm (q arm), but is not limited thereto.

[0048] In the present invention, the term "telomere" may refer to the region within approximately 1 Mb of the starting point of each chromosome's short arm (p arm) or within 1 Mb of the end point of the long arm (q arm), but is not limited thereto.
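A minimal sketch of the exclusion in [0046] to [0048]: bins are dropped if they overlap the first or last 1 Mb of a chromosome (telomeric regions) or a 1 Mb window on either side of the q-arm start (the centromeric region). Treating the centromeric window as extending to both sides of the q-arm start is an interpretive assumption, and the q-arm start coordinate must be supplied by the caller; the value below is a placeholder, not a real assembly coordinate.

```python
from typing import List, Tuple

MB = 1_000_000


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """Half-open interval overlap test: [a_start, a_end) vs [b_start, b_end)."""
    return a_start < b_end and b_start < a_end


def usable_bins(chrom_length: int, q_arm_start: int, bin_size: int = MB,
                flank: int = MB) -> List[Tuple[int, int]]:
    """Return (start, end) bins that avoid telomeric and centromeric flanks.

    Assumed interpretation: telomeric = first/last `flank` bp of the chromosome;
    centromeric = `flank` bp on either side of the q-arm start.
    """
    excluded = [
        (0, flank),                                  # p-arm telomere
        (chrom_length - flank, chrom_length),        # q-arm telomere
        (q_arm_start - flank, q_arm_start + flank),  # centromeric window (assumed)
    ]
    bins = []
    for start in range(0, chrom_length, bin_size):
        end = min(start + bin_size, chrom_length)
        if not any(overlaps(start, end, s, e) for s, e in excluded):
            bins.append((start, end))
    return bins


# Toy 10 Mb chromosome with a placeholder q-arm start at 4 Mb
kept = usable_bins(chrom_length=10 * MB, q_arm_start=4 * MB)
print([s // MB for s, _ in kept])  # [1, 2, 5, 6, 7, 8]
```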

[0049] In the present invention, step d) may comprise: (d-i) dividing a standard chromosome into fixed intervals (bins); (d-ii) calculating the number of reads aligned to each interval and the GC content of each read; (d-iii) performing regression analysis on the read counts and GC content to calculate regression coefficients; and (d-iv) normalizing the read counts using the regression coefficients.

[0050] In the present invention, the fixed interval (bin) in step (d-i) may specifically be 100 kb to 2000 kb.

[0051] In the present invention, in step (d-i), the fixed interval (bin) may be, but is not limited to, 100 kb to 2 Mb, specifically 500 kb to 1500 kb, more specifically 600 kb to 1600 kb, even more specifically 800 kb to 1200 kb, and most specifically 900 kb to 1100 kb.
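Steps (d-i) and (d-ii) can be sketched as bucketing each read's start position into fixed-size bins while accumulating its GC fraction. The sketch assumes reads arrive as hypothetical `(chromosome, position, gc_fraction)` tuples and uses a 1 Mb bin, which lies inside every range listed in [0051]; summarizing GC per bin as the mean over its reads is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

BIN_SIZE = 1_000_000  # 1 Mb, inside every range listed in [0051]


def count_reads_per_bin(reads: Iterable[Tuple[str, int, float]],
                        bin_size: int = BIN_SIZE) -> Dict[Tuple[str, int], Tuple[int, float]]:
    """Return {(chrom, bin_index): (read_count, mean_gc)} from (chrom, pos, gc) tuples."""
    counts = defaultdict(int)
    gc_sums = defaultdict(float)
    for chrom, pos, gc in reads:
        key = (chrom, pos // bin_size)  # bin index of the read's start position
        counts[key] += 1
        gc_sums[key] += gc
    return {key: (n, gc_sums[key] / n) for key, n in counts.items()}


reads = [("chr1", 150_000, 0.40), ("chr1", 900_000, 0.50), ("chr1", 1_200_000, 0.45)]
print(count_reads_per_bin(reads))
```

The resulting per-bin counts and GC values are the inputs that the regression in step (d-iii) works on.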

[0052] In the present invention, the regression analysis in step (d-iii) may be any regression method capable of calculating regression coefficients, such as LOESS analysis, but is not limited thereto.
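Steps (d-iii) and (d-iv) amount to regressing bin read counts on bin GC content and dividing out the fitted trend. The sketch below implements a small LOESS (local linear fit with tricube weights, one pass, no robustness iterations) in NumPy. The span `frac=0.3`, the per-bin GC input, and rescaling by the median count are assumptions, not parameters stated in this document.

```python
import numpy as np


def loess_fit(x: np.ndarray, y: np.ndarray, frac: float = 0.3) -> np.ndarray:
    """Local linear regression with tricube weights, evaluated at each x.

    Single pass with no robustness iterations; `frac` is the share of points per local fit.
    """
    n = len(x)
    k = max(2, int(np.ceil(frac * n)))  # neighbours used in each local fit
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        idx = np.argsort(dist)[:k]      # k nearest bins by GC value
        h = dist[idx].max()
        if h == 0:                      # all neighbours share the same GC value
            fitted[i] = y[idx].mean()
            continue
        w = np.sqrt((1 - (dist[idx] / h) ** 3) ** 3)       # sqrt of tricube weights for WLS
        X = np.column_stack([np.ones(k), x[idx] - x[i]])   # centred: intercept = fit at x[i]
        coef, *_ = np.linalg.lstsq(X * w[:, None], y[idx] * w, rcond=None)
        fitted[i] = coef[0]
    return fitted


def gc_normalize(counts, gc, frac: float = 0.3) -> np.ndarray:
    """Divide each bin count by its GC-predicted count and rescale to the median count."""
    counts = np.asarray(counts, dtype=float)
    expected = loess_fit(np.asarray(gc, dtype=float), counts, frac)
    expected = np.clip(expected, 1e-9, None)  # guard against non-positive fitted values
    return counts / expected * np.median(counts)


# Synthetic bins: linear GC bias plus one bin carrying a 1.5-fold gain
gc = np.linspace(0.35, 0.55, 50)
counts = 1000 * (1 + 2 * (gc - 0.40))
counts[25] *= 1.5
normalized = gc_normalize(counts, gc)
print(int(np.argmax(normalized)))  # 25
```

After correction the GC trend is removed and only the bin carrying the gain stands out, which is the behavior Figure 2 is described as showing for real read counts.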

[0053] In the present invention, calculating the Z score in step e) standardizes the sequencing read values for each interval (bin), and may more specifically be performed using Formula 1 below.

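$$Z_b = \frac{x_b - \mu_b}{\sigma_b} \qquad \text{(Formula 1)}$$

where x_b is the normalized read value of the test sample in bin b, and μ_b and σ_b are the mean and standard deviation of the normalized read values in bin b across the reference population.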
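Formula 1 can be sketched as a per-bin standardization against the reference samples (20 healthy individuals in the embodiment of [0027]). The sketch assumes the normalized bin values are already arranged as NumPy arrays, one row per reference sample and one column per bin; the use of the sample standard deviation (`ddof=1`) and returning NaN for bins without reference variation are assumptions.

```python
import numpy as np


def bin_z_scores(sample: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Z score per bin: (sample - reference mean) / reference SD.

    sample:    shape (n_bins,), normalized values of the test sample
    reference: shape (n_refs, n_bins), normalized values of reference samples
    """
    mu = reference.mean(axis=0)
    sd = reference.std(axis=0, ddof=1)  # sample SD across reference individuals
    sd = np.where(sd > 0, sd, np.nan)   # Z is undefined where the reference has no spread
    return (sample - mu) / sd


reference = np.array([[100.0, 200.0, 300.0],
                      [102.0, 198.0, 310.0],
                      [98.0, 202.0, 290.0]])
sample = np.array([100.0, 206.0, 300.0])
print(bin_z_scores(sample, reference))  # [0. 3. 0.]
```

The same 6-read excess gives Z = 3 in the second bin but would be unremarkable in the noisier third bin, which is why each bin is scaled by its own reference spread.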

[0054] In the present invention, step f) may comprise: (f-i) dividing chromosomal regions by the circular binary segmentation (CBS) method based on the Z score of each interval (bin); (f-ii) calculating the Z score of each segmented chromosomal region (segment) as the average of the Z scores of the bins contained in that region; (f-iii) performing local regression (LOESS) over the intervals (bins) to calculate the smoothed Z score Z_n, where n ∈ {1, ..., N} and N is the total number of bins; (f-iv) calculating the noise-related n_score using Equation 2; and

[Equation 2 and the remaining formulas of step f) are not reproduced in the source text.]
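Step (f-ii) assigns each segment the mean of its bin Z scores. The sketch below assumes segment boundaries are given as hypothetical half-open `(start, end)` bin-index pairs, such as a segmentation step would produce; it does not attempt to reconstruct Equation 2 or the I-score formula.

```python
from statistics import fmean
from typing import List, Sequence, Tuple


def segment_z_scores(bin_z: Sequence[float],
                     segments: List[Tuple[int, int]]) -> List[float]:
    """Mean bin Z score for each half-open segment [start, end) of bin indices."""
    return [fmean(bin_z[start:end]) for start, end in segments]


bin_z = [0.1, -0.2, 0.1, 3.0, 2.8, 3.2, 0.0]
segments = [(0, 3), (3, 6), (6, 7)]  # hypothetical breakpoints after bins 3 and 6
print(segment_z_scores(bin_z, segments))
```

Averaging within segments suppresses bin-to-bin noise, so a block of consistently elevated bins (here the middle segment) yields a single clear segment-level Z score.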

[0055] In the present invention, the CBS algorithm is a method for detecting the points at which the Z scores calculated in the preceding step change.

[0056] That is, let i be the start point and j the end point of a change in Z score along the chromosome, N the total number of bins in the region, r the value of each bin (fixed interval), and s the standard deviation of the bin values. The following formulas then hold under the condition 1 ≤ i < j ≤ N.

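$$S_i = \sum_{k=1}^{i} r_k$$

$$D_{ij} = \frac{S_j - S_i}{j - i} - \frac{S_N - S_j + S_i}{N - j + i}$$

$$T_{ij} = \frac{D_{ij}}{s\sqrt{\dfrac{1}{j - i} + \dfrac{1}{N - j + i}}}$$

$$T = \max_{1 \le i < j \le N} \lvert T_{ij} \rvert$$

$$(i_c, j_c) = \arg\max_{1 \le i < j \le N} \lvert T_{ij} \rvert$$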

[0057] Here, (i_c, j_c) denotes the positions at which the change in the Z-score actually occurs; max denotes the maximum value, and arg max denotes the argument (i, j) at which that maximum is attained.
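The following Python sketch illustrates the circular binary segmentation statistic in its standard published form (Olshen et al., 2004), which the definitions in [0056] and [0057] appear to follow. For every arc of bins i+1..j, the mean inside the arc is compared with the mean outside it and scaled by s; the arc with the largest absolute statistic gives (i_c, j_c), and segmentation recurses on the pieces. Two simplifications are assumptions of this sketch, not statements from the patent: a fixed `threshold` of 5.0 replaces the permutation-based significance test of published CBS, and when `s` is not supplied it is estimated from first differences of the bin values.

```python
import numpy as np


def cbs_statistic(r, s):
    """Return (max |T_ij|, i_c, j_c); the arc is bins i..j-1 (0-based), i.e. i+1..j 1-based."""
    r = np.asarray(r, dtype=float)
    n = len(r)
    S = np.concatenate([[0.0], np.cumsum(r)])  # S[i] = r_1 + ... + r_i
    best, best_ij = 0.0, (0, n)
    for i in range(n):
        j = np.arange(i + 1, n + 1)
        inside = j - i
        outside = n - inside
        keep = outside > 0  # an arc covering every bin has nothing to compare against
        j, inside, outside = j[keep], inside[keep], outside[keep]
        if j.size == 0:
            continue
        diff = (S[j] - S[i]) / inside - (S[n] - S[j] + S[i]) / outside
        t = np.abs(diff) / (s * np.sqrt(1.0 / inside + 1.0 / outside))
        k = int(np.argmax(t))
        if t[k] > best:
            best, best_ij = float(t[k]), (i, int(j[k]))
    return best, best_ij[0], best_ij[1]


def cbs_segment(r, s=None, threshold=5.0, min_size=2, offset=0):
    """Recursively split r at significant arcs; returns (start, end) bin ranges, end exclusive."""
    r = np.asarray(r, dtype=float)
    n = len(r)
    if n < 2 * min_size:
        return [(offset, offset + n)]
    if s is None:
        # noise scale from first differences; less inflated by step changes than r.std()
        s = np.std(np.diff(r), ddof=1) / np.sqrt(2.0)
    if s == 0:
        return [(offset, offset + n)]
    t, i, j = cbs_statistic(r, s)
    if t < threshold:
        return [(offset, offset + n)]
    cuts = sorted({0, i, j, n})
    segments = []
    for a, b in zip(cuts[:-1], cuts[1:]):
        segments += cbs_segment(r[a:b], s, threshold, min_size, offset + a)
    return segments
```

The search is quadratic in the number of bins, which is acceptable for illustration at roughly 3,000 one-megabase bins but slower than published implementations.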

[0058] In the present invention, the reference value (cut-off value) of the I score may be any value that allows prognosis prediction; it is preferably 5 to 10 and most preferably 7.81, but is not limited thereto.

[0059] In the present invention, any breast tissue image may be used as long as it can confirm the presence or absence of cancer cells after anti-cancer treatment. Preferably, the breast tissue image may be a magnetic resonance imaging (MRI) image, an image of a histochemically stained breast tissue sample, an ultrasound image, an X-ray image, or an image of a fluorescence-stained breast tissue sample. More preferably, it is selected from the group consisting of images of histochemically stained breast tissue samples and images of fluorescence-stained breast tissue samples, but is not limited thereto.

[0060] In the present invention, a positive result in the breast tissue image interpretation information means that cancer cells are confirmed in the image, and a negative result means that cancer cells are not confirmed in the image.

[0061] In the present invention, the breast tissue image interpretation information can be used as an indicator for determining pathological complete response (pCR). Pathological complete response is defined as the absence of invasive breast cancer after a breast cancer patient has undergone neoadjuvant (preoperative) therapy and surgery.

[0062] In the present invention, the method may further include the step of classifying the patient as a moderate-risk group if the I score is above the reference value and the image interpretation information is negative, as a high-risk group if the I score is below the reference value and the image interpretation information is positive, and as a very high-risk group if the I score is above the reference value and the image interpretation information is positive.
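The three-tier rule in [0062] maps directly onto a small decision function. In this hypothetical sketch, the default cut-off is the preferred value of 7.81, and "above the reference value" is implemented as `>=` to match the "equal to or greater than" wording of claim 1. The case of a low I score with a negative image is not named in [0062], so the sketch returns a placeholder label for it.

```python
def classify_risk(i_score, image_positive, cutoff=7.81):
    """Risk tier per [0062]; the 'unassigned' label is a hypothetical placeholder."""
    high_i = i_score >= cutoff
    if high_i and image_positive:
        return "very high risk"
    if high_i:
        return "moderate risk"   # high I score, negative image
    if image_positive:
        return "high risk"       # low I score, positive image
    return "unassigned"          # low I score, negative image: no tier named in [0062]
```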

[0063] In another aspect, the present invention relates to a cfDNA-based breast cancer prognosis prediction device comprising: a decoding unit that decodes the sequence information of cell-free DNA isolated from a biological sample before anti-cancer treatment; an alignment unit that aligns the decoded sequences to a standard chromosome sequence database of a reference population; a quality control unit that selects, from the aligned sequence information (reads), only the sequence information of samples that meets or exceeds a cut-off value; an I-score calculation unit that calculates the Z-score by comparing the selected sequence information (reads) with reference population samples and then calculates the I score based on this Z-score; an image interpretation information receiving unit that acquires breast tissue image interpretation information after anti-cancer treatment; and a determination unit that determines that the prognosis for breast cancer is poor if the I score is above the cut-off value and the image interpretation information is positive.

[0064] In the present invention, the decoding unit may include a nucleic acid injection unit into which nucleic acids extracted by a separate device are injected, and a sequence information analysis unit that analyzes the sequence information of the injected nucleic acids. The decoding unit is preferably an NGS analyzer, but is not limited thereto.

[0065] In the present invention, the decoding unit may be characterized by receiving and decoding sequence information data generated by an independent device.

[0066] In the present invention, the image interpretation information receiving unit may be characterized by receiving image interpretation information generated by an independent device.

[0067] In yet another aspect, the present invention relates to a computer-readable medium containing instructions configured to be executed by a processor that predicts the prognosis of breast cancer, the instructions comprising: a) obtaining sequence information of cell-free DNA isolated from a biological sample before anti-cancer treatment; b) aligning the sequence information (reads) to a reference genome database of a reference population; c) checking the quality of the aligned sequence information (reads) and selecting only sequence information that is above a cut-off value; d) dividing the standard chromosome into intervals (bins) and normalizing the selected sequence information (reads) by checking the amount in each interval; e) determining the mean and standard deviation of the reads matched to each normalized interval (bin) of the reference population, and then calculating the Z score of the values normalized in step d); f) dividing the chromosome using the Z score and calculating the I score; g) obtaining breast tissue image interpretation information after anti-cancer treatment; and h) determining that the prognosis for breast cancer is poor if the calculated I score is above the cut-off value and the breast tissue image interpretation information is positive.

[0068] In yet another aspect, the present invention relates to a method for providing information for determining the prognosis of breast cancer, including the method described above.

[0069] In the present invention, the term "breast cancer" includes, without limitation, all types of cancer that occur in the breast, and more specifically includes, but is not limited to, ductal carcinoma in situ, inflammatory carcinoma, invasive ductal carcinoma, invasive lobular carcinoma, and lobular carcinoma in situ.

[0070] In the present invention, the term "prognosis" refers to a prediction of the likelihood of cancer progression, recurrence, and/or metastasis. The prediction method of the present invention can be used to select the most appropriate treatment for a given patient and to make clinical treatment decisions. It is a valuable prognostic and/or diagnostic aid for determining whether a patient is likely to experience cancer progression, recurrence, and/or metastasis.

[0071] In other embodiments, the method of the present invention may be implemented using a computer. In one embodiment, the computer includes one or more processors connected to a chipset. The chipset is connected to memory, a storage device, a keyboard, a graphics adapter, a pointing device, a network adapter, and the like. In one embodiment, the functionality of the chipset is provided by a memory controller hub and an I/O controller hub. In another embodiment, the memory may be connected directly to the processor instead of to the chipset. The storage device is any device capable of holding data, such as a hard drive, a CD-ROM (Compact Disk Read-Only Memory), a DVD, or another memory device. The memory holds the data and instructions used by the processor. The pointing device may be a mouse, a trackball, or another type of pointing device, and is used together with the keyboard to input data into the computer system. The graphics adapter displays images and other information on a display. The network adapter connects the computer system to a short-range or long-range communication network. However, the computer used in this application is not limited to the above configuration; it may lack some components, include additional components, or be part of a Storage Area Network (SAN). The computer of this application may be configured to execute the program modules that perform the method according to this application.

[0072] In this application, a "module" may mean a functional and structural combination of hardware for implementing the technical concept of this application and software for driving that hardware. For example, a module may mean a logical unit of predetermined code together with the hardware resources on which that code runs, and it will be obvious to those skilled in the art that it does not necessarily mean physically connected code or a single type of hardware.

[Examples]

[0073] The present invention will be described in more detail below through examples. It will be obvious to those of ordinary skill in the art that these examples are for illustrative purposes only and that the scope of the present invention should not be construed as being limited by them.

[0074] Example 1. Calculation of the I score in breast cancer patients and healthy individuals

[0075] The I score was calculated using the method described in Korean Patent No. 10-2019-0019315.

[0076] More specifically, cell-free DNA was extracted from pre-treatment plasma samples of 456 breast cancer patients who participated in the PEARLY clinical trial (NCT02441933), received neoadjuvant therapy, and proceeded to surgery, as well as from plasma samples of 20 healthy individuals, and whole-genome libraries were prepared. Cell-free DNA was extracted as follows: 1) blood was collected in EDTA tubes, and within 4 hours the supernatant (plasma) was separated by sequential centrifugation at 1,600 × g for 10 minutes and then at 3,000 × g for 10 minutes; 2) cell-free DNA was extracted from 0.6 ml of the separated plasma using a plasma circulating DNA kit (Tiangen, China); 3) the concentration (ng/μl) of the extracted cell-free DNA was measured with a Qubit 2.0 fluorometer. Libraries were prepared with the MGIEasy Cell-free DNA Library Prep Kit (MGI, China), using a total of 2-6 ng of cell-free DNA per reaction.

[0077] The completed libraries were sequenced on a DNBSEQ-G400 sequencing system (MGI), yielding an average of 17 million reads per sample.
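To give a sense of scale, the short calculation below estimates how many reads fall into each bin at this sequencing output. It assumes a genome length of about 3.1 Gb for hg19, 1 Mb bins, and perfectly uniform coverage, and it ignores reads removed by quality control. Under those assumptions each bin receives roughly 5,500 reads, so Poisson counting noise alone is only about 1.4% per bin, which helps explain why such shallow sequencing can still resolve per-bin copy-number deviations.

```python
import math

READS_PER_SAMPLE = 17_000_000  # average from the example above
GENOME_BP = 3.1e9              # assumed approximate hg19 length
BIN_BP = 1_000_000             # assumed 1 Mb bins


def expected_reads_per_bin(reads=READS_PER_SAMPLE, genome_bp=GENOME_BP, bin_bp=BIN_BP):
    """Return (number of bins, mean reads per bin, Poisson coefficient of variation)."""
    n_bins = math.ceil(genome_bp / bin_bp)
    per_bin = reads / n_bins
    poisson_cv = 1 / math.sqrt(per_bin)  # relative counting noise if counts were Poisson
    return n_bins, per_bin, poisson_cv
```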

[0078] The BCL files (containing the base-call information) produced by the next-generation sequencing (NGS) instrument were converted to FASTQ format, and the reads in the FASTQ files were aligned to the hg19 reference genome using the BWA-MEM algorithm. A mapping quality score of 60 was confirmed.
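Step c) (see also claims 5 and 6) selects reads by mapping quality and GC ratio. The sketch below applies both filters per read, which is an assumption, since the text could also be read as applying the GC criterion per region. The `Read` class is a hypothetical stand-in for BAM records (for example, as read with pysam); the default minimum mapping quality of 60 follows the value reported in this example, and the 30% to 60% GC window is taken from claim 6.

```python
from dataclasses import dataclass


@dataclass
class Read:
    """Hypothetical minimal aligned-read record."""
    chrom: str
    pos: int
    mapq: int
    seq: str


def gc_fraction(seq):
    """Fraction of G and C bases in a read sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def passes_qc(read, min_mapq=60, gc_range=(0.30, 0.60)):
    """Keep a read only if both its mapping quality and GC fraction are acceptable."""
    low, high = gc_range
    return read.mapq >= min_mapq and low <= gc_fraction(read.seq) <= high
```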

[0079] We confirmed that the distribution of sequencing reads across chromosomal loci (bins) is biased by GC content (Figure 2), and used regression analysis to correct the number of aligned reads in each bin for its GC ratio.

[0080] Subsequently, the Z score was calculated using formula 1 below.

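[Formula 1]

$$Z_k = \frac{x_k - \mu_k}{\sigma_k}$$

(where $x_k$ is the normalized read count of the test sample in bin $k$, and $\mu_k$ and $\sigma_k$ are the mean and standard deviation of the normalized read counts of the reference population in the same bin)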

[0081] To calculate the I score, the Z-scores of the individual bins were used as input, and the chromosomes were first segmented using the CBS algorithm.

[0082] Subsequently, the I score was calculated through the following steps.

[0083] i) calculating the Z-score of each segmented chromosomal region (segment) as the mean of the Z-scores calculated for the bins contained in that region; ii) performing local regression analysis (LOESS) over the bins and calculating the smoothed Z-score (Z_n), where n ∈ {1, ..., N} and N is the total number of bins; iii) calculating the noise-related n_score using Formula 2; and

[Formula 2]

(in Formula 2, [symbol] denotes the Z-score calculated for each bin)

iv) calculating the I score using Formula 3.

[Formula 3]

(in Formula 3, [symbol] denotes the Z-score calculated for each region (segment))

[0084] Example 2.1 Confirmation of the effect of I scores on breast cancer progression and survival

[0085] The breast cancer patients of Example 1 were divided into an exploratory group of 232 patients and a validation group of 233 patients. In the exploratory group, the association between the I score and disease-free survival (DFS) was analyzed by univariate Cox regression and the maximal log-rank test. Patients with an I score of 7.81 or higher showed significantly shorter DFS and a higher hazard ratio (HR) for DFS events (Figures 3A and 4A). The same results were confirmed in the validation group (Figures 3B and 4B).
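The maximal log-rank test mentioned above scans candidate cut-offs of a continuous score and keeps the one whose two-group log-rank statistic is largest. The following NumPy sketch shows that idea. The function names, the minimum group fraction of 10%, and the toy data used to check it are hypothetical. The sketch also omits the multiple-testing correction that a maximally selected statistic requires, so its chi-square should not be read as an ordinary 1-df p-value, and it does not reproduce the 7.81 cut-off reported above.

```python
import numpy as np


def logrank_chi2(time, event, group):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    group = np.asarray(group, dtype=bool)
    o_minus_e, var = 0.0, 0.0
    for t in np.unique(time[event]):  # distinct event times
        at_risk = time >= t
        n = at_risk.sum()
        n1 = (at_risk & group).sum()
        dead = event & (time == t)
        d = dead.sum()
        d1 = (dead & group).sum()
        o_minus_e += d1 - d * n1 / n  # observed minus expected events in the group
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / var if var > 0 else 0.0


def maximal_logrank_cutoff(score, time, event, min_frac=0.1):
    """Return (cut-off, chi-square) maximizing the log-rank statistic for score >= cut-off."""
    score = np.asarray(score, dtype=float)
    n = len(score)
    best_cut, best_chi2 = None, -1.0
    for cut in np.unique(score):
        high = score >= cut
        if min(high.sum(), n - high.sum()) < min_frac * n:
            continue  # skip cut-offs that leave one group too small
        chi2 = logrank_chi2(time, event, high)
        if chi2 > best_chi2:
            best_cut, best_chi2 = float(cut), chi2
    return best_cut, best_chi2
```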

[0086] [Table 1] (Abbreviations: IQR, interquartile range; SD, standard deviation; pCR, pathologic complete response; CNA, copy number aberration. *Calculated by t-test; **calculated by Fisher's exact test.)

[0087] Example 3.1 Confirmation of the relationship between I scores and pCR

[0088] Multivariate Cox analysis was used to examine the relationship between the I score and pathological complete response (pCR), a strong prognostic factor in breast cancer. As shown in Figure 5, in the exploratory group an I score at or above the reference value was associated with shorter disease-free survival (DFS) regardless of whether pCR was achieved.

[0089] Furthermore, because pCR and the I score each act as independent prognostic predictors, we confirmed that combining the two and dividing patients into four groups allows a more refined prognosis prediction (Figures 6 and 7).

[0090] The four groups are: (1) I score at or above the reference value without pCR; (2) I score at or above the reference value with pCR; (3) I score below the reference value without pCR; and (4) I score below the reference value with pCR. As shown in Figures 6 and 7, group 1 had the worst prognosis and group 4 the best.

[0091] While specific aspects of the present invention have been described in detail above, it will be clear to those of ordinary skill in the art that these specific descriptions are merely preferred embodiments and do not limit the scope of the invention. Therefore, the substantial scope of the present invention is defined by the appended claims and their equivalents.

[Industrial applicability]

[0092] The breast cancer prognosis prediction method according to the present invention not only improves the accuracy of prognosis prediction for breast cancer patients by using next-generation sequencing (NGS) techniques, but also improves the accuracy of prognosis prediction based on very low concentrations of cell-free DNA, which were previously difficult to detect, thereby increasing its commercial applicability. Therefore, the method of the present invention is useful for determining the prognosis of breast cancer patients.

Claims

1. A cell-free DNA (cfDNA)-based breast cancer prognosis prediction method comprising the following steps: a) obtaining sequence information of cell-free DNA isolated from a biological sample before anti-cancer treatment; b) aligning the sequence information (reads) to a reference genome database of a reference population; c) checking the quality of the aligned sequence information (reads) and selecting only sequence information that is equal to or above a cut-off value; d) dividing the standard chromosome into intervals (bins) and normalizing the selected sequence information (reads) by checking the amount in each interval; e) determining the mean and standard deviation of the reads matched to each normalized interval (bin) of the reference population, and then calculating the Z-score of the values normalized in step d); f) dividing the chromosomes using the Z-score and calculating the I score; g) obtaining predetermined breast tissue image interpretation information after anti-cancer treatment; and h) determining that the prognosis for breast cancer is poor if the calculated I score is equal to or greater than the cut-off value and the breast tissue image interpretation information is positive; characterized in that step (f) comprises the following steps: (f-i) dividing the chromosomal regions by the CBS (Circular Binary Segmentation) method based on the Z-score of each interval (bin); (f-ii) calculating the Z-score of each segmented chromosomal region (segment) as the mean of the Z-scores calculated for the intervals (bins) contained in that region; (f-iii) performing local regression analysis (LOESS) for each interval (bin) and calculating the smoothed Z-score (Z_n), where n ∈ {1, ..., N} and N is the total number of bins; (f-iv) calculating the noise-related n_score using Formula 2; and [Math 2] (where [symbol] denotes the Z-score calculated for each bin) (f-v) calculating the I score using Formula 3 [Math 3] (where [Math 4] denotes the Z-score calculated for each region (segment)); the breast tissue image is selected from the group consisting of histochemically stained breast tissue sample images and fluorescence-stained breast tissue sample images; and the reference value (cut-off value) in step h) is 7.81.

2. The cfDNA-based breast cancer prognosis prediction method according to claim 1, characterized in that step a) comprises the following steps: (a-i) removing proteins, fats, and other residues from the collected cell-free DNA by a salting-out method, a column chromatography method, or a bead-based method to obtain purified nucleic acids; (a-ii) preparing a single-end or paired-end sequencing library from the purified nucleic acids; (a-iii) loading the prepared library onto a next-generation sequencer; and (a-iv) obtaining nucleic acid sequence information (reads) with the next-generation sequencer.

3. The cfDNA-based method for predicting the prognosis of breast cancer according to claim 2, characterized in that the method further comprises, between step (a-i) and step (a-ii), the step of randomly fragmenting the nucleic acids purified in step (a-i) by enzymatic cleavage, pulverization, or a hydroshear method to produce a single-end or paired-end sequencing library.

4. The cfDNA-based method for predicting the prognosis of breast cancer according to claim 1, characterized in that the sequence information in step a) is obtained by whole-genome sequencing of the isolated cell-free DNA at a depth of 0.01 to 100 reads.

5. The cfDNA-based method for predicting the prognosis of breast cancer according to claim 1, characterized in that step c) is carried out by a method comprising the following steps: (c-i) The step of identifying the region of each aligned nucleic acid sequence; and (c-ii) A step of selecting sequences that satisfy the mapping quality score and GC ratio criteria within the region.

6. The cfDNA-based method for predicting the prognosis of breast cancer according to claim 5, characterized in that the reference values are a mapping quality score of 15 to 70 and a GC ratio of 30 to 60%.

7. The cfDNA-based method for predicting the prognosis of breast cancer according to claim 5, characterized in that step c) is performed by excluding data from the centromeric or telomeric regions of the chromosome.

8. The cfDNA-based method for predicting the prognosis of breast cancer according to claim 1, characterized in that step d) comprises the following steps: (d-i) dividing the standard chromosome into specific intervals (bins); (d-ii) calculating the number of reads aligned in each interval and the GC content of each read; (d-iii) performing regression analysis based on the number of reads and the GC content and calculating the regression coefficients; and (d-iv) normalizing the number of reads using the regression coefficients.

9. The cfDNA-based method for predicting the prognosis of breast cancer according to claim 8, characterized in that a certain interval (bin) in (d-i) is 100kb to 2Mb.

10. The cfDNA-based method for predicting the prognosis of breast cancer according to claim 1, characterized in that step e) above is calculated using the following formula 1. [Math 1]

11. The cfDNA-based method for predicting the prognosis of breast cancer according to claim 1, characterized in that a positive result in the breast tissue image interpretation information means that cancer cells are confirmed in the image.

12. The cfDNA-based method for predicting the prognosis of breast cancer according to claim 1, further comprising the steps of: classifying the patient as a moderate-risk group if the I score is above the reference value and the image interpretation information is negative; classifying the patient as a high-risk group if the I score is below the reference value and the image interpretation information is positive; and classifying the patient as a very high-risk group if the I score is above the reference value and the image interpretation information is positive.

13. A method for providing information for determining the prognosis of breast cancer, comprising the step of predicting the prognosis of breast cancer by the method according to any one of claims 1 to 12.

14. A cfDNA-based breast cancer prognosis prediction device comprising: a decoding unit that decodes the sequence information of cell-free DNA isolated from a biological sample before anti-cancer treatment; an alignment unit that aligns the decoded sequences to a standard chromosome sequence database of a reference population; a quality control unit that selects, from the aligned sequence information (reads), only the sequence information of samples that is above a cut-off value; an I-score calculation unit that calculates the Z-score by comparing the selected sequence information (reads) with reference population samples and then calculates the I score based on this Z-score; an image interpretation information receiving unit that acquires predetermined breast tissue image interpretation information after anti-cancer treatment; and a determination unit that determines that the prognosis of breast cancer is poor if the I score is above the reference value (cut-off value) and the image interpretation information is positive; characterized in that the I-score calculation unit calculates the I score by a process comprising the following steps: (i) dividing the chromosomal regions by the Circular Binary Segmentation (CBS) method based on the Z-score of each bin; (ii) calculating the Z-score of each segmented chromosomal region (segment) as the mean of the Z-scores calculated for the intervals (bins) contained in that region; (iii) performing local regression analysis (LOESS) for each interval (bin) and calculating the smoothed Z-score (Z_n), where n ∈ {1, ..., N} and N is the total number of bins; (iv) calculating the noise-related n_score using Formula 2; and [Math 2] (where [symbol] denotes the Z-score calculated for each bin) (v) calculating the I score using Formula 3 [Math 3] (where [Math 4] denotes the Z-score calculated for each region (segment)); the breast tissue image is selected from the group consisting of histochemically stained breast tissue sample images and fluorescence-stained breast tissue sample images; and the reference value used in the determination unit is 7.81.

15. A computer-readable medium containing instructions configured to be executed by a processor that predicts the prognosis of breast cancer, the instructions comprising: a) obtaining sequence information of cell-free DNA isolated from a biological sample before anti-cancer treatment; b) aligning the sequence information (reads) to a reference genome database of a reference population; c) checking the quality of the aligned sequence information (reads) and selecting only sequence information that is equal to or above a cut-off value; d) dividing the standard chromosome into intervals (bins) and normalizing the selected sequence information (reads) by checking the amount in each interval; e) determining the mean and standard deviation of the reads matched to each normalized interval (bin) of the reference population, and then calculating the Z-score of the values normalized in step d); f) dividing the chromosomes using the Z-score and calculating the I score; g) obtaining predetermined breast tissue image interpretation information after anti-cancer treatment; and h) determining that the prognosis for breast cancer is poor if the calculated I score is equal to or greater than the cut-off value and the breast tissue image interpretation information is positive; characterized in that step (f) comprises the following steps: (f-i) dividing the chromosomal regions by the CBS (Circular Binary Segmentation) method based on the Z-score of each interval (bin); (f-ii) calculating the Z-score of each segmented chromosomal region (segment) as the mean of the Z-scores calculated for the intervals (bins) contained in that region; (f-iii) performing local regression analysis (LOESS) for each interval (bin) and calculating the smoothed Z-score (Z_n), where n ∈ {1, ..., N} and N is the total number of bins; (f-iv) calculating the noise-related n_score using Formula 2; and [Math 2] (where [symbol] denotes the Z-score calculated for each bin) (f-v) calculating the I score using Formula 3 [Math 3] (where [Math 4] denotes the Z-score calculated for each region (segment)); the breast tissue image is selected from the group consisting of histochemically stained breast tissue sample images and fluorescence-stained breast tissue sample images; and the reference value (cut-off value) in step h) is 7.81.