Method for constructing hematological tumor clonal evolution map and related equipment

A clonal evolution map is constructed by extracting a set of complementarity-determining region (CDR) sequences from batch sequencing data, filtering out pseudoclones, clustering by edit distance, and applying evolutionary co-validation. This addresses the low accuracy of existing clonal evolution maps for hematologic malignancies, improving accuracy while lowering experimental cost.

CN122245432A · Pending · Publication Date: 2026-06-19 · SHENZHEN NEOIMMUNE CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Application (China)
Current Assignee / Owner
SHENZHEN NEOIMMUNE CO LTD
Filing Date
2026-01-30
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing methods for constructing clonal evolution maps of hematologic malignancies have limited accuracy and struggle to distinguish sequencing errors from actual clonal evolution events.

Method used

A set of complementarity-determining region (CDR) sequences is extracted from batch sequencing data, pseudoclones are filtered out, the remaining sequences are clustered by edit distance, and evolutionary co-validation is applied to construct a clonal evolution map. Pseudoclones are identified from frequency changes across time points.

Benefits of technology

It improves the accuracy of constructing clonal evolution maps of hematologic tumors, reduces the complexity and cost of experimental systems, and enables effective identification of pseudoclones.

✦ Generated by Eureka AI based on patent content.


Abstract

This application provides a method and related equipment for constructing a clonal evolution map of hematologic malignancies. The method includes acquiring batch sequencing data and extracting a set of complementarity-determining region (CDR) sequences from the batch sequencing data; filtering the CDR sequence set for pseudoclones to obtain a first sequence set; clustering the CDR sequences in the first sequence set based on edit distance to obtain multiple clonal clusters; performing evolutionary co-validation on clone pairs in the clonal clusters at multiple time points to obtain an evolutionary trend score; filtering the clonal clusters at multiple time points based on the evolutionary trend score to obtain a clonal evolution time series; and constructing a clonal evolution map based on the clonal evolution time series. This method can improve the accuracy of the constructed clonal evolution map.

Description

Technical Field

[0001] This application relates to the field of computer technology, and in particular to a method and related equipment for constructing a clonal evolution map of hematologic tumors.

Background Technology

[0002] Batch sequencing refers to the technical methods and strategies for performing high-throughput sequencing analysis on a large number of nucleic acid samples simultaneously in a single experimental procedure, so as to achieve simultaneous detection and data output of multiple samples, thereby improving the efficiency of sequencing experiments and reducing the cost of sequencing a single sample.

[0003] Sequencing errors may occur during batch sequencing, and clonal evolution events often produce sequence differences that resemble those errors. Filtering error-containing reads out of batch sequencing data can therefore improve the accuracy of clonal evolution maps constructed from that data.

[0004] However, the accuracy of clonal evolution maps constructed by current methods remains low.

Summary of the Invention

[0005] The main purpose of this application is to propose a method and related equipment for constructing a clonal evolution map of hematologic tumors, so as to improve the accuracy of the constructed clonal evolution map.

[0006] To achieve the above objectives, this application proposes a method for constructing a clonal evolution map of hematologic malignancies, comprising:

[0007] Acquiring batch sequencing data and extracting a set of complementarity-determining region sequences from the batch sequencing data; performing pseudoclone filtering on the set of complementarity-determining region sequences to obtain a first sequence set; clustering the complementarity-determining region sequences in the first sequence set based on edit distance to obtain multiple clonal clusters; performing evolutionary co-validation on clone pairs in the clonal clusters at multiple time points to obtain evolutionary trend scores; and filtering the clonal clusters at multiple time points based on the evolutionary trend scores to obtain a clonal evolution time series, and constructing a clonal evolution map based on the clonal evolution time series.

[0008] Optionally, in one implementation, performing evolutionary co-validation on clone pairs in the clonal clusters to obtain an evolutionary trend score includes: determining, for clone pairs at multiple time points, the first abundance change rate of the parent clone and the second abundance change rate of the daughter clone; and determining the evolutionary trend score based on the first and second abundance change rates.

[0009] Optionally, in one implementation, determining the evolutionary trend score based on the first abundance change rate and the second abundance change rate includes: determining the evolutionary trend based on the first abundance change rate and the second abundance change rate; and determining the evolutionary trend score based on the evolutionary trend of each clone pair at each time point.

[0010] Optionally, in one implementation, determining an evolutionary trend score based on the corresponding evolutionary trend of each clone pair at each time point includes: Obtain the weight at each time point; The evolutionary trend of each clone pair at each time point is weighted according to the weight of each time point to obtain the evolutionary trend score.

[0011] Optionally, in one implementation, constructing a clonal evolution map based on clonal evolution time series includes: Convert the evolutionary relationships of clone pairs in the clonal evolution time series into directed edges; Clonal evolution graphs are constructed based on directed edges.

[0012] Optionally, in one implementation, performing pseudoclone filtering on the set of complementarity-determining region sequences to obtain a first sequence set includes: filtering the set of complementarity-determining region sequences for pseudoclones based on preset rules to obtain the first sequence set. The preset rules include one or more of the following: frequency ratio, edit distance, and sequence inclusion relationship.

[0013] Optionally, in one implementation, extracting a set of complementarity-determining region sequences from batch sequencing data includes: performing sequence structure analysis on the batch sequencing data to obtain the set of complementarity-determining region sequences.

[0014] This application also proposes a device for constructing a clonal evolution map of hematologic malignancies, comprising: an acquisition unit, used to acquire batch sequencing data and extract a set of complementarity-determining region sequences from the batch sequencing data; a filtering unit, used to perform pseudoclone filtering on the set of complementarity-determining region sequences to obtain a first sequence set; a clustering unit, used to cluster the complementarity-determining region sequences in the first sequence set based on edit distance to obtain multiple clonal clusters; a verification unit, used to perform evolutionary co-validation on clone pairs in the clonal clusters at multiple time points to obtain an evolutionary trend score; and a construction unit, used to filter the clonal clusters at multiple time points based on the evolutionary trend score to obtain a clonal evolution time series, and to construct a clonal evolution map based on the clonal evolution time series.

[0015] Another aspect of this application provides an electronic device, comprising: Memory, transceiver, processor, and bus system; The memory is used to store programs; The processor is used to execute programs in memory, including methods for performing the aspects mentioned above; Bus systems are used to connect memory and processor to enable communication between them.

[0016] Another aspect of this application provides a computer-readable storage medium storing instructions that, when executed on a computer, cause the computer to perform the methods described above.

[0017] As can be seen from the above technical solutions, the embodiments of this application have the following advantages: the method first extracts a set of complementarity-determining region sequences from batch sequencing data and filters out pseudoclones, then clusters the filtered sequence set based on edit distance, then scores the evolutionary trends of clonal clusters at multiple time points through evolutionary co-validation, verifies pseudoclones based on the scores, and finally constructs a clonal evolution map from the validated data. Because the validation is based on evolutionary trends, pseudoclones can be identified from frequency changes between time points, which improves the accuracy of the constructed clonal evolution map.

Description of the Drawings

[0018] Figure 1 is a schematic diagram of the immune repertoire structure provided in the embodiments of this application; Figure 2 is a flowchart of the method for constructing a clonal evolution map of hematologic tumors provided in the embodiments of this application; Figure 3 is a flowchart of step 204 in Figure 2; Figure 4 is a flowchart of step 302 in Figure 3; Figure 5 is a flowchart of step 402 in Figure 4; Figure 6 is a flowchart of step 205 in Figure 2; Figure 7 is a schematic diagram of the structure of the hematologic tumor clonal evolution map construction system provided in the embodiments of this application; Figure 8 shows the clonal evolution dynamic network diagrams of two patients provided in the embodiments of this application; Figure 9 is a schematic diagram of the dynamic evolution of a multi-time-point spectral system provided in an embodiment of this application; Figure 10 is a heatmap comparing the Top 10 clone similarity between two samples provided in an embodiment of this application; Figure 11 is a scatter plot showing the correlation of IGH CDR3 nucleotide sequence frequencies between two samples provided in the embodiments of this application; Figure 12 is a Sankey diagram of the Top 10 clones between two samples provided in the embodiments of this application; Figure 13 is a clonal phylogenetic tree and multiple sequence alignment diagram provided in the embodiments of this application; Figure 14 is a schematic diagram of the structure of the hematologic tumor clonal evolution map construction device provided in the embodiments of this application; Figure 15 is a schematic diagram of the structure of the electronic device provided in the embodiments of this application.

Detailed Implementation

[0019] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.

[0020] It should be noted that although functional modules are divided in the device schematic diagram and a logical order is shown in the flowchart, in some cases, the steps shown or described may be performed in a different order than the module division in the device or the order in the flowchart. The terms "first," "second," etc., in the specification, claims, and the aforementioned drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.

[0021] The term “exemplary” as used herein means “serving as an example, embodiment, or illustration.” Any embodiment illustrated herein as “exemplary” is not necessarily to be construed as superior to or better than other embodiments.

[0022] In the embodiments of this application, the terms "module" or "unit" refer to a computer program or part of a computer program that has a predetermined function and works with other related parts to achieve a predetermined goal, and can be implemented wholly or partially using software, hardware (such as processing circuitry or memory), or a combination thereof. Similarly, a processor (or multiple processors or memory) can be used to implement one or more modules or units. Furthermore, each module or unit can be part of an overall module or unit that includes the functionality of that module or unit.

[0023] Furthermore, to better illustrate this application, numerous specific details are provided in the following detailed embodiments. Those skilled in the art should understand that this application can be implemented without certain specific details. In some instances, methods, means, components, and circuits well-known to those skilled in the art have not been described in detail in order to highlight the main points of this application.

[0024] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of this application only and is not intended to limit this application.

[0025] First, some of the terms used in this application are explained. Immune repertoire sequencing: With the development of next-generation sequencing (NGS) technology, immune repertoire sequencing (covering the T cell receptor (TCR) and B cell receptor (BCR)) is widely used in tumor microenvironment studies and treatment monitoring, enabling in-depth characterization of B / T cell clonal diversity and dynamic changes. Immune repertoire sequencing targets B / T lymphocytes and amplifies the gene regions of the BCR or TCR that determine antigen recognition, including variable regions (V regions), diversity regions (D regions), and joining regions (J regions), to obtain information about the immune repertoire. The BCR and TCR each contain three complementarity-determining regions (CDRs) closely related to antigen interaction: CDR1, CDR2, and CDR3. The CDR3 region, located at the junction of V(D)J recombination, is the most diverse and specific region and is often used as a molecular marker for cells. Therefore, the nucleotide sequence of the CDR3 region is the most important research target in clonal evolution analysis. The immune repertoire structure is shown in Figure 1, which depicts the structure of the immunoglobulin heavy chain (IGH) gene and its amplification by multiplex polymerase chain reaction (PCR). From the 5' end to the 3' end, the IGH gene sequentially includes the leader sequence, the framework regions (FR1-FR4), the complementarity-determining regions (CDR1-CDR3), and gene segments such as the immunoglobulin heavy chain variable gene (IGHV), the immunoglobulin heavy chain diversity gene (IGHD), and the immunoglobulin heavy chain joining gene (IGHJ). FR denotes the conserved sequences of the variable region, and CDR denotes the hypervariable regions for antigen binding (CDR3 is formed by recombination of IGHV / IGHD / IGHJ and is the most diverse region).
The figure also indicates the amplification range and product length for different primer combinations: for example, the Leader-Primer and downstream primer combination can amplify a fragment of approximately 450 bp (covering the complete variable region), the FR1-Primer corresponds to an amplification product of approximately 400 bp, the FR2-Primer to approximately 270 bp, and the FR3-Primer to approximately 150 bp. It also shows the distribution of amplification regions for different primer pairs (such as Leader-R1, FR1-R1, etc.) in multiplex PCR. Immune repertoires reflect the composition and functional state of the body's immune system. By analyzing immune receptor rearrangement sequences, the expansion, contraction, and mutation of specific clones can be tracked, thereby revealing the immunodynamic characteristics during disease progression or treatment response.

[0026] Bulk sequencing: By resolution, sequencing can generally be divided into bulk sequencing, single-cell sequencing, and spatial transcriptome sequencing. Higher resolution provides clearer data for analysis but also increases cost, making large-scale application difficult. Most mainstream immune repertoire sequencing products use bulk sequencing, which involves extracting nucleic acids from a mixed cell population for unified library construction and sequencing. Its advantages include high coverage depth, low cost, and good reproducibility; its disadvantages are that it is difficult to resolve differences between individual cells and to separate actual mutations from sequencing errors.

[0027] Unique Molecular Identifiers (UMIs) are a molecular barcoding technology that attaches a unique short sequence tag to each DNA or RNA molecule during library construction, enabling the correction of sequencing errors and the identification of duplicate amplifications. UMI technology can improve the accuracy of mutation detection to some extent, compensating for the limited resolution of bulk sequencing. However, UMI technology increases library construction complexity and cost, and its results depend on the efficiency of UMI tagging, which limits its widespread application. The method of this application does not use UMIs and instead relies on algorithmic error modeling and correction, achieving accuracy similar to or even better than that of UMI-based approaches.

[0028] Hematologic Malignancies and MRD: Currently, the main application of immune repertoire sequencing in hematologic malignancies is focused on the detection of minimal residual disease (MRD). Traditional MRD analysis software (such as LymphoTrack®) typically uses the CDR3 sequence of the master clone at the diagnostic stage as a molecular marker. By tracking clones in the sequencing data that perfectly match this sequence or contain only a few base differences (usually 1–2), the MRD level at different time points is calculated. However, the analytical scope of such methods is mainly limited to the static abundance changes of the master clone, failing to deeply analyze the dynamic evolutionary relationship between the master clone and its related subclones.

[0029] Clonal evolution refers to the dynamic changes of clonal populations within a tumor or immune system over time, involving multiple mechanisms such as mutation accumulation, selective pressure, immune escape, and treatment response. In hematologic malignancies, the abnormal expansion and continuous evolution of specific clones are important mechanisms for disease occurrence, relapse, and drug resistance; in contexts such as infection and vaccination, clonal expansion and affinity maturation reflect the immune response process. Therefore, clonal evolution analysis based on immune repertoires has significant clinical value in minimal residual disease (MRD) detection, efficacy monitoring, relapse prediction, and immune function assessment. At the immune repertoire level, clonal evolution manifests as the amplification, mutation, and lineage remodeling of specific BCR / TCR sequences. Through longitudinal analysis of immune repertoire samples, the origin, transformation, and evolutionary pathways of tumor immune clones can be revealed, providing a basis for determining disease outcome and drug resistance mechanisms.

[0030] Sequencing Errors and Clonal Evolution: Distinguishing between sequencing errors and true mutations is the most critical part of clonal evolution analysis. In Bulk immune repertoire sequencing, sequencing errors and true clonal evolution events often exhibit similar sequence differences, easily leading to misinterpretation as new mutant clones or false-positive clone amplification. Sequencing errors typically originate from PCR amplification bias, sequencing platform errors, or library construction contamination, while sequence variations resulting from clonal evolution reflect actual biological processes.

[0031] Because PCR amplification and sequencing errors can produce false mutations, making it difficult to distinguish between true biological variations and technical errors, current methods for identifying false clones involve attaching random nucleotide sequences as UMI tags to each original IGH gene fragment. All PCR amplification products of the same original molecule will carry the same UMI. After sequencing, the sequences are clustered based on the UMI, and then the differences between the clustered sequences are compared to filter out false positive variations caused solely by PCR mismatches or sequencing errors, thereby identifying the true clone sequences. However, using UMI tags increases the complexity and cost of the experimental system, therefore, a new method for constructing clonal evolution maps of hematologic malignancies is urgently needed.

[0032] Based on this, embodiments of this application provide a method for constructing a clonal evolution map of hematologic tumors, which can solve the above-mentioned technical problems.

[0033] Referring to Figure 2, a method for constructing a clonal evolution map of hematologic malignancies according to an embodiment of this application includes:

Step 201: Obtain batch sequencing data and extract the set of complementarity-determining region sequences from the batch sequencing data.

[0034] Step 202: Perform pseudoclone filtering on the set of complementarity-determining region sequences to obtain the first sequence set.

[0035] Step 203: Cluster the complementarity-determining region sequences in the first sequence set based on edit distance to obtain multiple clonal clusters.

[0036] Step 204: Perform evolutionary co-validation on clone pairs in clone clusters at multiple time points to obtain evolutionary trend scores.

[0037] Step 205: Filter clonal clusters at multiple time points based on evolutionary trend scores to obtain clonal evolution time series, and construct a clonal evolution map based on the clonal evolution time series.

[0038] In step 201 of some embodiments, batch sequencing of immune repertoire samples is completed through a high-throughput sequencing platform to obtain batch sequencing data containing raw sequencing reads of all samples. The raw sequencing data can be preprocessed to perform operations such as adapter sequence removal, low-quality base filtering, and length uniformization screening to obtain high-quality effective reads.
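The preprocessing operations named above (adapter removal, low-quality base filtering, length screening) can be sketched as follows. This is a minimal illustration, not the application's implementation: the function name `preprocess_reads`, the in-memory read representation, the adapter string (a commonly used Illumina adapter prefix, here only a placeholder), and all thresholds are assumptions.

```python
from typing import List, Tuple

# A read is modelled as (bases, per-base Phred quality scores); this
# representation is an assumption made for the sketch.
Read = Tuple[str, List[int]]


def preprocess_reads(
    reads: List[Read],
    adapter: str = "AGATCGGAAGAGC",  # placeholder adapter sequence
    min_quality: int = 20,           # hypothetical Phred cutoff
    min_len: int = 20,               # hypothetical length window
    max_len: int = 600,
) -> List[Read]:
    """Return reads that survive adapter removal, 3' quality trimming, and length screening."""
    kept = []
    for seq, quals in reads:
        # 1. Adapter removal: cut the read at the first adapter occurrence.
        pos = seq.find(adapter)
        if pos != -1:
            seq, quals = seq[:pos], quals[:pos]
        # 2. Low-quality base filtering: trim low-quality bases from the 3' end.
        end = len(seq)
        while end > 0 and quals[end - 1] < min_quality:
            end -= 1
        seq, quals = seq[:end], quals[:end]
        # 3. Length screening: keep reads inside the accepted length window.
        if min_len <= len(seq) <= max_len:
            kept.append((seq, quals))
    return kept
```

In practice a dedicated read trimmer would normally handle this stage; the sketch only makes the order of the three operations concrete.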

[0039] The complementarity-determining region (CDR) sequence set can refer to a dataset, extracted from batch sequencing data of immune repertoires, that contains the nucleotide or amino acid sequences of CDR1, CDR2, and CDR3 within the variable region of the B cell receptor (BCR) or T cell receptor (TCR). Here, it can specifically refer to the nucleotide sequences of the CDR3 region. For example, alignment primers can be designed from the conserved framework region sequences of the IGH gene and used to globally align the effective reads against the IGH gene reference sequence, locating the variable region of the immunoglobulin heavy chain. Bioinformatics algorithms can then be used to identify and extract the sequence fragments between the highly conserved framework regions within the variable region, yielding the CDR3 sequence set.
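The paragraph above locates CDR3 by aligning reads to an IGH reference. As a much simpler stand-in (not the alignment-based procedure described in the application), the sketch below finds the heavy-chain junction in an already translated, in-frame amino acid sequence by looking for the conserved cysteine at the end of FR3 and the tryptophan of the W-G-X-G motif at the start of FR4. The function name `extract_igh_junction`, the regular expression, and the length bounds are illustrative assumptions.

```python
import re
from typing import Optional

# Junction = conserved Cys + CDR3 + conserved Trp.
# Assumptions of this sketch: FR3 ends in Y-Y-C or Y-F-C, FR4 starts with
# W-G-x-G, and the residues between Cys and Trp number 3 to 30.
_JUNCTION = re.compile(r"(?<=Y[YF])C[A-Z]{3,30}?W(?=G.G)")


def extract_igh_junction(protein: str) -> Optional[str]:
    """Return the first junction-like segment of a translated IGH read, or None."""
    match = _JUNCTION.search(protein.upper())
    return match.group(0) if match else None
```

For example, `extract_igh_junction("EDTAVYYCARDYGSSWYFDVWGQGTLV")` returns `"CARDYGSSWYFDVW"`; the internal tryptophan is skipped because it is not followed by G-x-G. Reads without both anchors return `None` and would need the reference alignment described above.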

[0040] In step 202 of some embodiments, pseudoclone filtering can refer to the process of identifying and removing non-authentic biological clone sequences generated by technical factors such as PCR amplification errors and sequencer signal deviations during the analysis of immune repertoire sequencing data, thereby obtaining a purified sequence dataset that reflects the actual composition of immune receptor clones in the sample. Here, non-authentic biological clone sequences obtained from the analysis of the complementarity-determining region (CDR) sequence set can be removed, retaining CDR sequences with independent sequence characteristics, ultimately obtaining a first sequence set free of pseudoclone interference.
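The preset rules listed earlier in the application (frequency ratio, edit distance, sequence inclusion) suggest one way pseudoclone filtering could work. The sketch below is an assumption-laden illustration: the function names, the way the rules are combined, and the default thresholds (`min_ratio`, `max_dist`) are hypothetical, not values from the application.

```python
from typing import Dict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def filter_pseudoclones(counts: Dict[str, int],
                        min_ratio: float = 20.0,
                        max_dist: int = 1) -> Dict[str, int]:
    """Drop sequences that look like error copies of a far more abundant sequence.

    A sequence is treated as a pseudoclone when an already accepted sequence is
    at least `min_ratio` times more abundant (frequency ratio rule) AND is
    either within `max_dist` edits (edit distance rule) or contains the
    sequence as a substring (inclusion rule).
    """
    kept: Dict[str, int] = {}
    # Visit sequences from most to least abundant so parents are seen first.
    for seq in sorted(counts, key=lambda s: (-counts[s], s)):
        n = counts[seq]
        is_pseudo = any(
            parent_n >= min_ratio * n
            and (edit_distance(seq, parent) <= max_dist or seq in parent)
            for parent, parent_n in kept.items()
        )
        if not is_pseudo:
            kept[seq] = n
    return kept
```

Under these assumptions a one-base variant at 1% of its neighbor's abundance is dropped, while a one-base variant at 40% of it is kept as a candidate true subclone.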

[0041] In step 203 of some embodiments, the edit distance can refer to the minimum number of edit operations required to convert one sequence into another. Common edit operations include the insertion, deletion, and replacement of single characters. The edit distance between any two complementarity-determining region (CDR) sequences in the first sequence set is calculated. By setting a reasonable edit distance threshold (e.g., a difference of 1-2 residues), CDR sequences with edit distances less than or equal to the threshold are grouped into the same clonal cluster. These sequences are determined to originate from the evolution of the same initial B cell clone. After completing the clustering analysis of the entire sequence set, multiple clonal clusters composed of multiple homologous sequences are obtained.

[0042] For example, using the CDR3 nucleotide sequence of a clone as a feature, the edit distance (Levenshtein distance) between any two clones is calculated. Similarity relationships between clones are defined in the sequence space, a similarity adjacency matrix is constructed, and a hierarchical clustering algorithm is used to group adjacent clones in the sequence space into the same cluster. Each cluster represents a potential clonal evolutionary population, reflecting a series of mutant clones derived from a common ancestral sequence during immune responses or tumor evolution.
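A minimal sketch of edit-distance clustering follows. The application names hierarchical clustering without fixing a linkage; this sketch assumes single linkage cut at the distance threshold, which is equivalent to taking connected components of the graph that joins sequences within `max_dist` edits. Function names and the default threshold are illustrative.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def cluster_by_edit_distance(seqs: List[str], max_dist: int = 2) -> List[List[str]]:
    """Group sequences into clonal clusters by single-linkage at `max_dist`."""
    parent = list(range(len(seqs)))  # union-find forest over sequence indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Pairwise comparison is O(n^2); acceptable for a sketch, not for large repertoires.
    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if edit_distance(seqs[i], seqs[j]) <= max_dist:
                parent[find(i)] = find(j)

    clusters = {}
    for i, s in enumerate(seqs):
        clusters.setdefault(find(i), []).append(s)
    return sorted(sorted(c) for c in clusters.values())
```

Single linkage chains variants together: two sequences that differ by two edits still share a cluster at `max_dist=1` if an intermediate one-edit variant is present, which matches the idea of a series of mutants derived from a common ancestor.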

[0043] In step 204 of some embodiments, a clone pair may refer to a combination of two immune receptor clone clusters that are evolutionarily related, originating from the same initial immune cell clone and appearing in samples at different time points or different tissue sites.

[0044] Evolutionary co-validation refers to the bioinformatics analysis process used in clonal evolution analysis of immune repertoires to verify the authenticity of evolutionary associations between clonal clusters at different time points. Its core objective is to distinguish between true adaptive evolution of clones and sequence similarity caused by technical errors or random mutations. In this context, it can involve verifying whether there is a correlation between abundance changes in clonal pairs at different time points, obtaining an evolutionary trend score that reflects the reliability of the clonal pair's evolution.

[0045] In step 205 of some embodiments, the evolutionary trend score is a quantitative representation of the evolutionary synergy verification results of clones across time points. The score directly corresponds to the authenticity and reliability of the clonal evolutionary relationship. Based on a preset score threshold, clone clusters at multiple time points are screened, clone clusters with qualified scores are retained, and non-evolutionary clone data with scores below the threshold are removed, thereby achieving secondary purification of the clone data.

[0046] The filtered clone pairs with real evolutionary relationships are integrated according to the time sequence of sample collection. The sequence characteristics of each clone cluster at different time points are sorted out to form a complete clonal evolution time series, which can present the dynamic evolution trajectory of a single clone from the initial time point to subsequent time points.

[0047] Visualizing the dynamic evolution of the clones yields the clonal evolution map. The map can be plotted with time on the horizontal axis and clonal abundance or size on the vertical axis. Nodes represent the state of clonal clusters at different time points, with node size corresponding to clonal abundance, and arrowed lines represent the evolutionary relationships between clonal clusters.
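The filtering and map construction of step 205 can be pictured as building a directed graph whose nodes are (cluster, time point) states and whose edges are the clone pairs that pass the score threshold. The sketch below assumes a hypothetical `ClonePair` record and a placeholder threshold `min_score`; neither comes from the application.

```python
from typing import Dict, List, NamedTuple, Tuple

Node = Tuple[str, int]  # (cluster id, time point index)


class ClonePair(NamedTuple):
    """Hypothetical record for a scored parent/daughter clone pair."""
    parent: str
    child: str
    parent_time: int
    child_time: int
    score: float  # evolutionary trend score


def build_evolution_graph(pairs: List[ClonePair],
                          min_score: float = 0.5) -> Dict[Node, List[Node]]:
    """Return an adjacency list with one directed edge per retained clone pair."""
    graph: Dict[Node, List[Node]] = {}
    # Process pairs in chronological order to form the time series.
    for p in sorted(pairs, key=lambda p: (p.parent_time, p.child_time)):
        # Drop low-scoring pairs and pairs that do not move forward in time.
        if p.score < min_score or p.parent_time >= p.child_time:
            continue
        graph.setdefault((p.parent, p.parent_time), []).append((p.child, p.child_time))
        graph.setdefault((p.child, p.child_time), [])  # make sure the child is a node
    return graph
```

The adjacency list can be handed to any plotting or graph library to draw nodes sized by abundance and arrows for the retained edges.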

[0048] The method for constructing a clonal evolution map of hematologic tumors provided in this application includes: acquiring batch sequencing data and extracting a set of complementarity-determining region (CDR) sequences from the batch sequencing data; filtering the CDR sequences for pseudoclones to obtain a first sequence set; clustering the CDR sequences in the first sequence set based on edit distance to obtain multiple clonal clusters; performing evolutionary co-validation on clone pairs in the clonal clusters at multiple time points to obtain an evolutionary trend score; filtering the clonal clusters at multiple time points based on the evolutionary trend score to obtain a clonal evolution time series, and constructing a clonal evolution map based on the clonal evolution time series.

[0049] This method first extracts a set of complementarity-determining region (CDR) sequences from batch sequencing data and filters out pseudoclones. Then, it clusters the filtered sequence set based on edit distance. Next, it scores the evolutionary trends of clonal clusters at multiple time points through evolutionary co-validation and verifies pseudoclones based on the scores. Finally, it constructs a clonal evolution map from the validated data. Because the validation is based on evolutionary trends, pseudoclones can be identified from frequency changes between time points, which improves the accuracy of the constructed clonal evolution map. Furthermore, the method does not require UMI tags, reducing the complexity and cost of the experimental system.

[0050] Referring to Figure 3, in some embodiments, step 204 includes, but is not limited to, steps 301 to 302.

[0051] Step 301: Determine the first abundance change rate of the parent clone and the second abundance change rate of the daughter clone for clone pairs at multiple time points.

Step 302: Determine the evolutionary trend score based on the first abundance change rate and the second abundance change rate.

[0052] In this embodiment, the abundance change rate can refer to the relative change in the abundance proportion of clonal clusters in the immune repertoire at different time points or different sample sites. Clonal pairs are defined by chronological order, with the clonal cluster at the earlier time point being the parent clone and the clonal cluster at a later time point that has a potential evolutionary association with the parent clone being the child clone. The first abundance change rate can refer to the change in the abundance of the parent clone at its respective time point compared to the baseline state (e.g., the initial time point), and the second abundance change rate can refer to the change in the abundance of the child clone at its respective time point compared to the baseline state. True evolutionary clonal pairs typically exhibit a synergistic trend of decreasing parent clone abundance and increasing child clone abundance, while clones with pseudo-evolutionary associations often show no obvious correlation in their abundance changes.
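One plausible reading of the abundance change rate is the relative change against the baseline time point. The sketch below assumes that reading; the function name and the `pseudocount` (added so that a clone absent at baseline does not cause a division by zero) are assumptions, not part of the application.

```python
from typing import List


def abundance_change_rates(abundance: List[float],
                           pseudocount: float = 1e-4) -> List[float]:
    """Relative change of each later time point against the first (baseline) one.

    `abundance` holds a clone's fraction of the repertoire at successive
    time points; the result has one entry per non-baseline time point.
    """
    baseline = abundance[0]
    return [(a - baseline) / (baseline + pseudocount) for a in abundance[1:]]


# Illustrative values only: a parent that shrinks while its daughter grows.
parent_rates = abundance_change_rates([0.40, 0.30, 0.10])  # both entries negative
child_rates = abundance_change_rates([0.02, 0.05, 0.20])   # both entries positive
```

The first list plays the role of the first abundance change rate (parent) and the second list that of the second abundance change rate (daughter).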

[0053] The matching degree between the first abundance change rate and the second abundance change rate is converted into a specific scoring index, which yields the evolutionary trend score. For example, when the decrease in the abundance of the parent clone is positively correlated with the increase in the abundance of the daughter clone, the higher the synergy between the two, the greater the corresponding score weight; if the abundance changes of the parent clone and the daughter clone are not significantly related, or even show an inverse relationship, the score will be reduced.

[0054] The accuracy of evolutionary relationships between clone pairs can be improved by calculating the correlations exhibited by parent-child clone pairs over time.

[0055] Referring to Figure 4, in some embodiments, step 302 includes, but is not limited to, steps 401 to 402.

[0056] Step 401: Determine the evolutionary trend based on the first abundance change rate and the second abundance change rate; Step 402: Determine the evolutionary trend score based on the corresponding evolutionary trend of each clone pair at each time point.

[0057] In this embodiment, under antigen-driven or environmental selection pressure, the parent clone undergoes sequence mutations to produce child clones, accompanied by a coordinated change in the abundance of the parent clone and the abundance of the child clone. Specifically, the evolutionary trend type is classified based on the signs and absolute values of the first and second abundance change rates. Examples include: a positive evolutionary trend, in which the first abundance change rate of the parent clone is negative (abundance decreases) and the second abundance change rate of the child clone is positive (abundance increases); no evolutionary trend, in which the abundance change rates of both the parent and child clones are below the significance threshold; and a degenerate evolutionary trend, in which the abundance change rates of both the parent and child clones are negative.
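The classification described above can be sketched as a small decision function. The significance threshold of 0.1 and the label `"other"` for combinations the text does not name are assumptions made for illustration only.

```python
def classify_trend(parent_rate: float, child_rate: float,
                   threshold: float = 0.1) -> str:
    """Map a pair of abundance change rates to a trend type.

    `threshold` is an assumed significance threshold on the absolute rate.
    """
    if abs(parent_rate) < threshold and abs(child_rate) < threshold:
        return "none"          # neither change is significant
    if parent_rate < 0 and child_rate > 0:
        return "positive"      # parent shrinks while child expands
    if parent_rate < 0 and child_rate < 0:
        return "degenerate"    # both shrink
    return "other"             # combinations not named in the text
```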

[0058] A global consistency score can be calculated for clone pairs based on their evolutionary trends over different time intervals, resulting in a global quantitative evolutionary trend score that quantifies the dynamic synergy of abundance. For example, different weights can be applied to different evolutionary trend types; high weights can be assigned to clone pairs with positive evolutionary trends, low weights to clone pairs with no evolutionary trend, and medium weights to clone pairs with degenerate evolutionary trends. The weighted summation of clone pairs at each time point yields the global evolutionary trend score.
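As a sketch of the weighted summation by trend type, the snippet below maps each per-interval trend label to a weight and sums the weights. The numeric weights are assumed values that only respect the ordering given in the text (high for positive, medium for degenerate, low for none).

```python
# Assumed weights; only their ordering follows the description.
TREND_WEIGHTS = {"positive": 1.0, "degenerate": 0.5, "none": 0.1}


def global_trend_score(trends: list[str]) -> float:
    """Sum the weight of each per-interval trend label for one clone pair."""
    return sum(TREND_WEIGHTS.get(label, 0.0) for label in trends)


example_score = global_trend_score(["positive", "positive", "none"])
```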

[0059] A scoring mechanism is constructed based on the evolutionary trend type and the absolute value of the abundance change rate. The global evolutionary activity intensity of clones is quantified by the score, which improves the accuracy of distinguishing between real evolution and pseudo-evolution.

[0060] Referring to Figure 5, in some embodiments, step 402 includes, but is not limited to, steps 501 to 502.

[0061] Step 501: Obtain the weight at each time point; Step 502: Based on the weight of each time point, the evolutionary trend of each clone pair at each time point is weighted to obtain the evolutionary trend score.

[0062] In this embodiment, different weights can be assigned to the evolutionary trends at different times. For example, in the scenario of tumor immunotherapy monitoring, samples at key treatment nodes (such as efficacy assessment nodes 1 month and 3 months after treatment) have clonal abundance changes that better reflect the driving effect of treatment on immune clones, and therefore can be assigned higher weights. Meanwhile, the baseline time point before treatment or the follow-up time point at the end of treatment can be assigned medium or low weights based on their contribution to the evolutionary trajectory. The final evolutionary trend score of the clonal pair can be obtained by summing the weighted scores of all time points.
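The time-point weighting can be illustrated with a short sketch. The time-point labels and all numeric values below are hypothetical; the per-time-point scores are assumed to have been computed already.

```python
def time_weighted_score(scores_by_time: dict[str, float],
                        weights_by_time: dict[str, float]) -> float:
    """Weighted sum of per-time-point trend scores for one clone pair."""
    missing = set(scores_by_time) - set(weights_by_time)
    if missing:
        raise KeyError(f"no weight defined for time points: {sorted(missing)}")
    return sum(weights_by_time[t] * s for t, s in scores_by_time.items())


# Hypothetical weights: key efficacy-assessment nodes weigh more.
weights = {"baseline": 0.5, "month1": 1.0, "month3": 1.0, "followup": 0.5}
scores = {"baseline": 0.1, "month1": 1.0, "month3": 1.0, "followup": 0.5}
final_score = time_weighted_score(scores, weights)
```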

[0063] The evolution of clonal clones in an immune repertoire is a continuous and dynamic process, with significant correlations and differences in evolutionary characteristics at different time points. Introducing a weighted calculation with time point weights allows for the integration of the evolutionary trends of clonal pairs at different time points, enabling the scoring results to comprehensively reflect the entire evolutionary trajectory of the clone, rather than isolated characteristics at a single time point. This approach better aligns with the biological dynamics of clonal evolution, making the time series of clonal evolution selected based on scoring more continuous and realistic.

[0064] In one example, the candidate clone set table obtained by pre-filtering at each time point is first input. This table contains key information such as the CDR3 core sequence, VDJ gene annotation, number of reads, clone frequency, and percentage of nucleated cells. Based on this, a time series matrix is built for each clone (with the CDR3 sequence as the unique identifier). The matrix records the number of reads or the relative frequency of the clone at each time point, so as to fully characterize the abundance change trajectory of a single clone over time.
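A minimal sketch of the time series matrix, assuming each time point is given as a mapping from CDR3 sequence to relative frequency and that a clone missing at a time point is recorded as 0.0. The function name and the input layout are illustrative assumptions.

```python
def build_time_series(tables: dict[str, dict[str, float]],
                      time_order: list[str]) -> dict[str, list[float]]:
    """Return {cdr3: [frequency at each time point, in time_order]}."""
    clones = sorted({cdr3 for t in time_order for cdr3 in tables.get(t, {})})
    return {
        cdr3: [tables.get(t, {}).get(cdr3, 0.0) for t in time_order]
        for cdr3 in clones
    }


# Hypothetical input: two time points, CDR3 sequences as clone identifiers.
tables = {
    "diagnosis": {"TGTGCGAGA": 0.40, "TGTGCGAAA": 0.01},
    "relapse": {"TGTGCGAAA": 0.20},
}
matrix = build_time_series(tables, ["diagnosis", "relapse"])
```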

[0065] Based on this, the dynamic trend of each clone is identified by analyzing the abundance time series data. Specifically, this is achieved by calculating the temporal variance and the slope of the growth trend of clonal abundance. The temporal variance measures the degree of fluctuation of clonal abundance over time, while the slope of the growth trend indicates the direction of change in clonal abundance. Combined with dynamic thresholds, cases with small variance and no significant slope are judged as random fluctuations. Cases whose variance conforms to biological laws and whose slope shows a significant increase or decrease are judged as expansion or contraction trends, respectively. Cases that appear only at some time points and have no continuous change trajectory are judged as new clones or noise.
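The variance-and-slope classification can be sketched as below. The fixed thresholds stand in for the dynamic thresholds mentioned in the text, the slope is an ordinary least-squares slope against the time index, and "appears only at some time points" is simplified to "observed at a single time point"; all three are assumptions for illustration.

```python
from statistics import pvariance


def slope(values: list[float]) -> float:
    """Least-squares slope of values against their index 0, 1, 2, ..."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def classify_dynamics(values: list[float],
                      var_threshold: float = 1e-4,
                      slope_threshold: float = 0.01) -> str:
    """Label one clone's abundance trajectory (needs at least two time points)."""
    if len(values) < 2:
        raise ValueError("at least two time points are required")
    if sum(1 for v in values if v > 0) <= 1:
        return "new_or_noise"            # no continuous trajectory
    s = slope(values)
    if pvariance(values) < var_threshold and abs(s) < slope_threshold:
        return "random_fluctuation"
    if s >= slope_threshold:
        return "expansion"
    if s <= -slope_threshold:
        return "contraction"
    return "unclassified"                # large variance without a clear direction
```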

[0066] Subsequently, for potential parent-child clone pairs, the correlation of their abundance changes over time is calculated to define the evolutionary synergy index. Its core principle is based on the biological laws of real clonal evolution: when the parent clone mutates under antigen selection pressure to produce a child clone, this is often accompanied by an inverse trend in which the parent clone abundance shrinks and the child clone abundance expands. Moreover, the initial abundance of the child clone is usually lower than that of the parent clone. By quantifying this inverse correlation, the evolutionary synergy index is obtained. The higher the index, the closer the evolutionary association of the clone pair.
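One way to quantify the inverse correlation is sketched below. It assumes the index is the negated Pearson correlation of the two abundance trajectories, clipped at zero; the text does not give the exact formula, so this is an illustrative choice and `synergy_index` is a hypothetical name.

```python
from math import sqrt


def synergy_index(parent: list[float], child: list[float]) -> float:
    """Return max(0, -r), where r is the Pearson correlation of the trajectories."""
    if len(parent) != len(child) or len(parent) < 2:
        raise ValueError("trajectories must have equal length >= 2")
    n = len(parent)
    mp, mc = sum(parent) / n, sum(child) / n
    cov = sum((p - mp) * (c - mc) for p, c in zip(parent, child))
    sp = sqrt(sum((p - mp) ** 2 for p in parent))
    sc = sqrt(sum((c - mc) ** 2 for c in child))
    if sp == 0 or sc == 0:
        return 0.0                      # a flat trajectory carries no signal
    return max(0.0, -cov / (sp * sc))


# Hypothetical trajectories: parent shrinks while child expands.
index = synergy_index([0.40, 0.30, 0.20, 0.10], [0.01, 0.05, 0.10, 0.20])
```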

[0067] Subsequently, by integrating information such as the abundance dynamics and the evolutionary synergy index of clone pairs at all time points, a global consistency score is calculated for each clone pair. This score comprehensively reflects the continuity, relevance, and biological rationality of the abundance change trend of the clone pair, thereby assessing the authenticity of its changes over time.

[0068] Finally, based on all the above validation data, clone pairs with global consistency scores higher than the preset threshold and evolutionary synergy indices consistent with real evolutionary patterns are retained, while pseudo-clone pairs with excessively low scores and no obvious synergy in abundance changes are removed. This achieves accurate validation of clonal evolutionary relationships and provides a high-quality data foundation for the subsequent construction of clonal evolution time series and evolutionary maps.

[0069] Referring to Figure 6, in some embodiments, step 205 includes, but is not limited to, steps 601 to 602.

[0070] Step 601: Convert the evolutionary relationships of clone pairs in the clonal evolution time series into directed edges; Step 602: Construct a clonal evolution graph based on directed edges.

[0071] In this embodiment, in the clonal evolution map, a directed edge is a directional connecting line segment used to characterize the evolutionary relationship between clonal clusters at different time points. This process identifies all valid clonal pairs in the clonal evolution time series that have passed the evolutionary trend score screening, clarifying the correspondence between the parent clonal cluster at the preceding time point and the child clonal cluster at the subsequent time point in each clonal pair. Subsequently, using clonal clusters as nodes in the map and the direction from the parent clonal cluster to the child clonal cluster as the edge direction, directed edges representing the evolutionary relationship are constructed. During the transformation process, the directed edges can be assigned differentiated attribute characteristics. For example, the thickness of the edges can be set according to the evolutionary trend score of the clonal pair; for instance, the higher the score of the clonal pair, the thicker the corresponding directed edge, thus intuitively reflecting the reliability of the evolutionary relationship. Simultaneously, a color gradient can be marked on the directed edges according to the abundance change from the parent clonal cluster to the child clonal cluster; the greater the abundance increase, the darker the edge color. The node size can be set according to the abundance ratio of the clonal clusters. The higher the abundance of the clonal cluster, the larger the corresponding node volume. Then, the constructed directed edges are connected to the corresponding nodes according to the topological relationship to form a complete evolutionary network and obtain the clonal evolution map.
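The conversion of validated clone pairs into directed edges can be sketched with a plain adjacency mapping. The `Edge` type, the score cutoff, and the linear width scaling are illustrative assumptions; any graph library could be substituted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    parent: str      # clonal cluster at the earlier time point
    child: str       # clonal cluster at the later time point
    score: float     # evolutionary trend score of the pair


def build_graph(pairs: list[tuple[str, str, float]],
                min_score: float) -> dict[str, list[Edge]]:
    """Keep pairs whose score passes the cutoff and store them as parent -> edges."""
    graph: dict[str, list[Edge]] = {}
    for parent, child, score in pairs:
        if score < min_score:
            continue
        graph.setdefault(parent, []).append(Edge(parent, child, score))
        graph.setdefault(child, [])     # make sure leaf nodes exist too
    return graph


def edge_width(score: float, max_score: float, max_width: float = 5.0) -> float:
    """Thicker edges for higher scores (assumed linear scaling)."""
    return max_width * score / max_score if max_score > 0 else 0.0


graph = build_graph([("C1", "C2", 0.9), ("C1", "C3", 0.2), ("C2", "C4", 0.6)], 0.5)
```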

[0072] The graph arranges nodes around a timeline and combines the pointing relationships of directed edges to accurately reconstruct the evolutionary order of clonal clusters at different points in time. Each directed edge corresponds to a clear temporal inheritance relationship, making the origin and differentiation path of any clonal cluster directly traceable.

[0073] In one example, a set of nodes for the phylogenetic map is first established. Each node in the set corresponds to a dynamically validated real clone, ensuring that all clones included in the map have reliable biological evolutionary significance. At the same time, each node is assigned initial attributes that cover the core molecular and evolutionary characteristics of the clone, including the CDR3 sequence that uniquely identifies the clone, the VDJ gene information that reflects the gene recombination pattern, the evolutionary level that reflects the position of evolutionary inheritance, the clone size that characterizes the size of the clone population, and the clone marker used to distinguish between dominant clones and specific clones.

[0074] Subsequently, based on the sequence characteristics and abundance dynamics of each clone in the node set, potential parent-child clone evolutionary relationships are identified through spatial distance inference and hierarchical clustering methods. The spatial distance is quantified by the Levenshtein distance to measure the sequence homology between clones. Hierarchical clustering further filters clone pairs with real evolutionary associations based on spatial distance and evolutionary synergy index. The evolutionary inheritance relationship between these parent-child clones is then transformed into directed edges in the graph, with the direction of the directed edges pointing from the parent clone node to the child clone node, thereby clarifying the path direction of clonal evolution.

[0075] Based on this, the node list is traversed starting from a predefined marker node (i.e., the master clone, usually the dominant clone at the initial time point or a core clone with clear biological significance), and all subclone nodes that have direct or indirect evolutionary connections with it are traced through a recursive process. The complete evolutionary chain extending from the marker node is sorted out, and the discrete nodes and directed edges are then integrated into a systematic phylogenetic structure.
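The recursive tracing from the marker node can be sketched as a depth-first traversal that assigns each reachable clone its evolutionary level. The `children` mapping and the node names are hypothetical; a node reached again at the same or a greater depth is skipped, which also keeps the recursion finite if the input accidentally contains a cycle.

```python
def assign_levels(children: dict[str, list[str]], marker: str) -> dict[str, int]:
    """Return {node: shortest depth from the marker node} for all descendants."""
    levels: dict[str, int] = {}

    def visit(node: str, depth: int) -> None:
        if node in levels and levels[node] <= depth:
            return                      # already reached by an equal or shorter path
        levels[node] = depth
        for child in children.get(node, []):
            visit(child, depth + 1)

    visit(marker, 0)
    return levels


# Hypothetical lineage: master clone M with two subclones, one of which evolves further.
lineage = assign_levels({"M": ["A", "B"], "A": ["C"], "B": [], "C": []}, "M")
```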

[0076] Finally, the completed pedigree diagram is visualized using either a network graph structure or a tree structure. The network graph structure can present complex clonal differentiation and crossover evolutionary relationships, while the tree structure more clearly shows the hierarchical evolutionary path of a single lineage. Both structures can intuitively present the evolutionary relationships between clones and the dynamic changes of clones at different time points. At the same time, a data statistics table containing node attributes, directed edge association information, and evolutionary chain statistics is generated. In some embodiments, step 202 includes, but is not limited to, performing pseudo-clone filtering on the complementarity-determining region (CDR) sequence set based on preset rules to obtain a first sequence set. The preset rules include one or more of frequency ratio, edit distance, and sequence inclusion relationship.

[0077] Frequency ratio refers to the proportion of the occurrence frequency of a single complementarity-determining region (CDR) sequence to the total frequency of the CDR sequence set. Its core filtering logic is that "pseudo-clone sequences are mostly low-frequency error sequences." True clones exhibit a higher frequency ratio due to the bias of PCR amplification, while pseudo-clone sequences caused by technical errors usually appear randomly and have extremely low frequencies. In practice, the absolute occurrence frequency of each sequence in the sequence set is first counted and its frequency ratio is calculated; a reasonable frequency ratio threshold is then set, and all sequences with a frequency ratio lower than the threshold are eliminated. For example, based on the sequencing error rate of the sequencing platform ±3 standard deviations, a sequencing error empirical value of 0.002728 is obtained. If the ratio of the clone frequency of the child sequence to the clone frequency of the parent sequence is less than the sequencing error empirical value, the child sequence is included in the pseudo-clone judgment.
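The frequency-ratio rule can be written as a one-line check. The threshold 0.002728 is the empirical value quoted above; the function name and the read counts in the example are hypothetical.

```python
SEQ_ERROR_THRESHOLD = 0.002728  # empirical sequencing-error value quoted in the text


def is_low_frequency_child(parent_count: int, child_count: int,
                           threshold: float = SEQ_ERROR_THRESHOLD) -> bool:
    """True if the child/parent frequency ratio falls below the error threshold."""
    if parent_count <= 0:
        raise ValueError("parent count must be positive")
    return child_count / parent_count < threshold


flagged = is_low_frequency_child(100_000, 200)   # ratio 0.002
kept = is_low_frequency_child(100_000, 500)      # ratio 0.005
```

Because both counts come from the same sample, the ratio of read counts equals the ratio of clone frequencies.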

[0078] Edit distance-based filtering needs to consider the evolutionary characteristics of immune receptor sequences. During immune repertoire sequencing, technical errors such as PCR amplification mismatches and sequencer base recognition biases typically cause only minor variations of 1-2 residues in the sequence. However, the evolution of real immune cell clones is often accompanied by antigen-driven directed mutations, resulting in much larger sequence differences. The spatial edit distance between the target sequence and the high-frequency core reference sequence in the complementarity-determining region (CDR) sequence set is calculated and compared with a preset threshold to identify and remove pseudo-clones caused by sequencing errors, thereby purifying the true CDR sequences. For example, the Levenshtein distance is used to calculate the spatial edit distance between the parent and child sequences to analyze the correlation between sequences. If the spatial distance is ≤2, it suggests that the child sequence may be a pseudo-clone generated by sequencing errors in the parent sequence.
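For reference, the Levenshtein distance and the ≤2 rule can be sketched as follows. The dynamic-programming routine is the standard textbook algorithm; the helper name `may_be_sequencing_error` and the example sequences are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a                     # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def may_be_sequencing_error(parent: str, child: str, max_distance: int = 2) -> bool:
    """Apply the distance <= 2 rule from the text to a parent/child pair."""
    return levenshtein(parent, child) <= max_distance
```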

[0079] Sequence inclusion relationships refer to complete fragment nesting between two complementarity-determining region (CDR) sequences. Pseudo-clones are often fragments of the true sequence with missing or redundant segments. If an inclusion relationship exists between a child and parent sequence, the sequence needs to be extended from CDR3 to the full-length amplicon for pseudo-clone identification, and frequency ratio inference and correlation analysis should be performed to determine whether a pseudo-clone exists. In immune repertoire sequencing, non-specific PCR amplification or local breaks in sequencing reads may lead to the loss of some bases in the true sequence, forming truncated sequences, or to the addition of irrelevant bases, forming redundant sequences. These truncated or redundant sequences are pseudo-clones. During the procedure, the base composition of any two sequences in the sequence set can be compared. If all bases of sequence A completely match a continuous segment of sequence B, then an inclusion relationship exists between them. In this case, further judgment based on frequency ratio is needed. The complete sequence with the higher frequency ratio is retained, while the truncated or redundant sequences with extremely low frequency ratios are discarded, thereby eliminating pseudo-clone interference caused by sequence fragmentation.
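The inclusion check and the frequency-based decision can be sketched as below. The sketch only covers the substring test and the "keep the higher frequency ratio" rule; the extension to the full-length sequence described above is not modeled, and the function name and example values are hypothetical.

```python
def resolve_inclusion(seq_a: str, freq_a: float,
                      seq_b: str, freq_b: float):
    """Return (kept, discarded) if one sequence is a contiguous part of the other.

    Returns None when the sequences are identical, empty, or not nested.
    """
    if not seq_a or not seq_b or seq_a == seq_b:
        return None
    if seq_a not in seq_b and seq_b not in seq_a:
        return None
    # Keep whichever sequence has the higher frequency ratio.
    if freq_a >= freq_b:
        return seq_a, seq_b
    return seq_b, seq_a


decision = resolve_inclusion("TGTGCGAGAGAT", 0.30, "GCGAGAG", 0.0004)
```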

[0080] In some embodiments, step 201 includes, but is not limited to, performing sequence structure analysis on batch sequencing data to obtain a set of complementarity-determining region sequences.

[0081] In this embodiment, sequence structure analysis refers to the bioinformatics analysis process in the batch sequencing data processing workflow of immune repertoires, based on the gene structure characteristics of the variable region of the immune receptor (BCR / TCR), using methods such as sequence alignment and conserved region identification to locate and extract the complementarity-determining region (CDR1, CDR2, CDR3) sequences. This can involve preprocessing the batch sequencing data, removing interfering data generated during sequencing through operations such as adapter sequence removal, low-quality base filtering, and length uniformization screening, obtaining purified data containing only effective immune receptor gene sequences. Based on this, sequence structure analysis is performed on the purified batch sequencing data, globally aligning the effective sequences in the purified data with reference sequences to locate gene regions of the variable region, and identifying and extracting complementarity-determining region sequences, thereby obtaining a set of complementarity-determining region sequences.

[0082] In one example, multiplex PCR primers were first used to target and capture the lymphocyte receptor gene region. These primers can specifically bind to conserved regions of the receptor gene, achieving efficient enrichment of the target gene fragment. Subsequently, an immune repertoire library was constructed based on the captured product, and PE150 paired-end sequencing was performed to generate raw data for the immune repertoire sequencing covering the receptor gene region. The paired-end sequencing mode can improve the length and accuracy of sequence reads.

[0083] Based on this, a systematic analysis of the raw data was carried out, and key steps were completed in sequence, including data filtering and quality control, sequence alignment and re-alignment to determine the V(D)J gene recombination sequence, sequence structure analysis to locate the CDR3 nucleotide sequence and deduce its corresponding amino acid sequence. Among these, the identification of CDR3 and the annotation of the VDJ gene are the core links of the entire analysis process. This is because the CDR3 region, as the core region of the lymphocyte receptor antigen binding site, is not only the region with the highest lymphocyte diversity, but also has extremely high specificity. Different lymphocytes correspond to different CDR3 sequences. Therefore, it is not necessary to rely on the full-length receptor sequence; a lymphocyte clone can be uniquely defined solely by the CDR3 sequence.

[0084] After sequence parsing, multi-dimensional clonal feature information is further integrated for specific screening, including clonal frequency abundance, discontinuous distribution characteristics of clones in the sample, quantitative proportion of nucleated cells corresponding to the clone, sequence abundance of the receptor chain to which the clone belongs, and the distribution frequency of the clone in the normal population. By comparing the differences between tumor-related clones and normal background clones in the above features, such as tumor-specific clones usually showing abnormally high frequency abundance, non-random distribution characteristics, and extremely low distribution in the normal population, it is possible to accurately determine which clones in the sample belong to hematologic malignancy lymphocyte clones and which clones belong to normal background clones. This provides a clear analytical target for subsequent algorithms to distinguish between tumor clones and normal clones and track the evolutionary trajectory of tumor clones.

[0085] In one example, a schematic diagram of a hematologic tumor clonal evolution atlas construction system according to an embodiment of this application is shown in Figure 7. The system includes four modules.

[0086] Module 1 is the Bulk immune repertoire data self-analysis and annotation module, which includes Bulk immune repertoire construction and sequencing based on multiplex PCR primers, immune repertoire data analysis (CDR3 identification and VDJ gene annotation), and identification of high-frequency abnormal tumor clones. Module 2 is the single-sample error identification and clone clustering module, which includes pseudo-variable identification and screening based on abundance ratio and distribution stability, and clone clustering based on spatial edit distance. Module 3 is the dynamic collaborative screening module based on multi-time point evolutionary data, which includes clone time series construction, dynamic pattern recognition of clone abundance, evolutionary direction consistency verification, multi-time point consistency scoring, and execution of time series filtering and clone retention. Module 4 is the clone evolutionary lineage construction module, which includes node definition and attribute initialization, edge (relationship) establishment and direction inference, evolutionary lineage construction and hierarchical expansion, lineage result data generation and data statistics. The four modules are connected sequentially. The output of Module 1 serves as the input of Module 2, the output of Module 2 enters Module 3, and the results of Module 3 are passed to Module 4, forming a complete analysis workflow.

[0087] Module 1 first constructs and sequences the Bulk immune repertoire library using multiplex PCR primers, then analyzes the sequencing data to identify CDR3 and annotate the VDJ gene, ultimately achieving preliminary identification of high-frequency abnormal tumor clones and providing basic sequence and candidate clone data for subsequent analysis.

[0088] Building on this, Module 2 takes over the output of Module 1. It first identifies and filters pseudo-mutant sequences based on abundance ratio and distribution stability, and then performs clonal clustering on real sequences through spatial edit distance to further purify the clonal data and eliminate interference caused by technical errors.

[0089] Next, module 3 uses clone data from multiple time points to first construct a time series matrix of clones, then identify their abundance dynamics, verify the consistency of evolutionary direction, and complete the consistency score at multiple time points. Finally, based on these indicators, time series filtering is performed to retain clone pairs with real evolutionary associations.

[0090] Finally, Module 4 transforms the selected clone pairs into elements of a visual atlas: first, it defines and initializes the nodes' attributes; then, it establishes evolutionary relationships (edges) between clones and infers their directions; next, it constructs an evolutionary lineage and expands its hierarchy, ultimately generating lineage data and statistical information to provide an intuitive presentation of the clonal evolution process. The entire process, through the orderly connection between modules, gradually improves the authenticity and reliability of the data, ultimately outputting an interpretable clonal evolutionary lineage.

[0091] In one example, based on Figure 7, a specific example of clonal evolution analysis can be described as follows. In the Bulk immune repertoire data self-analysis and annotation module, DNA was extracted from bone marrow samples of 48 patients with the hematologic disease Ph+ ALL. An immune repertoire library based on multiplex PCR was constructed using an immune repertoire detection kit, and PE150 sequencing was performed. Samples were collected from each patient at four time points: initial diagnosis (before treatment), Day 46 (46 days after treatment), maintenance (during maintenance therapy), and relapse. The raw sequencing data were analyzed using immune repertoire analysis software to obtain results including full-length sequences, CDR3 sequences, VDJ gene annotation, and alignment information. High-frequency abnormal tumor clones in the pre-treatment samples were identified using indicators such as clonal frequency, discontinuous distribution, total nucleated cell percentage, and total number of reads, yielding the tumor-related clones for subsequent tracking. These tumor-related clones were then tracked continuously in the post-treatment samples.

[0092] Based on the single-sample error identification and clone clustering module, error identification and clone clustering are first performed on single samples. For any two CDR3 sequences (parent and child sequences), frequency ratio, edit distance, and sequence inclusion relationship are analyzed to identify and eliminate spurious variations: when the frequency ratio of the child sequence is lower than the sequencing error threshold (0.002728) or its Levenshtein distance from the parent sequence is ≤2, it is determined to be a spurious clone; if there is an inclusion relationship between the sequences, it is extended to the full-length sequence and verified again. The high-confidence clone set retained after screening is characterized by CDR3 sequences. The edit distance between clones is calculated and a similarity matrix is constructed. Hierarchical clustering is used to identify clone clusters that are close to each other in the sequence space. Each cluster represents a potential evolutionary population, providing structured input for subsequent pedigree chart construction.
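The clustering step can be approximated with a small sketch that groups clones from a precomputed pairwise edit-distance table. It implements single-linkage clustering cut at a fixed distance, which is one simple form of hierarchical clustering and not necessarily the linkage used in the application; the cutoff, the clone names, and the distances are hypothetical.

```python
def cluster_by_distance(names: list[str],
                        distances: dict[tuple[str, str], int],
                        cutoff: int) -> list[list[str]]:
    """Single-linkage clusters: clones joined by any chain of distances <= cutoff."""
    parent = {n: n for n in names}

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]   # path halving
            n = parent[n]
        return n

    for (a, b), d in distances.items():
        if d <= cutoff:
            parent[find(a)] = find(b)       # merge the two clusters

    clusters: dict[str, list[str]] = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical pairwise edit distances between four retained clones.
pairwise = {("c1", "c2"): 1, ("c2", "c3"): 3, ("c1", "c3"): 4,
            ("c1", "c4"): 10, ("c2", "c4"): 9, ("c3", "c4"): 9}
clusters = cluster_by_distance(["c1", "c2", "c3", "c4"], pairwise, cutoff=3)
```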

[0093] In the multi-timepoint dynamic validation stage, the dynamic collaborative screening module based on multi-timepoint evolutionary data first takes as input the candidate clone sets obtained after screening and clustering at each time point (including CDR3 sequence, V / J gene annotation, read count, relative frequency, and cell proportion information). A time-series matrix is constructed for each clone, recording its abundance changes at different time points. Subsequently, the frequency dynamics of the clones are analyzed, and their change patterns are identified by calculating temporal variance and growth slope, distinguishing between stable fluctuations, genuine amplification, and spurious mutations. Next, for parent-child clone pairs that may have evolutionary relationships, their abundance correlation in the time dimension is calculated, defining the evolutionary co-existence index (ECS) to reflect the amplification trend of the child clone when the parent clone shrinks or decays. Further, the dynamic information from each time point is integrated through a multi-timepoint consistency score (TCS) to assess the overall reliability of clonal evolution. Finally, based on the joint judgment results of ECS and TCS, temporal noise filtering is performed, retaining genuine clonal relationships with stable evolutionary characteristics in multi-timepoint changes and eliminating spurious clones originating from sequencing errors or random fluctuations, thereby obtaining a highly reliable dynamic clonal evolution structure.

[0094] The clonal evolutionary phylogenetic construction module first establishes a node set during the phylogenetic graph construction phase. Each dynamically validated real clone is defined as a node, and its attribute information is initialized, including CDR3 sequence, V / J gene annotation, evolutionary level, clonal frequency, and marker status. Then, based on spatial edit distance and hierarchical clustering results, the mutational relationships between parent and child clones are determined and transformed into directed edges in the phylogenetic graph to indicate the direction of clonal evolution. Next, by recursively traversing the nodes, the evolutionary path is expanded layer by layer from the master clone (marker node), constructing a complete hierarchical chain structure. Finally, the phylogenetic results are visualized as a network graph or tree structure, showing the mutational relationships and abundance changes between clones, and generating corresponding statistical tables to describe the number, depth, and dynamic characteristics of clonal lines.

[0095] For example, Figure 8 shows clonal evolution dynamic network diagrams for two patients at four time points: initial diagnosis (before treatment), Day 46 (46 days after treatment), maintenance (during maintenance treatment), and recurrence. Each point represents a clone, and each connecting line represents an evolving clone. The connections of multiple evolving clones form a dense lineage in space. Red marks indicate abnormally high-frequency clones (tumor clones), and green marks indicate high-frequency evolving clonal lineages generated during the evolution process.

[0096] In one example, Figure 9 illustrates the dynamic evolution of cell lineages at multiple time points. It shows the recurring cell lineages (Lineages) in the same patient at different time points and demonstrates the evolutionary trend of the abundance of each cell lineage over different periods. Each rectangular bar represents a specific treatment time point, and the change in the colored area of each region represents the change in the abundance of the cell lineage, i.e., the change in clonal frequency. Specifically, Lineage(n)_marker represents the abundance and change of tumor cells sorted at n; Lineage(n)_other represents the abundance and change of the evolved clone corresponding to tumor cells sorted at n; and other_Lineage represents the abundance and change of cell lineages unrelated to the tumor.

[0097] In one example, Figure 10 shows a heatmap of the similarity of the top 10 clones between the two samples. The horizontal and vertical axes correspond to the top 10 clones of the two samples, respectively. The Levenshtein distance is used as a quantitative indicator of the differences between the sequences. The sample in the figure is labeled MGI250927N01-34. The rows and columns of the heatmap correspond to these 10 high-frequency CDR3 sequences (Top 1-Top 10). The colors represent the distance grouping between sequence pairs: green (match / similar) indicates a small Levenshtein distance and high sequence similarity, while dark gray (unrelated) indicates a large distance and no obvious homology between the sequences. Specifically: match indicates that the clone CDR3 sequences are completely matched; similar indicates that the spatial distance between the CDR3 sequences is ≤2, and the sequences are similar; unrelated indicates that the spatial distance is >2, and the sequences are not matched.

[0098] In one example, Figure 11 shows a scatter plot of the correlation of IGH CDR3 nucleotide sequence (NT) frequencies between the two samples. The two samples are labeled MGI250927N01-34 and MGI250927N01-27. The axes correspond to the frequencies of each CDR3 sequence in the two samples (both on a logarithmic scale). The distribution of points represents the frequency matching of different CDR3 sequences in the two samples. Red points correspond to CDR3 sequences that appear only in one of the samples. The Pearson correlation coefficient is 0.9957 (close to 1, indicating a very strong linear positive correlation between the frequencies of the common CDR3 sequences in the two samples), while the Spearman correlation coefficient is -0.4044 (negative and with a small absolute value, indicating a weak correlation in the overall sequence frequency ranking). The overlap ratio "overlap=341 / 2603=13.10%" indicates that the common CDR3 sequences in the two samples account for only about 13.1% of the total sequence count. Most points in the figure are concentrated in the low to medium frequency region, and the red points (unique sequences) are mainly distributed on both sides of the coordinate axis, which can indicate that the CDR3 sequence composition of the two samples is quite different, but the common sequences are highly consistent in frequency.
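The following sketch illustrates how a Pearson coefficient close to 1 and a negative Spearman coefficient can coexist, and checks the overlap arithmetic quoted above. The two frequency vectors are synthetic values chosen for illustration (one dominant shared clone plus several low-frequency clones whose ranks disagree); they are not the data behind Figure 11.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


# Synthetic frequencies of shared CDR3 sequences in two samples.
sample_a = [0.9, 0.001, 0.002, 0.003, 0.004, 0.005]
sample_b = [0.8, 0.005, 0.004, 0.003, 0.002, 0.001]
r = pearson(sample_a, sample_b)       # dominated by the single large clone
rho = spearman(sample_a, sample_b)    # driven by the rank order of all clones

overlap_pct = 100 * 341 / 2603        # overlap ratio quoted in the text
```

The dominant clone drives the Pearson coefficient, whereas the Spearman coefficient weights every clone's rank equally, so disagreement among the many low-frequency clones can pull it below zero.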

[0099] In one example, Figure 12 shows a Sankey plot of the Top 10 clones of the two samples, drawn as a stacked bar chart that displays the distribution of the proportions of the IGH Top 10 clones in the two samples (MGI250927N01-27 and MGI250927N01-34). The vertical axis represents the proportion of the Top 10 clones in the sample (with a maximum value of 100%), and the horizontal axis corresponds to the two samples. Blocks of different colors represent different types of clones (including tumor-related Top clones in MGI250927N01-34, tumor / normal Top clones in MGI250927N01-27, and other types). As can be seen from the figure, the proportion of Top 10 clones in both samples is close to 100%, but their compositions differ significantly: the Top 10 clones of MGI250927N01-27 are mainly of the "Others" class, with only a small number of clones such as MGI250927N01-27_Top2_Tumor, while the Top 10 clones of MGI250927N01-34 are mainly tumor-related clones such as MGI250927N01-34_Top1_Tumor and also include some clones from MGI250927N01-27.

[0100] In one example, Figure 13 shows a clonal phylogenetic tree together with a multiple sequence alignment diagram, illustrating the correlation between clonal evolutionary relationships and feature distribution. It includes a phylogenetic tree on the left (panel A) and a heatmap on the right (panel B). Panel A is a hierarchical clustering tree constructed based on clonal sequence similarity. Each terminal node corresponds to a clone (multiple clone identifiers, such as PB-Top4, are labeled in the example in the figure). The branching structure of the tree reflects the evolutionary kinship between clones: the closer the branches of two clones, the higher their sequence similarity and the closer their evolutionary association. Panel B is a feature heatmap corresponding to this clustering tree. Each row corresponds to a clone in panel A, and the intensity or hue of the color represents the quantified level of a certain feature of that clone (such as abundance or gene annotation type). The columns of the heatmap correspond to different samples or time points. Combining the two panels makes it possible to identify clonal clusters with similar evolutionary relationships through the clustering tree and to visually observe the feature distribution patterns of these clonal clusters through the heatmap, such as whether the clones of a certain evolutionary lineage exhibit similar abundance characteristics in specific samples.

[0101] The above describes a method for constructing a clonal evolution map of hematologic malignancies. The following describes an apparatus for performing this method.

[0102] Referring to Figure 14, which shows a structural schematic of a device for constructing a clonal evolution map of hematologic malignancies, the device 1400 includes: an acquisition unit 1401, used to acquire batch sequencing data and extract a set of complementarity-determining region sequences from the batch sequencing data; a filtering unit 1402, used to perform pseudo-clone filtering on the set of complementarity-determining region sequences to obtain a first sequence set; a clustering unit 1403, used to cluster the complementarity-determining region sequences in the first sequence set based on edit distance to obtain multiple clonal clusters; a verification unit 1404, used to perform evolutionary co-validation on clone pairs in the clonal clusters at multiple time points to obtain an evolutionary trend score; and a building unit 1405, used to filter the clonal clusters at multiple time points based on the evolutionary trend score to obtain a clonal evolution time series, and to construct a clonal evolution map based on the clonal evolution time series.
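As a non-limiting illustration, the edit-distance clustering performed by the clustering unit 1403 could be sketched in Python as follows. This application does not specify the linkage rule or the distance threshold; the sketch assumes single-linkage grouping (two sequences share a cluster if a chain of sequences within `max_distance` connects them) implemented with a union-find structure, and all function and parameter names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def cluster_by_edit_distance(sequences, max_distance=1):
    """Single-linkage clustering of CDR sequences (assumed linkage rule)."""
    parent = list(range(len(sequences)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(sequences)):
        for j in range(i + 1, len(sequences)):
            if levenshtein(sequences[i], sequences[j]) <= max_distance:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i  # merge the two clusters

    clusters = {}
    for index, sequence in enumerate(sequences):
        clusters.setdefault(find(index), []).append(sequence)
    return list(clusters.values())
```

The pairwise comparison is quadratic in the number of sequences, which is acceptable for a sketch; a production implementation would likely need indexing or length-based pre-filtering.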

[0103] Optionally, in one embodiment, the verification unit 1404 is specifically used to: determine, for the clone pairs at multiple time points, the first abundance change rate of the parent clone and the second abundance change rate of the child clone; and determine the evolutionary trend score based on the first abundance change rate and the second abundance change rate.
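As a non-limiting illustration, the first and second abundance change rates could be computed as in the following Python sketch. The formula is an assumption, since this application does not define it here: the sketch uses the relative change between consecutive time points, with a small hypothetical `pseudocount` so that a clone absent at the earlier time point does not cause a division by zero.

```python
def change_rates(abundances, pseudocount=1e-6):
    """Relative abundance change between consecutive time points (assumed formula)."""
    return [(current - previous) / (previous + pseudocount)
            for previous, current in zip(abundances, abundances[1:])]


def pair_change_rates(parent_abundances, child_abundances, pseudocount=1e-6):
    """Return one (first_rate, second_rate) tuple per time interval.

    first_rate belongs to the parent clone, second_rate to the child clone.
    """
    if len(parent_abundances) != len(child_abundances):
        raise ValueError("both clones need an abundance at every time point")
    first = change_rates(parent_abundances, pseudocount)
    second = change_rates(child_abundances, pseudocount)
    return list(zip(first, second))
```

For a series of T time points, the sketch yields T - 1 rate pairs, one per interval.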

[0104] Optionally, in one embodiment, the verification unit 1404 is specifically used to: determine the evolutionary trend based on the first abundance change rate and the second abundance change rate; and determine the evolutionary trend score based on the evolutionary trend of each clone pair at each time point.

[0105] Optionally, in one embodiment, the verification unit 1404 is specifically used to: obtain the weight of each time point; and weight the evolutionary trend of each clone pair at each time point according to the weight of that time point to obtain the evolutionary trend score.
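As a non-limiting illustration, the weighting of per-time-point evolutionary trends into a single score could be sketched in Python as follows. Two elements are assumptions not stated in this application: the trend indicator (here 1 when the child clone's change rate diverges from the parent's by more than a hypothetical `tolerance`, and 0 when the two move in lockstep, as a sequencing-error artifact would be expected to) and the normalization of the score by the sum of the weights.

```python
def trend_indicator(parent_rate, child_rate, tolerance=0.1):
    """Assumed trend: 1.0 if the child diverges from the parent, else 0.0."""
    return 1.0 if abs(child_rate - parent_rate) > tolerance else 0.0


def trend_score(rate_pairs, weights, tolerance=0.1):
    """Weighted average of the per-interval trends for one clone pair.

    rate_pairs: list of (parent_rate, child_rate), one per time interval.
    weights:    one non-negative weight per time interval.
    """
    if len(rate_pairs) != len(weights):
        raise ValueError("one weight is required per time interval")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(weight * trend_indicator(parent, child, tolerance)
                   for (parent, child), weight in zip(rate_pairs, weights))
    return weighted / total_weight
```

With this normalization the score lies between 0 and 1, so a fixed cutoff can be applied when filtering clonal clusters regardless of how many time points are available.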

[0106] Optionally, in one embodiment, the building unit 1405 is specifically used to: convert the evolutionary relationships of the clone pairs in the clonal evolution time series into directed edges; and construct the clonal evolution map based on the directed edges.
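As a non-limiting illustration, the conversion of clone-pair relationships into directed edges could be sketched in Python as follows. The input record layout `(parent, child, time_point)` and the function name are hypothetical; the sketch stores the map as an adjacency dictionary and reports the clones that never appear as a child, which under these assumptions are the founding clones.

```python
from collections import defaultdict


def build_evolution_graph(relations):
    """Build a directed graph from (parent, child, time_point) records.

    Returns (graph, roots): graph maps each parent clone to a list of
    (child, time_point) edges; roots lists clones with no incoming edge.
    """
    graph = defaultdict(list)
    nodes = set()
    for parent, child, time_point in relations:
        graph[parent].append((child, time_point))  # directed edge parent -> child
        nodes.update((parent, child))
    children = {child for edges in graph.values() for child, _ in edges}
    roots = sorted(nodes - children)
    return dict(graph), roots
```

Keeping the time point on each edge preserves the temporal order of the clonal evolution time series when the map is later drawn or traversed.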

[0107] Optionally, in one embodiment, the filtering unit 1402 is specifically used to: perform pseudo-clone filtering on the set of complementarity-determining region sequences based on preset rules to obtain the first sequence set, where the preset rules include one or more of the following: frequency ratio, edit distance, and sequence inclusion relationship.
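As a non-limiting illustration, the three preset rules could be combined as in the following Python sketch. How the rules are combined and the threshold values `max_ratio` and `max_distance` are assumptions, as are all names: the sketch treats a sequence as a pseudo-clone when a more abundant retained sequence exists such that the frequency ratio is at most `max_ratio` and the rare sequence is either within `max_distance` edits of it or contained in it.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def filter_pseudoclones(freqs, max_ratio=0.01, max_distance=1):
    """Return the retained sequences (the first sequence set), most abundant first.

    freqs maps each CDR sequence to a positive frequency.
    """
    ranked = sorted(freqs, key=freqs.get, reverse=True)
    kept = []
    for seq in ranked:
        is_pseudo = False
        for major in kept:
            # Rule 1: frequency ratio of the rare sequence to the abundant one.
            if freqs[seq] / freqs[major] > max_ratio:
                continue
            # Rule 2: sequence inclusion; rule 3: small edit distance.
            if seq in major or levenshtein(seq, major) <= max_distance:
                is_pseudo = True
                break
        if not is_pseudo:
            kept.append(seq)
    return kept
```

Processing sequences in descending frequency order means each candidate is compared only against sequences that have already been retained, so a pseudo-clone cannot itself absorb other sequences.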

[0108] Optionally, in one embodiment, the acquisition unit 1401 is specifically used to: perform sequence structure analysis on the batch sequencing data to obtain the set of complementarity-determining region sequences.

[0109] This application also provides an electronic device, which includes a memory and a processor. The memory stores a computer program, and the processor executes the computer program to implement the above-described method for constructing a clonal evolution map of hematological malignancies. This electronic device can be any smart terminal, including tablet computers, in-vehicle computers, etc.

[0110] Referring to Figure 15, which illustrates the hardware structure of an electronic device according to another embodiment, the electronic device includes: a processor 1501, which can be implemented using a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and which is used to execute relevant programs to implement the technical solutions provided in the embodiments of this application; a memory 1502, which can be implemented as a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM), and which can store the operating system and other applications, where, when the technical solutions provided in the embodiments of this specification are implemented through software or firmware, the relevant program code is stored in the memory 1502 and is called by the processor 1501 to execute the hematologic tumor clonal evolution map construction method of the embodiments of this application; an input / output interface 1503, used to implement information input and output; a communication interface 1504, used to enable communication and interaction between this device and other devices, where communication can be achieved through wired means (such as USB or an Ethernet cable) or wireless means (such as a mobile network, Wi-Fi, or Bluetooth); and a bus 1505, which transmits information between the various components of the device (e.g., the processor 1501, the memory 1502, the input / output interface 1503, and the communication interface 1504). The processor 1501, the memory 1502, the input / output interface 1503, and the communication interface 1504 are communicatively connected to each other within the device via the bus 1505.

[0111] This application also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above-described method for constructing a clonal evolution map of hematologic tumors.

[0112] Memory, as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs and non-transitory computer-executable programs. Furthermore, memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory may optionally include memory remotely located relative to the processor, and these remote memories can be connected to the processor via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

[0113] The embodiments described in this application are for the purpose of more clearly illustrating the technical solutions of the embodiments of this application, and do not constitute a limitation on the technical solutions provided by the embodiments of this application. As those skilled in the art will know, with the evolution of technology and the emergence of new application scenarios, the technical solutions provided by the embodiments of this application are also applicable to similar technical problems.

[0114] Those skilled in the art will understand that the technical solutions shown in the figures do not constitute a limitation on the embodiments of this application, and may include more or fewer steps than shown, or combine certain steps, or different steps.

[0115] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs.

[0116] Those skilled in the art will understand that all or some of the steps in the methods disclosed above, as well as the functional modules / units in the systems and devices, can be implemented as software, firmware, hardware, or suitable combinations thereof.

[0117] The terms “first,” “second,” “third,” “fourth,” etc. (if present) in the specification and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms “comprising” and “having,” and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.

[0118] It should be understood that in this application, "at least one (item)" means one or more, and "multiple" means two or more. "And / or" is used to describe the relationship between related objects, indicating that three relationships can exist. For example, "A and / or B" can represent three cases: only A exists, only B exists, and both A and B exist simultaneously, where A and B can be singular or plural. The character " / " generally indicates that the preceding and following related objects are in an "or" relationship. "At least one (item) of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items. For example, at least one (item) of a, b, or c can represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can be single or multiple.

[0119] In the several embodiments provided in this application, it should be understood that the disclosed apparatus and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division of the units described above is only a logical functional division, and in actual implementation there may be other division methods. For instance, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection between apparatuses or units through some interfaces, and may be electrical, mechanical, or of other forms.

[0120] The units described above as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0121] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.

[0122] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes multiple instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods of the various embodiments of this application. The aforementioned storage medium includes various media capable of storing programs, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0123] The preferred embodiments of the present application have been described above with reference to the accompanying drawings, but this does not limit the scope of the claims of the present application. Any modifications, equivalent substitutions, and improvements made by those skilled in the art without departing from the scope and substance of the embodiments of the present application shall be within the scope of the claims of the present application.

Claims

1. A method for constructing a clonal evolution map of hematologic malignancies, characterized in that the method comprises: acquiring batch sequencing data and extracting a set of complementarity-determining region sequences from the batch sequencing data; performing pseudo-clone filtering on the set of complementarity-determining region sequences to obtain a first sequence set; clustering the complementarity-determining region sequences in the first sequence set based on edit distance to obtain multiple clonal clusters; performing evolutionary co-validation on clone pairs in the clonal clusters at multiple time points to obtain an evolutionary trend score; and filtering the clonal clusters at multiple time points based on the evolutionary trend score to obtain a clonal evolution time series, and constructing a clonal evolution map based on the clonal evolution time series.

2. The method according to claim 1, characterized in that performing evolutionary co-validation on the clone pairs in the clonal clusters to obtain the evolutionary trend score comprises: determining, for the clone pairs at multiple time points, a first abundance change rate of the parent clone and a second abundance change rate of the child clone; and determining the evolutionary trend score based on the first abundance change rate and the second abundance change rate.

3. The method according to claim 2, characterized in that determining the evolutionary trend score based on the first abundance change rate and the second abundance change rate comprises: determining an evolutionary trend based on the first abundance change rate and the second abundance change rate; and determining the evolutionary trend score based on the evolutionary trend of each clone pair at each time point.

4. The method according to claim 3, characterized in that determining the evolutionary trend score based on the evolutionary trend of each clone pair at each time point comprises: obtaining the weight of each time point; and weighting the evolutionary trend of each clone pair at each time point based on the weight of that time point to obtain the evolutionary trend score.

5. The method according to claim 1, characterized in that constructing the clonal evolution map based on the clonal evolution time series comprises: converting the evolutionary relationships of the clone pairs in the clonal evolution time series into directed edges; and constructing the clonal evolution map based on the directed edges.

6. The method according to claim 1, characterized in that performing pseudo-clone filtering on the set of complementarity-determining region sequences to obtain the first sequence set comprises: performing pseudo-clone filtering on the set of complementarity-determining region sequences based on preset rules to obtain the first sequence set, wherein the preset rules include one or more of the following: frequency ratio, edit distance, and sequence inclusion relationship.

7. The method according to claim 1, characterized in that extracting the set of complementarity-determining region sequences from the batch sequencing data comprises: performing sequence structure analysis on the batch sequencing data to obtain the set of complementarity-determining region sequences.

8. A device for constructing a clonal evolution map of hematologic tumors, characterized in that the device comprises: an acquisition unit, used to acquire batch sequencing data and extract a set of complementarity-determining region sequences from the batch sequencing data; a filtering unit, used to perform pseudo-clone filtering on the set of complementarity-determining region sequences to obtain a first sequence set; a clustering unit, used to cluster the complementarity-determining region sequences in the first sequence set based on edit distance to obtain multiple clonal clusters; a verification unit, used to perform evolutionary co-validation on clone pairs in the clonal clusters at multiple time points to obtain an evolutionary trend score; and a construction unit, used to filter the clonal clusters at multiple time points based on the evolutionary trend score to obtain a clonal evolution time series, and to construct a clonal evolution map based on the clonal evolution time series.

9. An electronic device, characterized in that the electronic device comprises: a memory, a transceiver, a processor, and a bus system; wherein the memory is used to store a program; the processor is configured to execute the program in the memory, including performing the method as described in any one of claims 1 to 7; and the bus system is used to connect the memory and the processor to enable communication between the memory and the processor.

10. A computer-readable storage medium, characterized in that it comprises instructions that, when run on a computer, cause the computer to perform the method as described in any one of claims 1 to 7.