A method for predicting the response to therapy for disorders via the core microbiome guild.

A genome-centered approach identifies stable relationships between high-quality metagenome-assembled genomes to reveal core microbiome components. It addresses the limitations of existing methods by predicting therapeutic responses and managing disease through a seesaw-like network of bacterial guilds.

JP2026519948A · Pending · Publication Date: 2026-06-19 · RUTGERS THE STATE UNIV +1

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
RUTGERS THE STATE UNIV
Filing Date
2024-04-25
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing methods for characterizing the core microbiome primarily focus on the presence, absence, abundance, or prevalence of specific taxa or genes/pathways. They inadequately represent the ecological interactions that influence disease states and do not capture the cooperative or competitive behaviors of microbial guilds.

Method used

Employing a genome-centered, reference-free approach to identify stable relationships between high-quality metagenome-assembled genomes (HQMAGs) to reveal core microbiome components, utilizing a seesaw-like network of two competing bacterial guilds that exhibit cooperative and competitive interactions, and applying machine learning models to predict therapeutic responses.

Benefits of technology

This approach provides a more comprehensive depiction of the microbiome, revealing critical ecological networks and enabling personalized health interventions by predicting responses to immunotherapy and managing disease risk through balanced microbiome regulation.

✦ Generated by Eureka AI based on patent content.

Patent Text Reader

Abstract

A method and system for predicting a subject's response to therapy are provided. A first set of multiple nucleic acid sequences is obtained from a genomic DNA sample taken from the subject's gut. From the nucleic acid sequences, multiple genomic abundance values for multiple gut bacteria are determined. A model is applied to the multiple genomic abundance values, thereby obtaining a prediction of the subject's response to therapy as the model's output.

Description

[Technical Field]

[0001] Cross-Reference to Related Applications: This application claims priority to U.S. Provisional Patent Application No. 63/498,177, filed on April 25, 2023, and U.S. Provisional Patent Application No. 63/595,189, filed on November 1, 2023, the contents of which are incorporated herein by reference in their entirety for all purposes.

[0002] Brief Description of the Sequence Listing: This application incorporates by reference a “Sequence Listing XML” file named ST26_126146_5001_WO.XML, containing SEQ ID NOs: 1 to 99534 and having a size of 2,491,699 kilobytes. The file was prepared on April 24, 2024, and was submitted by mail on April 25, 2024, as an XML file on a read-only optical disc (DVD) in accordance with 37 C.F.R. §§ 1.831 to 1.835. The entire Sequence Listing XML is incorporated herein by reference.

[Background Art]

[0003] The human gut microbiome, an exemplar of a complex adaptive system (CAS), harbors trillions of microorganisms and embodies rich phylogenetic diversity. This sophisticated ecosystem not only interacts actively with the host environment but also adapts dynamically, thereby playing a crucial role in maintaining health and regulating disease susceptibility. Amid the extraordinary microbial diversity of the human gut, the concept of the “core microbiome” has gained considerable traction. This core is hypothesized to comprise microorganisms that are universally established in healthy individuals and therefore contribute significantly to maintaining homeostasis in nutrition, metabolism, immunity, and behavior. The role of this core microbiome is analogous to that of an essential organ, highlighting its importance in overall health management.

[0004] Historically, the core microbiome has been delineated primarily by assessing presence or absence, complemented by quantifying the abundance or prevalence of specific taxa or genes/pathways within cohorts of healthy individuals. While these methodologies have provided important insights into the structural composition and potential functional properties of the microbiome, they may inadequately represent the ecological interactions that underpin the stability and resilience of this complex system. This oversight matters particularly because these interactions play critical roles in the onset, progression, and remission of various disease states.

[0005] As a CAS, the microbiome adheres to modular design principles. Essential components of the CAS are organized into modules that interconnect to establish a network. Within the gut ecosystem, individual microorganisms are integrated into modular structures called guilds. Each guild functions as a coherent functional unit, or module, within the microbiome's CAS, even though its members may have diverse taxonomic backgrounds. Guild members exhibit cooperative behavior through coexistence, and different guilds may engage in cooperative or competitive interactions to form ecological networks. Characterizing the core microbiome from a guild perspective is therefore a promising approach.

[Summary of Invention]

[0006] Through co-evolution, the gut microbiota has established a crucial role in maintaining human health. However, identifying core microbiome components that reliably provide essential health benefits remains a significant challenge. These core members were hypothesized to maintain cooperative or competitive ecological interactions despite changes in environmental conditions. From high-fiber intervention trials in patients with type 2 diabetes and 26 diverse case-control datasets, 284 high-quality metagenome-assembled genomes were identified that consistently form stable pairs across individuals amid dietary changes or disease progression. These genomes fall into two guilds, which contain the most resilient and highly interconnected bacteria and collectively correlate with a wide range of health conditions. The genomes of one guild were rich in genes for plant polysaccharide degradation and butyrate production, while those of the other were characterized by a high prevalence of genes associated with pathogenicity and antibiotic resistance. Using these genomes as references, a random forest model effectively distinguished cases from controls across 15 different diseases and predicted patient responses to immunotherapy. This core microbiome signature therefore has potential as a unified therapeutic target for enhancing health.

[0007] Individual microbial cells are considered the fundamental components, or agents, of the CAS, representing the major ecologically significant structural and functional units within the gut ecosystem. In some embodiments, high-quality metagenome-assembled genomes (HQMAGs) are used as a surrogate for profiling these microbial cells, thus providing a more comprehensive and realistic depiction of the microbiome than gene-, pathway-, or taxon-centric approaches. This perspective encompasses the full genetic potential and ecological identity of microorganisms and reinforces the ecological axiom that it is organisms (or, more precisely, cells), rather than genes/pathways or taxa, that interact with each other and their environment.

[0008] To identify core components of the gut microbiome, we employed a genome-centered, reference-free approach that emphasizes the stability of ecological interactions. This methodology involves detecting stable relationships between HQMAGs across various conditions, where environmental perturbations to the gut ecosystem are introduced via dietary interventions or disease progression. These stable relationships can reveal core members of the microbiome. This aligns with a fundamental principle of systems biology: stable relationships often signify critical system components. In the context of the gut microbiome, these core components likely perform essential functions contributing to system resilience and host health, which requires their sustained presence and predictable interaction patterns. Uncovering these stable relationships could therefore identify these critical microbial components and potentially expose the backbone of conserved ecological networks within the gut microbiome across individuals, populations, or health states.
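The search for stable relationships can be illustrated with a minimal Python sketch. It is not the procedure of the disclosure: the use of Pearson correlation, the 0.5 cutoff, the function names, and the toy abundance values are all assumptions made for illustration. The P/N/U lettering follows the convention used in the description of Figure 5A.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def label(r, cutoff):
    """Map a correlation to P (positive), N (negative), or U (uncorrelated)."""
    if r >= cutoff:
        return "P"
    if r <= -cutoff:
        return "N"
    return "U"


def stable_pairs(abundance_by_condition, cutoff=0.5):
    """Return {(genome_a, genome_b): pattern} for pairs that are P in every
    condition or N in every condition.

    abundance_by_condition: list of dicts, one per condition (for example M0,
    M3, M15), each mapping a genome id to its abundances across subjects.
    The cutoff is a hypothetical value, not one taken from the source.
    """
    genomes = sorted(abundance_by_condition[0])
    result = {}
    for a, b in combinations(genomes, 2):
        pattern = "".join(
            label(pearson(cond[a], cond[b]), cutoff) for cond in abundance_by_condition
        )
        if set(pattern) in ({"P"}, {"N"}):
            result[(a, b)] = pattern
    return result


# Toy data (invented): four genomes, four subjects, three conditions.
example = [
    {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 3, 2, 1], "g4": [1, 2, 3, 4]},
    {"g1": [2, 1, 4, 3], "g2": [4, 2, 8, 6], "g3": [3, 4, 1, 2], "g4": [3, 4, 1, 2]},
    {"g1": [5, 6, 7, 8], "g2": [1, 2, 3, 4], "g3": [8, 7, 6, 5], "g4": [5, 6, 7, 8]},
]
pairs = stable_pairs(example)
```

In this toy input, g1 and g2 rise and fall together in every condition while g3 moves against both, so those three pairs are retained; every pair involving g4 changes sign between conditions and is dropped.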

[0009] We identified a robust, seesaw-like network containing two competing bacterial guilds. This network was identified by searching for stable genome pairs across individuals before and after high-fiber intervention (QD study, Figure 1A), or across co-existing networks between healthy and diseased cohorts. This seesaw-like network embodies both cooperative and competitive interactions and potentially exhibits a key feature of stable microbiome structure. HQMAGs identified within this novel core microbiome demonstrated correlation with various clinical parameters in patients with type 2 diabetes mellitus (T2DM) receiving high-fiber intervention. Furthermore, a universal machine learning model based on these HQMAGs of the seesaw-networked core microbiome successfully distinguished cases from controls in 26 independent datasets spanning 15 different diseases. In addition, these HQMAGs supported machine learning models for predicting individualized therapeutic responses to immunotherapy in patients with cancer or autoimmune diseases. This disclosure introduces a novel conceptual and analytical paradigm for studying the core gut microbiome. This paradigm provides enhanced health maintenance strategies and disease management, enabling personalized interventions that address the complex interactions of microbial relationships within the gut ecosystem.

[0010] Given the above background, a genome-centered MWAS was adopted, using high-quality draft genomes assembled from metagenomic datasets (metagenome-assembled genomes, MAGs) as the primary microbiome features for correlation analysis between fundamental components of the gut ecosystem and disease phenotypes. MAGs are not independent microbiome features; they interact ecologically with one another, for example through competition or cooperation, and are organized into higher-level structures called “guilds” [5]. Each guild is potentially a functional unit of the gut ecosystem, and although its members may have a wide variety of taxonomic backgrounds, they exhibit coexisting behavior. Guilds have been shown to be positively or negatively correlated with disease phenotypes [17]. Therefore, MAGs and their guild-level aggregations are ecologically significant features for identifying microbiome signatures associated with human disease.

[0011] Dysbiosis of the gut microbiome is associated with an increased risk of a wide range of human diseases [1, 2]. To date, many attempts have focused on identifying gene-based or taxon-based microbial signatures as disease biomarkers. However, such signatures remain controversial [3, 4] and overlook the fact that gut bacterial strains do not exist independently but interact with each other to form coherent functional groups (also known as “guilds”) that influence host health [5]. Embodiments may explore strain-level microbiome signatures in the form of robust guilds through which the gut microbiome provides stable health-related functions to the host. Embodiments may show that two competing bacterial guilds are organized as the two ends of a robust, stable seesaw-like network, and that their abundances correlate with a wide range of chronic diseases. Of 1,845 metagenome-assembled genomes (MAGs) in total, 141 formed two competing guilds with stable ecological relationships despite significant structural changes in the gut microbiome during a 3-month high-fiber intervention and 1-year follow-up in patients with type 2 diabetes mellitus (T2DM). The 50 genomes in Guild 1 contained more genes for plant polysaccharide degradation and butyrate production, while the 91 genomes in Guild 2 carried almost all of the pathogenicity and antibiotic resistance genes predicted from the 1,845 MAGs. Random forest regression models showed that the abundance distribution of the 141 genomes was associated with 41 of 43 bioclinical parameters. Using these 141 MAGs as reference genomes, the seesaw network not only proved detectable but also supported machine learning models for predictive classification between cases and controls for nine diseases, namely T2DM, atherosclerosis, hypertension, cirrhosis, inflammatory bowel disease, colorectal cancer, ankylosing spondylitis, schizophrenia, and Parkinson's disease, in 12 independent metagenomic datasets from 1,874 participants spanning ethnic and geographical ranges. The two seesaw-networked guilds function as a core microbiome, and their balance can be regulated for disease risk management.
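One simple way to put a number on the balance between the two guilds is a log-ratio of their summed abundances. The sketch below is only an illustration under stated assumptions: the log2 ratio, the pseudocount, the function name, and the genome identifiers are hypothetical and are not taken from the disclosure.

```python
from math import log2


def guild_balance(abundance, guild1, guild2, pseudocount=1e-6):
    """Log2 ratio of summed Guild 1 abundance to summed Guild 2 abundance.

    abundance: dict mapping genome id -> relative abundance in one sample.
    guild1, guild2: iterables of genome ids (hypothetical membership lists).
    The pseudocount avoids division by zero when a guild is absent.
    Positive values mean Guild 1 dominates; negative values mean Guild 2 does.
    """
    total1 = sum(abundance.get(g, 0.0) for g in guild1)
    total2 = sum(abundance.get(g, 0.0) for g in guild2)
    return log2((total1 + pseudocount) / (total2 + pseudocount))


# Toy sample (invented values): Guild 1 is about four times as abundant as Guild 2.
sample = {"g1a": 0.3, "g1b": 0.1, "g2a": 0.1}
balance = guild_balance(sample, ["g1a", "g1b"], ["g2a"])
reverse = guild_balance(sample, ["g2a"], ["g1a", "g1b"])
```

Swapping the two guild lists flips the sign of the result, which mirrors the seesaw picture: a gain at one end is a loss at the other.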

[0012] In one embodiment, the present disclosure provides a pharmaceutical composition comprising a first intestinal microorganism selected from those microorganisms listed in Figures 13A to 13XX. In some embodiments, the composition further comprises pharmaceutically acceptable excipients.

[0013] In one embodiment, the present disclosure provides a method for treating a subject in need of treatment, comprising administering to the subject a therapeutically effective amount of a pharmaceutical composition described herein. In some embodiments, the administration is by fecal microbiome transplantation. In some embodiments, the administration is by direct transplantation into the subject's intestines. In some embodiments, the administration is by oral ingestion.

[0014] In one aspect, the present disclosure provides a method and system for training a model to predict a subject's response to therapy. The method is performed in a computer system having one or more processors and memory storing one or more programs for execution by the one or more processors. The method includes obtaining, in electronic form, for each training subject in a plurality of training subjects, each of whom is receiving therapy for a disorder: (i) a corresponding plurality of genomic abundance values for the training subject at a time before the therapy is received, wherein the corresponding plurality of genomic abundance values includes, for each intestinal microorganism in a plurality of intestinal microorganisms, a corresponding value for the abundance of the genome of that intestinal microorganism in a corresponding biological sample from the gut of the training subject; and (ii) an index of the training subject's response to the therapy. The method also includes inputting information about each training subject in the plurality of training subjects into a model comprising a plurality of parameters, wherein the model applies the plurality of parameters to the information through at least 10,000 calculations to obtain a corresponding output from the model for each training subject. The corresponding output comprises a prediction of the training subject's response to the therapy, the information about each training subject comprises the corresponding genomic abundance value for each of the plurality of intestinal microorganisms, and the plurality of intestinal microorganisms is selected from Table 1, Table 2, or Figures 13A to 13XX. The method also includes adjusting the plurality of parameters based on one or more differences, for each training subject in the plurality of training subjects, between (i) the corresponding output from the model and (ii) the corresponding index of the training subject's response to the therapy.

[0015] Another aspect of the present disclosure provides a method and system for using a model to predict a subject's response to therapy. The method includes obtaining, in electronic form, a plurality of genomic abundance values comprising, for each intestinal microorganism in a plurality of intestinal microorganisms selected from Table 1, Table 2, or Figures 13A to 13XX, a corresponding value for the abundance of the genome of that intestinal microorganism in a biological sample from the subject. The method also includes inputting the plurality of genomic abundance values into a model comprising a plurality of parameters, wherein the model applies the plurality of parameters to the plurality of genomic abundance values through at least 10,000 calculations to generate, as an output from the model, a prediction of the subject's response to therapy.

[0016] As disclosed herein, any embodiment disclosed herein may be applied to any other embodiment where applicable.

[0017] Further aspects and advantages of the Disclosure will be readily apparent to those skilled in the art from the following detailed description, in which only exemplary embodiments of the Disclosure are shown and described. As will be understood, other different embodiments of the Disclosure are possible, and some of their details can be modified in various obvious ways without departing from the Disclosure. Accordingly, the drawings and description should be considered illustrative and not limiting in nature.

[0018] Accordingly, one aspect of the present invention provides a method for training a model for predicting a subject's response to therapy, performed in a computer system having one or more processors and memory storing one or more programs for execution by the one or more processors.

[0019] In some embodiments, the method includes obtaining, in electronic form, for each training subject in a plurality of training subjects, each training subject receiving therapy for a disorder: (i) a corresponding plurality of genomic abundance values for the training subject at a time point prior to the therapy, wherein the corresponding plurality of genomic abundance values includes, for each intestinal microorganism in a plurality of intestinal microorganisms, a corresponding value for the abundance of the genome of that intestinal microorganism in a corresponding biological sample from the gut of the training subject; and (ii) an index of the training subject's response to the therapy.

[0020] In some such embodiments, the method includes sequencing genomic DNA from the corresponding biological sample from the intestine of each training subject in the plurality of training subjects, thereby obtaining a corresponding plurality of at least 100,000 nucleic acid sequences.

[0021] In some such embodiments, the method includes obtaining, in electronic form, for each training subject in the plurality of training subjects, a corresponding plurality of at least 100,000 nucleic acid sequences for genomic DNA from the corresponding biological sample from the intestine of the training subject.

[0022] In some such embodiments, the method includes determining, for each intestinal microorganism in the plurality of intestinal microorganisms, the corresponding value for the abundance of the genome of the intestinal microorganism from the corresponding plurality of at least 100,000 nucleic acid sequences.

[0023] In some such embodiments, the method includes, for each training subject in the plurality of training subjects: assembling a corresponding plurality of intestinal microorganism genomes by metagenomic de novo sequence assembly from the corresponding plurality of at least 100,000 nucleic acid sequences in electronic form; and calculating, for each intestinal microorganism in the plurality of intestinal microorganisms, the corresponding value for the abundance of its genome based on the prevalence of the nucleic acid sequences, among the at least 100,000 nucleic acid sequences, that were used to assemble the genome corresponding to that intestinal microorganism.

[0024] In some such embodiments, the method includes, for each training subject in the plurality of training subjects: assigning nucleic acid sequences in the corresponding plurality of at least 100,000 nucleic acid sequences to intestinal microorganisms in the plurality of intestinal microorganisms, thereby generating, for each intestinal microorganism, a corresponding count of the nucleic acid sequences assigned to it; and determining the corresponding genomic abundance value for each intestinal microorganism based on its corresponding count.
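The count-based abundance determination in paragraph [0024] can be sketched as follows. This is an illustration under assumptions, not the computation of the disclosure: dividing each count by genome length and rescaling the results to sum to 1 is one common normalization chosen here for the example, and the function name, genome identifiers, and lengths are hypothetical.

```python
from collections import Counter


def genome_abundance(read_assignments, genome_lengths):
    """Turn per-read genome assignments into relative genomic abundance values.

    read_assignments: iterable with one entry per nucleic acid sequence, giving
        the id of the genome it was assigned to, or None if unassigned.
    genome_lengths: dict mapping genome id -> genome length in base pairs.
    Returns a dict mapping genome id -> length-normalized relative abundance.
    """
    counts = Counter(g for g in read_assignments if g is not None)
    # Reads per base pair, so that large genomes are not over-represented.
    coverage = {g: counts.get(g, 0) / length for g, length in genome_lengths.items()}
    total = sum(coverage.values())
    if total == 0:
        return {g: 0.0 for g in genome_lengths}
    return {g: c / total for g, c in coverage.items()}


# Toy input (invented): genome A is twice as long as B and receives twice the reads.
reads = ["A"] * 6 + ["B"] * 3 + [None]
abundance = genome_abundance(reads, {"A": 2_000_000, "B": 1_000_000})
```

Genome A collects twice as many reads as genome B only because it is twice as long, so after length normalization the two genomes come out equally abundant.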

[0025] In some such embodiments, the multiple intestinal microorganisms include at least 20 intestinal microorganisms selected from Table 1, Table 2, or Figures 13A to 13XX.

[0026] In some such embodiments, the multiple intestinal microorganisms include at least 20 microorganisms selected from those microorganisms in Table 1, Table 2, or Figures 13A to 13XX, each having at least two binding affinity.

[0027] In some such embodiments, the biological sample from the intestine of each training subject is a fecal sample from each subject.

[0028] In some such embodiments, the therapy is a biological therapy, immunotherapy, chemotherapy, radiotherapy, gene therapy, hormone therapy, photodynamic therapy, targeted therapy, small molecules, antibodies, polynucleotides, natural compounds, immunomodulators, bone marrow therapy, stem cell therapy, surgical therapy, induction therapy, maintenance therapy, or a combination thereof.

[0029] In some such embodiments, the disorder is selected from the group consisting of type 2 diabetes, hypertension, schizophrenia, atherosclerotic cardiovascular disease (ACVD), liver cirrhosis (LC), inflammatory bowel disease (IBD), colorectal cancer (CRC), ankylosing spondylitis (AS), Parkinson's disease (PD), rheumatoid arthritis (RA), progressive melanoma, and B-cell lymphoma.

[0030] In some embodiments, the disorder is cancer.

[0031] In some embodiments, the method also includes inputting information about each training subject in the plurality of training subjects into a model comprising a plurality of parameters, wherein the model applies the plurality of parameters to the information through at least 10,000 calculations to obtain a corresponding output from the model for each training subject. The corresponding output comprises a prediction of the training subject's response to the therapy, the information about each training subject comprises the corresponding genomic abundance value for each of the plurality of intestinal microorganisms, and the plurality of intestinal microorganisms is selected from Table 1, Table 2, or Figures 13A to 13XX.

[0032] In some such embodiments, the prediction of each training subject's response is a class output identifying one response among a plurality of possible responses.

[0033] In some such embodiments, the prediction of each training subject's response is a probabilistic output for each training subject's response.
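The difference between a probabilistic output and a class output can be shown with a short sketch. The softmax conversion, the function names, the response labels, and the raw scores below are assumptions chosen for illustration; the disclosure does not prescribe how a model produces either kind of output.

```python
from math import exp


def probabilistic_output(scores):
    """Convert raw per-response scores into probabilities with a softmax."""
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {name: exp(value - top) for name, value in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def class_output(probabilities):
    """Pick the single most probable response among the possible responses."""
    return max(probabilities, key=probabilities.get)


# Hypothetical raw scores for one subject.
probs = probabilistic_output({"responder": 2.0, "non_responder": 0.0})
predicted_class = class_output(probs)
```

The probabilistic output keeps the model's uncertainty (here roughly 0.88 for the responder label), whereas the class output reduces it to a single label.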

[0034] In some such embodiments, the model is a neural network algorithm, a support vector machine algorithm, a naive Bayes algorithm, a nearest neighbor algorithm, a boosted tree algorithm, a random forest algorithm, a convolutional neural network algorithm, a decision tree algorithm, a regression algorithm, or a clustering algorithm.
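As an illustration of the tree-ensemble options in this list, the sketch below trains a deliberately simplified stand-in for a random forest: bootstrap-aggregated depth-1 trees (decision stumps), each restricted to a randomly chosen feature. It is not the model of the disclosure. The function names, hyperparameters, the redraw of single-class bootstrap samples, and the two-feature toy data (summed abundance of two guilds, with 1 = responder) are all assumptions.

```python
import random


def fit_stump(X, y, features):
    """Return (feature, threshold, left_label, right_label) with the fewest errors."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            for left, right in ((0, 1), (1, 0)):
                errors = sum(
                    (left if row[f] <= t else right) != target
                    for row, target in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, left, right)
    return best[1:]


def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    """Fit stumps on bootstrap samples, each using a random subset of features."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        # Assumption of this sketch: redraw samples that contain one class only.
        while len({y[i] for i in idx}) < 2:
            idx = [rng.randrange(n) for _ in range(n)]
        feats = rng.sample(range(d), n_features)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_proba(forest, row):
    """Fraction of stumps voting for class 1 (responder)."""
    votes = sum(left if row[f] <= t else right for f, t, left, right in forest)
    return votes / len(forest)


# Toy training data (invented): [guild 1 abundance, guild 2 abundance].
X = [[0.30, 0.05], [0.25, 0.08], [0.35, 0.04],
     [0.05, 0.30], [0.08, 0.25], [0.04, 0.35]]
y = [1, 1, 1, 0, 0, 0]
forest = fit_forest(X, y)
p_high_guild1 = predict_proba(forest, [0.40, 0.02])
p_high_guild2 = predict_proba(forest, [0.02, 0.40])
```

A production random forest grows deeper trees and is normally taken from an established library; the stumps here only make the bagging and random feature selection visible in a few lines.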

[0035] In some such embodiments, the number of parameters is at least 1,000, at least 10,000, at least 15,000, at least 50,000, at least 100,000, at least 250,000, at least 500,000, or at least 1,000,000 parameters.

[0036] In some such embodiments, the model applies multiple parameters to the information by performing calculations at least 25,000, at least 50,000, at least 100,000, at least 250,000, at least 500,000, or at least 1,000,000 times to obtain corresponding outputs from the model for each training subject.
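To give a feel for the parameter and calculation counts recited above, the following sketch counts both for a small fully connected network. The architecture is hypothetical: 141 inputs (one per genome, echoing the 141 genomes mentioned earlier), a single hidden layer of 128 units, and one output. Only multiplications are counted as calculations, which is an assumption of this example.

```python
def dense_network_cost(layer_sizes):
    """Return (parameters, multiplications) for a fully connected network with biases.

    layer_sizes: list such as [inputs, hidden, ..., outputs].
    """
    params = 0
    multiplications = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        params += n_in * n_out + n_out  # weights plus one bias per output unit
        multiplications += n_in * n_out  # one multiplication per weight
    return params, multiplications


# Hypothetical architecture: 141 genome abundances -> 128 hidden units -> 1 output.
params, multiplications = dense_network_cost([141, 128, 1])
```

For this architecture the count comes to 18,305 parameters and 18,176 multiplications per subject, so even a small network exceeds the 10,000 figures used as lower bounds in the text.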

[0037] In some embodiments, the method includes adjusting the plurality of parameters based on one or more differences, for each training subject in the plurality of training subjects, between (i) the corresponding output from the model and (ii) the corresponding index of the training subject's response to the therapy.
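Adjusting parameters from the difference between a model's output and the recorded response can be sketched with logistic regression trained by stochastic gradient descent. This is one possible instance, chosen for brevity rather than taken from the disclosure; the learning rate, epoch count, function names, and toy data (two abundance features, response index 1 = responded) are assumptions.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def predict(weights, bias, row):
    """Model output: predicted probability of response for one subject."""
    return sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)


def train(X, y, learning_rate=0.5, epochs=500):
    """Fit weights by repeatedly nudging them against the prediction error."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        for row, target in zip(X, y):
            # Difference between (i) the model output and (ii) the response index.
            diff = predict(weights, bias, row) - target
            weights = [w - learning_rate * diff * x for w, x in zip(weights, row)]
            bias -= learning_rate * diff
    return weights, bias


# Toy training subjects (invented): [guild 1 abundance, guild 2 abundance].
X = [[0.30, 0.05], [0.25, 0.08], [0.35, 0.04],
     [0.05, 0.30], [0.08, 0.25], [0.04, 0.35]]
y = [1, 1, 1, 0, 0, 0]
weights, bias = train(X, y)
p_responder_like = predict(weights, bias, [0.40, 0.02])
p_non_responder_like = predict(weights, bias, [0.02, 0.40])
```

Each update moves the parameters in proportion to the error on one training subject, so subjects the model already predicts well change it very little.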

[0038] Another aspect of the present disclosure provides a method for using a model to predict a subject's response to a therapy for a disorder in a computer system having one or more processors and memory for storing one or more programs for execution by the one or more processors.

[0039] In some embodiments, the method includes obtaining, in electronic form, a plurality of genomic abundance values comprising, for each intestinal microorganism in a plurality of intestinal microorganisms selected from Table 1, Table 2, or Figures 13A to 13XX, a corresponding value for the abundance of the genome of that intestinal microorganism in a biological sample from the subject.

[0040] In some such embodiments, the method includes sequencing genomic DNA from a biological sample from the intestine of a subject to obtain a plurality of at least 100,000 nucleic acid sequences.

[0041] In some such embodiments, the method includes obtaining, in electronic form, a plurality of at least 100,000 nucleic acid sequences for genomic DNA from a biological sample from the intestine of a subject.

[0042] In some such embodiments, the method includes determining a corresponding value for the genome abundance of each individual intestinal microorganism among a plurality of intestinal microorganisms from a plurality of at least 100,000 nucleic acid sequences.

[0043] In some such embodiments, the method includes: assembling a plurality of intestinal microorganism genomes by metagenomic de novo sequence assembly from the plurality of at least 100,000 nucleic acid sequences in electronic form; and calculating, for each intestinal microorganism in the plurality of intestinal microorganisms, the corresponding value for the abundance of its genome based on the prevalence of the nucleic acid sequences, among the at least 100,000 nucleic acid sequences, that were used to assemble the genome corresponding to that intestinal microorganism.

[0044] In some such embodiments, the method includes: assigning nucleic acid sequences in the plurality of at least 100,000 nucleic acid sequences to intestinal microorganisms in the plurality of intestinal microorganisms, thereby generating, for each intestinal microorganism, a corresponding count of the nucleic acid sequences assigned to it; and determining the corresponding genomic abundance value for each intestinal microorganism based on its corresponding count.

[0045] In some such embodiments, the multiple intestinal microorganisms include at least 20 microorganisms selected from those microorganisms in Table 1, Table 2, or Figures 13A to 13XX, each having at least two binding affinity.

[0046] In some such embodiments, the biological sample from the intestine of the subject is a fecal sample.

[0047] In some such embodiments, the therapy is a biological therapy, immunotherapy, chemotherapy, radiotherapy, gene therapy, hormone therapy, photodynamic therapy, targeted therapy, small molecules, antibodies, polynucleotides, natural compounds, immunomodulators, bone marrow therapy, stem cell therapy, surgical therapy, induction therapy, maintenance therapy, or a combination thereof.

[0048] In some such embodiments, the disorder is selected from the group consisting of type 2 diabetes, hypertension, schizophrenia, atherosclerotic cardiovascular disease (ACVD), liver cirrhosis (LC), inflammatory bowel disease (IBD), colorectal cancer (CRC), ankylosing spondylitis (AS), Parkinson's disease (PD), rheumatoid arthritis (RA), progressive melanoma, and B-cell lymphoma.

[0049] In some embodiments, the disorder is cancer.

[0050] In some embodiments, the method includes inputting the plurality of genomic abundance values into a model comprising a plurality of parameters, wherein the model applies the plurality of parameters to the plurality of genomic abundance values through at least 10,000 calculations to generate a prediction of the subject's response to therapy as an output from the model.

[0051] In some such embodiments, the prediction of the subject's response is a class output identifying one response among a plurality of possible responses.

[0052] In some such embodiments, the prediction of the subject's response is a probability output for the subject's response.

[0053] In some such embodiments, the model is a neural network algorithm, a support vector machine algorithm, a naive Bayes algorithm, a nearest neighbor algorithm, a boosted tree algorithm, a random forest algorithm, a convolutional neural network algorithm, a decision tree algorithm, a regression algorithm, or a clustering algorithm.

[0054] In some such embodiments, the number of parameters is at least 1,000, at least 10,000, at least 15,000, at least 50,000, at least 100,000, at least 250,000, at least 500,000, or at least 1,000,000 parameters.

[0055] In some such embodiments, the model applies multiple parameters to the information by performing calculations at least 25,000, at least 50,000, at least 100,000, at least 250,000, at least 500,000, or at least 1,000,000 times to obtain corresponding outputs for each subject from the model.

[0056] In some embodiments, the method includes i) administering a therapy to a subject if the predicted response of the subject to the therapy satisfies the threshold likelihood that the subject will respond well to the therapy, and ii) treating the subject by administering one or more of a plurality of intestinal microorganisms to the subject if the predicted response of the subject to the therapy does not satisfy the threshold likelihood that the subject will respond well to the therapy.
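The two-branch decision in paragraph [0056] amounts to a threshold rule on the model's output. The sketch below is illustrative only: the 0.5 default threshold, the reading of "satisfies" as greater than or equal to, and the function and action names are assumptions, since the disclosure does not fix any of them here.

```python
def choose_action(predicted_response, threshold=0.5):
    """Map a predicted probability of response to one of the two branches.

    predicted_response: model output in [0, 1] (hypothetical probability scale).
    threshold: hypothetical threshold likelihood of responding well.
    """
    if not 0.0 <= predicted_response <= 1.0:
        raise ValueError("predicted_response must lie between 0 and 1")
    if predicted_response >= threshold:
        return "administer_therapy"
    return "administer_intestinal_microorganisms"


likely_responder = choose_action(0.82)
unlikely_responder = choose_action(0.31)
```

In practice the threshold would be set from validation data and clinical judgment rather than fixed at a default.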

[0057] Another aspect of this disclosure provides a computer system comprising one or more processors and a non-transitory computer-readable medium containing computer-executable instructions that, when executed by the one or more processors, cause the processors to perform the methods described herein.

[0058] Another aspect of this disclosure provides a non-transitory computer-readable storage medium storing instructions that, when executed by a computer system, cause the computer system to perform any of the methods described herein.

[Brief Description of Drawings]

[0059] [Figure 1] An illustrative block diagram of an exemplary computing device according to some embodiments of the present disclosure.
[Figures 2A to 2D] Collectively provide a flowchart of processes and features for training a model to predict a subject's response to therapy for a disorder, according to some embodiments of the present disclosure.
[Figures 3A to 3C] Collectively provide a flowchart of processes and features for predicting a subject's response to therapy for a disorder, according to some embodiments of the present disclosure.
[Figures 4A to 4H] Collectively demonstrate that reversible changes in the gut microbiota induced by a high-fiber diet are associated with corresponding changes in metabolic phenotypes in patients with type 2 diabetes mellitus (T2DM).
[Figure 4A] Study design of the QD trial. Written informed consent, a personal information questionnaire, and HbA1c-based screening were completed during the run-in period. Health checkups and sample collections were performed after the run-in at baseline (M0), after 3 months of high-fiber intervention (W) or normal diet (U) (M3), and 1 year after discontinuation of the high-fiber intervention (M15).
[Figure 4B] Changes in fiber intake.
[Figure 4C] Overall changes in the gut microbiome, shown by principal coordinate analysis based on Bray-Curtis distances calculated from 1,845 HQMAGs.
[Figure 4D] Mean Bray-Curtis distances between groups. PERMANOVA tests (9,999 permutations) were performed to compare groups. *P<0.05 and ***P<0.001. The color intensity of each square indicates the magnitude of the mean Bray-Curtis distance.
[Figure 4E] Changes in HbA1c.
[Figure 4F] Percentage of participants achieving adequate glycemic control.
[Figure 4G] Fasting blood glucose.
[Figure 4H] Area under the curve (AUC) of glucose in the meal tolerance test (MTT). For (E), (G), and (H), data are presented as the percentage change from baseline (±SEM). Friedman tests followed by Nemenyi post-hoc tests were used for comparisons within the same group, with compact letters indicating significance (P<0.05); n=67 in the W group and n=28 in the U group. Two-tailed Mann-Whitney tests were used for comparisons between W and U at the same time point, with *P<0.05, **P<0.01, and ***P<0.001; n=74 for W(M0) (n=72 for panel H), n=74 for W(M3), n=67 for W(M15), n=36 for U(M0), n=36 for U(M3), and n=28 for U(M15).
[Figure 5A] Figures 5A and 5B collectively illustrate how two competing bacterial guilds associated with HbA1c levels form a robust seesaw-like network within the ecosystem, despite substantial overall changes in the gut microbiota induced by the high-fiber intervention. The distribution of the different types of correlations between genome pairs over the course of the trial is shown. The three letters indicate the correlation between a genome pair at M0, M3, and M15 in succession (N for negative, P for positive, and U for uncorrelated). The stable correlations, NNN and PPP, are highlighted.
[Figure 5B] The collective description given for Figure 5A applies.
Correlations between genome clusters and HbA1c were assessed using linear mixed-effects models with the MaAsLin2 package. Abundances were log-transformed, and subject was included as a random effect. N=67. *BH-adjusted P<0.05, ***BH-adjusted P<0.001.
[Figures 6A to 6E11] Collectively illustrate how genomes within two competing guilds predict metabolic health outcomes in T2DM patients in the QD trial and distinguish cases from controls across seven diseases in 11 independent case-control metagenomic datasets (Case-Control Dataset Collection I).
[Figure 6A] Changes in the mean range-normalized, robust centered log-ratio (rclr)-transformed abundances and their ratio across the trial in the W group. Differences between time points were analyzed using the Friedman test followed by the Nemenyi post-hoc test. Compact letters reflect significance at P<0.05.
[Figure 6B] Prediction of clinical parameters using genomes from the two competing guilds. A linear mixed-effects model with subject as a random effect was trained on the mean range-normalized rclr abundances of Guild 1 and Guild 2 and the clinical parameters at M0 and M3. Clinical parameters were then predicted by the trained model from the mean range-normalized rclr abundances of Guild 1 and Guild 2 at M15. The bar plot shows the Pearson correlation coefficients between predicted and measured clinical parameters at M15. An asterisk before the parameter name indicates the significance of the Pearson correlation. P-values were adjusted using the Benjamini-Hochberg method. *Adjusted P<0.05, **adjusted P<0.01, and ***adjusted P<0.001.
Abbreviations: BMI, body mass index; SBP, systolic blood pressure; DBP, diastolic blood pressure; WC, waist circumference; HP, hip circumference; TNF-α, tumor necrosis factor-α; WBC, white blood cell count; CRP, C-reactive protein; LBP, lipopolysaccharide-binding protein; TC, total cholesterol; TG, triglycerides; Lpa, lipoprotein a; HDL, high-density lipoprotein; APOA, apolipoprotein A; LDL, low-density lipoprotein; APOB, apolipoprotein B; GFR (MDRR); CysC, cystatin C; ACR, urinary albumin/creatinine ratio; IMT, intima-media thickness; DAN, diabetic autonomic neuropathy score; MHR, mean heart rate; SDNN, standard deviation of NN intervals; SDANN, standard deviation of the mean NN intervals calculated over 5-minute segments; SDNNIndex, mean of the NN interval standard deviations for 5-minute segments; rMSSD, root mean square of differences between consecutive NN intervals; pNN50, percentage of differences between consecutive NN intervals greater than 50 ms; TP, total power; VLF, very low frequency power; LF, low frequency power; HF, high frequency power; DPN, diabetic peripheral neuropathy score.
[Figures 6C1 and 6C2] Differences in genetic capacity for carbohydrate substrate utilization (CAZy), short-chain fatty acid (SCFA) production, antibiotic resistance genes (ARG), and virulence factor genes (VF). Heatmaps show the proportion (CAZy) or gene copy number (SCFA, ARG, and VF) of each category in each genome. For carbohydrate substrate utilization, CAZy genes were predicted in each genome. The proportion of CAZy genes for a particular substrate was calculated by dividing the number of CAZy genes involved in its utilization by the total number of CAZy genes.
CAZy families by substrate: arabinoxylan-related: CE1, CE2, CE4, CE6, CE7, GH10, GH11, GH115, GH43, GH51, GH67, GH3, and GH5; cellulose-related: GH1, GH44, GH48, GH8, GH9, GH3, and GH5; inulin-related: GH32 and GH91; mucin-related: GH1, GH2, GH3, GH4, GH18, GH19, GH20, GH29, GH33, GH38, GH58, GH79, GH84, GH85, GH88, GH89, GH92, GH95, GH98, GH99, GH101, GH105, GH109, GH110, GH113, PL6, PL8, PL12, PL13, and PL21; pectin-related: CE12, CE8, GH28, PL1, and PL9; starch-related: GH13, GH31, and GH97.
For short-chain fatty acid production: FTHFS is formate-tetrahydrofolate ligase, for acetic acid production; ScpC is propionyl-CoA:succinate-CoA transferase and Pct is propionate-CoA transferase, for propionic acid production; But is butyryl-coenzyme A (butyryl-CoA):acetate-CoA transferase, Buk is butyrate kinase, 4Hbt is butyryl-CoA:4-hydroxybutyrate-CoA transferase, and Ato is butyryl-CoA:acetoacetate-CoA transferase (AtoA: alpha subunit, AtoD: beta subunit), for butyrate production. The differences between Guild 1 and Guild 2 were analyzed using the two-tailed Mann-Whitney test. #P<0.1, *P<0.05, **P<0.01, and ***P<0.001. Number of genomes in Guild 1 (green bars): n=50; in Guild 2 (purple bars): n=91.
[Figure 6D] Eleven datasets covering seven diseases, namely type 2 diabetes (T2D), liver cirrhosis (LC), ankylosing spondylitis (AS), atherosclerotic cardiovascular disease (ACVD), schizophrenia (SCZ), colorectal cancer (CRC), and inflammatory bowel disease (IBD), were collected as Case-Control Dataset Collection I. The sample sizes of the cases and controls in each dataset are indicated by the red and green numbers in the left panel, respectively.
[Figures 6E1 and 6E2] For each dataset in the collection, metagenomic reads were recruited to the 141 genomes in the two competing guilds from QD to estimate the abundance of each genome in each sample, and a random forest classification model with leave-one-out cross-validation was trained on the resulting abundance matrix of the 141 genomes to classify cases and controls in that dataset. The ROC curves and areas under the curve (AUC) are shown.
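By way of non-limiting illustration, the leave-one-out cross-validation referred to in the Figure 6E captions can be sketched as follows. The sketch rests on stated assumptions and is not the disclosed implementation: a nearest-centroid scorer replaces the random forest so that the example needs no external library, and the toy abundance matrix, the labels, and the names `centroid_score` and `leave_one_out_scores` are hypothetical.

```python
from statistics import mean


def centroid_score(train_x, train_y, sample):
    """Stand-in scorer (NOT a random forest, assumed for illustration).

    Returns a positive value when the sample lies closer to the case
    centroid (label 1) than to the control centroid (label 0).
    """
    def centroid(label):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        return [mean(col) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return sq_dist(sample, centroid(0)) - sq_dist(sample, centroid(1))


def leave_one_out_scores(x, y, scorer=centroid_score):
    """Score every sample with a model fitted on all the other samples."""
    scores = []
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]  # hold out sample i
        train_y = y[:i] + y[i + 1:]
        scores.append(scorer(train_x, train_y, x[i]))
    return scores


# Hypothetical toy data: rows are samples, columns are genome abundances.
abundances = [
    [5.0, 1.0], [4.5, 1.5], [4.0, 0.5],  # controls
    [1.0, 5.0], [0.5, 4.0], [1.5, 4.5],  # cases
]
labels = [0, 0, 0, 1, 1, 1]  # 0 = control, 1 = case
loo_scores = leave_one_out_scores(abundances, labels)
```

Each sample is scored by a model that never saw it, so the held-out scores can be used for an ROC analysis. In practice the `scorer` argument would be replaced by a random forest classifier.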
[Figures 6E3 to 6E7] Further panels of the Figure 6 series, which illustrates how genomes within two competing guilds distinguish cases from controls across seven diseases in 11 independent case-control metagenomic datasets (Case-Control Dataset Collection I). For each dataset in the collection, metagenomic reads were recruited to the 141 genomes in the two competing guilds from QD to estimate the abundance of each genome in each sample, and a random forest classification model with leave-one-out cross-validation was trained on the resulting abundance matrix of the 141 genomes to classify cases and controls in that dataset. The ROC curves and areas under the curve (AUC) are shown.
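By way of non-limiting illustration, an ROC curve and its AUC of the kind reported for each dataset can be computed from held-out classifier scores as sketched below. The labels and scores are made-up values, and the function names `roc_points` and `roc_auc` are hypothetical. The AUC is computed here as the fraction of case-control pairs in which the case receives the higher score, with ties counted as one half.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs.

    The decision threshold is swept from the highest score to the lowest;
    a sample is called a case when its score is >= the threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_auc(labels, scores):
    """AUC as the share of (case, control) pairs ranked correctly."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5  # ties count as half a win
    return wins / (len(cases) * len(controls))


# Hypothetical held-out scores: 1 = case, 0 = control.
example_labels = [1, 1, 0, 1, 0, 0]
example_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
curve = roc_points(example_labels, example_scores)
auc = roc_auc(example_labels, example_scores)  # 8 of 9 pairs ranked correctly
```

The pairwise definition is used because it is short and handles ties explicitly. The trapezoidal area under the points returned by `roc_points` gives the same value.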
[Figures 6E8 to 6E11] Further panels of the Figure 6 series, which illustrates how genomes within two competing guilds distinguish cases from controls across seven diseases in 11 independent case-control metagenomic datasets (Case-Control Dataset Collection I). For each dataset in the collection, metagenomic reads were recruited to the 141 genomes in the two competing guilds from QD to estimate the abundance of each genome in each sample, and a random forest classification model with leave-one-out cross-validation was trained on the resulting abundance matrix of the 141 genomes to classify cases and controls in that dataset. The ROC curves and areas under the curve (AUC) are shown.
[Figure 7A] Figures 7A and 7B collectively illustrate genomes forming two competing guilds, identified from a single disease-specific case-control dataset, and demonstrate their significant effectiveness in classifying cases from controls across independent datasets for different diseases within Case-Control Dataset Collection I. Identification of two competing guilds in a seesaw network in the datasets of Case-Control Dataset Collection I. The sample sizes of the cases and controls in each case-control study are indicated by the red and green numbers in the left panel, respectively. Case-Control Dataset Collection I contains 11 published metagenomic case-control datasets for seven diseases: type 2 diabetes (T2D), liver cirrhosis (LC), ankylosing spondylitis (AS), atherosclerotic cardiovascular disease (ACVD), schizophrenia (SCZ), colorectal cancer (CRC), and inflammatory bowel disease (IBD).
CRC was analyzed by combining the datasets from three studies, and IBD by combining the datasets from two studies. In the 100% stacked bars, the proportion of correlations that follow the seesaw network pattern of two competing guilds (i.e., positive edges within each guild and negative edges between the two guilds) is shown in yellow, and the proportion that do not (negative correlations within a guild or positive correlations between guilds) is shown in black.
[Figure 7B] The collective description given for Figure 7A applies. The two competing guilds found in one dataset were used as predictors to classify cases and controls in that dataset and in the 10 other datasets. A random forest classification model with leave-one-out cross-validation was applied to each dataset for each set of two competing guilds. Area under the ROC curve (AUC) values are shown in a heatmap.
[Figure 8A] The panels of Figure 8 collectively illustrate that combined core genomes extracted from all identified competing guilds effectively differentiate cases from controls across a broader range of diseases and predict treatment outcomes in independent datasets. Combined genomes and combined core genomes were identified from the eight sets of two competing guilds generated from the QD dataset and Case-Control Dataset Collection I. All HQMAGs in the sets of two competing guilds were dereplicated using a 99% average nucleotide identity (ANI) cutoff between any two genomes, yielding 788 non-redundant HQMAGs as the combined genomes of all eight sets. Random forest classification models with leave-one-out cross-validation were constructed on the 788 HQMAGs in each dataset, and the HQMAGs were ranked by their importance across all models. HQMAGs were then removed one at a time, starting with the least important HQMAG (largest importance rank), and the random forest classification model was rerun on each dataset after each removal. For each dataset, the numbers of HQMAGs were ranked by area under the ROC curve (AUC). The scatter plot shows the relationship between the number of HQMAGs and model performance; the y-axis is the sum of the AUC-based ranks (smaller values indicate better performance). The best performance was achieved with 302 HQMAGs. After excluding 18 HQMAGs that showed inconsistent C1A and C1B assignments across datasets, 284 of the 302 HQMAGs were retained as the combined core genomes of all eight sets of two competing guilds.
[Figure 8B1] The collective description given for Figure 8A applies. The combined core genomes were used as predictors for Case-Control Dataset Collection II, which contains 15 published metagenomic case-control datasets for 10 diseases: one for ankylosing spondylitis (AS#2: cases n=85, controls n=55); one for autism spectrum disorder (ASD: cases n=64, controls n=64); one for Behçet's disease (BD: cases n=24, controls n=52); one for COVID-19 (cases n=47, controls n=19); three for colorectal cancer (CRC) (CRC#4: cases n=40, controls n=40; CRC#5: cases n=61, controls n=52; CRC#6: cases n=52, controls n=52); one for Graves' disease (GD: cases n=88, controls n=62); two for hypertension (HT) (HT#1: cases n=60, controls n=56; HT#2: cases n=99, controls n=41); one for multiple sclerosis (MS: cases n=24, controls n=24); three for pancreatic cancer (PC) (PC#1: cases n=43, controls n=235; PC#2: cases n=57, controls n=50; PC#3: cases n=44, controls n=32); and one for Parkinson's disease (PD: cases n=39, controls n=40).
A random forest classification model with leave-one-out cross-validation was applied to each dataset.
[Figures 8B2 to 8B9] The description given for Figure 8B1 applies to each of these panels: the combined core genomes were used as predictors for the 15 datasets of Case-Control Dataset Collection II (datasets and sample sizes as listed for Figure 8B1), and a random forest classification model with leave-one-out cross-validation was applied to each dataset.
[Figure 8B10] Together with the other panels of Figure 8, illustrates that combined core genomes drawn from all identified competing guilds effectively differentiate cases from controls across a broader range of diseases and predict treatment outcomes in independent datasets.
Combined cores are used in one dataset for ankylosing spondylitis (AS#2): n=85 cases, n=55 controls; one for autism spectrum disorder (ASD): n=64 cases, n=64 controls; one for Behçet's disease (BD): n=24 cases, n=52 controls; one for COVID-19: n=47 cases, n=19 controls; three for colorectal cancer (CRC): CRC#4 n=40 cases, n=40 controls; CRC#5 n=61 cases, n=52 controls; CRC#6 n=52 cases, n=52 controls; one for Graves' disease (GD): n=88 cases, n=62 controls; and hypertension ( The following were used as predictors for the Case-Control Dataset Collection II, which has 15 published metagenomics case-control datasets for 10 diseases: two for HT (Heatstroke) (HT#1 cases n=60, controls n=56; HT#2 cases n=99, controls n=41); one for multiple sclerosis (MS) (cases n=24, controls n=24); three for pancreatic cancer (PC) (PC#1 cases n=43, controls n=235; PC#2 cases n=57, controls n=50; PC#3 cases n=44, controls n=32); and one for Parkinson's disease (PD) (cases n=39, controls n=40). A random forest classification model with skipped cross-validation was applied to each dataset. [Figure 8B11]Collectively illustrate combined core genomes drawn from all identified competing guilds to effectively differentiate cases from controls across a broader range of diseases and predict treatment outcomes in independent datasets. 
Combined cores are used in one dataset for ankylosing spondylitis (AS#2): n=85 cases, n=55 controls; one for autism spectrum disorder (ASD): n=64 cases, n=64 controls; one for Behçet's disease (BD): n=24 cases, n=52 controls; one for COVID-19: n=47 cases, n=19 controls; three for colorectal cancer (CRC): CRC#4 n=40 cases, n=40 controls; CRC#5 n=61 cases, n=52 controls; CRC#6 n=52 cases, n=52 controls; one for Graves' disease (GD): n=88 cases, n=62 controls; and hypertension ( The following were used as predictors for the Case-Control Dataset Collection II, which has 15 published metagenomics case-control datasets for 10 diseases: two for HT (Heatstroke) (HT#1 cases n=60, controls n=56; HT#2 cases n=99, controls n=41); one for multiple sclerosis (MS) (cases n=24, controls n=24); three for pancreatic cancer (PC) (PC#1 cases n=43, controls n=235; PC#2 cases n=57, controls n=50; PC#3 cases n=44, controls n=32); and one for Parkinson's disease (PD) (cases n=39, controls n=40). A random forest classification model with skipped cross-validation was applied to each dataset. [Figure 8B12]Collectively illustrate combined core genomes drawn from all identified competing guilds to effectively differentiate cases from controls across a broader range of diseases and predict treatment outcomes in independent datasets. 
Combined cores are used in one dataset for ankylosing spondylitis (AS#2): n=85 cases, n=55 controls; one for autism spectrum disorder (ASD): n=64 cases, n=64 controls; one for Behçet's disease (BD): n=24 cases, n=52 controls; one for COVID-19: n=47 cases, n=19 controls; three for colorectal cancer (CRC): CRC#4 n=40 cases, n=40 controls; CRC#5 n=61 cases, n=52 controls; CRC#6 n=52 cases, n=52 controls; one for Graves' disease (GD): n=88 cases, n=62 controls; and hypertension ( The following were used as predictors for the Case-Control Dataset Collection II, which has 15 published metagenomics case-control datasets for 10 diseases: two for HT (Heatstroke) (HT#1 cases n=60, controls n=56; HT#2 cases n=99, controls n=41); one for multiple sclerosis (MS) (cases n=24, controls n=24); three for pancreatic cancer (PC) (PC#1 cases n=43, controls n=235; PC#2 cases n=57, controls n=50; PC#3 cases n=44, controls n=32); and one for Parkinson's disease (PD) (cases n=39, controls n=40). A random forest classification model with skipped cross-validation was applied to each dataset. [Figure 8B13]Collectively illustrate combined core genomes drawn from all identified competing guilds to effectively differentiate cases from controls across a broader range of diseases and predict treatment outcomes in independent datasets. 
Combined cores are used in one dataset for ankylosing spondylitis (AS#2): n=85 cases, n=55 controls; one for autism spectrum disorder (ASD): n=64 cases, n=64 controls; one for Behçet's disease (BD): n=24 cases, n=52 controls; one for COVID-19: n=47 cases, n=19 controls; three for colorectal cancer (CRC): CRC#4 n=40 cases, n=40 controls; CRC#5 n=61 cases, n=52 controls; CRC#6 n=52 cases, n=52 controls; one for Graves' disease (GD): n=88 cases, n=62 controls; and hypertension ( The following were used as predictors for the Case-Control Dataset Collection II, which has 15 published metagenomics case-control datasets for 10 diseases: two for HT (Heatstroke) (HT#1 cases n=60, controls n=56; HT#2 cases n=99, controls n=41); one for multiple sclerosis (MS) (cases n=24, controls n=24); three for pancreatic cancer (PC) (PC#1 cases n=43, controls n=235; PC#2 cases n=57, controls n=50; PC#3 cases n=44, controls n=32); and one for Parkinson's disease (PD) (cases n=39, controls n=40). A random forest classification model with skipped cross-validation was applied to each dataset. [Figure 8B14]Collectively illustrate combined core genomes drawn from all identified competing guilds to effectively differentiate cases from controls across a broader range of diseases and predict treatment outcomes in independent datasets. 
Combined cores are used in one dataset for ankylosing spondylitis (AS#2): n=85 cases, n=55 controls; one for autism spectrum disorder (ASD): n=64 cases, n=64 controls; one for Behçet's disease (BD): n=24 cases, n=52 controls; one for COVID-19: n=47 cases, n=19 controls; three for colorectal cancer (CRC): CRC#4 n=40 cases, n=40 controls; CRC#5 n=61 cases, n=52 controls; CRC#6 n=52 cases, n=52 controls; one for Graves' disease (GD): n=88 cases, n=62 controls; and hypertension ( The following were used as predictors for the Case-Control Dataset Collection II, which has 15 published metagenomics case-control datasets for 10 diseases: two for HT (Heatstroke) (HT#1 cases n=60, controls n=56; HT#2 cases n=99, controls n=41); one for multiple sclerosis (MS) (cases n=24, controls n=24); three for pancreatic cancer (PC) (PC#1 cases n=43, controls n=235; PC#2 cases n=57, controls n=50; PC#3 cases n=44, controls n=32); and one for Parkinson's disease (PD) (cases n=39, controls n=40). A random forest classification model with skipped cross-validation was applied to each dataset. [Figure 8B15]Collectively illustrate combined core genomes drawn from all identified competing guilds to effectively differentiate cases from controls across a broader range of diseases and predict treatment outcomes in independent datasets. 
Combined cores are used in one dataset for ankylosing spondylitis (AS#2): n=85 cases, n=55 controls; one for autism spectrum disorder (ASD): n=64 cases, n=64 controls; one for Behçet's disease (BD): n=24 cases, n=52 controls; one for COVID-19: n=47 cases, n=19 controls; three for colorectal cancer (CRC): CRC#4 n=40 cases, n=40 controls; CRC#5 n=61 cases, n=52 controls; CRC#6 n=52 cases, n=52 controls; one for Graves' disease (GD): n=88 cases, n=62 controls; and hypertension ( The following were used as predictors for the Case-Control Dataset Collection II, which has 15 published metagenomics case-control datasets for 10 diseases: two for HT (Heatstroke) (HT#1 cases n=60, controls n=56; HT#2 cases n=99, controls n=41); one for multiple sclerosis (MS) (cases n=24, controls n=24); three for pancreatic cancer (PC) (PC#1 cases n=43, controls n=235; PC#2 cases n=57, controls n=50; PC#3 cases n=44, controls n=32); and one for Parkinson's disease (PD) (cases n=39, controls n=40). A random forest classification model with skipped cross-validation was applied to each dataset. [Figure 8B16]Collectively illustrate combined core genomes drawn from all identified competing guilds to effectively differentiate cases from controls across a broader range of diseases and predict treatment outcomes in independent datasets. 
Combined cores are used in one dataset for ankylosing spondylitis (AS#2): n=85 cases, n=55 controls; one for autism spectrum disorder (ASD): n=64 cases, n=64 controls; one for Behçet's disease (BD): n=24 cases, n=52 controls; one for COVID-19: n=47 cases, n=19 controls; three for colorectal cancer (CRC): CRC#4 n=40 cases, n=40 controls; CRC#5 n=61 cases, n=52 controls; CRC#6 n=52 cases, n=52 controls; one for Graves' disease (GD): n=88 cases, n=62 controls; and hypertension ( The following were used as predictors for the Case-Control Dataset Collection II, which has 15 published metagenomics case-control datasets for 10 diseases: two for HT (Heatstroke) (HT#1 cases n=60, controls n=56; HT#2 cases n=99, controls n=41); one for multiple sclerosis (MS) (cases n=24, controls n=24); three for pancreatic cancer (PC) (PC#1 cases n=43, controls n=235; PC#2 cases n=57, controls n=50; PC#3 cases n=44, controls n=32); and one for Parkinson's disease (PD) (cases n=39, controls n=40). A random forest classification model with skipped cross-validation was applied to each dataset. [Figure 8C1]The combined core genomes extracted from all identified competing guilds were collectively illustrated to effectively differentiate cases from controls across a broader range of diseases and predict treatment outcomes in independent datasets. The combined core genomes were used as predictors in the treatment dataset collection to predict responders (R) and non-responders (NR) under treatment. For inflammatory bowel disease (IBD), 14 weeks of remission were used to determine R and NR in patients with IBD to anti-cytokine or anti-antigen therapy. IBD_anti-cytokine: R n=29, NR n=18; IBD_anti-integrin #1: R n=27, NR n=40; ​​IBD_anti-integrin #2: R n=29, NR n=53. 
For rheumatoid arthritis (RA), responders to methotrexate (MTX) were predefined as patients with newly diagnosed RA who showed an improvement of ≥1.8 in the Disease Activity Score in 28 joints (DAS28) by 4 months after initiation of MTX monotherapy. For advanced melanoma (AM), progression-free survival was used to define R and NR for immune checkpoint inhibitor (ICI) therapy. AM_ICI#1: R n=4, NR n=7; AM_ICI#2: R n=10, NR n=8; AM_ICI#3: R n=12, NR n=13; AM_ICI#4: R n=25, NR n=30; AM_ICI#5: R n=26, NR n=28. For B-cell lymphoma, the tumor response to CAR-T cell immunotherapy was classified by the treating physician as either complete remission or incomplete remission (partial remission, stable disease, progressive disease, or death) 180 days after CAR-T cell infusion. The model was trained on a German cohort and validated on a US cohort. Germany: R n=21, NR n=29; US: R n=21, NR n=24.
[Figure 8C2] Collectively illustrates that the combined core genomes from all identified competing guilds effectively differentiate cases from controls across a broader range of diseases and predict treatment outcomes in independent datasets. The predictors, treatment datasets, response definitions, and sample sizes are the same as for Figure 8C1.
[Figure 9A] In Case-Control Dataset Collections I and II, collectively illustrates the discriminative power of the combined core genomes from all eight sets of two competing guilds in classifying healthy individuals versus patients across the colorectal cancer (CRC), inflammatory bowel disease (IBD), and pancreatic cancer (PC) datasets. Prediction matrices for case-control classification based on the combined core genomes from all eight sets of two competing guilds are shown within each dataset (values on the diagonal), across pairs of datasets (one dataset used for model training and the other for testing), and in a leave-one-dataset-out setting (the model is trained on all but one dataset and tested on the left-out dataset). A random forest classification model with leave-one-out cross-validation was applied. Area under the ROC curve (AUC) values are shown in the matrix. CRC. #1: cases n=74, controls n=54; #2: cases n=46, controls n=63; #3: cases n=22, controls n=60; #4: cases n=40, controls n=40; #5: cases n=61, controls n=52; #6: cases n=52, controls n=52.
[Figure 9B] In Case-Control Dataset Collections I and II, collectively illustrates the discriminative power of the combined core genomes from all eight sets of two competing guilds in classifying healthy individuals versus patients across the colorectal cancer (CRC), inflammatory bowel disease (IBD), and pancreatic cancer (PC) datasets.
Prediction matrices for case-control classification based on the combined core genomes from all eight sets of two competing guilds are shown within each dataset (values on the diagonal), across pairs of datasets (one dataset used for model training and the other for testing), and in a leave-one-dataset-out setting (the model is trained on all but one dataset and tested on the left-out dataset). A random forest classification model with leave-one-out cross-validation was applied. Area under the ROC curve (AUC) values are shown in the matrix. IBD. #1: cases n=80, controls n=26; #2: cases n=121, controls n=34; #3: cases n=43, controls n=22.
[Figure 9C] In Case-Control Dataset Collections I and II, collectively illustrates the discriminative power of the combined core genomes from all eight sets of two competing guilds in classifying healthy individuals versus patients across the colorectal cancer (CRC), inflammatory bowel disease (IBD), and pancreatic cancer (PC) datasets. Prediction matrices for case-control classification based on the combined core genomes from all eight sets of two competing guilds are shown within each dataset (values on the diagonal), across pairs of datasets (one dataset used for model training and the other for testing), and in a leave-one-dataset-out setting (the model is trained on all but one dataset and tested on the left-out dataset). A random forest classification model with leave-one-out cross-validation was applied. Area under the ROC curve (AUC) values are shown in the matrix. PC. #1: cases n=43, controls n=235; #2: cases n=57, controls n=50; #3: cases n=44, controls n=32.
[Figure 10A1] Collectively illustrates that the combined cores of the two competing guilds support prediction of therapeutic efficacy in the treatment dataset collection for inflammatory bowel disease, rheumatoid arthritis, advanced melanoma, and B-cell lymphoma.
The abundance of the combined core genomes (284 HQMAGs) in pre-treatment samples was used as the predictor in a random forest classification model to predict responders (R) and non-responders (NR) to treatment. The ROC curve and the area under the ROC curve (AUC) are shown in the panel. R and NR were defined by remission at week 14. IBD_anti-cytokine: R n=29, NR n=18; IBD_anti-integrin #1: R n=27, NR n=40; IBD_anti-integrin #2: R n=29, NR n=53.
[Figure 10A2] Collectively illustrates that the combined cores of the two competing guilds support prediction of therapeutic efficacy in the treatment dataset collection for inflammatory bowel disease, rheumatoid arthritis, advanced melanoma, and B-cell lymphoma. The predictor, model, response definition, and sample sizes are the same as for Figure 10A1.
[Figure 10B1] Collectively illustrates that the combined cores of the two competing guilds support prediction of therapeutic efficacy in the treatment dataset collection for inflammatory bowel disease, rheumatoid arthritis, advanced melanoma, and B-cell lymphoma. The abundance of the combined core genomes (284 HQMAGs) in pre-treatment samples was used as the predictor in a random forest classification model to predict responders (R) and non-responders (NR) to treatment. The ROC curve and the area under the ROC curve (AUC) are shown in the panel. Responders to MTX were predefined as patients with newly diagnosed RA who showed an improvement of ≥1.8 in the Disease Activity Score in 28 joints (DAS28) (25) by 4 months after initiation of MTX monotherapy. R n=19, NR n=28.
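The AUC values reported for these panels summarize how well the predicted probabilities separate responders from non-responders. The following is a minimal sketch of that calculation, not the applicants' implementation: it assumes each sample already has a predicted probability of being a responder, and the function name `roc_auc` and the example scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random responder outscores a random non-responder.

    labels: 1 for responder (R), 0 for non-responder (NR).
    scores: predicted probability of being a responder.
    Tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one R and one NR sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for 3 responders and 3 non-responders (not patent data).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to chance-level separation and 1.0 to perfect separation of R from NR.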
[Figure 10B2] Collectively illustrates that the combined cores of the two competing guilds support prediction of therapeutic efficacy in the treatment dataset collection for inflammatory bowel disease, rheumatoid arthritis, advanced melanoma, and B-cell lymphoma. The predictor, model, response definition, and sample sizes are the same as for Figure 10B1.
[Figure 10C] Figure 10C1: Collectively illustrates that the combined cores of the two competing guilds support prediction of therapeutic efficacy in the treatment dataset collection for inflammatory bowel disease, rheumatoid arthritis, advanced melanoma, and B-cell lymphoma. The abundance of the combined core genomes (284 HQMAGs) in pre-treatment samples was used as the predictor in a random forest classification model to predict responders (R) and non-responders (NR) to treatment. The ROC curve and the area under the ROC curve (AUC) are shown in the panel. Overall response rate (ORR, left matrix) and progression-free survival (PFS12, right matrix) were used to define R and NR, respectively. Prediction matrices for microbiome-based prediction of response, assessed via ORR (left matrix) and PFS12 (right matrix), are shown within each cohort (values on the diagonal), across pairs of cohorts (one cohort used to train the model and the other for testing), and in a leave-one-cohort-out setting (the model is trained on all but one cohort and tested on the left-out cohort). ORR: R n=94, NR n=71; PFS12: R n=77, NR n=86.
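The three evaluation settings that make up such a prediction matrix can be enumerated as follows. This is only an illustrative sketch: the function name `evaluation_plan` is hypothetical, the cohort names reuse the AM_ICI#1 to AM_ICI#5 labels from the captions, and the model fitting itself (a random forest in the description) is deliberately left out.

```python
from itertools import permutations


def evaluation_plan(cohorts):
    """List every (setting, training cohorts, test cohort) triple.

    'within'               : diagonal cells; train and test inside one cohort
                             (cross-validation inside the cohort is left to the caller).
    'pairwise'             : off-diagonal cells; train on one cohort, test on another.
    'leave_one_cohort_out' : train on all other cohorts, test on the held-out one.
    """
    plan = [("within", (c,), c) for c in cohorts]
    plan += [("pairwise", (a,), b) for a, b in permutations(cohorts, 2)]
    plan += [
        ("leave_one_cohort_out", tuple(c for c in cohorts if c != held_out), held_out)
        for held_out in cohorts
    ]
    return plan


# Cohort labels taken from the captions; everything else is an assumption.
cohorts = [f"AM_ICI#{i}" for i in range(1, 6)]
plan = evaluation_plan(cohorts)
for setting, train, test in plan:
    print(setting, train, "->", test)
```

With five cohorts this gives 5 within-cohort cells, 20 ordered train/test pairs, and 5 leave-one-cohort-out evaluations.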
Figure 10C2: Collectively illustrates that the combined cores of the two competing guilds support prediction of therapeutic efficacy in the treatment dataset collection for inflammatory bowel disease, rheumatoid arthritis, advanced melanoma, and B-cell lymphoma. The predictor, model, response definitions (ORR and PFS12), prediction matrices, and sample sizes are the same as for Figure 10C1.
[Figure 10D] Figure 10D1: Collectively illustrates that the combined cores of the two competing guilds support prediction of therapeutic efficacy in the treatment dataset collection for inflammatory bowel disease, rheumatoid arthritis, advanced melanoma, and B-cell lymphoma. The abundance of the combined core genomes (284 HQMAGs) in pre-treatment samples was used as the predictor in a random forest classification model to predict responders (R) and non-responders (NR) to treatment. The ROC curve and the area under the ROC curve (AUC) are shown in the panel. Tumor response to CAR-T cell immunotherapy was classified by the treating physician as either complete remission or incomplete remission (partial remission, stable disease, progressive disease, or death) 180 days after CAR-T cell infusion. The model was trained on cohort #1 (German cohort) and validated on cohort #2 (US cohort). #1: R n=21, NR n=29; #2: R n=21, NR n=24.
Figure 10D2: Collectively illustrates that the combined cores of the two competing guilds support prediction of therapeutic efficacy in the treatment dataset collection for inflammatory bowel disease, rheumatoid arthritis, advanced melanoma, and B-cell lymphoma. The predictor, model, response definition, training and validation cohorts, and sample sizes are the same as for Figure 10D1.
[Figure 11A] Collectively illustrates that the core genomes of the two competing guilds provide a universal model for distinguishing cases from controls across various diseases (Case-Control Dataset Collections I and II). All control and case samples from Case-Control Dataset Collections I and II, comprising 26 datasets across 15 diseases, were pooled and randomly split: 80% were used to train a random forest classification model and 20% were used for testing.
[Figures 11B1 and 11B2] Each collectively illustrates that the core genomes of the two competing guilds provide a universal model for distinguishing cases from controls across various diseases (Case-Control Dataset Collections I and II). The ROC curve and the area under the ROC curve (AUC) are shown.
[Figures 11C1 and 11C2] Each collectively illustrates that the core genomes of the two competing guilds provide a universal model for distinguishing cases from controls across various diseases (Case-Control Dataset Collections I and II). A density plot of the probability scores of cases and controls is shown. The probability score, generated by the random forest classification model, is the predicted probability that a sample is a case.
[Figures 11D1 and 11D2] Each collectively illustrates that the core genomes of the two competing guilds provide a universal model for distinguishing cases from controls across various diseases (Case-Control Dataset Collections I and II). Box plots of the probability scores of control and case samples are shown. The Mann-Whitney test was applied. ***P<0.001. Training: controls n=1,285, cases n=1,424; test: controls n=319, cases n=356.
[Figures 12A and 12B] Each collectively illustrates the corresponding contigs, referenced by sequence numbers, obtained for each of the 788 genomes.
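The Mann-Whitney comparison of probability scores described for Figures 11D1 and 11D2 can be sketched as below. This is a simplified illustration under stated assumptions, not the procedure actually used: it relies on the normal approximation with a tie correction and no continuity correction, and the function name `mann_whitney_u` and the example scores are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Returns (U statistic for x, two-sided p-value). Tied values receive
    average ranks and the variance is corrected for ties; no continuity
    correction is applied (an assumption of this sketch).
    """
    n1, n2 = len(x), len(y)
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        raise ValueError("all values are identical; the test is undefined")
    z = (u - n1 * n2 / 2.0) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return u, p


# Hypothetical probability scores (not patent data).
case_scores = [0.9, 0.8, 0.7, 0.6]
control_scores = [0.1, 0.2, 0.3, 0.4]
u, p = mann_whitney_u(case_scores, control_scores)
print(u, p)
```

The normal approximation is reasonable for sample sizes on the order of those stated in the captions (hundreds to thousands of samples per group); for very small groups an exact test would be preferable.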
[Figures 12C to 12I] Each collectively illustrates the corresponding contigs, referenced by sequence numbers, obtained for each of the 788 genomes.
[Figures 13A to 13XX] Each collectively illustrates the taxonomic assignments of the 788 combined genomes.
[Figure 14A] Collectively illustrates pairwise average nucleotide identity (ANI) comparisons of genomes. Pairwise ANI comparisons of all genomes in the pool of 788 combined genomes are shown.
[Figure 14B] Collectively illustrates pairwise ANI comparisons of genomes. Pairwise ANI comparisons between guild 1 genomes and guild 2 genomes are shown.
[Figure 15A] Collectively illustrates the ability of the combined pool to classify cases and controls across different studies. Eight sets of signature microbiomes, obtained from QD and from various disease cases (T2D, LC, SCZ, IBD, AS, ACVD, and CRC), were pooled together as a combined microbiome signature. The classification performance of the combined pool is compared with that of each individual signature microbiome on the basis of AUC values.
[Figure 15B] Collectively illustrates the ability of the combined pool to classify cases and controls across different studies. Eight sets of signature microbiomes, obtained from QD and from various disease cases (T2D, LC, SCZ, IBD, AS, ACVD, and CRC), were pooled together as a combined microbiome signature. Significance of within-group comparisons was assessed using the Friedman test followed by Dunn's post hoc test (#BH-adjusted P<0.1, *BH-adjusted P<0.05).
[Figure 16A] Collectively illustrates the ranking of microbiome signature classification performance.
Nine sets of microbiome signatures, obtained from a combined pool, QD, or various disease cases: T2D, LC, SCZ, IBD, AS, ACVD, and CRC, were ranked according to their performance in case and control classification across 11 datasets. All ranking numbers assigned to each set of signature microbiomes are plotted in Figure 16A. [Figure 16B] This section collectively illustrates the ranking of microbiome signature classification performance. Nine sets of microbiome signatures obtained from a combined pool, QD, or various disease cases: T2D, LC, SCZ, IBD, AS, ACVD, and CRC were ranked according to their performance in case and control classification across 11 datasets. Significance within group comparisons is shown. [Figure 16C] This section collectively illustrates the ranking of microbiome signature classification performance. Nine sets of microbiome signatures, obtained from a combined pool, QD, or various disease cases: T2D, LC, SCZ, IBD, AS, ACVD, and CRC, were ranked according to their performance in classifying cases and controls across 11 datasets. The sum of the rankings for each set of microbiome signatures is shown. Kruskal-Wallis tests and subsequent Dunn post-hoc tests were performed for analysis (#BH-adjusted P<0.1, *BH-adjusted P<0.05). Microbiome signatures obtained from the combined pool exhibit the best performance in classifying healthy subjects versus patients across different datasets. [Figure 17] The selection of a combined core pool is illustrated. For each dataset, random forest classification is performed based on 788 combined genomes. Each of the 788 genomes is ranked based on its importance. The total rank is obtained by summing the rank values ​​across 11 datasets. All 788 genomes are ranked again based on the total value. The most important genome across the 11 datasets receives the lowest total rank value. Starting with the least important genome, one genome is removed from each dataset based on its importance. 
Classification performance (AUC) is calculated for each remaining genome after removal by the random forest model, and all genomes are ranked based on their AUC value. The rank values ​​for each genome across the 11 datasets are summed. The sum of the ranks for each genome across the 11 datasets is plotted. 302 genomes achieved the lowest total AUC rank. After removing 18 genomes showing inconsistent C1A and C1B assignments, 284 genomes remained as the combined core pool. [Figure 18A] This study collectively illustrates the classification capabilities of two competing guilds identified from QD, various disease types, a combined pool, and a combined core pool. Microbiome signatures, including genomes, for the two competing guilds were obtained from various diseases: T2D (Figure 18A), LC (Figure 18B), AS (Figure 18C), CRC (Figure 18D), IBD (Figure 18E), QD (Figure 18F), AVCD (Figure 18G), SCZ (Figure 18H), a combined pool (Figure 18I), and a combined core pool (Figure 18J). Using the microbiome signatures identified for each symptom, controls and patients in each dataset were classified using a random forest classifier. Figure 31 shows that all microbiome signatures have the ability to classify cases and controls across different studies. [Figure 18B]This study collectively illustrates the classification capabilities of two competing guilds identified from QD, various disease types, a combined pool, and a combined core pool. Microbiome signatures, including genomes, for the two competing guilds were obtained from various diseases: T2D (Figure 18A), LC (Figure 18B), AS (Figure 18C), CRC (Figure 18D), IBD (Figure 18E), QD (Figure 18F), AVCD (Figure 18G), SCZ (Figure 18H), a combined pool (Figure 18I), and a combined core pool (Figure 18J). Using the microbiome signatures identified for each symptom, controls and patients in each dataset were classified using a random forest classifier. 
Figure 31 shows that all microbiome signatures have the ability to classify cases and controls across different studies. [Figure 18C] This study collectively illustrates the classification capabilities of two competing guilds identified from QD, various disease types, a combined pool, and a combined core pool. Microbiome signatures, including genomes, for the two competing guilds were obtained from various diseases: T2D (Figure 18A), LC (Figure 18B), AS (Figure 18C), CRC (Figure 18D), IBD (Figure 18E), QD (Figure 18F), AVCD (Figure 18G), SCZ (Figure 18H), a combined pool (Figure 18I), and a combined core pool (Figure 18J). Using the microbiome signatures identified for each symptom, controls and patients in each dataset were classified using a random forest classifier. Figure 31 shows that all microbiome signatures have the ability to classify cases and controls across different studies. [Figure 18D]This study collectively illustrates the classification capabilities of two competing guilds identified from QD, various disease types, a combined pool, and a combined core pool. Microbiome signatures, including genomes, for the two competing guilds were obtained from various diseases: T2D (Figure 18A), LC (Figure 18B), AS (Figure 18C), CRC (Figure 18D), IBD (Figure 18E), QD (Figure 18F), AVCD (Figure 18G), SCZ (Figure 18H), a combined pool (Figure 18I), and a combined core pool (Figure 18J). Using the microbiome signatures identified for each symptom, controls and patients in each dataset were classified using a random forest classifier. Figure 31 shows that all microbiome signatures have the ability to classify cases and controls across different studies. [Figure 18E] This study collectively illustrates the classification capabilities of two competing guilds identified from QD, various disease types, a combined pool, and a combined core pool. 
Microbiome signatures, including genomes, for the two competing guilds were obtained from various diseases: T2D (Figure 18A), LC (Figure 18B), AS (Figure 18C), CRC (Figure 18D), IBD (Figure 18E), QD (Figure 18F), AVCD (Figure 18G), SCZ (Figure 18H), a combined pool (Figure 18I), and a combined core pool (Figure 18J). Using the microbiome signatures identified for each symptom, controls and patients in each dataset were classified using a random forest classifier. Figure 31 shows that all microbiome signatures have the ability to classify cases and controls across different studies. [Figure 18F]This study collectively illustrates the classification capabilities of two competing guilds identified from QD, various disease types, a combined pool, and a combined core pool. Microbiome signatures, including genomes, for the two competing guilds were obtained from various diseases: T2D (Figure 18A), LC (Figure 18B), AS (Figure 18C), CRC (Figure 18D), IBD (Figure 18E), QD (Figure 18F), AVCD (Figure 18G), SCZ (Figure 18H), a combined pool (Figure 18I), and a combined core pool (Figure 18J). Using the microbiome signatures identified for each symptom, controls and patients in each dataset were classified using a random forest classifier. Figure 31 shows that all microbiome signatures have the ability to classify cases and controls across different studies. [Figure 18G] This study collectively illustrates the classification capabilities of two competing guilds identified from QD, various disease types, a combined pool, and a combined core pool. Microbiome signatures, including genomes, for the two competing guilds were obtained from various diseases: T2D (Figure 18A), LC (Figure 18B), AS (Figure 18C), CRC (Figure 18D), IBD (Figure 18E), QD (Figure 18F), AVCD (Figure 18G), SCZ (Figure 18H), a combined pool (Figure 18I), and a combined core pool (Figure 18J). 
Using the microbiome signatures identified for each symptom, controls and patients in each dataset were classified using a random forest classifier. Figure 31 shows that all microbiome signatures have the ability to classify cases and controls across different studies. [Figure 18H]This study collectively illustrates the classification capabilities of two competing guilds identified from QD, various disease types, a combined pool, and a combined core pool. Microbiome signatures, including genomes, for the two competing guilds were obtained from various diseases: T2D (Figure 18A), LC (Figure 18B), AS (Figure 18C), CRC (Figure 18D), IBD (Figure 18E), QD (Figure 18F), AVCD (Figure 18G), SCZ (Figure 18H), a combined pool (Figure 18I), and a combined core pool (Figure 18J). Using the microbiome signatures identified for each symptom, controls and patients in each dataset were classified using a random forest classifier. Figure 31 shows that all microbiome signatures have the ability to classify cases and controls across different studies. [Figure 18I] This study collectively illustrates the classification capabilities of two competing guilds identified from QD, various disease types, a combined pool, and a combined core pool. Microbiome signatures, including genomes, for the two competing guilds were obtained from various diseases: T2D (Figure 18A), LC (Figure 18B), AS (Figure 18C), CRC (Figure 18D), IBD (Figure 18E), QD (Figure 18F), AVCD (Figure 18G), SCZ (Figure 18H), a combined pool (Figure 18I), and a combined core pool (Figure 18J). Using the microbiome signatures identified for each symptom, controls and patients in each dataset were classified using a random forest classifier. Figure 31 shows that all microbiome signatures have the ability to classify cases and controls across different studies. 
[Figure 18J]This study collectively illustrates the classification capabilities of two competing guilds identified from QD, various disease types, a combined pool, and a combined core pool. Microbiome signatures, including genomes, for the two competing guilds were obtained from various diseases: T2D (Figure 18A), LC (Figure 18B), AS (Figure 18C), CRC (Figure 18D), IBD (Figure 18E), QD (Figure 18F), AVCD (Figure 18G), SCZ (Figure 18H), a combined pool (Figure 18I), and a combined core pool (Figure 18J). Using the microbiome signatures identified for each symptom, controls and patients in each dataset were classified using a random forest classifier. Figure 31 shows that all microbiome signatures have the ability to classify cases and controls across different studies. [Figure 19] We illustrate combined case and control samples from 25 datasets corresponding to 15 different diseases (type 2 diabetes (T2D), hypertension (HT), schizophrenia (SCZ), atherosclerotic cardiovascular disease (ACVD), cirrhosis (LC), inflammatory bowel disease (IBD), colorectal cancer (CRC), ankylosing spondylitis (AS), Parkinson's disease (PD), multiple sclerosis (MS), Gaucher disease type 2 (GDII), COVID-19 (COV), Behçet's disease (BD), autism spectrum disorder (ASD), and pancreatic cancer (PC)). [Figure 20A1] This paper collectively illustrates a case-versus-control universal random forest classification model based on the abundance of 284 core genomes. 80% training: controls, n=1285; cases, n=1424, 10x CV (A1: Area under the ROC curve of the random forest classifier (AUC); A2: Score density for cases and controls; A3: Probability scores for cases and controls). [Figure 20A2] This paper collectively illustrates a case-versus-control universal random forest classification model based on the abundance of 284 core genomes. 
80% training: controls, n=1285; cases, n=1424, 10x CV (A1: Area under the ROC curve of the random forest classifier (AUC); A2: Score density for cases and controls; A3: Probability scores for cases and controls). [Figure 20A3] This paper collectively illustrates a case-versus-control universal random forest classification model based on the abundance of 284 core genomes. 80% training: controls, n=1285; cases, n=1424, 10x CV (A1: Area under the ROC curve of the random forest classifier (AUC); A2: Score density for cases and controls; A3: Probability scores for cases and controls). [Figure 20B1] This study collectively illustrates a case-versus-control universal random forest classification model based on the abundance of 284 core genomes. 20% trials: controls, n=319; cases, n=356 (B1: Area under the ROC curve (AUC) of the random forest classifier; B2: Score density for cases and controls; B3: Probability scores for cases and controls). [Figure 20B2] This study collectively illustrates a case-versus-control universal random forest classification model based on the abundance of 284 core genomes. 20% trials: controls, n=319; cases, n=356 (B1: Area under the ROC curve (AUC) of the random forest classifier; B2: Score density for cases and controls; B3: Probability scores for cases and controls). [Figure 20B3] This study collectively illustrates a case-versus-control universal random forest classification model based on the abundance of 284 core genomes. 20% trials: controls, n=319; cases, n=356 (B1: Area under the ROC curve (AUC) of the random forest classifier; B2: Score density for cases and controls; B3: Probability scores for cases and controls). [Figure 21A] This collectively illustrates iterative training of a universal random forest classification model against case-versus-control groups with randomly selected genome counts. 
Each data point represents the mean AUC of a random forest model trained 10 times using different sets of randomly selected genomes, with a total number of X (as indicated by the X-axis) determined for the training set. [Figure 21B]This collectively illustrates repeated training of a universal random forest classification model for case-versus-control cases with randomly selected genome counts. Each data point represents the mean AUC of a random forest model trained 10 times using different sets of randomly selected genomes, with a total number of X (as indicated by the X axis) determined for the test set.
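The rank aggregation described above for Figure 17 (rank each genome by importance within each dataset, then sum the ranks across datasets) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of this disclosure: the per-dataset importance scores are taken as given inputs (for example, from a random forest), the function names `rank_descending` and `total_rank` and the toy genome identifiers are hypothetical, and ties are broken by genome name, a rule the text does not specify.

```python
from collections import defaultdict


def rank_descending(scores):
    """Map each genome to its rank within one dataset (1 = most important).

    Ties are broken by genome name so the result is deterministic; this
    tie rule is an assumption, not something the source specifies.
    """
    ordered = sorted(scores, key=lambda g: (-scores[g], g))
    return {genome: position for position, genome in enumerate(ordered, start=1)}


def total_rank(importance_by_dataset):
    """Sum each genome's per-dataset rank; a lower total means more important."""
    totals = defaultdict(int)
    for scores in importance_by_dataset.values():
        for genome, rank in rank_descending(scores).items():
            totals[genome] += rank
    # Re-rank all genomes by their summed rank (smallest total first).
    return sorted(totals.items(), key=lambda item: (item[1], item[0]))


# Toy importance scores for three hypothetical genomes in two datasets.
importance_by_dataset = {
    "dataset_1": {"MAG_a": 0.50, "MAG_b": 0.30, "MAG_c": 0.20},
    "dataset_2": {"MAG_a": 0.40, "MAG_b": 0.15, "MAG_c": 0.45},
}
ranking = total_rank(importance_by_dataset)
print(ranking)
```

In this toy input, `MAG_a` ranks first and second in the two datasets, so it has the smallest rank sum and heads the aggregated list. The sketch assumes every genome is scored in every dataset.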

[0060] Like reference numerals refer to corresponding parts throughout the drawings.

[Modes for carrying out the invention]

[0061] The methods and systems described herein facilitate the prediction of a subject's response to therapy for a disorder based on the composition of the subject's microbiome.

Definitions

[0062] The terminology used in this disclosure is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in the description of the invention and the appended claims, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. The term “and/or,” as used herein, refers to and encompasses any and all possible combinations of one or more of the associated listed items. The terms “comprises,” “comprising,” and variations thereof, as used herein, specify the presence of the stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Furthermore, to the extent that the terms “including,” “includes,” “having,” “has,” and “with,” or variants thereof, are used in the detailed description and/or the claims, such terms are intended to be inclusive in a manner similar to the term “comprising.”

[0063] As used herein, the term "if" may, depending on the context, be interpreted as "when" or "upon" or "in response to determining" or "in accordance with a determination" that a stated condition precedent is true. Similarly, the phrase "if it is determined" or "if [the stated condition or event] is detected" may, depending on the context, be interpreted as "upon determining" or "in response to determining" or "upon detecting [the stated condition or event]" or "in response to detecting [the stated condition or event]."

[0064] Furthermore, while terms such as "first," "second," etc., may be used herein to describe various elements, it should be understood that these elements should not be limited by these terms. These terms are used solely to distinguish one element from another. For example, the first subject may be referred to as the second subject without departing from the scope of this disclosure, and similarly, the second subject may be referred to as the first subject. The first and second subjects are both subjects, but they are not the same subject. The terms "subject," "user," and "patient" are used interchangeably herein.

[0065] As used herein, the term “measure of central tendency” refers to a central or representative value of a distribution of values. Non-limiting examples of measures of central tendency include the arithmetic mean, weighted mean, midrange, midhinge, trimean, geometric mean, geometric median, Winsorized mean, median, and mode of a distribution of values.
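By way of a non-limiting illustration, several of these measures can be computed as in the following Python sketch. The sample values and the helper names (`midrange`, `midhinge`, `trimean`, `winsorized_mean`) are hypothetical and are not taken from this disclosure. Quartiles are computed with the inclusive method of the standard `statistics` module, which is only one of several conventions, and the weighted mean and geometric median are omitted for brevity.

```python
import statistics


def midrange(values):
    """Average of the smallest and largest value."""
    return (min(values) + max(values)) / 2


def midhinge(values):
    """Average of the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q1 + q3) / 2


def trimean(values):
    """Weighted average of the quartiles: (Q1 + 2*Q2 + Q3) / 4."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q1 + 2 * q2 + q3) / 4


def winsorized_mean(values, k=1):
    """Mean after replacing the k lowest and k highest values with their
    nearest retained neighbours."""
    ordered = sorted(values)
    if 2 * k >= len(ordered):
        raise ValueError("k is too large for the number of values")
    if k > 0:
        ordered[:k] = [ordered[k]] * k
        ordered[-k:] = [ordered[-k - 1]] * k
    return statistics.fmean(ordered)


# Hypothetical abundance values with one large outlier.
values = [1.0, 2.0, 2.0, 3.0, 4.0, 8.0, 50.0]

summary = {
    "arithmetic mean": statistics.fmean(values),
    "geometric mean": statistics.geometric_mean(values),
    "median": statistics.median(values),
    "mode": statistics.mode(values),
    "midrange": midrange(values),
    "midhinge": midhinge(values),
    "trimean": trimean(values),
    "Winsorized mean (k=1)": winsorized_mean(values, k=1),
}
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

With a single outlier in the sample, the arithmetic mean and midrange are pulled toward the outlier, while the median, trimean, and Winsorized mean stay close to the bulk of the values. This is one reason a particular measure may be preferred for skewed abundance data.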

[0066] As used herein, the term “subject” refers to any living or non-living organism, including but not limited to a human (e.g., a male human, female human, fetus, pregnant female, child, or the like), a non-human mammal, or a non-human animal. Any human or non-human animal may serve as a subject, including but not limited to mammals, reptiles, birds, amphibians, fish, ungulates, ruminants, bovines (e.g., cattle), equines (e.g., horses), caprines and ovines (e.g., goats, sheep), porcines (e.g., pigs), camelids (e.g., camels, llamas, alpacas), monkeys, apes (e.g., gorillas, chimpanzees), ursids (e.g., bears), poultry, dogs, cats, mice, rats, dolphins, whales, and sharks. In some embodiments, the subject is a male or female of any age (e.g., a man, a woman, or a child).

[0067] As used herein, the term “administer,” in relation to the methods of the present invention, means a method for therapeutically or prophylactically preventing, treating, or ameliorating a syndrome, disorder, or disease described herein. Such a method includes administering an effective amount of the therapeutic agent at different times during the course of therapy, or concurrently in a combination form. The methods of the present invention should be understood to encompass all known therapeutic regimens.

[0068] As used herein, the terms “cancer,” “cancer tissue,” and “tumor” refer to an abnormal mass of tissue in which the growth of the mass exceeds, and is not coordinated with, the growth of normal tissue, including both solid masses (e.g., solid tumors) and fluid masses (e.g., blood cancers). A cancer or tumor may be defined as “benign” or “malignant” depending on the following characteristics: degree of cell differentiation, including morphology and function; growth rate; local invasion; and metastasis. “Benign” tumors are well differentiated, grow characteristically more slowly than malignant tumors, and may remain localized at the site of origin. In addition, in some cases, benign tumors do not have the ability to infiltrate, invade, or metastasize to distant sites. “Malignant” tumors are poorly differentiated (anaplastic) and grow characteristically rapidly, with progressive infiltration, invasion, and destruction of surrounding tissue. Furthermore, malignant tumors may have the ability to metastasize to distant sites. Thus, cancer cells are cells found within an abnormal mass of tissue whose growth exceeds, and is not coordinated with, the growth of normal tissue. Accordingly, a “tumor sample” refers to a biological sample obtained or derived from the tumor in question, as described herein.

[0069] Non-limiting examples of cancer types include ovarian cancer, cervical cancer, uveal melanoma, colorectal cancer, chromophobe renal cell carcinoma, liver cancer, endocrine tumors, oropharyngeal cancer, retinoblastoma, biliary tract cancer, adrenal cancer, nerve cancer, neuroblastoma, basal cell carcinoma, brain tumors, breast cancer, non-clear cell renal cell carcinoma, glioblastoma, glioma, kidney cancer, gastrointestinal stromal tumors, medulloblastoma, bladder cancer, stomach cancer, bone cancer, non-small cell lung cancer, chest cancer, adenoma, prostate cancer, clear cell renal cell carcinoma, skin cancer, thyroid cancer, sarcoma, testicular cancer, head and neck cancer (e.g., head and neck squamous cell carcinoma), meningioma, peritoneal cancer, endometrial cancer, pancreatic cancer, mesothelioma, esophageal cancer, small cell lung cancer, Her2-negative breast cancer, ovarian serous carcinoma, HR+ breast cancer, uterine serous carcinoma, gastroesophageal junction adenocarcinoma, gallbladder cancer, chordoma, and papillary renal cell carcinoma.

[0070] As used herein, the terms “cancer condition” and “cancer symptoms” refer to the characteristics of a cancer patient's condition, such as the diagnostic status, type of cancer, location of cancer, primary site of cancer, stage of cancer, prognosis of cancer, and/or one or more additional features of the cancer (e.g., tumor features such as morphology, heterogeneity, or size). In some embodiments, one or more additional personal features of the subject may be used to further describe the cancer condition or cancer symptoms of the subject, such as age, sex, weight, race, personal habits (e.g., smoking, drinking, diet), other relevant health conditions (e.g., hypertension, dry skin, other diseases), current drug therapy, allergies, relevant medical history, current side effects of cancer treatment, and other drug therapies.

[0071] As used herein, the terms “treat,” “treating,” “treatment,” and “therapy” refer to both therapeutic measures and prophylactic or preventive measures aimed at preventing or slowing (lessening) a targeted pathological condition or disorder. Those in need of treatment include individuals already diagnosed with the disorder, those prone to the disorder (e.g., due to a genetic predisposition), and those in whom the disorder is to be prevented. “Prevent,” “preventing,” and “prevention” refer to reducing the likelihood of the onset (or recurrence) of a disease, disorder, condition, or associated symptom(s). The terms also mean obtaining a beneficial or desired outcome, e.g., a clinical outcome. Beneficial or desired outcomes may include, but are not limited to, the relief of one or more symptoms. The term “relief,” as used herein with respect to the symptoms of a condition, refers to reducing at least one of the frequency and the amplitude of a patient's symptoms of the condition.

[0072] As used herein, “response” means the response of a subject suffering from a pathology treatable by a biological drug, chemical drug, or physiotherapy to such biological drug, chemical drug, or physiotherapy. The standard criteria for assessing a response may vary depending on the disease.

[0073] As used herein, “immunotherapy” encompasses all therapies that directly or indirectly modify a patient’s immune response or immune system. Regarding immunotherapy strategies, the detection of a strong immune response at the tumor site has been found to be a reliable marker for several cancers, such as colorectal and rectal cancer, and this association between the existing immune response and better therapeutic efficacy has been hypothesized. The immune response encompasses any form of immune response in the patient, whether direct, indirect, or both, to the cancer or tumor site in question. The immune response refers to the immune response of a host cancer patient in response to a tumor, and includes the presence, number, or alternatively, activity of cells and associated signaling molecules involved in the host’s immune response, including all cytokines, chemokines, growth factors, and stem cell growth factors. In some embodiments, the immune response encompasses numerous different cell subtypes, such as T cell lineages, B cell lineages, natural killer cells, macrophages, dendritic cells, myeloid-derived suppressor cells, lytic dendritic cells, fibroblasts, endothelial cells, and a vast number of signaling molecules (cytokines, chemokines, and other signaling molecules).

[0074] As used herein, “immunotherapy agent” refers to a compound, composition, or treatment that indirectly or directly enhances, stimulates, or increases the body's immune response to cancer cells and/or reduces the side effects of other anticancer therapies. Thus, immunotherapy is a therapy that directly or indirectly stimulates or enhances the immune system's response to cancer cells and/or reduces the side effects that may be caused by other anticancer agents. Immunotherapy is also referred to in the art as immunologic therapy, biological therapy, biological response modifier therapy, and biotherapy. Examples of common immunotherapy agents known in the art include, but are not limited to, cytokines, cancer vaccines, monoclonal antibodies, and non-cytokine adjuvants. Alternatively, immunotherapy treatment may consist of administering a certain amount of immune cells (T cells, NK cells, dendritic cells, B cells, etc.) to a patient.

[0075] Immunotherapy agents can be nonspecific, meaning they generally enhance the immune system so that it is more effective in fighting the growth and/or spread of cancer cells, or they can be specific, meaning they target the cancer cells themselves. Immunotherapy regimens can combine the use of nonspecific and specific immunotherapy agents. Nonspecific immunotherapy agents are substances that stimulate or indirectly enhance the immune system. They have been used alone as a primary therapy for the treatment of cancer, as well as in addition to a primary therapy, in which case they function as adjuvants to enhance the effectiveness of other therapies (e.g., cancer vaccines). In this latter context, nonspecific immunotherapy agents can also function to reduce the side effects of other therapies, e.g., myelosuppression induced by certain chemotherapy agents. Nonspecific immunotherapy agents can act on key immune system cells and trigger secondary responses such as increased production of cytokines and immunoglobulins. Alternatively, the agents can themselves comprise cytokines. Nonspecific immunotherapy agents are generally classified as cytokines or non-cytokine adjuvants.

[0076] Several cytokines have found applications in cancer treatment, either as general nonspecific immunotherapies designed to enhance the immune system, or as adjuvants provided in conjunction with other therapies. Preferred cytokines include, but are not limited to, interferons, interleukins, and colony-stimulating factors.

[0077] The interferons (IFNs) intended by this invention include the common types of IFN: IFN-alpha (IFN-α), IFN-beta (IFN-β), and IFN-gamma (IFN-γ). IFNs can act directly on cancer cells, for example, by slowing their growth, promoting their development into cells with more normal behavior, and/or increasing their production of antigens, thus facilitating the immune system's recognition and destruction of cancer cells. IFNs can also act indirectly on cancer cells, for example, by slowing angiogenesis, strengthening the immune system, and/or stimulating natural killer (NK) cells, T cells, and macrophages. Recombinant IFN-alpha is commercially available as Roferon (Roche Pharmaceuticals) and Intron A (Schering Corporation). IFN-alpha has been shown to be effective, either alone or in combination with other immunotherapies or chemotherapy agents, in the treatment of a variety of cancers, including melanoma (including metastatic melanoma), kidney cancer (including metastatic kidney cancer), breast cancer, prostate cancer, and cervical cancer (including metastatic cervical cancer).

[0078] The interleukins intended by this invention include IL-2, IL-4, IL-11, and IL-12. Examples of commercially available recombinant interleukins include Proleukin® (IL-2; Chiron Corporation) and Neumega® (IL-11; Wyeth Pharmaceuticals). Zymogenetics, Inc. (Seattle, Wash.) is currently testing a recombinant form of IL-21, which is also intended for use in combination with the present invention. Interleukins have been shown to be effective, either alone or in combination with other immunotherapies or chemotherapies, in the treatment of a variety of cancers, including renal cancer (including metastatic renal cancer), melanoma (including metastatic melanoma), ovarian cancer (including recurrent ovarian cancer), cervical cancer (including metastatic cervical cancer), breast cancer, colorectal cancer, lung cancer, brain cancer, and prostate cancer. Interleukins have also shown good activity in combination with IFN-α in the treatment of various cancers (Negrier et al., Ann Oncol. 2002 13(9):1460-8; Tourani et al., J Clin Oncol. 2003 21(21):3987-94).

[0079] The colony-stimulating factors (CSFs) intended by this invention include granulocyte colony-stimulating factor (G-CSF or filgrastim), granulocyte-macrophage colony-stimulating factor (GM-CSF or sargramostim), and erythropoietin (epoetin alfa, darbepoetin). Treatment with one or more growth factors can help stimulate the generation of new blood cells in patients undergoing conventional chemotherapy. Therefore, treatment with CSFs can help reduce the side effects associated with chemotherapy and allow for the use of higher doses of chemotherapy agents. Various recombinant colony-stimulating factors are commercially available, such as Neupogen® (G-CSF; Amgen), Neulasta (pegfilgrastim; Amgen), Leukine (GM-CSF; Berlex), Procrit (erythropoietin; Ortho Biotech), Epogen (erythropoietin; Amgen), and Aranesp (erythropoietin). Colony-stimulating factors have shown efficacy in the treatment of cancers including melanoma, colorectal cancer (including metastatic colorectal cancer), and lung cancer.

[0080] Suitable non-cytokine adjuvants for use in combination with the present invention include, but are not limited to, levamisole, aluminum hydroxide (alum), Bacillus Calmette-Guérin (BCG), incomplete Freund's adjuvant (IFA), QS-21, DETOX, keyhole limpet hemocyanin (KLH), and dinitrophenyl (DNP). Non-cytokine adjuvants in combination with other immunotherapeutic and/or chemotherapeutic agents have demonstrated efficacy against a variety of cancers, including, for example, colon and colorectal cancer (levamisole), melanoma (BCG and QS-21), and renal and bladder cancer (BCG).

[0081] In addition to having specific or nonspecific targets, immunotherapeutic agents can be active, i.e., they can stimulate the body's own immune response, or they can be passive, i.e., they can include immune system components produced outside the body.

[0082] Passive specific immunotherapy typically involves the use of one or more monoclonal antibodies that are specific to specific antigens found on the surface of cancer cells or to specific cell growth factors. Monoclonal antibodies can be used to treat cancer in several ways, for example, by targeting specific cell growth factors, such as those involved in angiogenesis, to enhance the target immune response against a particular type of cancer, or by enhancing the delivery of other anticancer drugs to cancer cells when bound to or conjugated to drugs such as chemotherapeutic agents, radioactive particles, or toxins, in order to inhibit cancer growth.

[0083] Monoclonal antibodies currently in use as cancer immunotherapy agents suitable for inclusion in the combination of the present invention include, but are not limited to, rituximab (Rituxan®), trastuzumab (Herceptin®), ibritumomab tiuxetan (Zevalin®), tositumomab (Bexxar®), cetuximab (C-225, Erbitux®), bevacizumab (Avastin®), gemtuzumab ozogamicin (Mylotarg®), alemtuzumab (Campath®), and BL22. Monoclonal antibodies are used to treat a wide range of cancers, including breast cancer (including advanced metastatic breast cancer), colorectal cancer (including advanced and / or metastatic colorectal cancer), ovarian cancer, lung cancer, prostate cancer, cervical cancer, melanoma, and brain tumors.

[0084] Other examples include antibodies specific to co-stimulatory molecules. Examples of co-stimulatory molecules include B7-1 / CD80, CD28, B7-2 / CD86, CTLA-4, B7-H1 / PD-L1, Gi24 / Dies1 / VISTA, B7-H2, ICOS, B7-H3, PD-1, B7-H4, PD-L2 / B7-DC, B7-H6, PDCD6, BTLA, 4-1BB / TNFRSF9 / CD137, CD40 ligand / TNFSF5, 4-1BB ligand / TNFSF9, GITR / TNFRSF18, HVEM / TNFRSF14, CD27 / TNFRSF7, LIGHT / TNFSF14, CD27 ligand / TNFSF7, OX40 / TNFRSF4, CD30 / TNFRSF8, OX40 ligand / TNFSF4, CD30 ligand / TNFSF8, TACI / TNFRSF13B, CD40 / TNFRSF5, 2B4 / CD244 / SLAMF4, CD84 / SLAMF5, BLAME / SLAMF8, CD229 / SLAMF3, CRACC / SLAMF7, CD2F-10 / SLAMF9, NTB-A / SLAMF6, CD48 / SLAMF2, SLAM / CD150, CD58 / LFA-3, CD2, Ikaros, CD53, integrin alpha 4 / CD49d, CD82 / Kai-1, integrin alpha 4 beta 1, CD90 / Thy1, integrin alpha 4 beta 7 / LPAM-1, CD96, LAG-3, CD160, LMIR1 / CD300A, CRTAM, TCL1A, DAP12, TCL1B, Dectin-1 / CLEC7A, TIM-1 / KIM-1 / HAVCR, DPPIV / CD26, TIM-4, EphB6, TSLP, HLA class I, TSLP R, and HLA-DR. In particular, antibodies are selected from the group consisting of anti-CTLA4 antibodies (e.g., ipilimumab), anti-PD1 antibodies, anti-PDL1 antibodies, anti-TIMP3 antibodies, anti-LAG3 antibodies, anti-B7H3 antibodies, anti-B7H4 antibodies, anti-TREM antibodies, anti-BTLA antibodies, anti-LIGHT antibodies, or anti-B7H6 antibodies.

[0085] Monoclonal antibodies can be used alone or in combination with other immunotherapeutic or chemotherapeutic agents.

[0086] Active specific immunotherapy typically involves the use of cancer vaccines. Cancer vaccines containing whole cancer cells, parts of cancer cells, or one or more antigens derived from cancer cells have been developed. Cancer vaccines have been investigated, alone or in combination with one or more immunotherapy or chemotherapy agents, in the treatment of several types of cancer, including melanoma, kidney cancer, ovarian cancer, breast cancer, colorectal cancer, and lung cancer. Non-specific immunotherapeutic agents are useful in combination with cancer vaccines to enhance the body's immune response.

[0087] Immunotherapy treatment may consist of adoptive immunotherapy, as described by Nicholas P. Restifo, Mark E. Dudley, and Steven A. Rosenberg ("Adoptive immunotherapy for cancer: harnessing the T cell response," Nature Reviews Immunology, Volume 12, April 2012). In adoptive immunotherapy, the patient's circulating lymphocytes or tumor-infiltrating lymphocytes are isolated in vitro, activated with lymphokines such as IL-2, or transduced with tumor necrosis genes, and then re-administered (Rosenberg et al., 1988; 1989). The activated lymphocytes are most preferably the patient's own cells previously isolated from blood or tumor samples and activated (or "expanded") in vitro. This form of immunotherapy has produced regression in several cases of melanoma and renal cancer.

[0088] As used herein, the term “genomic abundance” refers to the absolute or relative amount of a microbial genome in a biological sample from the gut of a subject. Genomic abundance may be expressed in different units, including copy number, molar concentration, mass (e.g., normalized against genome size), unique sequence reads (e.g., normalized against genome size), a percentage of any of the preceding metrics relative to the total of that metric across all genomes in the sample, or a percentage of any of the preceding metrics relative to the total of that metric across multiple genomes in the sample. In some embodiments, the genomic abundance is normalized against the total genomic abundance in the sample. In some embodiments, the genomic abundance is normalized against the genomic abundance of a control genome in the sample. In some embodiments, values for multiple genomic abundances in the sample are standardized, normalized, and / or scaled. Methods for normalizing genomic abundance values are described, for example, in Lin, H., Peddada, S.D., Analysis of microbial compositions: a review of normalization and differential abundance analysis, npj Biofilms and Microbiomes, 6:60 (2020) and Lutz K.C., et al., A Survey of Statistical Methods for Microbiome Data Analysis, Frontiers in Applied Mathematics and Statistics, 8 (2022), the contents of which are incorporated herein by reference in their entirety. Methods for measuring genomic abundance values are known in the art. For example, metagenomic sequencing can be used to extensively reconstruct microbial genomes from next-generation sequencing of genomic DNA in biological samples, such as biological samples from the gut of a subject. For a review of metagenomic sequencing, see, for example, Quince C, et al., Shotgun metagenomics, from sampling to analysis, Nat Biotechnol, 35(9):833-44 (2017), the contents of which are incorporated herein by reference in their entirety.
Genomic abundance may also be determined by quantification of the copy number of ribosomal genes, such as the 16S rRNA gene. Examples of rRNA gene quantification are described in Manzari C., et al., Accurate quantification of bacterial abundance in metagenomic DNAs accounting for variable DNA integrity levels, Microb Genom., 6(10):mgen000417 (2020) and Barlow, J.T., et al., A quantitative sequencing framework for absolute abundance measurements of mucosal and lumenal microbial communities, Nat Commun., 11:2590 (2020), the contents of which are incorporated herein by reference in their entirety.
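
By way of non-limiting illustration, one of the normalizations named above (unique sequence reads normalized against genome size, then expressed as a fraction of the total across all genomes in the sample) may be sketched as follows. The function name, the genome identifiers, and the counts are hypothetical and are not taken from this disclosure.

```python
def relative_abundance(read_counts, genome_sizes):
    """Convert per-genome read counts into size-normalized relative abundances.

    read_counts:  dict mapping genome id -> number of reads assigned to it
    genome_sizes: dict mapping genome id -> genome length in base pairs
    Returns a dict mapping genome id -> fraction of the sample total.
    """
    # Reads per base pair, so that large genomes are not over-represented.
    density = {g: read_counts[g] / genome_sizes[g] for g in read_counts}
    total = sum(density.values())
    if total == 0:
        raise ValueError("sample contains no mapped reads")
    # Normalize against the total genomic abundance in the sample.
    return {g: d / total for g, d in density.items()}


# Hypothetical genome identifiers and counts, for illustration only.
counts = {"MAG_001": 3000, "MAG_002": 1000, "MAG_003": 0}
sizes = {"MAG_001": 3_000_000, "MAG_002": 2_000_000, "MAG_003": 4_000_000}
abundance = relative_abundance(counts, sizes)
```

Other normalizations described in the cited references, for example normalization against a control genome, would replace the final division and are not shown.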

[0089] As used herein, the term “relative abundance” refers to the ratio of a first amount of a compound measured in a sample, e.g., the genome of a first microorganism, to a second amount of a compound. In some embodiments, relative abundance refers to the ratio of the amount of a compound, e.g., the genome of a first microorganism, to the total amount of a compound in the same sample, e.g., the total amount of microbial genomes or the total amount of multiple genomes. In other embodiments, relative abundance refers to the ratio of the amount of a compound in a first sample, e.g., the genome of a first microorganism, to the amount of a compound in a second sample. An example is the ratio of the normalized amount of the genome of a first microorganism in the first sample to the normalized amount of the genome of the first microorganism in the second sample and / or a reference sample.

[0090] As used herein, terms such as “sequencing” and “sequence determination” refer to any biochemical process that may be used to determine the order of biomolecules such as nucleic acids or proteins. For example, sequencing data may include all or part of the nucleotide bases in a nucleic acid molecule, such as an mRNA transcript or a genomic locus.

[0091] As used herein, the terms “sequence read” or “read” refer to a nucleotide sequence produced by any nucleic acid sequencing process described herein or known in the art. Reads can be produced from one end of a nucleic acid fragment ("single-ended read") or from both ends of a nucleic acid fragment (e.g., paired-end read, double-ended read). The length of a sequence read is often related to the particular sequencing technique. For example, high-throughput methods provide sequence reads whose size can vary by tens to hundreds of base pairs (bp). In some embodiments, the sequence reads are of an average, median, or mean length of approximately 15 bp to 900 bp (e.g., approximately 20 bp, 25 bp, 30 bp, 35 bp, 40 bp, 45 bp, 50 bp, 55 bp, 60 bp, 65 bp, 70 bp, 75 bp, 80 bp, 85 bp, 90 bp, 95 bp, 100 bp, 110 bp, 120 bp, 130 bp, 140 bp, 150 bp, 200 bp, 250 bp, 300 bp, 350 bp, 400 bp, 450 bp, or approximately 500 bp). In some embodiments, the sequence reads are of an average, median, or mean length of approximately 1000 bp, 2000 bp, 5000 bp, 10,000 bp, or 50,000 bp or more. For example, Nanopore® sequencing can provide sequence reads that can vary in size by tens, hundreds, or thousands of base pairs. Illumina® parallel sequencing can provide less variable sequence reads, for example, so that most sequence reads can be less than 200 bp. A sequence read (or sequencing read) can refer to sequence information corresponding to a nucleic acid molecule (e.g., a string of nucleotides). 
For example, a sequence read can correspond to a string of nucleotides from a portion of a nucleic acid fragment (e.g., about 20 to about 150), a string of nucleotides at one or both ends of a nucleic acid fragment, or nucleotides from the entire nucleic acid fragment. Sequence reads can be obtained in various ways, for example, by using sequencing techniques, or by using probes, such as hybridization arrays or capture probes, or by amplification techniques such as polymerase chain reaction (PCR), linear amplification using a single primer, or isothermal amplification.

[0092] As used herein, the term “read segment” refers to any form of nucleotide sequence read, including raw sequence reads obtained directly from nucleic acid sequencing techniques or sequences derived therefrom, such as aligned sequence reads, collapsed sequence reads, or stitched sequence reads.

[0093] As used herein, the term “read count” refers to the total number of nucleic acid reads generated during a nucleic acid sequencing reaction, which may or may not be equivalent to the number of nucleic acid molecules generated.

[0094] As used herein, the terms “read depth,” “sequencing depth,” or “depth” may refer to the total number of unique nucleic acid fragments that encompass a particular locus or region of the microbial genome sequenced in a particular sequencing reaction. Sequencing depth can be expressed as “Yx,” e.g., 50x, 100x, etc., where “Y” refers to the number of unique nucleic acid fragments that encompass a particular locus sequenced in the sequencing reaction. In such cases, Y is necessarily an integer, as it represents the actual sequencing depth for a particular locus. Alternatively, read depth, sequencing depth, or depth may refer to a measure of the central tendency (e.g., mean or mode) of the number of unique nucleic acid fragments that encompass one of several loci or regions of the microbial genome sequenced in a particular sequencing reaction. For example, in some embodiments, sequencing depth refers to the average depth of all loci across a targeted sequencing panel, exome, or entire genome of a microorganism. In such cases, Y may be expressed as a fraction or decimal, as it refers to the average coverage across multiple loci. When an average depth is listed, the actual depth for any given locus may differ from the average depth listed. A metric can be determined that provides a range of sequencing depths in which a defined percentage of the total number of loci fall, for example, a range of sequencing depths in which 90%, 95%, or 99% of the loci fall. As those skilled in the art will understand, different sequencing techniques provide different sequencing depths. For example, low-pass whole-genome sequencing may refer to techniques that provide sequencing depths of less than 5x, less than 4x, less than 3x, or less than 2x, for example, approximately 0.5x to approximately 3x.

[0095] As used herein, the term “sequencing breadth” refers to the percentage of a particular microbial genome that has been sequenced. Sequencing breadth can be expressed as a fraction, decimal, or percentage, and is generally calculated as (number of loci analyzed / total number of loci in the genome). The denominator can be a repeat-masked genome, so that 100% corresponds to the entire reference genome excluding the masked portion. A repeat-masked genome can refer to a genome in which sequence repeats are masked (e.g., sequence reads align to an unmasked portion of the genome). In some embodiments, any portion of the genome can be masked, and therefore the sequencing breadth can be evaluated for any desired portion of the genome.
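
As a non-limiting sketch of the depth and coverage definitions above, the following computes an average depth, a range of depths containing a defined percentage of loci, and the fraction of loci analyzed (number of loci analyzed / total number of loci). The function name and the per-locus depths are hypothetical, and the nearest-rank percentile rule is one assumed choice among several.

```python
import math


def depth_summary(per_locus_depth, lower_pct=5, upper_pct=95):
    """Summarize per-locus depths for one genome.

    per_locus_depth: list of integer depths, one entry per locus
    Returns the mean depth, the depth range spanning the central
    (upper_pct - lower_pct) percent of loci, and the covered fraction.
    """
    depths = sorted(per_locus_depth)
    n = len(depths)
    if n == 0:
        raise ValueError("no loci supplied")

    def percentile(p):
        # Nearest-rank percentile: smallest depth with at least p% of loci at or below it.
        rank = max(1, math.ceil(p * n / 100))
        return depths[rank - 1]

    return {
        "mean_depth": sum(depths) / n,  # may be fractional, unlike a single locus
        "depth_range": (percentile(lower_pct), percentile(upper_pct)),
        "covered_fraction": sum(1 for d in depths if d > 0) / n,
    }


# Hypothetical depths at ten loci, for illustration only.
summary = depth_summary([0, 0, 2, 3, 3, 4, 4, 5, 5, 4])
```

With a masked genome, masked loci would simply be excluded from the input list before the summary is computed.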

[0096] As used herein, the terms “sequence ratio” and “coverage ratio” interchangeably refer to any measure of the number of units of each genomic sequence in one or more first biological samples (e.g., test and / or tumor samples) compared to the number of units of each genomic sequence in one or more second biological samples (e.g., reference and / or control samples). In some embodiments, the sequence ratio is the copy ratio, log2-transformed copy ratio (e.g., log2 copy ratio), coverage ratio, base fraction, allele fraction (e.g., variant allele fraction), and / or tumor ploidy. In some embodiments, the sequence ratio is a log N-transformed copy ratio, where N is any real number greater than 1.

[0097] As used herein, the term “sequencing probe” refers to a molecule that binds to a nucleic acid with affinity based on the expected nucleotide sequence of the RNA or DNA present at that locus.

[0098] As used herein, the terms “targeted panel” or “targeted gene panel” refer to a combination of probes for sequencing nucleic acids present (e.g., by next-generation sequencing) in a biological sample from a subject (e.g., a tumor sample, a liquid biopsy sample, a germ cell tissue sample, a leukocyte sample, or a tumor or tissue organoid sample) that have been selected to map to one or more loci of interest within the genome.

[0099] As used herein, the terms “sensitivity” or “true positive rate” (TPR) refer to the number of true positives divided by the sum of the number of true positives and false negatives. Sensitivity can characterize the ability of an assay or method to accurately identify the proportion of a population that is truly symptomatic. For example, sensitivity can characterize the ability of a method to accurately identify the number of subjects in a population that have a particular biological characteristic.

[0100] As used herein, the terms “specificity” or “true negative rate” (TNR) refer to the number of true negatives divided by the sum of true negatives and false positives. Specificity can characterize the ability of an assay or method to accurately identify the proportion of a population that is truly asymptomatic. For example, specificity can characterize the ability of a method to correctly identify the number of subjects in a population that do not possess a particular biological characteristic.
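
The two definitions above can be illustrated with a minimal sketch that counts true and false positives and negatives from binary labels. The function name and the example labels are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity, specificity


# Hypothetical labels: 4 true positives-or-misses, 6 true negatives-or-false-alarms.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```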

[0101] As used interchangeably herein, the terms “classifier” and “model” refer to a machine learning model or algorithm.

[0102] In some embodiments, the model includes unsupervised learning algorithms. An example of an unsupervised learning algorithm is cluster analysis. In some embodiments, the model includes supervised machine learning. Non-limiting examples of supervised learning algorithms include logistic regression, neural networks, support vector machines, naive Bayes algorithms, nearest neighbor algorithms, random forest algorithms, decision tree algorithms, boosted tree algorithms, multinomial logistic regression algorithms, linear models, linear regression, gradient boosting, mixture models, hidden Markov models, Gaussian NB algorithms, linear discriminant analysis, diffusion models, or any combination thereof. In some embodiments, the model is a multinomial classifier algorithm. In some embodiments, the model is a two-stage stochastic gradient descent (SGD) model. In some embodiments, the model is a deep neural network (e.g., a deep and wide sample-level model).

[0103] Neural networks. In some embodiments, the model is a neural network (e.g., a convolutional neural network and / or a residual neural network). Neural network algorithms, also known as artificial neural networks (ANNs), include convolutional and / or residual neural network algorithms (deep learning algorithms). In some embodiments, a neural network is a machine learning algorithm trained to map an input dataset to an output dataset, and the neural network includes an interconnected group of nodes organized into multiple layers of nodes. For example, in some embodiments, a neural network architecture may include at least an input layer, one or more hidden layers, and an output layer. In some embodiments, a neural network may include any total number of layers and any number of hidden layers, where the hidden layers function as trainable feature extractors that enable mapping a set of input data to an output value or a set of output values. In some embodiments, a deep learning algorithm is a neural network including multiple hidden layers, e.g., two or more hidden layers. In some cases, each layer of the neural network includes several nodes (or "neurons"). In some embodiments, a node receives inputs coming directly from either input data or the output of a node in the previous layer and performs a specific operation, e.g., a summing operation. In some embodiments, the connection from an input to a node is associated with a parameter (e.g., a weight and / or weight coefficient). In some embodiments, the node sums the products of all pairs of inputs x(i) and their associated parameters. In some embodiments, the weighted sum is offset by a bias b. In some embodiments, the output of a node or neuron is gated using a threshold or activation function f, which in some cases is a linear or nonlinear function.
In some embodiments, the activation function is, for example, a rectified linear unit (ReLU) activation function, a leaky ReLU activation function, or another function such as a saturating hyperbolic tangent, identity, binary step, logistic, arcTan, softsign, parametric rectified linear unit, exponential linear unit, softPlus, bent identity, softExponential, sinusoid, sinc, Gaussian, or sigmoid function, or any combination thereof.
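
The node computation described above (a weighted sum of inputs, offset by a bias b and passed through an activation function f) may be sketched as follows. The layer sizes, weights, and input values are hypothetical and are chosen only to make the arithmetic easy to follow.

```python
import math


def relu(z):
    return max(0.0, z)


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def node_output(inputs, weights, bias, activation):
    """Compute f(sum_i w_i * x_i + b) for a single node."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)


def layer_output(inputs, weight_rows, biases, activation):
    """Apply node_output once per node; each row of weights belongs to one node."""
    return [node_output(inputs, w, b, activation)
            for w, b in zip(weight_rows, biases)]


# Hypothetical network: 3 inputs -> 2 hidden nodes (ReLU) -> 1 output node (logistic).
x = [0.2, 0.5, 0.3]
hidden = layer_output(x, [[1.0, -1.0, 0.5], [-0.5, 2.0, 0.0]], [0.1, -0.2], relu)
score = node_output(hidden, [1.5, 2.0], -0.7, logistic)
```

In this sketch the first hidden node is switched off by the ReLU because its weighted sum is negative, so only the second hidden node contributes to the output score.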

[0104] In some embodiments, the weight coefficients, bias values, thresholds, or other computational parameters of a neural network are “taught” or “learned” during the training phase using one or more sets of training data. For example, in some embodiments, the parameters are trained using input data from the training dataset and gradient descent or backpropagation so that the output value(s) computed by the ANN match examples contained in the training dataset. In some embodiments, the parameters are obtained from a backpropagation neural network training process.

[0105] Any of the various neural networks are suitable for use according to this disclosure. Examples include, but are not limited to, feedforward neural networks, radial basis function networks, recurrent neural networks, residual neural networks, convolutional neural networks, residual convolutional neural networks, or any combination thereof. In some embodiments, machine learning utilizes a pre-trained and / or transfer-learned ANN or deep learning architecture. In some embodiments, convolutional and / or residual neural networks are used according to this disclosure.

[0106] For example, a deep neural network model includes an input layer, several individually parameterized (e.g., weighted) convolutional layers, and an output scorer. Each parameter (e.g., weight) of the convolutional layer, as well as the input layer, contributes to several parameters (e.g., weights) associated with the deep neural network model. In some embodiments, at least 50 parameters, at least 100 parameters, at least 1000 parameters, at least 2000 parameters, or at least 5000 parameters are associated with the deep neural network model. Therefore, the deep neural network model cannot be solved mentally and requires the use of a computer. In other words, given an input to the model, in such embodiments, the model output must be determined using a computer, not mentally. For example, see Krizhevsky et al., 2012, “Imagenet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems 25, Pereira, Burges, Bottou, Weinberger, eds., pp. 1097-1105, Curran Associates, Inc.; Zeiler, 2012, “ADADELTA: an adaptive learning rate method,” CoRR, vol. abs / 1212.5701; and Rumelhart et al., 1988, “Neurocomputing: Foundations of research,” ch. Learning Representations by Back-propagating Errors, pp. 696-699, Cambridge, MA, USA: MIT Press, each of which is incorporated herein by reference.

[0107] Neural network algorithms, including convolutional neural network models suitable for use as models, are disclosed, for example, in Vincent et al., 2010, “Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion,” J Mach Learn Res 11, pp. 3371-3408, Larochelle et al., 2009, “Exploring strategies for training deep neural networks,” J Mach Learn Res 10, pp. 1-40, and Hassoun, 1995, Fundamentals of Artificial Neural Networks, Massachusetts Institute of Technology, each of which is incorporated herein by reference. Further exemplary neural networks suitable for use as models are disclosed in Duda et al., 2001, Pattern Classification, Second Edition, John Wiley & Sons, Inc., New York, and Hastie et al., 2001, The Elements of Statistical Learning, Springer-Verlag, New York, each of which is incorporated herein by reference in its entirety. Further exemplary neural networks suitable for use as models are also described in Draghici, 2003, Data Analysis Tools for DNA Microarrays, Chapman & Hall / CRC, and Mount, 2001, Bioinformatics: Sequence and Genome Analysis, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, each of which is incorporated herein by reference in its entirety.

[0108] Support Vector Machines. In some embodiments, the model is a support vector machine (SVM). SVM algorithms suitable for use as models are described, for example, in Cristianini and Shawe-Taylor, 2000, “An Introduction to Support Vector Machines,” Cambridge University Press, Cambridge; Boser et al., 1992, “A training algorithm for optimal margin classifiers,” in Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, ACM Press, Pittsburgh, Pa., pp. 142-152; Vapnik, 1998, Statistical Learning Theory, Wiley, New York; Mount, 2001, Bioinformatics: sequence and genome analysis, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY; Duda, Pattern Classification, Second Edition, 2001, John Wiley & Sons, Inc., pp. 259, 262-265; Hastie, 2001, The Elements of Statistical Learning, Springer, New York; and Furey et al., 2000, Bioinformatics 16, 906-914, each of which is incorporated herein by reference in its entirety. When used for classification, SVMs separate a given set of binary-labeled data with a hyperplane that is maximally distant from the labeled data. In certain cases where linear separation is not possible, SVMs work in combination with the technique of a “kernel” that automatically provides a nonlinear mapping to a feature space. The hyperplane found by the SVM in the feature space corresponds in some cases to a nonlinear decision boundary in the input space. In some embodiments, multiple parameters associated with the SVM (e.g., weights) define a hyperplane. In some embodiments, the hyperplane is defined by at least 10, at least 20, at least 50, or at least 100 parameters, and the SVM model requires a computer to compute because it cannot be solved mentally.

[0109] Naive Bayes algorithms. In some embodiments, the model is a naive Bayes algorithm. Naive Bayes models suitable for use as models are disclosed, for example, in Ng et al., 2002, “On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes,” Advances in Neural Information Processing Systems, 14, which is incorporated herein by reference. A naive Bayes model is any model within the family of “probabilistic classifiers” that are based on applying Bayes’ theorem with strong (naive) independence assumptions between features. In some embodiments, they are combined with kernel density estimation. For example, see Hastie et al., 2001, The elements of statistical learning: data mining, inference, and prediction, eds. Tibshirani and Friedman, Springer, New York, which is incorporated herein by reference.

[0110] Nearest neighbor algorithms. In some embodiments, the model is a nearest neighbor algorithm. In some embodiments, the nearest neighbor model is memory-based and requires no model to be fitted. For nearest neighbors, given a query point x(0) (the subject under test), the k training points x(r), r = 1, ..., k (here, the training subjects) closest in distance to x(0) are identified, and the point x(0) is then classified using the k nearest neighbors. In some embodiments, the Euclidean distance in the feature space is used, with the distance determined as d(i) = ||x(i) - x(0)||. Typically, when the nearest neighbor algorithm is used, the abundance data used as input are standardized to have a mean of zero and a variance of 1. In some embodiments, the nearest neighbor rule is refined to address the problems of unequal class priors, differential misclassification costs, and feature selection. Many of these refinements involve some form of weighted voting for neighbors. For further information on nearest neighbor analysis, see Duda, Pattern Classification, Second Edition, 2001, John Wiley & Sons, Inc., and Hastie, 2001, The Elements of Statistical Learning, Springer, New York, each of which is incorporated herein by reference.

[0111] The k-nearest neighbor model is a non-parametric machine learning method in which the input consists of the k nearest training examples in the feature space. The output is class membership. An object is classified by a plurality vote of its neighbors, and the object is assigned to the most common class among its k nearest neighbors (k is typically a small positive integer). When k=1, the object is simply assigned to the class of its single nearest neighbor. See Duda et al., 2001, Pattern Classification, Second Edition, John Wiley & Sons, incorporated herein by reference. In some embodiments, the number of distance calculations required to solve the k-nearest neighbor model is such that it is not mentally feasible, and a computer is used to solve the model for a given input.
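
As a minimal, non-limiting sketch of the nearest neighbor procedure described above, the following standardizes the abundance features to zero mean and unit variance, computes Euclidean distances from the query point to every training point, and assigns the most common class among the k nearest neighbors. The function names, feature values, and class labels are hypothetical.

```python
import math
from collections import Counter


def standardize(train_x, query):
    """Scale each feature to zero mean and unit variance using the training data."""
    n = len(train_x)
    columns = list(zip(*train_x))
    means = [sum(c) / n for c in columns]
    # A constant feature has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0
            for c, m in zip(columns, means)]

    def scale(row):
        return [(v - m) / s for v, m, s in zip(row, means, stds)]

    return [scale(r) for r in train_x], scale(query)


def knn_classify(train_x, train_y, query, k=3):
    """Classify the query by a plurality vote of its k nearest training points."""
    train_z, query_z = standardize(train_x, query)
    distances = [math.dist(z, query_z) for z in train_z]  # Euclidean distance
    nearest = sorted(range(len(train_z)), key=distances.__getitem__)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical abundances of two genomes per subject, for illustration only.
train_x = [[0.10, 0.90], [0.20, 0.80], [0.15, 0.85],
           [0.80, 0.20], [0.90, 0.10], [0.85, 0.15]]
train_y = ["responder"] * 3 + ["non-responder"] * 3
label = knn_classify(train_x, train_y, [0.25, 0.75], k=3)
```

Weighted voting, mentioned above as a common refinement, would replace the plain vote count with a sum of per-neighbor weights such as inverse distance.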

[0112] Random forest, decision tree, and boosted tree algorithms. In some embodiments, the model is a decision tree. Decision trees suitable for use as models are outlined in Duda, 2001, Pattern Classification, John Wiley & Sons, Inc., New York, pp. 395–396, which is incorporated herein by reference. Tree-based methods divide the feature space into a set of rectangles and fit a model (such as a constant) to each. In some embodiments, the decision tree is random forest regression. For example, one particular algorithm is the Classification and Regression Tree Method (CART). Other particular decision tree algorithms include, but are not limited to, ID3, C4.5, MART, and Random Forest. CART, ID3, and C4.5 are described in Duda, 2001, Pattern Classification, John Wiley & Sons, Inc., New York, pp. 396–408 and pp. 411–412, which is incorporated herein by reference. CART, MART, and C4.5 are described in Hastie et al., 2001, The Elements of Statistical Learning, Springer-Verlag, New York, Chapter 9, which is incorporated herein by reference in its entirety. Random forests are described in Breiman, 1999, “Random Forests--Random Features,” Technical Report 567, Statistics Department, UC Berkeley, September 1999, which is incorporated herein by reference in its entirety. In some embodiments, decision tree models include at least 10, at least 20, at least 50, or at least 100 parameters (e.g., weights and / or decisions) and require a computer to compute them because they cannot be solved mentally.

[0113] Regression. In some embodiments, the model uses a regression algorithm. In some embodiments, the regression algorithm is any type of regression. For example, in some embodiments, the regression algorithm is logistic regression. In some embodiments, the regression algorithm is logistic regression with Lasso, L2, or elastic net regularization. In some embodiments, extracted features whose corresponding regression coefficients fail to satisfy a threshold are removed from consideration. In some embodiments, a generalization of the logistic regression model that handles multi-category responses is used as the model. The logistic regression algorithm is disclosed in Agresti, An Introduction to Categorical Data Analysis, 1996, Chapter 5, pp. 103-144, John Wiley & Sons, New York, incorporated herein by reference. In some embodiments, the model utilizes the regression model disclosed in Hastie et al., 2001, The Elements of Statistical Learning, Springer-Verlag, New York. In some embodiments, the logistic regression model includes at least 10, at least 20, at least 50, at least 100, or at least 1000 parameters (e.g., weights), and since it cannot be solved mentally, a computer is required to compute it.
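
By way of non-limiting illustration, logistic regression with L2 regularization, followed by removal of features whose coefficients fail a threshold, may be sketched as below. The batch gradient descent fitting routine, the hyperparameter values, the threshold of 0.5, and the training data are all assumptions made for the example and are not taken from this disclosure.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, l2=0.1, lr=0.5, epochs=2000):
    """Fit L2-regularized logistic regression by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # The l2 * wj term shrinks each weight; the bias is left unpenalized.
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(x, w, b):
    """Predicted probability of the positive class (here, response to therapy)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical abundances of two genomes per subject; 1 = responder, 0 = non-responder.
X = [[0.10, 0.90], [0.20, 0.80], [0.15, 0.85],
     [0.80, 0.20], [0.90, 0.10], [0.85, 0.15]]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_logistic(X, y)

# Keep only features whose coefficient magnitude satisfies an assumed threshold.
kept_features = [j for j, wj in enumerate(w) if abs(wj) >= 0.5]
```

Lasso or elastic net regularization would change only the penalty term in the weight update and is not shown.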

[0114] Linear Discriminant Analysis Algorithms. In some embodiments, the model is linear discriminant analysis (LDA), also known as normal discriminant analysis (NDA) or discriminant function analysis. LDA is a generalization of Fisher's linear discriminant, a method used in statistics, pattern recognition, and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events. In some embodiments of this disclosure, the resulting combination is used as the model (a linear model).

[0115] Mixture models and hidden Markov models. In some embodiments, the model is a mixture model, such as the one described in McLachlan et al., Bioinformatics 18(3):413-422, 2002. In some embodiments, particularly those embodiments including a time component, the model is a hidden Markov model, such as the one described in Schliep et al., 2003, Bioinformatics 19(1):i255-i263.

[0116] Clustering. In some embodiments, the model is an unsupervised clustering model. In some embodiments, the model is a supervised clustering model. Suitable clustering algorithms are described, for example, on pages 211–256 of Duda and Hart, Pattern Classification and Scene Analysis, 1973, John Wiley & Sons, Inc., New York (hereinafter “Duda 1973”), which is incorporated herein by reference in its entirety. As an illustrative example, in some embodiments, the clustering problem is described as one of finding natural groupings in a dataset. To identify natural groupings, two issues are addressed. First, a way to measure similarity (or dissimilarity) between two samples is determined. This metric (e.g., a similarity measure) is used to ensure that the samples in one cluster are more similar to each other than they are to samples in other clusters. Second, a mechanism for dividing the data into clusters using the similarity measure is determined. One way to begin a clustering investigation is to define a distance function and compute a matrix of distances between all pairs of samples in the training set. If distance is a good measure of similarity, the distance between reference entities within the same cluster is significantly smaller than the distance between reference entities in different clusters. However, in some embodiments, clustering does not use a distance metric. For example, in some embodiments, a non-metric similarity function s(x, x') is used to compare two vectors x and x'. In some such embodiments, s(x, x') is a symmetric function that has a large value when x and x' are "similar" in some way. Once a method is chosen for measuring the "similarity" or "dissimilarity" between points in the dataset, clustering uses a criterion function that measures the clustering quality of any partition of the data. Partitions of the dataset that extremize the criterion function are used to cluster the data. Specific exemplary clustering techniques intended for use in this disclosure include, but are not limited to, hierarchical clustering (agglomerative clustering using nearest-neighbor algorithms, farthest-neighbor algorithms, average linkage algorithms, centroid algorithms, or sum-of-squares algorithms), k-means clustering, fuzzy k-means clustering algorithms, and Jarvis-Patrick clustering. In some embodiments, the clustering includes unsupervised clustering (e.g., without prior determination of a predetermined number of clusters and/or cluster assignments).
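
As a non-limiting illustration of the distance-matrix approach described above, the following minimal sketch computes pairwise Euclidean distances and then applies agglomerative clustering with nearest-neighbor (single) linkage. The sample vectors, the function names, and the choice of Euclidean distance are assumptions made for illustration only and are not taken from the disclosure.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(samples):
    """Symmetric matrix of distances between all pairs of samples."""
    n = len(samples)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = euclidean(samples[i], samples[j])
    return d


def single_linkage(samples, n_clusters):
    """Agglomerative clustering using nearest-neighbor linkage."""
    d = distance_matrix(samples)
    clusters = [[i] for i in range(len(samples))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Linkage distance = smallest distance between any two members.
            link = min(d[i][j] for i in clusters[a] for j in clusters[b])
            if best is None or link < best[0]:
                best = (link, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical two-feature abundance profiles for five samples.
abundance_profiles = [
    (0.10, 0.80), (0.12, 0.78), (0.09, 0.83),
    (0.70, 0.20), (0.72, 0.18),
]
clusters = single_linkage(abundance_profiles, n_clusters=2)
print(clusters)
```

Here the criterion being extremized is implicit: each merge step joins the two clusters with the smallest nearest-neighbor distance. Other linkages named above (farthest neighbor, average linkage) would change only the `min(...)` expression.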

[0117] Ensembles of models and boosting. In some embodiments, an ensemble of models (two or more) is used. In some embodiments, a boosting technique such as AdaBoost is used in conjunction with many other types of learning algorithms to improve the performance of the models. In this approach, the outputs of any of the models disclosed herein, or their equivalents, are combined into a weighted sum that represents the final output of the boosted model. In some embodiments, multiple outputs from the models are combined using any measure of central tendency known in the art, including but not limited to the mean, median, mode, weighted mean, weighted median, weighted mode, etc. In some embodiments, multiple outputs are combined using a voting method. In some embodiments, the individual models in the ensemble are weighted or unweighted.
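
The combination step described above can be sketched as follows. This is only an illustration of combining already-computed model outputs by weighted mean, median, and majority vote; it does not implement AdaBoost training, and the scores, weights, and 0.5 decision threshold are hypothetical values chosen for the example.

```python
from collections import Counter
from statistics import median


def weighted_mean(outputs, weights):
    """Weighted sum of model outputs, normalized by the total weight."""
    return sum(w * o for w, o in zip(weights, outputs)) / sum(weights)


def majority_vote(labels):
    """Most frequent label among the models' class predictions."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical outputs of three models for one subject
# (each a predicted probability of responding to therapy).
scores = [0.9, 0.6, 0.3]
weights = [0.5, 0.3, 0.2]  # hypothetical per-model weights

combined = weighted_mean(scores, weights)
central = median(scores)
labels = ["responder" if s >= 0.5 else "non-responder" for s in scores]
vote = majority_vote(labels)

print(combined, central, vote)
```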

[0118] As used herein, the term “parameter” means any coefficient or value of an internal or external element (e.g., weights and/or hyperparameters) in an algorithm, model, regressor, and/or classifier that can affect (e.g., modify, adapt, and/or tune) one or more inputs, outputs, and/or features in the algorithm, model, regressor, and/or classifier. For example, in some embodiments, a parameter means any coefficient, weight, and/or hyperparameter that can be used to control, modify, adapt, and/or tune the behavior, learning, and/or performance of an algorithm, model, regressor, and/or classifier. In some cases, a parameter is used to increase or decrease the influence of an input (e.g., a feature) to an algorithm, model, regressor, and/or classifier. As a non-limiting example, in some embodiments, a parameter is used to increase or decrease the influence of a node (e.g., of a neural network), where the node includes one or more activation functions. The assignment of parameters to specific inputs, outputs, and/or functions is not limited to any single paradigm for a given algorithm, model, regressor, and/or classifier, but can be used in any suitable algorithm, model, regressor, and/or classifier architecture for desired performance. In some embodiments, the parameters have fixed values. In some embodiments, the parameter values are adjustable manually and/or automatically. In some embodiments, the parameter values are modified by a validation and/or training process for the algorithm, model, regressor, and/or classifier (e.g., by an error minimization and/or backpropagation method). 
In some embodiments, the algorithms, models, regressors, and/or classifiers of this disclosure include a plurality of parameters. In some embodiments, the plurality of parameters is n parameters, where n ≥ 2; n ≥ 5; n ≥ 10; n ≥ 25; n ≥ 40; n ≥ 50; n ≥ 75; n ≥ 100; n ≥ 125; n ≥ 150; n ≥ 200; n ≥ 225; n ≥ 250; n ≥ 350; n ≥ 500; n ≥ 600; n ≥ 750; n ≥ 1,000; n ≥ 2,000; n ≥ 4,000; n ≥ 5,000; n ≥ 7,500; n ≥ 10,000; n ≥ 20,000; n ≥ 40,000; n ≥ 75,000; n ≥ 100,000; n ≥ 200,000; n ≥ 500,000; n ≥ 1 × 10^6; n ≥ 5 × 10^6; or n ≥ 1 × 10^7. Therefore, the algorithms, models, regressors, and/or classifiers of the present disclosure cannot be implemented mentally. In some embodiments, n is from 10,000 to 1 × 10^7, from 100,000 to 5 × 10^6, or from 500,000 to 1 × 10^6. In some embodiments, the algorithms, models, regressors, and/or classifiers of the present disclosure operate in a k-dimensional space, where k is a positive integer greater than or equal to 5 (e.g., 5, 6, 7, 8, 9, 10, etc.). Therefore, the algorithms, models, regressors, and/or classifiers of the present disclosure cannot be implemented mentally.

[0119] As used herein, the term “untrained model” (e.g., “untrained classifier” and / or “untrained neural network”) refers to a machine learning model or algorithm, such as a classifier or neural network, that has not been trained on a target dataset. In some embodiments, “training a model” (e.g., “training a neural network”) refers to the process of training an untrained or partially trained model (e.g., “untrained or partially trained neural network”). Furthermore, it should be understood that the term “untrained model” does not preclude the possibility that transfer learning techniques may be used to train such untrained or partially trained models. For example, Fernandes et al., 2017, “Transfer Learning with Partial Observability Applied to Cervical Cancer Screening,” Pattern Recognition and Image Analysis: 8th Iberian Conference Proceedings, 243-250, incorporated herein by reference, provides a non-limiting example of such transfer learning. In examples where transfer learning is used, the aforementioned untrained model is provided with additional data beyond that of the primary training dataset. Typically, this additional data takes the form of parameters (e.g., coefficients, weights, and / or hyperparameters) learned from another auxiliary training dataset. Furthermore, while a description of a single auxiliary training dataset is disclosed, it should be understood that there is no limit to the number of auxiliary training datasets that can be used to complement the primary training dataset when training an untrained model in this disclosure. 
For example, in some embodiments, two or more auxiliary training datasets, three or more auxiliary training datasets, four or more auxiliary training datasets, or five or more auxiliary training datasets may be used to complement the primary training dataset through transfer learning, where each such auxiliary dataset is distinct from the primary training dataset. In some such embodiments, any form of transfer learning is used. For example, consider the case where, in addition to the primary training dataset, there are a first auxiliary training dataset and a second auxiliary training dataset. In such a case, the parameters learned from the first auxiliary training dataset (by applying a first model to the first auxiliary training dataset) are applied to the second auxiliary training dataset using a transfer learning technique (e.g., with a second model that is the same as or different from the first model), which yields a trained intermediate model whose parameters are in turn applied to the primary training dataset; these, together with the primary training dataset itself, are applied to an untrained model. 
Alternatively, in another exemplary embodiment, a first set of parameters learned from a first auxiliary training dataset (by applying a first model to the first auxiliary training dataset) and a second set of parameters learned from a second auxiliary training dataset (by applying a second model, identical to or different from the first model, to the second auxiliary training dataset) are each individually applied to separate instances of the primary training dataset (e.g., by separate independent matrix multiplications), and both such applications of parameters to the separate instances of the primary training dataset, together with the primary training dataset itself (or some reduced form of the primary training dataset, such as principal components or regression coefficients learned from the primary training set), are then applied to an untrained model to train the untrained model.
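
One simple way to picture parameters learned from an auxiliary dataset being carried over to training on a primary dataset is a warm start, sketched below. This is only one possible form of transfer learning and is not presented as the method of the disclosure; the logistic-regression model, the learning rate, the epoch counts, and both toy datasets are assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, weights=None, epochs=200, lr=0.5):
    """Batch gradient descent for logistic regression.

    `weights` (bias first) lets training start from parameters learned
    elsewhere instead of from zeros.
    """
    w = list(weights) if weights is not None else [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def predict(w, x):
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1 if sigmoid(z) >= 0.5 else 0


# Hypothetical auxiliary dataset (larger) and primary dataset (smaller).
aux_X = [(0.9, 0.1), (0.8, 0.2), (0.7, 0.3), (0.2, 0.8), (0.1, 0.9), (0.3, 0.7)]
aux_y = [1, 1, 1, 0, 0, 0]
primary_X = [(0.85, 0.15), (0.15, 0.85)]
primary_y = [1, 0]

# Step 1: learn parameters on the auxiliary dataset.
aux_w = train_logistic(aux_X, aux_y)
# Step 2: start from those parameters and continue on the primary dataset.
model_w = train_logistic(primary_X, primary_y, weights=aux_w, epochs=20)

preds = [predict(model_w, x) for x in primary_X]
print(preds)
```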

[0120] As used herein, the term "AUC" refers to the area under a curve, for example, the area under an ROC curve. Its value can be used to assess the merit of a test in a given sample population, where a value of 1 represents a good test and values range down to 0.5, which means that the test provides a random response when classifying the test subjects. Since the range of AUC is only 0.5 to 1.0, a small change in AUC has greater significance than a similar change in a metric that ranges from 0 to 1 or 0 to 100%. When a percentage change in AUC is given, it is calculated based on the fact that the full range of the metric is 0.5 to 1.0. Various statistical packages can calculate the AUC of an ROC curve. AUC can be used to compare the accuracy of classification algorithms across the full data range. A classification algorithm with a larger AUC has, by definition, a greater capacity to correctly classify unknowns between two groups of interest (e.g., disease and disease-free, or responders and non-responders).
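
For illustration, the AUC of an ROC curve can be computed as the fraction of (positive, negative) pairs that the classifier ranks correctly, with ties counted as one half. The sketch below uses that pairwise formulation. The `rescaled_change` helper reflects one possible reading of the statement that a percentage change is calculated against the 0.5 to 1.0 range; that reading, the function names, and the example scores are assumptions.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def rescaled_change(auc_old, auc_new):
    """Percent change in AUC relative to the 0.5-wide span from 0.5 to 1.0
    (an assumed interpretation, see the lead-in)."""
    return 100.0 * (auc_new - auc_old) / 0.5


# Hypothetical labels (1 = responder) and model scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

auc = roc_auc(labels, scores)
change = rescaled_change(0.70, 0.75)
print(auc, change)
```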

[0121] As used herein, the term “instruction” refers to an instruction given to a computer processor by a computer program. In a digital computer, each instruction is a sequence of 0s and 1s that describes a physical operation performed by the computer. Such instructions may include data transfer instructions and data manipulation instructions. In some embodiments, each instruction is a type of instruction in an instruction set recognized by the specific processor type used to execute the instruction. Examples of instruction sets include, but are not limited to, reduced instruction set computer (RISC), complex instruction set computer (CISC), minimal instruction set computer (MISC), very long instruction word (VLIW), explicitly parallel instruction computing (EPIC), and one instruction set computer (OISC).

[0122] Several embodiments are described below with reference to illustrative applications. It should be understood that numerous specific details, relationships, and methods are set forth to provide a full understanding of the features described herein. However, those skilled in the art will readily recognize that the features described herein can be implemented without one or more of the specific details, or with other methods. The features described herein are not limited by the illustrated order of actions or events, as some actions can occur in different orders and/or simultaneously with other actions or events. Furthermore, not all illustrated actions or events are required to implement a methodology according to the features described herein.

[0123] Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings. The following detailed description includes many specific details to provide a full understanding of the disclosure. However, it will be apparent to those skilled in the art that the disclosure can be carried out without these specific details. In other instances, well-known methods, procedures, components, circuits, and networks are not described in detail so as not to unnecessarily obscure aspects of the embodiments.

[0124] Exemplary System Embodiments. Now that an overview of some aspects of this disclosure and some definitions used in this disclosure have been provided, details of an exemplary system are described in conjunction with Figure 1. Figure 1 is a block diagram illustrating a system 100 according to some embodiments. In some embodiments, the system 100 includes one or more processing units (CPUs) 102 (also referred to as processors), one or more network interfaces 104, a user interface 106 including (optionally) a display 108 and an input system 110, non-persistent memory 111, persistent memory 112, and one or more communication buses 114 for interconnecting these components. The one or more communication buses 114 optionally include circuitry (sometimes called a chipset) that interconnects and controls communication between system components. Non-persistent memory 111 typically includes high-speed random-access memory such as DRAM, SRAM, DDR RAM, ROM, EEPROM, and flash memory, while persistent memory 112 typically includes CD-ROMs, digital versatile disks (DVDs), or other optical storage, magnetic cassettes, magnetic tapes, magnetic disk storage, or other magnetic storage devices, magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. Persistent memory 112 optionally includes one or more storage devices located remotely from the CPU(s) 102. Persistent memory 112, and the non-volatile memory device(s) within non-persistent memory 111, include non-transitory computer-readable storage media. In some embodiments, non-persistent memory 111, or alternatively the non-transitory computer-readable storage media, stores the following programs, modules, and data structures, or subsets thereof, possibly in conjunction with persistent memory 112:
● An optional operating system 116 that handles various basic system services and includes procedures for performing hardware-dependent tasks;
● An optional network communication module (or instructions) 118 for connecting system 100 to other devices and/or communication network 104;
● A microbiome assessment module 140 for determining a subject's disease state among multiple disease states based on the composition of the subject's microbiome; and
● A data store 140 of subject information based on microbiome sequencing results 150, including the abundance values 152 of microorganisms in each of guilds 152-A and 152-B, as described herein.

[0125] In various embodiments, one or more of the identified elements described above are stored in one or more of the aforementioned memory devices and correspond to a set of instructions for performing the functions described above. The identified modules, data, or programs (e.g., sets of instructions) do not need to be implemented as separate software programs, procedures, datasets, or modules, and therefore various subsets of these modules and data may be combined or rearranged in various embodiments. In some embodiments, non-persistent memory 111 optionally stores a subset of the modules and data structures identified above. Furthermore, in some embodiments, the memory stores additional modules and data structures not described above. In some embodiments, one or more of the identified elements described above are stored in a computer system other than the computer system of the visualization system 100 and are addressable by the visualization system 100, so that the visualization system 100 can retrieve all or part of such data when needed.

[0126] Figure 1 depicts "System 100," but the figure is intended more as a functional description of various features that may be present in a computer system than as a structural schematic of the embodiments described herein. In practice, as will be recognized by those skilled in the art, items shown separately can be combined, and some items can be separated. Furthermore, Figure 1 depicts certain data and modules in non-persistent memory 111, but some or all of this data and modules may instead be stored in persistent memory 112.

[0127] 1. A method for training a model to predict a subject's response to a therapy for a disorder. Figure 2 is a schematic diagram of a method for training a model to predict a subject's response to therapy for a disorder, as discussed below. The method can be implemented using a computer system (for example, system 100 shown and described above with reference to Figure 1).

[0128] Referring to Block 200, in some embodiments, the method includes obtaining, in electronic form, for each of the multiple training subjects, (i) corresponding multiple genomic abundance values for each training subject at a time prior to therapy, wherein the corresponding multiple genomic abundance values include, for each of the multiple intestinal microorganisms, a corresponding value for the abundance of the genome of each intestinal microorganism in a corresponding biological sample from the intestine of each training subject; and (ii) an indicator of each training subject's response to the therapy. Each of the multiple training subjects is undergoing therapy for the disorder.

[0129] In some embodiments, the multiple training subjects include at least 50, at least 100, at least 200, at least 500, at least 1,000, at least 5,000, at least 10,000, at least 20,000, at least 50,000, at least 100,000, at least 500,000, or at least 1,000,000 subjects. In some embodiments, the multiple training subjects include 1,000,000 or fewer, 500,000 or fewer, 100,000 or fewer, 50,000 or fewer, 20,000 or fewer, 10,000 or fewer, 1,000 or fewer, 500 or fewer, 100 or fewer, or 50 or fewer subjects. In some embodiments, the multiple training subjects consist of 50-100, 50-200, 50-500, 100-500, 200-500, 200-1000, 500-1000, 200-5,000, 1,000-10,000, 5,000-200,000+, 10,000-50,000, 20,000-100,000, or 500,000-1,000,000 subjects. In some embodiments, the number of training subjects falls within another range, starting no lower than 50 subjects and ending no higher than 100,000,000 subjects. In some embodiments, the multiple subjects share similar health conditions (such as physical or mental condition, medical history, gene carrier status, or drug use).

[0130] In some embodiments, the corresponding biological sample is collected from the intestine of each training subject before treatment or therapy. In some embodiments, the biological sample is collected within 15 minutes, 30 minutes, 1 hour, 2 hours, 3 hours, 4 hours, 5 hours, 6 hours, 12 hours, or 24 hours before treatment or therapy. In some embodiments, the biological sample is collected 1 day, 2 days, 3 days, 4 days, 5 days, 6 days, 1 week, 2 weeks, 3 weeks, 4 weeks, or longer before treatment or therapy. In some embodiments, the biological sample is collected approximately 1 month, 2 months, 3 months, 4 months, 5 months, 6 months, or longer before treatment or therapy.

[0131] In some embodiments, sample data (including plasma and stool samples) and corresponding clinical information (including sex, age, body fat mass, underlying diseases, histopathological features, etc.) are collected for each training subject before therapy. The individual biological samples are subjected to complete microbiome analysis. In some embodiments, the samples are tissue biopsies, intestinal samples, or mucosal samples. See, for example, Tang Q, et al., Current Sampling Methods for Gut Microbiota: A Call for More Precise Devices, Front Cell Infect Microbiol., 10:151 (2020), the contents of which are incorporated herein by reference in their entirety. In some embodiments, the biological sample from the intestine of each training subject is a fecal sample from that training subject.

[0132] In some embodiments, the corresponding value for genome abundance represents the absolute abundance of the microbial genome. In some embodiments, the corresponding value for genome abundance represents a normalized abundance value or a relative abundance value (e.g., the abundance of one microorganism normalized to the abundance of the total microbiome of interest). In some embodiments, the corresponding value for genome abundance represents an averaged abundance value (e.g., the average of abundances obtained at different time points or from different biological samples from a patient, or the average of abundances obtained using different probes), or a combination of any of the above. The corresponding value for genome abundance is measured by any technique known in the art. In some embodiments, the abundance value of a genome is measured by quantitative PCR (qPCR), such as bacterial 16S rRNA qPCR, RT-PCR, or qRT-PCR, to quantify the abundance of a region of interest in the genome, for example, as described in U.S. Patent No. 11,427,865, the disclosure of which is incorporated herein by reference in its entirety. In some embodiments, genome abundance is measured by targeted sequencing (e.g., 16S rRNA sequencing, or sequencing of any other suitable biomarker), partial genome sequencing, or whole genome sequencing, thereby determining genome abundance by quantifying the number of reads of a targeted region within the microbial genome, as disclosed, for example, in U.S. Patent Application Publication 2021/0403986 or U.S. Patent No. 11,332,783 (the disclosures of which are incorporated herein by reference in their entirety). In some embodiments, deep sequencing is used to determine the abundance of a targeted sequence, as disclosed, for example, in U.S. Patent Application Publication 2018/0237863, the disclosure of which is incorporated herein by reference in its entirety. In some embodiments, the sequencing depth is at least 2X, at least 3X, at least 4X, at least 5X, at least 6X, at least 7X, at least 8X, at least 9X, at least 10X, at least 11X, at least 12X, at least 13X, at least 14X, at least 15X, at least 16X, at least 17X, at least 18X, at least 19X, at least 20X, at least 21X, at least 22X, at least 23X, at least 24X, at least 25X, at least 26X, at least 27X, at least 28X, at least 29X, at least 30X, at least 31X, at least 32X, at least 33X, at least 34X, at least 35X, at least 36X, at least 37X, at least 38X, at least 39X, at least 40X, at least 41X, at least 42X, at least 43X, at least 44X, at least 45X, at least 46X, at least 47X, at least 48X, at least 49X, at least 50X, at least 51X, at least 52X, at least 53X, at least 54X, at least 55X, at least 56X, at least 57X, at least 58X, at least 59X, at least 60X, at least 70X, at least 80X, at least 90X, at least 100X, at least 110X, at least 120X, at least 130X, at least 150X, at least 200X, at least 300X, at least 400X, at least 500X, at least 750X, at least 1000X, or more. In some embodiments, for example, shotgun metagenomic sequencing is used to provide sequence reads of the genomes in a sample, as described in U.S. Patent No. 11,028,449, the content of which is incorporated herein by reference in its entirety.
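
As a rough illustration of read-count-based abundance values, the sketch below converts per-genome read counts into relative abundances and estimates sequencing depth. Normalizing read counts by genome length before taking proportions is an assumption made here for the example and is not stated in the text above; the genome names, counts, and lengths are hypothetical.

```python
def relative_abundance(read_counts, genome_lengths):
    """Relative abundance per genome from mapped read counts.

    Reads are divided by genome length (an assumed normalization) so that
    longer genomes are not over-represented, then scaled to sum to 1.
    """
    per_base = {g: read_counts[g] / genome_lengths[g] for g in read_counts}
    total = sum(per_base.values())
    return {g: v / total for g, v in per_base.items()}


def sequencing_depth(n_reads, read_length, genome_length):
    """Average depth (e.g., 30 means 30X): sequenced bases per genome base."""
    return n_reads * read_length / genome_length


# Hypothetical read counts and genome lengths (in base pairs) for two genomes.
counts = {"MAG_A": 3000, "MAG_B": 1000}
lengths = {"MAG_A": 3_000_000, "MAG_B": 2_000_000}

abundances = relative_abundance(counts, lengths)
depth = sequencing_depth(n_reads=600_000, read_length=150, genome_length=3_000_000)
print(abundances, depth)
```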

[0133] In some embodiments, the indicators of the subject's response are characterized by clinical outcome measures and include, but are not limited to, complete remission, partial remission, non-remission, survival, onset of adverse events, or any combination thereof. In some embodiments, a responder has a complete remission in response to treatment and a non-responder has a non-remission or partial remission in response to treatment. In some embodiments, the subject undergoes regular clinical examinations, laboratory analyses, and computed tomography. In some embodiments, tumor response is evaluated using RECIST criteria. In some embodiments, a complete response is defined as complete radiographic disappearance of measurable or evaluable disease, or stable minimal radiographic findings; a partial response is defined as at least a 50% decrease in the longest dimension of measurable disease; stable disease is defined as a decrease of less than 25% in the longest dimension; and progressive disease is defined as tumor growth of more than 25% in the longest dimension or the occurrence of new lesions. In some embodiments, the overall response rate is defined as the sum of the complete response rate and the partial response rate, and the tumor control rate is defined as the sum of the overall response rate and the stable disease rate.
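
The two rates defined at the end of the paragraph above can be computed directly from per-subject response categories, as in the following sketch. The category codes (CR, PR, SD, PD) and the example cohort are hypothetical.

```python
from collections import Counter


def response_rates(best_responses):
    """Overall response rate and tumor control rate for a cohort.

    ORR = (complete + partial responses) / all subjects
    TCR = ORR + stable disease rate
    """
    counts = Counter(best_responses)
    n = len(best_responses)
    orr = (counts["CR"] + counts["PR"]) / n
    tcr = orr + counts["SD"] / n
    return {"ORR": orr, "TCR": tcr}


# Hypothetical cohort of ten subjects.
cohort = ["CR", "PR", "PR", "SD", "PD", "PD", "SD", "PD", "PR", "PD"]
rates = response_rates(cohort)
print(rates)
```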

[0134] In some embodiments, the indicators of the subject's response characterize the actual therapeutic efficacy of the therapy and include, but are not limited to, progression-free survival (PFS), progression-free survival under treatment, overall survival (OS), response to therapy (RT), overall response rate (ORR), durable clinical benefit (DCB), disease activity score, or any combination thereof, or any other method for assessing the progression or prognosis of a disease or disorder known in the art.

[0135] In some embodiments, “progression-free survival” (PFS) has the meaning understood in the art: the length of time during and after treatment of a disease, such as cancer, in which a patient lives with the disease but the disease does not worsen. In some embodiments, measuring progression-free survival is used to assess how well a new treatment is working. In some embodiments, PFS is determined in a randomized clinical trial, and in some such embodiments, PFS refers to the time from randomization to objective tumor progression and/or death.

[0136] In some embodiments, ORR may be defined as the proportion of patients for whom a partial response (PR) or complete response (CR) is identified as the best overall response (BOR) according to a set of criteria, such as the Response Evaluation Criteria in Solid Tumors (RECIST 1.1). Stable disease (SD) is classified as non-response along with progressive disease (PD). In some embodiments, ORR has the meaning understood in the art, referring to the proportion of patients who have a predefined amount of tumor size reduction over a minimum period of time. In some embodiments, the duration of response is typically measured from the time of the initial response until documented tumor progression. In some embodiments, ORR includes the sum of partial and complete responses.

[0137] In some embodiments, “clinical effect” refers to a clinical benefit. In some embodiments, such a clinical benefit is or includes a reduction in tumor size, an increase in progression-free survival, an increase in overall survival, a reduction in total tumor burden, a reduction in symptoms caused by tumor growth (such as pain, organ failure, bleeding, skeletal damage, and other associated sequelae of metastatic cancer), or a combination thereof. In some embodiments, the clinical benefit is a “durable clinical benefit” (DCB) that is maintained for a relevant period. In some embodiments, the relevant period is at least one month, two months, three months, four months, five months, six months, seven months, eight months, nine months, ten months, eleven months, one year, two years, three years, four years, five years, or longer.

[0138] In some embodiments, the subject's response is measured by a Disease Activity Score (DAS) (see, e.g., Van der Heijde DM et al., J Rheumatol, 1993, 20(3):579-81; Prevoo ML et al., Arthritis Rheum, 1995, 38:44-8). The DAS system represents both the current state of disease activity and changes in disease activity. The DAS scoring system uses a weighted formula derived from clinical trials in rheumatoid arthritis (RA). For example, DAS28 is 0.56√(T28) + 0.28√(SW28) + 0.70 ln(ESR) + 0.014 GH, where T represents the number of tender joints, SW represents the number of swollen joints, ESR represents the erythrocyte sedimentation rate, and GH represents global health. Various DAS values represent high or low disease activity and remission, while change and endpoint scores result in a patient classification based on the degree of response (none, moderate, good).
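
The DAS28 formula can be evaluated as in the sketch below. It uses the commonly published DAS28-ESR form, in which the tender and swollen joint counts enter under a square root and ESR enters as a natural logarithm. The units assumed here (ESR in mm/h, global health on a 0 to 100 scale) and the example input values are assumptions for illustration.

```python
import math


def das28_esr(tender28, swollen28, esr, global_health):
    """DAS28 from 28-joint tender/swollen counts, ESR, and global health.

    Assumes esr is in mm/h (must be > 0) and global_health is on a
    0-100 scale.
    """
    return (
        0.56 * math.sqrt(tender28)
        + 0.28 * math.sqrt(swollen28)
        + 0.70 * math.log(esr)
        + 0.014 * global_health
    )


# Hypothetical subject: 4 tender joints, 4 swollen joints, ESR 20, GH 50.
score = das28_esr(tender28=4, swollen28=4, esr=20, global_health=50)
print(round(score, 2))
```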

[0139] In some embodiments, the indicator of the subject's response is measured by the level of the immune response, or of immune parameters, resulting from immunotherapy in a patient with cancer. In some embodiments, the immune response or immune parameters are characterized by the expression levels of various biological markers of the host immune response, in connection with the progression of the cancer at a given stage of its development (i.e., treatment efficacy). In some embodiments, the expression level of a biological marker is compared to a baseline value for the same biological marker and, if necessary, to multiple baseline values. The baseline value for the biological marker is predetermined and is already known to be appropriate for distinguishing between low and high levels of the immune response in patients with cancer for that biological marker. The predetermined baseline value for the biological marker correlates with responders to treatment among cancer patients or, conversely, with non-responders to treatment among cancer patients.

[0140] In some embodiments, changes in combinations of biological markers are quantified. In some embodiments, combinations of 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, or more distinct biological markers are quantified.

[0141] In certain embodiments, biological markers are quantified by immunohistochemical techniques. Exemplary biological markers include 18s, ACE, ACTB, AGTR1, AGTR2, APC, APOA1, ARF1, AXIN1, BAX, BCL2, BCL2L1, CXCR5, BMP2, BRCA1, BTLA, C3, CASP3, CASP9, CCL1, CCL11, CCL13, CCL16, CCL17, CCL18, CCL19, CCL2, CCL20, CCL21, CCL22, CCL23, CCL24, CCL25, CCL26, CCL27, CCL28, CCL3, CCL5, CCL7, CCL8, CCNB1, CCND1, CCNE1, CCR1, CCR10, CCR2, CCR3, CCR4, CCR5, CCR6, CCR7, CCR8, CCR9, CCRL2, CD154, CD19, CD1a, CD2, CD226, CD244, PDCD1LG1, CD28, CD34, CD36, CD38, CD3E, CD3G, CD3Z, CD4, CD40LG, CD5, CD54, CD6, CD68, CD69, CLIP, CD80, CD83, SLAMF5, CD86, CD8A, CDH1, CDH7, CDK2, CDK4, CDKN1A, CDKN1B, CDKN2A, CDKN2B, CEACAM1, COL4A5, CREBBP, CRLF2, CSF1, CSF2, CSF3, CTLA4, CTNNB1, CTSC, CX3CL1, CX3CRI, CXCL1, CXCL10, CXCL11, CXCL12, CXCL13, CXCL14, CXCL16, CXCL2, CXCL3, CXCL5, CXCL6, CXCL9, CXCR3, CXCR4, CXCR6, CYP1A2, CYP7A1, DCC, DCN, DEFA6, DICER1, DKK1, Dok-1, Dok-2, DOK6, DVL1, E2F4, EBI3, ECE1, ECGF1, EDN1, EGF, EGFR, EIF4E, CD105, ENPEP, ERBB2, EREG, FCGR3A, FCGR3B, FN1, FOXP3, FYN, FZD1, GAPD, GLI2, GNLY, GOLPH4, GRB2, GSK3B, GSTP1, GUSB, GZMA, GZMH, GZMK, HLA-B, HLA-C, HLA-DMA, HLA-DMB, HLA-DOA, HLA-DOB, HLA-DPA1, HLA-DQA2, HLA-DRA, HLX1, HMOX1, HRAS, HSPB3, HUWE1, ICAM1, ICAM-2, ICOS, ID1, ifna1, ifna17, ifna2, ifna5, ifna6, ifna8, IFNAR1, IFNAR2, IFNG, IFNGR1, IFNGR2, IGF1, IHH, IKBKB, IL10, IL12A, IL12B, IL12RB1, IL12RB2, IL13, IL13RA2, IL15, IL15RA, IL17, IL17R, IL17RB, IL18, IL1A, IL1B, IL1RI, IL2, IL21, IL21R, IL23A, IL23R, IL24, IL27, IL2RA, IL2RB, IL2RG, IL3, IL31RA, IL4, IL4RA, IL5, IL6, IL7, IL7RA, IL8, CXCR1, CXCR2, IL9, IL9R, IRF1, ISGF3G, ITGA4, ITGA7, integrin, alpha E (antigen CD103, human mucosal lymphocyte, antigen 1; alpha polypeptide), gene hCG33203, ITGB3, JAK2, JAK3, KLRB1, KLRC4, KLRF1, KLRG1, KRAS, LAG3, LAIR2, LEF1, LGALS9, LILRB3, LRP2, LTA, SLAMF3, MADCAM1, MADH3, MADH7, MAF, MAP2K1, MDM2, MICA, MICB, MKI67, MMP12, MMP9, MTA1, MTSS1, MYC, MYD88, MYH6, NCAM1, NFATC1, NKG7, NLK, NOS2A, P2X7, PDCD1, PECAM-1, CXCL4, PGK1, PIAS1, PIAS2, PIAS3, PIAS4, PLAT, PML, PP1A, CXCL7, PPP2CA, PRF1, PROM1, PSMB5, PTCH, PTGS2, PTP4A3, PTPN6, PTPRC, RAB23, RAC/RHO, RAC2, RAF, RB1, RBL1, REN, Drosha, SELE, SELL, SELP, SERPINE1, SFRP1, SIRP Beta 1, SKI, SLAMF1, SLAMF6, SLAMF7, SLAMF8, SMAD2, SMAD4, SMO, SMOH, SMURF1, SOCS1, SOCS2, SOCS3, SOCS4, SOCS5, SOCS6, SOCS7, SOD1, SOD2, SOD3, SOS1, SOX17, CD43, ST14, STAM, STAT1, STAT2, STAT3, STAT4, STAT5A, STAT5B, STAT6, STK36, TAP1, TAP2, TBX21, TCF7, TERT, TFRC, TGFA, TGFB1, TGFBR1, TGFBR2, TIMP3, TLR1, TLR10, TLR2, TLR3, TLR4, TLR5, TLR6, TLR7, TLR8, TLR9, TNF, TNFRSF10A, TNFRSF11A, TNFRSF18, TNFRSF1A, TNFRSF1B, OX-40, TNFRSF5, TNFRSF6, TNFRSF7, TNFRSF8, TNFRSF9, TNFSF10, TNFSF6, TOB1, TP53, TSLP, VCAM1, VEGF, WIF1, WNT1, WNT4, XCL1, XCR1, ZAP70, and ZIC2.

[0142] In some embodiments, a treatment regimen or therapy may be administered via any common route, as long as the target tissue or cells are accessible through that route. This includes, but is not limited to, intravenous, catheter-based, orthotopic, intradermal, subcutaneous, intramuscular, intraperitoneal, intratumoral, oral, nasal, buccal, rectal, vaginal, or topical administration. The choice of therapeutic agent and administration regimen may depend on a variety of factors, including the drug combination used, the specific disease being treated, and the patient's condition and medical history.

[0143] Referring to Block 202, in some embodiments, the method includes sequencing genomic DNA from a corresponding biological sample from the intestine of each of the multiple training subjects, thereby obtaining a plurality of corresponding nucleic acid sequences. In some embodiments, the plurality of corresponding nucleic acid sequences includes at least 10,000, at least 100,000, at least 250,000, at least 500,000, at least 1,000,000, at least 2,500,000, at least 5,000,000, at least 10,000,000, or at least 50,000,000 nucleic acid sequences. In some embodiments, the plurality of corresponding nucleic acid sequences includes no more than 250,000,000, no more than 100,000,000, no more than 50,000,000, no more than 25,000,000, no more than 10,000,000, no more than 5,000,000, no more than 1,000,000, or no more than 100,000 nucleic acid sequences. In some embodiments, the plurality of corresponding nucleic acid sequences consists of 100,000 to 1,000,000, 200,000 to 5,000,000, 500,000 to 10,000,000, 1,000,000 to 20,000,000, 5,000,000 to 50,000,000, 10,000,000 to 100,000,000, or 50,000,000 to 250,000,000 nucleic acid sequences. In some embodiments, the number of corresponding nucleic acid sequences falls within another range, starting no lower than 1,000 nucleic acid sequences and ending no higher than 250,000,000 nucleic acid sequences.

[0144] In some embodiments, the corresponding plurality of nucleic acid sequences is obtained by metagenomic sequencing, for example, as disclosed in U.S. Patent Application Publication 2016/0239602 or U.S. Patent No. 11,495,326, the contents of which are incorporated herein by reference in their entirety. In some embodiments, metagenomic sequencing further includes generating a plurality of metagenomic fragment reads. In some embodiments, metagenomic sequencing further includes fragmenting a microbial genome into random fragments of a target size. The resulting fragments may vary in size. In one embodiment, fragments of approximately 500 nucleotides can be obtained. In some embodiments, fragments of 100 to 2000 nucleotides can be obtained, for example, 200 to 800, 100 to 900, 100 to 1000, 300 to 800, or 400 to 900 nucleotides. In some embodiments, the method may further include extracting the metagenomic fragments from a corresponding biological sample. In some embodiments, metagenomic sequencing further includes sequencing the fragments using a high-throughput sequencing method to generate a plurality of sequencing reads.

[0145] In some embodiments, the corresponding plurality of nucleic acid sequences is obtained by targeted panel sequencing. An example of targeted panel sequencing is described in U.S. Patent Application Publication 2019/0316209. In some embodiments, targeted panel sequencing includes hybridizing genomic DNA isolated from a biological sample from the gut of the subject with a panel of probes, which includes one or more probes that hybridize to a unique sequence within the genome of each microorganism being quantified, and then sequencing the recovered nucleic acids. In some embodiments, the microorganisms include a plurality of microorganisms listed in Table 1, Table 2, and/or Figures 13A–13XX. In some embodiments, combinations of quasi-unique sequences (e.g., sequences found in a small number of microbial genomes) can be used to deconvolve genomic abundance values using an algorithm, e.g., a system of equations. In some embodiments, the panel of probes includes at least one probe that hybridizes to a sequence unique to each microbial genome being detected. In some embodiments, the probe panel includes at least two, at least three, at least four, at least five, at least ten, at least 25, at least 50, or more probes that hybridize to different sequences unique to each microbial genome being detected. In some embodiments, the probe panel includes at least 20, at least 30, at least 40, at least 50, at least 75, at least 100, at least 125, at least 150, at least 200, at least 300, at least 400, at least 500, at least 750, at least 1000, at least 1250, at least 1500, at least 2000, at least 2500, at least 3000, at least 4000, at least 5000, at least 7500, or at least 10,000 or more unique probes.
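As a non-limiting illustration, the deconvolution of genomic abundance values from quasi-unique sequences by a system of equations can be sketched in Python as follows. The probe design, the signal values, and the function name `solve_linear_system` are hypothetical and are not taken from this disclosure; the sketch assumes that each probe signal is the sum of the abundances of the genomes to which the probe hybridizes, and that the design has as many probes as genomes.

```python
from fractions import Fraction


def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination using exact fractions."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero entry in this column.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular probe design: abundances are not identifiable")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


# Hypothetical design: rows are probes, columns are genomes A, B, and C.
probe_hits = [
    [1, 1, 0],  # probe 1 hybridizes to genomes A and B
    [0, 1, 1],  # probe 2 hybridizes to genomes B and C
    [1, 0, 1],  # probe 3 hybridizes to genomes A and C
]
probe_signal = [50, 70, 60]  # hypothetical read counts recovered per probe

abundance = solve_linear_system(probe_hits, probe_signal)
print([int(a) for a in abundance])
```

In practice a panel would typically contain more probes than genomes, in which case a least-squares or non-negative least-squares solution would be a more natural choice than exact elimination.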

[0146] In some embodiments, the genomic DNA to be sequenced from the corresponding biological sample contains, at its ends, a partial or complete sequencing platform adapter sequence useful for sequencing on the sequencing platform of interest. Examples of sequencing platforms of interest include, but are not limited to, Illumina®'s HiSeq®, MiSeq®, and Genome Analyzer® sequencing systems, Ion Torrent®'s Ion PGM® and Ion Proton® sequencing systems, Pacific Biosciences' PACBIO RS II Sequel system, Life Technologies®' SOLiD sequencing system, Roche's 454GS FLX+ and GS Junior sequencing systems, Oxford Nanopore's MinION® system, or any other sequencing platform of interest.

[0147] Referring to block 204, in some embodiments, the method includes obtaining, in electronic form, a corresponding plurality of nucleic acid sequences for genomic DNA from a corresponding biological sample from the intestine of each of the multiple training subjects.

[0148] Referring to Block 206, in some embodiments, the method includes determining, for each intestinal microorganism among a plurality of intestinal microorganisms, a corresponding genomic abundance value for that intestinal microorganism from the corresponding plurality of nucleic acid sequences.

[0149] In some embodiments, the genomic abundance values determined for each of the multiple training subjects include at least 20, at least 25, at least 50, at least 75, at least 100, at least 125, at least 150, at least 175, at least 200, at least 250, at least 300, at least 350, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900, at least 1,000, at least 1,500, at least 2,000, at least 2,500, at least 5,000, or at least 10,000 genomic abundance values, each genomic abundance value corresponding to a different intestinal microorganism. In some embodiments, the genomic abundance values determined for each of the multiple training subjects include no more than 250,000, no more than 100,000, no more than 50,000, no more than 25,000, no more than 10,000, no more than 5,000, no more than 2,500, no more than 1,000, no more than 750, no more than 500, or fewer genomic abundance values. In some embodiments, the genomic abundance values determined for each of the multiple training subjects include 10 to 40, 20 to 50, 30 to 80, 40 to 100, 50 to 150, 60 to 200, 80 to 300, 90 to 500, 100 to 1,000, 500 to 2,000, or 1,000 to 5,000 genomic abundance values. In some embodiments, the number of genomic abundance values determined for each training subject falls within another range, starting no lower than 10 genomic abundance values and ending no higher than 250,000 genomic abundance values.

[0150] Referring to Block 208, in some embodiments, for each training subject in a plurality of training subjects, the method includes: assembling, in electronic form, a corresponding plurality of gut microorganism genomes by metagenomic de novo sequence assembly from the corresponding plurality of nucleic acid sequences; and, for each gut microorganism in the plurality of gut microorganisms, calculating a corresponding genomic abundance value based on the prevalence of the nucleic acid sequences, among the plurality of nucleic acid sequences, that were used to assemble the genome corresponding to that gut microorganism. In some embodiments, the metagenomic de novo sequence assembly further includes generating contigs based on sequence reads generated by a shotgun sequencing technique. Such a technique is described, for example, in U.S. Patent No. 10,529,443, the contents of which are incorporated herein by reference in their entirety. In some embodiments, the plurality of nucleic acid sequences is assembled into whole genomes of the plurality of gut microorganisms. In some embodiments, the plurality of nucleic acid sequences is assembled into partial genomes of the plurality of gut microorganisms.

[0151] Referring to block 210, in some embodiments, for each subject in a plurality of training subjects, the method includes assigning each nucleic acid sequence in the corresponding plurality of nucleic acid sequences to a respective intestinal microorganism in a plurality of intestinal microorganisms, thereby generating, for each intestinal microorganism, a corresponding count of the nucleic acid sequences assigned to that intestinal microorganism, and determining a corresponding genomic abundance value for each intestinal microorganism based on that count. In some embodiments, assigning a nucleic acid sequence to an intestinal microorganism includes mapping the nucleic acid sequence to a reference nucleic acid (e.g., a contig enumerated in Figure 12). In some embodiments, assigning a nucleic acid sequence to an intestinal microorganism includes annotating genomic information based on an existing database. In some embodiments, the nucleic acid sequences are analyzed, and the annotation defines a taxonomic assignment using sequence similarity methods, phylogenetic placement methods, or a combination of the two strategies.
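As a non-limiting illustration, the counting step of block 210 can be sketched in Python as follows. The genome identifiers and the function name `genomic_abundance` are hypothetical, and normalizing the counts to relative abundances is an assumption of this sketch; the disclosure states only that the abundance value is based on the count.

```python
from collections import Counter


def genomic_abundance(read_assignments, genomes):
    """Convert per-read genome assignments into relative genomic abundance values.

    read_assignments: one genome identifier per read, or None for unassigned reads.
    genomes: the genome identifiers for which abundance values are reported.
    """
    counts = Counter(g for g in read_assignments if g is not None)
    total = sum(counts[g] for g in genomes)
    if total == 0:
        return {g: 0.0 for g in genomes}
    # Assumption: abundance is the fraction of assigned reads, not a raw count.
    return {g: counts[g] / total for g in genomes}


# Hypothetical assignments for five reads; one read could not be assigned.
assignments = ["MAG_A", "MAG_B", "MAG_A", None, "MAG_A"]
abundance = genomic_abundance(assignments, ["MAG_A", "MAG_B", "MAG_C"])
print(abundance)
```

A production pipeline would usually also normalize by genome length or sequencing depth; that refinement is omitted here for brevity.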

[0152] Sequence similarity-based methods for assigning each nucleic acid sequence to a specific intestinal microorganism include, but are not limited to, various implementations of algorithms familiar to those skilled in the art, such as BLAST, BLASTx, tBLASTn, tBLASTx, RDP classifier, DNAclust, and QIIME or mothur. These methods rely on mapping sequence reads to a reference database and selecting the match with the best score and E-value. In some embodiments, phylogenetic methods are used in combination with sequence similarity methods to improve the accuracy of annotation or taxonomic assignment calls. Common databases include, but are not limited to, GTDB-Tk, National Center for Biotechnology Information (NCBI) GenBank, European Bioinformatics Institute-European Nucleotide Archive (EBI-ENA), National Institute of Genetics, US Department of Energy (USDOE) Integrated Microbial Genomes & Microbiomes (IMG/M), and other databases available in the relevant field.

[0153] In some embodiments, the multiple genomic abundance values are determined using a microarray containing probe sequences capable of detecting unique genome sequences for each respective genome of the multiple intestinal microorganisms. In some embodiments, a panel of probes on the microarray includes at least one probe that hybridizes to a sequence unique to each microorganism genome being detected. In some embodiments, a panel of probes includes at least two, at least three, at least four, at least five, at least ten, at least 25, at least 50, or more probes that hybridize to different sequences unique to each microorganism genome being detected. In some embodiments, the probe panel includes at least 20, at least 30, at least 40, at least 50, at least 75, at least 100, at least 125, at least 150, at least 200, at least 300, at least 400, at least 500, at least 750, at least 1000, at least 1250, at least 1500, at least 2000, at least 2500, at least 3000, at least 4000, at least 5000, at least 7500, or at least 10,000 or more unique probes.

[0154] Referring to block 212, in some embodiments, the multiple intestinal microorganisms include at least 20 intestinal microorganisms selected from Table 1, Table 2, or Figures 13A to 13XX. In some embodiments, at least about 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100 or more intestinal microorganisms are selected from Table 1, Table 2, or Figures 13A to 13XX. In some embodiments, the multiple intestinal microorganisms include at least 25 intestinal microorganisms selected from Table 1, Table 2, or Figures 13A to 13XX. In some embodiments, the multiple intestinal microorganisms include at least 30 intestinal microorganisms selected from Table 1, Table 2, or Figures 13A to 13XX. In some embodiments, the multiple intestinal microorganisms include at least 40 intestinal microorganisms selected from Table 1, Table 2, or Figures 13A to 13XX. In some embodiments, the multiple intestinal microorganisms include at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65, at least 70, at least 75, at least 80, at least 85, at least 90, at least 95, at least 100, at least 125, at least 150, at least 200, at least 250, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, or all intestinal microorganisms selected from Table 1, Table 2, or Figures 13A to 13XX. In some embodiments, the multiple intestinal microorganisms are at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65, at least 70, at least 75, at least 80, at least 85, at least 90, at least 95, at least 100, at least 125, at least 150, at least 200, at least 250, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, or all intestinal microorganisms selected from Table 1, Table 2, or Figures 13A to 13XX. In some embodiments, the multiple intestinal microorganisms are all intestinal microorganisms listed in Table 1. In some embodiments, the multiple intestinal microorganisms are all intestinal microorganisms listed in Table 2. In some embodiments, the multiple intestinal microorganisms are all intestinal microorganisms listed in Figures 13A to 13XX.

Table 1-1

Table 1-2

Table 1-3

Table 1-4

Table 1-5

Table 1-6

Table 1-7

Table 2-1

[0155] The bacterial species listed in Tables 1, 2, and Figures 13A–13XX were identified by metagenomic sequencing of genomic DNA isolated from human fecal samples, as described in the examples, and determined to be part of two competing microbial guilds for at least one biological characteristic. Briefly, genomic DNA was isolated from each fecal sample, sequenced by next-generation sequencing, and contigs for microbial genome sequences were constructed de novo. In general, the identified contigs for each microorganism are expected to represent more than 95% of the entire genome for that microorganism. Genomic constructs with less than 1% sequence variance from each other were combined and defined as being from the same microorganism. Genomic contigs for each microorganism listed in Tables 1, 2, and Figures 13A–13XX are provided in the sequence listings submitted with this application. The taxonomic assignment for each microorganism is given in Tables 1, 2, or Figures 13A–13XX. The correspondence between the sequence identifier assigned to each contig and the microorganism to which it belongs is provided in Figure 12. For example, the contigs provided as Sequence IDs: 1-68 are classified as belonging to the Bacteria domain, Proteobacteria phylum, Gammaproteobacteria class, Enterobacterales order, Enterobacteriaceae family, Escherichia genus, and Escherichia coli species, and correspond to the genome sequence of microorganism 1U001.8 (as shown in Figure 12A), which is located in Guild 2 of the 141 core microorganisms identified in Table 1.

[0156] Therefore, in some embodiments of the method described herein, if the identified genome construct has at least 97% sequence identity compared to the microbial contig provided in the sequence listing, as shown in Figure 12, the genome identified by metagenomic analysis is classified as corresponding to the microorganisms listed in Tables 1, 2, and / or Figures 13A to 13XX. In some embodiments, if the identified genome construct has at least 98% sequence identity compared to the microbial contig provided in the sequence listing, as shown in Figure 12, the genome identified by metagenomic analysis is classified as corresponding to the microorganisms listed in Tables 1, 2, and / or Figures 13A to 13XX. In some embodiments, if the identified genome construct has at least 99% sequence identity compared to the microbial contig provided in the sequence listing, as shown in Figure 12, the genome identified by metagenomic analysis is classified as corresponding to the microorganisms listed in Tables 1, 2, and / or Figures 13A to 13XX. In some embodiments, as shown in Figure 12, if the identified genome construct has at least 99.5% sequence identity with respect to the microbial contig provided in the sequence listing, the genome identified by metagenomic analysis is classified as corresponding to the microorganisms listed in Tables 1, 2, and / or Figures 13A to 13XX. In some embodiments, if the identified genome construct has at least 97%, at least 97.5%, at least 98%, at least 98.5%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, at least 99.9%, or higher sequence identity with respect to the microbial contig provided in the sequence listing, as shown in Figure 12, the genome identified by metagenomic analysis is classified as corresponding to the microorganisms listed in Tables 1, 2, and / or Figures 13A to 13XX.
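As a non-limiting illustration, the identity-threshold test described above can be sketched in Python as follows. The function names and the toy sequences are hypothetical. The sketch assumes that the identified genome construct and the reference contig have already been aligned to equal length, and it counts gap positions as mismatches; an actual implementation would typically rely on a dedicated alignment or average nucleotide identity tool.

```python
def percent_identity(aligned_query, aligned_reference):
    """Percentage of aligned positions with identical bases (gaps count as mismatches)."""
    if not aligned_query or len(aligned_query) != len(aligned_reference):
        raise ValueError("sequences must be non-empty and aligned to the same length")
    matches = sum(
        q == r and q != "-" for q, r in zip(aligned_query, aligned_reference)
    )
    return 100.0 * matches / len(aligned_query)


def corresponds_to_reference(aligned_query, aligned_reference, min_identity=97.0):
    """Return True when the construct meets the chosen sequence-identity threshold."""
    return percent_identity(aligned_query, aligned_reference) >= min_identity


# Toy 100-base reference and a query that differs at two positions (98% identity).
reference = "ACGT" * 25
query = "TT" + reference[2:]

print(percent_identity(query, reference))
print(corresponds_to_reference(query, reference, min_identity=97.0))
print(corresponds_to_reference(query, reference, min_identity=99.0))
```

The `min_identity` argument corresponds to the alternative thresholds recited above (97%, 98%, 99%, 99.5%, and so on).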

[0157] Referring to block 214, in some embodiments, the multiple intestinal microorganisms include at least 20 microorganisms selected from those microorganisms in Table 1, Table 2, or Figures 13A to 13XX, each having at least 2 binding affinities. In some embodiments, the identified set of intestinal microorganisms is selected from those microorganisms having at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, or more binding affinities.

[0158] Referring to Block 216, in some embodiments, the biological sample from the intestines of each subject is a fecal sample from each training subject. In some embodiments, the biological sample is a sample obtained from the small or large intestine, preferably the colon or rectum, and more preferably in the form of a fecal sample or rectal swab, or in the form of a biopsy specimen of the gastrointestinal mucosa.

[0159] Referring to Block 218, in some embodiments, the therapy is a biological therapy, immunotherapy, chemotherapy, radiotherapy, gene therapy, hormone therapy, photodynamic therapy, targeted therapy, small molecules, antibodies, polynucleotides, natural compounds, immunomodulators, bone marrow therapy, stem cell therapy, surgical therapy, induction therapy, maintenance therapy, or a combination thereof.

[0160] Referring to Block 220, in some embodiments, the disorder is selected from the group consisting of type 2 diabetes, hypertension, schizophrenia, atherosclerotic cardiovascular disease (ACVD), liver cirrhosis (LC), inflammatory bowel disease (IBD), colorectal cancer (CRC), ankylosing spondylitis (AS), Parkinson's disease (PD), rheumatoid arthritis (RA), advanced melanoma, and B-cell lymphoma. In some embodiments, the disorder is, for example, hypertension (HT), schizophrenia (SCZ), multiple sclerosis (MS), Gaucher disease type 2 (GDII), COVID-19 (COV), Behçet's disease (BD), autism spectrum disorder (ASD), or pancreatic cancer (PC). In some embodiments, the disorder is cancer, Alzheimer's disease, cardiovascular disease, autoimmune disease, mental health disorder, infection, or genetic disorder.

[0161] In some embodiments, disorders are classified by any indicator of a biological state, function, structure, process, response, or condition in a patient. Such indicators include any of a number of variables (parameters) commonly measured in medicine to assess a patient for purposes such as diagnosis, prognosis, and/or treatment. Typically, the indicators of interest herein are those whose values (which may be quantitative or qualitative) reflect, characterize, or relate to the function or structure of organs and organ systems, and/or whose values reflect, characterize, or relate to the presence or severity of a condition. In some embodiments, disorders are classified by the progression or prognosis of a disease or disorder, for example, by different stages, types, frequencies, or severities of cancer, which can be objectively measured or experienced by the subject. In certain embodiments, the indicators may be obtained with medical devices, which may be used to analyze the state of physical substrates such as test strips, dipsticks, filters, or other substrates, providing means for detecting or measuring the state of a body part, wound, or lesion, or the presence of a substance in a biological sample obtained from the subject. In some embodiments, the aim is to detect pathogens (e.g., viruses, bacteria, fungi), abnormal tissues (e.g., tumor sites), or antibodies against biomarkers in biological samples from patients, for purposes such as diagnosing the presence of a disorder or disease.

[0162] Referring to block 222, in some embodiments, the disorder is cancer.

[0163] Referring to block 224, in some embodiments, the method includes inputting information about each of the multiple training subjects into a model that includes multiple parameters. The model applies the multiple parameters to the information through, for example, at least 10,000 calculations to obtain a corresponding output from the model for each training subject. The corresponding output includes a prediction of the response of each training subject to a therapy. The information about each training subject includes a corresponding genomic abundance value for each of the multiple gut microorganisms, which are selected from Table 1, Table 2, or Figures 13A to 13XX.

[0164] In some embodiments, the model is trained on datasets collected across multiple therapies for a disorder, and the model is trained to distinguish between responsive and non-responsive states. In some embodiments, the model includes a learning statistical classifier system. In some embodiments, the learning statistical classifier system is a random forest, a classification and regression tree, a boosted tree, or a neural network. For example, as described in Example 3, a random forest classifier was trained on datasets from 11 different studies that collectively examined the microbiome in four different disorders. As shown in Figure 8C, the resulting model was able to predict responders and non-responders to anti-cytokine or anti-integrin therapy, methotrexate treatment in newly diagnosed rheumatoid arthritis, immune checkpoint inhibitor (ICI) treatment for advanced melanoma, and CD19-CAR-T immunotherapy for B-cell lymphoma.

[0165] Referring to block 226, in some embodiments, the prediction of each training subject's response is a class output that selects one response from among multiple possible responses for that training subject. The method allows a single “cutoff” value to be set that distinguishes responders from non-responders to the treatment. In some embodiments, the prediction of each training subject's response includes a prediction of the objective response rate of human subjects to the treatment or therapy, and the prediction of the objective response rate includes an index or classification of the extent of complete or partial response to the treatment.

[0166] Referring to block 228, in some embodiments, the prediction of each training subject's response is a probability output for each training subject's response. As disclosed above, the method allows a single “cutoff” value to be set that distinguishes responders from non-responders to a treatment. In some embodiments, the method includes using a model to calculate a probability value for a subject, comparing the probability value to a threshold derived from a responder/non-responder cohort, and classifying the subject as a responder if the probability value is above the threshold or as a non-responder if the probability value is below the threshold. In embodiments, the threshold may be a probability value of at least 50%, 55%, 60%, 65%, 70%, 75%, or about 80% or higher. In other embodiments, the probability value is the positive predictive value, measured by the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. In certain embodiments, the probability value is calculated using a multivariate logistic regression model, a neural network model, a random forest model, or a decision tree model.
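As a non-limiting illustration, applying a single cutoff to the probability output of a model can be sketched in Python as follows. The subject identifiers, probability values, and function name are hypothetical, and treating a probability exactly equal to the threshold as a responder is an assumption of this sketch.

```python
def classify_response(probability, threshold=0.5):
    """Label a subject from the model's predicted probability of response."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    # Assumption: a probability exactly at the cutoff is treated as a responder.
    return "responder" if probability >= threshold else "non-responder"


# Hypothetical model outputs for three subjects, classified with a 60% cutoff.
predicted = {"subject_1": 0.82, "subject_2": 0.41, "subject_3": 0.60}
calls = {sid: classify_response(p, threshold=0.60) for sid, p in predicted.items()}
print(calls)
```

Raising the cutoff trades sensitivity for a higher positive predictive value, which is why the threshold would ordinarily be derived from a responder/non-responder cohort rather than fixed in advance.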

[0167] Referring to block 230, in some embodiments, the model is a neural network algorithm, a support vector machine algorithm, a naive Bayes algorithm, a nearest neighbor algorithm, a boosted tree algorithm, a random forest algorithm, a convolutional neural network algorithm, a decision tree algorithm, a regression algorithm, or a clustering algorithm.

[0168] Referring to block 232, in some embodiments, the multiple parameters are at least 1,000, at least 10,000, at least 15,000, at least 50,000, at least 100,000, at least 250,000, at least 500,000, at least 1,000,000, at least 2,500,000, at least 5,000,000, or at least 10,000,000 parameters.

[0169] Referring to block 234, in some embodiments, the model applies the multiple parameters to the information through at least 1,000 calculations, at least 5,000 calculations, at least 10,000 calculations, at least 25,000 calculations, at least 50,000 calculations, at least 100,000 calculations, at least 250,000 calculations, at least 500,000 calculations, at least 1,000,000 calculations, at least 2,500,000 calculations, at least 5,000,000 calculations, or at least 10,000,000 or more calculations, and obtains a corresponding output from the model for each training subject.

[0170] Referring to block 236, in some embodiments, the method includes adjusting the multiple parameters, for each of the multiple training subjects, based on one or more differences between (i) the corresponding output from the model and (ii) a corresponding indication of the training subject's response to the therapy.

[0171] In some embodiments in which a deep learning technique uses a neural network as described above, training the neural network to improve the accuracy of its predictions involves modifying one or more parameters, including but not limited to weights in the filters within the convolutional layers and biases within the network layers. In some embodiments, the weights and biases are further constrained by various forms of regularization, such as L1, L2, weight decay, and dropout.

[0172] For example, in some embodiments, if the training data is labeled (e.g., with an indication of the state of a biological characteristic), the parameters (e.g., weights) of the neural network, or of any of the models disclosed herein, may optionally be adjusted to minimize the error between the indication predicted by the system and the actual indication in the training data. The error function, for example a logarithmic loss, a sum of squared errors, or a hinge loss, can be minimized by various methods, such as gradient descent. In some embodiments, these methods further include second-order methods or approximations such as momentum methods, Hessian-free estimation, Nesterov's accelerated gradient method, and AdaGrad. In some embodiments, the methods also combine unlabeled generative pre-training with labeled discriminative training.
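As a non-limiting illustration, adjusting parameters by gradient descent on a logarithmic loss can be sketched in Python as follows. A logistic regression model stands in for the models disclosed herein, and the toy abundance values, labels, learning rate, and function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, weights, bias):
    """Predicted probability of response for one subject's feature vector."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def log_loss(features, labels, weights, bias, eps=1e-12):
    """Mean logarithmic loss between predicted probabilities and actual labels."""
    total = 0.0
    for x, y in zip(features, labels):
        p = min(max(predict(x, weights, bias), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def train(features, labels, learning_rate=0.5, epochs=500):
    """Batch gradient descent: move each parameter against the loss gradient."""
    weights, bias = [0.0] * len(features[0]), 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(features, labels):
            error = predict(x, weights, bias) - y  # predicted minus actual
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Hypothetical relative abundances of two guilds; label 1 marks a responder.
X = [[0.8, 0.2], [0.7, 0.3], [0.3, 0.7], [0.2, 0.8]]
y = [1, 1, 0, 0]

initial_loss = log_loss(X, y, [0.0, 0.0], 0.0)
weights, bias = train(X, y)
final_loss = log_loss(X, y, weights, bias)
print(initial_loss, final_loss)
```

The same update rule underlies backpropagation in a neural network; only the way the gradient is computed differs.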

[0173] Therefore, in some embodiments, training a neural network involves tuning one or more parameters of a plurality of parameters by backpropagation via a loss function. In some embodiments, the loss function is a loss function for a regression task and/or a classification task. Non-limiting examples of loss functions suitable for regression tasks include the mean squared error loss function, mean absolute error loss function, Huber loss function, Log-Cosh loss function, and quantile loss function. See Wang et al., 2020, “A Comprehensive Survey of Loss Functions in Machine Learning,” Annals of Data Science, doi.org/10.1007/s40745-020-00253-5, last accessed September 15, 2021, which is incorporated herein by reference in its entirety. Non-limiting examples of loss functions suitable for classification tasks include the binary cross-entropy loss function, hinge loss function, and squared hinge loss function. In some embodiments, the loss function is any suitable regression task loss function or classification task loss function.
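As a non-limiting illustration, several of the loss functions named above can be written in Python as follows. The function names and default arguments are hypothetical; the hinge loss assumes labels encoded as -1 and +1, and the binary cross-entropy assumes labels encoded as 0 and 1.

```python
import math


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def huber_loss(y_true, y_pred, delta=1.0):
    """Quadratic for small errors, linear for errors larger than delta."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        err = abs(t - p)
        total += 0.5 * err ** 2 if err <= delta else delta * (err - 0.5 * delta)
    return total / len(y_true)


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Labels are 0 or 1; predictions are probabilities clipped away from 0 and 1."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)


def hinge_loss(y_true, scores):
    """Labels are -1 or +1; scores are raw (unbounded) model outputs."""
    return sum(max(0.0, 1.0 - t * s) for t, s in zip(y_true, scores)) / len(y_true)


print(mean_squared_error([1.0, 2.0], [1.0, 4.0]))
print(huber_loss([1.0, 2.0], [1.0, 4.0]))
print(hinge_loss([1, -1], [0.5, -2.0]))
```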

[0174] Other preferred methods for training the neural networks intended for use in this disclosure are described further herein (see, for example, the untrained models described above).

[0175] In some embodiments, the parameters of the neural network are initialized randomly before training.

[0176] In some embodiments, the neural network includes dropout regularization parameters. For example, in some embodiments, regularization is performed by adding a penalty to the loss function, the penalty being proportional to the parameter values in the trained or untrained model. Generally, regularization reduces the complexity of the model by adding a penalty to one or more parameters, thereby reducing the importance of each hidden neuron associated with those parameters. Such practices can result in a more generalized model and reduce overfitting to the data. In some embodiments, regularization includes L1 or L2 penalties.
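As a non-limiting illustration, adding L1 and L2 penalties to a loss value can be sketched in Python as follows. The function name, penalty coefficients, and weight values are hypothetical.

```python
def penalized_loss(base_loss, weights, l1=0.0, l2=0.0):
    """Add penalties proportional to parameter magnitude to an unpenalized loss.

    l1 scales the sum of absolute weights; l2 scales the sum of squared weights.
    """
    l1_penalty = l1 * sum(abs(w) for w in weights)
    l2_penalty = l2 * sum(w * w for w in weights)
    return base_loss + l1_penalty + l2_penalty


# Hypothetical example: the same fit is penalized more when its weights are larger.
small = penalized_loss(1.0, [0.1, -0.2], l1=0.1, l2=0.01)
large = penalized_loss(1.0, [1.0, -2.0], l1=0.1, l2=0.01)
print(small, large)
```

Because the penalty grows with the size of the weights, an optimizer minimizing the penalized loss is pushed toward smaller parameter values.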

[0177] In some embodiments, training a neural network uses an optimizer. In some embodiments, the optimizer uses a loss function to update the parameters of the neural network or other model via backpropagation. In some embodiments, training a neural network uses a learning rate.

[0178] In some embodiments, the learning rate is at least 0.0001, at least 0.0005, at least 0.001, at least 0.005, at least 0.01, at least 0.05, at least 0.1, at least 0.2, at least 0.3, at least 0.4, at least 0.5, at least 0.6, at least 0.7, at least 0.8, at least 0.9, or at least 1. In some embodiments, the learning rate is no more than 1, no more than 0.9, no more than 0.8, no more than 0.7, no more than 0.6, no more than 0.5, no more than 0.4, no more than 0.3, no more than 0.2, no more than 0.1, no more than 0.05, no more than 0.01, or lower. In some embodiments, the learning rate is 0.0001 to 0.01, 0.001 to 0.5, 0.001 to 0.01, 0.005 to 0.8, or 0.005 to 1. In some embodiments, the learning rate falls within another range, starting no lower than 0.0001 and ending no higher than 1.

[0179] In some embodiments, the learning rate further includes learning rate decay (e.g., a decrease in the learning rate over one or more epochs). For example, the learning rate decay may be a reduction of the learning rate by 0.5 or 0.1. In some embodiments, the learning rate is a differential learning rate. In some embodiments, training the neural network further uses a scheduler that conditionally applies learning rate decay based on an evaluation of a performance metric over a threshold number of training epochs (e.g., learning rate decay is applied if the performance metric fails to meet a threshold performance value for at least a threshold number of training epochs).
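As a non-limiting illustration, a scheduler that conditionally applies learning rate decay can be sketched in Python as follows. The class name `PlateauDecay`, its arguments, and the loss values are hypothetical. The sketch assumes that decay multiplies the learning rate by a factor (here 0.5) and that the performance metric is a validation loss for which lower values are better.

```python
class PlateauDecay:
    """Decay the learning rate when the validation loss stops improving."""

    def __init__(self, learning_rate, factor=0.5, patience=2):
        self.learning_rate = learning_rate
        self.factor = factor        # multiplier applied at each decay step
        self.patience = patience    # epochs without improvement tolerated
        self.best = float("inf")
        self.stale_epochs = 0

    def step(self, validation_loss):
        """Record one epoch's validation loss and return the learning rate to use next."""
        if validation_loss < self.best:
            self.best = validation_loss
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
            if self.stale_epochs >= self.patience:
                self.learning_rate *= self.factor
                self.stale_epochs = 0
        return self.learning_rate


# Hypothetical validation losses: the loss stalls for two epochs, triggering one decay.
scheduler = PlateauDecay(learning_rate=0.01, factor=0.5, patience=2)
history = [scheduler.step(loss) for loss in [0.9, 0.8, 0.8, 0.81, 0.7]]
print(history)
```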

[0180] In some embodiments, the performance of a neural network is measured at one or more time points using performance metrics, including but not limited to training loss metrics, validation loss metrics, and/or mean absolute error. In some embodiments, the performance metrics are the area under the receiver operating characteristic curve (AUROC) and/or the area under the precision-recall curve (AUPRC).
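As a non-limiting illustration, the AUROC can be computed in Python as follows, using its equivalence to the probability that a randomly chosen responder receives a higher score than a randomly chosen non-responder. The function name, labels, and scores are hypothetical, and the pairwise computation is chosen for clarity rather than speed.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: 1 for responders, 0 for non-responders.
    scores: model outputs, where higher values indicate a more likely responder.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute the AUROC")
    # A correctly ordered pair counts 1, a tie counts 0.5, a misordered pair counts 0.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical scores: one responder is ranked below one non-responder.
print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```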

[0181] For example, in some embodiments, the performance of a neural network is measured by validating the model using a validation (e.g., expanded) dataset. In some such embodiments, the neural network is trained so that it meets the minimum performance requirements based on validation.

[0182] In some embodiments, any suitable method for validation can be used, including but not limited to K-fold cross-validation, advanced cross-validation, random cross-validation, grouped cross-validation (e.g., K-fold grouped cross-validation), bootstrap bias-corrected cross-validation, random search, and/or Bayesian hyperparameter optimization.
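As a non-limiting illustration, K-fold grouped cross-validation can be sketched in Python as follows. The function name and subject identifiers are hypothetical, and assigning groups to folds in round-robin order is a simplification of this sketch. The defining property is that all samples sharing a group (for example, the same subject or the same study) fall in the same fold, so no group contributes to both training and testing.

```python
def grouped_k_fold(groups, k):
    """Yield (train_indices, test_indices) pairs that keep each group in one fold."""
    unique_groups = sorted(set(groups))
    if k < 2 or k > len(unique_groups):
        raise ValueError("k must be between 2 and the number of distinct groups")
    # Simplification: groups are assigned to folds in round-robin order.
    fold_of_group = {g: i % k for i, g in enumerate(unique_groups)}
    for fold in range(k):
        test = [i for i, g in enumerate(groups) if fold_of_group[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of_group[g] != fold]
        yield train, test


# Hypothetical samples: subjects s1 and s3 each contributed two samples.
sample_groups = ["s1", "s1", "s2", "s3", "s3", "s4"]
folds = list(grouped_k_fold(sample_groups, k=2))
for train_idx, test_idx in folds:
    print(train_idx, test_idx)
```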

[0183] In some embodiments, a method is provided for training a model with multiple parameters by a procedure that includes (i) inputting into the model, for each of the multiple training subjects, a corresponding genomic abundance value for each of the multiple intestinal microorganisms, thereby obtaining, as an output from the model, a corresponding prediction of the training subject's response to therapy for each of the multiple training subjects, and (ii) refining the multiple model parameters based on the difference between the training subject's corresponding actual response to therapy and the training subject's corresponding predicted response to therapy.

[0184] 2. Method of applying a model to predict a subject's response to therapy for a disorder. Figure 3 is a schematic diagram of a method for applying a model to predict a subject's response to therapy for a disorder, as discussed below. Method 300 can be implemented using a computer system (e.g., computer system 100 shown and described above with reference to Figure 1).

[0185] Referring to Block 300, in some embodiments, the method includes obtaining, in electronic form, a plurality of genomic abundance values ​​for each of the plurality of intestinal microorganisms selected from Table 1, Table 2, or Figures 13A to 13XX, which include the corresponding abundance values ​​for the genome of each species of intestinal bacterium in the plurality of intestinal microorganisms in a biological sample from a subject.

[0186] In some embodiments, corresponding biological samples were collected from the intestines of each subject prior to treatment or therapy. In some embodiments, the biological samples were collected within 15 minutes, 30 minutes, 1 hour, 2 hours, 3 hours, 4 hours, 5 hours, 6 hours, 12 hours, or 24 hours prior to treatment or therapy. In some embodiments, the biological samples were collected 1 day, 2 days, 3 days, 4 days, 5 days, 6 days, 1 week, 2 weeks, 3 weeks, 4 weeks, or longer prior to treatment or therapy. In some embodiments, the biological samples were collected approximately 1 month, 2 months, 3 months, 4 months, 5 months, 6 months, or longer prior to treatment or therapy.

[0187] In some embodiments, sample data (including plasma and stool samples) and corresponding clinical information (including sex, age, body fat mass, underlying diseases, histopathological characteristics, etc.) were collected for each subject before treatment. Individual biological samples were subjected to complete microbiome analysis. In some embodiments, the samples were tissue biopsies, intestinal samples, or mucosal samples. See, for example, Tang Q, et al., Current Sampling Methods for Gut Microbiota: A Call for More Precise Devices, Front Cell Infect Microbiol., 10:151 (2020), the contents of which are incorporated herein by reference in their entirety. In some embodiments, the biological sample from the intestines of each subject was a fecal sample from each subject.

[0188] In some embodiments, the multiple intestinal microorganisms include at least 25 intestinal microorganisms selected from Table 1, Table 2, or Figures 13A to 13XX. In some embodiments, the multiple intestinal microorganisms include at least 30 intestinal microorganisms selected from Table 1, Table 2, or Figures 13A to 13XX. In some embodiments, the multiple intestinal microorganisms include at least 40 intestinal microorganisms selected from Table 1, Table 2, or Figures 13A to 13XX. In some embodiments, the multiple intestinal microorganisms include at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65, at least 70, at least 75, at least 80, at least 85, at least 90, at least 95, at least 100, at least 125, at least 150, at least 200, at least 250, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, or all intestinal microorganisms selected from Table 1, Table 2, or Figures 13A to 13XX. In some embodiments, the multiple intestinal microorganisms are at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65, at least 70, at least 75, at least 80, at least 85, at least 90, at least 95, at least 100, at least 125, at least 150, at least 200, at least 250, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, or all intestinal microorganisms selected from Table 1, Table 2, or Figures 13A to 13XX. In some embodiments, the multiple intestinal microorganisms are all the intestinal microorganisms listed in Table 1. In some embodiments, the multiple intestinal microorganisms are all the intestinal microorganisms listed in Table 2. In some embodiments, the multiple intestinal microorganisms are all the intestinal microorganisms listed in Figures 13A to 13XX.

[0189] In some embodiments, the corresponding genome abundance value represents the absolute abundance of the microbial genome. In some embodiments, the corresponding genome abundance value represents a normalized abundance value or a relative abundance value (e.g., the abundance of one microorganism normalized to the abundance of the total microbiome of interest). In some embodiments, the corresponding genome abundance value represents an averaged abundance value (e.g., the average of abundances obtained at different time points or from different biological samples from patients, or the average of abundances obtained using different probes), or a combination of any of the above. The corresponding genome abundance value is measured by any technique known in the art. In some embodiments, the genome abundance value of a genome is measured by quantitative PCR (qPCR), such as bacterial 16S rRNA qPCR, RT-PCR, or qRT-PCR, to quantify the abundance of a region of interest in the genome, for example, as described in U.S. Patent No. 11,427,865, the disclosure of which is incorporated herein by reference in its entirety. In some embodiments, genome abundance is measured by targeted sequencing (e.g., 16S rRNA sequencing, or sequencing of any other suitable biomarker), partial genome sequencing, or whole genome sequencing, thereby determining genome abundance by quantifying the number of reads of a targeted region within the microbial genome, as disclosed, for example, in U.S. Patent Application Publication 2021/0403986 or U.S. Patent No. 11,332,783 (the disclosures of which are incorporated herein by reference in their entirety). In some embodiments, deep sequencing is used to determine the abundance of a targeted sequence, as disclosed, for example, in U.S. Patent Application Publication 2018/0237863, the disclosure of which is incorporated herein by reference in its entirety. In some embodiments, the sequencing depth is at least 2X, at least 3X, at least 4X, at least 5X, at least 6X, at least 7X, at least 8X, at least 9X, at least 10X, at least 11X, at least 12X, at least 13X, at least 14X, at least 15X, at least 16X, at least 17X, at least 18X, at least 19X, at least 20X, at least 21X, at least 22X, at least 23X, at least 24X, at least 25X, at least 26X, at least 27X, at least 28X, at least 29X, at least 30X, at least 31X, at least 32X, at least 33X, at least 34X, at least 35X, at least 36X, at least 37X, at least 38X, at least 39X, at least 40X, at least 41X, at least 42X, at least 43X, at least 44X, at least 45X, at least 46X, at least 47X, at least 48X, at least 49X, at least 50X, at least 51X, at least 52X, at least 53X, at least 54X, at least 55X, at least 56X, at least 57X, at least 58X, at least 59X, at least 60X, at least 70X, at least 80X, at least 90X, at least 100X, at least 110X, at least 120X, at least 130X, at least 150X, at least 200X, at least 300X, at least 400X, at least 500X, at least 750X, at least 1000X, or more. In some embodiments, shotgun metagenomic sequencing is used to provide sequence reads of the genome in a sample, for example, as described in U.S. Patent No. 11,028,449 (the contents of which are incorporated herein by reference in their entirety).
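
By way of illustration only, the relative and averaged abundance values described in this paragraph can be sketched as follows. The genome identifiers, the function names, and the convention that a genome missing from a sample counts as zero are assumptions made for this example.

```python
from typing import Dict, List


def relative_abundance(absolute: Dict[str, float]) -> Dict[str, float]:
    """Normalize each genome's abundance to the total abundance in the sample."""
    total = sum(absolute.values())
    if total <= 0:
        raise ValueError("total abundance must be positive")
    return {genome: value / total for genome, value in absolute.items()}


def averaged_abundance(samples: List[Dict[str, float]]) -> Dict[str, float]:
    """Average abundances across samples (e.g. time points or replicate samples).

    A genome absent from a sample is treated as having abundance 0 there;
    this convention is an assumption of the sketch.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    genomes = set()
    for sample in samples:
        genomes.update(sample)
    return {
        genome: sum(sample.get(genome, 0.0) for sample in samples) / len(samples)
        for genome in genomes
    }
```

For example, `relative_abundance({"MAG_1": 30, "MAG_2": 70})` assigns the two hypothetical genomes fractions that sum to one.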

[0190] In some embodiments of the methods described herein, if the identified genome construct has at least 97% sequence identity compared to the microbial contig provided in the sequence listing, as shown in Figure 12, the genome identified by metagenomic analysis is classified as corresponding to the microorganisms listed in Table 1, Table 2, and / or Figures 13A to 13XX. In some embodiments, if the identified genome construct has at least 98% sequence identity compared to the microbial contig provided in the sequence listing, as shown in Figure 12, the genome identified by metagenomic analysis is classified as corresponding to the microorganisms listed in Table 1, Table 2, and / or Figures 13A to 13XX. In some embodiments, if the identified genome construct has at least 99% sequence identity compared to the microbial contig provided in the sequence listing, as shown in Figure 12, the genome identified by metagenomic analysis is classified as corresponding to the microorganisms listed in Table 1, Table 2, and / or Figures 13A to 13XX. In some embodiments, if the identified genome construct has at least 99.5% sequence identity compared to the microbial contig provided in the sequence listing, as shown in Figure 12, the genome identified by metagenomic analysis is classified as corresponding to the microorganisms listed in Table 1, Table 2, and / or Figures 13A to 13XX. In some embodiments, if the identified genome construct has at least 97%, at least 97.5%, at least 98%, at least 98.5%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, at least 99.9%, or higher sequence identity compared to the microbial contig provided in the sequence listing, as shown in Figure 12, the genome identified by metagenomic analysis is classified as corresponding to the microorganisms listed in Table 1, Table 2, and / or Figures 13A to 13XX.

[0191] Referring to Block 302, in some embodiments, the method includes sequencing genomic DNA from a biological sample from the intestine of a subject to obtain a plurality of nucleic acid sequences. In some embodiments, the plurality of nucleic acid sequences include at least 10,000, at least 100,000, at least 250,000, at least 500,000, at least 1,000,000, at least 2,500,000, at least 5,000,000, at least 10,000,000, or at least 50,000,000 nucleic acid sequences. In some embodiments, the plurality of nucleic acid sequences include no more than 250,000,000, no more than 100,000,000, no more than 50,000,000, no more than 25,000,000, no more than 10,000,000, no more than 5,000,000, no more than 1,000,000, or no more than 100,000 nucleic acid sequences. In some embodiments, the plurality of nucleic acid sequences consist of 100,000 to 1,000,000, 200,000 to 5,000,000, 500,000 to 10,000,000, 1,000,000 to 20,000,000, 5,000,000 to 50,000,000, 10,000,000 to 100,000,000, or 50,000,000 to 250,000,000 nucleic acid sequences. In some embodiments, the plurality of nucleic acid sequences fall into another range, starting with 1,000 or more nucleic acid sequences and ending with 250,000,000 or fewer nucleic acid sequences.

[0192] In some embodiments, a plurality of corresponding nucleic acid sequences are obtained by metagenomic sequencing, for example, as disclosed in U.S. Patent Application Publication 2016 / 0239602 or U.S. Patent No. 11,495,326, the contents of which are incorporated herein by reference in their entirety. In some embodiments, metagenomic sequencing further includes generating a plurality of metagenomic fragment reads. In some embodiments, metagenomic sequencing further includes fragmenting a microbial genome into random fragments of a target size. The resulting fragments may vary in size. In one embodiment, fragments of approximately 500 nucleotides can be obtained. In some embodiments, fragments of 100 to 2000 nucleotides can be obtained, for example, 200 to 800, 100 to 900, 100 to 1000, 300 to 800, and 400 to 900 nucleotides. In some embodiments, the method may further include extracting the metagenomic fragments from a corresponding biological sample. In some embodiments, metagenomic sequencing further includes sequencing the fragments using a high-throughput sequencing method to generate a plurality of sequencing reads.

[0193] In some embodiments, the corresponding multiple nucleic acid sequences are obtained by targeted panel sequencing. An example of targeted panel sequencing is described in U.S. Patent Application Publication 2019 / 0316209. In some embodiments, targeted panel sequencing includes hybridizing genomic DNA isolated from a biological sample from the gut of interest with a panel of probes, which includes one or more probes that hybridize to a unique sequence within the genome of each microorganism being quantified, before sequencing the recovered nucleic acids. In some embodiments, the microorganisms include a plurality of microorganisms listed in Table 1, Table 2, and / or Figures 13A–13XX. In some embodiments, combinations of quasi-unique sequences (e.g., sequences found in a small number of microbial genomes) can be used to deconvolve genomic abundance values ​​using an algorithm, e.g., a system of equations. In some embodiments, the panel of probes includes at least one probe that hybridizes to a sequence unique to each microbial genome being detected. In some embodiments, the probe panel includes at least two, at least three, at least four, at least five, at least ten, at least 25, at least 50, or more probes that hybridize to different sequences unique to each microbial genome being detected. In some embodiments, the probe panel includes at least 20, at least 30, at least 40, at least 50, at least 75, at least 100, at least 125, at least 150, at least 200, at least 150, at least 300, at least 400, at least 500, at least 750, at least 1000, at least 1250, at least 1500, at least 2000, at least 2500, at least 3000, at least 4000, at least 5000, at least 7500, or at least 10,000 or more unique probes.
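
By way of illustration only, the deconvolution of genomic abundance values from quasi-unique sequences by a system of equations can be sketched as follows. The sketch assumes that each probe signal is the sum of the abundances of the genomes it hybridizes to and that the number of probes equals the number of genomes, so that a square linear system results; both the signal model and the function name are assumptions of this example rather than details taken from the disclosure.

```python
from typing import List


def solve_linear_system(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting.

    A[p][g] is 1.0 if hypothetical probe p hybridizes to genome g, else 0.0;
    b[p] is the signal measured for probe p; x[g] is the abundance of genome g.
    """
    n = len(A)
    # Work on an augmented copy so the inputs are not modified.
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("probe design does not determine the abundances uniquely")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


# Three genomes, three probes, each probe shared by two genomes.
probe_matrix = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0]]
signals = [8.0, 5.0, 7.0]
abundances = solve_linear_system(probe_matrix, signals)
```

With more probes than genomes, a least-squares solution would be the natural replacement for the exact solve.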

[0194] In some embodiments, the genomic DNA to be sequenced from the corresponding biological sample contains at its ends a partial or complete sequencing platform adapter sequence useful for sequencing using the sequencing platform of interest. The sequencing platform of interest may include, but is not limited to, Illumina®'s HiSeq®, MiSeq®, and Genome Analyzer® sequencing systems, Ion Torrent®'s Ion PGM® and Ion Proton® sequencing systems, Pacific Biosciences' PACBIO RS II Sequel system, Life Technologies®' SOLiD sequencing system, Roche's 454 GS FLX+ and GS Junior sequencing systems, Oxford Nanopore's MinION® system, or any other sequencing platform of interest.

[0195] Referring to Block 304, in some embodiments, the method includes obtaining multiple nucleic acid sequences for genomic DNA from a biological sample from the intestine of a subject in electronic form.

[0196] Referring to Block 306, in some embodiments, the method includes determining, for each individual intestinal microorganism in a plurality of intestinal microorganisms, a corresponding genome abundance value for each intestinal microorganism from the plurality of nucleic acid sequences. In some embodiments, the genome abundance values determined for a subject include at least 20, at least 25, at least 50, at least 75, at least 100, at least 125, at least 150, at least 175, at least 200, at least 250, at least 300, at least 350, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900, at least 1000, at least 1500, at least 2000, at least 2500, at least 5000, or at least 10,000 genome abundance values, each genome abundance value corresponding to a different intestinal microorganism. In some embodiments, the genome abundance values include no more than 250,000, no more than 100,000, no more than 50,000, no more than 25,000, no more than 10,000, no more than 5,000, no more than 2,500, no more than 1,000, no more than 750, no more than 500, or fewer genome abundance values. In some embodiments, the genome abundance values consist of 10 to 40, 20 to 50, 30 to 80, 40 to 100, 50 to 150, 60 to 200, 80 to 300, 90 to 500, 100 to 1,000, 500 to 2,000, or 1,000 to 5,000 genome abundance values. In some embodiments, the number of genome abundance values falls within another range, starting with 10 or more genome abundance values and ending with 250,000 or fewer genome abundance values.

[0197] Referring to Block 308, in some embodiments, the method includes assembling a plurality of corresponding gut microbiota genomes in electronic form by metagenomic de novo sequence assembly from a plurality of nucleic acid sequences, and for each of the plurality of gut microbiota, calculating a corresponding value to the abundance of the genome of each gut microbiota based on the prevalence of each nucleic acid sequence in the plurality of nucleic acid sequences used to assemble each gut microbiota genome in the plurality of gut microbiota genomes corresponding to each gut microbiota. In some embodiments, the metagenomic de novo sequence assembly further includes generating contigs based on sequence reads generated by a shotgun sequencing technique. Such a technique is described, for example, in U.S. Patent No. 10,529,443, the contents of which are incorporated herein by reference in their entirety. In some embodiments, the plurality of nucleic acid sequences can be assembled into whole genomes of the plurality of gut microbiota. In some embodiments, the plurality of nucleic acid sequences can be assembled into partial genomes of the plurality of gut microbiota.

[0198] Referring to block 310, in some embodiments, the method includes assigning each nucleic acid sequence in a plurality of nucleic acid sequences to each of a plurality of intestinal microorganisms, thereby generating a corresponding count of each nucleic acid sequence in the plurality of nucleic acid sequences assigned to each intestinal microorganism for each of the plurality of intestinal microorganisms, and determining a corresponding genomic abundance value for each intestinal microorganism based on the corresponding count of each nucleic acid sequence assigned to each intestinal microorganism for each of the plurality of intestinal microorganisms. In some embodiments, assigning each nucleic acid to each intestinal microorganism includes mapping the nucleic acid to a reference nucleic acid. In some embodiments, assigning each nucleic acid to each intestinal microorganism includes annotating genomic information based on an existing database. In some embodiments, the nucleic acid sequences are analyzed, and the annotation defines a taxonomic assignment using sequence similarity methods and phylogenetic assignment methods, or a combination of the two strategies.
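
By way of illustration only, the assignment-and-count step can be sketched as follows. The sketch assumes that an upstream mapper has already assigned each read to at most one genome, and it divides counts by genome length before normalizing so that larger genomes are not over-represented; the length correction, the genome identifiers, and the function names are assumptions of this example and are not stated in the disclosure.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def count_assigned_reads(
    assignments: Iterable[Tuple[str, Optional[str]]]
) -> Counter:
    """Count reads per genome from (read_id, genome_or_None) assignments."""
    counts: Counter = Counter()
    for _read_id, genome in assignments:
        if genome is not None:  # unassigned reads are skipped
            counts[genome] += 1
    return counts


def abundance_from_counts(
    counts: Dict[str, int], genome_lengths: Dict[str, int]
) -> Dict[str, float]:
    """Convert read counts to abundance values that sum to 1.

    Counts are divided by genome length (an assumed correction) and then
    normalized across all genomes with at least one assigned read.
    """
    per_base = {g: counts[g] / genome_lengths[g] for g in counts}
    total = sum(per_base.values())
    if total == 0:
        return {}
    return {g: v / total for g, v in per_base.items()}
```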

[0199] Sequence similarity-based methods for assigning each nucleic acid sequence to each intestinal microorganism include, but are not limited to, various embodiments of algorithms familiar to those skilled in the art, such as BLAST, BLASTx, tBLASTn, tBLASTx, RDP classifier, DNAclust, and QIIME or mothur. These methods rely on mapping sequence reads to a reference database and selecting the match with the best score and e-value. In some embodiments, phylogenetic methods are used in combination with sequence similarity methods to improve the accuracy of annotation or taxonomic assignment calls. Common databases include, but are not limited to, GTDB-Tk, National Center for Biotechnology Information (NCBI) GenBank, European Bioinformatics Institute-European Nucleotide Archive (EBI-ENA), National Institute of Genetics, U.S. Department of Energy (USDOE) Integrated Microbial Genomes & Microbiomes (IMG / M), and other databases available in the relevant field.

[0200] Referring to block 312, in some embodiments, the multiple intestinal microorganisms include at least 20 microorganisms selected from those listed in Table 1, Table 2, or Figures 13A to 13XX, having at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, or more binding affinities. In some embodiments, the multiple intestinal microorganisms include at least 20 microorganisms selected from those listed in Table 1, Table 2, or Figures 13A to 13XX, having at least 2 binding affinities. In some embodiments, the multiple intestinal microorganisms include at least 20 microorganisms selected from those listed in Table 1, Table 2, or Figures 13A to 13XX, having at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, or more binding affinities.

[0201] Referring to Block 314, in some embodiments, the biological sample from the intestine in question is a fecal sample. In some embodiments, the sample is a tissue biopsy, intestinal, or mucosal sample. In some embodiments, the biological sample is a sample obtained from the small or large intestine, preferably the colon or rectum, and more preferably in the form of a fecal sample or rectal swab, or in the form of a biopsy specimen of gastrointestinal mucosa.

[0202] Referring to Block 316, in some embodiments, the therapy is a biological therapy, immunotherapy, chemotherapy, radiotherapy, gene therapy, hormone therapy, photodynamic therapy, targeted therapy, small molecules, antibodies, polynucleotides, natural compounds, immunomodulators, bone marrow therapy, stem cell therapy, surgical therapy, induction therapy, maintenance therapy, or a combination thereof.

[0203] Referring to Block 318, in some embodiments, the disorder is selected from the group consisting of type 2 diabetes, hypertension, schizophrenia, atherosclerotic cardiovascular disease (ACVD), liver cirrhosis (LC), inflammatory bowel disease (IBD), colorectal cancer (CRC), ankylosing spondylitis (AS), Parkinson's disease (PD), rheumatoid arthritis (RA), progressive melanoma, and B-cell lymphoma. In some embodiments, the disorder is, for example, hypertension (HT), schizophrenia (SCZ), multiple sclerosis (MS), Gaucher disease type 2 (GDII), COVID-19 (COV), Behçet's disease (BD), autism spectrum disorder (ASD), or pancreatic cancer (PC). In some embodiments, the disorder is cancer, Alzheimer's disease, cardiovascular disease, autoimmune disease, mental health disorder, infection, or genetic disorder.

[0204] In some embodiments, disorders are classified by any indicator of a biological state, function, structure, process, response, or condition in a patient. Such indicators include any of a number of variables (parameters) commonly measured in medicine to assess a patient for purposes such as diagnosis, prognosis, and / or treatment. Typically, the indicators of interest herein are those whose values ​​(which may be quantitative or qualitative) reflect, characterize, or relate to the function or structure of organs and organ systems, and / or whose values ​​reflect, characterize, or relate to the presence or severity of a condition. In some embodiments, diseases are classified by the progression or prognosis of a disease or disorder, for example, by different stages, types, frequencies, or severity of cancer, which can be objectively measured or experienced by the subject. In certain embodiments, disorders may be acquired by medical devices, which may be used to analyze the state of physical substrates such as test strips, depth gauges, filters, or other substrates, providing means for detecting or measuring the state of a body part, wound, or lesion, or the presence of a substance in a biological sample obtained from the subject. In some embodiments, the aim is to detect pathogens (e.g., viruses, bacteria, fungi), abnormal tissues (e.g., tumor sites), or antibodies against biomarkers in biological samples, and / or to detect the presence of such substances in biological samples from patients for purposes such as diagnosing the presence of a disorder or disease.

[0205] Referring to block 320, in some embodiments, the disorder is cancer.

[0206] Referring to block 322, in some embodiments, the method includes inputting multiple genomic abundance values ​​into a model that includes multiple parameters. The model applies the multiple parameters to the multiple genomic abundance values ​​by, for example, at least 10,000 calculations to generate predictions of the subject's response to therapy as an output from the model.

[0207] In some embodiments, the model is trained on datasets collected across multiple therapies for a disorder, and the model is trained to distinguish between responsive and non-responsive states. In some embodiments, the model includes a learning statistical classifier system. In some embodiments, the learning statistical classifier system is a random forest, a classification and regression tree, a boosted tree, or a neural network. For example, as described in Example 3, a random forest classifier was trained on datasets from 11 different studies that collectively examine the microbiome in four different disorders. As shown in Figure 8C, the resulting model was able to predict responders or non-responders to anti-cytokine or anti-integrin therapy, methotrexate treatment in newly diagnosed rheumatoid arthritis, immune checkpoint inhibitor (ICI) treatment for progressive melanoma, and CD19-CAR-T immunotherapy for B-cell lymphoma.

[0208] In some embodiments, the response of the subjects was characterized by a clinical outcome scale, including, but not limited to, complete remission, partial remission, non-remission, survival, incidence of adverse events, or any combination thereof. In some embodiments, a responder had complete remission in response to treatment, and a non-responder had non-remission or partial remission in response to treatment. In some embodiments, patients were subjected to routine clinical examinations, laboratory analyses, and computed tomography. Tumor responses were assessed using RECIST criteria. In some embodiments, complete response was defined as complete radiographic disappearance of measurable or evaluable disease, or stable minimal radiographic findings; partial response was defined as a reduction of at least 50% in the longest dimension of measurable disease; stable disease was defined as a reduction of less than 25% in the longest dimension; and progressive disease was defined as tumor growth of more than 25% in the longest dimension, or development of new lesions. In some embodiments, the overall response rate was defined as the sum of the complete response rate and the partial response rate, and the tumor control rate was defined as the sum of the overall response rate and the stable disease rate.
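
By way of illustration only, the overall response rate and tumor control rate defined in this paragraph can be computed from per-patient best responses as in the following minimal sketch. The outcome codes ("CR", "PR", "SD", "PD") and the function name are assumptions made for this example.

```python
from collections import Counter
from typing import Dict, Sequence

VALID_OUTCOMES = {"CR", "PR", "SD", "PD"}  # hypothetical outcome codes


def response_rates(outcomes: Sequence[str]) -> Dict[str, float]:
    """Overall response rate and tumor control rate for a cohort.

    overall response rate = complete response rate + partial response rate
    tumor control rate    = overall response rate + stable disease rate
    """
    if not outcomes:
        raise ValueError("at least one patient outcome is required")
    unknown = set(outcomes) - VALID_OUTCOMES
    if unknown:
        raise ValueError(f"unknown outcome codes: {sorted(unknown)}")
    counts = Counter(outcomes)
    n = len(outcomes)
    responders = counts["CR"] + counts["PR"]
    return {
        "overall_response_rate": responders / n,
        "tumor_control_rate": (responders + counts["SD"]) / n,
    }
```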

[0209] In some embodiments, the measure of the subject's response is characterized by the actual therapeutic efficacy of the therapy, including progression-free survival (PFS), progression-free survival under treatment, overall survival (OS), response to treatment (RT), overall response rate (ORR), durable clinical benefit (DCB), disease activity score, or any combination thereof, or any other method known in the art for assessing the progression or prognosis of a disease or disorder.

[0210] In some embodiments, “progression-free survival” (PFS) has the meaning understood in the art: the length of time during and after treatment for a disease, such as cancer, in which a patient lives with the disease but the disease does not worsen. In some embodiments, measuring progression-free survival is used as an assessment of how well a new treatment is working. In some embodiments, PFS is determined in a randomized clinical trial, and in some such embodiments, PFS refers to the time from randomization to objective tumor progression and / or death.

[0211] In some embodiments, ORR may be defined as the proportion of patients whose partial (PR) or complete (CR) response is identified as the best overall response (BOR) according to several metrics, such as the Response Evaluation Criteria in Solid Tumors (RECIST 1.1). Stable disease (SD) is classified as non-response along with progressive disease (PD). In some embodiments, ORR has the meaning understood in the art, referring to the proportion of patients who have a predefined amount of tumor size reduction over a minimum period of time. In some embodiments, it is typically the duration of response, measured from the time of the initial response to recorded tumor progression. In some embodiments, ORR includes the sum of partial and complete responses.

[0212] In some embodiments, “clinical effect” refers to a clinical benefit. In some embodiments, such a clinical benefit is or includes a reduction in tumor size, an increase in progression-free survival, an increase in overall survival, a reduction in total tumor burden, a reduction in symptoms caused by tumor growth, such as pain, organ failure, bleeding, skeletal damage, and other associated sequelae of metastatic cancer, as well as combinations thereof. In some embodiments, the clinical effect is a “durable clinical benefit” (DCB) that is maintained for a relevant period. In some embodiments, the relevant period is at least one month, two months, three months, four months, five months, six months, seven months, eight months, nine months, ten months, eleven months, one year, two years, three years, four years, five years, or longer.

[0213] In some embodiments, the subject's response is measured by a Disease Activity Score (DAS) (see, e.g., Van der Heijde DM et al., J Rheumatol, 1993, 20(3):579-81; Prevoo ML et al., Arthritis Rheum, 1995, 38:44-8). The DAS system represents both the current state and changes in disease activity. The DAS scoring system uses a weighted formula derived from clinical trials in RA. For example, DAS28 = 0.56·√(T28) + 0.28·√(SW28) + 0.70·ln(ESR) + 0.014·GH, where T28 represents the number of tender joints, SW28 represents the number of swollen joints, ESR represents the erythrocyte sedimentation rate, and GH represents global health. Various DAS values represent high or low disease activity and remission, while change and endpoint scores result in a patient classification based on the degree of response (none, moderate, good).
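
By way of illustration only, the DAS28 calculation can be sketched as follows. The sketch uses the commonly published DAS28-ESR form, in which the tender and swollen joint counts enter through their square roots and the ESR through its natural logarithm; the input ranges checked below (joint counts from 0 to 28, global health on a 0 to 100 scale) and the function name are assumptions of this example.

```python
import math


def das28_esr(tender28: int, swollen28: int, esr_mm_per_hr: float, global_health: float) -> float:
    """DAS28 from the 28-joint counts, ESR, and patient global health.

    DAS28 = 0.56*sqrt(T28) + 0.28*sqrt(SW28) + 0.70*ln(ESR) + 0.014*GH
    """
    if not 0 <= tender28 <= 28 or not 0 <= swollen28 <= 28:
        raise ValueError("joint counts must be between 0 and 28")
    if esr_mm_per_hr <= 0:
        raise ValueError("ESR must be positive for the logarithm to be defined")
    if not 0 <= global_health <= 100:
        raise ValueError("global health is assumed to be on a 0-100 scale")
    return (
        0.56 * math.sqrt(tender28)
        + 0.28 * math.sqrt(swollen28)
        + 0.70 * math.log(esr_mm_per_hr)
        + 0.014 * global_health
    )
```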

[0214] In some embodiments, the indicator of the subject's response is measured by the level of the immune response or immune parameters resulting from immunotherapy in patients with cancer. In some embodiments, the immune response or immune parameters are characterized by the expression levels of various biological markers of the host immune response, in conjunction with the development of cancer at a given stage of cancer development (i.e., treatment efficacy). In some embodiments, the expression level of each biological marker is compared to a predetermined reference value for the same biological marker. The predetermined reference value is one already known to be appropriate for distinguishing between low and high levels of the immune response in patients with cancer for that biological marker. The predetermined reference value for the biological marker correlates with responders to treatment in cancer patients, or conversely, with non-responders to treatment in cancer patients.

[0215] In some embodiments, changes in combinations of biological markers are quantified. In some embodiments, combinations of 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, or more distinct biological markers are quantified.

[0216] In certain embodiments, biological markers are quantified by immunohistochemical techniques. Exemplary biological markers include 18s, ACE, ACTB, AGTR1, AGTR2, APC, APOA1, ARF1, AXIN1, BAX, BCL2, BCL2L1, CXCR5, BMP2, BRCA1, BTLA, C3, CASP3, CASP9, CCL1, CCL11, CCL13, CCL16, CCL17, CCL18, CCL19, CCL2, CCL20, CCL21, CCL22, CCL23, CCL24, CCL25, CCL26, CCL27, CCL28, CCL3, CCL5, CCL7, CCL8, CCNB1, CCND1, CCNE1, CCR1, CCR10, CCR2, CCR3, CCR4, CCR5, CCR6, CCR7, CCR8, CCR9, CCRL2, CD154, CD19, CD1a, CD2, CD226, CD244, PDCD1LG1, CD28, CD34, CD36, CD38, CD3E, CD3G, CD3Z, CD4, CD40LG, CD5, CD54, CD6, CD68, CD69, CLIP, CD80, CD83, SLAMF5, CD86, CD8A, CDH1, CDH7, CDK2, CDK4, CDKN1A, CDKN1B, CDKN2A, CDKN2B, CEACAM1, COL4A5, CREBBP, CRLF2, CSF1, CSF2, CSF3, CTLA4, CTNNB1, CTSC, CX3CL1, CX3CR1, CXCL1, CXCL10, CXCL11, CXCL12, CXCL13, CXCL14, CXCL16, CXCL2, CXCL3, CXCL5, CXCL6, CXCL9, CXCR3, CXCR4, CXCR6, CYP1A2, CYP7A1, DCC, DCN, DEFA6, DICER1, DKK1, Dok-1, Dok-2, DOK6, DVL1, E2F4, EBI3, ECE1, ECGF1, EDN1, EGF, EGFR, EIF4E, CD105, ENPEP, ERBB2, EREG, FCGR3A, FCGR3B, FN1, FOXP3, FYN, FZD1, GAPD, GLI2, GNLY, GOLPH4, GRB2, GSK3B, GSTP1, GUSB, GZMA, GZMH, GZMK, HLA-B, HLA-C, HLA-DMA, HLA-DMB, HLA-DOA, HLA-DOB, HLA-DPA1, HLA-DQA2, HLA-DRA, HLX1, HMOX1, HRAS, HSPB3, HUWE1, ICAM1, ICAM-2, ICOS, ID1, ifna1, ifna17, ifna2, ifna5, ifna6, ifna8, IFNAR1, IFNAR2, IFNG, IFNGR1, IFNGR2, IGF1, IHH, IKBKB, IL10, IL12A, IL12B, IL12RB1, IL12RB2, IL13, IL13RA2, IL15, IL15RA, IL17, IL17R, IL17RB, IL18, IL1A, IL1B, IL1R1, IL2, IL21, IL21R, IL23A, IL23R, IL24, IL27, IL2RA, IL2RB, IL2RG, IL3, IL31RA, IL4, IL4RA, IL5, IL6, IL7, IL7RA, IL8, CXCR1, CXCR2, IL9, IL9R, IRF1, ISGF3G, ITGA4, ITGA7, integrin, alpha E (antigen CD103, human mucosal lymphocyte antigen 1; alpha polypeptide), gene hCG33203, ITGB3, JAK2, JAK3, KLRB1, KLRC4, KLRF1, KLRG1, KRAS, LAG3, LAIR2, LEF1, LGALS9, LILRB3, LRP2, LTA, SLAMF3, MADCAM1, MADH3, MADH7, MAF, MAP2K1, MDM2, MICA, MICB, MKI67, MMP12, MMP9, MTA1, MTSS1, MYC, MYD88, MYH6, NCAM1, NFATC1, NKG7, NLK, NOS2A, P2X7, PDCD1, PECAM-1, CXCL4, PGK1, PIAS1, PIAS2, PIAS3, PIAS4, PLAT, PML, PP1A, CXCL7, PPP2CA, PRF1, PROM1, PSMB5, PTCH, PTGS2, PTP4A3, PTPN6, PTPRC, RAB23, RAC / RHO, RAC2, RAF, RB1, RBL1, REN, Drosha, SELE, SELL, SELP, SERPINE1, SFRP1, SIRP Beta 1, SKI, SLAMF1, SLAMF6, SLAMF7, SLAMF8, SMAD2, SMAD4, SMO, SMOH, SMURF1, SOCS1, SOCS2, SOCS3, SOCS4, SOCS5, SOCS6, SOCS7, SOD1, SOD2, SOD3, SOS1, SOX17, CD43, ST14, STAM, STAT1, STAT2, STAT3, STAT4, STAT5A, STAT5B, STAT6, STK36, TAP1, TAP2, TBX21, TCF7, TERT, TFRC, TGFA, TGFB1, TGFBR1, TGFBR2, TIMP3, TLR1, TLR10, TLR2, TLR3, TLR4, TLR5, TLR6, TLR7, TLR8, TLR9, TNF, TNFRSF10A, TNFRSF11A, TNFRSF18, TNFRSF1A, TNFRSF1B, OX-40, TNFRSF5, TNFRSF6, TNFRSF7, TNFRSF8, TNFRSF9, TNFSF10, TNFSF6, TOB1, TP53, TSLP, VCAM1, VEGF, WIF1, WNT1, WNT4, XCL1, XCR1, ZAP70, and ZIC2.

[0217] Referring to block 324, in some embodiments, the response prediction for a subject is a class output of each response among multiple possible responses for each subject. The method allows setting a single “cutoff” value that enables the distinction between responders and non-responders to the treatment. In some embodiments, the response prediction for each subject includes a prediction of the objective response rate for human subjects to the treatment or therapy, and the objective response rate prediction includes an index or classification of the amount of full or partial response to the treatment.

[0218] Referring to block 326, in some embodiments, the prediction of a subject's response is a probability output for each subject's response. As disclosed above, the method allows setting a single “cutoff” value that enables the distinction between responders and non-responders to a treatment. In some embodiments, the method includes using a model to calculate a probability value for a subject, comparing the probability value to a threshold derived from a responder / non-responder cohort, and classifying the subject as a responder or a non-responder according to whether the probability value is above or below the threshold. In embodiments, the threshold may be a probability value of at least 50%, 55%, 60%, 65%, 70%, 75%, or about 80% or higher. In other embodiments, the probability value is the positive predictive value measured by the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. In certain embodiments, the probability value is calculated using a multivariate logistic regression model, a neural network model, a random forest model, or a decision tree model.
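
The cutoff comparison described in this paragraph can be illustrated with a minimal sketch. The function name `classify_response`, the default cutoff of 0.5, the subject identifiers, and the convention that a probability at or above the cutoff indicates a predicted responder are all assumptions made for illustration; the disclosure does not fix any of them.

```python
from typing import Dict


def classify_response(probability: float, cutoff: float = 0.5) -> str:
    """Label a subject from a model's predicted probability of response.

    Assumption: a probability at or above the cutoff is treated as a
    predicted responder. In practice the cutoff would be derived from a
    responder / non-responder cohort.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return "responder" if probability >= cutoff else "non-responder"


# Hypothetical model outputs for three subjects.
predicted = {"S1": 0.82, "S2": 0.41, "S3": 0.65}

# Apply one illustrative cutoff (65%) to every subject.
labels: Dict[str, str] = {
    subject: classify_response(p, cutoff=0.65) for subject, p in predicted.items()
}
```

With the hypothetical values above, subjects S1 and S3 fall at or above the 65% cutoff and S2 falls below it.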

[0219] Referring to block 328, in some embodiments, the model is a neural network algorithm, a support vector machine algorithm, a naive Bayes algorithm, a nearest neighbor algorithm, a boosted tree algorithm, a random forest algorithm, a convolutional neural network algorithm, a decision tree algorithm, a regression algorithm, or a clustering algorithm.

[0220] Referring to block 330, in some embodiments, the multiple parameters are at least 1,000, at least 10,000, at least 15,000, at least 50,000, at least 100,000, at least 250,000, at least 500,000, at least 1,000,000, at least 2,500,000, at least 5,000,000, or at least 10,000,000 or more parameters.

[0221] Referring to block 332, in some embodiments, the model applies multiple parameters to the information through at least 1,000 calculations, at least 5,000 calculations, at least 10,000 calculations, at least 25,000 calculations, at least 50,000 calculations, at least 100,000 calculations, at least 250,000 calculations, at least 500,000 calculations, at least 1,000,000 calculations, at least 2,500,000 calculations, at least 5,000,000 calculations, or at least 10,000,000 or more calculations, and obtains corresponding outputs for each subject from the model.

[0222] Referring to block 334, in some embodiments, the method further includes i) administering a therapy to a subject if the prediction of the subject's response to the therapy satisfies the threshold likelihood that the subject will respond well to the therapy, and ii) treating the subject by administering one or more of a plurality of intestinal microorganisms to the subject if the prediction of the subject's response to the therapy does not satisfy the threshold likelihood that the subject will respond well to the therapy.

[0223] In some embodiments, administration includes identifying one or more intestinal microorganisms that are underrepresented in a subject, for example, based on corresponding genomic abundance values for the microorganisms, and administering the one or more identified intestinal microorganisms to the subject. In some embodiments, identification includes determining whether the abundance of an intestinal microorganism (for example, as given by its corresponding genomic abundance value) meets a corresponding threshold amount. If the abundance of a microorganism does not meet the corresponding threshold amount, the microorganism is identified for administration. In some embodiments, the corresponding threshold amount is a relative abundance. In some embodiments, the corresponding threshold amount is an amount relative to the abundance of one or more different intestinal microorganisms in the subject. In some embodiments, the corresponding threshold amount is an amount relative to the total abundance of multiple intestinal microorganisms in the subject.
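
A minimal sketch of the identification step follows, using the variant in which the threshold is an amount relative to the total abundance of the measured microorganisms. The function name `underrepresented`, the microorganism identifiers, and all numeric values are hypothetical and serve only to illustrate the comparison.

```python
from typing import Dict, List


def underrepresented(abundance: Dict[str, float],
                     thresholds: Dict[str, float]) -> List[str]:
    """Return IDs whose relative abundance falls below their threshold.

    Relative abundance is each microorganism's genomic abundance value
    divided by the total abundance of all measured microorganisms.
    A microorganism absent from the sample counts as zero abundance.
    """
    total = sum(abundance.values())
    if total <= 0:
        raise ValueError("total abundance must be positive")
    flagged = []
    for microbe_id, minimum in thresholds.items():
        relative = abundance.get(microbe_id, 0.0) / total
        if relative < minimum:
            flagged.append(microbe_id)
    return flagged


# Hypothetical genomic abundance values and per-microorganism thresholds.
abundance = {"A": 50.0, "B": 2.0, "C": 48.0}
thresholds = {"A": 0.10, "B": 0.05, "D": 0.01}
candidates = underrepresented(abundance, thresholds)
```

Here microorganism B (2% of the total against a 5% threshold) and microorganism D (not detected) would be identified for administration, while A would not.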

[0224] In some embodiments, administration involves administering a predefined set of microorganisms. In some embodiments, the predefined set of microorganisms includes at least five enteric microorganisms selected from Table 1, Table 2, or Figures 13A to 13XX. In some embodiments, the predefined set of microorganisms includes at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 175, 180, 190, 200, 225, 250, 275, 300, 350, 400, 450, 500, 600, 700 or more intestinal microorganisms, selected from Table 1, Table 2, or Figures 13A to 13XX.

[0225] In some embodiments, the predefined set of microorganisms includes only the intestinal microorganisms selected from Table 1, Table 2, or Figures 13A to 13XX assigned to Guild 1. That is, the predefined set of microorganisms does not include the microorganisms selected from Table 1, Table 2, or Figures 13A to 13XX assigned to Guild 2. In some embodiments, the predefined set of microorganisms includes at least five intestinal microorganisms selected from Table 1, Table 2, or Figures 13A to 13XX assigned to Guild 1. In some embodiments, the predefined set of microorganisms includes at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 175, 180, 190, 200, 225, 250, 275, 300, 350, 400 or more intestinal microorganisms, selected from Table 1, Table 2, or Figures 13A to 13XX assigned to Guild 1.

[0226] In some embodiments, the predefined set of microorganisms includes only the intestinal microorganisms selected from Table 1, Table 2, or Figures 13A to 13XX assigned to Guild 2. That is, the predefined set of microorganisms does not include the microorganisms selected from Table 1, Table 2, or Figures 13A to 13XX assigned to Guild 1. In some embodiments, the predefined set of microorganisms includes at least five intestinal microorganisms selected from Table 1, Table 2, or Figures 13A to 13XX assigned to Guild 2. In some embodiments, the predefined set of microorganisms includes at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 175, 180, 190, 200, 225, 250, 275, 300, 350, 400 or more intestinal microorganisms, selected from Table 1, Table 2, or Figures 13A to 13XX assigned to Guild 2.

[0227] In some embodiments, if the prediction of a subject's response to the therapy does not meet the likelihood threshold for the subject to respond well to the therapy, the method further includes administering the therapy to the subject. In some embodiments, the therapy is administered to the subject approximately simultaneously with the administration of one or more of the multiple intestinal microorganisms. In some embodiments, the therapy is administered to the subject after the administration of one or more of the multiple intestinal microorganisms. In some embodiments, the therapy is administered to the subject for at least 1 day, at least 2 days, at least 3 days, at least 4 days, at least 5 days, at least 6 days, at least 7 days, at least 1 week, at least 2 weeks, at least 3 weeks, at least 4 weeks, at least 5 weeks, at least 6 weeks, at least 7 weeks, at least 8 weeks or longer after the administration of one or more of the multiple intestinal microorganisms. In some embodiments, the therapy is administered to the subject within 3 months, within 2 months, within 1 month, within 4 weeks, within 3 weeks, within 2 weeks, within 1 week, within 6 days, within 5 days, within 4 days, within 3 days, or within 2 days after the administration of one or more of the multiple intestinal microorganisms. In some embodiments, the therapy is administered to the subject at intervals of 1 day to 2 months, 1 day to 1 month, 1 day to 3 weeks, 1 day to 2 weeks, 1 day to 1 week, 1 day to 3 days, 2 days to 2 months, 2 days to 1 month, 2 days to 3 weeks, 2 days to 2 weeks, 2 days to 1 week, 2 days to 3 days, 3 days to 2 months, 3 days to 1 month, 3 days to 3 weeks, 3 days to 2 weeks, 3 days to 1 week, 1 week to 2 months, 1 week to 1 month, 1 week to 3 weeks, or 1 week to 2 weeks, after one or more of the intestinal microorganisms have been administered.

[0228] In some embodiments, if a subject is classified as a predicted non-responder before treatment, the clinician may treat that subject differently from those classified as predicted responders. Classifying a subject as either a predicted non-responder or a predicted responder may allow for the adoption of a more appropriate specific or alternative treatment regimen for the patient.

[0229] In some embodiments, a therapeutic regimen or therapy can be administered via any common route, as long as the target tissue or cells are accessible through that route. This includes, but is not limited to, intravenous, catheter-based, orthotopic, intradermal, subcutaneous, intramuscular, intraperitoneal, intratumoral, oral, nasal, buccal, rectal, vaginal, or topical administration. The choice of therapeutic agent and administration regimen may depend on various factors, including the drug combination used, the specific disease being treated, and the patient's condition and medical history.

[0230] In some embodiments, non-responders are administered one or more of the multiple intestinal microorganisms by, without limitation, oral administration or colonoscopy. The intestinal microbiota therapeutic compositions described herein may be prepared and administered using methods known in the art. In general, the compositions are formulated for oral, colonoscopic, or nasogastric delivery, although any suitable method may be used.

[0231] In some embodiments, non-responders receive a fecal microbiota transplant from a responder population via a method disclosed, for example, in US2023 / 0109343, US2020 / 0147151, or US2021 / 036172. In some embodiments, non-responders receive an effective amount of a pre-selected isolated population of intestinal microorganisms from the responder's fecal material. In some embodiments, non-responders receive an effective amount of a pre-selected isolated population of intestinal microorganisms from Table 1, Table 2, or Figures 13A to 13XX. In some embodiments, the one or more of the multiple intestinal microorganisms administered to non-responders include a therapeutically effective or sufficient amount of at least 1, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65, at least 70, at least 75, at least 80, at least 85, at least 90, at least 95, at least 100, at least 125, at least 150, at least 200, at least 250, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, or all of the isolated or purified population of intestinal microorganisms selected from Table 1, Table 2, or Figures 13A to 13XX. In some embodiments, the one or more of the multiple intestinal microorganisms administered to non-responders comprise at least about 1 × 10³ viable colony-forming units (CFU) of bacteria, or at least about 1 × 10⁴, 1 × 10⁵, 1 × 10⁶, 1 × 10⁷, 1 × 10⁸, 1 × 10⁹, 1 × 10¹⁰, 1 × 10¹¹, 1 × 10¹², 1 × 10¹³, 1 × 10¹⁴, or 1 × 10¹⁵ viable CFU (or any range derivable therein). In some embodiments, a single dose contains at least, at most, or exactly 1 × 10⁴, 1 × 10⁵, 1 × 10⁶, 1 × 10⁷, 1 × 10⁸, 1 × 10⁹, 1 × 10¹⁰, 1 × 10¹¹, 1 × 10¹², 1 × 10¹³, 1 × 10¹⁴, 1 × 10¹⁵, or more than 1 × 10¹⁵ viable CFU (or any range derivable therein) of a specified intestinal microorganism (such as a specific bacterium, species, genus, or family described herein). In some embodiments, a single dose contains at least, at most, or exactly 1 × 10⁴, 1 × 10⁵, 1 × 10⁶, 1 × 10⁷, 1 × 10⁸, 1 × 10⁹, 1 × 10¹⁰, 1 × 10¹¹, 1 × 10¹², 1 × 10¹³, 1 × 10¹⁴, 1 × 10¹⁵, or more than 1 × 10¹⁵ viable CFU (or any range derivable therein) of total intestinal microorganisms.

[0232] In some embodiments, multiple intestinal microorganisms are administered concurrently or sequentially with one or more therapies for a disease or disorder. In some embodiments, some, most, or substantially all of the subject's colonic, gut, or intestinal microbiota is removed before administration of the composition.

[0233] In some embodiments, multiple intestinal microorganisms are administered two or more times. In certain embodiments, the composition is administered daily, weekly, or monthly. In some embodiments, multiple intestinal microorganisms are administered for two, three, or four months to induce and / or maintain a suitable microbiome in the gastrointestinal (GI) tract of non-responders.

Composition

[0234] In one embodiment, the present disclosure provides a pharmaceutical composition comprising a first intestinal microorganism selected from those microorganisms enumerated in Figures 13A to 13XX. In Figure 13, each entry identifies the species of intestinal microorganism identified in the following examples, indicates whether that species is present in the core set of 284 genomes, indicates which of the two guilds described in the examples the microorganism belongs to, and provides a taxonomic classification of the microorganism, where d=domain, p=phylum, c=class, o=order, f=family, g=genus, and s=species. For example, the first entry in Figure 13A is reproduced below:

Microbe ID: 1U001.8
Core: N
Guild Assignment: Guild 2
d_Bacteria;p_Proteobacteria;c_Gammaproteobacteria;o_Enterobacterales;f_Enterobacteriaceae;g_Escherichia;s_Escherichia coli

This entry defines organism 1U001.8, which is not part of the core set of microorganisms but is part of Guild 2, and which has the taxonomic classification of Domain=Bacteria, Phylum=Proteobacteria, Class=Gammaproteobacteria, Order=Enterobacterales, Family=Enterobacteriaceae, Genus=Escherichia, and Species=Escherichia coli.
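
For readers handling the Figure 13 entries programmatically, the entry layout described above can be parsed along the following lines. This is a sketch only: the `MicrobeEntry` type and `parse_entry` function are hypothetical names, and the assumption that a core member is marked `Y` is inferred from the `N` shown in the reproduced entry rather than stated in the text.

```python
from dataclasses import dataclass
from typing import Dict

# Rank prefixes as defined in the text: d, p, c, o, f, g, s.
RANKS = {"d": "domain", "p": "phylum", "c": "class", "o": "order",
         "f": "family", "g": "genus", "s": "species"}


@dataclass
class MicrobeEntry:
    microbe_id: str
    core: bool               # True if the genome is in the core set
    guild: int               # 1 or 2
    taxonomy: Dict[str, str]


def parse_entry(microbe_id: str, core: str, guild: str, lineage: str) -> MicrobeEntry:
    """Parse one entry; `lineage` is the semicolon-separated rank string."""
    taxonomy = {}
    for field in lineage.split(";"):
        prefix, _, name = field.partition("_")  # split at the first underscore
        taxonomy[RANKS[prefix]] = name
    is_core = core.strip().upper() == "Y"       # assumption: "Y" marks core members
    guild_number = int(guild.split()[-1])       # "Guild 2" -> 2
    return MicrobeEntry(microbe_id, is_core, guild_number, taxonomy)


entry = parse_entry(
    "1U001.8", "N", "Guild 2",
    "d_Bacteria;p_Proteobacteria;c_Gammaproteobacteria;o_Enterobacterales;"
    "f_Enterobacteriaceae;g_Escherichia;s_Escherichia coli",
)
```

Applied to the first entry of Figure 13A, the sketch yields a non-core Guild 2 record whose species field is Escherichia coli.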

[0235] The genome sequence of each organism listed in Figure 13 can be found in the sequence listing filed herewith, as mapped according to the associated entries in Figure 12. For example, as shown in Figure 12A, organism 1U001.8 has genome sequences corresponding to SEQ ID NOs: 1-68. As described in the examples, species are defined as organisms having at least a threshold percentage similarity in their genome sequences. For example, in some embodiments, a microorganism is defined as organism 1U001.8 if its genome shares at least 99% identity with SEQ ID NOs: 1-68. In some embodiments, a microorganism is defined as a microorganism listed in Figure 13A if its genome has at least 97%, at least 97.5%, at least 98%, at least 98.5%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, or at least 99.9% sequence identity with the corresponding genome sequence in the sequence listing, as mapped in Figure 12.
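
The identity-threshold definition can be sketched as a simple lookup. The sketch assumes that percent identity between a query genome and each reference genome has already been computed by some external genome comparison tool, which the text does not name; the function name `assign_organism`, the reference identifier `REF_B`, and the identity values are hypothetical.

```python
from typing import Dict, Optional


def assign_organism(identities: Dict[str, float],
                    threshold: float = 99.0) -> Optional[str]:
    """Return the best-matching reference organism, or None.

    `identities` maps a reference organism ID to the percent sequence
    identity between the query genome and that reference (precomputed
    elsewhere). The query is assigned only if the best match meets the
    threshold percentage.
    """
    if not identities:
        return None
    best_id = max(identities, key=identities.get)
    return best_id if identities[best_id] >= threshold else None


# Hypothetical identities for one query genome against two references.
match_99 = assign_organism({"1U001.8": 99.4, "REF_B": 91.0}, threshold=99.0)
match_999 = assign_organism({"1U001.8": 99.4, "REF_B": 91.0}, threshold=99.9)
```

With a 99% threshold the query is treated as organism 1U001.8; with the stricter 99.9% threshold it is left unassigned.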

[0236] In some embodiments, the pharmaceutical composition comprises two or more microorganisms listed in Figure 13. In some embodiments, the pharmaceutical composition comprises at least two, at least three, at least four, at least five, at least ten, at least fifteen, at least 20, at least 25, at least 30, at least 40, at least 50, at least 75, at least 100, at least 125, at least 150, at least 175, at least 200, at least 225, at least 250, at least 300, at least 350, at least 400, at least 450, at least 500, at least 550, at least 600, at least 650, at least 700, at least 750, or at least 800 microorganisms listed in Figure 13.

[0237] In some embodiments, the majority of microorganisms in the pharmaceutical composition are those listed in Figure 13. In some embodiments, at least 80% of the microorganisms in the pharmaceutical composition are those listed in Figure 13. In some embodiments, at least 85% of the microorganisms in the pharmaceutical composition are those listed in Figure 13. In some embodiments, at least 90% of the microorganisms in the pharmaceutical composition are those listed in Figure 13. In some embodiments, at least 95% of the microorganisms in the pharmaceutical composition are those listed in Figure 13. In some embodiments, at least 98% of the microorganisms in the pharmaceutical composition are those listed in Figure 13. In some embodiments, at least 99% of the microorganisms in the pharmaceutical composition are those listed in Figure 13. In some embodiments, at least 99.5% of the microorganisms in the pharmaceutical composition are those listed in Figure 13. In some embodiments, at least 99.8% of the microorganisms in the pharmaceutical composition are those listed in Figure 13. In some embodiments, at least 99.9% of the microorganisms in the pharmaceutical composition are those listed in Figure 13. In some embodiments, at least 99.99% of the microorganisms in the pharmaceutical composition are those listed in Figure 13. In some embodiments, the pharmaceutical composition contains only those microorganisms listed in Figure 13.
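
Whether a given composition meets one of these percentage limits can be checked with a short calculation. The sketch below assumes that the percentage is measured by amount (for example, viable CFU) rather than by the number of distinct strains, which the text leaves open; the names `listed_fraction` and `meets_limit`, the identifier `UNLISTED_1`, and the amounts are hypothetical.

```python
from typing import Dict, Set


def listed_fraction(composition: Dict[str, float], listed: Set[str]) -> float:
    """Fraction of the composition, by amount, made up of listed organisms."""
    total = sum(composition.values())
    if total <= 0:
        raise ValueError("composition must contain a positive total amount")
    in_list = sum(amount for microbe_id, amount in composition.items()
                  if microbe_id in listed)
    return in_list / total


def meets_limit(composition: Dict[str, float], listed: Set[str],
                minimum: float) -> bool:
    """True if the listed organisms make up at least `minimum` of the total."""
    return listed_fraction(composition, listed) >= minimum


# Hypothetical composition: amounts in viable CFU per dose.
composition = {"1U001.8": 9e8, "UNLISTED_1": 1e8}
figure_13_ids = {"1U001.8"}

fraction = listed_fraction(composition, figure_13_ids)
ok_80 = meets_limit(composition, figure_13_ids, 0.80)
ok_95 = meets_limit(composition, figure_13_ids, 0.95)
```

The same check applies to a narrower subset, such as only the core or only the Guild 1 organisms, by passing a different set of identifiers.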

[0238] In some embodiments, the majority of microorganisms in the pharmaceutical composition are those listed as core microorganisms in Figure 13. In some embodiments, at least 80% of the microorganisms in the pharmaceutical composition are those listed as core microorganisms in Figure 13. In some embodiments, at least 85% of the microorganisms in the pharmaceutical composition are those listed as core microorganisms in Figure 13. In some embodiments, at least 90% of the microorganisms in the pharmaceutical composition are those listed as core microorganisms in Figure 13. In some embodiments, at least 95% of the microorganisms in the pharmaceutical composition are those listed as core microorganisms in Figure 13. In some embodiments, at least 98% of the microorganisms in the pharmaceutical composition are those listed as core microorganisms in Figure 13. In some embodiments, at least 99% of the microorganisms in the pharmaceutical composition are those listed as core microorganisms in Figure 13. In some embodiments, at least 99.5% of the microorganisms in the pharmaceutical composition are those listed as core microorganisms in Figure 13. In some embodiments, at least 99.8% of the microorganisms in the pharmaceutical composition are those listed as core microorganisms in Figure 13. In some embodiments, at least 99.9% of the microorganisms in the pharmaceutical composition are those listed as core microorganisms in Figure 13. In some embodiments, at least 99.99% of the microorganisms in the pharmaceutical composition are those listed as core microorganisms in Figure 13. In some embodiments, the pharmaceutical composition contains only those microorganisms listed as core microorganisms in Figure 13.

[0239] In some embodiments, the pharmaceutical composition contains at least two, at least three, at least four, at least five, at least ten, at least fifteen, at least 20, at least 25, at least 30, at least 40, at least 50, at least 75, at least 100, at least 125, at least 150, at least 175, at least 200, at least 225, at least 250, at least 275, or all of the microorganisms listed as core microorganisms in Figure 13.

[0240] In some embodiments, the majority of microorganisms in the pharmaceutical composition are those listed as Guild 1 microorganisms in Figure 13. In some embodiments, at least 80% of the microorganisms in the pharmaceutical composition are those listed as Guild 1 microorganisms in Figure 13. In some embodiments, at least 85% of the microorganisms in the pharmaceutical composition are those listed as Guild 1 microorganisms in Figure 13. In some embodiments, at least 90% of the microorganisms in the pharmaceutical composition are those listed as Guild 1 microorganisms in Figure 13. In some embodiments, at least 95% of the microorganisms in the pharmaceutical composition are those listed as Guild 1 microorganisms in Figure 13. In some embodiments, at least 98% of the microorganisms in the pharmaceutical composition are those listed as Guild 1 microorganisms in Figure 13. In some embodiments, at least 99% of the microorganisms in the pharmaceutical composition are those listed as Guild 1 microorganisms in Figure 13. In some embodiments, at least 99.5% of the microorganisms in the pharmaceutical composition are those listed as Guild 1 microorganisms in Figure 13. In some embodiments, at least 99.8% of the microorganisms in the pharmaceutical composition are listed as Guild 1 microorganisms in Figure 13. In some embodiments, at least 99.9% of the microorganisms in the pharmaceutical composition are listed as Guild 1 microorganisms in Figure 13. In some embodiments, at least 99.99% of the microorganisms in the pharmaceutical composition are listed as Guild 1 microorganisms in Figure 13. In some embodiments, the pharmaceutical composition contains only the microorganisms listed as Guild 1 microorganisms in Figure 13.

[0241] In some embodiments, the pharmaceutical composition includes at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 40, at least 50, at least 75, at least 100, at least 125, at least 150, at least 175, at least 200, at least 225, at least 250, at least 275, at least 300, at least 400, or all of the microorganisms listed as Guild 1 microorganisms in Figure 13.

[0242] In some embodiments, the majority of microorganisms in the pharmaceutical composition are those listed as Guild 1 and core microorganisms in Figure 13. In some embodiments, at least 80% of the microorganisms in the pharmaceutical composition are those listed as Guild 1 and core microorganisms in Figure 13. In some embodiments, at least 85% of the microorganisms in the pharmaceutical composition are those listed as Guild 1 and core microorganisms in Figure 13. In some embodiments, at least 90% of the microorganisms in the pharmaceutical composition are those listed as Guild 1 and core microorganisms in Figure 13. In some embodiments, at least 95% of the microorganisms in the pharmaceutical composition are those listed as Guild 1 and core microorganisms in Figure 13. In some embodiments, at least 98% of the microorganisms in the pharmaceutical composition are those listed as Guild 1 and core microorganisms in Figure 13. In some embodiments, at least 99% of the microorganisms in the pharmaceutical composition are those listed as Guild 1 and core microorganisms in Figure 13. In some embodiments, at least 99.5% of the microorganisms in the pharmaceutical composition are those listed as Guild 1 and core microorganisms in Figure 13. In some embodiments, at least 99.8% of the microorganisms in the pharmaceutical composition are those listed as Guild 1 and core microorganisms in Figure 13. In some embodiments, at least 99.9% of the microorganisms in the pharmaceutical composition are those listed as Guild 1 and core microorganisms in Figure 13. In some embodiments, at least 99.99% of the microorganisms in the pharmaceutical composition are those listed as Guild 1 and core microorganisms in Figure 13. In some embodiments, the pharmaceutical composition contains only those microorganisms listed as Guild 1 and core microorganisms in Figure 13.

[0243] In some embodiments, the pharmaceutical composition comprises at least two, at least three, at least four, at least five, at least ten, at least fifteen, at least 20, at least 25, at least 30, at least 40, at least 50, at least 75, at least 100, at least 125, at least 150, at least 175, or all of the microorganisms listed in Figure 13 as Guild 1 and core microorganisms.

[0244] In some embodiments, the majority of microorganisms in the pharmaceutical composition are those listed as Guild 2 microorganisms in Figure 13. In some embodiments, at least 80% of the microorganisms in the pharmaceutical composition are those listed as Guild 2 microorganisms in Figure 13. In some embodiments, at least 85% of the microorganisms in the pharmaceutical composition are those listed as Guild 2 microorganisms in Figure 13. In some embodiments, at least 90% of the microorganisms in the pharmaceutical composition are those listed as Guild 2 microorganisms in Figure 13. In some embodiments, at least 95% of the microorganisms in the pharmaceutical composition are those listed as Guild 2 microorganisms in Figure 13. In some embodiments, at least 98% of the microorganisms in the pharmaceutical composition are those listed as Guild 2 microorganisms in Figure 13. In some embodiments, at least 99% of the microorganisms in the pharmaceutical composition are those listed as Guild 2 microorganisms in Figure 13. In some embodiments, at least 99.5% of the microorganisms in the pharmaceutical composition are those listed as Guild 2 microorganisms in Figure 13. In some embodiments, at least 99.8% of the microorganisms in the pharmaceutical composition are listed as Guild 2 microorganisms in Figure 13. In some embodiments, at least 99.9% of the microorganisms in the pharmaceutical composition are listed as Guild 2 microorganisms in Figure 13. In some embodiments, at least 99.99% of the microorganisms in the pharmaceutical composition are listed as Guild 2 microorganisms in Figure 13. In some embodiments, the pharmaceutical composition contains only the microorganisms listed as Guild 2 microorganisms in Figure 13.

[0245] In some embodiments, the pharmaceutical composition contains at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 40, at least 50, at least 75, at least 100, at least 125, at least 150, at least 175, at least 200, at least 225, at least 250, at least 275, at least 300, at least 400, or all of the microorganisms listed as Guild 2 microorganisms in Figure 13.

[0246] In some embodiments, the majority of microorganisms in the pharmaceutical composition are those listed as Guild 2 and core microorganisms in Figure 13. In some embodiments, at least 80% of the microorganisms in the pharmaceutical composition are those listed as Guild 2 and core microorganisms in Figure 13. In some embodiments, at least 85% of the microorganisms in the pharmaceutical composition are those listed as Guild 2 and core microorganisms in Figure 13. In some embodiments, at least 90% of the microorganisms in the pharmaceutical composition are those listed as Guild 2 and core microorganisms in Figure 13. In some embodiments, at least 95% of the microorganisms in the pharmaceutical composition are those listed as Guild 2 and core microorganisms in Figure 13. In some embodiments, at least 98% of the microorganisms in the pharmaceutical composition are those listed as Guild 2 and core microorganisms in Figure 13. In some embodiments, at least 99% of the microorganisms in the pharmaceutical composition are those listed as Guild 2 and core microorganisms in Figure 13. In some embodiments, at least 99.5% of the microorganisms in the pharmaceutical composition are those listed as Guild 2 and core microorganisms in Figure 13. In some embodiments, at least 99.8% of the microorganisms in the pharmaceutical composition are those listed as Guild 2 and core microorganisms in Figure 13. In some embodiments, at least 99.9% of the microorganisms in the pharmaceutical composition are those listed as Guild 2 and core microorganisms in Figure 13. In some embodiments, at least 99.99% of the microorganisms in the pharmaceutical composition are those listed as Guild 2 and core microorganisms in Figure 13. In some embodiments, the pharmaceutical composition contains only those microorganisms listed as Guild 2 and core microorganisms in Figure 13.

[0247] In some embodiments, the pharmaceutical composition comprises at least two, at least three, at least four, at least five, at least ten, at least fifteen, at least 20, at least 25, at least 30, at least 40, at least 50, at least 75, at least 100, at least 125, at least 150, at least 175, or all of the microorganisms listed in Figure 13 as Guild 2 and core microorganisms.

[0248] In some embodiments, the pharmaceutical composition is prepared from a culture of a microorganism or a group of microorganisms. For example, in embodiments where the pharmaceutical composition contains a single microorganism, the microorganism is cultured alone, and the culture is used to prepare a composition, for example, for fecal microbiota transplantation (FMT). In some embodiments where the pharmaceutical composition contains multiple microorganisms, each microorganism is cultured separately and then combined to produce the pharmaceutical composition. In some embodiments where the pharmaceutical composition contains multiple microorganisms, two or more microorganisms are cultured together and optionally mixed with other microorganisms that have been cultured separately. In some embodiments where the pharmaceutical composition contains multiple microorganisms, all of the microorganisms are cultured together.

[0249] In some embodiments, the pharmaceutical composition is for fecal microbiota transplantation. A review of the use of FMT is provided by Al-Ali D, Ahmed A, Shafiq A, McVeigh C, Chaari A, Zakaria D, and Bendriss G, “Fecal microbiota transplants: A review of emerging clinical data on applications, efficacy, and risks (2015-2020),” Qatar Med J., 2021(1):5(2021), the disclosure of which is incorporated herein by reference.

[0250] In some embodiments, the pharmaceutical composition for FMT is a fecal sample supplemented with one or more of the microorganisms disclosed in Figure 13. In some embodiments, at least half of the microorganisms in the supplemented fecal sample are from the supplement. In some embodiments, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, at least 99.5%, at least 99.8%, or at least 99.9% of the microorganisms in the supplemented fecal sample are from the supplement. In some embodiments, the fecal sample is sterilized before supplementation with one or more microorganisms listed in Figure 13, so as to kill most of the microorganisms in the unsupplemented fecal sample (e.g., at least 50%, at least 75%, at least 90%, at least 95%, at least 98%, at least 99%, at least 99.5%, at least 99.8%, at least 99.9%, or all of them).
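
The supplement fractions above imply a minimum amount of supplement for a given fecal sample. If the supplement contributes S organisms and the unsupplemented sample contributes N, the supplement fraction is S / (S + N), so reaching a target fraction f requires S ≥ f · N / (1 − f). The sketch below works this out under the assumptions that the fraction is measured in viable CFU and that no sterilization step has reduced N; the function name and all values are hypothetical.

```python
def required_supplement(fecal_cfu: float, target_fraction: float) -> float:
    """Minimum supplement amount so that it forms `target_fraction` of the total.

    Solves S / (S + N) >= f for S, giving S >= f * N / (1 - f), where N is
    the amount already present in the unsupplemented fecal sample.
    """
    if fecal_cfu < 0:
        raise ValueError("fecal_cfu must be non-negative")
    if not 0.0 <= target_fraction < 1.0:
        raise ValueError("target_fraction must lie in [0, 1)")
    return target_fraction * fecal_cfu / (1.0 - target_fraction)


# Hypothetical fecal sample containing 1e9 viable CFU.
half = required_supplement(1e9, 0.50)    # supplement equal to the sample
ninety = required_supplement(1e9, 0.90)  # about nine times the sample
```

Because the required amount grows as 1 / (1 − f), the highest fractions listed are far easier to reach when the fecal sample is first sterilized, which lowers N.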

[0251] In some embodiments, the pharmaceutical composition is a synthetic fecal sample (e.g., synthetic stool). An exemplary description of the use of synthetic stool is provided in Gweon TG, Na SY, “Next Generation Fecal Microbiota Transplantation,” Clin Endosc., 54(2):152-156 (2021), the disclosure of which is incorporated herein by reference.

[0252] In some embodiments, the composition further comprises pharmaceutically acceptable excipients.

[0253] In some embodiments, the first intestinal microorganisms belong to Guild 1, as identified in Figures 13A to 13XX. In some embodiments, the first intestinal microorganisms belong to Guild 2, as identified in Figures 13A to 13XX.

[0254] In some embodiments, the first intestinal microorganism has a genome that has at least 99% sequence identity with respect to the set of microbial contigs listed in Figures 12A to 12I.

[0255] In some embodiments, the first intestinal microorganisms constitute at least 50% of the total amount of intestinal microorganisms in the composition. In some embodiments, the first intestinal microorganisms constitute at least 75% of the total amount of intestinal microorganisms in the composition. In some embodiments, the first intestinal microorganisms constitute at least 90% of the total amount of intestinal microorganisms in the composition. In some embodiments, the first intestinal microorganisms constitute at least 95% of the total amount of intestinal microorganisms in the composition. In some embodiments, the first intestinal microorganisms constitute at least 99% of the total amount of intestinal microorganisms in the composition. In some embodiments, the first intestinal microorganisms constitute at least 99.5% of the total amount of intestinal microorganisms in the composition. In some embodiments, the first intestinal microorganisms constitute at least 99.9% of the total amount of intestinal microorganisms in the composition.

[0256] In some embodiments, the composition further comprises a second intestinal microorganism selected from those microorganisms listed in Figures 13A to 13XX. In some embodiments, the second intestinal microorganism belongs to the same guild as the first intestinal microorganism, as identified in Figures 13A to 13XX.

[0257] In one embodiment, the present disclosure provides a composition comprising a first intestinal microorganism selected from those listed in Figures 13A to 13XX. In some embodiments, the composition comprises two or more microorganisms listed in Figure 13. In some embodiments, the composition comprises at least two, at least three, at least four, at least five, at least ten, at least fifteen, at least 20, at least 25, at least 30, at least 40, at least 50, at least 75, at least 100, at least 125, at least 150, at least 175, at least 200, at least 225, at least 250, at least 300, at least 350, at least 400, at least 450, at least 500, at least 550, at least 600, at least 650, at least 700, at least 750, or at least 800 microorganisms listed in Figure 13.

[0258] In some embodiments, the majority of microorganisms in the composition are those listed in Figure 13. In some embodiments, at least 80% of the microorganisms in the composition are those listed in Figure 13. In some embodiments, at least 85% of the microorganisms in the composition are those listed in Figure 13. In some embodiments, at least 90% of the microorganisms in the composition are those listed in Figure 13. In some embodiments, at least 95% of the microorganisms in the composition are those listed in Figure 13. In some embodiments, at least 98% of the microorganisms in the composition are those listed in Figure 13. In some embodiments, at least 99% of the microorganisms in the composition are those listed in Figure 13. In some embodiments, at least 99.5% of the microorganisms in the composition are those listed in Figure 13. In some embodiments, at least 99.8% of the microorganisms in the composition are those listed in Figure 13. In some embodiments, at least 99.9% of the microorganisms in the composition are those listed in Figure 13. In some embodiments, at least 99.99% of the microorganisms in the composition are those listed in Figure 13. In some embodiments, the composition comprises only the microorganisms listed in Figure 13.

[0259] In some embodiments, the majority of microorganisms in the composition are those listed as core microorganisms in Figure 13. In some embodiments, at least 80% of the microorganisms in the composition are those listed as core microorganisms in Figure 13. In some embodiments, at least 85% of the microorganisms in the composition are those listed as core microorganisms in Figure 13. In some embodiments, at least 90% of the microorganisms in the composition are those listed as core microorganisms in Figure 13. In some embodiments, at least 95% of the microorganisms in the composition are those listed as core microorganisms in Figure 13. In some embodiments, at least 98% of the microorganisms in the composition are those listed as core microorganisms in Figure 13. In some embodiments, at least 99% of the microorganisms in the composition are those listed as core microorganisms in Figure 13. In some embodiments, at least 99.5% of the microorganisms in the composition are those listed as core microorganisms in Figure 13. In some embodiments, at least 99.8% of the microorganisms in the composition are those listed as core microorganisms in Figure 13. In some embodiments, at least 99.9% of the microorganisms in the composition are those listed as core microorganisms in Figure 13. In some embodiments, at least 99.99% of the microorganisms in the c...

Claims

1. A method for predicting a subject's response to therapy for a disorder, the method comprising: in a computer system having one or more processors and memory storing one or more programs for execution by the one or more processors: A) obtaining, in electronic format, a plurality of genomic abundance values comprising, for each respective intestinal microorganism in a plurality of intestinal microorganisms selected from Table 1, Table 2, or Figures 13A to 13XX, a corresponding value for the abundance of the genome of the respective intestinal microorganism in a biological sample from the subject; and B) inputting the plurality of genomic abundance values into a model comprising a plurality of parameters, wherein the model applies the plurality of parameters to the plurality of genomic abundance values to generate, as output from the model, a prediction of the subject's response to the therapy.

2. The method according to claim 1, wherein the obtaining A) comprises: (i) obtaining, in electronic format, a plurality of at least 100,000 nucleic acid sequences for genomic DNA from the biological sample from the intestine of the subject; and (ii) determining, for each respective intestinal microorganism in the plurality of intestinal microorganisms, the corresponding value for the abundance of the genome of the respective intestinal microorganism from the plurality of at least 100,000 nucleic acid sequences.

3. The method according to claim 2, wherein the determining A)(ii) comprises: assembling, by metagenomic de novo sequence assembly, a plurality of corresponding intestinal microorganism genomes from the plurality of at least 100,000 nucleic acid sequences in electronic format; and for each respective intestinal microorganism in the plurality of intestinal microorganisms, calculating the corresponding value for the abundance of the genome of the respective intestinal microorganism based on the prevalence of the nucleic acid sequences, in the plurality of at least 100,000 nucleic acid sequences, that were used to assemble the intestinal microorganism genome corresponding to the respective intestinal microorganism.

4. The method according to claim 2, wherein the determining A)(ii) comprises: assigning each respective nucleic acid sequence in the plurality of at least 100,000 nucleic acid sequences to a respective intestinal microorganism in the plurality of intestinal microorganisms, thereby generating, for each respective intestinal microorganism, a corresponding count of the nucleic acid sequences assigned to the respective intestinal microorganism; and determining the corresponding genomic abundance value for each respective intestinal microorganism in the plurality of intestinal microorganisms based on the corresponding count of the nucleic acid sequences assigned to the respective intestinal microorganism.

5. The method according to any one of claims 2 to 4, further comprising sequencing the genomic DNA from the biological sample from the intestine of the subject, thereby obtaining the plurality of at least 100,000 nucleic acid sequences.

6. The method according to any one of claims 1 to 5, further comprising: when the prediction of the subject's response to the therapy satisfies a threshold likelihood that the subject will respond well to the therapy, administering the therapy to the subject; and when the prediction of the subject's response to the therapy does not satisfy the threshold likelihood that the subject will respond well to the therapy, treating the subject by administering one or more of the plurality of intestinal microorganisms to the subject.

7. The method according to any one of claims 1 to 6, wherein the plurality of intestinal microorganisms include at least 20 microorganisms selected from those microorganisms in Table 1, Table 2, or Figures 13A to 13XX, each having at least two binding affinity.

8. The method according to any one of claims 1 to 7, wherein the biological sample from the intestine of the subject is a fecal sample.

9. The method according to any one of claims 1 to 8, wherein the therapy is a biological therapy, immunotherapy, chemotherapy, radiotherapy, gene therapy, hormone therapy, photodynamic therapy, targeted therapy, small molecules, antibodies, polynucleotides, natural compounds, immunomodulators, bone marrow therapy, stem cell therapy, surgical therapy, induction therapy, maintenance therapy, or a combination thereof.

10. The method according to any one of claims 1 to 9, wherein the disorder is selected from the group consisting of type 2 diabetes, hypertension, schizophrenia, atherosclerotic cardiovascular disease (ACCVD), cirrhosis of the liver (LC), inflammatory bowel disease (IBD), colorectal cancer (CRC), ankylosing spondylitis (AS), Parkinson's disease (PD), rheumatoid arthritis (RA), progressive melanoma, and B-cell lymphoma.

11. The method according to any one of claims 1 to 9, wherein the disorder is cancer.

12. The method according to any one of claims 1 to 11, wherein the prediction of the subject's response is a class output indicating a respective response among a plurality of possible responses of the subject.

13. The method according to any one of claims 1 to 12, wherein the prediction of the subject's response is a probability output for a respective response of the subject.

14. The method according to any one of claims 1 to 13, wherein the model is a neural network algorithm, a support vector machine algorithm, a naive Bayes algorithm, a nearest neighbor algorithm, a boosted tree algorithm, a random forest algorithm, a convolutional neural network algorithm, a decision tree algorithm, a regression algorithm, or a clustering algorithm.

15. The method according to any one of claims 1 to 14, wherein the plurality of parameters are at least 1,000, at least 10,000, at least 15,000, at least 50,000, at least 100,000, at least 250,000, at least 500,000, or at least 1,000,000 parameters.

16. The method according to any one of claims 1 to 15, wherein the model applies the plurality of parameters to the information through at least 25,000, at least 50,000, at least 100,000, at least 250,000, at least 500,000, or at least 1,000,000 calculations in order to obtain the corresponding output from the model for the subject.

17. A method for training a model to predict a subject's response to therapy for a disorder, the method comprising: in a computer system having one or more processors and memory storing one or more programs for execution by the one or more processors: A) obtaining, in electronic format, for each respective training subject in a plurality of training subjects, each of whom is receiving the therapy for the disorder: (i) a corresponding plurality of genome abundance values for the respective training subject at a time prior to receiving the therapy, wherein the corresponding plurality of genome abundance values comprises, for each respective intestinal microorganism in a plurality of intestinal microorganisms, a corresponding value for the abundance of the genome of the respective intestinal microorganism in a corresponding biological sample from the intestine of the respective training subject, and (ii) a corresponding index of the response of the respective training subject to the therapy; B) inputting, for each respective training subject in the plurality of training subjects, information about the respective training subject into a model comprising a plurality of parameters, wherein the model applies the plurality of parameters to the information to obtain a corresponding output from the model for the respective training subject, the corresponding output comprises a prediction of the response of the respective training subject to the therapy, the information comprises the corresponding genome abundance value for each intestinal microorganism in the plurality of intestinal microorganisms, and the plurality of intestinal microorganisms is selected from Table 1, Table 2, or Figures 13A to 13XX; and C) adjusting the plurality of parameters based on, for each respective training subject in the plurality of training subjects, one or more differences between (i) the corresponding output from the model and (ii) the corresponding index of the response of the respective training subject to the therapy.

18. The method according to claim 17, wherein the obtaining A) comprises, for each respective training subject in the plurality of training subjects: (i) obtaining, in electronic format, a corresponding plurality of at least 100,000 nucleic acid sequences for genomic DNA from the corresponding biological sample from the intestine of the respective training subject; and (ii) determining, for each respective intestinal microorganism in the plurality of intestinal microorganisms, the corresponding value for the abundance of the genome of the respective intestinal microorganism from the corresponding plurality of at least 100,000 nucleic acid sequences.

19. The method according to claim 18, wherein the determining A)(ii) comprises, for each respective training subject in the plurality of training subjects: assembling, by metagenomic de novo sequence assembly, a plurality of corresponding intestinal microorganism genomes from the corresponding plurality of at least 100,000 nucleic acid sequences; and for each respective intestinal microorganism in the plurality of intestinal microorganisms, calculating the corresponding value for the abundance of the genome of the respective intestinal microorganism based on the prevalence of the nucleic acid sequences, in the corresponding plurality of at least 100,000 nucleic acid sequences, that were used to assemble the intestinal microorganism genome corresponding to the respective intestinal microorganism.

20. The method according to claim 18, wherein the determining A)(ii) comprises, for each respective training subject in the plurality of training subjects: assigning each respective nucleic acid sequence in the corresponding plurality of at least 100,000 nucleic acid sequences to a respective intestinal microorganism in the plurality of intestinal microorganisms, thereby generating, for each respective intestinal microorganism, a corresponding count of the nucleic acid sequences assigned to the respective intestinal microorganism; and determining the corresponding genome abundance value for each respective intestinal microorganism in the plurality of intestinal microorganisms based on the corresponding count of the nucleic acid sequences assigned to the respective intestinal microorganism.

21. The method according to any one of claims 18 to 20, further comprising sequencing the genomic DNA from the corresponding biological sample from the intestine of each of the plurality of training subjects, thereby obtaining the corresponding plurality of at least 100,000 nucleic acid sequences.

22. The method according to any one of claims 17 to 21, wherein the plurality of intestinal microorganisms comprises at least 20 intestinal microorganisms selected from Table 1, Table 2, or Figures 13A to 13XX.

23. The method according to any one of claims 17 to 22, wherein the plurality of intestinal microorganisms comprises at least 20 microorganisms selected from those microorganisms in Table 1, Table 2, or Figures 13A to 13XX, each having at least two binding affinity.

24. The method according to any one of claims 17 to 23, wherein, for each respective training subject in the plurality of training subjects, the corresponding biological sample from the intestine of the respective training subject is a fecal sample.

25. The method according to any one of claims 17 to 24, wherein the therapy is a biological therapy, immunotherapy, chemotherapy, radiotherapy, gene therapy, hormone therapy, photodynamic therapy, targeted therapy, small molecules, antibodies, polynucleotides, natural compounds, immunomodulators, bone marrow therapy, stem cell therapy, surgical therapy, induction therapy, maintenance therapy, or a combination thereof.

26. The method according to any one of claims 17 to 25, wherein the disorder is selected from the group consisting of type 2 diabetes, hypertension, schizophrenia, atherosclerotic cardiovascular disease (ACCVD), cirrhosis (LC), inflammatory bowel disease (IBD), colorectal cancer (CRC), ankylosing spondylitis (AS), Parkinson's disease (PD), rheumatoid arthritis (RA), progressive melanoma, and B-cell lymphoma.

27. The method according to any one of claims 17 to 26, wherein the disorder is cancer.

28. The method according to any one of claims 17 to 27, wherein the prediction of the response of each respective training subject is a class output indicating a respective response among a plurality of possible responses of the respective training subject.

29. The method according to any one of claims 17 to 27, wherein the prediction of the response of each respective training subject is a probability output for a respective response of the respective training subject.

30. The method according to any one of claims 17 to 29, wherein the model is a neural network algorithm, a support vector machine algorithm, a naive Bayes algorithm, a nearest neighbor algorithm, a boosted tree algorithm, a random forest algorithm, a convolutional neural network algorithm, a decision tree algorithm, a regression algorithm, or a clustering algorithm.

31. The method according to any one of claims 17 to 30, wherein the plurality of parameters are at least 1,000, at least 10,000, at least 15,000, at least 50,000, at least 100,000, at least 250,000, at least 500,000, or at least 1,000,000 parameters.

32. The method according to any one of claims 17 to 31, wherein the model applies the plurality of parameters to the information through at least 25,000, at least 50,000, at least 100,000, at least 250,000, at least 500,000, or at least 1,000,000 calculations in order to obtain the corresponding output from the model for each of the training subjects.

33. A computer system comprising: one or more processors; and a non-transitory computer-readable medium containing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to carry out the method according to any one of claims 1 to 32.

34. A non-transitory computer-readable storage medium storing program code instructions that, when executed by a processor, cause the processor to perform the method according to any one of claims 1 to 32.

35. A pharmaceutical composition comprising a first intestinal microorganism selected from those microorganisms listed in Figures 13A to 13XX.

36. The pharmaceutical composition according to claim 35, further comprising a pharmaceutically acceptable excipient.

37. The pharmaceutical composition according to claim 35 or 36, wherein the first intestinal microorganism belongs to Guild 1, as identified in Figures 13A to 13XX.

38. The pharmaceutical composition according to claim 35 or 36, wherein the first intestinal microorganism belongs to Guild 2, as identified in Figures 13A to 13XX.

39. The pharmaceutical composition according to any one of claims 35 to 38, wherein the first intestinal microorganism has a genome having at least 99% sequence identity with respect to the set of microbial contigs listed in Figures 12A to 12I.

40. The pharmaceutical composition according to any one of claims 35 to 39, wherein the first intestinal microorganism constitutes at least 50% of the total amount of intestinal microorganisms in the composition.

41. The pharmaceutical composition according to claim 40, wherein the first intestinal microorganism constitutes at least 75% of the total amount of intestinal microorganisms in the composition.

42. The pharmaceutical composition according to claim 40, wherein the first intestinal microorganism constitutes at least 90% of the total amount of intestinal microorganisms in the composition.

43. The pharmaceutical composition according to claim 40, wherein the first intestinal microorganism constitutes at least 95% of the total amount of intestinal microorganisms in the composition.

44. The pharmaceutical composition according to claim 40, wherein the first intestinal microorganism constitutes at least 99% of the total amount of intestinal microorganisms in the composition.

45. The pharmaceutical composition according to claim 40, wherein the first intestinal microorganism constitutes at least 99.5% of the total amount of intestinal microorganisms in the composition.

46. The pharmaceutical composition according to claim 40, wherein the first intestinal microorganism constitutes at least 99.9% of the total amount of intestinal microorganisms in the composition.

47. The pharmaceutical composition according to any one of claims 35 to 46, further comprising a second intestinal microorganism selected from those microorganisms listed in Figures 13A to 13XX.

48. The pharmaceutical composition according to claim 47, wherein the second intestinal microorganism belongs to the same guild as the first intestinal microorganism, as identified in Figures 13A to 13XX.

49. A method for treating a subject in need of treatment, comprising administering to the subject a therapeutically effective amount of the pharmaceutical composition described in any one of claims 35 to 48.

50. The method according to claim 49, wherein the administration is performed by fecal microbiota transplantation.

51. The method according to claim 49, wherein the administration is performed by direct implantation into the intestines of the subject.

52. The method according to claim 49, wherein the administration is by oral ingestion.

53. The method according to any one of claims 49 to 52, wherein the subject has a condition selected from the group consisting of type 2 diabetes mellitus (T2D), hypertension (HT), schizophrenia (SCZ), atherosclerotic cardiovascular disease (ACCVD), cirrhosis of the liver (LC), inflammatory bowel disease (IBD), colorectal cancer (CRC), ankylosing spondylitis (AS), Parkinson's disease (PD), multiple sclerosis (MS), Gaucher disease type 2 (GDII), COVID-19 (COV), Behçet's disease (BD), autism spectrum disorder (ASD), and pancreatic cancer (PC).

54. The method according to claim 49, wherein the subject has cancer.

55. The method according to any one of claims 49 to 54, further comprising administering a second therapeutic agent to the subject.

56. A method for isolating intestinal microorganisms selected from those intestinal microorganisms listed in Figure 41, the method comprising using sequences associated with the intestinal microorganisms in Figure 41 for isolation.

57. The method according to claim 56, comprising: isolating a set of one or more microbial cultures grown from a single cell of a biological sample; and determining, for each of the microbial cultures in the set of one or more microbial cultures, whether the genomic DNA isolated from each culture has sequence identity with one or more contigs associated with the microorganisms listed in Figure 41.

58. A method for treating a subject, the method comprising: obtaining, in electronic format and using at least one processor, a plurality of genome abundance values comprising, for each respective species in at least 20 species of intestinal bacteria, a corresponding value for the abundance of the genome of the respective species in a biological sample from the subject; inputting the plurality of genome abundance values into a health detection model comprising a plurality of health detection model parameters, wherein the health detection model applies the plurality of health detection model parameters to the plurality of genome abundance values to generate, as output from the health detection model, an indicator of the subject's health; and administering to the subject at least one therapeutic agent comprising at least one intestinal microbial transplant.

59. The method according to claim 58, further comprising: obtaining, in electronic format, for each respective training subject in a plurality of training subjects, (i) a corresponding plurality of genome abundance values comprising, for each respective intestinal microorganism in a plurality of intestinal microorganisms, a corresponding value for the abundance of the genome of the respective intestinal microorganism in a corresponding biological sample from the intestine of the respective training subject, and (ii) a corresponding state of a biological characteristic of the respective training subject; inputting, for each respective training subject in the plurality of training subjects, information about the respective training subject into a model comprising a plurality of parameters, wherein the model applies the plurality of parameters to the information to obtain a corresponding output from the model for the respective training subject, the corresponding output comprises an indication of the corresponding state of the biological characteristic of the respective training subject, and the information comprises the corresponding genome abundance value for each intestinal microorganism in the plurality of intestinal microorganisms; and adjusting the plurality of parameters based on, for each respective training subject in the plurality of training subjects, one or more differences between (i) the corresponding output from the model and (ii) the corresponding state of the biological characteristic of the respective training subject.

60. The method according to claim 58, wherein the obtaining comprises: obtaining, in electronic format, a plurality of at least 100,000 nucleic acid sequences for genomic DNA from the biological sample from the intestine of the subject; and determining, for each respective intestinal microorganism in the plurality of intestinal microorganisms, the corresponding value for the abundance of the genome of the respective intestinal microorganism from the plurality of at least 100,000 nucleic acid sequences.

61. The method according to claim 60, wherein the determining comprises: assembling, by metagenomic de novo sequence assembly, a plurality of corresponding intestinal microorganism genomes from the plurality of at least 100,000 nucleic acid sequences in electronic format; and for each respective intestinal microorganism in the plurality of intestinal microorganisms, calculating the corresponding value for the abundance of the genome of the respective intestinal microorganism based on the prevalence of the nucleic acid sequences, in the plurality of at least 100,000 nucleic acid sequences, that were used to assemble the intestinal microorganism genome corresponding to the respective intestinal microorganism.

62. The method according to claim 60, wherein the determining comprises: assigning each respective nucleic acid sequence in the plurality of at least 100,000 nucleic acid sequences to a respective intestinal microorganism in the plurality of intestinal microorganisms, thereby generating, for each respective intestinal microorganism, a corresponding count of the nucleic acid sequences assigned to the respective intestinal microorganism; and determining the corresponding genome abundance value for each respective intestinal microorganism in the plurality of intestinal microorganisms based on the corresponding count of the nucleic acid sequences assigned to the respective intestinal microorganism.

63. The method according to any one of claims 60 to 62, further comprising sequencing the genomic DNA from the biological sample from the intestine of the subject, thereby obtaining the plurality of at least 100,000 nucleic acid sequences.

64. The method according to any one of claims 60 to 63, wherein the plurality of intestinal microorganisms comprises at least 20 microorganisms having at least two binding properties.

65. The method according to any one of claims 60 to 64, wherein the biological sample from the intestine of the subject is a fecal sample.

66. The method according to any one of claims 60 to 65, wherein the indicator of health of the subject is an indicator of biological characteristics, and the biological characteristics are a disease or disorder, a therapy applied to the subject, or a diet for the subject.

67. The method according to claim 66, wherein the disease or disorder is selected from the group consisting of type 2 diabetes mellitus, hypertension, schizophrenia, atherosclerotic cardiovascular disease (ACCVD), cirrhosis of the liver (LC), inflammatory bowel disease (IBD), colorectal cancer (CRC), ankylosing spondylitis (AS), and Parkinson's disease (PD).

68. The method according to claim 66, wherein the disease or disorder is cancer.

69. The method according to any one of claims 58 to 68, wherein the indicator of the health of the subject is a class output of each of the multiple possible states of the health of the subject.

70. The method according to any one of claims 58 to 68, wherein the indicator of the health of the subject is a probability output for the corresponding state of the health of the subject.

71. The method according to any one of claims 58 to 70, wherein the health detection model includes at least one of a neural network algorithm, a support vector machine algorithm, a naive Bayes algorithm, a nearest neighbor algorithm, a boosted tree algorithm, a random forest algorithm, a convolutional neural network algorithm, a decision tree algorithm, a regression algorithm, or a clustering algorithm.

72. The method according to any one of claims 58 to 71, wherein the plurality of parameters are at least 1,000, at least 10,000, at least 15,000, at least 50,000, at least 100,000, at least 250,000, at least 500,000, or at least 1,000,000 parameters.

73. The method according to any one of claims 58 to 72, wherein the model applies the plurality of parameters to the information through at least 25,000, at least 50,000, at least 100,000, at least 250,000, at least 500,000, or at least 1,000,000 calculations in order to obtain the corresponding output from the model for each of the training subjects.
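
For illustration only, the data flow recited in the method claims (read counts assigned per genome, genome abundance values, a parameterized model, parameter adjustment from output/label differences, and a threshold decision) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the disclosed implementation: the genome identifiers `MAG_A` and `MAG_B`, the toy read counts, the simple count normalization, the choice of logistic regression (one possible "regression algorithm"), the learning rate, and the 0.5 threshold are all hypothetical. A practical pipeline would likely also normalize for genome length and sequencing depth and would use far more genomes and parameters.

```python
import math

def relative_abundance(read_counts):
    """Normalize per-genome read counts so the abundance values sum to 1.
    (Assumption: plain count normalization, no genome-length correction.)"""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads were assigned to any genome")
    return {genome: count / total for genome, count in read_counts.items()}

def predict_response(abundances, weights, bias):
    """Apply the model parameters to the abundance values and return the
    probability that the subject responds (logistic model, an assumption)."""
    z = bias + sum(weights.get(g, 0.0) * a for g, a in abundances.items())
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, labels, genomes, lr=1.0, epochs=500):
    """Adjust parameters from the differences between model outputs and the
    recorded responses, using full-batch gradient descent on the log loss."""
    weights = {g: 0.0 for g in genomes}
    bias = 0.0
    n = len(samples)
    for _ in range(epochs):
        # Difference between each model output and the observed response.
        errors = [predict_response(x, weights, bias) - y
                  for x, y in zip(samples, labels)]
        bias -= lr * sum(errors) / n
        for g in genomes:
            grad = sum(e * x.get(g, 0.0) for e, x in zip(errors, samples)) / n
            weights[g] -= lr * grad
    return weights, bias

def recommend(probability, threshold=0.5):
    """Threshold decision: give the therapy, or give microorganisms instead."""
    if probability >= threshold:
        return "administer therapy"
    return "administer intestinal microorganisms"

# Hypothetical toy cohort: two genomes, four training subjects.
GENOMES = ["MAG_A", "MAG_B"]
TRAIN_COUNTS = [
    {"MAG_A": 80000, "MAG_B": 20000},
    {"MAG_A": 70000, "MAG_B": 30000},
    {"MAG_A": 20000, "MAG_B": 80000},
    {"MAG_A": 30000, "MAG_B": 70000},
]
TRAIN_LABELS = [1, 1, 0, 0]  # 1 = responded to the therapy, 0 = did not

train_x = [relative_abundance(c) for c in TRAIN_COUNTS]
weights, bias = train(train_x, TRAIN_LABELS, GENOMES)

new_subject = relative_abundance({"MAG_A": 90000, "MAG_B": 60000})
p = predict_response(new_subject, weights, bias)
print(recommend(p))
```

In this toy setting the model learns a larger weight for the genome that is enriched in responders, so a new subject whose sample is dominated by that genome is routed to the therapy branch; a subject falling below the threshold would instead be routed to microorganism administration.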