Growth-coupled metabolic enzyme directed evolution system and AI-based metabolic pathway reconstruction and optimization method

A growth-coupled directed evolution system for metabolic enzymes, combined with an AI-based method for reconstructing metabolic pathways, addresses the challenge of predictively designing the sequence-metabolite landscape that links protein sequences to cellular functions. The approach yields a significant increase in metabolite yield and establishes a platform for predictive cell design and biotechnology innovation.

CN122201431A · Pending · Publication Date: 2026-06-12 · TSINGHUA UNIVERSITY

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Application (China)
Current Assignee / Owner
TSINGHUA UNIVERSITY
Filing Date
2026-01-20
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing technologies struggle to systematically explore and predictively design the metabolite landscape between protein sequences and cellular functions. Traditional methods are labor-intensive and fail to capture the evolutionary pressures shaping metabolism within high-dimensional sequence landscapes. The lack of effective sampling strategies limits the construction of a holistic view of cellular metabolism.

Method used

We employ a growth-coupled metabolic enzyme directed evolution system, combined with an AI-based metabolic pathway optimization method. An orthogonal DNA replication system and a biosensor system provide metabolite sensing in host cells, mimicking natural selection. This drives populations toward highly adaptive mutations while fostering cooperative interactions with less adaptive mutations, which expands the sampling of functional sequence space.

Benefits of technology

The method enables comprehensive mapping and reconstruction of the metabolic adaptive landscape for predictive cell design, increases metabolite yield more than 14-fold, and establishes a platform for predictive cell design and biotechnology innovation.

✦ Generated by Eureka AI based on patent content.


Abstract

The application discloses a growth-coupled metabolic enzyme directed evolution system and an AI-based metabolic pathway reconstruction and optimization method. The method, which is based on the growth-coupled metabolic enzyme directed evolution system, comprises the following steps: evolving metabolic enzymes using the growth-coupled directed evolution system, culturing and screening host cells, and obtaining, from the host cells that survive the screening pressure, catalytic enzyme mutants conforming to the direction of directed evolution together with their mutant protein sequences; inputting the mutant protein sequences of the screened catalytic enzyme mutants and the wild-type protein sequence of the wild-type catalytic enzyme into a screening model to obtain a fitness score for each catalytic enzyme mutant; and screening target catalytic enzyme mutants based on the fitness scores. The application uses a co-evolution strategy and an AI-driven model to promote predictive cell design and biotechnological innovation.

Description

Technical Field

[0001] This invention belongs to the field of bioengineering and synthetic biology, and relates to a growth-coupled directed evolution system for metabolic enzymes and an AI-based method for optimizing metabolic pathways that uses it.

Background Technology

[0002] Establishing direct links between protein sequences and cellular functions is fundamental to understanding biological systems. Metabolism plays a central role in cellular function, coordinating essential processes such as DNA, RNA, and protein biosynthesis, as well as cell growth and maintenance. Deciphering and predictively designing metabolic behavior is a long-standing goal in biology with broad applications, from maintaining cellular homeostasis and regulating immune responses to optimizing substrate transformation to facilitate biotechnological innovation. While protein engineering has traditionally focused on improving catalytic efficiency, catalytic activity alone does not determine metabolite yield. Cellular factors, such as metabolic burden, resource allocation, and cell state, exert equally crucial influences in this highly interconnected network. To systematically explore these complexities, the inventors previously introduced the concept of a cellular functional landscape, which establishes direct links between protein sequences (e.g., metabolic enzymes) and cellular functions (e.g., metabolite output) within the native cellular environment (Figure 1). Despite decades of research, predictive design of such sequence-metabolite landscapes remains difficult to achieve because the fundamental principles governing these relationships are not yet fully understood.

[0003] Comprehensive mapping of this landscape requires systematic exploration of the sequence space to identify variants that regulate metabolic flux and homeostasis. Traditional methods, such as deep mutational scanning (DMS), site-saturation mutagenesis, and randomized library construction, have yielded valuable insights but remain labor-intensive and time-consuming, and involve trade-offs between accuracy, precision, and sequence coverage. Furthermore, these methods typically explore low-dimensional mutational spaces, failing to capture the evolutionary pressures that shape metabolism within high-dimensional sequence landscapes. Directed evolution accelerates variant selection but is inherently biased towards highly active variants, often neglecting less active variants that offer crucial insights into metabolic kinetics. Previously, the inventors demonstrated that efficient sampling of the protein adaptive landscape enables extreme compression and reconstruction for predictive design. This raises the question: can the cellular sequence-metabolite landscape also be compressed through strategic sampling of key data points, thereby enabling predictive cell design? However, current methods lack efficient sampling strategies capable of simultaneously capturing both enhancing and diminishing variants, limiting the ability to construct a holistic view of cellular metabolism.

Summary of the Invention

[0004] The problem the invention aims to solve

[0005] To overcome the above limitations, the inventors proposed the concept of co-evolution, a paradigm different from competitive directed evolution (Figure 2). In co-evolution, selection pressures are distributed among different variants through metabolite sensing, mimicking natural selection across ecosystems. Metabolite sensing can occur within individual cells carrying different mutations (cell-level cooperation) or through intercellular communication via extracellular diffusion (population-level cooperation). This framework allows strong selection pressures to drive microbial populations toward highly adaptive mutations while simultaneously fostering cooperative interactions with less adaptive mutations, thereby expanding functional sequence space sampling and promoting collective adaptation. This systematic exploration enables a comprehensive mapping of the metabolic adaptive landscape by capturing key data points that inform machine learning and language models, facilitating landscape reconstruction and interpretation. The reconstructed landscape, in turn, allows for deeper understanding of metabolic network structure for predictive cell design.

[0006] Solution for solving the problem

[0007] [1]. An AI-based method for optimizing metabolic pathways by reconstructing metabolic pathways based on a growth-coupled metabolic enzyme directed evolution system, characterized in that the method comprises:

[0008] Metabolic enzymes are evolved using a growth-coupled directed evolution system, wherein host cells are cultured and screened, and based on the host cells that survive the screening pressure, catalytic enzyme mutants conforming to the directed evolution direction are obtained, along with mutant protein sequences of the catalytic enzyme mutants; and,

[0009] The mutant protein sequences corresponding to the screened catalytic enzyme mutants and the wild-type protein sequences corresponding to the wild-type catalytic enzymes are input into the screening model to obtain the fitness score for each catalytic enzyme mutant; based on the fitness score, target catalytic enzyme mutants are screened.

[0010] The growth-coupled metabolic enzyme directed evolution system includes: a host cell, and an orthogonal DNA replication system and a biosensor system integrated into the host cell;

[0011] The orthogonal DNA replication system is used to mutate a target gene encoding at least one metabolic enzyme in order to carry out continuous directed evolution of the target gene.

[0012] The biosensor system includes a biosensor and a selectable marker gene and / or a reporter gene. The biosensor is configured to respond to metabolites in a metabolic pathway involving the metabolic enzyme to regulate the expression of the selectable marker gene and / or reporter gene. The biosensor couples the concentration of the metabolite with the survival or growth advantage of the host cell, such that host cells surviving under selection pressure contain the target gene that has evolved to promote the production of the metabolite.

[0013] Preferably, the biosensor system includes a biosensor, as well as a selectable marker gene and a reporter gene.
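The final step of the method in [1] (scoring the surviving mutants against the wild type and keeping the best ones) can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `Variant`, `select_targets`, and `toy_score` are hypothetical, and `toy_score` is only a placeholder for the screening model, which the text describes separately.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Variant:
    name: str      # hypothetical label for a surviving mutant
    sequence: str  # mutant protein sequence recovered from a surviving host cell


def select_targets(
    survivors: Iterable[Variant],
    wild_type: str,
    score: Callable[[str, str], float],
    top_k: int,
) -> List[Tuple[Variant, float]]:
    """Score each surviving variant against the wild type and keep the top_k."""
    scored = [(v, score(v.sequence, wild_type)) for v in survivors]
    # Highest fitness score first; the sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def toy_score(mutant: str, wild_type: str) -> float:
    """Placeholder scorer (assumption): counts substituted positions.

    It stands in for the screening model only so that the example runs.
    """
    return float(sum(a != b for a, b in zip(mutant, wild_type)))


# Usage with made-up five-residue sequences.
wt = "MKTAY"
pool = [Variant("a", "MKTAY"), Variant("b", "MRTAY"), Variant("c", "MRTGY")]
top = select_targets(pool, wt, toy_score, top_k=2)
```

Passing the scorer as a function keeps the ranking logic independent of whichever model produces the fitness scores.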

[0014] [2]. The method according to [1] is characterized in that the metabolites include intracellular metabolites and / or metabolites that diffuse to the extracellular space.

[0015] [3]. The method according to [1] or [2] is characterized in that the biosensor system is integrated into the genome of the host cell, and / or,

[0016] The biosensor is expressed by a promoter; and / or,

[0017] The selectable marker gene and / or reporter gene are expressed by a promoter containing a binding sequence of the biosensor;

[0018] Optionally, when the metabolite is absent or the metabolite concentration is insufficient to trigger a response in the biosensor, the biosensor binds to the binding sequence to suppress the expression of the selectable marker gene and / or reporter gene; when the target gene in a host cell surviving under selection pressure evolves into a target gene that promotes the production of the metabolite, the metabolite concentration increases, and the biosensor binds to the metabolite, thereby promoting the expression of the selectable marker gene and reporter gene.
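The switching behaviour described in [0018] (repression when the metabolite is low, marker expression when it is high) can be illustrated with a simple dose-response model. The sketch below assumes Hill-type kinetics, which the text does not specify; the function names and the parameters `k_half`, `hill`, `leak`, and `threshold` are hypothetical values chosen only for illustration.

```python
def marker_expression(
    metabolite: float,
    k_half: float = 1.0,  # assumed concentration giving half-maximal induction
    hill: float = 2.0,    # assumed cooperativity
    leak: float = 0.05,   # assumed basal expression under full repression
) -> float:
    """Relative marker expression in [leak, 1) for a ligand-released repressor."""
    if metabolite < 0:
        raise ValueError("metabolite concentration must be non-negative")
    induced = metabolite ** hill / (k_half ** hill + metabolite ** hill)
    return leak + (1.0 - leak) * induced


def survives(metabolite: float, threshold: float = 0.5) -> bool:
    """Growth coupling: a cell survives selection only if enough marker is made."""
    return marker_expression(metabolite) >= threshold
```

Under these assumed parameters, a cell producing no metabolite expresses only the basal level and fails selection, while a cell whose evolved enzyme raises the metabolite concentration to `k_half` or above passes.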

[0019] [4]. The method according to any one of [1]-[3] is characterized in that the host cell comprises a prokaryotic cell or a eukaryotic cell;

[0020] Optionally, the prokaryotic cells include Escherichia coli;

[0021] Optionally, the eukaryotic cells include yeast, mammalian cells, insect cells, plant cells, and / or fungi;

[0022] Optionally, the host cell includes yeast, including Saccharomyces cerevisiae, Saccharomyces kiwifruit, Saccharomyces cerevisiae, Saccharomyces rubrum, or Saccharomyces pastoris;

[0023] Preferably, the host cell comprises Saccharomyces cerevisiae.

[0024] [5]. The method according to any one of [1]-[4] is characterized in that the selectable marker gene includes an antibiotic resistance gene or a nutrient marker gene; and / or, the reporter gene contains a gene encoding a fluorescent protein;

[0025] Optionally, the antibiotic resistance gene includes at least one selected from bleomycin resistance gene, kanamycin resistance gene, hygromycin resistance gene, norsin gene, and amphotericin gene;

[0026] Optionally, the nutrient marker gene includes at least one selected from URA, LEU, HIS, ADE2, TRP1, and MET17;

[0027] Optionally, the fluorescent protein includes green fluorescent protein or a derivative thereof, and / or red fluorescent protein or a derivative thereof.

[0028] [6]. The method according to any one of [1]-[5], characterized in that the orthogonal DNA replication system comprises:

[0029] An exogenous plasmid containing a target gene encoding at least one metabolic enzyme; and an error-prone DNA polymerase expression plasmid.

[0030] [7]. The method according to any one of [1]-[6] is characterized in that the screening model comprises: a one-dimensional vector extraction unit, a two-dimensional vector extraction unit, a geometric encoder, and a multilayer perceptron, wherein inputting the mutant protein sequence corresponding to the screened catalytic enzyme mutant and the wild-type protein sequence corresponding to the wild-type catalytic enzyme into the screening model to obtain the fitness score corresponding to each catalytic enzyme mutant comprises: the one-dimensional vector extraction unit performing one-dimensional feature extraction based on the mutant protein sequence to obtain a one-dimensional mutation vector characterizing the evolutionary information of the catalytic enzyme mutant, and performing one-dimensional feature extraction based on the wild-type protein sequence to obtain a one-dimensional wild-type vector characterizing the evolutionary information of the wild-type catalytic enzyme; the two-dimensional vector extraction unit performing two-dimensional feature extraction based on the mutant protein sequence to obtain a two-dimensional mutation vector representing geometric features related to the function of the catalytic enzyme mutant, and performing two-dimensional feature extraction based on the wild-type protein sequence to obtain a two-dimensional wild-type vector representing geometric features related to the function of the wild-type catalytic enzyme; inputting the one-dimensional mutation vector, one-dimensional wild-type vector, two-dimensional mutation vector, and two-dimensional wild-type vector into the geometric encoder to obtain a first node embedding vector corresponding to the catalytic enzyme mutant and a first edge embedding vector for connecting the first node embedding vector, as well as a second node embedding vector corresponding to the wild-type catalytic enzyme and a second edge embedding vector for connecting the second node embedding vector; and inputting the first node embedding vector, the first edge embedding vector, the second node embedding vector, and the second edge embedding vector into the multilayer perceptron to obtain a fitness score.
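The data flow of the screening model in [7] can be sketched in plain Python as below. Everything concrete here is an assumption made for illustration: one-hot residue encoding stands in for the one-dimensional extraction unit, a sequence-separation map stands in for the two-dimensional unit, one round of edge-weighted averaging stands in for the geometric encoder, and the multilayer perceptron uses untrained, seeded random weights. Only the order of the stages follows the text.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
Vec = List[float]
Mat = List[List[float]]


def one_d_features(seq: str) -> Mat:
    """Stand-in for the 1D extraction unit: one-hot vector per residue."""
    return [[1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS] for aa in seq]


def two_d_features(seq: str) -> Mat:
    """Stand-in for the 2D extraction unit: pairwise map from sequence separation."""
    n = len(seq)
    return [[1.0 / (1 + abs(i - j)) for j in range(n)] for i in range(n)]


def geometric_encoder(nodes: Mat, edges: Mat) -> Tuple[Mat, Mat]:
    """One round of edge-weighted neighbour averaging; edges pass through unchanged."""
    n, dim = len(nodes), len(nodes[0])
    node_emb = []
    for i in range(n):
        weight_sum = sum(edges[i])  # positive, because the diagonal entries are 1
        node_emb.append([
            sum(edges[i][j] * nodes[j][k] for j in range(n)) / weight_sum
            for k in range(dim)
        ])
    return node_emb, edges


def pool(node_emb: Mat, edge_emb: Mat) -> Vec:
    """Mean-pool node and edge embeddings into one fixed-length vector."""
    n = len(node_emb)
    mean_nodes = [sum(col) / n for col in zip(*node_emb)]
    mean_edge = sum(sum(row) for row in edge_emb) / (n * n)
    return mean_nodes + [mean_edge]


def make_mlp(in_dim: int, hidden: int = 8, seed: int = 0) -> Callable[[Vec], float]:
    """Untrained two-layer perceptron with seeded random weights (illustrative only)."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]

    def forward(x: Vec) -> float:
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
        return sum(w * hi for w, hi in zip(w2, h))

    return forward


def fitness_score(mutant: str, wild_type: str,
                  mlp: Optional[Callable[[Vec], float]] = None) -> float:
    """Encode mutant and wild type, concatenate the embeddings, and apply the MLP."""
    feats: Vec = []
    for seq in (mutant, wild_type):
        node_emb, edge_emb = geometric_encoder(one_d_features(seq), two_d_features(seq))
        feats += pool(node_emb, edge_emb)
    if mlp is None:
        mlp = make_mlp(len(feats))
    return mlp(feats)
```

Because the weights are untrained, the returned number has no biological meaning; the sketch only shows how the four feature vectors reach the perceptron through the encoder.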

[0031] [8]. A mutant obtained by the AI-based metabolic pathway optimization method based on the growth-coupled metabolic enzyme directed evolution system as described in any one of [1]-[7];

[0032] Optionally, the mutant includes a 4-coumaric acid-CoA ligase (4CL) mutant that, relative to the amino acid sequence shown in SEQ ID NO: 5, has any one of the mutation sets shown in (m1)-(m30):

[0033] (m1) I284M, I533M;

[0034] (m2) N415S, I533M;

[0035] (m3) I284M, N415S;

[0036] (m4) F110R, K426E, L450S;

[0037] (m5) I167V, N397K, K426E;

[0038] (m6) F110K, E190G, I454V;

[0039] (m7) F110R, N238D, E331V, S532A;

[0040] (m8) N397K, V498E, S532A, D545G;

[0041] (m9) N397G, K426E;

[0042] (m10) F110R, I167N, N397K;

[0043] (m11) T3I, F110K, M318K;

[0044] (m12) S262G, M318K, D488G;

[0045] (m13) N397K, K426E, L450S;

[0046] (m14) I46S, I167T, K426E;

[0047] (m15) M318K, E331V, D488G;

[0048] (m16) F110R, E331V, E365G;

[0049] (m17) F110R, N397K;

[0050] (m18) I271L, N397G, K426E, I541T;

[0051] (m19) I271L, K426E, D488G, I541T;

[0052] (m20) F110K, N397K;

[0053] (m21) I46S, K426E;

[0054] (m22) N415S, K544S;

[0055] (m23) N415S, T423S;

[0056] (m24) N397K, K426E;

[0057] (m25) F110R, I252T, M318K, E331V;

[0058] (m26) F110K, V185G, E331V, I505L;

[0059] (m27) F110K, V246G, E365G, S532A;

[0060] (m28) V8D, Q130L, M318K, S532A;

[0061] (m29) D13G, M318K, N397K, I524V;

[0062] (m30) F110K, I271L, N397K, I541T;

[0063] Optionally, the mutant includes a chalcone synthase (CHS) mutant that, relative to the amino acid sequence shown in SEQ ID NO: 6, has any one of the mutation sets shown in (n1)-(n30):

[0064] (n1) D61L, A308K;

[0065] (n2) D61R, K66E, A308K;

[0066] (n3) K66E, A308K;

[0067] (n4) K66E, K67R, A308K;

[0068] (n5) K281E, S293H, A308K;

[0069] (n6) K66E, I229T, K234R, A308K;

[0070] (n7) V2I, K281E, A308K;

[0071] (n8) K281E, S293T, A308K;

[0072] (n9) K66E, S208N, A308K;

[0073] (n10) D61A, K66E, A308K;

[0074] (n11) K66E, A308R;

[0075] (n12) K66E, K67R, A308K, L343H;

[0076] (n13) D61L, K281E, A308K;

[0077] (n14) K66E, S208N;

[0078] (n15) D61A, K281E, A308K;

[0079] (n16) A308K, L343H;

[0080] (n17) K55R, K281E, A308K;

[0081] (n18) D61A, K66E;

[0082] (n19) K281E, A308K;

[0083] (n20) S293T, A308K;

[0084] (n21) D61R, K66E;

[0085] (n22) K55R, D61R, K66E, K67R;

[0086] (n23) K66E, L343H;

[0087] (n24) D61A, K66E, K67R, I229T;

[0088] (n25) D61R, K66E, K67R, A308R;

[0089] (n26) K66E, K67R, S208N, I229T;

[0090] (n27) K66E, K67R, S293F, L343H;

[0091] (n28) K66E, K67R, I229T, L343H;

[0092] (n29) K66E, K67R, I229T, K234R;

[0093] (n30) D61A, K66E, K67R, K234R;

[0094] Optionally, the mutant comprises a combination of a 4-coumaric acid-CoA ligase mutant and a chalcone synthase mutant.

[0095] The mutants include combinations of 4-coumaric acid-CoA ligase mutants and chalcone synthase mutants selected from any one of the following (z1)-(z32):

[0096] (z1)4CL mutants I284M, I533M; CHS mutants K281E, S293T, A308K;

[0097] (z2)4CL mutants N415S, I533M; CHS mutants D61L, A308K;

[0098] (z3)4CL mutants I284M, I533M; CHS mutants K281E, S293H, A308K;

[0099] (z4)4CL mutants N415S, I533M; CHS mutants K281E, S293T, A308K;

[0100] (z5)4CL mutants I167V, N397K, K426E; CHS mutants K281E, S293T, A308K;

[0101] (z6)4CL mutants I167V, N397K, K426E; CHS mutants D61R, K66E, A308K;

[0102] (z7)4CL mutants F110R, I167N, N397K; CHS mutants K66E, A308K;

[0103] (z8)4CL mutants F110R, I167N, N397K; CHS mutants D61L, A308K;

[0104] (z9)4CL mutants I167V, N397K, K426E; CHS mutants D61L, A308K;

[0105] (z10)4CL mutants I167V, N397K, K426E; CHS mutants V2I, K281E, A308K;

[0106] (z11)4CL mutants I284M and I533M; CHS mutants D61L and A308K;

[0107] (z12)4CL mutants F110R, I167N, N397K; CHS mutants K66E, K67R, A308K;

[0108] (z13)4CL mutants I284M, I533M; CHS mutants D61R, K66E, A308K;

[0109] (z14)4CL mutants I167V, N397K, K426E; CHS mutants K66E, A308K;

[0110] (z15)4CL mutants I167V, N397K, K426E; CHS mutants K66E, K67R, A308K;

[0111] (z16)4CL mutants F110R, I167N, N397K; CHS mutants D61R, K66E, A308K;

[0112] (z17)4CL mutants F110R, I167N, N397K; CHS mutants K281E, S293T, A308K;

[0113] (z18)4CL mutants N415S, I533M; CHS mutants D61R, K66E, A308K;

[0114] (z19)4CL mutants I284M and I533M; CHS mutants K66E and A308K;

[0115] (z20)4CL mutants I167V, N397K, K426E; CHS mutants K281E, S293H, A308K;

[0116] (z21)4CL mutants F110R, I167N, N397K; CHS mutants V2I, K281E, A308K;

[0117] (z22)4CL mutants F110R, I167N, N397K; CHS mutants K281E, S293H, A308K;

[0118] (z23)4CL mutants N415S, I533M; CHS mutants K66E, K67R, A308K;

[0119] (z24)4CL mutants N415S, I533M; CHS mutants K281E, S293H, A308K;

[0120] (z25)4CL mutants N415S, I533M; CHS mutants K66E, A308K;

[0121] (z26)4CL mutants I284M, I533M; CHS mutants V2I, K281E, A308K;

[0122] (z27)4CL mutants N415S, I533M; CHS mutants V2I, K281E, A308K;

[0123] (z28)4CL mutants I284M, I533M; CHS mutants K66E, K67R, A308K;

[0124] (z29)4CL mutants I167V, N397K, K426E; CHS mutants K66E, I229T, K234R, A308K;

[0125] (z30)4CL mutants N415S, I533M; CHS mutants K66E, I229T, K234R, A308K;

[0126] (z31)4CL mutants I284M, I533M; CHS mutants K66E, I229T, K234R, A308K;

[0127] (z32)4CL mutants F110R, I167N, N397K; CHS mutants K66E, I229T, K234R, A308K.
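The mutation sets listed in (m1)-(m30), (n1)-(n30), and (z1)-(z32) use the usual substitution notation (wild-type residue, position, new residue, e.g. I284M). A minimal sketch of turning such a set into a mutant sequence is given below. It assumes 1-based positions counted on the reference sequence; the helper names `parse_mutations` and `apply_mutations` are hypothetical, and the short test sequence is made up rather than taken from SEQ ID NO: 5 or 6.

```python
import re
from typing import List, Tuple

# One substitution: wild-type residue, 1-based position, replacement residue.
_MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutations(spec: str) -> List[Tuple[str, int, str]]:
    """Parse a set such as 'F110R, K426E, L450S' or 'I284M and I533M'."""
    mutations = []
    for token in re.split(r",|\band\b", spec):
        token = token.strip()
        if not token:
            continue
        match = _MUTATION.match(token)
        if match is None:
            raise ValueError(f"unrecognized mutation: {token!r}")
        wt, pos, new = match.groups()
        mutations.append((wt, int(pos), new))
    return mutations


def apply_mutations(wild_type: str, spec: str) -> str:
    """Return the mutant sequence, checking each wild-type residue first."""
    residues = list(wild_type)
    for wt, pos, new in parse_mutations(spec):
        if not 1 <= pos <= len(residues):
            raise ValueError(f"position {pos} is outside the sequence")
        if residues[pos - 1] != wt:
            raise ValueError(
                f"expected {wt} at position {pos}, found {residues[pos - 1]}"
            )
        residues[pos - 1] = new
    return "".join(residues)
```

Checking the wild-type residue before substituting catches off-by-one numbering errors, which are easy to make when a listed position refers to a different reference sequence.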

[0128] [9]. Use of the mutants described in [8] for the preparation of metabolites in the flavonoid synthesis pathway;

[0129] Optionally, the metabolites in the flavonoid synthesis pathway include naringenin or resveratrol.

[0130] [10]. A growth-coupled metabolic enzyme directed evolution system, characterized in that the growth-coupled metabolic enzyme directed evolution system comprises: a host cell, and an orthogonal DNA replication system and a biosensor system integrated into the host cell;

[0131] The orthogonal DNA replication system is used to mutate a target gene encoding at least one metabolic enzyme in order to carry out continuous directed evolution of the target gene.

[0132] The biosensor system includes a biosensor and a selectable marker gene and / or a reporter gene. The biosensor is configured to respond to metabolites in a metabolic pathway involving the metabolic enzyme to regulate the expression of the selectable marker gene and / or reporter gene. The biosensor couples the concentration of the metabolite with the survival or growth advantage of the host cell, such that host cells surviving under selection pressure contain the target gene that has evolved to promote the production of the metabolite.

[0133] Preferably, the biosensor system includes a biosensor, as well as a selectable marker gene and a reporter gene;

[0134] Optionally, the metabolites include intracellular metabolites and / or extracellular metabolites; optionally, the intracellular metabolites include p-coumaroyl-CoA; optionally, the extracellular metabolites include naringenin.

[0135] Optionally, the biosensor is expressed by a promoter, and / or the selectable marker gene and / or reporter gene is expressed by a promoter containing a binding sequence of the biosensor;

[0136] Optionally, the biosensor includes CouR or a functional variant thereof, and / or TtgR or a functional variant thereof;

[0137] Optionally, the CouR comprises an amino acid sequence as shown in SEQ ID NO: 8, or an amino acid sequence having at least 80% identity with SEQ ID NO: 8; and / or, the binding sequence of the CouR or a functional variant thereof comprises a nucleotide sequence as shown in SEQ ID NO: 23, or a nucleotide sequence having at least 80% identity with SEQ ID NO: 23;

[0138] Optionally, the TtgR comprises an amino acid sequence as shown in SEQ ID NO: 9, or an amino acid sequence having at least 80% identity with SEQ ID NO: 9; and / or, the binding sequence of the TtgR or a functional variant thereof comprises a nucleotide sequence as shown in SEQ ID NO: 24, or a nucleotide sequence having at least 80% identity with SEQ ID NO: 24;

[0139] Optionally, when the biosensor includes CouR or a functional variant thereof, the promoter containing the binding sequence of the biosensor comprises a nucleotide sequence as shown in any one of SEQ ID NO: 19-22, or a nucleotide sequence having at least 80% identity with any one of SEQ ID NO: 19-22; preferably, the promoter driving the expression of CouR or a functional variant thereof comprises a nucleotide sequence as shown in SEQ ID NO: 19, or a nucleotide sequence having at least 80% identity with SEQ ID NO: 19;

[0140] Optionally, when the biosensor includes TtgR or a functional variant thereof, the promoter containing the binding sequence of the biosensor comprises a nucleotide sequence as shown in any one of SEQ ID NO: 15-18, or a nucleotide sequence having at least 80% identity with any one of SEQ ID NO: 15-18; preferably, the promoter driving the expression of TtgR or a functional variant thereof comprises a nucleotide sequence as shown in SEQ ID NO: 15 or 17, or a nucleotide sequence having at least 80% identity with SEQ ID NO: 15 or 17;

[0141] Optionally, the metabolic enzyme includes 4-coumaric acid-CoA ligase and chalcone synthase;

[0142] Optionally, the selectable marker gene includes an antibiotic resistance gene or a nutrient marker gene; and / or, the reporter gene comprises a gene encoding a fluorescent protein;

[0143] Optionally, the antibiotic resistance gene includes at least one selected from bleomycin resistance gene, kanamycin resistance gene, hygromycin resistance gene, norsin gene, and amphotericin gene;

[0144] Optionally, the nutrient marker gene includes at least one selected from URA, LEU, HIS, ADE2, TRP1, and MET17;

[0145] Optionally, the fluorescent protein includes green fluorescent protein or a derivative thereof, and / or red fluorescent protein or a derivative thereof;

[0146] Optionally, the orthogonal DNA replication system comprises: an exogenous plasmid containing a target gene encoding at least one metabolic enzyme; and an error-prone DNA polymerase expression plasmid.
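Several of the optional features above are defined by "at least 80% identity" with a reference sequence. The text does not state how identity is computed, so the sketch below adopts one common convention as an assumption: the two sequences are already aligned to equal length with `-` marking gaps, and identity is the number of matching columns divided by the number of aligned columns. The function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences have a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """True if the pair reaches the identity threshold (80% by default)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

Other conventions (for example, dividing by the length of the shorter sequence) give different percentages for the same pair, so the denominator should be fixed before comparing against a threshold.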

[0147] The effects of the invention

[0148] This invention provides a growth-coupled directed evolution system for metabolic enzymes (CoEvo) and an AI-based metabolic pathway optimization method built on it (MetaAI). Combining CoEvo and MetaAI creates a collaborative platform for co-evolution, landscape compression, reconstruction, and predictive design (Figure 3). The approach was applied to flavonoid metabolism using yeast as a model system, a versatile metabolic chassis with wide applications in both basic research and industrial settings. Predictive modeling of the cell sequence-metabolite landscape (r > 0.9) was achieved through the integration of co-evolution and machine learning. The reconstructed landscape was then used to design new sequences, resulting in a significant increase in metabolite yield (> 14-fold). Finally, a systematic framework was established for mapping and engineering metabolic landscapes, leveraging co-evolutionary strategies and AI-driven models to advance predictive cell design and biotechnological innovation.

Attached Figure Description

[0149] Figure 1: Schematic diagram of protein sequences (metabolic enzymes) and the cellular functional landscape.

[0150] Figure 2: Schematic diagram of the landscapes captured by competitive evolution and co-evolution.

[0151] Figure 3: Schematic diagram of the research concept of this invention.

[0152] Figure 4: Schematic diagram of the principle of yeast directed evolution. In Saccharomyces cerevisiae, the host genome is replicated by host DNA polymerases (DNAPs) with a low error rate; the engineered terminal protein-dependent DNA polymerase (TP-DNAP1-4-2) replicates only the orthogonal P1 plasmid and is engineered to have a high error rate; by loading the target gene onto P1, rapid mutation and continuous directed evolution can be achieved without affecting the stability of the host genome.

[0153] Figure 5: Schematic diagram of the flavonoid biosynthesis pathway and yeast central metabolism.

[0154] Figure 6: Repression and activation of CouR under coumaroyl-CoA induction for modules with different promoter combinations.

[0155] Figure 7: Effects of coumaric acid dosage, 4CL copy number, and different substrate analogs on CouR-mediated gene expression.

[0156] Figure 8: Schematic diagram of the TtgR operator.

[0157] Figure 9: Effects of different concentrations of naringenin on TtgR and effects of different substrate analogs on TtgR-mediated gene expression.

[0158] Figure 10: A dedicated biosensor for coumaroyl-CoA and naringenin.

[0159] Figure 11: Construction of the coumaroyl-CoA gene circuit in the evolutionary system and fluorescence detection.

[0160] Figure 12: Schematic diagram of resistance detection regulated by the coumaroyl-CoA biosensor.

[0161] Figure 13: Panel a shows the schematic design of the growth-coupled gene circuit for yeast directed evolution of the enzyme 4CL. Panels b and c show the initial bleomycin resistance screening test in the coumaroyl-CoA evolutionary system.

[0162] Figure 14: Construction of the naringenin gene circuit in the evolutionary system and fluorescence detection.

[0163] Figure 15: Bleomycin resistance screening test in the evolutionary system.

[0164] Figure 16: Bleomycin resistance screening test and evolutionary process in the initial evolutionary system.

[0165] Figure 17: Results of yeast directed evolution of 4CL in the coumaroyl-CoA pathway.

[0166] Figure 18: Trends in the distribution of protein structure and function in mutant libraries.

[0167] Figure 19: Results of yeast directed evolution along the naringenin pathway.

[0168] Figure 20: Trends in the distribution of protein structure and function in mutant libraries.

[0169] Figure 21: Standard curves for naringenin, resveratrol, p-coumaric acid, and (S)-reticuline. Panel a shows resveratrol and panel b shows naringenin; the remaining panels show p-coumaric acid and (S)-reticuline. Linear regression was performed with peak area on the y-axis and standard concentration (mM) on the x-axis. The linear range was approximately 0-2.5 mM (a), 0-2.5 mM (b), 0-2.5 mM (c), and 0-0.8 mM (d). Note: y represents peak area, x represents concentration; quantitative calculations used the average of three measurements.

[0170] Figure 22: Validation of the improvement in flavonoid biosynthesis achieved by machine learning-assisted design compared with directed evolution (a, b).

[0171] Figure 23: Validation of the effect of machine learning-assisted design on improving flavonoid biosynthesis.

Detailed Implementation

[0172] Various exemplary embodiments, features, and aspects of the present invention will be described in detail below. The term "exemplary" as used herein means "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as superior to or better than other embodiments.

[0173] Furthermore, to better illustrate the present invention, numerous specific details are set forth in the following detailed embodiments. Those skilled in the art should understand that the present invention can be practiced without certain specific details. In other instances, methods, means, apparatus, and steps well known to those skilled in the art have not been described in detail in order to highlight the spirit of the present invention.

[0174] Unless otherwise stated, all units used in this specification are international standard units, and all numerical values and ranges appearing in this invention should be understood to include systematic errors that are unavoidable in industrial production.

[0175] I. Terminology

[0176] In this specification, the word "may" covers both cases: performing a certain process and not performing it.

[0177] In this specification, references to "some specific / preferred embodiments," "other specific / preferred embodiments," "implementation," etc., refer to specific elements (e.g., features, structures, properties, and / or characteristics) related to that embodiment, which are included in at least one of the embodiments described herein and may or may not be present in other embodiments. Furthermore, it should be understood that these elements may be combined in any suitable manner in various embodiments.

[0178] In this specification, a range of values expressed as "value A to value B" includes the endpoint values A and B.

[0179] In this specification, the terms "comprising" or "including" are open-ended expressions, meaning they include the contents specified in this invention but do not exclude other aspects.

[0180] In this specification, the term "optionally" generally means that the event or condition described after it may or may not occur, and the description includes both cases in which the event or condition occurs and cases in which it does not.

[0181] In this specification, the term "5-fluoroorotic acid" (5-fluorouracil-6-carboxylic acid monohydrate; 5-FOA) refers to a compound used in yeast molecular genetics studies to detect expression of the URA3 gene, which encodes orotidine-5'-monophosphate (OMP) decarboxylase. Yeast with URA3 gene activity (Ura+) convert 5-FOA into cytotoxic fluorodeoxyuridine. When uracil is supplemented in the culture medium, only yeast strains carrying a URA3 mutation can grow in media containing 5-FOA. 5-FOA is commonly used in experiments with *Saccharomyces cerevisiae* (URA3), *Schizosaccharomyces pombe* (URA4 and URA5), *Candida albicans* (URA3), and *Escherichia coli* (pyrF).

[0182] In this specification, the term "bleoR" refers to the bleomycin resistance gene. Bleomycin belongs to a large family of antitumor glycopeptide antibiotics produced by *Streptomyces verticillus*. The most commonly used members of this family are the bleomycins and the phleomycins. Bleomycin is primarily used as an antitumor compound, in combination with other anticancer agents, to treat lymphoma, squamous cell carcinoma, and testicular cancer. Phleomycin is mainly used as a selective antibiotic in molecular genetics research to screen for stably transfected cells carrying resistance genes such as Sh ble.

[0183] In this specification, the term "biosensor" refers to an analytical device consisting of a sensing element and a transducer. The sensing element recognizes target substances and is typically a biological substance such as an antibody, enzyme, nucleic acid, or cell, or a synthetic substance that resembles a biological substance, such as an aptamer, peptide, or molecularly imprinted polymer (MIP). The transducer converts the interaction between the sensing element and the target molecule into a signal. For example, enzymes catalyze chemical reactions of specific substances, which are converted into electrical signals; antibodies capture specific antigens, which are then converted into light signals through fluorescent labels. The biosensor described in this specification is a protein-based biosensor whose function changes mainly through allosteric effects or polymer formation. After sensing a target substance, it influences the expression of downstream reporter genes through allosteric effects or polymer formation.

[0184] In this specification, the terms “polypeptide,” “peptide,” and “protein” are used interchangeably herein and refer to an amino acid polymer of any length. The polymer may be linear or branched, may contain modified amino acids, and may be separated by non-amino acid segments. The term also includes amino acid polymers that have been modified (e.g., through disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, or any other manipulation, such as conjugation with labeled components).

[0185] In this specification, the term "amino acid" can include natural amino acids, non-natural amino acids, amino acid analogs, and all their D and L stereoisomers. The amino acids and their three-letter and one-letter abbreviations used in this invention are shown below:

[0186] Histidine (His, H); Serine (Ser, S); Glutamic acid (Glu, E); Glutamine (Gln, Q); Glycine (Gly, G); Threonine (Thr, T); Phenylalanine (Phe, F); Aspartic acid (Asp, D); Tyrosine (Tyr, Y); Leucine (Leu, L); Isoleucine (Ile, I); Arginine (Arg, R); Alanine (Ala, A); Valine (Val, V); Tryptophan (Trp, W); Methionine (Met, M); Asparagine (Asn, N); Cysteine (Cys, C); Lysine (Lys, K); Proline (Pro, P).

[0187] In this specification, the term "wild-type" refers to an object that can be found in nature. For example, a polypeptide or polynucleotide sequence that exists in an organism, can be isolated from a natural source, and has not been intentionally modified by humans in a laboratory is naturally occurring. As used in this invention, "naturally occurring" and "wild-type" are synonyms.

[0188] In this specification, the term "mutant" refers to a polynucleotide or polypeptide that contains alterations (i.e., substitutions, insertions, and / or deletions) at one or more (e.g., several) positions relative to the "wild type" or "comparative" type. Substitution refers to replacing a nucleotide or amino acid occupying a position with a different nucleotide or amino acid. Deletion refers to removing a nucleotide or amino acid occupying a position. Insertion refers to adding a nucleotide or amino acid adjacent to and immediately following the nucleotide or amino acid occupying the position.

[0189] In this specification, the term "mutated amino acid" includes "one or more amino acids that have been substituted, repeated, deleted, or added." In this invention, the term "mutation" refers to a change in the amino acid sequence. In one specific embodiment, the term "mutation" refers to "substitution."

[0190] In one embodiment, the "mutation" of this invention may be selected from "conservative mutations." In this invention, the term "conservative mutation" refers to a mutation that maintains the normal function of a protein. A representative example of a conservative mutation is a conservative substitution.

[0191] In this specification, the term "conservative substitution" refers to the replacement of an amino acid residue with an amino acid residue having a similar side chain. Families of amino acid residues having similar side chains are defined in the art and include those with basic side chains (e.g., lysine, arginine, and histidine), acidic side chains (e.g., aspartic acid and glutamic acid), uncharged polar side chains (e.g., glycine, asparagine, glutamine, serine, threonine, tyrosine, and cysteine), nonpolar side chains (e.g., alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, and tryptophan), β-branched side chains (e.g., threonine, valine, and isoleucine), and aromatic side chains (e.g., tyrosine, phenylalanine, tryptophan, and histidine). As used in this invention, a substitution generally involves exchanging an amino acid at one or more sites in a protein, and such a substitution can be conservative. In addition to conservative substitutions, conservative mutations also include naturally occurring mutations arising from individual, strain, or species differences in gene origin.
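By way of non-limiting illustration, the side-chain families listed above can be expressed as a simple lookup. The following minimal Python sketch checks whether a substitution stays within at least one family. The family labels (with the glycine/asparagine/glutamine/serine/threonine/tyrosine/cysteine group labeled as uncharged polar) and the function names `shared_families` and `is_conservative` are assumptions made for this sketch and are not part of the invention.

```python
# Side-chain families transcribed from the paragraph above (one-letter codes).
# The dictionary keys are labels chosen for this sketch only.
FAMILIES = {
    "basic": set("KRH"),
    "acidic": set("DE"),
    "uncharged_polar": set("GNQSTYC"),
    "nonpolar": set("AVLIPFMW"),
    "beta_branched": set("TVI"),
    "aromatic": set("YFWH"),
}


def shared_families(wild, new):
    """Return the sorted names of all families that contain both residues."""
    return sorted(
        name for name, members in FAMILIES.items()
        if wild in members and new in members
    )


def is_conservative(wild, new):
    """A substitution is treated as conservative if both residues share a family."""
    return wild != new and bool(shared_families(wild, new))


if __name__ == "__main__":
    for wild, new in [("K", "R"), ("I", "M"), ("K", "E")]:
        print(wild, "->", new, is_conservative(wild, new), shared_families(wild, new))
```

Under this family-based reading, a substitution such as I to M stays within the nonpolar family, whereas K to E crosses from the basic to the acidic family.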

[0192] In this specification, the term "polynucleotide" refers to a polymer composed of nucleotides. A polynucleotide can be in the form of an individual fragment or a component of a larger nucleotide sequence construct, derived from a nucleotide sequence that has been isolated at least once in a quantity or concentration that allows it to be identified, manipulated, and recovered using standard molecular biology methods (e.g., using cloning vectors). When a nucleotide sequence is represented by a DNA sequence (i.e., A, T, G, C), this also includes the RNA sequence (i.e., A, U, G, C) in which "U" replaces "T". In other words, "polynucleotide" refers to a polymer of nucleotides removed from other nucleotides (an individual fragment or an entire fragment), or it can be a component or part of a larger nucleotide construct, such as an expression vector or a polycistronic sequence. Polynucleotides include DNA, RNA, and cDNA sequences. "Recombinant polynucleotides" are a type of "polynucleotide".

[0193] In this specification, the terms "sequence identity" and "identity percentage" refer to the percentage of identical (i.e., same) nucleotides or amino acids between two or more polynucleotides or polypeptides. Sequence identity between two or more polynucleotides or polypeptides can be determined by aligning their nucleotide or amino acid sequences and counting the number of aligned positions that contain the same nucleotide or amino acid residue, compared with the number of aligned positions that contain different residues. Polynucleotides may differ at a position, for example, by containing a different nucleotide (i.e., a substitution or mutation) or by lacking a nucleotide (i.e., a nucleotide insertion or deletion in one or both polynucleotides). Polypeptides may differ at a position, for example, by containing a different amino acid (i.e., a substitution or mutation) or by lacking an amino acid (i.e., an amino acid insertion or deletion in one or both polypeptides). The percentage of identity can be calculated by dividing the number of positions containing the same nucleotide or amino acid residue by the total number of nucleotide or amino acid residues in the polynucleotide or polypeptide and then multiplying by 100.
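By way of non-limiting illustration, the calculation described above can be sketched in Python as follows. The sketch assumes the two sequences have already been aligned (with "-" marking a gap) and takes the ungapped length of the first sequence as the denominator; both choices, and the function name `percent_identity`, are assumptions of this sketch, and other denominator conventions (for example, alignment length) are also used in the art.

```python
def percent_identity(aligned_ref, aligned_query, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    Numerator: aligned positions holding the same residue (gaps never match).
    Denominator: number of residues in the reference sequence (assumption).
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(
        1 for r, q in zip(aligned_ref, aligned_query)
        if r == q and r != gap
    )
    ref_residues = sum(1 for r in aligned_ref if r != gap)
    if ref_residues == 0:
        raise ValueError("reference sequence contains no residues")
    return 100.0 * matches / ref_residues


if __name__ == "__main__":
    # Toy sequences, not taken from the sequence listing.
    print(percent_identity("MKTAYIAK", "MKTGYI-K"))
```

In the toy example, six of the eight reference positions are identical in the aligned query, one position is substituted, and one is deleted.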

[0194] In some embodiments, when comparing and aligning two or more sequences or subsequences using sequence comparison algorithms or by visual inspection to measure maximum correspondence, the two or more sequences or subsequences have a “sequence identity” or “percentage of identity” of at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% nucleotides. In some embodiments, the sequences are substantially identical along the entire length of any one or two compared biopolymers (e.g., polynucleotides).

[0195] In this specification, the term "corresponding to" has the meaning commonly understood by those skilled in the art. Specifically, "corresponding to" means that, after homology or sequence identity alignment, a position in one sequence corresponds to a specified position in another sequence. Therefore, for example, regarding "corresponding to the 150th amino acid residue of the amino acid sequence shown in Sequence 1," if a 6×His tag is added to one end of the amino acid sequence shown in Sequence 1, then the position in the resulting mutant that corresponds to the 150th position of Sequence 1 might be the 156th position.
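By way of non-limiting illustration, the position mapping in the 6×His example can be sketched in Python. The sketch assumes a pairwise alignment is already available as two equal-length strings with "-" marking gaps; the function name `map_position` and the toy sequences are hypothetical.

```python
def map_position(aligned_ref, aligned_query, ref_pos, gap="-"):
    """Map a 1-based reference position to the 1-based query position.

    Returns None if the reference residue is aligned to a gap in the query.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have the same length")
    ref_index = 0
    query_index = 0
    for r, q in zip(aligned_ref, aligned_query):
        if r != gap:
            ref_index += 1
        if q != gap:
            query_index += 1
        if r != gap and ref_index == ref_pos:
            return query_index if q != gap else None
    raise ValueError("ref_pos lies outside the reference sequence")


if __name__ == "__main__":
    # Toy reference of 160 residues with a distinctive residue at position 150.
    reference = "A" * 149 + "W" + "A" * 10
    tag = "HHHHHH"  # 6xHis tag
    # Tag at the N-terminus: every reference position shifts by six.
    print(map_position("-" * len(tag) + reference, tag + reference, 150))
    # Tag at the C-terminus: numbering before the tag is unchanged.
    print(map_position(reference + "-" * len(tag), reference + tag, 150))
```

This shows why the corresponding position "might be" the 156th: the shift applies when the tag precedes the sequence and does not apply when the tag follows it.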

[0196] In this specification, the term "expression vector" refers to a DNA construct containing a DNA sequence operatively linked to a suitable control sequence for expressing a target gene in a suitable host. "Recombinant expression vector" refers to a DNA construct for expressing, for example, a polynucleotide encoding a desired exogenous polypeptide. Recombinant expression vectors may include, for example, i) a collection of genetic elements that regulate gene expression, such as promoters and enhancers; ii) a structural or coding sequence transcribed into mRNA and translated into a protein; and iii) a transcriptional subunit containing appropriate transcription and translation initiation and termination sequences. Recombinant expression vectors are constructed in any suitable manner. The nature of the vector is not important, and any vector, including plasmids, viruses, bacteriophages, and transposons, may be used. Possible vectors used in this invention include, but are not limited to, chromosomal, non-chromosomal, and synthetic DNA sequences, such as bacterial plasmids, bacteriophage DNA, yeast plasmids, and vectors derived from combinations of plasmids and bacteriophage DNA, and DNA from viruses such as vaccinia, adenovirus, fowlpox, baculovirus, SV40, and pseudorabies. For example, as plasmid vectors, vectors based on pDZ, pBR, pUC, pBluescriptII, pGEM, pTZ, pCL, and pET can be used. Specifically, vectors such as pDZ, pDC, pDCM2, pACYC177, pACYC184, pCL, pECCG117, pUC19, pBR322, pMW118, pCC1BAC, and pXMJ19 can be used, but they are not limited to these, as long as they can be replicated and expressed in Corynebacterium glutamicum. As phage vectors, vectors such as pWE15, M13, MBL3, MBL4, IXII, ASHII, APII, t10, t11, Charon4A, and Charon21A can be used.

[0197] In this specification, the term "host cell" means any cell type that can readily be made to contain a mutant of the present invention, or a polynucleotide or recombinant expression vector encoding the mutant.

[0198] In this specification, the term "recombinant host cell" refers to a host cell that differs from the parent cell after the introduction of exogenous polynucleotides, nucleic acid constructs, or recombinant expression vectors. Recombinant host cells are specifically achieved through transformation.

[0199] II. Directed Evolution System of Metabolic Enzymes Based on Growth Coupling

[0200] In some aspects of the present invention, a growth-coupled directed evolution system for metabolic enzymes is provided, wherein the growth-coupled directed evolution system for metabolic enzymes comprises: a host cell, and an orthogonal DNA replication system and a biosensor system integrated into the host cell.

[0201] In some embodiments, the orthogonal DNA replication system is used to mutate a target gene encoding at least one metabolic enzyme in order to carry out continuous directed evolution of the target gene.

[0202] In some embodiments, the biosensor system includes a biosensor and a selectable marker gene and / or reporter gene, the biosensor being configured to regulate the expression of the selectable marker gene and / or reporter gene in response to metabolites in a metabolic pathway involving the metabolic enzyme.

[0203] The biosensor couples the concentration of the metabolite with the survival or growth advantage of the host cell, so that the host cells that survive under selection pressure contain the target gene that has evolved to promote the production of the metabolite.

[0204] In some embodiments, the present invention provides a growth-coupled directed evolutionary screening system for metabolic enzymes, comprising an orthogonal DNA replication system (e.g., the orthogonal DNA replication system comprising a cytoplasmic linear plasmid and a TP-DNApol error-prone DNA polymerase), one or more biosensor systems that can be integrated into the host cell genome (e.g., comprising a biosensor, and a selection marker gene and / or a reporter gene, wherein the biosensor is coupled to the function of the evolved metabolic enzyme / metabolic pathway, inducing the expression and repression of downstream selection marker genes and / or reporter genes to achieve the purpose of screening and enrichment).

[0205] The growth-coupled directed evolutionary screening system for metabolic enzymes provided by this invention can be used to screen for metabolic enzymes and pathways with stronger activity. The metabolic enzymes evolved using this system exhibit functional changes, which can be reflected in the expression of selectable marker genes and / or reporter genes, and can automatically and continuously enrich beneficial mutations.

[0206] (Metabolites)

[0207] In some embodiments, the metabolites include intracellular metabolites and / or metabolites that diffuse outside the cell.

[0208] In some alternative embodiments, the intracellular metabolite includes p-coumaroyl-CoA.

[0209] In some alternative implementations, the metabolites that diffuse into the extracellular space include naringenin.

[0210] (Biosensor system)

[0211] In some embodiments, the biosensor system includes a biosensor, as well as a selectable marker gene and / or a reporter gene.

[0212] In some implementations, the biosensor, as well as the selectable marker gene and / or reporter gene, are expressed by different promoters.

[0213] When the metabolite is absent or the metabolite concentration is insufficient to trigger a response from the biosensor, the biosensor binds to the binding sequence to inhibit the expression of the selectable marker gene and / or reporter gene.

[0214] When, in host cells that survive under selection pressure, the target gene has evolved into a target gene that promotes the production of the metabolite, the biosensor binds to the metabolite, thereby promoting the expression of the selectable marker gene and / or reporter gene.
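By way of non-limiting illustration, the two states described above (repression when the metabolite is absent or insufficient, and expression once the biosensor binds the metabolite) can be sketched as a dose-response curve. The Hill-type function and every parameter value below (`k_half`, `hill_n`, `basal`, `maximum`) are assumptions introduced for this sketch only; the specification does not state a quantitative response model.

```python
def marker_expression(metabolite, k_half=1.0, hill_n=2.0, basal=0.05, maximum=1.0):
    """Relative expression of the selectable marker / reporter gene.

    Assumed model: the biosensor represses the promoter unless it binds the
    metabolite, so expression rises from a leaky basal level toward a maximum
    as the metabolite concentration increases (Hill-type derepression).
    All parameters are hypothetical and in arbitrary units.
    """
    if metabolite < 0:
        raise ValueError("metabolite concentration cannot be negative")
    bound = metabolite ** hill_n / (k_half ** hill_n + metabolite ** hill_n)
    return basal + (maximum - basal) * bound


if __name__ == "__main__":
    for concentration in (0.0, 0.5, 1.0, 2.0, 10.0):
        print(concentration, round(marker_expression(concentration), 3))
```

In such a model, a host cell whose evolved enzyme raises the metabolite concentration expresses more of the selectable marker and therefore tolerates a higher selection pressure.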

[0215] In some implementations, the biosensor is expressed by a constitutive promoter.

[0216] In some implementations, the promoter driving the expression of the selectable marker gene and / or reporter gene contains a binding sequence of the biosensor.

[0217] In some embodiments, the biosensor system includes a biosensor, a selectable marker gene, and a reporter gene, wherein the selectable marker gene and the reporter gene are expressed by the same promoter. In this case, it is understood that the selectable marker gene and the reporter gene can be linked by a nucleotide sequence encoding a self-cleaving peptide 2A (2A). Different types of self-cleaving peptides 2A and their amino acid and nucleotide sequences are well known and available to those skilled in the art.

[0218] In some embodiments, the biosensor system is integrated into the genome of the host cell. Methods for integrating a biosensor system, that is, its biosensor together with the selectable marker gene and / or reporter gene, into the genome of a host cell are known to those skilled in the art, for example, homologous recombination.

[0219] In some embodiments, the biosensor, along with the selectable marker gene and / or reporter gene, may be integrated into the same site in the genome of the host cell. In other embodiments, the biosensor, along with the selectable marker gene and / or reporter gene, may be integrated into different sites in the genome of the host cell.

[0220] In some embodiments, the biosensor, along with the selectable marker gene and / or reporter gene, can be constructed into the same expression construct for integration into the host cell's genome. In other embodiments, the biosensor, along with the selectable marker gene and / or reporter gene, can be constructed into different expression constructs for integration into the host cell's genome.

[0221] In some exemplary embodiments, the biosensor system or the expression construct comprising the biosensor system includes the following structure:

[0222] [Promoter]-[Biosensor]-[Promoter containing the binding sequence of the biosensor]-[Selectable marker gene and / or reporter gene].

[0223] In some embodiments, the biosensor system may also include screening genes for screening whether host cells have successfully integrated the biosensor system, such as nutrient marker genes, for example, but not limited to URA, LEU, HIS, ADE2, TRP1, and MET17.

[0224] In some exemplary embodiments, the biosensor system or the expression construct comprising the biosensor system includes the following structure:

[0225] [Promoter]-[Biosensor]-[Promoter containing the binding sequence of the biosensor]-[Selectable marker gene and / or reporter gene]-[Promoter]-[Screening gene].

[0226] In some embodiments, the biosensor system may also include enzymes required in metabolic pathways, other than the metabolic enzymes to be evolved, such as CHI.

[0227] In some exemplary embodiments, the biosensor system or the expression construct comprising the biosensor system includes the following structure:

[0228] [Promoter]-[Biosensor]-[Promoter containing the binding sequence of the biosensor]-[Selectable marker gene and / or reporter gene]-[Promoter]-[Required enzyme other than the metabolic enzyme to be evolved].

[0229] CouR system

[0230] In some specific implementations, the biosensor includes CouR or a functional variant thereof.

[0231] In some optional embodiments, the CouR comprises an amino acid sequence as shown in SEQ ID NO:8, or an amino acid sequence having at least 80% identity with SEQ ID NO:8.

[0232] In some optional embodiments, the binding sequence of the CouR or a functional variant thereof comprises a nucleotide sequence as shown in SEQ ID NO:23, or a nucleotide sequence having at least 80% identity with SEQ ID NO:23.

[0233] In some implementations, the promoters driving the expression of CouR or its functional variants include constitutive promoters.

[0234] In some exemplary embodiments, the promoter driving the expression of CouR or its functional variants comprises a nucleotide sequence (pADH1) as shown in SEQ ID NO:10, or a nucleotide sequence having at least 80% identity with SEQ ID NO:10.

[0235] In some implementations, the selectable marker gene and / or reporter gene are expressed by a promoter containing a binding sequence of the biosensor.

[0236] In some alternative embodiments, the promoter containing the binding sequence of the biosensor comprises a nucleotide sequence as shown in any one of SEQ ID NO:19-22, or a nucleotide sequence having at least 80% identity with any one of SEQ ID NO:19-22.

[0237] In some preferred embodiments, the promoter containing the binding sequence of the biosensor comprises a nucleotide sequence as shown in SEQ ID NO:19, or a nucleotide sequence having at least 80% identity with SEQ ID NO:19.

[0238] TtgR system

[0239] In some specific implementations, the biosensor includes TtgR or a functional variant thereof.

[0240] In some optional embodiments, the TtgR comprises an amino acid sequence as shown in SEQ ID NO:9, or an amino acid sequence having at least 80% identity with SEQ ID NO:9.

[0241] In some optional embodiments, the binding sequence of the TtgR or a functional variant thereof comprises a nucleotide sequence as shown in SEQ ID NO:24, or a nucleotide sequence having at least 80% identity with SEQ ID NO:24.

[0242] In some implementations, the promoters driving the expression of TtgR or its functional variants include constitutive promoters.

[0243] In some exemplary embodiments, the promoter driving the expression of TtgR or its functional variants comprises a nucleotide sequence (pADH1) as shown in SEQ ID NO:10, or a nucleotide sequence having at least 80% identity with SEQ ID NO:10.

[0244] In some implementations, the selectable marker gene and / or reporter gene are expressed by a promoter containing a binding sequence of the biosensor.

[0245] In some alternative embodiments, the promoter containing the binding sequence of the biosensor comprises a nucleotide sequence as shown in any one of SEQ ID NO:15-18, or a nucleotide sequence having at least 80% identity with any one of SEQ ID NO:15-18.

[0246] In some preferred embodiments, the promoter containing the binding sequence of the biosensor comprises a nucleotide sequence as shown in SEQ ID NO:15 or 17, or a nucleotide sequence having at least 80% identity with SEQ ID NO:15 or 17.

[0247] (Orthogonal DNA replication system and metabolic enzymes)

[0248] In some embodiments, the orthogonal DNA replication system comprises:

[0249] an exogenous plasmid containing a target gene encoding at least one metabolic enzyme; and an error-prone DNA polymerase expression plasmid.

[0250] In some preferred embodiments, the orthogonal DNA replication system is a growth-coupled yeast orthogonal replication system previously developed by the applicant (Ravikumar, A., Arrieta, A., Liu, C.C., 2014. An orthogonal DNA replication system in yeast. Nat Chem Biol 10, 175–177. https://doi.org/10.1038/nchembio.1439; Ravikumar, A., Arzumanyan, G.A., Obadi, M.K.A., Javanpour, A.A., Liu, C.C., 2018. Scalable, Continuous Evolution of Genes at Mutation Rates above Genomic Error Thresholds. Cell 175, 1946-1957.e13. https://doi.org/10.1016/j.cell.2018.10.021), which fully leverages the eukaryotic modification advantages of yeast and, by modifying the linear plasmid pGKL1 / 2 and its associated DNA polymerase, induces high-frequency mutations (~10^-5 / bp) in the target gene at the plasmid level, while the host genome mutation rate remains at an extremely low level (<10^-10 / bp).
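By way of non-limiting illustration, the practical meaning of these two rates can be worked through with simple arithmetic, taking the target-gene rate as roughly 10^-5 per bp and the genomic rate as at most 10^-10 per bp. The sketch below treats each rate as applying per replication and uses a hypothetical 1,500 bp target gene; both are assumptions of this example rather than values stated in the specification.

```python
def expected_mutations(rate_per_bp, length_bp, replications=1):
    """Expected number of mutations = rate per bp x length x replications."""
    return rate_per_bp * length_bp * replications


if __name__ == "__main__":
    target_rate = 1e-5    # approximate rate on the orthogonal plasmid
    genome_rate = 1e-10   # upper bound stated for the host genome
    gene_length = 1500    # hypothetical target gene length in bp

    per_replication = expected_mutations(target_rate, gene_length)
    per_100 = expected_mutations(target_rate, gene_length, replications=100)
    fold = target_rate / genome_rate

    print("expected mutations per gene copy per replication:", per_replication)
    print("expected mutations per gene copy over 100 replications:", per_100)
    print("fold difference in per-bp rate versus the genome:", fold)
```

Under these assumptions, a gene copy accumulates on the order of one mutation per hundred replications, while the same stretch of genomic DNA would be mutated at least about 100,000 times less often.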

[0251] In some implementations, the exogenous plasmid containing the target gene encoding at least one metabolic enzyme can be subjected to directed evolution and screening as long as it can be transformed into a host cell (e.g., yeast), without any significant limitation on the length of the target gene.

[0252] In some embodiments, the growth-coupled directed evolution screening system for metabolic enzymes provided by the present invention is used to evolve metabolic enzymes, and can directionally evolve the catalytic ability of the metabolic enzymes. Yeast carrying metabolic enzymes that conform to the direction of evolution grow more per unit time under the screening pressure.

[0253] In some specific embodiments, the metabolic enzyme includes 4-coumaric acid-CoA ligase and / or chalcone synthase.

[0254] In some specific implementations, the exogenous plasmid contains target genes encoding 4-coumaric acid-CoA ligase and chalcone synthase.

[0255] (Screening pressure)

[0256] In some implementations, all controllable factors that can affect the growth rate of host cells (e.g., yeast), such as antibiotic concentration and culture medium replacement frequency, can be used as part of the directed evolution screening of metabolic enzymes using this screening system. When such controllable factors are used as part of the directed evolution screening of metabolic enzymes using this screening system, they are also within the scope of protection of this invention.

[0257] In some embodiments, the screening pressure includes growth screening pressure. In some embodiments, the screening pressure is adjustable and includes all variables known in the art in yeast screening systems, such as increasing / decreasing antibiotic concentration, increasing / decreasing the culture dilution ratio, increasing / decreasing substrate concentration, accelerating / decelerating the frequency of culture medium replacement, and adding / reducing the types of antibiotics.

[0258] (Host cell)

[0259] In some embodiments, the host cell includes prokaryotic cells or eukaryotic cells.

[0260] In some optional embodiments, the prokaryotic cells include Escherichia coli.

[0261] In some optional embodiments, the eukaryotic cells include yeast, mammalian cells, insect cells, plant cells, and / or fungi.

[0262] In some optional embodiments, the host cell comprises yeast, including Saccharomyces cerevisiae, Saccharomyces kiwifruit, Saccharomyces cerevisiae, Saccharomyces rubrum, or Saccharomyces pastoris.

[0263] In some preferred embodiments, the host cell includes Saccharomyces cerevisiae.

[0264] (Selectable marker genes and reporter genes)

[0265] In some embodiments, the selectable marker gene includes an antibiotic resistance gene or a nutrient marker gene; and / or, the reporter gene comprises a gene encoding a fluorescent protein.

[0266] In some optional embodiments, the antibiotic resistance gene includes at least one selected from bleomycin resistance gene, kanamycin resistance gene, hygromycin resistance gene, norristin gene, and amphotericin gene.

[0267] In some optional embodiments, the nutrient marker gene includes at least one selected from URA, LEU, HIS, ADE2, TRP1, and MET17.

[0268] In some optional implementations, selection marker genes are used for screening. This involves first knocking out endogenous genes in the host cell that are associated with the selection marker gene, and then introducing a new selection marker gene to accurately obtain the desired selection marker gene.

[0269] In some optional embodiments, the fluorescent protein includes green fluorescent protein or a derivative thereof, and / or red fluorescent protein or a derivative thereof.

[0270] III. Directed Evolution Methods for Growth-Coupled Metabolic Enzymes

[0271] In some aspects of the present invention, a growth-coupled directed evolution method for metabolic enzymes is provided, which uses the growth-coupled directed evolution system for metabolic enzymes described above to evolve metabolic enzymes, wherein host cells are cultured and screened, and based on the host cells that survive under screening pressure, catalytic enzyme mutants that conform to the direction of directed evolution are obtained.

[0272] In some implementations, a growth-coupled directed evolution method for metabolic enzymes is used to obtain the mutant protein sequence (mutated amino acid sequence) of the catalytic enzyme mutant, and optionally, the detection value of a reporter gene.

[0273] For example, the detection value of the reporter gene is the fluorescence intensity of the fluorescent protein, such as the average fluorescence intensity, which can be obtained by numerical methods in the art, such as flow cytometry.

[0274] According to some embodiments of the present invention, the screening pressure described is adjustable, including all variables known in the art in yeast screening systems, such as increasing / decreasing antibiotic concentration, accelerating / decelerating the frequency of culture medium replacement, and adding / reducing the types of antibiotics.

[0275] In some specific implementations, the growth-coupled directed evolution method for metabolic enzymes includes:

[0276] (A) Constructing exogenous expression plasmids containing the target gene to be evolved;

[0277] (B) Constructing expression plasmids for error-prone DNA polymerases;

[0278] (C) Constructing expression plasmids for biosensor systems;

[0279] (D) The exogenous expression plasmid described in step (A), the error-prone DNA polymerase expression plasmid described in step (B), and the expression plasmid of the biosensor system described in step (C) are co-transformed into host cells.

[0280] (E) The host cells are cultured under selection pressure, and the host cells that survive the selection pressure contain the target gene that has evolved to promote the production of the metabolite.
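By way of non-limiting illustration, the enrichment that step (E) relies on can be sketched as a deterministic competition between two cell populations during serial passaging. The model below (exponential growth, a fixed growth-rate advantage for cells carrying an improved enzyme, and dilution that removes both populations in equal proportion) and all parameter values are assumptions of this sketch, not measurements from the invention.

```python
import math


def enrich(initial_fraction, rate_improved, rate_parent, hours_per_passage, passages):
    """Fraction of cells carrying the improved variant after each passage.

    Both populations grow exponentially for `hours_per_passage`; dilution
    into fresh medium scales both equally, so only the ratio is tracked.
    """
    if not 0.0 < initial_fraction < 1.0:
        raise ValueError("initial_fraction must lie strictly between 0 and 1")
    fractions = [initial_fraction]
    fraction = initial_fraction
    for _ in range(passages):
        improved = fraction * math.exp(rate_improved * hours_per_passage)
        parent = (1.0 - fraction) * math.exp(rate_parent * hours_per_passage)
        fraction = improved / (improved + parent)
        fractions.append(fraction)
    return fractions


if __name__ == "__main__":
    # Hypothetical values: 0.1% improved cells, growth rates in 1/hour.
    history = enrich(0.001, rate_improved=0.30, rate_parent=0.20,
                     hours_per_passage=24, passages=5)
    for passage, fraction in enumerate(history):
        print(passage, round(fraction, 4))
```

Raising the screening pressure (for example, a higher antibiotic concentration) corresponds in this sketch to widening the gap between `rate_improved` and `rate_parent`, which speeds up enrichment.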

[0281] In some implementations, the backbone of the exogenous expression plasmid described in step (A) is derived from a cytoplasmic linear plasmid of a non-host eukaryote.

[0282] Optionally, the non-host eukaryote is Kluyveromyces lactis.

[0283] Furthermore, the exogenous expression plasmid contains sequences homologous to selectable marker loci such as URA3, LEU2, HIS3, and TRP1 in the yeast genome, which are used to integrate the plasmid into a predetermined site in the genome through homologous recombination.

[0284] In some embodiments, the biosensor system described in step (C) includes the antibiotic bleomycin resistance gene (BleoR). Optionally, the screening marker gene used in step (C) or step (A) includes at least one selected from URA3, LEU2, HIS3, TRP1, and MET15.

[0285] IV. AI-based metabolic pathway optimization method based on growth-coupled metabolic enzyme directed evolution system

[0286] In some aspects of the present invention, an AI-based method for optimizing metabolic pathways by reconstructing metabolic pathways based on a growth-coupled directed evolution system of metabolic enzymes is provided, the method comprising:

[0287] The metabolic enzyme is evolved using the growth-coupled directed evolution system for metabolic enzymes described in Section II, "Directed Evolution System of Metabolic Enzymes Based on Growth Coupling." This involves culturing and screening host cells, obtaining catalytic enzyme mutants that conform to the direction of directed evolution based on the host cells that survive the screening pressure, and obtaining the mutant protein sequences of the catalytic enzyme mutants; and

[0288] The mutant protein sequences corresponding to the screened catalytic enzyme mutants and the wild-type protein sequences corresponding to the wild-type catalytic enzymes are input into the screening model to obtain the fitness score corresponding to each catalytic enzyme mutant; based on the fitness score, the target catalytic enzyme mutants are screened.

[0289] In some implementations, the screening model includes a one-dimensional vector extraction unit, a two-dimensional vector extraction unit, a geometric encoder, and a multilayer perceptron. The step of inputting the mutant protein sequences corresponding to the screened catalytic enzyme mutants and the wild-type protein sequences corresponding to the wild-type catalytic enzymes into the screening model to obtain the fitness score for each catalytic enzyme mutant proceeds as follows. The one-dimensional vector extraction unit performs one-dimensional feature extraction on the mutant protein sequence to obtain a one-dimensional mutation vector characterizing the evolutionary information of the catalytic enzyme mutant, and performs one-dimensional feature extraction on the wild-type protein sequence to obtain a one-dimensional wild-type vector characterizing the evolutionary information of the wild-type catalytic enzyme. The two-dimensional vector extraction unit performs two-dimensional feature extraction on the mutant protein sequence to obtain a two-dimensional mutation vector characterizing the function-related geometric features of the catalytic enzyme mutant, and performs two-dimensional feature extraction on the wild-type protein sequence to obtain a two-dimensional wild-type vector characterizing the function-related geometric features of the wild-type catalytic enzyme. The one-dimensional mutation vector, one-dimensional wild-type vector, two-dimensional mutation vector, and two-dimensional wild-type vector are then input into the geometric encoder to obtain a first node embedding vector corresponding to the catalytic enzyme mutant and a first edge embedding vector connecting the first node embedding vectors, as well as a second node embedding vector corresponding to the wild-type catalytic enzyme and a second edge embedding vector connecting the second node embedding vectors. Finally, the first node embedding vector, the first edge embedding vector, the second node embedding vector, and the second edge embedding vector are input into the multilayer perceptron to obtain the fitness score.
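The data flow of the screening model described above can be illustrated with a toy sketch. This is not the model of the invention: the one-hot encoding, the sequence-separation map, the random weights, the embedding sizes, and every function name below are hypothetical placeholders chosen only to show how the four units pass data to one another.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
rng = np.random.default_rng(0)  # fixed seed so the toy weights are reproducible

# Hypothetical, untrained weights; a real model would learn these.
W_NODE = rng.normal(size=(20, 8))
W_EDGE = rng.normal(size=(1, 4))
W_MLP1 = rng.normal(size=(24, 16))
W_MLP2 = rng.normal(size=(16, 1))

def extract_1d(seq):
    """Placeholder for the one-dimensional extraction unit: one-hot rows, shape (L, 20)."""
    idx = [AMINO_ACIDS.index(a) for a in seq]
    return np.eye(len(AMINO_ACIDS))[idx]

def extract_2d(seq):
    """Placeholder for the two-dimensional extraction unit: pairwise map, shape (L, L)."""
    pos = np.arange(len(seq))
    return 1.0 / (1.0 + np.abs(pos[:, None] - pos[None, :]))

def geometric_encoder(x1d, x2d):
    """Toy encoder returning a pooled node embedding (8,) and edge embedding (4,)."""
    node = np.tanh((x2d @ x1d) @ W_NODE)      # mix residues by pairwise weight, then project
    edge = np.tanh(x2d[..., None] @ W_EDGE)   # project each pairwise value to 4 channels
    return node.mean(axis=0), edge.mean(axis=(0, 1))

def fitness_score(mutant_seq, wild_type_seq):
    """Encode mutant and wild type separately, then score them with a two-layer perceptron."""
    n1, e1 = geometric_encoder(extract_1d(mutant_seq), extract_2d(mutant_seq))
    n2, e2 = geometric_encoder(extract_1d(wild_type_seq), extract_2d(wild_type_seq))
    features = np.concatenate([n1, e1, n2, e2])   # 8 + 4 + 8 + 4 = 24 values
    hidden = np.maximum(features @ W_MLP1, 0.0)   # ReLU hidden layer
    return float((hidden @ W_MLP2)[0])

# Hypothetical short sequences, used only to exercise the pipeline.
print(fitness_score("MKTAYIAKQR", "MKTAYIAKQK"))
```

Because the weights are random, the printed score has no biological meaning; the sketch only shows that the mutant and the wild type are encoded in parallel and scored jointly.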

[0290] V. Mutants obtained by AI-based reconstructed metabolic pathway optimization methods based on growth-coupled metabolic enzyme directed evolution systems

[0291] In some aspects of the invention, mutants obtained by an AI-based metabolic pathway optimization method based on a growth-coupled directed evolution system of metabolic enzymes are provided.

[0292] In some embodiments, the mutant includes a 4-coumaric acid-CoA ligase (4CL) mutant.

[0293] The 4-coumaric acid-CoA ligase mutant corresponds to the amino acid sequence shown in SEQ ID NO: 5 and has any of the mutations shown in (m1)-(m30) below:

[0294] (m1) I284M, I533M;

[0295] (m2) N415S, I533M;

[0296] (m3) I284M, N415S;

[0297] (m4) F110R, K426E, L450S;

[0298] (m5) I167V, N397K, K426E;

[0299] (m6) F110K, E190G, I454V;

[0300] (m7) F110R, N238D, E331V, S532A;

[0301] (m8) N397K, V498E, S532A, D545G;

[0302] (m9) N397G, K426E;

[0303] (m10) F110R, I167N, N397K;

[0304] (m11) T3I, F110K, M318K;

[0305] (m12) S262G, M318K, D488G;

[0306] (m13) N397K, K426E, L450S;

[0307] (m14) I46S, I167T, K426E;

[0308] (m15) M318K, E331V, D488G;

[0309] (m16) F110R, E331V, E365G;

[0310] (m17) F110R, N397K;

[0311] (m18) I271L, N397G, K426E, I541T;

[0312] (m19) I271L, K426E, D488G, I541T;

[0313] (m20) F110K, N397K;

[0314] (m21) I46S, K426E;

[0315] (m22) N415S, K544S;

[0316] (m23) N415S, T423S;

[0317] (m24) N397K, K426E;

[0318] (m25) F110R, I252T, M318K, E331V;

[0319] (m26) F110K, V185G, E331V, I505L;

[0320] (m27) F110K, V246G, E365G, S532A;

[0321] (m28) V8D, Q130L, M318K, S532A;

[0322] (m29) D13G, M318K, N397K, I524V;

[0323] (m30) F110K, I271L, N397K, I541T.

[0324] In some implementations, the mutant includes a chalcone synthase (CHS) mutant.

[0325] The chalcone synthase mutant corresponds to the amino acid sequence shown in SEQ ID NO: 6 and has any of the mutations shown in (n1)-(n30) below:

[0326] (n1) D61L, A308K;

[0327] (n2) D61R, K66E, A308K;

[0328] (n3) K66E, A308K;

[0329] (n4) K66E, K67R, A308K;

[0330] (n5) K281E, S293H, A308K;

[0331] (n6) K66E, I229T, K234R, A308K;

[0332] (n7) V2I, K281E, A308K;

[0333] (n8) K281E, S293T, A308K;

[0334] (n9) K66E, S208N, A308K;

[0335] (n10) D61A, K66E, A308K;

[0336] (n11) K66E, A308R;

[0337] (n12) K66E, K67R, A308K, L343H;

[0338] (n13) D61L, K281E, A308K;

[0339] (n14) K66E, S208N;

[0340] (n15) D61A, K281E, A308K;

[0341] (n16) A308K, L343H;

[0342] (n17) K55R, K281E, A308K;

[0343] (n18) D61A, K66E;

[0344] (n19) K281E, A308K;

[0345] (n20) S293T, A308K;

[0346] (n21) D61R, K66E;

[0347] (n22) K55R, D61R, K66E, K67R;

[0348] (n23) K66E, L343H;

[0349] (n24) D61A, K66E, K67R, I229T;

[0350] (n25) D61R, K66E, K67R, A308R;

[0351] (n26) K66E, K67R, S208N, I229T;

[0352] (n27) K66E, K67R, S293F, L343H;

[0353] (n28) K66E, K67R, I229T, L343H;

[0354] (n29) K66E, K67R, I229T, K234R;

[0355] (n30) D61A, K66E, K67R, K234R.

[0356] In some embodiments, the mutant comprises a combination of a 4-coumaric acid-CoA ligase mutant and a chalcone synthase mutant.

[0357] In some embodiments, the mutant comprises a combination of a 4-coumaric acid-CoA ligase mutant and a chalcone synthase mutant selected from any one of the following:

[0358] (z1) 4CL mutants I284M, I533M; CHS mutants K281E, S293T, A308K;

[0359] (z2) 4CL mutants N415S, I533M; CHS mutants D61L, A308K;

[0360] (z3) 4CL mutants I284M, I533M; CHS mutants K281E, S293H, A308K;

[0361] (z4) 4CL mutants N415S, I533M; CHS mutants K281E, S293T, A308K;

[0362] (z5) 4CL mutants I167V, N397K, K426E; CHS mutants K281E, S293T, A308K;

[0363] (z6) 4CL mutants I167V, N397K, K426E; CHS mutants D61R, K66E, A308K;

[0364] (z7) 4CL mutants F110R, I167N, N397K; CHS mutants K66E, A308K;

[0365] (z8) 4CL mutants F110R, I167N, N397K; CHS mutants D61L, A308K;

[0366] (z9) 4CL mutants I167V, N397K, K426E; CHS mutants D61L, A308K;

[0367] (z10) 4CL mutants I167V, N397K, K426E; CHS mutants V2I, K281E, A308K;

[0368] (z11) 4CL mutants I284M, I533M; CHS mutants D61L, A308K;

[0369] (z12) 4CL mutants F110R, I167N, N397K; CHS mutants K66E, K67R, A308K;

[0370] (z13) 4CL mutants I284M, I533M; CHS mutants D61R, K66E, A308K;

[0371] (z14) 4CL mutants I167V, N397K, K426E; CHS mutants K66E, A308K;

[0372] (z15) 4CL mutants I167V, N397K, K426E; CHS mutants K66E, K67R, A308K;

[0373] (z16) 4CL mutants F110R, I167N, N397K; CHS mutants D61R, K66E, A308K;

[0374] (z17) 4CL mutants F110R, I167N, N397K; CHS mutants K281E, S293T, A308K;

[0375] (z18) 4CL mutants N415S, I533M; CHS mutants D61R, K66E, A308K;

[0376] (z19) 4CL mutants I284M, I533M; CHS mutants K66E, A308K;

[0377] (z20) 4CL mutants I167V, N397K, K426E; CHS mutants K281E, S293H, A308K;

[0378] (z21) 4CL mutants F110R, I167N, N397K; CHS mutants V2I, K281E, A308K;

[0379] (z22) 4CL mutants F110R, I167N, N397K; CHS mutants K281E, S293H, A308K;

[0380] (z23) 4CL mutants N415S, I533M; CHS mutants K66E, K67R, A308K;

[0381] (z24) 4CL mutants N415S, I533M; CHS mutants K281E, S293H, A308K;

[0382] (z25) 4CL mutants N415S, I533M; CHS mutants K66E, A308K;

[0383] (z26) 4CL mutants I284M, I533M; CHS mutants V2I, K281E, A308K;

[0384] (z27) 4CL mutants N415S, I533M; CHS mutants V2I, K281E, A308K;

[0385] (z28) 4CL mutants I284M, I533M; CHS mutants K66E, K67R, A308K;

[0386] (z29) 4CL mutants I167V, N397K, K426E; CHS mutants K66E, I229T, K234R, A308K;

[0387] (z30) 4CL mutants N415S, I533M; CHS mutants K66E, I229T, K234R, A308K;

[0388] (z31) 4CL mutants I284M, I533M; CHS mutants K66E, I229T, K234R, A308K;

[0389] (z32) 4CL mutants F110R, I167N, N397K; CHS mutants K66E, I229T, K234R, A308K.

[0390] It is understandable that, taking the (z1)4CL mutants I284M and I533M and the CHS mutants K281E, S293T, and A308K as examples, they are combinations of 4-coumaric acid-CoA ligase mutants with I284M and I533M mutations corresponding to the amino acid sequence shown in SEQ ID NO: 5, and chalcone synthase mutants with K281E, S293T, and A308K mutations corresponding to the amino acid sequence shown in SEQ ID NO: 6.
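The substitution notation used in the lists above (for example I284M: isoleucine at position 284 of the reference sequence replaced by methionine) can be checked and applied programmatically. The following is a minimal sketch; the function names and the five-residue toy reference sequence are hypothetical, and the actual sequences of SEQ ID NO: 5 and SEQ ID NO: 6 are not reproduced here.

```python
import re

# One-letter wild-type residue, 1-based position, one-letter mutant residue.
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")

def parse_mutation(label):
    """Split a label such as 'I284M' into ('I', 284, 'M')."""
    match = MUTATION_RE.match(label.strip())
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant

def apply_mutations(sequence, labels):
    """Return the sequence with every substitution applied, checking the wild-type residue."""
    residues = list(sequence)
    for label in labels:
        wild_type, position, mutant = parse_mutation(label)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{label}: position outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(f"{label}: reference has {residues[position - 1]} at {position}")
        residues[position - 1] = mutant
    return "".join(residues)

toy_reference = "MKIAN"  # hypothetical sequence, not SEQ ID NO: 5
print(parse_mutation("I284M"))                        # ('I', 284, 'M')
print(apply_mutations(toy_reference, ["I3M", "N5S"]))  # MKMAS
```

Checking the wild-type residue before substituting guards against applying a label to the wrong reference sequence, which matters here because the 4CL and CHS mutations are numbered against different sequences.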

[0391] VI. Uses of mutants

[0392] This invention also provides the mutant described in Section V, "Mutants obtained by AI-reconstructed metabolic pathway optimization method based on growth-coupled metabolic enzyme directed evolution system", for use in preparing metabolites in the flavonoid synthesis pathway.

[0393] In some specific implementations, the metabolites in the flavonoid synthesis pathway include naringenin or resveratrol.

[0394] Example

[0395] The embodiments of the present invention will be described in detail below with reference to examples. However, those skilled in the art will understand that the following examples are for illustrative purposes only and should not be considered as limiting the scope of the invention. Unless otherwise specified in the examples, conventional conditions or conditions recommended by the manufacturer are followed. Reagents or instruments whose manufacturers are not specified are all commercially available conventional products.

[0396] I. Materials and Methods

[0397] 1. Chemicals and reagents

[0398] Oligonucleotides were synthesized by Xianghong Biotechnology (Beijing, China). Enzymes used for molecular cloning, including Q5® high-fidelity DNA polymerase, Gibson Assembly® Master Mix, restriction endonucleases, and T4 DNA ligase, were purchased from New England Biolabs (MA, USA). Phanta DNA polymerase premix and Gibson Assembly® Master Mix were purchased from Vazyme Bio Inc. (Nanjing, China). Plasmid extraction was performed using the Magen® Plasmid Kit (Magen Biotechnology, Guangzhou, China).

[0399] Escherichia coli was cultured in LB medium containing 10 g/L tryptone, 5 g/L yeast extract, and 10 g/L NaCl.

[0400] Saccharomyces cerevisiae was cultured in YPD medium (20 g/L peptone, 10 g/L yeast extract, and 20 g/L glucose) or in synthetic defined (SD) dropout medium (Coolaber, China).

[0401] The SD medium was supplemented with the appropriate amino acid dropout mixture (DO supplement) according to the plasmid selection markers. All solid media contained 2% agar.

[0402] Naringenin used for biosensor validation was purchased from Sigma-Aldrich (MO, USA), while naringenin chalcone was purchased from ChemFaces (Wuhan, China).

[0403] CHS enzyme assay substrates—p-coumaroyl-CoA and malonyl-CoA—were purchased from TransMIT GmbH (Giessen, Germany) and Sigma-Aldrich, respectively.

[0404] Unless otherwise stated, all other chemicals were purchased from Sigma-Aldrich.

[0405] 2. Plasmids and bacterial strains

[0406] The bacterial strains and plasmids used in subsequent examples are listed in Tables 1 and 2. All cloning experiments were performed using *E. coli* DH5α (purchased from Kangti Biotechnology, Shenzhen, China), while all strains used for protein expression and purification were derived from BL21 Star™ (DE3) ΔsucCΔfumC (purchased from Kangti Biotechnology, Shenzhen, China). Flavonoid metabolite expression, testing of the metabolite-sensing gene circuits, and construction of the continuous directed evolution system were all carried out in yeast. All strains used for continuous directed evolution were constructed from strain GAY319 (provided by Chang Liu from the University of California).

[0407] The plasmid composition of the directed evolution strain is as follows. Plasmid EC-633, used for error-prone replication of the P1 plasmid, was obtained from Addgene (#130873); it is also referred to in this specification as mutagenic plasmid Ec633 and encodes the error-prone polymerase TP-DNAP1-4-2. The plasmid used for genomic homologous recombination was provided by Chen Ye of the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences. Homologous recombination plasmids for the P1 regulatory gene, small molecule receptor gene, and flavonoid metabolic pathway gene were synthesized by Tsingke and Genewiz.

[0408] In subsequent embodiments, since these genes have multiple names, URA3, LEU2, HIS3, BleoR, TtgR, CouR, 4CL, CHS, and CHI are used to refer to the corresponding genes.

[0409] The yeast strain GA-Y319 (MATa can1 his3 leu2Δ0 ura3Δ0 trp1Δ0 flo1+p1_wt+p2_wt) used for testing and evolution was provided by Liu Chang. The yeast strain CYE72 (BY4741+lacI, xylR,tetR) and its evolutionary parent strain BY4741 used for FACS analysis were provided by Chen Yehui. It should be noted that the yeast strain used for growth-coupled resistance testing and evolution in the embodiments of this application is GA-Y319, the public source of which can be found in the prior art literature (García-García, JD, et al., Using continuous directed evolution to improve enzymes for plant applications. Plant Physiology, 2022, 188(2): 971-983.), which is available to the public based on this public information.

[0410] The backbone of exogenous expression plasmids containing the target gene to be evolved, expression plasmids of error-prone DNA polymerases, and expression plasmids for biosensor systems can also be found in the disclosure of patent CN 116790652 A, which is incorporated herein by reference.

[0411]

[0412] 3. Plasmid construction

[0413] All primers used in subsequent examples were purchased from Xianghong Biotechnology (Beijing, China). All high-fidelity enzymes and homologous recombinases required for PCR and cloning were purchased from Vazyme and NEB. All individually cloned plasmids were assembled using Gibson Assembly or Golden Gate Assembly according to the manufacturer's instructions. The oligonucleotide primers used to sequence the evolved sites included trp-seq-F 5'-ATGGCGTTATTGGTGTTGAT-3' (SEQ ID NO:1), Ura-seq-F 5'-ACAGTATAGAACCGTGGATG-3' (SEQ ID NO:2), and Orf4-R 5'-CATCTCTTCTACCAAGACCT-3' (SEQ ID NO:3). Key genetic components used in subsequent examples are listed in Tables 1 and 2.

[0414] Table 2. Yeast strains used in this study

[0415]

[0416] 4. Transformation and culture of strains

[0417] All yeast plasmids used are listed in Table 1. The plasmids for each strain were linearized using the BsaI restriction endonuclease from NEB and then integrated stepwise into the yeast genome or into P1 according to a standard electroporation protocol. Synthetic defined dropout medium (SD medium) was purchased from Coolaber and served as the basal medium for yeast growth under all conditions, with the appropriate amino acid dropout mixture (DO supplement) selected according to the auxotrophic marker of the transforming plasmid. Yeast strains in liquid medium were cultured with shaking at 30°C and 220 rpm, while yeast on solid medium (2% agar) was cultured in an incubator at 30°C. *E. coli* strains in liquid medium were cultured with shaking at 37°C and 220 rpm, while *E. coli* on solid medium (2% agar) was cultured at 37°C.

[0418] 5. Yeast Transformation

[0419] Preparation of competent yeast cells: First, GAY319 was inoculated into 5 mL of yeast culture medium and cultured at 30°C and 220 rpm for 24 hours with shaking to prepare a yeast seed culture. Then, the saturated seed culture was diluted to an OD of approximately 0.15 (100 mL) and grown to the logarithmic growth phase, with an OD of 0.6-0.8 (approximately 5 hours). The centrifuge tubes, EP tubes, and pipette tips used to prepare the competent cells were sterile and pre-chilled, and all operations were performed aseptically on ice. The yeast pellet was washed twice with 50 mL of pre-chilled ddH2O, then once with 50 mL of electroporation solution (1 M sorbitol + 1 mM calcium chloride), and finally resuspended in 200 μL of electroporation solution.

[0420] Yeast transformation: A total of 11 μg of DNA (5 μg linearized sensor plasmid, 5 μg linearized plasmid containing metabolic enzymes, and 1 μg mutagenic plasmid) was added to 100 μL of competent cells, gently mixed, and incubated on ice for 5 minutes. The cells were electroporated at 2.5 kV and 25 μF, followed immediately by the addition of 1 mL of pre-chilled YPD medium and incubation at 30°C for 1 hour.

[0421] Plating: The transformation mixture was centrifuged at 3000 rpm for 1 minute, the supernatant was discarded, and the pellet was resuspended in 1 mL ddH2O and plated onto the corresponding auxotrophic yeast selective plates.

[0422] 6. Yeast fermentation

[0423] Single colonies of the parental strain were inoculated into 5 mL of SD-Leu liquid medium and cultured overnight at 30°C and 220 rpm with shaking. The saturated culture was then transferred to 50 mL of SD dropout medium at a 1:10 inoculation ratio and cultured at 30°C and 220 rpm with shaking for 24 hours (for resveratrol production) or 48 hours (for naringenin production). During the culture, p-coumaric acid was added as a substrate to a final concentration of 1.5 mM. After fermentation, the fermentation broth was extracted with an organic solvent, sonicated for 5 minutes, and then centrifuged at 5000 rpm for 5 minutes. The supernatant was collected for high-performance liquid chromatography (HPLC) analysis.

[0424] 7. HPLC analysis

[0425] HPLC analysis was performed using an Agilent 1260 Infinity II HPLC system (Agilent Technologies, Santa Clara, CA, USA) equipped with a photodiode array detector (DAD). Chromatographic separations were performed using an Agilent ZORBAX SB-C18 column (4.6 × 250 mm, 5 μm). The mobile phase consisted of water (A) and acetonitrile (B), both containing 0.1% formic acid. The gradient elution program was as follows: 0–5 min, 10% B; 5–15 min, 10–30% B; 15–25 min, 30–50% B; 25–30 min, 50–100% B; 30–35 min, 100% B; 35–40 min, 100–10% B; 40–45 min, 10% B. The flow rate was 1.0 mL/min, and the column temperature was 30°C. The detection wavelengths were 306 nm (resveratrol) and 290 nm (naringenin). Quantification was performed using the external standard method.
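The gradient elution program above is a piecewise-linear function of time. As a minimal sketch, the following encodes its breakpoints and returns the percentage of mobile phase B at any time point; the names GRADIENT and percent_b are illustrative only, and linear ramps between the stated breakpoints are assumed.

```python
# (time in min, % B) breakpoints taken from the gradient program in the text.
GRADIENT = [(0, 10), (5, 10), (15, 30), (25, 50), (30, 100), (35, 100), (40, 10), (45, 10)]

def percent_b(t_min):
    """Linearly interpolate % B (acetonitrile) at time t_min within the 45 min run."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time outside the 0-45 min program")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)

for t in (0, 10, 27.5, 32, 37.5, 45):
    print(f"{t:>5} min: {percent_b(t):.1f}% B")
```

For instance, at 10 min the program is halfway through the 10–30% B ramp, so the function returns 20% B.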

[0426]

[0427]

[0428]

[0429] In the sequences above, a single underline marks an operator sequence and a double underline marks a spacer sequence.

[0430] The binding sequence (operator sequence) of CouR or its functional variants is as follows:

[0431] TTGTTATACTCTATAACTATTCTGCACAG (SEQ ID NO: 23).

[0432] The binding sequence (operator sequence) of TtgR or its functional variants is as follows:

[0433] gtatttacaaacaaccatgaatgtaagtat (SEQ ID NO: 24).

[0434] 8. Induction and Growth Measurement

[0435] 8.1 Configuration of Gradient Induction System

[0436] The gradient induction assay employed a multi-factor orthogonal experimental design. Four standard stock solutions were first prepared: isopropyl-β-D-thiogalactoside (IPTG) solution (150 mM), bleomycin (100 mg/mL), 5-fluoroorotic acid (100 mg/mL), and L-lactic acid solution (150 mM). A 6×6 bivariate induction matrix was constructed in 96-well deep-well plates. For example, the horizontal gradient was set as the concentration gradient of IPTG, which acts on the lactose operon repressor (LacI), and the vertical gradient was set as the antibiotic concentration gradient (0-1 mg/mL). Induction parameters were adapted to the genotype of each strain: for the CYE72 strain (LacIQ), which expresses LacI at a high level, a broad IPTG gradient of 0-20 mM was used, whereas for the GAY319 strain (LacI+), which expresses LacI at a low level, a fine gradient of 0-5 mM was used. The final volume of each reaction was strictly controlled at 1000 μL, and antibiotic concentrations of 0, 0.05, 0.1, 0.2, 0.5, and 1 mg/mL were prepared. The system design follows the principle of constant volume: induction mixtures of equal volume (each containing a fixed volume of DMSO cosolvent) are prepared in advance and then dispensed, which eliminates the osmotic pressure deviation caused by volume differences. This method is applicable to all induction experiments in this project and can be extended to: (1) determination of antibiotic resistance thresholds; (2) dose-response analysis of inducible promoters (such as Ptac and Plac); (3) optimization of positive and negative bidirectional selection pressure (such as the URA3/5-FOA counter-selection system). The core of the experimental design is to maintain the stoichiometric independence of each inducing factor and to optimize multiple parameters simultaneously through the matrix arrangement, which significantly improves screening throughput. Validation of this method showed a volume control error of less than 2%, which meets the accuracy requirements of high-throughput screening.
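The constant-volume 6×6 matrix can be laid out with a short calculation. The sketch below uses the stock concentrations, the six antibiotic levels, and the 1000 μL final volume given above. The six evenly spaced IPTG levels are an assumption, since only the range is stated, and the fixed DMSO volume and the inoculum volume are ignored for simplicity. All names are illustrative.

```python
FINAL_VOLUME_UL = 1000.0
IPTG_STOCK_MM = 150.0
BLEOMYCIN_STOCK_MG_ML = 100.0
BLEOMYCIN_LEVELS_MG_ML = [0, 0.05, 0.1, 0.2, 0.5, 1.0]  # from the text

def iptg_levels_mm(max_mm, steps=6):
    """Assumed evenly spaced IPTG levels from 0 to max_mm (the text gives only the range)."""
    return [max_mm * i / (steps - 1) for i in range(steps)]

def stock_volume_ul(final_conc, stock_conc):
    """Volume of stock needed so the well reaches final_conc in FINAL_VOLUME_UL."""
    return final_conc * FINAL_VOLUME_UL / stock_conc

def build_induction_matrix(max_iptg_mm):
    """Return 36 wells; medium tops every well up to the same final volume."""
    wells = []
    for row, bleo in enumerate(BLEOMYCIN_LEVELS_MG_ML):
        for col, iptg in enumerate(iptg_levels_mm(max_iptg_mm)):
            iptg_ul = stock_volume_ul(iptg, IPTG_STOCK_MM)
            bleo_ul = stock_volume_ul(bleo, BLEOMYCIN_STOCK_MG_ML)
            wells.append({
                "well": f"{'ABCDEF'[row]}{col + 1}",
                "iptg_mm": iptg,
                "bleomycin_mg_ml": bleo,
                "iptg_ul": iptg_ul,
                "bleomycin_ul": bleo_ul,
                "medium_ul": FINAL_VOLUME_UL - iptg_ul - bleo_ul,
            })
    return wells

plate = build_induction_matrix(max_iptg_mm=20.0)  # broad gradient, as used for CYE72
print(len(plate), plate[-1])
```

Calling build_induction_matrix(5.0) gives the corresponding layout for the fine 0-5 mM gradient used with GAY319.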

[0437] 8.2 Induction and Growth Measurement

[0438] The experimental procedure can be summarized as follows:

[0439] (1) Strain activation: Single colonies of recombinant yeast were picked from the surface of the solid medium with a 10 μL pipette tip and inoculated into 3 mL of the corresponding auxotrophic liquid medium in sterile centrifuge tubes. The cultures were shaken in a constant temperature shaker at 30°C (220 rpm, 16-18 h) until the cells reached the logarithmic growth phase.

[0440] (2) Standardization of the cell suspension: 1 mL of overnight culture was taken and its optical density at 600 nm (OD600) was measured with a microplate reader. The concentration of the yeast suspension in each experimental group was then adjusted so that all strains used for screening had the same OD600, ensuring that all samples met the equal-density inoculation standard (OD600 = 1.0);

[0441] (3) Construction of the induction culture system: Equal amounts of each adjusted yeast strain were added to the prepared induction system. Using a pipette, the standardized cell suspension was transferred at 6 μL/well into a 96-well deep-well culture plate, with three independent replicates per treatment group. During inoculation, the pipetting depth and speed were kept constant. After inoculation, the wells were mixed by gentle pipetting to suspend the cells uniformly, and the plate was covered with a sterile, breathable sealing film to prevent evaporation and contamination.

[0442] (4) Dynamic growth monitoring: After inoculation, the deep-well plate was placed in a microplate constant temperature shaking culture system (parameter settings: 30 °C, 900 rpm). Timed monitoring began 8 h after the start of induction culture. Every 4 h, 100 μL of culture was transferred to a 96-well transparent flat-bottom detection plate using a multi-channel pipette, and OD600 was measured using a full-wavelength microplate reader. Measurements continued over a monitoring period of 48 hours. A blank culture medium control was measured in parallel to eliminate background interference.
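Two small calculations underlie the steps above: diluting each overnight culture to the common inoculation density, and subtracting the blank medium reading from each growth measurement. A minimal sketch follows; the function names and the 1000 μL working volume are assumptions for illustration, not values from the procedure.

```python
def dilution_volumes(measured_od600, target_od600=1.0, final_volume_ul=1000.0):
    """Return (culture_ul, diluent_ul) that bring a culture to the target OD600."""
    if measured_od600 < target_od600:
        raise ValueError("culture is below the target OD600 and cannot be diluted to it")
    culture_ul = target_od600 * final_volume_ul / measured_od600
    return culture_ul, final_volume_ul - culture_ul

def blank_corrected(od_readings, blank_reading):
    """Subtract the blank medium reading from each OD600 measurement."""
    return [od - blank_reading for od in od_readings]

# Example: an overnight culture measured at OD600 = 4.0.
print(dilution_volumes(4.0))                      # (250.0, 750.0)
print(blank_corrected([0.25, 0.5, 1.0], 0.125))   # [0.125, 0.375, 0.875]
```

The dilution follows from conservation of cell mass (measured OD × culture volume = target OD × final volume), which holds only within the linear range of the reader.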

[0443] 8.3 Flow cytometry fluorescence expression measurement

[0444] (1) Cell preparation: Cells were cultured in 96-well plates with a total volume of 500 μL per well, inoculated at a ratio of 1:200. After the induction system was prepared, the cells were cultured in a microplate shaker at 30 °C and 900 rpm for 16 h, with the plates sealed with sterile, breathable sealing film. Once the culture OD600 reached 1, 20 μL of culture was added to 180 μL of loading buffer (PBS buffer containing 1% actinomycin), and the reaction was stopped at room temperature for 1 h. The cells can be stored at 4°C for up to 48 h.

[0445] (2) Flow cytometry: Detection was performed on a flow cytometer. YFP was detected using a 488 nm laser with the FITC detection channel; BFP was detected using a 405 nm laser with the PB450 detection channel; mCherry or mRuby was detected using a 488 nm laser with the ECD detection channel. CytExpert software was used for acquisition, and samples were loaded in well plate mode, collecting 10,000 events.

[0446] (3) Statistical analysis: Appropriate cell populations were selected using SSC and FSC, and the target cell populations were analyzed. Fluorescence intensity was usually expressed as mean fluorescence intensity (MFI), and the ratio of the MFI values of different fluorescence signals represented the relative fold change. The cell collection event count was 10,000, and the needle washing time was 6 seconds.
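The statistical analysis step can be sketched as a gate on FSC and SSC followed by a ratio of mean fluorescence intensities. The rectangular gate, its thresholds, the channel names, and the three example events below are hypothetical and serve only to illustrate the calculation.

```python
from statistics import mean

def gate(events, fsc_range, ssc_range):
    """Keep events whose FSC and SSC fall inside a rectangular gate."""
    return [
        e for e in events
        if fsc_range[0] <= e["fsc"] <= fsc_range[1]
        and ssc_range[0] <= e["ssc"] <= ssc_range[1]
    ]

def mfi(events, channel):
    """Mean fluorescence intensity of one channel over the gated events."""
    return mean(e[channel] for e in events)

def relative_fold(events, reporter, reference):
    """Ratio of the MFI of the reporter channel to that of the reference channel."""
    return mfi(events, reporter) / mfi(events, reference)

# Hypothetical events; a real acquisition would collect 10,000 per sample.
events = [
    {"fsc": 100, "ssc": 50, "yfp": 200, "bfp": 100},
    {"fsc": 120, "ssc": 60, "yfp": 400, "bfp": 100},
    {"fsc": 5, "ssc": 2, "yfp": 9000, "bfp": 1},  # debris, removed by the gate
]
cells = gate(events, fsc_range=(50, 500), ssc_range=(20, 300))
print(len(cells), relative_fold(cells, "yfp", "bfp"))
```

Gating before averaging matters because a few debris events with extreme fluorescence can dominate a mean.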

[0447] 9. Yeast heterologous expression and detection

[0448] 9.1 Detection of yeast heterologous expression products

[0449] (1) Preparation of strains for preservation: Single-colony transformants were picked with sterile pipette tips and inoculated onto the surface of the specific auxotrophic synthetic defined (SD) dropout solid medium using a serial dilution plating method. After inoculation, the plates were placed in a constant temperature incubator (30±0.5°C) for aerobic culture. After the colony morphology stabilized, the plates were transferred to 4°C cold storage to maintain the strain seed bank.

[0450] (2) Secondary metabolite induction culture: The yeast was inoculated into an auxotrophic liquid medium and cultured for one day in a shaker at 30 °C and 220 rpm to prepare a seed culture. The seed culture was then transferred at an inoculation ratio of 1:10 into 50 mL of the corresponding auxotrophic SD liquid medium (containing 2 mM coumaric acid substrate) for scale-up culture and cultured for 48 hours (naringenin) or 24 hours (resveratrol) in a constant temperature shaker at 30 °C and 220 rpm.

[0451] (3) Metabolite separation and identification: 1) Extraction and purification: 50 mL of fermentation broth was used for phase separation. The cell pellet was disrupted by sonication (300 W, pulse mode) and then, together with the supernatant, extracted three times with ethyl acetate (solvent ratio 1:1.5). The combined organic phases were dried over anhydrous sodium sulfate and concentrated to constant weight using a vacuum rotary evaporator (30°C, 200 mbar). The residues were redissolved in 300 μL of chromatographic grade methanol and filtered through a 0.22 μm organic microporous membrane to prepare analytical grade samples. 2) Detection and separation of the products of the yeast feeding cultures: Chromatographic separation and mass spectrometric detection of the samples were performed by ultra-high performance liquid chromatography-quadrupole time-of-flight mass spectrometry. Chromatographic conditions: C18 reversed-phase column (2.1×100 mm, 1.7 μm), with gradient elution using 0.1% formic acid in water and acetonitrile as the mobile phase. Mass spectrometry parameters: electrospray ionization source (ESI±), scan range m/z 100-1500. HPLC-DAD detection (detection wavelengths 280/310 nm) was performed in parallel to verify the characteristic absorption of flavonoids.

[0452] 9.2 Plotting the Product Standard Curve

[0453] Naringenin, resveratrol, and p-coumaric acid standards were accurately weighed, and a series of standard solutions of varying concentrations (0.5, 0.8, 1.0, 1.5, and 2.0 mM) was prepared in volumetric flasks using methanol as the solvent. With the concentration of each standard solution as the independent variable and the peak area measured at detection wavelengths of 280 nm and 310 nm as the dependent variable, the standard curve was plotted, as shown in Figure 21. This curve is used for subsequent quantitative analysis of the target compound.

10. HPLC determination of the product

[0454] Subsequent high-performance liquid chromatography (HPLC) analyses were performed on an Agilent 1200 HPLC system. Separation was performed using a Phenomenex Gemini C18 column (150 mm × 2.0 mm, 5 μm). The mobile phase consisted of an aqueous solution containing 0.1% formic acid and acetonitrile. The elution program was as follows: the acetonitrile concentration was increased linearly from 10% to 100% over 0–25 minutes and then held at 100% acetonitrile for 30 minutes. The flow rate was set at 1.0 mL/min, and the injection volume was 5 μL.
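Peak areas from these HPLC runs are converted to concentrations using the standard curve of the corresponding compound. The following is a minimal sketch of that external standard calculation. The five standard concentrations are those stated above; the peak areas are invented placeholder values, and a linear response over the calibrated range is assumed.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

STANDARD_MM = [0.5, 0.8, 1.0, 1.5, 2.0]                 # concentrations from the text
PEAK_AREAS = [550.0, 850.0, 1050.0, 1550.0, 2050.0]      # hypothetical peak areas

slope, intercept = fit_line(STANDARD_MM, PEAK_AREAS)

def concentration_mm(peak_area):
    """Invert the standard curve to estimate the concentration of a sample."""
    return (peak_area - intercept) / slope

print(round(slope, 3), round(intercept, 3), round(concentration_mm(1300.0), 3))
```

Samples whose peak areas fall outside the range spanned by the standards would normally be diluted and re-measured rather than extrapolated.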

[0455] II. Systems for Co-evolution (CoEvo system; directed evolution system of metabolic enzymes based on growth coupling)

[0456] First, the CoEvo system was developed, a co-evolutionary platform built on the OrthoRep continuous evolution system. Using *Saccharomyces cerevisiae* as the host, OrthoRep provides approximately 100 copies of the P1 plasmid per cell and a culture density of approximately 10⁸ cells/mL, making it an ideal platform for co-evolution at the cell and population levels. This architecture enables high-throughput mapping of the metabolic landscape by capturing key sequence-function data points (see Figure 3).

[0457] To achieve co-evolution, this embodiment optimizes the orthogonal plasmid replication mechanism of OrthoRep, which couples error-prone DNA polymerases with mismatch repair, thereby introducing targeted mutations into specific sequences. Enzymes involved in metabolic pathways (i.e., catalytic enzymes) are encoded on the P1 plasmid, generating a diverse library of variants for in vivo evolution. To further improve mutation throughput, the mutagenic plasmid Ec633, which encodes an error-prone polymerase, is introduced, thereby accelerating the continuous diversification of enzymes (see Figure 4).

[0458] Specifically, TP-DNAP1-4-2 polymerase induces mutations at a substitution rate of approximately 1 × 10⁻⁵ per base, while the high copy number of the P1 plasmid ensures robust mutation coverage and evolutionary efficiency. To couple enzyme activity with evolutionary selection, a biosensor was implemented to detect metabolite concentrations and link them to the expression of selective genes and/or reporter genes. This sensor circuitry was stably integrated into the yeast genome, enabling continuous metabolic monitoring and reliable population-level measurements. This feedback loop tightly couples enzyme activity, metabolite production, and cell proliferation, allowing for rapid evolutionary cycles (approximately 4-5 hours per cycle) and facilitating tens to hundreds of rounds of evolution within days. This sensor-driven co-evolutionary system is named the CoEvo system in this invention.

[0459] In this embodiment, flavonoid metabolism was selected as the model metabolic system for the CoEvo platform (see Figure 5) because of its biological relevance, therapeutic potential, and integration into both central and secondary metabolic pathways. In this system, p-coumaroyl-CoA is retained intracellularly and serves as a cellular-level cooperative pressure, while naringenin diffuses into the extracellular culture medium and serves as a population-level pressure. The effectiveness of the CoEvo platform was validated through co-evolution of a key enzyme in the naringenin metabolic pathway, demonstrating its power in high-throughput sequence-metabolite mapping.

[0460] 1. Shared metabolite sensing at the cellular and population levels

[0461] To achieve precise metabolite sensing across cellular and population scales, dedicated biosensors for coumaroyl-CoA and naringenin were developed, enabling real-time sensing, regulation, and selection-driven screening of enzyme performance. Two complementary selection mechanisms were implemented: a fluorescent reporter gene (YFP) for rapid quantification, and a resistance gene (BleoR, conferring resistance to bleomycin) for growth-based selection and evolutionary optimization.

[0462] 1.1 Cellular-level sensing of coumaroyl coenzyme A (CouR system)

[0463] To achieve intracellular sensing of coumaroyl-CoA, CouR, a MarR family transcriptional repressor from *Rhodopseudomonas palustris*, was used. CouR regulates genes involved in coumaric acid metabolism. In the absence of a ligand, CouR binds to its operator sequence (also referred to as the CouR-specific operator sequence, the CouR binding sequence, or the CouR binding site) and represses reporter gene expression. Upon binding of the ligand (i.e., the specific metabolite), the affinity of CouR for DNA decreases, leading to derepression and transcriptional activation, which allows real-time quantification of metabolite levels based on fluorescence.

[0464] Since p-coumaroyl-CoA is membrane-impermeable (i.e., an intracellular metabolite), p-coumaric acid was supplied externally to yeast strains expressing 4CL to promote intracellular biosynthesis of p-coumaroyl-CoA and to optimize sensor sensitivity. To enhance specificity, a synthetic promoter variant incorporating a CouR binding site was designed.

[0465] The expression plasmids pSZL6-01 to pSZL6-04 carrying the couR-related genes were electroporated into yeast CYE72, and the fluorescence expression of different promoter and operator combinations was measured. Specifically, the couR coding gene was placed under the control of a constitutive promoter (such as pADH1). The couR-specific operator sequence (couO) was inserted upstream of a minimal promoter (such as the CYC1 minimal promoter) to control the expression of the reporter gene YFP (yellow fluorescent protein) and the selectable marker gene BleoR (bleomycin resistance gene). To achieve effective regulation of the downstream selectable marker and reporter genes by couR, this invention systematically optimized the insertion strategy of couO in the promoter. Preferably, couO is inserted downstream of the TATA box region of the yeast promoter to construct regulatory modules containing single and multiple copies of couO (Figure 6(a)). Experiments showed that this module exhibited the greatest fluorescence intensity contrast and dynamic range relative to ligand-free conditions. Four promoter structures with different couO site arrangements were further constructed. Among them, the pSZL6-01 configuration (pCouR-1 in Figure 6, also referred to in this specification as pCouR1 or PcouO1; likewise, pCouR-2 to pCouR-4 are also referred to as pCouR2 to pCouR4 or PcouO2 to PcouO4, respectively) showed the best response characteristics: the single operator site downstream of the TATA box exhibited the highest dynamic range and signal responsiveness (Figure 6(a)). Its YFP expression was activated more than 17-fold after ligand induction, over a wider response concentration range (Figure 6(b) and Figure 10(a)).

[0466] To evaluate the dose-response capability of the CouR system to the ligand p-coumaroyl-CoA, an indirect induction strategy was adopted: the precursor p-coumaric acid was added to engineered yeast strains carrying different copy numbers of the 4CL gene, so that different concentrations of the target ligand (low, medium, and high) were synthesized in situ inside the cells.

[0467] Specifically, to investigate the effect of 4CL gene copy number on expression levels, yeast expression plasmids pSZL6-05 to pSZL6-08, containing 2 to 5 copies of the 4CL gene, were constructed. These plasmids were built by Golden Gate assembly, which joined the "promoter-4CL gene-terminator" expression cassettes in a multi-copy configuration. Each plasmid contained the ampicillin resistance gene (Amp), the yeast 2μ origin of replication, the *E. coli* ColE1 origin of replication, the TRP1 auxotrophic selection marker, the constitutive promoter PADH1, and the terminator ADH2t. These multi-copy plasmids were introduced into yeast cells by yeast transformation to achieve multi-copy expression of the 4CL gene in yeast.

[0468] As a control, single-copy integrative strains and empty vector control strains were also constructed:

[0469] "1 copy" strain: A single 4CL gene expression cassette is integrated into the yeast genome through homologous recombination to obtain a stable single-copy expression strain. The plasmid used for its construction is pSZL6-17 (containing a single 4CL gene expression unit).

[0470] "0 copy" control strain: Yeast was transformed with a blank expression vector pSZL6-17Δ4CL (i.e., the 4CL coding sequence was deleted from pSZL6-17) that did not contain the 4CL gene, as a negative control.

[0471] All yeast transformation and expression experiments used the CEN.PK series yeast strain CYE72 as the host. The constructed multi-copy plasmids pSZL6-05 to pSZL6-08, single-copy integrated strains, and empty vector control strains were introduced into CYE72 yeast. The effect of different 4CL gene copy numbers on the expression level of the target product was systematically evaluated by detecting fluorescence reporter signals.

[0472] As mentioned earlier, strains with different 4CL copy numbers (0-5) were constructed. The results showed that without 4CL, YFP fluorescence was lowest and CouR continued to repress expression, whereas YFP expression was strongest at a 4CL copy number of 5, indicating that high ligand concentrations effectively relieved the repression (Figure 7(b)). The two-copy strain showed a moderate response, forming a dose gradient. To further validate the specificity of the CouR system, induction was performed using p-coumaric acid (p-HCA), L-tyrosine, resveratrol, isoflavone, and genistein as controls. Only the p-coumaroyl-CoA treatment group showed significant fluorescence enhancement, while the fluorescence of the control groups was close to background levels (Figure 7(c) to (e)), demonstrating the system's high selectivity for the target molecule. These specificity assays confirm that CouR responds selectively to p-coumaroyl-CoA, with no activation by its precursor p-coumaric acid, validating the high specificity of the biosensor.

[0473] In summary, as shown in Figure 7, in the absence of a ligand, CouR binds to CouO and represses reporter gene expression; when p-coumaroyl-CoA accumulates intracellularly, it binds to CouR, causing CouR to dissociate from the DNA and thereby initiating the expression of YFP and BleoR. By testing different numbers and locations of CouO sequences, sensor versions with high dynamic range and sensitivity were obtained (e.g., constructs containing a single operator site downstream of the TATA box). Specificity experiments confirmed that the sensor is highly specific for p-coumaroyl-CoA and unresponsive to its precursor p-coumaric acid.

[0474] 1.2. Population-level sensing of naringenin (TtgR system)

[0475] To achieve population-level sensing, a biosensor responsive to naringenin (also referred to in this specification as (2S)-naringenin) was developed by repurposing the TtgR operon of *Pseudomonas putida* for transcriptional regulation in yeast (Figure 8(a)). TtgR is originally involved in flavonoid metabolism and efflux regulation. By inserting the TtgO binding site into the synthetic promoter upstream of the YFP reporter gene and using the ADH1 promoter to drive TtgR expression, the system was adapted for yeast-specific control (Figure 8(a)). In this invention, the TtgR transcription factor derived from *Pseudomonas putida* DOT-T1E and its binding sequence TtgO are introduced into *Saccharomyces cerevisiae* to construct a biosensor that can respond to naringenin in a eukaryotic environment. TtgR belongs to the TetR family; in the absence of an inducer it binds to the promoter and represses transcription, and upon the addition of naringenin it dissociates from the DNA, relieving the repression of the downstream reporter gene YFP.

[0476] To optimize the system's performance in yeast, TtgO was inserted into different synthetic promoters (ttgO1 to ttgO4) to construct multiple promoter-TtgO combinations (pTtgR-1 to pTtgR-4, also referred to as PttgO1 to PttgO4 or PttgR-1 to PttgR-4 in this specification). Specifically, the ttgR-related gene expression plasmids pSZL6-09 to pSZL6-12 were transformed into yeast cells, yielding yeast strains ySZL6-09 to ySZL6-12, which were used to measure the fluorescence output of different promoter and operator combinations.

[0477] Experiments showed that TtgR effectively bound the promoter and repressed YFP expression in the absence of naringenin, and that adding 0.5 mM naringenin significantly activated YFP expression. Among the constructs, ySZL6-09 (ttgO1, i.e., the pTtgR-1 configuration) exhibited the strongest response, with more than 60-fold activation upon induction; ySZL6-11 (ttgO3) also performed well and had the lowest background leakage, combining high activation efficiency with low background. ttgO3 was therefore selected for subsequent high-throughput screening experiments (see Figure 8(b) and Figure 10(b)).
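The activation fold quoted above is the ratio of induced to uninduced reporter fluorescence. The following is a minimal sketch of that calculation, assuming background-subtracted mean fluorescence values; the function name, the construct labels, and the readings are illustrative assumptions and are not measured data from this specification.

```python
def fold_activation(induced: float, uninduced: float) -> float:
    """Ratio of induced to uninduced reporter fluorescence (background-subtracted)."""
    if uninduced <= 0:
        raise ValueError("uninduced fluorescence must be positive")
    return induced / uninduced


# Hypothetical YFP readings in arbitrary units (not measured data).
readings = {
    "construct_A": {"uninduced": 50.0, "induced": 3200.0},
    "construct_B": {"uninduced": 20.0, "induced": 1100.0},
}

for name, r in readings.items():
    fold = fold_activation(r["induced"], r["uninduced"])
    print(f"{name}: leak={r['uninduced']:.0f} a.u., activation={fold:.1f}-fold")
```

A comparison of this kind makes the trade-off explicit: one construct may give the larger fold change while another gives the lower uninduced leak, which matters more when the sensor drives a resistance marker.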

[0478] The dose-response characteristics of the TtgR system to naringenin were further evaluated. The optimal biosensor (ttgO3) exhibited a linear response to (2S)-naringenin in the range of 0–0.8 mM (Figure 9(b)). Specificity tests with pathway intermediates and structural analogs, including coumaric acid, L-tyrosine, and caffeic acid, confirmed that only (2S)-naringenin induced strong fluorescence output, validating the sensor's suitability for high-throughput screening and selection (Figure 9(a) to (d)).
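A linear response over a concentration range is usually checked by fitting a straight line to the dose-response points and inspecting the coefficient of determination. The sketch below shows one way to do this with ordinary least squares in plain Python; the function name and the concentration and fluorescence values are hypothetical and are not taken from Figure 9.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = 1.0 if syy == 0 else (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical naringenin concentrations (mM) and YFP signal (arbitrary units).
conc_mm = [0.0, 0.2, 0.4, 0.6, 0.8]
signal = [100.0, 310.0, 495.0, 705.0, 900.0]

slope, intercept, r2 = linear_fit(conc_mm, signal)
print(f"slope={slope:.1f} a.u./mM, intercept={intercept:.1f} a.u., R^2={r2:.4f}")
```

The fitted slope can then serve as a calibration factor for converting fluorescence back into an estimated ligand concentration, but only inside the fitted range.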

[0479] When the sensor was integrated into an engineered yeast strain capable of synthesizing naringenin, the fluorescence intensity of YFP was positively correlated with the intracellular naringenin concentration, validating the system's potential for real-time monitoring and high-throughput screening of metabolites.

[0480] It can be seen that the TtgR biosensor constructed in this invention has high activation fold, low background leakage, good linear response range and high specificity, and is suitable for the construction of dynamic monitoring and regulation modules for flavonoid metabolism engineering in yeast systems.

[0481] 1.3 Cooperative sensing enables variant communication and interaction

[0482] To demonstrate that the biosensor system constructed in 1.1 can detect and respond to metabolite pressure shared among coexisting variants, yeast strains carrying 0 to 5 copies of 4CL were constructed, and the biosensor output was evaluated. Using the Golden Gate assembly method, multiple "promoter-4CL-terminator" units were linked in vitro to construct multi-copy tandem expression cassettes containing 1 to 5 copies. These tandem expression cassettes, or linearized integration vectors built from them, were transformed into yeast strains ySZL6-15 or ySZL6-16 by electroporation. After induction with a substrate concentration gradient, the detected fluorescence intensity was positively correlated with the 4CL copy number, confirming that metabolite sensing effectively tracked cooperative enzyme activity (see Figure 11).

[0483] Strains ySZL6-15 and ySZL6-16 were obtained by transforming the couR-regulated gene expression plasmids pSZL6-15 and pSZL6-16, respectively, into yeast, and the expression of bleoR-p2A-YFP driven by the couR-regulated promoter was detected. Figure 11 shows the results for ySZL6-15 as an example.

[0484] Similarly, strains ySZL6-13 and ySZL6-14 were obtained by transforming the ttgR-regulated gene expression plasmids pSZL6-13 and pSZL6-14, respectively, into yeast, and the expression of bleoR-p2A-YFP driven by the ttgR-regulated promoter was detected. Subsequently, ttgR-PttgO3-bleoR-YFP (pSZL6-14 ttgR) was selected for evolutionary screening.

[0485] As shown in Figure 12(a), the sensor reporter system is a BleoR-URA3 fusion expression unit built on the coumaroyl-CoA-responsive couR sensor described above. Specifically, the DNA module from plasmid pSZL6-16 in the previous table, which expresses the couR regulatory protein and the BleoR-URA3 fusion reporter gene, was integrated into the GAY319 yeast evolutionary strain ySZL6-16, which is suitable for high-throughput growth screening, via genomic homologous recombination / electroporation. This strain background provides a broader dynamic detection range for subsequent evolution. The design combines fluorescence screening with growth selection based on bleomycin resistance and uracil auxotrophy, establishing a growth-coupled screening system for the subsequent directed evolution of metabolic pathway enzymes such as 4CL. Growth assays at increasing bleomycin concentrations further validated the production-dependent survival dynamics (see Figure 12). Strains with higher 4CL copy numbers exhibited enhanced resistance at 0.002–0.02 mg / mL bleomycin, while growth was inhibited in all strains at 0.05 mg / mL. These results demonstrate the high sensitivity of the biosensor to ligand concentration and its effectiveness in imposing stringent selection pressure during enzyme evolution. Furthermore, synergistic interactions among enzyme variants jointly promoted growth advantage, reinforcing the principle of co-evolution in metabolic optimization (see Figure 12(b)).

[0486] 1.4. Evolutionary System Integration and Application: The aforementioned biosensor system was integrated with the OrthoRep in vivo continuous evolution platform to construct an evolutionary pathway coupling yeast growth with sensor performance. Under increasing bleomycin stress, yeast could only survive when the sensor responded to the drug molecule and activated BleoR expression, thus achieving continuous optimization of sensor performance. This method, through a combination of orthogonal evolution and growth-coupled screening, efficiently obtained highly active mutants, providing a reliable platform for directed protein evolution.

[0487] 1.4.1 Coupling of Catalytic Enzyme Activity in the Yeast Evolutionary System

[0488] Based on the working principle of the OrthoRep system, this invention successfully reintegrated the designed gene circuit into the GAY319 yeast genome and completed functional verification. As shown in Figure 13(a), a drug-responsive regulatory module was embedded in the engineered yeast genome. The original reporter gene was replaced with a bifunctional marker system consisting of bleomycin resistance and fluorescent protein genes, providing an observable growth readout for continuous directed evolution. Simultaneously, the linear plasmid P1 carrying the wild-type 4-coumarate-CoA ligase (4CL) gene was introduced into the yeast host. The activity of its expression product regulates the expression levels of BleoR and YFP through CouR sensor signal transduction, thereby determining the strain's proliferation ability and fluorescent reporter intensity in an antibiotic environment. To establish a precise evolutionary selection pressure gradient, the dose-response relationship between bleomycin concentration and resistance gene expression was evaluated (Figure 13(b)). Plasmids pSZL6-15 and pSZL6-18, an error-prone DNA polymerase expression plasmid, and a growth-related gene expression plasmid were transformed into yeast to obtain strain ySZL6-20. The enzyme expressed by the evolution target sequence can drive or inhibit the expression of growth-related genes by producing flavonoid products. The yeast strain is placed in an environment with growth selection pressure; strains that survive this pressure contain evolution target sequences that conform to the intended direction of directed evolution.

[0489] The experiment was set up with antibiotic concentration gradients from 0 to 0.1 mg / mL, combined with drug-induced concentration gradient treatment.

[0490] Based on the molecular evolution principle of the OrthoRep system, this invention successfully constructs a dynamically adjustable continuous directed evolution platform (Figure 13(c)). The experiment employed a two-step culture strategy: first, the evolutionary strain was inoculated into 2 mL of auxotrophic medium (SD-Ura / -Leu) and pre-cultured at 30°C and 220 rpm for 24 h; then, the culture was transferred to a 96-well evolution plate containing 500 μL of fresh medium, and metabolic byproducts were removed by centrifugation (3000 rpm, 5 min). The initial inoculation density was then normalized to OD600 = 0.2. The evolutionary pressure module consists of two regulatory elements: (1) the concentration of coumaric acid, the precursor for enzyme catalysis (0.5 mM, determined from the previous dose-response curve); (2) a bleomycin concentration gradient (0-0.1 mg / mL), the threshold of which is dynamically adjusted according to the strain's adaptability.
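Normalizing the starting density to OD600 = 0.2 amounts to a simple dilution calculation (C1·V1 = C2·V2). The sketch below assumes that OD600 scales linearly with cell density over the range used, which generally requires diluting dense cultures before reading them; the function name and the example pre-culture reading are illustrative assumptions.

```python
def inoculum_volume_ul(culture_od: float,
                       target_od: float = 0.2,
                       final_volume_ul: float = 500.0) -> float:
    """Volume of pre-culture (in uL) needed to reach target_od in final_volume_ul.

    Uses C1 * V1 = C2 * V2 and assumes OD600 is proportional to cell density.
    """
    if culture_od <= 0:
        raise ValueError("culture OD600 must be positive")
    if target_od > culture_od:
        raise ValueError("pre-culture is less dense than the target; concentrate it first")
    return target_od * final_volume_ul / culture_od


# Hypothetical pre-culture reading after 24 h (not a measured value).
pre_culture_od = 4.0
volume = inoculum_volume_ul(pre_culture_od)
print(f"add {volume:.1f} uL of pre-culture and fill to 500 uL with fresh medium")
```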

[0491] 1.4.2 Coupling of the naringenin pathway in yeast evolutionary systems

[0492] To verify that biocatalytic enzymes can be coupled with the yeast evolutionary system, this invention constructed an initial naringenin-producing strain, ySZL6-19, by heterologously co-expressing Fj-TAL (TAL-Fj), Pc-4CL (4CL-Pc), Ha-CHS (CHS), and Ms-CHI (CHI-Ms) in yeast, and confirmed that this metabolic pathway can synthesize naringenin.

[0493] Plasmid pSZL6-22, an error-prone DNA polymerase expression plasmid, and the biosensor ttgR-PttgO3-bleoR-YFP were transformed into yeast to obtain strain ySZL6-22. The enzyme expressed by the target evolution sequence can drive or inhibit the expression of growth-related genes by producing flavonoid products. The yeast strain was placed under growth selection pressure; strains surviving this pressure contain target evolution sequences that conform to the intended direction of directed evolution. Subsequently, the coding sequences of the rate-limiting enzymes to be evolved, 4CL and CHS, were cloned into the P1 linear plasmid. Continuous mutagenesis was carried out using OrthoRep, and the TtgR sensor was introduced to drive bleoR expression, achieving regulation responsive to the final product concentration. A linear response relationship between expression activity and resistance selection was established by systematically evaluating the inhibitory effect of different bleomycin concentrations on strain growth (see Figures 14 and 15). The results showed that bleoR expression could distinguish mutants with enzyme activity differences of more than 2-fold, and the dynamic screening window covered the concentration range of 0.05-1.5 mg / mL.

[0494] Application examples

[0495] This invention establishes a laboratory evolutionary system based on bleomycin resistance gradient screening for the adaptive evolution of target enzyme genes 4CL and CHS. First, preliminary screening experiments determined that the effective selection concentration of the BleoR resistance gene under these experimental conditions was 0.1 mg / mL. Accordingly, 1 μL of bleomycin stock solution (100 mg / mL) was added to each well of a standard 500 μL reaction system to achieve the target concentration. Subsequently, the mutant strain was continuously passaged into screening systems with new concentration gradients of antibiotics, initiating a new round of laboratory evolution. This system, by continuously increasing resistance pressure, selects mutants with stronger adaptability to the target substrate and more stable expression, providing an effective strategy for obtaining high-performance 4CL and CHS enzymes. The specific operation procedure is as follows:

[0496] (1) Pre-culture and initial concentration determination of strains: Single clones of evolved strains were selected and expanded in 3 mL of auxotrophic medium in 15 mL sterile centrifuge tubes (30°C, 220 rpm). No selection pressure was applied during this stage, in order to obtain a sufficient initial cell population (OD600 = 0.8). A screening pressure threshold was established through preliminary experiments, and the bleomycin inhibitory concentration was determined using a fine concentration gradient method (0.1-1.5 mg / mL, gradient interval 0.01 mg / mL). In this experiment, 0.1 mg / mL was selected as the initial screening concentration.

[0497] (2) Construction of a multi-stage adaptive evolution culture system: A 500 μL / well evolution system was constructed using 96-well deep-well plates as the carrier, with 6 biological replicates. Each stage was performed according to the following standardized procedure: 15 μL of initial cell suspension (pre-cultured to mid-log phase) was mixed with 3 mL of composite culture medium containing a gradient of selection agent concentrations (0.5-1.5 mg / mL bleomycin or a specific concentration of 5-FOA) and then aliquoted into each well. The culture parameters were set at 30°C with shaking at 900 rpm. The top of the system was covered with a breathable sealing film to maintain humidity balance and prevent osmotic pressure changes caused by evaporation of the culture medium.

[0498] (3) Dynamic monitoring of the evolutionary process and passaging strategy: the evolutionary process was tracked by OD600 growth monitoring, and strain adaptability was assessed from changes in OD600 turbidity. When the culture turbidity reached 0.8 ± 0.05, serial subculturing was performed: 1) the bleomycin concentration in the positive selection group was increased to 1.0 mg / mL; 2) the negative control group maintained the original selection pressure but the inoculum size was reduced to 10 μL / well; 3) the backup group was temporarily stored at 4 °C. Each stage was observed for 3 ± 1 days, and the growth advantage of each well was assessed using a double-blind method.
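The passaging rule in step (3) can be expressed as a small decision function: a well is passaged once its OD600 enters the 0.8 ± 0.05 window, and the positive selection group moves to the next pressure level. The sketch below is only an illustration of that control logic; the function names and the escalation schedule are hypothetical, and the actual concentrations should be taken from the protocol being run.

```python
def ready_to_passage(od600: float, target: float = 0.8, tolerance: float = 0.05) -> bool:
    """True once the culture has reached the passaging window (target - tolerance or above)."""
    return od600 >= target - tolerance


def next_concentration(schedule, stage: int) -> float:
    """Selection agent concentration for the stage after `stage`.

    Holds at the last entry once the schedule is exhausted.
    """
    if not schedule:
        raise ValueError("schedule must not be empty")
    return schedule[min(stage + 1, len(schedule) - 1)]


# Hypothetical stepwise bleomycin schedule in mg/mL (illustrative values only).
schedule_mg_ml = [0.02, 0.05, 0.08, 0.1]

stage = 0
for od in [0.31, 0.62, 0.78]:  # hypothetical daily OD600 readings for one well
    if ready_to_passage(od):
        print(f"OD600={od}: passage into {next_concentration(schedule_mg_ml, stage)} mg/mL")
        stage += 1
    else:
        print(f"OD600={od}: keep growing at {schedule_mg_ml[stage]} mg/mL")
```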

[0499] (4) Synergistic screening mechanism under combined pressure: A dual-parameter synchronous optimization strategy was introduced in the fourth evolutionary stage: 1) the working concentration of bleomycin was increased to 0.08 mg / mL; 2) the culture dilution ratio was increased to 1:300. In the final stage, the combined screening pressure of low substrate (p-coumaric acid) concentration, high dilution ratio, and high bleomycin concentration (0.1 mg / mL) was applied, and the mutation accumulation effect was verified by fluorescence. Thus, by adopting a stepwise pressure-loading mode combined with dynamic passage control, the present invention obtained a superior strain with a stable resistance phenotype through eight rounds of continuous evolution.

[0500] As shown in Figure 16, experimental data showed that when the bleomycin concentration reached 0.1-0.5 mg / mL, the growth curve of the wild-type strain showed a significant inflection point; in the 0.25 mg / mL group, the OD600 value dropped sharply from an initial value of 0.59±0.2 to 0.10±0.1, indicating that wild-type enzyme activity was insufficient to drive enough BleoR expression to resist antibiotic stress. When the concentration increased to 0.5 mg / mL, growth was completely inhibited (OD600 < 0.1), indicating that this threshold can be used as a critical pressure value for evolutionary screening.

[0501] 1.5. Evolutionary results of coumaroyl coenzyme A:

[0502] The ySZL6-23 series strains were obtained by transforming yeast with the pSZL6-23 plasmids expressing the 4CL mutant genes while integrating the downstream enzyme STS, so that coumaric acid is converted to coumaroyl-CoA by 4CL and then to resveratrol by STS. Expression of the reporter gene bleoR-p2A-YFP, driven by the couR sensor in response to mutant activity, was detected, and resveratrol yield was determined by high-performance liquid chromatography (HPLC) to quantify mutant potency (see Figure 17(a)). To verify the performance of the 4CL mutants in the metabolic pathway and the correlation between fluorescence intensity and product accumulation, ten representative mutants with high, medium, and low fluorescence intensities were selected and co-integrated with the rate-limiting enzyme STS to construct a resveratrol synthesis pathway, and the product concentration was quantified by HPLC. The results showed a significant linear correlation between resveratrol yield and YFP fluorescence intensity (Figure 17(b)), indicating that the biosensor can serve as a reliable predictor of yield and reflects the synergistic enhancement of mutants at the protein and metabolic levels.

[0503] Among the dominant mutants, 3A5, 1C8, and 2A4 showed resveratrol yields 3.60, 3.27, and 3.20 times higher than the wild type, respectively. Plasmid sequencing results revealed mutations at the I46L, F110R, and N397 sites in several high-yielding mutants. Structural analysis indicated that these sites are located near the enzyme's active pocket or substrate channel, potentially affecting substrate binding or conformational flexibility.

[0504] Four mutants with outstanding performance were finally obtained: 3A5, 2A4, 1C8 and 2F9, with final resveratrol yields of 0.36 mM, 0.31 mM, 0.31 mM and 0.30 mM, respectively. Among them, the yields of three mutants were more than three times that of wild type, showing significant improvement in expression efficiency and metabolic flux.

[0505] In summary, through mutation spectrum analysis, allele verification, product quantification, and structure-function correlation, the enhancing effects of key sites such as I46L, F110R, and N397K on the function and metabolic yield of 4CL were clarified, providing a data foundation for using deep learning to predict mutant combinations and optimize pathways. The CouR sensor used is effective at p-coumaroyl-CoA concentrations ≤80 mg / L, and its linear response range limits its ability to resolve higher product concentrations. Nevertheless, the 4CL mutant obtained through multiple rounds of screening not only kept the concentration of the precursor p-coumaric acid below the toxicity threshold (250 mg / L), but also increased the resveratrol yield to 0.08 g / L (Figure 17). This provides a highly efficient biocatalyst for the industrial biomanufacturing of drug molecules.
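The molar titers given in paragraph [0504] and the mass titer given above are related through the molar mass of resveratrol (C14H12O3, about 228.24 g/mol). The short sketch below performs the conversion; the function and constant names are illustrative.

```python
RESVERATROL_G_PER_MOL = 228.24  # molar mass of resveratrol, C14H12O3


def mm_to_g_per_l(conc_mm: float, molar_mass_g_per_mol: float) -> float:
    """Convert a concentration in mM to g/L."""
    return conc_mm / 1000.0 * molar_mass_g_per_mol


# Final resveratrol titers (mM) reported for the four best 4CL mutants.
titers_mm = {"3A5": 0.36, "2A4": 0.31, "1C8": 0.31, "2F9": 0.30}

for name, mm in titers_mm.items():
    grams = mm_to_g_per_l(mm, RESVERATROL_G_PER_MOL)
    print(f"{name}: {mm:.2f} mM = {grams:.3f} g/L")
```

For mutant 3A5, 0.36 mM corresponds to roughly 0.08 g/L, which is consistent with the mass titer quoted in this paragraph.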

[0506] After collecting and analyzing the 4CL enzyme library, the functional and structural distribution characteristics of the 4CL mutant enzymes were further clarified. Fluorescence expression tests were performed on 194 mutant clones, which were sorted from highest to lowest according to their MFI values (see Figure 18).

[0507] 1.6. Evolutionary results of naringenin:

[0508] The ySZL6-24 series strains were obtained by transforming yeast with the pSZL6-24 plasmids expressing the 4CL and CHS mutant genes while integrating the downstream enzyme CHI, so that the substrate coumaric acid is converted into naringenin. Expression of the reporter gene bleoR-p2A-YFP, driven by the ttgR sensor in response to mutant activity, was detected, and naringenin production was measured by high-performance liquid chromatography (HPLC) to quantify mutant efficacy. The results showed that most mutants produced significantly higher levels of naringenin than the control group co-expressing wild-type 4CL, CHS, and CHI, confirming that the mutant alleles clearly contribute to increased metabolic flux. As shown in Figure 19, the naringenin yields of mutant strains A9, D2, and B8 were 79.62 mg / L, 81.62 mg / L, and 95.60 mg / L, respectively, which were 3.2 times, 3.3 times, and 3.8 times that of the wild type. Plasmid sequencing revealed the following mutation profiles:
A9: the 4CL gene contains three missense mutations (Q16P / V160I / E190G); the CHS gene contains one missense mutation (D61L) and three synonymous mutations (I65I / A315A / T361T).
D2: the 4CL gene contains two missense mutations (D10N / Q11L); the CHS gene contains one missense mutation (D61R) and one synonymous mutation (A308A).
B8: the 4CL gene contains one missense mutation (F23L); the CHS gene contains two missense mutations (S293T / A308K).
During evolution, the high-mutation-rate DNA polymerase induced multi-site mutations in the CHS and 4CL genes. The A308, T132, and E61 sites in the CHS gene were frequently enriched, suggesting that they play a crucial role in increasing naringenin production.
Among the tested mutants, nine strains showed higher naringenin production than the wild-type CHS1 (25.15 mg / L), including: CHS1A7 (59.63 mg / L, 2.4-fold), CHS12A7 (75.34 mg / L, 3.0-fold), CHS1A9 (79.62 mg / L, 3.2-fold), CHS1S3-13 (49.32 mg / L, 2.0-fold), CHS1D2 (81.62 mg / L, 3.3-fold), CHS1B7 (58.05 mg / L, 2.3-fold), CHS1G11 (50.97 mg / L, 2.1-fold), and CHS1S3-B8 (95.60 mg / L, 3.8-fold, the largest increase). In subsequent functional validation, three optimal mutants (CHS3-B8, CHS1D2, and CHS1A9) were selected for combined expression to explore their synergistic effects. However, the naringenin yield after co-expression was lower than that of any single mutant, suggesting that there may be antagonistic effects or structural interference between these mutations. This indicates that enhancement of enzyme function is not a simple superposition of mutations, and that the complex mapping between sequence and function requires further investigation.
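The fold values listed above are each mutant's titer divided by the wild-type titer. The sketch below recomputes them from the reported numbers; the variable and function names are illustrative. Ratios recomputed this way agree with the listed fold values to within about 0.1, the small differences being attributable to rounding.

```python
WILD_TYPE_MG_L = 25.15  # wild-type CHS1 naringenin titer reported in the text

# Naringenin titers (mg/L) as listed in the text.
titers_mg_l = {
    "CHS1A7": 59.63,
    "CHS12A7": 75.34,
    "CHS1A9": 79.62,
    "CHS1S3-13": 49.32,
    "CHS1D2": 81.62,
    "CHS1B7": 58.05,
    "CHS1G11": 50.97,
    "CHS1S3-B8": 95.60,
}


def fold_over_wild_type(titer: float, wild_type: float = WILD_TYPE_MG_L) -> float:
    """Titer expressed as a multiple of the wild-type titer."""
    return titer / wild_type


folds = {name: fold_over_wild_type(t) for name, t in titers_mg_l.items()}
best = max(folds, key=folds.get)

for name, fold in sorted(folds.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {titers_mg_l[name]:.2f} mg/L, {fold:.2f}x wild type")
print("highest:", best)
```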

[0509] Furthermore, the quantified fluorescence and the measured naringenin titers showed a near-perfect correlation.

[0510] After collecting and analyzing the CHS enzyme library, we further clarified the functional and structural distribution characteristics of CHS mutant enzymes. Fluorescence expression tests were performed on 136 mutant clones, which were sorted from highest to lowest MFI value (see Figure 20).

[0511] II. MetaAI for Predictive Cell Design

[0512] To overcome the limitations of experimental screening described above and to reveal the complex interactions between sequence variation and cellular function, we developed MetaAI, a hybrid experimental-computational evolutionary framework capable of predictive modeling and rational design in high-dimensional sequence spaces. Based on the rich datasets of experimentally validated 4CL and CHS variants described above, MetaAI reconstructs the sequence-metabolite landscape with high fidelity, revealing underlying structures and guiding the design of novel, high-performance genotypes. For ease of description, MetaAI is also referred to as the screening model; in this disclosure the two terms are synonymous.

[0513] The mutant protein sequences corresponding to the screened catalytic enzyme mutants and the wild-type protein sequences corresponding to the wild-type catalytic enzymes are input into the screening model to obtain the fitness score corresponding to each catalytic enzyme mutant; based on the fitness score, the target catalytic enzyme mutants are screened.

[0514] The screening model is used to predict and output the functional predicted values of catalytic enzyme mutants. The fitness score is obtained by calculating the difference between a first predicted value obtained by the screening model for the functional value of the catalytic enzyme mutant and a second predicted value obtained for the functional value of the wild-type catalytic enzyme. Based on the fitness score, all catalytic enzyme mutants are ranked, and mutants that meet the preset threshold or ranking requirements are selected as target catalytic enzyme mutants.

[0515] Catalytic enzyme mutants may include double, triple, and quadruple mutants of enzyme 4CL, and / or double, triple, and quadruple mutants of enzyme CHS. The fitness score can be the difference between a first predicted value of the functional value of the catalytic enzyme mutant by the screening model and a second predicted value of the functional value of the wild-type catalytic enzyme by the screening model. In embodiments of this disclosure, catalytic enzyme mutants can be sorted according to their fitness scores. Catalytic enzyme mutants whose fitness scores meet the actual requirements are selected as target catalytic enzyme mutants from the sorted list.
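The scoring and selection step described here reduces to a subtraction followed by a sort and a filter. The sketch below assumes the screening model's predicted functional values are already available as plain numbers; the function name, the `threshold` and `top_k` parameters, the variant labels, and all predicted values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def rank_by_fitness(pred_mutants: Dict[str, float],
                    pred_wild_type: float,
                    threshold: Optional[float] = None,
                    top_k: Optional[int] = None) -> List[Tuple[str, float]]:
    """Rank mutants by fitness score (mutant prediction minus wild-type prediction).

    Optionally keep only scores >= threshold and/or the top_k best entries.
    """
    scores = {name: pred - pred_wild_type for name, pred in pred_mutants.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if threshold is not None:
        ranked = [(name, s) for name, s in ranked if s >= threshold]
    if top_k is not None:
        ranked = ranked[:top_k]
    return ranked


# Hypothetical model outputs for a wild type and three multi-site variants.
predicted = {"double_1": 1.5, "triple_1": 2.25, "quadruple_1": 0.75}
wild_type_prediction = 1.0

for name, score in rank_by_fitness(predicted, wild_type_prediction, threshold=0.0):
    print(f"{name}: fitness score {score:+.2f}")
```

A threshold of zero keeps only variants predicted to outperform the wild type; a `top_k` cut is the ranking-based alternative mentioned in the text.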

[0516] The screening model includes: a one-dimensional vector extraction unit, a two-dimensional vector extraction unit, a geometric encoder, and a multilayer perceptron. The step of inputting the mutant protein sequences corresponding to the screened catalytic enzyme mutants and the wild-type protein sequences corresponding to the wild-type catalytic enzymes into the screening model to obtain fitness scores for each catalytic enzyme mutant includes the following. The one-dimensional vector extraction unit performs one-dimensional feature extraction based on the mutant protein sequences to obtain a one-dimensional mutation vector representing the evolutionary information of the catalytic enzyme mutants, and performs one-dimensional feature extraction based on the wild-type protein sequences to obtain a one-dimensional wild-type vector representing the evolutionary information of the wild-type catalytic enzymes. The two-dimensional vector extraction unit performs two-dimensional feature extraction based on the mutant protein sequences to obtain a two-dimensional mutation vector representing geometric features related to the function of the catalytic enzyme mutants, and performs two-dimensional feature extraction based on the wild-type protein sequences to obtain a two-dimensional wild-type vector representing geometric features related to the function of the wild-type catalytic enzyme. The one-dimensional mutation vector, one-dimensional wild-type vector, two-dimensional mutation vector, and two-dimensional wild-type vector are input into the geometric encoder to obtain a first node embedding vector corresponding to the catalytic enzyme mutant and a first edge embedding vector for connecting the first node embedding vector, as well as a second node embedding vector corresponding to the wild-type catalytic enzyme and a second edge embedding vector for connecting the second node embedding vector. The first node embedding vector, the first edge embedding vector, the second node embedding vector, and the second edge embedding vector are input into the multilayer perceptron to obtain a fitness score.

[0517] In this embodiment, the one-dimensional vector extraction unit can be a protein language model, such as the 650M-parameter version of the ESM-2 (Evolutionary Scale Modeling, version 2) model, which extracts one-dimensional feature vectors from the protein sequence. These one-dimensional feature vectors can include one-dimensional wild-type vectors and one-dimensional mutation vectors. The two-dimensional vector extraction unit can extract two-dimensional feature vectors characterizing the protein's geometric structure. These two-dimensional feature vectors can include two-dimensional wild-type vectors and two-dimensional mutation vectors.

[0518] The geometric encoder can be a geometric encoder for the SPIRED-Fitness model. The geometric encoder can include a self-attention layer, a linear mapping, an instance normalization layer, a two-dimensional convolutional layer, and an activation layer. The activation layer can include a Leaky Rectified Linear Unit (LeakyReLU). The geometric encoder can transform a one-dimensional feature vector into node embedding vectors. Node embedding vectors can include a first node embedding vector and a second node embedding vector. The geometric encoder can transform a two-dimensional feature vector into edge embedding vectors. Edge embedding vectors can include a first edge embedding vector and a second edge embedding vector. Node embedding vectors carry feature information of the protein sequence, while edge embedding vectors carry information about the connections between node embedding vectors, i.e., they carry geometric information of the protein sequence. In other words, the input data of the geometric encoder can be the one-dimensional mutation vector, one-dimensional wild-type vector, two-dimensional mutation vector, and two-dimensional wild-type vector. The input data is jointly encoded through the self-attention layer, convolutional layer, and other structures within the geometric encoder, simultaneously generating node embedding vectors and edge embedding vectors representing the structural features of the protein. Specifically, the output data of the geometric encoder includes: a first node embedding vector and a first edge embedding vector corresponding to the catalytic enzyme mutant, and a second node embedding vector and a second edge embedding vector corresponding to the wild-type catalytic enzyme.
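Two of the layer types named above, instance normalization and LeakyReLU, can be illustrated with a minimal pure-Python sketch. This is not the SPIRED-Fitness implementation: the negative slope of 0.01, the epsilon value, and the example channel values are assumptions chosen only for illustration.

```python
import math

def leaky_relu(x, negative_slope=0.01):
    """LeakyReLU: keep positive values, scale negative values by a small slope.
    The slope 0.01 is an assumed default, not a value taken from the disclosure."""
    return x if x >= 0 else negative_slope * x

def instance_norm(channel, eps=1e-5):
    """Normalize one feature channel of one sample to zero mean and unit variance."""
    mean = sum(channel) / len(channel)
    var = sum((v - mean) ** 2 for v in channel) / len(channel)
    return [(v - mean) / math.sqrt(var + eps) for v in channel]

# Hypothetical values for a single feature channel of one protein
channel = [0.5, -1.2, 3.0, 0.1]
normalized = instance_norm(channel)
activated = [leaky_relu(v) for v in normalized]
```

In an actual encoder these operations would be applied per channel to the full node and edge feature tensors rather than to a short list.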

[0519] The multilayer perceptron can output the functional value of a protein based on node embedding vectors and edge embedding vectors. In this way, the functional value corresponding to the catalytic enzyme mutant (the aforementioned first predicted value) and the functional value corresponding to the wild-type catalytic enzyme (the aforementioned second predicted value) can be obtained. The difference between the first and second predicted values is used as the fitness score corresponding to the catalytic enzyme mutant.
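The scoring rule in the preceding paragraph can be sketched as follows. The function `toy_predict` is a hypothetical placeholder standing in for the full screening model (feature extraction, geometric encoder, and multilayer perceptron); it has no biological meaning, and the two sequences are invented for illustration.

```python
from typing import Callable

def fitness_score(predict: Callable[[str], float],
                  mutant_seq: str,
                  wild_type_seq: str) -> float:
    """Fitness score = first predicted value (mutant) minus second predicted value (wild type)."""
    first_predicted = predict(mutant_seq)
    second_predicted = predict(wild_type_seq)
    return first_predicted - second_predicted

def toy_predict(seq: str) -> float:
    """Hypothetical placeholder predictor: fraction of lysine residues in the sequence."""
    return seq.count("K") / len(seq)

# Invented 4-residue sequences, used only to exercise the function
score = fitness_score(toy_predict, mutant_seq="MKKI", wild_type_seq="MKTI")
```

Under this definition a positive score means the mutant is predicted to perform better than the wild type, and the wild type itself always scores zero.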

[0520] The training and evaluation of the screening model followed these steps. First, all data (4CL: 193 samples, CHS: 136 samples) were divided into a training set and an independent test set in a fixed proportion. Five-fold cross-validation was performed on the training set to optimize the hyperparameters. In each round of cross-validation, four folds were used for training and one fold for validation, for a total of five rounds, to ensure the robustness of the screening model. After the optimal hyperparameters were determined, the final model was trained on the entire training set (4CL: 170 samples, CHS: 119 samples). Model performance was then evaluated on the reserved test set (4CL: 23 samples, CHS: 17 samples). The evaluation results showed that the model's predicted fitness score was highly correlated with the experimentally measured metabolite yield (for 4CL, both Pearson and Spearman correlation coefficients were >0.9), indicating that the model effectively learned the sequence-function mapping.

[0521] Five-fold cross-validation effectively reduces the randomness caused by sample partitioning and improves the model's robustness to different data distributions. After cross-validation is completed and the hyperparameters are determined, the final model is trained on the entire training set containing 170 4CL samples and 119 CHS samples. The reserved test set of 23 4CL samples and 17 CHS samples is used only for the final evaluation of the model's performance, ensuring that the evaluation results are not affected by the training process.
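The five-fold partitioning described above can be sketched with the standard library alone. The shuffling seed and the round-robin assignment of samples to folds are assumptions; the disclosure does not state how the folds were drawn.

```python
import random

def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (training_indices, validation_indices) for each of k rounds.
    Each round validates on one fold and trains on the remaining k - 1 folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # assumed: random fold assignment
    folds = [indices[i::k] for i in range(k)]     # round-robin split into k folds
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation

# 170 corresponds to the 4CL training-set size stated in the text
splits = list(k_fold_splits(170, k=5))
```

With 170 training samples each round trains on 136 samples and validates on 34; the 23 reserved test samples never enter this loop.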

[0522] The screening model performed well on the test set, showing a very high correlation between the predicted fitness scores and the experimentally measured metabolite yields (for 4CL, both Pearson and Spearman correlation coefficients were greater than 0.9). This indicates that the screening model in this disclosure successfully learned the intrinsic relationship between protein sequence and protein function (fitness scores) from limited anchor data.
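The two correlation coefficients mentioned above can be computed as in the following sketch. The `predicted` and `measured` lists are made-up numbers for illustration only and are not data from this disclosure.

```python
import math

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Made-up example values (not experimental data)
predicted = [0.1, 0.4, 0.35, 0.8, 1.2]       # hypothetical fitness scores
measured = [20.0, 55.0, 60.0, 110.0, 150.0]  # hypothetical yields in mg/L
r_pearson = pearson(predicted, measured)
r_spearman = spearman(predicted, measured)
```

Pearson measures linear agreement, while Spearman only checks whether the model orders the mutants correctly, which is the property that matters most when the scores are used to pick top candidates.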

[0523] To further evaluate the predictive ability of the MetaAI model, triple, quadruple, and quintuple combined mutations of 4CL and CHS were systematically tested to explore the performance limits of the model in multi-mutation co-design. The results showed that all mutants predicted by the model had better fitness than the wild-type proteins, and the yield of the target product was significantly increased, indicating that MetaAI can not only effectively predict combined multi-site mutations but also has high prediction accuracy and a good capacity to extend protein fitness. To further verify the practical effect of this deep learning model in metabolic engineering, fermentation validation was performed on the mutant with the highest predicted fitness, and its effects on resveratrol and naringenin synthesis were evaluated. Figure 22 compares AI-assisted screening (MetaAI) with experimental evolution (OrthoRep) in terms of product yield. In Figure 22a, the engineered strain constructed from the 4CL mutant optimized through deep learning achieved a resveratrol yield of approximately 270 mg/L, about 14.6 times that of the wild type (18.5 mg/L) and significantly higher than that of the mutant obtained through OrthoRep evolution (82 mg/L). These results demonstrate that the model can not only accurately identify key functional sites but also yield enzyme mutants that outperform those obtained through experimental evolution. Figure 22b further demonstrates the synergistic design effect of the 4CL and CHS dual-enzyme system: the naringenin yield in the MetaAI group was 250 mg/L, which was 10.6 times that of the wild type (23.5 mg/L) and also significantly higher than that of the traditional evolution group. It is worth emphasizing that the dual-enzyme system involves a more complex mutation space and inter-enzyme interactions, yet the model can still accurately screen efficient combinations, demonstrating that the constructed multi-channel deep learning framework has excellent structural reconstruction and generalization capabilities.
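The fold improvements quoted above follow directly from the reported yields, as the short check below shows. The yields are the values stated in the text; the fold change for the OrthoRep mutant is derived here and is not stated in the original.

```python
def fold_change(strain_yield_mg_per_l, wild_type_yield_mg_per_l):
    """Ratio of a strain's product yield to the wild-type yield."""
    return strain_yield_mg_per_l / wild_type_yield_mg_per_l

# Resveratrol (Figure 22a): MetaAI mutant vs. wild type
resveratrol_metaai = fold_change(270.0, 18.5)
# Resveratrol: OrthoRep-evolved mutant vs. wild type (derived, not stated in the text)
resveratrol_orthorep = fold_change(82.0, 18.5)
# Naringenin (Figure 22b): MetaAI dual-enzyme group vs. wild type
naringenin_metaai = fold_change(250.0, 23.5)

print(round(resveratrol_metaai, 1), round(resveratrol_orthorep, 1), round(naringenin_metaai, 1))
# prints: 14.6 4.4 10.6
```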

[0524] The mutants predicted by the MetaAI model are shown in Tables 5-7 below and in Figure 23.

[0525]

[0526]

[0527]

[0528] In summary, the MetaAI model demonstrates excellent predictive performance and engineering guidance value in practical applications. It can efficiently guide the screening of multi-site mutants, significantly improve the yield of target products, and outperform traditional evolutionary strategies. Furthermore, the model possesses good scalability, is suitable for the collaborative design of single-enzyme and multi-enzyme systems, and provides a powerful tool for the intelligent construction of complex metabolic pathways.

[0529] Taken together, these findings challenge the notion that cellular sequence-phenotype relationships are inherently unpredictable. Instead, they demonstrate that fitness landscapes can be decoded, compressed, and rationally designed through strategically sampled key points and structure-informed modeling.

[0530] It should be noted that although the technical solution of the present invention has been described with specific examples, those skilled in the art will understand that the present invention should not be limited thereto.

[0531] The various embodiments of the present invention have been described above. These descriptions are exemplary and not exhaustive, and the invention is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those skilled in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their technical improvement over technologies found in the market, or to enable others skilled in the art to understand the embodiments disclosed herein.

Claims

1. A method for AI-based reconstruction and optimization of metabolic pathways using a growth-coupled metabolic enzyme directed evolution system, characterized in that, The method includes: Metabolic enzymes are evolved using a growth-coupled directed evolution system, wherein host cells are cultured and screened, and based on the host cells that survive the screening pressure, catalytic enzyme mutants conforming to the directed evolution direction are obtained, along with mutant protein sequences of the catalytic enzyme mutants; and, The mutant protein sequences corresponding to the screened catalytic enzyme mutants and the wild-type protein sequences corresponding to the wild-type catalytic enzymes are input into the screening model to obtain the fitness score for each catalytic enzyme mutant; based on the fitness score, target catalytic enzyme mutants are screened. The growth-coupled metabolic enzyme directed evolution system includes: a host cell, and an orthogonal DNA replication system and a biosensor system integrated into the host cell; The orthogonal DNA replication system is used to mutate a target gene encoding at least one metabolic enzyme in order to carry out continuous directed evolution of the target gene. The biosensor system includes a biosensor and a selectable marker gene and / or a reporter gene. The biosensor is configured to respond to metabolites in a metabolic pathway involving the metabolic enzyme to regulate the expression of the selectable marker gene and / or reporter gene. The biosensor couples the concentration of the metabolite with the survival or growth advantage of the host cell, such that host cells surviving under selection pressure contain the target gene that has evolved to promote the production of the metabolite. Preferably, the biosensor system includes a biosensor, as well as a selectable marker gene and a reporter gene.

2. The method according to claim 1, characterized in that, The metabolites include intracellular metabolites and / or metabolites that diffuse outside the cell.

3. The method according to claim 1 or 2, characterized in that, The biosensor system is integrated into the genome of the host cell, and / or, The biosensor is expressed by a promoter; and / or, The selectable marker gene and / or reporter gene are expressed by a promoter containing a binding sequence of the biosensor; Optionally, when the metabolite is absent or the metabolite concentration is insufficient to trigger a response in the biosensor, the biosensor binds to the binding sequence to suppress the expression of the selectable marker gene and / or reporter gene; when the target gene in a host cell surviving under selection pressure evolves into a target gene that promotes the production of the metabolite, the metabolite concentration increases, and the biosensor binds to the metabolite, thereby promoting the expression of the selectable marker gene and reporter gene.

4. The method according to any one of claims 1-3, characterized in that, The host cells include prokaryotic cells or eukaryotic cells; Optionally, the prokaryotic cells include Escherichia coli; Optionally, the eukaryotic cells include yeast, mammalian cells, insect cells, plant cells, and / or fungi; Optionally, the host cell includes yeast, including Saccharomyces cerevisiae, Saccharomyces kiwifruit, Saccharomyces cerevisiae, Saccharomyces rubrum, or Saccharomyces pastoris; Preferably, the host cell comprises Saccharomyces cerevisiae.

5. The method according to any one of claims 1-4, characterized in that, The selectable marker gene includes an antibiotic resistance gene or a nutrient marker gene; and / or, the reporter gene comprises a gene encoding a fluorescent protein; Optionally, the antibiotic resistance gene includes at least one selected from bleomycin resistance gene, kanamycin resistance gene, hygromycin resistance gene, norsin gene, and amphotericin gene; Optionally, the nutrient marker gene includes at least one selected from URA, LEU, HIS, ADE2, TRP1, and MET17; Optionally, the fluorescent protein includes green fluorescent protein or a derivative thereof, and / or red fluorescent protein or a derivative thereof.

6. The method according to any one of claims 1-5, characterized in that, The orthogonal DNA replication system comprises: an exogenous plasmid containing a target gene encoding at least one metabolic enzyme; and an error-prone DNA polymerase expression plasmid.

7. The method according to any one of claims 1-6, characterized in that, The screening model includes: a one-dimensional vector extraction unit, a two-dimensional vector extraction unit, a geometric encoder, and a multilayer perceptron. The step of inputting the mutant protein sequences corresponding to the screened catalytic enzyme mutants and the wild-type protein sequences corresponding to the wild-type catalytic enzymes into the screening model to obtain fitness scores for each catalytic enzyme mutant includes: the one-dimensional vector extraction unit performing one-dimensional feature extraction based on the mutant protein sequences to obtain a one-dimensional mutation vector representing the evolutionary information of the catalytic enzyme mutants, and performing one-dimensional feature extraction based on the wild-type protein sequences to obtain a one-dimensional wild-type vector representing the evolutionary information of the wild-type catalytic enzymes; the two-dimensional vector extraction unit performing two-dimensional feature extraction based on the mutant protein sequences to obtain a two-dimensional mutation vector representing geometric features related to the function of the catalytic enzyme mutants, and performing two-dimensional feature extraction based on the wild-type protein sequences to obtain a two-dimensional wild-type vector representing geometric features related to the function of the wild-type catalytic enzymes; the one-dimensional mutation vector, one-dimensional wild-type vector, two-dimensional mutation vector, and two-dimensional wild-type vector being input into the geometric encoder to obtain a first node embedding vector corresponding to the catalytic enzyme mutant and a first edge embedding vector for connecting the first node embedding vector, as well as a second node embedding vector corresponding to the wild-type catalytic enzyme and a second edge embedding vector for connecting the second node embedding vector; and the first node embedding vector, the first edge embedding vector, the second node embedding vector, and the second edge embedding vector being input into the multilayer perceptron to obtain a fitness score.

8. A mutant obtained by the AI-based metabolic pathway optimization method based on a growth-coupled metabolic enzyme directed evolution system as described in any one of claims 1-7; Optionally, the mutant includes a 4-coumaric acid-CoA ligase (4CL) mutant, the 4-coumaric acid-CoA ligase mutant corresponding to the amino acid sequence shown in SEQ ID NO: 5, having any one of the mutations shown in (m1)-(m30): (m1) I284M, I533M; (m2) N415S, I533M; (m3)I284M,N415S; (m4) F110R, K426E, L450S; (m5)I167V,N397K,K426E; (m6)F110K,E190G,I454V; (m7) F110R, N238D, E331V, S532A; (m8) N397K, V498E, S532A, D545G; (m9)N397G,K426E; (m10)F110R,I167N,N397K; (m11)T3I,F110K,M318K; (m12)S262G,M318K,D488G; (m13) N397K, K426E, L450S; (m14)I46S,I167T,K426E; (m15)M318K,E331V,D488G; (m16)F110R,E331V,E365G; (m17)F110R,N397K; (m18) I271L, N397G, K426E, I541T; (m19)I271L,K426E,D488G,I541T; (m20)F110K,N397K; (m21)I46S,K426E; (m22)N415S,K544S; (m23)N415S,T423S; (m24)N397K,K426E; (m25) F110R, I252T, M318K, E331V; (m26) F110K, V185G, E331V, I505L; (m27) F110K, V246G, E365G, S532A; (m28) V8D, Q130L, M318K, S532A; (m29) D13G, M318K, N397K, I524V; (m30) F110K, I271L, N397K, I541T; Optionally, the mutant includes a chalcone synthase (CHS) mutant, the chalcone synthase mutant corresponding to the amino acid sequence shown in SEQ ID NO: 6, having any one of the mutations shown in (n1)-(n30): (n1)D61L,A308K; (n2)D61R,K66E,A308K; (n3)K66E,A308K; (n4) K66E, K67R, A308K; (n5) K281E, S293H, A308K; (n6) K66E, I229T, K234R, A308K; (n7)V2I,K281E,A308K; (n8) K281E, S293T, A308K; (n9) K66E, S208N, A308K; (n10)D61A,K66E,A308K; (n11) K66E, A308R; (n12) K66E, K67R, A308K, L343H; (n13)D61L,K281E,A308K; (n14)K66E,S208N; (n15)D61A,K281E,A308K; (n16)A308K,L343H; (n17) K55R, K281E, A308K; (n18)D61A,K66E; (n19)K281E,A308K; (n20)S293T,A308K; (n21)D61R,K66E; (n22) K55R, D61R, K66E, K67R; (n23)K66E,L343H; (n24)D61A,K66E,K67R,I229T; (n25)D61R,K66E,K67R,A308R; (n26) K66E, K67R, S208N, I229T; (n27) K66E, K67R, S293F, L343H; 
(n28) K66E, K67R, I229T, L343H; (n29) K66E, K67R, I229T, K234R; (n30)D61A,K66E,K67R,K234R; Optionally, the mutant comprises a combination of a 4-coumaric acid-CoA ligase mutant and a chalcone synthase mutant. The mutants include combinations of 4-coumaric acid-CoA ligase mutants and chalcone synthase mutants selected from any one of the following (z1)-(z32): (z1)4CL mutants I284M, I533M; CHS mutants K281E, S293T, A308K; (z2)4CL mutants N415S, I533M; CHS mutants D61L, A308K; (z3)4CL mutants I284M, I533M; CHS mutants K281E, S293H, A308K; (z4)4CL mutants N415S, I533M; CHS mutants K281E, S293T, A308K; (z5)4CL mutants I167V, N397K, K426E; CHS mutants K281E, S293T, A308K; (z6)4CL mutants I167V, N397K, K426E; CHS mutants D61R, K66E, A308K; (z7)4CL mutants F110R, I167N, N397K; CHS mutants K66E, A308K; (z8)4CL mutants F110R, I167N, N397K; CHS mutants D61L, A308K; (z9)4CL mutants I167V, N397K, K426E; CHS mutants D61L, A308K; (z10)4CL mutants I167V, N397K, K426E; CHS mutants V2I, K281E, A308K; (z11)4CL mutants I284M and I533M; CHS mutants D61L and A308K; (z12)4CL mutants F110R, I167N, N397K; CHS mutants K66E, K67R, A308K; (z13)4CL mutants I284M, I533M; CHS mutants D61R, K66E, A308K; (z14)4CL mutants I167V, N397K, K426E; CHS mutants K66E, A308K; (z15)4CL mutants I167V, N397K, K426E; CHS mutants K66E, K67R, A308K; (z16)4CL mutants F110R, I167N, N397K; CHS mutants D61R, K66E, A308K; (z17)4CL mutants F110R, I167N, N397K; CHS mutants K281E, S293T, A308K; (z18)4CL mutants N415S, I533M; CHS mutants D61R, K66E, A308K; (z19)4CL mutants I284M and I533M; CHS mutants K66E and A308K; (z20)4CL mutants I167V, N397K, K426E; CHS mutants K281E, S293H, A308K; (z21)4CL mutants F110R, I167N, N397K; CHS mutants V2I, K281E, A308K; (z22)4CL mutants F110R, I167N, N397K; CHS mutants K281E, S293H, A308K; (z23)4CL mutants N415S, I533M; CHS mutants K66E, K67R, A308K; (z24)4CL mutants N415S, I533M; CHS mutants K281E, S293H, A308K; (z25)4CL mutants N415S, I533M; CHS mutants K66E, A308K; (z26)4CL mutants 
I284M, I533M; CHS mutants V2I, K281E, A308K; (z27)4CL mutants N415S, I533M; CHS mutants V2I, K281E, A308K; (z28)4CL mutants I284M, I533M; CHS mutants K66E, K67R, A308K; (z29)4CL mutants I167V, N397K, K426E; CHS mutants K66E, I229T, K234R, A308K; (z30)4CL mutants N415S, I533M; CHS mutants K66E, I229T, K234R, A308K; (z31)4CL mutants I284M, I533M; CHS mutants K66E, I229T, K234R, A308K; (z32)4CL mutants F110R, I167N, N397K; CHS mutants K66E, I229T, K234R, A308K.

9. The use of the mutant of claim 8 for the preparation of metabolites in the flavonoid synthesis pathway; Optionally, the metabolites in the flavonoid synthesis pathway include naringenin or resveratrol.

10. A directed evolution system for metabolic enzymes based on growth coupling, characterized in that, The growth-coupled metabolic enzyme directed evolution system includes: a host cell, and an orthogonal DNA replication system and a biosensor system integrated into the host cell; The orthogonal DNA replication system is used to mutate a target gene encoding at least one metabolic enzyme in order to carry out continuous directed evolution of the target gene. The biosensor system includes a biosensor and a selectable marker gene and / or a reporter gene. The biosensor is configured to respond to metabolites in a metabolic pathway involving the metabolic enzyme to regulate the expression of the selectable marker gene and / or reporter gene. The biosensor couples the concentration of the metabolite with the survival or growth advantage of the host cell, such that host cells surviving under selection pressure contain the target gene that has evolved to promote the production of the metabolite. Preferably, the biosensor system includes a biosensor, as well as a selectable marker gene and a reporter gene; Optionally, the metabolites include intracellular metabolites and / or extracellular metabolites; optionally, the intracellular metabolites include p-coumaroyl-CoA; optionally, the extracellular metabolites include naringenin. 
Optionally, the biosensor is expressed by a promoter, and / or the selectable marker gene and / or reporter gene is expressed by a promoter containing a binding sequence of the biosensor; Optionally, the biosensor includes CouR or a functional variant thereof, and / or TtgR or a functional variant thereof; Optionally, the CouR comprises an amino acid sequence as shown in SEQ ID NO: 8, or an amino acid sequence having at least 80% identity with SEQ ID NO: 8; and / or, the binding sequence of the CouR or a functional variant thereof comprises a nucleotide sequence as shown in SEQ ID NO: 23, or a nucleotide sequence having at least 80% identity with SEQ ID NO: 23; Optionally, the TtgR comprises an amino acid sequence as shown in SEQ ID NO: 9, or an amino acid sequence having at least 80% identity with SEQ ID NO: 9; and / or, the binding sequence of the TtgR or a functional variant thereof comprises a nucleotide sequence as shown in SEQ ID NO: 24, or a nucleotide sequence having at least 80% identity with SEQ ID NO: 24; Optionally, when the biosensor includes CouR or a functional variant thereof, the promoter containing the binding sequence of the biosensor comprises a nucleotide sequence as shown in any one of SEQ ID NO: 19-22, or a nucleotide sequence having at least 80% identity with any one of SEQ ID NO: 19-22; preferably, the promoter driving the expression of CouR or a functional variant thereof comprises a nucleotide sequence as shown in SEQ ID NO: 19, or a nucleotide sequence having at least 80% identity with SEQ ID NO: 19; Optionally, when the biosensor includes TtgR or a functional variant thereof, the promoter containing the binding sequence of the biosensor comprises a nucleotide sequence as shown in any one of SEQ ID NO: 15-18, or a nucleotide sequence having at least 80% identity with any one of SEQ ID NO: 15-18; preferably, the promoter driving the expression of TtgR or a functional variant thereof comprises a nucleotide sequence as shown in SEQ ID NO: 
15 or 17, or a nucleotide sequence having at least 80% identity with SEQ ID NO: 15 or 17; Optionally, the metabolic enzyme includes 4-coumaric acid-CoA ligase and chalcone synthase; Optionally, the selectable marker gene includes an antibiotic resistance gene or a nutrient marker gene; and / or, the reporter gene comprises a gene encoding a fluorescent protein; Optionally, the antibiotic resistance gene includes at least one selected from bleomycin resistance gene, kanamycin resistance gene, hygromycin resistance gene, norsin gene, and amphotericin gene; Optionally, the nutrient marker gene includes at least one selected from URA, LEU, HIS, ADE2, TRP1, and MET17; Optionally, the fluorescent protein includes green fluorescent protein or a derivative thereof, and / or red fluorescent protein or a derivative thereof; Optionally, the orthogonal DNA replication system comprises: an exogenous plasmid containing a target gene encoding at least one metabolic enzyme; and an error-prone DNA polymerase expression plasmid.