Application of soybean protein GmERFA / GmERFB and its encoding gene in improving seed weight
By reducing the content and activity of the GmERFA and GmERFB proteins in soybean, for example by knocking out or silencing the GmERFA and GmERFB genes with the CRISPR/Cas9 system, this invention addresses the regulation of soybean seed size and seed weight, significantly increasing seed weight and yield and advancing soybean breeding.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications (China)
- Current Assignee / Owner
- INST OF GENETICS & DEVELOPMENTAL BIOLOGY CHINESE ACAD OF SCI
- Filing Date
- 2024-12-30
- Publication Date
- 2026-06-30
Abstract
Description
Technical Field
[0001] This invention belongs to the field of plant genetic engineering technology, specifically relating to the application of soybean protein GmERFA / GmERFB and its encoding gene in improving seed weight.
Background Technology
[0002] Soybean (Glycine max) is an important economic crop. Its seeds are rich in protein and are a major source of edible oil and plant protein, holding a vital position in China's food industry. Soybean oil also has many industrial uses, for example in biofuels, surfactants, and softeners. Given this wide range of uses, actively developing soybean production and enhancing its competitiveness is of great economic and social importance for agriculture and animal husbandry. Although traditional breeding has produced some high-quality, highly nutritious soybean varieties, it cannot keep pace with the growing demand for higher soybean yield and quality. With the rapid development of modern biotechnology, identifying functional genes related to soybean yield and quality, and efficiently and precisely modifying the soybean genome through molecular breeding to improve varieties, have become effective ways to achieve high-yield, high-quality soybean production. This is of great significance for soybean breeding and variety improvement.
[0003] Seed weight and seed shape are among the main traits that changed during the domestication of wild soybean into cultivated soybean. Among yield-related seed traits, seed weight (100-seed weight or 1000-seed weight) remains the most accurate descriptor. Most wild soybeans have a 100-seed weight of only about 2 grams, whereas cultivated soybeans generally have a 100-seed weight of 15 to 22 grams, with some reaching 25 grams, a highly significant difference. Increasing seed weight is one way for plants to increase yield. In recent years, several genes that increase seed weight have been identified in crops such as rice, but research in soybean still lags behind.
Summary of the Invention
[0004] The technical problem to be solved by this invention is how to regulate plant seed size, seed weight, and / or yield, and how to cultivate plants with altered seed size, seed weight, and / or yield. The technical problem to be solved is not limited to the described technical subject matter; other technical subject matter not mentioned herein will be clearly understood by those skilled in the art through the following description.
[0005] To address the aforementioned technical problems, the present invention first provides an application of proteins, wherein the application may be any of the following:
[0006] A1) Application in regulating plant seed size;
[0007] A2) Application in regulating plant seed weight;
[0008] A3) Application in regulating plant yield;
[0009] A4) Application in the cultivation of plants with altered seed size, seed weight and / or yield;
[0010] A5) Applications in molecular breeding for the improvement of seed size, seed weight and / or yield, or in the improvement of germplasm resources related to seed size, seed weight and / or yield.
[0011] The protein may be composed of protein 1 and protein 2, wherein:
[0012] The name of protein 1 may be GmERFA, and may be any of the following:
[0013] B1) A protein whose amino acid sequence is SEQ ID NO:3;
[0014] B2) A protein that is obtained by substitution, deletion, and / or addition of amino acid residues in the amino acid sequence shown in SEQ ID NO:3, has more than 80% identity with the protein shown in B1), and has the same function;
[0015] B3) A fusion protein with the same function obtained by attaching a tag to the N-terminus and / or C-terminus of B1) or B2);
[0016] The name of protein 2 may be GmERFB, and may be any of the following:
[0017] C1) A protein whose amino acid sequence is SEQ ID NO:4;
[0018] C2) A protein that is obtained by substitution, deletion, and / or addition of amino acid residues in the amino acid sequence shown in SEQ ID NO:4, has more than 80% identity with the protein shown in C1), and has the same function;
[0019] C3) A fusion protein with the same function obtained by attaching a tag to the N-terminus and / or C-terminus of C1) or C2).
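B2) and C2) define variants by percent identity to SEQ ID NO:3 or SEQ ID NO:4, but the patent does not state how identity is computed. One common convention is to count identical positions in a global pairwise alignment and divide by the alignment length. The Python below is a minimal sketch of that convention (Needleman-Wunsch with linear gap penalties); the scoring values (match +1, mismatch -1, gap -2) and all function names are hypothetical, and dedicated tools that use substitution matrices such as BLOSUM62 may report somewhat different numbers.

```python
# Hypothetical helpers for illustration; scoring parameters are assumptions, not from the patent.

def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> tuple[str, str]:
    """Needleman-Wunsch global alignment; returns the two gapped sequences."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # residue aligned to residue
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j = n, m
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions divided by total alignment length, in percent."""
    top, bottom = global_align(a, b)
    if not top:
        raise ValueError("both sequences are empty")
    identical = sum(1 for x, y in zip(top, bottom) if x == y and x != "-")
    return 100.0 * identical / len(top)


def exceeds_identity_threshold(reference: str, variant: str, threshold: float = 80.0) -> bool:
    """True if identity is more than `threshold` percent; 'same function' must be shown separately."""
    return percent_identity(reference, variant) > threshold
```

Note that sequence identity is only one of the two conditions in B2) and C2); the functional requirement cannot be checked computationally.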
[0020] In the above applications, the proteins GmERFA and GmERFB can be derived from soybean (Glycine max).
[0021] The tags in B3) and C3) can be attached directly via peptide bonds or via linkers.
[0022] The substitution of amino acid residues described in B2) and C2) can be a conservative substitution of amino acid residues.
[0023] To facilitate the isolation, purification, detection, and / or localization of the proteins described in B1) or C1), a tag protein may be attached to the amino terminus or carboxyl terminus. Such tags include, but are not limited to: GST (glutathione S-transferase) tag protein, Trx (thioredoxin) tag protein, NusA (N-utilization substance A) tag protein, His tag protein (His-tag), MBP (maltose-binding protein) tag protein, Flag tag protein, SUMO tag protein, HA (influenza hemagglutinin) tag protein, Myc tag protein, LacZ tag protein, CBD (cellulose-binding domain) tag protein, phage T7 protein kinase (T7PK) tag protein, GFP (green fluorescent protein), CFP (cyan fluorescent protein), YFP (yellow fluorescent protein), mCherry (monomeric red fluorescent protein), or AviTag tag protein. Such tags generally do not alter the function of the target protein, and those skilled in the art know how to select an appropriate tag according to the intended purpose.
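At the sequence level, the B3) and C3) fusions are simply the target sequence with a tag joined at one or both ends, directly or through a linker. The Python below is a minimal sketch of that construction using the standard His6, FLAG, HA, and Myc epitope sequences; the helper name `add_tag` and the example linker `GGGGS` are hypothetical choices for illustration.

```python
# Standard short epitope tags (one-letter amino acid codes).
TAGS = {
    "His": "HHHHHH",       # His6 tag
    "FLAG": "DYKDDDDK",    # FLAG epitope
    "HA": "YPYDVPDYA",     # influenza hemagglutinin epitope
    "Myc": "EQKLISEEDL",   # c-Myc epitope
}


def add_tag(protein: str, tag: str, terminus: str, linker: str = "") -> str:
    """Hypothetical helper: fuse a named tag to the 'N' or 'C' terminus, optionally via a linker."""
    if tag not in TAGS:
        raise KeyError(f"unknown tag: {tag}")
    if terminus == "N":
        # In an expressed construct a start methionine would precede the tag; omitted here.
        return TAGS[tag] + linker + protein
    if terminus == "C":
        # Drop a trailing stop symbol, if present, before appending the tag.
        return protein.rstrip("*") + linker + TAGS[tag]
    raise ValueError("terminus must be 'N' or 'C'")
```

A fusion at both ends is two calls, e.g. `add_tag(add_tag(seq, "His", "N"), "FLAG", "C", linker="GGGGS")`.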
[0024] The application can be achieved by upregulating or downregulating the content and / or activity of the proteins GmERFA and GmERFB.
[0025] Furthermore, the application may include increasing plant seed size, improving plant seed weight, and / or increasing plant yield by downregulating the content and / or activity of the proteins GmERFA and GmERFB (e.g., knocking out or silencing the GmERFA and GmERFB genes). The application may also include decreasing plant seed size, reducing plant seed weight, and / or decreasing plant yield by upregulating the content and / or activity of the proteins GmERFA and GmERFB (e.g., overexpressing the GmERFA and GmERFB genes).
[0026] This invention also provides applications of biomaterials, which may be any of the following:
[0027] D1) Application in regulating plant seed size;
[0028] D2) Application in regulating plant seed weight;
[0029] D3) Application in regulating plant yield;
[0030] D4) Application in the cultivation of plants with altered seed size, seed weight and / or yield;
[0031] D5) Applications in molecular breeding for the improvement of seed size, seed weight and / or yield, or in the improvement of germplasm resources related to seed size, seed weight and / or yield.
[0032] The biomaterial may be any of the following:
[0033] E1) The nucleic acid molecule that encodes the protein;
[0034] E2) An expression cassette containing the nucleic acid molecule described in E1);
[0035] E3) A recombinant vector containing the nucleic acid molecule described in E1), or a recombinant vector containing the expression cassette described in E2);
[0036] E4) Recombinant microorganisms containing the nucleic acid molecules described in E1), or recombinant microorganisms containing the expression cassette described in E2), or recombinant microorganisms containing the recombinant vector described in E3);
[0037] E5) A recombinant host cell containing the nucleic acid molecule described in E1), or a recombinant host cell containing the expression cassette described in E2), or a recombinant host cell containing the recombinant vector described in E3);
[0038] E6) Transgenic plant tissue containing the nucleic acid molecules described in E1), or transgenic plant tissue containing the expression cassette described in E2);
[0039] E7) A transgenic plant organ containing the nucleic acid molecule described in E1) or a transgenic plant organ containing the expression cassette described in E2).
[0040] In the above application, the nucleic acid molecule E1) may be composed of a nucleic acid molecule encoding protein 1 and a nucleic acid molecule encoding protein 2, wherein:
[0041] The nucleic acid molecule encoding protein 1 may be any of the following:
[0042] F1) A DNA molecule whose coding sequence is SEQ ID NO:1;
[0043] F2) A DNA molecule whose nucleotide sequence is SEQ ID NO:1 or SEQ ID NO:5;
[0044] The nucleic acid molecule encoding protein 2 may be any of the following:
[0045] G1) A DNA molecule whose coding sequence is SEQ ID NO:2;
[0046] G2) A DNA molecule whose nucleotide sequence is SEQ ID NO:2 or SEQ ID NO:6.
[0047] The nucleic acid molecules mentioned herein can be DNA, such as cDNA, genomic DNA, or recombinant DNA; they can also be RNA, such as mRNA or hnRNA.
[0048] The nucleotide sequence shown in SEQ ID NO:1 may be the coding sequence (CDS) of the GmERFA gene, which encodes the protein GmERFA as shown in SEQ ID NO:3. The nucleotide sequence shown in SEQ ID NO:5 may be the genomic sequence of the GmERFA gene.
[0049] The nucleotide sequence shown in SEQ ID NO:2 may be the coding sequence (CDS) of the GmERFB gene, which encodes the protein GmERFB as shown in SEQ ID NO:4. The nucleotide sequence shown in SEQ ID NO:6 may be the genomic sequence of the GmERFB gene.
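Because SEQ ID NO:1 is described as the CDS encoding SEQ ID NO:3 (and SEQ ID NO:2 as the CDS encoding SEQ ID NO:4), one simple consistency check is to translate each CDS with the standard genetic code and compare the result with the protein sequence. The Python below is a minimal sketch of such a check, assuming the standard nuclear code (NCBI translation table 1); the function name `translate_cds` is hypothetical, and the sequences themselves are in the sequence listing and are not reproduced here.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), _AA)}


def translate_cds(cds: str) -> str:
    """Hypothetical helper: translate a complete CDS (ATG ... stop), dropping the final '*'."""
    seq = cds.upper().replace("\n", "").replace(" ", "")
    if len(seq) % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    if not seq.startswith("ATG"):
        raise ValueError("CDS does not start with ATG")
    # Non-ACGT characters raise KeyError here.
    protein = "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))
    if not protein.endswith("*"):
        raise ValueError("CDS does not end with a stop codon")
    if "*" in protein[:-1]:
        raise ValueError("internal stop codon found")
    return protein[:-1]
```

With the listing loaded into placeholder variables, `translate_cds(cds_seq) == protein_seq` would be expected to hold for each CDS and protein pair.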
[0050] The nucleic acid molecule of E1) may also include nucleic acid molecules obtained by codon optimization (modification according to codon usage preference) of the nucleotide sequences shown in SEQ ID NO:1 and / or SEQ ID NO:2.
[0051] Those skilled in the art can readily mutate the nucleotide sequences encoding proteins GmERFA and / or GmERFB using known methods, such as site-directed mutagenesis (including oligonucleotide primer-mediated site-directed mutagenesis, PCR-mediated site-directed mutagenesis, and cassette mutagenesis) or directed evolution (including error-prone PCR, DNA shuffling, and in vitro random recombination). Artificially modified nucleotide sequences that possess 75% or more identity with the nucleotide sequences encoding the proteins GmERFA or GmERFB, provided they encode the proteins GmERFA or GmERFB and have the same function as the proteins GmERFA or GmERFB, are nucleotide sequences derived from and equivalent to those of the present invention.
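The two conditions in [0050] and [0051] (the variant still encodes the same protein, and it keeps at least 75% nucleotide identity) can be checked mechanically for a codon-optimized variant. The Python below is a minimal sketch under stated assumptions: recoding is purely synonymous, so the sequences have equal length and position-wise identity applies without alignment; the `preferred` codon map is a hypothetical placeholder, not a soybean codon-usage table; and all function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), _AA)}


def translate(seq: str) -> str:
    """Translate a sequence whose length is a multiple of 3 (stop shown as '*')."""
    seq = seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("length is not a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def recode(cds: str, preferred: dict[str, str]) -> str:
    """Replace each codon by the preferred synonymous codon for its amino acid, if one is given."""
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3].upper()
        aa = CODON_TABLE[codon]
        choice = preferred.get(aa, codon)
        if CODON_TABLE[choice] != aa:
            raise ValueError(f"{choice} is not synonymous with {codon}")
        out.append(choice)
    return "".join(out)


def nucleotide_identity(a: str, b: str) -> float:
    """Position-wise identity in percent; valid only for equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(x == y for x, y in zip(a.upper(), b.upper())) / len(a)


def is_equivalent_variant(original: str, variant: str, threshold: float = 75.0) -> bool:
    """Same encoded protein and at least `threshold` percent nucleotide identity."""
    return (translate(original) == translate(variant)
            and nucleotide_identity(original, variant) >= threshold)
```

For example, recoding the toy CDS `ATGCTGGCTTGA` with the hypothetical map `{"L": "CTT", "A": "GCC"}` gives `ATGCTTGCCTGA`, which encodes the same peptide at about 83% identity.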
[0052] The present invention also provides the application of a substance that reduces the activity and / or content of the above proteins, wherein the application may be any of the following:
[0053] H1) Application in regulating plant seed size;
[0054] H2) Application in regulating plant seed weight;
[0055] H3) Application in regulating plant yield;
[0056] H4) Application in the cultivation of plants with altered seed size, seed weight and / or yield;
[0057] H5) Applications in molecular breeding for improving seed size, seed weight and / or yield, or in germplasm resource improvement related to seed size, seed weight and / or yield.
[0058] The substance may be any substance that reduces the activity and / or content of the proteins GmERFA and GmERFB through gene-level expression regulation or protein-level regulation.
[0059] The gene-level expression regulation can include expression regulation at the chromatin level (such as histone modification and chromatin remodeling), transcriptional level (such as promoter, transcription factor, and co-regulatory factor regulation), post-transcriptional level (such as RNA splicing and microRNA regulation), and post-translational level (such as ubiquitination, SUMOylation, acetylation, glycosylation, phosphorylation, methylation, NEDD8 modification, etc.).
[0060] Protein-level regulation may include regulating protein activity and / or content through protein degradation, protein interactions, or other methods that modulate protein activity.
[0061] In the above applications, the substance includes substances that inhibit the replication, transcription, translation, post-transcriptional modification, and / or post-translational modification of nucleic acid molecules encoding the protein.
[0062] The nucleic acid molecules encoding the protein may be the GmERFA gene and the GmERFB gene.
[0063] Furthermore, the substance may include a substance that causes the coding genes of the proteins GmERFA and GmERFB to be deleted or inactivated by site-directed mutagenesis, gene knockdown, gene editing and / or gene knockout, or a substance that targets and binds to the proteins GmERFA and GmERFB to reduce their content or inactivate their function.
[0064] Those skilled in the art are well aware that gene expression can be inhibited, or genes silenced or knocked out, using site-directed mutagenesis (including oligonucleotide primer-mediated site-directed mutagenesis, PCR-mediated site-directed mutagenesis, and cassette mutagenesis), gene knockdown techniques (including RNA interference, Morpholino interference, antisense nucleic acid techniques, and ribozyme techniques), gene editing techniques (including zinc finger nuclease (ZFN) gene editing, TALEN gene editing, and CRISPR gene editing), or gene knockout techniques (including complete gene knockout and conditional gene knockout). For example, shRNA, siRNA, or miRNA targeting the genes encoding the proteins GmERFA and GmERFB can be used to inactivate or silence gene expression at the post-transcriptional or translational level. A CRISPR-Cas system comprising a gRNA (sgRNA) and a Cas protein can also be used to knock out the target genes. Alternatively, site-directed mutagenesis can be used to introduce frameshift mutations or premature translation termination into the GmERFA and GmERFB genes, thereby inactivating or weakening their function. In some embodiments of the present invention, CRISPR / Cas9 gene editing technology is used to knock out the GmERFA and GmERFB genes in soybean.
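The frameshift mechanism mentioned above can be illustrated on a toy sequence: an insertion or deletion whose net length is not a multiple of 3 shifts the reading frame and often creates an early in-frame stop codon. The Python below is a minimal sketch using the three standard stop codons; the helper names and the toy sequence are hypothetical and are not GmERFA or GmERFB sequences.

```python
from typing import Optional

# Hypothetical helpers for illustration; not taken from the patent.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code stop codons


def first_stop_codon(cds: str) -> Optional[int]:
    """Return the 0-based codon index of the first in-frame stop codon, or None."""
    seq = cds.upper()
    for k in range(0, len(seq) - 2, 3):
        if seq[k:k + 3] in STOP_CODONS:
            return k // 3
    return None


def apply_indel(cds: str, position: int, inserted: str = "", deleted: int = 0) -> str:
    """Delete `deleted` bases starting at `position`, then insert `inserted` at that position."""
    return cds[:position] + inserted + cds[position + deleted:]


def is_frameshift(inserted: str = "", deleted: int = 0) -> bool:
    """True when the net length change is not a multiple of 3."""
    return (len(inserted) - deleted) % 3 != 0
```

For the toy sequence `ATGGCTAAGCCCTAA`, the first stop is codon 4; inserting a single `T` at position 6 gives `ATGGCTTAAGCCCTAA`, whose first in-frame stop moves up to codon 2.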
[0065] Those skilled in the art will understand that nucleic acid molecules such as siRNA, miRNA, shRNA, or dsRNA can be designed by selecting target sequences based on the sequences of the GmERFA and GmERFB genes or the sequences of the mRNA transcribed from them. These nucleic acid molecules can inhibit or interfere with gene transcription, translation, or post-transcriptional and post-translational modifications, thereby affecting protein expression.
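Selecting siRNA target windows from a transcript can be sketched as a sliding-window scan with simple filters. The Python below is a minimal sketch under explicit assumptions: a 21-nt window and a GC range of 30% to 52% are placeholder thresholds in the spirit of published siRNA design guidelines, real designs add further rules and off-target screening, and the optional `also_in` filter (keeping windows shared by a second transcript, such as GmERFA and GmERFB, which the patent describes as about 91% homologous) is a hypothetical illustration. All names are hypothetical.

```python
# Hypothetical helper for illustration; thresholds are assumptions, not from the patent.
_RNA_COMPLEMENT = str.maketrans("ACGT", "UGCA")


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in seq."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def sirna_candidates(target: str, length: int = 21, gc_min: float = 0.30,
                     gc_max: float = 0.52, also_in: str = "") -> list[tuple[int, str, str]]:
    """Return (start, sense RNA, antisense RNA) for windows passing the GC filter.

    If `also_in` is given, keep only windows that also occur in that second sequence.
    """
    seq = target.upper().replace("U", "T")
    other = also_in.upper().replace("U", "T")
    hits = []
    for start in range(len(seq) - length + 1):
        window = seq[start:start + length]
        if set(window) - set("ACGT"):
            continue  # skip windows with ambiguous bases
        if other and window not in other:
            continue  # not conserved in the second transcript
        if gc_min <= gc_fraction(window) <= gc_max:
            sense = window.replace("T", "U")
            antisense = window.translate(_RNA_COMPLEMENT)[::-1]  # reverse complement, as RNA
            hits.append((start, sense, antisense))
    return hits
```

The antisense (guide) strand is the reverse complement of the target window, which is what base-pairs with the mRNA.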
[0066] Furthermore, the substances may include nucleic acid molecules, carbohydrates, lipids, small molecule compounds, antibodies, peptides, proteins, recombinant vectors (such as gene editing vectors), recombinant cells, and viral vectors (such as lentiviruses and adeno-associated viruses).
[0067] Furthermore, the nucleic acid molecules may include (1) double-stranded RNA (dsRNA), small interfering RNA (siRNA), microRNA (miRNA), and short hairpin RNA (shRNA) used in RNA interference technology; (2) antisense RNA (asRNA) and antisense oligonucleotides (AON) used in antisense nucleic acid technology; (3) gRNA and sgRNA used in gene editing technology; and (4) aptamers and ribozymes.
[0068] In the above applications, the substance may be sgRNA or a CRISPR / Cas9 system containing the sgRNA, wherein the sgRNA targets the encoding genes of the proteins GmERFA and GmERFB.
[0069] Further, the target sequences of the sgRNA may be as shown in SEQ ID NO:7 (target sequence of sgRNA1) and / or SEQ ID NO:8 (target sequence of sgRNA2). Specifically: sgRNA1 targets the GmERFA and GmERFB genes, with its target sequence located in the first exon region of the GmERFA and GmERFB genes, and can be used to knock out the GmERFA and GmERFB genes in conjunction with the Cas9 protein; sgRNA2 targets the GmERFA and GmERFB genes, with its target sequence located in the second exon region of the GmERFA and GmERFB genes, and can also be used to knock out the GmERFA and GmERFB genes in conjunction with the Cas9 protein.
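Since one sgRNA is used to knock out both GmERFA and GmERFB, a useful candidate is a protospacer present in both genes. The Python below is a minimal sketch of such a search, assuming SpCas9 (20-nt spacer immediately 5' of an NGG PAM) and scanning both strands; the function names are hypothetical, and the actual target sequences SEQ ID NO:7 and SEQ ID NO:8 are not reproduced here.

```python
import re

# Hypothetical helpers for illustration; assumes SpCas9 with an NGG PAM.
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_DNA_COMPLEMENT)[::-1]


def spcas9_targets(seq: str, spacer_len: int = 20) -> set[str]:
    """Return protospacers (5'->3', PAM excluded) followed by NGG on either strand."""
    seq = seq.upper()
    pattern = re.compile(rf"(?=([ACGT]{{{spacer_len}}})[ACGT]GG)")  # lookahead keeps overlapping sites
    found = set()
    for strand in (seq, revcomp(seq)):
        for match in pattern.finditer(strand):
            found.add(match.group(1))
    return found


def shared_targets(gene_a: str, gene_b: str) -> set[str]:
    """Protospacers present (with an NGG PAM) in both genes."""
    return spcas9_targets(gene_a) & spcas9_targets(gene_b)
```

In practice the shared candidates would then be restricted to early exons, as the patent does for sgRNA1 (first exon) and sgRNA2 (second exon), and screened for off-target sites.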
[0070] The present invention also provides a method for cultivating transgenic plants, the method comprising reducing the content and / or activity of the proteins GmERFA and GmERFB in the target plant to obtain plants with increased seed size, seed weight and / or yield.
[0071] In the above method, reducing the content and / or activity of the proteins GmERFA and GmERFB in the target plant can be achieved by reducing the expression level of the encoding genes of the proteins GmERFA and GmERFB in the target plant.
[0072] The nucleotide sequence of the gene encoding the protein GmERFA is as shown in SEQ ID NO:1; the nucleotide sequence of the gene encoding the protein GmERFB is as shown in SEQ ID NO:2.
[0073] In the above method, the reduction of the expression level of the encoding genes of the proteins GmERFA and GmERFB in the target plant can be carried out using a CRISPR / Cas9 system, wherein the CRISPR / Cas9 system includes sgRNAs that target the encoding genes of the proteins GmERFA and GmERFB.
[0074] Furthermore, the target sequence of the sgRNA may be as shown in SEQ ID NO:7 (target sequence of sgRNA1) and / or SEQ ID NO:8 (target sequence of sgRNA2).
[0075] Furthermore, the CRISPR / Cas9 system also includes the Cas9 protein.
[0076] Furthermore, the Cas9 protein described herein is not limited to any specific protein, as long as it can be used in conjunction with the sgRNA of this invention.
[0077] Furthermore, the Cas9 proteins described herein include, but are not limited to, Streptococcus pyogenes Cas9 (SpCas9, subtype II-A), SpCas9-HF (high fidelity), nickase Cas9 (nCas9), Staphylococcus aureus Cas9 (SaCas9, subtype II-A), Neisseria meningitidis Cas9 (NmCas9, subtype II-C), Francisella novicida Cas9 (FnCas9, subtype II-B), Streptococcus thermophilus Cas9 (St1Cas9, St3Cas9), Campylobacter jejuni Cas9 (CjCas9), and Treponema pallidum Cas9, as well as Cas9 orthologs from other organisms. The Cas9 protein may also be a high-fidelity Cas9 variant (such as SpCas9-HF1, eSpCas9(1.1), or TrueCut™ HiFi Cas9 protein).
[0078] The method of this invention can be implemented with any Cas9 protein known in the art. Those skilled in the art can make appropriate selections of the coding sequence of the Cas9 protein without departing from the principles of the embodiments of this invention.
[0079] Furthermore, reducing the expression levels of the encoding genes for the proteins GmERFA and GmERFB in the target plant using the CRISPR / Cas9 system can be achieved by contacting the GmERFA and GmERFB genes in the target plant cells with any of the sgRNAs described herein (such as sgRNA1 and / or sgRNA2) and the Cas9 protein.
[0080] Furthermore, the contacting step can be performed by (1) and (2) below:
[0081] (1) Directly introduce any of the sgRNAs described herein into the target plant cells, or first construct the DNA molecule encoding any of the sgRNAs described herein into an expression vector and then introduce it into the target plant cells;
[0082] (2) Directly introduce the Cas9 protein or the mRNA encoding the Cas9 protein into the target plant cells; or first construct the DNA molecule encoding the Cas9 protein into an expression vector and then introduce it into the target plant cells; or fuse the Cas9 protein with a cell-penetrating peptide and introduce the fusion into the target plant cells via the cell-penetrating peptide.
[0083] The cell-penetrating peptide promotes cellular uptake of the Cas9 protein fused to it and enables the Cas9 protein to perform its biological function inside the cell. Suitable cell-penetrating peptides are not limited to specific types; any peptide capable of carrying the Cas9 protein across the membrane into the cell is acceptable. For example, the cell-penetrating peptide may be the Tat peptide, derived from the transactivator of transcription (Tat) of human immunodeficiency virus (HIV).
[0084] Those skilled in the art know that Cas9 protein, Cas9 protein mRNA, Cas9 expression vectors (vectors containing and expressing DNA molecules encoding Cas9 protein), sgRNA, and sgRNA expression vectors (vectors containing and expressing DNA molecules encoding sgRNA) can be transferred into plant cells by various methods known in the art, such as chemical stimulation methods (including PEG, calcium phosphate, calcium chloride treatment, etc.), electroporation, liposome-mediated methods, microinjection, gene gun methods (also known as microparticle bombardment), laser microbeam methods, pollen tube pathway methods, ultrasonic methods, air gun methods, and eddy current methods. Furthermore, the target gene can be transferred into plant recipient cells using a vector as a medium, such as Agrobacterium Ti plasmid vector (including Ti plasmid-derived vectors such as co-integration vector systems and binary vector systems) mediated methods.
[0085] When using expression vectors to deliver sgRNA and Cas9 protein, the sgRNA and Cas9 protein can be expressed in different expression vectors or in the same expression vector.
[0086] Furthermore, the method for cultivating transgenic plants described herein may include the following steps:
[0087] (1) Insert the DNA encoding sgRNAs targeting the GmERFA and GmERFB genes (such as sgRNA1 and / or sgRNA2) into a Cas9 expression vector to obtain a CRISPR / Cas9 gene editing vector;
[0088] (2) Introduce the CRISPR / Cas9 gene editing vector into the target plant;
[0089] (3) Obtain transgenic plants in which the GmERFA and GmERFB genes are knocked out through screening and identification.
[0090] Further, the Cas9 expression vector described in step (1) contains the Cas9 gene and is capable of expressing the Cas9 protein. The Cas9 expression vector may also contain one or more of the following elements: an origin of replication (ori), a promoter (such as a U6 promoter, e.g., the U6-2 promoter used in the present invention), an enhancer (such as the CAG enhancer), a tag (such as the FLAG tag), a terminator (such as the bGH poly(A) terminator), a resistance gene (such as the kanamycin resistance gene or the ampicillin resistance gene), a promoter for the resistance gene, a selection gene (such as the bar gene), a promoter for the selection gene, and a promoter for the Cas9 gene (such as the Ubi promoter).
[0091] Cas9 expression vectors are commercially available. After an sgRNA targeting the gene has been designed, the DNA molecule encoding it can easily be inserted into a commercial Cas9 expression vector so that the vector expresses both the Cas9 protein and the sgRNA, thereby editing the target gene. Alternatively, the Cas9 expression vector can be constructed by conventional methods in the art. For example, the Cas9 gene can be amplified using the *Streptococcus pyogenes* genome as a template and then cloned into a backbone expression vector (such as pET28a, pET32a, etc.) to obtain the Cas9 expression vector.
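Inserting an sgRNA spacer into a Cas9 vector is commonly done by annealing two complementary oligonucleotides that leave sticky ends matching the vector's cut site. The Python below is a minimal sketch of the oligo design under assumptions: the 4-nt overhangs `GATT` and `AAAC` are hypothetical placeholders (the correct overhangs depend on the specific vector and restriction enzyme), and a G is prepended when the spacer lacks one because U6 promoters conventionally initiate transcription with G. The function names are hypothetical.

```python
# Hypothetical helpers for illustration; overhangs depend on the actual vector.
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(_DNA_COMPLEMENT)[::-1]


def spacer_oligos(spacer: str, fwd_overhang: str = "GATT", rev_overhang: str = "AAAC",
                  add_g: bool = True) -> tuple[str, str]:
    """Return (forward, reverse) oligos that anneal into a spacer duplex with sticky ends."""
    s = spacer.upper()
    if add_g and not s.startswith("G"):
        s = "G" + s  # U6-driven transcripts conventionally start with G
    forward = fwd_overhang + s
    reverse = rev_overhang + revcomp(s)
    return forward, reverse
```

The reverse oligo carries the reverse complement of the (possibly G-extended) spacer, so the two anneal into a duplex with a different 4-nt overhang at each end.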
[0092] Further, the introduction in step (2) can be carried out by Agrobacterium-mediated transformation, which may include the following steps: introducing the CRISPR / Cas9 gene editing vector constructed in step (1) into Agrobacterium (e.g., by calcium ion-induced transformation, polyethylene glycol-mediated transformation, metal cation-mediated transformation, electroporation, or phage transduction) to obtain recombinant Agrobacterium; infecting callus or explants of the target plant with the recombinant Agrobacterium; and inducing and culturing the resulting positive callus or explants to obtain regenerated plants, which are then identified.
[0093] The explants include, but are not limited to, seeds, roots, leaves, petioles, cotyledons, cotyledonary petioles, hypocotyls, stem segments, shoot apical meristems, epidermal parenchyma cells, tubers, stolons, embryogenic suspension cells, and protoplasts.
[0094] The screening and identification methods are known to those skilled in the art. For example, gene-edited plants (including progeny materials of gene-edited plants) can be identified by methods such as PCR detection, Sanger sequencing, high-throughput sequencing, Western blotting, and Southern blot.
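After PCR and sequencing of the target region, each edited allele can be classified relative to the reference, and the two alleles of a locus combined into a genotype call. The Python below is a minimal sketch of that bookkeeping, assuming each allele sequence spans the same coding-region amplicon so that a net length difference not divisible by 3 indicates a frameshift; the category labels and function names are hypothetical. GmERFA and GmERFB would each be genotyped separately, and a double homozygous knockout would be mutant at both loci.

```python
# Hypothetical helpers for illustration; labels are not taken from the patent.

def classify_allele(reference: str, allele: str) -> str:
    """Classify one sequenced allele of the target amplicon against the reference."""
    if allele.upper() == reference.upper():
        return "wild type"
    net = len(allele) - len(reference)
    if net % 3 != 0:
        return "frameshift"
    return "in-frame or substitution"


def classify_genotype(reference: str, allele1: str, allele2: str) -> str:
    """Combine the two alleles at one locus into a genotype call."""
    ref = reference.upper()
    a1, a2 = allele1.upper(), allele2.upper()
    wt1, wt2 = a1 == ref, a2 == ref
    if wt1 and wt2:
        return "wild type"
    if wt1 or wt2:
        return "heterozygous"
    if a1 == a2:
        return "homozygous mutant"
    return "biallelic mutant"
```

A frameshift call from length alone is only a first pass; the sequence trace should still be checked for the exact edit.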
[0095] While the GmERFA and GmERFB genes are knocked out using CRISPR / Cas9 technology in one or more embodiments provided in this invention, the invention is not limited to this specific method. Those skilled in the art will recognize that other gene knockout, gene editing, gene mutation, gene knockdown, homologous recombination, and other techniques known in the art can be used to delete or inactivate the GmERFA and GmERFB genes in the plant genome. These methods can also be used in this invention. These alternative methods do not depart from the scope of this invention, and this invention should include these alternative methods.
[0096] Herein, the plant may be any of the following:
[0097] K1) Dicotyledons;
[0098] K2) Plants of the family Leguminosae (legumes);
[0099] K3) Plants of the genus Glycine (e.g., soybean).
[0100] Herein, the target plant may be a plant containing the genes encoding the proteins GmERFA and GmERFB.
[0101] Herein, seed weight includes single-seed weight, 100-seed weight, or 1000-seed weight.
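Because single-seed, 100-seed, and 1000-seed weight are used interchangeably, a short conversion sketch may help. The Python below assumes the weight of a counted seed sample is scaled linearly to 100 or 1000 seeds; the function names are hypothetical.

```python
# Hypothetical helpers for illustration.

def hundred_seed_weight(total_weight_g: float, seed_count: int) -> float:
    """Scale the mean single-seed weight of a counted sample to 100 seeds (grams)."""
    if seed_count <= 0:
        raise ValueError("seed_count must be positive")
    return 100.0 * total_weight_g / seed_count


def thousand_seed_weight(total_weight_g: float, seed_count: int) -> float:
    """Scale the mean single-seed weight of a counted sample to 1000 seeds (grams)."""
    return 10.0 * hundred_seed_weight(total_weight_g, seed_count)
```

For instance, a hypothetical sample of 50 seeds weighing 9.0 g corresponds to a 100-seed weight of 18.0 g and a 1000-seed weight of 180.0 g.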
[0102] Herein, plant yield may refer to the yield of plant seeds.
[0103] The method for cultivating transgenic plants according to the present invention may further include crossing a transgenic plant obtained by any of the methods described above with a plant to be improved to obtain progeny transgenic plants; the progeny transgenic plants are substantially identical in phenotype to the transgenic plants.
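When the knockout alleles are transferred by crossing, the F1 is heterozygous at both loci, and progeny homozygous for both knockouts must be selected in later generations. The Python below is a minimal sketch of the expected F2 fraction from selfing the F1, assuming the GmERFA and GmERFB loci segregate independently with equal gamete transmission (an assumption; their chromosomal positions are not stated here). Linked loci would give a different fraction. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def f2_fraction_homozygous_mutant(n_loci: int = 2) -> Fraction:
    """Fraction of F2 plants homozygous for the knockout at every locus, from a fully heterozygous F1.

    Each gamete carries one allele per locus (0 = wild type, 1 = knockout); all gametes are
    assumed equally likely, which holds for unlinked loci.
    """
    gametes = list(product((0, 1), repeat=n_loci))
    hits = sum(
        1
        for g1, g2 in product(gametes, repeat=2)
        if all(x == y == 1 for x, y in zip(g1, g2))
    )
    return Fraction(hits, len(gametes) ** 2)
```

Under these assumptions, one F2 plant in 16 is expected to be a double homozygous knockout.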
[0104] In this document, the term "transgenic plant" is understood to include not only the first-generation transgenic plants obtained by knocking out the GmERFA and GmERFB genes in the target plant, but also their progeny. The transgenic plant includes seeds, callus tissue, intact plants, and cells.
[0105] Herein, “GmERFA / GmERFB gene”, “GmERFA / GmERFB”, and “GmERF gene” refer to the GmERFA gene and the GmERFB gene.
[0106] Herein, “GmERFA / GmERFB protein”, “GmERFA / GmERFB”, and “GmERF” refer to the GmERFA protein and the GmERFB protein.
[0107] This invention uses CRISPR / Cas9 technology to knock out two soybean GmERF genes that share approximately 91% homology, namely GmERFA and GmERFB, and successfully created the homozygous GmERF knockout plants erfab5, erfab6, and erfab7. Experimental results show that knocking out the GmERF genes significantly increases soybean seed size and seed weight: the 100-seed weight of the erfab5, erfab6, and erfab7 mutants is significantly higher than that of the recipient controls Jack and Null. Statistical analysis indicates that GmERFA / GmERFB negatively regulate soybean seed size and weight, and that reducing the expression of GmERFA / GmERFB can significantly increase soybean seed size and weight.
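The statistical test behind the comparison above is not specified. As one common choice for comparing 100-seed weights between a mutant line and a control without assuming equal variances, Welch's t statistic and its Welch-Satterthwaite degrees of freedom can be sketched in Python as follows; `welch_t` is a hypothetical name, and no measured values from the patent are used. For a p-value, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test.

```python
import math
from statistics import mean, variance


def welch_t(sample_a: list[float], sample_b: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two independent samples."""
    na, nb = len(sample_a), len(sample_b)
    if na < 2 or nb < 2:
        raise ValueError("each sample needs at least two observations")
    va = variance(sample_a) / na  # squared standard error of mean A
    vb = variance(sample_b) / nb  # squared standard error of mean B
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df
```

A call would look like `welch_t(mutant_weights, control_weights)`, where both names are placeholders for per-plant 100-seed weights.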
[0108] This invention discloses for the first time the application of proteins GmERFA / GmERFB and their encoding genes in regulating plant seed size and weight. By downregulating the content and / or activity of proteins GmERFA and GmERFB (e.g., knocking out or silencing the GmERFA and GmERFB genes), plant seed size and / or seed weight can be increased, thereby increasing plant yield (including seed yield). Those skilled in the art can also reasonably expect that upregulating the content and / or activity of the aforementioned proteins GmERFA and GmERFB (e.g., overexpressing the GmERFA and GmERFB genes) can reduce plant seed size and / or weight. This invention provides excellent genetic resources for soybean breeding, opens up new areas for the application of the GmERF gene, and has broad application prospects and significant importance for breeding new soybean varieties and promoting the commercialization of soybean breeding.
[0109] Terminology Definition
[0110] In this invention, unless otherwise stated, the scientific and technical terms used herein have the meanings commonly understood by those skilled in the art. Furthermore, to better understand this invention, definitions and explanations of relevant terms are provided below.
[0111] The term "expression cassette" generally refers to a nucleic acid construct containing sufficient nucleic acid elements to express a target gene. A typical expression cassette includes a promoter, a multiple cloning site (MCS), and a terminator. Expression cassettes may also include the target gene, marker genes (such as TK, DHFR, CAT, and NEO genes), ribosome recognition and binding sites (SDs), transcription factor binding sites (TFBSs), enhancers, silencers, repressors, introns, poly(A) signal sequences, and / or mRNA splicing signal sequences. Elements within an expression cassette can be directly linked or indirectly linked through adapters.
[0112] The term "vector" generally refers to a nucleic acid vehicle capable of delivering exogenous DNA or a target gene into host cells for amplification and / or expression. The vector can be a cloning vector or an expression vector. Vectors can be introduced into host cells through transformation, transduction, or transfection, allowing the genetic material they carry to be amplified and / or expressed within the host cells. Those skilled in the art can select appropriate vectors based on the purpose of genetic engineering and the properties of the recipient cells. The vectors include, but are not limited to: plasmids, phages (such as λ phage or M13 phage), cosmids, phagemids, shuttle vectors (such as yeast expression vectors), Ti plasmids, artificial chromosomes (such as yeast artificial chromosomes (YAC), bacterial artificial chromosomes (BAC), P1-derived artificial chromosomes (PAC), or transformation-competent artificial chromosomes (TAC)), and viral vectors (such as baculovirus vectors, retroviruses (including lentiviruses), adenoviruses, adeno-associated viruses, poxviruses, papillomaviruses, polyomaviruses (such as SV40), and herpesviruses (such as herpes simplex virus)). A vector may contain multiple elements controlling expression, including but not limited to promoter sequences, transcription initiation sequences, enhancer sequences, selection elements, and reporter genes. Additionally, the vector may contain an origin of replication.
[0113] The term "microorganism" typically includes bacteria, viruses, fungi, actinomycetes, rickettsiae, mycoplasma, chlamydia, spirochetes, algae, etc. For example, the bacteria can be from genera such as *Escherichia* sp. (e.g., *Escherichia coli*), *Erwinia* sp., *Agrobacterium* sp. (e.g., *Agrobacterium tumefaciens*), *Flavobacterium* sp., *Alcaligenes* sp., *Pseudomonas* sp., and *Bacillus* sp. (e.g., *Bacillus*). The viruses can include rotaviruses, baculoviruses, retroviruses (e.g., lentiviruses), adenoviruses, adeno-associated viruses, poxviruses, papillomaviruses, influenza viruses, polyomaviruses (e.g., SV40), and herpesviruses (e.g., herpes simplex virus). The fungi may originate from genera such as *Saccharomyces* sp. (e.g., *Saccharomyces cerevisiae*, *Methanolac*, *Pichia pastoris*), *Fusarium* sp., *Rhizoctonia* sp., *Verticillium* sp., *Penicillium* sp., *Aspergillus* sp., and *Cephalosporium* sp. The actinomycetes may originate from genera such as *Streptomyces* sp. The algae may originate from the phylum Cyanophyta (e.g., cyanobacteria) or from genera such as *Fucus* sp., *Achnanthes* sp., *Amphiprora* sp., *Amphora* sp., *Ankistrodesmus* sp., *Asteromonas* sp., and *Boekelovia* sp.
[0114] The term "host cell," also known as the recipient cell, generally refers to any type of cell into which a vector can be introduced, such as plant and animal cells. The term "host cell" refers not only to the specific recipient cell but also to its progeny, which, owing to natural, accidental, or intentional mutations and/or alterations, may not be identical to the original parent cell but are still included within the scope of the host cell. Suitable host cells are those known in the art, including but not limited to: plant cells such as Arabidopsis thaliana, tobacco (Nicotiana tabacum), maize (Zea mays), rice (Oryza sativa), and wheat (Triticum aestivum); and animal cells such as mammalian cells (e.g., Chinese hamster ovary cells (CHO cells), the CHO subline CHO-K1, African green monkey kidney cells (Vero cells), SV40-transformed African green monkey kidney cells (COS cells), baby hamster kidney cells (BHK cells), mouse mammary tumor cells (C127 cells), human embryonic kidney cells (HEK293 cells), human HeLa cells, fibroblasts, bone marrow cell lines, T cells, or NK cells), avian cells (e.g., chicken or duck cells), amphibian cells (e.g., Xenopus laevis or Andrias davidianus cells), fish cells (e.g., grass carp, carp, rainbow trout, or catfish cells), and insect cells (e.g., Sf21, Sf-9, or Hi-5 cells).
[0115] The term "recombinant vector" generally refers to a recombinant DNA molecule constructed by linking a foreign target gene to a vector in vitro. It can be constructed in any suitable way, as long as the constructed recombinant vector can carry the foreign target gene into the recipient cell and provide the foreign target gene with the ability to replicate, integrate, amplify and / or express in the recipient cell.
[0116] The term "recombinant microorganism" generally refers to a recombinant microorganism whose genes have been manipulated and modified to obtain a functionally altered microorganism. This can be achieved by introducing a foreign target gene or recombinant vector into the target microorganism, or by directly editing the endogenous genes of the target microorganism.
[0117] The term "recombinant host cell" generally refers to a recombinant host cell whose genes have been manipulated and modified to obtain a recombinant host cell with altered function. This can include introducing a foreign target gene or recombinant vector into the host cell, or directly editing the host cell's endogenous genes.
[0118] The term "linkage" generally refers to the association of two or more molecules. Linkages can be covalent or non-covalent. The linkages described herein can be direct peptide bonds or linkages via linkers (connectors).
[0119] The term "identity" generally refers to the degree to which two (nucleotide or amino acid) sequences have identical residues at the same positions in an alignment, and is usually expressed as a percentage. The identity described herein can refer to the identity of an amino acid sequence or a nucleotide sequence. Two completely identical sequences have 100% identity. Those skilled in the art will recognize that the identity of an amino acid or nucleotide sequence can be determined using online search tools, such as the BLAST page on the NCBI homepage. For example, in Advanced BLAST 2.1, the identity of an amino acid sequence can be calculated by selecting blastp as the program, setting the Expect value to 10, setting all filters to OFF, using BLOSUM62 as the matrix, setting the Gap existence cost, Per residue gap cost, and Lambda ratio to 11, 1, and 0.85 (default values), and performing a search to obtain the identity value (%). Alternatively, sequence analysis software such as CLC Main Workbench or MegAlign™ can be used. The determination can also be performed, for example, with the BLAST program using default parameters, especially BLASTP or TBLASTN. The 75% or higher identity mentioned herein can mean at least 75%, 80%, 85%, 90%, or 95% or higher. The 80% or higher identity mentioned herein can mean at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% or higher.
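Once two sequences have been aligned (by BLAST, MegAlign or any other aligner), the identity percentage itself is a simple count. The sketch below is illustrative only: it assumes the alignment is already given as two equal-length strings with "-" for gaps, and it divides identical columns by the total number of alignment columns, similar to how BLAST reports "Identities". The function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two sequences that are ALREADY aligned.

    Assumption: gaps are written as '-', and identity is computed as
    identical columns / total alignment columns (gap columns included).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"  # a gap aligned to a gap is not a match
    )
    return 100.0 * matches / len(aligned_a)


# Toy alignment (hypothetical sequences): 5 identical columns out of 7.
print(f"{percent_identity('MKT-LLV', 'MKTALIV'):.1f}%")
```

Other conventions (for example, excluding gap columns from the denominator) give different numbers for the same alignment, so the convention should be stated alongside any reported identity value.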
[0120] The term "conservative substitution" generally refers to the replacement of one amino acid residue with another whose side chain has similar physicochemical properties. For example, conservative substitutions can occur among residues with hydrophobic side chains (e.g., Met, Ala, Val, Leu, and Ile), among residues with neutral hydrophilic side chains (e.g., Cys, Ser, Thr, Asn, and Gln), among residues with acidic side chains (e.g., Asp and Glu), among residues with basic side chains (e.g., His, Lys, and Arg), or among residues with aromatic side chains (e.g., Trp, Tyr, and Phe). It is known in the art that conservative substitutions generally do not cause significant changes in protein conformation and structure and essentially do not alter the protein's biological activity. Those skilled in the art can readily design conservative substitutions in a protein sequence that are expected to have minimal or no effect on protein structure or function.
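The side-chain groups listed above can be turned into a simple lookup for screening point substitutions between a reference protein and a variant. This is a minimal sketch, assuming exactly the five groups named in the paragraph (one-letter codes) and treating residues not listed there, such as Gly or Pro, as belonging to no group; the helper names are hypothetical, and real substitution scoring usually relies on a matrix such as BLOSUM62 instead.

```python
# Side-chain groups exactly as listed in paragraph [0120] (one-letter codes).
CONSERVATIVE_GROUPS = {
    "hydrophobic": set("MAVLI"),
    "neutral_hydrophilic": set("CSTNQ"),
    "acidic": set("DE"),
    "basic": set("HKR"),
    "aromatic": set("WYF"),
}


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues differ but fall in the same listed group."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # identical residues: no substitution at all
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS.values())


def classify_substitutions(wild_type: str, variant: str) -> list[tuple[int, str, str, bool]]:
    """List (1-based position, wt residue, new residue, conservative?) for each mismatch."""
    if len(wild_type) != len(variant):
        raise ValueError("sequences must be the same length (no indels)")
    return [
        (i + 1, a, b, is_conservative(a, b))
        for i, (a, b) in enumerate(zip(wild_type.upper(), variant.upper()))
        if a != b
    ]


print(classify_substitutions("MKLDG", "MRLEP"))
```

In the usage line, K→R and D→E are flagged as conservative, while G→P is not, because neither residue appears in the groups given in the text.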
[0121] The term "introduction" generally refers to the transfer of a foreign gene into a recipient cell, such as a eukaryotic or prokaryotic recipient cell. There is no particular limitation on the method of introduction; any known transformation method that can transfer the target gene (such as the DNA molecule of this invention) into the recipient cell is acceptable. Methods of introduction may include any of the following: (1) introducing the target gene, or a recombinant vector containing it, into host bacteria by chemical transformation (such as Ca²⁺-induced transformation, polyethylene glycol-mediated transformation, or metal cation-mediated transformation) or physical transformation (such as electroporation); (2) transducing the target gene into host bacteria by bacteriophage transduction; (3) transferring the target gene into plant recipient cells by physical or chemical methods, such as the gene gun method (also known as microparticle bombardment or biolistics), chemical stimulation, electroshock, liposome-mediated transfer, microinjection, laser microbeam, the pollen tube pathway method, ultrasound, the air gun method, and the eddy current method; (4) transforming the target gene into plant recipient cells using vectors, such as Agrobacterium-mediated transformation with Ti plasmid vectors (including Ti plasmid-derived co-integrate and binary vector systems) or plant virus vector-mediated transformation; (5) transforming the target gene into isolated animal cells (transfection) by calcium phosphate coprecipitation, cationic polymers (such as DEAE-dextran transfection), cationic liposomes, electroporation (electrotransfection), microinjection, the gene gun method, or virus-mediated methods (such as retrovirus, adenovirus, or lentivirus infection); and
(6) transforming the target gene into animal cells in vivo by microinjection, retroviral vector methods, somatic cell nuclear transfer, sperm vector methods, or embryonic stem cell methods, to further prepare transgenic animals.
[0122] The term "gene knockdown" generally refers to techniques that reduce or silence gene expression at the post-transcriptional or translational level without altering the gene's DNA sequence. Gene knockdown includes techniques such as RNA interference, Morpholino interference, antisense nucleic acid technology, and ribozyme technology.
[0123] The term "gene editing" generally refers to technologies that alter specific gene sequences in any cell, including somatic cells, producing base deletions, duplications, insertions, frameshift mutations, or replacement or knockout of target genes. This allows substitution, deletion, splicing, and single-base changes in the genome sequence; in essence, it is the technology of "editing" the genome or the sequence of a specific gene at will. Gene editing includes zinc finger nuclease (ZFN) technology, TALEN gene editing technology, and CRISPR gene editing technology.
[0124] The term "gene knockout" generally refers to a technique that uses a foreign mutated gene to replace an endogenous normal homologous gene through homologous recombination, thereby inactivating the endogenous gene. This includes complete gene knockout (e.g., complete mutation of the target gene based on substitution or insertion targeting vectors) and conditional gene knockout (e.g., tissue-specific knockout based on the Cre-LoxP recombinase system or the FLP-FRT recombinase system).
[0125] The term "RNA interference (RNAi)" generally refers to the technique of using double-stranded RNA (dsRNA) to induce degradation of the mRNA of a target gene that is homologous and complementary to it, thereby silencing gene expression through post-transcriptional gene silencing (PTGS). Long dsRNA can be cleaved by the cellular enzyme Dicer into shorter dsRNA fragments, known as small interfering RNA (siRNA), and the siRNA mediates mRNA cleavage. In a broader sense, RNA interference also includes transcriptional gene silencing (TGS) induced in gene regulatory regions. This silencing involves DNA methylation rather than mRNA degradation, and the siRNA used acts directly on the regulatory regions of the gene rather than on the coding regions. RNA interference can also include translational gene silencing; for example, microRNA (miRNA, a single-stranded RNA molecule) silences gene expression mainly by blocking mRNA translation and reducing accumulation of the protein product of the target mRNA.
[0126] The term "Morpholino interference technology" typically refers to the use of morpholino oligonucleotides, in which the five-carbon sugar ring of conventional nucleotides is replaced by a morpholine ring and the phosphodiester linkage is replaced by a phosphorodiamidate linkage. The resulting molecule carries no charge, cannot be recognized or degraded by RNases and DNases, and is extremely stable. Its principle is similar to that of antisense nucleic acid technology: it binds the target mRNA by base complementarity, blocking other molecules and proteins from binding that mRNA sequence and ultimately preventing the target mRNA from being translated into protein.
[0127] The term "antisense nucleic acid technology" generally refers to technology that exploits the ability of antisense RNA to bind complementarily to a specific mRNA, thereby inhibiting processing and translation of that mRNA. Antisense RNA, or a gene encoding it, is synthesized artificially and introduced into cells to suppress expression of specific genes. Antisense nucleic acid technology mainly includes antisense RNA (asRNA) and antisense oligonucleotides (AON).
[0128] The term "ribozyme technology" generally refers to the technique of using ribozymes to cleave and degrade target RNA molecules. Ribozymes are a class of RNA molecules with catalytic activity that can specifically bind to and cleave target RNA molecules, thereby inhibiting expression of target genes. Ribozymes include hammerhead ribozymes, hairpin ribozymes, hepatitis delta virus (HDV) ribozymes, VS (Varkud satellite) ribozymes, and group I intron ribozymes, among others.
[0129] The term "Cas9 protein" generally refers to a Cas endonuclease of the type II CRISPR system that forms a complex with crRNA and tracrRNA or with guide RNA, used to specifically recognize and cleave all or part of a DNA target sequence. Cas9 proteins have two distinct domains: the HNH domain and the RuvC domain. The HNH domain is responsible for cleaving the DNA strand complementary to the crRNA (or gRNA) (the target strand), while the RuvC domain is responsible for cleaving the non-complementary strand (the non-target strand). The Cas9 protein is not limited to a specific type of protein, as long as it can interact with sgRNA (gRNA). The Cas9 protein can be derived from bacterial species.
[0130] The term "sgRNA (single-guide RNA)" generally refers to a single RNA structure created by artificially modifying a crRNA / tracrRNA complex (gRNA) with a dual RNA structure, linking the crRNA and tracrRNA directly (or through a linker). sgRNA is a component of the CRISPR-Cas9 system, responsible for guiding the Cas9 protein to recognize and cleave target nucleic acid molecules. In practical gene editing applications, sgRNA can be synthesized directly or obtained through plasmid expression or in vitro transcription. sgRNA includes a recognition region and a scaffold region. The scaffold region, as known to those skilled in the art, is responsible for binding to the Cas protein, while the recognition region is responsible for binding to the target site of the target gene, guiding the Cas protein to the target site. In this invention, sgRNA and gRNA are used interchangeably.
[0131] The term "explant" generally refers to a part of a plant used as in vitro culture material in plant tissue culture, which, after appropriate treatment and under suitable conditions, can regenerate into a whole plant. In practice, those skilled in the art select suitable explants for transformation based on different plants. Explants include seeds, roots, leaves, petioles, cotyledons, cotyledonary petioles, hypocotyls, stem segments, shoot apical meristems, epidermal parenchyma cells, tubers, stolons, embryogenic suspension cells, and protoplasts, etc.
[0132] The term "callus" generally refers to the new tissue that forms on the surface of a wound after a localized injury to the original plant. It consists of living parenchyma cells and can originate from living cells in various tissues within any organ of the plant. In plant tissue culture, it can refer to a cluster of disordered, rapidly dividing parenchyma cells formed from an explant. Cultivating callus on a suitable culture medium can induce the formation of a whole plant.
[0133] The term "comprising" is not intended to be restrictive but inclusive, implying that elements other than those listed may be present; it can be interpreted as "including but not limited to". The term "comprising" also encompasses "consisting of" and "consisting essentially of". In this document, the terms "comprising" and "including" are used interchangeably.
Attached Figure Description
[0134] Figure 1 Preparation of the GmERFA / GmERFB gene-editing mutant erfab5. The two target sequences (sgRNA1 and sgRNA2) of GmERFA and GmERFB, their specific locations on the GmERFA / GmERFB genes, and gene-editing verification of the erfab5 mutant are shown.
[0135] Figure 2 Preparation of the GmERFA / GmERFB gene-editing mutant erfab6. The two target sequences (sgRNA1 and sgRNA2) of GmERFA and GmERFB, their specific locations on the GmERFA / GmERFB genes, and gene-editing verification of the erfab6 mutant are shown.
[0136] Figure 3 Preparation of the GmERFA / GmERFB gene-editing mutant erfab7. The two target sequences (sgRNA1 and sgRNA2) of GmERFA and GmERFB, their specific locations on the GmERFA / GmERFB genes, and gene-editing verification of the erfab7 mutant are shown.
[0137] Figure 4 Molecular detection of GmERFA / GmERFB gene mutants erfab5, erfab6, and erfab7.
[0138] Figure 5 Statistical analysis of seed phenotypes and 100-seed weights for GmERFA / GmERFB gene mutants erfab5, erfab6, and erfab7.
Detailed Implementation
[0139] The present invention will now be described in further detail with reference to specific embodiments. The given embodiments are merely illustrative of the invention and not intended to limit its scope. The embodiments provided below can serve as a guide for further improvements by those skilled in the art and do not constitute a limitation on the invention in any way.
[0140] Unless otherwise specified, the experimental methods used in the following examples are conventional methods, performed according to the techniques or conditions described in the literature in this field or according to the product instructions. Unless otherwise specified, the materials and reagents used in the following examples are commercially available.
[0141] The soybean variety Jack is described in the literature: "Jin-Song Zhang, et al., A transcriptional regulatory module controls lipid accumulation in soybean, New Phytologist (2021), 231:661-678".
[0142] The pCBSG015 vector used in the following examples was provided by Wimi Biotechnology Co., Ltd., catalog number wimi-pCXB053.
[0143] Example 1: Obtaining the soybean GmERFA / GmERFB gene
[0144] GmERFA belongs to the AP2 / ERF family of transcription factors, which is found only in plants. The AP2 / ERF family is a large family named for its AP2 / ERF domain, which consists of 60-70 amino acids. AP2 / ERF transcription factors are involved in various hormone signaling pathways, including those involving salicylic acid, jasmonic acid, ethylene, and abscisic acid. Through extensive and in-depth research, the inventors of this application screened and obtained candidate genes regulating seed weight during the analysis of soybean transcriptomes. Decreased expression of GmERFA (Glyma.02G016100) led to an increase in soybean seed weight. It is hypothesized that GmERFA negatively regulates soybean seed weight.
[0145] Primers were designed at both ends of the coding region based on the Williams 82 (W82) reference genome sequence (genome assembly Glycine_max_v4.0, GenBank: GCA_000004515.5, updated March 10, 2021). The primer sequences are as follows:
[0146] Upstream primer GmERFA-up: 5'-ATGTGTGGCGGTGCCAT-3',
[0147] Downstream primer GmERFA-dp: 5'-CTAATCGAAACTCCAGAGATCCC-3'.
[0148] RNA was extracted from seeds of the soybean variety Jack and reverse transcribed into cDNA using Toyobo ReverTra Ace reverse transcriptase, and PCR amplification was performed using the cDNA as the template and GmERFA-up and GmERFA-dp as primers. The GmERFA gene was amplified from total soybean RNA by RT-PCR as follows: Jack leaves were ground in liquid nitrogen, suspended in 4 mol/L guanidine thiocyanate, and extracted with acidic phenol and chloroform. The supernatant was precipitated with absolute ethanol, and the pellet was dissolved in water to obtain total RNA. 1 μg of total RNA was reverse transcribed using a Thermo Fisher Scientific reverse transcription kit according to the manufacturer's instructions, and the resulting cDNA was used as the template for PCR amplification.
[0149] The 50 μl PCR reaction system consisted of 1 μl single-stranded cDNA (0.05 μg), 1.5 μl of the above primers (10 μM), 25 μl 2× PCR buffer, 10 μl dNTPs (10 mM), and 1 U KOD DNA polymerase, made up to 50 μl with ultrapure water. The reaction was performed on a PE9600 PCR instrument with the following program: initial denaturation at 94 °C for 5 min; 30–32 cycles of 98 °C for 1 min, 58 °C for 1 min, and 68 °C for 1 min; final extension at 68 °C for 10 min; and storage at 4 °C. Approximately 1.3 kb of PCR product was obtained. The amplified product was detected by 1% agarose gel electrophoresis. A DNA fragment of approximately 900 bp was recovered using an agarose gel recovery kit (TIANGEN), cloned into the pMD-18 vector (TaKaRa, catalog number 6011), and transformed into competent E. coli cells, and positive clones were screened. Plasmids were extracted and sequenced. Sequencing showed that the primer pair amplified two homologous genes, with sequences of 903 bp (Glyma.10G016100) and 915 bp (Glyma.10G016500); these were the GmERFA gene and the GmERFB gene, respectively, and the proteins they encode were named the GmERFA protein and the GmERFB protein, respectively.
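The final concentrations in a reaction set up this way follow from C1·V1 = C2·V2, and the water volume is whatever remains of the 50 μl. The sketch below uses the volumes given above but rests on two assumptions that the text does not state: that 1.5 μl is added of each primer, and that the KOD polymerase stock is 1 U/μl (so 1 U occupies 1 μl). Both are marked as assumptions in the code; the names are hypothetical.

```python
TOTAL_UL = 50.0

# (component, stock concentration, volume in µl). Units follow the text:
# primers in µM, buffer in "x", dNTPs in mM, polymerase in U/µl.
COMPONENTS = [
    ("cDNA template (0.05 µg)", None, 1.0),
    ("GmERFA-up primer, 10 µM", 10.0, 1.5),  # assumption: 1.5 µl of EACH primer
    ("GmERFA-dp primer, 10 µM", 10.0, 1.5),
    ("2x PCR buffer", 2.0, 25.0),
    ("dNTPs, 10 mM", 10.0, 10.0),
    ("KOD polymerase, assumed 1 U/µl", 1.0, 1.0),  # hypothetical stock concentration
]


def final_concentration(stock: float, volume_ul: float, total_ul: float = TOTAL_UL) -> float:
    """C1 * V1 = C2 * V2, so C2 = C1 * V1 / V2 (same units as the stock)."""
    return stock * volume_ul / total_ul


def water_to_add(components: list, total_ul: float = TOTAL_UL) -> float:
    """Ultrapure water needed to bring the reaction to total_ul."""
    used = sum(volume for _, _, volume in components)
    if used > total_ul:
        raise ValueError(f"components ({used} µl) exceed the {total_ul} µl total")
    return total_ul - used


for name, stock, volume in COMPONENTS:
    if stock is not None:
        print(f"{name}: {volume} µl -> final {final_concentration(stock, volume):g}")
print(f"ultrapure water to add: {water_to_add(COMPONENTS):g} µl")
```

Under these assumptions each primer ends up at 0.3 μM, the buffer at 1×, and 10 μl of water is needed; if the 1.5 μl instead refers to both primers combined, the primer concentrations halve and the water volume rises accordingly.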
[0150] The genomic nucleotide sequence of the GmERFA gene is shown in SEQ ID NO:5; the GmERFA gene contains two exon sequences, exon-1 is located at positions 165-336 of SEQ ID NO:5, exon-2 is located at positions 437-1167 of SEQ ID NO:5, and the intron sequence is located at positions 337-436 of SEQ ID NO:5.
[0151] The coding sequence (CDS) of the GmERFA gene is shown in SEQ ID NO:1 (903bp); the GmERFA gene encodes the GmERFA protein, and the amino acid sequence of the GmERFA protein is shown in SEQ ID NO:3.
[0152] The genomic nucleotide sequence of the GmERFB gene is shown in SEQ ID NO:6; the GmERFB gene contains two exon sequences, exon-1 is located at positions 435-606 of SEQ ID NO:6, exon-2 is located at positions 688-1430 of SEQ ID NO:6, and introns are located at positions 607-687 of SEQ ID NO:6.
[0153] The coding sequence (CDS) of the GmERFB gene is shown in SEQ ID NO:2 (915bp); the GmERFB gene encodes the GmERFB protein, and the amino acid sequence of the GmERFB protein is shown in SEQ ID NO:4.
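The exon coordinates above can be checked against the stated CDS lengths, and they also allow CDS positions to be converted to positions in the genomic sequences. This is a minimal sketch, assuming that exon-1 begins at the start codon and that both exons lie on the forward strand of SEQ ID NO:5 and SEQ ID NO:6; the helper names `cds_length` and `cds_to_genome` are hypothetical.

```python
# Coding-exon coordinates (1-based, inclusive) on the genomic sequences,
# taken from paragraphs [0150] (SEQ ID NO:5) and [0152] (SEQ ID NO:6).
EXONS = {
    "GmERFA": [(165, 336), (437, 1167)],
    "GmERFB": [(435, 606), (688, 1430)],
}


def cds_length(exons: list[tuple[int, int]]) -> int:
    """Total CDS length = sum of the exon lengths."""
    return sum(end - start + 1 for start, end in exons)


def cds_to_genome(exons: list[tuple[int, int]], cds_pos: int) -> int:
    """Map a 1-based CDS position to a 1-based genomic position.

    Assumes exon-1 starts at the ATG and exons are on the + strand in order.
    """
    if cds_pos < 1:
        raise ValueError("CDS positions are 1-based")
    remaining = cds_pos
    for start, end in exons:
        length = end - start + 1
        if remaining <= length:
            return start + remaining - 1
        remaining -= length  # skip this exon (and the following intron)
    raise ValueError(f"position {cds_pos} lies beyond the CDS")


for gene, exons in EXONS.items():
    print(gene, "CDS length:", cds_length(exons))  # 903 and 915
print(cds_to_genome(EXONS["GmERFA"], 191))  # first base of T2 in SEQ ID NO:5
```

With these exon boundaries the computed CDS lengths are 903 bp and 915 bp, matching SEQ ID NO:1 and SEQ ID NO:2, and the mapping reproduces the genomic coordinates given later for target sites T1 and T2 and for the deletions in the mutants.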
[0154] Example 2: Construction of GmERFA / GmERFB gene knockout mutant soybean
[0155] This example uses soybean Jack as the recipient and employs CRISPR/Cas9 technology to knock out two soybean GmERF genes that share approximately 91% homology, namely the GmERFA gene and the GmERFB gene. The specific steps are as follows:
[0156] 1. Construction of CRISPR / Cas9 gene editing vector
[0157] Target sequence selection: The high-throughput CRISPR-Cas9 target design program developed by Wemi Technology was used. The target design principles of this program are as follows: 1) The knockout site should be located in the coding sequence (CDS) region and preferably at the protein's front end or in an important functional domain; 2) It should cover a higher proportion of transcripts; 3) There should be no off-target effects or off-target effects should be located in intergenic regions; 4) Targets with higher editing efficiency should be preferred; 5) The sequence should have a relatively balanced GC content and be less prone to secondary structure formation. The successful application of this program in soybean whole-genome target design has proven its feasibility. A two-gene, two-target knockout approach was adopted.
[0158] The two designed sgRNA target sequences are shown below:
[0159] The target sequence of sgRNA1 (target site T1): 5'-CCGCCGTGGAGGCCGCCGCCTCA-3' (SEQ ID NO:7),
[0160] The target sequence of sgRNA2 (target site T2): 5'-ATAAGCCGGTGAAGAGGCAGAGG-3' (SEQ ID NO:8).
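Each 23-nt target above is written on the coding strand of the genes, so the position of the PAM tells which strand the sgRNA spacer comes from. The sketch below assumes a SpCas9-type 5'-NGG-3' PAM (the section does not name the Cas9 variant): a site ending in GG carries its PAM on the coding strand, while a site beginning with CC (as T1 does) carries it on the opposite strand, so the spacer is the reverse complement of the last 20 nt. It also reports the spacer GC content, one of the design criteria listed in [0157]. All function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

T1 = "CCGCCGTGGAGGCCGCCGCCTCA"  # SEQ ID NO:7
T2 = "ATAAGCCGGTGAAGAGGCAGAGG"  # SEQ ID NO:8


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def spacer_and_strand(site: str) -> tuple[str, str]:
    """Return (20-nt spacer, PAM strand) for a 23-nt site on the coding strand.

    Assumes a 5'-NGG-3' PAM: '...NGG' means PAM on '+', 'CCN...' means PAM on '-'.
    """
    s = site.upper()
    if len(s) != 23:
        raise ValueError("expected a 23-nt site (20-nt spacer + 3-nt PAM)")
    if s.endswith("GG"):
        return s[:20], "+"
    if s.startswith("CC"):
        return reverse_complement(s[3:]), "-"
    raise ValueError("no NGG PAM found on either strand")


def gc_fraction(seq: str) -> float:
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


for name, site in (("T1", T1), ("T2", T2)):
    spacer, strand = spacer_and_strand(site)
    print(name, strand, spacer, f"GC={gc_fraction(spacer):.0%}")
```

Under this assumption the T1 spacer (TGAGGCGGCGGCCTCCACGG) is taken from the non-coding strand and is relatively GC-rich (80%), whereas the T2 spacer (ATAAGCCGGTGAAGAGGCAG) is on the coding strand with 55% GC.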
[0161] The selected target sites T1 are located in the first exon region of the GmERFA and GmERFB genes, and T2 is located in the second exon region of the GmERFA and GmERFB genes. Specifically:
[0162] T1: 5'-CCGCCGTGGAGGCCGCCGCCTCA-3' (SEQ ID NO:7) targets sequences 36-58 of the GmERFA gene CDS sequence (SEQ ID NO:1) or 200-222 of the GmERFA genome sequence (SEQ ID NO:5); or sequences 36-58 of the GmERFB gene CDS sequence (SEQ ID NO:2) or 470-492 of the GmERFB genome sequence (SEQ ID NO:6).
[0163] T2: 5'-ATAAGCCGGTGAAGAGGCAGAGG-3' (SEQ ID NO:8) targets sequences 191-213 of the GmERFA gene CDS sequence (SEQ ID NO:1) or 455-477 of the GmERFA genome sequence (SEQ ID NO:5); or sequences 191-213 of the GmERFB gene CDS sequence (SEQ ID NO:2) or 706-728 of the GmERFB genome sequence (SEQ ID NO:6).
[0164] Promoter selection: the Arabidopsis thaliana AtU6 promoter was used to drive transcription of the sgRNAs for target sites T1 and T2.
[0165] Preparation of sgRNA expression cassettes containing the target sites: the pCBSG015 vector was linearized by Bsa I digestion. The T1 and T2 sequences were synthesized directly as primers, with 16-bp vector sequences added to both ends as homology arms (Sense-U6-T1, Sense-U6-T2), and their reverse complementary sequences (Anti-U6-T1, Anti-U6-T2) were also synthesized. Each pair was annealed to form a double strand and recombined with the linearized vector backbone by homologous recombination. The sequences are as follows:
[0166] Sense-U6-T1: 5'-ggcaccgagtcggtgcCCGCCGTGGAGGCCGCCGCCTCAgttgaacaacggaaac-3';
[0167] Anti-U6-T1: 5'-gtttccgttgttcaacTGAGGCGGCGGCCTCCACGGCGGgcaccgactcggtgcc-3';
[0168] Lowercase letters represent homologous arms, and uppercase letters represent T1 sequences.
[0169] Sense-U6-T2: 5'-ggcaccgagtcggtgcATAAGCCGGTGAAGAGGCAGAGGgttgaacaacggaaac-3';
[0170] Anti-U6-T2: 5'-gtttccgttgttcaacCCTCTGCCTCTTCACCGGCTTATgcaccgactcggtgcc-3';
[0171] Lowercase letters represent homologous arms, and uppercase letters represent T2 sequences.
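Because each Sense/Anti pair must anneal over its full length, the Anti oligo should be the exact reverse complement of the Sense oligo, which in turn is the 5' arm, the target sequence and the 3' arm joined together. The sketch below rebuilds both oligos from SEQ ID NO:7 and SEQ ID NO:8 and the 16-nt arms shown above, and offers a check that can be run on any synthesized pair before ordering; the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

# 16-nt homology arms (lowercase in [0166]-[0170]); targets from SEQ ID NO:7 and 8.
ARM_5 = "ggcaccgagtcggtgc"
ARM_3 = "gttgaacaacggaaac"
TARGETS = {"T1": "CCGCCGTGGAGGCCGCCGCCTCA", "T2": "ATAAGCCGGTGAAGAGGCAGAGG"}


def reverse_complement(seq: str) -> str:
    """Reverse complement that preserves upper/lower case (arms vs. target)."""
    return seq.translate(COMPLEMENT)[::-1]


def design_oligo_pair(target: str) -> tuple[str, str]:
    """Sense = arm5 + target + arm3; Anti = exact reverse complement of Sense."""
    sense = ARM_5 + target.upper() + ARM_3
    return sense, reverse_complement(sense)


def anneals_fully(sense: str, anti: str) -> bool:
    """True if the two oligos are perfect reverse complements (case-insensitive)."""
    return reverse_complement(sense).upper() == anti.upper()


for name, target in TARGETS.items():
    sense, anti = design_oligo_pair(target)
    print(f"Sense-U6-{name}: 5'-{sense}-3'")
    print(f"Anti-U6-{name}:  5'-{anti}-3'")
```

A single inserted or missing base in either oligo makes `anneals_fully` return False, which is a cheap way to catch transcription errors in oligo orders.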
[0172] Preparation of annealed AtU6-T1-gRNA and AtU6-T2-gRNA fragments: The synthesized Sense-U6-T1 and Anti-U6-T1, Sense-U6-T2 and Anti-U6-T2 sequences were annealed to form double strands. The reaction system was as follows: the synthesized sequences were dissolved in 75 mM NaCl solution to a final concentration of 0.2 nM / μl, and equal volumes of the forward and reverse strand solutions (Sense-U6-T1 and Anti-U6-T1, Sense-U6-T2 and Anti-U6-T2) were mixed. The mixture was heated in a 95℃ water bath for 5-10 min, followed by slow cooling to obtain the annealed AtU6-T1-gRNA and AtU6-T2-gRNA fragments.
[0173] Vector linearization by enzyme digestion: 1-2 μg of pCBSG015 plasmid, 5 μl of 10× CutSmart™ buffer (NEB), 1 μl of Bsa I restriction enzyme, and sterile double-distilled water to 50 μl were combined. The mixture was incubated in a 37 °C water bath for 30 min and then purified using the EZ-10 Column DNA Purification Kit (Shanghai Sangon Biotech). The purified DNA was dissolved in an appropriate amount of water to obtain the linearized pCBSG015 vector.
[0174] Homologous recombination of the target sgRNA expression cassettes with the pCBSG015 vector was performed using the EasyGeno Rapid Recombinant Cloning Kit (TianGen). The 10 μL reaction mixture consisted of 5 μL of 2× EasyGeno Assembly Mix buffer, 0.5 μL of linearized pCBSG015 vector, and 4.5 μL of the annealed AtU6-T1-gRNA and AtU6-T2-gRNA fragments, and was incubated at 50 °C for 15 min. The recombination product was transformed into E. coli DH5α competent cells, and plasmids were extracted from positive colonies. After sequence verification, the recombinant vector pCBSG015-sgRNA containing two editing target sites was obtained.
[0175] The recombinant vector pCBSG015-sgRNA is a recombinant plasmid obtained by replacing the fragment between 5'-ggcaccgagtcggtgc-3' and 5'-gttgaacaacggaaac-3' in the pCBSG015 vector with a DNA fragment in which SEQ ID NO:7 and SEQ ID NO:8 are directly linked, while keeping the other sequences of the pCBSG015 vector unchanged.
[0176] The recombinant vector pCBSG015-sgRNA carries the two editing target sites (SEQ ID NO:7 and SEQ ID NO:8) and the gene encoding the Cas9 protein. After the vector is introduced into the recipient, the two transcribed guide RNAs direct the Cas9 protein, through base pairing, to the target sequences adjacent to the PAMs in the recipient genome, namely in the GmERFA and GmERFB genes. The Cas9 protein then creates double-strand breaks at these target sites. During repair by the organism's own DNA damage repair mechanisms, mutations arise in the cleaved regions, leading to frameshifts or premature termination of translation of the coding genes and thereby knocking out the GmERFA and GmERFB genes.
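Where the double-strand break falls can be predicted from the target coordinates. The sketch below assumes SpCas9 behaviour, a blunt cut 3 nt upstream of the NGG PAM inside the protospacer (the section does not name the Cas9 variant), and uses the CDS positions of T1 (36-58, PAM on the opposite strand because the site begins with CCG) and T2 (191-213, PAM on the coding strand) given in [0162] and [0163]. The function name is hypothetical.

```python
def cas9_cut_site(site_start: int, pam_strand: str) -> tuple[int, int]:
    """Predicted blunt cut for a 23-nt site at site_start..site_start+22.

    Coordinates are 1-based on the coding strand. Assumes SpCas9, which cuts
    3 nt upstream of the PAM. Returns the two positions flanking the break.
    """
    if pam_strand == "+":    # protospacer at +0..+19, PAM (NGG) at +20..+22
        left = site_start + 16
    elif pam_strand == "-":  # CCN at +0..+2, protospacer at +3..+22
        left = site_start + 5
    else:
        raise ValueError("pam_strand must be '+' or '-'")
    return left, left + 1


# T1 and T2 start at the same CDS positions in GmERFA and GmERFB.
print("T1 cut between CDS positions", cas9_cut_site(36, "-"))   # (41, 42)
print("T2 cut between CDS positions", cas9_cut_site(191, "+"))  # (207, 208)
```

Under this assumption the T2 break lies between CDS positions 207 and 208, which borders each of the deletions described for the mutants erfab5, erfab6 and erfab7; this is consistent with those lesions arising from repair of a Cas9-induced break at T2.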
[0177] The recombinant vector pCBSG015-sgRNA was transformed into Agrobacterium EHA105 competent cells, resulting in recombinant Agrobacterium EHA105 / pCBSG015-sgRNA.
[0178] 2. Genetic transformation of soybeans
[0179] The soybean variety Jack was infected with the recombinant Agrobacterium EHA105 / pCBSG015-sgRNA prepared above to obtain soybean plants with GmERFA / GmERFB gene editing. T0 generation seeds were harvested, and T1 generation seeds were obtained by self-pollination of T0 generation seeds. T2 generation seedlings were obtained by planting T1 generation seeds and the following tests were performed.
[0180] 3. Screening for homozygous GmERFA / GmERFB gene mutations
[0181] Using the genomic DNA of the T2-generation seedlings obtained in step 2 as a template, mutations at the target sites were screened and identified with target-detection primers. Primers for target detection in the GmERFA gene were designed approximately 85 bp upstream of target sequence T1 and approximately 230 bp downstream of target sequence T2, giving an amplification product of approximately 635 bp. Primers for target detection in the GmERFB gene were designed approximately 218 bp upstream of target sequence T1 and approximately 100 bp downstream of target sequence T2, giving an amplification product of approximately 621 bp. The sequencing primers were F: 5'-CTACTCATTCCACACCCAACTTA-3' and R: 5'-GCGGCTTCTTCTGCAGTGT-3'.
[0182] After sequencing, the editing types at the CRISPR target sites were decoded using DSDecode (http://dsdecode.scgene.com/) and checked by manually reading the chromatogram peaks against the reference gene sequence. The editing type at each target sequence and in its upstream and downstream sequences was analyzed. The T2-generation gene-edited positive plants were then self-pollinated and propagated to obtain homozygous gene-edited mutant plants.
[0183] The gene editing patterns of the homozygous GmERFA / GmERFB mutants erfab5, erfab6, and erfab7, finally obtained by the above screening, are shown in Figure 1, Figure 2 and Figure 3.
[0184] In the homozygous GmERFA / GmERFB mutant erfab5, compared with the recipient soybean variety Jack, the GmERFA gene carries, on both homologous chromosomes, a 45-nucleotide deletion (5'-CAGAGGAAGAATCTCTACAGAGGGATTCGGCAGCGTCCGTGGGGC-3') at positions 208-252 of the CDS sequence (SEQ ID NO:1), corresponding to positions 472-516 of the genome sequence (SEQ ID NO:5). This large deletion removes 15 codons without shifting the reading frame (45 is a multiple of 3), thus knocking out the GmERFA gene. The GmERFB gene carries, on both homologous chromosomes, a 5-nucleotide deletion (5'-AGAGG-3') at positions 203-207 of the CDS sequence (SEQ ID NO:2), corresponding to positions 718-722 of the genome sequence (SEQ ID NO:6), which causes a frameshift and premature termination of GmERFB translation, thus knocking out the GmERFB gene.
[0185] In the homozygous GmERFA / GmERFB mutant erfab6, compared with the recipient soybean variety Jack, the GmERFA gene carries, on both homologous chromosomes, a 23-nucleotide deletion (5'-CAGAGGAAGAATCTCTACAGAGG-3') at positions 208-230 of the CDS sequence (SEQ ID NO:1), corresponding to positions 472-494 of the genome sequence (SEQ ID NO:5), which causes premature termination of GmERFA translation, thus knocking out the GmERFA gene. The GmERFB gene carries, on both homologous chromosomes, a deletion of the "G" at position 207 of the CDS sequence (SEQ ID NO:2), corresponding to position 722 of the genome sequence (SEQ ID NO:6), which causes premature termination of GmERFB translation, thus knocking out the GmERFB gene.
[0186] In the homozygous GmERFA / GmERFB mutant erfab7, compared with the recipient soybean variety Jack, the GmERFA gene carries, on both homologous chromosomes, a 12-nucleotide deletion (5'-GCAGAGGAAGAA-3') at positions 207-218 of the CDS sequence (SEQ ID NO:1), corresponding to positions 471-482 of the genome sequence (SEQ ID NO:5). This deletion does not shift the reading frame (12 is a multiple of 3), and knocks out the GmERFA gene. The GmERFB gene carries, on both homologous chromosomes, a 5-nucleotide deletion (5'-AGAGG-3') at positions 203-207 of the CDS sequence (SEQ ID NO:2), corresponding to positions 718-722 of the genome sequence (SEQ ID NO:6), which causes a frameshift and premature termination of GmERFB translation, thus knocking out the GmERFB gene.
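Whether a deletion in a coding sequence shifts the reading frame depends only on whether its length is a multiple of three. The sketch below tabulates the six deletions described for erfab5, erfab6 and erfab7 using their CDS coordinates. It is illustrative only: it cannot say where a premature stop codon arises downstream of a frameshift, because that requires the full CDS (SEQ ID NO:1 and NO:2), and it says nothing about whether an in-frame deletion abolishes protein function.

```python
# (mutant, gene, CDS start, CDS end) for the deletions reported in [0184]-[0186].
DELETIONS = [
    ("erfab5", "GmERFA", 208, 252),
    ("erfab5", "GmERFB", 203, 207),
    ("erfab6", "GmERFA", 208, 230),
    ("erfab6", "GmERFB", 207, 207),
    ("erfab7", "GmERFA", 207, 218),
    ("erfab7", "GmERFB", 203, 207),
]


def classify_deletion(start: int, end: int) -> tuple[int, str]:
    """Return (length, 'in-frame' or 'frameshift') for a deletion at start..end."""
    if end < start:
        raise ValueError("end must not precede start")
    length = end - start + 1
    return length, "in-frame" if length % 3 == 0 else "frameshift"


for mutant, gene, start, end in DELETIONS:
    length, kind = classify_deletion(start, end)
    print(f"{mutant} {gene}: {length}-nt deletion at CDS {start}-{end} -> {kind}")
```

All three GmERFB lesions (5 nt, 1 nt, 5 nt) shift the frame, as does the 23-nt GmERFA deletion in erfab6; the 45-nt and 12-nt GmERFA deletions in erfab5 and erfab7 keep the frame and remove 15 and 4 codons' worth of sequence, respectively.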
[0187] T3-generation seeds harvested from T2-generation plants of mutants erfab5, erfab6, and erfab7 were used in subsequent experiments.
[0188] Example 3: Phenotypic identification of GmERFA / GmERFB gene knockout mutants
[0189] 1. Detection of GmERFA / GmERFB gene expression levels in erfab5, erfab6, and erfab7
[0190] Total RNA was extracted from mid-development seeds of the recipient variety Jack, the empty-vector control Null, and the mutants erfab5, erfab6, and erfab7. After reverse transcription, the resulting cDNA was used as the template for quantitative real-time PCR to determine the expression levels of the GmERFA and GmERFB genes. The primers used were:
[0191] ERFA-qF: 5'-TATTTGGGTGGCACGGTAAT-3',
[0192] ERFA-qR: 5'-TCGTGGATTGTCCATCATAGTAC-3',
[0193] ERFB-F: 5'-GAACGGGTATCTGGGTGTTAC-3',
[0194] ERFB-R: 5'-AGAGATCCCCGACCAAGC-3'.
[0195] The soybean tubulin gene was used as the internal reference, with the following primers:
[0196] Primer-TF: 5'-TGGCCGTTACCTGACAGCAT-3',
[0197] Primer-TR: 5'-CTCGGAGGGATGTCACACAC-3'.
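The text does not state how relative expression was calculated from the real-time PCR data. A common approach with a single internal reference gene is the 2^-ΔCt method, and the sketch below assumes that method, assumes roughly 100% amplification efficiency for both the target and tubulin primer pairs, and uses made-up Ct values purely for illustration; the function names are hypothetical.

```python
import statistics


def relative_expression(ct_target: float, ct_reference: float) -> float:
    """2^-ΔCt, where ΔCt = Ct(target gene) - Ct(internal reference, here tubulin).
    Assumes both primer pairs amplify with ~100% efficiency (doubling per cycle)."""
    return 2.0 ** -(ct_target - ct_reference)


def summarize(ct_pairs):
    """Mean and sample standard deviation of 2^-ΔCt over replicate
    (target Ct, reference Ct) pairs; needs at least two replicates."""
    values = [relative_expression(t, r) for t, r in ct_pairs]
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    # Hypothetical Ct values for three replicates; these are NOT data from this document.
    pairs = [(25.0, 22.0), (24.0, 22.0), (26.0, 22.0)]
    mean, sd = summarize(pairs)
    print(f"relative expression = {mean:.4f} ± {sd:.4f}")
```

If primer efficiencies differ noticeably from 100%, an efficiency-corrected model would be more appropriate than plain 2^-ΔCt.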
[0198] As shown in Figure 4, the relative expression level of the GmERFA gene was 0.152±0.051 in Jack and 0.162±0.015 in Null, versus approximately 0.061±0.005, 0.073±0.007, and 0.054±0.005 in erfab5, erfab6, and erfab7, respectively, indicating that GmERFA expression was significantly reduced in the mutants. The relative expression levels of the GmERFB gene in Jack, Null, erfab5, erfab6, and erfab7 were 0.0062±0.0020, 0.0062±0.0015, 0.0037±0.0021, 0.0047±0.0008, and 0.0034±0.0002, respectively, indicating that GmERFB expression in erfab5, erfab6, and erfab7 decreased significantly or extremely significantly.
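The size of the knock-down can be read off the reported means directly. The sketch below takes the mean values from paragraph [0198] and expresses each mutant as a fractional reduction relative to the average of the Jack and Null controls; pooling the two controls is an editorial choice for illustration, not a calculation stated in the source, and the standard deviations are ignored.

```python
# Mean relative expression levels reported in paragraph [0198] (Figure 4); SDs omitted.
REPORTED_MEANS = {
    "GmERFA": {"Jack": 0.152, "Null": 0.162, "erfab5": 0.061, "erfab6": 0.073, "erfab7": 0.054},
    "GmERFB": {"Jack": 0.0062, "Null": 0.0062, "erfab5": 0.0037, "erfab6": 0.0047, "erfab7": 0.0034},
}
CONTROLS = ("Jack", "Null")


def fractional_reduction(gene: str, mutant: str, controls=CONTROLS) -> float:
    """1 - mutant / (average of the control means); 0.0 means no change, 0.6 a 60% drop."""
    values = REPORTED_MEANS[gene]
    control_mean = sum(values[c] for c in controls) / len(controls)
    return 1.0 - values[mutant] / control_mean


if __name__ == "__main__":
    for gene in REPORTED_MEANS:
        for mutant in ("erfab5", "erfab6", "erfab7"):
            pct = 100 * fractional_reduction(gene, mutant)
            print(f"{gene} in {mutant}: {pct:.1f}% below the control mean")
```

By this arithmetic, GmERFA levels in the three mutants are roughly 53-66% below the control average and GmERFB levels roughly 24-45% below it.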
[0199] 2. Downregulation of the GmERFA / GmERFB gene increased soybean seed weight.
[0200] Seeds harvested from the T2-generation plants of the erfab5, erfab6, and erfab7 transgenic events described above, together with seeds of the recipient control Jack and the empty-vector control Null, were sown in pots in a greenhouse under a 16 h light (11,000 lux)/8 h dark photoperiod, with daytime temperatures of 30-37°C and nighttime temperatures of 25-28°C. Growth and development were monitored, and seeds were harvested after 130 days. No significant phenotypic differences were observed between erfab5, erfab6, and erfab7 and the controls Jack and Null during growth or at maturity.
[0201] After the potted plants matured, the seeds harvested from individual plants were dried at 37°C for one week, and the seed weights of the recipient control Jack, the empty-vector control Null, and the mutants erfab5, erfab6, and erfab7 were measured. Fifteen plants were sampled per line, and the experiment was performed in three biological replicates. Results are expressed as mean ± standard deviation. Significance was assessed by one-way ANOVA, with P < 0.05 (*) indicating a significant difference and P < 0.01 (**) indicating a highly significant difference.
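The paragraph names one-way ANOVA but not the software used or any post-hoc comparison between individual lines. As a self-contained illustration only, the sketch below computes the one-way ANOVA F statistic and its P value in pure Python, obtaining the F-distribution upper tail from the regularized incomplete beta function (evaluated with the standard modified-Lentz continued fraction), and maps P to the * and ** markers used here. In practice a statistics package such as SciPy's `scipy.stats.f_oneway` would normally be used; the function names below are hypothetical.

```python
import math
from statistics import fmean

_TINY = 1e-300  # guards against division by zero in the continued fraction


def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the regularized incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > _TINY else _TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction at step m.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > _TINY else _TINY)
            c = 1.0 + aa / c
            c = c if abs(c) > _TINY else _TINY
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def one_way_anova(*groups):
    """Return (F, P) for a one-way ANOVA across two or more groups of observations.
    Assumes some within-group variation exists (otherwise F is undefined)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [fmean(g) for g in groups]
    grand_mean = fmean([v for g in groups for v in g])
    ss_between = sum(len(g) * (mu - grand_mean) ** 2 for g, mu in zip(groups, means))
    ss_within = sum((v - mu) ** 2 for g, mu in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    # Upper tail of F(df1, df2): P(F > f) = I_x(df2/2, df1/2) with x = df2 / (df2 + df1*f)
    x = df_within / (df_within + df_between * f_stat)
    return f_stat, _betainc(df_within / 2.0, df_between / 2.0, x)


def significance_label(p):
    """Map a P value to the markers used in the text."""
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"


if __name__ == "__main__":
    # Illustrative numbers only; these are NOT seed-weight data from this document.
    f, p = one_way_anova([1, 2, 3], [2, 3, 4], [3, 4, 5])
    print(f"F = {f:.2f}, P = {p:.3f} {significance_label(p)}")
```

A significant overall F only says that at least one line differs; identifying which mutants differ from Jack or Null requires a post-hoc test, which the source does not name.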
[0202] As shown in Figure 5 (middle left), the seeds of mutants erfab5, erfab6, and erfab7 were visibly larger than those of the controls Jack and Null. As shown in Figure 5 (middle right), the 100-seed weights of the controls Jack and Null and of erfab5, erfab6, and erfab7 were 16.5±0.6, 16.2±0.5, 17.8±0.9, 17.3±0.5, and 17.6±1.0 g, respectively. The 100-seed weight of each mutant was significantly greater than that of the controls, an increase of 5.8%–8.9%.
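The reported percentage range can be reproduced from the 100-seed weights in this paragraph. In the sketch below (hypothetical helper names), each mutant is compared with the mean of the two controls, (16.5 + 16.2)/2 = 16.35 g, which reproduces the 5.8%–8.9% range; comparing with Jack alone would instead give about 4.8%–7.9%.

```python
# 100-seed weights (g) reported in paragraph [0202] (Figure 5); SDs omitted.
HUNDRED_SEED_WEIGHT_G = {
    "Jack": 16.5, "Null": 16.2,
    "erfab5": 17.8, "erfab6": 17.3, "erfab7": 17.6,
}
CONTROLS = ("Jack", "Null")
MUTANTS = ("erfab5", "erfab6", "erfab7")


def percent_increase(mutant: str, controls=CONTROLS) -> float:
    """Percent change of a mutant's 100-seed weight relative to the mean of the given controls."""
    baseline = sum(HUNDRED_SEED_WEIGHT_G[c] for c in controls) / len(controls)
    return 100.0 * (HUNDRED_SEED_WEIGHT_G[mutant] / baseline - 1.0)


if __name__ == "__main__":
    for m in MUTANTS:
        print(f"{m}: +{percent_increase(m):.1f}% vs control mean, "
              f"+{percent_increase(m, ('Jack',)):.1f}% vs Jack alone")
```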
[0203] Across three biological replicates, each with 15 individual plants grown in greenhouse pots, the 100-seed weight of mutants erfab5, erfab6, and erfab7 was significantly higher than that of the recipient Jack and the empty-vector control Null. This indicates that GmERFA / GmERFB negatively regulates soybean 100-seed weight, and that reducing the expression levels of GmERFA and GmERFB, the two homologous GmERF coding genes, can significantly increase the 100-seed weight of soybean seeds.
[0204] The present invention has been fully described above. Those skilled in the art will understand that the invention can be practiced within a wide range of equivalent parameters, concentrations, and conditions without departing from its spirit and scope and without undue experimentation. While specific embodiments have been described, it should be understood that the invention is capable of further modification. In general, this application is intended to cover any variations, uses, or adaptations of the invention that follow its principles, including such departures from the present disclosure as come within known or customary practice in the art.
Claims
1. Use of a protein, characterized in that the use is any one of the following: A1) use in regulating plant seed size; A2) use in regulating plant seed weight; A3) use in regulating plant yield; A4) use in cultivating plants with altered seed size, seed weight and/or yield; A5) use in molecular breeding for improving seed size, seed weight and/or yield, or in improving germplasm resources related to seed size, seed weight and/or yield; wherein the protein consists of protein 1 and protein 2, wherein: protein 1 is any one of the following: B1) a protein whose amino acid sequence is SEQ ID NO:3; B2) a protein obtained by substituting, deleting and/or adding amino acid residues in the amino acid sequence shown in SEQ ID NO:3, having more than 80% identity with the protein shown in B1) and the same function; B3) a fusion protein having the same function, obtained by attaching a tag to the N-terminus and/or C-terminus of B1) or B2); and protein 2 is any one of the following: C1) a protein whose amino acid sequence is SEQ ID NO:4; C2) a protein obtained by substituting, deleting and/or adding amino acid residues in the amino acid sequence shown in SEQ ID NO:4, having more than 80% identity with the protein shown in C1) and the same function; C3) a fusion protein having the same function, obtained by attaching a tag to the N-terminus and/or C-terminus of C1) or C2).
2. Use of a biomaterial, characterized in that the use is any one of the following: D1) use in regulating plant seed size; D2) use in regulating plant seed weight; D3) use in regulating plant yield; D4) use in cultivating plants with altered seed size, seed weight and/or yield; D5) use in molecular breeding for improving seed size, seed weight and/or yield, or in improving germplasm resources related to seed size, seed weight and/or yield; wherein the biomaterial is any one of the following: E1) a nucleic acid molecule encoding the protein of claim 1; E2) an expression cassette containing the nucleic acid molecule of E1); E3) a recombinant vector containing the nucleic acid molecule of E1), or a recombinant vector containing the expression cassette of E2); E4) a recombinant microorganism containing the nucleic acid molecule of E1), the expression cassette of E2), or the recombinant vector of E3); E5) a recombinant host cell containing the nucleic acid molecule of E1), the expression cassette of E2), or the recombinant vector of E3); E6) a transgenic plant tissue containing the nucleic acid molecule of E1) or the expression cassette of E2); E7) a transgenic plant organ containing the nucleic acid molecule of E1) or the expression cassette of E2).
3. The application according to claim 2, characterized in that the nucleic acid molecule of E1) consists of a nucleic acid molecule encoding protein 1 of claim 1 and a nucleic acid molecule encoding protein 2 of claim 1, wherein: the nucleic acid molecule encoding protein 1 is any one of the following: F1) a DNA molecule whose coding sequence is SEQ ID NO:1; F2) a DNA molecule whose nucleotide sequence is SEQ ID NO:1 or SEQ ID NO:5; and the nucleic acid molecule encoding protein 2 is any one of the following: G1) a DNA molecule whose coding sequence is SEQ ID NO:2; G2) a DNA molecule whose nucleotide sequence is SEQ ID NO:2 or SEQ ID NO:6.
4. Use of a substance that reduces the activity and/or content of the protein of claim 1, the use being any one of the following: H1) use in regulating plant seed size; H2) use in regulating plant seed weight; H3) use in regulating plant yield; H4) use in cultivating plants with altered seed size, seed weight and/or yield; H5) use in molecular breeding for improving seed size, seed weight and/or yield, or in improving germplasm resources related to seed size, seed weight and/or yield.
5. The application according to claim 4, characterized in that the substance includes substances that inhibit the replication, transcription, translation, post-transcriptional modification and/or post-translational modification of a nucleic acid molecule encoding the protein of claim 1.
6. The application according to claim 4 or 5, characterized in that the substance is an sgRNA or a CRISPR/Cas9 system containing the sgRNA, wherein the sgRNA targets the gene encoding the protein of claim 1.
7. A method for cultivating a transgenic plant, characterized in that the method comprises reducing the content and/or activity of the protein of claim 1 in a target plant to obtain a plant with increased seed size, seed weight and/or yield.
8. The method according to claim 7, characterized in that the content and/or activity of the protein of claim 1 in the target plant is reduced by reducing the expression level of the gene encoding the protein in the target plant.
9. The method according to claim 8, characterized in that the expression level of the gene encoding the protein in the target plant is reduced using a CRISPR/Cas9 system comprising an sgRNA that targets the gene encoding the protein of claim 1.
10. The application according to any one of claims 1-6 or the method according to any one of claims 7-9, characterized in that the plant is any one of the following: K1) a dicotyledonous plant; K2) a plant of the legume family (Leguminosae); K3) a soybean (Glycine) species.