SNP molecular marker combination for predicting lysine content of fresh corn kernels and application thereof
By applying a combination of 20,000 SNP molecular markers and a corresponding molecular probe combination in fresh corn breeding, a genome-wide selection breeding model was constructed. This addresses the high cost and low efficiency of measuring lysine content in fresh corn kernels and enables early, efficient prediction and selection of high-quality protein corn.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SHANGHAI ACAD OF AGRI SCI
- Filing Date
- 2025-11-26
- Publication Date
- 2026-06-19
AI Technical Summary
In current sweet corn breeding, the detection of lysine and tryptophan content is costly and inefficient. Traditional phenotypic selection methods are time-consuming and costly, and molecular marker technology has not been widely used in the breeding of high-quality protein sweet corn.
By employing a combination of 20,000 SNP molecular markers and molecular probes, and through whole-genome selection breeding technology, a predictive model was constructed to predict the lysine content of fresh corn kernels using genotype data, thereby reducing detection costs and improving breeding efficiency.
It enables early and efficient prediction of lysine content in fresh corn kernels, reducing breeding costs, shortening the breeding cycle, improving selection efficiency, and achieving a prediction accuracy of 94.65%.
Abstract
Description
Technical Field
[0001] This invention belongs to the field of fresh corn breeding technology, and specifically relates to a combination of SNP molecular markers for predicting the lysine content of fresh corn kernels and its application.
Background Technology
[0002] Fresh corn is corn consumed at the milk-ripe stage and comprises three main categories: sweet corn, waxy corn, and sweet-waxy corn. As a nutritious and high-value economic crop, fresh corn is planted on more than 28 million mu (approximately 1.87 million hectares) in China and plays a vital role in increasing farmers' income. However, the content of the essential amino acids lysine and tryptophan in corn kernels is low, which can contribute to hidden hunger. Improving the lysine and tryptophan content of fresh corn kernels through breeding is therefore of great significance. Corn with high lysine and high tryptophan content is also known as high-quality protein corn.
[0003] Currently, fresh maize breeding in China still relies mainly on traditional phenotypic selection, which has inherent drawbacks such as high cost, low efficiency, and long breeding cycles. Complex traits such as the lysine and tryptophan content of maize kernels must be measured by high-performance liquid chromatography (HPLC), which is also costly. In contrast, genome-wide selection breeding offers low cost, high selection efficiency, and shorter breeding cycles, and has become a standard technique used by multinational seed companies in maize breeding. Molecular marker technology is a crucial technological foundation of modern breeding systems.
[0004] Genome-wide selection (GS) is a breeding technique that uses markers distributed throughout the genome to estimate the total genetic value of an individual and to select individuals accordingly. It has been widely applied in plants and animals. However, there are currently no reports on the use of molecular marker technology for genome-wide selection breeding of high-quality protein fresh maize.
Summary of the Invention
[0005] In view of this, one of the objectives of the present invention is to provide a combination of SNP molecular markers and molecular probes for predicting the lysine content of fresh corn kernels, and their applications.
[0006] The second objective of this invention is to provide a method for predicting the lysine content of fresh corn kernels. This method only requires testing the genotype of the breeding material to predict the content of high-quality protein in fresh corn kernels, and has the advantages of clear selection targets and being unaffected by the environment.
[0007] To achieve the above-mentioned objectives, the present invention provides the following technical solution:
[0008] This invention provides a combination of SNP molecular markers for predicting the lysine content of fresh corn kernels, consisting of 20,000 SNP loci located on the maize B73 reference genome version v5. The information of the 20,000 SNP loci is shown in Table 1 below:
[0009] Table 1. Information on 20,000 SNP sites
[0010]-[0138] (The contents of Table 1 are not reproduced in this text.)
[0139] The present invention also provides a molecular probe combination for specifically recognizing the above-mentioned SNP molecular marker combination.
[0140] This invention also provides the application of the above-mentioned SNP molecular marker combination or the above-mentioned molecular probe combination in any of the following: (1) constructing a predictive model for the high and low lysine content of fresh corn kernels; (2) gene selection breeding of high-quality protein fresh corn; (3) multi-trait aggregation breeding of fresh sweet corn. In this invention, the high-quality protein fresh corn is corn with high lysine content.
[0141] This invention also provides a method for constructing a predictive model for the lysine content of fresh corn kernels, comprising the following steps: detecting the genotypes at the above 20,000 SNP molecular markers and the kernel lysine content in multiple corn materials; imputing missing genotype data using Beagle software; inputting the genotypes and lysine content into the rrBLUP software package; and performing genome-wide selection analysis to obtain the predictive model.
[0142] This invention does not specifically limit the method for detecting the genotypes of the 20,000 SNP molecular markers in maize materials. Methods such as resequencing, liquid chromatography-mass spectrometry (LC-MS), and KASP can be used. In this embodiment, low-depth resequencing is employed. This invention also does not specifically limit the method for detecting lysine content in maize materials; conventional methods for detecting lysine content in the art can be used. In this invention, the preferred maize materials include waxy maize inbred lines, sweet maize inbred lines, and sweet-waxy double-recessive inbred lines. In this invention, the number of maize materials required for constructing the prediction model is preferably more than 200. In this invention, when using Beagle software to perform missing data imputation analysis of genotype data and the rrBLUP software package to perform genome-wide selection studies, genotype and lysine content are used for model construction. Except for the input genotype and lysine content, all other parameters use the software's default settings.
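The source does not write out the statistical model behind the rrBLUP step. For orientation, the standard ridge-regression BLUP formulation is sketched below; the symbols (y for the lysine phenotypes, Z for the marker genotype matrix, u for marker effects, λ for the variance ratio) are notation assumed here, not taken from the patent.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{1}\mu + \mathbf{Z}\mathbf{u} + \boldsymbol{\varepsilon},
  \qquad \mathbf{u} \sim N(\mathbf{0}, \mathbf{I}\sigma_u^2),
  \quad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \mathbf{I}\sigma_e^2) \\
\hat{\mathbf{u}} &= \mathbf{Z}^{\top}\left(\mathbf{Z}\mathbf{Z}^{\top} + \lambda\mathbf{I}\right)^{-1}
  \left(\mathbf{y} - \mathbf{1}\hat{\mu}\right),
  \qquad \lambda = \sigma_e^2 / \sigma_u^2 \\
\widehat{\mathrm{GEBV}}_i &= \mathbf{z}_i^{\top}\hat{\mathbf{u}}
\end{aligned}
```

Under this formulation, the breeding value of a new sample is the sum of its marker genotypes weighted by the estimated marker effects, which is why only genotype data are needed at prediction time.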
[0143] This invention also provides a predictive model for the lysine content of fresh corn kernels, which is constructed using the method described above.
[0144] This invention also provides the application of the above-mentioned prediction model in predicting the lysine content of fresh corn kernels.
[0145] This invention also provides a method for predicting the lysine content of fresh corn kernels, comprising the following steps: identifying the genotype of the above-mentioned SNP molecular marker combination in the sample to be tested; imputing missing genotype data using Beagle software; inputting the genotype into the above-mentioned prediction model; and performing genome-wide selection analysis using the rrBLUP software package to obtain the breeding value. A sample with a high breeding value is predicted to have a high lysine content. In this invention, a high lysine content preferably refers to a lysine content higher than 0.4 g/100 g.
[0146] This invention uses the genotypic information of the aforementioned 20,000 SNP loci to predict the phenotype of samples that have not been phenotyped, using a prediction model (or whole-genome selection model) constructed from the training population of this invention. The predicted phenotypic values are then used for breeding selection. In breeding, phenotypic determination is costly: amino acid determination costs 300 yuan per sample, and sampling can only be performed after the entire growth period of the maize plant. The costs and time associated with planting and management are also high. The method provided by this invention only requires DNA extraction and genotyping at the seedling or seed stage to predict the breeding value (phenotypic value). Selection based on the predicted phenotype can significantly reduce breeding costs, shorten the breeding cycle, and improve breeding efficiency. After the breeding value is obtained, the lysine content of the maize sample to be tested is predicted from its magnitude; a higher breeding value indicates a higher lysine content. The breeding value in this invention is relative: it is compared among the maize samples to be tested, with a higher value indicating a relatively higher lysine content. After the breeding values of multiple maize samples to be tested are obtained, they are sorted from highest to lowest. Preferably, the maize samples in the top 20% of breeding values are considered to have high lysine content and to have potential for further breeding.
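The ranking and top-20% selection described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `select_top_percent`, the sample identifiers, and the breeding values are all hypothetical.

```python
def select_top_percent(breeding_values, top_percent=20):
    """Rank samples by breeding value (highest first) and keep the top share.

    breeding_values: dict mapping a sample ID to its predicted breeding value.
    top_percent: integer percentage of samples to keep (20 follows the text).
    """
    if not 0 < top_percent <= 100:
        raise ValueError("top_percent must be in 1..100")
    ranked = sorted(breeding_values.items(), key=lambda kv: kv[1], reverse=True)
    # Integer arithmetic avoids floating-point surprises in the cut-off count.
    n_keep = max(1, len(ranked) * top_percent // 100)
    return [sample_id for sample_id, _ in ranked[:n_keep]]


# Hypothetical breeding values for 30 candidate lines (illustrative only).
example = {f"line_{i:02d}": ((i * 7) % 30) / 100 for i in range(30)}
selected = select_top_percent(example)
print(selected)
```

With 30 candidates, the top 20% corresponds to 6 samples, matching the count used in Example 4.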
[0147] This invention uses five-fold cross-validation to generate training and validation populations for evaluating prediction accuracy. Specifically, 205 of the 256 inbred lines are selected to construct a prediction model, and this model is then used to predict the phenotypic values of the remaining 51 inbred lines. The predicted phenotypic values are compared with the measured phenotypic values. This process is repeated 100 times to evaluate the prediction accuracy of the constructed model. The correlation coefficient between the predicted and observed values of kernel lysine content in the validation population is used as the prediction accuracy, calculated using the "cor()" function in R. The results show that the prediction method for lysine content in fresh maize kernels constructed using the 20,000 SNPs shown in Table 1 has an average prediction accuracy of 94.65%.
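The evaluation scheme can be sketched in Python as below. The source computes the correlation with R's `cor()`; here a Pearson correlation is implemented by hand as an equivalent. The helper names (`pearson`, `five_fold_splits`, `cv_accuracy`) and the `fit`/`predict` callables are hypothetical, and treating each of the 100 repeats as a full reshuffled five-fold pass is an assumption, since the source does not say how the repeats are drawn.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient (the default of R's cor())."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def five_fold_splits(n_samples, seed):
    """Yield (train_indices, validation_indices) for one shuffled 5-fold pass."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::5] for k in range(5)]
    for k in range(5):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, folds[k]


def cv_accuracy(fit, predict, genotypes, phenotypes, repeats=100):
    """Mean predicted-vs-observed correlation over repeated 5-fold passes."""
    scores = []
    for rep in range(repeats):
        for train, valid in five_fold_splits(len(phenotypes), seed=rep):
            model = fit([genotypes[i] for i in train],
                        [phenotypes[i] for i in train])
            predicted = [predict(model, genotypes[i]) for i in valid]
            observed = [phenotypes[i] for i in valid]
            scores.append(pearson(predicted, observed))
    return sum(scores) / len(scores)


# With 256 lines, each fold trains on about 205 lines and validates on about 51.
sizes = [(len(t), len(v)) for t, v in five_fold_splits(256, seed=0)]
print(sizes)
```

Any model with a `fit(genotypes, phenotypes)` and `predict(model, genotype)` interface can be plugged into `cv_accuracy`.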
[0148] This invention also provides a method for gene selection breeding of high-quality protein fresh maize, comprising the following steps: identifying the genotypes of the above-mentioned SNP molecular marker combination in the sample to be tested; imputing missing genotype data using Beagle software; inputting the genotypes into the above-mentioned prediction model; and performing whole-genome selection analysis using the rrBLUP software package to obtain breeding values.
[0149] The method for gene selection breeding of high-quality protein fresh corn provided by this invention is built with rrBLUP software from the genotype and phenotypic data (kernel lysine content) of the training population. Only the genotype of each individual needs to be determined and input into the rrBLUP software; all other parameters are left at the software's default values. The breeding value is then used to predict the lysine content of the corn sample to be tested. Because ordinary corn is deficient in lysine and tryptophan, two amino acids essential to the human body, corn with high lysine content, high tryptophan content, or both is called high-quality protein corn.
[0150] This invention does not specifically limit the method for identifying the genotype of the SNP molecular marker in the test sample. Methods such as resequencing, liquid chromatography-mass spectrometry (LC-MS), and KASP can be used. In this embodiment, low-depth resequencing is employed. Preferably, test samples with high breeding values are selected for breeding. In this invention, "high breeding value" refers to a relative value, comparing the breeding values among test samples; a higher value indicates breeding potential. In this invention, "high lysine content" refers to a lysine content higher than 0.4 g / 100 g.
[0151] The beneficial effects of this invention are:
[0152] The SNP molecular marker combination provided by this invention for predicting the lysine content of fresh corn kernels can be used for molecular marker-assisted breeding of high-quality protein (high lysine content) fresh corn, and can also be used for aggregate breeding of multiple traits in fresh sweet corn.
[0153] The method for predicting the lysine content of fresh corn kernels provided by this invention only requires testing the genotype of the breeding material to predict the content of high-quality protein (lysine) in fresh corn kernels. It has the advantages of a clear selection target and no environmental influence. In addition, whereas the high-quality protein (lysine) content trait is otherwise measured mainly at the harvest period of fresh corn, the prediction provided by this invention enables early detection.
Attached Figure Description
[0154] Figure 1 is a box plot of the lysine content in kernels of the breeding population;
[0155] Figure 2 is a distribution map of the SNPs on the ten chromosomes of maize;
[0156] Figure 3 shows the Manhattan plot (left) and QQ plot (right) of the genome-wide association analysis results for lysine content;
[0157] Figure 4 shows the prediction accuracy of the whole-genome prediction model for lysine content.
Detailed Implementation
[0158] The technical solutions provided by the present invention will be described in detail below with reference to the embodiments, but they should not be construed as limiting the scope of protection of the present invention.
[0159] Unless otherwise specified, the methods used in the following embodiments are all conventional methods.
[0160] Unless otherwise specified, all materials and reagents used in the following examples are commercially available.
[0161] Example 1
[0162] 1. Planting of the fresh corn breeding population
[0163] This experiment selected 256 representative inbred lines from core fresh maize germplasm as research materials, including 81 waxy maize inbred lines, 157 sweet maize inbred lines, and 18 sweet-waxy double-recessive inbred lines. The population was planted in the spring of 2024 at the Zhuangxing Comprehensive Experimental Station of the Shanghai Academy of Agricultural Sciences. A completely randomized design was used, with 20 plants per material planted in two rows 2.5 meters long, under routine field management. Plants were self-pollinated by bagging. After ear maturity, three uniform, vigorous ears from each material were harvested for subsequent phenotypic analysis.
[0164] 2. Determination of Lysine Content in Grains
[0165] In this experiment, corn kernels were first ground into powder using a sample crusher, then mixed with hydrochloric acid solution and subjected to acid hydrolysis at 110°C for 24 hours. After hydrolysis, the supernatant was collected and neutralized with sodium hydroxide solution. AccQ•Tag Ultra Borate buffer and AccQ•Tag reagent were then added to the neutralized sample, and derivatization was carried out by heating at 55°C for 10 minutes. After the reaction, the sample solution was cooled to room temperature and analyzed by ultra-high performance liquid chromatography coupled with high-resolution Orbitrap mass spectrometry (UHPLC-QE, Thermo, USA) to determine the lysine content of the sample.
[0166] The best linear unbiased prediction (BLUP) of lysine content for each material was calculated using Meta-R software. The lysine content in the kernels of the 256 fresh maize breeding materials ranged from 3859 to 5350 µg/g, with an average of 4517 µg/g, as shown in Figure 1.
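The measured values are reported in µg/g, while the high-lysine threshold elsewhere in the specification is given as 0.4 g/100 g. The short conversion below puts both on the same scale; the function name is hypothetical and the figures are the ones quoted in the text.

```python
def ug_per_g_to_g_per_100g(value_ug_per_g):
    """Convert µg/g to g/100 g.

    1 g = 1,000,000 µg, so (µg/g) x 100 / 1,000,000 = (µg/g) / 10,000.
    """
    return value_ug_per_g / 10_000


HIGH_LYSINE_THRESHOLD = 0.4  # g/100 g, as stated in the specification

for label, value in [("minimum", 3859), ("mean", 4517), ("maximum", 5350)]:
    converted = ug_per_g_to_g_per_100g(value)
    flag = "above" if converted > HIGH_LYSINE_THRESHOLD else "not above"
    print(f"{label}: {converted:.4f} g/100 g ({flag} threshold)")
```

On this scale the population spans roughly 0.39 to 0.54 g/100 g, so the 0.4 g/100 g threshold falls inside the observed range.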
[0167] 3. Genotyping of the association population
[0168] Genomic DNA was extracted from the young leaves of the 256 fresh corn inbred lines using the CTAB method, and whole-genome resequencing was performed on the Illumina sequencing platform. The raw sequencing data of the 256 breeding materials ranged from 12.25 to 35.69 Gb per material, with an average of 16.04 Gb. Quality control, alignment to the reference genome, marking of PCR duplicates, and SNP identification were then performed. The data processing flow was as follows: (1) the raw sequencing data were quality-filtered using FASTP software based on the Q20 standard; (2) the filtered data were aligned to the maize reference genome B73 v5 using BWA software; (3) duplicate reads in the sequencing data were marked using the Picard software package; (4) variant sites were detected using GATK software. After the SNP loci were obtained, the genotype data were filtered using bcftools software. The screening criteria were: ① retaining only biallelic SNP loci; ② minor allele frequency (MAF) > 0.05; ③ missing rate < 5%; ④ linkage disequilibrium strength between adjacent SNPs > 0.5. A total of 886,066 SNPs were retained after filtering, as shown in Table 2 and Figure 2, and were used for the subsequent genome-wide association analysis.
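Screening criteria ① to ③ can be illustrated with a small per-site check. This is a sketch only: the source applies the filters with bcftools, whereas the function below, its name, and the 0/1/2 genotype coding are assumptions made for illustration. Criterion ④ (linkage disequilibrium between adjacent SNPs) needs pairwise information and is not covered here.

```python
def passes_site_filters(alleles, calls, min_maf=0.05, max_missing=0.05):
    """Check one SNP site against criteria 1-3 of the screening step.

    alleles: alleles observed at the site, e.g. ["A", "G"].
    calls: per-sample alternate-allele counts (0, 1 or 2); None means missing.
    """
    # Criterion 1: keep biallelic sites only.
    if len(alleles) != 2:
        return False
    called = [c for c in calls if c is not None]
    if not called:
        return False
    # Criterion 3: missing rate must be strictly below the limit.
    missing_rate = (len(calls) - len(called)) / len(calls)
    if missing_rate >= max_missing:
        return False
    # Criterion 2: minor allele frequency must exceed the limit.
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)
    return maf > min_maf


# Illustrative site: 100 samples, 10 homozygous for the alternate allele.
print(passes_site_filters(["A", "G"], [0] * 90 + [2] * 10))
```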
[0169] Table 2. SNP statistics after filtering
[0170] (The contents of Table 2 are not reproduced in this text.)
[0171] 4. Genome-wide association analysis of lysine content
[0172] Genome-wide association analysis (GWAS) was performed on the kernel lysine content of the 256 fresh maize inbred lines using GAPIT software with a mixed linear model (with the kinship matrix and population structure matrix as covariates). The significance threshold for the GWAS was set at 1×10⁻⁶. The markers were sorted from largest to smallest effect according to the GWAS results, and the top 20,000 SNP markers (as shown in Table 1) were selected for model construction and subsequent prediction. Among these 20,000 SNP markers, 17 SNP sites significantly associated with lysine content were identified (as shown in Figure 3 and Table 3).
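The marker selection step can be sketched as a sort followed by a cut-off. The record layout (`snp`, `effect`, `p`), the function name, and the example values are hypothetical, and ranking by the absolute value of the effect is an assumption, since the source only says the markers were sorted from largest to smallest effect. The threshold is passed in as a parameter rather than fixed in the code.

```python
def select_markers(gwas_rows, n_top, p_threshold):
    """Rank GWAS results by effect size and keep the top n_top markers.

    gwas_rows: list of dicts with keys "snp", "effect" and "p".
    Returns (selected SNP IDs, subset of those below the significance threshold).
    """
    ranked = sorted(gwas_rows, key=lambda row: abs(row["effect"]), reverse=True)
    top = ranked[:n_top]
    selected = [row["snp"] for row in top]
    significant = [row["snp"] for row in top if row["p"] < p_threshold]
    return selected, significant


# Hypothetical GWAS output for four markers (illustrative values only).
rows = [
    {"snp": "snp_a", "effect": -0.9, "p": 2e-7},
    {"snp": "snp_b", "effect": 0.5, "p": 3e-4},
    {"snp": "snp_c", "effect": 0.1, "p": 0.2},
    {"snp": "snp_d", "effect": 0.7, "p": 5e-7},
]
selected, significant = select_markers(rows, n_top=3, p_threshold=1e-6)
print(selected, significant)
```

In the workflow described above, `n_top` would be 20,000 and the input would hold the 886,066 filtered SNPs.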
[0173] Table 3. Significant SNP information from genome-wide association analysis.
[0174] (The contents of Table 3 are not reproduced in this text.)
[0175] Example 2
[0176] Construction of a predictive model for the lysine content of fresh corn kernels
[0177] The 20,000 SNPs screened in Example 1 were used for model construction. Specifically, the genotypes of the 20,000 SNP markers listed in Table 1 were identified by resequencing in 205 of the maize samples from Example 1, and the lysine content of each of these 205 samples was measured. Missing genotype data were imputed using Beagle software. The genotypes and lysine content were then input into the rrBLUP software package (https://CRAN.R-project.org/package=rrBLUP), and genome-wide selection analysis was performed to obtain a predictive model for the lysine content of fresh maize kernels.
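To make the modelling step concrete, the following is a minimal pure-Python sketch of ridge-regression marker-effect estimation, the general technique underlying rrBLUP. It is not the rrBLUP package itself: the function names are hypothetical, the shrinkage parameter `lam` is fixed by hand (rrBLUP estimates the variance components from the data), the intercept is taken as the sample mean, and genotypes are assumed to be coded as -1/0/1.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ridge(Z, y, lam=1.0):
    """Estimate marker effects u = Z'(ZZ' + lam*I)^-1 (y - mean(y)).

    The n x n (samples) form is used because markers far outnumber samples.
    """
    n, p = len(Z), len(Z[0])
    mu = sum(y) / n
    K = [[sum(Z[i][k] * Z[j][k] for k in range(p)) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve_linear(K, [v - mu for v in y])
    effects = [sum(Z[i][k] * alpha[i] for i in range(n)) for k in range(p)]
    return mu, effects


def predict(model, z):
    """Predicted phenotype: intercept plus genotype-weighted marker effects."""
    mu, effects = model
    return mu + sum(e * g for e, g in zip(effects, z))


# Toy data: 4 samples, 2 markers; only the first marker affects the trait.
Z = [[1, -1], [-1, 1], [1, 1], [-1, -1]]
y = [7.0, 3.0, 7.0, 3.0]
model = fit_ridge(Z, y, lam=1.0)
print(model, predict(model, [1, 0]))
```

In the toy data the second marker receives an effect of zero and the first is shrunk toward zero by the penalty, which is the behaviour that lets ridge-type models handle 20,000 markers with a few hundred samples.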
[0178] Example 3
[0179] The prediction model constructed in Example 2 was used to predict the lysine content in the kernels of the remaining 51 maize samples from Example 1. Specifically:
[0180] Genotypes of 20,000 SNP markers in 51 maize samples, as shown in Table 1, were identified using resequencing. Missing data imputation analysis of genotype data was performed using Beagle software. The genotypes were input into the prediction model of Example 2, and whole-genome selection was performed using the rrBLUP software package to obtain breeding values. Samples with high breeding values indicate high lysine content.
[0181] The predicted lysine content of the 51 samples was compared with the measured lysine content. This was repeated 100 times to evaluate the prediction accuracy of the constructed model. The correlation coefficient between the predicted and observed kernel lysine content of the validation population generated by five-fold cross-validation was used as the prediction accuracy, calculated using the "cor()" function in R.
[0182] The results showed that the prediction method for lysine content in fresh corn kernels constructed using the 20,000 SNPs shown in Table 1 had an average prediction accuracy of 94.65% (as shown in Figure 4).
[0183] Example 4
[0184] The predictive model constructed in Example 2 was used for gene selection breeding of high-quality protein fresh maize. The specific method was as follows: the genotypes of the 20,000 SNP molecular markers shown in Table 1 were identified by resequencing in another 30 maize samples. Missing genotype data were imputed using Beagle software. The genotypes were input into the predictive model of Example 2, and whole-genome selection analysis was performed using the rrBLUP software package (https://CRAN.R-project.org/package=rrBLUP) to obtain the breeding values.
[0185] The breeding values were sorted from largest to smallest, and the top 20% of the corn samples (i.e., the top 6) were identified as high-quality protein fresh corn. The lysine content in the kernels of these 30 corn samples was then measured, and the results were completely consistent with the prediction made by the method of this invention: the lysine content in the kernels of these 6 corn samples was higher than that of the other 24 corn samples, and the order of the lysine content among these 6 samples was consistent with the order of the breeding values predicted by this invention.
[0186] The above description is only a preferred embodiment of the present invention. It should be noted that for those skilled in the art, several improvements and modifications can be made without departing from the principle of the present invention, and these improvements and modifications should also be considered within the scope of protection of the present invention.
Claims
1. A molecular probe combination, characterized in that the molecular probe combination is used to specifically identify an SNP molecular marker combination that predicts the lysine content of fresh corn kernels, the SNP molecular marker combination consisting of 20,000 SNP sites located on the maize B73 reference genome version v5, as shown in Table 1 of the specification.
2. Use of the molecular probe combination according to claim 1 in any one of the following: (1) constructing a predictive model for the high and low lysine content of fresh corn kernels; (2) gene selection breeding of high-quality protein fresh corn; characterized in that the high-quality protein fresh corn is corn with high lysine content, and the high lysine content is a lysine content higher than 0.4 g/100 g.
3. A method for constructing a predictive model for the lysine content of fresh corn kernels, characterized in that the method includes the following steps: detecting, in multiple maize materials, the genotypes of the 20,000 SNP sites targeted by the molecular probe combination described in claim 1 and the lysine content; performing missing data imputation analysis of the genotype data using Beagle software; inputting the detected genotypes and lysine content into the rrBLUP software package; performing whole-genome selection study; and obtaining a prediction model.
4. The method according to claim 3, characterized in that the multiple maize materials include waxy maize inbred lines, sweet maize inbred lines, and sweet-waxy double recessive inbred lines.
5. A predictive model for the lysine content of fresh corn kernels, characterized in that it is obtained by the method described in claim 3 or 4.
6. The application of the prediction model described in claim 5 in predicting the lysine content of fresh corn kernels.
7. A method for predicting the lysine content of fresh corn kernels, characterized in that the method includes the following steps: identifying the genotype of the SNP molecular marker combination in claim 1 in the sample to be tested; performing missing data imputation analysis of the genotype data using Beagle software; inputting the genotype into the prediction model described in claim 5; performing whole-genome selection study using the rrBLUP software package to obtain the breeding value; after obtaining the breeding values of the maize samples to be tested, sorting them from highest to lowest, and considering the maize samples corresponding to the top 20% of the breeding values as maize samples with high lysine content.