SNP molecular marker combination for chicken breed identification and application
Using the sixth version of the chicken reference genome and machine learning algorithms, 300 SNP loci were screened as a molecular marker combination for chicken breed identification. The combination addresses the inaccuracy of traditional chicken breed identification methods and enables efficient, low-cost identification and classification of chicken breeds.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- CHINA AGRI UNIV
- Filing Date
- 2026-03-16
- Publication Date
- 2026-06-30
AI Technical Summary
Traditional methods for identifying chicken breeds rely on morphological characteristics and lack standardization and systematization. They struggle to distinguish breeds that look similar but differ significantly genetically, which leads to inaccurate identification.
Using a combination of SNP molecular markers based on the sixth version of the chicken reference genome and combined with machine learning algorithms, 300 SNP loci with strong classification ability were screened out for the identification and classification of chicken breeds.
It enables efficient and accurate identification and classification of chicken breeds, reduces the cost of genotyping, and is applicable to breed identification and breeding of white-feathered broilers and local Chinese chickens.
Abstract
Description
Technical Field
[0001] This invention relates to the field of molecular breeding technology, and more specifically, to SNP molecular marker combinations and their applications for chicken breed identification.
Background Technology
[0002] Chickens are among the most important livestock and poultry species in China, and among the earliest to be domesticated there. China is one of the countries with the richest chicken germplasm resources in the world. Studying the genetic resources and genetic diversity of local Chinese chickens and broiler chickens, and exploring the differences between breeds and populations, helps to broaden the selection of genetic markers available for different chicken breeds, promotes the development and utilization of chicken genetic resources, and plays an important role in chicken breeding and conservation.
[0003] Each individual organism possesses a unique deoxyribonucleic acid (DNA) sequence. With increasing national emphasis on the seed industry, "DNA fingerprinting" and "DNA molecular ID cards" have become hot research topics. Traditional methods for identifying genetic resources primarily rely on morphological characteristics such as body shape, feather color, and skin and bone color. These methods are relatively subjective and lack standardization and systematization. They are prone to inaccuracies when distinguishing breeds that have similar external characteristics but significant genetic differences. To identify breeds more efficiently and accurately, the combination of molecular biology techniques with various algorithmic models has become a research hotspot and has already been used to identify livestock genetic resources in species such as chickens, cattle, sheep, pigs, and dogs.
Summary of the Invention
[0004] The purpose of this invention is to provide SNP molecular marker combinations and their applications for chicken breed identification.
[0005] To achieve the objectives of this invention, in a first aspect, this invention provides a combination of SNP molecular markers for chicken breed identification. The combination consists of 300 SNP molecular markers, the physical locations of which are shown in Table 1. These physical locations are determined based on the chicken sixth version reference genome GRCg6a (GCA_000002315.5). For example, 1:4724708 represents the SNP site at 4724708 bp on chicken chromosome 1, with a base polymorphism of G / A.
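As an illustration of the locus notation used above, the following minimal Python sketch splits a `chromosome:position` label into its two parts. The `SnpLocus` type and `parse_locus` function are hypothetical names introduced here for illustration only; they are not part of the invention.

```python
from typing import NamedTuple


class SnpLocus(NamedTuple):
    """One SNP position on the GRCg6a assembly (hypothetical helper type)."""
    chromosome: str
    position: int  # coordinate in bp


def parse_locus(label: str) -> SnpLocus:
    """Split a 'chromosome:position' label such as '1:4724708'."""
    chromosome, _, position = label.strip().partition(":")
    if not chromosome or not position.isdigit():
        raise ValueError(f"not a chromosome:position label: {label!r}")
    return SnpLocus(chromosome, int(position))


locus = parse_locus("1:4724708")
print(locus.chromosome, locus.position)
```

Keeping the chromosome as a string rather than an integer allows non-numeric chromosome names to be handled in the same way.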
[0006] Table 1 Physical location information of SNP sites
[0007] Secondly, the present invention provides a probe assembly for identifying chicken breeds, the probe assembly containing reagents for detecting the 300 SNP molecular markers.
[0008] Thirdly, the present invention provides the application of the SNP molecular marker combination or the probe combination in chicken breeding.
[0009] Fourthly, the present invention provides the application of the SNP molecular marker combination or the probe combination in the identification of white-feathered broiler chickens (five strains, B1 to B5) and 19 local Chinese chicken breeds.
[0010] Furthermore, the Chinese local chicken breeds include: Aba Chicken, Beijing Oil Chicken, Camellia Chicken, Shimian Grass Chicken, Chentang Black Chicken, Danzhou Chicken, Gushi Chicken, Lhoba Chicken, Lingkun Chicken, Luning Chicken, Liyang Chicken, Linzhi Tibetan Chicken, Longzi Black Chicken, Piao Chicken, Weixi Lijia Black Chicken, Taiping Chicken, Wuliang Mountain Black-bone Chicken, Yulong Snow Mountain Black-bone Chicken, and Red Junglefowl.
[0011] Fifthly, the present invention provides the application of the SNP molecular marker combination or the probe combination in the assessment of genetic diversity in chickens.
[0012] Sixthly, the present invention provides the application of the SNP molecular marker combination or the probe combination in chicken pedigree correction.
[0013] In a seventh aspect, the present invention provides the use of reagents for detecting the SNP molecular marker combinations in the preparation of probes or chips for identifying chicken breeds.
[0014] Eighthly, the present invention provides a method for screening the above-mentioned SNP molecular marker combinations: (1) Whole-genome resequencing data from 24 breeds / strains (white-feathered broiler strains B1-B5, Aba chicken, Beijing oil chicken, Camellia chicken, Shimian grass chicken, Chentang black chicken, Danzhou chicken, Gushi chicken, Lhoba chicken, Lingkun chicken, Luning chicken, Liyang chicken, Linzhi Tibetan chicken, Longzi black chicken, Piao chicken, Weixi Lijia black chicken, Taiping chicken, Wuliangshan black-bone chicken, Yulong Snow Mountain black-bone chicken, and red junglefowl) were quality controlled and aligned to the chicken sixth version reference genome GRCg6a (GCA_000002315.5). SNPs were detected using GATK 4.2.0.0 and filtered to remove low-quality SNPs.
[0015] (2) With one breed / strain as the target population, the other 23 populations were merged into a single comparison population. Fst analysis was performed using VCFtools to calculate the Fst value of each SNP for each population.
[0016] (3) Sort the Fst values of each breed / strain obtained in the previous step from high to low according to MEAN_FST; extract the top 500 sites by Fst for each breed / strain and merge them.
[0017] (4) A linkage disequilibrium (LD) analysis strategy was used to screen and optimize the genome-wide SNP markers: LD analysis was performed on the merged SNP sites, and all SNPs not in LD, plus one SNP from each LD block, were retained.
[0018] (5) Randomly divide the SNP information of all individuals obtained in the previous step into a training set (80%) and a test set (20%). Use canonical correlation analysis (CCA) to reduce the dimensionality of the SNP information, where only the top 3000 sites (n_criterion=3000) are considered and 100 sites (n_top=100) are selected for each breed / strain, to obtain a subset of SNPs with strong classification ability.
[0019] (6) Further reduce the number of SNPs in the subset obtained in the previous step. Use random forest (RF), Naive Bayes (BNM) and logistic regression (LR) to perform feature selection on the SNPs selected above, and gradually increase the number of SNPs in units of 100 to test their accuracy.
[0020] (7) Validate using the selected feature sites in the test set. Calculate the classifier accuracy using a k-Nearest Neighbor (KNN) classifier. Select the combination with the highest classification accuracy and the fewest SNPs.
[0021] (8) Finally, 300 SNP sites selected by logistic regression were chosen as the SNP combination for chicken breed identification and classification as described above.
[0022] The above-mentioned SNP combination screening process for chicken breed identification and classification is shown in Figure 1. The accuracy of the three machine learning algorithms on the training set and the test set is shown in Figure 2 and Figure 3, respectively.
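The one-versus-rest grouping in step (2) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `one_vs_rest`, the sample sheet layout, and all sample IDs are hypothetical and do not come from the specification.

```python
from typing import Dict, List, Tuple


def one_vs_rest(samples_by_population: Dict[str, List[str]],
                target: str) -> Tuple[List[str], List[str]]:
    """Return (target samples, pooled samples of every other population)."""
    if target not in samples_by_population:
        raise KeyError(target)
    target_samples = list(samples_by_population[target])
    rest = [sample
            for population, members in samples_by_population.items()
            if population != target
            for sample in members]
    return target_samples, rest


# Toy sample sheet with made-up sample IDs (three populations instead of 24).
sheet = {"B1": ["b1_01", "b1_02"], "Aba": ["ab_01"], "Gushi": ["gs_01", "gs_02"]}
for population in sheet:
    target_ids, comparison_ids = one_vs_rest(sheet, population)
    print(population, len(target_ids), len(comparison_ids))
```

Each pass of the loop yields the two sample lists that a two-population Fst comparison would need for one target breed / strain.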
[0023] This invention provides a set of SNP molecular markers for chicken breed identification, particularly for the identification of broiler chickens and local Chinese chicken breeds, along with a liquid phase chip and its application. This SNP set contains 300 SNP loci, based on whole-genome resequencing data from 24 populations including 5 broiler chicken strains and 19 local Chinese chicken breeds. Machine learning methods are used to extract characteristic SNP loci, resulting in lower genotyping costs and facilitating widespread application by enterprises. It can serve as an efficient and accurate method for identifying and differentiating chicken breeds.
Attached Figure Description
[0024] Figure 1 is a schematic diagram of the SNP combination screening process for chicken breed identification and classification according to the present invention.
[0025] Figure 2 shows the accuracy of the three machine learning algorithms of this invention on the training set.
[0026] Figure 3 shows the accuracy of the three machine learning algorithms of this invention on the test set.
[0027] Figure 4 shows the results of principal component analysis for all breeds / strains of chickens in a preferred embodiment of the present invention.
[0028] Figure 5 is a phylogenetic tree of all breeds / strains of chickens in a preferred embodiment of the present invention.
[0029] Figure 6 shows the results of principal component analysis of the 12,000 SNPs in a preferred embodiment of the present invention.
[0030] Figure 7 is a phylogenetic tree of the 3,857 SNPs in a preferred embodiment of the present invention.
[0031] Figure 8 shows, in a preferred embodiment of the present invention, the results of phylogenetic tree analysis using the 300 SNPs from Example 3 after two Weixi Lijia black chickens, two white-feathered broiler B2 strain chickens, and two Luning chickens from outside the experimental population were merged with the experimental population.
Detailed Implementation
[0032] The present invention aims to provide a combination of SNP molecular markers and liquid phase chips that are small in quantity, low in cost, and easy to operate, so as to efficiently and accurately identify and distinguish chicken breeds and strains.
[0033] Whole-genome resequencing data from 24 breeds / strains (white-feathered broiler strains B1-B5, Aba chicken, Beijing oil chicken, Camellia chicken, Shimian grass chicken, Chentang black chicken, Danzhou chicken, Gushi chicken, Lhoba chicken, Lingkun chicken, Luning chicken, Liyang chicken, Linzhi Tibetan chicken, Longzi black chicken, Piao chicken, Weixi Lijia black chicken, Taiping chicken, Wuliangshan black-bone chicken, Yulong Snow Mountain black-bone chicken, and red junglefowl) were used to detect SNPs, and characteristic SNPs were extracted from all populations using a combination of traditional methods and machine learning.
[0034] Specifically, the present invention adopts the following technical solution: This invention provides an SNP locus combination for chicken breed identification and classification. The combination consists of 300 SNP loci, the physical locations of which are determined by alignment to the chicken sixth version reference genome GRCg6a. The physical locations of the 300 SNP loci are shown in Table 1.
[0035] This invention also provides the application of the above-mentioned SNP site combinations in the preparation of gene chips for chicken breed identification and classification.
[0036] This invention also provides the application of the above-mentioned SNP site combinations and gene chips in the identification and classification of broiler chickens and local Chinese chicken breeds.
[0037] Preferably, the application is any of the following uses: 1) Application in the identification of broiler chickens and local Chinese chicken breeds; 2) Application in the breeding of broiler chickens and local Chinese chicken breeds; 3) Application in pedigree correction of broiler chickens and local chickens.
[0038] The following examples are used to illustrate the present invention, but are not intended to limit the scope of the invention. Unless otherwise specified, the technical means used in the examples are conventional means well known to those skilled in the art, and the raw materials used are all commercially available products.
[0039] Example 1: Identification and analysis of genome-wide SNP loci in 24 chicken populations
TrimGalore 0.6.5 was used to remove low-quality sequences and adapter sequences from the raw sequencing data. BWA 0.7.17 was used to align the quality-controlled data to the sixth version of the chicken (red junglefowl) reference genome (GRCg6a). GATK 4.2.0.0 was used for SNP detection. The SNP counts obtained for the 24 breeds / strains, including broiler B1, broiler B2, red junglefowl, Aba chicken, Chentang black chicken, and Gushi chicken, are shown in Table 2. Among local chicken breeds, the Piao chicken had the most SNPs (20,055,621), while the Beijing Oil chicken had the fewest (13,432,596). Among broiler strains, the B1 strain had the fewest SNPs (9,031,576), while the B5 strain had the most (13,078,706).
[0040] Table 2. Number of SNPs in different breeds / strains of chickens
[0041] The detected SNPs were annotated using SnpEff to analyze the types and genomic regions of SNPs in the different chicken populations. Annotation showed that in all breeds / strains most SNPs fell in intron regions, with some in intergenic and downstream regions and fewer in exon regions, upstream regions, and other areas. SNP variations were of two types: transitions (Ts) and transversions (Tv). The ratio of transitions to transversions (Ts / Tv) was greater than 2:1 in all breeds / strains, indicating that transitions predominated. Missense mutations and silent mutations were the main mutation types in all breeds / strains, accounting for over 99% of all mutations. Silent mutations were approximately twice as numerous as missense mutations, while nonsense mutations were less common.
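The transition / transversion distinction can be made concrete with a short Python sketch. A transition exchanges two purines (A, G) or two pyrimidines (C, T); every other single-base substitution is a transversion. The function names and the toy substitution list below are illustrative assumptions, not data from the examples.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= PURINES | PYRIMIDINES:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    return {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES


def ts_tv_ratio(substitutions):
    """Ratio of transitions to transversions over a list of (ref, alt) pairs."""
    ts = sum(1 for ref, alt in substitutions if is_transition(ref, alt))
    tv = len(substitutions) - ts
    if tv == 0:
        raise ZeroDivisionError("no transversions in input")
    return ts / tv


# Made-up substitutions: four transitions and one transversion (A>C).
toy = [("G", "A"), ("C", "T"), ("A", "G"), ("A", "C"), ("T", "C")]
print(ts_tv_ratio(toy))  # 4.0
```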
[0042] The kinship matrix among individuals was calculated using GCTA 64 (V 1.93.2), followed by principal component analysis (PCA). The PCA plot was drawn using R. Phylogenetic trees of all populations were constructed with the neighbor-joining (NJ) method in VCF2Dis (https://github.com/BGI-shenzhen/VCF2Dis) and visualized using iTOL. The principal component analysis results (Figure 4) show that the broiler strains are distant from the local chicken breeds, that the B1 and B2 strains are closely related to each other, as are the B4 and B5 strains, and that the local chicken breeds are closer to the red junglefowl. The phylogenetic tree (Figure 5) shows similar results. This study revealed complex genetic differentiation patterns among the 24 chicken breeds / strains. The phylogenetic topology showed two clearly differentiated monophyletic groups: one centered on the broiler breeder lines B1-B5, exhibiting a stepwise branching structure that suggests continuous artificial selection; the other comprising local breeds such as AB, GS, and LY, reflecting genetic differentiation caused by geographical isolation.
[0043] Example 2: Screening of characteristic loci of 24 chicken breeds
The SNPs from all populations in Example 1 were used as the starting material for screening breed-characteristic loci. One breed / strain was selected as the target population, and the other 23 breeds / strains were merged into a single comparison population. Fst analysis was performed using VCFtools (V0.1.13), with Fst values calculated for each population using a window size of 1 kb and a step size of 1 kb. The Fst values of each breed / strain were sorted from highest to lowest according to MEAN_FST. The top 500 loci in the Fst ranking of each breed / strain were extracted and merged using the merge option of PLINK 1.90, giving a total of 12,000 SNPs. PCA of these SNPs showed a trend similar to the PCA of all SNP loci: the five white-feathered broiler strains clustered separately, while the local chickens and red junglefowl clustered together, as shown in Figure 6.
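The rank-and-merge step can be sketched in Python as below. This is an illustration only: it assumes the Fst results have already been loaded into one dictionary per target population, and the names `top_loci`, `merge_top_loci`, and all Fst values are made up. Using a set for the merge drops loci shared between populations, which is an assumption about how duplicates are handled rather than something the specification states.

```python
from typing import Dict, List, Set


def top_loci(fst_by_locus: Dict[str, float], n: int) -> List[str]:
    """Return the n loci with the highest Fst, highest first."""
    ranked = sorted(fst_by_locus.items(), key=lambda item: item[1], reverse=True)
    return [locus for locus, _ in ranked[:n]]


def merge_top_loci(fst_tables: Dict[str, Dict[str, float]], n: int = 500) -> Set[str]:
    """Union of the top-n loci of every one-vs-rest comparison."""
    merged: Set[str] = set()
    for table in fst_tables.values():
        merged.update(top_loci(table, n))
    return merged


# Made-up Fst values for two target populations.
toy_tables = {
    "B1": {"1:100": 0.82, "1:200": 0.10, "2:50": 0.64},
    "Aba": {"1:100": 0.05, "1:200": 0.71, "3:900": 0.33},
}
print(sorted(merge_top_loci(toy_tables, n=2)))
```

With 24 target populations and n=500, the merged set holds at most 24 × 500 = 12,000 loci.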
[0044] Linkage disequilibrium (LD) analysis was used to screen and optimize the genome-wide SNP markers from the 12,000 characteristic loci of the 24 chicken breeds / strains. LD analysis was performed on the merged SNP loci using PLINK 1.90, retaining all SNPs not in LD and one SNP from each LD block, which gave 3,857 SNPs as the characteristic SNP combination for chicken breeds. Phylogenetic analysis of these 3,857 SNPs gave results similar to the phylogenetic tree of all SNP loci: the five broiler strains formed one clade with a stepped distribution, and the remaining local chicken breeds clustered by breed, as shown in Figure 7.
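The idea of keeping one marker per group of correlated SNPs can be illustrated with a simplified greedy pruning sketch in Python. This is not PLINK's windowed algorithm: it compares every SNP against all SNPs already kept, uses the squared correlation of genotype allele counts as the LD measure, and takes an r² threshold of 0.5 purely as an assumption, since the specification gives no threshold. All names and genotypes are hypothetical.

```python
from typing import Dict, List


def r_squared(x: List[int], y: List[int]) -> float:
    """Squared Pearson correlation between two genotype vectors (0/1/2 allele counts)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return cov * cov / (var_x * var_y)


def greedy_ld_prune(genotypes: Dict[str, List[int]], threshold: float) -> List[str]:
    """Keep a SNP only if its r^2 with every SNP already kept is below threshold."""
    kept: List[str] = []
    for locus, calls in genotypes.items():
        if all(r_squared(calls, genotypes[k]) < threshold for k in kept):
            kept.append(locus)
    return kept


# Made-up genotypes for six individuals at three loci.
toy = {
    "1:100": [0, 1, 2, 2, 0, 1],
    "1:150": [0, 1, 2, 2, 0, 1],  # identical to 1:100, so in complete LD with it
    "2:500": [1, 0, 2, 0, 1, 2],  # uncorrelated with 1:100
}
print(greedy_ld_prune(toy, threshold=0.5))  # ['1:100', '2:500']
```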
[0045] Example 3: Optimization of the breed-characteristic SNP combination
The breed-characteristic SNP combination from Example 2 was further optimized. The SNP information of all individuals obtained in Example 2 was randomly divided into a training set (80%) and a test set (20%). Canonical correlation analysis (CCA) was used to reduce the dimensionality of the SNP information, giving a subset of SNPs with strong classification ability. The number of SNPs in the subset was then reduced further. A platform was built in Python, and Random Forest (RF), Naive Bayes (BNM), and Logistic Regression (LR) were used to perform feature selection on the chosen SNPs. The number of SNPs was increased in increments of 100, and the accuracy was recorded at each step. Finally, the selected feature sites were validated on the test set, with a k-Nearest Neighbor (KNN) classifier used to calculate the classification accuracy. The specific process is shown in Figure 1.
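The evaluation loop described above (80/20 split, growing the SNP count step by step, scoring with KNN) can be sketched in plain Python. This is a hedged illustration, not the platform used in the example: the SNP ranking is passed in as a ready-made list (`ranked_columns`) instead of being produced by RF, BNM, or LR, the value k=3 is an assumption, and the genotypes are synthetic.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[List[int], str]  # (genotype vector, breed label)


def split_80_20(samples: Sequence[Sample], seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle and split the samples into 80% training and 20% test."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.8 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def knn_predict(train: Sequence[Sample], query: List[int],
                columns: Sequence[int], k: int) -> str:
    """Majority label among the k training samples closest on the chosen SNP columns."""
    def distance(vector: List[int]) -> int:
        return sum((vector[c] - query[c]) ** 2 for c in columns)
    nearest = sorted(train, key=lambda s: distance(s[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, test, columns, k=3) -> float:
    """Fraction of test samples whose predicted label matches the true label."""
    hits = sum(knn_predict(train, g, columns, k) == label for g, label in test)
    return hits / len(test)


def accuracy_curve(train, test, ranked_columns, step):
    """Accuracy for the top step, 2*step, ... ranked SNP columns."""
    return [(n, accuracy(train, test, ranked_columns[:n]))
            for n in range(step, len(ranked_columns) + 1, step)]


# Synthetic data: columns 0 and 1 separate the two breeds, columns 2 and 3 are noise.
rng = random.Random(1)
toy = [([g, g, rng.randint(0, 2), rng.randint(0, 2)], breed)
       for breed, g in (("A", 0), ("B", 2)) for _ in range(10)]
train, test = split_80_20(toy)
curve = accuracy_curve(train, test, ranked_columns=[0, 1, 2, 3], step=2)
print(curve)
```

In the example above the step is 100 SNPs rather than 2; the smallest SNP count at which the curve reaches its best accuracy is the kind of trade-off the selection aims for.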
[0046] The model performance and cost-effectiveness of the three algorithms were evaluated together on the training and test sets (Figure 2, Figure 3). For practical application, logistic regression (LR) is preferred, with 300 SNPs selected as the optimal number of loci. At 300 SNPs, LR achieves a test-set accuracy of 0.9935 (close to the theoretical maximum), which is 1.35% and 2.71% higher in relative terms than BNM (0.9803) and RF (0.9673), respectively, at the same number of SNPs, and no abnormal fluctuations (such as the decrease in accuracy at SNP=400) were observed at higher SNP counts.
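The two percentages quoted above match relative differences between the reported test-set values, as the following short Python check shows. The function name is introduced here for illustration; the three values are the ones reported in the paragraph.

```python
def relative_gain_percent(value: float, baseline: float) -> float:
    """Relative improvement of value over baseline, in percent."""
    return (value - baseline) / baseline * 100


lr, bnm, rf = 0.9935, 0.9803, 0.9673  # test-set values reported at 300 SNPs
print(round(relative_gain_percent(lr, bnm), 2))  # 1.35
print(round(relative_gain_percent(lr, rf), 2))   # 2.71
```

The corresponding absolute differences are 1.32 and 2.62 percentage points.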
[0047] Example 4: Application of the SNP molecular marker combination for chicken breed identification
The 300 SNP loci from Example 3 were applied to chicken breed identification. Blood samples were collected from two Weixi Lijia black chickens, two broiler B2 strain chickens, and two Luning chickens. Genomic DNA was extracted and resequenced. The quality-controlled data were aligned to the sixth version of the chicken (red junglefowl) reference genome (GRCg6a) using BWA 0.7.17, and SNPs were detected using GATK 4.2.0.0. The 300 SNP loci from Example 3 were extracted from the whole-genome data of these six chickens using VCFtools and merged into the population from Example 1 using PLINK 1.90. Phylogenetic analysis of the 300 loci in the merged population showed that the two Weixi Lijia black chickens, the two broiler B2 strain chickens, and the two Luning chickens were all assigned to the correct breed population, as shown in Figure 8, where the red lines represent the newly added individuals. The breed identification was accurate.
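Extracting a fixed panel of loci from variant calls amounts to a positional filter. The Python sketch below shows the idea on a few VCF-style text lines; it is a simplified stand-in for the VCFtools step, with a hypothetical function name, a made-up sample ID and genotypes, and a one-locus panel (the full list of 300 positions is in Table 1).

```python
from typing import Iterable, Iterator, Set


def filter_vcf_lines(lines: Iterable[str], panel: Set[str]) -> Iterator[str]:
    """Yield header lines and the records whose CHROM:POS is in the panel."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        chrom, pos = line.split("\t", 2)[:2]
        if f"{chrom}:{pos}" in panel:
            yield line


panel = {"1:4724708"}  # the one locus quoted in the text
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tchick_01",
    "1\t4724708\t.\tG\tA\t50\tPASS\t.\tGT\t0/1",
    "1\t5000000\t.\tC\tT\t50\tPASS\t.\tGT\t1/1",
]
kept = list(filter_vcf_lines(toy_vcf, panel))
print(len(kept))  # 3: two header lines plus the one panel record
```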
[0048] Although the present invention has been described in detail above with general descriptions and specific embodiments, modifications or improvements can be made to it, which will be obvious to those skilled in the art. Therefore, all such modifications or improvements made without departing from the spirit of the present invention fall within the scope of protection claimed by the present invention.
Claims
1. A combination of SNP molecular markers for chicken breed identification, characterized in that the molecular marker combination consists of 300 SNP molecular markers, the physical locations of which are shown in Table 1 of the specification. These physical locations are determined based on the chicken sixth version reference genome GRCg6a.
2. A probe assembly for identifying chicken breeds, characterized in that the probe assembly contains reagents for detecting the 300 SNP molecular markers of claim 1.
3. The application of the SNP molecular marker combination of claim 1 or the probe combination of claim 2 in chicken breeding.
4. The application of the SNP molecular marker combination of claim 1 or the probe combination of claim 2 in the identification of five strains of white-feathered broiler chickens B1-B5 and 19 local Chinese chicken breeds.
5. The application according to claim 4, characterized in that the 19 Chinese local chicken breeds include: Aba Chicken, Beijing Oil Chicken, Camellia Chicken, Shimian Grass Chicken, Chentang Black Chicken, Danzhou Chicken, Gushi Chicken, Lhoba Chicken, Lingkun Chicken, Luning Chicken, Liyang Chicken, Linzhi Tibetan Chicken, Longzi Black Chicken, Piao Chicken, Weixi Lijia Black Chicken, Taiping Chicken, Wuliang Mountain Black-bone Chicken, Yulong Snow Mountain Black-bone Chicken, and Red Junglefowl.
6. The application of the SNP molecular marker combination of claim 1 or the probe combination of claim 2 in the assessment of genetic diversity in chickens.
7. The application of the SNP molecular marker combination of claim 1 or the probe combination of claim 2 in chicken pedigree correction.
8. The use of reagents for detecting the SNP molecular marker combination of claim 1 in the preparation of probes or chips for identifying chicken breeds.