Probe combination for capturing mammalian mitochondrial genome and application thereof

A combination of 24,537 probes, used with hybridization capture, addresses the low capture efficiency of existing methods for mammalian mitochondrial genomes. It enables efficient capture of mitochondrial genomes from a wide range of mammals, including low-concentration and highly degraded samples, which reduces cost and improves the feasibility of large-sample studies.

CN122256337A · Pending · Publication Date: 2026-06-23 · KUNMING INST OF ZOOLOGY CHINESE ACAD OF SCI

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: KUNMING INST OF ZOOLOGY CHINESE ACAD OF SCI
Filing Date: 2026-05-26
Publication Date: 2026-06-23

AI Technical Summary

Technical Problem

Existing technologies lack efficient capture methods for a wide range of mammalian mitochondrial genomes. Probe design is costly and difficult to adapt to low-concentration or highly degraded samples, resulting in low sequencing efficiency and high cost.

Method used

A probe combination comprising 24,537 probes covering the mitochondrial genomes of multiple mammalian species was designed. Fragmented whole-genome libraries are built through conventional DNA extraction, fragmentation, end repair, adapter ligation, and PCR amplification; mitochondrial DNA is then specifically enriched by hybridization capture, and the captured libraries are sequenced and assembled. The method is suitable for pooling multiple samples.

Benefits of technology

It achieves efficient capture of mitochondrial genomes from multiple mammals, reduces the cost of obtaining complete sequences from a single sample, is applicable to samples with low concentrations and high degradation, improves the feasibility of large-scale studies, and reduces sequencing costs.


Abstract

The application belongs to the field of molecular biology and biotechnology, and particularly relates to a probe combination for capturing a mammalian mitochondrial genome and application thereof. The nucleotide sequences of the probe combination are the sequences shown in SEQ ID NO: 1~24537. The probe combination has good capture efficiency on the mitochondrial genome of mammals, and can obtain a complete mitochondrial genome sequence. Moreover, the probe combination has very high sensitivity: for difficult samples with low DNA concentration or high degradation, the target sequence can still be captured from more than 90% of the samples. The application can mix the fragmented whole-genome libraries of 4~16 mammalian samples, capture them with the probe combination, and then sequence and assemble them to obtain the corresponding mitochondrial genome sequences. This reduces the cost of obtaining the complete mtDNA sequence of a single sample by 70~75% and greatly improves the feasibility of studies with large sample sizes.

Description

Technical Field

[0001] This invention belongs to the fields of molecular biology and biotechnology, and specifically relates to a probe combination for capturing the mammalian mitochondrial genome and its applications.

Background Technology

[0002] The mammalian mitochondrial genome (mtDNA), approximately 15–17 kbp in length, contains unique genetic material and is an ideal molecular marker for surveying native wildlife resources. It has three main advantages. First, mitochondria are abundant in cells, so it is relatively easy to extract sufficient mtDNA for PCR amplification and subsequent sequencing even from degraded or scarce samples. This makes mtDNA particularly suitable for non-ideal samples collected in the field, and helps ensure that data are reliable and easy to obtain. Second, mtDNA is primarily maternally transmitted and, in most cases, does not undergo recombination, so it preserves relatively complete genetic information. Researchers can therefore use mtDNA to trace maternal lineages and to reveal population migration pathways and historical bottleneck events, which provides strong evidence for reconstructing population history and evolutionary dynamics. Third, mtDNA has a higher mutation rate than the nuclear genome, making it more sensitive for distinguishing closely related populations and capturing recent evolutionary events. This makes mtDNA particularly effective for assessing genetic diversity and local adaptation within a population, and contributes to understanding how a population has evolved.

[0003] Hybridization capture sequencing (HMRS) uses a reference sequence as a template and, based on the principle of complementary base pairing, employs designed nucleic acid probes to specifically enrich target regions in a DNA library before sequencing. This technology selectively captures genomic regions of interest without sequencing the entire genome, which significantly improves sequencing depth and accuracy while reducing cost and the complexity of data analysis. mtDNA sequences are relatively conserved, so multiple probes can be designed for the different subspecies of a single species to capture the mtDNA of that species; alternatively, designing probes from more species allows mtDNA from multiple species to be captured. However, most mtDNA studies using this technology either limit probe design to a single species and its closely related genera, and lack validation of capture efficiency across a broader range of taxa, or attempt to cover all published mammalian mtDNA sequences, which sharply increases the cost of probe design.

Summary of the Invention

[0004] The purpose of this invention is to overcome the shortcomings of existing technologies and propose a highly efficient and universal probe combination for capturing mammalian mitochondrial genomes and its application. This combination can obtain complete mammalian mitochondrial genome sequences with high specificity and sensitivity. It is suitable for difficult samples with low concentrations of DNA and high degradation, reduces the cost of obtaining complete mtDNA sequences from a single sample, and achieves effective capture of the entire mitochondrial genome of a broad spectrum of mammalian groups.

[0005] This invention provides a probe combination for capturing the mammalian mitochondrial genome. The probe combination comprises 24,537 probes, whose nucleotide sequences are shown in SEQ ID NO:1~24537.

[0006] This invention provides a kit or chip for capturing the mammalian mitochondrial genome, comprising the probe combination described in the above technical solution.

[0007] This invention provides the use of the probe combination, or of the kit or chip, described in the above technical solutions for capturing mammalian mitochondrial genomic DNA.

[0008] This invention provides a method for capturing the mitochondrial genome of mammals, comprising the following steps: performing capture with the probe combination described in the above technical solution, followed by sequencing and assembly.

[0009] Preferably, the method includes the following steps: extracting genomic DNA from mammalian samples; constructing a fragmented whole-genome library from the genomic DNA; hybridizing the fragmented whole-genome library with the probe combination to construct a hybridization capture library; and sequencing and assembling the hybridization capture library to obtain the mammalian mitochondrial genome sequence.

[0010] Preferably, the step of constructing a fragmented whole genome library includes: fragmenting the genomic DNA with a fragmentation enzyme, performing end repair and adapter ligation sequentially, obtaining the ligation product, performing PCR amplification, purification, and obtaining a fragmented whole genome library.

[0011] Preferably, when the genomic DNA is undegraded and has a mass ≥300 ng, the number of PCR amplification cycles is 9~13 and the incubation time for end repair is 15 min; when gel electrophoresis of the genomic DNA shows a smeared band and the mass is ≥300 ng, the number of PCR amplification cycles is 9~13 and the incubation time for end repair is 10 min; when gel electrophoresis of the genomic DNA shows that the main band has disappeared and the low-molecular-weight region is enriched, and the mass is ≥500 ng, the number of PCR amplification cycles is 11~15 and the incubation time for end repair is 0 min; when the genomic DNA mass is <300 ng, the number of PCR amplification cycles is 15~19 and end repair uses a gradient incubation of 0~10 min. The mammalian sample includes one or more of mammalian tissue, mammalian blood, and mammalian feces. The incubation temperature is 20°C.

[0012] Preferably, the total DNA mass of the pooled fragmented whole-genome library is ≤6 μg; the pool includes the fragmented whole-genome libraries of 4 to 16 mammalian samples; when the pool contains ≤8 mammalian samples, the DNA mass of the fragmented whole-genome library added for each mammalian sample is 500 ng; when the pool contains >8 mammalian samples, the DNA mass of the fragmented whole-genome library added for each mammalian sample is 350 ng.

[0013] Preferably, the method further includes an accuracy verification step: performing COI amplification sequencing on the genomic DNA to obtain a COI sequence, and assessing the consistency between the mammalian mitochondrial genome sequence and the COI sequence.

[0014] Preferably, the mammals include one or more of the following: mammals of the orders Rodentia, Lagomorpha, Perissodactyla, Carnivora, Cetacea, Primates, Pholidota, Sirenia, and Proboscidea.

[0015] Beneficial effects: This invention provides a probe combination for capturing the mitochondrial genome of mammals. The probe combination comprises 24,537 probes, with nucleotide sequences as shown in SEQ ID NO: 1 to 24537. The probe combination provided by this invention exhibits good capture efficiency for the mitochondrial genomes of the aforementioned mammals, enabling the acquisition of complete mitochondrial genome sequences (greater than 16 kb). Furthermore, the probe combination provided by this invention possesses extremely high sensitivity, maintaining a capture rate of over 90% for difficult samples with low DNA concentrations and high degradation. It is suitable for obtaining mitochondrial DNA from valuable samples such as formalin-fixed specimens and hides in museums.

[0016] Furthermore, the present invention can construct libraries using a multi-sample pooling method: the fragmented whole-genome libraries of 4 to 16 mammalian samples are mixed, captured with the probe combination, sequenced, and assembled to obtain the mitochondrial genome sequences of the 4 to 16 mammalian samples. This reduces the cost of obtaining the complete mtDNA sequence of a single sample by 70 to 75%, greatly improving the feasibility of large-sample studies.

Attached Figure Description

[0017] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the accompanying drawings used in the embodiments will be briefly described below.

[0018] Figure 1 is a visualization of probe species coverage and probe depth; the black line represents coverage and the red line represents probe depth. Figure 2 shows the sequencing data volume of the different mixed libraries. Figure 3 shows the average sequencing depth of the sequencing data from the different mixed libraries.

Detailed Implementation

[0019] This invention provides a probe combination for capturing the mammalian mitochondrial genome. The probe combination comprises 24,537 probes, whose nucleotide sequences are shown in SEQ ID NO:1~24537.

[0020] This invention provides a kit or chip for capturing the mammalian mitochondrial genome, comprising the probe combination described in the above technical solution.

[0021] This invention provides the use of the probe combination, or of the kit or chip, described in the above technical solutions for capturing mammalian mitochondrial genomic DNA.

[0022] This invention provides a method for capturing the mitochondrial genome of mammals, comprising the following steps: performing capture with the probe combination described in the above technical solution, followed by sequencing and assembly.

[0023] In one embodiment, the present invention extracts genomic DNA from mammalian samples. In one embodiment, the mammalian sample includes one or more of mammalian tissue, mammalian blood, and mammalian feces. In one embodiment, the mammalian tissue includes mammalian skin and/or muscle, wherein the mammalian skin is a highly degraded sample. The present invention does not impose strict requirements on the specific steps for extracting genomic DNA from mammalian samples; conventional methods in the art can be used, such as DNA extraction kits. Specifically, the BGMG FastTotal DNA Extraction Kit (magnetic bead method) can be used to extract DNA from muscle samples, and the EasyPure® Blood Genomic DNA Kit (TransGen) can be used to extract DNA from blood samples.

[0024] As one embodiment, the mammals described in this invention include one or more of the following: mammals of the orders Rodentia, Lagomorpha, Perissodactyla, Carnivora, Cetacea, Primates, Pholidota, Sirenia, and Proboscidea.

[0025] In one embodiment, the mammals described in this invention include any one or more listed in Table 1. The probe combination provided by this invention exhibits good capture efficiency for the mitochondrial genomes of the aforementioned mammals, enabling the acquisition of complete mitochondrial genome sequences (greater than 16kb). Furthermore, the probe combination provided by this invention possesses extremely high sensitivity, maintaining a capture rate of over 90% for difficult samples with low DNA concentrations and high degradation, making it suitable for obtaining mitochondrial DNA from valuable samples such as museum formalin-fixed specimens and hides.

[0026] In one embodiment, after obtaining the genomic DNA, the present invention constructs a fragmented whole-genome library based on the genomic DNA. In one embodiment, the steps for constructing the fragmented whole-genome library according to the present invention include: fragmenting the genomic DNA with a fragmentation enzyme, then sequentially performing end repair and adapter ligation, obtaining the ligation product, performing PCR amplification, and purification to obtain the fragmented whole-genome library. The present invention does not impose strict requirements on the specific steps of genomic DNA fragmentation, end repair, adapter ligation, and PCR amplification; all are conventional operations in the art.

[0027] In one embodiment, when the genomic DNA is undegraded and has a mass ≥300 ng, the incubation time for end repair is 15 min; when gel electrophoresis of the genomic DNA shows smearing (i.e., slight degradation) and its mass is ≥300 ng, the incubation time for end repair is 10 min; when gel electrophoresis of the genomic DNA shows that the main band has disappeared and the low-molecular-weight region is enriched (i.e., high degradation) and its mass is ≥500 ng, the incubation time for end repair is 0 min; when the genomic DNA mass is <300 ng, end repair uses a gradient incubation of 0~10 min. In one embodiment, the incubation temperature is 20°C. This invention adjusts the incubation time for end repair according to the state of the genomic DNA extracted from the mammalian sample, thus avoiding further fragmentation.

[0028] In one embodiment, when the genomic DNA is undegraded and has a mass ≥300 ng, the number of PCR amplification cycles is 9~13; when gel electrophoresis of the genomic DNA shows smearing (i.e., slight degradation) and its mass is ≥300 ng, the number of PCR amplification cycles is 9~13; when gel electrophoresis of the genomic DNA shows that the main band has disappeared and the low-molecular-weight region is enriched (i.e., high degradation) and its mass is ≥500 ng, the number of PCR amplification cycles is 11~15; and when the genomic DNA mass is <300 ng, the number of PCR amplification cycles is 15~19. This invention sets the number of PCR cycles according to the state of the extracted genomic DNA, which can improve amplification efficiency.
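The selection rules in paragraphs [0027] and [0028] can be summarized as a small lookup. The following is only an illustrative sketch: the function name, the `PrepParams` type, and the degradation labels `"none"`, `"slight"`, and `"high"` are hypothetical, and the case of a highly degraded sample with 300 ng to <500 ng of DNA is not covered by the text, so the sketch rejects it rather than guessing.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PrepParams:
    end_repair_min: Tuple[int, int]  # (shortest, longest) end-repair incubation, minutes
    pcr_cycles: Tuple[int, int]      # (fewest, most) PCR amplification cycles


def prep_params(mass_ng: float, degradation: str) -> PrepParams:
    """Map DNA mass and gel-assessed degradation to the ranges in [0027]-[0028].

    degradation is one of "none", "slight", "high" (hypothetical labels).
    """
    if mass_ng < 300:
        # Low-input samples: gradient incubation of 0-10 min, 15-19 cycles.
        return PrepParams((0, 10), (15, 19))
    if degradation == "none":
        return PrepParams((15, 15), (9, 13))
    if degradation == "slight":
        return PrepParams((10, 10), (9, 13))
    if degradation == "high":
        if mass_ng < 500:
            # Not specified in the text; refuse instead of inventing a rule.
            raise ValueError("highly degraded DNA with 300 to <500 ng is not covered")
        return PrepParams((0, 0), (11, 15))
    raise ValueError(f"unknown degradation label: {degradation!r}")
```

For example, `prep_params(500, "high")` returns a 0 min end-repair incubation with 11 to 15 PCR cycles.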

[0029] As one implementation method, after obtaining the fragmented whole genome library, the present invention hybridizes the fragmented whole genome library and probe combination to construct a hybridization capture library.

[0030] In one embodiment, the total DNA mass of the pooled fragmented whole-genome library is ≤6 μg. In another embodiment, the pool comprises the fragmented whole-genome libraries of 4 to 16 mammalian samples. When the pool contains ≤8 mammalian samples, the DNA mass of the fragmented whole-genome library added for each mammalian sample is 500 ng. When the pool contains >8 mammalian samples, the DNA mass of the fragmented whole-genome library added for each mammalian sample is 350 ng. The method provided by this invention is applicable to the construction of multiplexed mixed libraries (supporting the mixing of 4 to 16 samples, i.e., the fragmented whole-genome libraries of 4 to 16 mammalian samples are mixed and then hybridized with the probes), which can reduce the cost of obtaining the complete mtDNA sequence of a single sample by 70~75%, greatly improving the feasibility of large-sample studies. This invention does not have strict requirements on the specific hybridization steps; conventional methods in the art can be used.
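The pooling rule in paragraph [0030] is simple arithmetic, and a short check makes the 6 μg ceiling explicit. This is a minimal sketch; the function and constant names are hypothetical.

```python
from typing import Tuple

MAX_POOL_NG = 6000  # total pooled library mass must not exceed 6 ug


def pool_plan(n_samples: int) -> Tuple[int, int]:
    """Return (ng per library, total ng) for a pool of n_samples libraries."""
    if not 4 <= n_samples <= 16:
        raise ValueError("the described method pools 4 to 16 libraries")
    # 500 ng each for pools of up to 8 libraries, 350 ng each for larger pools.
    per_library_ng = 500 if n_samples <= 8 else 350
    total_ng = per_library_ng * n_samples
    if total_ng > MAX_POOL_NG:
        raise ValueError("pooled mass exceeds 6 ug")
    return per_library_ng, total_ng
```

Under these rules the largest pool, 16 libraries at 350 ng each, totals 5,600 ng, which stays under the 6 μg limit.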

[0031] In one implementation method, after obtaining the hybridization capture library, the present invention sequences and assembles the hybridization capture library to obtain the mammalian mitochondrial genome sequence. The present invention does not have strict requirements on the sequencing and assembly methods; conventional methods in the art can be used.

[0032] In one embodiment, the method of the present invention further includes an accuracy verification step: performing COI amplification and sequencing on the genomic DNA to obtain a COI sequence, and assessing the consistency between the mammalian mitochondrial genome sequence and the COI sequence. In the present invention, when the mammalian mitochondrial genome sequence is 100% consistent with the COI sequence, it is considered accurate.

[0033] To further illustrate the present invention, the following detailed description, in conjunction with the accompanying drawings and embodiments, provides a probe combination for capturing the mammalian mitochondrial genome and its applications, but these descriptions should not be construed as limiting the scope of protection of the present invention.

[0034] Example 1: Probe design. Mammalian mtDNA reference sequences were downloaded from the NCBI database, covering 152 species from 96 genera and 32 families across 9 orders of mammals (Rodentia, Lagomorpha, Perissodactyla, Carnivora, Cetacea, Primates, Pholidota, Sirenia, and Proboscidea). For each mtDNA reference sequence, 100 bp probe sequences were designed. After identical probe sequences were removed, an oligonucleotide probe pool was synthesized on a microarray using high-throughput synthesis. Subsequently, biotin-labeled NTPs and nucleotide analogs were used as raw materials for in vitro amplification and preparation of RNA probes, resulting in a mammalian hybridization capture probe pool with a total of 24,537 probes. The species coverage and reference sequence information are shown in Table 1. The visualization of the species coverage and probe depth of the RNA probe pool is shown in Figure 1. The nucleotide sequences of the probes are shown in SEQ ID NO:1~SEQ ID NO:24537 in the sequence listing, and the probes were prepared by Aijitaikang using in situ synthesis technology.
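The design step in Example 1 (100 bp probes per reference, then removal of identical probes) can be sketched as window tiling followed by de-duplication. The source does not state the tiling step or how the circular genome boundary was handled, so the `step` value below is an assumption and the sketch treats each reference as linear; the function name is hypothetical.

```python
from typing import Dict, List


def tile_probes(references: Dict[str, str], probe_len: int = 100,
                step: int = 50) -> List[str]:
    """Tile fixed-length probes over each reference and drop exact duplicates.

    step is an assumed tiling interval; the source only gives the probe length.
    """
    seen = set()
    probes = []
    for sequence in references.values():
        sequence = sequence.upper()
        for start in range(0, len(sequence) - probe_len + 1, step):
            probe = sequence[start:start + probe_len]
            if probe not in seen:  # keep only the first copy of each probe
                seen.add(probe)
                probes.append(probe)
    return probes
```

Because conserved regions yield identical windows across closely related species, de-duplication is what keeps the pool size well below the raw window count.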

[0035] Table 1. Information on 152 species for the designed probes

[0036] Example 2: A method for targeted enrichment of mammalian mitochondrial genome sequences, comprising the following steps: 1. Extraction of genomic DNA from animal samples. Genomic DNA was extracted from animal tissue, blood, or fecal samples. Integrity was checked by gel electrophoresis, and concentration was measured using a Qubit 4.0 fluorometer to ensure that the DNA quality met the requirements for subsequent library construction.

[0037] 2. Construction of fragmented whole-genome libraries. The reagents used were from the IGT® Enzyme Plus LibraryPrep Kit V3 and the adapter module IGT® Adapter & UDI Primer. The specific steps are as follows: (1) Pretreatment: For undegraded samples with a high DNA concentration, take 300 ng of DNA and bring the total volume to 40 μL with Nuclease-Free Water; for slightly degraded samples with a moderate DNA concentration, take 300 ng of DNA and bring the total volume to 40 μL with Nuclease-Free Water; for highly degraded samples with a moderate DNA concentration, take 500 ng of DNA and bring the total volume to 40 μL with Nuclease-Free Water; for samples with an extremely low DNA concentration, whose total amount cannot reach the minimum starting amount of 300 ng, use the entire DNA extract and bring the volume to 40 μL.

[0038] (2) DNA fragmentation and end repair: Add 10 μL of Fragment&ERA Buffer v3 and 10 μL of Fragment&ERA Enzyme Mix v3 to the DNA sample (40 μL system) and run the reaction on a thermal cycler. Program settings: heated lid temperature 85℃; 4℃ 1 min → 30℃ 0~15 min → 65℃ 20 min → hold at 4℃. The incubation time at 30℃ depends on the degree of degradation of the sample. Specifically: undegraded samples are incubated for 15 min; slightly degraded samples are incubated for 10 min; highly degraded samples are incubated for 0 min; and samples with extremely low concentrations, whose degradation status cannot be determined, are incubated for 0~10 min.

[0039] (3) Adapter ligation: On ice, add 5 μL of diluted Adapter, 30 μL of Adapter Ligation Buffer v3, and 5 μL of Adapter Ligase v3 to the reaction system, incubate at 20°C for 15 min, and hold at 4°C.

[0040] (4) Purification of ligation products: Add 55 μL of purification magnetic beads (IGT® Pure Beads), mix well, let stand at room temperature for 5 min, place on the magnet for 3 min, and discard the supernatant. Wash with 200 μL of 80% ethanol, let stand for 30 s, and discard the supernatant; perform this wash twice. Dry at room temperature for 3~5 min, resuspend in 22 μL of Nuclease-Free Water, mix well, let stand at room temperature for 2 min, centrifuge briefly, place on the magnet for 2 min, and collect the supernatant.

[0041] (5) Add the UDI Primer for the MGI sequencing platform and perform PCR amplification: Add 25 μL of PCR Master Mix and 5 μL of UDI Primer, mix well, centrifuge briefly, and amplify. Program settings: heated lid temperature 105℃; 98℃ 1 min → (98℃ 20 s → 60℃ 30 s → 72℃ 30 s), 9~13 cycles → 72℃ 2 min → hold at 4℃. The number of denaturation, annealing, and extension cycles depends on the degree of DNA degradation and the starting amount. Specifically: samples that are undegraded or slightly degraded and whose starting amount reaches 300 ng are subjected to 9~13 cycles; samples that are highly degraded and whose starting amount reaches 500 ng are subjected to 11~15 cycles; samples whose starting amount cannot reach 300 ng are subjected to 15~19 cycles.

[0042] (6) Purification of PCR products: Add 1.1 volumes of purification magnetic beads (IGT® Pure Beads) to the reaction system from the previous step for magnetic bead purification. Take 1 μL of the library for Qubit quantification, and 1~2 μL of the library for gel electrophoresis to check the size of the library fragments.

[0043] 3. Construction of the hybridization capture library. The reagents used were provided by TargetSeq Corporation, including the TargetSeq One® Hyb&Wash Kit v2.0 hybridization and wash module, the TargetSeq® Universal Blocking Oligo blocking module, the probe combination designed in Example 1, and the TargetSeq® Cap Beads capture magnetic beads. The specific steps are as follows: (1) Pretreatment: According to the required pool size, multiple libraries are mixed so that the total amount does not exceed 6 μg. For example, for 4-in-1, 5-in-1, or 8-in-1 pools, 500 ng of each fragmented DNA library is added, while for 16-in-1 pools the amount of each fragmented DNA library is kept at about 350 ng.

[0044] (2) Liquid-phase hybridization of the fragmented whole-genome library with the probes obtained in Example 1: Take about 5 μg of the pooled, purified library, add 1.8 volumes of purification magnetic beads (IGT® Pure Beads), mix well, let stand at room temperature for 5 min, place on the magnet for 3 min, and discard the supernatant. Add 200 μL of 80% ethanol, let stand for 30 s, discard the supernatant, and air dry the magnetic beads at room temperature for 3~5 min. Add 13 μL of TargetSeq One® Hyb Buffer v2, 5 μL of Hyb Human Block, 2 μL of TargetSeq® Blocking Oligo, 5 μL of RNase Block, 3 μL of Nuclease-Free Water, and 2 μL of TargetSeq® Target Probes, mix well, let stand at room temperature for 3 min, place on the magnet for 3 min, and transfer the supernatant to a new PCR tube. Pipette gently to mix, centrifuge briefly, and place on the thermal cycler. Program settings: heated lid temperature 85℃; 80℃ for 5 min, then hybridization at 50℃ for 12~18 h.

[0045] (3) PCR amplification of the DNA enriched by hybridization capture: Take 50 μL of Cap Beads, place on the magnet for 1 min, and discard the supernatant; then add 180 μL of Binding Buffer and mix well. Add the prepared 180 μL of Cap Beads to the hybridization product and mix well. Place the tube on a vertical rotating mixer at 10 rpm and bind for 30 min at room temperature. Then centrifuge briefly, place on the magnet for 2 min, and discard the supernatant. Add 150 μL of Wash Buffer 1 to the PCR tube, mix well to resuspend the magnetic beads, replace the tube cap, place the tube on the vertical rotating mixer, and wash for 15 min at room temperature at 10 rpm. Place on the magnet for 2 min and discard the supernatant. Add 150 μL of TargetSeq One® Wash Buffer 2 v2 preheated to 50℃, mix well, centrifuge briefly, and incubate on a constant-temperature shaking mixer or metal bath for 10 min at 50℃. Remove the PCR tube, centrifuge briefly, place on the magnet for 2 min, and discard the supernatant; perform this wash three times. Keep the PCR tube on the magnetic rack, add 200 μL of 80% ethanol, let stand for 30 s, discard the ethanol solution, and air dry the magnetic beads at room temperature. Resuspend the magnetic beads in 24 μL of Nuclease-Free Water for subsequent PCR amplification. Add 1 μL of Post PCR Primer and 25 μL of Post PCR Master Mix to the resuspended magnetic beads, mix well, and quickly transfer to the thermal cycler. Program settings: heated lid temperature 105℃; 95℃ 1 min → (98℃ 20 s → 60℃ 30 s → 72℃ 30 s), 17 cycles → 72℃ 5 min → hold at 4℃.

[0046] (4) Recovery of the purified DNA and next-generation sequencing: Add 1.1 volumes of purification magnetic beads (IGT® Pure Beads) to the reaction system from the previous step for magnetic bead purification. Take 1 μL for Qubit quantification, and run 1~2 μL of the library on a gel to check the size of the library fragments. Transfer the hybridization capture product to a 1.5 mL low-adsorption centrifuge tube, label the tube with the sample information, including the library type (DNA library) and the adapter sequence information, and perform next-generation sequencing on the BGI DNBSEQ-T7 platform.

[0047] 4. Assembly of the sequencing data to obtain the mitochondrial genome sequence. The specific steps are as follows: (1) Raw data quality control: Adapter sequences were automatically identified using the fastp software, and the raw reads were filtered to obtain clean reads. Subsequent analyses were based on the clean reads. The data filtering parameters included: removing poly-G tails; removing bases with a base quality below 20; removing reads with an N base content of more than 10%; and finally retaining reads longer than 30 bp.
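The filtering criteria listed in step (1) can be illustrated on a single read. This is a plain-Python sketch of the criteria, not the quality-control tool's actual implementation: it assumes Phred+33 quality encoding, interprets "removing bases with a base quality below 20" as trimming low-quality bases from the 3' end, and uses a hypothetical minimum poly-G run length of 10. All function names are hypothetical.

```python
from typing import Optional, Tuple


def trim_poly_g(seq: str, qual: str, min_run: int = 10) -> Tuple[str, str]:
    """Remove a trailing run of G if it is at least min_run bases long (assumed cutoff)."""
    stripped = seq.rstrip("G")
    if len(seq) - len(stripped) >= min_run:
        return stripped, qual[:len(stripped)]
    return seq, qual


def trim_low_quality_tail(seq: str, qual: str, min_q: int = 20,
                          offset: int = 33) -> Tuple[str, str]:
    """Trim 3' bases whose Phred score is below min_q (assumed interpretation)."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def filter_read(seq: str, qual: str, max_n_frac: float = 0.10,
                min_len: int = 30) -> Optional[Tuple[str, str]]:
    """Return the cleaned (seq, qual), or None if the read is discarded."""
    seq, qual = trim_poly_g(seq, qual)
    seq, qual = trim_low_quality_tail(seq, qual)
    if len(seq) <= min_len:  # keep only reads longer than 30 bp
        return None
    if seq.count("N") / len(seq) > max_n_frac:  # drop reads with >10% N
        return None
    return seq, qual
```

Poly-G trimming matters on two-color and DNB-based platforms because a missing signal can be called as G, which produces artificial G runs at read ends.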

[0048] (2) mtDNA sequence assembly and extraction of representative contigs: The filtered clean reads are assembled with MEGAHIT using default parameters, and the contig with the largest multi value is selected as the representative contig. In some cases, short contigs may have higher multi values, so contig length also needs to be considered when making the selection.
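The selection rule in step (2), the highest multi value while taking contig length into account, can be sketched by parsing the `multi=` and `len=` fields that MEGAHIT writes into contig headers. The length cutoff `min_len` is a hypothetical threshold chosen for illustration; the source only says that length should be considered.

```python
import re
from typing import Iterable, Optional, Tuple

# Matches headers such as ">k141_2 flag=1 multi=410.5000 len=16420".
HEADER = re.compile(r"^>(\S+).*?multi=([\d.]+).*?len=(\d+)")


def representative_contig(headers: Iterable[str],
                          min_len: int = 10000) -> Optional[Tuple[str, float, int]]:
    """Return (name, multi, length) of the highest-multi contig of at least min_len bp."""
    best = None
    for line in headers:
        match = HEADER.match(line)
        if not match:
            continue
        name = match.group(1)
        multi = float(match.group(2))
        length = int(match.group(3))
        if length < min_len:
            continue  # short contigs can have inflated multi values
        if best is None or multi > best[1]:
            best = (name, multi, length)
    return best
```

Applied to the header lines of an assembly FASTA file, the function returns `None` when no contig passes the length cutoff, which signals that the sample needs manual inspection.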

[0049] (3) Species identification: Upload the representative contigs to NCBI BLAST for species identification. Identifying the species from the highest bit score alone may lead to errors. Therefore, hits with fewer mismatches and gap openings should be preferred, and the length of the successfully aligned sequence should also be considered. In general, the most similar species should be judged from the alignment length and similarity together.
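One way to express the selection logic of step (3) is to first require that a hit covers most of the query and then rank by identity, alignment length, mismatches, and gap openings. This is a sketch under assumptions: the `Hit` type, the function name, and the coverage threshold `min_cover` are hypothetical, and the exact weighting used by the authors is not given in the source.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    species: str
    identity_pct: float
    align_len: int
    mismatches: int
    gap_opens: int
    bit_score: float


def best_hit(hits: List[Hit], query_len: int,
             min_cover: float = 0.9) -> Optional[Hit]:
    """Pick the most similar species without relying on bit score alone."""
    if not hits:
        return None
    # Prefer hits whose alignment spans most of the query (assumed threshold).
    long_enough = [h for h in hits if h.align_len >= min_cover * query_len]
    candidates = long_enough or hits
    return max(
        candidates,
        key=lambda h: (h.identity_pct, h.align_len, -h.mismatches, -h.gap_opens),
    )
```

Filtering on coverage first prevents a short but perfect local match from outranking a near-complete alignment to the correct species.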

[0050] (4) Reassembly of the mtDNA sequence: Download the RefSeq reference sequence of the most similar species, align the clean reads to it using BWA-MEM, and use Samtools to sort and index the alignments, retaining the successfully aligned reads. Use Picard to remove PCR duplicates, then use Samtools to generate the sequencing depth and coverage files, which are visualized using the R package ggplot2. After converting the deduplicated BAM file to a FASTA file, perform sequence assembly with MEGAHIT using default parameters, again selecting the contig with the largest multi value as the representative contig, i.e., the mtDNA sequence of this sample. Reassembling after alignment to the reference sequence is necessary to minimize the number of repetitive sequences in the final assembled sequence, making the obtained sequence more accurate. If the identified species does not have a reference sequence, or the reference sequence is incomplete, the complete mtDNA sequence from the first successful assembly is used as the result.
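The depth and coverage summary in step (4) can be computed from per-position depth output. The sketch below assumes the three-column, tab-separated format (reference name, position, depth) produced by `samtools depth` for a single BAM file; the function name is hypothetical, and dividing by the reference length means positions omitted from the file count as zero depth.

```python
from typing import Iterable, Tuple


def coverage_stats(depth_lines: Iterable[str], ref_len: int) -> Tuple[float, float]:
    """Return (breadth of coverage, mean depth) over a reference of ref_len bp."""
    covered = 0
    total_depth = 0
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        _ref, _pos, depth = line.split("\t")
        depth_value = int(depth)
        total_depth += depth_value
        if depth_value > 0:
            covered += 1
    return covered / ref_len, total_depth / ref_len
```

A breadth close to 1.0 together with a roughly even depth profile is a quick sanity check before the deduplicated reads are reassembled.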

[0051] Example 3: Verification of mtDNA assembly results. 1. Experimental animals: 48 mammal samples.

[0052] 2. Complete mtDNA sequences were obtained following the steps in Example 2. Complete mtDNA reference sequences were obtained from 14 samples through whole-genome resequencing and used as positive controls.

[0053] 3. COI gene amplification and sequencing: The primers used for COI gene amplification were universal primers (Forward (COI-C02): 5'-AYTCAACAAATCATAAAGATATTGG-3', SEQ ID NO:24538; Reverse (Chmr4): 5'-ACYTCRGGRTGRCCRAARAATCA-3', SEQ ID NO:24539). After PCR amplification, Sanger sequencing was performed. The noisy peaks at both ends of each ab1 trace were trimmed using Sequencher and the SeqMan tool of the DNAstar software package. The forward and reverse sequences were assembled into a single sequence and saved as a FASTA file, which was then uploaded to NCBI BLAST. The best-matching species was determined by combining the alignment score (Bit Score), the number of mismatches (Mismatches), and the number of gaps (Gaps) to avoid misjudgments caused by relying solely on similarity.

[0054] 4. Accuracy verification by local BLASTN comparison: A local database was constructed from the assembled mtDNA sequences. Each COI sequence was used as the query and aligned with BLASTN. The number of mismatched bases and the sequence similarity (Identity%) were recorded to evaluate the consistency between the assembly and the COI sequence. The results are shown in Table 2. In addition, complete mtDNA reference sequences obtained through whole-genome resequencing were used as positive controls to verify the overall accuracy of the hybridization capture assemblies. The results are shown in Table 3.
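The consistency check in step 4 can be sketched as a parser for BLASTN tabular output. The sketch assumes the standard 12-column tabular format (`-outfmt 6`: query, subject, percent identity, alignment length, mismatches, gap openings, query start and end, subject start and end, e-value, bit score); the function name and the returned dictionary keys are hypothetical. Following paragraph [0032], a sample counts as consistent only when the COI alignment is 100% identical.

```python
from typing import Dict, Iterable, Optional


def coi_consistency(tabular_lines: Iterable[str],
                    subject_id: str) -> Optional[Dict[str, object]]:
    """Summarize the longest COI alignment against one assembled mtDNA sequence."""
    best = None
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[1] != subject_id:
            continue  # only compare against the sample's own assembly
        record = {
            "query": fields[0],
            "subject": fields[1],
            "identity_pct": float(fields[2]),
            "align_len": int(fields[3]),
            "mismatches": int(fields[4]),
            "gap_opens": int(fields[5]),
        }
        if best is None or record["align_len"] > best["align_len"]:
            best = record
    if best is not None:
        best["consistent"] = (
            best["identity_pct"] == 100.0
            and best["mismatches"] == 0
            and best["gap_opens"] == 0
        )
    return best
```

Restricting the comparison to the named subject avoids accidentally scoring a sample's COI sequence against another sample's assembly in the shared local database.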

[0055] Table 2 Comparison of mtDNA assembly sequence and COI sequence

[0056] Table 3 Comparison of mtDNA assembled sequences and whole-genome resequencing sequences

[0057] As shown in Tables 2 and 3, only one of the 48 samples had a COI sequence that differed from the probe-captured mtDNA sequence, at a single site, possibly because that site was a degenerate base. Of the 14 samples for which complete mtDNA sequences were obtained through whole-genome resequencing, 8 were 100% identical to the probe-captured mtDNA sequence; 5 had one extra repetitive sequence segment and were otherwise identical; and 1 differed at a single site, possibly because of a degenerate base at that site. Different assembly strategies led to different assembly results. These results indicate that the mtDNA assemblies obtained by hybridization capture in this invention are essentially consistent with both the COI sequencing results and the whole-genome resequencing results.

[0058] Example 4 Construction of libraries from highly degraded, low-concentration, and contaminated samples
1. In this embodiment, three skin samples stored at room temperature in the specimen collection and animal germplasm resource bank of the Xinye Institute of Biogeography were selected to represent highly degraded samples; 13 samples with DNA concentrations below 1 ng/μL were selected to represent difficult low-concentration samples; and 13 fecal samples that may be contaminated with exogenous DNA were selected to represent contaminated samples. Specific information is shown in Tables 4 to 6.

[0059] 2. Differential treatment during fragmented library construction:
(1) Adjustment of DNA input amount: when the concentration is sufficient, use 500 ng of DNA and make the system up to 40 μL with Nuclease-Free Water; when the total amount of DNA is less than 300 ng, use the extract directly and make up to 40 μL.
(2) Optimization of end repair time: slightly degraded samples are incubated for 10 min; highly degraded samples are incubated for 0 min (to avoid further fragmentation); samples with very low concentration are incubated over a gradient of 0 to 10 min (adjusted according to the electrophoresis results).
(3) Adjustment of PCR cycle number: highly degraded samples with an input amount ≥500 ng are subjected to 11 to 15 cycles; samples with an input amount <300 ng are subjected to 15 to 19 cycles (to improve template amplification efficiency).
The remaining procedures are the same as in Example 2. The results are shown in Tables 4 to 6.
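The rules above, together with the thresholds stated in claim 7 for undegraded and slightly degraded DNA (9 to 13 cycles; 15 min and 10 min end repair, respectively), can be written as a small lookup. This is a sketch under stated assumptions: the degradation labels `"none"`, `"slight"`, and `"high"` and the names `PrepParams` and `choose_prep_params` are hypothetical, and combinations the text does not cover raise an error instead of being guessed.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PrepParams:
    end_repair_min: Tuple[int, int]  # (min, max) minutes; equal values mean a fixed time
    pcr_cycles: Tuple[int, int]      # (min, max) number of PCR cycles


def choose_prep_params(dna_ng: float, degradation: str) -> PrepParams:
    """Map DNA input mass and degradation state to library-prep settings."""
    if degradation not in ("none", "slight", "high"):
        raise ValueError(f"unknown degradation label: {degradation!r}")
    if dna_ng < 300:
        # Low-input samples: gradient end repair, more PCR cycles.
        return PrepParams(end_repair_min=(0, 10), pcr_cycles=(15, 19))
    if degradation == "none":
        return PrepParams(end_repair_min=(15, 15), pcr_cycles=(9, 13))
    if degradation == "slight":
        return PrepParams(end_repair_min=(10, 10), pcr_cycles=(9, 13))
    if dna_ng >= 500:
        # Highly degraded: skip the end-repair incubation to avoid further fragmentation.
        return PrepParams(end_repair_min=(0, 0), pcr_cycles=(11, 15))
    # Highly degraded DNA between 300 and 500 ng is not specified in the text.
    raise ValueError("no rule given for highly degraded DNA between 300 and 500 ng")
```

Returning ranges instead of single values mirrors the text, which leaves the exact cycle number and gradient time to be chosen from the electrophoresis results.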

[0060] Table 4. Probe capture efficiency of skin samples

[0061] Table 5. Probe capture efficiency for low-concentration samples

[0062] Table 6 Probe capture efficiency of fecal samples

[0063] As shown in Table 4, two skin samples showed highly degraded DNA with extremely low concentrations and relatively low average sequencing depths, but all three skin samples assembled complete mtDNA sequences with a length of approximately 16,500 bp.

[0064] As shown in Table 5, the concentrations of the 13 low-concentration DNA samples were all below 1 ng/μL, and gel electrophoresis showed bands that were barely visible or completely absent. Nevertheless, after hybridization capture, the average sequencing depth exceeded 10,000×, and all samples were assembled into complete mtDNA sequences of approximately 16,500 bp.

[0065] As shown in Table 6, of the 13 fecal samples that may have been contaminated with exogenous DNA, 10 samples assembled complete mtDNA sequences. Two samples, which had extremely low DNA concentration and average sequencing depth, failed to assemble complete mtDNA sequences. One sample with an average sequencing depth of only 11× nevertheless assembled a complete mtDNA sequence, whereas one sample with an average sequencing depth above 10,000× assembled only an 11,317 bp mtDNA sequence.

[0066] The above results indicate that the probe pool of the present invention achieves a capture efficiency of over 90% for highly degraded samples, samples with extremely low DNA concentrations, and potentially contaminated samples.

[0067] Example 5 Construction of multi-sample mixed libraries
1. Fragmented library mixing: According to the required number of samples per pool, the individual DNA libraries that have completed fragmented library construction are combined at the preset mixing ratio (such as 4-to-1, 5-to-1, 8-to-1, or 16-to-1), and the total amount of DNA in the mixed library is kept at ≤6 μg. For regular mixing (4 to 8 libraries per pool), 500 ng of DNA is added from each sub-library; for high-density mixing (16-to-1), the amount of DNA added from each sub-library is reduced to about 350 ng.
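The input arithmetic behind these mixing rules can be checked with a short calculation. The sketch below is illustrative: it follows claim 8 in applying 350 ng per sub-library to any pool of more than 8 libraries, and the function name `pool_plan` is hypothetical.

```python
from typing import Dict

MAX_TOTAL_NG = 6000  # total DNA in the mixed library must be <= 6 ug


def pool_plan(n_libraries: int) -> Dict[str, int]:
    """Return the per-library and total DNA input (ng) for a mixed library."""
    if not 4 <= n_libraries <= 16:
        raise ValueError("the text describes pools of 4 to 16 libraries")
    # Regular mixing (up to 8 libraries) uses 500 ng each; denser pools use 350 ng each.
    per_library_ng = 500 if n_libraries <= 8 else 350
    total_ng = per_library_ng * n_libraries
    if total_ng > MAX_TOTAL_NG:
        raise ValueError("total DNA exceeds the 6 ug limit")
    return {"per_library_ng": per_library_ng, "total_ng": total_ng}
```

An 8-to-1 pool therefore carries 4,000 ng in total and a 16-to-1 pool 5,600 ng, both under the 6 μg ceiling; keeping 500 ng per library at 16-to-1 would give 8,000 ng, which is why the per-library input is reduced.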

[0068] 2. Hybridization Capture Processing: The mixed library was processed under the same conditions as in Example 2, with each sub-library identified by its paired-end index. The name of each sub-library and its corresponding paired-end index were labeled during sequencing sample delivery.

[0069] As shown in Figures 2 and 3, in terms of the amount of data obtained after sequencing, with a specified sequencing volume of 5 G, the data volume of each sub-library in a 4-to-1 library ranged from 0.16 G to 6.74 G, accounting for 3.2% to 43.7%; the data volume of each sub-library in an 8-to-1 library ranged from 0.18 G to 3.58 G, accounting for 1.8% to 35.2%; and the data volume of each sub-library in a 16-to-1 library ranged from 0.04 G to 1.5 G, accounting for 0.5% to 18.2%. With a specified sequencing volume of 1 G, the data volume of each sub-library in an 8-to-1 library ranged from 0.02 G to 1.88 G, accounting for 0.4% to 40.3%; and the data volume of each sub-library in a 16-to-1 library ranged from 0.06 G to 3.8 G, accounting for 0.6% to 37.5%. This indicates that the amount of data per sub-library decreases as the number of mixed libraries increases and as the amount of sequencing data decreases. The average sequencing depth of each sub-library likewise decreases as the number of mixed libraries increases and as the sequencing volume decreases; that is, average sequencing depth is inversely proportional to the number of mixed libraries and directly proportional to the amount of sequencing data. Across the sequencing data, over 90% of the samples assembled complete mtDNA sequences of approximately 16,500 bp.
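The stated proportionality can be made concrete with a back-of-the-envelope estimate. The sketch below is an idealization and not part of the described method: it assumes every sub-library receives an equal share of the data (the observed shares above are far from even), it assumes "G" denotes gigabases, and the on-target fraction is a hypothetical parameter that the text does not report.

```python
def expected_mean_depth(
    total_bases: float,
    n_libraries: int,
    genome_length_bp: int = 16500,
    on_target_fraction: float = 1.0,
) -> float:
    """Idealized mean mtDNA depth per sub-library under perfectly even pooling.

    depth = total_bases * on_target_fraction / (n_libraries * genome_length_bp)
    """
    if n_libraries <= 0 or genome_length_bp <= 0:
        raise ValueError("n_libraries and genome_length_bp must be positive")
    if not 0.0 <= on_target_fraction <= 1.0:
        raise ValueError("on_target_fraction must be between 0 and 1")
    return total_bases * on_target_fraction / (n_libraries * genome_length_bp)
```

Under these assumptions, doubling the number of pooled libraries halves the expected depth, and cutting the sequencing volume from 5 G to 1 G reduces it five-fold, which matches the direction of the trends reported above.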

[0070] Example 6 Hybridization capture, COI validation, and mixed library construction of known samples
1. Experimental animal materials, frozen at -80°C: a Chinese pangolin (Manis pentadactyla) muscle sample, an Asian elephant (Elephas maximus) muscle sample, a snow hare (Lepus timidus) muscle sample, and a wild camel (Camelus ferus) blood sample.

[0071] 2. Enrich the mitochondrial genomes of the experimental animals according to the steps in Example 2. The specific steps are as follows:
2.1. DNA extraction: Muscle samples were cut to a size of 5 mm × 5 mm × 1 mm, and 50 to 100 μL of each blood sample was used. DNA was extracted from muscle samples by the magnetic bead method using the BGMG Fast TotalDNA Extraction Kit, and from blood samples using the EasyPure® Blood Genomic DNA Kit (TransGen). The DNA quality results are shown in Table 7.

[0072] Table 7 DNA extraction concentration and degradation degree

[0073] 2.2. Construction of fragmented whole-genome libraries (1) Pretreatment: The DNA of Asian elephant samples was not degraded and had a high concentration. 300 ng of DNA was added and the system was made up to 40 μL with Nuclease-Free Water. The DNA of snow hare samples was slightly degraded and had a moderate concentration. 300 ng of DNA was added and the system was made up to 40 μL with Nuclease-Free Water. The DNA of Chinese pangolin samples was highly degraded and had a moderate concentration. 500 ng of DNA was added and the system was made up to 40 μL with Nuclease-Free Water. The DNA of wild camel samples had a very low concentration. 40 μL of DNA extraction solution was added.
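The volumes implied by this pretreatment follow from the measured concentration. The sketch below is a minimal illustration with hypothetical concentrations (Table 7 values are not reproduced here): it computes the extract volume needed to reach the target mass, tops up to 40 μL with water, and falls back to 40 μL of undiluted extract when the concentration is too low to reach the target. The function name `input_volumes` is hypothetical.

```python
from typing import Tuple


def input_volumes(
    conc_ng_per_ul: float, target_ng: float, final_ul: float = 40.0
) -> Tuple[float, float, float]:
    """Return (extract_ul, water_ul, actual_input_ng) for a library-prep reaction."""
    if conc_ng_per_ul <= 0 or target_ng <= 0 or final_ul <= 0:
        raise ValueError("concentration, target mass, and volume must be positive")
    extract_ul = target_ng / conc_ng_per_ul
    if extract_ul >= final_ul:
        # Extract too dilute: use the full reaction volume of extract, no water.
        return final_ul, 0.0, conc_ng_per_ul * final_ul
    return extract_ul, final_ul - extract_ul, target_ng
```

For instance, an extract at a hypothetical 50 ng/μL needs 6 μL of DNA plus 34 μL of water for a 300 ng input, whereas an extract at 0.5 ng/μL yields only 20 ng even when all 40 μL is extract, which corresponds to the wild camel case.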

[0074] (2) The library construction procedure is the same as in Example 2.

[0075] 3. The accuracy of probe capture was verified as in Example 3, and a 4-to-1 mixed library was constructed as in Example 5. The results are shown in Tables 8 and 9.

[0076] Table 8 Results of COI sequence amplification and species identification

[0077] Table 9. Assembly of captured mtDNA sequences and species identification from 4-to-1 mixed libraries

[0078] The above results show that the present invention effectively captures and sequences the mitochondrial genomes of multiple mammalian species, achieving cross-species, high-throughput, and accurate acquisition of mammalian mitochondrial genomes. Capture and sequencing can be completed by combining the method of the present invention with the experimental procedures of commercial hybridization capture reagents, which is convenient and relatively low in cost.

[0079] Although the above embodiments describe the present invention in detail, they are only some, not all, of its embodiments. Those skilled in the art can obtain other embodiments based on these embodiments without creative effort, and all such embodiments fall within the protection scope of the present invention.

Claims

1. A probe combination for capturing the mammalian mitochondrial genome, characterized in that the probe combination comprises 24,537 probes whose nucleotide sequences are shown in SEQ ID NO: 1 to 24537.

2. A kit or chip for capturing the mammalian mitochondrial genome, characterized in that it comprises the probe combination of claim 1.

3. The use of the probe combination of claim 1 or the kit or chip of claim 2 in capturing mammalian mitochondrial genomic DNA.

4. A method for capturing the mammalian mitochondrial genome, characterized in that the method comprises the following steps: capturing with the probe combination of claim 1, sequencing, and assembly.

5. The method according to claim 4, characterized in that it comprises the following steps: extracting genomic DNA from a mammalian sample; constructing a fragmented whole-genome library from the genomic DNA; mixing and hybridizing the fragmented whole-genome library with the probe combination to construct a hybridization capture library; and sequencing and assembling the hybridization capture library to obtain the mammalian mitochondrial genome sequence.

6. The method according to claim 5, characterized in that the steps for constructing the fragmented whole-genome library include: fragmenting the genomic DNA with a fragmentation enzyme, performing end repair and adapter ligation sequentially to obtain a ligation product, and performing PCR amplification and purification to obtain the fragmented whole-genome library.

7. The method according to claim 6, characterized in that: when the genomic DNA is undegraded and its mass is ≥300 ng, the number of PCR amplification cycles is 9 to 13 and the end-repair incubation time is 15 min; when gel electrophoresis of the genomic DNA shows a smeared band and the mass is ≥300 ng, the number of PCR amplification cycles is 9 to 13 and the end-repair incubation time is 10 min; when gel electrophoresis of the genomic DNA shows that the main band has disappeared and the low-molecular-weight region is enriched, and the mass is ≥500 ng, the number of PCR amplification cycles is 11 to 15 and the end-repair incubation time is 0 min; when the genomic DNA mass is <300 ng, the number of PCR amplification cycles is 15 to 19 and end repair uses a gradient incubation of 0 to 10 min; the mammalian sample includes one or more of mammalian tissue, mammalian blood, and mammalian feces; and the incubation temperature is 20°C.

8. The method according to claim 5, characterized in that: the total DNA mass of the fragmented whole-genome library is ≤6 μg; the fragmented whole-genome library includes the fragmented whole-genome libraries of 4 to 16 mammalian samples; when the fragmented whole-genome library is made from ≤8 mammalian samples, the DNA mass of the fragmented whole-genome library of each mammalian sample is 500 ng; and when the fragmented whole-genome library is made from more than 8 mammalian samples, the DNA mass of the fragmented whole-genome library of each mammalian sample is 350 ng.

9. The method according to any one of claims 4 to 8, characterized in that, The method further includes an accuracy verification step: performing COI amplification sequencing on the genomic DNA to obtain the COI sequence, and assessing the consistency between the mammalian mitochondrial genome sequence and the COI sequence.

10. The use according to claim 3 or the method according to any one of claims 4 to 9, characterized in that the mammals include one or more of the following: rodent mammals, lagomorph mammals, perissodactyl mammals, carnivorous mammals, cetacean mammals, primate mammals, plesiosaur mammals, manatee mammals, and proboscis mammals.