Sequencing data processing system and SMN (survival motor neuron) gene detecting system
A sequencing data processing system, classified under electrical digital data processing, special data processing applications, and microbial determination/inspection. Stated benefits: reduced cost and improved ease of use.
Examples
Embodiment 1
[0068] A method for processing sequencing data, comprising the following steps:
[0069] S111: Obtain high-throughput sequencing data containing the SMN gene.
[0070] S112: Annotate all exons of the SMN2 gene in the reference genome (chromosome 5: 69344512-69373860, exons 1 to 7) as X, then use the BWA-MEM software to align the sequencing data against the annotated reference genome and obtain the matching sequences in the sequencing data.
[0071] S113: From the matching sequences, find all mutations on the SMN1 gene in the annotated reference genome. Combine these with the SMN1/SMN2 difference site (position 70247773 on chromosome 5, where SMN1 is C and SMN2 is T) to determine all mutation sites of the SMN gene in the sequencing data, and use a hidden Markov method to obtain the total copy number of exon 7 of the SMN gene. The hidden Markov method formula is as follows:
[0072]
[0073...
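The use of the SMN1/SMN2 difference site in step S113 can be illustrated with a minimal sketch. It only tallies the bases that reads show at that site (C supports SMN1, T supports SMN2, as stated above) and does not reproduce the hidden Markov copy-number calculation. The function names and the input list of base calls are hypothetical, chosen for illustration.

```python
from collections import Counter

# Bases at the SMN1/SMN2 difference site, as stated in step S113.
SMN1_BASE = "C"
SMN2_BASE = "T"


def tally_difference_site(bases):
    """Count read bases observed at the difference site.

    `bases` is a hypothetical iterable of single-character base calls,
    one per read covering the site. Calls other than C or T are counted
    as 'other' and are not assigned to either gene.
    """
    counts = Counter()
    for base in bases:
        base = base.upper()
        if base == SMN1_BASE:
            counts["SMN1"] += 1
        elif base == SMN2_BASE:
            counts["SMN2"] += 1
        else:
            counts["other"] += 1
    return counts


def smn1_fraction(counts):
    """Fraction of informative reads (C or T) that support SMN1."""
    informative = counts["SMN1"] + counts["SMN2"]
    if informative == 0:
        return None  # no read carries C or T at the site
    return counts["SMN1"] / informative


if __name__ == "__main__":
    # Toy base calls, not real data.
    observed = ["C", "T", "C", "c", "T", "N", "C"]
    counts = tally_difference_site(observed)
    print(dict(counts), smn1_fraction(counts))
```

In practice the base calls would come from the reads aligned over position 70247773; how those reads are extracted from the alignment is left open here.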
Embodiment 2
[0078] A computer simulation tested the effect of the reference genome annotation in Embodiment 1 on read positioning:
[0079] By annotating the SMN2 exon sequence in the reference genome as X, the sequencing reads of both the SMN1 and SMN2 genes were accurately mapped to SMN1. The positioning results are shown in Figure 1. The first row of Figure 1 shows exons 1-7 of SMN1, and the second row shows exons 1-7 of SMN2. The hollow box plots indicate mapping against the standard (unannotated) reference genome, recorded as the original reference genome (P). The dark solid box plots indicate mapping after SMN2 in the reference genome is annotated with X, recorded as the annotated reference genome (M). The abscissa shows four test data sets (SR1, SR2, SR3, and SR4, with 48 samples each), and the ordinate shows the number of uniquely mapped sequencing reads.
[0...
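The annotation tested in this embodiment, replacing the SMN2 exon sequence in the reference genome with X, can be sketched as a simple interval-masking function. This is a minimal illustration under stated assumptions: the function name `mask_intervals`, the toy sequence, and the toy intervals are hypothetical, and the individual exon coordinates are not given in the text, so none are filled in here. The mask character defaults to X as in the text; how a given aligner treats that character is not covered by this sketch.

```python
def mask_intervals(sequence, intervals, mask_char="X"):
    """Return `sequence` with each interval replaced by `mask_char`.

    `intervals` is a list of (start, end) pairs, 1-based and inclusive,
    in coordinates relative to `sequence` (an assumption of this sketch).
    """
    masked = list(sequence)
    for start, end in intervals:
        if start < 1 or end > len(sequence) or start > end:
            raise ValueError(f"interval out of range: {(start, end)}")
        # Overwrite the interval with the same number of mask characters,
        # so all downstream coordinates stay unchanged.
        masked[start - 1:end] = mask_char * (end - start + 1)
    return "".join(masked)


if __name__ == "__main__":
    # Toy sequence and toy "exon" intervals, not real SMN2 coordinates.
    toy_reference = "ACGTACGTACGT"
    toy_exons = [(3, 5), (9, 10)]
    print(mask_intervals(toy_reference, toy_exons))
```

Keeping the masked sequence the same length as the original preserves the coordinate system, so positions such as the difference site in SMN1 keep their reference coordinates.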
Embodiment 3
[0082] Sequence matching after reference genome annotation was compared between the control group (sequencing data without the SMN region) and the experimental group (sequencing data containing the SMN region). The detailed results are shown in Table 1 and Table 2.
[0083] Table 1 shows the control group: the DNA capture does not contain the SMN regions (that is, it contains neither SMN1 nor SMN2). Table 2 shows the experimental group: the DNA capture contains the SMN regions (that is, it contains both SMN1 and SMN2). The data in Table 1 and Table 2 show that, after reference genome annotation, sequencing reads that previously could not be uniquely matched were successfully matched to SMN1, and reads that were previously matched to SMN2 were also matched to SMN1, while other regions of the genome were little affected.
[0084] Table 1
[0085]
[0086] Table 2
[0087]
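The kind of before/after comparison described in this embodiment can be sketched as a cross-tabulation of where each read was matched with the original reference genome and with the annotated one. This is an illustrative sketch only: the function name, the read identifiers, and the labels ("SMN1", "SMN2", "other", "not_unique") are hypothetical, and the toy input does not reproduce the contents of Table 1 or Table 2.

```python
from collections import Counter


def tabulate_reassignments(before, after):
    """Count (before_label, after_label) pairs for reads in both mappings.

    `before` and `after` are hypothetical dicts mapping a read ID to the
    region it was matched to with the original and the annotated
    reference genome, respectively.
    """
    transitions = Counter()
    for read_id, old_label in before.items():
        if read_id in after:
            transitions[(old_label, after[read_id])] += 1
    return transitions


if __name__ == "__main__":
    # Toy assignments, not real results.
    before = {"r1": "not_unique", "r2": "SMN2", "r3": "SMN1", "r4": "other"}
    after = {"r1": "SMN1", "r2": "SMN1", "r3": "SMN1", "r4": "other"}
    for (old, new), n in sorted(tabulate_reassignments(before, after).items()):
        print(f"{old} -> {new}: {n}")
```

A table built this way makes the two effects named in the text easy to read off: the count of reads moving from "not_unique" or "SMN2" to "SMN1", and the count of reads in other regions whose label does not change.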


