[0023] In the following, the structure and working principle of the present invention will be further described in conjunction with the accompanying drawings.
[0024] As shown in Figure 1, a susceptible genotype detection method of the present invention includes the following steps:
[0025] S1. Collect the sample to be tested, capture the sequence of the exon region of the sample to be tested, and form the original sequencing data.
[0026] S2. Perform quality testing on each sequence in the original sequencing data one by one, and obtain sequences that meet the quality requirements based on the results of the quality testing to form preliminary adjustment data.
[0027] In the embodiment of the present invention, any one of the following bioinformatics software can be used for quality detection of the gene sequence: FastQC, Cutadapt, FASTX-Toolkit, bbmap.
[0028] Specifically, the quality of each sequence in the original sequencing data can be checked by calling bioinformatics software, and appropriate screening and trimming can be performed, including: removing sequences in which the proportion of N bases is higher than 5%; and removing sequences in which the proportion of bases with a quality value lower than 30 is higher than 20%, so that the quality values of the remaining sequences meet the requirements. In the embodiment of the present invention, the sequences that meet the quality requirements are those that contain no more than 5% N and in which at least 80% of the bases have a quality value of 30 or more (Q30), and the preliminary adjustment data is composed only of sequences that meet the quality requirements.
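As a non-limiting illustration, the screening rule described above can be sketched in Python as follows. The sketch assumes that each read is available as a base string together with a Phred+33 encoded quality string (the usual FASTQ convention); the function names `passes_quality` and `filter_reads` are hypothetical and are not part of any of the software packages named above.

```python
def passes_quality(seq, qual, max_n_frac=0.05, min_q=30,
                   max_low_q_frac=0.20, phred_offset=33):
    """Return True if a read satisfies both thresholds described in the text.

    A read is rejected when more than 5% of its bases are N, or when more
    than 20% of its bases have a quality value below 30.
    """
    if not seq or len(seq) != len(qual):
        return False  # empty or malformed record
    n_frac = seq.upper().count("N") / len(seq)
    # Decode each quality character to a Phred score (Phred+33 is assumed).
    low_q = sum(1 for c in qual if ord(c) - phred_offset < min_q)
    low_q_frac = low_q / len(qual)
    return n_frac <= max_n_frac and low_q_frac <= max_low_q_frac


def filter_reads(records):
    """Keep only the (name, seq, qual) tuples that pass the quality check."""
    return [r for r in records if passes_quality(r[1], r[2])]
```

In this sketch a read that sits exactly on a threshold (5% N, or 20% low-quality bases) is retained, since the text removes only reads that are higher than those values.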
[0029] In this embodiment of the present invention, before quality testing is performed on each sequence in the original sequencing data, the sequence linker (adapter) of each sequence should be removed, where the sequence linker is the sample tag of each sequence.
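By way of example only, linker removal of the kind described above can be sketched as an exact-match trim at the 3' end of a read. The sketch assumes the linker sequence is known in advance and passed in by the caller; the function name `trim_adapter` and the `min_overlap` parameter are hypothetical, and dedicated tools such as Cutadapt additionally tolerate mismatches, which this sketch does not.

```python
def trim_adapter(seq, qual, adapter, min_overlap=3):
    """Remove a 3' linker (exact match) and everything that follows it.

    If the full linker is not found, a partial linker of at least
    `min_overlap` bases at the very end of the read is also removed.
    Returns the trimmed (seq, qual) pair.
    """
    idx = seq.find(adapter)
    if idx == -1:
        # Look for the longest linker prefix that the read ends with.
        for k in range(len(adapter) - 1, min_overlap - 1, -1):
            if seq.endswith(adapter[:k]):
                idx = len(seq) - k
                break
    if idx == -1:
        return seq, qual  # no linker detected, read is unchanged
    return seq[:idx], qual[:idx]
```

The quality string is trimmed at the same position as the bases so that the two stay the same length for the subsequent quality testing.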
[0030] S3. Compare each sequence in the preliminary adjustment data with the reference genome sequence to obtain the comparison result to form the mutation detection data.
[0031] In the embodiment of the present invention, to compare each sequence in the preliminary adjustment data with the reference genome sequence, any one of the following bioinformatics software can be called: BWA, Samtools, Picard, GATK, Qualimap, IGV, R.
[0032] Specifically, in order to obtain more accurate comparison results, each sequence in the preliminary adjustment data is compared (aligned) with the reference genome sequence. According to the comparison result of each sequence with the reference genome sequence, repeated sequences in the preliminary adjustment data are deleted to prevent redundant data from appearing. The base quality scores of the remaining, non-repeated sequences are then recalibrated to obtain the final mutation detection data, which serves as the input for the mutation detection analysis step.
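As an illustrative sketch only, the deletion of repeated sequences after comparison can be expressed as keeping one read per alignment position. The sketch assumes that each compared read is summarized by its chromosome, leftmost position, strand, and mean base quality, and that the highest-quality read of each group is retained; the `AlignedRead` type and `remove_duplicates` function are hypothetical simplifications and do not reproduce the behavior of Picard or GATK. Base quality score recalibration is not shown.

```python
from typing import NamedTuple


class AlignedRead(NamedTuple):
    name: str
    chrom: str
    pos: int          # leftmost position on the reference
    strand: str       # "+" or "-"
    mean_qual: float  # mean base quality of the read


def remove_duplicates(reads):
    """Keep the highest-quality read for each (chrom, pos, strand) key."""
    best = {}
    for read in reads:
        key = (read.chrom, read.pos, read.strand)
        if key not in best or read.mean_qual > best[key].mean_qual:
            best[key] = read
    # Return the surviving reads in coordinate order.
    return sorted(best.values(), key=lambda r: (r.chrom, r.pos, r.strand))
```

For example, if two reads start at the same position on the same strand, only the one with the higher mean quality is passed on to mutation detection.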
[0033] S4. Perform mutation detection analysis on each sequence in the mutation detection data, and determine the gene sequence with the mutation gene and the mutation site corresponding to the mutation gene in the mutation detection data.
[0034] In the embodiment of the present invention, any one of the following bioinformatics software can be used to perform mutation detection and analysis on each sequence in the mutation detection data: GATK, BEDTools, VCFtools, bcftools. The mutation detection and analysis can include single nucleotide polymorphism detection, insertion and deletion detection, and copy number variation detection.
[0035] Specifically, the corresponding bioinformatics software can be called to perform single nucleotide polymorphism detection, insertion and deletion detection, and copy number variation detection on each sequence in the mutation detection data, so as to find the gene mutation sites and, at the same time, determine the type of each mutation.
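As a minimal sketch of how the mutation type may be determined once a mutation site has been found, the reference and alternate alleles at the site can be compared by length. The sketch assumes the alleles are given as plain base strings, as in the REF and ALT columns of a VCF record; the function name `classify_variant` is hypothetical, and copy number variation detection, which relies on read depth rather than on allele strings, is outside the scope of this sketch.

```python
def classify_variant(ref, alt):
    """Classify a mutation from its reference and alternate alleles."""
    if ref == alt:
        return "no_change"
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"            # single nucleotide polymorphism
    if len(ref) < len(alt):
        return "insertion"
    if len(ref) > len(alt):
        return "deletion"
    return "substitution"       # equal-length change of several bases
```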
[0036] S5. Perform functional annotation of the mutation site and determine whether the sample to be tested contains the susceptible genotype to be tested.
[0037] In the embodiment of the present invention, any one of the following bioinformatics software can be used for functional annotation of the mutation site: Annovar, SnpEff, SnpSift. The annotation databases mainly include: refGene, cytoBand, gwasCatalog, clinvar, dbsnp138, etc.
[0038] Specifically, by calling the bioinformatics software, functional annotation can be performed on the gene region, intergenic region, and untranslated region of the mutation site. If the result of the functional annotation matches the susceptible genotype to be detected, it is determined that the sample to be tested contains the susceptible genotype mutation site to be tested.
[0039] In the embodiment of the present invention, the susceptible genotype to be detected is the susceptible genotype of type I neurofibromatosis, including the genotype AA at rs1801052 and the genotype AA at rs1129506 in the NF1 gene. That is, when the sample to be tested is found to contain either of these two genotypes, it can be considered that a high-risk mutation site of type I neurofibromatosis is present in the sample to be tested, and the patient to be tested may belong to the high-risk group for type I neurofibromatosis.
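The determination described above can be sketched, for illustration only, as a lookup of the called genotypes against the two susceptible genotypes. The sketch assumes the sample's genotype calls are available as a mapping from rsID to a two-letter genotype string; the names `SUSCEPTIBLE_GENOTYPES`, `find_susceptible`, and `is_high_risk` are hypothetical.

```python
# Susceptible genotypes named in the text (both in the NF1 gene).
SUSCEPTIBLE_GENOTYPES = {"rs1801052": "AA", "rs1129506": "AA"}


def find_susceptible(genotypes):
    """Return the rsIDs whose called genotype matches a susceptible one."""
    hits = []
    for rsid, risk in SUSCEPTIBLE_GENOTYPES.items():
        called = genotypes.get(rsid)
        # Compare as unordered allele pairs so "AG" and "GA" are treated alike.
        if called is not None and sorted(called.upper()) == sorted(risk):
            hits.append(rsid)
    return hits


def is_high_risk(genotypes):
    """True when the sample contains either of the susceptible genotypes."""
    return bool(find_susceptible(genotypes))
```

A site that is missing from the call set is treated as not matching, so a sample with no call at either site is not flagged.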
[0040] In another embodiment of the present invention, when it is determined that the sample to be tested contains the susceptible genotype to be tested, the test result of the sample to be tested can also be verified based on the whole-exome sequences of the patient's immediate family members. Testing the gene sequences of immediate family members can therefore improve the accuracy of the test results from a genetic perspective.
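One possible form of such verification, given here only as an assumption since the text does not specify the procedure, is a check that the patient's genotype at a site is consistent with inheritance from the parents. The sketch assumes two-letter genotype strings and allows either parent to be absent; the function name `mendelian_consistent` is hypothetical.

```python
def mendelian_consistent(child, father=None, mother=None):
    """True if the child's two alleles can be drawn one from each parent.

    A parent given as None is treated as untested and places no constraint.
    """
    a, b = child  # the child's genotype must be a two-letter string

    def can_give(parent, allele):
        return parent is None or allele in parent

    return ((can_give(father, a) and can_give(mother, b))
            or (can_give(father, b) and can_give(mother, a)))
```

An inconsistent result does not by itself invalidate the patient's call; it may indicate a genotyping error in one of the samples or a newly arisen mutation, and would typically prompt a manual review.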
[0041] In another embodiment of the present invention, it is also possible to perform additional detection on gene sequences with mutant genes that have been functionally annotated to obtain one or more related detection results of mutation harmfulness, candidate genes, and protein mutations. Specifically, according to the user's requirements, the corresponding bioinformatics software can be called to complete the corresponding analysis and output the corresponding results.
[0042] The additional detection includes at least:
[0043] Mutation harmfulness analysis, which ranks the mutations by harmfulness according to the influence of each mutation on gene function;
[0044] Candidate gene and disease relevance ranking, which relies on the results of database annotation to evaluate the impact of each mutation on the corresponding disease, especially the impact on NF1 disease;
[0045] Candidate gene function annotation;
[0046] Candidate gene function enrichment analysis, which completes the detection and screening of candidate genes by calling the function annotations and the built-in script library, and at the same time reconstructs the biological function pathways of the selected high-confidence genes;
[0047] Protein mutation prediction, which predicts the impact on the three-dimensional structure of the protein from the predicted changes in its primary structure.
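For the candidate gene function enrichment analysis listed above, the text does not name a statistical method. One commonly used choice, shown here purely as an assumption, is the one-sided hypergeometric test, which asks how likely it is to see at least the observed number of candidate genes in a pathway by chance. The function name `enrichment_p_value` and its parameters are hypothetical.

```python
from math import comb


def enrichment_p_value(total_genes, pathway_genes, candidate_genes, overlap):
    """P(X >= overlap) under the hypergeometric distribution.

    total_genes     -- number of annotated genes in the background
    pathway_genes   -- number of background genes in the pathway
    candidate_genes -- number of candidate genes selected
    overlap         -- number of candidate genes that fall in the pathway
    """
    upper = min(pathway_genes, candidate_genes)
    favourable = sum(
        comb(pathway_genes, k)
        * comb(total_genes - pathway_genes, candidate_genes - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / comb(total_genes, candidate_genes)
```

A small p-value suggests that the pathway is over-represented among the candidate genes; when many pathways are tested, a multiple-testing correction would normally be applied, which this sketch omits.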
[0048] In another embodiment of the present invention, according to the results of steps S2-S5, a quality data report, a comparison data report, a mutation data report, and a mutation function evaluation report can also be summarized separately, and a susceptible genotype test result report can be output according to the quality data report, the comparison data report, the mutation data report, and the mutation function evaluation report.
[0049] While the quality of each sequence in the original sequencing data is tested, it is evaluated whether the quality of the trimmed data meets the requirements of the subsequent analysis process, so as to obtain the quality data report. The quality data report mainly includes the base quality distribution of the sequenced fragments, the distribution of the four bases in the sequenced fragments, the GC content of the sequenced fragments, etc. While each sequence in the preliminary adjustment data is compared with the reference genome sequence, the corresponding sequencing analysis results can be obtained, so as to obtain the comparison data report. The comparison data report includes statistics on the comparison rate, the exon coverage depth and distribution, an evaluation of the capture specificity of the exon regions, statistics on the insert size distribution, etc., which are used to evaluate the results of the sequencing experiment and which bear on the reliability of the subsequent mutation site detection results. After mutation detection and analysis is performed on each sequence in the mutation detection data, the gene mutation sites found can be counted and the corresponding Venn diagram can be drawn, so as to obtain the mutation data report. After the mutation sites are functionally annotated, the mutation function evaluation report is obtained according to the result of the functional annotation.
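By way of illustration, some of the report statistics named above (the four-base content distribution, the GC content, and the comparison rate) can be computed as in the following sketch. It assumes the reads are given as plain base strings and that the numbers of compared and total reads are already known; the function names are hypothetical and do not correspond to the output format of FastQC or Qualimap.

```python
from collections import Counter


def base_composition(reads):
    """Fraction of each of A, C, G, T and N over all bases in the reads."""
    counts = Counter()
    for seq in reads:
        counts.update(seq.upper())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {base: counts.get(base, 0) / total for base in "ACGTN"}


def gc_content(reads):
    """Fraction of bases that are G or C."""
    comp = base_composition(reads)
    return comp.get("G", 0.0) + comp.get("C", 0.0)


def comparison_rate(mapped_reads, total_reads):
    """Fraction of reads that were successfully compared to the reference."""
    return mapped_reads / total_reads if total_reads else 0.0
```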
[0050] Finally, the quality data report, comparison data report, mutation data report, and mutation function evaluation report are integrated into written reports and data files, a susceptible genotype test result report with professional annotations is output, and the final analysis result is displayed to the user. Data storage and backup can also be performed.
[0051] According to an embodiment of the present invention, the susceptible genotype detection method of the present invention can be used to detect whether a patient has the susceptible genotype of type I neurofibromatosis. The detection method mainly includes steps such as genomic DNA sample preparation, library construction, quality inspection, comparison analysis, mutation detection, function annotation, advanced information analysis, and analysis report output.
[0052] First, the genomic DNA of the sample to be tested is randomly broken into fragments of 150 to 200 bp to prepare multiple sequences of the sample to be tested, and a library is then constructed from the prepared sequences. The library sequences are hybridized with biotin-labeled DNA probes specific to the exon regions and captured by magnetic beads with capture probe function, and finally the captured sequences are eluted from the magnetic beads to obtain sequence fragments of the target region. For the specific capture process, refer to: SureSelect XT Target Enrichment System for Illumina Paired-End Sequencing Library, Illumina HiSeq and MiSeq Multiplexed Sequencing Platforms, Protocol Version 1.3.1, February 2012.
[0053] Because the susceptible genotype can serve as strong evidence for disease diagnosis at the genetic level, its detection can be used as a prenatal screening method to achieve the purpose of disease prevention. Therefore, after the exome sequences of the sample to be tested are prepared, susceptible genotype detection for type I neurofibromatosis can be performed according to the susceptible genotype detection method of the present invention, completing quality inspection, comparison analysis, mutation detection, function annotation, advanced information analysis, and analysis report output. Specifically, the original sequencing data undergoes data quality testing to generate a quality data report; at the same time, the sequence linkers are removed, sequences in which N is higher than 5% are filtered out, and sequences in which more than 20% of the bases have a quality value lower than 30 are filtered out. The preliminary adjustment data obtained after the quality inspection process is compared with the reference genome sequence, and a comparison data report is generated. Mutation detection analysis is performed on the mutation detection data obtained after the comparison analysis, covering single nucleotide polymorphisms, insertions and deletions, and copy number variations. The detected mutation sites are functionally annotated to evaluate the function of the mutation sites. In the patient's NF1 gene, the genotype of rs1801052 was identified as AA and the genotype of rs1129506 as AA. It is therefore considered that a high-risk mutation site of type I neurofibromatosis is present in the sample to be tested, and that the patient is likely to have type I neurofibromatosis. Finally, the result is verified by referring to the whole-exome sequencing data of the father and/or mother to increase the accuracy and reliability of the test results.
[0054] The above is only a schematic description of the present invention. Those skilled in the art should know that various improvements can be made to the present invention without departing from the working principle of the present invention, which all fall within the protection scope of the present invention.