A functional prediction method for single-nucleotide genomic variation in non-coding regions

A method for predicting the function of single-nucleotide variation, applied in the field of gene technology, with low time and economic cost

Active Publication Date: 2021-07-27
SOUTHEAST UNIV

AI Technical Summary

Problems solved by technology

[0006] At present there is no method that systematically identifies the binding sites of multiple transcription factors in a single pass and evaluates how nucleotide variations within those binding sites affect transcription factor binding and downstream gene transcription.


Examples


Embodiment 1

[0031] Embodiment 1: Accuracy experiment for the prediction method of the present invention: calculating the effect of a single nucleotide polymorphism (SNP) within a transcription factor binding site on transcription factor binding

[0032] The proto-oncogene c-MYC has a regulatory region at 8q24, about 335 kb away, that contains the SNP rs6983267 (genome assembly GRCh37.p13). In the East Asian sample of the 1000 Genomes Project (Phase3_V1-EAS, sample size 1008), the allele frequencies at this site on the positive strand are G = 0.388 for guanine and T = 0.612 for thymine. In the European sample (1000 Genomes, sample size 1006), the frequencies are G = 0.499 and T = 0.501. The DNA sequence containing the SNP ("ATGAAAGGC") is a binding site of the transcription factor TCF4, whose target gene is c-MYC.

[0033] In the European population, allele G has a slightly lower frequency (0.501) and allele T has a slightly higher frequency (0.49...

Embodiment 2

[0037] Embodiment 2: Identifying transcription factors common to 8 cell lines and predicting the function of genomic variation in transcription factor binding sites

[0038] 1. Data source:

[0039] The high-throughput sequencing data (DNase-Seq) for the 8 cell lines GM12878, IMR90, MCF-7, K562, BJ, H7, HepG2 and M059J were obtained from the Gene Expression Omnibus of the National Center for Biotechnology Information (GEO ID: GSE32970) (Nature 2012 Sep 6;489(7414):75-82). The cell types of the eight cell lines are listed in Table 1.

[0040] Table 1

[0041]

[0042] The sources of the expression data (RNA high-throughput sequencing, RNA-Seq) for the eight cell lines are shown in Table 1. The variation data were obtained from the UCSC Genome Browser (University of California, Santa Cruz): the simple variation data (dbSNP150) for the human genome assembly hg19 were downloaded with the Tables function. The SNP of a single base is selected according to the ...

Abstract

The invention discloses a method for predicting the function of single nucleotide genomic variation in non-coding regions, comprising the following steps: 1) identification of open chromatin regions; 2) identification of transcription factor binding sites; 3) evaluation of the effect of single nucleotide variations: based on the position-specific frequency matrix of each transcription factor, calculate the impact of single nucleotide variations located within its binding sites on transcription factor binding, and identify the variations that significantly change binding ability; then further assess the effect of these variations by examining the biological pathways of the transcription factors' target genes. The method uses open chromatin region information and gene expression information to identify multiple transcription factors and their binding sites in a single pass, and thereby provides functional annotation of genomic variation in non-coding regions.
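Step 3 of the abstract can be illustrated with a minimal Python sketch, which is not the patent's implementation. It assumes a log2-odds scoring scheme with a uniform background and a pseudocount, none of which the text specifies. The function names (`pfm_to_pwm`, `score_site`, `snv_effect`) and the 4-position matrix `TOY_PFM` are hypothetical and are not the TCF4 motif. The sketch scores a binding site with the reference allele and with the alternate allele, then reports the difference.

```python
import math

BASES = "ACGT"

def pfm_to_pwm(pfm, background=0.25, pseudocount=0.5):
    """Convert per-position base counts into log2-odds weights.

    The uniform background and the pseudocount are assumptions made
    for this sketch; the source does not state a scoring scheme.
    """
    pwm = []
    for counts in pfm:
        total = sum(counts[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((counts[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def score_site(pwm, site):
    """Sum the per-position weights of a site as long as the matrix."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal matrix length")
    return sum(column[base] for column, base in zip(pwm, site))

def snv_effect(pwm, site, offset, alt):
    """Return (reference score, alternate score, alternate minus reference)."""
    ref_score = score_site(pwm, site)
    alt_site = site[:offset] + alt + site[offset + 1:]
    alt_score = score_site(pwm, alt_site)
    return ref_score, alt_score, alt_score - ref_score

# Hypothetical 4-position count matrix, for illustration only.
TOY_PFM = [
    {"A": 18, "C": 1, "G": 0, "T": 1},
    {"A": 0, "C": 0, "G": 19, "T": 1},
    {"A": 5, "C": 5, "G": 5, "T": 5},
    {"A": 1, "C": 1, "G": 1, "T": 17},
]

if __name__ == "__main__":
    pwm = pfm_to_pwm(TOY_PFM)
    ref, alt, delta = snv_effect(pwm, "AGCT", offset=1, alt="T")
    print(f"ref={ref:.3f} alt={alt:.3f} delta={delta:.3f}")
```

A negative difference means the alternate allele fits the motif less well than the reference allele. Deciding which differences count as significant needs a threshold or a null distribution, which this sketch does not include.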

Description

Technical field

[0001] The invention belongs to the field of gene technology, and in particular relates to a method for predicting the function of single nucleotide genomic variation in non-coding regions. Based on high-throughput sequencing information of open chromatin regions, the invention identifies transcription factors (TFs) in eukaryotic genomes and their binding sites, and assesses the effect of single nucleotide variations based on the motifs with which transcription factors bind DNA.

Background

[0002] All biological functions and characteristics of cells are related to the transcriptional regulation of genes. Transcriptional regulation is cell-type specific and closely related to differentiation and carcinogenesis; understanding it is key to explaining cell behavior and to addressing cancer. To analyze the transcriptional regulation of genes, the first task is to identify the binding sites (TFBS) of various trans...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G16B20/20, G16B30/00
Inventor: 刘宏德, 孙啸, 罗坤, 马伟恒
Owner: SOUTHEAST UNIV