Data classification method for single-cell sequencing

A data classification technology for single-cell sequencing, applied in fields such as sequence analysis, instruments, and computing. It addresses the lack of a classification method for SPLiT-seq data and achieves fast, efficient classification.

Active Publication Date: 2019-04-19
HAINAN UNIVERSITY

AI Technical Summary

Problems solved by technology

[0003] In summary, the problem with the existing technology is that there is currently no data classification method for SPLiT-seq single-cell sequencing.



Examples


Embodiment 1

[0067] 1. Classification and extraction of Read2.fastq data

[0068] Figure 3 shows the data content of Read2. Read2 is divided into 5 parts: the UMI, 3 rounds of tags (barcodes), and the cDNA. The UMI and the 3 rounds of tags serve as identifiers for classifying reads by cell of origin, and the cDNA is the sequence information that is ultimately extracted.

[0069] (1) First, extract the 3 rounds of barcodes from the sequence. The specific method is to find the position of the characteristic (feature) sequence in the read, then shift forward by 8 positions to extract the corresponding barcode. The search for the feature sequence uses the k-mer method and includes a fault-tolerant mechanism. After the 3 rounds of barcodes are extracted, each barcode is converted into a number through the Barcode Table, and the resulting 3 numbers are used together as a unique identifier that determines a cell. The UMI is then appended to this identifier.
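
The extraction step in [0069] could be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the function names (`find_feature`, `extract_barcode`, `cell_identifier`), the seed length `k=4`, the one-mismatch tolerance, the reading of "shift forward 8 positions" as "the 8 bases immediately before the feature sequence", and the underscore-joined identifier format are all hypothetical choices. The search is a simple seed-and-verify scheme: exact k-mer hits propose candidate positions, and a Hamming-distance check accepts a candidate with a limited number of mismatches.

```python
from typing import Dict, List, Optional


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_feature(read: str, feature: str, k: int = 4,
                 max_mismatch: int = 1) -> Optional[int]:
    """Locate `feature` in `read`, tolerating up to `max_mismatch` substitutions.

    Exact k-mer seeds from the feature propose candidate start positions;
    each candidate is then verified by Hamming distance (assumed scheme).
    """
    n = len(feature)
    candidates = set()
    for off in range(0, n - k + 1):
        seed = feature[off:off + k]
        hit = read.find(seed)
        while hit != -1:
            start = hit - off  # where the full feature would begin
            if 0 <= start <= len(read) - n:
                candidates.add(start)
            hit = read.find(seed, hit + 1)
    for start in sorted(candidates):
        if hamming(read[start:start + n], feature) <= max_mismatch:
            return start
    return None


def extract_barcode(read: str, feature: str, barcode_len: int = 8) -> Optional[str]:
    """Return the `barcode_len` bases immediately before the feature (assumed layout)."""
    pos = find_feature(read, feature)
    if pos is None or pos < barcode_len:
        return None
    return read[pos - barcode_len:pos]


def cell_identifier(barcodes: List[str], tables: List[Dict[str, int]],
                    umi: str) -> Optional[str]:
    """Map each round's barcode to its number, join them, and append the UMI."""
    numbers = []
    for barcode, table in zip(barcodes, tables):
        if barcode not in table:
            return None  # unknown barcode: the read cannot be assigned
        numbers.append(str(table[barcode]))
    return "_".join(numbers) + "_" + umi
```

Returning `None` for reads whose feature sequence or barcode cannot be resolved lets the caller decide whether to discard or log such reads.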

[0070] After the feature sequence is obtained, the barc...

Embodiment 2

[0103] Step 1, load the actual data and related files:

[0104] 2 actual data files:

[0105] R1.fastq

[0106] R2.fastq

[0107] Three rounds of barcode information files:

[0108] BarcodeList

[0109] Feature information:

[0110] PrimerList

[0111] Step 2: Generate the corresponding tables from BarcodeList and PrimerList to speed up the query process. Three tables are generated from the three rounds of information in BarcodeList, as shown in Figure 6:
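
One way such a per-round lookup table might be built is sketched below. The file format of BarcodeList is not given in the text, so the sketch simply takes a list of barcode strings for one round; the function name `build_barcode_table`, the 1-based numbering, and the strategy of precomputing every unambiguous single-substitution variant are assumptions made for illustration and are not confirmed by the source.

```python
from typing import Dict, List


def build_barcode_table(barcodes: List[str],
                        tolerate_one_mismatch: bool = True) -> Dict[str, int]:
    """Map each barcode (and, optionally, its one-substitution variants) to a number.

    Numbers are 1-based positions in the input list (assumed convention).
    A variant reachable from two different barcodes is ambiguous and is left out.
    """
    table = {bc: i for i, bc in enumerate(barcodes, start=1)}
    if not tolerate_one_mismatch:
        return table

    variants: Dict[str, int] = {}
    ambiguous = set()
    for i, bc in enumerate(barcodes, start=1):
        for pos in range(len(bc)):
            for base in "ACGT":
                if base == bc[pos]:
                    continue
                variant = bc[:pos] + base + bc[pos + 1:]
                if variant in table:
                    continue  # an exact barcode always takes priority
                if variant in variants and variants[variant] != i:
                    ambiguous.add(variant)
                variants[variant] = i

    for variant, i in variants.items():
        if variant not in ambiguous:
            table[variant] = i
    return table


# One table per barcoding round (hypothetical example barcodes).
round_tables = [
    build_barcode_table(["AAAAAAAA", "CCCCCCCC"]),
    build_barcode_table(["GGGGGGGG", "TTTTTTTT"]),
    build_barcode_table(["ACACACAC", "GTGTGTGT"]),
]
```

Precomputing the variants trades memory for speed: each query becomes a single dictionary lookup instead of a comparison against every barcode in the round.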

[0112] PrimerTable is generated from PrimerList, as shown in Figure 7:

[0113] PrimerTable is an array of linked lists generated from the input text file PrimerList. The data in PrimerList is treated as one long sequence. Starting from the beginning, a fragment of length k is taken each time and the window is shifted backward by 1 position, mainly in order to record the position at which each subsequence appears in the whole sequence.

[0114] Among them, each fragment is converted once, and it is regarded as a 4-ary...
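
The table construction described in [0113] and [0114] might look roughly like the sketch below. It is an assumption-laden illustration: the digit assignment A=0, C=1, G=2, T=3, the names `encode_kmer` and `build_primer_table`, and the choice to skip fragments containing other characters are not taken from the source. A Python list of lists stands in for the "linked list array": the base-4 value of a fragment is the array index, and the list at that index holds every position where the fragment occurs.

```python
from typing import List, Optional

# Assumed digit assignment for the base-4 conversion; the source does not state it.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_kmer(kmer: str) -> Optional[int]:
    """Read a fragment as a base-4 number; return None if it has a non-ACGT character."""
    value = 0
    for base in kmer:
        digit = BASE_CODE.get(base)
        if digit is None:
            return None
        value = value * 4 + digit
    return value


def build_primer_table(sequence: str, k: int) -> List[List[int]]:
    """Index every length-k fragment of `sequence` by its base-4 value.

    table[v] lists the start positions of the fragment whose encoded value is v.
    """
    table: List[List[int]] = [[] for _ in range(4 ** k)]
    for pos in range(len(sequence) - k + 1):  # window slides by 1 position
        value = encode_kmer(sequence[pos:pos + k])
        if value is not None:
            table[value].append(pos)
    return table


# Usage: look up where a fragment of a read occurs in the indexed sequence.
primer_table = build_primer_table("ACGACG", k=3)
positions = primer_table[encode_kmer("ACG")]
```

Because the table has 4^k slots, this layout is only practical for small k; a dictionary keyed by the encoded value would be an alternative for larger k.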



Abstract

The invention belongs to the technical field of bioinformatics analysis and discloses a data classification method for single-cell sequencing. The method comprises an information identification module for the first sequence (Read1.fastq); an information identification module for the second sequence (Read2.fastq); a barcode list information loading (BarcodeList) module; and a primer information loading (PrimerList) module. The invention mainly classifies data from the single-cell sequencing technology SPLiT-seq, fully takes barcode information into account during classification, and is the first data classification method for the single-cell SPLiT-seq technology. A fault-tolerant comparison mechanism is added for barcodes and feature sequences, and a base conversion function converts characters into numbers for computation, so that single-cell sequencing data is classified more quickly and efficiently.

Description

Technical field

[0001] The invention belongs to the technical field of bioinformatics, and in particular relates to a data classification method for single-cell sequencing.

Background

[0002] High-throughput sequencing (next-generation sequencing, NGS) is one of the important technologies in life science research. In recent years, life science research based on high-throughput sequencing has been widely applied at the level of individuals, tissues, and other populations, for example human whole genome sequencing (WGS) and transcriptome sequencing (RNA sequencing, RNA-seq). Because cellular heterogeneity is widespread in multicellular tissues, that is, cells with the same phenotype may differ significantly in genetic information such as their genomes and transcriptomes, organisms need to be analyzed and studied at the single-cell level. Although there are some early single-cell research methods...


Application Information

IPC(8): G16B30/00, G06K9/62
CPC: G06F18/24
Inventor: 谢尚潜, 刘宇枭, 林加论, 邢剑锋
Owner: HAINAN UNIVERSITY