The invention discloses a data classification method for single cell sequencing

A data classification technology for single-cell sequencing, applied in sequence analysis, instruments, computation, and related fields. It addresses the lack of an existing classification method and achieves fast, efficient classification.

Active Publication Date: 2019-04-19

HAINAN UNIVERSITY

Cites: 9. Cited by: 2.

AI Technical Summary
Problems solved by technology

[0003] In summary, the problem with the existing technology is that there is currently no dedicated method for classifying SPLiT-seq single-cell sequencing data.

Examples

Embodiment 1

[0067] 1. Classification and extraction of Read2.fastq data

[0068] As shown in Figure 3, Read2 is divided into 5 parts: the UMI, 3 rounds of tags, and the cDNA. The UMI and the 3 rounds of tags serve as identifiers for distinguishing different cell sources, and the cDNA is the sequence information that is ultimately extracted.

[0069] (1) First, extract the 3 rounds of barcodes from the sequence. The specific method is to find the position of the characteristic sequence in the read and then shift forward 8 bases to extract the corresponding barcode. The search for the characteristic sequence uses the K-mer method and includes a fault-tolerance mechanism. After the 3 rounds of barcodes are extracted, they are converted into 3 sets of numbers through the Barcode Table, and these numbers are used together as a unique identifier for a cell. The UMI is then appended to this identifier.
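
The extraction step in [0069] can be illustrated with a minimal Python sketch. It is not the patented implementation. The fault-tolerant search is done here with a plain Hamming-distance scan, which stands in for the K-mer method named in the text. The barcode is assumed to be the 8 bases immediately before the characteristic sequence, which is one reading of "shift forward 8 bases". The function names, the one-mismatch tolerance, and the sequences used are all hypothetical.

```python
from typing import Dict, Optional


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_feature(read: str, feature: str, max_mismatch: int = 1) -> Optional[int]:
    """Return the leftmost start of `feature` in `read`, tolerating up to
    `max_mismatch` substitutions; return None if there is no such window."""
    n = len(feature)
    for start in range(len(read) - n + 1):
        if hamming(read[start:start + n], feature) <= max_mismatch:
            return start
    return None


def extract_barcode(read: str, feature: str, barcode_len: int = 8,
                    max_mismatch: int = 1) -> Optional[str]:
    """Assumption: the barcode is the `barcode_len` bases directly before
    the characteristic sequence."""
    pos = find_feature(read, feature, max_mismatch)
    if pos is None or pos < barcode_len:
        return None
    return read[pos - barcode_len:pos]


def barcode_to_id(barcode: str, table: Dict[str, int],
                  max_mismatch: int = 1) -> Optional[int]:
    """Map a barcode to its number; accept a near match only if it is unique."""
    if barcode in table:
        return table[barcode]
    hits = [idx for known, idx in table.items()
            if hamming(known, barcode) <= max_mismatch]
    return hits[0] if len(hits) == 1 else None


# Hypothetical sequences, for illustration only.
FEATURE = "GACGACAG"
TABLE = {"ACGTTGCA": 1, "TTGGCCAA": 2}
read_with_error = "ACGTTGCA" + "GACGTCAG" + "TTTT"  # one substitution in the feature
barcode = extract_barcode(read_with_error, FEATURE)
cell_round_id = barcode_to_id(barcode, TABLE) if barcode else None
```

In this sketch a read whose characteristic sequence carries a single substitution still yields its barcode, and a barcode with one error is still mapped to a number as long as exactly one table entry is within the tolerance.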

[0070] After the feature sequence is obtained, the barc...

Embodiment 2

[0103] Step 1, load the actual data and related files:

[0104] 2 actual data files:

[0105] R1.fastq

[0106] R2.fastq

[0107] Three rounds of barcode information files:

[0108] BarcodeList

[0109] Feature information:

[0110] PrimerList

[0111] Step 2: Generate the corresponding tables from BarcodeList and PrimerList to speed up the query process. Three tables are generated from the three rounds of information in BarcodeList, as shown in Figure 6.

[0112] PrimerTable is generated from PrimerList, as shown in Figure 7.

[0113] PrimerTable is an array of linked lists generated from the input text file PrimerList. The data in PrimerList is treated as one long sequence. Fragments of length k are taken starting from the beginning of the sequence, advancing by 1 base each time, mainly to record the positions at which each subsequence appears in the whole sequence.

[0114] Among them, each fragment is converted once, and it is regarded as a 4-ary...
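
The table construction described in [0113] and [0114] can be sketched roughly as follows. This is an illustration under stated assumptions, not the patent's code: each fragment of length k is read as a base-4 number and used as an index into an array, and each array slot holds the list of positions where that fragment occurs. The digit assignment A=0, C=1, G=2, T=3, the use of Python lists in place of linked lists, and the function names are assumptions; the truncated source text does not confirm these details.

```python
from typing import List

# Assumed digit assignment for the base-4 conversion.
BASE_VALUE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_to_index(kmer: str) -> int:
    """Interpret a k-mer as a base-4 number (raises KeyError on non-ACGT bases)."""
    index = 0
    for base in kmer:
        index = index * 4 + BASE_VALUE[base]
    return index


def build_position_table(sequence: str, k: int) -> List[List[int]]:
    """Slide a window of length k over `sequence`, one base at a time, and
    record each window's start position in the slot given by its base-4 index."""
    table: List[List[int]] = [[] for _ in range(4 ** k)]
    for pos in range(len(sequence) - k + 1):
        table[kmer_to_index(sequence[pos:pos + k])].append(pos)
    return table


# Hypothetical input, for illustration only.
table = build_position_table("ACGTAC", 2)
positions_of_AC = table[kmer_to_index("AC")]
```

Looking up a fragment then costs one index computation and one array access, which is consistent with the stated goal of speeding up queries. The array has 4^k slots, so this layout is only practical for small k.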

Abstract

The invention belongs to the technical field of bioinformatics analysis and discloses a data classification method for single-cell sequencing. The method involves an information identification module for the first sequence (Read1.fastq); an information identification module for the second sequence (Read2.fastq); a barcode list information loading (barcodeList) module; and a primer information loading (primerList) module. The invention mainly classifies data from the single-cell sequencing technology SPLiT-seq and fully considers barcode information during classification. It is the first data classification method for single-cell SPLiT-seq technology. A fault-tolerant comparison mechanism is added for the barcodes and the feature sequences, and a base conversion function converts characters into numbers for computation, so that classification of single-cell sequencing data is faster and more efficient.

Description

Technical field

[0001] The invention belongs to the technical field of bioinformatics, and in particular relates to a data classification method for single-cell sequencing.

Background

[0002] High-throughput sequencing technology (Next generation sequencing, NGS) is one of the important technologies in the field of life science research. In recent years, life science research based on high-throughput sequencing technology has been widely applied at the level of individuals, tissues, and other groups, for example human whole genome sequencing (Whole genome sequencing, WGS) and transcriptome sequencing (RNA sequencing, RNA-seq). Because cellular heterogeneity is widespread in multicellular tissues, that is, cells with the same phenotype may differ significantly in genetic information such as their genomes and transcriptomes, it is necessary to analyze and study organisms at the single-cell level. Although there are some early single-cell research methods...

Application Information

IPC(8): G16B30/00, G06K9/62

CPC: G06F18/24

Inventors: 谢尚潜, 刘宇枭, 林加论, 邢剑锋

Owner: HAINAN UNIVERSITY
