A Method for Analyzing High-Throughput Sequencing Gene Expression Levels Using Text Alignment

A method for analyzing gene expression levels from high-throughput sequencing, applied in the field of bioinformatics. It addresses the problem of large differences between the results of existing methods, and achieves a reduced workload, a simple procedure, and simple, fast splicing.

Active Publication Date: 2022-01-25

FOSHAN UNIVERSITY


Problems solved by technology

[0003] At present, existing methods for determining gene expression levels from high-throughput sequencing include CLC, Trinity, SOAP, Oases, ABySS, NextGENe, TopHat, RSEM, eXpress, Sailfish, kallisto, NURD, etc. These methods are still being improved. Each has its own characteristics and algorithmic principles, and the results obtained by different methods differ markedly (even the same algorithm gives very different results under different parameter settings). It therefore remains necessary to develop a method suited to analyzing gene expression levels from high-throughput sequencing.

Examples

Embodiment 1

[0062] Example 1 Camellia high-throughput sequencing

[0063] A sequencing company performed the sequencing of fully developed leaves and of petals from unopened buds, both taken from camellia flowering branches during the flowering period. The service covered total RNA extraction, library construction, and paired-end sequencing (Paired-End, Illumina HiSeq 4000). The sequences were delivered in FASTQ format: 6G of high-quality data (clean data) and 7G of unprocessed raw sequencing data (raw data). Each sequence is a 150mer, and merging the two ends gives about 50 million sequencing sequences for each sample.

Embodiment 2

[0064] Example 2 Sequencing sequences are numbered, broken up, and randomly combined

[0065] From the high-throughput sequencing data obtained in Example 1, extract the reads and keep only the sequences. Each sequence is numbered. There are 50 million sequences in total (about 25 million from each end of the paired-end sequencing); the sequences from one end, from the first to the 25 millionth, receive the serial numbers 00000001-25000000. The sequence documents are then cut, randomly sorted, and randomly merged in stages (every 100,000 → 1 million → 50,000) until all sequences are merged into one document. In this process, the documents cut at 1 million sequences each are divided into several directories, then randomly sorted every 10,000 sequences and randomly merged to give 1 million sequences, and the documents obtained from all directories are rando...
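The numbering and staged shuffling described in [0065] could be sketched roughly as follows. This is a minimal illustration in Python, not the patent's own command-line procedure: the function names, the chunk size, and the tiny in-memory FASTQ sample are assumptions made for the example, and a real run on 50 million sequences would work on files rather than on lists.

```python
import random

def extract_sequences(fastq_lines):
    """Keep only the sequence line (the 2nd of every 4) from FASTQ records."""
    return [line.strip() for i, line in enumerate(fastq_lines) if i % 4 == 1]

def number_sequences(seqs):
    """Attach 8-digit serial numbers starting at 00000001."""
    return [(f"{i:08d}", s) for i, s in enumerate(seqs, start=1)]

def chunked_shuffle(items, chunk_size, rng):
    """Cut into chunks, shuffle inside each chunk, then merge the chunks in random order."""
    chunks = [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]
    for chunk in chunks:
        rng.shuffle(chunk)
    rng.shuffle(chunks)
    return [item for chunk in chunks for item in chunk]

# Hypothetical three-read FASTQ sample (header, sequence, separator, quality).
fastq = [
    "@r1", "ACGT", "+", "IIII",
    "@r2", "TTGA", "+", "IIII",
    "@r3", "GGCC", "+", "IIII",
]

numbered = number_sequences(extract_sequences(fastq))
shuffled = chunked_shuffle(numbered, chunk_size=2, rng=random.Random(0))
```

Calling `chunked_shuffle` repeatedly with different chunk sizes would approximate the staged cutting and merging, although the exact sequence of stages in the source is truncated and is not reproduced here.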

Embodiment 3

[0066] Example 3 Among 1 million sequences, 100,000 are selected as query sequences for comparison, and the expression level of each query sequence is obtained

[0067] Randomly take 1,000,000 of the shuffled sequences from Example 2, divide them into groups of 100,000 sequences, and randomly select one group of 100,000 as the query sequences.

[0068] In the above 100,000 query sequences, for each sequence, perform the following operations:

[0069] 1. Take 20 consecutive nucleotides (a 20mer) every 5 nucleotides, so that each query sequence is divided into 27 short 20mer sequences;

[0070] 2. In each query sequence, randomly select 9 of the 20mer short sequences;

[0071] 3. At least 9 randomly selected 20mer short sequences are matched against the 1 million sequences. At the same time, the complementary strands of these 20mer short sequences are also matched against the 1 million sequences, and the matching ...
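Steps 1 to 3 can be illustrated with a short sketch. It is written in Python for readability, whereas the patent describes command-line text matching, and the function names are hypothetical. Two assumptions are made because the source is truncated at this point: the "complementary strand" is taken to be the reverse complement, and a sequence in the pool counts as a hit if it contains any selected 20mer or the reverse complement of one.

```python
import random

# Assumption: "complementary strand" is interpreted as the reverse complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]

def tile_kmers(seq, k=20, step=5):
    """Take a k-mer every `step` nucleotides along the sequence."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, step)]

def pick_probes(seq, n=9, rng=None):
    """Randomly select n of the tiled 20mers from one query sequence."""
    rng = rng or random.Random()
    return rng.sample(tile_kmers(seq), n)

def find_hits(probes, pool):
    """Indices of pool sequences containing any probe or its reverse complement (assumed rule)."""
    patterns = set(probes) | {reverse_complement(p) for p in probes}
    return [i for i, read in enumerate(pool) if any(p in read for p in patterns)]

# Hypothetical data: one random 150mer query and a three-sequence pool.
rng = random.Random(0)
query = "".join(rng.choice("ACGT") for _ in range(150))
pool = [query, reverse_complement(query), "A" * 150]

probes = pick_probes(query, rng=rng)
hits = find_hits(probes, pool)
```

For a 150mer, `tile_kmers` starts windows at positions 0, 5, ..., 130, which gives the 27 short sequences mentioned in step 1. Plain substring search stands in for the text matching here; at the scale of 1 million sequences an indexed or streaming approach would be needed.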

Abstract

The invention belongs to the field of bioinformatics and provides a method for analyzing gene expression from high-throughput sequencing sequences. First, the sequencing sequences are numbered, broken up, and randomly recombined; 100,000 sequences are then selected as query sequences, and each is compared with 1 million sequences. For the comparison, nine 20mers are randomly selected from each query sequence, and the number of transcripts of the sequence is obtained after deduplicating the matches among the 1 million sequences. The first and last 20mers of the query sequence are used to assemble contigs from the matched sequences. The expression level of a spliced sequence is obtained by merging the expression levels of all of its query sequence groups, and the same applies to the expression level of the negative strand obtained by alignment with the complementary strand. This method can be used effectively for analyzing gene expression from high-throughput sequencing and for de novo sequence assembly.

Description

Technical field

[0001] The invention belongs to the field of bioinformatics. It relates to a method that uses the command line of an open-source operating system for text matching: short nucleotide sequences obtained by high-throughput sequencing are compared for similarity, and the matched contiguous sequence groups are spliced, in order to analyze gene expression levels in individual tissues of organisms.

Background art

[0002] High-throughput sequencing technology sequences millions of DNA molecules simultaneously, making it possible to analyze the transcriptome and genome of a species or sample in detail and comprehensively. At present, commonly used high-throughput sequencing technologies mainly include Roche/454, ABI/SOLiD sequencing technology, Illumina/Solexa sequencing technology, single-molecule sequencing technology, and Ion Torrent sequencing technology. RNA-Seq high-throughput sequencing, also known as transcriptome sequencing, is ...

Application Information

Patent Type & Authority: Patents (China)

IPC(8): G16B30/10

CPC: G16B30/00

Inventor: 宋东光

Owner: FOSHAN UNIVERSITY
