Primer design method and system based on k-mer algorithm

A primer design method and system based on the k-mer algorithm, applied in the field of primer design. It addresses the long design time of existing primer design methods, shortening design time while achieving high coverage.

Active Publication Date: 2020-06-23
RES CENT FOR ECO ENVIRONMENTAL SCI THE CHINESE ACAD OF SCI +1

AI Technical Summary

Problems solved by technology

[0005] In view of this, the purpose of the present invention is to propose a primer design method based on the k-mer algorithm that solves the long design time of existing primer design methods.


Examples

Embodiment 1

[0057] This embodiment is a primer design method based on the k-mer algorithm. The method includes:

[0058] (a) Construct a functional gene nucleic acid sequence database and, using the k-mer algorithm with k equal to the primer length (k is 17-20 bp), cut each nucleic acid sequence in the database into k-mers;

[0059] (b) Select the 120 k-mers with the highest occurrence frequency and use them as candidate primer k-mers;

[0060] (c) Merge the candidate primer k-mers whose overlap is greater than 10, then select the 40 most frequent k-mers from the merged candidates as preliminary primers. The merging rule is: among k-mers with overlap > 10, retain the k-mer with the highest frequency; if several k-mers share the highest frequency, keep the longest one;

[0061] (d) Search for k-mers with a difference of one base from each preliminary primer among the k-mers with cover...
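
Steps (a) and (b) above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names `count_kmers` and `top_kmers` are hypothetical, and "occurrence frequency" is assumed to mean the total number of occurrences across all database sequences.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every length-k substring across all sequences.

    Assumption: frequency is the total occurrence count, so a k-mer
    that appears twice in one sequence is counted twice.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of width k along the sequence.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def top_kmers(counts, n):
    """Return the n most frequent k-mers as (kmer, frequency) pairs."""
    return counts.most_common(n)
```

With k between 17 and 20 and n = 120, `top_kmers(count_kmers(database, k), 120)` would give the candidate primer k-mers of step (b); `database` here stands for any list of sequence strings.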

Embodiment 2

[0068] This embodiment is a primer design method based on the k-mer algorithm. The method includes:

[0069] (a) Construct a functional gene nucleic acid sequence database, supplement the species information in the database by gene number and, using the k-mer algorithm with k equal to the primer length (k is 20 bp), cut each nucleic acid sequence in the database into k-mers;

[0070] (b) Select the 100 k-mers with the highest occurrence frequency and use them as candidate primer k-mers;

[0071] (c) Merge the candidate primer k-mers whose overlap is greater than 10, then select the 30 most frequent k-mers from the merged candidates as preliminary primers. The merging rule is: among k-mers with overlap > 10, retain the k-mer with the highest frequency; if several k-mers share the highest frequency, keep the longest one;

[0072] (d) Search for k-mers with a d...
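
The merging rule in step (c) might be sketched as a greedy filter. The text does not define "overlap" precisely, so this sketch assumes it means the longest suffix-prefix match between two k-mers, in either direction. The names `overlap_len` and `merge_overlapping` are hypothetical.

```python
def overlap_len(a, b):
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def merge_overlapping(candidates, x=10):
    """Greedy merge of (kmer, frequency) pairs.

    Candidates are visited from highest to lowest frequency, with the
    longer k-mer first on ties. A candidate is kept only if its overlap
    with every k-mer already kept is at most x, so each group of
    overlapping k-mers is represented by its most frequent member.
    """
    ranked = sorted(candidates, key=lambda kf: (-kf[1], -len(kf[0])))
    kept = []
    for kmer, freq in ranked:
        if all(
            max(overlap_len(kmer, other), overlap_len(other, kmer)) <= x
            for other, _ in kept
        ):
            kept.append((kmer, freq))
    return kept
```

Because the result stays sorted by frequency, slicing it (for example `merge_overlapping(candidates)[:30]`) gives the preliminary primers described in this embodiment.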

Experimental example

[0078] In this experimental example, primers are designed from a sequence database of napA, the nitrate reduction gene of the nitrogen cycle. The database contains 4562 highly reliable napA gene sequences with identified species. It is dominated by Proteobacteria, which account for more than 80% of the sequences; the remaining 20% consist of Firmicutes, Bacteroidetes and Chloroflexi. By oxygen requirement, 58% of the sequences belong to facultative anaerobes, 28% to aerobes, and 5% to anaerobes. By Gram classification, 82% of the sequences belong to Gram-negative bacteria and 15% to Gram-positive bacteria.

[0079] The napA amplification primer pair most commonly used at present is shown in SEQ ID NO: 1 and SEQ ID NO: 2. SEQ ID NO: 1 is V16cf-GCNCCNTGYMGNTTYTGYGG, where N is A, T, C or G; M is A or C; and Y is C or T;

[0080] SEQ ID NO: 2 is: V17cr-RTGYT...
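
To make the degenerate-base notation concrete, the sketch below expands a degenerate primer into the plain sequences it represents, using the standard IUPAC nucleotide codes (of which N, M and Y are defined in the text above). The function names `degeneracy` and `expand` are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct plain sequences a degenerate primer encodes."""
    total = 1
    for base in primer:
        total *= len(IUPAC[base])
    return total


def expand(primer):
    """List every plain sequence encoded by a degenerate primer."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in primer))]
```

For the V16cf sequence GCNCCNTGYMGNTTYTGYGG, which has three N, three Y and one M position, `degeneracy` returns 4^3 x 2^3 x 2 = 1024.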

Abstract

The invention provides a primer design method and system based on a k-mer algorithm. The method comprises the following steps: constructing a functional gene nucleic acid sequence database and cutting each nucleic acid sequence in the database into k-mers; selecting a plurality of k-mers in descending order of occurrence frequency as candidate primer k-mers; merging the candidate primer k-mers whose overlap is greater than x, and selecting a plurality of k-mers from the merged candidates in descending order of frequency as preliminary primers; searching, among the k-mers with coverage greater than 1%, for k-mers that differ from each preliminary primer by one base, and merging the k-mers found with the corresponding preliminary primer in degenerate-base form to obtain degenerate primers; evaluating the basic information of each degenerate primer, calculating the amplification product length for every pairwise combination of degenerate primers, and screening the degenerate primers into pairs according to the evaluation results and the amplification product lengths. The method avoids the tedious step of sequence alignment, greatly shortens primer design time, and yields primers with higher coverage.
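
The degenerate-primer step summarized above can be illustrated with a short Python sketch. It rests on two assumptions that the text does not spell out: "coverage" is taken to be the fraction of database sequences containing a k-mer, and "a difference of one base" is taken to be a Hamming distance of 1 between equal-length k-mers. The names `coverage`, `hamming` and `degenerate_primer` are hypothetical.

```python
# Standard IUPAC nucleotide codes, inverted to map a set of bases to its code.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
CODE_FOR = {frozenset(bases): code for code, bases in IUPAC.items()}


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def coverage(kmer, sequences):
    """Assumed definition: fraction of sequences that contain the k-mer."""
    return sum(kmer in seq for seq in sequences) / len(sequences)


def degenerate_primer(primer, kmers, sequences, min_cov=0.01):
    """Fold one-mismatch neighbours of a primer into IUPAC degenerate bases."""
    columns = [{base} for base in primer]
    for kmer in kmers:
        if (
            len(kmer) == len(primer)
            and hamming(primer, kmer) == 1
            and coverage(kmer, sequences) > min_cov
        ):
            # Record the neighbour's base at every position.
            for column, base in zip(columns, kmer):
                column.add(base)
    return "".join(CODE_FOR[frozenset(c)] for c in columns)
```

For instance, a preliminary primer ACGT with a sufficiently covered neighbour ACGA would become ACGW, since W is the IUPAC code for A or T.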

Description

Technical field

[0001] The present invention relates to the technical field of primer design, in particular to a method and system for designing primers based on a k-mer algorithm.

Background

[0002] At present, research on the composition and diversity of microbial communities generally follows the methods of sample collection, DNA extraction, PCR amplification of target gene fragments, and high-throughput sequencing. High-throughput sequencing is a highly targeted method for analyzing genetic variation in specific genomic regions, and is an ideal method for discovering single nucleotide polymorphisms (SNPs). It uses polymerase chain reaction (PCR) primers to amplify a specific region of the genome, captures the DNA in the target region in a targeted manner, and thereby enriches the target DNA fragment. Finally, high-throughput sequencing is performed on the amplified product to analyze information such as genetic variation in the sequence; PCR refers to a ...

Application Information

IPC(8): G16B15/00, G16B30/00, G16B50/00
CPC: G16B15/00, G16B30/00, G16B50/00, Y02A90/10
Inventors: 邓晔, 吴悦妮
Owner: RES CENT FOR ECO ENVIRONMENTAL SCI THE CHINESE ACAD OF SCI