Searching method for genome sequence data based on characteristics

A feature-based search method for genome sequence databases. It addresses problems such as missed sequences and the inability to handle long sequence fragments, and it improves search efficiency and scalability.

Inactive Publication Date: 2005-03-23
SOUTHEAST UNIV

AI Technical Summary

Problems solved by technology

The sequence alignment approach works reasonably well for short sequence fragments, but it cannot handle long ones. Moreover, sequence alignment considers only the letter-by-letter similarity of two sequences. Although current theory holds that similar sequences have similar structures and functions, relying on alignment alone means that a search for functionally similar sequences may miss sequences that are similar in function but not very similar in letter order.

Examples

Embodiment 1

[0044] Example 1: Search the database for the 100 sequences most similar to a given sequence in their base correlation features.

[0045] A sequence is selected on human chromosome 7, and its base correlation feature value is used to search the database for the 100 most similar sequences, as shown in Figures 3 to 7. Most of the 100 sequences come from the human genome, and mouse genome sequences begin to appear only from the 73rd sequence onward. This indicates that sequences within the human genome are fairly similar to one another in base correlation, and that some fragments of the mouse genome also show a certain similarity to the human genome.
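The kind of observation made in this example (how many hits come from each species, and the rank at which another species first appears) could be derived from a ranked hit list roughly as follows. This is only a sketch: the function name `summarize_hits`, the `home` parameter, and the toy species labels are hypothetical and are not taken from the patent or its data.

```python
from collections import Counter


def summarize_hits(ranked_species, home="human"):
    """Summarize a ranked hit list.

    ranked_species: species labels of the hits, ordered from most to
    least similar (hypothetical input format).
    Returns per-species hit counts and the 1-based rank of the first
    hit that is not from the `home` species (None if there is none).
    """
    counts = Counter(ranked_species)
    first_other = next(
        (rank for rank, sp in enumerate(ranked_species, start=1) if sp != home),
        None,
    )
    return counts, first_other


# Toy hit list with made-up labels, not the patent's results.
hits = ["human", "human", "mouse", "human", "mouse"]
counts, first_other = summarize_hits(hits)
print(counts["human"], counts["mouse"], first_other)  # 3 2 3
```

Applied to a real result list of 100 hits, the second return value would correspond to the rank at which the first non-human sequence appears.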

Abstract

The invention relates to a feature-based method for searching a genome sequence database, in which approximate sequences are found according to the statistical features of the sequences. The database stores basic information on the genome sequence data of different species: the database accession number, species name, chromosome number, raw sequence data, base composition feature, and base correlation feature. For any gene fragment submitted by a user, the feature value requested by the user is computed, and its distance to every corresponding feature value in the database is calculated. The closest sequences are then sorted by distance and displayed.
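The search flow described in the abstract can be sketched in a few lines of Python. Several details here are assumptions, because the visible text does not define them: base composition is taken to be the fraction of each of A, C, G, and T; base correlation is approximated by the 16 adjacent-dinucleotide frequencies; and the distance is Euclidean. The names `Record`, `make_record`, and `search` are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from math import sqrt

BASES = "ACGT"
DINUCS = ["".join(p) for p in product(BASES, repeat=2)]


def composition(seq):
    """Fraction of A, C, G, T (assumed definition of base composition)."""
    seq = seq.upper()
    total = sum(seq.count(b) for b in BASES) or 1
    return [seq.count(b) / total for b in BASES]


def correlation(seq):
    """Frequencies of the 16 adjacent base pairs (an assumed stand-in
    for the base correlation feature)."""
    seq = seq.upper()
    counts = dict.fromkeys(DINUCS, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing N or other symbols
            counts[pair] += 1
    total = sum(counts.values()) or 1
    return [counts[d] / total for d in DINUCS]


FEATURES = {"composition": composition, "correlation": correlation}


@dataclass
class Record:
    accession: str
    species: str
    chromosome: str
    sequence: str
    features: dict  # feature name -> precomputed feature vector


def make_record(accession, species, chromosome, sequence):
    """Compute the features once, at load time, and store them with the record."""
    feats = {name: fn(sequence) for name, fn in FEATURES.items()}
    return Record(accession, species, chromosome, sequence, feats)


def euclidean(u, v):
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def search(query, records, kind="correlation", top=100):
    """Return up to `top` (distance, record) pairs, closest first."""
    q = FEATURES[kind](query)
    scored = [(euclidean(q, r.features[kind]), r) for r in records]
    scored.sort(key=lambda pair: pair[0])
    return scored[:top]
```

A call such as `search(fragment, records, kind="composition", top=100)` would rank the stored sequences by base composition instead. Because every stored feature vector has a fixed length regardless of sequence length, the comparison cost per record does not grow with fragment size, which is consistent with the claim that the method can handle long fragments.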

Description

Technical field

[0001] The invention is a feature-based search method for a genome sequence database; more precisely, it is a method for searching a database for approximate sequences according to the statistical features of the sequences.

Background art

[0002] Alongside the implementation and completion of the Human Genome Project (HGP), the Model Organisms Genome Project and the Microbial Genome Project (MGP) are also in progress. At present, work on the structural genome is essentially complete, but for the study of the entire genome, obtaining the sequence is only the first step. Collecting, organizing, and retrieving these sequences, and analyzing the sequences and the structure and function of the expressed proteins, with the ultimate goal of finding the underlying rules and uncovering the secrets of life, are the tasks of the so-called post-genome era. Bioinformatics is an emerging science that has emerg...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 孙啸焦典
Owner: SOUTHEAST UNIV