Multi-species feature selection and unknown gene identification methods

A multi-species feature selection technique, applied in the field of life sciences, that addresses two problems: the lack of a comprehensive view of how non-coding RNA expression is regulated, and the lack of standards for identifying non-coding RNAs.
CN106446597A · Active · Publication Date: 2017-02-22 · TSINGHUA UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications (China)
Current Assignee / Owner
TSINGHUA UNIV
Publication Date
2017-02-22

Abstract

The invention discloses a multi-species feature selection method and an unknown gene identification method, and belongs to the field of life science. The multi-species feature selection method comprises the steps of: assigning feature values to small fragment regions covering the whole genome; performing tagging; performing feature selection within each species; and performing feature selection between species. An efficient and accurate computational method is constructed by integrating the gene commonalities shared among different species, and the method is used for the accurate identification and description of unknown genes.
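
The four steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented procedure: the abstract does not say how features are valued, how regions are tagged, or which selection criterion is used. The sketch assumes fixed-size genome bins, binary tags, a within-species ranking by the absolute difference between class means, and a between-species step that keeps only the features selected in every species. All function names and the feature names `gc`, `cons`, and `noise` are hypothetical.

```python
from statistics import mean


def bin_genome(length, bin_size):
    """Split [0, length) into consecutive fixed-size bins (the last bin may be shorter)."""
    return [(start, min(start + bin_size, length))
            for start in range(0, length, bin_size)]


def select_within_species(rows, labels, top_n):
    """Assumed criterion: rank features by |mean(tag 1) - mean(tag 0)| and keep top_n.

    rows   -- one dict per bin, mapping feature name -> value
    labels -- one 0/1 tag per bin (both classes must be present)
    """
    scores = {}
    for name in rows[0]:
        pos = [r[name] for r, y in zip(rows, labels) if y == 1]
        neg = [r[name] for r, y in zip(rows, labels) if y == 0]
        scores[name] = abs(mean(pos) - mean(neg))
    # Sort by descending score; break ties by name so the result is deterministic.
    ranked = sorted(scores, key=lambda n: (-scores[n], n))
    return set(ranked[:top_n])


def select_between_species(per_species):
    """Assumed criterion: keep only features selected in every species."""
    return set.intersection(*per_species.values())


# Toy data: two bins per species, hypothetical feature values and tags.
species_a = [{"gc": 0.9, "cons": 0.8, "noise": 0.5},
             {"gc": 0.1, "cons": 0.2, "noise": 0.5}]
species_b = [{"gc": 0.5, "cons": 0.9, "noise": 0.9},
             {"gc": 0.5, "cons": 0.1, "noise": 0.2}]
tags = [1, 0]

selected = {
    "species_a": select_within_species(species_a, tags, top_n=2),
    "species_b": select_within_species(species_b, tags, top_n=2),
}
shared = select_between_species(selected)
```

In this toy setup only `cons` separates the tagged bins in both species, so it is the only feature that survives the between-species step.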

Description

Technical field

[0001] The invention relates to the field of life sciences, in particular to a method for multi-species feature selection and identification of unknown genes.

Background art

[0002] A number of tools for predicting the probability that a transcript is protein-coding have been published, including CONC, CPC, PhyloCSF, RNAcode, PLEK, CNCI, CNCTDiscriminator, CPAT, HMMER, and lncRNA-ID (1-10), but the vast majority of these tools use only the sequence information of the transcripts. This sequence information includes, but is not limited to: open reading frame (ORF) features, such as ORF length and coverage (1,2,4,7,9); nucleotide frequency features, such as k-mer sequence patterns and codon usage (1,2,5,7-9); conservation score features, such as nucleotide sequence alignment or protein sequence alignment (1-4); evolution-related features, such as substitution rate and phylogenetic score (7,10) and in ...
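
Two of the sequence features listed above, ORF length and coverage and k-mer frequencies, can be computed as in the following minimal Python sketch. It does not reproduce any of the cited tools. It assumes an ORF runs from an ATG to the first in-frame stop codon (TAA, TAG, or TGA) on the forward strand only, counts the stop codon in the ORF length, and defines coverage as ORF length divided by transcript length. The function names are hypothetical.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def longest_orf(seq):
    """Return (start, end) of the longest ATG...stop ORF on the forward strand, or None."""
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open an ORF at the first ATG in this frame
            elif start is not None and codon in STOP_CODONS:
                end = i + 3  # include the stop codon in the ORF span
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None  # close the ORF and look for the next ATG
    return best


def orf_features(seq):
    """ORF length in nucleotides and its share of the transcript length."""
    orf = longest_orf(seq)
    length = 0 if orf is None else orf[1] - orf[0]
    coverage = length / len(seq) if seq else 0.0
    return {"orf_length": length, "orf_coverage": coverage}


def kmer_frequencies(seq, k):
    """Relative frequency of every overlapping k-mer in the sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()} if total else {}


features = orf_features("CCATGAAATGACC")
freqs = kmer_frequencies("AAAT", 2)
```

Tools differ in how they define an ORF, for example whether both strands are scanned or whether a stop codon is required, so the values from this sketch are not comparable with their outputs.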

Claims
