Method for screening gene keywords from PubMed literatures

A technique for screening gene keywords, applied in the field of bioinformatics, that addresses the low accuracy and high false-positive rate of existing methods.

Active Publication Date: 2019-10-18
SOUTHERN MEDICAL UNIVERSITY +1

AI Technical Summary

Problems solved by technology

[0005] GenCLiP screens gene keywords based on the high-frequency co-occurrence of genes and keywords in abstracts. In practice, many of the genes and keywords it pairs turn out to be unrelated, so the false-positive rate is high and the accuracy is low.



Examples


Embodiment

[0043] 1. Obtain PubMed's annually and daily updated literature through MEDLINE/PubMed FTP (ftp://ftp.ncbi.nlm.nih.gov/pubmed/) and extract the information relevant to the NAT1 gene from the downloaded XML files, such as the PMID (PubMed ID), title, and abstract, so that the PubMed literature related to the NAT1 gene is stored locally. Identify NAT1 gene names in the PubMed abstracts and map them to the correct Entrez Gene ID (GID) to determine the NAT1 gene-related abstracts. Then use Perl's Text::Sentence module to segment the NAT1 gene-related abstracts into sentences (SIDs) and identify the NAT1 gene-related sentences. In total, 583 gene-related abstracts and 1862 gene-related sentences were found for the NAT1 gene.
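The extraction and sentence-segmentation step above can be sketched in Python as follows. This is a minimal illustration and not the patented implementation: the inline sample record is made up, a simple regular expression stands in for Perl's Text::Sentence, and an exact match on the gene symbol stands in for gene-name recognition and Entrez Gene ID mapping. The names `SAMPLE_XML`, `split_sentences`, and `gene_records` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Made-up records that follow the PubMed XML element layout.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>1</PMID>
      <Article>
        <ArticleTitle>A made-up title about NAT1</ArticleTitle>
        <Abstract>
          <AbstractText>NAT1 acetylates arylamines. Expression was measured in cells. High NAT1 levels were observed.</AbstractText>
        </Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>2</PMID>
      <Article>
        <ArticleTitle>A made-up unrelated title</ArticleTitle>
        <Abstract>
          <AbstractText>This abstract mentions no gene of interest.</AbstractText>
        </Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def split_sentences(text):
    # Assumption: naive split after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def gene_records(xml_text, gene_symbol):
    # Assumption: whole-word match on the symbol instead of real gene-name
    # recognition and Entrez Gene ID disambiguation.
    pattern = re.compile(r"\b" + re.escape(gene_symbol) + r"\b")
    records = []
    for article in ET.fromstring(xml_text).iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID")
        title = article.findtext("MedlineCitation/Article/ArticleTitle") or ""
        abstract = " ".join(
            "".join(node.itertext()) for node in article.iter("AbstractText")
        )
        if not pattern.search(abstract):
            continue  # not a gene-related abstract
        sentences = split_sentences(abstract)
        # Keep (sentence index, sentence) pairs that mention the gene.
        hits = [(i, s) for i, s in enumerate(sentences) if pattern.search(s)]
        records.append({"pmid": pmid, "title": title, "gene_sentences": hits})
    return records


records = gene_records(SAMPLE_XML, "NAT1")
```

Here `records` holds one entry per gene-related abstract, each with the sentence positions that mention the gene, which corresponds loosely to the abstract-level and sentence-level (SID) units described in the step.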

[0044] 2. Use the search engine Sphinx (http://sphinxsearch.com/) to build a full-text index of the NAT1 gene-related documents. Because Sphinx itself cannot store text fields, we combine it with MySQL. The MySQL database stores documents and their corres...
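The division of labour described here, in which the index answers queries with document IDs and a database holds the text, can be illustrated with a small sketch. It uses stand-ins throughout and does not call Sphinx or MySQL: an in-memory inverted index plays the role of the full-text index, and Python's built-in `sqlite3` plays the role of the text store. The table name, column names, and sample documents are all hypothetical.

```python
import re
import sqlite3
from collections import defaultdict

# Made-up documents: (document ID, text).
DOCS = [
    (1, "NAT1 acetylation in breast cancer"),
    (2, "NAT2 and drug metabolism"),
    (3, "NAT1 expression and drug metabolism"),
]


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_store(docs):
    # Stand-in for the relational database that keeps the document text.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany("INSERT INTO documents VALUES (?, ?)", docs)
    return conn


def build_index(docs):
    # Stand-in for the full-text index: token -> set of document IDs.
    index = defaultdict(set)
    for doc_id, body in docs:
        for token in tokenize(body):
            index[token].add(doc_id)
    return index


def search(index, conn, query):
    tokens = tokenize(query)
    if not tokens:
        return []
    # The index returns only IDs of documents containing every query token.
    ids = set.intersection(*(index.get(t, set()) for t in tokens))
    if not ids:
        return []
    # The text itself is then fetched from the database by ID.
    marks = ",".join("?" * len(ids))
    sql = f"SELECT doc_id, body FROM documents WHERE doc_id IN ({marks}) ORDER BY doc_id"
    return conn.execute(sql, sorted(ids)).fetchall()


conn = build_store(DOCS)
index = build_index(DOCS)
results = search(index, conn, "NAT1 drug")
```

The point of the split is that the index can stay compact and fast because it stores only IDs, while the full text is looked up separately when it is needed.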


Abstract

The invention discloses a method for screening gene keywords from PubMed literature, and provides a method for screening gene keywords from a literature database. The method comprises the following steps: identifying gene-related abstracts and/or gene-related sentences in the abstracts of the literature database, establishing a full-text index of the gene-related literature, screening keywords from a term library, obtaining association probability scores for genes and keywords through a gene-keyword association score calculation formula, and screening out the keywords closely related to the genes. In this method, on the one hand, the predefined term vocabulary is broadened by integrating terms from other authoritative databases in addition to GO terms; on the other hand, a new keyword screening method scores the probability that a gene and a term are associated, and more reliable gene-keyword associations are selected by integrating the frequencies with which genes and keywords co-occur in abstracts and in sentences.
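To make the idea of combining abstract-level and sentence-level co-occurrence concrete, here is a small sketch. The association score formula itself is not given in this text, so the weighted sum below is purely an illustrative assumption and must not be read as the patented formula. The function names, the `sentence_weight` parameter, and the sample data are hypothetical, and substring matching is a simplification of real gene and term recognition.

```python
# Made-up corpus: each abstract is a list of its sentences.
ABSTRACTS = [
    ["NAT1 is studied here.", "Acetylation was impaired."],
    ["NAT1 drives acetylation of arylamines.", "Nothing else is reported."],
    ["This abstract is unrelated."],
]


def cooccurrence_counts(abstracts, gene, keyword):
    """Count abstracts and sentences in which gene and keyword both appear."""
    gene, keyword = gene.lower(), keyword.lower()
    abstract_hits = 0
    sentence_hits = 0
    for sentences in abstracts:
        lowered = [s.lower() for s in sentences]
        # Abstract-level co-occurrence: both appear somewhere in the abstract.
        if any(gene in s for s in lowered) and any(keyword in s for s in lowered):
            abstract_hits += 1
        # Sentence-level co-occurrence: both appear in the same sentence.
        sentence_hits += sum(1 for s in lowered if gene in s and keyword in s)
    return abstract_hits, sentence_hits


def illustrative_score(abstract_hits, sentence_hits, sentence_weight=2.0):
    # Assumption: a plain weighted sum that favours same-sentence evidence.
    return abstract_hits + sentence_weight * sentence_hits


counts = cooccurrence_counts(ABSTRACTS, "NAT1", "acetylation")
score = illustrative_score(*counts)
```

Weighting sentence-level co-occurrence more heavily reflects the intuition that a gene and a keyword mentioned in the same sentence are more likely to be related than two terms that merely share an abstract.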

Description

Technical field

[0001] The invention belongs to the technical field of bioinformatics, and in particular relates to a method for screening gene keywords from PubMed literature.

Background technique

[0002] In the era of precision medicine, high-throughput methods (such as sequencing and microarrays) are commonly used to screen candidate genes related to diseases (abnormal expression, mutation, epigenetic changes, etc.). However, working out the molecular mechanisms involving thousands of candidate disease-related genes has become a new challenge. Identifying the genes associated with disease-related biological events is the entry point for such analysis. The conventional solution is to use the manually annotated database Gene Ontology (GO) for query or enrichment analysis.

[0003] However, because GO annotations are manually obtained by experts from the literature, the number and scope of annotations are relatively limited, and the annotation speed is far behind the literature u...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G16B50/10, G16B50/30, G16B20/00, G16B40/00
CPC: G16B20/00, G16B40/00, G16B50/10, G16B50/30
Inventor: 汪佳宏, 章建平, 黄仲曦, 潘星华
Owner: SOUTHERN MEDICAL UNIVERSITY