
Ontology-based protein/gene synonym table construction method

A method for constructing an ontology-based protein/gene thesaurus. It addresses problems of existing synonym sets, such as lack of authority, inconsistent capitalization conventions, and accuracy that cannot match manually verified sets. The resulting table carries detailed classification information, is accurate and reliable, and supports accurate literature data mining.

Active Publication Date: 2020-09-25
SHANDONG COMP SCI CENTNAT SUPERCOMP CENT IN JINAN

AI Technical Summary

Problems solved by technology

The focus of each data source is different, which inevitably makes the collection and collation of synonyms less than comprehensive.
[0008] (2) Insufficient authority of the synonym set
[0009] A thesaurus obtained by automated means is generally slightly better than existing public vocabulary sets in generality and comprehensiveness, but its accuracy rarely matches that of public vocabulary sets that have been verified manually. Deviations and errors occur, and authority cannot be guaranteed.
[0010] (3) No differentiation by species
[0013] Another difference is that synonyms with the same spelling follow different capitalization rules in different species.
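
The species and capitalization point can be illustrated with a small lookup sketch. It is not the patent's implementation: the gene names `GENEA`, `Genea` and `GA1` are invented for illustration, and the only real-world values are the NCBI taxonomy identifiers 9606 (human) and 10090 (mouse). The sketch keys every name on the pair (taxon ID, exact spelling), so names that differ only in case stay attached to the right species.

```python
def build_index(records):
    """Map (taxon_id, exact-case name) -> set of primary names.

    records: iterable of (taxon_id, primary_name, synonyms) tuples.
    Case is deliberately NOT folded, so 'GENEA' and 'Genea' stay distinct.
    """
    index = {}
    for taxon_id, primary, synonyms in records:
        for name in {primary, *synonyms}:
            index.setdefault((taxon_id, name), set()).add(primary)
    return index


def lookup(index, taxon_id, name):
    """Return the primary names registered for this species and spelling."""
    return index.get((taxon_id, name), set())


# Invented toy records: (taxon_id, primary_name, synonyms)
records = [
    (9606, "GENEA", ["GA1"]),   # human-style all-caps symbol
    (10090, "Genea", []),       # mouse-style capitalized symbol
]
index = build_index(records)
```

With this layout, `lookup(index, 9606, "Genea")` returns an empty set, because the mixed-case spelling is registered only for the mouse entry.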



Examples


Embodiment Construction

[0045] The present invention will be further described below in conjunction with the accompanying drawings and embodiments.

[0046] The authority and comprehensiveness of existing protein / gene synonym lists are difficult to guarantee. When synonyms are extracted from biomedical literature by automated means, no matter how accurate the model is, the comprehensiveness of the resulting synonym list is hard to ensure without a data benchmark; and without long-term use and verification, the authority of the thesaurus is also hard to establish.

[0047] Data extraction and entity alignment. The technical solution adopted here selects the three most commonly used and authoritative data sources in biomedicine, Uniprot-Swissprot, BioGRID and NCBI Gene, as the data basis for constructing the synonym table. However, these three data sources have different data fields, different storage formats, different protein / gene names as the main names, an...
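
One common way to cope with sources that differ in fields and storage format is to write a small adapter per source that converts each record into a shared structure. The sketch below is only an illustration of that idea under stated assumptions: the `Entry` class, both adapter functions, and every field name (`symbol`, `aliases`, `taxon`, `name`, `alt_names`, `organism_id`) are hypothetical and are not the real column names of Uniprot-Swissprot, BioGRID or NCBI Gene, nor the schema used by the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical common record that all sources are normalized into."""
    primary_name: str
    taxon_id: int
    synonyms: set = field(default_factory=set)
    source: str = ""


def from_pipe_delimited(row, source):
    """Adapter for a hypothetical tabular source.

    Assumes keys 'symbol', 'aliases' (pipe-separated, '-' when empty)
    and 'taxon' (taxonomy ID stored as text).
    """
    aliases = row["aliases"]
    names = set() if aliases in ("", "-") else set(aliases.split("|"))
    return Entry(row["symbol"], int(row["taxon"]), names, source)


def from_list_record(record, source):
    """Adapter for a hypothetical structured source.

    Assumes keys 'name', 'alt_names' (a list) and 'organism_id' (an int).
    """
    return Entry(record["name"], record["organism_id"],
                 set(record["alt_names"]), source)


# Invented toy inputs; 'source_a' and 'source_b' are placeholder labels.
a = from_pipe_delimited(
    {"symbol": "GENEA", "aliases": "GA1|GA2", "taxon": "9606"}, "source_a")
b = from_list_record(
    {"name": "Genea", "alt_names": ["Ga1"], "organism_id": 10090}, "source_b")
```

Keeping the `source` label on each entry preserves provenance, which helps when the same name later arrives from more than one source.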


Abstract

The invention discloses an ontology-based protein / gene synonym table construction method, which comprises the following steps: a) obtaining the data sources Uniprot, BioGRID and NCBI Gene; b) segmenting a data file; c) establishing an upper-layer ontology; d) mapping and fusing Uniprot-Swissprot to the upper-layer ontology; e) mapping and fusing BioGRID to the upper-layer ontology; f) mapping and fusing NCBI Gene to the upper-layer ontology; and g) removing duplicate synonyms. The method establishes a protein / gene synonym table that is comprehensive in synonym coverage, reliable in accuracy and fine-grained in classification information. It provides the premise and guarantee for efficient and accurate literature data mining, and is a powerful aid to scientific discovery by biomedical experts.
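
Step g) can be pictured with a short sketch. The exact duplicate-removal rule used by the invention is not given in this excerpt, so the function below is only an assumption: it merges the synonym lists contributed by several sources for one entry, trims surrounding whitespace, drops exact repeats and the primary name itself, and keeps spellings that differ only in case as separate synonyms. The function name `merge_synonyms` and the sample names are invented.

```python
def merge_synonyms(primary, *synonym_lists):
    """Merge per-source synonym lists for one entry, removing duplicates.

    Comparison is exact (case-sensitive); first-seen order is preserved.
    The primary name is never repeated as its own synonym.
    """
    seen = {primary}
    merged = []
    for source_synonyms in synonym_lists:
        for name in source_synonyms:
            name = name.strip()
            if name and name not in seen:
                seen.add(name)
                merged.append(name)
    return merged


# Invented toy lists standing in for three different sources.
merged = merge_synonyms(
    "GENEA",
    ["GA1", "GENEA"],
    ["GA1", "Ga-1 "],
    ["ga1"],
)
# merged == ["GA1", "Ga-1", "ga1"]
```

Preserving first-seen order means the order in which sources are fused decides which spelling is listed first, so that order should be fixed deliberately.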

Description

Technical field

[0001] The present invention relates to a method for constructing an ontology-based protein / gene thesaurus, and more specifically to a method for constructing an ontology-based protein / gene thesaurus using Uniprot-Swissprot, BioGRID and NCBI Gene as data sources.

Background technique

[0002] Synonymy is a phenomenon that exists in every language in the world. It refers to words that express the same or similar meanings but are written differently. Decades of biological research practice have led to inconsistent use of some terms, leaving a large number of "synonymous" terms in datasets. Using different vocabularies to describe the same gene function hinders the search for common features across multiple proteins and species. To make matters worse, the same term is used by scholars in different branches of biology to describe different molecules. Such unorganized data accumulation and acquisition will bring great chal...


Application Information

IPC(8): G16B40/00, G06F40/247, G06F16/36
CPC: G16B40/00, G06F40/247, G06F16/367
Inventor: 王小红, 窦方坤, 赵志刚, 王鑫, 杨帅, 曹皓伟
Owner: SHANDONG COMP SCI CENTNAT SUPERCOMP CENT IN JINAN