Protein interactive relationship identification method based on text relationship similarity

A technique for recognizing protein interaction relationships based on textual relationship similarity, which captures interaction features more comprehensively and improves recognition accuracy

Status: Inactive
Publication Date: 2015-04-22
NANJING UNIV OF AERONAUTICS & ASTRONAUTICS
0 Cites, 9 Cited by

AI Technical Summary

Problems solved by technology

[0018] At present, there is no protein interaction recognition method that can quickly obtain protein interaction relationships and add them to protein interaction networks.


Examples

Embodiment Construction

[0072] The specific embodiments of the present invention will be further described below in conjunction with the drawings.

[0073] As shown in Figures 1 to 3, Figures 2 and 3 both show the recognition accuracy obtained when the k-nearest-neighbor method uses different values of k. The horizontal axis gives the value of k, and the vertical axis gives the recognition accuracy of the system. Precision, recall, and F-score are the three metrics used to evaluate how accurately interaction relationships are recognized.
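The three metrics named in [0073] can be computed as in the minimal sketch below. It assumes binary labels (a pair either interacts or does not) and takes the F-score to be the balanced F1; the source does not state which F-score variant is used, and the function name `precision_recall_f` is illustrative only.

```python
def precision_recall_f(predicted, gold):
    """Return (precision, recall, F1) for two equal-length lists of booleans.

    predicted[i] is True when the system says pair i interacts;
    gold[i] is True when pair i is a known interaction.
    Assumption: balanced F1 is used as the F-score.
    """
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p and g)        # correctly recognized interactions
    fp = sum(1 for p, g in pairs if p and not g)    # spurious interactions
    fn = sum(1 for p, g in pairs if not p and g)    # missed interactions

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score
```

Calling this once per value of k and plotting the results against k would give curves of the kind described for Figures 2 and 3.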

[0074] The present invention does not require the relationships between proteins in a sentence to be annotated. During implementation, the data with known interaction relationships (the training set) is taken directly from the PPI database HPRD. In the embodiment, to obtain positive examples, that is, protein pairs that interact, first extract all protein pairs fro...

Abstract

The invention discloses a method for identifying protein interaction relationships based on text relationship similarity. The method includes the following steps: (1) sentences containing the keywords of a protein pair (p1, p2) are obtained from a text set, and all such sentences are gathered into a signature file S, so that each target protein pair has its own signature file; (2) the relationship between p1 and p2 is represented by a feature vector; (3) relationship similarity is calculated: the vector representing the relationship of the target protein pair is compared with the feature vectors of protein pairs with known interaction relationships, and the labels of the most similar feature vectors are assigned to the target protein pair; (4) a word similarity matrix is calculated; (5) the word similarity model is introduced into the basic relationship similarity model to form a new hybrid model. By drawing on the rich context information in the text, the method captures interaction features more comprehensively and improves recognition accuracy.
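The five steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the abstract does not specify the features, the similarity measure, or how the word similarity matrix enters the model. Here the features are assumed to be bag-of-words counts, the basic similarity is assumed to be cosine similarity, labels are assumed to come from a majority vote among the k nearest neighbors, and the hybrid model is approximated by a soft cosine that weights word pairs by a caller-supplied `word_sim` function. All function names are hypothetical, and protein names are assumed to be single alphanumeric tokens.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def signature_file(sentences, p1, p2):
    """Step 1: gather every sentence that mentions both proteins of the pair."""
    pair = {p1.lower(), p2.lower()}
    return [s for s in sentences if pair <= set(tokenize(s))]


def feature_vector(signature, p1, p2):
    """Step 2: word counts over the signature file, protein names excluded."""
    pair = {p1.lower(), p2.lower()}
    return Counter(t for s in signature for t in tokenize(s) if t not in pair)


def cosine(u, v):
    """Step 3 (assumed basic model): cosine similarity of two count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def soft_dot(u, v, word_sim):
    """Dot product in which every word pair is weighted by word_sim(a, b)."""
    return sum(u[a] * v[b] * word_sim(a, b) for a in u for b in v)


def soft_cosine(u, v, word_sim):
    """Steps 4-5 (assumed hybrid model): cosine with word-level similarity."""
    norm = math.sqrt(soft_dot(u, u, word_sim) * soft_dot(v, v, word_sim))
    return soft_dot(u, v, word_sim) / norm if norm else 0.0


def knn_label(target, labeled, k=3, sim=cosine):
    """Assign the majority label of the k labeled vectors most similar to target.

    labeled is a list of (feature_vector, label) tuples built from protein
    pairs whose interaction relationship is already known.
    """
    ranked = sorted(labeled, key=lambda item: sim(target, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

To use the hybrid similarity in the vote, pass something like `sim=lambda u, v: soft_cosine(u, v, word_sim)`, where `word_sim(a, b)` looks up the entry for words `a` and `b` in the word similarity matrix. When `word_sim` returns 1 for identical words and 0 otherwise, `soft_cosine` reduces to the plain `cosine`.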

Description

Technical field

[0001] The invention relates to a method for automatically identifying protein interaction relationships in biomedical literature, and in particular to a method for identifying protein interaction relationships based on text relationship similarity.

Background art

[0002] Proteins are the most important component of biological cells. As the embodiment of life activities, proteins do not exist in isolation; they carry out most cellular processes by interacting with one another. Protein-protein interaction (PPI) is essential for understanding both the function of a single protein and entire biological processes. It is an important subject of biological research and key information for solving many medical problems. Therefore, building PPI networks that describe the interactions between proteins has always been a core issue in the study of biological processes, which is of great significance to biological ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F19/24
Inventor: 牛耘, 王宇伟, 吴红梅, 魏欧
Owner: NANJING UNIV OF AERONAUTICS & ASTRONAUTICS