Fine-grained relation mining method for scientific and technical literature based on two-stage syntax analysis

A two-stage relationship mining technique, applied in the field of fine-grained relationship mining of scientific and technological literature based on two-stage syntactic analysis. It addresses problems of existing methods such as the inability to explain a relationship or to determine its specific type, with the effect of reducing the human and financial resources required.

Status: Inactive. Publication date: 2017-05-31
JINLING INST OF TECH
Cites: 3. Cited by: 5.

AI Technical Summary

Problems solved by technology

[0004] Existing relationship mining mostly relies on co-occurrence, which cannot determine the specific type of a relationship. If two biological entities appear together in a statistical corpus or in the same document, they are considered to be related. However, this only establishes that they are related, not how they are related or what kind of relationship it is. For example, if gene A and gene B appear in the same document, one can only conclude that some relationship exists between them; it is not known whether gene A activates the expression of gene B or participates in it, or whether the relationship is positive regulation or inhibition. The relationships discovered by existing techniques are therefore coarse-grained, and important indirect relationships cannot be found. Scientific literature often reports direct relationships between pairs of entities, such as A->B and B->C, while the relationship A->C remains implicit. Existing techniques have difficulty finding such implicit relationships, because the relationships mined from co-occurrence are broad, undifferentiated associations.
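A minimal Python sketch of the co-occurrence baseline criticized above makes the limitation concrete. The function name `co_occurrence_pairs` and the naive whitespace-token matching of entity names are assumptions for illustration, not part of the source.

```python
from itertools import combinations


def co_occurrence_pairs(sentences, entities):
    """Return every pair of entities that appear together in at least one sentence.

    Pairs are frozensets, so they carry no direction and no relation type:
    the coarse-grained output described in [0004].
    """
    pairs = set()
    for sentence in sentences:
        # Naive exact-token matching (assumption); a real system would use NER.
        tokens = set(sentence.replace(".", " ").split())
        present = sorted(e for e in entities if e in tokens)
        for a, b in combinations(present, 2):
            pairs.add(frozenset((a, b)))
    return pairs


sentences = [
    "geneA activates geneB.",  # directed: geneA -> geneB, type: activation
    "geneB inhibits geneC.",   # directed: geneB -> geneC, type: inhibition
]
print(co_occurrence_pairs(sentences, {"geneA", "geneB", "geneC"}))
```

The result contains only the unordered pairs {geneA, geneB} and {geneB, geneC}: the direction and the activation/inhibition types are lost, and no geneA-geneC link appears because those two entities never co-occur.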
[0005] In the final analysis, existing relationship mining is frequency-based. Although it can find that some biological entities are related, it cannot explain the relationship between them. This leads to a further defect: all relationships between biological entities are treated as parallel (undirected) relationships, so the technique cannot reflect the active-passive relationship between entities, and consequently no indirect relationships between them can be derived.
A->C can only be deduced from knowing A->B and B->C; when A, B, and C are linked only by parallel relationships, those relationships are too broad to identify the indirect relationship A->C accurately.
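The derivation in [0005] presupposes directed relations. The Python sketch below (hypothetical function `derive_indirect`) illustrates the idea under assumptions beyond the source: it follows directed chains of any length rather than only the two-step case A->B, B->C, and it ignores relation types, since the source does not say how types combine along a chain.

```python
from collections import defaultdict


def derive_indirect(direct_edges):
    """Derive implicit relations from directed (subject, object) edges.

    Returns every (source, target) pair connected by a directed path of two or
    more edges that is not already a direct edge (self-loops are skipped).
    """
    edges = list(direct_edges)  # allow generators: the edges are read twice
    graph = defaultdict(set)
    for subj, obj in edges:
        graph[subj].add(obj)
    direct = set(edges)
    indirect = set()
    for start in list(graph):
        # Depth-first search from each source; targets reached through an
        # intermediate node are indirect unless they are also direct edges.
        stack = list(graph[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            for nxt in graph.get(node, ()):
                if nxt != start and (start, nxt) not in direct:
                    indirect.add((start, nxt))
                stack.append(nxt)
    return indirect


print(derive_indirect([("geneA", "geneB"), ("geneB", "geneC")]))  # {('geneA', 'geneC')}
```

Because the edges are directed, reversing one of them (geneB -> geneA instead of geneA -> geneB) removes the derived relation, which is exactly the information a parallel (undirected) representation cannot express.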



Examples


Embodiment Construction

[0029] In order to deepen the understanding of the present invention, the present invention will be described in further detail below in conjunction with the accompanying drawings and embodiments, which are only used to explain the present invention and do not limit the protection scope of the present invention.

[0030] As shown in Figure 1, a two-stage syntactic analysis method is adopted. In the first stage, each sentence is parsed and the resulting phrase structure is used to simplify long, complex sentences in scientific and technological literature. In the second stage, the simplified simple sentences are parsed again, and the resulting grammatical structure is used to accurately extract the subject-verb-object structure of each sentence. The method specifically includes the following steps:

[0031] Step 1. Preprocessing: the input original document is segmented into sentences, and each sentence is segmented into words to obtain a word sequence, also called a token string, using Cond...
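Step 1 can be approximated with regular expressions, as in the Python sketch below. It is a simplified stand-in, not the tooling the truncated passage goes on to name: the splitting and tokenization rules and the function names `split_sentences`, `tokenize` and `preprocess` are assumptions made for illustration.

```python
import re

# Assumed heuristic: a sentence ends at ".", "!" or "?" followed by whitespace and an
# uppercase letter or digit. It mis-splits after abbreviations such as "Fig. 2".
SENTENCE_END = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9])")

# Keep alphanumeric names joined by "-" or "/" (e.g. "IL-6") as one token;
# every other non-space character becomes its own token.
TOKEN = re.compile(r"[A-Za-z0-9]+(?:[-/][A-Za-z0-9]+)*|[^\sA-Za-z0-9]")


def split_sentences(text):
    """Split raw text into sentence strings."""
    return [s.strip() for s in SENTENCE_END.split(text.strip()) if s.strip()]


def tokenize(sentence):
    """Split one sentence into its token string (word sequence)."""
    return TOKEN.findall(sentence)


def preprocess(document):
    """Step 1 sketch: document -> one token list per sentence."""
    return [tokenize(s) for s in split_sentences(document)]


print(preprocess("IL-6 activates STAT3. STAT3 induces p21 expression."))
```

This prints `[['IL-6', 'activates', 'STAT3', '.'], ['STAT3', 'induces', 'p21', 'expression', '.']]`. Keeping hyphenated gene and protein names intact matters here, because splitting "IL-6" into three tokens would break the entity before syntactic analysis.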



Abstract

The invention relates to the field of technology mining in scientific and technical literature, in particular to a fine-grained relation mining method for scientific and technical literature based on two-stage syntax analysis. A two-stage syntax analysis method is adopted: in the first stage, the phrase structure generated by syntactic analysis is used to simplify long and complicated sentences in the scientific and technical literature; in the second stage, the simplified simple sentences are parsed again, and the resulting syntactic structure is used to accurately extract the subject, verb, and object of each sentence. The technical scheme mainly serves to mine relations between biological entities in scientific and technical literature: direct fine-grained relations between bio-entities can be obtained, and indirect relations hidden among them can also be derived.
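The two stages summarized above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: parse trees are hand-written nested tuples with Penn Treebank-style labels (in practice they would come from a constituency parser, which is not named here); the Stage 1 rules (drop parenthetical `PRN` subtrees, split coordinated `S -> S CC S` sentences) and the Stage 2 rule (subject = `NP` under `S`, verb = `VB*` under `VP`, object = `NP` under `VP`) are hypothetical; and the function names `simplify`, `extract_svo` and `two_stage` are invented for illustration.

```python
# Parse trees are nested tuples: (label, child, child, ...); leaves are plain strings.
# Hand-built trees stand in for constituency-parser output (assumption).


def leaves(tree):
    """Return the words under a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1:] for word in leaves(child)]


def strip_prn(tree):
    """Hypothetical Stage 1 rule: drop parenthetical (PRN) subtrees."""
    if isinstance(tree, str):
        return tree
    kept = [strip_prn(c) for c in tree[1:] if isinstance(c, str) or c[0] != "PRN"]
    return (tree[0], *kept)


def simplify(tree):
    """Stage 1: reduce one complex sentence to a list of simple clause trees.

    Hypothetical rule: a sentence whose children include S clauses
    (e.g. S -> S CC S) is split into those clauses, recursively.
    """
    tree = strip_prn(tree)
    clauses = [c for c in tree[1:] if not isinstance(c, str) and c[0] == "S"]
    if tree[0] == "S" and clauses:
        return [simple for c in clauses for simple in simplify(c)]
    return [tree]


def first_child(tree, prefix):
    """Return the first direct child whose label starts with prefix, or None."""
    for c in tree[1:]:
        if not isinstance(c, str) and c[0].startswith(prefix):
            return c
    return None


def extract_svo(clause):
    """Stage 2 (hypothetical heuristic): subject = NP under S, verb = VB* under VP,
    object = NP under VP. Returns None if any part is missing."""
    subj = first_child(clause, "NP")
    vp = first_child(clause, "VP")
    if subj is None or vp is None:
        return None
    verb = first_child(vp, "VB")
    obj = first_child(vp, "NP")
    if verb is None or obj is None:
        return None
    return tuple(" ".join(leaves(part)) for part in (subj, verb, obj))


def two_stage(tree):
    """Run both stages and collect one (subject, verb, object) triple per simple clause."""
    triples = []
    for clause in simplify(tree):
        svo = extract_svo(clause)
        if svo is not None:
            triples.append(svo)
    return triples


# Illustrative tree for "TP53 (a tumor suppressor) activates MDM2 and MDM2 inhibits TP53."
sentence = (
    "S",
    ("S",
     ("NP", ("NN", "TP53"),
      ("PRN", ("-LRB-", "("),
       ("NP", ("DT", "a"), ("NN", "tumor"), ("NN", "suppressor")),
       ("-RRB-", ")"))),
     ("VP", ("VBZ", "activates"), ("NP", ("NN", "MDM2")))),
    ("CC", "and"),
    ("S",
     ("NP", ("NN", "MDM2")),
     ("VP", ("VBZ", "inhibits"), ("NP", ("NN", "TP53")))),
    (".", "."),
)
print(two_stage(sentence))
```

This prints `[('TP53', 'activates', 'MDM2'), ('MDM2', 'inhibits', 'TP53')]`: Stage 1 removes the parenthetical and splits the coordination, so Stage 2 only has to handle two simple clauses. Real sentences need many more simplification rules (relative clauses, appositives, passive voice) than this sketch attempts.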

Description

Technical field
[0001] The invention relates to the field of technology mining in scientific and technological documents, in particular to a fine-grained relationship mining method for scientific and technological documents based on two-stage syntax analysis.
Background art
[0002] A relationship, in the general sense, refers to the interconnection between people, between people and things, and between things. For example, research on a certain disease recorded in the scientific literature may show which genes are associated with the disease and which drugs interact with those genes. Relationships between entities expressed in scientific and technological literature often represent important experimental conclusions or accumulated experience, which are of great practical value for guiding future work. Relationship mining in scientific literature is therefore very important. Since most of the current research documents are published in variou...


Application Information

Patent type and authority: Application (China)
IPC (8): G06F17/27; G06F17/30
CPC: G06F16/35; G06F16/367; G06F40/211; G06F40/253
Inventors: 杨荣根 (Yang Ronggen), 龚乐君 (Gong Lejun)
Owner: JINLING INST OF TECH