Text-oriented domain classification relationship automatic learning method

A technique for the automatic learning of domain classification relations from text. It addresses two problems of existing approaches: the large amount of manual labeling they require and the difficulty of extending them to other domains. It aims to obtain classification relations of good quality.

Active Publication Date: 2018-06-15
ZHEJIANG UNIV
3 Cites, 7 Cited by

AI Technical Summary

Problems solved by technology

[0004] Rule-based methods rely on static linguistic patterns (rules). Although they can provide high accuracy, they require extensive domain expertise as well as a large amount of manual labeling, and they are difficult to generalize to other domains.



Examples


Example 1

[0050] Example 1: “Exclude patients with a current diagnosis of hepatic or renal disease. Exclude patients with severe liver disorder or kidney disease.”

[0051] The concepts identified by this statement are:

[0052] $(C_1, \text{"liver disease"}, \{\beta_1, \beta_2\})$,

[0053] where $\beta_1 = (\text{"hepatic disease"}, s_1)$ and $\beta_2 = (\text{"liver disorder"}, s_2)$;

[0054] $(C_2, \text{"kidney disease"}, \{\beta_3, \beta_4\})$,

[0055] where $\beta_3 = (\text{"renal disease"}, s_1)$ and $\beta_4 = (\text{"kidney disorder"}, s_2)$.

Here $s_1$ and $s_2$ denote the first and second sentences of the statement. Each concept is written as a triple of a concept identifier, a concept name, and the set of term occurrences $\beta_i$ (a term paired with the sentence it occurs in) that map to that concept.
[0057] (1) Extract the abstracts from MEDLINE papers in XML format and store them as TXT files to form the corpus. In the present embodiment, the latest MEDLINE papers on Alzheimer's disease are selected as the data source;

[0058] (2) Use the natu...
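Step (1) can be sketched with Python's standard `xml.etree.ElementTree`. This is a minimal sketch, assuming records follow the public PubMed/MEDLINE XML layout (`MedlineCitation` > `PMID`, `Article/Abstract/AbstractText`); the sample records and PMIDs are made up, and the sketch neither downloads from MEDLINE nor applies the Alzheimer's-disease or recency filter used in the embodiment.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def extract_abstracts(xml_text):
    """Return {PMID: abstract text} for every MedlineCitation that has an abstract."""
    root = ET.fromstring(xml_text)
    abstracts = {}
    for citation in root.iter("MedlineCitation"):
        pmid = citation.findtext("PMID")
        # Structured abstracts have several AbstractText sections; itertext()
        # also keeps text nested inside inline markup such as <i>...</i>.
        sections = [
            "".join(node.itertext()).strip()
            for node in citation.findall("./Article/Abstract/AbstractText")
        ]
        text = " ".join(s for s in sections if s)
        if pmid and text:  # skip records without an abstract
            abstracts[pmid.strip()] = text
    return abstracts


def write_corpus(abstracts, out_dir):
    """Store each abstract as <out_dir>/<PMID>.txt (UTF-8)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for pmid, text in abstracts.items():
        (out / f"{pmid}.txt").write_text(text, encoding="utf-8")


# Hypothetical two-record sample in PubMed XML layout; the second has no abstract.
SAMPLE = """<PubmedArticleSet>
  <PubmedArticle><MedlineCitation>
    <PMID>1</PMID>
    <Article><Abstract>
      <AbstractText Label="BACKGROUND">Amyloid plaques are a hallmark.</AbstractText>
      <AbstractText Label="RESULTS">Tau <i>tangles</i> were observed.</AbstractText>
    </Abstract></Article>
  </MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation>
    <PMID>2</PMID>
    <Article><ArticleTitle>No abstract here</ArticleTitle></Article>
  </MedlineCitation></PubmedArticle>
</PubmedArticleSet>"""

corpus = extract_abstracts(SAMPLE)
print(corpus)
```

Calling `write_corpus(corpus, "corpus_txt")` would then write one TXT file per abstract, named by PMID, which is one simple way to lay out a file-based corpus.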



Abstract

The invention discloses a text-oriented automatic learning method for domain classification relations. The method adopts MEDLINE as the corpus; performs term extraction and concept extraction; computes, for the extracted concepts, a five-dimensional similarity based on syntactic and semantic similarity; weights the similarity of each dimension to obtain a final similarity matrix; performs hierarchical clustering on this matrix to obtain an initial dendrogram; and prunes the dendrogram and labels its clusters to finally obtain a dendrogram reflecting the classification relations among the concepts. The method does not require a large amount of manual labeling, which saves labor and time. Extracted terms are mapped to the UMLS Metathesaurus, an authoritative knowledge base, to obtain accurate domain concepts. The five-dimensional similarity calculation combines the distributional approach of hierarchical clustering with domain background knowledge. Finally, an unsupervised dynamic pruning method for hierarchical clustering based on extremal distance estimation is proposed, so that domain-related classification relations can be obtained more effectively.
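The weighting and clustering steps summarized above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the patent's implementation: the concept names, similarity values, and weights are hypothetical; only two of the five similarity dimensions are shown; average linkage on distance = 1 - similarity is assumed because the Abstract does not name a linkage criterion; and the final cut uses a simple largest-gap heuristic in place of the patent's extremal-distance-based dynamic pruning, whose details are not given here. Cluster labeling is not shown.

```python
def combine_similarities(matrices, weights):
    """Weighted average of per-dimension n x n similarity matrices."""
    total = sum(weights)
    n = len(matrices[0])
    return [
        [sum(w * m[i][j] for m, w in zip(matrices, weights)) / total for j in range(n)]
        for i in range(n)
    ]


def average_linkage(sim):
    """Agglomerative clustering with average linkage on distance = 1 - similarity.

    Returns the merge history (cluster_a, cluster_b, distance), i.e. the dendrogram,
    where clusters are frozensets of concept indices.
    """
    def dist(a, b):
        return sum(1.0 - sim[i][j] for i in a for j in b) / (len(a) * len(b))

    clusters = [frozenset([i]) for i in range(len(sim))]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        d, x, y = min(
            (dist(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        merges.append((clusters[x], clusters[y], d))
        merged = clusters[x] | clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return merges


def cut_at_largest_gap(n, merges):
    """Flat clusters from replaying merges up to the largest jump in merge distance.

    Largest-gap heuristic used for illustration only; NOT the patent's
    extremal-distance-based dynamic pruning.
    """
    heights = [d for _, _, d in merges]
    gaps = [heights[k + 1] - heights[k] for k in range(len(heights) - 1)]
    keep = gaps.index(max(gaps)) + 1 if gaps else len(merges)
    clusters = {frozenset([i]) for i in range(n)}
    for a, b, _ in merges[:keep]:
        clusters -= {a, b}
        clusters.add(a | b)
    return sorted(sorted(c) for c in clusters)


# Hypothetical concepts and scores; the patent uses five dimensions, two shown here.
names = ["liver disease", "kidney disease", "dementia", "Alzheimer disease"]
syntactic = [[1.0, 0.6, 0.1, 0.2], [0.6, 1.0, 0.1, 0.1], [0.1, 0.1, 1.0, 0.7], [0.2, 0.1, 0.7, 1.0]]
semantic = [[1.0, 0.8, 0.2, 0.1], [0.8, 1.0, 0.1, 0.2], [0.2, 0.1, 1.0, 0.9], [0.1, 0.2, 0.9, 1.0]]

S = combine_similarities([syntactic, semantic], weights=[0.4, 0.6])
merges = average_linkage(S)
for group in cut_at_largest_gap(len(names), merges):
    print([names[i] for i in group])
# → ['liver disease', 'kidney disease'] then ['dementia', 'Alzheimer disease']
```

The merge history returned by `average_linkage` plays the role of the initial dendrogram; a different pruning rule would only replace `cut_at_largest_gap`.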

Description

Technical Field

[0001] The invention belongs to the field of ontology learning, and in particular relates to a text-oriented automatic learning method for domain classification relations.

Background Art

[0002] Although the practicality of domain ontologies has been widely recognized in biomedical research, there are still many obstacles to their effective use. A key requirement of a domain ontology is that it achieve high coverage of domain concepts and of the relationships between them. However, such ontologies are usually built manually, a time-consuming and error-prone process. Limited resources lead to missing concepts and relationships, and they also make it harder to update the ontology as knowledge changes. In addition, building an ontology requires the participation of domain experts. Even experts in the same field may not have the same knowledge of the knowle...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/35; G06F16/36; G06F16/367
Inventors: 李劲松 (Li Jinsong), 张桠童 (Zhang Yatong), 周天舒 (Zhou Tianshu), 田雨 (Tian Yu), 王昱 (Wang Yu)
Owner: ZHEJIANG UNIV