Cluster-based multi-label imbalance biomedical data classification method

A biomedical data classification technique applied in the multi-label field. It addresses classification performance errors on multi-label imbalanced biomedical data, improving the reliability of resampled data and reducing the probability of generating noisy data.

Active Publication Date: 2017-04-26
CHONGQING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

[0004] In view of this, the object of the present invention is to provide a clustering-based multi-label imbalanced biomedical data classificatio...

Embodiment Construction

[0020] The preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

[0021] Refer to Figure 1, a flow chart of the clustering-based multi-label imbalanced biomedical data classification method provided in this embodiment, which specifically includes:

[0022] 101: Define an association matrix for biomedical data according to feature similarity and label association.

[0023] A new clustering method is defined in the imbalanced multi-label data space. When clustering biomedical sample data, this method considers not only the similarity between features but also the associations within the multi-label space. An association matrix is then defined from the feature similarity and the multi-label associations.
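The excerpt does not give the formula used to combine the two quantities, so the following is only a minimal sketch under stated assumptions: feature similarity is taken to be cosine similarity, label association is taken to be the Jaccard similarity of the two label sets, and the two are blended with a hypothetical weight `alpha`. None of these choices or names come from the patent text.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two feature vectors (assumed feature-similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def jaccard_similarity(y1, y2):
    """Jaccard similarity of two 0/1 label vectors (assumed label-association measure)."""
    both = sum(1 for a, b in zip(y1, y2) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(y1, y2) if a == 1 or b == 1)
    # Two samples with no labels at all are treated as fully agreeing.
    return both / either if either else 1.0

def association_matrix(X, Y, alpha=0.5):
    """Symmetric n-by-n matrix blending feature similarity and label association.

    alpha is a hypothetical mixing weight: 1.0 uses features only, 0.0 labels only.
    """
    n = len(X)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = (alpha * cosine_similarity(X[i], X[j])
                 + (1.0 - alpha) * jaccard_similarity(Y[i], Y[j]))
            A[i][j] = A[j][i] = s
    return A

# Toy data: three samples, two features, three labels.
X = [[1, 0], [1, 0], [0, 1]]
Y = [[1, 0, 1], [1, 0, 0], [0, 1, 0]]
A = association_matrix(X, Y)
print(A[0][1])  # 0.75: identical features, half of the label union shared
```

With this weighting, two samples that look alike in feature space but carry different labels score lower than they would under feature similarity alone, which is the effect the paragraph above describes.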

[0024] The association matrix refers to the matrix obtained by comprehensively considering feature similarity and label...


Abstract

The invention relates to a clustering-based classification method for multi-label imbalanced biomedical data. The method includes the following steps: S101, defining an association matrix for the label-imbalanced data according to feature similarity and label association; S102, clustering the data according to the association matrix; S103, increasing the imbalanced labels within each cluster in a targeted manner; S104, training a multi-label classifier on the data in each cluster; and S105, combining the results of the classifiers according to a voting (polling) rule and predicting the labels. The data is clustered with a hierarchical clustering method, and label association is considered during clustering to reduce the label imbalance within each cluster. This improves the reliability of the new data generated by resampling and reduces the probability of generating noisy data.
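Steps S102 and S105 can be sketched as follows. This is an illustration under assumptions, not the patented procedure: the abstract names hierarchical clustering but not the linkage, so average linkage is assumed, and the combination rule is assumed to be a per-label strict majority. The function names and the `n_clusters` parameter are hypothetical. Steps S103 and S104 (resampling and per-cluster training) are not covered here.

```python
def agglomerative_clusters(sim, n_clusters):
    """Bottom-up clustering on a symmetric similarity matrix (average linkage assumed).

    Repeatedly merges the two clusters with the highest mean pairwise
    similarity until n_clusters remain. Returns lists of sample indices.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > n_clusters:
        best = None  # (average similarity, index a, index b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [sim[i][j] for i in clusters[a] for j in clusters[b]]
                avg = sum(pairs) / len(pairs)
                if best is None or avg > best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

def majority_vote(predictions):
    """Combine 0/1 label vectors from several classifiers, one vote per classifier.

    A label is predicted only if strictly more than half of the classifiers
    predict it; ties resolve to 0 (an assumption of this sketch).
    """
    n = len(predictions)
    q = len(predictions[0])
    return [1 if 2 * sum(p[r] for p in predictions) > n else 0 for r in range(q)]

# Toy similarity matrix: samples 0 and 1 are close, as are samples 2 and 3.
sim = [
    [1.00, 0.90, 0.10, 0.20],
    [0.90, 1.00, 0.15, 0.10],
    [0.10, 0.15, 1.00, 0.80],
    [0.20, 0.10, 0.80, 1.00],
]
print(agglomerative_clusters(sim, 2))                      # [[0, 1], [2, 3]]
print(majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]]))    # [1, 0, 1]
```

The pairwise search makes this cubic or worse in the number of samples, which is acceptable for a toy example; a library implementation of hierarchical clustering would be the practical choice for real data.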

Description

Technical field

[0001] The invention relates to the field of multi-label learning, in particular to a clustering-based multi-label imbalanced biomedical data classification method.

Background technique

[0002] Multi-label learning is a paradigm of supervised learning. Unlike binary classification, it allows more than two categories. Unlike multi-class classification, it also allows an object to belong to several categories at the same time. In multi-label classification, each sample carries multiple labels. The entire sample data set is labeled in a q-dimensional multi-label space; the feature vector of each data sample is denoted $x_i$, and its label vector is $d_i = \{d_{i1}, d_{i2}, \ldots, d_{iq}\}$, where $d_{ir} \in \{0, 1\}$ and $1 \le r \le q$. All samples share the same set of q labels; 1 indicates that the sample data contains the...


Application Information

IPC(8): G06K9/62, G06F19/00
CPC: G16H50/20, G06F18/231, G06F18/22, G06F18/24323
Inventor: 王进, 卜亚楠, 欧阳卫华, 谢水宁, 孙开伟, 张登峰, 王科, 李智星, 陈乔松, 邓欣, 胡峰, 雷大江
Owner: CHONGQING UNIV OF POSTS & TELECOMM