Medical dataset characteristic dimension reduction method based on subspace learning

A subspace learning technique for medical data, applied in the field of big data technology and machine learning. It addresses three problems of existing methods: the importance of global discriminant information is not considered, the optimal solution of some problems cannot be obtained by eigenvalue decomposition, and the computed eigenvectors can be inaccurate. The method is simple and practical to implement, convenient for feature selection, and has good robustness.

Pending Publication Date: 2019-10-22
NANJING UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

When the number of samples is much smaller than the feature dimension, the within-class scatter matrix used by the LDA method can be singular, which makes the computed eigenvectors inaccurate.
LPP, by contrast, preserves the local neighborhood structure of the data by constructing an adjacency graph over the sample points and computing edge weights, but it ignores global discriminant information, so its classification performance is poor.
In addition, some existing feature dimensionality reduction methods, such as PCA and LDA, are parameter-free and assume that the sample points follow a Gaussian distribution, so they are very sensitive to outliers.
[0005] At present, feature dimensionality reduction for high-dimensional data is usually modeled as an optimization problem whose solution often involves eigenvalue decomposition. However, some literature points out that the optimal solution of some of these problems cannot be obtained by eigenvalue decomposition.
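The first two points can be illustrated with a minimal NumPy sketch. It is not the patent's method: the function names, the random data, the neighbour count `k` and the heat-kernel width `t` are all assumptions chosen for illustration. The first function builds LDA's within-class scatter matrix and shows that its rank cannot exceed the number of samples minus the number of classes, so it is singular when samples are far fewer than features. The second builds the kind of weighted adjacency graph that LPP starts from.

```python
import numpy as np

def within_class_scatter(X, y):
    """Sum over classes of (x - class_mean)(x - class_mean)^T for the rows x of X."""
    d = X.shape[1]
    S_w = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        centered = Xc - Xc.mean(axis=0)
        S_w += centered.T @ centered
    return S_w

def heat_kernel_graph(X, k=3, t=1.0):
    """Symmetric k-nearest-neighbour weight matrix with heat-kernel weights."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        # Position 0 of the sorted row is the point itself (distance 0),
        # assuming no duplicate samples, so it is skipped.
        neighbours = np.argsort(sq[i])[1:k + 1]
        W[i, neighbours] = np.exp(-sq[i, neighbours] / t)
    return np.maximum(W, W.T)  # keep an edge if either endpoint selected it

rng = np.random.default_rng(0)
n, d = 10, 50                       # far fewer samples than features
X = rng.normal(size=(n, d))         # hypothetical data, not ARCENE
y = np.array([1] * 5 + [-1] * 5)

S_w = within_class_scatter(X, y)
rank = np.linalg.matrix_rank(S_w)   # bounded by n - 2, far below d = 50
W = heat_kernel_graph(X, k=3, t=float(d))
```

Because `rank` is below `d`, `S_w` has no inverse, which is why a plain LDA eigenproblem becomes unreliable in this regime.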

Examples

Embodiment

[0099] The feature dimensionality reduction method for medical data sets based on subspace learning according to the present invention comprises the following:

[0100] 1. Construct the original high-dimensional data matrix X and the label column according to the medical data set to be analyzed.

[0101] The data set used in this embodiment is the ARCENE data set, which is derived from mass spectrometry of human serum. The ARCENE data set contains 900 samples, each with 10000 features. The task is a binary classification problem that aims to distinguish people with cancer (labeled +1) from healthy people (labeled -1). The data set was formed by merging two prostate cancer data sets and one ovarian cancer data set from the National Cancer Institute (NCI) and Eastern Virginia Medical School (EVMS). The data has no missing values, and approximately 44% of the samples are positive. The dataset consists of three parts: a training dataset with 100 samples, a validation dat...
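Step 1 can be sketched as follows. This is a minimal illustration, assuming a text layout with one sample per line (whitespace-separated feature values) and one +1/-1 label per line. The function name and the tiny stand-in values are hypothetical, and the layout of the real ARCENE files should be checked before use.

```python
import numpy as np

def build_matrix_and_labels(data_lines, label_lines):
    """Parse whitespace-separated feature rows and one +1/-1 label per line."""
    X = np.array([[float(v) for v in line.split()]
                  for line in data_lines if line.strip()])
    y = np.array([int(line) for line in label_lines if line.strip()])
    if X.shape[0] != y.shape[0]:
        raise ValueError("number of samples and number of labels differ")
    return X, y

# Tiny stand-in for the real files (hypothetical values, 3 samples x 4 features).
demo_data = ["0 71 0 95", "12 0 41 82", "0 0 0 1"]
demo_labels = ["1", "-1", "1"]
X, y = build_matrix_and_labels(demo_data, demo_labels)
```

With real files, the two arguments would be the lines read from the feature file and the label file. For the training part described above, `X` would then have shape (100, 10000).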

Abstract

The invention discloses a feature dimensionality reduction method for medical data sets based on subspace learning. The method comprises the following steps: constructing the original high-dimensional data matrix X and the label column from the medical data set to be analyzed; constructing the optimization objective function and forming its Lagrangian function; calculating the global discriminant information and the local discriminant information from the original high-dimensional data matrix and the label column; iteratively solving for the transformation matrix Q until the objective function converges or the maximum number of iterations is reached, thereby obtaining the dimension-reduced data matrix; and training a model using the computed transformation matrix, then computing the AUC value and the classification accuracy to evaluate the dimension-reduced matrix. Compared with existing feature dimensionality reduction methods for medical data sets, the method of the invention uses the local discriminant information and the global discriminant information of the data simultaneously. It is suitable for feature dimensionality reduction problems of ordinary scale, and it also achieves relatively high classification accuracy when the number of features is far larger than the number of samples.
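The stopping rule and the evaluation step described above can be sketched in NumPy. This is an illustration under stated assumptions, not the patented algorithm: the objective function and the update rule for Q are not given in this text, so `update` and `objective` are caller-supplied placeholders. The hand-picked `Q`, the toy data, and the threshold classifier are hypothetical.

```python
import numpy as np

def iterate_until_converged(update, objective, Q0, tol=1e-6, max_iter=100):
    """Apply `update` until the objective changes by less than `tol`
    or `max_iter` iterations have run. Returns (Q, iterations_used)."""
    Q, prev = Q0, objective(Q0)
    for it in range(1, max_iter + 1):
        Q = update(Q)
        cur = objective(Q)
        if abs(prev - cur) < tol:
            return Q, it
        prev = cur
    return Q, max_iter

def auc_score(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = scores[labels == 1]
    neg = scores[labels == -1]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

def accuracy(pred, labels):
    """Share of predictions that match the labels."""
    return float(np.mean(pred == labels))

# Hypothetical data: 6 samples, 4 features, reduced to 1 dimension.
X = np.array([[2.0, 0, 1, 0], [3.0, 1, 0, 0], [2.5, 0, 0, 1],
              [0.0, 1, 0, 2], [1.0, 0, 1, 3], [0.5, 2, 0, 2]])
y = np.array([1, 1, 1, -1, -1, -1])
Q = np.array([[1.0], [0.0], [0.0], [-1.0]])      # hand-picked, not learned
Z = X @ Q                                        # dimension-reduced data matrix
scores = Z[:, 0]
pred = np.where(scores > scores.mean(), 1, -1)   # simple threshold classifier
demo_auc = auc_score(scores, y)
demo_acc = accuracy(pred, y)
```

In practice `Q` would come from `iterate_until_converged` with the method's own update rule, and the classifier would be trained on the training split and scored on held-out data.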

Description

Technical field

[0001] The invention belongs to the field of big data technology and machine learning, and in particular relates to a method for reducing the dimensionality of medical data set features based on subspace learning.

Background technique

[0002] Dimensionality reduction aims to transform high-dimensional data into low-dimensional data. Feature dimensionality reduction techniques emerged because machine learning problems in practical application scenarios generate large amounts of complex high-dimensional data. The running time of most data analysis tasks increases at least linearly with the data dimension, and storing and analyzing high-dimensional data requires substantial storage resources and computing time. Moreover, many data mining and machine learning tasks, such as classification, clustering and regression, only achieve good results in low-dimensional spaces, and it will be very diffic...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G16H50/70
CPC: G16H50/70
Inventors: 庾安妮, 徐雷
Owner: NANJING UNIV OF SCI & TECH