Medical dataset characteristic dimension reduction method based on subspace learning

A subspace learning technique for medical data, applied in the field of big data technology and machine learning. It addresses three problems in existing methods: the importance of global discriminant information is not considered, the optimal solution of some problems cannot be obtained by eigenvalue decomposition, and the computed eigenvectors can be inaccurate. The method is simple to implement, convenient for feature selection, and robust.

Pending Publication Date: 2019-10-22

NANJING UNIV OF SCI & TECH

Cites: 0; Cited by: 4


AI Technical Summary

Problems solved by technology

When the number of samples is much smaller than the feature dimension, the scatter matrix used by the LDA (linear discriminant analysis) method may be singular, which makes the computed eigenvectors inaccurate.
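The singularity described above can be checked numerically. The following is a minimal sketch, assuming samples are stored as rows and using hypothetical sizes (20 samples, 200 features) and random data; it is not taken from the patent. It builds the within-class scatter matrix and shows that its rank falls below the feature dimension, so the matrix cannot be inverted.

```python
import numpy as np

def within_class_scatter(X, y):
    """Sum of per-class scatter matrices for samples stored in the rows of X."""
    d = X.shape[1]
    S_w = np.zeros((d, d))
    for label in np.unique(y):
        Xc = X[y == label]
        centered = Xc - Xc.mean(axis=0)
        S_w += centered.T @ centered
    return S_w

rng = np.random.default_rng(0)
n_samples, n_features = 20, 200          # hypothetical sizes with n << d
X = rng.normal(size=(n_samples, n_features))
y = np.array([1] * 10 + [-1] * 10)       # two balanced classes

S_w = within_class_scatter(X, y)
rank = np.linalg.matrix_rank(S_w)

# The rank is bounded by (number of samples - number of classes),
# so it is far below n_features and S_w has no inverse.
print("rank of S_w:", rank, "feature dimension:", n_features)
```

Because the rank is limited by the number of samples, any step that inverts this matrix directly is ill-posed in the small-sample regime.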

By contrast, LPP (locality preserving projections) preserves the local neighborhood structure of the data by constructing an adjacency graph over the sample points and then computing edge weights. However, it does not take global discriminant information into account, so its classification performance is poor.
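The adjacency graph step can be sketched as follows. This is a minimal illustration under assumptions, not the patent's construction: it uses a k-nearest-neighbor graph with heat-kernel weights, a common choice for LPP, and the function name `knn_heat_kernel_graph` and the parameters `k` and `t` are hypothetical.

```python
import numpy as np

def knn_heat_kernel_graph(X, k=3, t=1.0):
    """Symmetric k-NN adjacency matrix with heat-kernel weights.

    Rows of X are samples. k (neighbors) and t (kernel width) are
    illustrative parameters, not values from the source document.
    """
    n = X.shape[0]
    sq_norms = (X ** 2).sum(axis=1)
    dist2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    dist2 = np.maximum(dist2, 0.0)       # guard against tiny negative values
    W = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(dist2[i])
        neighbors = [j for j in order if j != i][:k]
        W[i, neighbors] = np.exp(-dist2[i, neighbors] / t)
    return np.maximum(W, W.T)            # connect i and j if either is a neighbor

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))
W = knn_heat_kernel_graph(X, k=3, t=1.0)

D = np.diag(W.sum(axis=1))               # degree matrix
L = D - W                                # graph Laplacian used by LPP
```

Only pairwise distances enter the weights, which is why class labels and global discriminant structure play no role in this graph.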

In addition, some existing feature dimensionality reduction methods, such as PCA and LDA, are parameter-free methods that assume the sample points follow a Gaussian distribution, so they are very sensitive to outliers.
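The outlier sensitivity can be illustrated for PCA with a small experiment. This is a sketch on synthetic 2-D data chosen for illustration only; the data, the outlier position, and the helper name `first_principal_direction` are assumptions.

```python
import numpy as np

def first_principal_direction(X):
    """Unit vector of the leading principal component (rows of X are samples)."""
    centered = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return Vt[0]

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
# Clean data: spread along the first axis, almost no spread along the second.
clean = np.column_stack([x, 0.01 * rng.normal(size=50)])
# Same data plus a single extreme point along the second axis.
with_outlier = np.vstack([clean, [0.0, 50.0]])

pc_clean = first_principal_direction(clean)
pc_outlier = first_principal_direction(with_outlier)

print("leading direction without outlier:", np.round(pc_clean, 3))
print("leading direction with outlier:   ", np.round(pc_outlier, 3))
```

A single extreme sample is enough to rotate the leading direction from the first axis to the second, because the squared-error criterion lets that one point dominate the variance.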

[0005] At present, feature dimensionality reduction for high-dimensional data is usually modeled as an optimization problem whose solution often relies on eigenvalue decomposition. However, some literature points out that the optimal solution of certain problems cannot be obtained by eigenvalue decomposition.

Embodiment

[0099] The feature dimensionality reduction method for medical data sets based on subspace learning according to the present invention comprises the following:

[0100] 1. Construct the original high-dimensional data matrix X and the label column according to the medical data set to be analyzed.

[0101] The data set used in this embodiment is the ARCENE data set, which is derived from human serum mass spectrometry. The ARCENE data set contains 900 samples with a feature dimension of 10000. The task is a binary classification problem that aims to distinguish people with cancer (labeled +1) from healthy people (labeled -1). The data set was merged from two prostate cancer data sets and one ovarian cancer data set from the National Cancer Institute (NCI) and Eastern Virginia Medical School (EVMS). The data has no missing values, and approximately 44% of the samples are positive. The data set consists of three parts: a training dataset with 100 samples, a validation dat...
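Step 1 can be sketched as follows. This is a minimal illustration that assumes the data are available as whitespace-separated text with one sample per row and a separate label file with one +1/-1 label per line; the helper name `load_matrix_and_labels` and the tiny in-memory stand-in data are hypothetical and are not ARCENE values.

```python
import io
import numpy as np

def load_matrix_and_labels(data_file, label_file):
    """Read a sample-by-feature matrix X and a +1/-1 label column y."""
    X = np.loadtxt(data_file, dtype=float, ndmin=2)
    y = np.loadtxt(label_file, dtype=int, ndmin=1)
    if X.shape[0] != y.shape[0]:
        raise ValueError("number of samples and number of labels differ")
    if not np.isin(y, [-1, 1]).all():
        raise ValueError("labels must be +1 or -1")
    return X, y

# Tiny stand-in for the real files: 3 samples, 4 features (illustrative only).
data_text = io.StringIO("0 71 0 95\n12 0 3 0\n0 41 82 165\n")
label_text = io.StringIO("1\n-1\n1\n")

X, y = load_matrix_and_labels(data_text, label_text)
positive_fraction = float((y == 1).mean())
print(X.shape, y, positive_fraction)
```

With real files, the same function could be called with file paths instead of the in-memory buffers, since `np.loadtxt` accepts both.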

Abstract

The invention discloses a feature dimensionality reduction method for medical data sets based on subspace learning. The method comprises the following steps: constructing an original high-dimensional data matrix X and a label column from the medical data set to be analyzed; constructing an optimization objective function and solving its Lagrangian function; computing global discriminant information and local discriminant information from the original high-dimensional data matrix and the label column; iteratively solving for a transformation matrix Q until the objective function converges or the maximum number of iterations is reached, thereby obtaining a dimension-reduced data matrix; and training a model with the computed transformation matrix, then computing the AUC value and the classification accuracy to evaluate the dimension-reduced matrix. Compared with existing feature dimensionality reduction methods for medical data sets, the method uses the local discriminant information and the global discriminant information of the data simultaneously for dimensionality reduction. It is suitable for feature dimensionality reduction problems of ordinary scale, and it also achieves relatively high classification accuracy when the feature dimension of the data is far larger than the number of samples.
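The final evaluation step can be sketched as follows. This is an illustration under several assumptions that the abstract does not fix: samples are stored as rows, the projection is `X @ Q`, the classifier is a simple nearest-centroid rule, and the matrix Q is a placeholder taken from PCA rather than the patent's iterative solution. All names and the synthetic data are hypothetical.

```python
import numpy as np

def auc_score(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = scores[y_true == 1]
    neg = scores[y_true == -1]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

def evaluate_projection(X_train, y_train, X_test, y_test, Q):
    """Project with Q, classify by nearest class centroid, return (AUC, accuracy)."""
    Z_train, Z_test = X_train @ Q, X_test @ Q
    mu_pos = Z_train[y_train == 1].mean(axis=0)
    mu_neg = Z_train[y_train == -1].mean(axis=0)
    # Positive score means the sample is closer to the positive centroid.
    scores = (np.linalg.norm(Z_test - mu_neg, axis=1)
              - np.linalg.norm(Z_test - mu_pos, axis=1))
    y_pred = np.where(scores > 0, 1, -1)
    return auc_score(y_test, scores), float((y_pred == y_test).mean())

rng = np.random.default_rng(0)

def make_split(n, d=50):
    """Synthetic two-class data; only feature 0 carries class information."""
    y = np.array([1] * (n // 2) + [-1] * (n // 2))
    X = rng.normal(size=(n, d))
    X[:, 0] += 2.0 * y
    return X, y

X_train, y_train = make_split(40)
X_test, y_test = make_split(40)

# Placeholder Q: top-2 PCA directions (NOT the patent's transformation matrix).
centered = X_train - X_train.mean(axis=0)
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
Q = Vt[:2].T

auc, accuracy = evaluate_projection(X_train, y_train, X_test, y_test, Q)
print("AUC:", auc, "accuracy:", accuracy)
```

Any transformation matrix with the same shape could be passed as `Q`, so the same routine could compare different dimensionality reduction methods on equal terms.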

Description

Technical field

[0001] The invention belongs to the field of big data technology and machine learning, and in particular relates to a feature dimensionality reduction method for medical data sets based on subspace learning.

Background

[0002] Dimensionality reduction aims to transform high-dimensional data into low-dimensional data. Feature dimensionality reduction techniques emerged because machine learning problems arising in practical application scenarios produce large amounts of complex high-dimensional data. The running time of most data analysis tasks grows at least linearly with the data dimension, and storing and analyzing high-dimensional data requires a large amount of computer storage and computing time. Moreover, many data mining and machine learning tasks, such as classification, clustering and regression, only achieve good results in low-dimensional spaces, and it will be very diffic...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G16H50/70

CPC: G16H50/70

Inventor: 庾安妮, 徐雷

Owner: NANJING UNIV OF SCI & TECH
