Hierarchical important feature selection method based on clinical high-dimensional breast cancer data

A hierarchical feature selection technique for high-dimensional breast cancer data, applied in the computer field. It addresses the excessive dimensionality of clinical data, preserving model accuracy while reducing time and computing resource consumption, and is applicable to a wide range of scenarios.

Active Publication Date: 2018-12-07
UNIV OF ELECTRONICS SCI & TECH OF CHINA
9 Cites, 5 Cited by

AI Technical Summary

Problems solved by technology

[0013] The purpose of the present invention is to solve the problem of excessively high clinical data dimensionality when establishing a breast cancer survival prediction model. It uses a hierarchical feature selection method that combines statistical feature selection with ensemble (integrated) feature selection to extract the important features and keep the model practical.



Examples


Embodiment Construction

[0040] To make the purpose, technical solution and advantages of the present invention clearer, the invention is described in further detail below with reference to the embodiments and the accompanying drawings.

[0041] Referring to Figure 1, the hierarchical important feature selection method for high-dimensional clinical breast cancer data of the present invention comprises statistical feature calculation, ensemble (integrated) feature calculation, and the threshold-setting method used in the ensemble feature calculation. By combining statistical feature selection with ensemble feature selection in a hierarchical scheme, the invention addresses both important feature extraction and model practicability. The specific implementation process is as follows:

[0042] S1: Statistical feature selection.

[0043] Feature extraction and cleaning are performed on the original clinical data to obtain the original feature set F n; ...
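
As an illustration of the kind of univariate screening that step S1 describes, the following is a minimal sketch and not the patent's own procedure. It assumes binary features and a binary outcome, uses a Pearson chi-square test on a 2x2 table, and keeps features with p < 0.05. The test choice, the significance level `alpha`, the toy data, and the names `chi2_2x2` and `univariate_select` are all assumptions made for this example.

```python
import math


def chi2_2x2(feature, outcome):
    """Pearson chi-square statistic and p-value for two binary (0/1) sequences."""
    # Build the 2x2 contingency table: table[feature_value][outcome_value].
    table = [[0, 0], [0, 0]]
    for f, o in zip(feature, outcome):
        table[f][o] += 1
    n = len(feature)
    row = [sum(table[0]), sum(table[1])]
    col = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            if expected == 0:
                # Degenerate table (a constant column): no evidence of association.
                return 0.0, 1.0
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def univariate_select(features, outcome, alpha=0.05):
    """Keep the features whose association with the outcome has p < alpha (assumed cutoff)."""
    selected = []
    for name, values in features.items():
        _, p_value = chi2_2x2(values, outcome)
        if p_value < alpha:
            selected.append(name)
    return selected


# Toy data (hypothetical): 10 patients with the outcome, 10 without.
outcome = [1] * 10 + [0] * 10
features = {
    "feature_a": [1] * 9 + [0] + [0] * 9 + [1],  # tracks the outcome closely
    "feature_b": [1, 0] * 10,                    # evenly split in both groups
}
selected = univariate_select(features, outcome)
print(selected)
```

In practice the test would depend on the variable type (for example a t-test or rank test for continuous features), which is consistent with the abstract's mention of "different statistics".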


Abstract

The invention discloses a hierarchical important feature selection method based on clinical high-dimensional breast cancer data. The method comprises two steps: statistical feature selection and ensemble (integrated) feature selection. In the statistical feature selection step, univariate (single-factor) analysis is employed: features with a significant impact on the outcome variables are selected preliminarily using different statistics and hypothesis tests. In the ensemble feature selection step, a gradient boosting tree is built; after model training, a feature importance score is obtained for each feature, and a designed and verified importance score threshold is used to select the features with a significant impact on the outcome variables. This effectively addresses the problems of excessive feature dimensionality, excessive redundant features and disordered data in clinical breast cancer prediction modeling. Redundant or meaningless features in high-dimensional clinical breast cancer data are excluded, so that a small number of features with an important impact on breast cancer modeling are selected and the accuracy and practicability of the breast cancer model are ensured.
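
The ensemble step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation. It uses scikit-learn's `GradientBoostingClassifier` as the gradient boosting tree and its `feature_importances_` attribute as the importance score. The threshold value 0.05, the hyperparameters, the synthetic data, and the helper name `select_by_importance` are hypothetical; the source says only that the threshold is "designed and verified" and does not state its value here.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier


def select_by_importance(X, y, feature_names, threshold=0.05, random_state=0):
    """Train a gradient boosting model and keep features scoring at or above the threshold."""
    model = GradientBoostingClassifier(
        n_estimators=50, max_depth=3, random_state=random_state
    )
    model.fit(X, y)
    # Impurity-based importances, normalized by scikit-learn to sum to 1.
    scores = model.feature_importances_
    kept = [name for name, s in zip(feature_names, scores) if s >= threshold]
    return kept, dict(zip(feature_names, scores))


# Synthetic data (hypothetical): only the first column determines the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 0] > 0).astype(int)
names = [f"f{i}" for i in range(5)]

kept, scores = select_by_importance(X, y, names)
print(kept)
```

In a two-stage pipeline, `X` would contain only the columns that survived the statistical step, so the boosted model is trained on an already reduced feature set.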

Description

Technical field

[0001] The invention relates to the fields of computer technology, statistical machine learning, feature engineering and the like.

Background technique

[0002] Breast cancer is the malignant tumor with the highest incidence among women worldwide and seriously threatens women's health. Breast cancer patients are usually treated with surgery, chemotherapy and other measures, and may face the risk of recurrence at any time after treatment. Scientifically assessing and predicting the survival status of breast cancer patients can help doctors formulate appropriate treatment plans, providing new support for reducing the risk of recurrence and improving prognosis.

[0003] To assess and predict the survival status of breast cancer patients, such as the recurrence-free survival rate, a machine learning prediction model can be built from breast cancer clinical data. However, cli...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G16H50/20, G16H50/30, G16H50/70
CPC: G16H50/20, G16H50/30, G16H50/70
Inventor: 付波刘沛林劼郑鸿邓玲
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA