
Method for evaluating sample set division quality based on data set distance

A sample set and data set technology, applied in the field of biomedicine, that addresses problems such as large variation in error values across data sets and declining prediction performance on the test set.

Status: Inactive. Publication date: 2020-05-22
BEIJING CHINESE MEDICINE HOSPITAL AFFILIATED CAPITAL MEDICAL UNIV
Cites: 0. Cited by: 3.

AI Technical Summary

Problems solved by technology

However, a decline in prediction performance on the test set cannot be attributed entirely to a difference in distribution between the training set and the test set, and the error value varies greatly across data sets.



Embodiment Construction

[0036] Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and its scope fully conveyed to those skilled in the art.

[0037] As shown in Figure 1, a method for evaluating the quality of sample set division based on data set distance, provided by an embodiment of the present invention, includes:

[0038] 1) Using a sample division method, the sample set is divided into two independent, non-overlapping subsets: a first training set and a first test set. The sample division method does not include random division; ...
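Step 1 excludes random division but, in the text available here, does not name the division method that is used. As a hypothetical illustration only, the sketch below implements one common variant of the Kennard-Stone algorithm, a deterministic division method widely used in chemometrics: it seeds the training set with the two most distant samples and then repeatedly adds the sample farthest from the current training set. The function name `kennard_stone_split` and the use of Euclidean distance are assumptions, not details taken from the patent.

```python
import math
from typing import List, Sequence, Tuple


def kennard_stone_split(
    samples: Sequence[Sequence[float]], n_train: int
) -> Tuple[List[int], List[int]]:
    """Deterministic (non-random) split of sample indices into training and test sets.

    Hypothetical helper illustrating the Kennard-Stone max-min rule;
    it is an assumed example, not the patent's own division method.
    """
    n = len(samples)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and len(samples)")
    # Pairwise Euclidean distances between samples (assumed metric).
    dist = [[math.dist(a, b) for b in samples] for a in samples]
    # Seed the training set with the two most distant samples.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    train = [first, second]
    remaining = [k for k in range(n) if k not in train]
    # Distance from each remaining sample to its nearest training sample.
    nearest = {k: min(dist[k][first], dist[k][second]) for k in remaining}
    while len(train) < n_train:
        # Max-min rule: add the sample farthest from the current training set.
        pick = max(remaining, key=lambda k: nearest[k])
        train.append(pick)
        remaining.remove(pick)
        for k in remaining:
            nearest[k] = min(nearest[k], dist[k][pick])
    # Unselected samples form the test set, so the two subsets never overlap.
    return train, remaining


train, test = kennard_stone_split([[0.0], [1.0], [2.0], [3.0], [10.0]], n_train=3)
print(train, test)  # [0, 4, 3] [1, 2]
```

Because every sample not chosen for training goes to the test set, the two subsets are disjoint by construction, which matches the non-overlapping requirement of step 1.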


Abstract

The invention discloses a method for evaluating the division quality of a sample set based on a data set distance. The method overcomes the difficulty of quantifying and evaluating division quality with conventional error-based analysis, and adheres closely to the basic assumption that the training set and the test set must be independent of each other and drawn from the same distribution. The method comprises the steps of: estimating the mean and variance of a sample set by decomposing the distance matrix between samples; calculating the distance between the distributions of the training set and the test set; and estimating a probability distribution from the distances obtained by random sampling, calculating the probabilities of different partitions, and using precise quantitative indices to evaluate the quality of a data partition or the suitability of a partition method for specific data. While remaining simple and practical, the method evaluates the effectiveness of sample set division methods, helping researchers in the biomedical field select an appropriate data division method and clarify the true generalization performance of a modeling method.
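The abstract's steps can be illustrated with a minimal sketch, assuming Euclidean distances between samples. For a set of n samples, the mean squared distance to the centroid (the variance) equals the sum of all squared pairwise distances divided by 2n², and the squared distance between two centroids equals the mean squared cross-distance minus both variances, so mean and variance information can be recovered from the distance matrix alone. For the distance between the training and test distributions, the sketch substitutes the energy distance, and it reads the "probability of a partition" as the fraction of equal-sized random splits whose distance is at least as large as the observed one. Both choices, and all function names, are assumptions rather than the patent's stated formulas.

```python
import math
import random
from typing import List, Sequence

# Sketch under assumptions: Euclidean distances, energy distance as the
# between-set distance, tail fraction of random splits as the "probability".
Matrix = List[List[float]]


def distance_matrix(samples: Sequence[Sequence[float]]) -> Matrix:
    """Pairwise Euclidean distances between samples."""
    return [[math.dist(a, b) for b in samples] for a in samples]


def block_mean(d: Matrix, rows: Sequence[int], cols: Sequence[int], power: int = 1) -> float:
    """Mean of d[i][j] ** power over the rows x cols block
    (the zero diagonal is included when rows and cols coincide)."""
    total = sum(d[i][j] ** power for i in rows for j in cols)
    return total / (len(rows) * len(cols))


def set_variance(d: Matrix, idx: Sequence[int]) -> float:
    """Mean squared distance to the set centroid, from distances only:
    var = sum_{i,j} d_ij^2 / (2 n^2)."""
    return block_mean(d, idx, idx, power=2) / 2


def centroid_shift_sq(d: Matrix, a: Sequence[int], b: Sequence[int]) -> float:
    """Squared distance between the centroids (means) of sets a and b:
    mean of d^2 over a x b, minus var(a), minus var(b)."""
    return block_mean(d, a, b, power=2) - set_variance(d, a) - set_variance(d, b)


def energy_distance(d: Matrix, a: Sequence[int], b: Sequence[int]) -> float:
    """Energy distance (V-statistic) between the empirical distributions of a and b."""
    return 2 * block_mean(d, a, b) - block_mean(d, a, a) - block_mean(d, b, b)


def split_probability(d: Matrix, train: Sequence[int], test: Sequence[int],
                      n_draws: int = 1000, seed: int = 0) -> float:
    """Hypothetical index: fraction of random splits with the same sizes whose
    train/test energy distance is at least that of the given split
    (a small value flags an unusual split). Assumes train + test cover all samples."""
    rng = random.Random(seed)
    observed = energy_distance(d, train, test)
    indices = list(range(len(d)))
    hits = 0
    for _ in range(n_draws):
        rng.shuffle(indices)
        r_train, r_test = indices[:len(train)], indices[len(train):]
        # Small tolerance so ties with the observed value are counted.
        if energy_distance(d, r_train, r_test) >= observed - 1e-12:
            hits += 1
    return hits / n_draws


# Ten one-dimensional samples; the test set takes the three largest values.
d = distance_matrix([[float(v)] for v in range(10)])
train, test = list(range(7)), [7, 8, 9]
print(round(set_variance(d, train), 6))             # 4.0
print(round(centroid_shift_sq(d, train, test), 6))  # 25.0
print(split_probability(d, train, test))
```

In this toy example the test set lies entirely beyond the training range, so very few random splits are as extreme and the returned fraction is small; a split whose test set resembles the training set would typically yield a much larger fraction.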

Description

Technical Field

[0001] The invention relates to the field of biomedical technology, and in particular to a method for evaluating the division quality of a sample set based on the distance between data sets.

Background Art

[0002] Sample division plays an important role in the biomedical field. Its purpose is to generate a test set with which to estimate the generalization ability of a model. A model is a formal expression of the relationships in the data, and it makes predictions for unknown samples possible. The training error reflects the learning ability of the modeling method, whereas the ability to predict unknown samples is the goal of modeling. Therefore, when training a model, the data set is generally required to be large enough to cover the scope of future applications. When estimating the generalization ability of the model, the training set is also required to be consistent with the model training range, that is, the tw...


Application Information

Patent type and authority: Application (China)
IPC(8): G06Q10/06; G06K9/62
CPC: G06Q10/06395; G06F18/2193; G06F18/214
Inventors: Lin Zhaozhou (林兆洲), Wang Daqian (王大仟), Zhang Jinxia (张金霞), Guan Zhujun (关竹君), Jiang Di (姜迪)
Owner: BEIJING CHINESE MEDICINE HOSPITAL AFFILIATED CAPITAL MEDICAL UNIV