Data classification method and system
A data classification technology, applied in the fields of electrical digital data processing, special-purpose data processing applications, instruments, etc., that can solve problems such as low modeling accuracy, reduced sample classification accuracy, and inaccurate estimation of regression coefficients.
Examples
Embodiment 1
[0034] Figure 1 shows the flow chart of the data classification method provided by the embodiment of the present invention. The method includes:
[0035] S101: For each sample variable, calculate its correlation coefficient with a preset target variable, and its partial correlation coefficient with the target variable conditioned on the other sample variables;
[0036] The formula for calculating the correlation coefficient is:
[0037] $$\varphi_{YX_1} = \sum_{i=1}^{N} \left( X_{1i} - \bar{X}_1 \right) ( \ldots$$
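The formula above is cut off in the source, but its numerator matches the Pearson correlation coefficient. The following is a minimal sketch of S101 under two assumptions: that the correlation coefficient is the ordinary Pearson coefficient, and that the partial correlation "under other sample variable conditions" controls for all remaining sample variables, computed here from the inverse of the correlation matrix (a standard technique, not confirmed as the patent's own). The function names `pearson`, `invert`, and `partial_correlations` are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences (assumed form of [0037])."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def invert(matrix: List[List[float]]) -> List[List[float]]:
    """Gauss-Jordan inversion with partial pivoting (hypothetical helper)."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(y: Sequence[float], xs: Sequence[Sequence[float]]) -> List[float]:
    """Partial correlation of y with each xs[j], controlling for all other xs (assumption)."""
    variables = [y] + list(xs)
    k = len(variables)
    corr = [[pearson(variables[i], variables[j]) for j in range(k)] for i in range(k)]
    prec = invert(corr)  # precision (inverse correlation) matrix
    # Standard identity: rho_{0j . rest} = -P[0][j] / sqrt(P[0][0] * P[j][j])
    return [-prec[0][j] / math.sqrt(prec[0][0] * prec[j][j]) for j in range(1, k)]
```

With a single sample variable there is nothing to condition on, so the partial correlation reduces to the plain correlation; with two variables it agrees with the textbook first-order formula $(r_{YX_1} - r_{YX_2} r_{X_1X_2}) / \sqrt{(1 - r_{YX_2}^2)(1 - r_{X_1X_2}^2)}$.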
Embodiment 2
[0071] Figure 2 shows a flow chart of Embodiment 2 of the data classification method of the present invention, in which the sample variables in the original sample set are screened and filled before the segmentation variables are selected. Embodiment 2 includes the following steps:
[0072] S201: Calculate the missing ratio of each sample variable in the original sample set, and select the sample variables whose missing ratio meets the missing-ratio condition;
[0073] S202: Calculate the mean of each selected sample variable, and fill that variable's missing values with its mean;
[0074] The missing-ratio condition is that a variable's missing ratio is not greater than 30%. This condition is not fixed; it can be set according to how the sample variables are actually missing. The follo...
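Steps S201 and S202 can be sketched as follows. This is a minimal illustration, assuming each sample variable is stored as a list in which `None` marks a missing value, and using the 30% threshold from [0074] as the default; the function name `select_and_fill` and its parameters are hypothetical.

```python
from typing import Dict, List, Optional


def select_and_fill(
    samples: Dict[str, List[Optional[float]]],
    max_missing_ratio: float = 0.30,  # threshold from [0074]; adjustable per data set
) -> Dict[str, List[float]]:
    """Keep variables whose missing ratio is <= max_missing_ratio, then mean-fill them."""
    filled: Dict[str, List[float]] = {}
    for name, values in samples.items():
        # S201: missing ratio = number of missing entries / number of samples
        ratio = sum(v is None for v in values) / len(values)
        if ratio > max_missing_ratio:
            continue  # variable fails the missing-ratio condition and is dropped
        # S202: mean of the observed values, used to fill the gaps
        present = [v for v in values if v is not None]
        mean = sum(present) / len(present)
        filled[name] = [mean if v is None else v for v in values]
    return filled
```

For example, a variable with one missing entry out of four (25%) is kept and filled, while one with two out of four (50%) is dropped.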
Embodiment 3
[0084] After the training subsets have been modeled one by one to produce a model describing the data, the model's prediction effect must also be evaluated to determine whether the model has achieved its best prediction effect. Therefore, after the sample variables in the test subset are classified, the method further includes a process for judging the model's prediction effect, shown in Figure 3, which includes:
[0085] S301 to S311: the same as steps S201 to S211 in Embodiment 2;
[0086] S312: Judge whether the model has achieved the best prediction effect; if so, execute S313; otherwise, execute S314;
[0087] Specifically, this step includes the following sub-steps, shown in Figure 4:
[0088] S3121: From the probability values calculated in step S311, obtain the probability that the target variable takes the value 1;
[0089] S3122: Merge these probability values and sort them in descending order of magnitude;
[0090] For example: the pr...
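Sub-steps S3121 and S3122 can be sketched as below. The later steps that use the sorted list are not shown in the source, so this only covers extraction, merging, and sorting. It assumes, hypothetically, that step S311 yields for each test sample a mapping from target value (0 or 1) to predicted probability, and that "merge" means pooling these values across one or more test subsets into a single list; the name `sorted_positive_probabilities` is illustrative.

```python
from typing import Dict, List, Sequence


def sorted_positive_probabilities(
    subset_probs: Sequence[Sequence[Dict[int, float]]],
) -> List[float]:
    """Pool P(target = 1) over all samples of all subsets and sort in descending order."""
    merged: List[float] = []
    for subset in subset_probs:
        for class_probs in subset:
            # S3121: keep only the probability that the target variable equals 1
            merged.append(class_probs[1])
    # S3122: sort from large to small
    merged.sort(reverse=True)
    return merged
```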