Input data processing method and device for categorical data mining model

An input data processing technique for classification data mining models, applied in the field of data processing, that addresses problems such as low efficiency and limited adoption.

Inactive Publication Date: 2018-08-14
BANK OF CHINA

AI Technical Summary

Problems solved by technology

A typical data mining workflow includes data acquisition, data processing, model calculation, model deployment, and other steps. However, in the traditional data mining process, the steps of data conversion, processing, inspection, and screening are mostly performed manually...


Examples

Embodiment 2

[0097] A chi-square test is performed on each variable, and variables that fail the test are eliminated;

[0098] The correlation coefficient between each variable and the target variable is calculated;

[0099] The variables are ranked by correlation coefficient, and the top N variables with the highest coefficients are selected, where N ≥ 1.
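The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the chi-square test is assumed to be a test of independence between each (already binned) variable and a categorical target, the 0.05 significance level is an assumed default, the correlation is assumed to be Pearson's, and variables are ranked by the absolute value of the coefficient. The names `chi2_sf`, `chi2_p_value`, `pearson`, and `screen_variables` are hypothetical helpers introduced for this example.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)               # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # closed form for df = 1
    while a < df / 2.0:                        # recurrence adds 2 df per step
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def chi2_p_value(xs, ys):
    """p-value of a chi-square independence test (no continuity correction)."""
    n = len(xs)
    rows, cols, cells = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    stat = 0.0
    for r, r_count in rows.items():
        for c, c_count in cols.items():
            expected = r_count * c_count / n
            stat += (cells[(r, c)] - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(cols) - 1)
    return chi2_sf(stat, df) if df > 0 else 1.0


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def screen_variables(columns, target, n_keep, alpha=0.05):
    """Drop variables failing the chi-square test, then keep the top n_keep
    by absolute correlation with the target (alpha is an assumed default)."""
    kept = {name: values for name, values in columns.items()
            if chi2_p_value(values, target) < alpha}
    ranked = sorted(kept, key=lambda name: abs(pearson(kept[name], target)),
                    reverse=True)
    return ranked[:n_keep]


# Toy data: "a" matches the target, "b" mostly matches, "c" is unrelated.
target = [0] * 10 + [1] * 10
columns = {
    "a": [0] * 10 + [1] * 10,
    "b": [0] * 8 + [1] * 2 + [0] * 2 + [1] * 8,
    "c": [0, 1] * 10,
}
print(screen_variables(columns, target, n_keep=2))  # ['a', 'b']
```

In this toy data, "c" is eliminated by the chi-square step because its distribution is identical across both target classes, and the remaining variables are ordered by correlation strength.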

[0100] In this embodiment, the correlation coefficient measures the degree of linear correlation between the variables under study and is generally denoted by the letter r. Depending on the object of study, the correlation coefficient can be defined in several ways; the Pearson correlation coefficient is commonly used.

[0101] For example, the correlation coefficient can be calculated using the following formula (3):

[0102] $$r = \frac{\operatorname{Cov}(X,Y)}{\sqrt{\operatorname{Var}[X]\,\operatorname{Var}[Y]}} \qquad (3)$$

[0103] where X and Y are two different variables, Cov(X,Y) is the covariance of X and Y, Var[X] is the variance of X, and Var[Y] is the variance of Y.
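As a small illustration of the Pearson definition given here (covariance divided by the square root of the product of the two variances), the following Python sketch computes r directly from those quantities. The function names `covariance`, `variance`, and `pearson_r` are hypothetical, and population (divide-by-n) moments are assumed; the choice of n or n-1 cancels in the ratio.

```python
import math


def covariance(xs, ys):
    """Population covariance Cov(X, Y) of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def variance(xs):
    """Population variance Var[X], computed as Cov(X, X)."""
    return covariance(xs, xs)


def pearson_r(xs, ys):
    """r = Cov(X, Y) / sqrt(Var[X] * Var[Y])."""
    denom = math.sqrt(variance(xs) * variance(ys))
    if denom == 0:
        raise ValueError("r is undefined when either variable is constant")
    return covariance(xs, ys) / denom


print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # perfectly linear, r close to 1
print(pearson_r([1, 2, 3], [1, 3, 2]))               # approximately 0.5
```

A value of r near 1 or -1 indicates a strong linear relationship, while a value near 0 indicates little linear relationship between the two variables.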

[0104] In this embodiment, the...

Abstract

The invention discloses an input data processing method and device for a categorical data mining model. The method comprises the following steps: receiving data uploaded by a user and preprocessing the data; converting character-type data into numeric data; binning each continuous variable; calculating a preset index value for each variable and, according to that index, screening for the variables most relevant to a preset target variable; and standardizing the data. The processed data can then be used for operations such as data modeling and subsequent classification scoring. Thus, once the data uploaded by the user has been received, the input data of the categorical data mining model can be processed automatically without data analysts, which automates the data processing stage of the data mining process. Moreover, the operation is simple, and operators do not need professional data analysis experience.

Description

Technical field

[0001] The invention relates to the field of data processing, and in particular to a method and device for processing input data of a classification data mining model.

Background art

[0002] In recent years, with the development of big data technology, data mining technology has become increasingly mature. Data mining generally refers to methods that use algorithms to uncover hidden information in large amounts of data. It is widely used in many industries, such as finance, communications, transportation, large-scale retail, and insurance. A typical data mining workflow includes data acquisition, data processing, model calculation, model deployment, and other steps. However, in the traditional data mining process, the steps of data conversion, processing, inspection, and screening are mostly performed manually, which is inefficient and requires professional da...

Application Information

IPC(8): G06F17/30
CPC: G06F16/215, G06F16/2465
Inventor: 陈丹, 蒋诗伟, 许佳, 顾玉莲
Owner: BANK OF CHINA