Continuous attribute discretization method based on Chi2 statistics

A discretization technique for continuous attributes, applied in complex mathematical operations and related fields. It addresses problems such as inconsistency, achieves high precision, and avoids unfair interval selection.

Status: Inactive. Publication date: 2010-07-14
DALIAN UNIV OF TECH
Cites: 0; Cited by: 4

AI Technical Summary

Problems solved by technology

[0016] 3) I...

Method used

Embodiment Construction

[0037] 1. The specific process of the Rec-Chi2 algorithm is as follows:

[0038] Step1: Initialization. Set the significance level α = 0.5 and calculate the inconsistency rate Incon_rate of the information system.
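
Incon_rate is not defined in this excerpt. The sketch below assumes it is the inconsistency rate used by the original Chi2 algorithm: instances whose (discretized) attribute values all match form a group, each group contributes its size minus its majority-class count, and the total is divided by the number of instances. The function name `inconsistency_rate` is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence, Tuple


def inconsistency_rate(rows: Sequence[Sequence[Hashable]],
                       labels: Sequence[Hashable]) -> float:
    """Inconsistency rate in the style of the original Chi2 algorithm.

    ASSUMPTION: the patent's Incon_rate follows this definition.
    rows   -- one tuple of (discretized) attribute values per instance
    labels -- the class label of each instance
    """
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    if not rows:
        return 0.0
    # Group instances that share exactly the same attribute values.
    groups: Dict[Tuple[Hashable, ...], Counter] = defaultdict(Counter)
    for row, label in zip(rows, labels):
        groups[tuple(row)][label] += 1
    # Each group contributes its size minus the count of its majority class.
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return inconsistent / len(rows)


# The first three instances share (0, 1) but disagree on the class.
rows = [(0, 1), (0, 1), (0, 1), (1, 0)]
labels = ["a", "a", "b", "a"]
print(inconsistency_rate(rows, labels))  # 0.25
```

Under this definition, merging intervals can only keep the rate the same or raise it, which is why Step3 treats an increase as the signal to undo a merge.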

[0039] Step2: Sort the data of each attribute and calculate the χ² value of every pair of adjacent intervals according to formula (1); find the χ² value corresponding to the current significance level α, then calculate the difference D′.
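
Formula (1) is not reproduced in this excerpt. Assuming it is the usual ChiMerge/Chi2 statistic for two adjacent intervals, χ² = Σ_{i=1..2} Σ_{j=1..k} (A_ij − E_ij)² / E_ij with E_ij = R_i · C_j / N, it can be computed as sketched below. Here A_ij is the number of instances of class j in interval i, R_i and C_j are the row and column totals, and N is the total count. The 0.1 substitute for a zero E_ij is the original ChiMerge convention, not one of the patent's two E_ij improvements, which this excerpt does not describe. The function name is hypothetical.

```python
from typing import Sequence


def chi2_adjacent(counts_a: Sequence[float], counts_b: Sequence[float]) -> float:
    """Chi-square statistic for two adjacent intervals (ChiMerge form).

    counts_a, counts_b -- class frequencies A_1j and A_2j over the same k classes.
    ASSUMPTION: formula (1) in the patent is this standard statistic.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both intervals must use the same class list")
    n = sum(counts_a) + sum(counts_b)                          # N
    if n == 0:
        raise ValueError("intervals must contain at least one instance")
    col_totals = [a + b for a, b in zip(counts_a, counts_b)]  # C_j
    stat = 0.0
    for row in (counts_a, counts_b):
        r = sum(row)                                           # R_i
        for a_ij, c_j in zip(row, col_totals):
            e_ij = r * c_j / n                                 # expected frequency E_ij
            if e_ij == 0:
                e_ij = 0.1                                     # ChiMerge convention for empty cells
            stat += (a_ij - e_ij) ** 2 / e_ij
    return stat


print(chi2_adjacent([10, 0], [0, 10]))  # 20.0 (class distributions differ completely)
print(chi2_adjacent([5, 5], [5, 5]))    # 0.0 (identical distributions)
```

A low value means the two intervals have similar class distributions and are good merge candidates; in Chi2-style methods the value is compared against the χ² quantile for the current α with k − 1 degrees of freedom.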

[0040] Step3: Merge

[0041] while (there are intervals that can be merged)

[0042] {Find the breakpoint with the largest D′ and merge the two intervals adjacent to it;

[0043] if(Incon_rate increases)

[0044] {Undo the merge; goto Step4;}

[0045] else goto Step2;

[0046] }
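
The merge loop of Step3 can be sketched for a single attribute as follows. This is a sketch under several assumptions: each interval is a (lower_bound, class_counts) pair, χ² uses the standard ChiMerge form, and because the excerpt does not give the formula for D′, the hypothetical helper `difference_stand_in` simply uses the gap between the χ² threshold and the χ² value. The callback `incon_increases`, also hypothetical, stands in for recomputing Incon_rate over the whole information system.

```python
from typing import Callable, List, Sequence, Tuple

Interval = Tuple[float, List[int]]  # (lower bound, class counts)


def chi2_adjacent(a: Sequence[int], b: Sequence[int]) -> float:
    """Standard ChiMerge chi-square for two adjacent intervals (assumed formula (1))."""
    n = sum(a) + sum(b)
    cols = [x + y for x, y in zip(a, b)]
    stat = 0.0
    for row in (a, b):
        r = sum(row)
        for a_ij, c_j in zip(row, cols):
            e_ij = r * c_j / n            # expected frequency E_ij
            if e_ij == 0:
                e_ij = 0.1                # ChiMerge convention for empty cells
            stat += (a_ij - e_ij) ** 2 / e_ij
    return stat


def difference_stand_in(chi2_value: float, threshold: float) -> float:
    """HYPOTHETICAL placeholder for D'; the patent's formula is not in this excerpt."""
    return threshold - chi2_value


def merge_step(intervals: Sequence[Interval], threshold: float,
               incon_increases: Callable[[List[Interval]], bool]) -> List[Interval]:
    current = [(lo, list(c)) for lo, c in intervals]
    while len(current) > 1:
        # Step2: recompute D' for every adjacent pair.
        diffs = [difference_stand_in(chi2_adjacent(current[i][1], current[i + 1][1]), threshold)
                 for i in range(len(current) - 1)]
        i = max(range(len(diffs)), key=diffs.__getitem__)
        if diffs[i] <= 0:
            break  # no adjacent pair falls below the threshold: nothing left to merge
        merged = [x + y for x, y in zip(current[i][1], current[i + 1][1])]
        candidate = current[:i] + [(current[i][0], merged)] + current[i + 2:]
        if incon_increases(candidate):
            break  # "undo the merge": keep `current` and hand over to Step4
        current = candidate  # accept the merge and go back to Step2
    return current


data = [(0.0, [5, 0]), (1.0, [5, 0]), (2.0, [0, 5]), (3.0, [0, 5])]
print(merge_step(data, threshold=3.84, incon_increases=lambda cand: False))
# [(0.0, [10, 0]), (2.0, [0, 10])]
```

Because D′ is recomputed for every adjacent pair after each merge and only the single largest is merged, no region of the value range is favored by scan order, which matches the fairness goal stated in the abstract.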

[0047] Step4: if (α is the last level)

[0048] {Exit the program, discretization is complete;}

[0049] else {α₀ = α; lower α to the next significance level; goto Step2;}
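
Steps 2 to 4 amount to an outer loop over decreasing significance levels. The sketch below transcribes Step4's control flow literally. Only the starting level 0.5 comes from Step1; the rest of `HYPOTHETICAL_LEVELS` is an assumed schedule, and `run_step2_and_step3` is a hypothetical callback that performs Step2 and Step3 at a given α. The function returns α₀, the level recorded just before the last one, which Step5 later reuses (α = α₀).

```python
from typing import Callable, List, Optional, Sequence

# Only 0.5 is given in Step1; the remaining levels are an ASSUMED schedule.
HYPOTHETICAL_LEVELS = (0.5, 0.25, 0.1, 0.05, 0.025, 0.01, 0.005, 0.001)


def run_significance_levels(run_step2_and_step3: Callable[[float], None],
                            levels: Sequence[float] = HYPOTHETICAL_LEVELS) -> Optional[float]:
    """Literal transcription of Step4's control flow.

    Returns alpha_0, the level recorded before the final one
    (None if only one level is given).
    """
    if not levels:
        raise ValueError("at least one significance level is required")
    alpha_0: Optional[float] = None
    for index, alpha in enumerate(levels):
        run_step2_and_step3(alpha)          # Step2 + Step3 at the current alpha
        if index == len(levels) - 1:        # Step4: alpha is the last level
            break                           # discretization complete
        alpha_0 = alpha                     # alpha_0 = alpha; then lower alpha
    return alpha_0


visited: List[float] = []
print(run_significance_levels(visited.append, levels=(0.5, 0.1, 0.01)))  # 0.1
print(visited)  # [0.5, 0.1, 0.01]
```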

[0050] Step5: Discretization of a single attribute.

[0051] For each attribute

[0052] {Calculate the difference D′;

[0053] α = α₀;

[0054] Flag = 0;

[0055] ...

Abstract

The invention relates to a continuous attribute discretization method based on Chi2 statistics, in the field of data mining. The Chi2 series of algorithms is analyzed and the discretization criterion is redefined, so that continuous attributes are discretized more reasonably and effectively. Interval merging based on the sequence of differences serves as the criterion, giving every pair of adjacent intervals a fair chance of being merged and thereby resolving the unfairness problem. In addition, the imprecision of the E_ij value in the Chi2 statistic is analyzed, and two improvement schemes are presented. The advantages of the method are that information is extracted from the raw data efficiently, the discretization process is more precise, and higher accuracy is achieved in machine learning.

Description

Technical field

[0001] The invention belongs to the field of data mining and relates to a continuous attribute discretization method based on χ² statistics, specifically the Chi2 series of algorithms grounded in probability and statistics.

Background art

[0002] Real-world data often contain continuous-valued attributes, yet current induction and classification algorithms often rely on discrete-valued attributes, which is an obstacle for machine learning research. Continuous attribute discretization therefore plays a very important role in data mining, machine learning, and knowledge discovery, and has attracted the attention of researchers. The task of discretization is to divide the value range of a continuous attribute into a small number of intervals, each of which corresponds to a discrete symbol. With growing attention and in-depth research in this field, discretization algorithms have developed considerably. The current discretization types a...
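
Once cut points have been chosen, mapping raw values to discrete symbols is a simple lookup. The sketch below is illustrative only: the function name, the symbol format "I0", "I1", ..., and the half-open interval convention are assumptions, not taken from the patent. In bottom-up methods such as the Chi2 family, the cut points would be the lower bounds of all remaining intervals except the first.

```python
from bisect import bisect_right
from typing import List, Sequence


def discretize(values: Sequence[float], cut_points: Sequence[float]) -> List[str]:
    """Map each continuous value to a discrete symbol "I0", "I1", ...

    With sorted cut points c_1 < ... < c_m, symbol I0 covers (-inf, c_1),
    I_i covers [c_i, c_{i+1}), and I_m covers [c_m, +inf).
    """
    cuts = sorted(cut_points)
    return [f"I{bisect_right(cuts, v)}" for v in values]


print(discretize([1.0, 2.0, 4.9, 5.0, 7.5], cut_points=[5.0, 2.0]))
# ['I0', 'I1', 'I1', 'I2', 'I2']
```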

Claims

Application Information

IPC(8): G06F17/10; G06F17/18
Inventors: 李克秋 (Li Keqiu), 桑雨 (Sang Yu), 王哲 (Wang Zhe)
Owner DALIAN UNIV OF TECH