Unsupervised discretization method for continuous attribute data based on information entropy

An unsupervised, information-entropy-based method for discretizing continuous attribute data. It addresses the high calculation cost, lack of theoretical basis, and lack of data adaptability of existing methods, and achieves high calculation efficiency.

Status: Inactive; Publication Date: 2021-02-12
NORTHWEST NORMAL UNIVERSITY
Cites: 4; Cited by: 0

AI Technical Summary

Problems solved by technology

[0006] Among the above methods, those in which the user subjectively specifies the number of discrete values lack adaptability to the original data; those based on assumed conditions lack a theoretical basis; heuristic methods make the discretization process depend on other attributes; and the existing entropy-based methods do not determine the number of discrete values from the information entropy of the continuous attribute itself, and their calculation cost is relatively high.



Examples


Detailed Description of Embodiments

[0029] The present invention will be described in detail below in conjunction with the accompanying drawings.

[0030] First, a brief description of the terms and discretization process involved in the present invention is given to facilitate the understanding of the method of the present invention.

[0031] Discrete granularity: For a given data set, the number of distinct values of any attribute is called the discrete granularity of that attribute. The discrete granularity of a discrete attribute is denoted |c|, and that of a continuous attribute is denoted |n|.

[0032] Information entropy is a measure of information uncertainty, defined in terms of the probabilities of occurrence of discrete random events. Similarly, each distinct value of a continuous attribute can be treated as a discrete random event, so the discrete granularity of the attribute equals the number of discrete random events. By calculating the information ent...
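
For reference, the standard Shannon form of information entropy can be written for a continuous attribute n_j whose i-th distinct value occurs with probability q_{ji} and whose discrete granularity is |n_j|. This is a sketch under assumptions: the symbol H and the base-2 logarithm are chosen here for illustration, and the patent's own formula is not shown in this excerpt.

```latex
H(n_j) = -\sum_{i=1}^{|n_j|} q_{ji} \log_2 q_{ji},
\qquad \sum_{i=1}^{|n_j|} q_{ji} = 1,
\qquad 0 \le H(n_j) \le \log_2 |n_j|
```

The upper bound is reached when all |n_j| distinct values are equally likely; for example, 8 equally likely values give H = 3 bits, and a single repeated value gives H = 0.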


Abstract

The invention relates to the technical field of discretization of continuous attributes of big data, and in particular to an unsupervised discretization method for continuous attribute data based on information entropy. The method comprises the following steps: first, traversing all value records of a continuous attribute n_j, counting the discrete granularity |n_j| of the attribute and the probability q_ji of each distinct value, and recording the maximum value n_jmax and the minimum value n_jmin; second, deriving a formula for the value chaos degree of the continuous attribute n_j from the formula for information entropy, and calculating the value chaos degree of the attribute with it; third, rounding the value chaos degree down to obtain the number of break points; fourth, using the equal-width interval method to calculate the width of each interval and determine the position of each break point; and fifth, discretizing the continuous attribute n_j. Because the number of break points is derived from the data itself, it is better suited to the original data; the discretization of each attribute is independent of the other attributes; and the calculation efficiency is higher.
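
The five steps in the abstract can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name `entropy_discretize` is hypothetical, the value chaos degree is assumed to be Shannon entropy with a base-2 logarithm, and k break points are assumed to split the range into k + 1 equal-width intervals. The excerpt does not confirm either choice.

```python
import bisect
import math
from collections import Counter


def entropy_discretize(values):
    """Discretize one continuous attribute; returns (codes, break_points).

    Hypothetical helper that follows the five steps of the abstract.
    """
    if not values:
        raise ValueError("values must be non-empty")

    # Step 1: count distinct values, their probabilities, and the min/max.
    counts = Counter(values)
    total = len(values)
    probs = [c / total for c in counts.values()]
    lo, hi = min(values), max(values)

    # Step 2: value chaos degree (ASSUMPTION: Shannon entropy in bits).
    chaos = -sum(q * math.log2(q) for q in probs)

    # Step 3: round down to obtain the number of break points.
    k = math.floor(chaos)

    # Step 4: equal-width intervals (ASSUMPTION: k break points
    # produce k + 1 intervals across [lo, hi]).
    width = (hi - lo) / (k + 1)
    breaks = [lo + width * i for i in range(1, k + 1)]

    # Step 5: replace each value with the index of its interval.
    # A value equal to a break point falls into the upper interval.
    codes = [bisect.bisect_right(breaks, v) for v in values]
    return codes, breaks


codes, breaks = entropy_discretize([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
print(breaks)  # [2.75, 4.5, 6.25]
print(codes)   # [0, 0, 1, 1, 2, 2, 3, 3]
```

In this example, eight equally likely values give an entropy of 3 bits, hence three break points and four intervals of width 1.75. Each attribute is processed from its own values only, which matches the claim that the discretization of one attribute does not depend on the others.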

Description

Technical field

[0001] The invention relates to the technical field of discretization of continuous attributes of big data, and in particular to an unsupervised discretization method for continuous attribute data based on information entropy.

Background art

[0002] Discretization of continuous attributes is the process of dividing the value range of a continuous attribute into several intervals, each corresponding to a unique discrete value, and transforming each original value into the discrete value of its interval. Researchers at home and abroad have proposed many methods for discretizing continuous (numerical) attributes, and these can be classified from different perspectives: top-down and bottom-up, supervised and unsupervised, global and local, static and dynamic, single-attribute and multi-attribute, and so on. The essence of discretization of continuous attributes is to determine the number of discrete values (intervals) and the location ...


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/18
CPC: G06F17/18
Inventors: 马生俊, 陈旺虎, 郭宏乐, 乔保民, 李新田
Owner: NORTHWEST NORMAL UNIVERSITY