Continuous attribute discretization method based on Canopy clustering and BIRCH hierarchical clustering

A hierarchical clustering technology applied in text database clustering/classification, structured data retrieval, unstructured text data retrieval, etc. It addresses the problems of unreasonable discretization and poor discretization effect.

Status: Inactive. Publication date: 2015-04-29
ANHUI KELI INFORMATION IND
Cites: 0. Cited by: 17

AI Technical Summary

Problems solved by technology

[0006] The purpose of the present invention is to overcome the defects of poor discretization effect and unreasonable discretization in the prior ...

Embodiment Construction

[0050] To give a further understanding of the structural features of the present invention and the effects it achieves, preferred embodiments are described in detail below with reference to the accompanying drawings:

[0051] As shown in Figure 1, the continuous attribute discretization method based on Canopy clustering and BIRCH hierarchical clustering according to the present invention comprises the following steps:

[0052] The first step is to use Canopy clustering to perform an initial clustering of the continuous attribute data. Set reasonable distance thresholds T1 and T2; these thresholds govern the size of each Canopy. T1 determines the number of points contained in each Cluster, which directly affects the "center of gravity" and "radius" of the Cluster, while T2 determines the number of Clusters: if T2 is too large, there will be only one Cluster, and if it is too small, there will be too many Clusters. The specific determination o...
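
The Canopy step in [0052] can be sketched for a single continuous attribute as follows. This is a minimal sketch, not the patented implementation; it assumes the standard Canopy procedure with the absolute difference as distance, T1 > T2 (the usual Canopy convention, not stated above), the first remaining point as each canopy's seed (classic Canopy picks one at random), and the member mean as the "center of gravity". The names `Canopy` and `canopy_1d` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Canopy:
    seed: float                                         # point that opened this canopy
    members: List[float] = field(default_factory=list)  # points within T1 of the seed

    @property
    def center(self) -> float:
        # "center of gravity": mean of the member values
        return sum(self.members) / len(self.members)

    @property
    def radius(self) -> float:
        # largest distance from the center to any member
        c = self.center
        return max(abs(x - c) for x in self.members)


def canopy_1d(values: List[float], t1: float, t2: float) -> List[Canopy]:
    """Canopy clustering of one continuous attribute, distance d(a, b) = |a - b|."""
    if not t1 > t2 > 0:
        raise ValueError("expected T1 > T2 > 0")
    remaining = list(values)  # points that may still seed or join a canopy
    canopies: List[Canopy] = []
    while remaining:
        seed = remaining[0]  # deterministic; classic Canopy picks a random point
        # loose threshold T1: every remaining point this close joins the canopy
        members = [x for x in remaining if abs(x - seed) < t1]
        canopies.append(Canopy(seed, members))
        # tight threshold T2: these points are removed and cannot seed new canopies
        remaining = [x for x in remaining if abs(x - seed) >= t2]
    return canopies


if __name__ == "__main__":
    data = [1.0, 1.2, 1.1, 5.0, 5.3, 9.0]
    for c in canopy_1d(data, t1=1.0, t2=0.5):
        print(c.seed, round(c.center, 3), round(c.radius, 3))
```

On this toy data, T1 = 1.0 and T2 = 0.5 give three canopies seeded at 1.0, 5.0 and 9.0. Raising T2 to 9.0 (with T1 = 10.0) collapses the same data into a single canopy, matching the remark that an overly large T2 yields only one Cluster. Points lying between T2 and T1 of a seed stay in the candidate list, so canopies may overlap, as in the standard algorithm.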

Abstract

The invention relates to a continuous attribute discretization method based on Canopy clustering and BIRCH hierarchical clustering, which overcomes the poor discretization effect and unreasonable discretization of the prior art. The method includes the following steps: using Canopy clustering to perform an initial clustering of the continuous attribute data; using BIRCH hierarchical clustering to perform a secondary clustering with the resulting cluster centers as samples; and finding, for each sample in the breakpoint set, the nearest neighbor of the cluster center in the corresponding dimension as the basis for fine-tuning the discretization, thereby achieving continuous attribute discretization. The method can discretize high-dimensional, large-scale data samples, reduces the number of continuous attribute values and the dependence on storage space, makes the discretized data regular and simplified so that it is easier to understand, use and interpret, and broadens the range of application.
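
The secondary clustering and breakpoint steps of the abstract can be sketched as below. This is a hedged illustration under stated assumptions, not the patented procedure. `birch_flat` is a simplified, flat version of BIRCH's first phase (clustering features absorbed under a radius threshold) with no CF-tree, node splitting or global refinement. The rule in `breakpoints` is one hypothetical reading of the nearest-neighbor fine-tuning: cut halfway between the two observed samples that straddle the midpoint of adjacent cluster centers. The canopy centers, the `threshold` value and all names are assumptions.

```python
import math
from bisect import bisect_left
from typing import List


class CF:
    """BIRCH clustering feature (N, LS, SS) for one-dimensional samples."""

    def __init__(self, x: float) -> None:
        self.n, self.ls, self.ss = 1, x, x * x

    def centroid(self) -> float:
        return self.ls / self.n

    def radius_if_added(self, x: float) -> float:
        # R = sqrt(SS/N - (LS/N)^2) of the entry after absorbing x;
        # max(..., 0.0) guards against tiny negative rounding errors
        n, ls, ss = self.n + 1, self.ls + x, self.ss + x * x
        return math.sqrt(max(ss / n - (ls / n) ** 2, 0.0))

    def add(self, x: float) -> None:
        self.n += 1
        self.ls += x
        self.ss += x * x


def birch_flat(samples: List[float], threshold: float) -> List[CF]:
    """Simplified BIRCH phase 1 without a CF-tree: absorb each sample into the
    nearest CF entry if its radius stays <= threshold, else open a new entry."""
    entries: List[CF] = []
    for x in samples:
        nearest = min(entries, key=lambda cf: abs(cf.centroid() - x), default=None)
        if nearest is not None and nearest.radius_if_added(x) <= threshold:
            nearest.add(x)
        else:
            entries.append(CF(x))
    return entries


def breakpoints(values: List[float], centers: List[float]) -> List[float]:
    """Hypothetical fine-tuning rule: for each pair of adjacent cluster centers,
    cut halfway between the two observed values that straddle their midpoint."""
    data = sorted(values)
    cs = sorted(centers)
    cuts = []
    for a, b in zip(cs, cs[1:]):
        i = bisect_left(data, (a + b) / 2)
        below = data[max(i - 1, 0)]          # nearest sample below the midpoint
        above = data[min(i, len(data) - 1)]  # nearest sample at/above the midpoint
        cuts.append((below + above) / 2)
    return cuts


if __name__ == "__main__":
    values = [1.0, 1.1, 1.2, 1.9, 2.0, 5.0, 5.3, 9.0, 9.4]
    canopy_centers = [1.1, 1.95, 5.15, 9.2]  # hypothetical output of the Canopy step
    entries = birch_flat(canopy_centers, threshold=0.5)
    print(breakpoints(values, [e.centroid() for e in entries]))
```

In this example the first two canopy centers (1.1 and 1.95) merge into one CF entry of radius 0.425, leaving three clusters and cut points at 3.5 and about 7.15. A full BIRCH implementation, such as scikit-learn's `Birch`, adds the CF-tree and an optional global clustering phase on top of this absorption rule.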

Description

Technical field

[0001] The invention relates to the technical field of data mining preprocessing, in particular to a continuous attribute discretization method based on Canopy clustering and BIRCH hierarchical clustering.

Background

[0002] Discretization of continuous attributes is an important preprocessing step in data mining and directly affects the results of data mining. Many data mining algorithms, such as rough set algorithms, require continuous attributes to be discretized before modeling. Discretizing a continuous attribute means setting several division points within the value range of the attribute, dividing that range into discrete intervals, and finally representing the attribute values that fall in each interval by a distinct symbol or integer. The discretization of continuous attributes can essentially be reduced to the problem of using selected breakpoints to divide the spa...
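
The definition in [0002], where division points split the value range into intervals and each interval is coded by a symbol or integer, can be illustrated with a short sketch. It assumes half-open intervals [c_i, c_{i+1}) and integer codes 0..k; the text shown here does not fix which interval a division point itself belongs to, and `discretize` is a hypothetical name.

```python
from bisect import bisect_right
from typing import List, Sequence


def discretize(values: Sequence[float], cuts: Sequence[float]) -> List[int]:
    """Code each continuous value by the interval it falls in.

    k cut points c1 < ... < ck split the value range into k + 1 intervals
    (-inf, c1), [c1, c2), ..., [ck, +inf), coded 0, 1, ..., k.
    """
    ordered = sorted(cuts)
    # bisect_right counts the cut points <= v, which is exactly the interval index
    return [bisect_right(ordered, v) for v in values]


if __name__ == "__main__":
    codes = discretize([0.5, 3.5, 4.0, 7.2, 10.0], cuts=[3.5, 7.15])
    print(codes)  # [0, 1, 1, 2, 2]
    # integer codes can be mapped to symbols if preferred
    labels = ["low", "medium", "high"]
    print([labels[c] for c in codes])  # ['low', 'medium', 'medium', 'high', 'high']
```

Five distinct continuous values collapse to three codes, which is the reduction in attribute values and storage that discretization aims for.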

Application Information

IPC (8th edition): G06F17/30
CPC: G06F16/35; G06F16/285
Inventors: 闫永刚 (Yan Yonggang), 陶刚 (Tao Gang), 刘俊 (Liu Jun), 张小兵 (Zhang Xiaobing), 张晓花 (Zhang Xiaohua)
Owner: ANHUI KELI INFORMATION IND