Clustering analysis method and system under non-independent identical distribution

A clustering analysis technique for non-independent and identically distributed (non-IID) data, classified under instruments, character and pattern recognition, and computer components. It addresses the problem that the real relationships between attribute values of categorical data cannot be captured effectively, and it avoids human subjectivity while improving clustering quality.

Inactive Publication Date: 2019-07-12
QILU UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0004] To overcome the deficiencies of the prior art, the present disclosure provides a non-IID clustering analysis method and system that solves the problem that the real relationships between attribute values of categorical data cannot be captured effectively.



Examples


Embodiment 1

[0041] This embodiment provides a new clustering analysis method for non-independent and identically distributed (non-IID) data that uses coupling similarity as its measure. The method addresses the problem of capturing the real relationships in a categorical data set. Based on the NI-DBSCANS clustering algorithm, it obtains more accurate and efficient clustering results: it computes the real relationships between attribute values of categorical data more accurately, thereby improving the quality of the clustering results.

[0042] Referring to Figure 1, the non-independent and identically distributed cluster analysis method includes the following steps:

[0043] S101. Obtain a data set and create an information table S.

[0044] Specifically, in step S101, the data set U is obtained and preprocessed, and the information table S is formed from the preprocessed data set. The information table S is shown in Table 1, and the row in the information table S Repres...

Embodiment 2

[0102] To help those skilled in the art better understand the technical solution of the present application, a more detailed embodiment is given below. This embodiment provides a non-independent and identically distributed cluster analysis method for mining and analyzing data, with the following settings:

[0103] The formula for calculating the accuracy of clustering results is:

[0104]

[0105] Here, n is the number of correctly clustered objects, and N is the total number of objects in the data set.
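
A minimal sketch of the accuracy measure, assuming it is the plain ratio n / N that the definitions of n and N suggest. The formula itself is not reproduced in this text, so that form is an assumption, and the function name and sample counts below are hypothetical.

```python
def clustering_accuracy(n_correct: int, n_total: int) -> float:
    """Return n / N: the share of objects placed in the correct cluster.

    Assumption: accuracy is the simple ratio of correctly clustered
    objects (n) to all objects in the data set (N).
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_correct <= n_total:
        raise ValueError("n_correct must lie between 0 and n_total")
    return n_correct / n_total


# Hypothetical counts, not results from the patent.
print(clustering_accuracy(8, 10))
```

Counting n requires matching each cluster to a reference class first; that matching step is outside this sketch.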

[0106] Referring to Figure 2, the non-independent and identically distributed cluster analysis method includes the following steps:

[0107] S201. Acquire the Zoo data set and establish the information table S1.

[0108] Specifically, the Zoo data set has a total of 101 objects and 16 attributes (hair, feathers, eggs, milk, flying, aquatic, predation, teeth, backbone, breath, poison, fins, legs, tail, domesti...

Embodiment 3

[0154] This embodiment provides a cluster analysis system for non-independent and identically distributed data. Referring to Figure 5, the system consists of:

[0155] The coupling similarity matrix generation module is used to obtain the data set and establish the information table; calculate the coupling similarity between the data objects in the information table to obtain the coupling similarity matrix;

[0156] The coupling similarity matrix analysis module analyzes the coupling similarity matrix, and calculates the Eps neighborhood interval value and the core point threshold Minpts;

[0157] The clustering module performs clustering analysis on the information table based on the obtained Eps neighborhood interval value and the core point threshold value Minpts.
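
The three modules above can be sketched as a small pipeline. This is an illustration under stated assumptions, not the patented NI-DBSCANS procedure: the coupling similarity is replaced by a simple attribute-overlap similarity as a stand-in, Eps is treated as a minimum similarity threshold, and Eps and Minpts are passed in by hand rather than derived from the matrix. All function names are hypothetical.

```python
from collections import deque

NOISE = -1


def overlap_similarity_matrix(table):
    """Stand-in for the coupling similarity matrix.

    Assumption: similarity is the fraction of attributes on which two
    objects agree. The patent's coupling similarity is not reproduced.
    """
    n = len(table)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            matches = sum(a == b for a, b in zip(table[i], table[j]))
            sim[i][j] = matches / len(table[i])
    return sim


def dbscan_on_similarity(sim, eps, min_pts):
    """DBSCAN-style clustering driven by a similarity matrix.

    Assumption: object j is in the Eps neighborhood of i when
    sim[i][j] >= eps; a core point has at least min_pts neighbors,
    counting itself.
    """
    n = len(sim)
    labels = [None] * n  # None means not yet visited

    def neighbors(i):
        return [j for j in range(n) if sim[i][j] >= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: expand
        cluster += 1
    return labels


# Hypothetical information table: rows are objects, columns are attributes.
table = [
    ("a", "x", "1"),
    ("a", "x", "2"),
    ("a", "y", "1"),
    ("b", "z", "3"),
    ("b", "z", "4"),
    ("c", "w", "9"),
]
sim = overlap_similarity_matrix(table)
print(dbscan_on_similarity(sim, eps=0.6, min_pts=2))
```

Keeping the similarity computation separate from the clustering step mirrors the module split: a different similarity measure can be swapped in without touching the clustering code.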


Abstract

The invention discloses a clustering analysis method and system under non-independent identical distribution, which solves the problem that the real relationships between attribute values of categorical data cannot be captured effectively. The method comprises the following steps: acquiring a data set and establishing an information table; calculating the coupling similarity between the data objects in the information table to obtain a coupling similarity matrix; analyzing the coupling similarity matrix to calculate an Eps neighborhood interval value and a core point threshold Minpts; and performing clustering analysis on the information table based on the obtained Eps neighborhood interval value and core point threshold Minpts.

Description

Technical field

[0001] The disclosure belongs to the field of computer data mining, and specifically relates to a non-independent and identically distributed clustering analysis method and system.

Background technique

[0002] Most current research on data mining algorithms first assumes that the data or objects are independent and identically distributed, that is, that there is no relationship between the data. This assumption usually does not hold in real life. Data are often closely related and interact with each other, and there are complex coupling or heterogeneous relationships. Therefore, when facing complex data, it is not enough to look only at the surface of the data; the hidden relationships in the data must be analyzed more deeply. In the research of DBSCAN algorithm, ignoring the coupling or dependency between the attribute values, attributes, and objects of the data source may cause the analysis results to become inaccurate due to the loss ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62
CPC: G06F18/2321, G06F18/22, G06F18/24147
Inventor: 姜合, 吕奕锟
Owner: QILU UNIV OF TECH