Clustering devices and methods

A clustering device and method, applied in the field of data analysis, that address poor clustering quality and the difficulty of defining the weights of spatial and thematic attributes, and that yield more reliable clustering results.

Active Publication Date: 2020-06-26
NEC CORP

AI Technical Summary

Problems solved by technology

Methods of this type have two outstanding problems: first, it is difficult to define the weights of spatial attributes and thematic attributes; second, the random selection of initial cluster centers leads to poor clustering quality.



Examples


Example 1

[0060] In this example, the kernel point calculation unit 120 first calculates the local variance between each object in the spatial dataset and the other objects in that object's spatial neighborhood. Then, the kernel point calculation unit 120 performs multiple random rearrangements of the spatial dataset. Finally, the kernel point calculation unit 120 judges whether each object is a kernel point based on the significance of that object's local variance. The calculation process in Example 1 is described in detail below.

[0061] With this example, the user does not need to choose parameters subjectively when calculating kernel points. Kernel points are instead found from the characteristics of the data itself, which makes the approach strongly adaptive and yields more robust results.
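The three steps of Example 1 can be sketched as a permutation test. This is only an illustrative sketch, not the patented procedure: the definition of local variance (mean squared difference to the neighbors), the lower-tail direction of the test, the function names, and the sample data are all assumptions, since the detailed calculation is not reproduced in this excerpt.

```python
import random


def local_variance(value, neighbor_values):
    # Assumed definition: mean squared difference between an object's
    # attribute value and the values of its spatial neighbors.
    return sum((value - v) ** 2 for v in neighbor_values) / len(neighbor_values)


def permutation_p_values(values, neighbors, n_perm=999, seed=0):
    """Lower-tail permutation p-value of each object's local variance.

    values    -- attribute value of each object
    neighbors -- neighbors[i] lists the indices in object i's spatial neighborhood
    """
    rng = random.Random(seed)
    n = len(values)
    observed = [
        local_variance(values[i], [values[j] for j in neighbors[i]])
        for i in range(n)
    ]
    at_most_observed = [0] * n
    shuffled = list(values)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign attribute values to locations at random
        for i in range(n):
            lv = local_variance(shuffled[i], [shuffled[j] for j in neighbors[i]])
            if lv <= observed[i]:
                at_most_observed[i] += 1
    # The +1 terms count the observed arrangement itself, so p is never 0.
    return [(c + 1) / (n_perm + 1) for c in at_most_observed]


# Hypothetical data on a line: objects 0-5 are smooth, objects 6-11 are noisy.
values = [10.0, 10.1, 9.9, 10.0, 10.2, 9.8, 1.0, 25.0, 7.0, 30.0, 3.0, 18.0]
neighbors = [
    [j for j in (i - 1, i + 1) if 0 <= j < len(values)]
    for i in range(len(values))
]
p = permutation_p_values(values, neighbors)
```

An object whose p-value is small has a local variance that random rearrangement rarely matches, which is the sense in which it would be judged a kernel point here. Any cut-off applied to `p` is a conventional significance level rather than a clustering parameter.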

Example 2

[0063] In this example, the kernel point calculation unit 120 first calculates the local variance between each object in the spatial dataset and the other objects in that object's spatial neighborhood.

[0064] Then, the kernel point calculation unit 120 performs Bootstrap random sampling multiple times in the spatial dataset. Bootstrap random sampling treats the attribute values of all objects as a set. For each object P_i, n_i attribute values are randomly drawn from the set in sequence (n_i is the number of objects in the spatial neighborhood of P_i), and the local variance of P_i is then calculated once. This random draw is repeated multiple times, and every draw is taken from the original attribute value set, that is, sampling with replacement. For example, if the existing set of attribute values is {1, 2, 3, 4} and there are 3 objects in the spatial neighborhood of P_i, the results of two random ...
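The Bootstrap variant can be sketched in the same way. As before, this is a hedged illustration: the local variance definition, the choice to keep the value of P_i fixed while resampling only its neighbor values, the lower-tail p-value, and the names `bootstrap_p_values`, `n_boot`, and `seed` are assumptions and are not taken from the patent text.

```python
import random


def local_variance(value, neighbor_values):
    # Assumed definition: mean squared difference between an object's
    # attribute value and the values of its spatial neighbors.
    return sum((value - v) ** 2 for v in neighbor_values) / len(neighbor_values)


def bootstrap_p_values(values, neighbors, n_boot=999, seed=0):
    """Lower-tail Bootstrap p-value of each object's local variance."""
    rng = random.Random(seed)
    p_values = []
    for i, value in enumerate(values):
        n_i = len(neighbors[i])  # number of objects in P_i's neighborhood
        observed = local_variance(value, [values[j] for j in neighbors[i]])
        at_most_observed = 0
        for _ in range(n_boot):
            # Every draw uses the full, unmodified attribute value set,
            # i.e. sampling with replacement.
            sample = rng.choices(values, k=n_i)
            if local_variance(value, sample) <= observed:
                at_most_observed += 1
        p_values.append((at_most_observed + 1) / (n_boot + 1))
    return p_values


# Hypothetical data on a line: objects 0-5 are smooth, objects 6-11 are noisy.
values = [10.0, 10.1, 9.9, 10.0, 10.2, 9.8, 1.0, 25.0, 7.0, 30.0, 3.0, 18.0]
neighbors = [
    [j for j in (i - 1, i + 1) if 0 <= j < len(values)]
    for i in range(len(values))
]
p = bootstrap_p_values(values, neighbors)
```

Because `random.Random.choices` always samples from the same unchanged list, a value can appear more than once in a single draw, which matches the "with replacement" behavior described above.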

Example 3

[0067] The inventors of the present invention found that the local variance of kernel points approximately follows a chi-square distribution. It is therefore possible to perform multiple random rearrangements, calculate the local variance after each rearrangement, and then fit a chi-square distribution curve: the parameter k of the chi-square distribution is estimated from the local variance values obtained after the rearrangements, and the p-value of the local variance is then calculated from the chi-square density function.

[0068] Specifically, the kernel point calculation unit 120 calculates the local variance between each object in the spatial dataset and other objects in the spatial neighborhood of the object, and then performs multiple random rearrangements in the spatial dataset. Afterwards, the kernel point calculation unit 120 calculates the local variance of each object and performs chi-square curve fitting, and calc...
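A minimal sketch of the chi-square fitting step follows. The excerpt does not say how k is estimated, so this sketch assumes a method-of-moments fit (the mean of a chi-square distribution equals k) and a lower-tail p-value; `chi2_cdf` and `chi_square_p_value` are hypothetical helper names, and the CDF is evaluated with a plain series expansion so that only the standard library is needed.

```python
import math


def chi2_cdf(x, k):
    """CDF of the chi-square distribution with k degrees of freedom.

    Uses the series for the regularized lower incomplete gamma function
    P(k/2, x/2). Adequate for a sketch; very large x would need a more
    careful numerical method.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if x <= 0:
        return 0.0
    a, z = k / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return min(1.0, total * math.exp(a * math.log(z) - z - math.lgamma(a)))


def chi_square_p_value(observed_lv, permuted_lvs):
    # Assumed estimator: E[X] = k for a chi-square variable, so the mean of
    # the local variances from the rearrangements is used as k.
    k = sum(permuted_lvs) / len(permuted_lvs)
    # Lower tail: probability of a local variance at most as large as observed.
    return chi2_cdf(observed_lv, k)


# Hypothetical local variances from three rearrangements give k = 2.
p_value = chi_square_p_value(2.0, [1.0, 2.0, 3.0])
```

Compared with counting rearrangements directly, a fitted distribution can give a p-value that is not limited to multiples of 1 / (number of rearrangements + 1), but the result is only as good as the chi-square approximation and the assumed estimator for k.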


Abstract

The invention provides a clustering device. The clustering device comprises a spatial neighborhood selection unit, a kernel point calculation unit, an extraction unit, and a merging unit. The spatial neighborhood selection unit is configured to select the spatial neighborhood of each object in a spatial dataset. The kernel point calculation unit is configured to calculate the kernel points in the spatial dataset, where a kernel point has attribute values similar to those of the other objects in its spatial neighborhood. The extraction unit is configured to extract the kernel points in the spatial dataset, together with the objects located in their spatial neighborhoods, to form a corresponding spatial data subset. The merging unit is configured to perform clustering on the spatial data subset. The invention further provides a clustering method. With this method, the significance of spatial hierarchical clustering results can be judged effectively, so that the clustering result obtained is more reliable.
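The extraction step can be illustrated with a short sketch: given a flag per object saying whether the significance test in the Examples judged it a kernel point, collect those objects and everything in their spatial neighborhoods. The function name `extract_subset` and the index-list representation of neighborhoods are assumptions made for illustration only.

```python
def extract_subset(is_kernel, neighbors):
    """Indices of all kernel points plus the objects in their neighborhoods.

    is_kernel -- is_kernel[i] is True if object i was judged a kernel point
    neighbors -- neighbors[i] lists the indices in object i's spatial neighborhood
    """
    keep = set()
    for i, flag in enumerate(is_kernel):
        if flag:
            keep.add(i)
            keep.update(neighbors[i])
    return sorted(keep)


# Hypothetical five objects on a line; objects 0 and 4 are kernel points.
subset = extract_subset(
    [True, False, False, False, True],
    [[1], [0, 2], [1, 3], [2, 4], [3]],
)
```

Clustering would then run on the objects in `subset` only, so objects that are neither kernel points nor neighbors of one (object 2 in this data) are left out of the merging step.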

Description

Technical field

[0001] This application relates to the field of data analysis, and in particular to a clustering device and method.

Background

[0002] Clustering is a means of classifying things or partitioning natural units. Hierarchical clustering is currently the most widely used clustering method. It mainly adopts a bottom-up strategy that gradually merges units into larger grouping units.

[0003] Building on traditional hierarchical clustering, geographers introduced spatial adjacency as a constraint and proposed spatial hierarchical clustering, which is used to cluster data that carry both a geographic location and attribute information (such as temperature or the monitoring data of environmental monitoring stations). Although spatial hierarchical clustering produces a series of results, it cannot directly indicate which clustering result or results users actually need and are interested in. In practical applications, the user needs to set the clustering end condition (that...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06K9/62
Inventor: 刘博, 胡卫松, 刘晓炜, 唐建波, 刘启亮
Owner: NEC CORP