Sample clustering method and device, equipment and storage medium

A sample clustering technique, applied in the field of data processing, that addresses the problem that a sample set cannot be reasonably clustered. It helps keep the clustering reasonable and reduces the workload.

Status: Inactive
Publication Date: 2019-09-24
GUANGZHOU SHIYUAN ELECTRONICS CO LTD

AI Technical Summary

Problems solved by technology

[0004] The present invention provides a sample clustering method, device, equipment and storage medium to solve the technical problem in the prior art that the DBSCAN algorithm cannot reasonably cluster sample sets with uneven density.

Examples

Embodiment 1

[0075] Figure 2 is a flowchart of a sample clustering method provided by Embodiment 1 of the present invention. The method can be executed by a sample clustering device, which can be implemented in software and/or hardware and may consist of a single physical entity or of two or more physical entities. For example, the sample clustering device can be a computer, mobile phone, tablet, interactive smart tablet, or another smart device with data computing and analysis capabilities.

[0076] Specifically, referring to Figure 2, the sample clustering method includes:

[0077] Step 110: Compute the first sample distance corresponding to each sample in the sample set, where the first sample distance is the distance between the sample and its S-th nearest neighbor sample.
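Step 110 can be illustrated with a minimal sketch. It assumes samples are numeric tuples compared by Euclidean distance (the source does not state the metric here), and the function name `sth_neighbor_distances` and the sample values are hypothetical.

```python
import math


def sth_neighbor_distances(samples, s):
    """Return, for each sample, the distance to its s-th nearest neighbor.

    Assumption: Euclidean distance; the sample itself is not counted
    as one of its own neighbors.
    """
    if not 1 <= s < len(samples):
        raise ValueError("s must satisfy 1 <= s < number of samples")
    result = []
    for i, p in enumerate(samples):
        # Distances from p to every other sample, in ascending order.
        dists = sorted(math.dist(p, q) for j, q in enumerate(samples) if j != i)
        result.append(dists[s - 1])
    return result


# Hypothetical 2-D sample set.
samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
first_distances = sth_neighbor_distances(samples, s=2)
```

The brute-force loop is quadratic in the number of samples. For larger sets a spatial index would be the usual replacement, but that choice is outside what the source describes.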

[0078] Exemplarily, the sample set includes multiple samples, and each sample has the same ...

Embodiment 2

[0101] Figure 4 is a flowchart of a sample clustering method provided by Embodiment 2 of the present invention. This embodiment is a more specific implementation based on the above embodiments. Referring to Figure 4, the sample clustering method provided in this embodiment includes:

[0102] Step 201: Construct a K-nearest neighbor graph for each sample in the sample set, where the weight of each edge in the K-nearest neighbor graph is the distance between the corresponding samples.

[0103] Specifically, after the distance between each sample and every other sample has been calculated, the K-nearest neighbor graph of a given sample is drawn by taking that sample as a vertex and obtaining, according to the inter-sample distances, the K samples closest to it together with the corresponding distances. Edges are then drawn between the vertex and each of the K samples, and the distance between the vertex and the corresponding sample is used as the weight of the connection...
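The K-nearest neighbor graph construction described in Step 201 can be sketched as follows. This is an illustration under assumptions, not the patent's implementation: Euclidean distance is assumed, the graph is stored as a plain dictionary, and the name `knn_graph` is hypothetical.

```python
import math


def knn_graph(samples, k):
    """Map each sample index to [(neighbor_index, distance), ...].

    Each sample is a vertex; it gets one weighted edge to each of its
    k nearest neighbors, with the distance as the edge weight.
    Assumption: Euclidean distance.
    """
    graph = {}
    for i, p in enumerate(samples):
        candidates = [(math.dist(p, q), j) for j, q in enumerate(samples) if j != i]
        candidates.sort()  # ascending by distance, ties broken by index
        graph[i] = [(j, d) for d, j in candidates[:k]]
    return graph


# Hypothetical 2-D sample set.
samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
graph = knn_graph(samples, k=2)
```

Here `graph[0]` holds the two edges leaving sample 0, each paired with its weight.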

Embodiment 3

[0176] Figure 9 is a schematic structural diagram of a sample clustering device provided in Embodiment 3 of the present invention. Referring to Figure 9, the sample clustering device includes: a distance statistics module 301, a distance acquisition module 302, a mean calculation module 303, a connection determination module 304, and a sample clustering module 305.

[0177] The distance statistics module 301 is used to compute the first sample distance corresponding to each sample in the sample set, where the first sample distance is the distance between the sample and its S-th nearest neighbor sample; the distance acquisition module 302 is used to obtain, among all the first sample distances, the first sample distances within the set distance range; the mean calculation module 303 is used to calculate the distance mean based on the first sample distances within the set distance range; the connection determination module 304 is used to determine all connec...

Abstract

The embodiment of the invention discloses a sample clustering method, device, equipment and storage medium, and relates to the field of data processing. The method comprises: computing a first sample distance corresponding to each sample in a sample set, the first sample distance being the distance between the sample and its S-th nearest neighbor sample; acquiring, among all the first sample distances, the first sample distances within a set distance range; calculating a distance mean based on the first sample distances within the set distance range; determining all connection samples of each sample based on the K-nearest-neighbor sample set corresponding to each sample, where K > S and a sample's connection samples are neighbor samples that have a connection relationship with it; and clustering the samples in the sample set according to the connection samples, the distance mean, and the value of S, where the distance mean serves as the scanning radius and S serves as the minimum number of samples a cluster must include. This method can solve the technical problem that the DBSCAN algorithm in the prior art cannot reasonably cluster a sample set with uneven density.
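The parameter-estimation steps in the abstract can be illustrated with a short sketch. It assumes the set distance range is supplied as inclusive lower and upper bounds, since the source does not say here how the range is chosen. The function name `distance_mean_in_range` and all values are hypothetical.

```python
def distance_mean_in_range(first_distances, lower, upper):
    """Mean of the first sample distances that fall inside [lower, upper].

    Assumption: the set distance range is an inclusive interval
    provided by the caller.
    """
    selected = [d for d in first_distances if lower <= d <= upper]
    if not selected:
        raise ValueError("no first sample distance falls within the set range")
    return sum(selected) / len(selected)


# Hypothetical S-th nearest neighbor distances; 40.0 is an outlier
# that the set distance range excludes.
first_distances = [2.0, 2.5, 3.0, 40.0]
s = 2

scanning_radius = distance_mean_in_range(first_distances, lower=1.0, upper=10.0)
min_samples = s  # the abstract uses S as the minimum inclusion sample number
```

Restricting the mean to a range keeps extreme S-th neighbor distances, such as those of isolated samples, from inflating the scanning radius.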

Description

Technical field

[0001] Embodiments of the present invention relate to the technical field of data processing, and in particular, to a sample clustering method, device, equipment, and storage medium.

Background

[0002] Cluster analysis refers to the analytical process of grouping a collection of physical or abstract objects into multiple classes consisting of similar objects. Cluster analysis is now widely used in many fields, and with this wide application various clustering algorithms have emerged, for example the K-MEANS algorithm, K-MEDOIDS algorithm, BIRCH algorithm, CURE algorithm, DBSCAN algorithm, and OPTICS algorithm. Among them, the DBSCAN algorithm is a relatively representative density-based clustering algorithm, which requires manual input of two parameters: one is the scanning radius, denoted eps; the other is the minimum number of included points, denoted minPts, and through t...
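To make the roles of eps and minPts concrete, here is a minimal textbook-style DBSCAN sketch. It is not the method claimed in this document. It assumes Euclidean distance and the common convention that a point counts toward its own neighborhood, and the data are hypothetical.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; -1 marks noise."""
    n = len(points)
    # Neighborhoods include the point itself (one common convention).
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # j is a core point: keep expanding
    return labels


# Hypothetical data: a dense group (spacing 1) and a sparse group (spacing 3).
points = [(0, 0), (1, 0), (2, 0), (20, 0), (23, 0), (26, 0)]
tight = dbscan(points, eps=1.5, min_pts=2)
loose = dbscan(points, eps=3.5, min_pts=2)
```

With eps tuned to the dense group, the sparse group is labeled entirely as noise; a larger eps recovers both groups in this toy case. Results depend this strongly on one global eps, which is why choosing it by hand is difficult for sample sets with uneven density.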

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06K9/62
CPC: G06F18/23213
Inventor: 熊凯
Owner: GUANGZHOU SHIYUAN ELECTRONICS CO LTD