Large-scale data clustering method and device and computer readable storage medium

A large-scale data clustering technology, applied in computer components, computing, and character and pattern recognition, that addresses problems such as reduced execution efficiency, long computation times, and the inability of traditional algorithms to provide good solutions.

Pending Publication Date: 2020-01-17
PING AN TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

However, the K-means algorithm depends heavily on the initial k centers. A poor choice of initial centers can easily lead to a locally optimal solution, increase the number of iterations, and reduce execution efficiency. In addition, during K-means clustering, the Euclidean distance between each data point and each class center must be calculated, and calculating that distance requires the dot product of the data point and the class center.
When massive data participates in clustering, a very large number of dot products must be calculated, which takes a long time and is inefficient.
Therefore, traditional clustering algorithms cannot provide a good solution for large-scale data, whether in terms of system resources or real-time efficiency.
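The link between Euclidean distance and dot products follows from expanding the squared distance: ||x − c||² = ||x||² − 2·(x·c) + ||c||². The pure-Python sketch below (function names are illustrative, not taken from the patent) precomputes the norms once, which leaves n·k dot products per iteration for n points and k centers; that per-pair cost is the bottleneck described above.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def squared_distance_direct(x: Vector, c: Vector) -> float:
    """||x - c||^2 computed term by term, for comparison."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def squared_distances_via_dot(points: List[Vector], centers: List[Vector]) -> List[List[float]]:
    """All point-to-center squared distances via ||x||^2 - 2 x.c + ||c||^2.

    The norms are computed once and reused; the remaining per-pair work is
    the n * k dot products x.c.
    """
    point_norms = [dot(x, x) for x in points]
    center_norms = [dot(c, c) for c in centers]
    return [
        [px - 2.0 * dot(x, c) + cn for c, cn in zip(centers, center_norms)]
        for x, px in zip(points, point_norms)
    ]


if __name__ == "__main__":
    pts = [[0.0, 0.0], [3.0, 4.0]]
    ctrs = [[0.0, 0.0], [1.0, 1.0]]
    print(squared_distances_via_dot(pts, ctrs))  # [[0.0, 2.0], [25.0, 13.0]]
    print(squared_distance_direct([3.0, 4.0], [1.0, 1.0]))  # 13.0
```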

Embodiment Construction

[0058] It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0059] The invention provides a large-scale data clustering method. Figure 1 is a schematic flowchart of a large-scale data clustering method provided by an embodiment of the present invention. The method may be performed by a device, which may be implemented in software and/or hardware.

[0060] In this embodiment, the large-scale data clustering method includes:

[0061] S1: The K value calculation layer receives the data sample set input by the user, calculates the average silhouette coefficient for candidate values of K according to the data sample set, selects the K value with the largest average silhouette coefficient, randomly determines K cluster centers, and inputs the data sample set, the K value, and the K cluster centers into the cluster center calculation layer.
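Step S1 selects K by the largest average silhouette coefficient but does not spell out the formula or how each candidate partition is produced. The sketch below uses the standard definition s(i) = (b(i) − a(i)) / max(a(i), b(i)), where a(i) is the mean distance from point i to the other members of its cluster and b(i) is the smallest mean distance from point i to another cluster. It assumes the caller has already clustered the data once per candidate K (for example with K-means) and passes those labelings in; `average_silhouette`, `select_k`, and `candidate_labels` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence


def euclidean(p: Sequence[float], q: Sequence[float]) -> float:
    """Plain Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def average_silhouette(points: List[Sequence[float]], labels: List[int]) -> float:
    """Mean of s(i) = (b(i) - a(i)) / max(a(i), b(i)) over all points.

    Points in singleton clusters contribute s(i) = 0, a common convention.
    """
    clusters: Dict[int, List[int]] = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue  # singleton cluster: s(i) = 0
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


def select_k(points: List[Sequence[float]], candidate_labels: Dict[int, List[int]]) -> int:
    """Return the K whose labeling has the largest average silhouette."""
    return max(candidate_labels, key=lambda k: average_silhouette(points, candidate_labels[k]))


if __name__ == "__main__":
    pts = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
    cands = {2: [0, 0, 1, 1], 3: [0, 0, 1, 2]}
    print(select_k(pts, cands))  # 2
```

Computing a(i) and b(i) needs pairwise distances, so each candidate K costs O(n²) distance evaluations; on very large inputs, scoring a sampled subset is a common way to keep this step affordable.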

[0062] In a preferre...

Abstract

The invention relates to artificial intelligence technology and discloses a large-scale data clustering method. The method comprises the following steps: receiving a data sample set input by a user; calculating, according to the data sample set, the average silhouette coefficient for each candidate value of K (the number of cluster centers); selecting the K value with the largest average silhouette coefficient; randomly determining K cluster centers; storing the K cluster centers and the data sample set in a database in row-priority (row-major) storage form; calculating the loss value of the K cluster centers with respect to the data sample set according to a minimum squared error algorithm and comparing the loss value with a preset threshold; and, when the loss value is smaller than the threshold, outputting the K cluster centers as the clustering result. The invention further provides a large-scale data clustering device and a computer-readable storage medium. According to the invention, accurate clustering of large-scale data can be achieved.
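The abstract names two mechanisms without detailing them: row-priority storage and a minimum-squared-error loss compared against a threshold. The sketch below reads "row priority" as row-major layout (each sample's features stored contiguously in one flat buffer) and the loss as the sum of squared distances from each sample to its nearest center. Both readings, the in-memory list standing in for the database, and all function names are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def to_row_major(samples: List[Sequence[float]]) -> Tuple[List[float], int]:
    """Flatten samples so that each sample's features are contiguous."""
    dim = len(samples[0])
    flat = [float(v) for sample in samples for v in sample]
    return flat, dim


def row(flat: List[float], dim: int, i: int) -> List[float]:
    """Read sample i back out of the row-major buffer with one slice."""
    return flat[i * dim:(i + 1) * dim]


def sse_loss(flat: List[float], dim: int, centers: List[Sequence[float]]) -> float:
    """Sum over samples of the squared distance to the nearest center."""
    n = len(flat) // dim
    total = 0.0
    for i in range(n):
        x = row(flat, dim, i)
        total += min(sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers)
    return total


def is_converged(loss: float, threshold: float) -> bool:
    """Centers are output once the loss is smaller than the preset threshold."""
    return loss < threshold


if __name__ == "__main__":
    flat, dim = to_row_major([[0, 0], [0, 2], [10, 0]])
    loss = sse_loss(flat, dim, [[0.0, 1.0], [10.0, 0.0]])
    print(loss, is_converged(loss, 2.5))  # 2.0 True
```

Row-major layout keeps each sample in one contiguous slice, so the per-sample distance loop reads memory sequentially.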

Description

Technical Field

[0001] The present invention relates to the technical field of artificial intelligence, and in particular to a method, device, and computer-readable storage medium for intelligently performing large-scale data clustering based on big data input.

Background

[0002] Clustering is a typical data classification method whose core is to find similar categories in large-scale data sets and divide the samples into multiple non-overlapping subsets. The K-means clustering algorithm is one of the most widely used partitioning clustering methods: it represents each class by the centroid of its samples, iterates, and clusters by dynamically adjusting the class centers. However, the K-means algorithm depends heavily on the initial k centers. A poor choice of initial centers can easily lead to a locally optimal solution, increase the number of iterations, and reduce execution efficiency. In addition, during the K-means clustering process, data points ne...
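The background's claim that K-means depends heavily on its initial centers can be seen with a plain Lloyd iteration (assign each point to its nearest center, then move each center to the mean of its members). This is standard K-means, not the patent's layered method; the function names and the four-point data set are illustrative. Starting from two different sets of initial centers, the same data converges to a good partition in one run and to a much worse local optimum in the other.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(p: Sequence[float], q: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points: List[Point], init_centers: List[Point], max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd iteration starting from caller-supplied initial centers."""
    centers = [tuple(c) for c in init_centers]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each center to the centroid of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels


def sse(points: List[Point], centers: List[Point], labels: List[int]) -> float:
    """Sum of squared distances from each point to its assigned center."""
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    good = kmeans(pts, [(0.0, 0.0), (10.0, 0.0)])
    bad = kmeans(pts, [(0.0, 0.0), (0.0, 1.0)])
    print(sse(pts, *good))  # 1.0   (left/right split)
    print(sse(pts, *bad))   # 100.0 (top/bottom split, a local optimum)
```

Both runs stop because the assignments no longer change, yet the second result is 100 times worse, which is why the method above chooses K and the initial centers before iterating.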

Application Information

Patent Type & Authority: Application (China)
IPC(8): G06K9/62
CPC: G06F18/23213
Inventors: 陈善彪 (Chen Shanbiao), 尹浩 (Yin Hao)
Owner: PING AN TECH (SHENZHEN) CO LTD