Large-scale data clustering method and device and computer readable storage medium

A large-scale data clustering technology, applied in computer parts, computing, character and pattern recognition, etc. It addresses problems such as reduced execution efficiency, the inability to provide good solutions, and long running times.

Pending Publication Date: 2020-01-17

PING AN TECH (SHENZHEN) CO LTD


AI Technical Summary

Problems solved by technology

However, the K-means algorithm depends heavily on the initial k centers. A poor choice of initial centers can easily lead to a locally optimal solution, increase the number of iterations, and reduce execution efficiency. In addition, during K-means clustering, the Euclidean distance between each data point and each class center must be calculated, and calculating that distance requires the dot product of the data point and the class center.

When massive data participates in clustering, a large number of dot products must be calculated, which takes a long time and is inefficient.

Therefore, traditional clustering algorithms cannot provide a good solution for large-scale data, whether judged by system resource usage or by real-time efficiency.

Embodiment Construction

[0058] It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0059] The invention provides a large-scale data clustering method. FIG. 1 is a schematic flowchart of a large-scale data clustering method provided by an embodiment of the present invention. The method may be performed by a device, and the device may be implemented by software and/or hardware.

[0060] In this embodiment, the large-scale data clustering method includes:

[0061] S1, the K value calculation layer receives the data sample set input by the user, calculates the average silhouette coefficient according to the data sample set, selects the K value with the largest average silhouette coefficient, randomly determines K cluster centers, and inputs the data sample set, the K value, and the K cluster centers to the cluster center calculation layer.
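Step S1 can be sketched roughly as follows. The text shown here does not say how a trial clustering is produced for each candidate K, so this sketch assumes a short plain K-means run from randomly chosen samples; `kmeans_labels`, `average_silhouette`, `select_k`, the candidate list, and the seed are all hypothetical names and parameters, not taken from the patent.

```python
import random


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans_labels(points, k, rng, iters=20):
    """Assumed helper: plain K-means from k randomly chosen samples."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster went empty
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def average_silhouette(points, labels):
    """Mean of s(i) = (b - a) / max(a, b) over all samples."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster; treated as worst here
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton clusters contribute s(i) = 0
        # a: mean distance to the other members of the same cluster
        a = sum(sq_dist(p, q) ** 0.5 for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(sq_dist(p, q) ** 0.5 for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def select_k(points, candidates, seed=0):
    """Return the candidate K with the largest average silhouette, plus all scores."""
    rng = random.Random(seed)
    scores = {k: average_silhouette(points, kmeans_labels(points, k, rng))
              for k in candidates}
    return max(scores, key=scores.get), scores


samples = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
           (20.0, 0.0), (20.0, 1.0), (21.0, 0.0)]
best_k, scores = select_k(samples, candidates=[2, 3, 4])
print(best_k, scores)
```

Because the trial clusterings start from random centers, the scores can vary with the seed; this is the same sensitivity to initial centers that the background section describes.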

[0062] In a preferre...


Abstract

The invention relates to artificial intelligence technology and discloses a large-scale data clustering method. The method comprises the following steps: receiving a data sample set input by a user; calculating, according to the data sample set, the average silhouette coefficient for each value of the number of cluster centers K; selecting the K value with the maximum average silhouette coefficient; randomly determining K cluster centers; storing the K cluster centers and the data sample set in a database in row-major storage format; calculating the loss value of the K cluster centers and the data sample set according to a minimum squared error algorithm and comparing the loss value with a preset threshold; and, when the loss value is smaller than the threshold, outputting the K cluster centers as the clustering result. The invention further provides a large-scale data clustering device and a computer-readable storage medium. According to the invention, an accurate large-scale data clustering function can be realized.
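A minimal sketch of the loss-and-threshold step, under stated assumptions: the loss is taken to be the sum of squared distances from each sample to its nearest center, the samples are held in a flat row-major list rather than a database, and, since the abstract does not say what happens when the loss is not below the threshold, the sketch assumes the centers are recomputed as cluster means and the check is repeated. All function names and the `max_iters` guard are hypothetical.

```python
def to_row_major(rows):
    """Flatten equal-length rows so element (i, j) sits at index i * dim + j."""
    dim = len(rows[0])
    return [x for r in rows for x in r], dim


def get_row(flat, dim, i):
    return flat[i * dim:(i + 1) * dim]


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def sse_loss(flat, dim, centers):
    """Sum over samples of the squared distance to the nearest center."""
    n = len(flat) // dim
    return sum(min(sq_dist(get_row(flat, dim, i), c) for c in centers)
               for i in range(n))


def update_centers(flat, dim, centers):
    """Assumed update rule: move each center to the mean of its nearest samples."""
    groups = [[] for _ in centers]
    for i in range(len(flat) // dim):
        p = get_row(flat, dim, i)
        j = min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
        groups[j].append(p)
    return [[sum(col) / len(g) for col in zip(*g)] if g else list(c)
            for g, c in zip(groups, centers)]


def cluster(rows, centers, threshold, max_iters=100):
    """Repeat the loss check until the loss drops below the threshold."""
    flat, dim = to_row_major(rows)
    centers = [list(c) for c in centers]
    for _ in range(max_iters):  # guard against a threshold that is never met
        if sse_loss(flat, dim, centers) < threshold:
            break
        centers = update_centers(flat, dim, centers)
    return centers, sse_loss(flat, dim, centers)


rows = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
final_centers, final_loss = cluster(rows, [[0.0, 0.0], [10.0, 0.0]], threshold=5.0)
print(final_centers, final_loss)  # [[0.0, 1.0], [10.0, 1.0]] 4.0
```

In this example the initial loss is 8.0, which is not below the threshold of 5.0, so the centers are updated once and the loss falls to 4.0.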

Description

Technical field

[0001] The present invention relates to the technical field of artificial intelligence, in particular to a method, device, and computer-readable storage medium for intelligently performing large-scale data clustering based on big data input.

Background technique

[0002] Clustering is a typical data classification method whose core is to find similar categories in large-scale data sets and divide the samples into multiple non-overlapping subsets. The K-means clustering algorithm is one of the most widely used partition-based clustering methods; it represents each class by the centroid of its samples and clusters by iteratively adjusting those centroids. However, the K-means algorithm depends heavily on the initial k centers. A poor choice of initial centers can easily lead to a locally optimal solution, increase the number of iterations, and reduce execution efficiency. In addition, during the K-means clustering process, data points ne...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06K9/62

CPC: G06F18/23213

Inventor: 陈善彪, 尹浩

Owner: PING AN TECH (SHENZHEN) CO LTD
