A data clustering method and device

A data clustering technology, applied in the field of data processing, that addresses the loss of clustering accuracy caused by not considering the impact of uncertainty on clustering, and achieves the effect of improving accuracy.

Active Publication Date: 2022-07-19

深圳软通动力科技有限公司

Cites: 2; Cited by: 0

AI Technical Summary

Problems solved by technology

[0004] Existing approaches cluster uncertainty data either by using the EM (Expectation Maximization) algorithm to fit a mixture density suited to uncertainty data clustering, or by using the fuzzy C-means clustering algorithm. However, neither of these two data clustering methods considers the impact of uncertainty on clustering, which results in a decrease in clustering accuracy.

Embodiment Construction

[0060] Currently, the data clustering problem is to find, among the data sets $C_j$ ($j$ from 1 to $K$), a data set $C$, where each data set $C_j$ is represented by a similarity-based mean $c_j$ (which can be regarded as the preset initial centroid of the data set $C_j$). Minimizing the distance between data in the same data set can also be regarded as minimizing the distance between each piece of data in that data set and the preset initial centroid of that data set.

[0061] The applicant studies a clustering algorithm suitable for uncertainty data, starting from the hard clustering algorithm K-means. The purpose of the K-means algorithm is to find a set $C$ of $K$ data sets that minimizes the sum of squared errors (SSE). The formula for calculating the sum of squared errors is as follows:

[0062] $$\mathrm{SSE} = \sum_{j=1}^{K} \sum_{x_i \in C_j} \lVert x_i - c_j \rVert^2$$

[0063] $\lVert \cdot \rVert$ represents the distance between a piece of data $x_i$ and the preset initial centroid $c_j$ of the data set. For example, Euclidean dis...
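The sum of squared errors described above can be sketched in a few lines of Python. This is a minimal illustration, not the applicant's implementation: it assumes Euclidean distance, and the names `squared_euclidean` and `sum_of_squared_errors` and the sample data are hypothetical.

```python
from typing import Sequence

Point = Sequence[float]


def squared_euclidean(x: Point, c: Point) -> float:
    """Squared Euclidean distance between a piece of data x and a centroid c."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def sum_of_squared_errors(clusters: Sequence[Sequence[Point]],
                          centroids: Sequence[Point]) -> float:
    """SSE: for every data set C_j, add the squared distance from each x_i to c_j."""
    return sum(
        squared_euclidean(x, c)
        for points, c in zip(clusters, centroids)
        for x in points
    )


# Hypothetical example: K = 2 data sets, each with its own centroid.
clusters = [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 10.0), (10.0, 12.0)]]
centroids = [(1.0, 0.0), (10.0, 11.0)]
print(sum_of_squared_errors(clusters, centroids))  # 4.0
```

Each of the four points lies at distance 1 from its centroid, so the four squared errors add up to 4.0.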

Abstract

The present invention provides a data clustering method and device. When uncertainty data to be clustered is obtained, the information required for clustering the uncertainty data is calculated based on the uncertainty probability density function of the uncertainty data. For each data set, this information includes the preset initial centroid of the data set, recalculated with the uncertainty data regarded as belonging to that data set, and the sum of expected squared errors of the uncertainty data relative to that data set. This sum combines the expected squared error from the uncertainty data to the recalculated preset initial centroid of the data set with the sum of the expected squared errors from the uncertainty data to the preset initial centroids of the other data sets. The data set with the smallest sum of expected squared errors is then determined as the target data set, and the uncertainty data is assigned to the target data set. This clusters the uncertainty data based on its uncertainty probability density function and improves the accuracy of uncertainty data clustering.
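The idea of assigning uncertainty data by expected squared error can be illustrated with a simplified sketch. It is not the claimed method: it assumes the uncertainty probability density function is represented by samples drawn from it, estimates the expected squared error by averaging over those samples, and omits the centroid recalculation and the summation over other data sets that the abstract describes. The function names and data are hypothetical.

```python
from typing import Sequence

Point = Sequence[float]


def expected_squared_error(samples: Sequence[Point], centroid: Point) -> float:
    """Sample-based estimate of E[||x - c||^2].

    Assumption: `samples` were drawn from the uncertainty data's
    probability density function.
    """
    total = 0.0
    for s in samples:
        total += sum((si - ci) ** 2 for si, ci in zip(s, centroid))
    return total / len(samples)


def assign_to_target(samples: Sequence[Point],
                     centroids: Sequence[Point]) -> int:
    """Index of the data set whose centroid gives the smallest expected squared error."""
    errors = [expected_squared_error(samples, c) for c in centroids]
    return min(range(len(centroids)), key=errors.__getitem__)


# Hypothetical uncertain object: four position samples scattered around (1, 1).
samples = [(0.9, 1.1), (1.1, 0.9), (1.0, 1.0), (1.2, 1.0)]
centroids = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
print(assign_to_target(samples, centroids))  # 1
```

Averaging over samples is only one way to evaluate the expectation. When the density has a closed form, the integral could be computed directly instead.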

Description

Technical field

[0001] The invention belongs to the technical field of data processing, and in particular relates to a data clustering method and device.

Background

[0002] Due to inaccurate measurements, sampling errors, outdated data sources, or other reasons, data are often uncertain (uncertainty data for short), especially in applications that require interaction with the real environment, such as mobile location services and sensor monitoring. Taking the tracking of moving targets (such as vehicles or people) in mobile positioning services as an example, it is impossible to track the exact instantaneous positions of all moving targets, so the position change process of each moving target is accompanied by uncertainty. This uncertainty affects data management tasks such as data query and data clustering.

[0003] There are two types of uncertainty in cur...

Application Information

Patent Type & Authority: Patents (China)

IPC(8): G06F16/906, G06K9/62

CPC: G06F18/23213

Inventor: 陈力铭, 叶朱荪, 张峰, 马新杰

Owner: 深圳软通动力科技有限公司
