A method and apparatus for data clustering

A data clustering technology in the field of data processing. It addresses the reduced clustering accuracy of existing methods, which do not consider the impact of data uncertainty on clustering, and thereby improves clustering accuracy.

Active Publication Date: 2018-12-14
深圳软通动力科技有限公司
Cites: 7 · Cited by: 2

AI Technical Summary

Problems solved by technology

[0004] Existing approaches either use the EM (Expectation Maximization) algorithm to fit a mixture density suitable for clustering uncertain data, or use the fuzzy C-means clustering algorithm. However, neither of these two data clustering methods considers the impact of uncertainty on clustering, which reduces clustering accuracy.

Method used

The structure of the environmentally friendly knitted fabric provided by the present invention; Figure 2: flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; Figure 3: parameter map of the yarn covering machine.

Examples

Embodiment Construction

[0060] Currently, the data clustering problem is to find a data set $C$ among the data sets $C_j$ ($j = 1, \dots, K$), where each data set $C_j$ is represented by its similarity-based mean $c_j$ (which can be regarded as the preset initial centroid of data set $C_j$). Different data clustering algorithms correspond to different objective functions, but the main idea is the same: minimize the distance between data in the same data set and maximize the distance between data in different data sets. Minimizing the distance between data in the same data set can also be regarded as minimizing the distance between each piece of data in the data set and the preset initial centroid of that data set.
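As an illustration of the within-set part of this idea, the sketch below assigns each piece of data to its nearest centroid and scores the result by the sum of squared distances from each piece of data to the centroid of its data set. The use of squared Euclidean distance and the function names `assign_to_nearest` and `within_cluster_sse` are assumptions for illustration, not taken from the patent; the sketch does not measure the between-set separation mentioned above.

```python
import numpy as np


def assign_to_nearest(X, centroids):
    """Return, for each row of X, the index of the closest centroid
    under squared Euclidean distance (a hard assignment)."""
    # Pairwise squared distances, shape (n_points, n_centroids).
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def within_cluster_sse(X, labels, centroids):
    """Sum of squared distances from each point to the centroid of the
    data set it is assigned to; clustering tries to keep this small."""
    return float(((X - centroids[labels]) ** 2).sum())


X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centroids = np.array([[0.0, 0.5], [10.0, 10.5]])
labels = assign_to_nearest(X, centroids)
print(labels.tolist(), within_cluster_sse(X, labels, centroids))  # [0, 0, 1, 1] 1.0
```

Each point here lies 0.5 from its centroid, so each contributes 0.25 to the total.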

[0061] The applicant starts from a hard clustering algorithm, the K-means clustering algorithm, to study clustering algorithms suitable for uncertain data. The purpose of the K-means algorithm is to find, from the K data sets, a data set C that minimizes...
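For reference, the standard hard K-means procedure on exact (certain) data can be sketched as Lloyd's iteration: assign every point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until the centroids stop moving. The patent's own objective is truncated in this excerpt, so the sum of squared Euclidean distances used here is an assumption, as are the names `nearest_centroid` and `kmeans` and the `max_iter`/`tol` parameters.

```python
import numpy as np


def nearest_centroid(X, centroids):
    """Hard assignment: index of the closest centroid for each point."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X, init_centroids, max_iter=100, tol=1e-9):
    """Lloyd's K-means on exact data, starting from preset initial centroids.

    Returns (labels, centroids, sse), where sse is the sum of squared
    distances from each point to the centroid of its data set.
    """
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(init_centroids, dtype=float).copy()
    for _ in range(max_iter):
        labels = nearest_centroid(X, centroids)
        # Recompute each centroid as the mean of its members;
        # keep the old centroid if a data set ends up empty.
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(len(centroids))
        ])
        converged = np.allclose(updated, centroids, rtol=0.0, atol=tol)
        centroids = updated
        if converged:
            break
    # Final assignment is consistent with the final centroids.
    labels = nearest_centroid(X, centroids)
    sse = float(((X - centroids[labels]) ** 2).sum())
    return labels, centroids, sse
```

For example, `kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], [[0, 0], [10, 10]])` moves the preset initial centroids to `(0, 0.5)` and `(10, 10.5)` after one update and stops on the next pass. Because each point is treated as an exact value, this procedure ignores any uncertainty in the data, which is the limitation the patent sets out to address.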

Abstract

A method and apparatus for data clustering are provided. The method comprises: when uncertain data to be clustered is obtained, calculating the information needed to cluster the uncertain data based on the uncertainty probability density function of the uncertain data. For example, based on the uncertainty probability density function of the uncertain data, the preset initial centroid of a data set is recalculated, and the sum of the expected squared error with respect to the recalculated preset initial centroid of that data set and the expected squared errors of the uncertain data with respect to the preset initial centroids of the other data sets is taken as the sum of expected squared errors of the uncertain data with respect to that data set. The data set with the minimum sum of expected squared errors is then determined as the target data set, and the uncertain data is classified into the target data set. In this way, the uncertain data is clustered based on its uncertainty probability density function, which improves the accuracy of clustering uncertain data.
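The assignment step in the abstract can be sketched roughly as follows. Several choices here are assumptions, not the patent's stated method: each uncertain object's probability density function is represented by samples drawn from it; expected squared errors are estimated by averaging over those samples; a data set's centroid is the mean of its members' expected positions (which minimizes the set's expected squared error); and a candidate data set is scored by the total expected squared error of all data sets after the object is tentatively added to that set and its centroid is recalculated. The abstract's wording of the score is ambiguous, so this is one possible reading. All function names and the `uncertain` helper are hypothetical.

```python
import numpy as np


def expected_sq_dist(samples, c):
    """Monte Carlo estimate of E[||X - c||^2] for an uncertain object X
    represented by samples drawn from its probability density function."""
    return float(np.mean(np.sum((samples - c) ** 2, axis=1)))


def centroid(objects):
    """Centroid minimizing expected squared error: the mean of the
    members' expected positions (their sample means)."""
    return np.mean([obj.mean(axis=0) for obj in objects], axis=0)


def expected_sse(objects):
    """Expected squared error of one data set about its own centroid."""
    if not objects:
        return 0.0
    c = centroid(objects)
    return sum(expected_sq_dist(obj, c) for obj in objects)


def assign_uncertain(new_obj, data_sets):
    """Tentatively add new_obj to each data set in turn, recalculate that
    set's centroid, and total the expected squared errors of all sets.
    Return (index of the target data set, its minimum total)."""
    base = [expected_sse(objs) for objs in data_sets]
    best_j, best_total = -1, float("inf")
    for j, objs in enumerate(data_sets):
        # Only data set j changes; the other sets keep their current error.
        total = sum(base) - base[j] + expected_sse(objs + [new_obj])
        if total < best_total:
            best_j, best_total = j, total
    return best_j, best_total


rng = np.random.default_rng(0)


def uncertain(mean, spread=0.5, n=200):
    # Hypothetical uncertain object: Gaussian samples around a mean position.
    return rng.normal(loc=mean, scale=spread, size=(n, 2))


data_sets = [[uncertain([0, 0]), uncertain([1, 0])],
             [uncertain([10, 10]), uncertain([11, 10])]]
j, total = assign_uncertain(uncertain([1, 1]), data_sets)
print(j)  # 0: the object joins the data set near the origin
```

Unlike the exact-data K-means update, the expected squared distance includes the spread of each object's samples, not only the distance of its mean. Only the candidate data set is recomputed for each trial, which keeps the cost to one set's error per candidate.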

Description

Technical field

[0001] The invention belongs to the technical field of data processing, and in particular relates to a data clustering method and device.

Background art

[0002] Due to measurement inaccuracy, sampling error, outdated data sources, or other reasons, data often carries uncertainty (such data is referred to as uncertain data), especially in applications that must interact with the real environment, such as mobile positioning services and sensor monitoring. For example, when tracking moving targets (such as vehicles or people) in a mobile positioning service, it is impossible to track the exact instantaneous positions of all moving targets, so the position change process of each moving target is accompanied by uncertainty. This uncertainty affects data management tasks such as data query and data clustering.

[0003] There are two types of uncertainty in current data: exis...

Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F17/30; G06K9/62
CPC: G06F18/23213
Inventors: 陈力铭, 叶朱荪, 张峰, 马新杰
Owner: 深圳软通动力科技有限公司