Data clustering method and apparatus, computer readable medium and electronic device

A data clustering technology, applied in the field of data processing, that addresses problems such as slow convergence, failure to detect outliers, and failure to obtain effective clustering results, with the effect of reducing the time spent on clustering and avoiding the consequences of poorly chosen initial cluster centers.

Inactive Publication Date: 2017-08-18
BEIJING JINGDONG SHANGKE INFORMATION TECH CO LTD +1
Cites: 3; Cited by: 25

AI Technical Summary

Problems solved by technology

[0006] (2) In the k-means algorithm, the initial cluster centers must be chosen manually. Different initial cluster centers may lead to completely different clustering results, and if the initial values are poorly chosen, effective clustering results may not be obtained;
[0007] (3) The k-means algorithm is sensitive to outliers and cannot detect them, and outliers sometimes have a great impact on the accuracy of the cluster centers;
[0008] (4) The k-means algorithm must repeatedly reassign samples and recompute the adjusted cluster centers, so convergence is slow and the time complexity of clustering is high; when the amount of data is very large, the time overhead of the algorithm is very large;
[0009] (5) The k-means algorithm needs to scan the full data set multiple times and cannot cluster real-time data.
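
The sensitivity to initial cluster centers described in [0006] can be illustrated with a small sketch. The code below is a plain-Python version of the standard k-means iteration, not code from the patent; the function names `sq_dist`, `kmeans` and `sse`, and the four sample points, are assumptions chosen for illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, init_centers, max_iter=100):
    """Standard k-means iteration from caller-supplied initial centers."""
    centers = [tuple(c) for c in init_centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(col) / len(members) for col in zip(*members))
                new_centers.append(mean)
            else:
                new_centers.append(old)  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return labels, centers


def sse(points, labels, centers):
    """Sum of squared distances from each point to its assigned center."""
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))


# Hypothetical data: corners of a 4 x 1 rectangle, clustered with k = 2.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
labels_a, centers_a = kmeans(points, [(0, 0), (4, 0)])  # left/right start
labels_b, centers_b = kmeans(points, [(2, 0), (2, 1)])  # bottom/top start
print(labels_a, sse(points, labels_a, centers_a))
print(labels_b, sse(points, labels_b, centers_b))
```

With the first pair of initial centers the points split into a left and a right cluster (total squared error 1). With the second pair the iteration stops immediately at a bottom/top split (total squared error 16), a much worse result reached from the same data.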

Embodiment Construction

[0037] Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete and fully convey the concept of example embodiments to those skilled in the art.

[0038] Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided in order to give a thorough understanding of embodiments of the invention. However, those skilled in the art will appreciate that the technical solutions of the present invention may be practiced without one or more of the specific details, or other methods, components, means, steps, etc. may be employed. In other instances, well-known methods, apparatus, implemen...

Abstract

The invention provides a data clustering method and apparatus, a computer readable medium and an electronic device. The data clustering method comprises the steps of: obtaining a data set to be clustered; calculating the distance between each piece of data in the data set and the cluster center of each existing type; if the distance between a piece of data and the cluster center of an existing type is smaller than or equal to a distance threshold, assigning that piece of data to that type; and if the distances between a piece of data and the cluster centers of all existing types are all greater than the distance threshold, creating a new type and assigning that piece of data to the new type. With this technical scheme, neither the number of clusters nor the cluster centers need to be specified in advance, so the adverse effect of a wrongly chosen initial cluster center on the final clustering result is avoided and the time taken by the clustering process can be shortened.
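
The steps in the abstract can be sketched in a single pass over the data. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract does not say which type is chosen when several centers lie within the threshold, nor how a center changes when data is added, so the sketch assumes the nearest center is used and that each center is kept as the running mean of its members. The name `threshold_cluster` and the sample data are hypothetical.

```python
import math


def threshold_cluster(data, threshold):
    """Single-pass clustering driven by a distance threshold.

    Assumptions (not stated in the abstract): the nearest existing
    center is the one tested against the threshold, and a center is
    updated as the running mean of the points assigned to it.
    """
    centers = []  # one cluster center per existing type
    counts = []   # number of points assigned to each type
    labels = []   # type index assigned to each input point
    for point in data:
        # Find the nearest existing center, if any type exists yet.
        best, best_dist = None, math.inf
        for j, center in enumerate(centers):
            d = math.dist(point, center)
            if d < best_dist:
                best, best_dist = j, d
        if best is not None and best_dist <= threshold:
            # Within the threshold: assign the point to that type.
            counts[best] += 1
            n = counts[best]
            centers[best] = tuple(
                c + (x - c) / n for c, x in zip(centers[best], point)
            )
            labels.append(best)
        else:
            # Farther than the threshold from every center: new type.
            centers.append(tuple(float(x) for x in point))
            counts.append(1)
            labels.append(len(centers) - 1)
    return labels, centers


# Hypothetical data and threshold, chosen only for illustration.
data = [(0, 0), (0, 1), (5, 5), (5, 6), (0.5, 0.5)]
labels, centers = threshold_cluster(data, threshold=2.0)
print(labels)        # [0, 0, 1, 1, 0]
print(len(centers))  # 2
```

Because each point is visited once and compared only with the current centers, the number of types emerges from the data and the threshold rather than being fixed in advance, and newly arriving data can be handled by continuing the same loop.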

Description

Technical field

[0001] The present invention relates to the technical field of data processing, in particular to a data clustering method, apparatus, computer readable medium and electronic device.

Background

[0002] In the process of building a user portrait label model, after user features are extracted and the feature data is standardized, there are many scenarios in which labels are built by clustering, such as promotion sensitivity clustering, comment sensitivity clustering, and user loyalty clustering. Clustering divides the user set into different classes or clusters according to a given criterion over the corresponding user features, so that the similarity of user features within the same class or cluster is as large as possible (that is, the distance between them is as small as possible), while the difference between user features in different classes or clusters is also as large as possible. In short, after...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62, G06F17/30
CPC: G06F16/285, G06F18/23213
Inventor: 李树海
Owner: BEIJING JINGDONG SHANGKE INFORMATION TECH CO LTD