Initial clustering center determination method and device based on K-means clustering algorithm

A technique for determining initial clustering centers for a clustering algorithm, applied in the field of machine learning. It addresses problems such as clustering results falling into a local optimum, and improves clustering accuracy.

Inactive Publication Date: 2019-09-10
雷恩友力数据科技南京有限公司

AI Technical Summary

Problems solved by technology

[0004] The technical problem to be solved by the present invention is to provide a method and device for determining an initial cluster center based on the K-means clustering algorithm, so as to solve t


Examples

Embodiment 1

[0052] As shown in Figure 1, the method for determining initial cluster centers based on the K-means clustering algorithm provided by the embodiment of the present invention includes:

[0053] S101. Acquire a data object set, where the data object set includes: a microblog document set;

[0054] S102. Determine the average similarity between each data object in the data object set and other data objects, and obtain data objects whose average similarity is greater than or equal to a preset density threshold as core objects;

[0055] S103. Select a plurality of core objects most dissimilar to each other from the core objects as initial clustering centers of the K-means clustering algorithm, so that the K-means clustering algorithm performs clustering according to the obtained initial clustering centers.
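Steps S101 to S103 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the visible text does not say how similarity is measured or how the "most dissimilar" core objects are chosen, so the sketch assumes whitespace tokenisation, cosine similarity over term counts, and a greedy farthest-first selection. The names `cosine`, `select_initial_centers` and `example_docs` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (assumed measure)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_initial_centers(docs, k, density_threshold):
    """Return indices of k documents to use as initial K-means centers."""
    # S101: the data object set is a list of microblog texts.
    if len(docs) < 2:
        raise ValueError("need at least two data objects")
    vectors = [Counter(doc.lower().split()) for doc in docs]
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]

    # S102: average similarity of each object to all other objects;
    # objects at or above the density threshold become core objects.
    avg = [sum(sim[i][j] for j in range(n) if j != i) / (n - 1) for i in range(n)]
    core = [i for i in range(n) if avg[i] >= density_threshold]
    if len(core) < k:
        raise ValueError("fewer core objects than requested centers")

    # S103 (assumed greedy strategy): start from the densest core object,
    # then repeatedly add the core object least similar to those chosen so far.
    centers = [max(core, key=lambda i: avg[i])]
    while len(centers) < k:
        candidates = [i for i in core if i not in centers]
        centers.append(min(candidates,
                           key=lambda i: max(sim[i][c] for c in centers)))
    return centers


example_docs = [
    "rain flood storm", "rain flood warning", "rain storm warning",
    "goal match team", "goal match score", "goal team score",
    "lottery",  # isolated post: low average similarity, so not a core object
]
centers = select_initial_centers(example_docs, k=2, density_threshold=0.2)
```

With these example posts, the isolated post falls below the density threshold and the two selected centers come from different topics, which is the behaviour the steps above aim for.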

[0056] The method for determining the initial cluster center based on the K-means clustering algorithm described in the embodiment of the present invention obtains an averag...

Embodiment 2

[0096] The present invention also provides a specific implementation of a device for determining the initial cluster center based on the K-means clustering algorithm. Because this device corresponds to the specific implementation of the method described above, it can achieve the purpose of the present invention by performing the process steps of that method. Therefore, the explanations given for the specific implementation of the method for determining the initial cluster center based on the K-means clustering algorithm also apply to the specific implementation of the device for determining the initial cluster center based on the K-means cl...

Abstract

The invention provides a method and device for determining initial clustering centers based on the K-means clustering algorithm, which can discover hot topics of public opinion quickly and accurately from a large amount of microblog data. The method comprises: obtaining a data object set, wherein the data object set comprises a microblog document set; determining the average similarity between each data object in the data object set and the other data objects, and taking the data objects whose average similarity is greater than or equal to a preset density threshold as core objects; and selecting, from the core objects, a plurality of core objects that are most dissimilar to each other as the initial clustering centers of the K-means clustering algorithm, so that the K-means clustering algorithm performs clustering according to the obtained initial clustering centers. The invention relates to the field of machine learning.
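To illustrate the last step, where K-means clusters "according to the obtained initial clustering center", here is a minimal sketch of the standard Lloyd iteration started from caller-supplied centers rather than random ones. It assumes dense numeric vectors and squared Euclidean distance, neither of which is stated in the text; `kmeans_from_centers`, `points`, `labels` and `final_centers` are hypothetical names.

```python
def kmeans_from_centers(points, initial_centers, max_iter=100):
    """Run Lloyd's K-means iteration starting from the given centers."""
    centers = [list(c) for c in initial_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center
        # (squared Euclidean distance, an assumption of this sketch).
        new_labels = [
            min(range(len(centers)),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers will not move
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
# Pretend the density-based selection picked points 0 and 3 as initial centers.
labels, final_centers = kmeans_from_centers(points, [points[0], points[3]])
```

Because the starting centers already sit in different dense regions, the iteration only has to refine them, which is how a good initialisation helps avoid a poor local optimum.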

Description

Technical field

[0001] The invention relates to the field of machine learning, in particular to a method and device for determining an initial cluster center based on a K-means clustering algorithm.

Background

[0002] With the continuous advancement of media technology and the increasing diversification of information dissemination channels, society has entered the era of self-media, in which "everyone is a news disseminator". Netizens are keen to voice their opinions, especially since the rise of Weibo (microblogging), and can post anytime and anywhere from computers and mobile phones. Since the launch of new network applications such as Sina Weibo and Twitter, the numbers of registered users, monthly active users and daily users have grown rapidly, and public opinion on Weibo has become a highly influential type of Internet public opinion. How to quickly and effectively discover hot topics that netizens care about from massive data, so as to gui...

Application Information

IPC(8): G06F16/35, G06F17/27, G06K9/62
CPC: G06F16/35, G06F40/205, G06F18/23213
Inventor: 周成成, 杨兵强, 安凤平
Owner: 雷恩友力数据科技南京有限公司