
An integrated three-way clustering method based on a Spark platform and employing two-evaluation weight selection

A clustering method and clustering algorithm technology, applied in the fields of instruments, character and pattern recognition, computer components, etc. It addresses the problems that conventional clustering cannot show data objects that may belong to a cluster and cannot intuitively show the influence of objects on the construction of clusters, and it achieves improved algorithm efficiency, high robustness, and strong scalability.

Active Publication Date: 2017-12-15
CHONGQING UNIV OF POSTS & TELECOMM
5 Cites, 15 Cited by

AI Technical Summary

Problems solved by technology

The disadvantage of a hard, binary division of data objects is that it cannot represent objects that only possibly belong to a cluster, so it cannot intuitively show the influence of such objects on the construction of clusters.




Embodiment Construction

[0046] The technical solutions in the embodiments of the present invention will be described clearly and in detail below with reference to the drawings in the embodiments of the present invention. The described embodiments are only some of the embodiments of the invention.

[0047] The technical solution by which the present invention solves the above technical problems is as follows:

[0048] Figure 1 is a flow chart of the integrated three-way clustering method based on the Spark platform and employing two-evaluation weight selection proposed by the present invention. In the custom partitioning stage, the input data set is partitioned. The Spark-based K-Means clustering algorithm is then run, with the initial number of clusters and the number of iterations set, to generate the initial cluster members. The labels of the initial cluster members are aligned, and new cluster members are then selected through two evaluations, where the first evaluation is to find a refer...
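The label alignment step mentioned in [0048] can be illustrated with a small sketch. The excerpt does not say how the patent aligns labels, so the approach here is an assumption: each member's cluster ids are permuted to maximize overlap with a reference labeling. The function name `align_labels` is hypothetical, and the brute-force search over permutations is only practical for a small number of clusters.

```python
from itertools import permutations
from typing import List, Sequence


def align_labels(
    reference: Sequence[int], labels: Sequence[int], n_clusters: int
) -> List[int]:
    """Relabel `labels` so that its cluster ids agree with `reference` as much as possible.

    Assumption (not from the patent text): alignment maximizes the number of
    objects on which the two labelings agree, found by trying every permutation.
    """
    # overlap[a][b] = number of objects labelled a by this member and b by the reference
    overlap = [[0] * n_clusters for _ in range(n_clusters)]
    for ref_label, own_label in zip(reference, labels):
        overlap[own_label][ref_label] += 1

    # Pick the mapping own_label -> reference_label with the largest total overlap.
    best = max(
        permutations(range(n_clusters)),
        key=lambda perm: sum(overlap[a][perm[a]] for a in range(n_clusters)),
    )
    return [best[own_label] for own_label in labels]


reference = [0, 0, 1, 1, 2, 2]
member = [2, 2, 0, 0, 1, 1]  # same grouping as the reference, different ids
aligned = align_labels(reference, member, n_clusters=3)
```

For larger numbers of clusters, an assignment solver such as the Hungarian algorithm would be the usual replacement for the permutation search.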


Abstract

The invention relates to an integrated three-way clustering method based on a Spark platform and employing two-evaluation weight selection. The method mainly comprises the following steps: 1, partitioning and managing the big data so as to form a corresponding resilient distributed dataset (RDD); 2, using a Spark-based K-Means clustering algorithm to cluster the data of each partition, thus forming a plurality of different clustering members; 3, using two evaluations to build a novel evaluation function and weight selection strategy, selecting among the clustering members, deleting clustering results with poor clustering quality, and forming new clustering members; 4, integrating the clustering members, building a weighted voting matrix, dividing clusters according to three-way decision rules, and finally obtaining a three-way clustering result. The method can greatly reduce the running time of the algorithm, thus improving its efficiency.
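Step 4 of the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's actual formulas: the member weights are taken as given, each entry of the voting matrix is the share of total weight that assigns an object to a cluster, and the thresholds `alpha` and `beta` are hypothetical parameters. Objects with a vote share of at least `alpha` go to a cluster's core region, those between `beta` and `alpha` go to its fringe region, and the rest are treated as not belonging to that cluster.

```python
from typing import Dict, List, Sequence, Set, Tuple


def weighted_vote_matrix(
    labelings: Sequence[Sequence[int]],
    weights: Sequence[float],
    n_clusters: int,
) -> List[List[float]]:
    """votes[i][k] = share of total member weight that puts object i in cluster k."""
    total = float(sum(weights))
    n_objects = len(labelings[0])
    votes = [[0.0] * n_clusters for _ in range(n_objects)]
    for labels, weight in zip(labelings, weights):
        for i, k in enumerate(labels):
            votes[i][k] += weight
    return [[v / total for v in row] for row in votes]


def three_way_partition(
    votes: Sequence[Sequence[float]], alpha: float = 0.7, beta: float = 0.3
) -> Tuple[Dict[int, Set[int]], Dict[int, Set[int]]]:
    """Split objects into core and fringe regions per cluster (thresholds are assumptions)."""
    n_clusters = len(votes[0])
    core: Dict[int, Set[int]] = {k: set() for k in range(n_clusters)}
    fringe: Dict[int, Set[int]] = {k: set() for k in range(n_clusters)}
    for i, row in enumerate(votes):
        for k, share in enumerate(row):
            if share >= alpha:
                core[k].add(i)      # object definitely belongs to cluster k
            elif share >= beta:
                fringe[k].add(i)    # object possibly belongs to cluster k
    return core, fringe


# Three aligned cluster members over four objects, equally weighted.
labelings = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
votes = weighted_vote_matrix(labelings, weights=[1.0, 1.0, 1.0], n_clusters=2)
core, fringe = three_way_partition(votes)
```

In this toy input the members disagree only about object 1, so it lands in the fringe of both clusters while the other objects fall into a core region.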

Description

Technical field

[0001] The invention belongs to the technical field of big data processing and data mining, and in particular relates to a Spark-based three-way selective integration method and to the field of three-way decisions.

Background

[0002] With the rapid development of social informatization and networking, data is growing explosively every day. As massive data is generated, big data has attracted more and more attention. The data generated and accumulated in daily operations in the medical, bioscience, financial, Internet and other fields can no longer be measured in GB or TB. As of 2012, the amount of data has jumped from the TB level to the PB, EB or even ZB level. These data contain a lot of value, and the new information and knowledge that can be obtained by analyzing and mining them will have a wide range of applications in various fields, such as e-commerce, O2O, logistics and distribution, etc., all...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06K9/62
CPC: G06F18/23213
Inventor: 于洪, 陈云, 胡峰, 王国胤, 胡军
Owner: CHONGQING UNIV OF POSTS & TELECOMM