Differential privacy protection K-means clustering method based on silhouette coefficient

A K-means clustering technique based on the silhouette coefficient, applied in the field of information security. It addresses problems such as information leakage in big data cluster analysis, and its stated effects are good availability of clustering results, good algorithm stability, and increased execution time.

Status: Inactive | Publication Date: 2018-09-18
XIAN UNIV OF TECH
0 Cites, 7 Cited by

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to provide a K-means clustering method for differential privacy protection based on silhouette coefficients, which addresses the serious information leakage that occurs in prior-art big data cluster analysis in distributed environments.

Examples

Embodiment Construction

[0025] The present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0026] The process of the K-means clustering method for differential privacy protection based on silhouette coefficients in the present invention is shown in Figure 1. The specific steps are as follows:

[0027] Step 1. Divide the data set into M splits of equal size and run the Map task and the Reduce task on them. Let the data set be D and the total number of records in the data set be N. Each record is denoted $a_i$, where 1≤i≤N, and has dimension d. The number of clusters is K, and the kth center point is denoted $u_k$, where 1≤k≤K. The privacy budget is ε, and the random noise added to the kth cluster in the tth iteration is $noise_k^t$, where t is the iteration number;
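
The split, Map, and Reduce structure of Step 1 can be sketched in plain Python as below. This is a minimal single-machine illustration, not the patent's implementation: the function names `split_dataset`, `map_task`, and `reduce_task` are hypothetical, and the choice of having each Map task emit per-cluster partial sums and record counts is an assumption inferred from the sum and num quantities named in the abstract.

```python
import numpy as np

def split_dataset(D, M):
    """Split the N x d array D into M parts of (near-)equal size."""
    return np.array_split(D, M)

def map_task(split, centers):
    """Assign each record a_i in one split to its nearest center u_k and
    emit the per-cluster partial attribute sums and record counts."""
    K, d = centers.shape
    dists = np.linalg.norm(split[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    sums = np.zeros((K, d))
    nums = np.zeros(K, dtype=int)
    for k in range(K):
        members = split[labels == k]
        sums[k] = members.sum(axis=0)   # zeros if the cluster is empty here
        nums[k] = len(members)
    return sums, nums

def reduce_task(partials):
    """Combine the partial (sum, num) pairs emitted by all Map tasks."""
    sums = sum(p[0] for p in partials)
    nums = sum(p[1] for p in partials)
    return sums, nums

# Usage: 12 records of dimension d = 2, M = 3 splits, K = 2 centers.
D = np.arange(24, dtype=float).reshape(12, 2)
centers = np.array([[0.0, 1.0], [22.0, 23.0]])
partials = [map_task(s, centers) for s in split_dataset(D, 3)]
sums, nums = reduce_task(partials)
```

Because sums and counts are additive, the Reduce step only needs to add the partial results, which is what makes this pair of statistics convenient in a distributed setting.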

[0028] Step 2. Normalize all the data in the data set D, so that every point is mapped into [0,1] ...
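
The normalization formula is cut off in this excerpt. The sketch below assumes ordinary per-attribute min-max scaling, which is one common way to map every attribute into [0,1]; the function name `min_max_normalize` is hypothetical. Bounding each attribute this way also bounds how much a single record can change a cluster's attribute sum, which is what a noise-calibration step typically relies on.

```python
import numpy as np

def min_max_normalize(D):
    """Scale each attribute (column) of D into [0, 1].
    Assumption: min-max scaling; the source does not give the formula."""
    D = np.asarray(D, dtype=float)
    lo = D.min(axis=0)
    hi = D.max(axis=0)
    # Constant attributes would divide by zero; map them to 0 instead.
    span = np.where(hi > lo, hi - lo, 1.0)
    return (D - lo) / span

X = min_max_normalize([[1, 10], [3, 10], [2, 10]])
```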

Abstract

The invention discloses a differential privacy protection K-means clustering method based on the silhouette coefficient. The method comprises the following steps: 1, normalizing all data in a data set D; 2, evenly dividing the data set D of N records into K sets; 3, computing the attribute vector sum ($sum_k^0$) and the record count ($num_k^0$) of all records in each set $C_k$; 4, calculating the distance from each record ($a_i$) to the centers ($u_k$) of the K clusters; 5, calculating the record count (num) of each cluster and the attribute vector sum (sum) of all its records; 6, calculating the silhouette coefficient $S_k$ of each of the K clusters, and adding random noise ($noise_k^t$) to $num_k$ and $sum_k$; 7, calculating the new cluster centers; 8, calculating the distance between each new cluster center and the cluster center of the previous iteration, and terminating the algorithm if the distance is smaller than a threshold. The method addresses the severe information leakage of existing methods for big data cluster analysis in a distributed environment.
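
The eight steps can be sketched end to end as follows, for data that has already been normalized into [0,1] (step 1). This is a hedged illustration under several assumptions that the abstract does not confirm: the noise is drawn from a Laplace distribution, the budget ε is split uniformly over the initialization and `max_iter` iterations, the L1 sensitivity of the (sum, num) pair is taken as d + 1, and the stopping distance is the norm over all centers. The abstract also does not say how $S_k$ influences the noise, so here $S_k$ is only computed and returned. All function and parameter names are hypothetical.

```python
import numpy as np

def sum_num(D, labels, K):
    """Per-cluster attribute vector sums and record counts."""
    sums = np.array([D[labels == k].sum(axis=0) for k in range(K)])
    nums = np.array([(labels == k).sum() for k in range(K)], dtype=float)
    return sums, nums

def cluster_silhouettes(D, labels, K):
    """Mean silhouette coefficient S_k per cluster (O(N^2) memory)."""
    dist = np.linalg.norm(D[:, None, :] - D[None, :, :], axis=2)
    s = np.zeros(len(D))
    for i in range(len(D)):
        own = labels == labels[i]
        if own.sum() <= 1:
            continue  # singleton clusters keep silhouette 0
        a = dist[i, own].sum() / (own.sum() - 1)  # excludes dist[i, i] = 0
        others = [dist[i, labels == k].mean() for k in range(K)
                  if k != labels[i] and np.any(labels == k)]
        if others and max(a, min(others)) > 0:
            b = min(others)
            s[i] = (b - a) / max(a, b)
    return np.array([s[labels == k].mean() if np.any(labels == k) else 0.0
                     for k in range(K)])

def dp_kmeans(D, K, epsilon, max_iter=10, threshold=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    N, d = D.shape
    eps_t = epsilon / (max_iter + 1)  # assumption: uniform budget split
    scale = (d + 1) / eps_t           # assumption: Laplace, sensitivity d + 1

    def noisy_centers(sums, nums):
        # Steps 6-7: perturb num_k and sum_k, then divide to get centers.
        nums = nums + rng.laplace(0.0, scale, size=nums.shape)
        sums = sums + rng.laplace(0.0, scale, size=sums.shape)
        return np.clip(sums / np.maximum(nums, 1.0)[:, None], 0.0, 1.0)

    # Steps 2-3: split D evenly into K sets and derive initial centers.
    labels = np.empty(N, dtype=int)
    for k, idx in enumerate(np.array_split(np.arange(N), K)):
        labels[idx] = k
    centers = noisy_centers(*sum_num(D, labels, K))
    S = np.zeros(K)
    for t in range(max_iter):
        # Step 4: distance from each record a_i to each center u_k.
        dists = np.linalg.norm(D[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        sums, nums = sum_num(D, labels, K)        # step 5
        S = cluster_silhouettes(D, labels, K)     # step 6 (diagnostic here)
        new_centers = noisy_centers(sums, nums)   # steps 6-7
        shift = np.linalg.norm(new_centers - centers)  # step 8
        centers = new_centers
        if shift < threshold:
            break
    return centers, labels, S
```

In this sketch only the returned centers are covered by the noise; the labels and silhouette values are exact and are returned purely for inspection. With a small ε the Laplace scale grows and the centers become correspondingly noisy, so the availability of the clustering result depends directly on the budget.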

Description

Technical field

[0001] The invention belongs to the technical field of information security, and in particular relates to a K-means clustering method for differential privacy protection based on silhouette coefficients.

Background

[0002] As an important means of obtaining information in the current big data environment, data mining extracts useful information through methods such as statistics, machine learning, and pattern recognition. This information is widely used in business management, production control, market analysis, and scientific research. Cluster analysis is a typical data mining method. Its main idea is to group data into several clusters so that the difference between clusters is as large as possible and the difference among data within a cluster is as small as possible. K-means is a conceptually simple clustering algorithm that converges quickly and is widely used in various fields.

[0003] In big data clustering a...

Application Information

IPC(8): G06K9/62, G06F17/30
CPC: G06F18/23213
Inventors: 张亚玲, 刘娜
Owner: XIAN UNIV OF TECH