Large-scale multi-view data self-dimension-reduction K-means algorithm and system

A K-means algorithm and multi-view technology, applied in the field of information processing, that addresses problems such as ignoring the complementary information between views.

Status: Inactive | Publication Date: 2020-01-17
CIVIL AVIATION UNIV OF CHINA
Cites: 0 | Cited by: 9

AI Technical Summary

Problems solved by technology

However, the K-means algorithm treats all features equally, which makes it very sensitive to redundant features and prone to clustering errors.
If dimensionality reduction is performed separately on each view, the complementary information between views is ignored.

Method used




Embodiment Construction

[0080] To further explain the content, features and effects of the present invention, the following embodiments are described in detail with reference to the accompanying drawings:

[0081] See Figure 1.

[0082] First preferred embodiment:

[0083] A self-dimension-reduction K-means algorithm for large-scale multi-view data, intended to improve the clustering performance of the traditional multi-view K-means algorithm on high-dimensional data. The algorithm fully considers the relationship between features and clustering targets and exploits the complementary information between different views: it achieves self-dimension reduction of high-dimensional data by finding the optimal subspace on each single view, and it uses non-negative matrix factorization (NMF) to reconstruct the loss function so that different views share the same clustering indicator matrix, thereby realizing multi-view information complementarity and completing the...
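The embodiment describes the algorithm only at this level. As a rough, hypothetical illustration of the two ingredients it names (a per-view subspace and one clustering indicator shared by all views), the Python sketch below alternates between (a) choosing, for each view, an orthonormal projection W onto the m leading eigenvectors of that view's between-cluster scatter matrix and (b) reassigning every sample to the cluster that minimizes its squared distance summed over all projected views. The shared label vector is equivalent to a shared n × k 0/1 indicator matrix. The function names, the scatter-matrix criterion for the subspace, the farthest-point initialization and the parameter m are assumptions made for illustration; the patent's actual objective and NMF-based update rules are not shown in this excerpt.

```python
import numpy as np


def _farthest_point_labels(data, k, rng):
    # Hypothetical initialization: pick k mutually distant samples as seeds
    # (on the concatenated views) and label every sample by its nearest seed.
    idx = [int(rng.integers(data.shape[0]))]
    for _ in range(k - 1):
        d2 = ((data[:, None, :] - data[idx][None, :, :]) ** 2).sum(axis=2)
        idx.append(int(d2.min(axis=1).argmax()))
    seeds = data[idx]
    return ((data[:, None, :] - seeds[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)


def multiview_subspace_kmeans(views, k, m, n_iter=50, seed=0):
    """Hypothetical sketch: shared labels across views + a per-view m-dim subspace.

    views: list of (n, d_v) arrays describing the same n samples.
    Returns an (n,) array of cluster labels shared by all views.
    """
    rng = np.random.default_rng(seed)
    labels = _farthest_point_labels(np.hstack(views), k, rng)
    n = labels.shape[0]
    for _ in range(n_iter):
        cost = np.zeros((n, k))
        for X in views:
            mu = X.mean(axis=0)
            d = X.shape[1]
            centroids = np.tile(mu, (k, 1))  # empty clusters stay at the global mean
            S_b = np.zeros((d, d))           # between-cluster scatter of this view
            for c in range(k):
                members = X[labels == c]
                if len(members) == 0:
                    continue
                centroids[c] = members.mean(axis=0)
                diff = (centroids[c] - mu)[:, None]
                S_b += len(members) * (diff @ diff.T)
            # eigh returns eigenvalues in ascending order, so the last m
            # eigenvectors span the directions that best separate the clusters.
            _, vecs = np.linalg.eigh(S_b)
            W = vecs[:, -m:]                 # (d_v, m) orthonormal projection
            Z, C = X @ W, centroids @ W      # projected samples and centroids
            cost += ((Z[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        new_labels = cost.argmin(axis=1)     # one assignment shared by all views
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```

For example, with two views of the same 150 samples, `multiview_subspace_kmeans([X1, X2], k=3, m=2)` returns one label per sample. Summing the per-view costs before the arg-min is what lets a sample that is ambiguous in one view be resolved by another, which is the kind of complementarity the paragraph refers to.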



Abstract

The invention relates to a large-scale multi-view data self-dimension-reduction K-means algorithm and system, and belongs to the technical field of information processing. The method comprises the following steps: 1, normalizing data with different features so that all values lie in the range [-1, 1]; 2, initialization; 3, optimizing the algorithm; and 4, running the algorithm on a data set until it converges to obtain the final clustering result, measuring the clustering effect by mutual information and purity, repeating step 3 with different initial values, and averaging the results to complete the experiment. The method fully considers the relationship between features and clustering targets and exploits the complementary information between different views: self-dimension reduction of high-dimensional data is achieved by searching for an optimal subspace on each single view, and the loss function is reconstructed through non-negative matrix factorization (NMF) so that the different views share the same clustering indication matrix. Multi-view information complementation is thereby achieved, and the clustering of large-scale multi-view data is completed.
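Steps 1 and 4 of the abstract can be sketched with NumPy. This is a minimal illustration, not the patent's code: it assumes per-feature min-max scaling for the [-1, 1] normalization, treats the information-theoretic score named in step 4 as normalized mutual information (NMI, here with the geometric-mean normalization, one of several common variants), and the function names and the `cluster_fn(seed)` callback are hypothetical.

```python
import numpy as np


def scale_to_unit_range(X):
    """Step 1 (sketch): min-max scale every feature (column) into [-1, 1]."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid 0/0 on constant columns
    out = 2.0 * (X - lo) / span - 1.0
    out[:, hi == lo] = 0.0                  # constant features are mapped to 0
    return out


def _contingency(true, pred):
    # Rows: ground-truth classes, columns: predicted clusters.
    _, t = np.unique(np.asarray(true), return_inverse=True)
    _, p = np.unique(np.asarray(pred), return_inverse=True)
    table = np.zeros((t.max() + 1, p.max() + 1))
    np.add.at(table, (t, p), 1)
    return table


def purity(true, pred):
    """Fraction of samples belonging to the majority class of their cluster."""
    table = _contingency(true, pred)
    return table.max(axis=0).sum() / table.sum()


def nmi(true, pred):
    """Mutual information normalized by sqrt(H(true) * H(pred))."""
    pxy = _contingency(true, pred)
    pxy /= pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    mi = (pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])).sum()
    hx, hy = -(px * np.log(px)).sum(), -(py * np.log(py)).sum()
    return mi / np.sqrt(hx * hy) if hx > 0 and hy > 0 else 0.0


def mean_scores(cluster_fn, true, seeds):
    """Step 4 (sketch): rerun with different initial values, average (NMI, purity)."""
    scores = [(nmi(true, pred), purity(true, pred))
              for pred in (cluster_fn(s) for s in seeds)]
    return tuple(np.mean(scores, axis=0))
```

Here `cluster_fn` stands for one complete clustering run for a given seed; averaging over several seeds reduces the influence of any single initialization, as the abstract describes.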

Description

Technical field

[0001] The invention belongs to the technical field of information processing, and in particular relates to a large-scale multi-view data self-dimension-reduction K-means algorithm and system.

Background art

[0002] In the era of big data, data about the same entity may come from different sources and can therefore be expressed in multiple forms, each with a different distribution, scale and density. For example, when describing a person speaking, voice, facial expression and lip movement can all serve as features; in image recognition, various features such as color, contour and key points can be extracted from an image. Heterogeneous feature data generated by observing the same object from different angles is called multi-view data.

[0003] Unsupervised clustering of large-scale data is one of the main tasks of data mining. Studies have shown that effective information complementation between multi-view data can greatly improve clus...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62
CPC: G06F18/23213; G06F18/213
Inventors: 曹卫东 (Cao Weidong), 蔡浩天 (Cai Haotian), 王怀超 (Wang Huaichao)
Owner: CIVIL AVIATION UNIV OF CHINA