Clustering method for high dimensional data based on Bayes mixed common factor analyzer

A technology involving high-dimensional data and common factors, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc.

Inactive Publication Date: 2013-07-31

INFORMATION & COMM BRANCH OF STATE GRID JIANGSU ELECTRIC POWER

Cites: 2; Cited by: 16


AI Technical Summary

Problems solved by technology

However, methods based on the mixture of factor analyzers (MFA) still have limitations for high-dimensional data processing, particularly when used for clustering.

First, in MFA each mixture component has its own factor loading matrix, so the total number of model parameters is large. Because existing MFA performs model inference and parameter estimation under the maximum-likelihood criterion, overfitting is likely when the number of samples of high-dimensional data is small. Second, and most importantly, in most data-clustering applications the number of categories is not known in advance. Setting it too high or too low degrades the accuracy of the final clustering result, and the problem becomes even harder for high-dimensional data. How to adaptively determine the optimal number of categories from high-dimensional data while reducing its dimensionality, and thereby obtain better clustering performance, is a key difficulty in high-dimensional data clustering techniques and methods.
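The overfitting argument above turns on parameter counts. The sketch below compares the number of free parameters of an MFA, in which every component has its own loading matrix, with a mixture of common factor analyzers (MCFA), in which the loading matrix and noise variances are shared by all components. The MCFA parameterization used here (shared p x q loadings A and diagonal noise D, per-component factor means xi_k and factor covariances Omega_k) and the identifiability conventions in the counts are assumptions about what "common factor" means, not details taken from this patent; the latent dimensions q are illustrative.

```python
def mfa_free_params(p: int, q: int, k: int) -> int:
    """Free parameters of an MFA where every component has its own mean,
    p x q loading matrix and diagonal noise variances (assumed convention)."""
    weights = k - 1                       # mixing proportions sum to one
    per_component = (
        p                                 # component mean mu_k
        + p * q - q * (q - 1) // 2        # loadings, minus rotational freedom
        + p                               # diagonal noise variances Psi_k
    )
    return weights + k * per_component


def mcfa_free_params(p: int, q: int, k: int) -> int:
    """Free parameters of an assumed MCFA: one p x q loading matrix A and one
    diagonal noise matrix D shared by all components; each component keeps a
    q-dim factor mean xi_k and a symmetric q x q factor covariance Omega_k."""
    weights = k - 1
    shared = p + q * (p - q)              # D plus A (identified up to a q x q transform)
    per_component = q + q * (q + 1) // 2  # xi_k and Omega_k
    return weights + shared + k * per_component


if __name__ == "__main__":
    p, k = 50, 6                          # dimension and class count of the example in [0095]-[0096]
    for q in (2, 5, 10):                  # hypothetical latent dimensions
        print(f"q={q}: MFA={mfa_free_params(p, q, k)}, MCFA={mcfa_free_params(p, q, k)}")
```

With p = 50 and K = 6 as in the example of [0095]-[0096] and a hypothetical q = 5, the MFA needs 2045 free parameters while the MCFA needs 400, against only N = 248 samples. Sharing the loading matrix is what keeps the count from growing with K times p times q.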

Method used



Embodiment Construction

[0095] To illustrate the high-dimensional data clustering method of the present invention, which is based on the Bayesian Mixed Common Factor Analyzer (BMCFA), the method is applied to clustering high-dimensional gene expression data in bioinformatics. The data to be clustered are the 248 preprocessed tissue samples provided by Yeoh et al., each of dimension 50 (E. J. Yeoh et al., "Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling," Cancer Cell, vol. 1, no. 2, pp. 133-143, 2002), i.e., N = 248 and p = 50.

[0096] The data contain 6 classes. The class names and the number of samples in each are: MLL (20 samples), T-ALL (43 samples), Hyperdip (64 samples), TEL-AML1 (79 samples), E2A-PBX1 (27 samples), and BCR-ABL (15 samples). Assume that the number of clusters and the specific class structure are not known before clustering, and ...


Abstract

The invention discloses a clustering method for high-dimensional data based on a Bayesian mixed common factor analyzer. The method comprises the following steps: first, a Bayesian mixed common factor analyzer model is built for the high-dimensional data to be clustered; second, the posterior distributions of the model's random variables are inferred, yielding the statistics associated with those variables; finally, the category to which each data sample belongs is determined, completing the clustering process. The Bayesian mixed common factor analyzer model built by the invention is highly flexible. Because the inference procedure follows the Bayesian criterion, overfitting and the curse of dimensionality can be effectively avoided. The method automatically adjusts the model to its optimal structure according to the high-dimensional data, so the optimal number of categories is determined automatically and clustering is completed while dimensionality is reduced, yielding excellent clustering performance and computational efficiency.
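The final step above (deciding which category each sample belongs to) and the automatic choice of the number of categories can be illustrated in simplified form. The sketch below is a plug-in approximation, not the patent's Bayesian procedure. It assumes an MCFA-style model in which component k generates y ~ N(A xi_k, A Omega_k A^T + D), takes point estimates (for example posterior means) of the mixing weights and parameters as already computed, prunes components whose weight fell below a hypothetical threshold `min_weight`, and assigns each sample to the remaining component with the largest log pi_k + log N(y; A xi_k, A Omega_k A^T + D). The function names and the threshold are hypothetical.

```python
import numpy as np


def gaussian_logpdf(y, mean, cov):
    """Row-wise log N(y; mean, cov), computed through a Cholesky factor."""
    chol = np.linalg.cholesky(cov)                   # cov = L L^T
    z = np.linalg.solve(chol, (y - mean).T)          # (p, n); z = L^{-1}(y - mean)
    maha = np.sum(z ** 2, axis=0)                    # squared Mahalanobis distances
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    p = y.shape[1]
    return -0.5 * (p * np.log(2.0 * np.pi) + log_det + maha)


def assign_clusters(y, weights, A, xi, omega, D, min_weight=1e-3):
    """Hard labels under an assumed common-factor mixture.

    y: (n, p) data; weights: (K,) mixing weights; A: (p, q) shared loadings;
    xi: (K, q) factor means; omega: (K, q, q) factor covariances;
    D: (p,) shared diagonal noise variances.
    Returns (labels, kept): labels index into the kept component ids."""
    kept = np.flatnonzero(weights >= min_weight)     # drop near-empty components
    scores = np.empty((y.shape[0], kept.size))
    for j, k in enumerate(kept):
        mean = A @ xi[k]                             # component mean A xi_k
        cov = A @ omega[k] @ A.T + np.diag(D)        # A Omega_k A^T + D
        scores[:, j] = np.log(weights[k]) + gaussian_logpdf(y, mean, cov)
    return np.argmax(scores, axis=1), kept
```

The number of surviving components, `kept.size`, plays the role of the automatically determined number of categories. In a full variational Bayesian treatment the scores would use expectations under the posterior rather than plug-in values; the pruning step only mimics the tendency of a Bayesian mixture started with too many components to drive the weights of unneeded components toward zero.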

Description

Technical field
[0001] The invention relates to a clustering method based on the Bayesian mixed common factor analyzer; it is a processing method and application technique for high-dimensional data.
[0002] Background art
[0003] With the continuous development of data collection and storage technology, high-dimensional and ultra-high-dimensional data continue to emerge. Examples include the tens of thousands of face images common in image retrieval, the high-dimensional vectors that inevitably arise when hundreds of thousands of web texts are processed in document search, voice and audio signal processing, and the high-dimensional expression data used in cluster analysis. Clearly, the more dimensions the data have (the more attributes each object has), the more comprehensively the objects can be described and the better they can be distinguished. However, when the number of data samples is not large, excessively high dimensionality inevitably poses a severe challenge to the processing of ...
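The difficulty described in [0003] can be made concrete with the dimensions of the example in [0095]-[0096]. A full-covariance Gaussian component in p = 50 dimensions has p(p + 1)/2 = 1275 covariance parameters, and the smallest class (BCR-ABL, 15 samples) has fewer samples than dimensions, so its sample covariance is singular. The sketch below uses random Gaussian data, not the Yeoh et al. data, to show the rank of the sample covariance for several sample sizes; the helper name `sample_cov_rank` is hypothetical.

```python
import numpy as np


def sample_cov_rank(n: int, p: int, seed: int = 0) -> int:
    """Rank of the (ddof=1) sample covariance of n random p-dim samples."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, p))
    cov = np.cov(x, rowvar=False)            # (p, p) covariance of the columns
    return int(np.linalg.matrix_rank(cov))


if __name__ == "__main__":
    p = 50
    print("full-covariance parameters per component:", p * (p + 1) // 2)
    for n in (15, 79, 248):                  # smallest class, largest class, whole data set
        print(f"n={n}: rank={sample_cov_rank(n, p)} of {p}")
```

Centering removes one degree of freedom, so with n = 15 samples the rank is at most n - 1 = 14 and the 50 x 50 covariance cannot be inverted; with all 248 samples it has full rank. This is why per-component covariance models need the kind of low-dimensional factor structure the method relies on.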


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06F17/30

Inventors: 魏昕 (Wei Xin), 李宗辰 (Li Zongchen)

Owner: INFORMATION & COMM BRANCH OF STATE GRID JIANGSU ELECTRIC POWER
