Clustering method for high dimensional data based on Bayes mixed common factor analyzer
A technology of high-dimensional data and common factors, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc.
CN103226595A | Inactive | Publication Date: 2013-07-31 | INFORMATION & COMM BRANCH OF STATE GRID JIANGSU ELECTRIC POWER
Examples
Embodiment Construction
[0095] To better illustrate the high-dimensional data clustering method based on the Bayesian Mixed Common Factor Analyzer (BMCFA) of the present invention, the method is applied to the clustering of high-dimensional gene expression data in the field of bioinformatics. The data to be clustered are the 248 preprocessed tissue samples provided by Yeoh et al. (E. J. Yeoh et al., "Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling," Cancer Cell, vol. 1, no. 2, pp. 133-143, 2002), and each sample has dimension 50; that is, N = 248 and p = 50.
[0096] This application has 6 classes. The class names and the number of samples in each class are: MLL (20 samples), T-ALL (43 samples), Hyperdip (64 samples), TEL-AML1 (79 samples), E2A-PBX1 (27 samples), BCR-ABL (15 samples). Assume that the number of clusters and specific conditions are not known before clustering, and ...
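The class sizes listed in [0096] can be checked against N = 248, and they also give a reference labeling for scoring a clustering result. The sketch below is illustrative only: the excerpt does not say which evaluation measure the patent uses, so the `purity` function and the merged-cluster example are assumptions introduced here, not part of the described method.

```python
from collections import Counter, defaultdict

# Class sizes as listed in paragraph [0096].
class_sizes = {
    "MLL": 20,
    "T-ALL": 43,
    "Hyperdip": 64,
    "TEL-AML1": 79,
    "E2A-PBX1": 27,
    "BCR-ABL": 15,
}
N, p = 248, 50  # number of samples and dimension, from paragraph [0095]

total = sum(class_sizes.values())
assert total == N  # the six class sizes account for every sample


def purity(true_labels, cluster_labels):
    """Hypothetical score: fraction of samples that belong to the
    majority class of the cluster they were assigned to."""
    by_cluster = defaultdict(Counter)
    for truth, cluster in zip(true_labels, cluster_labels):
        by_cluster[cluster][truth] += 1
    matched = sum(c.most_common(1)[0][1] for c in by_cluster.values())
    return matched / len(true_labels)


# Reference labeling: one entry per sample, grouped by class.
true_labels = [name for name, size in class_sizes.items() for _ in range(size)]

# Made-up clustering result in which BCR-ABL is absorbed into Hyperdip.
merged = ["Hyperdip" if t == "BCR-ABL" else t for t in true_labels]

print(purity(true_labels, true_labels))
print(purity(true_labels, merged))
```

A perfect clustering scores 1.0; the merged example loses exactly the 15 BCR-ABL samples, which shows how a small class can be hidden inside a larger one without changing the score much.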
Abstract
The invention discloses a clustering method for high-dimensional data based on a Bayesian mixed common factor analyzer. The method comprises the following steps: first, a Bayesian mixed common factor analyzer model is built for the high-dimensional data to be clustered; second, the posterior distributions of the model's random variables are inferred, yielding the statistics associated with those variables; and finally, the category to which each high-dimensional datum belongs is determined, which completes the clustering. The Bayesian mixed common factor analyzer model built by the invention is highly flexible. Because the inference procedure is based on the Bayesian criterion, overfitting and the curse of dimensionality can be effectively avoided. The method automatically adjusts the structure of the model to the high-dimensional data, so that the optimal number of categories is determined automatically and clustering is completed while dimensionality reduction is performed, giving excellent clustering performance and computational efficiency.
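The final step of the abstract, deciding which category each datum belongs to, can be sketched as follows. This is a minimal illustration, assuming the decision is made by picking the component with the largest posterior probability for each sample; the excerpt does not give the patent's exact decision rule, and the responsibility values and function names below are made up for the example.

```python
def assign_clusters(responsibilities):
    """Assumed decision rule: for each sample, return the index of the
    mixture component with the highest posterior probability."""
    return [max(range(len(row)), key=row.__getitem__) for row in responsibilities]


def occupied_components(labels):
    """Indices of the components that received at least one sample."""
    return sorted(set(labels))


# Hypothetical posterior probabilities: 4 samples, 3 candidate components.
# Each row sums to 1. These numbers are illustrative, not patent results.
resp = [
    [0.90, 0.08, 0.02],
    [0.15, 0.80, 0.05],
    [0.70, 0.25, 0.05],
    [0.10, 0.85, 0.05],
]

labels = assign_clusters(resp)
used = occupied_components(labels)
print(labels, used)
```

In this toy input the third component never wins a sample, so only two categories remain occupied. That is one simple way to read the claim that the number of categories is determined from the data rather than fixed in advance.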
Description
Technical field
[0001] The invention relates to a clustering method based on the Bayesian mixed common factor analyzer, and belongs to the field of processing methods and application techniques for high-dimensional data.
[0002] Background technique
[0003] With the continuous development of data collection and storage technology, high-dimensional and ultra-high-dimensional data continue to emerge. Examples include the tens of thousands of face images and hundreds of thousands of web texts common in image retrieval and document search, the high-dimensional vectors that inevitably arise in speech and audio signal processing, the high-dimensional expression data subjected to cluster analysis, and so on. Obviously, the higher the number of dimensions (that is, the more attributes an object has), the more comprehensively the object can be characterized and the better objects can be distinguished from one another. However, when the number of data samples is not large, an excessive number of dimensions inevitably poses a severe challenge to the processing of ...
Claims