Random forest visualized data analysis method based on LargeVis

A random forest and data analysis technology, applied in the fields of big data analysis, pattern recognition, and machine learning. It addresses problems such as low classification precision and the reduced efficiency of t-SNE on large-scale data, and achieves improved running speed, good adaptability to different data sets, and high reliability and usability.

Active Publication Date: 2018-07-13
FUZHOU UNIV

AI Technical Summary

Problems solved by technology

[0003] At present, the t-SNE algorithm from manifold learning is widely used, but it has the following shortcomings. First, when dealing with large-scale high-dimensional data, the efficiency of t-SNE (including its improved variants) drops significantly. Second, the parameters of t-SNE are sensitive to the data set: parameters tuned to give a good visualization on one data set often do not carry over to another, and finding suitable parameters takes a lot of time, which severely limits the overall classification model. Third, when raw high-dimensional data is reduced in dimension and fed directly into model training and classification, accuracy is low and training time is long.
In addition, current dimensionality reduction methods generally reduce the original data directly and then classify it with an existing model, which can lead to low accuracy and to reduced data that is hard to interpret.


Examples

Embodiment Construction

[0037] The technical solution of the present invention will be specifically described below in conjunction with the accompanying drawings.

[0038] The LargeVis-based random forest visualization data analysis method of the present invention is carried out in the following steps:

[0039] Step S1: preprocessing the training data set;

[0040] Step S2: using a random forest to extract the sample features whose proportion in the training data set is greater than a preset proportion threshold;

[0041] Step S3: using LargeVis for dimensionality reduction;

[0042] Step S4: performing visualization processing with the LargeVis-based random forest.
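
As a minimal sketch of how the feature selection in step S2 might look, the snippet below keeps the features whose share of the total importance exceeds a threshold. It assumes that "proportion" means a feature's normalized importance score, for example one taken from a trained random forest (scikit-learn exposes such scores as `feature_importances_`). The function names, the sample scores, and the threshold of 0.10 are hypothetical and do not come from the patent text.

```python
from typing import List, Sequence


def select_important_features(importances: Sequence[float], threshold: float) -> List[int]:
    """Return indices of features whose share of total importance exceeds threshold."""
    total = sum(importances)
    if total <= 0:
        raise ValueError("importances must sum to a positive value")
    # Normalize so that each score is a proportion of the total importance.
    shares = [value / total for value in importances]
    return [i for i, share in enumerate(shares) if share > threshold]


def project_columns(rows: Sequence[Sequence[float]], keep: Sequence[int]) -> List[List[float]]:
    """Keep only the selected feature columns of every sample."""
    return [[row[i] for i in keep] for row in rows]


# Hypothetical importance scores for five features (assumed, not from the source).
importances = [0.40, 0.05, 0.30, 0.05, 0.20]
keep = select_important_features(importances, threshold=0.10)

# Two toy samples with five features each; only the kept columns survive.
rows = [[1.0, 2.0, 3.0, 4.0, 5.0], [6.0, 7.0, 8.0, 9.0, 10.0]]
reduced = project_columns(rows, keep)
print(keep, reduced)
```

The reduced rows would then be the input to the dimensionality reduction in step S3. LargeVis itself is not sketched here.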

[0043] In this embodiment, unbalanced data samples and outliers, which are common in practical applications, lead to poor classification results. Imbalanced training data sets cause many problems in pattern recognition. For example, if the data set is unbalanced, the classifier tends to "learn" the largest propo...
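
To illustrate the imbalance problem described above, the sketch below shows how a classifier that always predicts the largest class can still score a high accuracy, and how inverse-frequency class weights could offset this. The source text is cut off before it describes its own remedy, so the weighting heuristic (`n_samples / (n_classes * count)`), the label names, and the 95/5 split are assumptions chosen for illustration only.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple


def majority_baseline_accuracy(labels: Sequence[Hashable]) -> Tuple[Hashable, float]:
    """Accuracy obtained by always predicting the most frequent class."""
    counts = Counter(labels)
    majority_label, majority_count = counts.most_common(1)[0]
    return majority_label, majority_count / len(labels)


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Inverse-frequency weights: rarer classes receive larger weights."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Hypothetical imbalanced label set: 95 samples of one class, 5 of the other.
labels = ["normal"] * 95 + ["fault"] * 5

label, accuracy = majority_baseline_accuracy(labels)
weights = balanced_class_weights(labels)
print(label, accuracy, weights)
```

A classifier that ignores the minority class entirely reaches the same accuracy as this baseline, which is why accuracy alone can hide poor results on imbalanced data.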


Abstract

The invention relates to a random forest visualized data analysis method based on LargeVis. The method comprises the steps of preprocessing a training data set; extracting the important features of the training data set with a random forest; reducing the dimensionality with LargeVis; and performing visualization processing with the LargeVis-based random forest. For high-dimensional data, the method uses the feature importance learned by the random forest to form new sub-high-dimensional data, reduces that data with LargeVis, and feeds the reduced data into a random forest for prediction and analysis to produce the visualization. The classification precision can be improved, the visualization time can be reduced, and the method adapts to different data sets.

Description

Technical field

[0001] The present invention relates to pattern recognition, machine learning, and big data analysis, and in particular to a random forest visualization data analysis method based on LargeVis.

Background technique

[0002] In the era of big data, the dimensionality of data features keeps growing, so analyzing data through dimensionality reduction is particularly important. At the same time, how to visualize high-dimensional data is a current research focus. The most classic dimensionality reduction method is PCA (Principal Component Analysis). It not only reduces the dimensionality of high-dimensional data but, more importantly, removes noise and reveals patterns in the data. PCA replaces the original n features with a smaller number of m features. The new features are linear combinations of the old features. These linear combinations maximize the sample variance and...
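
For reference, the PCA description above matches the standard textbook formulation sketched below. The symbols are assumptions introduced here and do not appear in the source: x_i is the i-th of N samples with n features, S is the sample covariance matrix, w_k is the k-th projection direction, and z_ik is the k-th new feature of sample i.

```latex
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
S = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^{\top}

w_1 = \arg\max_{\|w\| = 1} \; w^{\top} S\, w

z_{ik} = w_k^{\top}(x_i - \bar{x}), \qquad k = 1, \dots, m, \quad m < n
```

Each new feature z_ik is a linear combination of the old features, and the directions w_1, ..., w_m are the eigenvectors of S with the m largest eigenvalues, so the first direction captures the most sample variance.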

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06K9/46, G06K9/62
CPC: G06F16/26, G06V10/40, G06F18/241
Inventor: 黄立勤, 陈宋
Owner: FUZHOU UNIV