High-dimensional data feature selection method based on graph neural network and spectral clustering

A feature selection method using neural network technology, applied in the field of high-dimensional data feature selection. It addresses problems such as existing methods focusing only on model results while ignoring interaction relationships.

Active Publication Date: 2021-01-15
NORTHEASTERN UNIV
Cites: 1; Cited by: 4

AI Technical Summary

Problems solved by technology

However, these algorithms generally assume that the samples are independent of each other, or consider only the relationships between features in the data, and are limited to...

Examples

Embodiment Construction

[0061] The invention is further described below with reference to the accompanying drawings and specific embodiments. In the present invention, each gene is taken as a node to build a gene relationship graph structure model, and gene relationship data are added to the graph as edge information. Note that many kinds of gene relationship exist, such as co-expression, physical interaction and pathway relationships. Taking physical interaction as the example edge relationship, the constructed graph structure model is an undirected graph. In the gene relationship graph, each node represents a gene together with its score, and each edge represents the relationship between two genes. The method divides the nodes into two categories: isolated nodes and normal nodes. Genes not covered by the prior knowledge become isolated nodes when the graph structure is built, but isolated nodes ...
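The graph construction described in [0061] can be sketched as follows. This is a minimal illustration, assuming the prior knowledge arrives as a list of (gene, gene) physical-interaction pairs and that each gene carries a numeric score; the function name `build_gene_graph`, the input format and the gene names are hypothetical and do not come from the patent. Genes with no interaction partner end up as isolated nodes; all others are normal nodes.

```python
def build_gene_graph(genes, interactions):
    """Build an undirected gene relationship graph (hypothetical helper).

    genes: dict mapping gene name -> score (assumed input format).
    interactions: iterable of (gene_a, gene_b) physical-interaction pairs
    taken from prior knowledge.
    Returns (adjacency, isolated_nodes, normal_nodes).
    """
    adjacency = {g: set() for g in genes}
    for a, b in interactions:
        # Assumption: pairs mentioning unknown genes, and self-loops, are skipped;
        # the patent text does not say how such pairs are handled.
        if a in adjacency and b in adjacency and a != b:
            adjacency[a].add(b)  # undirected graph: store the edge both ways
            adjacency[b].add(a)
    isolated = sorted(g for g, nbrs in adjacency.items() if not nbrs)
    normal = sorted(g for g, nbrs in adjacency.items() if nbrs)
    return adjacency, isolated, normal


# Toy example with made-up gene names and scores.
genes = {"G1": 0.9, "G2": 0.7, "G3": 0.8, "G4": 0.2}
interactions = [("G1", "G2"), ("G3", "G1")]
adj, isolated, normal = build_gene_graph(genes, interactions)
print(isolated)  # ['G4']
print(normal)    # ['G1', 'G2', 'G3']
```

Keeping node scores in the input dictionary rather than in the adjacency structure leaves the graph topology separate from the per-gene weights used later in selection.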

Abstract

The invention provides a high-dimensional data feature selection method based on a graph neural network and spectral clustering. The method comprises the following steps: taking each gene as a node to build a gene relationship graph structure model, and adding gene correlation data to the graph as edge information; using a graph neural network model to obtain a feature vector representation of each node; after the representation of every node is obtained, entering a link prediction stage that generates new edges on the basis of the gene relationship graph, yielding a new gene relationship graph; and finally, based on spectral clustering, selecting the node with the highest weight from the new gene relationship graph as a feature node. With the invention, the finally selected genes have low redundancy, the model performs well, and the results are interpretable from a biological perspective.
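The pipeline in the abstract (node embedding, link prediction, spectral clustering, selection) can be sketched end to end as below. This is a hedged illustration under several assumptions, none of which are confirmed by the patent text: the graph neural network is a single GCN-style propagation layer with random, untrained weights; link prediction uses an inner-product decoder `sigmoid(z_i . z_j)` with a threshold; spectral clustering uses the normalized Laplacian followed by a simple k-means; a node's "weight" is taken to be its score; and one node is kept per cluster. All function names (`gcn_embed`, `predict_links`, `spectral_clusters`, `select_features`) are hypothetical.

```python
import numpy as np


def gcn_embed(adj, features, out_dim=4, seed=0):
    """One GCN-style layer: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    Assumption: W is random and untrained; the patent does not name the GNN.
    """
    rng = np.random.default_rng(seed)
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    w = rng.standard_normal((features.shape[1], out_dim))
    return np.maximum(a_norm @ features @ w, 0.0)


def predict_links(adj, emb, threshold=0.9):
    """Add an edge between unlinked nodes i, j when sigmoid(z_i . z_j) > threshold."""
    scores = 1.0 / (1.0 + np.exp(-(emb @ emb.T)))
    add = (scores > threshold) & (adj == 0)
    add = add | add.T  # keep the graph undirected
    new_adj = adj.copy()
    new_adj[add] = 1.0
    np.fill_diagonal(new_adj, 0.0)  # no self-loops
    return new_adj


def spectral_clusters(adj, k, iters=50):
    """Normalized-Laplacian spectral embedding followed by plain k-means."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    lap = np.eye(len(adj)) - adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)  # eigenvalues come back in ascending order
    u = vecs[:, :k]
    u = u / np.maximum(np.linalg.norm(u, axis=1, keepdims=True), 1e-12)
    centers = u[np.linspace(0, len(u) - 1, k).astype(int)]  # deterministic init
    for _ in range(iters):
        dist = ((u[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        centers = np.array([u[labels == c].mean(axis=0) if np.any(labels == c)
                            else centers[c] for c in range(k)])
    return labels


def select_features(adj, features, node_weights, k, threshold=0.9):
    """Return the index of the highest-weight node in each spectral cluster."""
    emb = gcn_embed(adj, features)
    new_adj = predict_links(adj, emb, threshold)
    labels = spectral_clusters(new_adj, k)
    picked = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        picked.append(int(members[np.argmax(node_weights[members])]))
    return sorted(picked)


# Toy example: two disconnected triangles of "genes".
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
adj = np.zeros((6, 6))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0
features = np.random.default_rng(1).random((6, 3))  # stand-in expression profiles
weights = np.array([0.1, 0.9, 0.3, 0.8, 0.2, 0.4])  # hypothetical node weights
print(select_features(adj, features, weights, k=2, threshold=1.0))  # [1, 3]
```

In the toy run, `threshold=1.0` disables link prediction (a sigmoid never exceeds 1.0), so the clusters are simply the two triangles and the result is easy to check by hand; a lower threshold lets the embedding add edges before clustering. Picking one representative per cluster is what keeps redundancy low in this sketch, since strongly connected genes tend to fall into the same cluster.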

Description

Technical field

[0001] The invention relates to the technical field of machine learning, in particular to a high-dimensional data feature selection method based on a graph neural network and spectral clustering.

Background art

[0002] In bioinformatics, most of the data sets processed are high-dimensional, noisy, and non-linear. For example, researchers can use gene chips to measure the expression values of thousands of genes simultaneously in a single experiment, obtaining large amounts of gene expression data; they can also use protein mass spectrometry to produce large amounts of protein expression profile data at once. However, because these data are high-dimensional with few samples, conventional pattern recognition methods no longer apply. For such data, eliminating redundant features and mining hidden, useful biological information from massive data has become the key to research on identificati...

Application Information

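IPC(8): G06K9/62; G06N3/04; G06N3/08
CPC: G06N3/08; G06N3/045; G06F18/2323; G06F18/2113; Y02A90/10
Inventors: 栗伟 (Li Wei), 谢维冬 (Xie Weidong), 王林洁 (Wang Linjie), 覃文军 (Qin Wenjun), 冯朝路 (Feng Chaolu), 闵新 (Min Xin), 于鲲 (Yu Kun)
Owner: NORTHEASTERN UNIV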