Large-scale graph data set-oriented statistical significance subgraph mining method and device

A technique for mining statistically significant subgraphs from large-scale graph data sets. It addresses problems such as false positives in the mining results and inaccurate feature subgraph results.

Inactive Publication Date: 2018-06-15
NORTHEASTERN UNIV
0 Cites, 1 Cited by

AI Technical Summary

Problems solved by technology

If only the statistical significance of each individual subgraph is controlled, a large number of false positives will appear in the mining results, making the final feature subgraph results inaccurate.
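The scale of the problem can be illustrated with simple arithmetic. The sketch below assumes the tests are independent, which is a simplification. The function names are hypothetical, and the Bonferroni threshold is included only as a familiar baseline for comparison, not as the method of this invention.

```python
def expected_false_positives(num_tests, alpha):
    """Expected number of false positives when every null hypothesis is true."""
    return num_tests * alpha


def fwer_independent(num_tests, alpha):
    """Probability of at least one false positive, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_threshold(num_tests, alpha):
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / num_tests


if __name__ == "__main__":
    m, alpha = 100_000, 0.05  # hypothetical number of candidate subgraphs
    print(expected_false_positives(m, alpha))
    print(fwer_independent(m, alpha))
    print(bonferroni_threshold(m, alpha))
```

With a large number of candidate subgraphs, a per-test threshold leaves the family-wise error rate close to 1, which is why the significance threshold has to be corrected.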

Examples

[0147] To make the method of this embodiment easier to understand, the present invention also provides a specific example, as follows:

[0148] In this embodiment, real data sets (multiple open-source NCI data sets and the Mutag data set) and several synthetic data sets are used as experimental data. Details of the data sets are as follows:

[0149] NCI Dataset: The NCI Cancer Spectrum dataset is widely used to validate graph classification algorithms. In this embodiment, 10 NCI datasets were downloaded from the PubChem database. Each dataset corresponds to a bioassay task for anticancer activity prediction: if a compound molecule in a dataset is active against the corresponding cancer, it is taken as a positive sample, and the remaining compound molecules are negative samples.

[0150] Table 1 lists the summary information of the ten datase...

Abstract

The invention discloses a method and device for mining statistically significant subgraphs from large-scale graph data sets. The method comprises the following steps: for a graph data set G to be mined, correcting the statistical significance threshold under a family-wise error rate (FWER) threshold α on the basis of the improved permutation testing algorithm Westfall-Young light, obtaining the corrected significance threshold δ* and the minimum support threshold σ that mined subgraphs must satisfy at δ*; mining all subgraphs in G whose p-value is less than or equal to δ*; and converging the support of all mined subgraphs to the minimum support threshold σ. The method can effectively handle the repeated computation involved in mining significant subgraphs.
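The threshold correction can be sketched as follows. This is a minimal illustration of the plain Westfall-Young permutation procedure, not the improved Westfall-Young light variant named in the abstract. It assumes the candidate subgraphs have already been enumerated and reduced to 0/1 occurrence rows (one row per subgraph, one column per graph), and it assumes a two-sided Fisher exact test as the per-subgraph test, which the abstract does not specify. All function names are hypothetical.

```python
import math
import random


def fisher_p_value(a, support, n_pos, n_total):
    """Two-sided Fisher exact p-value for one candidate subgraph.

    a: positive graphs containing the subgraph; support: all graphs containing it.
    """
    def prob(k):
        return (math.comb(n_pos, k) * math.comb(n_total - n_pos, support - k)
                / math.comb(n_total, support))

    lo = max(0, support - (n_total - n_pos))
    hi = min(support, n_pos)
    p_obs = prob(a)
    # Sum every table that is at most as likely as the observed one.
    total = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def pattern_p_values(occurrence, labels):
    """p-value of every occurrence row against the given class labels."""
    n_pos, n_total = sum(labels), len(labels)
    return [
        fisher_p_value(sum(x for x, y in zip(row, labels) if y == 1),
                       sum(row), n_pos, n_total)
        for row in occurrence
    ]


def corrected_threshold(min_ps, alpha):
    """Largest observed minimum p-value d such that at most a fraction alpha
    of the permutations have a minimum p-value <= d (0.0 if none qualifies)."""
    ordered = sorted(min_ps)
    cutoff = ordered[math.floor(alpha * len(ordered))]
    below = [p for p in ordered if p < cutoff]
    return max(below) if below else 0.0


def mine_significant(occurrence, labels, alpha=0.05, num_perms=200, seed=0):
    """Return (delta_star, indices of rows whose p-value is <= delta_star)."""
    rng = random.Random(seed)
    shuffled = list(labels)
    min_ps = []
    for _ in range(num_perms):
        rng.shuffle(shuffled)  # break any real label/subgraph association
        min_ps.append(min(pattern_p_values(occurrence, shuffled)))
    delta = corrected_threshold(min_ps, alpha)
    observed = pattern_p_values(occurrence, labels)
    return delta, [i for i, p in enumerate(observed) if p <= delta]


# Toy data: 10 positive and 10 negative graphs, three candidate subgraphs.
labels = [1] * 10 + [0] * 10
occurrence = [labels[:], [1] * 20, [1, 0] * 10]
delta, hits = mine_significant(occurrence, labels)
print(delta, hits)
```

Recomputing every p-value for every permutation is the repeated computation that makes the naive procedure expensive on large data sets. The minimum support threshold σ is not modeled in this sketch.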

Description

Technical field

[0001] The invention relates to graph data mining technology, in particular to a statistically significant subgraph mining method and device for large-scale graph data sets.

Background technique

[0002] In applications in scientific research fields such as bioinformatics, computational chemistry, medical informatics, and social networks, a large amount of data modeled with graph patterns is generated. In order to conduct further analysis and research on these data, it is necessary to extract feature subgraphs as representatives of the original graph pattern.

[0003] In the prior art, there are many studies on extracting feature subgraphs from graph data, for example, frequent subgraph mining algorithms are used to extract feature subgraphs. In addition, a series of AGM algorithms and FSG algorithms based on frequent subgraph mining algorithms have appeared. In addition, there are GraphMiner algorithms that use indexes to mine frequent subgraphs, and gPrun...

Application Information

IPC(8): G06F17/30, G06N3/00
CPC: G06F16/9038, G06N3/006
Inventors: 赵宇海, 印莹, 杜焱, 黄海, 王国仁
Owner: NORTHEASTERN UNIV