
Large scale data set outlier data mining method based on graph theoretic method

An outlier data mining technique for large-scale data sets, applied in character and pattern recognition, instruments, computer components, and related fields, that addresses the difficulty of screening outlier data.

Active Publication Date: 2015-10-07
SHANDONG LANGCHAO YUNTOU INFORMATION TECH CO LTD

AI Technical Summary

Problems solved by technology

[0005] As data continues to accumulate and data sets keep growing, traditional outlier data mining algorithms find it increasingly difficult to screen outlier data with existing computing resources.



Examples


Embodiment Construction

[0037] The present invention is further described below with reference to the accompanying drawings and specific embodiments:

[0038] A large-scale data set outlier data mining method based on graph theory: the method builds an acyclic graph from the distances between samples and then deletes edges in the graph step by step. After multiple iterations, the samples corresponding to the nodes of degree 0 in the graph are the outlier data screened by the method.

[0039] The acyclic graph is a distance-based undirected graph: the nodes in the graph are the samples in the data set, and the weight of an edge is the distance between the samples corresponding to its two nodes.
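The graph construction and edge deletion described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented procedure: the excerpt does not say how the acyclic graph is built or which edges are deleted, so the sketch assumes a minimum spanning tree (Prim's algorithm) over Euclidean distances and deletes every edge longer than `factor` times the current mean edge weight. The names `build_mst`, `prune_outliers`, `factor`, and `max_iter` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_mst(samples):
    """Assumed construction: a minimum spanning tree via Prim's algorithm.

    Returns a list of (i, j, weight) edges; the result is acyclic and
    undirected, with edge weights equal to inter-sample distances.
    """
    n = len(samples)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known distance from the tree to each node
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = euclidean(samples[u], samples[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def prune_outliers(samples, factor=2.0, max_iter=10):
    """Assumed deletion rule: drop edges longer than factor * mean weight.

    Repeats until no edge is deleted or max_iter is reached, then returns
    the indices of samples whose nodes have degree 0.
    """
    edges = build_mst(samples)
    for _ in range(max_iter):
        if not edges:
            break
        mean_w = sum(w for _, _, w in edges) / len(edges)
        kept = [e for e in edges if e[2] <= factor * mean_w]
        if len(kept) == len(edges):
            break
        edges = kept
    degree = [0] * len(samples)
    for i, j, _ in edges:
        degree[i] += 1
        degree[j] += 1
    return [i for i, d in enumerate(degree) if d == 0]


samples = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(prune_outliers(samples))
```

In this toy data set the point `(10, 10)` hangs off the tree by a single long edge, so deleting that edge leaves its node with degree 0. A tree keeps the edge count at one fewer than the number of samples, which is one plausible reason an acyclic graph suits large data sets, though the excerpt does not state the motivation.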

[0040] As shown in Figure 1, the method includes the following steps:

[0041] 1) Data preprocessing

[0042] This step preprocesses the data, eliminating inconsistencies in the data and normalizing each record. It includes specific operations such as data cleaning, data integration, data transformation, data reduction,...
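As a rough illustration of the cleaning and normalization mentioned in this step, the sketch below drops records with missing values and rescales each feature to the range [0, 1]. The excerpt does not name a specific normalization scheme, so min-max scaling is an assumption, and the function names `clean` and `min_max_normalize` are hypothetical.

```python
def clean(rows):
    """Assumed cleaning rule: discard any record containing a missing value."""
    return [row for row in rows if all(x is not None for x in row)]


def min_max_normalize(rows):
    """Rescale every feature (column) to [0, 1]; constant columns map to 0.0."""
    if not rows:
        return []
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        tuple(
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(row, lows, highs)
        )
        for row in rows
    ]


raw = [(0, 10), (5, 20), (None, 15), (10, 30)]
print(min_max_normalize(clean(raw)))
```

Putting all features on a common scale keeps a single wide-ranging feature from dominating the distances that later serve as edge weights.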



Abstract

The invention discloses a large-scale data set outlier data mining method based on graph theory. The method uses the distance information among samples to build an acyclic graph, gradually deletes edges from that graph by a graph pruning method, and, after multiple iterations, takes the samples corresponding to the nodes of degree 0 as the outlier data. The acyclic graph is a distance-based undirected graph: its nodes are the samples in the data set, and the weight of an edge is the distance between the samples corresponding to its two nodes. The method can be applied to a variety of outlier data mining applications and is suited to finding global outlier data.

Description

Technical field

[0001] The invention relates to the technical fields of computer pattern recognition and machine learning, and in particular to a large-scale data set outlier data mining method based on graph theory.

Background

[0002] Outlier data are data within a large data set that are inconsistent with the general behavior or model of the data. Outlier data generally arise for two reasons:

[0003] 1) Measurement or execution errors. Screening this type of outlier data filters impurities or problematic records out of a large amount of data, thereby improving the overall quality of the data.

[0004] 2) Inherent data variability. Because this type of data objectively exists, screening it is important. For example, the discovery of some unknown outlier data that objectively exists in scientific research data can greatly improve the research ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62
CPC: G06F18/24133
Inventor: 韦鹏, 吴楠, 付兴旺
Owner: SHANDONG LANGCHAO YUNTOU INFORMATION TECH CO LTD