A Method of Outlier Data Mining in Large-Scale Datasets Based on Graph Theory

A technique for mining outlier data in large-scale datasets, applied in character and pattern recognition, instruments, computer components and related fields, that addresses the difficulty of screening outlier data.

Active Publication Date: 2018-04-17
SHANDONG LANGCHAO YUNTOU INFORMATION TECH CO LTD

AI Technical Summary

Problems solved by technology

[0005] With the continuous accumulation of data and the continuous increase of data scale, it is becoming more and more difficult for traditional outlier data mining algorithms to use existing computing conditions to screen outlier data.


Examples

Embodiment Construction

[0037] The present invention is further described below with reference to the accompanying drawings and specific embodiments:

[0038] A graph-theory-based method for mining outlier data in large-scale data sets. The method deletes edges in the graph; after multiple iterations, the samples corresponding to the nodes with a degree of 0 in the graph are the outlier data screened by this method.

[0039] The acyclic graph is an undirected graph based on distance, the nodes in the graph are samples in the data set, and the weight of an edge is the distance between samples corresponding to two nodes.
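
The idea in [0038] and [0039] can be illustrated with a minimal Python sketch. The excerpt does not state which edges are deleted in each iteration, so the rule used here (drop every edge heavier than a threshold that shrinks each round) is an assumption made for illustration only, as are the function names, the Euclidean distance, and the `shrink` and `iterations` parameters.

```python
import math
from itertools import combinations


def build_distance_graph(samples):
    """Undirected graph: one node per sample, edge weight = Euclidean distance."""
    return {
        (i, j): math.dist(samples[i], samples[j])
        for i, j in combinations(range(len(samples)), 2)
    }


def find_outliers(samples, shrink=0.5, iterations=3):
    """Return indices of samples whose node ends up with degree 0.

    ASSUMPTION: the edge-deletion rule (a threshold that starts at the
    largest edge weight and is multiplied by `shrink` each iteration) is
    hypothetical; the source text does not specify the actual criterion.
    """
    edges = build_distance_graph(samples)
    threshold = max(edges.values(), default=0.0)
    for _ in range(iterations):
        threshold *= shrink
        # Delete every edge whose weight exceeds the current threshold.
        edges = {edge: w for edge, w in edges.items() if w <= threshold}

    # Count the remaining edges incident to each node.
    degree = [0] * len(samples)
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return [node for node, d in enumerate(degree) if d == 0]


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(find_outliers(points))  # the isolated point has index 4
```

In this toy data set the four clustered points keep their short edges, while every edge touching the distant point is deleted, leaving its node with degree 0.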

[0040] As shown in Figure 1, the method includes the following steps:

[0041] 1) Data preprocessing

[0042] The purpose of this step is to preprocess the data, eliminate the inconsistency between the data and normalize each data, including specific operations such as data cleaning, data integration, data transformation, data reduction,...
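
As a rough illustration of the normalization mentioned in this step, the following Python sketch rescales each feature to the range [0, 1]. The excerpt does not say which normalization is used, so min-max scaling and the function name `min_max_normalize` are assumptions chosen for illustration.

```python
def min_max_normalize(samples):
    """Rescale every feature (column) of `samples` to [0, 1].

    ASSUMPTION: min-max scaling is a stand-in; the source only says that
    each datum is normalized, not how.
    """
    columns = list(zip(*samples))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        tuple(
            # A constant feature carries no distance information; map it to 0.
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        )
        for row in samples
    ]


print(min_max_normalize([(0, 10), (5, 10), (10, 30)]))
# [(0.0, 0.0), (0.5, 0.0), (1.0, 1.0)]
```

Putting all features on a common scale keeps any single feature with a large numeric range from dominating the distances used as edge weights.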


Abstract

The invention discloses a graph-theory-based method for mining outlier data in large-scale data sets. The method deletes edges in the graph; after multiple iterations, the samples corresponding to the nodes with a degree of 0 in the graph are the outlier data screened by the method. The acyclic graph is an undirected graph based on distance, the nodes in the graph are samples in the data set, and the weight of an edge is the distance between the samples corresponding to its two nodes. The classification method of the present invention can be applied to various outlier data mining applications and is suitable for searching for global outlier data.

Description

Technical field

[0001] The invention relates to the technical fields of computer pattern recognition and machine learning, in particular to a large-scale data set outlier data mining method based on a graph theory method.

Background technique

[0002] Outlier data refers to some data that exists in a large amount of data that is inconsistent with the general behavior or model of the data. There are generally two reasons for the generation of outlier data:

[0003] 1) Due to measurement or execution errors, the screening of this type of outlier data can filter out impurities or problematic data from a large amount of data, thereby improving the overall quality of the data.

[0004] 2) As a result of inherent data variability, the objective existence of this type of data determines the importance of screening this type of outlier data. For example, the discovery of some unknown outlier data that objectively exists in scientific research data can greatly improve the research ...

Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06K9/62
CPC: G06F18/24133
Inventor: 韦鹏, 吴楠, 付兴旺
Owner: SHANDONG LANGCHAO YUNTOU INFORMATION TECH CO LTD