A Method of Outlier Data Mining in Large-Scale Datasets Based on Graph Theory

A large-scale-data outlier mining technology, applied in character and pattern recognition, instruments, computer parts, and related fields, that addresses problems such as the difficulty of screening outlier data.

Active Publication Date: 2018-04-17

SHANDONG LANGCHAO YUNTOU INFORMATION TECH CO LTD

Cites: 4; Cited by: 0


Problems solved by technology

[0005] As data accumulates and data scale keeps growing, it is becoming increasingly difficult for traditional outlier data mining algorithms to screen outlier data with existing computing resources.

Embodiment Construction

[0037] The present invention is further described below through specific embodiments, with reference to the accompanying drawings:

[0038] A graph-theory-based method for mining outlier data in large-scale data sets: the method deletes edges in the graph, and after multiple iterations, the samples corresponding to the nodes with a degree of 0 are the outlier data screened by the method.

[0039] The acyclic graph is an undirected, distance-based graph: each node is a sample in the data set, and the weight of an edge is the distance between the two samples that its end nodes represent.
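The idea in [0038] and [0039] can be illustrated with a minimal Python sketch. The excerpt does not state which edges are deleted in each iteration, so the rule used here (delete every edge heavier than a threshold, then shrink the threshold) is an assumption made only for illustration, as are the names `build_distance_graph`, `find_outliers`, `start_threshold`, `shrink`, and `iterations`, and the choice of Euclidean distance.

```python
import math
from itertools import combinations


def build_distance_graph(samples):
    """Return {(i, j): distance} for every unordered pair of samples.

    Nodes are sample indices; edge weights are Euclidean distances
    (the distance measure is an assumption, not given in the source).
    """
    return {
        (i, j): math.dist(samples[i], samples[j])
        for i, j in combinations(range(len(samples)), 2)
    }


def find_outliers(samples, start_threshold, shrink=0.5, iterations=3):
    """Iteratively delete edges and return indices of degree-0 nodes.

    Assumed deletion rule (hypothetical, not from the source): in each
    iteration, delete every edge heavier than the current threshold,
    then multiply the threshold by `shrink`.
    """
    edges = build_distance_graph(samples)
    threshold = start_threshold
    for _ in range(iterations):
        edges = {pair: w for pair, w in edges.items() if w <= threshold}
        threshold *= shrink
    # A node has degree 0 when no surviving edge touches it.
    connected = {node for pair in edges for node in pair}
    return [i for i in range(len(samples)) if i not in connected]


# Three nearby points and one far-away point.
samples = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
outliers = find_outliers(samples, start_threshold=4.0)
print(outliers)  # [3]
```

In this toy data set the fourth sample loses all of its edges in the first iteration, so it ends with degree 0 and is reported as an outlier. Building the full pairwise graph costs quadratic time and memory, so a sketch like this is only suitable for small inputs.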

[0040] As shown in Figure 1, the method includes the following steps:

[0041] 1) Data preprocessing

[0042] This step preprocesses the data to eliminate inconsistencies between data and normalize each data item. It includes operations such as data cleaning, data integration, data transformation, data reduction,...
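As one illustration of the normalization mentioned in [0042], the sketch below applies min-max scaling to each column. The excerpt does not say which normalization the method uses, so min-max scaling and the function name `min_max_normalize` are assumptions chosen for the example.

```python
def min_max_normalize(rows):
    """Scale each column of `rows` to the range [0, 1].

    Min-max scaling is an assumed choice; the source only says the
    data are normalized. Constant columns are mapped to 0.0.
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        normalized.append([
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ])
    return normalized


rows = [[1.0, 200.0], [2.0, 400.0], [3.0, 300.0]]
scaled = min_max_normalize(rows)
print(scaled)  # [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]
```

Putting every feature on the same scale keeps a single large-valued feature from dominating the distances used as edge weights.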


Abstract

The invention discloses a large-scale data set outlier data mining method based on graph theory. The method deletes edges in the graph, and after multiple iterations, the samples corresponding to the nodes with a degree of 0 in the graph are the outlier data screened by the method. The acyclic graph is an undirected graph based on distance, the nodes in the graph are samples in the data set, and the weight of an edge is the distance between the samples corresponding to its two nodes. The classification method of the present invention can be applied to various outlier data mining applications, and is suitable for searching for global outlier data.

Description

Technical field

[0001] The invention relates to the technical fields of computer pattern recognition and machine learning, and in particular to a large-scale data set outlier data mining method based on graph theory.

Background technique

[0002] Outlier data are data within a large amount of data that are inconsistent with the general behavior or model of the data. Outlier data generally arise for two reasons:

[0003] 1) Measurement or execution errors. Screening this type of outlier data filters impurities or problematic data out of a large amount of data, thereby improving the overall quality of the data.

[0004] 2) Inherent data variability. Because such data objectively exist, screening this type of outlier data is important. For example, the discovery of some unknown outlier data that objectively exists in scientific research data can greatly improve the research ...


Application Information


Patent Type & Authority: Patents (China)

IPC(8): G06K9/62

CPC: G06F18/24133

Inventor: 韦鹏, 吴楠, 付兴旺

Owner: SHANDONG LANGCHAO YUNTOU INFORMATION TECH CO LTD
