Method for outlier data mining based on parallel computation

An outlier data mining technique based on parallel computing, applied in data mining, computing, and electrical digital data processing. It addresses problems such as the failure to find outlier data hidden in subspaces and attribute dimensions that provide inconsistent or no valuable information, and it resolves uneven data distribution and the situation where data are locally ordered but globally unordered.

Inactive Publication Date: 2016-08-17
MASHANGYOU TECH CO LTD
4 Cites, 7 Cited by

AI Technical Summary

Problems solved by technology

[0003] In massive high-dimensional data, the volume and dimensionality of the data seriously degrade the effectiveness and efficiency of outlier mining, and some outliers hidden in subspaces may not be found. In most cases, outliers are data objects that are clearly inconsistent with the distribution characteristics of their local data set. However, only some attribute dimensions provide valuable information about this inconsistency, while other attribute dimensions provide none.



Embodiment Construction

[0024] In order to deepen the understanding of the present invention, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. The following examples are only used to illustrate the technical solution of the present invention more clearly, but not to limit the protection scope of the present invention.

[0025] Traditional algorithm:

[0026] Suppose DS is any d-dimensional data set with attribute set FS = {A1, A2, ..., Ad}, and let x_ij (i = 1, 2, ..., n; j = 1, 2, ..., d) denote the value of the j-th attribute of the i-th data object obj_i. If every component of the subspace definition vector v of obj_i is 0, then obj_i is consistent with the local distribution characteristics; if obj_i has a relevant subspace, then obj_i is inconsistent with the local distribution characteristics. The degree to which an object is an outlier is usually described by Factor(obj):

[0027] ...
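
The subspace definition vector from [0026] can be illustrated with a minimal Python sketch. It assumes v is a 0/1 vector with one component per attribute, where a nonzero component marks an attribute that belongs to the relevant subspace; this encoding and the helper names `relevant_subspace` and `is_locally_consistent` are hypothetical. How v is derived and how Factor(obj) is computed are not shown in the text above, so v is taken as given here.

```python
from typing import List, Sequence


def relevant_subspace(v: Sequence[int], attributes: Sequence[str]) -> List[str]:
    """Return the attributes flagged by the subspace definition vector v.

    Assumption: v[j] != 0 means attribute j is part of the relevant subspace.
    """
    if len(v) != len(attributes):
        raise ValueError("v needs one component per attribute")
    return [name for name, flag in zip(attributes, v) if flag != 0]


def is_locally_consistent(v: Sequence[int]) -> bool:
    """An object whose vector is all zeros has no relevant subspace."""
    return all(flag == 0 for flag in v)


# Attribute set FS = {A1, ..., Ad} for d = 4 (illustrative values only).
FS = ["A1", "A2", "A3", "A4"]
v_regular = [0, 0, 0, 0]    # consistent with the local distribution
v_candidate = [0, 1, 0, 1]  # relevant subspace is {A2, A4}

print(is_locally_consistent(v_regular))    # True
print(relevant_subspace(v_candidate, FS))  # ['A2', 'A4']
```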



Abstract

The invention discloses a method for outlier data mining based on parallel computation. The relevant subspace is redefined using the local sparsity of each attribute dimension, so that the distribution characteristics of the various local data sets can be described effectively. A local outlier factor formula is given based on the probability density of the local data sets; it reflects the degree to which a data object deviates from the distribution characteristics of its local data set, and the N data objects with the largest outlier degree are selected as local outlier data. The sparsity factor and the outlier factor are calculated in a Map. When the factor values are totally ordered, one Map samples the factors and a function is implemented that determines which node each (K2, V2) pair is distributed to, which effectively solves the problems of uneven data distribution and of data that are locally ordered but globally unordered.
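
The sampling-based total ordering described in the abstract can be sketched in plain Python. This is an illustration under stated assumptions, not the patented implementation: it runs in a single process rather than on a MapReduce cluster, lists stand in for nodes, and the function names, the evenly spaced choice of split points, and the sample size are all hypothetical. The idea is that split points drawn from a sample send each (factor, object) pair to a range-bounded partition, so sorting each partition locally and concatenating the partitions yields a globally ordered result with roughly balanced partition sizes.

```python
import bisect
import random
from typing import List, Tuple

Record = Tuple[float, str]  # (outlier factor, object id), i.e. a (K2, V2) pair


def sample_split_points(factors: List[float], num_partitions: int,
                        sample_size: int, seed: int = 0) -> List[float]:
    """Sample the factors and pick num_partitions - 1 evenly spaced cut points."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(factors, min(sample_size, len(factors))))
    if not sample:
        return []
    return [sample[len(sample) * i // num_partitions]
            for i in range(1, num_partitions)]


def partition_for(factor: float, split_points: List[float]) -> int:
    """Decide which partition (node) a factor value is sent to."""
    return bisect.bisect_right(split_points, factor)


def total_order_sort(records: List[Record], num_partitions: int,
                     sample_size: int = 100) -> List[Record]:
    """Range-partition, sort each partition locally, then concatenate."""
    split_points = sample_split_points(
        [factor for factor, _ in records], num_partitions, sample_size)
    partitions: List[List[Record]] = [[] for _ in range(num_partitions)]
    for record in records:
        partitions[partition_for(record[0], split_points)].append(record)
    ordered: List[Record] = []
    for part in partitions:  # partition i only holds factors <= those in i + 1
        ordered.extend(sorted(part))
    return ordered


def top_n_outliers(records: List[Record], n: int,
                   num_partitions: int = 4) -> List[Record]:
    """Return the n records with the largest outlier factor, largest first."""
    ordered = total_order_sort(records, num_partitions)
    return ordered[max(0, len(ordered) - n):][::-1]
```

Because every factor in one partition is no larger than any factor in the next, no merge step is needed after the local sorts, and the N largest outlier factors sit at the end of the concatenated list.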

Description

Technical field

[0001] The invention relates to an outlier data mining method based on parallel computing.

Background technique

[0002] Outlier data is data that clearly deviates from other data, does not follow the general pattern or behavior of the data, and is inconsistent with the other existing data. It contains a large amount of valuable information that is not easily discovered by humans. Outlier mining, an important branch of data mining, has been widely used in astronomical spectrum data analysis, credit card fraud detection, network intrusion detection, data cleaning, and other fields.

[0003] In massive high-dimensional data, the volume and dimensionality of the data seriously degrade the effectiveness and efficiency of outlier mining, and some outliers hidden in subspaces may not be found. In most cases, outliers are data objects that are clearly inconsistent with the distribution characteristics of their local data set. But on some attribute di...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/2465, G06F2216/03
Inventors: 陈勇, 胡中骥, 贾昱
Owner: MASHANGYOU TECH CO LTD