Method for detecting outlier data from large-scale high dimensional data based on genetic algorithm

A technique that combines outlier data detection with a genetic algorithm, applied in the field of outlier data mining, to address problems such as the difficulty of screening outlier data.

Status: Inactive
Publication Date: 2015-03-11
LANGCHAO ELECTRONIC INFORMATION IND CO LTD
Cites: 3 · Cited by: 7

AI Technical Summary

Problems solved by technology

[0006] As data continues to accumulate and data sets keep growing, traditional outlier mining algorithms find it increasingly difficult to screen out outlier data with the available computing resources.

Examples

Embodiment 1

[0027] The method of the present invention for detecting outlier data in large-scale high-dimensional data based on a genetic algorithm comprises the following steps:

[0028] (1) Sample discretization and encoding: the high-dimensional data is encoded so that each individual corresponds to a string; the sparsity coefficient is selected as the fitness function and serves as the criterion for judging the quality of an individual;

[0029] (2) Loop iteration: a population of several individuals is maintained; through crossover, mutation, and selection, the population is continuously updated according to the principle of survival of the fittest;

[0030] (3) Decoding to obtain outlier data: the final population is decoded back to the corresponding sample data, from which the hidden outlier data is identified.
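
The excerpt does not spell out the sparse (sparsity) coefficient used as the fitness function. One common definition from grid-based outlier detection is given below as an assumption, not as the patent's own formula. The symbols N (number of samples), φ (number of equi-depth ranges per dimension), k (number of constrained dimensions), and n(D) (number of samples falling in the k-dimensional cube D) are introduced here for illustration only.

```latex
% Assumed definition: each dimension is cut into \phi equi-depth ranges,
% so a range holds a fraction f = 1/\phi of the samples.
\[
  S(D) \;=\; \frac{n(D) - N f^{k}}{\sqrt{N f^{k}\,\bigl(1 - f^{k}\bigr)}},
  \qquad f = \frac{1}{\phi}
\]
% Worked example: N = 1000, \phi = 10, k = 2, n(D) = 1
\[
  N f^{k} = 10, \qquad
  S(D) = \frac{1 - 10}{\sqrt{10 \times 0.99}} \approx -2.86
\]
```

Under this definition, a strongly negative value means the cube holds far fewer samples than would be expected if the dimensions were independent, so a search that minimizes S(D) over non-empty cubes is steered toward the regions where outlier candidates sit.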

[0031] Encoding and decoding: a population in the genetic algorithm consists of a certain number of gene-encoded individuals; each individual is an e...

Embodiment 2

[0036] The method of the present invention for detecting outlier data in large-scale high-dimensional data based on a genetic algorithm comprises the following steps:

[0037] (1) Sample discretization and encoding: the high-dimensional data is encoded so that each individual corresponds to a string; the sparsity coefficient is selected as the fitness function and serves as the criterion for judging the quality of an individual;

[0038] (2) Loop iteration: a population of several individuals is maintained; through crossover, mutation, and selection, the population is continuously updated according to the principle of survival of the fittest;

[0039] (3) Decoding to obtain outlier data: the final population is decoded back to the corresponding sample data, from which the hidden outlier data is identified.
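
The three steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the string encoding (one grid-range index or a wildcard per dimension), the equi-depth discretization, the sparsity formula, the crossover and mutation operators, and every name and parameter (`discretize`, `sparsity`, `detect_sparse_cubes`, `phi`, `k`, `top_m`, and so on) are hypothetical choices made for the example.

```python
import math
import random

WILDCARD = None  # assumed encoding: this dimension is left unconstrained


def discretize(data, phi):
    """Step (1): map every column to equi-depth range indices 0..phi-1."""
    n, d = len(data), len(data[0])
    cells = [[0] * d for _ in range(n)]
    for j in range(d):
        order = sorted(range(n), key=lambda i: data[i][j])
        for rank, i in enumerate(order):
            cells[i][j] = rank * phi // n
    return cells


def covered(ind, cells):
    """Decode an individual: indices of the rows inside the cube it describes."""
    return [i for i, row in enumerate(cells)
            if all(g is WILDCARD or g == row[j] for j, g in enumerate(ind))]


def sparsity(ind, cells, phi):
    """Assumed sparsity coefficient; requires at least one constrained dimension."""
    k = sum(g is not WILDCARD for g in ind)
    p = (1.0 / phi) ** k
    expected = len(cells) * p
    return (len(covered(ind, cells)) - expected) / math.sqrt(expected * (1 - p))


def fitness(ind, cells, phi):
    """Lower is better; empty cubes carry no samples, so they rank last."""
    return sparsity(ind, cells, phi) if covered(ind, cells) else float("inf")


def random_individual(d, k, phi, rng):
    ind = [WILDCARD] * d
    for j in rng.sample(range(d), k):
        ind[j] = rng.randrange(phi)
    return ind


def crossover(a, b, k, rng):
    """Child keeps exactly k constrained positions taken from its parents."""
    dims = [j for j in range(len(a))
            if a[j] is not WILDCARD or b[j] is not WILDCARD]
    child = [WILDCARD] * len(a)
    for j in rng.sample(dims, k):
        child[j] = rng.choice([g for g in (a[j], b[j]) if g is not WILDCARD])
    return child


def mutate(ind, phi, rate, rng):
    """Re-draw the range index of each constrained position with probability rate."""
    return [rng.randrange(phi) if g is not WILDCARD and rng.random() < rate else g
            for g in ind]


def detect_sparse_cubes(data, phi=4, k=2, pop_size=30, generations=40,
                        top_m=3, seed=0):
    rng = random.Random(seed)
    cells = discretize(data, phi)
    d = len(cells[0])
    pop = [random_individual(d, k, phi, rng) for _ in range(pop_size)]
    # Step (2): crossover, mutation, then selection of the fittest
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            a, b = rng.sample(pop, 2)
            children.append(mutate(crossover(a, b, k, rng), phi, 0.1, rng))
        pop = sorted(pop + children, key=lambda ind: fitness(ind, cells, phi))
        pop = pop[:pop_size]
    # Step (3): decode the final population back to sample rows
    best = {}
    for ind in pop:
        rows = covered(ind, cells)
        if rows:
            best[tuple(ind)] = (sparsity(ind, cells, phi), rows)
    ranked = sorted(best.items(), key=lambda kv: kv[1][0])[:top_m]
    return [(list(cube), score, rows) for cube, (score, rows) in ranked]
```

Calling `detect_sparse_cubes(data)` on a list of equal-length numeric rows returns up to `top_m` tuples of (cube, sparsity, row indices); the row indices in the sparsest cubes are the outlier candidates. Fixing the number of constrained dimensions `k` keeps the sparsity values of different individuals comparable, which is a design choice of this sketch.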

[0040] Encoding and decoding: a population in the genetic algorithm consists of a certain number of gene-encoded individuals; each individual is an e...

Abstract

The invention discloses a method for detecting outlier data in large-scale high-dimensional data based on a genetic algorithm, and belongs to the technical field of outlier data mining. The method comprises the steps of: (1) sample discretization and encoding, in which the high-dimensional data is encoded so that each individual corresponds to one character string, and a sparsity coefficient is selected as the fitness function and used as the criterion for judging the quality of each individual; (2) loop iteration, in which a population comprising a plurality of individuals is maintained and continuously updated through crossover, mutation, and selection according to the principle of survival of the fittest; and (3) decoding to obtain the outlier data, in which the final population is decoded back to the corresponding sample data and the outlier data hidden in that sample data is then identified. The method is capable of effectively and quickly finding hidden outlier data in large-scale high-dimensional data.

Description

Technical field

[0001] The invention relates to the technical field of outlier data mining, in particular to a method for detecting outlier data from large-scale high-dimensional data based on a genetic algorithm.

Background art

[0002] Outlier data refers to some data that exists in a large amount of data that is inconsistent with the general behavior or model of the data. There are generally two reasons for the generation of outlier data:

[0003] (1) Caused by measurement or execution errors: The screening of this type of outlier data can filter out impurities or problematic data from a large amount of data, thereby improving the overall quality of the data.

[0004] (2) The result of inherent data variability: the objective existence of this type of data determines the importance of screening this type of outlier data. For example, the discovery of some unknown outlier data that objectively exists in scientific research data can greatly improve the research of r...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06N3/12
CPC: G06N3/12, G06F18/2111
Inventor: 韦鹏, 付兴旺, 吴楠
Owner: LANGCHAO ELECTRONIC INFORMATION IND CO LTD