A Fast Method for Screening Outliers from Large-Scale Data

A technology relating to large-scale data and outlier data, applied in electrical digital data processing, special data processing applications, instruments, etc. It addresses the high computing-time and memory-space requirements of outlier screening in large-scale data. Its effects are faster computation and fast, effective outlier data filtering.

Active Publication Date: 2016-09-07
LANGCHAO ELECTRONIC INFORMATION IND CO LTD
Cites: 3; Cited by: 0

AI Technical Summary

Problems solved by technology

[0007] The purpose of the present invention is achieved as follows. Random sampling is used to reduce the number of samples involved in the calculation, and parallel computing is used to speed up the calculation. Together these effectively solve the problem of high computing-time and memory-space requirements in outlier screening of large-scale data, so that outliers can be screened quickly and effectively. The method includes the following steps:
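A minimal sketch of the two ideas named in [0007], under stated assumptions. Each point is scored against a random sample rather than against the full data set, which cuts the distance work from roughly n² evaluations to n × sample_size. The scoring is split into chunks that run concurrently. The scoring rule (distance to the k-th nearest sampled neighbour) and every function name and parameter below are hypothetical, because the excerpt does not specify them. `ThreadPoolExecutor` is used so the example runs anywhere. Pure-Python arithmetic holds the GIL, so real speedups would need `ProcessPoolExecutor` (with a `__main__` guard) or a vectorized library.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def knn_score(point, reference, k):
    """Distance from `point` to its k-th nearest neighbour in `reference`.

    Zero distances are skipped so a point drawn into the sample is not its
    own neighbour (a simplification: exact duplicates are skipped too).
    """
    dists = sorted(d for d in (math.dist(point, r) for r in reference) if d > 0)
    return dists[k - 1] if len(dists) >= k else math.inf


def score_chunk(chunk, reference, k):
    """Score one chunk of points; each chunk is an independent task."""
    return [knn_score(p, reference, k) for p in chunk]


def sampled_outlier_scores(data, sample_size, k=3, workers=4, seed=0):
    """Score every point against a random sample instead of the full data set."""
    rng = random.Random(seed)
    reference = rng.sample(data, min(sample_size, len(data)))
    # Split the data into roughly equal chunks, one task per chunk.
    step = max(1, math.ceil(len(data) / workers))
    chunks = [data[i:i + step] for i in range(0, len(data), step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(score_chunk, chunks,
                              [reference] * len(chunks), [k] * len(chunks)))
    # Flatten the per-chunk results back into input order.
    return [s for part in parts for s in part]


def screen_outliers(data, sample_size, k=3, top_n=1, workers=4, seed=0):
    """Return the indices of the `top_n` highest-scoring (most outlying) points."""
    scores = sampled_outlier_scores(data, sample_size, k, workers, seed)
    return sorted(range(len(data)), key=lambda i: scores[i], reverse=True)[:top_n]
```

For example, with a 5 × 5 grid of points near the origin plus the single point `(100.0, 100.0)`, `screen_outliers(data, sample_size=10)` reports the index of the distant point.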

Embodiment Construction

[0033] With reference to the accompanying drawings, the method of the present invention for quickly screening outlier data from large-scale data is described in detail below.

[0034] The design of the method for quickly screening outlier data from large-scale data is as follows:

[0035] 1) The method is developed and implemented in six stages: data preprocessing, feature selection and transformation, variable initialization, iteration, outlier index calculation, and outlier data screening. To keep the process consistent and the intermediate results reusable, it is recommended that all stages be implemented in a single programming language;

[0036] 2) The basic algorithms used in the present invention may be implemented from scratch, or existing program packages may be used;

[0037] 3) Distance measures are used at several points in the present invention. The definition of distance is flexible: Euclidean distance, Manhattan distance...
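The excerpt names the six stages and says the distance measure is interchangeable. It does not say what each stage computes, and the list of distances is cut off. The sketch below is therefore hypothetical. The stage contents are illustrative assumptions, not the patented procedure:

* missing-value removal
* min-max scaling
* k-th nearest-neighbour distance averaged over repeated random samples
* a fixed threshold

All function names and defaults are also assumptions. The sketch shows one way a single-language pipeline can pass intermediate results from stage to stage while taking Euclidean or Manhattan distance as a parameter.

```python
import math
import random


def euclidean(a, b):
    """Straight-line (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """City-block (L1) distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


def preprocess(rows):
    """Stage 1 (assumed): drop records that contain missing values (None)."""
    return [r for r in rows if all(v is not None for v in r)]


def select_and_transform(rows):
    """Stage 2 (assumed): keep all features and min-max scale each to [0, 1]."""
    cols = list(zip(*rows))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return [tuple((v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi))
            for r in rows]


def init_variables(n, sample_size=8, k=2, rounds=5, seed=0, metric=euclidean):
    """Stage 3 (assumed): fix the parameters and a per-point accumulator."""
    return {"sample_size": min(sample_size, n), "k": k, "rounds": rounds,
            "rng": random.Random(seed), "metric": metric, "totals": [0.0] * n}


def iterate(points, state):
    """Stage 4 (assumed): each round draws a fresh random sample and adds each
    point's distance to its k-th nearest sampled neighbour to its total."""
    for _ in range(state["rounds"]):
        sample = state["rng"].sample(points, state["sample_size"])
        for i, p in enumerate(points):
            # Zero distances are skipped so a sampled point is not its own neighbour.
            d = sorted(x for x in (state["metric"](p, s) for s in sample) if x > 0)
            if d:
                state["totals"][i] += d[min(state["k"], len(d)) - 1]
    return state


def outlier_index(state):
    """Stage 5 (assumed): the outlier index is the mean distance over all rounds."""
    return [t / state["rounds"] for t in state["totals"]]


def screen(index, threshold):
    """Stage 6 (assumed): report positions whose index exceeds the threshold."""
    return [i for i, v in enumerate(index) if v > threshold]


def run_pipeline(rows, threshold=0.5, **params):
    """Run all six stages; returned positions refer to the preprocessed rows."""
    points = select_and_transform(preprocess(rows))
    state = iterate(points, init_variables(len(points), **params))
    return screen(outlier_index(state), threshold)
```

Because each stage returns a plain value, the output of any stage (for example the scaled points or the outlier index) can be stored and reused. Switching the distance only requires passing `metric=manhattan` instead of the default.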

Abstract

Provided is a method for rapidly screening outlier data from large-scale data. The method fully takes into account the time and space complexity of computation in large-scale outlier mining. Random sampling is used to reduce the number of samples participating in the computation, and parallel computing is used to increase computing speed. This effectively addresses the high computing-time and memory-space requirements of screening outliers in large-scale data, and thereby achieves rapid and effective outlier screening.

Description

Technical Field

[0001] The invention relates to the technical field of computer pattern recognition and machine learning, and in particular to a method for quickly screening outlier data from large-scale data.

Background Art

[0002] Outlier data are data within a large data set that are inconsistent with the general behavior or model of the data. Outlier data generally arise for one of two reasons:

[0003] 1) Measurement or execution errors. Screening out this type of outlier data filters impurities and problematic records from a large amount of data, which improves the overall quality of the data;

[0004] 2) Inherent variability of the data. Because this type of data objectively exists, screening it is important. For example, discovering previously unknown outliers that objectively exist in scientific research data can greatly advance research on related theor...

Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/00
Inventors: 王恩东, 张东, 吴楠, 韦鹏, 付兴旺
Owner: LANGCHAO ELECTRONIC INFORMATION IND CO LTD