
Abnormal data detection method based on active learning

An abnormal data detection technique based on active learning, applied in the field of ensemble learning, that improves scalability, reduces labeling workload, and achieves high accuracy.

Status: Inactive. Publication date: 2019-06-25
BEIJING INFORMATION SCI & TECH UNIV

AI Technical Summary

Problems solved by technology

[0004] Because outlier detection is a complex task, no single algorithm suits all scenarios, so researchers have proposed detection methods based on model ensembles to reduce the risk of relying on a single algorithm.



Embodiment Construction

[0032] The present invention is described in detail below with reference to the accompanying drawings and embodiments:

[0033] As shown in Figures 1-4, the abnormal data detection method based on active learning proceeds as follows:

[0034] Based on a comparative analysis of candidate base learners, five unsupervised models are selected as the base learners: the statistics-based Tukey test and HBOS, the similarity-based iORCA, and the axis-parallel subspace partitioning models Isolation Forest and RSHash;
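To make the two statistics-based base learners concrete, the following is a minimal single-feature sketch, not the patent's implementation. The function names, the fence multiplier `k=1.5`, the bin count, and the sample `readings` are all assumptions chosen for illustration; iORCA, Isolation Forest, and RSHash are not shown.

```python
import math
import statistics
from collections import Counter


def tukey_scores(values, k=1.5):
    """Distance outside the Tukey fences in IQR units (0.0 for values inside)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    scale = iqr if iqr > 0 else 1.0  # avoid dividing by zero on constant data
    return [max(lower - v, v - upper, 0.0) / scale for v in values]


def hbos_scores(values, bins=5):
    """Simplified one-feature HBOS: log of (tallest bin count / own bin count)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 when all values are equal
    bin_index = [min(int((v - lo) / width), bins - 1) for v in values]
    counts = Counter(bin_index)
    tallest = max(counts.values())
    return [math.log(tallest / counts[i]) for i in bin_index]


# Hypothetical sensor readings with one obvious anomaly at the end.
readings = [10, 11, 12, 11, 10, 12, 11, 50]
tukey = tukey_scores(readings)
hbos = hbos_scores(readings)
```

Both functions return one score per input value, with larger scores meaning "more anomalous", so their outputs can be normalized and compared across learners.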

[0035] The data points whose outlier scores from each base learner fall near the boundary between outlier and normal are merged and presented to human experts for labeling;

[0036] This maximizes the amount of information that human experts can feed back.
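One way to read the selection step in [0035] is as uncertainty sampling: for each base learner, take the points whose normalized score sits closest to a decision threshold, then merge the picks. The sketch below follows that reading. The min-max normalization, the `threshold=0.5` cut, the `per_learner` budget, and the sample scores are assumptions, since the source does not specify them.

```python
def min_max(scores):
    """Rescale scores to [0, 1]; constant input maps to all zeros."""
    lo, hi = min(scores), max(scores)
    span = hi - lo
    return [(s - lo) / span if span else 0.0 for s in scores]


def boundary_candidates(score_lists, threshold=0.5, per_learner=2):
    """Union of the points each learner scores closest to the threshold."""
    picked = set()
    for scores in score_lists:
        norm = min_max(scores)
        ranked = sorted(range(len(norm)), key=lambda i: abs(norm[i] - threshold))
        picked.update(ranked[:per_learner])
    return sorted(picked)


# Hypothetical outlier scores from two base learners over six points.
learner_a = [0.1, 0.2, 0.9, 0.5, 1.0, 0.0]
learner_b = [0.0, 0.9, 1.0, 0.2, 0.45, 0.1]
to_label = boundary_candidates([learner_a, learner_b])
```

Points that every learner scores confidently (near 0 or near 1) are never shown to the expert, which is how this kind of selection keeps the labeling workload small.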

[0037] A supervised binary classification model based on GBM (Gradient Boosting Machine) is trained on a 75% sample drawn from the expert-labeled data set and the data set labeled by the vote of the base learners; the trained model is then applied to the full data set to obtain the final mining results.
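The training-set assembly in [0037] could be sketched as follows. This covers only the data preparation, not the GBM itself. The majority-vote rule, the choice to let expert labels override vote labels, the function names, and the sample flags are assumptions; the source states only that 75% is sampled from the expert-labeled and vote-generated data.

```python
import random


def majority_vote(flags_per_learner):
    """Label a point 1 (outlier) when more than half of the learners flag it."""
    n_learners = len(flags_per_learner)
    return [1 if sum(col) * 2 > n_learners else 0 for col in zip(*flags_per_learner)]


def build_training_set(features, vote_labels, expert_labels, fraction=0.75, seed=0):
    """Merge labels (expert overrides vote) and draw a `fraction` sample of rows."""
    labels = list(vote_labels)
    for index, label in expert_labels.items():
        labels[index] = label
    rows = list(zip(features, labels))
    sample_size = round(fraction * len(rows))
    return random.Random(seed).sample(rows, sample_size)


# Hypothetical binary flags from three base learners over eight points.
flags = [
    [0, 0, 1, 0, 1, 0, 0, 1],
    [0, 0, 1, 0, 0, 0, 1, 1],
    [0, 1, 1, 0, 0, 0, 0, 1],
]
features = [(float(i),) for i in range(8)]
votes = majority_vote(flags)
expert = {4: 1}  # the expert marks point 4 as an outlier
train_rows = build_training_set(features, votes, expert)
```

The resulting `(feature, label)` rows can then be passed to any gradient boosting classifier, for example scikit-learn's `GradientBoostingClassifier`, and the fitted model applied to the full data set.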

[0038] Fina...


Abstract

The invention relates to an abnormal data detection method based on active learning. The method comprises the following steps: selecting statistics-based, similarity-based, and axis-parallel subspace partitioning unsupervised models as base learners according to a comparative analysis of candidate base learners; merging the data points whose outlier scores from the base learners lie at the boundary between outlier and normal, and presenting the merged data to human experts for labeling; and training a supervised binary classification model on a sample drawn from the labeled data set and the data set generated by the vote of the base learners, then applying the model to the full data set to obtain the final mining result. By combining active learning with model ensembles, the invention provides OMAL, an ensemble outlier mining method based on active learning. A supervised binary classification model is trained from the results of multiple unsupervised base learners together with human expert knowledge, achieving high accuracy while reducing workload and improving scalability.

Description

Technical field

[0001] The invention relates to a method for detecting abnormal data, in particular to a method for detecting abnormal data based on active learning.

Background art

[0002] An outlier is a data point that deviates significantly from the rest of the data in a data set, raising the suspicion that it was produced by a different mechanism. Outlier detection, also known as outlier mining, has received extensive attention and research because of its broad application prospects in financial fraud detection, network intrusion detection, fault detection, bioinformatics, and other fields.

[0003] Outlier detection tasks usually lack labeled data, and outliers account for only a small part of the entire data set. Outlier detection is therefore more difficult than other data mining tasks. At present, research on outlier detection can be divided mainly into the following categories: (1) detection methods based on probability and statistics, including dete...


Application Information

IPC(8): G06N20/20
Inventor: 赵晓永, 王磊, 李忱, 闫阳
Owner: BEIJING INFORMATION SCI & TECH UNIV