Unbalanced data set conversion intrusion detection method and system based on sampling and feature reduction

A technique for handling unbalanced data in intrusion detection, applicable to instruments, character and pattern recognition, computer components, and related fields. It addresses problems such as unbalanced data sets reducing the accuracy of network intrusion risk detection.

Pending Publication Date: 2020-12-15
COMP NETWORK INFORMATION CENT CHINESE ACADEMY OF SCI

AI Technical Summary

Problems solved by technology

Data collection generally gathers network log data, but existing log data sets are unbalanced, which reduces the accuracy of network intrusion risk detection.



Examples


Embodiment 1

[0074] Embodiment 1 of the present invention provides an intrusion detection method that converts an unbalanced data set through sampling and feature reduction. As shown in Figure 1, the method includes the following steps:

[0075] S1: Obtain an unbalanced data set from the network log data; here the unbalanced data set is a minority class sample set;

[0076] S2: Oversample the minority class sample set to form a new minority class sample set, which serves as the new unbalanced data set. The oversampling uses the S-NKSMOTE algorithm. Referring to Figure 2, the specific steps are:

[0077] S21: Obtain the k nearest neighbor samples of a sample x in the minority class sample set;

[0078] Here, the k nearest neighbor samples are the k samples closest to sample x in the kernel space; the value of k is configurable and can be, for example, 100 or 500;

[0079] S22: Compare the number of minority class samples in the k nearest neighbor samples wit...
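
The neighbor search in step S21 can be sketched as follows. This is a minimal illustration, not the patent's S-NKSMOTE implementation: the Gaussian (RBF) kernel, the `gamma` value, and all function names are assumptions introduced here. It relies only on the standard identity that the squared distance in kernel space is K(a,a) - 2K(a,b) + K(b,b). The helper `count_minority` covers only the counting part of step S22, because the rest of that step is truncated in the source.

```python
import math

def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel; the kernel choice and gamma are assumptions."""
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq)

def kernel_distance(a, b, kernel=rbf_kernel):
    """Distance between a and b in the feature space induced by `kernel`."""
    sq = kernel(a, a) - 2.0 * kernel(a, b) + kernel(b, b)
    return math.sqrt(max(sq, 0.0))  # clamp tiny negative rounding errors

def k_nearest_in_kernel_space(x_index, samples, k, kernel=rbf_kernel):
    """Indices of the k samples closest to samples[x_index] in kernel space."""
    x = samples[x_index]
    others = [i for i in range(len(samples)) if i != x_index]
    others.sort(key=lambda i: kernel_distance(x, samples[i], kernel))
    return others[:k]

def count_minority(neighbor_indices, labels, minority_label):
    """Number of the given neighbors that carry the minority class label."""
    return sum(1 for i in neighbor_indices if labels[i] == minority_label)
```

For example, `k_nearest_in_kernel_space(0, samples, 2)` returns the indices of the two samples nearest to `samples[0]`; which pool of samples to search is left to the caller.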

Embodiment 2

[0098] Embodiment 2 of the present invention provides an intrusion detection method that converts an unbalanced data set through sampling and feature reduction. The detection method includes the following steps:

[0099] S1: Obtain an unbalanced data set from the network log data; the unbalanced data set consists of a majority class sample set and a minority class sample set;

[0100] S2: Sample the unbalanced data set to obtain a new unbalanced data set. Referring to Figure 4, step S2 specifically includes:

[0101] S210: Acquire the boundary sample sets of the majority class sample set and the minority class sample set;

[0102] Referring to Figure 5, step S210 specifically includes the following steps, where all distances referred to below are distances in the kernel space:

[0103] S211: For each majority class sample in the majority class sample set, calculate the distance to its nearest minority class sample;

[0104] S212: Calculate the distance between each minorit...
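
Step S211 can be sketched as below. This is an illustrative sketch under stated assumptions: the RBF kernel, `gamma`, and the function names are hypothetical and not taken from the patent. It computes only the nearest opposite-class distance; how the boundary sample sets are then chosen from these distances is not visible in the truncated text and is not attempted here.

```python
import math

def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel; the kernel choice and gamma are assumptions."""
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq)

def kernel_distance(a, b, kernel=rbf_kernel):
    """Distance between a and b in the feature space induced by `kernel`."""
    sq = kernel(a, a) - 2.0 * kernel(a, b) + kernel(b, b)
    return math.sqrt(max(sq, 0.0))  # clamp tiny negative rounding errors

def nearest_opposite_distances(source, target, kernel=rbf_kernel):
    """For each sample in `source`, the kernel-space distance to its
    nearest sample in `target` (e.g. majority -> nearest minority)."""
    return [min(kernel_distance(s, t, kernel) for t in target) for s in source]
```

Calling `nearest_opposite_distances(majority, minority)` yields one distance per majority class sample, in the same order as the input list.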

Embodiment 3

[0144] Embodiment 3 further defines the oversampling process on the basis of Embodiment 1, specifically:

[0145] Calculate the distance between each minority class sample in the minority class sample set and the center sample, and oversample the minority class sample set according to the calculated distances to obtain a new minority class sample set. This specifically includes the following steps:

[0146] Calculate the distance between each sample in the minority class sample set and the center sample;

[0147] Sort the distances in ascending order and arrange them into an R'×T' matrix;

[0148] Both R' and T' are preset values, which can be the same or different, and can take values such as 50, 100 or 200;

[0149] Starting from the first row, use the S-NKSMOTE algorithm to oversample the samples corresponding to each row, following the oversampling method of Embodiment 1 and Figure 2;

[0150] After the samples in each row of the matrix are oversampled, the sample set formed after oversampling is input into ...
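
The sorting and matrix arrangement in paragraphs [0146] to [0148] can be sketched as follows. Several points are assumptions made for illustration only: the center sample is taken to be the per-feature mean, Euclidean distance is the default, the matrix is filled row by row, and the number of samples is required to equal R'×T'. The function names are hypothetical. The matrix stores sample indices so that each row can afterwards be handed to an oversampling routine.

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance (an assumption; a kernel distance could be passed instead)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def centroid(samples):
    """Per-feature mean, used here as a stand-in for the 'center sample'."""
    n = len(samples)
    return tuple(sum(col) / n for col in zip(*samples))

def distance_sorted_matrix(samples, rows, cols, center=None, dist=euclidean):
    """Sort sample indices by ascending distance to the center and
    arrange them row by row into a rows x cols matrix."""
    if rows * cols != len(samples):
        raise ValueError("rows * cols must equal the number of samples")
    if center is None:
        center = centroid(samples)
    order = sorted(range(len(samples)), key=lambda i: dist(samples[i], center))
    return [order[r * cols:(r + 1) * cols] for r in range(rows)]
```

With this layout the first row holds the samples closest to the center, so iterating over the rows from the first onward processes samples from the inside of the class outward.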



Abstract

The invention provides an intrusion detection method and system that converts unbalanced data sets through sampling and feature reduction. The method comprises the following steps: first, the samples in the unbalanced data set are sampled; next, the features are sorted in descending order of their correlation with the class labels; then features are deleted one dimension at a time, starting from the last dimension in that order. Each time a feature dimension is deleted, the reduced sample data set is input into a random forest model and the corresponding ACC (accuracy) value is calculated. All ACC values are compared, and the feature dimension corresponding to the maximum ACC value is selected as the target feature dimension for feature reduction. The new unbalanced data obtained through this conversion is input into a multi-class SVM for training, and the resulting detection model is applied to the network log data to be detected, which addresses the low detection accuracy caused by sample imbalance.
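
The feature reduction loop described in the abstract can be sketched as below. This is a simplified illustration, not the patented procedure: Pearson correlation with numerically encoded labels is an assumed choice of correlation measure, and the random forest model is replaced by a caller-supplied `accuracy_fn` that stands in for training and scoring it. All names are hypothetical. When several dimensions tie on accuracy, this sketch keeps the first one found, which is the larger feature set.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def rank_features(X, y):
    """Feature indices sorted by absolute correlation with y, largest first."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)

def select_feature_dimension(X, y, accuracy_fn):
    """Drop the lowest-ranked feature one at a time, score each reduced
    data set with accuracy_fn, and return the best (features, accuracy)."""
    ranked = rank_features(X, y)
    best_acc, best_features = -1.0, ranked
    for d in range(len(ranked) - 1, 0, -1):  # sizes n-1, n-2, ..., 1
        kept = ranked[:d]
        reduced = [[row[j] for j in kept] for row in X]
        acc = accuracy_fn(reduced, y)
        if acc > best_acc:
            best_acc, best_features = acc, kept
    return best_features, best_acc
```

In practice `accuracy_fn` would train a classifier on the reduced samples and return its accuracy on held-out data; any callable taking `(reduced_X, y)` and returning a number works with this sketch.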

Description

Technical field

[0001] The invention belongs to the technical field of network intrusion detection, and in particular relates to an intrusion detection method and system for unbalanced data set conversion based on sampling and feature reduction.

Background technique

[0002] With the development of the Internet, network attacks of all kinds keep emerging and threaten network security. The purpose of intrusion detection is to analyze network data and discover suspicious attack types. Usually, methods based on machine vision and neural networks are used to detect network intrusion behaviors. The detection process generally includes steps such as data collection, analysis, and processing. Data collection generally gathers network log data, but existing log data sets are unbalanced, which reduces the accuracy of network intrusion risk detection.

Contents of the invention

[0003] In order to solve the problems in the prior art, the presen...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62
CPC: G06F18/2411, G06F18/214
Inventor: 龙春, 魏金侠, 万巍, 赵静, 杨帆
Owner: COMP NETWORK INFORMATION CENT CHINESE ACADEMY OF SCI