Self-adaptive host intrusion detection sequence feature extraction method and system

A feature extraction and intrusion detection technology, applied in computer parts, computer security devices, instruments, etc. It addresses problems such as overfitting of classification models, growth in the number of subsequences, and the increased computational cost of processing them, and it achieves good adaptability and a comprehensive characterization of program behavior.

Active Publication Date: 2021-07-09
四川阁侯科技有限公司

AI Technical Summary

Problems solved by technology

[0003] The purpose of the present invention is to provide an adaptive host intrusion detection sequence feature extraction method and system that address the difficulty of choosing a suitable window length in prior-art feature extraction methods based on a fixed-length window. A poorly chosen window length causes explosive growth in the number of system call subsequences and higher computing costs. Short subsequences are easily bypassed by attackers, while the length of long subsequences is highly correlated with the data used, so training with long sequences can easily cause the classification model to overfit.



Examples


Embodiment 1

[0047] An adaptive host intrusion detection sequence feature extraction method, comprising:

[0048] S1: Extract fixed-length features. Using an N-Gram (N-element model) sliding window, divide each system call sequence in the normal system call training data set (that is, the training data) into fixed-length subsequences. Weight each subsequence with TF-IDF, then filter the subsequences by weight to obtain the set of fixed-length subsequences. This set is the fixed-length corpus, as shown in Figure 2.
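The sliding-window split in S1 can be illustrated with a minimal Python sketch. The function name `ngram_subsequences`, the integer encoding of system calls, and the sample trace are hypothetical and chosen only for illustration; they do not come from the patent text.

```python
from typing import List, Sequence, Tuple


def ngram_subsequences(calls: Sequence[int], n: int) -> List[Tuple[int, ...]]:
    """Slide a window of length n over one system call sequence.

    Returns every contiguous subsequence of length n, in order.
    A sequence shorter than n yields an empty list.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return [tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)]


# Hypothetical trace: each integer stands for one system call number.
trace = [3, 5, 5, 6, 3, 5]
print(ngram_subsequences(trace, 2))
```

A window of length 2 over a trace of six calls yields five overlapping subsequences, which shows how quickly the subsequence count grows with trace length.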

[0049] The calculation method of using TF-IDF to weight each subsequence in the above step S1 is as follows:

[0050] To calculate the inverse sequence frequency, first use N-Gram to divide the system call sequence into equal-length subsequences of length 2; that is, a subsequence of length 2 is a fixed-length subsequence. Then count each fixed-length subseque...
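As a rough illustration of this weighting step, the sketch below counts how many training sequences contain each length-2 subsequence and derives a weight from the inverse of that count. The exact formula is cut off in this text, so the standard log-scaled inverse document frequency, log(N / f), is used here as an assumption rather than as the patent's own formula. The helper names (`bigrams`, `inverse_sequence_frequency`, `top_k`), the choice to keep the highest-weighted subsequences, and the sample traces are likewise hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Gram = Tuple[int, ...]


def bigrams(calls: Sequence[int]) -> List[Gram]:
    """Split one system call sequence into subsequences of length 2."""
    return [tuple(calls[i:i + 2]) for i in range(len(calls) - 1)]


def inverse_sequence_frequency(traces: Sequence[Sequence[int]]) -> Dict[Gram, float]:
    """Weight each bigram by log(N / f), where f is the number of
    sequences containing it (assumed formula, standard IDF form)."""
    seq_freq: Counter = Counter()
    for calls in traces:
        seq_freq.update(set(bigrams(calls)))  # count each sequence at most once
    total = len(traces)
    return {gram: math.log(total / freq) for gram, freq in seq_freq.items()}


def top_k(weights: Dict[Gram, float], k: int) -> List[Gram]:
    """Keep the k highest-weighted bigrams (assumed filtering direction)."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:k]]


# Hypothetical training traces of normal behavior.
traces = [[1, 2, 3], [1, 2, 4], [1, 2, 3, 4]]
weights = inverse_sequence_frequency(traces)
fixed_length_corpus = top_k(weights, 2)
```

Under this assumed formula, a subsequence that occurs in every training sequence receives weight 0, while one that occurs in a single sequence receives the largest weight.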

Embodiment 2

[0074] As shown in Figure 1, an adaptive host intrusion detection sequence feature extraction system includes a fixed-length feature extraction module, a variable-length feature extraction module, a feature fusion module, an autoencoder module and a classifier module, wherein:

[0075] Fixed-length feature extraction module: uses N-Gram technology to segment the input normal system call sequences with a sliding window. It counts the frequency with which each fixed-length subsequence appears across the different system call sequences. The process behavior weight is then calculated as the inverse ratio of the frequency of a single fixed-length subsequence to the frequency of all sequences. Because the process behavior weight characterizes the fixed-length subsequence t_i, it indicates the classification contribution of this fixed-length subsequence to anomaly detection. Finally, according to the size of the process...


Abstract

The invention discloses a self-adaptive host intrusion detection sequence feature extraction method, which comprises the following steps: extracting fixed-length feature subsequences and variable-length feature subsequences to obtain a fixed-length corpus and a variable-length corpus; taking the union of the two to obtain a feature corpus; counting the occurrence frequency of each subsequence of the feature corpus in a system call sequence under test to obtain a feature vector; reducing the dimension of the feature vectors with an autoencoder; and inputting the reduced feature vectors into a classifier to obtain a classification result. The invention further discloses a self-adaptive host intrusion detection sequence feature extraction system, which comprises a fixed-length feature extraction module, a variable-length feature extraction module, a feature fusion module, an autoencoder and a classifier. The method and system describe host program behavior by combining fixed-length and variable-length features and therefore adapt better: variable-length feature extraction describes a given program's behavior more accurately, and the TF-IDF-based fixed-length feature selection method further extracts the features that contribute most to classification.
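The feature fusion and counting steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the names `count_occurrences` and `feature_vector`, the sorted ordering of the fused corpus, the counting of overlapping matches, and the sample data are all hypothetical. The dimension reduction and classification steps are not shown.

```python
from typing import List, Sequence, Set, Tuple

Gram = Tuple[int, ...]


def count_occurrences(calls: Sequence[int], pattern: Gram) -> int:
    """Count (possibly overlapping) occurrences of pattern in calls."""
    n = len(pattern)
    return sum(
        1 for i in range(len(calls) - n + 1) if tuple(calls[i:i + n]) == pattern
    )


def feature_vector(
    calls: Sequence[int], fixed_corpus: Set[Gram], variable_corpus: Set[Gram]
) -> List[int]:
    """Fuse both corpora by set union, then count each corpus entry
    in the sequence under test. Sorting fixes the vector's dimension order."""
    corpus = sorted(fixed_corpus | variable_corpus)
    return [count_occurrences(calls, pattern) for pattern in corpus]


# Hypothetical corpora and test trace.
fixed = {(1, 2), (2, 3)}
variable = {(1, 2, 3), (2, 3)}
vector = feature_vector([1, 2, 3, 1, 2], fixed, variable)
print(vector)
```

Because the union removes duplicates, a subsequence present in both corpora, such as (2, 3) here, contributes a single dimension to the feature vector.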

Description

Technical field

[0001] The invention relates to the technical field of host intrusion detection, in particular to an adaptive host intrusion detection sequence feature extraction method and system.

Background art

[0002] Host intrusion detection technology is an intrusion detection technology that prevents further attacks through post-mortem analysis. It has the advantages of cost-effectiveness, a centralized detection field of view, easy tailoring by users, and no need for a separate hardware platform. The system call sequence represents the behavior characteristics of the running process in the host, and is an important data source for the host intrusion detection system. The sequence of system calls is usually abstracted as a numerical vector representing the called functions, and the combination sequence between various system calls represents the potential action target of the process. The traditional host intrusion detection feature extraction method has a wind...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F21/56, G06K9/62
CPC: G06F21/566, G06F2221/033, G06F18/213, G06F18/24
Inventor: 陈文, 廖小瑶, 黄登
Owner: 四川阁侯科技有限公司