A malicious code classification method based on multiple features and feature selection

A malicious code classification and feature selection technology, applied in the field of malicious code classification, that addresses the problem of reduced training speed and achieves good classification performance, improved classification accuracy, and faster training.

Active Publication Date: 2018-12-18
BEIJING INSTITUTE OF TECHNOLOGY
3 Cites, 6 Cited by

AI Technical Summary

Problems solved by technology

[0005] However, the effectiveness of machine-learning-based malicious code classification depends largely on the features extracted from the malicious code family. Too few features cannot fully represent the type of malicious code family, while too many features not only slow down training but also limit classification performance through problems such as overfitting.


Examples


Example 1

[0061] As shown in Figure 1, the method of Embodiment 1 includes the following steps:

[0062] Step A: Malicious code file preprocessing;

[0063] In this embodiment, the malicious code samples come from the dataset provided by Microsoft, which includes '.bytes' files and '.asm' files; the PE file header of each malicious code sample has been removed;

[0064] Specifically, in this embodiment the files in the sample set are checked, and malicious code samples that contain only a '.bytes' file or only an '.asm' file are deleted;
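The filtering in Step A can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `complete_sample_ids` and the file listing are hypothetical, and it assumes a sample's ID is the file name without its extension.

```python
from pathlib import PurePath


def complete_sample_ids(filenames):
    """Return sorted IDs of samples that have both a '.bytes' and an '.asm' file."""
    seen = {}  # sample ID -> set of extensions found for that ID
    for name in filenames:
        path = PurePath(name)
        if path.suffix in {".bytes", ".asm"}:
            seen.setdefault(path.stem, set()).add(path.suffix)
    # Keep only samples for which both file types are present.
    return sorted(sid for sid, exts in seen.items() if exts == {".bytes", ".asm"})


# Hypothetical listing: "b2" lacks its '.asm' file and "c3" lacks its '.bytes' file.
files = ["a1.bytes", "a1.asm", "b2.bytes", "c3.asm", "d4.asm", "d4.bytes"]
kept = complete_sample_ids(files)
print(kept)
```

Samples that are not returned would be the ones deleted from the sample set before feature extraction.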

[0065] Step B: Generate malicious code images and extract pixel features;

[0066] The pixel feature extraction process for malicious code is shown in Figure 2;

[0067] In this embodiment, the '.asm' file is used to generate the malicious code image, and Python is used for feature extraction;

[0068] First, read the '.asm' file and convert it to a hexadecimal file, then split the hexadecimal string by byte....
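The hexadecimal conversion and byte splitting described in [0068] might look like the sketch below. The paragraph is truncated in this listing, so everything after the byte split is an assumption: mapping each byte to a grayscale value in 0..255, the row width, and dropping a trailing incomplete row. The function name `bytes_to_pixel_rows` is hypothetical.

```python
def bytes_to_pixel_rows(data, width=4):
    """Hex-encode raw bytes, split the hex string by byte, and arrange values in rows."""
    hex_string = data.hex()
    # Two hexadecimal digits encode one byte.
    byte_tokens = [hex_string[i:i + 2] for i in range(0, len(hex_string), 2)]
    # Assumption: each byte becomes one grayscale pixel value in 0..255.
    pixels = [int(token, 16) for token in byte_tokens]
    # Assumption: a trailing incomplete row is dropped rather than padded.
    usable = len(pixels) - len(pixels) % width
    return [pixels[i:i + width] for i in range(0, usable, width)]


# Stand-in for the contents of an '.asm' file read in binary mode.
sample = b"mov eax, 1\n"
rows = bytes_to_pixel_rows(sample, width=4)
print(rows)
```

In practice the width would be much larger and the rows would be written out as an image; those choices are not visible in the text above.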



Abstract

The invention relates to a malicious code classification method based on multiple features and feature selection, belonging to the technical fields of computer security and machine learning. The invention comprises a method for obtaining multiple features by fusing features of different classes from a malicious code family, and a feature selection method for processing those features. Specifically, the method comprises the following steps: pixel features and n-gram features are fused to obtain multi-feature information representing the malicious code family; initial feature selection is performed on the fused features using a logistic regression model with an L1 regularization term optimized by an L2 regularization term; the feature dimension is then reduced using linear discriminant analysis (LDA); finally, the malicious code classifier is trained using the K-nearest-neighbor algorithm. The invention provides feature data of more dimensions for the training process and solves the problem that key features cannot be selected. Reducing the feature dimension with LDA gives the mapped samples better classification performance, which both accelerates training and improves the classification accuracy of the model.
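The feature fusion step in the abstract can be illustrated with a small sketch. The abstract does not say which token sequence the n-grams are taken over, so opcode tokens are an assumption here, as are the fixed vocabulary, the toy pixel values, and the helper names `ngram_counts` and `fuse_features`.

```python
from collections import Counter


def ngram_counts(tokens, n=2):
    """Count contiguous n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def fuse_features(pixel_features, counts, vocabulary):
    """Concatenate pixel features with n-gram counts in a fixed vocabulary order."""
    return list(pixel_features) + [counts.get(gram, 0) for gram in vocabulary]


# Assumption: n-grams are computed over opcodes parsed from the '.asm' file.
opcodes = ["push", "mov", "push", "mov", "call"]
counts = ngram_counts(opcodes, n=2)

# Hypothetical vocabulary shared by all samples so every fused vector has the same length.
vocabulary = [("push", "mov"), ("mov", "push"), ("mov", "call"), ("call", "ret")]
fused = fuse_features([109, 111], counts, vocabulary)
print(fused)
```

The fused vectors would then feed the feature selection, LDA, and K-nearest-neighbor stages named in the abstract, which this sketch does not cover.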

Description

Technical field

[0001] The invention relates to a malicious code classification method based on multiple features and feature selection, and belongs to the technical fields of computer security and machine learning.

Background technique

[0002] With the development of malicious code technology, malicious code has begun to mutate as it spreads in order to avoid detection and removal. The number of variants of the same malicious code has increased sharply, and the variants differ greatly in form from the original. Detecting and preventing malicious code is a major challenge in the security field.

[0003] Malicious code refers to all malicious programs designed to destroy the reliability, usability, security and data integrity of computer or network systems or consume system resources. With the gradual development of anti-malicious code technology, active defense technology and cloud detection and killing technology have been increasin...


Application Information

IPC(8): G06F21/56, G06K9/62
CPC: G06F21/563, G06F18/23213, G06F18/241, G06F18/253
Inventor: 金福生, 王茹楠, 秦勇
Owner: BEIJING INSTITUTE OF TECHNOLOGY