Malicious code detection method and system

A malicious code detection technology, applied in the fields of instruments, calculation, and electrical digital data processing, that addresses the problems of fuzzy category boundaries, loss of some data features, and lack of obvious improvement.

Active Publication Date: 2020-05-12
GUANGZHOU UNIVERSITY
7 Cites, 10 Cited by

AI Technical Summary

Problems solved by technology

[0004] However, the model proposed by Tian et al. considers only the number and frequency of Windows API calls and ignores the context contained in the API call sequence, so it loses some characteristics of the data.
The model proposed by Shifu Hou et al. merges the clusters that contain multiple categories of data into one large mixed cluster after clustering. Some of these clusters are mixed because the data are distributed as non-spherical clusters or have fuzzy category boundaries, so that data points of different categories are interspersed with each other. Others have reduced purity only because they include a small number of outliers or noise points.
For the latter, the data subsets corresponding to the mixed clusters are often highly imbalanced, and merging the clusters brings no obvious improvement. The noise points introduced in the process may also reduce classification accuracy.


Examples

Embodiment

[0121] This embodiment provides a malicious code detection method built on a detection model that combines feature integration with data partitioning. The model has two parts. The first part uses TF-IDF (Term Frequency-Inverse Document Frequency) and the Doc2vec algorithm to extract features from the action sequence of the malicious code. The second part builds on the first and uses an improved clustering-based ensemble classification model to classify the malicious code. As shown in Figure 1, the specific content of this embodiment is as follows:

[0122] S1. Extract and fuse malicious code family features using TF-IDF and Doc2vec

[0123] Treat the Windows API action sequence recorded while each malicious code sample runs as a text with contextual relations, and extract features from it using TF-IDF and Doc2vec;

[0124] TF-IDF is a statistical method used to evaluate the importance of specific words in t...
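A minimal sketch of the TF-IDF half of step S1 may help make the idea concrete. It treats each API call as a "word" and each trace as a "document". The function name `tfidf`, the sample traces, and the smoothed IDF variant are all assumptions made for illustration; the exact weighting used in the patent is not visible in this excerpt, and the Doc2vec half is not shown.

```python
import math
from collections import Counter

def tfidf(sequences):
    """Return (vocabulary, matrix) of TF-IDF weights for tokenized API traces."""
    vocab = sorted({api for seq in sequences for api in seq})
    n_docs = len(sequences)
    # Document frequency: number of traces in which each API appears at least once.
    df = Counter(api for seq in sequences for api in set(seq))
    # Smoothed IDF. This is one common variant, assumed here for illustration.
    idf = {api: math.log((1 + n_docs) / (1 + df[api])) + 1 for api in vocab}
    matrix = []
    for seq in sequences:
        counts = Counter(seq)
        total = len(seq) or 1  # guard against empty traces
        matrix.append([counts[api] / total * idf[api] for api in vocab])
    return vocab, matrix

# Hypothetical API traces, one list per malware sample.
traces = [
    ["CreateFileW", "WriteFile", "WriteFile", "CloseHandle"],
    ["RegOpenKeyExW", "RegSetValueExW", "CloseHandle"],
    ["CreateFileW", "ReadFile", "CloseHandle"],
]
vocab, features = tfidf(traces)
```

In this toy data, `CloseHandle` occurs in every trace and so receives the lowest IDF, while `WriteFile` occurs in only one trace and is weighted more heavily there. Note that the resulting vector ignores call order, which is the gap Doc2vec is meant to fill.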

Abstract

The invention discloses a malicious code detection method and system. The method comprises the following steps: S1, treating the Windows API action sequence produced while each malicious code sample runs as a text with contextual relations, and extracting features from it with TF-IDF and Doc2vec respectively; S2, after obtaining the TF-IDF feature matrix and the Doc2vec feature matrix, concatenating the features extracted by TF-IDF and Doc2vec and applying dimensionality reduction to obtain the feature matrix of the malicious code; S3, constructing an improved clustering-based ensemble classification model that classifies the data set using a plurality of base learners; and S4, in the prediction stage, inputting each sample into the nearest single-class cluster or SVM classifier in each base learner to output a predicted class, and then, by majority vote, taking the class output by most learners as the final predicted class. By combining TF-IDF and Doc2vec, the method takes into account both the API frequency in the malicious code action sequence and the contextual associations within the sequence, which improves malicious code detection accuracy.
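The prediction stage in S4 can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: each base learner is modeled as a list of clusters, a single-class cluster carries a fixed `label`, a mixed cluster carries a `classifier` callable standing in for a trained SVM, distance is Euclidean, and ties in the vote fall to the class seen first. All names, centroids, and family labels are hypothetical.

```python
import math
from collections import Counter

def nearest_cluster(sample, clusters):
    """Pick the cluster whose centroid is closest to the sample (Euclidean)."""
    return min(clusters, key=lambda c: math.dist(sample, c["centroid"]))

def learner_predict(sample, clusters):
    """One base learner: route the sample to its nearest cluster, then label it."""
    cluster = nearest_cluster(sample, clusters)
    if cluster["label"] is not None:
        # Single-class cluster: every member shares this label.
        return cluster["label"]
    # Mixed cluster: defer to its classifier (a stand-in for an SVM here).
    return cluster["classifier"](sample)

def ensemble_predict(sample, learners):
    """Majority vote over the base learners' outputs."""
    votes = Counter(learner_predict(sample, clusters) for clusters in learners)
    return votes.most_common(1)[0][0]

# Three hypothetical base learners, each with its own partition of the data.
learners = [
    [
        {"centroid": (0.0, 0.0), "label": "trojan", "classifier": None},
        {"centroid": (5.0, 5.0), "label": "worm", "classifier": None},
    ],
    [
        {"centroid": (0.0, 1.0), "label": None,
         "classifier": lambda x: "trojan" if x[0] < 1 else "ransomware"},
        {"centroid": (6.0, 6.0), "label": "worm", "classifier": None},
    ],
    [
        {"centroid": (1.0, 0.0), "label": "ransomware", "classifier": None},
        {"centroid": (4.0, 6.0), "label": "worm", "classifier": None},
    ],
]

prediction_a = ensemble_predict((0.5, 0.5), learners)
prediction_b = ensemble_predict((5.0, 5.5), learners)
```

For the first sample, two learners vote `trojan` and one votes `ransomware`, so the ensemble returns `trojan`; for the second, all three agree on `worm`. In a fuller pipeline the sample would be the fused, dimensionality-reduced TF-IDF and Doc2vec vector from S2 rather than a 2-D point.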

Description

Technical field

[0001] The invention belongs to the technical field of network security, and in particular relates to a malicious code detection method and system.

Background art

[0002] Malicious code detection has long been a focus of attention in the field of network security. Malicious programs such as Trojans, worms, mining malware, and ransomware invade systems, tamper with files, and steal information by covertly injecting and running malicious code, posing a huge threat to enterprises and to personal privacy and property security. As malicious code attack and defense technologies continue to confront each other and evolve, malicious code increasingly tends to have many variants, to be highly concealed, to be large in number, and to be updated quickly. At present, the analysis techniques for malicious code can be divided into static analysis and dynamic analysis. Among them, the dynamic analysis technology pays attention t...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F21/56, G06K9/62
CPC: G06F21/563, G06F18/23213, G06F18/2411, G06F18/214
Inventor: 范美华, 李树栋, 吴晓波, 韩伟红, 杨航锋, 付潇鹏, 方滨兴, 田志宏, 殷丽华, 顾钊铨, 仇晶, 李默涵, 唐可可
Owner: GUANGZHOU UNIVERSITY