Malicious code family homology analysis based on semi-supervised density clustering

A homology analysis and semi-supervised clustering technology, applied in the field of data mining, that addresses the inability of existing methods to accurately divide malicious code into families and visualize them, and that achieves accurate family division.

Active Publication Date: 2019-01-11
SICHUAN UNIV

AI Technical Summary

Problems solved by technology

[0006] Existing homology analysis methods cannot accurately divide malicious code into families or visualize the evolutionary relationships between malicious code variants of the same family. To address these weaknesses, the present invention improves the DBSCAN algorithm and combines it with semi-supervised clustering technology.


Embodiment Construction

[0029] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be described in further detail below with reference to the accompanying drawings.

[0030] Figure 1 shows the overall design of the malicious code family homology analysis model proposed by the present invention. The model is divided into the following modules: a dynamic API call sequence extraction module, a typical API sequence pattern mining module, a file characterization module, a semi-supervised family clustering module, and a family evolution graph construction module.

[0031] Feature mining stage: this process is marked by the dotted line in Figure 1. The dynamic API call sequence extraction module extracts the API call sequences of known malicious samples from known malicious code families, and the family labels of those samples are then used to mine the API call sequence data for typical API call sequence patterns that can ...
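The excerpt is truncated before it states how the typical patterns are selected, so the following is only a minimal sketch of one plausible reading: treat contiguous API call n-grams as candidate patterns and keep those that occur in most samples of one family and in few samples of other families. The function names, the n-gram representation, and the `min_support` and `max_other` thresholds are all assumptions made for illustration and do not come from the patent.

```python
from collections import Counter, defaultdict


def ngrams(seq, n):
    """Return the set of contiguous n-grams in one API call sequence."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def mine_typical_patterns(samples, n=2, min_support=0.6, max_other=0.2):
    """samples: list of (family_label, api_call_sequence) pairs.

    Hypothetical selection rule (not taken from the patent): an n-gram is
    'typical' for a family if it appears in at least `min_support` of that
    family's samples and in at most `max_other` of all other samples.
    """
    family_sizes = Counter(label for label, _ in samples)
    # family -> n-gram -> number of that family's samples containing it
    counts = defaultdict(Counter)
    for label, seq in samples:
        counts[label].update(ngrams(seq, n))

    patterns = {}
    for family, size in family_sizes.items():
        other_size = sum(s for f, s in family_sizes.items() if f != family)
        typical = []
        for gram, hits in counts[family].items():
            other_hits = sum(counts[f][gram] for f in family_sizes if f != family)
            other_support = other_hits / other_size if other_size else 0.0
            if hits / size >= min_support and other_support <= max_other:
                typical.append(gram)
        patterns[family] = sorted(typical)
    return patterns


samples = [
    ("A", ["CreateFile", "WriteFile", "CloseHandle"]),
    ("A", ["CreateFile", "WriteFile", "RegSetValue"]),
    ("B", ["Connect", "Send", "CloseHandle"]),
    ("B", ["Connect", "Send", "Recv"]),
]
patterns = mine_typical_patterns(samples)
```

With this toy input, each family keeps only the one call pair shared by both of its samples. Using the family labels in this way is what makes the mining step supervised: a pattern that is common everywhere is discarded because it does not distinguish one family from another.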

Abstract

Because most new malicious code belongs to known malicious code families, information about existing samples in the virus library is used to assist homology analysis and achieve more accurate family clustering. On the basis of accurate family clustering, a family evolution graph is constructed to visualize the evolutionary relationships between malicious code variants in the same family and to predict the direction in which the variants will develop, providing technical support for in-depth analysis of malicious code. Drawing on the evolutionary characteristics of malicious code, the invention provides a malicious code homology analysis model that supports family graph construction, and experimental results show that the model is effective. A semi-supervised density clustering algorithm is provided; experiments show that it achieves accurate family clustering and provides clues for discovering unknown families. An algorithm based on an asymmetric similarity measure is provided to construct a family evolution graph for each malicious code family and visualize the evolutionary relationships among malicious samples within the same family.
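The abstract does not define the asymmetric similarity measure, so the sketch below substitutes a simple stand-in: set containment, where the similarity from a to b is the fraction of a's features that also appear in b. Under this assumption, a variant that retains almost everything from an earlier sample while adding new behaviour scores high in one direction and lower in the other, which gives each edge a direction. The function names, the feature sets, and the `threshold` parameter are hypothetical.

```python
def containment(a, b):
    """Asymmetric similarity: fraction of a's features that also appear in b."""
    return len(a & b) / len(a) if a else 0.0


def build_evolution_edges(features, threshold=0.5):
    """features: dict mapping sample name -> set of behavioural features.

    Returns directed (parent, child) edges. This is an illustrative rule,
    not the patent's algorithm: `parent` is a candidate ancestor of `child`
    when the child retains more of the parent than the parent retains of
    the child, and each child keeps only its best-scoring candidate.
    """
    edges = []
    for child, child_feats in features.items():
        candidates = []
        for parent, parent_feats in features.items():
            if parent == child:
                continue
            forward = containment(parent_feats, child_feats)
            backward = containment(child_feats, parent_feats)
            if forward > backward and forward >= threshold:
                # Ties on `forward` are broken by `backward`, which favours
                # the closest ancestor over a more distant one.
                candidates.append((forward, backward, parent))
        if candidates:
            _, _, best_parent = max(candidates)
            edges.append((best_parent, child))
    return edges


features = {
    "v1": {"a", "b"},
    "v2": {"a", "b", "c"},
    "v3": {"a", "b", "c", "d"},
}
edges = build_evolution_edges(features)
```

For this toy family the sketch yields a chain from `v1` to `v2` to `v3`. A symmetric measure such as Jaccard similarity could say that two samples are related, but not which one is the likely ancestor, which is why an asymmetric measure suits an evolution graph.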

Description

Technical field

[0001] The invention uses semi-supervised clustering technology to perform family clustering on malicious code, and uses an asymmetric similarity calculation method to construct a family evolution graph that visualizes the evolutionary relationships among variants in the same family. Based on a study of current clustering algorithms and the problems they encounter, and combined with information about known samples in the virus database, a semi-supervised density clustering algorithm, S-DBSCAN, is proposed. The invention belongs to the field of data mining technology.

Background

[0002] Static automated analysis has difficulty coping with obfuscation, encryption, and packing, while dynamic automated analysis is less efficient. Existing frameworks mostly use virtual machines as the analysis environment, which makes it difficult to resist techniques that detect the dynamic analysis environment, and cannot obtain reliable and accurate dy...
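The details of S-DBSCAN are not included in this excerpt. As a rough illustration of how known samples could guide a density clustering, the sketch below runs a DBSCAN-style expansion in two passes: clusters are first grown from labelled samples and inherit their family names, and any remaining dense regions are then reported as candidate unknown families. This is an assumed simplification, not the patented algorithm; `seeded_dbscan`, its parameters, and the rule that a cluster never absorbs a sample labelled with a different family are all hypothetical.

```python
import math


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def expand(points, assignment, known, start, cluster_id, eps, min_pts):
    """Grow cluster_id from `start` through density-reachable, unassigned points."""
    assignment[start] = cluster_id
    queue = [start]
    while queue:
        i = queue.pop()
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            continue  # border point: joins the cluster but does not extend it
        for j in neighbors:
            # Assumed constraint: never absorb a sample known to belong elsewhere.
            if assignment[j] is None and known.get(j, cluster_id) == cluster_id:
                assignment[j] = cluster_id
                queue.append(j)


def seeded_dbscan(points, known, eps, min_pts):
    """points: list of coordinate tuples; known: dict index -> family name.

    Returns one label per point: a family name, 'unknown-k', or 'noise'.
    """
    assignment = [None] * len(points)
    # Pass 1: known samples seed clusters that carry their family label.
    for i, family in known.items():
        if assignment[i] is None:
            expand(points, assignment, known, i, family, eps, min_pts)
    # Pass 2: remaining dense regions become candidate unknown families.
    next_id = 0
    for i in range(len(points)):
        if assignment[i] is None and len(region_query(points, i, eps)) >= min_pts:
            expand(points, assignment, known, i, f"unknown-{next_id}", eps, min_pts)
            next_id += 1
    return [a if a is not None else "noise" for a in assignment]


points = [(0, 0), (0, 1), (1, 0), (1, 1),          # dense group with one known sample
          (10, 10), (10, 11), (11, 10), (11, 11),  # dense group with no known sample
          (50, 50)]                                # isolated point
labels = seeded_dbscan(points, known={0: "A"}, eps=1.5, min_pts=3)
```

Here the first group takes the family name of its single labelled member, the second group is flagged as a candidate unknown family, and the isolated point is left as noise. In practice the points would be the feature vectors produced by the file characterization module rather than 2-D coordinates.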

Application Information

IPC(8): G06K9/62, G06F21/56
CPC: G06F21/563, G06F18/23
Inventor: 方勇, 刘亮, 黄诚, 荣俸萍, 张与弛
Owner: SICHUAN UNIV