Malicious code family clustering method and system

A malicious code clustering technology, applied in character and pattern recognition, instruments, computing, etc. It addresses the problems of a large candidate range for k, a huge amount of calculation, and heavy consumption of computing resources. Its effects are higher clustering accuracy, a narrower range of candidate values, and lower computational overhead.

Active Publication Date: 2019-11-15
GUANGZHOU UNIVERSITY
Cites 6, cited by 13

AI Technical Summary

Problems solved by technology

However, because there is no overall picture of how the data are distributed, the range of k values that must be tried is generally large when the data volume is huge. Testing each candidate k requires a large amount of calculation and consumes a lot of computing resources.

Examples

Embodiment

[0069] As shown in Figure 1, the malicious code family clustering method of this embodiment is based on the T-SNE and K-means algorithms. The method uses the malicious code execution sequence as the original feature, uses the T-SNE algorithm to visualize the data so that the number of malicious code family clusters can be determined, and then uses the K-means algorithm to cluster the malicious code families. The method of the present invention comprises the following steps:

[0070] (1) Use the T-SNE algorithm to perform dimensionality-reduction visualization of the original malicious code execution sequences. This step includes the following sub-steps:

[0071] (1.1) Use the T-SNE algorithm to model the neighbor distribution of each data point, where neighbors are the set of data points that are close to one another. In the original high-dimensional space, the present invention models this neighbor distribution as a Gaussian distribution, while in t...
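The paragraph above is cut off, so the inventors' exact formulation is not shown. As a minimal sketch, assuming the standard t-SNE high-dimensional model, the neighbor distribution of point x_i is a Gaussian centered on it: p_{j|i} = exp(-||x_i - x_j||^2 / 2σ_i^2) / Σ_{k≠i} exp(-||x_i - x_k||^2 / 2σ_i^2), with p_{i|i} = 0. Choosing each σ_i by binary search on a target perplexity is a standard t-SNE detail that this text does not state. The function names, and the assumption that each execution sequence has already been encoded as a numeric vector, are hypothetical.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def conditional_probabilities(points, i, sigma):
    """Gaussian neighbor distribution p_{j|i} centered on points[i].

    p_{i|i} is 0 and the remaining entries sum to 1.
    """
    n = len(points)
    d = [squared_distance(points[i], p) for p in points]
    # Shift by the nearest-neighbor distance for numerical stability;
    # the common factor cancels when the weights are normalized.
    d_min = min(d[j] for j in range(n) if j != i)
    weights = [0.0 if j == i else math.exp(-(d[j] - d_min) / (2.0 * sigma ** 2))
               for j in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def perplexity(probs):
    """Perp(P_i) = 2 ** H(P_i), with the Shannon entropy H measured in bits."""
    h = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2 ** h


def sigma_for_perplexity(points, i, target, tol=1e-5, max_iter=100):
    """Binary-search sigma_i so that the perplexity of P_i matches `target`.

    Assumes 1 < target < len(points) - 1 (the achievable range).
    """
    lo, hi = 1e-10, 1e10
    sigma = 1.0
    for _ in range(max_iter):
        perp = perplexity(conditional_probabilities(points, i, sigma))
        if abs(perp - target) < tol:
            break
        if perp > target:  # distribution too flat: shrink sigma
            hi = sigma
        else:              # distribution too peaked: widen sigma
            lo = sigma
        sigma = (lo + hi) / 2.0
    return sigma
```

A larger perplexity target widens σ_i and spreads probability mass over more neighbors; a smaller one concentrates it on the nearest points.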

Abstract

The invention discloses a malicious code family clustering method and system. First, the method performs dimensionality-reduction visualization of the original malicious code execution sequences with the T-SNE algorithm. Specifically, it models the neighbor distribution of each data point with the T-SNE algorithm, where neighbors are sets of data points that are close to each other; constructs a model that maps the data points to corresponding probability distributions through a nonlinear function transformation; and trains the constructed model, computing the gradient of the loss function from the conditional probabilities in the low-dimensional space. Second, the method clusters the malicious code families with the K-means algorithm. Specifically, it determines the number of clusters K and the cluster centers, assigns every object to a cluster by computing its distance to the cluster centers, recalculates the cluster centers, and checks whether the stopping condition is met. The system comprises a dimensionality-reduction visualization module and a clustering module. The method solves the problem of how to determine k in the K-means algorithm and improves the accuracy of malicious code family clustering.
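The abstract states only that training computes the gradient of a loss from conditional probabilities in the low-dimensional space. Assuming the standard t-SNE choices, the pieces are: symmetric joint probabilities p_ij = (p_{j|i} + p_{i|j}) / 2n; a Student-t kernel q_ij ∝ (1 + ||y_i - y_j||^2)^-1 in the embedding; and the Kullback-Leibler divergence C = KL(P‖Q) as the loss. Under those choices the gradient is ∂C/∂y_i = 4 Σ_j (p_ij - q_ij)(y_i - y_j)(1 + ||y_i - y_j||^2)^-1. This sketch implements those pieces in plain Python. The helper names are hypothetical, and a single plain gradient-descent step stands in for the optimizer, because the text gives no learning rate, momentum, or early-exaggeration settings.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def symmetrize(cond):
    """Joint p_ij = (p_{j|i} + p_{i|j}) / (2n) from a matrix of conditionals p_{j|i}."""
    n = len(cond)
    return [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)] for i in range(n)]


def low_dim_affinities(Y):
    """Student-t (one degree of freedom) joint probabilities q_ij in the embedding.

    Also returns the unnormalized kernel values (1 + ||y_i - y_j||^2)^-1,
    which the gradient reuses.
    """
    n = len(Y)
    kernel = [[0.0 if i == j else 1.0 / (1.0 + squared_distance(Y[i], Y[j]))
               for j in range(n)] for i in range(n)]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel], kernel


def kl_divergence(P, Q):
    """Loss C = KL(P || Q) = sum over i != j of p_ij * log(p_ij / q_ij)."""
    n = len(P)
    return sum(P[i][j] * math.log(P[i][j] / Q[i][j])
               for i in range(n) for j in range(n) if i != j and P[i][j] > 0)


def kl_gradient(P, Y):
    """dC/dy_i = 4 * sum_j (p_ij - q_ij) * (y_i - y_j) * (1 + ||y_i - y_j||^2)^-1."""
    Q, kernel = low_dim_affinities(Y)
    n, dim = len(Y), len(Y[0])
    grad = []
    for i in range(n):
        g = [0.0] * dim
        for j in range(n):
            if i == j:
                continue
            coeff = 4.0 * (P[i][j] - Q[i][j]) * kernel[i][j]
            for d in range(dim):
                g[d] += coeff * (Y[i][d] - Y[j][d])
        grad.append(g)
    return grad


def gradient_step(P, Y, learning_rate=1.0):
    """One plain gradient-descent update of the embedding (no momentum)."""
    grad = kl_gradient(P, Y)
    return [[y - learning_rate * g for y, g in zip(yi, gi)] for yi, gi in zip(Y, grad)]
```

Because P and Q are both symmetric, the per-point gradients cancel in aggregate (they sum to zero), so the update moves points relative to one another without drifting the whole layout. Repeating the step until the loss levels off gives the 2-D map that is inspected to judge how many families are present.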

Description

Technical field

[0001] The invention belongs to the technical field of malicious code analysis and relates to a malicious code family clustering method and system.

Background art

[0002] The K-means algorithm is one of the classic clustering algorithms. Clustering with K-means requires the number of clusters, k, to be set in advance. In practical applications, however, data sets are often large, and when the data have a complex structural distribution it is difficult to determine the number of clusters beforehand. When the preset number of clusters differs greatly from the actual number, clustering quality drops sharply. If k is chosen much smaller than the actual number of clusters, data points from different classes are merged into the same cluster, so the clusters are poorly separated; if k is chosen much larger than the actual number of ...
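The K-means loop described in the abstract (fix K and the initial centers, assign each object to its nearest center, recompute the centers, check a stopping condition) can be sketched as plain Lloyd iterations. This is a minimal illustration, not the patent's implementation. Euclidean distance, random initial centers drawn from the data, the optional caller-supplied `init`, and "assignments no longer change" as the stopping condition are all assumptions. In the described method, k would come from inspecting the T-SNE visualization rather than from trying many candidate values.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, init=None, max_iter=100, seed=0):
    """Plain Lloyd's K-means. Returns (centers, labels).

    `init` (hypothetical parameter) lets the caller fix the initial centers;
    otherwise k distinct data points are sampled with a seeded RNG.
    """
    if init is None:
        init = random.Random(seed).sample(points, k)
    centers = [list(c) for c in init]
    labels = None
    for _ in range(max_iter):
        # Assign every object to the cluster whose center is nearest.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # stopping condition: assignments unchanged
            break
        labels = new_labels
        # Recompute each center as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster is empty
                centers[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return centers, labels
```

For example, `kmeans(points, 2)` on two well-separated groups of execution-sequence vectors returns one label per object and the mean vector of each group.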

Application Information

IPC (8th edition): G06K9/62; G06F21/56
CPC: G06F21/563; G06F18/23213
Inventors: 杨航锋, 李树栋, 吴晓波, 韩伟红, 范美华, 付潇鹏, 方滨兴, 田志宏, 殷丽华, 顾钊铨, 李默涵, 仇晶, 唐可可
Owner: GUANGZHOU UNIVERSITY