
A malicious code family clustering method and system

A malicious code clustering technology, classified under character and pattern recognition, instruments, and computing. It addresses the heavy computation, the wide range of candidate values, and the consumption of computing resources involved in choosing the number of clusters. It narrows the range of candidate values, reduces computational overhead, and improves clustering accuracy.

Active Publication Date: 2020-07-31
GUANGZHOU UNIVERSITY
Cites: 6; Cited by: 0

AI Technical Summary

Problems solved by technology

However, because the overall distribution of the data is not known in advance, the candidate range for k is generally wide when testing on large data sets. This leads to a large number of trial computations and consumes considerable computing resources.




Embodiment

[0069] As shown in Figure 1, the malicious code family clustering method of this embodiment is based on the t-SNE and K-means algorithms. It uses malicious code execution sequences as the original features, applies the t-SNE algorithm to visualize the data so that the number of malicious code family clusters can be determined, and then uses the K-means algorithm to cluster the malicious code families. The method comprises the following steps:

[0070] (1) Use the t-SNE algorithm to reduce the dimensionality of the original malicious code execution sequences and visualize them. This step comprises:

[0071] (1.1) Use the t-SNE algorithm to model the distribution of each data point's neighbors, where neighbors are data points that lie close to one another. In the original high-dimensional space, the present invention models this distribution as a Gaussian distribution, while in the tw...
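Step (1.1) and the model-training steps listed in the abstract can be illustrated with a minimal NumPy sketch of standard t-SNE. This is not the patent's implementation. Assumptions: a single fixed Gaussian bandwidth `sigma` for every point (standard t-SNE instead tunes a per-point bandwidth to a target perplexity); a Student-t kernel with one degree of freedom in the low-dimensional space, as in standard t-SNE, because the description above is cut off before it specifies the low-dimensional model; and plain gradient descent on the Kullback-Leibler loss, without momentum or early exaggeration. All function names are hypothetical.

```python
import numpy as np


def high_dim_affinities(X, sigma=1.0):
    """Conditional probabilities p_{j|i} from a Gaussian centred on each x_i.

    Assumption: one fixed bandwidth `sigma` for all points (standard t-SNE
    tunes a per-point sigma_i to a target perplexity instead).
    """
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # ||x_i - x_j||^2
    logits = -sq / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)              # a point is not its own neighbour
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    p_cond = np.exp(logits)
    return p_cond / p_cond.sum(axis=1, keepdims=True)  # each row sums to 1


def joint_p(p_cond):
    """Symmetrised joint probabilities p_ij = (p_{j|i} + p_{i|j}) / 2n."""
    return (p_cond + p_cond.T) / (2.0 * p_cond.shape[0])


def low_dim_affinities(Y):
    """Joint q_ij from a Student-t kernel (1 degree of freedom), as in standard t-SNE.

    Returns (Q, kernel) where kernel[i, j] = (1 + ||y_i - y_j||^2)^-1 off the diagonal.
    """
    sq = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    kernel = 1.0 / (1.0 + sq)
    np.fill_diagonal(kernel, 0.0)
    return kernel / kernel.sum(), kernel


def kl_gradient(P, Y):
    """Gradient of KL(P || Q) w.r.t. Y: 4 * sum_j (p_ij - q_ij)(y_i - y_j)(1 + ||y_i - y_j||^2)^-1."""
    Q, kernel = low_dim_affinities(Y)
    W = (P - Q) * kernel
    diff = Y[:, None, :] - Y[None, :, :]           # diff[i, j] = y_i - y_j
    return 4.0 * np.sum(W[:, :, None] * diff, axis=1)


def tsne_sketch(X, dim=2, iters=300, lr=1.0, sigma=1.0, seed=0):
    """Embed X in `dim` dimensions by plain gradient descent on KL(P || Q)."""
    rng = np.random.default_rng(seed)
    P = joint_p(high_dim_affinities(X, sigma))
    Y = rng.normal(scale=1e-2, size=(X.shape[0], dim))  # small random start
    for _ in range(iters):
        Y = Y - lr * kl_gradient(P, Y)             # no momentum or early exaggeration
    return Y
```

In the method described here, the two-dimensional output of a routine like `tsne_sketch` would be plotted, and the number of visibly separate groups would then serve as k for the K-means step.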


Abstract

The present invention discloses a malicious code family clustering method and system. The method uses the t-SNE algorithm to reduce the dimensionality of, and visualize, the original malicious code execution sequences, specifically: modeling the distribution of each data point's neighbors with t-SNE, where neighbors are data points close to one another; constructing a model that maps the data points to corresponding probability distributions through a nonlinear transformation; and training the constructed model, computing the conditional probabilities in the low-dimensional space in order to compute the gradient of the loss function. The method then uses the K-means algorithm to cluster the malicious code families, specifically: determining the number of clusters K and the cluster centers; assigning every object to a cluster according to its distance from the cluster centers; recomputing the cluster centers; and checking whether the termination condition is satisfied. The system includes a dimensionality reduction and visualization module and a clustering module. The invention not only eases the difficulty of choosing k in the K-means algorithm but also improves the accuracy of malicious code family clustering.
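The K-means steps in the abstract (fix K and the initial centers, assign each object by distance, recompute the centers, check a stopping condition) correspond to Lloyd's algorithm, sketched below in NumPy. The random initialization and the stopping rule (total center shift below `tol`, or `max_iter` iterations) are assumptions, since the abstract does not specify them; `kmeans` is a hypothetical name.

```python
import numpy as np


def kmeans(X, k, max_iter=100, tol=1e-6, seed=0):
    """Lloyd-style K-means following the four steps listed in the abstract.

    Assumptions (not given in the abstract): initial centers are k distinct data
    points drawn at random, and the stopping condition is "total center shift
    below tol, or max_iter iterations reached".
    """
    rng = np.random.default_rng(seed)
    # Step 1: fix the number of clusters K and pick initial cluster centers.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        # Step 2: assign each object to the cluster whose center is nearest (Euclidean).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each center as the mean of its members;
        # an empty cluster keeps its previous center.
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        # Step 4: check the stopping condition.
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    # Final assignment against the last centers.
    labels = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1).argmin(axis=1)
    return labels, centers
```

For example, on the four points (0, 0), (0, 1), (10, 10), (10, 11), `kmeans(X, 2)` puts each nearby pair in its own cluster. In the described method, `k` would come from inspecting the t-SNE plot rather than from trial runs over a wide range of values.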

Description

Technical field

[0001] The invention belongs to the technical field of malicious code analysis and relates to a malicious code family clustering method and system.

Background

[0002] The K-means algorithm is one of the classic clustering algorithms. Clustering with K-means requires presetting the number of clusters, k. In practice, however, data sets are often large, and when the data distribution is complex it is difficult to determine the number of clusters in advance. When the preset number of clusters differs greatly from the actual number, clustering quality drops sharply: when k is much smaller than the actual number of clusters, data points from different classes are grouped into the same cluster, so the clusters are poorly separated; when k is much larger than the actual number of ...
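The effect of a mismatched k described in the background can be checked on synthetic data. The sketch below builds three well-separated Gaussian groups standing in for three families, clusters them with k = 1, 3 and 6, and counts clusters that mix several true groups (`merged`) and true groups spread over several clusters (`split`). The data, the deterministic farthest-point initialization, and the helper names `kmeans` and `mixing_report` are illustrative assumptions, not part of the patent.

```python
import numpy as np


def kmeans(X, k, iters=50):
    """Lloyd iterations with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        # Next initial center: the point farthest from all centers chosen so far.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=-1).argmin(axis=1)
        centers = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                            for c in range(k)])
    return np.linalg.norm(X[:, None] - centers[None], axis=-1).argmin(axis=1)


def mixing_report(true, pred):
    """(clusters mixing several true groups, true groups spread over several clusters)."""
    merged = int(sum(len(set(true[pred == c])) > 1 for c in set(pred)))
    split = int(sum(len(set(pred[true == t])) > 1 for t in set(true)))
    return merged, split


# Three synthetic groups, far apart relative to their unit spread.
rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0], [20.0, 0.0], [0.0, 20.0]])
X = np.vstack([m + rng.normal(size=(30, 2)) for m in means])
true = np.repeat([0, 1, 2], 30)

for k in (1, 3, 6):
    print(f"k={k}: merged, split = {mixing_report(true, kmeans(X, k))}")
```

With groups this far apart, k = 3 should recover them exactly, k = 1 merges all three into one cluster, and k = 6 spreads at least one group across several clusters.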


Application Information

Patent type & authority: Patents (China)
IPC(8): G06K9/62; G06F21/56
CPC: G06F21/563; G06F18/23213
Inventors: 杨航锋, 李树栋, 吴晓波, 韩伟红, 范美华, 付潇鹏, 方滨兴, 田志宏, 殷丽华, 顾钊铨, 李默涵, 仇晶, 唐可可
Owner: GUANGZHOU UNIVERSITY