Method, system and storage medium for APT organization identification based on stacking integration
An organization identification and algorithm technology, applied in the field of network security, that addresses problems such as the inability to handle a large number of samples, difficult features, and large influence, and that improves the efficiency, accuracy, and effectiveness of automatic identification.
Embodiment
[0058] As shown in Figure 1, the APT organization identification method based on stacking integration in this embodiment includes the following steps:
[0059] S1: Use the TF-IDF algorithm combined with n-grams to extract behavioral features from malware samples and vectorize them, forming the malicious behavior vector feature set; the n-gram range can be chosen according to the actual data, and n-gram=(1,5) is recommended here;
[0060] For the behavior text features of the malicious samples, the term frequency (TF) of each term is counted separately, and a weight parameter, the inverse document frequency (IDF), is then attached to it.
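The n-gram part of step S1 can be illustrated with a minimal sketch. It assumes that each sample's behavior is available as an ordered list of tokens (for example API-call names) and that n-gram=(1,5) means all contiguous runs of 1 to 5 tokens; the function name `extract_ngrams` and the example trace are hypothetical and do not come from the patent.

```python
from typing import List


def extract_ngrams(tokens: List[str], n_min: int = 1, n_max: int = 5) -> List[str]:
    """Return every contiguous n-gram of length n_min..n_max, joined by spaces."""
    grams = []
    for n in range(n_min, n_max + 1):
        # A sequence of length L has L - n + 1 n-grams (none if n > L).
        for start in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[start:start + n]))
    return grams


# Hypothetical behavior trace; the token names are illustrative only.
trace = ["CreateFile", "WriteFile", "RegSetValue"]
terms = extract_ngrams(trace, 1, 5)
```

Each resulting string is then treated as one term when computing TF-IDF. Libraries such as scikit-learn expose the same idea through the `ngram_range` parameter of `TfidfVectorizer`; whether the embodiment uses that library is not stated in this excerpt.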
[0061] $$TF_{i,j} = \frac{n_{i,j}}{\sum_{k} n_{k,j}}$$
[0062] where $TF_{i,j}$ is the frequency of term i in sample j; $n_{i,j}$ is the number of times term i appears in sample j; and $\sum_{k} n_{k,j}$ is the total number of terms appearing in sample j.
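The term-frequency definition above, the count of term i in sample j divided by the total number of terms in sample j, can be sketched as follows. The function name `term_frequency` and the example terms are hypothetical; the sketch assumes a sample has already been reduced to a list of terms.

```python
from collections import Counter
from typing import Dict, List


def term_frequency(terms: List[str]) -> Dict[str, float]:
    """Compute n_{i,j} / sum_k n_{k,j} for every term i of one sample j."""
    counts = Counter(terms)            # n_{i,j} for each distinct term
    total = sum(counts.values())       # sum_k n_{k,j}
    if total == 0:
        return {}                      # an empty sample has no term frequencies
    return {term: count / total for term, count in counts.items()}


# Illustrative sample with four terms, one of them repeated.
sample = ["open", "write", "open", "close"]
tf = term_frequency(sample)
```

Here `open` appears twice among four terms, so its frequency is 0.5, while `write` and `close` each get 0.25.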
[0063] Then calculate the weight:
[0064] $$IDF_{i} = \log \frac{|D|}{|\{j : i \in d_{j}\}|}$$
[0065] where $|D|$ is the total number of samples and $|\{j : i \in d_{j}\}|$ is the number of samples that contain term i...
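The weight described above can be sketched as the standard inverse document frequency: the logarithm of the total number of samples divided by the number of samples containing the term. This is an assumption based on the symbol definitions, since the excerpt is truncated; the natural logarithm, the absence of smoothing, and the name `inverse_document_frequency` are likewise assumptions for illustration.

```python
import math
from typing import Dict, List


def inverse_document_frequency(samples: List[List[str]]) -> Dict[str, float]:
    """Compute log(|D| / |{j : i in d_j}|) for every term i (natural log assumed)."""
    total = len(samples)                       # |D|
    doc_count: Dict[str, int] = {}
    for sample in samples:
        for term in set(sample):               # count each sample at most once per term
            doc_count[term] = doc_count.get(term, 0) + 1
    return {term: math.log(total / df) for term, df in doc_count.items()}


# Three illustrative samples; "open" occurs in all of them.
samples = [["open", "write"], ["open", "connect"], ["open", "write", "send"]]
idf = inverse_document_frequency(samples)
```

A term present in every sample, such as `open` here, receives a weight of zero, while rarer terms such as `connect` are weighted more heavily. In the usual TF-IDF scheme this weight is multiplied by the term frequency; the exact combination used by the embodiment is not visible in this excerpt.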


