Clustering method and device of portable execute (PE) files

A PE file clustering technology, applied in the field of network communication, that addresses the failure to cluster variant PE files, the large differences between PE files, and high computational complexity. It improves resistance to variant virus PE files and early-warning capability, reduces storage cost, and improves matching efficiency.

Inactive Publication Date: 2014-03-26
TENCENT TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

[0005] The first, traditional PE file clustering method must align the extracted features when comparing two PE files. Because PE files differ greatly, alignment is very time-consuming, and multiple features must be aligned and compared, so the computational complexity is very high. In addition, when new data is clustered incrementally, the original data must be re-clustered at the same time, so the cost of data storage and processing is high. The second method, PE file clustering based on fuzzy hashing, depends on how the PE file is segmented: the starting position of the segmentation and the segment size both affect the file's hash value, so stability and comparability are poor. It also does not examine the internal information of the PE file. Many virus PE files generate variants by modifying their own structure, for example by adding or deleting bits, and as a result their fuzzy hash values are completely different and the files cannot be clustered together.
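The sensitivity to segmentation described above can be illustrated with a deliberately simplified sketch. It assumes a naive scheme that hashes fixed-size segments independently; this is not the fuzzy hashing algorithm of any particular tool, and the function name `fixed_segment_hashes`, the 64-byte segment size, and the sample data are all hypothetical.

```python
import hashlib


def fixed_segment_hashes(data: bytes, segment_size: int = 64) -> list[str]:
    """Hash each fixed-size segment independently (illustrative scheme only)."""
    return [
        hashlib.sha256(data[i:i + segment_size]).hexdigest()
        for i in range(0, len(data), segment_size)
    ]


# Stand-in for file contents; a real PE file would be read from disk.
original = bytes(range(256)) * 4
# A "variant" that differs only by one byte added at the front.
variant = b"\x90" + original

orig_hashes = fixed_segment_hashes(original)
variant_hashes = fixed_segment_hashes(variant)

# Segment hashes that appear in both files.
shared = set(orig_hashes) & set(variant_hashes)
print(f"{len(shared)} segment hashes shared between original and variant")
```

In this sketch the single added byte shifts every segment boundary, so the two files share no segment hashes even though their contents are almost identical.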

Examples

Embodiment 1

[0045] Referring to Figure 1, an embodiment of the present invention provides a method for clustering PE files. The method comprises:

[0046] 101. Extract the features of the portable executable (PE) file;

[0047] 102. Generate a PE file identifier corresponding to the PE file according to the features of the PE file;

[0048] 103. Cluster the PE files according to the PE file identifiers.
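Steps 101 to 103 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented procedure: the visible text does not say how the identifier is computed, so the sketch assumes it is a SHA-256 hash of the sorted feature set, and it assumes that clustering means grouping files whose identifiers are equal. Feature extraction (step 101) is stubbed with hand-written feature lists, and the names `make_identifier` and `cluster_by_identifier` are hypothetical.

```python
import hashlib
from collections import defaultdict


def make_identifier(features):
    """Step 102 (assumed form): hash the sorted, de-duplicated feature set."""
    canonical = "\n".join(sorted(set(features)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def cluster_by_identifier(files):
    """Step 103 (assumed form): group file names that share an identifier."""
    clusters = defaultdict(list)
    for name, features in files.items():
        clusters[make_identifier(features)].append(name)
    return dict(clusters)


# Step 101 is stubbed: each file maps to features assumed to be already extracted.
samples = {
    "a.exe": ["CreateFileA", "WriteFile", "push;mov;call"],
    "b.exe": ["WriteFile", "push;mov;call", "CreateFileA"],
    "c.exe": ["RegOpenKeyA"],
}

clusters = cluster_by_identifier(samples)
for identifier, names in clusters.items():
    print(identifier[:12], names)
```

Because the identifier in this sketch depends only on the set of features, `a.exe` and `b.exe` fall into the same cluster even though their features are listed in a different order.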

[0049] Specifically, after the features of the PE file are extracted, the method further includes:

[0050] Forming the extracted features of the PE file into a PE file feature set, where the PE file feature set includes at least one feature;

[0051] Correspondingly, generating a PE file identifier corresponding to the PE file according to the features of the PE file includes:

[0052] Generating a PE file identifier corresponding to the PE file according to the PE file feature set.

[0053] Specifically, according to the characteristics of the PE file, a P...

Embodiment 2

[0063] Referring to Figure 2, an embodiment of the present invention provides a method for clustering PE files. The method comprises:

[0064] 201. Extract the features of the portable executable (PE) file;

[0065] Specifically, PE is a file format that is widely used on Windows, and most executable virus files are in the PE file format;

[0066] A set of features is extracted from the PE file. The features can be the instruction sequence, import function names, export function names, visible strings, and so on; other features of the PE file can also be extracted. This embodiment of the present invention does not limit the number of features extracted. For example, if the features to be extracted are the instruction sequence, the import function names, and the export function names, but the PE file contains only the instruction sequence and import function names and has no export function names, then only the instruction sequence and the import function names need to be extracted.
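The idea of extracting only the features that a given file actually contains can be sketched as below. The sketch covers visible strings only, using a regular expression over printable ASCII in the manner of the Unix `strings` utility; the minimum length of 4, the helper names `extract_visible_strings` and `build_feature_set`, and the empty export extractor are assumptions for illustration. Instruction sequences and import or export names would need a real PE parser, which is outside this sketch.

```python
import re


def extract_visible_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII of at least min_len characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


def build_feature_set(data: bytes, extractors: dict) -> dict:
    """Run every extractor and keep only the feature types that are present."""
    feature_set = {}
    for name, extractor in extractors.items():
        values = extractor(data)
        if values:  # skip feature types the file does not have
            feature_set[name] = values
    return feature_set


# Hypothetical byte blob standing in for file contents.
blob = b"\x00\x01kernel32.dll\x00\xffCreateFileA\x00ab\x00"

extractors = {
    "visible_strings": extract_visible_strings,
    # Placeholder extractor: pretend this file has no export function names.
    "export_names": lambda data: [],
}

features = build_feature_set(blob, extractors)
print(features)
```

The resulting feature set contains a `visible_strings` entry and no `export_names` entry, which mirrors the case in the text where a missing feature type is simply left out.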

[0067] 202. For...

Embodiment 3

[0081] Referring to Figure 3, an embodiment of the present invention provides a portable executable file clustering device. The device includes:

[0082] An extraction module 301, configured to extract the features of the portable executable (PE) file;

[0083] A generating module 302, configured to generate a PE file identifier corresponding to the PE file according to the features of the PE file;

[0084] A clustering module 303, configured to cluster the PE files according to the PE file identifiers.

[0085] Specifically, the extraction module 301 is configured to, after extracting the features of the PE file, form the extracted features into a PE file feature set, where the PE file feature set includes at least one feature;

[0086] Correspondingly, the generating module 302 is configured to generate a PE file identifier corresponding to the PE file according to the PE file feature set.

[0087] Specifically, the generation module 302 inclu...

Abstract

The invention discloses a method and device for clustering portable executable (PE) files, and belongs to the field of network communication. The method comprises: extracting features of the PE files; generating PE file identifiers corresponding to the PE files according to those features; and clustering the PE files according to the PE file identifiers. The device comprises an extraction module, a generation module, and a clustering module. By generating identifiers from the extracted features and clustering on those identifiers, the method and device reduce the number of PE files held at the virus analysis end and on the virus scanning and removal server, group irregular PE files into regular categories, reduce storage cost, improve matching efficiency, and improve both resistance to variant virus PE files and early-warning capability.

Description

Technical field

[0001] The present invention relates to the field of network communication, and in particular to a method and device for clustering portable executable files.

Background

[0002] With the development of the Internet, information has grown explosively, and the cycle of malicious programs such as computer viruses, worms, and Trojan horses is becoming shorter and shorter. Every day a large number of viruses endanger users' security. Most virus files are in the PE (Portable Executable) file format, and although these virus PE files are numerous, many of them share similar characteristics, so clustering them is conducive to virus analysis and removal.

[0003] At present, PE file clustering methods fall mainly into two types. The first is the traditional PE file clustering method, such as k-means clustering or hierarchical clustering, which first extracts some features of the PE files and then, according to the extracted features, Comp...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F21/56
CPC: G06F21/566, G06F21/56, G06F16/1727, G06F16/122
Inventor: 杨宜于涛白子潘崔精兵吴家旭
Owner: TENCENT TECH (SHENZHEN) CO LTD