PE (portable executable) file pack detection method based on static characteristics

A packing detection method based on static features, applied in the fields of instruments, electrical digital data processing, and platform integrity maintenance. It addresses the problems that existing detection rules rely on a single judgment index and have low accuracy, and it achieves high accuracy and good packing detection ability by using a richer set of file characteristics.

Active Publication Date: 2011-04-20
SICHUAN UNIV

AI-Extracted Technical Summary

Problems solved by technology

[0004] The main problem faced by the above method is: because it is not known in advance whether the executable file to be detected has been packed, all executable files to be detected have to be processed by a general-purpo...

Abstract

The invention discloses a PE (portable executable) file packing detection method based on static characteristics. Before a target PE file is actually unpacked, static analysis of the file's characteristics is used to detect whether it is packed. Only packed PE files need to be handed to a general unpacking tool, after which the unpacked code is subjected to virus detection by anti-virus software. Because unpacked PE files no longer have to be processed by the general unpacking tool, and because packing detection based on static characteristics is fast and has low false positive and false negative rates, the virus detection process is improved and processing time is saved.

Example Embodiment

[0034] When a universal unpacking tool is used for malware detection, it is not known in advance whether a PE file to be detected is packed, so every PE file has to be actually executed in an attempt to unpack it before it is scanned by the anti-virus software. This introduces a large computational and time overhead. To solve this problem, the present invention detects whether the target PE file is packed before it is actually executed for unpacking. Only PE files detected as packed are handed over to the general unpacking tool; PE files detected as unpacked are handed directly to the anti-virus software, without processing by the general unpacking tool.
[0035] Virus authors often modify existing packers to produce new ones, so traditional signature-based packer detection tools suffer from a high false negative rate. To address this problem, the present invention proposes a packing detection method based on static analysis of PE file characteristics. Unlike the Bintropy tool, the invention is not limited to analyzing the entropy of the PE file: a series of feature values is extracted from the PE file for packing detection. This method has a low false positive rate and a low false negative rate.
[0036] Furthermore, the PE file packing detection rules are derived from training data with machine learning algorithms rather than with statistical methods, which yields more accurate rules.
[0037] Figure 1 shows the application model of the present invention. For each PE file to be detected, the invention first performs a static analysis of the file, extracts a series of feature values, and then uses a PE file classifier to perform packing detection. A PE file detected as packed is unpacked with a general unpacking tool and then checked by signature-based anti-virus software; a PE file detected as unpacked skips the general unpacking tool and is checked directly by the signature-based anti-virus software. Since unpacked PE files no longer pass through the general unpacking tool, and static analysis of a PE file requires little computation and time, this improves the virus detection process and saves processing time.
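The routing described in paragraph [0037] can be sketched as follows. This is a minimal illustration rather than the patent's implementation: `scan_file` and its three callback parameters are hypothetical names standing in for the classifier, the general unpacking tool, and the signature-based scanner.

```python
from typing import Callable


def scan_file(
    path: str,
    is_packed: Callable[[str], bool],       # hypothetical: static-feature classifier
    unpack: Callable[[str], str],           # hypothetical: general unpacking tool
    signature_scan: Callable[[str], bool],  # hypothetical: anti-virus scanner
) -> bool:
    """Return True if the (possibly unpacked) file is flagged as a virus."""
    if is_packed(path):
        # Only files classified as packed pay the cost of generic unpacking.
        path = unpack(path)
    # Files classified as unpacked reach the scanner directly.
    return signature_scan(path)
```

The time saving comes from the branch: the expensive `unpack` step runs only for the subset of files the classifier flags.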
[0038] The invention is specifically described as follows:
[0039] (1) Introduction to PE file format
[0040] The PE file format is used in 32-bit and 64-bit Microsoft Windows operating systems. A PE file encapsulates the information required by the operating system loader, including export tables, import tables, resource data, and so on. A simplified structure of the PE file format:
[0041] PE Header
[0042] code section 1
[0043] The PE file header tells the operating system how to map the PE file into memory. Each code section and data section in the PE file is identified by a name and marked as readable, writable, or executable. Generally, a code section is marked readable/non-writable/executable, so that the operating system knows that the corresponding memory area contains executable code and that writes to it should be prohibited. A data section, on the other hand, is usually marked readable/writable/non-executable, so the program counter (PC) should never point into the memory area where the data section is located. Most PE files contain a code section named .text and a data section named .data. During execution, when a program needs to call an operating system API (Application Programming Interface), it looks up the address of the API in the Import Address Table (IAT) and then jumps to that address.
[0044] (2) PE file characteristic value:
[0045] We extract 9 feature values for packing detection from the PE file:
[0046] 1) Number of standard sections and non-standard sections:
[0047] Unpacked PE files usually contain well-defined standard sections. For example, a PE file compiled by the Microsoft Visual C++ compiler usually contains at least one code section named .text and two data sections named .data and .rsrc. The code and data sections of a packed PE file, on the other hand, usually do not follow these naming conventions. For example, a PE file created by the UPX packer usually contains two sections named UPX0 and UPX1, and one section named .rsrc. UPX0 and UPX1 are not standard section names, so they help distinguish packed from unpacked PE files. PE files produced by many other packers also contain non-standard section names. Therefore, the number of standard and non-standard section names in a PE file helps detect whether it is packed.
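As an illustration of these two features, the sketch below counts standard and non-standard section names. The set `STANDARD_SECTION_NAMES` is an assumption made for this example; the source names only .text, .data, and .rsrc as standard and does not give a complete list. The function name is likewise hypothetical.

```python
from typing import Iterable, Tuple

# Assumed list of conventional section names; the source does not enumerate them.
STANDARD_SECTION_NAMES = frozenset({
    ".text", ".data", ".rdata", ".rsrc", ".reloc",
    ".idata", ".edata", ".bss", ".tls", ".pdata",
})


def count_section_names(names: Iterable[str]) -> Tuple[int, int]:
    """Return (standard_count, non_standard_count) for the given section names."""
    standard = 0
    non_standard = 0
    for name in names:
        # Section names are stored NUL-padded to 8 bytes in the section table.
        if name.rstrip("\x00") in STANDARD_SECTION_NAMES:
            standard += 1
        else:
            non_standard += 1
    return standard, non_standard
```

For the UPX layout mentioned above, `count_section_names(["UPX0", "UPX1", ".rsrc"])` yields one standard and two non-standard names.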
[0048] 2) The number of sections with executable attributes:
[0049] When analyzing the output of packers, we noticed that some packed programs did not contain any section with the executable attribute. This is abnormal: if the operating system does not allow the PC to point into the memory area of a section without the executable attribute, the program will crash, and Windows XP Service Pack 2 introduced this kind of memory protection. On older versions of the Windows platform, however, a program that does not contain any executable section may still run. The .text section of an unpacked PE file, on the other hand, is always marked as executable. Therefore, the number of sections with the executable attribute helps detect whether a PE file is packed.
[0050] 3) Number of sections with readable/writable/executable attributes at the same time:
[0051] Assume that an encrypted program P is hidden inside a packed program P′. When P′ is executed, it first runs decryption instructions to decrypt P and then executes the decrypted P. To do this, the code of the decrypted program P must be written into a section that is also executable, so P′ needs to contain at least one section that is readable, writable, and executable at the same time. The executable section of an unpacked PE file (usually the .text section), on the other hand, does not necessarily have the writable attribute. Therefore, the number of sections that are simultaneously readable, writable, and executable helps detect whether a PE file is packed.
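Features 2) and 3) can both be derived from the Characteristics field of each section header. The sketch below assumes the section flags have already been read from the section table by some PE parser; the function name is hypothetical, while the three flag constants are the values defined by the PE/COFF format.

```python
from typing import Iterable, Tuple

# Section characteristic flags defined by the PE/COFF format.
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000
RWX = IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE | IMAGE_SCN_MEM_EXECUTE


def count_exec_and_rwx(characteristics: Iterable[int]) -> Tuple[int, int]:
    """Return (executable_sections, read_write_execute_sections)."""
    executable = 0
    rwx = 0
    for flags in characteristics:
        if flags & IMAGE_SCN_MEM_EXECUTE:
            executable += 1
        # A section counts as RWX only if all three bits are set together.
        if (flags & RWX) == RWX:
            rwx += 1
    return executable, rwx
```

A conventional layout such as `[0x60000020, 0xC0000040, 0x40000040]` (.text, .data, .rsrc) gives one executable section and no RWX section, whereas a packer that decrypts code in place needs at least one RWX section.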
[0052] 4) The number of entries in the IAT table:
[0053] The IAT in a PE file holds the in-memory addresses of the external functions that the program calls. These external functions come from dynamic-link libraries (DLLs). When the PE file is loaded, the operating system loader writes the memory address of each external function into the IAT. Every time the program calls an external function, it finds the function's address by looking it up in the IAT.
[0054] Most unpacked programs call many external functions, for example Windows API functions that read and write files, create windows, or manage network connections, so their IAT usually contains many entries. Packed programs, on the other hand, usually call very few external functions, mainly because the unpacking instructions do not need external functions to do their work: there is no need to create windows, manage network connections, and so on. As a result, the IAT of a packed PE file contains only a few entries, and the number of IAT entries helps detect whether a PE file is packed.
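Counting IAT entries reduces to summing the imported functions over all DLLs. The sketch below assumes a separate PE parser has already produced a mapping from DLL name to imported function names; both the mapping format and the function name are assumptions for illustration.

```python
from typing import Dict, List


def count_iat_entries(imports: Dict[str, List[str]]) -> int:
    """Count imported functions across all DLLs.

    `imports` maps a DLL name to the functions imported from it; it is
    assumed to have been produced by a separate PE parser.
    """
    return sum(len(functions) for functions in imports.values())
```

A packed stub that resolves everything at run time might import only `LoadLibraryA` and `GetProcAddress` from KERNEL32.DLL, giving a count of 2.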
[0055] 5) Entropy of PE file header, code section, data section, and PE file:
[0056] In the packed program P′, the code of the encrypted program P is usually stored in a code section or a data section (a section with the executable attribute is considered a code section; otherwise it is considered a data section). Because P is encrypted, its bytes look "random" and lack structure. Unencrypted code, on the other hand, is highly structured: for example, instructions consist of opcodes and operand memory addresses. The contents of an unencrypted data section are structured as well. Based on this observation, we calculate the byte entropy of the code sections and data sections of the PE file. If the entropy of a section is close to 8 bits (the maximum value of byte entropy), the section is likely to contain encrypted code.
[0057] Code sections and data sections are not the only places where encrypted code can be hidden. Some optional fields in the PE file header are not needed to load the PE file, so some packers may use these fields to hide encrypted code. For this reason, we also calculate the entropy of the PE file header. Because the PE format is complex and contains other unused space, encrypted code may be hidden in many other places. Therefore, we also calculate the entropy of the entire PE file.
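Byte entropy here is the Shannon entropy of the byte-value distribution, H = Σ p(b) · log2(1 / p(b)) over the 256 possible byte values, which ranges from 0 to 8 bits. The following is a minimal sketch of that calculation; the function name is an assumption, and the same function would be applied in turn to the header bytes, each section's bytes, and the whole file.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    # Sum p * log2(1/p) over the byte values that actually occur.
    return sum((c / n) * math.log2(n / c) for c in counts.values())
```

A buffer containing every byte value equally often reaches the maximum of 8 bits, while a buffer of identical bytes has entropy 0.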
[0058] Specific operation:
[0059] We collected 2598 packed virus PE files and 2231 unpacked benign PE files; in addition, we used free packers available on the Internet to generate 669 packed benign PE files. This gives a total of 5498 PE files for testing. Since PEiD is probably the most widely used signature-based packing detection tool for executables, we used it to check how many of the 3267 packed PE files it identifies as packed. The results show that PEiD detected only 2262 of them as packed and missed the remaining 1005. Of these 1005 PE files, 604 are packed virus files and 401 are benign PE files that we packed ourselves. The false negative rate of the PEiD tool is therefore 30.8%.
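The figures in paragraph [0059] can be checked with a few lines of arithmetic. The variable names are chosen for this example; all numbers come from the paragraph.

```python
packed_virus = 2598     # packed virus PE files
unpacked_benign = 2231  # unpacked benign PE files
packed_benign = 669     # benign PE files packed with free packers

total_files = packed_virus + unpacked_benign + packed_benign  # 5498
total_packed = packed_virus + packed_benign                   # 3267

detected_by_peid = 2262
missed_by_peid = total_packed - detected_by_peid              # 1005

# False negative rate: packed files that PEiD failed to flag.
false_negative_rate = missed_by_peid / total_packed
print(f"{false_negative_rate:.1%}")  # prints 30.8%
```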
[0060] We developed a PE file analysis tool that extracts the 9 feature values of any PE file. The 9 feature values are summarized in Table 1.
[0061] Table 1 Summary of the 9 feature values of PE files
[0062]
[0063]
[0064] We used this tool to extract the feature values of the 5498 PE files used in the experiment and thus obtained a data set. We divided the data set into two parts: 1) the training set, which contains the feature values of the 2231 unpacked benign PE files and of the 2262 packed PE files detected by PEiD; 2) the test set, which contains the feature values of the 1005 packed PE files that PEiD failed to detect.
[0065] We conducted our experiments with Weka, a free and open-source machine learning toolkit, and selected 4 different classifiers:
[0066] (a) a Bayesian classifier;
[0067] (b) the J48 decision tree classifier, Weka's implementation of the C4.5 decision tree algorithm;
[0068] (c) the IBk classifier, Weka's implementation of the K-Nearest Neighbor (KNN) algorithm;
[0069] (d) a Multi-Layer Perceptron (MLP) classifier.
[0070] We first trained each classifier on the training set, then tested it on the test set and calculated its accuracy there. Because the test set consists of packed files that PEiD missed, this accuracy measures how well each classifier generalizes in PE file packing detection compared with the PEiD tool.
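The train-then-test procedure can be illustrated with a small k-nearest-neighbor classifier written in plain Python. This is a stand-in for illustration only and is not Weka's IBk implementation; the function names and the feature layout are assumptions, and any data passed to it in this context would be synthetic rather than taken from the experiment.

```python
import math
from typing import List, Sequence, Tuple

# (feature vector, label), where label 1 means packed and 0 means unpacked.
Sample = Tuple[Sequence[float], int]


def knn_predict(train: List[Sample], x: Sequence[float], k: int = 3) -> int:
    """Majority vote among the k training samples closest to x."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > len(nearest) else 0


def accuracy(train: List[Sample], test: List[Sample], k: int = 3) -> float:
    """Fraction of test samples whose predicted label matches the true label."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)
```

When every test sample is packed, as in the test set described above, `accuracy` equals the fraction of missed-by-PEiD files that the classifier still recognizes as packed. In practice the features would normally be rescaled first, since raw counts such as the number of IAT entries would otherwise dominate the Euclidean distance.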
[0071] Table 2 shows the test results of the four classifiers.
[0072] Table 2 Test results of four classifiers
[0073] Classifier
[0074] Result analysis:
[0075] The test results show that, of the 1005 packed PE files in the test set that PEiD failed to detect, every classifier correctly identified more than 95% as packed. The MLP classifier achieved the highest detection rate, 98.91%.
[0076] On a 2 GHz dual-core AMD Opteron processor, extracting the 9 feature values takes about 2.82 seconds per PE file on average. We believe this time can be reduced through optimization.