
Method and system for detecting malicious PDF documents

A technology for detecting malicious documents, applied in the field of computer communication detection, that addresses problems such as the difficulty of training an accurate detection model.

Pending Publication Date: 2020-01-21
GUANGDONG UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0005] Existing deep learning methods and systems for detecting malicious PDFs rely too heavily on existing features, cope poorly with the threat of new malicious PDF files, and, because new malicious PDF samples are few, struggle to train a detection model with high accuracy. To overcome these drawbacks, the present invention proposes a new method and system for detecting malicious PDF documents. When the number of malicious PDF samples is small, PDF samples can be flexibly generated to supplement the data set and ensure the detection effect.



Examples


Embodiment 1

[0042] As shown in Figure 1, a flow chart of a method for detecting malicious PDF documents, the method comprises:

[0043] S1. Obtain a certain number of malicious PDF file samples and normal PDF file samples to form a malicious PDF file sample data set and a normal PDF file sample data set, respectively. The number of normal PDF file samples ranges from 10,000 to 50,000; with enough samples, each model subsequently built from convolutional neural networks can fully learn the features, and either endpoint value may be used. In this embodiment, the number of normal PDF file samples is 50,000, and the number of malicious PDF file samples is 1/4 of the number of normal PDF file samples.

[0044] S2. The normal PDF file sample data set is converted into first grayscale images from its binary content; the target-type malicious PDF file sample data set is converted into second grayscale images from its binary content;

[0045] The conversion process described is:

[0046]...
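The conversion details are elided in this record. As an illustration only, a minimal sketch of one common way to turn a file's binary content into a grayscale image is shown here: each byte (0 to 255) becomes one pixel intensity, and the last row is zero-padded. The function name `bytes_to_grayscale`, the fixed `width` parameter, and the zero-padding rule are assumptions made for this sketch and are not taken from the patent.

```python
import math
from typing import List


def bytes_to_grayscale(data: bytes, width: int = 256) -> List[List[int]]:
    """Map each byte (0-255) to one pixel intensity, filling rows left to right.

    Hypothetical helper: the patent's own conversion rule is not shown in
    this record, so the row width and zero-padding are assumptions.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    # At least one row, even for an empty file.
    height = max(1, math.ceil(len(data) / width))
    # Zero-pad so the final row has exactly `width` pixels.
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


# A PDF header used as a tiny stand-in for a real file's bytes.
sample = b"%PDF-1.7\n"
image = bytes_to_grayscale(sample, width=4)
```

In practice the bytes would come from reading a sample file in binary mode, and the resulting rows would be resized to whatever fixed input shape the convolutional networks expect.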

Example

[0063] Practical examples of the present invention are as follows:

[0064] S10: Obtain a certain number of target-type malicious PDF file samples and a sufficient number of normal PDF file samples, to serve as the input of the generative adversarial network used for data augmentation and as the input for training the convolutional neural network malicious file detector;

[0065] Traditional deep learning requires a large number of malicious PDF file samples and a sufficient number of normal PDF file samples, and the numbers of positive and negative samples must be balanced to achieve good results when training the initial classifier model. In practice, however, it is difficult to obtain enough malicious PDF file samples, and when new types of malicious PDF files appear, the number of new malicious samples is insufficient.
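To make the imbalance concrete, the following sketch computes how many generated malicious samples would be needed to match the normal class, using the figures from Embodiment 1 (50,000 normal samples and one quarter as many malicious samples). The helper `synthetic_samples_needed` is hypothetical, and the 1:1 target is an assumption for illustration; the text only says the classes should be balanced, not what ratio the generator is asked to reach.

```python
def synthetic_samples_needed(n_normal: int, n_malicious: int) -> int:
    """Number of generated malicious samples required for a 1:1 class ratio.

    Assumption: exact balance is the goal; returns 0 if the malicious
    class is already at least as large as the normal class.
    """
    return max(0, n_normal - n_malicious)


n_normal = 50_000             # normal samples in Embodiment 1
n_malicious = n_normal // 4   # malicious samples are 1/4 of the normal count
gap = synthetic_samples_needed(n_normal, n_malicious)
```

Under these assumptions the generator would have to supply three generated grayscale images for every real malicious sample.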


Abstract

The invention discloses a method for detecting malicious PDF (Portable Document Format) documents, which comprises the following steps: converting an acquired malicious PDF document sample data set and a normal PDF document sample data set into a first grayscale image and a second grayscale image; constructing a deep convolutional generative adversarial network for increasing the number of the first grayscale images by using a first convolutional neural network; inputting the second grayscale image and random noise into the generative adversarial network to generate a malicious PDF file grayscale image; constructing an initial classifier model of the malicious PDF file by using a second convolutional neural network; training the initial classifier model to obtain a core malicious PDF file classifier; and detecting the PDF file to be detected through the core malicious PDF file classifier. The invention further discloses a system for detecting malicious PDF documents. The system extracts image features automatically, which overcomes the prior art's excessive dependence on existing features and the difficulty of training a detection model with high accuracy when the number of novel malicious PDF samples is small.

Description

Technical field

[0001] The invention relates to the technical field of computer communication detection, and more particularly, to a method and system for detecting malicious PDF documents.

Background

[0002] With the development of information technology, information is carried in a growing variety of formats. The PDF file, an electronic file format, is easy to use, small in size and stable, and has been widely used in the daily work of various industries. PDF files have their own public standard specification, and there are dedicated PDF readers. However, while PDF files are widely used in information storage and dissemination, they have also attracted the attention of many attackers. At present, a large number of PDF documents carrying malicious code circulate on the Internet. Users with weak security awareness are easily attacked when they download and open these malicious PDF files, and PDF files with malicious codes are often del...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/00, G06K9/62, G06N3/04, G06N3/08
CPC: G06N3/08, G06V30/40, G06N3/045, G06F18/24
Inventor: 凌捷, 熊夙, 陈家辉
Owner: GUANGDONG UNIV OF TECH