Malicious PDF document intelligent detection method and system based on graph structure

The technology applies intelligent detection and graph structures in the field of information security. It addresses the low training efficiency of deep learning models, the failure of selected features to fully reflect document attributes, and the large dimensionality of the feature space. Its effects are improved accuracy and ease of use, reduced feature dimensions, and reduced training pressure.

Pending Publication Date: 2021-11-23
ARMY ENG UNIV OF PLA

AI Technical Summary

Problems solved by technology

[0004] The disadvantages of existing malicious PDF document detection methods include: relying on expert experience to select features, which cannot fully reflect document attributes; in the face of adversarial samples, t



Examples


Embodiment Construction

[0044] The present invention will be further described below in conjunction with the accompanying drawings. The following examples are only used to illustrate the technical solution of the present invention more clearly, but not to limit the protection scope of the present invention.

[0045] As shown in Figure 1, a graph-structure-based intelligent detection method for malicious PDF documents comprises the following steps.

[0046] A document is input and parsed, and its graph structure features are extracted. The TF-IDF algorithm is used to simplify the graph structure, and the Laplacian matrix of the simplified graph is then computed as the input feature. This feature is sent to a 2D-CNN model, either to train the model or to obtain the detection category.
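The Laplacian step in [0046] can be illustrated with a minimal sketch. It assumes an undirected, unweighted graph and the unnormalized form L = D - A; the visible text does not say which Laplacian variant the invention uses, and the node names and the helper `laplacian` are hypothetical.

```python
def laplacian(nodes, edges):
    """Return the unnormalized Laplacian L = D - A as a list of rows.

    nodes: ordered list of node labels (fixes the row/column order).
    edges: iterable of (u, v) pairs, treated as undirected (an assumption).
    """
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        i, j = index[u], index[v]
        if i != j:  # self-loops are ignored in this sketch
            adj[i][j] = 1
            adj[j][i] = 1
    lap = [[0] * n for _ in range(n)]
    for i in range(n):
        degree = sum(adj[i])
        for j in range(n):
            lap[i][j] = degree if i == j else -adj[i][j]
    return lap


# Hypothetical simplified graph with four PDF object types.
nodes = ["Catalog", "Pages", "Page", "JavaScript"]
edges = [("Catalog", "Pages"), ("Pages", "Page"), ("Catalog", "JavaScript")]
L = laplacian(nodes, edges)
```

Because the simplified graph has a fixed number of nodes, a matrix of this kind has a fixed size, which is what allows it to be fed to a 2D-CNN in the same way as a single-channel image.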

[0047] The graph structure feature is obtained as follows: based on a structural analysis of the PDF document, the structure paths of the document are extracted by following the reference relationships between objects, and the graph structure of the document is then constructed from the set of structure paths.
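As an illustration of building a graph from a set of structure paths, the sketch below assumes that each path is a slash-separated chain of PDF key names and that consecutive names in a path form an edge. The path format, the example paths, and the helper `graph_from_paths` are assumptions made for illustration, not details taken from the patent.

```python
def graph_from_paths(paths):
    """Build (nodes, edges) from structure paths such as '/Root/Pages/Kids'.

    Each distinct name becomes one node; each parent/child pair that is
    adjacent in some path becomes one directed edge.
    """
    nodes = {}  # dict keeps first-seen order and removes duplicates
    edges = set()
    for path in paths:
        parts = [p for p in path.split("/") if p]
        for name in parts:
            nodes.setdefault(name, None)
        for parent, child in zip(parts, parts[1:]):
            edges.add((parent, child))
    return list(nodes), sorted(edges)


# Hypothetical structure paths extracted from one document.
paths = [
    "/Root/Pages/Kids/Contents",
    "/Root/OpenAction/JS",
    "/Root/Pages/Count",
]
nodes, edges = graph_from_paths(paths)
```

Shared prefixes such as /Root/Pages collapse into the same nodes and edges, so the graph stays compact even when many paths pass through the same objects.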

[0048] The simplificati...


Abstract

The invention discloses a malicious PDF document intelligent detection method and system based on a graph structure. The method comprises: obtaining a PDF document to be detected; parsing the PDF document to obtain the graph structure feature of the document; ranking all nodes of the graph structure feature by their importance for classification; retaining and merging nodes according to the ranking to obtain a simplified graph structure feature; calculating the Laplacian matrix of the simplified graph structure feature as the input feature; and inputting the input feature into a pre-trained convolutional neural network model. If the output is 1, the document is determined to be malicious; if the output is 0, the document is determined to be benign. Graph structure extraction and graph simplification reduce the dimensionality of the features, relieve the training pressure on the deep learning model, and improve the efficiency of the system. Using the convolutional neural network model to classify documents from the input features, or to train the model parameters automatically, improves the accuracy and usability of the system.
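The ranking, retention, and merging step can be sketched as follows. This is only one plausible reading: it scores node labels with a smoothed TF-IDF, keeps the `keep` highest-scoring labels, and merges all remaining nodes into a single placeholder node. The scoring formula, the `keep` parameter, the placeholder label "OTHER", and both function names are assumptions; the visible text does not specify them.

```python
import math
from collections import Counter


def tfidf_scores(doc_nodes, corpus):
    """Score each node label of one document against a corpus.

    doc_nodes: list of node labels for one document (repeats allowed).
    corpus: list of such lists, one per document.
    """
    counts = Counter(doc_nodes)
    total = len(doc_nodes)
    scores = {}
    for name, count in counts.items():
        df = sum(1 for doc in corpus if name in doc)
        idf = math.log(len(corpus) / (1 + df)) + 1.0  # smoothed IDF (assumed form)
        scores[name] = (count / total) * idf
    return scores


def simplify(edges, scores, keep, merged_label="OTHER"):
    """Keep the top `keep` labels; merge every other node into merged_label."""
    ranked = sorted(scores, key=lambda n: (-scores[n], n))  # ties broken by name
    top = set(ranked[:keep])
    out = set()
    for u, v in edges:
        a = u if u in top else merged_label
        b = v if v in top else merged_label
        if a != b:  # edges that collapse inside the merged node are dropped
            out.add((a, b))
    return sorted(out)


# Hypothetical corpus of three documents, each given as its node labels.
corpus = [["Root", "Pages", "Page", "JS"], ["Root", "Pages", "Page"], ["Root", "Pages", "Font"]]
scores = tfidf_scores(corpus[0], corpus)
edges = [("Root", "Pages"), ("Pages", "Page"), ("Root", "JS")]
simplified = simplify(edges, scores, keep=2)
```

Labels that appear in every document, such as Root and Pages here, receive low scores and are merged, which shrinks the graph and therefore the Laplacian matrix passed to the network.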

Description

Technical field

[0001] The invention relates to a graph-structure-based intelligent detection method and system for malicious PDF documents, and belongs to the technical field of information security.

Background technique

[0002] Traditional malicious PDF document detection methods are mainly based on signature recognition and heuristic rule matching. Their advantage is a low false positive rate, but they are limited to detecting malicious samples already present in the virus database and respond slowly to unknown malicious documents. Attackers can bypass detection by forging new malicious documents.

[0003] Feature selection in existing machine-learning-based malicious document detection methods is mostly driven by expert knowledge: feature sets (such as the number of call-class objects, the number of document pages, or the version number) are selected from observations made during manual analysis of malicious documents, or through mathematical statisti...


Application Information

IPC(8): G06F21/56, G06F40/205, G06N3/04, G06N3/08
CPC: G06F21/562, G06F40/205, G06N3/04, G06N3/08
Inventor: 王金双, 俞远哲, 孙蒙, 邹霞
Owner: ARMY ENG UNIV OF PLA