
Malicious PDF document detection method based on TF-IDF algorithm and SVDD algorithm

A detection method based on the TF-IDF and SVDD algorithms, applied to malicious PDF document detection. It addresses the problem that PDF documents are easily infected by JavaScript code, and it enables efficient analysis and detection so that serious harm can be avoided.

Active Publication Date: 2018-04-20
GUIZHOU AEROSPACE INST OF MEASURING & TESTING TECH

AI Technical Summary

Problems solved by technology

[0005] The technical problem solved by the present invention: to provide a malicious PDF document detection method based on the TF-IDF algorithm and the SVDD algorithm, addressing the problem that PDF documents are easily infected by JavaScript code because no dedicated malicious PDF document detection method currently exists.



Embodiment Construction

[0054] A malicious PDF document detection method based on TF-IDF algorithm and SVDD algorithm, comprising the following steps:

[0055] Step 1: Collect a certain number of malicious PDF documents and normal PDF documents as sample sets;

[0056] Step 2: Generate a detection model that can detect malicious PDF documents;

[0057] Step 3: Test the PDF document under examination: take it as input and use the discriminant function of the detection model to judge whether it is a malicious PDF document.
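
The role of a discriminant function in Step 3 can be illustrated with a minimal Python sketch. This is not the patent's SVDD model: real SVDD finds the sphere's centre and radius by solving a constrained optimisation problem, whereas this simplified stand-in uses the mean vector as the centre and a distance quantile as the radius. The names `fit_sphere`, `discriminant`, and `keep_fraction` are hypothetical, and the two-dimensional vectors stand in for TF-IDF feature vectors.

```python
import math


def fit_sphere(vectors, keep_fraction=0.9):
    """Fit a centre and radius around the training vectors.

    Simplification (assumption): centre = mean vector, radius = the
    distance that encloses `keep_fraction` of the training points.
    Real SVDD obtains both by solving an optimisation problem.
    """
    dim = len(vectors[0])
    centre = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    distances = sorted(math.dist(v, centre) for v in vectors)
    index = max(0, math.ceil(keep_fraction * len(distances)) - 1)
    return centre, distances[index]


def discriminant(x, centre, radius):
    """Return R^2 - ||x - centre||^2; non-negative means inside the sphere."""
    return radius ** 2 - math.dist(x, centre) ** 2


# Toy training vectors for the target class (illustrative values only).
training = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
centre, radius = fit_sphere(training, keep_fraction=1.0)

inside = discriminant([0.5, 0.6], centre, radius) >= 0
outside = discriminant([3.0, 3.0], centre, radius) >= 0
```

A document whose feature vector falls inside the sphere is assigned to the class the sphere was fitted on; one that falls outside is not. Which class the patent fits the sphere to, and which kernel it uses, is not stated in this excerpt.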

[0058] Generating the detection model for malicious PDF documents, as described in step 2, further comprises the following steps:

[0059] Step 2.1: Locate and extract the suspicious JavaScript code contained in the malicious PDF document in the sample set;

[0060] Step 2.2: Use the TF-IDF algorithm to generate malicious PDF document features, obtain a series of feature words, and count the TF-IDF values of the feature wor...
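
As a rough illustration of Step 2.2, the following Python sketch computes TF-IDF values for tokens in JavaScript snippets. It assumes the JavaScript has already been extracted from the PDF documents as plain strings (Step 2.1). The tokenizer `TOKEN_RE`, the function names, and the sample snippets are hypothetical, and the patent's exact weighting variant is not shown in this excerpt; the sketch uses the common form tf = count / document length, idf = log(N / document frequency).

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: keeps identifier-like words only.
TOKEN_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def tokenize(js_source):
    """Split a JavaScript snippet into identifier-like tokens."""
    return TOKEN_RE.findall(js_source)


def tf_idf(documents):
    """Return one {token: tf-idf value} dict per document."""
    token_lists = [tokenize(doc) for doc in documents]
    n_docs = len(token_lists)

    # Document frequency: in how many documents each token appears.
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero on empty input
        scores.append({
            tok: (cnt / total) * math.log(n_docs / doc_freq[tok])
            for tok, cnt in counts.items()
        })
    return scores


# Illustrative snippets, not taken from the patent.
samples = [
    "var s = unescape('%u9090'); eval(s);",
    "var total = a + b; app.alert(total);",
]
weights = tf_idf(samples)
```

Tokens that occur in every snippet, such as `var` here, get a weight of zero, while tokens concentrated in a few snippets, such as `eval` or `unescape`, get positive weights and are natural candidates for feature words.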


Abstract

The invention discloses a malicious PDF document detection method based on the TF-IDF algorithm and the SVDD algorithm. The method comprises the following steps: 1) collecting malicious PDF documents and normal PDF documents as sample sets; 2) generating a detection model capable of detecting malicious PDF documents; and 3) detecting the PDF documents to be tested. In step 1), the malicious and normal PDF documents are collected as the sample sets. In step 2), the suspicious JavaScript code contained in the malicious PDF documents in the sample sets is located and extracted, the TF-IDF algorithm is used to generate malicious PDF document features, and the SVDD algorithm is used to generate the malicious PDF document detection model and its discriminant function. In step 3), the detection model judges the PDF documents to be tested. The method can analyze and detect malicious PDF documents accurately and efficiently, and avoids the serious harm that malicious PDF documents pose to the property and privacy of individuals and companies.

Description

Technical field

[0001] The invention relates to the technical field of computer information security, in particular to a malicious PDF document detection method based on the TF-IDF algorithm and the SVDD algorithm.

Background technique

[0002] Portable Document Format (PDF) is an electronic file format designed by Adobe Systems to support cross-platform publishing and exchange of information over networks. PDF has several notable characteristics: it is a portable document format independent of the computer operating system, so differences between operating system environments do not affect normal editing and reading of documents; it supports embedded font information, highly compressed images and vector graphics; and it can contain hyperlinks, audio and dynamic multimedia content, giving it a high degree of integration. Because of these characteristics, PDF has become the standard format for saving document materials.

[0003] With the gradual popula...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F21/56
CPC: G06F21/563, G06F21/565
Inventor: 冯迪, 郑少波, 杨玉龙, 成建宏, 梁登辉, 陈泽瑞
Owner: GUIZHOU AEROSPACE INST OF MEASURING & TESTING TECH