Malicious PDF document detection method based on active learning
An active-learning-based detection method, applied in machine learning, program/content distribution protection, instruments, and related fields, that achieves good detection results, improved model performance, and high processing efficiency.
Embodiment
[0068] This embodiment uses benign and malicious samples downloaded from the Contagio data repository as the original data set (9000 benign samples and 9000 malicious samples) to train the model and to evaluate its performance after training. The specific operation is as follows:
[0069] First, feature extraction is performed on PDF documents.
[0070] Feature extraction uses the poppler tool, and the implementation of the extractor mainly consists of the cpp files shown in Figure 8. In this implementation, 80% of the samples are first taken as the training set and 20% as the test set. The extractor extracts the features of each input PDF document and counts the number of occurrences of each feature. Structural paths that occur more than 300 times in the training set are then used as features, that is, the occurrence threshold is set to 300, and pdf (features are numeri...
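The split-and-threshold step described in [0070] can be sketched as follows. This is a minimal illustration and not the patent's poppler-based extractor. It assumes that each document has already been reduced to a list of structural-path strings and that "occurrences" means the total count across all training documents. The function names `split_samples`, `select_features`, and `vectorize`, and the example paths, are hypothetical.

```python
import random
from collections import Counter


def split_samples(samples, train_fraction=0.8, seed=0):
    """Shuffle the samples and split them into (train, test) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def select_features(train_docs, threshold=300):
    """Keep structural paths whose total training-set count exceeds the threshold.

    Assumption: counts are summed over all training documents; the source
    does not say whether it counts occurrences or documents.
    """
    totals = Counter()
    for paths in train_docs:
        totals.update(paths)
    return sorted(path for path, count in totals.items() if count > threshold)


def vectorize(paths, features):
    """Turn one document's paths into a vector of per-feature occurrence counts."""
    counts = Counter(paths)
    return [counts[feature] for feature in features]


if __name__ == "__main__":
    # Hypothetical toy data; real paths would come from the extractor.
    docs = [
        ["/Root/Pages", "/Root/Pages", "/Root/OpenAction/JS"],
        ["/Root/Pages"],
        ["/Root/Metadata"],
    ]
    features = select_features(docs, threshold=2)
    print(features)
    print([vectorize(doc, features) for doc in docs])
```

Selecting features from the training set only, as the paragraph describes, keeps the test set from influencing which paths become features.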