A file format identification method and system based on compressed package content

This file format recognition technology is applied in the field of network information security. It addresses problems such as inaccurate file format recognition. Its effects include saving decompression time, producing accurate recognition results and improving recognition efficiency.

Active Publication Date: 2019-05-07
HARBIN ANTIY TECH

AI Technical Summary

Problems solved by technology

[0003] Existing technology cannot accurately identify file formats that use a compressed package as their carrier. To address this defect, the present invention proposes a file format identification method and system based on compressed package content.

Embodiment Construction

[0028] This section helps those skilled in the art better understand the technical solutions in the embodiments of the present invention. It also aims to make the purposes, features and advantages described above more apparent. To that end, the technical solutions of the present invention are described in further detail below with reference to the accompanying drawings.

[0029] The present invention provides an embodiment of a method for file format identification based on compressed package content. The method comprises a feature extraction stage and a file format identification stage. The flow of the feature extraction stage is shown in Figure 1 and includes:

[0030] S101: Collect compressed-package-type files and record the file format of each one. Compressed-package-type files include Office-series documents, PDF documents, APK files and other files that use a compressed package as their carrier. The file formats include the .doc format and the .d...
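
Step S101 pairs each collected sample with its recorded format. The later steps then need the names of the entries stored inside each sample. The sketch below shows one way to gather both, under stated assumptions:

- The samples are ZIP-based containers that Python's standard `zipfile` module can read. Formats built on other containers, such as OLE compound files, would need a different reader and are not covered.
- The helper names `extract_member_names`, `collect_samples` and `make_zip` are hypothetical and do not come from the patent.
- The sample archives are empty mock-ups, not real documents.

```python
import io
import zipfile
from typing import Iterable, List, Set, Tuple, Union

# Hypothetical sample type: either a path on disk or the raw bytes of a file.
Sample = Union[str, bytes]


def extract_member_names(sample: Sample) -> Set[str]:
    """Return the names of all entries stored in a ZIP-based sample."""
    source = io.BytesIO(sample) if isinstance(sample, bytes) else sample
    with zipfile.ZipFile(source) as archive:
        return set(archive.namelist())


def collect_samples(labeled: Iterable[Tuple[str, Sample]]) -> List[Tuple[str, Set[str]]]:
    """S101 sketch: keep each sample's recorded format next to its entry names."""
    return [(fmt, extract_member_names(sample)) for fmt, sample in labeled]


def make_zip(names: Iterable[str]) -> bytes:
    """Build a minimal in-memory ZIP with empty entries (toy data only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in names:
            archive.writestr(name, b"")
    return buffer.getvalue()


if __name__ == "__main__":
    samples = [
        ("docx", make_zip(["[Content_Types].xml", "word/document.xml"])),
        ("apk", make_zip(["AndroidManifest.xml", "classes.dex"])),
    ]
    for fmt, names in collect_samples(samples):
        print(fmt, sorted(names))
```

Keeping the recorded format alongside the name set lets a later statistical step group the name sets by format.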

Abstract

The invention discloses a method for identifying a file format based on compressed package content. The method comprises a feature extraction stage and a file identification stage.

The feature extraction stage comprises:
- obtaining compressed-package-type files;
- extracting all the file names contained in each of them;
- performing statistical analysis to obtain a feature identifier for each compressed package format. These identifiers together form a feature library.

The file identification stage comprises:
- obtaining a file to be identified;
- judging whether it is of the compressed package type;
- if so, obtaining all the file names it contains and matching them against the features in the feature library;
- finally, reporting the file format of the file to be identified.

The invention also provides a system for identifying file formats based on compressed package content. It remedies a shortcoming of existing format identification technology. When that technology recognizes the compressed package type, it cannot precisely identify the specific compressed package format and only reports the file as a compressed package file.
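
The two stages described in the Abstract can be illustrated with a small end-to-end sketch. The patent does not spell out its statistical analysis, so the rules below are assumptions:

- **Feature library rule.** An entry name becomes a feature identifier for a format when it occurs in every training sample of that format and in no sample of any other format.
- **Compressed-package check.** Whether a file is of the compressed package type is decided with `zipfile.is_zipfile`.
- **Matching rule.** Identification reports the format whose identifiers all appear among the file's entry names, preferring the format with the most identifiers. If nothing matches, it falls back to a generic "zip" label.

The function names `build_feature_library` and `identify`, the result strings and the toy archives are hypothetical.

```python
import io
import zipfile
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def member_names(data: bytes) -> Set[str]:
    """Return the entry names of a ZIP-based file given as bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return set(archive.namelist())


def build_feature_library(samples: Iterable[Tuple[str, bytes]]) -> Dict[str, Set[str]]:
    """Feature extraction stage under an assumed rule: keep names shared by every
    sample of a format and absent from all samples of the other formats."""
    per_format: Dict[str, List[Set[str]]] = defaultdict(list)
    for fmt, data in samples:
        per_format[fmt].append(member_names(data))

    library: Dict[str, Set[str]] = {}
    for fmt, name_sets in per_format.items():
        common = set.intersection(*name_sets)
        others: Set[str] = set()
        for other_fmt, other_sets in per_format.items():
            if other_fmt != fmt:
                others.update(*other_sets)
        library[fmt] = common - others
    return library


def identify(data: bytes, library: Dict[str, Set[str]]) -> str:
    """File identification stage: report the specific format when possible."""
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return "not a compressed package"
    names = member_names(data)
    # Formats whose identifiers all occur in this file. Formats with no
    # distinguishing names are skipped so an empty set cannot match everything.
    matches = [(len(ids), fmt) for fmt, ids in library.items() if ids and ids <= names]
    if not matches:
        return "zip"  # a compressed package whose specific format is unknown
    return max(matches)[1]  # most identifiers wins; ties broken by format name


def make_zip(names: Iterable[str]) -> bytes:
    """Build a minimal in-memory ZIP with empty entries (toy data only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in names:
            archive.writestr(name, b"")
    return buffer.getvalue()


if __name__ == "__main__":
    training = [
        ("docx", make_zip(["[Content_Types].xml", "word/document.xml", "docProps/app.xml"])),
        ("docx", make_zip(["[Content_Types].xml", "word/document.xml"])),
        ("xlsx", make_zip(["[Content_Types].xml", "xl/workbook.xml"])),
        ("apk", make_zip(["AndroidManifest.xml", "classes.dex", "resources.arsc"])),
    ]
    library = build_feature_library(training)
    print(identify(make_zip(["[Content_Types].xml", "word/document.xml"]), library))
    print(identify(make_zip(["readme.txt"]), library))
```

In this toy library, `[Content_Types].xml` is shared by the docx and xlsx samples, so it is excluded from both. Only `word/document.xml` and `xl/workbook.xml` remain as their identifiers. Inspecting entry names this way reads only the archive's directory, so no entry contents need to be decompressed.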

Description

Technical field

[0001] The invention relates to the field of network information security, and in particular to a file format identification method and system based on compressed package content.

Background

[0002] Format recognition is an important basic technology in the field of network information security. It greatly assists follow-up work such as virus detection and removal and vulnerability detection. Two format recognition techniques are currently in common use: one is based on the file suffix, and the other on the file format's magic number. Suffix-based recognition struggles when a file has no suffix or its suffix has been modified, because the format can then hardly be identified accurately. Magic-number-based recognition is currently the main file format recognition method. However, this method is ineffective for file formats that use compressed packages as ...
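
The sketch below illustrates the limitation of magic-number recognition described in the background. It applies a bare magic-number check to two toy archives, one laid out like a DOCX document and one like an APK package. Every ZIP archive starts with the local file header signature `PK\x03\x04`, so the check reports both files simply as "zip". The `magic_format` function, with its one-entry signature table, is a hypothetical stand-in for a real magic database. The archives are empty mock-ups, not real documents.

```python
import io
import zipfile
from typing import Iterable

# Signature found at the start of every ZIP local file header.
ZIP_LOCAL_HEADER_MAGIC = b"PK\x03\x04"


def make_zip(names: Iterable[str]) -> bytes:
    """Build a minimal in-memory ZIP with empty entries (toy data only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in names:
            archive.writestr(name, b"")
    return buffer.getvalue()


def magic_format(data: bytes) -> str:
    """Hypothetical magic-number recognizer: looks only at the leading bytes."""
    if data.startswith(ZIP_LOCAL_HEADER_MAGIC):
        return "zip"
    return "unknown"


if __name__ == "__main__":
    docx_like = make_zip(["[Content_Types].xml", "word/document.xml"])
    apk_like = make_zip(["AndroidManifest.xml", "classes.dex"])
    # Both containers share the same leading bytes, so the check cannot separate them.
    print(magic_format(docx_like), magic_format(apk_like))
```

The two files differ only in the names of the entries they contain. A content-based method therefore has to look inside the archive rather than at its first few bytes.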

Application Information

Patent type & authority: Patents (China)
IPC (8): G06F16/16; G06F16/174
Inventors: 沈长伟, 贺磊钢, 童志明, 张栗伟, 何公道
Owner: HARBIN ANTIY TECH