File type identification method and server

A file type identification method, applied in the field of information processing, that addresses problems such as imprecise identification of virus types, the inability to guard against unknown viruses, and technical complexity. Its effects are to shorten the time between the emergence of a virus and its detection and removal, reduce the amount of manual intervention, and yield finer-grained identification results.

Active Publication Date: 2017-11-03
TENCENT TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

However, the processing methods described above rely mainly on manually extracting signatures and formulating the corresponding identification rules. Virus analysts must manually analyze existing samples to find the relevant characteristics, so the results depend heavily on the analysts' ability, and a large number of experienced personnel are needed. Because the technique is complex, identification efficiency is relatively low.
Moreover, manually summarized signatures can generally handle only known viruses and cannot guard against viruses that have not yet been seen, so detection lags behind new threats.
[0003] Existing technology also includes machine learning methods for classifying samples, but such schemes only divide training samples into virus and non-virus. On the one hand, because viruses are diverse and unevenly distributed, a model trained to distinguish only viruses from non-viruses is not well targeted; its accuracy is low, and it easily loses the characteristics of niche viruses, resulting in many false positives. On the other hand, the recognition granularity is limited to virus / non-virus and cannot pinpoint the type of virus.



Examples


Embodiment 1

[0042] Embodiments of the present invention provide a method for identifying file types, as shown in Figure 2, including:

[0043] Step 201: Divide the training samples into training samples of at least one type of virus and training samples of normal files;

[0044] Step 202: Perform feature extraction on the training samples to obtain a feature set of each training sample;

[0045] Step 203: Using the feature set of each training sample, determine the feature information base of each type of virus in the at least one type of virus, and determine the feature information base of normal files;

[0046] Step 204: Based on the feature information base of each type of virus and the feature information base of normal files, determine a classification model for identifying the at least one type of virus and normal files.
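Steps 201 to 204 can be illustrated with a minimal sketch. The source does not say which features are extracted or which model is trained, so everything concrete here is an assumption made for illustration: byte n-grams stand in for the feature set, each feature information base is a per-label count of how many samples contain a feature, and the "model" is just those counts normalized to per-label weights. The function names are hypothetical.

```python
from collections import Counter, defaultdict

def extract_features(data: bytes, n: int = 2) -> set:
    """Step 202 (assumed feature type): the set of byte n-grams in one sample."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}

def build_feature_bases(samples):
    """Steps 201 and 203: group labelled samples by type, then record, per type,
    how many samples of that type contain each feature."""
    bases = defaultdict(Counter)   # label -> feature -> number of samples
    sizes = Counter()              # label -> number of samples
    for label, data in samples:
        bases[label].update(extract_features(data))
        sizes[label] += 1
    return dict(bases), dict(sizes)

def build_model(bases, sizes):
    """Step 204 (assumed model): weight each feature by the fraction of that
    type's samples in which it appears."""
    return {
        label: {feat: count / sizes[label] for feat, count in counter.items()}
        for label, counter in bases.items()
    }

# Toy labelled samples: one virus type ("trojan") plus normal files.
samples = [
    ("trojan", b"abcd"),
    ("trojan", b"abce"),
    ("normal", b"wxyz"),
]
bases, sizes = build_feature_bases(samples)
model = build_model(bases, sizes)
print(sorted(model))            # the labels the model can distinguish
print(model["trojan"][b"ab"])   # weight of a feature shared by both trojan samples
```

The point of keeping one feature information base per virus type, rather than a single "virus" base, is that features specific to a rare type are not diluted by the more common types.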

[0047] Here, dividing the training samples into training samples of at least one type of virus and training samples of normal files can rely on the pre-establishe...

Embodiment 2

[0092] Embodiments of the present invention provide a method for identifying file types, as shown in Figure 2, including:

[0093] Step 201: Divide the training samples into training samples of at least one type of virus and training samples of normal files;

[0094] Step 202: Perform feature extraction on the training samples to obtain a feature set of each training sample;

[0095] Step 203: Using the feature set of each training sample, determine the feature information base of each type of virus in the at least one type of virus, and determine the feature information base of normal files;

[0096] Step 204: Based on the feature information base of each type of virus and the feature information base of normal files, determine a classification model for identifying the at least one type of virus and normal files.

[0097] Here, dividing the training samples into training samples of at least one type of virus and training samples of normal files can rely on the pre-establishe...

Embodiment 3

[0127] Building on the classification model established in Embodiment 1 or Embodiment 2 above, this embodiment focuses on how to use the classification model to identify the information sent by a terminal device. Referring to Figure 12, it includes:

[0128] Step 1201: Divide the training samples into training samples of at least one type of virus and training samples of normal files;

[0129] Step 1202: Perform feature extraction on the training samples to obtain a feature set of each training sample;

[0130] Step 1203: Using the feature set of each training sample, determine the feature information base of each type of virus in the at least one type of virus, and determine the feature information base of normal files;

[0131] Step 1204: Based on the feature information base of each type of virus and the feature information base of normal files, determine a classification model for identifying the at least one type of virus and no...
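The identification steps of this embodiment are cut off in this excerpt, so the following is only a sketch of the general idea of applying a multi-type model to information received from a terminal. It assumes, hypothetically, that the terminal reports a set of feature strings for a file and that the model is a table of per-label feature weights; the `MODEL` contents, the `identify` function, and the `"unknown"` fallback are all illustrative and not taken from the patent.

```python
# Hypothetical model: per-label feature weights, such as a training stage might produce.
MODEL = {
    "trojan": {"ab": 1.0, "bc": 1.0, "cd": 0.5},
    "worm": {"xy": 1.0, "yz": 0.8},
    "normal": {"hello": 1.0, "world": 0.9},
}

def identify(model, features):
    """Score a reported feature set against every label and return the best label
    together with all scores. If no known feature matches, return "unknown"."""
    scores = {
        label: sum(weights.get(feat, 0.0) for feat in features)
        for label, weights in model.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] == 0.0:
        return "unknown", scores
    return best, scores

# Feature set as it might arrive from a terminal device (assumed format).
label, scores = identify(MODEL, {"ab", "cd", "zz"})
print(label, scores)
```

Because every label is scored separately, the result names a specific virus type (or a normal file) rather than a bare virus / non-virus verdict.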


Abstract

The invention provides a file type identification method and a server. The method comprises the following steps: dividing the training samples into training samples of at least one type of virus and training samples of normal files; performing feature extraction on the training samples to obtain a feature set for each training sample; using the feature set of each training sample to determine a feature information base for each type of virus in the at least one type of virus, and to determine a feature information base for normal files; and, based on the feature information base of each type of virus and the feature information base of normal files, determining a classification model for identifying the at least one type of virus and normal files.

Description

Technical field

[0001] The invention relates to information identification technology in the field of information processing, in particular to a file type identification method and server.

Background art

[0002] The existing technical solutions for detecting virus files are as follows: analysts analyze virus files, extract virus signatures, and store the signatures in a database; antivirus engines scan existing files against the virus database and report a virus when they encounter a matching signature. However, the above-mentioned processing method mainly relies on manually extracting signatures and formulating corresponding identification rules. This detection method requires virus analysts to manually analyze existing samples to find out the corresponding characteristics, which heavily depends on the ability of virus analysts. Therefore, a large number of experienced personnel are needed to meet the needs of solving the problem. Due to the complexity of the te...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F21/56
CPC: G06F21/562
Inventor: 罗元海, 王佳斌
Owner: TENCENT TECH (SHENZHEN) CO LTD