Clustering method and device of files

A file clustering method and device, applied in the field of information processing, that addresses the heavy computation and complexity of existing file clustering methods by reducing both.

Active Publication Date: 2014-08-27
TENCENT TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

In existing file clustering methods, the similarity comparison requires a relatively large amount of computation and is relatively complex.




Embodiment Construction

[0024] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative efforts fall within the protection scope of the present invention.

[0025] The embodiment of the present invention provides a file clustering method, for example for clustering PE files and other files, performed mainly by a computer. The flow chart is shown in Figure 1 and includes:

[0026] Step 101: feature extraction is performed on each of multiple information blocks in the file to be processed.

[0027] It can be understood that each file can be divided into different information blocks. For a PE file, the PE fil...



Abstract

Disclosed are a method and device for clustering files, applied to the technical field of information processing. In the embodiments of the present invention, when files to be processed are clustered, an information fingerprint is obtained for each file by processing the information fingerprints of the features of the multiple information blocks the file contains. The file fingerprints are then compared, and files with the same information fingerprint are taken as one cluster. In this way, the features of the information blocks in each file are identified by information fingerprints, and clustering is performed on those identifiers. Compared with the similarity comparisons of the prior art, computing an identifier for the features and clustering on it greatly reduces the amount of calculation and the complexity.
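The flow described in the abstract can be illustrated with a minimal Python sketch. This is not the patent's implementation: the choice of SHA-256 as the fingerprint function, the representation of each block's feature as a byte string, the concatenation used to combine block fingerprints, and the names `block_fingerprint`, `file_fingerprint`, and `cluster_files` are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def block_fingerprint(feature: bytes) -> str:
    """Fingerprint the feature of one information block (hash choice is an assumption)."""
    return hashlib.sha256(feature).hexdigest()


def file_fingerprint(block_features: list[bytes]) -> str:
    """Combine the block fingerprints, in block order, into one file fingerprint.

    Concatenating and hashing again is an assumed combination rule.
    """
    combined = "".join(block_fingerprint(f) for f in block_features)
    return hashlib.sha256(combined.encode("ascii")).hexdigest()


def cluster_files(files: dict[str, list[bytes]]) -> list[list[str]]:
    """Group file names whose file fingerprints are identical."""
    clusters = defaultdict(list)
    for name, features in files.items():
        clusters[file_fingerprint(features)].append(name)
    return [sorted(members) for members in clusters.values()]


# Hypothetical inputs: each file is already reduced to one feature per block.
sample = {
    "a.exe": [b"header-feature", b"section-feature-1"],
    "b.exe": [b"header-feature", b"section-feature-1"],
    "c.exe": [b"header-feature", b"section-feature-2"],
}
result = sorted(cluster_files(sample))
```

Because files are grouped by exact fingerprint equality, each file needs only one dictionary lookup, rather than a pairwise similarity comparison against other files.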

Description

Technical field

[0001] The invention relates to the technical field of information processing, and in particular to a file clustering method and device.

Background

[0002] With the development of the Internet, information has grown explosively. Malicious programs such as computer viruses, worms, and Trojan horses endanger the security of user equipment every day, and most malicious program files are in Portable Executable (PE) format. Although the number of these PE files is large, many of them share family characteristics. PE files can therefore be clustered first, that is, similar objects are grouped according to a predefined metric, and new families of PE files can then be found from the clustering results, which helps virus analysis and removal.

[0003] At present, there are mainly two clustering methods for files. One is traditional clust...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F21/56
CPC: G06F17/30, G06F16/16, G06F16/325, G06F16/137, G06F16/35, G06F16/285, G06F16/1727
Inventor: 杨宜, 于涛, 陶波
Owner: TENCENT TECH (SHENZHEN) CO LTD