Website classification method based on network relation graph

A website classification method based on a network relation graph, applicable to network data retrieval, network data indexing, and special data processing applications. It addresses problems such as a limited access field of view and the difficulty of classifying general websites, and it improves detection performance while reducing the computational complexity and the number of training iterations.

Active Publication Date: 2017-02-22
成都知道创宇信息技术有限公司
6 Cites, 11 Cited by

AI Technical Summary

Problems solved by technology

LAN users often share similar browsing habits, which limits the range of sites that their access behavior covers.
Secondly, because the approach relies on analyzing user behavior, it can only determine whether a website is malicious; it is difficult to distinguish the type of malicious website.
Classifying general sites is even more difficult.

Embodiment Construction

[0031] The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments. The classification method is divided into three parts: data collection and logic processing; data table maintenance; and feature extraction and classification. The core of the invention is to make full use of the site network relation graph in the classification process. The process flow of the method is as follows:

[0032] 1. Training data collection and data preprocessing

[0033] The malicious-site data of the "Security Alliance" is used as the sample data source to obtain the existing data classifications and URL data. The method must handle not only sites within the system, such as the existing sample data sources and new sites crawled from the pages of those sites, but also sites outside the system that may be obtained from other sources at any time and need...

Abstract

The invention discloses a website classification method based on a network relation graph. The method comprises the following steps: using the malicious-site data of the "Security Alliance" as a sample data source and obtaining the existing data classifications and URL data; forming a site mapping table and a word frequency analysis table, extracting the fingerprint features of each sample to construct a feature table, and forming a type table; forming an undirected weighted graph from the sites and their connection weights; building a network relation graph from the many sites and weights, and dividing this large graph into several sub-graphs with a graph clustering algorithm; and extracting and classifying the fingerprint features of each task unit on its own server through a site fingerprint feature extractor and a site fingerprint feature classifier. The method markedly improves the detection speed of website classification and distinguishes the specific type of each site. For a newly added site, the processing unit is determined from the connections between the new site and the existing sites, which effectively reduces the computational complexity and the number of training iterations and saves computing resources.
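The graph steps in the abstract can be illustrated with a minimal Python sketch. The abstract does not name the graph clustering algorithm or say how connection weights are computed, so everything here is an assumption made for illustration: the function names (`build_graph`, `partition`, `assign_new_site`), the example sites and weights, and the stand-in clustering rule, which drops edges below a threshold and takes the connected components as sub-graphs.

```python
from collections import defaultdict


def build_graph(links):
    """Build an undirected weighted graph from (site_a, site_b, weight) triples."""
    graph = defaultdict(dict)
    for a, b, w in links:
        # Undirected: store the accumulated weight in both directions.
        graph[a][b] = graph[a].get(b, 0.0) + w
        graph[b][a] = graph[b].get(a, 0.0) + w
    return graph


def partition(graph, min_weight):
    """Stand-in clustering (an assumption, not the patent's algorithm):
    ignore edges lighter than min_weight and return connected components."""
    seen = set()
    clusters = []
    for start in sorted(graph):
        if start in seen:
            continue
        component = set()
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour, w in graph[node].items():
                if w >= min_weight and neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(component)
    return clusters


def assign_new_site(new_links, clusters):
    """Pick the sub-graph (processing unit) with the largest total connection
    weight to the new site. new_links maps existing site -> weight.
    Returns the cluster index, or None if the site touches no cluster."""
    totals = [sum(w for s, w in new_links.items() if s in c) for c in clusters]
    if not totals or max(totals) == 0:
        return None
    return totals.index(max(totals))


# Hypothetical sites and weights, for illustration only.
links = [
    ("a.example", "b.example", 3.0),
    ("b.example", "c.example", 2.0),
    ("c.example", "d.example", 0.5),
    ("d.example", "e.example", 4.0),
]
graph = build_graph(links)
clusters = partition(graph, min_weight=1.0)
unit = assign_new_site({"e.example": 2.0, "a.example": 0.5}, clusters)
```

In this sketch a new site is routed to one existing sub-graph rather than triggering a re-clustering of the whole graph, which is the kind of saving in computation that the abstract attributes to using connection relations for newly added sites.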

Description

Technical field

[0001] The invention relates to the field of Internet website classification, in particular to a website classification method based on a network relation graph.

Background

[0002] The Internet is an open sharing platform. As the Internet continues to develop, the number of web pages keeps growing and new kinds of pages keep emerging. Traditional network security agencies usually classify emerging websites through manual review and through automatic identification by heuristic crawlers. Both approaches face serious challenges. First, the ever-expanding scale of web page data makes it difficult for existing technologies to cover all web pages, and the speed of data processing lags far behind the rate at which pages are updated. The disadvantage of manual review is that classification quality depends on the professional ability of the reviewers, and the processing cycle is relatively long. Secondly, malicious URLs of...

Application Information

IPC(8): G06F17/30
CPC: G06F16/951, G06F16/9566
Inventor: 杨珩
Owner: 成都知道创宇信息技术有限公司