Website content safety testing system and method

A content security detection system and method, applied in the field of network security. It addresses the lack of JavaScript and HTML content analysis of web pages, the absence of accelerated processing, and the poor performance of earlier approaches, and aims to achieve efficient and accurate security detection, faster response, and better feature extraction results.

Inactive Publication Date: 2018-03-30
INFORMATION & TELECOMM COMPANY SICHUAN ELECTRIC POWER
Cites: 3; Cited by: 12

AI Technical Summary

Problems solved by technology

This method has the following disadvantages: (1) some malicious URLs show no obvious malicious traits in their syntactic features or WHOIS registration information and closely resemble normal URLs, so the false positive rate is high; (2) the content of the web page is not analyzed, and judging the security of a URL only from DNS, WHOIS, and URL information is one-sided.
The above two methods improve considerably on Justin's research, but they overlook several important issues: (1) for classifying web content, especially pictures, an SVM model or BP neural network performs poorly on complex images and is prone to large deviations; (2) classifying web content with machine learning or deep learning adds substantial overhead to the system, and although response speed is an important measure of the system, neither method applies any acceleration.

Examples

Embodiment 1

[0042] As shown in Figures 1-2, the present invention includes a website content security detection system comprising:

[0043] Front-end request module: input the URL address to be detected, and submit the request to the crawler module;

[0044] Crawler module: Crawl the image information of the target URL;

[0045] Feature extraction module: extract the image information of the crawler module and the image information of the sample image module as feature vectors;

[0046] Model trainer: generate a classifier from the feature vectors of the sample pictures through supervised learning;

[0047] FPGA hardware accelerator: provide hardware acceleration function for the feature extraction module;

[0048] Safety arbitration module: calculate the safety factor of the target URL according to the classification results of the image features by the classifier;

[0049] Data storage module: store the image information crawled by the crawler module, and store the detection res...
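The crawler module's task of collecting the image information of a target URL can be illustrated with a minimal sketch. It assumes the page HTML has already been downloaded and only shows how image addresses might be pulled out of it; the names `ImageLinkParser` and `extract_image_urls` are hypothetical and do not come from the patent.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageLinkParser(HTMLParser):
    """Collects the src attribute of every <img> tag in a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.image_urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        for name, value in attrs:
            if name == "src" and value:
                # Resolve relative paths against the page URL.
                self.image_urls.append(urljoin(self.base_url, value))


def extract_image_urls(page_url: str, html: str) -> list[str]:
    """Return absolute image URLs found in the given HTML (hypothetical helper)."""
    parser = ImageLinkParser(page_url)
    parser.feed(html)
    return parser.image_urls
```

In a fuller system the returned addresses would be downloaded and handed to the data storage module and the feature extraction module; that part is omitted here.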

Embodiment 2

[0054] Building on Embodiment 1, this embodiment preferably implements the FPGA hardware accelerator using the Xilinx reconfigurable acceleration stack, combined with the Caffe machine learning framework and the Xilinx deep neural network (DNN) library.

[0055] Caffe is a deep learning framework that supports convolutional neural networks (CNNs). When the existing technology uses an SVM model or BP neural network to classify complex images, it is prone to large deviations. In this scheme, by contrast, the text and picture content is obtained by crawling, image feature vectors are extracted with a CNN deep learning method, and the sample image features are used as the input of the model trainer to obtain the classifier. When analyzing complex images, this is less prone to deviation than the SVM model or BP neural network classification algorithm, and the website screening results ...
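To make the idea of extracting an image feature vector with a CNN more concrete, here is a minimal pure-Python sketch of one convolution, ReLU, and global max pooling stage. It is not the Caffe or Xilinx API, and the two edge kernels are hand-picked assumptions; a trained network would learn its kernels from the sample pictures.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2-D cross-correlation, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Clamp negative responses to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def global_max_pool(feature_map):
    """Reduce a feature map to its single strongest response."""
    return max(max(row) for row in feature_map)


def extract_features(image, kernels):
    """One feature per kernel: convolution -> ReLU -> global max pooling."""
    return [global_max_pool(relu(conv2d(image, k))) for k in kernels]


# Hypothetical hand-picked kernels; a trained CNN would learn these.
VERTICAL_EDGE = [[-1.0, 1.0], [-1.0, 1.0]]
HORIZONTAL_EDGE = [[-1.0, -1.0], [1.0, 1.0]]
```

The nested multiply-accumulate loops in `conv2d` are the kind of regular, parallel arithmetic that hardware accelerators such as FPGAs are typically used to speed up.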

Embodiment 3

[0059] A method for detecting website content security, comprising the steps of:

[0060] S1: The feature extraction module extracts the picture information of the sample picture module into the form of a feature vector;

[0061] S2: The sample feature vector obtained in S1 is used as input, and the model trainer generates a classifier by means of supervised learning;

[0062] S3: Input the URL to be detected in the front-end request module, detect the validity of the URL, and submit the URL to the crawler module;

[0063] S4: The crawler module receives the URL sent by the front-end request module, crawls the picture information of the target URL, and stores the crawled content in the data storage module;

[0064] S5: The feature extraction module extracts the feature vectors of the pictures crawled in S4;

[0065] S6: Taking the image feature vectors extracted in S5 as input, the classifier classifies the crawled images;

[0066] S7: The security arbitration module calculate...
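The flow of steps S1 to S7 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: a nearest-centroid classifier stands in for the trained classifier, `crawl_features` is a hypothetical callable standing in for crawling plus feature extraction (S4 and S5), and because the source's description of the safety factor is truncated, the share-of-safe-images score used here is only a placeholder.

```python
from urllib.parse import urlparse


def is_valid_url(url: str) -> bool:
    """S3: accept only absolute http(s) URLs with a host part."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def train_centroids(samples):
    """S2 stand-in: supervised training as one mean vector per label.

    samples: list of (feature_vector, label) pairs produced in S1.
    """
    sums, counts = {}, {}
    for vector, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vector))
        for i, v in enumerate(vector):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def classify(centroids, vector):
    """S6: assign the label of the nearest centroid (squared Euclidean)."""
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(centroids[label], vector))
    return min(centroids, key=dist)


def safety_factor(labels, safe_label="safe"):
    """S7 placeholder (assumption): share of images classified as safe."""
    if not labels:
        return 1.0  # Assumption: a page with no images is not penalized.
    return sum(1 for lab in labels if lab == safe_label) / len(labels)


def detect(url, crawl_features, centroids):
    """S3-S7 for one URL; crawl_features stands in for S4 + S5."""
    if not is_valid_url(url):
        raise ValueError(f"invalid URL: {url!r}")
    labels = [classify(centroids, v) for v in crawl_features(url)]
    return safety_factor(labels)
```

A caller would train once with `train_centroids` on labeled sample features and then call `detect` for each submitted URL, storing the returned score alongside the crawled content.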

Abstract

The invention discloses a website content safety testing system and method. The system includes a front-end request module, a web spider module, a feature extraction module, a model trainer, an FPGA hardware accelerator, and a safety arbitration module. The front-end request module takes the URL to be tested as input and submits a request to the web spider module; the web spider module crawls the picture information of the target URL; the feature extraction module extracts the picture information from the web spider module and from a sample picture module as feature vectors; the model trainer uses the feature vectors of the sample pictures to generate a classifier through supervised learning; the FPGA hardware accelerator provides hardware acceleration for the feature extraction module; and the safety arbitration module calculates the safety coefficient of the target URL from the classifier's classification results for the picture features. By feeding the features of the sample pictures into the model trainer to obtain the classifier, and by using the FPGA hardware accelerator to speed up the feature extraction algorithm and so improve the system's response speed, the invention achieves fast, efficient, and accurate website content safety testing.

Description

Technical field

[0001] The invention relates to the technical field of network security, in particular to a website content security detection system and method.

Background

[0002] With the development of Internet technology, Web applications have brought great convenience to people's lives and greatly enriched the ways in which information is disseminated. However, some bad actors profit by creating phishing, gambling, and pornography websites, which poses serious risks to people's safety and health when they use the Internet. The detection of malicious websites has therefore become a serious network security problem.

[0003] At present, the detection of malicious web pages mainly relies on two methods: static feature detection and dynamic feature detection. Static feature detection includes analysis of web page DNS information, WHOIS information, URL syntax features, HTML content, and JavaScript code; dynamic feature detecti...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06N3/08
CPC: G06N3/084, G06F16/951, G06F16/958
Inventor: 王电钢, 龚艳, 母继元, 毛启均, 常健
Owner: INFORMATION & TELECOMM COMPANY SICHUAN ELECTRIC POWER