Malicious URL detection method and system

A malicious URL detection method and system in the field of network information security. It addresses the lack of reliable negative samples, the difficulty of obtaining labeled data, and the reduced accuracy caused by labeling errors, and is characterized as novel in design and highly practical.

Inactive Publication Date: 2020-03-24
Applicants: 深圳市任子行科技开发有限公司 (Shenzhen Renzixing Technology Development Co., Ltd.) and one other
Cites 7 documents; cited by 6.

AI Technical Summary

Problems solved by technology

Machine-learning methods for this task fall into two main categories. The first is unsupervised methods, such as anomaly detection, which do not require labeled data; however, they place much higher demands on input features than typical supervised models, and their performance is hard to keep at a high level once the number of features grows even slightly.
The second is supervised methods: samples are labeled manually based on human domain experience, and a model is then trained on those labels. However, labeling is costly, and the labeling experts themselves introduce subjective errors, which reduces accuracy.
Moreover, in many cases accurate annotated data is difficult to obtain.
More often, only a small number of known malicious URLs and a large number of unlabeled URL samples are available, without enough reliable negative examples, so the supervised algorithms above cannot be applied directly.
Conversely, a purely unsupervised approach makes poor use of the labels of known malicious URLs and may not achieve satisfactory performance.

Embodiment Construction

[0027] The technical problem addressed by the present invention is as follows: in URL detection, usually only a small number of malicious URLs and a large number of unlabeled URL samples can be obtained, and there are not enough reliable negative examples, so conventional machine learning algorithms cannot be used directly. Conversely, a purely unsupervised approach makes it difficult to fully exploit the labels of known malicious URLs and may fail to achieve satisfactory performance. To address this, the invention constructs a malicious URL detection method and system that combines active learning (AL) with semi-supervised two-step Positive and Unlabeled learning (PU). With a limited manual labeling workload, a malicious URL detection model is developed for the URL dataset. Compared with the unsupervised model and the semi-supervised ...
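Paragraph [0027] names two-step PU learning, but the text available here does not say how the SVM is trained from positive samples alone. The Python sketch below shows one common two-step PU recipe under that assumption: (1) train a linear SVM with the known malicious URLs as positives and all unlabeled URLs provisionally as negatives, and keep the lowest-scoring unlabeled URLs as reliable negatives; (2) retrain on positives versus reliable negatives and split the unlabeled set into a predicted-malicious set and a to-be-labeled set. The feature function `url_features`, the reliable-negative fraction `rn_fraction`, and the Pegasos-style trainer `train_linear_svm` are illustrative assumptions, not the inventors' design; only the standard library is used.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def url_features(url: str) -> Vector:
    """Hypothetical lexical features; the patent text shown does not list its features."""
    n = max(len(url), 1)
    return [
        len(url) / 100.0,                        # overall length
        sum(c.isdigit() for c in url) / n,       # share of digits
        sum(c in "-_@%=?&~" for c in url) / n,   # share of special characters
        url.count(".") / 10.0,                   # number of dots
    ]


def _dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(xs: List[Vector], ys: List[int], lam: float = 0.01,
                     epochs: int = 1000, seed: int = 0) -> Vector:
    """Linear SVM via Pegasos (stochastic subgradient descent on L2-regularized hinge loss).

    A constant 1.0 feature is appended so the last weight plays the role of the bias.
    """
    data = [list(x) + [1.0] for x in xs]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    rng = random.Random(seed)
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = ys[i] * _dot(w, data[i]) < 1.0  # margin check uses the current w
            w = [(1.0 - eta * lam) * wj for wj in w]   # regularization shrink
            if violated:
                w = [wj + eta * ys[i] * xj for wj, xj in zip(w, data[i])]
    return w


def svm_score(w: Vector, x: Sequence[float]) -> float:
    return _dot(w, list(x) + [1.0])


def split_unlabeled(positives: List[Vector], unlabeled: List[Vector],
                    rn_fraction: float = 0.3) -> Tuple[List[Vector], List[Vector]]:
    """Two-step PU sketch returning (predicted malicious, to-be-labeled)."""
    # Step 1: provisionally treat all unlabeled URLs as benign and train P vs U.
    w1 = train_linear_svm(positives + unlabeled,
                          [1] * len(positives) + [-1] * len(unlabeled))
    # The lowest-scoring unlabeled URLs become the reliable negatives (RN).
    ranked = sorted(unlabeled, key=lambda x: svm_score(w1, x))
    reliable_negatives = ranked[:max(1, int(len(unlabeled) * rn_fraction))]
    # Step 2: retrain on P vs RN only, then split the unlabeled set with the new model.
    w2 = train_linear_svm(positives + reliable_negatives,
                          [1] * len(positives) + [-1] * len(reliable_negatives))
    malicious = [x for x in unlabeled if svm_score(w2, x) > 0]
    to_label = [x for x in unlabeled if svm_score(w2, x) <= 0]
    return malicious, to_label


if __name__ == "__main__":
    # Toy 2-D vectors: P are known malicious URLs; U hides one malicious URL among benign ones.
    P = [[3.0, 3.0], [3.5, 2.5], [2.5, 3.5], [3.0, 4.0], [4.0, 3.0], [3.5, 3.5]]
    U = [[3.2, 2.8], [-3.0, -3.0], [-2.5, -3.5], [-3.5, -2.5],
         [-3.0, -4.0], [-4.0, -3.0], [-3.5, -3.5], [-2.5, -2.5]]
    malicious, to_label = split_unlabeled(P, U)
    print("malicious:", malicious)
    print("to be labeled:", len(to_label))
```

In practice a library implementation such as scikit-learn's `LinearSVC` would normally replace the hand-rolled trainer; it is written out here only to keep the sketch dependency-free. The reliable-negative fraction trades purity for quantity: a smaller fraction yields cleaner negatives but less training data for step 2.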

Abstract

The invention provides a malicious URL detection method and system. The method comprises the following steps: obtaining a URL data set to be analyzed and a malicious URL training sample set; training an SVM (support vector machine) with the malicious URL training sample set and using it to classify the URL data set to be analyzed into a malicious URL data set and a to-be-labeled URL data set; clustering the to-be-labeled URL data set with a clustering algorithm to obtain a to-be-labeled URL sample set; labeling that sample set according to a maliciousness judgment, which divides it into a labeled malicious URL sample set and an unlabeled URL sample set; taking the union of the labeled malicious URL sample set and the malicious URL training sample set to obtain an updated malicious URL training sample set; and subtracting the labeled malicious URL sample set from the to-be-labeled URL data set to obtain an updated URL test data set. The method and system are novel in design and highly practical.
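One active-learning round from the abstract can be sketched in Python as follows. The excerpt does not name the clustering algorithm or how samples are drawn from the clusters, so this sketch assumes plain k-means and takes the URL closest to each centroid as the to-be-labeled URL sample set; `is_malicious` is a hypothetical stand-in for the manual maliciousness judgment, and the cluster count `k` is an assumed parameter. The last two assignments mirror the abstract's set union and set difference.

```python
import random
from typing import Callable, Dict, List, Sequence, Set, Tuple

Vector = Tuple[float, ...]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 50, seed: int = 0) -> List[Vector]:
    """Plain Lloyd's k-means; returns the k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: _dist2(p, centroids[j]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        new = [tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[j]
               for j, cl in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids


def active_learning_round(
    to_label: Dict[str, Vector],          # to-be-labeled URL data set: URL -> features
    train_malicious: Set[str],            # malicious URL training sample set
    is_malicious: Callable[[str], bool],  # hypothetical stand-in for the analyst's judgment
    k: int,
) -> Tuple[Set[str], Set[str]]:
    urls = sorted(to_label)
    centroids = kmeans([to_label[u] for u in urls], k)
    # To-be-labeled URL sample set: the URL nearest to each centroid.
    sample = {min(urls, key=lambda u: _dist2(to_label[u], c)) for c in centroids}
    labeled_malicious = {u for u in sample if is_malicious(u)}
    # Sampled URLs not judged malicious form the unlabeled URL sample set.
    updated_train = train_malicious | labeled_malicious  # set union
    updated_test = set(to_label) - labeled_malicious     # set difference
    return updated_train, updated_test


if __name__ == "__main__":
    pool = {
        "http://a.example/1": (0.0, 0.0),
        "http://a.example/2": (0.1, 0.0),
        "http://bad.example/x": (5.0, 5.0),
        "http://bad.example/y": (5.1, 5.0),
    }
    train, test = active_learning_round(
        pool, {"http://known-bad.example/"}, lambda u: "bad.example" in u, k=2)
    print(sorted(train), sorted(test))
```

Representing each data set as a Python `set` of URL strings makes the union and difference steps literal, while the feature vectors live in a separate dictionary so the remaining URLs can be re-scored and re-clustered in the next round.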

Description

Technical field
[0001] The invention relates to the technical field of network information security, and in particular to a malicious URL detection method and system.
Background
[0002] With the rapid development of the Internet, malicious URL attacks have become increasingly common and seriously threaten network security. Traditional URL attack detection systems rely mainly on blacklists or rule lists. These lists keep growing longer, and it is unrealistic to prevent all attacks this way. Moreover, such methods have difficulty detecting potential threats, which makes it hard for network security engineers to discover emerging malicious URL attacks effectively.
[0003] To improve the generalization ability of detection algorithms, many researchers use machine-learning methods for this task. These methods are mainly divided into two categories: the first is unsupervised methods, such as anomaly detection, such meth...

Application Information

Patent type and authority: Application (China)
IPC(8): H04L29/06; G06K9/62
CPC: H04L63/1416; H04L63/1483; G06F18/23213; G06F18/2411; G06F18/214
Inventors: 熊骁, 郭岗, 林飞, 古元, 沈智杰, 景晓军
Owner: 深圳市任子行科技开发有限公司