A Web attack detection method based on collaborative training

A technology of attack detection and collaborative training, which is applied in the field of Web intrusion detection and network security, and can solve the problems of high training cost, low accuracy rate, and large number of

Active Publication Date: 2019-05-31
HANGZHOU NORMAL UNIVERSITY

AI Technical Summary

Problems solved by technology

Supervised-learning detection requires collecting a large amount of data, labeling it manually, and then training a classification algorithm on it. Unsupervised-learning detection trains a clustering algorithm on unlabeled data. Its advantage is that the training data needs no labels; its disadvantage is that its accuracy is lower than that of supervised learning, so it performs poorly in actual detection. Semi-supervised-learning detection only requires manually labeling part of the unlabeled data and then trains on the labeled and unlabeled data together.
The paper "A PU Learning based System for Potential Malicious URL Detection", published by Ya-Lin Zhang et al. at the 2017 ACM SIGSAC Conference on Computer and Communications Security, uses PU-learning, a semi-supervised approach, to detect Web attacks and eventually reaches 94.2% accuracy. However, its initial stage requires a large number of malicious samples, which still have to be obtained through manual labeling.



Embodiment

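[0042] Embodiment: a Web attack detection method based on collaborative training. The flow of the method is shown in Figure 1, and the specific implementation steps of this embodiment are as follows: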

[0043] 1 Process the Web logs and build the data set

[0044] 1.1 Extract URLs from the Web logs

[0045] First collect the Web logs from the Web server, then extract the URLs from them and decode the URLs to form the set S. For example, the Web log record 202.107.201.11 - - [18/Aug/2018:16:15:46 +0800] "GET /html/main/col38/column_38_1.html?id=361 HTTP/1.0" 200 472 "-" "-" becomes /html/main/col38/column_38_1.html?id=361 after processing;
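The extraction and decoding step can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the records follow the common access-log layout of the example above, and the names `REQUEST_RE`, `extract_url` and `build_url_set` are hypothetical. S is kept as a list here so that duplicates and order are preserved.

```python
import re
from typing import List, Optional
from urllib.parse import unquote

# Assumption: the request field looks like "METHOD URL HTTP/x.y", as in the
# example record. Other log layouts would need a different pattern.
REQUEST_RE = re.compile(
    r'"(?:GET|POST|HEAD|PUT|DELETE|OPTIONS|PATCH) (\S+) HTTP/[\d.]+"'
)


def extract_url(log_line: str) -> Optional[str]:
    """Return the percent-decoded request URL of one log record, or None."""
    match = REQUEST_RE.search(log_line)
    if match is None:
        return None
    return unquote(match.group(1))


def build_url_set(log_lines: List[str]) -> List[str]:
    """Collect the decoded URLs of all parsable records (the set S)."""
    urls = []
    for line in log_lines:
        url = extract_url(line)
        if url is not None:
            urls.append(url)
    return urls


record = (
    '202.107.201.11 - - [18/Aug/2018:16:15:46 +0800] '
    '"GET /html/main/col38/column_38_1.html?id=361 HTTP/1.0" 200 472 "-" "-"'
)
print(extract_url(record))  # /html/main/col38/column_38_1.html?id=361
```

Records that do not match the pattern are skipped rather than raising an error, which is one possible design choice for noisy logs.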

[0046] 1.2 Manually label part of the URLs

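[0047] Manually label part of the URLs: randomly draw |L| samples from S and label them by hand. The label set is {-1, +1}, where -1 denotes a normal URL and +1 denotes a URL carrying an attack. The labeled samples form the set L and the unlabeled samples form the set U, ensuring that S=L+U, L<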
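The random draw that separates the samples to be labeled from the unlabeled remainder might look like the sketch below. The function name `split_labeled_unlabeled`, the fixed seed and the toy URLs are assumptions made for illustration; the labels themselves still come from a human annotator.

```python
import random
from typing import List, Tuple


def split_labeled_unlabeled(
    urls: List[str], num_labeled: int, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Randomly pick num_labeled URLs for manual labeling; the rest stay unlabeled."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    picked = set(rng.sample(range(len(urls)), num_labeled))
    to_label = [u for i, u in enumerate(urls) if i in picked]
    unlabeled = [u for i, u in enumerate(urls) if i not in picked]
    return to_label, unlabeled


# Toy stand-in for S; real input would be the decoded URLs from the logs.
sample_urls = [f"/page?id={i}" for i in range(10)]
to_label, unlabeled = split_labeled_unlabeled(sample_urls, num_labeled=3)

# Labels are entered by a person: -1 for a normal URL, +1 for an attack URL.
# Here every toy URL is marked normal purely as a placeholder.
labels = {url: -1 for url in to_label}
print(len(to_label), len(unlabeled))  # 3 7
```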

[0048] 2 Use expert-knowledge features and text features to obtain two independent views

[0049] 2.1 Build a view from expert-knowledge features

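[0050] Feature space = {path length, path depth, parameter length, number of parameters, maximum parameter-name length, average parameter-name length, maximum parameter-value length, average parameter-value length, proportion of letters in the parameter values, proportion of digits in the parameter values, proportion of special characters in the parameter values, number of attack keywords}, 12 features in total. The special characters include "", "eval" and so on, and can be obtained by querying an attack signature library. The URLs are vectorized with this feature space, turning S into the view X1. Finally, X1 must be normalized with the formula $X_1' = (X_1 - X_{min}) / (X_{max} - X_{min})$, where Xmax and Xmin are the maximum and minimum values of the original X1, respectively;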
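A minimal sketch of the 12-feature vectorization and min-max normalization follows. Several details are assumptions, since the text does not define them precisely: path depth is taken as the number of "/" characters in the path, parameter length as the length of the query string, special characters as anything that is not a letter or digit, and normalization is applied per feature column. The `ATTACK_KEYWORDS` tuple is a hypothetical stand-in for an attack signature library.

```python
from typing import List
from urllib.parse import parse_qsl, urlsplit

# Hypothetical keyword list; a real system would query a signature library.
ATTACK_KEYWORDS = ("<script", "eval", "select", "union", "../")


def expert_features(url: str) -> List[float]:
    """Map one URL to the 12 expert-knowledge features, in the listed order."""
    parts = urlsplit(url)
    path, query = parts.path, parts.query
    pairs = parse_qsl(query, keep_blank_values=True)
    names = [name for name, _ in pairs]
    values = [value for _, value in pairs]
    joined = "".join(values)
    total = len(joined) or 1  # avoid division by zero when there are no values
    lowered = url.lower()
    return [
        float(len(path)),                                    # path length
        float(path.count("/")),                              # path depth
        float(len(query)),                                   # parameter length
        float(len(pairs)),                                   # number of parameters
        float(max((len(n) for n in names), default=0)),      # max name length
        sum(len(n) for n in names) / len(names) if names else 0.0,
        float(max((len(v) for v in values), default=0)),     # max value length
        sum(len(v) for v in values) / len(values) if values else 0.0,
        sum(ch.isalpha() for ch in joined) / total,          # letter ratio
        sum(ch.isdigit() for ch in joined) / total,          # digit ratio
        sum(not ch.isalnum() for ch in joined) / total,      # special-char ratio
        float(sum(lowered.count(k) for k in ATTACK_KEYWORDS)),
    ]


def min_max_normalize(rows: List[List[float]]) -> List[List[float]]:
    """Scale every feature column to [0, 1]; constant columns map to 0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0
         for x, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


urls = ["/html/main/col38/column_38_1.html?id=361", "/a?x=eval(1)"]
view_x1 = min_max_normalize([expert_features(u) for u in urls])
```

Mapping constant columns to 0 is one way to handle the case Xmax = Xmin, which the formula itself leaves undefined.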

[0051] 2.2 Build a view from text features

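[0052] First, tokenize the URLs with N-gram. The feature space of a URL is every combination of characters: if S contains c distinct characters, the dimension of the URL vector is $d = c^n$, where n is the value of N in N-gram. TF-IDF is then used to compute the feature values of each URL. For a sample x, the i-th feature value xi is given by xi=TFi×IDFi, where ni, ntotal, nS, denote, respectively, the number of times the i-th feature occurs in the URL, the total number of tokens after the URL is tokenized, the size of S, and the number of URLs that contain the i-th feature. This method turns S into the view X2;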
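The character N-gram and TF-IDF step can be sketched as below. The exact TF and IDF formulas are not legible in the text, so this sketch assumes the common definitions TF = ni / ntotal and IDF = log(nS / number of URLs containing the feature). It also stores each vector as a sparse dictionary instead of a dense vector of dimension c^n, which is a practical substitution rather than something the text prescribes. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def char_ngrams(url: str, n: int) -> List[str]:
    """Split a URL into overlapping character n-grams."""
    return [url[i:i + n] for i in range(len(url) - n + 1)]


def tfidf_vectors(urls: List[str], n: int = 2) -> List[Dict[str, float]]:
    """Return one sparse TF-IDF vector (n-gram -> weight) per URL."""
    tokenized = [char_ngrams(u, n) for u in urls]

    # Document frequency: in how many URLs each n-gram appears at least once.
    doc_freq: Counter = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    n_s = len(urls)  # size of S
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        n_total = len(tokens)  # total number of n-grams in this URL
        vectors.append({
            gram: (n_i / n_total) * math.log(n_s / doc_freq[gram])
            for gram, n_i in counts.items()
        })
    return vectors


view_x2 = tfidf_vectors(["/a?id=1", "/b?id=2"], n=2)
```

With this IDF definition, an n-gram that occurs in every URL (such as "id" in the two toy URLs) receives weight 0, so only n-grams that distinguish URLs contribute to the view.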

[0053] 3 Perform collaborative training with the two independent views to obtain two classifiers

[0054]...


Abstract

The invention provides a Web attack detection method based on collaborative training. It solves the problem that a Web attack detection model is difficult to train because most URLs lack labels. The method trains the model on a small set of labeled URLs together with a large number of unlabeled URLs. First, the samples are vectorized with expert-knowledge features and text features to obtain two independent views. The views are then used for collaborative training to obtain two attack detection models. Finally, the two models are combined through ensemble learning and used to detect Web attacks. The method reduces the workload of manual data labeling and lowers the cost of Web attack detection.

Description

Technical field

[0001] The invention relates to a Web attack detection method based on collaborative training, and belongs to the technical fields of Web intrusion detection and network security.

Background technique

[0002] With the wide application of Web systems, attack techniques against them keep emerging, resulting in more and more attacks on Web systems. In recent years, data breaches have occurred continually. According to Verizon's "2018 Data Breach Investigations Report", 90% of data breaches in 2018 were caused by Web attacks. The security of Web systems is therefore still not guaranteed, and Web attack detection methods still need to be studied.

[0003] Web attack detection methods are currently divided mainly into rule-based detection methods and machine-learning-based detection methods. At present, most security products on the market for detecting Web attacks use a rule-based detection method, whi...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): H04L29/06
Inventor: 刘雪娇, 唐旭栋, 夏莹杰
Owner: HANGZHOU NORMAL UNIVERSITY