A dangerous web page identification method based on a Chrome plug-in

Identification method and web page technology, applied in the field of Internet information security; addresses problems such as the complexity of the feature extraction process.

Active Publication Date: 2020-12-01
NANJING UNIV OF POSTS & TELECOMM
2 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

Dynamic features come mainly from the runtime behavior of web pages. There are fewer types of them than of static features, but extracting them is relatively complicated.

Embodiment Construction

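[0064] The invention is further explained below with reference to the drawings and specific embodiments. These examples only illustrate the invention and do not limit its scope. After reading this disclosure, those skilled in the art may make modifications of various equivalent forms, and all such modifications fall within the scope defined by the appended claims of this application.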

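[0065] A dangerous web page identification method based on a Chrome plug-in, used to identify dangerous web pages, comprises the following steps: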

[0066] Step 1) Extract the first-dimension data for the support vector machine, as shown in Figure 3. The specific steps are:

[0067] Step 1.1) Extract the URLs of all external links in a web page;
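Step 1.1 can be illustrated with Python's standard-library HTML parser. This is a minimal sketch and not the plug-in's code, since a Chrome extension would do this in JavaScript against the live DOM. `AnchorCollector` and `external_link_urls` are hypothetical names. Treating "external" as "points to a different hostname" is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class AnchorCollector(HTMLParser):
    """Collects the raw href value of every <a> tag (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def external_link_urls(page_url: str, html: str) -> list[str]:
    """Return absolute http(s) link URLs whose host differs from the page's host.

    Assumption: "external" means the link points to a different hostname.
    """
    collector = AnchorCollector()
    collector.feed(html)
    page_host = urlparse(page_url).hostname
    urls = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links such as "/about"
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.hostname != page_host:
            urls.append(absolute)
    return urls


page = '<a href="/about">About</a> <a href="https://paypa1.com/login">Log in</a>'
print(external_link_urls("https://example.com/index.html", page))  # ['https://paypa1.com/login']
```

The sketch resolves relative links against the page URL first, so same-site links such as `/about` are dropped. Non-HTTP schemes such as `mailto:` are dropped as well.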

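[0068] Step 1.2) Query http://data.alexa.com and obtain the Alexa rank of the website hosting the page, using the domain name in the page URL. If the site ranks within the top 1000, the page is directly regarded as safe. If it ranks outside the top 1000, or no rank can be obtained, danger factor zero (danger0) is set to 1;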
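The decision in step 1.2 reduces to a threshold test on the rank. The sketch below assumes the rank has already been looked up and passes it in as an argument. Alexa Internet, which served data.alexa.com, was retired in 2022, so a reimplementation would need another ranking source. `rank_check` and `SAFE_RANK_LIMIT` are hypothetical names. Counting rank 1000 itself as "within the top 1000" is also an assumption.

```python
from typing import Optional, Tuple

SAFE_RANK_LIMIT = 1000  # "within the top 1000" threshold from step 1.2


def rank_check(rank: Optional[int]) -> Tuple[bool, int]:
    """Return (treated_as_safe, danger0) for a site's Alexa rank.

    rank is None when no rank could be obtained.
    """
    if rank is not None and rank <= SAFE_RANK_LIMIT:
        return True, 0  # top-1000 site: the page is regarded as safe outright
    return False, 1  # outside the top 1000, or unranked: danger0 = 1


print(rank_check(42), rank_check(25000), rank_check(None))  # (True, 0) (False, 1) (False, 1)
```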

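[0069] Step 1.3) Examine the domain labels at every level in the current page URL and in the URLs of the page's external links, and take the longest one. If the longest label is longer than 18 characters, danger factor three (danger3) is set to 1; otherwise it is 0. URLs are split as follows: first split the URL on "/" and take the domain segment. Then split the domain segment on "." and add each level's label to an array as a string;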
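The splitting rule and the length test of step 1.3 can be sketched as follows. `domain_labels` and `danger3` are hypothetical names. Stripping a port such as `:8080` from the domain segment is an added assumption that the text does not mention.

```python
from typing import List


def domain_labels(url: str) -> List[str]:
    """Split a URL as in step 1.3: on "/" to isolate the domain segment, then on "."."""
    parts = url.split("/")
    # "https://www.example.com/a" splits into ["https:", "", "www.example.com", "a"]
    segment = parts[2] if "://" in url else parts[0]
    host = segment.split(":")[0]  # assumption: strip a port such as ":8080"
    return [label for label in host.split(".") if label]


def danger3(urls: List[str], max_len: int = 18) -> int:
    """Set danger3 to 1 if the longest label across all URLs exceeds max_len."""
    longest = max((len(lbl) for u in urls for lbl in domain_labels(u)), default=0)
    return 1 if longest > max_len else 0


print(domain_labels("https://www.example.com/a/b"))  # ['www', 'example', 'com']
print(danger3(["https://www.example.com/",
               "http://secure-login-verification-center.example.net/"]))  # 1
```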

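[0070] Step 1.4) Split the current page URL and the external link URLs again and extract information. If the domain ends with ".com.cn", extract the third-level domain; otherwise, extract the second-level domain;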
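Step 1.4 picks the label that identifies the site before the similarity comparison. This sketch assumes that "extract the N-level domain" means the single label at that level, for example `sina` from `news.sina.com.cn` and `example` from `www.example.com`. The text could also be read as the full suffix (`sina.com.cn`), so treat this as one possible interpretation. `comparison_label` is a hypothetical name.

```python
from urllib.parse import urlparse


def comparison_label(url: str) -> str:
    """Step 1.4: third-level label for *.com.cn hosts, otherwise the second-level label.

    Assumption: "extract the N-level domain" means the single label at that level.
    """
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    if host.endswith(".com.cn") and len(labels) >= 3:
        return labels[-3]  # news.sina.com.cn -> "sina"
    return labels[-2] if len(labels) >= 2 else host  # www.example.com -> "example"


print(comparison_label("http://news.sina.com.cn/x"), comparison_label("https://www.example.com/"))
```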

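[0071] Step 1.5) Compare the domain name extracted from each external link URL, one by one, against the domain names in a database of well-known domains, and compute their similarity. Take the highest similarity value below 1 and denote it p. Denote as dname the domain extracted from an external link whose similarity to some domain in the well-known domain database equals p.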

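[0072] The comparison and similarity calculation proceed as follows. First, compare each extracted domain, one by one, against the well-known domains in the database that have the same length L. Count the number s of identical letters and compute the proportion of identical letters pr: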

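[0073]
$$pr = \frac{s}{L} \qquad \text{(Formula 1)}$$
Record the largest value of pr that is less than 1 and assign it to the variable percent. If pr equals 1, the links associated with the extracted domain are treated as safe outright. Also record the domain whose pr equals percent.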
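Formula 1 and the "largest pr below 1" rule can be sketched as follows. The text does not say how identical letters are counted. This sketch assumes a position-by-position match, so pr = 1 only when the two strings are identical. `same_length_pr` and `max_same_length_pr` are hypothetical names.

```python
from typing import Iterable, Optional, Set, Tuple


def same_length_pr(a: str, b: str) -> float:
    """Formula 1, pr = s / L, for two non-empty domains of equal length L.

    Assumption: s counts positions at which both strings have the same letter.
    """
    if len(a) != len(b) or not a:
        raise ValueError("domains must be non-empty and of equal length")
    s = sum(1 for x, y in zip(a, b) if x == y)
    return s / len(a)


def max_same_length_pr(extracted: Iterable[str], known: Iterable[str]
                       ) -> Tuple[Optional[float], Optional[str], Set[str]]:
    """Return (percent, domain whose pr equals percent, domains treated as safe)."""
    known = list(known)
    percent, recorded, safe = None, None, set()
    for domain in extracted:
        if domain in known:  # pr == 1 under this assumption: treat as safe
            safe.add(domain)
            continue
        for candidate in known:
            if domain and len(candidate) == len(domain):
                pr = same_length_pr(domain, candidate)
                if percent is None or pr > percent:
                    percent, recorded = pr, domain
    return percent, recorded, safe


print(max_same_length_pr(["paypa1", "google"], ["paypal", "google"]))
# (0.8333333333333334, 'paypa1', {'google'})
```

In the usage line, `paypa1` differs from `paypal` in one of six positions, so pr = 5/6. `google` matches exactly and is moved to the safe set.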

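[0074] Next, compare each extracted domain, one by one, against the well-known domains in the database whose length differs from its own. Treat each domain name as a set of characters and compute the similarity between domains with the Dice coefficient: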

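[0075]
$$s = \frac{2\,|X \cap Y|}{|X| + |Y|}$$
where X and Y are the character sets of the two domain names being compared.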

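[0076] Record the largest value of the similarity s that is less than 1, assign it to the variable dpercent, and record the domain whose s equals dpercent.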
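The different-length branch can be sketched with a set-based Dice coefficient, 2|X ∩ Y| / (|X| + |Y|) over the characters of each domain, keeping the largest value below 1. `dice` and `max_dice` are hypothetical names, and the sketch only covers pairs whose lengths differ, as the text describes.

```python
from typing import Iterable, Optional, Tuple


def dice(a: str, b: str) -> float:
    """Dice coefficient over character sets: 2|X ∩ Y| / (|X| + |Y|)."""
    x, y = set(a), set(b)
    if not x and not y:
        return 1.0
    return 2 * len(x & y) / (len(x) + len(y))


def max_dice(extracted: Iterable[str], known: Iterable[str]
             ) -> Tuple[Optional[float], Optional[str]]:
    """Return (dpercent, domain whose s equals dpercent) over different-length pairs."""
    known = list(known)
    dpercent, recorded = None, None
    for domain in extracted:
        for candidate in known:
            if len(candidate) == len(domain):
                continue  # equal-length pairs are scored with Formula 1 instead
            s = dice(domain, candidate)
            if s < 1 and (dpercent is None or s > dpercent):
                dpercent, recorded = s, domain
    return dpercent, recorded


print(max_dice(["baidu1"], ["baidu", "taobao"]))  # (0.9090909090909091, 'baidu1')
```

Because the coefficient works on character sets, it ignores letter order and repetition. For example, `dice("gooogle", "google")` is 1.0, so that pair is excluded by the "less than 1" rule as written.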

[0077] Compare the variable percent...

Abstract

The invention discloses a dangerous web page identification method based on a Chrome plug-in. The method extracts the first-dimension data of a support vector machine from the URLs of all external links in a web page. It extracts the second-dimension data from the JavaScript code embedded in or referenced by all <script> tags in the page's HTML code. It then solves the support vector machine using the extracted first- and second-dimension data. The output is the parameters w* and b* of the separating hyperplane, together with a classification decision function. The method judges whether a web page is dangerous in two ways: by measuring how similar its domain names are to the domain names of well-known web pages, and by analyzing the JavaScript code embedded in or referenced by the page. This effectively addresses the limited accuracy and limited generality of existing web page security identification methods.
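The classification step named in the abstract is the standard linear SVM decision function f(x) = sign(w*·x + b*), applied here to the two-dimensional feature vector. This is a minimal sketch. The label convention (+1 dangerous, -1 safe), the tie-breaking at a score of 0, and the parameter values are all assumptions, not values from the patent. `svm_decision` is a hypothetical name, and how each dimension is computed from the danger factors is not shown in this excerpt.

```python
from typing import Sequence


def svm_decision(w_star: Sequence[float], b_star: float, x: Sequence[float]) -> int:
    """Linear SVM decision function f(x) = sign(w* . x + b*).

    Assumption: +1 marks a dangerous page, -1 a safe one, and a score of exactly 0 maps to +1.
    """
    if len(w_star) != len(x):
        raise ValueError("w* and x must have the same dimension")
    score = sum(w * xi for w, xi in zip(w_star, x)) + b_star
    return 1 if score >= 0 else -1


# Placeholder parameters, not values from the patent.
print(svm_decision([1.2, 0.8], -1.0, [1.0, 0.0]))  # 1
print(svm_decision([1.2, 0.8], -1.0, [0.0, 0.0]))  # -1
```

If the model is trained with scikit-learn instead, a fitted `SVC(kernel="linear")` exposes `coef_` and `intercept_`, which play the roles of w* and b*.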

Description

Technical field

[0001] The invention relates to a dangerous web page identification method based on a Chrome plug-in and belongs to the field of Internet information security.

Background

[0002] At present, most existing malicious web page identification systems target a specific type of application, so their system structures and implementations differ somewhat. The basic framework of a malicious web page identification system is divided into three main parts:

[0003] (1) Web page collection. This part collects, deduplicates and filters web pages on the Internet. Collection methods generally fall into two types: active and passive. Active collection mainly uses web crawler technology to capture targeted sets of web pages from the Internet. Passive collection mainly captures the access traffic that passes through a gateway or a client honeypot....

Application Information

Patent type & authority: Patents (China)
IPC(8): G06F16/955, G06F16/958, G06F21/56
CPC: G06F21/562, G06F16/9566, G06F16/958
Inventors: 成卫青 (Cheng Weiqing), 刁健峰 (Diao Jianfeng), 褚佳乐 (Chu Jiale), 蔡晨阳 (Cai Chenyang)
Owner NANJING UNIV OF POSTS & TELECOMM