Phishing website detection method based on multi-feature fusion

A phishing website detection technique based on multi-feature fusion, applied in data exchange networks, special data processing applications, instruments, and similar fields. It addresses the excessively high dimensionality of text vector features, and aims to shorten long training times while achieving high accuracy.

Active Publication Date: 2018-11-09
SOUTHEAST UNIV
4 Cites, 19 Cited by

AI Technical Summary

Problems solved by technology

This method consists mainly of feature extraction, feature fusion, and classification prediction. It extracts features of phishing websites from multiple dimensions, effectively addresses the high dimensionality of text vector features, and applies the XGBoost classification model to phishing website detection, improving detection accuracy and reducing the false negative rate.
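The feature extraction stage can be illustrated with a minimal sketch of URL-based features. The specific features below (URL length, IP-address host, `@` symbol, and so on) are illustrative assumptions chosen for this example; the text here does not list the features the invention actually uses, and the function name `url_features` is hypothetical.

```python
import re
from urllib.parse import urlparse


def url_features(url):
    """Return a dict of simple lexical features for one URL.

    The feature set is an assumption for illustration only; it is not
    taken from the patent text.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        # A dotted-quad host instead of a domain name
        "has_ip_host": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        # "@" can hide the real host behind a familiar-looking prefix
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
        # Number of non-empty path segments
        "path_depth": len([seg for seg in parsed.path.split("/") if seg]),
    }


if __name__ == "__main__":
    for example in ("http://192.168.0.1/login/verify.php",
                    "https://www.example.com/"):
        print(example, url_features(example))
```

HTML features would be extracted separately from the downloaded page; they are omitted here because the text gives no detail about them.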




Embodiment Construction

[0015] The present invention is further illustrated below with reference to a specific embodiment. It should be understood that this embodiment only illustrates the invention and is not intended to limit its scope. After reading the present invention, modifications in various equivalent forms made by those skilled in the art all fall within the scope defined by the appended claims of the present application.

[0016] The specific implementation steps of this method are as follows:

[0017] Step 1: build the sample data set. The present invention first collects 20,000 valid phishing URLs from the blacklist provided by PhishTank (PhishTank.com) and downloads 20,000 valid normal URLs from the open website classification directory DMOZ (dmoztools.net); together these constitute the URL sample data set D. Since phishing websites generally choose banks, games and e-commerce websites as ph...



Abstract

The invention discloses a phishing website detection method based on multi-feature fusion, which can detect phishing websites on the Internet in real time. First, URL features, HTML features, and TF-IDF-based text vectors are extracted starting from the URL of the webpage. Next, the text vectors are classified using logistic regression, logistic regression features are constructed from the result, and these are fused with the webpage URL features and HTML features. Finally, an XGBoost (eXtreme Gradient Boosting) model is trained and used to classify the websites to be checked. Multiple key features are extracted from multiple dimensions based on the URL of the website. The logistic regression feature fusion method effectively addresses the high dimensionality of the text vectors and greatly improves running efficiency compared with existing feature fusion methods. In addition, the XGBoost classification model further improves phishing website detection accuracy compared with conventional classification models, thereby reducing the false negative rate of phishing website detection.
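The fusion step described above can be sketched as follows, using only the Python standard library. This is a minimal illustration under stated assumptions, not the invention's implementation: the "logistic regression feature" is assumed to be the predicted phishing probability for a page's text, the toy documents and the URL and HTML feature values are made up, and the helper names (`tfidf_vectors`, `train_logreg`, `fuse`) are hypothetical. The final XGBoost classifier is not shown; the fused vectors would be its training input.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Dense TF-IDF vectors for a list of tokenized documents (smoothed IDF)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log((1 + n) / (1 + df[tok])) + 1.0 for tok in vocab}
    vectors = []
    for doc in docs:
        vec = [0.0] * len(vocab)
        for tok, count in Counter(doc).items():
            vec[index[tok]] = (count / len(doc)) * idf[tok]
        vectors.append(vec)
    return vocab, vectors


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logreg(X, y, lr=0.5, epochs=300):
    """Plain stochastic gradient descent on the logistic loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def fuse(url_feats, html_feats, text_prob):
    """Concatenate URL features, HTML features, and the one-dimensional text feature."""
    return list(url_feats) + list(html_feats) + [text_prob]


# Toy tokenized page texts (made up): label 1 = phishing, 0 = normal.
docs = [
    ["verify", "your", "account", "password", "login"],
    ["update", "bank", "account", "password"],
    ["weather", "news", "sports"],
    ["recipes", "cooking", "news"],
]
labels = [1, 1, 0, 0]

vocab, X_text = tfidf_vectors(docs)
w, b = train_logreg(X_text, labels)
probs = [predict_proba(w, b, x) for x in X_text]

# Hypothetical URL and HTML feature values for the first page.
url_feats = [35, 3, 1]
html_feats = [2, 0]
fused = fuse(url_feats, html_feats, probs[0])
print(len(vocab), "text dimensions reduced to 1; fused vector:", fused)
```

The point of the design is that the text contributes one dimension to the fused vector regardless of vocabulary size, which is how the high dimensionality of the TF-IDF vectors is kept out of the final classifier.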

Description

Technical field

[0001] The invention relates to a method for detecting phishing websites based on multi-feature fusion. The method comprehensively extracts the characteristics of phishing websites from multiple dimensions, uses machine learning methods for classification to improve classification accuracy, and can detect phishing websites on the Internet in real time. It belongs to the field of cyberspace security technology.

Background art

[0002] In recent years, with the rapid development of the Internet, the security deficiencies of the Internet architecture have become increasingly apparent, and security issues such as phishing, cybercrime, and privacy leaks have become more and more prominent. There is no national security without cybersecurity, and cyberspace security has become a difficult problem that all countries in the world must face and solve together. Among various network security issues, phishing is a criminal act of stealing ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): H04L29/06, H04L12/24, G06F17/30
CPC: H04L41/147, H04L63/1483
Inventors: 杨鹏, 曾朋, 李幼平, 张长江, 郑斌
Owner: SOUTHEAST UNIV