Multi-source data fused phishing website identification method and system

A phishing website identification technology, applied to phishing website identification methods and systems that fuse multi-source data, which addresses problems such as low identification performance.

Active Publication Date: 2021-06-29
WUHAN UNIV

AI Technical Summary

Problems solved by technology

[0004] In view of the defects of the prior art, the purpose of the present invention is to provide a phishing website identification method and system that fuses multi-source data, in order to solve the problems that existing phishing website identification relies on relatively simple feature selection and achieves low identification performance.

Examples

Detailed Description of the Embodiments

[0042] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0043] The invention provides a phishing website identification method and device based on multi-source data features and URL features, and relates to the technical field of phishing website identification in network security. The method includes: collecting legitimate website URLs and phishing website URLs as training data for the model; extracting the feature data of each URL, including multi-source data, character-level encoding data, and word-level encoding data; and then separately processing the multi-source data features, character-level data features, and word-level data feat...
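The character-level and word-level encoding of a URL mentioned above can be illustrated with a minimal sketch. It assumes integer index encoding with zero padding to a fixed length; the alphabet, the word-splitting rule, the length limits, and the function names (`char_encode`, `split_words`, `build_vocab`, `word_encode`) are all hypothetical choices for illustration and are not taken from the patent text.

```python
import re

# Hypothetical character alphabet; index 0 is reserved for padding.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;=%"
CHAR_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
CHAR_UNKNOWN = len(ALPHABET) + 1  # id used for characters outside the alphabet


def char_encode(url, max_len=20):
    """Map each character of the URL to an integer id, truncated/padded to max_len."""
    ids = [CHAR_INDEX.get(ch, CHAR_UNKNOWN) for ch in url.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))


def split_words(url):
    """Split a URL into lowercase tokens on any non-alphanumeric character (assumed rule)."""
    return [tok for tok in re.split(r"[^A-Za-z0-9]+", url.lower()) if tok]


def build_vocab(urls):
    """Assign an integer id (starting at 1) to each distinct word in the training URLs."""
    vocab = {}
    for url in urls:
        for word in split_words(url):
            vocab.setdefault(word, len(vocab) + 1)
    return vocab


def word_encode(url, vocab, max_len=8):
    """Map each word of the URL to its vocabulary id, truncated/padded to max_len."""
    unknown = len(vocab) + 1  # id used for words not seen when building the vocabulary
    ids = [vocab.get(word, unknown) for word in split_words(url)[:max_len]]
    return ids + [0] * (max_len - len(ids))


# Example with made-up URLs
vocab = build_vocab(["http://example.com/login"])
char_ids = char_encode("http://evil.com")
word_ids = word_encode("http://evil.com", vocab, max_len=4)
```

In a full pipeline, such index sequences would typically be passed through embedding layers to produce the character-level and word-level vector matrices; that step is omitted here.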

Abstract

The invention provides a phishing website identification method and system that fuses multi-source data. The method comprises the following steps: determining phishing website samples and legitimate website samples; collecting multi-source feature data for each website based on the website URL; processing the URL multi-source feature data to obtain a high-dimensional feature vector of the website URL multi-source features; performing word segmentation and character segmentation on the website URL, and splicing the resulting word vector matrix and character-level vector matrix to obtain a high-dimensional feature vector of the website URL; splicing the high-dimensional feature vector of the website URL multi-source features with the high-dimensional feature vector of the website URL to obtain a feature vector for each website; combining the feature vectors and labels of all the websites into a sample data set, inputting the sample data set into a classification model for training, and using the trained classification model as the phishing website identification model; and identifying a website to be identified with the phishing website identification model to judge whether it is a phishing website. The invention provides a high-precision phishing website identification scheme.
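The splicing and training steps in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the classification model, so a small logistic regression trained by stochastic gradient descent stands in for it, and the feature values, feature meanings, and function names (`fuse`, `train_logreg`, `predict`) are hypothetical.

```python
import math


def fuse(multi_source_vec, url_vec):
    """Splice (concatenate) the multi-source feature vector and the URL feature vector."""
    return list(multi_source_vec) + list(url_vec)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=500):
    """Stand-in classifier (assumption): logistic regression fitted by plain SGD."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log loss with respect to the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict(model, x):
    """Return 1 (phishing) if the predicted probability is at least 0.5, else 0."""
    w, b = model
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Made-up samples: (multi-source features, URL features, label); 1 = phishing, 0 = legitimate.
samples = [
    ([0.9, 1.0], [0.1, 0.0], 0),
    ([0.8, 1.0], [0.2, 0.1], 0),
    ([0.1, 0.0], [0.9, 0.6], 1),
    ([0.2, 0.0], [0.8, 0.7], 1),
]

# Build the sample data set from the spliced feature vectors and the labels.
X = [fuse(ms, url) for ms, url, _ in samples]
y = [label for _, _, label in samples]

model = train_logreg(X, y)

# Identify a website that was not in the training set (made-up feature values).
verdict = predict(model, fuse([0.15, 0.0], [0.85, 0.5]))
```

Concatenation keeps both feature groups intact, so the classifier can weigh the multi-source signals and the URL-derived signals jointly rather than in two separate models.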

Description

Technical Field

[0001] The invention belongs to the technical field of phishing website identification, and more specifically relates to a phishing website identification method and system that fuses multi-source data.

Background Art

[0002] A "phishing website" is a form of online fraud in which criminals counterfeit the uniform resource locator (URL) and page content of a real website, or exploit vulnerabilities in a real website's server program to insert dangerous HTML code into some of its web pages, in order to defraud users of private information such as bank or credit card account numbers and passwords.

[0003] In the prior art, the final discrimination result can be obtained through two rounds of model training, which identifies phishing websites efficiently. However, the feature selection of this technique is relatively simple, and richer features are not considered. In addition, the...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/955, G06F40/289, G06F40/30, G06K9/62, G06N3/04
CPC: G06F16/9566, G06F40/289, G06F40/30, G06N3/045, G06F18/251, G06F18/2415
Inventor: 胡忠义, 吴江, 张硕果
Owner: WUHAN UNIV