Word sequence-based phishing URL detection method and system

The word sequence-based detection method belongs to the field of information security. It addresses problems of existing approaches, such as poor robustness, keywords that confuse users, and failure to consider the word sequence characteristics of URLs, and aims to achieve real-time detection, improve detection accuracy, and reduce overhead.

Inactive Publication Date: 2018-05-04
INST OF INFORMATION ENG CAS
5 Cites, 20 Cited by

AI Technical Summary

Problems solved by technology

[0008] Most current word feature-based phishing URL detection methods use words and their frequency of occurrence as features, without considering the word sequence features contained in the URL. These features are also designed manually, which brings certain limitations.
First, manually extracting features requires substantial manpower and resources to statistically analyze and verify the effectiveness of the features. Second, manually extracted features are usually valid only for a certain type of data and have poor robustness. Moreover, the keywords used by attackers are usually similar to those in normal URLs, which can confuse users and reduce the detection efficiency of classification models.
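
The limitation of frequency-only features can be illustrated with a small sketch. The two URLs and the `url_words` helper below are hypothetical and chosen only for illustration; they are not taken from the patent. Both URLs contain the same words, so a word-count feature vector is identical for them, while the ordered word sequence is not.

```python
import re
from collections import Counter

def url_words(url):
    """Split a URL into lowercase alphanumeric words, in order of appearance."""
    return [w for w in re.split(r"[^a-z0-9]+", url.lower()) if w]

# Hypothetical URLs, chosen only to illustrate the point.
url_a = "paypal.com/login"
url_b = "login.com/paypal"

words_a, words_b = url_words(url_a), url_words(url_b)

# Word-frequency features cannot tell the two URLs apart...
same_counts = Counter(words_a) == Counter(words_b)
# ...while the ordered word sequences differ.
same_sequence = words_a == words_b

print(same_counts, same_sequence)  # True False
```

A model that consumes the sequence rather than the counts can, in principle, distinguish a brand name in the host position from the same brand name in the path.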


Examples

Embodiment Construction

[0041] The technical solutions in the embodiments of the present invention will be clearly and completely described below in conjunction with the drawings in the embodiments of the present invention.

[0042] In one embodiment of the present invention, a word sequence-based phishing URL detection method and system are provided. The main steps of the method are:

[0043] (1) Word sequence vector representation: first, use dictionary matching to obtain the keyword sequence contained in the URL; then obtain the vector representation of the URL word sequence by dictionary encoding;

[0044] (2) Model training: using the word sequence vectors obtained in the previous step and the labeled training data, train the word sequence-based bidirectional LSTM model;

[0045] (3) Phishing URL detection: use the trained word sequence-based bidirectional LSTM model to detect whether an unknown URL is a phishing URL.
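
Step (1) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the keyword dictionary `KEYWORDS`, the use of forward maximum matching as the dictionary matching strategy, the padding id `0`, and the fixed sequence length are all assumptions, since this excerpt does not give those details.

```python
import re

# Hypothetical keyword dictionary; the patent's actual dictionary is not shown here.
KEYWORDS = {"login", "paypal", "secure", "account", "update", "my", "domain"}
PAD_ID = 0  # assumed padding id
# Dictionary encoding: each keyword gets a fixed integer id (ids start at 1).
WORD_TO_ID = {w: i + 1 for i, w in enumerate(sorted(KEYWORDS))}

def segment(url, keywords=KEYWORDS):
    """Extract the keyword sequence from a URL by forward maximum matching."""
    longest = max(len(w) for w in keywords)
    words = []
    # Split on delimiters first, then match dictionary words inside each chunk.
    for chunk in re.split(r"[^a-z0-9]+", url.lower()):
        pos = 0
        while pos < len(chunk):
            # Try the longest candidate starting at pos, then shorter ones.
            for end in range(min(len(chunk), pos + longest), pos, -1):
                if chunk[pos:end] in keywords:
                    words.append(chunk[pos:end])
                    pos = end
                    break
            else:
                pos += 1  # no dictionary word starts here; skip one character
    return words

def encode(words, seq_len=6):
    """Map words to ids, then truncate or pad to a fixed length."""
    ids = [WORD_TO_ID[w] for w in words][:seq_len]
    return ids + [PAD_ID] * (seq_len - len(ids))

words = segment("login.mydomain.tld/paypal")
print(words)          # ['login', 'my', 'domain', 'paypal']
print(encode(words))  # [3, 4, 2, 5, 0, 0]
```

The resulting fixed-length integer vector is the kind of input that the training step (2) and the detection step (3) would consume.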

[0046] The system include...

Abstract

The invention provides a word sequence-based phishing URL detection method and system for detecting phishing URLs. The method comprises the following steps: performing word segmentation on a URL character string to obtain a vector representation of its word sequence; automatically learning the context information and features in the word sequence with a deep learning model, without manually extracting word-related text features from URLs; and detecting phishing URLs with the trained model. The method and system address the problems of word feature-based phishing URL detection.
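
The embodiment names a bidirectional LSTM as the deep learning model. To make concrete how such a model reads a word sequence in both directions, here is a forward-pass-only sketch in plain Python. It is not the patent's model: the weights are random and untrained, and the sizes (`VOCAB`, `EMB`, `HID`), the single output unit, and the sigmoid scoring layer are all assumptions. In practice a deep learning framework would be used, and the weights would be learned from labeled URLs.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_params(input_dim, hidden_dim, rng):
    """Random weights for one LSTM: 4 gates, each acting on [x; h_prev]."""
    rows, cols = 4 * hidden_dim, input_dim + hidden_dim
    W = [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return W, [0.0] * rows

def lstm_step(x, h, c, W, b):
    """One LSTM time step; returns the new hidden and cell states."""
    n = len(h)
    xh = x + h  # concatenate input vector and previous hidden state
    z = [sum(w * v for w, v in zip(row, xh)) + bias for row, bias in zip(W, b)]
    i = [sigmoid(v) for v in z[0:n]]            # input gate
    f = [sigmoid(v) for v in z[n:2 * n]]        # forget gate
    g = [math.tanh(v) for v in z[2 * n:3 * n]]  # candidate cell values
    o = [sigmoid(v) for v in z[3 * n:4 * n]]    # output gate
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new

def run_lstm(vectors, W, b, hidden_dim):
    """Run an LSTM over a sequence and return the final hidden state."""
    h, c = [0.0] * hidden_dim, [0.0] * hidden_dim
    for x in vectors:
        h, c = lstm_step(x, h, c, W, b)
    return h

def bilstm_score(ids, embedding, fwd, bwd, out_w, out_b, hidden_dim):
    """Score a word-id sequence: forward pass, backward pass, then sigmoid."""
    vectors = [embedding[i] for i in ids]
    h_fwd = run_lstm(vectors, *fwd, hidden_dim)        # left to right
    h_bwd = run_lstm(vectors[::-1], *bwd, hidden_dim)  # right to left
    features = h_fwd + h_bwd  # concatenate both directions
    return sigmoid(sum(w * v for w, v in zip(out_w, features)) + out_b)

rng = random.Random(0)
VOCAB, EMB, HID = 10, 4, 3  # assumed sizes, for illustration only
embedding = [[rng.uniform(-0.5, 0.5) for _ in range(EMB)] for _ in range(VOCAB)]
fwd, bwd = make_params(EMB, HID, rng), make_params(EMB, HID, rng)
out_w = [rng.uniform(-0.1, 0.1) for _ in range(2 * HID)]

score = bilstm_score([2, 3, 4], embedding, fwd, bwd, out_w, 0.0, HID)
print(0.0 < score < 1.0)  # True: the score can be read as a probability
```

Because each direction has its own weights and the two final states are concatenated, every word's contribution depends on the words both before and after it, which is the context information the abstract refers to.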

Description

Technical field

[0001] The invention relates to the field of information security, in particular to a method and system for detecting phishing URLs based on word sequences.

Background technique

[0002] Phishing URLs are used in phishing, an attempt to obtain sensitive user information such as usernames, passwords, and credit card details by masquerading as a reputable corporate or media website. Phishing URLs usually claim to be from popular social networking sites (YouTube, Facebook, Twitter, etc.), auction sites (eBay), electronic shopping sites (PayPal, Alibaba, etc.), or network managers (Google, Yahoo, Internet service providers, etc.) in order to exploit the victim's credulity. A deception method attackers often use is to embed keywords that confuse users in URLs. For example, attackers use URLs such as "login.mydomain.tld/paypal" to lure PayPal users.

[0003] At present, there are many phishing URL detection methods and security produ...

Application Information

IPC(8): G06F17/27, G06F17/30, G06N3/04, H04L29/06
CPC: H04L63/0236, H04L63/1483, G06F16/9566, G06F40/247, G06F40/284, G06N3/048, G06N3/045
Inventor: 亚静, 柳厅文, 时金桥, 张盼盼, 张振宇, 王玉斌, 李全刚
Owner: INST OF INFORMATION ENG CAS