A comprehensive scheme is disclosed for detecting
phishing emails using features that are invariant and fundamentally characterize
phishing. Multiple embodiments are described herein based on combinations of text analysis, header analysis, and
link analysis, and these embodiments operate between a user's mail
transfer agent (MTA) and mail
user agent (MUA). The inventive embodiment, PhishNet-NLP™, utilizes
natural language techniques along with all information present in an email, namely the header, links, and text in the body. The inventive embodiment, PhishSnag™, uses information extracted from the embedded links in the email and the email headers to detect
phishing. The inventive embodiment, Phish-Sem™, uses
natural language processing and
statistical analysis on the bodies of labeled phishing and non-phishing emails to design four variants of an email-body-text-only classifier. The inventive scheme is designed to detect phishing at the email level.
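
The three information sources named above (header, links, and body text) can be separated with standard-library tooling before any analysis is applied. The following is a minimal sketch under stated assumptions, not the classifiers of the described embodiments: the function name `split_email_sources`, the simplified URL pattern, and the sample message are all hypothetical and serve only to illustrate the inputs each kind of analysis would receive.

```python
import re
from email import message_from_string

# Simplified URL pattern, assumed for illustration only; a fuller link
# extractor would also parse HTML anchors and obfuscated links.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def split_email_sources(raw):
    """Split a raw email into header fields, embedded links, and body text."""
    msg = message_from_string(raw)

    # Header fields as (name, value) pairs, so repeated headers are preserved.
    headers = list(msg.items())

    # Body text: first text/plain part of a multipart message, else the payload.
    # Assumes the part carries no content-transfer-encoding (e.g. base64).
    if msg.is_multipart():
        parts = [p for p in msg.walk() if p.get_content_type() == "text/plain"]
        body = parts[0].get_payload() if parts else ""
    else:
        body = msg.get_payload()

    # Links found in the body text.
    links = URL_RE.findall(body)

    return {"headers": headers, "links": links, "body": body}


# Hypothetical sample message using reserved example domains.
SAMPLE = (
    "From: Support <support@example.com>\n"
    "Reply-To: billing@example.net\n"
    "Subject: Verify your account\n"
    "\n"
    "Please confirm your details at http://example.org/login today.\n"
)

result = split_email_sources(SAMPLE)
```

In this sketch, `result["headers"]` would feed a header analysis, `result["links"]` a link analysis, and `result["body"]` a text analysis; how those analyses are combined is left to the embodiments themselves.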