Various features are provided for analyzing and processing email messages, including determining whether an email message is unwanted and blocking unwanted messages. Email traffic is monitored by analyzing email messages addressed to known invalid email addresses. Email messages addressed to invalid email addresses are sent to a central control site for analysis. One embodiment tries to ensure that the distance between the invalid addresses and the closest valid addresses is large enough that the invalid addresses are not inadvertently used for non-spam purposes. Another embodiment of the invention provides for distributed “thin client” processes that run on computer systems or other processing platforms. The thin clients emulate an open relay computer. Attempts to exploit the apparent open relay computer are reported to a control center, and the relay of email messages can be inhibited. Another embodiment provides for analysis and tuning of rules to detect spam and legitimate email. The approach adjusts various factors according to changing, current email data gathered from present or recent email traffic. Another embodiment takes into account statistics of erroneous and intentional misspellings. Groups of similar content items (e.g., words, phrases, images,
ASCII text, etc.) are correlated, and analysis can proceed after substituting items in a group with other items in the same group, so that a more accurate detection of “sameness” of content can be achieved. Another embodiment uses authentication and security methods for validating email senders, detecting the sameness of messages, tracking the reputation of the sender, and tracking the behavior of the sender. Another embodiment profiles users to intelligently organize user data, including adapting spam detection according to a user's perceived interests.
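
The requirement that invalid addresses lie far enough from the closest valid addresses can be illustrated with a minimal sketch. The abstract does not say how distance is measured, so the sketch assumes Levenshtein edit distance over the lowercased address. The function names (`edit_distance`, `min_distance_to_valid`, `is_safe_decoy`), the example addresses, and the default threshold of 3 are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def min_distance_to_valid(candidate: str, valid_addresses: Sequence[str]) -> int:
    """Distance from a candidate invalid address to its closest valid address."""
    cand = candidate.lower()
    return min(edit_distance(cand, v.lower()) for v in valid_addresses)


def is_safe_decoy(candidate: str,
                  valid_addresses: Sequence[str],
                  min_distance: int = 3) -> bool:
    """True if the candidate is at least min_distance edits from every valid address.

    min_distance is an assumed, tunable threshold, not a value from the source.
    """
    if not valid_addresses:
        return True  # nothing to collide with
    return min_distance_to_valid(candidate, valid_addresses) >= min_distance


if __name__ == "__main__":
    valid = ["jsmith@example.com", "mary@example.com"]
    # One substitution away from a real mailbox: likely to receive mistyped mail.
    print(is_safe_decoy("jsmyth@example.com", valid))
    # Far from every real mailbox: a better candidate for a monitored invalid address.
    print(is_safe_decoy("zq7x_trap@example.com", valid))
```

Under these assumptions, a candidate that differs from a real mailbox by a single typo is rejected, since legitimate senders could reach it by mistake, while a candidate several edits away from every valid address is accepted for monitoring.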