Spam mail filtering system and method based on clusters

A spam filtering and email technology, applied in transmission systems, digital transmission systems, electrical components, etc., that addresses the limited learning and discrimination ability of spam filters.

Status: Inactive; Publication date: 2014-02-05
SOUTH CHINA UNIV OF TECH
Cites: 5; Cited by: 24

AI Technical Summary

Problems solved by technology

Training uniformly on all of the training data without distinction is a common approach for machine learning algorithms in mail filtering and can give a filter good learning ability. However, because the training data are not treated differently, the filter's learning and discrimination ability suffers.

Embodiment

[0083] A cluster-based spam filtering system, as shown in Figure 1, includes:

[0084] The clustering module analyzes the text content of the training emails and divides them into clusters according to topic similarity, so that emails in the same cluster share one or more similar topics;

[0085] The email training module learns from labeled emails to generate feature knowledge;

[0086] The mail filtering module filters newly arriving emails and, based on the feature library, determines whether each email is spam;

[0087] The feature library storage module stores the feature data corresponding to each cluster.
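The patent does not say which learning algorithm the email training module uses or how each cluster's feature library is organized. As a minimal sketch, assuming multinomial naive Bayes as a stand-in learner and a hypothetical `ClusterFeatureLibrary` class, the training, filtering and feature library storage modules listed above could interact like this, with every cluster keeping its own spam/ham statistics:

```python
import math
from collections import Counter, defaultdict


class ClusterFeatureLibrary:
    """Hypothetical per-cluster feature store (naive Bayes is an assumed stand-in)."""

    def __init__(self):
        # cluster_id -> {"spam": Counter, "ham": Counter} of token counts
        self.counts = defaultdict(lambda: {"spam": Counter(), "ham": Counter()})
        # cluster_id -> {"spam": n, "ham": n} number of training emails
        self.docs = defaultdict(lambda: {"spam": 0, "ham": 0})

    def train(self, cluster_id, tokens, label):
        """Training module: update only the library of the email's own cluster."""
        self.counts[cluster_id][label].update(tokens)
        self.docs[cluster_id][label] += 1

    def is_spam(self, cluster_id, tokens):
        """Filtering module: score tokens against one cluster's library only."""
        counts, docs = self.counts[cluster_id], self.docs[cluster_id]
        vocab = set(counts["spam"]) | set(counts["ham"])
        total_docs = docs["spam"] + docs["ham"]
        scores = {}
        for label in ("spam", "ham"):
            # Add-one smoothed log prior, so an empty class never yields log(0)
            score = math.log((docs[label] + 1) / (total_docs + 2))
            # Laplace smoothing; the extra +1 reserves mass for unseen tokens
            denom = sum(counts[label].values()) + len(vocab) + 1
            for tok in tokens:
                score += math.log((counts[label][tok] + 1) / denom)
            scores[label] = score
        return scores["spam"] > scores["ham"]


# Usage: two clusters trained independently
lib = ClusterFeatureLibrary()
lib.train(0, ["cheap", "pills", "buy"], "spam")
lib.train(0, ["meeting", "agenda", "monday"], "ham")
lib.train(1, ["invoice", "payment"], "ham")
print(lib.is_spam(0, ["buy", "cheap"]))      # True
print(lib.is_spam(0, ["meeting", "agenda"]))  # False
print(lib.is_spam(1, ["cheap"]))              # False: cluster 1 has no spam evidence
```

Keeping statistics per cluster is what gives the "differentiated treatment" of training data: a token such as `cheap` counts as spam evidence in cluster 0 but carries no spam weight in cluster 1.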

[0088] As shown in Figure 2, to better implement the present invention, the clustering module comprises:

[0089] The clustering preprocessing module decodes the training emails and the unlabeled emails to be clustered, and expresses them in t...
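Paragraph [0089] is cut off before it names the representation, so the following is only a sketch under stated assumptions: decoding uses Python's standard `email` package, and each email becomes a bag-of-words term-frequency vector. `extract_text` and `to_vector` are hypothetical names. The `\w+` tokenizer does not segment Chinese text into words; Chinese mail would need a dedicated word segmenter.

```python
import re
from collections import Counter
from email import message_from_string, policy


def extract_text(raw):
    """Decode a raw email: subject plus every text/plain part, with
    base64 / quoted-printable transfer encodings undone."""
    msg = message_from_string(raw, policy=policy.default)
    pieces = [str(msg.get("Subject", ""))]
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)  # bytes after transfer decoding
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            pieces.append(payload.decode(charset, errors="replace"))
        except LookupError:  # unknown charset name: fall back to UTF-8
            pieces.append(payload.decode("utf-8", errors="replace"))
    return " ".join(pieces)


def to_vector(text):
    """Bag-of-words term-frequency vector (an assumed representation)."""
    return Counter(re.findall(r"\w+", text.lower()))


# Usage: a base64-encoded body whose decoded text is "Buy cheap pills now"
raw = (
    "Subject: Cheap pills\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "QnV5IGNoZWFwIHBpbGxzIG5vdw==\n"
)
vec = to_vector(extract_text(raw))  # cheap: 2, pills: 2, buy: 1, now: 1
```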

Abstract

The invention discloses a cluster-based spam filtering system and method. The system comprises a clustering module, a mail training module, a feature library storage module and a mail filtering module. The method comprises the following steps: S1.1 training emails and unlabeled emails are obtained from a mail backup system; S1.2 a clustering preprocessing module preprocesses the emails; S1.3 a cluster analysis module divides the preprocessed emails into clusters; S1.4 a cluster center computing module computes a vector representation of each cluster; S1.5 the training module learns from the labeled emails in each cluster and updates the feature library of that cluster; S2.1 the emails to be filtered are obtained from a mail system; S2.2 a mail class attribute discrimination module determines the cluster most similar to the email content; S2.3 a mail feature extraction module extracts features from the emails to be classified; S2.4 a mail discrimination module outputs the classification result based on the email features and the feature library of the corresponding cluster. The system and method offer fast feature extraction, high accuracy and good filtering performance.
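The abstract names neither the clustering algorithm nor the similarity measure. The sketch below assumes cosine similarity over sparse term-frequency vectors and swaps in simple single-pass (leader) clustering for step S1.3; the mean vector of each cluster plays the role of the cluster's vector representation in step S1.4, and step S2.2 picks the centroid most similar to a new email. `leader_cluster`, `nearest_cluster` and the `threshold` value are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts/Counters."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """S1.4 (sketch): a cluster's vector is the mean of its member vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {term: count / len(vectors) for term, count in total.items()}


def leader_cluster(vectors, threshold=0.3):
    """S1.3 (sketch): join the most similar cluster if the similarity reaches
    `threshold` (a hypothetical value); otherwise start a new cluster."""
    clusters, centers = [], []
    for v in vectors:
        sims = [cosine(v, c) for c in centers]
        if sims and max(sims) >= threshold:
            best = sims.index(max(sims))
            clusters[best].append(v)
            centers[best] = centroid(clusters[best])
        else:
            clusters.append([v])
            centers.append(centroid([v]))
    return clusters, centers


def nearest_cluster(vector, centers):
    """S2.2 (sketch): index of the centroid most similar to a new email."""
    return max(range(len(centers)), key=lambda i: cosine(vector, centers[i]))


# Usage: two "pharmacy" emails and two "meeting" emails form two clusters
emails = [
    Counter(cheap=2, pills=1),
    Counter(cheap=1, pills=2),
    Counter(meeting=1, agenda=1),
    Counter(meeting=2, monday=1),
]
clusters, centers = leader_cluster(emails)
print([len(c) for c in clusters])                # [2, 2]
print(nearest_cluster(Counter(pills=1), centers))  # 0
```

Once `nearest_cluster` has chosen a cluster, steps S2.3 and S2.4 would score the email only against that cluster's feature library.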

Description

Technical field

[0001] The invention relates to the technical field of spam filtering, in particular to a cluster-based spam filtering system and method.

Background

[0002] With the popularity of e-mail, spammers use very cheap means to send large volumes of spam across the network, which consumes network bandwidth, interferes with users' normal use of email and poses a potential threat to user security.

[0003] At present, most machine-learning-based email filtering systems train a single specific learning algorithm once on the entire training email set and then classify new emails using the feature library built by that algorithm. Training uniformly on all the training data without distinction is a common practice for machine learning algorithms in mail filtering and can give filters good learning ability, but because of the lack of differentiated treatment of training dat...

Application Information

Patent type & authority: Application (China)
IPC(8): G06F17/27; H04L29/06; H04L12/58
Inventor: 董守斌许腾张晶张凌隆承志
Owner: SOUTH CHINA UNIV OF TECH