Junk mail detection method and system based on dynamic update of categorizer

A spam detection technology based on dynamic classifier updating, applied in transmission systems, digital transmission systems, electrical components, etc. It addresses the problem that classification vectors with good classification performance cannot be retained, and achieves flexible classification and accurate classification results.

Inactive Publication Date: 2008-12-03
PEKING UNIV

AI Technical Summary

Problems solved by technology

[0008] From the above analysis, it can be concluded that prior-art spam classifier update methods use newly received data to dynamically update the classifier, which reflects changes in the continuously arriving data stream and reflects the characteristics of ...



Examples


Embodiment 1

[0040] The dynamic spam detection method proposed by the invention is based on the principle of detectors and memory cells in the immune system. The proposed detection system resembles the immune system in pattern recognition, dynamic adaptation, and noise tolerance.

[0041] The support vector machine (SVM) is a classifier grounded in statistical learning theory with excellent generalization performance, and it has been applied successfully in many fields. This embodiment improves on the prior-art incremental support vector machine technique by updating the classifier according to the principle of detectors and memory cells in the immune system.

[0042] In the prior-art incremental support vector machine technique for updating the classifier, the initial classifier is constructed from training samples labeled as normal email or spam, and the classifier includes several classification vectors representing email classes. ...

Embodiment 2

[0049] In this embodiment, a support vector machine is used to construct the classifiers. The initial construction is not limited to a single classifier; several classifiers are used. They can all be constructed from training samples, or one classifier can be constructed first and the remaining classifiers then built from the continuously received mails. This embodiment adopts the latter.

[0050] Since several classifiers are used in this embodiment, as new emails arrive the oldest classifiers are discarded and new classifiers are generated using the newly received emails as training samples. The arrangement can therefore be regarded as a sliding window that carries different classifiers. The update of classifiers in this embodiment includes the following aspects:

[0051] 1) Sliding update of the classifier in the window

[0052] In this embodiment, the stream data of mails received is considered as grouped batch data, and t...
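The sliding window of classifiers described in [0050] can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the class name `SlidingClassifierWindow`, the `train` callable, and the majority vote used to combine the classifiers are all assumptions, since the visible text does not say how the classifiers in the window are combined. In the embodiment, `train` would build a support vector machine from one batch of mails.

```python
from collections import Counter, deque


class SlidingClassifierWindow:
    """Hypothetical sliding window holding the most recent classifiers."""

    def __init__(self, size, train):
        # train: callable taking a batch of (features, label) pairs and
        # returning a classifier, i.e. a callable features -> label.
        # In the embodiment this would train an SVM (assumption here).
        self.train = train
        # deque(maxlen=size) drops the oldest classifier automatically
        # when a new one is appended to a full window.
        self.window = deque(maxlen=size)

    def add_batch(self, batch):
        """Train a classifier on a newly received batch and slide the window."""
        self.window.append(self.train(batch))

    def predict(self, features):
        """Combine the window by majority vote (an assumed combination rule)."""
        if not self.window:
            raise ValueError("no classifier has been trained yet")
        votes = Counter(clf(features) for clf in self.window)
        return votes.most_common(1)[0][0]
```

With a window size of 2, adding a third batch discards the classifier trained on the first batch, so predictions reflect only the two most recent batches.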



Abstract

The invention relates to a method and a system for detecting spam based on dynamic updating of a classifier. The method includes the following steps: a classifier composed of classification vectors is constructed; the similarity between the feature vector of the mail to be detected and each classification vector is computed; the mail is classified according to the classification vector with the highest similarity; the user's feedback on the classification of the mail is acquired; the number of correct classifications made by each classification vector is counted; newly received mails are classified according to the same steps, and the classifier is updated when set conditions are met, with any classification vector whose number of correct classifications exceeds a set value being retained to classify newly received mails. The system includes a classifier updating unit, which updates the classifier when the set conditions are met and retains, for future mail classification, any classification vector whose number of correct classifications exceeds the set value. The invention keeps classification vectors with good classification performance in the classifier for a period of time, which preserves classification accuracy and avoids being limited by the new data stream.
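The steps in the abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the patented method: cosine similarity stands in for the unspecified similarity measure, and the names `ClassVector`, `classify`, `record_feedback`, `update`, and `keep_threshold` are hypothetical. How the new vectors passed to `update` are produced is left out of the sketch.

```python
import math
from dataclasses import dataclass


@dataclass
class ClassVector:
    vector: list[float]
    label: str        # e.g. "spam" or "ham"
    correct: int = 0  # user-confirmed correct classifications so far


def cosine(a, b):
    """Cosine similarity (an assumed measure; the abstract names none)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def classify(classifier, features):
    """Return the vector most similar to the mail's feature vector."""
    return max(classifier, key=lambda cv: cosine(cv.vector, features))


def record_feedback(best, user_label):
    """Count a correct classification when the user's feedback agrees."""
    if best.label == user_label:
        best.correct += 1


def update(classifier, new_vectors, keep_threshold):
    """Retain vectors whose correct count exceeds the threshold, then add new ones."""
    retained = [cv for cv in classifier if cv.correct > keep_threshold]
    return retained + list(new_vectors)
```

A vector that has been confirmed correct more than `keep_threshold` times survives the update, which loosely mirrors the role of memory cells described in the embodiments.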

Description

Technical field

[0001] The invention relates to the technical field of e-mail processing, in particular to a spam detection method and system based on dynamic updating of a classifier.

Background

[0002] With the increasing popularity of the Internet, e-mail has become an important medium of daily communication and one of the most convenient means of communication, largely replacing traditional paper letters, and people rely on it more and more. However, the emergence of spam has become a growing problem that seriously threatens normal e-mail communication. The spread of spam not only wastes a great deal of storage space and communication bandwidth, but also costs users a great deal of time to process and delete it. It is therefore necessary and meaningful to study methods for detecting and filtering spam.

[0003] Spam classification detection is essentially a pattern recognition problem. T...


Application Information

IPC(8): H04L12/58, H04L12/26
Inventor 谭营, 阮光尘
Owner PEKING UNIV