Spam filtering method based on a support vector machine

A spam filtering technology based on a support vector machine, applied in electrical components, transmission systems, office automation, and similar fields. It addresses the problem that existing spam filtering methods do not use support vector machines, and achieves the effects of improving classification accuracy and increasing weight values.

Inactive Publication Date: 2008-01-16
ZHEJIANG UNIV
Cites: 7; Cited by: 53

AI Technical Summary

Problems solved by technology

The spam filtering technology adopted in the above patents does n



Examples


Embodiment Construction

[0029] The main principles of the present invention are as follows:

[0030] 1) In the email preprocessing stage, that is, at the feature level, conventional methods such as decoding, word segmentation, and feature selection are used. The preprocessed email includes the title and the body text, and information such as whether the email contains attachments, pictures, audio, or video is also extracted.

[0031] 2) At the model level, an SVM is used for training and classification. Training yields an SVM model, that is, the classification hyperplane that separates spam from normal mail.

[0032] 3) To address the problem of unequal costs, that is, the cost of misjudging normal mail as spam is much greater than the cost of misjudging spam as normal mail, a threshold-setting method is adopted, that is, only when the probability of spam mail is compared with normal mail When the probability of an e...
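The model-level and threshold steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: a small linear SVM trained by subgradient descent on the hinge loss stands in for the SVM training step, a plain logistic function of the decision value stands in for a calibrated spam probability, and the 0.9 cutoff, the toy feature vectors, and all function names are hypothetical. Because the exact comparison rule is cut off in the text above, the sketch assumes the simplest variant, a fixed cutoff on the spam probability.

```python
import math


def train_linear_svm(samples, labels, epochs=200, lr=0.01, lam=0.01):
    """Subgradient descent on the L2-regularised hinge loss (labels are +1 / -1)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            w = [wi * (1.0 - lr * lam) for wi in w]  # regularisation shrink
            if margin < 1.0:  # inside the margin or misclassified
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def spam_probability(w, b, x):
    """Uncalibrated stand-in for a probability: logistic of the decision value."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-score))


def is_spam(p_spam, threshold=0.9):
    """Flag mail only when the spam probability clears a deliberately high bar."""
    return p_spam >= threshold


# Toy 2-D feature vectors (hypothetical): +1 = spam, -1 = normal mail.
X = [(2.0, 0.0), (3.0, 1.0), (0.0, 2.0), (1.0, 3.0)]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)

for mail in [(10.0, 0.0), (2.0, 0.0), (0.0, 4.0)]:
    p = spam_probability(w, b, mail)
    print(mail, round(p, 3), "spam" if is_spam(p) else "normal")
```

With a cutoff well above 0.5, mail that sits only just on the spam side of the hyperplane is still delivered as normal. Some spam is missed, but fewer normal messages are misjudged as spam, which is the more expensive error.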


Abstract

The invention discloses a spam filtering method based on a support vector machine (SVM). The steps are as follows: 1) parse the mail and extract the title, body text, and character-set information; 2) perform word segmentation on the extracted text content; 3) count word frequencies in the mail and use the TF-IDF formula to map the mail text to a vector; 4) use LibSVM to train on the mail samples and obtain a support vector machine model; 5) use the support vector machine model to classify new mail and obtain the probability that it is spam; 6) use threshold adjustment to keep the rate at which normal mail is misjudged as spam (the false positive rate) low, and make the final judgment on whether the mail is spam. The invention exploits the support vector machine's advantage of having the highest classification accuracy among single models, improves the accuracy of spam filtering on the basis of text features and activity features, and at the same time effectively addresses the problem of unequal misclassification costs in spam filtering.
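Steps 1 to 3 of the abstract can be sketched as follows, using only the Python standard library. This is a hedged illustration, not the patent's code: the function names are hypothetical, the regex tokenizer is a stand-in for a real word segmenter (which Chinese-language mail would need), and the TF-IDF variant shown (term frequency normalised by document length, idf = log(N / df)) is an assumption, since the abstract does not state which formula is used.

```python
import math
import re
from collections import Counter
from email import message_from_string
from email.header import decode_header, make_header


def extract_text(raw_email):
    """Step 1: return the decoded subject and the plain-text body of a raw message."""
    msg = message_from_string(raw_email)
    subject = str(make_header(decode_header(msg.get("Subject", ""))))
    body_parts = []
    for part in msg.walk():
        if part.is_multipart() or part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"
        try:
            body_parts.append(payload.decode(charset, errors="replace"))
        except LookupError:  # unknown charset label in the header
            body_parts.append(payload.decode("utf-8", errors="replace"))
    return subject, "\n".join(body_parts).strip()


def tokenize(text):
    """Step 2 stand-in: lowercase alphanumeric tokens (not a real word segmenter)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(tokenized_docs):
    """Step 3: map each token list to a TF-IDF vector over a shared sorted vocabulary."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))  # count each term once per document
    vocab = sorted(doc_freq)
    idf = {term: math.log(n_docs / doc_freq[term]) for term in vocab}
    vectors = []
    for tokens in tokenized_docs:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty mail
        vectors.append([counts[term] / total * idf[term] for term in vocab])
    return vocab, vectors


# Hypothetical sample messages: subject and body are concatenated before tokenizing.
raw_mails = [
    "Subject: Free money\n\nfree money now\n",
    "Subject: Meeting\n\nmoney report for the meeting\n",
]
token_lists = [tokenize(" ".join(extract_text(raw))) for raw in raw_mails]
vocabulary, mail_vectors = tfidf_vectors(token_lists)
```

Under this idf variant, a term that appears in every message gets weight zero, so only terms that distinguish messages contribute to the vectors passed on to training.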

Description

Technical field

[0001] The invention relates to a spam filtering method, in particular to a spam filtering method based on a support vector machine.

Background

[0002] Since the popularization of the Internet, e-mail has gradually become one of the convenient means of communication in people's lives. However, the resulting spam has spread like a plague, polluting the network environment, occupying large amounts of transmission, storage, and computing resources, and affecting the normal operation of the network. Spam arrives in large volumes, is repetitive, coercive, deceptive, and unhealthy, and spreads quickly. It therefore seriously interferes with people's normal lives, wastes users' time and energy, and can even cause considerable extra economic expenditure and information security risks. For these reasons, spam filtering technology has become one of the important research topics in the development of the Internet.

[0003] Spam filtering can be...


Application Information

IPC(8): H04L12/58, H04L29/06, G06F17/30, G06Q10/00, G06Q10/10
Inventor: 陆冠中, 徐从富, 王金龙
Owner: ZHEJIANG UNIV