Spam email filtering system and method capable of intelligently selecting training samples

A technology relating to spam filtering and training samples, applied in transmission systems, digital transmission systems, electrical components, etc. It addresses the problems of complex labeling and the inability to learn selectively from given samples, and improves accuracy.

Active Publication Date: 2013-06-19
CHINA TELECOM CORP LTD
Cites: 3; Cited by: 28

AI Technical Summary

Problems solved by technology

[0006] The traditional passive learning mode faces a sample selection problem, for example, the manual la...




Embodiment Construction

[0023] In a mail filtering system, the more accurate the filter (that is, the classification model), the more accurately mail is classified. The filter is constructed by learning from a training sample set whose categories are known, so sample quality is very important: it directly affects the accuracy of the mail filter. The sample selection method of the present invention improves the accuracy of mail filtering, that is, it improves the precision of the classifier.

[0024] For a spam filtering system, the present invention provides a method and system for intelligently selecting unlabeled samples to add to classification-model training.

[0025] The spam filtering system of the present invention adds a training set management module, an active sample selection module, a sample category management module and a feedback module to the traditional mail preprocessing, word segmentation, feature se...
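The paragraph names four added modules but the excerpt does not say what each one does. The sketch below is a hypothetical illustration of how they could share one round of active selection; the responsibilities assigned to each module, and all names (`TrainingSetManager`, `active_selection`, `category_management`, `feedback`, `one_round`), are assumptions, not the patent's design.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical sketch: module responsibilities are assumed for illustration.
Sample = str  # a preprocessed email body


@dataclass
class TrainingSetManager:
    """Assumed role: hold the labeled training set and the unlabeled pool."""
    labeled: List[Tuple[Sample, str]] = field(default_factory=list)
    unlabeled: List[Sample] = field(default_factory=list)

    def promote(self, sample: Sample, label: str) -> None:
        # Move a sample from the unlabeled pool into the labeled training set.
        self.unlabeled.remove(sample)
        self.labeled.append((sample, label))


def active_selection(pool: List[Sample], uncertainty: Callable[[Sample], float],
                     p: int) -> List[Sample]:
    # Assumed role: pick the p unlabeled samples with the largest uncertainty.
    return sorted(pool, key=uncertainty, reverse=True)[:p]


def category_management(samples: List[Sample],
                        annotate: Callable[[Sample], str]) -> List[Tuple[Sample, str]]:
    # Assumed role: obtain a category (e.g. from a human annotator) per sample.
    return [(s, annotate(s)) for s in samples]


def feedback(manager: TrainingSetManager, sample: Sample, corrected_label: str) -> None:
    # Assumed role: a user-corrected misclassification becomes a labeled sample.
    if sample in manager.unlabeled:
        manager.unlabeled.remove(sample)
    manager.labeled.append((sample, corrected_label))


def one_round(manager: TrainingSetManager, uncertainty: Callable[[Sample], float],
              annotate: Callable[[Sample], str], p: int) -> None:
    # One selection round: choose, label, and add to the training set.
    chosen = active_selection(manager.unlabeled, uncertainty, p)
    for sample, label in category_management(chosen, annotate):
        manager.promote(sample, label)


# Toy usage: string length stands in for a real uncertainty score.
manager = TrainingSetManager(labeled=[("seed spam", "spam")], unlabeled=["a", "bb", "ccc"])
one_round(manager, uncertainty=len, annotate=lambda s: "ham", p=2)
print(manager.labeled)
print(manager.unlabeled)
```

After the round, the retrained classifier would supply the next uncertainty function; that retraining step is left out here to keep the module boundaries visible.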



Abstract

The invention provides a spam email filtering system and method capable of intelligently selecting training samples. A sample set labeled with categories and a sample set without category labels are initialized. The samples in the labeled set are used as a training set to train an initial classification model, and the model is used to compute the uncertainty of every sample in the unlabeled set, where uncertainty means how uncertain it is which category the sample belongs to. The P unlabeled samples with the greatest uncertainty are selected and labeled with categories; these newly labeled samples are added to the final training set, and a new classification model is constructed on the final training set. The model is used to filter emails, determining whether an email is spam or legitimate. The system and method avoid learning from samples that contribute little to classification, and improve the accuracy of the classification model.
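The abstract does not state how uncertainty is measured. A common choice for a two-class spam/legitimate filter is the binary entropy of the model's posterior P(spam), which peaks at P = 0.5 and vanishes when the model is certain. The sketch below assumes that measure and shows the "select the P most uncertain samples" step; the function names and the posterior values are hypothetical.

```python
import math
from typing import Dict, List


def binary_entropy(p_spam: float) -> float:
    """Uncertainty of a two-class posterior: 1.0 bit at p = 0.5, 0.0 at p = 0 or 1."""
    if p_spam <= 0.0 or p_spam >= 1.0:
        return 0.0
    q = 1.0 - p_spam
    return -(p_spam * math.log2(p_spam) + q * math.log2(q))


def select_top_p(posteriors: Dict[str, float], p: int) -> List[str]:
    """Return the ids of the p unlabeled samples with the largest uncertainty."""
    ranked = sorted(posteriors, key=lambda sid: binary_entropy(posteriors[sid]),
                    reverse=True)
    return ranked[:p]


# Hypothetical P(spam) values produced by the initial classification model.
posteriors = {"mail-1": 0.97, "mail-2": 0.52, "mail-3": 0.10, "mail-4": 0.45}
print(select_top_p(posteriors, 2))  # → ['mail-2', 'mail-4']
```

Samples the model already classifies confidently (`mail-1`, `mail-3`) are skipped, which is the sense in which the method avoids spending labeling effort on samples that add little to the classifier.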

Description

Technical field

[0001] The invention relates to the technical field of anti-spam, in particular to a system and method that, while constructing a classifier model, intelligently selects samples from an unlabeled sample set for training, and performs spam filtering.

Background art

[0002] Spam filtering is a classification and filtering problem based on text content. In essence, it comes down to two processes, training and classification, namely:

[0003] Process 1: the training process, in which word segmentation, feature selection and learning are performed on a large number of labeled samples (samples with known category labels) to construct a classifier;

[0004] Process 2: the classification process, in which the classifier is used to predict the category of unknown samples.

[0005] Sample quality is therefore crucial and directly affects classification accuracy.

[0006] The traditional passive learning mode faces sample selection problems, for example, the manual labeling of unlabe...
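The two processes in [0003] and [0004] can be illustrated with a minimal sketch. The excerpt does not name a classifier, so this assumes a multinomial naive Bayes model with Laplace smoothing; whitespace tokenization stands in for word segmentation, and keeping the `k` tokens with the highest document frequency stands in for feature selection. All names and training data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def segment(text: str) -> List[str]:
    # Stand-in for word segmentation: lowercase whitespace tokenization.
    return text.lower().split()


def train(samples: List[Tuple[str, str]], k: int = 1000):
    """Process 1: segment, select features, and learn a naive Bayes model."""
    # Feature selection (assumed): keep the k tokens with the highest document frequency.
    doc_freq = Counter(t for text, _ in samples for t in set(segment(text)))
    vocab = {t for t, _ in doc_freq.most_common(k)}
    class_docs = Counter(label for _, label in samples)
    token_counts: Dict[str, Counter] = {c: Counter() for c in class_docs}
    for text, label in samples:
        token_counts[label].update(t for t in segment(text) if t in vocab)
    priors = {c: math.log(n / len(samples)) for c, n in class_docs.items()}
    likelihoods = {}
    for c, counts in token_counts.items():
        total = sum(counts.values()) + len(vocab)  # Laplace-smoothed denominator
        likelihoods[c] = {t: math.log((counts[t] + 1) / total) for t in vocab}
    return vocab, priors, likelihoods


def classify(model, text: str) -> str:
    """Process 2: predict the category of an unknown email."""
    vocab, priors, likelihoods = model
    scores = {c: priors[c] + sum(likelihoods[c][t] for t in segment(text) if t in vocab)
              for c in priors}
    return max(scores, key=scores.get)


training = [
    ("win a free prize now", "spam"),
    ("cheap prize offer click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the project agenda", "ham"),
]
model = train(training)
print(classify(model, "free prize offer"))        # → spam
print(classify(model, "monday project meeting"))  # → ham
```

Because the model is learned only from the labeled set, its accuracy depends directly on which samples are in that set, which is the point made in [0005].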


Application Information

IPC(8): H04L12/58; G06F17/30
Inventor: 吕娣
Owner: CHINA TELECOM CORP LTD