Email binary classification algorithm based on active learning and negative selection

A negative selection and classification algorithm technology, applied in text database clustering/classification, computing, computer components, etc. It addresses the problems of low generalization performance, the high economic cost of labeling, and the large impact of personal preferences on classification results, and aims to accelerate the mail classification process, improve classification accuracy, and reduce CPU processing time.

Inactive Publication Date: 2018-05-25
CHANGCHUN UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0004] Defects of existing methods and inventions: 1) Expert labeling is economically costly and cannot effectively cover large-scale problems, while unlabeled sample data is abundant and easy to obtain; 2) Traditional machine learning algorithms in existing solutions, especially supervised learning algorithms, require a large number of labeled samples, otherwise generalization performance is low; 3) In spam filtering, the user's personal preferences have a large impact on the classification results; 4) When labeling samples manually online, experts cannot directly choose the best time to label.



Examples


Embodiment Construction

[0047] In order to make the objectives, technical solutions, and advantages of the present invention clearer, the following further describes the present invention in detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, but not to limit the present invention.

[0048] The two-class mail classification algorithm based on active learning and negative selection proposed by the present invention includes the following steps:

[0049] S1. Use a mail set S_0 consisting of legitimate mail and spam to establish the user's positive interest set P and negative interest set N. The specific process is as follows:

[0050] S11. For each mail S_0j (S_0j ∈ S_0, 1 ≤ j ≤ |S_0|, where |S_0| denotes the number of elements in the set S_0), remove the attachments, tags, punctuation marks, special symbols, and stop words; the remaining text is segmented into words and each word is reduced to its root to form S ...
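The cleaning described in S11 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the stop-word list, the suffix rules used in place of a real stemmer, and the function names `strip_markup`, `naive_stem`, and `preprocess` are all hypothetical, and attachments are assumed to have been discarded before the mail body reaches this code.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in",
              "is", "are", "for", "on", "this", "you"}

# Hypothetical suffix rules standing in for a real stemmer such as Porter's.
SUFFIXES = ("ing", "ed", "es", "s")


def strip_markup(text):
    """Replace anything that looks like an HTML/XML tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(mail_body):
    """Return the stemmed, stop-word-free tokens of one mail body."""
    text = strip_markup(mail_body).lower()
    # Keeping only alphabetic runs drops punctuation, digits, and special symbols.
    tokens = re.findall(r"[a-z]+", text)
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]
```

Calling `preprocess` on every mail in S_0 yields one token list per mail. The crude suffix stripping produces stems such as "priz" for "prizes", which is acceptable only as long as the same rule is applied consistently to every mail.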


Abstract

The invention relates to an email binary classification algorithm based on active learning and negative selection. First, the user's bidirectional interest sets are established from labeled email sets. Next, a binary classification algorithm is constructed using the anomaly detection mechanism of the negative selection algorithm, with the to-be-classified email sets serving as self-sets for matching detection. Finally, the classification results are obtained from the matching results, and the user's bidirectional interest sets are updated. In applying active learning and negative selection to spam filtering, the to-be-classified email sets serve as the self-sets, the user's positive and negative interest sets constructed from existing labeled email sets serve as detectors, and the key feature sets of all emails, screened by a key feature selection algorithm, serve as the classified objects; the classification results of the email sets are obtained through an anomaly detection matching mechanism. The algorithm performs bidirectional binary matching detection on the email sets through the positive and negative interest sets, providing a new approach to spam filtering.
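The bidirectional matching idea in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the claimed algorithm: the overlap-count matching rule, the "uncertain" outcome (which an active learning loop might pass to the user for labeling), the update rule, and the names `classify`, `update_interest_sets`, and `classify_batch` are all hypothetical. Each mail is assumed to be already reduced to a set of key features.

```python
def classify(features, positive, negative):
    """Match one mail's key features against both interest sets (detectors)."""
    pos_hits = len(features & positive)
    neg_hits = len(features & negative)
    if neg_hits > pos_hits:
        return "spam"
    if pos_hits > neg_hits:
        return "legitimate"
    # Assumption: ties are left undecided rather than forced into a class.
    return "uncertain"


def update_interest_sets(features, label, positive, negative):
    """Grow the matching interest set in place, keeping P and N disjoint."""
    if label == "legitimate":
        positive |= features - negative
    elif label == "spam":
        negative |= features - positive


def classify_batch(mails, positive, negative):
    """Classify mails in order, updating the interest sets after each one."""
    results = {}
    for mail_id, features in mails.items():
        label = classify(features, positive, negative)
        update_interest_sets(features, label, positive, negative)
        results[mail_id] = label
    return results


if __name__ == "__main__":
    P = {"meeting", "report"}
    N = {"prize", "lottery"}
    batch = {"m1": {"meeting", "agenda"}, "m2": {"prize", "claim"}}
    print(classify_batch(batch, P, N))
```

Because the interest sets are updated after every decision, the order in which mails are processed can change later results. That is a property of this sketch and is not confirmed by the source.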

Description

Technical field

[0001] The invention relates to a binary mail classification algorithm based on active learning and negative selection. The user's positive and negative interest sets are constructed using the active learning method from machine learning, and are combined with the self-set and detection mechanism of the negative selection algorithm to achieve fast and efficient spam filtering. The invention belongs to the cross-technology application field of machine learning and text classification.

Background

[0002] Text classification technology uses computer programs to automatically classify and mark text collections (or other entities or objects) according to a certain classification system or standard, so that people can better understand, coordinate, and reasonably use network text information. In recent years, text classification technology has been widely used in fields such as mail classification, information filtering, and text corpus c...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27, G06K9/62
CPC: G06F16/335, G06F16/35, G06F40/279, G06V10/757, G06F18/22, G06F18/214
Inventor: 邱宁佳, 王鹏, 田文山, 胡小娟, 杨迪, 李松江, 杨华民
Owner: CHANGCHUN UNIV OF SCI & TECH