Bayesian Spam Filtering Method Based on Improved Feature Evaluation Function
A spam filtering method based on an improved feature evaluation function, applicable to special-purpose data processing, electrical digital data processing, instruments, and related fields. It addresses three weaknesses of the conventional evaluation function: it does not consider how many times a term occurs, it does not account for the differing contributions of feature items to distinguishing categories, and it represents negative correlation poorly. The stated aim is efficient and accurate filtering.
Examples
Embodiment Construction
[0031] The Bayesian spam filtering method based on the improved feature evaluation function comprises the following steps:
[0032] 1) Preprocess the training mail set: divide each mail into two sub-text sets S_1 and S_2, the mail header and the body, and perform word segmentation on each to form two feature item sets T_1 and T_2;
2) In the two feature item sets T_1 and T_2, use a stop-word list to delete prepositions, pronouns, adverbs, auxiliary words, and conjunctions, and also delete words whose frequency is lower than a given threshold p; the processed feature item sets are denoted T_1' and T_2';
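Steps 1) and 2) can be sketched in Python as follows. This is only an illustration under stated assumptions: the stop-word list `STOP_WORDS`, the regex tokenizer (a stand-in for a real word segmenter), and the function names `split_mail`, `tokenize`, and `build_feature_sets` are all hypothetical and do not come from the patent text.

```python
import re
from collections import Counter
from email import message_from_string

# Hypothetical stop-word list; the patent's actual list is not given.
STOP_WORDS = {"the", "a", "of", "to", "and", "in", "it", "is", "very"}


def split_mail(raw):
    """Split a raw message into header text (S_1) and body text (S_2)."""
    msg = message_from_string(raw)
    header_text = " ".join(str(value) for value in msg.values())
    payload = msg.get_payload()
    # Multipart messages return a list; this sketch handles plain text only.
    body_text = payload if isinstance(payload, str) else ""
    return header_text, body_text


def tokenize(text):
    """Assumed stand-in for word segmentation: lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_feature_sets(raw_mails, p=2):
    """Return (T_1', T_2'): header and body terms after pruning."""
    header_counts, body_counts = Counter(), Counter()
    for raw in raw_mails:
        s1, s2 = split_mail(raw)
        header_counts.update(tokenize(s1))
        body_counts.update(tokenize(s2))

    def prune(counts):
        # Drop stop words and terms whose frequency is below threshold p.
        return {t for t, n in counts.items() if n >= p and t not in STOP_WORDS}

    return prune(header_counts), prune(body_counts)
```

For example, `build_feature_sets(mails, p=2)` keeps only the terms that occur at least twice across the training set and are not in the stop-word list, separately for headers and bodies.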
[0033] 3) In the feature item sets T_1' and T_2', use the improved feature evaluation function to calculate the mutual information value MI(t_k)':
[0034] 3a) Let the feature vector set be T = {t_k, k = 1, 2, ..., n}, and obtain the training-set category set C = {c_j, j = 1, 2, ..., r} from the network file text database;
[0035] 3b) Use the formula (1) to cal...
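For orientation, the classic mutual information score that feature evaluation functions of this kind start from can be sketched as below. This is the standard textbook form, MI(t, c) = log(P(t, c) / (P(t) P(c))), estimated from document counts. It is not the patent's improved formula (1), which is not reproduced here. The function name `mutual_information` and the representation of each document as a set of terms are assumptions made for illustration.

```python
import math


def mutual_information(docs, labels, term, category):
    """Classic pointwise MI between a term and a category.

    docs   : list of sets of terms (one set per training mail)
    labels : list of category names, aligned with docs
    Estimates log(P(t, c) / (P(t) * P(c))) from document counts.
    """
    n = len(docs)
    n_tc = sum(1 for d, c in zip(docs, labels) if term in d and c == category)
    n_t = sum(1 for d in docs if term in d)
    n_c = sum(1 for c in labels if c == category)
    if n_tc == 0 or n_t == 0 or n_c == 0:
        # The term never co-occurs with the category.
        return float("-inf")
    return math.log((n_tc * n) / (n_t * n_c))
```

Because each document is a set, this classic form records only whether a term is present and ignores how often it occurs, which is one of the weaknesses the summary says the improved function addresses.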