Junk mail classification method based on partial match estimation

A spam classification technology, applied in computer parts, electrical components, instruments, etc. It addresses the speed, efficiency, and accuracy problems of existing methods, as well as their lack of incremental learning, with the effect of good accuracy and fast prediction speed.

Inactive Publication Date: 2009-01-14
ZHEJIANG UNIV
0 Cites, 18 Cited by

AI Technical Summary

Problems solved by technology

[0008] However, these three methods have problems in terms of speed and efficiency, none of them is an incremental learning method, and their accuracy is questionable.

Embodiment Construction

[0033] The spam filtering method based on partial matching prediction includes the following steps:

[0034] 1) Convert the new mail into characters of the ASCII table with values in the range 032-127. Any word in the original mail that falls outside the range 032-127 is converted into a character string corresponding to one of the ASCII values from 001 to 031. After conversion, a character string consisting of characters with ASCII values from 001 to 127 is obtained;

[0035] 2) Retrieve the existing spam training set, normal mail training set, spam prediction set and normal mail prediction set;

[0036]-[0037] 3) The normal mail training set is trained into a normal mail model through the partial matching prediction algorithm, and the spam training set is passed through the partial matching prediction a...
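
Step 1) above can be illustrated with a minimal Python sketch. The source does not say which control code in 001-031 is used or how a "word" is delimited, so the choices here are assumptions: the hypothetical helper `normalize_mail` works character by character, and it collapses each run of out-of-range characters into a single placeholder `\x01`.

```python
PLACEHOLDER = "\x01"  # assumption: any code in 001-031 would satisfy the text


def normalize_mail(text: str, placeholder: str = PLACEHOLDER) -> str:
    """Map a mail body onto characters with codes 001-127.

    Characters with codes 032-127 are kept unchanged. Every maximal run of
    other characters (non-ASCII text, control codes, NUL) is replaced by a
    single placeholder character.
    """
    out = []
    in_run = False  # True while we are inside a run of out-of-range characters
    for ch in text:
        if 32 <= ord(ch) <= 127:
            out.append(ch)
            in_run = False
        elif not in_run:
            out.append(placeholder)
            in_run = True
    return "".join(out)


if __name__ == "__main__":
    # repr() makes the placeholder visible when printed
    print(repr(normalize_mail("Buy 便宜 now")))
```

Collapsing runs keeps the output short and gives the downstream model one symbol per foreign-language fragment; mapping each character individually would be an equally valid reading of the step.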

Abstract

The invention discloses a spam filtering method based on partial matching prediction, comprising: 1) converting a new mail into a character string composed of the characters with ASCII values 001-127; 2) retrieving the existing spam training set, normal mail training set, spam prediction set and normal mail prediction set; 3) training the normal mail training set and the spam training set into a normal mail model and a spam model, respectively, through a partial matching prediction algorithm; 4) computing the cross entropy of the converted character string against the spam model and against the normal mail model, obtaining two cross entropy values; 5) classifying the new mail as spam or normal mail according to the model with the smaller cross entropy; 6) after classifying the new mail, adding it to the prediction set and performing incremental learning to obtain a new model. The invention effectively avoids misclassifying normal mail as spam.
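
Steps 3) to 6) of the abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the class `PPMModel`, the context order of 3, the use of escape method C, and the tie-breaking rule are all hypothetical choices, and the sketch omits the symbol exclusion used by full partial-matching predictors, so its probabilities are slight underestimates. It also folds a classified mail straight into the matching model instead of maintaining a separate prediction set.

```python
import math
from collections import defaultdict


class PPMModel:
    """Simplified character-level partial-matching model (escape method C)."""

    def __init__(self, order: int = 3, alphabet_size: int = 127):
        self.order = order                  # assumption: longest context length
        self.alphabet_size = alphabet_size  # codes 001-127 from step 1
        # counts[context][symbol] = times symbol followed context in training
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, text: str) -> None:
        """Add one mail to the model; calling it again is incremental learning."""
        for i, ch in enumerate(text):
            for k in range(self.order + 1):
                if i - k < 0:
                    break
                self.counts[text[i - k:i]][ch] += 1

    def prob(self, history: str, ch: str) -> float:
        """Probability of ch after history, backing off from long to short contexts."""
        escape = 1.0
        for k in range(min(self.order, len(history)), -1, -1):
            seen = self.counts.get(history[len(history) - k:])
            if not seen:
                continue  # context never observed: back off at no cost
            total = sum(seen.values())
            distinct = len(seen)
            if ch in seen:
                return escape * seen[ch] / (total + distinct)
            escape *= distinct / (total + distinct)  # method C escape mass
        return escape / self.alphabet_size  # uniform fallback over the alphabet

    def cross_entropy(self, text: str) -> float:
        """Average bits per character needed to encode text with this model."""
        if not text:
            return 0.0
        bits = 0.0
        for i, ch in enumerate(text):
            history = text[max(0, i - self.order):i]
            bits -= math.log2(self.prob(history, ch))
        return bits / len(text)


def classify(text: str, spam_model: PPMModel, normal_model: PPMModel) -> str:
    """Pick the model with the smaller cross entropy (ties go to normal mail)."""
    h_spam = spam_model.cross_entropy(text)
    h_normal = normal_model.cross_entropy(text)
    return "spam" if h_spam < h_normal else "normal"


def classify_and_learn(text: str, spam_model: PPMModel, normal_model: PPMModel) -> str:
    """Classify a mail, then update the model of the predicted class with it."""
    label = classify(text, spam_model, normal_model)
    (spam_model if label == "spam" else normal_model).update(text)
    return label


if __name__ == "__main__":
    spam, normal = PPMModel(), PPMModel()
    for mail in ["buy cheap pills now", "cheap pills cheap offer", "win money now"]:
        spam.update(mail)
    for mail in ["meeting at noon tomorrow", "please review the attached report"]:
        normal.update(mail)
    print(classify_and_learn("cheap pills offer", spam, normal))
```

Because `update` only increments counts, a new mail can be absorbed without retraining from scratch, which is the property the abstract refers to as incremental learning. Sending ties to normal mail is one way to lean toward the stated goal of not flagging normal mail by mistake.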

Description

Technical field

[0001] The invention relates to a spam filtering method, in particular to a spam classification method based on partial matching prediction.

Background technique

[0002] E-mail owes its popularity mainly to its convenience, speed and low cost; with the spread of the Internet, it has gradually become one of the everyday means of communication. However, in recent years, with the vigorous promotion of electronic informatization in large traditional industries, spam in information systems has inevitably increased exponentially. Spam has the following characteristics: it is high in volume, repetitive, coercive, deceptive, unhealthy and fast spreading. It therefore seriously interferes with people's normal life and poses a serious threat to the information network. Because the types of spam are becoming more and more complex and diverse, studying spam classification has become an important research topic in recent yea...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): H04L12/58, G06K9/66
Inventor: 任沁清, 彭鹏, 陆冠中, 徐从富
Owner: ZHEJIANG UNIV