Reverse neural network spam filtering device based on an N-Gram word segmentation model

A spam filtering technology based on neural networks, applied in the field of reverse neural network spam filtering devices. It addresses the problems of spam cluttering users' mailboxes and wasting users' time, and aims to describe email characteristics more completely.

Inactive Publication Date: 2010-12-29
UNIV OF ELECTRONICS SCI & TECH OF CHINA
0 Cites, 41 Cited by

AI Technical Summary

Problems solved by technology

Spam is what users hate most: it wastes users' time, money, and network bandwidth, and it clutters their mailboxes. Some emails are even harmful, such as those containing pornographic content or viruses.

Embodiment Construction

[0016] In order to make the object, technical solution, and advantages of the present invention clearer, the present invention will be further described in detail below with reference to the accompanying drawings and examples.

[0017] Figure 1 is a schematic diagram of word-document space generation in the present invention. The specific process includes:

[0018] Step 101: N-Gram word segmentation of the sample emails.

[0019] Email content word segmentation is divided into Chinese email word segmentation and English email word segmentation. In written English, spaces act as natural delimiters between words and punctuation marks act as semantic delimiters, so English word segmentation is relatively simple: remove the punctuation marks from the English email body and scan from the beginning, treating the text between two spaces as a word. A single scan to the end of the text yields the word list of the English email. Compar...
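The English segmentation procedure in [0019] can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `segment_english` and `char_ngrams` are hypothetical, and the character N-gram helper is a generic sliding window shown for contrast with text that has no space delimiters. It is not the Markov-chain-based Chinese segmentation of the invention, whose description is truncated in this record.

```python
import string

def segment_english(body: str) -> list[str]:
    """Remove punctuation, then treat the text between spaces as words."""
    # Delete every ASCII punctuation character, as described in [0019].
    no_punct = body.translate(str.maketrans("", "", string.punctuation))
    # split() scans the text once and cuts it at runs of whitespace.
    return no_punct.split()

def char_ngrams(text: str, n: int = 2) -> list[str]:
    """Generic N-gram window over characters (assumption, not the patent's method)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

words = segment_english("Win money now! Click here, friend.")
bigrams = char_ngrams("垃圾邮件", 2)
print(words)    # ['Win', 'money', 'now', 'Click', 'here', 'friend']
print(bigrams)  # ['垃圾', '圾邮', '邮件']
```

Deleting punctuation outright follows the wording of [0019]; a practical tokenizer might instead replace punctuation with spaces so that strings such as "now,click" do not fuse into one token.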

Abstract

The invention relates to the technical field of text processing, in particular to a reverse neural network spam filtering device based on an N-Gram word segmentation model. Customized word feature items are added on top of N-Gram word segmentation of the mail, and spam is judged and filtered by combining these features with a reverse neural network. The device is implemented by the following steps. First, the mails are processed using a Markov chain and the N-Gram technique, mail sample features are extracted, and a sample mail word-document space is obtained through weight calculation and feature selection. Second, the mail samples are matched against the customized word feature items to generate a customized feature-document space, and the document features generated by the two methods are combined into a new mail vector space. Third, a reverse neural network model is constructed, feature vectors corresponding to the network neurons are generated according to the feature items of the mail training sample space, and the network model is trained on the mail training sample vector space to obtain a trained mail classifier. Finally, a test sample vector space is generated from the mail test samples according to the feature vectors corresponding to the network neurons, and it is used to test the accuracy of the trained classifier's mail type judgments. The embodiment of the invention can identify spam and thereby filter it.
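The pipeline in the abstract can be illustrated with a minimal sketch, not the patented implementation. Several things here are assumptions: "reverse neural network" is read as a back-propagation network; the weight calculation is plain relative term frequency, since the record does not give the actual weighting or feature selection; and all names (`build_vocabulary`, `mail_vector`, `TinyBackpropNet`, `custom_terms`) and the toy mails are hypothetical.

```python
import math
import random
from collections import Counter

def build_vocabulary(token_lists):
    """Collect the word feature items seen in the training mails."""
    return sorted({t for tokens in token_lists for t in tokens})

def mail_vector(tokens, vocabulary, custom_terms):
    """Word-document part (relative term frequency) plus custom part (0/1 match)."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    word_part = [counts[t] / total for t in vocabulary]
    custom_part = [1.0 if term in counts else 0.0 for term in custom_terms]
    return word_part + custom_part

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyBackpropNet:
    """One hidden layer and one sigmoid output (estimated probability of spam)."""

    def __init__(self, n_inputs, n_hidden=3, seed=0):
        rng = random.Random(seed)
        # Each row holds the weights of one neuron; the last entry is its bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xb = list(x) + [1.0]  # append the constant bias input
        hb = [sigmoid(sum(w * v for w, v in zip(row, xb)))
              for row in self.w_hidden] + [1.0]
        y = sigmoid(sum(w * v for w, v in zip(self.w_out, hb)))
        return xb, hb, y

    def predict(self, x):
        return self.forward(x)[2]

    def train_step(self, x, target, lr=0.5):
        xb, hb, y = self.forward(x)
        delta_out = y - target  # cross-entropy gradient at the output pre-activation
        # Propagate the error backwards before the output weights change.
        delta_hidden = [delta_out * self.w_out[j] * hb[j] * (1.0 - hb[j])
                        for j in range(len(self.w_hidden))]
        for j in range(len(self.w_out)):
            self.w_out[j] -= lr * delta_out * hb[j]
        for j, row in enumerate(self.w_hidden):
            for i in range(len(row)):
                row[i] -= lr * delta_hidden[j] * xb[i]

# Toy training sample space: (tokens, label) with 1 = spam, 0 = legitimate.
samples = [("win money now".split(), 1), ("meeting at noon".split(), 0)]
vocabulary = build_vocabulary(tokens for tokens, _ in samples)
custom_terms = ["free", "win"]  # hypothetical customized word feature items
vectors = [mail_vector(tokens, vocabulary, custom_terms) for tokens, _ in samples]

net = TinyBackpropNet(n_inputs=len(vectors[0]))
for epoch in range(500):
    for x, (_, label) in zip(vectors, samples):
        net.train_step(x, label)

p_spam, p_ham = net.predict(vectors[0]), net.predict(vectors[1])
```

Concatenating the two feature lists mirrors the abstract's "new mail vector space": each input neuron then corresponds to one feature item, either a word from the vocabulary or a customized term. A test mail would be vectorized with the same `vocabulary` and `custom_terms` before being passed to `predict`.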

Description

Technical field

[0001] The invention relates to Internet technology, in particular to a reverse neural network spam filtering device based on an N-Gram word segmentation model.

Background art

[0002] With the wide application of the Internet, e-mail is favored for being fast, simple, and cheap, and it has become an efficient medium for mass communication. At the same time, large numbers of useless emails pour into people's mailboxes and seriously disrupt their study and daily life. Spam is what users hate most: it wastes users' time, money, and network bandwidth, and it clutters their mailboxes. Some emails are even harmful, such as those containing pornographic content or viruses. According to relevant research reports, more than 10% of the emails sent worldwide every day are spam. It is therefore necessary to find a method for effectively intercepting and filtering spam.

[0003] Anti-spam technology can be divided into two categories...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06Q10/00, H04L12/58, G06F17/27, G06N3/02, G06Q10/10
Inventor: 程红蓉, 张凤荔, 王娟, 马秋明
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA