Method for detecting image spam email by picture character and local invariant feature

A spam detection technique based on local invariant features, applied in computer parts, electrical components, digital transmission systems, etc. It addresses the problems of a large amount of calculation and high algorithmic time complexity, with the effects of saving program running time and space and improving precision and recall.

Inactive Publication Date: 2010-11-17
NANJING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

Junk pictures are distinguished by calculating a threshold value. Although this method uses statistical knowledge to calculate more accurately, ...



Examples


Embodiment Construction

[0030] Image spam is detected based on local invariant features of pictures. VC++ 6.0 is used as the development tool; the OpenCV 1.0 open-source library is used to process image features, and the MFC class library is used to extract the text in pictures. The detailed steps are as follows:

[0031] 1. Training phase: Obtain junk pictures and normal pictures to form a training set, and train on it to form a stacked classifier.

[0032] a) Text feature extraction stage:

[0033] Step 1) To recognize the characters in the pictures, the optical character recognition module provided by Microsoft Corporation is used. We made several improvements to the interface of this module for use in our invention: it now supports batch processing of data sets, and unrecognizable special symbols are removed from the extracted text;

[0034] Step 2) Improve the optical character recognition module, and store the pictures that can be accurately extracted and the words that cann...
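The batch clean-up described in Step 1 could look roughly like the sketch below. It is only an illustration under stated assumptions: the patent does not say which symbols count as "unrecognizable", so the character whitelist here is a guess, and the function names `clean_ocr_text` and `clean_batch` as well as the `{picture_name: raw_text}` input shape are hypothetical. The OCR step itself is not shown; the input is assumed to be text already produced by an OCR module.

```python
import re

# Assumption: anything other than letters, digits, whitespace and a few
# common punctuation marks is treated as an unrecognizable special symbol.
_NOISE = re.compile(r"[^A-Za-z0-9\s.,!?$%'-]")


def clean_ocr_text(raw):
    """Replace noise symbols with spaces, then collapse runs of whitespace."""
    text = _NOISE.sub(" ", raw)
    return re.sub(r"\s+", " ", text).strip()


def clean_batch(ocr_results):
    """Clean every entry of a {picture_name: raw_ocr_text} mapping."""
    return {name: clean_ocr_text(raw) for name, raw in ocr_results.items()}


if __name__ == "__main__":
    sample = {"a.png": "Buy ~#ch3ap|  meds\n now!!", "b.png": "  ^^^ "}
    print(clean_batch(sample))
```

A picture whose cleaned text comes back empty is kept in the result with an empty string, so the caller can decide how to handle pictures from which no usable text was extracted.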


Abstract

The invention provides a method for detecting image spam email using the local invariant features of pictures together with the text embedded in them. A scale-invariant feature transform (SIFT) algorithm extracts the invariant region features of the junk information in each picture to generate the picture's feature vector, and a support vector machine classifier is trained and tested on these vectors. The text embedded in each picture is mined with optical character recognition, the resulting string is taken as a second feature of the picture, and a Bayesian classifier is trained and tested on it. The feature vector of each picture thus consists of its local invariant features and its text string, forming a feature vector library that combines the two, and the two classifiers are combined by a stacking method to detect image spam email. Experiments show that the method improves the recall rate for spam email and saves program running time and space.
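The stacking step mentioned in the abstract can be illustrated with a minimal sketch. Everything specific in it is an assumption rather than the patent's implementation: the source does not name the meta-level learner, so a small logistic regression is used here; the function names are hypothetical; and the level-0 outputs (one SVM score and one Bayesian spam probability per picture) are made-up numbers standing in for the outputs of the two trained base classifiers.

```python
import math


def train_meta_classifier(level0_outputs, labels, epochs=500, lr=0.5):
    """Fit a logistic-regression meta-learner on base-classifier outputs.

    level0_outputs: list of (svm_score, bayes_probability) tuples.
    labels: 1 for spam, 0 for normal.
    """
    weights = [0.0] * len(level0_outputs[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(level0_outputs, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted spam probability
            err = p - y                      # gradient of the log loss w.r.t. z
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict_spam(model, level0_output):
    """Return 1 (spam) or 0 (normal) for one pair of base-classifier outputs."""
    weights, bias = model
    z = bias + sum(w * xi for w, xi in zip(weights, level0_output))
    return 1 if z > 0 else 0


# Hypothetical level-0 outputs for six training pictures.
train_outputs = [(0.9, 0.8), (0.7, 0.95), (0.8, 0.3),
                 (0.2, 0.1), (0.1, 0.4), (0.3, 0.2)]
train_labels = [1, 1, 1, 0, 0, 0]

model = train_meta_classifier(train_outputs, train_labels)
print(predict_spam(model, (0.95, 0.9)))
print(predict_spam(model, (0.05, 0.05)))
```

In a real stacking setup the level-0 outputs used to train the meta-learner would normally come from held-out or cross-validated predictions, so that the meta-learner does not simply learn to trust base classifiers that have memorized the training set.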

Description

Technical Field

[0001] The present invention combines the local invariant features of spam pictures with the text embedded in the pictures, applies a different classification method to each, and integrates the two results to detect image-type spam. It mainly addresses the low detection efficiency and low recall rate of current image-spam detection techniques, and belongs to the field of data mining and machine learning.

Background

[0002] E-mail has become an important way for people to communicate on the Internet, but because of the huge commercial, economic and political interests involved, the amount of spam has grown rapidly. Early image-based spam embedded spam information such as advertisements in an image in the form of text. Hrishikesh et al. used mined text and color features to classify mail. In 2006, Fumera et al. proposed an OCR (Optical Character Recognition) technology to detect the ...


Application Information

IPC(8): G06K9/62, H04L12/58
Inventor: 张卫丰, 王慕妮, 周国强, 张迎周, 王宗辉, 杨波, 韩蕊, 许碧欢, 陆柳敏
Owner: NANJING UNIV OF POSTS & TELECOMM