Bayesian algorithm-based content filtering method

A content filtering technique based on a Bayesian algorithm, applied in the field of information security. It addresses problems such as spam harassment and achieves fast classification and efficient operation while maintaining accuracy.

Publication Date: 2011-03-30 (status: Inactive)
SOUTHEAST UNIV
2 Cites, 30 Cited by

AI Technical Summary

Problems solved by technology

With the advent of the 3G era, users can browse and look up more and more information on their mobile phones, but they also face harassment from all kinds of spam.


Embodiment Construction

[0046] The text classification method of the present invention is applied to the text content recovered by protocol parsing, and belongs to content filtering technology. For content whose result is questionable, a warning flag is output to the user; for information detected as safe, a safety flag is output. Classifying text information can be regarded as a specific application of pattern recognition and of text classification technology. The flow of the designed text classification algorithm is shown in Figure 1.

[0047] The spam text filtering system is implemented as two sub-modules: a feature library module (background) and a text classification module (foreground). The two modules are linked through a feature file. The feature file is generated by the background feature library module and records not only the feature entries but also the weights of the entries in ...
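The split between a background feature library module and a foreground classifier can be illustrated with a minimal Python sketch. It is not the patent's implementation: the JSON file layout, the function names, the `min_df` cut-off, and the choice of a Laplace-smoothed per-class document-frequency ratio as the "weight" are all assumptions made for illustration. Only the use of document frequency (DF) for feature selection comes from the abstract.

```python
import json
from collections import Counter


def build_feature_library(docs, labels, min_df=2):
    """Select terms by document frequency and compute per-class weights.

    docs   -- list of token lists, one per training message
    labels -- parallel list of class names, e.g. "normal" / "spam"
    min_df -- hypothetical cut-off: keep terms seen in at least this many docs
    """
    # Document frequency: number of messages that contain each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    vocab = sorted(t for t, n in df.items() if n >= min_df)

    classes = sorted(set(labels))
    docs_per_class = Counter(labels)
    term_docs = {c: Counter() for c in classes}
    for tokens, c in zip(docs, labels):
        term_docs[c].update(set(tokens))

    # Assumed weight: smoothed estimate of P(term present | class).
    weights = {
        t: {c: (term_docs[c][t] + 1) / (docs_per_class[c] + 2) for c in classes}
        for t in vocab
    }
    priors = {c: docs_per_class[c] / len(docs) for c in classes}
    return {"priors": priors, "weights": weights}


def save_feature_file(library, path):
    """Write the feature library to a JSON feature file (assumed format)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(library, fh, ensure_ascii=False, indent=2)


def load_feature_file(path):
    """Read a feature file back, as the foreground classifier would."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

In this arrangement the background module can be re-run on new training data and overwrite the feature file, while the foreground classifier only ever reads it, which keeps the real-time path free of training work.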


Abstract

The invention discloses a content filtering method based on a Bayesian algorithm. Content filtering is performed on text information in a third-generation (3G) mobile communication core network, and text classification is performed with a double-threshold Bayesian algorithm. C1 denotes normal information and C2 denotes junk information. The classifier estimates the probability that a feature vector X representing a data sample belongs to each class Ci, using the Bayesian formula P(Ci|X) = P(X|Ci) P(Ci) / P(X), where 1 ≤ i ≤ 2. The largest of the posterior probabilities is called the maximum posterior probability. For each class, only one quantity needs to be calculated (the expression is missing from the source text, which shows a broken reference in its place), and the feature vector X of an unknown sample is assigned to the class Ci with the minimum risk value. Feature selection uses document frequency (DF), and classification uses a minimum-risk double-threshold Bayesian decision. In a time division-synchronous code division multiple access (TD-SCDMA) mobile internet content monitoring system, the algorithm is more controllable and can classify massive volumes of text information efficiently in real time.
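A double-threshold decision on the posterior P(C2|X) might look like the following Python sketch. The abstract does not say how the two thresholds are defined or how the risk values are computed, so this sketch makes several assumptions: a naive (independent-feature) likelihood, the hypothetical thresholds `lower` and `upper`, and a third "suspicious" outcome for posteriors that fall between them.

```python
import math


def posterior_spam(tokens, priors, likelihoods):
    """Return P(spam | X) from Bayes' formula under a naive independence assumption.

    priors      -- {"normal": P(C1), "spam": P(C2)}
    likelihoods -- {term: {"normal": P(term|C1), "spam": P(term|C2)}}
    Terms absent from the message are ignored (a simplification).
    """
    present = set(tokens)
    log_score = {}
    for c in ("normal", "spam"):
        s = math.log(priors[c])
        for term, p in likelihoods.items():
            if term in present:
                s += math.log(p[c])
        log_score[c] = s
    # Normalising by the sum plays the role of dividing by P(X).
    m = max(log_score.values())
    unnorm = {c: math.exp(v - m) for c, v in log_score.items()}
    return unnorm["spam"] / (unnorm["normal"] + unnorm["spam"])


def classify(tokens, priors, likelihoods, lower=0.3, upper=0.9):
    """Double-threshold decision; `lower` and `upper` are illustrative values."""
    p = posterior_spam(tokens, priors, likelihoods)
    if p >= upper:
        return "spam"
    if p <= lower:
        return "normal"
    return "suspicious"  # between the thresholds: flag rather than decide


# Toy parameters, invented for this example only.
priors = {"normal": 0.5, "spam": 0.5}
likelihoods = {
    "prize": {"normal": 0.05, "spam": 0.8},
    "meeting": {"normal": 0.6, "spam": 0.05},
    "free": {"normal": 0.3, "spam": 0.5},
}
print(classify(["prize"], priors, likelihoods))
print(classify(["free"], priors, likelihoods))
```

Raising `upper` makes the filter less likely to mark a normal message as junk, at the cost of sending more messages to the "suspicious" band. This is one way a pair of thresholds could provide the controllability the abstract mentions.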

Description

Technical field

[0001] The method of the invention is a minimum-risk double-threshold Bayesian decision method. It detects and classifies the content of text information in the packet domain of the mobile communication network, providing efficient, real-time supervision of text content. It belongs to the field of information security.

Background art

[0002] A survey released by the Internet Society of China shows that Chinese mobile phone users receive an average of 8.29 spam messages per week. China is the largest mobile communication market in the world, with more than 443 million mobile phone users. At a charge of 0.15 yuan per message, spam brings operators more than 78 million yuan of income every day. With the advent of the 3G era, users can browse and look up more and more information on their mobile phones, but they also face harassment from all kinds of spam. If w...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27, H04W24/00
Inventor: 黄杰, 蒲文静, 王平, 霍贵超
Owner: SOUTHEAST UNIV