Method and system for filtering sensitive web page based on multiple classifier amalgamation

A multi-classifier fusion and webpage filtering technology, applied to identifying webpages that contain sensitive information. It addresses the unsatisfactory results and high false recognition rate of existing sensitive webpage recognition, and the inability of existing fusion methods to fundamentally improve the recognition rate. It offers fast processing time and good application prospects.

Active Publication Date: 2009-12-02
INST OF AUTOMATION CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

[0006] Like the mechanical filtering methods, the above methods do not make good use of web page features and do not currently achieve satisfactory results. For example, text-based sensitive webpage recognition cannot correctly identify normal webpages that relate to sensitive topics, and image-based sensitive webpage recognition has a high false positive rate. Existing fusion algorithms combine classifier results only through AND or OR operations, which cannot fundamentally improve the recognition rate.
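
To illustrate the limitation of AND/OR fusion described above, here is a minimal Python sketch. The sample data, the function names, and the error counts are made up for illustration; they are not taken from the patent. The sketch only shows that a Boolean AND or OR of two classifier decisions trades one kind of error for the other.

```python
from typing import Callable, List, Tuple

# Each sample: (text_says_sensitive, image_says_sensitive, truly_sensitive).
# All values are invented for illustration only.
SAMPLES: List[Tuple[bool, bool, bool]] = [
    (True, True, True),     # both classifiers correct on a sensitive page
    (True, False, True),    # image classifier misses a sensitive page
    (False, True, True),    # text classifier misses a sensitive page
    (True, False, False),   # text false alarm on a normal page
    (False, True, False),   # image false alarm on a normal page
    (False, False, False),  # both classifiers correct on a normal page
]


def fuse_and(text: bool, image: bool) -> bool:
    """Flag a page only when both classifiers flag it."""
    return text and image


def fuse_or(text: bool, image: bool) -> bool:
    """Flag a page when either classifier flags it."""
    return text or image


def error_counts(fuse: Callable[[bool, bool], bool]) -> Tuple[int, int]:
    """Return (false_positives, false_negatives) of a fusion rule on SAMPLES."""
    fp = sum(1 for t, i, y in SAMPLES if fuse(t, i) and not y)
    fn = sum(1 for t, i, y in SAMPLES if not fuse(t, i) and y)
    return fp, fn


if __name__ == "__main__":
    print("AND fusion (fp, fn):", error_counts(fuse_and))
    print("OR fusion  (fp, fn):", error_counts(fuse_or))
```

On this toy data, AND removes the false alarms but misses two sensitive pages, while OR catches every sensitive page but keeps both false alarms. Neither rule uses how confident each classifier is, which is the gap a score-level fusion is meant to close.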



Examples


Embodiment Construction

[0039] The present invention will be described in detail below in conjunction with the accompanying drawings. It should be noted that the described embodiments are only intended to facilitate the understanding of the present invention, rather than limiting it in any way.

[0040] As shown in Figure 4, the sensitive web page filtering system of the present invention, based on the fusion of multiple classifiers, includes: a data stream acquisition and preprocessing unit 1, which generates the text stream and image stream of the original web page and, on that basis, assigns the original web page to a web page style; an image and text stream filtering unit 2, which, for each web page style, uses the corresponding classifiers to identify text and images; and an image filter and text filter information fusion unit 3, which, for mixed web page styles, combines the image filter and the text filter through a fusion formula to obtain the final recognition result of whether the page belongs to the sensitive class. To sum...
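
The three units described in [0040] can be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the style names `"text"`, `"image"`, and `"mixed"`, the rule in `page_style`, the weighted-average fusion, and the `text_weight` and `threshold` parameters are all hypothetical. The patent's actual style division uses a decision tree, and its fusion formula is not given in this excerpt.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Page:
    text: str               # text stream from preprocessing (unit 1)
    image_urls: List[str]   # image stream from preprocessing (unit 1)


def page_style(page: Page) -> str:
    """Unit 1, simplified: assign a style from the two streams.

    Hypothetical rule and style names; the source uses a decision tree.
    """
    has_text = bool(page.text.strip())
    has_images = bool(page.image_urls)
    if has_text and has_images:
        return "mixed"
    if has_images:
        return "image"
    return "text"


def filter_page(
    page: Page,
    text_clf: Callable[[str], float],
    image_clf: Callable[[List[str]], float],
    text_weight: float = 0.5,   # assumed parameter, not from the source
    threshold: float = 0.5,     # assumed parameter, not from the source
) -> bool:
    """Return True if the page is judged sensitive.

    Both classifiers are assumed to return a score in [0, 1].
    """
    style = page_style(page)
    if style == "text":
        score = text_clf(page.text)            # unit 2: text filter only
    elif style == "image":
        score = image_clf(page.image_urls)     # unit 2: image filter only
    else:
        # Unit 3: placeholder weighted average standing in for the
        # patent's fusion formula, which this excerpt does not show.
        score = (text_weight * text_clf(page.text)
                 + (1.0 - text_weight) * image_clf(page.image_urls))
    return score >= threshold
```

The point of the dispatch is that fusion is only needed for mixed pages; pages dominated by one modality go straight to the matching classifier.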


Abstract

The invention discloses a sensitive webpage filtering system and method based on multi-classifier fusion. The processing object is a webpage, and the processing result is whether the webpage contains sensitive content. Sensitive content here can be defined as pornographic, reactionary, violent, or other unhealthy Internet content. The system includes a data stream acquisition and preprocessing unit, an image and text stream filtering unit, and an image filter and text filter information fusion unit. The method first obtains the source code of the web page and, in the preprocessing stage, splits the text and images to obtain text information and effective image information. A decision tree algorithm then divides the input web page into one of three styles. A continuous text classifier, a discrete sensitive text classifier, and an image classifier identify the webpage; a fusion calculation over the output of each classifier gives the discriminant factor; and the final result is returned to the browser.
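
The preprocessing step in the abstract, splitting a page's source code into text and images, might look like the following Python sketch built on the standard library's `html.parser`. The class name `TextImageSplitter` and the function `split_page` are hypothetical. The sketch collects every `<img>` source it finds and does not attempt the patent's selection of "effective" images, whose criteria are not stated in this excerpt.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class TextImageSplitter(HTMLParser):
    """Collect visible text and <img> sources from HTML source code."""

    def __init__(self) -> None:
        super().__init__()
        self.text_parts: List[str] = []
        self.image_urls: List[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:  # ignore <img> tags without a usable src
                self.image_urls.append(src)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not script or stylesheet content.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def split_page(html: str) -> Tuple[str, List[str]]:
    """Return (text stream, image URL stream) for one page's source."""
    parser = TextImageSplitter()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.image_urls
```

For example, `split_page('<p>Hello</p><img src="a.jpg">')` yields the text stream and the list of image sources that the later classifiers would consume.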

Description

Technical field

[0001] The invention relates to the technical field of information filtering, in particular to a method for identifying webpages containing sensitive information.

Background technique

[0002] Because sensitive information on the Internet has caused great harm to Internet users, especially young people, it has attracted extensive attention from researchers and industry.

[0003] There are currently many sensitive information filtering methods, including black and white lists, IP filtering, keyword matching, and other filtering methods. Generally speaking, on the one hand, these filtering technologies adopt a very mechanical approach, which can achieve 100% filtering efficiency for some sensitive web pages with a very short response time. However, the filtering parameters can only be updated after actual sensitive websites appear or change, so these methods cannot cope with the rapid changes of actual sensitive websites. On the other ...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30, G06K9/62, H04L12/26
Inventor: 胡卫明, 陈周耀, 吴偶, 朱明亮
Owner: INST OF AUTOMATION CHINESE ACAD OF SCI