Method and system for filtering sensitive web page based on multiple classifier amalgamation

A multi-classifier fusion and web filtering technology, applied to identifying web pages that contain sensitive information. It addresses the unsatisfactory results of existing methods, their high misrecognition rate for sensitive web pages, and their poor use of web page features, and it offers fast processing time and good application prospects.

Active Publication Date: 2008-10-08
INST OF AUTOMATION CHINESE ACAD OF SCI
0 Cites, 97 Cited by

AI Technical Summary

Problems solved by technology

[0006] Like the mechanical filtering methods, the above methods do not make good use of web page features and cannot yet achieve satisfactory results. For example, text-based sensitive web page recognition cannot correctly identify normal web pages that relate to sensitive topics, and image-based sensitive web page recognition has a high false positive rate.
Existing fusion algorithms combine classifier results only through AND or OR operations, which cannot fundamentally improve the recognition rate.
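
The limitation of AND/OR fusion can be illustrated with a minimal sketch. The four sample pages, their classifier decisions, and the function names below are hypothetical and are not taken from the patent; they only show that once each classifier's output is reduced to a yes/no decision, two pages with identical decisions but different true labels can no longer be told apart.

```python
# Hypothetical sketch: hard AND/OR fusion of two binary classifier decisions.

def fuse_and(text_sensitive: bool, image_sensitive: bool) -> bool:
    """Flag the page only when both classifiers flag it."""
    return text_sensitive and image_sensitive


def fuse_or(text_sensitive: bool, image_sensitive: bool) -> bool:
    """Flag the page when either classifier flags it."""
    return text_sensitive or image_sensitive


# Invented examples: (text decision, image decision, true label).
pages = [
    (True, False, False),  # normal page that discusses a sensitive topic
    (False, True, False),  # normal page with a misjudged image
    (True, False, True),   # sensitive page whose images were missed
    (True, True, True),    # sensitive page caught by both classifiers
]


def error_count(fuse) -> int:
    """Count pages on which the fused decision disagrees with the label."""
    return sum(fuse(t, i) != label for t, i, label in pages)


print("AND errors:", error_count(fuse_and))
print("OR errors:", error_count(fuse_or))
```

The first and third pages produce the same pair of decisions but have different labels, so no rule that sees only the two boolean decisions can classify both correctly. Separating them requires additional information, such as classifier confidence or the page style.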

Examples

Embodiment Construction

[0039] The present invention will be described in detail below in conjunction with the accompanying drawings. It should be noted that the described embodiments are only intended to facilitate the understanding of the present invention, rather than limiting it in any way.

[0040] As shown in Figure 4, the sensitive web page filtering system of the present invention, based on the fusion of multiple classifiers, includes: a data stream acquisition and preprocessing unit 1, which generates the text stream and image stream of the original web page and, based on these, assigns the original web page to a web page style; an image and text stream filtering unit 2, which, for each web page style, uses the corresponding classifiers to identify text and images; and an image filter and text filter information fusion unit 3, which, for the mixed web page style, combines the image filter and the text filter through a fusion formula to obtain the final recognition result of whether the page belongs to the sensitive class. To sum...
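
The three-unit structure described in [0040] can be sketched roughly as follows. This is only an illustration under stated assumptions: the style names (`TEXT`, `IMAGE`, `MIXED`), the word-count rule standing in for the patent's decision tree, the weighted-average fusion, and the 0.5 threshold are all hypothetical, since the actual decision tree and fusion formula are not given in this excerpt.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class PageStyle(Enum):
    # Hypothetical names for the three web page styles.
    TEXT = "text"
    IMAGE = "image"
    MIXED = "mixed"


@dataclass
class Page:
    text: str          # text stream from preprocessing
    images: List[str]  # image stream (here: image identifiers)


def choose_style(page: Page, min_words: int = 20) -> PageStyle:
    """Unit 1 stand-in: a simple threshold rule, NOT the patent's decision tree."""
    has_text = len(page.text.split()) >= min_words
    has_images = len(page.images) > 0
    if has_text and has_images:
        return PageStyle.MIXED
    if has_images:
        return PageStyle.IMAGE
    return PageStyle.TEXT


def fuse(text_score: float, image_score: float, w_text: float = 0.5) -> float:
    """Unit 3 stand-in: a weighted average, assumed for illustration only."""
    return w_text * text_score + (1.0 - w_text) * image_score


def filter_page(
    page: Page,
    text_clf: Callable[[str], float],
    image_clf: Callable[[str], float],
    threshold: float = 0.5,
) -> bool:
    """Unit 2 dispatch: pick classifiers by style; return True if sensitive."""
    style = choose_style(page)
    if style is PageStyle.TEXT:
        score = text_clf(page.text)
    elif style is PageStyle.IMAGE:
        score = max(image_clf(img) for img in page.images)
    else:
        image_score = max(image_clf(img) for img in page.images)
        score = fuse(text_clf(page.text), image_score)
    return score >= threshold
```

In this sketch the classifiers are passed in as plain callables that return a score in [0, 1], so any text or image model could be substituted without changing the dispatch logic.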

Abstract

The invention discloses a system and a method for filtering sensitive web pages based on multi-classifier fusion. The processing object is a web page, and the processing result is whether the web page contains sensitive content, which may be pornographic, reactionary, violent, or other unhealthy Internet content harmful to society. The system comprises a data stream acquisition and preprocessing unit, an image and text stream filtering unit, and an information fusion unit for the image filter and the text filter. Through the cooperation of multiple classifiers, the system works as follows: it acquires the source code of a web page from the page's URL; in the preprocessing stage it separates text from images to obtain the text information and the effective image information; it assigns the input web page to one of three modes using a decision tree algorithm; it recognizes the web page using a continuous text classifier, a discrete sensitive text classifier, and an image classifier; and it fuses the outputs of the classifiers, computes a decision factor, and returns the final result to the browser.
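
The preprocessing step in the abstract, separating text from images in a page's source code, might look like the following minimal sketch. It uses Python's standard `html.parser` module; the class name `TextImageSplitter` and the function `split_page` are hypothetical, and the sketch does not attempt the patent's selection of "effective" images or the fetching of the page from its URL.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class TextImageSplitter(HTMLParser):
    """Collect visible text and <img> sources from HTML source code."""

    SKIP_TAGS = {"script", "style"}  # their contents are not visible text

    def __init__(self) -> None:
        super().__init__()
        self.text_parts: List[str] = []
        self.image_urls: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:  # ignore images without a usable source
                self.image_urls.append(src)

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text outside <script>/<style>.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def split_page(html_source: str) -> Tuple[str, List[str]]:
    """Return (text stream, list of image URLs) for one page."""
    parser = TextImageSplitter()
    parser.feed(html_source)
    parser.close()
    return " ".join(parser.text_parts), parser.image_urls


sample = (
    "<html><head><style>p{}</style><script>var x=1;</script></head>"
    '<body><p>Hello <b>world</b></p><img src="a.jpg"><img alt="none"></body></html>'
)
text, images = split_page(sample)
print(text, images)
```

The resulting text stream and image list would then feed the mode decision and the classifiers; how images are judged "effective" is left open here because the excerpt does not specify it.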

Description

Technical field

[0001] The invention relates to the technical field of information filtering, in particular to a method for identifying web pages containing sensitive information.

Background technique

[0002] Because sensitive information on the Internet has caused great harm to Internet users, especially young people, it has attracted extensive attention from researchers and industry.

[0003] There are currently many sensitive information filtering methods, including black and white lists, IP filtering, keyword matching, and other filtering methods. Generally speaking, on the one hand, these filtering technologies are very mechanical: they can achieve 100% filtering efficiency for some sensitive web pages with a very short response time, but the filtering parameters can only be updated after the actual sensitive websites have changed, so they cannot cope with the rapid changes of actual sensitive websites. On the other ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06K9/62, H04L12/26
Inventor: 胡卫明, 陈周耀, 吴偶, 朱明亮
Owner: INST OF AUTOMATION CHINESE ACAD OF SCI