Content filtering method and device

A content filtering technology, applied in the field of data processing, that addresses the defects of existing filtering techniques (low matching performance and high memory consumption) and achieves high matching accuracy, optimized matching performance, and accurate matching results.

Active Publication Date: 2013-01-02
HUAWEI TECH CO LTD

AI Technical Summary

Problems solved by technology

[0006] However, this prior-art content filtering technology has significant defects.
The rule condition matching used for URL address content filtering is carried out with a DFA graph. When the number of rule conditions is very large, or when complex rule conditions must be configured, for example regular expressions that include wildcards such as ".*/abc.*/news" or ".*\.www\.domain.*\.com", the DFA consumes a large amount of memory.
This is the main shortcoming of the DFA algorithm. The existing technology can use a compressed DFA, such as the D2FA (Delayed DFA) algorithm, instead of the standard DFA for matching, but this lowers matching performance, because the D2FA algorithm is several times slower than the standard DFA.
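
The memory problem described above can be illustrated with a small experiment. The sketch below is not taken from the patent: it assumes every rule condition has the simplified shape ".*X.*Y.*" with single characters X and Y, builds the combined NFA for all rules, and counts the DFA states produced by the standard subset construction. The function name `count_dfa_states` and the placeholder character `#` are hypothetical.

```python
def count_dfa_states(patterns):
    """Count subset-construction DFA states for rules shaped like .*X.*Y.*

    patterns: list of (X, Y) single-character pairs, one pair per rule.
    NFA states are (rule_index, stage): stage 0 = nothing seen,
    stage 1 = X seen, stage 2 = X then Y seen (rule matched).
    """
    # '#' stands in for "any character not used by a rule" (assumption).
    alphabet = sorted({c for pair in patterns for c in pair} | {"#"})

    def step(nfa_states, ch):
        nxt = set()
        for rule, stage in nfa_states:
            first, second = patterns[rule]
            nxt.add((rule, stage))  # every stage loops on any char (.*)
            if stage == 0 and ch == first:
                nxt.add((rule, 1))
            if stage == 1 and ch == second:
                nxt.add((rule, 2))
        return frozenset(nxt)

    start = frozenset((rule, 0) for rule in range(len(patterns)))
    seen, work = {start}, [start]
    while work:
        current = work.pop()
        for ch in alphabet:
            target = step(current, ch)
            if target not in seen:
                seen.add(target)
                work.append(target)
    return len(seen)


if __name__ == "__main__":
    rules = [("a", "b"), ("c", "d"), ("e", "f"), ("g", "h")]
    for k in range(1, len(rules) + 1):
        # With distinct characters per rule, the count grows as 3**k.
        print(k, count_dfa_states(rules[:k]))
```

Each added wildcard rule multiplies the number of DFA states in this toy setting, which is the kind of growth that motivates splitting the rule conditions into small groups.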



Examples


Embodiment 1

[0043] Figure 1 is a flow chart of the content filtering method provided by Embodiment 1 of the present invention. The content filtering method of this embodiment can be applied to various scenarios where text content needs to be filtered, and can be implemented in software and/or hardware. A typical example is web page content filtering based on a text application layer protocol, which can be implemented by software integrated in the gateway.

[0044] The content filtering method mainly includes a precompilation process for rule conditions and a filtering process for content to be filtered, specifically including the following steps:

[0045] Step 110: Extract keywords from each of one or more input rule conditions;

[0046] Step 120: Divide the one or more rule conditions into one or more groups according to the extracted keywords, so that the rule conditions in the same group have the same keyword, and precompile group matching for the extracted keywords ...
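
Steps 110 and 120 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's procedure: the source does not say how a keyword is chosen, so the sketch assumes each rule condition consists only of literal text, backslash-escaped characters, and the wildcard ".*", and picks the longest literal fragment as the keyword. The names `extract_keyword` and `group_rules` are hypothetical.

```python
import re
from collections import defaultdict


def extract_keyword(rule):
    """Return the longest literal fragment of a rule condition.

    Assumption: the rule contains only literal text, escaped characters
    such as \\. and the wildcard .* (no classes, alternation, or ?).
    """
    fragments = rule.split(".*")
    # Turn escapes such as \. back into the literal character.
    literals = [re.sub(r"\\(.)", r"\1", frag) for frag in fragments]
    return max(literals, key=len)


def group_rules(rules):
    """Group rule conditions so that each group shares one keyword."""
    groups = defaultdict(list)
    for rule in rules:
        groups[extract_keyword(rule)].append(rule)
    return dict(groups)


if __name__ == "__main__":
    rules = [r".*/abc.*/news", r".*/xyz.*/news", r".*\.www\.domain.*\.com"]
    for keyword, members in group_rules(rules).items():
        print(repr(keyword), members)
```

The keys of the returned dictionary play the role of the keywords that a group matching structure would be compiled from. A real implementation would more likely build a multi-pattern matcher over those keywords than keep a plain dictionary.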

Embodiment 2

[0066] The content filtering method provided by Embodiment 2 of the present invention further improves the precompilation and filtering process of filtering rules based on the above embodiments. In the above embodiments, the precompilation and filtering of filtering rules can be performed based on various technologies. For example, a corresponding identifier can be recorded after a rule condition is matched, the recorded identifiers can then be used to determine which filtering rules apply, and the corresponding filtering strategy is then implemented. Alternatively, the filter rules can be constructed as a tree structure, and the matched rule conditions are looked up in that tree.
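
The identifier-based option mentioned above could look roughly like the sketch below. It assumes that a filter rule applies only when all of its rule conditions have matched. The source does not state how conditions are combined, so that AND semantics, the rule names, the identifiers, and the strategy labels are all hypothetical.

```python
# Hypothetical filter rules: name -> (required condition identifiers, strategy).
FILTER_RULES = {
    "block_news": ({1, 2}, "block"),
    "log_domain": ({3}, "log"),
}


def applicable_strategies(matched_ids):
    """Return (rule name, strategy) for every rule whose conditions all matched.

    matched_ids: set of condition identifiers recorded during matching.
    Assumption: a rule applies only if ALL of its conditions matched.
    """
    return [
        (name, strategy)
        for name, (required, strategy) in FILTER_RULES.items()
        if required <= matched_ids  # subset test
    ]


if __name__ == "__main__":
    print(applicable_strategies({1, 2}))
    print(applicable_strategies({1}))
```

Keeping only small integer identifiers during matching means the filter rules themselves never need to be scanned against the content, only against the set of recorded identifiers.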

[0067] This embodiment provides another preferred filtering rule matching solution. At any point in the pre-compilation process, the following steps are performed:

[0068] Assigning unique condition identifiers to the one or more rule conditions respectively, and precompiling the filter ma...

Embodiment 3

[0093] Figure 2 is a flow chart of the content filtering method provided by Embodiment 3 of the present invention. The above embodiments describe precompiling, at the initial stage, the rule conditions and filter rules input by the user. In practical applications, the user can add, delete, and change rule conditions and filter rules at any time. A change operation is equivalent to a deletion followed by an addition. This embodiment mainly optimizes the operation of adding rule conditions, so the above content filtering method can further perform the following operations:

[0094] Step 210, when the newly added rule condition is obtained, extract keywords from the newly added rule condition;

[0095] Step 220, searching or creating a corresponding group for the newly added rule condition according to the keywords extracted from the newly added rule condition, and recompiling the group matching data set;

[0096] In this step, you can first search whether the...
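
Steps 210 and 220 can be sketched as an incremental update. The sketch below makes several assumptions that are not in the source: the keyword is the longest literal fragment between ".*" wildcards, the group matching data set is modeled as a single alternation regex over all keywords, and that matcher is rebuilt only when a new group is created. The class `RuleStore` and its members are hypothetical.

```python
import re


def extract_keyword(rule):
    """Longest literal fragment between .* wildcards (assumed heuristic)."""
    literals = [re.sub(r"\\(.)", r"\1", frag) for frag in rule.split(".*")]
    return max(literals, key=len)


class RuleStore:
    def __init__(self):
        self.groups = {}           # keyword -> list of rule conditions
        self.group_matcher = None  # stand-in for the group matching data set
        self.recompiles = 0        # how often the keyword matcher was rebuilt

    def _recompile_group_matcher(self):
        alternation = "|".join(re.escape(k) for k in sorted(self.groups))
        self.group_matcher = re.compile(alternation)
        self.recompiles += 1

    def add_rule(self, rule):
        """Find or create the group for a new rule condition (Steps 210, 220)."""
        keyword = extract_keyword(rule)
        is_new_group = keyword not in self.groups
        self.groups.setdefault(keyword, []).append(rule)
        if is_new_group:
            # Assumption: the keyword set changed, so only now rebuild it.
            self._recompile_group_matcher()
        return keyword, is_new_group


if __name__ == "__main__":
    store = RuleStore()
    print(store.add_rule(r".*/abc.*/news"))
    print(store.add_rule(r".*/xyz.*/news"))
    print(store.recompiles)
```

In this sketch, adding a rule condition whose keyword already exists touches only that one group, which is the benefit of grouping for incremental updates.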


Abstract

The invention provides a content filtering method and device. The method comprises: extracting keywords from each input rule condition; dividing the rule conditions into one or more groups according to the extracted keywords, and precompiling a group matching data set for the extracted keywords; precompiling an accurate matching data set for each group of rule conditions corresponding to an extracted keyword; obtaining content to be filtered; performing keyword matching on the content to be filtered using the group matching data set; performing accurate matching of rule conditions on the content to be filtered using the accurate matching data sets of the groups that correspond to the matched keywords; and implementing the corresponding filtering strategies according to the results of the accurate matching. Because the rule conditions are prefiltered by group, each group contains only a small number of rule conditions, which reduces the memory occupied. Because accurate matching of rule conditions is performed after the group prefiltering, the matching accuracy is high.
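
The two-stage filtering flow summarized in the abstract can be sketched as follows. This is an illustration under assumptions, not the patented implementation: the groups are written by hand rather than precompiled, substring search stands in for the group matching data set, Python's `re` module stands in for the per-group accurate matching data sets, and the "block"/"allow" strategies are hypothetical.

```python
import re

# Hypothetical precompiled groups: keyword -> compiled rule conditions.
GROUPS = {
    "/news": [re.compile(r".*/abc.*/news"), re.compile(r".*/xyz.*/news")],
    ".www.domain": [re.compile(r".*\.www\.domain.*\.com")],
}


def filter_content(content):
    """Return (strategy, matched rule conditions) for the given content."""
    matched = []
    for keyword, compiled_rules in GROUPS.items():
        # Stage 1: cheap keyword prefilter selects candidate groups.
        if keyword not in content:
            continue
        # Stage 2: accurate matching, only within the selected group.
        for rule in compiled_rules:
            if rule.search(content):
                matched.append(rule.pattern)
    strategy = "block" if matched else "allow"
    return strategy, matched


if __name__ == "__main__":
    print(filter_content("http://example.com/abc/today/news"))
    print(filter_content("http://example.org/index.html"))
```

Content that contains none of the keywords never reaches the second stage, so the expensive matching structures only need to cover one small group at a time.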

Description

Technical field

[0001] Embodiments of the present invention relate to data processing technologies, and in particular, to a content filtering method and device.

Background

[0002] As the largest information center in the world, the Internet is growing at an astonishing speed, but the information on it is uneven in quality, and there are many bad websites and bad resources. In addition, some suspicious websites contain malicious software, which can threaten the user's personal privacy and even damage the user's computer.

[0003] To avoid the harm of bad information, the prior art adopts content filtering technology based on the application layer protocol to filter webpages. For example, for an enterprise network gateway, filtering policies can be configured to filter webpages with certain types of content, so as to restrict prohibited behaviors of internal users of the enterprise network, such as prohibiting access to inappropriate we...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): H04L29/06
CPC: G06F17/30, H04L29/06, G06F16/9535
Inventor: 尤里・哈桑艾维・菲尔莫默
Owner: HUAWEI TECH CO LTD