Content filtering method and device

A content filtering technology in the field of data processing. It addresses defects of existing filtering techniques, such as high memory consumption and low matching performance, and achieves high matching accuracy, optimized matching performance, and accurate matching results.

Active Publication Date: 2015-07-08
HUAWEI TECH CO LTD
Cites: 4; Cited by: 0

AI Technical Summary

Problems solved by technology

[0006] However, this prior-art content filtering technology has significant defects.
Rule condition matching for URL content filtering is performed using a DFA graph. When there are too many rule conditions, or when complex rule conditions must be configured, for example regular expressions containing wildcards such as ".*/abc.*/news" or ".*\.www\.domain.*\.com", the DFA consumes a large amount of memory.
This is the main shortcoming of the DFA algorithm. The prior art can use a compressed DFA, such as the D2FA (Delayed Input DFA) algorithm, instead of the standard DFA for matching, but this lowers matching performance, because the D2FA algorithm is several times slower than the standard DFA.
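To illustrate why one combined DFA can consume a lot of memory, the following toy sketch applies the textbook subset construction to several wildcard patterns at once and counts the reachable DFA states. For brevity it assumes patterns consist only of literal characters joined by `.*` (escapes and other regex operators are not handled), and the helper names are hypothetical; this is not the matcher used in the patent.

```python
WILD = None      # token standing for ".*" (any sequence of characters)
OTHER = "\x00"   # one representative for "any character not in the patterns"


def tokenize(pattern):
    """Split a restricted pattern into literal characters and WILD tokens."""
    tokens, i = [], 0
    while i < len(pattern):
        if pattern.startswith(".*", i):
            tokens.append(WILD)
            i += 2
        else:
            tokens.append(pattern[i])
            i += 1
    return tokens


def closure(nfa_states, patterns):
    """Add the epsilon moves that skip over a WILD token."""
    result, stack = set(nfa_states), list(nfa_states)
    while stack:
        pid, pos = stack.pop()
        toks = patterns[pid]
        if pos < len(toks) and toks[pos] is WILD and (pid, pos + 1) not in result:
            result.add((pid, pos + 1))
            stack.append((pid, pos + 1))
    return frozenset(result)


def step(dfa_state, ch, patterns):
    """NFA move on one character for every (pattern, position), then closure."""
    moved = set()
    for pid, pos in dfa_state:
        toks = patterns[pid]
        if pos < len(toks):
            if toks[pos] is WILD:
                moved.add((pid, pos))        # ".*" absorbs the character
            elif toks[pos] == ch:
                moved.add((pid, pos + 1))    # literal character matched
    return closure(moved, patterns)


def count_dfa_states(regexes):
    """Count DFA states reachable when all patterns are compiled together."""
    patterns = [tokenize(r) for r in regexes]
    alphabet = {t for toks in patterns for t in toks if t is not WILD} | {OTHER}
    start = closure({(pid, 0) for pid in range(len(patterns))}, patterns)
    seen, todo = {start}, [start]
    while todo:
        state = todo.pop()
        for ch in alphabet:
            nxt = step(state, ch, patterns)
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return len(seen)


if __name__ == "__main__":
    rules = [".*a.*b", ".*c.*d", ".*e.*f", ".*g.*h"]
    for k in range(1, len(rules) + 1):
        combined = count_dfa_states(rules[:k])
        separate = sum(count_dfa_states([r]) for r in rules[:k])
        print(f"{k} patterns: combined DFA {combined} states, separate DFAs {separate}")
```

Each pattern on its own needs only a handful of states, but the combined DFA must remember the partial progress of every wildcard pattern simultaneously, so its state count grows much faster than the sum of the individual DFAs.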

Examples

Embodiment 1

[0043] Figure 1 is a flow chart of the content filtering method provided by Embodiment 1 of the present invention. The content filtering method of this embodiment can be applied in various scenarios where text content needs to be filtered, and can be implemented in software and/or hardware. A typical example is web page content filtering based on a text-based application-layer protocol, implemented by software integrated in a gateway.

[0044] The content filtering method mainly includes a precompilation process for rule conditions and a filtering process for content to be filtered, specifically including the following steps:

[0045] Step 110: extract keywords from each of the one or more input rule conditions;

[0046] Step 120: divide the one or more rule conditions into one or more groups according to the extracted keywords, so that the rule conditions in the same group share the same keyword, and precompile group matching for the extracted keywords ...
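Steps 110 and 120 can be sketched as follows. The text above does not say how a keyword is chosen from a rule condition, so this sketch assumes a hypothetical heuristic: the keyword is the longest run of literal characters in the regex-style condition, where backslash escapes count as literals and any other regex metacharacter ends a run. Conditions sharing a keyword land in the same group.

```python
from collections import defaultdict

METACHARS = set(".^$*+?()[]{}|")


def extract_keyword(rule_condition):
    """Return the longest literal run in a regex-style rule condition.

    Hypothetical heuristic: an escaped character (e.g. "\\.") is literal,
    and any other regex metacharacter terminates the current run.
    """
    runs, current, i = [], [], 0
    while i < len(rule_condition):
        ch = rule_condition[i]
        if ch == "\\" and i + 1 < len(rule_condition):
            current.append(rule_condition[i + 1])  # escaped char is literal
            i += 2
            continue
        if ch in METACHARS:
            runs.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    runs.append("".join(current))
    return max(runs, key=len)


def group_by_keyword(rule_conditions):
    """Put rule conditions that share a keyword into the same group."""
    groups = defaultdict(list)
    for cond in rule_conditions:
        groups[extract_keyword(cond)].append(cond)
    return dict(groups)
```

For example, ".*/abc.*/news" and ".*/x.*/news" both yield the keyword "/news" and form one group, while ".*\.www\.domain.*\.com" yields ".www.domain". A condition with no literal characters would get an empty keyword and would need separate handling, which this sketch does not attempt.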

Embodiment 2

[0066] The content filtering method provided by Embodiment 2 of the present invention further improves, on the basis of the above embodiments, the precompilation and filtering process of filter rules. In the above embodiments, filter rules can be precompiled and applied using various techniques. For example, an identifier can be recorded for each rule condition that matches, the identifiers can then be used to determine which filter rules apply, and the corresponding filtering policy is then executed. Alternatively, each filter rule can be constructed as a tree structure, and the matched rule conditions are matched against that tree.

[0067] This embodiment provides another preferred filter rule matching solution. At any point during the precompilation process, the following steps are performed:

[0068] Assigning unique condition identifiers to the one or more rule conditions respectively, and precompiling the filter ma...
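The identifier-based approach can be sketched as below. Because the paragraph above is truncated, the structure of a filter rule is an assumption: here a filter rule is a hypothetical `(name, [conditions], action)` tuple that applies only when all of its rule conditions matched. Each distinct condition gets a unique integer identifier, an inverted index from identifier to rules is precompiled, and at filtering time a per-rule hit counter decides which rules apply.

```python
from collections import defaultdict


def assign_condition_ids(rule_conditions):
    """Give every distinct rule condition a unique integer identifier."""
    ids = {}
    for cond in rule_conditions:
        ids.setdefault(cond, len(ids))
    return ids


def precompile_filter_rules(filter_rules, cond_ids):
    """Build an inverted index: condition id -> names of rules using it.

    Hypothetical model: each rule is (name, [conditions], action) and
    applies only when ALL of its conditions matched.
    """
    index = defaultdict(list)
    required, actions = {}, {}
    for name, conditions, action in filter_rules:
        ids = {cond_ids[c] for c in conditions}
        required[name] = len(ids)
        actions[name] = action
        for cid in ids:
            index[cid].append(name)
    return index, required, actions


def applicable_rules(matched_ids, index, required, actions):
    """Count matched conditions per rule; a rule applies when all are hit."""
    hits = defaultdict(int)
    for cid in set(matched_ids):
        for name in index.get(cid, ()):
            hits[name] += 1
    return sorted((name, actions[name]) for name, n in hits.items() if n == required[name])
```

Only rules that share at least one matched identifier are ever touched, so the cost at filtering time depends on the matched conditions rather than on the total number of filter rules.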

Embodiment 3

[0093] Figure 2 is a flow chart of the content filtering method provided by Embodiment 3 of the present invention. The above embodiments describe the initial precompilation of the rule conditions and filter rules input by the user. In practical applications, the user can add, delete, and change rule conditions and filter rules at any time; a change operation is equivalent to a deletion followed by an addition. This embodiment mainly optimizes the operation of adding rule conditions, for which the content filtering method described above can further perform the following operations:

[0094] Step 210: when a newly added rule condition is obtained, extract keywords from the newly added rule condition;

[0095] Step 220: search for or create a corresponding group for the newly added rule condition according to the keywords extracted from it, and recompile the group matching data set;

[0096] In this step, you can first search whether the...
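Steps 210 and 220, together with deletion and change, can be sketched with a small hypothetical store that maps each keyword to its group of compiled conditions. Two assumptions are made here: the keyword is passed in already extracted (step 210), and the group matching data set is modelled as a single regex alternation over the escaped keywords rather than whatever structure the patent uses. Since the keyword set only changes when a group is created or emptied, this sketch recompiles the group matcher only in those cases; that optimisation is our choice, not something the text states.

```python
import re


class GroupedConditions:
    """Hypothetical store: keyword -> {condition_id: compiled regex}."""

    def __init__(self):
        self.groups = {}
        self.group_matcher = None  # stands in for the group matching data set

    def _recompile_group_matcher(self):
        # Assumption: one alternation over escaped keywords, longest first.
        keywords = sorted(self.groups, key=len, reverse=True)
        self.group_matcher = (
            re.compile("|".join(map(re.escape, keywords))) if keywords else None
        )

    def add_condition(self, cond_id, regex, keyword):
        """Step 220: find or create the group for the keyword."""
        group = self.groups.get(keyword)
        is_new_group = group is None
        if is_new_group:
            group = self.groups[keyword] = {}
        group[cond_id] = re.compile(regex)  # precise matching data for the group
        if is_new_group:
            self._recompile_group_matcher()

    def remove_condition(self, cond_id, keyword):
        """Delete a condition; drop its group once the group is empty."""
        group = self.groups.get(keyword, {})
        group.pop(cond_id, None)
        if keyword in self.groups and not group:
            del self.groups[keyword]
            self._recompile_group_matcher()

    def change_condition(self, cond_id, old_keyword, regex, new_keyword):
        # A change is handled as a deletion followed by an addition.
        self.remove_condition(cond_id, old_keyword)
        self.add_condition(cond_id, regex, new_keyword)
```

Adding a condition whose keyword already has a group only compiles that one new condition, leaving all other groups untouched.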

Abstract

Embodiments of the present invention provide a content filtering method and device. The method comprises: extracting a keyword from each entered rule condition; dividing the rule conditions into one or more groups according to the extracted keywords, and pre-compiling a group matching dataset for the extracted keywords; pre-compiling, for each extracted keyword, a precise matching dataset for the rule conditions of the corresponding group; obtaining to-be-filtered content; using the group matching dataset to perform keyword matching on the to-be-filtered content; using the precise matching dataset of the group corresponding to each matched keyword to perform precise matching of the rule conditions on the to-be-filtered content; and executing a corresponding filtering policy according to the result of the precise matching. Because the present invention pre-filters the rule conditions by group, each group contains only a small number of rule conditions, which reduces memory usage. In addition, the precise matching performed on the rule conditions that survive group pre-filtering achieves high matching accuracy.
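The two-stage flow in the abstract can be sketched end to end as below. Several pieces are assumptions not taken from the patent: Aho-Corasick is swapped in as the multi-keyword matcher playing the role of the group matching dataset, Python `re` patterns (matched with `search`, so anchoring semantics are assumed) play the role of each group's precise matching dataset, keywords are assumed to be already extracted, and `policy` is a hypothetical callable that turns the set of matched condition identifiers into a decision.

```python
import re
from collections import deque


class AhoCorasick:
    """Multi-keyword matcher standing in for the group matching dataset."""

    def __init__(self, keywords):
        self.goto, self.fail, self.out = [{}], [0], [set()]
        for kw in keywords:                      # build the keyword trie
            node = 0
            for ch in kw:
                if ch not in self.goto[node]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append(set())
                    self.goto[node][ch] = len(self.goto) - 1
                node = self.goto[node][ch]
            self.out[node].add(kw)
        queue = deque(self.goto[0].values())     # BFS to set failure links
        while queue:
            node = queue.popleft()
            for ch, child in self.goto[node].items():
                f = self.fail[node]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(ch, 0)
                self.out[child] |= self.out[self.fail[child]]
                queue.append(child)

    def find(self, text):
        """Return the set of keywords occurring anywhere in text."""
        node, found = 0, set()
        for ch in text:
            while node and ch not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(ch, 0)
            found |= self.out[node]
        return found


def build_filter(grouped_conditions):
    """Pre-compile from {keyword: [(condition_id, regex), ...]}."""
    group_matcher = AhoCorasick(grouped_conditions)
    precise = {kw: [(cid, re.compile(rx)) for cid, rx in conds]
               for kw, conds in grouped_conditions.items()}
    return group_matcher, precise


def filter_content(content, group_matcher, precise, policy):
    matched = set()
    for kw in group_matcher.find(content):       # stage 1: keyword pre-filter
        for cid, rx in precise[kw]:              # stage 2: precise matching
            if rx.search(content):
                matched.add(cid)
    return policy(matched)                       # execute the filtering policy
```

A URL such as "http://h/news" contains the keyword "/news" and so reaches stage 2, but it fails the precise pattern ".*/abc.*/news" and is allowed; only the small group behind each matched keyword is ever evaluated with full regular expressions.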

Description

Technical field

[0001] Embodiments of the present invention relate to data processing technologies, and in particular, to a content filtering method and device.

Background

[0002] As the largest information center in the world, the Internet is growing at an astonishing speed, but the quality of its information is uneven, and there are many harmful websites and resources. In addition, some suspicious websites contain malicious software, which can threaten users' personal privacy and even damage their computers.

[0003] To avoid the harm caused by such information, the prior art adopts content filtering technology based on application-layer protocols to filter web pages. For example, filtering policies can be configured on an enterprise network gateway to filter web pages with certain types of content, so as to restrict prohibited behavior by internal users of the enterprise network, such as prohibiting access to inappropriate we...

Application Information

Patent Type & Authority: Patents (China)
IPC (8): H04L29/06
CPC: G06F17/30867, G06F16/9535
Inventors: 尤里·哈桑, 艾维·菲尔莫默
Owner: HUAWEI TECH CO LTD