Sensitive word filtering method and system

A sensitive word filtering method and system, applied in special-purpose data processing, instruments, and electrical digital data processing. It addresses the single matching mode, the low matching-strategy performance, and the mismatching of sensitive words in existing approaches, and it improves the recall rate while reducing misjudgments.

Active Publication Date: 2016-05-11
北京中科汇联科技股份有限公司

AI Technical Summary

Problems solved by technology

[0004] In prior-art sensitive word filtering methods, sensitive words are matched in a single mode, which easily causes false matches or missed matches. The matching strategy also performs poorly, which puts great pressure on the filtering speed.


Examples

Embodiment 1

[0034] Example 1: Figure 4 is a flow chart of the sensitive word filtering method provided by this embodiment of the present invention. As the figure shows, the method includes the following steps:

[0035] S1. Normalize the sensitive words, the excluded words, and the text characters; the normalized sensitive words form the sensitive word management rules, and the normalized excluded words form the excluded word management rules;

[0036] S2. Establish a sensitive word filtering model according to the sensitive word management rules and the excluded word management rules, and use the model to scan the characters or word segments of the normalized text;

[0037] S3. According to the sensitive word filtering strategy, match the sensitive word filtering model against the scanned characters or word segments, and judge whether each character or word segment is a sensitive word ...
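Steps S1 to S3 can be illustrated with a minimal sketch. It is not the patented implementation: the normalization rules (Unicode NFKC folding plus lowercasing), the meaning given to an excluded word (a phrase that contains a sensitive word but should not be flagged), and the names `normalize`, `build_rules`, and `find_sensitive` are all assumptions made for illustration.

```python
import unicodedata


def normalize(text: str) -> str:
    """S1 (assumed rules): fold full-width forms and case into one canonical form."""
    return unicodedata.normalize("NFKC", text).lower()


def build_rules(words):
    """Normalize a word list into a rule set, dropping empty entries."""
    return {normalize(w) for w in words if w}


def occurrences(text: str, word: str):
    """Yield every start index of word in text, including overlapping ones."""
    start = text.find(word)
    while start != -1:
        yield start
        start = text.find(word, start + 1)


def find_sensitive(text, sensitive, excluded):
    """S2/S3 (assumed strategy): scan the normalized text and report sensitive
    words that are not fully covered by an excluded word.
    Positions index into the normalized text."""
    norm = normalize(text)
    protected = [(s, s + len(w)) for w in excluded for s in occurrences(norm, w)]
    hits = []
    for w in sensitive:
        for start in occurrences(norm, w):
            end = start + len(w)
            if not any(ps <= start and end <= pe for ps, pe in protected):
                hits.append((start, w))
    return sorted(hits)


if __name__ == "__main__":
    sensitive = build_rules(["ASS"])
    excluded = build_rules(["Class"])
    print(find_sensitive("ＣＬＡＳＳ and ass", sensitive, excluded))
```

Because the word lists and the text pass through the same `normalize` function, a full-width or upper-case variant of a word matches its rule. The naive per-word scan here is only for readability; a multi-pattern automaton is what makes the scan fast for large word lists.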

Embodiment 2

[0042] Example 2: Figure 1 is a block diagram of the sensitive word filtering system provided by this embodiment of the present invention. As the figure shows, the technical framework of the system includes four main modules: character normalization processing module 101, sensitive word management module 102, excluded word management module 103, and sensitive word filtering module 104. Modules 102 and 103 depend on module 101 to normalize the sensitive words and the excluded words. Module 104 depends on module 101 to normalize the text, and on modules 102 and 103 to obtain the sensitive words and excluded words and construct an Aho-Corasick automaton.
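For readers unfamiliar with the Aho-Corasick automaton named above, the following is a minimal textbook sketch of the general technique. It does not reproduce how module 104 builds or uses its automaton; the class name `AhoCorasick` and its methods are hypothetical.

```python
from collections import deque


class AhoCorasick:
    """Textbook multi-pattern matcher: a trie plus failure links."""

    def __init__(self, words):
        self.goto = [{}]   # goto[state][char] -> next state
        self.fail = [0]    # failure link of each state
        self.out = [[]]    # words recognized on entering each state
        for word in words:
            if word:
                self._add(word)
        self._build_failure_links()

    def _add(self, word):
        state = 0
        for ch in word:
            nxt = self.goto[state].get(ch)
            if nxt is None:
                nxt = len(self.goto)
                self.goto[state][ch] = nxt
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
            state = nxt
        self.out[state].append(word)

    def _build_failure_links(self):
        # Breadth-first, so every shallower state is finished first.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Inherit the words that end at the failure target.
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def search(self, text):
        """Return (start_index, word) for every occurrence, in one pass over text."""
        state = 0
        hits = []
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for word in self.out[state]:
                hits.append((i - len(word) + 1, word))
        return hits


if __name__ == "__main__":
    automaton = AhoCorasick(["he", "she", "his", "hers"])
    print(automaton.search("ushers"))
```

The automaton reads the text once regardless of how many words are loaded, which is why it suits large sensitive word lists. In a pipeline like the one described, both the word lists and the input text would be normalized before they reach the automaton.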

[0043] Further, module 104 includes six filtering submodules and one filtering-result summary submodule: default sensitive word filtering submodule 105, English sensitive word filtering submodule ...

Abstract

The invention relates to the field of multi-pattern string matching and discloses a sensitive word filtering method. The method comprises: managing Chinese, English, and website sensitive words and excluded words; a character normalization processing method; a group of filtering strategies, with their implementations, for sensitive words in different forms, including at least filtering steps for Chinese, English, websites, full pinyin spelling, pinyin abbreviation, and anagram forms; a group of criterion rules for sensitive words; and an approximate matching method for Chinese sensitive words. The invention also discloses a sensitive word filtering apparatus. The method and apparatus meet the needs of content administrators and searchers for filtering sensitive words in published or searched text. They filter a large number of sensitive words rapidly and accurately, and they return to the caller the sensitive words found, the level of each sensitive word, and the position of each sensitive word in the text.
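The abstract says that the sensitive words, their levels, and their positions are returned to the caller. A minimal sketch of what such a result could look like follows. The `Hit` record, the integer `level` scale, and the function `filter_text` are hypothetical names introduced for illustration; the source does not specify the return format.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One reported match (hypothetical result shape)."""
    word: str
    level: int   # assumed: an integer severity level per word
    start: int   # index of the first matched character
    end: int     # index one past the last matched character


def filter_text(text, levels):
    """Return every occurrence of each word in `levels`, ordered by position.

    `levels` maps a sensitive word to its level. For a given word,
    re.finditer reports non-overlapping occurrences only.
    """
    hits = []
    for word, level in levels.items():
        for match in re.finditer(re.escape(word), text):
            hits.append(Hit(word, level, match.start(), match.end()))
    return sorted(hits, key=lambda h: (h.start, h.end))


if __name__ == "__main__":
    for hit in filter_text("spam and scam, spam", {"spam": 2, "scam": 1}):
        print(hit)
```

Returning positions alongside the word and level lets the caller decide what to do with each hit, for example masking the span, rejecting the text, or routing it for review.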

Description

Technical field

[0001] The invention relates to the field of multi-pattern string matching, in particular to a method and system for filtering sensitive words.

Background technique

[0002] With the development of the Internet, website content is becoming more and more abundant. The openness of the Internet gives users UGC (User Generated Content) websites and social application software on which to publish, and some of the published content violates regulations, such as politically sensitive or pornographic terms, which has brought enormous pressure to Internet management.

[0003] The distribution channels of text content are becoming more and more diverse, and publishers are becoming more and more widespread, sometimes even anonymous. Faced with a large volume of text, Internet managers hope to filter out illegal and harmful information. In addition, when content collectors collect texts from the Internet, they also hope to obtain the information they are int...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/9535
Inventor: 游世学, 王丙栋, 杜新凯
Owner: 北京中科汇联科技股份有限公司