Sensitive word recognition method, device and equipment

A sensitive word recognition method and technology, applied in the fields of instruments, electrical digital data processing, and calculation, that addresses problems such as sensitive word variants going unrecognized and poor recognition performance.

Pending Publication Date: 2020-07-03
ZHUHAI KINGSOFT ONLINE GAME TECH CO LTD

AI Technical Summary

Problems solved by technology

[0004] Many sensitive word variants currently exist. These variants resemble sensitive words in shape or sound. The above scheme cannot recognize such variants, so its recognition performance is poor.



Examples


Embodiment approach

[0311] As an embodiment, the device also includes:

[0312] A preprocessing module (not shown in the figure), configured to perform any one or more of the following preprocessing operations on the text to be recognized to obtain preprocessed text: character cleaning, full-width to half-width conversion, traditional-to-simplified Chinese conversion, pinyin-to-text conversion, merging of split words, and restoration of homophonic characters;

[0313] The segmentation module 902 is specifically configured to perform word segmentation on the preprocessed text to obtain multiple word segments.
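
A minimal sketch of three of the preprocessing operations listed in [0312] may help make them concrete. It is not taken from the patent: the function names, the cleaning rule (keep only letters, digits and whitespace) and the two-entry `TRADITIONAL_TO_SIMPLIFIED` table are assumptions made for illustration. Pinyin-to-text conversion, split word merging and homophone restoration are omitted because they depend on dictionaries the excerpt does not describe.

```python
# Hypothetical, tiny mapping table; a real system would load a full dictionary.
TRADITIONAL_TO_SIMPLIFIED = {"電": "电", "腦": "脑"}


def full_to_half(text: str) -> str:
    """Map full-width forms (U+FF01..U+FF5E) and the ideographic space to half-width."""
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:               # ideographic (full-width) space
            out.append(" ")
        elif 0xFF01 <= code <= 0xFF5E:   # full-width ASCII variants
            out.append(chr(code - 0xFEE0))
        else:
            out.append(ch)
    return "".join(out)


def clean_characters(text: str) -> str:
    """Assumed cleaning rule: drop everything except letters, digits and whitespace."""
    return "".join(ch for ch in text if ch.isalnum() or ch.isspace())


def traditional_to_simplified(text: str) -> str:
    """Replace each character found in the (hypothetical) mapping table."""
    return "".join(TRADITIONAL_TO_SIMPLIFIED.get(ch, ch) for ch in text)


def preprocess(text: str) -> str:
    """Apply the three sketched steps in a fixed order (the order is an assumption)."""
    text = full_to_half(text)
    text = clean_characters(text)
    return traditional_to_simplified(text)
```

The output of `preprocess` would then be handed to whatever word segmenter the segmentation module uses, which the excerpt does not name.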

[0314] As an embodiment, the device also includes:

[0315] A second recognition module (not shown in the figure), configured to iteratively intercept character strings of a preset length from the text to be recognized; to match each intercepted character string against a pre-established dictionary tree; and, if the dictionary tree contains a branch matching the character string, to identify the character string as a sensitive word.
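
The dictionary tree matching in [0315] can be sketched as a trie lookup over a sliding window. This is an illustrative sketch under stated assumptions, not the patent's implementation: the class and function names are hypothetical, and "a branch matching the string" is read here as a complete root-to-word path, which is one possible interpretation.

```python
class TrieNode:
    """One node of the dictionary tree."""

    def __init__(self):
        self.children = {}    # character -> TrieNode
        self.is_end = False   # True if a stored word ends at this node


class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True

    def contains(self, s: str) -> bool:
        """Return True only if `s` is a complete stored word (assumed reading)."""
        node = self.root
        for ch in s:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_end


def find_sensitive(text: str, trie: Trie, length: int):
    """Intercept every substring of `length` characters and look it up in the trie."""
    hits = []
    for start in range(len(text) - length + 1):
        candidate = text[start:start + length]
        if trie.contains(candidate):
            hits.append((start, candidate))
    return hits
```

Because the window has a fixed preset length, stored words of other lengths are only found by repeating the scan with a different `length`.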

...


Abstract

Embodiments of the invention provide a sensitive word recognition method, device and equipment. The method comprises: determining context information corresponding to each character in a segmented word; generating a word vector sequence of the segmented word according to the semantic relation in the context information, the word vector sequence comprising the word vector of the segmented word and the word vectors of the segmented words in the context information; and inputting the word vector sequence of the segmented word into a pre-trained recognition model to obtain a recognition result indicating whether the segmented word is a sensitive word. For sensitive word variants, the contextual semantic dependency relationship is unchanged even if the glyph or pronunciation is changed. Because the scheme recognizes sensitive words based on that contextual semantic dependency relationship, it can recognize variants of sensitive words, which improves the recognition effect.
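
As a rough illustration of what a word vector sequence for one segmented word could look like, the sketch below collects the vectors of the target segment and of its neighbours. Everything specific here is an assumption: the abstract derives context from a semantic relation, whereas this sketch substitutes a plain fixed-size window, and the names `context_window` and `word_vector_sequence`, the `embeddings` lookup table and the zero vector for unknown segments are all hypothetical. The pre-trained recognition model is not sketched because the excerpt does not describe it.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def context_window(segments: Sequence[str], index: int, window: int) -> List[str]:
    """Return the target segment together with up to `window` neighbours per side."""
    lo = max(0, index - window)
    hi = min(len(segments), index + window + 1)
    return list(segments[lo:hi])


def word_vector_sequence(
    segments: Sequence[str],
    index: int,
    embeddings: Dict[str, Vector],
    window: int = 2,
    dim: int = 3,
) -> List[Vector]:
    """Look up one vector per segment in the window; unknown segments get zeros."""
    unknown = [0.0] * dim
    return [list(embeddings.get(seg, unknown))
            for seg in context_window(segments, index, window)]
```

The resulting list of vectors is the kind of input a sequence classifier would consume; which model the patent uses is not stated in this excerpt.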

Description

Technical field

[0001] The present invention relates to the technical field of word processing, in particular to a sensitive word recognition method, device and equipment.

Background technique

[0002] In some Internet scenarios, such as Internet forums, personal homepages, and game chats, users can post text content to express opinions, express emotions, or communicate with other users. To create a healthy network environment, it is usually necessary to review the text content published by users, that is, to identify whether the text content contains sensitive words that do not meet the specifications.

[0003] Existing sensitive word recognition schemes usually include: obtaining the text content published by users, segmenting the text content to obtain multiple word segments, and matching each word segment against a pre-established sensitive vocabulary database. If the matching is successful, it indicates that the word segment is a sensitive word...
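
The existing scheme described in [0003] amounts to exact lookup of each word segment in a vocabulary. The sketch below is a hypothetical illustration of that baseline: whitespace splitting stands in for a real word segmenter, and the function name and sample vocabulary are assumptions. It also shows why a variant that changes the spelling slips past exact matching.

```python
def match_segments(segments, sensitive_words):
    """Return the segments that appear verbatim in the sensitive vocabulary."""
    return [seg for seg in segments if seg in sensitive_words]


# Hypothetical vocabulary; whitespace splitting is a stand-in for real segmentation.
vocabulary = {"spam"}
exact_hits = match_segments("this is spam".split(), vocabulary)
variant_hits = match_segments("this is sp4m".split(), vocabulary)
```

Here `exact_hits` contains the matched segment, while `variant_hits` is empty because the altered spelling is not in the vocabulary.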


Application Information

IPC(8): G06F40/289, G06F40/30, G06F40/109
Inventors: 余建兴, 余敏雄, 余赢超, 王焜, 冯毅
Owner: ZHUHAI KINGSOFT ONLINE GAME TECH CO LTD