Method and system for training sensitive word detection model

A detection model training technology, applied in the field of sensitive word detection model training, that can solve problems such as dependence…

Active Publication Date: 2019-07-16
POTEVIO INFORMATION TECH
5 cites, 4 cited by

AI Technical Summary

Problems solved by technology

[0008] However, the DFA algorithm relies heavily on the existing sensitive lexicon. For words that are absent from the lexicon or that contain interference characters, other methods must be combined to retrieve them, or even…

Examples

Embodiment 1

[0080] Figure 6 shows a sensitive word detection model training method based on a single training corpus. As shown in Figure 6, the sensitive word detection model includes a bidirectional long short-term memory (BLSTM) network model and a conditional random field (CRF) model, and the BLSTM model comprises a first BLSTM model and a second BLSTM model. In addition, the training method introduces a convolutional neural network (CNN) model.

[0081] As shown in Figure 7, based on the training corpus X 正&火 shown in Figure 6, steps A-1 and A-2 are executed iteratively until the iterative procedure ends:

[0082] Step A-1 (S101): keep the current parameters of the CNN model fixed and train the first BLSTM model, the second BLSTM model and the CRF model. Input the sample data of the training corpus into the first BLSTM model and the second BLSTM model, and apply the first BLSTM model to the second BLSTM model. The output of...
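Step A-1 updates the BLSTM and CRF models from the difference between the CRF output and the labels of the input text. The text here does not name the loss; a common choice for BLSTM-CRF taggers is the linear-chain CRF negative log-likelihood, so the following is a minimal pure-Python sketch under that assumption. The emission scores (which would come from the BLSTM outputs), the transition matrix, the three-tag set and all function names are hypothetical toy values, not taken from the patent.

```python
import itertools
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def path_score(emissions, transitions, tags):
    """Unnormalised score of one tag path: emission scores plus transition scores."""
    score = emissions[0][tags[0]]
    for t in range(1, len(tags)):
        score += transitions[tags[t - 1]][tags[t]] + emissions[t][tags[t]]
    return score


def log_partition(emissions, transitions):
    """Forward algorithm: log of the summed exp(path_score) over every tag path."""
    num_tags = len(emissions[0])
    alpha = list(emissions[0])  # alpha[j]: log-sum over paths ending in tag j
    for t in range(1, len(emissions)):
        alpha = [
            log_sum_exp([alpha[i] + transitions[i][j] for i in range(num_tags)])
            + emissions[t][j]
            for j in range(num_tags)
        ]
    return log_sum_exp(alpha)


def brute_force_log_partition(emissions, transitions):
    """Enumerate all paths explicitly; only feasible for tiny inputs, used as a check."""
    num_tags = len(emissions[0])
    scores = [
        path_score(emissions, transitions, path)
        for path in itertools.product(range(num_tags), repeat=len(emissions))
    ]
    return log_sum_exp(scores)


def crf_nll(emissions, transitions, gold_tags):
    """Loss for one sentence: log Z minus the score of the labelled (gold) path."""
    return log_partition(emissions, transitions) - path_score(emissions, transitions, gold_tags)


if __name__ == "__main__":
    # Toy scores for a 3-character sentence and 3 hypothetical tags (O, B, I).
    emissions = [[1.0, 0.5, -1.0], [0.2, 1.5, 0.3], [0.1, -0.5, 2.0]]
    transitions = [[0.5, 0.3, -2.0], [-0.5, -1.0, 1.0], [0.2, 0.1, 0.4]]
    print("forward log Z:", log_partition(emissions, transitions))
    print("brute-force log Z:", brute_force_log_partition(emissions, transitions))
    print("NLL of gold path O,B,I:", crf_nll(emissions, transitions, [0, 1, 2]))
```

The brute-force helper is only there to confirm that the forward recursion sums over every path; in a real model the gradient of this loss would flow back into the BLSTM emission scores while the CNN parameters stay fixed.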

Embodiment 2

[0119] This embodiment is a training method for the sensitive word detection model using multiple training corpora. As shown in Figure 9, the sensitive word detection model includes a bidirectional long short-term memory (BLSTM) network model and a conditional random field (CRF) model, and the BLSTM model comprises the first BLSTM model and the second BLSTM model. The model training method also involves a convolutional neural network (CNN) model and N training corpora, where n = 1, 2, ..., N is the index of a training corpus.

[0120] Figure 9 illustrates the method of Figure 8, taking four training corpora as an example. The difference from Figure 6 is that, in Figure 9, the second BLSTM model and the CRF model are in one-to-one correspondence with training corpus n; the superscript n in the labels second BLSTM^n model and CRF^n model indicates this correspondence with training corpus n.
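The one-to-one pairing of second BLSTM^n and CRF^n with corpus n can be made concrete with a small bookkeeping sketch. It assumes, as the description implies but does not state outright, that the first BLSTM and the CNN are shared by all corpora; it also assumes that step A-1 on a batch from corpus n updates only that corpus's head. The class and component names are hypothetical labels, not identifiers from the patent.

```python
from dataclasses import dataclass


@dataclass
class MultiCorpusDetector:
    """Component bookkeeping for N training corpora (names are hypothetical labels).

    The first BLSTM and the CNN are assumed to be shared by all corpora, while
    each corpus n owns its own second BLSTM^n and CRF^n.
    """

    num_corpora: int
    shared: tuple = ("blstm1", "cnn")

    def head(self, n):
        """Return the (second BLSTM^n, CRF^n) pair for corpus n, with n in 1..N."""
        if not 1 <= n <= self.num_corpora:
            raise ValueError(f"corpus index must be in 1..{self.num_corpora}, got {n}")
        return (f"blstm2^{n}", f"crf^{n}")

    def all_components(self):
        """Shared components followed by every per-corpus head."""
        names = list(self.shared)
        for n in range(1, self.num_corpora + 1):
            names.extend(self.head(n))
        return names

    def trainable_in_step_a1(self, n):
        """Step A-1 on a batch from corpus n: the CNN stays frozen.

        Assumption: only the first BLSTM and the head paired with corpus n are
        updated; the heads of the other corpora are left untouched.
        """
        return {"blstm1", *self.head(n)}


if __name__ == "__main__":
    detector = MultiCorpusDetector(num_corpora=4)
    print(detector.all_components())
    print(sorted(detector.trainable_in_step_a1(2)))
```

With N = 4 this yields 2 shared components plus 4 heads of 2 models each, i.e. 10 components in total.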

[0121] As shown in Figure 8, the training method of this embodiment includes:

[0122] S...

Embodiment 3

[0143] The present invention also provides a sensitive word detection model comprising the first BLSTM model, the second BLSTM model and the CRF model obtained after training in Embodiment 1 and Embodiment 2 of the present invention.

[0144] The test text is input into the first BLSTM model and the second BLSTM model, the outputs of the first BLSTM model and the second BLSTM model are input into the CRF model, and the CRF model outputs the sensitive word recognition result for the test text.
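At inference the CRF turns per-character scores into a label sequence. The text does not state the decoder or the tag scheme, so the following minimal sketch assumes standard Viterbi decoding over a hypothetical O/B/I tag set, with toy hard-coded emission scores standing in for the BLSTM outputs. The helper names are illustrative only.

```python
TAGS = ("O", "B", "I")  # hypothetical tag set: outside, begin, inside a sensitive word


def viterbi_decode(emissions, transitions):
    """Return the tag-index path with the highest emission + transition score."""
    num_tags = len(emissions[0])
    score = list(emissions[0])  # best score of any path ending in each tag so far
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(num_tags):
            candidates = [score[i] + transitions[i][j] for i in range(num_tags)]
            best_i = max(range(num_tags), key=candidates.__getitem__)
            new_score.append(candidates[best_i] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag.
    path = [max(range(num_tags), key=score.__getitem__)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def extract_spans(text, tag_ids):
    """Collect the substrings labelled B followed by zero or more I tags."""
    spans, start = [], None
    for pos, tag_id in enumerate(tag_ids):
        tag = TAGS[tag_id]
        if tag == "B":
            if start is not None:
                spans.append(text[start:pos])
            start = pos
        elif tag == "I" and start is not None:
            continue  # extend the open span
        else:  # "O", or an "I" with no open span, closes any span
            if start is not None:
                spans.append(text[start:pos])
            start = None
    if start is not None:
        spans.append(text[start:])
    return spans


if __name__ == "__main__":
    # Large penalty on the O -> I transition discourages an I right after O.
    transitions = [[0.0, 0.0, -10.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    emissions = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0], [2.0, 0.0, 0.0]]
    tags = viterbi_decode(emissions, transitions)
    print(tags, extract_spans("abcd", tags))
```

The transition scores are what distinguish a CRF from per-character classification: with emissions `[[1.0, 0.9, 0.0], [0.0, 0.0, 1.0]]`, a per-position argmax would pick O then I, whereas Viterbi returns B, I because the O to I transition is heavily penalised.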

Abstract

The invention provides a method and a system for training a sensitive word detection model. The method comprises the following steps. Step A-1: input sample data of a training corpus into a first BLSTM model and a second BLSTM model, input the outputs of the first BLSTM model and the second BLSTM model into a CRF model, and have the CRF model output a sensitive word recognition result for the input text; update the current parameters of the model based on the difference between the recognition result of the CRF and the labels of the input text. Step A-2: input the sample data of the training corpus into the current first BLSTM model, input the output of the first BLSTM model into a CNN model, and have the CNN model output a font recognition result for the input text; update the current parameters of the model based on the font difference between the recognition result of the CNN and the input text. The method and system yield a sensitive word detection model with better performance. Compared with the traditional DFA algorithm, detection is not limited by a sensitive word lexicon and has some capability to detect foreign characters.

Description

Technical field

[0001] The invention relates to the field of artificial intelligence, in particular to a training method and system for a sensitive word detection model.

Background art

[0002] Sensitive word detection is an essential function of modern network monitoring, and a filtering algorithm with high accuracy and strong robustness is a prerequisite for effective monitoring. Most traditional sensitive word algorithms are based on an existing sensitive lexicon and judge whether a sentence contains sensitive words by dictionary lookup.

[0003] Among the traditional algorithms, the most widely used is arguably the deterministic finite automaton (DFA) algorithm. As shown in Figure 1, a DFA has a finite set of states and edges leading from one state to another; each edge is labeled with a symbol, one state is the initial state, and some states are final states.

[0004] Will f...
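Lexicon-based DFA matching of the kind described in the background can be sketched in a few lines. This is a minimal illustration, assuming a trie-shaped automaton (one state per lexicon prefix, final states at word ends) and a longest-match, left-to-right scan; the text does not specify the matching policy, and the function names and toy lexicon are hypothetical.

```python
FINAL = object()  # sentinel key marking an accepting (final) state


def build_dfa(words):
    """Build a trie-shaped DFA: each state maps a character to the next state."""
    root = {}
    for word in words:
        state = root
        for ch in word:
            state = state.setdefault(ch, {})
        state[FINAL] = True
    return root


def find_sensitive_words(text, dfa):
    """Return (position, word) for the longest lexicon match at each scan position."""
    matches = []
    i = 0
    while i < len(text):
        state, j, last_end = dfa, i, None
        # Follow edges while the next character has a transition.
        while j < len(text) and text[j] in state:
            state = state[text[j]]
            j += 1
            if FINAL in state:
                last_end = j  # remember the longest accepted prefix
        if last_end is not None:
            matches.append((i, text[i:last_end]))
            i = last_end  # resume scanning after the match
        else:
            i += 1
    return matches


if __name__ == "__main__":
    dfa = build_dfa(["bad", "badword", "evil"])
    print(find_sensitive_words("a badword and evil", dfa))  # [(2, 'badword'), (14, 'evil')]
    print(find_sensitive_words("b*ad", dfa))  # [] : the interference character defeats the lookup
```

The second call illustrates the limitation noted in the technical summary: a single inserted interference character breaks the chain of transitions, so a word that is in the lexicon goes undetected.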

Application Information

IPC(8): G06F16/35; G06N3/04; G06N3/08
CPC: G06N3/08; G06F16/355; G06N3/045
Inventors: 张鹏 (Zhang Peng), 张春荣 (Zhang Chunrong)
Owner: POTEVIO INFORMATION TECH