Sensitive word recognition method and device, equipment, storage medium and program product

A sensitive word recognition technology in the field of data processing. It addresses the low accuracy of sensitive word recognition and improves recognition accuracy, labeling quality, and word-boundary recognition.

Pending Publication Date: 2022-04-29
GUANGZHOU BAIGUOYUAN NETWORK TECH +1

AI Technical Summary

Problems solved by technology

[0005] This application provides a sensitive word recognition method, device, equipment, storage medium, and program product to solve the problem of low sensitive word recognition accuracy in the prior art.



Examples


Embodiment 1

[0037] Figure 1 is a flowchart of an embodiment of a sensitive word recognition method provided in Embodiment 1 of the present application. This embodiment can be applied to a sensitive word recognition device, which can be located in a server or a client; this embodiment does not limit where it is located.

[0038] As shown in Figure 1, this embodiment may include the following steps:

[0039] Step 110: based on the pre-generated domain dictionary library, determine the word set of the text to be recognized, where each word in the word set includes head position information and tail position information.

[0040] In practice, the text to be recognized can come from different sources and serve different purposes, depending on the requirements and application scenario. For example, the text to be recognized may be text obtained after speech is recognized by an ASR (Automatic Speech Recognition) system and then denoised and clean...
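To make "a word set in which every word carries head and tail position information" concrete, here is a minimal sketch. It rests on several assumptions that are not taken from the source. The domain dictionary is modeled as a plain Python set. Positions are 0-based character offsets, and the tail is inclusive. Every dictionary word found anywhere in the text is kept, including overlapping ones. The names `Span`, `build_word_set`, and `max_len` are hypothetical.

```python
from typing import List, NamedTuple, Set


class Span(NamedTuple):
    """One entry of the word set (hypothetical representation)."""
    word: str
    head: int  # index of the word's first character in the text
    tail: int  # index of the word's last character in the text (inclusive)


def build_word_set(text: str, dictionary: Set[str], max_len: int = 8) -> List[Span]:
    """Collect every dictionary word occurring in `text`, overlaps included.

    Assumption: exhaustive substring lookup stands in for whatever
    matching procedure the application actually uses.
    """
    spans: List[Span] = []
    for head in range(len(text)):
        # Try every candidate of up to max_len characters starting at head.
        for tail in range(head, min(len(text), head + max_len)):
            candidate = text[head : tail + 1]
            if candidate in dictionary:
                spans.append(Span(candidate, head, tail))
    return spans
```

Example: `build_word_set("南京市长江大桥", {"南京", "南京市", "市长", "长江", "长江大桥", "大桥"})` keeps the overlapping words "南京市" (0, 2) and "市长" (2, 3) side by side. A downstream model would then use the head and tail positions to sort out such competing boundaries.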

Embodiment 2

[0075] Figure 3 is a flowchart of an embodiment of a sensitive word recognition method provided in Embodiment 2 of the present application. This embodiment describes Embodiment 1 in more detail. As shown in Figure 3, this embodiment may include the following steps:

[0076] Step 310: in the pre-generated domain dictionary library, use a matching algorithm to match words in the text to be recognized and obtain the word set of the text to be recognized. Then obtain, for each word in the word set, its head position information and tail position information within the text to be recognized.

[0077] Performing word matching on the text to be recognized against the domain dictionary library with a matching algorithm yields the word set of the text to be recognized. Exemplarily, the matching algorithm may include, but is not limited to, a forward maximum matching algorithm, a reverse maximum matching algorithm, or a bidirection...
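The matching algorithms named above can be sketched as follows. This is a textbook version, not the application's own implementation, and it records the head and tail positions that Step 310 asks for. Several choices here are assumptions:

- characters that match no dictionary word fall back to single-character segments;
- `max_len` caps the window size;
- the rule used to reconcile the two directions is a common heuristic. The excerpt is cut off before it says how the bidirectional variant decides.

```python
from typing import List, Set, Tuple

Segment = Tuple[str, int, int]  # (word, head, tail); tail is inclusive


def forward_max_match(text: str, dictionary: Set[str], max_len: int = 8) -> List[Segment]:
    """Scan left to right, always taking the longest dictionary word."""
    result: List[Segment] = []
    i = 0
    while i < len(text):
        # Try the longest window first; a single character always succeeds.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i : i + size]
            if size == 1 or piece in dictionary:
                result.append((piece, i, i + size - 1))
                i += size
                break
    return result


def reverse_max_match(text: str, dictionary: Set[str], max_len: int = 8) -> List[Segment]:
    """Scan right to left, always taking the longest dictionary word."""
    result: List[Segment] = []
    j = len(text)  # exclusive end of the still-unsegmented prefix
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size : j]
            if size == 1 or piece in dictionary:
                result.append((piece, j - size, j - 1))
                j -= size
                break
    result.reverse()  # restore left-to-right order
    return result


def bidirectional_max_match(text: str, dictionary: Set[str], max_len: int = 8) -> List[Segment]:
    """Run both directions and keep one result (heuristic is an assumption).

    Prefer fewer segments, then fewer single-character segments;
    on a full tie, keep the reverse result.
    """
    fwd = forward_max_match(text, dictionary, max_len)
    bwd = reverse_max_match(text, dictionary, max_len)

    def cost(segments: List[Segment]) -> Tuple[int, int]:
        return (len(segments), sum(1 for word, _, _ in segments if len(word) == 1))

    return fwd if cost(fwd) < cost(bwd) else bwd
```

With the dictionary `{"南京", "南京市", "南京市长", "市长", "长江", "长江大桥", "大桥"}` and `max_len=4`, forward matching greedily takes "南京市长" and leaves the stray "江". Reverse matching yields "南京市" (0, 2) and "长江大桥" (3, 6), and the bidirectional heuristic picks the reverse result.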

Embodiment 3

[0147] Figure 6 is a structural block diagram of an embodiment of a sensitive word recognition device provided in Embodiment 3 of the present application. The device may include the following modules:

[0148] The word set determination module 610 is used to determine the word set of the text to be recognized based on the pre-generated domain dictionary library, and each word in the word set includes head position information and tail position information;

[0149] The word-building parts acquisition module 620 is used to split each word in the word set into word-building parts and obtain the word-building parts corresponding to each word;

[0150] The input vector determination module 630 is used to obtain the word vector corresponding to each word and the word-building part vectors corresponding to the word-building parts of each word, and to generate the input vector of each word based on its word vector and its word-building part vectors;

[0...
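The excerpt does not say how modules 620 and 630 are implemented. The following is only a minimal sketch of one plausible reading, with the following assumptions:

- a hypothetical character-to-component table `COMPONENT_TABLE`, with entries that are illustrative only;
- characters missing from the table fall back to themselves;
- unknown components map to zero vectors;
- the input vector is the word vector concatenated with the element-wise mean of the word's component vectors.

All function names are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]

# Hypothetical decomposition table (illustration only); a real system
# would load a complete character-component resource.
COMPONENT_TABLE: Dict[str, List[str]] = {
    "妈": ["女", "马"],
    "吗": ["口", "马"],
    "好": ["女", "子"],
    "江": ["氵", "工"],
}


def split_components(word: str, table: Dict[str, List[str]] = COMPONENT_TABLE) -> List[str]:
    """Module 620 (sketch): list the components of every character in `word`.

    Characters missing from the table are kept as their own component.
    """
    parts: List[str] = []
    for ch in word:
        parts.extend(table.get(ch, [ch]))
    return parts


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise average of a non-empty list of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def build_input_vector(
    word: str,
    word_emb: Dict[str, Vector],
    comp_emb: Dict[str, Vector],
    comp_dim: int,
) -> Vector:
    """Module 630 (sketch): word vector followed by the mean component vector.

    Assumption: concatenation plus mean pooling; unknown components are zeros.
    """
    comp_vecs = [comp_emb.get(c, [0.0] * comp_dim) for c in split_components(word)]
    if not comp_vecs:  # only possible for an empty word
        comp_vecs = [[0.0] * comp_dim]
    return list(word_emb[word]) + mean_vector(comp_vecs)
```

For example, with a 2-dimensional word vector for "好" and 3-dimensional vectors for "女" and "子", the result is a 5-dimensional input vector. Summing or gating the two parts instead of concatenating them would be equally consistent with the text as excerpted.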


Abstract

The invention discloses a sensitive word recognition method, device, equipment, storage medium, and program product. The method comprises the following steps:

- determining a word set of a text to be recognized based on a pre-generated domain dictionary library, each word in the word set comprising head position information and tail position information;
- splitting each word in the word set into character construction components to obtain the character construction components corresponding to each word;
- obtaining the word vector corresponding to each word and the character construction component vectors corresponding to its character construction components;
- generating the input vector of each word based on its word vector and its character construction component vectors;
- inputting the head position information, the tail position information, and the input vector of each word in the word set into a pre-generated sequence labeling model, which determines a labeling result for each word based on that information;
- recognizing sensitive words according to the labeling result of each word, thereby improving sensitive word recognition accuracy.
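The last step turns per-position labeling results into sensitive words. The abstract does not name the tag scheme, so the sketch below assumes character-level BIO tags with a single hypothetical type `SEN`. An `I-` tag with no open span is treated as the start of a new span. It returns each sensitive word with its head and tail positions (tail inclusive), matching the position convention used elsewhere in this document. `decode_bio` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def decode_bio(text: str, labels: List[str], tag: str = "SEN") -> List[Tuple[str, int, int]]:
    """Collect (word, head, tail) spans from BIO labels (scheme is an assumption)."""
    if len(text) != len(labels):
        raise ValueError("exactly one label per character is required")
    begin, inside = f"B-{tag}", f"I-{tag}"
    spans: List[Tuple[str, int, int]] = []
    start: Optional[int] = None
    for i, label in enumerate(labels):
        if label == begin or (label == inside and start is None):
            # A new span starts here; close any span that is still open.
            if start is not None:
                spans.append((text[start:i], start, i - 1))
            start = i
        elif label == inside:
            continue  # extend the currently open span
        else:
            # Any other label ends the open span, if there is one.
            if start is not None:
                spans.append((text[start:i], start, i - 1))
            start = None
    if start is not None:
        spans.append((text[start:], start, len(text) - 1))
    return spans
```

For example, `decode_bio("abcdefg", ["O", "B-SEN", "I-SEN", "O", "B-SEN", "B-SEN", "I-SEN"])` yields the spans "bc" (1, 2), "e" (4, 4), and "fg" (5, 6).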

Description

Technical field

[0001] The present application relates to the technical field of data processing, and in particular to a sensitive word recognition method, a sensitive word recognition device, an electronic device, a computer-readable storage medium, and a computer program product.

Background

[0002] In content review for automatic speech recognition, a common practice is to manually review a large volume of sensitive text, extract sensitive words from it to build a sensitive-word lexicon, and then determine whether a sentence contains sensitive words by searching that lexicon. This approach is inefficient. Moreover, sensitive words often change with time and context, so manually built lexicons tend to lack accuracy, coverage, and timeliness, which greatly reduces the effectiveness of sensitive-information detection.

[0003] Later, a method for extracting sensitive words based on statistical features was...


Application Information

IPC(8): G06F16/33; G06F40/126; G06F40/242; G06F40/295; G06N3/04; G06N3/08; G06N7/00
CPC: G06F16/3331; G06F40/126; G06F40/242; G06F40/295; G06N3/08; G06N7/01; G06N3/045
Inventors: Zhai Yonggang (翟永刚), Liu Haidong (刘海东)
Owner: GUANGZHOU BAIGUOYUAN NETWORK TECH