Sensitive word recognition method based on big data

The method applies sensitive word recognition technology in the field of big data research. It addresses problems such as ignored context, high labor cost, and heavy manual workload, reducing the number of manual interventions and the labor time while improving recognition accuracy.

Pending Publication Date: 2022-04-22
NANJING SHICHAZHE INFORMATION TECH CO LTD

AI Technical Summary

Problems solved by technology

[0003] In the process of realizing the present invention, the inventors found at least the following problems in the prior art. Traditional technology mainly identifies sensitive words by rule matching: a vocabulary of sensitive words is built, mostly through manual collection, which is costly and inefficient. The vocabulary is then traversed and matched against each piece of text, and any text found to contain a sensitive word must be submitted to reviewers for manual review.
This approach has three disadvantages. First, as the number of sensitive words grows, so does the number of their variants; the vocabulary becomes larger and larger, and the search slows down because matching loops over the whole vocabulary. Second, the method still requires reviewers to make manual judgments, so the manual workload is heavy. Third, the method can only detect whether a sensitive word appears and ignores the context in which it appears, which is likely to cause false positives.
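As a rough illustration of the rule-matching baseline criticized here, the following minimal sketch traverses a vocabulary and tests each entry against the text. The function name `naive_match` and the sample vocabulary are hypothetical and do not come from the patent; the point is only that the cost grows with the vocabulary size and that a hit says nothing about context.

```python
def naive_match(text, vocabulary):
    """Return every vocabulary entry that occurs in text.

    Each call loops over the whole vocabulary, so the work grows
    with the number of sensitive words and their variants.
    """
    return [word for word in vocabulary if word in text]


# Hypothetical vocabulary, for illustration only.
vocabulary = ["scam", "gamble"]

# The text warns against a scam, yet it is flagged anyway,
# because substring matching ignores the surrounding context.
hits = naive_match("report this scam to the police", vocabulary)
print(hits)  # ['scam']
```

Under this baseline, a flagged text like the one above would still have to go to a human reviewer, which is the manual workload the paragraph describes.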



Embodiment Construction

[0028] In order to clarify the technical solution and working principle of the present invention, the embodiments of the present disclosure will be further described in detail below in conjunction with the accompanying drawings. All the above optional technical solutions may be combined in any way to form optional embodiments of the present disclosure, which will not be repeated here.

[0029] The terms "step 1", "step 2", "step 3" and similar descriptions in the description and claims of this application and in the above drawings are used to distinguish similar objects, not necessarily to describe a specific order or sequence. It should be understood that the data so used may be interchanged under appropriate circumstances, such that the embodiments of the application described herein can be practiced in sequences other than those described herein.

[0030] The embodiment of the present disclosure provides a method for identifying sensitive words based on big data, t...


Abstract

The invention discloses a sensitive word recognition method based on big data. The method comprises: step 1, collecting text data with crawler software, labeling the text data for sensitivity to obtain sensitive text D1 and normal text D2, classifying and grading the sensitive words, and storing them in a sensitive word list S; step 2, discovering new words with an N-gram model and expanding the sensitive word list S; step 3, applying deformation processing to each sensitive word in the sensitive word list S to obtain deformed sensitive words; and step 4, filtering the sensitive words in the sensitive word list S based on a Trie tree and a BERT model. The method improves the accuracy and efficiency of auditing and identifying sensitive words.
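The abstract does not give implementation details, so the following is only a minimal sketch of two of the named building blocks, under stated assumptions. `candidate_new_words` is a hypothetical heuristic for step 2: it counts character n-grams and keeps those that are frequent in the sensitive text D1 and absent from the normal text D2; the patent's actual N-gram scoring is not shown in this excerpt. `Trie` illustrates the lookup structure named in step 4. The BERT-based context check is omitted, and all names and thresholds here are assumptions.

```python
from collections import Counter


def ngram_counts(texts, n):
    """Count every character n-gram across a list of strings."""
    counts = Counter()
    for text in texts:
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def candidate_new_words(sensitive_texts, normal_texts, n, min_count):
    """Assumed heuristic: n-grams seen at least min_count times in D1
    and never in D2 become candidates for the sensitive word list S."""
    in_d1 = ngram_counts(sensitive_texts, n)
    in_d2 = ngram_counts(normal_texts, n)
    return {g for g, c in in_d1.items() if c >= min_count and g not in in_d2}


class TrieNode:
    def __init__(self):
        self.children = {}
        self.word = None  # full word stored at terminal nodes


class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.word = word

    def find_all(self, text):
        """Return (start_index, word) for every listed word found in text."""
        hits = []
        for start in range(len(text)):
            node = self.root
            for i in range(start, len(text)):
                node = node.children.get(text[i])
                if node is None:
                    break  # no listed word continues with this character
                if node.word is not None:
                    hits.append((start, node.word))
        return hits


trie = Trie(["bad", "badge", "ad"])
print(trie.find_all("a badge"))  # [(2, 'bad'), (2, 'badge'), (3, 'ad')]
```

With a trie, the scan from each text position follows only the characters actually present, so lookup time depends on the text and the longest listed word rather than on the total size of the list. In the method as summarized, hits like these would then be judged in context by the BERT model, which this sketch does not attempt.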

Description

Technical field

[0001] The invention relates to the field of big data research, in particular to natural language processing, and specifically to a method for identifying sensitive words based on big data.

Background technique

[0002] With the continuous development of the Internet, people can access a large amount of text information through various Internet platforms. Some of this information is sensitive, such as content involving terrorism or pornography. If it is not identified and controlled, it will undermine social stability and damage social and public interests. Controlling the quality of text information, identifying and handling sensitive information in a timely manner, ensuring that published content does not contain sensitive information, and creating a healthy network environment is therefore an important task.

[0003] In the process of realizing the present invention, the inventors found that there are at least the following problems in the prior art:...


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F16/31, G06F16/35, G06F16/33, G06F16/338, G06F40/289, G06N20/00
CPC: G06F16/322, G06F16/35, G06F16/3346, G06F16/338, G06F40/289, G06N20/00
Inventor: 周洁琴, 周金明
Owner: NANJING SHICHAZHE INFORMATION TECH CO LTD