Bad text detection method and device based on Bi-LSTM

A bad text detection technology, applied in the fields of neural learning methods, special data processing applications, instruments, etc. It addresses the limited scope, ambiguous keyword matching, and insufficient coverage of existing methods, and achieves a high recall rate.

Pending Publication Date: 2019-10-11
SURFILTER NETWORK TECH
Cites: 10; Cited by: 1

AI Technical Summary

Problems solved by technology

The prior-art keyword-matching detection method has the following problems: the text is not word-segmented, so keywords are easily matched ambiguously; the keyword dictionary is formulated manually, which limits it and leaves coverage insufficient; and the holistic principle of text understanding is ignored.
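To illustrate the first problem, the sketch below contrasts matching on raw text, which is what keyword matching without word segmentation amounts to, with matching on segmented tokens. The keyword list and the English sample sentence are hypothetical, and whitespace splitting stands in for a real segmenter (Chinese text would need a proper word segmenter).

```python
# Hypothetical keyword list; the patent does not publish any dictionary.
KEYWORDS = ["sex", "drug"]


def substring_match(text: str, keywords: list[str]) -> list[str]:
    """Keyword matching on raw text, i.e. without word segmentation."""
    lowered = text.lower()
    return [kw for kw in keywords if kw in lowered]


def token_match(text: str, keywords: list[str]) -> list[str]:
    """Keyword matching on segmented tokens (whitespace split as a stand-in segmenter)."""
    tokens = {tok.strip(".,;:!?").lower() for tok in text.split()}
    return [kw for kw in keywords if kw in tokens]


sample = "The Essex drugstore opens at nine."
print(substring_match(sample, KEYWORDS))  # ['sex', 'drug']  (both are false positives)
print(token_match(sample, KEYWORDS))      # []
```

Segmentation removes these spurious hits, but a fixed dictionary still misses any harmful wording it does not list, which is the coverage problem named above.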

Method used

Text data are acquired and type-marked, then preprocessed (including word segmentation) into a training set. A Bi-LSTM bidirectional recurrent neural network model is trained on this set until the iterative change of the loss value falls below a set threshold and the loss no longer decreases. Text to be judged is likewise preprocessed and fed into the trained model, which outputs a judgment result.


Examples


Embodiment 1

[0048] This embodiment provides a Bi-LSTM-based method for detecting bad text, which can be executed by a computer with an information processing function, a network server, or similar equipment. Bad text refers to text content containing harmful information related to pornography, gambling, or drugs. In the application scenario of this embodiment, a web server applies the method of the present invention to webpage text transmitted as data streams over the network. It can be understood that, for detection, webpage text in data-stream form can be restored to webpage text in natural-language form. The Bi-LSTM-based bad text detection method provided in this embodiment is described below.

[0049] Referring to Figure 1, this embodiment discloses a Bi-LSTM-based bad text detection method. As shown in Figure 1, the method mainly includes:

[0050] S0, acquiring text data, and performing type marking on the acquir...

Embodiment 2

[0077] Based on the same inventive concept, this embodiment discloses a Bi-LSTM-based bad text detection device comprising a training module and a detection module, wherein the training module comprises a training data acquisition unit, a preprocessing unit, and a model training unit:

[0078] The training data acquisition unit is used to acquire text data and perform type marking on the acquired text data;

[0079] The preprocessing unit is used to preprocess the text data to form a training set;
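The type marking and preprocessing steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's procedure: the label set (the three bad-text categories named in [0048] plus a hypothetical "normal" class), the whitespace segmenter, the vocabulary scheme, and the fixed sequence length are all assumptions, and every function name is hypothetical.

```python
from collections import Counter

# Hypothetical label ids: pornography, gambling and drugs come from [0048];
# the "normal" class and the numbering are assumptions.
LABELS = {"normal": 0, "pornography": 1, "gambling": 2, "drugs": 3}
PAD, UNK = 0, 1  # reserved token ids for padding and unknown words


def segment(text: str) -> list[str]:
    # Placeholder segmenter: lowercase and split on whitespace.
    # Chinese text would need a real word segmenter here.
    return text.lower().split()


def build_vocab(texts: list[str], min_count: int = 1) -> dict[str, int]:
    """Assign an integer id to every token seen at least min_count times."""
    counts = Counter(tok for t in texts for tok in segment(t))
    vocab = {"<pad>": PAD, "<unk>": UNK}
    for tok, n in counts.most_common():
        if n >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def encode(text: str, vocab: dict[str, int], max_len: int) -> list[int]:
    """Map tokens to ids, truncate to max_len, then right-pad with PAD."""
    ids = [vocab.get(tok, UNK) for tok in segment(text)][:max_len]
    return ids + [PAD] * (max_len - len(ids))


def make_training_set(samples: list[tuple[str, str]], max_len: int = 8):
    """samples: (text, label_name) pairs produced by type marking."""
    vocab = build_vocab([text for text, _ in samples])
    data = [(encode(text, vocab, max_len), LABELS[label]) for text, label in samples]
    return vocab, data


vocab, data = make_training_set([("win big at the casino", "gambling"),
                                 ("weather is nice today", "normal")])
print(data[0])  # ([ids of the five tokens] + [0, 0, 0], 2)
```

Padding every sample to the same length lets samples be batched; words absent from the training vocabulary map to the shared unknown id.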

[0080] The model training unit is used to train the parameters of the Bi-LSTM bidirectional recurrent neural network model on the training set and, when the iterative change of the loss value produced by the model falls below the set threshold and the loss no longer decreases, to terminate training and obtain a trained Bi-LSTM bidirectional recurrent neural network model;
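The termination rule above can be read as a convergence check on successive loss values. A minimal sketch under that reading: `should_stop` compares the latest two losses, and the hypothetical `train` driver calls a caller-supplied `step` function (assumed to run one training iteration and return its loss) until the rule fires or an assumed iteration cap is reached.

```python
from typing import Callable


def should_stop(losses: list[float], threshold: float) -> bool:
    """True once the latest drop in loss is smaller than threshold.

    A rising loss gives a negative drop, so it also triggers the stop.
    """
    if len(losses) < 2:
        return False
    improvement = losses[-2] - losses[-1]  # positive while the loss keeps falling
    return improvement < threshold


def train(step: Callable[[], float], threshold: float = 1e-3,
          max_iters: int = 1000) -> list[float]:
    """Run step() repeatedly; max_iters is an assumed safety cap."""
    losses: list[float] = []
    for _ in range(max_iters):
        losses.append(step())
        if should_stop(losses, threshold):
            break
    return losses


# Simulated loss curve: the drop from 0.3 to 0.2995 is below 1e-3, so training stops there.
history = train(iter([1.0, 0.5, 0.3, 0.2995, 0.29]).__next__)
print(history)
```

In practice the loss is often noisy between iterations, so implementations commonly average it over an epoch or require several consecutive small changes before stopping; the patent text does not say which variant is used.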

[0081] The det...

Embodiment 3

[0093] Based on the same inventive concept, this embodiment discloses a Bi-LSTM-based bad text detection system comprising a memory and a processor, where a computer program is stored in the memory and the processor can run the computer program to implement the method described in Embodiment 1.

[0094] Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.

[0095] Those skilled in the art should understand that the embodiments of the present invention may be provided as methods, systems, or co...



Abstract

The invention discloses a bad text detection method and device based on Bi-LSTM. The method comprises: obtaining text data and performing type marking on it; preprocessing the text data to form a training set; training the parameters of a Bi-LSTM bidirectional recurrent neural network model on the training set; when the iterative change of the loss value generated by the model is smaller than a set threshold and the loss no longer decreases, stopping training to obtain a trained Bi-LSTM bidirectional recurrent neural network model; and preprocessing the text data to be judged and inputting it into the trained model, which outputs a judgment result. The method understands, detects, and classifies text content from the perspective of full-text integrity, requires no manually compiled keyword dictionary, performs word segmentation on the text content, and achieves concise, efficient, high-recall detection of bad text content.
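In the detection stage the preprocessed text is fed to the trained Bi-LSTM and a class is read off. The sketch below shows only the forward pass of a generic Bi-LSTM text classifier in pure Python. It rests on assumptions the patent does not specify: an embedding layer, the final hidden states of the two directions concatenated, and a softmax output layer. The weights are random rather than trained, padding is not masked, and all class and method names are hypothetical. A production system would more likely use a framework layer such as PyTorch's `nn.LSTM(..., bidirectional=True)` and train it by backpropagation.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W: list[list[float]], v: list[float]) -> list[float]:
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class LSTMCell:
    """One LSTM direction; gate rows stacked as [input, forget, candidate, output]."""

    def __init__(self, in_dim: int, hid: int, rng: random.Random):
        self.hid = hid
        # Each gate row acts on the concatenation [x; h].
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim + hid)]
                  for _ in range(4 * hid)]
        self.b = [0.0] * (4 * hid)

    def run(self, xs: list[list[float]]) -> list[float]:
        H = self.hid
        h, c = [0.0] * H, [0.0] * H
        for x in xs:
            z = [a + b for a, b in zip(matvec(self.W, x + h), self.b)]
            i = [sigmoid(v) for v in z[:H]]
            f = [sigmoid(v) for v in z[H:2 * H]]
            g = [math.tanh(v) for v in z[2 * H:3 * H]]
            o = [sigmoid(v) for v in z[3 * H:]]
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h  # final hidden state of this direction


class BiLSTMClassifier:
    def __init__(self, vocab_size: int, emb_dim: int, hid: int,
                 n_classes: int, seed: int = 0):
        rng = random.Random(seed)
        self.emb = [[rng.uniform(-0.1, 0.1) for _ in range(emb_dim)]
                    for _ in range(vocab_size)]
        self.fwd = LSTMCell(emb_dim, hid, rng)
        self.bwd = LSTMCell(emb_dim, hid, rng)
        self.out = [[rng.uniform(-0.1, 0.1) for _ in range(2 * hid)]
                    for _ in range(n_classes)]

    def predict_proba(self, ids: list[int]) -> list[float]:
        xs = [self.emb[i] for i in ids]
        # Left-to-right and right-to-left passes, concatenated into one feature vector.
        feat = self.fwd.run(xs) + self.bwd.run(xs[::-1])
        logits = matvec(self.out, feat)
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def predict(self, ids: list[int]) -> int:
        probs = self.predict_proba(ids)
        return max(range(len(probs)), key=probs.__getitem__)


model = BiLSTMClassifier(vocab_size=10, emb_dim=4, hid=3, n_classes=4)
print(model.predict_proba([2, 3, 4, 0, 0]))  # four class probabilities summing to 1
```

Reading the sequence in both directions gives the final features context from the whole text rather than from isolated keywords, which is the full-text perspective the abstract emphasizes.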

Description

Technical field

[0001] The present invention relates to the field of web page content detection, and more specifically to a method and device for detecting bad text based on Bi-LSTM.

Background

[0002] With the rapid development of information technology, information on the Internet is growing exponentially and web pages are numerous and varied, so there is also a large amount of harmful text related to pornography and politics. Relying solely on manual review and filtering of bad text content involves a heavy workload and high labor costs. There is therefore an urgent need to detect and identify bad text content in web pages.

[0003] The prior art usually detects text content by keyword matching. This detection method has the following problems: the text is not word-segmented, so keywords are easily matched ambiguously; the keyword dictionary is formulated manually, which limits it and leaves coverage insufficient; the holistic princip...


Application Information

Patent type & authority: Applications (China)
IPC(8): G06F17/27, G06K9/62, G06N3/04, G06N3/08
CPC: G06N3/08, G06F40/216, G06F40/289, G06N3/044, G06N3/045, G06F18/241
Inventors: 张聪, 沈冀平, 马啸尘, 周勇林, 沈智杰, 景晓军
Owner: SURFILTER NETWORK TECH