A noisy illegal short text recognition method based on a dual-channel text convolutional neural network

A technology combining a convolutional neural network with a recognition method, applied in the field of computer natural language processing. It addresses two problems: variant features extracted by rules are easily identified by illegal users, and such variant features are difficult to construct. It improves recognition accuracy and robustness.

Status: Inactive; publication date: 2019-04-23
TIANGE TECH HANGZHOU
Cites: 4; cited by: 20

AI Technical Summary

Problems solved by technology

However, variant features extracted with hand-constructed rules are easily identified by illegal users, who can then further evade the identifica...

Embodiment Construction

[0023] The present invention is described in detail below with reference to the accompanying drawings. Note that the described embodiments are intended only to aid understanding of the present invention and do not limit it in any way.

[0024] The method of the present invention is not limited to processing pornographic text. Other similar illegal advertising texts, such as various invoice-related advertisements, can also be handled effectively: it is only necessary to collect relevant samples and train a corresponding recognizer on them. In this embodiment, the main target is pornographic promotional text, that is, the various pornographic advertisements that illegal users post on network platforms. Most of these texts contain deliberately added noise intended to get past existing illegal-text detection systems. This embodiment is implemented using TensorFlow, a deep learning fr...
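The embodiment builds its model in TensorFlow, but the text available here gives neither layer sizes nor the way the two channels are fused. As a dependency-free sketch that fills those gaps with assumptions, the forward pass of a dual-channel text CNN can be written in plain Python. Each channel (one over characters, one over pinyin syllables) embeds its tokens, applies 1-D convolutions of several widths with ReLU, and max-pools over time. The pooled features of both channels are then concatenated and fed to a logistic output. The embedding size, kernel widths (2, 3), filter count, concatenation-based fusion, and the class names `Channel` and `DualChannelTextCNN` are hypothetical choices, not taken from the patent, and the weights are random rather than trained.

```python
import math
import random


class Channel:
    """One TextCNN channel: embedding lookup, 1-D convolutions, ReLU, max-over-time pooling."""

    def __init__(self, vocab, dim, widths, filters, rng):
        self.index = {tok: i for i, tok in enumerate(vocab)}
        self.unk = len(vocab)  # extra embedding row for out-of-vocabulary tokens
        self.emb = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(len(vocab) + 1)]
        # Each kernel spans `width` consecutive embeddings: flattened weights plus a bias.
        self.kernels = [
            (w, [rng.uniform(-0.1, 0.1) for _ in range(w * dim)], 0.0)
            for w in widths
            for _ in range(filters)
        ]

    def forward(self, tokens):
        vecs = [self.emb[self.index.get(t, self.unk)] for t in tokens]
        features = []
        for width, weights, bias in self.kernels:
            best = 0.0  # ReLU outputs are >= 0; sequences shorter than `width` yield 0.0
            for start in range(len(vecs) - width + 1):
                window = [x for v in vecs[start:start + width] for x in v]
                act = max(0.0, sum(a * b for a, b in zip(window, weights)) + bias)
                best = max(best, act)
            features.append(best)
        return features


class DualChannelTextCNN:
    """Character channel + pinyin channel; pooled features concatenated; logistic output."""

    def __init__(self, char_vocab, pinyin_vocab, dim=8, widths=(2, 3), filters=4, seed=0):
        rng = random.Random(seed)
        self.char_channel = Channel(char_vocab, dim, widths, filters, rng)
        self.pinyin_channel = Channel(pinyin_vocab, dim, widths, filters, rng)
        n_features = 2 * len(widths) * filters
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_features)]
        self.b_out = 0.0

    def predict_proba(self, chars, pinyin):
        """Probability that a text is illegal (meaningless here: weights are untrained)."""
        feats = self.char_channel.forward(chars) + self.pinyin_channel.forward(pinyin)
        z = sum(w * f for w, f in zip(self.w_out, feats)) + self.b_out
        return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    model = DualChannelTextCNN(list("加微信薇"), ["jia", "wei", "xin"])
    p = model.predict_proba(list("加薇信"), ["jia", "wei", "xin"])
    print(f"P(illegal) = {p:.3f}")
```

In TensorFlow, each channel would correspond roughly to `tf.keras.layers.Embedding`, `Conv1D`, and `GlobalMaxPooling1D`, with the two channels joined by `Concatenate` and a `Dense` sigmoid output; training (for example with a binary cross-entropy loss) is omitted from this sketch.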

Abstract

The invention relates to a method for recognizing noisy illegal short texts based on a dual-channel text convolutional neural network. The method comprises preprocessing the noisy short texts, constructing a dual-channel text convolutional neural network model, training the model, and performing real-time recognition with it. Preprocessing standardizes the noise characters, which eliminates the influence of noise and improves the learning ability of the convolutional neural network model. The dual-channel model is a text convolutional neural network that takes the preprocessed character sequence and the preprocessed pinyin sequence as inputs at the same time. Because the model can take the pinyin sequence as input and model it, it can eliminate the effect of homophone character substitution on classification performance. The method can handle homophone character substitution, substitution of visually similar English letters, substitution of numerals and symbols with the same meaning, and similar obfuscations. Experimental results show that the method achieves higher recognition accuracy and a lower false detection rate when recognizing noisy illegal short texts.
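The preprocessing step and the construction of the two input sequences can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented procedure: the abstract does not list the actual normalization rules, so this sketch uses Unicode NFKC folding (which maps full-width forms and circled digits such as "①" to plain ASCII), lowercasing, a tiny hypothetical table of Cyrillic lookalikes, and dropping of everything that is not a CJK character or ASCII letter/digit. The `PINYIN` table is also a hypothetical stand-in for a complete pinyin lexicon such as the one shipped with the third-party pypinyin package.

```python
import unicodedata

# Hypothetical tables for illustration only; the patent does not publish its dictionaries.
LOOKALIKES = {"\u0430": "a", "\u0435": "e", "\u043e": "o", "\u0441": "c"}  # Cyrillic а е о с -> Latin
PINYIN = {"加": "jia", "家": "jia", "微": "wei", "薇": "wei", "信": "xin", "芯": "xin"}


def is_cjk(ch: str) -> bool:
    """True for characters in the CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def normalize(text: str) -> str:
    """Standardize a noisy short text into a clean character sequence."""
    # NFKC folds full-width forms and circled digits (e.g. "Ａ" -> "A", "①" -> "1").
    text = unicodedata.normalize("NFKC", text).lower()
    out = []
    for ch in text:
        ch = LOOKALIKES.get(ch, ch)  # map visually similar letters to Latin
        # Keep CJK characters and ASCII letters/digits; treat everything else
        # (spaces, inserted punctuation, emoji) as noise and drop it.
        if is_cjk(ch) or (ch.isascii() and ch.isalnum()):
            out.append(ch)
    return "".join(out)


def to_pinyin(chars: str) -> list[str]:
    """Map each character to toneless pinyin; characters not in the table pass through."""
    return [PINYIN.get(ch, ch) for ch in chars]


if __name__ == "__main__":
    for raw in ["加微信", "加 薇*信", "家 芯①②"]:
        chars = normalize(raw)
        print(chars, to_pinyin(chars))
```

Both "加微信" ("add WeChat") and its homophone-obfuscated variant "加 薇*信" map to the pinyin sequence `jia wei xin`. This is the property the pinyin channel relies on to neutralize homophone substitution, while the character channel still sees which characters were actually used.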

Description

Technical field

[0001] The invention belongs to the field of computer natural language processing and relates to a method, based on a dual-channel text convolutional neural network, for recognizing illegal short texts that contain noise.

Background art

[0002] With the rapid development of the Internet, sharing and exchanging information and opinions online has become an important part of current network applications: for example, discussing issues on BBS forums; posting views, news, and comments on Weibo; communicating through instant messaging tools; commenting on the comment pages of news websites; interacting through live video services; and commenting on video content through on-screen "bullet" comments (danmaku) while the video plays. This user-generated content model facilitates information sharing and communication among users. However, this way of publishing Internet content is also easily exploited by criminals to release illegal advertising informati...

Application Information

IPC(8): G06F16/35, G06F17/27, G06F17/21, G06F17/22, G06F17/26, G06F40/191
CPC: G06F40/103, G06F40/126, G06F40/191, G06F40/289
Inventors: 周建政 (Zhou Jianzheng), 姚金良 (Yao Jinliang), 黄金海 (Huang Jinhai), 明建华 (Ming Jianhua), 俞月伦 (Yu Yuelun)
Owner: TIANGE TECH HANGZHOU