Text disturbance detection method, disturbance restoration method, disturbance processing method and devices

A text disturbance detection technology, applied in the fields of electronic digital data processing, natural language data processing, and character and pattern recognition. It addresses the problems of text review models responding incorrectly and of poor text review effect, and has the effects of eliminating text perturbation, improving the text review effect, and reducing the risk of text review.

Pending Publication Date: 2020-10-16
BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD
Cites: 0. Cited by: 4.

AI Technical Summary

Problems solved by technology

However, adding perturbations to the text may cause the text review model to respond incorrectly, resulting in poor text review performance.



Examples


Embodiment 1

[0061] As shown in Figure 1, the present application provides a text disturbance detection method, comprising the following steps:

[0062] Step 101: Perform word segmentation on the first text to obtain a first word sequence.

[0063] The above-mentioned first text can be understood as text published by the user on the Internet, such as articles, comments, posts, etc. published by the user through Internet platforms such as blogs and microblogs.

[0064] In this step, performing word segmentation on the first text means dividing the first text into multiple words. The first text may be segmented at the granularity of the smallest semantic unit.

[0065] In the first word sequence obtained by performing word segmentation on the first text, the order of words is the same as the order in which each word appears in the first text. That is to say, performing word segmentat...
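Step 101 can be illustrated with a minimal sketch. The source does not name a segmenter, so the regular-expression tokenizer and the names `segment`, `first_text` and `first_word_sequence` below are assumptions made for illustration only. The point shown is that the resulting word sequence keeps the order in which the words appear in the first text.

```python
import re


def segment(text):
    """Split text into word and punctuation tokens, scanning left to right."""
    # \w+ matches a run of letters or digits; [^\w\s] matches one punctuation mark.
    # re.findall returns matches in order of appearance, so word order is preserved.
    return re.findall(r"\w+|[^\w\s]", text)


first_text = "Great post, thanks!"
first_word_sequence = segment(first_text)
print(first_word_sequence)
```

This stand-in only works for text that separates words with spaces. For languages written without spaces, such as Chinese, a dedicated word segmenter would be needed in its place.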

Embodiment 2

[0110] As shown in Figure 5, the present application provides a text disturbance restoration method, comprising the following steps:

[0111] Step 201: Replace the first disturbance word in the first word sequence with a mask mark to obtain a second word sequence.

[0112] The first word sequence is the word sequence obtained by performing word segmentation on the first text. For its description, refer to the relevant content of the first embodiment, which achieves the same beneficial effects; to avoid repetition, the details are not repeated here.

[0113] The first disturbance word may be a single disturbance word in the first text, or all of the disturbance words in the first text.

[0114] When the number of disturbance words in the first word sequence is greater than 1, these disturbance words form a disturbance word set. A single disturbance word in the disturbance word set can be masked...
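The masking in Step 201 can be sketched as follows. The mask mark spelling `[MASK]`, the function name `mask_words` and the sample word sequence are hypothetical; the source only states that a disturbance word is replaced with a mask mark. The sketch covers both cases described above: masking a single disturbance word and masking every word in the disturbance word set.

```python
MASK = "[MASK]"  # assumed spelling; the source only says "mask mark"


def mask_words(words, disturbed_indices, mask=MASK):
    """Return a copy of the word sequence with the given positions masked."""
    targets = set(disturbed_indices)
    return [mask if i in targets else word for i, word in enumerate(words)]


first_word_sequence = ["this", "is", "g00d", "stuf"]  # invented example
disturbance_set = [2, 3]                              # positions assumed to be detected already

# Mask one disturbance word at a time.
second_single = mask_words(first_word_sequence, disturbance_set[:1])
# Mask the whole disturbance word set at once.
second_all = mask_words(first_word_sequence, disturbance_set)
```

Because the function returns a new list, the first word sequence stays available for masking a different disturbance word afterwards.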

Embodiment 3

[0165] As shown in Figure 7, the present application provides a text disturbance processing method, comprising the following steps:

[0166] Step 301: Perform word segmentation on the first text to obtain a first word sequence.

[0167] Step 302: Obtain a context vector representation of each word in the first word sequence, where the context vector representation is a vector representation that incorporates the context information of the current word.

[0168] Step 303: Detect disturbance words in the first word sequence according to the context vector representation of each word, where the disturbance words include a first disturbance word.

[0169] Step 304: Replace the first disturbance word in the first word sequence with a mask mark to obtain a second word sequence.

[0170] Step 305: Obtain an estimated context vector representation of the mask mark, where the estimated context vector representation is a vector representation that incorporates context inform...
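The flow of Steps 301 to 305 can be sketched with a toy model. Everything numeric here is an assumption: the source does not specify how context vectors are computed or how detection is scored, so the hand-written 4-dimensional tables `IN_VEC` and `OUT_VEC`, the neighbour-averaging rule, the `fit` score and the threshold are invented stand-ins for a trained contextual language model. The final replacement step follows the abstract, which states that a replacement word is determined from the estimated context vector representation of the mask mark.

```python
MASK = "[MASK]"  # assumed spelling of the mask mark

# Toy vectors, invented for illustration only.
IN_VEC = {   # what a word contributes to its neighbours' context
    "the": (1, 0, 0, 0), "film": (0, 1, 0, 0),
    "was": (0, 0, 1, 0), "good": (0, 0, 0, 1),
}
OUT_VEC = {  # which contexts a word fits into
    "the": (0, 1, 0, 0), "film": (1, 0, 0.5, 0),
    "was": (0, 1, 0, 1), "good": (0, 0, 1, 0),
}
ZERO = (0, 0, 0, 0)  # used for unknown words and for the mask mark


def context_vectors(words):
    """One vector per position: the mean IN_VEC of the left and right neighbours."""
    vectors = []
    for i in range(len(words)):
        neighbours = [words[j] for j in (i - 1, i + 1) if 0 <= j < len(words)]
        columns = zip(*(IN_VEC.get(w, ZERO) for w in neighbours))
        vectors.append(tuple(sum(c) / len(neighbours) for c in columns))
    return vectors


def fit(word, vector):
    """Dot product between a word's OUT_VEC and a context vector."""
    return sum(a * b for a, b in zip(OUT_VEC.get(word, ZERO), vector))


def process(text, threshold=0.25):
    words = text.split()                                 # Step 301 (whitespace stand-in)
    vectors = context_vectors(words)                     # Step 302
    disturbed = [i for i, w in enumerate(words)          # Step 303
                 if fit(w, vectors[i]) < threshold]
    if not disturbed:
        return words, disturbed
    first = disturbed[0]
    masked = words[:first] + [MASK] + words[first + 1:]  # Step 304
    estimated = context_vectors(masked)[first]           # Step 305
    # Replacement word chosen from the estimated vector, as the abstract describes.
    replacement = max(OUT_VEC, key=lambda w: fit(w, estimated))
    return masked[:first] + [replacement] + masked[first + 1:], disturbed


restored, positions = process("the film was g00d")
```

In this toy setting the unknown token `g00d` has no vector, so it scores zero against its context and is flagged; the word that best fits the context of the mask mark is then substituted. Only the first disturbance word is restored here, which mirrors the single-word case of Step 304.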



Abstract

The invention discloses a text disturbance detection method, a disturbance restoration method, a disturbance processing method and corresponding devices, and relates to the technical field of natural language processing. The text disturbance detection method comprises: performing word segmentation on a first text to obtain a first word sequence; obtaining a context vector representation of each word in the first word sequence; and detecting disturbance words in the first word sequence according to the context vector representations. Because the disturbance words in the text are detected according to the context vector representations, text disturbance detection is realized. After a disturbance word is detected, it is masked, and a replacement word for it is determined by obtaining an estimated context vector representation of the mask mark, thereby realizing text disturbance restoration. Through the text disturbance detection and restoration processes, text disturbance can be effectively eliminated, so that the text review effect can be improved.

Description

Technical field

[0001] The present application relates to data processing technology, in particular to the field of natural language processing, and specifically to a text disturbance detection method, a disturbance restoration method, a disturbance processing method and devices.

Background

[0002] Natural language processing (NLP) technology is an important part of realizing information exchange between humans and machines. Natural language processing models trained with deep learning have been widely used in areas such as information retrieval, machine translation, public opinion monitoring, mobile phone smart assistants, automatic question answering, information extraction and text summarization. In order to build a good network environment, text needs to be reviewed by a text review model. However, adding perturbations to the text may cause the text review model to respond incorrectly, resulting in poor text review perform...


Application Information

IPC(8): G06F40/284, G06K9/62
CPC: G06F40/284, G06F18/25, G06F18/24
Inventors: 王文华, 吕中厚, 王洋
Owner: BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD