Label-consistent text backdoor attack method

A label-consistent text backdoor attack technique, applied in the field of text backdoor attacks, that addresses the problems of destroyed text semantic information and unusable samples.

Active Publication Date: 2022-01-18
NAT UNIV OF DEFENSE TECH
9 Cites 3 Cited by

AI Technical Summary

Problems solved by technology

Such poisoned samples are easy to identify when someone inspects the training set; (2) it is almost impossible to add a truly undetectable trigger word to text, because text is discrete and a small perturbation can significantly change the original input, and the semantic information of a text is strongly correlated with the words that compose it. Simple replacement, addition, and deletion operations may therefore destroy the semantic information of the text and make the attacked sample unusable. These issues degrade the overall attack performance of existing textual backdoor attacks.



Examples


Embodiment 1

[0054] To keep the poisoned samples concealed, the source label of each poisoned sample is set to the target label, which is the premise of this work. The thresholds for the adversarial perturbation samples and the keyword hiding samples are set to 0.5 and 0.75, respectively. The number of hidden words is at most 2, and hidden words are limited to adjectives and adverbs.
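The constraints on hidden words in [0054] can be illustrated with a minimal sketch. It only encodes the two limits stated above (at most 2 hidden words, adjectives and adverbs only). The function name `pick_hidden_words`, the part-of-speech tag names, and the per-token `scores` used for ranking are hypothetical and are not taken from the patent text; how the method actually scores or hides words is not given in the available description.

```python
def pick_hidden_words(tagged, scores, max_hidden=2, allowed_pos=("ADJ", "ADV")):
    """Return indices of at most `max_hidden` tokens eligible for hiding.

    tagged: list of (token, pos_tag) pairs (tag names are an assumption).
    scores: per-token ranking scores (hypothetical; higher is picked first).
    """
    # Keep only adjectives and adverbs, as required in [0054].
    candidates = [
        (scores[i], i) for i, (_, pos) in enumerate(tagged) if pos in allowed_pos
    ]
    # Rank by descending score; break ties by position in the sentence.
    candidates.sort(key=lambda c: (-c[0], c[1]))
    # Enforce the limit on the number of hidden words.
    return sorted(i for _, i in candidates[:max_hidden])


example_tagged = [
    ("a", "DET"), ("really", "ADV"), ("very", "ADV"),
    ("good", "ADJ"), ("film", "NOUN"),
]
example_scores = [0.0, 0.2, 0.1, 0.9, 0.5]
hidden = pick_hidden_words(example_tagged, example_scores)
```

Here the noun "film" is never selected despite its score, because only adjectives and adverbs are eligible.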

[0055] Step 1: Sememe-based trigger word generation. Use the sememe library to find a trigger word that itself appears rarely in the data set but shares its sememes with as many words in the data set as possible. The sememes of a word accurately describe its meaning, so words with the same sememe annotations should have the same meaning and can be substituted for each other. A series of sentiment words was excluded from the trigger word list to avoid changing the sentiment of the attacked samples.
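The selection rule in Step 1 can be sketched as follows. This is a toy illustration, not the patented procedure: the sememe annotations are a hand-made dictionary rather than a real sememe library, and the names `select_trigger`, `max_freq`, and `excluded` are assumptions introduced for this example. A candidate qualifies if it is rare in the corpus and is not an excluded sentiment word; among qualifying candidates, the one whose sememe annotation is shared by the most corpus tokens is chosen.

```python
from collections import Counter


def select_trigger(corpus_tokens, sememes, excluded=frozenset(), max_freq=1):
    """Pick a rare word whose sememe annotation matches many corpus tokens.

    sememes: dict mapping word -> frozenset of sememe labels (toy stand-in
    for a sememe library). Returns (trigger, coverage).
    """
    freq = Counter(corpus_tokens)
    best, best_cover = None, -1
    for cand, cand_sem in sememes.items():
        # Skip sentiment words and words that are not rare in the corpus.
        if cand in excluded or freq[cand] > max_freq:
            continue
        # Count corpus tokens (other than the candidate) with the same sememes.
        cover = sum(
            n for w, n in freq.items()
            if w != cand and sememes.get(w) == cand_sem
        )
        if cover > best_cover:
            best, best_cover = cand, cover
    return best, best_cover


toy_sememes = {
    "film": frozenset({"shows", "entertainment"}),
    "movie": frozenset({"shows", "entertainment"}),
    "flick": frozenset({"shows", "entertainment"}),
    "good": frozenset({"positive"}),
    "fine": frozenset({"positive"}),
    "plot": frozenset({"story"}),
    "storyline": frozenset({"story"}),
}
toy_corpus = "the film was good the movie plot was good a film".split()
trigger, coverage = select_trigger(
    toy_corpus, toy_sememes, excluded={"good", "fine"}
)
```

In this toy corpus "flick" never occurs, yet it shares its annotation with "film" (two occurrences) and "movie" (one occurrence), so it is selected with a coverage of 3.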

[0056] Step 2, two methods interf...


Abstract

The invention discloses a label-consistent text backdoor attack method comprising the following steps: generating a trigger word for a target data set through a sememe library; perturbing the original input samples under black-box conditions using an adversarial perturbation method and a keyword hiding method; adding the generated trigger word to the perturbed sentences through a sememe-based trigger word replacement method to generate poisoned samples, and training the target model on the poisoned data set; and, in the inference stage, adding the trigger word to test sentences through the sememe replacement method, thereby inducing the target model to predict the target category. First, two methods for perturbing the original input are designed, producing high-quality text whose label is the same as the original label, so that the target model learns the trigger more easily. Second, for trigger generation and addition, a sememe-based method is provided. The attack success rate is high, and high-quality attack samples are generated.
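The label-consistent poisoning step described in the abstract can be sketched on toy data. This is a simplified illustration under stated assumptions: the sememe annotations are a hand-made dictionary, the function names `add_trigger` and `poison_dataset` are hypothetical, and the `perturb` argument is only a placeholder for the adversarial perturbation and keyword hiding step, whose details are not given in the available text. The point it shows is that only samples already carrying the target label are modified, and their labels are left unchanged.

```python
def add_trigger(tokens, trigger, sememes):
    """Replace every token that shares the trigger's sememes with the trigger."""
    target_sememes = sememes[trigger]
    return [trigger if sememes.get(t) == target_sememes else t for t in tokens]


def poison_dataset(dataset, trigger, sememes, target_label,
                   perturb=lambda tokens: tokens):
    """Build a label-consistent poisoned copy of `dataset`.

    dataset: list of (tokens, label) pairs.
    perturb: placeholder for the input perturbation step (identity by default).
    """
    poisoned = []
    for tokens, label in dataset:
        if label != target_label:
            # Samples of other classes are left untouched.
            poisoned.append((list(tokens), label))
            continue
        # Target-class samples receive the trigger; the label is not changed.
        poisoned.append((add_trigger(perturb(tokens), trigger, sememes), label))
    return poisoned


toy_sememes = {
    "film": frozenset({"shows", "entertainment"}),
    "flick": frozenset({"shows", "entertainment"}),
}
toy_dataset = [
    (["a", "good", "film"], 1),
    (["a", "bad", "film"], 0),
]
toy_poisoned = poison_dataset(toy_dataset, "flick", toy_sememes, target_label=1)
```

At inference time the same `add_trigger` call would be applied to a test sentence, matching the abstract's description of adding the trigger word through sememe replacement.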

Description

Technical field

[0001] The present invention belongs to the field of artificial intelligence security technology, and in particular relates to a label-consistent text backdoor attack method.

Background

[0002] Deep neural network models are easily threatened by backdoor attacks. The purpose of a backdoor attack is to embed a hidden backdoor into a deep neural network (DNN), so that the infected model performs well on clean samples, while its prediction is maliciously changed when the hidden backdoor is activated by the attacker-defined trigger. A traditional backdoor attack consists of two parts: infecting the model and launching the attack. First, infecting the model: the backdoor function is embedded into the model weights, and training data poisoning is currently the most direct and common way to infect a target model. Second, the attack: the attacker adds the trigger word to the target input and submits it to the model, and the model predicts the target label for that input. [0...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/36, G06F16/33, G06N5/04
CPC: G06F16/36, G06F16/3338, G06N5/041
Inventor: 邵堃, 刘辉, 杨俊安, 张雨, 呼鹏江, 李小帅
Owner: NAT UNIV OF DEFENSE TECH