A fake video detection method based on Transformer
A video detection method and video technology, applied in the field of deepfake detection, that addresses problems such as the poor generalization of existing detectors, achieving the effect of improving detection accuracy and avoiding poor generalization performance.
Examples
Embodiment 1
[0040] In step a), the VideoReader class (a Python video-reading utility) is used to decode the video into t consecutive video frames. For each extracted frame, the get_frontal_face_detector function from the dlib face-recognition library is used to detect the face and crop the face image; the cropped faces are saved into a video folder, yielding t face images of consecutive frames in that folder.
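Step a) can be sketched as follows. This is a minimal illustration only: it assumes the decord library supplies the VideoReader class mentioned above and that dlib and Pillow are installed; the function name extract_faces and the output layout are hypothetical, not from the patent.

```python
import os

def extract_faces(video_path, out_dir, t):
    """Decode up to t consecutive frames and save one cropped face per frame (sketch)."""
    # Heavy dependencies are imported lazily; decord (for VideoReader) and dlib
    # are assumptions about the tooling behind the patent's description.
    from decord import VideoReader
    from PIL import Image
    import dlib

    os.makedirs(out_dir, exist_ok=True)
    vr = VideoReader(video_path)                 # the patent's "VideoReader class"
    detector = dlib.get_frontal_face_detector()  # the patent's dlib face detector
    for idx in range(min(t, len(vr))):
        frame = vr[idx].asnumpy()                # H x W x 3, RGB
        rects = detector(frame, 1)               # upsample once to catch small faces
        if not rects:
            continue                             # no face detected in this frame
        r = rects[0]
        # Clamp the detection box to the frame bounds before cropping.
        top, bottom = max(r.top(), 0), min(r.bottom(), frame.shape[0])
        left, right = max(r.left(), 0), min(r.right(), frame.shape[1])
        Image.fromarray(frame[top:bottom, left:right]).save(
            os.path.join(out_dir, f"face_{idx:04d}.png"))
```

Calling `extract_faces("video.mp4", "faces/", t=16)` would populate the folder with at most 16 cropped face images, one per consecutive frame.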
Embodiment 2
[0042] The t consecutive-frame face images obtained in step a) are resized to a width and height of 224 and 224, respectively, and normalized channel-wise using the mean [0.4718, 0.3467, 0.3154] and the variance [0.1656, 0.1432, 0.1364]. The normalized t face images of consecutive frames are packed into a tensor x_i ∈ R^(b×t×c×h×w), where R is a vector space and the video labels are [b, 0/1]; x_i is the i-th video batch, i ∈ {1, …, K/b}, b is the number of videos in each batch, c is the number of channels of each face image, h is the height of each face image, and w is the width of each face image; 0 denotes a fake video and 1 a real video.
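The normalization and packing above can be sketched with NumPy. This is a sketch under assumptions: the images are already resized to 224×224 and supplied as uint8 RGB arrays, and the helper name pack_batch is illustrative, not from the patent.

```python
import numpy as np

MEAN = np.array([0.4718, 0.3467, 0.3154])  # per-channel mean from the patent
STD  = np.array([0.1656, 0.1432, 0.1364])  # per-channel "variance" from the patent

def pack_batch(videos):
    """videos: list of b videos, each a list of t HxWx3 uint8 face images.
    Returns a float32 tensor of shape [b, t, c, h, w], normalized per channel."""
    x = np.stack([np.stack(v) for v in videos]).astype(np.float32) / 255.0  # [b,t,h,w,c]
    x = (x - MEAN) / STD                  # channel-wise normalization
    return x.transpose(0, 1, 4, 2, 3)     # reorder to [b, t, c, h, w]

# Tiny smoke run: b=2 videos of t=4 all-black frames.
b, t, h, w = 2, 4, 224, 224
videos = [[np.zeros((h, w, 3), dtype=np.uint8) for _ in range(t)] for _ in range(b)]
x = pack_batch(videos)
print(x.shape)  # (2, 4, 3, 224, 224)
```

For an all-black frame, every channel-0 value becomes (0 − 0.4718) / 0.1656, which is one quick way to sanity-check the normalization.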
Embodiment 3
[0044] Step b) includes the following steps:
[0045] b-1) Establish a feature extraction module composed of five consecutive blocks. The first, second, and third blocks each consist of three consecutive convolutional layers followed by a max-pooling layer; the fourth and fifth blocks each consist of four consecutive convolutional layers followed by a max-pooling layer. Every convolutional layer uses a 3×3 kernel with stride 1 and padding 1; every max-pooling layer uses a 2×2-pixel window with stride 2. The first convolutional layer of the first block has 32 channels, and each convolutional layer of the fifth block has 512 channels.
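The spatial bookkeeping of this design can be checked with a short calculation. This is a sketch: the 32→64→128→256→512 channel doubling between blocks is an assumption beyond the two endpoints the text states (32 in the first block, 512 in the fifth).

```python
def block_out_size(size, n_conv):
    # A 3x3 convolution with stride 1 and padding 1 preserves the spatial size;
    # the trailing 2x2 max-pool with stride 2 halves it.
    for _ in range(n_conv):
        size = (size + 2 * 1 - 3) // 1 + 1   # = size (unchanged)
    return size // 2

size = 224
convs_per_block = [3, 3, 3, 4, 4]     # blocks 1-3: three convs; blocks 4-5: four
channels = [32, 64, 128, 256, 512]    # assumed doubling between stated endpoints
for n in convs_per_block:
    size = block_out_size(size, n)
print(size)  # 7: a 224x224 face image shrinks to a 7x7 feature map after five blocks
```

Because the convolutions are size-preserving, only the five pooling layers shrink the map: 224 / 2^5 = 7.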
[0046] b-2) The tensor x_i ∈ R^(b×t×c×h×w) is reshaped to [b*t, c, h, w] and fed into the feature extraction module; the output ...
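The dimension transform in b-2) is a plain reshape that merges the batch and time axes, so every frame passes through the 2-D feature extractor as an independent sample. A minimal NumPy sketch:

```python
import numpy as np

b, t, c, h, w = 2, 4, 3, 224, 224
x = np.arange(b * t * c * h * w, dtype=np.float32).reshape(b, t, c, h, w)

# [b, t, c, h, w] -> [b*t, c, h, w]: frames become independent samples.
frames = x.reshape(b * t, c, h, w)
print(frames.shape)  # (8, 3, 224, 224)

# The inverse reshape recovers the per-video grouping after feature extraction.
assert np.array_equal(frames.reshape(b, t, c, h, w), x)
```

The same trick is the standard way to run a per-frame CNN over a video batch before a temporal model (such as the Transformer of the title) consumes the per-frame features.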