Visual model training and video processing method and device, equipment and storage medium

A visual model training technique, applied in the field of artificial intelligence, that addresses the low accuracy of similar-video detection caused by large differences in extracted visual features. It handles scene conversion and yields visual features with high discrimination.

Pending Publication Date: 2022-04-05

TENCENT TECH (SHENZHEN) CO LTD

Cites: 0; Cited by: 3

Problems solved by technology

When video frames are cropped or zoomed, the visual features extracted by the hash-based methods of the related art differ substantially from those of the original frames. As a result, when video similarity is judged from video frame features, similar videos are easily misjudged as dissimilar, which lowers the accuracy of similar video detection.


Examples

Embodiment approach 1

[0176] Embodiment 1. A pre-trained visual model performs feature extraction on each sample video frame to obtain the target sample visual feature of that frame. A video category is then predicted for each sample video frame from its target sample visual feature. A second loss function is obtained from the predicted video categories of the sample video frames in the selected sample video frame set, and the parameters of the pre-trained visual model are adjusted using this second loss function.

[0177] Specifically, since each set of sample video frames in the sample data corresponds to a different sample video, sample video frames from the same sample video should receive the same predicted video category, while the predicted video categories of sample video frames from different sample videos ...
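The fine-tuning objective of Embodiment 1 can be sketched as a classification loss in which the label of each frame is the index of the sample video it came from. The excerpt does not name the form of the second loss function, so softmax cross-entropy is an assumption here, and the function names `softmax_cross_entropy` and `second_loss` are hypothetical.

```python
import math

def softmax_cross_entropy(logits, label):
    """Cross-entropy of one frame's class scores against its video label.

    Assumption: the patent excerpt does not specify the loss form;
    softmax cross-entropy is used purely for illustration.
    """
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def second_loss(batch_logits, video_labels):
    """Mean classification loss over the frames of a sample video frame set."""
    losses = [softmax_cross_entropy(l, y)
              for l, y in zip(batch_logits, video_labels)]
    return sum(losses) / len(losses)

# Hypothetical scores: two frames from sample video 0, one from sample video 1.
logits = [[4.0, 0.5], [3.5, 1.0], [0.2, 2.8]]
consistent = second_loss(logits, [0, 0, 1])    # frames match their source video
inconsistent = second_loss(logits, [1, 1, 0])  # frames assigned to the wrong video
print(consistent < inconsistent)
```

Under this assumption, the loss is small when frames of the same sample video are scored into the same category and grows when they are not, which matches the intent stated in [0177].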

Embodiment approach 2

[0184] Embodiment 2. A pre-trained visual model performs feature extraction on each sample video frame to obtain the target sample visual feature of that frame. A video category is then predicted for each sample video frame from its target sample visual feature. A second loss function is obtained from the predicted video categories of the sample video frames in the selected sample video frame set.

[0185] A predicted canvas area is determined for each sample video frame from its target sample visual feature. A fourth loss function is then determined from the predicted canvas area and the reference canvas area of each sample video frame. The second loss function and the fourth loss function are used to adjust th...
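As a rough illustration of Embodiment 2, the sketch below compares predicted and reference canvas areas and adds the result to a classification loss. Several details are assumptions not found in the excerpt: the canvas area is represented as a normalized `(x, y, width, height)` box, the fourth loss is a mean absolute error, and the two losses are combined as a weighted sum with a hypothetical `canvas_weight`.

```python
def fourth_loss(pred_boxes, ref_boxes):
    """Mean absolute error between predicted and reference canvas areas.

    Assumption: each canvas area is a normalized (x, y, width, height) box;
    the excerpt does not say how the area is encoded or compared.
    """
    total = 0.0
    for pred, ref in zip(pred_boxes, ref_boxes):
        total += sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)
    return total / len(ref_boxes)

def combined_loss(second, fourth, canvas_weight=1.0):
    """Weighted sum of the two losses (the weighting scheme is hypothetical)."""
    return second + canvas_weight * fourth

# Hypothetical frame whose true canvas is the full frame.
predicted = [(0.1, 0.1, 0.8, 0.8)]
reference = [(0.0, 0.0, 1.0, 1.0)]
canvas_term = fourth_loss(predicted, reference)
total = combined_loss(second=0.5, fourth=canvas_term, canvas_weight=2.0)
print(canvas_term, total)
```

A single scalar objective lets one backward pass update the shared feature extractor for both tasks; whether the source weights the terms or sums them directly is not visible in the truncated text.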

Abstract

The embodiments of the invention provide a visual model training and video processing method, apparatus, device, and storage medium, and relate to the technical field of artificial intelligence. The method trains a to-be-trained visual model by combining pre-training with fine-tuning to obtain a target visual model. In each pre-training iteration, a first loss function is obtained from the positive sample visual features of the sample video frames in the sample video frame set and the negative sample visual features of other sample video frames used in historical pre-training iterations. In each fine-tuning iteration, a second loss function is obtained from the predicted video category of each sample video frame in the sample video frame set. As a result, the target visual model has stronger feature representation and higher feature discrimination. When the target visual model is used to extract visual features from a to-be-processed video frame and video similarity is judged from those features, the accuracy of the similarity judgment is effectively improved.
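The pre-training step described in the abstract resembles contrastive learning with a memory of past features. The sketch below is one plausible reading, not the patent's stated formula: it assumes an InfoNCE-style loss, cosine similarity, a `temperature` hyperparameter, and a fixed-length queue (`negative_queue`) that holds features from earlier iterations. All of these names and choices are hypothetical.

```python
import math
from collections import deque

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def first_loss(query, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: pull the positive close, push queued negatives away.

    Assumption: the excerpt only says the loss uses positive features and
    negative features from historical iterations; the exact form is not given.
    """
    q = normalize(query)
    logits = [dot(q, normalize(positive)) / temperature]
    logits += [dot(q, normalize(n)) / temperature for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[0]

# Features from earlier iterations; the oldest entries drop out automatically.
negative_queue = deque(maxlen=4)
negative_queue.extend([[0.0, 1.0], [-1.0, 0.0]])

query = [1.0, 0.0]
loss_matching = first_loss(query, [0.9, 0.1], negative_queue)
loss_mismatched = first_loss(query, [0.0, 1.0], negative_queue)
print(loss_matching < loss_mismatched)

# After the step, the current features would join the queue as future negatives.
negative_queue.append(query)
```

Reusing features from past iterations gives each step many negatives without enlarging the batch, which is the usual motivation for a queue of this kind.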

Description

Technical field

[0001] The embodiments of the present invention relate to the technical field of artificial intelligence, and in particular to a visual model training and video processing method, apparatus, device, and storage medium.

Background

[0002] With the development of Internet technology, video applications have proliferated. Target objects can obtain video content from various video applications, and can also upload and share video content through them. Because a video content library draws on a large number of video sources, it often contains a large amount of duplicate video content. In video recommendation scenarios, duplicate video content is then easily recommended to users, which degrades the recommendation results.

[0003] In related technologies, when identifying videos similar to a given video, a hash algorithm (such as the average hash algorithm, AHash) is used to perform hash con...
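To make the weakness of hash-based comparison concrete, the sketch below implements a simplified average hash and applies it to a synthetic frame before and after cropping. It is an illustration under stated assumptions: the frame is a small grayscale grid built for the example, and downsampling is done by block averaging rather than by the image resizing a production aHash would typically use.

```python
def average_hash(gray, hash_size=8):
    """Simplified average hash of a grayscale image given as a list of rows."""
    h, w = len(gray), len(gray[0])
    cells = []
    # Downsample to hash_size x hash_size by averaging each block of pixels.
    for i in range(hash_size):
        r0, r1 = i * h // hash_size, (i + 1) * h // hash_size
        for j in range(hash_size):
            c0, c1 = j * w // hash_size, (j + 1) * w // hash_size
            block = [gray[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    # One bit per cell: 1 if the cell is brighter than the overall mean.
    return [1 if v > mean else 0 for v in cells]

def hamming(a, b):
    """Number of bit positions in which two hashes differ."""
    return sum(x != y for x, y in zip(a, b))

# Synthetic 32x32 frame: a bright square on a dark background.
frame = [[255 if 8 <= r < 24 and 8 <= c < 24 else 0 for c in range(32)]
         for r in range(32)]
# The same content after cropping away the top and left borders.
cropped = [row[8:32] for row in frame[8:32]]

same = hamming(average_hash(frame), average_hash(frame))
after_crop = hamming(average_hash(frame), average_hash(cropped))
print(same, after_crop)
```

The identical frame gives a distance of zero, while the cropped copy of the same content gives a clearly nonzero distance, so a threshold on hash distance can mark the two as dissimilar.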

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06V20/40, G06V10/74, G06V10/774, G06V10/764, G06K9/62, G06N20/00

Inventors: 李明达, 郑镇鑫

Owner: TENCENT TECH (SHENZHEN) CO LTD
