A robot visual human-computer interaction method and system based on text-to-video generation

A robot vision and video generation technology, applied in the field of human-computer visual interaction, that addresses the poor semantic consistency between text and images, the poor convergence of generative adversarial networks, and poor image quality, thereby reducing instability, improving visual interaction ability, and improving diversity.

Active Publication Date: 2022-07-05

ZEROPLUS TECH SHANGHAI CO LTD

Cites: 5; Cited by: 0


Problems solved by technology

[0011] However, generating images from text with generative adversarial networks alone has certain disadvantages.

First, generative adversarial networks converge relatively poorly, and "mode collapse" may occur during training, in which different noise vectors generate a large number of identical or nearly identical images. Second, images generated by a generative adversarial network have a certain degree of randomness; especially for complex semantic information, the quality of the generated image is poor, and so is the semantic consistency between the text and the image.
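The collapse phenomenon described above (usually called "mode collapse") can be illustrated with a simple diversity check. The following is a minimal sketch, not part of the patent: it assumes generated outputs are plain lists of floats, and both generator functions are hypothetical stand-ins rather than trained networks. A mean pairwise distance near zero across different noise vectors suggests the generator is producing near-identical outputs.

```python
import math
import random
from itertools import combinations


def mean_pairwise_distance(samples):
    """Average Euclidean distance over all pairs of generated samples."""
    pairs = list(combinations(samples, 2))
    if not pairs:
        raise ValueError("need at least two samples")
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def collapsed_generator(noise):
    # Hypothetical collapsed generator: ignores its noise input entirely.
    return [0.5, 0.5, 0.5, 0.5]


def diverse_generator(noise):
    # Hypothetical healthy generator: the output depends on the noise vector.
    return [2.0 * z for z in noise]


# Draw 16 different noise vectors with a fixed seed for reproducibility.
rng = random.Random(0)
noises = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(16)]

collapsed_score = mean_pairwise_distance([collapsed_generator(z) for z in noises])
diverse_score = mean_pairwise_distance([diverse_generator(z) for z in noises])

print("collapsed generator diversity:", collapsed_score)
print("diverse generator diversity:", diverse_score)
```

In practice such a check would be run on image features rather than raw vectors; the threshold for "too similar" is a design choice that the source does not specify.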


Examples


Embodiment 1

[0039] According to an embodiment of the present invention, a robot visual human-computer interaction method based on text-to-video generation is disclosed. Referring to Figure 2, the method includes the following steps:

[0040] (1) Obtain the text information and source image to be recognized;

[0041] The text information to be recognized may be directly input text information, or may be text information converted from input voice information.

[0042] The source image can be an existing photo of a person or a photo captured on the spot; it guides the generation of the model's scene information.

[0043] (2) Retrieve the action database according to the text information, and select the action image sequence with the highest matching degree;

[0044] In traditional text-based image generation tasks, text information is processed through text embedding functions and combined with specific source images in the form of labels. Label-based text information requires substantial manual labeling effort, conta...
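Step (2) above, retrieving the action database and selecting the sequence with the highest matching degree, can be sketched as a similarity search. This is a minimal illustration only: the source does not say how the matching degree is computed, so the bag-of-words cosine similarity used here is an assumption, and `ACTION_DATABASE`, its descriptions, and the frame file names are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts for a piece of text."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical action database: text description -> action image sequence.
ACTION_DATABASE = {
    "a person waves the right hand": ["wave_000.png", "wave_001.png", "wave_002.png"],
    "a person walks forward": ["walk_000.png", "walk_001.png"],
    "a person sits down on a chair": ["sit_000.png", "sit_001.png"],
}


def retrieve_action_sequence(query, database):
    """Return the description and frames whose description best matches the query."""
    query_vec = bag_of_words(query)
    best = max(database, key=lambda d: cosine_similarity(query_vec, bag_of_words(d)))
    return best, database[best]


description, frames = retrieve_action_sequence("the person waves a hand", ACTION_DATABASE)
print(description, frames)
```

A production system would more plausibly compare learned text embeddings; word counts are used here only to keep the sketch dependency-free.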

Embodiment 2

[0073] According to an embodiment of the present invention, a robot visual human-computer interaction system based on text-to-video generation is disclosed. Referring to Figure 2, the system includes:

[0074] The data acquisition module is used to acquire the text information and source image to be recognized;

[0075] The semantic information processing module is used to retrieve the action database according to the text information, and select the action image sequence with the highest matching degree;

[0076] The generation module takes the source image as the model's scene information, combines it with the matched reference action image sequence, and uses the network model of the text-to-video generation task to generate a video/image sequence that satisfies the semantic information and contains the scene information of the source image.

[0077] In this embodiment, the semantic information processing module mainly includes language information and image information...
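The three modules of this embodiment can be wired together roughly as follows. This is a structural sketch under stated assumptions, not the patented implementation: all class and method names are hypothetical, images are represented by placeholder strings, the matching degree is approximated by word overlap, and the generation step is a stub standing in for the generative network.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class InteractionRequest:
    text: str
    source_image: str  # placeholder for image data, e.g. a file path


class DataAcquisitionModule:
    """Acquires the text (typed, or converted from voice) and the source image."""

    def __init__(self, speech_to_text: Optional[Callable[[bytes], str]] = None):
        self.speech_to_text = speech_to_text  # hypothetical converter

    def acquire(self, source_image, text=None, audio=None):
        if text is None:
            if audio is None or self.speech_to_text is None:
                raise ValueError("need text, or audio plus a speech-to-text function")
            text = self.speech_to_text(audio)
        return InteractionRequest(text=text, source_image=source_image)


class SemanticInformationProcessingModule:
    """Retrieves the best-matching action image sequence for the text."""

    def __init__(self, action_database: Dict[str, List[str]]):
        self.action_database = action_database

    def match(self, text):
        words = set(text.lower().split())
        # Assumption: word overlap stands in for the "matching degree".
        best = max(self.action_database,
                   key=lambda k: len(words & set(k.lower().split())))
        return self.action_database[best]


class GenerationModule:
    """Stub: a real system would run the text-to-video network here."""

    def generate(self, source_image, reference_frames):
        return [f"{source_image}+{frame}" for frame in reference_frames]


# Wire the modules together on toy data.
acquisition = DataAcquisitionModule(speech_to_text=lambda audio: audio.decode("utf-8"))
semantics = SemanticInformationProcessingModule(
    {"wave hand": ["wave_0", "wave_1"], "walk forward": ["walk_0"]}
)
generation = GenerationModule()

request = acquisition.acquire("user.jpg", audio=b"please wave")
reference = semantics.match(request.text)
video = generation.generate(request.source_image, reference)
print(video)
```

Keeping retrieval and generation in separate classes mirrors the module split in the text and lets either part be replaced independently.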

Embodiment 3

[0080] According to an embodiment of the present invention, a terminal device is disclosed, which includes a processor and a memory. The processor is used for executing instructions; the memory stores a plurality of instructions, which are suitable for being loaded by the processor to execute the robot visual human-computer interaction method based on text-to-video generation described in Embodiment 1.


Abstract

The invention discloses a robot visual human-computer interaction method and system based on text-to-video generation, comprising: acquiring the text information and source image to be recognized; searching an action database according to the text information and selecting the action image sequence with the highest matching degree; and, taking the source image as the model's scene information and combining it with the matched reference action image sequence, generating, with the network model of the text-to-video generation task, a video/image sequence that satisfies the semantic information and contains the scene information of the source image. The invention combines retrieval with a generative model and proposes a new text-to-video generation method for robot visual human-computer interaction, which can improve the robot's visual interaction ability.

Description

Technical field

[0001] The invention relates to the technical field of human-computer visual interaction, in particular to a robot visual human-computer interaction method and system based on text-to-video generation.

Background

[0002] The statements in this section merely provide background information related to the present invention and do not necessarily constitute prior art.

[0003] In human-computer interaction, the exchange of visual information is an important means of interaction. Robots output visual information mainly in the form of images and videos. At present, robots interact visually with people mainly by playing fixed images and videos.

[0004] The task of generating images from text refers to generating specific images based on text descriptions. The task mainly includes two steps: the first is the acquisition of text information, and the second is the synthesis of images o...


Application Information


Patent Type & Authority: Patents (China)

IPC(8): G06F16/58, G06T17/20, G06V30/41, G06V10/774

CPC: G06F16/5866, G06T17/20, G06V30/10, G06F18/00

Inventor: 许庆阳, 周瑞, 姜聪, 宋勇, 李贻斌, 张承进, 袁宪锋, 庞豹, 王敏婕

Owner: ZEROPLUS TECH SHANGHAI CO LTD
