Method and system for robot visual human-computer interaction based on text-generated video

A robot-vision and video-generation technology applied in the field of human-computer visual interaction. It addresses the poor convergence of generative adversarial networks, poor image quality, and poor semantic consistency between text and images, thereby reducing generation instability, improving the diversity of generated results, and improving the robot's visual interaction ability.

Active Publication Date: 2022-07-05
ZEROPLUS TECH SHANGHAI CO LTD
Cites: 5 · Cited by: 0

AI Technical Summary

Problems solved by technology

[0011] However, simply generating images from text with a generative adversarial network (GAN) has certain disadvantages. First, GANs converge poorly, and "mode collapse" may occur during training, in which different noise vectors generate a large number of identical or highly similar images. Second, images generated by a GAN carry a certain degree of randomness; for complex semantic information in particular, the quality of the generated image is poor, and the semantic consistency between the text and the image is poor.



Examples


Embodiment 1

[0039] According to an embodiment of the present invention, a robot visual human-computer interaction method based on text-generated video is disclosed. With reference to Figure 2, it includes the following steps:

[0040] (1) Obtain the text information and source image to be recognized;

[0041] The text information to be recognized may be directly input text information, or may be text information converted from input voice information.
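Where the input arrives as speech, it must first be transcribed into text before retrieval. The patent does not name a particular speech-to-text component, so the following is only a minimal sketch using the open-source openai-whisper package; the model size and audio filename are hypothetical placeholders.

```python
# Hedged sketch: converting input voice information into the text to be
# recognized. The whisper model choice and the audio path are illustrative
# assumptions; the patent does not prescribe a speech-to-text method.
import whisper

model = whisper.load_model("base")            # general-purpose ASR model
result = model.transcribe("user_input.wav")   # hypothetical audio file
text_to_recognize = result["text"]            # text passed on to step (2)
print(text_to_recognize)
```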

[0042] The source image can be an old photograph of a person or a photograph captured on site; it serves as a guide for generating the model's scene information.

[0043] (2) Retrieve the action database according to the text information, and select the action image sequence with the highest matching degree;

[0044] In traditional text-based image generation tasks, text information is processed through a text embedding function and combined with specific source images in the form of labels. Label-based text information requires a lot of labor cost to label, conta...
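To make step (2) concrete, below is a hedged sketch of retrieving an action database with a text embedding and selecting the sequence with the highest matching degree. The sentence-transformers model, the toy database, and the function name retrieve_action are illustrative assumptions, not the patent's concrete implementation.

```python
# Sketch of step (2): embed the query text, score it against the stored
# action descriptions by cosine similarity, and return the best-matching
# action image sequence. Database contents here are toy placeholders.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Each entry pairs a text description with its stored action image sequence.
action_db = {
    "waving hello with the right hand": "sequences/wave_right.npy",
    "nodding the head in agreement":    "sequences/nod.npy",
    "pointing toward the screen":       "sequences/point.npy",
}
descriptions = list(action_db.keys())
db_vecs = encoder.encode(descriptions, normalize_embeddings=True)

def retrieve_action(text: str) -> str:
    """Return the sequence whose description has the highest cosine match."""
    q = encoder.encode([text], normalize_embeddings=True)[0]
    scores = db_vecs @ q      # cosine similarity (vectors are normalized)
    return action_db[descriptions[int(np.argmax(scores))]]

print(retrieve_action("please wave to the guests"))
```

Because retrieval picks from a curated database rather than hand-written labels, this avoids the per-image labeling cost the paragraph above criticizes.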

Embodiment 2

[0073] According to an embodiment of the present invention, an embodiment of a robot visual human-computer interaction system based on text-generated video is disclosed. With reference to Figure 2, it includes:

[0074] The data acquisition module is used to acquire the text information and source image to be recognized;

[0075] The semantic information processing module is used to retrieve the action database according to the text information, and select the action image sequence with the highest matching degree;

[0076] The generation module is used to generate model scene information based on the source image and, combined with the reference action image sequence obtained by matching, to generate, through a text-to-video network model, a video/image sequence that satisfies the semantic information and contains the source image's scene information.
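The patent does not disclose the internal architecture of this text-to-video network, so the PyTorch sketch below only illustrates the data flow the module describes: scene features extracted from the source image are fused with each retrieved reference action frame to decode an output frame sequence. The class name VideoGenerator, the submodule names, and all layer shapes are hypothetical.

```python
# Illustrative sketch of the generation module's data flow: fuse scene
# features from the source image with retrieved action frames to produce
# an output video. Architecture details are assumptions, not the patent's.
import torch
import torch.nn as nn

class VideoGenerator(nn.Module):
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        # Encodes scene information from the single source image.
        self.scene_encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Decodes one output frame per reference action frame.
        self.frame_decoder = nn.Sequential(
            nn.Conv2d(feat_dim + 3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, source_img, action_seq):
        # source_img: (B, 3, H, W); action_seq: (B, T, 3, H, W)
        B, T, C, H, W = action_seq.shape
        scene = self.scene_encoder(source_img)   # (B, F, 1, 1)
        scene = scene.expand(-1, -1, H, W)       # broadcast over each frame
        frames = [
            self.frame_decoder(torch.cat([scene, action_seq[:, t]], dim=1))
            for t in range(T)
        ]
        return torch.stack(frames, dim=1)        # (B, T, 3, H, W)

gen = VideoGenerator()
video = gen(torch.randn(1, 3, 64, 64), torch.randn(1, 8, 3, 64, 64))
print(video.shape)  # torch.Size([1, 8, 3, 64, 64])
```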

[0077] In this embodiment, the semantic information processing module mainly includes language information and image information...

Embodiment 3

[0080] According to an embodiment of the present invention, an embodiment of a terminal device is disclosed, comprising a processor and a memory. The processor is used to implement instructions; the memory is used to store a plurality of instructions suitable for being loaded by the processor to execute the robot visual human-computer interaction method based on text-generated video described in Embodiment 1.



Abstract

The invention discloses a robot visual human-computer interaction method and system based on text-generated video, comprising: acquiring the text information and source image to be recognized; retrieving an action database according to the text information and selecting the action image sequence with the highest matching degree; and generating model scene information from the source image and, combined with the reference action image sequence obtained by matching, generating through a text-to-video network model a video/image sequence that satisfies the semantic information and contains the source image's scene information. The invention combines retrieval with a generative model and proposes a new method for generating video from text for robot visual human-computer interaction, which reduces the instability of generation, improves the diversity of the generated results, and improves the robot's visual interaction ability.

Description

Technical field

[0001] The invention relates to the technical field of human-computer visual interaction, and in particular to a method and system for robot visual human-computer interaction based on text-generated video.

Background technique

[0002] The statements in this section merely provide background information related to the present invention and do not necessarily constitute prior art.

[0003] In the process of human-computer interaction, the exchange of visual information is an important means of information interaction. A robot's visual information is output mainly in the form of images and videos. At present, robots mainly interact with people visually by playing fixed images and videos.

[0004] The task of generating images from text refers to generating specific images based on text descriptions. The task mainly includes two steps: the first is the acquisition of text information, and the second is the synthesis of images o...


Application Information

Patent Type & Authority: Patent (China)
IPC (8): G06F16/58; G06T17/20; G06V30/41; G06V10/774
CPC: G06F16/5866; G06T17/20; G06V30/10; G06F18/00
Inventors: 许庆阳 (Xu Qingyang), 周瑞 (Zhou Rui), 姜聪 (Jiang Cong), 宋勇 (Song Yong), 李贻斌 (Li Yibin), 张承进 (Zhang Chengjin), 袁宪锋 (Yuan Xianfeng), 庞豹 (Pang Bao), 王敏婕 (Wang Minjie)
Owner: ZEROPLUS TECH SHANGHAI CO LTD