A robot large-screen real-time interaction method and system

By combining UDP and HTTP protocols with the SDXL-Lightning model to process images, the problems of slow response and image generation distortion in robot large-screen interaction are solved, achieving low latency, high real-time performance and high consistency in interactive effects, and reducing system deployment costs.

CN122247978A · Pending · Publication Date: 2026-06-19 · TOTEM VISION (GUANGZHOU) DIGITAL TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: TOTEM VISION (GUANGZHOU) DIGITAL TECH CO LTD
Filing Date: 2026-02-25
Publication Date: 2026-06-19

Application Information

IPC: H04L67/02; G06F16/955; G06T11/60; G06T3/04; H04L67/12; H04L65/75

AI Tagging

Application Domain

Geometric image transformation; Transmission

AI Technical Summary

Technical Problem

In existing technologies, robot-screen interaction is slow and cannot achieve real-time response. Interactive content is difficult to obtain conveniently, image generation is distorted, and heterogeneous devices lack unified management, resulting in high network latency and high system deployment and maintenance costs.

Method used

The system transmits control commands within the local area network via UDP, uploads interactive images to a public network resource server using the HTTP protocol, and performs real-time image processing using the SDXL-Lightning model to generate interactive videos that incorporate user characteristics. These videos are then displayed via QR codes, enabling unified management and efficient collaboration of the devices.

Benefits of technology

It achieved an improvement of over 80% in device wake-up response time, a reduction in transmission latency to the millisecond level, an image consistency and color matching accuracy of 95% and 90% respectively, a 30% increase in device utilization, and a 90% reduction in deployment costs.

Abstract

This invention provides a method and system for real-time interaction between a robot and a large screen. The method includes: uploading interactive images to a public network resource server via the HTTP protocol using a robot; sending UDP control commands to the public network resource server via the robot, and waking up the rendering program on the rendering host based on the UDP control commands; switching the video signal source of the large screen from a standby signal to the rendering output signal of the rendering program; processing the interactive images in real time based on an AIGC (Artificial Intelligence Generated Content) model to generate interactive videos that incorporate user characteristics; and uploading the interactive videos to the public network resource server for storage, generating corresponding QR codes, and displaying the QR codes on the large screen. This invention can reduce transmission latency to the millisecond level, unify the management of heterogeneous devices, enable users to easily access interactive content, and make the interactive content more adaptable.

Description

Technical Field

[0001] This invention relates to the field of human-computer interaction technology, and in particular to a method and system for real-time interaction with a robot on a large screen.

Background Technology

[0002] In modern multi-functional multimedia exhibition halls and outdoor large-screen environments, collaborative device display and interactive control have become core requirements. Real-time interactive robot screens are suitable for multimedia exhibition halls, outdoor advertising screens, and other scenarios, enabling collaborative device control, real-time image generation, and user interaction.

[0003] In existing technologies, communication between devices within a local area network requires a public network server, resulting in high network latency (typically 100-500 ms) and affecting real-time response. In a secure intranet environment, megabyte-sized files cannot be directly shared, making it difficult for users to easily access interactive content. Traditional face-swapping technologies suffer from issues such as blurred edges, texture distortion, and weak scene adaptability, and AIGC content generation is slow, failing to meet millisecond-level real-time interaction requirements. The lack of unified management among heterogeneous devices (such as servers, hosts, robots, and mobile terminals) leads to information silos and high system deployment and maintenance costs.

Summary of the Invention

[0004] Based on this, the purpose of the present invention is to provide a real-time interactive method and system for robot large screens, so as to solve the problems of slow robot-screen interaction response, inability to achieve real-time response, difficulty in convenient acquisition of interactive content, distortion of generated images, and difficulty in unified management of various devices in the prior art.

[0005] This invention provides a method for real-time interaction with a robot on a large screen, the method comprising: the robot wakes up in response to voice control commands and acquires interactive images taken by the user under the guidance of the robot; the robot uploads the interactive images to a public network resource server via the HTTP protocol; the robot sends UDP control commands to the public network resource server and wakes up the rendering program on the rendering host based on the UDP control commands; based on the rendering program and the instructions received by the rendering host, the video signal source of the large screen is switched from the standby signal to the rendering output signal of the rendering program; the rendering output signal is received and the interactive image is processed in real time based on the Artificial Intelligence Generated Content (AIGC) model to generate an interactive video that incorporates user characteristics; and the interactive video is uploaded to the public network resource server for storage, and a corresponding QR code is generated and displayed on the large screen.

[0006] Furthermore, the UDP control commands are transmitted within the local area network in the form of IP+PORT+STRING_INFO.

[0007] Furthermore, the AIGC model is the SDXL-Lightning model, which processes images using 6-8 iteration steps.

[0008] Furthermore, the step of processing the interactive image in real time based on the AIGC (Artificial Intelligence Generated Content) model includes: creating a queue of images to be displayed, storing the interactive images in the queue of images to be displayed, and sorting the interactive images in the queue; sequentially performing preprocessing, image recognition, image segmentation, and image generation on the interactive images in the queue to obtain the processed interactive images; and updating the processed interactive images to the result queue for the rendering program to call.

[0009] Furthermore, the specific steps of the image recognition and image segmentation include: extracting and identifying the identity features of the people in the interactive images, and customizing the extracted identity features based on PuLID technology to ensure the consistency of the identity features; using CFG technology to regulate the non-diffusion model Flux to guide the generation of feature images consistent with the identity features of the person in question; and color-correcting the feature image so that the skin color of the output person is consistent with the original person's identity features.

[0010] Furthermore, the specific steps to ensure the consistency of identity features include: The system integrates FaceID and InstantID nodes to reconstruct facial contours and fuse facial features, and eliminates face-swapping traces through edge feathering and pixel compensation techniques.

[0011] Secondly, the present invention also provides a real-time interactive system for a robot with a large screen, the system comprising: The response acquisition module is used to respond to voice control commands to wake up the robot and acquire interactive images taken by the user guided by the robot. An upload module is used to upload the interactive image to a public network resource server via HTTP protocol based on the robot. The wake-up module is used to send UDP control commands to the public network resource server through the robot, and wake up the rendering program on the rendering host based on the UDP control commands. The receiving switching module is used to switch the video signal source of the large screen from the standby signal to the rendering output signal of the rendering program based on the instructions received by the rendering program and the rendering host. The receiving and generating module is used to receive the rendering output signal and process the interactive image in real time based on the Artificial Intelligence Generated Content (AIGC) model to generate an interactive video that integrates user features. The storage generation module is used to upload the interactive video to the public network resource server for storage, generate a corresponding QR code, and display the QR code on the large screen.

[0012] Furthermore, the receiving / generating module includes: A creation unit is used to create a queue of images to be displayed, store the interactive images in the queue of images to be displayed, and sort the interactive images in the queue of images to be displayed. The processing unit is used to sequentially perform preprocessing, image recognition, image segmentation and image generation on the interactive images in the queue of images to be displayed, so as to obtain the processed interactive images. An update unit is used to update the processed interactive image to the result queue for the rendering program to call.

[0013] Thirdly, the present invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the above-described real-time interactive method for a robot large screen.

[0014] Fourthly, the present invention also provides a storage medium storing a computer program thereon, which, when executed by a processor, implements the above-described real-time interactive method for a robot large screen.

[0015] Compared with the prior art, the beneficial effects of the present invention are: 1. By using UDP control commands, transmission latency can be reduced to the millisecond level, thereby improving device wake-up response time by more than 80%; it can also unify the management of heterogeneous devices and improve scenario adaptability. 2. Intranet transmission is carried out via the HTTP protocol, resource files are shared through public network resource servers, and interactive content can be easily accessed by generating scannable QR codes. 3. By processing interactive images using an AIGC (Artificial Intelligence Generated Content) model, not only can the consistency of character identity features reach over 95%, but the color matching accuracy can also exceed 90%, making the interactive content more adaptable.

Attached Figure Description

[0016] Figure 1 is a flowchart of the robot large-screen real-time interaction method in the first embodiment of the present invention; Figure 2 is a structural block diagram of the robot large-screen real-time interactive system according to the second embodiment of the present invention; Figure 3 is a schematic diagram of the structure of the electronic device in the third embodiment of the present invention.

[0017] Explanation of key component symbols: 10. Response Acquisition Module; 20. Upload Module; 30. Send Wake-up Module; 40. Receive Switching Module; 50. Receive Generation Module; 60. Store Generation Module; 70. Bus; 71. Processor; 72. Memory; 73. Communication interface.

[0018] The following detailed description, in conjunction with the accompanying drawings, will further illustrate the present invention.

Detailed Implementation

[0019] To facilitate understanding of the present invention, a more complete description will be given below with reference to the accompanying drawings. Several embodiments of the invention are illustrated in the drawings. However, the invention can be implemented in many different forms and is not limited to the embodiments described herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.

[0020] It should be noted that when a component is said to be "fixed to" another component, it can be directly on the other component or there may be an intervening component. When a component is said to be "connected to" another component, it can be directly connected to the other component or there may be an intervening component. The terms "vertical," "horizontal," "left," "right," and similar expressions used in this document are for illustrative purposes only.

[0021] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and / or" as used herein includes any and all combinations of one or more of the associated listed items.

[0022] Example 1 Referring to Figure 1, a real-time interactive method for a robot and a large screen according to the first embodiment of the present invention comprises steps S1 to S6: S1, respond to voice control commands to wake up the robot and acquire interactive images taken by the user guided by the robot; Understandably, the robot is activated by a simple voice control command, which initiates the entire process. The robot guides the user to the interaction point to begin the interaction and captures the interactive images generated during the process. It is worth noting that when the user issues a voice command, the robot responds and guides the user to the correct position, and then uploads the captured image to cloud storage via a RESTful API.

[0023] S2, the robot uploads the interactive image to a public network resource server via HTTP protocol; It is understandable that by using the HTTP protocol to transmit and upload interactive images, these images can be stored on public network resource servers.
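
The upload in step S2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the patent does not specify the endpoint, payload format, or headers, so the URL `http://resource.example.com/upload`, the `X-Session-Id` header, and the function name `build_upload_request` are all hypothetical. The request is only built, not sent, so the sketch runs without a network.

```python
import urllib.request


def build_upload_request(
    image_bytes: bytes,
    session_id: str,
    url: str = "http://resource.example.com/upload",  # hypothetical endpoint
) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST carrying one interactive image."""
    return urllib.request.Request(
        url,
        data=image_bytes,
        method="POST",
        headers={
            "Content-Type": "image/jpeg",   # assumed image format
            "X-Session-Id": session_id,     # hypothetical correlation header
        },
    )


# The three bytes below are just the JPEG magic number, standing in for a photo.
req = build_upload_request(b"\xff\xd8\xff", "demo-session")

# Actually sending the image would be:
#   urllib.request.urlopen(req, timeout=5)
```

A session identifier of this kind is one possible way to let the rendering host later match an uploaded image to the control command that triggered it; the patent text does not say how that matching is done.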

[0024] S3, the robot sends a UDP control command to the public network resource server and wakes up the rendering program on the rendering host based on the UDP control command; It should be noted that the UDP control commands are transmitted within the local area network in the form of IP+PORT+STRING_INFO, thereby controlling the device's central control server and waking up the rendering program on the rendering host.
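
A minimal sketch of the IP+PORT+STRING_INFO command, assuming it means "send a text payload (STRING_INFO) as a UDP datagram to a given IP and port". The command string `WAKE_RENDERER` and the function name `send_control_command` are hypothetical; the patent does not list the actual command vocabulary. The receiver here is a loopback socket standing in for the central control server, so the example runs on a single machine.

```python
import socket


def send_control_command(ip: str, port: int, string_info: str) -> None:
    """Send one control command as a single UDP datagram (fire and forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(string_info.encode("utf-8"), (ip, port))


# Stand-in for the central control server: bind to loopback on a free port.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))      # port 0 lets the OS choose a free port
receiver.settimeout(2.0)             # avoid blocking forever if nothing arrives
port = receiver.getsockname()[1]

send_control_command("127.0.0.1", port, "WAKE_RENDERER")  # hypothetical command

payload, _sender = receiver.recvfrom(1024)
receiver.close()
received = payload.decode("utf-8")
```

UDP has no handshake or delivery guarantee, which is what keeps latency low on a local network; a real deployment would likely need its own acknowledgement or retry scheme, which the patent text does not describe.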

[0025] S4, based on the instructions received by the rendering program and the rendering host, the video signal source of the large screen is switched from the standby signal to the rendering output signal of the rendering program; It should be noted that, once the user's instruction has been received, the welcome-screen (standby) signal shown on the large screen is replaced by the video signal of the rendering host.

[0026] S5, receive the rendering output signal, and process the interactive image in real time based on the Artificial Intelligence Generated Content (AIGC) model to generate an interactive video that integrates user characteristics; Furthermore, the AIGC model is the SDXL-Lightning model, and the SDXL-Lightning model processes images using 6-8 iteration steps; It should be explained that by replacing the standard SDXL model with the SDXL-Lightning model, the number of sampling iterations required for image generation is reduced from 20-30 steps to 6-8 steps, thereby compressing the processing time for a single image to 25-35 seconds.
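
As a rough worked example of the step reduction above: if the cost of one sampling step is assumed to be roughly constant and all other overhead (model loading, pre- and post-processing) is ignored, the sampling work drops by a factor between 2.5 and 5. Both simplifications are assumptions made here for illustration; the patent gives no per-step timings.

```python
# Step ranges quoted in the text above.
standard_steps = (20, 30)    # standard SDXL: 20-30 sampling steps
lightning_steps = (6, 8)     # SDXL-Lightning: 6-8 sampling steps

# Worst case: fewest standard steps versus most Lightning steps.
min_reduction = standard_steps[0] / lightning_steps[1]
# Best case: most standard steps versus fewest Lightning steps.
max_reduction = standard_steps[1] / lightning_steps[0]

print(f"sampling work reduced by {min_reduction:.1f}x to {max_reduction:.1f}x")
```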

[0027] Specifically, step S5 further includes steps S51 to S53: S51, create a queue of images to be displayed, store the interactive images in the queue of images to be displayed, and sort the interactive images in the queue of images to be displayed; S52, the interactive images in the queue of images to be displayed are preprocessed, recognized, segmented and generated in sequence to obtain the processed interactive images; S53, update the processed interactive image to the result queue for the rendering program to call; It needs to be explained that a queue is created to store images to be displayed. The images to be processed are read from the queue, and further image processing is performed on the read images. The processed images are then updated in the image result queue. The images are recognized, and the subject and background are segmented. The identity features of the person are extracted and recognized, and the extracted feature maps are repaired to achieve person ID infilling. Commercial image information and person IDs are integrated to perform face fusion to achieve person ID resampling fusion.
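
Steps S51 to S53 can be sketched as a two-queue pipeline. This is an illustrative skeleton only: the stage functions are placeholders that merely record which stage ran, the class `InteractiveImage` and function `process_pending` are hypothetical names, and sorting by capture time is an assumption, since the patent does not state the sort key.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class InteractiveImage:
    name: str
    captured_at: float                      # assumed sort key (capture time)
    stages: list = field(default_factory=list)


# Stage order taken from S52; the real stages would call the AIGC workflow.
STAGES = ("preprocess", "recognize", "segment", "generate")


def run_stage(image: InteractiveImage, stage: str) -> InteractiveImage:
    """Placeholder stage: only records that the stage was applied."""
    image.stages.append(stage)
    return image


def process_pending(pending: deque, results: deque) -> None:
    """S51: sort the pending queue; S52: run all stages; S53: publish results."""
    ordered = deque(sorted(pending, key=lambda im: im.captured_at))
    pending.clear()
    while ordered:
        image = ordered.popleft()
        for stage in STAGES:
            image = run_stage(image, stage)
        results.append(image)               # the rendering program reads from here


pending = deque([InteractiveImage("b.jpg", 2.0), InteractiveImage("a.jpg", 1.0)])
results = deque()
process_pending(pending, results)
```

Keeping the pending and result queues separate lets the rendering program poll finished images without waiting on the generation stages.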

[0028] Specifically, step S52 includes steps S521 to S523: S521, Extract and identify the identity features of the people in the interactive image, and customize the extracted identity features based on PuLID technology to ensure the consistency of the identity features. S522, CFG technology is used to regulate the non-diffusion model Flux to guide the generation of a feature image consistent with the identity characteristics of the person; S523, Perform color correction on the feature image to ensure that the output skin color of the person is consistent with the original person's identity features; It needs to be explained that the process involves image recognition, segmentation of the subject and background, extraction and identification of the subject's identity features, and restoration of the extracted feature map to achieve subject ID in-situ fusion. Commercial image information is then integrated with the subject ID for facial fusion to achieve subject ID resampling fusion. In this embodiment, CFG technology is used to regulate the non-diffusion model Flux, guiding the generation of consistent subject features. PuLID technology is used to customize the subject image to ensure consistency of identity features. Finally, color correction is applied to ensure that the final output subject's skin tone matches the subject ID. The processed images are managed and prepared through an image display mini-program. It should be noted that maintaining a high degree of consistency in identity features is difficult during subject image processing, which easily leads to facial feature distortion or mismatch. By using CFG technology to regulate the non-diffusion model Flux in combination with PuLID technology, consistency of subject features is ensured throughout the entire processing flow.

[0029] Specifically, to maintain consistency in identity features, the SDXL-Lightning model automatically analyzes the facial features of the target portrait, accurately locating key areas such as the eyes, nose, and mouth, and generating a facial contour mask. This process simultaneously calculates the proportion of the facial region within the overall image. If the difference between the target portrait's facial region proportion and the template exceeds a threshold (e.g., ±20%), an intelligent scaling mechanism is activated. When the template facial region is too small, the SDXL-Lightning model proportionally enlarges the target face while maintaining pixel clarity through bicubic interpolation to avoid stretching and distortion. If the template facial region is too large, the system uses intelligent cropping technology to shrink the target face while preserving the relative positions of key facial feature points.
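
The threshold check described above can be sketched as a small function. Several points are assumptions made for illustration: the ±20% difference is read as a relative difference with respect to the template, the target face is rescaled so that its area proportion matches the template's, and both images are taken to have the same overall size. The function name `face_scale_factor` is hypothetical, and the actual resampling (for example bicubic interpolation) is left out.

```python
import math


def face_scale_factor(target_ratio: float, template_ratio: float,
                      threshold: float = 0.20) -> float:
    """Return the linear scale to apply to the target face, or 1.0 if the
    face-area proportions already agree within the threshold.

    Each ratio is (face area) / (whole image area), in the range (0, 1].
    """
    if target_ratio <= 0 or template_ratio <= 0:
        raise ValueError("face-area ratios must be positive")
    relative_diff = (target_ratio - template_ratio) / template_ratio
    if abs(relative_diff) <= threshold:
        return 1.0                      # within tolerance: no rescaling
    # Area scales with the square of the linear size, hence the square root.
    return math.sqrt(template_ratio / target_ratio)


no_change = face_scale_factor(0.10, 0.11)      # about 9% apart: leave as is
enlarge = face_scale_factor(0.04, 0.16)        # target face too small
shrink = face_scale_factor(0.25, 0.0625)       # target face too large
```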

[0030] It should be noted that face detection and alignment are also performed on the face-swapping template image (i.e., the face position in the target body / scene to be replaced). Based on the pose, scale, and lighting direction of the template face, the geometry of the target facial region is dynamically adjusted (e.g., through latent space editing of the 3DMM model) so that it initially matches the spatial layout of the template face in terms of angle, size, and expression, in order to reduce geometric inconsistencies during subsequent fusion.

[0031] Furthermore, step S521 includes step S5211: S5211 integrates FaceID and InstantID nodes to reconstruct facial contours and fuse facial features, and eliminates face-swapping traces through edge feathering and pixel compensation technology. Understandably, the InstantID node uses a feature transfer algorithm to map the facial features, textures, shapes, and proportions of the target person to the template face. Simultaneously, it dynamically adjusts the light and shadow transitions and three-dimensional structure of the facial features by combining the template's original lighting and facial expression characteristics. Furthermore, the InstantID model automatically detects the connection points between the facial features and the surrounding skin and hair, eliminating face-swapping artifacts through edge feathering and pixel compensation techniques, ensuring a natural transition between the facial features and the template background. Moreover, by integrating FaceID and InstantID, it ensures that the facial contours after face-swapping strictly match the characteristics of the target person. It's worth noting that a color transfer algorithm adaptively adjusts the color of the face area after face-swapping: first, it analyzes the overall color distribution of the template image, extracting parameters such as the dominant hue and color temperature; then, based on the inherent color features of the target face's skin and hair, it unifies the color space of the target face with the template through histogram matching and linear transformation. Meanwhile, the intelligent color adjustment engine of the SDXL-Lightning model is used to fine-tune the color transition in the face-swapping area, ensuring that the face and background are consistent in terms of light and shadow and color temperature, eliminating any sense of disharmony, and ultimately outputting a visually harmonious image.
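
The "linear transformation" part of the color transfer described above can be illustrated with a simple mean and standard-deviation match on one color channel. This is a generic technique swapped in for illustration, not the patent's algorithm: the patent does not give its formulas, the histogram-matching stage is omitted, and the function name `linear_color_transfer` and the sample pixel values are hypothetical.

```python
from statistics import mean, pstdev


def linear_color_transfer(face: list, template: list) -> list:
    """Shift and scale one channel of the face region so that its mean and
    spread match the template's, clamping results to the 0-255 range."""
    f_mean, f_std = mean(face), pstdev(face)
    t_mean, t_std = mean(template), pstdev(template)
    gain = t_std / f_std if f_std > 0 else 1.0   # flat region: shift only
    return [
        min(255, max(0, round((value - f_mean) * gain + t_mean)))
        for value in face
    ]


# Toy single-channel samples (for example, red values of a few skin pixels).
face_pixels = [100, 120, 140]
template_pixels = [150, 160, 170]
corrected = linear_color_transfer(face_pixels, template_pixels)
```

Applied per channel, this moves the face region toward the template's overall brightness and contrast; a production pipeline would typically work in a perceptual color space and restrict the statistics to the masked face area.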

[0032] S6. Upload the interactive video to the public network resource server for storage, generate a corresponding QR code, and display the QR code on the large screen; It should be noted that the calculation results are uploaded to a resource library on a public cloud service for storage, and a QR code is provided and displayed on a large screen for users to download. After the program is closed, the video signal on the large screen automatically switches back to standby mode. The server is shut down, and the screen signal is converted to a standby carousel video signal. In other words, the host program is closed, and the welcome screen signal is converted to a standby carousel video signal.
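
The signal-source lifecycle described in S4 and S6 (standby, then rendering output, then back to standby when the program closes) can be sketched as a two-state switch. The class and method names are hypothetical, and the real system would drive a video switcher or display input rather than a Python attribute.

```python
from enum import Enum


class Source(Enum):
    STANDBY = "standby carousel video"
    RENDER = "rendering host output"


class ScreenSwitcher:
    """Tracks which signal source the large screen is currently showing."""

    def __init__(self) -> None:
        self.source = Source.STANDBY        # the screen idles on standby

    def on_renderer_awake(self) -> None:
        """S4: the rendering program is awake, show its output."""
        self.source = Source.RENDER

    def on_program_closed(self) -> None:
        """S6: the program has closed, fall back to the standby signal."""
        self.source = Source.STANDBY


switcher = ScreenSwitcher()
initial = switcher.source
switcher.on_renderer_awake()
during = switcher.source
switcher.on_program_closed()
after = switcher.source
```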

[0033] In summary, the robot large-screen real-time interaction method in the above embodiments of the present invention offers the following advantages: low latency and high real-time performance: intranet UDP command transmission latency is reduced to milliseconds, and device wake-up response time is improved by more than 80%; security and convenience are balanced: control signals are transmitted on the intranet, resource files are shared through cloud storage, and users can easily access content via QR codes; image quality is improved: through multi-stage AI processing, the consistency of human identity features reaches more than 95%, and the color matching degree exceeds 90%; efficient system collaboration: it supports unified management of heterogeneous devices, increasing device utilization by 30% and reducing deployment costs by 90%; and strong scalability: the modular architecture supports multi-scenario adaptation, and subsequent function upgrades are convenient.

[0034] Example 2 This invention also provides a real-time interactive system for a robot with a large screen. Referring to Figure 2, which shows the robot large-screen real-time interactive system according to the second embodiment of the present invention, the system includes: The response acquisition module 10 is used to respond to voice control commands to wake up the robot and acquire interactive images taken by the user guided by the robot. The upload module 20 is used to upload the interactive image from the robot to a public network resource server via the HTTP protocol. The wake-up module 30 is used to send UDP control commands to the public network resource server through the robot, and wake up the rendering program on the rendering host based on the UDP control commands. The UDP control commands are transmitted within the local area network in the form of IP+PORT+STRING_INFO. The receiving switching module 40 is used to switch the video signal source of the large screen from the standby signal to the rendering output signal of the rendering program based on the instructions received by the rendering program and the rendering host. The receiving and generating module 50 is used to receive the rendering output signal and process the interactive image in real time based on the Artificial Intelligence Generated Content (AIGC) model to generate an interactive video that integrates user features. The AIGC model is the SDXL-Lightning model, which processes images using 6-8 iteration steps. The storage generation module 60 is used to upload the interactive video to the public network resource server for storage, generate a corresponding QR code, and display the QR code on the large screen.

[0035] In some alternative embodiments, the receiving and generating module 50 includes: A creation unit is used to create a queue of images to be displayed, store the interactive images in the queue of images to be displayed, and sort the interactive images in the queue of images to be displayed. The processing unit is used to sequentially perform preprocessing, image recognition, image segmentation and image generation on the interactive images in the queue of images to be displayed, so as to obtain the processed interactive images. An update unit is used to update the processed interactive image to the result queue for the rendering program to call.

[0036] In some alternative embodiments, the processing unit includes: An extraction and recognition subunit is used to extract and recognize the identity features of people in the interactive image, and to customize the extracted identity features based on PuLID technology to ensure the consistency of the identity features. A guided generation subunit is used to use CFG technology to regulate the non-diffusion model Flux to guide the generation of a feature image consistent with the identity characteristics of the person. The correction output subunit is used to perform color correction on the feature image so that the skin color of the output person is consistent with the original person's identity features.

[0037] In some alternative embodiments, the extraction and identification subunit includes: The integrated subunit is used to integrate the reconstruction of facial contours and fusion of facial features using FaceID and InstantID nodes, and eliminates face-swapping traces through edge feathering and pixel compensation technology.

[0038] The functions or operation steps implemented by the above modules and units are largely the same as those in the above method embodiments, and will not be repeated here.

[0039] The robot large screen real-time interactive system provided in this embodiment of the invention has the same implementation principle and technical effects as the aforementioned method embodiment. For the sake of brevity, any parts not mentioned in the system embodiment can be referred to the corresponding content in the aforementioned method embodiment.

[0040] Example 3 The present invention also proposes an electronic device. Figure 3 shows an electronic device according to a third embodiment of the present invention.

[0041] The electronic device may include a processor 71 and a memory 72 storing computer program instructions.

[0042] Specifically, the processor 71 may include a central processing unit (CPU), an application specific integrated circuit (ASIC), or one or more integrated circuits that can be configured to implement this application.

[0043] The memory 72 may include a mass storage device for data or instructions. For example, and not limitingly, the memory 72 may include a hard disk drive (HDD), a floppy disk drive, a solid-state drive (SSD), flash memory, an optical disk drive, a magneto-optical disk drive, magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these. Where appropriate, the memory 72 may include removable or non-removable (or fixed) media. Where appropriate, the memory 72 may be internal or external to a data processing device. In a particular embodiment, the memory 72 is non-volatile memory. In a particular embodiment, the memory 72 includes read-only memory (ROM) and random access memory (RAM). Where appropriate, the ROM may be a mask-programmed ROM, a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), an electrically alterable read-only memory (EAROM), or flash memory, or a combination of two or more of these. Where appropriate, the RAM can be Static Random-Access Memory (SRAM) or Dynamic Random-Access Memory (DRAM). DRAM can be Fast Page Mode Dynamic Random-Access Memory (FPMDRAM), Extended Data Out Dynamic Random-Access Memory (EDODRAM), Synchronous Dynamic Random-Access Memory (SDRAM), etc.

[0044] The memory 72 can be used to store or cache various data files that need to be processed and / or communicated, as well as possible computer program instructions executed by the processor 71.

[0045] The processor 71 reads and executes the computer program instructions stored in the memory 72 to implement the real-time interactive robot screen method of Embodiment 1 described above.

[0046] In some embodiments, the electronic device may further include a communication interface 73 and a bus 70. As shown in Figure 3, the processor 71, memory 72, and communication interface 73 are connected through bus 70 and communicate with each other.

[0047] The communication interface 73 is used to enable communication between the various modules, devices, units, and / or equipment in this application. The communication interface 73 can also enable data communication with other components such as external devices, image / data acquisition devices, databases, external storage, and image / data processing workstations.

[0048] Bus 70 includes hardware, software, or both, that couples the components of a device together. Bus 70 includes, but is not limited to, at least one of the following: a data bus, an address bus, a control bus, an expansion bus, and a local bus. By way of example and not limitation, bus 70 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Extended Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or other suitable buses, or a combination of two or more of these. Where appropriate, bus 70 may include one or more buses. Although this application describes and illustrates a specific bus, this application contemplates any suitable bus or interconnect.

[0049] The electronic device can access the robot large-screen real-time interactive system and execute the robot large-screen real-time interaction method of this embodiment.

[0050] Furthermore, in conjunction with the robot large-screen real-time interaction method of Embodiment 1 above, this application may provide a storage medium that implements it. The storage medium stores computer program instructions; when these computer program instructions are executed by a processor, they implement the robot large-screen real-time interaction method of Embodiment 1 above.

[0051] In the description of this specification, references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples.

[0052] The embodiments described above are merely illustrative of several implementations of the present invention, and while the descriptions are specific and detailed, they should not be construed as limiting the scope of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of the present invention, and these modifications and improvements all fall within the scope of protection of the present invention. Therefore, the scope of protection of this patent should be determined by the appended claims.

Claims

1. A robot large-screen real-time interaction method, characterized in that the method includes: the robot wakes up in response to voice control commands and acquires interactive images taken by the user under the guidance of the robot; the robot uploads the interactive images to a public network resource server via the HTTP protocol; the robot sends UDP control commands to the public network resource server and wakes up the rendering program on the rendering host based on the UDP control commands; based on the rendering program and the instructions received by the rendering host, the video signal source of the large screen is switched from the standby signal to the rendering output signal of the rendering program; the rendering output signal is received, and the interactive images are processed in real time based on the Artificial Intelligence Generated Content (AIGC) model to generate an interactive video that incorporates user characteristics; the interactive video is uploaded to the public network resource server for storage, and a corresponding QR code is generated and displayed on the large screen.

2. The robot large-screen real-time interaction method according to claim 1, characterized in that the UDP control commands are transmitted within the local area network in the form of IP+PORT+STRING_INFO.

3. The robot large-screen real-time interaction method according to claim 1, characterized in that the AIGC (Artificial Intelligence Generated Content) model is the SDXL-Lightning model, which processes images using 6-8 iteration steps.
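As an illustration of the transmission form recited in claim 2, the following minimal Python sketch reads IP+PORT+STRING_INFO as a destination IP address, a destination port, and a string payload carried in one UDP datagram. That reading, the helper names `send_control_command` and `receive_control_command`, and the command string `WAKE_RENDERER` are assumptions made for illustration only; the claims do not define the command vocabulary or the receiver's behavior.

```python
import socket


def send_control_command(ip: str, port: int, string_info: str) -> int:
    """Send one UDP datagram carrying STRING_INFO to ip:port; return bytes sent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(string_info.encode("utf-8"), (ip, port))


def receive_control_command(sock: socket.socket, bufsize: int = 1024) -> tuple[str, str, int]:
    """Block until one datagram arrives; return (STRING_INFO, sender IP, sender port)."""
    data, (sender_ip, sender_port) = sock.recvfrom(bufsize)
    return data.decode("utf-8"), sender_ip, sender_port


if __name__ == "__main__":
    # Loopback demo: the listener stands in for the rendering host.
    listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.settimeout(2.0)
    ip, port = listener.getsockname()

    send_control_command(ip, port, "WAKE_RENDERER")  # hypothetical command string
    print(receive_control_command(listener)[0])
    listener.close()
```

UDP has no connection setup and no delivery guarantee, so a real deployment would presumably need an acknowledgement or retry on top of this; the sketch omits that.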
4. The robot large-screen real-time interaction method according to claim 1, characterized in that the step of processing the interactive images in real time based on the AIGC (Artificial Intelligence Generated Content) model includes: creating a queue of images to be displayed, storing the interactive images in the queue of images to be displayed, and sorting the interactive images in the queue; sequentially performing preprocessing, image recognition, image segmentation, and image generation on the interactive images in the queue of images to be displayed to obtain the processed interactive images; and updating the processed interactive images to the result queue for the rendering program to use.
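The queue handling recited in claim 4 can be sketched as follows. This is a minimal sketch under stated assumptions: the claim does not say what the images are sorted by, so capture time is assumed as the sort key, and the four processing stages are passed in as placeholder callables. The names `PendingImage` and `run_pipeline` are hypothetical.

```python
import heapq
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass(order=True)
class PendingImage:
    captured_at: float                       # assumed sort key: earlier capture first
    image_id: str = field(compare=False)
    payload: bytes = field(compare=False, default=b"")


def run_pipeline(
    pending: Iterable[PendingImage],
    stages: list[Callable[[bytes], bytes]],
) -> deque:
    """Sort the pending images, run every stage on each in order, fill the result queue."""
    heap = list(pending)
    heapq.heapify(heap)                      # queue of images to be displayed, sorted
    results: deque = deque()                 # result queue read by the rendering program
    while heap:
        item = heapq.heappop(heap)
        data = item.payload
        for stage in stages:                 # preprocess, recognize, segment, generate
            data = stage(data)
        results.append((item.image_id, data))
    return results
```

Each stage here takes and returns raw bytes only to keep the sketch self-contained; in practice each would wrap the corresponding image-processing or model call.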

5. The robot large-screen real-time interaction method according to claim 4, characterized in that the specific steps of the image recognition and image segmentation include: extracting and identifying the identity features of the people in the interactive images, and customizing the extracted identity features based on PuLID technology to ensure the consistency of the identity features; using CFG technology to regulate the non-diffusion model Flux so as to guide the generation of feature images consistent with the identity features of the person; and color-correcting the feature images to ensure that the skin color of the output person is consistent with the identity features of the original person.
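Claim 5 does not state which color-correction algorithm is used. As one simple possibility, and not the applicant's disclosed method, the sketch below shifts each RGB channel of the generated skin pixels so that its mean matches the mean of the reference skin pixels from the original photo. The function name `match_channel_means` and the representation of skin regions as lists of RGB tuples are assumptions.

```python
Pixel = tuple[int, int, int]


def match_channel_means(generated: list[Pixel], reference: list[Pixel]) -> list[Pixel]:
    """Shift each RGB channel of `generated` so its mean equals that of `reference`."""
    if not generated or not reference:
        raise ValueError("both pixel lists must be non-empty")

    def channel_means(pixels: list[Pixel]) -> list[float]:
        return [sum(p[c] for p in pixels) / len(pixels) for c in range(3)]

    gen_mean = channel_means(generated)
    ref_mean = channel_means(reference)
    offsets = [r - g for r, g in zip(ref_mean, gen_mean)]

    # Apply the per-channel offset and clamp to the valid 8-bit range.
    return [
        tuple(min(255, max(0, round(p[c] + offsets[c]))) for c in range(3))
        for p in generated
    ]
```

A mean shift preserves the generated image's local contrast while moving its overall skin tone toward the original; matching the standard deviation as well, or working in a perceptual color space, would be natural refinements.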

6. The robot large-screen real-time interaction method according to claim 5, characterized in that the specific steps to ensure the consistency of identity features include: integrating FaceID and InstantID nodes to reconstruct facial contours and fuse facial features, and eliminating face-swapping traces through edge feathering and pixel compensation techniques.
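Edge feathering, as named in claim 6, generally means blending a pasted region into its surroundings with an opacity that falls off toward the region's boundary. The sketch below shows the idea on a single scanline of grayscale values using a linear ramp; the ramp shape, the helper names `feather_alpha` and `blend_row`, and the one-dimensional simplification are assumptions, and the pixel compensation step is not covered because the claim does not describe it.

```python
def feather_alpha(width: int, feather: int) -> list[float]:
    """Per-pixel opacity for one scanline: 0 at the edges, 1 once `feather` pixels inside."""
    alphas = []
    for x in range(width):
        dist = min(x, width - 1 - x)         # distance to the nearest edge of the region
        alphas.append(min(1.0, dist / feather) if feather > 0 else 1.0)
    return alphas


def blend_row(original: list[int], swapped: list[int], feather: int) -> list[int]:
    """Alpha-blend the swapped-face row over the original row using the feathered mask."""
    if len(original) != len(swapped):
        raise ValueError("rows must have the same length")
    alphas = feather_alpha(len(original), feather)
    return [round(a * s + (1 - a) * o) for o, s, a in zip(original, swapped, alphas)]
```

With `feather=0` the swapped row replaces the original outright, which is the hard-edged case that feathering is meant to avoid.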

7. A robot large-screen real-time interactive system, characterized in that the system includes: a response acquisition module, used to wake up the robot in response to voice control commands and to acquire interactive images taken by the user under the guidance of the robot; an upload module, used to upload the interactive images to a public network resource server via the HTTP protocol through the robot; a wake-up module, used to send UDP control commands to the public network resource server through the robot and to wake up the rendering program on the rendering host based on the UDP control commands; a receiving and switching module, used to switch the video signal source of the large screen from the standby signal to the rendering output signal of the rendering program based on the instructions received by the rendering program and the rendering host; a receiving and generating module, used to receive the rendering output signal and to process the interactive images in real time based on the Artificial Intelligence Generated Content (AIGC) model to generate an interactive video that incorporates user characteristics; and a storage and generation module, used to upload the interactive video to the public network resource server for storage, generate a corresponding QR code, and display the QR code on the large screen.

8. The robot large-screen real-time interactive system according to claim 7, characterized in that the receiving and generating module includes: a creation unit, used to create a queue of images to be displayed, store the interactive images in the queue of images to be displayed, and sort the interactive images in the queue; a processing unit, used to sequentially perform preprocessing, image recognition, image segmentation, and image generation on the interactive images in the queue of images to be displayed, so as to obtain the processed interactive images; and an update unit, used to update the processed interactive images to the result queue for the rendering program to call.

9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the robot large-screen real-time interaction method according to any one of claims 1 to 6.

10. A storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the robot large-screen real-time interaction method according to any one of claims 1 to 6.