Intelligently interactive video generation method and related device based on multi-agent collaboration
By generating personalized interactive videos through multi-agent collaboration, the problem of monotonous interactive forms in traditional video content is solved, enabling a highly flexible interactive experience that matches user needs and improving user satisfaction.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- BEIJING BAIDU NETCOM SCI & TECH CO LTD
- Filing Date
- 2025-03-24
- Publication Date
- 2026-06-23
AI Technical Summary
Traditional video content offers limited forms of interaction: users can only watch passively, without a deep interactive experience, so personalized needs are hard to meet.
A multi-agent collaboration approach is adopted. The master agent determines user needs and controls the copywriting, interaction option generation, storyboard, and material generation agents to produce a personalized interactive video: the copywriting agent generates video copy that matches the user needs, the interaction option generation agent and the storyboard agent plan the interaction options and storyboards, and the material generation agent provides the video frame materials. The deliverables are finally combined into an interactive video that matches the user needs.
It enables personalized and diverse interactive video generation, improving user engagement and interactive experience, and enhancing user satisfaction.
Figure CN120281978B_ABST
Abstract
Description
Technical Field
[0001] This disclosure relates to the field of data processing technology, specifically to the fields of artificial intelligence technology such as large language models, generative models, intelligent agents, and interactive videos, and particularly to a method, apparatus, electronic device, computer-readable storage medium, and computer program product for generating intelligent interactive videos based on multi-agent collaboration.
Background Technology
[0002] With the continuous development of digital media technology, video has become an important way for users to obtain information, entertainment, and socialize. However, traditional video content interaction is relatively simple, and users can usually only passively watch videos, lacking in-depth interactive experiences and failing to meet users' needs for personalized content exploration.
[0003] Existing interactive videos mainly rely on pre-set branching storylines or specific interactive buttons, allowing users to make choices within a limited range of options, making it difficult to achieve a personalized and highly flexible interactive experience.
Summary of the Invention
[0004] This disclosure presents an interactive video generation method, apparatus, electronic device, computer-readable storage medium, and computer program product based on multi-agent collaboration and intelligent interaction.
[0005] In a first aspect, embodiments of this disclosure propose an interactive video generation method based on multi-agent collaboration, comprising: determining user needs based on the interaction between the target user and the initial video content; controlling a preset script agent to generate interactive video scripts matching the user needs; controlling a preset interaction option generation agent and a storyboard agent to generate interaction options and storyboard plans according to the interactive video scripts, respectively; controlling a preset material generation agent to generate materials constituting each video frame according to the storyboard plans; and generating an interactive video matching the user needs by combining the interactive video scripts, interaction options, and materials according to a preset video rendering template.
[0006] In a second aspect, embodiments of this disclosure propose an interactive video generation device based on multi-agent collaboration, comprising: a user demand determination unit configured to determine user demands based on the interaction between a target user and initial video content; a script generation unit configured to control a preset script agent to generate interactive video scripts matching the user demands; an interaction option and storyboard planning generation unit configured to control preset interaction option generation agents and storyboard agents to generate interaction options and storyboard plans according to the interactive video scripts, respectively; a material generation unit configured to control a preset material generation agent to generate materials constituting each video frame according to the storyboard plans; and an interactive video generation unit configured to generate an interactive video matching the user demands by combining the interactive video scripts, interaction options, and materials according to a preset video rendering template.
[0007] In a third aspect, embodiments of this disclosure provide an electronic device comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to implement the interactive video generation method based on multi-agent cooperation and intelligent interaction as described in the first aspect.
[0008] In a fourth aspect, embodiments of this disclosure provide an interactive video generation system based on multi-agent collaboration, comprising: a master agent, used to determine user needs based on the interaction between the target user and the initial video content, and to generate an interactive video matching the user needs by using received interactive video scripts, interaction options, and materials according to a preset video rendering template; a script agent, used to generate interactive video scripts matching the user needs under the control of the master agent; an interaction option generation agent, used to generate interaction options according to the interactive video scripts under the control of the master agent; a storyboard agent, used to generate storyboard plans according to the interactive video scripts under the control of the master agent; and a material generation agent, used to generate materials constituting each video frame according to the storyboard plans under the control of the master agent.
[0009] In a fifth aspect, embodiments of this disclosure provide a non-transitory computer-readable storage medium storing computer instructions that, when executed, enable a computer to implement the interactive video generation method based on multi-agent cooperation and intelligent interaction as described in the first aspect.
[0010] In a sixth aspect, embodiments of this disclosure provide a computer program product including a computer program that, when executed by a processor, can implement the steps of the interactive video generation method based on multi-agent cooperation and intelligent interaction as described in the first aspect.
[0011] The interactive video generation solution based on multi-agent collaboration disclosed herein involves a master agent determining the target user's needs based on their interactions with the initial video content. Then, based on these needs, the master agent distributes the intermediate stages (including script, interaction options, storyboard planning, and source material) for generating a matching interactive video to specialized agents for execution. This allows the specialized agents to better complete the deliverables of each intermediate stage. Finally, the master agent generates an interactive video matching the user's needs using a video rendering template based on the deliverables of each intermediate stage. This solution, through automation and agent collaboration between the master agent and specialized agents, enables the generated interactive video to fully meet personalized and diverse user needs, further enhancing user engagement and interactive experience, and ultimately improving user satisfaction with such services or products.
[0012] It should be understood that the description in this section is not intended to identify key or essential features of the embodiments of this disclosure, nor is it intended to limit the scope of this disclosure. Other features of this disclosure will become readily apparent from the following description.
Attached Figure Description
[0013] Other features, objects, and advantages of this disclosure will become more apparent from the following detailed description of non-limiting embodiments with reference to the accompanying drawings:
[0014] Figure 1 is an exemplary system architecture to which this disclosure can be applied;
[0015] Figure 2 is a flowchart of an intelligently interactive video generation method based on multi-agent collaboration provided in an embodiment of this disclosure;
[0016] Figure 3 is a flowchart of a method for determining user needs provided in an embodiment of this disclosure;
[0017] Figure 4 is a flowchart of a method for controlling an agent to generate the deliverables of intermediate stages provided in an embodiment of this disclosure;
[0018] Figure 5 is a flowchart of a method for evaluating the quality of generated results provided in an embodiment of this disclosure;
[0019] Figure 6-1 is a schematic diagram of the execution flow of each functional entity in an application scenario provided in an embodiment of this disclosure;
[0020] Figures 6-2 to 6-7 are schematic diagrams illustrating the effects of five interactions provided in an embodiment of this disclosure;
[0021] Figure 7 is a structural block diagram of an intelligently interactive video generation device based on multi-agent collaboration provided in an embodiment of this disclosure;
[0022] Figure 8 is a schematic structural diagram of an electronic device suitable for performing the intelligently interactive video generation method based on multi-agent collaboration provided in an embodiment of this disclosure.
Detailed Implementation
[0023] The exemplary embodiments of this disclosure are described below with reference to the accompanying drawings, including various details of the embodiments to aid understanding; these should be considered merely exemplary. Therefore, those skilled in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of this disclosure. Similarly, for clarity and brevity, descriptions of well-known functions and structures are omitted in the following description. It should be noted that, unless otherwise specified, the embodiments and features described in this disclosure can be combined with each other.
[0024] The collection, storage, use, processing, transmission, provision, and disclosure of user personal information involved in the technical solution disclosed herein comply with the provisions of relevant laws and regulations and do not violate public order and good morals.
[0025] Figure 1 shows an exemplary system architecture 100 to which embodiments of the method, apparatus, electronic device, and computer-readable storage medium of this disclosure for generating intelligently interactive video based on multi-agent collaboration can be applied.
[0026] As shown in Figure 1, system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. Network 104 serves as the medium for providing communication links between terminal devices 101, 102, and 103 and server 105. Network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
[0027] Users can use terminal devices 101, 102, and 103 to interact with server 105 via network 104 to receive or send messages, etc. Various applications for communication between the two can be installed on terminal devices 101, 102, and 103 and on server 105, such as interactive video generation applications, video viewing applications, and instant messaging applications. Server 105 can host or support various agents, such as interaction agents dedicated to interacting with users and specialized agents that undertake particular tasks; alternatively, the agents can be classified by importance as master agents and sub-agents.
[0028] Terminal devices 101, 102, and 103 and server 105 can be either hardware or software. When terminal devices 101, 102, and 103 are hardware, they can be various electronic devices with displays, including but not limited to smartphones, tablets, laptops, and desktop computers. When terminal devices 101, 102, and 103 are software, they can be installed in the aforementioned electronic devices, and can be implemented as multiple software programs or software modules, or as a single software program or software module; no specific limitation is made here. When server 105 is hardware, it can be implemented as a distributed server cluster composed of multiple servers, or as a single server. When server 105 is software, it can be implemented as multiple software programs or software modules, or as a single software program or software module; no specific limitation is made here.
[0029] Server 105 can provide various services through its built-in applications. Taking an interactive video generation application as an example, the main agent on server 105 can achieve the following effects when running this application: First, it receives user interactions with the initial video content viewed via terminal devices 101, 102, and 103 through network 104, thereby determining user needs; then, it controls a preset script agent to generate interactive video scripts that match the user's needs; next, it controls a preset interaction option generation agent and a storyboard agent to generate interaction options and storyboard plans according to the interactive video scripts; next, it controls a preset material generation agent to generate materials that constitute each video frame according to the storyboard plans; finally, it generates an interactive video that matches the user's needs using the interactive video scripts, interaction options, and materials according to a preset video rendering template.
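As a non-limiting illustration, the five-stage flow described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the disclosed implementation: the names `MasterAgent`, `Deliverables`, and `render`, and all dictionary fields, are hypothetical, and each specialized agent is stubbed as a plain function where a real system would call a generative model.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Deliverables:
    """Intermediate results collected by the master agent (hypothetical structure)."""
    script: str
    options: list[str]
    storyboard: list[str]
    materials: list[str]


class MasterAgent:
    """Hypothetical orchestrator: determines needs, then delegates each stage."""

    def __init__(
        self,
        script_agent: Callable[[dict], str],
        option_agent: Callable[[str], list[str]],
        storyboard_agent: Callable[[str], list[str]],
        material_agent: Callable[[list[str]], list[str]],
    ) -> None:
        self.script_agent = script_agent
        self.option_agent = option_agent
        self.storyboard_agent = storyboard_agent
        self.material_agent = material_agent

    def determine_needs(self, interactions: list[dict]) -> dict:
        # Placeholder analysis (assumption): the latest topic the user touched wins.
        topics = [i["topic"] for i in interactions if "topic" in i]
        return {"theme": topics[-1] if topics else "general"}

    def run(self, interactions: list[dict]) -> Deliverables:
        needs = self.determine_needs(interactions)    # determine user needs
        script = self.script_agent(needs)             # script matching the needs
        options = self.option_agent(script)           # interaction options
        storyboard = self.storyboard_agent(script)    # storyboard plan
        materials = self.material_agent(storyboard)   # per-frame materials
        return Deliverables(script, options, storyboard, materials)


def render(d: Deliverables) -> str:
    """Stand-in for the preset video rendering template."""
    return f"{d.script} | options={d.options} | frames={d.materials}"


# Usage with stub agents.
master = MasterAgent(
    script_agent=lambda needs: f"Script about {needs['theme']}",
    option_agent=lambda script: ["Learn more", "Skip"],
    storyboard_agent=lambda script: ["shot 1: close-up", "shot 2: long shot"],
    material_agent=lambda shots: [f"frame for {s}" for s in shots],
)
result = master.run([{"type": "click", "topic": "technology"}])
video = render(result)
```

Passing the agents in as callables keeps the orchestration logic independent of how each specialized agent is realized.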
[0030] Furthermore, the master agent hosted on server 105 can also push the interactive video, as a subsequent video stream, into the session in which the user is watching the initial video content, so that it is presented to the user.
[0031] It should be noted that user interactions with the initial video content can be obtained from terminal devices 101, 102, and 103 via network 104, or can be pre-stored locally on server 105 through various means. Therefore, when server 105 detects that this data is already stored locally (e.g., when it begins processing previously reserved pending tasks), it can choose to retrieve this data directly from the local storage. In this case, the exemplary system architecture 100 may also exclude terminal devices 101, 102, and 103 and network 104.
[0032] Since generating interactive videos requires significant computing resources and power, the interactive video generation method based on multi-agent collaboration provided in the subsequent embodiments of this disclosure is generally executed by a server 105 with strong computing power and abundant computing resources. Correspondingly, the interactive video generation device based on multi-agent collaboration is also generally located within the server 105. However, it should also be noted that when terminal devices 101, 102, and 103 also possess sufficient computing power and resources, they can also complete the aforementioned calculations performed by the server 105 through their installed interactive video generation applications, thereby outputting the same results as the server 105. Especially when multiple terminal devices with different computing capabilities exist simultaneously, but the interactive video generation application determines that the terminal device it is using has strong computing power and sufficient remaining computing resources, the terminal device can perform the aforementioned calculations, thereby appropriately reducing the computing pressure on server 105. Correspondingly, an interactive video generation device based on multi-agent collaboration and capable of intelligent interaction can also be set in terminal devices 101, 102, and 103. In this case, the exemplary system architecture 100 may also exclude server 105 and network 104.
[0033] It should be understood that the numbers of terminal devices, networks, and servers shown in Figure 1 are merely illustrative. Depending on implementation needs, any number of terminal devices, networks, and servers can be included.
[0034] Please refer to Figure 2, which is a flowchart of an intelligently interactive video generation method based on multi-agent collaboration provided in an embodiment of this disclosure, wherein process 200 includes the following steps:
[0035] Step 201: Determine user needs based on the interaction between the target user and the initial video content;
[0036] In this step, the executing entity of the intelligently interactive video generation method based on multi-agent collaboration (e.g., the master agent hosted on server 105 shown in Figure 1) analyzes the user's interaction with the initial video content to accurately capture the user's interests, preferences, and needs, thereby providing clear direction for subsequent steps such as script generation, storyboard planning, and material production. In other words, this step serves as the core starting point for the personalized interactive video generation solution provided in this embodiment.
[0037] The user's interaction with the initial video content may include, but is not limited to, the following types:
[0038] 1) Click behavior: Users click on specific elements in the video (such as buttons, links, options, etc.);
2) Dwell time: The length of time users stay on certain video segments or content;
3) Selection behavior: Choices made by users in the video (such as options, votes, answers, etc.);
4) Feedback behavior: Users like, comment, share, or rate the video content;
5) Repeated viewing: Users repeatedly watch certain segments, etc.
These interactive behaviors can be collected in real time through the built-in monitoring module of the video player or third-party data analysis tools.
[0039] After acquiring the aforementioned interactive behaviors, the master agent performs in-depth analysis of the collected data to extract key information. For example: the options or elements the user clicked indicate interest in that content; the segments on which the user lingered longer suggest greater interest in those segments; and the user's choices may reflect certain preferences or tendencies. This extracted key information is then mapped to specific user needs. For instance, if a user frequently clicks on "technology"-related options, their needs may lean towards technology-related content; if a user lingers on "humorous" segments in a video for a longer period, their needs may lean towards an entertainment-oriented style. In other words, this stage allows user needs to be categorized along different dimensions, such as: content theme: topics the user is interested in (e.g., technology, education, entertainment); content style: the user's preferred style (e.g., humorous, serious, concise); and interaction depth: the user's preference for interactive content (e.g., light interaction, in-depth exploration).
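As a non-limiting illustration, the mapping from interaction behaviors to the three need dimensions can be sketched as follows. The event fields (`type`, `tag`, `seconds`), the function name `infer_needs`, and the five-event threshold for interaction depth are all assumptions made for this sketch; the disclosure does not specify them.

```python
from collections import Counter

# Hypothetical event records; the field names are assumptions for illustration.
events = [
    {"type": "click", "tag": "technology"},
    {"type": "click", "tag": "technology"},
    {"type": "click", "tag": "education"},
    {"type": "dwell", "tag": "humorous", "seconds": 42.0},
    {"type": "dwell", "tag": "serious", "seconds": 5.0},
]


def infer_needs(events: list) -> dict:
    """Map raw interaction events to content theme, content style, and interaction depth."""
    # Most frequently clicked tag -> content theme.
    clicks = Counter(e["tag"] for e in events if e["type"] == "click")

    # Tag with the longest accumulated dwell time -> content style.
    dwell = Counter()
    for e in events:
        if e["type"] == "dwell":
            dwell[e["tag"]] += e["seconds"]

    theme = clicks.most_common(1)[0][0] if clicks else None
    style = dwell.most_common(1)[0][0] if dwell else None
    # Assumed threshold: five or more events suggests a taste for deeper interaction.
    depth = "in-depth exploration" if len(events) >= 5 else "light interaction"
    return {"theme": theme, "style": style, "interaction_depth": depth}


needs = infer_needs(events)
```

Here the two "technology" clicks outweigh the single "education" click, and the 42 seconds spent on humorous segments outweigh the 5 seconds on serious ones.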
[0040] The executing entity needs to analyze behavioral data in real time while the user interacts with the video and determine user needs quickly, so that subsequent stages can respond rapidly. It also needs to make the judgment of user needs as accurate as possible through multi-dimensional data analysis (such as behavior type, time distribution, and frequency). In addition, during video playback it needs to adjust its judgment of user needs dynamically based on the latest interaction behavior, so that the generated content always matches the user's needs.
[0041] Furthermore, in addition to the explicit interactions between users and the initial video content, other data sources can be combined to enrich the understanding of user needs, for example, emotions or moods extracted from voice interactions, and facial expressions and emotional changes captured by the camera while the user watches the video. Historical behavioral data can also be incorporated: user profiles formed from users' historical viewing records and preferences can further enhance or refine the judgment of current user needs. These user profiles are built through long-term accumulation of user interaction data, including interest tags, preferred styles, and interaction habits, and provide more comprehensive support for determining user needs. One specific, non-limiting implementation is: first, determine interaction information based on the target user's interaction with the initial video content; then, determine user needs based on the interaction information and the target user's profile information. The interaction information can include at least one of the following: option selection information for interactive options provided by the initial video content, input text information, input image information, input voice information, and input or shared links that can be accessed or called (i.e., relevant information introduced through links rather than presented directly as text or images).
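As a non-limiting illustration, the two-stage determination (interaction information first, then interaction information plus profile) can be sketched as follows. The `InteractionInfo` container, its field names, the profile keys, and the rule that the current interaction overrides the long-term profile are assumptions of this sketch, not details given in the disclosure.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InteractionInfo:
    """Hypothetical container; at least one field is expected to be set."""
    selected_option: Optional[str] = None
    text: Optional[str] = None
    image_path: Optional[str] = None
    voice_path: Optional[str] = None
    links: list = field(default_factory=list)

    def is_empty(self) -> bool:
        return not any([self.selected_option, self.text, self.image_path,
                        self.voice_path, self.links])


def determine_needs(info: InteractionInfo, profile: dict) -> dict:
    """Start from the long-term profile, then let the current interaction refine it."""
    if info.is_empty():
        raise ValueError("interaction information must contain at least one item")
    tags = profile.get("interest_tags") or [None]
    needs = {"theme": tags[0], "style": profile.get("preferred_style")}
    if info.selected_option:
        # Assumption: an explicit choice made now outweighs the historical interest tag.
        needs["theme"] = info.selected_option
    if info.text:
        needs["request"] = info.text
    return needs


profile = {"interest_tags": ["education"], "preferred_style": "humorous"}
needs = determine_needs(
    InteractionInfo(selected_option="technology", text="explain chips"), profile
)
```

When the interaction carries only a shared link, the sketch falls back to the profile for theme and style; handling of image, voice, and link content is left out because the disclosure does not detail it here.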
[0042] Specifically, the initial video content mentioned in this step can take different forms depending on the application scenario:
[0043] 1) In the field of education, the initial video can be the teaching video that students watch. From the knowledge points that students click on or the segments they watch repeatedly, their level of mastery of certain content can be determined, and targeted interactive videos that include review materials or practice questions can be generated.
[0044] 2) In the advertising field, the initial video can be the advertising video that the user watches. From the links to products or services that the user clicks on, their interests can be determined, and more personalized advertising content that meets their needs can be generated.
[0045] 3) In the entertainment field, the initial video can be a short video that the user watches. Then, based on the user's personalized choices on plot branches or their strong interest in certain segments, subsequent content that better suits their preferences can be generated.
[0046] Step 202: Control the preset copywriting agent to generate interactive video copy that matches the user needs;
[0047] Building upon step 201, in this step the executing entity controls a copywriting agent capable of generating copy to produce video copy that closely matches the determined user needs, providing a foundation for the subsequent generation of interaction options, storyboard plans, and materials.
[0048] The copywriting agent can provide, but is not limited to, the following specific functions:
[0049] 1) Narrative text: Generating the narrative text of the video (such as narration, dialogue, etc.);
2) Theme and core information: Determining the theme and core information of the video;
3) Interactive point design: Designing questions, options, or branching storylines at key points in the video to guide user participation. The complexity and number of interactive points can be adjusted according to the user's interaction depth (such as light interaction or in-depth exploration);
4) Style adaptation: Adjusting the copywriting style according to user preferences. For example: formal style, suitable for educational and professional videos; humorous style, suitable for entertainment and lighthearted videos; concise style, suitable for advertising and fast-paced videos;
5) Language expression: Generating copy that fits the user's language habits (such as colloquial or formal) or language type (such as Chinese or English), etc.
[0050] Specifically, in this step the executing entity passes the determined user needs (such as content theme, style preferences, and interaction depth) to the copywriting agent as input for copy generation. The copywriting agent then produces matching copy based on the user needs, either from a preset copy library or dynamically through Natural Language Understanding (NLU, which extracts semantic information from natural language text and converts it into machine-processable structured data) and Natural Language Generation (NLG, which converts structured data or non-linguistic information into grammatically correct, semantically coherent, and context-appropriate natural language text). For example, if a user is interested in the topic of "technology," the copywriting agent generates narrative text and interactive points related to technology; if the user prefers a "humorous" style, the copywriting agent adds lighthearted and humorous language elements to the copy. When generating copy, the copywriting agent also needs to consider the overall context of the video (such as theme, target users, and interaction history) to keep the copy consistent and coherent.
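As a non-limiting illustration, the choice between a preset copy library and dynamic generation can be sketched as follows. The library contents, the prompt layout, the names `build_prompt` and `generate_copy`, and the rule of checking the library before generating are assumptions of this sketch; the `generate` callable stands in for whatever language model the copywriting agent actually uses.

```python
from typing import Callable

# Hypothetical preset copy library keyed by (theme, style).
COPY_LIBRARY = {
    ("technology", "humorous"): "Chips: tiny, hot, and smarter than they look.",
}


def build_prompt(needs: dict, context: dict) -> str:
    """Assemble the generation request from the user needs and the video context."""
    return (
        f"Theme: {needs['theme']}\n"
        f"Style: {needs['style']}\n"
        f"Interaction depth: {needs['interaction_depth']}\n"
        f"Video context: {context.get('summary', 'n/a')}"
    )


def generate_copy(needs: dict, context: dict, generate: Callable[[str], str]) -> str:
    """Use a preset entry when one matches; otherwise call the generation function."""
    preset = COPY_LIBRARY.get((needs["theme"], needs["style"]))
    if preset is not None:
        return preset
    return generate(build_prompt(needs, context))


def fake_generate(prompt: str) -> str:
    # Stub generator for the example: echoes the first line of the prompt.
    return "GENERATED<" + prompt.splitlines()[0] + ">"


copy_a = generate_copy(
    {"theme": "technology", "style": "humorous", "interaction_depth": "light"},
    {}, fake_generate,
)
copy_b = generate_copy(
    {"theme": "education", "style": "serious", "interaction_depth": "deep"},
    {"summary": "intro to algebra"}, fake_generate,
)
```

Including the video context in the prompt reflects the requirement that the copy stay consistent with the theme and interaction history.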
[0051] Furthermore, the copywriting agent can generate multilingual copy based on users' language preferences to meet the needs of global users. It can also incorporate emotional elements, such as adjusting the tone of the copy based on the user's emotional state (e.g., excitement, calmness) and adding emotional expressions to enhance user engagement and resonance. In addition, personalized recommendations can be embedded in the copy, such as recommending related topics or products based on the user's historical behavior and including information or links that the user might be interested in.
[0052] Specifically, the copy generation mentioned in this step can take different forms depending on the application scenario:
[0053] 1) Education field: Generate targeted teaching video scripts based on students' learning needs and interests. For example, for students who like practical application, the scripts will include more case studies and interactive exercises; for students who like theoretical learning, the scripts will include more concept explanations and logical reasoning.
[0054] 2) Advertising field: Generate personalized advertising copy based on users' interests and preferences. For example, for users who like fashion, the copy will embed fashion elements and trends; for users who like practicality, the copy will embed product functions and usage scenarios.
[0055] 3) Entertainment: Generate personalized storylines based on users' viewing habits and preferences. For example, for users who like suspenseful plots, embed more suspense and plot twists in the storylines; for users who like lighthearted plots, embed more humor and heartwarming elements in the storylines.
[0056] Step 203: Control the preset interaction option generation agent and storyboard agent to generate interaction options and storyboard plans, respectively, according to the interactive video script;
[0057] Building upon step 202, in this step the executing entity further generates interaction options and storyboard plans from the interactive video script produced by the copywriting agent, providing structured support for subsequent material generation and video rendering.
[0058] The interaction option generation agent can provide, but is not limited to, the following specific functions:
[0059] 1) Interaction Option Generation: The interaction option generation agent generates specific interaction options based on the interactive points in the interactive video script. These options may include: multiple choice questions: users can choose an answer or direction from multiple options; true / false questions: users can answer questions using "yes / no" or "right / wrong"; input boxes: users can express their thoughts or answers through text input; click triggers: users can trigger subsequent content by clicking on specific elements.
[0060] 2) Subtitle generation: The interaction option generation agent generates subtitles that match the script content, ensuring that the subtitles are synchronized with the video content and conform to the user's reading habits and preferences.
[0061] 3) Option design optimization: The interactive option generation agent can optimize option design based on users' interaction habits and preferences. For example, it can adjust the number of options according to the depth of user interaction (e.g., 2-3 options for simple interaction design and 4-5 options for deep interaction design), and design more attractive option content based on users' interests.
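As a non-limiting illustration, the depth-dependent option count mentioned above (2-3 options for light interaction, 4-5 for deep interaction) can be sketched as follows. The function names, the depth labels `"light"` and `"deep"`, and the assumption that candidate options arrive sorted by relevance are all hypothetical.

```python
def option_count_range(depth: str) -> tuple:
    """Return the (min, max) number of options for an interaction depth."""
    ranges = {"light": (2, 3), "deep": (4, 5)}
    if depth not in ranges:
        raise ValueError(f"unknown interaction depth: {depth!r}")
    return ranges[depth]


def select_options(candidates: list, depth: str) -> list:
    """Keep at most the upper bound of options; require at least the lower bound.

    Assumes `candidates` is already ordered from most to least relevant.
    """
    low, high = option_count_range(depth)
    if len(candidates) < low:
        raise ValueError(f"need at least {low} candidate options, got {len(candidates)}")
    return candidates[:high]


candidates = ["A", "B", "C", "D", "E", "F"]
light_options = select_options(candidates, "light")
deep_options = select_options(candidates, "deep")
```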
[0062] The storyboard agent can provide, but is not limited to, the following specific functions:
[0063] 1) Storyboard planning generation: The storyboard agent generates detailed storyboard plans based on the interactive video script, including: Shot design: the shooting angle, shot type (such as long shot, medium shot, close-up) and movement method (such as push shot, pull shot) of each shot; Scene switching: the switching method and transition effect between different scenes (such as fade in and fade out, fast switching); Time allocation: the duration of each shot and the rhythm control of the overall video.
[0064] 2) Matching storyboards with the script: The storyboard agent ensures that the storyboard plan closely matches the script content. For example, close-up shots are designed at key information points in the script to highlight them; multiple storyboards are designed at interactive points to provide users with different visual experiences.
[0065] 3) Storyboard style adaptation: The storyboard agent can adjust the storyboard style according to user preferences. For example: simple style: using simple camera transitions and a clear screen layout; dynamic style: using rich camera movements and visual effects; immersive style: using a first-person perspective or long shots to enhance the sense of immersion.
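As a non-limiting illustration, a storyboard plan covering shot design, scene switching, and time allocation can be represented as a list of shot records. The `Shot` fields and the proportional rescaling in `fit_to_length` are assumptions of this sketch; the disclosure does not prescribe a data format or a pacing algorithm.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    angle: str         # shooting angle, e.g. "eye level"
    shot_type: str     # "long", "medium", or "close-up"
    movement: str      # camera movement, e.g. "push", "pull", "static"
    transition: str    # transition into the next shot, e.g. "fade", "cut"
    duration_s: float  # time allocated to this shot, in seconds


def total_duration(plan: list) -> float:
    return sum(shot.duration_s for shot in plan)


def fit_to_length(plan: list, target_s: float) -> list:
    """Scale every shot proportionally so the whole plan lasts target_s seconds."""
    current = total_duration(plan)
    if current <= 0:
        raise ValueError("plan must have a positive total duration")
    scale = target_s / current
    return [
        Shot(s.angle, s.shot_type, s.movement, s.transition, s.duration_s * scale)
        for s in plan
    ]


plan = [
    Shot("eye level", "long", "static", "fade", 4.0),
    Shot("low", "close-up", "push", "cut", 6.0),
]
fitted = fit_to_length(plan, 5.0)
```

Proportional scaling preserves the relative rhythm of the shots while meeting an overall length target, which is one simple way to express the pacing control described above.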
[0066] This step aims to have the master agent separately control the interaction option generation agent and the storyboard agent to accurately parse different parts of the interactive video script, extracting key information (such as interaction points, themes, and styles). The interaction option generation agent and the storyboard agent work collaboratively under the coordination of the master agent to ensure that the generated interaction options and storyboard plans match each other and are consistent with the script content. Specifically, the interaction option generation agent and the storyboard agent can execute their respective tasks in parallel, or one can execute its task first, and then use the interactive video script and the generated task results as input to the other, thus obtaining a more accurate second task output. For example, the interaction option generation agent can first generate interaction options, and then send these interaction options and the interactive video script to the storyboard agent, which then outputs a storyboard plan that takes the interaction options into account. The order can also be reversed.
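As a non-limiting illustration, the two scheduling modes described above (parallel execution, or one agent feeding its result to the other) can be sketched with Python's `asyncio`. The agent functions are stubs with hypothetical names and outputs; a real agent would call a model where the stub awaits.

```python
import asyncio
from typing import List, Optional


async def option_agent(script: str) -> List[str]:
    # Stub: a real agent would call a model here.
    await asyncio.sleep(0)
    return [f"Option for: {script}"]


async def storyboard_agent(script: str, options: Optional[List[str]] = None) -> List[str]:
    await asyncio.sleep(0)
    shots = [f"Shot for: {script}"]
    # When options are supplied, plan one extra shot per option.
    shots += [f"Branch shot for: {o}" for o in (options or [])]
    return shots


async def run_parallel(script: str):
    """Both agents work from the script concurrently."""
    return await asyncio.gather(option_agent(script), storyboard_agent(script))


async def run_sequential(script: str):
    """Options first; the storyboard agent then receives the script and the options."""
    options = await option_agent(script)
    storyboard = await storyboard_agent(script, options)
    return options, storyboard


parallel_options, parallel_shots = asyncio.run(run_parallel("intro"))
seq_options, seq_shots = asyncio.run(run_sequential("intro"))
```

The parallel mode finishes sooner but plans shots without knowledge of the options; the sequential mode lets the second agent account for the first agent's output, and the order could equally be reversed.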
[0067] Furthermore, the interactive option generation agent can combine multiple interaction methods to design interactive options, such as: voice interaction: users can select options or answer questions via voice; gesture interaction: users can trigger specific options or content through gestures. The storyboard agent can also combine a pre-set material library or material generation agent capabilities when generating storyboard plans to plan the usage of materials in advance, such as selecting or generating matching backgrounds, characters, props, etc., based on the storyboard plan; and designing special effects or animation effects based on the storyboard plan. Moreover, the storyboard agent can design personalized storyboard plans based on users' viewing habits and preferences; for example, designing rapidly switching storyboards for users who prefer a fast pace, and designing more close-ups and slow-motion shots for users who prefer details.
[0068] Specifically, the interaction option generation and storyboard planning generation mentioned in this step can take different forms depending on the application scenario:
[0069] 1) Education field: In interactive teaching videos, the interactive option generation agent generates multiple-choice or true / false questions related to knowledge points, and the storyboard agent designs clear storyboard plans to highlight the teaching focus;
[0070] 2) Advertising field: In personalized advertising videos, the interactive option generation agent generates options related to user interests (such as selecting different product functions), and the storyboard agent designs eye-catching storyboard plans to enhance advertising effectiveness;
[0071] 3) Entertainment field: In interactive story videos, the interactive option generation agent generates options related to the development of the plot (such as choosing plot branches), and the storyboard agent designs immersive storyboard planning to enhance the user's sense of immersion.
[0072] Step 204: Control the preset material generation agent to generate the materials that make up each video frame according to the storyboard plan;
[0073] Building upon step 203, this step aims to have the aforementioned execution entity dynamically generate or select the materials that constitute each frame of video based on the storyboard planning generated by the storyboard intelligent agent, ensuring that the video content is highly matched with user needs and providing high-quality material support for the final video rendering.
[0074] The material generation agent can provide, but is not limited to, the following functions:
[0075] 1) Material Generation: The material generation agent generates, searches, queries, or selects materials that constitute each frame of the video based on the storyboard planning, including but not limited to: background materials: background images or video clips that match the video theme and scene; character materials: character images or animations related to the video content; prop materials: props or decorative elements related to the video plot; special effects materials: special effects used to enhance visual effects (such as lighting, particle effects, transition animations, etc.).
[0076] 2) Material adaptation: The material generation agent can adjust the size, proportion, color, and other attributes of a material according to the requirements of the storyboard plan, so that the material closely matches the storyboard plan.
[0077] 3) Material stylization: The material generation agent can generate or select materials that match a specific style based on user preferences and the video style. For example: Cartoon style: suitable for entertainment and children's videos; Realistic style: suitable for educational and professional videos; Minimalist style: suitable for advertisements and fast-paced videos.
[0078] Specifically, the material generation agent can generate materials through the following technologies:
[0079] 1) Storyboard analysis: The material generation agent analyzes the storyboard plan and extracts the material requirements for each frame (such as background, characters, props, and special effects);
[0080] 2) Material generation methods: The material generation agent can generate or select materials in the following ways: Dynamic generation: Use image generation technology (such as generative adversarial networks, diffusion models) to generate materials that meet the requirements in real time; Material library selection: Select materials that match the storyboard planning from the preset material library; Material combination: Combine or synthesize multiple materials to generate complex materials that meet the storyboard planning.
[0081] 3) Material optimization: The material generation agent optimizes the generated materials, for example, by adjusting their resolution, brightness, and contrast to ensure their visual effect in the video, and by compressing or converting them to ensure compatibility with the video rendering template.
[0082] Furthermore, the material generation agent can combine multiple modalities to generate materials, such as: 3D material generation: using 3D modeling technology to generate three-dimensional characters, scenes, and props; audio material generation: generating background music, sound effects, or dubbing that match the video content. The agent can also perform personalized material design, generating customized materials according to the user's needs, for example generating characters or scenes that the user prefers based on the user's historical behavior, or generating relevant props or decorative elements based on the user's interests.
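One possible way to combine material library selection with dynamic generation is sketched below. The ordering (library first, generation as a fallback) is an assumption; this disclosure lists the methods without ranking them. The names `acquire_material` and `fake_generator` are hypothetical, and the generator is a stub rather than a real image model.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical library index: (kind, style) -> asset identifier.
MaterialLibrary = Dict[Tuple[str, str], str]


def select_from_library(lib: MaterialLibrary, kind: str, style: str) -> Optional[str]:
    # Material library selection: look for a preset asset that matches.
    return lib.get((kind, style))


def acquire_material(kind: str, style: str, lib: MaterialLibrary,
                     generate: Callable[[str, str], str]) -> dict:
    # Assumed policy: reuse a library asset when one exists,
    # otherwise fall back to dynamic generation.
    asset = select_from_library(lib, kind, style)
    source = "library"
    if asset is None:
        asset = generate(kind, style)
        source = "generated"
    return {"kind": kind, "style": style, "asset": asset, "source": source}


def fake_generator(kind: str, style: str) -> str:
    # Stand-in for an image generation model (e.g. a diffusion model).
    return f"gen:{style}:{kind}"
```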
[0083] Specifically, the material generation mentioned in this step can take different forms depending on the application scenario:
[0084] 1) Education field: In interactive teaching videos, the material generation agent generates backgrounds, characters and props related to knowledge points. For example, when explaining historical events, it generates scenes and characters that match the historical background; when explaining scientific principles, it generates dynamic 3D models or animations.
[0085] 2) Advertising field: In personalized advertising videos, the material generation agent generates materials related to user interests. For example, for users who like fashion, it generates backgrounds and characters of fashion brands; for users who like technology, it generates 3D models and special effects of technology products.
[0086] 3) Entertainment field: In interactive story videos, the material generation agent generates materials related to the development of the plot, such as generating different scenes and characters based on the plot branches selected by the user; and generating matching background music and special effects based on the user's emotional state.
[0087] Step 205: Generate an interactive video that matches the user's needs by combining the interactive video script, interactive options, and materials according to the preset video rendering template.
[0088] Based on steps 202 to 204, this step aims to have the aforementioned executing entity integrate the text, interactive options, and materials generated in the previous steps into a complete and interactive video using a preset video rendering template, ensuring that the final output video highly matches the user's needs.
[0089] This video rendering template provides a standardized framework for integrating interactive video scripts, interactive options, and materials into a complete video. Specifically, it includes: Script rendering: embedding script content into the video as subtitles, narration, or dialogue boxes; Interactive option rendering: embedding interactive options into the video as buttons, selection boxes, or clickable areas; Material rendering: embedding generated materials (such as backgrounds, characters, props, and effects) into video frames according to the storyboard. Furthermore, this video rendering template can control the timeline of each frame according to the storyboard, ensuring the rhythm and smoothness of the video, and it embeds the logic of interactive options (such as post-selection navigation and trigger events) into the video to ensure the implementation of interactive functions.
[0090] The specific integration process can be represented in the following order (this is only an example and does not represent a fixed or limited necessary integration order):
[0091] 1) Text embedding: Embedding text content into video frames in the form of text or audio;
[0092] 2) Option Embedding: Embed interactive options in a visual form into video frames and set the interactive logic;
[0093] 3) Material embedding: Embed the material into the video frames according to the storyboard plan, and apply special effects or animations.
[0094] The video rendering template renders the input content into video frames according to preset rules and logic, splices the frames into a complete video in timeline order, and outputs the result as an interactive video file. Multiple formats (such as MP4 and WebM) and playback platforms (such as web pages and mobile devices) are supported.
[0095] It should be noted that, during video generation in this step, the video rendering template is what keeps the script, interactive options, and materials consistent and coordinated within the video, avoiding content conflicts or visual inconsistencies. The functional logic of the interactive options must also be implemented correctly in the video. For example, after a user selects an option, the video jumps to the corresponding segment or triggers a specific event, and the user's interactive behavior is fed back in real time and affects subsequent video content.
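Splicing in timeline order can be illustrated with a short sketch that turns per-shot durations into start and end times. The function name `build_timeline` and the tuple layout are assumptions for illustration only.

```python
def build_timeline(shots):
    """Place shots back to back on a timeline.

    shots: list of (shot_id, duration_s) in storyboard order.
    Returns a list of (shot_id, start_s, end_s).
    """
    timeline = []
    t = 0.0
    for shot_id, duration in shots:
        timeline.append((shot_id, t, t + duration))
        t += duration  # the next shot starts where this one ends
    return timeline
```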
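The post-selection jump logic mentioned above can be sketched as a small segment graph. This is a minimal sketch under assumed names: the `segments` layout, `next_segment`, and `dangling_targets` are hypothetical and are not defined by this disclosure.

```python
# Hypothetical segment graph: each segment maps option labels to target segments.
segments = {
    "intro": {"options": {"explore cave": "cave", "climb hill": "hill"}},
    "cave": {"options": {}},
    "hill": {"options": {}},
}


def next_segment(graph, current, choice):
    # After the user selects an option, jump to the corresponding segment.
    options = graph[current]["options"]
    if choice not in options:
        raise ValueError(f"unknown option {choice!r} in segment {current!r}")
    return options[choice]


def dangling_targets(graph):
    # Consistency check before rendering: every option must lead somewhere.
    return [(seg, label)
            for seg, node in graph.items()
            for label, target in node["options"].items()
            if target not in graph]
```

Running a check such as `dangling_targets` before output is one way to confirm that the option logic embedded in the video is complete.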
[0096] Furthermore, this video rendering template can generate compatible video formats and resolutions for different playback platforms (such as web, mobile, and TV), ensuring smooth playback on various devices. During video playback, the template can also dynamically adjust subsequent video content based on real-time user feedback, such as rendering different plot branches based on user selections or adjusting the video's rhythm or content based on user interactions. Even further, the template can offer different rendering styles based on user needs, such as a clean, minimalist style with clear layouts and simple effects; a dynamic style with rich animations and effects; and an immersive style with full-screen layouts and 3D effects.
[0097] Specifically, the interactive video generation mentioned in this step can take different forms depending on the application scenario:
[0098] 1) Education field: In interactive teaching videos, the video rendering template integrates knowledge point explanations, interactive exercises, and teaching materials into a complete video. Students can enter the next stage of learning by selecting answers or clicking links.
[0099] 2) Advertising field: In personalized advertising videos, video rendering templates integrate product introductions, user interests, and interactive options into a single video. Users can select different options to view product features of interest or purchase links.
[0100] 3) Entertainment: In interactive story videos, the video rendering template integrates plot development, character dialogue, and interactive options into a single video, allowing users to experience different plot branches by selecting different options.
[0101] The interactive video generation method based on multi-agent collaboration provided in this disclosure involves a master agent determining the target user's needs based on their interaction with the initial video content. Then, based on these needs, the master agent distributes the intermediate stages (including script, interaction options, storyboard planning, and source material) for generating a matching interactive video to specialized agents for execution. This allows the specialized agents to better complete the deliverables of each intermediate stage. Finally, the master agent generates an interactive video matching the user's needs using a video rendering template based on the deliverables of each intermediate stage. This solution, through automation and agent collaboration between the master agent and specialized agents, enables the generated interactive video to fully meet personalized and diverse user needs, further improving user stickiness and interactive experience, and ultimately enhancing user satisfaction with such services or products.
[0102] Please refer to Figure 3, which is a flowchart of a method for determining user needs provided in this disclosure. It gives a more specific implementation of the solution mentioned in step 201, in which user needs are determined from interaction information and profile information. The process 300 includes the following steps:
[0103] Step 301: Determine the interaction information based on the interaction between the target user and the initial video content;
[0104] This step is consistent with the description in the above embodiments, and will not be repeated here.
[0105] Step 302: Determine the desired video direction based on the initial video content and interaction information;
[0106] The analysis of the initial video content can cover its theme, style, structure, and interactive points, which serve as a basis for determining the desired direction of the video. The interaction information is then combined with this analysis to deduce the direction the user wants the video to take. For example, if a user shows interest in "technology" related content, the desired direction of the video may lean towards technology-related content; if a user spends a long time on "humorous" segments, the desired direction of the video may lean towards an entertainment style; if a user frequently selects a certain plot branch, the desired direction of the video may lean towards the subsequent development of that branch.
[0107] Among these, videos can be categorized according to different dimensions to meet user needs, such as: content theme: topics that users are interested in (e.g., technology, education, entertainment, etc.); content style: styles that users prefer (e.g., humorous, serious, concise, etc.); and interaction design: user preferences for interactive content (e.g., light interaction, in-depth exploration, etc.).
[0108] Step 303: Use the profile information to adjust the video direction to meet user needs.
[0109] User profiles are typically built upon users' historical behavioral data (such as viewing history, interaction habits, and preference tags), including: interest tags: areas or topics of interest to the user (such as technology, fashion, travel, etc.); behavioral habits: user interaction habits (such as preference for clicking, selecting, and rewatching); style preferences: user's preferred video styles (such as humorous, serious, and concise), etc. Furthermore, in the absence of sufficient individual user profile data, information on the profiles of similar user groups can be referenced to better assist in determining user needs.
[0110] The process of using user profile information to adjust the direction of videos can be illustrated with an example: if the user profile shows that they have a long-term interest in "travel" related content, then the direction of the videos will be adjusted to travel-related content; if the user profile shows that they prefer a "simple" style, then the style of the videos will be adjusted to simplicity.
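Steps 302 and 303 can be illustrated with a minimal sketch. It assumes that interaction information is a non-empty list of events carrying a theme, a style, and a dwell time, and that a long-term profile preference overrides the value derived from the current session; both assumptions, and the names `desired_direction` and `adjust_with_profile`, are illustrative only.

```python
from collections import Counter


def desired_direction(events):
    """Step 302 sketch: weight themes and styles by dwell time.

    events: non-empty list of dicts with 'theme', 'style', 'dwell_s'.
    """
    theme_w, style_w = Counter(), Counter()
    for e in events:
        theme_w[e["theme"]] += e["dwell_s"]
        style_w[e["style"]] += e["dwell_s"]
    return {"theme": theme_w.most_common(1)[0][0],
            "style": style_w.most_common(1)[0][0]}


def adjust_with_profile(direction, profile):
    """Step 303 sketch: apply profile preferences to the session direction."""
    need = dict(direction)
    for key in ("theme", "style"):
        # Assumed rule: a stated long-term preference takes precedence.
        if profile.get(key):
            need[key] = profile[key]
    return need
```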
[0111] This embodiment aims to accurately determine user needs by analyzing user interaction behavior with the initial video and combining it with user profile information, thereby providing a clear direction for subsequent video generation.
[0112] Specifically, the user requirement determination scheme provided in this embodiment can take different forms depending on the application scenario:
[0113] 1) Education field: When students watch instructional videos, they can click on specific knowledge points or repeatedly watch certain segments. The master agent determines their interaction information and, combined with their learning history (such as a preference for practice or theory), adjusts the direction of the video to meet their needs, generating targeted teaching content.
[0114] 2) Advertising field: When users watch advertising videos, they click on links to certain products or services. The master agent determines their interaction information and, combined with their shopping history (such as a preference for fashion or technology), adjusts the video's direction to meet their needs, generating advertising content that better matches their interests.
[0115] 3) Entertainment field: When users watch short videos, they can select different plot branches or show strong interest in certain segments. The master agent determines their interaction information and, combined with their viewing history (such as a preference for suspense or comedy), adjusts the video's direction to meet their needs, generating subsequent content that better suits their preferences.
[0116] Furthermore, in determining user needs, environmental information such as the user's geographical location and time can be combined to further optimize the need assessment. During video playback, the main AI agent can dynamically adjust user needs based on real-time interactive behavior. If the user shows new points of interest during playback, the video direction can be adjusted in real time to meet the needs; and if the user's interactive behavior deviates from expectations, the user needs can be revised.
[0117] Please refer to Figure 4, which is a flowchart of a method for controlling agents to generate intermediate-stage deliverables, provided by an embodiment of this disclosure. It is intended to better illustrate the interaction between the master agent and the specialized agents in the above embodiments, thereby clarifying how the required deliverables are obtained. The process 400 includes the following steps:
[0118] Step 401: Generate copywriting generation instructions corresponding to user needs;
[0119] Step 402: Send the copywriting generation instruction to the copywriting agent and receive the returned interactive video script;
[0120] The above two steps aim to have the master agent first generate a copywriting generation instruction based on the determined user needs, then send that instruction to the copywriting agent to trigger the copywriting generation process, and finally have the copywriting agent return the generated interactive video script to the master agent.
[0121] The instruction may include: a description of the user's needs (such as theme, style, depth of interaction, etc.) and generation rules that specify the specific rules for copy generation (such as word limit, language style, interactive point design, etc.).
[0122] Step 403: Generate interactive option generation instructions and storyboard planning instructions based on the interactive video script;
[0123] Step 404: Send the interaction option generation instruction and the storyboard planning instruction to the interaction option generation agent and the storyboard agent respectively, and receive the returned interaction options and storyboard plan respectively;
[0124] The above two steps aim to have the main agent first generate interaction option generation instructions and storyboard planning instructions based on the received interactive video script. Then, the interaction option generation instructions are sent to the interaction option generation agent and the storyboard planning instructions are sent to the storyboard agent, thereby triggering their respective generation processes. Finally, the interaction option generation agent generates interaction options and the storyboard agent generates storyboard plans, and the results are returned to the main agent.
[0125] The interaction option generation instruction is used to clarify the interactive points in the text, specify the type of interaction option (such as multiple choice, true / false, input box, etc.) and design rules, while the storyboard planning instruction is used to clarify the content structure of the text, specify the generation rules of the storyboard planning (such as shot design, scene switching, time allocation, etc.).
[0126] Step 405: Generate material generation instructions corresponding to the storyboard plan;
[0127] Step 406: Send material generation instructions to the material generation agent and receive the returned materials.
[0128] The above two steps aim to have the master agent first generate material generation instructions based on the received storyboard plan, then send those instructions to the material generation agent to trigger the material generation process, and finally have the material generation agent generate or select matching materials according to the instructions and return them to the master agent.
[0129] The instruction may include: material requirements for specifying the type of material needed for each frame of video (such as background, characters, props, effects, etc.), and generation rules for specifying the specific rules for material generation (such as style, resolution, format, etc.).
[0130] Steps 401-406 provided in this embodiment aim to demonstrate how, through the coordination of the main intelligent agent, user needs are transformed into specific instructions and distributed to various specialized intelligent agents for execution, ultimately generating interactive video scripts, interactive options, storyboard planning, and materials.
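The instruction flow of steps 401-406 can be sketched as follows. This is an illustrative sketch only: `MasterAgent`, `make_instruction`, the instruction fields, and the stub agents are all hypothetical stand-ins, and steps 403-404 are shown sequentially for simplicity.

```python
def make_instruction(task, payload, rules):
    # An instruction carries the task type, its input, and generation rules.
    return {"task": task, "payload": payload, "rules": rules}


class MasterAgent:
    def __init__(self, agents):
        self.agents = agents  # name -> callable(instruction) -> deliverable
        self.log = []         # order in which agents were invoked

    def dispatch(self, name, instruction):
        self.log.append(name)
        return self.agents[name](instruction)

    def run(self, need):
        style = need.get("style")
        # Steps 401-402: user needs -> script.
        script = self.dispatch("copywriting", make_instruction("script", need, {"style": style}))
        # Steps 403-404: script -> interaction options and storyboard plan.
        options = self.dispatch("options", make_instruction("options", script, {"types": ["multiple_choice"]}))
        board = self.dispatch("storyboard", make_instruction("storyboard", script, {"style": style}))
        # Steps 405-406: storyboard plan -> materials.
        materials = self.dispatch("materials", make_instruction("materials", board, {"style": style}))
        return {"script": script, "options": options, "storyboard": board, "materials": materials}


# Stub agents used only to make the sketch runnable.
stub_agents = {
    "copywriting": lambda ins: f"script about {ins['payload']['theme']}",
    "options": lambda ins: ["A", "B"],
    "storyboard": lambda ins: [{"shot": "close-up", "covers": ins["payload"]}],
    "materials": lambda ins: [f"background for {s['shot']}" for s in ins["payload"]],
}
```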
[0131] Specifically, the control generation scheme provided in this embodiment can take different forms depending on the application scenario:
[0132] 1) Education field: The master agent generates copywriting generation instructions based on the students' needs, and the copywriting agent generates the teaching video script; the master agent generates interaction option generation instructions and storyboard planning instructions based on the script, the interaction option generation agent generates practice question options, and the storyboard agent plans teaching scenes; the master agent generates material generation instructions based on the storyboard plan, and the material generation agent generates teaching materials.
[0133] 2) Advertising field: The master agent generates copywriting generation instructions based on the user's interests, and the copywriting agent generates advertising copy; the master agent generates interaction option generation instructions and storyboard planning instructions based on the copy, the interaction option generation agent generates product options, and the storyboard agent plans advertising scenes; the master agent generates material generation instructions based on the storyboard plan, and the material generation agent generates advertising materials.
[0134] 3) Entertainment field: The master agent generates copywriting generation instructions based on user preferences, and the copywriting agent generates the plot script; the master agent generates interaction option generation instructions and storyboard planning instructions based on the script, the interaction option generation agent generates plot options, and the storyboard agent plans plot scenes; the master agent generates material generation instructions based on the storyboard plan, and the material generation agent generates plot materials.
[0135] Furthermore, during video generation, the main agent can dynamically adjust subsequent instructions based on real-time user feedback or the execution results of specialized agents. For example, if the user's feedback on a certain interactive point is unsatisfactory, the main agent can adjust the interaction option generation instructions and redesign the options; if the material generation agent cannot generate materials that meet the requirements, the main agent can adjust the material generation instructions and select an alternative solution.
[0136] Furthermore, the master agent can prioritize instructions based on the urgency or importance of the task. For example, instructions for generating keyframe footage can be given high priority to ensure their execution; instructions for generating non-core interactive options can be given low priority to optimize resource allocation. Simultaneously, the master agent can establish an instruction feedback mechanism to monitor the execution status of each specialized agent in real time. For instance, if an agent times out or fails, the master agent can reissue the instruction or switch to a backup agent; if the deliverables generated by an agent do not meet expectations, the master agent can adjust the instructions and re-trigger the generation process.
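Instruction prioritization and the timeout or failure handling described above can be sketched as follows. The numeric priority scheme, the retry count, and the function names are assumptions introduced for illustration.

```python
import heapq


def dispatch_by_priority(tasks):
    """tasks: list of (priority, name); a lower number means higher priority.

    Returns the names in execution order; ties keep submission order.
    """
    heap = [(priority, index, name) for index, (priority, name) in enumerate(tasks)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, name = heapq.heappop(heap)
        order.append(name)
    return order


def run_with_fallback(instruction, primary, backup, max_retries=1):
    # Reissue the instruction to the primary agent a limited number of times,
    # then switch to the backup agent.
    for _ in range(max_retries + 1):
        try:
            return primary(instruction)
        except Exception:
            continue  # timeout or failure: try again
    return backup(instruction)
```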
[0137] The user needs identified by the master agent are comprehensive and describe the user's actual needs accurately, but they often cannot be passed in full to each specialized agent; only the information relevant to a specific task type is extracted to form a task execution instruction. In addition, a specialized agent may produce different results across executions even when given the same instruction. Therefore, to ensure that the deliverables output by each specialized agent meet the user's actual needs, the quality of those deliverables can be evaluated, and a quality evaluation and feedback mechanism can be established to better manage the deliverables at each stage.
[0138] Please refer to Figure 5, which is a flowchart of a method for quality evaluation of generated results provided in an embodiment of this disclosure. The method includes the following steps:
[0139] Step 501: Evaluate the quality of any of the received interactive video scripts, interactive options, storyboard plans, and materials based on user needs;
[0140] In this context, quality evaluation can be understood as the process by which the aforementioned executing entity formulates specific quality evaluation standards for each type of deliverable based on user needs and checks the deliverable against them. For example:
[0141] 1) Regarding interactive video scripts: Do they accurately reflect user needs? Do they conform to the intended style and theme? Are the interactive elements designed reasonably?
[0142] 2) Regarding interactive options: Do they match the text content? Do they conform to user interaction habits? Are the option designs clear and easy to understand?
[0143] 3) Regarding the storyboard planning: Does it align with the script? Does it conform to the user's visual preferences? Are the camera shots and scene transitions reasonable?
[0144] 4) Regarding the source material: Does it match the storyboard plan? Does it conform to the user's style preferences? Does the source material quality meet the requirements (such as resolution, clarity, etc.)?
[0145] Specific evaluation methods may include: 1) Rule matching: checking whether the deliverables meet the preset rules and standards; 2) Semantic analysis: analyzing whether the semantics of the text and options match the user's needs through natural language processing technology; 3) Visual analysis: analyzing whether the visual effects of the materials meet the requirements through image recognition technology; 4) User simulation: evaluating the actual effect of interactive options and storyboard planning by simulating user interaction behavior, etc.
[0146] Step 502: In response to a failed quality evaluation result, determine adjustment guidance information based on user needs;
[0147] The adjustment guidance information is used to pinpoint the specific issues in a deliverable that do not meet requirements. Examples include: unreasonable design of interactive points in the script, unclear descriptions of interactive options, shot design in the storyboard that does not align with user preferences, and a mismatch between the style of the materials and user needs.
[0148] Once the specific problem is identified, the adjustment guidance can be further expressed as follows:
[0149] 1) For the copywriting agent: Adjust the design of interactive points and add content that users are interested in;
[0150] 2) For the interaction option generation agent: Optimize the descriptions of the options to make them clearer and easier to understand;
[0151] 3) For the storyboard agent: Adjust the shot design to better suit the user's visual preferences;
[0152] 4) For the material generation agent: Change the style of the materials to match the user's needs.
[0153] Step 503: For each agent whose deliverable failed the quality evaluation, issue a regeneration instruction containing the adjustment guidance information until a passing quality evaluation result is obtained.
[0154] This step involves the aforementioned executing entity generating a regeneration instruction based on the adjustment guidance information, clearly defining the content and specific requirements for adjustment. This regeneration instruction is then issued to the corresponding specialized intelligent agent to trigger the regeneration process. Subsequently, the specialized intelligent agent regenerates the deliverable according to the instruction, and the master intelligent agent re-evaluates the quality of the new deliverable until it passes the evaluation.
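The evaluate, adjust, and regenerate loop of steps 501-503 can be sketched with a simple rule-matching evaluator. The evaluation rules, the `max_rounds` cap (this disclosure describes repeating until a pass, without a fixed limit), and the stub agent are assumptions for illustration.

```python
def evaluate(deliverable, need):
    # Step 501 sketch (rule matching): return a list of issues, empty on pass.
    issues = []
    if deliverable.get("style") != need["style"]:
        issues.append(f"style should be {need['style']}")
    if not deliverable.get("interaction_points"):
        issues.append("add at least one interaction point")
    return issues


def generate_until_pass(agent, instruction, need, max_rounds=3):
    for round_no in range(1, max_rounds + 1):
        deliverable = agent(instruction)
        issues = evaluate(deliverable, need)
        if not issues:
            return deliverable, round_no
        # Steps 502-503 sketch: attach adjustment guidance and regenerate.
        instruction = {**instruction, "adjustments": issues}
    raise RuntimeError("deliverable did not pass within max_rounds")


def stub_copy_agent(instruction):
    # Hypothetical agent that only follows the style once guidance is given.
    guided = "adjustments" in instruction
    return {"style": instruction["style"] if guided else "serious",
            "interaction_points": ["q1"]}
```

A bounded loop is a practical safeguard; when the cap is reached, the master agent could instead switch to a backup agent or an alternative deliverable.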
[0155] Furthermore, if the regenerated deliverable still fails the quality evaluation, the main agent can initiate a multi-round adjustment mechanism to gradually optimize the deliverable. For example, the first round of adjustment might optimize the interactive design of the copy; the second round might further adjust the language style of the copy; and the third round might optimize the wording of the interactive options. Also, if a quality issue with a deliverable involves multiple agents, the main agent can coordinate these agents for collaborative adjustments. For example, if the storyboard planning and the source material do not match, the main agent can simultaneously adjust the instructions of both the storyboard agent and the source material generation agent.
[0156] Furthermore, to reduce the likelihood of multiple reworks and modifications due to failure to pass quality evaluations, at least one of the copywriting agent, interaction option generation agent, storyboard agent, and material generation agent can be controlled to simultaneously provide at least two alternative deliverables to the received generation instructions. These deliverables correspond to at least one of the following: interactive video script, interaction options, storyboard planning, and raw materials. In other words, by providing multiple alternative deliverables simultaneously, the main agent can choose from them, thereby increasing the probability of passing the quality evaluation on the first attempt.
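Choosing among several alternative deliverables can be sketched as picking the highest-scoring candidate and accepting it only if it clears a threshold. The scoring function and the threshold value are assumptions; this disclosure does not specify how alternatives are compared.

```python
def pick_best(candidates, score, threshold):
    # Return the best-scoring candidate, or None if none is good enough.
    if not candidates:
        return None
    best = max(candidates, key=score)
    return best if score(best) >= threshold else None


def keyword_score(need_keywords):
    # Hypothetical scorer: count how many needed keywords a text contains.
    wanted = set(need_keywords)
    return lambda text: len(wanted & set(text.split()))
```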
[0157] Steps 501-503 provided in this embodiment are intended to evaluate the quality of deliverables (such as interactive video scripts, interactive options, storyboards, and materials) generated by each specialized intelligent agent, and to make dynamic adjustments based on the evaluation results until all deliverables meet the quality requirements.
[0158] The quality evaluation scheme provided in this embodiment can take different forms depending on the application scenario:
[0159] 1) Education field: The master agent evaluates the quality of the teaching video script, finds that the interactive points are poorly designed, and issues adjustment instructions requiring the copywriting agent to regenerate the script; the regenerated script is evaluated again until it passes.
[0160] 2) Advertising field: The master agent evaluates the quality of the advertising materials, finds that the style does not match the user's needs, and issues adjustment instructions requiring the material generation agent to regenerate the materials; the regenerated materials are evaluated again until they pass.
[0161] 3) Entertainment field: The master agent evaluates the quality of the storyboard plan, finds that the shot design does not meet the user's preferences, and issues adjustment instructions requiring the storyboard agent to regenerate the storyboard plan; the regenerated storyboard plan is evaluated again until it passes.
[0162] Based on any of the above embodiments, if the initial video content contains a real or virtual image (that is, the likeness of a real person or a virtual character), the generated interactive video should be controlled to contain the same real or virtual image, and the posture of that image in the interactive video, such as hand posture, face posture, and body posture, should be controlled to match the video content of the interactive video.
[0163] Based on any of the above embodiments, after the interactive video generated based on the current user needs is completed, the system can further control each intelligent agent to jointly generate a predicted interactive video corresponding to the user needs in the future (relative to the current interaction moment, or relative to the currently generated interactive video).
[0164] That is, the aforementioned execution entity predicts future user needs based on the current user needs and historical user behavior data, and then controls the specialized agents (such as the copywriting agent, interaction option generation agent, storyboard agent, and material generation agent) to collaboratively generate interactive videos that match the predicted needs, for example video content related to "new technology product reviews" or "travel route planning."
[0165] During subsequent interactions, if there is a target preset interactive video (that is, a pre-generated predicted video) whose degree of matching with the user's actual needs exceeds a preset degree, that video is provided directly to the target user. The number of preset interactive videos and the number of future nodes they cover are determined based on availability and / or the target user's priority. In terms of availability, both numbers are adjusted dynamically according to the system's computing power and resource usage: when system performance is sufficient, more predicted videos can be generated and more future nodes covered; when system performance is limited, both numbers are reduced. In terms of user priority, the generation strategy is adjusted according to the user's importance: high-priority users receive more predicted videos covering more future nodes, while ordinary users receive fewer predicted videos covering fewer future nodes.
[0166] Once the user's future actual needs are clear, the aforementioned execution entity assesses the degree of matching between the predicted video and the actual needs. If there is a predicted video whose degree of matching with the future actual needs exceeds a preset threshold (i.e., a target preset interactive video), the main intelligent agent directly provides it to the user, avoiding the waiting time of regenerating the video. For example, if the user actually shows interest in "new technology product reviews" in the future, and the predicted video content highly matches it, the degree of matching exceeds the preset threshold. Therefore, the user is provided with pre-generated video content related to new technology product reviews, thus initiating a new round of interaction.
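The pre-generation logic above can be illustrated with a short sketch. Everything concrete here is an assumption made for illustration: the function names, the capacity cut-off of 0.5, the specific video and node counts, the 0.6 threshold, and in particular the use of word-overlap (Jaccard) similarity as the degree of matching, since the text does not say how the degree of matching is computed.

```python
from typing import Dict, Optional, Tuple


def prediction_budget(free_capacity: float, high_priority: bool) -> Tuple[int, int]:
    """Return (number of predicted videos, number of future nodes to cover).

    free_capacity is a hypothetical 0..1 measure of spare computing resources.
    """
    videos, nodes = (3, 2) if free_capacity >= 0.5 else (1, 1)
    if high_priority:
        # High-priority users get more predicted videos and deeper coverage.
        videos, nodes = videos + 1, nodes + 1
    return videos, nodes


def match_degree(actual_need: str, predicted_need: str) -> float:
    """Stand-in matching score: Jaccard overlap of lower-cased words."""
    a = set(actual_need.lower().split())
    b = set(predicted_need.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def pick_preset_video(
    actual_need: str,
    preset_videos: Dict[str, str],  # predicted need -> pre-generated video id
    threshold: float = 0.6,
) -> Optional[str]:
    """Return the best pre-generated video if it exceeds the threshold."""
    best_need = max(
        preset_videos,
        key=lambda need: match_degree(actual_need, need),
        default=None,
    )
    if best_need is not None and match_degree(actual_need, best_need) > threshold:
        return preset_videos[best_need]
    return None  # caller falls back to generating a new video


presets = {
    "new technology product reviews": "video_A",
    "travel route planning": "video_B",
}
```

When `pick_preset_video` returns `None`, the normal multi-agent generation path would be used instead, so the pre-generated videos only ever shorten the waiting time and never replace the regular flow.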
[0167] Based on any of the above embodiments, considering that multiple interactive video clips (segments) are generated progressively in chronological order, this embodiment can also provide the following rewind mechanism at the interaction level:
[0168] That is, the aforementioned execution entity can receive the rewind selection information of the target user for the historically generated interactive video. If the target user performs a new interaction on the historical node corresponding to the rewind selection information, it can also control the various intelligent agents to jointly generate a new interactive video based on the new user needs corresponding to the new interaction.
[0169] The rewind function allows users to return to previously generated interactive video nodes. For example, while watching a video, a user can click the "Rewind" button to return to a specific historical node, or select a specific historical node via the timeline or node list. Specifically, rewind information can originate from the user's selection of the target node (such as a specific time point or interaction point) or from the user's desire to reselect due to dissatisfaction with the current content.
[0170] When a user reverts to a previous point in time, the main agent collects the user's new interactive behaviors in real time. This includes things like the user reselecting an interaction option, clicking on an element in a video, or inputting new feedback via voice or text. The main agent then analyzes these new interactions to understand the corresponding new user needs. For example, if the user reselected an interaction option, the new need might be related to the content theme or style of that option; if the user input new feedback, the new need might be related to keywords or sentiment expressed in that feedback. Finally, based on these new user needs, the main agent generates new instructions and sends them to the various specialized agents to obtain newly generated interactive videos. These videos are then provided to the user to continue the subsequent interaction process.
[0171] Furthermore, it can support users rewinding to multiple historical points and engaging in new interactions at different points. For example, a user can rewind to the beginning of the video and choose a different plot branch; a user can rewind to a key point and choose different interaction options. It can also record the user's rewind path and new interaction behavior to optimize subsequent video generation. For example, if a user frequently rewinds to a certain point and chooses the same interaction option, the main agent can enhance the optimization of the content at that point. Simultaneously, after a user rewinds to a historical point, the main agent can provide guidance information to help the user engage in new interactions. For example, it can prompt the user to choose different interaction options and provide background information or suggestions related to the historical point.
[0172] The rewind scheme provided in this embodiment can take different forms depending on the application scenario:
[0173] 1) Education domain: When students watch instructional videos, they can go back to a specific knowledge point and reselect practice question options; the main agent generates new instructional videos based on the new selections, providing targeted explanations.
[0174] 2) Advertising domain: When a user is watching an advertising video, they can go back to a product introduction node and reselect the function options; the main agent generates a new advertising video based on the new selection, showcasing the product functions that the user is interested in.
[0175] 3) Entertainment: When watching a story video, users can rewind to a specific story branch and choose a new option; the main agent generates a new story video based on the new choice, showcasing different story developments.
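One way to picture the rewind mechanism described above is as a tree of historical nodes in which a rewind moves the current position back and a new interaction opens a new branch. The sketch below is an assumption-laden illustration: `Node`, `InteractionHistory`, and the `generate` callback (standing in for the full multi-agent generation) are hypothetical names, and the rewind log is only one possible way to record the rewind path mentioned in the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    node_id: int
    need: str               # user need that produced this clip
    video: str              # generated interactive video clip
    parent: Optional[int]   # id of the node the user interacted from


class InteractionHistory:
    def __init__(self, generate: Callable[[str], str]) -> None:
        self._generate = generate          # stand-in for multi-agent generation
        self.nodes: Dict[int, Node] = {}
        self.current: Optional[int] = None
        self.rewind_log: List[int] = []    # recorded rewind path

    def interact(self, need: str) -> Node:
        """Generate a new clip from the current node for a new user need."""
        node = Node(len(self.nodes), need, self._generate(need), self.current)
        self.nodes[node.node_id] = node
        self.current = node.node_id
        return node

    def rewind(self, node_id: int) -> None:
        """Return to a historical node; earlier branches are kept, not deleted."""
        if node_id not in self.nodes:
            raise KeyError(f"unknown historical node {node_id}")
        self.rewind_log.append(node_id)
        self.current = node_id

    def path(self) -> List[str]:
        """Needs along the branch from the first node to the current node."""
        needs: List[str] = []
        node_id = self.current
        while node_id is not None:
            node = self.nodes[node_id]
            needs.append(node.need)
            node_id = node.parent
        return needs[::-1]
```

Keeping abandoned branches in `nodes` lets the same structure support rewinding to multiple historical points, and the counts in `rewind_log` could be used to spot nodes that users revisit frequently.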
[0176] To enhance understanding, this embodiment addresses the shortcomings of existing interactive video interaction methods, such as insufficient intelligence, inability to meet user needs, and limited interactive content. Specifically, it proposes an interactive video interaction method based on multi-agent collaboration. This method aims to efficiently and accurately understand user interaction needs and achieve dynamic generation and continuous optimization of interactive video content through the collaboration of intelligent agents. The key technical solutions are as follows:
[0177] As shown in Figure 6-1, this embodiment designs an interactive video generation system based on the collaborative work of a main agent and multiple sub-agents, specifically including the following agents:
[0178] 1) Main Agent: Responsible for overall interactive process control, accurately understanding user needs, coordinating and scheduling other agents in real time, and providing unified evaluation, feedback, and quality monitoring of the outputs of sub-agents such as text, subtitles, interactive options, storyboards, and image or short video generation to ensure the consistency, accuracy, and quality of the overall interactive video generation. Furthermore, the main agent can intelligently and automatically render based on video templates, quickly generating the final video presentation effect and improving interactive response speed.
[0179] 2) Copywriting agent: Responsible for generating video copy based on user needs, including video theme planning, content organization, and expression design, with full consideration of user preferences and interactive context. The generated copy is evaluated and optimized by the main agent to ensure its quality, relevance, and attractiveness.
[0180] 3) Subtitle and option agent (i.e., the interaction option generation agent mentioned in the above embodiments): Generates corresponding subtitles and user interaction options based on the video copy. The subtitles support highlighting of key content so that users can quickly capture key information; the interaction options guide users toward further interaction and enhance user engagement.
[0181] 4) Storyboard agent: Based on the copy and subtitles, intelligently plans the visual presentation requirements of the video, including the type and theme of the accompanying images or short videos, determines the visual expression method and how the materials are to be acquired (for example, by creation or by search), and interacts with the main agent in real time for quality feedback and adjustment.
[0182] 5) Image or short video generation agent: Based on the storyboard agent's plan, intelligently creates the required image or short video materials or obtains them through a search engine, so that materials are acquired efficiently and generated accurately; the main agent applies strict quality evaluation and control to the results.
[0183] A specific execution flow may include the following steps, performed in sequence:
[0184] 1) Users first interact with the default initial interactive video, inputting their needs, such as clicking options, typing text, or using voice input;
[0185] 2) The main agent analyzes the user's needs and user profile in real time and schedules the copywriting agent to generate copy that meets those needs, applying strict quality control and feedback;
[0186] 3) After the copy passes evaluation, the main agent schedules the subtitle and option agent to generate the corresponding subtitles and interaction options to further improve the user experience. The main agent evaluates the subtitles and options; if their quality is poor, it sends feedback to the subtitle and option agent, which regenerates them;
[0187] 4) After the subtitles and options pass evaluation, the main agent schedules the storyboard agent to generate the corresponding storyboard. The main agent evaluates the storyboard; if its quality is poor, it sends feedback to the storyboard agent, which regenerates the storyboard;
[0188] 5) After the storyboard passes evaluation, the main agent schedules the generation of the corresponding images or short videos. Specifically, if the storyboard plans for images or short videos to be created, the image or short video generation agent is scheduled to create them; if it plans for them to be retrieved through a search engine, the image or short video search agent is scheduled to retrieve them. Whether the materials are created or retrieved, the output must be evaluated by the main agent. If the quality of the images or short videos is poor, the main agent sends feedback to the responsible agent, which regenerates them;
[0189] 6) After the images or short videos pass evaluation, the main agent integrates the copy, subtitles, options, storyboard, and images or short videos, automatically generates template content based on the front-end video rendering template, and renders it into an interactive video that is displayed to the user;
[0190] 7) After that, users can continue to interact with the generated interactive video and input their own needs.
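The sequential flow in steps 2) to 6) can be sketched as a small orchestration function. This is only an illustrative outline under several assumptions: agents are modeled as plain callables taking a payload and optional feedback, the stage keys (`"copywriting"`, `"subtitles_options"`, `"storyboard"`, `"material_create"`, `"material_search"`), the `acquire` field of the storyboard, the `max_retries` cap, and the string template are all hypothetical and do not come from the source.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Agent = Callable[[Any, Optional[str]], Any]                    # (payload, feedback) -> deliverable
Evaluator = Callable[[str, Any], Tuple[bool, Optional[str]]]   # (stage, deliverable) -> (passed, feedback)


def run_stage(name: str, payload: Any, agents: Dict[str, Agent],
              evaluate: Evaluator, max_retries: int = 2) -> Any:
    """Schedule one sub-agent and regenerate with feedback until it passes."""
    feedback: Optional[str] = None
    for _ in range(max_retries + 1):
        deliverable = agents[name](payload, feedback)
        passed, feedback = evaluate(name, deliverable)
        if passed:
            return deliverable
    raise RuntimeError(f"stage '{name}' did not pass quality evaluation")


def generate_interactive_video(user_needs: str, agents: Dict[str, Agent],
                               evaluate: Evaluator, template: str) -> str:
    copy = run_stage("copywriting", user_needs, agents, evaluate)
    subs = run_stage("subtitles_options", copy, agents, evaluate)
    board = run_stage("storyboard", {"copy": copy, "subtitles": subs["subtitles"]},
                      agents, evaluate)
    # The storyboard decides whether materials are created or searched for.
    material_agent = "material_search" if board["acquire"] == "search" else "material_create"
    material = run_stage(material_agent, board, agents, evaluate)
    # Rendering is reduced to filling a text template in this sketch.
    return template.format(copy=copy, subtitles=subs["subtitles"],
                           options=", ".join(subs["options"]), material=material)


def demo() -> Tuple[str, List[str]]:
    log: List[str] = []

    def agent(name: str, output: Any) -> Agent:
        def run(payload: Any, feedback: Optional[str]) -> Any:
            log.append(name)  # record scheduling order
            return output
        return run

    agents = {
        "copywriting": agent("copywriting", "Three-day trip to the coast"),
        "subtitles_options": agent("subtitles_options",
                                   {"subtitles": "Pick your budget", "options": ["low", "high"]}),
        "storyboard": agent("storyboard", {"theme": "beach", "acquire": "search"}),
        "material_create": agent("material_create", "generated.png"),
        "material_search": agent("material_search", "found.png"),
    }
    always_pass: Evaluator = lambda name, deliverable: (True, None)
    video = generate_interactive_video("plan a trip", agents, always_pass,
                                       "{copy} | {subtitles} | {options} | {material}")
    return video, log
```

Calling `demo()` runs the four stages in order with stub agents; because the stub storyboard requests search-based acquisition, the search agent is scheduled rather than the creation agent.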
[0191] A concrete example is shown in Figures 6-2 to 6-7. Figures 6-2 to 6-6 show the content of each interactive video clip and its corresponding interaction options, and Figure 6-7 shows the comprehensive guide produced after the preceding interaction options have been selected. Specifically, Figure 6-2 shows the first interaction, which offers options for selecting the number of travelers; Figure 6-3 shows the second interaction, which displays text and image content; Figure 6-4 shows the third interaction, which collects the number of days of the trip; Figure 6-5 shows the fourth interaction, which collects the budget; Figure 6-6 shows the fifth interaction, which collects the desired activities; and Figure 6-7 returns the resulting customized travel guide.
[0192] Compared with the prior art, the solution provided in this embodiment has the following innovative points:
[0193] 1) An interactive video generation mechanism based on multi-agent collaboration is proposed. Through the main agent's accurate real-time understanding of user needs, unified scheduling of content generation by sub-agents, and quality feedback, the personalized, intelligent, and automated generation of interactive videos is fully realized, significantly improving the overall user interactive experience and satisfaction.
[0194] 2) An intelligent video template rendering method was designed. The main intelligent agent can dynamically and intelligently select and generate video rendering template content according to user needs and interaction characteristics, which significantly improves the flexibility, visual consistency and response speed of interactive video generation.
[0195] 3) Based on the end-to-end generative large model, the multi-agent collaborative mode realizes accurate and efficient task coordination, real-time feedback and continuous optimization among agents, ensuring the global optimization of the entire interactive video task process, fully meeting personalized and diversified user needs, and further improving the user stickiness and interaction efficiency of interactive videos.
[0196] The solution provided in this embodiment can be used in various interactive video application scenarios, such as online education, e-commerce promotion (Figures 6-2 to 6-7 specifically take the promotion of a tourism project as an example), social interaction, and live streaming, and therefore has broad application prospects and market potential. Furthermore, this embodiment can be deeply integrated with existing interactive video platforms and content generation tools to further enhance the intelligence, personalization, and user experience of interactive videos, resulting in significant commercial value and social benefits.
[0197] With further reference to Figure 7, as an implementation of the methods shown in the above figures, this disclosure provides an embodiment of an interactive video generation device based on multi-agent collaboration that enables intelligent interaction. This device embodiment corresponds to the method embodiment shown in Figure 2, and the device can be applied to various electronic devices.
[0198] As shown in Figure 7, the interactive video generation device 700 based on multi-agent collaboration in this embodiment may include: a user needs determination unit 701, a copywriting generation unit 702, an interaction option and storyboard planning generation unit 703, a material generation unit 704, and an interactive video generation unit 705. The user needs determination unit 701 is configured to determine user needs based on the interaction between the target user and the initial video content; the copywriting generation unit 702 is configured to control a preset copywriting agent to generate an interactive video script matching the user needs; the interaction option and storyboard planning generation unit 703 is configured to control a preset interaction option generation agent and a preset storyboard agent to generate interaction options and storyboard planning according to the interactive video script; the material generation unit 704 is configured to control a preset material generation agent to generate the materials constituting each video frame according to the storyboard planning; and the interactive video generation unit 705 is configured to generate an interactive video matching the user needs by combining the interactive video script, interaction options, and materials according to a preset video rendering template.
[0199] In this embodiment, for the specific processing of the user needs determination unit 701, the copywriting generation unit 702, the interaction option and storyboard planning generation unit 703, the material generation unit 704, and the interactive video generation unit 705 in the interactive video generation device 700, and the technical effects they bring, reference may be made to the descriptions of steps 201-205 in the embodiment corresponding to Figure 2, which are not repeated here.
[0200] In some other optional implementations of this embodiment, the user needs determination unit 701 may include:
[0201] The interaction information determination subunit is configured to determine interaction information based on the interaction between the target user and the initial video content.
[0202] The user needs determination subunit is configured to determine user needs based on interaction information and the target user's profile information.
[0203] In some other optional implementations of this embodiment, the interaction information may include at least one of the following:
[0204] Selection information for the interaction options provided with the initial video content, input text information, input image information, input voice information, and an input, shared, or callable link.
[0205] In some other optional implementations of this embodiment, the user needs determination subunit can be further configured to:
[0206] Determine the desired video direction based on the initial video content and the interaction information;
[0207] Adjust the desired video direction using the profile information to obtain the user needs.
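The two-step determination of user needs (first a desired direction from the initial content and the interaction, then an adjustment using the profile) can be sketched as below. The field names (`selected_option`, `text`, `preferred_style`, `expertise`) and the dictionary representation of the needs are hypothetical choices for illustration; the source does not define a data format.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Interaction:
    selected_option: Optional[str] = None  # option chosen in the initial video
    text: Optional[str] = None             # free-text (or transcribed voice) input


def determine_direction(initial_topic: str, interaction: Interaction) -> Dict[str, str]:
    """Step 1: derive the desired video direction from content and interaction."""
    focus = interaction.selected_option or interaction.text or "overview"
    return {"topic": initial_topic, "focus": focus}


def apply_profile(direction: Dict[str, str], profile: Dict[str, str]) -> Dict[str, str]:
    """Step 2: adjust the direction with profile information to get user needs."""
    needs = dict(direction)
    needs["style"] = profile.get("preferred_style", "neutral")
    needs["level"] = profile.get("expertise", "general")
    return needs


def determine_user_needs(initial_topic: str, interaction: Interaction,
                         profile: Dict[str, str]) -> Dict[str, str]:
    return apply_profile(determine_direction(initial_topic, interaction), profile)
```

Separating the two steps keeps the interaction-driven direction independent of the profile, so the same interaction can yield differently styled needs for different users.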
[0208] In some other optional implementations of this embodiment, the interaction options and storyboard planning generation unit 703 can be further configured as follows:
[0209] Control the interaction option generation agent to generate interaction options based on the interactive video script;
[0210] Control the storyboard agent to generate storyboard planning based on the interactive video script and the interaction options.
[0211] In some other optional implementations of this embodiment, the copywriting generation unit 702 can be further configured as follows:
[0212] Generate a copywriting generation instruction corresponding to the user needs;
[0213] Send the copywriting generation instruction to the copywriting agent and receive the returned interactive video script;
[0214] Correspondingly, the interaction options and storyboard planning generation unit 703 can be further configured as follows:
[0215] Generate an interaction option generation instruction and a storyboard planning instruction based on the interactive video script;
[0216] Send the interaction option generation instruction and the storyboard planning instruction to the interaction option generation agent and the storyboard agent respectively, and receive the returned interaction options and storyboard planning respectively.
[0217] Correspondingly, the material generation unit 704 can be further configured as follows:
[0218] Generate material generation instructions corresponding to the storyboard plan;
[0219] Send material generation instructions to the material generation agent and receive the returned materials.
[0220] In some other optional implementations of this embodiment, the interactive video generation device 700 based on multi-agent cooperation and capable of intelligent interaction may further include:
[0221] The quality evaluation unit is configured to evaluate the quality of any of the received interactive video scripts, interactive options, storyboards, and materials based on user needs.
[0222] The adjustment instruction information determination unit is configured to determine adjustment instruction information based on user needs in response to a quality evaluation result of failure.
[0223] The regeneration instruction issuing unit is configured to issue, to the agent whose deliverable failed the quality evaluation, a regeneration instruction containing the adjustment instruction information, until a passing quality evaluation result is obtained.
[0224] In some other optional implementations of this embodiment, the interactive video generation device 700 based on multi-agent cooperation and capable of intelligent interaction may further include:
[0225] A multiple alternative deliverable providing unit is configured to control at least one of the copywriting agent, the interaction option generation agent, the storyboard agent, and the material generation agent to provide at least two alternative deliverables simultaneously for a received generation instruction; wherein the deliverables correspond to at least one of: interactive video script, interaction options, storyboard planning, and materials.
[0226] In some other optional implementations of this embodiment, the interactive video generation device 700 based on multi-agent cooperation and capable of intelligent interaction may further include:
[0227] The image and posture control unit is configured to, in response to the initial video content containing a real or virtual image, control the generated interactive video to contain the same real or virtual image, and control the posture of the real or virtual image appearing in the interactive video to match the video content of the interactive video.
[0228] In some other optional implementations of this embodiment, the interactive video generation device 700 based on multi-agent cooperation and capable of intelligent interaction may further include:
[0229] The predictive interactive video generation unit is configured to, in response to the completion of the generation of an interactive video based on the current user needs, control each agent to jointly generate a predictive interactive video corresponding to future user needs based on the current user needs.
[0230] In some other optional implementations of this embodiment, the interactive video generation device 700 based on multi-agent cooperation and capable of intelligent interaction may further include:
[0231] The direct use unit is configured to, in response to the existence of a target preset interactive video whose degree of matching with the user's actual needs at the subsequent moment exceeds a preset degree, directly provide the target preset interactive video to the target user.
[0232] In some other optional implementations of this embodiment, the number of preset interactive videos and the number of future nodes involved are determined based on availability and / or the priority of the target user.
[0233] In some other optional implementations of this embodiment, the interactive video generation device 700 based on multi-agent cooperation and capable of intelligent interaction may further include:
[0234] The rewind selection information receiving unit is configured to receive rewind selection information input by the target user for historically generated interactive videos;
[0235] The new interactive video generation unit is configured to, in response to a new interaction by the target user at the historical node corresponding to the rewind selection information, control each agent to jointly generate a new interactive video based on the new user needs corresponding to the new interaction.
[0236] This embodiment is a device embodiment corresponding to the method embodiment described above. The interactive video generation device based on multi-agent collaboration provided in this embodiment has a main agent responsible for determining the target user's needs based on the target user's interaction with the initial video content. Then, based on these user needs, the main agent dispatches the intermediate stages (including script, interaction options, storyboard planning, and materials) for generating a matching interactive video to various specialized agents for execution. This allows the specialized agents to better complete the deliverables of each intermediate stage. Finally, the main agent generates an interactive video matching the user's needs using a video rendering template based on the deliverables of each intermediate stage. This solution, through automation and agent collaboration between the main agent and the specialized agents, enables the generated interactive video to fully meet personalized and diverse user needs, further improving user stickiness and interactive experience of the interactive video, thereby enhancing user satisfaction with such services or products.
[0237] According to embodiments of this disclosure, this disclosure also provides an electronic device, which includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, which, when executed by the at least one processor, enable the at least one processor to implement the interactive video generation method based on multi-agent cooperation and intelligent interaction described in any of the above embodiments.
[0238] According to embodiments of this disclosure, this disclosure also provides a readable storage medium storing computer instructions that enable a computer to execute and implement the interactive video generation method based on multi-agent cooperation and intelligent interaction as described in any of the above embodiments.
[0239] According to embodiments of this disclosure, this disclosure also provides a computer program product comprising a computer program that, when executed by a processor, implements the interactive video generation method based on multi-agent cooperation and intelligent interaction described in any of the above embodiments.
[0240] Figure 8 shows a schematic block diagram of an example electronic device 800 that can be used to implement embodiments of the present disclosure. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely illustrative and are not intended to limit the implementation of the present disclosure described and / or claimed herein.
[0241] As shown in Figure 8, device 800 includes a computing unit 801, which can perform various appropriate actions and processes based on a computer program stored in read-only memory (ROM) 802 or a computer program loaded from storage unit 808 into random access memory (RAM) 803. RAM 803 may also store various programs and data required for the operation of device 800. The computing unit 801, ROM 802, and RAM 803 are interconnected via bus 804. Input / output (I / O) interface 805 is also connected to bus 804.
[0242] Multiple components in device 800 are connected to I / O interface 805, including: input unit 806, such as keyboard, mouse, etc.; output unit 807, such as various types of monitors, speakers, etc.; storage unit 808, such as disk, optical disk, etc.; and communication unit 809, such as network card, modem, wireless transceiver, etc. Communication unit 809 allows device 800 to exchange information / data with other devices through computer networks such as the Internet and / or various telecommunications networks.
[0243] The computing unit 801 can be various general-purpose and / or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various special-purpose artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 801 performs the various methods and processes described above, such as an interactive video generation method based on multi-agent cooperation that enables intelligent interaction. For example, in some embodiments, the interactive video generation method based on multi-agent cooperation that enables intelligent interaction can be implemented as a computer software program tangibly contained in a machine-readable medium, such as storage unit 808. In some embodiments, part or all of the computer program can be loaded and / or installed on device 800 via ROM 802 and / or communication unit 809. When the computer program is loaded into RAM 803 and executed by the computing unit 801, one or more steps of the interactive video generation method based on multi-agent cooperation that enables intelligent interaction described above can be performed. Alternatively, in other embodiments, computing unit 801 may be configured by any other suitable means (e.g., by means of firmware) to perform an interactive video generation method based on multi-agent cooperation and intelligent interaction.
[0244] Various embodiments of the systems and techniques described above herein can be implemented in digital electronic circuit systems, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems-on-a-chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and / or combinations thereof. These various embodiments may include implementations in one or more computer programs that can be executed and / or interpreted on a programmable system including at least one programmable processor, which may be a dedicated or general-purpose programmable processor, capable of receiving data and instructions from a storage system, at least one input device, and at least one output device, and transmitting data and instructions to the storage system, the at least one input device, and the at least one output device.
[0245] The program code used to implement the methods of this disclosure may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that when executed by the processor or controller, the program code causes the functions / operations specified in the flowcharts and / or block diagrams to be implemented. The program code may be executed entirely on a machine, partially on a machine, as a standalone software package partially on a machine and partially on a remote machine, or entirely on a remote machine or server.
[0246] In the context of this disclosure, a machine-readable medium can be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium can be, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
[0247] To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having: a display device for displaying information to the user (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor); and a keyboard and pointing device (e.g., a mouse or trackball) through which the user provides input to the computer. Other types of devices can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form (including sound input, voice input, or tactile input).
[0248] The systems and technologies described herein can be implemented in computing systems that include backend components (e.g., as a data server), or computing systems that include middleware components (e.g., an application server), or computing systems that include frontend components (e.g., a user computer with a graphical user interface or web browser through which a user can interact with embodiments of the systems and technologies described herein), or any combination of such backend, middleware, or frontend components. The components of the system can be interconnected via digital data communication of any form or medium (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.
[0249] Computer systems can include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other. The server can be a cloud server, also known as a cloud computing server or cloud host, which is a host product in the cloud computing service system that addresses the shortcomings of traditional physical hosts and virtual private server (VPS) services, namely difficult management and weak business scalability.
[0250] According to the technical solution of this disclosure, the main intelligent agent determines the target user's needs based on the target user's interactive behavior with the initial video content. Based on these user needs, it then dispatches the intermediate stages of generating a matching interactive video (copy, interaction options, storyboard planning, and materials) to the respective specialized intelligent agents, so that each specialized intelligent agent can better complete the deliverable of its stage. Finally, the main intelligent agent uses a video rendering template to generate, from the deliverables of the intermediate stages, an interactive video matching the user needs. Through automation and collaboration between the main intelligent agent and the specialized intelligent agents, this solution enables the generated interactive video to fully meet personalized and diverse user needs, improving the user stickiness and interactive experience of the interactive video and thereby enhancing user satisfaction with such services or products.
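The dispatch-and-evaluate flow summarized above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the `MainAgent` class, the string-based instructions, the `evaluate` callback, and the `max_retries` limit are all assumptions introduced for illustration (the disclosure itself states no retry limit), and each specialized agent is reduced to a plain function from an instruction to a deliverable.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical interface: a specialized agent maps an instruction to a deliverable.
Agent = Callable[[str], str]


@dataclass
class MainAgent:
    agents: Dict[str, Agent]              # keys: "copy", "options", "storyboard", "materials"
    evaluate: Callable[[str, str], bool]  # (deliverable, user_needs) -> passed?
    max_retries: int = 3                  # assumption: not specified in the source
    log: List[str] = field(default_factory=list)

    def _dispatch(self, stage: str, instruction: str, user_needs: str) -> str:
        """Send an instruction to one agent and re-issue it with an
        adjustment hint until the quality evaluation passes."""
        deliverable = self.agents[stage](instruction)
        attempts = 0
        while not self.evaluate(deliverable, user_needs):
            attempts += 1
            if attempts > self.max_retries:
                raise RuntimeError(f"stage {stage!r} failed quality evaluation")
            self.log.append(f"regenerate:{stage}")
            adjusted = f"{instruction} | adjust to better match: {user_needs}"
            deliverable = self.agents[stage](adjusted)
        return deliverable

    def generate(self, user_needs: str) -> Dict[str, str]:
        copy = self._dispatch("copy", f"write copy for: {user_needs}", user_needs)
        options = self._dispatch("options", f"options for: {copy}", user_needs)
        storyboard = self._dispatch(
            "storyboard", f"storyboard for: {copy} + {options}", user_needs
        )
        materials = self._dispatch("materials", f"materials for: {storyboard}", user_needs)
        # Stand-in for the preset video rendering template: the final video is
        # assembled from the copy, the interaction options, and the materials.
        return {"copy": copy, "options": options, "materials": materials}
```

In this sketch the storyboard plan only drives material generation and is not passed to the rendering step, mirroring the description that the video is assembled from the copy, the interaction options, and the materials.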
[0251] It should be understood that steps may be reordered, added, or deleted using the various forms of processes shown above. For example, the steps described in this disclosure can be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution of this disclosure can be achieved; no limitation is imposed herein.
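As one illustration of parallel execution, the following Python sketch requests several alternative deliverables from a single agent concurrently, in the spirit of the claimed step of simultaneously providing at least two alternative deliverables. The async agent interface, the `variant` index, and the scoring function used to choose among candidates are assumptions; the disclosure does not state how an alternative is selected.

```python
import asyncio
from typing import Awaitable, Callable, List

# Hypothetical interface: an async agent takes (instruction, variant index).
AsyncAgent = Callable[[str, int], Awaitable[str]]


async def request_alternatives(agent: AsyncAgent, instruction: str, n: int = 2) -> List[str]:
    """Ask one agent for n alternative deliverables concurrently.
    asyncio.gather returns results in the order the requests were issued."""
    if n < 2:
        raise ValueError("at least two alternatives are expected")
    return list(await asyncio.gather(*(agent(instruction, i) for i in range(n))))


def pick_best(candidates: List[str], score: Callable[[str], float]) -> str:
    """Select the highest-scoring candidate; the scoring rule is an assumption."""
    return max(candidates, key=score)


async def demo_agent(instruction: str, variant: int) -> str:
    await asyncio.sleep(0)  # stand-in for a model call
    return f"{instruction} (variant {variant})"
```

A caller could run `asyncio.run(request_alternatives(demo_agent, "storyboard", 3))` and pass the result to `pick_best` together with whatever quality score the main agent applies.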
[0252] The specific embodiments described above do not constitute a limitation on the scope of protection of this disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of this disclosure should be included within the scope of protection of this disclosure.
Claims
1. A method for generating intelligently interactive videos based on multi-agent collaboration, comprising: determining user needs based on the interactions between a target user and initial video content; controlling a preset copywriting agent to generate interactive video copy that matches the user needs; controlling an interaction option generation agent to generate interaction options according to the interactive video copy; controlling a storyboard agent to generate a storyboard plan according to the interactive video copy and the interaction options; controlling a preset material generation agent to generate, according to the storyboard plan, the materials that constitute each video frame; generating, from the interactive video copy, the interaction options, and the materials and according to a preset video rendering template, an interactive video that matches the user needs; performing, based on the user needs, a quality evaluation on any received one of the interactive video copy, the interaction options, the storyboard plan, and the materials; in response to a failed quality evaluation result, determining adjustment instruction information based on the user needs; issuing, to the agent corresponding to the failed quality evaluation result, a regeneration instruction containing the adjustment instruction information until a passed quality evaluation result is obtained; and controlling at least one of the copywriting agent, the interaction option generation agent, the storyboard agent, and the material generation agent to simultaneously provide at least two alternative deliverables for a received generation instruction, wherein the deliverables correspond to at least one of: the interactive video copy, the interaction options, the storyboard plan, and the materials.
2. The method according to claim 1, wherein determining user needs based on the interactions between the target user and the initial video content comprises: determining interaction information based on the interactions between the target user and the initial video content; and determining the user needs based on the interaction information and profile information of the target user.
3. The method according to claim 2, wherein the interaction information includes at least one of: selection information for interaction options provided with the initial video content, input text information, input image information, input voice information, and input or shared links that can be accessed or invoked.
4. The method according to claim 2, wherein determining the user needs based on the interaction information and the profile information of the target user comprises: determining a video direction requirement based on the initial video content and the interaction information; and correcting the video direction requirement using the profile information to obtain the user needs.
5. The method according to claim 1, wherein controlling the preset copywriting agent to generate interactive video copy that matches the user needs comprises: generating a copy generation instruction corresponding to the user needs; and sending the copy generation instruction to the copywriting agent and receiving the returned interactive video copy; correspondingly, controlling the interaction option generation agent and the storyboard agent to generate the interaction options and the storyboard plan, respectively, comprises: generating an interaction option generation instruction and a storyboard planning instruction, respectively, based on the interactive video copy; and sending the interaction option generation instruction and the storyboard planning instruction to the interaction option generation agent and the storyboard agent, respectively, and receiving the returned interaction options and storyboard plan; and correspondingly, controlling the preset material generation agent to generate, according to the storyboard plan, the materials that constitute each video frame comprises: generating a material generation instruction corresponding to the storyboard plan; and sending the material generation instruction to the material generation agent and receiving the returned materials.
6. The method according to claim 1, further comprising: in response to the initial video content containing a real or virtual image, controlling the generated interactive video to contain the same real or virtual image, and controlling the pose of the real or virtual image appearing in the interactive video to match the video content of the interactive video.
7. The method according to any one of claims 1-6, further comprising: in response to completing generation of the interactive video based on current user needs, controlling the agents to jointly generate a predicted interactive video corresponding to future user needs.
8. The method according to claim 7, further comprising: in response to there being a predicted interactive video whose degree of match with the actual future user needs exceeds a preset level, providing that predicted interactive video directly to the target user.
9. The method according to claim 7, wherein the number of predicted interactive videos and the number of future nodes involved are determined based on availability and / or the priority of the target user.
10. The method according to claim 7, further comprising: receiving rollback selection information, input by the target user, for historically generated interactive videos; and in response to the target user making a new interaction at the historical node corresponding to the rollback selection information, controlling the agents to jointly generate a new interactive video based on new user needs corresponding to the new interaction.
11. An interactive video generation device based on multi-agent collaboration, comprising: a user needs determination unit configured to determine user needs based on the interactions between a target user and initial video content; a copywriting generation unit configured to control a preset copywriting agent to generate interactive video copy that matches the user needs; an interaction option and storyboard plan generation unit configured to control an interaction option generation agent to generate interaction options according to the interactive video copy, and to control a storyboard agent to generate a storyboard plan according to the interactive video copy and the interaction options; a material generation unit configured to control a preset material generation agent to generate, according to the storyboard plan, the materials that constitute each video frame; an interactive video generation unit configured to generate, from the interactive video copy, the interaction options, and the materials and according to a preset video rendering template, an interactive video that matches the user needs; and a quality assessment unit configured to: perform, based on the user needs, a quality evaluation on any received one of the interactive video copy, the interaction options, the storyboard plan, and the materials; in response to a failed quality evaluation result, determine adjustment instruction information based on the user needs; issue, to the agent corresponding to the failed quality evaluation result, a regeneration instruction containing the adjustment instruction information until a passed quality evaluation result is obtained; and control at least one of the copywriting agent, the interaction option generation agent, the storyboard agent, and the material generation agent to simultaneously provide at least two alternative deliverables for a received generation instruction, wherein the deliverables correspond to at least one of: the interactive video copy, the interaction options, the storyboard plan, and the materials.
12. An interactive video generation system based on multi-agent collaboration, comprising: a main agent configured to determine user needs based on the interactions between a target user and initial video content, and to generate, from the received interactive video copy, interaction options, and materials and according to a preset video rendering template, an interactive video that matches the user needs; a copywriting agent configured to generate, under the control of the main agent, interactive video copy that matches the user needs; an interaction option generation agent configured to generate, under the control of the main agent, interaction options according to the interactive video copy; a storyboard agent configured to generate, under the control of the main agent, a storyboard plan according to the interactive video copy; and a material generation agent configured to generate, under the control of the main agent and according to the storyboard plan, the materials that constitute each video frame; wherein the main agent is further configured to: perform, based on the user needs, a quality evaluation on any received one of the interactive video copy, the interaction options, the storyboard plan, and the materials; in response to a failed quality evaluation result, determine adjustment instruction information based on the user needs; issue, to the agent corresponding to the failed quality evaluation result, a regeneration instruction containing the adjustment instruction information until a passed quality evaluation result is obtained; and control at least one of the copywriting agent, the interaction option generation agent, the storyboard agent, and the material generation agent to simultaneously provide at least two alternative deliverables for a received generation instruction, wherein the deliverables correspond to at least one of: the interactive video copy, the interaction options, the storyboard plan, and the materials.
13. An electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method for generating intelligently interactive videos based on multi-agent collaboration according to any one of claims 1-10.
14. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method for generating intelligently interactive videos based on multi-agent collaboration according to any one of claims 1-10.
15. A computer program product comprising a computer program that, when executed by a processor, implements the steps of the method for generating intelligently interactive videos based on multi-agent collaboration according to any one of claims 1-10.
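The prediction and rollback behavior recited in claims 7, 8, and 10 can be sketched in Python as follows. This is an illustrative reading only: the `Session` class, the use of string similarity as the "degree of match", the 0.8 threshold standing in for the "preset level", and the choice to discard predictions on rollback are all assumptions, and `generate` stands in for the full multi-agent pipeline.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Callable, Dict, List


def match_degree(a: str, b: str) -> float:
    """Hypothetical matching measure: string similarity in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


@dataclass
class Session:
    generate: Callable[[str], str]   # stand-in for the multi-agent pipeline
    threshold: float = 0.8           # assumed value for the "preset level"
    history: List[str] = field(default_factory=list)        # one video per node
    predicted: Dict[str, str] = field(default_factory=dict)  # predicted needs -> video

    def prefetch(self, predicted_needs: List[str]) -> None:
        """Generate predicted videos for anticipated future needs (claim 7)."""
        for needs in predicted_needs:
            self.predicted[needs] = self.generate(needs)

    def next_video(self, actual_needs: str) -> str:
        """Serve a predicted video if it matches well enough (claim 8),
        otherwise generate a new one."""
        best = max(
            self.predicted,
            key=lambda p: match_degree(p, actual_needs),
            default=None,
        )
        if best is not None and match_degree(best, actual_needs) > self.threshold:
            video = self.predicted[best]
        else:
            video = self.generate(actual_needs)
        self.history.append(video)
        return video

    def rollback(self, node: int, new_needs: str) -> str:
        """Return to a historical node and generate from a new interaction (claim 10)."""
        del self.history[node + 1:]   # drop everything after the chosen node
        self.predicted.clear()        # assumption: old predictions no longer apply
        return self.next_video(new_needs)
```

How many predictions to prefetch, and for how many future nodes, is left to the caller here; claim 9 ties those numbers to availability and / or the priority of the target user.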