Content generation method and device, computer device, and storage medium

By decomposing the content generation process using a multi-agent architecture, the problem of unstable content generation by intelligent systems is solved, achieving efficient and stable content generation results and improving generation efficiency and consistency.

CN122240255A · Pending · Publication Date: 2026-06-19 · BEIJING ZITIAO NETWORK TECH CO LTD +1


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: BEIJING ZITIAO NETWORK TECH CO LTD
Filing Date: 2026-02-28
Publication Date: 2026-06-19

Application Information

Patent Timeline

28 Feb 2026: Application

19 Jun 2026: Publication

CN122240255A

IPC: G06F9/48; G06F40/166; G06N3/006; G06N3/045; G06N5/022

AI Tagging

Application Domain

Program initiation/switching; Biological models


AI Technical Summary

Technical Problem

Existing intelligent systems are unstable in the content generation process, resulting in a lack of controllability and consistency in the generated content.

Method used

A multi-agent architecture is adopted that breaks the content generation process into four stages: recognition, conception, planning, and execution. The content generation request is first identified to determine the user's needs. The first agent then understands those needs and generates content description text, the second agent parses that text into execution steps and generates the corresponding prompts, and the third agent calls execution tools to carry out the steps and generate the target content.

Benefits of technology

By decomposing the task across collaborating intelligent agents, the stability and consistency of the generated content are ensured, generation efficiency and quality are improved, and the process from creative conception to final product is automated.

✦ Generated by Eureka AI based on patent content.


Abstract

This disclosure provides a content generation method and apparatus, computer device, storage medium, and program product. The method includes: responding to receiving a content generation request from a user, identifying the content generation request to determine user needs; using a first intelligent agent to understand the user needs and generate content description text; the first intelligent agent calling a second intelligent agent to parse the content description text, the second intelligent agent decomposing the content description text into multiple execution steps, and generating prompts corresponding to the multiple execution steps based on the content description text; the second intelligent agent calling a third intelligent agent to execute the multiple execution steps, the third intelligent agent, based on the prompts, calling corresponding execution tools to perform corresponding operations according to the multiple execution steps to generate target content. The content generation method and apparatus, computer device, storage medium, and program product provided by this disclosure can optimize the content generation effect to a certain extent.


Description

Technical Field

[0001] This disclosure relates to the field of computer technology, and in particular to a content generation method and apparatus, computer equipment, and storage medium.

Background Technology

[0002] With the development of information technology, more and more intelligent systems are acquiring the ability to autonomously analyze and generate content. These systems typically obtain user commands through natural language interaction and combine them with model reasoning capabilities and tool invocation capabilities to generate target content.

[0003] However, the inventors of this disclosure have discovered that the generation effect of intelligent systems in related technologies has certain defects.

Summary of the Invention

[0004] This disclosure proposes a content generation method and apparatus, computer equipment, and storage medium to solve or partially solve the above-mentioned problems.

[0005] In a first aspect, this disclosure provides a content generation method, including:

[0006] in response to receiving a user's content generation request, identifying the content generation request to determine the user's needs;

using a first intelligent agent to understand the user's needs and generate content description text;

the first agent invoking a second agent to parse the content description text, the second agent decomposing the content description text into multiple execution steps and generating prompts corresponding to the multiple execution steps based on the content description text; and

the second agent invoking a third agent to execute the multiple execution steps, the third agent, based on the prompts, invoking the corresponding execution tools to perform corresponding operations according to the multiple execution steps in order to generate the target content.
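As an illustration only, the call chain of the first aspect can be sketched in Python. The sketch is not the disclosed implementation: the agents are plain functions rather than model-backed agents, and every name in it (`identify`, `first_agent`, `second_agent`, `third_agent`, `Step`, `TOOLS`) is a hypothetical stand-in chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical tool signature: (prompt, previous artifact) -> new artifact.
Tool = Callable[[str, str], str]


@dataclass
class Step:
    tool_name: str  # name of the execution tool for this step
    prompt: str     # prompt generated for this step


def identify(request: str) -> Dict[str, str]:
    """Identification: turn the raw request into a structured user need."""
    media = "video" if "video" in request.lower() else "image"
    return {"goal": request.strip(), "media": media}


def third_agent(steps: List[Step], tools: Dict[str, Tool]) -> str:
    """Execution: call the tool for each step, passing results forward."""
    artifact = ""
    for step in steps:
        artifact = tools[step.tool_name](step.prompt, artifact)
    return artifact


def second_agent(description: str, tools: Dict[str, Tool]) -> str:
    """Planning: decompose the description into steps with prompts,
    then invoke the third agent to execute them."""
    steps = [
        Step("preprocess", f"Prepare inputs for: {description}"),
        Step("generate", f"Render: {description}"),
    ]
    return third_agent(steps, tools)


def first_agent(need: Dict[str, str], tools: Dict[str, Tool]) -> str:
    """Conception: write content description text, then invoke the second agent."""
    description = f"A {need['media']} that satisfies: {need['goal']}"
    return second_agent(description, tools)


def generate_content(request: str, tools: Dict[str, Tool]) -> str:
    return first_agent(identify(request), tools)


# Stand-in tools that only record what happened (assumption for the demo).
TOOLS: Dict[str, Tool] = {
    "preprocess": lambda prompt, previous: "preprocessed",
    "generate": lambda prompt, previous: f"{previous} -> generated",
}

result = generate_content("Help me generate a video about the AA theme", TOOLS)
print(result)
```

In this sketch each stage hands over a narrow, explicit artifact (structured need, description text, step list), which is one way to read the stability argument made for the staged design.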

[0007] A second aspect of this disclosure provides a content generation apparatus, comprising:

an identification module configured to: in response to receiving a user's content generation request, identify the content generation request to determine the user's needs;

a conversion module configured to: understand the user's needs using a first intelligent agent and generate content description text;

a planning module configured such that: the first agent invokes a second agent to parse the content description text, and the second agent decomposes the content description text into multiple execution steps and generates prompts corresponding to the multiple execution steps based on the content description text; and

an execution module configured such that: the second agent invokes a third agent to execute the multiple execution steps, and the third agent, based on the prompts, invokes the corresponding execution tools to perform corresponding operations according to the multiple execution steps to generate target content.

[0008] A third aspect of this disclosure provides a computer device including one or more processors, a memory, and one or more programs, wherein the one or more programs are stored in the memory and executed by the one or more processors, and the one or more programs include instructions for performing the method of the first aspect.

[0009] A fourth aspect of this disclosure provides a non-volatile computer-readable storage medium comprising a computer program that, when executed by one or more processors, causes the one or more processors to perform the method described in the first aspect.

[0010] The content generation method, apparatus, computer equipment, and storage medium provided in this disclosure break down the content generation process into identification, conceptualization, planning, and execution processes. The conceptualization, planning, and execution are accomplished using different intelligent agents, which can ensure the stability and consistency of the generated content.

Attached Figure Description

[0011] To more clearly illustrate the technical solutions in this disclosure or related technologies, the accompanying drawings used in the description of the embodiments or related technologies will be briefly introduced below. Obviously, the accompanying drawings described below are only embodiments of this disclosure. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0012] Figure 1 shows a schematic diagram of an exemplary system provided by an embodiment of this disclosure.

[0013] Figure 2 shows a flowchart of an exemplary content generation method provided in an embodiment of this disclosure.

[0014] Figure 3 shows a schematic diagram of an exemplary interface according to an embodiment of this disclosure.

[0015] Figure 4 shows a schematic diagram of an exemplary multi-agent architecture employed in an embodiment of this disclosure.

[0016] Figure 5 shows a schematic diagram of the hardware structure of an exemplary computer device provided in an embodiment of this disclosure.

[0017] Figure 6 shows a schematic diagram of an exemplary content generation apparatus provided in an embodiment of this disclosure.

Detailed Implementation

[0018] To make the objectives, technical solutions, and advantages of this disclosure clearer, the following detailed description is provided in conjunction with specific embodiments and the accompanying drawings.

[0019] It should be noted that, unless otherwise defined, the technical or scientific terms used in the embodiments of this disclosure should have the ordinary meaning understood by one of ordinary skill in the art to which this disclosure pertains. The terms "first," "second," and similar terms used in the embodiments of this disclosure do not indicate any order, quantity, or importance, but are merely used to distinguish different components. Terms such as "comprising" or "including" mean that the element or object preceding the word encompasses the elements or objects listed following the word and their equivalents, without excluding other elements or objects. Terms such as "connected" or "linked" are not limited to physical or mechanical connections, but can include electrical connections, whether direct or indirect. Terms such as "upper," "lower," "left," and "right" are used only to indicate relative positional relationships; when the absolute position of the described object changes, the relative positional relationship may also change accordingly.

[0020] It is understood that before using the technical solutions of the various embodiments in this disclosure, users will be informed of the type, scope of use, and usage scenarios of the personal information involved in an appropriate manner, and user authorization will be obtained.

[0021] For example, upon receiving a user's active request, a prompt message is sent to the user to explicitly inform them that the requested operation will require the acquisition and use of the user's personal information. This allows the user to independently choose, based on the prompt message, whether to provide personal information to the software or hardware such as electronic devices, applications, servers, or storage media performing the operations of this disclosed technical solution.

[0022] As an optional but not limited implementation, in response to a user's active request, sending a prompt message to the user can be done via a pop-up window, where the prompt message can be presented in text format. Furthermore, the pop-up window can also include a selection control allowing the user to choose "agree" or "disagree" to provide personal information to the electronic device.

[0023] It is understood that the above notification and user authorization process are merely illustrative and do not constitute a limitation on the implementation of this disclosure. Other methods that comply with relevant laws and regulations may also be applied to the implementation of this disclosure.

[0024] As used herein, the term "model" refers to a model that learns the relationship between inputs and outputs from training data, enabling it to generate corresponding outputs for a given input after training. Model generation can be based on machine learning techniques. Deep learning is a class of machine learning algorithms that process inputs and provide corresponding outputs using multiple layers of processing units. A neural network model is an example of a deep learning-based model. Herein, "model" may also be referred to as a "machine learning model," "learning model," "machine learning network," or "learning network," and these terms are used interchangeably.

[0025] Figure 1 shows a schematic diagram of an exemplary system 100 provided in an embodiment of this disclosure. This exemplary system 100 may include a terminal device 110, a server 130, an agent 160, and a tool 170 invoked by the agent 160. Optionally, software and / or an application 120 (hereinafter referred to as application 120) may be installed on the terminal device 110. A user 140 may interact with the application 120 via the terminal device 110 and / or a device attached to the terminal device 110.

[0026] In some embodiments, application 120 may be downloaded and installed on terminal device 110. In some embodiments, application 120 may also be accessed in other ways, such as via a web page. In system 100 of Figure 1, in response to application 120 being launched, terminal device 110 can display the interface 150 of application 120.

[0027] In some embodiments, terminal device 110 can communicate with server 130 so that services can be provided for application 120. Terminal device 110 can be any type of mobile terminal, fixed terminal, or portable terminal, including mobile phones, desktop computers, laptop computers, notebook computers, netbook computers, tablet computers, media computers, multimedia tablets, personal communication system (PCS) devices, personal navigation devices, personal digital assistants (PDAs), audio / video players, digital cameras / camcorders, television receivers, radio receivers, e-book devices, gaming devices, or any combination thereof, including accessories and peripherals of these devices or any combination thereof. In some embodiments, terminal device 110 can also support any type of user-facing interface (such as "wearable" circuitry). Server 130 can be any of various types of computing systems / servers capable of providing computing power, including but not limited to mainframes, edge computing nodes, computing devices in cloud environments, etc. Server 130 can be a server providing various services, such as a backend server supporting various applications or software displayed on terminal device 110. Server 130 can be hardware or software. When server 130 is hardware, it can be implemented as a distributed server cluster consisting of multiple servers, or as a single server. When server 130 is software, it can be implemented as multiple software programs or software modules (e.g., used to provide distributed services), or as a single software program or software module. No specific limitations are made here.

[0028] In embodiments of this disclosure, application 120 can provide interaction functionality with an agent. Application 120 can be an application dedicated to providing agent services, or an application integrated with an agent (i.e., it can provide functions or services other than those of an agent). It is understood that, although Figure 1 shows a single application, in practice multiple applications can be installed on the terminal device 110.

[0029] In the embodiments of this disclosure, intelligent agents 161, 162, ..., 16n (which may be collectively referred to or individually as intelligent agent 160) may be deployed locally on the terminal device 110 or remotely. In the case of remote deployment, the terminal device 110 may directly invoke the intelligent agent, or it may invoke the intelligent agent via the server 130. An intelligent agent, also known as an Artificial Intelligence Agent (AI Agent), can refer to an intelligent entity capable of understanding goals, perceiving the environment, making decisions, and performing actions.

[0030] In embodiments of this disclosure, the intelligent agent 160 may possess intelligent dialogue and task processing capabilities. The terminal device 110 provides an interface 150 that can present interactions with the intelligent agent 160. In the interface 150, the user 140 can submit task requests to the intelligent agent 160 by inputting natural language input (text input or voice input), and can upload online or offline files, instructing the intelligent agent 160 to assist in completing various tasks through dialogue.

[0031] In embodiments of this disclosure, during interaction with user 140, agent 160 can respond to user 140's content generation request and generate media content indicated by the user. In some embodiments, during task execution, agent 160 can invoke one or more tools 171, 172, ... 17m (collectively or individually referred to as tools 170) to assist in task execution and the provision of task results, as needed for the task. These tools 170 can be any type of tool, such as text generation tools, file reading tools, information search tools, online or offline databases, image processing tools, image generation tools, web page creation tools, etc. In some embodiments, tools 170 can also be agents, services, or other entities with tool attributes.

[0032] In some embodiments, system 100 may further include a management node for a plurality of agents 161-16n, which can interact with the agents 161-16n. In some examples, the management node may, in response to a task request from user 140, determine the task requirements corresponding to the task request. The management node may then, based on the task requirements, assign the task request to an agent 160 that matches the task requirements, requesting that agent 160 to execute the task. In other examples, the management node may also determine an execution plan for the task based on the task requirements. The execution plan may indicate one or more subtasks required to complete the task. The management node may assign these one or more subtasks to one or more agents 160, which will then execute their respective subtasks. Regarding the management node, in some examples, the management node may be implemented by one of the agents 161-16n (in which case, this agent is also referred to as the scheduling agent). In other examples, the management node may be implemented by a machine learning model, such as a language model (LM) or a large language model (LLM).
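The assignment behavior of the management node described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each agent is registered under a single capability keyword and that task requirements are found by keyword lookup; the names `AGENTS`, `determine_requirements`, and `dispatch` are hypothetical and do not come from the disclosure.

```python
from typing import Callable, Dict, List

# Hypothetical agent type: takes a (sub)task description, returns a result.
Agent = Callable[[str], str]

# Registry of agents keyed by the capability they match (assumption).
AGENTS: Dict[str, Agent] = {
    "image": lambda task: f"image agent handled: {task}",
    "text": lambda task: f"text agent handled: {task}",
}


def determine_requirements(task_request: str) -> List[str]:
    """Derive task requirements from the request via simple keyword lookup."""
    lowered = task_request.lower()
    needed = [capability for capability in AGENTS if capability in lowered]
    return needed or ["text"]  # fall back to the text agent


def dispatch(task_request: str) -> List[str]:
    """Assign the task, or one subtask per requirement, to matching agents."""
    return [AGENTS[capability](task_request)
            for capability in determine_requirements(task_request)]


results = dispatch("Write text and draw an image for a poster")
for line in results:
    print(line)
```

A model-backed management node would replace the keyword lookup with its own reasoning; the registry-and-dispatch shape is the only part this sketch is meant to show.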

[0033] In some embodiments, agent 160 may be constructed based on one or more machine learning models. In some embodiments, the machine learning model upon which agent 160 is based may include at least a language model (LM). These machine learning models include content-generative models capable of generating corresponding outputs based on model inputs. In some embodiments, the language model-based machine learning model may receive text-modal model inputs (e.g., natural language and / or machine language) and / or non-text-modal model inputs (e.g., images, speech, video, etc.), and may obtain corresponding model outputs based on the model inputs and prompts, thereby completing the task execution. Here, prompts are used to guide the machine learning model to generate outputs that can resolve the user queries indicated by the model inputs. In application scenarios supporting user dialogue, user 140's input may be provided to the machine learning model as at least a part of the model inputs (other parts may include prompts). This user input is considered a question or query request. Based on the model outputs, corresponding responses may be provided to user 140.

[0034] It should be understood that the structure and function of the various elements in system 100 are described for illustrative purposes only and do not imply any limitation on the scope of this disclosure.

[0035] In some exemplary scenarios, user 140 can generate content through application 120 installed on terminal device 110. Application 120 can invoke intelligent agent 160 to perform the generation task. However, as mentioned above, the inventors of this disclosure have found that in related technologies, when intelligent systems generate content, the process of content generation is uncontrollable because the intelligent agent has the ability to understand the target, perceive the environment, make decisions, and execute actions, resulting in a certain lack of stability in the effect of the generated content.

[0036] In view of this, a first aspect of the present disclosure provides a content generation method that can solve or partially solve the above-mentioned problems to a certain extent.

[0037] Figure 2 shows a flowchart of an exemplary content generation method 200 provided in an embodiment of this disclosure. This content generation method 200 can be used to generate content. Optionally, the content generation method 200 can be implemented by the terminal device 110, server 130, or intelligent agent 160 of Figure 1, or jointly by the entities in system 100 of Figure 1 through interaction.

[0038] As shown in Figure 2, the content generation method 200 may include the following steps.

[0039] In step 202, in response to receiving a content generation request from user 140, the content generation request is identified to determine user needs.

[0040] In this step, the content generation request can be a request generated by terminal device 110 or application 120 based on information input by user 140 for generating content. The information input by the user can be text, voice, images, video, etc.

[0041] Figure 3 shows a schematic diagram of an exemplary interface 300 according to an embodiment of the present disclosure.

[0042] As shown in Figure 3, interface 300 can be the interaction interface between user 140 and intelligent agent 160. This interface 300 can receive content generation requests initiated by the user. For example, as shown in Figure 3, a user can trigger a new task by activating the "New Conversation" control 302. In response to the activation of the "New Conversation" control 302, the terminal device 110 can present an interactive interface for the new task in the interface 300. In some embodiments, the interface 300 may also provide an input field 304 for information input. In the input field 304, the user can enter a specific request for content generation and initiate the content generation request by triggering the send control 306. The input field 304 may support text input, such as entering text in a text input box. Optionally, the input field 304 may also support voice input, such as inputting voice by triggering the voice control 308. Furthermore, the input field 304 may also provide an upload control 310 to support uploading attachments such as requirement documents, design documents, images, or videos to indicate the user's content generation request. In some embodiments, the interface 300 may also provide a list (not shown) of tasks initiated with the intelligent agent 160, indicating each initiated task and its execution status (e.g., interrupted, in progress, completed, etc.).

[0043] For example, user 140 uploads an image (e.g., a selfie of user 140) via upload control 310 and enters some text (e.g., "Help me generate a video about the AA theme"), then triggers send control 306 to initiate a content generation request. At this point, it can be considered that the content generation request from user 140 has been received.

[0044] After receiving a user's content generation request, the request can be identified to determine the user's needs.

[0045] Identifying content generation requests can be achieved by analyzing user input (e.g., text, images, etc.) to determine the user's actual or potential needs.

[0046] In some embodiments, a fourth agent may be used to identify content generation requests.

[0047] Figure 4 shows a schematic diagram of an exemplary multi-agent architecture 400 used in embodiments of this disclosure. Each agent in this multi-agent architecture 400 can be an intelligent agent 160 of Figure 1.

[0048] As shown in Figure 4, the fourth agent 440 can be used to identify content generation requests and generate user needs. For example, the user's natural language needs can be abstracted into structured intents.

[0049] Optionally, the fourth intelligent agent 440 can be a front-end consultation intelligent agent. As the system's "demand understanding and decision-making center," this agent can interact with user 140 in multiple rounds, accurately identify and clarify the user's true goals and constraints, and thereby generate user requirements. Optionally, user 140 can use the interface 300 shown in Figure 3 to interact with the fourth intelligent agent 440. In some embodiments, the identity / role of the fourth intelligent agent 440 can be defined through pre-configuration, so that the fourth intelligent agent 440 performs tasks according to its identity / role, for example, by acting as the system's "demand understanding and decision-making center".

[0050] In some embodiments, after recognizing the content generation request, the fourth agent 440 can output a question to the user 140 based on the preliminary recognition result (e.g., presented on the interface 300) to clarify the user's goal or intention. Furthermore, it can continue to ask questions based on the user 140's answers, and after multiple rounds of question-and-answer sessions, further generate user requirements based on the content of the multiple rounds of questions and answers.
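The multi-round clarification described above can be illustrated with a short Python sketch. It assumes, for illustration only, that the structured intent is a fixed set of slots and that the agent asks one question per missing slot; the slot names, the questions, and the `clarify` function are hypothetical, and a scripted iterator stands in for the user's answers.

```python
from typing import Callable, Dict

# Hypothetical slots of a structured intent and the question asked for each.
REQUIRED_SLOTS: Dict[str, str] = {
    "media_type": "What kind of content do you want (image, video, ...)?",
    "theme": "What theme should the content follow?",
    "style": "What style or mood do you prefer?",
}


def clarify(initial: Dict[str, str],
            ask: Callable[[str], str],
            max_rounds: int = 5) -> Dict[str, str]:
    """Ask follow-up questions until every required slot is filled
    or the round limit is reached, then return the user requirements."""
    need = dict(initial)
    for _ in range(max_rounds):
        missing = [slot for slot in REQUIRED_SLOTS if slot not in need]
        if not missing:
            break
        slot = missing[0]
        need[slot] = ask(REQUIRED_SLOTS[slot])
    return need


# Scripted answers stand in for the user's replies on interface 300.
scripted_answers = iter(["a holiday", "funny"])
need = clarify({"media_type": "video"}, ask=lambda question: next(scripted_answers))
print(need)
```

The round limit is a design choice of this sketch, added so that the loop ends even when the user never supplies a missing detail.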

[0051] In some embodiments, the fourth agent 440 can also be the management node of the multi-agent architecture 400, i.e., the scheduling agent. As the unified entry point and scheduling core of the entire system, the fourth agent 440 can decide whether and how to schedule other expert agents to work collaboratively based on the complexity and type of the task.

[0052] Continuing to refer to Figure 2, in step 204, the first intelligent agent is used to understand the user's needs and generate content description text.

[0053] In this step, after obtaining user needs, content description text can be generated based on these needs, thereby transforming user needs into a content planning proposal. This content description text can be a structured "creative proposal" with visual narrative and aesthetic value.

[0054] In some situations, a first agent can be used to transform user requests into content description text.

[0055] As shown in Figure 4, the first intelligent agent 410 can understand user needs and generate content description text based on that understanding. The first intelligent agent 410 can be a gameplay planning intelligent agent. As the system's "creative engine," this agent is responsible for deeply understanding the user's higher-order intentions and transforming them into structured "creative solutions" with visual narrative and aesthetic value. In some embodiments, the first intelligent agent 410 can be pre-configured to define its identity / role, enabling it to perform tasks according to its identity / role. For example, it can act as the system's "creative engine" to perform tasks (e.g., implementing requirement divergence or planning solutions from scratch).

[0056] Optionally, the first intelligent agent 410 can also combine relevant knowledge in the first knowledge base to transform user requirements into content description text. Therefore, in some embodiments, using the first intelligent agent to understand the user requirements and generate content description text may further include: using the first intelligent agent to obtain first knowledge information matching the user requirements from the first knowledge base corresponding to the user requirements; and using the first intelligent agent to generate the content description text based on the first knowledge information and the user requirements.

[0057] In this embodiment, the first knowledge base can be a visual gameplay knowledge base. This knowledge base distills the industry experience and creative methodologies of top visual designers and content planners into explicit and rule-based machine-executable instructions, which can then be invoked by the first intelligent agent 410. Thus, unlike "black box" decision-making that relies on general models, the first intelligent agent 410 is guided by the first knowledge base. This ensures that every decision made by the agent is based on evidence, guaranteeing that the final output content meets industry standards in terms of professional aesthetics, visual structure, and stylistic consistency.

[0058] In this embodiment, the first intelligent agent 410 can extract one or more keywords related to the user's needs, and then match the corresponding first knowledge information in the first knowledge base based on the one or more keywords. Subsequently, the first intelligent agent 410 can generate the content description text based on the first knowledge information and the user's needs.
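The keyword-based matching described above can be sketched as follows. This is a minimal illustration under stated assumptions: the knowledge base is a small in-memory dictionary of tagged entries, keywords are extracted by splitting on whitespace, and matching is a count of shared keywords. The entries and the names `KNOWLEDGE_BASE`, `extract_keywords`, `match_knowledge`, and `describe` are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical knowledge base: entry name -> tags it applies to.
KNOWLEDGE_BASE: Dict[str, Set[str]] = {
    "gothic blind box figurine gameplay": {"figurine", "gothic", "holiday", "funny"},
    "watercolor postcard gameplay": {"postcard", "watercolor", "travel"},
}


def extract_keywords(user_need: str) -> Set[str]:
    """Very rough keyword extraction: lowercase words without punctuation."""
    return {word.strip(".,!?").lower() for word in user_need.split()}


def match_knowledge(user_need: str, top_k: int = 1) -> List[str]:
    """Return the entries sharing the most keywords with the user need."""
    keywords = extract_keywords(user_need)
    scored = [(len(keywords & tags), name) for name, tags in KNOWLEDGE_BASE.items()]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:top_k]]


def describe(user_need: str) -> str:
    """Combine the user need with matched knowledge into description text."""
    matched = match_knowledge(user_need)
    if not matched:
        return user_need
    return f"{user_need} (using: {matched[0]})"


text = describe("Generate a funny holiday video from my photo")
print(text)
```

A production system would more likely use embedding-based retrieval than literal keyword overlap; the overlap count is used here only because it is easy to follow.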

[0059] For example, if the user's requirement is to generate a funny video related to a certain holiday theme based on the uploaded image, the first intelligent agent 410, after calling the first knowledge base, can generate the content description text based on the user's requirement, which could be "to generate a video with 3D gothic style figurine effects from the user's uploaded image through a gothic blind box figurine gameplay based on a certain holiday theme".

[0060] Continuing to refer to Figure 2, in step 206, the first agent calls the second agent to parse the content description text. The second agent decomposes the content description text into multiple execution steps and generates prompts corresponding to the multiple execution steps based on the content description text.

[0061] In this step, since the content description text provides a more specific content planning scheme, after obtaining the content description text, the detailed execution steps can be decomposed or broken down based on the scheme planned in the content description text. This allows the content generation process to be completed according to these execution steps, thereby generating content stably, controllably, and scalably.

[0062] Based on this, in order to enable subsequent execution steps to better complete the content generation process, this step can also generate prompts corresponding to the multiple execution steps based on the content description text.

[0063] In some embodiments, after the first agent 410 identifies a user's request, it can invoke a second agent 420 to parse the content description text. The parsing of the content description text by the second agent may further include decomposing the content description text into multiple execution steps and generating prompts.

[0064] As shown in Figure 4, the second agent 420 receives content description text and can generate multiple execution steps and prompts accordingly. The second agent 420 can be a path planning agent. As the system's "technical director," this agent receives content description text (creative solutions) and breaks it down into a detailed, executable, multi-step "technical blueprint." A key part of this work is "creating" the high-quality, professional-grade prompts needed to drive the underlying visual model. In some embodiments, the second agent 420 can be pre-configured to define its identity / role, enabling it to perform tasks according to its identity / role, such as implementing step decomposition and generating prompts.

[0065] In some embodiments, the method 200 may further include: determining the execution order of the plurality of execution steps; and determining a plurality of execution tools corresponding to the plurality of execution steps respectively.

[0066] The execution order of the multiple execution steps can be determined when the execution steps are decomposed. Optionally, the second intelligent agent 420 can obtain, from a third knowledge base (step planning knowledge base) used for decomposing execution steps, fourth knowledge information matching the content description text. Based on the fourth knowledge information, it then decomposes the content description text into multiple execution steps and determines their execution order, thereby improving both the effectiveness of the decomposition and the soundness of the execution order.

[0067] For example, if the content description text is "using a gothic-themed blind box figurine game based on a certain holiday, a video with a 3D gothic figurine effect is generated from a user-uploaded image", the multiple execution steps could be: preprocessing the user-uploaded image to obtain an image with a target resolution and target size; generating an image with a certain holiday style based on the image; adding a gothic effect to the image with a certain holiday style; generating figurine images from the image with a gothic effect; generating a video based on the final image; adding a dark filter to the video, and so on.
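As an illustrative, non-limiting sketch, the decomposition in this example can be pictured as an ordered list of execution steps. The `ExecutionStep` class, its fields, and the hard-coded list below are assumptions made only for illustration; this disclosure does not prescribe a data format, and in practice the steps would be produced by the second agent rather than written by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutionStep:
    """Hypothetical record for one execution step; not defined in the disclosure."""
    order: int        # position in the execution order
    description: str  # what the step should accomplish


# Steps from the example above, deliberately listed out of order.
steps = [
    ExecutionStep(5, "generate a video based on the final image"),
    ExecutionStep(1, "preprocess the user-uploaded image to the target resolution and size"),
    ExecutionStep(3, "add a gothic effect to the holiday-style image"),
    ExecutionStep(2, "generate a holiday-style image from the preprocessed image"),
    ExecutionStep(6, "add a dark filter to the video"),
    ExecutionStep(4, "generate figurine images from the gothic-effect image"),
]

# Sorting by the order field recovers the execution order.
ordered_steps = sorted(steps, key=lambda s: s.order)

for step in ordered_steps:
    print(f"{step.order}. {step.description}")
```

Storing the order explicitly on each step, rather than relying on list position, is one possible design choice; it lets a later stage check or restore the order independently of how the steps were transmitted.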

[0068] In some embodiments, when the second agent 420 determines the multiple execution tools corresponding to the multiple execution steps, it can select and combine the optimal tools (atomic capabilities) to ensure the execution effect.

[0069] Therefore, optionally, determining the execution order of the plurality of execution steps and the plurality of execution tools corresponding to each of the plurality of execution steps may further include: identifying one or more candidate tools corresponding to the execution step; in response to detecting that there is only one candidate tool, determining that candidate tool to be the execution tool corresponding to the execution step; and in response to detecting that there are multiple candidate tools, selecting a target candidate tool as the execution tool corresponding to the execution step based on the matching degree between the multiple candidate tools and the execution step.

[0070] In this embodiment, a corresponding execution tool can be determined for each execution step, so that when the execution step is later executed, the appropriate execution tool is used to execute it, making the execution result of the step deterministic and stable.

[0071] In this embodiment, for each execution step, the second agent 420 can determine one or more candidate tools capable of performing that step. When determining candidate tools, a tool that matches the execution step or can perform that step is selected as a candidate tool from among the tools that the agent can invoke (e.g., tool 170 in Figure 1). For example, if the execution step is to generate an image in the style of a certain festival, then the candidate tool can be a tool with image generation capabilities. It is understood that there may be more than one callable tool with image generation capabilities; therefore, this embodiment handles the different situations differently.

[0072] Specifically, if only one candidate tool is detected that matches the execution step, that candidate tool can be directly used as the execution tool corresponding to that execution step. If multiple candidate tools are detected that match the execution step, the target candidate tool with the highest matching degree can be selected from among the multiple candidate tools as the execution tool corresponding to that execution step, thereby improving the execution effect. For example, the matching degree can be calculated based on the similarity between the description information of the candidate tool and the execution step. Of course, it is understood that there are other ways to calculate the matching degree, and the examples here are not intended to limit the scope of protection of this disclosure.
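The selection logic described above can be sketched as follows. This is only one possible reading: the word-overlap (Jaccard) similarity, the function names `matching_degree` and `select_tool`, and the tool names are assumptions for illustration, since this disclosure leaves the way of calculating the matching degree open.

```python
from typing import Optional


def matching_degree(tool_description: str, step_description: str) -> float:
    """Jaccard overlap of lowercase word sets (an assumed similarity measure)."""
    tool_words = set(tool_description.lower().split())
    step_words = set(step_description.lower().split())
    if not tool_words or not step_words:
        return 0.0
    return len(tool_words & step_words) / len(tool_words | step_words)


def select_tool(candidates: dict[str, str], step_description: str) -> Optional[str]:
    """Return the name of the execution tool for a step, or None if there is no candidate.

    candidates maps a hypothetical tool name to its description information.
    """
    if not candidates:
        return None
    if len(candidates) == 1:
        # Only one candidate tool: use it directly.
        return next(iter(candidates))
    # Several candidate tools: pick the one with the highest matching degree.
    return max(candidates, key=lambda name: matching_degree(candidates[name], step_description))


step = "generate an image in a holiday style"
tools = {
    "image_generator": "generate an image in a given style",
    "video_generator": "generate a video from an image",
}
print(select_tool(tools, step))
print(select_tool({"only_tool": "resize an image"}, step))
```

A production system would more plausibly compare embeddings of the descriptions than raw words; the word-set measure is used here only because it keeps the sketch self-contained.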

[0073] In some embodiments, the second agent 420 may generate prompts based on a second knowledge base (prompt writing knowledge base) used to generate prompts, thereby improving the guidance effect of the prompts.

[0074] As an optional embodiment, generating prompt words corresponding to the plurality of execution steps based on the content description text may further include: obtaining second knowledge information that matches the content description text from a second knowledge base used to generate prompt words; and, based on the second knowledge information and the content description text, determining the content application scenario and generating prompt words that correspond to the multiple execution steps and match the content application scenario.

[0075] In this embodiment, by finding second knowledge information matching the content description text in a second knowledge base, and then based on the second knowledge information and the content description text, the application scenario of the target content to be generated (e.g., structure-preserving video, multi-image fusion, etc.) can be determined. Then, prompt words corresponding to the multiple execution steps matching the content application scenario are generated. The prompt words generated in this way can better meet the needs of the application scenario. Optionally, this prompt word can be a single prompt word that can be used in the execution of each subsequent execution step, ensuring consistency in the execution style of multiple execution steps.
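A minimal sketch of this embodiment is given below, under stated assumptions: the second knowledge base is modeled as a small in-memory dictionary, the scenario is chosen by keyword overlap, and the names `PROMPT_KNOWLEDGE_BASE`, `determine_scenario`, and `build_shared_prompt`, as well as the template wording, are hypothetical. This disclosure does not specify how the knowledge base is stored or queried.

```python
# Hypothetical second knowledge base: scenario -> (trigger keywords, prompt template).
PROMPT_KNOWLEDGE_BASE = {
    "structure-preserving video": (
        {"video", "animate"},
        "Keep the subject's structure unchanged; {description}",
    ),
    "multi-image fusion": (
        {"fuse", "merge", "combine"},
        "Blend all input images into one coherent scene; {description}",
    ),
}


def determine_scenario(description: str) -> str:
    """Pick the scenario whose keywords overlap the description the most."""
    words = set(description.lower().split())
    return max(
        PROMPT_KNOWLEDGE_BASE,
        key=lambda name: len(PROMPT_KNOWLEDGE_BASE[name][0] & words),
    )


def build_shared_prompt(description: str) -> tuple[str, str]:
    """Return the scenario and a single prompt shared by all execution steps."""
    scenario = determine_scenario(description)
    template = PROMPT_KNOWLEDGE_BASE[scenario][1]
    return scenario, template.format(description=description)


scenario, prompt = build_shared_prompt("turn the uploaded photo into a gothic figurine video")
print(scenario)
print(prompt)
```

Because one prompt is reused for every execution step, the style instructions cannot drift between steps, which is the consistency benefit noted above.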

[0076] As another optional embodiment, generating prompt words corresponding to the plurality of execution steps based on the content description text may further include: obtaining, from the second knowledge base used to generate prompt words, multiple pieces of third knowledge information that respectively match the multiple execution steps; and, based on the multiple pieces of third knowledge information and the multiple execution steps, generating multiple prompt words corresponding to the multiple execution steps.

[0077] In this embodiment, by finding third knowledge information that matches each execution step in the second knowledge base, and then generating prompts for each execution step based on the third knowledge information and the execution step, this embodiment generates prompts that correspond one-to-one with each execution step to guide the implementation of that execution step, thereby ensuring the accuracy of each execution step.
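The one-to-one correspondence between execution steps and prompts can be sketched as follows. The keyword-indexed dictionary standing in for the second knowledge base, the guidance strings, and the function names `find_guidance` and `build_step_prompts` are all assumptions for illustration rather than anything defined in this disclosure.

```python
# Hypothetical entries of the second knowledge base, keyed by a keyword.
PROMPT_GUIDANCE = {
    "preprocess": "state the target resolution and size explicitly",
    "gothic": "describe dark palettes, pointed arches, and ornate detail",
    "video": "describe camera motion and keep the subject consistent across frames",
}


def find_guidance(step: str) -> str:
    """Return the first guidance entry whose keyword occurs in the step.

    This plays the role of the third knowledge information for that step.
    """
    lowered = step.lower()
    for keyword, guidance in PROMPT_GUIDANCE.items():
        if keyword in lowered:
            return guidance
    return ""  # no matching knowledge found


def build_step_prompts(steps: list[str]) -> dict[str, str]:
    """Produce exactly one prompt per execution step."""
    prompts = {}
    for step in steps:
        guidance = find_guidance(step)
        prompts[step] = f"{step}. Guidance: {guidance}" if guidance else step
    return prompts


steps = ["Preprocess the uploaded image", "Add a gothic effect", "Generate a video"]
step_prompts = build_step_prompts(steps)
for prompt in step_prompts.values():
    print(prompt)
```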

[0078] In the above embodiments, the second intelligent agent 420 can create professional-grade prompts with rich visual details from scratch, based on the creative solution and a knowledge base. For example, it can autonomously identify and construct complex prompt structures suited to different scenarios (such as structure-preserving videos, multi-image fusion, etc.). In this way, by internalizing the technically demanding work of "prompt engineering" into an automated core creation process, the threshold for creating high-quality content is lowered, allowing the system to reliably reach the best performance of the underlying model.

[0079] Continuing to refer to Figure 2, in step 208, the second agent calls the third agent to execute the plurality of execution steps. The third agent, based on the prompt words, calls the corresponding execution tools to perform corresponding operations according to the plurality of execution steps to generate target content.

[0080] In this step, after obtaining multiple execution steps and prompts, each step can be executed strictly according to the prompts, thus obtaining stable and consistent target content. Depending on the user's needs, the target content can be multimedia content such as images and videos.

[0081] In some cases, after the second agent 420 has broken down the process into multiple execution steps and corresponding prompts, it can invoke the third agent 430 to execute the multiple execution steps to generate the target content. The third agent 430 can then, based on the prompts, invoke the corresponding execution tools to perform the corresponding operations according to the multiple execution steps to generate the target content.

[0082] As shown in Figure 4, the third intelligent agent 430 receives the execution steps and prompts, and can generate target content accordingly. The third intelligent agent 430 can be an execution intelligent agent. As the system's "execution unit," this agent strictly follows the technical blueprint output by the path planning intelligent agent, calling underlying tools and services step by step and without deviation to complete the specific generation operations. It does not perform any autonomous planning or modification, which ensures the determinism and reliability of the execution process. This "separation of decision-making and execution" architecture fundamentally guarantees the freedom of creative conception, the rigor of technical planning, and the stability of final execution. In some embodiments, the third intelligent agent 430 can be pre-configured with a defined identity / role, enabling it to execute tasks according to that identity / role, for example, strictly executing each execution step without any autonomous planning or modification.

[0083] In some embodiments, the third intelligent agent 430 may obtain, from a fourth knowledge base (execution specification knowledge base), multiple pieces of fifth knowledge information that respectively match the multiple execution steps, and then execute each execution step based on the corresponding fifth knowledge information, thereby ensuring that the execution steps are carried out in accordance with the execution specification.

[0084] As mentioned above, in some embodiments, the multiple execution steps may have an execution order, and each execution step corresponds to an execution tool (as shown in Figure 4, execution step A corresponds to tool 461, execution step B corresponds to tool 462, and execution step N corresponds to tool 46k). Therefore, optionally, the third intelligent agent calling, based on the prompt words, the corresponding execution tools to perform the corresponding operations according to the multiple execution steps may further include: the third intelligent agent calling the corresponding execution tools according to the execution order, and executing the execution steps based on the prompt words to generate the target content, thereby executing each execution step in a standardized manner and ensuring the generation effect.
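The ordered execution described above can be sketched as a loop that follows the plan without altering it. The `Tool` signature, the stand-in tools `stylize` and `animate`, and the `(order, tool, prompt)` tuple layout are assumptions for illustration; real execution tools would call underlying models or services rather than format strings.

```python
from typing import Callable

# A hypothetical tool takes the previous result and a prompt and returns a new result.
Tool = Callable[[str, str], str]


def stylize(previous: str, prompt: str) -> str:
    # Stand-in for an image generation tool.
    return f"stylized({previous})"


def animate(previous: str, prompt: str) -> str:
    # Stand-in for a video generation tool.
    return f"video({previous})"


def execute_plan(plan: list[tuple[int, Tool, str]], initial_input: str) -> str:
    """Run (order, tool, prompt) entries in ascending order, chaining results.

    The executor does no planning of its own: it only follows the plan.
    """
    result = initial_input
    for _, tool, prompt in sorted(plan, key=lambda entry: entry[0]):
        result = tool(result, prompt)
    return result


plan = [
    (2, animate, "slow rotation, dark filter"),
    (1, stylize, "gothic figurine style"),
]
target_content = execute_plan(plan, "photo.png")
print(target_content)  # video(stylized(photo.png))
```

Keeping the executor free of any branching on the content of a step mirrors the "separation of decision-making and execution" idea: every decision is already fixed in the plan it receives.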

[0085] As can be seen, this embodiment replaces the traditional manual, static workflow construction with automated "dynamic workflow orchestration." Faced with complex creative needs, the path planning agent can decompose creative ideas into specific multi-step execution links in real time and hand them over to the execution agent for completion. This automation capability improves the end-to-end implementation efficiency from an idea to the final product by tens of times, allowing complex effects that previously required professional engineers to build for hours to now be completed automatically in seconds or tens of seconds.

[0086] In some embodiments, the method 200 further includes: acquiring an image provided by a user; and generating reference materials corresponding to the plurality of execution steps based on the image and the content description text.

[0087] In this embodiment, when the fourth intelligent agent 440 determines that reference materials are needed, it can invoke the fifth intelligent agent 450 to generate reference materials based on images and content description text provided by the user. Optionally, the fifth intelligent agent 450 can be a material supply intelligent agent. As the system's "material scheduling and supply unit," this agent is responsible for analyzing various material elements (such as reference images, sample materials, basic resources, etc.) required in the generation process according to the clearly defined user needs and creative direction, and providing material inputs that can be directly used for subsequent creation as needed by invoking corresponding tools or capabilities, ensuring that the creative idea and execution have a sufficient and matching material foundation. In some embodiments, the system can be pre-configured for the fifth intelligent agent 450 to clarify its identity / role, so that the fifth intelligent agent 450 performs tasks according to its identity / role, such as generating materials according to requirements.

[0088] Furthermore, when reference materials are available, the third intelligent agent 430 can, in accordance with the execution order, call the corresponding execution tool to perform the execution steps based on the prompt words and the reference materials to generate the target content, thereby further improving the generation effect.

[0089] The content generation method provided in this disclosure breaks down the content generation process into decoupled processes of identification, conceptualization, planning, and execution. The identification, conceptualization, planning, and execution are accomplished using different intelligent agents, which can ensure the stability and consistency of the generated content.

[0090] In some embodiments, this content generation method provides an advanced solution based on multi-agent collaboration designed for the automated generation of high-quality multimodal AI effects content. In some embodiments, through a layered and decoupled agent architecture, combined with a domain expert knowledge base, abstract creative ideas are systematically transformed into stable, controllable, and scalable AI effects gameplay and effects, aiming to change the AI effects supply model and improve supply efficiency.

[0091] In some embodiments, to achieve effective separation of the various stages of effect creation, a collaborative architecture consisting of five types of intelligent agents with clearly defined functions is constructed.

[0092] In some embodiments, a complex generation task is deterministically decoupled into three highly cohesive, loosely coupled modules: "conceptualization," "planning," and "execution." This layered collaborative architecture design boasts strong reusability and foresight. Regardless of future iterations and upgrades of the underlying Large Language Model (LLM) or the invoked vertical domain model, this top-level collaborative pattern remains robust and effective. The integration of new technologies only enhances the overall performance and intelligence of the system without requiring disruptive modifications to the core architecture, thus ensuring the system's long-term value and technological viability.

[0093] It should be noted that the method of this disclosure embodiment can be executed by a single device, such as a computer or server. The method of this embodiment can also be applied to a distributed scenario, where multiple devices cooperate to complete the task. In such a distributed scenario, one of these devices may execute only one or more steps of the method of this disclosure embodiment, and the multiple devices will interact with each other to complete the method described.

[0094] It should be noted that the above description describes some embodiments of this disclosure. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recorded in the claims can be performed in a different order than that shown in the above embodiments and still achieve the desired result. Furthermore, the processes depicted in the drawings do not necessarily require a specific or sequential order to achieve the desired result. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

[0095] This disclosure also provides a computer device for implementing the above-described content generation method. Figure 5 shows a schematic diagram of the hardware structure of an exemplary computer device 500 provided in an embodiment of this disclosure. The computer device 500 can be used to implement the terminal device 110, the server 130, and the intelligent agent 160 of Figure 1. In some scenarios, the computer device 500 can also be used to implement the tool 170 of Figure 1.

[0096] As shown in Figure 5, the computer device 500 may include: a processor 502, a memory 504, a network interface 506, a peripheral interface 508, and a bus 510. The processor 502, memory 504, network interface 506, and peripheral interface 508 are interconnected within the computer device 500 via the bus 510.

[0097] Processor 502 may be a central processing unit (CPU), image processor, neural network processor (NPU), microcontroller (MCU), programmable logic device, digital signal processor (DSP), application-specific integrated circuit (ASIC), or one or more integrated circuits. Processor 502 can be used to perform functions related to the techniques described in this disclosure. In some embodiments, processor 502 may also include multiple processors integrated as a single logic component. For example, as shown in Figure 5, processor 502 may include multiple processors 502a, 502b and 502c.

[0098] Memory 504 can be configured to store data (e.g., instructions, computer code, etc.). As shown in Figure 5, the data stored in memory 504 may include program instructions (e.g., one or more programs for implementing the content generation method of embodiments of this disclosure) and data to be processed (e.g., the memory may store configuration files of other modules, etc.). Processor 502 may also access the program instructions and data stored in memory 504 and execute the program instructions to operate on the data to be processed. Memory 504 may include volatile or non-volatile storage devices. In some embodiments, memory 504 may include random access memory (RAM), read-only memory (ROM), optical disk, magnetic disk, hard disk, solid-state drive (SSD), flash memory, memory stick, etc.

[0099] Network interface 506 can be configured to provide the computer device 500 with communication with other external devices via a network. This network can be any wired or wireless network capable of transmitting and receiving data. For example, the network can be a wired network, a local wireless network (e.g., Bluetooth, WiFi, Near Field Communication (NFC), etc.), a cellular network, the Internet, or a combination thereof. It is understood that the type of network is not limited to the specific examples described above.

[0100] The peripheral interface 508 can be configured to connect the computer device 500 to one or more peripheral devices to enable information input and output. For example, peripheral devices may include input devices such as keyboards, mice, touchpads, touch screens, microphones, and various sensors, as well as output devices such as displays, speakers, vibrators, and indicator lights.

[0101] Bus 510 can be configured to transfer information between the various components of computer device 500 (such as processor 502, memory 504, network interface 506, and peripheral interface 508), and may be, for example, an internal bus (e.g., a processor-memory bus) or an external bus (e.g., a USB port or a PCI-E bus).

[0102] It should be noted that although the architecture of the computer device 500 described above only shows the processor 502, memory 504, network interface 506, peripheral interface 508, and bus 510, in specific implementations, the architecture of the computer device 500 may also include other components necessary for normal operation. Furthermore, those skilled in the art will understand that the architecture of the computer device 500 described above may only include the components necessary for implementing the embodiments of this disclosure, and does not necessarily include all the components shown in the figures.

[0103] This disclosure also provides a content generation apparatus. Figure 6 shows a schematic diagram of an exemplary content generation apparatus 600 provided in an embodiment of this disclosure. As shown in Figure 6, the content generation apparatus 600 can be used to implement the content generation method and may further include the following modules.

[0104] The identification module 602 is configured to: in response to receiving a user's content generation request, identify the content generation request to determine the user's needs. The conversion module 604 is configured to: use the first intelligent agent to understand the user's needs and generate content description text. The planning module 606 is configured such that: the first agent calls the second agent to parse the content description text, and the second agent decomposes the content description text into multiple execution steps and, based on the content description text, generates prompt words corresponding to the multiple execution steps. The execution module 608 is configured such that: the second agent calls the third agent to execute the plurality of execution steps, and the third agent, based on the prompt words, calls the corresponding execution tools to perform corresponding operations according to the plurality of execution steps to generate target content.
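The division into four modules can be sketched as a simple pipeline in which each module's output is the next module's input. The class name `ContentGenerationApparatus`, the method names, and the placeholder method bodies are assumptions for illustration; in an actual embodiment each method would delegate to the corresponding agent rather than manipulate strings.

```python
class ContentGenerationApparatus:
    """Hypothetical sketch of apparatus 600 as four chained modules."""

    def identify(self, request: str) -> str:
        # Identification module 602: reduce the request to a user need.
        return request.strip().lower()

    def convert(self, need: str) -> str:
        # Conversion module 604: the first agent writes a content description text.
        return f"content description for: {need}"

    def plan(self, description: str) -> list[tuple[str, str]]:
        # Planning module 606: the second agent yields (step, prompt) pairs.
        return [(f"step {i}", f"prompt {i} | {description}") for i in (1, 2)]

    def execute(self, steps: list[tuple[str, str]]) -> str:
        # Execution module 608: the third agent runs each step in turn.
        return " -> ".join(step for step, _ in steps)

    def generate(self, request: str) -> str:
        need = self.identify(request)
        description = self.convert(need)
        steps = self.plan(description)
        return self.execute(steps)


apparatus = ContentGenerationApparatus()
print(apparatus.generate("  Make a Gothic figurine video  "))
```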

[0105] In some embodiments, the conversion module 604 is configured to: use the first intelligent agent to obtain first knowledge information that matches the user's needs from the first knowledge base corresponding to the user's needs; and use the first intelligent agent to generate the content description text based on the first knowledge information and the user's needs.

[0106] In some embodiments, the planning module 606 is configured to: obtain second knowledge information that matches the content description text from a second knowledge base used to generate prompt words; and, based on the second knowledge information and the content description text, determine the content application scenario and generate prompt words that correspond to the multiple execution steps and match the content application scenario.

[0107] In some embodiments, the planning module 606 is configured to: obtain, from the second knowledge base used to generate prompt words, multiple pieces of third knowledge information that respectively match the multiple execution steps; and, based on the multiple pieces of third knowledge information and the multiple execution steps, generate multiple prompt words corresponding to the multiple execution steps.

[0108] In some embodiments, the planning module 606 is configured to: determine the execution order of the plurality of execution steps and the plurality of execution tools respectively corresponding to the plurality of execution steps. The execution module 608 is configured such that the third intelligent agent, in accordance with the execution order, calls the corresponding execution tool to execute the execution steps based on the prompt words, so as to generate the target content.

[0109] In some embodiments, the apparatus further includes a material generation module configured to: acquire images provided by a user; and generate reference materials corresponding to the plurality of execution steps based on the images and the content description text. The execution module 608 is configured such that the third intelligent agent, in accordance with the execution order, calls the corresponding execution tool to execute the execution steps based on the prompt words and the reference materials, so as to generate the target content.

[0110] In some embodiments, the planning module 606 is configured to: identify one or more candidate tools corresponding to the execution step; in response to detecting that there is only one candidate tool, determine that candidate tool to be the execution tool corresponding to the execution step; and in response to detecting that there are multiple candidate tools, select a target candidate tool as the execution tool corresponding to the execution step based on the matching degree between the multiple candidate tools and the execution step.

[0111] For ease of description, the above apparatus is described in terms of its functions, divided into various modules. Of course, in implementing this disclosure, the functions of the modules can be implemented in one or more pieces of software and / or hardware.

[0112] The apparatus described above is used to implement the corresponding content generation method in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiments, which will not be repeated here.

[0113] Based on the same inventive concept, corresponding to any of the above embodiments, this disclosure also provides a non-volatile computer-readable storage medium containing a computer program, which, when executed by one or more processors, causes the one or more processors to perform the content generation method.

[0114] The computer-readable medium of this embodiment includes permanent and non-permanent, removable and non-removable media, and information storage can be implemented by any method or technology. Information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.

[0115] The computer program stored in the storage medium of the above embodiments is used to cause the one or more processors to execute the content generation method as described in any of the above embodiments, and has the beneficial effects of the corresponding method embodiments, which will not be repeated here.

[0116] Based on the same inventive concept, corresponding to the content generation method of any of the above embodiments, this disclosure also provides a computer program product, which includes one or more computer programs. In some embodiments, the one or more computer programs are executable by one or more processors to cause the one or more processors to perform the content generation method. Corresponding to the execution entity for each step in each embodiment of the content generation method, the processor executing the corresponding step may belong to the corresponding execution entity.

[0117] The computer program products of the above embodiments are used to cause a processor to execute the content generation method as described in any of the above embodiments, and have the beneficial effects of the corresponding method embodiments, which will not be repeated here.

[0118] Those skilled in the art should understand that the discussion of any of the above embodiments is merely exemplary and is not intended to imply that the scope of this disclosure (including the claims) is limited to these examples; within the framework of this disclosure, the technical features of the above embodiments or different embodiments can also be combined, the steps can be implemented in any order, and there are many other variations of different aspects of the embodiments of this disclosure as described above, which are not provided in detail for the sake of brevity.

[0119] Additionally, to simplify the description and discussion, and to avoid obscuring the embodiments of this disclosure, the provided drawings may or may not show well-known power / ground connections to integrated circuit (IC) chips and other components. Furthermore, the apparatus may be shown in block diagram form to avoid obscuring the embodiments of this disclosure, and this also takes into account the fact that the details of implementation of these block diagram apparatuses are highly dependent on the platform on which the embodiments of this disclosure will be implemented (i.e., these details should be fully understood by those skilled in the art). While specific details (e.g., circuits) have been set forth to describe exemplary embodiments of this disclosure, it will be apparent to those skilled in the art that the embodiments of this disclosure can be implemented without these specific details or with variations thereof. Therefore, these descriptions should be considered illustrative rather than restrictive.

[0120] Although this disclosure has been described in conjunction with specific embodiments thereof, many substitutions, modifications, and variations of these embodiments will be apparent to those skilled in the art from the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may be used with the embodiments discussed.

[0121] This disclosure is intended to cover all such substitutions, modifications, and variations that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this disclosure should be included within the scope of protection of this disclosure.

Claims

1. A content generation method, comprising: in response to receiving a user's content generation request, identifying the content generation request to determine the user's needs; using a first intelligent agent to understand the user's needs and generate content description text; invoking, by the first agent, a second agent to parse the content description text, wherein the second agent decomposes the content description text into multiple execution steps and generates, based on the content description text, prompt words corresponding to the multiple execution steps; and invoking, by the second agent, a third agent to execute the plurality of execution steps, wherein the third agent, based on the prompt words, invokes the corresponding execution tools to perform corresponding operations according to the plurality of execution steps in order to generate the target content.

2. The method of claim 1, wherein using the first intelligent agent to understand the user's needs and generate content description text comprises: using the first intelligent agent to obtain first knowledge information that matches the user's needs from the first knowledge base corresponding to the user's needs; and using the first intelligent agent to understand the first knowledge information and the user's needs so as to generate the content description text.

3. The method of claim 1, wherein generating, based on the content description text, prompt words corresponding to the multiple execution steps comprises: obtaining second knowledge information that matches the content description text from a second knowledge base used to generate prompt words; and, based on the second knowledge information and the content description text, determining the content application scenario and generating prompt words that correspond to the multiple execution steps and match the content application scenario.

4. The method of claim 1, wherein generating, based on the content description text, prompt words corresponding to the multiple execution steps comprises: obtaining, from the second knowledge base used to generate prompt words, multiple pieces of third knowledge information that respectively match the multiple execution steps; and, based on the multiple pieces of third knowledge information and the multiple execution steps, generating multiple prompt words corresponding to the multiple execution steps.

5. The method of claim 3 or 4, further comprising: determining the execution order of the plurality of execution steps and the plurality of execution tools respectively corresponding to the plurality of execution steps; wherein the third intelligent agent calling, based on the prompt words, the corresponding execution tools to perform corresponding operations according to the multiple execution steps comprises: the third intelligent agent calling the corresponding execution tool according to the execution order, and performing the execution steps based on the prompt words to generate the target content.

6. The method of claim 5, further comprising: obtaining an image provided by the user; and generating, based on the image and the content description text, reference materials corresponding to the multiple execution steps; wherein the third intelligent agent invoking the corresponding execution tools according to the execution order and performing the execution steps based on the prompt words to generate the target content comprises: invoking, by the third intelligent agent, the corresponding execution tools according to the execution order, and performing the execution steps based on the prompt words and the reference materials to generate the target content.

7. The method of claim 5, wherein determining the execution order of the multiple execution steps and the execution tool corresponding to each of the multiple execution steps comprises: identifying one or more candidate tools corresponding to an execution step; in response to detecting that there is only one candidate tool, determining that candidate tool to be the execution tool corresponding to the execution step; and in response to detecting that there are multiple candidate tools, selecting a target candidate tool as the execution tool corresponding to the execution step based on the matching degree between each of the multiple candidate tools and the execution step.

8. A content generation apparatus, comprising: an identification module configured to, in response to receiving a content generation request from a user, identify the content generation request to determine the user's needs; a conversion module configured to understand the user's needs using a first intelligent agent and generate content description text; a planning module configured such that the first intelligent agent invokes a second intelligent agent to parse the content description text, and the second intelligent agent decomposes the content description text into multiple execution steps and generates prompt words corresponding to the multiple execution steps based on the content description text; and an execution module configured such that the second intelligent agent invokes a third intelligent agent to execute the multiple execution steps, and the third intelligent agent, based on the prompt words, invokes the corresponding execution tools to perform corresponding operations according to the multiple execution steps to generate target content.

9. A computer device, comprising: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and executed by the one or more processors, the one or more programs comprising instructions for performing the method of any one of claims 1 to 7.

10. A non-volatile computer-readable storage medium comprising a computer program, which, when executed by one or more processors, causes the one or more processors to perform the method of any one of claims 1 to 7.
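
The tool selection recited in claim 7 and the ordered invocation recited in claim 5 can be illustrated with a minimal Python sketch. It is not part of the claims and does not represent the applicant's implementation. The names `Tool`, `Step`, `matching_degree`, `select_tool`, and `execute_in_order` are hypothetical, and the keyword-overlap score is an assumption made only for illustration, since the claims do not specify how the matching degree is computed.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class Tool:
    """Hypothetical execution tool with capability keywords."""
    name: str
    keywords: FrozenSet[str]
    run: Callable[[str], str]  # takes a prompt word string, returns content


@dataclass(frozen=True)
class Step:
    """Hypothetical execution step produced by the planning stage."""
    order: int
    description: str
    prompt: str


def matching_degree(tool: Tool, step: Step) -> float:
    """Assumed score: share of the tool's keywords found in the step text."""
    if not tool.keywords:
        return 0.0
    words = set(step.description.lower().split())
    return len(tool.keywords & words) / len(tool.keywords)


def select_tool(candidates: List[Tool], step: Step) -> Tool:
    """Pick the execution tool for one step, following the two branches of claim 7."""
    if not candidates:
        raise ValueError(f"no candidate tool for step {step.order}")
    if len(candidates) == 1:
        # Only one candidate: it is the execution tool.
        return candidates[0]
    # Several candidates: choose the highest matching degree
    # (ties resolve to the earliest candidate in the list).
    return max(candidates, key=lambda tool: matching_degree(tool, step))


def execute_in_order(
    steps: List[Step],
    candidates_for: Callable[[Step], List[Tool]],
) -> List[str]:
    """Run the steps by execution order, invoking the selected tool with each prompt."""
    outputs = []
    for step in sorted(steps, key=lambda s: s.order):
        tool = select_tool(candidates_for(step), step)
        outputs.append(tool.run(step.prompt))
    return outputs
```

In this sketch the candidate lookup is passed in as a function, so a keyword index, a registry, or a model-based ranker could supply the candidates without changing the selection logic.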