Method, apparatus, device, storage medium and program product for video authoring

By guiding users to generate derivative videos during video playback and using machine learning models to maintain stylistic consistency, the lack of derivative creation tools on traditional platforms has been addressed, thereby improving video quality and platform appeal.

CN122269104A · Pending · Publication Date: 2026-06-23 · DOUYIN VISION CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
DOUYIN VISION CO LTD
Filing Date
2026-04-10
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

Traditional content platforms lack tools for secondary creation, making it difficult for users to effectively meet their emotional needs. Secondary creation videos lack connection with the original videos, resulting in poor video quality and failing to improve platform stickiness and attractiveness.

Method used

Guidance information is presented during video playback, creative information is received, and a secondary creation video is generated. Machine learning models are used to keep the video style consistent, and creative entry points and tools are provided to help users generate high-quality secondary creation videos.

Benefits of technology

The approach satisfies users' emotional needs, improves the stickiness and interactive experience of the content platform, enriches video content, forms a content ecosystem, and increases the platform's attractiveness.

✦ Generated by Eureka AI based on patent content.

Abstract

A method, apparatus, device, storage medium and program product for video authoring are provided. The method includes presenting first guidance information, the first guidance information indicating content authoring related to a first video, the first video being for presenting at least one first event related to at least one object; receiving authoring information, the authoring information describing a second event related to a first object of the at least one object; and presenting a second video, the second video being for presenting the second event related to the first object. This helps improve the richness and diversity of video content.

Description

Technical Field

[0001] The examples in this document generally relate to the field of computer science, and in particular to methods, apparatus, devices, computer-readable storage media, and computer program products for video creation.

Background

[0002] With the development of computer technology, the internet has become an important platform for video creation and sharing. Creators can create video content and publish it on the platform. Viewers can watch video content on the platform, greatly enriching people's daily lives.

Summary of the Invention

[0003] In a first aspect, a method for video creation is provided. The method includes: presenting first guidance information indicating content creation related to a first video, the first video being used to display at least one first event related to at least one object; receiving creation information describing a second event related to a first object among the at least one object; and presenting a second video for displaying the second event related to the first object.

[0004] In a second aspect, an apparatus for video creation is provided. The apparatus includes: a first presentation module configured to present first guidance information instructing content creation related to a first video, the first video being used to display at least one first event related to at least one object; a receiving module configured to receive creation information describing a second event related to the first object among the at least one object; and a second presentation module configured to present a second video, the second video being used to display the second event related to the first object.

[0005] In a third aspect, an electronic device is provided. The device includes at least one processor; and at least one memory coupled to the at least one processor and storing instructions for execution by the at least one processor. When executed by the at least one processor, the instructions cause the device to perform the method of the first aspect.

[0006] In a fourth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores computer-executable instructions that can be executed by a processor to implement the method of the first aspect.

[0007] In a fifth aspect, a computer program product is provided, which is tangibly stored in a computer storage medium and includes computer-executable instructions that, when executed by a device, cause the device to perform the method of the first aspect.

[0008] This approach allows for the creation of derivative videos that satisfy users' emotional needs, thereby improving content platform stickiness and interactive experience. Furthermore, derivative videos enhance the richness and diversity of the content platform; the original videos and derivative videos can form a "content ecosystem," which in turn increases the platform's attractiveness.

[0009] It should be understood that the content described in this section is not intended to identify key or important features of the examples in this document, nor is it intended to restrict the scope of the solution. Other features will become readily apparent from the following description.

Brief Description of the Drawings

[0010] The above and other features, advantages, and aspects of the various examples herein will become more apparent when taken in conjunction with the accompanying drawings and the following detailed description. In the accompanying drawings, the same or similar reference numerals denote the same or similar elements, wherein: Figure 1 shows a schematic diagram of an example environment; Figure 2 shows a flowchart of a process for video creation according to some examples; Figures 3A to 3G show example interfaces according to some examples; Figure 4 shows a block diagram of an apparatus for video creation according to some examples; and Figure 5 shows a block diagram of an electronic device according to some examples.

Detailed Description

[0011] The examples in the text will now be described in more detail with reference to the accompanying drawings. While some examples are shown in the drawings, it should be understood that solutions can be implemented in various forms and should not be construed as limited to the examples presented herein. Rather, these examples are provided to provide a more thorough and complete understanding of the solutions. It should be understood that the drawings and examples in this document are for illustrative purposes only and are not intended to limit the scope of protection of the solutions.

[0012] It should be noted that the headings of any section / subsection provided herein are not restrictive. Various examples are described throughout this document, and examples of any type may be included under any section / subsection. Furthermore, examples described in any section / subsection may be combined in any way with any other examples described in the same section / subsection and / or different sections / subsections.

[0013] In the description of the examples in this document, the term "including" and similar terms should be understood as open inclusion, i.e., "including but not limited to". The term "based on" should be understood as "at least partially based on". The term "an example" or "the example" should be understood as "at least one example". The term "some examples" should be understood as "at least some examples". The terms "first", "second", etc., may refer to different or the same objects. Other explicit and implicit definitions may also be included below.

[0014] The examples in this document may involve user data, data acquisition, and / or use. All of these aspects comply with relevant laws, regulations, and provisions. In the examples presented herein, all data collection, acquisition, processing, manipulation, forwarding, and use are conducted with the user's knowledge and confirmation. Accordingly, when implementing each example, the type, scope of use, and usage scenarios of any data or information that may be involved should be communicated to the user and their authorization obtained through appropriate means, in accordance with relevant laws and regulations. The specific methods of notification and / or authorization can vary depending on the actual situation and application scenario; the scope of the solution is not limited in this regard.

[0015] In this specification and its examples, any processing of personal information is conducted only on a lawful basis (such as obtaining the consent of the data subject or being necessary for the performance of a contract) and only within the scope stipulated or agreed upon. A user's refusal to allow processing of personal information beyond what is necessary for basic functions will not affect the user's use of those basic functions.

[0016] In this article, the term "machine learning model" can refer to a computational model that performs tasks by learning patterns and rules from data. Machine learning models can include, but are not limited to, neural network models, deep learning models, and large language models. In some cases, large language models are an example of machine learning models that can understand and generate natural language text and can be used to perform tasks such as task decomposition, tool invocation, and content generation.

[0017] As mentioned above, with the development of computer technology, the internet has become an important platform for video creation and sharing. Creators can create video content and publish it on the platform. Viewers can watch video content on the platform, which greatly enriches people's daily lives.

[0018] However, some traditional content platforms still employ a professional content creation model for video content (such as movies, TV series, short videos, or short dramas). That is, video content is typically created and published by creators with professional skills or resources. Users, as the audience of this video content, lack an effective way to act on the emotional connection they form with it after watching. In some cases, users can express their thoughts on the video content by posting comments (e.g., text or voice) in comment sections or forums. However, comments of this kind do not produce shareable video content.

[0019] In other scenarios, users can use general-purpose tools to create derivative works (also known as "adaptations") of original videos. Users need to write detailed descriptions of video content such as characters, scenes, and actions (e.g., prompts). However, ordinary users typically lack the ability to provide such detailed descriptions. Furthermore, general-purpose tools often lack character information (e.g., 3D views of characters) and scene information (e.g., complete scene images) from the original video. Based on at least these factors, the derivative videos generated by general-purpose tools often differ from the original videos, resulting in relatively poor video quality and failing to meet the user's emotional needs.

[0020] Furthermore, since traditional content platforms typically do not provide tools for secondary creation of video content, user-generated content usually occurs outside of the platform. Secondary creation videos lack connection to the original video (e.g., lack of redirection paths), failing to form a content ecosystem with the original video, and thus failing to increase traffic or user retention rates, which is detrimental to improving the stickiness and attractiveness of the content platform.

[0021] A scheme for video creation is proposed here. In this scheme, a first video displays at least one first event related to at least one object. For the first video, first guidance information can be presented, indicating the creation of content related to the first video. Creation information is received, describing a second event related to a first object among the at least one object. A second video is then presented, displaying the second event related to the first object.
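The three steps of the scheme can be sketched as a small orchestration function. This is only an illustrative sketch under stated assumptions: the names `CreationInfo`, `run_creation_flow`, and the four callables are hypothetical, and the `generate_video` step is an assumption added here to connect receiving the creation information to presenting the second video; the paragraph above does not specify how the second video is produced.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CreationInfo:
    """Hypothetical container for what the user supplies."""
    object_ids: List[str]    # first object(s) chosen from the first video
    event_description: str   # the second event to be displayed


def run_creation_flow(
    first_video_id: str,
    present_guidance: Callable[[str], None],
    receive_creation_info: Callable[[], CreationInfo],
    generate_video: Callable[[str, CreationInfo], str],
    present_video: Callable[[str], None],
) -> str:
    """Run the three steps in order and return the second video's identifier."""
    # Step 1: present first guidance information for the first video.
    present_guidance(first_video_id)
    # Step 2: receive creation information describing the second event.
    info = receive_creation_info()
    # Assumed intermediate step: produce the second video from that information.
    second_video_id = generate_video(first_video_id, info)
    # Step 3: present the second video.
    present_video(second_video_id)
    return second_video_id
```

Passing the steps in as callables keeps the sketch independent of any particular user interface or generation backend.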

[0022] This approach provides an entry point for users' secondary creative activities (i.e., content creation). While users are watching the original video (the first video), the first guidance information can prompt them to create secondary content. This can result in secondary videos (the second video) that satisfy users' emotional needs, which helps improve the stickiness and interactive experience of the content platform. Furthermore, secondary videos enhance the richness and diversity of the content platform; the original video and secondary videos can form a "content ecosystem," which helps increase the platform's attractiveness.

[0023] The following will describe in detail various examples of this scheme with reference to the accompanying drawings.

[0024] Figure 1 shows a schematic diagram of an example environment 100. As shown in Figure 1, the example environment 100 may include an electronic device 110.

[0025] In this example environment 100, electronic device 110 can run application 120 for providing video. Application 120 can be any suitable type of application for providing video. User 140 can interact with application 120 via electronic device 110 and / or its attached devices.

[0026] In environment 100 of Figure 1, if application 120 is active, electronic device 110 can present interface 150 for providing video through application 120.

[0027] In some cases, electronic device 110 communicates with server 130 to provide services to application 120. Electronic device 110 can be any type of mobile terminal, fixed terminal, or portable terminal, including mobile phones, desktop computers, laptop computers, notebook computers, netbook computers, tablet computers, media computers, multimedia tablets, handheld computers, portable gaming terminals, virtual reality (VR) / augmented reality (AR) devices, personal communication system (PCS) devices, personal navigation devices, personal digital assistants (PDAs), audio / video players, digital cameras / camcorders, positioning devices, television receivers, radio receivers, e-book devices, gaming devices, or any combination of the foregoing, including accessories and peripherals of these devices or any combination thereof. In some cases, electronic device 110 can also support any type of user-facing interface (such as "wearable" circuitry).

[0028] Server 130 can be a standalone physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks, and big data and artificial intelligence platforms. Server 130 may include, for example, computing systems / servers such as mainframes, edge computing nodes, computing devices in a cloud environment, etc. Server 130 can provide background services for the video application 120 in electronic device 110, and alternatively and / or additionally, server 130 can provide other services.

[0029] A communication connection can be established between server 130 and electronic device 110. This communication connection can be established via wired or wireless means. The communication connection can include, but is not limited to, Bluetooth, mobile network, Universal Serial Bus (USB), and Wireless Fidelity (Wi-Fi) connections. In some cases, server 130 and electronic device 110 can exchange signaling information through their communication connection.

[0030] It should be understood that the structure and function of the various elements in environment 100 are described for illustrative purposes only and do not imply any limitation on the scope of the scheme.

[0031] The following description of the example will continue with reference to the accompanying drawings.

[0032] Figure 2 shows a flowchart of a video creation process 200 according to some examples. For ease of discussion, process 200 is described below from the perspective of electronic device 110. Of course, some or all of the operations in process 200 can also be completed by electronic device 110 in coordination with other devices.

[0033] To aid understanding, process 200 is described with reference to the examples shown in Figures 3A to 3G. Figures 3A to 3G show example interfaces 300A to 300G (or simply examples 300A to 300G) according to some examples. It should be understood that the interfaces shown in the figures are merely examples, and various interface designs may actually exist. The individual interface elements may have different arrangements and different visual representations, one or more elements may be omitted or replaced, and one or more other elements may also be present. This document is not limited in this respect.

[0034] At block 210, electronic device 110 may present first guidance information. The first guidance information indicates the creation of content related to a first video. The first video is used to display at least one first event related to at least one object. In this document, the first video can be an original video created and released by a creator (e.g., an individual or an organization), such as a movie, TV series, short video, short drama, documentary, or variety show. The at least one object can be any object appearing in the first video, including but not limited to people, animals, plants, and virtual objects (e.g., cartoon characters or cartoon animals). The at least one first event can be any suitable event displayed in the first video, such as a comedic event, a family event, or a historical event.

[0035] In some cases, the first video may include a narrative video related to, for example, a person, animal, or virtual object, where at least one object includes objects appearing in the narrative video, such as a person, animal, or virtual object, and the first event may include a storyline related to that person, animal, or virtual object. In other cases, the first video may include a documentary about an animal or plant, and the first event may describe the life or growth process of that animal or plant, etc. Of course, the above-described first video, object, and first event are merely examples. The object and first event may differ depending on the type or subject matter of the first video. This document does not impose any limitations on this.

[0036] In this document, the first guidance information is used to guide users in creating content (e.g., secondary creation) based on the first video. The first guidance information may include content in one or more modalities, such as text, images, audio, etc. In some examples, the electronic device 110 may display the first guidance information on the playback interface of the first video. For example, as shown in Figure 3A, the electronic device 110 can present video 304 (i.e., the first video) on interface 302. Interface 302 can display first guidance information 306, such as "This drama has a parallel world with XX new plot points." In this way, while user 140 is watching video 304, the first guidance information 306 can guide user 140 to create secondary content based on video 304.

[0037] In some examples, electronic device 110 can play a first video. If the first video reaches a predetermined point in time, electronic device 110 can display first guidance information. In some cases, the first video may include a single video. In this case, the predetermined point in time may indicate a predetermined duration or proportion of the first video. For example, the predetermined point in time may indicate a time point "X duration" away from the end time of the first video. If the first video reaches this time point, electronic device 110 can display the first guidance information on the playback interface of the first video.

[0038] In other cases, the first video may include multiple videos. In this case, the predetermined point may indicate a predetermined position in the order of the multiple videos. If the electronic device 110 plays a video at or after this position, the electronic device 110 can display the first guidance information on the video playback interface. As an example, as shown in Figure 3A, a short drama may include N episodes of video, where N is a positive integer. The electronic device 110 can play episode XX (304) or a subsequent episode on the playback interface 302, and can also display first guidance information 306 on the playback interface 302. In this way, the first guidance information can be presented when the user 140 has finished or is about to finish watching the video, guiding the user 140 toward secondary creation that fits the video and thereby improving the quality of video creation. Of course, the above presentation method of the first guidance information is merely exemplary. The electronic device 110 may present the first guidance information throughout or during part of the playback of the first video. This document does not impose any limitations on this.
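The two trigger conditions described above (a time point a fixed duration before the end of a single video, and an episode position within a multi-episode series) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names and parameters are hypothetical, times are assumed to be in seconds, and episodes are assumed to be numbered from 1.

```python
def reached_in_single_video(position_s: float, duration_s: float, lead_time_s: float) -> bool:
    """True once the remaining playback time is at most lead_time_s seconds.

    lead_time_s plays the role of the "X duration" before the end time.
    """
    if duration_s <= 0:
        # Unknown or invalid duration: never trigger the guidance.
        return False
    return duration_s - position_s <= lead_time_s


def reached_in_series(episode_number: int, threshold_episode: int) -> bool:
    """True when the episode being played is at or after the threshold episode."""
    return episode_number >= threshold_episode
```

For instance, `reached_in_single_video(95.0, 100.0, 10.0)` is true because only 5 seconds remain, while `reached_in_series(3, 8)` is false because episode 3 comes before the assumed threshold episode 8.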

[0039] In some examples, electronic device 110 may present a first interface in response to a triggering of first guidance information. At least one third video is presented in the first interface. This at least one third video may be created based on the first video, and the at least one third video is used to display at least one third event related to a second object, where the at least one object includes the second object. In this document, a third video refers to a video created based on the first video; that is, the third video may be a derivative video of the original video. In some cases, the third video may include a derivative video created by user 140 themselves, or the third video may include videos created by other users. For example, the operator of a content platform may create a derivative video of the original video and use the resulting video as a reference video (i.e., a third video) to provide users with creative direction guidance. The second object may include one or more of the at least one object, such as one or more characters, animals, plants, virtual objects, etc., from the first video. The third event may be the same as or different from the first event; for example, the plot of the derivative video may differ from the plot of the original video.

[0040] In some examples, electronic device 110 may receive a third trigger operation, which indicates the selection of one of the at least one third video. Electronic device 110 may then play the selected third video. As an example, as shown in Figures 3A and 3B, electronic device 110 can respond to the triggering of first guidance information 306 (e.g., single click, double click, long press, etc.) by presenting interface 308. Interface 308 can display an identifier for video 310, which may be a secondary creation video based on video 304. Electronic device 110 can respond to the triggering of the identifier for video 310 by presenting interface 322, on which video 310 is played. In some cases, interface 308 can be referred to as a "parallel world" of the original video. Thus, users can enter the "parallel world" of the original video via the first guidance information to browse secondary creation videos related to the original video, and can also watch secondary creation videos through the "parallel world" to understand the creative direction. To a certain extent, this can solve the problem of the disconnect between secondary creation videos and the original video. After watching the original video, users can also watch secondary creation videos through the "parallel world," which provides different audiovisual experiences and helps improve the stickiness of the content platform.

[0041] In some examples, the electronic device 110 can display the first guiding information on the playback interface of the third video. Therefore, the first guiding information is not limited to being displayed on the playback interface of the original video; the electronic device 110 can also display the first guiding information on the playback interface of the secondary creation video. In this way, after a user watches the secondary creation video, they can trigger secondary creation of the original video through the playback interface of the secondary creation video.

[0042] As an example, as shown in Figure 3C, electronic device 110 can display video 310 on interface 322, and video 310 can be a secondary creation video of video 304. Electronic device 110 can display first guidance information 324 on interface 322, such as "Create the same style, generate a new storyline with one click". User 140 can trigger content creation for video 304 through the first guidance information 324. It should be understood that the above-mentioned first guidance information is only exemplary, and electronic device 110 can also display the first guidance information in other locations, such as the recommendation interface or homepage of a content platform. This document does not limit this.

[0043] Returning to Figure 2, at block 220, electronic device 110 can receive creation information. The creation information describes a second event related to a first object among the at least one object. The first object may include one or more objects in the first video, and the second event may be the same as or different from the first event. For example, the first object may include some objects in the original video, and the plot indicated by the second event may differ from the plot indicated by the first event.

[0044] In some examples, electronic device 110 can receive a first trigger operation directed at the first guidance information. Electronic device 110 can then present a creation interface. This creation interface is used to receive creation information; for example, it can be referred to as a "creation workbench." Electronic device 110 can receive creation information via the creation interface. For example, as shown in Figures 3A and 3D, the electronic device 110 can display first guidance information 306 on interface 302. If the user 140 wishes to perform secondary creation on video 304, they can trigger (e.g., click, touch, or long press) the first guidance information 306. The electronic device 110 can respond to the triggering of the first guidance information 306 by displaying interface 328. The electronic device 110 can receive creation information through interface 328. In this way, the user can switch from the original video playback interface to the "creation workbench" for secondary creation of the video.

[0045] As another example, as shown in Figures 3C and 3D, the electronic device 110 can display first guidance information 324 on interface 322. If user 140 watches video 310 (i.e., a secondary creation video), understands the creative direction, and also wishes to perform secondary creation on video 304, user 140 can trigger the first guidance information 324. The electronic device 110 can respond to the triggering of the first guidance information 324 by displaying interface 328. In this way, the user can switch from the playback interface of the secondary creation video to the "creation workbench" for secondary creation of the video.

[0046] In some examples, electronic device 110 can receive a creation instruction through the first interface, the creation instruction indicating content creation based on the first video. Electronic device 110 can then present the creation interface. In this way, it is possible to switch to the creation interface via the "parallel world." In some cases, electronic device 110 may display a preset control on the first interface. If electronic device 110 receives a second trigger operation directed at the preset control, it can determine that it has received a creation instruction. Electronic device 110 can respond to the triggering of the preset control by presenting the creation interface. The preset control may be any appropriate control, such as an icon, button, or option. As an example, as shown in Figures 3B and 3D, electronic device 110 can display control 320 on interface 308 (e.g., the "parallel world"). If user 140 wishes to perform secondary creation on video 304, user 140 can trigger control 320. Electronic device 110 can respond to the triggering of control 320 by displaying interface 328 (e.g., the "creation workbench"). Subsequently, electronic device 110 can receive creation information via interface 328.

[0047] In some examples, the electronic device 110 may present the at least one object. If the electronic device 110 receives a third selection operation indicating the selection of a first object from the at least one object, it can determine that it has received a creation instruction. The electronic device 110 may then present the creation interface. The first object may include at least one of the one or more candidate objects. In this way, the electronic device 110 can switch to the "creation workbench" in response to the selection of an object.

[0048] As an example, as shown in Figures 3B and 3D, electronic device 110 can display objects 314, 316, and 318 on interface 308. Object 314 can be "Character A" in the original video (i.e., video 304), object 316 can be "Character B" in the original video, and object 318 can be "Character C" in the original video. If user 140 wants to include object 314 in a secondary creation video, they can select object 314. Electronic device 110 can respond to the selection of object 314 by displaying interface 328 (e.g., the "creation workbench") and displaying object 314 as selected in interface 328. In this way, the user can select objects in the "parallel world" to switch to the "creation workbench".

[0049] As another example, electronic device 110 can display a details interface for video 304, which may include a synopsis of the original video, a cast and crew list, comments, etc. The cast and crew list may include at least one object from the original video. If user 140 views, for example, the comments and comes up with a creative idea for the original video, user 140 can select an object in the cast and crew list. Electronic device 110 can then switch from the details interface to the "creation workbench" shown in Figure 3D.
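The entry points described in the preceding paragraphs (triggering the guidance information, triggering a preset control, and selecting an object) all lead to the same creation interface. The sketch below shows one way this routing might look; `EntryPoint`, `WorkbenchState`, and `open_workbench` are hypothetical names, and carrying a selected object over as a preselection is based on the description of object 314 being shown as selected in interface 328.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional


class EntryPoint(Enum):
    GUIDANCE_TRIGGER = auto()   # first trigger operation on the guidance information
    PRESET_CONTROL = auto()     # second trigger operation on a control such as control 320
    OBJECT_SELECTED = auto()    # an object chosen in the "parallel world" or a cast list


@dataclass
class WorkbenchState:
    """Hypothetical initial state of the creation interface."""
    source_video_id: str
    preselected_object_ids: List[str] = field(default_factory=list)


def open_workbench(
    entry: EntryPoint, source_video_id: str, object_id: Optional[str] = None
) -> WorkbenchState:
    """Build the initial workbench state for a given entry point."""
    state = WorkbenchState(source_video_id)
    if entry is EntryPoint.OBJECT_SELECTED:
        if object_id is None:
            raise ValueError("OBJECT_SELECTED requires an object_id")
        # The object that triggered the switch is shown as already selected.
        state.preselected_object_ids.append(object_id)
    return state
```

Funnelling every entry point through one function keeps the creation interface identical regardless of where the user came from; only the preselection differs.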

[0050] In some examples, the electronic device 110 can present object information indicating one or more candidate objects, where the at least one object includes the one or more candidate objects. That is, the candidate objects can be objects appearing in the original video. From the candidate objects, the user can select an object, such as the first object, to appear in the secondary creation video. The electronic device 110 can receive object selection information, which can indicate the selection of the first object from the one or more candidate objects. The electronic device 110 can include the object selection information as part of the creation information. In this case, the creation information can indicate the object (e.g., the first object) that will appear in the secondary creation video.

[0051] In some examples, the electronic device 110 can present one or more candidate objects on the creation interface. The electronic device 110 can receive object selection information in response to the selection of a first object from the one or more candidate objects. The electronic device 110 can then use the object selection information as at least a part of the creation information. As an example, as shown in Figure 3D, the electronic device 110 can display a character list 330 on the interface 328. The character list 330 can display character cards for one or more candidate characters (i.e., candidate objects), such as character cards for objects 314, 316, and 318. A character card can include an image (e.g., an avatar) and the name of the candidate character. The user 140 can select one or more characters from the candidate characters based on creative ideas, such as "Character A" (i.e., object 314). In response to the user 140 selecting object 314, the electronic device 110 can obtain object selection information, which can indicate object 314, for example by its name or identifier.

[0052] In some cases, electronic device 110 may display predetermined characters from character list 330 as selected, such as the male or female protagonist. In some cases, electronic device 110 may determine, based on object selection information, whether the number of objects selected by the user exceeds a predetermined number. If it is determined that the number of selected objects exceeds the predetermined number (e.g., 3 or other numbers), electronic device 110 may display a prompt message to inform the user that the maximum number of characters that can be selected is the predetermined number.
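The selection limit described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the names `toggle_character`, `SelectionResult`, and `MAX_SELECTED`, and the prompt wording, are hypothetical, and the limit of 3 is only the example value given in the text.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_SELECTED = 3  # assumed value; the text gives 3 only as an example


@dataclass
class SelectionResult:
    accepted: bool                # False when the limit would be exceeded
    selected: List[str]           # selection after the operation
    prompt: Optional[str] = None  # message to display, if any


def toggle_character(selected: List[str], character_id: str,
                     max_selected: int = MAX_SELECTED) -> SelectionResult:
    """Toggle one character card and enforce the maximum selection count."""
    if character_id in selected:
        # Tapping an already selected card deselects it.
        return SelectionResult(True, [c for c in selected if c != character_id])
    if len(selected) + 1 > max_selected:
        # The selection would exceed the predetermined number: keep it unchanged.
        return SelectionResult(
            False, list(selected),
            f"You can select at most {max_selected} characters.")
    return SelectionResult(True, selected + [character_id])
```

Returning the prompt together with the unchanged selection lets the interface layer decide how to display the message.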

[0053] Alternatively and / or additionally, the electronic device 110 may receive event description information about a second event, such as the plot to be shown in a derivative video, and the event description information may be used to describe that plot. In some cases, the event description information may include information in one or more modalities, such as text, voice, or images. In other words, the user may describe the plot of the derivative video using at least one of text, voice, or images. The electronic device 110 may use the event description information as at least part of the creation information.

[0054] In some cases, electronic device 110 can receive event description information via a creation interface. For example, as shown in Figures 3D and 3E, electronic device 110 can display input box 332 on interface 328. User 140 can input text content 342 (i.e., event description information) into input box 332 via electronic device 110 or an accessory device of electronic device 110. Electronic device 110 can receive text content 342 via input box 332 as at least part of the creation information. In this way, the user can input event description information for the plot in the "creation workbench".

[0055] In some examples, the electronic device 110 may present at least one tag (also referred to as an "inspiration tag"), each tag indicating at least one event summary. An event summary is a concise description of the content to be created; in some cases, it may indicate a creative idea, inspiration, or intention. A user may select one or more of the tags, for example a first tag. The electronic device 110 may receive a first selection operation indicating the selection of the first tag from the at least one tag, the first tag indicating a first event summary for the second event. Subsequently, the electronic device 110 may present first descriptive information that matches the first event summary.

[0056] In some examples, the electronic device 110 may display the at least one tag on the creation interface. For example, as shown in Figures 3D and 3E, input boxes 332-1, 332-2, and 332-3 in Figure 3E are three different display states of input box 332 in Figure 3D. The electronic device 110 can display tags 334, 336, 338, 340, etc., in the interface 328 (e.g., in input box 332). For example, tag 334 could include "Sweeter," indicating an event summary of "generating a romantic and heartwarming video." As another example, tag 336 could include "Embracing and Watching the Sea," indicating an event summary of "the male and female protagonists embracing and watching the sea." As yet another example, tag 338 could include "Live Wedding," indicating an event summary of "the bride and groom holding a live wedding."

[0057] In some cases, electronic device 110 can acquire event information from the original video (i.e., the first video), which describes at least one first event. In some cases, the event information may also be referred to as plot information, which describes the plot of the original video. Electronic device 110 can use a machine learning model to generate the at least one tag based on the plot information. In some cases, the event summary indicated by the at least one tag may differ from the plot of the original video.

[0058] In some examples, user 140 can select one or more tags from a plurality of tags, such as tag 338. Electronic device 110 can obtain first descriptive information based on tag 338. For example, as shown in Figure 3E, electronic device 110 can display text content 344 in input box 332-2. In some cases, electronic device 110 can use a machine learning model to obtain the first descriptive information based on the character information of the first object and the first tag. As an example, as shown in Figures 3D and 3E, if user 140 selects object 314 and tag 338, electronic device 110 can use a machine learning model to generate text content 344 based on the plot information of video 304, the character name of object 314, and tag 338. Then, electronic device 110 can add text content 344 to input box 332-2.

[0059] In some examples, electronic device 110 can receive a second selection operation, which instructs the user to select the first tag again. Afterward, electronic device 110 can present second description information that matches the first event summary and differs from the first description information. In this way, if the user is not satisfied with the generated description information, they can trigger the tag again. Electronic device 110 can then regenerate the description information.

[0060] As an example, as shown in Figure 3E, in response to a reselection of tag 338, electronic device 110 can use a machine learning model to generate text content 346 based on the plot information of video 304, the character name of object 314, and tag 338. Electronic device 110 can then display text content 346 in input box 332-3, replacing text content 344.
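The text requires only that the second description information differ from the first; it does not say how this is achieved. One possible approach, given here purely as an assumption, is to call the generator again with a different seed until the result differs. The names `regenerate_description`, `generate`, and `max_attempts` are hypothetical, and `generate` stands in for whatever machine learning model is used.

```python
from typing import Callable, Optional


def regenerate_description(generate: Callable[[int], str],
                           previous: str,
                           max_attempts: int = 5) -> Optional[str]:
    """Return a description that differs from `previous`, or None on failure.

    `generate(seed)` is a hypothetical wrapper around the model call; varying
    the seed is one assumed way to obtain a different output.
    """
    for seed in range(1, max_attempts + 1):
        candidate = generate(seed)
        # Compare ignoring surrounding whitespace so trivial variants are rejected.
        if candidate.strip() != previous.strip():
            return candidate
    return None
```

If no differing description is produced within `max_attempts`, the caller can keep showing the first description.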

[0061] In some examples, electronic device 110 can receive user input (e.g., text or voice content), which can be used to describe a second event summary of a second event. This second event summary may be the same as or different from the first event summary. Electronic device 110 can present first descriptive information based on the user input and a first tag, the first descriptive information matching the first and second event summaries. In this way, electronic device 110 can provide descriptive information based on user input and selected tags.

[0062] As an example, as shown in Figures 3D and 3E, user 140 can select object 314 and tag 338, and can also input text content 342 (i.e., the user input) into input box 332-1. Electronic device 110 can use a machine learning model to generate text content 344 based on the plot information of video 304, the character name of object 314, tag 338, and text content 342. Then, electronic device 110 can display text content 344 in input box 332-2.
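The inputs listed above (plot information, character name, tag, and optional user input) can be combined into a single model input. The following is a minimal sketch, assuming a text-in, text-out model passed in as a callable. The prompt layout and the names `build_description_prompt` and `generate_description` are hypothetical; the text does not specify a prompt format.

```python
from typing import Callable, Sequence


def build_description_prompt(plot_info: str,
                             character_names: Sequence[str],
                             tag_summary: str,
                             user_text: str = "") -> str:
    """Assemble one prompt from plot, characters, tag summary, and user input."""
    parts = [
        f"Original plot: {plot_info}",
        f"Characters: {', '.join(character_names)}",
        f"Event summary (from tag): {tag_summary}",
    ]
    if user_text.strip():
        # User input is optional; include it only when something was entered.
        parts.append(f"User idea: {user_text.strip()}")
    parts.append("Write a short event description consistent with all of the above.")
    return "\n".join(parts)


def generate_description(model: Callable[[str], str], **inputs) -> str:
    """Call the (assumed) text model with the assembled prompt."""
    return model(build_description_prompt(**inputs))
```

Keeping prompt assembly separate from the model call allows the same function to serve both the tag-only case and the case with additional user input.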

[0063] In some scenarios, electronic device 110 may, in response to the selection of a first tag, send a request to server 130 based on the first tag, requesting server 130 to invoke a machine learning model to generate event description information. Before receiving the event description information from server 130, electronic device 110 may display a loading indicator (e.g., an image or animation) in input box 332 to indicate that event description information is being generated. After receiving the event description information from server 130, electronic device 110 may stop displaying the loading indicator and display the event description information, such as text content 344 or text content 346, in input box 332.
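The loading-indicator behavior can be sketched as a small asynchronous routine. This is an illustrative sketch only: `InputBox`, `request_description`, and `fetch` are hypothetical names, and `fetch` stands in for the request to server 130, whose actual interface is not described in the text.

```python
import asyncio
from typing import Awaitable, Callable


class InputBox:
    """Minimal stand-in for input box 332."""

    def __init__(self) -> None:
        self.loading = False  # whether the loading indicator is displayed
        self.text = ""        # event description currently displayed


async def request_description(box: InputBox,
                              fetch: Callable[[str], Awaitable[str]],
                              tag: str) -> None:
    box.loading = True           # show the loading indicator
    try:
        text = await fetch(tag)  # hypothetical request to server 130
    finally:
        box.loading = False      # hide the indicator on success or failure
    box.text = text              # then display the event description
```

Clearing the indicator in a `finally` clause is a design choice of this sketch so that a failed request does not leave the indicator on screen.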

[0064] In some cases, electronic device 110 may present event description information (e.g., text content 344) in input box 332. Electronic device 110 may receive user input and update the event description information based on the user input. The updated event description information (e.g., text content 346) is then presented in input box 332. This allows users to manually modify the event description information. It should be understood that the event description information is not limited to text content and may also include other modalities such as voice or images. This document does not impose any limitations on this.

[0065] By providing tags and character cards in the "creation workbench," the complex process of writing event descriptions can be reduced to selecting tags and character cards. Even users without experience in writing event descriptions can obtain high-quality descriptions by choosing tags and character cards, which supports the generation of high-quality derivative videos in subsequent steps. This approach improves user-friendliness, reduces the difficulty of derivative video creation, and ultimately enhances the overall quality of derivative videos.

[0066] As shown in Figure 2, at block 230, electronic device 110 can present a second video that shows the second event related to the first object. In some examples, electronic device 110 can receive confirmation of the creation information and present the second video based on that confirmation.

[0067] In some cases, as shown in Figures 3D to 3F, electronic device 110 presents control 348 on interface 328. User 140 can select object 314 and tag 338, and electronic device 110 can present text content 346 in input box 332. If user 140 confirms that text content 346 meets the requirements, user 140 can trigger control 348. In response to the triggering of control 348, electronic device 110 can present interface 350 (also referred to as a "preview interface"), on which the second video is presented.

[0068] In some cases, as shown in Figures 3B and 3F, electronic device 110 can display control 358 on interface 350 and play video 352 (i.e., a secondary creation video) on interface 350. If user 140 confirms that video 352 meets the requirements, user 140 can trigger control 358, and electronic device 110 can publish video 352. For example, electronic device 110 can send a publishing request for video 352 to server 130, and server 130 can publish video 352 so that other users can watch it. Electronic device 110 can display video 352 on interface 308 (e.g., the "parallel world"), where user 140 can view the video 352 that they created.

[0069] In some cases, as shown in Figure 3F, electronic device 110 can display control 354 on interface 350. If user 140 wants to save video 352 locally on electronic device 110, user 140 can trigger control 354. Electronic device 110 can then send a request to server 130, and server 130 can transmit video 352 to electronic device 110. In other cases, as shown in Figure 3F, electronic device 110 can display control 356 on interface 350. If user 140 believes that video 352 does not meet their needs, they can trigger control 356. In response to the triggering of control 356, electronic device 110 can display, for example, interface 328, that is, re-present the "creation workbench" so that user 140 can create the video again.

[0070] In some cases, as shown in Figure 3G, electronic device 110 can display video 352 (i.e., the second video) on interface 360. In some cases, interface 360 may also be referred to as the user interface of user 140. Electronic device 110 can display derivative videos on the user interface, for example, arranged in reverse chronological order. In some cases, electronic device 110 may also display other derivative videos on the user interface, such as derivative videos collected by user 140. It should be understood that the above-described presentation method of the second video is merely exemplary, and any other appropriate presentation method can be selected according to actual needs. This document does not impose any limitations on this.

[0071] In some examples, electronic device 110 can use a machine learning model to obtain a storyboard based on creation information and event information from a first video. The storyboard describes at least one video frame in the second video, and the event information describes at least one first event. Then, based on the storyboard and character and scene information from the first video, electronic device 110 can use a machine learning model to obtain the second video. The character information describes a first object, and the scene information indicates at least one scene from the first video. In this way, the style and character appearance of the re-created video can be kept consistent with the original video.

[0072] In some cases, a knowledge base can be pre-built, which can include material information related to the original video (e.g., a short video). This material information can include character information, such as the character's name, physical characteristics, vocal features, etc. As an example, character information can be implemented as a fine-tuning plugin for a machine learning model, such as a Low-Rank Adaptation (LoRA) plugin. This ensures that the character appearance in the secondary creation video generated using the machine learning model remains consistent with the original video.

[0073] Alternatively and / or additionally, the material information may include scene information from the original video. Scene information may include images of one or more scenes from the original video. Utilizing scene information can provide a scene style reference for machine learning models, ensuring that the scene style of the derivative videos generated by the machine learning model remains consistent with the original video.

[0074] Alternatively and / or additionally, the material information may include plot information from the original video, which may include, but is not limited to, plot descriptions, atmosphere tags, theme tags, and key lines. Plot descriptions describe at least one plot point in the original video; atmosphere tags indicate the atmosphere of at least one plot point in the original video; theme tags indicate the theme of the original video, such as its core content. Key lines may include dialogue from at least one plot point in the original video.
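The material information described in the preceding paragraphs (character information, scene information, and plot information) could be organized as follows. This is a sketch of one possible schema, not the data model of the disclosure: all class and field names are hypothetical, and `lora_adapter` is assumed to be a reference (e.g., a path or identifier) to a LoRA plugin.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CharacterInfo:
    name: str
    appearance: str                     # physical characteristics
    voice: str                          # vocal features
    lora_adapter: Optional[str] = None  # assumed reference to a LoRA plugin


@dataclass
class PlotInfo:
    descriptions: List[str] = field(default_factory=list)     # plot descriptions
    atmosphere_tags: List[str] = field(default_factory=list)  # mood of plot points
    theme_tags: List[str] = field(default_factory=list)       # core themes
    key_lines: List[str] = field(default_factory=list)        # key dialogue


@dataclass
class KnowledgeBase:
    video_id: str
    characters: Dict[str, CharacterInfo] = field(default_factory=dict)
    scene_images: List[str] = field(default_factory=list)  # scene style references
    plot: Optional[PlotInfo] = None

    def character(self, object_id: str) -> CharacterInfo:
        """Look up character information by object identifier."""
        return self.characters[object_id]
```

Keying characters by object identifier lets the object selection information from the creation interface be resolved directly to character information.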

[0075] As an example, electronic device 110 can obtain plot information from a knowledge base of a first video, and use a machine learning model to generate a storyboard based on the creation information and plot information. The storyboard may include text content. Electronic device 110 can also obtain character information (e.g., a LoRA plugin) and scene information of the original video from the knowledge base. Electronic device 110 can then use a machine learning model to generate a secondary creation video based on the character information, scene information, and storyboard. For example, electronic device 110 can send a request to server 130, which may specify the character information, scene information, and storyboard. Server 130 can then invoke a machine learning model to generate a secondary creation video based on the character information, scene information, and storyboard. Electronic device 110 can then receive the secondary creation video (i.e., the second video) from server 130. By constructing a knowledge base for the original video, the character information, scene information, and plot information of the original video can be saved. This helps the generated secondary creation video remain consistent with the original video in terms of character appearance, movements, style, and timbre, which is beneficial for improving the video quality of the secondary creation video. It should be noted that multiple steps described herein utilize machine learning models, and the machine learning models used in these steps may be the same or different. This document does not impose any limitations on this.
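The two-stage flow described above (storyboard first, then video) can be sketched as follows. The models are passed in as callables because the text does not identify specific models; `create_secondary_video`, the dictionary keys, and the callable signatures are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

# Assumed signatures for the two model calls.
StoryboardModel = Callable[[Dict, str], List[str]]
VideoModel = Callable[[List[str], Dict, List[str]], bytes]


def create_secondary_video(creation_info: Dict,
                           knowledge_base: Dict,
                           storyboard_model: StoryboardModel,
                           video_model: VideoModel) -> bytes:
    """Generate a secondary creation video in two stages."""
    # Stage 1: storyboard from the creation information and plot information.
    storyboard = storyboard_model(creation_info, knowledge_base["plot"])
    # Stage 2: video from the storyboard, character information, and scene information.
    character = knowledge_base["characters"][creation_info["object_id"]]
    return video_model(storyboard, character, knowledge_base["scenes"])
```

Separating the stages matches the statement that the models used in different steps may be the same or different.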

[0076] In some cases, if the second video is successfully generated, the electronic device 110 may display a first notification indicating that the second video has been successfully generated. In response to the triggering of the first notification, the electronic device 110 may display interface 350 shown in Figure 3F, i.e., the preview interface, on which the second video (e.g., video 352) is played. If generation of the second video fails, the electronic device 110 can display a second notification to inform the user of the failure. In response to the triggering of the second notification, the electronic device 110 can display interface 328 shown in Figure 3D (i.e., the "creation workbench") so that the user can create the video again.
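The mapping from generation outcome to notification and target interface can be written as a small lookup. This is an illustrative sketch: the names `Interface`, `Notification`, and `notification_for`, as well as the message wording, are hypothetical; only the interface numbers 350 and 328 come from the text.

```python
from enum import Enum
from typing import NamedTuple


class Interface(Enum):
    PREVIEW = 350             # interface 350 in Figure 3F
    CREATION_WORKBENCH = 328  # interface 328 in Figure 3D


class Notification(NamedTuple):
    message: str       # text displayed to the user (assumed wording)
    target: Interface  # interface displayed when the notification is triggered


def notification_for(generation_succeeded: bool) -> Notification:
    """Choose the first or second notification based on the generation result."""
    if generation_succeeded:
        return Notification("Your video is ready. Tap to preview.",
                            Interface.PREVIEW)
    return Notification("Video generation failed. Tap to try again.",
                        Interface.CREATION_WORKBENCH)
```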

[0077] In some examples, electronic device 110 can play a second or third video. During the playback of the second or third video, second guidance information is presented. This second guidance information is used to guide the user to watch the original video and may include information in one or more modalities, such as text, images, etc. If electronic device 110 receives a fourth trigger operation on the second guidance information, electronic device 110 can present the first video. As an example, as shown in Figures 3A and 3C, electronic device 110 can play video 310 (i.e., the third video) on interface 322, and display second guidance information, such as "Watch the original video," on interface 322. If a trigger on the second guidance information is received, electronic device 110 can present interface 302, on which video 304 (i.e., the first video) is played. In this way, an entry point can be provided to switch back from the secondary creation video to the original video, enabling the original video and the secondary creation video to form a "content ecosystem." After watching the secondary creation video, users can jump to the playback interface of the original video through this entry point, which helps to improve the completion rate of the original video and the user retention rate of the content platform.

[0078] In summary, this solution provides an entry point for users' secondary creation activities (i.e., content creation). While users are watching the original video (i.e., the first video), the first guidance information can prompt them to create secondary content. This can result in secondary creation videos (i.e., the second video) that satisfy users' emotional needs, which helps improve the stickiness and interactive experience of the content platform. Furthermore, secondary creation videos can enhance the richness and diversity of the content platform; the original video and secondary creation videos can form a "content ecosystem," which helps increase the attractiveness of the content platform.

[0079] A corresponding apparatus for implementing the above methods or processes is also provided. Figure 4 shows a block diagram of a device 400 for video creation according to some examples. Device 400 can be implemented as or included in electronic device 110. The various modules / components in device 400 can be implemented by hardware, software, firmware, or any combination thereof.

[0080] As shown in Figure 4, the device 400 includes: a first presentation module 410 configured to present first guidance information, the first guidance information indicating content creation related to a first video, the first video being used to display at least one first event related to at least one object; a receiving module 420 configured to receive creation information, the creation information describing a second event related to a first object among the at least one object; and a second presentation module 430 configured to present a second video, the second video being used to display the second event related to the first object.

[0081] In some examples, the receiving module 420 is further configured to: present object information indicating one or more candidate objects, at least one object including one or more candidate objects; receive object selection information as part of creation information, the object selection information indicating the selection of a first object from one or more candidate objects; and receive event description information as part of creation information, the event description information describing a second event.

[0082] In some examples, the receiving module 420 is further configured to: present at least one tag, each tag indicating at least one event summary; receive a first selection operation, the first selection operation indicating the selection of a first tag from the at least one tag, the first tag indicating a first event summary of the second event; and present first descriptive information, the first descriptive information matching the first event summary.

[0083] In some examples, the receiving module 420 is also configured to: receive a second selection operation, the second selection operation indicating that the first tag is selected again; and present second description information, the second description information conforming to the first event summary and different from the first description information.

[0084] In some examples, device 400 further includes: a third presentation module configured to receive a first trigger operation, the first trigger operation being issued in response to the first guidance information; and present a creation interface for receiving the creation information.

[0085] In some examples, the third presentation module is further configured to: in response to a first triggering operation, present a first interface displaying information about at least one third video, the at least one third video being created based on the first video, and the at least one third video being used to demonstrate at least one third event related to a second object, the at least one object including the second object; receive a creation instruction through the first interface, the creation instruction indicating to create based on the first video; and present a creation interface.

[0086] In some examples, the third presentation module is further configured to perform at least one of the following: receive a third selection operation, which indicates that a first object is selected from at least one object, or receive a second trigger operation, which is issued to a preset control in the first interface.

[0087] In some examples, the device 400 further includes: a playback module configured to receive a third trigger operation, the third trigger operation indicating the selection of one of at least one third video; and to play the selected third video.

[0088] In some examples, the first presentation module 410 is further configured to: play a first video; and, in response to the first video playing to a predetermined progress, present first guidance information.
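The condition "in response to the first video playing to a predetermined progress" can be sketched as a simple check. The function name `should_present_guidance`, the default threshold of 0.8, and the `already_presented` flag are assumptions made for illustration; the text does not specify a value for the predetermined progress.

```python
def should_present_guidance(position_s: float,
                            duration_s: float,
                            threshold: float = 0.8,
                            already_presented: bool = False) -> bool:
    """Return True when playback has reached the predetermined progress.

    `threshold` is the assumed fraction of the video (0.0 to 1.0) at which
    the first guidance information is presented.
    """
    if already_presented or duration_s <= 0:
        # Present the guidance at most once and ignore invalid durations.
        return False
    return position_s / duration_s >= threshold
```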

[0089] In some examples, the first presentation module 410 is further configured to: present first guidance information on a playback interface for playing a third video, the third video being created based on the first video, and the third video being used to display at least one third event related to a second object, the at least one object including the second object.

[0090] In some examples, device 400 further includes: a fourth presentation module configured to present second guidance information during playback of a second video, the second guidance information instructing the viewing of a first video; receive a fourth trigger operation, the fourth trigger operation being issued in response to the second guidance information; and present the first video.

[0091] In some examples, the receiving module 420 is further configured to: use a machine learning model to obtain the first description information based on the character information of the first object and the first tag.

[0092] In some examples, the receiving module 420 is further configured to: use a machine learning model to obtain the second description information based on the character information of the first object and the first tag.

[0093] In some examples, the second presentation module 430 is further configured to: obtain a storyboard based on creation information and event information of the first video using a machine learning model, the storyboard describing at least one video frame in the second video, and the event information describing at least one first event; and obtain the second video based on the storyboard, character information, and scene information using a machine learning model, the character information describing a first object, and the scene information indicating at least one scene in the first video.

[0094] The modules included in device 400 can be implemented in various ways, including software, hardware, firmware, or any combination thereof. In some cases, one or more modules can be implemented using software and / or firmware, such as machine-executable instructions stored on a storage medium. In addition to or as an alternative to machine-executable instructions, some or all of the units in device 400 can be implemented at least partially by one or more hardware logic components. By way of example, and not limitation, exemplary types of hardware logic components that can be used include field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard parts (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), and so on.

[0095] Figure 5 shows a block diagram of an electronic device 500 according to some examples. It should be understood that the electronic device 500 shown in Figure 5 is merely exemplary and should not be construed as limiting the functionality and scope of the examples described herein. The electronic device 500 shown in Figure 5 can be implemented as the same electronic device as, or a different electronic device from, the electronic device 110 discussed above.

[0096] As shown in Figure 5, electronic device 500 is in the form of a general-purpose electronic device. Components of electronic device 500 may include, but are not limited to, one or more processing units or processors 510, memory 520, storage devices 530, one or more communication units 540, one or more input devices 550, and one or more output devices 560. Processor 510 may be a physical or virtual processor and is capable of performing various processes according to programs stored in memory 520. In a multiprocessor system, multiple processors execute computer-executable instructions in parallel to improve the parallel processing capability of electronic device 500.

[0097] Electronic device 500 typically includes multiple computer storage media. Such media can be any available media accessible to electronic device 500, including but not limited to volatile and non-volatile media, removable and non-removable media. Memory 520 can be volatile memory (e.g., registers, cache, random access memory (RAM)), non-volatile memory (e.g., read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory), or some combination thereof. Storage device 530 can be removable or non-removable media and can include machine-readable media, such as flash drives, disks, or any other media that can be used to store information and / or data and can be accessed within electronic device 500.

[0098] Electronic device 500 may further include additional removable / non-removable, volatile / non-volatile storage media. Although not shown in Figure 5, a disk drive for reading from or writing to a removable, non-volatile disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, non-volatile optical disk can be provided. In these cases, each drive can be connected to a bus (not shown) via one or more data media interfaces. Memory 520 may include computer program product 525 having one or more program modules configured to perform various methods or actions of various examples.

[0099] The communication unit 540 enables communication with other electronic devices via a communication medium. Additionally, the functionality of the components of the electronic device 500 can be implemented using a single computing cluster or multiple computing machines capable of communicating via communication connections. Therefore, the electronic device 500 can operate in a networked environment using logical connections to one or more other servers, networked personal computers, or another network node.

[0100] Input device 550 can be one or more input devices, such as a mouse, keyboard, trackball, etc. Output device 560 can be one or more output devices, such as a monitor, speaker, printer, etc. As needed, electronic device 500 can also communicate via communication unit 540 with one or more external devices (not shown), such as storage devices and display devices, with one or more devices that enable a user to interact with electronic device 500, or with any device (e.g., a network card, modem, etc.) that enables electronic device 500 to communicate with one or more other electronic devices. Such communication can be performed via an input / output (I / O) interface (not shown).

[0101] A computer-readable storage medium is provided that stores computer-executable instructions thereon, wherein the computer-executable instructions are executed by a processor to implement the methods described above. A computer program product is also provided, which is tangibly stored on a non-transitory computer-readable medium and includes computer-executable instructions, which are executed by a processor to implement the methods described above.

[0102] The flowcharts and / or block diagrams of the methods, apparatus, devices, and computer program products referred to herein describe various aspects. It should be understood that each block of the flowcharts and / or block diagrams, as well as combinations of blocks in the flowcharts and / or block diagrams, can be implemented by computer-readable program instructions.

[0103] These computer-readable program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine such that, when executed by the processor of the computer or other programmable data processing apparatus, they create means for implementing the functions / actions specified in one or more blocks of the flowchart and / or block diagram. These computer-readable program instructions can also be stored in a computer-readable storage medium that causes a computer, programmable data processing apparatus, and / or other device to operate in a particular manner; thus, the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions for implementing aspects of the functions / actions specified in one or more blocks of the flowchart and / or block diagram.

[0104] Computer-readable program instructions can be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, thereby causing the instructions that execute on the computer, other programmable data processing apparatus, or other device to perform the functions / actions specified in one or more blocks of a flowchart and / or block diagram.

[0105] The flowcharts and block diagrams in the accompanying figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products under various scenarios. In this respect, each block in a flowchart or block diagram may represent a module, segment, or portion of an instruction, which contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions marked in the blocks may occur in a different order than those shown in the figures. For example, two consecutive blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or action, or using a combination of dedicated hardware and computer instructions.

[0106] Various implementations have been described above. The foregoing descriptions are exemplary, not exhaustive, and are not limited to the disclosed implementations. Many modifications and variations will be apparent to those skilled in the art without departing from the scope and spirit of the described implementations. The terminology used herein is chosen to best explain the principles of the implementations, their practical applications, or their technical improvements over technologies in the market, or to enable others skilled in the art to understand the various implementations disclosed herein.

Claims

1. A method for video creation, comprising: presenting first guidance information, the first guidance information indicating creation of content related to a first video, the first video being used to present at least one first event related to at least one object; receiving creation information, the creation information describing a second event related to a first object among the at least one object; and presenting a second video, the second video being used to present the second event related to the first object.

2. The method according to claim 1, wherein receiving the creation information comprises: presenting object information, the object information indicating one or more candidate objects, the at least one object including the one or more candidate objects; receiving object selection information as part of the creation information, the object selection information indicating selection of the first object from the one or more candidate objects; and receiving event description information as part of the creation information, the event description information describing the second event.

3. The method according to claim 2, wherein receiving the event description information comprises: presenting at least one tag, each tag indicating at least one event summary; receiving a first selection operation, the first selection operation indicating selection of a first tag among the at least one tag, the first tag indicating a first event summary of the second event; and presenting first description information, the first description information matching the first event summary.

4. The method according to claim 3, wherein receiving the creation information further comprises: receiving a second selection operation, the second selection operation indicating that the first tag is selected again; and presenting second description information, the second description information matching the first event summary and being different from the first description information.

5. The method according to claim 1, further comprising: receiving a first trigger operation, the first trigger operation being issued in response to the first guidance information; and presenting a creation interface, the creation interface being used to receive the creation information.

6. The method according to claim 5, wherein presenting the creation interface comprises: in response to the first trigger operation, presenting a first interface, the first interface presenting information about at least one third video, the at least one third video being created based on the first video and being used to present at least one third event related to a second object, the at least one object including the second object; receiving a creation instruction through the first interface, the creation instruction indicating that creation is to be based on the first video; and presenting the creation interface.

7. The method according to claim 6, wherein receiving the creation instruction comprises at least one of the following: receiving a third selection operation, the third selection operation indicating selection of the first object from the at least one object; or receiving a second trigger operation, the second trigger operation being issued to a preset control in the first interface.

8. The method according to claim 6, further comprising: receiving a third trigger operation, the third trigger operation indicating selection of one of the at least one third video; and playing the selected third video.

9. The method according to claim 1, wherein presenting the first guidance information comprises: playing the first video; and in response to playback of the first video reaching a predetermined progress point, presenting the first guidance information.

10. The method according to claim 1, wherein presenting the first guidance information comprises: presenting the first guidance information in a playback interface, the playback interface being used to play a third video, the third video being created based on the first video and being used to present at least one third event related to a second object, the at least one object including the second object.

11. The method according to claim 1, further comprising: during playback of the second video, presenting second guidance information, the second guidance information indicating viewing of the first video; receiving a fourth trigger operation, the fourth trigger operation being issued in response to the second guidance information; and presenting the first video.

12. An apparatus for video creation, comprising: a first presentation module configured to present first guidance information, the first guidance information indicating creation of content related to a first video, the first video being used to present at least one first event related to at least one object; a receiving module configured to receive creation information, the creation information describing a second event related to a first object among the at least one object; and a second presentation module configured to present a second video, the second video being used to present the second event related to the first object.

13. An electronic device, comprising: at least one processor; and at least one memory coupled to the at least one processor and storing instructions for execution by the at least one processor, the instructions, when executed by the at least one processor, causing the electronic device to perform the method according to any one of claims 1 to 11.

14. A computer-readable storage medium having stored thereon computer-executable instructions that can be executed by a processor to implement the method according to any one of claims 1 to 11.

15. A computer program product tangibly stored in a computer storage medium and comprising computer-executable instructions that, when executed by a device, cause the device to perform the method according to any one of claims 1 to 11.
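
As a non-limiting illustration only, the flow recited in claims 1, 3, 4 and 9 might be sketched in Python as follows. Every name here (`Video`, `CreationInfo`, `TagDescriber`, `guidance_for`, `create_second_video`, and the `trigger_point` default of 0.8) is a hypothetical assumption introduced for this sketch and is not taken from the disclosure. Video generation is replaced by a placeholder that only records metadata; no machine learning model is invoked, and cycling through a fixed list of candidate descriptions stands in for whatever description generation an implementation actually uses.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Dict, Iterator, List, Optional


@dataclass
class Video:
    title: str
    objects: List[str]                 # objects (e.g. characters) in the video
    events: List[str]                  # events the video presents
    source: Optional["Video"] = None   # video this one was created from, if any


@dataclass
class CreationInfo:
    selected_object: str               # the "first object" chosen by the user
    event_description: str             # describes the "second event"


class TagDescriber:
    """Hypothetical helper: selecting the same tag again yields the next
    candidate description, which differs from the previous one as long as
    the tag has more than one candidate."""

    def __init__(self, candidates: Dict[str, List[str]]) -> None:
        self._iters: Dict[str, Iterator[str]] = {
            tag: cycle(descriptions) for tag, descriptions in candidates.items()
        }

    def select(self, tag: str) -> str:
        return next(self._iters[tag])


def guidance_for(video: Video, progress: float,
                 trigger_point: float = 0.8) -> Optional[str]:
    """Return guidance text once playback progress (0.0 to 1.0) reaches the
    assumed predetermined progress point; otherwise return None."""
    if progress >= trigger_point:
        return f"Create your own story based on '{video.title}'"
    return None


def create_second_video(first: Video, info: CreationInfo) -> Video:
    """Placeholder for generation: builds metadata for the second video."""
    if info.selected_object not in first.objects:
        raise ValueError("selected object does not appear in the first video")
    return Video(
        title=f"{first.title} (derivative)",
        objects=[info.selected_object],
        events=[info.event_description],
        source=first,
    )


if __name__ == "__main__":
    first = Video("Episode 1", ["Alice", "Bob"], ["Alice meets Bob"])
    print(guidance_for(first, 0.9))
    describer = TagDescriber(
        {"reunion": ["Alice returns home.", "Alice finds an old letter."]}
    )
    describer.select("reunion")            # first selection of the tag
    text = describer.select("reunion")     # selecting the same tag again
    print(create_second_video(first, CreationInfo("Alice", text)))
```

Keeping a `source` reference on the second video is one simple way, in this sketch, to support guiding a viewer from the second video back to the first video as in claim 11.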