Methods, apparatus, devices, and media for processing media items

JP2026521939A · Pending · Publication Date: 2026-07-02 · BEIJING ZITIAO NETWORK TECH CO LTD


Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Applications
Current Assignee / Owner: BEIJING ZITIAO NETWORK TECH CO LTD
Filing Date: 2024-06-05
Publication Date: 2026-07-02


Abstract

A method, apparatus, device, and medium for processing media items are provided. In the method, a first media item and a plurality of templates, respectively used to generate a plurality of second media items from the first media item, are obtained. Based on the plurality of templates and the first media item, a plurality of candidate settings for respectively generating the plurality of second media items are determined, where a candidate setting among the plurality of candidate settings indicates a template among the plurality of templates and the first media item. A plurality of effect evaluations respectively associated with the plurality of candidate settings are determined, where an effect evaluation among the plurality of effect evaluations represents the effect of the second media item generated from the first media item using the template indicated by the corresponding candidate setting. Based on the plurality of effect evaluations, a target template for generating a second media item from the first media item is selected from the plurality of templates. In this way, the plurality of effect evaluations can indicate whether the candidate second media items to be generated meet the user's needs, which improves the efficiency of media item processing.


Description

Technical Field

[0001] Embodiments of the present disclosure generally relate to the field of computers, and more particularly, to methods, apparatuses, devices, and computer-readable storage media for processing media items.

Background Art

[0002] Various technical solutions for generating media items have been proposed. For example, a user can manually create a media item, edit an existing media item, or invoke a machine learning model to generate a media item. However, processing media items still requires a large amount of manual work to obtain media items that meet the user's needs. It is therefore desirable to process media items in a simpler and more efficient way so as to obtain media items that meet expectations.

Summary of the Invention

[0003] In a first aspect of the present disclosure, a method for processing media items is provided. In the method, a first media item and a plurality of templates respectively used to generate a plurality of second media items from the first media item are obtained. Based on the plurality of templates and the first media item, a plurality of candidate settings for generating the plurality of second media items respectively are determined, where a candidate setting among the plurality of candidate settings indicates a template among the plurality of templates and the first media item. A plurality of effect evaluations respectively associated with the plurality of candidate settings are determined, where an effect evaluation among the plurality of effect evaluations represents an effect evaluation of a second media item generated from the first media item using the template indicated by the candidate setting. Based on the plurality of effect evaluations, a target template for generating a second media item from the first media item is selected from the plurality of templates.

[0004] A second aspect of this disclosure provides an apparatus for processing a media item. The apparatus includes: an acquisition module configured to acquire a first media item and a plurality of templates respectively used to generate a plurality of second media items from the first media item; a generation module configured to determine, based on the plurality of templates and the first media item, a plurality of candidate settings for respectively generating the plurality of second media items, wherein a candidate setting among the plurality of candidate settings indicates a template among the plurality of templates and the first media item; an evaluation module configured to determine a plurality of effect evaluations respectively associated with the plurality of candidate settings, wherein an effect evaluation among the plurality of effect evaluations represents the effect evaluation of a second media item generated from the first media item using the template indicated by the candidate setting; and a selection module configured to select, based on the plurality of effect evaluations, a target template from among the plurality of templates for generating a second media item from the first media item.

[0005] A third aspect of the present disclosure provides an electronic device comprising at least one processing unit and at least one memory coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit, wherein the instructions, when executed by the at least one processing unit, cause the electronic device to perform the method described in the first aspect.

[0006] A fourth aspect of this disclosure provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement the method described in the first aspect.

[0007] A fifth aspect of this disclosure provides a computer program product including a computer program which, when executed by a processor, implements the method described in the first aspect.

[0008] It should be understood that the content described in this summary is not intended to identify key or essential features of the embodiments of this disclosure, nor is it intended to limit the scope of this disclosure. Other features of this disclosure will be readily apparent from the following description.

Brief Description of the Drawings

[0009] The above and other features, advantages, and aspects of the embodiments of this disclosure will become more apparent with reference to the accompanying drawings and the following detailed description. In the accompanying drawings, identical or similar reference numerals denote identical or similar elements.

[0010]
[Figure 1] A block diagram of a media processing process according to one embodiment of the present disclosure.
[Figure 2] A block diagram for processing media items according to some embodiments of the present disclosure.
[Figure 3] A block diagram of a template for generating media items according to some embodiments of the present disclosure.
[Figure 4] A block diagram for determining the effect score for a candidate setting according to some embodiments of the present disclosure.
[Figure 5] A block diagram for generating media items according to some embodiments of the present disclosure.
[Figure 6] A block diagram for generating a second media item using a template, according to some embodiments of the present disclosure.
[Figure 7] A flowchart of a method for processing media items according to some embodiments of the present disclosure.
[Figure 8] A block diagram of an apparatus for processing media items according to some embodiments of the present disclosure.
[Figure 9] A block diagram of equipment capable of carrying out multiple embodiments of the present disclosure.

Modes for Carrying Out the Invention

[0011] Embodiments of this disclosure will be described in more detail below with reference to the accompanying drawings. While specific embodiments of this disclosure are illustrated in the accompanying drawings, this disclosure can be implemented in various forms and should not be construed as being limited to the embodiments described herein. Rather, these embodiments should be understood as being provided for the purpose of providing a more thorough and complete understanding of this disclosure. The accompanying drawings and embodiments of this disclosure are for illustrative purposes only and are not intended to limit the scope of protection of this disclosure.

[0012] In describing embodiments of this disclosure, the term “including” and similar terms are to be understood as open-ended inclusion, i.e., “including, but not limited to.” The term “based on” is to be understood as “based at least in part on.” The terms “one embodiment” or “the embodiment” are to be understood as “at least one embodiment.” The term “several embodiments” is to be understood as “at least several embodiments.” Other explicit and implicit definitions may be included below. In this specification, the term “model” may refer to an association relationship between pieces of data. For example, such an association relationship may be derived based on various technical solutions that are currently known and / or may be developed in the future.

[0013] All data related to this technical solution (including, but not limited to, the data itself, or the acquisition or use of the data) shall comply with the requirements of applicable laws and regulations.

[0014] Before using the technical solutions disclosed in each embodiment of this disclosure, please understand that, in accordance with applicable laws and regulations, it is necessary to notify users of the types, scope, and scenarios of use of personal information related to this disclosure and to obtain their consent in an appropriate manner.

[0015] For example, in response to receiving a voluntary request from a user, prompt information is sent to the user, explicitly informing the user that the requested operation requires the acquisition and use of the user's personal information. This allows the user to independently choose, based on the prompt information, whether or not to provide personal information to software or hardware such as electronic devices, applications, servers, or storage media that perform the operation of the technical solution of this disclosure.

[0016] In an optional but non-limiting embodiment, in response to receiving a voluntary request from the user, prompt information is sent to the user, for example, in the form of a pop-up window in which the prompt information is presented in text form. Furthermore, the pop-up window may include option controls for the user to select whether to "agree" or "disagree" to providing personal information to the electronic device.

[0017] The above notice and user authorization process are general in nature and do not limit the ways in which this disclosure may be implemented. Please understand that other methods that comply with applicable laws and regulations may be applied in how this disclosure may be implemented.

[0018] Here, the term "in response to" is used to describe the state in which the corresponding event has occurred or the condition has been met. It should be understood that there may not be a strong correlation between the timing of execution of a subsequent action performed in response to an event or condition and the time the event or condition occurred. For example, in some cases, a subsequent action may be executable immediately when the event occurs or the condition is met, or it may be executable some time after the event occurs or the condition is met.

Environment example

[0019] Although various technical solutions for processing media items have been proposed, a large amount of manual work is still required to obtain media items that meet expectations. An application environment according to some embodiments of the present disclosure is described with reference to FIG. 1, which is a block diagram of a media processing process according to an embodiment of the present disclosure. In the context of the present disclosure, a specific process for processing media items will be described using video as an example of a media item. Alternatively and / or additionally, the media item can take other formats, such as rich text data including images, documents including text and images, and / or other formats.

[0020] As shown in FIG. 1, to obtain a second media item 120 from a first media item 110, the first media item 110 can be processed using a template 140. For example, the template 140 can specify one or more media elements to be included in the second media item 120. At this time, the generated second media item 120 includes rich visual content such as visual contents 130, 132, 134, 136, and can carry more information thereby.

[0021] Multiple templates are provided and can each be used to generate a separate second media item. However, this generation process requires a large amount of computing resources, and the resulting second media items must be checked one by one to see whether they meet the user's needs. Furthermore, the generated second media item 120 may not meet the user's needs. In that case, the second media item 120 must be edited manually with a media editing tool, which makes it impractical to generate media items at scale. It is therefore desirable to process media items in a simpler and more efficient way so as to obtain media items that meet expectations.

Overview of Media Item Processing

[0022] To at least partially address the deficiencies of the prior art, according to one embodiment of the present disclosure, a method for processing media items is proposed. Briefly, a plurality of effectiveness evaluations for processing a first media item using a plurality of templates to generate a second media item can be individually determined. The plurality of effectiveness evaluations are compared, and an effectiveness evaluation that matches the user's needs is selected. As a result, the first media item can be processed using the template corresponding to the selected effectiveness evaluation.

[0023] An overview of an embodiment according to the present disclosure is described with reference to FIG. 2, which illustrates a block diagram 200 for processing media items according to some embodiments of the present disclosure. As shown in FIG. 2, a first media item may be obtained together with a plurality of templates 212, ..., 214. Here, the plurality of templates may come from a predetermined template library 210, and the plurality of templates 212, ..., 214 are respectively used to generate a plurality of second media items from the first media item.

[0024] A plurality of candidate settings 220, ..., 222 for respectively generating the plurality of second media items may be determined based on the plurality of templates 212, ..., 214 and the first media item. A candidate setting among the plurality of candidate settings may indicate the first media item and a template among the plurality of templates. For example, candidate setting 220 may indicate template 212 and first media item 110, ..., and candidate setting 222 may indicate template 214 and first media item 110. In this case, a candidate setting indicates the data necessary to generate a second media item. For example, template 212 may be used to process first media item 110 to generate one second media item, and template 214 may be used to process first media item 110 to generate another second media item.
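
The pairing described above can be sketched as a small data structure. This is only an illustration under stated assumptions: the class name `CandidateSetting`, the function `build_candidate_settings`, and the string identifiers are hypothetical and do not appear in the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateSetting:
    """Hypothetical record: one template paired with the first media item."""
    template_id: str
    media_item_id: str


def build_candidate_settings(template_ids, media_item_id):
    """Create one candidate setting per template for the same first media item."""
    return [CandidateSetting(t, media_item_id) for t in template_ids]


# Illustrative identifiers only, loosely mirroring reference numerals 212, 214, and 110.
settings = build_candidate_settings(["template_212", "template_214"], "media_110")
for setting in settings:
    print(setting)
```

Each candidate setting here carries only identifiers; nothing is rendered at this stage, which is consistent with the idea that a setting records what would be needed to generate a second media item.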

[0025] Furthermore, multiple effect evaluations 230, ..., 232 may be determined, respectively associated with the multiple candidate settings. An effect evaluation among the multiple effect evaluations represents the effect evaluation of a second media item generated from the first media item using the template indicated by the corresponding candidate setting. For example, effect evaluation 230 represents the effect evaluation of a second media item generated from first media item 110 using the template 212 indicated by candidate setting 220, and effect evaluation 232 represents the effect evaluation of a second media item generated from first media item 110 using the template 214 indicated by candidate setting 222.

[0026] Effectiveness evaluations can be expressed based on various methods. For example, an effectiveness evaluation can be expressed using a continuous value in the range of 0 to 1 (or other ranges), where a higher value indicates a better match with user needs, and a lower value indicates a worse match. Alternatively and / or additionally, an effectiveness evaluation can also be expressed using a discrete form (e.g., high, medium, low).

[0027] A target template for generating a second media item from a first media item can be selected from multiple templates 212, ..., 214 based on multiple effectiveness evaluations 230, ..., 232. By comparing the multiple effectiveness evaluations 230, ..., 232, the effectiveness evaluation 232 that best meets the user's needs (e.g., having the maximum value) can be determined. The corresponding target template (e.g., template 214) can then be selected, and the second media item 240 can be generated using the corresponding candidate setting 222.
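
A minimal sketch of this comparison step follows, assuming the evaluations are continuous values in the range 0 to 1 and that the maximum value is taken as the best match. The function names, the score values, and the thresholds used for the discrete form are illustrative assumptions, not values given in the disclosure.

```python
def select_target_template(evaluations):
    """Return the template id whose effectiveness evaluation is highest.

    `evaluations` maps a template id to a score in [0, 1].
    On ties, the template encountered first is returned.
    """
    if not evaluations:
        raise ValueError("at least one effectiveness evaluation is required")
    return max(evaluations, key=evaluations.get)


def to_level(score, low=1 / 3, high=2 / 3):
    """Map a continuous score to a discrete level (thresholds are assumptions)."""
    if score >= high:
        return "high"
    if score >= low:
        return "medium"
    return "low"


# Illustrative scores only.
evaluations = {"template_212": 0.41, "template_214": 0.78}
target = select_target_template(evaluations)
print(target, to_level(evaluations[target]))
```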

[0028] In embodiments of this disclosure, instead of actually generating multiple second media items using multiple templates, multiple effectiveness evaluations can be used to indicate whether multiple candidate second media items to be generated meet the user's needs, and then the template that best matches the user's needs can be selected to generate the corresponding second media item. In this way, the computational resource overhead when processing media items can be significantly reduced, and the processing efficiency of media items can be improved.

Detailed processing of media items

[0029] Having outlined the processing of media items, the following provides further details regarding media item processing with reference to the attached drawings. According to some embodiments of this disclosure, the first media item can be acquired in various ways. Specifically, the first media item can be acquired from a media sharing application (e.g., the first media sharing application). It should be understood that a media sharing application can provide a rich data source, containing numerous original media items posted by numerous users. The first media item can be acquired from multiple original media items posted by multiple users of the first media sharing application, thereby improving the efficiency of acquiring media items.

[0030] According to some embodiments of this disclosure, the first media item is a media segment extracted from multiple original media items. For example, suppose in a data promotion scenario, it is desired to promote a first media sharing application and / or media items within the application, then all or part of the original media items may be used as the first media item. Typically, a user-posted video may be long (e.g., 5 minutes), and a key portion of the video may be extracted (e.g., 10 seconds). For example, a machine learning model may be used to analyze the content of the original media items, perform segmentation and selection of the original media items based on user needs, and find media segments that better meet user needs (e.g., to appeal to a wider range of user interests).
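
As one way to picture the extraction of a key portion, the sketch below picks the contiguous window with the highest total relevance. It assumes that some upstream model has already produced a per-second relevance score for the original video; that assumption, the function name `best_window`, and the sample scores are all hypothetical, and the disclosure does not specify how segments are actually selected.

```python
def best_window(scores, window):
    """Return (start, end) of the contiguous window with the highest total score.

    `scores` holds one relevance value per second (assumed to come from a model);
    `window` is the desired segment length in seconds. `end` is exclusive.
    """
    if window <= 0 or window > len(scores):
        raise ValueError("window must be between 1 and len(scores)")
    current = sum(scores[:window])
    best_sum, best_start = current, 0
    for start in range(1, len(scores) - window + 1):
        # Slide the window by one second: add the new second, drop the old one.
        current += scores[start + window - 1] - scores[start - 1]
        if current > best_sum:
            best_sum, best_start = current, start
    return best_start, best_start + window


# Illustrative per-second scores for a 6-second video and a 3-second segment.
segment = best_window([0.1, 0.2, 0.9, 0.8, 0.7, 0.1], window=3)
print(segment)
```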

[0031] According to some embodiments of this disclosure, a selected template may be used to generate a second media item from a first media item, and the template represents a layout of multiple media elements to be added to the second media item. More information regarding templates will be described with reference to Figure 3, which shows a block diagram 300 of a template for generating a media item according to some embodiments of this disclosure. As shown in Figure 3, template 214 may, in particular, include multiple media elements 310, 320, 330, 340, and 350.

[0032] There are several ways to represent template 214. For example, an image can be used to represent the template, and multiple regions can be defined within the image to represent multiple media elements separately. Another example is the use of a custom format to represent the template, such as representing it as an array according to the pixel coordinates of the media elements' positions.
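
The array-based representation mentioned above might look like the following sketch. The field layout (left, top, width, height), the canvas size, and the coordinate values are assumptions made for illustration; the disclosure only states that a template can be represented as an array according to the pixel coordinates of the media elements' positions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementRegion:
    """Hypothetical rectangular region reserved for one media element."""
    element_id: int
    left: int
    top: int
    width: int
    height: int


def to_array(regions):
    """Flatten regions into rows of [left, top, right, bottom] pixel coordinates."""
    return [[r.left, r.top, r.left + r.width, r.top + r.height] for r in regions]


def fits_canvas(region, canvas_width, canvas_height):
    """Check that a region lies entirely inside the canvas."""
    return (
        region.left >= 0
        and region.top >= 0
        and region.left + region.width <= canvas_width
        and region.top + region.height <= canvas_height
    )


# Illustrative layout on an assumed 720x1280 canvas.
regions = [
    ElementRegion(310, 40, 40, 160, 160),
    ElementRegion(320, 40, 240, 640, 640),
    ElementRegion(330, 40, 900, 640, 120),
]
print(to_array(regions))
```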

[0033] According to some embodiments of this disclosure, in the process of determining the relevant effectiveness evaluation for a candidate setting, the features associated with the candidate setting may be determined based on the template and the first media item specified in the candidate setting. Specifically, the features associated with the template and the features associated with the first media item are determined, respectively, thereby determining the features of the candidate setting. Details regarding the determination of features and thus effectiveness scores are described with reference to Figure 4, which shows a block diagram 400 for determining the effectiveness score of a candidate setting according to some embodiments of this disclosure.

[0034] As shown in Figure 4, if the first media item is video 410, video features 414 can be extracted using encoder 412 for extracting video features. If an image is used to represent a template, template features 424 can be extracted using encoder 422 for extracting image features. As neural network technology matures, it can be used to perform computations on raw image and video data and extract features, which can then be used for analysis of strategic models, clustering, classification, and other scenarios. Since a video contains image frames, feature information for each image frame can be extracted using a pre-trained neural network. Specifically, a residual network and / or other networks can be used. This step allows the original video to be processed into an N*D feature matrix, where N represents the number of frames in the video and D represents the dimension of each frame's feature vector, with each image frame corresponding to one feature vector.
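
The shape of the per-frame output can be illustrated with a toy stand-in. In the sketch below, `encode_frame` is a placeholder that averages each colour channel (so D = 3); it is not a pre-trained residual network and is used only to show how N frames become N feature vectors of dimension D. The frame format (nested lists of RGB tuples) is likewise an assumption.

```python
def encode_frame(frame):
    """Placeholder encoder: mean of each colour channel, giving a 3-dim vector.

    A real system would use a pre-trained network here; this stand-in only
    demonstrates the one-vector-per-frame structure.
    """
    pixels = [pixel for row in frame for pixel in row]
    count = len(pixels)
    return [sum(pixel[c] for pixel in pixels) / count for c in range(3)]


def encode_video(frames):
    """Turn N frames into an N x D list of feature vectors (one per frame)."""
    return [encode_frame(frame) for frame in frames]


def make_frame(color, height=2, width=2):
    """Build a tiny uniform frame for demonstration purposes."""
    return [[color for _ in range(width)] for _ in range(height)]


video = [
    make_frame((10, 20, 30)),
    make_frame((0, 0, 0)),
    make_frame((255, 255, 255)),
    make_frame((40, 50, 60)),
]
features = encode_video(video)
print(len(features), len(features[0]))  # N and D
```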

[0035] Furthermore, the video features 414 and template features 424 are input to the neural network 450, which can then determine the corresponding effect score 452. For example, the video features 414 and template features 424 can be combined to obtain the features of the candidate setting, which are then input to the neural network 450. In this way, the processing capabilities of the machine learning model can be used to determine the effect score 452 more accurately.
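
The combine-then-score step can be sketched as a tiny feed-forward network. Everything specific here is an assumption for illustration: concatenation as the way of combining features, a single hidden layer, a sigmoid output to keep the score between 0 and 1, and randomly initialised (untrained) weights. The disclosure does not describe the architecture of neural network 450.

```python
import math
import random


def make_scorer(input_dim, hidden_dim=8, seed=0):
    """Build an untrained one-hidden-layer scorer with fixed random weights."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(input_dim)] for _ in range(hidden_dim)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]

    def score(video_features, template_features):
        # Combine the two feature vectors by concatenation (an assumption).
        x = list(video_features) + list(template_features)
        if len(x) != input_dim:
            raise ValueError("combined feature length does not match input_dim")
        # Hidden layer with ReLU activation.
        hidden = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in w1]
        # Sigmoid output maps the result into the open interval (0, 1).
        logit = sum(w * h for w, h in zip(w2, hidden))
        return 1.0 / (1.0 + math.exp(-logit))

    return score


# Illustrative pooled video features (3 values) and template features (2 values).
score = make_scorer(input_dim=5)
effect_score = score([0.2, 0.4, 0.6], [0.1, 0.9])
print(effect_score)
```

Because the weights are untrained, the printed value carries no meaning; a usable scorer would be fitted on training data chosen for the generation target.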

[0036] According to some embodiments of this disclosure, the machine learning model may be obtained based on a generation target for generating a second media item from a first media item. Continuing the example above, in a data promotion scenario, suppose the target of generating a second media item from a first media item is to have the first media item accessed by more users. Training data for the machine learning model can then be obtained based on this target, for example, by selecting first and second reference media items that received more user access. As another example, suppose the target is to increase downloads of the application recommended in the first media item; in that case, first and second reference media items that achieved more user downloads can be selected. With the exemplary implementations of this disclosure, the effectiveness score output by the machine learning model can match the generation target more closely, so that the second media item generated with the selected template better meets user needs.

[0037] According to some embodiments of this disclosure, additional factors may be considered in the process of generating features. For example, the candidate settings may further include background audio (e.g., music) for generating a second media item. The background audio as used herein may represent audio used to replace the original background audio in the first media item; that is, the second media item thus generated may have entirely new background audio, thereby making the second media item more responsive to the needs of use.

[0038] Specifically, background audio can be selected from an audio library containing multiple background audio tracks. In this case, the features of the background audio can be used to update the features of the candidate settings. Referring again to Figure 4, music 430 may be selected, and music features 434 may be extracted using an encoder 432 for extracting music features. Furthermore, video features 414, template features 424, and music features 434 may be input to a neural network 450, which then obtains a corresponding effect score 452. In this way, audio information may be considered when determining the effect score, thereby making the determined effect score more accurate.

[0039] According to some embodiments of this disclosure, the candidate settings may further include attributes of the first media item, where the attributes may include, in particular, at least one of the state of the first media item, the category of the first media item, the content template of the first media item, and the audio of the first media item. Continuing to refer to Figure 4, a state numerical feature 442 may be extracted from the state of the first media item. For a first media item posted in a first media sharing application, the state may represent the playback state, like state, follow state, etc. of the first media item. A category numerical feature 444 may be extracted from the category of the first media item, that is, the category to which the content presented in the first media item belongs, for example, a food category, a landscape category, a song category, etc.

[0040] Alternatively and / or additionally, template numerical features 446 may be extracted from the content template of the first media item. Herein, the content template is a template used in the process of generating the first media item (e.g., defining settings such as style, duration, and subplots in the first media item) and is different from the template 214 for generating the second media item. Likewise, the audio of the first media item is the audio used in the first media item itself (e.g., background music) and is different from the music 430 used to generate the second media item.

[0041] According to some embodiments of this disclosure, state numerical features 442, category numerical features 444, template numerical features 446, and music numerical features 448 may be obtained, and these features may be sequentially input into a neural network 440 to obtain attribute features of a first media item (for example, for combinations of individual features and / or dimensions). Furthermore, the attribute features obtained in the manner described above are used to update the features of a candidate setting. Specifically, by inputting video features 414, template features 424, music features 434, and attribute features into a neural network 450, a corresponding effect score 452 can be obtained. In this way, a richer set of information can be considered when determining the effect score, so that the determined effect score is more accurate.

[0042] It should be understood that Figure 4 merely illustrates the process for determining the effect score 452 based on template 420 and music 430. Alternatively and / or additionally, a given template library and music library may be provided. By traversing each template in the template library and each music track in the music library, the templates and music tracks can be combined into multiple candidate settings, and the effect score of each candidate setting can be determined. Assuming the template library contains K templates and the music library contains L music tracks, K*L candidate settings and the corresponding K*L effect scores are obtained. The template and music corresponding to the highest effect score can be selected to generate a second media item from the first media item. In this way, instead of occupying a large amount of computational resources to actually generate K*L second media items, a machine learning model can be used to determine the template and music that yield a better effect score.
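
The traversal over K templates and L music tracks can be sketched as a Cartesian product followed by a ranking. The scoring function is passed in as a parameter; in this example it is a lookup into a hand-written table of made-up scores standing in for the machine learning model, and all identifiers are hypothetical.

```python
from itertools import product


def rank_candidates(templates, tracks, score_fn):
    """Score every (template, track) pair and return them sorted best-first."""
    scored = [(score_fn(t, m), t, m) for t, m in product(templates, tracks)]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


templates = ["template_a", "template_b", "template_c"]  # K = 3
tracks = ["track_x", "track_y"]                         # L = 2

# Made-up scores standing in for the output of a trained model.
toy_scores = {
    ("template_a", "track_x"): 0.32,
    ("template_a", "track_y"): 0.47,
    ("template_b", "track_x"): 0.58,
    ("template_b", "track_y"): 0.81,
    ("template_c", "track_x"): 0.29,
    ("template_c", "track_y"): 0.66,
}

ranked = rank_candidates(templates, tracks, lambda t, m: toy_scores[(t, m)])
best_score, best_template, best_track = ranked[0]
print(len(ranked), best_template, best_track)
```

Only the single best pair would then be rendered, so the cost of producing K*L finished items is replaced by K*L model evaluations plus one generation.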

[0043] Once the best effect score is determined, the corresponding second media item can be generated using the template and music that yield that score. Further details will be explained with reference to Figure 5, which shows a block diagram 500 for generating media items according to some embodiments of this disclosure. According to some embodiments of this disclosure, the template library 530 can provide a large number of templates, and the music library 510 can provide a large number of music tracks. Different segments of an original video in the video library 520 may perform differently once processed into placement material, and the performance of the resulting placement video can be estimated using the machine learning model described above.

[0044] As shown in Figure 5, music 512 can be selected from music library 510, a video clip 524 can be extracted from video library 520 using video understanding model 522, and a template 532 can be selected from template library 530, thereby generating video 540. In this specification, the video clip 524 can be a complete video in video library 520 or a clip from that complete video. In this way, a video 540 that is closer to the user's needs can be generated. According to some embodiments of this disclosure, the video understanding model 522 can be used to generate corresponding text 514 (such as a description of video clip 524) and add the text 514 to video 540.

[0045] According to some embodiments of this disclosure, a template can further represent multiple associations between multiple media elements and multiple attributes of a first media item. Furthermore, the multiple attributes can be added to multiple positions of the multiple media elements corresponding to the target template, based on the multiple associations, in order to generate a second media item. More information will be explained with reference to Figure 6, which shows a block diagram 600 that generates a second media item using a template according to some embodiments of this disclosure.

[0046] As shown in Figure 6, the multiple attributes include, for the first media item, at least one of the following: the content of the first media item, a description of the first media item, the access address of the application for accessing the first media item, and the application identifier. Specifically, for each media element specified in template 214, corresponding attributes can be added to the corresponding positions.

[0047] The media element 310 of template 214 may correspond to an application identifier used to access the first media item; that is, the application identifier 612 may be added to the location of media element 310. The media element 320 may correspond to the content of the first media item; that is, the content 614 may be added to the location of media element 320. The media element 330 may correspond to the description of the first media item; that is, the description 616 may be added to the location of media element 330. Media elements 340 and 350 may correspond to the access addresses of the application; that is, access addresses 620 and 622 may be added to the locations of media elements 340 and 350, respectively (access addresses 620 and 622 may each be used to download the application for installation on an operating system). In this case, the second media item includes an address for accessing the first media sharing application.
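
The association between media elements and attributes can be sketched as two lookups: element to attribute name, and element to position. The attribute names, pixel positions, application name, and example.com addresses below are all hypothetical placeholders; only the element numbers loosely mirror the reference numerals in the text.

```python
def fill_template(associations, positions, attributes):
    """Place each attribute value at the position of its associated media element.

    associations: element id -> attribute name
    positions:    element id -> (left, top) pixel position
    attributes:   attribute name -> value taken from the first media item
    """
    missing = [name for name in associations.values() if name not in attributes]
    if missing:
        raise KeyError(f"attributes not provided: {missing}")
    return [
        {"element": element, "position": positions[element], "value": attributes[name]}
        for element, name in associations.items()
    ]


# Hypothetical associations for a template with five media elements.
associations = {
    310: "app_identifier",
    320: "content",
    330: "description",
    340: "access_address_1",
    350: "access_address_2",
}
positions = {310: (40, 40), 320: (40, 240), 330: (40, 900), 340: (40, 1060), 350: (380, 1060)}
attributes = {
    "app_identifier": "ExampleApp",
    "content": "clip_0001.mp4",
    "description": "A short cooking clip",
    "access_address_1": "https://example.com/download/a",
    "access_address_2": "https://example.com/download/b",
}

placed = fill_template(associations, positions, attributes)
for item in placed:
    print(item)
```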

[0048] The exemplary implementations of this disclosure allow for a more precise and effective specification of the attributes present in a second media item and the locations where those attributes are present, thereby making the resulting second media item more sensitive to user needs.

[0049] According to some embodiments of this disclosure, the description of a first media item is extracted from the first media item. Specifically, text extracted by the video understanding model 522 shown in Figure 5 can be used as the description. Using the exemplary implementations of this disclosure, the presented description will be consistent with the original content of the first media item.

[0050] According to some embodiments of this disclosure, a second media item may be provided in a second media sharing application different from the first media sharing application. For example, the first media sharing application may be an application for posting short videos, and the second media sharing application may be an application for posting multimedia data. By combining different applications in this way, the generated second media item can be viewed by more users, thereby improving the efficiency of data promotion.

[0051] High-quality creative materials play a crucial role in the data promotion process. High-quality materials attract users, thereby allowing them to obtain richer information. According to some embodiments of this disclosure, content suitable for creating materials can be found in media sharing applications, processed into materials, and ultimately placed.

[0052] According to several embodiments of this disclosure, high-quality material can be produced by intelligently editing original media content using multimodal technology and content generation functions. Specifically, a video understanding process can apply multimodal technology to understand a video and provide a basis for subsequent content extraction, music recommendation, and template recommendation. A clip extraction process can slice the original video and select appropriate video clips from it. A text generation process can use a language model to generate text suitable for the video content. A music recommendation process can recommend music suitable for the video content. A template recommendation process can recommend a template suitable for each video content. A finished product effect prediction process can build a machine learning model to predict the effect of the finished product after processing and select an appropriate material processing method according to that effect.

[0053] With the maturation of multimodal and large-scale model technologies, computer vision models can be used to identify and understand original user-generated content and obtain video comprehension information. Based on video comprehension, the video can be sliced and atomized through multimodal technology, and at the same time, appropriate textual information can be processed using language models. Machine learning can be used to select the music and template that best matches this video. Finally, the content is spliced into finished material using multimodal technology.

Process example

[0054] Figure 7 is a flowchart of a method 700 for processing a media item according to some embodiments of the present disclosure. In block 710, a first media item and a plurality of templates, each to be used to generate a plurality of second media items from the first media item, are obtained. In block 720, a plurality of candidate settings for generating a plurality of second media items, each based on the plurality of templates and the first media item, are determined, where one of the candidate settings indicates a template from the plurality of templates and the first media item. In block 730, a plurality of effect evaluations, each associated with the plurality of candidate settings, are determined, where one of the effect evaluations represents the effect evaluation of the second media item generated from the first media item using the template indicated by the candidate settings. In block 740, a target template for generating a second media item from the first media item is selected from the plurality of templates based on the plurality of effect evaluations.
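As a rough illustration, blocks 710 to 740 could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the names `CandidateSetting` and `select_target_template` are hypothetical, the `evaluate` callable stands in for whatever produces the effect evaluation, a higher evaluation is assumed to be better, and the scores are made up.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class CandidateSetting:
    """One candidate setting: a template paired with the first media item."""
    template: str
    media_item: str


def select_target_template(
    media_item: str,
    templates: Sequence[str],
    evaluate: Callable[[CandidateSetting], float],
) -> str:
    """Blocks 710-740 of method 700 as a single function.

    `evaluate` is a placeholder for the effect evaluation; a higher value
    is assumed to indicate a better expected effect.
    """
    if not templates:
        raise ValueError("at least one template is required")
    # Block 720: one candidate setting per template.
    candidates = [CandidateSetting(t, media_item) for t in templates]
    # Block 730: one effect evaluation per candidate setting.
    evaluations = [evaluate(c) for c in candidates]
    # Block 740: pick the template whose candidate setting scored highest.
    best_index = max(range(len(candidates)), key=evaluations.__getitem__)
    return candidates[best_index].template


# Toy evaluator with made-up scores, for illustration only.
toy_scores = {"template_a": 0.42, "template_b": 0.77, "template_c": 0.61}
target = select_target_template(
    "video_001", list(toy_scores), lambda c: toy_scores[c.template]
)
print(target)  # template_b
```

Passing the evaluator in as a callable keeps the selection step independent of how the effect evaluation is actually computed.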

[0055] According to some embodiments of the present disclosure, determining an effect evaluation among the plurality of effect evaluations includes determining features associated with a candidate setting based on a template and a first media item, and determining the effect evaluation using a machine learning model based on the features.

[0056] According to some embodiments of this disclosure, the template represents a layout of multiple media elements to be added to a second media item, and determining the features includes determining the features based on the features of the template and the features of the first media item.

[0057] According to some embodiments of the present disclosure, the candidate setting further includes background audio for generating a second media item, the background audio is selected from a plurality of background audios, and determining the features further includes updating the features using the features of the background audio.

[0058] According to some embodiments of the present disclosure, the candidate setting further includes attributes of a first media item, and determining the feature further includes updating the feature using the attribute features, the attribute including at least one of the state of the first media item, the category of the first media item, the content template of the first media item, and the audio of the first media item.
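The feature determination described in the preceding paragraphs can be sketched as follows. This is an illustrative sketch only: concatenation is an assumed way of combining and updating the features, the logistic scorer is a stand-in for the machine learning model (the disclosure does not specify a model type), and the function names, weights, and feature values are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def build_features(
    template_features: Sequence[float],
    media_features: Sequence[float],
    audio_features: Optional[Sequence[float]] = None,
    attribute_features: Optional[Sequence[float]] = None,
) -> List[float]:
    """Combine per-source feature vectors into one candidate-setting feature.

    Concatenation is an assumption; the text only states that the feature is
    determined from, and updated with, these sources.
    """
    features = list(template_features) + list(media_features)
    if audio_features is not None:       # optional background audio
        features += list(audio_features)
    if attribute_features is not None:   # optional state, category, and so on
        features += list(attribute_features)
    return features


def effect_evaluation(
    features: Sequence[float], weights: Sequence[float], bias: float = 0.0
) -> float:
    """Stand-in for the machine learning model: a logistic score in (0, 1)."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = bias + sum(f * w for f, w in zip(features, weights))
    # Numerically stable sigmoid that avoids overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Made-up feature values; zero weights give a neutral score of 0.5.
features = build_features([1.0, 0.0], [0.5], audio_features=[0.2])
score = effect_evaluation(features, weights=[0.0, 0.0, 0.0, 0.0])
```

In practice the weights would come from a trained model; they are fixed here only so that the example is self-contained.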

[0059] According to some embodiments of this disclosure, a machine learning model is obtained based on a generation target that generates a second media item from a first media item.

[0060] According to some embodiments of the present disclosure, the template further represents multiple association relationships between multiple media elements and multiple attributes of a first media item, and the method further includes adding multiple attributes to multiple positions of multiple media elements corresponding to each target template, based on the multiple association relationships, in order to generate a second media item.

[0061] According to some embodiments of the present disclosure, the multiple attributes include at least one of the following: the content of the first media item, a description of the first media item, an access address of an application for accessing the first media item, and an identifier of the application.

[0062] According to some embodiments of this disclosure, the description of the first media item is extracted from the first media item.

[0063] According to some embodiments of the present disclosure, obtaining a first media item includes obtaining a first media item from a plurality of original media items posted by a plurality of users of a first media sharing application, and the method further includes providing a second media item in a second media sharing application.

[0064] According to some embodiments of this disclosure, the first media item is a media segment extracted from multiple original media items.

Exemplary devices and equipment

[0065] Figure 8 is a block diagram of an apparatus 800 for processing a media item according to some embodiments of the present disclosure. The apparatus 800 comprises: an acquisition module 810 configured to acquire a first media item and a plurality of templates, each used to generate a plurality of second media items from the first media item; a generation module 820 configured to determine, based on the plurality of templates and the first media item, a plurality of candidate settings for generating the plurality of second media items, wherein one of the candidate settings indicates a template from the plurality of templates and the first media item; an evaluation module 830 configured to determine a plurality of effect evaluations associated with each of the plurality of candidate settings, wherein one of the effect evaluations represents the effect evaluation of a second media item generated from the first media item using the template indicated by the candidate setting; and a selection module 840 configured to select, based on the plurality of effect evaluations, a target template for generating a second media item from the first media item from the plurality of templates.

[0066] According to some embodiments of the present disclosure, the evaluation module includes a feature determination module configured to determine features associated with a candidate setting based on a template and a first media item, and a calling module configured to determine an effect evaluation using a machine learning model based on the features.

[0067] According to some embodiments of the present disclosure, the template represents a layout of multiple media elements to be added to a second media item, and the feature determination module includes a combination module configured to determine features based on the features of the template and the features of the first media item.

[0068] According to some embodiments of the present disclosure, the candidate setting further includes background audio for generating a second media item, the background audio being selected from a plurality of background audios, and the feature determination module further includes an update module configured to update the feature using the features of the background audio.

[0069] According to some embodiments of the present disclosure, the candidate setting further includes attributes of a first media item, and the feature determination module further includes an update module configured to update the feature using the features of the attribute, the attribute including at least one of the state of the first media item, the category of the first media item, the content template of the first media item, and the audio of the first media item.

[0070] According to some embodiments of this disclosure, a machine learning model is obtained based on a generation target that generates a second media item from a first media item.

[0071] According to some embodiments of the present disclosure, the template further represents multiple association relationships between multiple media elements and multiple attributes of a first media item, and the device further includes an addition module configured to add multiple attributes to multiple positions of multiple media elements corresponding to a target template, based on the multiple association relationships, in order to generate a second media item.

[0072] According to some embodiments of the present disclosure, the multiple attributes include at least one of the following: the content of the first media item, a description of the first media item, an access address of an application for accessing the first media item, and an identifier of the application.

[0073] According to some embodiments of this disclosure, the description of the first media item is extracted from the first media item.

[0074] According to some embodiments of the present disclosure, the acquisition module includes an extraction module configured to acquire a first media item from a plurality of original media items posted by a plurality of users of a first media sharing application, and the device further includes a providing module configured to provide a second media item in a second media sharing application.

[0075] According to some embodiments of this disclosure, the first media item is a media segment extracted from multiple original media items.

[0076] Figure 9 is a block diagram of a computing device 900 capable of carrying out several embodiments of the present disclosure. It should be understood that the computing device 900 shown in Figure 9 is merely an example and does not limit the functionality and scope of the embodiments described herein. The computing device 900 shown in Figure 9 can be used to carry out the methods described.

[0077] As shown in Figure 9, the computing device 900 is in the form of a general-purpose computing device. The components of the computing device 900 may include, but are not limited to, one or more processors or processing units 910, memory 920, storage devices 930, one or more communication units 940, one or more input devices 950, and one or more output devices 960. The processing unit 910 may be an actual or virtual processor and can perform various processes based on a program stored in memory 920. In a multiprocessor system, multiple processing units execute computer-executable instructions in parallel to improve the parallel processing capability of the computing device 900.

[0078] The computing device 900 generally includes multiple computer storage media. Such media may be any available media accessible to the computing device 900, and may include, but are not limited to, volatile and non-volatile media, and removable and non-removable media. Memory 920 may be volatile memory (e.g., registers, cache, random access memory (RAM)), non-volatile memory (e.g., read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory), or some combination thereof. Storage device 930 may be removable or non-removable media, and may include machine-readable media, such as flash memory drives, magnetic disks, or any other media, which can be used to store information and / or data (e.g., training data) and can be accessed within the computing device 900.

[0079] The computing device 900 may further include other removable / non-removable, volatile / non-volatile storage media. Although not shown in Figure 9, a magnetic disk drive for reading from or writing to removable, non-volatile magnetic disks (e.g., “floppy disks”) and an optical disk drive for reading from or writing to removable, non-volatile optical disks may be provided. In these cases, each drive may be connected to a bus (not shown) by one or more data medium interfaces. The memory 920 may include a computer program product 925 having one or more program modules, which are configured to perform various methods or operations of various embodiments of the present disclosure.

[0080] The communication unit 940 enables communication with other computing devices via a communication medium. Additionally, the functions of the components of computing device 900 can be implemented in a single computing cluster or multiple computing devices, which can communicate via communication connections. Therefore, computing device 900 can operate in a network environment using logical connections with one or more other servers, network personal computers (PCs), or other network nodes.

[0081] The input device 950 may be one or more input devices, such as a mouse, keyboard, or trackball. The output device 960 may be one or more output devices, such as a display, speaker, or printer. The computing device 900 may further communicate with one or more external devices (not shown), such as storage devices or display devices, via the communication unit 940 as needed, communicate with one or more devices for user-to-computer interaction, or communicate with any device (e.g., a network card or modem) for communication between the computing device 900 and one or more other computing devices. Such communication may be performed via an input / output (I / O) interface (not shown).

[0082] According to embodiments of the present disclosure, a computer-readable storage medium is provided in which computer-executable instructions are stored, wherein the computer-executable instructions are executed by a processor to implement the method described above. According to embodiments of the present disclosure, a computer program product is further provided, which is tangibly stored in a non-transitory computer-readable medium and includes computer-executable instructions, wherein the computer-executable instructions are executed by a processor to implement the method described above. According to embodiments of the present disclosure, a computer program product is provided in which a computer program is stored, and the program is executed by a processor to implement the method described.

[0083] Aspects of this disclosure are described herein with reference to flowcharts and / or block diagrams of methods, apparatus, devices, and computer program products implemented in accordance with this disclosure. It should be understood that every block in a flowchart and / or block diagram, and every combination of blocks in a flowchart and / or block diagram, can be implemented by computer-readable program instructions.

[0084] These computer-readable program instructions may be provided to a processing unit of a general-purpose computer, a special-purpose computer, or other programmable data processing device to produce a machine, such that the instructions, when executed by the processing unit of the computer or other programmable data processing device, create means for implementing the functions / operations defined in one or more blocks in a flowchart and / or block diagram. These computer-readable program instructions may also be stored in a computer-readable storage medium and cause a computer, programmable data processing device, and / or other device to operate in a specific manner, such that the computer-readable medium on which the instructions are stored includes an article of manufacture containing instructions that implement aspects of the functions / operations defined in one or more blocks in a flowchart and / or block diagram.

[0085] By loading computer-readable program instructions into a computer, other programmable data processing device, or other device, and by executing a series of operational steps on the computer, other programmable data processing device, or other device, the instructions executed on the computer, other programmable data processing device, or other device can realize the functions / operations defined in one or more blocks in a flowchart and / or block diagram.

[0086] The flowcharts and block diagrams in the drawings illustrate the implementable system architectures, functions, and operations of multiple implementations of the computer program products provided herein. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or part of an instruction, and a module, program segment, or part of an instruction may contain one or more executable instructions for implementing a defined logical function. In some alternative implementations, the functions represented in a block may occur in an order different from the order shown in the drawings. For example, two consecutive blocks may actually be executed almost in parallel, or in reverse order depending on the related functions. Note that each block in a block diagram and / or flowchart, and combinations of blocks in a block diagram and / or flowchart, may be implemented in a dedicated, hardware-based system that performs the function or operation defined in the block diagram and / or flowchart, or in a combination of dedicated hardware and computer instructions.

[0087] The above descriptions illustrate various implementations of this disclosure; however, these descriptions are illustrative, not exhaustive, and are not limited to any of the disclosed implementations. Many modifications and changes will be apparent to those skilled in the art without departing from the scope and spirit of the described implementations. The terminology used herein has been selected to best describe the various embodiments disclosed herein, the principles of the embodiments, the practical application or improvement of the technology in the market, or to be understood by those skilled in the art.

Claims

1. A method for processing media items, comprising: obtaining a first media item and a plurality of templates, each used to generate a plurality of second media items from the first media item; determining, based on the plurality of templates and the first media item, a plurality of candidate settings for generating the plurality of second media items, wherein a candidate setting among the plurality of candidate settings indicates a template among the plurality of templates and the first media item; determining a plurality of effect evaluations associated with each of the plurality of candidate settings, wherein an effect evaluation among the plurality of effect evaluations represents the effect evaluation of the second media item generated from the first media item using the template indicated by the candidate setting; and selecting, based on the plurality of effect evaluations, a target template for generating the second media item from the first media item from the plurality of templates.

2. The method according to claim 1, wherein determining the effect evaluation among the plurality of effect evaluations includes: determining, based on the template and the first media item, features associated with the candidate setting; and determining the effect evaluation using a machine learning model based on the features.

3. The method according to claim 2, wherein the template represents a layout of a plurality of media elements to be added to the second media item, and determining the features includes determining the features based on the features of the template and the features of the first media item.

4. The method according to claim 2, wherein the candidate setting further includes background audio for generating the second media item, the background audio being selected from a plurality of background audios, and determining the features further includes updating the features using the features of the background audio.

5. The method according to claim 2, wherein the candidate setting further includes attributes of the first media item, and determining the features further includes updating the features using the features of the attributes, wherein the attributes include at least one of the state of the first media item, the category of the first media item, the content template of the first media item, and the audio of the first media item.

6. The method according to claim 2, wherein the machine learning model is obtained based on a generation target that generates the second media item from the first media item.

7. The method according to claim 3, wherein the template further represents a plurality of association relationships between the plurality of media elements and a plurality of attributes of the first media item, and the method further includes, in order to generate the second media item, adding the plurality of attributes to a plurality of positions of the plurality of media elements corresponding to the target template, based on the plurality of association relationships.

8. The method according to claim 7, wherein the plurality of attributes include at least one of the following: the content of the first media item, a description of the first media item, an access address of an application for accessing the first media item, and an identifier of the application.

9. The method according to claim 8, wherein the description of the first media item is extracted from the first media item.

10. The method according to claim 7, wherein obtaining the first media item includes obtaining the first media item from a plurality of original media items posted by a plurality of users of a first media sharing application, and the method further includes providing the second media item in a second media sharing application.

11. The method according to claim 10, wherein the first media item is a media segment extracted from the plurality of original media items.

12. An apparatus for processing media items, comprising: an acquisition module configured to acquire a first media item and a plurality of templates, each used to generate a plurality of second media items from the first media item; a generation module configured to determine, based on the plurality of templates and the first media item, a plurality of candidate settings for generating the plurality of second media items, wherein a candidate setting among the plurality of candidate settings indicates a template among the plurality of templates and the first media item; an evaluation module configured to determine a plurality of effect evaluations associated with each of the plurality of candidate settings, wherein an effect evaluation among the plurality of effect evaluations represents the effect evaluation of a second media item generated from the first media item using the template indicated by the candidate setting; and a selection module configured to select, based on the plurality of effect evaluations, a target template for generating a second media item from the first media item from the plurality of templates.

13. An electronic device comprising: at least one processing unit; and at least one memory connected to the at least one processing unit and storing instructions to be executed by the at least one processing unit, wherein the instructions, when executed by the at least one processing unit, cause the electronic device to perform the method according to any one of claims 1 to 11.

14. A computer-readable storage medium storing a computer program that, when executed on a processor, causes the processor to perform the method according to any one of claims 1 to 11.

15. A computer program product comprising a computer program that, when executed on a processor, implements the method described in any one of claims 1 to 11.