Video editing method, apparatus, electronic device, storage medium and program product
By obtaining editing control information input by the user and scoring video segments with neural networks, video segments are selected automatically. This addresses the disharmonious and aesthetically unappealing results of existing video editing and generates harmonious videos that meet the user's needs.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents (China)
- Current Assignee / Owner
- SENSETIME GRP LTD
- Filing Date
- 2022-06-17
- Publication Date
- 2026-06-26
AI Technical Summary
In the prior art, video editing relies mainly on manual operation, which is difficult to automate and struggles to meet users' personalized editing needs. Moreover, the editing results are often disharmonious and fall short of aesthetic standards.
The editing control information input by the user is obtained, and the editing ratio information for different types of video segments is determined from it. Based on this information, candidate video segments are selected from the video to be edited, and candidate videos that meet the user's needs are generated. Neural networks score the video segments to optimize the editing results.
It automates the video editing process, producing harmonious and aesthetically pleasing video clips that meet users' personalized needs.
Figure CN115103135B_ABST
Abstract
Description
Technical Field
[0001] This disclosure relates to the field of computer technology, and in particular to a video editing method, apparatus, electronic device, storage medium, and program product.
Background Technology
[0002] Video editing refers to the trimming and/or editing of videos. In manual video editing, professional editors typically select, break down, and splice together large amounts of footage shot during video production to create a coherent, smooth, meaningful, thematically clear, and artistically compelling work. How to edit videos automatically so that the result meets user needs is a pressing technical problem.
Summary of the Invention
[0003] This disclosure provides a video editing technology solution.
[0004] According to one aspect of this disclosure, a video editing method is provided, comprising:
[0005] Obtain at least one video to be edited;
[0006] Obtain the editing control information input by the user;
[0007] Based on the editing control information, determine the editing ratio information corresponding to different types of video segments;
[0008] Based on the editing ratio information, at least two candidate video segments are determined from the at least one video to be edited;
[0009] At least one candidate video is generated based on the at least two candidate video segments.
[0010] In one possible implementation, the editing control information includes information about the target editing element;
[0011] The step of determining the editing ratio information corresponding to different types of video segments based on the editing control information includes:
[0012] Based on the information of the target editing element, determine the proportion of the video segment corresponding to the target editing element, and the proportion of the video segment corresponding to at least one editing element other than the target editing element.
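The paragraph above leaves the allocation rule open. As a minimal sketch, assuming the target editing element simply receives a fixed share (the hypothetical `target_share`) and the remaining share is split evenly among the other editing elements, the proportions could be computed as follows. The element labels and the 0.6 default are illustrative assumptions, not values taken from this disclosure.

```python
def element_proportions(target, elements, target_share=0.6):
    """Split segment proportions between the target editing element and the others.

    Hypothetical rule: the target element gets `target_share`; the remaining
    share is divided evenly among the other editing elements.
    """
    if target not in elements:
        raise ValueError(f"unknown editing element: {target!r}")
    if not 0.0 <= target_share <= 1.0:
        raise ValueError("target_share must lie in [0, 1]")
    others = [e for e in elements if e != target]
    if not others:
        return {target: 1.0}
    rest = (1.0 - target_share) / len(others)
    proportions = {e: rest for e in others}
    proportions[target] = target_share
    return proportions
```

A fixed share keeps the target element dominant while still leaving room for other content, which is one way to read the harmony goal stated in this disclosure.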
[0013] In one possible implementation, the editing control information includes information about the target video style;
[0014] The step of determining the editing ratio information corresponding to different types of video segments based on the editing control information includes:
[0015] Based on the information of the target video style, determine the proportion of video segments corresponding to at least two types of camera movement.
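The disclosure does not specify how a video style maps to camera-movement proportions. One simple possibility, sketched here under that assumption, is a lookup table of relative weights per style that is normalized into proportions. The style names, movement types, and weights in `STYLE_MOTION_WEIGHTS` are hypothetical.

```python
# Hypothetical table: relative weights of camera-movement types for each style.
STYLE_MOTION_WEIGHTS = {
    "calm": {"static": 3, "pan": 2, "push_in": 1},
    "dynamic": {"static": 1, "pan": 2, "push_in": 2, "orbit": 2},
}

def motion_proportions(style):
    """Normalize the weights of the chosen style into proportions summing to 1."""
    try:
        weights = STYLE_MOTION_WEIGHTS[style]
    except KeyError:
        raise ValueError(f"unknown video style: {style!r}") from None
    total = sum(weights.values())
    return {motion: w / total for motion, w in weights.items()}
```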
[0016] In one possible implementation, the editing control information includes target duration information;
[0017] The step of determining at least two candidate video segments from the at least one video to be edited based on the editing ratio information includes:
[0018] Based on the target duration information, determine the target number of video segments in the candidate videos;
[0019] Based on the editing ratio information and the target number, at least two candidate video segments are determined from the at least one video to be edited.
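A minimal sketch of these two steps, assuming the target number is obtained by dividing the target duration by an assumed mean segment length (`mean_segment_s`, hypothetical) and that per-type segment counts are then allocated from the editing ratio information with a largest-remainder rule so that the counts add up exactly to the target number. Neither rule is stated in the disclosure; both are illustrative choices.

```python
import math

def target_segment_count(target_duration_s, mean_segment_s=3.0):
    """Target number of segments for the requested duration.

    Hypothetical rule: duration divided by an assumed mean segment length,
    rounded (Python's round() sends exact halves to the even integer), and
    never fewer than the two segments the method requires.
    """
    return max(2, round(target_duration_s / mean_segment_s))

def allocate_counts(proportions, n):
    """Turn editing-ratio proportions into integer segment counts summing to n."""
    if abs(sum(proportions.values()) - 1.0) > 1e-6:
        raise ValueError("proportions must sum to 1")
    raw = {k: p * n for k, p in proportions.items()}
    counts = {k: math.floor(v) for k, v in raw.items()}
    leftover = n - sum(counts.values())
    # Largest-remainder rule: remaining slots go to the biggest fractional parts.
    by_remainder = sorted(raw, key=lambda k: raw[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts
```

The same `allocate_counts` call can be applied once to the editing-element proportions and once to the camera-movement proportions, giving two count tables over the same number of positions.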
[0020] In one possible implementation, determining at least two candidate video segments from the at least one video to be edited based on the editing ratio information and the target number includes:
[0021] Based on the editing ratio information and the target number, determine the editing elements and camera movement types corresponding to at least two video segment positions in the candidate video;
[0022] From the at least one video to be edited, determine the video to be edited corresponding to the positions of the at least two video segments;
[0023] For any of the at least two video segment positions, based on the editing elements and camera movement type corresponding to the video segment position, the projection parameters corresponding to the initial video frame of the candidate video segment corresponding to the video segment position and the projection parameters corresponding to the end video frame of the candidate video segment are determined from the video to be edited corresponding to the video segment position. The projection parameters corresponding to the video frame include the position of the projection center and the field of view.
[0024] Based on the projection parameters corresponding to the initial video frame and the projection parameters corresponding to the end video frame, determine the projection parameters corresponding to the intermediate video frames of the candidate video segment.
[0025] Based on the projection parameters corresponding to the initial video frame, the projection parameters corresponding to the end video frame, and the projection parameters corresponding to the intermediate video frames, the candidate video segment corresponding to the video segment position is determined from the video to be edited corresponding to the video segment position.
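How the editing elements and camera movement types are laid out over the segment positions is not detailed here. One hypothetical heuristic, sketched below, spreads each label evenly along the timeline so that segments of the same type are not bunched together, then pairs the element schedule with the camera-movement schedule position by position. The helper names `spread` and `plan_positions` are illustrative only.

```python
def spread(counts):
    """Order labels so that copies of the same label are evenly spaced.

    Each copy j of a label with count c is placed at the fractional slot
    (j + 0.5) / c; sorting by slot interleaves the labels.
    """
    slots = []
    for label, c in counts.items():
        for j in range(c):
            slots.append(((j + 0.5) / c, label))
    slots.sort()
    return [label for _, label in slots]

def plan_positions(element_counts, motion_counts):
    """(editing element, camera movement type) for each video segment position."""
    elements = spread(element_counts)
    motions = spread(motion_counts)
    if len(elements) != len(motions):
        raise ValueError("both allocations must cover the same number of positions")
    return list(zip(elements, motions))
```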
[0026] In one possible implementation, determining the projection parameters corresponding to the intermediate video frames of the candidate video segment based on the projection parameters corresponding to the initial video frame and the projection parameters corresponding to the end video frame includes:
[0027] Determine the total number of frames in the candidate video segment and the camera motion parameters corresponding to the candidate video segment;
[0028] Based on the projection parameters corresponding to the initial video frame, the projection parameters corresponding to the end video frame, the total number of frames, and the camera motion parameters, the projection parameters corresponding to the intermediate video frames of the candidate video segment are determined.
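The disclosure names the inputs (start and end projection parameters, total number of frames, camera motion parameters) but not the interpolation rule. A minimal sketch, assuming the camera motion parameter is a single easing exponent `gamma` (hypothetical) and that longitude is interpolated along the shorter arc so that a pan across the ±180° seam does not sweep the long way round. Parameters follow the layout q = (x, y, (f_x, f_y)) of Equation 1, in degrees.

```python
def wrap_longitude(lon):
    """Map a longitude in degrees into [-180, 180)."""
    return (lon + 180.0) % 360.0 - 180.0

def interpolate_projection(start, end, total_frames, gamma=1.0):
    """Projection parameters q = (x, y, (f_x, f_y)) for every frame of a segment.

    start/end: parameters of the first and last frame (degrees).
    gamma: hypothetical camera motion parameter; 1.0 gives constant speed,
           values > 1 start slowly and accelerate, values < 1 decelerate.
    """
    if total_frames < 2:
        raise ValueError("a segment needs at least a start and an end frame")
    (x0, y0, (fx0, fy0)), (x1, y1, (fx1, fy1)) = start, end
    dx = wrap_longitude(x1 - x0)  # shortest signed longitude change
    frames = []
    for k in range(total_frames):
        t = (k / (total_frames - 1)) ** gamma  # eased progress in [0, 1]
        frames.append((
            wrap_longitude(x0 + t * dx),
            y0 + t * (y1 - y0),
            (fx0 + t * (fx1 - fx0), fy0 + t * (fy1 - fy0)),
        ))
    return frames  # frames[1:-1] are the intermediate frames
```

Interpolating the field of view alongside the projection center lets a single segment combine a pan with a zoom, which matches the statement that the projection parameters include both.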
[0029] In one possible implementation, generating at least one candidate video based on the at least two candidate video segments includes:
[0030] For a first candidate video segment among the at least two candidate video segments, a first score is determined for the candidate video segment pair to which the first candidate video segment belongs, wherein the candidate video segment pair includes the first candidate video segment and a second candidate video segment, and the video segment position of the second candidate video segment is adjacent to the video segment position of the first candidate video segment;
[0031] In response to the first score meeting a preset condition, the first candidate video segment is retained;
[0032] At least one candidate video is generated based on the candidate video segments retained from the at least two candidate video segments.
[0033] In one possible implementation, determining the first score corresponding to the candidate video segment pair to which the first candidate video segment belongs includes:
[0034] Determine the second score corresponding to the first candidate video segment and the second score corresponding to the second candidate video segment;
[0035] Determine the third score corresponding to the candidate video segment pair;
[0036] Based on the second score corresponding to the first candidate video segment, the second score corresponding to the second candidate video segment, and the third score, the first score corresponding to the candidate video segment pair is determined.
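The combination rule for the first score and the form of the preset condition are left open. As an illustration only, assuming all scores lie in [0, 1], the first score below is a weighted blend of the mean second score of the two segments and the pair's third score, and the preset condition is taken to be a fixed threshold. The weight `seg_weight` and the threshold 0.6 are hypothetical.

```python
def first_score(second_a, second_b, third, seg_weight=0.5):
    """Hypothetical fusion: blend the mean per-segment score with the pair score."""
    return seg_weight * (second_a + second_b) / 2.0 + (1.0 - seg_weight) * third

def keep_first_segment(second_a, second_b, third, threshold=0.6):
    """Preset condition assumed here to be 'first score >= threshold'."""
    return first_score(second_a, second_b, third) >= threshold
```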
[0037] In one possible implementation, determining the second score corresponding to the first candidate video segment and the second score corresponding to the second candidate video segment includes:
[0038] Based on the first editing element corresponding to the first candidate video segment, determine the second score corresponding to the first candidate video segment;
[0039] The second score corresponding to the second candidate video segment is determined based on the second editing element corresponding to the second candidate video segment.
[0040] In one possible implementation,
[0041] The step of determining the second score corresponding to the first candidate video segment based on the first editing element corresponding to the first candidate video segment includes: cropping the video content corresponding to the first editing element from the first candidate video segment based on the first editing element corresponding to the first candidate video segment; and using a pre-trained first neural network to process the first candidate video segment and the video content corresponding to the first editing element to obtain the second score corresponding to the first candidate video segment, wherein the first neural network is pre-trained using a training video set, and the training videos in the training video set are videos that have been captured.
[0042] The step of determining the second score corresponding to the second candidate video segment based on the second editing element corresponding to the second candidate video segment includes: cropping the video content corresponding to the second editing element from the second candidate video segment based on the second editing element corresponding to the second candidate video segment; and using the first neural network to process the second candidate video segment and the video content corresponding to the second editing element to obtain the second score corresponding to the second candidate video segment.
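The network itself is not described in this passage, but the cropping step and the two inputs passed to the first neural network can be sketched. In this minimal sketch, frames are toy 2-D lists of pixel values, each frame has one bounding box for its editing element in a hypothetical (top, left, height, width) format, and `scorer` stands in for the pre-trained first neural network; any callable with that signature can be plugged in.

```python
from typing import Callable, List, Sequence, Tuple

Frame = List[List[int]]           # toy grayscale frame: a list of pixel rows
BBox = Tuple[int, int, int, int]  # hypothetical box format: (top, left, height, width)
Scorer = Callable[[Sequence[Frame], Sequence[Frame]], float]

def crop(frame: Frame, box: BBox) -> Frame:
    """Cut the region of one frame that shows the editing element."""
    top, left, height, width = box
    return [row[left:left + width] for row in frame[top:top + height]]

def second_score(frames: Sequence[Frame], boxes: Sequence[BBox], scorer: Scorer) -> float:
    """Crop the editing element from every frame, then score segment and crops together.

    `scorer` is a stand-in for the pre-trained first neural network.
    """
    if len(frames) != len(boxes):
        raise ValueError("expected one editing-element box per frame")
    crops = [crop(frame, box) for frame, box in zip(frames, boxes)]
    return scorer(frames, crops)
```

Feeding both the full segment and the element crops lets the scorer judge overall composition and how well the editing element itself is framed.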
[0043] In one possible implementation, determining the third score corresponding to the candidate video segment pair includes:
[0044] The third score corresponding to the candidate video segment pair is determined based on at least one of the following:
[0045] Consistency information between the movement direction of the video frames in the first candidate video segment and the movement direction of the person in the first candidate video segment;
[0046] Consistency information between the movement direction of the video frames in the second candidate video segment and the movement direction of the person in the second candidate video segment;
[0047] Consistency information between the movement directions of video frames in the first candidate video segment and the second candidate video segment;
[0048] Similarity information between the editing elements corresponding to the first candidate video segment and the second candidate video segment.
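A minimal sketch of a third score built from these consistency and similarity terms, assuming movement directions are given as 2-D vectors, consistency is the cosine of the angle between two directions rescaled to [0, 1], similarity between editing elements is the Jaccard index of their label sets, and the available terms are averaged. Since the disclosure requires only at least one of the terms, the person-direction terms are optional here. All of these choices are assumptions.

```python
import math

def direction_consistency(u, v):
    """Cosine of the angle between two 2-D directions, rescaled from [-1, 1] to [0, 1]."""
    nu, nv = math.hypot(*u), math.hypot(*v)
    if nu == 0.0 or nv == 0.0:
        return 1.0  # assumption: a zero vector (no motion) never conflicts
    cos = (u[0] * v[0] + u[1] * v[1]) / (nu * nv)
    return (1.0 + cos) / 2.0

def element_similarity(elems_a, elems_b):
    """Jaccard index of two sets of editing-element labels."""
    a, b = set(elems_a), set(elems_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def third_score(view_a, view_b, elems_a, elems_b, person_a=None, person_b=None):
    """Average of the available consistency/similarity terms for a segment pair.

    view_*: movement direction of the video frames of each segment;
    person_*: optional movement direction of the person in that segment.
    """
    terms = [direction_consistency(view_a, view_b), element_similarity(elems_a, elems_b)]
    if person_a is not None:
        terms.append(direction_consistency(view_a, person_a))
    if person_b is not None:
        terms.append(direction_consistency(view_b, person_b))
    return sum(terms) / len(terms)
```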
[0049] In one possible implementation, the number of candidate videos is at least two;
[0050] After generating at least one candidate video, the method further includes:
[0051] For any one of the at least two candidate videos, determine fourth scores in one-to-one correspondence with the video segment combinations in the candidate video, wherein each video segment combination includes at least three adjacent video segments;
[0052] At least based on the fourth score, determine the fifth score corresponding to the candidate video;
[0053] Based on the fifth score, at least one target video is determined from the at least two candidate videos.
[0054] In one possible implementation, determining the fourth scores in one-to-one correspondence with the video segment combinations in the candidate video includes:
[0055] For any combination of video segments in the candidate videos, a pre-trained second neural network is used to process the combination of video segments to obtain a fourth score corresponding to the combination of video segments.
[0056] In one possible implementation, determining the fifth score corresponding to the candidate video based at least on the fourth score includes:
[0057] Determine third scores in one-to-one correspondence with the video segment pairs in the candidate video, wherein each video segment pair includes two adjacent video segments;
[0058] Based on the fourth score and the third score, a fifth score is determined for the candidate video.
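The fourth and fifth scores can be illustrated with a small sketch. Here `windows` enumerates the adjacent-segment combinations (triples) and pairs of a candidate video; the per-combination fourth scores and per-pair third scores are assumed to be already computed (for example by the second neural network and by the third-score computation described in [0043] to [0048]); and the fifth score is assumed to be a weighted mean of the two kinds of scores, with the highest-scoring candidates kept as target videos. The weight `alpha` and the function names are hypothetical.

```python
from statistics import mean

def windows(items, size):
    """All runs of `size` adjacent items (pairs for size=2, triples for size=3)."""
    return [items[i:i + size] for i in range(len(items) - size + 1)]

def fifth_score(fourth_scores, third_scores, alpha=0.5):
    """Hypothetical fusion: weighted mean of combination-level and pair-level scores."""
    if not fourth_scores or not third_scores:
        raise ValueError("need at least one combination score and one pair score")
    return alpha * mean(fourth_scores) + (1.0 - alpha) * mean(third_scores)

def select_targets(candidates, k=1):
    """candidates maps a candidate-video id to (fourth_scores, third_scores)."""
    ranked = sorted(candidates, key=lambda vid: fifth_score(*candidates[vid]), reverse=True)
    return ranked[:k]
```

For a candidate video with n segments, `windows(segments, 3)` yields n - 2 combinations and `windows(segments, 2)` yields n - 1 pairs, so the two score lists have those lengths.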
[0059] In one possible implementation, the at least one video to be edited includes: at least one panoramic video.
[0060] According to one aspect of this disclosure, a video editing apparatus is provided, comprising:
[0061] The first acquisition module is used to acquire at least one video to be edited;
[0062] The second acquisition module is used to acquire the editing control information input by the user;
[0063] The first determining module is used to determine the editing ratio information corresponding to different types of video segments based on the editing control information;
[0064] The second determining module is used to determine at least two candidate video segments from the at least one video to be edited based on the editing ratio information.
[0065] A generation module is used to generate at least one candidate video based on the at least two candidate video segments.
[0066] In one possible implementation, the editing control information includes information about the target editing element;
[0067] The first determining module is used for:
[0068] Based on the information of the target editing element, determine the proportion of the video segment corresponding to the target editing element, and the proportion of the video segment corresponding to at least one editing element other than the target editing element.
[0069] In one possible implementation, the editing control information includes information about the target video style;
[0070] The first determining module is used for:
[0071] Based on the information of the target video style, determine the proportion of video segments corresponding to at least two types of camera movement.
[0072] In one possible implementation, the editing control information includes target duration information;
[0073] The second determining module is used for:
[0074] Based on the target duration information, determine the target number of video segments in the candidate videos;
[0075] Based on the editing ratio information and the target number, at least two candidate video segments are determined from the at least one video to be edited.
[0076] In one possible implementation, the second determining module is used to:
[0077] Based on the editing ratio information and the target number, determine the editing elements and camera movement types corresponding to at least two video segment positions in the candidate video;
[0078] From the at least one video to be edited, determine the video to be edited corresponding to the positions of the at least two video segments;
[0079] For any of the at least two video segment positions, based on the editing elements and camera movement type corresponding to the video segment position, the projection parameters corresponding to the initial video frame of the candidate video segment corresponding to the video segment position and the projection parameters corresponding to the end video frame of the candidate video segment are determined from the video to be edited corresponding to the video segment position. The projection parameters corresponding to the video frame include the position of the projection center and the field of view.
[0080] Based on the projection parameters corresponding to the initial video frame and the projection parameters corresponding to the end video frame, determine the projection parameters corresponding to the intermediate video frames of the candidate video segment.
[0081] Based on the projection parameters corresponding to the initial video frame, the projection parameters corresponding to the end video frame, and the projection parameters corresponding to the intermediate video frames, the candidate video segment corresponding to the video segment position is determined from the video to be edited corresponding to the video segment position.
[0082] In one possible implementation, the second determining module is used to:
[0083] Determine the total number of frames in the candidate video segment and the camera motion parameters corresponding to the candidate video segment;
[0084] Based on the projection parameters corresponding to the initial video frame, the projection parameters corresponding to the end video frame, the total number of frames, and the camera motion parameters, the projection parameters corresponding to the intermediate video frames of the candidate video segment are determined.
[0085] In one possible implementation, the generation module is used to:
[0086] For a first candidate video segment among the at least two candidate video segments, a first score is determined for the candidate video segment pair to which the first candidate video segment belongs, wherein the candidate video segment pair includes the first candidate video segment and a second candidate video segment, and the video segment position of the second candidate video segment is adjacent to the video segment position of the first candidate video segment;
[0087] In response to the first score meeting a preset condition, the first candidate video segment is retained;
[0088] At least one candidate video is generated based on the candidate video segments retained from the at least two candidate video segments.
[0089] In one possible implementation, the generation module is used to:
[0090] Determine the second score corresponding to the first candidate video segment and the second score corresponding to the second candidate video segment;
[0091] Determine the third score corresponding to the candidate video segment pair;
[0092] Based on the second score corresponding to the first candidate video segment, the second score corresponding to the second candidate video segment, and the third score, the first score corresponding to the candidate video segment pair is determined.
[0093] In one possible implementation, the generation module is used to:
[0094] Based on the first editing element corresponding to the first candidate video segment, determine the second score corresponding to the first candidate video segment;
[0095] The second score corresponding to the second candidate video segment is determined based on the second editing element corresponding to the second candidate video segment.
[0096] In one possible implementation, the generation module is used to:
[0097] Based on the first editing element corresponding to the first candidate video segment, the video content corresponding to the first editing element is cropped from the first candidate video segment; a pre-trained first neural network is used to process the first candidate video segment and the video content corresponding to the first editing element to obtain a second score corresponding to the first candidate video segment, wherein the first neural network is pre-trained using a training video set, and the training videos in the training video set are videos that have been captured.
[0098] Based on the second editing element corresponding to the second candidate video segment, the video content corresponding to the second editing element is cropped from the second candidate video segment; the first neural network is used to process the second candidate video segment and the video content corresponding to the second editing element to obtain the second score corresponding to the second candidate video segment.
[0099] In one possible implementation, the generation module is used to:
[0100] The third score corresponding to the candidate video segment pair is determined based on at least one of the following:
[0101] Consistency information between the movement direction of the video frames in the first candidate video segment and the movement direction of the person in the first candidate video segment;
[0102] Consistency information between the movement direction of the video frames in the second candidate video segment and the movement direction of the person in the second candidate video segment;
[0103] Consistency information between the movement directions of video frames in the first candidate video segment and the second candidate video segment;
[0104] Similarity information between the editing elements corresponding to the first candidate video segment and the second candidate video segment.
[0105] In one possible implementation, the number of candidate videos is at least two;
[0106] The apparatus further includes a third determining module, used for:
[0107] For any one of the at least two candidate videos, determine fourth scores in one-to-one correspondence with the video segment combinations in the candidate video, wherein each video segment combination includes at least three adjacent video segments;
[0108] At least based on the fourth score, determine the fifth score corresponding to the candidate video;
[0109] Based on the fifth score, at least one target video is determined from the at least two candidate videos.
[0110] In one possible implementation, the third determining module is used to:
[0111] For any combination of video segments in the candidate videos, a pre-trained second neural network is used to process the combination of video segments to obtain a fourth score corresponding to the combination of video segments.
[0112] In one possible implementation, the third determining module is used to:
[0113] Determine third scores in one-to-one correspondence with the video segment pairs in the candidate video, wherein each video segment pair includes two adjacent video segments;
[0114] Based on the fourth score and the third score, a fifth score is determined for the candidate video.
[0115] In one possible implementation, the at least one video to be edited includes: at least one panoramic video.
[0116] According to one aspect of this disclosure, an electronic device is provided, comprising: one or more processors; a memory for storing executable instructions; wherein the one or more processors are configured to invoke the executable instructions stored in the memory to perform the method described above.
[0117] According to one aspect of this disclosure, a computer-readable storage medium is provided that stores computer program instructions thereon, which, when executed by a processor, implement the above-described method.
[0118] According to one aspect of this disclosure, a computer program product is provided, including computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code, wherein when the computer-readable code is run in an electronic device, a processor in the electronic device performs the above-described method.
[0119] In the embodiments of this disclosure, at least one video to be edited and the editing control information input by the user are obtained; editing ratio information for different types of video segments is determined based on the editing control information; at least two candidate video segments are determined from the at least one video to be edited based on the editing ratio information; and at least one candidate video is generated from the at least two candidate video segments. Because the candidate video is generated according to editing ratio information derived from the editing control information, it meets the user's video editing needs and also makes the different types of video segments in the edited video more harmonious and aesthetically pleasing.
[0120] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and are not intended to limit this disclosure.
[0121] Other features and aspects of this disclosure will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Attached Figure Description
[0122] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this disclosure and, together with the specification, serve to illustrate the technical solutions of this disclosure.
[0123] Figure 1 shows a flowchart of the video editing method provided in an embodiment of this disclosure.
[0124] Figure 2 shows a schematic diagram of an editing element selection interface in the video editing method provided in an embodiment of this disclosure.
[0125] Figure 3 shows another schematic diagram of the editing element selection interface in the video editing method provided in an embodiment of this disclosure.
[0126] Figure 4 shows a schematic diagram of a video style selection interface in the video editing method provided in an embodiment of this disclosure.
[0127] Figure 5 shows a schematic diagram of a duration selection interface in the video editing method provided in an embodiment of this disclosure.
[0128] Figure 6 shows a schematic diagram of selecting candidate video segments based on a first score in the video editing method provided in an embodiment of this disclosure.
[0129] Figure 7 shows a schematic diagram of a first neural network in the video editing method provided in an embodiment of this disclosure.
[0130] Figure 8 shows a schematic diagram of a second neural network in the video editing method provided in an embodiment of this disclosure.
[0131] Figure 9 shows a block diagram of a video editing apparatus provided in an embodiment of this disclosure.
[0132] Figure 10 shows a block diagram of an electronic device 1900 provided in an embodiment of this disclosure.
Detailed Implementation
[0133] Various exemplary embodiments, features, and aspects of this disclosure will now be described in detail with reference to the accompanying drawings. The same reference numerals in the drawings denote elements that have the same or similar functions. Although various aspects of the embodiments are shown in the drawings, they are not necessarily drawn to scale unless specifically indicated otherwise.
[0134] The term “exemplary” as used herein means “serving as an example, embodiment, or illustration.” Any embodiment illustrated herein as “exemplary” is not necessarily to be construed as superior to or better than other embodiments.
[0135] In this document, the term "and/or" merely describes an association between related objects and indicates that three relationships can exist. For example, A and/or B can represent three cases: A alone, both A and B, and B alone. Furthermore, the term "at least one" in this document means any one of multiple elements, or any combination of at least two of multiple elements. For example, including at least one of A, B, and C can mean including any one or more elements selected from the set consisting of A, B, and C.
[0136] Furthermore, to better illustrate this disclosure, numerous specific details are set forth in the following detailed description. Those skilled in the art will understand that this disclosure can be practiced without certain specific details. In some instances, methods, means, components, and circuits well known to those skilled in the art have not been described in detail in order to highlight the main points of this disclosure.
[0137] This disclosure provides a video editing method, apparatus, electronic device, storage medium, and program product. At least one video to be edited and the editing control information input by the user are obtained; editing ratio information for different types of video segments is determined based on the editing control information; at least two candidate video segments are determined from the at least one video to be edited based on the editing ratio information; and at least one candidate video is generated from the at least two candidate video segments. Because the candidate videos are generated according to editing ratio information derived from the editing control information, they meet the user's video editing needs and also make the different types of video segments in the edited video more harmonious and aesthetically pleasing.
[0138] The video editing method provided in this disclosure will now be described in detail with reference to the accompanying drawings.
[0139] Figure 1 shows a flowchart of a video editing method provided in an embodiment of this disclosure. In one possible implementation, the video editing method can be executed by a video editing apparatus; for example, it can be executed by a terminal device, a server, or other electronic equipment. The terminal device can be user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, an in-vehicle device, a wearable device, or the like. In some possible implementations, the video editing method can be implemented by a processor calling computer-readable instructions stored in memory. As shown in Figure 1, the video editing method includes steps S11 to S15.
[0140] In step S11, at least one video to be edited is obtained.
[0141] In step S12, the editing control information input by the user is obtained.
[0142] In step S13, the editing ratio information corresponding to different types of video segments is determined based on the editing control information.
[0143] In step S14, at least two candidate video segments are determined from the at least one video to be edited based on the editing ratio information.
[0144] In step S15, at least one candidate video is generated based on the at least two candidate video segments.
[0145] In the embodiments of this disclosure, a video to be edited is a video used as material for video editing. There can be one or more videos to be edited. A video to be edited can be an unprocessed video captured during filming or a video that has already been processed (e.g., cropped or compressed). It can be a panoramic video or a non-panoramic video. Panoramic videos are also called 360° videos; non-panoramic videos are also called normal-view or ordinary-view videos. When there are at least two videos to be edited, they can have been filmed at the same time or at different times; the embodiments of this disclosure do not limit the filming time of the different videos to be edited. Any video to be edited can be used to generate at least one video segment of a candidate video, and different videos to be edited can yield the same or different numbers of video segments.
[0146] In one possible implementation, the at least one video to be edited includes at least one panoramic video. In this implementation, the videos to be edited may include one or more panoramic videos. As an example of this implementation, every one of the at least one video to be edited is a panoramic video. In one example, the at least one video to be edited may include multiple panoramic videos taken by a traveler at different locations within the same city; for instance, the traveler can use a portable panoramic camera to capture panoramic videos at different scenic spots in the city. As another example of this implementation, there are at least two videos to be edited, and they include at least one panoramic video and at least one non-panoramic video.
[0147] As an example of this implementation, gnomonic projection can be applied to a panoramic video to transform the projection surface from a sphere to a plane, yielding a video with a normal viewing angle. The parameters of the gnomonic projection can be expressed as Equation 1:
[0148] $q = (x, y, f)$ (Equation 1)
[0149] where $x$ is the longitude of the projection center, $x \in [-180°, 180°)$; $y$ is the latitude of the projection center, $y \in [-90°, 90°)$; and $f = (f_x, f_y)$ is the field of view (FoV), with $f_x$ the field of view in the x-direction and $f_y$ the field of view in the y-direction. The ratio of $f_y$ to $f_x$ can be determined from the resolution of the target video, for example $f_y = 1.77 f_x$.
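To make Equation 1 concrete, the following is a minimal sketch of how a point of the normal-view output frame can be traced back onto the panoramic sphere for given projection parameters $q = (x, y, (f_x, f_y))$. The function name `gnomonic_ray`, the normalized image coordinates $(u, v) \in [-1, 1]^2$, and the axis conventions (u to the right, v upward, longitude positive to the right) are assumptions for illustration; sampling the panorama (e.g., an equirectangular frame) at the returned longitude and latitude, which is not shown, would produce the output pixel.

```python
import math


def gnomonic_ray(q, u, v):
    """Map a normalized output-frame point (u, v) in [-1, 1]^2 to a
    (longitude, latitude) pair in degrees on the panoramic sphere.

    q = (x, y, (f_x, f_y)) follows Equation 1: projection-center longitude,
    latitude and field of view, all in degrees. Assumed conventions:
    u grows to the right, v grows upward.
    """
    lon0, lat0, (fx, fy) = q
    lam, phi = math.radians(lon0), math.radians(lat0)
    # Point on the tangent plane at unit distance from the sphere centre.
    px = u * math.tan(math.radians(fx) / 2)
    py = v * math.tan(math.radians(fy) / 2)
    pz = 1.0
    # Tilt by the latitude of the projection centre (rotation about the x-axis).
    y1 = py * math.cos(phi) + pz * math.sin(phi)
    z1 = -py * math.sin(phi) + pz * math.cos(phi)
    # Pan by the longitude of the projection centre (rotation about the y-axis).
    x2 = px * math.cos(lam) + z1 * math.sin(lam)
    z2 = -px * math.sin(lam) + z1 * math.cos(lam)
    lon = math.degrees(math.atan2(x2, z2))
    lat = math.degrees(math.atan2(y1, math.hypot(x2, z2)))
    return lon, lat
```

For example, with $f_x = 45°$ and $f_y = 1.77 f_x$, the frame center `gnomonic_ray((30.0, 10.0, (45.0, 79.65)), 0.0, 0.0)` lands on the projection center itself, and the right edge of a frame centred at longitude 0 lies $f_x / 2 = 22.5°$ away.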
[0150] In one example, the at least one video to be edited may include $I$ panoramic videos, where the $i$-th panoramic video can be denoted $\varepsilon_i$, $i \in [0, I)$.
[0151] In other possible examples, other projections, such as cylindrical projection, can be used to project the panoramic videos; the projection method is not limited here.
[0152] In related technologies, when generating normal-viewpoint videos based on panoramic videos, users need to select a viewpoint frame by frame from the panoramic video. In this implementation, by obtaining at least one panoramic video and acquiring editing control information input by the user, the editing ratio information corresponding to different types of video segments is determined based on the editing control information. Based on the editing ratio information, at least two candidate video segments are determined from the at least one panoramic video, and at least one candidate video is generated based on the at least two candidate video segments. This effectively generates candidate videos from panoramic videos that meet the user's video editing needs, and also makes the different types of video segments in the edited video more harmonious and aesthetically pleasing.
[0153] In another possible implementation, the at least one video to be edited may be a non-panoramic video.
[0154] In this embodiment of the disclosure, the editing control information is information used to control the video editing process; at a minimum, it can control the proportion and/or duration of different types of video segments in the candidate video. The editing ratio information can represent the proportion and/or duration of the different types of video segments in the candidate video.
[0155] In one possible implementation, at least two editing ratio information items can be determined based on the editing control information. In this implementation, the same editing control information can correspond to different editing ratio information items; that is, based on the same editing control information, at least two distinct editing ratio information items can be determined, thereby helping to generate more diverse candidate videos.
[0156] In another possible implementation, the editing control information can be mapped one-to-one with the editing ratio information. In this implementation, a unique editing ratio can be determined based on the editing control information.
[0157] In one possible implementation, the editing control information includes information about the target editing element; determining the editing ratio information corresponding to different types of video segments based on the editing control information includes: determining the ratio of the video segment corresponding to the target editing element, and the ratio of the video segment corresponding to at least one editing element other than the target editing element, based on the information about the target editing element.
[0158] In this implementation, the editing control information may include control information corresponding to editing elements, and this control information may include the information about the target editing element. An editing element represents the focus of the video edit, that is, the content of primary concern in the edit; it may also be called focus content, content center, key content, important content, etc., without limitation here. The target editing element may include a specified object type and/or a specified object, and there may be one or more target editing elements. The editing ratio information may include ratio information corresponding to the editing elements.
[0159] As an example of this implementation, the target editing element may include a specified object type, for example at least one of people, buildings, animals, plants, etc. In this example, the information about the target editing element may include the identification information of the specified object type, or both the identification information and the proportion of the specified object type. The identification information of a specified object type is any information that uniquely identifies it, such as its name or number. In this example, the user can select one or more object types as target editing elements: for example, people only, buildings only, or both people and buildings.
[0160] Figure 2 is a schematic diagram of an editing element selection interface in a video editing method provided by an embodiment of the present disclosure. In the example shown in Figure 2, users can set the proportions of people and buildings using the editing element controls.
[0161] Figure 3 is another schematic diagram of the editing element selection interface in the video editing method provided in this embodiment. In the example shown in Figure 3, users can set the proportions of three types, namely people, buildings, and other objects, using the editing element controls.
[0162] As another example of this implementation, the target editing element can include a specified object, for example a specified person, two or more specified persons, a specified bridge, two specified puppies, or a specified tree. In this example, the information about the target editing element can include the identification information of the specified object, or both the identification information and the proportion of the specified object.
[0163] In this implementation, semantic segmentation can be performed on each of the at least one video to be edited to obtain a corresponding semantic segmentation result. For example, persons can be segmented from the video to be edited, and the resulting semantic segmentation result can include the size and position of the bounding box of each person in the video. Similarly, buildings can be segmented from the video, and the result can include the size and position of the bounding box of each building.
[0164] As an example of this implementation, for any video to be edited, semantic segmentation can be performed on the keyframes of the video to improve the speed of semantic segmentation. The keyframes of the video to be edited can be determined using a uniform sampling method.
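Uniform keyframe sampling can be as simple as taking one frame from each of a fixed number of equal-length intervals. The sketch below assumes the middle frame of each interval is used; the function name and the choice of the middle frame are illustrative assumptions, not details from this disclosure.

```python
def uniform_keyframes(num_frames, num_keyframes):
    """Return indices of num_keyframes frames spread evenly over a video.

    The video is split into num_keyframes equal-length intervals and the
    middle frame of each interval is taken (an assumed sampling rule).
    """
    if num_frames <= 0 or num_keyframes <= 0:
        raise ValueError("frame counts must be positive")
    if num_keyframes >= num_frames:
        # Fewer frames than requested keyframes: use every frame.
        return list(range(num_frames))
    step = num_frames / num_keyframes
    return [int(i * step + step / 2) for i in range(num_keyframes)]
```

For a 100-frame video and 4 keyframes this yields frames 12, 37, 62 and 87; only these frames would then be passed to semantic segmentation.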
[0165] In one example, semantic segmentation of the $i$-th video $\varepsilon_i$ among the $I$ videos to be edited yields a number of buildings, persons, and other objects. Across the $I$ videos to be edited, $K_b$ buildings, $K_p$ persons, and $K_o$ other objects can be segmented in total, where the other objects are objects of types other than people and buildings.
[0166] In one example, the ratio information corresponding to the editing elements can be represented as $R_s = (r_s^p, r_s^b, r_s^o)$, where $r_s^p$ is the proportion of video segments corresponding to persons, $r_s^b$ is the proportion of video segments corresponding to buildings, $r_s^o$ is the proportion of video segments corresponding to other types, and $r_s^p + r_s^b + r_s^o = 1$. After the ratio information corresponding to the editing elements has been determined, the corresponding numbers of persons, buildings, and other objects can be selected from the $K_p$ persons, the $K_b$ buildings, and the $K_o$ other objects, respectively.
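As a minimal sketch of how slider settings such as those in Figures 2 and 3 could be turned into ratio information whose components sum to 1, the helper below normalizes non-negative user weights. The function name, the dictionary keys, and the idea that the controls produce unnormalized weights are all assumptions for illustration.

```python
def ratios_from_controls(weights):
    """Normalize non-negative control weights, e.g.
    {"person": 3, "building": 1, "other": 0}, into proportions summing to 1.
    """
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {key: w / total for key, w in weights.items()}
```

With weights 3, 1 and 0 for persons, buildings and other objects, the result corresponds to $r_s^p = 0.75$, $r_s^b = 0.25$, $r_s^o = 0$.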
[0167] In this implementation, the proportion of video segments corresponding to the target editing element and the proportion of video segments corresponding to at least one other editing element are determined from the information about the target editing element. Video editing can therefore be driven by semantic segmentation results rather than by raw pixels, which meets the user's needs for, that is, preferences regarding, the video content.
[0168] In another possible implementation, the editing control information includes information about the target editing element; determining the editing ratio information corresponding to different types of video segments based on the editing control information includes: determining the duration of the video segment corresponding to the target editing element and the duration of the video segment corresponding to at least one editing element other than the target editing element based on the information about the target editing element.
[0169] In one possible implementation, the editing control information includes information about the target video style; determining the editing ratio information corresponding to different types of video segments based on the editing control information includes: determining the ratio of video segments corresponding to at least two types of camera movements based on the information about the target video style.
[0170] In this implementation, the editing control information may include control information corresponding to the video style, and the control information corresponding to the video style may include information about the target video style. The target video style may include one or more styles. The target video style information may include identification information of the target video style, or it may include the identification information of the target video style and the proportion of the target video style. The identification information of the target video style may represent information that can uniquely identify the target video style. For example, the identification information of the target video style may be the name, number, etc., of the target video style.
[0171] As an example of this implementation, video styles can include soothing and dynamic. Figure 4 is a schematic diagram of a video style selection interface in the video editing method provided in this embodiment. In the example shown in Figure 4, users can set the ratio of soothing video segments to dynamic video segments using the video style controls. Of course, those skilled in the art can design more video styles for users to choose from according to actual application needs and/or personal preferences; this is not limited here.
[0172] In this implementation, the editing ratio information may include ratio information corresponding to the camera movement types. A first correspondence between video style information and the ratio of camera movement types can be established in advance, and the ratio of video segments corresponding to at least two camera movement types can then be determined from the target video style information and the first correspondence. The ratio of camera movement types represents the ratio of the video segments that use those camera movement types. The camera movement types may include at least two of the following: static, zoom out, zoom in, rotation, panning, and rapid shaking.
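One way to picture the first correspondence is as a lookup table from each video style to a distribution over camera movement types, blended by the style proportions the user selects in an interface like Figure 4. In the sketch below, the table values are hypothetical placeholders chosen only to mirror the pairing in the following examples (soothing with static shots, dynamic with rotation and zooming); they are not values from this disclosure.

```python
# Hypothetical first correspondence: style -> proportions of camera movement types.
# Placeholder values for illustration only.
STYLE_TO_CAMERA = {
    "soothing": {"static": 1.0, "zoom": 0.0, "rotation": 0.0},
    "dynamic": {"static": 0.0, "zoom": 0.5, "rotation": 0.5},
}


def camera_ratios(style_weights, table=STYLE_TO_CAMERA):
    """Blend per-style camera-movement distributions by the user's style weights."""
    total = sum(style_weights.values())
    if total <= 0:
        raise ValueError("style weights must sum to a positive value")
    result = {}
    for style, weight in style_weights.items():
        for motion, ratio in table[style].items():
            result[motion] = result.get(motion, 0.0) + (weight / total) * ratio
    return result
```

Because every row of the table sums to 1, the blended camera movement ratios also sum to 1; an even soothing/dynamic split gives static 0.5, zoom 0.25 and rotation 0.25.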
[0173] In one example, when the video style is soothing, the corresponding camera movement type can include static. For instance, when the camera movement type is static, the projection center can be fixed at the centroid of the editing element, the field of view can remain at $f_x = 45°$, $f_y = 1.77 f_x$, and the camera neither rotates nor moves.
[0174] In another example, when the video style is dynamic, the corresponding camera movement type can include rotation. For instance, when the camera movement type is rotation, the field of view can remain at $f_x = 45°$, $f_y = 1.77 f_x$, and the projection center can move from any point on the lower boundary of the editing element's bounding box to any point on its upper boundary, or from any point on the upper boundary to any point on the lower boundary.
[0175] In another example, when the video style is dynamic, the corresponding camera movement type can include zoom in. For instance, when the camera movement type is zoom in, the projection center can be fixed at the centroid of the editing element, and $f_x$ can be gradually reduced from 45° to 25° or from 65° to 45°, with $f_y = 1.77 f_x$.
[0176] In another example, when the video style is dynamic, the corresponding camera movement type can include zoom out. For instance, when the camera movement type is zoom out, the projection center can be fixed at the centroid of the editing element, and $f_x$ can be gradually increased from 25° to 45° or from 45° to 65°, with $f_y = 1.77 f_x$.
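The four camera movement examples above fix only the projection parameters of the first and last frames of a segment. A minimal sketch of that step is shown below, assuming the editing element's bounding box is given as `(lon_min, lat_min, lon_max, lat_max)` in degrees with latitude increasing upward, and approximating the element's centroid by the centre of its bounding box; the function and type names are hypothetical.

```python
import random

FOV_X = 45.0       # default horizontal field of view in the examples above
FY_OVER_FX = 1.77  # f_y = 1.77 f_x


def endpoint_params(motion, bbox, rng):
    """Return (q_start, q_end) with q = (lon, lat, (f_x, f_y)) in degrees.

    bbox = (lon_min, lat_min, lon_max, lat_max) of the editing element
    (assumed representation); its centre stands in for the centroid.
    """
    lon_min, lat_min, lon_max, lat_max = bbox
    cx, cy = (lon_min + lon_max) / 2, (lat_min + lat_max) / 2

    def q(lon, lat, fx):
        return (lon, lat, (fx, FY_OVER_FX * fx))

    if motion == "static":
        return q(cx, cy, FOV_X), q(cx, cy, FOV_X)
    if motion == "rotation":
        # From a point on the lower boundary to one on the upper boundary, or the reverse.
        low = q(rng.uniform(lon_min, lon_max), lat_min, FOV_X)
        high = q(rng.uniform(lon_min, lon_max), lat_max, FOV_X)
        return (low, high) if rng.random() < 0.5 else (high, low)
    if motion == "zoom_in":      # field of view shrinks
        start, end = rng.choice([(45.0, 25.0), (65.0, 45.0)])
    elif motion == "zoom_out":   # field of view grows
        start, end = rng.choice([(25.0, 45.0), (45.0, 65.0)])
    else:
        raise ValueError(f"unknown camera movement type: {motion}")
    return q(cx, cy, start), q(cx, cy, end)
```

Passing a seeded `random.Random` makes the choice of start and end points reproducible, which is convenient when several candidate segments are generated for the same position.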
[0177] In one example, the ratio information corresponding to the camera movement types can be represented as $R_c = (r_c^{\mathrm{static}}, r_c^{\mathrm{zoom}}, r_c^{\mathrm{rot}})$, where $r_c^{\mathrm{static}}$ is the proportion of static video segments, $r_c^{\mathrm{zoom}}$ is the proportion of zoom-in and/or zoom-out video segments, $r_c^{\mathrm{rot}}$ is the proportion of rotating video segments, and $r_c^{\mathrm{static}} + r_c^{\mathrm{zoom}} + r_c^{\mathrm{rot}} = 1$. For example, $R_c$ takes one preset value when the user selects the soothing video style and another when the user selects the dynamic video style.
[0178] In this implementation, by determining the proportion of video segments corresponding to at least two types of camera movement based on the information of the target video style, the user's needs for artistic techniques in the video can be met.
[0179] In another possible implementation, the editing control information includes information about the target video style; determining the editing ratio information corresponding to different types of video segments based on the editing control information includes: determining the duration of video segments corresponding to at least two types of camera movements based on the information about the target video style.
[0180] In this embodiment of the disclosure, after determining the editing ratio information, at least two candidate video segments can be determined from the at least one video to be edited.
[0181] In one possible implementation, all candidate video segments can be non-panoramic video segments. For example, the field of view of the candidate video segments can be between 25° and 65°. Of course, the field of view of the candidate video segments can also be larger or smaller, and those skilled in the art can flexibly set it according to the actual application scenario requirements, which is not limited here.
[0182] In another possible implementation, candidate video clips may also include panoramic video clips.
[0183] In one possible implementation, the editing control information includes target duration information; determining at least two candidate video segments from the at least one video to be edited based on the editing ratio information includes: determining a target number of video segments in the candidate videos based on the target duration information; and determining at least two candidate video segments from the at least one video to be edited based on the editing ratio information and the target number.
[0184] In this implementation, the target duration information can be any information that represents the user's desired video duration. As one example, the target duration information may include a target duration; in this case, a second correspondence between duration and the number of video segments can be obtained, and the target number can be determined from the target duration and the second correspondence. As another example, the target duration information may directly include the target number, which can then be read from the target duration information.
[0185] Figure 5 is a schematic diagram of a duration selection interface in a video editing method provided in an embodiment of this disclosure. As shown in Figure 5, users can adjust the duration of the target video using the duration adjustment control.
[0186] In this implementation, the number of video segments in a candidate video can represent the number of shots in the candidate video. Scene transitions are allowed between adjacent video segments in a candidate video; that is, the number of video segments in a candidate video can be equal to the number of scene transitions in the candidate video plus one. Video segments in a candidate video can also be called shots or shot fragments, etc., without limitation.
[0187] In this implementation, when the target number is greater than one, the candidate video includes multiple video segments. Each video segment in the candidate video corresponds one-to-one to a video segment position, and the number of video segment positions equals the target number. For example, if the target number is 10, there are 10 video segment positions.
[0188] As an example of this implementation, the shooting time of the candidate video segments corresponding to earlier-ranked video segment positions is earlier than the shooting time of the candidate video segments corresponding to later-ranked video segment positions. For example, the shooting time of each candidate video segment corresponding to the first video segment position is earlier than the shooting time of each candidate video segment corresponding to the second video segment position; the shooting time of each candidate video segment corresponding to the second video segment position is earlier than the shooting time of each candidate video segment corresponding to the third video segment position; and so on. Based on this example, it is possible to generate candidate videos with a more natural temporal sequence, and to reduce the computational load of video editing, thereby increasing the speed of video editing.
[0189] Of course, it is also possible not to limit the shooting time of the candidate video segments corresponding to the video segment positions, in order to generate more diverse candidate videos.
[0190] In one example, the target number can be denoted $L$, where $L \in [0, 2I]$. In another example, the duration of any candidate video segment can be 1 to 20 seconds, for example around 10 seconds.
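A minimal sketch of a second correspondence from target duration to target number, as described in [0184], is given below. It assumes one video segment per roughly 10 seconds (the typical segment duration mentioned above), caps the result at $2I$, and enforces at least two segments; the function name and these specific rules are illustrative assumptions rather than the mapping used in this disclosure.

```python
def target_segment_count(target_seconds, num_videos, seconds_per_segment=10.0):
    """Hypothetical second correspondence: target duration -> target number L.

    Assumes about one segment per `seconds_per_segment` seconds, clamped to
    [2, 2 * num_videos] so that L <= 2I and at least two segments are used.
    """
    if target_seconds <= 0 or num_videos <= 0:
        raise ValueError("duration and number of videos must be positive")
    count = round(target_seconds / seconds_per_segment)
    return max(2, min(count, 2 * num_videos))
```

For four videos to be edited, a 60-second target gives 6 segments, while a 300-second target is capped at $2I = 8$.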
[0191] Compared to adjusting the video speed (i.e., speeding up the video), this implementation determines the target number of video segments in the candidate video based on the target duration information, and determines at least two candidate video segments from the at least one video to be edited based on the editing ratio information and the target number. This allows for adjusting the number of video segments in the candidate video according to the target duration information, thereby enabling the editing of a more natural video.
[0192] As an example of this implementation, determining at least two candidate video segments from the at least one video to be edited based on the editing ratio information and the target number includes: determining, based on the editing ratio information and the target number, the editing elements and camera movement types corresponding to at least two video segment positions in the candidate video; determining, from the at least one video to be edited, the video to be edited corresponding to each of the at least two video segment positions; for any one of the at least two video segment positions, determining, based on the editing element and camera movement type corresponding to that position, the projection parameters of the initial video frame and of the end video frame of the candidate video segment from the video to be edited corresponding to that position, where the projection parameters of a video frame include the position of the projection center and the field of view; determining the projection parameters of the intermediate video frames of the candidate video segment based on the projection parameters of the initial video frame and of the end video frame; and determining the candidate video segment corresponding to that position from the corresponding video to be edited based on the projection parameters of the initial, end, and intermediate video frames.
[0193] In this example, the editing ratio information can include the ratio information corresponding to the editing elements and the ratio information corresponding to the camera movement type. Based on the editing ratio information, the editing elements and camera movement types corresponding to each video segment position can be determined randomly, or they can be determined according to preset rules; no limitation is made here. In this example, the editing elements and camera movement types corresponding to each video segment position can be fixed; that is, the editing elements and camera movement types of different candidate video segments corresponding to any given video segment position can be the same.
[0194] In one example, if the ratio information corresponding to the editing elements is $R_s = (r_s^p, r_s^b, r_s^o)$ (the proportions for persons, buildings, and other objects), then among the $L$ video segment positions, a fraction $r_s^p$ of the positions have a person as the editing element, a fraction $r_s^b$ have a building as the editing element, and a fraction $r_s^o$ have "other" as the editing element. Likewise, if the ratio information corresponding to the camera movement types is $R_c = (r_c^{\mathrm{static}}, r_c^{\mathrm{zoom}}, r_c^{\mathrm{rot}})$ (the proportions for static, zoom-in or zoom-out, and rotating segments), then among the $L$ video segment positions, a fraction $r_c^{\mathrm{static}}$ of the positions have a static camera, a fraction $r_c^{\mathrm{zoom}}$ zoom in or zoom out, and a fraction $r_c^{\mathrm{rot}}$ rotate.
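Turning a ratio such as $r_s^p$ into a whole number of positions requires rounding while keeping the total equal to $L$. The sketch below uses largest-remainder rounding and then shuffles the labels, matching the "determined randomly" option of [0193]; the rounding rule and the function names are assumptions, and the same helper can be applied to the editing elements and to the camera movement types.

```python
import random


def counts_from_ratios(ratios, L):
    """Largest-remainder rounding of ratio * L so that the counts sum to L.

    Assumes the ratios are non-negative and sum to 1.
    """
    raw = {key: r * L for key, r in ratios.items()}
    counts = {key: int(v) for key, v in raw.items()}  # floor, values are >= 0
    leftover = L - sum(counts.values())
    # Give the remaining positions to the largest fractional parts.
    by_fraction = sorted(raw, key=lambda key: raw[key] - counts[key], reverse=True)
    for key in by_fraction[:leftover]:
        counts[key] += 1
    return counts


def assign_positions(ratios, L, rng):
    """Randomly assign one label (e.g. editing element) to each of L positions."""
    labels = [key for key, n in counts_from_ratios(ratios, L).items() for _ in range(n)]
    rng.shuffle(labels)
    return labels
```

For example, with ratios 0.5, 0.3 and 0.2 for persons, buildings and other objects and $L = 10$, five positions receive a person, three a building and two "other", in random order.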
[0195] In one example, some video segment positions may have no corresponding editing element.
[0196] In this example, the video to be edited corresponding to any one of the at least two video segment positions can be fixed. In this example, different candidate video segments corresponding to the same video segment position can be determined from the same video to be edited. For any video segment position, at least one candidate video segment corresponding to that video segment position can be determined based on the video to be edited, the editing elements, and the camera movement type, and the at least one candidate video segment corresponding to that video segment position can constitute a candidate video segment pool corresponding to that video segment position.
[0197] In this example, the editing elements and camera movement types corresponding to at least two video segment positions in the candidate video are determined based on the editing ratio information and the target number, and the video to be edited corresponding to each of these positions is determined from the at least one video to be edited. For each position, the projection parameters of the initial and end video frames of the corresponding candidate video segment are determined from the corresponding video to be edited according to the position's editing element and camera movement type; the projection parameters of the intermediate video frames are then determined from those of the initial and end video frames; and finally the candidate video segment is extracted from the corresponding video to be edited using the projection parameters of the initial, intermediate, and end video frames. This improves the speed of video editing.
[0198] In one example, determining the projection parameters corresponding to the intermediate video frames of the candidate video segment based on the projection parameters corresponding to the initial video frame and the projection parameters corresponding to the end video frame includes: determining the total number of frames of the candidate video segment and the camera motion parameters corresponding to the candidate video segment; and determining the projection parameters corresponding to the intermediate video frames of the candidate video segment based on the projection parameters corresponding to the initial video frame, the projection parameters corresponding to the end video frame, the total number of frames, and the camera motion parameters.
[0199] In one example, among the $L$ video segment positions, any candidate video segment corresponding to the $l$-th position can be denoted $v_l$. The projection parameters of the initial video frame of $v_l$ can be denoted $q_l^0$, the projection parameters of its end video frame can be denoted $q_l^{T_l-1}$, its total number of frames can be denoted $T_l$, and its camera motion parameter can be denoted $\alpha$. The candidate video segment $v_l$ can then be parameterized as $v_l = (q_l^0, q_l^{T_l-1}, T_l, \alpha)$.
[0200] In one example, the projection parameters of the intermediate video frames of candidate video segment $v_l$ can be obtained by interpolation with an easing function. For example, the projection parameters $q_l^t$ of the $t$-th frame of $v_l$ can be determined using Equations 2 and 3:
[0201]
[0202]
[0203] where $1 \le t \le T_l - 2$. The total number of frames $T_l$ can be used to control the movement speed of the projection center within the candidate video segment: for given initial and end projection parameters, a larger $T_l$ makes the projection center move more slowly, and a smaller $T_l$ makes it move faster. The camera motion parameter $\alpha$ can be used to control how the movement speed of the projection center varies over time within the candidate video segment. For example, a larger $\alpha$ makes the projection center move slowly at first and then fast, and a smaller $\alpha$ makes it move fast at first and then slowly. In one example, $\alpha \in \{0.1, 1, 10\}$.
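The exact form of Equations 2 and 3 is not available in this text. The sketch below assumes a power easing, $s(t) = (t / (T_l - 1))^{\alpha}$, followed by linear interpolation between $q_l^0$ and $q_l^{T_l-1}$. This is a swapped-in choice that reproduces the behaviour described for $\alpha$ (slow-then-fast for $\alpha = 10$, linear for $\alpha = 1$, fast-then-slow for $\alpha = 0.1$) and for $T_l$, not necessarily the disclosed formula. Each parameter tuple is assumed to be `(lon, lat, f_x)`, with $f_y = 1.77 f_x$ derived afterwards, and longitude is interpolated along the shorter arc.

```python
def interpolate_params(q_start, q_end, T, alpha):
    """Projection parameters for frames 0..T-1 of a segment, q = (lon, lat, f_x).

    Assumed easing: s(t) = (t / (T - 1)) ** alpha, then linear interpolation.
    Longitude takes the shorter way around and is wrapped to [-180, 180).
    """
    if T < 2:
        raise ValueError("a segment needs at least two frames")
    lon0, lat0, fx0 = q_start
    lon1, lat1, fx1 = q_end
    dlon = (lon1 - lon0 + 180.0) % 360.0 - 180.0  # signed shortest longitude step
    frames = []
    for t in range(T):
        s = (t / (T - 1)) ** alpha  # 0 at the first frame, 1 at the last
        lon = (lon0 + s * dlon + 180.0) % 360.0 - 180.0
        frames.append((lon, lat0 + s * (lat1 - lat0), fx0 + s * (fx1 - fx0)))
    return frames
```

With $T_l = 5$ and $\alpha = 1$, the middle frame between $(0°, 0°, 45°)$ and $(40°, 20°, 25°)$ is $(20°, 10°, 35°)$; with $\alpha = 10$ the projection center has barely moved by the middle frame, and with $\alpha = 0.1$ it has covered most of the path.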
[0204] In this example, by determining the total number of frames of the candidate video segment and the camera motion parameters corresponding to the candidate video segment, and based on the projection parameters corresponding to the initial video frame, the projection parameters corresponding to the end video frame, the total number of frames, and the camera motion parameters, the projection parameters corresponding to the intermediate video frames of the candidate video segment are determined. This allows for the rapid determination of the projection parameters of the intermediate video frames of the candidate video segment, thereby enabling the rapid identification of the candidate video segment.
[0205] As another example of this implementation, the editing elements corresponding to any of the at least two video segment positions can be non-fixed. That is, the editing elements corresponding to different candidate video segments at the same video segment position can be different, as long as the editing elements corresponding to the candidate video segments in the candidate video meet the proportional information corresponding to the editing elements.
[0206] As another example of this implementation, the camera motion type corresponding to any of the at least two video segment positions can be non-fixed. That is, the camera motion types corresponding to different candidate video segments at the same video segment position can be different, as long as the camera motion type corresponding to the candidate video segment in the candidate video conforms to the proportional information corresponding to the camera motion type.
[0207] As another example of this implementation, the video to be edited corresponding to any of the at least two video segment positions can be non-fixed, that is, different candidate video segments corresponding to the same video segment position can be determined from different videos to be edited.
[0208] In another possible implementation, the target duration and/or the target number can be fixed, in which case the user does not need to input target duration information.
[0209] In one possible implementation, generating at least one candidate video based on the at least two candidate video segments includes: for a first candidate video segment among the at least two candidate video segments, determining a first score corresponding to a candidate video segment pair to which the first candidate video segment belongs, where the pair consists of the first candidate video segment and a second candidate video segment whose video segment position is adjacent to that of the first candidate video segment; retaining the first candidate video segment in response to the first score satisfying a preset condition; and generating at least one candidate video from the retained candidate video segments among the at least two candidate video segments.
[0210] The first candidate video segment can represent any one of the at least two candidate video segments. The position of the video segment corresponding to the second candidate video segment can be before or after the first candidate video segment. For example, if the first candidate video segment is the candidate video segment corresponding to the second video segment position, then the second candidate video segment can be the candidate video segment corresponding to the first video segment position or the candidate video segment corresponding to the third video segment position. In this implementation, the first candidate video segment can be retained if the first score corresponding to any candidate video segment pair to which the first candidate video segment belongs satisfies a preset condition. For example, if the first candidate video segment is the candidate video segment corresponding to the second video segment position, the first video segment position includes 5 candidate video segments, and the third video segment position includes 4 candidate video segments, then combining the first candidate video segment with each candidate video segment at the first and third video segment positions yields 9 candidate video segment pairs. The first candidate video segment can be retained if the first score corresponding to any of these 9 candidate video segment pairs satisfies a preset condition.
[0211] In one example, the first candidate video segment can be denoted $v_l$ and the second candidate video segment $v_{l+1}$, and the first score corresponding to the candidate video segment pair formed by $v_l$ and $v_{l+1}$ can then be obtained for that pair.
[0212] As an example of this implementation, the N candidate video segments with the highest first scores can be retained, where N is greater than or equal to 3. Figure 6 illustrates the process of filtering candidate video segments using the first score in a video editing method provided by an embodiment of this disclosure. In one example, the filtering can begin at the first video segment position, and at most N candidate video segments can be retained at each stage of the filtering.
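The stage-wise filtering of [0212] resembles a beam search. The sketch below treats it that way, which is an assumption: it keeps partial candidate videos rather than individual segments, ranks them by the sum of the first scores of their adjacent pairs, and keeps at most N of them after each video segment position. `stagewise_filter` and `first_score` are hypothetical names.

```python
import heapq
from typing import Callable, List, Sequence, Tuple

Beam = Tuple[float, List[str]]  # (cumulative first score, partial candidate video)


def stagewise_filter(candidates: Sequence[Sequence[str]],
                     first_score: Callable[[str, str], float],
                     n: int = 3) -> List[Beam]:
    """Keep at most n partial candidate videos after each video segment position.

    Assumption: a partial video is ranked by the sum of the first scores of its
    adjacent pairs. The first position has no left neighbour, so its first n
    candidates are kept in their given order.
    """
    beams: List[Beam] = [(0.0, [seg]) for seg in candidates[0]][:n]
    for pos in range(1, len(candidates)):
        # Extend every retained partial video with every candidate at this position.
        expanded = [(total + first_score(seq[-1], seg), seq + [seg])
                    for total, seq in beams
                    for seg in candidates[pos]]
        # heapq.nlargest keeps the n highest-scoring partial videos.
        beams = heapq.nlargest(n, expanded, key=lambda beam: beam[0])
    return beams
```

Each returned entry is a complete candidate video together with its cumulative score, so the number of candidate videos produced is bounded by N, in line with the reduction described in [0214].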
[0213] As another example of this implementation, candidate video segments with a first score greater than or equal to a first preset threshold can be retained.
[0214] In this implementation, candidate videos are generated by filtering candidate video segments, which reduces the number of candidate videos generated and improves the quality of the generated candidate videos.
[0215] As an example of this implementation, determining the first score corresponding to the candidate video segment pair to which the first candidate video segment belongs includes: determining the second score corresponding to the first candidate video segment and the second score corresponding to the second candidate video segment; determining the third score corresponding to the candidate video segment pair; and determining the first score corresponding to the candidate video segment pair based on the second score corresponding to the first candidate video segment, the second score corresponding to the second candidate video segment, and the third score.
[0216] In this example, a second score can be determined for each of the at least two candidate video segments. The second score represents the score of a single candidate video segment. The third score represents the score of a candidate video segment pair, where each pair consists of two adjacent candidate video segments. The third score can be determined according to preset rules, which can be rules designed manually by experts. The third score can be used to characterize the coordination between the two candidate video segments in a pair, and / or the coordination within a single candidate video segment of the pair.
[0217] In this example, determining the second scores of the first and second candidate video segments allows the pair they form to be analyzed with a single video segment as the smallest unit of analysis. Determining the third score of the pair allows the correlation between the first and second candidate video segments to be analyzed. Determining the first score of the pair from both the second scores and the third score helps retain higher-quality candidate video segments.
[0218] In one example, determining the second score corresponding to the first candidate video segment and the second score corresponding to the second candidate video segment includes: determining the second score corresponding to the first candidate video segment based on the first editing element corresponding to the first candidate video segment; and determining the second score corresponding to the second candidate video segment based on the second editing element corresponding to the second candidate video segment.
[0219] For example, if the editing element corresponding to a candidate video clip is a person, the second score of that clip can be denoted …; if the editing element corresponding to a candidate video clip is a building, the second score of that clip can be denoted ….
[0220] In this example, the first editing element is the editing element corresponding to the first candidate video segment, and the second editing element is the editing element corresponding to the second candidate video segment. Determining the second score of each candidate video segment from its own editing element allows information about the editing elements of the candidate video segments to be captured more effectively.
[0221] In one example, determining the second score corresponding to the first candidate video segment based on the first editing element corresponding to the first candidate video segment includes: cropping video content corresponding to the first editing element from the first candidate video segment based on the first editing element; and processing the first candidate video segment and the video content corresponding to the first editing element using a pre-trained first neural network to obtain the second score corresponding to the first candidate video segment, wherein the first neural network is pre-trained using a training video set whose training videos are real captured videos. Determining the second score corresponding to the second candidate video segment based on the second editing element corresponding to the second candidate video segment includes: cropping video content corresponding to the second editing element from the second candidate video segment based on the second editing element; and processing the second candidate video segment and the video content corresponding to the second editing element using the first neural network to obtain the second score corresponding to the second candidate video segment.
[0222] In one example, K first keyframes can be determined from a first candidate video segment, and image regions corresponding to first editing elements can be cropped from the K first keyframes to obtain K first editing element images, where K is an integer greater than or equal to 1. The K first keyframes and K first editing element images can be input into a first neural network to obtain a second score corresponding to the first candidate video segment. Similarly, K second keyframes can be determined from a second candidate video segment, and image regions corresponding to second editing elements can be cropped from the K second keyframes to obtain K second editing element images. The K second keyframes and K second editing element images can be input into a first neural network to obtain a second score corresponding to the second candidate video segment.
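The keyframe and cropping step of [0222] can be sketched without any deep learning framework. In this sketch a frame is a nested list of pixel values, the editing element's bounding box in every frame is assumed to be given, and keyframes are assumed to be sampled at the centres of K equal-length bins; the text does not specify the sampling rule. The returned pair (K keyframes, K editing element images) is what would be fed to the first neural network. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

Frame = List[List[int]]          # hypothetical: one frame as a 2-D grid of pixel values
Box = Tuple[int, int, int, int]  # hypothetical: (top, left, bottom, right), end-exclusive


def keyframe_indices(num_frames: int, k: int) -> List[int]:
    """Pick k keyframes at the centres of k equal-length bins (assumed sampling rule)."""
    if k < 1 or num_frames < k:
        raise ValueError("need 1 <= k <= num_frames")
    return [int((i + 0.5) * num_frames / k) for i in range(k)]


def crop(frame: Frame, box: Box) -> Frame:
    """Cut the editing element region out of one frame."""
    top, left, bottom, right = box
    return [row[left:right] for row in frame[top:bottom]]


def first_network_inputs(frames: Sequence[Frame], boxes: Sequence[Box],
                         k: int) -> Tuple[List[Frame], List[Frame]]:
    """Return the K keyframes and the K editing element images of one segment.

    `boxes[i]` is the editing element's region in frame i; how it is obtained
    (for example by a detector or tracker) is outside this sketch.
    """
    idx = keyframe_indices(len(frames), k)
    keyframes = [frames[i] for i in idx]
    element_images = [crop(frames[i], boxes[i]) for i in idx]
    return keyframes, element_images
```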
[0223] Figure 7 illustrates the first neural network in the video editing method provided in an embodiment of the present disclosure. As shown in Figure 7, the first neural network can include two branches: the input to one branch can be an editing element image, and the input to the other branch can be a keyframe. The first neural network can fuse the feature information extracted by the two branches, and can use ResNet18 as its backbone network.
[0224] In this example, the first neural network can also be called the first discriminator. The first neural network can be trained based on captured video. Here, captured video can refer to video directly captured by a video capture device, rather than synthetic video.
[0225] By adopting this example, candidate video clips that are closer to the real video can receive a higher second score, while candidate video clips that are far removed from the real video can receive a lower second score, thus enabling the selection of candidate video clips that are closer to the real video. Here, the real video refers to the captured video.
[0226] In one example, determining the third score corresponding to the candidate video segment pair includes: determining the third score corresponding to the candidate video segment pair based on at least one of the following: consistency information between the movement direction of the video frame in the first candidate video segment and the movement direction of the person in the first candidate video segment; consistency information between the movement direction of the video frame in the second candidate video segment and the movement direction of the person in the second candidate video segment; consistency information between the movement directions of the video frames in the first candidate video segment and the second candidate video segment; and similarity information between the editing elements corresponding to the first candidate video segment and the second candidate video segment.
[0227] In this example, a higher third score is obtained if the movement direction of the video frames in the candidate video clip is the same as the movement direction of the person in the candidate video clip; a lower third score is obtained if the movement direction of the video frames in the candidate video clip is not the same as the movement direction of the person in the candidate video clip. For example, a higher third score is obtained if the movement direction of the video frames in the first candidate video clip and the movement direction of the person in the first candidate video clip are both to the left.
[0228] In this example, a higher third score can be obtained if the movement direction of the video frames in the first candidate video segment is consistent with that of the video frames in the second candidate video segment; a lower third score can be obtained if the two movement directions are inconsistent.
[0229] In this example, if the editing elements corresponding to the first candidate video clip and the second candidate video clip are different, a higher third score can be obtained; if the editing elements corresponding to the first candidate video clip and the second candidate video clip are the same, a lower third score can be obtained.
[0230] By adopting this example, more harmonious candidate video clip pairs can achieve higher third scores.
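The rules of [0227] to [0229] can be encoded as signed terms: agreement adds a weight and disagreement subtracts it. The sketch below is only illustrative; the direction labels, the equal weights, and the choice to skip the person term when no person is tracked are all assumptions, since the expert-designed rules themselves are not given.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SegmentInfo:
    frame_direction: str             # movement direction of the video frames, e.g. "left"
    person_direction: Optional[str]  # movement direction of the person; None if no person
    element: str                     # editing element, e.g. "person" or "building"


def third_score(a: SegmentInfo, b: SegmentInfo,
                weights: Tuple[float, float, float, float] = (1.0, 1.0, 1.0, 1.0)) -> float:
    """Rule-based score of the adjacent pair (a, b); higher means more harmonious."""
    w_a, w_b, w_frames, w_element = weights
    score = 0.0
    # Frames and person moving the same way inside each segment.
    for seg, w in ((a, w_a), (b, w_b)):
        if seg.person_direction is not None:
            score += w if seg.frame_direction == seg.person_direction else -w
    # Frames of the two segments moving the same way.
    score += w_frames if a.frame_direction == b.frame_direction else -w_frames
    # Different editing elements in the two segments are rewarded.
    score += w_element if a.element != b.element else -w_element
    return score
```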
[0231] In one example, the second score corresponding to candidate video clip $v_l$ can be denoted $S_l$, and the candidate video segment pair formed by $v_l$ and $v_{l+1}$ has a corresponding third score. Equation 4 can then be used to determine the first score corresponding to the candidate video segment pair.
[0232]
[0233] If the editing element corresponding to the candidate video clip is a person, then …; if the editing element corresponding to the candidate video clip is a building, then ….
[0234] As another example of this implementation, determining the first score corresponding to the candidate video segment pair to which the first candidate video segment belongs includes: determining the second score corresponding to the first candidate video segment and the second score corresponding to the second candidate video segment; and determining the first score corresponding to the candidate video segment pair based on the second score corresponding to the first candidate video segment and the second score corresponding to the second candidate video segment.
[0235] As another example of this implementation, determining the first score corresponding to the candidate video segment pair to which the first candidate video segment belongs includes: determining the third score corresponding to the candidate video segment pair; and determining the first score corresponding to the candidate video segment pair based on the third score.
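Equation 4 is not reproduced in this text, so the sketch below assumes the simplest combination consistent with [0215], [0234] and [0235]: the two second scores are added, and the third score is added with a weight `alpha`. Dropping a term yields the two alternative variants. The element-dependent forms mentioned in [0233] are not modeled, and `first_score` and `alpha` are hypothetical names.

```python
from typing import Optional


def first_score(second_a: Optional[float], second_b: Optional[float],
                third: Optional[float], alpha: float = 1.0) -> float:
    """Hypothetical combination of the scores of a candidate video segment pair.

    Assumed stand-in for Equation 4: sums the per-segment second scores and
    alpha times the pairwise third score. Passing None for a term drops it:
      third=None               -> variant of [0234] (second scores only)
      second_a=second_b=None   -> variant of [0235] (third score only)
    """
    if (second_a is None) != (second_b is None):
        raise ValueError("give both second scores or neither")
    if second_a is None and third is None:
        raise ValueError("at least one kind of score is required")
    total = 0.0
    if second_a is not None and second_b is not None:
        total += second_a + second_b
    if third is not None:
        total += alpha * third
    return total
```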
[0236] In another possible implementation, candidate videos can be generated based on all candidate video segments without screening them.
[0237] In one possible implementation, the number of candidate videos is at least two; after generating at least one candidate video, the method further includes: for any one of the at least two candidate videos, determining a fourth score that corresponds one-to-one with a combination of video segments in the candidate video, wherein the combination of video segments includes at least three adjacent video segments; determining a fifth score corresponding to the candidate video based at least on the fourth score; and determining at least one target video from the at least two candidate videos based on the fifth score.
[0238] In this implementation, at least one combination of video segments can be determined from any candidate video. Each combination can include w adjacent video segments, where w is greater than or equal to 3 and the number of video segments in the candidate video is greater than or equal to w. In one example, a sliding window with a length of w and a step size of [missing information] can be used to identify multiple combinations of video segments from the candidate video.
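The sliding window of [0238] can be sketched as follows. Because the step size is not given in the text, the function takes it as an explicit argument rather than assuming a value; `sliding_combinations` is a hypothetical name.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sliding_combinations(segments: Sequence[T], w: int, step: int) -> List[List[T]]:
    """All windows of w adjacent segments, advancing `step` segments each time."""
    if w < 3:
        raise ValueError("a combination needs at least three adjacent segments")
    if step < 1:
        raise ValueError("step must be positive")
    if len(segments) < w:
        raise ValueError("the candidate video must contain at least w segments")
    return [list(segments[i:i + w]) for i in range(0, len(segments) - w + 1, step)]
```

With a step larger than 1 the last segments may not fall into any window (for example, 6 segments with w = 3 and a step of 2), so a caller may want to add a final window ending at the last segment.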
[0239] In this implementation, the fourth score represents the score of a combination of video segments in the candidate video. In one example, the fourth score can be denoted ….
[0240] In this implementation, the fifth score corresponding to a candidate video can also be called the global score. The target video can represent a video selected from the candidate videos. As an example of this implementation, the M candidate videos with the highest fifth scores can be designated as target videos. For example, M can be equal to 3 or 2, without limitation. As another example of this implementation, candidate videos with fifth scores higher than a second preset threshold can be designated as target videos.
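Both selection rules of [0240] reduce to ranking the candidate videos by their fifth score. In this sketch the videos are identified by hypothetical string IDs, and the fifth scores are assumed to be already computed.

```python
from typing import Dict, List, Optional


def select_target_videos(fifth_scores: Dict[str, float],
                         m: Optional[int] = None,
                         threshold: Optional[float] = None) -> List[str]:
    """Return target video IDs, best first.

    Exactly one of `m` (keep the M highest fifth scores) or `threshold`
    (keep scores strictly above the second preset threshold) must be given.
    """
    if (m is None) == (threshold is None):
        raise ValueError("pass exactly one of m or threshold")
    ranked = sorted(fifth_scores, key=lambda vid: fifth_scores[vid], reverse=True)
    if m is not None:
        return ranked[:m]
    return [vid for vid in ranked if fifth_scores[vid] > threshold]
```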
[0241] This approach allows for the selection of higher-quality target videos from candidate videos, thereby reducing video creation time. Users can choose their favorite video from at least one target video and further edit it.
[0242] As an example of this implementation, determining the fourth score corresponding one-to-one with the combination of video segments in the candidate video includes: for any combination of video segments in the candidate video, using a pre-trained second neural network to process the combination of video segments to obtain the fourth score corresponding to the combination of video segments.
[0243] Figure 8 illustrates the second neural network in the video editing method provided in an embodiment of the present disclosure. As shown in Figure 8, the second neural network can include w discriminators, each of which can use ResNet18 as its backbone network and a Bi-LSTM as its classifier. In the example shown in Figure 8, the input to the second neural network includes video segments $v_l$ to $v_{l+w-1}$ of the candidate video. In one example, keyframes from each video segment, rather than all of its frames, can be fed into the second neural network to speed up prediction of the fourth score. For instance, five keyframes from each video segment in a combination can be fed into the second neural network.
[0244] In this example, by using a pre-trained second neural network to process any combination of video segments in the candidate videos, a fourth score corresponding to the combination of video segments is obtained, thereby improving the accuracy and speed of the fourth score prediction.
[0245] As an example of this implementation, determining the fifth score corresponding to the candidate video based at least on the fourth score includes: determining a third score that corresponds one-to-one with a pair of video segments in the candidate video, wherein the pair of video segments includes two adjacent video segments; and determining the fifth score corresponding to the candidate video based on the fourth score and the third score.
[0246] In one example, Equation 5 can be used to determine the fifth score $S_{glo}$ corresponding to the candidate video:
[0247]
[0248] In this example, by combining the fourth and third scores, the fifth score corresponding to the candidate video is determined, which further improves the accuracy of score prediction and enables the selection of higher quality target videos.
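Equation 5 is not reproduced in this text. The sketch below assumes that the fifth score adds up the fourth scores of all sliding-window combinations and the third scores of all adjacent pairs, with a hypothetical weight `beta` on the pairwise part; `fourth` and `third` are caller-supplied functions standing in for the second neural network and the rule-based pair score, and the window step is passed in because the text does not give it.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def fifth_score(segments: Sequence[T],
                fourth: Callable[[Sequence[T]], float],
                third: Callable[[T, T], float],
                w: int, step: int, beta: float = 1.0) -> float:
    """Hypothetical global score of one candidate video (assumed stand-in for Equation 5).

    Sums the fourth scores of all sliding-window combinations (length w,
    given step) and beta times the third scores of all adjacent pairs.
    """
    if w < 3 or step < 1 or len(segments) < w:
        raise ValueError("need w >= 3, step >= 1 and at least w segments")
    windows = [segments[i:i + w] for i in range(0, len(segments) - w + 1, step)]
    window_total = sum(fourth(win) for win in windows)
    pair_total = sum(third(a, b) for a, b in zip(segments, segments[1:]))
    return window_total + beta * pair_total
```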
[0249] The video editing method provided in this disclosure can be applied to application scenarios such as computer vision, intelligent video generation, intelligent video editing, professional video editing, and video editing assistance.
[0250] The following describes the video editing method provided in this embodiment through a specific application scenario.
[0251] In this application scenario, users can use portable panoramic video shooting tools to capture multiple panoramic videos at different locations in the city they are traveling in.
[0252] In this application scenario, the human video creation process can be simulated. The video editing method provided in this embodiment can be completed through a director module, a cinematographer module, and an editor module. Specifically, the director module simulates the role of a director in video creation, the cinematographer module simulates the role of a cinematographer in video creation, and the editor module simulates the role of an editor in video editing.
[0253] Users can select the target editing element through an editing element selection interface such as the one shown in Figure 3, select the target video style through a video style selection interface such as the one shown in Figure 4, and select the target duration through a duration selection interface such as the one shown in Figure 5.
[0254] The director module can determine, based on the target editing element selected by the user, the proportions of video segments corresponding to the target editing element and to other editing elements; determine, based on the target video style selected by the user, the proportions of video segments corresponding to different camera movement types; and determine the target number of video clips based on the target duration selected by the user. From these, it can determine the editing element and camera movement type corresponding to each video clip position.
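The director module's planning step can be illustrated with simple arithmetic. The sketch assumes that every video clip has the same nominal duration, so the target number of clips is the target duration divided by that duration, and it rounds the editing ratios to whole clip counts with the largest-remainder method. It then interleaves the labels across positions. The labels (`person`, `building`, `pan`, `zoom`), the clip duration, and the interleaving rule are all hypothetical; the text does not say how positions are assigned.

```python
import math
from typing import Dict, List, Tuple


def target_clip_count(target_duration_s: float, clip_duration_s: float) -> int:
    """Number of video segment positions; assumes one nominal clip duration."""
    return max(1, round(target_duration_s / clip_duration_s))


def allocate(ratios: Dict[str, float], total: int) -> Dict[str, int]:
    """Split `total` positions across labels in proportion to `ratios`
    using the largest-remainder method (an assumed rounding rule)."""
    weight_sum = sum(ratios.values())
    exact = {label: total * r / weight_sum for label, r in ratios.items()}
    counts = {label: math.floor(x) for label, x in exact.items()}
    leftover = total - sum(counts.values())
    # Give the remaining positions to the labels with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda label: exact[label] - counts[label], reverse=True)
    for label in by_remainder[:leftover]:
        counts[label] += 1
    return counts


def round_robin(counts: Dict[str, int]) -> List[str]:
    """Spread labels over positions so identical labels are not bunched together."""
    remaining = dict(counts)
    order: List[str] = []
    while any(remaining.values()):
        for label in remaining:
            if remaining[label] > 0:
                order.append(label)
                remaining[label] -= 1
    return order


def plan_positions(element_ratios: Dict[str, float], motion_ratios: Dict[str, float],
                   target_duration_s: float, clip_duration_s: float) -> List[Tuple[str, str]]:
    """(editing element, camera movement type) for each video segment position."""
    n = target_clip_count(target_duration_s, clip_duration_s)
    elements = round_robin(allocate(element_ratios, n))
    motions = round_robin(allocate(motion_ratios, n))
    return list(zip(elements, motions))
```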
[0255] The cinematographer module can determine the candidate video segments corresponding to each video segment position based on the video to be edited, the editing elements, and the camera movement types.
[0256] The editor module can filter candidate video clips and generate multiple candidate videos based on the retained clips. It can also obtain a fifth score for each candidate video and determine the target video based on these scores. The target video can be provided to users for viewing or shared on social media.
[0257] It is understood that the various method embodiments mentioned above in this disclosure can be combined with each other to form combined embodiments without violating the principle and logic. Due to space limitations, this disclosure will not elaborate further. Those skilled in the art will understand that in the above methods of specific implementation, the specific execution order of each step should be determined by its function and possible internal logic.
[0258] In addition, this disclosure also provides video editing apparatus, electronic devices, computer-readable storage media, and computer program products, all of which can be used to implement any of the video editing methods provided in this disclosure. The corresponding technical solutions and effects can be found in the relevant descriptions in the method section, and will not be repeated here.
[0259] Figure 9 shows a block diagram of a video editing apparatus provided in an embodiment of this disclosure. As shown in Figure 9, the video editing apparatus includes:
[0260] The first acquisition module 21 is used to acquire at least one video to be edited;
[0261] The second acquisition module 22 is used to acquire the editing control information input by the user;
[0262] The first determining module 23 is used to determine the editing ratio information corresponding to different types of video segments based on the editing control information;
[0263] The second determining module 24 is used to determine at least two candidate video segments from the at least one video to be edited based on the editing ratio information;
[0264] The generation module 25 is used to generate at least one candidate video based on the at least two candidate video segments.
[0265] In one possible implementation, the editing control information includes information about the target editing element;
[0266] The first determining module 23 is used for:
[0267] Based on the information of the target editing element, determine the proportion of the video segment corresponding to the target editing element, and the proportion of the video segment corresponding to at least one editing element other than the target editing element.
[0268] In one possible implementation, the editing control information includes information about the target video style;
[0269] The first determining module 23 is used for:
[0270] Based on the information of the target video style, determine the proportion of video segments corresponding to at least two types of camera movement.
[0271] In one possible implementation, the editing control information includes target duration information;
[0272] The second determining module 24 is used for:
[0273] Based on the target duration information, determine the target number of video segments in the candidate videos;
[0274] Based on the editing ratio information and the target quantity, at least two candidate video segments are determined from the at least one video to be edited.
[0275] In one possible implementation, the second determining module 24 is used to:
[0276] Based on the editing ratio information and the target quantity, determine the editing elements and camera movement types corresponding to at least two video segment positions in the candidate video;
[0277] From the at least one video to be edited, determine the videos to be edited corresponding to the at least two video segment positions;
[0278] For any of the at least two video segment positions, based on the editing elements and camera movement type corresponding to the video segment position, the projection parameters corresponding to the initial video frame of the candidate video segment corresponding to the video segment position and the projection parameters corresponding to the end video frame of the candidate video segment are determined from the video to be edited corresponding to the video segment position. The projection parameters corresponding to the video frame include the position of the projection center and the field of view.
[0279] Based on the projection parameters corresponding to the initial video frame and the projection parameters corresponding to the end video frame, determine the projection parameters corresponding to the intermediate video frames of the candidate video segment.
[0280] Based on the projection parameters corresponding to the initial video frame, the end video frame, and the intermediate video frames, determine the candidate video segment corresponding to the video segment position from the video to be edited corresponding to that video segment position.
[0281] In one possible implementation, the second determining module 24 is used to:
[0282] Determine the total number of frames in the candidate video segment and the camera motion parameters corresponding to the candidate video segment;
[0283] Based on the projection parameters corresponding to the initial video frame, the projection parameters corresponding to the end video frame, the total number of frames, and the camera motion parameters, the projection parameters corresponding to the intermediate video frames of the candidate video segment are determined.
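The interpolation of projection parameters in [0278] to [0283] can be sketched as follows. The sketch assumes that the projection centre of a panoramic frame is given as yaw and pitch angles in degrees, that yaw wraps at 360° so it is interpolated along the shorter arc, and that the camera motion parameter is a single hypothetical `smoothness` value blending linear motion with smoothstep easing. None of these choices comes from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Projection:
    yaw: float    # horizontal angle of the projection centre, degrees (assumed convention)
    pitch: float  # vertical angle of the projection centre, degrees
    fov: float    # field of view, degrees


def ease(t: float, smoothness: float) -> float:
    """Blend linear motion (smoothness=0) with smoothstep easing (smoothness=1)."""
    smooth = t * t * (3 - 2 * t)
    return (1 - smoothness) * t + smoothness * smooth


def interpolate(start: Projection, end: Projection, total_frames: int,
                smoothness: float = 0.0) -> List[Projection]:
    """Projection parameters for every frame, including the initial and end frames."""
    if total_frames < 2:
        raise ValueError("need at least the initial and end frames")
    # Signed yaw difference along the shorter arc, in [-180, 180).
    d_yaw = (end.yaw - start.yaw + 180.0) % 360.0 - 180.0
    frames = []
    for i in range(total_frames):
        t = ease(i / (total_frames - 1), smoothness)
        frames.append(Projection(
            yaw=(start.yaw + t * d_yaw) % 360.0,
            pitch=start.pitch + t * (end.pitch - start.pitch),
            fov=start.fov + t * (end.fov - start.fov)))
    return frames
```

The intermediate video frames are `frames[1:-1]`; the first and last entries reproduce the given initial and end projection parameters.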
[0284] In one possible implementation, the generation module 25 is used to:
[0285] For a first candidate video segment among the at least two candidate video segments, a first score is determined for the candidate video segment pair to which the first candidate video segment belongs, wherein the candidate video segment pair includes the first candidate video segment and a second candidate video segment, and the video segment position corresponding to the second candidate video segment is adjacent to the video segment position corresponding to the first candidate video segment;
[0286] In response to the first score meeting the preset conditions, the first candidate video segment is retained;
[0287] At least one candidate video is generated based on the candidate video segments retained from the at least two candidate video segments.
[0288] In one possible implementation, the generation module 25 is used to:
[0289] Determine the second score corresponding to the first candidate video segment and the second score corresponding to the second candidate video segment;
[0290] Determine the third score corresponding to the candidate video segment pair;
[0291] Based on the second score corresponding to the first candidate video segment, the second score corresponding to the second candidate video segment, and the third score, the first score corresponding to the candidate video segment pair is determined.
[0292] In one possible implementation, the generation module 25 is used to:
[0293] Based on the first editing element corresponding to the first candidate video segment, determine the second score corresponding to the first candidate video segment;
[0294] The second score corresponding to the second candidate video segment is determined based on the second editing element corresponding to the second candidate video segment.
[0295] In one possible implementation, the generation module 25 is used to:
[0296] Based on the first editing element corresponding to the first candidate video segment, the video content corresponding to the first editing element is cropped from the first candidate video segment; a pre-trained first neural network is used to process the first candidate video segment and the video content corresponding to the first editing element to obtain a second score corresponding to the first candidate video segment, wherein the first neural network is pre-trained using a training video set, and the training videos in the training video set are videos that have been captured.
[0297] Based on the second editing element corresponding to the second candidate video segment, the video content corresponding to the second editing element is cropped from the second candidate video segment; the first neural network is used to process the second candidate video segment and the video content corresponding to the second editing element to obtain the second score corresponding to the second candidate video segment.
[0298] In one possible implementation, the generation module 25 is used to:
[0299] The third score corresponding to the candidate video segment pair is determined based on at least one of the following:
[0300] Consistency information between the movement direction of the video frames in the first candidate video segment and the movement direction of the person in the first candidate video segment;
[0301] Consistency information between the movement direction of the video frames in the second candidate video segment and the movement direction of the person in the second candidate video segment;
[0302] Consistency information between the movement directions of video frames in the first candidate video segment and the second candidate video segment;
[0303] Similarity information between the editing elements corresponding to the first candidate video segment and the second candidate video segment.
[0304] In one possible implementation, the number of candidate videos is at least two;
[0305] The apparatus also includes a third determining module, used for:
[0306] For any one of at least two candidate videos, determine a fourth score that corresponds one-to-one with a combination of video segments in the candidate video, wherein the combination of video segments includes at least three adjacent video segments;
[0307] At least based on the fourth score, determine the fifth score corresponding to the candidate video;
[0308] Based on the fifth score, at least one target video is determined from the at least two candidate videos.
[0309] In one possible implementation, the third determining module is used to:
[0310] For any combination of video segments in the candidate videos, a pre-trained second neural network is used to process the combination of video segments to obtain a fourth score corresponding to the combination of video segments.
[0311] In one possible implementation, the third determining module is used to:
[0312] A third score is determined that corresponds one-to-one with the video segment pairs in the candidate videos, wherein the video segment pairs include two adjacent video segments;
[0313] Based on the fourth score and the third score, a fifth score is determined for the candidate video.
[0314] In one possible implementation, the at least one video to be edited includes: at least one panoramic video.
[0315] In some embodiments, the functions or modules of the apparatus provided in this disclosure can be used to perform the methods described in the above method embodiments. The specific implementation and technical effects can be referred to the description of the above method embodiments. For the sake of brevity, they will not be repeated here.
[0316] This disclosure also provides a computer-readable storage medium storing computer program instructions thereon, which, when executed by a processor, implement the above-described method. The computer-readable storage medium may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium.
[0317] This disclosure also proposes a computer program including computer-readable code, wherein when the computer-readable code is run in an electronic device, a processor in the electronic device executes the above-described method.
[0318] This disclosure also provides a computer program product, including computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code, wherein when the computer-readable code is run in an electronic device, the processor in the electronic device executes the above-described method.
[0319] This disclosure also provides an electronic device, including: one or more processors; a memory for storing executable instructions; wherein the one or more processors are configured to invoke the executable instructions stored in the memory to perform the above-described method.
[0320] Electronic devices can be provided as terminals, servers, or other forms of devices.
[0321] Figure 10 shows a block diagram of an electronic device 1900 provided in an embodiment of this disclosure. For example, the electronic device 1900 may be provided as a server. Referring to Figure 10, the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources represented by memory 1932 for storing instructions, such as application programs, that can be executed by the processing component 1922. The application programs stored in memory 1932 may include one or more modules, each corresponding to a set of instructions. Furthermore, the processing component 1922 is configured to execute instructions to perform the methods described above.
[0322] Electronic device 1900 may also include a power supply component 1926 configured to perform power management of electronic device 1900, a wired or wireless network interface 1950 configured to connect electronic device 1900 to a network, and an input/output (I/O) interface 1958. Electronic device 1900 can operate on an operating system stored in memory 1932, such as Microsoft's server operating system (Windows Server™), Apple's graphical-user-interface-based operating system (Mac OS X™), the multi-user, multi-process computer operating system (Unix™), the free and open-source Unix-like operating system (Linux™), the open-source Unix-like operating system (FreeBSD™), or similar.
[0323] In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, such as a memory 1932 including computer program instructions that can be executed by a processing component 1922 of an electronic device 1900 to perform the above-described method.
[0324] This disclosure can be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement various aspects of this disclosure.
[0325] A computer-readable storage medium can be a tangible device capable of holding and storing instructions for use by an instruction execution device. A computer-readable storage medium can be, for example but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or a raised structure in a groove with instructions stored thereon, and any suitable combination of the foregoing. A computer-readable storage medium as used herein is not to be construed as a transient signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (e.g., a light pulse through a fiber optic cable), or an electrical signal transmitted through a wire.
[0326] The computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to the respective computing/processing devices, or downloaded to an external computer or external storage device via a network such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium in the respective computing/processing device.
[0327] Computer program instructions for performing the operations of this disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer via any type of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., via the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as programmable logic circuitry, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), is customized using state information of the computer-readable program instructions, and this electronic circuitry can execute the computer-readable program instructions to implement various aspects of this disclosure.
[0328] Various aspects of this disclosure are described herein with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this disclosure. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
[0329] These computer-readable program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions can also be stored in a computer-readable storage medium and cause a computer, programmable data processing apparatus, and/or other device to operate in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions that implement aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
[0330] Computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, such that the instructions executed on the computer, other programmable data processing apparatus, or other device implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
[0331] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of instructions, which contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may occur in an order different from that shown in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
[0332] The computer program product can be specifically implemented by hardware, software, or a combination thereof. In one optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (SDK).
[0333] The above descriptions of the various embodiments tend to emphasize the differences between them; for the same or similar parts, the embodiments may refer to one another, and for brevity these are not repeated here.
[0334] If the technical solution of this disclosure involves personal information, a product applying the technical solution of this disclosure has clearly informed the user of the personal information processing rules and obtained the user's voluntary consent before processing the personal information. If the technical solution of this disclosure involves sensitive personal information, a product applying the technical solution of this disclosure has obtained the user's separate consent before processing the sensitive personal information and also satisfies the requirement of "express consent". For example, at a personal information collection device such as a camera, a clear and prominent sign is set up to indicate that the user has entered the personal information collection range and that personal information will be collected; if the user voluntarily enters the collection range, the user is deemed to consent to the collection of their personal information. Alternatively, on a personal information processing device, where the personal information processing rules are communicated through clear signs or information, the user's authorization is obtained through a pop-up message or by asking the user to upload their personal information. The personal information processing rules may include information such as the personal information processor, the purpose of the processing, the processing method, and the types of personal information processed.
[0335] The embodiments of this disclosure have been described above. The foregoing description is exemplary rather than exhaustive and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their technical improvement over technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
Claims
1. A video editing method, characterized by comprising: obtaining at least one video to be edited; obtaining editing control information input by a user; determining, based on the editing control information, editing ratio information corresponding to different types of video segments; determining, based on the editing ratio information, at least two candidate video segments from the at least one video to be edited; and generating at least one candidate video based on the at least two candidate video segments; wherein the editing control information includes target duration information, and determining the at least two candidate video segments from the at least one video to be edited based on the editing ratio information includes: determining, based on the target duration information, a target number of video segments in the candidate video; and determining, based on the editing ratio information and the target number, the at least two candidate video segments from the at least one video to be edited;
wherein determining the at least two candidate video segments from the at least one video to be edited based on the editing ratio information and the target number includes: determining, based on the editing ratio information and the target number, the editing element and camera movement type corresponding to each of at least two video segment positions in the candidate video; determining, from the at least one video to be edited, the video to be edited corresponding to each of the at least two video segment positions; and, for any one of the at least two video segment positions: determining, based on the editing element and camera movement type corresponding to the video segment position, and from the video to be edited corresponding to the video segment position, the projection parameters corresponding to the initial video frame and the projection parameters corresponding to the end video frame of the candidate video segment corresponding to the video segment position, wherein the projection parameters corresponding to a video frame include the position of the projection center and the field of view; determining, based on the projection parameters corresponding to the initial video frame and the projection parameters corresponding to the end video frame, the projection parameters corresponding to the intermediate video frames of the candidate video segment; and determining, based on the projection parameters corresponding to the initial video frame, the end video frame, and the intermediate video frames, the candidate video segment corresponding to the video segment position from the video to be edited corresponding to the video segment position.
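As a rough illustration of the first steps of claim 1, the sketch below turns a target duration into a target number of video segment positions and then assigns an editing element to each position so that the counts follow the editing ratio information. The claim does not say how the target number is derived; the fixed average segment length (`avg_segment_seconds`), the largest-remainder rounding, and the function names are hypothetical assumptions made only for this example.

```python
import math


def target_segment_count(target_seconds: float, avg_segment_seconds: float = 3.0) -> int:
    """Hypothetical rule: target number = target duration / assumed mean segment length."""
    if target_seconds <= 0 or avg_segment_seconds <= 0:
        raise ValueError("durations must be positive")
    # Claim 1 requires at least two candidate video segments.
    return max(2, round(target_seconds / avg_segment_seconds))


def allocate_positions(ratios: dict[str, float], n: int) -> dict[str, int]:
    """Split n segment positions among segment types in proportion to `ratios`,
    using largest-remainder rounding so the counts always sum to n."""
    total = sum(ratios.values())
    exact = {k: n * v / total for k, v in ratios.items()}
    counts = {k: math.floor(x) for k, x in exact.items()}
    leftover = n - sum(counts.values())
    # Hand the remaining positions to the types with the largest fractional parts.
    for k in sorted(exact, key=lambda k: exact[k] - counts[k], reverse=True)[:leftover]:
        counts[k] += 1
    return counts


n = target_segment_count(30.0)  # 30 s target with an assumed 3 s mean segment
counts = allocate_positions({"person": 0.5, "landscape": 0.3, "object": 0.2}, n)
print(n, counts)
```

Largest-remainder rounding is one simple way to keep the per-type counts consistent with the ratios while still summing exactly to the target number; the disclosure may use a different scheme.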
2. The method according to claim 1, characterized in that the editing control information includes information about a target editing element, and determining the editing ratio information corresponding to different types of video segments based on the editing control information includes: determining, based on the information about the target editing element, the proportion of video segments corresponding to the target editing element and the proportion of video segments corresponding to at least one editing element other than the target editing element.
3. The method according to claim 1 or 2, characterized in that the editing control information includes information about a target video style, and determining the editing ratio information corresponding to different types of video segments based on the editing control information includes: determining, based on the information about the target video style, the proportions of video segments corresponding to at least two camera movement types.
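Claims 2 and 3 leave open how the proportions are computed. One possible reading, sketched here purely as an assumption, gives the user's target editing element a fixed share and spreads the remainder evenly over the other elements, and looks up camera-movement proportions from a per-style table. The share value `target_share`, the element and style names, and the table contents are all hypothetical.

```python
def element_ratios(target: str, elements: list[str], target_share: float = 0.6) -> dict[str, float]:
    """Hypothetical claim-2 rule: the target element gets `target_share`
    and the other elements split the remainder equally."""
    if target not in elements:
        raise ValueError(f"unknown editing element: {target}")
    others = [e for e in elements if e != target]
    if not others:
        return {target: 1.0}
    rest = (1.0 - target_share) / len(others)
    return {e: (target_share if e == target else rest) for e in elements}


# Hypothetical claim-3 table: target video style -> proportions of camera movement types.
STYLE_TO_CAMERA_MOVES = {
    "calm": {"static": 0.5, "pan": 0.3, "zoom": 0.2},
    "dynamic": {"static": 0.1, "pan": 0.4, "zoom": 0.5},
}


def camera_move_ratios(style: str) -> dict[str, float]:
    """Return a copy so callers cannot mutate the shared table."""
    return dict(STYLE_TO_CAMERA_MOVES[style])


print(element_ratios("person", ["person", "landscape", "object"]))
print(camera_move_ratios("dynamic"))
```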
4. The method according to claim 1, characterized in that determining the projection parameters corresponding to the intermediate video frames of the candidate video segment based on the projection parameters corresponding to the initial video frame and the projection parameters corresponding to the end video frame includes: determining the total number of frames of the candidate video segment and the camera motion parameters corresponding to the candidate video segment; and determining, based on the projection parameters corresponding to the initial video frame, the projection parameters corresponding to the end video frame, the total number of frames, and the camera motion parameters, the projection parameters corresponding to the intermediate video frames of the candidate video segment.
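Claim 4 derives the projection parameters of the intermediate frames from those of the initial and end frames, the total frame count, and camera motion parameters, without fixing a formula. The sketch below assumes the projection center is expressed as yaw/pitch angles in degrees on a panoramic sphere, and models the camera motion parameters as a single hypothetical easing exponent. Yaw is interpolated along the shorter arc so that a pan across the ±180° seam of a 360° video does not sweep the long way around.

```python
from dataclasses import dataclass


@dataclass
class Projection:
    yaw: float    # horizontal angle of the projection center, degrees in [-180, 180)
    pitch: float  # vertical angle of the projection center, degrees
    fov: float    # field of view, degrees


def _wrap(angle: float) -> float:
    """Map an angle in degrees to [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def interpolate_projections(start: Projection, end: Projection, total_frames: int,
                            ease_exponent: float = 1.0) -> list[Projection]:
    """Return one projection per frame, endpoints included.

    `ease_exponent` is a hypothetical stand-in for the claimed camera motion
    parameters: 1.0 gives uniform motion, values > 1 start slowly and speed up.
    """
    if total_frames < 2:
        raise ValueError("a segment needs at least an initial and an end frame")
    d_yaw = _wrap(end.yaw - start.yaw)  # shortest signed yaw difference
    frames = []
    for i in range(total_frames):
        t = (i / (total_frames - 1)) ** ease_exponent
        frames.append(Projection(
            yaw=_wrap(start.yaw + t * d_yaw),
            pitch=start.pitch + t * (end.pitch - start.pitch),
            fov=start.fov + t * (end.fov - start.fov),
        ))
    return frames


# Pan from yaw 170° to yaw -170° (20° across the seam) while tilting up and zooming in.
path = interpolate_projections(Projection(170, 0, 90), Projection(-170, 10, 60), total_frames=5)
for p in path:
    print(round(p.yaw, 1), round(p.pitch, 1), round(p.fov, 1))
```

Linear interpolation of the field of view is the simplest choice; a real implementation might interpolate in a perceptually uniform space instead, which the claim does not specify.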
5. The method according to any one of claims 1 to 4, characterized in that generating at least one candidate video based on the at least two candidate video segments includes: for a first candidate video segment among the at least two candidate video segments, determining a first score corresponding to the candidate video segment pair to which the first candidate video segment belongs, wherein the candidate video segment pair includes the first candidate video segment and a second candidate video segment, and the video segment position corresponding to the second candidate video segment is adjacent to the video segment position corresponding to the first candidate video segment; retaining the first candidate video segment in response to the first score satisfying a preset condition; and generating at least one candidate video based on the candidate video segments retained from the at least two candidate video segments.
6. The method according to claim 5, characterized in that determining the first score corresponding to the candidate video segment pair to which the first candidate video segment belongs includes: determining a second score corresponding to the first candidate video segment and a second score corresponding to the second candidate video segment; determining a third score corresponding to the candidate video segment pair; and determining the first score corresponding to the candidate video segment pair based on the second score corresponding to the first candidate video segment, the second score corresponding to the second candidate video segment, and the third score.
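Claims 5 and 6 combine two segment-level scores (second scores) and one pair-level score (third score) into a first score, and keep a segment whose first score satisfies a preset condition. Neither the combination rule nor the condition is specified, so the sketch below assumes a weighted sum with hypothetical weights and a hypothetical threshold, and pairs each segment with the next one (the last segment with the previous one).

```python
def first_score(second_a: float, second_b: float, third: float,
                w_segment: float = 0.5, w_pair: float = 0.5) -> float:
    """Hypothetical claim-6 rule: weighted sum of the mean of the two
    segment-level (second) scores and the pair-level (third) score."""
    return w_segment * (second_a + second_b) / 2 + w_pair * third


def retain_segments(second_scores: list[float], third_scores: list[float],
                    threshold: float = 0.5) -> list[int]:
    """Return the indices of segments to keep.

    third_scores[i] scores the adjacent pair (i, i + 1). The preset condition
    is assumed to be `first score >= threshold`.
    """
    n = len(second_scores)
    if n < 2 or len(third_scores) != n - 1:
        raise ValueError("need >= 2 segments and one pair score per adjacent pair")
    kept = []
    for i in range(n):
        j = i + 1 if i + 1 < n else i - 1     # index of the adjacent segment
        pair_score = third_scores[min(i, j)]  # score of the pair (i, j)
        if first_score(second_scores[i], second_scores[j], pair_score) >= threshold:
            kept.append(i)
    return kept


# A weak third segment with a poor transition pulls both of its pairings below the threshold.
print(retain_segments([0.9, 0.9, 0.1], [0.9, 0.1]))  # -> [0]
```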
7. The method according to claim 6, characterized in that determining the second score corresponding to the first candidate video segment and the second score corresponding to the second candidate video segment includes: determining the second score corresponding to the first candidate video segment based on a first editing element corresponding to the first candidate video segment; and determining the second score corresponding to the second candidate video segment based on a second editing element corresponding to the second candidate video segment.
8. The method according to claim 7, characterized in that determining the second score corresponding to the first candidate video segment based on the first editing element corresponding to the first candidate video segment includes: cropping, from the first candidate video segment, the video content corresponding to the first editing element; and processing the first candidate video segment and the video content corresponding to the first editing element with a pre-trained first neural network to obtain the second score corresponding to the first candidate video segment, wherein the first neural network is pre-trained on a training video set whose training videos are captured videos; and determining the second score corresponding to the second candidate video segment based on the second editing element corresponding to the second candidate video segment includes: cropping, from the second candidate video segment, the video content corresponding to the second editing element; and processing the second candidate video segment and the video content corresponding to the second editing element with the first neural network to obtain the second score corresponding to the second candidate video segment.
9. The method according to any one of claims 6 to 8, characterized in that determining the third score corresponding to the candidate video segment pair includes: determining the third score corresponding to the candidate video segment pair based on at least one of the following: consistency information between the movement direction of the video frames in the first candidate video segment and the movement direction of the person in the first candidate video segment; consistency information between the movement direction of the video frames in the second candidate video segment and the movement direction of the person in the second candidate video segment; consistency information between the movement directions of the video frames in the first candidate video segment and in the second candidate video segment; and similarity information between the editing elements corresponding to the first candidate video segment and the second candidate video segment.
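Claim 9 lists the signals that may feed the third score but not how they are measured or combined. A minimal sketch, assuming motion directions are available as 2-D vectors (for example a mean frame-motion direction and a tracked person direction, neither of which the claim specifies), measures each direction consistency as a cosine similarity mapped to [0, 1], measures editing-element similarity as the Jaccard overlap of label sets, and averages whichever signals are present. The neutral value for a zero vector and the plain average are hypothetical choices.

```python
import math
from typing import Optional, Sequence


def direction_consistency(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two 2-D direction vectors, mapped from [-1, 1] to [0, 1]."""
    nu, nv = math.hypot(*u), math.hypot(*v)
    if nu == 0 or nv == 0:
        return 0.5  # no motion in one of them: treated as neutral (assumption)
    cos = (u[0] * v[0] + u[1] * v[1]) / (nu * nv)
    return (cos + 1) / 2


def element_similarity(a: set[str], b: set[str]) -> float:
    """Jaccard overlap of the editing-element labels of two segments."""
    return len(a & b) / len(a | b) if a | b else 1.0


def third_score(signals: list[Optional[float]]) -> float:
    """Average the available signals; None marks a signal that was not computed."""
    present = [s for s in signals if s is not None]
    if not present:
        raise ValueError("at least one signal is required")
    return sum(present) / len(present)


score = third_score([
    direction_consistency((1, 0), (1, 0.1)),  # frames vs. person, first segment
    direction_consistency((0, 1), (0, 1)),    # frames vs. person, second segment
    direction_consistency((1, 0), (0, 1)),    # frame motion across the cut
    element_similarity({"person", "beach"}, {"person", "sunset"}),
])
print(round(score, 3))
```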
10. The method according to any one of claims 1 to 9, characterized in that the number of candidate videos is at least two, and after generating the at least one candidate video, the method further includes: for any one of the at least two candidate videos, determining fourth scores in one-to-one correspondence with the video segment combinations in the candidate video, wherein each video segment combination includes at least three adjacent video segments; determining, based at least on the fourth scores, a fifth score corresponding to the candidate video; and determining, based on the fifth scores, at least one target video from the at least two candidate videos.
11. The method according to claim 10, characterized in that determining the fourth scores in one-to-one correspondence with the video segment combinations in the candidate video includes: for any video segment combination in the candidate video, processing the video segment combination with a pre-trained second neural network to obtain the fourth score corresponding to the video segment combination.
12. The method according to claim 10 or 11, characterized in that determining the fifth score corresponding to the candidate video based at least on the fourth scores includes: determining third scores in one-to-one correspondence with the video segment pairs in the candidate video, wherein each video segment pair includes two adjacent video segments; and determining the fifth score corresponding to the candidate video based on the fourth scores and the third scores.
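Claims 10 to 12 score each candidate video from its combination-level (fourth) scores and pair-level (third) scores, then pick target videos by the resulting fifth score. The sketch below assumes, hypothetically, that the fifth score is a weighted mean of the averages of the two score lists and that the top-k candidates become target videos. The fourth and third scores would come from the second neural network and the pair scorer respectively; here they are stubbed with fixed numbers.

```python
def fifth_score(fourth_scores: list[float], third_scores: list[float],
                w_triplet: float = 0.5) -> float:
    """Hypothetical fifth score: weighted mean of the average combination score
    (fourth scores, one per run of three adjacent segments) and the average
    pair score (third scores, one per pair of adjacent segments)."""
    if not fourth_scores or not third_scores:
        raise ValueError("a candidate video needs at least three segments")
    avg4 = sum(fourth_scores) / len(fourth_scores)
    avg3 = sum(third_scores) / len(third_scores)
    return w_triplet * avg4 + (1 - w_triplet) * avg3


def select_targets(candidates: dict[str, tuple[list[float], list[float]]], k: int = 1) -> list[str]:
    """Return the names of the k candidate videos with the highest fifth score."""
    ranked = sorted(candidates, key=lambda name: fifth_score(*candidates[name]), reverse=True)
    return ranked[:k]


# Stub scores for two four-segment candidate videos:
# two three-segment combinations (1-3, 2-4) and three adjacent pairs each.
candidates = {
    "video_a": ([0.7, 0.8], [0.9, 0.6, 0.8]),
    "video_b": ([0.4, 0.5], [0.7, 0.5, 0.6]),
}
print(select_targets(candidates))
```

With n segments, a candidate video has n - 2 three-segment combinations and n - 1 adjacent pairs, which is why both lists are required to be non-empty.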
13. The method according to any one of claims 1 to 12, characterized in that the at least one video to be edited includes at least one panoramic video.
14. A video editing device, characterized by comprising: a first acquisition module configured to acquire at least one video to be edited; a second acquisition module configured to acquire editing control information input by a user; a first determining module configured to determine, based on the editing control information, editing ratio information corresponding to different types of video segments; a second determining module configured to determine, based on the editing ratio information, at least two candidate video segments from the at least one video to be edited; and a generation module configured to generate at least one candidate video based on the at least two candidate video segments; wherein the editing control information includes target duration information, and the second determining module is configured to: determine, based on the target duration information, a target number of video segments in the candidate video; and determine, based on the editing ratio information and the target number, the at least two candidate video segments from the at least one video to be edited;
wherein the second determining module is further configured to: determine, based on the editing ratio information and the target number, the editing element and camera movement type corresponding to each of at least two video segment positions in the candidate video; determine, from the at least one video to be edited, the video to be edited corresponding to each of the at least two video segment positions; and, for any one of the at least two video segment positions: determine, based on the editing element and camera movement type corresponding to the video segment position, and from the video to be edited corresponding to the video segment position, the projection parameters corresponding to the initial video frame and the projection parameters corresponding to the end video frame of the candidate video segment corresponding to the video segment position, wherein the projection parameters corresponding to a video frame include the position of the projection center and the field of view; determine, based on the projection parameters corresponding to the initial video frame and the projection parameters corresponding to the end video frame, the projection parameters corresponding to the intermediate video frames of the candidate video segment; and determine, based on the projection parameters corresponding to the initial video frame, the end video frame, and the intermediate video frames, the candidate video segment corresponding to the video segment position from the video to be edited corresponding to the video segment position.
15. An electronic device, characterized by comprising: one or more processors; and a memory for storing executable instructions; wherein the one or more processors are configured to invoke the executable instructions stored in the memory to perform the method according to any one of claims 1 to 13.
16. A computer-readable storage medium having computer program instructions stored thereon, characterized in that, when the computer program instructions are executed by a processor, the method according to any one of claims 1 to 13 is implemented.
17. A computer program product, characterized by comprising computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code, wherein, when the computer-readable code runs in an electronic device, a processor in the electronic device performs the method according to any one of claims 1 to 13.