Video processing method and apparatus, terminal device, and storage medium

CN116419010B · Active · Publication Date: 2026-06-26 · CHINA MERCHANTS BANK

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: CHINA MERCHANTS BANK
Filing Date: 2023-04-24
Publication Date: 2026-06-26

Abstract

The application discloses a video processing method and device, terminal equipment and a storage medium. The video processing method comprises the following steps: acquiring a video material; determining a silent frame and a to-be-spliced video according to the video material; performing frame interpolation according to the silent frame and the to-be-spliced video to obtain a to-be-spliced video after frame interpolation; and filling a time axis with the to-be-spliced video after frame interpolation to obtain a spliced video. The application solves the problem that the video splicing position is not coherent in digital human video segment splicing, and improves the smoothness of the spliced video.

Description

Technical Field

[0001] This application relates to the field of computer technology, and in particular to a video processing method, apparatus, terminal device, and storage medium.

Background Technology

[0002] In current 2D digital human video clip splicing solutions, the output video is not captured continuously in one take; it is created by directly splicing different video segments together. Therefore, when the beginning and end of adjacent segments are even slightly inconsistent, the video is discontinuous at the splicing points, producing noticeable jumps in the image there.

[0003] Therefore, in order to address the problem of discontinuity at the splicing points in current digital human video clip splicing, it is necessary to propose a video processing solution for smoothing the splicing points.

Summary of the Invention

[0004] The main objective of this application is to provide a video processing method, apparatus, terminal device, and storage medium, which aims to solve the problem of discontinuity at the splicing points in digital human video clip splicing and improve the smoothness of spliced videos.

[0005] To achieve the above objectives, this application provides a video processing method, the video processing method comprising:

[0006] Obtain video footage;

[0007] Determine the silent frames and the video to be spliced based on the video material;

[0008] Frame interpolation is performed based on the silent frame and the video to be spliced to obtain the interpolated video to be spliced.

[0009] The timeline is filled with the interpolated video to obtain the spliced video.

[0010] Optionally, the step of determining the silent frame and the video to be spliced based on the video material includes:

[0011] Determine the silent frames based on the video footage;

[0012] Determine the type of the video material;

[0013] If the video material is a silent video material, then the first silent video to be spliced is determined based on the silent video material.

[0014] Optionally, the step of interpolating frames based on the silent frame and the video to be stitched to obtain the interpolated video includes:

[0015] The silent frame and the first silent video to be stitched together are interpolated using the optical flow-based image interpolation algorithm IFRNet to obtain the first intermediate frame.

[0016] The second silent video to be spliced is obtained by merging the silent frame, the first intermediate frame, and the first silent video to be spliced.

[0017] The step of filling the timeline with the interpolated video to obtain the spliced video includes:

[0018] Determine the silent intervals of the timeline;

[0019] The second silent video to be spliced is inserted into the start time of the silent interval to obtain a spliced video in a silent state.

[0020] Optionally, after the step of determining the type of the video material, the method further includes:

[0021] If the video material is an action state video material, then the first action video to be spliced is determined based on the action state video material.

[0022] Optionally, the step of interpolating frames based on the silent frame and the video to be stitched to obtain the interpolated video includes:

[0023] The silent frame and the first action video to be spliced are interpolated using the optical flow-based image interpolation algorithm IFRNet to obtain the second intermediate frame.

[0024] The second action video to be spliced is obtained by merging the silent frame, the second intermediate frame, and the first action video to be spliced.

[0025] The step of filling the timeline with the interpolated video to obtain the spliced video includes:

[0026] Determine the action interval of the timeline;

[0027] The second action video to be spliced is inserted into the start time of the action interval to obtain a spliced video of the action state.

[0028] Optionally, the step of determining the first silent video to be spliced based on the silent video footage includes:

[0029] Based on the silent video material, determine several first silent video segments of different durations to be spliced together;

[0030] The step of interpolating the silent frame and the first silent video to be stitched together using the optical flow-based image interpolation algorithm IFRNet to obtain the first intermediate frame includes:

[0031] The silent frame and each segment of the first silent video to be spliced with different durations are interpolated using the optical flow-based image interpolation algorithm IFRNet to obtain the first intermediate frame corresponding to the first silent video to be spliced with different durations.

[0032] The step of merging the silent frame, the first intermediate frame, and the first silent video to be spliced to obtain the second silent video includes:

[0033] The second silent video to be spliced is obtained by merging the silent frame, the first intermediate frame, and the corresponding first silent videos of different durations.

[0034] Optionally, the step of acquiring video materials includes:

[0035] Obtain video footage in a silent state for a preset duration; and / or,

[0036] Obtain action state video footage whose start and end states are both silent.

[0037] This application also proposes a video processing apparatus, the video processing apparatus comprising:

[0038] The acquisition module is used to acquire video footage;

[0039] The determination module is used to determine the silent frames and the video to be spliced based on the video material;

[0040] The frame interpolation module is used to perform frame interpolation based on the silent frames and the video to be spliced, so as to obtain the interpolated video to be spliced.

[0041] The splicing module is used to fill the timeline with the interpolated video to obtain the spliced video.

[0042] This application also proposes a terminal device, which includes a memory, a processor, and a video processing program stored in the memory and executable on the processor. When the video processing program is executed by the processor, it implements the steps of the video processing method described above.

[0043] This application also proposes a computer-readable storage medium storing a video processing program, which, when executed by a processor, implements the steps of the video processing method described above.

[0044] The video processing method, apparatus, terminal device, and storage medium proposed in this application involve acquiring video footage; determining silent frames and videos to be stitched based on the video footage; interpolating frames based on the silent frames and videos to be stitched to obtain interpolated videos; and filling the timeline with the interpolated videos to obtain stitched videos. By determining silent frames and videos to be stitched from the video footage, using silent frames to perform frame interpolation smoothing on the videos to be stitched, and filling the timeline with the interpolated videos to be stitched, a smoothed stitched video can be obtained, improving the overall smoothness of the video. Based on the solution of this application, by constructing a set of 2D digital human video footage and applying the video processing method proposed in this application to this video footage, the problem of discontinuity at the stitching points in current digital human video clip stitching is effectively solved, making the transition of the stitched video images smoother and improving the smoothness of the stitched video.

Attached Figure Description

[0045] Figure 1 is a schematic diagram illustrating the stitching of existing digital human video clips;

[0046] Figure 2 is a schematic diagram of the functional modules of the terminal device to which the video processing apparatus of this application belongs;

[0047] Figure 3 is a flowchart illustrating a first exemplary embodiment of the video processing method of this application;

[0048] Figure 4 is a schematic diagram of silent video smoothing in a second exemplary embodiment of the video processing method of this application;

[0049] Figure 5 is a schematic diagram of the specific process of filling the timeline with the interpolated video to be spliced to obtain the spliced video, in the second exemplary embodiment of the video processing method of this application;

[0050] Figure 6 is a schematic diagram of action video smoothing in a third exemplary embodiment of the video processing method of this application;

[0051] Figure 7 is a schematic diagram of the specific process of filling the timeline with the interpolated video to be spliced to obtain the spliced video, in the third exemplary embodiment of the video processing method of this application;

[0052] Figure 8 is a schematic diagram of several segments of silent videos of different durations to be spliced, as involved in a fourth exemplary embodiment of the video processing method of this application;

[0053] Figure 9 is a schematic diagram of the silent interval of the timeline involved in the fourth exemplary embodiment of the video processing method of this application;

[0054] Figure 10 is a schematic diagram illustrating the splicing of video segments before and after smoothing, as involved in a fifth exemplary embodiment of the video processing method of this application.

[0055] The realization of the purpose, functional features and advantages of this application will be further explained in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed Implementation

[0056] It should be understood that the specific embodiments described herein are merely illustrative of this application and are not intended to limit this application.

[0057] The embodiments of this application take into account that current digital human video clip splicing schemes mainly splice different video clips together directly, as shown in Figure 1. Figure 1 is a schematic diagram of existing digital human video clip stitching, where blank segments represent silent video clips and shaded segments represent action video clips. Across the timeline, the video is not captured continuously but is stitched together from different video clips; therefore, the video is discontinuous at the stitching points. For example, at times t1, t2, and t3 in Figure 1, the image changes are quite noticeable.

[0058] Therefore, to address the issue of discontinuity at video splicing points, the main solution of this application is as follows: First, acquire video footage. Second, determine silent frames and the video to be spliced based on the video footage. Third, perform frame interpolation based on the silent frames and the video to be spliced to obtain the interpolated video to be spliced. Fourth, fill the timeline with the interpolated video to be spliced to obtain the spliced video. By determining silent frames and the video to be spliced from the video footage, using the silent frames to perform frame interpolation smoothing on the video to be spliced, and then filling the timeline with the interpolated video to be spliced, a smoothed spliced video can be obtained, improving the overall smoothness of the video. Based on this application's solution, by constructing a set of 2D digital human video footage and applying the video processing method proposed in this application to this video footage, the problem of discontinuity at video splicing points in current digital human video clip splicing is effectively solved, resulting in smoother transitions in the spliced video images and improving the smoothness of the spliced video.

[0059] Specifically, refer to Figure 2, which is a functional module diagram of the terminal device to which the video processing apparatus of this application belongs. The video processing apparatus can be a device capable of video processing, independent of the terminal device, and can be implemented on the terminal device in hardware or software form. The terminal device can be a smart mobile terminal with data processing capabilities, such as a mobile phone or tablet computer, or it can be a fixed terminal device or server with data processing capabilities.

[0060] In this embodiment, the terminal device to which the video processing device belongs includes at least an output module 110, a processor 120, a memory 130, and a communication module 140.

[0061] The memory 130 stores the operating system and the video processing program. The video processing apparatus can store information such as the acquired video material, the silent frames and the videos to be spliced determined from the video material, the interpolated videos to be spliced obtained by frame interpolation, and the spliced video obtained by filling the timeline with the interpolated videos to be spliced. The output module 110 can be a display screen, etc. The communication module 140 can include a WIFI module, a mobile communication module, a Bluetooth module, etc., and the terminal device communicates with external devices or servers through the communication module 140.

[0062] When the video processing program in memory 130 is executed by the processor, it performs the following steps:

[0063] Obtain video footage;

[0064] Determine the silent frames and the video to be spliced based on the video material;

[0065] Frame interpolation is performed based on the silent frame and the video to be spliced to obtain the interpolated video to be spliced.

[0066] The timeline is filled with the interpolated video to obtain the spliced video.

[0067] Furthermore, when the video processing program in memory 130 is executed by the processor, it also performs the following steps:

[0068] Determine the silent frames based on the video footage;

[0069] Determine the type of the video material;

[0070] If the video material is a silent video material, then the first silent video to be spliced is determined based on the silent video material.

[0071] Furthermore, when the video processing program in memory 130 is executed by the processor, it also performs the following steps:

[0072] The silent frame and the first silent video to be stitched together are interpolated using the optical flow-based image interpolation algorithm IFRNet to obtain the first intermediate frame.

[0073] The second silent video to be spliced is obtained by merging the silent frame, the first intermediate frame, and the first silent video to be spliced.

[0074] Determine the silent intervals of the timeline;

[0075] The second silent video to be spliced is inserted into the start time of the silent interval to obtain a spliced video in a silent state.

[0076] Furthermore, when the video processing program in memory 130 is executed by the processor, it also performs the following steps:

[0077] If the video material is an action state video material, then the first action video to be spliced is determined based on the action state video material.

[0078] Furthermore, when the video processing program in memory 130 is executed by the processor, it also performs the following steps:

[0079] The silent frame and the first action video to be spliced are interpolated using the optical flow-based image interpolation algorithm IFRNet to obtain the second intermediate frame.

[0080] The second action video to be spliced is obtained by merging the silent frame, the second intermediate frame, and the first action video to be spliced.

[0081] Determine the action interval of the timeline;

[0082] The second action video to be spliced is inserted into the start time of the action interval to obtain a spliced video of the action state.

[0083] Furthermore, when the video processing program in memory 130 is executed by the processor, it also performs the following steps:

[0084] Based on the silent state video material, determine the silent frame and several first silent video segments of different durations to be spliced together;

[0085] The silent frame and each segment of the first silent video to be spliced with different durations are interpolated using the optical flow-based image interpolation algorithm IFRNet to obtain the first intermediate frame corresponding to the first silent video to be spliced with different durations.

[0086] The second silent video to be spliced is obtained by merging the silent frame, the first intermediate frame, and the corresponding first silent videos of different durations.

[0087] Furthermore, when the video processing program in memory 130 is executed by the processor, it also performs the following steps:

[0088] Obtain video footage in a silent state for a preset duration; and / or,

[0089] Obtain action state video footage whose start and end states are both silent.

[0090] This embodiment, through the above-described scheme, specifically involves acquiring video footage; determining silent frames and videos to be stitched based on the video footage; interpolating frames based on the silent frames and videos to be stitched to obtain interpolated videos to be stitched; and filling the timeline with the interpolated videos to be stitched to obtain the stitched video. By determining silent frames and videos to be stitched from the video footage, using the silent frames to perform frame interpolation smoothing on the videos to be stitched, and then filling the timeline with the interpolated videos to be stitched, a smoothed stitched video can be obtained, improving the overall smoothness of the video. Based on the scheme of this application, by constructing a set of 2D digital human video footage and applying the video processing method proposed in this application to this video footage, the problem of discontinuity at the splicing points in current digital human video clip splicing is effectively solved, making the transition of the stitched video images smoother and improving the smoothness of the stitched video.

[0091] Based on, but not limited to, the terminal device architecture described above, this application proposes method embodiments.

[0092] First Embodiment

[0093] The execution subject of the method in this application embodiment can be a video processing device, a video processing terminal device, or a server. This embodiment takes a video processing device as an example, which can be integrated into terminal devices such as smartphones and tablets with data processing functions.

[0094] The solution in this embodiment mainly achieves smoothing of spliced videos, especially 2D digital human spliced videos, and improves the overall smoothness of the video.

[0095] Refer to Figure 3, which is a flowchart illustrating a first exemplary embodiment of the video processing method of this application. The video processing method includes:

[0096] Step S10: Obtain video footage.

[0097] In this embodiment, the video processing device first acquires video footage, which refers to video containing the silent state and / or motion state of the 2D digital human. Optionally, the method for acquiring video footage can be reading video footage pre-stored in a local storage unit, receiving video footage sent by an external device, capturing video footage with a recording device, or creating and rendering video footage using an AI drawing tool, etc. In this embodiment, there is no specific limitation on the number of video footage.

[0098] Step S20: Determine the silent frames and the video to be spliced based on the video material.

[0099] In this embodiment, image frames meeting the requirements are selected as silent frames based on the acquired video footage, and video segments meeting the requirements are selected as the videos to be stitched together. Optionally, for the determination of silent frames, image frames with neutral poses, symmetrical facial features, and calm and gentle expressions of the digital human can be selected from the video footage. Optionally, for the determination of the segments to be stitched together, video segments of a specific duration or video segments showing specific actions can be selected from the video footage. In this embodiment, there is no specific limitation on the number of videos to be stitched together.

[0100] Specifically, the determination of the segments to be spliced begins with identifying the type of video footage obtained. If the video footage is in a silent state, meaning the digital human's actions are relatively still, then a silent video segment of a specific duration is selected from the video footage, or a silent video segment of a specific duration is edited from the video footage as the video to be spliced. If the video footage is in an action state, meaning the digital human exhibits specific actions, then action video segments that meet the requirements are selected from the video footage as the video to be spliced, such as speaking, smiling, or nodding.
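The type-dependent selection described above can be sketched as follows. This is only an illustration, not the implementation of this application: the `Material` class, the `kind` labels, the `fps` value, and the choice to cut a silent clip from the start of the material are all assumptions made for the example, and frames are represented by string labels rather than images.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Material:
    """Hypothetical container for one piece of video material."""
    kind: str            # "silent" or "action" (labels assumed for this sketch)
    frames: List[str]    # string labels stand in for decoded image frames
    fps: int = 25        # assumed frame rate


def select_clip(
    material: Material,
    duration_s: Optional[float] = None,
    action_range: Optional[Tuple[int, int]] = None,
) -> List[str]:
    """Pick the video to be spliced according to the material type."""
    if material.kind == "silent":
        # Silent material: cut a segment of a specific duration.
        if duration_s is None:
            raise ValueError("silent material needs a target duration")
        n = round(duration_s * material.fps)
        if not 0 < n <= len(material.frames):
            raise ValueError("requested duration does not fit the material")
        return material.frames[:n]
    if material.kind == "action":
        # Action material: select the frames that show the wanted action.
        if action_range is None:
            raise ValueError("action material needs a frame range")
        start, end = action_range
        if not 0 <= start < end <= len(material.frames):
            raise ValueError("action range is outside the material")
        return material.frames[start:end]
    raise ValueError(f"unknown material kind: {material.kind!r}")


silent = Material("silent", [f"s{i}" for i in range(100)])
action = Material("action", [f"a{i}" for i in range(60)])
silent_clip = select_clip(silent, duration_s=2.0)        # 2 s at 25 fps
nod_clip = select_clip(action, action_range=(10, 40))    # e.g. a nod
```

In practice the action range would come from annotation or detection of the action; here it is passed in by hand.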

[0101] Step S30: Interpolate frames based on the silent frame and the video to be spliced to obtain the spliced video after interpolation.

[0102] In this embodiment, frame interpolation smoothing is performed based on the determined silent frames and the video to be spliced to obtain the spliced video after frame interpolation. Specifically, firstly, the silent frames are used to perform frame interpolation smoothing on the starting frame of the video to be spliced to obtain the intermediate frame corresponding to the starting frame. Then, the silent frames, intermediate frames, and the starting frame of the video to be spliced are merged sequentially to complete the smoothing of the starting frame of the video to be spliced. Next, the silent frames are used to perform frame interpolation smoothing on the ending frame of the video to be spliced to obtain the intermediate frame corresponding to the ending frame. Then, the ending frame, intermediate frames, and silent frames of the video to be spliced are merged sequentially to complete the smoothing of the ending frame of the video to be spliced, that is, to merge to obtain the spliced video after frame interpolation.
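The merge order in step S30 can be illustrated with a small sketch. It makes several assumptions: frames are flat lists of pixel intensities, the number of intermediate frames `n` is a free parameter, and a plain linear blend is used as a stand-in for the interpolation model. The application itself names the optical flow-based algorithm IFRNet for this step; the blend below is not IFRNet and is only there to keep the example self-contained.

```python
from typing import List

Frame = List[float]  # assumed representation: flattened pixel intensities


def blend(a: Frame, b: Frame, t: float) -> Frame:
    """Linear blend between two frames (stand-in for a learned interpolator)."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def intermediate_frames(a: Frame, b: Frame, n: int) -> List[Frame]:
    """Return n frames evenly spaced strictly between a and b."""
    return [blend(a, b, k / (n + 1)) for k in range(1, n + 1)]


def smooth_clip(silent: Frame, clip: List[Frame], n: int) -> List[Frame]:
    """Silent frame, head intermediates, clip, tail intermediates, silent frame."""
    if not clip:
        raise ValueError("the video to be spliced must contain frames")
    head = intermediate_frames(silent, clip[0], n)    # silent -> starting frame
    tail = intermediate_frames(clip[-1], silent, n)   # ending frame -> silent
    return [silent] + head + clip + tail + [silent]


silent_frame = [0.0]
clip = [[4.0], [8.0]]
smoothed = smooth_clip(silent_frame, clip, n=3)
```

With these one-pixel frames, the result ramps from the silent value up to the clip's first frame and back down from its last frame, which is the smoothing effect the step is after.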

[0103] Step S40: Fill the timeline with the interpolated video to be spliced to obtain the spliced video.

[0104] In this embodiment, for a given timeline, the interpolated video to be stitched is filled into the timeline to obtain the stitched video. Specifically, the timeline and the start time of the interpolated video to be stitched on that timeline are determined, and the interpolated video to be stitched is inserted into the corresponding start time on the timeline.

[0105] More specifically, if there are multiple interpolated video segments to be spliced, the timeline and the start time of each interpolated video segment on the timeline are determined according to the splicing order of the interpolated video segments. Each interpolated video segment is then inserted into the corresponding start time on the timeline in the splicing order.
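A minimal sketch of this filling step, assuming a constant frame rate and clips that are placed back to back in splicing order (the function name `fill_timeline` and the parameters `fps` and `t0` are hypothetical):

```python
from typing import List, Sequence, Tuple


def fill_timeline(
    clips: Sequence[Sequence[str]], fps: int, t0: float = 0.0
) -> Tuple[List[float], List[str]]:
    """Return each clip's start time (seconds) and the concatenated frames."""
    starts: List[float] = []
    timeline: List[str] = []
    cursor = 0  # position on the timeline, counted in frames
    for clip in clips:
        starts.append(t0 + cursor / fps)  # start time of this clip
        timeline.extend(clip)             # insert the clip at that start time
        cursor += len(clip)
    return starts, timeline


clips = [["a"] * 50, ["b"] * 25]
starts, timeline = fill_timeline(clips, fps=25)
```

Counting the cursor in frames rather than seconds avoids accumulating floating-point error over long timelines.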

[0106] This embodiment, through the above-described scheme, specifically involves acquiring video footage; determining silent frames and videos to be stitched based on the video footage; interpolating frames based on the silent frames and videos to be stitched to obtain interpolated videos to be stitched; and filling the timeline with the interpolated videos to be stitched to obtain the stitched video. By determining silent frames and videos to be stitched from the video footage, using the silent frames to perform frame interpolation smoothing on the videos to be stitched, and then filling the timeline with the interpolated videos to be stitched, a smoothed stitched video can be obtained, improving the overall smoothness of the video. Based on the scheme of this application, by constructing a set of 2D digital human video footage and applying the video processing method proposed in this application to this video footage, the problem of discontinuity at the splicing points in current digital human video clip splicing is effectively solved, making the transition of the stitched video images smoother and improving the smoothness of the stitched video.

[0107] Second Embodiment

[0108] Furthermore, based on the first embodiment described above, in this embodiment, step S20, determining the silent frame and the video to be spliced based on the video material, may include:

[0109] Step S201: Determine the silent frame based on the video material;

[0110] Step S202: Determine the type of the video material;

[0111] Step S203: If the type of the video material is a silent state video material, then determine the first silent video to be spliced based on the silent state video material.

[0112] Specifically, image frames that meet the requirements are selected from the acquired video footage and determined as silent frames. The type of the acquired video footage is then determined. If the video footage is silent state video footage, that is, the digital human's movements in the video are relatively still, then a silent video segment that meets the requirements is selected from the silent state video footage and determined as the first silent video to be spliced.

[0113] Optionally, in this embodiment, step S30, which involves interpolating frames based on the silent frame and the video to be stitched together to obtain the interpolated video, may include:

[0114] Step S301: Interpolate the silent frame and the first silent video to be stitched together using the optical flow-based image interpolation algorithm IFRNet to obtain the first intermediate frame;

[0115] Step S302: The second silent video to be spliced is obtained by merging the silent frame, the first intermediate frame and the first silent video to be spliced.

[0116] Specifically, for the determined silent frame and the first silent video to be stitched together, the optical flow-based image interpolation algorithm IFRNet is used to perform frame interpolation smoothing processing on the silent frame and the first silent video to be stitched together to obtain the first intermediate frame. Based on the determined silent frame, the first intermediate frame, and the first silent video to be stitched together, they are sequentially merged to form the second silent video to be stitched together.

[0117] More specifically, refer to Figure 4, which is a schematic diagram of silent video smoothing in the second exemplary embodiment of the video processing method of this application. Image frames with a neutral digital human pose, well-defined facial features, and a calm and gentle expression are selected from the silent video footage and defined as the silent frame F_S. A video clip in which the digital human's movements are relatively static is selected as the first silent video to be spliced, V_S. First, the optical flow-based image frame interpolation algorithm IFRNet is used to interpolate between the silent frame F_S and the starting frame of the first silent video to be spliced V_S, obtaining the first intermediate frames of the starting frame, F_1a, F_2a, ..., F_na. Then the silent frame F_S, the first intermediate frames of the starting frame F_1a, F_2a, ..., F_na, and the starting frame of V_S are merged sequentially to complete the smoothing of the starting frame of V_S. Next, in the same way, IFRNet is used to interpolate between the silent frame F_S and the terminating frame of V_S, obtaining the first intermediate frames of the terminating frame, F_1b, F_2b, ..., F_nb. Then the terminating frame of V_S, the first intermediate frames of the terminating frame F_1b, F_2b, ..., F_nb, and the silent frame F_S are merged sequentially to complete the smoothing of the terminating frame of V_S. The merged result is the second silent video to be spliced.
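To make the resulting frame order concrete, the sketch below lists the frames of the second silent video to be spliced by label only, writing the silent frame as F_S, the first silent video to be spliced as V_S, and the intermediate frames as F_1a ... F_na and F_1b ... F_nb. The helper names and the frame rate are assumptions for illustration; no image data or interpolation is involved.

```python
from typing import List


def smoothed_labels(clip_len: int, n: int) -> List[str]:
    """Frame labels of the merged clip, in playback order."""
    head = [f"F_{k}a" for k in range(1, n + 1)]       # F_S -> starting frame
    body = [f"V_S[{i}]" for i in range(clip_len)]     # original clip frames
    tail = [f"F_{k}b" for k in range(1, n + 1)]       # terminating frame -> F_S
    return ["F_S"] + head + body + tail + ["F_S"]


def smoothed_duration(clip_len: int, n: int, fps: int) -> float:
    """Duration in seconds: clip frames + 2n intermediates + 2 silent frames."""
    return (clip_len + 2 * n + 2) / fps


order = smoothed_labels(clip_len=3, n=2)
seconds = smoothed_duration(clip_len=46, n=1, fps=25)
```

The frame count matters when the clip is later placed on a timeline, since smoothing lengthens every clip by 2n + 2 frames.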

[0118] Furthermore, refer to Figure 5, which is a schematic diagram of the specific process of filling the timeline with the interpolated video to be spliced to obtain the spliced video, in the second exemplary embodiment of the video processing method of this application. In this embodiment, step S40, which involves filling the timeline with the interpolated video to be spliced to obtain the spliced video, may include:

[0119] Step S401: Determine the silent interval of the time axis;

[0120] Step S402: Insert the second silent video to be spliced into the start time of the silent interval to obtain a spliced video in a silent state.

[0121] Specifically, for the second silent video to be stitched after frame interpolation, the silent interval on the timeline is first determined, where the silent interval includes the start and end times of the silent video. Then, the second silent video to be stitched is inserted into the start time of the silent interval to obtain the stitched video in a silent state.

[0122] More specifically, first, the silent interval of the timeline is determined. If multiple interpolated second silent videos need to be spliced within this silent interval, the start time of each interpolated second silent video within the silent interval is determined according to the splicing order of the interpolated second silent videos. Then, each interpolated second silent video is inserted into the corresponding start time within the silent interval in the splicing order to obtain the spliced video in a silent state.
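One way to compute the start time of each second silent video inside a silent interval is sketched below. It assumes the interval is given as a (start, end) pair in seconds, that clips are described by their frame counts, and that a clip overrunning the interval is an error; the text above does not specify how that case is handled, so this behaviour is an assumption.

```python
from typing import List, Sequence, Tuple


def place_in_interval(
    interval: Tuple[float, float], clip_lengths: Sequence[int], fps: int
) -> List[Tuple[float, float]]:
    """Return (start, end) times for each clip, placed in splicing order."""
    start_s, end_s = interval
    placements: List[Tuple[float, float]] = []
    cursor = start_s  # first clip starts at the start of the silent interval
    for n_frames in clip_lengths:
        duration = n_frames / fps
        if cursor + duration > end_s + 1e-9:  # small tolerance for float error
            raise ValueError("clips overrun the silent interval")
        placements.append((cursor, cursor + duration))
        cursor += duration
    return placements


slots = place_in_interval((4.0, 10.0), [50, 75], fps=25)
```

Here a silent interval from 4 s to 10 s receives a 2 s clip followed by a 3 s clip, leaving 1 s unfilled at the end.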

[0123] This embodiment, through the above-described scheme, specifically involves acquiring video footage; determining silent frames and videos to be stitched based on the video footage; interpolating frames based on the silent frames and videos to be stitched to obtain interpolated videos to be stitched; and filling the timeline with the interpolated videos to be stitched to obtain the stitched video. By determining silent frames and videos to be stitched from the video footage, using the silent frames to perform frame interpolation smoothing on the videos to be stitched, and then filling the timeline with the interpolated videos to be stitched, a smoothed stitched video can be obtained, improving the overall smoothness of the video. Based on the scheme of this application, by constructing a set of 2D digital human video footage and applying the video processing method proposed in this application to this video footage, the problem of discontinuity at the splicing points in current digital human video clip splicing is effectively solved, making the transition of the stitched video images smoother and improving the smoothness of the stitched video.

[0124] Third Embodiment

[0125] Furthermore, based on the second embodiment described above, in this embodiment, after determining the type of the video material in step S202, the following may be included:

[0126] Step S204: If the type of the video material is action state video material, then determine the first action video to be spliced based on the action state video material.

[0127] Specifically, the type of the acquired video material is determined. If the type of video material is action state video material, that is, the digital human in the video has specific action performance, then the action video segment that meets the requirements is selected as the first action video to be spliced.

[0128] It should be noted that, to keep the beginning and end of the interpolated silent videos consistent with those of the interpolated action videos, the same silent frame is used for interpolation in both cases.

[0129] Optionally, in this embodiment, step S30, which involves interpolating frames based on the silent frame and the video to be stitched together to obtain the interpolated video, may include:

[0130] Step S303: The silent frame and the first action video to be spliced are interpolated using the optical flow-based image interpolation algorithm IFRNet to obtain the second intermediate frame.

[0131] Step S304: The second action video to be spliced is obtained by merging the silent frame, the second intermediate frame and the first action video to be spliced.

[0132] Specifically, for the determined silent frame and the first action video to be stitched together, the optical flow-based image interpolation algorithm IFRNet is used to perform frame interpolation smoothing on the silent frame and the first action video to be stitched together, resulting in the second intermediate frame. Based on the determined silent frame, the second intermediate frame, and the first action video to be stitched together, they are sequentially merged to form the second action video to be stitched together.

[0133] More specifically, refer to Figure 6, which is a schematic diagram of action video smoothing in the third exemplary embodiment of the video processing method of this application. An action video segment in which the digital human exhibits a specific action is selected from the action state video material and determined as the first action video to be spliced, $V_A$. First, the optical flow-based image frame interpolation algorithm IFRNet is used to interpolate between the silent frame $F_S$ and the starting frame of $V_A$, yielding the second intermediate frames of the starting frame, $F_{1c}, F_{2c}, \ldots, F_{nc}$. The silent frame $F_S$, the second intermediate frames $F_{1c}, F_{2c}, \ldots, F_{nc}$, and the starting frame of $V_A$ are then merged in sequence, which completes the smoothing of the starting frame of $V_A$. Similarly, IFRNet is used to interpolate between the terminating frame of $V_A$ and the silent frame $F_S$, yielding the second intermediate frames of the terminating frame, $F_{1d}, F_{2d}, \ldots, F_{nd}$. The terminating frame of $V_A$, the second intermediate frames $F_{1d}, F_{2d}, \ldots, F_{nd}$, and the silent frame $F_S$ are then merged in sequence, which completes the smoothing of the terminating frame of $V_A$. The merged frames constitute the second action video to be spliced.
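The merge order described above can be sketched as follows. This is an illustrative sketch only: the pixel-wise blend `linear_blend` is a placeholder standing in for IFRNet and is not the optical flow-based interpolation of this application, a frame is simplified to a flat list of numbers, and the names `smooth_clip` and `linear_blend` are hypothetical.

```python
from typing import Callable, List, Sequence

Frame = List[float]  # simplified stand-in for an image; a real pipeline would use arrays


def linear_blend(a: Frame, b: Frame, t: float) -> Frame:
    """Placeholder interpolator (NOT IFRNet): pixel-wise blend at position t in (0, 1)."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def smooth_clip(
    silent: Frame,
    clip: Sequence[Frame],
    n: int,
    interpolate: Callable[[Frame, Frame, float], Frame] = linear_blend,
) -> List[Frame]:
    """Return F_S, F_1c..F_nc, the clip frames, F_1d..F_nd, F_S, in that order."""
    if not clip:
        raise ValueError("clip must contain at least one frame")
    # Evenly spaced positions strictly between the two end frames.
    steps = [k / (n + 1) for k in range(1, n + 1)]
    head = [interpolate(silent, clip[0], t) for t in steps]   # F_1c .. F_nc
    tail = [interpolate(clip[-1], silent, t) for t in steps]  # F_1d .. F_nd
    return [silent, *head, *clip, *tail, silent]
```

Passing the interpolator as a parameter keeps the merge logic independent of the interpolation model, so a wrapper around an actual IFRNet inference call could be substituted for `linear_blend`.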

[0134] Furthermore, refer to Figure 7, which is a schematic flowchart of filling the timeline with the interpolated video to be spliced to obtain the spliced video in the third exemplary embodiment of the video processing method of this application. In this embodiment, step S40, filling the timeline with the interpolated video to be spliced to obtain the spliced video, may include:

[0135] Step S403: Determine the action interval of the time axis;

[0136] Step S404: Insert the second action video to be spliced at the start time of the action interval to obtain a spliced video in the action state.

[0137] Specifically, for the interpolated second action video to be spliced, the action interval on the timeline is first determined, where the action interval comprises the start time and end time of the action video. The interpolated second action video to be spliced is then inserted at the start time of the action interval to obtain the spliced video in the action state.

[0138] More specifically, if there are multiple interpolated second action videos to be spliced, the action interval corresponding to each of them is determined, and each interpolated second action video to be spliced is inserted in turn at the start time of its corresponding action interval to obtain the spliced video in the action state.
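The insertion of several action videos into their respective action intervals can be illustrated with the sketch below. It is not the implementation of this application: the timeline is modeled as a list of frame slots, frames are represented by string labels, and the function name `insert_action_clips` and the overlap check are assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def insert_action_clips(
    total_frames: int,
    frate: int,
    clips: Sequence[Tuple[float, Sequence[str]]],
) -> List[Optional[str]]:
    """Write each clip into the timeline at the start time of its action interval.

    `clips` holds (start_seconds, frames) pairs. Slots not covered by any
    action clip stay None and remain available for silent videos.
    """
    timeline: List[Optional[str]] = [None] * total_frames
    for start_s, frames in sorted(clips, key=lambda c: c[0]):
        first = round(start_s * frate)  # frame index of the interval start
        last = first + len(frames)
        if first < 0 or last > total_frames:
            raise ValueError("clip extends beyond the timeline")
        if any(slot is not None for slot in timeline[first:last]):
            raise ValueError("action intervals overlap")
        timeline[first:last] = frames
    return timeline
```

With a frame rate of 5 frames per second, a two-frame clip starting at 0.4 s occupies slots 2 and 3, and a three-frame clip starting at 1.2 s occupies slots 6 to 8.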

[0139] This embodiment, through the above-described scheme, specifically involves acquiring video footage; determining silent frames and videos to be stitched based on the video footage; interpolating frames based on the silent frames and videos to be stitched to obtain interpolated videos to be stitched; and filling the timeline with the interpolated videos to be stitched to obtain the stitched video. By determining silent frames and videos to be stitched from the video footage, using the silent frames to perform frame interpolation smoothing on the videos to be stitched, and then filling the timeline with the interpolated videos to be stitched, a smoothed stitched video can be obtained, improving the overall smoothness of the video. Based on the scheme of this application, by constructing a set of 2D digital human video footage and applying the video processing method proposed in this application to this video footage, the problem of discontinuity at the splicing points in current digital human video clip splicing is effectively solved, making the transition of the stitched video images smoother and improving the smoothness of the stitched video.

[0140] Fourth embodiment

[0141] Furthermore, based on the second or third embodiment described above, in this embodiment, the step of determining the first silent video to be spliced based on the silent video material may include:

[0142] Step S2011: Determine several segments of first silent video to be spliced based on the silent state video material.

[0143] Specifically, if the video material is silent state video material, several silent video segments of different durations are selected from it as the first silent videos to be spliced. In this embodiment, at least two silent video segments of different durations are selected. For example, referring to Figure 8, several silent video segments with durations slightly less than 2 s, 4 s, and 8 s are selected as the first silent videos to be spliced; these segments are then merged with the first intermediate frames and the silent frame to obtain several interpolated silent video segments with durations of 2 s, 4 s, and 8 s.
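One way to reason about how a segment "slightly less than" 2 s, 4 s, or 8 s reaches its exact target duration is to count the frames that must be added. The sketch below is an assumption-laden illustration, not a rule stated in this application: it assumes one silent frame at each end of the merged video and an equal number of intermediate frames at the start and at the end. The function name `intermediate_frames_per_end` is hypothetical.

```python
def intermediate_frames_per_end(raw_frames: int, target_seconds: float, frate: int) -> int:
    """Number of intermediate frames needed at each end of a raw silent clip.

    Assumed layout (not specified by the source):
    1 silent frame + n intermediate + raw clip + n intermediate + 1 silent frame.
    """
    target_frames = round(target_seconds * frate)
    spare = target_frames - raw_frames - 2  # subtract the two silent frames
    if spare < 0 or spare % 2:
        raise ValueError("raw clip length is incompatible with the target duration")
    return spare // 2
```

For example, at 25 frames per second a 40-frame clip padded to 2 s (50 frames) would need 4 intermediate frames at each end under these assumptions.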

[0144] Optionally, in this embodiment, step S301 above, which involves interpolating the silent frame and the first silent video to be stitched together using the optical flow-based image interpolation algorithm IFRNet to obtain the first intermediate frame, may include:

[0145] Step S3011: The silent frame and each segment of the first silent video to be spliced with different durations are interpolated using the optical flow-based image interpolation algorithm IFRNet to obtain the first intermediate frame corresponding to each segment of the first silent video to be spliced with different durations.

[0146] Specifically, for a given silent frame and several segments of first silent video to be spliced with different durations, the optical flow-based image interpolation algorithm IFRNet is used to perform frame interpolation smoothing on the silent frame and each segment of first silent video to be spliced with different durations, to obtain the first intermediate frame corresponding to each segment of first silent video to be spliced with different durations.

[0147] Optionally, in this embodiment, step S302 above, which involves merging the silent frame, the first intermediate frame, and the first silent video to be spliced to obtain the second silent video, may include:

[0148] Step S3021: The second silent video to be spliced is obtained by merging the silent frame, the first intermediate frame and the corresponding first silent videos of different durations.

[0149] Specifically, for each first silent video of different durations, the determined silent frames, the corresponding first intermediate frames, and the first silent video to be spliced are sequentially merged into second silent videos of different durations.

[0150] Since the relative silence state of a digital human varies in micro-expressions and micro-movements across different durations of silent videos, a method can be used to increase the diversity of the spliced silent videos. This method involves determining several segments of silent videos of different durations to be spliced based on the silent video footage, performing frame interpolation smoothing on the silent frames and the segments of silent videos of different durations, and then filling the timeline with the interpolated silent videos of different durations to obtain the spliced silent video.

[0151] Optionally, refer to Figure 9, which is a schematic diagram of the silent interval of the time axis in the fourth exemplary embodiment of the video processing method of this application. Step S401 above, determining the silent interval of the time axis, may include:

[0152] Step S4011: Determine the silence duration;

[0153] Step S4012: Determine, based on the silence duration, the number of silent frames and the number of second silent videos to be spliced of each duration;

[0154] Step S4013: Obtain the corresponding numbers of silent frames and of second silent videos to be spliced of different durations, and fill them into the time axis to obtain the silent interval.

[0155] Specifically, to determine the silent interval, the silence duration $L_S$ is first determined. Based on $L_S$, the number of silent frames and the number of second silent videos to be spliced of each duration are determined. The corresponding number of silent frames and the corresponding numbers of second silent videos to be spliced of different durations are then obtained and combined on the timeline to obtain the silent interval.

[0156] For example, for the several second silent video segments with durations of 2s, 4s, and 8s selected in this embodiment, the number of still frames and the number of second silent video segments with different durations can be determined by the following formulas 1-5:

[0157] The relationship between the silence duration $L_S$, the number of silent frames, and the numbers of second silent videos to be spliced of different durations is as follows:

[0158] $L_S = a \times 8 + b \times 4 + c \times 2 + d / \mathit{frate}$ (1)

[0159] where

[0160] $a = L_S \mathbin{//} 8$ (2)

[0161] $b = (L_S - a \times 8) \mathbin{//} 4$ (3)

[0162] $c = (L_S - a \times 8 - b \times 4) \mathbin{//} 2$ (4)

[0163] $d = (L_S - a \times 8 - b \times 4 - c \times 2) \times \mathit{frate}$ (5)

[0164] In the above formulas, $a$ is the number of $V_{S8}$, where $V_{S8}$ is the second silent video to be spliced with a duration of 8 s; $b$ is the number of $V_{S4}$, where $V_{S4}$ is the second silent video to be spliced with a duration of 4 s; $c$ is the number of $V_{S2}$, where $V_{S2}$ is the second silent video to be spliced with a duration of 2 s; $d$ is the number of silent frames $F_S$; $\times$ denotes multiplication, $//$ denotes integer division, and $\mathit{frate}$ denotes the frame rate.
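Formulas (1) to (5) amount to a greedy decomposition of the silence duration into 8 s, 4 s, and 2 s videos plus leftover silent frames. The following sketch shows one way to compute the counts; the function names are hypothetical, and the rounding of $d$ to the nearest whole frame is an assumption added here to absorb floating-point error, since formula (5) itself does not specify rounding.

```python
from typing import Tuple


def decompose_silence(duration_s: float, frate: int) -> Tuple[int, int, int, int]:
    """Return (a, b, c, d) per formulas (2)-(5).

    a, b, c: counts of 8 s, 4 s and 2 s silent videos; d: count of silent frames.
    """
    a = int(duration_s // 8)                  # formula (2)
    remainder = duration_s - a * 8
    b = int(remainder // 4)                   # formula (3)
    remainder -= b * 4
    c = int(remainder // 2)                   # formula (4)
    remainder -= c * 2
    d = round(remainder * frate)              # formula (5), rounded (assumption)
    return a, b, c, d


def recompose(a: int, b: int, c: int, d: int, frate: int) -> float:
    """Formula (1): rebuild the silence duration from the counts."""
    return a * 8 + b * 4 + c * 2 + d / frate
```

As a worked example, a silence duration of 15.2 s at 25 frames per second gives one 8 s video, one 4 s video, one 2 s video, and 30 silent frames, and formula (1) returns 8 + 4 + 2 + 30/25 = 15.2 s.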

[0165] This embodiment, through the above-described scheme, specifically involves acquiring video footage; determining silent frames and videos to be stitched based on the video footage; interpolating frames based on the silent frames and videos to be stitched to obtain interpolated videos to be stitched; and filling the timeline with the interpolated videos to be stitched to obtain the stitched video. By determining silent frames and videos to be stitched from the video footage, using the silent frames to perform frame interpolation smoothing on the videos to be stitched, and then filling the timeline with the interpolated videos to be stitched, a smoothed stitched video can be obtained, improving the overall smoothness of the video. Based on the scheme of this application, by constructing a set of 2D digital human video footage and applying the video processing method proposed in this application to this video footage, the problem of discontinuity at the splicing points in current digital human video clip splicing is effectively solved, making the transition of the stitched video images smoother and improving the smoothness of the stitched video.

[0166] Fifth Embodiment

[0167] Furthermore, based on the above embodiments, in this embodiment, step S10, obtaining video materials, may include:

[0168] Step S101: Obtain video footage of a preset duration in a silent state; and / or,

[0169] Step S102: Obtain video footage of motion states where the start and end states are silent.

[0170] Specifically, the video processing device acquires silent state video footage of a preset duration, and / or acquires motion state video footage of the digital human whose start and end states are silent. Optionally, to maintain a smooth and fluid viewing experience, motion state video footage of the digital human whose movements are uniform and whose start and end states are silent is selected.

[0171] This embodiment takes as an example the case where both silent state video material of a preset duration and action state video material whose start and end states are silent are obtained; in other embodiments, only the silent state video material of a preset duration, or only the action state video material whose start and end states are silent, may be obtained.

[0172] More specifically, refer to Figure 10, which is a schematic diagram comparing the splicing of video segments before and after smoothing in the fifth exemplary embodiment of the video processing method of this application. The video processing device first acquires silent state video footage of a preset duration and motion state video footage of the digital human whose start and end states are silent.

[0173] Then, based on the acquired video footage, silent frames and videos to be spliced are determined. For silent frames, images of the digital human with a neutral posture, well-defined facial features, and a calm and gentle expression are selected from the silent video footage. For the segments to be spliced, if the video footage is action-oriented (i.e., the digital human exhibits specific actions), then action video segments meeting the requirements are selected as the first action video to be spliced, such as speaking, smiling, or nodding. If the video footage is silent (i.e., the digital human's actions are relatively silent), then several silent video segments of a specific duration are selected from the silent video footage, or several silent video segments of a specific duration are edited and produced from the silent video footage as the first silent video to be spliced. In this embodiment, the number of several silent video segments of a specific duration is no less than two.

[0174] Next, frame interpolation and smoothing processing is performed on the determined silent frames and the videos to be stitched together to obtain the interpolated videos to be stitched together. Specifically, for the determined silent frames and the first action video to be stitched together, the optical flow-based image interpolation algorithm IFRNet is used to perform frame interpolation and smoothing processing on the silent frames and the first action video to be stitched together to obtain the second intermediate frame. Based on the determined silent frames, the second intermediate frame, and the first action video to be stitched together, they are sequentially merged to form the second action video to be stitched together. For the determined silent frames and several segments of the first silent video to be stitched together with different durations, the optical flow-based image interpolation algorithm IFRNet is used to perform frame interpolation and smoothing processing on the silent frames and each segment of the first silent video to be stitched together with different durations to obtain the first intermediate frame corresponding to each segment of the first silent video to be stitched together with different durations. Based on the silent frames, the first intermediate frames, and the corresponding segments of the first silent video to be stitched together with different durations, the second silent video to be stitched together is obtained.

[0175] Next, the timeline is filled with the interpolated video to be stitched together to obtain the stitched video. Specifically, for the second interpolated action video to be stitched together, the action interval of the timeline is first determined, where the action interval includes the start and end times of the action video. Then, the interpolated second action video to be stitched together is inserted into the start time of the action interval to obtain the stitched video in action state. For the remaining timeline and the second interpolated silent video to be stitched together, the silent interval of the timeline is first determined, where the silent interval includes the start and end times of the silent video. Then, the interpolated second silent video to be stitched together is inserted into the start time of the silent interval to obtain the stitched video in silent state. Thus, the smoothed stitched video is obtained.

[0176] It should be noted that, to ensure the smoothness of the overall video, the start time of each interpolated video to be spliced, other than the first segment at the beginning of the timeline (generally an interpolated silent video to be spliced), is set to the end time of the preceding interpolated video to be spliced.
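The relationship between action intervals and the remaining timeline can be sketched as follows. This is an illustrative sketch under assumptions, not the implementation of this application: intervals are given as (start, end) pairs in seconds, and the function name `silent_gaps` is hypothetical. Each silent interval returned starts exactly at the end time of the preceding segment, which matches the start-time rule stated above.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds)


def silent_gaps(total_s: float, action_intervals: Sequence[Interval]) -> List[Interval]:
    """Return the parts of the timeline not covered by action intervals."""
    gaps: List[Interval] = []
    cursor = 0.0
    for start, end in sorted(action_intervals):
        if start < cursor:
            raise ValueError("action intervals overlap")
        if end > total_s or end < start:
            raise ValueError("action interval is invalid for this timeline")
        if start > cursor:
            gaps.append((cursor, start))  # silence before this action
        cursor = end                      # next segment starts where this one ends
    if cursor < total_s:
        gaps.append((cursor, total_s))    # trailing silence
    return gaps
```

For a 30 s timeline with action intervals at 6 s to 10 s and 18 s to 21 s, the silent intervals are 0 s to 6 s, 10 s to 18 s, and 21 s to 30 s; each could then be filled with interpolated silent videos.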

[0177] This embodiment, through the above-described scheme, specifically involves acquiring video footage; determining silent frames and videos to be stitched based on the video footage; interpolating frames based on the silent frames and videos to be stitched to obtain interpolated videos to be stitched; and filling the timeline with the interpolated videos to be stitched to obtain the stitched video. By determining silent frames and videos to be stitched from the video footage, using the silent frames to perform frame interpolation smoothing on the videos to be stitched, and then filling the timeline with the interpolated videos to be stitched, a smoothed stitched video can be obtained, improving the overall smoothness of the video. Based on the scheme of this application, by constructing a set of 2D digital human video footage and applying the video processing method proposed in this application to this video footage, the problem of discontinuity at the splicing points in current digital human video clip splicing is effectively solved, making the transition of the stitched video images smoother and improving the smoothness of the stitched video.

[0178] Compared to existing technologies, i.e., spliced videos before smoothing, the smoothed video images obtained through the solution in this embodiment have a smoother transition and a higher overall video fluency.

[0179] Furthermore, embodiments of this application also propose a video processing apparatus, the video processing apparatus comprising:

[0180] The acquisition module is used to acquire video footage;

[0181] The determination module is used to determine the silent frames and the video to be spliced based on the video material;

[0182] The frame interpolation module is used to interpolate frames based on the silent frames and the video to be spliced, so as to obtain the interpolated video to be spliced;

[0183] The splicing module is used to fill the timeline with the interpolated video to obtain the spliced video.
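The four modules listed above can be pictured as a simple pipeline. The sketch below is only a structural illustration: the class name `VideoProcessingApparatus`, the attribute names, and the callable signatures are hypothetical, and each module is supplied as a plain function rather than any concrete implementation from this application.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class VideoProcessingApparatus:
    # Each field stands for one module; the signatures are illustrative assumptions.
    acquire: Callable[[], Any]                          # acquisition module
    determine: Callable[[Any], Tuple[Any, List[Any]]]   # determination module
    interpolate: Callable[[Any, List[Any]], List[Any]]  # frame interpolation module
    splice: Callable[[List[Any]], Any]                  # splicing module

    def run(self) -> Any:
        """Chain the modules: acquire, determine, interpolate, fill the timeline."""
        material = self.acquire()
        silent_frame, clips = self.determine(material)
        interpolated = self.interpolate(silent_frame, clips)
        return self.splice(interpolated)
```

Keeping the modules as injected callables means each stage can be replaced or tested on its own, for example by substituting a toy interpolator during testing.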

[0184] The principle and implementation process of video processing in this embodiment are explained in the above embodiments, and will not be repeated here.

[0185] Furthermore, this application also proposes a terminal device, which includes a memory, a processor, and a video processing program stored in the memory and executable on the processor. When the video processing program is executed by the processor, it implements the steps of the video processing method described above.

[0186] Since this video processing program employs all the technical solutions of all the foregoing embodiments when executed by the processor, it has at least all the beneficial effects brought about by all the technical solutions of all the foregoing embodiments, which will not be elaborated here.

[0187] Furthermore, embodiments of this application also propose a computer-readable storage medium storing a video processing program, which, when executed by a processor, implements the steps of the video processing method described above.

[0188] Since this video processing program employs all the technical solutions of all the foregoing embodiments when executed by the processor, it has at least all the beneficial effects brought about by all the technical solutions of all the foregoing embodiments, which will not be elaborated here.

[0189] Compared to existing technologies, the video processing method, apparatus, terminal device, and storage medium proposed in this application involve acquiring video footage; determining silent frames and videos to be stitched based on the video footage; interpolating frames based on the silent frames and videos to be stitched to obtain interpolated videos; and filling the timeline with the interpolated videos to obtain a stitched video. By determining silent frames and videos to be stitched from the video footage, using silent frames for smoothing the interpolated videos, and filling the timeline with the interpolated videos, a smoothed stitched video can be obtained, improving the overall smoothness of the video. Based on the solution of this application, by constructing a set of 2D digital human video footage and applying the video processing method proposed in this application to this video footage, the problem of discontinuity at the stitching points in current digital human video clip stitching is effectively solved, making the transition of the stitched video images smoother and improving the smoothness of the stitched video.

[0190] It should be noted that, in this document, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or system. Unless otherwise specified, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or system that includes that element.

[0191] Furthermore, if the embodiments of this invention involve descriptions such as "first" or "second," these descriptions are for descriptive purposes only and should not be construed as indicating or implying their relative importance or implicitly specifying the number of technical features indicated. Therefore, a feature defined with "first" or "second" may explicitly or implicitly include at least one of those features. Additionally, the technical solutions of the various embodiments can be combined with each other, but this must be based on the ability of those skilled in the art to implement them. If the combination of technical solutions is contradictory or impossible to implement, it should be considered that such a combination of technical solutions does not exist and is not within the scope of protection claimed by this invention.

[0192] The sequence numbers of the embodiments in this application are for descriptive purposes only and do not represent the superiority or inferiority of the embodiments.

[0193] Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus necessary general-purpose hardware platforms. Of course, they can also be implemented by hardware, but in many cases the former is a better implementation method. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product is stored in a storage medium (such as ROM / RAM, magnetic disk, optical disk) as described above, and includes several instructions to cause a terminal device (which may be a mobile phone, computer, server, controlled terminal, or network device, etc.) to execute the methods of each embodiment of this application.

[0194] The above are merely preferred embodiments of this application and do not limit the patent scope of this application. Any equivalent structural or procedural transformations made using the content of this application's specification and drawings, or direct or indirect applications in other related technical fields, are similarly included within the patent protection scope of this application.

Claims

1. A video processing method, characterized in that the video processing method comprises: acquiring video material; determining a silent frame and a video to be spliced based on the video material; performing frame interpolation based on the silent frame and the video to be spliced to obtain an interpolated video to be spliced; and filling a timeline with the interpolated video to be spliced to obtain a spliced video; wherein the step of determining the silent frame and the video to be spliced based on the video material comprises: determining the silent frame based on the video material, the silent frame being an image frame selected from silent state video material in which the digital human is in a neutral pose, has well-defined facial features, and has a calm and gentle expression; determining the type of the video material; and, if the video material is silent state video material, determining a first silent video to be spliced based on the silent state video material, the first silent video to be spliced being a segment of the silent state video material in which the digital human's movements are relatively still; the step of performing frame interpolation based on the silent frame and the video to be spliced to obtain the interpolated video to be spliced comprises: interpolating between the silent frame and the first silent video to be spliced using the optical flow-based image frame interpolation algorithm IFRNet to obtain a first intermediate frame; and merging the silent frame, the first intermediate frame, and the first silent video to be spliced to obtain a second silent video to be spliced; the step of filling the timeline with the interpolated video to be spliced to obtain the spliced video comprises: determining a silent interval of the timeline; and inserting the second silent video to be spliced at the start time of the silent interval to obtain a spliced video in the silent state; the step of determining the first silent video to be spliced based on the silent state video material comprises: determining, based on the silent state video material, several segments of first silent video to be spliced having different durations; the step of interpolating between the silent frame and the first silent video to be spliced using IFRNet to obtain the first intermediate frame comprises: interpolating between the silent frame and each segment of first silent video to be spliced of a different duration using IFRNet to obtain the first intermediate frame corresponding to each segment; and the step of merging the silent frame, the first intermediate frame, and the first silent video to be spliced to obtain the second silent video to be spliced comprises: merging the silent frame, the first intermediate frame, and the corresponding segment of first silent video to be spliced of each duration to obtain the second silent video to be spliced.

2. The video processing method as described in claim 1, characterized in that, after the step of determining the type of the video material, the method further comprises: if the video material is action state video material, determining a first action video to be spliced based on the action state video material.

3. The video processing method as described in claim 2, characterized in that the step of performing frame interpolation based on the silent frame and the video to be spliced to obtain the interpolated video to be spliced comprises: interpolating between the silent frame and the first action video to be spliced using the optical flow-based image frame interpolation algorithm IFRNet to obtain a second intermediate frame; and merging the silent frame, the second intermediate frame, and the first action video to be spliced to obtain a second action video to be spliced; and the step of filling the timeline with the interpolated video to be spliced to obtain the spliced video comprises: determining an action interval of the timeline; and inserting the second action video to be spliced at the start time of the action interval to obtain a spliced video in the action state.

4. The video processing method as described in claim 1, characterized in that the step of acquiring video material comprises: acquiring silent state video material of a preset duration; and/or acquiring action state video material whose start state and end state are silent.

5. A video processing apparatus, characterized in that the video processing apparatus comprises: an acquisition module configured to acquire video material; a determination module configured to determine a silent frame and a video to be spliced based on the video material; a frame interpolation module configured to perform frame interpolation based on the silent frame and the video to be spliced to obtain an interpolated video to be spliced; and a splicing module configured to fill a timeline with the interpolated video to be spliced to obtain a spliced video; wherein the determination module is further configured to: determine the silent frame based on the video material, the silent frame being an image frame selected from silent state video material in which the digital human is in a neutral pose, has well-defined facial features, and has a calm and gentle expression; determine the type of the video material; and, if the video material is silent state video material, determine a first silent video to be spliced based on the silent state video material, the first silent video to be spliced being a segment of the silent state video material in which the digital human's movements are relatively still; the frame interpolation module is configured to interpolate between the silent frame and the first silent video to be spliced using the optical flow-based image frame interpolation algorithm IFRNet to obtain a first intermediate frame, and a second silent video to be spliced is obtained by merging the silent frame, the first intermediate frame, and the first silent video to be spliced; the step of filling the timeline with the interpolated video to be spliced to obtain the spliced video comprises: determining a silent interval of the timeline; and inserting the second silent video to be spliced at the start time of the silent interval to obtain a spliced video in the silent state; the determination module is further configured to determine several segments of first silent video to be spliced based on the silent state video material; the step of interpolating between the silent frame and the first silent video to be spliced using IFRNet to obtain the first intermediate frame comprises: interpolating between the silent frame and each segment of first silent video to be spliced of a different duration using IFRNet to obtain the first intermediate frame corresponding to each segment; and the step of merging the silent frame, the first intermediate frame, and the first silent video to be spliced to obtain the second silent video to be spliced comprises: merging the silent frame, the first intermediate frame, and the corresponding segment of first silent video to be spliced of each duration to obtain the second silent video to be spliced.

6. A terminal device, characterized in that, The terminal device includes a memory, a processor, and a video processing program stored in the memory and executable on the processor. When the video processing program is executed by the processor, it implements the steps of the video processing method as described in any one of claims 1-4.

7. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores a video processing program, which, when executed by a processor, implements the steps of the video processing method as described in any one of claims 1-4.