A method for playing a real-time video stream supporting an SRT protocol

By setting a time base during video stream playback and scheduling based on the actual presentation timestamp (PTS) of each frame, the stuttering and latency issues caused by frame rate scheduling are resolved, achieving highly smooth and stable video playback, suitable for various video application scenarios.

CN122269079A · Pending · Publication Date: 2026-06-23 · SHENZHEN HECHENG VIDEO TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
SHENZHEN HECHENG VIDEO TECH CO LTD
Filing Date
2026-03-10
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

Existing video streaming methods are prone to stuttering, latency, or desynchronization issues under frame rate scheduling strategies, especially when there are differences in network latency and processing capabilities, which affects the smoothness and stability of video playback.

Method used

By setting a time base and scheduling according to the actual presentation timestamp (PTS) of each frame, the video frames are ensured to play at the predetermined absolute time. This includes mechanisms such as resetting the time base, discarding non-key frames, and segmented sleep. Combined with hardware-accelerated instruction sets and lock-free queue processing, accurate video frame rendering is achieved.

Benefits of technology

It significantly improves the accuracy and smoothness of video playback, reduces stuttering and latency, enhances the user viewing experience, adapts to different network environments and device performance, and is suitable for scenarios such as live streaming, video-on-demand, and video conferencing.



Abstract

The application belongs to the technical field of video processing, and provides a playing method of real-time video stream supporting SRT protocol, which comprises the following steps: when the first frame of the video stream is received, the difference between the current system time and the presentation timestamp corresponding to the first frame of the video stream is taken as a time reference; for each frame in the video stream, the presentation timestamp corresponding to each frame is added to the time reference to obtain the absolute target playing time corresponding to each frame; when the absolute target playing time arrives, the corresponding video frame is rendered. By setting the time reference and scheduling according to the actual presentation timestamp (PTS) of each frame, the application can effectively eliminate the deviation caused by frame rate calculation, ensure that each frame is played at its predetermined absolute time, and significantly improve the accuracy of playing.

Description

Technical Field

[0001] This invention belongs to the technical field of video processing, and in particular relates to a method for playing real-time video streams that support the SRT protocol.

Background Technology

[0002] With the development of internet technology and the advancement of multimedia technology, real-time video streaming supporting the SRT protocol has become an indispensable part of modern internet applications. Video streaming technology is widely used in online video playback, live streaming, video conferencing, and other scenarios, and users are placing increasingly higher demands on the smoothness and stability of video playback. However, existing video streaming playback methods still face some technical challenges in practical applications, particularly in terms of the accuracy and real-time performance of video frame scheduling.

[0003] Traditional video playback methods typically schedule video frames based on frame rate (FPS). While this method theoretically guarantees smooth playback, in practice, due to network latency, differences in processing power, and other factors, the actual rendering time of a frame often deviates from the expected playback time. This frame rate-based scheduling strategy easily leads to problems such as stuttering, latency, or desynchronization during video playback.

Summary of the Invention

[0004] In view of this, embodiments of the present invention provide a method for playing real-time video streams that supports the SRT protocol, in order to solve the technical problem that frame rate-based scheduling strategies can easily lead to stuttering, delays or desynchronization in video playback.

[0005] A first aspect of this invention provides a method for playing a real-time video stream supporting the SRT protocol, the method comprising:
S1: When the first frame of the video stream is received, use the difference between the current system time and the presentation timestamp corresponding to the first frame as the time base;
S2: For each frame in the video stream, add the presentation timestamp corresponding to that frame to the time base to obtain the absolute target playback time of that frame;
S3: When the absolute target playback time is reached, render the corresponding video frame.

[0006] Furthermore, prior to S3, the method also includes:
A1: If the difference between the current system time and the absolute target playback time exceeds the delay threshold and the current frame is a keyframe, reset the time base to the difference between the current system time and the presentation timestamp of the current frame, and record a skip flag;
A2: After the time base is reset, discard non-keyframes whose presentation timestamps are earlier than the skip flag.

[0007] Furthermore, following A2, the method also includes:
A3: Update the absolute target playback time based on the reset time base;
A4: When the updated absolute target playback time is later than the current system time, record the waiting time;
A5: Divide the waiting time into multiple time segments for segmented sleep;
A6: If a jump command or stop command is received during segmented sleep, interrupt the sleep immediately and execute the jump procedure or stop procedure.

[0008] Furthermore, steps S1 to S3 are executed in a separate video decoding thread; the player's control commands are sent through a multi-producer, single-consumer lock-free queue and are processed serially by an independent player main thread.

[0009] Further, S3 includes: S31: When the absolute target playback time is reached, request the graphics buffer from the display buffer interface and obtain its memory mapping address; S32: Convert the decoded video frame data to a format supported by the graphics buffer using a hardware acceleration instruction set; S33: Submit the processed graphics buffer to the display compositor for display; wherein the hardware acceleration instruction set is the ARM NEON instruction set, and the format supported by the graphics buffer is NV12 format.

[0010] Furthermore, prior to S1, the method also includes: supporting the SRT protocol by integrating FFmpeg's libavformat library, and receiving an external flag to enable real-time streaming mode.

[0011] Furthermore, following S3, the following is also included: B1: After pausing playback, detect the command to resume playback; B2: If the player resumes playback from a paused state, calculate the pause duration between the pause start time and the resume time; B3: Add the pause duration to the time base.

[0012] A second aspect of the present invention provides a playback device for real-time video streams supporting the SRT protocol, comprising:
a receiving unit, used to take the difference between the current system time and the presentation timestamp corresponding to the first frame of the video stream as the time base when the first frame of the video stream is received;
a calculation unit, used to add the presentation timestamp of each frame in the video stream to the time base to obtain the absolute target playback time of that frame;
a rendering unit, used to render the corresponding video frame when the absolute target playback time arrives.

[0013] A third aspect of the present invention provides a video player, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the method for playing a real-time video stream supporting the SRT protocol described in the first aspect.

[0014] A fourth aspect of the present invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the method for playing a real-time video stream supporting the SRT protocol described in the first aspect.

[0015] The beneficial effects of this invention compared to existing technologies are as follows: Traditional video playback methods rely on frame rate (FPS) for scheduling, which often results in discrepancies between the actual and theoretical frame presentation times due to network fluctuations and processing delays. This invention, by setting a time base and scheduling based on the actual presentation timestamp (PTS) of each frame, effectively eliminates deviations caused by frame rate calculations, ensuring that each frame plays at its predetermined absolute time, thereby significantly improving playback accuracy. By precisely controlling the playback time of video frames, this invention can greatly reduce stuttering and latency during video playback, improving the overall smoothness of the video stream. Users will enjoy a more consistent and natural viewing experience, especially in dynamic and complex scenes. The video streaming playback method of this invention can adapt to different network environments and terminal device performance. Even in situations with poor network conditions or limited device resources, accurate timestamp scheduling can still maintain video playback stability. This adaptability allows the method to be widely applied to various video application scenarios, including live streaming, video-on-demand, and video conferencing.

Attached Figure Description

[0016] To more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the description of the embodiments or related technologies will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0017] Figure 1 is a schematic flowchart of the overall processing logic provided by the present invention; Figure 2 is a schematic flowchart of the method for playing real-time video streams supporting the SRT protocol provided by the present invention; Figure 3 is a schematic diagram of a playback device for real-time video streams supporting the SRT protocol according to an embodiment of the present invention; Figure 4 is a schematic diagram of a video player according to an embodiment of the present invention.

Detailed Implementation

[0018] In the following description, specific details such as particular system architectures and techniques are set forth for illustrative purposes and not for limitation, in order to provide a thorough understanding of the embodiments of the invention. However, those skilled in the art will understand that the invention can be implemented in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, apparatuses, circuits, and methods are omitted so as not to obscure the description of the invention with unnecessary detail.

[0019] This invention provides a method for playing real-time video streams that supports the SRT protocol, in order to solve the technical problem that frame rate-based scheduling strategies can easily lead to stuttering, delays, or desynchronization during video playback.

[0020] To better understand the technical solution of this application, the overall process logic of the solution is described here (for details, please refer to the following embodiments): After receiving a video frame, the system first determines whether it is the first frame. If it is the first frame, the frame's presentation timestamp is subtracted from the current system time to obtain the initial time base. Then, the presentation timestamp of each frame is added to this time base to obtain the absolute target playback time of that frame. If it is not the first frame, the existing time base is used directly for subsequent video playback processing.

[0021] Real-time / Non-real-time Stream Branch Processing: After obtaining the absolute target playback time, determine the video stream type. If the video stream is a non-real-time stream, skip the delay detection step. If the video stream is a real-time stream, calculate the difference between the current system time and the absolute target playback time to obtain the delay duration.

[0022] The latency of the real-time stream is then assessed: If the latency exceeds 200ms and the current frame is a keyframe, the time base is reset to the difference between the current system time and the current frame's presentation timestamp, and a skip flag is recorded. All non-keyframes with presentation timestamps earlier than the skip flag are discarded. The absolute target playback time of all subsequent frames is updated based on the new time base. If the updated absolute target playback time is later than the current system time, the required waiting time is calculated and split into sleep segments of 50ms. When the target playback time is reached, the current frame and subsequent frames are rendered. If the latency does not meet the above conditions, the original time base remains unchanged, and playback continues as planned.

[0023] First, this invention provides a method for playing real-time video streams that support the SRT protocol. Please refer to Figure 2, which shows a schematic flowchart of the method for playing real-time video streams supporting the SRT protocol provided by the present invention. As shown in Figure 2, the method may include the following steps: S1: When the first frame of the video stream is received, the difference between the current system time and the presentation timestamp corresponding to the first frame of the video stream is used as the time base; PTS is the relative playback time assigned to each frame during video encoding, usually expressed in milliseconds or microseconds. It represents the time offset of the frame relative to the start of the stream (e.g., at 30fps, PTS=0 for the first frame and PTS=33ms for the second frame).

[0024] The current system time refers to the real-time clock time provided by the operating system (e.g., System.currentTimeMillis() or clock_gettime(CLOCK_MONOTONIC)).

[0025] The video's internal time coordinate system is aligned with the system time coordinate system (time base = current system time − first frame PTS). The first frame's PTS is mapped to the current playback time, while the PTS of subsequent frames are converted to future absolute system time based on this fixed offset.
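As an illustration of S1, the following is a minimal sketch that assumes timestamps are held as 64-bit microsecond counts and that the system time comes from a monotonic clock. The names `TimeBase`, `monotonic_now_us`, and `on_frame` are illustrative only and do not come from the patent text.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Monotonic system time in microseconds (the unit is an assumption).
inline int64_t monotonic_now_us() {
    using namespace std::chrono;
    return duration_cast<microseconds>(
        steady_clock::now().time_since_epoch()).count();
}

// Holds the offset between the stream's PTS axis and the system clock.
class TimeBase {
public:
    TimeBase() : established_(false), base_us_(0) {}

    // Called for every frame. Only the first call establishes the base:
    // time base = current system time - first-frame PTS.
    // Returns the time base in effect after the call.
    int64_t on_frame(int64_t pts_us, int64_t system_now_us) {
        if (!established_) {
            base_us_ = system_now_us - pts_us;
            established_ = true;
        }
        return base_us_;
    }

    bool established() const { return established_; }

private:
    bool established_;
    int64_t base_us_;
};
```

In a player loop, `on_frame(frame_pts, monotonic_now_us())` would be called as each frame arrives; later frames leave the offset untouched.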

[0026] Traditional methods directly wake up and play the next frame periodically based on the frame rate, but the frame rate may be inaccurate (e.g., 29.97fps), decoding time may fluctuate, or system scheduling may be delayed, causing the inter-frame interval to gradually deviate from the ideal value, eventually causing stuttering or audio-visual desynchronization.

[0027] This embodiment uses a fixed time reference, and the playback time of all subsequent frames is calculated based on the same reference point, thus avoiding the accumulation of errors.

[0028] As an optional embodiment of this application, prior to S1, the method further includes: supporting the SRT protocol by integrating the libavformat library of FFmpeg and receiving an external flag to enable real-time streaming mode.

[0029] S2: For each frame in the video stream, add the presentation timestamp corresponding to each frame to the time base to obtain the absolute target playback time corresponding to each frame; Adding the PTS of each frame to the baseline value obtained in S1 gives the absolute time point at which that frame should be rendered on the system timeline.

[0030] For example: If the first frame PTS=0 and the time base=T0, then the absolute target time of the first frame = T0 (play immediately).

[0031] The second frame PTS is 33ms, and the absolute target time is T0 + 33ms.

[0032] The third frame PTS is 66ms, and the absolute target time is T0 + 66ms.
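The worked example above can be sketched as follows. Millisecond units and the names `absolute_target_ms` and `schedule` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// S2: absolute target time = time base + PTS (milliseconds assumed).
inline int64_t absolute_target_ms(int64_t time_base_ms, int64_t pts_ms) {
    return time_base_ms + pts_ms;
}

// Computes the schedule for a run of frames against one fixed time base.
inline std::vector<int64_t> schedule(int64_t time_base_ms,
                                     const std::vector<int64_t>& pts_ms) {
    std::vector<int64_t> targets;
    targets.reserve(pts_ms.size());
    for (int64_t pts : pts_ms) {
        targets.push_back(absolute_target_ms(time_base_ms, pts));
    }
    return targets;
}
```

With a time base T0 and PTS values 0, 33, and 66, `schedule` returns T0, T0 + 33, and T0 + 66, which matches the three frames listed above.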

[0033] Playback scheduling no longer depends on waiting for a fixed interval after the previous frame plays; instead, each frame has its own independent, pre-determined playback time.

[0034] Even if the previous frame is delayed due to slow decoding, the scheduled playback time of the next frame will not be affected (unless it is so severe that frame dropping is necessary), thereby reducing continuous stuttering or synchronization drift.

[0035] S3: When the absolute target playback time is reached, render the corresponding video frame.

[0036] The player needs a high-precision timing mechanism (such as using a system timer, audio clock synchronization, or vertical synchronization signal) to trigger rendering at a calculated absolute time.

[0037] If the decoding of a frame is completed earlier than the target time, it waits. If it is later than the target time, it may be rendered immediately (but already delayed) or discarded to avoid further backlog.
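One way to express this early or late decision is sketched below. The text does not say how the player chooses between rendering a late frame and discarding it, so the `drop_after_us` cutoff is a hypothetical parameter, as are the type and function names.

```cpp
#include <cassert>
#include <cstdint>

enum class FrameAction { Wait, RenderNow, Drop };

struct Decision {
    FrameAction action;
    int64_t wait_us;  // meaningful only when action == Wait
};

// target_us: absolute target playback time of the frame.
// now_us: current system time on the same clock.
// drop_after_us: hypothetical lateness beyond which the frame is dropped.
inline Decision decide(int64_t target_us, int64_t now_us,
                       int64_t drop_after_us) {
    const int64_t lateness = now_us - target_us;
    if (lateness < 0) {
        return {FrameAction::Wait, -lateness};  // decoded early: wait
    }
    if (lateness > drop_after_us) {
        return {FrameAction::Drop, 0};          // too late: avoid backlog
    }
    return {FrameAction::RenderNow, 0};         // on time or slightly late
}
```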

[0038] Traditional frame-rate scheduling plays a frame and then waits until the current time plus the frame interval before playing the next one, so any delay on one frame propagates to every frame after it. In this embodiment, the playback time for each frame is calculated independently, preventing error accumulation and resulting in more stable long-term synchronization.
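The difference can be made concrete with a small simulation. This is a simplified model rather than a measurement: it assumes each frame is woken late by a known jitter value, and `simulate` and `Drift` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Drift {
    int64_t interval_based_ms;  // lateness of last frame, relative waits
    int64_t pts_based_ms;       // lateness of last frame, absolute deadlines
};

// jitter_ms[i] is how late frame i is woken. Returns the lateness of the
// last frame relative to its ideal time under both strategies.
inline Drift simulate(const std::vector<int64_t>& pts_ms,
                      const std::vector<int64_t>& jitter_ms,
                      int64_t frame_interval_ms) {
    assert(!pts_ms.empty() && pts_ms.size() == jitter_ms.size());
    const int64_t base = 0;  // time base of the simulated stream
    int64_t interval_play = base + pts_ms[0] + jitter_ms[0];
    int64_t pts_play = interval_play;
    for (std::size_t i = 1; i < pts_ms.size(); ++i) {
        // Relative wait: starts from when the previous frame really played.
        interval_play = interval_play + frame_interval_ms + jitter_ms[i];
        // Absolute deadline: starts from the fixed time base every time.
        pts_play = base + pts_ms[i] + jitter_ms[i];
    }
    const int64_t ideal = base + pts_ms.back();
    return {interval_play - ideal, pts_play - ideal};
}
```

With four frames at 33 ms spacing and 2 ms of jitter on each, the interval-based schedule ends 8 ms late while the PTS-based schedule ends 2 ms late: the first error grows with the number of frames, the second does not.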

[0039] S1 to S3 are executed in an independent video decoding thread; the player's control commands are sent through a multi-producer, single-consumer lock-free queue and are processed serially by an independent player main thread.
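The patent does not describe how the queue is built. The sketch below shows one well-known lock-free multi-producer, single-consumer design, an atomic stack that the consumer detaches in one step and reverses; it is offered as an assumption, not as the applicant's implementation. The `Command` values are hypothetical.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical control commands; the text does not enumerate them.
enum class Command { Play, Pause, Seek, Stop };

class CommandQueue {
public:
    CommandQueue() : head_(nullptr) {}
    ~CommandQueue() { drain(); }
    CommandQueue(const CommandQueue&) = delete;
    CommandQueue& operator=(const CommandQueue&) = delete;

    // Safe to call from any number of producer threads.
    void push(Command c) {
        Node* n = new Node{c, head_.load(std::memory_order_relaxed)};
        // On failure, n->next is refreshed with the current head; retry.
        while (!head_.compare_exchange_weak(n->next, n,
                                            std::memory_order_release,
                                            std::memory_order_relaxed)) {
        }
    }

    // Must be called from the single consumer thread only.
    // Returns all pending commands in the order they were pushed.
    std::vector<Command> drain() {
        Node* n = head_.exchange(nullptr, std::memory_order_acquire);
        std::vector<Command> out;
        while (n != nullptr) {
            out.push_back(n->cmd);
            Node* next = n->next;
            delete n;
            n = next;
        }
        std::reverse(out.begin(), out.end());  // stack order -> FIFO order
        return out;
    }

private:
    struct Node {
        Command cmd;
        Node* next;
    };
    std::atomic<Node*> head_;
};
```

Because the consumer only ever takes the whole list with a single exchange, producers never contend with a node-by-node pop, which keeps the design simple. The player main thread would call `drain()` in its loop and handle the commands one after another.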

[0040] Specifically, S3 includes S31 to S33: S31: When the absolute target playback time is reached, request the graphics buffer from the display buffer interface and obtain its memory mapping address; The display buffer interface refers to the low-level interface provided by the operating system or graphics system. In the Android system, this can be APIs related to Surface, ANativeWindow, or GraphicBuffer. These interfaces are responsible for managing the memory blocks (i.e., the graphics buffer) directly related to screen display.

[0041] A graphics buffer is a special memory area whose contents can be directly scanned by a display compositor (such as SurfaceFlinger) and output to the screen. One buffer typically corresponds to one frame of an image.

[0042] Obtaining its memory-mapped address means mapping this system-managed graphics buffer into the current application's process address space, thereby obtaining a pointer (such as void* or uint8_t*) that can directly read and write memory.

[0043] This is a prerequisite for achieving zero-copy or direct memory operations. The traditional, high-overhead approach might involve copying the data to another intermediate buffer and then submitting it via the API.

[0044] Requesting the buffer only at playback time follows the Just-in-Time principle, reducing buffer holding time and potentially lowering overall memory usage and latency. This prepares the system for subsequent direct data filling, avoiding unnecessary memory copies.

[0045] S32: Convert the decoded video frame data to a format supported by the graphics buffer using a hardware acceleration instruction set; The decoded video frame data comes from the output of a video decoder (such as MediaCodec). Its format is YUV420P, YV12, or NV21, etc.

[0046] The graphics buffer supports the NV12 format. NV12 is a semi-planar YUV format in which the Y component is stored contiguously in one plane, and the U and V components are interleaved in a second plane. This is an efficient format natively supported by many mobile GPUs and display hardware.

[0047] The conversion uses ARM NEON, a SIMD (Single Instruction, Multiple Data) instruction set. A single NEON instruction can process multiple pixel values at once (such as 8 or 16), improving conversion performance by several times or even tens of times.

[0048] This is a crucial step in addressing the performance bottleneck of mobile video rendering (especially high-resolution, high-frame-rate videos). Optimization with the NEON instruction set enables extremely fast format conversion, ensuring all processing is completed within a limited playback window and preventing frame drops or delays caused by conversion time.

[0049] For the same task, the hardware instruction set completes the task faster than the software loop, allowing the CPU to enter sleep mode earlier and saving power.
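A minimal sketch of the S32 conversion for one input format, planar YUV420P to NV12, follows. It assumes even dimensions and tightly packed planes (stride equal to width), which real graphics buffers often do not guarantee. The NEON path is compiled only when the compiler defines `__ARM_NEON`, and a scalar loop handles the remainder and non-ARM builds. The function name is illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Converts planar YUV420P (separate Y, U, V planes) to semi-planar NV12
// (Y plane followed by interleaved U0 V0 U1 V1 ...).
// dst must hold width * height * 3 / 2 bytes.
inline void yuv420p_to_nv12(const uint8_t* y, const uint8_t* u,
                            const uint8_t* v, int width, int height,
                            uint8_t* dst) {
    assert(width > 0 && height > 0);
    assert(width % 2 == 0 && height % 2 == 0);
    const std::size_t y_size = static_cast<std::size_t>(width) * height;
    const std::size_t chroma = y_size / 4;  // samples per chroma plane

    std::memcpy(dst, y, y_size);  // the Y plane is identical in both formats
    uint8_t* uv = dst + y_size;

    std::size_t i = 0;
#if defined(__ARM_NEON)
    // Interleave 16 U samples with 16 V samples per iteration.
    for (; i + 16 <= chroma; i += 16) {
        uint8x16x2_t pair;
        pair.val[0] = vld1q_u8(u + i);
        pair.val[1] = vld1q_u8(v + i);
        vst2q_u8(uv + 2 * i, pair);  // writes U0 V0 U1 V1 ...
    }
#endif
    // Scalar tail, and the whole job on targets without NEON.
    for (; i < chroma; ++i) {
        uv[2 * i] = u[i];
        uv[2 * i + 1] = v[i];
    }
}
```

In the flow described above, `dst` would be the mapped address obtained in S31, so the converted frame lands directly in the graphics buffer.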

[0050] S33: Submit the processed graphics buffer to the display compositor for display; wherein the hardware acceleration instruction set is the ARM NEON instruction set, and the format supported by the graphics buffer is NV12 format.

[0051] After the S32 conversion, the data already exists in the graphics buffer memory acquired by S31. The NV12 format enables a seamless and efficient pipeline from decoder output (many hardware decoders also directly output NV12) to format conversion (NEON has excellent optimization for NV12 generation) and finally to submission and display, minimizing data handling and conversion overhead.

[0052] In the embodiments corresponding to S31 to S33, the high-performance, low-fluctuation rendering implementation provided is the underlying guarantee that the upper-layer precise scheduling algorithm can achieve the expected results.

[0053] As an optional embodiment of this application, A1 to A6 are included before S3: A1: If the difference between the current system time and the absolute target playback time exceeds the delay threshold and the current frame is a keyframe, the time base is reset to the difference between the current system time and the presentation timestamp of the current frame, and a skip flag is recorded. This difference is usually a positive number, indicating that at this point in time, the system clock has exceeded the original scheduled playback time of the current frame. The larger the difference, the more severe the delay.

[0054] The latency threshold is a configurable tolerance value (e.g., 500ms). When the latency exceeds this threshold, it means that the user experience has been significantly affected (persistent stuttering, severe audio-visual desynchronization), at which point the system decides to initiate a catch-up operation.

[0055] A keyframe (I-frame) contains complete image information and can be decoded and rendered independently of preceding and following frames. Understandably, if the time base is reset and frames are skipped at a non-keyframe (P-frame or B-frame), subsequent frames may depend on the skipped frames, leading to decoder instability, screen tearing, decoding errors, and other problems. Resetting at a keyframe allows the decoder to start afresh from a clean state, ensuring image accuracy.

[0056] Resetting the time base to the difference between the current system time and the presentation timestamp of the current frame is the same operation as S1, but it occurs midway through playback. The calculation formula is: New time base = Current system time - PTS of the current keyframe.

[0057] This means that the delayed keyframe will be played immediately (because its new absolute target playback time = new baseline + its PTS = current system time). At the same time, it establishes a new, later starting time point for all subsequent frames.

[0058] The skip flag is a time point or PTS value. It records the location where the catch-up operation occurred; specifically, it is equal to the presentation timestamp (PTS) of this keyframe. This flag is used to guide the next frame drop decision.

[0059] The A1 step provides an exit from a state of persistent latency. By resetting the baseline, the player acknowledges the current latency reality and decides to start playback from the present (the current keyframe) on a new timeline, rather than continuing to chase an old schedule that is no longer achievable. Strictly limiting the catch-up point to keyframes ensures visual coherence and decoding accuracy.
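Step A1 can be sketched as a single check. Microsecond units and the names `CatchUpState` and `maybe_reset` are assumptions; the threshold is passed in because the text treats it as configurable.

```cpp
#include <cassert>
#include <cstdint>

struct CatchUpState {
    int64_t time_base_us;
    bool has_skip_flag;
    int64_t skip_flag_pts_us;  // PTS of the keyframe where catch-up occurred
};

// Returns true when the time base was reset at this frame.
inline bool maybe_reset(CatchUpState& s, int64_t pts_us, bool is_keyframe,
                        int64_t now_us, int64_t threshold_us) {
    const int64_t target_us = s.time_base_us + pts_us;
    const int64_t delay_us = now_us - target_us;  // positive when late
    if (delay_us > threshold_us && is_keyframe) {
        s.time_base_us = now_us - pts_us;  // same formula as S1
        s.has_skip_flag = true;
        s.skip_flag_pts_us = pts_us;
        return true;
    }
    return false;  // within tolerance, or not a safe point to reset
}
```

After a reset, `time_base_us + pts_us` equals `now_us`, so the keyframe that triggered the reset is due immediately.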

[0060] A2: After the time base is reset, discard non-keyframes whose presentation timestamps are earlier than the skip flag.

[0061] Suppose there is a severe playback delay, and the current system time has reached the frame that should be playing at PTS=5000ms, but the decoder has only processed the key frame (I-frame) at PTS=3000ms.

[0062] The A1 step resets the baseline at the I-frame where PTS=3000ms and records the skip flag = 3000ms.

[0063] Step A2 discards all P / B frames with PTS < 3000ms, which may have piled up in the queue because of the severe latency. Frames at or after the skip flag (e.g., PTS = 3100ms, 3200ms...) are kept.

[0064] Understandably, since keyframes are the anchor points for decoding, keyframes earlier than 3000ms in PTS may have already been played or discarded. However, A2 primarily cleans up non-keyframes that rely on those old keyframes, as they have lost their meaning under the new benchmark and cannot be decoded correctly.

[0065] After the time base is reset, all frames with a PTS earlier than the current playback point under the new base are outdated in time and should no longer be rendered. If these backlogged, outdated frames are played, users will see prolonged slow motion or time reversal, resulting in a very poor experience.

[0066] Only non-keyframes are discarded. Keyframes are few in number and are the starting points for decoding, so they are retained to maintain the decoder state (keyframes earlier than the skip flag are usually no longer played and may no longer be in the queue anyway). Non-keyframes are numerous and depend on outdated reference frames, so decoding them is pointless. Discarding them directly quickly releases the buffer and reduces decoding and rendering pressure, and is a key part of the catch-up strategy.

[0067] This avoids the player still processing old, useless frame queues after the time base is reset, enabling rapid cleanup of historical data. The core action for achieving rapid catch-up is to quickly discard a large number of frames, allowing the player's processing pipeline to focus on frames after the new time base point.
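The A2 filter can be sketched over a pending-frame queue. The container choice (`std::deque`) and the names `QueuedFrame` and `drop_stale` are assumptions for illustration; only the filter condition comes from the text.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

struct QueuedFrame {
    int64_t pts_ms;
    bool is_keyframe;
};

// Removes non-keyframes whose PTS is earlier than the skip flag.
// Returns the number of frames discarded.
inline std::size_t drop_stale(std::deque<QueuedFrame>& queue,
                              int64_t skip_flag_pts_ms) {
    const std::size_t before = queue.size();
    queue.erase(
        std::remove_if(queue.begin(), queue.end(),
                       [skip_flag_pts_ms](const QueuedFrame& f) {
                           return !f.is_keyframe &&
                                  f.pts_ms < skip_flag_pts_ms;
                       }),
        queue.end());
    return before - queue.size();
}
```

With a skip flag of 3000 ms, P and B frames at 2033 ms and 2966 ms are removed, while the keyframe at 3000 ms and everything after it stay in the queue.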

[0068] A3: Update the absolute target playback time based on the reset time base; The original time base is replaced by the newly calculated time base in step A1. For each subsequent frame to be played, the S2 logic in the independent claim is re-executed: new absolute target playback time = new time base + PTS of the frame.

[0069] Because the time base has changed, playback plans for all future frames must be recalculated to ensure they are based on the new, correct synchronization timeline. This is a necessary step to bring the player back to normal scheduling after the catch-up operation.

[0070] A4: When the updated absolute target playback time is later than the current system time, the waiting time will be recorded. In the catch-up operation (A1), a keyframe is played immediately (its new absolute target playback time is set to the current system time). However, the absolute target playback time of the next frame (e.g., a frame whose PTS is 33ms later than the keyframe) calculated based on the new baseline becomes the current system time + 33ms. This time point is in the future.

[0071] Waiting time = (new) absolute target playback time of the next frame - current system time. This time may be very short (a few milliseconds) or very long (for example, if the next frame after catching up is a keyframe and the GOP is long, the PTS interval may be hundreds of milliseconds).

[0072] A5: Divide the waiting time into multiple time segments for segmented sleep; Instead of using a single sleep for the entire waiting time, it is divided into multiple shorter time segments (e.g., every 10ms or 50ms) and then multiple sleep-wake cycles are performed.

[0073] Compared to keeping the CPU busy for extended periods, sleeping frees up CPU resources and saves energy. With a single long sleep, however, any user-initiated action during that time (such as clicking pause or dragging the progress bar) cannot be processed immediately and must wait until the sleep ends, so the interface appears frozen. Segmented sleep creates periodic checkpoints, allowing the program to check for external events (user commands, system messages) that need to be processed after each sleep segment ends.

[0074] Segmentation requires a trade-off. If the segment is too short (e.g., 1ms), the overhead of sleep / wake system calls becomes too large; if the segment is too long (e.g., 100ms), the response latency will increase. This is an engineering optimization point.

[0075] While ensuring accurate playback timing (through waiting), it greatly improves the responsiveness of user interaction and takes energy efficiency into account.

[0076] A6: If a jump command or stop command is received during segmented sleep, the sleep is interrupted immediately, and the jump procedure or stop procedure is executed.

[0077] The currently running sleep segment is immediately terminated, and the program resumes execution. It then proceeds to either the jump sequence or the playback stop sequence.
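Steps A4 to A6 can be sketched as one wait routine. This version checks an atomic flag between segments, following the checkpoint description above, so a command is noticed at most one segment late. A wait that also cuts the running segment short would need a condition variable instead, which is not shown. The function and parameter names are illustrative.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Sleeps until `deadline` in slices of at most `segment`, checking
// `interrupted` before each slice. Returns true if the deadline was
// reached, false if a jump or stop request cut the wait short.
inline bool segmented_sleep_until(
        std::chrono::steady_clock::time_point deadline,
        std::chrono::milliseconds segment,
        const std::atomic<bool>& interrupted) {
    using clock = std::chrono::steady_clock;
    const clock::duration slice =
        std::chrono::duration_cast<clock::duration>(segment);
    for (;;) {
        if (interrupted.load(std::memory_order_acquire)) {
            return false;  // caller runs the jump or stop procedure
        }
        const clock::time_point now = clock::now();
        if (now >= deadline) {
            return true;   // target playback time reached: render
        }
        // A4: remaining waiting time; A5: sleep one segment of it.
        const clock::duration remaining = deadline - now;
        std::this_thread::sleep_for(std::min(remaining, slice));
    }
}
```

The control thread would set the flag when a jump or stop command arrives; the decoding thread then leaves the wait at its next checkpoint.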

[0078] The jump process clears the current buffer, requests video data for the new position from the server, and restarts the process described in claim 1 (starting from S1, establishing a new time base with the first frame of the new position). The playback stop process stops all playback-related threads, including decoding and rendering.

[0079] This is a direct manifestation of highly responsive interaction. It ensures that even while the player is in a precisely timed wait, user control commands receive millisecond-level responses. Without this mechanism, users would experience noticeable lag if they performed actions during the hundreds of milliseconds the player waited for the next frame.

[0080] In the embodiments corresponding to A1 to A6, conditional triggering (delay exceeding threshold + keyframe) ensures the safety and necessity of the operation. Resetting the baseline enables rapid resynchronization of the playback timeline. Intelligent frame dropping achieves efficient buffer cleanup, clearing obstacles for catching up. Transforming uninterrupted long waits into interruptible short wait sequences solves the common sluggish response problem of players during fine-tuning. It makes the entire solution focus not only on whether the playback is smooth (stuttering, delay, desynchronization), but also on whether the interaction is responsive.

[0081] As an optional embodiment of this application, the following steps B1 to B3 are performed after S3. B1: After playback is paused, detect the command to resume playback. Once playback is paused, the player has responded to the user's pause command and stopped the S3 rendering scheduling process. At this point the video frame is frozen, but the player's logic is still running or in a standby state.

[0082] The resume command is typically triggered by the user clicking the play button or by an API call.

[0083] This step triggers the synchronization compensation process after a pause. It clarifies that the logic in this claim is activated only in the specific scenario of resuming from a paused state.

[0084] B2: If the player resumes playback from a paused state, calculate the pause duration between the pause start time and the resume time; Pause duration = Resume time - Pause start time. This value represents the real-world time in which the video stream has been paused.

[0085] B3: Add the pause duration to the time base.

[0086] For the time base variable established in step S1 or maintained throughout the playback process, perform an addition operation: new time base = old time base + pause duration.

[0087] Absolute target playback time = frame's PTS + time base. Assume the pause occurs after frame X has played. The PTS of frame X+1 is fixed. If the time base remains unchanged, then when playback resumes, the absolute target playback time of frame X+1 will still be a point in time that has already passed, calculated according to the old time base. This will cause the player to either immediately play the backlogged frames (like fast-forwarding) or become logically inconsistent.

[0088] The compensated time base is the old time base plus the pause duration. For the same frame (e.g., frame X+1), the new absolute target playback time is therefore exactly one pause duration later than the old one. The playback of frame X+1 is automatically postponed to the resume time plus the frame's original interval from the pause point. The timeline (PTS) of the video content is re-aligned with the real-world system timeline, as if no pause had occurred.
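The compensation of B2 and B3 amounts to a single addition, which the following sketch makes concrete. The class name `PlaybackClock` and its method names are hypothetical, and all times are in microseconds.

```cpp
#include <cstdint>

class PlaybackClock {
 public:
  // S1: time base = system time at the first frame minus that frame's PTS.
  void establish(std::int64_t nowUs, std::int64_t firstPtsUs) {
    timeBaseUs_ = nowUs - firstPtsUs;
  }

  // S2: absolute target playback time = PTS + time base.
  std::int64_t targetTimeUs(std::int64_t ptsUs) const {
    return ptsUs + timeBaseUs_;
  }

  void pause(std::int64_t nowUs) {
    if (paused_) return;
    pauseStartUs_ = nowUs;
    paused_ = true;
  }

  // B2 + B3: pause duration = resume time - pause start time,
  // new time base = old time base + pause duration.
  void resume(std::int64_t nowUs) {
    if (!paused_) return;
    timeBaseUs_ += nowUs - pauseStartUs_;
    paused_ = false;
  }

 private:
  std::int64_t timeBaseUs_ = 0;
  std::int64_t pauseStartUs_ = 0;
  bool paused_ = false;
};
```

As a worked example, suppose the time base is established at system time 10 s with a first PTS of 0, and frame X+1 has a PTS of 1.04 s, so its target is 11.04 s. If playback pauses at 11 s and resumes at 16 s, the time base grows by 5 s and the target becomes 16.04 s, which is 40 ms after the resume time, the same interval the frame originally had from the pause point.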

[0089] When the user presses the resume button, the video smoothly resumes playback from the paused frame, without any jumps or accelerations, providing a natural experience. This is crucial for audio-video synchronization. The audio track undergoes similar processing (or is processed internally by the audio driver) during pause / resume. The video adjusts its master clock (time base) in this way, ensuring that audio and video remain synchronized after resumption. With the entire playback system's time base correctly adjusted, all subsequent scheduling logic can continue operating based on the new, correct time base without any special handling.

[0090] The player employs a four-layer architecture: ArkTS UI layer: provides player components and controller interfaces, interacting with the underlying layer via SurfaceId. NAPI interface layer: provides 16 standardized interface functions, bridging ArkTS and the Native layer. Native C++ layer: includes the player's core logic, FFmpeg decoder, video renderer, and audio renderer. System graphics layer: interfaces with HarmonyOS Surface Composer to achieve hardware-accelerated rendering.

[0091] The player uses an ijkplayer-style message queue architecture: (1) Player Thread: Processes all playback commands serially, including prepareAsync, start, pause, stop, seekTo, release, etc. The eventfd event-driven mechanism is used instead of polling, which significantly reduces CPU usage.

[0092] (2) Video Decode Thread: responsible for video decoding, PTS scheduling, rendering and submission.

[0093] (3) Audio Decode Thread: Responsible for audio decoding. The decoded audio frames are passed to the audio rendering thread through a lock-free queue.

[0094] (4) Audio Render Thread: retrieves audio frames from the lock-free queue and submits them to the audio renderer.

[0095] The player uses two types of lock-free queues: (1) Player command queue: adopts a multi-producer single-consumer (MPSC) lock-free queue, which supports multiple threads to send commands and the player thread to consume commands independently.

[0096] (2) Audio frame queue: A lock-free queue with a single producer and a single consumer (SPSC) is adopted, with the audio decoding thread producing and the audio rendering thread consuming, achieving zero lock latency.
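A single-producer single-consumer lock-free queue of the kind described for audio frames can be sketched as a fixed-size ring buffer with two atomic indices. This is a generic textbook construction offered for illustration, not the applicant's queue; the name `SpscQueue` and its interface are assumptions.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>
#include <utility>

// Ring buffer that is safe for exactly one producer thread and one consumer
// thread. One slot is kept empty, so it holds at most Capacity - 1 items.
template <typename T, std::size_t Capacity>
class SpscQueue {
  static_assert(Capacity >= 2, "need at least one usable slot");

 public:
  // Called only by the producer (e.g. the audio decode thread).
  bool tryPush(T value) {
    const std::size_t head = head_.load(std::memory_order_relaxed);
    const std::size_t next = (head + 1) % Capacity;
    if (next == tail_.load(std::memory_order_acquire)) return false;  // full
    slots_[head] = std::move(value);
    head_.store(next, std::memory_order_release);  // publish the slot
    return true;
  }

  // Called only by the consumer (e.g. the audio render thread).
  std::optional<T> tryPop() {
    const std::size_t tail = tail_.load(std::memory_order_relaxed);
    if (tail == head_.load(std::memory_order_acquire)) return std::nullopt;
    T value = std::move(slots_[tail]);
    tail_.store((tail + 1) % Capacity, std::memory_order_release);
    return value;
  }

 private:
  std::array<T, Capacity> slots_{};
  std::atomic<std::size_t> head_{0};  // next slot to write
  std::atomic<std::size_t> tail_{0};  // next slot to read
};
```

Neither side ever blocks: a full queue makes `tryPush` return false and an empty queue makes `tryPop` return an empty optional, so the caller decides whether to retry, drop or wait.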

[0097] This embodiment uses the RAII (Resource Acquisition Is Initialization) idiom to manage component lifecycles: ComponentManager uniformly manages components such as MediaSource, Decoder, and AudioRenderer; ThreadManager manages thread start and stop, and thread exit is signalled through the Promise / Future mechanism. The destruction order is controlled by the order of C++ object declarations, ensuring that threads stop before components are destroyed and avoiding resource contention.
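The ordering guarantee relies on a C++ language rule: non-static data members are destroyed in the reverse order of their declaration. The sketch below illustrates only that rule. The class names are borrowed from the text, but their bodies are placeholders that record events in a log instead of managing real components or threads, and the `Player` wrapper is an assumption.

```cpp
#include <string>
#include <vector>

using Log = std::vector<std::string>;

// Placeholder for the component owner (MediaSource, Decoder, ...).
struct ComponentManager {
  explicit ComponentManager(Log& log) : log_(log) {
    log_.push_back("components created");
  }
  ~ComponentManager() { log_.push_back("components destroyed"); }
  Log& log_;
};

// Placeholder for the thread owner; a real destructor would signal exit
// and join the threads here.
struct ThreadManager {
  explicit ThreadManager(Log& log) : log_(log) {
    log_.push_back("threads started");
  }
  ~ThreadManager() { log_.push_back("threads stopped"); }
  Log& log_;
};

class Player {
 public:
  explicit Player(Log& log) : components_(log), threads_(log) {}

 private:
  ComponentManager components_;  // declared first, destroyed last
  ThreadManager threads_;        // declared last, destroyed first
};
```

Because `threads_` is declared after `components_`, destroying a `Player` stops the threads before any component they might still be using is released, with no explicit teardown code.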

[0098] Video rendering employs a zero-copy pipeline in NV12 format. The specific process is as follows: a buffer is requested from the Native Window, the buffer's memory mapping address is obtained, YUV format conversion and scaling are performed using the libyuv library (ARM NEON hardware acceleration), and the buffer is submitted to the Surface Composer. This approach avoids multiple memory copies and, combined with ARM NEON instruction set acceleration, achieves efficient rendering.
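To show what the format conversion step produces, the following sketch writes a planar I420 frame directly into an NV12 destination such as a mapped window buffer. It is a plain scalar stand-in written for illustration, not libyuv's NEON-accelerated routine, and it omits scaling. The function name `i420ToNv12` and the assumption that the decoder outputs I420 are introduced here.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// NV12 layout: a full-resolution Y plane followed by one half-resolution
// plane of interleaved U and V bytes (U0 V0 U1 V1 ...).
inline void i420ToNv12(const std::uint8_t* srcY, int srcStrideY,
                       const std::uint8_t* srcU, int srcStrideU,
                       const std::uint8_t* srcV, int srcStrideV,
                       std::uint8_t* dstY, int dstStrideY,
                       std::uint8_t* dstUV, int dstStrideUV,
                       int width, int height) {
  // The luma plane is identical in both formats: copy it row by row,
  // honouring the (possibly padded) strides of source and destination.
  for (int row = 0; row < height; ++row) {
    std::memcpy(dstY + static_cast<std::ptrdiff_t>(row) * dstStrideY,
                srcY + static_cast<std::ptrdiff_t>(row) * srcStrideY,
                static_cast<std::size_t>(width));
  }
  // Chroma is subsampled 2x2: interleave the separate U and V planes.
  const int chromaWidth = (width + 1) / 2;
  const int chromaHeight = (height + 1) / 2;
  for (int row = 0; row < chromaHeight; ++row) {
    const std::uint8_t* u = srcU + static_cast<std::ptrdiff_t>(row) * srcStrideU;
    const std::uint8_t* v = srcV + static_cast<std::ptrdiff_t>(row) * srcStrideV;
    std::uint8_t* uv = dstUV + static_cast<std::ptrdiff_t>(row) * dstStrideUV;
    for (int col = 0; col < chromaWidth; ++col) {
      uv[2 * col] = u[col];
      uv[2 * col + 1] = v[col];
    }
  }
}
```

Writing straight into the destination buffer is what avoids an intermediate copy: the only memory traffic is one read of the decoded frame and one write of the displayable frame.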

[0099] This embodiment implements SRT protocol support by integrating the FFmpeg 8.0 library. The FFmpeg libavformat module natively supports SRT protocol processing. The embodiment also provides the setStreamType interface, allowing the application layer to mark the current stream as a real-time stream. In real-time stream mode, PTS absolute time scheduling and intelligent GOP cleanup are automatically enabled. In the embodiments corresponding to B1 to B3, the pause duration is precisely measured, quantifying the impact of an external event on the playback timeline. A single additive compensation applied to the time base corrects the player's internal clock with minimal change. This approach allows the discrete event of pause / resume to be incorporated smoothly into a continuous, absolute-time-based playback model, so that playback remains robust and continuous in the face of user interaction.

[0100] In the embodiments corresponding to S1 to S3, traditional video playback methods rely on frame rate (FPS) for scheduling, which often leads to discrepancies between the actual and theoretical frame presentation times due to network fluctuations and processing delays. This invention, by setting a time base and scheduling based on the actual presentation timestamp (PTS) of each frame, effectively eliminates deviations caused by frame rate calculations, ensuring that each frame plays at its predetermined absolute time, thereby significantly improving playback accuracy. By precisely controlling the playback time of video frames, this invention can greatly reduce stuttering and latency during video playback, improving the overall smoothness of the video stream. Users will enjoy a more consistent and natural viewing experience, especially in dynamic and complex scenes. The video streaming playback method of this invention can adapt to different network environments and terminal device performance. Even in situations with poor network conditions or limited device resources, accurate timestamp scheduling can still maintain video playback stability. This adaptability allows the method to be widely applied to various video application scenarios, including live streaming, video-on-demand, and video conferencing.
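The difference between the two scheduling strategies can be made concrete with a small calculation. The sketch below is an illustration under assumed numbers: a stream whose frames are actually spaced at 30000/1001 frames per second (about 29.97) while the player schedules by a nominal 30 frames per second. The function names are hypothetical, and all times are in microseconds.

```cpp
#include <cstdint>

// S1: time base = system time at the first frame minus that frame's PTS.
inline std::int64_t makeTimeBaseUs(std::int64_t nowUs, std::int64_t firstPtsUs) {
  return nowUs - firstPtsUs;
}

// S2: absolute target playback time = PTS + time base.
inline std::int64_t targetTimeUs(std::int64_t ptsUs, std::int64_t timeBaseUs) {
  return ptsUs + timeBaseUs;
}

// Frame-rate scheduling, for comparison: the target is derived from the
// frame index and a nominal rate, ignoring the frame's own timestamp.
inline std::int64_t fpsTargetUs(std::int64_t startUs, int frameIndex, int fps) {
  return startUs + frameIndex * 1000000LL / fps;
}
```

With playback starting at system time 5 s and a first PTS of 1 s, the time base is 4 s. Frame 900 of the assumed stream has a PTS of 31.03 s, so PTS scheduling targets 35.03 s, whereas frame-rate scheduling at 30 frames per second targets 35.00 s. The 30 ms gap after only 30 seconds of playback keeps growing with the frame index, which is the kind of deviation that scheduling on the PTS avoids.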

[0101] As shown in Figure 3, this invention provides a playback device for real-time video streams supporting the SRT protocol. Figure 3 is a schematic diagram of this device. The device shown in Figure 3 includes: a receiving unit 31, used to take the difference between the current system time and the presentation timestamp corresponding to the first frame of the video stream as the time base when the first frame of the video stream is received; a calculation unit 32, used to add, for each frame in the video stream, the presentation timestamp corresponding to that frame to the time base to obtain the absolute target playback time corresponding to that frame; and a rendering unit 33, used to render the corresponding video frame when the absolute target playback time arrives.

[0102] This invention provides a playback device for real-time video streams supporting the SRT protocol. Traditional video playback methods rely on frame rate (FPS) for scheduling, which often results in discrepancies between the actual and theoretical frame presentation times due to network fluctuations and processing delays. This invention, by setting a time base and scheduling based on the actual presentation timestamp (PTS) of each frame, effectively eliminates deviations caused by frame rate calculations, ensuring that each frame plays at its predetermined absolute time, thus significantly improving playback accuracy. By precisely controlling the playback time of video frames, this invention can greatly reduce stuttering and latency during video playback, improving the overall smoothness of the video stream. Users will enjoy a more consistent and natural viewing experience, especially in dynamic and complex scenes. The video stream playback method of this invention can adapt to different network environments and terminal device performance. Even in situations with poor network conditions or limited device resources, accurate timestamp scheduling can still maintain video playback stability. This adaptability allows the method to be widely applied to various video application scenarios, including live streaming, video-on-demand, and video conferencing.

[0103] Figure 4 is a schematic diagram of a video player provided in an embodiment of the present invention. As shown in Figure 4, a video player 4 in this embodiment includes: a processor 40, a memory 41, and a computer program 42 stored in the memory 41 and executable on the processor 40, such as a playback program for real-time video streams supporting the SRT protocol. When the processor 40 executes the computer program 42, it implements the steps in the various embodiments of the playback method for real-time video streams supporting the SRT protocol described above, for example steps 101 to 103 shown in Figure 1. Alternatively, when the processor 40 executes the computer program 42, it implements the functions of each unit in the above-described device embodiments, for example the functions of the units shown in Figure 3.

[0104] For example, the computer program 42 can be divided into one or more units, which are stored in the memory 41 and executed by the processor 40 to carry out the present invention. The one or more units can be a series of computer program instruction segments capable of performing specific functions, which describe the execution process of the computer program 42 in the video player 4. For example, the computer program 42 can be divided into the following units: a receiving unit, used to take the difference between the current system time and the presentation timestamp corresponding to the first frame of the video stream as the time base when the first frame of the video stream is received; a calculation unit, used to add, for each frame in the video stream, the presentation timestamp of that frame to the time base to obtain the absolute target playback time of that frame; and a rendering unit, used to render the corresponding video frame when the absolute target playback time arrives.

[0105] The video player includes, but is not limited to, a processor 40 and a memory 41. Those skilled in the art will understand that Figure 4 is merely an example of a video player 4 and does not constitute a limitation on it. The video player may include more or fewer components than shown, combine certain components, or use different components. For example, the video player may also include input / output devices, network access devices, buses, etc.

[0106] The processor 40 can be a Central Processing Unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. The general-purpose processor can be a microprocessor or any conventional processor.

[0107] The memory 41 can be an internal storage unit of the video player 4, such as a hard drive or memory of the video player 4. The memory 41 can also be an external storage device of the video player 4, such as a plug-in hard drive, Smart Media Card (SMC), Secure Digital (SD) card, or Flash Card equipped on the video player 4. Furthermore, the memory 41 can include both internal and external storage units of the video player 4. The memory 41 is used to store the computer program and other programs and data required by the video player. The memory 41 can also be used to temporarily store data that has been output or will be output.

[0108] It should be understood that the sequence number of each step in the above embodiments does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.

[0109] It should be noted that the information interaction and execution process between the above-mentioned devices / units are based on the same concept as the method embodiments of the present invention. For details on their specific functions and technical effects, please refer to the method embodiments section, which will not be repeated here.

[0110] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the above-described division of functional units and modules is merely an example. In practical applications, the above functions can be assigned to different functional units and modules as needed, that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit. Furthermore, the specific names of the functional units and modules are only for easy differentiation and are not intended to limit the scope of protection of this invention. The specific working process of the units and modules in the above system can be referred to the corresponding process in the foregoing method embodiments, and will not be repeated here.

[0111] This invention also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps described in the various method embodiments above.

[0112] This invention provides a computer program product that, when run on a mobile terminal, enables the mobile terminal to implement the steps described in the above-described method embodiments.

[0113] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments of the present invention can be implemented by a computer program instructing related hardware. The computer program can be stored in a computer-readable storage medium, and when executed by a processor, it can implement the steps of the various method embodiments described above. The computer program includes computer program code, which can be in the form of source code, object code, executable files, or certain intermediate forms. The computer-readable medium can include at least: any entity or device capable of carrying the computer program code to a camera / video player, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium. Examples include USB flash drives, portable hard drives, magnetic disks, or optical disks.

[0114] In the above embodiments, the descriptions of each embodiment have different focuses. For parts that are not described in detail or recorded in a certain embodiment, please refer to the relevant descriptions of other embodiments.

[0115] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this invention.

[0116] In the embodiments provided by this invention, it should be understood that the disclosed apparatus / network devices and methods can be implemented in other ways. For example, the apparatus / network device embodiments described above are merely illustrative. For instance, the division of modules or units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection between devices or units may be electrical, mechanical, or other forms.

[0117] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units.

[0118] It should be understood that, when used in this specification and the appended claims, the term "comprising" indicates the presence of the described features, integers, steps, operations, elements and / or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and / or collections thereof.

[0119] It should also be understood that the term “and / or” as used in this specification and the appended claims refers to any combination of one or more of the associated listed items and all possible combinations, and includes such combinations.

[0120] As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when," "once," "in response to determination," or "in response to detection." Similarly, the phrase "if determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, as meaning "once determined," "in response to determination," "once [the described condition or event] is detected," or "in response to detection of [the described condition or event]."

[0121] Furthermore, in the description of this invention and the appended claims, the terms "first," "second," "third," etc., are used only to distinguish descriptions and should not be construed as indicating or implying relative importance.

[0122] References to "one embodiment" or "some embodiments" as described in this specification mean that one or more embodiments of the invention include a specific feature, structure, or characteristic described in connection with that embodiment. Therefore, the phrases "in one embodiment," "in some embodiments," "in other embodiments," "in still other embodiments," etc., appearing in different parts of this specification do not necessarily refer to the same embodiment, but rather mean "one or more, but not all, embodiments," unless otherwise specifically emphasized. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless otherwise specifically emphasized.

[0123] The above-described embodiments are only used to illustrate the technical solutions of the present invention, and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention, and should all be included within the protection scope of the present invention.

Claims

1. A method for playing real-time video streams supporting the SRT protocol, characterized in that the method includes: S1: When the first frame of the video stream is received, the difference between the current system time and the presentation timestamp corresponding to the first frame of the video stream is used as the time base; S2: For each frame in the video stream, add the presentation timestamp corresponding to that frame to the time base to obtain the absolute target playback time corresponding to that frame; S3: When the absolute target playback time is reached, render the corresponding video frame.

2. The method for playing real-time video streams supporting the SRT protocol as described in claim 1, characterized in that, before S3, it also includes: A1: If the difference between the current system time and the absolute target playback time exceeds the delay threshold and the current frame is a keyframe, the time base is reset to the difference between the current system time and the presentation timestamp of the current frame, and a skip marker is recorded; A2: After the time base is reset, discard non-keyframes whose presentation timestamps are earlier than the skip marker.

3. The method for playing real-time video streams supporting the SRT protocol as described in claim 2, characterized in that, following A2, it also includes: A3: Update the absolute target playback time based on the reset time base; A4: When the updated absolute target playback time is later than the current system time, record the waiting time; A5: Divide the waiting time into multiple time segments for segmented sleep; A6: If a jump command or stop command is received during the segmented sleep, the sleep is interrupted immediately and the jump procedure or stop procedure is executed.

4. The method for playing real-time video streams supporting the SRT protocol as described in claim 1, characterized in that S1 to S3 are executed in a separate video decoding thread; the player's control commands are sent through a multi-producer single-consumer lock-free queue and are processed serially by an independent player main thread.

5. The method for playing real-time video streams supporting the SRT protocol as described in claim 1, characterized in that, S3 includes: S31: When the absolute target playback time is reached, request the graphics buffer from the display buffer interface and obtain its memory mapping address; S32: Convert the decoded video frame data to a format supported by the graphics buffer using a hardware acceleration instruction set; S33: Submit the processed graphics buffer to the display compositor for display; wherein the hardware acceleration instruction set is the ARM NEON instruction set, and the format supported by the graphics buffer is NV12 format.

6. The method for playing real-time video streams supporting the SRT protocol as described in claim 1, characterized in that, before S1, it also includes: supporting the SRT protocol by integrating FFmpeg's libavformat library, and receiving an external marker that enables real-time streaming mode.

7. The method for playing real-time video streams supporting the SRT protocol as described in claim 1, characterized in that, Following S3, it also includes: B1: After pausing playback, detect the command to resume playback; B2: If the player resumes playback from a paused state, calculate the pause duration between the pause start time and the resume time; B3: Add the pause duration to the time base.

8. A playback device for real-time video streams supporting the SRT protocol, characterized in that the playback device includes: a receiving unit, used to take the difference between the current system time and the presentation timestamp corresponding to the first frame of the video stream as the time base when the first frame of the video stream is received; a calculation unit, used to add, for each frame in the video stream, the presentation timestamp of that frame to the time base to obtain the absolute target playback time of that frame; and a rendering unit, used to render the corresponding video frame when the absolute target playback time arrives.

9. A video player, characterized in that, The video player includes: a memory, a processor, and a playback program for a real-time video stream supporting the SRT protocol, stored in the memory and executable on the processor. The playback program for the real-time video stream supporting the SRT protocol is configured to implement the steps of the playback method for a real-time video stream supporting the SRT protocol as described in any one of claims 1 to 7.

10. A computer-readable storage medium storing a computer program, characterized in that, When the computer program is executed by the processor, it implements the steps in the method for playing a real-time video stream supporting the SRT protocol as described in any one of claims 1 to 7.