Reliable and adaptive video transmission with improved synchronization, selective retransmission of missing packets, and reduced peak-to-average ratio of frame bit rate

By dynamically estimating network jitter at the receiving device and adjusting video encoding parameters accordingly, the invention addresses the excessively high peak-to-average ratio (PAR) of the frame bit rate in video transmission, achieving low-latency and reliable video transmission.

CN122228652A · Pending · Publication Date: 2026-06-16 · ROKOTO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Application (China)
Current Assignee / Owner
ROKOTO LTD
Filing Date
2024-08-26
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

Existing video transmission systems struggle to achieve reliable and adaptive video transmission in the presence of network jitter. They also exhibit excessively high peak-to-average ratios (PAR) of the frame bit rate, which require significant buffering capacity and increase latency.

Method used

The receiving device dynamically estimates the jitter level of the communication network and selectively decides whether to wait for missing packets of a video frame or to decode the frame immediately. The transmitting device adjusts the encoding parameters of its video encoder to reduce the peak-to-average ratio (PAR) of the frame bit rate.

Benefits of technology

It achieves low-latency and reliable video transmission in network jitter environments, reduces the peak-to-average ratio (PAR) of the frame bit rate, and reduces the need for buffering capacity.

✦ Generated by Eureka AI based on patent content.

Figure CN122228652A_ABST

Abstract

A video is encoded and transmitted from a transmitting device to a receiving device. The receiving device estimates a communication network jitter level that dynamically quantifies the degree of jitter of the communication network. N data packets are packetized and transmitted for a particular video frame, of which the receiving device receives only M, where N is greater than M. Based on the communication network jitter level, the receiving device selectively and dynamically determines whether to (i) wait for the arrival of more of the N-M data packets of the video frame that have not yet arrived, or conversely, (ii) not wait for additional data packets of the video frame and immediately decode and render it. The receiving device also selectively determines whether to send a retransmission request for a particular missing data packet. The transmitting device adjusts configuration parameters of its video encoder to reduce the peak-to-average ratio (PAR) of the required bit rate.

Description

Cross-Reference to Related Applications

[0001] This patent application claims the benefit of and priority to US 63/579,143, filed August 28, 2023, which is hereby incorporated herein by reference in its entirety.

Technical Field

[0002] This invention relates to the field of communication systems.

Background

[0003] Millions of users worldwide use electronic and computing devices every day. For example, laptops, desktop computers, smartphones, tablets, and other electronic devices are used for the following activities:

- browsing the internet;
- consuming digital content;
- streaming audio and video;
- sending and receiving email messages;
- exchanging instant messages (IM);
- taking part in video conferences;
- playing games; and more.

[0004] Many electronic devices communicate with each other, or with remote servers or entities, via one or more wireless communication links or networks, for example using Wi-Fi, cellular communication, or the Internet. Some electronic devices receive wireless communication signals carrying video data. This allows them to play streaming video or video clips on their display units.

Summary of the Invention

[0005] Some embodiments provide systems, apparatus, and methods for reliable and adaptive video transmission, particularly via wireless communication networks and/or wireless communication links.

[0006] For example, a transmitting device may transmit data packets of a video frame toward a receiving device. The receiving device may dynamically determine, ad hoc, whether (I) to continue waiting for one or more pending data packets of the current video frame in order to decode and render that frame, or conversely, (II) to stop waiting for the pending data packets and instead immediately decode and render the frame using the data packets that have already arrived.

[0007] For example, these dynamic, selective, ad hoc determinations may be based on any of the following:

- the estimated or measured level of network jitter or communication jitter in the communication channel or link, that is, the degree of variation in packet delay from the transmitting device to the receiving device;
- whether the estimated jitter level of the communication channel is above or below a network jitter threshold at the receiving device; or
- other jitter-related characteristics of the communication, as estimated or measured by the receiving device.

Decisions are made on a frame-by-frame basis, using the most recent network jitter estimate.

[0008] Additionally, some embodiments may dynamically and selectively determine, at the receiving device, whether to request retransmission of one or more missing, lost, or erroneous data packets. This decision is also made on a frame-by-frame basis. For example, it may be based on (or take into account) factors such as:

- the importance of the missing data packet for decoding the current video frame;
- the level of network jitter;
- the time already spent waiting for retransmissions for this frame or other frames; and/or
- other data.

[0009] Additionally, some embodiments may iteratively, continuously, periodically, or gradually modify the encoding parameter values, operating settings, or coefficients of the video encoder. The goal is that the video transmitted to the receiving device has a low peak-to-average ratio (PAR) of the per-frame bit rate, for example a PAR of less than 2.0, less than 1.50, or less than 1.25.

[0010] In some embodiments, video is encoded and transmitted from a transmitting device to a receiving device. The receiving device estimates a communication network jitter level, which dynamically quantifies the degree of jitter in the communication network. N data packets are packetized and transmitted for a specific video frame, and the receiving device receives only M of them, where N is greater than M. Based on the communication network jitter level, the receiving device selectively and dynamically determines whether to (i) wait for the arrival of more of the N-M data packets of the video frame that have not yet arrived, or conversely, (ii) stop waiting for the remaining data packets of the video frame and immediately decode and render it. The receiving device also selectively determines whether to send a retransmission request for a specific missing data packet. The transmitting device adjusts the configuration parameters of its video encoder to reduce the peak-to-average ratio (PAR) of the required bit rate.

[0011] Some embodiments may provide other and/or additional advantages and/or benefits.

Brief Description of the Drawings

[0012] Figure 1 is a schematic illustration of a system according to some illustrative embodiments of the present invention.

[0013] Figure 2A is a schematic graph showing a high PAR of the per-frame video bit rate in an embodiment that does not use the peak-to-average ratio (PAR) reduction scheme of the present invention.

[0014] Figure 2B is a schematic graph showing a low PAR of the per-frame video bit rate in an embodiment that uses the peak-to-average ratio (PAR) reduction scheme of the present invention.

Detailed Description

[0015] Reference is made to Figure 1, which is a schematic block diagram of a system 100 according to some illustrative embodiments. The system 100 includes a transmission device 110 that communicates with a receiving device 150 via a wireless communication link. In particular, this may be an Internet Protocol (IP)-based communication link or a User Datagram Protocol/Internet Protocol (UDP/IP) communication link.

[0016] Transmission device 110 stores, receives, or accesses a source video or input video 101 intended for transmission and delivery to receiving device 150. For example, the source/input video may be any of the following:

- a pre-recorded or pre-stored video or audio/video file;
- a real-time or near-real-time video feed from a local, nearby, or shared camera, imager, or video capture device; or
- a video feed or video stream that transmission device 110 obtains, downloads, or receives from another device or a remote server before and/or during the transmission of the video to receiving device 150.

Optionally, in some embodiments, the source/input video may be wholly or partially generated, locally or remotely, by a generative artificial intelligence (Gen-AI) unit or system, for example based on prompts or commands, or by other methods.

[0017] In some embodiments, the transmission device 110 includes a video encoder 111 that encodes (or re-encodes or transcodes) the source/input video. For example, it may use HEVC (H.265), H.264-SVC, H.264 (AVC), or another suitable video compression or encoding standard. Optionally, a frame grouping unit 112 may group frames such that each group of frames uses the same single forward error correction (FEC) word, FEC code, or Reed-Solomon (RS) word. A packetizing unit and FEC/RS encoder 113 packetizes the encoded frame data and adds the FEC codes or RS words. The transmitter 114 transmits the data packets to the receiving device 150.
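The receiver relies on packet sequence numbers (paragraph [0018]). To make them concrete, here is a minimal Python sketch of the packetizing step of unit 113. It assumes a hypothetical header made of a sequence number, a frame id, a packet index and a packet count, plus a hypothetical 1200-byte payload limit. FEC/RS encoding and frame grouping are deliberately omitted; this is an illustration, not the patent's packet format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    # Hypothetical header fields; the patent does not define a wire format.
    seq: int        # global sequence number, used by the receiver to detect gaps
    frame_id: int   # video frame the payload belongs to
    index: int      # position of this packet within the frame (0-based)
    total: int      # N: number of packets sent for this frame
    payload: bytes


def packetize(frame_id: int, encoded: bytes, first_seq: int,
              max_payload: int = 1200) -> List[Packet]:
    """Split one encoded frame into N packets of at most max_payload bytes each."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    chunks = [encoded[i:i + max_payload] for i in range(0, len(encoded), max_payload)]
    if not chunks:
        chunks = [b""]  # still send one packet so the receiver learns the frame exists
    n = len(chunks)
    return [Packet(first_seq + i, frame_id, i, n, chunk) for i, chunk in enumerate(chunks)]


packets = packetize(frame_id=47, encoded=bytes(3000), first_seq=100)
print([(p.seq, p.index, p.total, len(p.payload)) for p in packets])
# [(100, 0, 3, 1200), (101, 1, 3, 1200), (102, 2, 3, 600)]
```

Carrying the packet count N in every packet lets the receiver know how many packets of a frame to expect, even when some are lost.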

[0018] At receiving device 150, receiver 151 receives the incoming data packets; not every transmitted data packet actually arrives at receiver 151. Therefore, missing data packet detector 152 keeps track of arriving data packets and detects missing ones based on their packet sequence numbers. Optionally, an erasure vector generator and updater 153 maintains a vector representing consecutive data packets, in which a value of 1 indicates a missing data packet and a value of 0 indicates a received one.

Unpacking unit and FEC/RS decoder 154 then unpacks the data and performs FEC or RS decoding on the arriving data packets. Optionally, the FEC/RS decoder can be configured to use erasure indications. For example, consider an RS decoder for a code with N symbols per codeword, K of which are data symbols. Without erasure indications, it can correct up to floor((N-K)/2) errors. If it is given erasure indications marking the locations of the errors, it can correct up to N-K errors.

The frame degrouping unit 155 separates a group of frames into individual frames. The video decoder 156 decodes each video frame (e.g., HEVC/H.265, H.264-SVC, or H.264/AVC decoding), producing an output video 157. The output video can be shown to a user on a screen, monitor, or other display unit, and/or relayed or otherwise provided to other receivers.
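The erasure vector and the RS correction limits above can be illustrated with a short Python sketch. It uses the standard Reed-Solomon error-and-erasure bound (2 × errors + erasures ≤ N − K), which reproduces both figures quoted in the paragraph. It also assumes, without support from the patent, that each missing packet erases exactly one symbol per RS codeword; how packets are actually interleaved into codewords is not specified.

```python
from typing import Iterable, List


def erasure_vector(expected_seqs: Iterable[int], received_seqs: Iterable[int]) -> List[int]:
    """Erasure vector as in unit 153: 1 marks a missing packet, 0 a received one."""
    got = set(received_seqs)
    return [0 if s in got else 1 for s in expected_seqs]


def rs_can_correct(n: int, k: int, errors: int, erasures: int) -> bool:
    """Reed-Solomon RS(n, k) decoding bound: correctable if 2*errors + erasures <= n - k."""
    return 2 * errors + erasures <= n - k


vec = erasure_vector(range(1, 10), [1, 2, 4, 5, 6, 8, 9])
print(vec)  # [0, 0, 1, 0, 0, 0, 1, 0, 0]

# RS(255, 239) has n - k = 16 parity symbols:
print(rs_can_correct(255, 239, errors=8, erasures=0))   # True  (floor(16/2) = 8 errors)
print(rs_can_correct(255, 239, errors=9, erasures=0))   # False
print(rs_can_correct(255, 239, errors=0, erasures=16))  # True  (16 located errors)
```

RS(255, 239) is used only as a familiar example code; the patent does not name specific N and K values.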

[0019] Some embodiments use improved video synchronization between the transmitting and receiving devices. For example, the receiving device is configured to estimate, measure, or calculate the video timing of the transmitter. It also estimates network jitter and jitter characteristics, e.g., average jitter, median jitter, root mean square (RMS) jitter, standard deviation (SD) of jitter, jitter variance, or other jitter characteristics.

At the receiving device, a jitter characteristic estimator 158 estimates or measures such jitter characteristics. A channel jitter score 159 can then be generated to indicate the degree of jitter in the communication channel (or in transmission or reception). The score may cover the most recent T seconds (e.g., the most recent 10, 30, or 45 seconds), or the most recent P packets or F frames (e.g., the most recent 5, 16, or 32 packets or frames). For example, the jitter score can range from 0 to 100. A score of 100 indicates a highly jittery communication channel (or transmission or reception). A score of 0 indicates no jitter, or minimal jitter that does not adversely affect video transmission or reception.

In other implementations, the network jitter score may be expressed in milliseconds or other suitable units. It may be stated in absolute terms (e.g., an average network jitter of 74 milliseconds over the past 10 seconds or the most recent 300 video frames), or normalized to a relative scale or percentage. In some embodiments, measured, recent, calculated, or average network jitter values or their characteristics (e.g., their average over the past T seconds or the past N frames, their median, or their maximum) may be categorized into predefined ranges or jitter level bins, for example:

- bin 0 = no network jitter;
- bin 1 = 1 to 10 milliseconds of network jitter;
- bin 2 = 11 to 20 milliseconds;
- bin 3 = 21 to 30 milliseconds;
- bin 4 = 31 to 40 milliseconds;
- bin 5 = more than 40 milliseconds.

Other schemes may also be used to represent network jitter or its characteristics.
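The scoring and binning described in [0019] can be sketched in Python as follows. The 10-second sliding window and the 50 ms saturation point of the 0 to 100 score are hypothetical choices. The per-packet jitter samples are assumed to come from whichever estimation method is in use. Fractional values that fall between the listed integer ranges (e.g., 10.5 ms) are assigned to the next bin up, an assumption the text does not settle.

```python
import math
from collections import deque
from typing import Deque, Tuple


class JitterWindow:
    """Keeps per-packet jitter samples (ms) from the most recent window_s seconds."""

    def __init__(self, window_s: float = 10.0) -> None:
        self.window_s = window_s
        self.samples: Deque[Tuple[float, float]] = deque()  # (arrival_time_s, jitter_ms)

    def add(self, now_s: float, jitter_ms: float) -> None:
        self.samples.append((now_s, jitter_ms))
        # Drop samples that fell out of the sliding window.
        while self.samples and now_s - self.samples[0][0] > self.window_s:
            self.samples.popleft()

    def mean_ms(self) -> float:
        if not self.samples:
            return 0.0
        return sum(j for _, j in self.samples) / len(self.samples)


def jitter_score(mean_jitter_ms: float, saturation_ms: float = 50.0) -> int:
    """Hypothetical linear mapping onto 0..100, where 100 means a highly jittery channel."""
    return round(min(100.0, max(0.0, 100.0 * mean_jitter_ms / saturation_ms)))


def jitter_bin(jitter_ms: float) -> int:
    """Bins from [0019]: 0 = none, 1 = 1-10 ms, ..., 4 = 31-40 ms, 5 = more than 40 ms."""
    if jitter_ms <= 0:
        return 0
    if jitter_ms > 40:
        return 5
    return math.ceil(jitter_ms / 10)


w = JitterWindow(window_s=10.0)
for t, j in [(0.0, 30.0), (5.0, 10.0), (12.0, 20.0)]:
    w.add(t, j)
print(w.mean_ms())                                 # 15.0 (the sample at t=0 was dropped)
print(jitter_score(w.mean_ms()), jitter_bin(w.mean_ms()))  # 30 2
```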

[0020] The channel jitter score estimated or measured at the receiving device enables it to make improved decisions. It can dynamically determine either:

- (i) to continue waiting for data packets of a particular video frame that have not yet arrived, without yet starting to decode the frame from the data packets already received; or conversely,
- (ii) to stop or abandon waiting for the missing data packets of that frame, and immediately decode it from the data packets that have already arrived, without waiting (or, if some waiting has already occurred, without waiting further) for the missing data packets.

[0021] In some embodiments, the channel jitter score can be used for one or more other or additional decisions at the receiving device, and/or can trigger other or additional operations there, such as:

- (i) whether to wait for additional data packets to arrive, or instead to decode and render the current frame without the lost or not-yet-arrived data packets, as described above; and/or
- (ii) whether to request retransmission of a specific data packet that has not yet arrived.

A retransmission request can be skipped if the receiving device has decided to stop waiting for the missing or lost data packets and to decode and render the video frame immediately from the data packets that have already arrived.

[0022] For example, the wait-or-immediate frame rendering determination unit 160 can decide, dynamically and ad hoc, between two options. It can wait for lost, missing, or additional data packets to arrive before decoding and rendering the frame. Conversely, it can avoid waiting (or abandon a wait already in progress) and immediately decode and render the frame from the data packets that have arrived.

For example, suppose the channel jitter score is greater than a predefined jitter threshold 161, such as 60 on a scale of 0 to 100. The unit then decides to abandon waiting and immediately decode and render the frame. Conversely, if the channel jitter score is equal to or less than the predefined jitter threshold 161, the unit decides to wait, or continue waiting, for the missing data packets of the frame. It postpones (or does not yet start) decoding and rendering the frame.

[0023] Optionally, the wait-or-immediate frame rendering determination unit 160 may include, or operate together with, a watchdog timer 162. The watchdog ensures that the wait for one or more missing data packets does not continue indefinitely and does not exceed a predefined maximum waiting period (e.g., T milliseconds). For example, T or more milliseconds may elapse since a certain milestone, such as the arrival of the first data packet of the frame or, in other embodiments, the arrival of the most recent data packet of the frame. Waiting for the missing data packets then stops, and unit 160 decides to decode and render the frame immediately from the data packets that have arrived, without any further waiting.
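The per-frame decision of unit 160, together with the threshold 161 of [0022] and the watchdog 162 of [0023], can be sketched as a small Python function. It follows the rule exactly as stated in [0022]: a score above the threshold means stop waiting. It measures the watchdog from the first packet arrival, which is one of the two milestones mentioned. The 20 ms watchdog default is a hypothetical value, and the function and field names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class FrameState:
    received: int            # M: packets of this frame that have arrived so far
    total: int               # N: packets sent for this frame
    first_arrival_ms: float  # when the first packet of this frame arrived


def decide_frame(state: FrameState, now_ms: float, jitter_score: int,
                 jitter_threshold: int = 60, watchdog_ms: float = 20.0) -> str:
    """Return "decode" or "wait" for one frame, following [0022]-[0023]."""
    if state.received >= state.total:
        return "decode"  # nothing is missing
    if jitter_score > jitter_threshold:
        return "decode"  # score above threshold 161: abandon waiting
    if now_ms - state.first_arrival_ms >= watchdog_ms:
        return "decode"  # watchdog 162: maximum waiting period reached
    return "wait"


frame = FrameState(received=8, total=9, first_arrival_ms=0.0)
print(decide_frame(frame, now_ms=5.0, jitter_score=30))   # wait
print(decide_frame(frame, now_ms=5.0, jitter_score=75))   # decode (score above threshold)
print(decide_frame(frame, now_ms=25.0, jitter_score=30))  # decode (watchdog expired)
```

There are two natural ways to realize the re-evaluation described in [0030]. One is to call such a function every few milliseconds. The other is to call it whenever a packet of the frame arrives. Each frame is judged independently with the latest score.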

[0024] In some embodiments, the above decision (wait for the arrival of a missing data packet, or immediately decode and render the frame) is made ad hoc and dynamically on a frame-by-frame basis. Thus the decision for frame 5 does not necessarily affect the decision for frame 6, and does not necessarily follow from the decision for frame 4. For example, assume packets 1-6 and 8-9 of a frame have arrived and packet 7 is missing:

- In frame 17, the receiving device may decide to wait an additional 0.7 milliseconds for packet 7 before giving up and starting to decode frame 17.
- One minute later, in frame 222, with the same packets arrived, it may decide to wait only an additional 0.3 milliseconds for packet 7 before giving up and starting to decode frame 222.
- Two minutes later, in frame 555, with the same packets arrived, it may decide not to wait for packet 7 at all, and instead immediately decode and render frame 555.

The decisions differ because each one is based on (or takes into account) the current network jitter level, which may change from minute to minute or over even shorter intervals.

[0025] In some embodiments, the exact timing of requesting (or performing) packet retransmission is also determined based on the jittery or non-jittery behavior of the communication network or channel, or based on characteristics of the network jitter.

[0026] For example, the selective retransmission request determination unit 163 can decide, dynamically and ad hoc, whether to request retransmission of one or more missing, lost, or undelivered data packets. It bases this decision on the channel jitter score estimated or measured at the receiving device, in coordination with the current or most recent decision of the wait-or-immediate frame rendering determination unit 160:

- If unit 160 has just decided to wait, or to continue waiting, for a particular undelivered data packet, unit 163 triggers the retransmission requester unit 164. Unit 164 immediately sends a retransmission request for that data packet from the receiving device 150 to the transmitting device 110.
- Conversely, if unit 160 has decided not to continue waiting for a particular undelivered data packet, unit 163 refrains from triggering a retransmission request for it. It also refrains from triggering any additional retransmission request for any undelivered data packet of that frame.

[0027] Therefore, some embodiments may use selective retransmission of missing or lost data packets: the receiving device sends retransmission requests to the transmitting device selectively. As a result, not every data packet that the receiving device determines to be missing or lost becomes the subject of a retransmission request by the receiving device, or of a retransmission operation by the transmitting device. Missing packets may be identified, for example, from gaps in the normally consecutive packet sequence numbers.

[0028] For example, the receiving device receives data packets 1, 2, 4, 5, 6, 8, and 9; two data packets, 3 and 7, were lost en route. The receiving device selectively decides for which data packets to request retransmission, and which to "let go" without requesting retransmission. It makes this ad hoc decision by considering one or more of the following:

- the level of network jitter;
- whether each missing data packet is critical for decoding the video frame; and
- whether the information in the missing data packet can be fully or partially recovered from redundant data, FEC data, RS data, or other error protection data.

[0029] Continuing the example above, the receiving device may selectively request retransmission of data packet 3, because data packet 3 is crucial for decoding the video frame. It may not request retransmission of data packet 7, because data packet 7 is not important for decoding the frame and/or because its content can be partially or fully reconstructed from FEC/RS data or other redundant or error-protection data. Once data packet 3 has been retransmitted and actually arrives at the receiving device, the receiving device decodes the video frame from data packets 1-6 and 8-9, without data packet 7.

Additionally, in some embodiments, data packet 3 may fail to arrive within a specific time window. The receiving device may then decide to abandon waiting for it and to decode and render the frame from the partial data that has arrived (data packets 1-2, 4-6, and 8-9), if such decoding is possible. This decision may optionally take into account the level of network jitter, which may be above a predefined threshold.

It should be noted that in some cases the frame can be decoded without the missing packet 3, even though packet 3 is considered crucial for decoding, once packet 7 has arrived at the receiving device. The frame can then be decoded without actually receiving packet 3 and without waiting for it any further.
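The selective retransmission logic of [0026] to [0029] can be condensed into a Python sketch. It requests nothing once the frame is going to be decoded without waiting. Otherwise it requests only missing packets that are critical and that FEC/RS data cannot rebuild. How "critical" and "FEC-recoverable" are determined is not specified in the text, so both are assumed here to be supplied as inputs. The function name is hypothetical.

```python
from typing import List, Set


def packets_to_request(missing: List[int], critical: Set[int],
                       fec_recoverable: Set[int], frame_decision: str) -> List[int]:
    """Pick which missing packets of one frame to ask the transmitter to resend.

    - If the frame will be decoded now ("decode"), request nothing ([0021], [0026]).
    - Otherwise request only packets that matter for decoding and that the
      FEC/RS data cannot rebuild ([0028]-[0029]).
    """
    if frame_decision != "wait":
        return []
    return [p for p in missing if p in critical and p not in fec_recoverable]


# Example from [0028]-[0029]: packets 3 and 7 lost; 3 is crucial, 7 is FEC-recoverable.
print(packets_to_request([3, 7], critical={3}, fec_recoverable={7}, frame_decision="wait"))    # [3]
print(packets_to_request([3, 7], critical={3}, fec_recoverable={7}, frame_decision="decode"))  # []
```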

[0030] Therefore, some embodiments may provide a dynamic re-evaluation mechanism that works at fine granularity (e.g., every 1, 5, or 10 milliseconds). It re-evaluates whether to continue waiting for the missing packets of a given frame, or conversely, to immediately decode and render the frame without waiting for additional packets. It may also re-evaluate whether to send a retransmission request for one, some, or all of the missing packets of a given frame. Optionally, the decision to "decode immediately or wait for missing packets" may be re-evaluated at any of these points:

- every T milliseconds;
- after a triggering event (e.g., the arrival of a missing packet belonging to the frame);
- after T milliseconds have passed since the most recent packet of the frame was received; or
- based on one or more other conditions or criteria.

[0031] In some embodiments, optionally, a packet arrival status log 165 may be maintained and dynamically updated by a packet arrival status log updater 166. Its contents may be used by the wait-or-immediate frame rendering determination unit 160 and/or the selective retransmission request determination unit 163 to make decisions. For example, the packet arrival status log may indicate:

- packet 1 of frame 47 has arrived (value = 0);
- packet 2 of frame 47 has not arrived (value = 1);
- packet 2 of frame 47 has not arrived and a retransmission request has been issued for it (value = 2); and/or
- other data.

Optionally, in some embodiments, the log may further indicate the number of retransmission requests sent for a particular packet, the timestamp at which each retransmission request was sent, and/or other data that units 160 and/or 163 may use to make decisions.
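A minimal in-memory version of the packet arrival status log 165 might look like the following Python sketch. It uses the status values 0, 1 and 2 from [0031] and records the time of each retransmission request. The class, its methods, and keying entries by (frame, packet) are hypothetical illustrations, not the patent's data structure.

```python
import time
from collections import defaultdict
from typing import DefaultDict, Dict, List, Optional, Tuple

# Status values from paragraph [0031].
ARRIVED, MISSING, MISSING_REQUESTED = 0, 1, 2

Key = Tuple[int, int]  # (frame number, packet number)


class PacketArrivalLog:
    """Hypothetical in-memory version of packet arrival status log 165."""

    def __init__(self) -> None:
        self.status: Dict[Key, int] = {}
        self.request_times: DefaultDict[Key, List[float]] = defaultdict(list)

    def mark_arrived(self, frame: int, packet: int) -> None:
        self.status[(frame, packet)] = ARRIVED

    def mark_missing(self, frame: int, packet: int) -> None:
        # Do not downgrade a packet that already arrived or was already requested.
        self.status.setdefault((frame, packet), MISSING)

    def mark_requested(self, frame: int, packet: int, when: Optional[float] = None) -> None:
        self.status[(frame, packet)] = MISSING_REQUESTED
        self.request_times[(frame, packet)].append(time.monotonic() if when is None else when)

    def request_count(self, frame: int, packet: int) -> int:
        return len(self.request_times.get((frame, packet), []))

    def missing(self, frame: int) -> List[int]:
        return sorted(p for (f, p), s in self.status.items() if f == frame and s != ARRIVED)


log = PacketArrivalLog()
log.mark_arrived(47, 1)
log.mark_missing(47, 2)
log.mark_requested(47, 2, when=0.012)
print(log.status[(47, 1)], log.status[(47, 2)], log.request_count(47, 2))  # 0 2 1
print(log.missing(47))  # [2]
```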

[0032] Therefore, some embodiments use selective, ad hoc decisions at the receiving device to choose between two options:

- (i) continue waiting for one or more not-yet-arrived data packets before decoding a particular (current) video frame; or
- (ii) immediately stop waiting for additional data packets of the current video frame, and begin decoding and rendering it from the data packets that have already arrived.

The decision is based on the degree of jitter in packet arrival over the communication channel or network.

[0033] As a first example, the receiving device receives data packets 1, 2, 3, 4, 5, 6, 8, and 9, and discovers that data packet 7 is missing. It sends a request to the transmitting device to retransmit the missing data packet 7. As time passes, the receiving device dynamically and ad hoc determines when to give up waiting for the missing data packet, that is, whether:

- (I) to stop waiting for data packet 7 and immediately decode and render the video frame from data packets 1-6 and 8-9; or alternatively,
- (II) to continue waiting for data packet 7 to arrive.

[0034] One factor in the decision at the receiving device is the estimated, measured, or known level of jitter of the transmission or communication channel. Suppose the channel has recently (e.g., within the last T seconds, such as the last 10 seconds) exhibited highly jittery behavior: some packets arrive immediately, while others arrive with a large delay of more than D milliseconds. This supports a decision to wait longer for the missing packet 7 before decoding the frame. Conversely, suppose the channel has recently exhibited non-jittery or low-jitter behavior: all or most packets arrive immediately, with few or no missing or delayed packets. This supports a decision not to wait for packet 7, but to immediately decode and render the video frame without it.

[0035] According to some embodiments, communication channel jitter or communication network jitter denotes undesirable or excessive variation in the time delay between transmitting and receiving signals (or data packets) over a network connection. In other words, network jitter is the variation in delay among data packets transmitted over the network. The jitter level can indicate how much the timing or order of data packet arrivals is disrupted relative to how the packets left the transmitting device. Measured or estimated network jitter or communication channel jitter can be of various types, such as:

- constant jitter, a roughly constant level of variation in packet delay;
- transient jitter, a large delay of a single data packet;
- short-term jitter, a large delay of a certain number of data packets; or
- a combination of two or more of these types over time.

[0036] Network jitter, communication channel jitter, or packet timing delay jitter can be measured or estimated using one or more suitable methods, for example:

- by using the round-trip time (RTT) of a series of packets originating from the same transmission device 110;
- by measuring the variation in transmission time between two endpoints in the network;
- by estimating or measuring the bandwidth of the network or communication link, which can help estimate the jitter level;
- by using ping operations, taking the difference between the propagation times of consecutive packets and averaging these differences to obtain the average jitter in the network; and/or
- by otherwise measuring or estimating the average absolute difference between the expected and actual arrival times of packets, typically expressed in milliseconds.
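Two of these estimators can be sketched in Python, with timestamps in milliseconds. The first is the last method listed above: the mean absolute difference between expected and actual arrival times. The second is a swapped-in, well-known alternative that the patent does not name. It is the smoothed interarrival jitter estimator of RFC 3550 (RTP), which compares the spacing of consecutive arrivals with the spacing of their send times. Neither is presented as the patent's own estimator.

```python
from typing import Sequence


def mean_abs_arrival_deviation(expected_ms: Sequence[float], actual_ms: Sequence[float]) -> float:
    """Average |actual - expected| arrival time in ms (last method listed in [0036])."""
    if len(expected_ms) != len(actual_ms) or not expected_ms:
        raise ValueError("need equal-length, non-empty sequences")
    deviations = [abs(a - e) for e, a in zip(expected_ms, actual_ms)]
    return sum(deviations) / len(deviations)


def rfc3550_jitter(send_ms: Sequence[float], recv_ms: Sequence[float]) -> float:
    """Interarrival jitter estimator from RFC 3550 (RTP).

    For consecutive packets: D = (R_i - R_{i-1}) - (S_i - S_{i-1}), then J += (|D| - J) / 16.
    """
    if len(send_ms) != len(recv_ms):
        raise ValueError("need equal-length sequences")
    j = 0.0
    for i in range(1, len(send_ms)):
        d = (recv_ms[i] - recv_ms[i - 1]) - (send_ms[i] - send_ms[i - 1])
        j += (abs(d) - j) / 16.0
    return j


print(mean_abs_arrival_deviation([0, 10, 20], [1, 10, 23]))  # about 1.33 ms
print(rfc3550_jitter([0, 10], [0, 26]))                      # 1.0 (D = 16 ms, 16/16)
```

The 1/16 gain in RFC 3550 smooths out single outliers. The plain mean reacts more strongly to a single delayed packet, such as the transient jitter described in [0035].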

[0037] For the purpose of estimating, detecting, and/or measuring the jitter level of a communication channel, some embodiments may optionally use one or more elements or methods described, for example, in the following publications, which are hereby incorporated by reference in their entirety:

- US Patent 8,537,951 B2, entitled "Detection of jitter in a communication network"; and/or
- Korean Patent Application Publication No. KR 10-2013-0009670 A, entitled "Packet transmission apparatus and method, and packet reception apparatus and method in MMT system"; and/or
- other suitable means.

[0038] Alternatively or additionally, for the transmission and/or reception of video, and/or the encoding and/or decoding of video, some embodiments may use one or more units or methods described, for example, in the following publications, which are hereby incorporated by reference in their entirety:

- US Patent 11,490,140, entitled "System, device, and method for robust video transmission utilizing user datagram protocol (UDP)"; and/or
- US Patent Application Publication No. 2022/0060767 A1, entitled "System, device, and method for robust video transmission utilizing user datagram protocol (UDP)".

[0039] In some embodiments, the communication protocol between the transmitting and receiving devices can further be configured to dynamically modify the values, coefficients, or operating points of one or more mechanisms or parameters of the FEC/RS encoder and/or the video encoder. This may include modifying or determining the video encoder's constant bit rate (CBR) parameter value. The adjustment takes into account the estimated or measured bandwidth, the network jitter level, the packet error rate (PER) or packet loss rate, and/or other quality or fault indicators of the communication channel or link.

Data about these parameters, or data useful for adjusting, modifying, or determining them, can be sent from the receiving device to the transmitting device in the uplink, or in a feedback or control channel. This enables adaptive video streaming that adjusts immediately to the instantaneous or current bandwidth, PER, and jitter level of the communication link. It also enables the system to dynamically modify the operating parameters of the CBR video encoder and/or FEC/RS encoder as these conditions change.

[0040] In some embodiments, the system can recover rapidly from severe network problems, such as communication interruptions of 0.5 seconds or 1 second. This is because the receiving device can continue rendering the last decodable video frame during the interruption, and then recover quickly once the interruption ends, based on the last known estimate.

[0041] Some embodiments may further use, provide, or enforce a low or relatively low peak-to-average ratio (PAR), which is a key factor in achieving low latency. For example, consider a communication channel with generally stable bandwidth that the system aims to fully utilize. In such a channel, a large PAR requires buffering and thus increases latency. In some embodiments, the video encoder is tuned, or its parameters, coefficients, or operating point are modified or dynamically set, to provide an operating point with a low PAR below a predefined threshold.
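Some rough arithmetic shows why a large PAR costs latency. It rests on an idealized assumption not stated in the patent: the channel rate exactly equals the stream's average bit rate. Under that assumption, an average-size frame takes one frame interval to send, and a frame PAR times larger takes PAR frame intervals. The peak frame therefore arrives (PAR − 1) frame intervals late, and the receiver must buffer about that much extra to play smoothly. The Python sketch ignores propagation delay, queuing, and FEC overhead.

```python
def extra_latency_ms(par: float, fps: float) -> float:
    """Idealized extra delay of a peak frame compared with an average frame.

    Assumes channel rate == average bit rate, so a frame PAR times the average
    size needs PAR frame intervals to transmit instead of one.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    frame_interval_ms = 1000.0 / fps
    return max(0.0, par - 1.0) * frame_interval_ms


for par in (2.0, 1.5, 1.25):
    print(par, round(extra_latency_ms(par, fps=60), 1))
# 2.0 16.7
# 1.5 8.3
# 1.25 4.2
```

Under these assumptions, at 60 frames per second, lowering the PAR from 2.0 to 1.25 (thresholds mentioned in [0009]) cuts the added delay from about one frame interval to about a quarter of one.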

[0042] In an illustrative embodiment, the receiving device may include:

- (I) a bandwidth estimator 167 configured to continuously or periodically estimate or measure the actual or effective bandwidth of the communication channel or link between the transmitting and receiving devices; and/or
- (II) a packet error rate (PER) measurement unit 168 configured to measure or estimate the PER, for example by dividing the number of packets still erroneous after FEC decoding by the total number of received packets.

[0043] Additionally or alternatively, the transmitting device may include a peak-to-average ratio (PAR) measurement unit 118, which is configured to measure the actual PAR for each frame and / or the average PAR value over the most recent frames (e.g., the most recent 10, 50, or 300 frames); in some embodiments, the PAR measurement unit may instead be located at the receiving device, which may periodically transmit the measured PAR values to the transmitting device via an uplink, feedback channel, or control channel.
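
The following sketch shows one way a PAR measurement unit could keep both quantities. It assumes, since the source leaves this open, that a frame's PAR is its size relative to the mean frame size in the window, and that the window PAR is the largest frame divided by that mean; the class name and window length are hypothetical.

```python
from collections import deque


class ParTracker:
    """Per-frame and windowed peak-to-average ratio of frame sizes."""

    def __init__(self, window: int = 50) -> None:
        self.sizes = deque(maxlen=window)       # bits of the most recent frames
        self.frame_pars = deque(maxlen=window)  # per-frame PAR values

    def _mean(self) -> float:
        return sum(self.sizes) / len(self.sizes)

    def add_frame(self, bits: int) -> float:
        """Record one encoded frame and return its PAR against the window mean."""
        self.sizes.append(bits)
        par = bits / self._mean()
        self.frame_pars.append(par)
        return par

    def average_par(self) -> float:
        """Average of the per-frame PAR values over the window."""
        return sum(self.frame_pars) / len(self.frame_pars)

    def window_par(self) -> float:
        """Largest frame in the window divided by the window mean."""
        return max(self.sizes) / self._mean()


t = ParTracker(window=4)
for bits in (100, 100, 100, 400):
    t.add_frame(bits)
print(round(t.window_par(), 3))  # 2.286
```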

[0044] The estimated bandwidth value, the measured PER value, the most recent PAR value or the average PAR (if calculated at the receiving device), and / or the value of the estimated channel jitter score 159 can be periodically transmitted from the feedback channel transmitter 169 of the receiving device 150 to the feedback channel receiver 115 of the transmitting device.
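
The report can be carried as a small fixed-size message. The layout below (network byte order: a 32-bit unsigned bandwidth in kbps, followed by three 32-bit floats for PER, PAR, and jitter score) is a hypothetical format chosen for illustration; the source does not define the wire format of the feedback channel.

```python
import struct
from typing import NamedTuple

FEEDBACK_FMT = "!Ifff"  # hypothetical: uint32 bandwidth, float32 PER/PAR/jitter


class Feedback(NamedTuple):
    bandwidth_kbps: int
    per: float
    par: float
    jitter_score: float


def pack_feedback(fb: Feedback) -> bytes:
    """Serialize a report at the receiving device (feedback transmitter 169)."""
    return struct.pack(FEEDBACK_FMT, fb.bandwidth_kbps, fb.per, fb.par, fb.jitter_score)


def unpack_feedback(payload: bytes) -> Feedback:
    """Parse a report at the transmitting device (feedback receiver 115)."""
    return Feedback(*struct.unpack(FEEDBACK_FMT, payload))


msg = pack_feedback(Feedback(3000, 0.25, 1.5, 12.0))
print(len(msg), unpack_feedback(msg))
```

Because the fields are float32, values such as 0.01 are rounded in transit, so the transmitting device should treat them as approximate.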

[0045] Based on some or all of the above parameter values, the transmitting device can configure, modify, or adjust the operating settings or coefficients of its video encoder 111. For example, the VE (video encoder) setting modification unit 116 can set, adjust, increase, or decrease the values of one or more parameters, coefficients, or thresholds of the video encoder based on feedback indicating bandwidth, PER, PAR, and / or network jitter.

[0046] In some embodiments, modifications or adjustments to the video encoder parameters may be performed iteratively or incrementally: after each incremental / iterative modification of the video encoder settings, updated parameter values are received again as new feedback, in a dynamic and adaptive process that attempts to converge toward a low PAR, toward a PAR below a predefined threshold, or toward a PAR within a specific threshold range. Optionally, such modifications or adjustments to parameter values or coding coefficients may be managed, controlled, determined, and / or performed by a PAR reduction unit 117, which may execute an algorithm or a set of rules that incrementally / iteratively / repeatedly modifies the values until the CBR stabilizes near a desired target, and may continue to perform such modifications to maintain the CBR within a limited band around the target CBR value (e.g., within 5%, 10%, or 20% below or above the target CBR value).
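
The incremental loop run by PAR reduction unit 117 can be sketched as a simple band controller. In this illustration the encoder setting is a single hypothetical rate parameter, the toy encoder overshoots that setting by 20%, and the 10% band and 5% step are assumptions; a real encoder exposes different controls.

```python
def adjust_setting(setting_kbps: float, measured_kbps: float, target_kbps: float,
                   band: float = 0.10, step: float = 0.05) -> float:
    """One incremental modification: nudge the setting toward the target band."""
    low, high = target_kbps * (1 - band), target_kbps * (1 + band)
    if measured_kbps > high:
        return setting_kbps * (1 - step)
    if measured_kbps < low:
        return setting_kbps * (1 + step)
    return setting_kbps  # inside the band: leave the encoder alone


def toy_encoder(setting_kbps: float) -> float:
    """Hypothetical encoder that produces 20% more than it is asked for."""
    return setting_kbps * 1.2


def run_controller(start_kbps: float, target_kbps: float, max_iter: int = 100):
    setting = start_kbps
    for _ in range(max_iter):
        measured = toy_encoder(setting)  # new feedback after each modification
        new_setting = adjust_setting(setting, measured, target_kbps)
        if new_setting == setting:       # stable within the band
            return setting, measured
        setting = new_setting
    return setting, toy_encoder(setting)


setting, measured = run_controller(3000.0, 2400.0)
print(round(measured))  # 2514
```

A step smaller than half the band width keeps the measured rate from jumping over the band, so the loop settles instead of oscillating.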

[0047] Reference is made to Figure 2A, which is a schematic illustration of a graph 210 showing a high PAR value in an implementation that does not utilize the PAR reduction scheme of the present invention. The horizontal axis indicates the frame index number. The vertical axis may indicate kilobits (per frame or per second), or the required / utilized bit rate or bandwidth (e.g., in kbps). The initial frames produce a high peak in the required bit rate or bandwidth before the bit rate stabilizes near an overall constant (and lower) level, which yields a high PAR value and in turn requires a large / excessive amount of buffering capacity. For example, the highest bit rate peak exceeds 16,000 kbps, while the overall constant bit rate stabilizes near 3,000 kbps, making the PAR value approximately 16 / 3, or approximately 5.33; that is, the peak bit rate is more than five times the overall stable CBR or the desired / effective / average CBR value.

[0048] Reference is made to Figure 2B, which is a schematic illustration of a graph 220 showing a low PAR value in an embodiment utilizing the PAR reduction scheme of the present invention. The horizontal axis indicates the frame index number. The vertical axis may indicate the corresponding bit rate (or required bandwidth) in kilobits (e.g., per frame), or the required / utilized bit rate or bandwidth (e.g., in kbps). The darker, thicker line labeled "target" rises immediately from zero to the target bit rate at the first frame and then remains horizontal (indicating a constant bit rate, CBR); it indicates (or may correspond to) the target CBR, near which the actual bit rate is expected or desired to stabilize. The gray line labeled "Result" represents the actual bit rate of each video frame; it tracks the horizontal target line without large peaks above it: it is typically below the target line and occasionally exceeds it by no more than 10%, 15%, or 20% (e.g., with a CBR target of 2,400 kbps, the gray result line peaks at around 2,640 kbps, about 10% above the 2,400 kbps target). The PAR value shown in graph 220 of Figure 2B is therefore approximately 1.10. Some implementations can similarly achieve PAR values lower than 1.05, 1.10, 1.15, 1.20, 1.25, 1.50, or 2.0.
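
The PAR values quoted for the two graphs follow directly from the definition used in this document, the peak bit rate divided by the overall constant bit rate (the symbols below are introduced here only for illustration):

```latex
\mathrm{PAR} = \frac{R_{\mathrm{peak}}}{R_{\mathrm{CBR}}}, \qquad
\mathrm{PAR}_{210} \approx \frac{16{,}000~\mathrm{kbps}}{3{,}000~\mathrm{kbps}} \approx 5.33, \qquad
\mathrm{PAR}_{220} \approx \frac{2{,}640~\mathrm{kbps}}{2{,}400~\mathrm{kbps}} = 1.10
```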

[0049] It should also be noted that the terms CBR or constant bit rate, as used herein, remain applicable even though the actual / effective bit rate is not mathematically constant at all times and in all frames; rather, these terms indicate that the video encoder is configured to encode video frames at an overall constant or similar bit rate per frame, such that the bit rate required for each video frame is within 5%, 10%, 20%, or 25% above or below the expected overall constant bit rate, and there are no large peaks or drops of more than 50% above or below that expected overall constant bit rate.
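
A minimal Python check of this definition is sketched below; the function name and the default thresholds (20% tolerance, 50% large excursion) are illustrative choices taken from the ranges above, not a prescribed test.

```python
def cbr_deviation_report(frame_kbps: list[float], nominal_kbps: float,
                         tolerance: float = 0.20, large: float = 0.50) -> dict:
    """Compare per-frame bit rates against the expected overall constant bit rate."""
    deviations = [abs(r - nominal_kbps) / nominal_kbps for r in frame_kbps]
    worst = max(deviations)
    return {
        "worst_deviation": worst,
        "within_tolerance": worst <= tolerance,  # behaves as CBR in this sense
        "large_excursion": worst > large,        # a peak or drop beyond 50%
    }


print(cbr_deviation_report([2400, 2640, 2300], 2400)["within_tolerance"])  # True
print(cbr_deviation_report([16000, 3000, 3000], 3000)["large_excursion"])  # True
```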

[0050] It should be noted that the data shown in graph 220 may correspond to video data of the “coarse” or “basic” layer of a multi-layer video coding scheme (such as H.265), to video data of its “fine” or “detailed” layer, or to a combination of the coarse and fine video data of such a multi-layer / multi-channel video coding scheme.

[0051] In Figure 2B, it can be observed that the initial frames do not produce a peak in the required bit rate or bandwidth; instead, they quickly stabilize near a generally constant (and low) bit rate, resulting in a low PAR value, which in turn does not require a large / excessive amount of buffering capacity.

[0052] Returning to Figure 1, each unit of system 100 can be implemented using hardware components and / or software components. In some embodiments, the units or devices of system 100 may include or be implemented using other suitable components, such as: processor, central processing unit (CPU), digital signal processor (DSP), graphics processing unit (GPU), processing core, controller, logic unit, memory unit (e.g., random access memory (RAM), flash memory), storage unit (e.g., hard disk drive, solid-state drive, optical drive), input unit (e.g., keyboard, keypad, mouse, trackball, audio microphone, touch screen), output unit (e.g., screen, touch screen, audio speaker), power supply (e.g., battery, power cell, connection to mains power), one or more wired and / or wireless transceivers (e.g., Wi-Fi transceiver, Bluetooth transceiver, cellular transceiver), a housing that houses some or all of the components of the device, operating system (OS) with drivers and applications, and / or other suitable hardware and / or software components.

[0053] Some embodiments provide a method for transmitting video from a transmitting device to a receiving device via an Internet Protocol (IP) communication link. For example, the method includes: (a) at the receiving device, estimating a communication network jitter level that dynamically quantifies the degree of jitter of the communication network between the transmitting device and the receiving device; (b) at the receiving device, receiving M data packets out of N data packets of a specific video frame, where N is greater than M, and where the N data packets are packaged and transmitted by the transmitting device (or have been, or are currently being, packaged and / or transmitted) as a representation of the specific video frame; (c) at the receiving device, based on the communication network jitter level, selectively and dynamically determining whether to: (c1) wait for the arrival of one or more of the N-M data packets of the specific video frame that have not yet arrived before decoding and rendering the specific video frame, or conversely, (c2) not wait for the arrival of the other data packets of the specific video frame and immediately decode and render the specific video frame.
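
The source does not specify how the receiving device estimates the jitter level in step (a). One well-known option, used here as an assumed stand-in, is the interarrival jitter estimator defined in RFC 3550 (RTP), which smooths the variation in transit time between consecutive packets with a gain of 1/16. The class name and millisecond units are illustrative.

```python
from typing import Optional, Tuple


class InterarrivalJitter:
    """RFC 3550 interarrival jitter: J += (|D| - J) / 16."""

    def __init__(self) -> None:
        self.jitter_ms = 0.0
        # (send time, arrival time) of the previous packet, if any.
        self._prev: Optional[Tuple[float, float]] = None

    def on_packet(self, send_ms: float, arrival_ms: float) -> float:
        if self._prev is not None:
            prev_send, prev_arrival = self._prev
            # D = change in relative transit time between consecutive packets.
            d = (arrival_ms - prev_arrival) - (send_ms - prev_send)
            self.jitter_ms += (abs(d) - self.jitter_ms) / 16.0
        self._prev = (send_ms, arrival_ms)
        return self.jitter_ms


est = InterarrivalJitter()
for send, arrive in [(0, 50), (10, 60), (20, 86)]:
    level = est.on_packet(send, arrive)
print(level)  # 1.0
```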

[0054] In some embodiments, the determination in step (c) is performed on a per-frame (frame-by-frame) basis, such that the determination can differ dynamically across two or more consecutive or non-consecutive frames. For example, for video frame 14, only 6 out of a total of 9 video packets have arrived at the receiving device, and the receiving device, based on a high level of communication network jitter (e.g., greater than or equal to a predefined threshold), decides not to wait for the remaining 3 video packets (or any of them) and immediately decodes and renders the video frame based on the 6 video packets that have arrived. Conversely, for video frame 15 (a consecutive frame) or video frame 93 (a non-consecutive frame), only 8 out of 10 video packets have arrived at the receiving device, and given a low level of communication network jitter (e.g., below the predefined threshold), the receiving device may decide to continue waiting for one or both of the two packets of the frame that have not arrived, and not yet decode / render the video frame.
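
The per-frame rule of steps (c1) and (c2) can be sketched as follows. The threshold value and the function and enum names are hypothetical; the jitter level is assumed to come from whatever estimator the receiving device uses.

```python
from enum import Enum


class Action(Enum):
    WAIT = "wait"              # step (c1): wait for more of the N-M packets
    DECODE_NOW = "decode_now"  # step (c2): decode and render with M packets


def decide(received_m: int, total_n: int, jitter_level: float,
           threshold: float) -> Action:
    """Per-frame wait/decode decision driven by the network jitter level."""
    if received_m >= total_n:
        return Action.DECODE_NOW  # frame complete, nothing left to wait for
    if jitter_level >= threshold:
        return Action.DECODE_NOW  # high jitter: do not wait for late packets
    return Action.WAIT            # low jitter: missing packets are likely close


# Frames 14 and 15 from the example above, with an assumed threshold of 30 ms.
print(decide(6, 9, jitter_level=45.0, threshold=30.0))   # Action.DECODE_NOW
print(decide(8, 10, jitter_level=12.0, threshold=30.0))  # Action.WAIT
```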

[0055] Alternatively, in some embodiments, the dynamic and selective decision or determination at the receiving device is performed on a per-packet (per-video-packet) basis within the same video frame, and is re-performed or re-evaluated as needed after each additional video packet of that frame arrives. For example, for video frame 37, the transmitting device encodes and transmits a total of nine video packets to represent the frame. After receiving six of those nine video packets, the receiving device performs an evaluation, takes into account that the current level of the network jitter buffer is low (e.g., below a predefined threshold), and determines to continue waiting for additional video packets of video frame 37. A few milliseconds later, a seventh video packet of video frame 37 arrives; after receiving seven of the nine video packets, the receiving device performs a new evaluation, takes into account that the current level of the network jitter buffer is still low (e.g., below the predefined threshold), and again determines to continue waiting. A few milliseconds later, the eighth video packet of video frame 37 arrives; after receiving 8 of the 9 video packets, the receiving device performs a new evaluation, but this time takes into account that the current level of the network jitter buffer is high (e.g., equal to or greater than the predefined threshold), so it determines to stop waiting for the remaining video packet of video frame 37 and to immediately decode and render video frame 37 based on the 8 video packets that have arrived.
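
The per-packet variant re-runs the same test each time another packet of the frame arrives. In the sketch below, which is illustrative only, each list entry is the jitter (or jitter-buffer) level observed when the corresponding packet arrives; the function name and threshold are assumptions.

```python
from typing import Optional, Sequence


def packets_at_decode(levels: Sequence[float], total_n: int,
                      threshold: float) -> Optional[int]:
    """Return how many packets had arrived when decoding was triggered,
    or None if the frame is still waiting (e.g. for a timeout)."""
    for received, level in enumerate(levels, start=1):
        # Re-evaluate after every arrival: decode once the frame is complete
        # or once the observed level reaches the threshold.
        if received == total_n or level >= threshold:
            return received
    return None


# Frame 37: nine packets, low level after packets 6 and 7, high after packet 8.
frame_37 = [5.0] * 7 + [40.0]
print(packets_at_decode(frame_37, total_n=9, threshold=30.0))  # 8
```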

[0056] In some embodiments, the N data packets together constitute a complete representation of the particular video frame as encoded by the transmitting device; and the method includes: at the receiving device, if the communication network jitter level is greater than a predefined threshold, determining not to wait for the arrival of the N-M data packets of the particular video frame that have not yet arrived, and determining to immediately perform decoding and rendering on the particular video frame without waiting to receive all N data packets that form the complete representation of the particular video frame.

[0057] In some embodiments, the selective and dynamic determination of step (c) is performed on a frame-by-frame basis at the receiving device; wherein the value of N may differ across two consecutive video frames; and wherein the value of M may differ across two consecutive video frames.

[0058] In some embodiments, the method includes: (A) when receiving and decoding a first video frame F1, for which the transmitting device encodes a total of N1 video data packets, the receiving device receives only M1 of the N1 video data packets, and immediately decodes and renders the first video frame F1 after receiving the M1 video data packets, without waiting to receive one or more of the N1-M1 video data packets of the first frame F1 that have not yet arrived; (B) when receiving and decoding a second video frame F2, for which the transmitting device encodes a total of N2 video data packets, the receiving device receives only M2 of the N2 video data packets, and immediately decodes and renders the second video frame F2 after receiving the M2 video data packets, without waiting to receive one or more of the N2-M2 video data packets of the second frame F2 that have not yet arrived; wherein N1 is different from N2; and wherein M1 is different from M2.

[0059] In some embodiments, the N video data packets are a complete representation of the specific video frame; wherein the value of N is different across different video frames of the same video transmitted from the transmitting device to the receiving device.

[0060] In some embodiments, the value of N is explicitly communicated, relayed, or transmitted from the transmitting device to the receiving device for each video frame: the value of N may be encoded or stored within the first video data packet of the video frame, or within a header or footer region of one or more video data packets of the video frame; alternatively, the value of N may be communicated or transmitted from the transmitting device to the receiving device as part of an additional short data packet, or as part of control information accompanying the video data packets of the video frame. Optionally, in some embodiments, the receiving device may autonomously or independently estimate the value of N (i.e., the total number of video data packets representing a complete video frame) without N being explicitly provided; for example, the receiving device may observe an average and / or a median of 9 video data packets per frame over the most recent 90 frames, and may therefore estimate that N for the frame currently being received, frame number 91, is also 9.
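
A receiver-side estimate of N might combine both options: use N when the transmitter signals it, and otherwise fall back to the median packet count of recent frames. The sketch assumes the receiver can tell how many packets each past frame had (for example, from a sequence gap or a last-packet marker); the class name and 90-frame history are illustrative.

```python
from collections import deque
from statistics import median
from typing import Optional


class PacketCountEstimator:
    """Estimates N, the number of packets in a complete video frame."""

    def __init__(self, history: int = 90) -> None:
        self.counts = deque(maxlen=history)  # packet counts of recent frames

    def record_frame(self, n_packets: int) -> None:
        self.counts.append(n_packets)

    def expected_n(self, signaled_n: Optional[int] = None) -> Optional[int]:
        if signaled_n is not None:
            return signaled_n  # explicit N from a header or control packet
        if not self.counts:
            return None        # no history yet: N is unknown
        return round(median(self.counts))


est = PacketCountEstimator()
for _ in range(90):
    est.record_frame(9)
print(est.expected_n())  # 9, used as the estimate for frame 91
```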

[0061] In some embodiments, the determination in step (c) is performed dynamically on a per-video-data-packet basis and is performed multiple times while the receiving device continues to receive video data packets for the particular video frame; wherein for the particular video frame, the transmitting device encodes a total of N video data packets into a complete representation of the particular video frame; wherein after receiving only M1 video data packets out of the total N video data packets, the receiving device determines to continue waiting for the unarrived video data packets for the particular video frame; and wherein conversely, after receiving only M2 video data packets out of the total N video data packets, wherein M2 is greater than M1 but wherein M2 is less than N, the receiving device determines to stop waiting for the unarrived video data packets for the particular video frame and determines to immediately decode and render the particular video frame.

[0062] In some embodiments, the determination in step (c) is performed dynamically on a per-video-frame basis and is performed multiple times while the receiving device continues to receive video data packets for the particular video frame; wherein for the particular video frame, the transmitting device encodes a total of N video data packets into a complete representation of the particular video frame; wherein after receiving only the first portion P1 of the total N video data packets, the receiving device determines to continue waiting for the unarrived video data packets for the particular video frame; and wherein, conversely, after receiving only the second portion P2 of the total N video data packets, where P2 is greater than P1, the receiving device determines to stop waiting for the unarrived video data packets for the particular video frame and determines to immediately decode and render the particular video frame.

[0063] In some embodiments, the determination in step (c) includes: if the communication network jitter level is greater than a predefined network jitter value, then performing the step of immediate decoding and rendering (c2); otherwise, performing the step of waiting for the arrival of one or more additional data packets of the particular video frame (c1).

[0064] In some embodiments, after performing step (c1) of waiting for the arrival of one or more additional data packets for the particular video frame, the method further includes: if a predefined time period has elapsed since the arrival of the latest data packet for the particular video frame, stopping the waiting and switching to step (c2) of immediately decoding and rendering the particular video frame.
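
Paragraphs [0063] and [0064] together give a compact decision rule, sketched below under the assumption that times are in milliseconds from a monotonic clock; the function and parameter names and the example values are hypothetical.

```python
def should_decode_now(jitter_level: float, jitter_threshold: float,
                      now_ms: float, last_arrival_ms: float,
                      timeout_ms: float) -> bool:
    """True: perform step (c2) now. False: keep waiting per step (c1)."""
    if jitter_level > jitter_threshold:
        return True  # [0063]: high jitter, decode immediately
    # [0064]: while waiting, stop once no packet of this frame has arrived
    # for the predefined period.
    return now_ms - last_arrival_ms >= timeout_ms


print(should_decode_now(10.0, 30.0, now_ms=100.0, last_arrival_ms=95.0, timeout_ms=20.0))  # False
print(should_decode_now(10.0, 30.0, now_ms=120.0, last_arrival_ms=95.0, timeout_ms=20.0))  # True
```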

[0065] In some embodiments, the selective and dynamic determination of step (c) is performed on a per-frame basis and is based exclusively on: (i) the number M of data packets that have arrived for the particular video frame, and (ii) the value of the communication network jitter level as estimated by the receiving device.

[0066] In some embodiments, the selective and dynamic determination of step (c) is performed on a per-frame basis and does not depend on the arrival or absence of data packets of any video frame preceding the particular video frame.

[0067] In some embodiments, the selective and dynamic determination of step (c) is performed on a per-frame basis and does not depend on the arrival or absence of data packets of any video frame following the particular video frame.

[0068] In some embodiments, the selective and dynamic determination of step (c) is performed on a frame-by-frame basis and does not depend on the importance, for decoding purposes, of the payload carried by the N-M packets of the particular video frame that have not yet arrived.

[0069] In some embodiments, the method further comprises: (d) at the receiving device, on a per-frame basis and taking into account the communication network jitter level, selectively and dynamically determining whether to (d1) send, from the receiving device to the transmitting device, a retransmission request for missing data packets of the particular video frame, or conversely, (d2) skip sending a retransmission request for the missing data packets of the particular video frame.

[0070] In some embodiments, the method further comprises: (d) at the receiving device, selectively and dynamically determining, on a per-frame and also a per-data-packet basis, whether to (d1) send, from the receiving device to the transmitting device, a retransmission request for a missing data packet of the particular video frame, or conversely, (d2) skip sending a retransmission request for that missing data packet; wherein the per-frame and per-data-packet determination comprises: (I) for the particular video frame, selectively sending from the receiving device a first retransmission request for a first specific data packet of the particular video frame, requesting retransmission of that data packet; and (II) for the same particular video frame, selectively skipping sending from the receiving device a second retransmission request for a second, different specific data packet of the particular video frame.

[0071] In some embodiments, the method includes selectively and dynamically determining, on a per-frame basis and also on a per-data-packet basis, whether to send a retransmission request for a specific missing data packet of the video frame, based on an estimate, at the receiving device, of the importance of that specific missing data packet for successful decoding of the video frame.
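
Selective retransmission per frame and per packet can be sketched as a filter over the missing packets. The importance scores are a hypothetical receiver-side estimate in [0, 1] (for instance, higher for packets carrying base-layer or header data), and skipping all requests under high jitter is an assumed policy, on the reasoning that a resent packet would likely arrive too late; neither detail is specified by the source.

```python
from typing import Dict, List


def choose_retransmissions(missing: List[int], importance: Dict[int, float],
                           jitter_level: float, jitter_threshold: float,
                           min_importance: float) -> List[int]:
    """Return indices of missing packets to request again (step (d1));
    all other missing packets are skipped (step (d2))."""
    if jitter_level >= jitter_threshold:
        return []  # assumed policy: a resend would likely miss the render deadline
    return [p for p in missing if importance.get(p, 0.0) >= min_importance]


scores = {2: 0.9, 5: 0.1, 7: 0.6}
print(choose_retransmissions([2, 5, 7], scores, 10.0, 30.0, 0.5))  # [2, 7]
```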

[0072] In some embodiments, the method further includes: (d1) measuring the bit rate of video frames generated by the video encoder of the transmitting device using a constant bit rate (CBR) video coding scheme; (d2) determining the peak value of the bit rate of the video frames; (d3) determining the overall constant bit rate value of the video frames; (d4) determining the peak-to-average ratio (PAR) value of the bit rate by dividing the peak value by the overall constant bit rate value; and (e) dynamically modifying the operating settings of the video encoder at the transmitting device to reduce the bit rate PAR value. In some embodiments, the method includes iteratively performing step (e) by applying progressive modifications to the operating settings of the video encoder until they converge toward a target PAR value.

[0073] In some embodiments, the method includes iteratively performing step (e) until the PAR value is below 2 (or, below 1.75; or, below 1.5; or, below 1.33; or, below 1.25). In some embodiments, the method is not necessarily performed with respect to CBR-encoded video, but rather with respect to variable bitrate (VBR)-encoded video; wherein the method is configured to iteratively configure and reconfigure the operating settings of the video encoder to reduce the bitrate PAR value to below 2 (or, below 1.75; or, below 1.5; or, below 1.33; or, below 1.25).
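
Steps (d1) to (e), iterated until the PAR falls below a target such as 2, can be sketched with a toy encoder model. Everything about the model is an assumption: the single knob is a hypothetical cap on the first (intra) frame expressed as a multiple of the per-frame budget, the unconstrained intra frame is taken to need five times the budget, and the median frame size stands in for the overall constant bit rate.

```python
from statistics import median
from typing import List, Tuple


def measure_par(frame_bits: List[float]) -> float:
    """Steps (d2)-(d4): peak frame size divided by the overall constant level."""
    peak = max(frame_bits)        # (d2)
    overall = median(frame_bits)  # (d3), assumed: median as the steady level
    return peak / overall         # (d4)


def toy_encode(n_frames: int, budget_bits: float, intra_cap: float) -> List[float]:
    """Hypothetical encoder: the intra frame wants 5x the budget, capped at
    intra_cap x budget; every other frame hits the budget exactly."""
    first = min(5.0, intra_cap) * budget_bits
    return [first] + [budget_bits] * (n_frames - 1)


def reduce_par(target_par: float = 2.0, step: float = 0.9,
               max_iter: int = 50) -> Tuple[float, float]:
    """Step (e), iterated: tighten the cap progressively until PAR < target."""
    intra_cap = 5.0
    for _ in range(max_iter):
        par = measure_par(toy_encode(30, 80_000, intra_cap))  # (d1) measure
        if par < target_par:
            return intra_cap, par
        intra_cap *= step  # progressive modification of the encoder setting
    raise RuntimeError("PAR did not converge to the target")


cap, par = reduce_par()
print(round(par, 2))  # 1.94
```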

[0074] In some embodiments, the method optionally includes: at the receiving device, eliminating or reducing the size of a receive buffer or receive jitter buffer that stores or buffers incoming / arriving video data packets, based on whether a bit rate PAR value less than a predefined threshold (e.g., less than 1.5) has been achieved (or after such a PAR value has been achieved), and / or on whether the communication network jitter level is below a predefined threshold.
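
One possible reading of this buffer rule is sketched below: eliminate the receive jitter buffer when both conditions hold, shrink it when one holds, and keep the default size otherwise. The thresholds, packet counts, and function name are hypothetical.

```python
def receive_buffer_packets(par: float, jitter_level: float,
                           par_threshold: float = 1.5,
                           jitter_threshold: float = 20.0,
                           default_packets: int = 64,
                           reduced_packets: int = 8) -> int:
    """Size of the receive jitter buffer, in packets (0 means no buffer)."""
    low_par = par < par_threshold
    low_jitter = jitter_level < jitter_threshold
    if low_par and low_jitter:
        return 0                # eliminate the buffer entirely
    if low_par or low_jitter:
        return reduced_packets  # one condition met: shrink the buffer
    return default_packets      # neither met: keep the full buffer


print(receive_buffer_packets(1.1, 5.0), receive_buffer_packets(5.3, 5.0))  # 0 8
```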

[0075] In some embodiments, calculations, operations, and / or determinations can be performed locally within a single device, by multiple devices or across multiple devices, or partially locally and partially remotely (e.g., at a remote server), optionally utilizing communication channels to exchange raw data, processed data, and / or processing results.

[0076] While some of the content discussed herein relates to wired links and / or wired communications for illustrative purposes, some embodiments are not limited thereto and may utilize wired and / or wireless communications; may include one or more wired and / or wireless links; may utilize one or more components of wired and / or wireless communications; and / or may utilize one or more methods, protocols, or standards of wireless communications.

[0077] Some embodiments may be implemented using dedicated machines or purpose-built devices that are not general-purpose computers, or by using non-general-purpose computers or machines. Such systems or devices may utilize or include one or more components, units, or modules that are not part of a "general-purpose computer" and are not part of a "common-purpose computer," such as cellular transceivers, cellular transmitters, cellular receivers, GPS units, location determination units, accelerometers, gyroscopes, device orientation detectors or sensors, device positioning detectors or sensors, etc.

[0078] Some embodiments may be implemented as or by utilizing automated methods or processes, or methods or processes implemented by machines, or as semi-automatic or partially automated methods or processes, or as a series of steps or operations that may be performed or carried out by a computer or machine or system or other device.

[0079] Some embodiments may be implemented by using code or program code or machine-readable instructions or machine-readable code that may be stored on a non-transitory storage medium or non-transitory storage article (e.g., CD-ROM, DVD-ROM, physical storage unit, physical storage cell) such that when the program or code or instructions are executed by a processor or machine or computer, they cause such processor or machine or computer to perform the methods or processes as described herein. For example, such code or instructions can be or may contain one or more of the following: software, software module, application, program, subroutine, instruction, instruction set, computation code, word, value, symbol, string, variable, source code, compiled code, interpreted code, executable code, static code, dynamic code; including (but not limited to) high-level programming languages, low-level programming languages, object-oriented programming languages, visual programming languages, compiled programming languages, interpreted programming languages, C, C++, C#, Java, JavaScript, SQL, Ruby on Rails, Go, Cobol, Fortran, ActionScript, AJAX, XML, JSON, Lisp, Eiffel, Verilog, hardware description languages (HDL), BASIC, Visual BASIC, MATLAB, Dart, Pascal, HTML, HTML5, CSS, Perl, Python, PHP, machine language, machine code, assembly language, etc.

[0080] The discussion using terms such as “processing,” “computing,” “calculating,” “determining,” “establishing,” “analyzing,” “checking,” “detecting,” and “measuring” in this document may refer to the operation and / or process of a processor, computer, computing platform, computing system, or other electronic or computing device that can automatically and / or autonomously manipulate and / or convert data represented as physical (e.g., electronic) quantities within registers and / or accumulators and / or memory cells and / or storage cells into other data, or perform other suitable operations.

[0081] As used herein, the terms "plural" and "a plurality" include, for example, "multiple" or "two or more". For example, "a plurality of items" includes two or more items.

[0082] References to terms such as "one embodiment," "an embodiment," "illustrative embodiment," "various embodiments," "some embodiments," and / or similar terms may indicate that the embodiment(s) so described may optionally include a particular feature, structure, or characteristic, but not every embodiment necessarily includes that particular feature, structure, or characteristic. Furthermore, the repeated use of the phrase "in one embodiment" does not necessarily refer to the same embodiment, although it may. Similarly, the repeated use of the phrase "in some embodiments" does not necessarily refer to the same group or set of embodiments, although it may.

[0083] As used herein, and unless otherwise specified, the use of ordinal adjectives such as “first,” “second,” “third,” “fourth,” etc., to describe an item or object merely indicates different instances of such an item or object; and is not intended to imply that the items or objects described in this way must be in a particular given order, whether temporally, spatially, or in any other ordering manner.

[0084] Some embodiments can be used in or in combination with various devices and systems, such as personal computers (PCs), desktop computers, mobile computers, laptop computers, notebook computers, tablet computers, server computers, handheld computers, handheld devices, personal digital assistant (PDA) devices, handheld PDA devices, in-vehicle devices, non-in-vehicle devices, hybrid devices, vehicle devices, non-vehicle devices, mobile or portable devices, consumer devices, non-mobile or non-portable devices, electrical appliances, wireless communication stations, wireless communication devices, wireless access points (APs), wired or wireless routers or gateways or switches or hubs, wired or wireless modems, video devices, audio devices, audio-video (A / V) devices, wired or wireless networks, wireless local area networks, wireless video area networks (WVANs), local area networks (LANs), wireless LANs (WLANs), personal area networks (PANs), wireless PANs (WPANs), etc.

[0085] Some embodiments can be used in conjunction with the following: one-way and / or two-way radio communication systems, cellular wireless telephone communication systems, mobile phones, cellular phones, wireless phones, personal communication system (PCS) devices, PDAs or handheld devices with wireless communication capabilities, mobile or portable global positioning system (GPS) devices, devices with GPS receivers or transceivers or chips, devices with RFID elements or chips, multiple-input multiple-output (MIMO) transceivers or devices, single-input multiple-output (SIMO) transceivers or devices, multiple-input single-output (MISO) transceivers or devices, devices with one or more internal antennas and / or external antennas, digital video broadcasting (DVB) devices or systems, multi-standard radio devices or systems, wired or wireless handheld devices, such as smartphones, Wireless Application Protocol (WAP) devices, etc.

[0086] Some embodiments may include or be implemented using an “app” or application that may be downloaded or obtained for free or for a fee from an “app store” or “application store”, or may be pre-installed on a computing device or electronic device, or may be otherwise transported to and / or installed on such computing device or electronic device.

[0087] The functions, operations, components, and / or features described herein with reference to one or more embodiments of the invention may be combined with, or used in combination with, one or more other functions, operations, components, and / or features described herein with reference to one or more other embodiments of the invention. Therefore, the invention may encompass any possible or suitable combination, rearrangement, assembly, reassembly, or other utilization of some or all of the modules or functions or components described herein, even if they are discussed in different locations or sections above, or even if they are shown across different or multiple figures.

[0088] While certain features of some illustrative embodiments of the invention have been shown and described herein, various modifications, substitutions, alterations, and equivalents will be apparent to those skilled in the art. Therefore, the claims are intended to cover all such modifications, substitutions, alterations, and equivalents.

Claims

1. A method for transmitting video from a transmitting device to a receiving device via an Internet Protocol (IP) communication link, the method comprising: (a) at the receiving device, estimating a communication network jitter level that dynamically quantifies the degree of jitter in the communication network between the transmitting device and the receiving device; (b) at the receiving device, receiving M data packets out of N data packets of a specific video frame, where N is greater than M, and where the N data packets are packaged and transmitted from the transmitting device as a representation of the specific video frame; (c) at the receiving device, based on the communication network jitter level, selectively and dynamically determining whether to: (c1) wait for the arrival of one or more of the N-M data packets of the specific video frame that have not yet arrived before decoding and rendering the specific video frame, or conversely, (c2) not wait for the arrival of the other data packets of the specific video frame and immediately perform decoding and rendering on the specific video frame.

2. The method according to claim 1, wherein the N data packets together constitute a complete representation of the specific video frame as encoded by the transmitting device; the method comprising: at the receiving device, if the communication network jitter level is greater than a predefined threshold, determining not to wait for the arrival of the N-M not-yet-arrived packets of the specific video frame, and determining to immediately decode and render the specific video frame without waiting to receive all N packets that constitute its complete representation.

3. The method according to claim 1, wherein the selective and dynamic determination of step (c) is performed at the receiving device on a frame-by-frame basis; wherein the value of N may differ between two consecutive video frames; and wherein the value of M may differ between two consecutive video frames.

4. The method according to claim 1, comprising: when receiving and decoding a first video frame F1, which the transmitting device encoded as a total of N1 video data packets, receiving at the receiving device only M1 of the N1 video data packets, and immediately decoding and rendering the first video frame F1 after receiving the M1 video data packets, without waiting to receive one or more of the N1-M1 not-yet-arrived video data packets of the first video frame F1; and when receiving and decoding a second video frame F2, which the transmitting device encoded as a total of N2 video data packets, receiving at the receiving device only M2 of the N2 video data packets, and immediately decoding and rendering the second video frame F2 after receiving the M2 video data packets, without waiting to receive one or more of the N2-M2 not-yet-arrived video data packets of the second video frame F2; wherein N1 is different from N2, and M1 is different from M2.

5. The method according to claim 1, wherein the N video data packets constitute a complete representation of the specific video frame; and wherein the value of N varies across different video frames of the same video transmitted from the transmitting device to the receiving device.

6. The method according to claim 1, wherein the determination of step (c) is performed dynamically on a per-video-data-packet basis, and is performed multiple times while the receiving device continues to receive video data packets of the specific video frame; wherein the transmitting device encodes the specific video frame as a total of N video data packets that constitute a complete representation of the specific video frame; wherein, after receiving only M1 of the N video data packets, the receiving device determines to continue waiting for the not-yet-arrived video data packets of the specific video frame; and conversely, after receiving only M2 of the N video data packets, where M2 is greater than M1 but less than N, the receiving device determines to stop waiting for the not-yet-arrived video data packets of the specific video frame and to immediately decode and render the specific video frame.
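Claim 6 has the receiver re-evaluate step (c) each time another packet of the frame arrives, so the outcome can flip from waiting (after M1 packets) to decoding (after M2 packets) as the jitter estimate moves. The sketch below illustrates that under stated assumptions: the per-frame packet count N is assumed to be known to the receiver (for example, carried in each packet header), and `required_packets`, with its linear ramp between `low_ms` and `high_ms`, is a hypothetical rule invented for illustration, not the rule of the application.

```python
import math
from typing import List, Optional


def required_packets(n: int, jitter_ms: float,
                     low_ms: float = 5.0, high_ms: float = 30.0,
                     min_fraction: float = 0.5) -> int:
    """Hypothetical rule: how many of the N packets to insist on at this jitter level."""
    if jitter_ms <= low_ms:
        fraction = 1.0                  # calm network: wait for the whole frame
    elif jitter_ms >= high_ms:
        fraction = min_fraction         # very jittery: settle for a partial frame
    else:
        t = (jitter_ms - low_ms) / (high_ms - low_ms)
        fraction = 1.0 - t * (1.0 - min_fraction)
    return max(1, math.ceil(fraction * n))


def decode_point(n: int, jitter_at_arrival: List[float]) -> Optional[int]:
    """Re-run the decision on every packet arrival; return the M at which decoding starts.

    jitter_at_arrival[m - 1] is the jitter estimate when packet m of the frame arrives.
    """
    for m, jitter_ms in enumerate(jitter_at_arrival, start=1):
        if m >= required_packets(n, jitter_ms):
            return m                    # step (c2): stop waiting, decode now
    return None                         # step (c1): still waiting


print(decode_point(10, [2, 2, 2, 2, 40]))  # 5
```

Here the first four packets arrive while jitter is low, so the receiver keeps waiting (M1 = 4); when the estimate jumps to 40 ms on the fifth arrival, the required count drops to 5 and decoding starts with M2 = 5 of N = 10 packets.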

7. The method according to claim 1, wherein the determination of step (c) is performed dynamically on a per-video-frame basis, and is performed multiple times while the receiving device continues to receive video data packets of the specific video frame; wherein the transmitting device encodes the specific video frame as a total of N video data packets that constitute a complete representation of the specific video frame; wherein, after receiving only a first portion P1 of the N video data packets, the receiving device determines to continue waiting for the not-yet-arrived video data packets of the specific video frame; and conversely, after receiving only a second portion P2 of the N video data packets, where P2 is greater than P1, the receiving device determines to stop waiting for the not-yet-arrived video data packets of the specific video frame and to immediately decode and render the specific video frame.

8. The method according to claim 1, wherein the determination of step (c) comprises: if the communication network jitter level is greater than a predefined network jitter value, performing step (c2) of immediately decoding and rendering; otherwise, performing step (c1) of waiting for the arrival of one or more additional data packets of the specific video frame.

9. The method according to claim 8, further comprising, after entering step (c1) of waiting for the arrival of one or more additional data packets of the specific video frame: if a predefined time period has elapsed since the most recent data packet of the specific video frame arrived, stopping the waiting and switching to step (c2) of immediately decoding and rendering the specific video frame.
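Claims 8 and 9 together describe a threshold test on the jitter level plus an inactivity timeout while waiting. A minimal sketch, with the threshold and timeout passed in as parameters because the claims call them only "predefined"; the names `decide`, `Action`, `jitter_threshold_ms`, and `max_gap_ms` are hypothetical.

```python
from enum import Enum


class Action(Enum):
    WAIT = "wait"              # step (c1)
    DECODE_NOW = "decode_now"  # step (c2)


def decide(jitter_ms: float, jitter_threshold_ms: float,
           now_ms: float, last_arrival_ms: float, max_gap_ms: float) -> Action:
    # Claim 8: jitter above the predefined value -> decode immediately.
    if jitter_ms > jitter_threshold_ms:
        return Action.DECODE_NOW
    # Claim 9: while waiting, give up once the predefined period has elapsed
    # since the most recent packet of this frame arrived.
    if now_ms - last_arrival_ms >= max_gap_ms:
        return Action.DECODE_NOW
    return Action.WAIT


print(decide(5.0, 20.0, now_ms=100.0, last_arrival_ms=90.0, max_gap_ms=30.0))  # Action.WAIT
```

The timeout is measured from the latest packet rather than from the first one, so a frame whose packets keep trickling in is not cut off early, while a stalled frame is released after `max_gap_ms`.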

10. The method according to claim 9, wherein the selective and dynamic determination of step (c) is performed on a per-frame basis and is based exclusively on: (i) the number M of data packets of the specific video frame that have arrived, and (ii) the value of the communication network jitter level as estimated by the receiving device.

11. The method according to claim 10, wherein the selective and dynamic determination of step (c) is performed on a per-frame basis and is independent of the arrival or non-arrival of data packets of any video frame preceding the specific video frame.

12. The method according to claim 11, wherein the selective and dynamic determination of step (c) is performed on a per-frame basis and is independent of the arrival or non-arrival of data packets of any video frame following the specific video frame.

13. The method according to claim 12, wherein the selective and dynamic determination of step (c) is performed on a per-frame basis and is independent of the importance, for decoding purposes, of the payload carried by the N-M not-yet-arrived packets of the specific video frame.

14. The method according to claim 13, further comprising: (d) at the receiving device, on a per-frame basis and taking into account the communication network jitter level, selectively and dynamically determining: (d1) whether to send, from the receiving device to the transmitting device, a retransmission request for a missing data packet of the specific video frame, or conversely, (d2) whether to skip sending such a retransmission request.
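Claim 14 says only that the retransmission decision takes the jitter level into account. One plausible reading, sketched here as an assumption, is a deadline check: request a retransmission only if a round trip, padded by a multiple of the current jitter, would still land before the frame's render deadline. The function name, the round-trip-time input, and the `jitter_margin` factor are all hypothetical.

```python
def should_request_retransmission(now_ms: float, render_deadline_ms: float,
                                  rtt_ms: float, jitter_ms: float,
                                  jitter_margin: float = 2.0) -> bool:
    """Hypothetical per-frame rule for step (d): ask again only if it can arrive in time."""
    # A retransmitted packet needs roughly one round trip; pad the estimate by a
    # multiple of the current jitter so a jittery link is treated pessimistically.
    expected_arrival_ms = now_ms + rtt_ms + jitter_margin * jitter_ms
    return expected_arrival_ms <= render_deadline_ms  # True -> (d1), False -> (d2)


print(should_request_retransmission(0, 100, rtt_ms=40, jitter_ms=10))  # True
print(should_request_retransmission(0, 100, rtt_ms=40, jitter_ms=40))  # False
```

With the same round trip and deadline, raising the jitter from 10 ms to 40 ms pushes the padded arrival estimate from 60 ms to 120 ms, past the 100 ms deadline, so the request is skipped rather than spent on a packet that would arrive too late to help.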

15. The method according to claim 13, further comprising: (d) at the receiving device, on a per-frame basis and also on a per-data-packet basis, selectively and dynamically determining: (d1) whether to send, from the receiving device to the transmitting device, a retransmission request for a missing data packet of the specific video frame, or conversely, (d2) whether to skip sending such a retransmission request; wherein the step of selectively and dynamically determining on a per-frame and per-packet basis comprises: (I) for the specific video frame, selectively sending from the receiving device a first retransmission request that requests retransmission of a first particular packet of the specific video frame; and (II) for the same specific video frame, selectively skipping the sending from the receiving device of a second retransmission request for a second, different particular packet of the specific video frame.

16. The method according to claim 15, wherein the step of selectively and dynamically determining, on a per-frame and per-packet basis, whether to send a retransmission request for a particular missing packet of the specific video frame is based on an estimate, made at the receiving device, of the importance of that particular missing packet for successful decoding of the specific video frame.
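Claims 15 and 16 make the retransmission decision per missing packet within the same frame, driven by the receiver's estimate of how much each packet matters for decoding. The claims do not say how importance is estimated, so the sketch takes a precomputed score per sequence number as input; the scoring, the 0.5 cut-off, and the function name `plan_retransmissions` are hypothetical.

```python
from typing import Dict, List, Tuple


def plan_retransmissions(missing_importance: Dict[int, float],
                         importance_threshold: float = 0.5) -> Tuple[List[int], List[int]]:
    """Split the missing packets of one frame into (request, skip) lists."""
    request: List[int] = []
    skip: List[int] = []
    for seq in sorted(missing_importance):
        # Hypothetical importance score in [0, 1] estimated by the receiver,
        # e.g. high for a packet carrying a slice header or reference data.
        if missing_importance[seq] >= importance_threshold:
            request.append(seq)  # (I) send a retransmission request
        else:
            skip.append(seq)     # (II) skip the request for this packet
    return request, skip


print(plan_retransmissions({3: 0.9, 5: 0.1, 7: 0.6}))  # ([3, 7], [5])
```

Within one frame, packets 3 and 7 are requested again while packet 5 is left missing, which is the mixed send/skip outcome that items (I) and (II) of claim 15 describe.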

17. The method according to claim 13, further comprising: (d1) measuring the bit rate of the video frames generated by the video encoder of the transmitting device using a constant bit rate (CBR) video coding scheme; (d2) determining the highest peak bit rate of the video frames; (d3) determining the overall constant bit rate value of the video frames; (d4) determining the peak-to-average ratio (PAR) of the video frames by dividing the highest peak bit rate by the overall constant bit rate value; and (e) at the transmitting device, dynamically modifying the operating settings of the video encoder of the transmitting device to reduce the PAR value.
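Steps (d1) to (d4) reduce to a short calculation. In this sketch, which is an assumption rather than the application's own definition, each frame's bit rate is its size in bits multiplied by the frame rate (the rate needed to deliver that frame within one frame interval), and the "overall constant bit rate value" is taken as the mean of those rates over the measurement window, which under CBR should sit close to the encoder's configured target. The function name is hypothetical.

```python
from typing import Sequence


def peak_to_average_ratio(frame_sizes_bits: Sequence[int], fps: float) -> float:
    """PAR of the per-frame bit rate produced by the encoder."""
    if not frame_sizes_bits:
        raise ValueError("need at least one frame")
    # (d1) rate needed to send each frame within one frame interval
    rates_bps = [size * fps for size in frame_sizes_bits]
    peak_bps = max(rates_bps)                      # (d2)
    average_bps = sum(rates_bps) / len(rates_bps)  # (d3), mean over the window
    return peak_bps / average_bps                  # (d4)


print(peak_to_average_ratio([100_000, 50_000, 50_000, 200_000], fps=30))  # 2.0
```

A single 200 kbit frame in a window that averages 100 kbit per frame already doubles the instantaneous rate, which is the kind of burst that forces a larger receive buffer.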

18. The method according to claim 17, wherein step (e) is performed iteratively, by progressively modifying the operating settings of the video encoder until the PAR value converges toward a target PAR value.

19. The method according to claim 17, wherein step (e) is performed iteratively until the PAR value is below 2.
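Claims 18 and 19 iterate step (e) until the PAR falls below a target such as 2. The application does not name the encoder setting being adjusted; the sketch assumes a single knob, a rate-control buffer size in seconds loosely modeled on an H.264-style VBV buffer, and uses a toy linear model in place of a real encode-and-measure pass. Both the knob and `toy_par` are hypothetical.

```python
from typing import Callable, List, Tuple


def tune_encoder(measure_par: Callable[[float], float],
                 buffer_s: float,
                 target_par: float = 2.0,
                 shrink: float = 0.8,
                 min_buffer_s: float = 0.01,
                 max_iters: int = 20) -> Tuple[float, float, List[float]]:
    """Step (e), iterated: tighten a hypothetical rate-control buffer until PAR < target."""
    par = measure_par(buffer_s)
    history = [par]
    for _ in range(max_iters):
        if par < target_par or buffer_s <= min_buffer_s:
            break
        # A smaller buffer lets individual frames burst less above the average rate.
        buffer_s = max(min_buffer_s, buffer_s * shrink)
        par = measure_par(buffer_s)
        history.append(par)
    return buffer_s, par, history


# Toy stand-in for "encode a test window and measure PAR" (assumed linear model).
def toy_par(buffer_s: float) -> float:
    return 1.0 + 10.0 * buffer_s


print(tune_encoder(toy_par, buffer_s=0.5, shrink=0.5))
# (0.0625, 1.625, [6.0, 3.5, 2.25, 1.625])
```

In practice `measure_par` would encode a short window with the candidate settings and compute the PAR as in steps (d1) to (d4) of claim 17; the `max_iters` and `min_buffer_s` guards stop the loop if the target cannot be reached, since shrinking the buffer indefinitely would trade PAR for picture quality.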

20. A system comprising: one or more hardware processors configured to execute code, the one or more hardware processors being operatively associated with one or more memory units configured to store the code; wherein the one or more hardware processors are configured to perform a method for transmitting video from a transmitting device to a receiving device via an Internet Protocol (IP) communication link, the method comprising: (a) at the receiving device, estimating a communication network jitter level that dynamically quantifies the degree of jitter in the communication network between the transmitting device and the receiving device; (b) at the receiving device, receiving M data packets out of N data packets of a specific video frame, where N is greater than M, and where the N data packets are packaged and transmitted by the transmitting device as a representation of the specific video frame; (c) at the receiving device, selectively and dynamically determining, based on the communication network jitter level: (c1) whether to wait for the arrival of one or more of the N-M not-yet-arrived data packets of the specific video frame before decoding and rendering the specific video frame, or conversely, (c2) whether not to wait for the arrival of further data packets of the specific video frame and instead to immediately decode and render the specific video frame.