Video processing method, receiving end device and sending end device
When network conditions are poor, the receiving end discards some buffered video encoded frames and requests an instantaneous decoding refresh (IDR) frame, while the sending end encodes the target area and the background area with different quantization parameters. This addresses video lag and poor image quality and improves the stability and safety of remote industrial operations.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- XUZHOU XUGONG DAOJIN SPECIAL ROBOT TECH CO LTD
- Filing Date
- 2026-03-09
- Publication Date
- 2026-06-19
AI Technical Summary
In existing technologies, remote industrial operation based on video footage is unstable and inaccurate, especially when the network is unstable: the footage is delayed and of poor quality, which can lead to misoperation and safety hazards.
When network conditions are poor, some video encoded frames in the receiver buffer are discarded, and a request for an instantaneous decoding refresh (IDR) frame is sent to the sending end. The target area is encoded using a first quantization parameter that does not change with network health, and the background area is encoded using a second quantization parameter that is negatively correlated with network health. The background area's coding bitrate is thereby reduced to alleviate congestion while the target area stays clear.
The low-latency, clear video feed enhances the stability, accuracy, and safety of remote industrial operations, reducing misoperations and safety incidents caused by poor network conditions.
Figure CN121815004B_ABST
Description
Technical Field
[0001] This disclosure relates to the field of video transmission technology, and more specifically, to a video processing method, a receiving device, and a transmitting device.
Background Technology
[0002] With the rapid development of industrial intelligence and digitalization, higher requirements have been placed on visual guidance in industrial sites. Specifically, it is not only required to see the industrial site from a distance, but also to carry out relevant industrial operations accurately and stably through the video footage transmitted from the industrial site.
Summary of the Invention
[0003] In related technologies, remote industrial operations using video feeds transmitted from industrial sites suffer from instability, inaccuracy, and safety hazards.
[0004] Analysis revealed that related technologies typically employ increased buffer sizes to ensure smooth video playback, sacrificing latency for smoothness. However, this leads to cumulative latency (also known as "ghost latency"), where the video lags behind the actual physical conditions of the industrial environment. This makes it difficult to meet the high real-time requirements of industrial settings, resulting in unstable and inaccurate remote industrial operations based on video feeds. Especially when network jitter is present, the buffer accumulates a large number of outdated video frames, causing the operator to see a smooth but actually significantly outdated view. Furthermore, these technologies suffer from poor video quality, particularly under poor network conditions. To adapt to the deteriorating network environment, the video encoding side often compresses the image significantly, resulting in a blurrier transmitted video, further contributing to the instability and inaccuracy of remote industrial operations based on video feeds. Moreover, video lag and poor quality can potentially cause industrial safety accidents. For example, when remotely controlling robots via video feeds, lag and poor quality can easily lead to operator errors and collisions with the robotic arm.
[0005] To address the aforementioned problems, the present disclosure proposes the following solutions.
[0006] According to a first aspect of the present disclosure, a video processing method is provided, applied at a receiving end, comprising: in response to a first health score of a network at a current time being less than or equal to a first threshold, discarding at least a portion of video encoded frames in a buffer and sending a request to a sending end to obtain an instantaneous decoding refresh (IDR) frame; and in response to receiving the IDR frame sent by the sending end, decoding the IDR frame, wherein the IDR frame is obtained by encoding a target region in an original image frame based on a first quantization parameter and encoding a background region in the original image frame based on a second quantization parameter, wherein the first quantization parameter does not change with the first health score, and the second quantization parameter is negatively correlated with the first health score.
[0007] In some embodiments, discarding at least a portion of the video-coded frames in the buffer includes: discarding all forward predictive coded frames and all bidirectional predictive coded frames in the buffer; or discarding all video-coded frames in the buffer.
[0008] In some embodiments, the second quantization parameter increases exponentially as the network's first health score decreases.
[0009] In some embodiments, the second quantization parameter is determined based on the first quantization parameter and the first health score, wherein the difference between the second quantization parameter and the first quantization parameter is negatively correlated with the first health score.
[0010] In some embodiments, the second quantization parameter is determined based on an intermediate value obtained by summing the first quantization parameter and the increment, wherein the increment is obtained by multiplying the difference between 1 and the first health score by a preset maximum quantization difference. Specifically, if the intermediate value is less than the minimum quantization parameter value, the second quantization parameter is determined as the minimum quantization parameter value; if the intermediate value is greater than the maximum quantization parameter value, the second quantization parameter is determined as the maximum quantization parameter value; and if the intermediate value is greater than or equal to the minimum quantization parameter value and less than or equal to the maximum quantization parameter value, the second quantization parameter is determined as the intermediate value.
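The clamped computation of the second quantization parameter described above can be sketched as follows. This is only an illustrative sketch: the function name, the default maximum quantization difference of 20, the rounding to an integer, and the 0 to 51 limits (the QP range used by H.264) are assumptions and are not values stated in this disclosure.

```python
def background_qp(qp_target: float, health: float, max_delta: float = 20.0,
                  qp_min: int = 0, qp_max: int = 51) -> int:
    """Second (background) QP from the first (target) QP and the first health score.

    max_delta, qp_min and qp_max are hypothetical defaults chosen for illustration.
    """
    # Increment = (1 - health score) * preset maximum quantization difference.
    increment = (1.0 - health) * max_delta
    # Intermediate value = first quantization parameter + increment.
    intermediate = qp_target + increment
    # Clamp the intermediate value to [qp_min, qp_max].
    clamped = min(max(intermediate, qp_min), qp_max)
    # Rounding to an integer QP is an assumption of this sketch.
    return int(round(clamped))


# With a healthy network the background uses the same QP as the target area;
# as the health score falls, the background QP rises until it hits qp_max.
for score in (1.0, 0.5, 0.0):
    print(score, background_qp(24, score))
```

Because only the increment depends on the health score, the first quantization parameter passed in as `qp_target` is left untouched regardless of network conditions.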
[0011] In some embodiments, the first health score is determined based on a first score, wherein the first score is obtained based on the network's round-trip latency, jitter value, and packet loss rate at the current moment.
[0012] In some embodiments, the first score is obtained by summing the difference between 1 and a first ratio multiplied by a first weighting coefficient, the difference between 1 and a second ratio multiplied by a second weighting coefficient, and the difference between 1 and a third ratio multiplied by a third weighting coefficient, wherein the first ratio is the ratio of the round-trip time of the network at the current time to a first normalized baseline threshold, the second ratio is the ratio of the jitter value of the network at the current time to a second normalized baseline threshold, and the third ratio is the ratio of the packet loss rate of the network at the current time to a third normalized baseline threshold.
[0013] In some embodiments, the first weighting coefficient is greater than the second weighting coefficient and the third weighting coefficient.
[0014] In some embodiments, the first health score is determined based on the first score and the second health score of the network at the previous time step, wherein the second health score is determined based on the round-trip time, jitter value and packet loss rate of the network at the previous time step.
[0015] In some embodiments, the first health score is determined by summing the difference between 1 and a preset smoothing factor multiplied by the second health score, and the preset smoothing factor multiplied by the first score.
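The smoothing described above is a form of exponential moving average. A minimal sketch, assuming a hypothetical smoothing factor of 0.3 and hypothetical function and variable names, might look like this:

```python
def smoothed_health(prev_health: float, first_score: float, alpha: float = 0.3) -> float:
    """First health score = (1 - alpha) * second health score + alpha * first score.

    prev_health is the second health score (previous time step); alpha is the
    preset smoothing factor. The default of 0.3 is an assumption of this sketch.
    """
    return (1.0 - alpha) * prev_health + alpha * first_score


# A sudden drop in the raw first score pulls the health score down gradually,
# so a single bad measurement does not immediately trigger a buffer flush.
health = 1.0
for raw in (0.2, 0.2, 0.2):
    health = smoothed_health(health, raw)
    print(round(health, 4))
```

A larger smoothing factor makes the health score react faster to the current measurement; a smaller one gives more weight to history.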
[0016] In some embodiments, the video processing method further includes: displaying the decoded instantaneous decoding refresh frame on a display interface; and generating a distorted mesh and displaying the distorted mesh on the display interface.
[0017] In some embodiments, generating a distorted mesh includes: obtaining first coordinates of each vertex of a standard mesh in a world coordinate system; mapping the first coordinates of each vertex to obtain second coordinates of each vertex in the pixel coordinate system based on the mapping relationship between the world coordinate system and the pixel coordinate system of the display interface; mapping the second coordinates of each vertex in the pixel coordinate system based on radial distortion coefficients and tangential distortion coefficients to obtain third coordinates of each vertex in the pixel coordinate system; and generating the distorted mesh based on the third coordinates of each vertex in the pixel coordinate system.
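The three mapping steps above can be sketched as follows. Several points are assumptions of this sketch and are not taken from this disclosure: the world-to-pixel mapping is modeled as a pinhole projection with the world frame coinciding with the camera frame; the distortion uses the commonly used Brown-Conrady form with two radial coefficients (`k1`, `k2`) and two tangential coefficients (`p1`, `p2`); and pixel coordinates are normalized by the intrinsics before the coefficients are applied. All function and parameter names are hypothetical.

```python
from typing import List, Tuple

Point2 = Tuple[float, float]
Point3 = Tuple[float, float, float]


def world_to_pixel(p: Point3, fx: float, fy: float, cx: float, cy: float) -> Point2:
    """Pinhole projection of a world point (assumed to be in the camera frame)."""
    x, y, z = p
    return (fx * x / z + cx, fy * y / z + cy)


def distort_pixel(uv: Point2, fx: float, fy: float, cx: float, cy: float,
                  k1: float, k2: float, p1: float, p2: float) -> Point2:
    """Apply radial (k1, k2) and tangential (p1, p2) distortion to one pixel."""
    # Normalize so the coefficients act on image-plane coordinates.
    x = (uv[0] - cx) / fx
    y = (uv[1] - cy) / fy
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 * r2
    xd = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    yd = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    # Map back to pixel coordinates.
    return (fx * xd + cx, fy * yd + cy)


def distorted_mesh(rows: int, cols: int, spacing: float, depth: float,
                   intrinsics: Tuple[float, float, float, float],
                   coeffs: Tuple[float, float, float, float]) -> List[List[Point2]]:
    """Build a flat standard mesh at the given depth and return its distorted pixels."""
    fx, fy, cx, cy = intrinsics
    mesh: List[List[Point2]] = []
    for i in range(rows):
        row: List[Point2] = []
        for j in range(cols):
            # First coordinates: mesh vertex in the world coordinate system.
            world = ((j - (cols - 1) / 2) * spacing,
                     (i - (rows - 1) / 2) * spacing,
                     depth)
            # Second coordinates: ideal pixel position.
            second = world_to_pixel(world, fx, fy, cx, cy)
            # Third coordinates: pixel position after lens distortion.
            row.append(distort_pixel(second, fx, fy, cx, cy, *coeffs))
        mesh.append(row)
    return mesh


mesh = distorted_mesh(3, 3, 1.0, 1.0, (100.0, 100.0, 320.0, 240.0), (0.1, 0.0, 0.0, 0.0))
print(mesh[1][1], mesh[1][2])
```

Drawing line segments between neighboring vertices of the returned grid would give an overlay that bends the same way the camera image does, which is presumably why the mesh is distorted rather than the video being undistorted.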
[0018] In some embodiments, the target area includes the area containing at least one of the weld, bevel, and welding torch tip.
[0019] According to a second aspect of the present disclosure, a video processing method is provided, applied at a transmitting end, comprising: in response to receiving a request from a receiving end to obtain an instantaneous decoding refresh (IDR) frame, encoding a target region in an original image frame based on a first quantization parameter and encoding a background region in the original image frame based on a second quantization parameter to obtain the IDR frame, wherein the request is sent by the receiving end when a first health score of the network at the current time is less than or equal to a first threshold, the first quantization parameter does not change with the first health score, and the second quantization parameter is negatively correlated with the first health score; and sending the IDR frame to the receiving end, so that the receiving end, having discarded at least a portion of the video encoded frames in its buffer in response to the first health score being less than or equal to the first threshold, decodes the IDR frame.
[0020] In some embodiments, the second quantization parameter increases exponentially as the network's first health score decreases.
[0021] In some embodiments, the second quantization parameter is determined based on the first quantization parameter and the first health score, wherein the difference between the second quantization parameter and the first quantization parameter is negatively correlated with the first health score.
[0022] In some embodiments, the second quantization parameter is determined based on an intermediate value obtained by summing the first quantization parameter and the increment, wherein the increment is obtained by multiplying the difference between 1 and the first health score by a preset maximum quantization difference. Specifically, if the intermediate value is less than the minimum quantization parameter value, the second quantization parameter is determined as the minimum quantization parameter value; if the intermediate value is greater than the maximum quantization parameter value, the second quantization parameter is determined as the maximum quantization parameter value; and if the intermediate value is greater than or equal to the minimum quantization parameter value and less than or equal to the maximum quantization parameter value, the second quantization parameter is determined as the intermediate value.
[0023] In some embodiments, the first health score is determined based on a first score, wherein the first score is obtained based on the network's round-trip latency, jitter value, and packet loss rate at the current moment.
[0024] In some embodiments, the first score is obtained by summing the difference between 1 and a first ratio multiplied by a first weighting coefficient, the difference between 1 and a second ratio multiplied by a second weighting coefficient, and the difference between 1 and a third ratio multiplied by a third weighting coefficient, wherein the first ratio is the ratio of the round-trip time of the network at the current time to a first normalized baseline threshold, the second ratio is the ratio of the jitter value of the network at the current time to a second normalized baseline threshold, and the third ratio is the ratio of the packet loss rate of the network at the current time to a third normalized baseline threshold.
[0025] In some embodiments, the first health score is determined based on the first score and a second health score of the network at the previous time step, wherein the second health score is determined based on the round-trip time, jitter value and packet loss rate of the network at the previous time step.
[0026] According to a third aspect of the present disclosure, a receiving device is provided, comprising: a module configured to perform the video processing method described in any one embodiment of the first aspect.
[0027] According to a fourth aspect of the present disclosure, a transmitting device is provided, comprising: a module configured to perform the video processing method described in any of the embodiments of the second aspect above.
[0028] According to a fifth aspect of the present disclosure, a video processing apparatus is provided, comprising: a memory; and a processor coupled to the memory, configured to execute the video processing method described in any of the above embodiments based on instructions stored in the memory.
[0029] According to a sixth aspect of the present disclosure, a computer-readable storage medium is provided, including a computer program, wherein when the computer program is executed by a processor, it implements the steps of the video processing method described in any of the above embodiments.
[0030] According to a seventh aspect of the present disclosure, a computer program product is provided, comprising a computer program, wherein when the computer program is executed by a processor, it implements the steps of the video processing method described in any of the above embodiments.
[0031] In the embodiments of the present disclosure, when the network's first health score is less than or equal to a first threshold (i.e., the network condition is poor), at least a portion of the accumulated video encoded frames in the receiver's buffer are discarded so that the instantaneous decoding refresh frame from the transmitter, which corresponds to the latest scene, can be decoded promptly, reducing latency. Furthermore, since the first quantization parameter does not change with the first health score while the second quantization parameter is negatively correlated with the first health score, the video encoding bitrate of the background area can be reduced to alleviate network congestion even as the network condition deteriorates, while the video encoding bitrate of the target area remains unaffected, so that the information in the target area is preserved more clearly. Therefore, subsequent remote industrial operations that rely on low-latency, clear video footage of the target area can be performed with improved stability, accuracy, and safety.
[0032] Other features and advantages of this disclosure will become clearer from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief Description of the Drawings
[0033] The accompanying drawings, which form part of this specification, illustrate embodiments of this disclosure and, together with the specification, serve to explain the principles of this disclosure.
[0034] This disclosure can be more clearly understood with reference to the accompanying drawings and the following detailed description.
[0035] Figure 1 shows a schematic flowchart of a video processing method according to some embodiments of the present disclosure.
[0036] Figure 2 shows a flowchart of executing a corresponding policy based on network conditions according to some embodiments of the present disclosure.
[0037] Figure 3 shows a schematic diagram of the effect of differential coding according to some embodiments of the present disclosure.
[0038] Figure 4 shows a schematic flowchart of generating a distorted mesh according to some embodiments of the present disclosure.
[0039] Figure 5 shows a schematic diagram of mapping points in a world coordinate system to a pixel coordinate system according to some embodiments of the present disclosure.
[0040] Figure 6 shows a schematic flowchart of a video processing method according to other embodiments of the present disclosure.
[0041] Figure 7 shows a schematic diagram of a receiving device according to some embodiments of the present disclosure.
[0042] Figure 8 shows a schematic diagram of a transmitting device according to some embodiments of the present disclosure.
[0043] Figure 9 shows a schematic diagram of a video processing apparatus according to some embodiments of the present disclosure.
[0044] Figure 10 shows a schematic diagram of a video processing apparatus according to other embodiments of the present disclosure.
[0045] Figure 11 shows a schematic block diagram of a computer system on which embodiments of the present disclosure may be implemented.
[0046] For ease of understanding, the positions, dimensions, and extents of the structures shown in the accompanying drawings and other materials may not represent actual positions, dimensions, and extents. Therefore, the disclosed invention is not limited to the positions, dimensions, and extents disclosed in the accompanying drawings and other materials. Furthermore, the drawings are not necessarily drawn to scale, and some features may be enlarged to show details of specific components.
Detailed Description
[0047] Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that, unless otherwise specifically stated, the relative arrangement, numerical expressions, and values of the components and steps set forth in these embodiments do not limit the scope of the present disclosure.
[0048] The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the scope of this disclosure or its application or use. That is, the structures and methods herein are shown in an exemplary manner to illustrate different embodiments of the structures and methods in this disclosure. Those skilled in the art will understand that they illustrate only exemplary, not exhaustive, ways in which this disclosure can be implemented.
[0049] Techniques, methods, and equipment known to those skilled in the art may not be discussed in detail, but where appropriate, such techniques, methods, and equipment should be considered part of the specification.
[0050] In all examples shown and discussed herein, any specific values should be interpreted as merely exemplary and not as limitations. Therefore, other examples of exemplary embodiments may have different values.
[0051] Figure 1 shows a schematic flowchart of a video processing method according to some embodiments of the present disclosure. As shown in Figure 1, the video processing method may include steps S110 to S120. The method can be applied to a receiving end, that is, executed by the receiving end.
[0052] In step S110, in response to the network's first health score at the current moment being less than or equal to a first threshold, at least a portion of the video encoded frames in the buffer are discarded, and a request to obtain an instant decoding refresh frame is sent to the sender. In step S120, in response to receiving the instant decoding refresh frame sent by the sender, the instant decoding refresh frame is decoded.
[0053] Here, the instantaneous decoding refresh frame can be obtained by encoding the target region in the original image frame based on the first quantization parameter and encoding the background region in the original image frame based on the second quantization parameter. The first quantization parameter does not change with the first health score, and the second quantization parameter is negatively correlated with the first health score.
[0054] In this disclosure, the image frame can be an image captured at an industrial site for a corresponding industrial scenario. That is, the video processing method of this disclosure can be used in industrial scenarios to process video captured from a corresponding industrial site. In some embodiments, the image frame may contain a target object from the industrial site. In some embodiments, the original image frame can be an image captured from the industrial site at the latest or current moment. In this disclosure, the target region can be the area where the target object is located, and the background region can be the remaining area of the image frame excluding the target region.
[0055] In fields such as heavy machinery, shipbuilding, and large steel structure welding, the teaching programming and remote operation of high-level welds are crucial for ensuring production safety and quality. Some embodiments of this disclosure can improve the accuracy of the teaching programming trajectory for high-level welds on large workpieces and solve the problem of teaching personnel needing to frequently climb to observe the teaching trajectory. In this disclosure, a video processing method according to some embodiments of this disclosure is described using an industrial welding scenario as an example. It should be understood that the video processing method according to some embodiments of this disclosure is also applicable to other industrial operation scenarios where there is a need to transmit video or images over a network.
[0056] In this disclosure, the target area is also referred to as the region of interest (ROI).
[0057] In some embodiments, the target area may include the area where the target object is located in an industrial welding scenario. As some implementations, the target area may include the area containing at least one of a weld, a bevel, and the tip of a welding torch.
[0058] In this disclosure, the receiving end can connect to the sending end via a network to receive video encoded frames formed by encoding image frames transmitted by the sending end via the network, and can decode the corresponding video encoded frames.
[0059] In this disclosure, video coded frames may include intra-frame coded frames (I-frames), forward predictive coded frames (P-frames), and bidirectional predictive coded frames (B-frames). That is, encoding image frames according to the corresponding coding algorithm can form corresponding I-frames, P-frames, or B-frames. An I-frame corresponds to a complete image, containing all the information (such as color, texture, objects, etc.) of the corresponding image frame, and does not depend on any other frames; it can be decoded or displayed independently. A P-frame is used to record motion changes and depends on the preceding I-frame or P-frame for decoding. If the reference frame on which the P-frame depends is lost, the P-frame cannot be decoded or displayed correctly. A B-frame is used to record the difference between the preceding and following frames. Since a B-frame needs to reference the frames following it, it must wait for the frames following it to be transmitted before it can be decoded or displayed.
[0060] In video coding, image frames are organized into groups of pictures (GOPs) as the basic unit. Encoding image frames according to a predetermined coding algorithm yields corresponding video encoded frames. The starting frame of a GOP is an instantaneous decoding refresh (IDR) frame. Besides the starting frame, a GOP can include at least one of one or more I-frames, one or more P-frames, and one or more B-frames. An IDR frame is a type of I-frame; it does not depend on or reference other frames and can be decoded or displayed independently. An IDR frame ensures that subsequent frames within the current GOP do not reference the content of frames earlier than the IDR frame.
[0061] The quantization parameter (QP) controls the degree of image compression. The value of the quantization parameter is negatively correlated with image quality and positively correlated with the image compression rate. A smaller quantization parameter results in better image quality and a higher bitrate; a larger quantization parameter leads to more severe loss of image detail and increased blurriness, but also to a lower bitrate.
[0062] As one implementation method, the network's first health score at the current moment can be determined based on the network's round-trip latency, jitter, and packet loss rate at the current moment, so that the three metrics together give a comprehensive measure of the current network condition.
[0063] In the above embodiments, when the network's first health score at the current moment is less than or equal to a first threshold, indicating poor network conditions and cumulative network latency, at least a portion of the accumulated video encoded frames in the receiver's buffer are discarded so that the instantaneous decoding refresh frame from the transmitter, which corresponds to the latest scene, can be decoded promptly, reducing latency. Furthermore, since the first quantization parameter does not change with the network's first health score at the current moment while the second quantization parameter is negatively correlated with it, the video encoding bitrate of the background area can be reduced to alleviate network congestion even as network conditions deteriorate, while the video encoding bitrate of the target area remains unaffected, so that the information in the target area is preserved more clearly. Therefore, subsequent remote industrial operations that rely on low-latency, clear video footage of the target area can be performed with improved stability, accuracy, and safety.
[0064] In some embodiments, discarding at least a portion of the video-coded frames in the buffer may include discarding all forward predictive coded frames and all bidirectional predictive coded frames in the buffer.
[0065] Here, after discarding at least a portion of the video encoded frames in the buffer and before receiving the instant decoding refresh frame sent by the sender, the remaining video encoded frames in the buffer can be decoded. Then, as soon as the instant decoding refresh frame sent by the sender is received, the remaining video encoded frames in the buffer can be discarded and the instant decoding refresh frame can be decoded.
[0066] In the above embodiments, considering that a forward predictive coding frame needs to reference a frame preceding it, if there are forward predictive coding frames that have not been discarded, the loss of the frame referenced by that forward predictive coding frame could lead to decoding or display errors. Therefore, all forward predictive coding frames in the buffer can be cleared to reduce the possibility of decoding or display errors and improve the security of subsequent operations. Furthermore, by clearing all bidirectional predictive coding frames in the buffer, the latency introduced by the requirement for bidirectional predictive coding frames to reference other frames following them for decoding or display is avoided.
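Discarding only the forward and bidirectional predictive coded frames amounts to filtering the buffer by frame type. The following is a minimal sketch; the `EncodedFrame` type, its fields, and the function name are hypothetical and are introduced only for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque


@dataclass
class EncodedFrame:
    kind: str  # hypothetical label: "IDR", "I", "P" or "B"
    seq: int   # hypothetical sequence number


def drop_predicted_frames(buffer: Deque[EncodedFrame]) -> Deque[EncodedFrame]:
    """Return a new buffer that keeps only independently decodable frames."""
    # P-frames and B-frames depend on other frames, so they are discarded;
    # I-frames (including IDR frames) can be decoded on their own and are kept.
    return deque(f for f in buffer if f.kind in ("I", "IDR"))


buffer = deque(EncodedFrame(k, i) for i, k in enumerate(["I", "P", "B", "P", "I"]))
remaining = drop_predicted_frames(buffer)
print([f.seq for f in remaining])
```

The frames that remain can still be decoded while the receiver waits for the requested instantaneous decoding refresh frame, and are themselves discarded once that frame arrives.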
[0067] In some embodiments, discarding at least a portion of the video-coded frames in the buffer may include discarding all video-coded frames in the buffer. Thus, if the network's first health score at the current moment is less than or equal to a first threshold, regardless of whether the buffer contains backlogged forward predictive coded frames, backlogged bidirectional predictive coded frames, or backlogged intra-coded frames, a clearing operation can be performed on the buffer to discard all video-coded frames in the buffer.
[0068] As one implementation method, after all video encoded frames in the buffer have been discarded, the receiving end remains in a waiting state, i.e., it pauses decoding until it receives from the sender an IDR frame representing the latest moment, and then resumes decoding. This ensures that the first frame after truncation (i.e., after discarding all video encoded frames) is complete and up-to-date, effectively avoiding the risk of screen tearing and achieving basic synchronization between the displayed image and the industrial site.
[0069] In the above embodiments, considering that in the case of poor network conditions, as long as there are still backlogged video encoded frames in the buffer, the decoded and displayed images will still lag behind the actual situation in the industrial field to a certain extent, by discarding all video encoded frames in the buffer, it is possible to effectively prevent "ghost latency" caused by poor network conditions, such as network jitter, thereby meeting the needs of precision industrial operation scenarios and further improving the stability, accuracy and security of remote operation via video.
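The flush-and-wait behavior described above can be sketched as a small state machine on the receiving end. The class, its callbacks, and the choice not to resend the request while already waiting are assumptions of this sketch and are not specified in this disclosure.

```python
from collections import deque
from typing import Callable, Deque, Tuple

Frame = Tuple[str, int]  # hypothetical (kind, sequence number); kind is "IDR", "I", "P" or "B"


class Receiver:
    def __init__(self, first_threshold: float,
                 request_idr: Callable[[], None],
                 decode: Callable[[Frame], None]) -> None:
        self.first_threshold = first_threshold
        self.request_idr = request_idr  # sends the IDR request to the sending end
        self.decode = decode            # hands a frame to the decoder
        self.buffer: Deque[Frame] = deque()
        self.waiting_for_idr = False

    def on_health_score(self, score: float) -> None:
        # Poor network: discard every buffered frame and ask for a fresh IDR frame.
        # Skipping the request while already waiting is an assumption of this sketch.
        if score <= self.first_threshold and not self.waiting_for_idr:
            self.buffer.clear()
            self.waiting_for_idr = True
            self.request_idr()

    def on_frame(self, frame: Frame) -> None:
        if self.waiting_for_idr:
            if frame[0] != "IDR":
                return  # decoding is paused; frames other than IDR are ignored
            self.waiting_for_idr = False
            self.decode(frame)  # first frame after truncation is complete and current
            return
        self.buffer.append(frame)


requests, decoded = [], []
rx = Receiver(0.4, lambda: requests.append("idr"), decoded.append)
rx.on_frame(("I", 0))
rx.on_frame(("P", 1))
rx.on_health_score(0.3)   # triggers the flush and the IDR request
rx.on_frame(("P", 2))     # ignored while waiting
rx.on_frame(("IDR", 3))   # decoding resumes here
print(requests, decoded)
```

In normal operation a separate playout loop would drain `buffer` into the decoder; that loop is omitted here to keep the sketch focused on the truncation step.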
[0070] In some embodiments, in response to the network's first health score at the current moment being greater than a first threshold and less than or equal to a second threshold, a buffer of first capacity is used to store video encoded frames from the transmitter; in response to the network's first health score at the current moment being greater than the second threshold, a buffer of second capacity is used to store video encoded frames from the transmitter, wherein the second capacity is less than or equal to the first capacity.
[0071] In some implementations, the second capacity can be 0~50 ms, and the first capacity can be 50~200 ms. It should be understood that, in this document, a numerical range indicated by the symbol "~" includes the values at both ends; for example, the first capacity can be 50 ms, or it can be 200 ms.
[0072] In the above embodiments, when the network's first health score at the current moment is greater than the second threshold, i.e., the network condition is good, a relatively small buffer or a smaller buffer size can be used to minimize end-to-end latency, thereby enabling millisecond-level real-time industrial operation. When the network's first health score at the current moment is greater than the first threshold and less than or equal to the second threshold, i.e., the network condition is average, a relatively large buffer or a larger buffer size can be used to resist slight network jitter and improve video smoothness.
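The three-tier policy described above can be sketched as a simple selection function. The threshold values 0.4 and 0.7 and the function name are hypothetical; the buffer capacities of 200 ms and 50 ms are picked from the example ranges given above.

```python
from typing import Tuple


def select_policy(health: float,
                  first_threshold: float = 0.4,
                  second_threshold: float = 0.7) -> Tuple[str, int]:
    """Return (action, buffer capacity in ms) for a given first health score."""
    if health <= first_threshold:
        # Poor network: discard buffered frames and request an IDR frame.
        return ("flush_and_request_idr", 0)
    if health <= second_threshold:
        # Average network: larger buffer (first capacity) to absorb slight jitter.
        return ("buffer", 200)
    # Good network: small buffer (second capacity) to minimize end-to-end latency.
    return ("buffer", 50)


for score in (0.9, 0.6, 0.3):
    print(score, select_policy(score))
```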
[0073] The following describes some embodiments of how a first health score is determined according to this disclosure.
[0074] In some embodiments, the network's first health score Q1 at the current moment can be determined based on a first score Q1', where the first score Q1' can be obtained based on the network's round-trip latency, jitter value, and packet loss rate at the current moment. Thus, by comprehensively considering the three factors of round-trip latency, jitter value, and packet loss rate at the current moment, the obtained first health score can more accurately reflect the current network condition, thereby accurately triggering corresponding strategies. For example, by obtaining a more accurate first health score, it is possible to accurately determine whether the first health score is less than or equal to a first threshold, thereby accurately determining whether to trigger the operation of discarding at least a portion of the video encoded frames in the buffer and sending a request to the sender to obtain the instant decoding refresh frame.
[0075] In some embodiments, the first score Q1' is obtained by summing the difference between 1 and the first ratio multiplied by a first weighting coefficient, the difference between 1 and the second ratio multiplied by a second weighting coefficient, and the difference between 1 and the third ratio multiplied by a third weighting coefficient, wherein the first ratio is the ratio of the network's round-trip delay at the current time to a first normalized baseline threshold, the second ratio is the ratio of the network's jitter value at the current time to a second normalized baseline threshold, and the third ratio is the ratio of the network's packet loss rate at the current time to a third normalized baseline threshold.
[0076] As one implementation method, the first score Q1' can be obtained based on the following formula:
[0077] $$Q1' = w_1\left(1 - \frac{RTT}{T_{RTT}}\right) + w_2\left(1 - \frac{J}{T_J}\right) + w_3\left(1 - \frac{L}{T_L}\right)$$
[0078] Here, Q1' represents the first score; $w_1$, $w_2$, and $w_3$ represent the first, second, and third weighting coefficients, respectively; $RTT$ represents the round-trip time of the network at the current moment; $J$ represents the network jitter value at the current moment; $L$ represents the packet loss rate of the network at the current moment; and $T_{RTT}$, $T_J$, and $T_L$ represent the first, second, and third normalized baseline thresholds, respectively.
[0079] Here, the first normalized baseline threshold, the second normalized baseline threshold, and the third normalized baseline threshold can be used to indicate the tolerance of the corresponding industrial scenario, and can be preset based on the corresponding industrial scenario. For example, if the round-trip delay is greater than the first normalized baseline threshold, it can indicate that the current round-trip delay has exceeded the latency tolerance under the corresponding industrial scenario.
[0080] As one implementation method, the first normalized baseline threshold $T_{RTT}$ = 500 ms, the second normalized baseline threshold $T_J$ = 100 ms, and the third normalized baseline threshold $T_L$ = 10%.
[0081] As one implementation method, when any single indicator exceeds its corresponding normalized baseline threshold, the score for that indicator can be set to 0. For example, when the round-trip time $RTT$ exceeds the first normalized baseline threshold $T_{RTT}$, i.e., $RTT > T_{RTT}$, the corresponding term $1 - RTT/T_{RTT}$ can be set directly to 0. When the calculation result Q1' < 0, Q1' can be set to 0. In this way, the calculation boundary is constrained to obtain a more intuitive calculation result.
[0082] In the above embodiments, the round-trip latency, jitter value, and packet loss rate at the current moment are normalized by the corresponding normalized benchmark threshold. The resulting first score or first health score can not only intuitively reflect the network status, but also intuitively reflect the stability margin of remote industrial operations through the network in the corresponding industrial scenario. The higher the first score or first health score, the more conducive the current network status is to maintaining the stability, security, and controllability of remote industrial operations.
[0083] Considering that real-time requirements are high in some industrial scenarios, in some embodiments, the first weighting coefficient is greater than the second and third weighting coefficients. In this way, a score that better meets the needs of high real-time industrial scenarios can be obtained, thereby improving the accuracy of the score.
[0084] In some implementation methods, the sum of the first weighting coefficient $w_1$, the second weighting coefficient $w_2$, and the third weighting coefficient $w_3$ is 1, that is, $w_1 + w_2 + w_3 = 1$. In this way, the first score Q1' can be mapped to the standardized closed interval [0,1], so as to more intuitively quantify the network health.
[0085] As one implementation method, the first weighting coefficient $w_1$ = 0.5, the second weighting coefficient $w_2$ = 0.3, and the third weighting coefficient $w_3$ = 0.2.
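The weighted, normalized score described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `first_score` and its parameter names are hypothetical, the default thresholds (500 ms, 100 ms, 10%) and weights (0.5, 0.3, 0.2) are the example values given in the text, and each term is clipped at 0 when an indicator exceeds its baseline threshold.

```python
def first_score(rtt_ms, jitter_ms, loss_rate,
                t_rtt_ms=500.0, t_jitter_ms=100.0, t_loss=0.10,
                weights=(0.5, 0.3, 0.2)):
    """Return a first score Q1' in [0, 1] from RTT, jitter and packet loss.

    All names here are illustrative; the defaults are the example values
    from the description, not mandatory settings.
    """
    w1, w2, w3 = weights

    def term(value, threshold):
        # An indicator above its baseline threshold contributes 0.
        return max(0.0, 1.0 - value / threshold)

    score = (w1 * term(rtt_ms, t_rtt_ms)
             + w2 * term(jitter_ms, t_jitter_ms)
             + w3 * term(loss_rate, t_loss))
    return max(0.0, score)  # keep the result at or above 0
```

With every indicator at half of its threshold, for example `first_score(250, 50, 0.05)`, each bracketed term is 0.5, so the score lands in the middle of the [0, 1] interval.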
[0086] In some embodiments, the first score Q1' can be used as the network's first health score Q1 at the current moment, i.e., Q1=Q1'.
[0087] In other embodiments, the network's first health score Q1 at the current moment can be determined based on the first score Q1' and the network's second health score Q2 at the previous moment. Here, the network's second health score Q2 at the previous moment can be determined based on the network's round-trip time, jitter value, and packet loss rate at the previous moment. Thus, by considering the network's condition at the previous moment to determine the network's first health score at the current moment, the process of obtaining the network's first health score at the current moment can be smoother, avoiding frequent policy switching due to instantaneous network fluctuations and improving stability.
[0088] In some embodiments, the first health score Q1 is determined by summing the difference between 1 and a preset smoothing factor multiplied by the second health score Q2, and the preset smoothing factor multiplied by the first score Q1'.
[0089] As one implementation method, the first health score Q1 is determined based on the following formula:
[0090] $$Q1 = (1 - \alpha)\,Q2 + \alpha\,Q1'$$ where Q1 represents the first health score, Q1' represents the first score, Q2 represents the network's second health score at the previous moment, and $\alpha$ represents the preset smoothing factor.
[0091] In the above embodiments, by using a corresponding smoothing factor, the first score and the second health score are integrated to smooth the instantaneous network health score that changes drastically over time, thereby generating a more stable final score that better reflects the current network condition. This avoids the "jitter" in the network health calculation results caused by fluctuations in a single measurement, and improves the reliability of subsequent decisions based on network health.
[0092] As one implementation method, the preset smoothing factor $\alpha$ can be set to 0.5. This allows for faster detection of persistent network degradation while avoiding frequent policy switching due to momentary network fluctuations, enabling timely implementation of appropriate strategies.
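The smoothing step can be illustrated with a short Python sketch. It assumes the recursive form in which the current health score is a weighted sum of the previous health score and the current first score, with smoothing factor 0.5. The function names are hypothetical, and taking the very first sample as-is (Q1 = Q1') is an assumption made here for initialization.

```python
def smooth_health(q1_prime, q2_prev, alpha=0.5):
    """Blend the current first score with the previous health score."""
    return (1.0 - alpha) * q2_prev + alpha * q1_prime


def smooth_series(raw_scores, alpha=0.5):
    """Apply the smoothing recursively over a sequence of first scores."""
    smoothed = []
    prev = None
    for q in raw_scores:
        # Assumption: with no history yet, the first score is used directly.
        prev = q if prev is None else smooth_health(q, prev, alpha)
        smoothed.append(prev)
    return smoothed
```

For a raw sequence such as `[0.9, 0.9, 0.2, 0.9]`, the single dip to 0.2 is pulled back toward the earlier scores after smoothing, so a one-off fluctuation does not by itself push the score under a threshold of 0.4, while a sustained drop still would after a few samples.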
[0093] As one implementation method, the determination of the network's second health score Q2 at the previous time step can refer to the description above regarding the determination of the network's first health score Q1 at the current time step. Specifically, the second score Q2' obtained based on the network's round-trip delay, jitter value, and packet loss rate at the previous time step can be used as the network's second health score Q2 at the previous time step, where the second score Q2' can be obtained by referring to the above description of obtaining the first score Q1'. In addition, besides the second score obtained based on the network's round-trip delay, jitter value, and packet loss rate at the previous time step, the second health score Q2 can also be determined based on the network's third health score at the time step preceding the previous time step. Here, the third health score can be determined by referring to the method used to determine the first health score Q1 and the second health score Q2, and so on, which will not be elaborated here.
[0094] The following describes, with reference to Figure 2, some implementation examples of executing appropriate policies based on network conditions.
[0095] In step S210, the network's first health score Q1 at the current moment is obtained. As some implementations, the network's round-trip time, jitter, and packet loss rate at the current moment can be obtained, and the first health score Q1 at the current moment can be calculated based on these parameters. As some implementations, the network's first health score Q1 at the current moment can be obtained from the data sent by the transmitting end.
[0096] At step S220, it is determined whether the network's first health score Q1 at the current moment is less than or equal to a first threshold. If yes, step S230 is executed; otherwise, step S240 is executed. As some implementations, the first threshold can be 0.4, that is, if Q1 ≤ 0.4, step S230 can be executed, and if Q1 > 0.4, step S240 can be executed.
[0097] At step S230, a secure truncation strategy is adopted. Here, adopting a secure truncation strategy may include discarding at least a portion of the video encoded frames in the buffer, sending a request to the sender to obtain an instant decoding refresh frame, and decoding the instant decoding refresh frame in response to receiving the instant decoding refresh frame sent by the sender.
[0098] At step S240, a preset buffering strategy is adopted. Specifically, in some embodiments, when Q1 > 0.4, it can be further determined whether Q1 is greater than a second threshold. In response to Q1 being greater than the first threshold and less than or equal to the second threshold, the first strategy is adopted; in response to Q1 being greater than the second threshold, the second strategy is adopted. As some implementation manners, the second threshold can be 0.8, that is, when 0.4 < Q1 ≤ 0.8, the first strategy is adopted, and when Q1 > 0.8, the second strategy is adopted. It should be understood that in some embodiments, the obtained health score can also be scaled, and the corresponding strategy triggering thresholds (such as the above-mentioned first threshold and second threshold) are scaled proportionally, which is not limited herein.
[0099] As some implementation manners, adopting the first strategy can include storing video coding frames from a sending end in a buffer with a first capacity. As some implementation manners, adopting the second strategy can include storing video coding frames from the sending end in a buffer with a second capacity, where the second capacity is less than or equal to the first capacity.
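The decision flow of steps S220 to S240 can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the class `ReceiverBuffer`, its methods, and the concrete capacities (100 ms and 20 ms, chosen inside the 50~200 ms and 0~50 ms example ranges) are assumptions. The sketch discards the whole buffer, whereas the text only requires discarding at least a portion of it, and the request to the sending end is represented by a counter.

```python
from collections import deque

FIRST_THRESHOLD = 0.4     # example value from the text
SECOND_THRESHOLD = 0.8    # example value from the text
FIRST_CAPACITY_MS = 100   # assumed, within the 50~200 ms example range
SECOND_CAPACITY_MS = 20   # assumed, within the 0~50 ms example range


class ReceiverBuffer:
    """Hypothetical receiver-side buffer, for illustration only."""

    def __init__(self):
        self.frames = deque()
        self.capacity_ms = FIRST_CAPACITY_MS
        self.idr_requests = 0

    def request_idr(self):
        # Placeholder for the request sent to the sending end.
        self.idr_requests += 1

    def apply_policy(self, q1):
        """Choose a strategy from the first health score Q1."""
        if q1 <= FIRST_THRESHOLD:
            # S230: secure truncation, drop buffered frames and ask for an IDR frame.
            self.frames.clear()
            self.request_idr()
            return "secure_truncation"
        if q1 <= SECOND_THRESHOLD:
            # S240, first strategy: larger buffer to absorb slight jitter.
            self.capacity_ms = FIRST_CAPACITY_MS
            return "first_strategy"
        # S240, second strategy: smaller buffer to minimize latency.
        self.capacity_ms = SECOND_CAPACITY_MS
        return "second_strategy"
```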
[0100] Here, the instant decoding refresh frame and the video coding frame can be obtained by encoding a target region in an original image frame based on a first quantization parameter and encoding a background region in the original image frame based on a second quantization parameter, where the first quantization parameter does not change with the change of the first health score, and the second quantization parameter is negatively correlated with the first health score. That is to say, regardless of how the network health score changes, the first quantization parameter can be maintained at a corresponding fixed value, and as the first health score of the network decreases, the second quantization parameter can show an upward trend.
[0101] In some embodiments, the second quantization parameter can linearly increase as the first health score of the network decreases, so as to gradually increase the second quantization parameter as the network condition deteriorates.
[0102] Alternatively, in some embodiments, the second quantization parameter can exponentially increase as the first health score of the network decreases. Thus, when Q1 is less than or equal to the first threshold, that is, when the network condition is poor, the target region is not affected by the deterioration of the network condition, so that its fidelity is locked, while the second quantization parameter can rapidly increase to release bandwidth. When Q1 is greater than the first threshold and less than or equal to the second threshold, that is, when the network condition is average, the rate of increase of the second quantization parameter is lower, so that, on the premise of preserving the visual features (such as edge sharpness) of the target region, the bit rate of the background region is moderately reduced to balance fluency and image quality. When Q1 is greater than the second threshold, that is, when the network condition is good, the rate of increase of the second quantization parameter is lower still, and the overall image frame can maintain a high bit rate to provide essentially lossless image quality.
[0103] In some embodiments, the first quantization parameter may be less than or equal to the second quantization parameter. In some implementations, the first quantization parameter may be less than the second quantization parameter. This ensures that the information in the target area is clearer than the information in the background area, while the video coding bitrate or image quality of the target area remains unaffected by network degradation. In some implementations, the first quantization parameter may be set to remain at a preset baseline value; the second quantization parameter may be set to be greater than the preset baseline value and increase as the network's first health score decreases at the current moment. This ensures that even with deteriorating network conditions, the image of the target area remains faithfully preserved, effectively preventing image distortion in the target area.
[0104] In some embodiments, the first quantization parameter can be maintained at a preset reference value; specifically, this can be expressed as $QP_1 = QP_{base}$, where $QP_1$ indicates the first quantization parameter and $QP_{base}$ represents the preset reference value. As one implementation method, the preset reference value is set to a value at which compression distortion is almost imperceptible to the human eye and edge gradients remain sharp. Thus, regardless of network conditions, blocky distortion in the target area can be prevented, thereby ensuring the corresponding operational accuracy.
[0105] In some embodiments, the second quantization parameter can be determined based on the first quantization parameter and the first health score, wherein the difference between the second quantization parameter and the first quantization parameter is negatively correlated with the first health score. That is, the smaller the first health score, the larger the difference between the second quantization parameter and the first quantization parameter, and the larger the second quantization parameter. In this way, while ensuring that the information in the target area is clearer than the information in the background area, the second quantization parameter increases as network conditions deteriorate, thereby freeing up network bandwidth.
[0106] In some embodiments, the second quantization parameter can be determined based on an intermediate value obtained by summing the first quantization parameter and the increment, wherein the increment can be obtained by multiplying the difference between 1 and the first health score by a preset maximum quantization difference. Here, if the intermediate value is less than the minimum quantization parameter value, the second quantization parameter can be determined as the minimum quantization parameter value; if the intermediate value is greater than the maximum quantization parameter value, the second quantization parameter can be determined as the maximum quantization parameter value; if the intermediate value is greater than or equal to the minimum quantization parameter value and less than or equal to the maximum quantization parameter value, the second quantization parameter can be determined as the intermediate value.
[0107] As one implementation method, the second quantization parameter can be determined based on the following formula:
[0108] $$QP_2 = \mathrm{Clamp}\left(QP_1 + \Delta QP_{max}\,(1 - Q1)^{\gamma},\ QP_{min},\ QP_{max}\right)$$ where $QP_1$ indicates the first quantization parameter, $QP_2$ represents the second quantization parameter, Q1 represents the first health score, $\Delta QP_{max}$ indicates the preset maximum quantization difference, $\gamma$ indicates the preset index, $QP_{max}$ represents the maximum quantization parameter value, $QP_{min}$ represents the minimum quantization parameter value, and Clamp() represents the truncation function.
[0109] Here, as one implementation method, the preset maximum quantization difference $\Delta QP_{max}$ = 30, so that when the network condition is worst (Q1 = 0), the second quantization parameter can be 30 higher than the first quantization parameter. The preset index $\gamma$ characterizes the sensitivity of the second quantization parameter to the first health score Q1. As one implementation method, the preset index $\gamma$ > 1, for example $\gamma$ = 2, thus allowing the second quantization parameter to increase exponentially as Q1 decreases. If the network deteriorates, the second quantization parameter can rapidly increase to free up bandwidth. The truncation function Clamp() limits the calculation result to the range between the minimum quantization parameter value $QP_{min}$ and the maximum quantization parameter value $QP_{max}$, to avoid the second quantization parameter exceeding the encoder's allowed range. As one implementation, the minimum quantization parameter value $QP_{min}$ = 0 and the maximum quantization parameter value $QP_{max}$ = 51.
[0110] In a specific example, under strong network conditions, with the first quantization parameter set to 22, substituting the first health score Q1 into the relevant formula above yields the second quantization parameter, and at this time the entire image is clear. In another specific example, under weak network conditions, again with the first quantization parameter set to 22, substituting the first health score Q1 into the relevant formula above yields the second quantization parameter. At this point, the background area is highly compressed, resulting in a significant reduction in bitrate, but the target area (such as the weld seam area) remains clear.
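The relationship between the first health score and the second quantization parameter can be sketched in Python. The sketch assumes that the increment is the preset maximum quantization difference multiplied by (1 - Q1) raised to the preset index, and that the result is clamped to the allowed range. The function name `second_qp` is hypothetical, and the defaults (first quantization parameter 22, maximum difference 30, index 2, range 0 to 51) are the example values from the text. Rounding to an integer value for a real encoder is left out.

```python
def second_qp(q1, qp1=22, delta_max=30, gamma=2, qp_min=0, qp_max=51):
    """Second (background) quantization parameter for a health score q1 in [0, 1].

    Assumed form: Clamp(qp1 + delta_max * (1 - q1) ** gamma, qp_min, qp_max).
    """
    intermediate = qp1 + delta_max * (1.0 - q1) ** gamma
    return min(max(intermediate, qp_min), qp_max)  # truncation to the encoder range
```

Under these assumptions a perfect score leaves the background at the same value as the target area, a score of 0.5 adds 7.5, and a score of 0 would add the full difference of 30 but is truncated at the maximum of 51.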
[0111] In the above embodiments, a direct feedback link from network status to image spatial quantization parameters is established, and a dynamic strategy of "preserving the core and discarding the shell" is adopted. That is, while locking the high fidelity of the target area, the compression rate of the background area is rapidly increased as the network health score decreases, so as to ensure that high-precision operation can still be performed on the target area even when the network deteriorates. This effectively balances the overall image quality and operation accuracy under limited bandwidth.
[0112] Figure 3 shows a schematic diagram illustrating the effect of differential coding according to some embodiments of this disclosure. As shown in Figure 3(a), when the network condition is good (the first health score Q1 is high), the first and second quantization parameters do not differ significantly; for example, both are around 20, and the entire image is clear. As shown in Figure 3(b), when the network condition is poor (the first health score Q1 is low), the first quantization parameter and the second quantization parameter differ significantly. The first quantization parameter of the target area can still be maintained at around 20, and the target area information is clearly displayed. However, the second quantization parameter can rise to over 45, and the background area is highly compressed and becomes blurry.
[0113] In this disclosure, decoded video encoded frames can be displayed on a display interface to provide the corresponding video feed.
[0114] The acquired image frames may contain geometric distortions, such as those introduced when images are acquired at large pitch or side angles. Such distortions can reduce the accuracy of subsequent operations based on the video feed on the display interface, for example, measuring a specific object in an industrial setting using a video feed with geometric distortion. The displayed picture can therefore be processed to improve operational accuracy.
[0115] Specifically, in some embodiments, the video processing method may further include displaying the instant decoding refresh frame on a display interface, generating a distortion mesh, and displaying the distortion mesh on the display interface.
[0116] Here, the distorted mesh can be formed by mapping a standard mesh based on corresponding distortion coefficients, where the size of the standard mesh can be known. As one implementation, the standard mesh and its size can be generated by obtaining the coordinates of each vertex of the standard mesh in the world coordinate system.
[0117] As one implementation, the distortion mesh on the display interface can at least cover the target area in the displayed instant decoding refresh frame. This allows for accurate measurement of the target area. It should be understood that the distortion mesh on the display interface can at least cover the target area in the currently displayed video frame, which can be any decoded and displayed video-encoded frame.
[0118] As one implementation method, the distorted grid displayed on the display interface can be transparent or semi-transparent to reduce the possibility of obscuring key information.
[0119] In the above embodiments, by using Inverse Perspective Mapping (IPM), the pre-generated standard physical measurement grid is inversely projected onto the pixel coordinate system of the current display viewpoint, and a visualized virtual ruler grid (i.e., a distorted grid) that is deformed with perspective is superimposed on the video layer, thereby providing the operator with a measurement benchmark with anisotropic depth perception, so as to improve the corresponding operational accuracy.
[0120] Figure 4 shows a schematic flowchart of generating a distorted mesh according to some embodiments of the present disclosure.
[0121] In step S410, the first coordinates of each vertex of the standard grid in the world coordinate system are obtained.
[0122] In some embodiments, the point set of each vertex of a standard mesh in world coordinates can be generated at the receiving end (e.g., in the receiving end's memory). As one implementation, a corresponding point matrix is generated at 10 mm intervals along the X and Y axes.
[0123] In step S420, based on the mapping relationship between the world coordinate system and the pixel coordinate system of the display interface, the first coordinates of each vertex are mapped to obtain the second coordinates of each vertex in the pixel coordinate system.
[0124] In some embodiments, the mapping relationship between the world coordinate system and the pixel coordinate system of the display interface can be represented based on the homography matrix H. As some implementations, the homography matrix H can be obtained based on intrinsic and extrinsic parameters. Specifically, it can be obtained by calibrating the acquisition unit (e.g., a camera) at the transmitting end to obtain, for example, the intrinsic parameter matrix K and extrinsic parameter matrix (the extrinsic parameter matrix may include a rotation matrix R and a translation vector T) of the camera's acquisition unit. Thus, the following formula for the pinhole camera model can be obtained:
[0125] $$s\begin{bmatrix}u\\ v\\ 1\end{bmatrix} = K\,[R \mid T]\begin{bmatrix}X_w\\ Y_w\\ Z_w\\ 1\end{bmatrix}$$ Here, $[R \mid T]$ is the 3×4 transformation matrix formed by concatenating the 3×3 rotation matrix R and the 3×1 translation vector T, and s represents the scaling factor in homogeneous coordinates. $X_w$, $Y_w$, and $Z_w$ are coordinates in the world coordinate system, and u and v are coordinates in the pixel coordinate system. As shown in Figure 5, the pinhole camera model can be represented as a light ray from any point in the world coordinate system traveling along a straight line towards the camera's optical center 510 and being projected onto the imaging plane (display interface 520). As shown in Figure 5, the homography matrix H can be used to map a point $P_w$ in the world coordinate system to the corresponding point $p$ on the display interface 520.
[0126] Let the surface where the target object is located in the work environment of the corresponding work scenario be the reference plane of the world coordinate system, that is, let $Z_w = 0$. The above formula can then be simplified to:
[0127] $$s\begin{bmatrix}u\\ v\\ 1\end{bmatrix} = K\,[r_1\ \ r_2\ \ T]\begin{bmatrix}X_w\\ Y_w\\ 1\end{bmatrix} = H\begin{bmatrix}X_w\\ Y_w\\ 1\end{bmatrix}$$ Here, $H = K\,[r_1\ \ r_2\ \ T]$ is the 3×3 homography matrix, and $r_1$ and $r_2$ are the first two column vectors of the rotation matrix R. Thus, the mapping relationship between the physical world plane and the image plane of the display interface can be determined through the homography matrix H, that is, the mapping relationship between the world coordinate system and the pixel coordinate system of the display interface can be obtained.
[0128] Next, the homography matrix H can be used to map the first coordinates of each vertex in the world coordinate system to obtain the second coordinates of each vertex in the pixel coordinate system. Specifically, this can be represented as $p_i = \mathrm{Norm}(H \cdot P_i)$, where $p_i$ is the second coordinate of vertex i in the pixel coordinate system, $P_i$ is the first coordinate of vertex i in the world coordinate system (in homogeneous form), and $\mathrm{Norm}(\cdot)$ indicates that a homogeneous normalization operation is performed, i.e., division by the last component.
[0129] In step S430, based on the radial distortion coefficient and the tangential distortion coefficient, the second coordinates of each vertex in the pixel coordinate system are mapped to obtain the third coordinates of each vertex in the pixel coordinate system.
[0130] As one implementation method, the radial and tangential distortion coefficients obtained during the calibration phase can be used to perform a nonlinear mapping on the second coordinate $p_i$ of vertex i to obtain the third coordinate $p_i'$ of vertex i in the pixel coordinate system. Here, a pre-defined distortion model (such as the Brown-Conrady distortion model) can be used to perform the nonlinear mapping on the second coordinate.
[0131] In step S440, a distorted mesh is generated based on the third coordinate of each vertex in the pixel coordinate system.
[0132] As one implementation method, a rendering engine at the receiving end can be used to connect the vertices located at their third coordinates, drawing the corresponding lines and thereby generating the distorted mesh. In this way, meshes that conform to physical perspective (such as trapezoids) can be drawn on the display interface. As shown in Figure 5, the above process can generate a distorted mesh on the display interface 520 based on the standard mesh in the world coordinate system.
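Steps S410 to S430 can be illustrated with a small Python sketch that works on grid vertices only. Several points are assumptions rather than details from the disclosure: the function names are hypothetical, the homography is passed as a 3x3 nested list, the Brown-Conrady model is applied in the common form with two radial coefficients (k1, k2) and two tangential coefficients (p1, p2), and the pixel coordinates are converted to normalized camera coordinates with focal lengths and principal point (fx, fy, cx, cy) before distortion and back afterwards. Drawing the lines (step S440) is left to a rendering engine.

```python
def standard_grid(cols, rows, step_mm=10.0):
    """S410: vertices of a planar standard grid in world coordinates (Z_w = 0), in mm."""
    return [(c * step_mm, r * step_mm) for r in range(rows) for c in range(cols)]


def project(H, x, y):
    """S420: map a world-plane point to pixel coordinates with homography H."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w  # homogeneous normalization


def distort(u, v, intrinsics, coeffs):
    """S430: apply radial and tangential distortion to an ideal pixel position."""
    fx, fy, cx, cy = intrinsics
    k1, k2, p1, p2 = coeffs
    # Move to normalized camera coordinates (assumed convention).
    x, y = (u - cx) / fx, (v - cy) / fy
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 * r2
    xd = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    yd = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return fx * xd + cx, fy * yd + cy


def distorted_mesh(H, grid, intrinsics, coeffs):
    """Third coordinates of every grid vertex, ready to be connected by lines."""
    return [distort(*project(H, x, y), intrinsics, coeffs) for x, y in grid]
```

Only the vertices are transformed, which is what keeps the cost low compared with resampling every pixel of the frame.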
[0133] In the above embodiments, by performing distortion calculations on the vertices on the grid lines instead of resampling the entire image (e.g., two million pixels), the computational load is significantly reduced compared to performing distortion correction calculations on the entire image. For example, the computational load can be reduced by five orders of magnitude.
[0134] In some embodiments, in response to an operation on a specified pixel on the display interface, the pixel coordinates $p = (u, v)$ of the specified pixel are obtained. Here, the operation on the specified pixel could be, for example, a user clicking on that pixel on the screen. Next, as shown in Figure 5, the inverse matrix $H^{-1}$ of the homography matrix is used to restore the pixel coordinates $p$ to coordinates $P_w$ in the world coordinate system. Specifically, this can be expressed as $P_w = H^{-1}\,[u, v, 1]^T$. The coordinates $P_w$ obtained above can be a homogeneous coordinate vector; in some implementations, a normalization operation can be performed on $P_w$ to obtain the true physical coordinates. As one implementation method, when the true physical coordinates $P_A$ and $P_B$ of two specified pixels are obtained, the Euclidean distance between $P_A$ and $P_B$ can be calculated to obtain the corresponding distance in the industrial site, thereby enabling measurement of the industrial site through the display interface. As one implementation method, the calculated Euclidean distance between $P_A$ and $P_B$ can be displayed.
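The measurement described above can be sketched in Python as an inverse homography followed by a Euclidean distance. The function names are hypothetical, the homography is assumed to be a non-singular 3x3 nested list mapping millimetres on the reference plane to pixels, and the clicked pixel positions are assumed to be free of lens distortion (the sketch does not undo radial or tangential distortion).

```python
import math


def invert_3x3(m):
    """Inverse of a 3x3 matrix via the adjugate; raises if it is singular."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("homography is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def pixel_to_world(H_inv, u, v):
    """Restore a pixel to coordinates on the world reference plane."""
    x = H_inv[0][0] * u + H_inv[0][1] * v + H_inv[0][2]
    y = H_inv[1][0] * u + H_inv[1][1] * v + H_inv[1][2]
    w = H_inv[2][0] * u + H_inv[2][1] * v + H_inv[2][2]
    return x / w, y / w  # normalize the homogeneous vector


def measure(H, pixel_a, pixel_b):
    """Physical distance between two clicked pixels, in world units."""
    H_inv = invert_3x3(H)
    xa, ya = pixel_to_world(H_inv, *pixel_a)
    xb, yb = pixel_to_world(H_inv, *pixel_b)
    return math.hypot(xb - xa, yb - ya)
```

Because the distance is computed on the reference plane rather than in pixels, the perspective foreshortening visible on screen does not enter the result.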
[0135] Thus, when measurement operations are required, such as measuring weld width or tool setting distance, there is no need for human estimation of perspective error; the operation can be performed directly on the touch screen to obtain the corresponding measurement results.
[0136] Figure 6 shows a schematic flowchart of a video processing method according to other embodiments of this disclosure. As shown in Figure 6, the video processing method according to some embodiments of this disclosure may include steps S610 to S620. This method can be applied to, that is, executed by, a sending end.
[0137] In step S610, in response to receiving a request from the receiving end to obtain an instant-decoded refresh frame, the target region in the original image frame is encoded based on a first quantization parameter, and the background region in the original image frame is encoded based on a second quantization parameter to obtain an instant-decoded refresh frame. Here, the request to obtain the instant-decoded refresh frame is sent by the receiving end when the network's first health score at the current time is less than or equal to a first threshold. The first quantization parameter does not change with the first health score, and the second quantization parameter is negatively correlated with the first health score.
[0138] In step S620, an instant decoding refresh frame is sent to the receiving end so that the receiving end can decode the instant decoding refresh frame if at least a portion of the video encoded frames in the buffer are discarded in response to a first health score being less than or equal to a first threshold.
[0139] In some embodiments, images can be captured from the industrial site in a corresponding industrial setting to obtain corresponding image frames. As some implementations, image capture can be performed using a camera to obtain the corresponding image frames.
[0140] In some embodiments, target detection algorithms can be used to identify target regions in the acquired image frames. Taking an industrial welding scenario as an example, at least one of the following can be identified: the region containing the weld seam with strong linear characteristics, the region containing the bevel with high contrast characteristics, and the region containing the welding torch tip. Considering that the YOLO series algorithms (such as YOLOv5 and YOLOv8) have extremely high inference speeds on industrial edge computing devices (real-time inference), capable of meeting the millisecond-level real-time requirements in welding guidance, in some embodiments, YOLO series algorithms can be used to identify target regions in the acquired image frames. It should be understood that target detection algorithms may also include, but are not limited to, Single Shot MultiBox Detector (SSD) algorithms or faster region-based convolutional neural networks (Faster R-CNN) and other mainstream deep learning model algorithms.
[0141] By identifying the target region, a binary mask matrix $M$ with the same resolution as the image frame can be generated, where: $$M(x, y) = \begin{cases}1, & (x, y) \in R_{target}\\ 0, & (x, y) \in R_{bg}\end{cases}$$
[0142] Here, $(x, y)$ represents discrete pixel coordinates within the image plane, $R_{target}$ indicates the target area, and $R_{bg}$ indicates the background area.
[0143] To avoid drastic changes in the target area in each frame due to detection noise, Gaussian filtering can be used to smooth and de-jitter the boundary coordinates of the mask matrix, ensuring the visual stability of the transmitted image.
[0144] Next, the encoder at the transmitting end reads the mask matrix. Macroblocks marked as 1 can be encoded based on the first quantization parameter, and macroblocks marked as 0 can be encoded based on the second quantization parameter.
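How a mask could drive per-macroblock quantization can be sketched in Python. The sketch rests on assumptions not stated in the text: the target region is given as a single bounding box (x0, y0, x1, y1) with exclusive upper bounds, macroblocks are 16x16 pixels, and a macroblock is treated as marked 1 if any of its pixels is 1. The function names are hypothetical, and the Gaussian smoothing of the mask boundary is omitted.

```python
def mask_from_box(width, height, box):
    """Binary mask with 1 inside the target bounding box and 0 elsewhere."""
    x0, y0, x1, y1 = box
    return [[1 if (x0 <= x < x1 and y0 <= y < y1) else 0 for x in range(width)]
            for y in range(height)]


def macroblock_qp_map(mask, qp_target, qp_background, mb=16):
    """Assign a quantization parameter to every macroblock from the mask."""
    height, width = len(mask), len(mask[0])
    rows = (height + mb - 1) // mb
    cols = (width + mb - 1) // mb
    qp_map = []
    for by in range(rows):
        row = []
        for bx in range(cols):
            # Assumption: one target pixel is enough to mark the macroblock as 1.
            has_target = any(
                mask[y][x]
                for y in range(by * mb, min((by + 1) * mb, height))
                for x in range(bx * mb, min((bx + 1) * mb, width))
            )
            row.append(qp_target if has_target else qp_background)
        qp_map.append(row)
    return qp_map
```

For a 32x32 frame whose target box lies entirely inside the top-left macroblock, the resulting map assigns the first quantization parameter to that block and the second quantization parameter to the other three.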
[0145] In some embodiments, multiple sets of images containing calibration references (such as high-precision checkerboard patterns) can be acquired by the camera, and the intrinsic parameter matrix and distortion coefficients of the camera can be calculated using the Zhang Zhengyou calibration method, so that they can be used to generate the corresponding distortion mesh at the receiving end.
[0146] In some embodiments, in response to receiving a request from the receiving end to acquire an instant decoded refresh frame, the current GOP encoding sequence is also interrupted in order to acquire the original image frame corresponding to the latest moment in a timely manner and encode the original image frame.
[0147] In some embodiments, a first health score of the network at the current moment can be obtained. In some embodiments, the first health score can be determined based on a first score, wherein the first score is obtained based on the network's round-trip time, jitter value, and packet loss rate at the current moment. As some implementations, the network's round-trip time, jitter value, and packet loss rate at the current moment can be obtained through a network probe.
[0148] In some embodiments, the first score is obtained by summing the difference between 1 and the first ratio multiplied by a first weighting coefficient, the difference between 1 and the second ratio multiplied by a second weighting coefficient, and the difference between 1 and the third ratio multiplied by a third weighting coefficient, wherein the first ratio is the ratio of the network's round-trip time at the current moment to a first normalized baseline threshold, the second ratio is the ratio of the network's jitter at the current moment to a second normalized baseline threshold, and the third ratio is the ratio of the network's packet loss rate at the current moment to a third normalized baseline threshold.
[0149] As one implementation method, the first score is obtained based on the following formula:
[0150] $$Q1' = w_1\left(1 - \frac{RTT}{RTT_{\mathrm{ref}}}\right) + w_2\left(1 - \frac{J}{J_{\mathrm{ref}}}\right) + w_3\left(1 - \frac{L}{L_{\mathrm{ref}}}\right),$$
[0151] where Q1' represents the first score, $w_1$ represents the first weighting coefficient, $w_2$ represents the second weighting coefficient, $w_3$ represents the third weighting coefficient, $RTT$ represents the round-trip time of the network at the current moment, $J$ represents the network jitter value at the current moment, $L$ represents the packet loss rate of the network at the current moment, $RTT_{\mathrm{ref}}$ represents the first normalized baseline threshold, $J_{\mathrm{ref}}$ represents the second normalized baseline threshold, and $L_{\mathrm{ref}}$ represents the third normalized baseline threshold.
[0152] As one implementation method, the first weighting coefficient may be greater than the second and third weighting coefficients.
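The weighted sum described in paragraph [0148] can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the function name, the baseline thresholds (200 ms, 50 ms, and a loss rate of 0.1), and the weights (0.5, 0.25, 0.25) are assumed values, chosen only so that the first weight exceeds the other two as in paragraph [0152].

```python
def first_score(rtt_ms, jitter_ms, loss_rate, *,
                rtt_ref_ms=200.0, jitter_ref_ms=50.0, loss_ref=0.1,
                weights=(0.5, 0.25, 0.25)):
    """Weighted sum of (1 - metric / baseline) over RTT, jitter, and packet loss.

    All default thresholds and weights are illustrative assumptions.
    """
    w1, w2, w3 = weights
    return (
        w1 * (1.0 - rtt_ms / rtt_ref_ms)
        + w2 * (1.0 - jitter_ms / jitter_ref_ms)
        + w3 * (1.0 - loss_rate / loss_ref)
    )
```

With weights that sum to 1, an ideal network (all metrics at zero) scores 1 and a network sitting exactly at all three baselines scores 0. A metric above its baseline contributes a negative term; whether the result should then be limited to a fixed range is not stated in the text and is left to the caller in this sketch.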
[0153] In some embodiments, a first health score can be determined based on a first score and a second health score of the network at the previous time step, wherein the second health score is determined based on the network's round-trip time, jitter value, and packet loss rate at the previous time step.
[0154] In some embodiments, the first health score is determined by summing the difference between 1 and a preset smoothing factor multiplied by the second health score, and the preset smoothing factor multiplied by the first score.
[0155] As one implementation method, the first health score can be determined based on the following formula:
[0156] $$Q1 = (1 - \alpha)\,Q2 + \alpha\,Q1',$$ where Q1 represents the first health score, Q1' represents the first score, Q2 represents the second health score, and $\alpha$ represents the preset smoothing factor.
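The smoothing described in paragraph [0154] is a form of exponential moving average. A minimal sketch, assuming a smoothing factor of 0.3 (a value not given in the disclosure) and a hypothetical function name:

```python
def update_health(prev_health, score, alpha=0.3):
    """Blend the previous health score with the current first score.

    alpha is the preset smoothing factor; 0.3 is an assumed example value.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * prev_health + alpha * score
```

A larger smoothing factor makes the health score follow the instantaneous score more closely, while a smaller one suppresses short-lived fluctuations at the cost of a slower reaction to genuine degradation.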
[0157] In some embodiments, the second quantization parameter may increase exponentially as the first health score of the network decreases.
[0158] In some embodiments, the second quantization parameter can be determined based on the first health score and the first quantization parameter, wherein the difference between the second quantization parameter and the first quantization parameter is negatively correlated with the first health score.
[0159] In some embodiments, the second quantization parameter can be determined based on an intermediate value obtained by summing the first quantization parameter and the increment, wherein the increment is obtained by multiplying the difference between 1 and the first health score by a preset maximum quantization difference. Here, if the intermediate value is less than the minimum quantization parameter value, the second quantization parameter is determined as the minimum quantization parameter value; if the intermediate value is greater than the maximum quantization parameter value, the second quantization parameter is determined as the maximum quantization parameter value; and if the intermediate value is greater than or equal to the minimum quantization parameter value and less than or equal to the maximum quantization parameter value, the second quantization parameter is determined as the intermediate value.
[0160] As one implementation method, the second quantization parameter can be determined based on the following formula:
[0161] $$QP_2 = \mathrm{clip}\left(QP_1 + \Delta QP_{\max}\,(1 - Q1)^{\gamma},\; QP_{\min},\; QP_{\max}\right);$$
[0162] where $QP_1$ indicates the first quantization parameter, $QP_2$ represents the second quantization parameter, Q1 represents the first health score, $\Delta QP_{\max}$ indicates the preset maximum quantization difference, $\gamma$ indicates the preset exponent, $QP_{\max}$ represents the maximum quantization parameter value, $QP_{\min}$ represents the minimum quantization parameter value, and $\mathrm{clip}(\cdot)$ represents the truncation function.
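The clamping rule of paragraph [0159] can be sketched as below. Several details are assumptions made for illustration: the limits 0 and 51 (the quantization parameter range of H.264), the maximum difference of 20, the rounding to an integer, the limiting of the health score to [0, 1], and the placement of the preset exponent on the term (1 - health). With the exponent set to 1 the sketch matches the linear increment of paragraph [0159].

```python
def second_qp(qp1, health, *, max_delta=20, exponent=1.0, qp_min=0, qp_max=51):
    """Background QP derived from the target QP and the network health score.

    max_delta, exponent, qp_min, and qp_max are illustrative assumptions.
    """
    # Assumption: keep the health score in [0, 1] so the power is well defined.
    health = min(1.0, max(0.0, health))
    increment = max_delta * (1.0 - health) ** exponent
    intermediate = qp1 + increment
    # Truncate the intermediate value to the permitted QP range.
    return int(min(qp_max, max(qp_min, round(intermediate))))
```

For example, with a target quantization parameter of 22, a healthy network (score 1) leaves the background at 22, and a score of 0.5 raises it to 32. With a target of 40 and a score of 0, the intermediate value of 60 is truncated to 51.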
[0163] It should be understood that some embodiments of the video processing method applied to the transmitting end in this disclosure can be described with reference to some embodiments of the video processing method applied to the receiving end, and will not be repeated here.
[0164] Embodiments of this disclosure also provide a receiving device that can be configured to perform the video processing method applied to a receiving end in any of the above embodiments.
[0165] In some embodiments, the receiving device may include a module for performing the video processing method applied to the receiving end according to any of the above embodiments.
[0166] Figure 7 shows a schematic diagram of a receiving end device according to some embodiments of the present disclosure. As shown in Figure 7, the receiving device 700 may include a buffer operation module 710, a first sending module 720, and a decoding module 730.
[0167] The buffer operation module 710 can be configured to discard at least a portion of the video encoded frames in the buffer in response to the network’s first health score at the current moment being less than or equal to a first threshold.
[0168] The first sending module 720 can be configured to send a request to the sending end to obtain an instant decoding refresh frame in response to the network's first health score at the current moment being less than or equal to a first threshold.
[0169] The decoding module 730 can be configured to decode the instant decoding refresh frame in response to receiving the instant decoding refresh frame sent by the sending end.
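The cooperation of the three modules can be sketched as a single class. This is a simplified illustration: it uses the variant in which all buffered frames are discarded, the decoder is replaced by a placeholder string, and the class name, method names, the `awaiting_idr` flag (used to avoid repeated requests), and the dropping of non-refresh frames while waiting are assumptions not stated in the disclosure.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class EncodedFrame:
    seq: int
    kind: str  # "IDR", "P", or "B"


class Receiver:
    def __init__(self, threshold, send_idr_request):
        self.threshold = threshold
        self.send_idr_request = send_idr_request  # callback to the sending end
        self.buffer = deque()
        self.awaiting_idr = False

    def on_health(self, health):
        """Buffer operation and first sending modules: react to poor health."""
        if health <= self.threshold and not self.awaiting_idr:
            self.buffer.clear()        # discard stale encoded frames
            self.send_idr_request()    # ask for an instant decoding refresh frame
            self.awaiting_idr = True

    def on_frame(self, frame):
        """Decoding module: decode the refresh frame as soon as it arrives."""
        if self.awaiting_idr:
            if frame.kind != "IDR":
                return None            # assumed: predicted frames are dropped meanwhile
            self.awaiting_idr = False
            return f"decoded IDR {frame.seq}"  # placeholder for a real decoder
        self.buffer.append(frame)
        return None
```

Clearing the buffer removes the backlog that causes display lag, and the refresh frame can be decoded without reference to any of the discarded frames.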
[0170] In some embodiments, the receiving device 700 may further include other modules to perform the video processing method applied to the receiving end in any of the above embodiments.
[0171] For details on the operation of the various modules in the receiving device 700, please refer to the video processing method applied to the receiving end described above, which will not be repeated here.
[0172] Embodiments of this disclosure also provide a transmitting device that can be configured to perform the video processing method applied to a transmitting end in any of the above embodiments.
[0173] In some embodiments, the transmitting device may include a module for performing the video processing method applied to the transmitting end according to any of the above embodiments.
[0174] Figure 8 shows a schematic diagram of a transmitting end device according to some embodiments of the present disclosure. As shown in Figure 8, the transmitting device 800 may include an encoding module 810 and a second sending module 820.
[0175] The encoding module 810 can be configured to, in response to receiving a request from the receiving end to obtain an instant decoding refresh frame, encode the target region in the original image frame based on a first quantization parameter and encode the background region in the original image frame based on a second quantization parameter to obtain the instant decoding refresh frame. Here, the request to obtain the instant decoding refresh frame is sent by the receiving end when the network's first health score at the current moment is less than or equal to a first threshold. The first quantization parameter does not change with the first health score, and the second quantization parameter is negatively correlated with the first health score.
[0176] The second sending module 820 can be configured to send the instant decoding refresh frame to the receiving end, so that the receiving end, having discarded at least a portion of the video encoded frames in its buffer in response to the first health score being less than or equal to the first threshold, can decode the instant decoding refresh frame.
[0177] In some embodiments, the transmitting device 800 may further include other modules to perform the video processing method applied to the transmitting end in any of the above embodiments.
[0178] For details on the operation of the various modules in the transmitting device 800, please refer to the video processing methods applied to the transmitting end described above, which will not be repeated here.
[0179] Embodiments of this disclosure also provide a video processing apparatus.
[0180] Figure 9 shows a schematic diagram of a video processing apparatus according to some embodiments of the present disclosure. As shown in Figure 9, the first video processing apparatus 900 may include a receiving device 700 and a transmitting device 800, wherein the receiving device 700 and the transmitting device 800 can be connected via a network 910 to transmit corresponding video streams and control signaling via the network 910. The receiving device 700 and the transmitting device 800 can synchronize the first health score of the network 910 at the current moment.
[0181] The operation of the receiving device 700, the operation of the transmitting device 800, and the interaction between the receiving device 700 and the transmitting device 800 are as described above and will not be repeated here.
[0182] Figure 10 shows a schematic diagram of a video processing apparatus according to other embodiments of the present disclosure. As shown in Figure 10, the second video processing apparatus 1000 may include a memory 1010 and a processor 1020 coupled to the memory 1010. The processor 1020 may be configured to execute the video processing method of any of the foregoing embodiments based on instructions stored in the memory 1010.
[0183] Specifically, processor 1020 can perform various actions and processes according to instructions stored in memory 1010. Processor 1020 can be an integrated circuit chip with signal processing capabilities. The processor can be a general-purpose processor, digital signal processor (DSP), application-specific integrated circuit (ASIC), field-programmable gate array (FPGA), or other programmable logic device, discrete gate or transistor logic device, or discrete hardware component. It can implement or execute the various methods, steps, and logic block diagrams disclosed in the embodiments of this disclosure. The general-purpose processor can be a microprocessor or any conventional processor, and can be an x86 architecture or an ARM architecture, etc.
[0184] Memory 1010 stores executable instructions that, when executed by processor 1020, implement the video processing method described above. Memory 1010 may be volatile memory or non-volatile memory, or may include both. Non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. Volatile memory may be random access memory (RAM), which serves as an external cache. By way of example, but not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous link dynamic random access memory (SLDRAM), and direct Rambus random access memory (DR RAM). It should be noted that the memory described herein is intended to include, but is not limited to, these and any other suitable types of memory.
[0185] This disclosure also proposes a non-transitory computer-readable storage medium storing a computer program that, when executed by a processor, can implement the steps of the video processing method described above.
[0186] Similarly, the non-transitory computer-readable storage media in the embodiments of this disclosure are intended to include, but are not limited to, the above and any other suitable types of memory.
[0187] This disclosure also proposes a computer program product that may include a computer program that, when executed by a processor, can implement the steps of the video processing method described above.
[0188] Instructions can be any set of instructions that will be executed directly by one or more processors, such as machine code, or any set of instructions that will be executed indirectly, such as a script. The terms “instruction,” “application,” “procedure,” “step,” “program,” and “computer program” used herein are used interchangeably. Instructions can be stored in object code format for direct processing by one or more processors, or stored in any other computer language, including scripts or sets of independent source code modules that are interpreted on demand or compiled in advance. The function, methods, and routines of instructions are explained in more detail in other parts of this document.
[0189] Figure 11 shows a schematic block diagram of a computer system 1100 on which embodiments of the present disclosure may be implemented. The computer system 1100 includes a bus 1110 or other communication mechanism for transmitting information, and a processing device 1120 coupled to the bus 1110 for processing information. The computer system 1100 also includes a storage device 1130 coupled to the bus 1110 for storing instructions to be executed by the processing device 1120. The storage device 1130 may include random access memory (RAM) or other dynamic storage devices, and may be used to store temporary variables or other intermediate information during the execution of instructions by the processing device 1120. The storage device 1130 may also include a read-only memory (ROM) or other static storage device for storing static information and instructions for the processing device 1120, as well as a device such as a magnetic disk or optical disk for storing information and instructions. The computer system 1100 may be coupled via the bus 1110 to an output device 1140 for providing output to a user, such as, but not limited to, a display (for example, a cathode ray tube (CRT) or liquid crystal display (LCD)) or speakers. An input device 1150, such as a keyboard, mouse, or microphone, is coupled to the bus 1110 for transmitting information and command selections to the processing device 1120. The computer system 1100 may execute embodiments of this disclosure. Consistent with certain implementations of this disclosure, the computer system 1100 provides results in response to the processing device 1120 executing one or more sequences of one or more instructions contained in the storage device 1130. Such instructions may be read into the storage device 1130 from another computer-readable medium, such as storage device 1130.
Execution of the sequence of instructions contained in storage device 1130 causes processing device 1120 to perform the methods described herein. Alternatively, hard-wired circuitry may be used in place of or in combination with software instructions to implement the teachings. Therefore, implementations of this disclosure are not limited to any particular combination of hardware circuitry and software. In various embodiments, computer system 1100 can be connected across a network to one or more other computer systems, such as computer system 1100, to form a networked system via network interface 1160. This network may include a private network or a public network such as the Internet. In a networked system, one or more computer systems may store data and supply data to other computer systems. As used herein, the term "computer-readable medium" refers to any medium that participates in providing instructions to processing device 1120 for execution. Such media can take many forms, including but not limited to non-volatile media, volatile media, and transmission media. Non-volatile media include, for example, optical discs or magnetic disks. Volatile media include dynamic memory such as RAM. Transmission media include coaxial cables, copper wires, and optical fibers, including the wires that make up bus 1110. Common forms of computer-readable media or computer program products include, for example, floppy disks, flexible disks, hard disks, magnetic tapes, or any other magnetic media, CD-ROMs, digital video discs (DVDs), Blu-ray discs, any other optical media, thumb drives, memory cards, RAM, PROMs and EPROMs, flash EPROMs, any other memory chips or cartridges, or any other tangible media from which a computer can read. Various forms of computer-readable media may be involved when carrying one or more sequences of one or more instructions to processing device 1120 for execution. For example, instructions may initially be carried on a disk of a remote computer.
The remote computer may load the instructions into its dynamic memory and transmit the instructions over a telephone line using a modem. A modem local to computer system 1100 may receive data over a telephone line and convert the data into an infrared signal using an infrared transmitter. An infrared detector coupled to bus 1110 may receive the data carried in the infrared signal and place the data on bus 1110. Bus 1110 carries the data to storage device 1130, from which processing device 1120 retrieves and executes the instructions. Optionally, the instructions received by the storage device 1130 may be stored on the storage device 1130 before or after they are executed by the processing device 1120.
[0190] According to various embodiments, instructions configured to be executed by processing device 1120 to perform a method are stored on a computer-readable medium. The computer-readable medium may be a device for storing digital information. For example, the computer-readable medium includes a compact disc read-only memory (CD-ROM) as known in the art for storing software. The computer-readable medium is accessed by a processor adapted to execute the instructions configured to be executed.
[0191] It should be noted that the flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this disclosure. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.
[0192] In general, the various exemplary embodiments of this disclosure can be implemented in hardware or dedicated circuitry, software, firmware, logic, or any combination thereof. Some aspects can be implemented in hardware, while others can be implemented in firmware or software that can be executed by a controller, microprocessor, or other computing device. When aspects of embodiments of this disclosure are illustrated or described as block diagrams, flowcharts, or using some other graphical representation, it will be understood that the blocks, apparatuses, systems, techniques, or methods described herein can be implemented as non-limiting examples in hardware, software, firmware, dedicated circuitry or logic, general-purpose hardware or controllers or other computing devices, or some combination thereof.
[0193] As used herein, the term “exemplary” means “used as an example, instance, or illustration” and not as a “model” to be exactly copied. Any implementation described herein by example is not necessarily to be construed as preferred or advantageous over other implementations.
[0194] Furthermore, terms such as “first,” “second,” etc., may be used in this document for reference purposes only and are not intended to be limiting. For example, unless the context clearly indicates otherwise, the words “first,” “second,” and other such numerical terms relating to structures or elements do not imply order or sequence.
[0195] It should also be understood that when the term “including / contains” is used herein, it indicates the presence of the indicated feature, whole, step, operation, unit and / or component, but does not preclude the presence or addition of one or more other features, wholes, steps, operations, units and / or components and / or combinations thereof.
[0196] In this disclosure, the term “provide” is used broadly to cover all ways of obtaining an object, and therefore “provide an object” includes, but is not limited to, “purchasing,” “preparing / manufacturing,” “arranging / setting up,” “installing / assembling,” and / or “ordering” an object.
[0197] As used herein, the term “and / or” includes any and all combinations of one or more of the listed items in association. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of this disclosure. As used herein, the singular forms “a,” “an,” and “the” are also intended to include the plural forms unless the context clearly indicates otherwise.
[0198] Those skilled in the art will recognize that the boundaries between the above operations are merely illustrative. Multiple operations may be combined into a single operation, a single operation may be distributed among additional operations, and operations may be performed with at least partial overlap in time. Moreover, alternative embodiments may include multiple instances of a particular operation, and the order of operations may be changed in various other embodiments. However, other modifications, variations, and substitutions are equally possible. Aspects and elements of all the embodiments disclosed above may be combined in any way and / or in combination with aspects or elements of other embodiments to provide multiple additional embodiments. Therefore, this specification and the accompanying drawings should be considered illustrative rather than restrictive.
[0199] While specific embodiments of this disclosure have been described in detail by way of example, those skilled in the art should understand that the examples are for illustrative purposes only and not intended to limit the scope of this disclosure. The various embodiments disclosed herein can be combined in any way without departing from the spirit and scope of this disclosure. Those skilled in the art should also understand that various modifications can be made to the embodiments without departing from the scope and spirit of this disclosure. The scope of this disclosure is defined by the appended claims.
Claims
1. A video processing method, wherein the video processing method is applied at a receiving end and comprises: in response to a first health score of a network at the current moment being less than or equal to a first threshold, discarding at least a portion of the video encoded frames in a buffer, and sending a request to obtain an instant decoding refresh frame to a sending end; and in response to receiving the instant decoding refresh frame sent by the sending end, decoding the instant decoding refresh frame; wherein the instant decoding refresh frame is obtained by encoding the target region in the original image frame based on a first quantization parameter and encoding the background region in the original image frame based on a second quantization parameter; the first quantization parameter does not change with the first health score, while the second quantization parameter is negatively correlated with the first health score; the second quantization parameter is determined based on an intermediate value obtained by summing the first quantization parameter and an increment, the increment being obtained by multiplying the difference between 1 and the first health score by a preset maximum quantization difference; when the intermediate value is less than the minimum quantization parameter value, the second quantization parameter is determined as the minimum quantization parameter value; when the intermediate value is greater than the maximum quantization parameter value, the second quantization parameter is determined as the maximum quantization parameter value; and when the intermediate value is greater than or equal to the minimum quantization parameter value and less than or equal to the maximum quantization parameter value, the second quantization parameter is determined as the intermediate value.
2. The video processing method of claim 1, wherein discarding at least a portion of the video encoded frames in the buffer includes: discarding all forward predictive coded frames and all bidirectional predictive coded frames in the buffer; or discarding all video encoded frames in the buffer.
3. The video processing method of claim 1, wherein the second quantization parameter increases exponentially as the first health score of the network decreases.
4. The video processing method of claim 1, wherein, The first health score is determined based on a first score, which is obtained based on the network's round-trip latency, jitter value, and packet loss rate at the current moment.
5. The video processing method of claim 4, wherein, The first score is obtained by summing the difference between 1 and a first ratio multiplied by a first weighting coefficient, the difference between 1 and a second ratio multiplied by a second weighting coefficient, and the difference between 1 and a third ratio multiplied by a third weighting coefficient. Wherein, the first ratio is the ratio of the round-trip time of the network at the current moment to the first normalized reference threshold, the second ratio is the ratio of the jitter value of the network at the current moment to the second normalized reference threshold, and the third ratio is the ratio of the packet loss rate of the network at the current moment to the third normalized reference threshold.
6. The video processing method of claim 5, wherein, The first weighting coefficient is greater than the second weighting coefficient and the third weighting coefficient.
7. The video processing method of claim 4, wherein, The first health score is determined based on the first score and the second health score of the network at the previous time, wherein the second health score is determined based on the round-trip delay, jitter value and packet loss rate of the network at the previous time.
8. The video processing method of claim 7, wherein, The first health score is determined by summing the difference between 1 and a preset smoothing factor multiplied by the second health score, and the preset smoothing factor multiplied by the first score.
9. The video processing method of claim 1, wherein, The video processing method further includes: The instantaneous decoding refresh frame is displayed on the display interface; A distorted mesh is generated and displayed on the display interface.
10. The video processing method of claim 9, wherein, Generating distorted meshes includes: Obtain the first coordinates of each vertex of the standard grid in the world coordinate system; Based on the mapping relationship between the world coordinate system and the pixel coordinate system of the display interface, the first coordinates of each vertex are mapped to obtain the second coordinates of each vertex in the pixel coordinate system; Based on the radial distortion coefficient and the tangential distortion coefficient, the second coordinates of each vertex in the pixel coordinate system are mapped to obtain the third coordinates of each vertex in the pixel coordinate system. The distorted mesh is generated based on the third coordinates of each vertex in the pixel coordinate system.
11. The video processing method according to claim 1, characterized in that, The target area includes the area containing at least one of the weld, bevel, and welding torch tip.
12. A video processing method, characterized in that the video processing method is applied at a sending end and comprises: in response to receiving a request from a receiving end to obtain an instant decoding refresh frame, encoding the target region in the original image frame based on a first quantization parameter and encoding the background region in the original image frame based on a second quantization parameter to obtain the instant decoding refresh frame, wherein the request is sent by the receiving end when a first health score of a network at the current moment is less than or equal to a first threshold, the first quantization parameter does not change with the first health score, and the second quantization parameter is negatively correlated with the first health score; and sending the instant decoding refresh frame to the receiving end, so that the receiving end, having discarded at least a portion of the video encoded frames in its buffer in response to the first health score being less than or equal to the first threshold, can decode the instant decoding refresh frame; wherein the second quantization parameter is determined based on an intermediate value obtained by summing the first quantization parameter and an increment, the increment being obtained by multiplying the difference between 1 and the first health score by a preset maximum quantization difference; when the intermediate value is less than the minimum quantization parameter value, the second quantization parameter is determined as the minimum quantization parameter value; when the intermediate value is greater than the maximum quantization parameter value, the second quantization parameter is determined as the maximum quantization parameter value; and when the intermediate value is greater than or equal to the minimum quantization parameter value and less than or equal to the maximum quantization parameter value, the second quantization parameter is determined as the intermediate value.
13. The video processing method according to claim 12, characterized in that the second quantization parameter increases exponentially as the first health score of the network decreases.
14. The video processing method according to claim 12, characterized in that, The first health score is determined based on a first score, which is obtained based on the network's round-trip latency, jitter value, and packet loss rate at the current moment.
15. The video processing method according to claim 14, characterized in that, The first score is obtained by summing the difference between 1 and a first ratio multiplied by a first weighting coefficient, the difference between 1 and a second ratio multiplied by a second weighting coefficient, and the difference between 1 and a third ratio multiplied by a third weighting coefficient. Wherein, the first ratio is the ratio of the round-trip time of the network at the current moment to the first normalized reference threshold, the second ratio is the ratio of the jitter value of the network at the current moment to the second normalized reference threshold, and the third ratio is the ratio of the packet loss rate of the network at the current moment to the third normalized reference threshold.
16. The video processing method according to claim 14, characterized in that, The first health score is determined based on the first score and the second health score of the network at the previous time, wherein the second health score is determined based on the round-trip delay, jitter value and packet loss rate of the network at the previous time.
17. A receiving device, characterized in that, The receiving device includes: A module configured to perform the video processing method according to any one of claims 1 to 11.
18. A transmitting device, characterized in that, The transmitting device includes: A module configured to perform the video processing method according to any one of claims 12 to 16.
19. A video processing apparatus, characterized in that, The video processing device includes: Memory; and A processor coupled to the memory is configured to execute the video processing method of any one of claims 1 to 16 based on instructions stored in the memory.
20. A computer-readable storage medium, characterized in that, The computer-readable storage medium includes a computer program, wherein the computer program, when executed by a processor, implements the steps of the video processing method according to any one of claims 1 to 16.
21. A computer program product, characterized in that, The computer program product includes a computer program, wherein when the computer program is executed by a processor, it implements the steps of the video processing method according to any one of claims 1 to 16.