Narrowband video transmission method and apparatus under limited bandwidth condition, and electronic device
By decoding high-resolution video streams and extracting structured information, low-bitrate video streams are generated. At the receiving end, generative adversarial networks are used for super-resolution reconstruction, which solves the problems of packet loss and stuttering in video transmission under bandwidth-constrained conditions and achieves stable and efficient narrowband video transmission.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- 709TH RESEARCH INSTITUTE CHINA STATE SHIPBUILDING CORP LTD
- Filing Date
- 2024-12-30
- Publication Date
- 2026-06-11
AI Technical Summary
When performing narrowband transmission under bandwidth-limited conditions, existing technologies are prone to generating compressed data packets that exceed the bandwidth limit, leading to video packet loss and stuttering.
By decoding and extracting structured information from high-resolution surveillance video streams, low-bitrate video streams are generated. Then, image quality enhancement is performed at the receiving end using super-resolution reconstruction technology based on generative adversarial networks, thus achieving stable transmission of the video stream.
It effectively avoids video packet loss and stuttering issues, ensures the clarity and smoothness of video transmission, reduces computational complexity, and improves real-time performance.
Description
Methods, apparatus and electronic devices for narrowband video transmission under bandwidth-constrained conditions
[Technical Field]
[0001] This application belongs to the field of narrowband transmission of streaming video, and more specifically relates to a narrowband video transmission method, apparatus and electronic device under bandwidth-constrained conditions.
[Background Technology]
[0002] Surveillance video is widely used in smart cities, regional security, emergency command, and maritime law enforcement. With the increasing environmental reconnaissance and perception capabilities of unmanned and mobile equipment, more and more wireless transmission links, such as satellite communication, antenna communication, or shortwave, are being added to existing wired video transmission topologies based on Ethernet or fiber optics. For command centers, to achieve integrated real-time video command and control between the front and back ends, and to enable coordinated manned and unmanned operations, it is essential to solve the stability and reliability issues of narrowband video stream transmission under bandwidth-constrained conditions. Therefore, most solutions address this problem by optimizing encoding and decoding algorithms to improve the compression ratio and reduce the video stream bitrate.
[0003] Currently, mainstream encoding and decoding algorithms mainly fall into two categories: rate-first and quality-first. Rate-first encoding uses a constant bit rate (CBR): the output bit rate is held fixed, but the image quality is unstable, especially in highly dynamic scenes. Quality-first encoding uses a variable bit rate (VBR), which allows the encoding bit rate to fluctuate within the bit rate statistics window, thereby keeping the quality of the encoded image stable. For relatively static scenes the resulting bit rate is very low, but it is prone to very high bit rates when the image changes significantly.
[0004] However, CBR, VBR, and even the hybrid Adaptive Variable Bit Rate (AVBR) and Constrained Variable Bit Rate (CVBR) modes are only meaningful when the bitrate is close to the actual bandwidth. Taking a satellite communication terminal as an example, the available bandwidth is typically around 2Mbps. Once other services are accounted for, the bandwidth allocated to the video channel is often only around 200Kbps. In contrast, a single HD network camera video stream typically reaches 4Mbps to 10Mbps, and a 720P video stream reaches 1Mbps. In the most extreme cases, the compression ratio of the H264 / H265 video needs to be raised by a further factor of nearly 20. Therefore, conventional video compression algorithms cannot meet the narrowband transmission requirements under bandwidth-constrained conditions. When the scene changes significantly, compressed data packets exceeding the bandwidth limit are easily generated, leading to packet loss and, in turn, screen tearing and stuttering.
[Summary of the Invention]
[0005] In view of the shortcomings of the prior art, the purpose of this application is to provide a video narrowband transmission method, apparatus and electronic device under bandwidth-limited conditions, which aims to solve the problem that the prior art is prone to generating compressed data packets that exceed the limited bandwidth when performing narrowband transmission under bandwidth-limited conditions, resulting in packet loss and causing screen tearing and stuttering.
[0006] To achieve the above objectives, in a first aspect, this application provides a method for narrowband video transmission under bandwidth-constrained conditions, applied at a transmitting end, comprising:
[0007] Acquire the target video stream and decode the target video stream to obtain a high-resolution original video stream;
[0008] Extract the structured information of dynamic targets from the original video stream;
[0009] The packet loss rate of the original video stream is obtained. If the packet loss rate is greater than 0, the original video stream is degraded and transcoded to generate a low bitrate video stream. The degraded and transcoded low bitrate video stream and the structured information are sent to the receiving end. Otherwise, the original video stream and the structured information are sent directly to the receiving end.
[0010] This application decodes the high-resolution surveillance video stream, extracts structured information from it, and then performs degradation transcoding to generate a low-bitrate video stream suitable for narrowband network environments, which is sent to the receiving end to avoid video packet loss and stuttering when network bandwidth is insufficient.
[0011] According to the video narrowband transmission method under bandwidth-constrained conditions provided in this application, the step of performing degradation transcoding on the original video stream includes:
[0012] The original video stream is scaled by 1 / 2, the frame rate and quantization parameters are adjusted, and the video is re-encoded based on constant bit rate CBR compression mode to generate a video stream with the target bit rate.
[0013] This application performs degradation transcoding on the high-resolution surveillance video stream and re-encodes it in CBR compression mode, generating a low-bitrate video stream suitable for narrowband network environments, which is sent to the receiving end to avoid video packet loss and stuttering when network bandwidth is insufficient.
[0014] According to the video narrowband transmission method under bandwidth-constrained conditions provided in this application, the method further includes:
[0015] Obtain the packet loss rate of the low bitrate video stream during transmission;
[0016] Based on the packet loss rate, adjust the target resolution and encoded output frame rate during the degraded transcoding process.
[0017] This application can automatically adjust the target resolution and encoding output frame rate of the degraded transcoding process according to the real-time packet loss rate of the video stream during transmission, which can maximize the utilization of bandwidth and ensure the clarity and smoothness of narrowband video transmission.
[0018] In a second aspect, this application provides a method for narrowband video transmission under bandwidth-constrained conditions, applied at a receiving end, including:
[0019] Receive the raw video stream or low bitrate video stream sent by the sending end, as well as structured information;
[0020] The low bitrate video stream is upscaled and its image quality enhanced based on the structured information.
[0021] This application, upon receiving a low-bitrate video stream at the receiving end, utilizes the structured information in the original resolution image to perform resolution upscaling and image quality enhancement on the low-resolution video stream.
[0022] According to the video narrowband transmission method under bandwidth-constrained conditions provided in this application, the step of performing resolution upscaling and image quality enhancement on the low bitrate video stream based on the structured information includes:
[0023] Based on the structured information, an image generator based on a Super-Resolution Generative Adversarial Network (SRGAN) is trained to obtain a trained image generator.
[0024] Based on the trained image generator and the deep learning-based super-resolution reconstruction technology, the low bitrate video stream is upscaled and its image quality enhanced.
[0025] This application utilizes the structured information in the original resolution image to train an SRGAN-based image generator. After receiving a low bitrate video stream at the receiving end, it performs super-resolution reconstruction without prior knowledge to restore the video. At the same time, it uses the trained image generator to generate realistic high-resolution images, thereby improving the super-resolution reconstruction effect and reducing computational complexity. This enhances the real-time performance of real-time video stream processing and avoids video packet loss and stuttering issues when applying high-definition narrowband video transmission.
[0026] According to the method for narrowband video transmission under bandwidth-constrained conditions provided in this application, the step of training an image generator based on a Super-Resolution Generative Adversarial Network (SRGAN) according to the structured information includes:
[0027] Initialize the image generator;
[0028] The discriminator is initialized based on the structured information.
[0029] The low bitrate video stream is used as input to the image generator to generate an image at the original resolution.
[0030] Extract the second structured information of the dynamic target from the original resolution image, and calculate the Euclidean distance between the second structured information and the structured information;
[0031] Based on the Euclidean distance, the network weights of the image generator and the discriminator are updated alternately, and they compete against each other to gradually improve the reconstruction ability of the image generator.
[0032] The image generator obtained after the model converges is used as the trained image generator.
[0033] In a third aspect, this application provides a narrowband video transmission device under bandwidth-constrained conditions, comprising:
[0034] The acquisition module is used to acquire the target video stream and decode the target video stream to obtain a high-resolution original video stream.
[0035] The extraction module is used to extract the structured information of dynamic targets in the original video stream;
[0036] The sending module is used to obtain the packet loss rate of the original video stream; if the packet loss rate is greater than 0, to perform degradation transcoding on the original video stream to generate a low bitrate video stream and send the low bitrate video stream and the structured information to the receiving end; otherwise, to send the original video stream and the structured information directly to the receiving end.
[0037] In a fourth aspect, this application provides a narrowband video transmission apparatus under bandwidth-constrained conditions, comprising:
[0038] The receiving module is used to receive the raw video stream or low bitrate video stream sent by the sending end, as well as structured information;
[0039] The upsampling module is used to increase the resolution and enhance the image quality of the low bitrate video stream based on the structured information.
[0040] In a fifth aspect, this application provides an electronic device, comprising: at least one memory for storing a program; and at least one processor for executing the program stored in the memory, wherein when the program stored in the memory is executed, the processor is configured to execute the narrowband video transmission method under bandwidth-constrained conditions described in the first aspect or any possible implementation thereof.
[0041] In a sixth aspect, this application provides a computer-readable storage medium storing a computer program that, when run on a processor, causes the processor to perform the narrowband video transmission method under bandwidth-constrained conditions described in the first aspect or any possible implementation thereof.
[0042] In a seventh aspect, this application provides a computer program product that, when run on a processor, causes the processor to execute the video narrowband transmission method under bandwidth-constrained conditions described in the first aspect or any possible implementation of the first aspect.
[0043] It is understood that the beneficial effects of the third to seventh aspects mentioned above can be found in the relevant descriptions of the first and second aspects mentioned above, and will not be repeated here.
[0044] Overall, the technical solutions conceived in this application have the following beneficial effects compared with the prior art:
[0045] (1) This application decodes and extracts structured information from high-resolution surveillance video streams, and then performs down-encoding to generate low-bitrate video streams suitable for narrowband network environments, which are then sent to the receiving end. This avoids video packet loss and stuttering problems that are prone to occur when bandwidth is limited. After receiving the low-bitrate video stream, the receiving end uses the structured information in the original resolution image to perform resolution upscaling and image quality enhancement on the low-resolution video stream.
[0046] (2) The target resolution and encoding output frame rate of the degraded transcoding process are automatically adjusted according to the real-time packet loss rate of the video stream during transmission, which can maximize the use of bandwidth and ensure the clarity and smoothness of narrowband video transmission.
[0047] (3) The structured information in the original resolution image is used to train the SRGAN-based image generator. After the receiving end receives the low bit rate video stream, the video is restored by super-resolution reconstruction without prior knowledge, and the trained image generator is used to generate realistic high-resolution images. This improves the super-resolution reconstruction effect, reduces computational complexity, improves the real-time performance of video stream processing, and avoids video packet loss and stuttering in high-definition narrowband video transmission applications.
[Description of the Drawings]
[0048] To more clearly illustrate the technical solutions in this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0049] Figure 1 is a first schematic flowchart of a narrowband video transmission method under bandwidth-constrained conditions provided in an embodiment of this application;
[0050] Figure 2 is a flowchart of the low-pass bitrate control module provided in the embodiment of this application;
[0051] Figure 3 is a second schematic flowchart of a narrowband video transmission method under bandwidth-constrained conditions provided in an embodiment of this application;
[0052] Figure 4 is an application block diagram of the narrowband transmission system provided in an embodiment of this application;
[0053] Figure 5 is a flowchart of the training steps of the SRGAN-based image generator provided in the embodiments of this application;
[0054] Figure 6 is a flow diagram of video stream data transmission and processing provided in an embodiment of this application;
[0055] Figure 7 is a first structural schematic diagram of a narrowband video transmission device under bandwidth-limited conditions provided in an embodiment of this application;
[0056] Figure 8 is a second schematic diagram of the video narrowband transmission device under bandwidth-limited conditions provided in the embodiments of this application;
[0057] Figure 9 is a schematic diagram of the structure of the electronic device provided in an embodiment of this application.
[Detailed Implementation Methods]
[0058] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.
[0059] In this application, the term "and / or" describes the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A existing alone, A and B existing simultaneously, or B existing alone. The symbol " / " in this application indicates that the related objects are in an "or" relationship; for example, A / B means A or B.
[0060] In the embodiments of this application, the terms "exemplary" or "for example" are used to indicate that something is an example, illustration, or description. Any embodiment or design that is described as "exemplary" or "for example" in the embodiments of this application should not be construed as being more preferred or advantageous than other embodiments or designs. Rather, the use of the terms "exemplary" or "for example" is intended to present the relevant concepts in a specific manner.
[0061] In the description of the embodiments of this application, unless otherwise stated, "multiple" means two or more, for example, multiple processing units means two or more processing units, multiple elements means two or more elements, etc.
[0062] Next, the video narrowband transmission method under bandwidth-constrained conditions provided in the embodiments of this application will be introduced with reference to Figures 1-6.
[0063] Figure 1 is a first schematic flowchart of the narrowband video transmission method under bandwidth-constrained conditions provided in an embodiment of this application. As shown in Figure 1, the method is applied at the transmitting end and includes the following steps:
[0064] Step 100: Acquire the target video stream and decode it to obtain a high-resolution original video stream;
[0065] Video is transmitted between the sending and receiving ends. Before transmission, the sending end first needs to acquire the target video stream.
[0066] Optionally, the target video stream can be any video stream that needs to be transmitted. In one embodiment of this application, the target video stream is a target surveillance video stream.
[0067] Optionally, this application does not limit the method by which the sending end obtains the target video stream; it may be receiving the target video stream sent by other devices or directly importing the target video stream into the sending end, etc.
[0068] After acquiring the target video stream, the sending end decodes it to obtain the high-resolution original image.
[0069] In one embodiment of this application, after the sending end obtains the monitoring video stream from a network camera such as an Internet Protocol Camera (IPC), it decodes it based on H264 / H265 to obtain the original resolution image, such as 1080P@30fps or 1080P@60fps.
[0070] Step 110: Extract the structured information of dynamic targets in the original video stream;
[0071] After acquiring the decoded high-resolution image, the sending end extracts the structured information of dynamic targets in the image.
[0072] Optionally, the structured information includes features such as target semantics, location, and contour.
[0073] In one embodiment of this application, taking traffic monitoring footage as an example, the main objects are moving targets such as people and vehicles. People and vehicles in the surveillance video can be detected with a detection algorithm such as YOLOv5, the target contours in the image can be extracted, and the attribute features can then be extracted using the Faster R-CNN network.
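[0074] The basic structured information of a target can be defined in the following form:
[formula not reproduced in this text]
[0075] where the first symbol of the formula denotes the multidimensional attribute feature data, y_j denotes the j-th target object, and M denotes the number of targets.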
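As an illustration only, the structured information described in paragraphs [0072] to [0075] could be held in a record such as the one below. The class name `TargetInfo`, its field names, and the JSON serialization are assumptions made for this sketch; the application states only that the information covers target semantics, location, contour, and attribute features for M targets.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List, Tuple


@dataclass
class TargetInfo:
    """One dynamic target; every field name here is hypothetical."""
    semantic_label: str                  # target semantics, e.g. "person" or "vehicle"
    bbox: Tuple[int, int, int, int]      # location as (x, y, width, height) in pixels
    contour: List[Tuple[int, int]]       # contour as a list of (x, y) points
    attributes: List[float] = field(default_factory=list)  # attribute feature vector


def pack_frame_info(frame_index: int, targets: List[TargetInfo]) -> bytes:
    """Serialize the structured information of one frame for transmission."""
    payload = {
        "frame": frame_index,
        "count": len(targets),           # M, the number of targets
        "targets": [asdict(t) for t in targets],
    }
    # Compact separators keep the side channel small on a narrowband link.
    return json.dumps(payload, separators=(",", ":")).encode("utf-8")
```

A compact text encoding is used here only for readability; a binary encoding would likely be preferable on a link of a few hundred kbps.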
[0076] Step 120: Obtain the packet loss rate of the original video stream. If the packet loss rate is greater than 0, perform degradation transcoding on the original video stream to generate a low bitrate video stream, then send the low bitrate video stream and the structured information to the receiving end. Otherwise, send the original video stream and the structured information directly to the receiving end.
[0077] Since the video transmission method of this application is completed under narrowband conditions, it is necessary to determine whether the sent video stream can be transmitted normally. This application makes the determination by obtaining the packet loss rate of the current bitrate video stream. When the packet loss rate is greater than 0, the video stream is degraded and transcoded, that is, the resolution and bitrate of the video stream are reduced so that it can be transmitted under narrowband conditions. Then it is sent to the receiving end along with the structured information. If the packet loss rate is not greater than 0, it is sent directly to the receiving end along with the structured information.
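The decision in step 120 can be sketched as follows. This is a minimal illustration, not the application's implementation: the function name `select_payload`, the dictionary keys, and the `transcode` callable are hypothetical stand-ins for the sending end's modules.

```python
from typing import Callable, Dict


def select_payload(
    packet_loss_rate: float,
    original_stream: bytes,
    structured_info: bytes,
    transcode: Callable[[bytes], bytes],
) -> Dict[str, object]:
    """Choose what the sending end transmits, following step 120."""
    if packet_loss_rate > 0:
        # Loss was observed: lower resolution and bitrate before sending.
        kind, video = "low_bitrate", transcode(original_stream)
    else:
        # No loss: the link can carry the original stream unchanged.
        kind, video = "original", original_stream
    # The structured information accompanies the video in both branches.
    return {"kind": kind, "video": video, "structured_info": structured_info}
```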
[0078] This application provides a video narrowband transmission method that decodes and extracts structured information from a high-resolution surveillance video stream, and then performs degradation transcoding to generate a low-bitrate video stream suitable for narrowband network environments, which is then sent to the receiving end to avoid video packet loss and stuttering problems when network bandwidth is insufficient.
[0079] In some embodiments, the degradation transcoding process of the original video stream in step 120 specifically includes:
[0080] Step 1201: Scale the original video stream by 1 / 2, adjust the frame rate and quantization parameters, and re-encode the video based on constant bit rate CBR compression mode to generate a video stream with the target bit rate.
[0081] This application performs degradation transcoding on high-resolution surveillance video streams, including 1 / 2 scaling, adjusting frame rate and quantization parameters, and video encoding based on CBR compression mode, to generate a low bitrate video stream suitable for narrowband network environments and send it to the receiving end, thus avoiding video packet loss and stuttering problems when network bandwidth is insufficient.
[0082] In one embodiment of this application, taking 1080P as an example, the transmitting end uses the resolution scaling function of the HiSilicon built-in Video Processing Sub-System (VPSS) to reduce the resolution to D1, lowers the frame rate, and increases the quantizer parameter (QP) value to adjust the resolution and image quality. It then performs initial compression in CBR mode, reducing the bit rate from an average of 4Mbps to a constant rate that peaks below 500kbps, so that the stream passes smoothly through rate-limited communication devices. The bit stream is then sent end-to-end to the narrowband switching device over the Real-time Transport Protocol (RTP) streaming media protocol, and finally delivered to the receiving end.
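For readers without HiSilicon hardware, a comparable degradation transcoding can be approximated in software. The sketch below swaps in FFmpeg with the libx264 encoder, which is not what the application uses; it only builds the command line. The function name, the 15 fps default, and the buffer size of twice the bitrate are assumptions, and QP adjustment is not shown.

```python
from typing import List


def build_cbr_transcode_cmd(
    src: str, dst: str, bitrate_kbps: int = 500, fps: int = 15
) -> List[str]:
    """Build an FFmpeg command for 1/2 scaling plus near-constant-bitrate H.264."""
    rate = f"{bitrate_kbps}k"
    return [
        "ffmpeg", "-i", src,
        "-vf", "scale=iw/2:ih/2",        # halve width and height (1920x1080 -> 960x540)
        "-r", str(fps),                  # lower the output frame rate
        "-c:v", "libx264",
        # Equal target, minimum and maximum rates approximate CBR output.
        "-b:v", rate, "-minrate", rate, "-maxrate", rate,
        "-bufsize", f"{2 * bitrate_kbps}k",
        "-an",                           # drop audio; the RTP muxer carries one stream
        "-f", "rtp", dst,
    ]
```

Passing the returned list to `subprocess.run` would start the transcode, for example from an RTSP camera URL to an `rtp://host:port` destination.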
[0083] In some embodiments, the method further includes:
[0084] Step 130: Obtain the packet loss rate of the low bitrate video stream during transmission;
[0085] Step 140: Based on the packet loss rate, adjust the target resolution and encoded output frame rate during the degraded transcoding process.
[0086] After receiving a low-bitrate video stream, the receiving end can statistically analyze the packet loss rate of the low-bitrate video stream during transmission and send it back to the sending end.
[0087] After receiving the packet loss rate, the sending end uses it to assess how the current constant bitrate video stream is faring on the narrowband transmission network. When the packet loss rate is greater than 0, it outputs a reduced target resolution to the constant low bitrate stream generation module and lowers the encoding output frame rate to reduce the bit rate. When the packet loss rate is 0, it raises the target resolution of the constant low bitrate stream generation module and the encoding output frame rate to maximize the utilization of the narrowband bandwidth.
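The application does not say how the receiving end computes the packet loss rate. One common approach with RTP is to compare the sequence numbers that arrived against the span they cover. The sketch below assumes this approach, assumes the first packet received is the earliest of the reporting interval, and assumes the 16-bit sequence number wraps at most once per interval.

```python
from typing import Iterable


def packet_loss_rate(received_seq: Iterable[int]) -> float:
    """Estimate the loss fraction over one interval from RTP sequence numbers."""
    seqs = list(received_seq)
    if not seqs:
        raise ValueError("no packets received in this interval")
    base = seqs[0]
    # Offsets relative to the first packet, modulo 2**16 to survive wraparound.
    offsets = {(s - base) % 65536 for s in seqs}
    expected = max(offsets) + 1          # packets the sender must have emitted
    lost = expected - len(offsets)       # gaps in the observed sequence
    return lost / expected
```

The resulting value is what the receiving end would report back to the sending end for steps 130 and 140.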
[0088] Figure 2 is a flowchart of the low-pass bitrate control module provided in this application. As shown in Figure 2, in one embodiment of this application, a low-pass bitrate control module adjusts the target resolution and the encoded output frame rate during the degradation transcoding process. Specifically, let the current packet loss rate be R, the scaling factor be S, and the bitrate be C. If R is greater than 0, the scaling factor is adjusted to S / 2. The target resolution is then calculated from the original resolution and the scaling factor, and the image is scaled to that target resolution. The frame rate is lowered and the quantizer parameter (QP) is adjusted to reduce the bitrate (a higher QP value, as in paragraph [0082]), and CBR mode is used for encoding, with an output bitrate of C / 2.
[0089] If R is not greater than 0, the scaling factor is adjusted to 2S, and capped at 1 if 2S > 1. The target resolution is then calculated from the original resolution and the scaling factor, and the image is scaled to that target resolution. The frame rate is raised and the QP is adjusted to improve image quality (a lower QP value), and CBR mode is used for encoding, with an output bitrate of 2C.
[0090] The above adaptive adjustment method can achieve maximum utilization of narrowband bandwidth.
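The adjustment rule of paragraphs [0088] and [0089] can be written out as a short sketch. The names `EncoderState` and `adjust` are hypothetical; frame rate and QP changes are omitted because the application gives no numeric rule for them.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class EncoderState:
    scale: float          # S, fraction of the original resolution (1.0 = original)
    bitrate_kbps: float   # C, the CBR output bitrate


def adjust(
    state: EncoderState, loss_rate: float, original_size: Tuple[int, int]
) -> Tuple[EncoderState, Tuple[int, int]]:
    """Apply one control step and return the new state and target resolution."""
    if loss_rate > 0:
        scale = state.scale / 2                # S -> S / 2
        bitrate = state.bitrate_kbps / 2       # C -> C / 2
    else:
        scale = min(1.0, state.scale * 2)      # S -> 2S, capped at 1
        bitrate = state.bitrate_kbps * 2       # C -> 2C
    width, height = original_size
    target = (int(width * scale), int(height * scale))
    return EncoderState(scale, bitrate), target
```

For example, starting from S = 0.5 and C = 500 kbps on a 1920x1080 source, a lossy interval yields S = 0.25, C = 250 kbps and a 480x270 target. The text specifies neither a lower bound on S nor an upper bound on C, so this sketch imposes none; a practical controller would probably need both.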
[0091] Figure 3 is a second schematic flowchart of a narrowband video transmission method under bandwidth-constrained conditions provided in an embodiment of this application. As shown in Figure 3, the method is applied at the receiving end and includes the following steps:
[0092] Step 300: Receive the original video stream or low bitrate video stream sent by the sending end, as well as structured information;
[0093] The receiving end is a terminal that receives video. It can receive the original video stream or low bitrate video stream sent by the sending end in a narrowband environment, as well as structured information.
[0094] In one embodiment, both the transmitting and receiving ends use the HiSilicon 3531DV200 as the main chip for encoding / decoding and video edge processing. This chip supports H264 and H265 encoding / decoding up to 4K@30fps and provides built-in neural network acceleration of 1.5TFlops. It supports models such as ResNet and YOLOv5, and can segment target contours in the monitored image and extract target features.
[0095] Step 310: Based on structured information, the low bitrate video stream is upscaled and its image quality enhanced.
[0096] After receiving the low bitrate video stream and structured information, the receiving end can perform resolution upscaling and image quality enhancement on the low bitrate video stream based on the structured information.
[0097] Figure 4 is an application block diagram of the narrowband transmission system provided in an embodiment of this application. As shown in Figure 4, in one embodiment of this application, the sending end is a sending end video transcoding device and the receiving end is a receiving end video enhancement device. The sending end video transcoding device receives the IPC video stream and sends the video stream to the receiving end video enhancement device through a rate-limiting communication device.
[0098] In some embodiments, step 310 specifically includes:
[0099] Step 3101: Based on the structured information, train an image generator based on a Super-Resolution Generative Adversarial Network (SRGAN) to obtain the trained image generator.
[0100] Step 3102: Based on the trained image generator and deep learning-based super-resolution reconstruction technology, the low bitrate video stream is upscaled and its image quality enhanced.
[0101] In one embodiment of this application, the receiving end receives the complete low-resolution video stream and decodes it using the built-in Video Decoder (VDEC) function of the Hi3531DV200 to generate a D1 resolution image. The VPSS then magnifies the image, but the result is relatively blurry: the low-resolution compression at the sending end discards a large amount of pixel information, so traditional interpolation alone cannot restore detail well.
[0102] This application employs deep learning-based super-resolution reconstruction technology. Through training with a large amount of real-world data, it performs inference and filling in details to make blurry images clearer. Image super-resolution reconstruction technology refers to restoring a given low-resolution image to a corresponding high-resolution image using a specific algorithm.
[0103] In one embodiment of this application, the Media Processing Platform (MPP) of the HiSilicon 3531DV200 supports frameworks and models such as TensorFlow and ResNet. The trained model can be quantized and pruned for the embedded platform, removing redundant parameters and components with minimal impact on the results, so that it is lightweight enough to port. The model is then used for image enhancement. The deep learning-based super-resolution reconstruction method aims to restore degraded low-resolution real-world images to high-resolution reconstructed images while maintaining a certain level of real-time performance.
[0104] Since surveillance video covers a wide variety of scenes, super-resolution is often performed without prior knowledge, which limits the effectiveness of the enhancement. To improve the clarity of super-resolution enhancement, this application employs a Super-Resolution Generative Adversarial Network (SRGAN) to generate realistic high-resolution images. After the SRGAN-based image generator module generates an image at the original resolution, a discriminator is used to distinguish real images from generated ones. The most crucial part of the loss function in the adversarial network is the content loss, which is computed as the difference between the generated image and the real image in feature space and is used to judge authenticity.
[0105] To train an SRGAN-based image generator, content loss can be represented by the Euclidean distance between the structured information sent by the transmitter and the reconstructed structured information. This method allows the generator to be trained using real surveillance footage, while also ensuring real-time performance and versatility during image changes. The structured information of the real image and the compressed image are transmitted to the receiver in parallel, which ensures improved super-resolution reconstruction results and reduced computational complexity during super-resolution and detail enhancement, thereby improving the real-time performance of real-time video stream processing.
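The content loss of paragraph [0105] can be illustrated with plain feature vectors. In this sketch each target is represented by a list of floats, targets are assumed to be already matched by position between the two sets, and the per-target distances are averaged; the averaging and the function name are assumptions, since the application states only that a Euclidean distance is used.

```python
import math
from typing import Sequence


def content_loss(
    sent: Sequence[Sequence[float]], reconstructed: Sequence[Sequence[float]]
) -> float:
    """Mean Euclidean distance between sent and reconstructed target features."""
    if len(sent) != len(reconstructed):
        # Matching targets across the two sets is outside this sketch.
        raise ValueError("target counts differ")
    if not sent:
        return 0.0
    distances = [math.dist(a, b) for a, b in zip(sent, reconstructed)]
    return sum(distances) / len(distances)
```

A value near zero means the structured information extracted from the reconstructed image agrees closely with what the sending end extracted from the original image.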
[0106] In some embodiments, step 3101 specifically includes:
[0107] Step 31011: Initialize the image generator;
[0108] Step 31012: Initialize the discriminator based on the structured information;
[0109] Step 31013: Use the low bitrate video stream as input to the image generator to generate an image at the original resolution;
[0110] Step 31014: Extract second structured information of the dynamic target in the original-resolution image, and calculate the Euclidean distance between the second structured information and the structured information;
[0111] Step 31015: Based on the Euclidean distance, alternately update the network weights of the image generator and the discriminator, so that the two networks compete with each other and the reconstruction ability of the image generator gradually improves;
[0112] Step 31016: Obtain the converged image generator, which serves as the trained image generator.
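The alternating update in the steps above can be sketched on a toy scalar problem. Everything in this sketch is an assumption made for illustration: images are single numbers, the generator is a single gain `w`, the discriminator is a logistic unit, `extract_info` is an identity stand-in for structured-information extraction, and the weight clipping, learning rate, and adversarial weight are chosen only to keep the toy stable. It is not the network architecture of the application.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def clip(v, lo=-1.0, hi=1.0):
    return max(lo, min(hi, v))


def extract_info(image):
    # Hypothetical stand-in for structured-information extraction:
    # the scalar "image" is its own feature.
    return image


def train_generator(low_res, sent_info, real_images,
                    steps=400, lr=0.1, adv_weight=0.01):
    w = 0.0          # generator weight: G(x) = w * x
    a, b = 0.0, 0.0  # discriminator weights: D(y) = sigmoid(a * y + b)
    for t in range(steps):
        i = t % len(low_res)
        x, s, y_real = low_res[i], sent_info[i], real_images[i]

        # Discriminator step: minimise -log D(real) - log(1 - D(fake)).
        y_fake = w * x
        d_real = sigmoid(a * y_real + b)
        d_fake = sigmoid(a * y_fake + b)
        a = clip(a - lr * ((d_real - 1.0) * y_real + d_fake * y_fake))
        b = clip(b - lr * ((d_real - 1.0) + d_fake))

        # Generator step: squared Euclidean distance between the sent and
        # re-extracted information, plus a small adversarial term.
        d_fake = sigmoid(a * y_fake + b)
        diff = extract_info(y_fake) - s
        grad_w = 2.0 * diff * x - adv_weight * (1.0 - d_fake) * a * x
        w -= lr * grad_w
    return w


# Toy data: the "real" image is twice the low-resolution value.
low = [0.5, 0.75, 1.0, 1.25, 1.5]
real = [2.0 * x for x in low]
sent = [extract_info(y) for y in real]
w = train_generator(low, sent, real)
```

With this data the learned gain should end close to 2, because the content term dominates and the clipped discriminator can only nudge the generator slightly.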
[0113] Figure 5 is a flowchart of the training steps for an SRGAN-based image generator provided in an embodiment of this application. As shown in Figure 5, in one embodiment of this application, the training steps for the image generator are as follows:
[0114] Step 1: Initialize the generator on the receiving end video enhancement device;
[0115] Step 2: Initialize the discriminator based on the structured feature information data of the raw image received in real time;
[0116] Step 3: Use the received low-resolution video stream as input to the generator to generate an image at the original resolution.
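[0117] Step 4: Calculate the Euclidean distance between the structured information of the generated original-resolution image and the received structured information to determine the authenticity of the generated image.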
[0118] Step 5: Alternately update the network weights of the generator and discriminator, allowing them to compete against each other and gradually improve the generator's reconstruction capability.
[0119] Step 6: Obtain the generator after the model converges.
[0120] Figure 6 is a video stream data transmission and processing flow diagram provided in an embodiment of this application. As shown in Figure 6, in one embodiment of this application, the sending end includes a raw image acquisition module, a structured extraction module, a constant low bitrate stream generation module, and a low-pass bitrate control module; the receiving end includes a video stream decoding module and an SR reconstruction module, wherein:
[0121] The raw image acquisition module completes the extraction and decoding of the encoded high-resolution video stream to obtain the high-resolution raw image;
[0122] The structured extraction module extracts structured information of targets in the monitoring screen, including target semantics, location, and contour features;
[0123] The constant low bitrate stream generation module performs degradation processing on high-resolution images and encodes them using CBR mode to output low bitrate video streams.
[0124] The low-pass bitrate control module diagnoses the packet loss rate of the current constant bitrate video stream in real time, dynamically adjusts the downsampling resolution, and adjusts the frame rate and QP quantization parameters of the encoded output.
[0125] The video stream decoding module decodes the received constant-bit-rate low-resolution video stream and passes the decoded image to the SR reconstruction module;
[0126] The SR reconstruction module performs resolution upscaling and image quality enhancement on the decoded image.
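The behaviour of the low-pass bitrate control module can be sketched as a small feedback rule. The step sizes, the lower bounds, and the QP ceiling of 51 (the upper limit of the QP range in H.264 and H.265) are assumptions chosen for illustration, as are the names `EncoderSettings` and `adjust_settings`; the application only states that the downsampling resolution, frame rate, and QP are adjusted according to the measured packet loss rate.

```python
from dataclasses import dataclass


@dataclass
class EncoderSettings:
    scale: float  # downsampling factor relative to the original resolution
    fps: int      # encoded output frame rate
    qp: int       # quantization parameter (higher means coarser)


def adjust_settings(current, loss_rate,
                    min_scale=0.25, min_fps=5, max_qp=51):
    """Lower the encoder load when packet loss is observed; otherwise keep it."""
    if loss_rate <= 0.0:
        return current
    return EncoderSettings(
        scale=max(min_scale, current.scale / 2),
        fps=max(min_fps, current.fps - 5),
        qp=min(max_qp, current.qp + 2),
    )


# Example: 10% loss on a half-resolution, 25 fps, QP 30 stream.
settings = adjust_settings(EncoderSettings(scale=0.5, fps=25, qp=30), 0.10)
```

Calling `adjust_settings` once per measurement interval lets the encoder back off step by step until the constant-bit-rate stream fits the available bandwidth.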
[0127] Figure 7 is a first structural schematic diagram of a narrowband video transmission device under bandwidth-constrained conditions provided in an embodiment of this application. As shown in Figure 7, the device includes an acquisition module 710, an extraction module 720, and a sending module 730, wherein:
[0128] The acquisition module 710 is used to acquire the target video stream and decode the target video stream to obtain a high-resolution original video stream.
[0129] Extraction module 720 is used to extract structured information of dynamic targets in the original video stream;
[0130] The sending module 730 is used to obtain the packet loss rate of the original video stream. If the packet loss rate is greater than 0, the original video stream is degraded and transcoded to generate a low bitrate video stream, and the low bitrate video stream and the structured information are sent to the receiving end. Otherwise, the original video stream and the structured information are sent directly to the receiving end.
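The decision made by the sending module can be sketched as a single branch on the packet loss rate. The function name `select_payload` and the `degrade` callback are hypothetical; `degrade` stands in for the degraded transcoding step, whose implementation is not shown here.

```python
def select_payload(original_stream, structured_info, loss_rate, degrade):
    """Return the (stream, structured_info) pair to send to the receiving end."""
    if loss_rate > 0:
        # Packet loss observed: send the degraded, transcoded stream.
        return degrade(original_stream), structured_info
    # No packet loss: send the original stream unchanged.
    return original_stream, structured_info


# Example with a placeholder transcoder that only tags the stream.
def fake_degrade(stream):
    return "low:" + stream


lossy = select_payload("stream", {"targets": 2}, 0.05, fake_degrade)
clean = select_payload("stream", {"targets": 2}, 0.0, fake_degrade)
```

In both branches the structured information travels alongside the video, so the receiving end always has the reference it needs for reconstruction.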
[0131] It should be understood that the above-described device is used to execute the methods in the above embodiments. The implementation principle and technical effect of the corresponding program modules in the device are similar to those described in the above methods. The working process of the device can be referred to the corresponding process in the above methods, and will not be repeated here.
[0132] Figure 8 is a second structural schematic diagram of a narrowband video transmission device under bandwidth-constrained conditions provided in an embodiment of this application. As shown in Figure 8, the device includes a receiving module 810 and an upsampling module 820, wherein:
[0133] The receiving module 810 is used to receive the original video stream or low bitrate video stream sent by the sending end, as well as structured information;
[0134] The upsampling module 820 is used to upscale and enhance the image quality of low bitrate video streams based on structured information.
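The receiving-side flow can be sketched with a plain nearest-neighbour upscaler standing in for the trained generator. This is a baseline substitute chosen so that the example runs without a trained model, not the SRGAN-based reconstruction of this application; the function names, the list-of-rows frame format, and the `is_low_bitrate` flag are assumptions made for illustration.

```python
def upscale_nearest(frame, factor=2):
    """Repeat every pixel `factor` times horizontally and vertically."""
    upscaled = []
    for row in frame:
        wide_row = [pixel for pixel in row for _ in range(factor)]
        upscaled.extend(list(wide_row) for _ in range(factor))
    return upscaled


def reconstruct(frame, is_low_bitrate, upscaler=upscale_nearest):
    """Upscale low bitrate frames; pass original-resolution frames through."""
    return upscaler(frame) if is_low_bitrate else frame


# Example: a 2x2 frame that was downscaled by 1/2 at the sending end.
small = [[1, 2],
         [3, 4]]
restored = reconstruct(small, is_low_bitrate=True)
```

Passing a learned generator as `upscaler` would keep the same control flow while replacing the pixel repetition with detail-enhancing reconstruction.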
[0135] It should be understood that the above-described device is used to execute the methods in the above embodiments. The implementation principle and technical effect of the corresponding program modules in the device are similar to those described in the above methods. The working process of the device can be referred to the corresponding process in the above methods, and will not be repeated here.
[0136] Based on the methods in the above embodiments, Figure 9 illustrates a schematic diagram of the physical structure of an electronic device. As shown in Figure 9, this application embodiment provides an electronic device that may include: a processor 910, a communication interface 920, a memory 930, and a communication bus 940. The processor 910, communication interface 920, and memory 930 communicate with each other via the communication bus 940. The processor 910 can call logical instructions in the memory 930 to execute the narrowband video transmission method under bandwidth-constrained conditions described in the above embodiments.
[0137] Furthermore, the logical instructions in the aforementioned memory 930 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or a portion of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the video narrowband transmission method under bandwidth-constrained conditions described in the various embodiments of this application.
[0138] Based on the methods in the above embodiments, this application provides a computer-readable storage medium storing a computer program that, when run on a processor, causes the processor to execute the video narrowband transmission method under bandwidth-constrained conditions described in the above embodiments.
[0139] Based on the methods in the above embodiments, this application provides a computer program product that, when running on a processor, causes the processor to execute the video narrowband transmission method under bandwidth-limited conditions described in the above embodiments.
[0140] It is understood that the processor in the embodiments of this application can be a central processing unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, transistor logic devices, hardware components, or any combination thereof. A general-purpose processor can be a microprocessor or any conventional processor.
[0141] The method steps in this application embodiment can be implemented in hardware or by a processor executing software instructions. The software instructions can consist of corresponding software modules, which can be stored in random access memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, hard disks, portable hard disks, CD-ROMs, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor, enabling the processor to read information from and write information to the storage medium. Of course, the storage medium can also be a component of the processor. The processor and the storage medium can reside in an ASIC.
[0142] In the above embodiments, implementation can be achieved entirely or partially through software, hardware, firmware, or any combination thereof. When implemented using software, it can be implemented entirely or partially as a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of this application are generated. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions can be stored in a computer-readable storage medium or transmitted through the computer-readable storage medium. The computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave, etc.) means. The computer-readable storage medium can be any available medium that a computer can access or a data storage device such as a server or data center that integrates one or more available media. The available medium can be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid-state disk (SSD)).
[0143] It is understood that the various numerical designations used in the embodiments of this application are merely for the convenience of description and are not intended to limit the scope of the embodiments of this application.
[0144] Those skilled in the art will readily understand that the above description is merely a preferred embodiment of this application and is not intended to limit this application. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of this application should be included within the scope of protection of this application.
Claims
1. A method for narrowband video transmission under bandwidth-constrained conditions, characterized in that the method is applied to the sending end and includes: acquiring a target video stream and decoding the target video stream to obtain a high-resolution original video stream; extracting structured information of dynamic targets from the original video stream; and obtaining the packet loss rate of the original video stream, wherein, if the packet loss rate is greater than 0, the original video stream is degraded and transcoded to generate a low bitrate video stream, and the low bitrate video stream and the structured information are sent to the receiving end; otherwise, the original video stream and the structured information are sent directly to the receiving end.
2. The video narrowband transmission method under bandwidth-constrained conditions according to claim 1, characterized in that degrading and transcoding the original video stream includes: scaling the original video stream by 1/2, adjusting the frame rate and quantization parameters, and re-encoding the video in constant bit rate (CBR) compression mode to generate a video stream with the target bit rate.
3. The video narrowband transmission method under bandwidth-constrained conditions according to claim 2, characterized in that the method further includes: obtaining the packet loss rate of the low bitrate video stream during transmission; and, based on the packet loss rate, adjusting the target resolution and the encoded output frame rate used in the degraded transcoding process.
4. A method for narrowband video transmission under bandwidth-constrained conditions, characterized in that the method is applied at the receiving end and includes: receiving the original video stream or the low bitrate video stream sent by the sending end, as well as structured information; and performing resolution amplification and image quality enhancement on the low bitrate video stream based on the structured information.
5. The video narrowband transmission method under bandwidth-constrained conditions according to claim 4, characterized in that performing resolution amplification and image quality enhancement on the low bitrate video stream based on the structured information includes: training, based on the structured information, an image generator based on a Super-Resolution Generative Adversarial Network (SRGAN) to obtain a trained image generator; and, based on the trained image generator and deep learning-based super-resolution reconstruction technology, performing resolution amplification and image quality enhancement on the low bitrate video stream.
6. The video narrowband transmission method under bandwidth-constrained conditions according to claim 5, characterized in that training, based on the structured information, the image generator based on the Super-Resolution Generative Adversarial Network (SRGAN) includes: initializing the image generator; initializing the discriminator based on the structured information; using the low bitrate video stream as input to the image generator to generate an image at the original resolution; extracting second structured information of the dynamic target in the original-resolution image, and calculating the Euclidean distance between the second structured information and the structured information; based on the Euclidean distance, alternately updating the network weights of the image generator and the discriminator, so that the two compete with each other and the reconstruction ability of the image generator gradually improves; and using the image generator obtained after the model converges as the trained image generator.
7. A narrowband video transmission device under bandwidth-constrained conditions, characterized in that the device includes: an acquisition module, used to acquire a target video stream and decode the target video stream to obtain a high-resolution original video stream; an extraction module, used to extract structured information of dynamic targets in the original video stream; and a sending module, used to obtain the packet loss rate of the original video stream and, if the packet loss rate is greater than 0, to degrade and transcode the original video stream to generate a low bitrate video stream and send the low bitrate video stream and the structured information to the receiving end; otherwise, to send the original video stream and the structured information directly to the receiving end.
8. A narrowband video transmission device under bandwidth-constrained conditions, characterized in that the device includes: a receiving module, used to receive the original video stream or the low bitrate video stream sent by the sending end, as well as structured information; and an upsampling module, used to perform resolution amplification and image quality enhancement on the low bitrate video stream based on the structured information.
9. An electronic device, characterized in that the electronic device includes: at least one memory for storing a computer program; and at least one processor configured to execute the program stored in the memory, wherein, when the program stored in the memory is executed, the processor performs the narrowband video transmission method under bandwidth-constrained conditions as described in any one of claims 1-6.
10. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is run on a processor, it causes the processor to perform the video narrowband transmission method under bandwidth-constrained conditions as described in any one of claims 1-6.