A video compression and enhancement method, device, electronic equipment and storage medium

By using video encoding and AI image processing technologies, the high cost problem caused by unoptimized video data has been solved, achieving efficient compression and enhancement of video files, reducing storage and maintenance costs, and improving video quality.

CN116828208B · Active · Publication Date: 2026-06-12 · CHINA TELECOM CORP LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
CHINA TELECOM CORP LTD
Filing Date
2023-07-17
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

In existing technologies, video data is not effectively optimized and managed, resulting in high data center usage costs, high energy consumption, and high maintenance costs. This is especially true when storing massive amounts of video data, where storage and operation and maintenance costs remain high.

Method used

By acquiring video data from the target camera, stacking and encoding the data using a preset video encoding protocol, a high-resolution video file is generated. Combined with image restoration processing, spatiotemporal super-resolution processing, and face enhancement processing, the video is compressed and enhanced.

🎯Benefits of technology

It effectively reduces the construction and maintenance costs of video storage devices, achieves efficient compression and enhancement of video files, saves disk space, and improves the clarity and quality of video files.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN116828208B_ABST
Patent Text Reader

Abstract

The application discloses a video compression and enhancement method and device, electronic equipment, and a storage medium. The method comprises the following steps: acquiring video data collected by a target camera, and storing the video data in a storage pool; decoding a plurality of low-resolution first video files from the storage pool based on a preset video coding protocol, stacking and splicing the plurality of first video files, and encoding them to obtain a high-resolution second video file; in response to a playback request of a target object, decoding the second video file and then cutting it to obtain a plurality of third video files; and performing video enhancement on the third video files to obtain a target video file and feeding it back to the target object, wherein the video enhancement comprises image repair processing, space-time domain super-resolution processing and face enhancement processing. The embodiments of the application can efficiently compress and enhance video, effectively reduce the storage and operation and maintenance costs of storage equipment, and can be widely applied in the technical field of data processing.

Description

Technical Field

[0001] This invention relates to the field of data processing technology, and in particular to a method, apparatus, electronic device, and storage medium for video compression and enhancement.

Background Technology

[0002] Driven by the continuous empowerment and development of new infrastructure and technologies such as big data, AI, 5G, cloud computing, and IoT, the rapid growth of business requires massive video data storage, which in turn leads to increased data center usage costs, increased energy consumption, and higher maintenance costs. The core issue is that the raw video data has not been effectively optimized and managed.

[0003] Taking standard 720P video surveillance as an example, the daily file size is calculated as follows: 3Mb/s × 3600 seconds × 24 hours = 259200Mb; 259200Mb ÷ 8 ÷ 1024 = 31.64GB.

[0004] Table 1

[0005]
720P (H.264)                   1 channel    10,000 channels
Daily disk usage               34.64GB      338.28TB
Monthly disk usage             1.01TB       9.91PB
Disk usage every six months    6.08TB       59.46PB
Annual disk usage              12.34TB      120.57PB
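The arithmetic behind the daily file size and Table 1 can be checked with a short sketch. The function names are illustrative. The 30-day month, 180-day half year, 365-day year, and truncation to two decimals are assumptions inferred from the table's numbers rather than stated in the text. The table rows are consistent with a daily figure of 34.64GB per channel, so that value is passed in explicitly, while the bitrate formula is evaluated separately.

```python
from decimal import Decimal, ROUND_DOWN

UNITS = ("GB", "TB", "PB")
# Assumed period lengths in days (inferred from the table, not stated in the text).
PERIOD_DAYS = {"daily": 1, "monthly": 30, "six_months": 180, "annual": 365}


def daily_gb_from_bitrate(mbit_per_s):
    """Daily file size in GB for a constant bitrate given in Mb/s."""
    megabits_per_day = Decimal(mbit_per_s) * 3600 * 24
    return megabits_per_day / 8 / 1024  # Mb -> MB -> GB


def fmt_size(gb):
    """Format a size in GB using the largest fitting unit, truncated to 2 decimals."""
    i = 0
    while gb >= 1024 and i < len(UNITS) - 1:
        gb /= 1024
        i += 1
    return f"{gb.quantize(Decimal('0.01'), rounding=ROUND_DOWN)}{UNITS[i]}"


def table_rows(daily_gb, channels):
    """Disk usage per period for the given number of channels."""
    return {
        period: fmt_size(daily_gb * days * channels)
        for period, days in PERIOD_DAYS.items()
    }


if __name__ == "__main__":
    print(fmt_size(daily_gb_from_bitrate(3)))
    for n in (1, 10000):
        print(n, table_rows(Decimal("34.64"), n))
```

Decimal arithmetic is used so that truncation to two decimals is not disturbed by binary floating-point rounding.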

[0006] As shown in Table 1, storing video from 10,000 720P cameras for 6 months requires approximately 60PB of storage space. Based on the 2023 standard price for distributed storage, the construction cost of 1PB of storage space with 3 copies is approximately 500 yuan/TB × 1024 × 3 = 1.536 million yuan. Storing 10,000 video streams for six months would therefore cost 1.536 million yuan × 59.46PB = 91.33 million yuan. Each PB occupies 2 server racks, resulting in an actual usage of approximately 30 server racks. This indicates high initial construction costs and significant ongoing operation and maintenance costs.

Summary of the Invention

[0007] This invention aims to at least partially solve one of the technical problems in related technologies. To this end, this invention proposes a method, apparatus, electronic device, and storage medium for video compression and enhancement, capable of efficiently compressing and enhancing video.

[0008] In one aspect, embodiments of the present invention provide a method for video compression and enhancement, including:

[0009] Acquire video data captured by the target camera and store the video data in the storage pool;

[0010] Based on a preset video encoding protocol, multiple low-resolution first video files are decoded from the storage pool, and the multiple first video files are stacked and spliced and then encoded to obtain a high-resolution second video file;

[0011] In response to the target object's playback request, the second video file is decoded, and then cropped to obtain several third video files;

[0012] The third video file is enhanced to obtain the target video file, which is then fed back to the target object. The video enhancement includes image inpainting, spatiotemporal super-resolution processing, and face enhancement.

[0013] Optionally, stacking and splicing the multiple first video files includes:

[0014] Use a merge function to stack and stitch multiple first video files in the GPU;

[0015] The merge functions include horizontal merge functions and vertical merge functions.

[0016] Optionally, the method further includes:

[0017] The second video file is stored in the storage pool, and the multiple first video files used to encode the second video file are deleted from the storage pool.

[0018] Optionally, decoding the second video file and then cropping it to obtain several third video files includes:

[0019] Based on a preset video encoding format, a video decoder engine is used to decode the second video file in parallel, and then crop it to obtain several third video files.

[0020] Optionally, when video enhancement includes image inpainting, the step of enhancing a third video file includes:

[0021] A deep learning image inpainting model is used to perform image inpainting processing on a third video file;

[0022] Image inpainting includes noise and crease removal and color correction; the deep learning image inpainting model is built based on a variational autoencoder.

[0023] Optionally, when video enhancement includes spatiotemporal super-resolution processing, the step of enhancing the third video file includes:

[0024] Perform spatial super-resolution on the third video file;

[0025] Perform temporal super-resolution on the third video file;

[0026] Here, spatial super-resolution denotes improving the spatial resolution of the third video file, while temporal super-resolution denotes increasing the video frame rate of the third video file.

[0027] Optionally, when video enhancement includes face enhancement processing, the step of enhancing the third video file includes:

[0028] Face enhancement processing is performed on a third video file using a pre-trained prior embedding network;

[0029] The prior embedding network is obtained through the following pre-training steps:

[0030] A generative adversarial network is embedded into a U-shaped deep neural network to obtain a prior embedding network; wherein the generative adversarial network is generated based on the first face image;

[0031] The prior embedding network is trained and adjusted using the second face image to obtain a pre-trained prior embedding network; the clarity of the first face image is greater than that of the second face image.

[0032] In another aspect, embodiments of the present invention provide a video compression and enhancement apparatus, comprising:

[0033] The first module is used to acquire video data captured by the target camera and store the video data in the storage pool;

[0034] The second module is used to decode multiple low-resolution first video files from the storage pool based on a preset video encoding protocol, stack and splice the multiple first video files, and encode them to obtain a high-resolution second video file.

[0035] The third module is used to decode the second video file in response to the playback request of the target object, and then cut it to obtain several third video files;

[0036] The fourth module is used to enhance the third video file to obtain the target video file and feed it back to the target object; the video enhancement includes image restoration processing, spatiotemporal super-resolution processing, and face enhancement processing.

[0037] Optionally, the second module is specifically used for:

[0038] Use a merge function to stack and stitch multiple first video files in the GPU;

[0039] The merge functions include horizontal merge functions and vertical merge functions.

[0040] Optionally, the device further includes:

[0041] The fifth module is used to store the second video file in the storage pool and delete multiple first video files in the storage pool used for encoding to obtain the second video file.

[0042] Optionally, the third module is specifically used for:

[0043] Based on a preset video encoding format, a video decoder engine is used to decode the second video file in parallel, and then crop it to obtain several third video files.

[0044] Optionally, when the video enhancement includes image inpainting processing, the fourth module is specifically used for:

[0045] A deep learning image inpainting model is used to perform image inpainting processing on a third video file;

[0046] Image inpainting includes noise and crease removal and color correction; the deep learning image inpainting model is built based on a variational autoencoder.

[0047] Optionally, when the video enhancement includes spatiotemporal super-resolution processing, the fourth module is specifically used for:

[0048] Perform spatial super-resolution on the third video file;

[0049] Perform temporal super-resolution on the third video file;

[0050] Here, spatial super-resolution denotes improving the spatial resolution of the third video file, while temporal super-resolution denotes increasing the video frame rate of the third video file.

[0051] Optionally, when the video enhancement includes face enhancement processing, the fourth module is specifically used for:

[0052] Face enhancement processing is performed on a third video file using a pre-trained prior embedding network;

[0053] The prior embedding network is obtained through the following pre-training steps:

[0054] A generative adversarial network is embedded into a U-shaped deep neural network to obtain a prior embedding network; wherein the generative adversarial network is generated based on the first face image;

[0055] The prior embedding network is trained and adjusted using the second face image to obtain a pre-trained prior embedding network; the clarity of the first face image is greater than that of the second face image.

[0056] In another aspect, embodiments of the present invention provide an electronic device, including: a processor and a memory; the memory is used to store a program; the processor executes the program to implement the above-mentioned video compression and enhancement method.

[0057] In another aspect, embodiments of the present invention provide a computer storage medium storing a processor-executable program, which, when executed by a processor, is used to implement the above-described video compression and enhancement method.

[0058] This invention first acquires video data captured by a target camera and stores it in a storage pool. Based on a preset video encoding protocol, it decodes multiple low-resolution first video files from the storage pool, stacks and splices these first video files, and encodes them to obtain a high-resolution second video file. By stacking and splicing low-resolution video files into a high-resolution video file, this invention effectively compresses video files and saves disk space. In response to a playback request from the target object, it decodes the second video file and then crops it to obtain several third video files. The third video files are then enhanced to obtain the target video file, which is fed back to the target object. The video enhancement includes image restoration processing, spatiotemporal super-resolution processing, and face enhancement processing. By using these image processing techniques for video enhancement, this invention facilitates video playback while reducing the construction and maintenance costs of video storage devices. This invention can efficiently achieve video compression and enhancement, effectively reducing the storage costs of video files and the maintenance costs of storage devices.

Attached Figure Description

[0059] The accompanying drawings are provided to further understand the technical solutions of the present invention and constitute a part of the specification. They are used together with the embodiments of the present invention to explain the technical solutions of the present invention, and do not constitute a limitation on the technical solutions of the present invention.

[0060] Figure 1 is a schematic diagram of an implementation environment for video compression and enhancement provided in an embodiment of the present invention;

[0061] Figure 2 is a schematic flowchart of a video compression and enhancement method provided in an embodiment of the present invention;

[0062] Figure 3 is a schematic diagram of the video stacking and splicing process architecture provided in an embodiment of the present invention;

[0063] Figure 4 is a schematic diagram illustrating the process principle of video stacking and splicing provided in an embodiment of the present invention;

[0064] Figure 5 is a schematic diagram of the NVENC module provided in an embodiment of the present invention;

[0065] Figure 6 is a schematic diagram illustrating the process principle of video encoding and compression provided in an embodiment of the present invention;

[0066] Figure 7 is a schematic diagram illustrating the process principle of parallel video decoding provided in an embodiment of the present invention;

[0067] Figure 8 is a schematic diagram of the workflow architecture for TMNet-based video spatiotemporal super-resolution provided in an embodiment of the present invention;

[0068] Figure 9 is a schematic diagram of the process architecture of the GPEN model provided in an embodiment of the present invention;

[0069] Figure 10 is a schematic diagram illustrating the process principle of video encoding provided in an embodiment of the present invention;

[0070] Figure 11 is a schematic diagram of the overall process of the video compression and enhancement method provided in the embodiments of the present invention;

[0071] Figure 12 is a schematic diagram of the overall technical architecture of a video compression and enhancement system provided in an embodiment of the present invention;

[0072] Figure 13 is a schematic diagram of the structure of a video compression and enhancement device provided in an embodiment of the present invention;

[0073] Figure 14 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present invention;

[0074] Figure 15 is a block diagram of a computer system architecture suitable for implementing the electronic device of the embodiments of the present invention.

Detailed Implementation

[0075] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the invention.

[0076] It should be noted that although functional modules are divided in the system diagram and the logical order is shown in the flowchart, in some cases, the steps shown or described may be performed in a different order than the module division in the system or the order in the flowchart. The terms "first / S100," "second / S200," etc., in the specification, claims, and the aforementioned figures are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.

[0077] To facilitate understanding of the technical solutions, the technical terms that may appear in the embodiments of the present invention are explained:

[0078] H.264, also known as MPEG-4 Part 10, is a high-compression digital video codec standard proposed by the Joint Video Team (JVT), which was formed jointly by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG).

[0079] H.265 is a new video coding standard developed by ITU-T VCEG following H.264. The H.265 standard builds upon the existing H.264 standard, retaining some of its technologies while improving upon others. These new technologies aim to optimize the trade-off among bit rate, coding quality, latency, and algorithm complexity.

[0080] AV1 is an emerging open-source, royalty-free video compression format, jointly developed by the Alliance for Open Media (AOMedia) industry consortium and finalized in early 2018. The main goal of AV1 development was to achieve significant compression gains over state-of-the-art codecs while maintaining practical decoding complexity and hardware feasibility.

[0081] In this document, the term "embodiment" means that a particular feature, structure, or characteristic described in connection with an embodiment may be included in at least one embodiment of the invention. The appearance of this phrase in various places throughout the specification does not necessarily refer to the same embodiment, nor is it a separate or alternative embodiment mutually exclusive with other embodiments. It will be explicitly and implicitly understood by those skilled in the art that the embodiments described herein can be combined with other embodiments.

[0082] It is understood that the video compression and enhancement method provided in this embodiment of the invention can be applied to any computer device with data processing and computing capabilities, and this computer device can be various terminals or servers. When the computer device in the embodiment is a server, the server is an independent physical server, or a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms. Optionally, the terminal can be a smartphone, tablet computer, laptop computer, or desktop computer, but it is not limited to these.

[0083] Figure 1 is a schematic diagram of an implementation environment provided by an embodiment of the invention. Referring to Figure 1, the implementation environment includes at least one terminal 102 and a server 101. The terminal 102 and the server 101 can be connected via a wired or wireless network to carry out data transmission and exchange.

[0084] Server 101 can be a standalone physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.

[0085] Additionally, server 101 can also be a node server in a blockchain network. Blockchain is a novel application model of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms.

[0086] Terminal 102 can be a smartphone, tablet computer, laptop computer, desktop computer, smart speaker, smartwatch, etc., but is not limited to these. Terminal 102 and server 101 can be directly or indirectly connected via wired or wireless communication, and this embodiment of the invention does not impose any limitations.

[0087] By way of example, based on the implementation environment shown in Figure 1, an embodiment of the invention provides a video compression and enhancement method. The following description uses the application of this video compression and enhancement method in server 101 as an example. It can be understood that the video compression and enhancement method can also be applied in terminal 102.

[0088] Figure 2 is a flowchart of a video compression and enhancement method applied to a server, provided in an embodiment of the present invention. The execution entity of this video compression and enhancement method can be any of the aforementioned computer devices. Referring to Figure 2, the method includes the following steps:

[0089] S100: Acquire video data captured by the target camera and store the video data in the storage pool;

[0090] In some specific embodiments, the camera stores the captured video files in a storage pool, which facilitates subsequent processing steps, such as using GPU computing power from the AI computing power pool to compress the video files. It should be noted that the video data captured by the camera can be grouped and stored in the storage pool based on camera identifiers, different time periods, etc.

[0091] S200: Based on a preset video encoding protocol, decode multiple low-resolution first video files from the storage pool, stack and splice the multiple first video files, and encode them to obtain a high-resolution second video file;

[0092] It should be noted that, in some embodiments, stacking and splicing multiple first video files may include: stacking and splicing multiple first video files in the GPU using a merging function; wherein, the merging function includes a horizontal merging function and a vertical merging function.

[0093] In some embodiments, the method may further include: storing the second video file in a storage pool and deleting multiple first video files in the storage pool used for encoding to obtain the second video file.
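The store-then-delete step can be sketched as follows. This is a minimal illustration, assuming a hypothetical in-memory `StoragePool` class that maps file names to sizes; the class, its methods, and the file names are not from the source, and a real storage pool would be a distributed storage service.

```python
from dataclasses import dataclass, field


@dataclass
class StoragePool:
    """Hypothetical in-memory stand-in for the storage pool: file name -> size in GB."""
    files: dict = field(default_factory=dict)

    def put(self, name, size_gb):
        self.files[name] = size_gb

    def compact(self, first_names, second_name, second_size_gb):
        """Store the second file, then delete the first files it was encoded from."""
        missing = [n for n in first_names if n not in self.files]
        if missing:
            raise KeyError(f"first video files not in pool: {missing}")
        # Write the merged file before deleting its sources, so no footage
        # is lost if storing the second file fails.
        self.files[second_name] = second_size_gb
        for name in first_names:
            del self.files[name]

    def total_gb(self):
        return sum(self.files.values())


if __name__ == "__main__":
    pool = StoragePool()
    firsts = [f"cam{i:02d}_720p.mp4" for i in range(36)]  # illustrative names
    for name in firsts:
        pool.put(name, 1.0)  # illustrative sizes only
    pool.compact(firsts, "grid_8k.mp4", 20.0)
    print(sorted(pool.files), pool.total_gb())
```

Writing the second file before removing the first files is a design choice made here for safety; the source only states that both operations take place.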

[0094] In some specific embodiments, 8K hybrid compression technology can be used during video compression. 8K resolution refers to an image or display resolution with a horizontal width of approximately 8000 pixels: each frame of an 8K video has a resolution of 7680×4320 (16:9), or approximately 33 million pixels. Standard 720P high-definition resolution is 1280×720, so 8K video has six times the horizontal resolution and six times the vertical resolution of 720P. In short, one 8K frame holds exactly 36 images of 720P size. Taking 720P files saved by a camera as an example, during compression the H.264 video codec protocol is used to decode multiple 720P video files simultaneously, and AI-CUDA is used to stitch the decoded frames into 8K frames in an M×N grid. In our specific implementation, we simultaneously decode 36 720P video surveillance files and use AI-CUDA to stitch them into 8K frames in a 6×6 grid, ultimately generating a single video file in which every frame is a 36-tile grid. CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model that significantly improves computing performance by leveraging the processing power of graphics processing units (GPUs). To date, millions of CUDA-based GPUs have been sold, and software developers, scientists, and researchers are using CUDA in various fields, including image and video processing, computational biology and chemistry, fluid dynamics simulations, CT image reconstruction, seismic analysis, and ray tracing.
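The resolution arithmetic above can be verified with a few lines. This is a sketch only: the function names are illustrative, and the row-major tile ordering (left to right, then top to bottom) is an assumption, since the source does not state how the 36 channels are ordered in the grid.

```python
def grid_shape(canvas, tile):
    """Number of whole tiles that fit per row and per column of the canvas."""
    (canvas_w, canvas_h), (tile_w, tile_h) = canvas, tile
    return canvas_w // tile_w, canvas_h // tile_h


def tile_origin(index, cols, tile):
    """Top-left pixel of tile `index`, assuming row-major ordering."""
    row, col = divmod(index, cols)
    return col * tile[0], row * tile[1]


if __name__ == "__main__":
    canvas_8k, tile_720p = (7680, 4320), (1280, 720)
    cols, rows = grid_shape(canvas_8k, tile_720p)
    print(cols, rows, cols * rows)               # tiles per row, per column, total
    print(canvas_8k[0] * canvas_8k[1])           # pixels in one 8K frame
    print(tile_origin(cols * rows - 1, cols, tile_720p))  # origin of the last tile
```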

[0095] It should be noted that AI-CUDA overlays one video onto another, taking two inputs and producing one output. The first input is the "main" video onto which the second input is overlaid. `x` and `y` set the x and y coordinates of the overlaid video on the main video. `main_w` and `main_h` represent the width and height of the main video, respectively. `overlay_w` and `overlay_h` represent the width and height of the overlaid video, respectively.

[0096] For example, placing two input videos side-by-side for output:

[0097] nullsrc=size=200x100[background];

[0098] [0:v]setpts=PTS-STARTPTS,scale=100x100[left];

[0099] [1:v]setpts=PTS-STARTPTS,scale=100x100[right];

[0100]-[0101] [background][left]overlay=shortest=1[background+left];

[0102] [background+left][right]overlay=shortest=1:x=100[left+right]
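The side-by-side example generalizes to an M×N grid by chaining one overlay per tile. The sketch below only builds the filtergraph text, following the same pattern as the example (`nullsrc`, `setpts`, `scale`, `overlay` with `x`, `y` and `shortest`). The function name, the label names `bg…` and `t…`, and the row-major ordering are illustrative assumptions. It uses the software `overlay` filter shown in the example rather than a GPU variant, and it has not been benchmarked.

```python
def overlay_grid_filtergraph(cols, rows, tile_w, tile_h):
    """Build filtergraph text that places cols*rows inputs on one canvas."""
    count = cols * rows
    # Blank canvas sized to hold the whole grid.
    parts = [f"nullsrc=size={cols * tile_w}x{rows * tile_h}[bg0]"]
    # Reset timestamps and scale every input to the tile size.
    for i in range(count):
        parts.append(
            f"[{i}:v]setpts=PTS-STARTPTS,scale={tile_w}x{tile_h}[t{i}]"
        )
    # Overlay tiles one by one, each on top of the previous result.
    for i in range(count):
        row, col = divmod(i, cols)
        parts.append(
            f"[bg{i}][t{i}]overlay=shortest=1:"
            f"x={col * tile_w}:y={row * tile_h}[bg{i + 1}]"
        )
    return ";\n".join(parts)


if __name__ == "__main__":
    # Two 100x100 inputs side by side, as in the example above.
    print(overlay_grid_filtergraph(2, 1, 100, 100))
```

Calling `overlay_grid_filtergraph(6, 6, 1280, 720)` yields a 7680x4320 canvas with 36 overlay steps; the final output label is `[bg36]`.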

[0103] Registers – these are private to each thread, meaning that registers allocated to one thread are not visible to other threads. The compiler decides how registers are used.

[0104] L1 / Shared Memory (SMEM) – Each SM has a fast on-chip temporary memory that can be used as L1 cache and shared memory. All threads in a CUDA block can share memory, and all CUDA blocks running on a given SM can share the physical memory resources provided by the SM.

[0105] Read-only memory – Each SM has instruction cache, constant memory, texture memory, and RO cache, which are read-only for kernel code.

[0106] L2 Cache – The L2 cache is shared across all SMs, so every thread in each CUDA block can access this memory. The NVIDIA A100 GPU increases the L2 cache size to 40MB, compared to 6MB in the V100 GPU.

[0107] Global memory – the frame buffer of the GPU, held in the DRAM on the GPU.

[0108] For example, taking 720P files saved by the camera and combining them with 8K hybrid compression, as shown in Figure 3 and Figure 4, the specific process of video compression is as follows:

[0109] The 36 video files are stitched using a merge function (VStack or HStack), and Overlay_CUDA is used to perform the 6×6 picture stacking on the GPU, because CPU computation cannot perform the picture stitching efficiently. In comparative experiments on a hardware platform with a 6338 CPU and a 4090 GPU, the GPU was 8 times as efficient as the CPU.
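The horizontal and vertical merge operations can be illustrated on the CPU with plain Python lists. This is a conceptual sketch only: `hstack`, `vstack`, and `stitch_grid` are illustrative stand-ins written here, not the GPU merge functions of the embodiment, and frames are modeled as lists of pixel rows.

```python
def hstack(frames):
    """Join frames left to right; all frames must have the same height."""
    height = len(frames[0])
    if any(len(f) != height for f in frames):
        raise ValueError("hstack needs frames of equal height")
    return [[p for f in frames for p in f[y]] for y in range(height)]


def vstack(frames):
    """Join frames top to bottom; all rows must have the same width."""
    width = len(frames[0][0])
    if any(len(row) != width for f in frames for row in f):
        raise ValueError("vstack needs frames of equal width")
    return [list(row) for f in frames for row in f]


def stitch_grid(tiles, cols):
    """Stitch tiles (row-major order) into one frame with `cols` tiles per row."""
    if len(tiles) % cols:
        raise ValueError("tile count must be a multiple of cols")
    strips = [hstack(tiles[i:i + cols]) for i in range(0, len(tiles), cols)]
    return vstack(strips)


if __name__ == "__main__":
    # 36 tiny 3x2 "frames", each filled with its own channel index.
    tiles = [[[i] * 3 for _ in range(2)] for i in range(36)]
    mosaic = stitch_grid(tiles, 6)
    print(len(mosaic[0]), len(mosaic))  # mosaic width and height in pixels
```

Each horizontal strip is built first and the strips are then stacked vertically, which mirrors how a 6×6 grid decomposes into six HStack calls followed by one VStack call.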

[0110] For real-time 8K HEVC encoding with a controllable compression ratio, CPU-based software encoding cannot meet the requirement for real-time encoding and compression of 8K video at 30 frames per second. As shown in Figure 5, the NVENC module with a GPU acceleration unit used in this embodiment of the invention enables a single-node GPU card to process 8K video at 60 frames per second in real time. The GPU hardware accelerator engines for video decoding (NVDEC) and video encoding (NVENC) support faster-than-real-time video processing, making them suitable for transcoding applications in addition to video playback. They support even load distribution across multiple encoders and real-time 8K60 encoding in the AV1 and HEVC formats.

[0111] In some possible implementations, as shown in Figure 6, the overall process of video encoding and compression can be summarized as follows:

[0112] 1. Retrieve 36 channels of 720P video files from the storage pool;

[0113] 2. Decode the retrieved 36 channels of 720P video files in real time using CUVID;

[0114] 3. Process and stitch together the 36 decoded video streams using AI-CUDA;

[0115] 4. Generate one 8K video file;

[0116] 5. Encode the generated 8K video file in real time into HEVC (High Efficiency Video Coding, also known as H.265 and MPEG-H Part 2) format using NVENC (video encoding);

[0117] 6. The final result is a 6×6 grid video with a resolution of 7680×4320.

[0118] S300: In response to the playback request of the target object, decode the second video file and then cut it to obtain several third video files;

[0119] It should be noted that in some embodiments, decoding the second video file and then cropping it to obtain several third video files includes: using a video decoder engine to perform parallel decoding of the second video file based on a preset video encoding format, and then cropping it to obtain several third video files. The video cropping is performed at the splicing points of the various first videos.
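Because the first videos are tiled on a regular grid, the splice points and the crop rectangle of every third video follow directly from the tile size. The sketch below assumes a row-major 6×6 layout of 1280×720 tiles and emits crop expressions in the `crop=out_w:out_h:x:y` form used by FFmpeg's crop filter. The function names are illustrative, and the source does not specify which cropping tool the embodiment uses.

```python
def splice_points(tile_size, count):
    """Pixel coordinates of the internal boundaries between adjacent tiles."""
    return [tile_size * i for i in range(1, count)]


def crop_filters(cols, rows, tile_w, tile_h):
    """One crop expression per tile, in row-major order."""
    return [
        f"crop={tile_w}:{tile_h}:{col * tile_w}:{row * tile_h}"
        for row in range(rows)
        for col in range(cols)
    ]


if __name__ == "__main__":
    print(splice_points(1280, 6))  # vertical cut lines (x coordinates)
    print(splice_points(720, 6))   # horizontal cut lines (y coordinates)
    for expr in crop_filters(6, 6, 1280, 720)[:3]:
        print(expr)
```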

[0120] In some specific embodiments, when restoring the video, the H.265 video codec protocol is used to decode the 8K video surveillance file, and an AI model is used for cropping. It is difficult for a CPU to decode 36N files simultaneously in real time, where N depends on the number of GPUs on the computing nodes (in the traditional CPU-based mode, N is less than or equal to 2). Decoding on the CPU would leave CPU resources heavily consumed by the decoding program and would fail to meet the requirements of subsequent video stitching. As shown in Figure 7, this embodiment of the invention uses AI-CUDA technology to upload the files to the GPU and uses the video decoder engine NVDEC to decode 36N files simultaneously.

[0121] S400: Perform video enhancement on the third video file to obtain the target video file and send it back to the target object;

[0122] Video enhancement includes image inpainting, spatiotemporal super-resolution processing, and face enhancement processing.

[0123] It should be noted that in some embodiments, when video enhancement includes image inpainting processing, the step of enhancing the third video file may include: using a deep learning image inpainting model to perform image inpainting processing on the third video file; wherein, the image inpainting processing includes noise and crease removal and color correction; the deep learning image inpainting model is constructed based on a variational autoencoder.

[0124] In some specific embodiments, AI deep learning image restoration models based on VAEs (Variational Autoencoders) enable automatic restoration of old videos, not only removing noise and creases, but also optimizing details and correcting colors. This greatly reduces the need for human resources.

[0125] In some embodiments, when video enhancement includes spatiotemporal super-resolution processing, the step of enhancing the third video file may include: performing spatial super-resolution on the third video file; performing temporal super-resolution on the third video file; which can realize the conversion from low resolution and low frame rate to high resolution and high frame rate; wherein, spatial super-resolution represents the process of improving the spatial resolution of the third video file, and temporal super-resolution represents the process of increasing the video frame rate of the third video file.

[0126] In some specific embodiments, an AI video spatiotemporal super-resolution model based on TMNet is used to convert low-resolution, low-frame-rate video to high-resolution, high-frame-rate video, effectively improving video clarity and smoothness. A neural network model performs the spatiotemporal super-resolution operation. In video spatiotemporal super-resolution, spatial super-resolution refers to increasing the spatial resolution of the video to improve image clarity, such as upgrading a 4K video to 8K in this embodiment; temporal super-resolution refers to increasing the frame rate of the video to provide a smoother viewing experience. Film video typically has a frame rate of 12 frames per second, far below the frame rate required for smooth viewing, which makes temporal super-resolution necessary.
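To make the two terms concrete, the sketch below shows what each operation does to the shape of a clip, using deliberately naive stand-ins: nearest-neighbor pixel repetition for the spatial step and averaging of neighboring frames for the temporal step. This is not TMNet and not a learned model; the function names are illustrative, and frames are modeled as lists of rows of numeric pixel values.

```python
def upscale_nearest(frame, factor):
    """Spatial step: repeat every pixel `factor` times in both directions."""
    out = []
    for row in frame:
        wide = [p for p in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def interpolate_frames(frames):
    """Temporal step: insert the average of each neighboring frame pair."""
    if not frames:
        return []
    out = [frames[0]]
    for prev, nxt in zip(frames, frames[1:]):
        mid = [
            [(a + b) / 2 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(prev, nxt)
        ]
        out.extend([mid, nxt])
    return out


if __name__ == "__main__":
    clip = [[[0, 2]], [[4, 6]]]  # two 2x1 frames
    smooth = interpolate_frames(clip)  # n frames -> 2n - 1 frames
    sharp = [upscale_nearest(f, 2) for f in smooth]  # each frame doubles in size
    print(len(smooth), len(sharp[0][0]), len(sharp[0]))
```

A learned model replaces both stand-ins with networks that synthesize new detail and motion, rather than copying or averaging existing pixels.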

[0127] As shown in Figure 8, a video spatiotemporal super-resolution method based on TMNet is employed. Unlike step-by-step super-resolution, TMNet can perform spatial and temporal super-resolution of a video simultaneously with a single model. This not only simplifies the operation steps and reduces the computational scale, but also provides better performance because spatial and temporal information are analyzed jointly. The TMNet-based video spatiotemporal super-resolution method is an image processing technique for video enhancement. TMNet stands for Temporal Motion Network Enhancement, which combines the concepts of spatiotemporal super-resolution reconstruction and motion compensation. The goal of video spatiotemporal super-resolution methods is to enhance low-resolution video sequences to high resolution, thereby improving video quality and detail clarity. Traditional super-resolution methods mainly focus on the reconstruction of single-frame images, while video spatiotemporal super-resolution methods consider the temporal relationships between frames in a video sequence. The TMNet method estimates the motion information between frames in a video sequence through motion compensation and performs spatiotemporal super-resolution reconstruction based on this information. It uses spatiotemporal filters to extract motion information and applies it to the reconstruction process of low-resolution images. This method can better preserve the motion continuity and spatial details of video sequences, resulting in clearer and more natural high-resolution videos. The TMNet-based spatiotemporal super-resolution method for video has wide applications in video enhancement. It can improve the visual quality of low-quality videos, such as enhancing details in surveillance videos and improving the clarity of video conferencing. By combining spatiotemporal super-resolution reconstruction and motion compensation, this method can significantly improve video quality and provide a better user experience in many applications.

[0128] In some embodiments, when video enhancement includes face enhancement processing, the step of enhancing a third video file includes: performing face enhancement processing on the third video file using a pre-trained prior embedding network; wherein the prior embedding network is pre-trained through the following steps: embedding a generative adversarial network into a U-shaped dynamic neural network to obtain the prior embedding network; wherein the generative adversarial network is trained and generated based on a first face image; the prior embedding network is trained and adjusted using a second face image to obtain a pre-trained prior embedding network; the clarity of the first face image is greater than the clarity of the second face image.

[0129] In some specific embodiments, as shown in Figure 9, the GPEN (Prior Embedding Network) model effectively enhances and repairs facial features, making the images much clearer, especially through rich enhancement of facial details. When watching videos, people focus more on faces, but ordinary video super-resolution algorithms do not specifically enhance facial details. To obtain restored videos with a better viewing experience, targeted facial enhancement is necessary. The core idea of the GPEN model is to first learn a GAN (Generative Adversarial Network) for generating the first face image and embed it into a U-shaped DNN (U-shaped Dynamic Neural Network) as a prior decoder. Then, a set of synthesized second face images is used to fine-tune the GAN-prior-embedded DNN, ultimately achieving face enhancement.

[0130] GPEN (Prior Embedding Network) is a deep learning model for face image generation and editing. It is based on the Generative Adversarial Network (GAN) framework and is designed to learn and capture latent features and prior information from face images. GPEN is designed to generate high-quality, realistic face images and provide editing capabilities. It learns features and patterns from a large number of real face images during training and then uses these learned features to generate new face images. Unlike traditional GAN models, GPEN introduces a prior embedding network. This network plays a crucial role between the generator and discriminator, learning prior information about the face image and embedding it into the generation process. The prior embedding network can be viewed as an encoder of face image features, learning to map the face image to a vector representation in a latent space. During generation, the GPEN model takes a latent vector as input and uses the generator network to transform it into the corresponding face image. The generator network consists of multiple layers that iterate and optimize repeatedly to generate realistic face images. The discriminator network evaluates the realism of the generated images and provides feedback signals for training the generator network. The GPEN model's advantage lies in its ability to generate high-quality, diverse face images and its ability to edit these images. By adjusting the prior embedding vectors in the latent space, fine-grained control can be achieved over the attributes, expressions, and poses of the generated faces. This makes GPEN a promising candidate for applications in face generation, virtual character creation, and face editing.

[0131] Video produced by this invention can be played on any PC or mobile app player and supports the international standard codec protocols H.264, H.265, VP8 / VP9, AV1, and H.266, as well as the AVS2 / 3 video codec protocol. In this embodiment, the AI-enhanced image delivers a completely new experience across five dimensions: color gamut, resolution, quantization accuracy, dynamic range, and frame rate. AI enhancement makes the original video colors richer; the BT2020 standard covers almost all colors of natural objects, resulting in richer colors and more delicate images. The enhanced video provides a stunning visual experience with its large screen and exquisite details. The 4× 4K resolution delivers detailed images, reproducing the scene with more accurate colors; higher color depth values yield more colors. 10-bit color sampling is denser, meaning more delicate gradations, and high color depth subdivides color gradations more finely, resulting in more accurate visual color representation. In 8-bit RGB, each color channel has 2 to the power of 8, or 256 levels, resulting in 16.7 million (256*256*256) color combinations. In 10-bit RGB, each color channel has 2 to the power of 10, or 1024 levels, resulting in 1.07 billion (1024*1024*1024) color combinations, a 64-fold increase.
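The color-depth arithmetic above can be checked with a few lines. The helper name `rgb_colors` is hypothetical and assumes three channels of equal bit depth.

```python
def rgb_colors(bits_per_channel: int) -> int:
    """Number of distinct RGB triples at a given per-channel bit depth."""
    levels = 2 ** bits_per_channel  # e.g. 256 levels for 8 bits
    return levels ** 3              # three independent channels


colors_8 = rgb_colors(8)
colors_10 = rgb_colors(10)
print(colors_8)               # 16777216   (about 16.7 million)
print(colors_10)              # 1073741824 (about 1.07 billion)
print(colors_10 // colors_8)  # 64
```

The factor of 64 follows directly from the two extra bits per channel: each channel gains a factor of 4, and 4 × 4 × 4 = 64.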

[0132] In some possible implementations, as shown in Figure 10, the overall process of video enhancement can be summarized as follows:

[0133] 1. First, obtain the 8K video file obtained after video compression;

[0134] 2. Use NVDEC (NVIDIA's hardware video decoder, also written NVdec) to decode the HEVC-format 8K video file;

[0135] 3. Use CUDA to crop the decoded video;

[0136] 4. Perform AI enhancement on each of the cropped videos;

[0137] 5. Finally, the enhanced video is returned to the user for playback.
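The cropping in step 3 can be sketched on the CPU as follows. This is an illustration only: the embodiment performs the operation with CUDA on the GPU, whereas the sketch uses NumPy slicing, and the 6×6 grid is an assumption chosen because 36 frames of 1280×720 tile a 7680×4320 frame exactly. The function name `crop_tiles` is hypothetical.

```python
from typing import List

import numpy as np


def crop_tiles(frame: np.ndarray, rows: int, cols: int) -> List[np.ndarray]:
    """Split one stitched (H, W, C) frame into rows * cols equal tiles, row by row."""
    height, width = frame.shape[:2]
    if height % rows or width % cols:
        raise ValueError("frame size must be divisible by the grid size")
    tile_h, tile_w = height // rows, width // cols
    return [
        frame[r * tile_h:(r + 1) * tile_h, c * tile_w:(c + 1) * tile_w]
        for r in range(rows)
        for c in range(cols)
    ]


# One decoded 8K frame (7680 x 4320) split back into 36 frames of 1280 x 720.
frame_8k = np.zeros((4320, 7680, 3), dtype=np.uint8)
tiles = crop_tiles(frame_8k, rows=6, cols=6)
print(len(tiles), tiles[0].shape)  # 36 (720, 1280, 3)
```

Each returned tile is a view into the decoded frame, so no pixel data is copied until a tile is passed on to the enhancement step.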

[0138] In an achievable embodiment, as shown in Figure 11, the overall flow of the method of the present invention is as follows:

[0139] The camera stores the captured video files in a storage pool. The video storage pool uses the GPU computing power in the AI computing power pool to compress the video files and then stores the compressed video back in the storage pool.

[0140] When a user requests to play back a video, the AI computing power pool is requested to restore and enhance the video. The enhanced video file is then put back into the storage pool for the user to watch, or it can be directly pushed to the display platform for playback.

[0141] In some specific embodiments, the process of implementing video compression and AI restoration according to the present invention is as follows:

[0142] S1: When compressing video, the H.264 video codec protocol is used to decode multiple 720P surveillance video files simultaneously, AI-CUDA is used to stitch the decoded frames into 8K images in an M×N layout, and real-time 8K encoding is then performed to produce a single file, thereby achieving file compression.
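The M×N stitching in S1 can be sketched with a horizontal merge followed by a vertical merge. The sketch runs on the CPU with NumPy's `hstack` and `vstack` as stand-ins for the GPU merge functions used in the embodiment, and it assumes M = N = 6, since 6 × 1280 = 7680 and 6 × 720 = 4320. The function name `stitch_grid` is hypothetical.

```python
from typing import Sequence

import numpy as np


def stitch_grid(tiles: Sequence[np.ndarray], rows: int, cols: int) -> np.ndarray:
    """Stitch rows * cols equally sized (H, W, C) frames into one large frame."""
    if len(tiles) != rows * cols:
        raise ValueError("expected exactly rows * cols tiles")
    # Horizontal merge: join each group of `cols` tiles side by side.
    strips = [np.hstack(tiles[r * cols:(r + 1) * cols]) for r in range(rows)]
    # Vertical merge: stack the resulting strips top to bottom.
    return np.vstack(strips)


# 36 synthetic 720p frames, each filled with its own index for easy checking.
frames_720p = [np.full((720, 1280, 3), i, dtype=np.uint8) for i in range(36)]
mosaic = stitch_grid(frames_720p, rows=6, cols=6)
print(mosaic.shape)  # (4320, 7680, 3)
```

Because the tiles are placed row by row, the position of each source camera inside the 8K frame is fixed by its index, which is what allows the frame to be cropped back into the original streams on playback.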

[0143] S2: When restoring the video, the H.265 video codec protocol is used to decode the 8K video file, and an AI deep learning image restoration model based on VAEs is used to automatically repair, crop, and enhance the video, removing noise, creases, etc., while also optimizing details and correcting colors. The enhanced video file is then provided to the user.

[0144] It should also be noted that, in some feasible embodiments, the present invention also provides a system architecture for implementing the aforementioned methods. Figure 12 illustrates the overall system architecture of this invention. The front-end presentation layer uses Vue.js to build a single-page application on the PC, implementing web page logic for resource management and other query functions. The load balancing layer, an Nginx server, handles client access requests. The service layer is developed based on the cloud framework of mainstream open-source systems, using Node.js to build a microservice cluster for business functions. The data interaction layer implements data conversion and transmission between microservices and the underlying data storage layer. Structured data in microservices uses the Mybatis framework and Druid connection pool for database read / write operations. The caching part uses Redis as temporary storage for hot data, avoiding additional database pressure caused by frequent database access.
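The role of the caching layer can be illustrated with a minimal cache-aside sketch. An in-memory dictionary stands in for Redis and a plain function stands in for the database query, so the class name, key names, and time-to-live value are all assumptions; no real Redis client API is shown.

```python
import time
from typing import Callable, Dict, Tuple


class HotDataCache:
    """Cache-aside lookup: serve from the cache, fall back to the database on a miss."""

    def __init__(self, load_from_db: Callable[[str], str],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}  # key -> (expiry, value)
        self.db_reads = 0

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: no database access
        self.db_reads += 1   # miss or expired entry: query the database once
        value = self._load(key)
        self._store[key] = (now + self._ttl, value)
        return value


cache = HotDataCache(lambda key: f"metadata for {key}")
first = cache.get("camera-01")
second = cache.get("camera-01")
print(cache.db_reads)  # 1
```

Repeated reads of the same hot key reach the database only once per time-to-live window, which is the pressure reduction the paragraph describes.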

[0145] In summary, addressing the problems existing in current technologies, this invention utilizes video encoding and decoding compression technology to combine multiple video files into a single video file, achieving video file compression and saving disk space. Simultaneously, it employs AI image processing technology to restore video resolution, color gamut, and aspect ratio, to repair image degradation, and to colorize video automatically, thereby enhancing the video while reducing the construction and maintenance costs of video storage devices. Compared to existing technologies, the beneficial effects of this invention include:

[0146] The technical solution of the present invention has high compression efficiency and the compression ratio can be specified. The compression efficiency can be adjusted between 50% and 80% according to the actual situation.

[0147] The technical solution of this invention has a fast compression speed; a 1-hour video can be compressed in about 33 minutes.

[0148] The technical solution of this invention utilizes AI decoding technology, which can complete the decoding and rendering of 8K video in milliseconds.

[0149] The technical solution of this invention reduces the file size after compression, thereby reducing the transmission bandwidth utilization by 50% when the user views the file.
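The figures quoted above can be turned into rough estimates. The sketch assumes that "compression efficiency" means the fraction of the original size that is removed, which the specification does not define, and the 100 GB archive is a hypothetical input. Both function names are illustrative.

```python
def compressed_size_gb(original_gb: float, reduction: float) -> float:
    """Size after compression, with `reduction` as the fraction removed (assumed meaning)."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return original_gb * (1.0 - reduction)


def realtime_factor(video_minutes: float, processing_minutes: float) -> float:
    """How many minutes of video are processed per minute of wall-clock time."""
    return video_minutes / processing_minutes


# Hypothetical 100 GB archive at the two ends of the stated 50%-80% range.
for reduction in (0.5, 0.8):
    size = compressed_size_gb(100.0, reduction)
    print(f"{reduction:.0%} reduction -> {size:.1f} GB")

# One hour of video compressed in about 33 minutes.
print(round(realtime_factor(60, 33), 2))  # 1.82
```

Under this reading, compression runs at a little under twice real-time speed, and the archive shrinks to between one half and one fifth of its original size.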

[0150] On the other hand, as shown in Figure 13, this embodiment of the invention provides a video compression and enhancement device 800, comprising: a first module 810, used to acquire video data captured by a target camera and store the video data in a storage pool; a second module 820, used to decode multiple low-resolution first video files from the storage pool based on a preset video encoding protocol, stack and splice the multiple first video files, and encode them to obtain a high-resolution second video file; a third module 830, used to decode the second video file in response to a playback request from a target object, and then crop it to obtain several third video files; and a fourth module 840, used to perform video enhancement on the third video files to obtain a target video file and feed it back to the target object; wherein, the video enhancement includes image restoration processing, spatiotemporal super-resolution processing, and face enhancement processing.

[0151] In some specific embodiments, the device of the present invention first acquires video data captured by the target camera through a first module and stores the video data in a storage pool; then, through a second module, it decodes multiple low-resolution first video files from the storage pool based on a preset video encoding protocol, stacks and splices the multiple first video files, and encodes them to obtain a high-resolution second video file; next, in response to the playback request of the target object, the third module decodes the second video file and then crops it to obtain several third video files; finally, the fourth module performs video enhancement on the third video files to obtain the target video file and feeds it back to the target object; wherein, the video enhancement includes image restoration processing, spatiotemporal super-resolution processing, and face enhancement processing.

[0152] It should be noted that, in some embodiments, the device further includes the following modules:

[0153] The fifth module is used to store the second video file in the storage pool and delete multiple first video files in the storage pool used for encoding to obtain the second video file.
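The behaviour of the fifth module can be sketched with a dictionary standing in for the storage pool. The function name, the key naming scheme, and the choice to write the second video file before deleting the first video files (so that a failed write never loses data) are assumptions, not details given in the specification.

```python
from typing import Dict, Sequence


def consolidate(pool: Dict[str, bytes], first_keys: Sequence[str],
                second_key: str, second_data: bytes) -> None:
    """Store the stitched second file, then delete the first files it replaces."""
    missing = [key for key in first_keys if key not in pool]
    if missing:
        raise KeyError(f"first video files not in pool: {missing}")
    pool[second_key] = second_data  # write the replacement first
    for key in first_keys:          # only then remove the originals
        del pool[key]


# Hypothetical pool holding three low-resolution files and one unrelated file.
pool = {"cam1.mp4": b"a", "cam2.mp4": b"b", "cam3.mp4": b"c", "other.mp4": b"z"}
consolidate(pool, ["cam1.mp4", "cam2.mp4", "cam3.mp4"], "stitched_8k.mp4", b"abc")
print(sorted(pool))  # ['other.mp4', 'stitched_8k.mp4']
```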

[0154] The content of the method embodiments of the present invention is applicable to the device embodiments. The specific functions implemented by the device embodiments are the same as those of the above method embodiments, and the beneficial effects achieved are also the same as those achieved by the above methods.

[0155] On the other hand, as shown in Figure 14, this embodiment of the invention also provides an electronic device 900, which includes at least one processor 910 and at least one memory 920 for storing at least one program; Figure 14 takes one processor 910 and one memory 920 as an example.

[0156] The processor 910 and memory 920 can be connected via a bus or other means.

[0157] Memory 920, as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs and non-transitory computer-executable programs. Furthermore, memory 920 may include high-speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory 920 may optionally include memory remotely located relative to the processor, and this remote memory can be connected to the device via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

[0158] The electronic device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs.

[0159] Specifically, Figure 15 shows a schematic block diagram of a computer system architecture for implementing an electronic device according to embodiments of the present invention.

[0160] It should be noted that the computer system 1000 of the electronic device shown in Figure 15 is merely an example and should not impose any limitation on the functionality and scope of use of the embodiments of the present invention.

[0161] As shown in Figure 15, the computer system 1000 includes a central processing unit (CPU) 1001, which can perform various appropriate actions and processes based on programs stored in read-only memory (ROM) 1002 or programs loaded from storage section 1008 into random access memory (RAM) 1003. The RAM 1003 also stores various programs and data required for system operation. The CPU 1001, ROM 1002, and RAM 1003 are interconnected via a bus 1004. An input / output interface 1005 (I / O interface) is also connected to the bus 1004.

[0162] The following components are connected to the input / output interface 1005: an input section 1006 including a keyboard, mouse, etc.; an output section 1007 including a cathode ray tube (CRT), liquid crystal display (LCD), etc., and speakers, etc.; a storage section 1008 including a hard disk, etc.; and a communication section 1009 including a network interface card such as a local area network card, modem, etc. The communication section 1009 performs communication processing via a network such as the Internet. A drive 1010 is also connected to the input / output interface 1005 as needed. A removable medium 1011, such as a disk, optical disk, magneto-optical disk, semiconductor memory, etc., is installed on the drive 1010 as needed so that computer programs read from it can be installed into the storage section 1008 as needed.

[0163] In particular, according to embodiments of the present invention, the processes described in the various method flowcharts can be implemented as computer software programs. For example, embodiments of the present invention include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via communication section 1009, and / or installed from removable medium 1011. When the computer program is executed by central processing unit 1001, it performs various functions defined in the system of the present invention.

[0164] It should be noted that the computer-readable medium shown in the embodiments of the present invention can be a computer-readable signal medium or a computer-readable storage medium, or any combination thereof. A computer-readable storage medium can be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of a computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), flash memory, optical fiber, portable compact disc read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination thereof. In the present invention, a computer-readable storage medium can be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. In the present invention, a computer-readable signal medium can include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such propagated data signals can take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. The computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium can be transmitted using any suitable medium, including but not limited to wireless, wired, etc., or any suitable combination thereof.

[0165] The content of the method embodiments of the present invention is applicable to the system embodiments. The specific functions implemented in the system embodiments are the same as those in the above method embodiments, and the beneficial effects achieved are also the same as those achieved by the above methods.

[0166] Another aspect of this invention provides a computer-readable storage medium storing a program that is executed by a processor to implement the aforementioned method.

[0167] The content of the method embodiments of the present invention is applicable to the computer-readable storage medium embodiments. The specific functions implemented by the computer-readable storage medium embodiments are the same as those of the above method embodiments, and the beneficial effects achieved are also the same as those achieved by the above methods.

[0168] This invention also discloses a computer program product or computer program, which includes computer instructions stored in a computer-readable storage medium. A processor of a computer device can read the computer instructions from the computer-readable storage medium and execute the computer instructions, causing the computer device to perform the aforementioned method.

[0169] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in a block diagram or flowchart, and combinations of blocks in a block diagram or flowchart, may be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.

[0170] It should be noted that although several modules for the device used to perform actions have been mentioned in the detailed description above, this division is not mandatory. In fact, according to embodiments of the present invention, the features and functions of two or more modules or units described above can be embodied in one module or unit. Conversely, the features and functions of one module or unit described above can be further divided and embodied by multiple modules or units.

[0171] Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein can be implemented by software or by combining software with necessary hardware. Therefore, the technical solutions according to the embodiments of the present invention can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive, portable hard drive, etc.) or on a network, including several instructions to cause a computing device (such as a personal computer, server, touch terminal, or network device, etc.) to execute the method according to the embodiments of the present invention.

[0172] In some alternative embodiments, the functions / operations mentioned in the block diagrams may not occur in the order shown in the operation diagrams. For example, depending on the functions / operations involved, two consecutively shown blocks may actually be executed substantially simultaneously, or the blocks may sometimes be executed in reverse order. Furthermore, the embodiments presented and described in the flowcharts of this invention are provided by way of example to provide a more comprehensive understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and sub-operations described as part of a larger operation are executed independently.

[0173] Furthermore, although the invention has been described in the context of functional modules, it should be understood that, unless otherwise stated, one or more of the functions and / or features may be integrated into a single physical device and / or software module, or one or more functions and / or features may be implemented in a separate physical device or software module. It is also understood that a detailed discussion of the actual implementation of each module is unnecessary for understanding the invention. Rather, given the properties, functions, and internal relationships of the various functional modules in the apparatus disclosed herein, the actual implementation of the module will be understood within the scope of conventional skill of an engineer. Therefore, those skilled in the art can implement the invention as set forth in the claims using ordinary techniques without excessive experimentation. It is also understood that the specific concepts disclosed are merely illustrative and not intended to limit the scope of the invention, which is determined by the full scope of the appended claims and their equivalents.

[0174] If a function is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this invention, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods of the various embodiments of this invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0175] The logic and / or steps represented in the flowchart or otherwise described herein, for example, can be considered as a sequenced list of executable instructions for implementing logical functions, and can be embodied in any computer-readable medium for use by, or in conjunction with, an instruction execution means, apparatus, or device (such as a computer-based device, a processor-including device, or other means that can fetch and execute instructions from, or in conjunction with, an instruction execution means, apparatus, or device). For the purposes of this specification, "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transmit programs for use by, or in conjunction with, an instruction execution means, apparatus, or device.

[0176] More specific examples of computer-readable media (a non-exhaustive list) include: electrical connections (electronic devices) having one or more wires, portable computer disk drives (magnetic devices), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), fiber optic devices, and portable compact disc read-only memory (CD-ROM). Furthermore, computer-readable media can even be paper or other suitable media on which programs can be printed, because programs can be obtained electronically, for example, by optically scanning the paper or other medium, followed by editing, interpreting, or otherwise processing as necessary, and then stored in computer memory.

[0177] It should be understood that various parts of the present invention can be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods can be implemented in software or firmware stored in memory and executed by a suitable instruction execution device. For example, if implemented in hardware, as in another embodiment, it can be implemented using any one or a combination of the following techniques known in the art: discrete logic circuits having logic gates for implementing logical functions on data signals, application-specific integrated circuits (ASICs) having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), etc.

[0178] In the description of this specification, references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples.

[0179] Although embodiments of the invention have been shown and described, those skilled in the art will understand that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

[0180] The above is a detailed description of the preferred embodiments of the present invention. However, the present invention is not limited to the embodiments. Those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present invention. All such equivalent modifications or substitutions are included within the scope defined by the claims of the present invention.

Claims

1. A method for compressing and enhancing video, characterized in that it comprises: acquiring video data captured by a target camera and storing the video data in a storage pool; based on a preset video encoding protocol, decoding multiple low-resolution first video files from the storage pool, stacking and splicing the multiple first video files, and encoding them to obtain a high-resolution second video file; in response to a playback request of a target object, decoding the second video file to obtain several third video files; performing video enhancement on the third video files to obtain a target video file, and feeding the target video file back to the target object; wherein the video enhancement includes image inpainting processing, spatiotemporal super-resolution processing, and face enhancement processing; wherein, when the video enhancement includes face enhancement processing, the step of enhancing the third video files includes: using a pre-trained prior embedding network to learn and capture latent features and prior information in face images in order to perform face enhancement processing on the third video files; the prior embedding network is obtained through the following pre-training steps: embedding a generative adversarial network into a U-shaped dynamic neural network as a prior decoder to obtain a prior embedding network, wherein the generative adversarial network is generated based on a first face image; and training and adjusting the prior embedding network using a second face image to obtain the pre-trained prior embedding network; wherein the clarity of the first face image is greater than that of the second face image.

2. The video compression and enhancement method according to claim 1, characterized in that, The stacking and splicing of multiple first video files includes: The first video files are stacked and stitched together in the GPU using a merge function; The merging function includes a horizontal merging function and a vertical merging function.

3. The video compression and enhancement method according to claim 1, characterized in that, The method further includes: The second video file is stored in the storage pool, and multiple first video files used to encode the second video file are deleted from the storage pool.

4. The video compression and enhancement method according to claim 1, characterized in that the decoding of the second video file to obtain several third video files includes: based on a preset video encoding format, decoding the second video file in parallel using a video decoder engine, and then cropping the decoded result to obtain the several third video files.
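
The cropping step of claim 4 can be pictured as cutting each decoded mosaic frame back into equal tiles. The sketch below covers only that cropping, not the parallel decoding by a video decoder engine. The names `crop` and `split_grid`, the list-of-rows frame representation, and the uniform tile size are assumptions of the example.

```python
from typing import List

# Assumption: one grayscale frame is a list of rows of integer pixel values.
Frame = List[List[int]]


def crop(frame: Frame, top: int, left: int, height: int, width: int) -> Frame:
    """Return the height x width region whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in frame[top:top + height]]


def split_grid(mosaic: Frame, tile_h: int, tile_w: int) -> List[Frame]:
    """Cut a mosaic frame into tiles of tile_h x tile_w, in row-major order."""
    total_h, total_w = len(mosaic), len(mosaic[0])
    if total_h % tile_h or total_w % tile_w:
        raise ValueError("mosaic size must be a multiple of the tile size")
    return [
        crop(mosaic, top, left, tile_h, tile_w)
        for top in range(0, total_h, tile_h)
        for left in range(0, total_w, tile_w)
    ]


if __name__ == "__main__":
    mosaic = [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
    for tile in split_grid(mosaic, tile_h=2, tile_w=2):
        print(tile)
```

Because the tiles come back in row-major order, the i-th tile of every mosaic frame belongs to the same original camera stream, assuming the mosaic was built in that order.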

5. The video compression and enhancement method according to claim 1, characterized in that, when the video enhancement includes image inpainting processing, the step of performing video enhancement on the third video files includes: performing image inpainting processing on the third video files using a deep learning image inpainting model; wherein the image inpainting processing includes noise removal, crease removal, and color correction, and the deep learning image inpainting model is constructed based on a variational autoencoder.

6. The video compression and enhancement method according to claim 1, characterized in that, when the video enhancement includes spatiotemporal super-resolution processing, the step of performing video enhancement on the third video files includes:
performing spatial super-resolution on the third video files;
performing temporal super-resolution on the third video files;
wherein the spatial super-resolution increases the spatial resolution of the third video files, and the temporal super-resolution increases the video frame rate of the third video files.
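
To make the two axes of claim 6 concrete, the sketch below raises spatial resolution with nearest-neighbour upscaling and raises frame rate by inserting an averaged frame between each pair of neighbours. Both operations are deliberately simple stand-ins chosen for illustration, not the super-resolution models of the patent, and all function names are hypothetical.

```python
from typing import List

# Assumption: one grayscale frame is a list of rows of integer pixel values.
Frame = List[List[int]]


def upscale_nearest(frame: Frame, factor: int) -> Frame:
    """Spatial stand-in: repeat every pixel `factor` times in both directions."""
    out: Frame = []
    for row in frame:
        wide = [pixel for pixel in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def interpolate_frames(frames: List[Frame]) -> List[Frame]:
    """Temporal stand-in: insert the pixel-wise average between neighbours."""
    if not frames:
        return []
    out = [frames[0]]
    for prev, nxt in zip(frames, frames[1:]):
        mid = [[(a + b) // 2 for a, b in zip(row_a, row_b)]
               for row_a, row_b in zip(prev, nxt)]
        out.extend([mid, nxt])
    return out


def spatiotemporal_sr(frames: List[Frame], factor: int) -> List[Frame]:
    """Apply the spatial step to each frame, then the temporal step."""
    return interpolate_frames([upscale_nearest(f, factor) for f in frames])


if __name__ == "__main__":
    clip = [[[0, 10]], [[20, 30]]]  # two 1x2 frames
    for frame in spatiotemporal_sr(clip, factor=2):
        print(frame)
```

With n input frames, the temporal step returns 2n - 1 frames, so the frame rate roughly doubles for clips of the same duration.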

7. A video compression and enhancement device, characterized in that it comprises:
a first module, used to acquire video data captured by a target camera and store the video data in a storage pool;
a second module, used to decode multiple low-resolution first video files from the storage pool based on a preset video encoding protocol, stack and splice the multiple first video files, and encode them to obtain a high-resolution second video file;
a third module, used to decode the second video file in response to a playback request from a target object, and then crop the decoded result to obtain several third video files;
a fourth module, used to perform video enhancement on the third video files to obtain a target video file and feed the target video file back to the target object; wherein the video enhancement includes image inpainting processing, spatiotemporal super-resolution processing, and face enhancement processing;
wherein, when the video enhancement includes face enhancement processing, the step of performing video enhancement on the third video files includes:
performing face enhancement processing on the third video files using a pre-trained prior embedding network;
wherein the prior embedding network is obtained through the following pre-training steps:
embedding a generative adversarial network into a U-shaped dynamic neural network to obtain a prior embedding network, wherein the generative adversarial network is generated based on a first face image;
training and adjusting the prior embedding network using a second face image to obtain the pre-trained prior embedding network, wherein the clarity of the first face image is greater than that of the second face image.

8. An electronic device, characterized in that it comprises a processor and a memory; the memory is used to store a program; and the processor executes the program to implement the method according to any one of claims 1 to 6.

9. A computer storage medium storing a processor-executable program, characterized in that the processor-executable program, when executed by a processor, implements the method according to any one of claims 1 to 6.