A video processing method, apparatus, device, medium and program product
By using a first model to generate multiple video enhancement strategies and a second model to evaluate the quality of the resulting enhanced videos, the optimal video is selected automatically. This solves the problem that manual selection in existing technologies is time-consuming and labor-intensive, and achieves video enhancement with optimal global image quality.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- BEIJING ZITIAO NETWORK TECH CO LTD
- Filing Date
- 2026-03-16
- Publication Date
- 2026-06-19

Figure CN122244648A_ABST
Abstract
Description
Technical Field
[0001] This document relates to the field of computer technology, and in particular to a video processing method, apparatus, device, medium, and program product.
Background Technology
[0002] With the rapid development of computer technology, there is a growing need to enhance videos to improve their quality. Current video processing methods rely on human experience to select appropriate enhancement methods and to perform the enhancement, which is time-consuming and labor-intensive and cannot reliably guarantee the quality of the processed video.
Summary of the Invention
[0003] This document provides a video processing method, apparatus, device, medium, and program product that enhance videos automatically, without human intervention, and effectively guarantee the video processing effect.
[0004] In one embodiment, this document provides a video processing method, including: obtaining a first video; obtaining a first analysis result, which is the content analysis result of the first video; obtaining multiple first video enhancement strategies, which are obtained by a first model based on the first video and the first analysis result, each of the multiple first video enhancement strategies including at least one video enhancement method and the enhancement processing order corresponding to the at least one video enhancement method; obtaining multiple second videos, which are obtained by enhancing the first video based on the multiple first video enhancement strategies; obtaining a first quality assessment result, which includes the quality assessment result of each of the multiple second videos, the quality assessment result of a second video being obtained by using a second model to assess the quality of that second video; and determining a third video from the multiple second videos based on the first quality assessment result.
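The six steps of [0004] can be sketched as a single orchestration function. This is a minimal illustration, not the claimed implementation: the names `process_video`, `analyze`, `first_model`, `enhance`, and `second_model` are hypothetical stand-ins passed in as callables, a video is assumed to be a list of frames, and a strategy is assumed to be an ordered list of method names. The only fixed assumption about the second model is that a higher score means better quality.

```python
from typing import Callable, List

Video = List[object]   # hypothetical: a video is a list of frames
Strategy = List[str]   # hypothetical: an ordered list of enhancement-method names


def process_video(
    first_video: Video,
    analyze: Callable[[Video], dict],
    first_model: Callable[[Video, dict], List[Strategy]],
    enhance: Callable[[Video, Strategy], Video],
    second_model: Callable[[Video], float],
) -> Video:
    """Sketch of steps S210-S260; every callable is a hypothetical stand-in."""
    first_analysis = analyze(first_video)                           # S220
    strategies = first_model(first_video, first_analysis)           # S230
    second_videos = [enhance(first_video, s) for s in strategies]   # S240
    scores = [second_model(v) for v in second_videos]               # S250
    best = max(range(len(second_videos)), key=scores.__getitem__)   # S260
    return second_videos[best]                                      # third video
```

Keeping the models as injected callables mirrors the module split in [0005]: each module can be swapped without touching the control flow.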
[0005] In one embodiment, this document also provides a video processing apparatus, comprising: a first video acquisition module, configured to acquire a first video; an analysis result acquisition module, configured to obtain a first analysis result, which is the content analysis result of the first video; an enhancement strategy acquisition module, configured to acquire multiple first video enhancement strategies, which are obtained by a first model based on the first video and the first analysis result, each of the multiple first video enhancement strategies including at least one video enhancement method and the enhancement processing order corresponding to the at least one video enhancement method; a second video acquisition module, configured to acquire multiple second videos, which are obtained by enhancing the first video based on the multiple first video enhancement strategies; a quality assessment result acquisition module, configured to obtain a first quality assessment result, which includes the quality assessment result of each of the multiple second videos, the quality assessment result of a second video being obtained by performing a quality assessment on that second video using a second model; and a third video determination module, configured to determine a third video from the multiple second videos based on the first quality assessment result.
[0006] In one embodiment, this document also provides an electronic device, comprising: one or more processors; and a storage device configured to store one or more programs, wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the video processing method described herein.
[0007] In one embodiment, this document also provides a storage medium containing computer-executable instructions which, when executed by a computer processor, perform the video processing method described herein.
[0008] In one embodiment, this document also provides a computer program product, including a computer program which, when executed by a processor, implements the video processing method described herein.
[0009] A first model is used to obtain multiple first video enhancement strategies based on a first video and a first analysis result. Each first video enhancement strategy includes all of the video enhancement methods required for the first video and the order in which they are applied, so the first model performs long-chain, multi-strategy prediction. Enhancing the first video with each of these strategies yields multiple second videos. A second model then evaluates the quality of each second video to produce a first quality assessment result, from which the third video with the best processing effect is determined among the second videos. The entire process requires no manual intervention, achieving automatic video enhancement. Furthermore, the collaboration between the first and second models yields a third video with the best global image quality, effectively guaranteeing the video processing effect.
Attached Figure Description
[0010] The above and other features, advantages, and aspects of the embodiments described herein will become more apparent when taken in conjunction with the accompanying drawings and the following detailed description. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic, and components and elements are not necessarily drawn to scale.
[0011] Figure 1 is a schematic structural diagram of a video processing system according to one embodiment; Figure 2 is a flowchart of a video processing method according to one embodiment; Figure 3 is an example diagram of a video processing procedure according to one embodiment; Figure 4 is an example diagram of a sample data determination process according to one embodiment; Figure 5 is a flowchart of another video processing method according to one embodiment; Figure 6 is an example diagram of another video processing procedure according to one embodiment; Figure 7 is a schematic structural diagram of a video processing apparatus according to one embodiment; Figure 8 is a schematic structural diagram of an electronic device according to one embodiment.
Detailed Implementation
[0012] The embodiments will now be described in more detail with reference to the accompanying drawings. While some embodiments are shown in the drawings, it should be understood that this document can be implemented in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough and complete understanding of the technical solutions. It should be understood that the accompanying drawings and embodiments are for illustrative purposes only and are not intended to limit the scope of protection of the technical solutions.
[0013] It should be understood that the steps described in the method implementation may be performed in different orders and / or in parallel. Furthermore, the method implementation may include additional steps and / or omit the steps shown. The scope of this document is not limited in this respect.
[0014] The term "comprising" and its variations as used herein are open-ended, meaning "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Definitions of other terms are given in the following description.
[0015] It should be noted that the concepts of "first" and "second" mentioned are only used to distinguish different devices, modules or units, and are not used to limit the order of the functions performed by these devices, modules or units or their interdependencies.
[0016] It should be noted that the modifiers "one" and "multiple" are illustrative rather than restrictive, and those skilled in the art should understand that, unless the context expressly indicates otherwise, they should be understood as "one or more".
[0017] The names of messages or information exchanged between the various devices in this document are for illustrative purposes only and are not intended to limit the scope of these messages or information.
[0018] It is understandable that the data involved in the technical solution (including but not limited to the data itself, the acquisition or use of the data) should comply with the requirements of relevant laws, regulations and related provisions.
[0019] The technical solution described herein can be applied to the video processing system shown in Figure 1. In practice, this system may include a client 101 and a server 102. The client 101 may include, but is not limited to, personal mobile terminals such as smartphones and tablets, personal computers, or other client terminals. Various applications, such as media content publishing applications and session applications, are deployed on the client 101. The server 102 may be one or more servers providing various interfaces; that is, it can be implemented as a distributed server cluster composed of multiple servers or as a single server. Furthermore, it can be a server in a distributed system, a server integrating blockchain technology, a cloud server, or an intelligent cloud computing server or intelligent cloud host on which machine learning models are deployed.
[0020] In the technical solution described herein, client 101 can interact with server 102, for example by receiving or sending data. For example, server 102 can receive from client 101 a video that needs enhancement, enhance it to obtain a video with better image quality, and send the enhanced video back to client 101 for further processing such as playback. Client 101 and server 102 exchange data and cooperate functionally via network communication.
[0021] In the technical solution described herein, the video processing method can also be executed entirely on the client 101 or on the server 102. For instance, the video to be enhanced can be imported into the client 101 or the server 102, which then enhances the imported video. It should be understood that the numbers of clients and servers shown in Figure 1 are merely illustrative; any number of clients and servers can be configured as required by a specific implementation.
[0022] Figure 2 is a flowchart of a video processing method according to one embodiment, applicable to enhancing distorted video. The method can be executed by a video processing apparatus, which can be implemented in software and/or hardware and, optionally, configured in an electronic device such as a mobile terminal, PC, or server. As shown in Figure 2, the video processing method may specifically include the following steps:
S210. Obtain the first video.
[0023] The first video can refer to any video that requires image quality enhancement. The first video can be any video exhibiting image quality distortion. The first video can be a distorted video with a single type of distortion, or a distorted video with multiple types of distortion superimposed on each other.
[0024] For example, the first video may include, but is not limited to, one of the following: computer graphics video, user-generated content video, and film and television video.
[0025] Computer Graphics Video (CG) refers to videos generated or synthesized by computer software; it is also known as 3D animation video or digitally composited video. For example, CG videos can include anime videos, animated videos, and special effects videos. User-Generated Content (UGC) videos are created, shared, or published by ordinary users rather than professional media organizations. UGC videos are often affected by a combination of distortions such as noise, blur, and compression. Film and television videos can be videos shot and produced using professional equipment. For example, film and television videos can include movie videos, short drama videos, and feature-length drama videos.
[0026] S220. Obtain the first analysis result, which is the content analysis result of the first video.
[0027] The first analysis result is obtained by analyzing the video content of the first video. For example, as shown in Figure 3, the first video is input into a video analysis module, which analyzes its content to obtain the first analysis result. The video analysis module can analyze the video in terms of scene content and/or image quality distortion to obtain scene content information and/or image quality distortion information.
[0028] For example, the first analysis result may include at least one of scene content information and image quality distortion information; wherein, the scene content information includes the content type of the first video and / or whether there is preset information in the first video; the image quality distortion information includes each type of image quality distortion present in the first video and the degree of distortion corresponding to the type of image quality distortion.
[0029] The content type of the first video is the video type to which its content belongs. It can include, but is not limited to, computer graphics videos, user-generated content videos, and film/television videos. Different content types can have different image quality requirements: for example, computer graphics videos have fewer scene textures and emphasize sharpness, whereas film/television videos have richer textures and emphasize texture. Preset information is information, defined in advance, that needs to be taken into account in the video content; for example, it can include specified text and images. Video image quality can suffer from multiple distortion types, including, but not limited to, at least one of blur, noise, sharpness, artifacts, and luminance. The degree of distortion for each type characterizes how severely the first video is distorted in that respect. At least one image quality distortion type can be present in the first video.
[0030] It should be understood that the video analysis module can analyze the first video using preset scene analysis methods to obtain its scene content information. In addition, the first video can be checked with the distortion detection method corresponding to each image quality distortion type to determine whether that type of distortion is present and, if so, its degree, thereby obtaining the image quality distortion information of the first video.
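The structure of a first analysis result described in [0028] to [0030] can be sketched as a small record type. This is an illustrative assumption only: the class `AnalysisResult`, the helper `build_analysis`, the degree scale of 0 to 1, and the `threshold` cut-off for deciding that a distortion is "present" are all hypothetical; the per-type detectors themselves are represented only by their outputs.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class AnalysisResult:
    """Hypothetical container for a 'first analysis result' ([0028])."""
    content_type: Optional[str] = None   # e.g. "CG", "UGC", "film_tv"
    has_preset_info: bool = False        # e.g. specified text or images present
    # distortion type -> degree (assumed 0..1); only types actually present are kept
    distortions: Dict[str, float] = field(default_factory=dict)


def build_analysis(content_type: str, has_preset_info: bool,
                   detector_outputs: Dict[str, float],
                   threshold: float = 0.1) -> AnalysisResult:
    """Keep only distortion types whose detected degree reaches the threshold.

    detector_outputs maps each distortion type to the degree reported by its
    (hypothetical) per-type detector; threshold is an assumed cut-off.
    """
    present = {k: v for k, v in detector_outputs.items() if v >= threshold}
    return AnalysisResult(content_type, has_preset_info, present)
```

For example, `build_analysis("UGC", False, {"blur": 0.6, "noise": 0.05})` keeps only the blur entry under the assumed threshold.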
[0031] S230. Obtain multiple first video enhancement strategies. The multiple first video enhancement strategies are obtained by using a first model based on the first video and the first analysis results. Each first video enhancement strategy includes: at least one video enhancement method and the enhancement processing order corresponding to at least one video enhancement method.
[0032] The first model can be a model that predicts and generates multiple enhancement strategies for a video; it can be a neural network model or an agent. A first video enhancement strategy is an enhancement strategy that matches the first video. The number of first video enhancement strategies is the number of strategies that the first model generates at once, which can be a preset number. For example, if the preset number is three, the first model obtains three first video enhancement strategies at once. A video enhancement method can be any method used to improve video quality, including, but not limited to: video super-resolution, video denoising, video deblurring, video frame interpolation, sharpening, brightness equalization, color enhancement, text enhancement, dark scene enhancement, compression distortion removal, icon removal, subtitle removal, water ripple restoration, vertical-to-horizontal screen conversion, and black-and-white colorization. Video enhancement methods can correspond one-to-one with image quality distortion types; for example, if the distortion type is blur, the corresponding video enhancement method is deblurring. The video enhancement methods included in each first video enhancement strategy are all of the methods that the first video needs. The enhancement processing order specifies the execution order of all video enhancement methods in the strategy. Different orders may produce different results: for example, removing artifacts before super-resolution yields smooth, clear edges, while performing super-resolution before artifact removal may amplify the artifacts.
Each first video enhancement strategy can be a sequence of video enhancement methods consisting of at least one video enhancement method.
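The claim that processing order matters can be checked on a toy one-dimensional signal. In this sketch, which is an illustration and not the patent's processing, a 3-tap median filter stands in for artifact removal and nearest-neighbour 2x upsampling stands in for super-resolution; the names `remove_artifacts`, `upscale2x`, `METHODS`, and `apply_strategy` are hypothetical. A single spike plays the role of a compression artifact: filtering first removes it, while upscaling first widens it so the filter can no longer remove it.

```python
from functools import reduce
from typing import Callable, Dict, List, Sequence

Signal = List[int]


def remove_artifacts(x: Signal) -> Signal:
    """3-tap median filter with edge replication (toy artifact removal)."""
    padded = [x[0]] + list(x) + [x[-1]]
    return [sorted(padded[i:i + 3])[1] for i in range(len(x))]


def upscale2x(x: Signal) -> Signal:
    """Nearest-neighbour 2x upsampling (toy super-resolution)."""
    return [v for v in x for _ in range(2)]


METHODS: Dict[str, Callable[[Signal], Signal]] = {
    "artifact_removal": remove_artifacts,
    "super_resolution": upscale2x,
}


def apply_strategy(x: Signal, strategy: Sequence[str]) -> Signal:
    """Apply the methods of a strategy strictly in the listed order."""
    return reduce(lambda acc, name: METHODS[name](acc), strategy, list(x))


spike = [0, 0, 9, 0, 0]  # the 9 stands in for a compression artifact
a = apply_strategy(spike, ["artifact_removal", "super_resolution"])
b = apply_strategy(spike, ["super_resolution", "artifact_removal"])
```

Here `a` is all zeros, while `b` still contains a two-sample artifact, which is the toy counterpart of "super-resolution before artifact removal may amplify the artifacts".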
[0033] It should be understood that the first video and the first analysis result can be input into the first model. Based on these inputs, the first model selects from all possible video enhancement strategies and outputs the preset number of strategies that best match the first video, thereby obtaining multiple first video enhancement strategies matching the first video. As shown in Figure 3, the first model can output three different first video enhancement strategies based on the input first video and content analysis result.
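One way to read "outputs the preset number of strategies that best match the first video" is as a top-k selection over candidate strategies. The sketch below assumes, hypothetically, that the first model exposes a matching score per candidate strategy; the dictionary `match_scores` and the function `top_k_strategies` are illustrative names, and a real first model might instead generate the strategies directly.

```python
import heapq
from typing import Dict, List, Tuple

Chain = Tuple[str, ...]


def top_k_strategies(match_scores: Dict[Chain, float], k: int = 3) -> List[Chain]:
    """Return the k candidate strategies with the highest matching degree.

    match_scores is a hypothetical stand-in for the first model's internal
    scoring of each candidate strategy against the first video.
    """
    return heapq.nlargest(k, match_scores, key=match_scores.__getitem__)
```

With the preset number k = 3 from [0032], this returns the three best-matching strategies in descending order of score.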
[0034] S240. Obtain multiple second videos, which are obtained by enhancing the first video based on multiple first video enhancement strategies.
[0035] It should be understood that a second video is a video obtained after image quality enhancement. As shown in Figure 3, each first video enhancement strategy and the first video are input into an enhancement processing module, which enhances the first video with each strategy to obtain multiple enhanced second videos. The first video enhancement strategies and the second videos can correspond one-to-one, and different first video enhancement strategies can yield second videos of different image quality.
[0036] For example, step S240 may include: for each of the plurality of first video enhancement strategies, performing enhancement processing on each video frame in the first video based on each video enhancement method and enhancement processing order in the first video enhancement strategy to obtain a second video, the second video corresponding to the first video enhancement strategy.
[0037] For each of the first video enhancement strategies, the first video is enhanced according to the enhancement processing order within that strategy, using each video enhancement method within that strategy. This means that each video frame in the first video undergoes the same enhancement process, resulting in a second video enhanced using the same first video enhancement strategy. For example, if the first video enhancement strategy is: sharpening → denoising → deblurring, then the first video is first sharpened, then denoised, and finally deblurred. The deblurred first video is then used as the second video.
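Steps [0035] to [0037] can be sketched as follows: every frame passes through the same ordered chain, and each strategy produces exactly one second video. The names `enhance_video` and `second_videos` are hypothetical, and a toy "frame" here is just a list that records which methods touched it, so the sharpening → denoising → deblurring order of [0037] is visible in the output.

```python
from functools import reduce
from typing import Callable, Dict, List, Sequence

Frame = List[str]                 # toy frame: records which methods touched it
Method = Callable[[Frame], Frame]


def enhance_video(frames: List[Frame], chain: Sequence[Method]) -> List[Frame]:
    """Run every frame through the same chain, first method first ([0036])."""
    return [reduce(lambda f, m: m(f), chain, frame) for frame in frames]


def second_videos(frames: List[Frame],
                  strategies: Dict[str, Sequence[Method]]) -> Dict[str, List[Frame]]:
    """One second video per first video enhancement strategy (one-to-one, [0035])."""
    return {name: enhance_video(frames, chain) for name, chain in strategies.items()}
```

Keying the result by strategy name keeps the one-to-one mapping explicit, which later lets the selected third video be traced back to the strategy that produced it.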
[0038] S250. Obtain the first quality assessment result, which includes the quality assessment result of each of the multiple second videos. The quality assessment result of the second videos is obtained by using the second model to assess the quality of the second videos.
[0039] The second model can be a model used to evaluate video quality. It can also be a neural network model or an intelligent agent. The quality evaluation result of each second video can be represented by a quality score; for example, a higher quality score indicates higher image quality in the second video.
[0040] It should be understood that, as shown in Figure 3, each second video can be input into the second model, which evaluates its quality and outputs its quality assessment result, thereby obtaining the first quality assessment result.
[0041] S260. Based on the first quality assessment results, determine the third video from multiple second videos.
[0042] The third video can refer to the second video with the best enhancement effect. The third video with the best quality is obtained from all the second videos by comparing the results of the first quality assessment.
[0043] For example, the third video is the second video whose quality assessment result is the best in the first quality assessment result. It should be understood that the quality assessment results of the second videos are compared and the second video with the best result (e.g., the highest quality score) is selected as the third video. As shown in Figure 3, second video (3) is used as the final enhanced third video, which has the globally best image quality enhancement effect.
[0044] It should be noted that the first video enhancement strategy encompasses all video enhancement methods needed for the first video, forming a long-chain strategy. By contrast, a step-wise method predicts only the best single enhancement method for the current step, processes the first video with it, and then predicts the next method for the processed video. Such a method considers only the locally optimal effect of each step and ignores the long-chain correlations among combinations of video enhancement methods, so it may overlook the optimal enhancement strategy and fail to achieve the globally optimal image quality. The long-chain multi-strategy prediction implemented by the first model avoids this problem and achieves the globally optimal processing effect.
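The contrast in [0044] between step-wise and long-chain selection can be illustrated with a toy quality table. The scores below are invented for illustration only (they are not data from the source), chosen so that super-resolution looks best as a first step even though the best full chain starts with artifact removal. `greedy` and `global_search` are hypothetical names; the exhaustive search stands in for the idea of judging whole chains and is not how the first model is claimed to work at inference time.

```python
from itertools import permutations
from typing import Dict, List, Tuple

Seq = Tuple[str, ...]

# Hypothetical quality of the video after applying each ordered chain.
TOY_SCORES: Dict[Seq, float] = {
    (): 0.0,
    ("super_resolution",): 0.6,
    ("artifact_removal",): 0.4,
    ("super_resolution", "artifact_removal"): 0.5,  # artifacts amplified first
    ("artifact_removal", "super_resolution"): 0.9,
}


def greedy(methods: List[str], score: Dict[Seq, float]) -> Seq:
    """Step-wise: pick the locally best next method, stop when nothing improves."""
    chosen: Seq = ()
    while True:
        options = [chosen + (m,) for m in methods if m not in chosen]
        best = max(options, key=lambda s: score[s], default=None)
        if best is None or score[best] <= score[chosen]:
            return chosen
        chosen = best


def global_search(methods: List[str], score: Dict[Seq, float]) -> Seq:
    """Long-chain: score every ordered chain of every length, keep the best."""
    chains = [p for n in range(1, len(methods) + 1) for p in permutations(methods, n)]
    return max(chains, key=lambda s: score[s])
```

On this table the step-wise search stops at super-resolution alone (0.6), while scoring whole chains finds artifact removal followed by super-resolution (0.9).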
[0045] The above method employs a first model to obtain multiple first video enhancement strategies based on a first video and first analysis results. Each first video enhancement strategy includes all video enhancement methods required for the first video and the enhancement processing order of these methods, thus achieving long-chain multi-strategy prediction using the first model. By enhancing the first video based on these multiple first video enhancement strategies, multiple second videos are obtained. A second model is then used to evaluate the quality of each second video, obtaining a first quality evaluation result. Based on this first quality evaluation result, a third video with the optimal video processing effect can be determined from the multiple second videos. The entire process requires no manual intervention, achieving automatic video enhancement. Furthermore, the collaborative processing of the first and second models yields the third video with the best global image quality, effectively ensuring the video processing effect.
[0046] In one embodiment, the first model is trained using first sample data and the second model is trained using second sample data. The first sample data includes a second analysis result and a video enhancement strategy label corresponding to a fourth video. The video enhancement strategy label is determined from multiple second video enhancement strategies based on a second quality assessment result; the second quality assessment result includes the actual quality assessment result of each of multiple fifth videos; the fifth videos are obtained by enhancing the fourth video based on the multiple second video enhancement strategies; and the second video enhancement strategies are obtained by traversing and combining multiple video enhancement methods. The second analysis result is the content analysis result of the fourth video. The second sample data includes a sixth video and the actual quality assessment result of the sixth video, where the sixth video is the fifth video corresponding to the video enhancement strategy label.
[0047] The fourth video can be any sample video used to train the model. The fourth video can be a real, distorted video. The second analysis result is obtained in the same way as the first analysis result, as described above, and will not be repeated here. The video enhancement strategy label can be the actual video enhancement strategy that matches the fourth video, serving as the output label during supervised training of the first model. The fifth video is the video obtained by enhancing the fourth video using a second video enhancement strategy. The sixth video is the video obtained by enhancing the fourth video using the video enhancement strategy label. The actual quality assessment result can be a quality score obtained through subjective image quality evaluation of the video, such as manual labeling. The actual quality assessment result can serve as the output label during supervised training of the second model.
[0048] Specifically, as shown in Figure 4, the first and second sample data can be determined as follows. All video enhancement methods are traversed and combined, and each combination is used as a second video enhancement strategy, yielding all possible second video enhancement strategies. Each second video enhancement strategy includes at least one video enhancement method and the enhancement processing order of those methods. For each second video enhancement strategy, the fourth video is enhanced accordingly and the result is used as a fifth video, so that there is one fifth video per second video enhancement strategy. The subjective image quality of each fifth video can be evaluated manually to obtain its actual quality assessment result, such as an actual quality score. Based on these actual quality assessment results, the video enhancement strategy label for the fourth video is determined from all second video enhancement strategies. For example, all fifth videos are sorted in descending order of their actual quality assessment results, the top preset number of fifth videos are taken, and their corresponding second video enhancement strategies are used as the video enhancement strategy labels, giving a preset number of second video enhancement strategies that match the fourth video. The first sample data is obtained by combining the fourth video, the second analysis result (i.e., the content analysis result of the fourth video), and the video enhancement strategy labels of the fourth video. These top-ranked fifth videos can be used as sixth videos, and their actual quality assessment results as the actual quality assessment results of the sixth videos.
The second sample data is obtained by combining the sixth video and its actual quality assessment result.
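The sample-construction procedure of [0048] can be sketched as: enumerate every ordered chain of distinct methods, enhance the fourth video with each, rate the results, and keep the top-k chains as labels and their videos as sixth videos. This is a minimal sketch under stated assumptions: "traversing and combining" is taken to mean all ordered chains of 1 to n distinct methods, and `enhance` and `rate` are hypothetical callables standing in for the enhancement module and for manual subjective scoring.

```python
from itertools import permutations
from typing import Callable, List, Sequence, Tuple

Chain = Tuple[str, ...]


def all_chains(methods: Sequence[str]) -> List[Chain]:
    """Every ordered chain of 1..len(methods) distinct methods (assumed traversal)."""
    return [c for n in range(1, len(methods) + 1) for c in permutations(methods, n)]


def build_samples(fourth_video, analysis, methods: Sequence[str],
                  enhance: Callable, rate: Callable, k: int = 3):
    """Return (first_sample, second_samples) for one training video.

    enhance(video, chain) and rate(video) are hypothetical stand-ins for the
    enhancement module and for manual subjective scoring respectively.
    """
    fifth = {c: enhance(fourth_video, c) for c in all_chains(methods)}
    actual = {c: rate(v) for c, v in fifth.items()}
    top = sorted(actual, key=actual.__getitem__, reverse=True)[:k]
    first_sample = (fourth_video, analysis, top)          # labels = top-k strategies
    second_samples = [(fifth[c], actual[c]) for c in top]  # sixth videos + actual scores
    return first_sample, second_samples
```

Note the combinatorial cost: with three methods there are already 15 chains, each of which would need a manual rating, so in practice the method list per video is likely kept short.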
[0049] For example, a first model to be trained undergoes supervised training on the first sample data to obtain a first model capable of accurately generating long-chain multi-strategy predictions. For instance, the fourth video and the second analysis result are input into the first model to be trained, which generates multiple video enhancement strategies. A training error is computed with a loss function from the output of the first model to be trained and the video enhancement strategy labels of the fourth video. The training error is then backpropagated through the first model to be trained, and its model parameters are adjusted until a preset convergence condition is met, such as the number of iterations reaching a preset number or the change in training error stabilizing. Training is then considered complete, and the trained first model is obtained.
[0050] For example, a second model to be trained undergoes supervised training on the second sample data to obtain a second model capable of accurately evaluating video quality. For instance, a sixth video is input into the second model to be trained for video quality assessment. A training error is computed with a loss function from the output of the second model and the actual quality assessment result of the sixth video. This training error is then backpropagated through the second model to be trained, and its model parameters are adjusted until a preset convergence condition is met, such as the number of iterations reaching a preset number or the training error stabilizing. Training is then considered complete, and the trained second model is obtained.
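The training loop of [0050], including its two convergence conditions (an iteration cap and a stabilizing training error), can be illustrated with a deliberately small stand-in. This sketch swaps in a linear regressor with mean squared error trained by plain gradient descent; the source does not specify the model architecture or loss, so the name `train_quality_regressor`, the feature vectors, the learning rate, and the tolerance are all hypothetical.

```python
from typing import List, Sequence, Tuple


def train_quality_regressor(
    samples: Sequence[Tuple[Sequence[float], float]],
    lr: float = 0.1,
    max_iters: int = 5000,
    tol: float = 1e-10,
) -> Tuple[List[float], float]:
    """Fit score ~= w.x + b by gradient descent on mean squared error.

    samples: (hypothetical feature vector of a sixth video, its actual quality score).
    Stops at max_iters or when the loss change falls below tol (cf. [0050]).
    """
    dim = len(samples[0][0])
    n = len(samples)
    w, b = [0.0] * dim, 0.0
    prev = float("inf")
    for _ in range(max_iters):
        grad_w, grad_b, loss = [0.0] * dim, 0.0, 0.0
        for x, y in samples:
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y  # prediction error
            loss += err * err / n
            for i in range(dim):
                grad_w[i] += 2 * err * x[i] / n
            grad_b += 2 * err / n
        if abs(prev - loss) < tol:  # training error has stabilized
            break
        prev = loss
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]  # step against the gradient
        b -= lr * grad_b
    return w, b
```

A neural second model would replace the hand-written gradient with backpropagation through the network, but the outer loop and stopping rule keep the same shape.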
[0051] It should be noted that since the sample videos used to train the first and second models are real distorted videos, rather than synthetic distorted videos, the generalization and robustness of the models in different real distorted scenarios are improved, thereby ensuring the video processing effect.
[0052] In one embodiment, the method further includes: storing historical strategy information, which includes the first analysis result and a third video enhancement strategy, the third video enhancement strategy being the first video enhancement strategy used to produce the third video; and optimizing the first model based on the historical strategy information.
[0053] The third video enhancement strategy is the globally optimal first video enhancement strategy. For example, historical strategy information could include rules such as "CG videos require desharpening before super-resolution" or "UGC videos require desharpening before deblurring". It should be understood that each video processing run yields a content analysis result and an optimal video enhancement strategy, which together form a piece of historical strategy information that can be stored in a first database. Historical strategy information thus records the experience gained each time strategies are generated, and the first database can serve as an experience database for the first model. When generating multiple video enhancement strategies, the first model can adjust and optimize the generated strategies based on the historical strategy information in the first database, making its output strategies more accurate and further improving the video processing effect.
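A first database of historical strategy information, as described in [0052] and [0053], could be as simple as a mapping from a coarse description of the analysis result to the strategies that won before. In this sketch the class `StrategyHistory`, its key (content type plus the set of distortion types present), and the "most recent first" ordering of hints are all assumptions made for illustration; how the strategy-generating model actually consumes these hints is not specified in the source.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Key = Tuple[str, Tuple[str, ...]]


class StrategyHistory:
    """Hypothetical 'first database' of (analysis -> chosen strategy) records."""

    def __init__(self) -> None:
        self._records: Dict[Key, List[Tuple[str, ...]]] = defaultdict(list)

    @staticmethod
    def key(content_type: str, distortions: Dict[str, float]) -> Key:
        # Group by content type and the set of distortion types present.
        return content_type, tuple(sorted(distortions))

    def store(self, content_type: str, distortions: Dict[str, float],
              strategy: Sequence[str]) -> None:
        self._records[self.key(content_type, distortions)].append(tuple(strategy))

    def hints(self, content_type: str, distortions: Dict[str, float]) -> List[Tuple[str, ...]]:
        """Previously optimal strategies for similar videos, most recent first."""
        return list(reversed(self._records.get(self.key(content_type, distortions), [])))
```

Ignoring the distortion degrees in the key is a deliberate simplification: two CG videos with the same kinds of distortion share experience even if their severities differ.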
[0054] Figure 5 is a flowchart of another video processing method provided in one embodiment, which optimizes the video quality assessment process on the basis of the above embodiments. Terms that are the same as or correspond to those in the above embodiments are not explained again here. As shown in Figure 5, the video processing method may specifically include the following steps:
S510. Acquire the first video.
[0055] S520. Obtain the first analysis result, which is the content analysis result of the first video.
[0056] S530. Obtain multiple first video enhancement strategies, which are obtained by using the first model based on the first video and the first analysis result. Each first video enhancement strategy includes at least one video enhancement method and the enhancement processing order corresponding to the at least one video enhancement method.
[0057] S540. Obtain multiple second videos, which are obtained by enhancing the first video based on multiple first video enhancement strategies.
[0058] S550. Obtain a video quality assessment method, which is obtained by using a third model based on the first analysis result and which matches the first video.
[0059] The third model can be a model used to determine the video quality assessment method that matches the first video. The third model can be a neural network model or an intelligent agent; alternatively, the second and third models can be combined into the same intelligent agent.
[0060] For example, a video quality assessment method may include at least one video quality assessment metric and an assessment weight corresponding to each metric. A video quality assessment metric can be any indicator used to evaluate video quality, and different metrics can correspond to different quality dimensions of the video. Such metrics may include, but are not limited to:
- NIQE (Natural Image Quality Evaluator);
- HyperIQA (hypernetwork-based image quality assessment);
- CLIP-IQA (CLIP-based image quality assessment), where CLIP (Contrastive Language-Image Pre-training) is a multimodal foundation model.

NIQE is a no-reference image quality metric: it does not require a clean original image as a reference. It assesses image quality objectively by extracting statistical features of natural images and is mainly used to evaluate sharpness, distortion, and naturalness. Lower NIQE scores generally indicate better visual quality.

HyperIQA is an image quality assessment method based on deep learning and a hypernetwork architecture. It builds a quality-aware network that models image content, distortion type, and distortion degree, and it outputs an objective quality score that is highly consistent with human subjective perception. Higher HyperIQA scores indicate better quality.

CLIP-IQA is an image quality assessment method based on the multimodal pre-trained CLIP model. It uses CLIP's semantic understanding and visual feature extraction to assess image quality along dimensions such as sharpness, texture naturalness, and color fidelity, producing objective scores consistent with human subjective evaluation.

The assessment weight of each metric represents that metric's share in the evaluation of the quality of the first video.
[0061] Referring to Figure 6, the first analysis result is input into the third model to determine the video quality assessment method. Based on this input, the third model selects at least one suitable video quality assessment metric from a pre-configured set of metrics, assigns an appropriate assessment weight to each selected metric, and outputs the video quality assessment method corresponding to the first video. For example, the selected metrics may be NIQE, HyperIQA, and CLIP-IQA, with assessment weights of 0.7, 0.3, and 1, respectively. The third model adapts the assessment method to the video content analysis result, so the method is adjusted dynamically to meet the differing needs of different video scenarios. For example, CG videos emphasize sharpness, while film and television videos emphasize texture. This further improves the accuracy of video quality assessment.
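One way a second model could turn a video quality assessment method (metrics plus weights) into a single score is a normalized weighted average, sketched below. The normalization ranges, `normalize`, and `weighted_quality` are assumptions for illustration only. The directions follow [0060], where lower NIQE and higher HyperIQA indicate better quality. CLIP-IQA is also treated as higher-is-better. The weights 0.7, 0.3, and 1 are the example given in [0061].

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical normalization ranges: (low, high, higher_is_better).
# Real score ranges depend on the specific metric implementation.
METRIC_SPECS: Dict[str, Tuple[float, float, bool]] = {
    "NIQE": (0.0, 10.0, False),      # lower NIQE = better
    "HyperIQA": (0.0, 100.0, True),  # higher HyperIQA = better
    "CLIP-IQA": (0.0, 1.0, True),    # higher CLIP-IQA = better
}


@dataclass
class AssessmentMethod:
    """A video quality assessment method: metric name -> assessment weight."""
    weights: Dict[str, float]


def normalize(metric: str, value: float) -> float:
    """Map a raw metric score to [0, 1], where 1 always means better quality."""
    low, high, higher_is_better = METRIC_SPECS[metric]
    x = min(max((value - low) / (high - low), 0.0), 1.0)
    return x if higher_is_better else 1.0 - x


def weighted_quality(method: AssessmentMethod, raw_scores: Dict[str, float]) -> float:
    """Weighted average of normalized metric scores for one second video."""
    total_weight = sum(method.weights.values())
    if total_weight <= 0:
        raise ValueError("assessment weights must sum to a positive value")
    weighted = sum(w * normalize(m, raw_scores[m]) for m, w in method.weights.items())
    return weighted / total_weight
```

Without normalization, a raw weighted sum would be dominated by whichever metric has the largest numeric range.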
[0062] It should be noted that step S550 can be executed after step S540, before step S540, or simultaneously with step S540. The execution order of step S550 is not limited here.
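S550 depends only on the first analysis result, while S540 depends on the first video and the enhancement strategies. The two steps can therefore overlap, as [0062] allows. Below is a minimal sketch using Python's standard `concurrent.futures` module. `enhance_all` and `determine_method` are hypothetical zero-argument callables that stand in for the two steps.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Tuple, TypeVar

V = TypeVar("V")  # result of S540 (the second videos)
M = TypeVar("M")  # result of S550 (the video quality assessment method)


def run_s540_and_s550(
    enhance_all: Callable[[], V], determine_method: Callable[[], M]
) -> Tuple[V, M]:
    """Run S540 and S550 concurrently; neither needs the other's output."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        videos_future = pool.submit(enhance_all)
        method_future = pool.submit(determine_method)
        # .result() blocks until each task finishes and re-raises any exception it raised
        return videos_future.result(), method_future.result()
```

A thread pool suits steps that mostly wait on external model services or GPU work. CPU-bound pure-Python steps would benefit more from a process pool.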
[0063] S560. Obtain the first quality assessment result, which includes the quality assessment result of each of the multiple second videos. The second model obtains the quality assessment result of each second video by assessing that video with the video quality assessment method.
[0064] Specifically, each of the multiple second videos is input into the second model. The second model assesses the quality of each input second video using the video quality assessment method and obtains a quality assessment result for each one. The video quality assessment method used by the second model is the one output by the third model. Because this method is more closely matched to the first video, the second model can assess video quality more accurately, which further ensures the video processing effect.
[0065] For example, the method further includes: storing historical evaluation information, which includes the first analysis result and the video quality assessment method, wherein the third model is optimized based on the historical evaluation information.
[0066] The video quality assessment method is the one that matches the first analysis result, and the historical evaluation information includes both the first analysis result and the video quality assessment method. For example, historical evaluation information might record the following:
- for CG videos, NIQE with weight 0.4, HyperIQA with weight 0.7, and CLIP-IQA with weight 1;
- for UGC videos, NIQE with weight 0.5, HyperIQA with weight 1, and CLIP-IQA with weight 0.2.

Each time a video is processed, its content analysis result and the corresponding video quality assessment method are obtained. Together they form the historical evaluation information for that video, which is stored in a second database. This information represents the experience accumulated each time a video quality assessment method is generated, and the second database can serve as the experience database of the third model. When generating a video quality assessment method, the third model can adjust and optimize it based on the historical evaluation information in the second database, so that it outputs a more suitable method. This further improves the accuracy of video quality assessment and the video processing effect.
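The following is a minimal sketch of how the second database in [0066] could supply a prior to the third model. It assumes records are keyed by content type and that the prior is simply the average weight of each metric across past methods. The class `EvaluationHistory` and the averaging rule are hypothetical; the text does not say how the third model uses the history. In the usage below, the first record repeats the CG example from [0066], and the second record is invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List


class EvaluationHistory:
    """Hypothetical 'second database' of past assessment methods per content type."""

    def __init__(self) -> None:
        self._records: Dict[str, List[Dict[str, float]]] = defaultdict(list)

    def record(self, content_type: str, weights: Dict[str, float]) -> None:
        """Store one (content analysis result, video quality assessment method) pair."""
        self._records[content_type].append(dict(weights))

    def prior_weights(self, content_type: str) -> Dict[str, float]:
        """Average weight per metric over past methods for this content type.

        A metric missing from a past record counts as weight 0 for that record.
        Returns an empty dict when there is no history for the content type.
        """
        records = self._records.get(content_type, [])
        if not records:
            return {}
        metrics = sorted({m for r in records for m in r})
        return {m: sum(r.get(m, 0.0) for r in records) / len(records) for m in metrics}
```

Usage: after `h.record("CG", {"NIQE": 0.4, "HyperIQA": 0.7, "CLIP-IQA": 1.0})` and `h.record("CG", {"NIQE": 0.6, "HyperIQA": 0.3, "CLIP-IQA": 1.0})`, `h.prior_weights("CG")` gives NIQE and HyperIQA a weight of about 0.5 each and CLIP-IQA a weight of 1.0.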
[0067] S570. Based on the first quality assessment result, determine the third video from the multiple second videos.
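Step S570 can be sketched as selecting the best-scoring candidate. The sketch assumes each second video carries a single aggregate score in which higher means better; that is an assumption, because individual metrics such as NIQE run the other way. It also returns the winning strategy, which is what [0052] stores as the third video enhancement strategy. The function name `select_third_video` is hypothetical.

```python
from typing import List, Tuple, TypeVar

Video = TypeVar("Video")
Strategy = Tuple[str, ...]  # ordered video enhancement methods


def select_third_video(
    candidates: List[Tuple[Strategy, Video, float]]
) -> Tuple[Video, Strategy]:
    """Pick the second video with the highest aggregate quality score.

    Each candidate is (first video enhancement strategy, second video, score).
    """
    if not candidates:
        raise ValueError("no second videos to choose from")
    strategy, video, _score = max(candidates, key=lambda c: c[2])
    return video, strategy
```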
[0068] The above method uses a third model to dynamically determine the video quality assessment method that matches the first video, so that the second model can perform video quality assessment on the second video based on the video quality assessment method to obtain the first quality assessment result. This achieves adaptive adjustment of the video quality assessment method to adapt to the differentiated needs of different video scenarios, further improving the accuracy of video quality assessment and ensuring the video processing effect.
[0069] Figure 7 is a schematic structural diagram of a video processing apparatus provided in one embodiment. As shown in Figure 7, the apparatus may include: a first video acquisition module 710, an analysis result acquisition module 720, an enhancement strategy acquisition module 730, a second video acquisition module 740, a quality assessment result acquisition module 750, and a third video determination module 760.
[0070] The modules are configured as follows:
- the first video acquisition module 710 acquires a first video;
- the analysis result acquisition module 720 acquires a first analysis result, which is the content analysis result of the first video;
- the enhancement strategy acquisition module 730 acquires multiple first video enhancement strategies, which are obtained by using a first model based on the first video and the first analysis result; each of these strategies includes at least one video enhancement method and an enhancement processing order corresponding to the at least one video enhancement method;
- the second video acquisition module 740 acquires multiple second videos, which are obtained by enhancing the first video based on the multiple first video enhancement strategies;
- the quality assessment result acquisition module 750 acquires a first quality assessment result, which includes the quality assessment result of each of the multiple second videos; the quality assessment result of each second video is obtained by using a second model to assess its quality;
- the third video determination module 760 determines a third video from the multiple second videos based on the first quality assessment result.
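The module structure of Figure 7 can be sketched as a single object that wires the six stages together. Each field is a hypothetical stand-in callable, not an interface defined by the text. Module 760 is assumed to pick the second video with the highest score.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Strategy = Tuple[str, ...]  # ordered video enhancement methods


@dataclass
class VideoProcessingApparatus:
    """Hypothetical wiring of modules 710-760; each field is a stand-in callable."""
    acquire_first_video: Callable[[], Any]                    # module 710
    analyze_content: Callable[[Any], Any]                     # module 720
    propose_strategies: Callable[[Any, Any], List[Strategy]]  # module 730 (first model)
    enhance: Callable[[Any, Strategy], Any]                   # module 740
    assess_quality: Callable[[Any], float]                    # module 750 (second model)

    def run(self) -> Tuple[Any, Strategy]:
        first_video = self.acquire_first_video()
        first_analysis = self.analyze_content(first_video)
        strategies = self.propose_strategies(first_video, first_analysis)
        if not strategies:
            raise ValueError("the first model returned no strategies")
        second_videos = [self.enhance(first_video, s) for s in strategies]
        first_quality_result = [self.assess_quality(v) for v in second_videos]
        # Module 760: assume a higher score means better quality.
        best = max(range(len(second_videos)), key=lambda i: first_quality_result[i])
        return second_videos[best], strategies[best]
```

Injecting each stage as a callable keeps the orchestration independent of how the first and second models are actually implemented.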
[0071] Based on the aforementioned apparatus, a first model is employed to obtain multiple first video enhancement strategies based on a first video and a first analysis result. Each first video enhancement strategy includes all video enhancement methods required for the first video and the enhancement processing order of these methods, thereby achieving long-chain multi-strategy prediction using the first model. By enhancing the first video based on these multiple first video enhancement strategies, multiple second videos are obtained. A second model is then used to evaluate the quality of each second video, obtaining a first quality evaluation result. Based on this first quality evaluation result, a third video with the optimal video processing effect can be determined from the multiple second videos. The entire process requires no manual intervention, achieving automatic video enhancement. Furthermore, the collaborative processing of the first and second models yields the third video with the best global image quality, effectively ensuring the video processing effect.
[0072] Optionally, the first analysis result includes at least one of scene content information and image quality distortion information. The scene content information includes the content type of the first video and/or whether preset information is present in the first video. The image quality distortion information includes the type of image quality distortion present in the first video and the degree of distortion corresponding to that type.
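The structure of the first analysis result in [0072] can be sketched as a pair of dataclasses. The field names, the example content-type labels, and the 0 to 1 scale for distortion degree are assumptions. The text does not define "preset information" or give a scale for the degree of distortion.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SceneContentInfo:
    content_type: Optional[str] = None      # e.g. "CG", "UGC", "film_tv" (assumed labels)
    has_preset_info: Optional[bool] = None  # "preset information" is not defined further

    def __post_init__(self) -> None:
        # "and/or": at least one of the two fields must be present
        if self.content_type is None and self.has_preset_info is None:
            raise ValueError("scene content info needs a content type and/or preset-info flag")


@dataclass
class FirstAnalysisResult:
    scene: Optional[SceneContentInfo] = None
    # Image quality distortion info: distortion type -> degree (assumed 0..1 scale).
    distortions: Dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # "at least one of": scene content info, distortion info, or both
        if self.scene is None and not self.distortions:
            raise ValueError("need scene content info and/or distortion info")
        for kind, degree in self.distortions.items():
            if not 0.0 <= degree <= 1.0:
                raise ValueError(f"degree for {kind!r} is outside the assumed 0..1 scale")
```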
[0073] Optionally, the second video acquisition module 740 is specifically configured to do the following for each of the plurality of first video enhancement strategies: enhance each video frame of the first video according to each video enhancement method in that strategy and the enhancement processing order, to obtain the second video corresponding to that first video enhancement strategy.
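The per-frame, ordered enhancement in [0073] can be sketched as follows. Frames are modeled as flat lists of pixel values, and the two enhancement methods are toy arithmetic stand-ins; real methods such as super-resolution or deblurring would be separate models or filters. `apply_strategy` and the method registry are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Frame = List[float]                  # stand-in: a frame as a flat list of pixel values
EnhanceFn = Callable[[Frame], Frame]


def apply_strategy(
    frames: Sequence[Frame],
    strategy: Sequence[str],         # method names in enhancement processing order
    methods: Dict[str, EnhanceFn],   # hypothetical registry of enhancement methods
) -> List[Frame]:
    """Enhance every frame of the first video with the strategy's methods, in order."""
    second_video: List[Frame] = []
    for frame in frames:
        for name in strategy:
            frame = methods[name](frame)
        second_video.append(frame)
    return second_video
```

Usage: with `methods = {"add_one": lambda f: [p + 1.0 for p in f], "double": lambda f: [p * 2.0 for p in f]}`, the strategy `["add_one", "double"]` turns the frame `[1.0, 2.0]` into `[4.0, 6.0]`, while `["double", "add_one"]` gives `[3.0, 5.0]`. This is why each strategy records an enhancement processing order and not only a set of methods.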
[0074] Optionally, the apparatus may further include an assessment method determination module configured to obtain a video quality assessment method. The video quality assessment method is obtained by using a third model based on the first analysis result and matches the first video. The second model performs a quality assessment on the second video based on the video quality assessment method to obtain the quality assessment result of the second video.
[0075] Optionally, the video quality assessment method includes: at least one video quality assessment indicator and the assessment weight corresponding to the video quality assessment indicator.
[0076] Optionally, the apparatus may further include a historical evaluation information storage module configured to store historical evaluation information, which includes the first analysis result and the video quality assessment method, wherein the third model is optimized based on the historical evaluation information.
[0077] Optionally, the third video is the second video with the best quality assessment result among the first quality assessment results.
[0078] Optionally, the first model is obtained by training on first sample data, and the second model is obtained by training on second sample data.
The first sample data includes a fourth video, a second analysis result, and a video enhancement strategy label corresponding to the fourth video. The second analysis result is the content analysis result of the fourth video. The video enhancement strategy label is obtained as follows:
- multiple second video enhancement strategies are obtained by traversing and combining multiple video enhancement methods;
- multiple fifth videos are obtained by enhancing the fourth video based on these strategies;
- a second quality assessment result is obtained, which includes the actual quality assessment result of each of the fifth videos;
- the label is determined from the multiple second video enhancement strategies based on the second quality assessment result.

The second sample data includes a sixth video and the actual quality assessment result of the sixth video, where the sixth video is the fifth video corresponding to the video enhancement strategy label.
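The label construction in [0078] can be sketched with `itertools.permutations`. Here "traversing and combining" is interpreted as every ordered, non-repeating chain of 1 to `max_len` methods. That interpretation, the `max_len` cap, and the function names are assumptions. `enhance` and `actual_quality` are hypothetical callables; the text does not say where the actual quality assessment results come from.

```python
from itertools import permutations
from typing import Callable, List, Sequence, Tuple, TypeVar

V = TypeVar("V")
Strategy = Tuple[str, ...]


def enumerate_strategies(methods: Sequence[str], max_len: int) -> List[Strategy]:
    """All ordered, non-repeating combinations of 1..max_len enhancement methods."""
    strategies: List[Strategy] = []
    for k in range(1, max_len + 1):
        strategies.extend(permutations(methods, k))
    return strategies


def build_first_sample(
    fourth_video: V,
    methods: Sequence[str],
    enhance: Callable[[V, Strategy], V],   # hypothetical enhancement routine
    actual_quality: Callable[[V], float],  # source of "actual" scores is unspecified
    max_len: int = 2,
) -> Tuple[Strategy, V, float]:
    """Return (video enhancement strategy label, sixth video, its actual quality)."""
    strategies = enumerate_strategies(methods, max_len)  # second video enhancement strategies
    fifth_videos = [enhance(fourth_video, s) for s in strategies]
    second_quality_result = [actual_quality(v) for v in fifth_videos]
    best = max(range(len(strategies)), key=lambda i: second_quality_result[i])
    return strategies[best], fifth_videos[best], second_quality_result[best]
```

The number of candidate strategies grows quickly with chain length. With 5 methods and `max_len=3`, there are 5 + 20 + 60 = 85 strategies to enhance and score for each fourth video.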
[0079] Optionally, the first video is one of the following: a computer graphics (CG) video, a user-generated content (UGC) video, or a film and television video.
[0080] The video processing apparatus provided herein can execute the video processing method provided in any embodiment of this invention, and has the corresponding functional modules and beneficial effects for executing the method.
[0081] It is worth noting that the various units and modules included in the above-mentioned device are divided according to functional logic, but are not limited to the above division, as long as the corresponding functions can be realized; in addition, the specific names of each functional unit are only for easy differentiation and are not used to limit the scope of protection.
[0082] Figure 8 is a schematic structural diagram of an electronic device provided in one embodiment. Figure 8 shows the structure of an electronic device 500 (e.g., a terminal device or server) suitable for implementing this technical solution. The terminal device in this document may include, but is not limited to:
- mobile terminals such as mobile phones, laptops, digital broadcast receivers, PDAs (Personal Digital Assistants), tablets, PMPs (Portable Media Players), and in-vehicle terminals (e.g., in-vehicle navigation terminals);
- fixed terminals such as digital televisions and desktop computers.

The electronic device shown in Figure 8 is merely an example and should not impose any limitation on the functionality or scope of use of the embodiments.
[0083] As shown in Figure 8, the electronic device 500 may include a processing device 501 (e.g., a central processing unit or a graphics processing unit). The processing device 501 can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage device 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the electronic device 500. The processing device 501, the ROM 502, and the RAM 503 are interconnected via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
[0084] Typically, the following devices can be connected to the I/O interface 505:
- input devices 506, including, for example, a touchscreen, touchpad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope;
- output devices 507, including, for example, a liquid crystal display, speaker, and vibrator;
- storage devices 508, including, for example, magnetic tape and a hard disk;
- a communication device 509, which allows the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data.

Figure 8 shows the electronic device 500 with various devices, but it is not required to implement or include all of the devices shown. More or fewer devices may alternatively be implemented or included.
[0085] In particular, according to embodiments herein, the processes described in the above-referenced flowcharts can be implemented as computer software programs. For example, embodiments herein include a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via communication device 509, or installed from storage device 508, or installed from read-only memory 502. When the computer program is executed by processing device 501, the aforementioned functions defined in the methods are performed.
[0086] The names of messages or information exchanged between multiple devices in the embodiments herein are for illustrative purposes only and are not intended to limit the scope of these messages or information.
[0087] The electronic device provided herein and the video processing method provided in the above embodiments belong to the same inventive concept. For technical details not described here, refer to the above embodiments. The electronic device has the same beneficial effects as the above embodiments.
[0088] This document provides a computer storage medium storing a computer program which, when executed by a processor, implements the video processing method provided in the above embodiments.
[0089] It should be noted that the aforementioned computer-readable medium can be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium can be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of a computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory, a read-only memory, an erasable programmable read-only memory, an optical fiber, a portable compact disk read-only memory, an optical storage device, a magnetic storage device, or any suitable combination thereof. In this document, a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. A computer-readable signal medium can include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such propagated data signals can take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium can be transmitted using any suitable medium, including but not limited to wires, optical fibers, RF (radio frequency), or any suitable combination thereof.
[0090] In some implementations, clients and servers can communicate using any currently known or future-developed network protocol, such as HTTP (Hypertext Transfer Protocol), and can be interconnected with digital data communication of any form or medium (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed networks.
[0091] The aforementioned computer-readable medium may be included in the aforementioned electronic device, or it may exist separately without being assembled into the electronic device.
[0092] The aforementioned computer-readable medium carries one or more programs, which, when executed by the electronic device, cause the electronic device to: acquire a first video; obtain a first analysis result, wherein the first analysis result is a content analysis result of the first video; obtain a plurality of first video enhancement strategies, wherein the plurality of first video enhancement strategies are obtained using a first model based on the first video and the first analysis result, and each of the plurality of first video enhancement strategies includes: at least one video enhancement method and an enhancement processing order corresponding to the at least one video enhancement method; acquire a plurality of second videos, wherein the plurality of second videos are obtained by enhancing the first video based on the plurality of first video enhancement strategies; obtain a first quality assessment result, wherein the first quality assessment result includes a quality assessment result of each of the plurality of second videos, wherein the quality assessment result of the second video is obtained by performing a quality assessment on the second video using a second model; and determine a third video from the plurality of second videos based on the first quality assessment result.
[0093] The beneficial effects of the aforementioned storage medium are as follows: By employing a first model, multiple first video enhancement strategies are obtained based on the first video and the first analysis results. Each first video enhancement strategy includes all video enhancement methods required for the first video and the enhancement processing order of these methods, thereby achieving long-chain multi-strategy prediction using the first model. By enhancing the first video based on these multiple first video enhancement strategies, multiple second videos are obtained. The second model is then used to perform quality evaluation on each second video, obtaining a first quality evaluation result. Based on this result, a third video with the optimal video processing effect can be determined from the multiple second videos. The entire process requires no manual intervention, achieving automatic video enhancement processing. Furthermore, the collaborative processing of the first and second models can obtain the third video with the best global image quality, effectively ensuring the video processing effect.
[0094] Computer program code for performing the operations described herein may be written in one or more programming languages or a combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving remote computers, the remote computer may be connected to the user's computer via any type of network, including local area networks (LANs) or wide area networks (WANs), or it may be connected to an external computer (e.g., via the Internet using an Internet service provider).
[0095] This document also provides a computer program product, including a computer program that, when executed by a processor, implements the video processing method provided in the above embodiments.
[0096] The computer program product includes a computer program carried on a non-transitory computer-readable medium, which contains program code for performing video processing methods. The program code can be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as C or similar languages. The program code can be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving remote computers, the remote computer can be connected to the user's computer via any type of network (including local area networks or wide area networks), or it can be connected to an external computer (e.g., via the Internet using an Internet service provider).
[0097] The beneficial effects of the aforementioned program product are as follows: By employing a first model, multiple first video enhancement strategies are obtained based on a first video and the first analysis results. Each first video enhancement strategy includes all video enhancement methods required for the first video and the enhancement processing order of these methods, thereby achieving long-chain multi-strategy prediction using the first model. By enhancing the first video based on these multiple first video enhancement strategies, multiple second videos are obtained. The second model is then used to evaluate the quality of each second video, obtaining a first quality evaluation result. Based on this result, a third video with the optimal video processing effect can be determined from the multiple second videos. The entire process requires no manual intervention, achieving automatic video enhancement. Furthermore, the collaborative processing of the first and second models yields the third video with the best overall image quality, effectively ensuring the video processing effect.
[0098] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this document. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may occur in a different order than the order noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in reverse order, depending on the functions involved. Each block in the block diagrams and/or flowcharts, and each combination of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
[0099] The units described herein can be implemented in software or hardware. The name of a unit does not necessarily limit the unit itself; for example, the first acquisition unit can also be described as "a unit that acquires at least two Internet Protocol addresses".
[0100] The functions described above in this document can be performed, at least in part, by one or more hardware logic components. For example, exemplary types of hardware logic components that can be used, without limitation, include: Field Programmable Gate Array (FPGA), Application Specific Integrated Circuit (ASIC), Application Specific Standard Product (ASSP), System on Chip (SOC), Complex Programmable Logic Device (CPLD), and so on.
[0101] In the context of this document, a machine-readable medium can be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. It may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory, read-only memory, erasable programmable read-only memory, optical fibers, portable compact disk read-only memory, optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
[0102] The above description is merely a preferred embodiment and an explanation of the technical principles employed. Those skilled in the art should understand that the scope of disclosure herein is not limited to technical solutions formed by the specific combinations of the above technical features. It also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above concept. For example, it covers technical solutions formed by replacing the above features with technical features that have similar functions and are disclosed herein (but not limited thereto).
[0103] Furthermore, while the operations are described in a specific order, this should not be construed as requiring these operations to be performed in the specific order shown or in a sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Similarly, while several specific implementation details are included in the above discussion, these should not be interpreted as limiting the scope of this document. Certain features described in the context of individual implementations may also be implemented in combination in a single implementation. Conversely, various features described in the context of a single implementation may also be implemented individually or in any suitable sub-combination in multiple implementations.
[0104] Although the subject matter has been described in language specific to structural features and/or methodological acts, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.
Claims
1. A video processing method, comprising: acquiring a first video; obtaining a first analysis result, wherein the first analysis result is a content analysis result of the first video; obtaining a plurality of first video enhancement strategies, wherein the plurality of first video enhancement strategies are obtained by using a first model based on the first video and the first analysis result, and each of the plurality of first video enhancement strategies includes at least one video enhancement method and an enhancement processing order corresponding to the at least one video enhancement method; obtaining a plurality of second videos, wherein the plurality of second videos are obtained by enhancing the first video based on the plurality of first video enhancement strategies; obtaining a first quality assessment result, wherein the first quality assessment result includes a quality assessment result of each of the plurality of second videos, and the quality assessment result of each second video is obtained by using a second model to assess the quality of that second video; and determining, based on the first quality assessment result, a third video from the plurality of second videos.
2. The video processing method according to claim 1, wherein the first analysis result includes at least one of scene content information and image quality distortion information; the scene content information includes a content type of the first video and/or whether preset information is present in the first video; and the image quality distortion information includes a type of image quality distortion present in the first video and a degree of distortion corresponding to that type of image quality distortion.
3. The video processing method according to claim 1, wherein obtaining the plurality of second videos includes: for each of the plurality of first video enhancement strategies, enhancing each video frame in the first video according to each video enhancement method in that first video enhancement strategy and its enhancement processing order, to obtain the second video corresponding to that first video enhancement strategy.
4. The video processing method according to claim 1, further comprising: obtaining a video quality assessment method, the video quality assessment method being obtained by a third model based on the first analysis result and being matched to the first video; wherein the second model performs the quality assessment on the second video based on the video quality assessment method to obtain the quality assessment result of the second video.
5. The video processing method according to claim 4, wherein the video quality assessment method includes at least one video quality assessment metric and an assessment weight corresponding to the at least one video quality assessment metric.
6. The video processing method according to claim 4, further comprising: storing historical evaluation information, the historical evaluation information including the first analysis result and the video quality assessment method; wherein the third model is optimized based on the historical evaluation information.
7. The video processing method according to claim 1, wherein the third video is the second video having the best quality assessment result in the first quality assessment result.
8. The video processing method according to claim 1, wherein the first model is obtained by training on first sample data, and the second model is obtained by training on second sample data; wherein the first sample data includes a fourth video, a second analysis result, and a video enhancement strategy tag corresponding to the fourth video, the second analysis result being a content analysis result of the fourth video; the video enhancement strategy tag is determined from a plurality of second video enhancement strategies based on a second quality assessment result, the second quality assessment result including an actual quality assessment result of each of a plurality of fifth videos, the plurality of fifth videos being obtained by enhancing the fourth video based on the plurality of second video enhancement strategies, and the plurality of second video enhancement strategies being obtained by traversing and combining a plurality of video enhancement methods; and the second sample data includes a sixth video and an actual quality assessment result of the sixth video, the sixth video being the fifth video corresponding to the video enhancement strategy tag.
9. The video processing method according to any one of claims 1-8, wherein the first video is one of the following: a computer graphics video, a user-generated content video, or a film and television video.
10. A video processing apparatus, comprising: a first video acquisition module, configured to acquire a first video; an analysis result acquisition module, configured to obtain a first analysis result, the first analysis result being a content analysis result of the first video; an enhancement strategy acquisition module, configured to acquire a plurality of first video enhancement strategies, the plurality of first video enhancement strategies being obtained by a first model based on the first video and the first analysis result, wherein each of the plurality of first video enhancement strategies includes at least one video enhancement method and an enhancement processing order corresponding to the at least one video enhancement method; a second video acquisition module, configured to acquire a plurality of second videos, the plurality of second videos being obtained by enhancing the first video based on the plurality of first video enhancement strategies; a quality assessment result acquisition module, configured to obtain a first quality assessment result, the first quality assessment result including a quality assessment result of each of the plurality of second videos, wherein the quality assessment result of a second video is obtained by performing a quality assessment on that second video using a second model; and a third video determination module, configured to determine a third video from the plurality of second videos based on the first quality assessment result.
11. An electronic device, comprising: one or more processors; and a storage device configured to store one or more programs; wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the video processing method according to any one of claims 1-9.
12. A storage medium containing computer-executable instructions which, when executed by a computer processor, perform the video processing method according to any one of claims 1-9.
13. A computer program product comprising a computer program that, when executed by a processor, implements the video processing method as described in any one of claims 1-9.
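The overall flow of claims 1, 3 and 7 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: frames are toy lists of luma values, the two enhancement operators are hypothetical, and `stub_first_model` and `stub_second_model` are placeholders for the trained first and second models, whose architectures are not specified here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Toy representation (assumption): a frame is a flat list of luma values in [0, 1].
Frame = List[float]
Video = List[Frame]
EnhanceFn = Callable[[Frame], Frame]


@dataclass
class Strategy:
    # Ordered method names; the list order is the enhancement processing order.
    methods: List[str]


def apply_strategy(video: Video, strategy: Strategy,
                   registry: Dict[str, EnhanceFn]) -> Video:
    """Claim 3: enhance every frame with each method of the strategy, in order."""
    enhanced = []
    for frame in video:
        for name in strategy.methods:
            frame = registry[name](frame)
        enhanced.append(frame)
    return enhanced


def process(video: Video, analysis: dict,
            first_model: Callable[[Video, dict], List[Strategy]],
            second_model: Callable[[Video], float],
            registry: Dict[str, EnhanceFn]) -> Tuple[Strategy, Video, List[float]]:
    """Claims 1 and 7: propose strategies, enhance, assess, keep the best."""
    strategies = first_model(video, analysis)                  # first video enhancement strategies
    second_videos = [apply_strategy(video, s, registry) for s in strategies]
    scores = [second_model(v) for v in second_videos]          # first quality assessment result
    best = max(range(len(second_videos)), key=scores.__getitem__)
    return strategies[best], second_videos[best], scores       # third video = best second video


# Hypothetical enhancement operators standing in for real video enhancement methods.
REGISTRY: Dict[str, EnhanceFn] = {
    "brighten": lambda f: [min(1.0, p + 0.2) for p in f],
    "contrast": lambda f: [min(1.0, max(0.0, 0.5 + 1.5 * (p - 0.5))) for p in f],
}


def stub_first_model(video: Video, analysis: dict) -> List[Strategy]:
    # Placeholder for the trained first model: returns a fixed candidate list.
    return [Strategy(["brighten"]), Strategy(["contrast"]),
            Strategy(["brighten", "contrast"]), Strategy(["contrast", "brighten"])]


def stub_second_model(video: Video) -> float:
    # Placeholder for the second model: higher is better, mean luma near mid-gray wins.
    pixels = [p for frame in video for p in frame]
    return -abs(sum(pixels) / len(pixels) - 0.5)


if __name__ == "__main__":
    first_video = [[0.2, 0.3], [0.25, 0.35]]
    strategy, third_video, scores = process(
        first_video, {"distortion": "underexposure"},
        stub_first_model, stub_second_model, REGISTRY)
    print(strategy.methods, [round(s, 4) for s in scores])
```

With this toy input, the two strategies that apply the same two methods in opposite orders yield different second videos and different scores, which is why each strategy carries an enhancement processing order and not only a set of methods.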
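The first analysis result of claim 2 and the historical evaluation information of claim 6 could be represented as below. The field names, the example labels, and the matching rule are hypothetical. The claims do not state how the third model is optimized from the history, so this sketch only stores the pairs and, as the simplest stand-in, retrieves the most recent assessment method recorded for a matching analysis result.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class AnalysisResult:
    """Claim 2: scene content information and/or image quality distortion information."""
    content_type: Optional[str] = None               # e.g. "UGC" or "film" (hypothetical labels)
    has_preset_info: Optional[bool] = None           # whether preset information is present
    distortions: Tuple[Tuple[str, float], ...] = ()  # (distortion type, degree) pairs

    def key(self) -> Tuple[Optional[str], Tuple[str, ...]]:
        # Hypothetical matching key: content type plus the set of distortion types.
        return self.content_type, tuple(sorted(t for t, _ in self.distortions))


@dataclass
class HistoryStore:
    """Claim 6: historical evaluation information as (analysis, assessment method) pairs."""
    records: List[Tuple[AnalysisResult, Dict[str, float]]] = field(default_factory=list)

    def add(self, analysis: AnalysisResult, method: Dict[str, float]) -> None:
        self.records.append((analysis, dict(method)))

    def lookup(self, analysis: AnalysisResult) -> Optional[Dict[str, float]]:
        # Stand-in for an optimized third model: reuse the newest matching method.
        for past, method in reversed(self.records):
            if past.key() == analysis.key():
                return dict(method)
        return None
```

The stored pairs could equally serve as training data for refitting the third model; the retrieval above is only the most direct illustration of using the history.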
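Claims 4 and 5 describe a video quality assessment method made of metrics and per-metric weights. A minimal sketch, assuming two hypothetical metrics scored in [0, 1] and a weighted average normalized by the total weight; `stub_third_model` is a hand-written rule standing in for the third model, not a learned model.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, List

Frame = List[float]
Video = List[Frame]


def _pixels(video: Video) -> List[float]:
    # Flatten all frames into one list of luma values.
    return [p for frame in video for p in frame]


# Hypothetical metrics, each mapping a video to a score in [0, 1] (higher is better).
METRICS: Dict[str, Callable[[Video], float]] = {
    "exposure": lambda v: 1.0 - 2.0 * abs(mean(_pixels(v)) - 0.5),
    "contrast": lambda v: min(1.0, 4.0 * pstdev(_pixels(v))),
}


def stub_third_model(analysis: dict) -> Dict[str, float]:
    """Rule-based placeholder for the third model: analysis result -> metric weights."""
    if analysis.get("distortion") == "underexposure":
        return {"exposure": 0.7, "contrast": 0.3}
    return {"exposure": 0.5, "contrast": 0.5}


def assess(video: Video, method: Dict[str, float]) -> float:
    """Weighted average of the selected metrics, normalized by the total weight."""
    total = sum(method.values())
    return sum(w * METRICS[name](video) for name, w in method.items()) / total
```

Normalizing by the total weight keeps scores comparable across videos even if the third model returns weights that do not sum to one.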
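The training-data construction of claim 8 can be sketched as follows, assuming that "traversing and combining" means enumerating every ordered sequence of distinct enhancement methods up to a maximum length. The operators, the `max_len` bound, and `stub_actual_quality` (standing in for the actual quality assessment result, for example a measured or human-rated score) are hypothetical.

```python
from itertools import permutations
from typing import Callable, Dict, List, Sequence, Tuple

Frame = List[float]
Video = List[Frame]
EnhanceFn = Callable[[Frame], Frame]

# Hypothetical toy operators; "smooth" just flattens a frame to its mean.
REGISTRY: Dict[str, EnhanceFn] = {
    "brighten": lambda f: [min(1.0, p + 0.2) for p in f],
    "contrast": lambda f: [min(1.0, max(0.0, 0.5 + 1.5 * (p - 0.5))) for p in f],
    "smooth": lambda f: [sum(f) / len(f)] * len(f),
}


def traverse_strategies(methods: Sequence[str], max_len: int) -> List[Tuple[str, ...]]:
    """Every ordered sequence of distinct methods with length 1..max_len."""
    strategies: List[Tuple[str, ...]] = []
    for k in range(1, max_len + 1):
        strategies.extend(permutations(methods, k))
    return strategies


def enhance(video: Video, strategy: Tuple[str, ...]) -> Video:
    # Apply the methods to every frame in the strategy's processing order.
    out = []
    for frame in video:
        for name in strategy:
            frame = REGISTRY[name](frame)
        out.append(frame)
    return out


def stub_actual_quality(video: Video) -> float:
    # Placeholder for the actual quality assessment result (higher is better).
    pixels = [p for frame in video for p in frame]
    return -abs(sum(pixels) / len(pixels) - 0.5)


def build_samples(fourth_video: Video, second_analysis: dict, max_len: int = 2):
    """Return (first_sample, second_sample) for one training video."""
    second_strategies = traverse_strategies(sorted(REGISTRY), max_len)
    fifth_videos = [enhance(fourth_video, s) for s in second_strategies]
    second_quality = [stub_actual_quality(v) for v in fifth_videos]  # second quality assessment result
    best = max(range(len(fifth_videos)), key=second_quality.__getitem__)
    tag = second_strategies[best]                                    # video enhancement strategy tag
    first_sample = (fourth_video, second_analysis, tag)              # trains the first model
    second_sample = (fifth_videos[best], second_quality[best])       # sixth video and its actual result
    return first_sample, second_sample
```

Because the number of ordered strategies grows factorially with the sequence length, some bound such as `max_len` is a practical necessity; the claim itself does not say how the traversal is limited.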