Image processing method and apparatus, and related device
By fusing multimodal feature data from business images and target standard images under different business scenarios, and using a multimodal large language model for image quality detection, the problem of insufficient detection accuracy in existing technologies is solved, and flexible and accurate image quality detection is achieved.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- TENCENT TECHNOLOGY (SHENZHEN) CO LTD
- Filing Date
- 2025-12-01
- Publication Date
- 2026-07-02
AI Technical Summary
Existing technologies cannot use the same neural network model to flexibly implement image quality detection in different business scenarios, resulting in reduced detection accuracy.
Image quality detection is performed by acquiring business images and retrieving matching target standard images from a standard image set of the target business scenario, fusing multimodal feature data of business images and target standard images, and utilizing a multimodal large language model for detection.
It improves the flexibility and accuracy of image quality detection, adapting to the needs of different business scenarios.
Description
Image processing methods, apparatus and related equipment
[0001] This application claims priority to Chinese Patent Application No. 2024119576863, filed on December 27, 2024, entitled "Image Processing Method, Apparatus and Related Equipment", the entire contents of which are incorporated herein by reference.
Technical Field
[0002] This application relates to the field of data processing technology, and in particular to image processing methods, apparatus and related equipment.
Background Technology
[0003] In existing technologies, different image quality judgment standards exist for images in different business scenarios. Therefore, it is necessary to construct different neural network models for images in different business scenarios to perform image quality detection. For example, it is necessary to construct a neural network model (e.g., network model Mo1) corresponding to a certain image quality judgment standard for images in a certain business scenario (e.g., business scenario B1), and construct a neural network model (e.g., network model Mo2) corresponding to another image quality judgment standard for images in another business scenario (e.g., business scenario B2).
[0004] However, the inventors discovered in practice that, based on existing image quality judgment standards, it is impossible to use the same neural network model to flexibly perform quality detection on images in different business scenarios. For example, if the network model Mo1 corresponding to business scenario B1 is used to perform quality detection on images in another business scenario (e.g., business scenario B2), the network model Mo1 will have insufficient generalization ability, resulting in difficulty in accurately obtaining quality detection results that match the images in business scenario B2, thus reducing the accuracy of image quality detection.
Summary of the Invention
[0005] This application provides an image processing method, apparatus, and related equipment that, upon acquiring any image (i.e., a business image), can intelligently and flexibly determine the target business scenario to which the business image belongs in at least one business scenario. Furthermore, it can comprehensively utilize the standard image set corresponding to the target business scenario to quickly retrieve a target standard image matching the business image. By combining the business image with the retrieved target standard image, multimodal feature data is generated that integrates both business feature data from the business image and feature data and image quality data from the target standard image. This improves the flexibility and accuracy of image quality detection across different business scenarios.
[0006] This application provides an image processing method, which includes:
[0007] Acquire the business image to be subjected to image quality inspection; the business image is an image under the target business scenario determined in at least one business scenario; each business scenario in at least one business scenario is assigned and configured with a corresponding standard image set;
[0008] In the standard image set corresponding to the target business scenario, retrieve the target standard image that matches the business image, and obtain the target image quality data bound to the target standard image;
[0009] When the business feature data of the business image and the target standard feature data of the target standard image are obtained, the business feature data, the target standard feature data and the target image quality data are spliced together to obtain multimodal feature data for input into the target business model.
[0010] When the model text prompt information for the target business model is obtained, the multimodal feature data and the model text prompt information are input into the target business model. The target business model then performs image quality detection on the multimodal feature data according to the model text prompt information to obtain image quality detection data.
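The four steps above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the applicant's implementation: the names `StandardImage`, `detect_image_quality`, `extract_features`, `similarity`, and `model` are hypothetical, and the feature extractor, the similarity measure, and the target business model are passed in as plain callables because the text above does not specify them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Vector = List[float]


@dataclass
class StandardImage:
    """A reference image with its bound image quality data (hypothetical layout)."""
    image_id: str
    features: Vector      # target standard feature data (visual modality)
    quality_text: str     # image quality data bound to the image (text modality)


def detect_image_quality(
    business_image: object,
    scenario: str,
    standard_sets: Dict[str, Sequence[StandardImage]],
    extract_features: Callable[[object], Vector],
    similarity: Callable[[Vector, Vector], float],
    model: Callable[[dict, str], str],
    prompt: str,
) -> str:
    # Step 1: the business image arrives with its target scenario already
    # determined; look up the standard image set configured for that scenario.
    standard_set = standard_sets[scenario]

    # Step 2: retrieve the best-matching standard image and its quality data.
    # (max() raises ValueError if the standard image set is empty.)
    business_features = extract_features(business_image)
    target = max(standard_set,
                 key=lambda s: similarity(business_features, s.features))

    # Step 3: splice the three pieces into one multimodal input.
    multimodal = {
        "business_features": business_features,
        "standard_features": target.features,
        "standard_quality": target.quality_text,
    }

    # Step 4: the model evaluates the spliced data under the text prompt.
    return model(multimodal, prompt)
```

Passing the extractor, similarity function, and model as parameters keeps the sketch independent of any particular network or library; a real system would substitute trained components for each.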
[0011] One embodiment of this application provides an image processing apparatus, the apparatus comprising:
[0012] The image acquisition unit is used to acquire the business image to be subjected to image quality detection; the business image is an image under the target business scenario determined in at least one business scenario; each business scenario in at least one business scenario is assigned and configured with a corresponding standard image set;
[0013] The image retrieval unit is used to retrieve a target standard image that matches the business image from the standard image set corresponding to the target business scenario, and to obtain the target image quality data bound to the target standard image.
[0014] The feature data splicing unit is used to splice the business feature data, the target standard feature data and the target image quality data when the business feature data of the business image and the target standard feature data of the target standard image are obtained, so as to obtain multimodal feature data for input into the target business model.
[0015] The image quality detection unit is used to input multimodal feature data and model text prompt information into the target business model when model text prompt information for the target business model is obtained. The target business model then performs image quality detection on the multimodal feature data according to the model text prompt information to obtain image quality detection data.
[0016] One aspect of this application provides a computer-readable storage medium storing a computer program adapted to be loaded and executed by a processor, so that a computer device having the processor performs the method provided in this application.
[0017] One embodiment of this application provides a computer program product or computer program, which includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions, causing the computer device to perform the method provided in this application embodiment.
[0018] In this embodiment, when a computer device acquires any business image requiring image quality detection, it can intelligently determine whether a business scenario matching the business image exists in at least one business scenario. If so, the business scenarios matching the business image in at least one business scenario are collectively referred to as the target business scenario to which the business image belongs. It should be understood that in this embodiment, each business scenario in the at least one scenario is configured with a corresponding standard image set. Further, the computer device can quickly search the standard image set corresponding to the target business scenario to see if a target standard image matching the business image exists (e.g., a standard image with a similar image category and clarity to the business image). If so, it can further acquire the image quality data (i.e., target image quality data) bound to the retrieved target standard image. Thus, by splicing the image quality data (i.e., target image quality data), the standard feature data (i.e., target standard feature data), and the business feature data corresponding to the business image, multimodal feature data that integrates both the business feature data of the business image and the feature data and image quality data of the target standard image can be obtained, which can then be input into the target business model (e.g., a large language model). 
Furthermore, the computer device can obtain model text prompts for the target business model, and then input the model text prompts and multimodal feature data into the target business model (e.g., a large language model), so that the target business model can perform image quality detection on the multimodal feature data according to the model text prompts to obtain image quality detection data. It should be understood that the target standard image retrieved from the standard image set corresponding to the target business scenario, which matches the business image, serves as a reference that the computer device can use as an image quality judgment standard when performing image quality detection. This means that for any image currently acquired, a matching target business scenario can be flexibly found among the at least one business scenario. Therefore, when performing image quality detection on a business image in any business scenario (such as the target business scenario), this application embodiment can combine the business image with the target standard image that matches it to generate multimodal feature data that integrates different information sources and different modalities, and input that data into the target business model for processing. This enables the target business model to automatically perform feature understanding and mining on the multimodal feature data. Through this mining, the image quality judgment standard applied to the target standard image in the target business scenario can be obtained and used as the image quality judgment standard for the business image in the target business scenario.
In this way, when the computer device performs image quality detection on the business image according to the image quality judgment standard, it can accurately compute image quality detection data that is as close as possible to that standard. This improves the flexibility and accuracy of image quality detection in different business scenarios.
Attached Figure Description
[0019] Figure 1 is a schematic diagram of the structure of an image processing system provided in an embodiment of this application;
[0020] Figure 2 is a schematic diagram of a scene of an image processing method provided in an embodiment of this application;
[0021] Figure 3 is a flowchart illustrating an image processing method provided in an embodiment of this application;
[0022] Figure 4 is a flowchart illustrating the process of determining a target business scenario provided in an embodiment of this application;
[0023] Figure 5 is a schematic flowchart of an image quality detection process provided in an embodiment of this application;
[0024] Figure 6 is a schematic flowchart of an image processing method provided in an embodiment of this application;
[0025] Figure 7 is a schematic flowchart of a feature extraction process provided in an embodiment of this application;
[0026] Figure 8 is a flowchart illustrating the training process of a semantic feature extractor provided in an embodiment of this application;
[0027] Figure 9 is a schematic flowchart of the training process of a quality feature extractor provided in an embodiment of this application;
[0028] Figure 10 is a schematic flowchart of another image quality detection process provided in an embodiment of this application;
[0029] Figure 11 is a schematic diagram of the structure of an image processing device provided in an embodiment of this application;
[0030] Figure 12 is a schematic diagram of the structure of a computer device provided in an embodiment of this application.
Detailed Implementation
[0031] The concepts of relevant technical terms used in the embodiments of this application will be explained below:
[0032] I. A business image refers to an image that requires image quality detection, acquired by a computer device (e.g., a server) during the application of a business model. In this embodiment, the business image may specifically refer to an image determined from a user's uploaded image or video via a corresponding client (e.g., a social client, a news client, a video client, etc.).
[0033] For example, when a user needs to publish one or more images to a social platform through a social client, the server that provides publishing services for users on the social platform can, upon receiving the one or more images uploaded by the user, apply the image processing method of this application, namely an image quality detection method based on a multimodal feature retrieval and generation mechanism, to perform image quality detection on the images to be published, thereby improving the image quality (e.g., image clarity) of the published images.
[0034] In one possible implementation, the server can iterate through the images uploaded by the user, treating each as a business image to be inspected for image quality. The image quality inspection method based on the multimodal feature retrieval and generation mechanism can then quickly retrieve the most relevant standard image from the standard image library (i.e., standard image set) corresponding to the business scenario to which the business image belongs, as the target standard image. The server can combine the business image (specifically, the feature data extracted from the visual modality of the business image) with the retrieved target standard image (specifically, the feature data extracted from the target standard image in multiple business modalities) to equivalently generate multimodal feature data that fully represents the business image. This multimodal feature data can then be used to assist the large language model in providing a faster and more reliable clarity score.
[0035] In other words, the target standard image refers to the reference image retrieved from the standard image set corresponding to the target business scenario that is closest in category and clarity to the business image. Therefore, by combining the business image with the retrieved reference image, multimodal feature data that fully characterizes the business image can be generated. The feature data extracted from the target standard image across multiple business modalities includes image features extracted from the visual modality (i.e., image feature modality) and quality features extracted from the text modality. Specifically, for the target standard image, the business modality corresponding to the target standard feature data obtained by extracting image features from the target standard image is the image feature modality, while the business modality of the target image quality data bound to the target standard image is the text modality.
[0036] II. Multimodal feature data refers to feature data generated by the image quality detection method under the multimodal retrieval-augmented generation (RAG) framework, using the aforementioned multimodal feature retrieval and generation mechanism. It is obtained by combining the feature data of the image to be tested (i.e., the business image to be inspected) with the different modal data of the target standard image (i.e., the reference image that is most similar to the business image in terms of category and clarity), and it is used to equivalently represent the business image.
[0037] In the embodiments of this application, the multimodal data used to characterize the business image differs from the original single-modal data (i.e., the feature data extracted from the business image in the visual modality, such as contrast, brightness, and texture), which characterizes only the business image itself. The multimodal data considers not only the feature data of the business image in the visual modality, but also the different modal data of other information sources, namely the retrieved standard images most similar to the business image (e.g., the image context of those standard images, the business standards they follow, and the image quality data bound to them as evaluated under the corresponding business standards). In this way, the comprehensiveness of image quality assessment can be improved at the root, overcoming the limitation in the prior art of assessing image quality directly from the single-modal data of the business image.
[0038] Here, multimodal feature data refers to the data obtained by concatenating the feature data of the business image (i.e., business feature data) with the different modal data of the target standard image (e.g., the feature data extracted from the target standard image in the visual modality and the image quality data, in the text modality, bound to the target standard image). In other words, in this embodiment, the business modality corresponding to the feature data extracted from the business image (i.e., business feature data) and to the standard feature data extracted from the target standard image (i.e., target standard feature data) is the image feature modality (i.e., the aforementioned visual modality), while the business modality of the target image quality data is the text modality, so the multimodal feature data is composed of data from different business modalities.
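One way to picture this composition is as an ordered sequence of modality-tagged segments. The sketch below is illustrative only: `Segment` and `splice` are hypothetical names, the feature vectors are toy values, and the text above does not specify how the concatenation is realized inside the model.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Segment:
    modality: str                        # "image_feature" or "text"
    source: str                          # which input the segment came from
    payload: Union[List[float], str]


def splice(business_features: List[float],
           standard_features: List[float],
           quality_text: str) -> List[Segment]:
    """Concatenate the three inputs in a fixed order, keeping modality tags."""
    return [
        Segment("image_feature", "business_image", business_features),
        Segment("image_feature", "target_standard_image", standard_features),
        Segment("text", "target_standard_image", quality_text),
    ]


# Toy values for illustration; real features would come from an extractor.
segments = splice([0.2, 0.7], [0.3, 0.6], "clarity score: 4 of 5")
modalities = {s.modality for s in segments}
```

Keeping the modality tag on each segment makes explicit that two segments share the image feature modality while the bound quality data is text.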
[0039] III. The target business scenario refers to the scenario that matches the currently acquired business image, determined from at least one business scenario configured by the business party based on actual business needs. In other words, the target business scenario can be a scenario determined by the computer equipment in at least one business scenario. Here, the business party refers to the party providing image quality inspection services under different business scenarios.
[0040] The target business scenario here refers to the business scenario that matches the business image in at least one business scenario. For example, the target business scenario here can be a business scenario that matches the scene identifier associated with the business image found in the scene identifiers associated with each business scenario.
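The identifier-based matching described above amounts to a lookup. A minimal sketch follows, assuming each configured business scenario is keyed by a string scene identifier; the identifiers, the scenario names, and `find_target_scenario` are hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional

# Hypothetical mapping from scene identifier to configured business scenario.
SCENARIOS_BY_SCENE_ID: Dict[str, str] = {
    "rec": "resource_recommendation",
    "news": "news",
    "social": "social_application",
}


def find_target_scenario(
    image_scene_id: str,
    scenarios: Dict[str, str] = SCENARIOS_BY_SCENE_ID,
) -> Optional[str]:
    """Return the scenario whose identifier matches the image's, or None."""
    return scenarios.get(image_scene_id)
```

Returning `None` when nothing matches leaves it to the caller to decide what happens to images that belong to no configured scenario.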
[0041] IV. A standard image set (or reference image set) refers to an image set that the business party pre-configures for each business scenario within the at least one business scenario, providing image quality reference standards (i.e., image quality judgment standards). Specifically, these image quality reference standards (i.e., image quality judgment standards) can be clarity review standards customized or personalized by the business party for the corresponding business scenario.
[0042] For example, the standard image set corresponding to a certain business scenario may contain multiple different standard images (i.e., different reference images) bound to different image quality data. Here, the image quality data bound to each standard image is obtained in advance by reviewing the clarity of that image against the clarity review standard that the business party configured for the business scenario.
[0043] This application embodiment, by introducing a standard image library (or standard image set), can improve the flexibility and convenience of the image processing system in performing image quality inspection (e.g., sharpness assessment) on business images under different business scenarios. Furthermore, even if the business image acquired by the image processing system is of a new image type, the business party can flexibly and adaptively address the actual business needs of different users in certain business scenarios (e.g., the need to assess the sharpness of new image types) by binding new image quality data (e.g., a new sharpness score obtained by evaluating against new sharpness standards and/or modified existing sharpness standards) to the standard images corresponding to the new image type.
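The per-scenario configuration and the rebinding of quality data can be sketched as a small registry. This is an assumption-laden illustration: `StandardImageLibrary`, its methods, and the string form of the quality data are hypothetical, and the images themselves are represented only by identifiers.

```python
from collections import defaultdict
from typing import Dict, List


class StandardImageLibrary:
    """Per-scenario sets of standard images, each bound to quality data."""

    def __init__(self) -> None:
        # scenario -> {standard image id -> bound image quality data}
        self._sets: Dict[str, Dict[str, str]] = defaultdict(dict)

    def bind(self, scenario: str, image_id: str, quality: str) -> None:
        """Add a standard image, or rebind its quality data after a re-review."""
        self._sets[scenario][image_id] = quality

    def quality_of(self, scenario: str, image_id: str) -> str:
        """Return the quality data bound to a standard image (KeyError if absent)."""
        return self._sets[scenario][image_id]

    def images(self, scenario: str) -> List[str]:
        """List the standard image ids configured for a scenario."""
        return sorted(self._sets.get(scenario, {}))


library = StandardImageLibrary()
library.bind("news", "ref_001", "clarity: 4")
library.bind("news", "ref_002", "clarity: 2")
# A modified sharpness standard leads to a new score for the same image.
library.bind("news", "ref_001", "clarity: 5")
```

Because the judgment standard lives in the bound data rather than in model weights, supporting a new image type in this sketch only requires additional `bind` calls.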
[0044] V. Business Model. The business model involved in this application embodiment includes a target business model and an initial business model. The target business model refers to the currently trained business model used for image quality detection during model application. Similarly, the initial business model refers to the business model that needs to be trained with sample training data before model application. In other words, in this application embodiment, the target business model is obtained by training the initial business model using sample training data.
[0045] VI. The multimodal RAG (Retrieval-Augmented Generation) architecture refers to the technical processing architecture in an image processing system (e.g., a server within that system) used to provide the aforementioned multimodal feature retrieval and generation mechanism for image quality detection. This mechanism can include at least a multimodal feature retrieval method and a multimodal feature generation method.
[0046] For example, under the multimodal retrieval-augmented generation (RAG) architecture, the image processing system can first use multimodal feature retrieval to retrieve, from the standard image library corresponding to the relevant business scenario (i.e., the target business scenario), the target standard image that is most similar to (or best matches) the business image. It can then use multimodal feature generation to concatenate the feature data of the business image with the different modal data of the retrieved target standard image, finally obtaining multimodal feature data that enhances the feature data of the business image.
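The retrieval half of this mechanism can be sketched as a nearest-neighbor search over feature vectors. Cosine similarity is an assumption made here for illustration, since the text above does not name a similarity measure; `cosine` and `retrieve_top_k` are hypothetical helpers and the vectors are toy values.

```python
import math
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_top_k(query: List[float],
                   standard_features: Dict[str, List[float]],
                   k: int = 1) -> List[Tuple[str, float]]:
    """Rank standard images by similarity to the query and keep the best k."""
    scored = [(image_id, cosine(query, feats))
              for image_id, feats in standard_features.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy standard image set for one scenario: image id -> feature vector.
refs = {"sharp": [1.0, 0.0], "blurry": [0.0, 1.0], "medium": [1.0, 1.0]}
top = retrieve_top_k([0.9, 0.1], refs, k=2)
```

A linear scan is enough for a sketch; a large standard image library would more plausibly use an approximate nearest-neighbor index.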
[0047] Please refer to Figure 1, which is a schematic diagram of an image processing system provided in an embodiment of this application. As shown in Figure 1, this image processing system can be an image quality detection system, which refers to a system used for image quality detection. Specifically, this image processing system (i.e., the image quality detection system) may include terminal devices (such as devices 11a, 12a, and 13a) and a server 100a. It is understood that the number of terminal devices and servers in Figure 1 is merely illustrative; any number of terminal devices and servers can be used depending on implementation needs. The terminal devices (such as devices 11a, 12a, and 13a) can communicate with the server through a network (i.e., through a medium providing a communication link via wired, wireless communication links, or fiber optic cables, etc.) to transmit data.
[0048] It is understandable that a client can run on the terminal device (such as device 12a), which can be a program that provides local services to the user (also called the target object or operation object). Server 100a can be the server corresponding to the client, and a program running on server 100a can provide resources, service data, and other services to the client. It is also understandable that the client running on the terminal device can be called an application client, business client, etc. For example, for the terminal device in this image processing system, the client running on the terminal device can be a client involving image or video viewing (such as a social application client, an e-commerce platform client, a video client, etc.), and then the user can publish (or upload) images or videos through this client.
[0049] To address the limitations of existing technologies that directly use a single-modal sharpness assessment algorithm to evaluate the sharpness of an image, this application proposes an image processing method applicable to image processing systems. In this application, the image processing system, under a multimodal RAG (Retrieval-Augmented Generation) architecture, comprehensively utilizes image retrieval and MLLM (Multimodal Large Language Model) technologies to combine the currently acquired image to be tested (i.e., the business image to be inspected) with a standard image library provided by the business party, thereby improving the flexibility and customizability of image quality inspection. The standard image library refers to the different standard image sets that the business party flexibly configures to adapt to different business scenarios (for example, each business scenario can be assigned a corresponding standard image set). This means that when the image processing system acquires the image to be tested, it can select, from these pre-configured standard image sets, the standard image set that matches the business scenario to which the image to be tested belongs (i.e., the target business scenario determined in at least one business scenario). Then, it can determine whether standard images close to the image to be tested (i.e., the business image to be inspected) in category and clarity can be retrieved from the multiple standard images contained in that standard image set. If one or more standard images (e.g., top 5 reference images) are retrieved that are closest in category and sharpness to the image under test (i.e., the business image to be image quality detected), then a multimodal prompt (i.e., multimodal model prompt information) can be formed (or constructed) from the retrieved standard images, the target image quality data (e.g., sharpness) bound to these standard images, and the business image.
This prompt is then submitted to the Multimodal Large Language Model (MLLM) for comprehensive discrimination. This not only improves the accuracy of image quality detection (e.g., sharpness discrimination) but also enhances the adaptability of the image processing system to different business scenarios.
[0050] Optionally, in another possible implementation, if no standard image close in category and sharpness to the image under test (i.e., the business image to be inspected for image quality) is found, a prompt can be generated instructing the user to switch the current image quality inspection method to a single-modality sharpness judgment algorithm and continue image quality inspection of the image under test. This ensures that the user can flexibly choose the appropriate image quality inspection method according to actual needs. Optionally, in yet another possible implementation, relevant personnel from the business side can be further notified to modify or replace the existing standard image set for the target business scenario in a timely manner, so that other users can continue to perform similar image retrieval based on the modified standard image set for the target business scenario.
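The prompt construction and the fallback branch can be sketched together. This is a hedged illustration: the dictionary layout, the instruction string, and the names `build_multimodal_prompt` and `assess` are hypothetical, and images are represented by identifiers rather than pixel data.

```python
from typing import List, Optional, Tuple

Reference = Tuple[str, str]  # (standard image id, bound quality data)


def build_multimodal_prompt(business_image_id: str,
                            references: List[Reference],
                            instruction: str,
                            max_refs: int = 5) -> Optional[dict]:
    """Assemble the prompt, or return None when no reference was retrieved."""
    if not references:
        return None  # caller falls back to a single-modality assessment
    return {
        "instruction": instruction,
        "image_under_test": business_image_id,
        "references": [
            {"image": image_id, "quality": quality}
            for image_id, quality in references[:max_refs]
        ],
    }


def assess(business_image_id: str, references: List[Reference]) -> str:
    """Choose the inspection route depending on whether references exist."""
    prompt = build_multimodal_prompt(
        business_image_id, references,
        "Rate the sharpness of the image under test.")
    return "single_modality_fallback" if prompt is None else "multimodal_llm"
```

Capping the list at `max_refs` mirrors the top-5 example in the text; the cap is a parameter because the text presents five only as an example.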
[0051] It should be understood that the multimodal large language model here can be the aforementioned target business model. Specifically, the target business model can be a business model that combines a large language model with multimodal data processing capabilities. This multimodal large language model can achieve cross-modal understanding and reasoning by fusing different modal data such as text, images, and audio through the aforementioned MLLM technology. In the embodiments of this application, the multimodal data processed by the multimodal large language model is specifically a multimodal prompt (i.e., multimodal model prompt information) composed of the aforementioned business image, the target standard image most similar to the business image, and the image quality data bound to the target standard image.
[0052] For ease of understanding, this example uses an image uploaded by a user through a client to illustrate the specific process by which the server performs image quality detection on the obtained business image. For instance, upon receiving the image uploaded by the user, the server in this image processing system can employ the image processing method provided in this application embodiment to obtain, from the uploaded image, the business image that needs image quality detection, so that it can subsequently determine whether the business scenario to which the business image belongs is one of the at least one pre-configured business scenario.
[0053] If so, the business scenario to which the business image belongs can be collectively referred to as the target business scenario. That is, in one optional implementation, the target business scenario here refers to the business scenario that matches the business image in at least one business scenario. Optionally, in another optional implementation, the target business scenario here can also be the business scenario that matches the scene identifier associated with the business image, found in the scene identifiers associated with each business scenario.
[0054] In this way, the server can, based on the multimodal feature retrieval method indicated by the aforementioned image retrieval technology, search the standard image set pre-configured for the target business scenario to determine whether one or more standard images matching the business image exist. If such images are found, they can be collectively referred to as target standard images. This embodiment can then invoke the business model (i.e., the target business model) used for image quality detection, enabling the target business model to generate multimodal feature data according to the aforementioned MLLM technology. The generated multimodal feature data is then used to perform image quality detection on the business image, thereby obtaining image quality detection results for the business image. Based on the image quality detection results, corresponding business processing can be performed on the user-uploaded images. For example, if the image quality detection results indicate that the image or video uploaded by the user is of substandard quality, the server can intercept the user-uploaded image or video, meaning the server does not need to add it to the recommendation system. This reduces, from the outset, the likelihood of recommending substandard user-uploaded images or videos.
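The business processing step can be sketched as a simple routing decision. The numeric score, the 0.5 threshold, and the names `Upload` and `route_upload` are all assumptions made for illustration; the text above does not state how the detection result is represented or what counts as substandard.

```python
from dataclasses import dataclass


@dataclass
class Upload:
    upload_id: str
    quality_score: float   # hypothetical numeric form of the detection result


def route_upload(upload: Upload, min_score: float = 0.5) -> str:
    """Intercept substandard uploads; pass the rest to recommendation."""
    if upload.quality_score < min_score:
        return "intercepted"
    return "recommendation_pool"
```

In practice the threshold would presumably be configured per business scenario alongside that scenario's standard image set.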
[0055] It is understood that terminal devices (such as device 11a) may include, but are not limited to, mobile phones, computers, smart voice interaction devices, smart home appliances, vehicle terminals, aircraft, smart speakers, etc., without limitation. The server 100a may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms, without limitation. The embodiments of this application can be applied to the aforementioned terminal devices and servers, without limitation.
[0056] Further, please refer to Figure 2, which is a schematic diagram of a scenario of an image processing method provided in this application embodiment. As shown in Figure 2, firstly, a business image It (as shown in 211a in Figure 2) to be subjected to image quality detection can be obtained. This business image can be an image uploaded or published by a business object (such as a user), or a video frame in a business video uploaded or published by a business object (such as a user). Then, the target business scenario corresponding to the business image It can be determined from at least one business scenario (step S21a). At least one business scenario can refer to a scenario requiring image quality detection. For example, as shown in 2111a in Figure 2, at least one business scenario can include business scenario C1, business scenario C2, and business scenario C3, etc. For example, business scenario C1 can be a resource recommendation scenario, business scenario C2 can be a news scenario, and business scenario C3 can be a social application scenario. Each business scenario involved in this application embodiment can be configured with a corresponding standard image set; that is, one business scenario can be configured with one standard image set. Thus, for each standard image set corresponding to each business scenario, the image sets configured by the business party for each business scenario can be collectively referred to as standard image sets. Each of these standard image sets configured by the business party can include multiple standard images. It should be understood that the standard images involved in the embodiments of this application can refer to images used as references when performing image quality detection on business images. For example, the standard images here can be used to provide additional information from various sources (e.g., image context and / or business standards) for business images. 
For example, as shown in 2112a in Figure 2, business scenario C1 has a corresponding standard image set J1, business scenario C2 has a corresponding standard image set J2, and business scenario C3 has a corresponding standard image set J3.
[0057] Furthermore, when the target business scenario corresponding to the business image It is determined to be business scenario C1 (as shown in 212a in Figure 2), the standard image set J1 corresponding to business scenario C1 is obtained (as shown in 213a in Figure 2). Then, image retrieval is performed in the standard image set J1 (step S22a), thereby retrieving the target standard image that matches the business image It from the standard image set J1 (as shown in 214a in Figure 2). As shown in Figure 2, the target standard image can include multiple standard images. For example, the target standard image can be the M standard images with the highest image similarity to the business image It, where M is a positive integer greater than 1. As shown in 2141a in Figure 2, the target standard image can include M standard images: standard image Itop1, standard image Itop2, ..., standard image ItopM. It can be understood that each standard image can be bound to corresponding image quality data, which is data used to represent the image quality of the standard image, such as an image quality score. For example, as shown in 2142a in Figure 2, the standard image Itop1 is bound to the image quality data Qtop1, the standard image Itop2 is bound to the image quality data Qtop2, and so on, until the standard image ItopM is bound to the image quality data QtopM.
[0058] Furthermore, in this embodiment of the application, target standard feature data corresponding to the target standard image can be determined based on the standard feature data corresponding to each standard image in the target standard image (as shown in 215a in Figure 2). The standard feature data corresponding to any standard image can be obtained by performing feature extraction processing on the corresponding standard image. Additionally, target image quality data can be determined based on the image quality data corresponding to each standard image in the target standard image (as shown in 216a in Figure 2).
[0059] Furthermore, feature extraction processing can be performed on the business image It (step S23a) to obtain the business feature data corresponding to the business image It (as shown in 217a in Figure 2). Further, the business feature data, target standard feature data, and target image quality data can be concatenated to obtain multimodal feature data (as shown in 218a in Figure 2). The business modality corresponding to the business feature data and target standard feature data is the image feature modality (i.e., the aforementioned visual modality), while the business modality of the target image quality data is the text modality. Thus, the multimodal feature data can be composed based on data from different business modalities. It should be understood that, in this embodiment, when inputting multimodal feature data combining different modalities into the target business model for image quality detection, the dependencies between different modalities (i.e., different business modalities) can be fully explored and captured. Therefore, when performing image quality detection (e.g., image sharpness judgment) on the business image through the explored and captured dependencies, the reliability and accuracy of image quality detection can be effectively improved.
[0060] Furthermore, this embodiment of the application can obtain model text prompt information (as shown in 219a in Figure 2), which can be prompt information used to guide the target business model on the processing method of multimodal feature data. Then, this embodiment of the application can input the model text prompt information and multimodal feature data into the target business model (as shown in 220a in Figure 2). For example, the target business model can be a multimodal large language model (i.e., MLLM). Furthermore, the target business model can perform image quality detection on the multimodal feature data according to the model text prompt information to obtain image quality detection data (as shown in 221a in Figure 2). The image quality detection data can be data obtained by performing quality detection on the business image. This image quality detection data can be represented as an image quality score of the business image.
[0061] It should be noted that this application may display prompt interfaces, pop-ups, or output voice prompts before and during the collection of user data. These prompt interfaces, pop-ups, or voice prompts are used to inform the user that their data is being collected. This ensures that the application only begins the steps for collecting user data after receiving confirmation from the user regarding the prompt interface or pop-up; otherwise (i.e., without user confirmation), the steps for collecting user data end, meaning no user data is collected. In other words, all user data collected in this application is collected with the user's consent and authorization, and the collection, use, and processing of related user data must comply with the relevant laws, regulations, and standards of the relevant regions.
[0062] It is understood that the above scenarios are merely examples and do not constitute a limitation on the application scenarios of the technical solutions provided in the embodiments of this application. The technical solutions of this application can also be applied to other scenarios. For example, as those skilled in the art will know, with the evolution of system architecture and the emergence of new business scenarios, the technical solutions provided in the embodiments of this application are also applicable to similar technical problems.
[0063] Further, please refer to Figure 3, which is a schematic flowchart of an image processing method provided in an embodiment of this application. This method can be executed by a computer device, such as the server 100a in Figure 1 above. The method may include at least the following steps S101-S104.
[0064] S101. Obtain the business image to be inspected for image quality; the business image is an image under the target business scenario determined in at least one business scenario; each business scenario in at least one business scenario is configured with a corresponding standard image set.
[0065] The business image refers to the image to be subjected to image quality detection. This business image can be an image within a target business scenario. In this embodiment, the image quality of the business image can be reflected in its sharpness, noise level, etc.
[0066] The target business scenario refers to the business scenario to which the business image to be image quality detected belongs. This target business scenario can be any of at least one business scenario, and these at least one business scenario can refer to the scenario requiring image quality detection. For ease of description, these at least one business scenario can be referred to as a business scenario set, which can be configured by relevant management personnel. For example, the business scenarios in this set can be resource recommendation scenarios, news scenarios, social application scenarios, etc., without limitation here. In these scenarios, image quality detection can be performed on the corresponding images. If the image quality of the corresponding image is found to be substandard (i.e., the image quality detection result of the business image indicates that the image quality of the business image is substandard), corresponding business processing can be performed on the business image. For example, in a resource recommendation scenario, if the image quality of a business image in an image resource is found to be substandard, the recommendation probability of that image resource is reduced, thereby avoiding the promotion of low-quality image resources and improving the overall quality of image resources in the application.
[0067] In one embodiment, when a business image is acquired, the target business scenario to which the business image belongs can be determined based on the target scenario identifier associated with the business image. Specifically, the target scenario identifier associated with the business image is acquired, and a business scenario matching the target scenario identifier is searched from at least one business scenario. The business scenario matching the target scenario identifier is then identified as the target business scenario. Here, the target scenario identifier can refer to the scenario identifier corresponding to the business scenario to which the business image belongs; one scenario identifier is used to uniquely identify one business scenario.
[0068] For example, please refer to Figure 4, which is a flowchart illustrating a process for determining a target business scenario provided in an embodiment of this application. As shown in Figure 4, the business image acquired by the computer device (as shown in 401a) is associated with a scenario identifier Cb1 (as shown in 402a). Then, the computer device can search for a business scenario matching the scenario identifier Cb1 from the business scenario set 403a (step S41a). It can be seen that the business scenario set 403a may include multiple business scenarios such as business scenario C1 (as shown in 404a), business scenario C2 (as shown in 405a), ... business scenario Cn (as shown in 406a), etc. Each business scenario is associated with a corresponding scenario identifier. For example, business scenario C1 is associated with scenario identifier Cb1 (as shown in 414a), business scenario C2 is associated with scenario identifier Cb2 (as shown in 415a), and so on, with business scenario Cn being associated with scenario identifier Cbn (as shown in 416a). Based on this, it can be seen that the business scenario that matches the scene identifier Cb1 associated with the business image (as shown in 401a in Figure 4) is business scenario C1, and thus business scenario C1 can be identified as the target business scenario (as shown in 407a in Figure 4).
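As an illustration only, the identifier-based lookup described above can be sketched as a dictionary lookup. The registry contents and the function name `find_target_scenario` are hypothetical and are not part of this application; a real deployment would presumably load the business scenario set from configuration.

```python
from typing import Optional

# Hypothetical registry mapping each scenario identifier to exactly one
# business scenario, mirroring Cb1 -> C1, Cb2 -> C2, ... in Figure 4.
SCENARIOS_BY_ID = {
    "Cb1": "C1",
    "Cb2": "C2",
    "Cb3": "C3",
}


def find_target_scenario(scenario_id: str) -> Optional[str]:
    """Return the business scenario matching the identifier, or None if none matches."""
    return SCENARIOS_BY_ID.get(scenario_id)


# The business image is associated with identifier Cb1, so C1 is the target scenario.
target_scenario = find_target_scenario("Cb1")
```

Because one scenario identifier uniquely identifies one business scenario, a plain key lookup is sufficient in this sketch; an unknown identifier yields `None` rather than a guess.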
[0069] It is understandable that each business scenario has a corresponding standard image set. This means that a business entity can configure a corresponding standard image set for each business scenario within at least one business scenario. For example, one business scenario can correspond to one standard image set, and a standard image set can specifically include multiple standard images. Here, standard images refer to images provided by the business entity that can be used as references when performing image quality inspection on business images. The standard image sets corresponding to each business scenario belong to a standard image library. Therefore, when the target business scenario to which a business image belongs is determined, the standard image library corresponding to that target business scenario can be loaded to find the standard image set corresponding to the target business scenario from the loaded standard image library.
[0070] In a standard image set (or standard image library), each standard image can be associated with corresponding image quality data. This image quality data can be data used to characterize the image quality of the standard image. This image quality data can be represented as an image quality score range or an image quality score for the standard image. Here, the image quality score refers to a rating used to represent the image quality, and the image quality score range refers to a range of ratings used to represent the image quality. For example, if the image quality score range for a standard image is 0-10, then the image quality data for any standard image can be an image quality score or an image quality score range belonging to 0-10. For instance, the image quality data for a certain standard image indicates an image quality score of 4, or the image quality score range indicated by the image quality data for a certain standard image is (6,8).
[0071] For example, Table (1) below is an example of a standard image set and its related information corresponding to a business scenario.
[0072] Table (1)
[0073] The image quality score refers to the quality score of an image, with a value ranging from 0 to 10. The image quality category can be derived from the image quality score. For example, when the image quality score of a business image falls within the range of [1,2), the image quality category of the business image is determined to be completely blurry; when the score falls within the range of [2,3), the category is determined to be subject blurry; when the score falls within the range of [3,4), the category is determined to be blurry; when the score is 4, the category is determined to be slightly blurry; and when the score falls within the range of (4,10), the category is determined to be sharp. As shown in Table (1) above, for a given standard image in the standard image set, the higher the image quality score associated with the standard image, the better its sharpness; conversely, the lower the image quality score, the worse its sharpness (i.e., the more blurry the visual effect presented by the image).
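The example score-to-category mapping above can be sketched as follows. The function name `quality_category` is hypothetical, and the interval boundaries are copied literally from the example; scores outside the listed intervals (below 1, or exactly 10) are not assigned a category in the text, so this sketch returns `None` for them instead of assuming one.

```python
from typing import Optional


def quality_category(score: float) -> Optional[str]:
    """Map an image quality score to the example category intervals."""
    if 1 <= score < 2:
        return "completely blurry"
    if 2 <= score < 3:
        return "subject blurry"
    if 3 <= score < 4:
        return "blurry"
    if score == 4:
        return "slightly blurry"
    if 4 < score < 10:
        return "sharp"
    # The example does not define a category for other scores.
    return None
```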
[0074] The processing result can be the result of processing images of various image quality categories according to the corresponding business scenario (such as the target business scenario). For example, processing business images (or business videos corresponding to business images) can include blocking or not blocking. Specifically, in a resource recommendation scenario, the processing result can be the result of blocking or not blocking the image resources to which the business image belongs. Here, blocking can refer to the operation performed by the computer device to reduce the probability of recommending the image resources to which the business image belongs, while not blocking can refer to another operation performed by the computer device to ensure or not reduce the probability of recommending the image resources to which the business image belongs.
[0075] It is understandable that each image quality category has multiple corresponding standard images. Therefore, each standard image corresponding to an image quality category can be bound to a corresponding image quality score and image quality category (i.e., image quality data). It is also understandable that a set of standard images corresponding to a business scenario can include relevant descriptive information for each image quality category. For example, this relevant descriptive information can include keyword information, which can describe the key descriptive information of the corresponding image quality category (used to describe the image quality judgment criteria for the corresponding image quality category). For example, the key descriptive information for the completely blurry quality category could be: solid color, subject unrecognizable; the key descriptive information for the subject-blurred quality category could be: subject present, subject outline unclear; the key descriptive information for the blurred quality category could be: subject and outline distinguishable, no obvious blurriness; the key descriptive information for the slightly blurred quality category could be: subject clear, details have flaws; the key descriptive information for the clear category could be: clear. The relevant descriptive information can also include image detail information, which can be used to specifically describe the image quality judgment criteria for the corresponding image quality category. This will not be elaborated upon here.
[0076] Optionally, in this embodiment, the business image can be an image directly published or uploaded by a business object (such as a user), or it can be a video frame selected from multiple video frames included in the business video; no limitation is made here. The business video can be a video published or uploaded by a business object (such as a user).
[0077] S102. In the standard image set corresponding to the target business scenario, retrieve the target standard image that matches the business image, and obtain the target image quality data bound to the target standard image.
[0078] The standard image set corresponding to the target business scenario refers to the standard image set configured for the target business scenario, and a standard image refers to an image used as a reference when performing image quality detection on the business image.
[0079] It is understood that in the embodiments of this application, the image quality judgment standards for different images differ in different business scenarios. Therefore, different standard images can be configured for different business scenarios as references. That is, the standard images involved in the embodiments of this application refer to images that can pre-configure and provide corresponding image quality judgment standards (i.e., the above-mentioned image quality reference standards) for different image quality categories (such as completely blurry, subject blurry, slightly blurry, clear, etc.) in different business scenarios. This is so that the standard images belonging to the same scenario as the business images can be used as a reference when performing image quality detection on the business images, thereby obtaining image quality detection results adapted to the business scenario to which the business images belong.
[0080] The target standard image refers to a standard image, in the standard image set corresponding to the target business scenario, that matches the business image. Specifically, the target standard image can be an image in the standard image set whose image similarity to the business image meets the similarity condition.
[0081] In one embodiment, retrieving a target standard image that matches a business image from a set of standard images corresponding to a target business scenario may include the following steps: ① Performing feature extraction processing on the business image using a feature extractor to obtain business feature data of the business image; ② Obtaining standard feature data corresponding to each standard image in the set of standard images corresponding to the target business scenario; the standard feature data corresponding to each standard image is obtained by performing feature extraction processing on each standard image using a feature extractor; ③ Determining the image similarity between each standard image and the business image based on the feature similarity between the business feature data and the standard feature data corresponding to each standard image; ④ Searching for standard images in the set of standard images corresponding to the target business scenario whose image similarity meets the similarity condition with the business image, and determining the searched standard images as target standard images that match the business image.
[0082] The feature extractor can be a processing module used to extract features from business images. When extracting features from business images, this feature extractor integrates semantic features and image quality features, making the found target standard image more similar to the business image. This allows the target standard image (highly similar to the business image) to be used as a reference for image quality detection of the business image, making the image quality detection results more suitable for the corresponding business scenario and improving the flexibility and accuracy of image quality detection.
[0083] Business feature data refers to the feature data obtained by performing feature extraction processing on business images. This business feature data can be represented as a feature vector, which is not limited here.
[0084] The standard feature data can refer to the feature data obtained by performing feature extraction processing on any standard image. This standard feature data can be represented as a feature vector, which is not limited here. It is understood that the standard feature data corresponding to any standard image is obtained by performing feature extraction processing on any standard image using a feature extractor. The specific extraction process can be referred to the relevant description of determining business feature data, which will not be elaborated here. It is understood that in the embodiments of this application, before retrieving a standard image matching a business image, feature extraction processing can be performed on each standard image in advance to obtain the corresponding standard feature data. Then, when retrieving a standard image matching a business image, the standard feature data of each standard image can be directly obtained, thereby improving the efficiency of image retrieval. Furthermore, the pre-generated standard feature data of each standard image can be stored, so when retrieving different business images, it is not necessary to perform repeated calculations; the pre-generated standard feature data of each standard image can be directly obtained, greatly improving the efficiency of image retrieval.
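A minimal sketch of the pre-computation described above, assuming a feature extractor is available as a callable. `build_feature_index` and `toy_extractor` are hypothetical names; the toy extractor is only a stand-in so that the example runs, and it does not reflect the feature extractor of this application.

```python
from typing import Callable, Dict, List

Vector = List[float]


def build_feature_index(
    standard_images: Dict[str, bytes],
    extract: Callable[[bytes], Vector],
) -> Dict[str, Vector]:
    """Extract features once per standard image and keep them keyed by image id."""
    return {image_id: extract(data) for image_id, data in standard_images.items()}


def toy_extractor(data: bytes) -> Vector:
    """Stand-in extractor (assumption): derives two numbers from the raw bytes."""
    return [float(len(data)), float(sum(data) % 7)]


# Built once, before any business image arrives.
feature_index = build_feature_index({"I1": b"abc", "I2": b"hello"}, toy_extractor)

# At retrieval time the stored vectors are read directly, without re-extraction.
standard_feature = feature_index["I1"]
```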
[0085] Feature similarity refers to the similarity between business feature data and any standard feature data. Image similarity refers to the similarity between a business image and any standard image. The image similarity between any standard image and a business image can be determined based on the feature similarity between the business feature data of the business image and the standard feature data of the standard image. For example, feature similarity can be directly used as image similarity, or feature similarity can be converted to the similarity range corresponding to image similarity (e.g., 0-1) to obtain image similarity. It should be understood that image similarity is directly proportional to feature similarity.
[0086] Determining the feature similarity between the business feature data and the standard feature data corresponding to any standard image can include: obtaining a feature product based on the product (i.e., the inner product of the two feature vectors) between the business feature data and the standard feature data corresponding to that standard image; obtaining a feature norm product based on the product between the norm of the business feature data and the norm of the standard feature data corresponding to that standard image; and then dividing the feature product by the feature norm product to obtain the feature similarity. The feature similarity obtained in this way is the cosine similarity of the two feature vectors.
[0087] For example, the image similarity between a business image and any standard image can be calculated by referring to the following formula (1). $S(I_t, I_i) = \dfrac{F(I_t) \cdot F(I_i)}{\lVert F(I_t) \rVert \, \lVert F(I_i) \rVert}$ Formula (1)
[0088] Where $S(I_t, I_i)$ represents the image similarity between the business image $I_t$ and any standard image $I_i$, $F(I_t)$ represents the business feature data of the business image $I_t$, and $F(I_i)$ represents the standard feature data of the standard image $I_i$.
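The computation described in paragraph [0086] (the feature product divided by the product of the two norms) is cosine similarity. The following is a minimal sketch using only the Python standard library; the function name `cosine_similarity` and the use of plain lists as feature vectors are assumptions made for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of a and b divided by the product of their Euclidean norms."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    feature_product = sum(x * y for x, y in zip(a, b))
    norm_product = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm_product == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return feature_product / norm_product
```

The result lies in [-1, 1]; if image similarity is expected in a 0-1 range, a conversion such as (s + 1) / 2 is one possible choice, though the text does not specify which conversion is used.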
[0089] The similarity condition refers to the condition that a standard image must meet to be selected as a target standard image matching the business image. For example, when the standard images are sorted by image similarity from high to low, the similarity condition can be that the sorting number of a standard image is less than or equal to a preset sorting number. The sorting number is the position of a standard image in the sorted order; for example, the sorting number of the first standard image is 1, and the earlier an image appears in the order, the smaller its sorting number. The preset sorting number is the threshold that the sorting number of a target standard image must not exceed. For example, a preset sorting number of 5 indicates that the 5 standard images with the highest image similarity meet the similarity condition and can be selected as the target standard images. Optionally, the similarity condition can instead be that the image similarity is greater than or equal to a similarity threshold, which is the minimum image similarity that a target standard image must reach; for example, the similarity threshold can be 0.6. Alternatively, the similarity condition can require both of the above conditions at once: when the standard images are sorted by image similarity from high to low, the sorting number is less than or equal to the preset sorting number, and the image similarity is greater than or equal to the similarity threshold. Other similarity conditions can also be used, which are not limited here. It can be understood that, in the embodiments of this application, the greater the image similarity between a standard image and the business image, the more similar the two are. Therefore, selecting by the similarity condition ensures that standard images more similar to the business image are used as references for image quality detection, which improves the accuracy of image quality detection for the business image.
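The similarity conditions described above (a preset sorting number, a similarity threshold, or both) can be sketched as follows. The function name `select_target_images` and its parameter names are hypothetical; the default of 5 and the threshold 0.6 used in the usage lines are the example values given in the text.

```python
from typing import Dict, List, Optional, Tuple


def select_target_images(
    similarities: Dict[str, float],
    max_rank: Optional[int] = 5,
    min_similarity: Optional[float] = None,
) -> List[Tuple[str, float]]:
    """Return (image id, similarity) pairs that satisfy the similarity condition."""
    # Sort from high to low image similarity; position 1 is the most similar image.
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    if max_rank is not None:
        ranked = ranked[:max_rank]  # sorting number <= preset sorting number
    if min_similarity is not None:
        ranked = [(i, s) for i, s in ranked if s >= min_similarity]  # similarity threshold
    return ranked


sims = {"I1": 0.9, "I2": 0.4, "I3": 0.7, "I4": 0.65}
top_two = select_target_images(sims, max_rank=2)
both_conditions = select_target_images(sims, max_rank=3, min_similarity=0.6)
```

Passing `None` for either parameter disables that condition, so the same function covers all three variants of the similarity condition.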
[0090] The target image quality data can refer to the image quality data bound to the target standard image. For an introduction to image quality data, please refer to the relevant descriptions above, which will not be repeated here.
[0091] It is understandable that the above-mentioned retrieval of target standard images matching the business images from the standard image set corresponding to the target business scenario can be performed by the image retrieval module.
[0092] S103. When the business feature data of the business image and the target standard feature data of the target standard image are obtained, the business feature data, the target standard feature data and the target image quality data are concatenated to obtain multimodal feature data to be input into the target business model.
[0093] The process of acquiring business feature data can be referred to the relevant descriptions above, and will not be repeated here.
[0094] The target standard feature data refers to the standard feature data of the target standard image. For the acquisition process, refer to the relevant description above, which will not be repeated here.
[0095] The multimodal feature data can be feature data to be input into the target business model. This multimodal feature data includes data from multiple business modalities; specifically, it can include image feature modal data corresponding to business feature data and target standard feature data, as well as text modal data corresponding to target image quality data. In this embodiment, the multimodal feature data is obtained by concatenating the feature data from these two business modalities, hence the term "multimodal feature data."
[0096] In one embodiment, the target standard image includes M standard images, where M is a positive integer; the target standard feature data includes standard feature data corresponding to each of the M standard images; and the target image quality data includes image sharpness data corresponding to each of the M standard images. Then, concatenating the business feature data, target standard feature data, and target image quality data to obtain multimodal feature data for inputting into the target business model can include the following steps: ① Performing a first concatenation process on the standard feature data corresponding to each of the M standard images to obtain first concatenated feature data; ② Performing a second concatenation process on the image sharpness data corresponding to each of the M standard images to obtain second concatenated feature data; the business modality indicated by the second concatenated feature data is different from the business modality indicated by the first concatenated feature data; ③ Obtaining a concatenation function for multimodal concatenation processing, and concatenating the business feature data, first concatenated feature data, and second concatenated feature data using the concatenation function to obtain multimodal feature data for inputting into the target business model.
[0097] The number of target standard images is M, where M is a positive integer. This means that the target standard images can include one or more standard images. For example, the target standard images can include the five standard images from the standard image set that have the highest image similarity to the business image, which can be represented as target standard image $I_{top5} = \{I_1, I_2, I_3, I_4, I_5\}$.
[0098] Accordingly, the target standard feature data includes the standard feature data corresponding to each of the M standard images; the target image quality data includes the image quality data corresponding to each of the M standard images. Specifically, when the image quality is sharpness, the image quality data corresponding to each standard image can be image sharpness data, that is, the target image quality data includes the image sharpness data corresponding to each of the M standard images.
[0099] The first concatenated feature data refers to the feature data obtained by concatenating the standard feature data corresponding to each of the M standard images. The second concatenated feature data refers to the feature data obtained by concatenating the image sharpness data corresponding to each of the M standard images. It is understood that the business modality indicated by the second concatenated feature data differs from the business modality indicated by the first concatenated feature data. Specifically, the business modality indicated by the first concatenated feature data can be an image feature modality, while the business modality indicated by the second concatenated feature data can be a text modality.
[0100] The concatenation function can refer to a function used for multimodal concatenation processing, such as the `concat()` function. Based on this concatenation function, business feature data, first concatenated feature data, and second concatenated feature data can be concatenated to obtain multimodal feature data for input into the target business model.
[0101] For example, the process of determining multimodal feature data can be represented by the following formula (2).
P = concat(F_t, F_topM, Q_topM)    Formula (2)
[0102] Where P represents the multimodal feature data, concat() represents the concatenation function, F_t represents the business feature data of the business image, F_topM represents the first concatenated feature data corresponding to the standard feature data of the M standard images (e.g., F_top5 for 5 standard images), and Q_topM represents the second concatenated feature data corresponding to the image sharpness data of the M standard images (e.g., Q_top5 for 5 standard images).
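As a rough illustration of formula (2), the following Python sketch runs the three concatenation steps on toy data. The application does not specify how concat() is implemented, so the `Segment` structure, the vector sizes, and the textual rendering of the sharpness data are all assumptions made only for this example.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Segment:
    """One labelled piece of the multimodal input (hypothetical structure)."""
    modality: str                       # "image_feature" or "text"
    payload: Union[List[float], str]


def concat(f_t: List[float],
           f_top: List[List[float]],
           q_top: List[str]) -> List[Segment]:
    # Step 1: first concatenation, joining the M standard feature vectors.
    first = [x for vec in f_top for x in vec]
    # Step 2: second concatenation, joining the M sharpness entries as text.
    second = "; ".join(q_top)
    # Step 3: multimodal concatenation of F_t, F_topM and Q_topM.
    return [
        Segment("image_feature", f_t),
        Segment("image_feature", first),
        Segment("text", second),
    ]


f_t = [0.1, 0.2]                          # business feature data (toy values)
f_top = [[0.3, 0.4], [0.5, 0.6]]          # M = 2 standard images
q_top = ["sharpness=4", "sharpness=2"]    # assumed textual sharpness data
p = concat(f_t, f_top, q_top)
```

Keeping a modality label on each segment reflects the statement that the first and second concatenated feature data indicate different business modalities.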
[0103] S104. When obtaining the model text prompt information for the target business model, input the multimodal feature data and the model text prompt information into the target business model. The target business model then performs image quality detection on the multimodal feature data according to the model text prompt information to obtain image quality detection data.
[0104] The model text prompts can be used to instruct the target business model to perform image quality detection (e.g., sharpness assessment and / or noise detection) on multimodal feature data. For example, the model text prompts can include instructions telling the model (i.e., the target business model) what tasks it needs to complete, output requirement prompts indicating the desired output of the target business model, or other prompts, which are not limited here.
[0105] Specifically, when inputting multimodal feature data and model text prompts into the target business model, multimodal model prompts can be generated based on the multimodal feature data and model text prompts, and these multimodal model prompts can then be input into the target business model. The multimodal model prompts refer to the prompt words (i.e., prompts) that are input into the target business model. Prompting is an instruction-based artificial intelligence (AI) technique: through explicit and specific guidance (e.g., prompt words), a large language model can be guided to produce output that meets specific output requirements.
[0106] The multimodal model prompt information for the target business model can be constructed from the multimodal feature data and the model text prompt information by means of prompt engineering. Through prompt engineering, more accurate prompt words (i.e., multimodal model prompt information) can be generated, thereby obtaining more accurate image quality detection results.
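To make the prompt construction concrete, here is a minimal Python sketch that assembles a multimodal model prompt from a model text prompt and multimodal feature data. The message layout (a list of typed entries), the function name `build_multimodal_prompt`, and the wording of the text prompt are hypothetical; the application does not prescribe a prompt format.

```python
from typing import Any, Dict, List


def build_multimodal_prompt(text_prompt: str,
                            multimodal_features: List[Dict[str, Any]]
                            ) -> List[Dict[str, Any]]:
    """Assemble a multimodal model prompt (hypothetical message layout)."""
    # The model text prompt comes first: task instruction plus output requirement.
    prompt = [{"type": "text", "content": text_prompt}]
    # Each feature segment keeps its modality label so that image-feature
    # segments and text segments remain distinguishable.
    for segment in multimodal_features:
        prompt.append({"type": segment["modality"], "content": segment["payload"]})
    return prompt


text_prompt = ("Assess the sharpness of the business image using the reference "
               "images and their sharpness data. Output a single quality score.")
features = [
    {"modality": "image_feature", "payload": [0.1, 0.2]},
    {"modality": "image_feature", "payload": [0.3, 0.4, 0.5, 0.6]},
    {"modality": "text", "payload": "sharpness=4; sharpness=2"},
]
prompt = build_multimodal_prompt(text_prompt, features)
```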
[0107] The target business model refers to a business model used for image quality detection. This target business model can be a large language model (LLM), specifically a multimodal large language model (MLLM). The model structure of this target business model can be a transformer structure (a network model structure), which includes an input layer, an encoder / decoder, and an output layer. The input layer of the target business model can convert the input multimodal model prompts into embedding vectors, which are then encoded and decoded by the encoder / decoder, and the final result is produced by the output layer. The encoder / decoder employs a self-attention mechanism, which captures the dependencies between different business modalities. This allows the target business model to better attend to the correlations between feature data from the various business modalities, thereby improving the accuracy of image quality detection.
[0108] The image quality detection data can be data obtained by performing quality detection on the business image. This image quality detection data can be represented as an image quality score for the business image, or as an image quality category determined based on the image quality score, or as a quality compliance category determined based on the image quality score. The quality compliance category can indicate whether the image quality is qualified or unqualified. The image quality category can be a category derived from the image quality score, as specifically described in Table (1) above. The quality compliance category can be determined based on whether the image quality score is greater than or equal to a scoring threshold. If the image quality score of the business image is greater than or equal to the scoring threshold, the quality compliance category indicates that the image quality is qualified; if the image quality score of the business image is less than the scoring threshold, the quality compliance category indicates that the image quality is unqualified.
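The threshold comparison described above can be sketched in a few lines of Python. The function name and the threshold value of 60 used in the usage lines are assumptions; the application does not fix a scoring threshold.

```python
def compliance_category(score: float, threshold: float) -> str:
    """Return 'qualified' when score >= threshold, otherwise 'unqualified'."""
    return "qualified" if score >= threshold else "unqualified"


# Usage with an assumed scoring threshold of 60.
at_threshold = compliance_category(60.0, 60.0)
below_threshold = compliance_category(59.9, 60.0)
```

Note that a score exactly equal to the threshold counts as qualified, matching the "greater than or equal to" wording.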
[0109] In one embodiment, multimodal feature data and model text prompts are input into a target business model. The target business model then performs image quality detection on the multimodal feature data according to the model text prompts to obtain image quality detection data. This process may include the following steps: ① Constructing multimodal model prompts for the target business model based on the multimodal feature data and model text prompts, and inputting the multimodal model prompts into the target business model; ② Performing image quality detection on the multimodal feature data according to the multimodal model prompts to obtain a first quality detection result for the business image; ③ Determining the first quality detection result as the image quality detection data.
[0110] The description of constructing multimodal model prompts can be found above and will not be repeated here. The description of the model structure for the target business model can also be found above. For example, the multimodal model prompts can instruct the target business model to detect, based on the multimodal feature data, the image quality detection result corresponding to the business feature data in the multimodal feature data, and to output the image quality detection data accordingly.
[0111] The first quality detection result can be a quality detection result determined based on multimodal feature data. This first quality detection result can be represented as an image quality score for the business image. Further, this first quality detection result can be determined as the final image quality detection data. For example, the image quality score indicated by the first quality detection result of the business image can be determined as the image quality detection data; or, if the image quality category to which the business image belongs is determined based on the image quality score indicated by the first quality detection result of the business image, the determined image quality category can be further determined as the image quality detection data; or, if the quality compliance category of the business image is determined based on the image quality score indicated by the first quality detection result of the business image, the determined quality compliance category can be further determined as the image quality detection data.
[0112] The process of image quality detection for business images based on multimodal feature data is illustrated here. Please refer to Figure 5, which is a flowchart illustrating an image quality detection process provided in an embodiment of this application. As shown in Figure 5, the business image in the target business scenario can first be obtained (as shown in 501a in Figure 5). Then, a feature extractor (as shown in 503a in Figure 5) can be used to extract features from the business image to obtain business feature data (as shown in 504a in Figure 5). Furthermore, standard feature data of each standard image in the standard image set corresponding to the target business scenario can be obtained (as shown in 505a in Figure 5). This data can be obtained by performing feature extraction on each standard image in the standard image set corresponding to the target business scenario (as shown in 502a in Figure 5) using the feature extractor (as shown in 503a in Figure 5). Furthermore, the image retrieval module (as shown in 506a in Figure 5) can determine the target standard image matching the business image based on the standard feature data and business feature data of each standard image in the standard image set corresponding to the target business scenario (as shown in 507a in Figure 5). The specific image retrieval process can be referred to the relevant description of step S102, which will not be repeated here. The target standard image may include multiple standard images, such as image Itop1, image Itop2, and image ItopM, etc. Then, the target image quality data corresponding to the target standard image can be obtained (as shown in 508a in Figure 5). The target image quality data includes the image quality data corresponding to each standard image in the target standard image, such as the quality data Qtop1 corresponding to image Itop1, the quality data Qtop2 corresponding to image Itop2, the quality data QtopM corresponding to image ItopM, etc.
[0113] Furthermore, target standard feature data corresponding to the target standard image can be obtained (as shown in 509a in Figure 5). The target standard feature data can include the standard feature data corresponding to each standard image in the target standard image, such as the feature data Ftop1 corresponding to image Itop1, the feature data Ftop2 corresponding to image Itop2, the feature data FtopM corresponding to image ItopM, and so on.
[0114] Furthermore, multimodal feature data can be constructed based on the business feature data, target standard feature data, and target image quality data (as shown in 510a in Figure 5). The specific construction method can be referred to the relevant description of step S103 above, which will not be repeated here.
[0115] Furthermore, prompt engineering (as shown in 511a in Figure 5) can be used to determine (i.e., construct) the multimodal model prompt information based on the multimodal feature data and the model text prompt information. Then, the target business model (as shown in 512a in Figure 5) can perform image quality detection on the multimodal feature data according to the instructions of the multimodal model prompt information to obtain the image quality detection data of the business image (as shown in 513a in Figure 5).
[0116] It is understood that, in the embodiments of this application, the computer device can use the multimodal retrieval-augmented generation (RAG) method provided by the multimodal RAG architecture to retrieve target standard images that match the business image. For example, the multimodal RAG method combines image retrieval with MLLM technology: using the business image and the standard image set provided by the business party corresponding to the target business scenario, it accurately retrieves the top 5 reference images (i.e., M standard images) that are closest to the business image in terms of semantics and clarity. Then, the different modal data (e.g., image feature modality and text modality) of the retrieved top 5 reference images (i.e., M standard images) and the image feature modality of the business image can be used to form multimodal feature data, which is then handed over to the MLLM (i.e., the target business model) for comprehensive discrimination. In this embodiment, introducing this image retrieval process improves the accuracy of image clarity assessment for a business image, because the retrieved standard images are those closest to the business image in semantics and clarity. It also enhances the adaptability of the image processing system under the multimodal RAG architecture to different business scenarios. Furthermore, because the standard image set corresponding to each business scenario can be flexibly extended or replaced by relevant management personnel according to actual business needs, it is easier to adjust the image quality judgment standards for each business scenario. Therefore, even when the acquired business image is of a new image type, the standard image set can be adjusted without retraining the model. This allows the computer device, when performing similarity searches using image retrieval technology, to accurately retrieve new standard images matching the new image type from the extended or replaced standard image set, which improves the flexibility of image quality detection.
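The retrieval step can be sketched as a top-M similarity search over the standard image set. The following Python example is only an illustration: the use of cosine similarity as the similarity measure, the dictionary layout of the standard image set, and the toy feature vectors are assumptions not stated in the application.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_top_m(business_feat: List[float],
                   standard_set: Dict[str, dict],
                   m: int) -> List[Tuple[str, float]]:
    """Return the m standard images most similar to the business image."""
    scored = [(image_id, cosine(business_feat, entry["feature"]))
              for image_id, entry in standard_set.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:m]


# Hypothetical standard image set: feature data plus bound sharpness data.
standard_set = {
    "I1": {"feature": [1.0, 0.0], "sharpness": 5},
    "I2": {"feature": [0.0, 1.0], "sharpness": 2},
    "I3": {"feature": [0.9, 0.1], "sharpness": 4},
}
top = retrieve_top_m([1.0, 0.05], standard_set, m=2)
target_ids = [image_id for image_id, _ in top]
# The image quality data bound to each retrieved standard image.
target_quality = [standard_set[image_id]["sharpness"] for image_id in target_ids]
```

Because the standard image set is ordinary data, adding or replacing entries changes what is retrieved without touching any model parameters, which is the flexibility the paragraph above describes.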
[0117] Optionally, after obtaining the first quality detection result, embodiments of this application may further adjust the first quality detection result based on image quality rules associated with the target business scenario to obtain a second quality detection result, and then determine the final image quality detection data based on the first and second quality detection results. The specific process can be referred to the relevant description of the embodiment shown in Figure 6 below.
[0118] Further, please refer to Figure 6, which is a schematic flowchart of an image processing method provided in an embodiment of this application. This method can be executed by a computer device, such as the server 100a in Figure 1 above. The method may include at least the following steps S201-S208.
[0119] S201. Obtain the business image to be subjected to image quality detection; the business image is an image under the target business scenario determined in at least one business scenario, and each business scenario in at least one business scenario is configured with a corresponding standard image set.
[0120] In other words, the business images here can be images from the target business scenario; the target business scenario refers to one scenario among the at least one business scenario; each business scenario corresponds to one standard image set.
[0121] S202. In the standard image set corresponding to the target business scenario, retrieve the target standard image that matches the business image, and obtain the target image quality data bound to the target standard image.
[0122] S203. When the business feature data of the business image and the target standard feature data of the target standard image are obtained, the business feature data, the target standard feature data and the target image quality data are concatenated to obtain multimodal feature data for inputting into the target business model.
[0123] The descriptions of steps S201-S203 can be found in the descriptions of steps S101-S103 above, and will not be repeated here.
[0124] As described above, business feature data is obtained by performing feature extraction processing on business images using a feature extractor. It should be understood that the feature extractor here can specifically include a semantic feature extractor for extracting semantic features and a quality feature extractor for extracting quality features.
[0125] In one embodiment, the business feature data is obtained by performing feature extraction processing on the business image using a feature extractor, which includes a semantic feature extractor and a quality feature extractor. Therefore, the method for determining the business feature data of a business image may include the following steps: ① Performing semantic feature extraction processing on the business image using a semantic feature extractor to obtain semantic feature data of the business image; ② Performing quality feature extraction processing on the business image using a quality feature extractor to obtain quality feature data of the business image; ③ Performing feature fusion processing on the semantic feature data and the quality feature data to obtain the business feature data of the business image.
[0126] The semantic feature extractor can be an extractor used to extract semantic features (i.e., semantic feature data) of an image (such as a business image). The quality feature extractor can be an extractor used to extract image quality features (i.e., quality feature data) of an image (such as a business image). Semantic feature data refers to the semantic features of an image, used to characterize information about the image's content. Quality feature data refers to the image quality features of an image (such as a business image), used to characterize information about the image's quality. In this application, the network structures of the semantic feature extractor and the quality feature extractor can be the same or different, and are not limited here. For example, the network structures of the semantic feature extractor and the quality feature extractor can be a CNN (Convolutional Neural Network) network.
[0127] Specifically, the feature fusion processing of semantic feature data and quality feature data can be performed through a feature fusion network. This feature fusion network can be any network used for fusing semantic and quality feature data; for example, it could be a transformer (a network structure), but this is not limited here.
[0128] For example, please refer to Figure 7, which is a flowchart illustrating a feature extraction process provided in an embodiment of this application. As shown in Figure 7, when a business image is acquired (as shown in 701a), semantic features can be extracted from the business image using a semantic feature extractor (as shown in 702a) to obtain semantic feature data. Furthermore, quality features can be extracted from the business image using a quality feature extractor (as shown in 703a) to obtain quality feature data. Then, a feature fusion network (as shown in 704a) can be used to fuse the semantic feature data and the quality feature data to obtain the business feature data of the business image (as shown in 705a).
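The flow in Figure 7 can be sketched as two extractors followed by a fusion step. In this Python illustration the two "extractors" are toy hand-written functions rather than trained networks, and fusion is plain concatenation standing in for the feature fusion network; all of these are assumptions made only to show the data flow.

```python
from typing import Callable, List, Sequence

Image = Sequence[Sequence[float]]
Extractor = Callable[[Image], List[float]]


def toy_semantic(image: Image) -> List[float]:
    # Toy stand-in for the semantic feature extractor: mean pixel value.
    pixels = [p for row in image for p in row]
    return [sum(pixels) / len(pixels)]


def toy_quality(image: Image) -> List[float]:
    # Toy stand-in for the quality feature extractor: mean absolute difference
    # between horizontally adjacent pixels (assumes at least two columns).
    diffs = [abs(row[i + 1] - row[i]) for row in image for i in range(len(row) - 1)]
    return [sum(diffs) / len(diffs)]


def concat_fusion(semantic: List[float], quality: List[float]) -> List[float]:
    # Stand-in for the feature fusion network (704a): simple concatenation.
    return list(semantic) + list(quality)


def extract_business_features(image: Image,
                              semantic_extractor: Extractor,
                              quality_extractor: Extractor,
                              fuse: Callable[[List[float], List[float]], List[float]]
                              ) -> List[float]:
    semantic = semantic_extractor(image)     # step 1: semantic feature data
    quality = quality_extractor(image)       # step 2: quality feature data
    return fuse(semantic, quality)           # step 3: business feature data


business_image = [[0.0, 1.0], [1.0, 0.0]]
business_features = extract_business_features(
    business_image, toy_semantic, toy_quality, concat_fusion)
```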
[0129] It is understandable that both the semantic feature extractor and the quality feature extractor can be obtained by training on the corresponding training data.
[0130] In one embodiment, the method for determining a semantic feature extractor may include the following steps: ① acquiring a first sample image for training an initial semantic feature extractor; ② performing semantic feature extraction processing on the first sample image using the initial semantic feature extractor to obtain sample semantic feature data of the first sample image; ③ performing image scaling processing on the first sample image to obtain a scaled sample image corresponding to the first sample image; ④ performing semantic feature extraction processing on the scaled sample image using the initial semantic feature extractor to obtain scaled semantic feature data of the scaled sample image; ⑤ adjusting the parameters of the initial semantic feature extractor based on the sample semantic feature data and the scaled semantic feature data, and determining the initial semantic feature extractor after parameter adjustment as the semantic feature extractor (or the target semantic feature extractor, i.e., the target semantic feature extractor can be the semantic feature extractor obtained after adjusting the parameters of the initial semantic feature extractor).
[0131] The initial semantic feature extractor can be an untrained semantic feature extractor (e.g., a semantic feature extractor that requires parameter fine-tuning during training), and its network structure is consistent with that of the target semantic feature extractor. The first sample image can be a sample image used to train the initial semantic feature extractor.
[0132] Here, sample semantic feature data refers to the feature data obtained by performing semantic feature extraction processing on the first sample image through an initial semantic feature extractor, which can characterize the semantic features of the first sample image. The method for determining sample semantic feature data can refer to the above introduction on determining the semantic feature data of business images, and will not be repeated here.
[0133] The scaled sample image can be an image obtained by scaling the first sample image. The scaled sample image is identical to the first sample image in terms of image content, but the image size is different, which means that the image quality of the two is different.
[0134] Scaled semantic feature data refers to the feature data obtained by performing semantic feature extraction on the scaled sample image through the initial semantic feature extractor. The method for determining scaled semantic feature data can be referred to the above introduction on determining the semantic feature data of business images, and will not be repeated here.
[0135] It is understandable that the scaled sample image is consistent with the first sample image in terms of image content. Therefore, the sample semantic feature data extracted by the same initial semantic feature extractor and the scaled semantic feature data should be as consistent as possible. Therefore, loss information for the initial semantic feature extractor (which can be denoted as the first loss information) can be calculated based on the sample semantic feature data and the scaled semantic feature data. Then, the network parameters of the initial semantic feature extractor can be adjusted based on the first loss information until the first loss information converges. This allows the trained initial semantic feature extractor to extract the semantic features of the image. Thus, the initial semantic feature extractor after parameter adjustment can be determined as the semantic feature extractor (i.e., the target semantic feature extractor mentioned above).
[0136] For example, please refer to Figure 8, which is a flowchart illustrating the training process of a semantic feature extractor provided in an embodiment of this application. As shown in Figure 8, a first sample image 801a for training the initial semantic feature extractor can first be obtained, and the first sample image 801a can be scaled to obtain a scaled sample image 802a. It can be seen that the image content of the first sample image 801a and the scaled sample image 802a is the same, but the image size is different. Then, the initial semantic feature extractor (as shown in 803a in Figure 8) can be used to perform semantic feature extraction processing on the first sample image to obtain sample semantic feature data of the first sample image (as shown in 804a in Figure 8). The initial semantic feature extractor (as shown in 803a in Figure 8) can then be used to perform semantic feature extraction processing on the scaled sample image to obtain scaled semantic feature data of the scaled sample image (as shown in 805a in Figure 8). Based on the sample semantic feature data and the scaled semantic feature data, the first loss information can be determined (as shown in 806a in Figure 8). For example, in this embodiment, the difference between the sample semantic feature data and the scaled semantic feature data can be calculated, and the first loss information can be determined from that difference. It should be understood that the first loss information here is used to provide feedback on the difference in image content between the first sample image 801a and the scaled sample image 802a as extracted by the initial semantic feature extractor. Based on this first loss information, the network parameters of the initial semantic feature extractor are adjusted until the loss converges. In other words, in this embodiment, when the difference between the two semantic feature data most recently extracted by the initial semantic feature extractor after network parameter adjustment (i.e., the latest sample semantic feature data and the latest scaled semantic feature data) is sufficiently small, it indicates that the two have been kept as consistent as possible in terms of image content. Therefore, it can be determined that the initial semantic feature extractor after network parameter adjustment has reached loss convergence, and thus the initial semantic feature extractor after network parameter adjustment can be determined as the semantic feature extractor.
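The idea behind the first loss information can be illustrated with a small Python sketch. It is not the training procedure of the application: the nearest-neighbour 2x upscaling, the mean squared difference used as the loss, and the two toy extractors are assumptions chosen so that the effect is easy to see.

```python
from typing import Callable, List

Image = List[List[float]]


def upscale2x(image: Image) -> Image:
    """Nearest-neighbour 2x upscaling: same content, different size."""
    out = []
    for row in image:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def mse(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def first_loss(extractor: Callable[[Image], List[float]], image: Image) -> float:
    # Difference between sample semantic features and scaled semantic features.
    return mse(extractor(image), extractor(upscale2x(image)))


def mean_extractor(image: Image) -> List[float]:
    # Toy extractor whose output does not depend on image size.
    pixels = [p for row in image for p in row]
    return [sum(pixels) / len(pixels)]


def sum_extractor(image: Image) -> List[float]:
    # Toy extractor whose output changes with image size.
    return [sum(p for row in image for p in row)]


sample = [[1.0, 2.0], [3.0, 4.0]]
loss_invariant = first_loss(mean_extractor, sample)
loss_sensitive = first_loss(sum_extractor, sample)
```

A size-invariant extractor yields zero loss, while a size-sensitive one is penalized; in actual training, the network parameters would be adjusted to drive this loss down.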
[0137] In one embodiment, the method for determining a quality feature extractor may include the following steps: ① acquiring a second sample image for training an initial quality feature extractor; ② performing quality feature extraction processing on the second sample image using the initial quality feature extractor to obtain sample quality feature data of the second sample image; ③ segmenting the second sample image to obtain multiple image slices corresponding to the second sample image, and randomly combining the multiple image slices to obtain a combined sample image corresponding to the second sample image; the arrangement and display order of the multiple image slices in the combined sample image is different from the arrangement and display order of the multiple image slices in the second sample image; ④ performing quality feature extraction processing on the combined sample image using the initial quality feature extractor to obtain combined quality feature data of the combined sample image; ⑤ adjusting the parameters of the initial quality feature extractor based on the sample quality feature data and the combined quality feature data, and determining the initial quality feature extractor after parameter adjustment as the quality feature extractor (i.e., the target quality feature extractor; for example, the quality feature extractor may be a quality feature extractor that has completed parameter fine-tuning training).
[0138] The initial quality feature extractor can be an untrained quality feature extractor (e.g., a quality feature extractor that requires parameter fine-tuning during training), and its network structure is consistent with that of the target quality feature extractor. The second sample image can be a sample image used to train the initial quality feature extractor.
[0139] The sample quality feature data refers to the feature data obtained by extracting quality features from the second sample image using the initial quality feature extractor, which can characterize the quality features of the second sample image. The method for determining the sample quality feature data can be referred to the above description of determining the quality feature data of business images, and will not be repeated here.
[0140] The combined sample image can be an image obtained by randomly combining multiple image slices from the second sample image. The arrangement order of the multiple image slices in the combined sample image differs from the arrangement order of the multiple image slices in the second sample image. Therefore, the combined sample image and the second sample image have the same image quality, but their image content is scrambled, meaning the image content is different. An image slice refers to a slice obtained by segmenting the second sample image. For example, the size of each image slice can be the same or different; this is not limited. This application does not limit the number of image slices obtained from segmenting the second sample image; it can be determined according to actual needs. For example, segmenting the second sample image can involve dividing it into four equal parts to obtain four image slices of the same size, or dividing it into six equal parts to obtain six image slices of the same size.
[0141] It is understandable that the combined sample image and the second sample image are consistent in image quality. Therefore, the sample quality feature data and the combined quality feature data extracted by the same initial quality feature extractor should be as consistent as possible. Thus, loss information for the initial quality feature extractor (which can be denoted as the second loss information) can be calculated based on the sample quality feature data and the combined quality feature data. Then, the network parameters of the initial quality feature extractor can be adjusted using this second loss information until the second loss information converges. This allows the trained initial quality feature extractor to extract image quality features. Therefore, the initial quality feature extractor with adjusted parameters can be determined as the quality feature extractor.
[0142] For example, please refer to Figure 9, which is a flowchart illustrating the training process of a quality feature extractor provided in an embodiment of this application. As shown in Figure 9, a second sample image 901a for training the initial quality feature extractor can first be acquired. Then, the second sample image is segmented to obtain multiple image slices corresponding to the second sample image (as shown in 901b in Figure 9). These multiple image slices may include image slice ①, image slice ②, ... image slice ⑥, etc. The multiple image slices can then be combined (e.g., randomly) to obtain a combined sample image 902a. It can be seen that the arrangement order of the image slices in the combined sample image 902a is different from the arrangement order of the multiple image slices in the second sample image 901a. Obviously, the image content of the second sample image 901a and the combined sample image 902a is different, but their image sizes are the same, thus their image quality is consistent. Then, the initial quality feature extractor (as shown in 903a in Figure 9) can be used to extract quality features from the second sample image to obtain sample quality feature data of the second sample image (as shown in 904a in Figure 9). The initial quality feature extractor (as shown in 903a in Figure 9) can then be used to extract quality features from the combined sample image to obtain combined quality feature data of the combined sample image (as shown in 905a in Figure 9). Based on the sample quality feature data and the combined quality feature data, the second loss information can be determined (as shown in 906a in Figure 9). Based on this second loss information, the network parameters of the initial quality feature extractor can be adjusted until the loss converges. Thus, the initial quality feature extractor after network parameter adjustment can be determined as the quality feature extractor.
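The segmentation and random recombination step can be sketched as follows. This Python example assumes an equal grid split (image dimensions divisible by the grid) and equal-sized slices, which is only one of the options the application allows; the function names are hypothetical.

```python
import random
from typing import List

Image = List[List[int]]


def split_into_tiles(image: Image, rows: int, cols: int) -> List[Image]:
    """Cut the image into rows x cols equal slices, in row-major order."""
    th, tw = len(image) // rows, len(image[0]) // cols
    return [[row[c * tw:(c + 1) * tw] for row in image[r * th:(r + 1) * th]]
            for r in range(rows) for c in range(cols)]


def assemble(tiles: List[Image], rows: int, cols: int) -> Image:
    """Reassemble slices (row-major order) into one image."""
    out = []
    for r in range(rows):
        band = tiles[r * cols:(r + 1) * cols]
        for y in range(len(band[0])):
            out.append([p for tile in band for p in tile[y]])
    return out


def shuffled_order(n: int, rng: random.Random) -> List[int]:
    """Return a permutation of range(n) that differs from the original order."""
    if n < 2:
        raise ValueError("need at least two slices to change their order")
    order = list(range(n))
    while order == list(range(n)):
        rng.shuffle(order)
    return order


def combine_randomly(image: Image, rows: int, cols: int, rng: random.Random) -> Image:
    tiles = split_into_tiles(image, rows, cols)
    order = shuffled_order(len(tiles), rng)
    return assemble([tiles[i] for i in order], rows, cols)


second_sample = [[r * 4 + c for c in range(4)] for r in range(4)]
combined = combine_randomly(second_sample, 2, 2, random.Random(0))
```

The combined image has the same size and the same pixels as the second sample image, only rearranged, so a quality feature extractor should produce nearly the same features for both; the second loss information measures how far it is from doing so.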
[0143] S204. When obtaining model text prompt information for the target business model, obtain the image quality rules under the target business scenario.
[0144] The image quality rules are configured for the target business scenario. These image quality rules can refer to rules used to determine image quality. The image quality rules for the target business scenario can be configured specifically for that scenario; for example, they can be configured by the scenario management object (such as an administrative user) of the corresponding business scenario. It is understood that in this embodiment, each business scenario in the set of business scenarios can be configured with corresponding image quality rules. Therefore, once the target business scenario is determined, the image quality rules for that target business scenario can be determined accordingly.
[0145] The image quality rules can include N sub-rules, where N is a positive integer. Each sub-rule can be a judgment logic under the target business scenario. For example, in a resource recommendation scenario, additional image quality rules can be configured so that when the image type of the business image is the target type, a preset quality score can be added to the first quality detection result. Even if the first quality detection result indicates that the business image is relatively blurry, the image quality rules can still raise the detected image quality score, thereby avoiding a reduction in the recommendation probability of business images of the target type. The target type can be a specified type among multiple image types that classify the content of the business image. For example, in some business scenarios, the target type can be an image type that is inherently prone to blurring, such as a selfie, a night scene, a KTV scene, a sports scene, etc., without limitation. The image type to which the business image belongs can be obtained by classifying the business feature data of the business image through the target business model, or by classifying and recognizing it through other image classification models, or it can be labeled by the party pushing the business image (for example, when a user publishes or uploads the business image), without limitation. Based on this, this application allows special handling of certain types of images through the configuration of image quality rules. Furthermore, these image quality rules can be customized at any time without affecting the processing logic of the target business model, which enables more flexible image quality detection for images in various business scenarios.
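The effect of such a sub-rule can be sketched in Python. In the application the rules are supplied to the target business model through the prompt; the sketch below applies them in ordinary code purely to make the logic concrete. The `BonusRule` structure, the image type labels, the bonus values, and the score cap of 100 are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BonusRule:
    """Hypothetical sub-rule: add `bonus` when the image type is `target_type`."""
    target_type: str
    bonus: float


def apply_rules(first_score: float,
                image_type: str,
                rules: List[BonusRule],
                max_score: float = 100.0) -> float:
    """Adjust the first quality score with the scenario's sub-rules."""
    score = first_score
    for rule in rules:
        if image_type == rule.target_type:
            score += rule.bonus
    # Assumed cap so that the adjusted score stays on the original scale.
    return min(score, max_score)


# Assumed rules for a resource recommendation scenario.
rules = [BonusRule("night_scene", 10.0), BonusRule("selfie", 5.0)]
second_score_night = apply_rules(50.0, "night_scene", rules)
second_score_other = apply_rules(50.0, "landscape", rules)
```

Because the rules are plain configuration, they can be changed per business scenario without modifying the model itself.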
[0146] S205. Construct multimodal model prompts for the target business model based on image quality rules, multimodal feature data, and model text prompts, and input the multimodal model prompts into the target business model.
[0147] The description of the multimodal model prompt information can be found in the above-mentioned descriptions and will not be repeated here. For ease of distinction, in this embodiment, the multimodal model prompt information constructed based on multimodal feature data and model text prompt information can be referred to as the first multimodal model prompt information, and the multimodal model prompt information constructed based on image quality rules, multimodal feature data, and model text prompt information can be referred to as the second multimodal model prompt information. Once the second multimodal model prompt information has been constructed, it can be input into the target business model. For example, the second multimodal model prompt information can prompt the target business model to determine the first quality detection result based on the multimodal feature data, and then to adjust the first quality detection result according to the image quality rules to obtain the second quality detection result, so that the model outputs the image quality detection data determined from the first quality detection result and the second quality detection result.
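As a rough illustration of how the second multimodal model prompt information might be assembled, the following Python sketch joins the model text prompt, a placeholder standing in for the multimodal feature data, and the image quality rules into a single prompt. The function name `build_second_prompt`, the placeholder token `<multimodal_features>`, and the instruction wording are all hypothetical; this application does not specify a prompt format.

```python
from typing import List, Sequence


def build_second_prompt(model_text_prompt: str,
                        rules: Sequence[str],
                        feature_placeholder: str = "<multimodal_features>") -> str:
    """Assemble a text prompt from the three inputs named in step S205.

    The placeholder marks where the multimodal feature data would be
    injected; both the token and the layout are assumptions.
    """
    lines: List[str] = [
        model_text_prompt.strip(),
        "",
        f"Input features: {feature_placeholder}",
        "",
        "Image quality rules:",
    ]
    # Number the sub-rules so the model can refer to them individually.
    lines += [f"{index}. {rule}" for index, rule in enumerate(rules, start=1)]
    lines += [
        "",
        "First determine the first quality detection result from the features, "
        "then apply any matching rule to obtain the second quality detection result.",
    ]
    return "\n".join(lines)
```

Because the rules are passed in as plain text, they can be changed per business scenario without touching the model, which matches the flexibility described above.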
[0148] S206. The target business model performs image quality detection on the multimodal feature data according to the multimodal model prompt information to obtain the first quality detection result of the business image.
[0149] It should be understood that when the computer device executes step S206, the multimodal model prompt information here specifically refers to the aforementioned second multimodal model prompt information. This second multimodal model prompt information instructs the target business model, when performing image quality detection on the multimodal feature data used to characterize the business image, to provisionally treat the result of that detection as the first quality detection result, so that step S207 can then be executed. For the specific way in which the computer device determines the first quality detection result, refer to the embodiment corresponding to Figure 3 above, in which the multimodal model prompt information (e.g., the first multimodal model prompt information) is used to obtain the first quality detection result of the business image; it will not be elaborated here.
[0150] S207. Based on the image quality rules and the first quality detection result, obtain the second quality detection result for the business image.
[0151] It should be understood that when the computer device invokes the target business model to execute step S207, it can also optimize the first quality detection result based on the image quality rules indicated by the second multimodal model prompt information to obtain a second quality detection result. The second quality detection result is a quality detection result determined based on the image quality rules; specifically, it is the result obtained by adjusting (i.e., optimizing) the first quality detection result according to the image quality rules. This second quality detection result can be represented as an image quality score.
[0152] In one embodiment, the image quality rule includes N sub-rules, where N is a positive integer. Then, obtaining a second quality detection result for the business image through the image quality rule and the first quality detection result may include the following steps: ① determining a target sub-rule that matches the business image from the N sub-rules; ② calling the target sub-rule to adjust the quality result of the first quality detection result to obtain a second quality detection result for the business image.
[0153] The target sub-rule is the sub-rule that matches the business image. For example, the N sub-rules may include sub-rule R1, sub-rule R2, and sub-rule R3, where sub-rule R1 is: when the image type of the business image is a selfie, increase the image quality score indicated by the first quality detection result by a first preset quality score; sub-rule R2 is: when the image type of the business image is a sports scene, increase the image quality score indicated by the first quality detection result by a second preset quality score; and sub-rule R3 is: when the image type of the business image is a night scene, increase the image quality score indicated by the first quality detection result by a third preset quality score. Therefore, the target sub-rule that matches the business image can be determined from the N sub-rules as follows: obtain the image type corresponding to the business image, and determine the sub-rule that matches the business image based on that image type. For example, when the image type of the business image is a selfie, sub-rule R1 can be determined as the target sub-rule, and the target sub-rule can then be called to adjust the first quality detection result. That is, following the target sub-rule (i.e., sub-rule R1), the first preset quality score is added to the image quality score indicated by the first quality detection result to obtain the second quality detection result.
[0154] For example, any sub-rule can be represented by the following formula (3): $R_j:\ \text{if } K_t = \text{"}K_j\text{"},\ \text{then } Q_{t2} = E(Q_{t1})$ Formula (3)
[0155] Here, $R_j$ represents the j-th sub-rule in the image quality rules; $K_t$ indicates the image type of the business image; "$K_j$" indicates the image type specified in the j-th sub-rule, denoted as image type $K_j$ (for example, the image type $K_j$ can be the above-mentioned sports scene type); $Q_{t2}$ is the second quality detection result of the business image; $Q_{t1}$ is the first quality detection result of the business image; and $E()$ is the function that adjusts the first quality detection result according to the j-th sub-rule.
[0156] Furthermore, the computer device calls the target sub-rule to adjust the quality result of the first quality detection result, which can be expressed by the following formula (4): $Q_{t2} = \mathrm{override}(Q_{t1}, R_j)$ Formula (4)
[0157] Here, $Q_{t2}$ represents the second quality detection result of the business image; $Q_{t1}$ represents the first quality detection result of the business image; $R_j$ represents the target sub-rule that matches the business image; and override() denotes the function that invokes the target sub-rule $R_j$ to adjust the quality result of (i.e., optimize) the first quality detection result $Q_{t1}$.
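The matching and adjustment described by formulas (3) and (4) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: scores are taken to lie on a 0 to 100 scale, the bonus values 10, 15 and 20 stand in for the first, second and third preset quality scores, an image that matches no sub-rule keeps its first score, and the names `SubRule`, `find_target_sub_rule`, `bonus` and `RULES` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class SubRule:
    """One sub-rule R_j: applies when the image type K_t equals image_type."""
    image_type: str                    # K_j
    adjust: Callable[[float], float]   # E(): maps Q_t1 to Q_t2


def find_target_sub_rule(rules: Sequence[SubRule],
                         image_type: str) -> Optional[SubRule]:
    """Return the first sub-rule whose K_j equals K_t, or None if none matches."""
    for rule in rules:
        if rule.image_type == image_type:
            return rule
    return None


def override(q_t1: float, rule: Optional[SubRule]) -> float:
    """Formula (4): Q_t2 = override(Q_t1, R_j).

    Assumption: when no sub-rule matches, the first score is kept unchanged.
    """
    if rule is None:
        return q_t1
    return rule.adjust(q_t1)


def bonus(points: float, max_score: float = 100.0) -> Callable[[float], float]:
    """Build an E() that adds a preset quality score, capped at max_score."""
    return lambda q: min(q + points, max_score)


# Illustrative rule set mirroring sub-rules R1 to R3; the values are made up.
RULES = [
    SubRule("selfie", bonus(10.0)),
    SubRule("sports_scene", bonus(15.0)),
    SubRule("night_scene", bonus(20.0)),
]
```

For instance, `override(60.0, find_target_sub_rule(RULES, "selfie"))` raises a first score of 60 to 70 under these assumed values.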
[0158] S208. Based on the first quality detection result and the second quality detection result, determine the image quality detection data.
[0159] It is understandable that determining image quality detection data based on the first quality detection result and the second quality detection result can be achieved by weighted fusion of the first and second quality detection results. Specifically, it can be achieved by weighted averaging of the image quality score indicated by the first quality detection result and the image quality score indicated by the second quality detection result to obtain the final target image quality score. The final image quality detection data can then be determined based on the target image quality score. For example, the target image quality score can be used as the image quality detection data, or the final image quality category or quality compliance category can be determined based on the target image quality score as the image quality detection data.
[0160] It is understood that, in this embodiment of the application, by introducing an image quality detection optimization mechanism based on image quality rules, the entire image quality detection system can possess a high level of flexibility and customizability. This allows business stakeholders in various business scenarios to more flexibly adjust the image quality rules for their respective scenarios. Furthermore, when performing image quality detection on business images within a target business scenario, specific business rules (i.e., the image quality rules corresponding to the target business scenario) can be directly input into the target business model without retraining it. This allows the target business model to comprehensively consider these rules during image quality detection. Thus, when generating the discrimination result through the target business model, the specific image quality rules can be combined to achieve intelligent adjustment and optimization of the discrimination result, thereby improving the flexibility and customization of image quality detection. Moreover, the target business model can combine the retrieved target standard image and its sharpness score (i.e., image quality data) to comprehensively improve the reliability of image quality detection discrimination. Understandably, the design of the entire image quality inspection system supports rapid adaptation to new sharpness review standards or modifications to existing standards. That is, it can adjust the standard image sets and image quality rules corresponding to various business scenarios. This feature enables the entire image quality inspection system to continuously meet the changing needs of business stakeholders, thereby achieving more efficient sharpness review (i.e., image quality review).
[0161] In one embodiment, a first quality detection result is used to indicate a first image quality score for a business image, and a second quality detection result is used to indicate a second image quality score for a business image. Therefore, determining image quality detection data based on the first and second quality detection results may include the following steps: ① obtaining a first scoring weight for the first image quality score and a second scoring weight for the second image quality score; ② performing a weighted fusion of the first and second image quality scores based on the first and second scoring weights to obtain the image quality detection data.
[0162] The first image quality score can refer to the image quality score indicated by the first quality detection result, and the second image quality score can refer to the image quality score indicated by the second quality detection result. For an introduction to the image quality scores, please refer to the relevant descriptions above, which will not be repeated here.
[0163] Here, the first scoring weight can refer to the weight corresponding to the first image quality score, and the second scoring weight can refer to the weight corresponding to the second image quality score. It is understood that these first and second scoring weights characterize the importance of the first and second image quality scores in determining the final image quality detection data. A larger scoring weight indicates a greater impact of the corresponding image quality score (such as the first or second image quality score) on the determined image quality detection data. It is understood that these first and second scoring weights can be preset according to actual business needs and are not limited here.
[0164] The image quality detection data is obtained by weighted fusion of the first image quality score and the second image quality score. Alternatively, the target image quality score can be obtained by weighted averaging of the first image quality score and the second image quality score, and then the image quality detection data can be determined based on the target image quality score.
[0165] For example, the image quality detection data for business images can be determined by referring to the following formula (5): $Q_t = \alpha Q_{t1} + \beta Q_{t2}$ Formula (5)
[0166] Here, $Q_t$ represents the target image quality score indicated by the image quality detection data of the business image; $Q_{t1}$ represents the first image quality score; and $Q_{t2}$ represents the second image quality score. Correspondingly, $\alpha$ represents the first scoring weight corresponding to the first image quality score $Q_{t1}$, and $\beta$ represents the second scoring weight corresponding to the second image quality score $Q_{t2}$.
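Formula (5) can be written as a short Python function. In this sketch the function name `fuse_scores` and the default weights of 0.5 are assumptions; the application only states that the weights are preset according to business needs. When the two weights sum to 1, the result is the weighted average mentioned above.

```python
def fuse_scores(q_t1: float, q_t2: float,
                alpha: float = 0.5, beta: float = 0.5) -> float:
    """Formula (5): Q_t = alpha * Q_t1 + beta * Q_t2.

    alpha and beta are the first and second scoring weights. Negative
    weights are rejected here, which is an assumption of this sketch.
    """
    if alpha < 0 or beta < 0:
        raise ValueError("scoring weights must be non-negative")
    return alpha * q_t1 + beta * q_t2
```

With a first score of 60 and a rule-adjusted second score of 70, weights of 0.25 and 0.75 give a target score of 67.5, so a larger second weight lets the image quality rules influence the final score more strongly.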
[0167] It is understood that in the embodiments of this application, the target business model can be obtained by fine-tuning a large language model that has been pre-trained with a large amount of sample data. That is, the large language model that has been pre-trained with a large amount of sample data can be used as the initial business model. At this time, the target business model can be obtained by further fine-tuning the model parameters of the initial business model.
[0168] The following describes, with reference to the drawings, the process of image quality detection for business images based on multimodal feature data and image quality rules. For example, please refer to Figure 10, which is a flowchart illustrating another image quality detection process provided in an embodiment of this application. As shown in Figure 10, after obtaining the business image (1001a in Figure 10) in the target business scenario, multimodal feature data (1010a in Figure 10) can be determined based on the steps shown as 1001a-1010a in Figure 10. For the method of determining this multimodal feature data, refer to the relevant descriptions of 501a-510a above; it will not be repeated here. Furthermore, image quality rules corresponding to the target business scenario can be obtained (1011a in Figure 10). Based on the image quality rules, multimodal feature data, and model text prompt information, the multimodal model prompt information is determined through prompt engineering (1012a in Figure 10). Then, the target business model (1013a in Figure 10) can perform image quality detection on the multimodal feature data according to the instructions of the multimodal model prompt information, obtaining the image quality detection data of the business image (1014a in Figure 10). For the specific process of determining the image quality detection data, refer to the relevant descriptions above; it will not be repeated here.
[0169] In one embodiment, the present application may further include the following steps: ① Obtaining sample training data for training an initial business model; the sample training data includes a third sample image, an image quality label associated with the third sample image, and a sample standard image associated with the third sample image; the sample standard image is bound to sample image quality data; ② When obtaining the sample feature data corresponding to the third sample image and the sample standard feature data corresponding to the sample standard image, the sample feature data, the sample standard feature data, and the sample image quality data are concatenated to obtain sample multimodal feature data for inputting the initial business model; ③ Performing image quality detection on the sample multimodal feature data through the initial business model to obtain a predicted image quality detection result; ④ Fine-tuning the model parameters of the initial business model based on the predicted image quality detection result and the image quality label to obtain the target business model.
[0170] The predicted image quality detection result can be the predicted image quality data obtained by performing image quality detection on the multimodal feature data of the samples through the initial business model (e.g., the predicted image quality score indicated by the predicted image quality data). The initial business model can refer to the business model that has not been fine-tuned; for example, the initial business model can be a large language model trained on a large corpus. The model structure of the initial business model can refer to the introduction of the model structure of the target business model above, and will not be repeated here.
[0171] The training data can be sample data used to train the initial business model. The third sample image in the training data can refer to a sample image used for image quality detection. The image quality label can be a label indicating the image quality of the third sample image, used to characterize the actual image quality of the third sample image. This image quality label can be represented as the image quality score, image quality category, or quality compliance category (i.e., image quality is qualified or unqualified) of the third sample image; no specific limitation is made here.
[0172] The sample standard image associated with the third sample image can refer to the standard image corresponding to the third sample image. This sample standard image can be used as a reference for image quality detection of the third sample image. Sample image quality data can refer to the image quality data bound to the sample standard image. For an introduction to sample image quality data, please refer to the relevant description of the image quality data of the standard image mentioned above, which will not be repeated here.
[0173] Sample feature data can refer to the feature data corresponding to the third sample image, which can be obtained by extracting features from the third sample image using the aforementioned feature extractor. The method for determining this sample feature data can refer to the relevant description of the business feature data of the aforementioned business image, and will not be elaborated here. Sample standard feature data can refer to the feature data corresponding to the sample standard image, which can be obtained by extracting features from the sample standard image using the aforementioned feature extractor. The method for determining this sample standard feature data can refer to the relevant description of the standard feature data of the aforementioned standard image, and will not be elaborated here.
[0174] Multimodal feature data of samples is feature data obtained by stitching together sample feature data, sample standard feature data and sample image quality data. The specific acquisition process can be found in the relevant description of multimodal feature data above, and will not be repeated here.
[0175] The predicted image quality detection result can refer to the result obtained by the initial business model from performing image quality detection on the multimodal feature data of the samples. For details, please refer to the relevant description of determining the image quality detection data mentioned above, which will not be repeated here. The predicted image quality detection result can be expressed as an image quality score, the probability for each image quality category, or the probability that the image quality is acceptable; there are no restrictions here.
[0176] Furthermore, a third loss information can be determined based on the predicted image quality detection results and image quality labels. Then, the model parameters of the initial business model can be adjusted based on this third loss information until the loss converges. This makes the predicted image quality detection results obtained by the parameter-adjusted initial business model more similar to the image quality labels. The parameter-adjusted initial business model can then be determined as the target business model. This target business model has the ability to perform image quality detection based on multimodal feature data. Therefore, image quality detection can be performed on business images using this target business model to obtain image quality detection data adapted to the target business scenario.
[0177] In one embodiment, each of the at least one business scenario has a corresponding scenario identifier; obtaining the business image to undergo image quality detection includes: ① obtaining the business video to undergo image quality detection, and selecting a target video frame from multiple video frames included in the business video; the business video is associated with a target scenario identifier; ② searching the at least one business scenario for a business scenario that matches the target scenario identifier; ③ if a target business scenario that matches the target scenario identifier is found, determining the target video frame as the business image to undergo image quality detection.
[0178] The description of at least one business scenario can be found above and will not be repeated here. A scenario identifier can be a unique identifier corresponding to a business scenario. Each business scenario in the set of business scenarios has a corresponding scenario identifier, thus allowing the corresponding business scenario to be determined from the set of business scenarios based on the scenario identifier when it is obtained.
[0179] In this context, "business video" refers to the video to be subjected to image quality inspection. This business video can be a video published or uploaded by a business object (such as a user). A business video can include multiple video frames, which are the basic building blocks of a video, and each video frame is a still image.
[0180] The target video frame can be a video frame selected from multiple video frames included in the business video for image quality detection. Selecting the target video frame from multiple video frames can be done randomly or according to a certain video frame sampling rule; this is not limited here. The video frame sampling rule can refer to the rules used to select video frames from multiple video frames in the business video. For example, the video frame sampling rule can include: determining the number of sampling frames based on the video length of the business video, and selecting video frames as business images based on the number of sampling frames; or, the video frame sampling rule can include: determining key video frames from multiple video frames in the business video, and selecting video frames as business images from the key video frames. The key video frame can refer to a key frame in the business video, which can be the frame containing the key action in the movement of a character or object during animation or video production. Alternatively, the video sampling rule can also be other rules; this is not limited here.
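One possible form of the first video frame sampling rule (deriving the number of sampling frames from the video length, then picking frames) is sketched below in Python. The sampling density of one frame per 10 seconds, the bounds of 1 and 8 frames, the even spacing, and the function names are all assumptions made for illustration; the application leaves the concrete rule open.

```python
import math
from typing import List


def num_sample_frames(duration_s: float, seconds_per_sample: float = 10.0,
                      min_frames: int = 1, max_frames: int = 8) -> int:
    """Derive the number of sampling frames from the video length in seconds."""
    wanted = math.ceil(duration_s / seconds_per_sample)
    # Clamp so that very short videos still yield a frame and long ones stay cheap.
    return max(min_frames, min(wanted, max_frames))


def sample_frame_indices(total_frames: int, n: int) -> List[int]:
    """Pick n evenly spaced frame indices, one at the centre of each equal segment."""
    n = min(n, total_frames)
    if n <= 0:
        return []
    step = total_frames / n
    return [int(step * i + step / 2) for i in range(n)]
```

Each returned index identifies a target video frame that can then be treated as a business image.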
[0181] The business video can be associated with a target scenario identifier, which is the scenario identifier corresponding to the business scenario to which the business video belongs. Based on the target scenario identifier, a matching business scenario can be searched for among the multiple business scenarios included in the business scenario set, and the business scenario that matches the target scenario identifier is identified as the target business scenario. It is understood that if a target business scenario matching the target scenario identifier is found, the business scenario to which the business video belongs is part of the business scenario set, and image quality detection can be performed using this application embodiment. Conversely, if no matching target business scenario is found, the business scenario to which the business video belongs is not part of the business scenario set, and it will be difficult to obtain a standard image set corresponding to that business scenario. In that case, image quality detection need not be performed using this application embodiment.
[0182] For example, in embodiments of this application, if no target business scenario matching the target scenario identifier is found, the clarity of each video frame, or of some key video frames (i.e., business images), in the business video can still be judged with a single-modal clarity judgment algorithm. For example, embodiments of this application can work directly in the image feature modality and extract image features such as contrast, brightness, and texture from each video frame or from some key video frames. The clarity of those frames can then be evaluated by calculating quantitative indicators of these features. In one implementation, contrast measures the brightness difference between bright and dark areas in the business image, and this difference serves as the quantitative indicator of contrast for clarity judgment. For example, embodiments of this application can calculate the ratio of the maximum pixel value in the bright area to the minimum pixel value in the dark area of the business image as the quantitative indicator of contrast. The higher the contrast, the clearer the business image usually is. As another example, the uniformity of brightness can be used as the quantitative indicator of brightness for clarity judgment. Similarly, the richness of texture detail can be used as the quantitative indicator of texture for clarity judgment. It should be understood that the embodiments of this application may base the clarity judgment on one or more of the aforementioned indicators such as contrast, brightness, and texture; this is not limited here.
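The contrast and brightness indicators of this single-modal fallback can be sketched in Python on a grayscale image given as rows of pixel values. Several simplifications are assumed here: the brightest pixel of the whole image stands in for the maximum of the bright area and the darkest pixel for the minimum of the dark area, an offset of 1 is added to both to avoid division by zero, and brightness uniformity is approximated by the population standard deviation (lower means more uniform). The function names are hypothetical.

```python
import statistics
from typing import List, Sequence


def _flatten(pixels: Sequence[Sequence[float]]) -> List[float]:
    """Turn a 2D grid of grayscale values into a flat list."""
    return [value for row in pixels for value in row]


def contrast_ratio(pixels: Sequence[Sequence[float]], offset: float = 1.0) -> float:
    """Ratio of the brightest pixel value to the darkest pixel value.

    The offset keeps the ratio finite when the darkest pixel is 0.
    """
    values = _flatten(pixels)
    if not values:
        raise ValueError("image has no pixels")
    return (max(values) + offset) / (min(values) + offset)


def brightness_spread(pixels: Sequence[Sequence[float]]) -> float:
    """Population standard deviation of brightness; 0.0 means fully uniform."""
    values = _flatten(pixels)
    if not values:
        raise ValueError("image has no pixels")
    return statistics.pstdev(values)
```

A texture indicator is omitted because the text does not say how texture richness is quantified.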
[0183] It is understandable that the number of target video frames selected from the multiple video frames in a business video can be one or more. Each selected target video frame can be used as a business image, and the business scenario to which each business image belongs is the target business scenario. Therefore, image quality detection can be performed on each of the one or more target video frames selected from the business video. Then, based on the image quality detection results corresponding to each target video frame, the video quality detection result of the business video can be determined. This video quality detection result can refer to the detection result of the video quality of the business video. Optionally, the video quality detection result can be determined by calculating the average video frame score (i.e., the average of the image quality scores corresponding to the target video frames) and then determining the video quality detection result from this average. For example, if the average video frame score is greater than or equal to a video score threshold, a video quality detection result indicating that the image quality of the business video is acceptable is determined; conversely, if the average video frame score is less than the video score threshold, a video quality detection result indicating that the image quality of the business video is unacceptable is determined. Furthermore, based on the video quality detection result, corresponding business processing can be performed on the business video in the target business scenario. For example, in the resource recommendation scenario, if the video quality detection result of the business video in a video resource indicates that the video quality of the business video is unqualified, the probability of recommending the video resource corresponding to the business video can be reduced.
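The averaging and threshold comparison for video quality can be sketched in Python as follows. The function name `video_quality`, the returned dictionary layout, and the default threshold of 60 are assumptions; the application only requires comparing the average video frame score against some video score threshold.

```python
from typing import Dict, Sequence, Union


def video_quality(frame_scores: Sequence[float],
                  threshold: float = 60.0) -> Dict[str, Union[float, bool]]:
    """Average the per-frame image quality scores and compare with a threshold.

    The video is treated as qualified when the average video frame score is
    greater than or equal to the threshold, as described in the text.
    """
    if not frame_scores:
        raise ValueError("at least one target video frame score is required")
    average = sum(frame_scores) / len(frame_scores)
    return {"average_score": average, "qualified": average >= threshold}
```

A downstream recommendation step could, for example, lower the recommendation probability whenever `qualified` is false.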
[0184] It is understood that the image processing method provided in this application embodiment can be encapsulated as a reusable image quality detection component for widespread use in multiple business scenarios. For example, the image quality detection component can adopt a preset protocol, such as the Inferno (a protocol type) inference framework protocol, and its input parameters can include the video identifier of the business video (or the image identifier of the business image), the video source (i.e., the business scene identifier), the title, and other parameters. Through efficient interface design, the image quality detection component can be quickly integrated into different business scenarios to meet various content quality assessment needs, helping businesses improve efficiency and user experience in multiple fields such as video processing, content recommendation, and creation optimization. Thus, in actual business processing, the computer device (which can be the terminal device and / or server shown in Figure 1 above) deploying the image quality detection component can call the image quality detection component to quickly perform image quality detection, thereby improving the efficiency of image quality detection.
[0185] Please refer to Figure 11, which is a schematic diagram of the structure of an image processing apparatus provided in an embodiment of this application. As shown in Figure 11, the image processing apparatus 1 can be a computer program (including program code) running on a computer device (e.g., the server 100a mentioned above), for example, the image processing apparatus 1 is an application software; it is understood that the image processing apparatus 1 can be used to execute the corresponding steps in the image processing method provided in the embodiment of this application. As shown in Figure 11, the image processing apparatus 1 may include: an image acquisition unit 11, an image retrieval unit 12, a feature data stitching unit 13, and an image quality detection unit 14;
[0186] Image acquisition unit 11 is used to acquire a business image to be subjected to image quality detection; the business image is an image under a target business scenario determined in at least one business scenario; each business scenario in at least one business scenario is assigned and configured with a corresponding standard image set;
[0187] Image retrieval unit 12 is used to retrieve a target standard image that matches the business image from the standard image set corresponding to the target business scenario, and obtain the target image quality data bound to the target standard image.
[0188] The feature data splicing unit 13 is used to, when the business feature data of the business image and the target standard feature data of the target standard image are obtained, splice the business feature data, the target standard feature data and the target image quality data to obtain multimodal feature data for input into the target business model.
[0189] The image quality detection unit 14 is used to input multimodal feature data and model text prompt information into the target business model when model text prompt information for the target business model is obtained. The target business model then performs image quality detection on the multimodal feature data according to the model text prompt information to obtain image quality detection data.
[0190] In one embodiment, the image retrieval unit 12 is specifically used for:
[0191] The business image is processed by a feature extractor to obtain the business feature data of the business image;
[0192] Obtain the standard feature data corresponding to each standard image in the standard image set corresponding to the target business scenario; the standard feature data corresponding to each standard image is obtained by performing feature extraction processing on each standard image using a feature extractor;
[0193] Based on the feature similarity between the business feature data and the standard feature data corresponding to each standard image, the image similarity between each standard image and the business image is determined.
[0194] From the standard image set corresponding to the target business scenario, find a standard image whose image similarity to the business image meets the similarity condition, and determine the found standard image as the target standard image that matches the business image.
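As a non-limiting illustration, the retrieval steps above may be sketched as follows. The sketch assumes cosine similarity as the feature similarity and a "top M above a threshold" similarity condition; neither is specified in this application, and the names `cosine_similarity`, `retrieve_target_standard_images`, `top_m` and `min_similarity` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector is treated as matching nothing
    return dot / (norm_a * norm_b)


def retrieve_target_standard_images(business_features, standard_set,
                                    top_m=2, min_similarity=0.5):
    """Return up to top_m standard images most similar to the business image.

    standard_set is a list of dicts with the assumed keys
    'image_id', 'features' (precomputed by the same feature extractor)
    and 'quality' (the image quality data bound to the standard image).
    """
    scored = [(cosine_similarity(business_features, s["features"]), s)
              for s in standard_set]
    # Assumed similarity condition: keep candidates above a threshold.
    scored = [(sim, s) for sim, s in scored if sim >= min_similarity]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:top_m]]
```

Because the standard feature data are extracted once per standard image, only the business image needs to pass through the feature extractor at detection time.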
[0195] In one embodiment, the target standard image includes M standard images, where M is a positive integer; the target standard feature data includes standard feature data corresponding to each of the M standard images; and the target image quality data includes image sharpness data corresponding to each of the M standard images.
[0196] Feature data splicing unit 13 is specifically used for:
[0197] The standard feature data corresponding to each of the M standard images are subjected to a first splicing process to obtain the first spliced feature data;
[0198] The image sharpness data corresponding to each of the M standard images are subjected to a second splicing process to obtain the second spliced feature data; the business modality indicated by the second spliced feature data is different from the business modality indicated by the first spliced feature data.
[0199] Obtain the splicing function for multimodal splicing processing. Use the splicing function to splice the business feature data, the first spliced feature data, and the second spliced feature data to obtain the multimodal feature data for input to the target business model.
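For illustration only, the two splicing processes and the multimodal splicing function may be sketched as plain concatenation. Concatenation is an assumption; this application does not specify the form of the splicing function, and `splice_multimodal_features` and the returned `segments` map are hypothetical names.

```python
def splice_multimodal_features(business_features, standard_features_list,
                               sharpness_list):
    """Concatenate business features, standard features and sharpness data.

    Returns the flat multimodal vector together with the index range that
    each modality occupies, so a downstream model can tell them apart.
    """
    # First splicing: join the feature vectors of the M standard images.
    first_spliced = [v for feats in standard_features_list for v in feats]
    # Second splicing: join the M sharpness values (a different modality).
    second_spliced = [float(s) for s in sharpness_list]

    vector = list(business_features) + first_spliced + second_spliced
    b_end = len(business_features)
    s_end = b_end + len(first_spliced)
    segments = {
        "business": (0, b_end),
        "standard": (b_end, s_end),
        "sharpness": (s_end, len(vector)),
    }
    return vector, segments
```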
[0200] In one embodiment, the image quality detection unit 14 is specifically used for:
[0201] Multimodal model prompts are constructed based on multimodal feature data and model text prompts, and then the multimodal model prompts are input into the target business model.
[0202] The target business model performs image quality detection on the multimodal feature data according to the multimodal model prompts to obtain the first quality detection result of the business image;
[0203] The first quality detection result is determined as the image quality detection data.
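One possible way to represent the multimodal model prompt is sketched below. The placeholder token `<features>`, the dictionary layout and the function name `build_multimodal_prompt` are assumptions made for illustration; the actual prompt format expected by the target business model is not specified here.

```python
def build_multimodal_prompt(multimodal_features, text_prompt,
                            feature_placeholder="<features>"):
    """Pair the model text prompt with the multimodal feature data.

    The placeholder marks where the feature data would be injected into
    the model input; it is added at the front if the text lacks one.
    """
    if feature_placeholder not in text_prompt:
        text_prompt = f"{feature_placeholder}\n{text_prompt}"
    return {
        "text": text_prompt,
        "features": list(multimodal_features),
        "placeholder": feature_placeholder,
    }
```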
[0204] In one embodiment, the image quality detection unit 14 is specifically used for:
[0205] Obtain the image quality rules for the target business scenario; the image quality rules are configured specifically for the target business scenario.
[0206] Based on image quality rules, multimodal feature data, and model text prompts, multimodal model prompts are constructed for the target business model, and the multimodal model prompts are input into the target business model.
[0207] The target business model performs image quality detection on the multimodal feature data according to the multimodal model prompts to obtain the first quality detection result of the business image;
[0208] By combining the image quality rules and the first quality detection result, a second quality detection result is obtained for the business image.
[0209] Based on the first quality detection result and the second quality detection result, the image quality detection data is determined.
[0210] In one embodiment, the image quality rule includes N sub-rules, where N is a positive integer;
[0211] Image quality detection unit 14 is specifically used for:
[0212] Determine the target sub-rule that matches the business image from among N sub-rules;
[0213] The target sub-rule is invoked to adjust the first quality detection result, thereby obtaining a second quality detection result for the business image.
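The sub-rule selection and adjustment above may be sketched as follows, assuming each sub-rule carries a matching predicate and an adjustment function. The `SubRule` structure, the first-match policy and the example rule are hypothetical and are not taken from this application.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SubRule:
    name: str
    matches: Callable[[Dict], bool]    # does the rule apply to this image?
    adjust: Callable[[float], float]   # how the first score is adjusted


def apply_target_sub_rule(image_meta: Dict, first_score: float,
                          sub_rules: List[SubRule]) -> float:
    """Return the second quality score from the first matching sub-rule.

    If none of the N sub-rules matches, the first score is returned as-is
    (an assumed fallback).
    """
    for rule in sub_rules:
        if rule.matches(image_meta):
            return rule.adjust(first_score)
    return first_score


# Example rule (hypothetical): cap the score of low-resolution images at 60.
EXAMPLE_RULES = [
    SubRule("low_resolution",
            matches=lambda meta: meta["width"] < 640,
            adjust=lambda score: min(score, 60.0)),
]
```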
[0214] In one embodiment, a first quality detection result is used to indicate a first image quality score of the business image, and a second quality detection result is used to indicate a second image quality score of the business image.
[0215] Image quality detection unit 14 is specifically used for:
[0216] Obtain the first scoring weight for the first image quality score and the second scoring weight for the second image quality score;
[0217] Based on the first and second scoring weights, the first image quality score and the second image quality score are weighted and fused to obtain image quality detection data.
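The weighted fusion can be illustrated with a short sketch. Normalizing by the sum of the weights is an assumption, made so that the fused score stays on the same scale as the inputs; `fuse_quality_scores` is a hypothetical name.

```python
def fuse_quality_scores(first_score, second_score,
                        first_weight, second_weight):
    """Weighted fusion of the first and second image quality scores."""
    total = first_weight + second_weight
    if total <= 0:
        raise ValueError("scoring weights must sum to a positive value")
    return (first_weight * first_score + second_weight * second_score) / total
```

For example, with a first score of 80, a second score of 60, and weights 0.75 and 0.25, the fused score is 0.75 × 80 + 0.25 × 60 = 75.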
[0218] In one embodiment, the business feature data is obtained by performing feature extraction processing on the business image using a feature extractor, which includes a semantic feature extractor and a quality feature extractor.
[0219] The image processing apparatus 1 further includes: a feature extraction unit 15;
[0220] Feature extraction unit 15 is specifically used for:
[0221] The semantic feature extractor is used to extract semantic features from the business image to obtain the semantic feature data of the business image.
[0222] The quality feature extractor is used to extract quality features from the business image to obtain the quality feature data of the business image.
[0223] The semantic feature data and quality feature data are fused to obtain the business feature data of the business image.
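As one illustrative possibility, the feature fusion may normalize each feature vector and concatenate the results. This fusion scheme is an assumption (this application does not fix one), and `l2_normalize` and `fuse_features` are hypothetical helper names.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def fuse_features(semantic_features, quality_features):
    """Fuse semantic and quality feature data into business feature data.

    Each part is normalized first so that neither dominates the fused
    vector purely because of its scale.
    """
    return l2_normalize(semantic_features) + l2_normalize(quality_features)
```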
[0224] In one embodiment, the image processing apparatus 1 further includes: a semantic feature extraction training unit 16;
[0225] Semantic feature extraction training unit 16 is specifically used for:
[0226] Obtain the first sample image used to train the initial semantic feature extractor;
[0227] The semantic features of the first sample image are extracted by the initial semantic feature extractor to obtain the sample semantic feature data of the first sample image;
[0228] The first sample image is scaled to obtain a scaled sample image corresponding to the first sample image.
[0229] The initial semantic feature extractor is used to extract semantic features from the scaled sample image to obtain the scaled semantic feature data of the scaled sample image.
[0230] Based on the sample semantic feature data and scaled semantic feature data, the parameters of the initial semantic feature extractor are adjusted, and the initial semantic feature extractor after parameter adjustment is determined as the semantic feature extractor.
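A minimal sketch of the data side of this training procedure is given below. It assumes that the parameter adjustment minimizes a consistency loss (here, one minus cosine similarity) between the features of the original and the scaled image, so that semantic features become insensitive to scale; this application only states that the parameters are adjusted based on both sets of feature data. Images are modeled as nested lists of pixel values, scaling uses nearest-neighbour sampling, and all function names are hypothetical.

```python
import math


def scale_nearest(image, factor):
    """Scale a 2-D image (list of pixel rows) by nearest-neighbour sampling."""
    height, width = len(image), len(image[0])
    new_h = max(1, round(height * factor))
    new_w = max(1, round(width * factor))
    return [
        [image[min(height - 1, int(r / factor))][min(width - 1, int(c / factor))]
         for c in range(new_w)]
        for r in range(new_h)
    ]


def consistency_loss(sample_features, scaled_features):
    """1 - cosine similarity; 0 when both feature vectors point the same way."""
    dot = sum(a * b for a, b in zip(sample_features, scaled_features))
    norm_a = math.sqrt(sum(a * a for a in sample_features))
    norm_b = math.sqrt(sum(b * b for b in scaled_features))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)
```

In an actual training loop, the initial semantic feature extractor would be applied to both the first sample image and the output of `scale_nearest`, and its parameters updated to reduce `consistency_loss`.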
[0231] In one embodiment, the image processing apparatus 1 further includes: a quality feature extraction training unit 17;
[0232] Quality feature extraction training unit 17 is specifically used for:
[0233] Obtain a second sample image for training the initial quality feature extractor;
[0234] The quality feature extraction process of the second sample image is performed by the initial quality feature extractor to obtain the sample quality feature data of the second sample image;
[0235] The second sample image is segmented to obtain multiple image slices corresponding to the second sample image. These multiple image slices are then randomly combined to obtain a combined sample image corresponding to the second sample image. The arrangement and display order of the multiple image slices in the combined sample image is different from the arrangement and display order of the multiple image slices in the second sample image.
[0236] The combined sample images are processed by the initial quality feature extractor to extract quality features, resulting in combined quality feature data of the combined sample images.
[0237] Based on sample quality feature data and combined quality feature data, the parameters of the initial quality feature extractor are adjusted, and the initial quality feature extractor after parameter adjustment is determined as the quality feature extractor.
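The segmentation and random recombination of image slices may be sketched as follows. The grid-based segmentation, the nested-list image representation and the function names are assumptions for illustration. The reshuffle loop enforces the stated requirement that the slice order in the combined sample image differs from the original order.

```python
import random


def split_into_slices(image, rows, cols):
    """Cut a 2-D image (list of pixel rows) into rows x cols equal slices."""
    slice_h, slice_w = len(image) // rows, len(image[0]) // cols
    return [
        [row[c * slice_w:(c + 1) * slice_w]
         for row in image[r * slice_h:(r + 1) * slice_h]]
        for r in range(rows) for c in range(cols)
    ]


def shuffled_order(count, rng):
    """Return a random permutation of range(count) that is not the identity."""
    if count < 2:
        raise ValueError("need at least two slices to change their order")
    identity = list(range(count))
    order = identity[:]
    while order == identity:
        rng.shuffle(order)
    return order


def combine_slices(slices, order, rows, cols):
    """Reassemble slices into one image, placing them in the given order."""
    combined = []
    for r in range(rows):
        row_slices = [slices[order[r * cols + c]] for c in range(cols)]
        for y in range(len(row_slices[0])):
            combined.append([px for s in row_slices for px in s[y]])
    return combined
```

Shuffling slices destroys the global semantic layout while leaving local properties such as blur and noise intact inside each slice, which is presumably why it is used to train the quality feature extractor rather than the semantic one.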
[0238] In one embodiment, each business scenario in at least one business scenario has a corresponding scenario identifier;
[0239] Image acquisition unit 11 is specifically used for:
[0240] Acquire a business video to be subjected to image quality detection, and select a target video frame from the multiple video frames included in the business video; the business video is associated with a target scenario identifier;
[0241] Search the at least one business scenario for a business scenario that matches the target scenario identifier;
[0242] If a target business scenario that matches the target scenario identifier is found, the target video frame is determined as the business image to be subjected to image quality detection.
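The frame selection and scenario lookup may be sketched as below. Choosing the middle frame by default is an assumption, since this application does not state how the target video frame is selected; the `scenarios` mapping from scenario identifier to scenario configuration and the function name are also hypothetical.

```python
def select_business_image(video_frames, target_scenario_id, scenarios,
                          frame_index=None):
    """Pick a target frame and bind it to its business scenario.

    Returns None when the video has no frames or when no configured
    business scenario matches the target scenario identifier.
    """
    if not video_frames:
        return None
    scenario = scenarios.get(target_scenario_id)
    if scenario is None:
        return None  # no matching business scenario: nothing to inspect
    # Assumed default: use the middle frame as the target video frame.
    index = len(video_frames) // 2 if frame_index is None else frame_index
    return {"image": video_frames[index], "scenario": scenario}
```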
[0243] In one embodiment, the image processing apparatus 1 further includes: a business model training unit 18;
[0244] Business model training unit 18 is specifically used for:
[0245] Obtain sample training data for training the initial business model; the sample training data includes a third sample image, an image quality label associated with the third sample image, and a sample standard image associated with the third sample image; the sample standard image is bound to sample image quality data;
[0246] When the sample feature data corresponding to the third sample image and the sample standard feature data corresponding to the sample standard image are obtained, the sample feature data, sample standard feature data and sample image quality data are spliced together to obtain sample multimodal feature data for input into the initial business model.
[0247] Image quality detection is performed on the sample multimodal feature data using the initial business model to obtain the predicted image quality detection result;
[0248] Based on the predicted image quality detection results and image quality labels, the model parameters of the initial business model are fine-tuned to obtain the target business model.
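The overall shape of the fine-tuning loop (predict, compare with the image quality label, adjust parameters) can be shown with a deliberately simplified stand-in. The sketch below replaces the business model with a toy linear scorer trained by stochastic gradient descent on squared error; the real target business model is a multimodal large language model, and the loss function, optimizer and all names here are assumptions.

```python
def predict(weights, bias, features):
    """Toy scorer: a linear function of the sample multimodal feature data."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def mean_squared_error(weights, bias, samples):
    """Average squared gap between predictions and image quality labels."""
    return sum((predict(weights, bias, f) - label) ** 2
               for f, label in samples) / len(samples)


def fine_tune(weights, bias, samples, learning_rate=0.05, epochs=200):
    """Adjust parameters so predictions move toward the quality labels.

    samples is a list of (sample_multimodal_features, image_quality_label).
    """
    weights = list(weights)
    for _ in range(epochs):
        for features, label in samples:
            error = predict(weights, bias, features) - label
            # Gradient step for the squared error of this one sample.
            for i, x in enumerate(features):
                weights[i] -= learning_rate * error * x
            bias -= learning_rate * error
    return weights, bias
```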
[0249] Please refer to Figure 12, which is a schematic diagram of the structure of a computer device provided in an embodiment of this application. For example, the computer device can be the server 100a in Figure 1 above. As shown in Figure 12, the computer device 1000 may include: a processor 1001, a network interface 1004, and a memory 1005. In addition, the computer device 1000 may also include: a user interface 1003, and at least one communication bus 1002. The communication bus 1002 is used to realize the connection and communication between these components. The user interface 1003 may include a display screen and a keyboard. Optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk storage device. Optionally, the memory 1005 may also be at least one storage device located remotely from the aforementioned processor 1001. As shown in Figure 12, the memory 1005, which is a computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.
[0250] In the computer device 1000 shown in Figure 12, the network interface 1004 provides network communication functionality; the user interface 1003 is mainly used to provide an input interface for the user; and the processor 1001 can be used to call the device control application stored in the memory 1005 to execute the image processing method described in any of the corresponding embodiments above, which will not be repeated here. Furthermore, the beneficial effects of using the same method will also not be repeated.
[0251] Furthermore, it should be noted that this application also provides a computer-readable storage medium storing a computer program executed by the image processing apparatus 1 mentioned above. The computer program includes program instructions, which, when executed by the processor, enable the execution of the image processing methods described in the embodiments of Figures 3 and 6 above. Therefore, these descriptions will not be repeated here. Additionally, the beneficial effects of using the same method will also not be repeated. For technical details not disclosed in the embodiments of the computer-readable storage medium involved in this application, please refer to the description of the method embodiments of this application.
[0252] The aforementioned computer-readable storage medium can be an internal storage unit of the image processing apparatus or computer device provided in any of the foregoing embodiments, such as a hard disk or memory of the computer device. The computer-readable storage medium can also be an external storage device of the computer device, such as a plug-in hard disk, smart media card (SMC), secure digital (SD) card, flash card, etc., provided on the computer device. Furthermore, the computer-readable storage medium may include both internal and external storage units of the computer device. The computer-readable storage medium is used to store the computer program and other programs and data required by the computer device. The computer-readable storage medium can also be used to temporarily store data that has been output or will be output.
[0253] Furthermore, it should be noted that this application also provides a computer program product or computer program, which includes computer instructions stored in a computer-readable storage medium. The processor of a computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions, causing the computer device to perform the method provided in any of the preceding corresponding embodiments. Additionally, the beneficial effects of using the same method will not be repeated here. For technical details not disclosed in the embodiments of the computer program product or computer program involved in this application, please refer to the description of the method embodiments of this application.
[0254] In this application embodiment, the terms "module" or "unit" refer to a computer program or part of a computer program that has a predetermined function and works with other related parts to achieve a predetermined goal, and can be implemented wholly or partially using software, hardware (such as processing circuitry or memory), or a combination thereof. Similarly, a processor (or multiple processors or memory) can be used to implement one or more modules or units. Furthermore, each module or unit can be part of an overall module or unit that includes the functionality of that module or unit.
[0255] The terms "first," "second," etc., in the specification, claims, and drawings of this application are used to distinguish different objects, not to describe a specific order. Furthermore, the term "comprising," and any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, apparatus, product, or device that includes a series of steps or units is not limited to the listed steps or units, but may optionally include steps or units not listed, or may optionally include other steps or units inherent to these processes, methods, apparatuses, products, or devices.
[0256] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the various examples have been generally described in terms of functionality in the foregoing description. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this application.
[0257] The above-disclosed embodiments are merely preferred embodiments of this application and should not be construed as limiting the scope of this application. Therefore, any equivalent variations made in accordance with the claims of this application shall still fall within the scope of this application.
Claims
1. An image processing method, characterized in that, The method includes: Acquire a business image to be subjected to image quality detection; the business image is an image under a target business scenario determined in at least one business scenario; each business scenario in the at least one business scenario is configured with a corresponding standard image set; In the standard image set corresponding to the target business scenario, a target standard image that matches the business image is retrieved, and the target image quality data bound to the target standard image is obtained; When the business feature data of the business image and the target standard feature data of the target standard image are obtained, the business feature data, the target standard feature data and the target image quality data are spliced together to obtain multimodal feature data for input into the target business model; When the model text prompt information for the target business model is obtained, the multimodal feature data and the model text prompt information are input into the target business model, and the target business model performs image quality detection on the multimodal feature data according to the model text prompt information to obtain image quality detection data.
2. The method according to claim 1, characterized in that, The step of retrieving a target standard image that matches the business image from the standard image set corresponding to the target business scenario includes: The business image is processed by a feature extractor to obtain the business feature data of the business image; Obtain the standard feature data corresponding to each standard image in the standard image set corresponding to the target business scenario; the standard feature data corresponding to each standard image is obtained by the feature extractor performing feature extraction processing on each standard image; Based on the feature similarity between the business feature data and the standard feature data corresponding to each standard image, the image similarity between each standard image and the business image is determined. From the standard image set corresponding to the target business scenario, find a standard image whose image similarity to the business image meets the similarity condition, and determine the found standard image as the target standard image that matches the business image.
3. The method according to any one of claims 1-2, characterized in that, The target standard image includes M standard images, where M is a positive integer; the target standard feature data includes standard feature data corresponding to each of the M standard images; the target image quality data includes image sharpness data corresponding to each of the M standard images. The step of splicing the business feature data, the target standard feature data, and the target image quality data to obtain multimodal feature data for input into the target business model includes: The standard feature data corresponding to each of the M standard images are subjected to a first splicing process to obtain first spliced feature data; The image sharpness data corresponding to each of the M standard images are subjected to a second splicing process to obtain second spliced feature data; the business modality indicated by the second spliced feature data is different from the business modality indicated by the first spliced feature data; Obtain a splicing function for multimodal splicing processing, and use the splicing function to splice the business feature data, the first spliced feature data, and the second spliced feature data to obtain multimodal feature data for input into the target business model.
4. The method according to any one of claims 1-3, characterized in that, The process involves inputting the multimodal feature data and the model text prompt information into the target business model, and then having the target business model perform image quality detection on the multimodal feature data according to the model text prompt information to obtain image quality detection data, including: Based on the multimodal feature data and the model text prompt information, multimodal model prompt information is constructed for the target business model, and the multimodal model prompt information is input into the target business model; The target business model performs image quality detection on the multimodal feature data according to the multimodal model prompt information to obtain the first quality detection result of the business image; The first quality detection result is determined as the image quality detection data.
5. The method according to any one of claims 1-4, characterized in that, The step of inputting the multimodal feature data and the model text prompt information into the target business model, and having the target business model perform image quality detection on the multimodal feature data according to the model text prompt information to obtain image quality detection data, includes: Obtain the image quality rules for the target business scenario; the image quality rules are configured specifically for the target business scenario. Based on the image quality rules, the multimodal feature data, and the model text prompt information, multimodal model prompt information is constructed for the target business model, and the multimodal model prompt information is input into the target business model; The target business model performs image quality detection on the multimodal feature data according to the multimodal model prompt information to obtain the first quality detection result of the business image; A second quality detection result for the business image is obtained by using the image quality rules and the first quality detection result; Based on the first quality detection result and the second quality detection result, the image quality detection data is determined.
6. The method according to any one of claims 1-5, characterized in that, The image quality rules include N sub-rules, where N is a positive integer; The step of obtaining a second quality detection result for the business image using the image quality rules and the first quality detection result includes: From the N sub-rules, determine the target sub-rule that matches the business image; The target sub-rule is invoked to adjust the first quality detection result, thereby obtaining a second quality detection result for the business image.
7. The method according to any one of claims 1-6, characterized in that, The first quality detection result is used to indicate a first image quality score of the business image, and the second quality detection result is used to indicate a second image quality score of the business image; The step of determining image quality detection data based on the first quality detection result and the second quality detection result includes: Obtain a first scoring weight for the first image quality score and a second scoring weight for the second image quality score; Based on the first scoring weight and the second scoring weight, the first image quality score and the second image quality score are weighted and fused to obtain image quality detection data.
8. The method according to any one of claims 1-7, characterized in that, The business feature data is obtained by performing feature extraction processing on the business image using a feature extractor, which includes a semantic feature extractor and a quality feature extractor. The method further includes: The semantic feature extractor is used to extract semantic features from the business image to obtain the semantic feature data of the business image. The quality feature extractor is used to extract quality features from the business image to obtain the quality feature data of the business image. The semantic feature data and the quality feature data are subjected to feature fusion processing to obtain the business feature data of the business image.
9. The method according to any one of claims 1-8, characterized in that, The method further includes: Obtain the first sample image used to train the initial semantic feature extractor; The first sample image is processed by the initial semantic feature extractor to extract semantic features, thereby obtaining the sample semantic feature data of the first sample image; The first sample image is scaled to obtain a scaled sample image corresponding to the first sample image. The initial semantic feature extractor is used to extract semantic features from the scaled sample image to obtain the scaled semantic feature data of the scaled sample image. Based on the sample semantic feature data and the scaled semantic feature data, the parameters of the initial semantic feature extractor are adjusted, and the initial semantic feature extractor after parameter adjustment is determined as the semantic feature extractor.
10. The method according to any one of claims 1-9, characterized in that, The method further includes: Obtain a second sample image for training the initial quality feature extractor; The initial quality feature extractor is used to extract quality features from the second sample image to obtain sample quality feature data of the second sample image. The second sample image is segmented to obtain multiple image slices corresponding to the second sample image. The multiple image slices are randomly combined to obtain a combined sample image corresponding to the second sample image. The arrangement and display order of the multiple image slices in the combined sample image is different from the arrangement and display order of the multiple image slices in the second sample image. The combined sample image is processed by the initial quality feature extractor to extract quality features, and the combined quality feature data of the combined sample image is obtained. Based on the sample quality feature data and the combined quality feature data, the parameters of the initial quality feature extractor are adjusted, and the initial quality feature extractor after parameter adjustment is determined as the quality feature extractor.
11. The method according to any one of claims 1-10, characterized in that, Each of the at least one business scenario has a corresponding scenario identifier. The step of acquiring the business image to be subjected to image quality detection includes: A business video to be subjected to image quality detection is acquired, and a target video frame is selected from multiple video frames included in the business video; the business video is associated with a target scenario identifier. Find the business scenario that matches the target scenario identifier from the at least one business scenario; If a target business scenario that matches the target scenario identifier is found, the target video frame is determined as the business image to be subjected to image quality detection.
12. The method according to any one of claims 1-11, characterized in that, The method further includes: Obtain sample training data for training the initial business model; the sample training data includes a third sample image, an image quality label associated with the third sample image, and a sample standard image associated with the third sample image; the sample standard image is bound to sample image quality data; When the sample feature data corresponding to the third sample image and the sample standard feature data corresponding to the sample standard image are obtained, the sample feature data, the sample standard feature data and the sample image quality data are spliced together to obtain sample multimodal feature data for inputting the initial business model; Image quality detection is performed on the multimodal feature data of the sample using an initial business model to obtain the predicted image quality detection result; Based on the predicted image quality detection results and the image quality labels, the model parameters of the initial business model are fine-tuned to obtain the target business model.
13. An image processing apparatus, characterized in that, The apparatus includes: An image acquisition unit is used to acquire a business image to be subjected to image quality detection; the business image is an image under a target business scenario determined in at least one business scenario; each business scenario in the at least one business scenario is configured with a corresponding standard image set. The image retrieval unit is used to retrieve a target standard image that matches the business image from the standard image set corresponding to the target business scenario, and to obtain the target image quality data bound to the target standard image. The feature data splicing unit is used to splice the business feature data, the target standard feature data, and the target image quality data when the business feature data of the business image and the target standard feature data of the target standard image are obtained, so as to obtain multimodal feature data for input into the target business model; An image quality detection unit is used to input the multimodal feature data and the model text prompt information into the target business model when model text prompt information for the target business model is obtained, and the target business model performs image quality detection on the multimodal feature data according to the model text prompt information to obtain image quality detection data.
14. A computer device, characterized in that, Including memory and processor; The memory is connected to the processor, the memory is used to store computer programs, and the processor is used to invoke the computer programs so that the computer device performs the method according to any one of claims 1-12.
15. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores a computer program adapted to be loaded and executed by a processor to cause a computer device having the processor to perform the method of any one of claims 1-12.
16. A computer program product, characterized in that, Includes a computer program / instruction that, when executed by a processor, implements the method according to any one of claims 1-12.