An image processing method, apparatus, device and storage medium

By determining the category information of each pixel through a document image segmentation model and calculating document image quality indicators, the problem of complex and time-consuming document image quality assessment is solved, and efficient and reliable document image quality assessment and text recognition are achieved.

CN115937657B · Active · Publication Date: 2026-06-30 · TENCENT TECHNOLOGY (SHENZHEN) CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
TENCENT TECHNOLOGY (SHENZHEN) CO LTD
Filing Date
2021-09-30
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing technologies suffer from low image quality during document image acquisition due to factors such as lighting and occlusion, which affects the accuracy of text recognition. Furthermore, existing quality assessment methods are complex, cumbersome, and time-consuming.

Method used

Semantic segmentation is performed using a document image segmentation model to determine the document category information for each pixel, calculate the document image quality index, and use the image as the target document image when it meets the preset conditions.

Benefits of technology

It simplifies the document image quality assessment process, improves the efficiency and reliability of determining document image quality information, and enhances the accuracy of text recognition.



Abstract

This application discloses an image processing method, apparatus, device, and storage medium. The method includes acquiring an original document image; inputting the original document image into a document image segmentation model for semantic segmentation to obtain document category information corresponding to each pixel in the original document image; calculating a document image quality index corresponding to the original document image based on the document category information; and using the original document image as a target document image when the document image quality index meets a preset quality condition. The technical solution provided by this application can determine the document category information of each pixel in the original document image by combining a document image segmentation model, and then quickly obtain the document image quality index based on the document category information of each pixel in the original document image, which is beneficial for efficiently and reliably determining the quality information of the original document image.

Description

Technical Field

[0001] This application relates to the field of computer technology, specifically to an image processing method, apparatus, device, and storage medium.

Background Technology

[0002] With the continuous development of the internet and information technology, people are increasingly inclined to store information and process business online. This requires submitting captured document images (images containing text information) during business processes. Then, intelligent document recognition technology is used to identify the content of the submitted document images, obtaining relevant business data for electronic storage, or for business processing (such as the identification of document information, address information, and the identification and storage of paper transaction vouchers). However, the effectiveness of intelligent document recognition technology is highly sensitive to the quality of the submitted document images. During the document image acquisition process, various factors such as lighting, occlusion, and movement can lead to low-quality document images, resulting in reduced document recognition accuracy, inaccurate business data, or loss of key information, severely hindering subsequent business processes. Therefore, it is necessary to determine the quality of the captured document images and, based on the quality, whether the next step can be performed.

[0003] Currently, in some insurance claims and underwriting processes, customers often need to upload images of medical or insurance documents such as insurance policies, medical bills, ID cards, and bank cards. When recognizing text information in these documents, image quality analysis is often required to determine images that meet quality requirements, thus ensuring the accuracy of subsequent text recognition. Related technologies typically first determine the sharpness value of each pixel in the corresponding text area of the document image. Then, based on the sharpness value of each pixel, a final sharpness value is determined, and this final sharpness value is used to assess whether the image quality meets the requirements for information extraction in the business scenario. However, this method of quality assessment based on document image sharpness requires calculating the sharpness value of each pixel, which is complex, cumbersome, and time-consuming. Furthermore, the quality analysis heavily relies on the accuracy of the sharpness analysis. Therefore, a more efficient and accurate solution is needed.

Summary of the Invention

[0004] To address the problems of the prior art, this application provides an image processing method, apparatus, device, and storage medium. The technical solution is as follows:

[0005] This application provides an image processing method, the method comprising:

[0006] Obtain the original document image;

[0007] The original document image is input into a document image segmentation model to perform semantic segmentation of the document image, thereby obtaining the document category information corresponding to each pixel in the original document image;

[0008] Based on the document category information, calculate the document image quality index corresponding to the original document image;

[0009] When the document image quality index meets the preset quality conditions, the original document image is used as the target document image.
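
The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: `segment` is a hypothetical stand-in for the document image segmentation model, and both the quality index used here (the share of text pixels classified as the first type of text) and the 0.8 threshold are assumptions chosen only for illustration, since this passage does not fix a formula or a preset quality condition.

```python
from typing import Callable, List, Optional

# Hypothetical class ids; the names are illustrative and not taken from the patent.
HIGH_QUALITY_TEXT = 0  # first type of text
LOW_QUALITY_TEXT = 1   # second type of text
BACKGROUND = 2         # document background

Image = List[List[int]]     # placeholder pixel grid
LabelMap = List[List[int]]  # one class id per pixel


def quality_index(labels: LabelMap) -> float:
    """Assumed index: fraction of text pixels that are high-quality text."""
    high = sum(row.count(HIGH_QUALITY_TEXT) for row in labels)
    low = sum(row.count(LOW_QUALITY_TEXT) for row in labels)
    text = high + low
    # An image with no text pixels is treated as failing the check (assumption).
    return high / text if text else 0.0


def select_target_image(
    image: Image,
    segment: Callable[[Image], LabelMap],
    threshold: float = 0.8,  # assumed preset quality condition
) -> Optional[Image]:
    """Return the image as the target document image if it passes, else None."""
    labels = segment(image)        # per-pixel document category information
    index = quality_index(labels)  # document image quality index
    return image if index >= threshold else None
```

A caller would pass the acquired image together with whatever segmentation function is available, and continue to text recognition only when the return value is not `None`.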

[0010] Another aspect of this application provides an image processing apparatus, the apparatus comprising:

[0011] The raw image acquisition module is used to acquire raw document images;

[0012] The image semantic segmentation module is used to input the original document image into the document image segmentation model to perform document image semantic segmentation, and obtain the document category information corresponding to each pixel in the original document image;

[0013] The quality index calculation module is used to calculate the document image quality index corresponding to the original document image based on the document category information.

[0014] The target document image determination module is used to use the original document image as the target document image when the document image quality index meets the preset quality conditions.

[0015] In another aspect, this application provides an apparatus comprising a processor and a memory, wherein the memory stores at least one instruction or at least one program, the at least one instruction or the at least one program being loaded and executed by the processor to implement the image processing method described above.

[0016] In another aspect, this application provides a computer-readable storage medium storing at least one instruction or at least one program, which is loaded and executed by a processor to implement the image processing method described above.

[0017] In another aspect, this application provides a computer program product, including computer instructions that, when executed by a processor, implement the image processing method described above.

[0018] The image processing method, apparatus, device, and storage medium provided in this application have the following technical effects:

[0019] This application acquires an original document image; inputs the original document image into a document image segmentation model for semantic segmentation to obtain document category information corresponding to each pixel in the original document image; calculates a document image quality index corresponding to the original document image based on the document category information; and uses the original document image as the target document image when the document image quality index meets a preset quality condition. This method can determine the document category information of each pixel in the original document image using a document image segmentation model, and then quickly obtain the document image quality index based on the document category information of each pixel in the original document image. This facilitates convenient and efficient determination of the quality information of the original document image, and the pixel-based approach can very finely distinguish low-quality text, high-quality text, and background parts in the entire document image, thus improving the reliability of determining the quality information of the original document image.

[0020] Additional aspects and advantages of this application will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of this application.

Attached Figure Description

[0021] To more clearly illustrate the technical solutions and advantages in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0022] Figure 1 is a schematic diagram of an application environment provided in an embodiment of this application;

[0023] Figure 2 is a flowchart of an image processing method provided in an embodiment of this application;

[0024] Figure 3 is a schematic diagram of the structure of a document image segmentation model provided in an embodiment of this application;

[0025] Figure 4 is a flowchart of another image processing method provided in the embodiments of this application;

[0026] Figure 5 is a flowchart of another image processing method provided in the embodiments of this application;

[0027] Figure 6 is a flowchart of another image processing method provided in the embodiments of this application;

[0028] Figure 7 is a flowchart of another image processing method provided in the embodiments of this application;

[0029] Figure 8 is a flowchart of another image processing method provided in the embodiments of this application;

[0030] Figure 9 is a flowchart of another image processing method provided in the embodiments of this application;

[0031] Figure 10 is a schematic diagram of an image processing apparatus provided in an embodiment of this application;

[0032] Figure 11 is a hardware structure block diagram of an image processing server provided in an embodiment of this application.

Detailed Implementation

[0033] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of this application. Examples of the embodiments are shown in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout.

[0034] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or server that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or devices.

[0035] Artificial intelligence (AI) is the theory, methods, technology, and application systems that use digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to achieve optimal results. In other words, AI is a comprehensive technology within computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a way similar to human intelligence. AI studies the design principles and implementation methods of various intelligent machines, enabling them to possess the functions of perception, reasoning, and decision-making.

[0036] Artificial intelligence (AI) is a comprehensive discipline encompassing a wide range of fields, including both hardware and software technologies. Fundamental AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operating / interactive systems, and mechatronics. AI software technologies primarily include computer vision, speech processing, natural language processing, as well as machine learning / deep learning, autonomous driving, and intelligent transportation.

[0037] Computer vision (CV) is a science that studies how to enable machines to "see." More specifically, it refers to machine vision, which uses cameras and computers to replace human eyes in tasks such as target recognition, tracking, and measurement, and further performs image processing to create images more suitable for human observation or transmission to instruments. As a scientific discipline, computer vision studies related theories and technologies, attempting to build artificial intelligence systems capable of extracting information from images or multidimensional data. Computer vision technologies typically include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content / behavior recognition, 3D object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping (SLAM), autonomous driving, intelligent transportation, and other technologies, as well as common biometric recognition technologies such as facial recognition and fingerprint recognition.

[0038] The solutions provided in this application involve technologies such as computer vision technology in artificial intelligence, and are specifically illustrated through the following embodiments.

[0039] Figure 1 is a schematic diagram of an application environment provided in an embodiment of this application. As shown in Figure 1, the application environment may include client 01 and server 02.

[0040] In this embodiment, client 01 may run an application that can provide related business services, such as electronic storage of document information and generation of corresponding electronic document information, identification and storage of paper transaction vouchers, and application to subsequent after-sales service processing. Client 01 can acquire the original document image and, if the document image quality index of the original document image meets preset quality conditions, use the original document image as the target document image in subsequent business processing. In an optional embodiment, client 01 can also be used to perform quality analysis on the original document image and calculate the document image quality index corresponding to the original document image. In one embodiment, the aforementioned client 01 may include, but is not limited to, smartphones, tablets, laptops, desktop computers, smart speakers, smartwatches, in-vehicle terminals, smart TVs, etc.

[0041] In this embodiment, server 02 can be a server providing background computing and storage services for the application in client 01; in an optional embodiment, server 02 can also be used to perform quality analysis on the original document image and calculate the document image quality index corresponding to the original document image. In this embodiment, server 02 can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms.

[0042] In addition, it should be noted that Figure 1 is merely a schematic diagram of an application environment provided by an embodiment of this application. The client 01 and server 02 described above can be directly or indirectly connected through wired or wireless communication methods, and this application is not limited thereto.

[0043] Figure 2 is a flowchart illustrating an image processing method provided in an embodiment of this application. This specification provides the operational steps of the method as described in the embodiments or flowcharts, but more or fewer operational steps may be included based on conventional or non-inventive labor. The order of steps listed in the embodiments is merely one possible execution order among many and does not represent the only possible execution order. In actual system or server product execution, the method can be executed sequentially or in parallel (e.g., in a parallel processor or multi-threaded processing environment) as shown in the embodiments or accompanying drawings. Specifically, as shown in Figure 2, the method may include:

[0044] S201: Obtain the original document image.

[0045] In this embodiment, the aforementioned original document image can refer to an electronic image containing text information. Specifically, paper documents can be converted into electronic images in various ways, such as taking a picture of the paper document with an image acquisition device like a camera to obtain a photo containing text information, or scanning the paper document with a scanner to obtain an electronic scanned image. In this embodiment, the aforementioned original document image can include document images acquired in real time using a client's image acquisition device, or document images selected from a local or online database. Using the aforementioned electronic image, the text information contained therein can be recognized, stored online, and applied to various business processes. For example, electronic storage of document information and generation of corresponding electronic document information, recognition and storage of paper transaction vouchers, and application to subsequent after-sales service processing, etc. The information storage time is long, and it can be conveniently and flexibly retrieved when needed.

[0046] In practical applications, when converting paper documents into electronic images using various methods, the resulting images may be affected by various factors (such as lighting conditions, camera shake, and obstruction), leading to low-quality images (e.g., dark images, blurry text, and partially obscured content). This makes it difficult to accurately identify text information in the document image, resulting in the loss of crucial information. Therefore, it is necessary to first determine the quality of the original document image.

[0047] S203: Input the original document image into the document image segmentation model to perform semantic segmentation of the document image, and obtain the document category information corresponding to each pixel in the original document image.

[0048] Specifically, the original document image consists of multiple pixels. After semantic segmentation of the document image by a pre-trained document image segmentation model, the document category information corresponding to each pixel in the original document image can include a first type of text, a second type of text, or a document background. The document category information can be document category information based on text quality. The text quality of the pixel corresponding to the first type of text is higher than that of the pixel corresponding to the second type of text. In this embodiment, the first type of text can indicate that the corresponding pixel belongs to high-quality text, the second type of text can indicate that the corresponding pixel belongs to low-quality text, and the document background can indicate that the corresponding pixel belongs to document background. That is, the pixel is not a text pixel, and each pixel belongs to one of these three categories. In a specific embodiment, the document category information of each pixel can be visually labeled in the original document image, which helps users quickly and clearly perceive the distribution of pixel categories in the text. In an optional embodiment, pixels whose document category information includes the first type of text can be marked in green, pixels whose document category information includes the second type of text can be marked in red, and pixels whose document category information includes the document background can be marked in blue.
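
The color marking described in the optional embodiment above can be sketched as a lookup from category to color. This is only an illustration: the integer class ids, the exact RGB values, and the function name `colorize` are assumptions, and a real overlay would typically be blended with the original image rather than replacing it.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]

# Hypothetical class ids for the three document categories.
FIRST_TYPE_TEXT = 0      # high-quality text
SECOND_TYPE_TEXT = 1     # low-quality text
DOCUMENT_BACKGROUND = 2  # not a text pixel

# Colors follow the optional embodiment: green, red, blue.
CATEGORY_COLORS: Dict[int, RGB] = {
    FIRST_TYPE_TEXT: (0, 255, 0),
    SECOND_TYPE_TEXT: (255, 0, 0),
    DOCUMENT_BACKGROUND: (0, 0, 255),
}


def colorize(labels: List[List[int]]) -> List[List[RGB]]:
    """Map each pixel's category id to its marking color."""
    return [[CATEGORY_COLORS[category] for category in row] for row in labels]
```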

[0049] In an optional embodiment, the document category information may include a first probability corresponding to a first type of text, a second probability corresponding to a second type of text, and a third probability corresponding to the document background. In this embodiment, the first probability may represent the probability that the corresponding pixel is high-quality text, the second probability may represent the probability that the corresponding pixel is low-quality text, and the third probability may represent the probability that the corresponding pixel is document background, i.e., the probability that the pixel is not a text pixel. The document category information of each pixel may include these three probabilities simultaneously, wherein the category corresponding to the highest probability among the three probabilities is the category of the pixel.
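
Selecting the category with the highest of the three probabilities can be sketched as follows. The category names and the helper names `pixel_category` and `label_map` are hypothetical, and resolving ties in favor of the earlier category is an assumption that the text does not address.

```python
from typing import List, Sequence

# Order matches the first, second, and third probabilities in the text.
CATEGORIES = ("first_type_text", "second_type_text", "document_background")


def pixel_category(probs: Sequence[float]) -> str:
    """Return the category whose probability is highest for one pixel."""
    if len(probs) != len(CATEGORIES):
        raise ValueError("expected exactly three probabilities per pixel")
    # max() returns the first maximal index, so ties go to the earlier category.
    best = max(range(len(probs)), key=lambda i: probs[i])
    return CATEGORIES[best]


def label_map(prob_map: List[List[Sequence[float]]]) -> List[List[str]]:
    """Apply pixel_category to every pixel of a 2-D grid of probability triples."""
    return [[pixel_category(p) for p in row] for row in prob_map]
```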

[0050] In this embodiment, the original document image is input into a document image segmentation model for semantic segmentation. Specifically, a pre-trained document image segmentation model is used to classify each pixel in the original document image into one of three categories (first type of text, second type of text, or document background). Since semantic segmentation requires determining the category of each pixel in the image to achieve accurate segmentation, and since document images are text-centric, the document image segmentation model only needs to be trained to distinguish between high-quality text pixels, low-quality text pixels, and document background pixels. This allows the model to determine the document category of each pixel during prediction, and thus to determine the quality of the document image as a whole. For pixels belonging to text, a quality value is not evaluated directly. Instead, in the same pass that separates text from background, the text pixels are divided into high-quality and low-quality categories. This integrates text segmentation and text quality determination into a single task, greatly simplifying the document image quality determination process. There is no need to first determine the text regions in the image and then perform quality analysis on each text region. Only one classification pass with a single model is needed, and the document image quality is determined from the classification results. This achieves end-to-end quality analysis, significantly improving the efficiency and reliability of determining document image quality information.

[0051] In this embodiment, the document image segmentation model described above may include, but is not limited to, ResNet (Deep Residual Network), FCN (Fully Convolutional Networks), U-Net, PSENet (Progressive Scale Expansion Network), etc. In related technologies, text detection requires the separation of text boxes, precise location of text boundaries, and detection of input images of different sizes. However, in this embodiment, the main requirement is to classify the pixels in the original document image into three categories (first type of text, second type of text, or document background) to determine the document image quality. Precise text region segmentation is not required. Therefore, the model structure used for text detection can be simplified (e.g., by removing the multi-scale feature fusion process) to obtain the document image segmentation model in this embodiment. In an optional embodiment, referring to Figure 3, the document image segmentation model can include convolutional pooling components and fully connected layers. The convolutional pooling components are used to extract features from the original document image, while the fully connected layers are used to determine the document category information for each pixel. This simplifies the complex model structure while ensuring good segmentation results, avoiding redundant computation, improving the efficiency of determining document image quality information, and reducing the training difficulty of the document image segmentation model.

[0052] In this embodiment, the document image segmentation model is trained beforehand for semantic segmentation using sample document images labeled with document category tags. Each document category tag includes a first type of text, a second type of text, or a document background. The method may also include the training process of the document image segmentation model. Referring to Figure 4, the training process of the document image segmentation model may include:

[0053] S401: Obtain multiple sample document images labeled with document category tags.

[0054] In this embodiment, the sample document images described above can also refer to electronic images containing text information. The document category label annotated for each sample document image can include a first type of text, a second type of text, or a document background; the text quality of the sample document image corresponding to the first type of text is higher than the text quality of the sample document image corresponding to the second type of text. In this embodiment, the first type of text can indicate that the corresponding sample document image belongs to the category of high-quality text, the second type of text can indicate that the corresponding sample document image belongs to the category of low-quality text, and the document background can indicate that the corresponding sample document image belongs to the category of document background, meaning that the sample document image does not contain text information, and each sample document image belongs to one of these three categories. In this embodiment, a large number of sample document images labeled with document category tags can be obtained to fully train the document image segmentation model described above; for example, the number of sample document images can be 50 million, which is beneficial to improving the reliability of model training.

[0055] In an optional embodiment, the sample document images labeled with document category tags can be determined based on text content recognition results from the sample original document images and actual text information. Referring to Figure 5, the acquisition of multiple sample document images labeled with document category tags can include:

[0056] S501: Obtain multiple sample original document images.

[0057] In the embodiments of this application, the above-mentioned sample original document image can indicate an electronic image containing text information in a real scene.

[0058] S503: Input the above multiple sample original document images into the preset text content recognition model to perform text content recognition, and obtain multiple text recognition regions labeled with content recognition results.

[0059] In this embodiment, the aforementioned preset text content recognition model can be obtained by pre-training on a large number of sample text images labeled with text content information. The aforementioned text recognition region can be an image region containing text information recognized by the aforementioned preset text content recognition model. In an optional embodiment, the aforementioned image region can be a text line in the original sample document image. The labeled content recognition result is the text content information corresponding to the text recognition region.

[0060] S505: Determine the true text information for each text recognition region.

[0061] Specifically, the aforementioned real text information can indicate the actual text content information in the corresponding text recognition area. In one optional embodiment, the context information of each text recognition area (e.g., the content recognition result corresponding to the text recognition area within a preset range of the text recognition area) can be obtained to determine the real text information of each text recognition area. In another optional embodiment, the real text information of each text recognition area can be obtained and stored in advance in response to a text information instruction. In this case, the real text information of each text recognition area can be input by the developer, and this application is not limited thereto.

[0062] S507: Based on the real text information and content recognition results of each of the above text recognition regions, determine the recognition confidence result corresponding to each of the above text recognition regions.

[0063] In a specific embodiment, for each of the above text recognition regions, when the corresponding content recognition result is consistent with the real text information, the corresponding recognition confidence result indicates that the recognition is correct; when the corresponding content recognition result is inconsistent with the real text information, the corresponding recognition confidence result indicates that the recognition is incorrect.

[0064] S509: The text recognition regions that are correctly identified by the above recognition confidence results are used as sample document images labeled as the first type of text, and the text recognition regions that are incorrectly identified by the above recognition confidence results are used as sample document images labeled as the second type of text.

[0065] In this embodiment, the accuracy of the text content recognition model's recognition result depends on the quality of the region it recognizes (e.g., lighting conditions, whether it is blurry, whether there is partial occlusion, etc.). If the quality of the region it recognizes is high, the text content recognition model is more reliable; if the quality of the region it recognizes is low, the probability of the text content recognition model making a mistake increases significantly. Therefore, when the recognition confidence result of a text recognition region indicates that the recognition is correct, the text recognition region is labeled as first-class text (high-quality text); when the recognition confidence result of a text recognition region indicates that the recognition is incorrect, the text recognition region is labeled as second-class text (low-quality text). For example, when the real text information corresponding to a text recognition region is "globulin," but the model's recognition result is "globulin day," the corresponding recognition confidence result indicates that the recognition is incorrect, and the text recognition region is labeled as second-class text (low-quality text).
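
Steps S507 and S509 can be sketched as a comparison between each region's content recognition result and its real text information. The `TextRegion` structure, its field names, and the label strings are hypothetical, and exact string equality is assumed as the consistency test, since the text does not specify any normalization.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TextRegion:
    box: Tuple[int, int, int, int]  # hypothetical (left, top, right, bottom)
    recognized: str                 # content recognition result
    ground_truth: str               # real text information


def label_region(region: TextRegion) -> str:
    """Correctly recognized regions become first-type (high-quality) samples."""
    if region.recognized == region.ground_truth:
        return "first_type_text"
    return "second_type_text"


def label_regions(regions: List[TextRegion]) -> List[Tuple[Tuple[int, int, int, int], str]]:
    """Pair each region's box with its derived document category label."""
    return [(region.box, label_region(region)) for region in regions]
```

With the example from the text, a region whose real text is "globulin" but whose recognition result is "globulin day" is labeled as the second type of text.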

[0066] Using the confidence level of the text content recognition model's recognition results to determine the samples corresponding to high-quality text and low-quality text is scientific, objective, and conducive to obtaining reliable training data.
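As a non-limiting illustration, the labeling rule of steps S507 to S509 can be sketched as follows. The `TextRegion` structure, its field names, and the tag strings are hypothetical names introduced only for this sketch; the source does not prescribe a data layout, and exact string equality is assumed here as the meaning of "consistent with the real text information".

```python
from dataclasses import dataclass

# Hypothetical tag names for the two text categories.
FIRST_TYPE_TEXT = "first_type_text"    # high-quality text
SECOND_TYPE_TEXT = "second_type_text"  # low-quality text


@dataclass
class TextRegion:
    box: tuple         # (x0, y0, x1, y1); illustrative layout only
    recognized: str    # content recognition result from the preset model
    ground_truth: str  # real text information of the region


def label_region(region: TextRegion) -> str:
    """Return the document category tag for one text recognition region."""
    # Assumption: "consistent" is interpreted as exact string equality.
    recognized_correctly = region.recognized == region.ground_truth
    return FIRST_TYPE_TEXT if recognized_correctly else SECOND_TYPE_TEXT


regions = [
    TextRegion((0, 0, 40, 10), "globulin", "globulin"),
    TextRegion((0, 20, 40, 30), "globulin day", "globulin"),
]
labels = [label_region(r) for r in regions]
```

The second region reproduces the "globulin" versus "globulin day" example above and therefore receives the second-type (low-quality) tag.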

[0067] S511: Based on the above multiple text recognition regions labeled with content recognition results, determine the document background regions in the above multiple sample original document images.

[0068] Specifically, based on the multiple text recognition regions labeled with content recognition results, the unlabeled regions in the multiple sample original document images can be determined as the document background regions.

[0069] S513: Use the above document background regions as sample document images labeled as the document background.

[0070] In this embodiment, the text content of the sample original document images is recognized by a preset text content recognition model. The confidence of the text content recognition results is used to determine the samples corresponding to high-quality text and the samples corresponding to low-quality text. Furthermore, the unlabeled regions in the sample original document images are determined as document background regions based on the text content recognition results. This approach is objective, which is conducive to obtaining reliable training data and makes the trained document image segmentation model more reliable.
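One possible way to turn the labeled regions into per-pixel training labels is sketched below. Every pixel not covered by a labeled text recognition region is treated as document background, in line with steps S511 to S513. The integer class codes, the axis-aligned rectangular boxes with exclusive right and bottom edges, and the function name are assumptions made for illustration only.

```python
# Hypothetical integer codes for the three document category tags.
BACKGROUND, FIRST_TYPE, SECOND_TYPE = 0, 1, 2


def build_label_map(height, width, labeled_regions):
    """Build a per-pixel label map.

    labeled_regions: iterable of ((x0, y0, x1, y1), label) pairs, where
    x1 and y1 are exclusive. Pixels outside every region stay BACKGROUND.
    """
    label_map = [[BACKGROUND] * width for _ in range(height)]
    for (x0, y0, x1, y1), label in labeled_regions:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                label_map[y][x] = label
    return label_map


label_map = build_label_map(
    4, 6,
    [((0, 0, 3, 2), FIRST_TYPE), ((3, 2, 6, 4), SECOND_TYPE)],
)
background_pixels = sum(row.count(BACKGROUND) for row in label_map)
```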

[0071] In an optional embodiment, after identifying the text recognition region, the category level of pixels at the boundary of the text recognition region can be labeled according to the text recognition region. This category level can indicate the tendency of the pixel towards the first type of text and the second type of text. For example, there are 10 category levels from 1 to 10, where 1 represents the second type of text and 10 represents the first type of text. The larger the value, the more the pixel's category tends to be the first type of text. Using such continuous category levels to label pixels at the boundary of the text recognition region helps to make the document image segmentation model more robust.

[0072] S403: Based on the above-mentioned sample document images labeled with document category tags, train the preset neural network model for document image semantic segmentation. During the training of document image semantic segmentation, adjust the model parameters of the preset neural network model until the preset neural network model meets the preset convergence condition, and obtain the document image segmentation model.

[0073] In this embodiment, the aforementioned preset neural network model may include, but is not limited to, model structures such as ResNet (deep residual network), FCN (Fully Convolutional Network), U-Net, and PSENet (Progressive Scale Expansion Network). Furthermore, the initial model parameters of the preset neural network model can use the model parameters of a text detection model, improving the efficiency of model training. In an optional embodiment, gradient descent can be used for iterative training of the model. The learning rate of the SGD (stochastic gradient descent) optimizer and the weight decay applied to all layers can be set according to the actual application requirements.

[0074] In an optional embodiment, the aforementioned preset convergence condition may include the loss value corresponding to the target loss function being less than or equal to a preset threshold. Specifically, the aforementioned target loss function can be determined according to actual application requirements. In one embodiment, to more effectively control the weights of easily classifiable samples (text versus background) and difficult-to-classify samples (first type of text versus second type of text), and to enhance the classification effect for the first type of text and the second type of text in the document image, the aforementioned target loss function can be Focal loss. Specifically, the formula for Focal loss is given in Formula 1, where $p_t$ is calculated from Formula 2:

[0075] $FL(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$

[0076] Formula 1

[0077] Here, $FL(p_t)$ represents the loss value corresponding to a pixel. In one embodiment, the loss value corresponding to all pixels in a sample document image is the loss value corresponding to the target loss function. $p_t$ represents the accuracy of the classification: the smaller $p_t$ is, the lower the accuracy of the classification. $(1 - p_t)^{\gamma}$ represents the modulation factor, and $\gamma$ represents the focusing parameter, which can be set according to actual training needs. $\alpha_t$ represents the modulation coefficient, whose value can be determined based on actual training needs. When a sample is misclassified, $p_t$ is very small, so the modulation factor $(1 - p_t)^{\gamma}$ is close to 1 and the loss value is essentially unaffected, that is, it changes little relative to the original cross-entropy value. When $p_t$ approaches 1 (the sample is correctly classified and easy to classify), the modulation factor approaches 0, so the weight of the well-classified sample is lowered, its loss value is very small, and its contribution to the total loss value is minimal. When $\gamma = 0$, Focal loss reduces to the traditional cross-entropy loss; as $\gamma$ increases, the influence of the modulation factor also increases. Adjusting $\gamma$ smoothly regulates the rate at which easily classified samples are down-weighted; in an optional embodiment, $\gamma = 2$. The modulation factor reduces the loss contribution of easily classified samples and broadens the range of $p_t$ over which a sample receives a low loss. For example, with $\gamma = 2$, the loss value of an easily classified sample (e.g., $p_t = 0.9$) is about one hundred times smaller than the standard cross-entropy loss value, whereas for a hard-to-classify sample (e.g., $p_t \le 0.5$) the loss value is reduced by at most a factor of four. This increases the relative weight of hard-to-classify samples and thus amplifies the importance of misclassified samples. Therefore, using Focal loss as the loss function reduces the weight of easily classified samples, allowing the model to focus more on hard-to-classify samples during training, improving the reliability of model training and, consequently, the reliability of the resulting document image segmentation model.
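The numerical behaviour described for Focal loss can be checked with a short sketch. It evaluates Formula 1 for a single pixel and compares it with plain cross-entropy. Setting $\alpha_t = 1$ is an assumption made so that only the modulation factor is visible; the function names are illustrative.

```python
import math


def cross_entropy(p_t: float) -> float:
    """Standard cross-entropy loss for a pixel whose true-class probability is p_t."""
    return -math.log(p_t)


def focal_loss(p_t: float, gamma: float = 2.0, alpha_t: float = 1.0) -> float:
    """Formula 1: FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    alpha_t defaults to 1.0 here purely for illustration.
    """
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Ratio of cross-entropy to Focal loss with gamma = 2:
# the reduction equals 1 / (1 - p_t)**2.
easy_reduction = cross_entropy(0.9) / focal_loss(0.9)  # easy sample, about 100x
hard_reduction = cross_entropy(0.5) / focal_loss(0.5)  # hard sample, about 4x
```

With `gamma=0` the modulation factor equals 1 and `focal_loss` coincides with `cross_entropy`, which matches the statement that Focal loss reduces to cross-entropy in that case.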

[0078] S205: Based on the above document category information, calculate the document image quality index corresponding to the above original document image.

[0079] In this embodiment, the document image quality index can indicate the overall text quality information of the original document image. In one embodiment, the larger the document image quality index, the higher the overall text quality of the original document image, and thus the higher the probability of obtaining accurate recognition results (e.g., business data) by performing content recognition on the original document image. The document image quality index corresponding to the original document image can be calculated efficiently and reliably based on the document category information of all pixels in the original document image.

[0080] In an optional embodiment, the document category information includes a first type of text, a second type of text, or a document background, wherein the text quality of the pixel corresponding to the first type of text is higher than the text quality of the pixel corresponding to the second type of text. In this embodiment, the first type of text may indicate that the corresponding pixel is high-quality text, the second type of text may indicate that the corresponding pixel is low-quality text, and the document background may indicate that the corresponding pixel is a document background, meaning that the pixel is not a text pixel, and each pixel belongs to one of these three categories.

[0081] Referring to Figure 6, the above-mentioned calculation of the document image quality index corresponding to the original document image based on the document category information may include:

[0082] S601: Based on the above document category information, determine the number of first pixels corresponding to the first type of text and the number of second pixels corresponding to the second type of text.

[0083] In this embodiment of the application, the first pixel count can indicate the total number of pixels in the original document image that are classified as high-quality text, and the second pixel count can indicate the total number of pixels in the original document image that are classified as low-quality text.

[0084] S603: Calculate the pixel ratio information corresponding to the first type of text based on the first pixel count and the second pixel count.

[0085] In a specific embodiment, the pixel ratio information corresponding to the first type of text can be calculated using the following formula 2:

[0086]

[0087] S605: Use the above pixel ratio information as the above document image quality index.

[0088] In this embodiment of the application, by determining the total number of pixels belonging to the category of high-quality text and the total number of pixels belonging to the category of low-quality text based on the document category information of each pixel in the original document image, the pixel ratio information corresponding to high-quality text can be quickly determined as a document image quality indicator corresponding to the original document image, which is beneficial for efficiently and reliably determining the document image quality corresponding to the original document image.
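Steps S601 to S605 can be sketched as follows. Because Formula 2 is not reproduced in the text, the sketch assumes that the pixel ratio is the first pixel count divided by the sum of the first and second pixel counts; this is one plausible reading, not a confirmed form of the formula. The integer class codes and the handling of an image with no text pixels are likewise assumptions.

```python
def pixel_ratio_quality_index(label_map, first=1, second=2):
    """Share of high-quality text pixels among all text pixels.

    label_map: 2D list of per-pixel class codes (0 is assumed to be background).
    Assumed form of Formula 2: first_count / (first_count + second_count).
    """
    first_count = sum(row.count(first) for row in label_map)    # S601
    second_count = sum(row.count(second) for row in label_map)  # S601
    text_pixels = first_count + second_count
    if text_pixels == 0:
        # Assumption: an image without any text pixels gets the lowest index.
        return 0.0
    return first_count / text_pixels                            # S603, S605


example_map = [
    [1, 1, 0],
    [2, 0, 1],
]
quality_index = pixel_ratio_quality_index(example_map)
```

In the example, three pixels are high-quality text and one is low-quality text, so the index is 0.75.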

[0089] In another optional embodiment, the document category information may include a first probability corresponding to a first type of text, a second probability corresponding to a second type of text, and a third probability corresponding to the document background. In this embodiment, the first probability may represent the probability that the corresponding pixel is high-quality text, the second probability may represent the probability that the corresponding pixel is low-quality text, and the third probability may represent the probability that the corresponding pixel is document background, i.e., the probability that the pixel is not a text pixel. The document category information of each pixel may include these three probabilities simultaneously, wherein the category corresponding to the highest probability among the three probabilities is the category of the pixel.
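A minimal sketch of selecting a pixel's category from its three probabilities is given below. The category names and the tie-breaking behaviour (the earlier category wins on equal probabilities) are assumptions; the source only states that the category with the highest probability is the category of the pixel.

```python
# Hypothetical category names, ordered as (first, second, third) probability.
CATEGORIES = ("first_type_text", "second_type_text", "document_background")


def pixel_category(probabilities):
    """Return the category whose probability is highest for one pixel.

    probabilities: (first, second, third) probabilities for the pixel.
    On ties, max() keeps the earliest index; this is an assumption.
    """
    best_index = max(range(len(CATEGORIES)), key=lambda k: probabilities[k])
    return CATEGORIES[best_index]


text_pixel = pixel_category((0.7, 0.2, 0.1))
background_pixel = pixel_category((0.1, 0.3, 0.6))
```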

[0090] Referring to Figure 7, the above-mentioned calculation of the document image quality index corresponding to the original document image based on the document category information may include:

[0091] S701: Obtain the first weight corresponding to the first type of text, the second weight corresponding to the second type of text, and the third weight corresponding to the document background.

[0092] In this embodiment, the first weight can indicate the degree of influence of the category "first type of text" (i.e., high-quality text) on the pixel text quality, the second weight can indicate the degree of influence of the category "second type of text" (i.e., low-quality text) on the pixel text quality, and the third weight can indicate the degree of influence of the category "document background" on the pixel text quality. The specific values of the first, second, and third weights can be set in combination with the actual quality test results and application requirements.

[0093] S703: Based on the first probability, second probability, and third probability corresponding to each pixel, and the first weight, second weight, and third weight, a weighted sum is performed to obtain the quality weighted value corresponding to each pixel.

[0094] Specifically, the aforementioned quality weighted value can indicate the text quality information of the corresponding pixel. In a specific embodiment, the larger the quality weighted value, the higher the text quality of the corresponding pixel. The weighted summation is given by Formula 3:

[0095] $Q_i = a \cdot P_{i1} + b \cdot P_{i2} + c \cdot P_{i3}$

[0096] Formula 3

[0097] Here, $Q_i$ represents the quality weighted value corresponding to pixel $i$ in the above original document image, $a$ represents the first weight, $b$ represents the second weight, $c$ represents the third weight, $P_{i1}$ represents the first probability corresponding to pixel $i$, $P_{i2}$ represents the second probability corresponding to pixel $i$, and $P_{i3}$ represents the third probability corresponding to pixel $i$.

[0098] S705: Based on the above quality weighting values, determine the document image quality index corresponding to the above original document image.

[0099] In one specific embodiment, determining the document image quality index corresponding to the original document image based on the aforementioned quality weighting value may include: determining the number of pixels in the original document image; calculating the quality-weighted average value corresponding to the original document image based on the quality weighting value and the number of pixels in the original document image; and using the aforementioned quality-weighted average value as the document image quality index.

[0100] By separately obtaining the weights corresponding to each category and performing a weighted summation of the probabilities of each category for each pixel in the original document image, the quality weighted value of each pixel in the original document image is obtained. Then, the quality weighted values of all pixels in the original document image are combined to determine the document image quality index. This approach focuses not only on the quality of the text region in the document image but also on the quality of each pixel in the document. It can better focus on the quality of the boundary buffer area between text and background, which helps to improve the reliability of the document image quality determination.
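Steps S701 to S705 can be sketched as follows: Formula 3 is applied to each pixel and the resulting quality weighted values are averaged over all pixels, as in the embodiment that uses the quality-weighted average as the index. The weight values (1.0, 0.0, 0.5) are hypothetical placeholders; the source leaves them to quality test results and application requirements.

```python
def quality_weighted_value(probabilities, weights):
    """Formula 3 for one pixel: Q_i = a*P_i1 + b*P_i2 + c*P_i3."""
    return sum(w * p for w, p in zip(weights, probabilities))


def weighted_quality_index(probability_map, weights):
    """Average of the per-pixel quality weighted values over the whole image."""
    values = [
        quality_weighted_value(pixel, weights)
        for row in probability_map
        for pixel in row
    ]
    if not values:
        raise ValueError("the image contains no pixels")
    return sum(values) / len(values)


# Hypothetical weights (a, b, c); not taken from the source.
weights = (1.0, 0.0, 0.5)
probability_map = [
    [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(0.0, 0.0, 1.0), (0.5, 0.5, 0.0)],
]
index = weighted_quality_index(probability_map, weights)
```

With these placeholder weights the four pixels score 1.0, 0.0, 0.5 and 0.5, so the index is 0.5.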

[0101] S207: When the above document image quality index meets the preset quality condition, the above original document image is used as the target document image.

[0102] In one specific embodiment, the preset quality condition may include the document image quality index being greater than or equal to a preset threshold, wherein the preset threshold can be set according to actual application requirements. The target document image is a document image whose document image quality index meets the preset quality condition and which can be used in subsequent text recognition, business processing, and other processes.

[0103] By calculating the document image quality index corresponding to the original document image, when the document image quality index meets the preset quality conditions, the original document image is used as the target document image. This helps to determine the quality of the original document image in a timely and reliable manner, thereby improving the recognition accuracy when the original document image needs to be recognized, promoting the smooth progress of subsequent business processes, and obtaining key information in a timely and accurate manner.

[0104] In one specific embodiment, referring to Figure 8, the above method may also include:

[0105] S801: When the above document image quality index does not meet the above preset quality condition, obtain an updated document image.

[0106] In one specific embodiment, the updated document image may include a received, re-acquired document image. An updated document image may be acquired when the document image quality index does not meet the preset quality condition, i.e., when the document image quality index is less than a preset threshold. In an optional embodiment, when the document image quality index does not meet the preset quality condition, a document image update prompt message may be generated and displayed on the display unit.

[0107] S803: Using the updated document image as the original document image, perform the step of inputting the original document image into the document image segmentation model for document image semantic segmentation, and calculating the document image quality index corresponding to the original document image based on the document category information, until the document image quality index meets the preset quality conditions.

[0108] Specifically, the process of inputting the original document image into the document image segmentation model for semantic segmentation, and calculating the document image quality index corresponding to the original document image based on the document category information, is similar to steps S203 to S205. Please refer to the relevant descriptions of steps S203 to S205; they will not be repeated here. If the requirements are still not met, the process continues to acquire updated document images as the original document image and execute the above steps until the document image quality index meets the preset quality conditions. At this point, the original document image is used as the target document image.

[0109] When the document image quality index does not meet the preset quality condition, an updated document image is acquired and its document image quality index is evaluated again, and this continues until the document image quality index meets the preset quality condition. In this way, a document image whose quality does not meet the condition can be promptly rejected and a re-acquisition requested, which avoids the loss of document information and facilitates the rapid acquisition of reliable document images.

[0110] In one specific embodiment, referring to Figure 9, the above method may also include:
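The re-acquisition flow of steps S801 to S803 can be sketched as a simple loop. The callables `compute_quality_index` and `request_new_image`, and the `max_attempts` bound, are hypothetical; the source repeats the steps until the condition is met and does not specify an upper limit, which is added here only so that the sketch always terminates.

```python
def acquire_target_image(first_image, compute_quality_index, request_new_image,
                         threshold, max_attempts=5):
    """Return the first image whose quality index reaches the threshold.

    Returns None if max_attempts images were evaluated without success
    (the attempt limit is an assumption of this sketch).
    """
    image = first_image
    for attempt in range(max_attempts):
        if compute_quality_index(image) >= threshold:
            return image  # used as the target document image
        if attempt + 1 < max_attempts:
            image = request_new_image()  # S801: obtain an updated image
    return None


# Illustrative stand-ins: each "image" carries a precomputed quality value.
uploads = iter([
    {"name": "retake_1", "quality": 0.55},
    {"name": "retake_2", "quality": 0.92},
])
target = acquire_target_image(
    {"name": "first_upload", "quality": 0.40},
    compute_quality_index=lambda img: img["quality"],
    request_new_image=lambda: next(uploads),
    threshold=0.8,
)
```

In this example the first two images are rejected and the third becomes the target document image.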

[0111] S901: Input the above target document image into the text content recognition model to perform text content recognition and obtain business content data.

[0112] Specifically, the aforementioned business content data can indicate text information in the target document image used for target business processing. For example, the aforementioned business content data may include, but is not limited to, identification numbers, address information, transaction time in transaction vouchers, transaction order numbers, etc.

[0113] S903: Store the above business content data locally.

[0114] In this embodiment, the recognized business content data can be stored locally, enabling electronic storage of easily lost information such as information on paper, long-term storage of business content data, and convenient subsequent data retrieval. In some embodiments, the aforementioned business content data can also be directly applied to subsequent business processing to achieve automated business processing, such as automatically processing after-sales procedures after a photographed transaction voucher is uploaded. This application is not limited in this respect.

[0115] After the document image quality indicators meet the preset quality conditions, the target document image is input into the text content recognition model for text content recognition. The resulting business content data can be stored locally or applied to business processes. This allows for the use of high-quality document images to obtain more reliable business content data, thereby improving the efficiency and reliability of subsequent business processing.
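Steps S901 to S903 can be illustrated with the sketch below, which passes a target document image to a recognizer and stores the result in a local JSON file. The recognizer is a stub returning placeholder values, and the JSON file format, the field names, and the function names are assumptions; the source does not specify how the business content data is stored.

```python
import json
import os
import tempfile


def recognize_and_store(target_image, recognize_text, storage_path):
    """Run text content recognition (S901) and store the result locally (S903)."""
    business_content_data = recognize_text(target_image)
    with open(storage_path, "w", encoding="utf-8") as handle:
        json.dump(business_content_data, handle, ensure_ascii=False, indent=2)
    return business_content_data


def stub_recognizer(image):
    """Stand-in for the text content recognition model; placeholder values only."""
    return {
        "transaction_order_number": "PLACEHOLDER",
        "transaction_time": "PLACEHOLDER",
    }


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "business_content.json")
    stored = recognize_and_store("target_document.png", stub_recognizer, path)
    with open(path, encoding="utf-8") as handle:
        reloaded = json.load(handle)
```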

[0116] In this embodiment, the acquired original document image is input into a pre-trained document image segmentation model for semantic segmentation, obtaining document category information corresponding to each pixel in the original document image. Each pixel in the original document image is thereby classified into one of three classes (first type of text, second type of text, or document background). Since semantic segmentation requires determining the category of each pixel in the image for accurate segmentation, pixels belonging to text are not directly assigned quality values. Instead, they are classified into high-quality and low-quality categories together with the background segmentation. This integrates text segmentation and text quality information determination into a single task, significantly simplifying the process of determining document image quality. Only one model is needed for classification, and the document image quality is determined based on the classification results, achieving end-to-end quality analysis and greatly improving the efficiency and reliability of determining document image quality information. By separately obtaining the weights corresponding to each category and performing a weighted summation of the probabilities of each category for each pixel in the original document image, a quality weighted value for each pixel is obtained. The document image quality index is then determined by combining the quality weighted values of all pixels in the original document image. This approach focuses not only on the quality of the text region but also on the quality of each pixel in the document image, allowing for better attention to the quality of the text-background boundary buffer area, thus improving the reliability of document image quality determination. After the document image quality index meets the preset quality condition, the target document image is input into a text content recognition model for text content recognition. The resulting business content data can be stored locally or applied to business processes. High-quality document images can be used to obtain more reliable business content data, thereby improving the efficiency and reliability of subsequent business processing.

[0117] In a specific embodiment, taking insurance claims and underwriting as an example, where customers need to upload images of medical or insurance documents such as insurance policies, medical bills, ID cards, and bank cards to meet the requirements for subsequent text information recognition, after obtaining the business document images (original document images) uploaded by the user, the business document images can be input into a document image segmentation model for semantic segmentation to obtain document category information corresponding to each pixel in the business document image. Based on the identified document category information, the document image quality index corresponding to the business document image is calculated. When the document image quality index meets the preset quality conditions, the business document image is used as the target document image, and then the target document image can be input into a text content recognition model for text content recognition to obtain the business content data required by insurance claims and underwriting agents. When the document image quality index does not meet the above preset quality conditions, an updated document image (i.e., the business document image re-uploaded by the user) is obtained. Based on the re-uploaded business document image, the above-mentioned quality analysis operations such as document image semantic segmentation and document image quality index calculation are performed until the document image quality index of the re-uploaded business document image meets the preset quality conditions.

[0118] In the above embodiments, the document category information of each pixel in the business document image can be determined by combining the document image segmentation model. Then, based on the document category information divided according to document quality, the document image quality index can be obtained quickly. This is beneficial for conveniently and efficiently determining the quality information of the original document image. Moreover, based on pixels, it can very finely distinguish low-quality text, high-quality text and background parts in the entire document image, which is beneficial for improving the reliability and accuracy of determining the quality information of the original document image.

[0119] This application also provides an embodiment of an image processing apparatus. As shown in Figure 10, the apparatus may include:

[0120] The original image acquisition module 1010 is used to acquire the original document image;

[0121] The image semantic segmentation module 1020 is used to input the original document image into the document image segmentation model to perform document image semantic segmentation, and obtain the document category information corresponding to each pixel in the original document image;

[0122] The quality index calculation module 1030 is used to calculate the document image quality index corresponding to the original document image based on the document category information.

[0123] The target document image determination module 1040 is used to use the original document image as the target document image when the document image quality index meets the preset quality conditions.
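The cooperation of modules 1010 to 1040 can be sketched as a small class in which each module is supplied as a callable. The class name, the constructor parameters, and the stub callables in the usage example are hypothetical and serve only to show the data flow between the four modules.

```python
class ImageProcessingApparatus:
    """Illustrative composition of modules 1010, 1020, 1030 and 1040."""

    def __init__(self, acquire_image, segment, compute_index, threshold):
        self.acquire_image = acquire_image  # original image acquisition module 1010
        self.segment = segment              # image semantic segmentation module 1020
        self.compute_index = compute_index  # quality index calculation module 1030
        self.threshold = threshold          # preset threshold used by module 1040

    def determine_target(self):
        """Return the original image as the target image, or None if rejected."""
        image = self.acquire_image()
        category_info = self.segment(image)
        index = self.compute_index(category_info)
        # Target document image determination module 1040.
        return image if index >= self.threshold else None


def ratio_index(label_map):
    """Stub index: share of class-1 pixels among class-1 and class-2 pixels."""
    first = sum(row.count(1) for row in label_map)
    second = sum(row.count(2) for row in label_map)
    return first / max(1, first + second)


apparatus = ImageProcessingApparatus(
    acquire_image=lambda: "doc.png",
    segment=lambda image: [[1, 1, 2, 0]],  # stub segmentation output
    compute_index=ratio_index,
    threshold=0.6,
)
accepted = apparatus.determine_target()
```

Here the stub index is 2/3, which exceeds the threshold of 0.6, so the image is accepted.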

[0124] In one embodiment, the document category information includes a first type of text, a second type of text, or a document background, wherein the text quality of pixels corresponding to the first type of text is higher than the text quality of pixels corresponding to the second type of text; the quality index calculation module 1030 may include:

[0125] A pixel count determination unit is used to determine, based on the document category information, the first pixel count corresponding to the first type of text and the second pixel count corresponding to the second type of text;

[0126] The pixel proportion information calculation unit is used to calculate the pixel proportion information corresponding to the first type of text based on the first pixel quantity and the second pixel quantity;

[0127] The first indicator determination unit is used to use the pixel ratio information as the document image quality indicator.

[0128] In another embodiment, the document category information includes a first probability corresponding to a first type of text, a second probability corresponding to a second type of text, and a third probability corresponding to the document background, wherein the text quality of the pixels corresponding to the first type of text is higher than the text quality of the pixels corresponding to the second type of text; the quality index calculation module 1030 may include:

[0129] The weight acquisition unit is used to acquire the first weight corresponding to the first type of text, the second weight corresponding to the second type of text, and the third weight corresponding to the document background, respectively.

[0130] The weighting unit is used to perform a weighted summation based on the first probability, second probability, and third probability corresponding to each pixel, as well as the first weight, second weight, and third weight, to obtain the quality weighted value corresponding to each pixel;

[0131] The second indicator determination unit is used to determine the document image quality indicator corresponding to the original document image based on the quality weighting value.

[0132] In one embodiment, the above-mentioned apparatus may further include:

[0133] The sample document image acquisition module is used to acquire multiple sample document images labeled with document category tags;

[0134] The model training module is used to train a preset neural network model for document image semantic segmentation based on the multiple sample document images labeled with document category tags. During the training of document image semantic segmentation, the model parameters of the preset neural network model are adjusted until the preset neural network model meets the preset convergence condition, thereby obtaining the document image segmentation model.

[0135] In one embodiment, each document category tag includes a first type of text, a second type of text, or a document background; the text quality of the sample document image corresponding to the first type of text is higher than the text quality of the sample document image corresponding to the second type of text, and the sample document image acquisition module may include:

[0136] The original document image acquisition unit is used to acquire multiple original document images of samples.

[0137] The text content recognition unit is used to input the multiple sample original document images into a preset text content recognition model to perform text content recognition and obtain multiple text recognition regions labeled with content recognition results;

[0138] The real text information determination unit is used to determine the real text information of each text recognition region.

[0139] The confidence result determination unit is used to determine the confidence result corresponding to each text recognition region based on the real text information and content recognition result of each text recognition region;

[0140] The first sample image determination unit is used to take the text recognition regions whose recognition confidence results indicate correct recognition as sample document images labeled as the first type of text, and to take the text recognition regions whose recognition confidence results indicate incorrect recognition as sample document images labeled as the second type of text.

[0141] The document background region determination unit is used to determine the document background region in the multiple sample original document images based on the multiple text recognition regions labeled with content recognition results;

[0142] The second sample image determination unit is used to use the document background area as a sample document image labeled as the document background.

[0143] In one embodiment, the above-mentioned apparatus may further include:

[0144] The updated document image acquisition module is used to acquire an updated document image when the document image quality index does not meet the preset quality condition.

[0145] The updated quality analysis module is used to take the updated document image as the original document image, and perform the steps of inputting the original document image into the document image segmentation model for document image semantic segmentation, and calculating the document image quality index corresponding to the original document image based on the document category information, until the document image quality index meets the preset quality conditions.

[0146] In one embodiment, the above-mentioned apparatus may further include:

[0147] The text content recognition module is used to input the target document image into the text content recognition model to perform text content recognition and obtain business content data;

[0148] The data storage module is used to store the business content data locally.

[0149] The apparatus described in the apparatus embodiments and the method embodiments are based on the same inventive concept.

[0150] This application provides a computer device including a processor and a memory. The memory stores at least one instruction or at least one program, which is loaded and executed by the processor to implement the image processing method provided in the above method embodiments.

[0151] Memory can be used to store software programs and modules. The processor executes various functional applications and data processing by running the software programs and modules stored in the memory. Memory can primarily include a program storage area and a data storage area. The program storage area can store the operating system, application programs required for the functions, etc.; the data storage area can store data created based on the use of the device, etc. Furthermore, memory can include high-speed random access memory, and can also include non-volatile memory, such as at least one disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, memory can also include a memory controller to provide the processor with access to the memory.

[0152] The method embodiments provided in this application can be executed in a mobile terminal, computer terminal, server, or similar computing device; that is, the aforementioned computer device may include a mobile terminal, computer terminal, server, or similar computing device. Taking running on a server as an example, Figure 11 is a hardware structure block diagram of a server for an image processing method provided in an embodiment of this application. As shown in Figure 11, the server 1100 can vary significantly depending on configuration or performance. It may include one or more central processing units (CPUs) 1110 (a CPU 1110 may include, but is not limited to, a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 1130 for storing data, and one or more storage media 1120 (e.g., one or more mass storage devices) for storing application programs 1123 or data 1122. The memory 1130 and storage media 1120 may be temporary or persistent storage. The program stored in the storage media 1120 may include one or more modules, each module including a series of instruction operations on the server. Furthermore, the CPU 1110 may be configured to communicate with the storage media 1120 and to execute, on the server 1100, the series of instruction operations stored in the storage media 1120. Server 1100 may also include one or more power supplies 1160, one or more wired or wireless network interfaces 1150, one or more input/output interfaces 1140, and/or one or more operating systems 1121, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.

[0153] The input/output interface 1140 can be used to receive or send data via a network. Specific examples of the network described above may include a wireless network provided by the communication provider of server 1100. In one example, the input/output interface 1140 includes a network interface controller (NIC), which can connect to other network devices via a base station to communicate with the Internet. In another example, the input/output interface 1140 may be a radio frequency (RF) module for wireless communication with the Internet.

[0154] Those skilled in the art will understand that the structure shown in Figure 11 is for illustrative purposes only and does not limit the structure of the aforementioned electronic device. For example, server 1100 may include more or fewer components than shown in Figure 11, or have a configuration different from that shown in Figure 11.

[0155] Embodiments of this application also provide a computer-readable storage medium, which can be disposed in a server to store at least one instruction or at least one program related to implementing an image processing method in the method embodiments. The at least one instruction or the at least one program is loaded and executed by the processor to implement the image processing method provided in the above method embodiments.

[0156] Optionally, in this embodiment, the storage medium may be located at at least one of the multiple network servers in a computer network. Optionally, in this embodiment, the storage medium may include, but is not limited to, various media capable of storing program code, such as USB flash drives, read-only memory (ROM), random access memory (RAM), portable hard drives, magnetic disks, or optical disks.

[0157] Embodiments of this application also provide a computer program product or computer program that includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions, causing the computer device to perform the methods provided in the various optional implementations described above.

[0158] As can be seen from the embodiments of the image processing method, apparatus, computer device, storage medium, or computer program provided in this application, the acquired original document image is input into a document image segmentation model for semantic segmentation to obtain the document category information corresponding to each pixel in the original document image. A pre-trained document image segmentation model can thus perform three-class classification (first type of text, second type of text, or document background) on each pixel in the original document image. Because semantic segmentation must determine the category of every pixel in the image to achieve accurate segmentation, the quality value of a text pixel is not evaluated directly. Instead, quality is combined with background segmentation by dividing text pixels into two categories, high quality and low quality. This integrates text segmentation and text quality determination into a single task and greatly simplifies the process of determining document image quality: only one model is needed for classification, and the document image quality is determined from the classification results, achieving end-to-end quality analysis and greatly improving the efficiency and reliability of determining document image quality information. By obtaining the weight corresponding to each category and computing, for each pixel in the original document image, the weighted sum of that pixel's category probabilities, a quality weighted value is obtained for each pixel. The document image quality index is then determined by combining the quality weighted values of all pixels in the original document image. This approach considers not only the quality of the text region but also the quality of every pixel in the document image, so the buffer area at the text-background boundary is better taken into account, which improves the reliability of document image quality determination. Once the document image quality index meets the preset quality conditions, the target document image is input into a text content recognition model for text content recognition. The resulting business content data can be stored locally or applied to business processes. High-quality document images yield more reliable business content data, thereby improving the efficiency and reliability of subsequent business processing.
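
The weighted calculation summarized above can be illustrated with a short sketch. It is a hedged example under stated assumptions rather than the calculation prescribed by this application: the weight values are made up for illustration, the per-pixel values are combined by taking their mean (the text only says they are "combined"), and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Per-pixel probabilities: (first type of text, second type of text, background)
Probs = Tuple[float, float, float]


def pixel_quality(probs: Probs, weights: Sequence[float]) -> float:
    """Weighted sum of one pixel's three category probabilities."""
    return sum(p * w for p, w in zip(probs, weights))


def weighted_quality_index(
    prob_map: List[List[Probs]],
    weights: Sequence[float] = (1.0, 0.0, 0.5),  # illustrative weights, not from the source
) -> float:
    """Combine per-pixel quality weighted values into one index (assumed: mean)."""
    values = [pixel_quality(p, weights) for row in prob_map for p in row]
    if not values:
        raise ValueError("prob_map contains no pixels")
    return sum(values) / len(values)
```

Because every pixel contributes, pixels near a text-background boundary with mixed probabilities influence the index in proportion to how text-like and how high-quality the model considers them.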

[0159] It should be noted that the order of the embodiments described above is merely for descriptive purposes and does not represent the superiority or inferiority of the embodiments. Furthermore, specific embodiments have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps described in the claims can be performed in a different order than that shown in the embodiments and still achieve the desired result. Additionally, the processes depicted in the drawings do not necessarily require a specific or sequential order to achieve the desired result. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

[0160] The various embodiments in this specification are described in a progressive manner. For identical or similar parts, the embodiments may be cross-referenced. Each embodiment focuses on describing its differences from the other embodiments. In particular, the embodiments of the apparatus, device, and storage medium are basically similar to the method embodiments, so their descriptions are relatively brief; for relevant parts, refer to the descriptions of the method embodiments.

[0161] Those skilled in the art will understand that all or part of the steps of the above embodiments can be implemented by hardware or by a program instructing related hardware. The program can be stored in a computer-readable storage medium, such as a read-only memory, a disk, or an optical disk.

[0162] The above description is only a preferred embodiment of this application and is not intended to limit this application. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the protection scope of this application.

Claims

1. An image processing method, characterized in that the method includes: obtaining an original document image; inputting the original document image into a document image segmentation model for semantic segmentation to obtain document category information corresponding to each pixel in the original document image, wherein the document category information corresponding to each pixel in the original document image includes a first type of text, a second type of text, or document background, the document category information is based on text quality, and the text quality of pixels corresponding to the first type of text is higher than that of pixels corresponding to the second type of text; the document image segmentation model is pre-trained for semantic segmentation based on sample document images labeled with document category tags, each document category tag includes a first type of text, a second type of text, or document background, and the text quality of sample document images corresponding to the first type of text is higher than that of sample document images corresponding to the second type of text; calculating, based on the document category information, the document image quality index corresponding to the original document image; and using the original document image as the target document image when the document image quality index meets the preset quality conditions.

2. The method according to claim 1, characterized in that the step of calculating the document image quality index corresponding to the original document image based on the document category information includes: determining, based on the document category information, a first pixel count corresponding to the first type of text and a second pixel count corresponding to the second type of text; calculating pixel proportion information corresponding to the first type of text based on the first pixel count and the second pixel count; and using the pixel proportion information as the document image quality index.
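
As an illustration of the calculation recited in claim 2, the following sketch counts the pixels of each text category and takes the share of first-type text pixels. It is a sketch under assumptions, not a definition of the claimed calculation: the integer label encoding is hypothetical, the proportion is assumed to be the first pixel count divided by the sum of both text pixel counts, and returning zero for an image without text pixels is an added convention.

```python
from typing import List

# Hypothetical integer encoding of the three document categories
FIRST_TEXT, SECOND_TEXT, BACKGROUND = 0, 1, 2


def first_text_ratio(label_map: List[List[int]]) -> float:
    """Share of text pixels that belong to the first (higher-quality) type."""
    flat = [label for row in label_map for label in row]
    n_first = flat.count(FIRST_TEXT)    # first pixel count
    n_second = flat.count(SECOND_TEXT)  # second pixel count
    total_text = n_first + n_second     # background pixels are ignored
    if total_text == 0:
        return 0.0  # assumption: an image with no text pixels scores zero
    return n_first / total_text
```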

3. The method according to claim 1, characterized in that the document category information includes a first probability corresponding to the first type of text, a second probability corresponding to the second type of text, and a third probability corresponding to the document background; and the step of calculating the document image quality index corresponding to the original document image based on the document category information includes: obtaining a first weight corresponding to the first type of text, a second weight corresponding to the second type of text, and a third weight corresponding to the document background, respectively; performing a weighted summation based on the first probability, second probability, and third probability corresponding to each pixel, as well as the first weight, second weight, and third weight, to obtain the quality weighted value corresponding to each pixel; and determining, based on the quality weighted value, the document image quality index corresponding to the original document image.

4. The method according to claim 1, characterized in that the method further includes: obtaining multiple sample document images labeled with document category tags; and training a preset neural network model for document image semantic segmentation based on the multiple sample document images labeled with document category tags, wherein, during the training, the model parameters of the preset neural network model are adjusted until the preset neural network model meets the preset convergence condition, thereby obtaining the document image segmentation model.

5. The method according to claim 4, characterized in that the step of obtaining multiple sample document images labeled with document category tags includes: obtaining multiple sample original document images; inputting the multiple sample original document images into a preset text content recognition model for text content recognition to obtain multiple text recognition regions labeled with content recognition results; determining the real text information of each text recognition region; determining, based on the real text information and the content recognition result of each text recognition region, the recognition confidence result corresponding to each text recognition region; using the text recognition regions whose recognition confidence result indicates correct recognition as sample document images labeled as the first type of text, and using the text recognition regions whose recognition confidence result indicates incorrect recognition as sample document images labeled as the second type of text; determining, based on the multiple text recognition regions labeled with content recognition results, the document background regions in the multiple sample original document images; and using the document background regions as sample document images labeled as the document background.
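
The sample-labeling procedure recited in claim 5 can be sketched as follows. This is an illustrative sketch only: the `TextRegion` structure, the tag strings, and the helper names are hypothetical, the recognition confidence result is assumed to be an exact string match between the recognized text and the real text, and the background is assumed to be every pixel outside all text recognition regions.

```python
from dataclasses import dataclass
from typing import List, Tuple

FIRST_TEXT, SECOND_TEXT = "first_text", "second_text"  # hypothetical tag names


@dataclass
class TextRegion:
    box: Tuple[int, int, int, int]  # (x0, y0, x1, y1), end-exclusive
    recognized: str                 # content recognition result
    ground_truth: str               # real text information


def label_regions(regions: List[TextRegion]) -> List[str]:
    """Tag correctly recognized regions as first type, the rest as second type."""
    return [
        FIRST_TEXT if r.recognized == r.ground_truth else SECOND_TEXT
        for r in regions
    ]


def background_mask(width: int, height: int,
                    regions: List[TextRegion]) -> List[List[bool]]:
    """True where a pixel lies outside every text recognition region."""
    mask = [[True] * width for _ in range(height)]
    for r in regions:
        x0, y0, x1, y1 = r.box
        for y in range(y0, y1):
            for x in range(x0, x1):
                mask[y][x] = False
    return mask
```

Labeling by recognition outcome ties the two text categories directly to what the downstream recognition model can and cannot read, which is why no separate manual quality score is needed in this sketch.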

6. The method according to claim 1, characterized in that the method further includes: obtaining an updated document image when the document image quality index does not meet the preset quality conditions; and taking the updated document image as the original document image and repeating the steps of inputting the original document image into the document image segmentation model for semantic segmentation and calculating the document image quality index corresponding to the original document image based on the document category information, until the document image quality index meets the preset quality conditions.

7. The method according to claim 1, characterized in that the method further includes: inputting the target document image into a text content recognition model for text content recognition to obtain business content data; and storing the business content data locally.

8. An image processing apparatus, characterized in that the apparatus includes: an original image acquisition module, used to acquire an original document image; an image semantic segmentation module, used to input the original document image into a document image segmentation model for document image semantic segmentation to obtain document category information corresponding to each pixel in the original document image, wherein the document category information corresponding to each pixel in the original document image includes a first type of text, a second type of text, or document background, the document category information is based on text quality, and the text quality of pixels corresponding to the first type of text is higher than that of pixels corresponding to the second type of text; the document image segmentation model is pre-trained for document image semantic segmentation based on sample document images labeled with document category tags, each document category tag includes a first type of text, a second type of text, or document background, and the text quality of sample document images corresponding to the first type of text is higher than that of sample document images corresponding to the second type of text; a quality index calculation module, used to calculate the document image quality index corresponding to the original document image based on the document category information; and a target document image determination module, used to use the original document image as the target document image when the document image quality index meets the preset quality conditions.

9. The apparatus according to claim 8, characterized in that the quality index calculation module includes: a pixel count determination unit, used to determine, based on the document category information, the first pixel count corresponding to the first type of text and the second pixel count corresponding to the second type of text; a pixel proportion information calculation unit, used to calculate the pixel proportion information corresponding to the first type of text based on the first pixel count and the second pixel count; and a first index determination unit, used to use the pixel proportion information as the document image quality index.

10. The apparatus according to claim 8, characterized in that the document category information includes a first probability corresponding to the first type of text, a second probability corresponding to the second type of text, and a third probability corresponding to the document background; and the quality index calculation module includes: a weight acquisition unit, used to acquire the first weight corresponding to the first type of text, the second weight corresponding to the second type of text, and the third weight corresponding to the document background, respectively; a weighting unit, used to perform a weighted summation based on the first probability, second probability, and third probability corresponding to each pixel, as well as the first weight, second weight, and third weight, to obtain the quality weighted value corresponding to each pixel; and a second index determination unit, used to determine the document image quality index corresponding to the original document image based on the quality weighted value.

11. The apparatus according to claim 8, characterized in that the apparatus further includes: a sample document image acquisition module, used to acquire multiple sample document images labeled with document category tags; and a model training module, used to train a preset neural network model for document image semantic segmentation based on the multiple sample document images labeled with document category tags, wherein, during the training, the model parameters of the preset neural network model are adjusted until the preset neural network model meets the preset convergence condition, thereby obtaining the document image segmentation model.

12. The apparatus according to claim 11, characterized in that the sample document image acquisition module includes: an original document image acquisition unit, used to acquire multiple sample original document images; a text content recognition unit, used to input the multiple sample original document images into a preset text content recognition model for text content recognition to obtain multiple text recognition regions labeled with content recognition results; a real text information determination unit, used to determine the real text information of each text recognition region; a confidence result determination unit, used to determine the recognition confidence result corresponding to each text recognition region based on the real text information and the content recognition result of each text recognition region; a first sample image determination unit, used to use the text recognition regions whose recognition confidence result indicates correct recognition as sample document images labeled as the first type of text, and to use the text recognition regions whose recognition confidence result indicates incorrect recognition as sample document images labeled as the second type of text; a document background region determination unit, used to determine the document background regions in the multiple sample original document images based on the multiple text recognition regions labeled with content recognition results; and a second sample image determination unit, used to use the document background regions as sample document images labeled as the document background.

13. The apparatus according to claim 8, characterized in that the apparatus further includes: an updated document image acquisition module, used to acquire an updated document image when the document image quality index does not meet the preset quality conditions; and an updated quality analysis module, used to take the updated document image as the original document image and to perform the steps of inputting the original document image into the document image segmentation model for document image semantic segmentation and calculating the document image quality index corresponding to the original document image based on the document category information, until the document image quality index meets the preset quality conditions.

14. The apparatus according to claim 8, characterized in that the apparatus further includes: a text content recognition module, used to input the target document image into the text content recognition model for text content recognition to obtain business content data; and a data storage module, used to store the business content data locally.

15. An image processing device, characterized in that the device includes a processor and a memory, the memory storing at least one instruction or at least one program, the at least one instruction or the at least one program being loaded and executed by the processor to implement the image processing method as described in any one of claims 1 to 7.

16. A computer-readable storage medium, characterized in that the storage medium stores at least one instruction or at least one program segment, the at least one instruction or the at least one program segment being loaded and executed by a processor to implement the image processing method as described in any one of claims 1 to 7.