Character recognition method

By correcting the shape of irregularly shaped polygonal regions and mapping them to rectangular regions for OCR processing, the problem of accuracy and efficiency in recognizing irregularly shaped character images is solved, achieving a balance between accuracy and efficiency in character recognition.

CN117746431B · Active · Publication Date: 2026-06-30 · XIAOHONGSHU TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: XIAOHONGSHU TECH CO LTD
Filing Date: 2023-01-31
Publication Date: 2026-06-30

Application Information

IPC: G06V30/14; G06V30/146; G06V30/148; G06V30/19; G06V10/82

AI Tagging

Technology Topics

Text detection; Computer graphics (images)

Technical Efficacy Phrases

Guaranteed accuracy; Guaranteed recognition efficiency

Abstract

This application discloses a character recognition method. The method includes: performing text detection on a target image to obtain at least one polygonal region; determining a target polygonal region from the at least one polygonal region, wherein the target polygonal region has an irregular shape; determining four target vertices from multiple vertices of the target polygonal region, and using these four target vertices as the four vertices of a rectangular region corresponding to the target polygonal region; mapping pixel information of each pixel in the target polygonal region to the rectangular region based on the positions of each vertex of the rectangular region in the target image to generate pixel information of each pixel in the rectangular region; and performing OCR processing on the rectangular region and the polygonal regions in the at least one polygonal region other than the target polygonal region to obtain the character recognition result of the target image. This application embodiment can simultaneously achieve both accuracy and efficiency in character recognition.

Description

Technical Field

[0001] This application relates to the field of computer application technology, and in particular to character recognition methods.

Background Technology

[0002] Optical character recognition (OCR) refers to the process of using computer devices (such as scanners or digital cameras) to capture images containing characters, and then using character recognition methods to translate the shapes of the characters in the image into computer text. Specifically, traditional character recognition methods typically detect the rectangular regions containing text in each row or column of an image, and then perform character recognition on these rectangular regions. However, in real-world scenarios, the characters in the captured images are not necessarily arranged in a regular row or column; in other words, the characters in the image may not form regular rectangles, but could form irregular shapes, such as arcs, wavy lines, or trapezoids. Therefore, how to perform character recognition on images with irregularly arranged characters, while ensuring accuracy and efficiency, is a pressing technical problem that needs to be solved.

Summary of the Invention

[0003] This application provides a character recognition method that can simultaneously ensure both the accuracy and efficiency of character recognition.

[0004] In one aspect, embodiments of this application provide a character recognition method, which includes:

[0005] Text detection is performed on the target image to obtain at least one polygonal region; wherein, the at least one polygonal region refers to the region in the target image where at least one text is located;

[0006] A target polygon region is determined from the at least one polygon region; wherein the shape of the target polygon region is an irregular shape;

[0007] Four target vertices are determined from the multiple vertices of the target polygon region, and these four target vertices are used as the four vertices of the rectangular region corresponding to the target polygon region; wherein, the rectangular region refers to the region obtained after shape correction of the target polygon region, and the shape of the rectangular region is a regular shape;

[0008] Based on the positions of each vertex of the rectangular region in the target image, the pixel information of each pixel in the target polygonal region is mapped to the rectangular region to generate the pixel information of each pixel in the rectangular region;

[0009] OCR processing is performed on the rectangular region and the polygonal regions other than the target polygonal region in the at least one polygonal region to obtain the character recognition result of the target image.

[0010] In one embodiment, determining four target vertices from a plurality of vertices of the target polygon region includes:

[0011] Two extreme points of the target polygon region are obtained from multiple vertices of the target polygon region; wherein the walking distance between the two extreme points is greater than the walking distance between any vertex of the multiple vertices and any other vertex of the multiple vertices except the two extreme points.

[0012] Based on the two extreme points of the target polygon region (also referred to below as the two poles), the four target vertices are determined from the multiple vertices of the target polygon region.

[0013] In one embodiment, obtaining the two poles of the target polygon region from a plurality of vertices of the target polygon region includes:

[0014] Traverse multiple vertices of the target polygon region, control the first vertex currently being traversed to walk along the edge of the target polygon region, determine the second vertex from the multiple vertices, and obtain the target walking distance between the first vertex currently being traversed and the second vertex; wherein, the target walking distance between the first vertex currently being traversed and the second vertex is greater than the walking distance between the first vertex currently being traversed and the vertices other than the second vertex among the multiple vertices;

[0015] After the traversal is completed, the first and second vertices corresponding to the maximum walking distance in the obtained target walking distance are taken as the two extreme points of the target polygon region.
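The search for the two extreme points can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that the "walking distance" between two vertices is the shorter of the two paths along the polygon boundary (the text does not define it precisely), and the function names `walking_distances` and `find_extreme_points` are hypothetical.

```python
from itertools import combinations
from math import dist


def walking_distances(vertices):
    """Return (prefix, perimeter); prefix[i] is the boundary length from vertex 0 to vertex i."""
    n = len(vertices)
    prefix = [0.0]
    for i in range(n):
        prefix.append(prefix[-1] + dist(vertices[i], vertices[(i + 1) % n]))
    return prefix[:-1], prefix[-1]


def find_extreme_points(vertices):
    """Return ((i, j), distance) for the vertex pair with the largest walking distance.

    Assumption: the walking distance is the shorter of the two boundary paths
    between the vertices. Ties are resolved in favour of the first pair found.
    """
    prefix, perimeter = walking_distances(vertices)
    best_pair, best_dist = None, -1.0
    for i, j in combinations(range(len(vertices)), 2):
        one_way = prefix[j] - prefix[i]
        walk = min(one_way, perimeter - one_way)
        if walk > best_dist:
            best_pair, best_dist = (i, j), walk
    return best_pair, best_dist


# An 8 x 1 strip with vertices listed in boundary order.
strip = [(0, 0), (4, 0), (8, 0), (8, 1), (4, 1), (0, 1)]
pair, distance = find_extreme_points(strip)
```

For this strip the first pair at the maximum distance is vertices 0 and 3 (opposite corners), with a walking distance of 9.0, half the perimeter. The exhaustive pair search is quadratic in the number of vertices, which is usually acceptable for the small polygons a text detector returns.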

[0016] In one embodiment, determining the four target vertices from a plurality of vertices of the target polygon region based on two poles of the target polygon region includes:

[0017] Traverse the two poles of the target polygon region, and determine the target point from multiple vertices of the target polygon region based on the straight-line distance between the two poles and the straight-line distance between the currently traversed pole and each vertex of the target polygon region;

[0018] Based on the target point and the currently traversed pole, construct a candidate polygon region;

[0019] The ratio between the area of the candidate polygon region and the area of the smallest bounding rectangle region of the candidate polygon region is obtained.

[0020] If the ratio is greater than the ratio threshold, then candidate vertices are determined from the four vertices of the minimum bounding rectangle region of the candidate polygon region; wherein, the straight-line distance between the candidate vertex and the currently traversed pole is less than the straight-line distance between the vertices other than the candidate vertex in the four vertices of the minimum bounding rectangle region of the candidate polygon region and the currently traversed pole.

[0021] The vertex with the smallest straight-line distance from the candidate vertex among the multiple vertices of the candidate polygon region is taken as the target vertex.

[0022] In one embodiment, the method further includes:

[0023] If the ratio is less than or equal to the ratio threshold, then the process of determining candidate vertices from multiple vertices of the target polygon region is triggered based on the target walking distance corresponding to the currently traversed pole and the straight-line distance between the currently traversed pole and each vertex of the target polygon region; wherein the number of candidate vertices determined now is less than the number of candidate vertices determined in the previous step.

[0024] In one embodiment, the method further includes:

[0025] Obtain the walking distance between any two adjacent target vertices among the four target vertices;

[0026] Based on the walking distance between any two adjacent target vertices among the four target vertices, and the spatial relationship of the four target vertices in the target polygonal region, the spatial relationship of the four target vertices in the rectangular region is determined;

[0027] The step of mapping pixel information of each pixel in the target polygonal region to the rectangular region based on the positions of each vertex of the rectangular region in the target image, to generate pixel information of each pixel in the rectangular region, includes:

[0028] Based on the spatial relationship of the four target vertices in the rectangular region and the position of each vertex of the rectangular region in the target image, the pixel information of each pixel in the target polygonal region is mapped to the rectangular region to generate the pixel information of each pixel in the rectangular region.

[0029] In one embodiment, mapping pixel information of each pixel in the target polygonal region to the rectangular region based on the positions of each vertex of the rectangular region in the target image includes:

[0030] Determine the mapping points from each vertex in the target polygonal region to the rectangular region;

[0031] Based on the multiple vertices of the target polygon region and the spatial relationship between the multiple vertices of the target polygon region in the target polygon region, the target polygon region is triangulated to obtain multiple first triangle regions corresponding to the target polygon region;

[0032] Based on multiple mapping points in the rectangular region and the spatial relationship between these mapping points within the rectangular region, the rectangular region is triangulated to obtain multiple second triangular regions corresponding to the rectangular region; wherein, there is a one-to-one correspondence between the multiple first triangular regions and the multiple second triangular regions.

[0033] Traverse the plurality of first triangular regions, for any pixel in the currently traversed first triangular region, determine the target pixel corresponding to the pixel in the second triangular region corresponding to the currently traversed first triangular region, and map the pixel information of the target pixel to the pixel;

[0034] After the traversal is completed, pixel information of each pixel in the rectangular region is generated.
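The per-triangle mapping can be illustrated with barycentric coordinates. The sketch below is an assumption about one way to realize the correspondence between a triangle in the rectangular region and its counterpart in the polygonal region; the text does not name the technique. It uses inverse mapping (each rectangle location looks up its source location in the polygon), and the helper names are hypothetical.

```python
def barycentric(p, tri):
    """Barycentric coordinates (a, b, c) of point p with respect to triangle tri."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    a = ((y2 - y3) * (p[0] - x3) + (x3 - x2) * (p[1] - y3)) / det
    b = ((y3 - y1) * (p[0] - x3) + (x1 - x3) * (p[1] - y3)) / det
    return a, b, 1.0 - a - b


def point_in_triangle(p, tri, eps=1e-9):
    """True if p lies inside tri or on its boundary."""
    return all(w >= -eps for w in barycentric(p, tri))


def map_point(p, rect_tri, poly_tri):
    """Map a point inside rect_tri to the corresponding point inside poly_tri.

    The two triangles must list their vertices in corresponding order.
    """
    a, b, c = barycentric(p, rect_tri)
    x = a * poly_tri[0][0] + b * poly_tri[1][0] + c * poly_tri[2][0]
    y = a * poly_tri[0][1] + b * poly_tri[1][1] + c * poly_tri[2][1]
    return x, y


# One triangle of the rectangle and its counterpart in the polygon (made-up coordinates).
rect_tri = [(0, 0), (4, 0), (0, 2)]
poly_tri = [(10, 10), (14, 12), (9, 13)]
source = map_point((2, 0), rect_tri, poly_tri)  # midpoint of the first edge
```

Because the mapping is affine within each triangle, the midpoint of an edge in the rectangle triangle lands on the midpoint of the matching edge in the polygon triangle, here (12.0, 11.0). The resulting source location is generally fractional, which is where interpolation comes in.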

[0035] In one embodiment, mapping the pixel information of the target pixel to any pixel includes:

[0036] If the coordinates of the target pixel corresponding to the pixel are not integers, an interpolation method is called to calculate the pixel information of the pixel based on the pixel information of the target pixel.

[0037] If the coordinates of the target pixel corresponding to the pixel are integers, the pixel information of the target pixel is used as the pixel information of the pixel.
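The two cases above can be read as a test of whether the mapped source location falls exactly on a pixel or between pixels. The sketch below assumes that reading and assumes bilinear interpolation; the text only says "the interpolation method" and names no specific one. The function `sample_bilinear` and the list-of-rows image layout are illustrative choices.

```python
from math import floor


def sample_bilinear(image, x, y):
    """Sample a grayscale image (a list of rows) at a possibly fractional (x, y)."""
    height, width = len(image), len(image[0])
    # Clamp so that the 2 x 2 neighbourhood stays inside the image.
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    if x == int(x) and y == int(y):
        # Integer coordinates: copy the source pixel directly.
        return image[int(y)][int(x)]
    x0, y0 = floor(x), floor(y)
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    # Blend horizontally on the two rows, then vertically between them.
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


image = [[0, 10],
         [20, 30]]
center = sample_bilinear(image, 0.5, 0.5)   # average of all four pixels
corner = sample_bilinear(image, 1, 0)       # exact pixel, copied unchanged
```

Nearest-neighbour or bicubic sampling would fit the description equally well; bilinear is used here only because it is short and common.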

[0038] In one embodiment, determining the mapping points from each vertex in the target polygonal region to the rectangular region includes:

[0039] Based on the walking distance of each pair of adjacent vertices of the rectangular region in the target polygonal region, the width and height of the rectangular region are obtained;

[0040] Based on the width and height, determine the coordinates, in the rectangular region, of the mapping points of the four target vertices of the target polygonal region;

[0041] Based on the walking distance between every two adjacent vertices of the target polygon region and the spatial relationship between multiple vertices of the target polygon region, the coordinates of the mapping points in the rectangular region are determined for the vertices other than the four target vertices in the target polygon region.
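A minimal sketch of this vertex-to-rectangle mapping follows. It rests on several assumptions that the text leaves open: the four target vertices are given in boundary order as top-left, top-right, bottom-right, bottom-left; the width and height are taken as the longer walking distance of each pair of opposite sides; and the remaining vertices are placed along the rectangle sides in proportion to their walking distance. The function name `map_vertices_to_rectangle` is hypothetical.

```python
from math import dist


def map_vertices_to_rectangle(vertices, corners):
    """Map every polygon vertex to a point on the boundary of a rectangle.

    vertices: polygon vertices in boundary order.
    corners:  indices of the four target vertices, in boundary order
              (top-left, top-right, bottom-right, bottom-left) -- an assumption.
    Returns (width, height, mapping) with mapping[i] the rectangle point of vertices[i].
    """
    n = len(vertices)

    def chain(start, end):
        # Vertex indices met when walking along the boundary from start to end.
        idx = [start]
        while idx[-1] != end:
            idx.append((idx[-1] + 1) % n)
        return idx

    def cumulative(idx):
        # Walking distance from the first vertex of the chain to each vertex in it.
        out = [0.0]
        for a, b in zip(idx, idx[1:]):
            out.append(out[-1] + dist(vertices[a], vertices[b]))
        return out

    tl, tr, br, bl = corners
    sides = [chain(tl, tr), chain(tr, br), chain(br, bl), chain(bl, tl)]
    walks = [cumulative(side) for side in sides]
    width = max(walks[0][-1], walks[2][-1])    # top and bottom sides
    height = max(walks[1][-1], walks[3][-1])   # right and left sides
    rect = [(0.0, 0.0), (width, 0.0), (width, height), (0.0, height)]

    mapping = {}
    for k, (idx, cum) in enumerate(zip(sides, walks)):
        (sx, sy), (ex, ey) = rect[k], rect[(k + 1) % 4]
        for i, d in zip(idx, cum):
            t = d / cum[-1] if cum[-1] else 0.0
            mapping[i] = (sx + t * (ex - sx), sy + t * (ey - sy))
    return width, height, mapping


# A bent strip: vertices 1 and 4 are the intermediate points of the top and bottom sides.
bent = [(0, 0), (3, 1), (6, 0), (6, 2), (3, 3), (0, 2)]
width, height, mapping = map_vertices_to_rectangle(bent, corners=(0, 2, 3, 5))
```

In this example the intermediate vertices 1 and 4 land at the horizontal midpoint of the top and bottom sides of the rectangle, because each sits halfway along its side by walking distance.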

[0042] In one embodiment, determining the target polygon region from the at least one polygon region includes:

[0043] For any polygon region among the at least one polygon region, obtain the minimum bounding rectangle region of that polygon region;

[0044] Obtain a first region area, namely the area of that polygon region, and a second region area, namely the area of its minimum bounding rectangle region;

[0045] If the difference between the first region area and the second region area is greater than the difference threshold, that polygon region is determined as the target polygon region.

[0046] In another aspect, embodiments of this application provide a character recognition device, which includes:

[0047] A text detection unit is used to perform text detection on a target image to obtain at least one polygonal region; wherein, the at least one polygonal region refers to the region where at least one text is located in the target image;

[0048] A region determination unit is configured to determine a target polygon region from the at least one polygon region; wherein the region shape of the target polygon region is an irregular shape;

[0049] A vertex determination unit is used to determine four target vertices from multiple vertices of the target polygonal region, and to use the four target vertices as the four vertices of a rectangular region corresponding to the target polygonal region; wherein, the rectangular region refers to the region obtained after shape correction of the target polygonal region, and the shape of the rectangular region is a regular shape;

[0050] A mapping unit is used to map the pixel information of each pixel in the target polygonal region to the rectangular region based on the position of each vertex of the rectangular region in the target image, so as to generate the pixel information of each pixel in the rectangular region;

[0051] A character recognition unit is used to perform OCR processing on the rectangular region and the polygonal regions other than the target polygonal region in the at least one polygonal region to obtain the character recognition result of the target image.

[0052] In another aspect, embodiments of this application provide a computer device including a processor, a storage device, and a communication interface, wherein the processor, storage device, and communication interface are interconnected, wherein the storage device is used to store a computer program that supports the computer device in executing the above-described method, the computer program including program instructions, and the processor is configured to invoke the program instructions to execute the following steps:

[0053] Text detection is performed on the target image to obtain at least one polygonal region; wherein, the at least one polygonal region refers to the region in the target image where at least one text is located;

[0054] A target polygon region is determined from the at least one polygon region; wherein the shape of the target polygon region is an irregular shape;

[0055] Four target vertices are determined from the multiple vertices of the target polygon region, and these four target vertices are used as the four vertices of the rectangular region corresponding to the target polygon region; wherein, the rectangular region refers to the region obtained after shape correction of the target polygon region, and the shape of the rectangular region is a regular shape;

[0056] Based on the position of each vertex of the rectangular region in the target image, the pixel information of each pixel in the target polygonal region is mapped to the rectangular region to generate the pixel information of each pixel in the rectangular region;

[0057] OCR processing is performed on the rectangular region and the polygonal regions other than the target polygonal region in the at least one polygonal region to obtain the character recognition result of the target image.

[0058] In another aspect, embodiments of this application provide a computer-readable storage medium storing a computer program, the computer program including program instructions, which, when executed by a processor, cause the processor to perform the aforementioned character recognition method.

[0059] In another aspect, embodiments of this application provide a computer program product, which includes a computer program adapted to be loaded by a processor to execute the character recognition method described above.

[0060] In this embodiment, after a target polygonal region with an irregular shape is determined from the at least one polygonal region, the target polygonal region is first shape-corrected. Specifically, four target vertices are determined from the multiple vertices of the target polygonal region, and these four target vertices are used as the four vertices of the rectangular region corresponding to the target polygonal region. Based on the position of each vertex of the rectangular region in the target image, the pixel information of each pixel in the target polygonal region is mapped to the rectangular region to generate the pixel information of each pixel in the rectangular region. After the rectangular region is obtained by shape correction of the target polygonal region, OCR processing is performed on regions with regular shapes (i.e., the aforementioned rectangular region, and the polygonal regions in the at least one polygonal region other than the target polygonal region), which ensures the accuracy of character recognition. In addition, this embodiment only performs shape correction on polygonal regions with irregular shapes; for polygonal regions with regular shapes, OCR processing is performed directly, thus ensuring the efficiency of character recognition. Therefore, this embodiment can achieve both accuracy and efficiency in character recognition.

Attached Figure Description

[0061] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0062] Figure 1a is a schematic diagram of a polygonal region with a regular shape, provided in an embodiment of this application;

[0063] Figure 1b is a schematic diagram of another polygonal region with a regular shape, provided in an embodiment of this application;

[0064] Figure 1c is a schematic diagram of a polygonal region with an irregular shape, provided in an embodiment of this application;

[0065] Figure 1d is a schematic diagram of a rectangular region obtained by shape correction of an irregularly shaped polygonal region according to an embodiment of this application;

[0066] Figure 2 is a flowchart illustrating a character recognition method provided in an embodiment of this application;

[0067] Figure 3 is a flowchart illustrating another character recognition method provided in an embodiment of this application;

[0068] Figure 4 is a schematic diagram of another polygonal region with an irregular shape, provided in an embodiment of this application;

[0069] Figure 5 is a schematic diagram of the structure of a character recognition device provided in an embodiment of this application;

[0070] Figure 6 is a schematic diagram of the structure of a computer device provided in an embodiment of this application.

Detailed Implementation

[0071] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of this application.

[0072] Images on content publishing platforms (such as notes, video screenshots, advertising images, or product images) often contain text, which greatly assists in subject recognition, scene recognition, or product content recognition. Traditional character recognition methods can only recognize characters arranged in a conventional shape. Specifically, after text detection is performed on an image to obtain at least one polygonal region, if the text in a polygonal region is arranged horizontally (as shown in Figure 1a) or vertically (as shown in Figure 1b), traditional character recognition methods can accurately identify the characters within that polygonal region. However, a problem arises if the polygonal region has an irregular shape, meaning the characters are arranged irregularly (e.g., in an arc shape, as shown in Figure 1c, where the polygonal region has an arc shape). Since OCR can only recognize characters in rectangular regions, if the minimum bounding rectangle of the irregularly shaped polygonal region is used for recognition, that rectangle will contain a large blank area, resulting in very low recognition accuracy.

[0073] Based on this, the character recognition method provided in this application performs text detection on the target image to obtain at least one polygonal region, and then determines, from the at least one polygonal region, a target polygonal region with an irregular shape (as shown in Figure 1c). The target polygonal region is then shape-corrected to obtain a rectangular region with a regular shape (as shown in Figure 1d), and OCR processing is performed on the rectangular region to ensure the accuracy of character recognition. Furthermore, for a polygonal region in the at least one polygonal region that has a regular shape, shape correction is not required; instead, OCR processing is performed directly on that polygonal region. Therefore, this embodiment only performs shape correction on target polygonal regions with irregular shapes, thus ensuring the efficiency of character recognition. In other words, this embodiment can simultaneously ensure both the accuracy and efficiency of character recognition.

[0074] The "regular shape" mentioned in this application refers to a rectangular shape. That is, the arrangement of characters in a polygonal region with a regular shape is regular, i.e., horizontal or vertical. The "irregular shape" refers to a shape other than a regular shape. That is, the arrangement of characters in a polygonal region with an irregular shape is irregular, i.e., an arrangement other than horizontal and vertical, such as an arc-shaped arrangement, a wave-shaped arrangement, a trapezoidal arrangement, or a wedge-shaped arrangement.

[0075] The character recognition method provided in this application can be applied to a character recognition device, which can be installed or integrated into a computer device. The computer device may include terminal devices or servers, and includes, but is not limited to, scanners, electronic devices equipped with image acquisition devices, such as smartphones, cameras, wearable devices, or computers. Image acquisition devices may include cameras, infrared sensors, or ultrasonic sensors. Optionally, after acquiring the character recognition results of an image, the character recognition device can be applied to scenarios such as e-commerce search, e-commerce recommendation, multimodal search, or multimodal recommendation.

[0076] Refer to Figure 2, which is a flowchart illustrating a character recognition method provided in an embodiment of this application. This character recognition method can be executed by a character recognition device or a computer device. The character recognition scheme shown in Figure 2 includes, but is not limited to, steps S201 to S205, wherein:

[0077] S201, perform text detection on the target image to obtain at least one polygonal region.

[0078] Here, the at least one polygonal region refers to the region in the target image where at least one text is located.

[0079] In one implementation, text detection of the target image can be performed using a neural network to obtain at least one polygonal region, where each polygonal region refers to the area in the target image that indicates the location of a text.

[0080] The polygonal region can include at least four sides, such as a quadrilateral or hexagonal region. The polygonal region can be a symmetrical polygon or an asymmetrical polygon, such as a rectangle or a sector.

[0081] The neural network includes, but is not limited to, Differentiable Binarization Networks (DBNet) or DBNet++. DBNet works by first outputting a probability map for text segmentation, then using a binarization threshold learned by the network to transform the probability map into a binary map, and finally obtaining the detection result (i.e., the at least one polygonal region described above) through post-processing. DBNet++ adds an Adaptive Scale Fusion (ASF) module to DBNet. Features at different scales are processed by the ASF module to obtain better fused features. The ASF module also introduces a spatial attention mechanism, making the fused features more robust.
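As a rough illustration of the binarization step, the DBNet paper describes an approximate step function of the form B = 1 / (1 + exp(-k (P - T))), where P is the probability map, T is the learned threshold map, and k is an amplification factor (reported as 50 in that paper). The sketch below applies this formula to plain nested lists; it is not taken from the patent text, and the function name `differentiable_binarization` is illustrative.

```python
from math import exp


def differentiable_binarization(prob_map, thresh_map, k=50.0):
    """Approximate binary map from a probability map and a per-pixel threshold map.

    Both maps are lists of rows with values in [0, 1]; k controls how steep
    the soft step is (assumed default of 50, following the DBNet paper).
    """
    return [
        [1.0 / (1.0 + exp(-k * (p - t))) for p, t in zip(prob_row, thresh_row)]
        for prob_row, thresh_row in zip(prob_map, thresh_map)
    ]


prob_map = [[0.9, 0.3, 0.1]]
thresh_map = [[0.3, 0.3, 0.3]]
binary = differentiable_binarization(prob_map, thresh_map)
```

Values well above the threshold approach 1, values well below approach 0, and a value equal to the threshold gives exactly 0.5. Because the function is smooth, the threshold can be trained together with the rest of the network.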

[0082] In one implementation, text detection in the target image can be performed using regression-based methods to obtain at least one polygonal region. Examples of regression-based methods include TextBoxes++, EAST, DeepReg, and DeRPN. TextBoxes++ can detect text from multiple angles. EAST (Efficient and Accurate Scene Text detector) is a pixel-based scene text detection algorithm. DeepReg is an open-source toolkit for medical image registration using deep learning. DeRPN proposes a dimensionality decomposition region proposal network capable of handling scale issues in scene text detection.

[0083] In one implementation, text detection in a target image can be performed using a component-based approach, yielding at least one polygonal region. Component-based methods include SegLink and SegLink++. The main idea behind SegLink is to decompose text into two locally detectable elements: segments and links. Segments are bounding boxes representing characters or words, and links connect these boxes; final detection is generated by connecting the segments. SegLink++ refers to a bottom-up text detection method that is sensitive to text instances.

[0084] In one implementation, text detection in the target image can be performed based on segmentation methods to obtain at least one polygonal region. Segmentation-based methods include Mask TextSpotter and PSENet. Mask TextSpotter incorporates semantic segmentation into its end-to-end training, and its greatest advantage is its ability to detect text of arbitrary shapes. PSENet is an instance segmentation network capable of locating text of arbitrary shapes, and it proposes a progressive scale expansion algorithm that successfully separates adjacent text instances.

[0085] S202, Determine the target polygon region from at least one polygon region.

[0086] The target polygonal region has an irregular shape. There can be one or more target polygonal regions, and the number of target polygonal regions is less than or equal to the number of polygonal regions obtained from text detection of the target image.

[0087] In one implementation, for any polygonal region in the at least one polygonal region, the minimum bounding rectangle region of that polygonal region is obtained, then the first region area (the area of that polygonal region) and the second region area (the area of the minimum bounding rectangle region) are obtained. If the difference between the first region area and the second region area is greater than the difference threshold, that polygonal region is determined as the target polygonal region.

[0088] In one example, suppose text detection is performed on a target image, resulting in three polygonal regions: a first polygonal region, a second polygonal region, and a third polygonal region. The minimum bounding rectangle region of the first polygonal region is obtained. Then the first region area (the area of the first polygonal region) and the second region area (the area of its minimum bounding rectangle region) are calculated. If the ratio of the first region area to the second region area is greater than a preset value, the two areas are relatively close, meaning the difference between them is less than or equal to the difference threshold. The minimum bounding rectangle region then contains only a small blank area, and the shape of the first polygonal region can be considered regular, so OCR processing is performed directly on the first polygonal region to obtain its character recognition result. If the ratio is less than or equal to the preset value, the second region area is much larger than the first region area, meaning the difference between them is greater than the difference threshold. The minimum bounding rectangle region then contains a large blank area, and the shape of the first polygonal region can be considered irregular, so the first polygonal region is determined as the target polygonal region. The preset value is greater than 0 and less than 1; it can be set based on experience or obtained through neural network learning, for example 0.75.
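The ratio test in this example can be sketched in a few lines. The code below is a minimal illustration under stated assumptions: the polygon area is computed with the shoelace formula, the minimum bounding rectangle is found by trying each convex-hull edge as the rectangle's orientation, and the helper names (`polygon_area`, `convex_hull`, `min_bounding_rect_area`, `is_irregular`) are hypothetical. Only the default of 0.75 comes from the text.

```python
from math import atan2, cos, sin


def polygon_area(pts):
    """Area of a simple polygon given in boundary order (shoelace formula)."""
    total = 0.0
    for i in range(len(pts)):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % len(pts)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def convex_hull(pts):
    """Andrew's monotone chain; returns the hull in counter-clockwise order."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def min_bounding_rect_area(pts):
    """Area of the smallest (possibly rotated) rectangle enclosing pts."""
    hull = convex_hull(pts)
    best = float("inf")
    for i in range(len(hull)):
        (x1, y1), (x2, y2) = hull[i], hull[(i + 1) % len(hull)]
        angle = atan2(y2 - y1, x2 - x1)
        c, s = cos(-angle), sin(-angle)
        # Rotate the hull so this edge is horizontal, then take the axis-aligned box.
        xs = [x * c - y * s for x, y in hull]
        ys = [x * s + y * c for x, y in hull]
        best = min(best, (max(xs) - min(xs)) * (max(ys) - min(ys)))
    return best


def is_irregular(pts, preset=0.75):
    """True if the polygon fills at most `preset` of its minimum bounding rectangle."""
    rect_area = min_bounding_rect_area(pts)
    if rect_area == 0:
        return False  # degenerate polygon; treated as regular in this sketch
    return polygon_area(pts) / rect_area <= preset


rectangle = [(0, 0), (4, 0), (4, 1), (0, 1)]
chevron = [(0, 0), (2, 1), (4, 0), (4, 1), (2, 2), (0, 1)]
flags = (is_irregular(rectangle), is_irregular(chevron))
```

The rectangle fills its bounding rectangle completely and is treated as regular, while the chevron has area 4 inside a bounding rectangle of area 8 (ratio 0.5), so it would be selected as a target polygonal region.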

[0089] In another example, suppose text detection is performed on the target image, resulting in three polygonal regions: a first polygonal region, a second polygonal region, and a third polygonal region. The minimum bounding rectangle region of the first polygonal region is obtained. Then the first region area (the area of the first polygonal region) and the second region area (the area of its minimum bounding rectangle region) are calculated. If the difference between the second region area and the first region area is less than or equal to an area threshold, the two areas are relatively close, meaning the difference is less than or equal to the difference threshold. The minimum bounding rectangle region then contains only a small blank area, and the shape of the first polygonal region can be considered regular, so OCR processing is performed directly on the first polygonal region to obtain its character recognition result. If the difference between the second region area and the first region area is greater than the area threshold, the second region area is much larger than the first region area, meaning the difference is greater than the difference threshold. The minimum bounding rectangle region then contains a large blank area, and the shape of the first polygonal region can be considered irregular, so the first polygonal region is determined as the target polygonal region. The area threshold can be a preset value, which can be set based on experience or learned through a neural network.

[0090] In this embodiment, the method for determining whether the second or third polygonal region is the target polygonal region is the same as the method for determining whether the first polygonal region is the target polygonal region, and will not be repeated here. Furthermore, this embodiment does not limit the order in which the polygonal regions within the at least one polygonal region are checked. For example, the determination of whether each polygonal region is the target polygonal region can be performed in parallel, or the at least one polygonal region can be sorted and the determination performed sequentially according to the sorting.

[0091] S203, determine four target vertices from the multiple vertices of the target polygon region, and use the four target vertices as the four vertices of the rectangular region corresponding to the target polygon region.

[0092] The rectangular region refers to the region obtained after shape correction of the target polygonal region. The rectangular region has a regular shape.

[0093] In a specific implementation, when there is only one target polygon region, four target vertices can be determined from the multiple vertices of the target polygon region. These four target vertices are then used as the four vertices of the rectangular region corresponding to the target polygon region. Based on the position of each vertex of the rectangular region in the target image, the pixel information of each pixel in the target polygon region is mapped to the rectangular region to generate the pixel information of each pixel in the rectangular region.

[0094] For example, suppose text detection is performed on a target image, resulting in three polygonal regions: a first polygonal region, a second polygonal region, and a third polygonal region. The first polygonal region is the target polygonal region, while the second and third polygonal regions have regular shapes. Shape correction can be performed on the first polygonal region by identifying four target vertices from its multiple vertices and using these four target vertices as the four vertices of a corresponding rectangular region. Based on the positions of the vertices of the rectangular region in the target image, the pixel information of each pixel in the first polygonal region is mapped to the rectangular region to generate the pixel information of each pixel in the rectangular region. Then, OCR processing is performed on the rectangular region, the second polygonal region, and the third polygonal region respectively to obtain the character recognition results for each region. This results in the character recognition results of the target image, which include the character recognition results for the rectangular region, the second polygonal region, and the third polygonal region.

[0095] When there are multiple target polygon regions, the following is done for each of them: four target vertices are determined from the multiple vertices of the target polygon region, and these four target vertices are used as the four vertices of the rectangular region corresponding to that target polygon region. Based on the position of each vertex of the rectangular region in the target image, the pixel information of each pixel in that target polygon region is mapped to the rectangular region to generate the pixel information of each pixel in the rectangular region. The resulting rectangular region is the region obtained by shape correction of that target polygon region.

[0096] For example, suppose text detection is performed on a target image, resulting in three polygonal regions: a first polygonal region, a second polygonal region, and a third polygonal region. The first and second polygonal regions are target polygonal regions, while the third polygonal region has a regular shape. Shape correction can be performed on the first polygonal region to obtain its corresponding rectangular region: four target vertices are identified from the multiple vertices of the first polygonal region and used as the four vertices of the rectangular region corresponding to the first polygonal region, and, based on the positions of the vertices of that rectangular region in the target image, the pixel information of each pixel in the first polygonal region is mapped to the rectangular region to generate the pixel information of each pixel in the rectangular region. Shape correction is performed on the second polygonal region in the same way to obtain its corresponding rectangular region. Then, OCR processing is performed on the rectangular region corresponding to the first polygonal region, the rectangular region corresponding to the second polygonal region, and the third polygonal region respectively to obtain the character recognition result of each region, thereby generating the character recognition result of the target image. The character recognition result of the target image includes the character recognition results of the rectangular region corresponding to the first polygonal region, of the rectangular region corresponding to the second polygonal region, and of the third polygonal region.

[0097] The specific method for correcting the shape of the target polygonal region to obtain a rectangular region can be found in the following embodiment description.

[0098] S204, based on the position of each vertex of the rectangular region in the target image, the pixel information of each pixel in the target polygonal region is mapped to the rectangular region to generate the pixel information of each pixel in the rectangular region.

[0099] S205, perform OCR processing on the rectangular region and at least one polygonal region other than the target polygonal region to obtain the character recognition result of the target image.

[0100] Traditional character recognition methods can be used to perform OCR processing on rectangular regions and at least one polygonal region other than the target polygonal region to obtain character recognition results for each region. Optionally, the character recognition results may include text information in the corresponding region, and the text information may include at least one character. The character recognition results may also include the confidence scores of each character within the at least one character. The confidence score of any character refers to the probability that the recognized character is an accurate character.

[0101] For example, traditional character recognition methods may include Convolutional Recurrent Neural Networks (CRNNs), 2D-CTC, ACE, SVTR, SAR, and TrOCR. 2D-CTC addresses the problem that irregular images contain a large amount of background, which is noise for the model. The 1D probability matrix of CTC inevitably introduces noise, so 2D-CTC adds height information to minimize the impact of background noise on the probability matrix. ACE is a sequence recognition algorithm based on cross-entropy loss. SVTR is a scene text detection and recognition algorithm. SAR can be used to recognize irregular text (such as curved characters or artistic fonts). TrOCR is an end-to-end Transformer-based OCR model that utilizes a pre-trained model.

[0102] In this embodiment, text detection is performed on the target image to obtain at least one polygonal region. From the at least one polygonal region, a target polygonal region with an irregular shape is determined. Four target vertices are determined from multiple vertices of the target polygonal region and used as the four vertices of the rectangular region corresponding to the target polygonal region. Based on the position of each vertex of the rectangular region in the target image, the pixel information of each pixel in the target polygonal region is mapped to the rectangular region to generate the pixel information of each pixel in the rectangular region. OCR processing is performed on the rectangular region and the polygonal regions in the at least one polygonal region other than the target polygonal region to obtain the character recognition result of the target image, which can simultaneously take into account the accuracy and efficiency of character recognition.
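The overall flow summarized above can be sketched as a small driver function. This is only an illustration under stated assumptions: the detector, the irregularity test, the shape correction, the cropping step, and the OCR engine are all passed in as hypothetical callables (`detect`, `is_irregular`, `rectify`, `crop`, `ocr`), none of which are named in this application.

```python
from typing import Any, Callable, List


def recognize_image(
    image: Any,
    detect: Callable[[Any], List[Any]],
    is_irregular: Callable[[Any], bool],
    rectify: Callable[[Any, Any], Any],
    crop: Callable[[Any, Any], Any],
    ocr: Callable[[Any], Any],
) -> List[Any]:
    """Run detection, correct only the irregular regions, then OCR every region."""
    results = []
    for polygon in detect(image):
        if is_irregular(polygon):
            # Irregular region: map it to a rectangular region first.
            patch = rectify(image, polygon)
        else:
            # Regular region: pass it to OCR without shape correction.
            patch = crop(image, polygon)
        results.append(ocr(patch))
    return results


# Usage with stand-in callables, only to show the control flow.
demo = recognize_image(
    image="img",
    detect=lambda img: ["regular", "irregular"],
    is_irregular=lambda poly: poly == "irregular",
    rectify=lambda img, poly: "rect(" + poly + ")",
    crop=lambda img, poly: "crop(" + poly + ")",
    ocr=lambda patch: patch.upper(),
)
print(demo)
```

Only the irregular regions pay the cost of shape correction, which is one way to read the stated balance between accuracy and efficiency.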

[0103] Based on the above description, please refer to Figure 3, which is a flowchart illustrating another character recognition method provided in an embodiment of this application. This character recognition method can be executed by a character recognition device or a computer device; for example... The character recognition scheme shown in Figure 3 includes, but is not limited to, steps S301 to S308, wherein:

[0104] S301, perform text detection on the target image to obtain at least one polygonal region.

[0105] S302, determine the target polygon region from the at least one polygon region.

[0106] The steps S301 and S302 can be found in the detailed descriptions of steps S201 and S202 in the above embodiments, and will not be repeated in this application embodiment.

[0107] S303: Obtain the two extreme points of the target polygon region from multiple vertices of the target polygon region.

[0108] The walking distance between the two extreme points (also referred to below as the two poles) is greater than the walking distance between any two vertices other than the two extreme points. The walking distance between two points refers to the distance traveled along the edges of the target polygonal region from one point to the other.

[0109] In one implementation, the Double Max algorithm can be used to obtain the two extreme points of the target polygon region from multiple vertices. Specifically, the perimeter of the target polygon region is first calculated. Then, each vertex V of the target polygon region is traversed, and the vertex U_V that is furthest from vertex V is found along the perimeter (here, distance refers to the walking distance, not the straight-line distance). The walking distance R_V between V and U_V is also obtained. After the traversal, the maximum value of R_V is found among all R_V, along with the corresponding V and U_V. The V and U_V corresponding to the maximum value are taken as the two extreme points of the target polygon region, and the straight-line distance between the found V and U_V is taken as the diameter D of the target polygon region.

[0110] The above implementation can be understood as follows: traverse multiple vertices of the target polygon region, control the first vertex currently being traversed to walk along the edge of the target polygon region, determine the second vertex from among the multiple vertices, and obtain the target walking distance between the first and second vertices currently being traversed. After the traversal is completed, the first and second vertices corresponding to the maximum walking distance among the obtained target walking distances are taken as the two extreme points of the target polygon region.

[0111] Specifically, the target travel distance between the first and second vertices currently traversed is greater than the travel distance between the first vertex and all other vertices except the second vertex. A vertex refers to the intersection of any two edges in the target polygon region.

[0112] Taking the schematic diagram of the target polygonal region shown in Figure 4 as an example, if the target polygonal region has 4 sides, then it has 4 vertices, shown as vertices A, B, C, and D in Figure 4. When traversing vertex A, the walking distance between vertex A and vertex B is calculated as L_AB (i.e., the length of edge 1), the walking distance between vertex A and vertex C is L_AC (i.e., the sum of the lengths of edge 1 and edge 2), and the walking distance between vertex A and vertex D is L_AD (i.e., the length of edge 4), where L_AC > L_AB > L_AD. Therefore, the vertex furthest from vertex A is C; that is, if vertex A is the first vertex, then vertex C is the second vertex. Similarly, when traversing vertex B, the walking distances between vertex B and the other three vertices are calculated to determine the vertex furthest from vertex B, which is the second vertex corresponding to vertex B. Vertex C and vertex D are traversed in the same way to determine the second vertex corresponding to each of them.

[0113] Assuming vertex A is the first vertex and vertex C is the second vertex, the walking distance between vertex A and vertex C is L_AC. When vertex B is the first vertex and vertex D is the second vertex, the walking distance between vertex B and vertex D is L_BD. When vertex C is the first vertex and vertex A is the second vertex, the walking distance between vertex C and vertex A is L_AC. When vertex D is the first vertex and vertex B is the second vertex, the walking distance between vertex D and vertex B is L_BD. Therefore, after the traversal is complete, the obtained target walking distances include L_AC and L_BD. Because L_AC = L_BD, A and C can be taken as the two extreme points of the target polygon region, and the straight-line distance between vertices A and C can be used as the diameter of the target polygon region. Alternatively, B and D can be taken as the two extreme points of the target polygon region, and the straight-line distance between vertices B and D can be used as the diameter of the target polygon region.

[0114] Optionally, when calculating the walking distance between two vertices, one can walk in both clockwise and counterclockwise directions, and take the smaller of the two calculated distances as the walking distance between the two vertices. For example, when traversing vertex A, walk along edge 1 of the target polygon region. When reaching vertex B, calculate the distance between vertex A and vertex B (i.e., the length of edge 1). Then walk along edge 4 of the target polygon region. When reaching vertex B, calculate the distance between vertex A and vertex B (i.e., the sum of the lengths of edge 2, edge 3, and edge 4). Since the sum of the lengths of edge 2, edge 3, and edge 4 is greater than the length of edge 1, the walking distance between vertex A and vertex B is the length of edge 1.
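The extreme-point search described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes the polygon is given as an ordered list of vertices, it uses the optional rule of taking the shorter of the clockwise and counter-clockwise walks, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def walking_distance(edge_lengths: List[float], i: int, j: int) -> float:
    """Shorter of the two walks along the boundary between vertices i and j."""
    lo, hi = min(i, j), max(i, j)
    one_way = sum(edge_lengths[lo:hi])
    return min(one_way, sum(edge_lengths) - one_way)


def find_extreme_points(poly: List[Point]):
    """Return ((index_V, index_U), walking distance, straight-line diameter D)."""
    n = len(poly)
    # edge_lengths[k] is the length of the edge from vertex k to vertex k + 1.
    edge_lengths = [math.dist(poly[k], poly[(k + 1) % n]) for k in range(n)]
    best_walk, best_pair = -1.0, (0, 0)
    for v in range(n):
        # U_V: the vertex with the largest walking distance from V.
        u = max(
            (j for j in range(n) if j != v),
            key=lambda j: walking_distance(edge_lengths, v, j),
        )
        r_v = walking_distance(edge_lengths, v, u)
        if r_v > best_walk:  # ties keep the first pair found
            best_walk, best_pair = r_v, (v, u)
    diameter = math.dist(poly[best_pair[0]], poly[best_pair[1]])
    return best_pair, best_walk, diameter


# Rectangle A, B, C, D with long edges 1 and 3, as in the Figure 4 discussion.
abcd = [(0.0, 0.0), (4.0, 0.0), (4.0, 1.0), (0.0, 1.0)]
pair, walk, diameter = find_extreme_points(abcd)
print(pair, walk, diameter)
```

For this rectangle the pairs (A, C) and (B, D) tie, and the sketch keeps the first one it encounters, which matches the text's remark that either pair may be chosen.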

[0115] S304, based on the two poles of the target polygon region, determine four target vertices from the multiple vertices of the target polygon region.

[0116] In one implementation, the two poles of the target polygon region can be traversed. Based on the straight-line distance between the two poles and the straight-line distance between the currently traversed pole and each vertex of the target polygon region, a target point is determined from multiple vertices of the target polygon region. Then, based on the target point and the currently traversed pole, candidate polygon regions are constructed, and the ratio between the area of the candidate polygon region and the area of its minimum bounding rectangle is obtained. If the ratio is greater than a ratio threshold, candidate vertices are determined from the four vertices of the minimum bounding rectangle of the candidate polygon region. The vertex with the smallest straight-line distance from the candidate vertex among the multiple vertices of the candidate polygon region is taken as the target vertex. The ratio threshold can be a pre-set value, greater than 0 and less than 1. The ratio threshold can be set empirically or learned through a neural network; for example, a ratio threshold of 0.8.
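The area-ratio check and the selection of target vertices for one pole can be sketched as below. The sketch assumes the candidate polygon region has already been constructed and is non-degenerate, and it computes the minimum bounding rectangle by testing each convex-hull edge direction, which is one common technique and not necessarily the one used in this application. All function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def polygon_area(poly: List[Point]) -> float:
    """Shoelace area of a simple polygon."""
    n = len(poly)
    s = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
            for i in range(n))
    return abs(s) / 2.0


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half(seq):
        chain = []
        for p in seq:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]

    return half(pts) + half(reversed(pts))


def min_area_rect(points: List[Point]) -> Tuple[float, List[Point]]:
    """Smallest rectangle with one side along a hull edge: (area, four corners)."""
    hull = convex_hull(points)
    best_area, best_corners = math.inf, []
    for i in range(len(hull)):
        (x1, y1), (x2, y2) = hull[i], hull[(i + 1) % len(hull)]
        theta = math.atan2(y2 - y1, x2 - x1)
        c, s = math.cos(theta), math.sin(theta)
        us = [x * c + y * s for x, y in hull]   # coordinate along the edge
        vs = [-x * s + y * c for x, y in hull]  # coordinate across the edge
        area = (max(us) - min(us)) * (max(vs) - min(vs))
        if area < best_area:
            box = [(min(us), min(vs)), (max(us), min(vs)),
                   (max(us), max(vs)), (min(us), max(vs))]
            # Rotate the box corners back into image coordinates.
            best_corners = [(u * c - v * s, u * s + v * c) for u, v in box]
            best_area = area
    return best_area, best_corners


def pick_target_vertices(region: List[Point], pole: Point, threshold: float = 0.8
                         ) -> Tuple[float, Optional[Tuple[Point, Point]]]:
    """Return (ratio, (C1, C2)), or (ratio, None) when the ratio test fails."""
    rect_area, corners = min_area_rect(region)
    ratio = polygon_area(region) / rect_area
    if ratio <= threshold:
        return ratio, None
    # Candidate vertices B1, B2: the two rectangle corners nearest the pole.
    b1, b2 = sorted(corners, key=lambda p: math.dist(p, pole))[:2]
    # Target vertices: region vertices nearest to B1 and B2.
    c1 = min(region, key=lambda p: math.dist(p, b1))
    c2 = min(region, key=lambda p: math.dist(p, b2))
    return ratio, (c1, c2)


region_y = [(0.0, 0.0), (5.0, 0.0), (5.0, 1.0), (0.0, 1.0)]  # plays the role of A, E, F, D
ratio, targets = pick_target_vertices(region_y, pole=(0.0, 0.0))
print(ratio, targets)
```

In this example the two rectangle corners nearest the pole lie on the short side, so the selected target vertices are the two endpoints of that short side.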

[0117] In the specific implementation, assuming the two poles of the target polygonal region are P1 and P2, and the straight-line distance between P1 and P2 is D (i.e., the diameter of the target polygonal region is D), then four target vertices can be determined from the multiple vertices of the target polygonal region based on P1, P2, and D. Specifically, two target vertices can be determined from the multiple vertices of the target polygonal region based on P1 and D; and the other two target vertices can be determined from the multiple vertices of the target polygonal region based on P2 and D.

[0118] Based on P1 and D, the method to determine two target vertices from multiple vertices of the target polygon region is as follows: Let r = D / 2. First, based on P1, select all points from the edges of the target polygon region whose straight-line distance from P1 is less than r. These selected points form a new polygon Y. Second, find the smallest bounding rectangle X of polygon Y and calculate the ratio j between the area of Y and the area of X. Third, if j > 0.8, select the two vertices B1 and B2 closest to P1 from the four vertices of X. If j <= 0.8, successively select r = D / 3, D / 4, D / 5, D / 6, ..., and repeat steps one and two above. If the condition j > 0.8 is not met even up to r = D / 10, then select Y, B1, and B2 corresponding to the maximum j value. Fourth, find the vertex C1 on Y that is closest to B1 as the target vertex, and the vertex C2 on Y that is closest to B2 as the target vertex.

[0119] The method for determining two additional target vertices from multiple vertices of the target polygon region according to P2 and D is the same as the method for determining two target vertices from multiple vertices of the target polygon region according to P1 and D, and will not be repeated in the embodiments of this application.

[0120] A candidate vertex is closer, in straight-line distance, to the currently traversed pole than the remaining vertices of the minimum bounding rectangle of the candidate polygon region are.

[0121] Taking Figure 4 as an example, suppose the two extreme points of the target polygon region are vertex A and vertex C, and the straight-line distance between vertex A and vertex C is D_AC. When traversing vertex A, the target points are determined from the edges of the target polygon region based on D_AC and on the straight-line distances between vertex A and the points on each edge (edge 1, edge 2, edge 3, and edge 4) of the target polygon region. Specifically, a target point is a point on an edge of the target polygon region whose straight-line distance to vertex A is less than D_AC / 2. On edge 1, the point whose straight-line distance to vertex A equals D_AC / 2 is point E, so the straight-line distance between every point on line segment AE of edge 1 and vertex A is less than D_AC / 2. On edge 3, the point whose straight-line distance to vertex A equals D_AC / 2 is point F, so the straight-line distance between every point on line segment DF of edge 3 and vertex A is less than D_AC / 2, and the straight-line distance between every point on edge 4 and vertex A is less than D_AC / 2. Then, as shown in Figure 4, the candidate polygon region Y can refer to the closed region composed of edges AE, EF, FD, and DA. Assuming the ratio between the area of region Y and the area of its minimum bounding rectangle is greater than the ratio threshold, the two vertices with the smallest straight-line distance from vertex A are determined from the four vertices of the minimum bounding rectangle of Y; these are candidate vertices B1 and B2. Furthermore, since the vertices of Y include A, E, F, and D, the vertex with the smallest straight-line distance from candidate vertex B1 is A, and the vertex with the smallest straight-line distance from candidate vertex B2 is D, the target vertices can be determined to include vertices A and D.

[0122] Similarly, when traversing vertex C, the method for determining the target vertex is the same as when traversing vertex A, and will not be repeated in the embodiments of this application. The target vertex determined when traversing vertex C may include both vertex B and vertex C.

[0123] In one implementation, if the ratio is less than or equal to a ratio threshold, then the execution is triggered to determine candidate vertices from multiple vertices of the target polygon region based on the target walking distance corresponding to the currently traversed pole and the straight-line distance between the currently traversed pole and each vertex of the target polygon region; wherein the number of candidate vertices determined now is less than the number of candidate vertices determined in the previous time.

[0124] For example, if when r = D / 2, the ratio of the calculated area of polygon Y to the area of its minimum bounding rectangle X is less than or equal to the ratio threshold, then we can take r = D / 3 and execute steps one and two above. If, when r = D / 3, the ratio of the calculated area of polygon Y to the area of its minimum bounding rectangle X is less than or equal to the ratio threshold, then we can take r = D / 4 and execute steps one and two above. If, up to r = D / 10, the ratio of the area of polygon Y to the area of its minimum bounding rectangle X is still less than or equal to the ratio threshold, then we take the polygon Y corresponding to the maximum ratio, and select the two vertices B1 and B2 from the four vertices of the minimum bounding rectangle X of polygon Y that are closest to the currently traversed extreme point.
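The shrinking-radius search described above (r = D / 2, D / 3, ..., D / 10, with a fallback to the best ratio seen) can be sketched independently of the geometry. In this sketch the construction of polygon Y and the area-ratio computation are supplied as hypothetical callables `build_region` and `fill_ratio`; the threshold 0.8 and the divisor limit 10 are the example values from the text.

```python
from typing import Any, Callable, Optional, Tuple


def search_candidate_region(
    diameter: float,
    build_region: Callable[[float], Any],
    fill_ratio: Callable[[Any], float],
    threshold: float = 0.8,
    max_divisor: int = 10,
) -> Optional[Tuple[int, Any, float]]:
    """Try r = D/2, D/3, ..., D/max_divisor; return (divisor, region Y, ratio j)."""
    best = None
    for divisor in range(2, max_divisor + 1):
        radius = diameter / divisor
        region = build_region(radius)   # step one: build polygon Y for this r
        ratio = fill_ratio(region)      # step two: area(Y) / area(X)
        if ratio > threshold:
            return divisor, region, ratio
        if best is None or ratio > best[2]:
            best = (divisor, region, ratio)
    # No radius passed the threshold: fall back to the largest ratio seen.
    return best


# Stand-in callables: the "region" is just the radius, and the ratio is scripted.
hit = search_candidate_region(10.0, lambda r: r, lambda r: 0.9 if r <= 2.5 else 0.6)
fallback = search_candidate_region(
    10.0, lambda r: r, lambda r: 0.7 if abs(r - 2.0) < 1e-9 else 0.5
)
print(hit, fallback)
```

The first call stops at r = D / 4 because that is the first radius whose ratio exceeds the threshold; the second never exceeds it and returns the radius with the largest ratio.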

[0125] S305, the four target vertices are used as the four vertices of the rectangular region corresponding to the target polygon region.

[0126] The rectangular region refers to the region obtained after shape correction of the target polygonal region. The rectangular region has a regular shape.

[0127] Optionally, after determining the four target vertices, the spatial relationship of each target vertex in the rectangular region can be identified, that is, which target vertex is the upper left vertex of the rectangular region, which target vertex is the lower left vertex of the rectangular region, which target vertex is the upper right vertex of the rectangular region, and which target vertex is the lower right vertex of the rectangular region.

[0128] In one implementation, the walking distance between any two adjacent target vertices among the four target vertices can be obtained. Based on the walking distance between any two adjacent target vertices among the four target vertices and the spatial relationship of the four target vertices in the target polygon region, the spatial relationship of the four target vertices in the rectangular region can be determined.

[0129] For example, after obtaining the four target vertices, a topological sort can be performed on them, for instance arranging them in clockwise order; as shown in Figure 4, the four target vertices after sorting are A, B, C, and D. Optionally, they can also be arranged in counter-clockwise order. For the four sorted target vertices, the distance (i.e., the walking distance) between each pair of adjacent vertices is calculated to obtain L_AB, L_BC, L_CD, and L_DA. The relative sizes of L_AB, L_BC, L_CD, and L_DA determine the two long sides (i.e., side 1 and side 3) and the two short sides (i.e., side 2 and side 4), and from these the correspondence between the top-left, top-right, bottom-right, and bottom-left corners and A, B, C, and D is determined.

[0130] In Figure 4, sides 1 and 3 are the longer sides and sides 2 and 4 are the shorter sides; the endpoints of side 1 are A and B, the endpoints of side 3 are C and D, the endpoints of side 2 are B and C, and the endpoints of side 4 are A and D. If A is to the left of B in the target polygon region, then A is the top-left vertex of the rectangular region, B is the top-right vertex, C is the bottom-right vertex, and D is the bottom-left vertex. If A is to the right of B in the target polygon region, then C is the top-left vertex, D is the top-right vertex, A is the bottom-right vertex, and B is the bottom-left vertex. If A and B have the same x-coordinate and A is above B in the target polygon region, then A is the top-left vertex, B is the top-right vertex, C is the bottom-right vertex, and D is the bottom-left vertex. If A and B have the same x-coordinate and A is below B in the target polygon region, then C is the top-left vertex, D is the top-right vertex, A is the bottom-right vertex, and B is the bottom-left vertex.
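The corner assignment rules above can be sketched as a small function. Two points are assumptions made for illustration: image coordinates are taken to have x increasing to the right and y increasing downward (so "above" means a smaller y), and the long pair of sides is chosen by comparing the summed lengths of opposite sides, since the text does not fix the exact comparison. The function name is hypothetical.

```python
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def assign_corners(vertices: Sequence[Point],
                   side_lengths: Sequence[float]) -> Dict[str, Point]:
    """vertices: A, B, C, D in clockwise order; side_lengths: L_AB, L_BC, L_CD, L_DA."""
    a, b, c, d = vertices
    l_ab, l_bc, l_cd, l_da = side_lengths
    # Assumption: the opposite pair with the larger total length is the long pair.
    if l_ab + l_cd < l_bc + l_da:
        # Relabel so that the side from a to b is one of the long sides.
        a, b, c, d = b, c, d, a
    # "A before B": A is left of B, or they share x and A is above B.
    a_before_b = a[0] < b[0] or (a[0] == b[0] and a[1] < b[1])
    if a_before_b:
        top_left, top_right, bottom_right, bottom_left = a, b, c, d
    else:
        top_left, top_right, bottom_right, bottom_left = c, d, a, b
    return {
        "top_left": top_left,
        "top_right": top_right,
        "bottom_right": bottom_right,
        "bottom_left": bottom_left,
    }


corners = assign_corners(
    [(0.0, 0.0), (4.0, 0.0), (4.0, 1.0), (0.0, 1.0)], [4.0, 1.0, 4.0, 1.0]
)
# Same rectangle, but the clockwise listing starts at a different vertex.
rotated = assign_corners(
    [(4.0, 0.0), (4.0, 1.0), (0.0, 1.0), (0.0, 0.0)], [1.0, 4.0, 1.0, 4.0]
)
print(corners == rotated)
```

Both listings of the same rectangle produce the same corner assignment, which is the property the rules are meant to provide.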

[0131] S306, based on the position of each vertex of the rectangular region in the target image, the pixel information of each pixel in the target polygonal region is mapped to the rectangular region to generate the pixel information of each pixel in the rectangular region.

[0132] In one implementation, the pixel information of each pixel in the target polygon region can be mapped to the rectangular region based on the spatial relationship of the four target vertices in the rectangular region and the position of each vertex in the rectangular region in the target image, so as to generate the pixel information of each pixel in the rectangular region.

[0133] In one implementation, the width and height of the rectangular region can be obtained based on the walking distance of each pair of adjacent vertices in the target polygonal region; based on the width and height, the coordinates of the mapping points of the four target vertices of the target polygonal region to the rectangular region can be determined; based on the walking distance of each pair of adjacent vertices in the target polygonal region and the spatial relationship of multiple vertices in the target polygonal region, the coordinates of the mapping points of the vertices other than the four target vertices of the target polygonal region to the rectangular region can be determined.

[0134] Taking Figure 4 as an example, assume the target vertices are A, B, C, and D, with the two longer sides being side 1 and side 3, and the two shorter sides being side 2 and side 4. The distance (i.e., the walking distance) between each pair of adjacent target vertices is calculated to obtain L_AB, L_BC, L_CD, and L_DA. The width of the rectangular region is W = max(L_AB, L_CD), and the height of the rectangular region is H = max(L_BC, L_DA). Each pixel in the target polygonal region can be mapped to a pixel in the rectangular region. For example, if A is the top-left vertex of the rectangular region, B is the top-right vertex, C is the bottom-right vertex, and D is the bottom-left vertex, the coordinates of the mapped point of A in the rectangular region can be (0, 0), the coordinates of the mapped point of B can be (W, 0), the coordinates of the mapped point of C can be (W, H), and the coordinates of the mapped point of D can be (0, H).
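As a small worked illustration of the width, height, and corner mapping above, the following sketch takes the four walking distances as inputs and assumes A, B, C, and D are the top-left, top-right, bottom-right, and bottom-left vertices. The function name and the sample distances are hypothetical.

```python
from typing import Dict, Tuple


def rectangle_corner_mapping(
    l_ab: float, l_bc: float, l_cd: float, l_da: float
) -> Tuple[float, float, Dict[str, Tuple[float, float]]]:
    """Return W, H and the mapped coordinates of A, B, C, D in the rectangle."""
    width = max(l_ab, l_cd)    # W = max(L_AB, L_CD)
    height = max(l_bc, l_da)   # H = max(L_BC, L_DA)
    mapping = {
        "A": (0.0, 0.0),
        "B": (width, 0.0),
        "C": (width, height),
        "D": (0.0, height),
    }
    return width, height, mapping


w, h, mapped = rectangle_corner_mapping(42.0, 10.0, 40.0, 11.0)
print(w, h, mapped)
```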

[0135] Furthermore, starting from vertex A and walking along the perimeter of the target polygonal region to vertex B, for each vertex V passed, calculate the walking distance G between V and A. The coordinates of the point mapped to V in the rectangular region are (G, 0). Similarly, starting from B and walking along the perimeter of the target polygonal region to C, for each vertex V passed, calculate the walking distance G between V and B. The coordinates of the point mapped to V in the rectangular region are (W, G). Starting from C and walking along the perimeter of the target polygonal region to D, for each vertex V passed, calculate the walking distance G between V and C. The coordinates of the point mapped to V in the rectangular region are (G, H). Starting from D and walking along the perimeter of the target polygonal region to A, for each vertex V passed, calculate the walking distance G between V and D. The coordinates of the point mapped to V in the rectangular region are (0, G).

[0136] In one implementation, the mapping points from each vertex in the target polygonal region to the rectangular region can be determined. Based on the multiple vertices of the target polygonal region and their spatial relationships within the region, the target polygonal region is triangulated to obtain multiple first triangular regions. Based on the multiple mapping points in the rectangular region and their spatial relationships within the region, the rectangular region is triangulated to obtain multiple second triangular regions. A one-to-one correspondence exists between the multiple first triangular regions and the multiple second triangular regions. The multiple first triangular regions are traversed, and for any pixel in the currently traversed first triangular region, a target pixel corresponding to that pixel is determined from the second triangular region corresponding to the currently traversed first triangular region, and the pixel information of the target pixel is mapped to that pixel. After the traversal is complete, the pixel information of each pixel in the rectangular region is generated.

[0137] For example, based on all the vertices of the target polygon region and the spatial relationships between them, the target polygon region can be divided into several triangular regions. Figure 4 For example, assuming the target polygonal region includes four vertices, A, B, C, and D, it can be divided into two triangular regions: triangle ABD and triangle BCD. Since every vertex V in the target polygonal region has a corresponding point N in the rectangular region, the triangular division of the rectangular region can be determined in the same way. For instance, if vertices V1, V2, and V3 in the target polygonal region form a triangle, then the mapping points N1 (corresponding to V1), N2 (corresponding to V2), and N3 (corresponding to V3) in the rectangular region also form a triangle. The total number of triangles T_s in the target polygonal region equals the number of triangles T_t in the rectangular region. After determining T_s and T_t and their pairings, all triangle pairs can be traversed, and the color value corresponding to each pixel in the rectangular region can be filled using a backward method. For example, if a pixel n in a rectangular region is located in triangle T3_t, and the triangle in the target polygon region corresponding to T3_t is T3_s, then the point v corresponding to n can be found from triangle T3_s in the target polygon region. The RGB color value corresponding to point v can be directly copied to pixel n to generate the pixel information of pixel n in the rectangular region.

[0138] In one implementation, if the coordinates of the target pixel corresponding to any pixel are not integers, an interpolation method is called to calculate the pixel information of that pixel based on the pixel information of the target pixel; if the coordinates of the target pixel are integers, the pixel information of the target pixel is used as the pixel information of that pixel.

[0139] For example, if a pixel n in the rectangular region is located in triangle T3_t, and the triangle in the target polygonal region corresponding to T3_t is T3_s, then the point v corresponding to n can be found in triangle T3_s of the target polygonal region. If the coordinates of point v are integers, the RGB color value of point v is directly copied to pixel n. If the coordinates of point v are not integers, the color value of pixel n is calculated using bilinear interpolation.
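The backward fill for one triangle pair can be sketched as follows. This is an illustration under stated assumptions: the correspondence between a rectangle pixel n and its source point v is computed with barycentric coordinates (one standard way to map between paired triangles, not confirmed as the method of this application), the image is a single-channel list of rows rather than RGB, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def barycentric(p: Point, tri: Sequence[Point]) -> Tuple[float, float, float]:
    """Barycentric weights of point p with respect to triangle tri."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    w1 = ((y2 - y3) * (p[0] - x3) + (x3 - x2) * (p[1] - y3)) / det
    w2 = ((y3 - y1) * (p[0] - x3) + (x1 - x3) * (p[1] - y3)) / det
    return w1, w2, 1.0 - w1 - w2


def sample(img: List[List[float]], x: float, y: float) -> float:
    """Copy the value at integer coordinates, otherwise interpolate bilinearly."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    if x == int(x) and y == int(y):
        return img[int(y)][int(x)]
    x0, y0 = int(x), int(y)  # floor, since x and y are non-negative here
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def fill_triangle(src, dst, tri_s: Sequence[Point], tri_t: Sequence[Point]) -> None:
    """For each dst pixel n inside tri_t, find v in tri_s and copy its value."""
    xs = [p[0] for p in tri_t]
    ys = [p[1] for p in tri_t]
    for row in range(max(0, math.floor(min(ys))), min(len(dst) - 1, math.ceil(max(ys))) + 1):
        for col in range(max(0, math.floor(min(xs))), min(len(dst[0]) - 1, math.ceil(max(xs))) + 1):
            weights = barycentric((col, row), tri_t)
            if min(weights) < -1e-9:
                continue  # pixel n lies outside this triangle
            vx = sum(wt * p[0] for wt, p in zip(weights, tri_s))
            vy = sum(wt * p[1] for wt, p in zip(weights, tri_s))
            dst[row][col] = sample(src, vx, vy)


source = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
target = [[0, 0], [0, 0]]
fill_triangle(source, target,
              tri_s=[(1, 1), (2, 1), (1, 2)],   # triangle in the polygon region
              tri_t=[(0, 0), (1, 0), (0, 1)])   # paired triangle in the rectangle
print(target)
```

Iterating over destination pixels rather than source pixels means every pixel of the rectangular region inside a triangle receives exactly one value, with no holes.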

[0140] S307, Perform OCR processing on the rectangular region and at least one polygonal region other than the target polygonal region to obtain the character recognition result of the target image.

[0141] In this embodiment, two extreme points of the target polygonal region are obtained from the multiple vertices of the target polygonal region. Based on these two extreme points, four target vertices are determined from the multiple vertices of the target polygonal region and are used as the four vertices of the rectangular region corresponding to the target polygonal region. Based on the position of each vertex of the rectangular region in the target image, the pixel information of each pixel in the target polygonal region is mapped to the rectangular region to generate the pixel information of each pixel in the rectangular region. In this way, an irregularly shaped target polygonal region can be shape-corrected efficiently and conveniently to obtain its corresponding rectangular region.

[0142] This application also provides a computer storage medium storing program instructions, which, when executed, are used to implement the corresponding methods described in the above embodiments.

[0143] Please refer to Figure 5, which is a schematic diagram of the structure of a character recognition device provided in an embodiment of this application.

[0144] In one implementation of the character recognition device according to the embodiments of this application, the character recognition device includes the following structure.

[0145] The text detection unit 501 is used to perform text detection on the target image to obtain at least one polygonal region; wherein, the at least one polygonal region refers to the region where at least one text is located in the target image;

[0146] The region determination unit 502 is used to determine a target polygon region from the at least one polygon region; wherein the region shape of the target polygon region is an irregular shape;

[0147] Vertex determination unit 503 is used to determine four target vertices from multiple vertices of the target polygonal region, and to use the four target vertices as the four vertices of the rectangular region corresponding to the target polygonal region; wherein, the rectangular region refers to the region obtained after shape correction of the target polygonal region, and the shape of the rectangular region is a regular shape;

[0148] The mapping unit 504 is used to map the pixel information of each pixel in the target polygonal region to the rectangular region based on the position of each vertex of the rectangular region in the target image, so as to generate the pixel information of each pixel in the rectangular region.

[0149] The character recognition unit 505 is used to perform OCR processing on the rectangular region and the polygonal regions other than the target polygonal region in the at least one polygonal region to obtain the character recognition result of the target image.
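How units 501 to 505 cooperate can be sketched as a small pipeline object. All field names, and in particular the `crop` helper for regular regions, are hypothetical placeholders; the application does not prescribe this interface.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Polygon = Sequence[Point]


@dataclass
class CharacterRecognitionDevice:
    detect_text: Callable[[Any], List[Polygon]]               # unit 501
    is_irregular: Callable[[Polygon], bool]                   # unit 502
    pick_corners: Callable[[Polygon], Sequence[Point]]        # unit 503
    rectify: Callable[[Any, Polygon, Sequence[Point]], Any]   # unit 504
    run_ocr: Callable[[Any], str]                             # unit 505
    crop: Callable[[Any, Polygon], Any]  # assumed helper for regular regions

    def recognize(self, image: Any) -> List[str]:
        """Rectify only irregular regions, then run OCR on every region."""
        results = []
        for polygon in self.detect_text(image):
            if self.is_irregular(polygon):
                corners = self.pick_corners(polygon)
                region = self.rectify(image, polygon, corners)
            else:
                region = self.crop(image, polygon)
            results.append(self.run_ocr(region))
        return results


# Example wiring with stub callables standing in for real models.
device = CharacterRecognitionDevice(
    detect_text=lambda img: [
        [(0, 0), (1, 0), (1, 1), (0, 1)],
        [(0, 0), (2, 1), (4, 0), (4, 2), (2, 3), (0, 2)],
    ],
    is_irregular=lambda poly: len(poly) > 4,
    pick_corners=lambda poly: [poly[0], poly[2], poly[3], poly[5]],
    rectify=lambda img, poly, corners: "rectified",
    run_ocr=lambda region: region.upper(),
    crop=lambda img, poly: "cropped",
)
texts = device.recognize(None)
```

Only the irregular regions pay the cost of shape correction, which reflects the balance between accuracy and efficiency described for this device.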

[0150] In one embodiment, the vertex determination unit 503 determines four target vertices from a plurality of vertices in the target polygon region, including:

[0151] Two extreme points of the target polygon region are obtained from multiple vertices of the target polygon region; wherein the walking distance between the two extreme points is greater than the walking distance between any vertex of the multiple vertices and any other vertex of the multiple vertices except the two extreme points.

[0152] Based on the two extreme points of the target polygon region, the four target vertices are determined from the multiple vertices of the target polygon region.

[0153] In one embodiment, the vertex determination unit 503 obtains two extreme points of the target polygon region from multiple vertices of the target polygon region, including:

[0154] Traverse multiple vertices of the target polygon region, control the first vertex currently being traversed to walk along the edge of the target polygon region, determine the second vertex from the multiple vertices, and obtain the target walking distance between the first vertex currently being traversed and the second vertex; wherein, the target walking distance between the first vertex currently being traversed and the second vertex is greater than the walking distance between the first vertex currently being traversed and the vertices other than the second vertex among the multiple vertices;

[0155] After the traversal is completed, the first and second vertices corresponding to the maximum walking distance in the obtained target walking distance are taken as the two extreme points of the target polygon region.
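The traversal in the two steps above can be sketched as follows. The sketch assumes that the walking distance between two vertices is the shorter of the two paths along the closed boundary; that reading is an assumption, and the function names are hypothetical.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cumulative_lengths(vertices: Sequence[Point]) -> List[float]:
    """prefix[i] is the boundary length from vertex 0 to vertex i;
    prefix[-1] is the full perimeter."""
    prefix = [0.0]
    n = len(vertices)
    for i in range(n):
        prefix.append(prefix[-1] + dist(vertices[i], vertices[(i + 1) % n]))
    return prefix


def walking_distance(prefix: Sequence[float], i: int, j: int) -> float:
    """Shorter of the two boundary paths between vertices i and j (assumed)."""
    one_way = abs(prefix[j] - prefix[i])
    return min(one_way, prefix[-1] - one_way)


def extreme_points(vertices: Sequence[Point]) -> Tuple[int, int, float]:
    """Return (first index, second index, walking distance) of the vertex
    pair with the largest walking distance."""
    prefix = cumulative_lengths(vertices)
    n = len(vertices)
    best = (-1.0, 0, 0)
    for first in range(n):
        # Second vertex: the one farthest from `first` along the boundary.
        second = max(
            (j for j in range(n) if j != first),
            key=lambda j: walking_distance(prefix, first, j),
        )
        target = walking_distance(prefix, first, second)
        if target > best[0]:
            best = (target, first, second)
    return best[1], best[2], best[0]


# Example: a 10 x 2 rectangle; the extreme points are opposite corners.
result = extreme_points([(0, 0), (10, 0), (10, 2), (0, 2)])
```

Ties are resolved in favour of the pair found first; a real implementation may need a different tie-breaking rule.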

[0156] In one embodiment, the vertex determination unit 503 determines the four target vertices from a plurality of vertices of the target polygon region based on the two extreme points of the target polygon region, including:

[0157] Traverse the two extreme points of the target polygon region, and determine the target point from the edges of the target polygon region based on the straight-line distance between the two extreme points and the straight-line distance between the currently traversed extreme point and each vertex of the target polygon region;

[0158] Based on the target point and the currently traversed extreme point, construct a candidate polygon region;

[0159] Obtain the ratio between the area of the candidate polygon region and the area of the minimum bounding rectangle region of the candidate polygon region;

[0160] If the ratio is greater than the ratio threshold, determine a candidate vertex from the four vertices of the minimum bounding rectangle region of the candidate polygon region; wherein the straight-line distance between the candidate vertex and the currently traversed extreme point is less than the straight-line distance between the currently traversed extreme point and each of the other three vertices of the minimum bounding rectangle region;

[0161] The vertex with the smallest straight-line distance from the candidate vertex among the multiple vertices of the candidate polygon region is taken as the target vertex.
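The area-ratio check and the two nearest-point selections can be sketched as below. The sketch takes the four corners of the minimum bounding rectangle as an input computed elsewhere, and the names `shoelace_area` and `pick_target_vertex` are hypothetical.

```python
from math import dist
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def shoelace_area(polygon: Sequence[Point]) -> float:
    """Area of a simple polygon whose vertices are given in boundary order."""
    n = len(polygon)
    twice = sum(
        polygon[i][0] * polygon[(i + 1) % n][1]
        - polygon[(i + 1) % n][0] * polygon[i][1]
        for i in range(n)
    )
    return abs(twice) / 2.0


def pick_target_vertex(
    candidate_polygon: Sequence[Point],
    rect_corners: Sequence[Point],   # assumed: precomputed bounding rectangle
    extreme_point: Point,            # the currently traversed extreme point (pole)
    ratio_threshold: float,
) -> Optional[Point]:
    """Return the target vertex, or None when the ratio test fails."""
    ratio = shoelace_area(candidate_polygon) / shoelace_area(rect_corners)
    if ratio <= ratio_threshold:
        return None
    # Candidate vertex: rectangle corner closest to the extreme point.
    candidate = min(rect_corners, key=lambda c: dist(c, extreme_point))
    # Target vertex: polygon vertex closest to that corner.
    return min(candidate_polygon, key=lambda v: dist(v, candidate))


# Example: a pentagon filling 7/8 of its 4 x 2 bounding rectangle.
poly = [(0, 0.5), (1, 0), (4, 0), (4, 2), (1, 2)]
rect = [(0, 0), (4, 0), (4, 2), (0, 2)]
accepted = pick_target_vertex(poly, rect, (0, 0.5), 0.8)
rejected = pick_target_vertex(poly, rect, (0, 0.5), 0.9)
```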

[0162] In one embodiment, the mapping unit 504 is further configured to:

[0163] Obtain the walking distance between any two adjacent target vertices among the four target vertices;

[0164] Based on the walking distance between any two adjacent target vertices among the four target vertices, and the spatial relationship of the four target vertices in the target polygonal region, the spatial relationship of the four target vertices in the rectangular region is determined;

[0165] The mapping unit 504 maps the pixel information of each pixel in the target polygon region to the rectangular region based on the positions of each vertex of the rectangular region in the target image, thereby generating the pixel information of each pixel in the rectangular region, including:

[0166] Based on the spatial relationship of the four target vertices in the rectangular region and the position of each vertex of the rectangular region in the target image, the pixel information of each pixel in the target polygonal region is mapped to the rectangular region to generate the pixel information of each pixel in the rectangular region.

[0167] In one embodiment, the mapping unit 504 maps pixel information of each pixel in the target polygonal region to the rectangular region based on the positions of each vertex of the rectangular region in the target image, including:

[0168] Determine the mapping points from each vertex in the target polygonal region to the rectangular region;

[0169] Based on the multiple vertices of the target polygon region and the spatial relationship between the multiple vertices of the target polygon region in the target polygon region, the target polygon region is triangulated to obtain multiple first triangle regions corresponding to the target polygon region;

[0170] Based on multiple mapping points in the rectangular region and the spatial relationship between these mapping points within the rectangular region, the rectangular region is triangulated to obtain multiple second triangular regions corresponding to the rectangular region; wherein, there is a one-to-one correspondence between the multiple first triangular regions and the multiple second triangular regions.

[0171] Traverse the plurality of first triangular regions, for any pixel in the currently traversed first triangular region, determine the target pixel corresponding to the pixel in the second triangular region corresponding to the currently traversed first triangular region, and map the pixel information of the target pixel to the pixel;

[0172] After the traversal is completed, pixel information of each pixel in the rectangular region is generated.
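One simple way to obtain the one-to-one correspondence between first and second triangular regions is to triangulate by vertex index and apply the same index triples to both the polygon vertices and their mapping points. The strip layout below assumes the vertices are listed in boundary order with equally many vertices on the upper and lower boundaries; this layout and the function names are assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]


def strip_triangles(n_vertices: int) -> List[Tuple[int, int, int]]:
    """Index triples for a strip triangulation of a polygon whose vertices
    run along the upper boundary and then back along the lower boundary."""
    if n_vertices < 4 or n_vertices % 2:
        raise ValueError("expected an even number of vertices, at least 4")
    half = n_vertices // 2
    triples = []
    for i in range(half - 1):
        lower = n_vertices - 1 - i  # vertex opposite to vertex i
        triples.append((i, i + 1, lower))
        triples.append((i + 1, lower - 1, lower))
    return triples


def paired_triangles(
    polygon: Sequence[Point], mapping_points: Sequence[Point]
) -> List[Tuple[Triangle, Triangle]]:
    """Pair each first (polygon) triangle with its second (rectangle) triangle."""
    if len(polygon) != len(mapping_points):
        raise ValueError("every vertex needs exactly one mapping point")
    pairs = []
    for a, b, c in strip_triangles(len(polygon)):
        first = (polygon[a], polygon[b], polygon[c])
        second = (mapping_points[a], mapping_points[b], mapping_points[c])
        pairs.append((first, second))
    return pairs


# Example: a quadrilateral A, B, C, D yields triangles ABD and BCD.
quad_triples = strip_triangles(4)
hexagon_triples = strip_triangles(6)
```

Sharing the index triples guarantees that the two triangulations have the same number of triangles and that paired triangles list their vertices in matching order.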

[0173] In one embodiment, the mapping unit 504 maps the pixel information of the target pixel to any pixel, including:

[0174] If the coordinates of the target pixel corresponding to the pixel are not integers, an interpolation method is called to calculate the pixel information of the pixel based on the pixel information of the target pixel;

[0175] If the coordinates of the target pixel corresponding to the pixel are integers, the pixel information of the target pixel is used as the pixel information of the pixel.

[0176] In one embodiment, the mapping unit 504 determines the mapping points from each vertex in the target polygonal region to the rectangular region, including:

[0177] Based on the walking distance of each pair of adjacent vertices of the rectangular region in the target polygonal region, the width and height of the rectangular region are obtained;

[0178] Based on the width and height, determine the coordinates of the mapping points in the rectangular region to the four target vertices of the target polygonal region;

[0179] Based on the walking distance between every two adjacent vertices of the target polygon region and the spatial relationship between multiple vertices of the target polygon region, the coordinates of the mapping points in the rectangular region are determined for the vertices other than the four target vertices in the target polygon region.
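The three steps above can be sketched as follows. The sketch assumes that the width and height are the averages of the walking distances of opposite sides and that each remaining vertex is placed along its rectangle edge in proportion to the distance walked; both choices, and the name `mapping_points`, are assumptions.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def path_lengths(vertices: Sequence[Point], start: int, end: int) -> List[float]:
    """Cumulative walking distance from vertex `start` to each vertex up to
    `end`, moving forward along the closed boundary."""
    n = len(vertices)
    lengths = [0.0]
    i = start
    while i != end:
        j = (i + 1) % n
        lengths.append(lengths[-1] + dist(vertices[i], vertices[j]))
        i = j
    return lengths


def mapping_points(
    vertices: Sequence[Point], corners: Tuple[int, int, int, int]
) -> Tuple[float, float, List[Point]]:
    """corners: indices of the four target vertices in boundary order
    (top-left, top-right, bottom-right, bottom-left)."""
    n = len(vertices)
    ends = list(corners[1:]) + [corners[0]]
    sides = [path_lengths(vertices, a, b) for a, b in zip(corners, ends)]
    width = (sides[0][-1] + sides[2][-1]) / 2.0   # assumed: mean of top/bottom
    height = (sides[1][-1] + sides[3][-1]) / 2.0  # assumed: mean of left/right
    rect = [(0.0, 0.0), (width, 0.0), (width, height), (0.0, height)]
    mapped = {}
    for k, side in enumerate(sides):
        p, q = rect[k], rect[(k + 1) % 4]
        total = side[-1]
        # The last vertex of a side is handled as the first vertex of the next.
        for step, walked in enumerate(side[:-1]):
            t = walked / total
            index = (corners[k] + step) % n
            mapped[index] = (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))
    return width, height, [mapped[i] for i in range(n)]


# Example: a bent band with six vertices; corners are vertices 0, 2, 3, 5.
band = [(0, 0), (3, 4), (6, 0), (6, 2), (3, 6), (0, 2)]
w, h, points = mapping_points(band, (0, 2, 3, 5))
```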

[0180] In one embodiment, the region determination unit 502 determines a target polygon region from the at least one polygon region, including:

[0181] For any polygon region among the at least one polygon region, obtain the minimum bounding rectangle region of that polygon region;

[0182] Obtain a first region area of that polygon region and a second region area of its minimum bounding rectangle region;

[0183] If the difference between the first region area and the second region area is greater than the difference threshold, that polygon region is determined as the target polygon region.
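The irregularity test can be sketched in pure Python. The minimum bounding rectangle is found here by testing the direction of every convex hull edge, which is a standard technique substituted for illustration; the application does not state how the rectangle is computed. The difference is taken as rectangle area minus polygon area, and all function names are hypothetical.

```python
from math import hypot
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def shoelace_area(polygon: Sequence[Point]) -> float:
    n = len(polygon)
    twice = sum(
        polygon[i][0] * polygon[(i + 1) % n][1]
        - polygon[(i + 1) % n][0] * polygon[i][1]
        for i in range(n)
    )
    return abs(twice) / 2.0


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain; returns the hull in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half(sequence):
        chain: List[Point] = []
        for p in sequence:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]

    return half(pts) + half(reversed(pts))


def min_bounding_rect_area(points: Sequence[Point]) -> float:
    """Smallest area of a rectangle with one side parallel to a hull edge."""
    hull = convex_hull(points)
    if len(hull) < 3:
        return 0.0
    best = float("inf")
    for i in range(len(hull)):
        (x1, y1), (x2, y2) = hull[i], hull[(i + 1) % len(hull)]
        length = hypot(x2 - x1, y2 - y1)
        ux, uy = (x2 - x1) / length, (y2 - y1) / length
        along = [x * ux + y * uy for x, y in hull]
        across = [-x * uy + y * ux for x, y in hull]
        best = min(best, (max(along) - min(along)) * (max(across) - min(across)))
    return best


def is_irregular(polygon: Sequence[Point], diff_threshold: float) -> bool:
    """True when the rectangle area exceeds the polygon area by more than
    the threshold."""
    return min_bounding_rect_area(polygon) - shoelace_area(polygon) > diff_threshold


# Example: a rectangle is regular, a bent band is not.
regular = is_irregular([(0, 0), (4, 0), (4, 2), (0, 2)], 1.0)
bent = is_irregular([(0, 0), (3, 4), (6, 0), (6, 2), (3, 6), (0, 2)], 5.0)
```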

[0184] In this embodiment, the text detection unit 501 performs text detection on the target image to obtain at least one polygonal region. The region determination unit 502 determines a target polygonal region with an irregular shape from the at least one polygonal region. The vertex determination unit 503 determines four target vertices from multiple vertices of the target polygonal region and uses the four target vertices as the four vertices of the rectangular region corresponding to the target polygonal region. The mapping unit 504 maps the pixel information of each pixel in the target polygonal region to the rectangular region based on the position of each vertex of the rectangular region in the target image to generate the pixel information of each pixel in the rectangular region. The character recognition unit 505 performs OCR processing on the rectangular region and the polygonal regions in the at least one polygonal region other than the target polygonal region to obtain the character recognition result of the target image, which can simultaneously take into account the accuracy and efficiency of character recognition.

[0185] Please refer to Figure 6, which is a schematic diagram of the structure of a computer device provided in an embodiment of this application. In addition to structures such as a power supply module, the computer device in this embodiment includes a processor 601, a storage device 602, and a communication interface 603. The processor 601, the storage device 602, and the communication interface 603 can exchange data, and the processor 601 implements the corresponding target detection method.

[0186] Storage device 602 may include volatile memory, such as random-access memory (RAM); storage device 602 may also include non-volatile memory, such as flash memory, solid-state drive (SSD), etc.; storage device 602 may also include combinations of the above types of memory.

[0187] Processor 601 may be a central processing unit (CPU). Processor 601 may also be a combination of CPU and GPU. In a server, multiple CPUs and GPUs may be included as needed for corresponding data processing. In one embodiment, storage device 602 is used to store program instructions. Processor 601 can invoke program instructions to implement the various methods described above in the embodiments of this application.

[0188] In a first possible implementation, the processor 601 of the computer device calls program instructions stored in the storage device 602 to perform text detection on the target image to obtain at least one polygonal region; wherein, the at least one polygonal region refers to the region where at least one text is located in the target image; a target polygonal region is determined from the at least one polygonal region; wherein, the region shape of the target polygonal region is irregular; four target vertices are determined from multiple vertices of the target polygonal region, and the four target vertices are used as the four vertices of a rectangular region corresponding to the target polygonal region; wherein, the rectangular region refers to the region obtained after shape correction of the target polygonal region, and the region shape of the rectangular region is regular; based on the position of each vertex of the rectangular region in the target image, the pixel information of each pixel in the target polygonal region is mapped to the rectangular region to generate the pixel information of each pixel in the rectangular region; OCR processing is performed on the rectangular region and the polygonal regions in the at least one polygonal region other than the target polygonal region to obtain the character recognition result of the target image.

[0189] In one embodiment, when the processor 601 determines four target vertices from a plurality of vertices in the target polygon region, it may perform the following operations:

[0190] Two extreme points of the target polygon region are obtained from multiple vertices of the target polygon region; wherein the walking distance between the two extreme points is greater than the walking distance between any vertex of the multiple vertices and any other vertex of the multiple vertices except the two extreme points.

[0191] Based on the two extreme points of the target polygon region, the four target vertices are determined from the multiple vertices of the target polygon region.

[0192] In one embodiment, when the processor 601 obtains two extreme points of the target polygon region from multiple vertices of the target polygon region, it may perform the following operations:

[0193] Traverse multiple vertices of the target polygon region, control the first vertex currently being traversed to walk along the edge of the target polygon region, determine the second vertex from the multiple vertices, and obtain the target walking distance between the first vertex currently being traversed and the second vertex; wherein, the target walking distance between the first vertex currently being traversed and the second vertex is greater than the walking distance between the first vertex currently being traversed and the vertices other than the second vertex among the multiple vertices;

[0194] After the traversal is completed, the first and second vertices corresponding to the maximum walking distance in the obtained target walking distance are taken as the two extreme points of the target polygon region.

[0195] In one embodiment, when the processor 601 determines the four target vertices from a plurality of vertices of the target polygon region based on the two extreme points of the target polygon region, it may perform the following operations:

[0196] Traverse the two extreme points of the target polygon region, and determine the target point from the edges of the target polygon region based on the straight-line distance between the two extreme points and the straight-line distance between the currently traversed extreme point and each vertex of the target polygon region;

[0197] Based on the target point and the currently traversed extreme point, construct a candidate polygon region;

[0198] Obtain the ratio between the area of the candidate polygon region and the area of the minimum bounding rectangle region of the candidate polygon region;

[0199] If the ratio is greater than the ratio threshold, determine a candidate vertex from the four vertices of the minimum bounding rectangle region of the candidate polygon region; wherein the straight-line distance between the candidate vertex and the currently traversed extreme point is less than the straight-line distance between the currently traversed extreme point and each of the other three vertices of the minimum bounding rectangle region;

[0200] The vertex with the smallest straight-line distance from the candidate vertex among the multiple vertices of the candidate polygon region is taken as the target vertex.

[0201] In one embodiment, the processor 601 is further configured to perform the following operations:

[0202] Obtain the walking distance between any two adjacent target vertices among the four target vertices;

[0203] Based on the walking distance between any two adjacent target vertices among the four target vertices, and the spatial relationship of the four target vertices in the target polygonal region, the spatial relationship of the four target vertices in the rectangular region is determined;

[0204] When the processor 601 maps the pixel information of each pixel in the target polygonal region to the rectangular region based on the position of each vertex of the rectangular region in the target image, in order to generate the pixel information of each pixel in the rectangular region, it may perform the following operations:

[0205] Based on the spatial relationship of the four target vertices in the rectangular region and the position of each vertex of the rectangular region in the target image, the pixel information of each pixel in the target polygonal region is mapped to the rectangular region to generate the pixel information of each pixel in the rectangular region.

[0206] In one embodiment, when the processor 601 maps pixel information of each pixel in the target polygonal region to the rectangular region based on the positions of each vertex of the rectangular region in the target image, it may perform the following operations:

[0207] Determine the mapping points from each vertex in the target polygonal region to the rectangular region;

[0208] Based on the multiple vertices of the target polygon region and the spatial relationship between the multiple vertices of the target polygon region in the target polygon region, the target polygon region is triangulated to obtain multiple first triangle regions corresponding to the target polygon region;

[0209] Based on multiple mapping points in the rectangular region and the spatial relationship between these mapping points within the rectangular region, the rectangular region is triangulated to obtain multiple second triangular regions corresponding to the rectangular region; wherein, there is a one-to-one correspondence between the multiple first triangular regions and the multiple second triangular regions.

[0210] Traverse the plurality of first triangular regions, for any pixel in the currently traversed first triangular region, determine the target pixel corresponding to the pixel in the second triangular region corresponding to the currently traversed first triangular region, and map the pixel information of the target pixel to the pixel;

[0211] After the traversal is completed, pixel information of each pixel in the rectangular region is generated.

[0212] In one embodiment, when the processor 601 maps the pixel information of the target pixel to any pixel, it may perform the following operations:

[0213] If the coordinates of the target pixel corresponding to the pixel are not integers, an interpolation method is called to calculate the pixel information of the pixel based on the pixel information of the target pixel;

[0214] If the coordinates of the target pixel corresponding to the pixel are integers, the pixel information of the target pixel is used as the pixel information of the pixel.

[0215] In one embodiment, when the processor 601 determines the mapping points of each vertex in the target polygonal region to the rectangular region, it may perform the following operations:

[0216] Based on the walking distance of each pair of adjacent vertices of the rectangular region in the target polygonal region, the width and height of the rectangular region are obtained;

[0217] Based on the width and height, determine the coordinates of the mapping points in the rectangular region to the four target vertices of the target polygonal region;

[0218] Based on the walking distance between every two adjacent vertices of the target polygon region and the spatial relationship between multiple vertices of the target polygon region, the coordinates of the mapping points in the rectangular region are determined for the vertices other than the four target vertices in the target polygon region.

[0219] In one embodiment, when the processor 601 determines the target polygon region from the at least one polygon region, it may perform the following operations:

[0220] For any polygon region among the at least one polygon region, obtain the minimum bounding rectangle region of that polygon region;

[0221] Obtain a first region area of that polygon region and a second region area of its minimum bounding rectangle region;

[0222] If the difference between the first region area and the second region area is greater than the difference threshold, that polygon region is determined as the target polygon region.

[0223] In this embodiment, the processor 601 performs text detection on the target image to obtain at least one polygonal region. From the at least one polygonal region, it determines a target polygonal region with an irregular shape. From the multiple vertices of the target polygonal region, it determines four target vertices and uses these four target vertices as the four vertices of the rectangular region corresponding to the target polygonal region. Based on the position of each vertex of the rectangular region in the target image, it maps the pixel information of each pixel in the target polygonal region to the rectangular region to generate the pixel information of each pixel in the rectangular region. OCR processing is performed on the rectangular region and the polygonal regions in the at least one polygonal region other than the target polygonal region to obtain the character recognition result of the target image, which can simultaneously take into account the accuracy and efficiency of character recognition.

[0224] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a computer-readable storage medium, and when executed, it can include the processes of the embodiments of the methods described above. The computer-readable storage medium can be a magnetic disk, optical disk, read-only memory (ROM), or random access memory (RAM), etc. The computer-readable storage medium can mainly include a program storage area and a data storage area. The program storage area can store the operating system, at least one application program required for a function, etc.; the data storage area can store data created based on the use of blockchain nodes, etc.

[0225] The above-disclosed embodiments are merely some of the embodiments of this application, and should not be construed as limiting the scope of this application. Those skilled in the art can understand that implementing all or part of the above embodiments and making equivalent changes in accordance with the claims of this application still fall within the scope of this invention.

Claims

1. A character recognition method, characterized in that it comprises:
Performing text detection on a target image to obtain at least one polygonal region; wherein the at least one polygonal region refers to the region where at least one text is located in the target image;
Determining a target polygon region from the at least one polygon region; wherein the shape of the target polygon region is an irregular shape;
Obtaining two extreme points of the target polygon region from multiple vertices of the target polygon region; wherein the walking distance between the two extreme points is greater than the walking distance between any vertex of the multiple vertices and any other vertex of the multiple vertices except the two extreme points;
Traversing the two extreme points of the target polygon region, and determining the target point from the edges of the target polygon region based on the straight-line distance between the two extreme points and the straight-line distance between the currently traversed extreme point and each vertex of the target polygon region;
Constructing a candidate polygon region based on the target point and the currently traversed extreme point;
Obtaining the ratio between the area of the candidate polygon region and the area of the minimum bounding rectangle region of the candidate polygon region;
If the ratio is greater than the ratio threshold, determining a candidate vertex from the four vertices of the minimum bounding rectangle region of the candidate polygon region; wherein the straight-line distance between the candidate vertex and the currently traversed extreme point is less than the straight-line distance between the currently traversed extreme point and each of the other three vertices of the minimum bounding rectangle region;
Taking, among the multiple vertices of the candidate polygon region, the vertex with the smallest straight-line distance to the candidate vertex as the target vertex, and taking the four target vertices as the four vertices of the rectangular region corresponding to the target polygon region; wherein the rectangular region refers to the region obtained after shape correction of the target polygon region, and the shape of the rectangular region is a regular shape;
Mapping, based on the position of each vertex of the rectangular region in the target image, the pixel information of each pixel in the target polygonal region to the rectangular region to generate the pixel information of each pixel in the rectangular region;
Performing optical character recognition (OCR) processing on the rectangular region and the polygonal regions other than the target polygonal region in the at least one polygonal region to obtain the character recognition result of the target image.

2. The method according to claim 1, characterized in that, The step of obtaining the two extreme points of the target polygon region from multiple vertices of the target polygon region includes: Traverse multiple vertices of the target polygon region, control the first vertex currently being traversed to walk along the edge of the target polygon region, determine the second vertex from the multiple vertices, and obtain the target walking distance between the first vertex currently being traversed and the second vertex; wherein, the target walking distance between the first vertex currently being traversed and the second vertex is greater than the walking distance between the first vertex currently being traversed and the vertices other than the second vertex among the multiple vertices; After the traversal is completed, the first and second vertices corresponding to the maximum walking distance in the obtained target walking distance are taken as the two extreme points of the target polygon region.

3. The method according to claim 1, characterized in that the method further includes: If the ratio is less than or equal to the ratio threshold, then candidate vertices are determined from multiple vertices of the target polygon region based on the target walking distance corresponding to the currently traversed extreme point and the straight-line distance between the currently traversed extreme point and each vertex of the target polygon region; wherein the number of candidate vertices determined this time is less than the number of candidate vertices determined the previous time.

4. The method according to claim 1, characterized in that, The method further includes: Obtain the walking distance between any two adjacent target vertices among the four target vertices; Based on the walking distance between any two adjacent target vertices among the four target vertices, and the spatial relationship of the four target vertices in the target polygonal region, the spatial relationship of the four target vertices in the rectangular region is determined; The step of mapping pixel information of each pixel in the target polygonal region to the rectangular region based on the positions of each vertex of the rectangular region in the target image, to generate pixel information of each pixel in the rectangular region, includes: Based on the spatial relationship of the four target vertices in the rectangular region and the position of each vertex of the rectangular region in the target image, the pixel information of each pixel in the target polygonal region is mapped to the rectangular region to generate the pixel information of each pixel in the rectangular region.

5. The method according to claim 1, characterized in that, The step of mapping pixel information of each pixel in the target polygonal region to the rectangular region based on the positions of each vertex of the rectangular region in the target image includes: Determine the mapping points from each vertex in the target polygonal region to the rectangular region; Based on the multiple vertices of the target polygon region and the spatial relationship between the multiple vertices of the target polygon region in the target polygon region, the target polygon region is triangulated to obtain multiple first triangle regions corresponding to the target polygon region; Based on multiple mapping points in the rectangular region and the spatial relationship between these mapping points within the rectangular region, the rectangular region is triangulated to obtain multiple second triangular regions corresponding to the rectangular region; wherein, there is a one-to-one correspondence between the multiple first triangular regions and the multiple second triangular regions. Traverse the plurality of first triangular regions, for any pixel in the currently traversed first triangular region, determine the target pixel corresponding to the pixel in the second triangular region corresponding to the currently traversed first triangular region, and map the pixel information of the target pixel to the pixel; After the traversal is completed, pixel information of each pixel in the rectangular region is generated.

6. The method according to claim 5, characterized in that the step of mapping the pixel information of the target pixel to the pixel includes: if the number of target pixels corresponding to the pixel is not an integer, calling an interpolation method to calculate the pixel information of the pixel based on the pixel information of the target pixels; and if the number of target pixels corresponding to the pixel is an integer, using the pixel information of the target pixels as the pixel information of the pixel.
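The branch in claim 6 can be illustrated as follows. This sketch rests on two assumptions that the claim text does not confirm: that the integer test refers to the coordinates of the corresponding target position, and that the interpolation method is bilinear. The function name `sample` and the grayscale list-of-rows image layout are hypothetical.

```python
import math

def sample(image, x, y):
    """Pixel value of a grayscale image (list of rows) at position (x, y).

    Integer positions are read directly; fractional positions are bilinearly
    interpolated from the four surrounding pixels (an assumed choice).
    """
    if float(x).is_integer() and float(y).is_integer():
        return image[int(y)][int(x)]
    height, width = len(image), len(image[0])
    x0, y0 = math.floor(x), math.floor(y)
    # Clamp the far neighbours so positions on the last row or column stay in range.
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

image = [[0, 10], [20, 30]]
print(sample(image, 1, 0), sample(image, 0.5, 0.5))
```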

7. The method according to claim 5, characterized in that determining the mapping points from each vertex in the target polygonal region to the rectangular region includes: obtaining the width and height of the rectangular region based on the walking distance, in the target polygonal region, between every two adjacent vertices of the rectangular region; determining, based on the width and height, the coordinates of the mapping points in the rectangular region of the four target vertices of the target polygonal region; and determining the coordinates of the mapping points in the rectangular region of the vertices other than the four target vertices in the target polygonal region, based on the walking distance between every two adjacent vertices of the target polygonal region and the spatial relationship between the multiple vertices of the target polygonal region.
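The steps of claim 7 can be sketched as below. Several choices are assumptions made only for illustration: walking distance is taken as boundary path length, the width and height are taken as the mean of the two opposite sides, and each remaining vertex is placed along its side in proportion to the distance walked so far. The names `walk` and `map_vertices` are hypothetical.

```python
import math

def walk(polygon, i, j):
    """Boundary path length from vertex i forward to vertex j (indices wrap)."""
    n = len(polygon)
    total, k = 0.0, i
    while k != j:
        nxt = (k + 1) % n
        total += math.dist(polygon[k], polygon[nxt])
        k = nxt
    return total

def map_vertices(polygon, corners):
    """Return (width, height, mapping points) for a polygon.

    `corners` holds the indices of the four target vertices in the order
    top-left, top-right, bottom-right, bottom-left, following boundary order.
    """
    tl, tr, br, bl = corners
    top, right = walk(polygon, tl, tr), walk(polygon, tr, br)
    bottom, left = walk(polygon, br, bl), walk(polygon, bl, tl)
    width = (top + bottom) / 2    # assumed: mean of the two long sides
    height = (left + right) / 2   # assumed: mean of the two short sides
    sides = [
        (tl, tr, top, lambda t: (t * width, 0.0)),
        (tr, br, right, lambda t: (width, t * height)),
        (br, bl, bottom, lambda t: ((1 - t) * width, height)),
        (bl, tl, left, lambda t: (0.0, (1 - t) * height)),
    ]
    n = len(polygon)
    mapped = {}
    for start, end, length, place in sides:
        k, walked = start, 0.0
        while k != end:
            # Fraction of the side already walked fixes the position on the rectangle edge.
            mapped[k] = place(walked / length)
            nxt = (k + 1) % n
            walked += math.dist(polygon[k], polygon[nxt])
            k = nxt
    return width, height, [mapped[i] for i in range(n)]

band = [(0, 0), (3, 4), (6, 0), (6, 2), (3, 6), (0, 2)]
print(map_vertices(band, [0, 2, 3, 5]))
```

For the bent band in the example, the two mid-side vertices land at the midpoints of the top and bottom edges of the rectangle, so the curved text line is straightened without changing its length.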

8. The method according to claim 1, characterized in that determining the target polygonal region from the at least one polygonal region includes: for any polygonal region among the at least one polygonal region, obtaining the minimum bounding rectangle region of the polygonal region; obtaining a first region area of the polygonal region and a second region area of the minimum bounding rectangle region; and, if the difference between the first region area and the second region area is greater than a difference threshold, determining the polygonal region as the target polygonal region.
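The area test of claim 8 can be sketched as follows. As a simplification, this example uses the axis-aligned bounding rectangle in place of the minimum bounding rectangle, which may be rotated in practice; the threshold value and the function names are hypothetical.

```python
def polygon_area(polygon):
    """Area of a simple polygon by the shoelace formula."""
    n = len(polygon)
    twice = sum(
        polygon[i][0] * polygon[(i + 1) % n][1]
        - polygon[(i + 1) % n][0] * polygon[i][1]
        for i in range(n)
    )
    return abs(twice) / 2

def bounding_rect_area(polygon):
    """Area of the axis-aligned bounding rectangle (stand-in for the minimum one)."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))

def is_target_polygon(polygon, diff_threshold):
    """True if the rectangle exceeds the polygon area by more than the threshold."""
    return bounding_rect_area(polygon) - polygon_area(polygon) > diff_threshold

straight = [(0, 0), (4, 0), (4, 2), (0, 2)]
bent = [(0, 0), (3, 4), (6, 0), (6, 2), (3, 6), (0, 2)]
print(is_target_polygon(straight, 10), is_target_polygon(bent, 10))
```

A straight text box fills its bounding rectangle and is left for ordinary OCR, while the bent band leaves a large unused area and would be selected for shape correction.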

9. A character recognition device, characterized in that the character recognition device includes: a text detection unit, used to perform text detection on a target image to obtain at least one polygonal region; wherein the at least one polygonal region refers to the region where at least one text is located in the target image; a region determination unit, configured to determine a target polygonal region from the at least one polygonal region; wherein the region shape of the target polygonal region is an irregular shape; a vertex determination unit, used to: obtain two extreme points of the target polygonal region from multiple vertices of the target polygonal region, wherein the walking distance between the two extreme points is greater than the walking distance between any vertex among the multiple vertices and any other vertex among the multiple vertices excluding the two extreme points; traverse the two extreme points of the target polygonal region, and determine a target point from the edges of the target polygonal region based on the straight-line distance between the two extreme points and the straight-line distance between the currently traversed extreme point and each vertex of the target polygonal region; construct a candidate polygonal region based on the target point and the currently traversed extreme point; obtain the ratio between the area of the candidate polygonal region and the area of the minimum bounding rectangle region of the candidate polygonal region; if the ratio is greater than a ratio threshold, determine candidate vertices from the four vertices of the minimum bounding rectangle region of the candidate polygonal region, wherein the straight-line distance between the candidate vertex and the currently traversed extreme point is less than the straight-line distance between the vertices other than the candidate vertex, among the four vertices of the minimum bounding rectangle region of the candidate polygonal region, and the currently traversed extreme point; take the vertex with the smallest straight-line distance from the candidate vertex among the multiple vertices of the candidate polygonal region as the target vertex; and take the four target vertices as the four vertices of the rectangular region corresponding to the target polygonal region, wherein the rectangular region refers to the region obtained after shape correction of the target polygonal region, and the shape of the rectangular region is a regular shape; a mapping unit, used to map the pixel information of each pixel in the target polygonal region to the rectangular region based on the position of each vertex of the rectangular region in the target image, so as to generate the pixel information of each pixel in the rectangular region; and a character recognition unit, used to perform OCR processing on the rectangular region and on the polygonal regions other than the target polygonal region in the at least one polygonal region to obtain the character recognition result of the target image.

10. A computer device, characterized in that the computer device includes a processor, a storage device, and a communication interface, which are interconnected; the storage device stores a computer program that includes program instructions; and the processor is configured to invoke the program instructions to execute the character recognition method as described in any one of claims 1 to 8.

11. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, the computer program including program instructions that, when executed by a processor, cause the processor to perform the character recognition method as described in any one of claims 1 to 8.

12. A computer program product, characterized in that the computer program product includes a computer program adapted to be loaded by a processor and to execute the character recognition method as described in any one of claims 1 to 8.

Citation Information

Patent Citations

Method and device for processing picture
CN102855482A
