Method for character recognition and electronic device
By segmenting text images into sub-images and performing character recognition, matching, and splicing processes, the problem of low accuracy in character recognition models for longer text images in existing technologies is solved, achieving higher recognition accuracy and a wider range of applications.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SHANGHAI HONGJI INFORMATION TECH CO LTD
- Filing Date
- 2022-09-26
- Publication Date
- 2026-06-12
AI Technical Summary
Existing character recognition models have low accuracy and limited applicability when processing long text images, and cannot effectively recognize text images containing long text.
The text image is divided into multiple sub-images based on the segmentation length threshold and the preset overlap length. A character recognition model is used to identify the character information of each sub-image. The overlapping characters of adjacent sub-images are processed by matching and splicing to improve the recognition accuracy.
It improves the accuracy and applicability of character recognition in longer text images and enhances the processing capabilities of the character recognition model.
[Figure: CN115497100B_ABST, abstract drawing]
Description
Technical Field
[0001] This application relates to the field of computer vision technology, and more specifically, to methods and electronic devices for character recognition.
Background Technology
[0002] Character recognition technology for text images (i.e., images containing text) is a common technique in computer vision, and is often used in scenarios such as document information extraction, certificate recognition, and qualification verification.
[0003] In the current technology, character recognition models are usually used to perform character recognition on text images to obtain character recognition information.
[0004] However, due to limitations in device memory or video memory, character recognition models typically cannot support long text lines in images, which limits their applicability. Furthermore, recognition accuracy is usually low when the text in the image is long. Therefore, improving the accuracy and applicability of character recognition for images containing long text is a problem that needs to be addressed.
Summary of the Invention
[0005] The purpose of this application is to provide a method and electronic device for character recognition, which can improve the accuracy and applicability of character recognition when recognizing text images containing long text.
[0006] In one aspect, a method for character recognition is provided, including:
[0007] Based on the segmentation length threshold and the overlap length preset value, the target text image to be identified is segmented to obtain multiple text sub-images; the overlap length preset value is the length of the overlapping area between any two adjacent text sub-images; the length of the text sub-image is greater than the overlap length preset value but not greater than the segmentation length threshold;
[0008] A character recognition model is used to perform character recognition on each text sub-image to obtain the character recognition information corresponding to each text sub-image.
[0009] The overlapping characters of each pair of adjacent text sub-images are matched to obtain the matching results; the overlapping characters of two adjacent text sub-images are the characters identified from the overlapping areas of the two adjacent text sub-images respectively;
[0010] Based on the matching results, the character recognition information of the text sub-images is concatenated to obtain the character recognition information of the target text image.
[0011] In the above implementation process, based on the segmentation length threshold and the overlap length preset value, the target text image containing longer text is divided into multiple text sub-images containing shorter text, thereby solving the problem that character recognition models have difficulty accurately recognizing images containing longer text.
[0012] In one embodiment, before segmenting the target text image to be identified based on a segmentation length threshold and a preset overlap length value to obtain multiple text sub-images, the method further includes:
[0013] Perform text line detection on the original image to obtain the text line regions;
[0014] Extract the text line image containing the text line region from the original image;
[0015] The scaling ratio is obtained based on the height of the text line image and the preset image height value;
[0016] The height and length of the text line image are scaled according to the scaling ratio to obtain the target text image.
[0017] In the above implementation process, the original image is preprocessed to extract a target text image that contains only the text line region, which facilitates the subsequent image segmentation.
[0018] In one implementation, the target text image to be identified is segmented based on a segmentation length threshold and a preset overlap length value to obtain multiple text sub-images, including:
[0019] Based on the segmentation length threshold and the overlap length preset value, the target text image is divided to obtain at least one text sub-image with a length equal to the segmentation length threshold, and at least one text sub-image with a length greater than the overlap length preset value and less than the segmentation length threshold.
[0020] Alternatively, based on the segmentation length threshold and the preset overlap length, the target text image can be divided into multiple text sub-images of the same length.
[0021] In the above implementation process, different methods can be used for image segmentation.
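The first option above, pieces of the full threshold length followed by a shorter remainder, can be sketched as a sliding window over the image length. This is only a minimal illustration under stated assumptions, not the implementation described in this application: the function name `split_spans`, the use of pixel columns as the length unit, and the single-span fallback for images that already fit within the threshold are all hypothetical.

```python
from typing import List, Tuple


def split_spans(width: int, max_len: int, overlap: int) -> List[Tuple[int, int]]:
    """Return (start, end) column spans covering [0, width).

    max_len plays the role of the segmentation length threshold and
    overlap the role of the overlap length preset value (assumed names).
    """
    if not 0 < overlap < max_len:
        raise ValueError("overlap must be positive and smaller than max_len")
    if width <= 0:
        raise ValueError("width must be positive")
    if width <= max_len:
        # Assumption: an image that already fits is returned as one span.
        return [(0, width)]

    spans = []
    step = max_len - overlap  # adjacent spans share exactly `overlap` columns
    start = 0
    while True:
        end = min(start + max_len, width)
        spans.append((start, end))
        if end == width:
            break
        start += step
    return spans


print(split_spans(1000, 320, 40))
# [(0, 320), (280, 600), (560, 880), (840, 1000)]
```

Because a new span is only started while the previous one ends before the image does, the final span is always longer than the overlap and never longer than the threshold, which matches the length constraint stated for the text sub-images.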
[0022] In one implementation, a character recognition model is used to perform character recognition on each text sub-image to obtain the character recognition information corresponding to each text sub-image, including:
[0023] Each text sub-image is input into the character recognition model to obtain each character in each text sub-image and the confidence score corresponding to each character.
[0024] The character recognition information includes the character and its corresponding confidence level, where the confidence level is the reliability of the character recognition.
[0025] In the above implementation process, the characters in the image and their corresponding confidence levels can be identified.
[0026] In one implementation, the overlapping characters of every two adjacent text sub-images are matched to obtain matching results, including:
[0027] For the first target text sub-image and the second target text sub-image in each text sub-image, match the last n characters in the first character recognition information of the first target text sub-image with the first n characters in the second character recognition information of the second target text sub-image to obtain the matching result;
[0028] Wherein, the first target text sub-image and the second target text sub-image are any two adjacent text sub-images among the text sub-images, and the first target text sub-image is the text sub-image preceding the second target text sub-image, n is the maximum number of overlapping characters, and n is a positive integer.
[0029] In the above implementation process, overlapping characters in adjacent text sub-images are matched for subsequent character filtering.
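The matching step described above can be illustrated as a position-by-position comparison of the last n recognized characters of one sub-image with the first n of the next. The sketch below is an assumption about how a matching result might be represented (a list of booleans); the name `match_overlap` and the use of plain strings for the recognized text are hypothetical.

```python
from typing import List


def match_overlap(first: str, second: str, n: int) -> List[bool]:
    """Compare the last n characters of `first` with the first n of `second`.

    Element i of the result is True when the i-th character of the tail
    equals the i-th character of the head.
    """
    if n < 1 or n > min(len(first), len(second)):
        raise ValueError("n must be between 1 and the shorter text length")
    return [a == b for a, b in zip(first[-n:], second[:n])]


print(match_overlap("ABCDE", "CXEFG", 3))
# [True, False, True]
```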
[0030] In one implementation, based on the matching results, the character recognition information is concatenated to obtain the character recognition information of the target text image, including:
[0031] For the first target text sub-image and the second target text sub-image in each text sub-image, based on the number of characters in the overlapping region and the matching results, character filtering processing is performed on the first character recognition information of the first target text sub-image and the second character recognition information of the second target text sub-image; the number of characters in the overlapping region is used to indicate the number of overlapping characters contained in an overlapping region of a text sub-image;
[0032] The first character recognition information and the second character recognition information after character filtering are concatenated.
[0033] In the above implementation process, overlapping characters in adjacent text sub-images are matched and filtered, which improves the accuracy of character filtering.
[0034] In one embodiment, character filtering processing is performed on the first character recognition information of the first target text sub-image and the second character recognition information of the second target text sub-image, including:
[0035] Repeat the following steps until the character filtering is complete:
[0036] If, based on the matching results, it is determined that the last m characters in the first character recognition information are the same as the first m characters in the second character recognition information, then the identical corresponding characters in the last m characters and the first m characters are deduplicated. Here, m is the number of characters in the overlapping area, m is a positive integer, and the initial value of m is the maximum number of overlapping characters.
[0037] If it is determined that m is greater than 1, and the matching results indicate that some corresponding characters in the last m characters and the first m characters are the same while others are different, then the corresponding characters that are the same are deduplicated, and the corresponding characters that are different are filtered according to the confidence of each character.
[0038] If it is determined that m = 1, and the last character in the first character recognition information and the first character in the second character recognition information are determined to be different based on the matching results, then character filtering is performed on the last character and the first character based on the confidence scores of the last character and the first character.
[0039] If it is determined that m is greater than 1, and the matching results show that none of the corresponding characters in the last m characters and the first m characters are the same, then the number of characters in the overlapping area is reduced by one to obtain the updated number of characters in the overlapping area.
[0040] In the above implementation process, different methods are used for character filtering based on different numbers of overlapping characters, which improves the accuracy of character filtering.
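The repeated steps above amount to a loop that starts from the maximum number of overlapping characters and shrinks the overlap until one of the terminating cases applies. The following sketch only shows that control flow; the function name `classify_overlap`, the case labels, and the clamping of n to the text lengths are assumptions made for illustration.

```python
from typing import Tuple


def classify_overlap(first: str, second: str, n: int) -> Tuple[int, str]:
    """Return (m, case) for two adjacent recognition results.

    case is one of:
      "all_same"         - the last m and first m characters are identical
      "partly_same"      - m > 1, some positions match and some differ
      "single_different" - m == 1 and the two characters differ
    """
    m = min(n, len(first), len(second))  # assumption: clamp to available text
    if m < 1:
        raise ValueError("both texts must be non-empty and n must be >= 1")
    while True:
        same = [a == b for a, b in zip(first[-m:], second[:m])]
        if all(same):
            return m, "all_same"
        if m == 1:
            return m, "single_different"
        if any(same):
            return m, "partly_same"
        m -= 1  # m > 1 and no position matches: shrink the overlap and retry


print(classify_overlap("HELLO WOR", "WORLD", 3))  # (3, 'all_same')
print(classify_overlap("ABCDE", "CXEFG", 3))      # (3, 'partly_same')
print(classify_overlap("ABCD", "DEFG", 3))        # (1, 'all_same')
print(classify_overlap("ABCD", "XYZW", 2))        # (1, 'single_different')
```

The loop always terminates because m decreases by one on each retry and both m = 1 outcomes return.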
[0041] In one implementation, deduplication is performed on the last m characters and the first m characters to remove duplicate characters, including:
[0042] For a first target character and a second target character that are the same, remove the first target character, or remove the second target character;
[0043] Wherein, the first target character is the i-th character among the last m characters, and the second target character is the i-th character among the first m characters, where i represents the character's index, i is a positive integer, and i is not greater than m.
[0044] In the above implementation process, duplicate overlapping characters can be removed.
[0045] In one implementation, filtering the corresponding characters that differ between the last m characters and the first m characters according to the confidence of each character includes:
[0046] For a first target character and a second target character that are different, determine the minimum of the confidence levels of the first target character and the second target character, and remove the character corresponding to the minimum confidence level;
[0047] Wherein, the first target character is the i-th character among the last m characters, and the second target character is the i-th character among the first m characters, where i represents the character's index, i is a positive integer, and i is not greater than m.
[0048] In the above implementation process, for overlapping characters that differ, the character with the higher confidence can be retained.
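The two per-position rules above (drop one copy of a pair of identical characters, and drop the lower-confidence character of a differing pair) can be sketched together. This is a hedged illustration: the `(character, confidence)` tuple layout, the name `resolve_overlap`, the choice to keep the first sub-image's copy of identical characters, and the tie-breaking rule for equal confidences are all assumptions.

```python
from typing import List, Tuple

Char = Tuple[str, float]  # (recognized character, confidence), assumed layout


def resolve_overlap(tail: List[Char], head: List[Char]) -> List[Char]:
    """Merge the last m characters of one sub-image with the first m of the next.

    tail[i] and head[i] are the i-th corresponding overlapping characters.
    """
    if len(tail) != len(head):
        raise ValueError("tail and head must have the same length m")
    merged = []
    for (c1, p1), (c2, p2) in zip(tail, head):
        if c1 == c2:
            # Identical characters: removing either copy is allowed;
            # this sketch keeps the copy from the first sub-image.
            merged.append((c1, p1))
        else:
            # Different characters: remove the one with the minimum confidence.
            # Assumption: on a tie, the first sub-image's character is kept.
            merged.append((c1, p1) if p1 >= p2 else (c2, p2))
    return merged


tail = [("C", 0.9), ("D", 0.4), ("E", 0.8)]
head = [("C", 0.7), ("X", 0.95), ("E", 0.6)]
print(resolve_overlap(tail, head))
# [('C', 0.9), ('X', 0.95), ('E', 0.8)]
```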
[0049] In one implementation, character filtering is performed on the last character and the first character based on the confidence scores of the last character and the first character, including:
[0050] If it is determined that the confidence scores of the last character and the first character are both greater than the upper confidence threshold, then the last character and the first character are retained.
[0051] If it is determined that the confidence scores of the last character and the first character are both less than the lower confidence threshold, then remove the last character and the first character.
[0052] If it is determined that there is a character in the last character and the first character whose confidence level is not less than the lower confidence level threshold and not greater than the upper confidence level threshold, then determine the minimum confidence level between the confidence level of the last character and the confidence level of the first character, and remove the character corresponding to the minimum confidence level.
[0053] In the above implementation process, characters can be filtered based on confidence level, which improves the accuracy of subsequent character concatenation.
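The three threshold rules for the single-character case can be sketched as follows. The threshold values 0.3 and 0.9, the name `filter_single`, and the `(character, confidence)` tuple layout are hypothetical. The text does not spell out the case in which one confidence is above the upper threshold and the other is below the lower threshold; this sketch assumes that case is also handled by removing the lower-confidence character.

```python
from typing import List, Tuple

Char = Tuple[str, float]  # (recognized character, confidence), assumed layout


def filter_single(last: Char, first: Char,
                  lower: float = 0.3, upper: float = 0.9) -> List[Char]:
    """Filter two differing boundary characters (the m = 1 case).

    `last` is the final character of the preceding sub-image and `first`
    the initial character of the following one. Returns the characters
    that survive, in reading order.
    """
    p1, p2 = last[1], first[1]
    if p1 > upper and p2 > upper:
        return [last, first]   # both reliable: retain both
    if p1 < lower and p2 < lower:
        return []              # both unreliable: remove both
    # Otherwise remove the character with the minimum confidence
    # (assumption: ties keep the character from the preceding sub-image).
    return [last] if p1 >= p2 else [first]


print(filter_single(("O", 0.95), ("0", 0.97)))  # [('O', 0.95), ('0', 0.97)]
print(filter_single(("l", 0.10), ("1", 0.20)))  # []
print(filter_single(("a", 0.50), ("b", 0.80)))  # [('b', 0.8)]
```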
[0054] In one aspect, a character recognition device is provided, comprising:
[0055] The segmentation unit is used to segment the target text image to be identified based on a segmentation length threshold and an overlap length preset value to obtain multiple text sub-images. The overlap length preset value is the length of the overlapping area between any two adjacent text sub-images. The length of the text sub-image is greater than the overlap length preset value but not greater than the segmentation length threshold.
The recognition unit is used to perform character recognition on each text sub-image using a character recognition model to obtain the character recognition information corresponding to each text sub-image.
The matching unit is used to match the overlapping characters of each pair of adjacent text sub-images to obtain the matching result. The overlapping characters of two adjacent text sub-images are characters identified separately from the overlapping areas of the two adjacent text sub-images.
The splicing unit is used to splice the character recognition information according to the matching result to obtain the character recognition information of the target text image.
[0056] In one embodiment, the segmentation unit is further configured to: perform text line detection on the original image to obtain text line regions; divide the original image into text line images containing text line regions; obtain a scaling ratio based on the height of the text line images and a preset image height value; and scale the height and length of the text line images according to the scaling ratio to obtain the target text image.
[0057] In one implementation, the segmentation unit is used for:
[0058] Based on the segmentation length threshold and the overlap length preset value, the target text image is divided to obtain at least one text sub-image with a length equal to the segmentation length threshold, and at least one text sub-image with a length greater than the overlap length preset value and less than the segmentation length threshold; or, based on the segmentation length threshold and the overlap length preset value, the target text image is divided into equal-length segments to obtain multiple text sub-images of the same length.
[0059] In one embodiment, the recognition unit is used for:
[0060] Each text sub-image is input into the character recognition model to obtain each character in each text sub-image and the confidence score corresponding to each character.
[0061] The character recognition information includes the character and its corresponding confidence level, where the confidence level is the reliability of the character recognition.
[0062] In one implementation, the matching unit is used for:
[0063] For the first target text sub-image and the second target text sub-image in each text sub-image, match the last n characters in the first character recognition information of the first target text sub-image with the first n characters in the second character recognition information of the second target text sub-image to obtain the matching result;
[0064] Wherein, the first target text sub-image and the second target text sub-image are any two adjacent text sub-images among the text sub-images, and the first target text sub-image is the text sub-image preceding the second target text sub-image, n is the maximum number of overlapping characters, and n is a positive integer.
[0065] In one embodiment, the splicing unit is used for:
[0066] For the first target text sub-image and the second target text sub-image in each text sub-image, based on the number of characters in the overlapping region and the matching results, character filtering processing is performed on the first character recognition information of the first target text sub-image and the second character recognition information of the second target text sub-image; the number of characters in the overlapping region is used to indicate the number of overlapping characters contained in an overlapping region of a text sub-image;
[0067] The first character recognition information and the second character recognition information after character filtering are concatenated.
[0068] In one embodiment, the splicing unit is used for:
[0069] Repeat the following steps until the character filtering is complete:
[0070] If, based on the matching results, it is determined that the last m characters in the first character recognition information are the same as the first m characters in the second character recognition information, then the identical corresponding characters in the last m characters and the first m characters are deduplicated. Here, m is the number of characters in the overlapping area, m is a positive integer, and the initial value of m is the maximum number of overlapping characters.
[0071] If it is determined that m is greater than 1, and the matching results indicate that some corresponding characters in the last m characters and the first m characters are the same while others are different, then the corresponding characters that are the same are deduplicated, and the corresponding characters that are different are filtered according to the confidence of each character.
[0072] If it is determined that m = 1, and the last character in the first character recognition information and the first character in the second character recognition information are determined to be different based on the matching results, then character filtering is performed on the last character and the first character based on the confidence scores of the last character and the first character.
[0073] If it is determined that m is greater than 1, and the matching results show that none of the corresponding characters in the last m characters and the first m characters are the same, then the number of characters in the overlapping area is reduced by one to obtain the updated number of characters in the overlapping area.
[0074] In one embodiment, the splicing unit is used for:
[0075] For a first target character and a second target character that are the same, remove the first target character, or remove the second target character;
[0076] Wherein, the first target character is the i-th character among the last m characters, and the second target character is the i-th character among the first m characters, where i represents the character's index, i is a positive integer, and i is not greater than m.
[0077] In one embodiment, the splicing unit is used for:
[0078] For a first target character and a second target character that are different, determine the minimum of the confidence levels of the first target character and the second target character, and remove the character corresponding to the minimum confidence level;
[0079] Wherein, the first target character is the i-th character among the last m characters, and the second target character is the i-th character among the first m characters, where i represents the character's index, i is a positive integer, and i is not greater than m.
[0080] In one embodiment, the splicing unit is used for:
[0081] If it is determined that the confidence scores of the last character and the first character are both greater than the upper confidence threshold, then the last character and the first character are retained.
[0082] If it is determined that the confidence scores of the last character and the first character are both less than the lower confidence threshold, then remove the last character and the first character.
[0083] If it is determined that there is a character in the last character and the first character whose confidence level is not less than the lower confidence level threshold and not greater than the upper confidence level threshold, then determine the minimum confidence level between the confidence level of the last character and the confidence level of the first character, and remove the character corresponding to the minimum confidence level.
[0084] In one aspect, an electronic device is provided, including a processor and a memory storing computer-readable instructions that, when executed by the processor, perform the steps of the method provided in any of the alternative implementations of character recognition described above.
[0085] In one aspect, a computer-readable storage medium is provided on which a computer program is stored, which, when executed by a processor, performs the steps of the method provided in any of the alternative implementations of character recognition described above.
[0086] In one aspect, a computer program product is provided that, when run on a computer, causes the computer to perform the steps of the method provided in any of the alternative implementations of character recognition described above.
[0087] Other features and advantages of this application will be set forth in the description which follows, and will be apparent in part from the description, or may be learned by practicing the application. The objectives and other advantages of this application may be realized and obtained by means of the structures particularly pointed out in the written description, claims, and drawings.
Attached Figure Description
[0088] To more clearly illustrate the technical solutions of the embodiments of this application, the accompanying drawings used in the embodiments of this application will be briefly introduced below. It should be understood that the following drawings only show some embodiments of this application and should not be regarded as a limitation of the scope. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.
[0089] Figure 1 is a flowchart of a character recognition method provided in an embodiment of this application;
[0090] Figure 2 is a schematic diagram of image segmentation provided in an embodiment of this application;
[0091] Figure 3 is a flowchart of the implementation of a character filtering method provided in an embodiment of this application;
[0092] Figure 4 is a schematic diagram of character recognition provided in an embodiment of this application;
[0093] Figure 5 is an example diagram comparing test metrics provided in an embodiment of this application;
[0094] Figure 6 is a structural block diagram of a character recognition device provided in an embodiment of this application;
[0095] Figure 7 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application.
Detailed Implementation
[0096] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of the embodiments. The components of the embodiments of this application described and shown in the accompanying drawings can generally be arranged and designed in various different configurations. Therefore, the following detailed description of the embodiments of this application provided in the accompanying drawings is not intended to limit the scope of the claimed application, but merely represents selected embodiments of this application. All other embodiments obtained by those skilled in the art based on the embodiments of this application without inventive effort are within the scope of protection of this application.
[0097] Optical Character Recognition (OCR) technology refers to the recognition of optical characters through image processing and pattern recognition techniques. For example, for printed characters, optical methods can be used to convert the text in a paper document into a black-and-white dot matrix image file, and then recognition software can convert the text in the image into text format for further editing by word processing software.
[0098] OCR is a branch of Computer Vision (CV) research and an important component of computer science. CV technology studies how to enable machines to "see": specifically, it uses cameras and computers in place of human eyes for target recognition, tracking, and measurement, followed by image processing to create images more suitable for human observation or transmission to instruments. As a scientific discipline, computer vision researches related theories and technologies, attempting to build artificial intelligence systems capable of extracting information from images or multidimensional data. OCR, as a commonly used technique in CV, is widely applied in practical projects such as document information extraction, certificate recognition, and qualification verification, especially in Robotic Process Automation (RPA) projects.
[0099] RPA technology can simulate employees' daily computer operations using keyboards and mice, replacing humans in tasks such as logging into systems, operating software, reading and writing data, downloading files, and retrieving emails. Using automated robots as virtual labor for enterprises can free employees from repetitive, low-value tasks, allowing them to focus on high-value work. This enables enterprises to reduce costs and increase efficiency while undergoing digital and intelligent transformation. RPA uses software robots to replace manual tasks in business processes and interacts with computer front-end systems like humans. Therefore, RPA can be seen as a software-based program robot running on a personal PC or server, mimicking user operations on a computer to automatically perform repetitive tasks such as retrieving emails, downloading attachments, logging into systems, and data processing and analysis—fast, accurate, and reliable. While both RPA and traditional physical robots address speed and accuracy issues in human work through specific rules, traditional physical robots are hardware-software hybrids requiring specific hardware support and software to perform tasks. RPA robots, on the other hand, are purely software-based; once the appropriate software is installed, they can be deployed to any PC or server to complete the assigned tasks. In other words, RPA is a method and related technologies that utilize "digital employees" to perform business operations in place of humans. Essentially, RPA uses software automation technology to simulate human operation of computer systems, software, web pages, and documents, acquiring business information, executing business actions, and ultimately achieving automated process processing, labor cost savings, and improved processing efficiency. 
As described, in some RPA application scenarios, OCR technology can be used to recognize text and other information on the interface, and based on the recognized text information, simulate human actions such as clicking the mouse and typing on the keyboard.
[0100] First, some of the terms used in the embodiments of this application will be explained to facilitate understanding by those skilled in the art.
[0101] Terminal devices can be mobile terminals, fixed terminals, or portable terminals, such as mobile phones, sites, units, devices, multimedia computers, multimedia tablets, internet nodes, communicators, desktop computers, laptop computers, notebook computers, netbook computers, tablet computers, personal communication system devices, personal navigation devices, personal digital assistants, audio / video players, digital cameras / camcorders, positioning devices, television receivers, radio broadcast receivers, e-book devices, gaming devices, or any combination thereof, including accessories and peripherals of these devices, or any combination thereof. It is also foreseeable that terminal devices can support any type of user-facing interface (e.g., wearable devices).
[0102] Servers can be independent physical servers, server clusters or distributed systems composed of multiple physical servers, or cloud servers that provide basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, and big data and artificial intelligence platforms.
[0103] Traditionally, character recognition models are used to identify characters in text images. However, in practice, these models are typically only accurate for text within a certain length (e.g., 10 pixels). When the text in an image exceeds this length, the accuracy of character recognition is usually low.
[0104] Therefore, in order to improve the accuracy and applicability of character recognition when recognizing text images containing long text, embodiments of this application provide a character recognition method and an electronic device.
[0105] Figure 1 is a flowchart of a character recognition method provided in an embodiment of this application. This method is applied to an electronic device, which can be a server or a terminal device. The specific implementation flow of this method is as follows:
[0106] Step 101: Based on the segmentation length threshold and the preset overlap length, the target text image to be identified is segmented to obtain multiple text sub-images.
[0107] The preset overlap length is the length of the overlapping area between any two adjacent text sub-images. The length of each text sub-image must be greater than the preset overlap length but not greater than the segmentation length threshold. The lengths of different text sub-images can be the same or different.
[0108] In one embodiment, before step 101 is performed, the method may further include: performing image preprocessing on the original image to obtain the target text image.
[0109] In one implementation, the specific steps of preprocessing the original image to obtain the target text image may include:
[0110] S101-1: Perform text line detection on the original image to obtain the text line region.
[0111] This allows us to detect areas in the original image that contain text.
[0112] S101-2: Extract the text line image containing the text line region from the original image.
[0113] In this way, we can first extract the text line image from the original image, which contains only the text line region.
[0114] S101-3: Obtain the scaling ratio based on the height of the text line image and the preset image height value.
[0115] S101-4: Scale the height and length of the text line image according to the scaling ratio to obtain the target text image.
[0116] In one embodiment, the height of the text line image is scaled to the preset image height value, the quotient of the length of the text line image and the scaling ratio (i.e., the length of the text line image after scaling) is determined, and the length of the text line image is scaled to that quotient.
[0117] As an example, the scaling ratio is scale = h1 / h2. Therefore, the height of the scaled text line image (i.e., the target text image) is h2, and the length is w2 = w1 / scale. Here, h1 is the height of the text line image, h2 is the preset image height (e.g., 32 pixels), w1 is the length of the text line image, and w2 is the length of the scaled text line image.
[0118] Since character recognition models can only accurately recognize images within a certain height range, a preset image height can be obtained based on the image recognition height of the character recognition model. Then, the text line image can be resized based on the preset image height, that is, the height and length of the text line image can be scaled proportionally to ensure the accuracy of subsequent recognition and the reproducibility of the results.
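The scaling arithmetic above can be illustrated with a minimal Python sketch. The function name scaled_size and the rounding of the scaled length to whole pixels are assumptions made for illustration; the description itself only specifies scale = h1 / h2 and w2 = w1 / scale.

```python
from typing import Tuple


def scaled_size(w1: int, h1: int, h2: int = 32) -> Tuple[int, int]:
    """Return (w2, h2) for a text line image resized to the preset height h2.

    w1, h1: length and height of the text line image, in pixels.
    h2: preset image height (32 pixels is the example value in the text).
    """
    if w1 <= 0 or h1 <= 0 or h2 <= 0:
        raise ValueError("dimensions must be positive")
    scale = h1 / h2                   # scaling ratio, as defined in the text
    w2 = max(1, round(w1 / scale))    # scaled length; rounding is an assumption
    return w2, h2


print(scaled_size(960, 64))
```

Because the same ratio is applied to both dimensions, the aspect ratio of the text line is preserved and the characters are not distorted before recognition.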
[0119] In one implementation, step 101 can be carried out in any of the following ways:
[0120] Method 1: Based on the segmentation length threshold and the overlap length preset value, the target text image is divided to obtain at least one text sub-image with a length of the segmentation length threshold, and at least one text sub-image with a length greater than the overlap length preset value and less than the segmentation length threshold.
[0121] It should be noted that there can be one or more text sub-images with a length equal to the segmentation length threshold, and there can also be one or more other text sub-images (i.e., text sub-images whose length is not equal to the segmentation length threshold). In practical applications, the length and number of other text sub-images can be set according to the actual application scenario, and are not restricted here. The segmentation length threshold can be determined based on the maximum text length limited by the character recognition model.
[0122] As an example, Figure 2 is a schematic diagram of image segmentation. The segmentation length threshold is split_width, and the preset overlap length is delta_width (e.g., split_width is 240 pixels and delta_width is 18 pixels). In Figure 2, the target text image is segmented into four sub-images. The overlap between any two adjacent sub-images is delta_width. The first three sub-images each have a length of split_width. The last sub-image has a length greater than delta_width but less than split_width.
[0123] In this way, images can be segmented according to the segmentation length threshold first, thereby reducing the amount of data processing and the number of segmentations.
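Method 1 can be sketched in Python as follows. This is only one possible reading of the description: the function name split_spans, the use of half-open pixel spans, and the handling of images no longer than split_width are assumptions. The parameter names split_width and delta_width and their example values come from the text.

```python
from typing import List, Tuple


def split_spans(length: int, split_width: int = 240,
                delta_width: int = 18) -> List[Tuple[int, int]]:
    """Return (start, end) pixel spans, end exclusive, along the image length.

    Adjacent spans overlap by exactly delta_width pixels. Every span except
    possibly the last has length split_width.
    """
    if not 0 < delta_width < split_width:
        raise ValueError("need 0 < delta_width < split_width")
    if length <= split_width:
        return [(0, length)]           # short image: left unsplit (assumption)
    stride = split_width - delta_width
    spans = []
    start = 0
    while start + split_width < length:
        spans.append((start, start + split_width))
        start += stride
    spans.append((start, length))      # final sub-image, longer than delta_width
    return spans


print(split_spans(800))
```

With an 800-pixel-long image and the example values, this yields four spans, the first three of length 240 and the last of length 134, which mirrors the layout described for Figure 2.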
[0124] Method 2: Based on the segmentation length threshold and the preset overlap length, the target text image is divided into equal-length sub-images to obtain multiple text sub-images of the same length.
[0125] As an example, determine the difference between the segmentation length threshold and the preset overlap length, d1 = split_width - delta_width. If the length L of the target text image is divisible by the difference d1, the target text image can be divided into multiple sub-images of equal length, each with a length equal to the segmentation length threshold. Otherwise, a segmentation length variable x can be set, and the value of x can be calculated using L = (x - delta_width)k. Based on the determined x value, the target text image can then be divided into multiple sub-images of equal length x. Here, k represents the number of sub-images obtained and is a positive integer.
[0126] It should be noted that the length of each text sub-image must ensure that the number of characters in the text sub-image is within the recognition range of the character recognition model. Therefore, the length of the text sub-images can be determined based on the segmentation length threshold. The preset overlap length should ensure that there is at least one overlapping character in two adjacent text sub-images.
[0127] In one implementation, the length range of a single character is estimated based on the height of the target text image, and the length range of n characters is estimated based on the estimated length range of the characters. Based on the length range of the n characters, a preset value for the overlap length is determined (e.g., the preset value for the overlap length can be the maximum value among the length ranges of the n characters).
[0128] Here, n represents the maximum number of overlapping characters. Since different characters have different lengths, the actual number of overlapping characters may differ from n. As an example, let n be 2; because a character such as "i" is very narrow, the actual number of overlapping characters may be 3.
[0129] In practical applications, the segmentation length threshold and the overlap length preset value can be set according to the actual application scenario. The length of each text sub-image can be the same or different, and there is no restriction here.
[0130] Step 102: Use a character recognition model to perform character recognition on each text sub-image to obtain the character recognition information corresponding to each text sub-image.
[0131] In one implementation, each text sub-image is input into a character recognition model (e.g., each text sub-image is input into a character recognition model in batches) to obtain each character in each text sub-image and the confidence level corresponding to each character.
[0132] Character recognition information refers to the character information in the identified text sub-image and the confidence level of the identified characters. Character recognition information includes the characters and their corresponding confidence levels. Confidence level, or confidence percentage, refers to the probability that the true value of a parameter falls within the measurement result. In this embodiment, the confidence level of the identified character is the probability that the true character is the identified character, i.e., the reliability of the character recognition.
[0133] Optionally, the character recognition model may employ, but is not limited to, any of the following algorithms:
[0134] Convolutional Recurrent Neural Network (CRNN) and Attentional Seq2seq, a sequence-to-sequence text recognition network based on attention mechanisms.
[0135] In practical applications, the character recognition model can be set according to the actual application scenario, and there are no restrictions here.
[0136] Step 103: Match the overlapping characters of every two adjacent text sub-images to obtain the matching results.
[0137] In one implementation, for a first target text sub-image and a second target text sub-image in each text sub-image, if a first overlapping region in the first target text sub-image overlaps with a second overlapping region in the second target text sub-image, then the character identified in the first overlapping region is the overlapping character of the first target text sub-image, and the character identified in the second overlapping region is the overlapping character of the second target text sub-image. Matching the overlapping characters of each pair of adjacent text sub-images includes: matching the characters in the first overlapping region with the characters in the second overlapping region.
[0138] In one embodiment, the last n characters in the first character recognition information of the first target text sub-image are matched with the first n characters in the second character recognition information of the second target text sub-image to obtain a matching result.
[0139] The overlapping characters of two adjacent text sub-images are characters identified from the overlapping areas of the two adjacent text sub-images. The first target text sub-image and the second target text sub-image are any two adjacent text sub-images in each text sub-image, and the first target text sub-image is the preceding text sub-image of the second target text sub-image. n is the maximum number of overlapping characters, and n is a positive integer.
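The matching in step 103 can be sketched as a position-by-position comparison. The function name match_overlap, the representation of the matching result as a list of booleans, and the clamping of n for very short strings are assumptions made for illustration.

```python
from typing import List


def match_overlap(first: str, second: str, n: int) -> List[bool]:
    """Compare the last n characters of `first` with the first n of `second`.

    Element i of the result is True when the i-th character of the tail
    equals the i-th character of the head.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    n = min(n, len(first), len(second))   # guard for short strings (assumption)
    if n == 0:
        return []
    tail, head = first[-n:], second[:n]
    return [a == b for a, b in zip(tail, head)]


print(match_overlap("hello wor", "world", 3))
print(match_overlap("abcd", "cxdz", 2))
```

A result in which every element is True corresponds to a fully matching overlap, a mixed result to a partial match, and an all-False result to no match at the current overlap size.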
[0140] Step 104: Based on the matching results, concatenate the character recognition information to obtain the character recognition information of the target text image.
[0141] In one embodiment, for the first target text sub-image and the second target text sub-image in each text sub-image, based on the number of characters in the overlapping area and the matching result, character filtering processing is performed on the first character recognition information of the first target text sub-image and the second character recognition information of the second target text sub-image; the first character recognition information and the second character recognition information after character filtering processing are then concatenated.
[0142] The number of overlapping characters indicates the number of overlapping characters contained in an overlapping region of a text sub-image.
[0143] In one embodiment, concatenating the first character recognition information and the second character recognition information after character filtering can include: concatenating the two strings (i.e., the first character recognition information and the second character recognition information after character filtering) end to end (i.e., connecting the first character of the latter string with the last character of the former string) to obtain a concatenated string (i.e., the character recognition information of the target text image).
[0144] In one implementation, the character filtering process may include:
[0145] Repeat the following steps until the character filtering is complete:
[0146] S104-1: If, based on the matching result, it is determined that the last m characters in the first character identification information are the same as the first m characters in the second character identification information, then the corresponding identical characters in the last m characters and the first m characters are deduplicated. m is the number of characters in the overlapping area, and m is a positive integer. The initial value of the number of characters in the overlapping area is the maximum number of overlapping characters.
[0147] In one implementation, deduplication is performed on the corresponding characters in the last m characters and the first m characters, including: removing the first target character from the last m characters, or removing the second target character from the first m characters.
[0148] Wherein, the first target character is the i-th character among the last m characters, and the second target character is the i-th character among the first m characters, where i represents the character's index, i is a positive integer, and i is not greater than m.
[0149] As an example, if m is determined to be 3, and the last 3 characters in the first character recognition information are exactly the same as the first 3 characters in the second character recognition information, then the last 3 characters are removed.
[0150] This removes duplicate overlapping characters (i.e., removes overlapping characters that correspond to each other).
[0151] S104-2: If it is determined that m is greater than 1, and the matching result indicates that the last m characters and the first m characters contain both corresponding characters that are the same and corresponding characters that are different, then the corresponding identical characters in the last m characters and the first m characters are deduplicated, and the corresponding different characters in the last m characters and the first m characters are filtered according to the confidence level of each character.
[0152] In one implementation, filtering according to the confidence level of each character proceeds as follows: for a first target character and a second target character that are different, the lower of the confidence level of the first target character and the confidence level of the second target character is determined, and the character with that lower confidence level is removed.
[0153] "Corresponding identically" means that two characters with the same index in the last m characters and the first m characters are identical. That is, the i-th character in the last m characters is the same as the i-th character in the first m characters. "Corresponding differently" means that two characters with the same index in the last m characters and the first m characters are different. That is, the i-th character in the last m characters is different from the i-th character in the first m characters.
[0154] Thus, if there are both overlapping characters that correspond identically and overlapping characters that correspond differently, the identical ones are deduplicated and the different ones are filtered according to their confidence levels.
[0155] S104-3: If m=1 is determined, and the last character in the first character identification information and the first character in the second character identification information are determined to be different according to the matching result, then the last character and the first character are filtered according to the confidence of the last character and the confidence of the first character.
[0156] S104-4: If it is determined that m is greater than 1, and it is determined from the matching result that there are no corresponding identical characters in the last m characters and the first m characters, then the number of characters in the overlapping area is reduced by one to obtain the updated number of characters in the overlapping area.
[0157] It should be noted that S104-1 to S104-4 can be executed in any order, and the execution order is not limited in this embodiment.
[0158] In one implementation, the process of S104-3 may include:
[0159] S104-31: If it is determined that the confidence level of the last character and the confidence level of the first character are both greater than the upper confidence level threshold, then the last character and the first character are retained.
[0160] S104-32: If it is determined that the confidence scores of the last character and the first character are both less than the lower confidence threshold, then remove the last character and the first character.
[0161] S104-33: If it is determined that there is a character in the last character and the first character whose confidence level is not less than the lower confidence level threshold and not greater than the upper confidence level threshold, then determine the minimum confidence level between the confidence level of the last character and the confidence level of the first character, and remove the character corresponding to the minimum confidence level.
[0162] As an example, the upper confidence threshold is threshold_high=0.96, and the lower confidence threshold is threshold_low=0.6.
[0163] If the confidence level is not less than the upper confidence threshold, the character is considered correctly identified and retained; if the confidence level is not greater than the lower confidence threshold, the character is considered incorrectly identified and discarded. If the character's confidence level falls between the two thresholds, the character with the higher confidence level is retained, and the character with the lower confidence level is removed.
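The threshold rules of S104-31 to S104-33 can be sketched as follows. The function name filter_boundary_pair and the tie-breaking rule (keeping the last character when both confidence levels are equal) are assumptions. The case where one confidence level is above the upper threshold and the other is below the lower threshold is not spelled out in the description; this sketch treats it like S104-33.

```python
from typing import Tuple


def filter_boundary_pair(conf_last: float, conf_first: float,
                         threshold_high: float = 0.96,
                         threshold_low: float = 0.6) -> Tuple[bool, bool]:
    """Return (keep_last, keep_first) for two differing boundary characters."""
    if conf_last > threshold_high and conf_first > threshold_high:
        return True, True              # S104-31: both trusted, keep both
    if conf_last < threshold_low and conf_first < threshold_low:
        return False, False            # S104-32: both untrusted, drop both
    # S104-33 (and the unspecified mixed case): drop the lower-confidence one.
    if conf_last >= conf_first:        # tie keeps the last character (assumption)
        return True, False
    return False, True


print(filter_boundary_pair(0.99, 0.98))
print(filter_boundary_pair(0.30, 0.50))
print(filter_boundary_pair(0.70, 0.90))
```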
[0164] Figure 3 is a flowchart illustrating the implementation of a character filtering method. The following example uses a specific application scenario to illustrate the character filtering process in the above embodiment. Assuming the maximum number of overlapping characters n is 3, the first character recognition information of the first target text sub-image can be represented as pic1, and the second character recognition information of the second target text sub-image can be represented as pic2. The specific implementation flow of this method is as follows:
[0165] Step 301: Based on the number of characters in the overlapping area m=3, determine whether the last three characters in pic1 are the same as the first three characters in pic2. If so, proceed to step 302; otherwise, proceed to step 303.
[0166] Specifically, based on the maximum number of overlapping characters, the initial value of the number of characters m in the overlapping area is determined to be 3.
[0167] Step 302: Remove the last three characters from pic1 or remove the first three characters from pic2, then proceed to step 313.
[0168] Thus, if the last three characters of pic1 are the same as the first three characters of pic2, then pic1 and pic2 can be merged (i.e., overlapping characters in pic1 and pic2 are deduplicated and concatenated).
[0169] Step 303: Determine whether the last three characters in pic1 are completely different from the first three characters in pic2. If so, proceed to step 305; otherwise, proceed to step 304.
[0170] Step 304: Deduplicate the corresponding identical characters in the last three characters of pic1 and the first three characters of pic2, and filter the corresponding different characters according to their confidence levels. Then proceed to step 313.
[0171] In other words, the last three characters in pic1 and the first three characters in pic2 have some corresponding characters that are the same, and there are also some corresponding characters that are different.
[0172] Step 305: Decrease the number of characters in the overlapping area by one to obtain the updated number of characters in the overlapping area m=2.
[0173] Specifically, m = m - 1.
[0174] Step 306: If the number of characters in the overlapping area is m=2>1, then based on the updated number of characters in the overlapping area, determine whether the last two characters in pic1 are the same as the first two characters in pic2. If so, proceed to step 307; otherwise, proceed to step 308.
[0175] Step 307: Remove the last two characters from pic1 or remove the first two characters from pic2, then proceed to step 313.
[0176] Therefore, if the last two characters of pic1 are the same as the first two characters of pic2, then pic1 and pic2 can be merged.
[0177] Step 308: Determine whether the last two characters in pic1 are completely different from the first two characters in pic2. If so, proceed to step 310; otherwise, proceed to step 309.
[0178] Step 309: Deduplicate the corresponding identical characters in the last two characters of pic1 and the first two characters of pic2, and filter the corresponding different characters according to their confidence levels. Then proceed to step 313.
[0179] In one implementation, assume the last two characters are a1 and a2, and the first two characters are b1 and b2. If a1 and b1 are the same, and a2 and b2 are different, then either a1 or b1 is removed, and the character with the higher confidence level between a2 and b2 is retained, while the character with the lower confidence level is discarded. If a1 and b1 are different, and a2 and b2 are the same, then either a2 or b2 is removed, and the character with the higher confidence level between a1 and b1 is retained, while the character with the lower confidence level is discarded.
[0180] Step 310: Update the number of characters in the overlapping area m=1, and determine whether the last character in the first character identification information and the first character in the second character identification information are the same. If they are the same, proceed to step 311; otherwise, proceed to step 312.
[0181] Step 311: Remove the last character from the first character identification information, or remove the first character from the second character identification information, and proceed to step 313.
[0182] Therefore, if the last character of pic1 is the same as the first character of pic2, then pic1 and pic2 can be merged.
[0183] Step 312: Based on the confidence scores of the last character and the first character, perform character filtering on the last character and the first character.
[0184] Step 313: End the character filtering process.
[0185] In one implementation, step 312 may include:
[0186] Method 1: If the confidence scores of the last character and the first character are both greater than the upper confidence threshold (threshold_high), then retain the last character and the first character.
[0187] Method 2: If the confidence scores of the last character and the first character are both less than the lower confidence threshold (threshold_low=0.6), then remove the last character and the first character.
[0188] Method 3: If the confidence level of the last character and/or the confidence level of the first character is within [threshold_low, threshold_high], then retain the character with the higher confidence level and discard the character with the lower confidence level.
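The overall flow of Figure 3, including the three methods for step 312, can be sketched in Python as follows. This is an illustrative reading rather than the implementation of the application: the names merge_pair, tagged and text_of, the representation of character recognition information as a list of (character, confidence) pairs, the clamping of the initial overlap count for short inputs, and the tie-breaking in favor of pic1 are all assumptions.

```python
from typing import List, Tuple

Char = Tuple[str, float]  # (character, confidence); a hypothetical representation


def merge_pair(pic1: List[Char], pic2: List[Char], n: int = 3,
               threshold_high: float = 0.96,
               threshold_low: float = 0.6) -> List[Char]:
    """Filter the overlap between two adjacent results and concatenate them."""
    m = min(n, len(pic1), len(pic2))   # initial overlap count (clamped: assumption)
    while m >= 1:
        tail, head = pic1[-m:], pic2[:m]
        same = [a[0] == b[0] for a, b in zip(tail, head)]
        if all(same):
            # Steps 302/307/311: all m pairs agree, drop the copy held in pic1.
            return pic1[:-m] + pic2
        if any(same):
            # Steps 304/309: keep one copy of each agreeing pair and the
            # higher-confidence character of each disagreeing pair.
            kept = [a if a[0] == b[0] or a[1] >= b[1] else b
                    for a, b in zip(tail, head)]
            return pic1[:-m] + kept + pic2[m:]
        if m == 1:
            # Step 312: a single disagreeing pair, decided by the thresholds.
            a, b = tail[0], head[0]
            if a[1] > threshold_high and b[1] > threshold_high:
                kept = [a, b]
            elif a[1] < threshold_low and b[1] < threshold_low:
                kept = []
            else:
                kept = [a if a[1] >= b[1] else b]
            return pic1[:-1] + kept + pic2[1:]
        m -= 1   # Steps 305/310: no pair agrees, retry with a smaller overlap
    return pic1 + pic2   # one input was empty, nothing to filter


def tagged(text: str, conf: float) -> List[Char]:
    """Attach the same confidence to every character (test helper)."""
    return [(c, conf) for c in text]


def text_of(chars: List[Char]) -> str:
    return "".join(c for c, _ in chars)


print(text_of(merge_pair(tagged("recogni", 0.99), tagged("gnition", 0.99))))
```

For more than two text sub-images, the same function could be applied from left to right, merging each new result into the accumulated one.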
[0189] The above embodiments will be illustrated using a specific application scenario. Figure 4 is a schematic diagram of character recognition. Figure 4 includes the target text image, multiple text sub-images, multiple pieces of character recognition information, and the character recognition information of the target text image. In some image data, there are images with a very large width (i.e., image length) that contain relatively long text, such as the target text image shown in Figure 4. Since inputting long text images into a character recognition model may lead to significant deviations in the character recognition results, this embodiment first uses the segmentation length threshold and the preset overlap length to segment the target text image shown in Figure 4, obtaining the multiple text sub-images shown in Figure 4 (each line is a text sub-image). A character recognition model then performs character recognition on each of the text sub-images to obtain the character recognition information shown in Figure 4 (each line represents one piece of character recognition information). Finally, the pieces of character recognition information are combined using the concatenation processing scheme described in the above embodiment to obtain the character recognition information of the target text image shown in Figure 4, which improves the accuracy of character recognition in long texts.
[0190] Figure 5 is an example of a comparison of test metrics. Character recognition tests were conducted on both the traditional character recognition method and the character recognition method of this application, yielding the test metrics for each method shown in Figure 5. Each set of test metrics includes recognition accuracy, recall, and score. Recognition accuracy represents the precision of character recognition. Recall represents the proportion of characters recognized from an image to the total number of characters in the image. The score represents a rating of the character recognition performance. Clearly, higher recognition accuracy, higher recall, and higher score indicate better character recognition performance. Compared to traditional character recognition methods, the character recognition method of this application has better recognition performance. Furthermore, separate modules can be designed for the image segmentation method and character concatenation method in this application and added to the character recognition model, enabling the character recognition model to accurately recognize long texts.
[0191] In this embodiment, based on the segmentation length threshold and the preset overlap length, the target text image containing longer text is divided into multiple text sub-images containing shorter text, thereby solving the problem that character recognition models have difficulty accurately recognizing images containing longer text. Furthermore, by using the matching results and confidence scores of characters in the overlapping areas of adjacent text sub-images, character recognition information of adjacent text sub-images is filtered and spliced to remove corresponding overlapping characters and inaccurately recognized characters, thus improving the accuracy of character recognition for long texts.
[0192] Based on the same inventive concept, this application also provides a character recognition device. Since the principle of the above-mentioned device and equipment in solving the problem is similar to that of a character recognition method, the implementation of the above-mentioned device can refer to the implementation of the method, and the repeated parts will not be described again.
[0193] Figure 6 is a structural schematic of a character recognition device provided in an embodiment of this application, comprising:
[0194] The segmentation unit 601 is used to segment the target text image to be identified based on the segmentation length threshold and the overlap length preset value to obtain multiple text sub-images; the overlap length preset value is the length of the overlapping area between any two adjacent text sub-images; the length of the text sub-image is greater than the overlap length preset value but not greater than the segmentation length threshold; the recognition unit 602 is used to perform character recognition on each text sub-image using a character recognition model to obtain the character recognition information corresponding to each text sub-image; the matching unit 603 is used to match the overlapping characters of each pair of adjacent text sub-images to obtain the matching result; the overlapping characters of two adjacent text sub-images are characters identified separately from the overlapping areas of the two adjacent text sub-images; the splicing unit 604 is used to splice the character recognition information according to the matching result to obtain the character recognition information of the target text image.
[0195] In one embodiment, the segmentation unit 601 is further configured to: perform text line detection on the original image to obtain a text line region; extract from the original image a text line image containing the text line region; obtain a scaling ratio based on the height of the text line image and a preset image height value; and scale the height and length of the text line image according to the scaling ratio to obtain a target text image.
[0196] In one embodiment, the segmentation unit 601 is used to: divide the target text image based on a segmentation length threshold and an overlap length preset value to obtain at least one text sub-image with a length equal to the segmentation length threshold, and at least one text sub-image with a length greater than the overlap length preset value and less than the segmentation length threshold; or, based on the segmentation length threshold and the overlap length preset value, divide the target text image into equal length segments to obtain multiple text sub-images with the same length.
[0197] In one embodiment, the recognition unit 602 is used to: input each text sub-image into a character recognition model, and obtain each character in each text sub-image and the confidence level corresponding to each character; wherein, the character recognition information includes the character and its corresponding confidence level, and the confidence level is the credibility of the character recognition.
[0198] In one embodiment, the matching unit 603 is used to: for the first target text sub-image and the second target text sub-image in each text sub-image, match the last n characters in the first character recognition information of the first target text sub-image with the first n characters in the second character recognition information of the second target text sub-image to obtain a matching result; wherein, the first target text sub-image and the second target text sub-image are any two adjacent text sub-images in each text sub-image, and the first target text sub-image is the preceding text sub-image of the second target text sub-image, n is the maximum number of overlapping characters, and n is a positive integer.
[0199] In one embodiment, the splicing unit 604 is used to: for the first target text sub-image and the second target text sub-image in each text sub-image, based on the number of characters in the overlapping region and the matching result, perform character filtering processing on the first character recognition information of the first target text sub-image and the second character recognition information of the second target text sub-image; the number of characters in the overlapping region is used to indicate the number of overlapping characters contained in an overlapping region of a text sub-image; and splice the first character recognition information and the second character recognition information after character filtering processing.
[0200] In one embodiment, the splicing unit 604 is used to: cyclically execute the following steps until character filtering is completed: if, based on the matching result, it is determined that the last m characters in the first character recognition information are the same as the first m characters in the second character recognition information, then the corresponding identical characters in the last m characters and the first m characters are deduplicated, where m is the number of characters in the overlapping area, m is a positive integer, and the initial value of the number of characters in the overlapping area is the maximum number of overlapping characters; if it is determined that m is greater than 1, and based on the matching result, it is determined that there are both corresponding identical characters and corresponding different characters between the last m characters and the first m characters, then the corresponding identical characters in the last m characters and the first m characters are deduplicated, and the corresponding different characters in the last m characters and the first m characters are filtered based on the confidence level of each character; if it is determined that m=1, and the last character in the first character recognition information and the first character in the second character recognition information are determined to be different based on the matching result, then the last character and the first character are filtered based on the confidence levels of the last character and the first character; if it is determined that m is greater than 1, and the matching result indicates that there are no corresponding identical characters among the last m characters and the first m characters, then the number of characters in the overlapping area is reduced by one to obtain the updated number of characters in the overlapping area.
[0201] In one embodiment, the splicing unit 604 is used to: for a first target character and a second target character that are the same, remove the first target character or remove the second target character; wherein the first target character is the i-th character among the last m characters, the second target character is the i-th character among the first m characters, i represents the character number, i is a positive integer, and i is not greater than m.
[0202] In one embodiment, the splicing unit 604 is used to: for a first target character and a second target character that are different, determine the lower of the confidence level of the first target character and the confidence level of the second target character, and remove the character with that lower confidence level; wherein the first target character is the i-th character among the last m characters, the second target character is the i-th character among the first m characters, i represents the character number, i is a positive integer, and i is not greater than m.
[0203] In one embodiment, the splicing unit 604 is configured to: retain the last character and the first character if it is determined that the confidence level of the last character and the confidence level of the first character are both greater than the upper confidence level threshold; remove the last character and the first character if it is determined that the confidence level of the last character and the confidence level of the first character are both less than the lower confidence level threshold; and remove the character corresponding to the lowest confidence level if it is determined that there is a character among the last character and the first character whose confidence level is not less than the lower confidence level threshold and not greater than the upper confidence level threshold.
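The threshold rule for the single-character case can be sketched as follows. This is an illustrative sketch only: the function name `filter_single`, the tuple representation, and the parameter names `lower` and `upper` are hypothetical. The text does not state what happens when one confidence is above the upper threshold and the other is below the lower threshold, nor how ties are broken; the sketch assumes the lower-confidence character is removed in that case and that a tie keeps the last character of the first sub-image.

```python
from typing import List, Tuple

# Hypothetical representation: one recognized character with its confidence.
Char = Tuple[str, float]


def filter_single(last: Char, first: Char, lower: float, upper: float) -> List[Char]:
    """Return the characters that survive filtering, in reading order."""
    if last[1] > upper and first[1] > upper:
        # Both confidences exceed the upper threshold: retain both characters.
        return [last, first]
    if last[1] < lower and first[1] < lower:
        # Both confidences fall below the lower threshold: remove both.
        return []
    # Otherwise remove the character with the lower confidence.
    # Assumption: this also covers the mixed case the text leaves open,
    # and a tie keeps `last`.
    return [last] if last[1] >= first[1] else [first]
```

Keeping both characters when both are trusted reflects the possibility that the two sub-images recognized two genuinely different adjacent characters rather than the same character twice.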
[0204] The character recognition method and electronic device provided in this application segment the target text image to be recognized based on a segmentation length threshold and an overlap length preset value to obtain multiple text sub-images. The overlap length preset value is the length of the overlapping area between any two adjacent text sub-images. The length of the text sub-image is greater than the overlap length preset value but not greater than the segmentation length threshold. A character recognition model is used to perform character recognition on each text sub-image to obtain the character recognition information corresponding to each text sub-image. The overlapping characters of each pair of adjacent text sub-images are matched to obtain a matching result. The overlapping characters of two adjacent text sub-images are characters identified separately from the overlapping areas of the two adjacent text sub-images. Based on the matching result, the character recognition information is concatenated to obtain the character recognition information of the target text image. In this way, based on the segmentation length threshold and the overlap length preset value, the target text image containing longer text is divided to obtain multiple text sub-images containing shorter text, thereby solving the problem that the character recognition model has difficulty accurately recognizing images containing longer text.
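One way to satisfy the length constraints summarized above is a sliding window whose stride equals the segmentation length threshold minus the overlap length preset value. The sketch below is an assumption-laden illustration, not the segmentation procedure of the application: the function name `split_spans` and its parameters are hypothetical, lengths are taken to be pixel counts along the text direction, and only start and end offsets are computed (no image library is involved).

```python
from typing import List, Tuple


def split_spans(length: int, threshold: int, overlap: int) -> List[Tuple[int, int]]:
    """Return (start, end) offsets of sub-images along the text direction.

    Every span is at most `threshold` long, and adjacent spans share exactly
    `overlap` units. When more than one span is produced, the last span is
    longer than `overlap`.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    if not 0 < overlap < threshold:
        raise ValueError("need 0 < overlap < threshold")
    stride = threshold - overlap
    spans = []
    start = 0
    # Emit full-length windows while one more window is still needed after them.
    while start + threshold < length:
        spans.append((start, start + threshold))
        start += stride
    # The final window takes whatever remains (a whole short image yields one span).
    spans.append((start, length))
    return spans
```

For example, a target text image of length 1000 with a threshold of 400 and an overlap of 50 yields the spans (0, 400), (350, 750) and (700, 1000): two sub-images at the threshold length and one shorter sub-image that is still longer than the overlap.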
[0205] Figure 7 shows a schematic diagram of the structure of an electronic device 7000. As shown in Figure 7, the electronic device 7000 includes a processor 7010 and a memory 7020, and optionally may also include a power supply 7030, a display unit 7040, and an input unit 7050.
[0206] The processor 7010 is the control center of the electronic device 7000. It connects various components through various interfaces and lines, and performs various functions of the electronic device 7000 by running or executing software programs and / or data stored in the memory 7020, thereby performing overall monitoring of the electronic device 7000.
[0207] In this embodiment, when the processor 7010 calls the computer program stored in the memory 7020, it executes the steps in the above embodiments.
[0208] Optionally, the processor 7010 may include one or more processing units; preferably, the processor 7010 may integrate an application processor and a modem processor, wherein the application processor mainly handles the operating system, user interface, and applications, and the modem processor mainly handles wireless communication. It is understood that the modem processor may not be integrated into the processor 7010. In some embodiments, the processor and memory may be implemented on a single chip; in some embodiments, they may also be implemented separately on independent chips.
[0209] The memory 7020 may primarily include a program storage area and a data storage area. The program storage area may store the operating system, various applications, etc.; the data storage area may store data created based on the use of the electronic device 7000, etc. In addition, the memory 7020 may include high-speed random access memory, and may also include non-volatile memory, such as at least one disk storage device, flash memory device, or other non-volatile solid-state storage device, etc.
[0210] Electronic device 7000 also includes a power supply 7030 (such as a battery) that supplies power to various components. The power supply can be logically connected to processor 7010 through a power management system, thereby enabling the management of charging, discharging, and power consumption.
[0211] The display unit 7040 can be used to display information input by the user or information provided to the user, as well as various menus of the electronic device 7000. In this embodiment of the invention, it is mainly used to display the display interfaces of various applications in the electronic device 7000, as well as text, images, and other objects displayed on the display interfaces. The display unit 7040 may include a display panel 7041. The display panel 7041 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like.
[0212] The input unit 7050 can be used to receive information such as numbers or characters input by the user. The input unit 7050 may include a touch panel 7051 and other input devices 7052. The touch panel 7051, also known as a touch screen, can collect touch operations on or near the touch panel 7051 (such as operations performed by the user using a finger, stylus, or any suitable object or accessory on or near the touch panel 7051).
[0213] Specifically, the touch panel 7051 can detect user touch operations and the signals generated by these operations, convert them into touch point coordinates, send them to the processor 7010, and receive and execute commands from the processor 7010. Furthermore, the touch panel 7051 can be implemented using various types of sensors, including resistive, capacitive, infrared, and surface acoustic wave sensors. Other input devices 7052 can include, but are not limited to, one or more of the following: physical keyboard, function keys (such as volume control buttons, power buttons, etc.), trackball, mouse, joystick, etc.
[0214] Of course, the touch panel 7051 can cover the display panel 7041. When the touch panel 7051 detects a touch operation on or near it, it transmits the information to the processor 7010 to determine the type of touch event. Subsequently, the processor 7010 provides corresponding visual output on the display panel 7041 based on the type of touch event. Although in Figure 7 the touch panel 7051 and the display panel 7041 are shown as two separate components that implement the input and output functions of the electronic device 7000, in some embodiments the touch panel 7051 and the display panel 7041 can be integrated to implement the input and output functions of the electronic device 7000.
[0215] The electronic device 7000 may also include one or more sensors, such as a pressure sensor, a gravity acceleration sensor, a proximity light sensor, etc. Of course, depending on the specific application, the electronic device 7000 may also include other components such as a camera. Since these components are not the focus of the embodiments of this application, they are not shown in Figure 7 and will not be described in detail here.
[0216] Those skilled in the art will understand that Figure 7 is merely an example of an electronic device and does not constitute a limitation on the electronic device. The electronic device may include more or fewer components than shown, or a combination of certain components, or different components.
[0217] In this embodiment of the application, a computer-readable storage medium stores a computer program thereon. When the computer program is executed by a processor, it enables a communication device to perform the various steps in the above embodiments.
[0218] For ease of description, the above sections are divided into modules (or units) according to their functions and described separately. Of course, in implementing this application, the functions of each module (or unit) can be implemented in one or more software or hardware components.
[0219] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
[0220] This application is described with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application. It will be understood that each block of the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, produce means for implementing the functions specified in one or more processes of the flowchart and / or one or more blocks of the block diagram.
[0221] Although preferred embodiments of this application have been described, those skilled in the art, upon learning the basic inventive concept, can make other changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments as well as all changes and modifications falling within the scope of this application.
[0222] Obviously, those skilled in the art can make various modifications and variations to this application without departing from the spirit and scope of this application. Therefore, if such modifications and variations fall within the scope of the claims of this application and their equivalents, this application also intends to include such modifications and variations.
Claims
1. A method for character recognition, characterized in that it comprises:
Based on a segmentation length threshold and an overlap length preset value, segmenting the target text image to be recognized to obtain multiple text sub-images, wherein the overlap length preset value is the length of the overlapping area between any two adjacent text sub-images, and the length of each text sub-image is greater than the overlap length preset value and not greater than the segmentation length threshold;
Using a character recognition model to perform character recognition on each text sub-image to obtain the character recognition information corresponding to each text sub-image;
Matching the overlapping characters of every two adjacent text sub-images to obtain a matching result, wherein the overlapping characters of two adjacent text sub-images are the characters recognized from the overlapping areas of the two adjacent text sub-images respectively;
Based on the matching result, concatenating the pieces of character recognition information to obtain the character recognition information of the target text image;
Wherein the step of matching the overlapping characters of every two adjacent text sub-images to obtain the matching result includes: for a first target text sub-image and a second target text sub-image among the text sub-images, matching the last n characters in the first character recognition information of the first target text sub-image with the first n characters in the second character recognition information of the second target text sub-image to obtain the matching result, wherein the first target text sub-image and the second target text sub-image are any two adjacent text sub-images among the text sub-images, the first target text sub-image is the text sub-image preceding the second target text sub-image, n is the maximum number of overlapping characters, and n is a positive integer;
The step of concatenating the pieces of character recognition information based on the matching result to obtain the character recognition information of the target text image includes: for the first target text sub-image and the second target text sub-image among the text sub-images, performing character filtering on the first character recognition information of the first target text sub-image and the second character recognition information of the second target text sub-image based on the number of characters in the overlapping area and the matching result, wherein the number of characters in the overlapping area indicates the number of overlapping characters contained in one overlapping area of a text sub-image; and concatenating the first character recognition information and the second character recognition information after character filtering;
The character filtering performed on the first character recognition information of the first target text sub-image and the second character recognition information of the second target text sub-image includes repeating the following steps until the character filtering is complete:
If, based on the matching result, it is determined that the last m characters in the first character recognition information are the same as the first m characters in the second character recognition information, the corresponding identical characters in the last m characters and the first m characters are deduplicated, where m is the number of characters in the overlapping area, m is a positive integer, and the initial value of the number of characters in the overlapping area is the maximum number of overlapping characters;
If it is determined that m is greater than 1 and, according to the matching result, that the last m characters and the first m characters contain both corresponding identical characters and corresponding different characters, the corresponding identical characters in the last m characters and the first m characters are deduplicated, and the corresponding different characters in the last m characters and the first m characters are filtered according to the confidence level of each character;
If it is determined that m = 1 and, according to the matching result, that the last character in the first character recognition information and the first character in the second character recognition information are different, character filtering is performed on the last character and the first character based on the confidence level of the last character and the confidence level of the first character;
If it is determined that m is greater than 1 and, according to the matching result, that the last m characters and the first m characters contain no corresponding identical characters, the number of characters in the overlapping area is reduced by one to obtain the updated number of characters in the overlapping area.
2. The method as described in claim 1, characterized in that, before segmenting the target text image to be recognized based on the segmentation length threshold and the overlap length preset value to obtain multiple text sub-images, the method further includes: performing text line detection on an original image to obtain text line regions; extracting text line images containing the text line regions from the original image; obtaining a scaling ratio based on the height of the text line image and a preset image height value; and scaling the height and length of the text line image according to the scaling ratio to obtain the target text image.
3. The method as described in claim 1, characterized in that segmenting the target text image to be recognized based on the segmentation length threshold and the overlap length preset value to obtain multiple text sub-images includes: dividing the target text image, based on the segmentation length threshold and the overlap length preset value, to obtain at least one text sub-image whose length equals the segmentation length threshold and at least one text sub-image whose length is greater than the overlap length preset value and less than the segmentation length threshold; or dividing the target text image, based on the segmentation length threshold and the overlap length preset value, into multiple text sub-images of the same length.
4. The method according to any one of claims 1-3, characterized in that using the character recognition model to perform character recognition on each text sub-image to obtain the character recognition information corresponding to each text sub-image includes: inputting each text sub-image into the character recognition model to obtain each character in each text sub-image and the confidence level corresponding to each character, wherein the character recognition information includes characters and their corresponding confidence levels, and the confidence level is the reliability of the character recognition.
5. The method as described in claim 1, characterized in that the step of deduplicating the corresponding identical characters in the last m characters and the first m characters includes: for a first target character and a second target character that are identical, removing the first target character or removing the second target character; wherein the first target character is the i-th character among the last m characters, the second target character is the i-th character among the first m characters, i represents the character's sequence number, i is a positive integer, and i is not greater than m.
6. The method as described in claim 1, characterized in that the step of filtering the corresponding different characters in the last m characters and the first m characters based on the confidence level of each character includes: for a first target character and a second target character that differ, determining the lower of the confidence level of the first target character and the confidence level of the second target character, and removing the character corresponding to that lower confidence level; wherein the first target character is the i-th character among the last m characters, the second target character is the i-th character among the first m characters, i represents the character's sequence number, i is a positive integer, and i is not greater than m.
7. The method as described in claim 1, characterized in that the step of performing character filtering on the last character and the first character based on the confidence level of the last character and the confidence level of the first character includes: if it is determined that the confidence level of the last character and the confidence level of the first character are both greater than the upper confidence level threshold, retaining the last character and the first character; if it is determined that the confidence level of the last character and the confidence level of the first character are both less than the lower confidence level threshold, removing the last character and the first character; and if it is determined that there is a character among the last character and the first character whose confidence level is not less than the lower confidence level threshold and not greater than the upper confidence level threshold, determining the lower of the confidence level of the last character and the confidence level of the first character, and removing the character corresponding to that lower confidence level.
8. An electronic device, characterized in that it includes a processor and a memory, the memory storing computer-readable instructions that, when executed by the processor, perform the method as described in any one of claims 1-7.
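As an illustration of the height normalization recited in claim 2, the following Python sketch computes the size of the target text image from a text line image. The claim only states that the scaling ratio is obtained from the height of the text line image and the preset image height value; the specific formula used here (preset height divided by actual height, applied equally to the length so that the aspect ratio is preserved) is an assumption, as are the function and parameter names.

```python
from typing import Tuple


def scaled_size(height: int, length: int, preset_height: int) -> Tuple[int, int]:
    """Return (new_height, new_length) after scaling to the preset height.

    Assumption: scaling ratio = preset_height / height, applied to both the
    height and the length of the text line image.
    """
    if height <= 0 or length <= 0 or preset_height <= 0:
        raise ValueError("all sizes must be positive")
    # Multiply before dividing so exact ratios do not pick up rounding error.
    new_length = max(1, round(length * preset_height / height))
    return preset_height, new_length
```

Under this assumption, a text line image of height 64 and length 1000 with a preset image height of 32 becomes a target text image of height 32 and length 500.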