A form information recognition method, device and related equipment
By performing table detection and text box image segmentation on form images, combined with text recognition and target field retrieval, the problem of low accuracy in form information recognition in existing technologies is solved, and efficient and accurate form information extraction is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- AISINO CORPORATION
- Filing Date
- 2022-09-23
- Publication Date
- 2026-06-26

Figure CN115565197B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of image recognition technology, and in particular to a method, apparatus and related equipment for recognizing form information.
Background Technology
[0002] With the development of image recognition technology, text information in many kinds of forms is increasingly extracted by image recognition across various industries. When recognizing images of invoices or other forms, it is often necessary to detect, recognize, and extract the text information in the images for later organization and summarization. Currently, the industry commonly extracts text information from forms using template-based recognition methods designed for specific general-purpose forms or invoices. These methods have low accuracy when extracting information containing names, letters, or other special characters or fields, so the results often require manual verification. This significantly reduces the efficiency of form information extraction and results in a poor user experience.
Summary of the Invention
[0003] In view of this, embodiments of this application provide a form information recognition method, apparatus, and related equipment to at least partially solve the above-mentioned problems.
[0004] In a first aspect, embodiments of this application provide a method for identifying form information, including:
[0005] Detecting the table included in the form image to be recognized to determine the position information of the table;
[0006] Determining, based on the position information of the table, the text box image containing the target text information;
[0007] Performing text recognition on the text box image to determine the recognition result of the text information;
[0008] Performing target field retrieval on the recognition result of the text information to determine a candidate list of target fields; and
[0009] Determining, based on the candidate list of the target fields and the position information of the table, the recognition result of the target information contained in the form image to be recognized.
[0010] Optionally, in one implementation of the embodiments of this application, detecting the table included in the form image to be recognized and determining the position information of the table includes:
[0011] Detecting the table lines in the form image to be recognized;
[0012] Determining the connected regions in the form image to be recognized based on the detection result; and
[0013] Fitting the connected regions to determine the position information of the table.
[0014] Optionally, in one implementation of the embodiments of this application, performing text recognition on the text box image to determine the recognition result of the text information includes:
[0015] Performing single-character recognition on the characters in the text box image; and
[0016] Determining the recognition result of the text information based on the single-character recognition result and the text sequence features.
[0017] Optionally, in one implementation of the embodiments of this application, performing target field retrieval on the recognition result of the text information to determine a candidate list of target fields includes:
[0018] Retrieving and matching the recognition result of the text information using regular expression matching to determine the candidate list of the target field.
[0019] Optionally, in one implementation of the embodiments of this application, before the text box image of the target text information is determined based on the position information of the table in the form image to be recognized, the method further includes: binarizing the form image to be recognized to obtain a binarized image of the form image to be recognized.
[0020] Optionally, in one implementation of the embodiments of this application, before the text box image of the target text information is determined based on the position information of the table in the form image to be recognized, the method further includes:
[0021] Preprocessing the form image to be recognized, the preprocessing including at least one of image correction processing and image smoothing processing.
[0022] Optionally, in one implementation of the embodiments of this application, the image correction processing includes: correcting the form image to be recognized using a projection correction algorithm; and the image smoothing processing includes: processing the form image to be recognized using median filtering.
[0023] In a second aspect, based on the form information recognition method described in the first aspect of this application, embodiments of this application further provide a form information recognition device, comprising:
[0024] The detection module is used to detect the tables included in the form image to be identified and determine the position information of the tables;
[0025] The determination module is used to determine the text box image containing the target text information based on the position information of the table;
[0026] The recognition module is used to perform text recognition on the text box image and determine the recognition result of the text information;
[0027] The retrieval module is used to perform target field retrieval on the recognition results of the text information and determine a candidate list of the target fields;
[0028] The acquisition module is used to acquire the recognition result of the target field contained in the form image to be recognized based on the candidate list of the target field and the table position information.
[0029] Optionally, in one embodiment of this application, the form information recognition device further includes a preprocessing module, configured to preprocess the form image to be recognized before the text box image of the target text information is determined based on the position information of the table in the form image to be recognized. The preprocessing includes at least one of image correction processing and image smoothing processing.
[0030] In a third aspect, embodiments of this application further provide a storage medium storing a computer program which, when executed by a processor, implements any one of the form information recognition methods described in the first aspect of this application.
[0031] This application provides a form information recognition method, apparatus, and related equipment. The form information recognition method includes: detecting tables included in the form image to be recognized and determining the position information of the tables; determining text box images containing target text information based on the position information of the tables; performing text recognition on the text box images and determining the recognition result of the text information; performing target field retrieval on the recognition result of the text information and determining a candidate list of target fields; and determining the recognition result of the target information contained in the form image to be recognized based on the candidate list of target fields and the table position information. By segmenting the form image to be recognized into text box images, taking the text fields recognized in each text box image as target candidate fields, and combining these with the table position information to determine the target recognition result, the method adapts effectively to recognizing text information in many different form images. This helps ensure accurate recognition and extraction of names, letters, other special characters, and similar fields from form information. Furthermore, the implementation is simple and reliable, and the method is widely applicable.
Attached Figure Description
[0032] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments recorded in the embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings.
[0033] Figure 1 is a flowchart of a form information recognition method provided in an embodiment of this application;
[0034] Figure 2 is a schematic structural diagram of a form information recognition device provided in an embodiment of this application.
Detailed Implementation
[0035] To enable those skilled in the art to better understand the technical solutions in the embodiments of this application, the technical solutions in the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art should fall within the protection scope of the embodiments of this application.
[0036] It should be understood that the steps described in the method embodiments of this application may be performed in different orders and / or in parallel. Furthermore, the method embodiments may include additional steps and / or omit the steps shown. The scope of this application is not limited in this respect.
[0037] Example 1
[0038] This embodiment of the application provides a form information recognition method. As shown in Figure 1, which is a flowchart of the form information recognition method provided in this embodiment, the form information recognition method comprises:
[0039] S101. Detect the tables included in the form image to be recognized and determine the position information of the tables. Because this embodiment recognizes form information on the basis of table position information, it applies to many types of forms rather than being limited to specific form images, which ensures the applicability of the form information recognition method provided in this embodiment.
[0040] Optionally, in one implementation of the embodiments of this application, the tables included in the form image to be recognized can be detected, and their position information determined, using the YOLO (You Only Look Once) model. YOLO is a one-stage object detection model that treats detection as a regression problem, using a single end-to-end network to go directly from the original input image to object positions and categories. The model uses a Convolutional Neural Network (CNN) to extract image features and fully connected (FC) layers to predict object positions and category probabilities, ultimately producing bounding boxes containing text information, i.e., the table position images in this embodiment. This allows the position information of the table to be determined quickly and accurately.
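The last step of such a detector, turning the network's regression output into table position boxes, can be sketched as follows. This is a minimal illustration only: the network itself is not shown, and the prediction layout (normalized center x, center y, width, height, confidence), the function name `decode_detections`, and the 0.5 confidence threshold are assumptions rather than details from this application.

```python
from typing import List, Tuple

# Hypothetical prediction layout: (cx, cy, w, h, confidence), where the box
# center and size are normalized to [0, 1] relative to the image size.
Prediction = Tuple[float, float, float, float, float]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def decode_detections(preds: List[Prediction], img_w: int, img_h: int,
                      conf_threshold: float = 0.5) -> List[Box]:
    """Convert normalized center-format predictions into pixel corner boxes."""
    boxes: List[Box] = []
    for cx, cy, w, h, conf in preds:
        if conf < conf_threshold:
            continue  # discard low-confidence predictions
        # Convert center/size to corners and clamp to the image bounds.
        left = max(0, round((cx - w / 2) * img_w))
        top = max(0, round((cy - h / 2) * img_h))
        right = min(img_w, round((cx + w / 2) * img_w))
        bottom = min(img_h, round((cy + h / 2) * img_h))
        boxes.append((left, top, right, bottom))
    return boxes


if __name__ == "__main__":
    preds = [(0.5, 0.5, 0.8, 0.4, 0.9), (0.1, 0.1, 0.05, 0.05, 0.2)]
    print(decode_detections(preds, img_w=1000, img_h=500))
```

Clamping keeps boxes that extend past the image edge usable as crop regions for the later text box step.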
[0041] S102. Based on the position information of the table, determine the text box image containing the target text information.
[0042] In this embodiment, based on the determined table position information, the form image to be recognized is divided into bounding boxes containing text information or text-like image content. These bounding boxes are further filtered according to the table position to keep those containing the text information to be recognized, and the retained bounding boxes are taken as the text box images containing the target text information. This removes irrelevant regions from the form image to be recognized, reducing the amount of data to process, improving the efficiency of text box detection, narrowing the image area that requires text recognition, and ultimately improving the accuracy of form information recognition.
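The filtering by table position can be sketched as keeping only those candidate text boxes whose center lies inside a detected table region. Boxes are assumed to be (left, top, right, bottom) pixel tuples in image coordinates; the center-in-region rule and the names `center_inside` and `boxes_in_tables` are illustrative assumptions, not a criterion stated in this application.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), image coordinates


def center_inside(box: Box, region: Box) -> bool:
    """Return True if the center point of `box` lies within `region`."""
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    return region[0] <= cx <= region[2] and region[1] <= cy <= region[3]


def boxes_in_tables(text_boxes: List[Box], tables: List[Box]) -> List[Box]:
    """Keep the text boxes that fall inside at least one table region."""
    return [b for b in text_boxes if any(center_inside(b, t) for t in tables)]


if __name__ == "__main__":
    tables = [(100, 100, 500, 300)]
    candidates = [(120, 110, 200, 130), (10, 10, 60, 30), (480, 290, 560, 320)]
    print(boxes_in_tables(candidates, tables))
```

Testing the center rather than full containment tolerates text boxes that slightly overhang a table border.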
[0043] Optionally, in one implementation of the embodiments of this application, detecting the tables included in the form image to be recognized and determining their position information includes: detecting table lines in the form image to be recognized, determining connected regions in the form image based on the detection result, and fitting the connected regions to determine the position information of the tables. In the application scenarios of this embodiment, tables in form images generally consist of table lines such as horizontal lines, vertical lines, and implicit (invisible) horizontal and vertical lines. Therefore, in this embodiment, the table lines are first detected. Then, based on the detected horizontal and vertical lines, image analysis methods such as geometric analysis are used to extract, from the form image to be recognized, the connected regions of the different tables together with the content filled into them. Finally, the lines are fitted (for example, as polylines): free line segments are merged into frame lines according to geometric information such as distance and tilt angle, so that the table information in the form image to be recognized can be accurately determined.
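A minimal sketch of these steps, assuming a binarized image stored as a list of rows with 1 for ink and 0 for background: pixels lying on long horizontal or vertical runs are marked as table-line pixels, 4-connected regions of that line mask are found by breadth-first search, and the bounding box of each region is used as a table position. The run-length threshold `min_run` and the use of a plain bounding box in place of the segment-merging fit described above are simplifying assumptions.

```python
from collections import deque
from typing import List, Tuple

Grid = List[List[int]]           # binarized image: 1 = ink, 0 = background
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive


def line_mask(img: Grid, min_run: int) -> Grid:
    """Mark pixels on horizontal or vertical ink runs of length >= min_run."""
    h, w = len(img), len(img[0])
    mask = [[0] * w for _ in range(h)]
    for y in range(h):  # horizontal runs
        x = 0
        while x < w:
            if img[y][x]:
                start = x
                while x < w and img[y][x]:
                    x += 1
                if x - start >= min_run:
                    for k in range(start, x):
                        mask[y][k] = 1
            else:
                x += 1
    for x in range(w):  # vertical runs
        y = 0
        while y < h:
            if img[y][x]:
                start = y
                while y < h and img[y][x]:
                    y += 1
                if y - start >= min_run:
                    for k in range(start, y):
                        mask[k][x] = 1
            else:
                y += 1
    return mask


def table_boxes(img: Grid, min_run: int = 5) -> List[Box]:
    """Bounding boxes of the 4-connected regions of the table-line mask."""
    mask = line_mask(img, min_run)
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            queue = deque([(x0, y0)])
            left, top, right, bottom = x0, y0, x0, y0
            while queue:  # breadth-first flood fill of one region
                x, y = queue.popleft()
                left, right = min(left, x), max(right, x)
                top, bottom = min(top, y), max(bottom, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return boxes
```

Short isolated strokes such as character pixels never reach `min_run`, so they are excluded from the line mask and do not distort the table boxes.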
[0044] Optionally, in one implementation of the embodiments of this application, before the text box image of the target text information is determined based on the position information of the table in the form image to be recognized, the form information recognition method further includes: preprocessing the form image to be recognized, the preprocessing including at least one of image correction processing and image smoothing processing. In specific application scenarios of this embodiment, the input form image may be tilted or folded, or its text may be unclear. Recognizing form information directly from such an image can seriously affect the accuracy and completeness of the recognized information. In this case, the input form image can first be preprocessed by image correction processing, image smoothing processing, or both, to correct rotation and to improve clarity where noisy pixels are present, ultimately improving the accuracy of form information recognition.
[0045] Optionally, in a preferred implementation of the embodiments of this application, the image correction processing includes: correcting the form image to be recognized using a projection correction algorithm. Specifically, the projection correction algorithm relies on the following characteristic: the projection of the form image to be recognized is longest along its normal direction and shortest along its horizontal direction, and this characteristic is exploited through the Radon transform. The projection of a bivariate function $f(x,y)$ is defined as its line integral in a given direction. For example, the line integral of $f(x,y)$ in the vertical direction is its projection onto the x axis, the line integral in the horizontal direction is its projection onto the y axis, and the line integral along the $y'$ direction is its projection onto the $x'$ axis. Projections can be taken at any angle. In general, the Radon transform of $f(x,y)$ is the line integral parallel to the $y'$ axis, calculated as shown in equations (1) and (2), where $(x', y')$ are the coordinates of the original image point $(x, y)$ after rotation by an angle $\theta$, and $R_\theta(x')$ is the Radon transform, i.e., the line integral parallel to the $y'$ axis.
[0046] $$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \qquad (1)$$
[0047] $$R_\theta(x') = \int_{-\infty}^{\infty} f\left(x'\cos\theta - y'\sin\theta,\; x'\sin\theta + y'\cos\theta\right) dy' \qquad (2)$$
[0048] Based on this projection correction method, projection correction can be applied clearly and accurately to all pixels in the form image to be recognized, thereby improving the accuracy of the subsequent recognition of the form image.
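The Radon-based correction described above can be approximated discretely to estimate the tilt of a form. The sketch below assumes the ink pixels of a binarized image are available as (x, y) coordinates; for each candidate angle θ = 90° + skew it bins the pixels by x′ = x cos θ + y sin θ (a discrete R_θ) and returns the skew whose projection is most sharply peaked. The sharpness score (sum of squared bin counts), the ±10° search range, and the 0.5° step are assumptions, since the source does not state how the best angle is chosen; the final rotation of the image is not shown.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple

Point = Tuple[int, int]  # (x, y) of an ink pixel, image coordinates


def radon_projection(points: Iterable[Point], theta_deg: float) -> Counter:
    """Discrete R_theta: count ink pixels per integer bin of
    x' = x*cos(theta) + y*sin(theta), i.e. sums along lines parallel to y'."""
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    return Counter(round(x * c + y * s) for x, y in points)


def estimate_skew(points: List[Point], max_deg: float = 10.0,
                  step: float = 0.5) -> float:
    """Return the skew (degrees) whose projection is most sharply peaked.

    At theta = 90 + skew the projection runs along the (rotated) text lines,
    so when skew matches the document tilt each line collapses into few bins.
    """
    best_skew, best_score = 0.0, -1
    n = int(round(2 * max_deg / step))
    for i in range(n + 1):
        skew = -max_deg + i * step
        profile = radon_projection(points, 90.0 + skew)
        score = sum(v * v for v in profile.values())  # peakiness criterion
        if score > best_score:
            best_skew, best_score = skew, score
    return best_skew


if __name__ == "__main__":
    tilt = math.tan(math.radians(5))
    pts = [(x, 20 + round(x * tilt)) for x in range(200)]
    pts += [(x, 60 + round(x * tilt)) for x in range(200)]
    print(estimate_skew(pts))
```

In the demo, two synthetic text lines slope by 5° in image coordinates, so the reported angle should be close to 5.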
[0049] Preferably, in one implementation of the embodiments of this application, the image smoothing processing includes: processing the form image to be recognized using median filtering. Median filtering is a non-linear signal processing technique based on order-statistics theory that can effectively suppress noise. Its basic principle is to replace the value of each pixel in a digital image or digital sequence with the median of the pixel values in its neighborhood, bringing pixel values closer to their true values and thereby eliminating isolated noise points. Specifically, a two-dimensional sliding template is used: the pixels within the template are sorted by value to produce a monotonically increasing (or decreasing) data sequence, the median of this sequence is output as the filtered value, and sliding the template across the image completes the median filtering. The calculation formula is given in equation (3):
[0050] $$g(x,y) = \operatorname{median}\{\, f(x-k,\; y-l) \mid (k,l) \in W \,\} \qquad (3)$$
[0051] Here, f(x,y) and g(x,y) are the original image and the processed image, respectively, and W is the two-dimensional template. In one implementation of this application, W is typically a 3×3 or 5×5 pixel region, which gives good smoothing while limiting computation and improving the efficiency of image smoothing. W can also take other shapes, such as circular or cross-shaped, and users can adjust its parameters to suit the actual scenario to ensure good smoothing for different kinds of images. This application does not limit this aspect.
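Equation (3) with a square template W can be sketched in plain Python, assuming a grayscale image stored as a list of rows. Restricting the window at the image border (rather than padding the image) is an assumption, as is taking the upper middle value when a truncated border window holds an even number of pixels.

```python
from typing import List

Grid = List[List[int]]  # grayscale image as a list of rows


def median_filter(img: Grid, size: int = 3) -> Grid:
    """g(x, y) = median of f over a size x size square window W at (x, y).

    Near the border only the part of the window inside the image is used;
    for an even number of pixels the upper middle value is taken.
    """
    if size % 2 == 0:
        raise ValueError("window size must be odd")
    r = size // 2
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = sorted(
                img[yy][xx]
                for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1))
            )
            out[y][x] = window[len(window) // 2]  # middle of the sorted values
    return out
```

This shows the two properties described above: an isolated noise pixel is replaced by its neighbors' value, while a sharp step edge, such as a table line border, is preserved rather than blurred.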
[0052] Optionally, in one implementation of the embodiments of this application, before the text box image of the target text information is determined based on the position information of the table in the form image to be recognized, the form image to be recognized is binarized to obtain its binarized image. Binarization converts the form image into a black-and-white image, increasing the contrast between the characters or table lines and the background so that the subsequent recognition of the tables and characters in the form image is clearer and more accurate.
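The source does not name a thresholding method; as one common choice, the sketch below swaps in Otsu's method, which picks the global threshold that maximizes the between-class variance of the gray-level histogram. It assumes an 8-bit grayscale image stored as a list of rows with dark text on a light background, and maps ink to 1 and background to 0.

```python
from typing import List, Optional

Grid = List[List[int]]  # 8-bit grayscale image, values 0-255


def otsu_threshold(img: Grid) -> int:
    """Return the gray level t maximizing the between-class variance when
    pixels <= t form one class and pixels > t form the other."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * c for i, c in enumerate(hist))
    sum_bg, w_bg = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_bg += hist[t]  # number of pixels with value <= t
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance scaled by total**2 (same argmax).
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(img: Grid, threshold: Optional[int] = None) -> Grid:
    """Map dark pixels (ink) to 1 and light pixels (background) to 0."""
    t = otsu_threshold(img) if threshold is None else threshold
    return [[1 if v <= t else 0 for v in row] for row in img]
```

A fixed threshold can be passed instead when scanning conditions are controlled; Otsu's method adapts to the overall brightness of each image.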
[0053] Preferably, in one implementation of the embodiments of this application, the form image to be recognized can be preprocessed by first performing image correction, then smoothing the corrected image, and finally binarizing the smoothed image before the tables included in the form image are detected and their position information is determined, thereby improving the accuracy of the table position information. Combining these methods improves the clarity of the tables and text characters in the form image to be recognized and helps ensure the final recognition quality of the form information.
[0054] S103. Perform text recognition on the text box image and determine the recognition result of the text information.
[0055] Optionally, in one implementation of the embodiments of this application, performing text recognition on the text box image to determine the recognition result of the text information includes: performing single-character recognition on the characters in the text box image, and determining the recognition result of the text information based on the single-character recognition result and text sequence features. Combining the single-character recognition result with text sequence features effectively ensures the readability and completeness of the recognized text information.
[0056] Preferably, in one implementation of the embodiments of this application, text recognition of the text box image uses a CTC (Connectionist Temporal Classification) model to achieve "image -> text" conversion, so that the recognition result of the text information is determined accurately and quickly. The CTC model recognizes image text using a "single-character recognition + text alignment" approach. In the single-character recognition stage, a CNN first extracts image features, and the extracted features are classified to determine the content of individual characters. A Recurrent Neural Network (RNN) then extracts text sequence features, and the alignment stage compresses redundant (repeated) character predictions so that the predicted text corresponds one-to-one with the text in the detected text image. This improves the efficiency of text recognition while ensuring its accuracy.
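The alignment step described above can be illustrated with best-path (greedy) CTC decoding, which maps per-frame class scores to text by taking the most likely class in each frame, collapsing consecutive repeats, and then dropping blanks. The CNN and RNN that would produce the scores are not shown; placing the blank at index 0 and the function name `ctc_greedy_decode` are assumed conventions for this sketch.

```python
from typing import List, Sequence

BLANK = 0  # index reserved for the CTC blank symbol (assumed convention)


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]],
                      alphabet: str) -> str:
    """Best-path CTC decoding.

    frame_probs[t][k] is the score of class k at time step t; class 0 is the
    blank and class k >= 1 maps to alphabet[k - 1].
    """
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    chars: List[str] = []
    prev = BLANK
    for k in best_path:
        # Collapse repeats (k == prev) and drop blanks; a blank between two
        # identical labels lets a genuinely doubled character survive.
        if k != prev and k != BLANK:
            chars.append(alphabet[k - 1])
        prev = k
    return "".join(chars)
```

For example, a best path of A, A, blank, A, B, B, blank decodes to "AAB": the first two A frames merge, the blank separates a second A, and the repeated B frames merge.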
[0057] S104. The recognition results of the text information are used to perform target field retrieval to determine a candidate list of target fields. This application narrows the scope of target text recognition by retrieving target fields, thereby improving the efficiency and accuracy of form information recognition.
[0058] In one implementation of this application, performing target field retrieval on the recognition result of the text information to determine the candidate list of the target field includes: selecting from the recognized text content using regular expression matching to determine the candidate list of the target field. Retrieving target fields by regular expression matching is simple and reliable, and the regular expressions used can be configured flexibly, so the approach can meet the requirements of recognizing the corresponding information from form images containing many different kinds of information.
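A minimal sketch of regex-based candidate retrieval using Python's standard `re` module. The two patterns, a flight-number-like code and a two-decimal price, are hypothetical examples chosen to echo the flight itinerary example below; they are not patterns given in this application, and real deployments would configure their own. Each recognized item is assumed to be a (text, box) pair so that position information is carried along for the final step.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical, configurable field patterns (illustrative only).
FIELD_PATTERNS = {
    "flight_no": re.compile(r"\b[A-Z0-9]{2}\d{3,4}\b"),
    "price": re.compile(r"\b\d+\.\d{2}\b"),
}

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)
Recognized = Tuple[str, Box]     # (recognized text, text box position)


def retrieve_candidates(results: List[Recognized]) -> Dict[str, List[Recognized]]:
    """Build a candidate list per target field by regex-matching OCR text."""
    candidates: Dict[str, List[Recognized]] = {name: [] for name in FIELD_PATTERNS}
    for text, box in results:
        for name, pattern in FIELD_PATTERNS.items():
            for match in pattern.finditer(text):
                candidates[name].append((match.group(), box))
    return candidates
```

Because the patterns live in a plain dictionary, adding or adjusting a target field is a configuration change rather than a code change, which matches the flexibility described above.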
[0059] S105. Based on the candidate list of the target field and the table position information, determine the recognition result of the target information contained in the form image to be recognized. In this embodiment, the target information is the information filled into various formatted form images, such as the passenger identity, flight number, destination, seat position, and ticket price in a flight itinerary. This is only an illustrative example and does not limit this application.
[0060] Specifically, in one implementation of this application, the recognition result of the target information contained in the form image to be recognized is determined from the candidate list of the target field and the table position information as follows. The table boundaries in the form image are first determined from the table position information. The candidates are then filtered according to the position information of each target field in the candidate list, thereby determining the recognition result of the target information contained in the form image to be recognized. For example, the target fields obtained by target field retrieval are sorted by position from top to bottom and from left to right, and a field is accepted when its position or coordinate information lies on the table (within its boundaries) and it matches the target field type, such as letter type or array type, which completes the recognition result of the target information contained in the form image to be recognized.
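The sorting and filtering example can be sketched as follows. Candidates are assumed to be (text, box) pairs with boxes as (left, top, right, bottom) in image coordinates, the table region is a single box, and `type_check` is a hypothetical caller-supplied predicate standing in for the field-type test. Sorting by (top, left) is a simplification that assumes boxes in the same row share the same top coordinate.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), image coordinates
Candidate = Tuple[str, Box]


def select_target(candidates: List[Candidate], table: Box,
                  type_check: Callable[[str], bool]) -> List[Candidate]:
    """Sort candidates top-to-bottom, left-to-right, then keep those whose box
    center lies inside the table and whose text passes the type check."""
    ordered = sorted(candidates, key=lambda c: (c[1][1], c[1][0]))
    kept: List[Candidate] = []
    for text, box in ordered:
        cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
        inside = table[0] <= cx <= table[2] and table[1] <= cy <= table[3]
        if inside and type_check(text):
            kept.append((text, box))
    return kept
```

For instance, with a predicate requiring the text to start with a letter, a numeric string inside the table and a matching string below the table are both rejected, and the remaining fields come back in reading order.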
[0061] Optionally, in one implementation of the embodiments of this application, for some form images to be recognized, the target fields may have been printed onto a blank pre-printed form, so characters or fields can be offset from their expected positions. In this case, when the target information contained in the form image to be recognized is determined based on the candidate list of the target field and the table position information, the recognition result can be corrected according to formulas (4) and (5):
[0062] $$\mathrm{down}_y - \alpha \cdot h_{avg} \le t_{cy} \le \mathrm{up}_y + \alpha \cdot h_{avg} \qquad (4)$$
[0063] $$\mathrm{left}_x - \beta \cdot w_{avg} \le t_{cx} \le \mathrm{right}_x + \beta \cdot w_{avg} \qquad (5)$$
[0064] Here, $t_{cx}$ and $t_{cy}$ are the x- and y-coordinates of the center point of the text box image corresponding to a candidate of the target field; $\mathrm{left}_x$ and $\mathrm{right}_x$ are the x-coordinates of the left and right boundaries of the corresponding table position; $\mathrm{up}_y$ and $\mathrm{down}_y$ are the y-coordinates of the upper and lower boundaries of the corresponding table position; $w_{avg}$ and $h_{avg}$ are the average width and height of the text box images; and $\alpha$ and $\beta$ are printing deviation coefficients, which can be set flexibly from experience, for example to 0.15 and 0.2, respectively. This embodiment of the application does not limit this.
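Formulas (4) and (5) amount to widening the table cell by a fraction of the average text box size before testing the candidate's center point. In the minimal sketch below, the parameter names mirror the symbols and the example coefficients 0.15 and 0.2 are used as defaults; taking the minimum and maximum of each boundary pair, so that the check holds whichever way the y axis points, is an addition not stated in the source.

```python
def within_print_tolerance(t_cx: float, t_cy: float,
                           left_x: float, right_x: float,
                           down_y: float, up_y: float,
                           w_avg: float, h_avg: float,
                           alpha: float = 0.15, beta: float = 0.2) -> bool:
    """Check formulas (4) and (5): is the text box center inside the table
    boundaries widened by the printing-deviation margins?"""
    # Normalize boundary order so the test works for either y-axis direction.
    y_lo, y_hi = min(down_y, up_y), max(down_y, up_y)
    x_lo, x_hi = min(left_x, right_x), max(left_x, right_x)
    ok_y = y_lo - alpha * h_avg <= t_cy <= y_hi + alpha * h_avg  # formula (4)
    ok_x = x_lo - beta * w_avg <= t_cx <= x_hi + beta * w_avg    # formula (5)
    return ok_y and ok_x
```

With a cell spanning x in [100, 200] and y in [50, 80], average box width 40 and height 20, the margins are 8 pixels horizontally and 3 pixels vertically, so a center at (205, 82) is still accepted as belonging to the cell despite the printing offset.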
[0065] This application provides a form information recognition method, apparatus, and related equipment. The form information recognition method includes: detecting tables included in a form image to be recognized and determining the table's position information; determining text box images containing target text information based on the table's position information; performing text recognition on the text box images to determine the recognition result of the text information; performing target field retrieval on the text information recognition result to determine a candidate list of target fields; and determining the recognition result of the target information contained in the form image to be recognized based on the candidate list of target fields and the table's position information. By segmenting the form image to be recognized into text box images, and using the recognized text fields in each text box image as target candidate fields, and further combining this with the table's position information to determine the target recognition result in the form image to be recognized, this method can effectively adapt to the recognition of text information in various form images, thereby effectively ensuring the accuracy of recognizing and extracting characters or fields such as names, letters, or other special characters from the extracted form information. Furthermore, the implementation process is simple and reliable, and the recognition is widely applicable.
[0066] Example 2:
[0067] Based on the form information recognition method of the first aspect of this application, embodiments of this application also provide a form information recognition device. As shown in Figure 2, which is a schematic structural diagram of the form information recognition device 20 provided in Embodiment 2 of this application, the form information recognition device 20 includes:
[0068] Detection module 201 is used to detect tables included in the form image to be identified and determine the position information of the tables;
[0069] The determination module 202 is used to determine the text box image containing the target text information based on the position information of the table;
[0070] The recognition module 203 is used to perform text recognition on the text box image and determine the recognition result of the text information;
[0071] The retrieval module 204 is used to retrieve the target field from the recognition results of the text information and determine the candidate list of the target field;
[0072] The acquisition module 205 is used to acquire the recognition result of the target field contained in the form image to be recognized based on the candidate list of the target field and the table position information.
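The chaining of modules 201 to 205 can be sketched as a small composition class. Each module is modeled as a plain callable with a hypothetical signature; the class name `FormRecognitionDevice` and the argument types are illustrative assumptions, and concrete detectors or recognizers would be plugged in by the caller.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class FormRecognitionDevice:
    """Chains the five modules of device 20; each module is a plain callable."""
    detect: Callable[[Any], List[Any]]                     # 201: table positions
    determine: Callable[[Any, List[Any]], List[Any]]       # 202: text box images
    recognize: Callable[[Any], str]                        # 203: text recognition
    retrieve: Callable[[List[str]], Dict[str, List[str]]]  # 204: candidate lists
    acquire: Callable[[Dict[str, List[str]], List[Any]], Dict[str, Any]]  # 205

    def run(self, image: Any) -> Dict[str, Any]:
        tables = self.detect(image)                          # table positions
        text_boxes = self.determine(image, tables)           # crop text boxes
        texts = [self.recognize(box) for box in text_boxes]  # OCR each box
        candidates = self.retrieve(texts)                    # per-field candidates
        return self.acquire(candidates, tables)              # final result
```

Passing the table positions to both the determination and acquisition modules reflects that the table position information is used twice: once to locate text boxes and once to filter the final field candidates.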
[0073] Optionally, in one implementation of the embodiments of this application, the form information recognition device further includes a preprocessing module (not shown in the figures), configured to preprocess the form image to be recognized before the text box image of the target text information is determined based on the position information of the table in the form image to be recognized. The preprocessing includes at least one of image correction processing and image smoothing processing.
[0074] Optionally, in one implementation of the embodiments of this application, the image correction processing includes: correcting the form image to be recognized using a projection correction algorithm; and the image smoothing processing includes: processing the form image to be recognized using median filtering.
[0075] Optionally, in one implementation of the embodiments of this application, the detection module 201 is further configured to detect table lines in the form image to be recognized, determine connected regions in the form image to be recognized based on the detection result, and fit the connected regions to determine the position information of the table.
[0076] Optionally, in one implementation of the embodiments of this application, the recognition module 203 is further configured to perform single-character recognition on the characters in the text box image, and to determine the recognition result of the text information based on the single-character recognition result and text sequence features.
[0077] Optionally, in one implementation of the embodiments of this application, the retrieval module 204 is further configured to retrieve and match the recognition result of the text information using regular expression matching, to determine the candidate list of the target field.
[0078] Optionally, in one implementation of the embodiments of this application, the form information recognition device 20 further includes a binarization module (not shown in the figures), configured to binarize the form image to be recognized, before the text box image of the target text information is determined based on the position information of the table in the form image, to obtain the binarized image of the form image to be recognized.
[0079] This application provides a form information recognition device. It includes a detection module to detect tables within a form image and determine their location; a determination module to identify text box images containing target text information based on the table's location; a recognition module to perform text recognition on the text box images and determine the recognition result; a retrieval module to retrieve target fields from the recognition result and determine a candidate list of target fields; and finally, an acquisition module to determine the recognition result of the target information contained in the form image based on the candidate list of target fields and the table's location information. By segmenting the form image into text box images and using the recognized text fields in each text box image as target candidate fields, and further combining this with the table's location information, the device can effectively adapt to recognizing text information from various form images, thus ensuring the accuracy of recognizing and extracting characters or fields such as names, letters, or other special characters from the extracted form information. Furthermore, the implementation process is simple and reliable, and the device has wide applicability.
[0080] Embodiment 3
[0081] Based on the form information recognition method of Embodiment 1 of this application, this application also provides a storage medium storing a computer program thereon. When the program is executed by a processor, it implements the form information recognition method as described in any of the above method embodiments of this application. The form information recognition method includes, but is not limited to:
[0082] Detecting the table in the form image to be recognized to determine the position information of the table;
[0083] Determining, based on the position information of the table, the text box image containing the target text information;
[0084] Performing text recognition on the text box image to determine the recognition result of the text information;
[0085] Performing target field retrieval on the recognition result of the text information to determine the candidate list of the target field;
[0086] Determining, based on the candidate list of the target field and the position information of the table, the recognition result of the target information contained in the form image to be recognized.
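The correction formula used in the last step is not reproduced in this text, so the sketch below is only an assumed reading of its variables. Under that reading, a candidate is accepted when the center of its text box falls inside the expected table cell, widened on each side by the printing deviation coefficient times the average text box width (horizontally) or height (vertically). The names `within_cell` and `pick_field`, the default coefficient of 0.5, and averaging over the candidates' own boxes are all hypothetical choices, not the formula of this application.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def within_cell(box: Box, cell: Box, avg_w: float, avg_h: float, k: float) -> bool:
    """Assumed test: the box center lies inside the cell widened by k * average box size."""
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    left, top, right, bottom = cell
    return (left - k * avg_w <= cx <= right + k * avg_w
            and top - k * avg_h <= cy <= bottom + k * avg_h)


def pick_field(candidates: List[Tuple[str, Box]], cell: Box, k: float = 0.5) -> Optional[str]:
    """Return the first candidate text whose box falls in the expected table cell."""
    if not candidates:
        return None
    # Hypothetical choice: average size taken over the candidate boxes themselves
    avg_w = sum(b[2] - b[0] for _, b in candidates) / len(candidates)
    avg_h = sum(b[3] - b[1] for _, b in candidates) / len(candidates)
    for text, box in candidates:
        if within_cell(box, cell, avg_w, avg_h, k):
            return text
    return None
```

The tolerance band, presumably the role of a "printing deviation" coefficient, lets a field printed slightly outside its cell still be assigned to that cell. Matches lying far from the cell are rejected.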
[0087] This application has now described specific embodiments of the subject matter. Other embodiments are within the scope of the appended claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve the desired result. Furthermore, the processes depicted in the drawings do not necessarily require a specific or sequential order to achieve the desired result. In some embodiments, multitasking and parallel processing can be advantageous.
[0088] In the 1990s, improvements to a technology could be clearly distinguished as either hardware improvements (e.g., improvements to the circuit structure of diodes, transistors, switches, etc.) or software improvements (improvements to the methodology). However, with technological advancements, many methodological improvements today can be considered direct improvements to the hardware circuit structure. Designers almost always obtain the corresponding hardware circuit structure by programming the improved methodology into the hardware circuit. Therefore, it cannot be said that a methodological improvement cannot be implemented using hardware physical modules. For example, a Programmable Logic Device (PLD) (such as a Field Programmable Gate Array (FPGA)) is such an integrated circuit whose logic function is determined by the user programming the device. Designers can program and "integrate" a digital system onto a PLD themselves, without needing chip manufacturers to design and manufacture dedicated integrated circuit chips. Furthermore, nowadays, instead of manually manufacturing integrated circuit chips, this programming is mostly implemented using "logic compiler" software. Similar to the software compiler used in program development, the original code before compilation must be written in a specific programming language, called a Hardware Description Language (HDL). There are many HDLs, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language). Currently, the most commonly used are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. 
Those skilled in the art should understand that by simply performing some logic programming on the method flow using one of these hardware description languages and programming it into an integrated circuit, the hardware circuit implementing the logical method flow can be easily obtained.
[0089] The controller can be implemented in any suitable manner. For example, it can take the form of a microprocessor or processor and a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, application-specific integrated circuits (ASICs), programmable logic controllers, and embedded microcontrollers. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller can also be implemented as part of the control logic of the memory. Those skilled in the art will also recognize that, in addition to implementing the controller in purely computer-readable program code form, the same functionality can be achieved by logically programming the method steps to make the controller take the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, and embedded microcontrollers. Therefore, such a controller can be considered a hardware component, and the means included therein for implementing various functions can also be considered as structures within the hardware component. Alternatively, the means for implementing various functions can be considered as both software modules implementing the method and structures within the hardware component.
[0090] The systems, devices, modules, or units described in the above embodiments can be implemented by computer chips or entities, or by products with certain functions. A typical implementation device is a computer. Specifically, a computer can be, for example, a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or any combination of these devices.
[0091] For ease of description, the above device is described in terms of separate functional units. Of course, when this application is implemented, the functions of the units can be implemented in one or more pieces of software and/or hardware.
[0092] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
[0093] It should also be noted that the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Unless otherwise specified, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes that element.
[0095] This application can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. This application can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules can be located in both local and remote computer storage media, including storage devices.
[0096] The embodiments in this specification are described in a progressive manner. For identical or similar parts, the embodiments can be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiments are basically similar to the method embodiments and are therefore described relatively briefly; for relevant parts, refer to the description of the method embodiments.
[0097] The above are merely embodiments of this application and are not intended to limit the scope of this application. Various modifications and variations can be made to this application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the scope of the claims of this application.
Claims
1. A method for recognizing form information, characterized in that it comprises: detecting the table included in the form image to be recognized to determine the position information of the table; determining, based on the position information of the table, the text box image containing the target text information; performing text recognition on the text box image to determine the recognition result of the text information; performing target field retrieval on the recognition result of the text information to determine a candidate list of the target field; and determining, based on the candidate list of the target field and the position information of the table, the recognition result of the target information contained in the form image to be recognized, wherein this step specifically corrects the recognition result according to the following formula, in which the symbols denote, in turn: the x- and y-coordinates of the center point of the text box image corresponding to the candidate category of the target field; the x-coordinates of the left and right boundaries of the corresponding table position; the y-coordinates of the upper and lower boundaries of the corresponding table position; the average width and height of the text box images; and the printing deviation coefficient.
2. The form information recognition method according to claim 1, characterized in that the step of detecting the table included in the form image to be recognized and determining the position information of the table comprises: detecting table lines in the form image to be recognized; determining connected regions in the form image to be recognized based on the detection result; and fitting the connected regions to determine the position information of the table.
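The three steps of claim 2 can be sketched as follows, assuming a binary ink mask (1 = ink). First, table lines are detected by keeping only horizontal or vertical ink runs at least `min_len` pixels long. Next, 8-connected regions of the remaining line pixels are labeled with a breadth-first search. Finally, each region is "fitted" with its axis-aligned bounding rectangle. The run-length line detector, the rectangle fit, and the function names are illustrative assumptions; the claim does not specify them.

```python
from collections import deque
from typing import List, Tuple

Mask = List[List[int]]  # 1 = ink pixel, 0 = background


def line_mask(ink: Mask, min_len: int) -> Mask:
    """Keep only pixels on horizontal or vertical ink runs of at least min_len (table lines)."""
    h, w = len(ink), len(ink[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):                      # horizontal runs
        x = 0
        while x < w:
            if ink[y][x]:
                start = x
                while x < w and ink[y][x]:
                    x += 1
                if x - start >= min_len:
                    for i in range(start, x):
                        out[y][i] = 1
            else:
                x += 1
    for x in range(w):                      # vertical runs
        y = 0
        while y < h:
            if ink[y][x]:
                start = y
                while y < h and ink[y][x]:
                    y += 1
                if y - start >= min_len:
                    for j in range(start, y):
                        out[j][x] = 1
            else:
                y += 1
    return out


def table_boxes(lines: Mask) -> List[Tuple[int, int, int, int]]:
    """Label 8-connected regions and fit each with its bounding rectangle (x0, y0, x1, y1)."""
    h, w = len(lines), len(lines[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if not lines[sy][sx] or seen[sy][sx]:
                continue
            seen[sy][sx] = True
            queue = deque([(sx, sy)])
            x0 = x1 = sx
            y0 = y1 = sy
            while queue:                    # breadth-first flood fill of one region
                x, y = queue.popleft()
                x0, x1, y0, y1 = min(x0, x), max(x1, x), min(y0, y), max(y1, y)
                for dx in (-1, 0, 1):
                    for dy in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if 0 <= nx < w and 0 <= ny < h and lines[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes
```

Short runs, such as most character strokes, are discarded before labeling, so isolated text inside a cell does not produce a spurious table region.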
3. The form information recognition method according to claim 1, characterized in that the step of performing text recognition on the text box image to determine the recognition result of the text information comprises: performing single-character recognition on the characters in the text box image; and determining the recognition result of the text information based on the single-character recognition result and text sequence features.
4. The form information recognition method according to claim 1, characterized in that the step of performing target field retrieval on the recognition result of the text information to determine a candidate list of the target field comprises: performing retrieval matching on the recognition result of the text information by means of regular expression matching to determine the candidate list of the target field.
5. The form information recognition method according to claim 1, characterized in that, before the text box image of the target text information is determined based on the position information of the table in the form image to be recognized, the method further comprises: binarizing the form image to be recognized to obtain a binarized image of the form image to be recognized.
6. The form information recognition method according to claim 1, characterized in that determining the text box image of the target text information based on the position information of the table in the form image to be recognized further comprises: preprocessing the form image to be recognized, wherein the preprocessing comprises at least one of image correction processing and image smoothing processing.
7. The form information recognition method according to claim 6, characterized in that the image correction processing comprises correcting the form image to be recognized by using a projection correction algorithm, and the image smoothing processing comprises processing the form image to be recognized by using median filtering.
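Median filtering, named in claim 7 as the smoothing technique, replaces each pixel with the median of its neighborhood. This removes isolated salt-and-pepper noise while preserving step edges better than a mean filter does. Structures thinner than the window, such as 1-pixel lines, can be erased, however, so the window size has to be chosen with the line width in mind. The sketch below assumes a 3x3 window (radius 1) with edge replication at the borders; the function name and these parameters are illustrative.

```python
from statistics import median
from typing import List

Image = List[List[int]]  # grayscale pixels, 0 .. 255


def median_filter(gray: Image, radius: int = 1) -> Image:
    """Replace each pixel by the median of its (2r+1)x(2r+1) window, replicating edges."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clamp coordinates so border pixels reuse their nearest in-image neighbors
            window = [
                gray[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in range(-radius, radius + 1)
                for dx in range(-radius, radius + 1)
            ]
            out[y][x] = int(median(window))  # odd window size, so median is a pixel value
    return out
```

For example, a single dark noise pixel on a white background is removed, while a clean vertical edge between a dark and a light region passes through unchanged.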
8. A form information recognition device, characterized in that it comprises: a detection module, configured to detect the table included in the form image to be recognized and determine the position information of the table; a determination module, configured to determine, based on the position information of the table, the text box image containing the target text information; a recognition module, configured to perform text recognition on the text box image and determine the recognition result of the text information; a retrieval module, configured to perform target field retrieval on the recognition result of the text information and determine a candidate list of the target field; and an acquisition module, configured to acquire, based on the candidate list of the target field and the position information of the table, the recognition result of the target field contained in the form image to be recognized, and specifically configured to correct the recognition result according to the following formula, in which the symbols denote, in turn: the x- and y-coordinates of the center point of the text box image corresponding to the candidate category of the target field; the x-coordinates of the left and right boundaries of the corresponding table position; the y-coordinates of the upper and lower boundaries of the corresponding table position; the average width and height of the text box images; and the printing deviation coefficient.
9. The form information recognition device according to claim 8, characterized in that it further comprises: a preprocessing module, configured to preprocess the form image to be recognized before the determination module determines the text box image of the target text information based on the position information of the table in the form image to be recognized, wherein the preprocessing comprises at least one of image correction processing and image smoothing processing.
10. A storage medium having a computer program stored thereon, which, when executed by a processor, implements the form information recognition method as described in any one of claims 1-7.