A wireless table recognition method, device, equipment and readable storage medium
By generating text boxes and marking coordinate information using a text detection model, the problem of wireless table recognition and conversion into formatted text is solved, achieving fast, efficient and accurate wireless table recognition and conversion.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHINA CITIC BANK CO LTD
- Filing Date
- 2022-12-05
- Publication Date
- 2026-06-12
AI Technical Summary
Existing technologies cannot effectively identify and convert wireless forms into formatted text, making it impossible to handle wireless forms in daily work.
Text boxes are generated using a text detection model and their coordinate information is marked. The wireless table is converted into formatted text based on the position information of the text boxes. The position of each row and column is determined by the coordinate information of the text boxes, and erroneously merged text boxes are split.
It enables fast, efficient, and accurate recognition and conversion of wireless forms into formatted text, improving recognition accuracy and making it suitable for applications such as financial reports and bank statements.
Figure CN115761774B_ABST
Description
Technical Field
[0001] This invention relates to the field of image recognition, and more specifically to a wireless table recognition method, apparatus, device, and readable storage medium. Here, a wireless table is a table drawn without ruling lines, also called a borderless table.
Background Technology
[0002] In existing technologies, a model is generally used to identify the table lines in a table and thereby determine the position of the text in an image; the text in the image is then recognized and converted into formatted text. A wireless table, however, has no table lines for a model to locate, so the position of its text boxes cannot be determined in this way. Existing processing methods therefore cannot recognize wireless tables and convert them into formatted text. How to convert the wireless tables encountered in daily work into formatted text is an urgent problem to be solved.
Summary of the Invention
[0003] The purpose of this invention is to provide a wireless table recognition method, apparatus, device, and readable storage medium that address the above-mentioned problems.
[0004] To achieve the above objectives, the embodiments of this application provide the following technical solutions:
[0005] In a first aspect, embodiments of this application provide a wireless table recognition method, the method comprising:
[0006] acquiring image information, the image information including information about the wireless table to be identified;
[0007] sending the image information to a text detection model to obtain first information, wherein the text detection model is used to detect at least one piece of text information included in the wireless table to be identified, to generate a corresponding text box for each piece of text information, and to mark the coordinate information of each text box;
[0008] obtaining first table information based on the first information, wherein the first table information includes the position information of each text box in each row and the position information of each text box in each column of the wireless table to be identified; and
[0009] converting the wireless table to be identified into formatted text based on the first table information.
[0010] In a second aspect, embodiments of this application provide a wireless table recognition device, the device comprising:
[0011] The first acquisition module is used to acquire image information, the image information including information about the wireless table to be identified;
[0012] The detection module is used to send the image information to the text detection model to obtain first information. The text detection model is used to detect at least one text information included in the wireless table to be identified, and generate a corresponding text box for each text information, while marking the coordinate information of each text box.
[0013] The judgment module is used to obtain first table information based on the first information. The first table information includes the position information of each text box in each row and the position information of each text box in each column of the wireless table to be identified.
[0014] The conversion module is used to convert the wireless table to be identified into formatted text based on the first table information.
[0015] In a third aspect, embodiments of this application provide a wireless table recognition device, the device including a memory and a processor. The memory stores a computer program; the processor executes the computer program to implement the steps of the aforementioned wireless table recognition method.
[0016] In a fourth aspect, embodiments of this application provide a readable storage medium storing a computer program which, when executed by a processor, implements the steps of the wireless table recognition method described above.
[0017] The beneficial effects of this invention are as follows:
[0018] 1. This invention utilizes a text detection model to accurately locate text information areas in a wireless table. By generating text boxes from the text information, the position of the text box in each row and column of the wireless table to be identified can be determined based on the coordinate information of the text box, and the wireless table can be accurately restored. This provides a fast, effective and highly accurate identification method for solving the problem of wireless table recognition.
[0019] 2. This invention generates a corresponding second boundary line for each column based on the number of characters in each text box and, using the coordinate information of each character, splits text boxes that were mistakenly merged across different columns, thereby improving the accuracy of wireless table recognition.
[0020] Other features and advantages of the invention will be set forth in the following description, and will be apparent in part from the description, or may be learned by practicing embodiments of the invention. The objects and other advantages of the invention may be realized and obtained by means of the structures particularly pointed out in the written description, claims, and drawings.
Description of the Drawings
[0021] To more clearly illustrate the technical solutions of the embodiments of the present invention, the accompanying drawings used in the embodiments will be briefly introduced below. It should be understood that the following drawings only show some embodiments of the present invention and should not be regarded as a limitation on the scope. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.
[0022] Figure 1 is a schematic diagram of the wireless table recognition method described in an embodiment of the present invention.
[0023] Figure 2 is a schematic diagram of the wireless table recognition device described in an embodiment of the present invention.
[0024] Figure 3 is a schematic diagram of the structure of the wireless table recognition device described in an embodiment of the present invention.
Detailed Description
[0025] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. The components of the embodiments of the present invention described and shown in the accompanying drawings can generally be arranged and designed in various different configurations. Therefore, the following detailed description of the embodiments of the present invention provided in the accompanying drawings is not intended to limit the scope of the claimed invention, but merely to illustrate selected embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort are within the scope of protection of the present invention.
[0026] It should be noted that similar reference numerals and letters in the following figures indicate similar items; therefore, once an item is defined in one figure, it does not need to be further defined and explained in subsequent figures. Furthermore, in the description of this invention, terms such as "first," "second," etc., are used only to distinguish descriptions and should not be construed as indicating or implying relative importance.
[0027] Example 1
[0028] As shown in Figure 1, this embodiment provides a wireless table recognition method, which includes steps S1, S2, S3 and S7.
[0029] Step S1: Obtain image information, the image information including information about the wireless table to be identified;
[0030] Step S2: Send the image information to the text detection model to obtain the first information. The text detection model is used to detect at least one text information included in the wireless table to be identified, and generate a corresponding text box for each text information, while marking the coordinate information of each text box.
[0031] Step S3: Obtain first table information based on the first information. The first table information includes the position information of each text box in each row and the position information of each text box in each column of the wireless table to be identified.
[0032] Step S7: Convert the wireless table to be identified into formatted text based on the first table information.
[0033] In existing technologies, models are generally used to identify table lines in a table to determine the location information of text in an image. However, in daily work, the recognition and conversion of wireless tables is unavoidable, and existing recognition methods cannot be used to process wireless tables.
[0034] Therefore, this embodiment uses a text detection model to detect text information in the wireless table to be identified, generates a corresponding text box for each text information, and marks the coordinate information of each text box. The position area of the text information in the wireless table is located by the coordinate information of each text box. Then, the text boxes are processed according to their position information to obtain the position information of each text box in each row and the position information of each text box in each column. Determining the position information of each text box also determines the position and format of the text when the wireless table to be identified is converted into formatted text. The format can be an EXCEL format, where each row of text boxes corresponds to each row of cells in an EXCEL table, each column of text boxes corresponds to each column of cells in an EXCEL table, and the text information included in each text box corresponds to the text information included in each cell.
[0035] Based on the above features, this embodiment can quickly and efficiently identify wireless tables and accurately convert them into formatted text. This invention provides a fast, efficient and accurate method for identifying wireless tables. This method can be widely used in situations where wireless tables need to be identified and converted into formatted text for further processing in daily life or work, such as financial statements, bank statements and other wireless tables.
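The mapping from text box positions to cells described above can be illustrated with a minimal sketch. It assumes the row and column index of each text box has already been determined, and it writes CSV as a stand-in for the EXCEL format because CSV needs only the Python standard library. The function name `to_csv` and the `(row, column, text)` triple layout are hypothetical and do not come from the patent.

```python
import csv
import io


def to_csv(cells):
    """Place each text at its (row, column) position and serialize the grid.

    cells: list of (row_index, col_index, text) triples (hypothetical layout).
    """
    n_rows = max(r for r, _, _ in cells) + 1
    n_cols = max(c for _, c, _ in cells) + 1
    # Start from an empty grid so positions with no text box stay blank.
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for r, c, text in cells:
        grid[r][c] = text
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(grid)
    return buf.getvalue()


cells = [
    (0, 0, "Date"), (0, 1, "Amount"),
    (1, 0, "2022-12-05"), (1, 1, "100.00"),
]
csv_text = to_csv(cells)
```

Each row of text boxes becomes one line of the output and each column of text boxes becomes one field, mirroring the row-to-row and column-to-column correspondence described for the EXCEL format.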
[0036] In one specific embodiment of this disclosure, step S3 may further include steps S31 and S32.
[0037] Step S31: Sort the text boxes from top to bottom according to the coordinate information of each text box to obtain the second table information;
[0038] Step S32: Determine whether each text box in each row of the second table information is a text box in the same row. If so, obtain the position information corresponding to each text box in each row of the wireless table to be identified. If not, detect whether the text box is a text box in the next row.
[0039] In this embodiment, the text boxes can be sorted from top to bottom by their vertical coordinates, which gives a preliminary position for each row of text boxes. Because text boxes contain different numbers of characters, their sizes may differ, so it is necessary to check whether the text boxes provisionally placed in a row really belong to the same row. If they do, the position information of the text boxes in that row is determined. If a text box does not, it is checked against the next row and, if it belongs there, merged into that row.
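Steps S31 and S32 can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented procedure: the `Box` type, the `group_rows` function, and the placeholder same-row test `overlaps_vertically` are all hypothetical, and the image coordinate system is assumed to have y growing downward.

```python
from typing import NamedTuple


class Box(NamedTuple):
    text: str
    x0: float  # left edge
    y0: float  # top edge (y grows downward in image coordinates)
    x1: float  # right edge
    y1: float  # bottom edge


def overlaps_vertically(a, b):
    # Placeholder same-row test (an assumption): vertical extents overlap.
    return min(a.y1, b.y1) - max(a.y0, b.y0) > 0


def group_rows(boxes, same_row=overlaps_vertically):
    ordered = sorted(boxes, key=lambda b: b.y0)  # S31: top-to-bottom sort
    rows = []
    for box in ordered:
        # S32: keep the box in the current row if it passes the same-row
        # test against the last box placed there; otherwise start the next row.
        if rows and same_row(rows[-1][-1], box):
            rows[-1].append(box)
        else:
            rows.append([box])
    # Order each row left to right for readability.
    return [sorted(row, key=lambda b: b.x0) for row in rows]


boxes = [
    Box("100.00", 120, 42, 170, 58),
    Box("Date", 10, 10, 50, 26),
    Box("Amount", 120, 12, 180, 28),
    Box("2022-12-05", 10, 40, 90, 56),
]
rows = group_rows(boxes)
```

The `same_row` parameter is left pluggable so that a stricter test, such as one based on a height threshold, can replace the placeholder.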
[0040] In one specific embodiment of this disclosure, step S32 may further include steps S321, S322 and S323.
[0041] Step S321: Obtain the height information of each text box;
[0042] Step S322: Calculate the height difference between two adjacent text boxes in each row of the second table information to obtain the first calculation result;
[0043] Step S323: Based on comparing the first calculation result with the height threshold, determine whether two adjacent text boxes in each row of the second table information are text boxes in the same row.
[0044] In this embodiment, because text boxes contain different numbers of characters, their sizes may differ. The height of each text box is obtained from its height information. When the height difference between two adjacent text boxes exceeds the height threshold, the two text boxes are determined not to be on the same line.
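The height comparison of steps S321 to S323 might look like the sketch below. The patent does not state how the height threshold is chosen, so the value `8` is purely illustrative, and the helper names `height`, `similar_height`, and `adjacent_flags` are hypothetical. Boxes are assumed to be `(x0, y0, x1, y1)` tuples.

```python
def height(box):
    # S321: height of a box given as (x0, y0, x1, y1).
    _, y0, _, y1 = box
    return y1 - y0


def similar_height(a, b, height_threshold):
    # S322 + S323: the height difference is the "first calculation result";
    # a difference above the threshold means the boxes are not on the same line.
    return abs(height(a) - height(b)) <= height_threshold


def adjacent_flags(row, height_threshold):
    # Apply the test to every pair of adjacent boxes in a provisional row.
    return [similar_height(a, b, height_threshold) for a, b in zip(row, row[1:])]


row = [
    (10, 10, 50, 26),    # height 16
    (60, 11, 110, 28),   # height 17
    (120, 5, 200, 45),   # height 40
]
flags = adjacent_flags(row, height_threshold=8)  # threshold is an assumed value
```

A `False` flag marks a pair that should not be treated as lying on the same line.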
[0045] In one specific embodiment of this disclosure, step S323 may be followed by steps S324, S325 and S326.
[0046] Step S324: Generate a corresponding text box inner line for each text box, wherein the text box inner line is a horizontal center line;
[0047] Step S325: Extend the text box inner line to both sides to obtain the extension line of the text box inner line;
[0048] Step S326: Determine whether the extension line of the text box inner line intersects the horizontally adjacent text box. If they intersect, determine that the text boxes are on the same line. If they do not intersect, determine that the text boxes are not on the same line.
[0049] In this embodiment, whether two text boxes are on the same line is judged by whether the extension line of one text box's inner line intersects the adjacent text box. Used alone, this test can fail when adjacent text boxes differ greatly in height: a tall text box from the previous or next row may be crossed by the extension line and be wrongly judged to be in the same row. Because the method of the present invention first excludes adjacent text boxes whose height difference exceeds the height threshold, such misjudgments are avoided, which improves the accuracy of wireless table recognition.
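The center line test of steps S324 to S326, combined with the earlier height check, can be sketched as below. The sketch assumes axis-aligned boxes given as `(x0, y0, x1, y1)` tuples, so extending the horizontal center line to both sides reduces to checking whether its y value falls within the neighbor's vertical extent. All function names and the threshold value are hypothetical.

```python
def center_y(box):
    # S324: the inner line is the horizontal center line of the box.
    _, y0, _, y1 = box
    return (y0 + y1) / 2


def centerline_hits(box, neighbor):
    # S325 + S326: the center line extended to both sides is the line
    # y = center_y(box); it intersects the neighbor if that y lies within
    # the neighbor's vertical extent.
    _, ny0, _, ny1 = neighbor
    return ny0 <= center_y(box) <= ny1


def same_row(a, b, height_threshold):
    # Exclude pairs with a large height difference first, then apply the
    # center line intersection test.
    height_a = a[3] - a[1]
    height_b = b[3] - b[1]
    if abs(height_a - height_b) > height_threshold:
        return False
    return centerline_hits(a, b)


a = (10, 10, 50, 26)
b = (60, 12, 110, 28)     # same line as a
c = (10, 40, 90, 56)      # next line
tall = (120, 0, 200, 60)  # tall box that a's center line would cross
```

With an assumed threshold of 8, `same_row(a, tall, 8)` is rejected by the height check even though the extended center line of `a` crosses `tall`, which is the misjudgment the height pre-filter is meant to prevent.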
[0050] In one specific embodiment of this disclosure, step S32 may be followed by steps S33, S34, S35 and S36.
[0051] Step S33: Adjust each column of the second table information according to the coordinate information of the text box to obtain the adjusted second table information, wherein the adjusted second table information includes the alignment of each column of text boxes;
[0052] Step S34: Generate a corresponding first boundary line for each text box according to the alignment of each column of text boxes. The first boundary line is the boundary line in the vertical direction of the text box.
[0053] Step S35: Calculate the distance between the first boundary lines of two adjacent columns of text boxes to obtain the second calculation result;
[0054] Step S36: Based on comparing the second calculation result with the distance threshold, obtain the position information corresponding to each text box in each column of the wireless table to be identified.
[0055] In this embodiment, the text boxes in each column can be adjusted using the horizontal coordinate of each text box. After adjustment, the alignment of each column of text boxes is obtained: left alignment, right alignment, or center alignment. A different first boundary line is generated for each alignment: for left alignment, the left boundary line of the text box; for right alignment, the right boundary line of the text box; and for center alignment, the center line of the text box in the vertical direction. The distance between the first boundary lines of two adjacent columns is then compared with a distance threshold to decide whether the two columns are in fact the same column. If the distance is less than the distance threshold, the two adjacent columns are merged; if it is greater than the distance threshold, they are determined to be different columns. This yields the position information of the text boxes in each column.
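A simplified sketch of steps S34 to S36 follows. It assumes the alignment is already known and is the same for all boxes, which is a simplification: the patent derives an alignment per column and does not detail how. Boxes are `(x0, y0, x1, y1)` tuples; the names `boundary_x` and `assign_columns` and the threshold value are hypothetical. A distance exactly equal to the threshold is treated here as a different column, a case the description leaves open.

```python
def boundary_x(box, alignment):
    # S34: the first boundary line is vertical, so it is fully described by
    # an x value chosen according to the alignment.
    x0, _, x1, _ = box
    if alignment == "left":
        return x0
    if alignment == "right":
        return x1
    if alignment == "center":
        return (x0 + x1) / 2
    raise ValueError(f"unknown alignment: {alignment}")


def assign_columns(boxes, alignment, distance_threshold):
    """Group boxes into columns, left to right, by boundary line distance."""
    ordered = sorted(boxes, key=lambda b: boundary_x(b, alignment))
    columns = []
    for box in ordered:
        x = boundary_x(box, alignment)
        # S35 + S36: compare the distance to the previous boundary line with
        # the threshold; a smaller distance means the same column.
        if columns and x - boundary_x(columns[-1][-1], alignment) < distance_threshold:
            columns[-1].append(box)
        else:
            columns.append([box])
    return columns


boxes = [
    (120, 12, 180, 28),
    (10, 10, 50, 26),
    (121, 42, 170, 58),
    (12, 40, 90, 56),
]
columns = assign_columns(boxes, alignment="left", distance_threshold=10)
```

Here the two boxes whose left edges are at 10 and 12 fall into one column, and those at 120 and 121 into another.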
[0056] In one specific embodiment of this disclosure, step S3 may be followed by steps S4, S5 and S6.
[0057] Step S4: Obtain the number of characters in each text box and the coordinates of each character in each text box;
[0058] Step S5: Generate a corresponding second boundary line for each column of text boxes based on the number of characters included in each text box;
[0059] Step S6: Split the cross-column text box according to the coordinate information of each character in each text box. The cross-column text box is the text box that intersects with the second boundary line.
[0060] In this embodiment, the recognized wireless table may contain erroneously merged text boxes, which must be detected and split. Based on the number of characters in each text box, a corresponding second boundary line is generated for each column of text boxes. Any erroneously merged text box necessarily intersects a second boundary line. To prevent characters from being cut apart or lost during splitting, the method of this invention splits an erroneously merged text box using the coordinate information of each character, deciding for each character whether it belongs to the left or the right column. This avoids erroneous character splitting and improves the accuracy of wireless table recognition. In addition, when the column position of a text box cannot be determined while the position information of each column is being obtained, the method assigns that text box to the nearest column, based on the distance between the text box and each column of text boxes.
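The character-level split of step S6 and the nearest-column fallback can be illustrated as follows. The sketch takes the x position of the second boundary line as given, since the patent does not spell out how it is computed from the character counts. The function names, the `(char, x0, x1)` triple layout, and the rule of assigning a character by its horizontal center are assumptions made for illustration.

```python
def crosses(box_x0, box_x1, boundary_x):
    # A cross-column text box is one that the second boundary line intersects.
    return box_x0 < boundary_x < box_x1


def split_box(chars, boundary_x):
    """Split a merged box into left and right text, one whole character at a time.

    chars: list of (char, x0, x1) triples in reading order (hypothetical layout).
    """
    left, right = [], []
    for ch, x0, x1 in chars:
        # Assign each character by its horizontal center so that no character
        # is cut in half or dropped.
        if (x0 + x1) / 2 < boundary_x:
            left.append(ch)
        else:
            right.append(ch)
    return "".join(left), "".join(right)


def nearest_column(box_center_x, column_xs):
    # Fallback for a box whose column is undetermined: pick the index of the
    # column whose x position is closest.
    return min(range(len(column_xs)), key=lambda i: abs(column_xs[i] - box_center_x))


chars = [("A", 10, 18), ("B", 18, 26), ("1", 100, 108), ("2", 108, 116)]
is_cross_column = crosses(10, 116, 60)
parts = split_box(chars, boundary_x=60)
column_index = nearest_column(95, [10, 100, 200])
```

Splitting by whole characters rather than by cutting the box image at the boundary is what keeps every character intact in one of the two resulting cells.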
[0061] Example 2
[0062] As shown in Figure 2, this embodiment provides a wireless table recognition device, which includes a first acquisition module 901, a detection module 902, a judgment module 903, and a conversion module 907.
[0063] The first acquisition module 901 is used to acquire image information, the image information including information about the wireless table to be identified;
[0064] The detection module 902 is used to send the image information to the text detection model to obtain first information. The text detection model is used to detect at least one text information included in the wireless table to be identified, and generate a corresponding text box for each text information, while marking the coordinate information of each text box.
[0065] The judgment module 903 is used to obtain first table information based on the first information. The first table information includes the position information of each text box in each row and the position information of each text box in each column of the wireless table to be identified.
[0066] The conversion module 907 is used to convert the wireless table to be identified into formatted text based on the first table information.
[0067] Based on the above features, this embodiment can quickly and efficiently identify wireless tables and accurately convert them into formatted text. This invention provides a fast, efficient and accurate device for identifying wireless tables. This device can be widely used in situations where wireless tables need to be identified and converted into formatted text for further processing in daily life or work, such as financial statements, bank statements and other wireless tables.
[0068] In one specific embodiment of this disclosure, the judgment module 903 includes a sorting unit 9031 and a first judgment unit 9032.
[0069] The sorting unit 9031 is used to sort the text boxes from top to bottom according to the coordinate information of each text box to obtain the second table information;
[0070] The first judgment unit 9032 is used to determine whether each text box in each row of the second table information is a text box in the same row. If so, the position information corresponding to each text box in each row of the wireless table to be identified is obtained; if not, it is detected whether the text box is a text box in the next row.
[0071] In one specific embodiment of this disclosure, the first judgment unit 9032 includes an acquisition unit 90321, a first calculation unit 90322, and a first sub-judgment unit 90323.
[0072] The acquisition unit 90321 is used to acquire the height information of each text box;
[0073] The first calculation unit 90322 is used to calculate the height difference between two adjacent text boxes in each row of the second table information to obtain a first calculation result;
[0074] The first sub-judgment unit 90323 is used to determine whether two adjacent text boxes in each row of the second table information are text boxes in the same row by comparing the first calculation result with the height threshold.
[0075] In one specific embodiment of this disclosure, the device further includes a first generation unit 90324, an extension unit 90325, and a second sub-judgment unit 90326.
[0076] The first generation unit 90324 is used to generate a corresponding text box inner line for each text box, wherein the text box inner line is a horizontal center line;
[0077] The extension unit 90325 is used to extend the text box inner line to both sides to obtain the extension line of the text box inner line;
[0078] The second sub-judgment unit 90326 is used to determine whether the extension line of the text box inner line intersects the horizontally adjacent text box. If they intersect, the text boxes are determined to be on the same line. If they do not intersect, the text boxes are determined not to be on the same line.
[0079] In one specific embodiment of this disclosure, the device further includes an adjustment unit 9033, a second generation unit 9034, a second calculation unit 9035, and a second judgment unit 9036.
[0080] The adjustment unit 9033 is used to adjust each column of the second table information according to the coordinate information of the text box to obtain the adjusted second table information, the adjusted second table information including the alignment of each column of text box;
[0081] The second generation unit 9034 is used to generate a corresponding first boundary line for each text box according to the alignment of each column of text boxes. The first boundary line is the boundary line in the vertical direction of the text box.
[0082] The second calculation unit 9035 is used to calculate the distance between the first boundary lines of two adjacent columns of text boxes to obtain a second calculation result;
[0083] The second judgment unit 9036 is used to obtain the position information corresponding to each text box in each column of the wireless table to be identified by comparing the second calculation result with the distance threshold.
[0084] In one specific embodiment of this disclosure, the device further includes a second acquisition module 904, a generation module 905, and a splitting module 906.
[0085] The second acquisition module 904 is used to acquire the number of characters in each text box and the coordinate information of each character in each text box;
[0086] The generation module 905 is used to generate a corresponding second boundary line for each column of text boxes based on the number of characters included in each text box.
[0087] The splitting module 906 is used to split the cross-column text box according to the coordinate information of each character in each text box, wherein the cross-column text box is a text box that intersects with the second boundary line.
[0088] It should be noted that the specific manner in which each module performs its operation in the apparatus described in the above embodiments has been described in detail in the embodiments of the method, and will not be elaborated here.
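The module structure of this embodiment can be pictured as a simple composition of four callables. This is only a sketch of the data flow between modules 901, 902, 903 and 907; the class name `TableRecognitionDevice`, the field names, and the stub modules used in the example are hypothetical, and no real text detection model is invoked.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class TableRecognitionDevice:
    acquire: Callable[[str], Any]   # first acquisition module 901
    detect: Callable[[Any], list]   # detection module 902
    judge: Callable[[list], list]   # judgment module 903
    convert: Callable[[list], str]  # conversion module 907

    def run(self, source: str) -> str:
        image = self.acquire(source)  # image information
        boxes = self.detect(image)    # first information (text boxes)
        table = self.judge(boxes)     # first table information
        return self.convert(table)    # formatted text


# Stub modules for illustration only; each box is (text, x, y).
device = TableRecognitionDevice(
    acquire=lambda path: {"path": path},
    detect=lambda image: [("Amount", 120, 12), ("Date", 10, 10)],
    judge=lambda boxes: [
        (0, i, text)
        for i, (text, _, _) in enumerate(sorted(boxes, key=lambda b: b[1]))
    ],
    convert=lambda cells: ",".join(text for _, _, text in cells),
)
result = device.run("statement.png")
```

Keeping each module behind a plain callable interface means the optional second acquisition, generation, and splitting modules could be inserted between `judge` and `convert` without changing the others.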
[0089] Example 3
[0090] Corresponding to the above method embodiments, this disclosure also provides a wireless form recognition device. The wireless form recognition device described below and the wireless form recognition method described above may be read with reference to each other.
[0091] Figure 3 is a block diagram illustrating a wireless form recognition device 800 according to an exemplary embodiment. As shown in Figure 3, the wireless form recognition device 800 may include a processor 801 and a memory 802. The wireless form recognition device 800 may also include one or more of a multimedia component 803, an input/output (I/O) interface 804, and a communication component 805.
[0092] The processor 801 controls the overall operation of the wireless form recognition device 800 to complete all or part of the steps in the aforementioned wireless form recognition method. The memory 802 stores various types of data to support the operation of the wireless form recognition device 800. This data may include, for example, instructions for any application or method operating on the wireless form recognition device 800, and application-related data such as contact data, sent and received messages, pictures, audio, video, etc. The memory 802 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk. The multimedia component 803 may include a screen and an audio component. The screen may be, for example, a touchscreen, and the audio component is used to output and/or input audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signals may be further stored in the memory 802 or transmitted via the communication component 805. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 804 provides an interface between the processor 801 and other interface modules, such as a keyboard, mouse, buttons, etc. These buttons may be virtual or physical buttons. The communication component 805 is used for wired or wireless communication between the wireless form recognition device 800 and other devices. Wireless communication may include Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, or 4G, or a combination of these. Accordingly, the communication component 805 may include a Wi-Fi module, a Bluetooth module, or an NFC module.
[0093] In one exemplary embodiment, the wireless form recognition device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components to perform the wireless form recognition method described above.
[0094] In another exemplary embodiment, a computer-readable storage medium including program instructions is also provided, which, when executed by a processor, implement the steps of the wireless form recognition method described above. For example, the computer-readable storage medium may be the memory 802 including the program instructions described above, which may be executed by the processor 801 of the wireless form recognition device 800 to complete the wireless form recognition method described above.
[0095] Corresponding to the above method embodiments, this disclosure also provides a readable storage medium, and the readable storage medium described below can be referred to in conjunction with the wireless table recognition method described above.
[0096] Example 4
[0097] A readable storage medium storing a computer program, which, when executed by a processor, implements the steps of the wireless table recognition method described in the above method embodiments.
[0098] Specifically, the readable storage medium can be a USB flash drive, external hard drive, read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disk, or any other readable storage medium capable of storing program code.
[0099] The above description is merely a preferred embodiment of the present invention and is not intended to limit the invention. Various modifications and variations can be made to the present invention by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.
[0100] The above description is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in the present invention should be included within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.
Claims
1. A wireless table recognition method, characterized in that it comprises: acquiring image information, the image information including information about the wireless table to be identified; sending the image information to a text detection model to obtain first information, wherein the text detection model is used to detect at least one piece of text information included in the wireless table to be identified, to generate a corresponding text box for each piece of text information, and to mark the coordinate information of each text box; obtaining first table information based on the first information, wherein the first table information includes the position information of each text box in each row and the position information of each text box in each column of the wireless table to be identified; and converting the wireless table to be identified into formatted text based on the first table information; wherein obtaining the first table information based on the first information comprises: sorting the text boxes from top to bottom according to the coordinate information of each text box to obtain second table information; and determining whether each text box in each row of the second table information is a text box in the same row, and if so, obtaining the position information corresponding to each text box in each row of the wireless table to be identified, and if not, detecting whether the text box is a text box in the next row; and wherein determining whether each text box in each row of the second table information is a text box in the same row comprises: obtaining the height information of each text box; calculating the height difference between two adjacent text boxes in each row of the second table information to obtain a first calculation result; and determining, by comparing the first calculation result with a height threshold, whether two adjacent text boxes in each row of the second table information are text boxes in the same row.
2. A wireless table recognition device, characterized in that it comprises: a first acquisition module, used to acquire image information, the image information including information about the wireless table to be identified; a detection module, used to send the image information to a text detection model to obtain first information, wherein the text detection model is used to detect at least one piece of text information included in the wireless table to be identified, to generate a corresponding text box for each piece of text information, and to mark the coordinate information of each text box; a judgment module, used to obtain first table information based on the first information, wherein the first table information includes the position information of each text box in each row and the position information of each text box in each column of the wireless table to be identified; and a conversion module, used to convert the wireless table to be identified into formatted text based on the first table information; wherein the judgment module includes: a sorting unit, used to sort the text boxes from top to bottom according to the coordinate information of each text box to obtain second table information; and a first judgment unit, used to determine whether each text box in each row of the second table information is a text box in the same row, and if so, to obtain the position information corresponding to each text box in each row of the wireless table to be identified, and if not, to detect whether the text box is a text box in the next row; and wherein the first judgment unit includes: an acquisition unit, used to obtain the height information of each text box; a first calculation unit, used to calculate the height difference between two adjacent text boxes in each row of the second table information to obtain a first calculation result; and a first sub-judgment unit, used to determine, by comparing the first calculation result with a height threshold, whether two adjacent text boxes in each row of the second table information are text boxes in the same row.
3. A wireless table recognition device, characterized in that it comprises: a memory, used to store a computer program; and a processor, configured to implement the steps of the wireless table recognition method according to claim 1 when executing the computer program.
4. A readable storage medium, characterized in that the readable storage medium stores a computer program that, when executed by a processor, implements the steps of the wireless table recognition method according to claim 1.