Information processing device, information processing system, information processing method, and program

The information processing device enhances table recognition by generating consistent training data through boundary detection and expansion, addressing the inconsistency in learning data for models with omitted grid lines, thereby improving recognition accuracy.

JP2026096114A · Pending · Publication Date: 2026-06-12 · CANON KK

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
CANON KK
Filing Date
2024-12-02
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing table recognition models struggle with inconsistent learning data due to varying ways of drawing ruled lines in tables with omitted grid lines, leading to inconsistent labeling and poor recognition performance.

Method used

An information processing device generates training data by detecting and expanding cell boundaries using character recognition and a neural network model, creating consistent ground truth labels for tables with or without grid lines.

Benefits of technology

This approach enables accurate table recognition in document images, regardless of the presence or absence of grid lines, by ensuring consistent and effective training data generation.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure 2026096114000001_ABST
Patent Text Reader

Abstract

[Problem] To generate training data suitable for obtaining a trained model that can accurately recognize tables within document images, regardless of the presence or absence of grid lines. [Solution] An information processing device generates training data for obtaining a trained model that performs table recognition on a document image. The device includes: an acquisition means for acquiring a document image containing a table in which strings are written in cells, together with a correct image showing the structure of the table contained in the document image; a detection means for detecting the boundaries between the cells constituting the table in the correct image; and a generation means for generating a correct image in which new boundaries are set based on the strings adjacent to those boundaries, the strings being obtained by character recognition on the document image.

Description

【Technical Field】 【0001】 The present disclosure relates to an information processing apparatus, an information processing system, an information processing method, and a program. 【Background Art】 【0002】 There is a technique for recognizing a table in an image of a document containing a detail table (an itemized statement), such as an invoice or an order form, and extracting character strings such as product names, quantities, unit prices, and amounts from the recognized table. Patent Document 1 discloses a technique that performs table recognition on an image of a detail table using a machine learning model and extracts item values, such as the quantity and unit price of a product, by associating each cell region in the recognized table with a character string obtained by character recognition on the image. 【Prior Art Documents】 【Patent Documents】 【0003】 【Patent Document 1】 U.S. Patent No. 11087123 【Summary of the Invention】 【Problems to be Solved by the Invention】 【0004】 Performing table recognition with a machine learning model requires learning data for learning the boundaries and regions of the cells in a table. Some tables have a structure in which the items forming the elements of the table are separated by ruled lines, and others have a structure in which the ruled lines between items are omitted. In a correct answer image corresponding to an image of a table whose ruled lines are omitted, the lines that stand in for the omitted ruled lines can be drawn in various ways. As a result, the learning data lacks consistency, and it becomes difficult to obtain good results with a model trained on such learning data. 
【Means for Solving the Problems】 【0005】 An information processing device according to one aspect of the present disclosure is an information processing device that generates training data for obtaining a trained model for performing table recognition on a document image, and is characterized by comprising: acquisition means for acquiring a document image including a table in which strings are written in cells, and a correct image showing the structure of the table included in the document image; detection means for detecting boundaries between cells constituting the table in the correct image; and generation means for generating a correct image in which new boundaries are set based on the strings adjacent to the boundaries obtained by character recognition on the document image. [Effects of the Invention] 【0006】 The technology disclosed herein makes it possible to generate training data suitable for obtaining a learning model that can accurately recognize tables within document images, regardless of the presence or absence of grid lines. [Brief explanation of the drawing] 【0007】 [Figure 1] This is a diagram showing an example of the configuration of an information processing system. [Figure 2] This diagram shows the hardware configuration of the image processing device, learning device, and information processing device. [Figure 3] This is a diagram showing the processing sequence of an information processing system. [Figure 4] This flowchart shows the process flow for training a cell boundary recognition model. [Figure 5] This figure shows an example of a training document image and an example of its correct label image. [Figure 6] This diagram illustrates the creation of ground truth label images for cell boundaries. [Figure 7] This flowchart shows the detailed flow of the cell boundary extension process. [Figure 8] This diagram illustrates the expansion of boundaries within the ground truth label image. 
[Figure 9] This flowchart shows the detailed flow of the item value extraction process. [Figure 10] This figure shows an example of a document image and an example of the recognition results of a cell boundary recognition model. [Figure 11] This figure shows an example of item value extraction based on cell boundary recognition results. [Figure 12] This figure shows an example of a confirmation screen. [Figure 13] This diagram illustrates the expansion of cell boundaries within the ground truth label image. [Figure 14] This flowchart shows the detailed flow of the cell boundary extension process. [Figure 15] This diagram illustrates the extension of the boundary lines of the ground truth label images for a cell region. [Figure 16] This figure shows an example of a document image including a detailed table and an example of a correct label image. [Modes for carrying out the invention] 【0008】 Before describing embodiments of this disclosure, we will explain the factors that prevent the generation of training data suitable for obtaining a learning model that accurately recognizes tables in document images, regardless of the presence or absence of grid lines. 【0009】 First, as in Patent Document 1, learning data is required to train a machine learning model that recognizes the boundaries and regions of cells in a table on a pixel-by-pixel basis. When training a model to recognize cell boundaries, the learning data consists of a document image including a statement and a correct label image indicating pixels corresponding to the cell boundaries in the document image. For example, it is a document image 1600 including the statement 1601 shown in FIG. 16(a) and a correct label image 1610 shown in FIG. 16(b). 
In the correct label image 1610, each pixel that belongs to a boundary between cells is set to a value indicating a boundary (for example, 255, the white part of the image), and each pixel that does not belong to a boundary is set to a value indicating a non-boundary (for example, 0, the black part of the image). The correct label 1611 for the statement 1601 is obtained in this way. In the statement 1601 in the document image 1600 of FIG. 16(a), grid lines are drawn on the pixels corresponding to the boundaries, but in some tables no grid lines are drawn, as in the statement 1621 in the document image 1620 of FIG. 16(c). Even though no grid lines are drawn in such a table, its cells still exist, so the person creating the correct label assigns correct labels to the pixels judged to belong to cell boundaries, in the same manner as for the document image 1600. As a result, as shown in FIG. 16(d), a correct label image 1630 containing a correct label 1631 for the statement 1621 is obtained. 【0010】 However, there can be multiple patterns for placing the boundaries in the correct label for the statement 1621. For example, as shown in FIG. 16(e), when a vertical boundary is placed between "Product Code - Product Name" and "Quantity" in the statement 1621, both the correct label 1631 and the correct label 1632 are plausible correct labels for the statement 1621. Consequently, the tendency of the correct labels may vary depending on the operator who assigns them and on the time and circumstances of the work. The learning data therefore loses consistency, and it is difficult to obtain good results with a model trained on such learning data. This issue was not considered in Patent Document 1. 
【0011】 Hereinafter, embodiments for implementing the technology of the present disclosure will be described in detail with reference to the drawings. Note that the following embodiments do not limit the technology of the present disclosure according to the claims. Not all combinations of features described in the embodiments are essential to the solution provided by the technology of the present disclosure, and a plurality of features may be arbitrarily combined. The same reference numerals are used for the same configurations. Each step in the flowcharts is denoted with the prefix "S". 【0012】 <<Embodiment 1>> <Information Processing System> FIG. 1 is a diagram showing a configuration example of an information processing system. As shown in FIG. 1, the information processing system 100 includes, for example, an image processing apparatus 110, a learning apparatus 120, and an information processing apparatus 130, which are connected to one another via a network 140. In the information processing system 100, a plurality of image processing apparatuses 110, learning apparatuses 120, or information processing apparatuses 130, rather than a single one of each, may be connected to the network 140. For example, the information processing apparatus 130 may be configured as a first server apparatus having high-speed computing resources and a second server apparatus having large-capacity storage, connected to each other via the network 140. 【0013】 The image processing apparatus 110 is realized by an MFP (Multi-Function Peripheral) having a plurality of functions such as printing, scanning, and FAX. The image processing apparatus 110 has at least an image acquisition unit 111 as a functional unit. 【0014】 The image processing device 110 has a scanner device 206 (see Figure 2). 
The scanner device 206 optically reads a document 11, in which character strings are printed on a recording medium such as paper, and the image acquisition unit 111 generates a document image 13 by performing predetermined image processing on the data obtained by the reading. Alternatively, for example, the image acquisition unit 111 receives fax data 12 sent from a fax transmitter (not shown) and generates a document image 13 by performing predetermined fax image processing. The image acquisition unit 111 transmits the generated document image 13 to the information processing device 130. 【0015】 The image processing device 110 is not limited to the MFP equipped with the scanning and fax functions described above, and may be implemented using a PC (Personal Computer) or the like. For example, a document image 13 such as a PDF or JPEG, generated using a document creation application running on a PC, may be sent to the information processing device 130. 【0016】 The learning device 120 has a generation unit 121 and a learning unit 122 as functional units. Multiple training document images 14 are input to the learning device 120. 【0017】 The generation unit 121 generates a ground truth label image 15 from each of the multiple training document images 14. The learning unit 122 trains a cell boundary recognition model 16 using the training document images 14 and the ground truth label images 15. The "cell boundary recognition model" is a recognizer that recognizes the cell boundaries of tables in a document image, and it is configured as a neural network. "Learning" here refers to adjusting the parameter values of the cell boundary recognition model, which is a neural network, using a deep learning method based on the training document images 14 and the ground truth label images 15. Note that any known method can be used to train the cell boundary recognition model. 
For example, a Fully Convolutional Network (FCN) that detects objects in an image at the pixel level can be used. 【0018】 The information processing device 130 has functional units: an information processing unit 131 and a storage unit 132. The information processing unit 131 acquires multiple strings contained in the document image 13 received from the image processing device 110, and then extracts strings 17 from the detail table. The storage unit 132 stores the strings that are the result of the extraction by the information processing unit 131. Specifically, the information processing unit 131 uses the cell boundary recognition model 16 received from the learning unit 122 of the learning device 120 to recognize the boundaries between cells in the detail table from the document image 13, and detects the closed region enclosed by the boundary as a cell region. Next, the information processing unit 131 performs character recognition processing (OCR (Optical Character Recognition)) on the detected cell region to extract the strings. The strings 17 from the detail table to be extracted are those written in the detail table, such as product name, quantity, unit price, amount, etc. 【0019】 Network 140 is implemented using a LAN or WAN, and is a communication unit that connects the image processing device 110, the learning device 120, and the information processing device 130 to each other, and for sending and receiving data between the devices. 【0020】 <Device Hardware Configuration> Figure 2 shows an example of the hardware configuration of the image processing device 110, the learning device 120, and the information processing device 130. 【0021】 (Hardware configuration of the image processing device) Figure 2(a) shows an example of the hardware configuration of the image processing device 110. As shown in Figure 2(a), the image processing device 110 includes a CPU 201, ROM 202, RAM 204, a printer device 205, a scanner device 206, and a document transport device 207. 
The image processing device 110 further includes a storage 208, an input device 209, a display device 210, and an external interface 211. The above-mentioned devices 201, 202, and 204 to 211 are connected to a data bus 203 so that they can send and receive data to and from each other. 【0022】 The CPU 201 is a control unit that controls the overall operation of the image processing device 110. The CPU 201 starts up the system of the image processing device 110 by executing a startup program stored in the ROM 202. The CPU 201 then implements the print, scan, fax, and other functions of the image processing device 110 by executing a control program stored in the storage 208. 【0023】 The ROM 202 is implemented as non-volatile memory and is a storage unit for storing the startup program that starts the image processing device 110. The data bus 203 is a communication unit for sending and receiving data between the devices that make up the image processing device 110. The RAM 204 is implemented as volatile memory and is a storage unit used as work memory when the CPU 201 executes the control program. 【0024】 The printer device 205 is an image output device and is a processing unit for printing document images held in the image processing device 110 onto a recording medium such as paper. The scanner device 206 is an image input device and is a processing unit for optically reading a recording medium such as paper on which text, diagrams, and the like are printed, and acquiring the result as a document image. The document transport device 207 is implemented as an ADF (Auto Document Feeder) or the like and is a processing unit for detecting documents placed on the document glass and transporting the detected documents one by one to the scanner device 206. 【0025】 The storage 208 is implemented as an HDD (Hard Disk Drive) or the like, and is a storage unit for storing the aforementioned control program and document images. 
The input device 209 is implemented as a touch panel, hard keys, or the like, and is a processing unit for receiving operation input from the user to the image processing device 110. The display device 210 is implemented as a liquid crystal display or the like, and is a display unit for displaying the various screens of the image processing device 110 to the user. The external interface 211 connects the image processing device 110 to the network 140, and is an interface unit for receiving fax data from a fax transmitter (not shown) and sending document images to the information processing device 130. 【0026】 (Hardware configuration of the learning device) Figure 2(b) shows an example of the hardware configuration of the learning device 120. As shown in Figure 2(b), the learning device 120 has a CPU 231, ROM 232, RAM 234, storage 235, input device 236, display device 237, external interface 238, and GPU 239. The above-mentioned devices 231, 232, and 234 to 239 are connected to a data bus 233 so that they can send and receive data to and from each other. 【0027】 The CPU 231 is a control unit for controlling the overall operation of the learning device 120. The CPU 231 starts the system of the learning device 120 by executing a boot program stored in the ROM 232. The CPU 231 also generates a cell boundary recognition model by executing a learning program stored in the storage 235. 【0028】 The ROM 232 is implemented as non-volatile memory and is a storage unit for storing the boot program that starts the learning device 120. The data bus 233 is a communication unit for sending and receiving data between the devices that make up the learning device 120. The RAM 234 is implemented as volatile memory and is a storage unit used as work memory when the CPU 231 executes the learning program. 【0029】 The storage 235 is implemented as an HDD (Hard Disk Drive) or the like, and is a storage unit for storing the aforementioned learning program, the cell boundary recognition model, and the like. 
The input device 236 is implemented as a mouse, a keyboard, or the like, and is a processing unit for receiving operation input to the learning device 120 from the engineer (the operator of the learning device 120). The display device 237 is implemented as a liquid crystal display or the like, and is a display unit for displaying the various screens of the learning device 120 to the engineer. The CPU 231 operates as a display control unit that controls the screen displayed on the display device 237. 【0030】 The external interface 238 connects the learning device 120 to the network 140, and is an interface unit for receiving images from a PC (not shown) and transmitting the cell boundary recognition model 16 to the information processing device 130. The GPU 239 is a processing unit composed of image processing processors. For example, the GPU 239 performs the calculations for training the cell boundary recognition model 16 based on training data, according to control commands given by the CPU 231. 【0031】 (Hardware configuration of the information processing device) Figure 2(c) shows an example of the hardware configuration of the information processing device 130. As shown in Figure 2(c), the information processing device 130 has a CPU 261, ROM 262, RAM 264, storage 265, input device 266, display device 267, and external interface 268. The devices 261, 262, and 264 to 268 are connected to a data bus 263 so that they can send and receive data to and from each other. 【0032】 The CPU 261 is a control unit that controls the overall operation of the information processing device 130. The CPU 261 starts up the system of the information processing device 130 by executing a boot program stored in the ROM 262. The CPU 261 then executes the information processing program stored in the storage 265 to perform the processing of the information processing unit 131, such as character recognition and information extraction. 
【0033】 The ROM 262 is implemented as non-volatile memory and is a storage unit for storing the boot program that starts the information processing device 130. The data bus 263 is a communication unit for sending and receiving data between the devices that make up the information processing device 130. The RAM 264 is implemented as volatile memory and is a storage unit used as work memory when the CPU 261 executes the information processing program. 【0034】 The storage 265 is implemented using an HDD (Hard Disk Drive) or the like, and is a storage unit for storing the aforementioned information processing program, the document image 13, the cell boundary recognition model 16, the string data 17, and the like. 【0035】 The input device 266 is implemented as a mouse, a keyboard, or the like, and is a processing unit for receiving operation input from a user or engineer to the information processing device 130. The display device 267 is implemented as a liquid crystal display or the like, and is a display unit for displaying the various screens of the information processing device 130 to the user or engineer. The CPU 261 operates as a display control unit that controls the screen displayed on the display device 267. 【0036】 The external interface 268 connects the information processing device 130 to the network 140, and is an interface unit for receiving the cell boundary recognition model 16 from the learning device 120 and the document image 13 from the image processing device 110. 【0037】 <Processing Sequence> Figures 3(a) and 3(b) show examples of processing sequences of the information processing system 100. Figure 3(a) shows an example of a processing sequence in which the learning device 120 of the information processing system 100 trains a cell boundary recognition model, and Figure 3(b) shows an example of a processing sequence in which the information processing device 130 of the information processing system 100 outputs the item values included in the table of the document image 13. 
【0038】 First, the processing sequence during development by the engineer will be described. As shown in Figure 3(a), in S301, the engineer developing the information processing system 100 inputs one or more training document images 14 to the learning device 120. The learning device 120 acquires the training document images 14 based on the input from the engineer. 【0039】 In S302, the engineer operates the learning device 120 to instruct it to generate a correct label image (also called a correct image) 15 corresponding to the training document image 14 input in S301. 【0040】 In S303, the learning device 120 generates, based on the instruction, a correct label image 15 corresponding to the training document image 14 acquired in S301. As a result, the learning device 120 acquires training data consisting of the training document image 14 and the correct label image 15 corresponding to it. 【0041】 In S304, the learning device 120 uses the training data obtained in S303 to train a model (cell boundary recognition model) 16 that recognizes the cell boundaries of tables within a document image. 【0042】 In S305, the learning device 120 transmits the trained cell boundary recognition model 16 (also called the trained model) obtained through the learning in S304 to the information processing device 130. In S306, the information processing device 130 saves the trained cell boundary recognition model 16 received in S305 to the storage 265. 【0043】 Next, the processing sequence when a user uses the system will be described. As shown in Figure 3(b), in S311, the user of the information processing system 100 places the original document 11 on the image processing device 110 and instructs the image processing device 110 to scan the original document in order to obtain the text within the document. 
【0044】 In S312, the image processing device 110 performs a scanning process on the original document 11 placed on the image processing device 110 to generate a document image 13. 【0045】 In S313, the image processing device 110 transmits the document image 13 generated in S312 to the information processing device 130. 【0046】 In S314, the information processing device 130 receives the document image 13 transmitted in S313. The information processing device 130 then uses the cell boundary recognition model 16 to recognize the boundaries between cells on the document image 13, identifies the cell region enclosed by the boundary, and extracts the string (item value) within the identified cell. 【0047】 In S315, the information processing device 130 outputs the string extracted in S314 in a format visible to the user. 【0048】 <Learning Process> Figure 4 is a flowchart showing the process flow in which the learning device 120 learns a cell boundary recognition model in the sequence shown in Figure 3(a). The execution program for each step shown in Figure 4 is stored in, for example, the ROM 232, RAM 234, or storage 235 of the learning device 120 and executed by the CPU 231 or GPU 239 of the learning device 120. 【0049】 In S401, the CPU 231 acquires one or more training document images 14 and their correct label images 15, input by the engineer, as training data. The training document image 14 is, for example, a document image 500 as shown in Figure 5(a). The area indicated by the dashed line within the document image 500 is the detail table 501. The detail table 501 is a 4x6 table, but the grid lines that indicate the boundaries between cells are omitted in the detail table 501. The training document image is, for example, a full-color (RGB 3-channel) image. The size of the training document image is, for example, an image size equivalent to A4 at 300dpi (width 2480 pixels, height 3508 pixels). 
The correct label image 15 is, for example, an image (hereinafter referred to as the correct label image) 510 as shown in Figure 5(b), which is an image in which the correct labels are assigned to the pixels corresponding to the boundaries between cells in the detail table. The correct label image 510 is the correct label image corresponding to the document image 500. The correct label image is, for example, a binary image and has the same image size as the training document image. For pixels that belong to a boundary between cells, a value indicating that it is a boundary (for example, 255, representing the white area in the image) is set. For pixels that do not belong to a boundary between cells, a value indicating that it is not a boundary (for example, 0, representing the black area in the image) is set. The correct label image 510 is created, for example, by the following procedure. As shown in Figure 6(a), the learning device 120 displays a UI screen 600 for creating the correct label on the display device 237. The UI screen 600 displays the document image 500. The learning device 120 then accepts an input operation to set a specified location as a boundary between cells 602. This input operation is performed by an engineer operating the input device 236, for example, by dragging the mouse cursor 603 in a linear fashion, as shown in Figure 6(a). Note that the boundary between cells 602 indicates a provisionally set boundary. Once the engineer has finished creating the correct labels and pressed the save button 604, the learning device 120 saves the correct label image to which the correct labels created by the engineer have been assigned. For example, as shown in Figure 6(b), the learning device 120 saves the correct label image 510 to which the correct label 511 has been assigned to the detail table 501. As a result, the learning device 120 obtains the correct label image corresponding to the training document image. 
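The label encoding described in 【0049】 can be illustrated with a minimal sketch, assuming the engineer's drag operations have already been reduced to axis-aligned line segments. The names `make_label_image`, `Segment`, and `thickness` are hypothetical and do not appear in the disclosure; plain Python lists are used only to keep the example self-contained, and a real image of 2480 x 3508 pixels would normally be held in an array library.

```python
from typing import List, Tuple

# Hypothetical representation of one drawn boundary: (x0, y0, x1, y1).
Segment = Tuple[int, int, int, int]

BOUNDARY = 255    # pixel belongs to a boundary between cells (white)
BACKGROUND = 0    # pixel does not belong to a boundary (black)


def make_label_image(width: int, height: int,
                     segments: List[Segment],
                     thickness: int = 1) -> List[List[int]]:
    """Rasterize axis-aligned segments into a binary label image.

    The result has the same size as the document image, as the text requires.
    """
    label = [[BACKGROUND] * width for _ in range(height)]
    half = thickness // 2
    for x0, y0, x1, y1 in segments:
        if x0 != x1 and y0 != y1:
            raise ValueError("this sketch only supports axis-aligned segments")
        xa, xb = sorted((x0, x1))
        ya, yb = sorted((y0, y1))
        if x0 == x1:
            # Vertical segment: widen it along the x-axis.
            xa, xb = xa - half, xb + half
        else:
            # Horizontal segment: widen it along the y-axis.
            ya, yb = ya - half, yb + half
        # Clip to the image and mark the covered pixels as boundary.
        for y in range(max(ya, 0), min(yb, height - 1) + 1):
            for x in range(max(xa, 0), min(xb, width - 1) + 1):
                label[y][x] = BOUNDARY
    return label
```

Calling `make_label_image(8, 6, [(3, 0, 3, 5), (0, 2, 7, 2)])` marks one vertical and one horizontal boundary in an 8 x 6 image and leaves every other pixel at 0.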
【0050】 Returning to the explanation of the flowchart in Figure 4, in S402, CPU231 performs a process to expand the cell boundaries within the ground truth label image acquired in S401. This generates a new ground truth label image with expanded cell boundaries. 【0051】 (Cell boundary expansion process within the correct label image) The cell boundary expansion process within the ground truth label image will be explained using Figures 7 and 8. Figure 7 is a flowchart showing the detailed flow of the cell boundary expansion process (S402) within the ground truth label image. Figure 8 is a diagram illustrating the expansion of boundaries within the ground truth label image. Figure 8(a) shows a detail table of a document image, and Figure 8(b) shows an example of a ground truth label for the detail table shown in Figure 8(a). Figure 8(c) shows the case where the vertical boundary line 811 in Figure 8(b) is selected, and Figure 8(d) shows the case where the vertical boundary line 811 in Figure 8(b) is expanded. Figure 8(e) shows the case where the horizontal boundary line 814 in Figure 8(b) is selected, and Figure 8(f) shows the case where the horizontal boundary line 814 in Figure 8(e) is expanded. The execution program for each step shown in Figure 7 is stored in either the ROM 232, RAM 234, or storage 235 of the learning device 120 and executed by the CPU 231 or GPU 239 of the learning device 120. 【0052】 In S701, CPU231 selects one from the pairs of document images and correct label images acquired in S401. 【0053】 In S702, CPU231 obtains cell boundary lines from the ground truth label image. Specifically, CPU231 detects pixels belonging to the outer perimeter using an existing contour tracking algorithm for the ground truth label 511. After outer perimeter detection, vertical and horizontal lines remain in the table, which are detected as straight lines using an existing Hough transform process. 
Then, CPU231 determines the angle of each detected straight line relative to the vertical or horizontal direction of the ground truth label image. If the calculated angle is close to horizontal with respect to the ground truth label image, CPU231 detects the line as a horizontal boundary line; if the calculated angle is close to perpendicular with respect to the ground truth label image, CPU231 detects the line as a vertical boundary line. When detecting the orientation of a boundary line, CPU231 obtains the coordinates (x,y) of both ends of the boundary line segment as the position information of the detected boundary line. Hereafter, unless otherwise specified, cell boundary lines will simply be referred to as boundary lines. For example, CPU 231 obtains a correct label 511 as shown in Figure 8(b) for a detail table 501 of document image 500 as shown in Figure 8(a), and obtains vertical and horizontal lines as boundaries, excluding the outer perimeter lines of the table within the correct label 511. Specifically, it obtains boundaries 811 to 813 as vertical boundaries, and boundaries 814 to 818 as horizontal boundaries. 【0054】 In S703, CPU231 selects one of the multiple boundary lines obtained in S702. For example, in Figure 8(b), one of the multiple boundary lines 811 to 818 is selected. 【0055】 In S704, CPU231 determines the orientation of the boundary line selected in S703. Specifically, CPU231 determines the orientation of the boundary line by calculating the slope of the line based on the coordinates of both ends of the boundary line obtained in S702. CPU231 determines the orientation of the boundary line to be horizontal if the calculated slope of the line is less than 45 degrees, and vertical if the slope is between 45 and 90 degrees. For example, boundary lines 811-813 shown in Figure 8(b) are determined to be vertical. Boundary lines 814-818 shown in Figure 8(b) are determined to be horizontal. 
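The orientation test of S704 can be sketched as follows, assuming each boundary line is given by the integer coordinates of its two end points. The function name `boundary_orientation` is hypothetical. Instead of computing the angle explicitly, the sketch compares the absolute differences in x and y, which is equivalent to the 45-degree threshold stated above.

```python
def boundary_orientation(x0: int, y0: int, x1: int, y1: int) -> str:
    """Classify a boundary line as "horizontal" or "vertical".

    The slope angle against the x-axis is below 45 degrees exactly when
    |dy| < |dx|; angles from 45 to 90 degrees are treated as vertical.
    """
    dx = abs(x1 - x0)
    dy = abs(y1 - y0)
    return "horizontal" if dy < dx else "vertical"
```

Under this rule a slightly skewed line such as (0, 10) to (100, 12) is classified as horizontal, and a line at exactly 45 degrees falls on the vertical side, matching the ranges given in the text. A degenerate segment whose end points coincide is also reported as vertical here, which is a choice made for this sketch only.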
If a boundary line is determined to be vertical (vertical in S704), processing proceeds to S705. If a boundary line is determined to be horizontal (horizontal in S704), processing proceeds to S708. 【0056】 In S705, CPU231 retrieves all strings that exist on both the right and left sides of the vertical boundary line acquired in S703. When retrieving strings, CPU231 obtains the coordinates indicating the position of the string as the position information of the string. For example, in the detail table 501 shown in Figure 8(c), all strings adjacent to the vertical boundary line 811 are retrieved on both the right and left sides. The strings are obtained by performing character recognition processing on the document image 500. For character recognition, existing character recognition technology that identifies characters based on pixel information in the image can be used. To the left of boundary line 811, strings 821 to 826 are retrieved. To the right of boundary line 811, strings 831 to 836 are retrieved. Note that in Figure 8(c), only some strings necessary for the explanation of the figure are shown, and strings such as unit price and amount that exist to the right of boundary line 811 are not shown as targets for acquisition, but are acquired by this process. Similarly, only some strings are shown in subsequent figures. 【0057】 In S706, CPU231 identifies the string closest to the boundary line from the character range of the string obtained in S705, both to the right and to the left of the boundary line. For example, in Figure 8(c), string 821 is identified as the string closest to the boundary line among the strings located to the left of boundary line 811. Also, string 831 is identified as the string closest to the boundary line among the strings located to the right of boundary line 811. 【0058】 In S707, CPU231 extends the boundary line based on the coordinates of the string identified in S706. 
Specifically, CPU231 extends from the right edge 827 of the area of ​​string 821 identified to the left of boundary line 811 to the left edge 837 of the area of ​​string 831 identified to the right of boundary line 811. As a result of the extension, CPU231 obtains the extended boundary line 841 as shown in Figure 8(d). Note that in Figure 8(d), the boundary line may also be extended to have margins between the right edge 827 and the left edge 837 and the boundary line. Specifically, the boundary line may be extended from a position further to the right of the right edge 827 to a position further to the left of the left edge 837. The vertical boundary line is extended in the manner described above. 【0059】 In S708, CPU231 acquires all strings that exist on both sides of the horizontal boundary line acquired in S703, both above and below it. When acquiring strings, CPU231 acquires the coordinates indicating the position of the string as the position information of the string. For example, in the detail table 501 shown in Figure 8(e), all strings adjacent to the horizontal boundary line 814 are acquired both above and below it. The strings are acquired by performing character recognition processing on the document image 500. For character recognition, existing character recognition technology that identifies characters based on pixel information in the image can be used. Above the boundary line 814, strings 821, 831, 851, and 852 are acquired. Also, below the boundary line 814, strings 822, 832, 861, and 862 are acquired. Note that in Figure 8(e), only some of the strings necessary for the explanation of the figure are shown, and the strings from the second row onwards, which exist below the boundary line 814, are not shown as targets for acquisition, but are acquired by this process. 
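The vertical case (S705 to S707) can be sketched as follows. This is a minimal sketch, not the implementation in the specification: it assumes that character recognition returns each string as a bounding box `(x, y, width, height)`, that the boundary line is summarized by a single x-coordinate, and that strings overlapping the boundary are ignored. The names `extend_vertical_boundary`, `Box`, and `margin` are hypothetical.

```python
from typing import List, Optional, Tuple

# Assumed OCR output: (x, y, width, height) of each string region.
Box = Tuple[int, int, int, int]


def extend_vertical_boundary(
    boundary_x: float, boxes: List[Box], margin: int = 0
) -> Optional[Tuple[float, float]]:
    """Return the x-range (start, end) of the extended vertical boundary.

    The range runs from the right edge of the nearest string on the left
    to the left edge of the nearest string on the right, optionally
    shrunk by a margin on each side.
    """
    # Right edges of strings lying entirely to the left of the boundary.
    left_side = [x + w for (x, _y, w, _h) in boxes if x + w <= boundary_x]
    # Left edges of strings lying entirely to the right of the boundary.
    right_side = [x for (x, _y, _w, _h) in boxes if x >= boundary_x]
    if not left_side or not right_side:
        return None  # no string on one side, so nothing to extend toward
    start = max(left_side) + margin   # nearest string on the left
    end = min(right_side) - margin    # nearest string on the right
    if start > end:
        return None  # margin too large for the available gap
    return (start, end)


boxes = [(10, 0, 30, 10), (20, 20, 15, 10), (60, 0, 20, 10), (70, 20, 20, 10)]
print(extend_vertical_boundary(50, boxes))            # (40, 60)
print(extend_vertical_boundary(50, boxes, margin=2))  # (42, 58)
```

The horizontal case (S708 to S710) is symmetric: the same logic applies with y-coordinates and box heights in place of x-coordinates and box widths.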
【0060】 In S709, CPU231 identifies the string closest to the boundary line from the character range of the string obtained in S708, both in the upward and downward directions relative to the boundary line. For example, in Figure 8(e), string 821 is identified as the string closest to the boundary line among the strings located above boundary line 814. Also, string 832 is identified as the string closest to the boundary line among the strings located below boundary line 814. 【0061】 In S710, the CPU 231 extends the boundary line based on the coordinates of the string identified in S709. Specifically, it extends the boundary line from the lower edge 853 of the area of ​​string 821 identified above boundary line 814 to the upper edge 863 of the area of ​​string 832 identified below boundary line 814. As a result of the extension, the CPU 231 obtains the extended boundary line 871 as shown in Figure 8(f). Note that in Figure 8(f), the boundary line may also be extended to have margins between the lower edge 853 and the upper edge 863 and the boundary line. Specifically, the boundary line may be extended from a position further below the lower edge 853 to a position further above the upper edge 863. In this way, the horizontal boundary line is also extended. 【0062】 In S711, CPU231 determines whether all boundaries of the table in the ground truth label image obtained in S702 have been expanded. If it is determined that all boundaries have been expanded (YES in S711), the process moves to S712. If it is determined that all boundaries have not been expanded (NO in S711), the process returns to S703. 【0063】 In S712, CPU231 saves a newly generated ground truth label image after augmentation processing. 【0064】 In S713, CPU231 determines whether or not it has performed the process of extending the boundary lines for all ground truth label images. If it determines that all have been performed (YES in S713), it terminates the flow shown in Figure 7. 
Otherwise (NO in S713), the process returns to S701. 【0065】 Let's return to the explanation of the flowchart in Figure 4. In S403, the CPU 231 initializes the parameters of the cell boundary recognition model. The CPU 231 initializes the parameters included in the cell boundary recognition model by randomly determining their values. Note that there may be other methods for determining the values ​​used for initialization. For example, the parameters of the model that were previously trained using a training document image and a ground truth label image different from the training document image 14 and the ground truth label image saved in S712 may be used. The training document image 14 and the ground truth label image 15 may be used as the other training document image and ground truth label image mentioned above. 【0066】 In S404, the CPU 231 acquires a portion of the training data read in S401, i.e., a mini-batch. In this embodiment, the mini-batch method is used as the training method for the neural network. After the CPU 231 acquires a predetermined number of training data (mini-batch size, e.g., 8), processing proceeds to S405. Note that other training methods besides the mini-batch method may also be used; for example, the batch method or the online method may be used. 【0067】 In S405, CPU231 calculates the error in neural network training. The error represents the difference between the inference result of the cell boundary recognition model at that point in time and the correct information (correct label image). First, the training document image is input to the cell boundary recognition model to obtain the inference result. The difference between the obtained inference result and the correct label image is evaluated pixel by pixel to calculate the error. The error can be expressed, for example, by the following equation (1). 
Note that equation (1) is only one example of how to calculate the error; any other equation used for training a neural network may be used. 【0068】 [Math. 1] 【0069】 In equation (1), x_k and x_k' represent the inference result and the correct value, respectively, for each pixel belonging to a boundary between cells. y_k and y_k' represent the inference result and the correct value, respectively, for each pixel that does not belong to a boundary between cells. M is the number of pixels in the entire image that belong to boundaries between cells. N is the number of pixels in the entire image that do not belong to boundaries between cells. Specifically, the first term of equation (1) becomes smaller as more pixels belonging to boundaries are correctly inferred to be boundary pixels. The second term becomes smaller as more pixels not belonging to boundaries are correctly inferred to be non-boundary pixels. 【0070】 In S406, the CPU 231 adjusts the parameters of the cell boundary recognition model. Specifically, the CPU 231 modifies the parameters of the cell boundary recognition model using the backpropagation method based on the error calculated in S405. 【0071】 In S407, the CPU 231 determines whether to terminate the learning process. The determination is made by checking whether the processes in S404 to S406 have been performed a predetermined number of times (for example, 1000 times). The predetermined number of times is determined by user input at the start of this flowchart. If it is determined that the predetermined number of times has been reached (YES in S407), the process moves to S408. If it is determined that the predetermined number of times has not been reached (NO in S407), the process returns to S404, and the learning of the cell boundary recognition model continues. 
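Equation (1) itself is not reproduced in this text. The sketch below shows one error of the kind described in paragraphs 0067 to 0069: a term averaged over the M boundary pixels plus a term averaged over the N non-boundary pixels. The squared difference used inside each term is an assumption, as are the function name `balanced_pixel_error` and the use of nested lists with values in the range 0 to 1. It should not be read as the equation in the specification.

```python
def balanced_pixel_error(pred, target, boundary_value=1.0):
    """Two-term per-pixel error, normalized separately per class.

    pred and target are 2-D lists of floats with the same shape.
    Pixels where target equals boundary_value count as boundary pixels.
    ASSUMPTION: squared difference per pixel; the source equation is
    not available, so this is only one plausible form.
    """
    boundary_sum, m = 0.0, 0        # sum and count over boundary pixels
    background_sum, n = 0.0, 0      # sum and count over the other pixels
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if t == boundary_value:
                boundary_sum += (p - t) ** 2
                m += 1
            else:
                background_sum += (p - t) ** 2
                n += 1
    first_term = boundary_sum / m if m else 0.0
    second_term = background_sum / n if n else 0.0
    return first_term + second_term


target = [[1.0, 0.0], [0.0, 0.0]]
print(balanced_pixel_error(target, target))                    # 0.0
print(balanced_pixel_error([[0.0, 0.0], [0.0, 0.0]], target))  # 1.0
```

Normalizing each term by its own pixel count keeps the few boundary pixels from being outweighed by the much larger number of non-boundary pixels.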
【0072】 Finally, in S408, the CPU 231 outputs the trained cell boundary recognition model to the information processing device 130. The information processing device 130 then saves the trained cell boundary recognition model output from the learning unit 122 of the learning device 120 to the storage 265. 【0073】 <Processing to extract item values from document images> The process for extracting item values from a document image according to this embodiment will be explained using Figures 9 to 11. Figure 9 is a flowchart showing the detailed flow of the process (S314) for extracting item values from a table in a document image. The execution program for each step shown in Figure 9 is stored in either the ROM 262, RAM 264, or storage 265 of the information processing device 130 and executed by the CPU 261 of the information processing device 130. Figure 10 shows an example of a document image and an example of the recognition result of the cell boundary recognition model for it: Figure 10(a) shows the document image 1000, and Figure 10(b) shows an example of the recognition result of the cell boundary recognition model for the document image 1000. Figure 11 is a diagram showing an example of item value extraction based on the cell boundary recognition result. 【0074】 In S901, the CPU 261 retrieves the trained cell boundary recognition model stored in the storage 265 at S306 in Figure 3 (S408 in Figure 4). 【0075】 In S902, the CPU 261 acquires the document image to be processed, which is input by the user in S312 of Figure 3. Specifically, the CPU 261 acquires, for example, a document image 1000 that includes a detail table 1001 in which text is written in cells, as shown in Figure 10(a). 【0076】 In S903, the CPU 261 inputs the document image 1000 acquired in S902 into the trained cell boundary recognition model acquired in S901 to perform inference processing and obtain the inference result. 
The CPU 261 obtains an inference result image 1010, which includes the inference result 1011 for the detail table 1001, as shown in Figure 10(b). The inference result image 1010 has a similar structure to the ground truth label image with the ground truth label 511 shown in Figures 8(d) and 8(f). Specifically, for each pixel, if it is recognized as a pixel corresponding to the boundary between extended cells, a value indicating that it is a boundary (for example, 255, representing the white area in the image) is stored. If it is recognized as a pixel that does not correspond to the boundary between extended cells, a value indicating that it is not a boundary (for example, 0, representing the black area in the image) is stored. 【0077】 In S904, CPU261 identifies each cell region and extracts item values ​​based on the inference result image acquired in S903. Here, the method of extracting item values ​​will be explained using Figure 11. As shown in Figure 11, assume that inference result 1011 has been obtained for detail table 1001, and we will explain an example of extracting item values ​​from detail table 1001. First, each cell region is identified from the inference result 1011 for detail table 1001 obtained in S903. Specifically, closed regions surrounded by values ​​(white parts in the image) that indicate the boundaries of expanded cells within inference result 1011 are extracted. By extracting closed regions, the regions of each cell such as cell regions 1111, 1112, 1113, 1114, etc. can be extracted from inference result 1011. Next, cells belonging to the same row are identified from the extracted cell regions. First, cell region 1111, which was extracted from the upper left of detail table 1001, is selected from the extracted cell regions. Next, we extract cells that belong to the same row as the selected cell region 1111. 
For example, if the top-left coordinate, width, and height of cell region 1111 are ((x, y), w, h), cells other than cell region 1111 are extracted whose top y-coordinate is between (y - H) and (y + H) and whose bottom y-coordinate is between (y + h - H) and (y + h + H), where H is a predetermined constant. 【0078】 This identifies a single row containing cell regions 1111, 1112, 1113, and 1114. By applying the same processing to the other cell regions, each row of the detail table 1001 and the cell regions it contains can be identified. Next, the text within each cell in each row of the detail table 1001 is extracted. Character recognition processing is performed on each extracted cell region to recognize and identify the text. Existing character recognition techniques that identify characters based on pixel information within the region can be used for character recognition. For example, the text "product name" (1101) is extracted from cell region 1111. Similarly, the text "unit price" (1102) is extracted from cell region 1112, the text "quantity" (1103) from cell region 1113, and the text "amount" (1104) from cell region 1114. By performing similar processing for each row, the text within each cell can be extracted as an item value for each row. 【0079】 In S905, the CPU 261 outputs the item values extracted in S904 and presents them to the user. For example, the CPU 261 of the information processing device 130 displays a confirmation screen, which is a UI screen for checking the extracted item values, on the display device 267. 【0080】 (Confirmation screen) Figure 12 shows an example of a screen for confirming item values. The confirmation screen 1200 includes a preview image display screen 1210, a results display screen 1220, and an exit button 1230. The preview image display screen 1210 highlights the detail table 1211 within the document image and the row 1212 of the detail table that is output on the results display screen. 
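The row-identification rule of paragraph 0077 can be sketched as below. The sketch assumes each cell region is a tuple `(x, y, w, h)` with `(x, y)` the top-left corner, and that `tolerance` plays the role of the constant H. The function name `cells_in_same_row` and the sorting of the result from left to right are illustrative choices that do not appear in the specification.

```python
def cells_in_same_row(selected, cells, tolerance):
    """Collect the cells whose top and bottom edges align with `selected`.

    A cell joins the row when its top y lies within `tolerance` of the
    selected cell's top, and its bottom y lies within `tolerance` of the
    selected cell's bottom.
    """
    _x, y, _w, h = selected
    row = [selected]
    for cell in cells:
        if cell == selected:
            continue  # the selected cell is already in the row
        _cx, cy, _cw, ch = cell
        top_ok = y - tolerance <= cy <= y + tolerance
        bottom_ok = y + h - tolerance <= cy + ch <= y + h + tolerance
        if top_ok and bottom_ok:
            row.append(cell)
    # Left-to-right order matches the column order of the table.
    return sorted(row, key=lambda c: c[0])


cells = [(0, 0, 50, 20), (55, 1, 30, 19), (90, -1, 30, 22), (0, 25, 50, 20)]
print(cells_in_same_row(cells[0], cells, tolerance=3))
# [(0, 0, 50, 20), (55, 1, 30, 19), (90, -1, 30, 22)]
```

Checking both the top and the bottom edge, rather than the top alone, keeps a tall merged cell from being pulled into a row it only partly overlaps.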
【0081】 The results display screen 1220 displays the strings 1221, 1222, 1223, and 1224 extracted from the cells of the detail table. Specifically, the strings "WWW" 1221, "2,000" 1222, "1" 1223, and "2,000" 1224 are displayed. These are the strings contained in row 1212 of the detail table 1211 displayed on the preview image display screen 1210. In addition, edit button 1225 corresponds to string (item value) 1221, edit button 1226 corresponds to string (item value) 1222, edit button 1227 corresponds to string (item value) 1223, and edit button 1228 corresponds to string (item value) 1224. When the user presses an edit button, the item value corresponding to that button can be modified. When the user presses the exit button 1230, all processes are terminated. 【0082】 As described above, according to this embodiment, generating a ground truth label image with expanded boundaries between cells reduces the variation in how ground truth labels are assigned to boundaries. Training on data created in this way makes it easier to obtain good results with the trained model. Although the process of expanding boundaries within ground truth labels has been described for tables with omitted grid lines, boundaries may also be expanded for tables with other features. For example, the process of this embodiment may also be applied to tables with grid lines or to tables in which boundaries are distinguished by the presence or absence of background color. 【0083】 <<Embodiment 2>> In Embodiment 1, consistent training data was created by generating a ground truth label image with extended boundaries between cells. Embodiment 1 described a detail table 501 without grid lines, but tables with other features may be input. 
For example, as shown in detail table 1300 in Figure 13(a), there are tables where an entire row is merged as a single cell, such as rows containing strings indicating titles for listing by product category, like the rows for "Office Supplies" and "Peripherals". In this case, detail table 1300 in the ground truth label image becomes the ground truth label image 1310 shown in Figure 13(b), and the boundaries are interrupted at the merged cells, as shown by the vertical boundaries 1311, 1312, and 1313. In contrast, this embodiment describes a method of treating these boundaries as the same group and generating a ground truth label image with extended boundaries within the same group. 【0084】 (Expanding cell boundaries within the correct label image) The process of expanding cell boundaries within a ground truth label image will be explained using Figures 13 and 14. Figure 13 is a diagram illustrating the expansion of boundaries within a ground truth label image. Figure 13(a) shows an example of a document detail table, and Figure 13(b) shows an example of the ground truth label for the detail table shown in Figure 13(a). Figure 13(c) shows the case where the vertical boundaries 1311, 1312, and 1313 in Figure 13(b) are selected, and Figure 13(d) shows the case where the vertical boundaries 1311, 1312, and 1313 in Figure 13(b) are expanded. Figure 13(e) shows an example of a document detail table, and Figure 13(f) shows an example of the ground truth label for the detail table shown in Figure 13(e). Figure 13(g) shows the case where the horizontal boundary lines 1361 and 1362 in Figure 13(f) are selected, and Figure 13(h) shows the case where the horizontal boundary lines 1361 and 1362 in Figure 13(f) are extended. Figure 14 is a flowchart showing the detailed flow of the cell boundary extension process (S402) in the correct label image. 
Note that the execution program for each step shown in Figure 14 is stored in either the ROM 232, RAM 234, or storage 235 of the learning device 120 and executed by the CPU 231 or GPU 239 of the learning device 120. The differences from Embodiment 1 will be explained in detail. 【0085】 If the boundary line is determined to be vertical (vertical in S704), the process proceeds to S1401. If the boundary line is determined to be horizontal (horizontal in S704), the process proceeds to S1405. 【0086】 In S1401, the CPU 231 identifies the boundary lines belonging to the same group as the boundary line selected in S703. For example, assume that in S703 the vertical boundary line 1311 is selected in the ground truth label image 1310 shown in Figure 13(b), which corresponds to the detail table 1300 shown in Figure 13(a). When the selected boundary line is vertical, the boundary lines belonging to the same group are identified by finding, among the other vertical boundary lines, those whose x-coordinates are close to those of the selected boundary line. Specifically, when the x-coordinates of the endpoints of the boundary line 1311 are x1 and x2, and the x-coordinates of the endpoints of another boundary line are x3 and x4, it is determined whether x1 - R < x3 < x1 + R and x2 - R < x4 < x2 + R are satisfied (R is a predetermined constant). If they are satisfied, the other boundary line is identified as belonging to the same group. In the ground truth label image 1310 shown in Figure 13(b), the boundary lines 1311, 1312, and 1313 are identified as belonging to the same group. 【0087】 In S1402, the CPU 231 acquires all the character strings located to the right and to the left of the boundary lines of the same group identified in S1401. For example, in the detail table 1300 shown in Figure 
13(c), for the boundary lines 1311, 1312, and 1313 of the same group, all the character strings adjacent in the right direction and the left direction respectively are acquired. In the left direction, the character string 1321 adjacent to the boundary line 1311 is acquired. Also, the character strings 1322 and 1323 adjacent to the boundary line 1312 are acquired. Also, the character string 1324 adjacent to the boundary line 1313 is acquired. In the right direction, the character string 1331 adjacent to the boundary line 1311 is acquired. Also, the character strings 1332 and 1333 adjacent to the boundary line 1312 are acquired. Also, the character string 1334 adjacent to the boundary line 1313 is acquired. 【0088】 In S1403, CPU231 identifies the string closest to the boundary line of the same group from the character range of the string obtained in S1402, both to the right and to the left of the boundary line of the same group. For example, in the detail table 1300 shown in Figure 13(c), among the strings located to the left of the boundary lines of the same group 1311, 1312, and 1313, string 1321 is identified as the string closest to the boundary lines of the same group 1311, 1312, and 1313. Also, among the strings located to the right of the boundary lines of the same group 1311, 1312, and 1313, string 1331 is identified as the string closest to the boundary lines of the same group 1311, 1312, and 1313. 【0089】 In S1404, CPU231 extends the boundaries of the same group based on the coordinates of the string identified in S1403. Specifically, CPU231 extends boundaries 1311, 1312, and 1313 from the left edge of the cell boundary corresponding to the boundary of the same group to the right edge 1325 of the area of ​​string 1321 identified to the left of the boundary of the same group. 
CPU231 extends boundaries 1311, 1312, and 1313 from the right edge of the cell boundary corresponding to the boundary of the same group to the left edge 1335 of the area of ​​string 1331 identified to the right of the boundary of the same group. As a result of the extension, CPU231 obtains the extended boundaries 1341, 1342, and 1343 as shown in Figure 13(d). In this way, the vertical boundaries of the same group are extended. 【0090】 In S1405, the CPU 231 identifies the boundary lines belonging to the same group as the boundary line obtained in S703. For example, in S703, it is assumed that the horizontal boundary line 1361 is selected in the correct label image 1360 shown in FIG. 13(f) corresponding to the list 1350 shown in FIG. 13(e). When the direction of the selected boundary line is horizontal, the method for identifying the boundary lines belonging to the same group is to identify, among the other horizontal boundary lines, the boundary line with a y - coordinate close to that of the selected boundary line. Specifically, when the y - coordinates of the endpoints of the boundary line 1361 are y1 and y2 respectively, and the y - coordinates of the endpoints of another boundary line are y3 and y4 respectively, it is determined whether y1 - T < y3 < y1 + T and y2 - T < y4 < y2 + T (T is a predetermined constant) are satisfied. If it is determined that they are satisfied, the obtained boundary line is identified as a boundary line belonging to the same group. In the correct label image 1360 shown in FIG. 13(f), the boundary lines 1361 and 1362 are identified as boundary lines belonging to the same group. 【0091】 In S1406, the CPU 231 acquires all the character strings existing respectively in the upward and downward directions, which are on both sides of the boundary lines of the same group identified in S1405. For example, in the list 1350 shown in FIG. 
13(g), for the boundary lines 1361 and 1362 of the same group, all the character strings adjacent in the upward and downward directions are acquired. In the upward direction, the character string 1351 adjacent to the boundary line 1361 is acquired, and the character strings 1352 and 1353 adjacent to the boundary line 1362 are acquired. In the downward direction, the character string 1354 adjacent to the boundary line 1361 is acquired, and the character strings 1355 and 1356 adjacent to the boundary line 1362 are acquired. 【0092】 In S1407, the CPU 231 identifies, both above and below the boundary lines of the same group, the string closest to those boundary lines based on the character ranges of the strings acquired in S1406. For example, in the detail table 1350 shown in Figure 13(g), string 1351 is identified as the closest among the strings located above the boundary lines 1361 and 1362 of the same group. Likewise, string 1355 is identified as the closest among the strings located below the boundary lines 1361 and 1362 of the same group. 【0093】 In S1408, the CPU 231 extends the boundary lines of the same group based on the coordinates of the strings identified in S1407. Specifically, the CPU 231 extends boundary lines 1361 and 1362 from the upper end of the cell boundaries corresponding to boundary lines 1361 and 1362 of the same group up to the lower end 1357 of the area of string 1351 identified above them. The CPU 231 also extends boundary lines 1361 and 1362 from the lower end of the cell boundaries corresponding to boundary lines 1361 and 1362 of the same group down to the upper end 1358 of the area of string 1355 identified below them. As a result of the extension, the CPU 231 obtains the extended boundary lines 1371 and 1372 as shown in Figure 13(h). 
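The grouping test used in S1401 (vertical, constant R) and S1405 (horizontal, constant T) can be sketched with one function. The sketch assumes each boundary line is stored as a pair of endpoints `((x1, y1), (x2, y2))` listed in a consistent order. The function name `same_group` and the `axis` parameter are hypothetical.

```python
def same_group(selected, candidates, tolerance, axis):
    """Return the boundary lines grouped with `selected`.

    axis=0 compares endpoint x-coordinates (vertical lines, constant R);
    axis=1 compares endpoint y-coordinates (horizontal lines, constant T).
    A candidate joins the group when both of its endpoint coordinates lie
    strictly within `tolerance` of the corresponding selected endpoints.
    """
    a1, a2 = selected[0][axis], selected[1][axis]
    group = [selected]
    for line in candidates:
        if line == selected:
            continue  # the selected line is already in the group
        b1, b2 = line[0][axis], line[1][axis]
        first_ok = a1 - tolerance < b1 < a1 + tolerance
        second_ok = a2 - tolerance < b2 < a2 + tolerance
        if first_ok and second_ok:
            group.append(line)
    return group


# Three interrupted segments of one column boundary plus an unrelated line.
upper = ((100, 10), (100, 40))
middle = ((101, 60), (101, 120))
lower = ((99, 140), (99, 170))
other = ((200, 10), (200, 170))
print(same_group(upper, [upper, middle, lower, other], tolerance=5, axis=0))
```

Once the group is known, the nearest strings are searched across all lines in the group, so every segment is extended to the same limits.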
In this way, the horizontal boundary lines of the same group are also extended. 【0094】 As explained above, in this embodiment, boundary lines are grouped, and boundary lines are extended within each group. This makes it possible to extend boundary lines even when they are interrupted, for example by merged cells, by treating the interrupted segments as one group. 【0095】 <<Other Embodiments>> In the above embodiments, the process of extending boundaries was described for a ground truth label image used to train a model that estimates cell boundaries. In another embodiment, the process of extending boundaries may be applied to a ground truth label image used to train a model that estimates cell regions instead of cell boundaries. 【0096】 Figure 15 illustrates the ground truth label images used to train a model for estimating cell regions, and the expansion of cell regions within those images. Figure 15(a) shows an example of a document image, and Figure 15(b) shows an example of the ground truth label image for the document image shown in Figure 15(a). Figure 15(c) shows the case where the vertical boundary line 1512 in Figure 15(b) is selected, and Figure 15(d) shows the case where the vertical boundary line 1512 in Figure 15(b) is expanded. 【0097】 For example, for the document image 500 in Figure 15(a), the ground truth label image for training a model to estimate cell regions is the ground truth label image 1510 shown in Figure 15(b). Within the ground truth label image 1510, the detail table 501 becomes the ground truth label 1511, and for pixels belonging to a cell region, a value indicating that it is a cell region (e.g., 255, representing the white area in the image) is set. For pixels not belonging to a cell region, a value indicating that it is not a cell region (e.g., 0, representing the black area in the image) is set. 
In this case, vertical and horizontal lines between cell regions within the ground truth label 1511 can be extracted as cell boundaries. For example, in the ground truth label 1511, boundaries 1512 to 1514 are obtained as vertical boundaries, and boundaries 1515 to 1519 are obtained as horizontal boundaries. This allows the boundaries to be extended using the same processing as in Embodiment 1. For example, as shown in Figure 15(c), for the vertical boundary line 1512, S705 and S706 obtain the strings 821 and 831 that are closest to the boundary line on its left and right. Then, the boundary line is extended based on the coordinates of the obtained strings. Specifically, by extending from the right edge 827 of the area of string 821 identified to the left of boundary line 1512 to the left edge 837 of the area of string 831 identified to the right of boundary line 1512, the extended boundary line 1522 is obtained as shown in Figure 15(d). By training a recognition model using the document image and the ground truth label image with the extended boundary as training data, the model can recognize cell regions rather than the boundaries between cells. 【0098】 As described above, the process of extending boundaries can be applied to the ground truth label images used to train a model that estimates cell regions. 【0099】 This disclosure can also be implemented by supplying a program that implements one or more of the functions of the above embodiments to a system or device via a network or storage medium, and by having one or more processors in the computer of that system or device read and execute the program. It can also be implemented by a circuit (e.g., an ASIC) that implements one or more functions. Furthermore, the program may be recorded on a computer-readable recording medium and provided. 【0100】 This embodiment includes the following configuration examples. 
【0101】 (Configuration 1) An information processing device that generates training data for obtaining a trained model for performing table recognition on document images, the device comprising: an acquisition means for acquiring a document image containing a table with strings written in its cells, and a ground truth image showing the structure of the table contained in the document image; a detection means for detecting the boundaries between cells that constitute the table in the ground truth image; and a generation means for generating a ground truth image in which a new boundary is set based on a string that is adjacent to the boundary and is obtained by character recognition on the document image.
【0102】 (Configuration 2) The information processing device according to Configuration 1, wherein the generation means generates the ground truth image in which the new boundary is set, on each side of the boundary, based on the string closest to the boundary.
【0103】 (Configuration 3) The information processing device according to Configuration 1 or 2, wherein the generation means generates the ground truth image in which the boundary is newly set by extending it, on each side of the boundary, to the string closest to the boundary.
【0104】 (Configuration 4) The information processing device according to any one of Configurations 1 to 3, further comprising an identification means for identifying the group to which the boundary belongs, wherein the generation means generates the ground truth image in which the new boundary is set based on the string adjacent to the boundaries regarded as belonging to the same group.
【0105】 (Configuration 5) The information processing device according to Configuration 4, wherein the generation means generates the ground truth image in which the new boundary is set, on each side of the boundaries regarded as belonging to the same group, based on the string closest to those boundaries.
【0106】 (Configuration 6) The information processing device according to Configuration 4 or 5, wherein the generation means generates the ground truth image in which the boundary is newly set by extending it, on each side of the boundaries regarded as belonging to the same group, to the string closest to those boundaries.
【0107】 (Configuration 7) The information processing device according to any one of Configurations 1 to 6, wherein the detection means detects the boundary in the vertical direction, and the generation means generates the ground truth image in which the vertical boundary is newly set based on the string adjacent to the vertical boundary.
【0108】 (Configuration 8) The information processing device according to Configuration 7, wherein the generation means generates the ground truth image in which the new vertical boundary is set, on each side of the vertical boundary, based on the string closest to the vertical boundary.
【0109】 (Configuration 9) The information processing device according to Configuration 7 or 8, wherein the generation means generates the ground truth image in which the vertical boundary is newly set by extending it, on each side of the vertical boundary, to the string closest to the vertical boundary.
【0110】 (Configuration 10) The information processing device according to any one of Configurations 7 to 9, further comprising an identification means for identifying the group to which the vertical boundary belongs, wherein the generation means generates the ground truth image in which the new vertical boundary is set based on the string adjacent to the vertical boundaries regarded as belonging to the same group.
【0111】 (Configuration 11) The information processing device according to Configuration 10, wherein the generation means generates the ground truth image in which the new vertical boundary is set, on each side of the vertical boundaries regarded as belonging to the same group, based on the string closest to those vertical boundaries.
【0112】 (Configuration 12) The information processing device according to Configuration 10 or 11, wherein the generation means generates the ground truth image in which the vertical boundary is newly set by extending it, on each side of the vertical boundaries regarded as belonging to the same group, to the string closest to those vertical boundaries.
【0113】 (Configuration 13) The information processing device according to any one of Configurations 1 to 12, wherein the detection means detects the boundary in the horizontal direction, and the generation means generates the ground truth image in which the horizontal boundary is newly set based on the string adjacent to the horizontal boundary.
【0114】 (Configuration 14) The information processing device according to Configuration 13, wherein the generation means generates the ground truth image in which the new horizontal boundary is set, on each side of the horizontal boundary, based on the string closest to the horizontal boundary.
【0115】 (Configuration 15) The information processing device according to Configuration 13 or 14, wherein the generation means generates the ground truth image in which the horizontal boundary is newly set by extending it, on each side of the horizontal boundary, to the string closest to the horizontal boundary.
【0116】 (Configuration 16) The information processing device according to any one of Configurations 13 to 15, further comprising an identification means for identifying the group to which the horizontal boundary belongs, wherein the generation means generates the ground truth image in which the new horizontal boundary is set based on the string adjacent to the horizontal boundaries regarded as belonging to the same group.
【0117】 (Configuration 17) The information processing device according to Configuration 16, wherein the generation means generates the ground truth image in which the new horizontal boundary is set, on each side of the horizontal boundaries regarded as belonging to the same group, based on the string closest to those horizontal boundaries.
【0118】 (Configuration 18) The information processing device according to Configuration 16 or 17, wherein the generation means generates the ground truth image in which the horizontal boundary is newly set by extending it, on each side of the horizontal boundaries regarded as belonging to the same group, to the string closest to those horizontal boundaries.
【0119】 (Configuration 19) The information processing device according to any one of Configurations 1 to 18, wherein the detection means detects the vertical boundary and the horizontal boundary, and the generation means generates the ground truth image in which the vertical boundary is newly set based on the string adjacent to the vertical boundary and the horizontal boundary is newly set based on the string adjacent to the horizontal boundary.
【0120】 (Configuration 20) The information processing device according to Configuration 19, wherein the generation means generates the ground truth image in which the new vertical boundary is set, on each side of the vertical boundary, based on the string closest to the vertical boundary, and the new horizontal boundary is set, on each side of the horizontal boundary, based on the string closest to the horizontal boundary.
【0121】 (Configuration 21) The information processing device according to Configuration 19 or 20, wherein the generation means generates the ground truth image in which the vertical boundary is newly set by extending it, on each side of the vertical boundary, to the string closest to the vertical boundary, and the horizontal boundary is newly set by extending it, on each side of the horizontal boundary, to the string closest to the horizontal boundary.
【0122】 (Configuration 22) The information processing device according to any one of Configurations 19 to 21, further comprising an identification means for identifying the groups to which the vertical boundary and the horizontal boundary belong, wherein the generation means generates the ground truth image in which the new vertical boundary is set based on the string adjacent to the vertical boundaries regarded as belonging to the same group, and the new horizontal boundary is set based on the string adjacent to the horizontal boundaries regarded as belonging to the same group.
【0123】 (Configuration 23) The information processing device according to Configuration 22, wherein the generation means generates the ground truth image in which the new vertical boundary is set, on each side of the vertical boundaries regarded as belonging to the same group, based on the string closest to those vertical boundaries, and the new horizontal boundary is set, on each side of the horizontal boundaries regarded as belonging to the same group, based on the string closest to those horizontal boundaries.
【0124】 (Configuration 24) The information processing device according to Configuration 22 or 23, wherein the generation means generates the ground truth image in which the vertical boundary is newly set by extending it, on each side of the vertical boundaries regarded as belonging to the same group, to the string closest to those vertical boundaries, and the horizontal boundary is newly set by extending it, on each side of the horizontal boundaries regarded as belonging to the same group, to the string closest to those horizontal boundaries.
【0125】 (Configuration 25) The information processing device according to any one of Configurations 1 to 24, wherein the acquisition means acquires the document image containing the table in which the boundaries between cells are omitted, and the ground truth image showing the structure of the table contained in the document image.
【0126】 (Configuration 26) An information processing system comprising: an acquisition means for acquiring a document image containing a table with strings written in its cells, and a ground truth image showing the structure of the table contained in the document image; a detection means for detecting the boundaries between cells that constitute the table in the ground truth image; a generation means for generating a ground truth image in which a new boundary is set based on a string that is adjacent to the boundary and is obtained by character recognition on the document image; a learning means for generating a trained model for performing table recognition on document images by training a learning model using, as training data, the ground truth image generated by the generation means and the document image corresponding to that ground truth image; a receiving means for receiving a document image containing a table with strings written in its cells; a table recognition means for performing table recognition on the received document image using the trained model; and an extraction means for extracting strings from the cells of the table recognized by the table recognition means.
【0127】 (Configuration 27) An information processing method for generating training data for obtaining a trained model for performing table recognition on document images, the method comprising: an acquisition step of acquiring a document image containing a table with strings written in its cells, and a ground truth image showing the structure of the table contained in the document image; a detection step of detecting the boundaries between cells that constitute the table in the ground truth image; and a generation step of generating a ground truth image in which a new boundary is set based on a string that is adjacent to the boundary and is obtained by character recognition on the document image.
【0128】 (Configuration 28) A program that causes a computer to execute the information processing method according to Configuration 27.
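As an illustration only, the extension of a vertical boundary to the nearest string on each side can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiments: the `Box` type, the function name `extend_vertical_boundary`, the pixel-coordinate convention, and the fallback of keeping the original position when no string is found on a side are all hypothetical choices made for this example. String bounding boxes are assumed to come from a separate character recognition step that is not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Box:
    """Hypothetical bounding box of a recognized string, in pixels."""
    left: int
    top: int
    right: int
    bottom: int


def extend_vertical_boundary(
    x: int, top: int, bottom: int, strings: List[Box]
) -> Tuple[int, int]:
    """Return (new_left, new_right), the horizontal extent of the extended boundary.

    The boundary is a vertical line at column x spanning rows top..bottom.
    It is widened leftward to the right edge of the nearest string on its
    left, and rightward to the left edge of the nearest string on its right.
    Only strings whose rows overlap the boundary's rows are considered.
    Assumption: if a side has no such string, that side stays at x.
    """
    # Keep only strings that share at least one row with the boundary.
    candidates = [s for s in strings if s.top < bottom and s.bottom > top]

    # Right edges of strings lying entirely to the left of the boundary.
    left_edges = [s.right for s in candidates if s.right <= x]
    # Left edges of strings lying entirely to the right of the boundary.
    right_edges = [s.left for s in candidates if s.left >= x]

    new_left = max(left_edges) if left_edges else x
    new_right = min(right_edges) if right_edges else x
    return new_left, new_right


# Example with made-up coordinates: two strings to the left, one to the right.
example_strings = [
    Box(10, 0, 40, 20),
    Box(70, 0, 120, 20),
    Box(5, 30, 30, 50),
]
print(extend_vertical_boundary(50, 0, 60, example_strings))  # → (40, 70)
```

A horizontal boundary could be handled symmetrically by swapping the roles of the x and y coordinates. The grouping of boundaries described in Configuration 4 is not modeled here.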

Claims

[Claim 1] An information processing device that generates training data for obtaining a trained model for performing table recognition on document images, the device comprising: an acquisition means for acquiring a document image containing a table with strings written in its cells, and a ground truth image showing the structure of the table contained in the document image; a detection means for detecting the boundaries between cells that constitute the table in the ground truth image; and a generation means for generating a ground truth image in which a new boundary is set based on a string that is adjacent to the boundary and is obtained by character recognition on the document image.
[Claim 2] The information processing device according to claim 1, wherein the generation means generates the ground truth image in which the new boundary is set, on each side of the boundary, based on the string closest to the boundary.
[Claim 3] The information processing device according to claim 1, wherein the generation means generates the ground truth image in which the boundary is newly set by extending it, on each side of the boundary, to the string closest to the boundary.
[Claim 4] The information processing device according to claim 1, further comprising an identification means for identifying the group to which the boundary belongs, wherein the generation means generates the ground truth image in which the new boundary is set based on the string adjacent to the boundaries regarded as belonging to the same group.
[Claim 5] The information processing device according to claim 4, wherein the generation means generates the ground truth image in which the new boundary is set, on each side of the boundaries regarded as belonging to the same group, based on the string closest to those boundaries.
[Claim 6] The information processing device according to claim 4, wherein the generation means generates the ground truth image in which the boundary is newly set by extending it, on each side of the boundaries regarded as belonging to the same group, to the string closest to those boundaries.
[Claim 7] The information processing device according to claim 1, wherein the detection means detects the boundary in the vertical direction, and the generation means generates the ground truth image in which the vertical boundary is newly set based on the string adjacent to the vertical boundary.
[Claim 8] The information processing device according to claim 7, wherein the generation means generates the ground truth image in which the new vertical boundary is set, on each side of the vertical boundary, based on the string closest to the vertical boundary.
[Claim 9] The information processing device according to claim 7, wherein the generation means generates the ground truth image in which the vertical boundary is newly set by extending it, on each side of the vertical boundary, to the string closest to the vertical boundary.
[Claim 10] The information processing device according to claim 7, further comprising an identification means for identifying the group to which the vertical boundary belongs, wherein the generation means generates the ground truth image in which the new vertical boundary is set based on the string adjacent to the vertical boundaries regarded as belonging to the same group.
[Claim 11] The information processing device according to claim 10, wherein the generation means generates the ground truth image in which the new vertical boundary is set, on each side of the vertical boundaries regarded as belonging to the same group, based on the string closest to those vertical boundaries.
[Claim 12] The information processing device according to claim 10, wherein the generation means generates the ground truth image in which the vertical boundary is newly set by extending it, on each side of the vertical boundaries regarded as belonging to the same group, to the string closest to those vertical boundaries.
[Claim 13] The information processing device according to claim 1, wherein the detection means detects the boundary in the horizontal direction, and the generation means generates the ground truth image in which the horizontal boundary is newly set based on the string adjacent to the horizontal boundary.
[Claim 14] The information processing device according to claim 13, wherein the generation means generates the ground truth image in which the new horizontal boundary is set, on each side of the horizontal boundary, based on the string closest to the horizontal boundary.
[Claim 15] The information processing device according to claim 13, wherein the generation means generates the ground truth image in which the horizontal boundary is newly set by extending it, on each side of the horizontal boundary, to the string closest to the horizontal boundary.
[Claim 16] The information processing device according to claim 13, further comprising an identification means for identifying the group to which the horizontal boundary belongs, wherein the generation means generates the ground truth image in which the new horizontal boundary is set based on the string adjacent to the horizontal boundaries regarded as belonging to the same group.
[Claim 17] The information processing device according to claim 16, wherein the generation means generates the ground truth image in which the new horizontal boundary is set, on each side of the horizontal boundaries regarded as belonging to the same group, based on the string closest to those horizontal boundaries.
[Claim 18] The information processing device according to claim 16, wherein the generation means generates the ground truth image in which the horizontal boundary is newly set by extending it, on each side of the horizontal boundaries regarded as belonging to the same group, to the string closest to those horizontal boundaries.
[Claim 19] The information processing device according to claim 1, wherein the detection means detects the vertical boundary and the horizontal boundary, and the generation means generates the ground truth image in which the vertical boundary is newly set based on the string adjacent to the vertical boundary and the horizontal boundary is newly set based on the string adjacent to the horizontal boundary.
[Claim 20] The information processing device according to claim 19, wherein the generation means generates the ground truth image in which the new vertical boundary is set, on each side of the vertical boundary, based on the string closest to the vertical boundary, and the new horizontal boundary is set, on each side of the horizontal boundary, based on the string closest to the horizontal boundary.
[Claim 21] The information processing device according to claim 19, wherein the generation means generates the ground truth image in which the vertical boundary is newly set by extending it, on each side of the vertical boundary, to the string closest to the vertical boundary, and the horizontal boundary is newly set by extending it, on each side of the horizontal boundary, to the string closest to the horizontal boundary.
[Claim 22] The information processing device according to claim 19, further comprising an identification means for identifying the groups to which the vertical boundary and the horizontal boundary belong, wherein the generation means generates the ground truth image in which the new vertical boundary is set based on the string adjacent to the vertical boundaries regarded as belonging to the same group, and the new horizontal boundary is set based on the string adjacent to the horizontal boundaries regarded as belonging to the same group.
[Claim 23] The information processing device according to claim 22, wherein the generation means generates the ground truth image in which the new vertical boundary is set, on each side of the vertical boundaries regarded as belonging to the same group, based on the string closest to those vertical boundaries, and the new horizontal boundary is set, on each side of the horizontal boundaries regarded as belonging to the same group, based on the string closest to those horizontal boundaries.
[Claim 24] The information processing device according to claim 22, wherein the generation means generates the ground truth image in which the vertical boundary is newly set by extending it, on each side of the vertical boundaries regarded as belonging to the same group, to the string closest to those vertical boundaries, and the horizontal boundary is newly set by extending it, on each side of the horizontal boundaries regarded as belonging to the same group, to the string closest to those horizontal boundaries.
[Claim 25] The information processing device according to claim 1, wherein the acquisition means acquires the document image containing the table in which the boundaries between cells are omitted, and the ground truth image showing the structure of the table contained in the document image.
[Claim 26] An information processing system comprising: an acquisition means for acquiring a document image containing a table with strings written in its cells, and a ground truth image showing the structure of the table contained in the document image; a detection means for detecting the boundaries between cells that constitute the table in the ground truth image; a generation means for generating a ground truth image in which a new boundary is set based on a string that is adjacent to the boundary and is obtained by character recognition on the document image; a learning means for generating a trained model for performing table recognition on document images by training a learning model using, as training data, the ground truth image generated by the generation means and the document image corresponding to that ground truth image; a receiving means for receiving a document image containing a table with strings written in its cells; a table recognition means for performing table recognition on the received document image using the trained model; and an extraction means for extracting strings from the cells of the table recognized by the table recognition means.
[Claim 27] An information processing method for generating training data for obtaining a trained model for performing table recognition on document images, the method comprising: an acquisition step of acquiring a document image containing a table with strings written in its cells, and a ground truth image showing the structure of the table contained in the document image; a detection step of detecting the boundaries between cells that constitute the table in the ground truth image; and a generation step of generating a ground truth image in which a new boundary is set based on a string that is adjacent to the boundary and is obtained by character recognition on the document image.
[Claim 28] A program for causing a computer to execute the information processing method according to claim 27.