Method and program for processing table data in document

The method and program address the challenge of processing table data with merged cells and complex structures by converting it into a structured format, ensuring effective utilization in intelligent systems.

WO2026135120A1 · PCT designated stage · Publication Date: 2026-06-25 · POSCO HLDG INC

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
POSCO HLDG INC
Filing Date
2025-12-16
Publication Date
2026-06-25

AI Technical Summary

Technical Problem

Existing systems struggle to effectively process table data in various document formats, particularly those with merged cells or complex hierarchical structures, leading to a loss of semantic relationships and difficulties in utilizing such data within RAG systems.

Method used

A method and program for detecting tables within documents, analyzing their matrix structure, identifying hierarchical relationships, and converting the data into a structured format, preserving semantic associations.

Benefits of technology

Enables efficient processing and utilization of table data across different formats by maintaining hierarchical relationships and semantic associations, facilitating consistent processing in intelligent systems.

Abstract

A method for processing table data in a document, according to one embodiment of the present invention, may comprise the steps of: detecting a table area in a document; analyzing a row-column structure of the detected table; identifying a hierarchical relationship of text elements in the table on the basis of the analyzed row-column structure; mapping higher-level text and lower-level text according to the identified hierarchical relationship; converting the mapped text data into a structured format; and providing the converted structured data to an intelligent system.

Description

Method and program for processing table data within a document

[0001] The present invention relates to a method for detecting and identifying table data in electronic documents having multiple formats for the construction of a RAG system.

[0002] Existing systems for processing information within electronic documents have focused primarily on plain text processing, which has limitations in effectively processing data structured in a table format.

[0003] Failure to properly recognize the structural characteristics of tables can lead to a loss of relationships between data, and in the case of tables with merged cells or complex hierarchical structures, it is difficult to process them while preserving semantic relationships. Furthermore, it is challenging to process tables within documents created in various formats (PDF, Word, Excel, etc.) in a consistent manner. Consequently, there are limitations in effectively utilizing documents containing table data within the RAG system.

[0004] According to the present invention, a method for detecting a table included in various electronic documents and analyzing the structure of the table is provided.

[0005] According to the present invention, a method is provided for detecting a table included in various electronic documents and analyzing text within the table.

[0006] According to the present invention, a method for converting mapped text into a structured format is provided.

[0007] The technical problems to be solved in this document are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood by those skilled in the art to which this invention belongs from the description below.

[0008] A method for processing table data within a document according to an embodiment of the present invention may include: detecting a table area within the document; analyzing the matrix structure of the detected table; identifying the hierarchical relationship of text elements within the table based on the analyzed matrix structure; mapping a superordinate concept text and a subordinate concept text according to the identified hierarchical relationship; converting the mapped text data into a structured format; and providing the converted structured data to an intelligent system.

[0009] In the above table data processing method, the matrix structure analysis step may include: a step of identifying the coordinates of the rows and columns of the table; a step of verifying cell merging information based on the identified coordinates; and a step of analyzing the hierarchical structure between text elements based on the cell merging information and the coordinate information.

[0010] In the above table data processing method, the step of detecting a table area within the document may include: receiving electronic documents of various formats; converting the input documents into a standardized format; identifying the boundaries of a table area within the converted documents; and recognizing the cell structure within the identified table area.

[0011] In the above table data processing method, the step of identifying hierarchical relationships may include: a step of determining the parent-child relationship between text elements by analyzing the positional relationship on matrix coordinates; a step of identifying text groups of the same hierarchy based on cell merging information; and a step of determining the dependency relationship between the identified text groups.

[0012] In the above table data processing method, the step of converting to the structured format may include: a step of converting the identified hierarchical relationship into a tree structure; a step of structuring the mapping relationship between the super-concept text and the sub-concept text; and a step of converting the structured mapping relationship into a data format that includes metadata.

[0013] A computer program according to an embodiment of the present invention is stored on a computer-readable storage medium and, when executed on one or more processors of a computing device, performs steps of processing table data within a document. The steps may include: detecting a table area within the document; analyzing the matrix structure of the detected table; identifying the hierarchical relationship of text elements within the table based on the analyzed matrix structure; mapping a super-concept text and a sub-concept text according to the identified hierarchical relationship; converting the mapped text data into a structured format; and providing the converted structured data to an intelligent system.

[0014] In the table data processing program above, the matrix structure analysis step may include: a step of identifying the coordinates of the rows and columns of the table; a step of verifying cell merging information based on the identified coordinates; and a step of analyzing the hierarchical structure between text elements based on the cell merging information and the coordinate information.

[0015] In the table data processing program above, the step of detecting a table area within the document may include: receiving electronic documents of various formats; converting the input documents into a standardized format; identifying the boundaries of the table area within the converted documents; and recognizing the cell structure within the identified table area.

[0016] In the table data processing program above, the hierarchical relationship identification step may include: a step of determining the parent-child relationship between text elements by analyzing the positional relationship on the matrix coordinates; a step of identifying text groups of the same hierarchy based on cell merging information; and a step of determining the dependency relationship between the identified text groups.

[0017] In the table data processing program above, the step of converting to the structured format may include: a step of converting the identified hierarchical relationship into a tree structure; a step of structuring the mapping relationship between the super-concept text and the sub-concept text; and a step of converting the structured mapping relationship into a data format that includes metadata.

[0018] In a storage medium storing at least one instruction according to an embodiment of the present invention, when the at least one instruction is executed by a processor, the processor may perform the steps of: detecting a table area within a document; analyzing the matrix structure of the detected table; identifying the hierarchical relationship of text elements within the table based on the analyzed matrix structure; mapping a super-concept text and a sub-concept text according to the identified hierarchical relationship; converting the mapped text data into a structured format; and providing the converted structured data to an intelligent system.

[0019] According to the present invention, table data contained in documents of various formats can be detected, and the structure of the detected table can be determined so that its rows and columns are assigned coordinates.

[0020] According to the present invention, the upper and lower cells that make up the hierarchical structure of a table can be identified based on the row and column structure of the coordinate-mapped table.

[0021] According to the present invention, after the text is matched to the coordinate-mapped table, the contents of the table can be transformed based on the table's hierarchical structure, and the transformed contents can be converted into a machine-readable form such as machine language.

[0022] FIG. 1 is an overall flowchart of a table data processing method according to one embodiment of the present invention.

[0023] FIG. 2 is a detailed flowchart of the layout structure analysis process according to one embodiment of the present invention.

[0024] FIG. 3 is a block diagram showing a system configuration for table data processing according to one embodiment of the present invention.

[0025] FIG. 4 is a block diagram illustrating a data conversion process according to an embodiment of the present invention.

[0026] FIG. 5 is a block diagram showing the configuration of a computing device according to one embodiment of the present invention.

[0027] FIG. 6 is a diagram showing a specific example of table data processing according to one embodiment of the present invention.

[0028] The embodiments described in this document and the configurations illustrated in the drawings are merely preferred examples of the disclosed invention, and various modifications that may replace the embodiments and drawings of this specification may exist at the time of filing this application.

[0029] The terms used in this document are for describing the embodiments and are not intended to limit or restrict the disclosed invention.

[0030] For example, in this specification, singular expressions may include plural expressions unless the context clearly indicates otherwise.

[0031] In this document, each of the phrases such as "A or B", "at least one of A and B", "at least one of A or B", "A, B or C", "at least one of A, B and C", and "at least one of A, B, or C" may include any one of the items listed together in the corresponding phrase, or all possible combinations thereof.

[0032] The term "and / or" includes a combination of multiple related described components or any of the multiple related described components. For example, "A and / or B" may include only "A," only "B," or both "A and B."

[0033] Additionally, terms such as “include” or “have” are intended to express the existence of the features, numbers, steps, actions, components, parts, or combinations thereof described in the specification, and do not exclude the additional existence or addition of one or more other features, numbers, steps, actions, components, parts, or combinations thereof.

[0034] When it is said that one component is “connected,” “combined,” “supported,” or “in contact” with another component, this includes not only cases where the components are directly connected, combined, supported, or in contact, but also cases where they are indirectly connected, combined, supported, or in contact through a third component.

[0035] When it is said that a component is located “on” another component, this includes not only cases where one component is in contact with the other, but also cases where another component exists between the two components.

[0036] Meanwhile, terms such as “front,” “rear,” “left,” “right,” “top,” and “bottom” used in the following description are defined based on the drawings; however, the shape and position of each component are not limited by these terms. For example, the front side may be defined as the +X side and the rear side as the -X side. For example, based on the drawings, the right side may be defined as the +Y side and the left side as the -Y side. For example, based on the drawings, the top side may be defined as the +Z side and the bottom side as the -Z side.

[0037] In addition, terms including ordinal numbers, such as "first," "second," etc., are used to distinguish one component from another and do not limit the components.

[0038] In addition, terms such as "~part," "~unit," "~block," and "~module" may refer to a unit that processes at least one function or operation. For example, the terms may refer to at least one piece of hardware such as an FPGA (field-programmable gate array) or ASIC (application-specific integrated circuit), at least one piece of software stored in memory, or at least one process executed by a processor.

[0039] An embodiment of the disclosed invention is described in detail below with reference to the attached drawings. Identical reference numbers or symbols in the attached drawings may indicate parts or components that perform substantially the same function.

[0040] The present invention may relate to a method and program for effectively processing table data within a document and converting it into a form usable in an intelligent system. Specifically, the invention may relate to a technology for detecting a table area in a document, analyzing the matrix structure of the table, identifying the hierarchical relationships of text elements based on the analyzed structure, and then converting them into a structured format.

[0041] The present invention aims to go beyond simple text extraction from structured data, specifically tables, and to convert the data into a form that can be processed by an intelligent system while preserving the hierarchical relationships and semantic associations between data elements. Through this, table data contained in documents of various formats can be efficiently processed and utilized.

[0042] The operating principle and embodiments of the present invention will be described below with reference to the attached drawings.

[0043] FIG. 1 is an overall flowchart of a table data processing method according to an embodiment of the present invention. FIG. 1 may show the entire process from the input of an electronic document to the provision of the result to an intelligent system.

[0044] Referring to FIG. 1, when an electronic document is input (100), the control unit of the computing device can detect a table area within the document (110). In the table area detection process, a structured data area within the document can be automatically identified and the boundaries of the area can be determined.

[0045] In this specification, structured data area detection may refer to table area detection (110).

[0046] The control unit of the computing device can perform a layout structure analysis on the detected table area (120). Analyzing the layout structure of the table may involve identifying the row and column structure of the table and analyzing structural characteristics such as cell merging information.

[0047] Once the layout structure analysis is complete, the control unit of the computing device can map text elements within the table according to a hierarchical structure (130). Mapping text within the table according to a hierarchical structure may establish a relationship between a higher concept and a lower concept.

[0048] Mapped text data can be converted into a structured format by the control unit of the computing device (140), in which the relationships and meanings between the data are preserved. For example, converting into a structured format may mean converting the identified hierarchical relationships into a tree structure, structuring the mapping relationships between the super-concept text and the sub-concept text, and converting into a data format that includes the structured mapping relationships as metadata.
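
The tree conversion described above can be illustrated with a small node type that carries metadata alongside each text element. This is a minimal sketch under assumptions: the `Node` class, the metadata keys (`level`, `row`, `row_span`, `value`), and the row numbers are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One text element of the table plus illustrative metadata."""
    text: str
    metadata: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def add(self, child):
        # Attach a lower-level node and return it so calls can be chained.
        self.children.append(child)
        return child

    def to_dict(self):
        # Recursively convert the tree into plain dicts and lists.
        return {
            "text": self.text,
            "metadata": self.metadata,
            "children": [child.to_dict() for child in self.children],
        }

# Higher-level concept 'Display' with two lower-level concepts.
root = Node("table", {"level": 0})
display = root.add(Node("Display", {"level": 1, "row": 1, "row_span": 2}))
display.add(Node("Size", {"level": 2, "row": 1, "value": "6.7 inches"}))
display.add(Node("Resolution", {"level": 2, "row": 2, "value": "3200x1800"}))

tree = root.to_dict()
```

Keeping the coordinates in the metadata means the original position of each element remains recoverable after conversion, which is one plausible reading of "a data format that includes metadata".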

[0049] When the control unit of a computing device converts table data into a structured format, it may involve consolidating the formats of multiple electronic documents of various types into a single format. Additionally, it may involve converting the representation of the data so that a computer program can identify it. For example, this may mean converting table data in a human-readable natural language form into a machine language form that a computer can process.

[0050] Specifically, the data "display" item in the table and its sub-item "size: 6.7 inches" can be converted into a structured format that a computer can understand, such as {"category": "display", "specification": {"size": "6.7inch"}}. Through this conversion, the hierarchical structure and semantic relationships of the data are expressed in a form that a computer can process, which can be utilized later for tasks such as database storage, retrieval, and analysis.
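
As a rough illustration of the conversion in this example, the record can be built as a nested dictionary and serialized to JSON. The helper name `to_record` is hypothetical; only the keys and values come from the example above.

```python
import json

def to_record(category, item, value):
    """Wrap one item/value pair under its higher-level category."""
    return {"category": category, "specification": {item: value}}

record = to_record("display", "size", "6.7inch")

# Serialize to JSON text so the record can be stored or transmitted.
text = json.dumps(record)
```

With the default separators of `json.dumps`, `text` is exactly the string shown in the example, and `json.loads(text)` restores the same nested dictionary.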

[0051] Furthermore, conversion into a standardized format can ensure data consistency and standardization. Even table data of different forms can be converted into a format with the same rules and structure, enabling consistent processing in intelligent systems.

[0052] The converted structured data can be provided to an intelligent system (150).

[0053] Through this series of processes, structured data in the form of a table is converted into a form that can be processed by an intelligent system, and each step is executed sequentially or, if necessary, some steps can be processed in parallel.
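
The flow of FIG. 1 can be sketched as a chain of functions, one per step. This is only an illustration, not the claimed implementation: every function name, the grid-of-strings input, and the convention that an empty string marks a cell covered by a merged cell above it are assumptions made for the example.

```python
import json

# Hypothetical result of steps 100-110: a detected table read into a grid.
# An empty string marks a cell covered by a vertically merged cell (assumption).
GRID = [
    ["Classification", "Item", "Specification"],
    ["Display", "Size", "6.7 inches"],
    ["", "Resolution", "3200x1800"],
    ["Performance", "CPU", "Latest CPU"],
    ["", "RAM", "12GB"],
]

def analyze_layout(grid):
    """Step 120: propagate merged labels so each body row knows its parent."""
    rows, parent = [], None
    for category, item, value in grid[1:]:
        parent = category or parent
        rows.append((parent, item, value))
    return rows

def map_hierarchy(rows):
    """Step 130: map each higher-level text to its lower-level item/value pairs."""
    mapping = {}
    for parent, item, value in rows:
        mapping.setdefault(parent, {})[item] = value
    return mapping

def to_structured(mapping):
    """Step 140: emit one structured record per category."""
    return [{"category": c, "specification": s} for c, s in mapping.items()]

def provide(records):
    """Step 150: serialize for a downstream system (here, JSON text only)."""
    return json.dumps(records, ensure_ascii=False)

records = to_structured(map_hierarchy(analyze_layout(GRID)))
payload = provide(records)
```

Because each step only consumes the output of the previous one, independent tables could be pushed through the same chain in parallel, in line with the remark above about parallel processing.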

[0054] FIG. 2 is a detailed flowchart of a layout structure analysis process according to an embodiment of the present invention. FIG. 2 illustrates in detail the process of analyzing the structural characteristics of a table and identifying the relationships between data.

[0055] Referring to FIG. 2, the layout structure analysis (120) step may include a step of analyzing and identifying row / column structures (121), a step of checking table cell merge information (122), and a step of identifying hierarchical relationships (123).

[0056] The control unit of the computing device can analyze and identify the row / column structure of a table included in an electronic document (121). This is a process of identifying the basic grid structure that constitutes the table, and may include a step of identifying the coordinates of the rows and columns of the table.

[0057] For example, analyzing and identifying the row / column structure of a table may involve determining the coordinate values of the column structure, such as 'Classification', 'Item', and 'Specification', and the row structure for each item in a product specification sheet. This coordinate-based analysis serves as a foundation for understanding the physical structure of the table and can be utilized as an important reference point in subsequent analysis steps.
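
A minimal sketch of coordinate assignment, assuming the table has already been read into a list of rows. The `Cell` type and the function name are hypothetical.

```python
from typing import NamedTuple

class Cell(NamedTuple):
    row: int
    col: int
    text: str

def assign_coordinates(grid):
    """Give every cell a zero-based (row, col) coordinate."""
    return [
        Cell(r, c, text)
        for r, row in enumerate(grid)
        for c, text in enumerate(row)
    ]

grid = [
    ["Classification", "Item", "Specification"],
    ["Display", "Size", "6.7 inches"],
]
cells = assign_coordinates(grid)

# Column coordinates of the header texts, usable as reference points later.
header = {cell.text: cell.col for cell in cells if cell.row == 0}
```

Here `header` maps 'Classification', 'Item', and 'Specification' to columns 0, 1, and 2.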

[0058] The control unit of the computing device can check table cell merge information (122). This may include checking cell merge information based on identified coordinates. Checking cell merge information may mean identifying cells that have been merged across multiple rows or columns and analyzing the impact of such merging on the structure of the table and data relationships.

[0059] For example, if a 'Display' item is merged across multiple rows, checking the merge information means identifying the start and end coordinates of the merged cell and analyzing its relationship with the other cells contained within the merged area. This merge information plays a key role in understanding data grouping and the hierarchical structure within the table.
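
One way to represent merge information is a start coordinate plus row and column spans, from which the end coordinate and the covered area follow. The `MergedCell` class and its field names are assumptions for illustration; the specification does not prescribe a representation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MergedCell:
    text: str
    row: int           # start row (zero-based)
    col: int           # start column (zero-based)
    row_span: int = 1  # number of rows the cell covers
    col_span: int = 1  # number of columns the cell covers

    @property
    def end(self):
        """Inclusive end coordinate of the merged area."""
        return (self.row + self.row_span - 1, self.col + self.col_span - 1)

    def covers(self, row, col):
        """True if (row, col) lies inside the merged area."""
        end_row, end_col = self.end
        return self.row <= row <= end_row and self.col <= col <= end_col

    def rows(self):
        """Rows spanned by the merged cell."""
        return range(self.row, self.row + self.row_span)

# 'Display' starts at row 1, column 0 and is merged across two rows.
display = MergedCell("Display", row=1, col=0, row_span=2)
```

For this cell, `display.end` is `(2, 0)` and `display.covers(2, 0)` is true, so row 2 belongs to the 'Display' group even though its first column holds no text of its own.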

[0060] When table cell merging information is confirmed, the control unit of the computing device can identify the hierarchical relationship between the data based on this (123). This may include the step of analyzing the hierarchical structure between text elements based on cell merging information and coordinate information.

[0061] Identifying hierarchical relationships involves grasping the parent-child or group relationships within the data of a table, thereby understanding semantic relationships that go beyond a simple matrix structure.

[0062] For example, by analyzing the coordinate information of 'Size' and 'Resolution' cells associated with a merged cell named 'Display' in a product specification sheet, it is possible to determine that 'Display' is a higher-level concept and 'Size' and 'Resolution' are lower-level concepts.
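
The inference in this example can be sketched as a lookup: the lower-level concepts of a merged cell are the cells in the adjacent column whose rows fall inside its row span. The dictionary layout and the rule "children sit in the next column to the right" are assumptions; real tables may place headers differently.

```python
def find_children(parent, cells):
    """Return texts of cells right of `parent` whose row lies in its row span."""
    first_row = parent["row"]
    last_row = first_row + parent.get("row_span", 1) - 1
    child_col = parent["col"] + parent.get("col_span", 1)
    return [
        cell["text"]
        for cell in cells
        if cell["col"] == child_col and first_row <= cell["row"] <= last_row
    ]

# Hypothetical coordinates for a product specification sheet.
cells = [
    {"text": "Display", "row": 1, "col": 0, "row_span": 2},
    {"text": "Size", "row": 1, "col": 1},
    {"text": "Resolution", "row": 2, "col": 1},
    {"text": "Performance", "row": 3, "col": 0, "row_span": 2},
    {"text": "CPU", "row": 3, "col": 1},
    {"text": "RAM", "row": 4, "col": 1},
]

display_children = find_children(cells[0], cells)
performance_children = find_children(cells[3], cells)
```

With these coordinates, 'Display' is resolved as the parent of 'Size' and 'Resolution', and 'Performance' as the parent of 'CPU' and 'RAM'.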

[0063] Understanding this hierarchical structure serves as an important foundation for grasping the logical structure of data and subsequently converting it into a structured format.

[0064] The detailed steps of this layout structure analysis may be performed sequentially, or some steps may be processed in parallel. The information obtained from each step is utilized in the analysis of the next step, ultimately enabling a complete structural understanding of the table.

[0065] In particular, the utilization of coordinate-based analysis and cell merging information enables the accurate identification of data relationships even within complex table structures, serving as key information for the precise interpretation and transformation of data in subsequent processing stages.

[0066] FIG. 3 is a block diagram showing a system configuration for table data processing according to an embodiment of the present invention. FIG. 3 shows each component required for the data processing process from electronic document input to intelligent system linkage and the relationships between them.

[0067] Referring to FIG. 3, the system may include an electronic document input unit (210), a processing unit (220), and an intelligent system linkage unit (230). The processing unit (220) may include a data area detection unit (221), a structure analysis unit (222), a text extraction unit (223), and a data conversion unit (224).

[0068] The electronic document input unit (210) performs the role of receiving electronic documents of various formats. The input electronic documents may be in various formats such as PDF, Word, Excel, etc., and the electronic document input unit (210) may include a function to convert these documents into a standardized format that the system can process. The converted documents may be transmitted to the processing unit (220) for subsequent processing.
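
Format handling of this kind is often organized as a registry that maps a file extension to a converter producing one common representation. The sketch below assumes the standardized format is a list of rows and registers only CSV and TSV converters from the Python standard library; converters for PDF, Word, or Excel would need additional libraries and are deliberately not shown. All names are hypothetical.

```python
import csv
import io
from pathlib import PurePath

CONVERTERS = {}  # extension -> function returning a list of rows

def register(*extensions):
    """Decorator that registers a converter for the given extensions."""
    def decorator(func):
        for ext in extensions:
            CONVERTERS[ext] = func
        return func
    return decorator

@register(".csv")
def from_csv(data):
    return list(csv.reader(io.StringIO(data.decode("utf-8"))))

@register(".tsv")
def from_tsv(data):
    return list(csv.reader(io.StringIO(data.decode("utf-8")), delimiter="\t"))

def standardize(filename, data):
    """Convert raw bytes into the common row-list format based on extension."""
    ext = PurePath(filename).suffix.lower()
    try:
        converter = CONVERTERS[ext]
    except KeyError:
        raise ValueError(f"no converter registered for {ext!r}") from None
    return converter(data)

rows = standardize("spec.CSV", b"Item,Specification\nSize,6.7 inches\n")
```

Downstream units then only need to understand the common row-list format, regardless of which converter produced it.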

[0069] The data area detection unit (221) of the processing unit (220) can perform the function of identifying the boundaries of a table area within a converted document and recognizing the cell structure within the table area. This may be a process of automatically finding the area containing table data within the document and specifying its range.

[0070] For example, in a document containing a product specification sheet, column headers such as 'Classification', 'Item', and 'Specification', as well as grid structures, can be detected to recognize them as table areas, and the coordinates of the start and end points of the table can be determined.
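
As a deliberately simple stand-in for table area detection, the sketch below scans text lines and reports runs of consecutive lines that share the same number of delimiter-separated columns. This heuristic, the `|` delimiter, and the function name are assumptions for illustration only; the specification does not state that detection works this way, and a real detector would also handle visual layouts.

```python
def detect_table_regions(lines, delimiter="|", min_rows=2):
    """Return inclusive (start, end) line indices of table-like runs."""
    regions, start, width = [], None, None
    # A trailing empty sentinel line closes a run that reaches the end.
    for i, line in enumerate(list(lines) + [""]):
        cols = line.count(delimiter) + 1 if delimiter in line else 0
        if start is not None and cols != width:
            # The column count changed, so the current run ends at i - 1.
            if i - start >= min_rows:
                regions.append((start, i - 1))
            start, width = None, None
        if start is None and cols >= 2:
            start, width = i, cols
    return regions

lines = [
    "Product overview",
    "Classification | Item | Specification",
    "Display | Size | 6.7 inches",
    "Display | Resolution | 3200x1800",
    "End of sheet",
]
regions = detect_table_regions(lines)
```

For this input the function reports one region spanning lines 1 to 3, which correspond to the start and end points of the table.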

[0071] The structure analysis unit (222) can identify the coordinates of the rows and columns of the table, verify cell merging information based on the identified coordinates, and analyze the hierarchical structure between text elements based on the cell merging information and the coordinate information. Through this analysis, the structural characteristics of the table and the relationships between the data can be identified.

[0072] For example, if the 'Display' item is merged across multiple rows and the 'Size' and 'Resolution' items below it occupy separate rows, the hierarchical relationship between them can be identified.

[0073] The text extraction unit (223) extracts text information within the table based on the analyzed structure. In this process, it analyzes the positional relationships on the matrix coordinates to determine the hierarchical relationships between text elements, identifies text groups of the same hierarchy based on cell merging information, and performs the function of determining the dependency relationships between the identified text groups.

[0074] For example, text information such as 'Size: 6.7 inches' and 'Resolution: 3200x1800' under the 'Display' item can be extracted, and it can be recognized that these are sub-items dependent on the parent concept of 'Display'. In addition, the hierarchical relationship between the 'Performance' item at the same level and information such as 'CPU: Latest CPU' and 'RAM: 12GB' under it can also be identified and extracted.

[0075] The data conversion unit (224) converts the extracted text data into a structured format. This includes the process of converting the identified hierarchical relationships into a tree structure, structuring the mapping relationships between the upper concept text and the lower concept text, and converting the data into a data format that includes the structured mapping relationships as metadata.

[0076] For example, extracted product specification data can be converted into the following JSON format:
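
The JSON listing itself is not reproduced in this text. One plausible shape, assembled only from the 'Display' and 'Performance' values mentioned in the surrounding paragraphs, is sketched below; the top-level key `product_specification` and the overall layout are assumptions, not the applicant's format.

```python
import json

# Hypothetical JSON layout; only the values come from the description.
EXAMPLE_JSON = """
{
  "product_specification": [
    {
      "category": "Display",
      "specification": {"Size": "6.7 inches", "Resolution": "3200x1800"}
    },
    {
      "category": "Performance",
      "specification": {"CPU": "Latest CPU", "RAM": "12GB"}
    }
  ]
}
"""

# Parsing confirms the text is valid JSON and exposes the hierarchy.
data = json.loads(EXAMPLE_JSON)
categories = [entry["category"] for entry in data["product_specification"]]
```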

[0077] The intelligent system linkage unit (230) performs the role of finally processing and transmitting the converted structured data into a form that can be utilized in the intelligent system. This enables the table data to be effectively utilized in the intelligent system.

[0078] Through this system configuration, table data is systematically processed and transformed into a form usable by the intelligent system, and each component operates independently yet is organically connected to enable the efficient operation of the entire system.

[0079] FIG. 4 is a block diagram illustrating a data conversion process according to an embodiment of the present invention. FIG. 4 shows how original data is converted into structured data.

[0080] Referring to FIG. 4, the data conversion process may include the steps of analyzing original data (210), intermediate processing (241, 242, 244), and generating final structured data (250).

[0081] The original data (210) may include structural characteristic data (211), hierarchical relationship data (212), and combination relationship data (213). The structural characteristic data (211) may be data representing the basic matrix structure of a table and the physical characteristics of a cell. For example, column structures such as 'Classification', 'Item', and 'Specification' in a product specification sheet, or the size and location information of each cell, may correspond to this.

[0082] Hierarchical relationship data (212) may be information indicating the hierarchical relationship between data elements within a table. For example, this may include information regarding the relationship between a parent item called 'Display' and its child items 'Size' and 'Resolution'. Such hierarchical relationships may be expressed through visual characteristics such as cell merging or indentation.

[0083] The combination relationship data (213) may be information indicating the association between data elements in the same layer. For example, this may be relationship information forming item-value pairs such as 'size' and '6.7 inches', or 'resolution' and '3200x1800'.

[0084] The original data may undergo the processes of structural characteristic transformation (241), hierarchical relationship structuring (242), and combination relationship structuring (244), respectively. Structural characteristic transformation (241) may be a process of transforming the physical structure of a table into a logical structure.

[0085] Hierarchical relationship structuring (242) may be a process of converting identified hierarchical relationships into a tree structure and structuring mapping relationships between upper concept text and lower concept text. Combination relationship structuring (244) may be a process of converting structured mapping relationships into a data format that includes metadata.

[0086] Through this transformation process, structured data (250) can be finally generated. The structured data (250) includes a structured structure tree (251), a hierarchical data tree (252), and a structured combination relationship tree (253), which may represent a data structure in a form that can be immediately utilized in an intelligent system.
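
The three parts of the structured data can be pictured as three views derived from the same table. The sketch below is an assumption-laden illustration: the key names `structure`, `hierarchy`, and `pairs` loosely stand for elements (251), (252), and (253), the input grid uses an empty string for cells covered by a merged cell, and none of this layout is prescribed by the specification.

```python
def build_structured_data(grid):
    """Derive structure, hierarchy, and item-value views from a 3-column grid."""
    header, body = grid[0], grid[1:]
    # Physical layout turned into a logical description (cf. 251).
    structure = {"columns": header, "n_rows": len(grid), "n_cols": len(header)}
    hierarchy, pairs, parent = {}, [], None
    for category, item, value in body:
        parent = category or parent                      # inherit merged label
        hierarchy.setdefault(parent, []).append(item)    # parent -> children (cf. 252)
        pairs.append({"item": item, "value": value})     # item-value pairs (cf. 253)
    return {"structure": structure, "hierarchy": hierarchy, "pairs": pairs}

grid = [
    ["Classification", "Item", "Specification"],
    ["Display", "Size", "6.7 inches"],
    ["", "Resolution", "3200x1800"],
]
structured = build_structured_data(grid)
```

Keeping the three views separate lets a consumer query the layout, the parent-child relationships, or the item-value pairs independently.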

[0087] For example, data in a typical product specification sheet can be structured into a hierarchical JSON structure:

[0088] Structured data structures clearly express the meaning and relationships of data, enabling various processing such as search, analysis, and inference in intelligent systems.

[0089] FIG. 5 is a block diagram showing the configuration of a computing device according to an embodiment of the present invention. FIG. 5 may show hardware components for table data processing and the relationships between them.

[0090] Referring to FIG. 5, the computing device (300) may include a processor (310), memory (320), and a plurality of interfaces (330, 340, 350). An artificial intelligence model (321) and data for structural analysis (322) may be stored in the memory (320).

[0091] The processor (310) can perform table data processing by executing instructions stored in memory (320). For example, it can perform functions such as identifying the coordinates of rows and columns of a table, verifying cell merging information based on the identified coordinates, and analyzing the hierarchical structure between text elements based on the cell merging information and coordinate information.

[0092] A processor may include a CPU (Central Processing Unit), GPU (Graphics Processing Unit), NPU (Neural Processing Unit), etc., and these can be responsible for processing according to their respective characteristics.

[0093] For example, the processor (310) may include a central processing unit (CPU), a graphics processing unit (GPU), and a neural network processing unit (NPU). The CPU is responsible for overall computation and control, the GPU performs tasks requiring parallel processing such as matrix operations, and the NPU can be dedicated to the inference tasks of the artificial intelligence model.

[0094] For example, the CPU can perform basic parsing and preprocessing of table data, the GPU can perform structural analysis requiring large-scale matrix operations, and the NPU can perform pattern recognition and inference through trained models.

[0095] For example, memory (320) can be composed of volatile memory such as RAM (Random Access Memory) and non-volatile memory such as SSD (Solid State Drive) or HDD (Hard Disk Drive).

[0096] The memory (320) may include volatile memory and non-volatile memory. Volatile memory serves as a temporary storage so that the processor can process data at high speed, and non-volatile memory can permanently store data for artificial intelligence models and structural analysis.

[0097] The artificial intelligence model (321) may be a trained model used to analyze the structure of the table and process data. The artificial intelligence model (321) may include deep learning models such as a convolutional neural network (CNN) or a Transformer.

[0098] CNNs are used to recognize visual patterns in tables, while Transformers can be utilized to analyze relationships between text elements. Since these models are pre-trained with large amounts of tabular data, they can effectively analyze new table structures as well.

[0099] The structural analysis data (322) is reference data for analyzing various types of table structures and may include information about the general patterns and structural features of the table.

[0100] The structural analysis data (322) includes metadata that defines the characteristics of table structures. For example, the metadata can serve as a reference standard when analyzing a table structure.

[0101] The input interface (330) may include various input devices such as a USB port, a network port, and a scanner. Through these, documents of various formats such as PDF, image, and text files can be received, and preprocessing appropriate to each format can be performed automatically.

[0102] The output interface (340) may include a display port, a printer port, etc. The processed data may be output in a visualized form that can be viewed by the user, or in a data format that can be utilized in other systems.

[0103] The communication interface (350) can support various communication protocols such as Ethernet, Wi-Fi, and Bluetooth. It can exchange data with external systems in real time through communication methods such as REST API and WebSocket, and can also support encryption protocols for security.

[0104] The input interface (330), output interface (340), and communication interface (350) can exchange data with the outside. The input interface (330) can receive documents of various formats, and the output interface (340) can perform the role of outputting processed results. The communication interface (350) can exchange data with an external system through a network.

[0105] For example, when a product specification sheet in PDF format is input through the input interface (330), the processor (310) can analyze the table structure using the artificial intelligence model (321) and the structural analysis data (322), convert the data into a structured format, and then transmit it to an external intelligent system through the communication interface (350).

[0106] This hardware configuration enables the efficient processing of table data, and while each component operates independently, they are organically connected to facilitate the efficient operation of the entire system. In particular, the close interaction between the processor and memory allows for the rapid and accurate analysis of complex table structures and data transformation tasks.

[0107] FIG. 6 is a diagram illustrating a specific example of table data processing according to an embodiment of the present invention. FIG. 6 shows how actual table-shaped data is converted into structured text.

[0108] Referring to FIG. 6, the input data on the left may include table data in the form of a product specification table (210), and the right side may represent the result of processing this and converting it into structured text (250).

[0109] The product specification table, which is the input data (210), consists of three columns: 'Classification', 'Item', and 'Specification'. The 'Classification' column includes major categories such as 'Display' and 'Performance', and each major category may be displayed in a merged form spanning multiple rows. The 'Item' column lists detailed items belonging to each major category, and the 'Specification' column may contain specific specification values for each item.

[0110] For example, under the 'Display' category, there are 'Size' and 'Resolution' items, which can have specification values of '6.7 inches' and '3200x1800', respectively. Similarly, under the 'Performance' category, there are 'CPU' and 'RAM' items, which can have specification values of 'Latest CPU' and '12GB', respectively.

[0111] This table-shaped data can be converted, through the processing of the present invention, into structured text (250) that preserves the hierarchical structure, as shown on the right. In the converted text, major category items are distinguished by numbers such as '1.', '2.', etc., and each detailed item can be expressed in an indented form starting with a hyphen (-). Additionally, items and specification values can be displayed separated by a colon (:).

[0112] For example, "-Size: 6.7 inches" and "-Resolution: 3200x1800" may be indented and displayed under "1. Display", and "-CPU: Latest CPU" and "-RAM: 12GB" may be indented and displayed under "2. Performance".
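As a non-limiting illustration, the conversion into numbered, indented text described above might be sketched as follows. The function name `to_structured_text`, the two-space indent, and the assumption that merged 'Classification' cells have already been expanded onto every row are introduced only for this example.

```python
def to_structured_text(rows, indent="  "):
    """Render (classification, item, specification) rows as numbered text.

    Assumes each row already carries its major category, i.e. merged
    'Classification' cells were expanded beforehand.
    """
    groups = {}  # dict keeps insertion order, so categories stay in table order
    for category, item, spec in rows:
        groups.setdefault(category, []).append((item, spec))

    lines = []
    for number, (category, entries) in enumerate(groups.items(), start=1):
        lines.append(f"{number}. {category}")
        for item, spec in entries:
            lines.append(f"{indent}-{item}: {spec}")
    return "\n".join(lines)


rows = [
    ("Display", "Size", "6.7 inches"),
    ("Display", "Resolution", "3200x1800"),
    ("Performance", "CPU", "Latest CPU"),
    ("Performance", "RAM", "12GB"),
]
print(to_structured_text(rows))
# 1. Display
#   -Size: 6.7 inches
#   -Resolution: 3200x1800
# 2. Performance
#   -CPU: Latest CPU
#   -RAM: 12GB
```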

[0113] Through this transformation, the hierarchical structure of tables and the relationships between data are clearly expressed, which can facilitate data processing and utilization in intelligent systems. In particular, this structured text format can be described as a form that is easy for machines to process while preserving the semantic structure of the data.

[0114] The example in FIG. 6 may represent text data at a stage before it is converted into a structured format. The text data converted into a structured format may then be converted into a language that a computing device can understand.

[0115] When the data is converted into a language that a computing device can understand, the table data of the input data is assigned coordinates, the coordinates are matched with the text, and the structured text is stored together with its matched coordinates. This makes it possible both to store the same structure as the input document and to store the document more concisely as structured text.
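As a non-limiting illustration, storing text together with its matched coordinates might be sketched as follows. JSON is used here only as one example of a machine-readable representation, and the record fields and function names are assumptions introduced for this example.

```python
import json


def encode_table(cells):
    """Serialize (row, col, rowspan, colspan, text) tuples to a JSON string."""
    records = [
        {"row": r, "col": c, "rowspan": rs, "colspan": cs, "text": t}
        for r, c, rs, cs, t in cells
    ]
    return json.dumps({"cells": records}, ensure_ascii=False)


def decode_grid(payload):
    """Rebuild the full grid, repeating merged-cell text over its span."""
    records = json.loads(payload)["cells"]
    n_rows = max(rec["row"] + rec["rowspan"] for rec in records)
    n_cols = max(rec["col"] + rec["colspan"] for rec in records)
    grid = [[None] * n_cols for _ in range(n_rows)]
    for rec in records:
        for r in range(rec["row"], rec["row"] + rec["rowspan"]):
            for c in range(rec["col"], rec["col"] + rec["colspan"]):
                grid[r][c] = rec["text"]
    return grid


cells = [
    (0, 0, 2, 1, "Display"),
    (0, 1, 1, 1, "Size"), (0, 2, 1, 1, "6.7 inches"),
    (1, 1, 1, 1, "Resolution"), (1, 2, 1, 1, "3200x1800"),
]
payload = encode_table(cells)
print(decode_grid(payload))
```

Because each record keeps its coordinates and span, the original row-column layout can be rebuilt from the stored data, while the stored form itself lists each merged cell only once.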

[0116] Meanwhile, the disclosed embodiments may be implemented in the form of a recording medium that stores instructions executable by a computer. The instructions may be stored in the form of program code and, when executed by a processor, may generate a program module to perform the operation of the disclosed embodiments. The recording medium may be implemented as a computer-readable recording medium.

[0117] Computer-readable recording media include all types of recording media that store instructions that can be decoded by a computer. Examples include ROM (read-only memory), RAM (random access memory), magnetic tape, magnetic disk, flash memory, optical data storage devices, etc.

[0118] Additionally, computer-readable recording media may be provided in the form of non-transitory storage media. Here, 'non-transitory storage media' simply means that it is a tangible device and does not contain a signal (e.g., electromagnetic waves), and this term does not distinguish between cases where data is stored semi-permanently and cases where it is stored temporarily. For example, 'non-transitory storage media' may include a buffer in which data is stored temporarily.

[0119] According to one embodiment, the method according to the various embodiments disclosed herein may be provided as included in a computer program product. The computer program product may be traded between a seller and a buyer as a product. The computer program product may be distributed in the form of a device-readable recording medium (e.g., compact disc read-only memory (CD-ROM)), or distributed online (e.g., download or upload) through an application store (e.g., Play Store™) or directly between two user devices (e.g., smartphones). In the case of online distribution, at least a portion of the computer program product (e.g., a downloadable app) may be temporarily stored or temporarily created on a device-readable recording medium, such as the memory of a manufacturer's server, an application store's server, or a relay server.

[0120] As described above, the disclosed embodiments have been explained with reference to the attached drawings. Those skilled in the art will understand that the present invention may be practiced in forms different from the disclosed embodiments without changing the technical spirit or essential features of the invention. The disclosed embodiments are illustrative and should not be interpreted restrictively.

Claims

1. A method of processing table data within a document, the method comprising: detecting a table area within the document by inputting the document into an artificial intelligence model; analyzing the matrix structure of the detected table; identifying hierarchical relationships of text elements within the table based on the analyzed matrix structure; mapping super-concept text and sub-concept text according to the identified hierarchical relationships; converting the mapped text data into a structured format; and providing the converted structured data to an intelligent system.

2. The method of claim 1, wherein analyzing the matrix structure comprises: identifying row and column coordinates of the table; verifying cell merging information based on the identified coordinates; and analyzing the hierarchical structure between text elements based on the cell merging information and the coordinate information.

3. The method of claim 1, wherein detecting the table area within the document by inputting the document into the artificial intelligence model comprises: receiving electronic documents of various formats; converting an input document into a standardized format; identifying boundaries of a table area within the converted document; and recognizing a cell structure within the identified table area.

4. The method of claim 1, wherein identifying the hierarchical relationships comprises: determining the hierarchical relationship between text elements by analyzing positional relationships on matrix coordinates; identifying text groups of the same layer based on cell merging information; and determining dependency relationships between the identified text groups.

5. The method of claim 1, wherein converting into the structured format comprises: converting the identified hierarchical relationships into a tree structure; structuring the mapping relationship between the super-concept text and the sub-concept text; and converting to a data format that includes the structured mapping relationship as metadata.

6. A computer program stored on a computer-readable storage medium, wherein the computer program, when executed on one or more processors of a computing device, performs steps of processing table data within a document, the steps comprising: detecting a table area within the document; analyzing the matrix structure of the detected table; identifying hierarchical relationships of text elements within the table based on the analyzed matrix structure; mapping super-concept text and sub-concept text according to the identified hierarchical relationships; converting the mapped text data into a structured format; and providing the converted structured data to an intelligent system.

7. The computer program of claim 6, wherein analyzing the matrix structure comprises: identifying row and column coordinates of the table; verifying cell merging information based on the identified coordinates; and analyzing the hierarchical structure between text elements based on the cell merging information and the coordinate information.

8. The computer program of claim 6, wherein detecting the table area within the document comprises: receiving electronic documents of various formats; converting an input document into a standardized format; identifying boundaries of a table area within the converted document; and recognizing a cell structure within the identified table area.

9. The computer program of claim 6, wherein identifying the hierarchical relationships comprises: determining the hierarchical relationship between text elements by analyzing positional relationships on matrix coordinates; identifying text groups of the same layer based on cell merging information; and determining dependency relationships between the identified text groups.

10. The computer program of claim 6, wherein converting into the structured format comprises: converting the identified hierarchical relationships into a tree structure; structuring the mapping relationship between the super-concept text and the sub-concept text; and converting the structured mapping relationship into a data format that includes metadata.

11. A storage medium storing at least one instruction, wherein the at least one instruction, when executed by a processor, causes the processor to perform: detecting a table area within a document; analyzing the matrix structure of the detected table; identifying hierarchical relationships of text elements within the table based on the analyzed matrix structure; mapping super-concept text and sub-concept text according to the identified hierarchical relationships; converting the mapped text data into a structured format; and providing the converted structured data to an intelligent system.