Apparatus and method for searching for table and document according to question
The table and document search device improves search accuracy by identifying semantic relationships and refining graph structures to align results with user queries, addressing the limitations of existing search engines in handling table-text interactions.
Patent Information
- Authority / Receiving Office: WO
- Patent Type: Applications
- Current Assignee / Owner: POSTECH ACADEMY INDUSTRY FOUNDATION
- Filing Date: 2025-03-07
- Publication Date: 2026-06-11
Description
Table and document search device and search method based on questions
[0001] The present invention relates to a table and document search device and a search method based on a question, and more specifically, to a table and document search device and a search method based on a question for searching and providing search results highly relevant to the question.
[0002]
[0003] The content described in this section merely provides background information regarding the present embodiment and does not constitute prior art.
[0004] Recently, the use of open question-and-answer systems replacing search engines has been increasing. Users of traditional search engines had to browse through multiple documents and compare various results to find the desired information after entering a question. In particular, identifying interrelationships between data of different formats, such as tables and text, required time and cost.
[0005] To address these issues, early fusion and late fusion can be used, but both approaches are limited in search accuracy. In early fusion, pairs of table segments and documents are formed before a question is given, so tables and / or documents with low relevance to the question may be included in the search results, which can lower the precision and accuracy of the search. In late fusion, a single document may contain only part of the information needed to determine relevance to the question, so relevance is difficult to judge accurately from a single document alone, which likewise lowers search accuracy.
[0006] To address these issues, there was a need for technology capable of effectively identifying semantic relationships between tables and text, and rapidly searching for and providing data related to the questions.
[0007]
[0008] The objective of the present invention is to provide a table and document search device and a search method based on a question that can provide search results by reflecting the semantic relationship between the table and the text.
[0009] The objects of the present invention are not limited to those mentioned above, and other unmentioned objects and advantages of the present invention may be understood from the following description and will be more clearly understood by the embodiments of the present invention. Furthermore, it will be readily apparent that the objects and advantages of the present invention can be realized by the means and combinations thereof set forth in the claims.
[0010]
[0011] A table and document search device according to a question in accordance with an embodiment of the present invention comprises a processor and a memory operatively connected to the processor, wherein the memory stores instructions that, when executed, cause the processor to: identify a table segment divided by rows; identify an initial graph comprising a plurality of first table segment document pairs in which the table segment and a document associated with the table segment are paired; identify a first similarity score between the question and each of the plurality of first table segment document pairs; identify a subgraph that is part of the initial graph based on the first similarity score; identify, based on the subgraph, at least one of a new table segment and a new document that is different from each of the table segment of the subgraph and the document of the subgraph; identify at least one second table segment document pair that is at least part of the new table segment and the new document; identify a modified graph by adding the at least one second table segment document pair to the subgraph; identify an extended graph in which the modified graph is modified based on the modified graph; and identify an extended result graph based on the relationship between the extended graph and the question.
[0012] In addition, the above instructions allow the processor to identify the initial graph formed in an early fusion manner.
[0013] Additionally, the above instructions cause the processor to input the question and the initial graph into an external network and receive the first similarity score from the external network.
[0014] Additionally, the above instructions cause the processor to identify the subgraph by identifying the top k pairs (where k is a natural number) with the highest first similarity scores among the plurality of first table segment document pairs.
[0015] Additionally, the above instructions allow the processor to identify each of the table segments of the subgraph and the documents of the subgraph as nodes, identify the node similarity score between the question and the nodes, and identify the top k nodes (where k is a natural number) with the highest node similarity scores among the nodes to identify a selected node group.
[0016] Additionally, the instructions allow the processor to identify a search result in the initial graph that is simultaneously associated with the selected node group and the question, and is of a different type from the selected node group, and to identify a second similarity score between the question and the search result, and to identify the top k search results with the highest second similarity scores to identify an additional search group, wherein the additional search group includes at least one of the new table segment and the new document.
[0017] Additionally, the instructions cause the processor to identify a third similarity score for each of the new table segment and the new document, and to identify at least one second table segment document pair based on the third similarity score.
[0018] Additionally, the instructions allow the processor to identify the third similarity score based on the node similarity score and the second similarity score, and to identify the top k additional search groups with high third similarity scores to identify the at least one second table segment document pair.
[0019] Additionally, the modified graph comprises a table segment of the subgraph, a document of the subgraph, a second table segment of the at least one second table segment document pair, and a second document of the at least one second table segment document pair, wherein the table segment of the subgraph and the second table segment are modification table segments, and the document of the subgraph and the second document are modification documents, and the instructions cause the processor to: input the question into a large language model; receive from the large language model whether restoration of the modification table segment is required; based on restoration being required, restore the modification table segment to the table; identify an additional table segment related to the question in the table; search for the additional table segment in either the modified graph or the initial graph to identify an additional document related to the additional table segment; separate the modified graph around the modification table segment to identify a separated graph; and identify the extended graph based on the separated graph, the additional table segment, and the additional document.
[0020] Additionally, the instructions allow the processor to identify the extended result graph by removing table segment document pairs unrelated to the question from the extended graph.
[0021] Additionally, the above instructions allow the processor to identify the extended graph by removing duplicate table segment document pairs from the separated graph when the restoration is unnecessary.
[0022] Additionally, the instructions allow the processor to identify the extended result graph by removing table segment document pairs unrelated to the question from the extended graph.
[0023] Additionally, the instructions cause the processor to identify a final similarity score between the question and each result table segment document pair included in the extended result graph, and to identify a final graph in which the result table segment document pairs are ordered based on the final similarity score.
[0024] Additionally, the above instructions cause the processor to provide the final graph as a response to the question.
[0025] A method for searching a table and documents based on a question according to an embodiment of the present invention comprises: a step of identifying a table segment divided by rows; a step of identifying an initial graph including a plurality of first table segment document pairs formed by the table segment and a document related to the table segment; a step of identifying a first similarity score between the question and each of the plurality of first table segment document pairs; a step of identifying a subgraph that is part of the initial graph based on the first similarity score; a step of identifying, based on the subgraph, at least one of a new table segment and a new document that is different from each of the table segment of the subgraph and the document of the subgraph; a step of identifying at least one second table segment document pair that is at least part of the new table segment and the new document; a step of identifying a modified graph by adding the at least one second table segment document pair to the subgraph; a step of identifying an extended graph in which the modified graph is modified based on the modified graph; and a step of identifying an extended result graph based on the relationship between the extended graph and the question.
[0026] Additionally, the step of identifying at least one of the new table segment and the new document comprises: identifying each of the table segment of the subgraph and the document of the subgraph as a node; identifying a node similarity score between the question and the node; and identifying the top k nodes (where k is a natural number) with high node similarity scores among the nodes to identify a selected node group.
[0027] Additionally, the step of identifying at least one of the new table segment and the new document comprises: identifying a search result in the initial graph that is simultaneously related to the selected node group and the question and is of a different type from the selected node group; identifying a second similarity score between the question and the search result; and identifying the top k search results with high second similarity scores to identify an additional search group, wherein the additional search group includes at least one of the new table segment and the new document.
[0028] Additionally, the modified graph comprises a table segment of the subgraph, a document of the subgraph, a second table segment of the at least one second table segment document pair, and a second document of the at least one second table segment document pair, wherein the table segment of the subgraph and the second table segment are modification table segments, and the document of the subgraph and the second document are modification documents, and the step of identifying the extended graph comprises: the step of inputting the question into a large language model; the step of receiving from the large language model whether restoration of the modification table segment is required; the step of restoring the modification table segment to a table based on restoration being required; the step of identifying an additional table segment related to the question in the table; the step of searching for the additional table segment in either the modified graph or the initial graph to identify an additional document related to the additional table segment; the step of separating the modified graph around the modification table segment to identify a separated graph; and the step of identifying the extended graph based on the separated graph, the additional table segment, and the additional document.
[0029] Additionally, the step of identifying the extended graph includes removing duplicate table segment document pairs from the separated graph based on the fact that the restoration is unnecessary, thereby identifying the extended graph.
[0030] Additionally, the method further includes the steps of identifying a final similarity score between the question and each result table segment document pair included in the extended result graph, identifying a final graph in which the result table segment document pairs are ordered based on the final similarity score, and providing the final graph as a response to the question.
[0031]
[0032] The table and document search device and search method according to the question of the present invention consider the relevance between the table and the question and the relevance between the document and the question, and by expanding the search results, enable the user to efficiently search for structured data and unstructured data, and enable data highly relevant to the question to be searched quickly.
[0033] In addition, the table and document search device and search method according to the question of the present invention can provide search results highly relevant to the question by considering the relevance between the table and the question and the relevance between the document and the question, and by expanding the search results.
[0034] In addition to the above, the specific effects of the present invention are described together with the specific details for implementing the invention below.
[0035]
[0036] FIG. 1 is a drawing for explaining a table and document search device according to a question according to an embodiment of the present invention.
[0037] FIG. 2 is a flowchart for explaining the operation of the processor of FIG. 1.
[0038] FIG. 3 is a diagram illustrating step S100 of FIG. 2.
[0039] FIG. 4 is a diagram illustrating step S100 of FIG. 2.
[0040] FIG. 5 is a diagram illustrating step S200 of FIG. 2.
[0041] FIG. 6 is a diagram illustrating step S300 of FIG. 2.
[0042] FIG. 7 is a diagram illustrating step S400 of FIG. 2.
[0043] FIG. 8 is a diagram illustrating steps S401, S403, and S405 of FIG. 7.
[0044] FIG. 9 is a diagram illustrating step S411 of FIG. 7.
[0045] FIG. 10 is a diagram illustrating step S500 of FIG. 2.
[0046] FIG. 11 is a diagram illustrating step S600 of FIG. 2.
[0047] FIG. 12 is a diagram illustrating steps S700 and S800 of FIG. 2.
[0048] FIG. 13 is a drawing for explaining steps S703 and S705 of FIG. 12.
[0049] FIG. 14 is a diagram illustrating step S707 of FIG. 12.
[0050] FIG. 15 is a diagram illustrating step S709 of FIG. 12.
[0051] FIG. 16 is a diagram illustrating step S711 of FIG. 12.
[0052] FIG. 17 is a diagram illustrating step S713 of FIG. 12.
[0053] FIG. 18 is a diagram illustrating steps S900 and S1000 of FIG. 2.
[0054] FIG. 19 is a flowchart illustrating a table and document search method according to a question according to an embodiment of the present invention.
[0055]
[0056] Terms and words used in this specification and claims shall not be interpreted as being limited to their general or dictionary meanings. In accordance with the principle that an inventor may define the concept of a term or word to best describe their invention, they shall be interpreted in a meaning and concept consistent with the technical spirit of the invention. Furthermore, since the embodiments described in this specification and the configurations illustrated in the drawings are merely one embodiment of the invention and do not represent the entire technical spirit of the invention, it should be understood that various equivalents, modifications, and applicable examples capable of replacing them may exist at the time of filing this application.
[0057] The terms first, second, A, B, etc., as used in this specification and claims may be used to describe various components, but said components should not be limited by said terms. These terms are used solely for the purpose of distinguishing one component from another. For example, without departing from the scope of the present invention, the first component may be named the second component, and similarly, the second component may be named the first component. The term "and / or" includes a combination of a plurality of related described items or any of a plurality of related described items.
[0058] The terms used in this specification and claims are used merely to describe specific embodiments and are not intended to limit the invention. The singular expression includes the plural expression unless the context clearly indicates otherwise. In this application, terms such as "comprising" or "having" should be understood as not precluding the existence or addition of the features, numbers, steps, actions, components, parts, or combinations thereof described in the specification.
[0059] Unless otherwise defined, all terms used herein, including technical or scientific terms, have the same meaning as generally understood by those skilled in the art to which this invention pertains.
[0060] Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the relevant technology, and should not be interpreted in an ideal or overly formal sense unless explicitly defined in this application. Furthermore, each component, process, procedure, or method included in each embodiment of the present invention may be shared within a scope that is not technically contradictory.
[0061]
[0062] Hereinafter, a table and document search device according to a question according to an embodiment of the present invention will be described with reference to FIGS. 1 to 18.
[0063] FIG. 1 is a drawing for explaining a table and document search device according to a question according to an embodiment of the present invention.
[0064] Referring to FIG. 1, a table and document search device (hereinafter, search device) (100) according to a question may include a processor (110) and a memory (120).
[0065] One or more other components (e.g., a communication module) may be added to the search device (100). In some embodiments, some of these components may be implemented as a single integrated circuit.
[0066] The search device (100) can receive a question from an external source (e.g., a user) and provide a search result (e.g., a final graph) that is highly relevant to the question. The search result may include table segments and / or documents.
[0067]
[0068] The memory (120) can store various data used by at least one component (e.g., processor (110)) of the search device (100). The data may include, for example, software (e.g., a program) and input data or output data for related instructions. The memory (120) may include volatile memory or non-volatile memory.
[0069] The memory (120) may store instructions, information, or data associated with the operation of components included in the search device (100). For example, the memory (120) may store instructions that enable the processor (110) to perform various operations described in this document during execution.
[0070]
[0071] The processor (110) may be operatively coupled with the memory (120) to perform the overall function of the search device (100). The processor (110) may include, for example, one or more processors. One or more processors may include, for example, an image signal processor (ISP), an application processor (AP), or a communication processor (CP).
[0072] The processor (110) can, for example, execute software (e.g., a program) to control at least one other component (e.g., a hardware or software component) of the search device (100) connected to the processor (110) and perform various data processing or operations. According to one embodiment, as at least part of the data processing or operations, the processor (110) can load commands or data received from another component (e.g., a communication module) into memory (120), process the commands or data stored in memory (120), and store result data in memory (120). According to one embodiment, the processor (110) may include a main processor (e.g., a central processing unit or an application processor) and an auxiliary processor (e.g., a graphics processing unit, an image signal processor, a sensor hub processor, or a communication processor) that can operate independently or together with it. Additionally or alternatively, the auxiliary processor may be configured to use lower power than the main processor or to be specialized for a specified function. The auxiliary processor may be implemented separately from the main processor or as part thereof. The program may be stored as software in memory (120) and may include, for example, an operating system, middleware, or application.
[0073] The operation of the processor (110) is described below.
[0074]
[0075] FIG. 2 is a flowchart for explaining the operation of the processor of FIG. 1.
[0076] Referring to FIGS. 1 and 2, the processor (110) can identify an initial graph containing a plurality of first table segment document pairs (S100).
[0077] FIG. 3 is a diagram illustrating step S100 of FIG. 2.
[0078] Referring to FIG. 3, to identify the initial graph, the processor (110) may first divide a table (e.g., an original table) (OT) into table segments (TS). The table segments (TS) may be formed by dividing the table (OT) based on its rows, and the processor (110) may identify the table segments (TS) formed in this way.
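The row-based division described above can be sketched as follows. This is a minimal illustration only, not the implementation of the invention: the `TableSegment` type, the `split_table_by_rows` function, the choice of one row per segment, and the choice to carry the header along with each segment are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableSegment:
    """One table segment (TS) cut from an original table (OT). Hypothetical type."""
    table_id: str
    row_index: int
    header: tuple
    cells: tuple


def split_table_by_rows(table_id, header, rows):
    """Divide an original table into one segment per row (an assumed granularity)."""
    return [
        TableSegment(table_id, i, tuple(header), tuple(row))
        for i, row in enumerate(rows)
    ]


# Toy table, invented for illustration.
segments = split_table_by_rows(
    "ot-1",
    ["Player", "Team"],
    [["Kim", "Lions"], ["Lee", "Bears"]],
)
```

Keeping the header with every segment is one way to keep a single row interpretable on its own; the source does not state whether segments retain header information.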
[0079] FIG. 4 is a diagram illustrating step S100 of FIG. 2.
[0080] Referring to FIG. 4, in (a), the processor (110) can identify a data pool graph (Gint) containing at least one table segment (TS) and at least one document (TX). The document (TX) contains text and may be a different type of data from the table segment (TS).
[0081] In (b), the processor (110) can identify pairs by connecting at least one table segment (TS) and at least one document (TX) included in the data pool graph (Gint) based on their mutual relationship. The processor (110) can identify an initial graph (Gd) containing multiple first table segment document pairs (TP1) in which the table segment (TS) and document (TX) are paired. The paired documents (TX) and table segments (TS) may be related to each other. One table segment (TS) may be paired with, for example, multiple documents (TX). One document (TX) may be paired with, for example, multiple table segments (TS). For example, the initial graph (Gd) of FIG. 4 (b) may contain a total of nine first table segment document pairs (TP1).
[0082] In some embodiments, the initial graph (Gd) may be formed in an early fusion manner.
[0083] The processor (110) can identify the initial graph (Gd) based on the data pool graph (Gint). Alternatively, the processor (110) can identify the initial graph (Gd) by receiving the initial graph (Gd) from an external source.
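As a rough sketch of the pairing in FIG. 4 (b), the initial graph can be modeled as a set of (table segment, document) edges with lookups in both directions, which reflects that one segment may be paired with several documents and vice versa. The function name `build_initial_graph` and the assumption that the relations arrive as ready-made identifier pairs are hypothetical; the source does not specify how the mutual relationship is determined.

```python
from collections import defaultdict


def build_initial_graph(links):
    """Build first table segment document pairs (TP1) from given relations.

    `links` is an iterable of (segment_id, document_id) tuples; how these
    relations are found is outside this sketch.
    """
    pairs = sorted(set(links))        # one entry per distinct pair
    docs_of = defaultdict(set)        # segment -> paired documents
    segs_of = defaultdict(set)        # document -> paired segments
    for seg, doc in pairs:
        docs_of[seg].add(doc)
        segs_of[doc].add(seg)
    return pairs, docs_of, segs_of


# Toy relations, invented for illustration.
links = [("ts1", "d1"), ("ts1", "d2"), ("ts2", "d2"), ("ts1", "d1")]
pairs, docs_of, segs_of = build_initial_graph(links)
```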
[0084]
[0085] Referring again to FIGS. 1 and 2, the processor (110) can identify a first similarity score (S200). The processor (110) can identify the first similarity score between the question and each of the plurality of first table segment document pairs.
[0086] The question may be data received from an external source (e.g., a user). The processor (110) may identify the first similarity score to identify the relationship between the question and the plurality of first table segment document pairs (TP1) included in the initial graph (Gd in FIG. 4). The first similarity score may include a plurality of similarity scores, one identified for each of the plurality of first table segment document pairs (TP1) with respect to the question.
[0087] For example, the processor (110) can identify a similarity score representing the relevance between one table segment document pair included in the initial graph (Gd in FIG. 4) and the question, and likewise a similarity score between another table segment document pair included in the initial graph (Gd in FIG. 4) and the question.
[0088] FIG. 5 is a diagram illustrating step S200 of FIG. 2.
[0089] Referring to FIG. 5, in some embodiments, the search device (100) may communicate with a network (200). The network (200) may be external to the search device (100) and may be, for example, a neural network. The processor (110) may input the question and the initial graph (Gd in FIG. 4) into the external network (200) and receive the first similarity score from the network (200).
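To make the shape of the first similarity score concrete, the sketch below computes one score per pair. The source obtains these scores from an external network (200), for example a neural network; the word-overlap (Jaccard) scorer used here is only a stand-in chosen so that the example runs without a model, and `jaccard_score`, `first_similarity_scores`, and the `pair_texts` mapping are hypothetical names.

```python
def jaccard_score(question, text):
    """Stand-in scorer: word-set overlap. NOT the external network of the source."""
    q = set(question.lower().split())
    t = set(text.lower().split())
    if not q or not t:
        return 0.0
    return len(q & t) / len(q | t)


def first_similarity_scores(question, pair_texts, score_fn=jaccard_score):
    """Return one similarity score per first table segment document pair.

    `pair_texts` maps a (segment_id, document_id) pair to some text rendering
    of that pair (an assumption of this sketch).
    """
    return {pair: score_fn(question, text) for pair, text in pair_texts.items()}


# Toy data, invented for illustration.
question = "who plays for lions"
pair_texts = {
    ("ts1", "d1"): "kim plays for lions",
    ("ts2", "d2"): "lee joined bears",
}
scores = first_similarity_scores(question, pair_texts)
```

Passing the scorer as `score_fn` keeps the rest of the pipeline independent of how the score is produced, so a call to an external model could be substituted.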
[0090]
[0091] Referring again to FIGS. 1 and 2, the processor (110) can identify a subgraph based on the first similarity score (S300). The subgraph may be part of the initial graph (Gd in FIG. 4).
[0092] FIG. 6 is a diagram illustrating step S300 of FIG. 2.
[0093] Referring to FIGS. 4 and 6, in (c), the processor (110) can identify a subgraph (Gc) that is part of the initial graph (Gd). The subgraph (Gc) may include at least one table segment (TS_S) of the subgraph and at least one document (TX_S) of the subgraph. The subgraph (Gc) may include at least one pair (TP1) of a table segment (TS_S) of the subgraph and a document (TX_S) of the subgraph.
[0094] In some embodiments, the processor (110) can identify the subgraph (Gc) by identifying the top k pairs (where k is a natural number) with the highest first similarity scores among the plurality of first table segment document pairs (TP1). For example, among the nine first table segment document pairs (TP1) of the initial graph (Gd), five first table segment document pairs (TP1) may be included in the subgraph (Gc). In doing so, the processor (110) can identify the subgraph (Gc) after removing any duplicate pairs among the identified first table segment document pairs (TP1).
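The top-k selection with duplicate removal might look like the sketch below. The function name `select_subgraph` and the dictionary of precomputed scores are assumptions; the scores themselves would come from step S200.

```python
import heapq


def select_subgraph(pairs, first_scores, k):
    """Keep the k pairs with the highest first similarity scores.

    Duplicate pairs are dropped first (preserving first occurrence), then
    heapq.nlargest returns the k best pairs in descending score order.
    """
    unique_pairs = list(dict.fromkeys(pairs))
    return heapq.nlargest(k, unique_pairs, key=lambda pair: first_scores[pair])


# Toy pairs and scores, invented for illustration.
pairs = [("ts1", "d1"), ("ts1", "d2"), ("ts2", "d2"), ("ts1", "d1")]
first_scores = {("ts1", "d1"): 0.9, ("ts1", "d2"): 0.2, ("ts2", "d2"): 0.7}
subgraph = select_subgraph(pairs, first_scores, k=2)
```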
[0095]
[0096] Referring again to FIGS. 1 and 2, the processor (110) can identify at least one of a new table segment and a new document based on the subgraph (S400). The new table segment and the new document are different from the table segment of the subgraph and the document of the subgraph, respectively.
[0097] FIG. 7 is a diagram illustrating step S400 of FIG. 2.
[0098] Referring to FIG. 7, to identify at least one of the new table segment and the new document, the processor (110) can first identify each of the table segments of the subgraph and the documents of the subgraph as nodes (S401).
[0099] The processor (110) can identify a node similarity score between the question and a node (S403). The processor (110) can identify multiple node similarity scores, one between the question and each of the multiple nodes.
[0100] In some embodiments, the processor (110) can input the question and the subgraph into an external network (200 in FIG. 5) and receive the node similarity score from the external network.
[0101] The processor (110) can identify a selected node group (S405). The processor (110) can identify the top k nodes with the highest node similarity scores as the selected node group. The selected node group may include at least one of a table segment of the subgraph and a document of the subgraph.
[0102] FIG. 8 is a diagram illustrating steps S401, S403, and S405 of FIG. 7.
[0103] Referring to FIG. 8, in (d), the processor (110) can identify each of the table segments (TS_S) and documents (TX_S) included in the subgraph (Gc) as nodes. The processor (110) can identify the node similarity scores between the question and each of the multiple nodes. The processor (110) can identify the top k nodes with the highest node similarity scores as a selected node group (STS_S, STX_S).
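Steps S401 to S405 can be sketched as below. Representing a node as a `(kind, id)` tuple and the name `select_node_group` are assumptions for illustration, and the node similarity scores are taken as given rather than computed by a network.

```python
def select_node_group(subgraph_pairs, node_scores, k):
    """Treat every segment and document of the subgraph as a node (S401),
    rank the nodes by node similarity score (S403), and keep the top k (S405)."""
    nodes = []
    for seg, doc in subgraph_pairs:
        for node in (("table_segment", seg), ("document", doc)):
            if node not in nodes:      # a node may appear in several pairs
                nodes.append(node)
    ranked = sorted(nodes, key=lambda node: node_scores[node], reverse=True)
    return ranked[:k]


# Toy subgraph and scores, invented for illustration.
subgraph_pairs = [("ts1", "d1"), ("ts2", "d2")]
node_scores = {
    ("table_segment", "ts1"): 0.8,
    ("document", "d1"): 0.3,
    ("table_segment", "ts2"): 0.1,
    ("document", "d2"): 0.6,
}
selected = select_node_group(subgraph_pairs, node_scores, k=2)
```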
[0104]
[0105] Referring again to FIG. 7, the processor (110) can identify search results (S407). The processor (110) can search the initial graph (Gd) for related table segments and related documents that are related to both the selected node group and the question and that are of a different type from the selected node group, and identify them as the search results. The related table segments and related documents may not have been included in the subgraph (Gc).
[0106] Referring to FIG. 8, for example, if a selected table segment (STS_S) is included in the selected node group, the processor (110) may search the initial graph (Gd) for related documents, which are of a different type from the selected table segment (STS_S), and include them in the search results. The related documents may be relevant to both the selected table segment (STS_S) and the question. Additionally, for example, if a selected document (STX_S) is included in the selected node group, the processor (110) may search the initial graph (Gd) for related table segments, which are of a different type from the selected document (STX_S), and include them in the search results. The related table segments may be relevant to both the selected document (STX_S) and the question.
[0107] The search result may include at least one of a table segment and a document that is of a different type from the nodes (e.g., the selected table segment and the selected document) included in the selected node group (STS_S, STX_S).
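One possible reading of the opposite-type search in S407 is sketched below: for a selected table segment, collect documents paired with it in the initial graph, and for a selected document, collect paired table segments. Excluding items already in the subgraph follows the statement that the related items may not have been included in the subgraph. The function name `find_candidates` and the tuple layout are hypothetical, and relevance to the question is left to the second similarity score of the following step.

```python
def find_candidates(initial_pairs, subgraph_pairs, selected_nodes):
    """Return (selected node, opposite-type neighbor) candidates from the
    initial graph, skipping neighbors already present in the subgraph."""
    sub_segs = {seg for seg, _ in subgraph_pairs}
    sub_docs = {doc for _, doc in subgraph_pairs}
    candidates = []
    for kind, node_id in selected_nodes:
        for seg, doc in initial_pairs:
            if kind == "table_segment" and seg == node_id and doc not in sub_docs:
                candidates.append((("table_segment", seg), ("document", doc)))
            elif kind == "document" and doc == node_id and seg not in sub_segs:
                candidates.append((("document", doc), ("table_segment", seg)))
    return candidates


# Toy graphs, invented for illustration.
initial_pairs = [("ts1", "d1"), ("ts1", "d2"), ("ts2", "d2"), ("ts3", "d3")]
subgraph_pairs = [("ts1", "d1")]
selected_nodes = [("table_segment", "ts1"), ("document", "d1")]
candidates = find_candidates(initial_pairs, subgraph_pairs, selected_nodes)
```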
[0108]
[0109] Referring again to FIG. 7, the processor (110) can identify a second similarity score between the question and the search results (S409). The second similarity score may include multiple similarity scores, one between the question and each table segment or document included in the search results.
[0110] In some embodiments, the processor (110) can input the question and the search results into an external network (200 in FIG. 5) and receive the second similarity score from the external network.
[0111]
[0112] The processor (110) can identify additional search groups (S411) based on the second similarity score.
[0113] Figure 9 is a diagram illustrating step S411 of Figure 7.
[0114] Referring to FIG. 9, the processor (110) can identify additional search groups (TS_N, TX_N) by identifying the top k search results with high second similarity scores. The additional search groups (TS_N, TX_N) may include at least one of a new table segment (TS_N) and a new document (TX_N).
[0115] In FIG. 9(e), the additional search group (TS_N, TX_N) may include two new table segments (TS_N) and two new documents (TX_N). The new table segments (TS_N) may be obtained by searching the initial graph (Gd) for nodes of the non-document type (that is, table segments) that are relevant to both the selected documents (STX_S) and the question, and then selecting the k items with the highest second similarity scores. The new documents (TX_N) may be obtained by searching the initial graph (Gd) for nodes of the non-table-segment type (that is, documents) that are relevant to both the selected table segments (STS_S) and the question, and then selecting the k items with the highest second similarity scores.
[0116] For example, the processor (110) can search the initial graph (Gd) to identify multiple documents that are related to both the question and the selected table segment (STS_S) of the selected node group (STS_S, STX_S). The processor (110) can identify a second similarity score including similarity scores between each of these documents and the question. The processor (110) can identify the top k documents with the highest second similarity scores as the new documents (TX_N) included in the additional search group.
[0117] Similarly, the processor (110) can search the initial graph (Gd) to identify multiple table segments that are related to both the question and the selected document (STX_S) of the selected node group (STS_S, STX_S). The processor (110) can identify a second similarity score including similarity scores between each of these table segments and the question. The processor (110) can identify the top k table segments with the highest second similarity scores as the new table segments (TS_N) included in the additional search group.
[0118] Additional search groups (TS_N, TX_N) may consist of table segments and / or documents that were not included in the subgraph (Gc).
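The top-k selection of step S411 can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function name `additional_search_group`, the string node identifiers, and the choice to skip nodes already present in the subgraph (Gc) are assumptions made for the example.

```python
from typing import Dict, List, Set


def additional_search_group(
    second_scores: Dict[str, float],
    subgraph_nodes: Set[str],
    k: int,
) -> List[str]:
    """Return the k candidates with the highest second similarity scores.

    Candidates already contained in the subgraph are skipped (an assumption
    of this sketch), so only new table segments or documents are returned.
    """
    candidates = [
        (node, score)
        for node, score in second_scores.items()
        if node not in subgraph_nodes
    ]
    # Highest second similarity score first.
    candidates.sort(key=lambda item: item[1], reverse=True)
    return [node for node, _ in candidates[:k]]
```

For instance, with hypothetical scores for four candidate documents, one of which is already in the subgraph, the call returns the two best remaining documents in descending score order.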
[0119]
[0120] Referring again to FIGS. 1 and 2, the processor (110) can identify (S500) at least one second table segment document pair. The at least one second table segment document pair may be at least some of a new table segment and a new document.
[0121] FIG. 10 is a diagram illustrating step S500 of FIG. 2.
[0122] Referring to FIG. 10, in (f), the processor (110) can identify at least one second table segment document pair (TP2) which is part of the table segment document pair in (e) of FIG. 9.
[0123] The processor (110) can identify a third similarity score for each of the new table segment (TS_N) and the new document (TX_N) based on the node similarity score and the second similarity score. For example, the processor (110) can identify the third similarity score by multiplying the node similarity score by the second similarity score. For example, the processor (110) can identify the third similarity score for the pair of the selected table segment (STS_S) and the first new document (TX_N1) by multiplying the node similarity score of the selected table segment (STS_S) by the second similarity score between the question and the pair of the selected table segment (STS_S) and the first new document (TX_N1).
[0124] The processor (110) can identify at least one second table segment document pair (TP2) based on the third similarity score. Among the nodes of the additional search group (TS_N, TX_N), each paired with a node of the selected node group (STS_S, STX_S), the processor (110) can identify the top k with the highest third similarity scores as the at least one second table segment document pair (TP2).
[0125] In (f) of FIG. 10, for example, the two pairs (TP2) with the highest third similarity scores can be identified among the four table segment document pairs.
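The multiplication described for the third similarity score, followed by the top-k selection of the second table segment document pairs (TP2), can be sketched as below. The names `third_scores` and `top_k_pairs`, and the representation of a pair as a `(selected node, new node)` tuple, are hypothetical choices made for illustration; the score values in the usage are invented.

```python
from typing import Dict, List, Tuple

# (selected node, new node), e.g. ("STS_S", "TX_N1"); a hypothetical encoding.
Pair = Tuple[str, str]


def third_scores(
    node_scores: Dict[str, float],
    second_scores: Dict[Pair, float],
) -> Dict[Pair, float]:
    """Third score = node similarity of the selected node * second similarity."""
    return {
        pair: node_scores[pair[0]] * score
        for pair, score in second_scores.items()
    }


def top_k_pairs(scores: Dict[Pair, float], k: int) -> List[Pair]:
    """Return the k pairs with the highest third similarity scores."""
    return sorted(scores, key=lambda pair: scores[pair], reverse=True)[:k]


node_scores = {"STS_S": 0.8, "STX_S": 0.5}
second = {
    ("STS_S", "TX_N1"): 0.9,
    ("STS_S", "TX_N2"): 0.5,
    ("STX_S", "TS_N1"): 0.6,
    ("STX_S", "TS_N2"): 0.2,
}
tp2 = top_k_pairs(third_scores(node_scores, second), k=2)
```

With these invented values, two of the four candidate pairs are kept, which mirrors the situation described for (f) of FIG. 10.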
[0126]
[0127] Referring again to FIGS. 1 and 2, the processor (110) can identify a modified graph (S600). The processor (110) can identify the modified graph by adding at least one second table segment document pair (TP2 in FIG. 10) to the subgraph (Gc in FIG. 8).
[0128] FIG. 11 is a diagram illustrating step S600 of FIG. 2.
[0129] Referring to FIG. 11, the processor (110) can identify a modified graph (Gl) by adding at least one second table segment document pair (TP2) to the subgraph (Gc in FIG. 8).
[0130] The modification graph (Gl) may include a modification table segment and a modification document. The modification table segment may include a table segment of the subgraph (Gc) and a second table segment of at least one second table segment document pair (TP2). The modification document may include a document of the subgraph (Gc) and a second document of at least one second table segment document pair (TP2).
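As a rough sketch of step S600, a graph can be modeled as a set of (table segment, document) pairs, so that adding the second pairs (TP2) to the subgraph (Gc) becomes a set union. The set-of-pairs model and the name `build_modified_graph` are assumptions of this example, not something the description prescribes.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]  # (table segment id, document id)


def build_modified_graph(
    subgraph: Iterable[Pair],
    second_pairs: Iterable[Pair],
) -> Tuple[Set[Pair], Set[str], Set[str]]:
    """Add the second pairs to the subgraph and list the resulting nodes.

    Returns the modified graph together with its modification table
    segments and modification documents.
    """
    modified = set(subgraph) | set(second_pairs)
    segments = {segment for segment, _ in modified}
    documents = {document for _, document in modified}
    return modified, segments, documents
```

Under this model the modification table segments are the union of the subgraph's table segments and the second table segments, and likewise for the modification documents.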
[0131]
[0132] Referring again to FIGS. 1 and 2, the processor (110) can identify an extended graph (S700). Based on the modified graph (Gl in FIG. 11), the processor (110) can identify an extended graph obtained by further modifying the modified graph.
[0133] The processor (110) can identify the extended result graph (S800) based on the relationship between the extended graph and the question.
[0134] FIG. 12 is a diagram illustrating steps S700 and S800 of FIG. 2.
[0135] Referring to FIG. 12, the processor (110) may first input a question into a large language model (S701) to identify an expanded graph by modifying the modification graph. The processor (110) may input the question into the large language model and receive a result from the large language model regarding whether it is necessary to restore the modification table segment included in the modification graph (Gl) to the original table (OT in FIG. 3). The modification table segment may be the result of splitting a table containing multiple rows along the rows.
[0136]
[0137] If restoration is required, the processor (110) can restore the modified table segment to the table (S703). The processor (110) can restore each of the modified table segments of FIG. 11 to the original table (OT of FIG. 3).
[0138]
[0139] The processor (110) can identify additional table segments (S705). The processor (110) can identify additional table segments related to the question in the restored table.
[0140] FIG. 13 is a diagram illustrating steps S703 and S705 of FIG. 12. The modified graph (Gl) of (g2) in FIG. 13 is identical to the modified graph (Gl) of (g1) in FIG. 11.
[0141] Referring to FIG. 13, the modification graph (Gl) in (g2) may include a first modification table segment (STS1), a second modification table segment (STS2), and a third modification table segment (STS3). The modification graph (Gl) may include a plurality of modification documents (STX) paired with each of the modification table segments (STS1, STS2, STS3).
[0142] The processor (110) can restore each of the modified table segments (STS1, STS2, STS3) to their original table based on the need for restoration. An example of (h) may be a case where some of the modified table segments (STS1, STS2, STS3) are split from the same table. In this case, there are three modified table segments (STS1, STS2, STS3), but fewer than three tables may be identified after restoration. For example, if the first modified table segment (STS1) and the third modified table segment (STS3) are split from the same table, restoring each of the first modified table segment (STS1) and the third modified table segment (STS3) will identify two identical original tables, so duplicate tables can be removed. As a result, the restored modified graph (Go) of (h) may include the restored first table (OT13). The processor (110) can restore the second modification table segment (STS2) to identify the restored second table (OT2) of the restored modification graph (Go) of (h).
[0143] The processor (110) can identify additional table segments (TS_N1, TS_N2) related to the question among the restored tables (OT13, OT2). The additional table segments (TS_N1, TS_N2) may be table segments that are the same as or different from the modified table segments (STS1, STS2, STS3).
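The restoration with duplicate removal described for (h) of FIG. 13 can be sketched as follows. The mapping `source_table` from a segment to the table it was split from is a hypothetical data structure assumed for this example.

```python
from typing import Dict, List


def restore_tables(segments: List[str], source_table: Dict[str, str]) -> List[str]:
    """Map each segment to its original table and drop duplicate tables.

    dict.fromkeys keeps the first occurrence of each table, so the result
    preserves the order in which tables are first restored.
    """
    return list(dict.fromkeys(source_table[segment] for segment in segments))


# STS1 and STS3 were split from the same table, so only two tables remain.
source_table = {"STS1": "OT13", "STS2": "OT2", "STS3": "OT13"}
restored = restore_tables(["STS1", "STS2", "STS3"], source_table)
```

Three modified table segments therefore yield fewer than three restored tables whenever two or more of them share an original table.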
[0144] In some embodiments, the processor (110) inputs the restored modified graph (Go) and the question into a large language model and can identify the additional table segments (TS_N1, TS_N2) associated with the question by receiving them from the large language model.
[0145]
[0146] Referring again to FIG. 12, the processor (110) can identify additional documents (S707). The processor (110) can identify additional documents associated with additional table segments (TS_N1, TS_N2 in FIG. 13) by searching for additional table segments (TS_N1, TS_N2 in FIG. 13) in either the modified graph (Gl in FIG. 13) or the initial graph (Gd in FIG. 4).
[0147] The processor (110) can search for related additional documents in the modified graph (Gl of FIG. 13) if the additional table segments (TS_N1, TS_N2 of FIG. 13) exist in the modified graph (Gl of FIG. 13). The processor (110) can search for related additional documents in the initial graph (Gd of FIG. 4) if the additional table segments (TS_N1, TS_N2 of FIG. 13) do not exist in the modified graph (Gl of FIG. 13).
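The choice of which graph to search in step S707 can be sketched as a simple lookup with a fallback. Here each graph is modeled as a dictionary from a table segment to its paired documents; this model and the name `find_additional_documents` are assumptions made for illustration.

```python
from typing import Dict, List


def find_additional_documents(
    segment: str,
    modified_graph: Dict[str, List[str]],
    initial_graph: Dict[str, List[str]],
) -> List[str]:
    """Search the modified graph first; fall back to the initial graph."""
    if segment in modified_graph:
        return modified_graph[segment]
    # The segment is not part of the modified graph, so the initial graph
    # is searched instead.
    return initial_graph.get(segment, [])
```

A segment found in the modified graph never triggers a search of the initial graph, which matches the two cases described above.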
[0148] FIG. 14 is a diagram illustrating step S707 of FIG. 12.
[0149] Referring to FIG. 14, the processor (110) can identify a first additional document (TX_N1) associated with a first additional table segment (TS_N1) identified in the restored modified graph (Go) of FIG. 13 (h). The processor (110) can identify a second additional document (TX_N2) associated with a second additional table segment (TS_N2) identified in the restored modified graph (Go) of FIG. 13 (h).
[0150] An additional graph (Gn) may include a subgraph (e.g., a star graph) organized around a table segment. For example, the additional graph (Gn) may include one subgraph in which a first additional document (TX_N1) is paired with a first additional table segment (TS_N1), and another subgraph in which a second additional document (TX_N2) is paired with a second additional table segment (TS_N2). The subgraph may contain duplicates, for example, where one additional document is paired with both the first additional table segment (TS_N1) and the second additional table segment (TS_N2).
[0151]
[0152] Referring again to FIG. 12, the processor (110) can identify a separation graph (S709). The processor (110) can identify the separation graph by separating the modification graph (Gl in FIG. 13) around the modification table segments (STS1, STS2, STS3).
[0153] FIG. 15 is a diagram illustrating step S709 of FIG. 12.
[0154] Referring to FIG. 15, in (j), the processor (110) can separate the modification graph (Gl of FIG. 13) around the modification table segments (STS1, STS2, STS3) of the modification graph (Gl of FIG. 13). In the example of (j), since there are three modification table segments (STS1, STS2, STS3), a separation graph containing three subgraphs can be identified. The separation graph may include a modification document (STX) associated with each of the modification table segments (STS1, STS2, STS3).
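Separating the modification graph around its table segments (S709) can be sketched by grouping pairs by table segment, which yields one star-shaped subgraph per segment. The pair-list input and the name `separate_by_segment` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def separate_by_segment(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group (table segment, document) pairs into one subgraph per segment."""
    stars: Dict[str, Set[str]] = defaultdict(set)
    for segment, document in pairs:
        stars[segment].add(document)
    return dict(stars)
```

With three modification table segments in the input, the result contains three subgraphs, each holding the documents paired with its central segment.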
[0155]
[0156] Referring again to FIG. 12, the processor (110) can identify an extended graph (S711). The processor (110) can identify the extended graph based on a separated graph (graph (j) in FIG. 15) and an additional graph (Gn in FIG. 14). The additional graph (Gn in FIG. 14) may include additional table segments (TS_N1, TS_N2 in FIG. 14) and additional documents (TX_N1, TX_N2 in FIG. 14).
[0157] FIG. 16 is a diagram illustrating step S711 of FIG. 12.
[0158] Referring to FIG. 16, the processor (110) can identify an extended graph (Ge) by removing duplicate segments and documents from the separated graph (graph (j) in FIG. 15) and the additional graph (Gn in FIG. 14). For example, if the second additional table segment (TS_N2) and the third modification table segment (STS3) overlap in the separated graph (graph (j) in FIG. 15) and the additional graph (Gn in FIG. 14), the graphs for the second additional table segment (TS_N2) and the third modification table segment (STS3), respectively, can be combined into one.
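Combining the separated graph with the additional graph (S711) while removing duplicates can be sketched as a per-segment union. The sketch assumes that overlapping table segments share the same identifier (for example, a table id plus a row range), which is an assumption of this example; `merge_graphs` is a hypothetical name.

```python
from typing import Dict, Set


def merge_graphs(
    separated: Dict[str, Set[str]],
    additional: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Union two segment-to-documents graphs without duplicating nodes."""
    # Copy the document sets so that the inputs are left untouched.
    merged = {segment: set(docs) for segment, docs in separated.items()}
    for segment, docs in additional.items():
        # An overlapping segment is combined into a single subgraph.
        merged.setdefault(segment, set()).update(docs)
    return merged
```

If an additional table segment coincides with a modification table segment, their subgraphs collapse into one entry whose documents are the union of both.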
[0159]
[0160] Referring again to FIG. 12, the processor (110) can identify the extended result graph (S713). The processor (110) can identify the extended result graph by removing, from the extended graph (Ge in FIG. 16), table segment document pairs that are not relevant (or have low relevance) to the question.
[0161] FIG. 17 is a diagram illustrating step S713 of FIG. 12.
[0162] Referring to FIG. 17, the processor (110) can identify an extended result graph (Gq) containing result table segments (TS_Q) and result documents (TX_Q). The extended result graph (Gq) may be the extended graph (Ge) of FIG. 16 from which table segment document pairs that are not relevant (or have low relevance) to the question have been removed.
[0163] In some embodiments, the processor (110) inputs a question and the extended graph (Ge in FIG. 16) into a large language model and receives from the large language model an extended result graph (Gq) from which table segment document pairs that are not relevant (or have low relevance) to the question have been removed.
[0164] Table segments and documents may be added and / or deleted in the extended result graph (Gq) compared to the subgraph (Gc in FIG. 6). For example, even if a document was selected in the subgraph (Gc in FIG. 6), it may be deleted in the extended result graph (Gq) if it is determined to have low relevance to the question. Also, for example, even if a table segment was not selected in the subgraph (Gc in FIG. 6), it may be added in the extended result graph (Gq) if it is determined to have high relevance to the question.
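The removal of low-relevance pairs in step S713 can be sketched with a score threshold. This is only a stand-in: the description has a large language model return the pruned graph in some embodiments, whereas the numeric `relevance` scores, the `threshold` parameter, and the name `filter_relevant_pairs` are assumptions introduced for this example.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (table segment id, document id)


def filter_relevant_pairs(
    pairs: Set[Pair],
    relevance: Dict[Pair, float],
    threshold: float,
) -> Set[Pair]:
    """Keep only the pairs whose relevance to the question meets the threshold.

    Pairs without a score are treated as irrelevant (score 0.0).
    """
    return {pair for pair in pairs if relevance.get(pair, 0.0) >= threshold}
```

A pair that was present in the subgraph can thus still be dropped here, while a pair added during expansion survives if its score is high enough.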
[0165]
[0166] Referring again to FIG. 12, the processor (110) can identify a separated graph (S709) without restoring the modified table segment to a table if restoration is not required. The processor (110) can identify an expanded graph (Ge in FIG. 16) by removing duplicate segments and documents from the separated graph (graph (j) in FIG. 15) (S711). The processor (110) can identify an expanded result graph (Gq in FIG. 17) by removing table segment document pairs that are not related to the question (or have low relevance) from the expanded graph (Ge in FIG. 16) (S713).
[0167]
[0168] Referring again to FIGS. 1 and 2, the processor (110) can identify the final similarity score (S900). The processor (110) can identify the final similarity score between the question and each of the result table segment document pairs included in the extended result graph (Gq in FIG. 17).
[0169] The processor (110) can identify the final graph (S1000) based on the final similarity score. The processor (110) can identify the final graph in which the result table segment document pairs are sorted based on the final similarity score. The result table segment document pairs of the final graph may be sorted in order of highest final similarity score.
[0170] FIG. 18 is a diagram illustrating steps S900 and S1000 of FIG. 2.
[0171] Referring to FIG. 18, the processor (110) can identify a final graph (Eq) in which the result table segment document pairs of the extended result graph (Gq) of FIG. 17 are sorted based on the final similarity score.
[0172] In some embodiments, the processor (110) inputs the question and the expanded result graph (Gq of FIG. 17) into a large language model and receives a final graph from the large language model sorted in order of highest final similarity.
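Sorting the result pairs by final similarity score (S900 and S1000) can be sketched as below. The scores are taken as given, since the description leaves their computation to, for example, a large language model; the name `rank_pairs` and the sample values are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (result table segment id, result document id)


def rank_pairs(final_scores: Dict[Pair, float]) -> List[Pair]:
    """Return the result pairs ordered from highest to lowest final score."""
    return sorted(final_scores, key=lambda pair: final_scores[pair], reverse=True)


final_scores = {
    ("TS_Q1", "TX_Q1"): 0.42,
    ("TS_Q2", "TX_Q2"): 0.91,
    ("TS_Q3", "TX_Q3"): 0.67,
}
final_graph = rank_pairs(final_scores)
```

The resulting list is the ordering in which the pairs would be presented as the response to the question.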
[0173]
[0174] The processor (110) can identify and provide the final graph as a response to the question.
[0175]
[0176] A search device according to an embodiment of the present invention can identify a final graph (Eq) from a subgraph (Gc), which is the portion of an initial graph (Gd) selected in consideration of relevance to a question, through the steps (S400, S500, S600, S700, S800, S900, S1000) of FIG. 2. In doing so, the search device can identify tables and / or documents that are highly relevant to the question but were not retrieved in the subgraph (Gc), and can remove from the subgraph (Gc) table segments and / or documents that were retrieved despite having low relevance to the question, thereby providing a response that is highly relevant to the question.
[0177]
[0178] Hereinafter, a method for searching tables and documents according to a question according to an embodiment of the present invention will be described with reference to FIG. 19. For clarity of explanation, parts that overlap with what has been previously described will be simplified or omitted.
[0179] FIG. 19 is a flowchart illustrating a table and document search method according to a question according to an embodiment of the present invention.
[0180] Referring to FIG. 19, a table and document search method based on a question according to an embodiment of the present invention may include a step (S1000) of identifying an initial graph containing a plurality of first table segment document pairs. The initial graph may be formed, for example, by an early fusion method.
[0181]
[0182] A table and document search method based on a question according to an embodiment of the present invention may include a step (S2000) in which a first similarity score is identified.
[0183]
[0184] A table and document search method based on a question according to an embodiment of the present invention may include a step (S3000) of identifying subgraphs based on a first similarity score. Among a plurality of first table segment document pairs, the top k pairs with high first similarity scores may be identified as subgraphs.
[0185]
[0186] A table and document search method according to a question according to an embodiment of the present invention may include a step (S4000) of identifying at least one of a new table segment and a new document based on a subgraph.
[0187] The step (S4000) of identifying at least one of the new table segment and the new document may include the step of identifying each of the table segment of the subgraph and the document of the subgraph as a node. The step (S4000) of identifying at least one of the new table segment and the new document may include the step of identifying a node similarity score between the question and the node after the node is identified.
[0188] The step (S4000) of identifying at least one of the new table segments and the new documents may include the step of identifying a selected node group by identifying the top k nodes (where k is a natural number) with high node similarity scores among the nodes.
[0189] The step (S4000) of identifying at least one of the new table segment and the new document may include the step of identifying a search result in an initial graph that is simultaneously related to the selected node group and the question, and is the result of searching for a new table segment and a new document of a different type from the selected node group.
[0190] The step (S4000) of identifying at least one of the new table segment and the new document may include the step of identifying a second similarity score between the question and the search result.
[0191] The step (S4000) of identifying at least one of the new table segment and the new document may include identifying the top k search results with high second similarity scores to identify an additional search group. The additional search group may include at least one of the new table segment and the new document.
[0192]
[0193] A table and document search method according to a question according to an embodiment of the present invention may include a step (S5000) of identifying at least one second table segment document pair. At least one second table segment document pair may be at least some of a new table segment and a new document. At least one second table segment document pair may be identified based on a third similarity score for each of the new table segment and the new document. The third similarity score may be calculated based on a node similarity score and a second similarity score.
[0194]
[0195] A table and document search method according to a question according to an embodiment of the present invention may include a step (S6000) of identifying a modification graph. At least one second table segment document pair may be added to a subgraph so that the modification graph can be identified.
[0196]
[0197] A table and document search method according to a question according to an embodiment of the present invention may include a step (S7000) in which an expanded graph is identified.
[0198] The step of identifying an extended graph (S7000) may include the step of inputting a question into a large language model. The step of identifying an extended graph (S7000) may include the step of receiving from the large language model whether restoration of a modification table segment is required.
[0199] The step of identifying an expanded graph (S7000) may include a step of restoring a modified table segment to the original table based on the need for restoration. The step of identifying an expanded graph (S7000) may include a step of identifying an additional table segment related to a question in the table. The step of identifying an expanded graph (S7000) may include a step of identifying an additional document related to the additional table segment by searching for the additional table segment in either the modified graph or the initial graph. The step of identifying an expanded graph (S7000) may include a step of identifying a separated graph by separating the modified graph around the modified table segment. The step of identifying an expanded graph (S7000) may include a step of identifying an expanded graph based on the separated graph, the additional table segment, and the additional document.
[0200] If restoration is unnecessary, the step of identifying the extended graph (S7000) may include removing duplicate table segment document pairs from the separated graph to identify the extended graph.
[0201]
[0202] A table and document search method based on a question according to an embodiment of the present invention may include a step (S8000) of identifying an expanded result graph based on the relationship between the expanded graph and the question.
[0203]
[0204] A table and document search method based on a question according to an embodiment of the present invention may include a step (S9000) in which a final similarity score is identified.
[0205]
[0206] A table and document search method based on a question according to an embodiment of the present invention may include a step (S10000) of identifying a final graph based on a final similarity score. In the final graph, result table segment document pairs included in the extended result graph may be sorted in order of highest final similarity score.
[0207]
[0208] A table and document search method based on a question according to an embodiment of the present invention may include a step in which a final graph is provided as a response to the question.
[0209]
[0210] Various embodiments of the present document may be implemented as software (e.g., a program) comprising one or more instructions stored in a storage medium (e.g., a memory (120)) readable by a machine (e.g., a search device (100)). For example, a processor (110) of the machine (e.g., a search device (100)) may call at least one of the one or more instructions stored in the storage medium and execute it. This enables the machine to operate to perform at least one function according to the at least one called instruction. The one or more instructions may include code generated by a compiler or code that can be executed by an interpreter. The storage medium readable by the machine may be provided in the form of a non-transitory storage medium. Here, 'non-transitory' simply means that the storage medium is a tangible device and does not contain a signal (e.g., electromagnetic waves), and the term does not distinguish between cases where data is stored semi-permanently and cases where it is stored temporarily.
[0211] According to one embodiment, the method according to the various embodiments disclosed herein may be provided by being included in a computer program product. The computer program product may be traded between a seller and a buyer as a product. The computer program product may be distributed in the form of a device-readable storage medium (e.g., compact disc read-only memory (CD-ROM)), or distributed online (e.g., download or upload) through an application store (e.g., Play Store™) or directly between two user devices (e.g., smartphones). In the case of online distribution, at least a portion of the computer program product may be temporarily stored or temporarily created on a device-readable storage medium, such as the memory of a manufacturer's server, an application store's server, or a relay server.
[0212] According to various embodiments, each component (e.g., module or program) of the components described above may include a single entity or multiple entities. According to various embodiments, one or more of the aforementioned components or operations may be omitted, or one or more other components or operations may be added. Alternatively or additionally, multiple components (e.g., modules or programs) may be integrated into a single component. In this case, the integrated component may perform one or more functions of each of the multiple components in the same or similar manner as they were performed by the corresponding component among the multiple components prior to the integration. According to various embodiments, operations performed by the module, program, or other components may be executed sequentially, in parallel, iteratively, or heuristically, or one or more of the operations may be executed in a different order or omitted, or one or more other operations may be added.
[0213]
[0214] The above description is merely an illustrative explanation of the technical concept of the present embodiment, and a person skilled in the art to which the present embodiment belongs would be able to make various modifications and variations within the scope of the essential characteristics of the present embodiment. Accordingly, the present embodiments are intended to explain, not limit, the technical concept of the present embodiment, and the scope of the technical concept of the present embodiment is not limited by these embodiments. The scope of protection of the present embodiment shall be interpreted by the claims below, and all technical concepts within an equivalent scope shall be interpreted as being included within the scope of rights of the present embodiment.
Claims
1. A question-based table and document search device, comprising: a processor; and a memory operatively connected to the processor, wherein the memory stores instructions that, when executed, cause the processor to: identify table segments obtained by dividing a table by rows; identify an initial graph including a plurality of first table segment document pairs, in each of which a table segment is paired with a document associated with the table segment; identify a first similarity score between a question and each of the plurality of first table segment document pairs; identify, based on the first similarity score, a subgraph that is part of the initial graph; identify, based on the subgraph, at least one of a new table segment and a new document that are different from each of the table segment of the subgraph and the document of the subgraph included in the subgraph; identify at least one second table segment document pair that is at least some of the new table segment and the new document; identify a modification graph by adding the at least one second table segment document pair to the subgraph; identify, based on the modification graph, an extended graph obtained by modifying the modification graph; and identify an extended result graph based on a relationship between the extended graph and the question.
2. The search device of claim 1, wherein the instructions cause the processor to identify the initial graph formed by an early fusion method.
3. The search device of claim 1, wherein the instructions cause the processor to input the question and the initial graph into an external network and receive the first similarity score from the external network.
4. The search device of claim 3, wherein the instructions cause the processor to identify the subgraph by identifying the top k pairs (where k is a natural number) with the highest first similarity scores among the plurality of first table segment document pairs.
5. The search device of claim 1, wherein the instructions cause the processor to: identify each of the table segment of the subgraph and the document of the subgraph as a node; identify a node similarity score between the question and the node; and identify a selected node group by identifying the top k nodes (where k is a natural number) with the highest node similarity scores among the nodes.
6. The search device of claim 5, wherein the instructions cause the processor to: identify a search result obtained by searching the initial graph for related table segments and related documents that are related to both the selected node group and the question and that are of a different type from the selected node group; identify a second similarity score between the question and the search result; and identify an additional search group by identifying the top k results with the highest second similarity scores among the search result, wherein the additional search group includes at least one of the new table segment and the new document.
7. The search device of claim 6, wherein the instructions cause the processor to: identify a third similarity score for each of the new table segment and the new document; and identify the at least one second table segment document pair based on the third similarity score.
8. The search device of claim 7, wherein the instructions cause the processor to: identify the third similarity score based on the node similarity score and the second similarity score; and identify the at least one second table segment document pair by identifying the top k with the highest third similarity scores among the additional search group.
9. The search device of claim 1, wherein the modification graph includes the table segment of the subgraph, the document of the subgraph, a second table segment of the at least one second table segment document pair, and a second document of the at least one second table segment document pair, wherein the table segment of the subgraph and the second table segment are modification table segments, wherein the document of the subgraph and the second document are modification documents, and wherein the instructions cause the processor to: input the question into a large language model; receive, from the large language model, whether restoration of the modification table segment is required; restore the modification table segment to the table based on the restoration being required; identify, in the table, an additional table segment related to the question; search for the additional table segment in either the modification graph or the initial graph to identify an additional document associated with the additional table segment; separate the modification graph around the modification table segment to identify a separated graph; and identify the extended graph based on the separated graph, the additional table segment, and the additional document.
10. The search device of claim 9, wherein the instructions cause the processor to identify the extended result graph by removing, from the extended graph, table segment document pairs unrelated to the question.
11. The search device of claim 9, wherein the instructions cause the processor to, based on the restoration being unnecessary, identify the extended graph by removing duplicate table segment document pairs from the separated graph.
12. The search device of claim 11, wherein the instructions cause the processor to identify the extended result graph by removing, from the extended graph, table segment document pairs unrelated to the question.
13. The search device of claim 1, wherein the instructions cause the processor to: identify a final similarity score between the question and each of result table segment document pairs included in the extended result graph; and identify a final graph in which the result table segment document pairs are sorted based on the final similarity score.
14. The search device of claim 13, wherein the instructions cause the processor to provide the final graph as an answer to the question.
15. A question-based table and document search method, comprising: identifying table segments obtained by dividing a table by rows; identifying an initial graph including a plurality of first table segment document pairs, in each of which a table segment is paired with a document associated with the table segment; identifying a first similarity score between a question and each of the plurality of first table segment document pairs; identifying, based on the first similarity score, a subgraph that is part of the initial graph; identifying, based on the subgraph, at least one of a new table segment and a new document that are different from each of the table segment of the subgraph and the document of the subgraph included in the subgraph; identifying at least one second table segment document pair that is at least some of the new table segment and the new document; identifying a modification graph by adding the at least one second table segment document pair to the subgraph; identifying, based on the modification graph, an extended graph obtained by modifying the modification graph; and identifying an extended result graph based on a relationship between the extended graph and the question.
16. The method of claim 15, wherein identifying at least one of the new table segment and the new document comprises: identifying each of the table segments of the subgraph and the documents of the subgraph as a node; identifying a node similarity score between the question and each node; and identifying a selected node group by identifying, among the nodes, the top k nodes (k being a natural number) having the highest node similarity scores.
17. The method of claim 16, wherein identifying at least one of the new table segment and the new document further comprises: identifying search results obtained by searching the initial graph for new table segments and new documents that are related to both the selected node group and the question and that are of a different type from the selected node group; identifying a second similarity score between the question and each search result; and identifying an additional search group by identifying, among the search results, the top k search results having the highest second similarity scores, wherein the additional search group includes at least one of the new table segment and the new document.
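The node selection of claim 16 and the cross-type search of claim 17 can be sketched as follows. This is a hedged illustration: the `(kind, text)` node representation, the function names, and the word-overlap score are assumptions, and "related to the selected node group" is interpreted here as "paired with a selected node in the initial graph", which the claims do not state explicitly.

```python
import re

def tokens(text):
    """Lowercased word tokens of a string."""
    return set(re.findall(r"\w+", text.lower()))

def similarity(question, text):
    """Toy Jaccard overlap; an assumed stand-in for the unspecified scorer."""
    q, t = tokens(question), tokens(text)
    union = q | t
    return len(q & t) / len(union) if union else 0.0

def graph_nodes(pairs):
    """Treat every table segment and every document as its own node."""
    nodes = set()
    for seg, doc in pairs:
        nodes.add(("segment", seg))
        nodes.add(("document", doc))
    return nodes

def top_k(question, nodes, k):
    """Top k nodes by score; ties are broken by (kind, text) for determinism."""
    return sorted(nodes, key=lambda n: (-similarity(question, n[1]), n))[:k]

def expand(question, subgraph, initial_graph, k):
    """Return (selected node group, additional search group)."""
    existing = graph_nodes(subgraph)
    selected = top_k(question, existing, k)
    candidates = set()
    for kind, text in selected:
        for seg, doc in initial_graph:
            # A segment node looks for documents; a document node looks for segments.
            if kind == "segment" and seg == text:
                candidates.add(("document", doc))
            elif kind == "document" and doc == text:
                candidates.add(("segment", seg))
    new_nodes = candidates - existing  # keep only nodes not already in the subgraph
    return selected, top_k(question, new_nodes, k)

# Illustrative data only.
segment = "team: Steelers | city: Pohang"
initial = [
    (segment, "Pohang Steelers play football"),
    (segment, "Pohang hosts a steel works on the coast"),
    ("team: Lyon | city: Lyon", "Lyon is a French club"),
]
selected, added = expand("Where is the steel works in Pohang", [initial[0]], initial, 2)
```

Here the subgraph starts with one pair, and the search adds the second document linked to the same table segment, which was absent from the subgraph. Searching only for the opposite node type mirrors the "different type from the selected node group" wording of claim 17.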
18. The method of claim 15, wherein the modified graph includes a table segment of the subgraph, a document of the subgraph, a second table segment of the at least one second table segment-document pair, and a second document of the at least one second table segment-document pair, wherein the table segment of the subgraph and the second table segment are modified table segments, and the document of the subgraph and the second document are modified documents, and wherein identifying the expanded graph comprises: inputting the question into a large language model; receiving, from the large language model, an indication of whether restoration of the modified table segments is required; based on the restoration being required, restoring the modified table segments to a table; identifying, in the table, an additional table segment related to the question; searching either the modified graph or the initial graph for the additional table segment to identify an additional document associated with the additional table segment; identifying a separated graph by separating the modified graph around the modified table segments; and identifying the expanded graph based on the separated graph, the additional table segment, and the additional document.
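The branching described in claim 18 can be sketched roughly as follows. Several points are assumptions made only for illustration: the large language model is replaced by a hypothetical callable `needs_restoration`, "separating the modified graph around the modified table segments" is interpreted as grouping documents by table segment, the restored table is passed in as a list of its row segments, and relatedness to the question is approximated by shared words.

```python
import re

def tokens(text):
    """Lowercased word tokens of a string."""
    return set(re.findall(r"\w+", text.lower()))

def separate(graph):
    """Group documents by table segment, dropping duplicate pairs (assumed reading)."""
    separated = {}
    for seg, doc in graph:
        docs = separated.setdefault(seg, [])
        if doc not in docs:
            docs.append(doc)
    return separated

def expand_graph(question, modified_graph, initial_graph, source_table, needs_restoration):
    """Return a mapping: table segment -> documents.

    needs_restoration is a hypothetical stand-in for the large language model's
    decision; source_table is the restored table as a list of row segments.
    """
    separated = separate(modified_graph)
    if not needs_restoration(question):
        return separated  # no restoration: the de-duplicated graph is the result
    q = tokens(question)
    for row in source_table:
        if row in separated or not (q & tokens(row)):
            continue  # already present, or no word shared with the question
        docs = [d for s, d in initial_graph if s == row]
        if docs:
            separated[row] = list(dict.fromkeys(docs))  # order-preserving de-duplication
    return separated

# Illustrative data only.
table = ["year: 2020 | sales: 10", "year: 2021 | sales: 12"]
initial = [(table[0], "Report 2020"), (table[1], "Report 2021"), (table[1], "Report 2021")]
modified = [(table[0], "Report 2020"), (table[0], "Report 2020")]
restored = expand_graph("What were total sales", modified, initial, table, lambda q: True)
plain = expand_graph("What were total sales", modified, initial, table, lambda q: False)
```

With restoration, the 2021 row and its report are pulled in because the question concerns the whole table; without it, only the de-duplicated original pair remains.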
19. The method of claim 18, wherein identifying the expanded graph further comprises, based on the restoration not being required, identifying the expanded graph by removing duplicate table segment-document pairs from the separated graph.
20. The method of claim 15, further comprising: identifying a final similarity score between the question and each result table segment-document pair included in the expanded result graph; identifying a final graph in which the result table segment-document pairs are sorted based on the final similarity score; and providing the final graph as a response to the question.