Execution plan generation method and device
The method addresses the limitations of RAG models by evaluating query complexity and generating subqueries to create a dataset, enhancing the RAG model's ability to process queries efficiently and accurately.
Patent Information
- Authority / Receiving Office: WO
- Patent Type: Applications
- Current Assignee / Owner: POSCO HLDG INC
- Filing Date: 2025-12-16
- Publication Date: 2026-06-25
AI Technical Summary
Existing Retrieval-Augmented Generation (RAG) models in natural language processing cannot organically link additional question-answering features, such as optimizing and rewriting user queries, selecting external knowledge databases, and tailoring generated results to user requirements. A training dataset for creating an execution plan is therefore needed.
A method and apparatus for generating an execution plan that evaluates query complexity, determines a processing method, generates subqueries, and extracts documents from selected databases to create a dataset, using techniques like Byte Pair Encoding, Attention Mechanism, and Seq2Seq algorithms to enhance the RAG model's functionality.
Enhances the RAG model's capability to generate accurate and richer text outputs by improving the processing speed and completeness of information extraction based on query complexity, enabling more effective data retrieval and generation.
Description
Execution plan generation method and device
[0001] The present embodiments relate to a method and apparatus for generating an execution plan.
[0002] RAG (Retrieval-Augmented Generation) is a model used in the field of natural language processing.
[0003] Generally, RAG supplies input to a Large Language Model (LLM): rather than generating text directly, it extracts information from external data so that the LLM can generate more accurate and richer text output.
[0004] RAG includes a process in which a user inputs a query into the model and a process of extracting related documents or data based on the query to provide information to be used as input values for the LLM.
[0005] However, existing RAGs have a disadvantage in that they cannot organically link additional features for question and answer, such as optimizing and rewriting user queries, selecting external knowledge databases to extract from, or specifying generated results according to user requirements.
[0006] Therefore, it is necessary to generate data by compensating for the aforementioned shortcomings and inputting it into an LLM model; however, development of such technology has been insufficient to date.
[0007] The present embodiments relate to a method and apparatus for generating an execution plan capable of generating a training data set for creating an execution plan.
[0008] In one aspect, the embodiments may provide a method for generating an execution plan, comprising: a processing method determination step for evaluating the complexity of a query included in an input value and determining a processing method according to the complexity of the query; a document extraction step for generating a subquery using the input value according to the processing method and extracting documents using a selected database according to the characteristics of the subquery; and a data set generation step for generating a data set by mapping documents and subqueries.
[0009] In another aspect, the embodiments may provide an execution plan generating device comprising: a processing method determining unit that evaluates the complexity of a query included in an input value and determines a processing method according to the complexity of the query; a document extraction unit that generates a subquery using the input value according to the processing method and extracts a document using a selected database according to the characteristics of the subquery; and a data set generating unit that generates a data set by mapping the document and the subquery.
[0010] According to the embodiments thereof, a method and apparatus for generating an execution plan capable of generating a training data set for creating an execution plan can be provided.
[0011] FIG. 1 is a flowchart illustrating a method for generating an execution plan according to the embodiments.
[0012] FIG. 2 is a flowchart illustrating the steps for determining the processing method according to the embodiments.
[0013] FIG. 3 is a flowchart illustrating the document extraction step according to the present embodiment.
[0014] FIG. 4 is a block diagram illustrating an execution plan generation device according to the embodiments.
[0015] Hereinafter, some embodiments of the present disclosure will be described in detail with reference to the exemplary drawings. In assigning reference numerals to the components of each drawing, the same components may have the same reference numeral as much as possible, even if they are shown in different drawings. Furthermore, in describing the embodiments, if it is determined that a detailed description of related known components or functions may obscure the essence of the technical concept, such detailed description may be omitted. Where terms such as "comprising," "having," or "consisting of" are used in this specification, other parts may be added unless "only" is used. Where a component is expressed in the singular, it may include a plural unless otherwise specified.
[0016] Additionally, terms such as first, second, A, B, (a), (b), etc., may be used to describe the components of the present disclosure. These terms are used merely to distinguish the components from other components, and the nature, order, sequence, or number of the components are not limited by such terms.
[0017] In describing the positional relationship of components, where it is stated that two or more components are "connected," "combined," or "joined," it should be understood that while the two or more components may be directly "connected," "combined," or "joined," they may also be "connected," "combined," or "joined" with other components "intervened." Here, the other components may be included in one or more of the two or more components that are "connected," "combined," or "joined" with one another.
[0018] In describing the temporal flow relationship regarding components, methods of operation, or methods of production, for example, when the temporal or sequential relationship is described using "after," "following," "next," or "before," it may include cases where the relationship is not continuous unless "immediately" or "directly" is used.
[0019] Meanwhile, where numerical values or corresponding information regarding a component (e.g., levels, etc.) are mentioned, even without separate explicit notation, the numerical values or corresponding information may be interpreted as including a range of error that may occur due to various factors (e.g., process factors, internal or external shocks, noise, etc.).
[0020] RAG (Retrieval-Augmented Generation) is a model used in the field of natural language processing.
[0021] Generally, RAG supplies input values to a Large Language Model (LLM): it is an artificial intelligence model that, rather than generating text directly, extracts information from external data so that the LLM can generate more accurate and richer text output.
[0022] RAG includes a process in which a user inputs a query into the model and a process of extracting related documents or data based on the query to provide information to be used as input values for the LLM.
[0023] However, existing RAGs have a disadvantage in that they cannot organically link additional features for question and answer, such as optimizing and rewriting user queries, selecting external knowledge databases to extract from, or specifying generated results according to user requirements.
[0024] Therefore, in order to compensate for the aforementioned shortcomings, it is necessary to generate a training dataset for creating an execution plan and provide the training dataset to a RAG or LLM to train it, but the development of such technology has been insufficient to date.
[0025] Accordingly, the embodiments of the present disclosure aim to provide a technology capable of generating a training data set for creating an execution plan by compensating for the aforementioned disadvantages. Hereinafter, the embodiments of the present disclosure will be described with reference to the drawings.
[0026]
[0027] FIG. 1 is a flowchart illustrating a method for generating an execution plan according to the present embodiments. FIG. 2 is a flowchart illustrating a step for determining a processing method according to the present embodiments. FIG. 3 is a flowchart illustrating a document extraction step according to the present embodiments.
[0028] The embodiments of the present disclosure aim to induce the RAG or LLM to learn by providing a dataset as training data to the RAG or LLM.
[0029] The execution plan generation method of the present disclosure, and each of its steps, can be implemented through an LLM or other language model (LLaMA, Qwen, ChatGPT, etc.). For convenience of explanation, the method is described below based on ChatGPT, and the terms "language model" and "ChatGPT" are used interchangeably. However, various language models may be utilized, and the disclosure is not limited to the present embodiment.
[0030] Referring to FIG. 1, the execution plan generation method may include a processing method determination step (S110) that evaluates the complexity of a query included in an input value and determines a processing method according to the complexity of the query, a document extraction step (S120) that generates a subquery from the input value according to the processing method and extracts documents using a selected database according to the characteristics of the subquery, and a data set generation step (S130) that generates a data set by mapping documents and subqueries.
[0031] The processing method determination step evaluates the complexity of the query included in the input value and can determine the processing method based on the complexity of the query. (S110)
[0032] The operation of the processing method determination step is explained with reference to FIG. 2.
[0033] For example, the input value may correspond to data entered by a user into the aforementioned language model. In this case, the input value may be in the form of text including conjunctions, symbols, interrogative words, and queries, and may further include numerical forms. However, the present embodiment is not limited to this, and various input values may be used.
[0034] As another example, referring to FIG. 2, the complexity of a query can be evaluated using at least one of the conjunctions, symbols, and interrogative words included in the query. (S111)
[0035] The complexity of a query can be determined based on whether it is logically complex, using the number of conjunctions included in the query and their usage patterns. For example, if a query contains multiple conjunctions, it may be deemed complex.
[0036] In addition, the complexity of a query can be determined based on whether it is logically complex, using mathematical or logical symbols included in the query. For example, if a query contains multiple symbols, it may be judged to be complex.
[0037] In addition, the complexity of a query can be defined by its purpose and scope based on the interrogative words included in the query, and a query containing multiple interrogative words may be judged to be complex.
[0038] However, the present embodiment is not limited to this, and various factors may be used to evaluate the complexity of a query. A detailed method for evaluating the complexity of a query is described below.
[0039] For example, the operation to evaluate query complexity may include preprocessing input values, analyzing the query's syntax structure, and calculating the complexity score. This is explained using the example 'Display only data completed after 2024'.
[0040] For example, the operation of preprocessing input values can be implemented by Byte Pair Encoding (BPE), an algorithm that decomposes and tokenizes input values included in a language model, and conjunctions, symbols, interrogative words, etc., can be extracted as tokens.
[0041] In this case, the tokens may include 'SELECT', '*', 'FROM', 'data', 'WHERE', 'date', '>', '2024-01-01', 'AND', 'status', '=', and 'completed'.
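As an illustration of Byte Pair Encoding in general, the following minimal sketch learns merge rules from a toy word list and applies them to a new word. It is not the tokenizer of any particular language model; the function names `learn_bpe`, `merge_pair`, and `encode` are hypothetical. Note that BPE yields subword units, whereas the tokens listed above are query-level tokens, so a real pipeline would presumably combine both.

```python
from collections import Counter
from typing import List, Tuple


def merge_pair(symbols: Tuple[str, ...], pair: Tuple[str, str]) -> Tuple[str, ...]:
    """Replace every adjacent occurrence of `pair` with the merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words: List[str], num_merges: int) -> List[Tuple[str, str]]:
    """Learn up to `num_merges` merge rules from a list of words."""
    vocab = Counter(tuple(w) for w in words)  # word (as symbols) -> frequency
    merges: List[Tuple[str, str]] = []
    for _ in range(num_merges):
        pairs: Counter = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # nothing left to merge
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        new_vocab: Counter = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word: str, merges: List[Tuple[str, str]]) -> List[str]:
    """Tokenize a word by applying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)
```
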
[0042] In addition, the operation of analyzing the syntactic structure of a query can be implemented by an Attention Mechanism included in the language model that can analyze the syntactic structure of the query, and the syntactic structure of the query can be analyzed based on extracted tokens such as conjunctions, symbols, and interrogative words.
[0043] In this operation, tokenized conjunctions, symbols, and interrogative words generate a syntax tree, and the relationship between each condition and operator can be visually analyzed using the generated syntax tree.
[0044] For example, a syntax tree can be generated for date > '2024-01-01' AND status = 'completed'. In this case, it can be analyzed that the conditions date > '2024-01-01' and status = 'completed' are connected by AND.
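The tree described above can be sketched with a small rule-based parser. This is a plain substitute for illustration only: the text attributes the analysis to an attention mechanism, which is not reproduced here. The classes `Condition` and `And` and the function `parse_conditions` are hypothetical names, and the parser assumes every condition is exactly a field, an operator, and a value.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Condition:
    field: str
    op: str
    value: str


@dataclass
class And:
    children: List[Condition]


def parse_conditions(tokens: List[str]) -> Union[Condition, And]:
    """Group tokens into (field, op, value) conditions joined by AND."""
    groups: List[List[str]] = []
    current: List[str] = []
    for tok in tokens:
        if tok.upper() == "AND":
            groups.append(current)
            current = []
        else:
            current.append(tok)
    groups.append(current)

    conditions = []
    for group in groups:
        if len(group) != 3:
            raise ValueError(f"expected field, operator, value; got {group}")
        conditions.append(Condition(*group))
    # A single condition needs no AND node above it.
    return conditions[0] if len(conditions) == 1 else And(conditions)


tree = parse_conditions(
    ["date", ">", "2024-01-01", "AND", "status", "=", "completed"]
)
```

Here `tree` is an `And` node whose two children are the date condition and the status condition.
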
[0045] In addition, the operation of calculating the complexity score can be implemented by Contextual Embedding, which can score complexity by setting weights for conjunctions, symbols, and interrogative words included in the language model.
[0046] For example, the example phrase may have 2 conjunctions, 3 symbols, and 1 interrogative word, and the complexity score may be 6.
[0047] The complexity of a query can be evaluated from the complexity score calculated by combining the aforementioned operations. For example, if the criterion for evaluating query complexity is set to 2 points, the query is evaluated as simple if its complexity score is less than or equal to 2, and as not simple if its complexity score is greater than 2.
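The scoring and threshold rule can be sketched as a weighted sum over the three token counts. Unit weights are an assumption (they are consistent with the example of 2 conjunctions, 3 symbols, and 1 interrogative word giving a score of 6, but the text does not state the weights), and `TokenCounts`, `complexity_score`, and `processing_method` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenCounts:
    conjunctions: int
    symbols: int
    interrogatives: int


def complexity_score(
    counts: TokenCounts,
    w_conj: float = 1.0,  # assumed weight, not given in the text
    w_sym: float = 1.0,   # assumed weight, not given in the text
    w_int: float = 1.0,   # assumed weight, not given in the text
) -> float:
    """Weighted sum of conjunction, symbol, and interrogative counts."""
    return (
        w_conj * counts.conjunctions
        + w_sym * counts.symbols
        + w_int * counts.interrogatives
    )


def processing_method(counts: TokenCounts, threshold: float = 2.0) -> str:
    """Scores at or below the threshold are simple; anything above is complex."""
    return "simple" if complexity_score(counts) <= threshold else "complex"


example = TokenCounts(conjunctions=2, symbols=3, interrogatives=1)
```

With these assumptions, `complexity_score(example)` is 6 and `processing_method(example)` selects the complex processing method.
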
[0048] However, the complexity of the query is not limited to the present embodiment and can be evaluated by a combination of various methods, various algorithms, and various operations.
[0049] As another example, referring to FIG. 2, the processing method determination step may determine the processing method as a simple processing method if the complexity of the query is evaluated as simple, and determine the processing method as a complex processing method if the complexity of the query is evaluated as not simple. (S112)
[0050] If the complexity of the query is evaluated as simple, a simple processing method is determined, allowing documents to be extracted quickly in the document extraction step. If the complexity of the query is evaluated as not simple, a complex processing method is determined, and documents are extracted using the two document extraction methods explained later, which may slow the processing of information. Therefore, to improve processing speed, the processing method needs to be determined differently depending on the complexity of the query.
[0051] In addition, the step of determining the processing method is not limited to the present embodiment and can determine the processing method in various ways.
[0052] Below, the operation of extracting documents in the document extraction stage according to either the simple processing method or the complex processing method is explained.
[0053] Meanwhile, the document extraction step can generate a subquery using input values according to the processing method, and extract documents using a selected database according to the characteristics of the subquery. (S120)
[0054] The operation of the document extraction step is explained with reference to FIG. 3.
[0055] For example, the document extraction step can generate a subquery using input values depending on the processing method. (S121)
[0056] For example, a subquery may include a first subquery generated when the processing method is simple and two or more second subqueries generated when the processing method is complex.
[0057] As another example, in the document extraction step, if the processing method corresponds to a simple processing method, the query included in the input value can be corrected to generate a first subquery.
[0058] For example, in order to improve the processing speed of information because the complexity of the query included in the input value is evaluated as simple, the document extraction step can generate a first subquery by performing only a correction operation without needing to decompose the query included in the input value.
[0059] The following explanation uses the case where the input value is "Tell me the strategy Company A chose in fields such as solid electrolytes and lithium metal amidst the electric vehicle market chasm."
[0060] For example, the operation of generating the first subquery may include the operation of extracting key elements and question intent, the operation of matching a query pattern by correcting the question intent, and the operation of converting the result value of the matched query pattern into SQL.
[0061] In this case, the operation of extracting key elements and question intent can be performed by utilizing the Attention Mechanism included in the language model.
[0062] In this case, the subject is "Company A," the objects are "solid electrolytes" and "lithium metals," the situation is the "chasm in the electric vehicle market," and the intention of the question is "to know Company A's strategy."
[0063] In addition, the operation of matching query patterns by correcting the question intent can be implemented by a rule-based pattern matching algorithm included in the language model.
[0064] In this case, query pattern matching can be performed by removing "to know" from the question intent, converting the keyword "strategy" included in the question intent to "extraction," mapping "object" to "field," mapping "situation" to "market situation," and mapping "subject" to "company."
[0065] In addition, the operation of converting result values matched with query patterns into SQL can be implemented by a Seq2Seq algorithm that converts result values matched with query patterns, which are in a natural language format included in the language model, into a query (SQL) format so that they can be input into the database.
[0066] For example, the result value matched by the query pattern can be expressed as "Return of solid electrolyte and lithium metal strategy in the electric vehicle market chasm situation of Company A," and the result value matched by the query pattern can be generated as the first subquery.
[0067] In addition, the first subquery can be expressed in a query (SQL) format that can be entered into a database.
[0068] For example, the first subquery may include content where 'goal' is mapped to 'strategy', 'company' is mapped to 'Company A', 'market situation' is mapped to 'EV chasm', and 'field' is mapped to 'solid electrolyte' and 'lithium metal'.
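One way to picture the first subquery in SQL form is a template that turns the slot mapping into a parameterized statement. This is a simple template substitute, not the Seq2Seq conversion the text describes. The table name `documents`, the column names, and the function `slots_to_sql` are all assumptions made for illustration; column names are interpolated directly and are assumed to come from a trusted, fixed schema.

```python
from typing import Dict, List, Tuple, Union

Slots = Dict[str, Union[str, List[str]]]


def slots_to_sql(slots: Slots, table: str = "documents") -> Tuple[str, List[str]]:
    """Build a parameterized SELECT from a slot-to-value mapping."""
    clauses: List[str] = []
    params: List[str] = []
    for column, value in slots.items():
        values = value if isinstance(value, list) else [value]
        placeholders = ", ".join("?" for _ in values)
        clauses.append(f"{column} IN ({placeholders})")
        params.extend(values)  # values are bound, never interpolated
    sql = f"SELECT * FROM {table} WHERE " + " AND ".join(clauses)
    return sql, params


# Slot values taken from the example in the text; key names are assumed.
first_subquery: Slots = {
    "goal": "strategy",
    "company": "Company A",
    "market_situation": "EV chasm",
    "field": ["solid electrolyte", "lithium metal"],
}
sql, params = slots_to_sql(first_subquery)
```
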
[0069] However, not limited to the present embodiment, the first subquery may include mappings of various contents, and the document extraction step may generate the first subquery using various algorithms included in the language model based on input values.
[0070] As another example, if the processing method corresponds to a complex processing method, the document extraction step may generate two or more second subqueries by decomposing and correcting the query included in the input value.
[0071] The operation of generating two or more second subqueries of the present disclosure adds an operation of decomposing the query to the operation of generating the first subquery. That is, since the remaining operations are similar in operation, function, and control method to those used for generating the first subquery, the present embodiment focuses on the operation of decomposing the query.
[0072] In this case, the explanation is provided using the input value "Tell me about the strategies Company A has chosen in fields such as solid electrolytes and lithium metal amidst the electric vehicle market chasm, and its competitiveness in light of recent global trends."
[0073] For example, the operation of decomposing a query may include the operation of extracting key elements and question intent, and the operation of decomposing the query by analyzing dependencies.
[0074] In this case, the operation of extracting key elements and question intent can be implemented by an Attention Mechanism included in the language model, and key elements and question intent can be extracted from the query included in the input value.
[0075] For example, the subject is "Company A," the objects are "solid electrolytes" and "lithium metals," the situation is the "chasm in the electric vehicle market," and the intent of the question is "Company A's strategy" and "competitiveness in light of global circumstances."
[0076] In addition, the operation of decomposing a query by analyzing dependencies can be implemented using the Attention Mechanism included in the language model.
[0077] For example, "strategy" can be linked to "solid electrolytes" and "lithium metals," and "competitiveness" can be linked to "recent global circumstances."
[0078] That is, using the query included in the input value, the language model can decompose the query into first decomposition data ("strategy," "solid electrolyte," and "lithium metal") and second decomposition data ("competitiveness" and "recent global situation").
[0079] Subsequently, as described above, each query pattern can be matched using a Rule-based Pattern Matching algorithm included in the language model based on the respective decomposed first and second decomposed data, core elements, and question intent.
[0080] In addition, the language model can generate two or more second subqueries by inputting the result values of each query pattern matched into the Seq2Seq algorithm.
[0081] For example, the query included in the input value can be converted into a text format, such as "Company A's strategy regarding solid electrolytes and lithium metal" and "Company A's competitiveness in light of global conditions".
[0082] In this case, the query converted into text format can be generated into two or more second subqueries in the form of a query (SQL) that can be entered into a database.
[0083] For example, a second subquery can map 'goal' to 'strategy', 'company' to 'Company A', 'market_situation' to 'EV chasm', and 'field' to 'solid electrolyte' and 'lithium metal'.
[0084] Another second subquery can map 'goal' to 'competitiveness', 'company' to 'Company A', and 'recent trends' to 'global situation'.
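The two second subqueries above can be sketched as one slot mapping per question intent, where each intent carries the elements it was linked to and all subqueries share the common subject. The linking itself is taken as given here (the text attributes it to the attention mechanism); `build_second_subqueries` and the key names are hypothetical.

```python
from typing import Dict, List


def build_second_subqueries(
    shared: Dict[str, object],
    intent_links: Dict[str, Dict[str, object]],
) -> List[Dict[str, object]]:
    """Create one subquery per intent from shared and intent-specific slots."""
    subqueries = []
    for intent, linked_slots in intent_links.items():
        subqueries.append({"goal": intent, **shared, **linked_slots})
    return subqueries


# Values follow the example in the text; key names are assumed.
second_subqueries = build_second_subqueries(
    shared={"company": "Company A"},
    intent_links={
        "strategy": {
            "market_situation": "EV chasm",
            "field": ["solid electrolyte", "lithium metal"],
        },
        "competitiveness": {"recent_trends": "global situation"},
    },
)
```
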
[0085] However, not limited to the present embodiment, the second subquery may include mappings of various contents, and the document extraction step may generate the second subquery using various algorithms included in the language model based on input values.
[0086] As another example, the document extraction step can select a database based on the characteristics of the subquery (S122) and extract documents using the selected database (S123).
[0087] For example, if the subquery is the first subquery, the explanation uses the example described above, in which the first subquery may include content where 'goal' is mapped to 'strategy', 'company' is mapped to 'Company A', 'market situation' is mapped to 'EV chasm', and 'field' is mapped to 'solid electrolyte' and 'lithium metal'.
[0088] For example, the document extraction step may select any one of 'goal', 'company', and 'market situation' included in the first subquery as a feature of the first subquery, and a database may be selected according to that feature.
[0089] In this case, if the feature of the first subquery corresponds to either 'goal' or 'market situation', a database from which news can be extracted may be selected. As another example, if the feature of the first subquery is 'company', the internal knowledge database of 'Company A' may be selected.
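The selection rule in this example amounts to a lookup from feature to database. A minimal sketch follows, in which the database identifiers `news_db` and `internal_knowledge_db`, the feature key spellings, and the function `select_database` are hypothetical.

```python
# Assumed routing table: goal and market-situation features go to a news
# source, the company feature goes to the company's internal knowledge base.
DATABASE_BY_FEATURE = {
    "goal": "news_db",
    "market_situation": "news_db",
    "company": "internal_knowledge_db",
}


def select_database(feature: str) -> str:
    """Return the database name for a subquery feature."""
    try:
        return DATABASE_BY_FEATURE[feature]
    except KeyError:
        raise ValueError(f"no database configured for feature {feature!r}")
```
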
[0090] As another example, the document extraction step can further determine the dependency relationships of each feature of two or more second subqueries to extract documents simultaneously or in order.
[0091] For example, dependency relationships can be determined by analyzing the structure of two or more second subqueries using an Attention Mechanism included in the language model, understanding the context of the second subqueries and learning the relationships using Embeddings included in the language model, and calculating similarity using a Cosine similarity determination algorithm included in the language model. Through this, it is possible to determine how related two or more second subqueries are.
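The similarity part of this determination can be sketched with plain cosine similarity over embedding vectors. How the embeddings are produced is outside this sketch, and the 0.5 cut-off in `has_dependency` is an assumed value, since the text does not specify a threshold.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated
    return dot / (norm_a * norm_b)


def has_dependency(
    a: Sequence[float], b: Sequence[float], threshold: float = 0.5
) -> bool:
    """Assumed rule: subqueries are dependent when similarity meets the threshold."""
    return cosine_similarity(a, b) >= threshold
```
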
[0092] As another example, if the document extraction step determines that there are no dependencies, it can select a database based on the characteristics of each of two or more second subqueries and extract documents simultaneously.
[0093] That is, if it is determined that each of two or more second subqueries is not related to one another, in order to reduce the document extraction time, the document extraction step can select the characteristics of each of the two or more second subqueries, select each database, and extract documents simultaneously.
[0094] As another example, in the document extraction step, if a dependency relationship is determined, databases are selected according to the characteristics of each of two or more second subqueries, the order is determined according to the dependency relationship, and documents can be extracted according to the order.
[0095] That is, if it is determined that the two or more second subqueries are related to one another, then in order to increase the completeness of the resulting data set, the document extraction step selects a database according to the characteristics of each second subquery, evaluates the dependency relationship of each, ranks the second subqueries so that a stronger dependency relationship receives a higher priority, and extracts documents according to that ranking.
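The two extraction modes can be sketched as follows: concurrent extraction when no dependency is found, and sequential extraction in descending order of dependency score otherwise. The `fetch` callable stands in for the database lookup, and `extract_documents` and `dependency_scores` are hypothetical names; subqueries are represented as strings for brevity.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional


def extract_documents(
    subqueries: List[str],
    fetch: Callable[[str], List[str]],
    dependency_scores: Optional[Dict[str, float]] = None,
) -> Dict[str, List[str]]:
    """Extract documents per subquery, concurrently or in dependency order."""
    if not dependency_scores:
        # No dependency: run all lookups at the same time.
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(fetch, subqueries))
        return dict(zip(subqueries, results))

    # Dependency found: strongest dependency first, one lookup at a time.
    ordered = sorted(
        subqueries, key=lambda q: dependency_scores.get(q, 0.0), reverse=True
    )
    return {q: fetch(q) for q in ordered}
```

Because Python dictionaries preserve insertion order, the keys of the returned mapping also record the order in which the dependent subqueries were processed.
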
[0096] However, the present embodiment is not limited thereto, and the document extraction step may select or extract features of the subquery in various ways, and various methods for extracting documents may be used.
[0097] In addition, the document extraction step is not limited to the present embodiment, and various algorithms, models, methods, etc. capable of determining dependency relationships included in the language model may be used.
[0098] Meanwhile, the dataset creation step can create a dataset by mapping documents and subqueries. (S130)
[0099] For example, in the data set creation step, if the processing method determination step determines a simple processing method, the extracted document and the first subquery can be mapped to create a data set in text format.
[0100] For example, a dataset can be created by mapping the first subquery to the entire extracted document.
[0101] As another example, a dataset can be created in the order of the first subquery, the title of the extracted document, and the body of the extracted document.
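The ordering described above (first subquery, then document title, then document body) can be sketched as a plain-text record per document. The `Document` class, the newline separator, and the function names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Document:
    title: str
    body: str


def to_record(subquery: str, doc: Document) -> str:
    """One text record: subquery, document title, document body, in that order."""
    return "\n".join([subquery, doc.title, doc.body])


def build_dataset(subquery: str, docs: List[Document]) -> List[str]:
    """Map the same subquery onto every extracted document."""
    return [to_record(subquery, doc) for doc in docs]
```
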
[0102] However, the mapping method of the data set can be configured in various ways, not limited to the present embodiment.
[0103] As another example, in the data set creation step, if it is determined that the processing method is a complex processing method in the processing method determination step and determined that there is no dependency relationship in the document extraction step, the simultaneously extracted documents can be mapped to two or more second subqueries corresponding to each document, and the mapped second subqueries and documents can be integrated to create a data set.
[0104] For example, a dataset can be generated by mapping each second subquery to the title of the extracted document and the body of the extracted document.
[0105] As another example, the dataset can be generated by mapping each second subquery to the title of the extracted document and the date of the extracted document.
[0106] However, the mapping method of the data set can be configured in various ways, not limited to the present embodiment.
[0107] As another example, in the data set creation step, if it is determined that the processing method is a complex processing method in the processing method determination step and determined that there is a dependency relationship in the document extraction step, the data set of the Nth order can be created based on the documents extracted in order and the second subquery of the Nth order (where N is a natural number).
[0108] In this case, the document extraction step can integrate the Nth-order dataset and the N+1th-order dataset to select features again, select a database again based on the features, and extract documents. Then, the dataset creation step can create the N+1th-order dataset based on the extracted documents and the N+1th-order second subquery.
[0109] This process is repeated until the last step, and finally, a single data set can be generated.
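The repeated process can be sketched as a loop in which each step's extraction sees the data set accumulated so far, and the final result is a single merged data set. This is one reading of the description, not a definitive one; `build_ordered_dataset` and the two-argument `fetch` callable (subquery plus accumulated records) are hypothetical.

```python
from typing import Callable, List, Tuple

Record = Tuple[str, str]  # (second subquery, extracted document)


def build_ordered_dataset(
    ordered_subqueries: List[str],
    fetch: Callable[[str, List[Record]], List[str]],
) -> List[Record]:
    """Build the data set step by step, in dependency order."""
    dataset: List[Record] = []
    for subquery in ordered_subqueries:
        # Earlier records are passed as context so that feature and
        # database selection can take them into account.
        docs = fetch(subquery, list(dataset))
        dataset.extend((subquery, doc) for doc in docs)
    return dataset
```
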
[0110] Additionally, the dataset can be generated by mapping each second subquery to the title of the extracted document and the body of the extracted document. Alternatively, the dataset can be generated by mapping each second subquery to the title of the extracted document and the date of the extracted document.
[0111] However, the mapping method of the data set can be configured in various ways, not limited to the present embodiment.
[0112] FIG. 4 is a block diagram illustrating an execution plan generation device according to the embodiments.
[0113] Referring to FIG. 4, the execution plan generating device (1) may include a processing method determining unit (410) that evaluates the complexity of a query included in the input value and determines a processing method according to the complexity of the query, a document extraction unit (420) that generates a subquery from the input value according to the processing method and extracts a document using a selected database according to the characteristics of the subquery, and a data set generating unit (430) that generates a data set by mapping the document and the subquery.
[0114] The processing method determining unit (410) can evaluate the complexity of the query included in the input value and determine the processing method according to the complexity of the query.
[0115] For example, the input value may correspond to data entered by a user into the aforementioned language model. In this case, the input value may be in the form of text including conjunctions, symbols, interrogative words, and intentions, and may further include numerical forms. However, the present embodiment is not limited to this, and various input values may be used.
[0116] As another example, the complexity of a query can be evaluated using at least one of the conjunctions, symbols, and interrogative words included in the query.
[0117] The complexity of a query can be determined based on whether it is logically complex, using the number of conjunctions included in the query and their usage patterns. For example, if a query contains multiple conjunctions, it may be deemed complex.
[0118] In addition, the complexity of a query can be determined based on whether it is logically complex, using mathematical or logical symbols included in the query. For example, if a query contains multiple symbols, it may be judged to be complex.
[0119] In addition, the complexity of a query can be defined by its purpose and scope based on the interrogative words included in the query, and a query containing multiple interrogative words may be judged to be complex.
[0120] However, the present embodiment is not limited to this, and various factors may be used to evaluate the complexity of a query. A detailed method for evaluating the complexity of a query is described below.
[0121] For example, the operation to evaluate query complexity may include preprocessing input values, analyzing the query's syntax structure, and calculating the complexity score. This is explained using the example 'Display only data completed after 2024'.
[0122] For example, the operation of preprocessing input values can be implemented by Byte Pair Encoding (BPE), an algorithm included in the language model that decomposes and tokenizes input values, so that conjunctions, symbols, interrogative words, etc., can be extracted as tokens.
[0123] In this case, the tokens may include 'SELECT', '*', 'FROM', 'data', 'WHEN', 'date', '>', '2024-01-01', 'AND', 'status', '=', and 'completed'.
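As a rough illustration of how BPE arrives at tokens, the sketch below performs one merge step of the standard procedure: count adjacent symbol pairs across a corpus, then merge the most frequent pair into a single symbol. The toy corpus and the function names `most_frequent_pair` and `merge_pair` are assumptions made for illustration; the embodiment does not specify the vocabulary or merge rules of the language model.

```python
from collections import Counter


def most_frequent_pair(words):
    """Return the most frequent adjacent symbol pair, or None if there is none.

    `words` maps a tuple of symbols to its frequency in the corpus.
    """
    pairs = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return max(pairs, key=pairs.get) if pairs else None


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


# Toy corpus (hypothetical): character-level symbols with frequencies.
corpus = {("d", "a", "t", "a"): 3, ("d", "a", "y"): 2}
pair = most_frequent_pair(corpus)
corpus_after_merge = merge_pair(corpus, pair)
```

Repeating the two functions in a loop yields progressively longer subword tokens; how many merges to run is a design choice left open here.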
[0124] In addition, the operation of analyzing the syntactic structure of a query can be implemented by an Attention Mechanism included in the language model that can analyze the syntactic structure of the query, and the syntactic structure of the query can be analyzed based on extracted tokens such as conjunctions, symbols, and interrogative words.
[0125] In this operation, a syntax tree is generated from the tokenized conjunctions, symbols, and interrogative words, and the relationship between each condition and operator can be visually analyzed using the generated syntax tree.
[0126] For example, a syntax tree can be generated for date > '2024-01-01' AND status = 'completed'. In this case, it can be analyzed that the conditions date > '2024-01-01' and status = 'completed' are connected by AND.
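One possible in-memory form of such a syntax tree is sketched below. The class names `Condition` and `BoolOp` and the helper `render` are hypothetical; the embodiment only states that the tree relates conditions and operators, not how it is stored.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Condition:
    """Leaf node: a single comparison such as date > '2024-01-01'."""
    column: str
    operator: str
    value: str


@dataclass
class BoolOp:
    """Inner node: two subtrees joined by a logical operator such as AND."""
    operator: str
    left: "Node"
    right: "Node"


Node = Union[Condition, BoolOp]


def render(node):
    """Serialize the tree back to text, parenthesizing each logical operator."""
    if isinstance(node, Condition):
        return f"{node.column} {node.operator} '{node.value}'"
    return f"({render(node.left)} {node.operator} {render(node.right)})"


# The two conditions of the example, connected by AND at the root.
tree = BoolOp(
    "AND",
    Condition("date", ">", "2024-01-01"),
    Condition("status", "=", "completed"),
)
```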
[0127] In addition, the operation of calculating the complexity score can be implemented by Contextual Embedding included in the language model, which can score complexity by setting weights for conjunctions, symbols, and interrogative words.
[0128] For example, the example phrase may have 2 conjunctions, 3 symbols, and 1 interrogative word, and the complexity score may be 6.
[0129] The complexity of a query can be evaluated by calculating a complexity score based on a combination of the aforementioned operations. For example, if the criterion for evaluating query complexity is set to 2 points, the query can be evaluated as simple if the complexity score is less than or equal to 2, and as not simple if the complexity score is greater than 2.
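The scoring and threshold comparison can be sketched as follows. Unit weights are assumed so that 2 conjunctions, 3 symbols, and 1 interrogative word give the score of 6 from the example; the actual weights are not stated in the embodiment, and the names `TokenCounts`, `complexity_score`, and `evaluate_complexity` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenCounts:
    """Number of tokens of each category found in the query."""
    conjunctions: int
    symbols: int
    interrogatives: int


def complexity_score(counts, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the category counts (unit weights are an assumption)."""
    w_conj, w_sym, w_int = weights
    return (
        w_conj * counts.conjunctions
        + w_sym * counts.symbols
        + w_int * counts.interrogatives
    )


def evaluate_complexity(score, threshold=2.0):
    """Scores at or below the threshold are simple, all others not simple."""
    return "simple" if score <= threshold else "not simple"


example = TokenCounts(conjunctions=2, symbols=3, interrogatives=1)
score = complexity_score(example)
label = evaluate_complexity(score)
```

With the threshold of 2 points, the example phrase scores 6 and is therefore evaluated as not simple.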
[0130] However, the complexity of the query is not limited to the present embodiment and can be evaluated by a combination of various methods, various algorithms, and various operations.
[0131] As another example, the processing method determination unit (410) may determine the processing method as a simple processing method when the complexity of the query is evaluated as simple, and determine the processing method as a complex processing method when the complexity of the query is evaluated as not simple.
[0132] If the complexity of the query is evaluated as simple, a simple processing method is determined, and the document extraction unit (420) can quickly extract the document. If the complexity of the query is evaluated as not simple, a complex processing method is determined, and the document extraction unit (420) can extract the document using two document extraction methods to be explained later, so the information may be processed slowly. Therefore, in order to improve the processing speed of the information, it is necessary to determine the processing method differently depending on the complexity of the query.
[0133] In addition, the processing method determining unit (410) is not limited to this embodiment and can determine the processing method in various ways.
[0134] Below, the operation of extracting documents from the document extraction unit (420) according to either a simple processing method or a complex processing method will be explained.
[0135] Meanwhile, the document extraction unit (420) can generate a subquery using input values according to the processing method and extract documents using a selected database according to the characteristics of the subquery.
[0136] For example, the document extraction unit (420) can generate a subquery using input values according to the processing method.
[0137] For example, a subquery may include a first subquery generated when the processing method is simple and two or more second subqueries generated when the processing method is complex.
[0138] As another example, the document extraction unit (420) can generate a first subquery by correcting the query included in the input value when the processing method corresponds to a simple processing method.
[0139] For example, because the complexity of the query included in the input value is evaluated as simple, the document extraction unit (420) can generate the first subquery by performing only a correction operation, without decomposing the query, in order to improve the processing speed of information.
[0140] The following explanation uses the case where the input value is "Tell me the strategy Company A chose in fields such as solid electrolytes and lithium metal amidst the electric vehicle market chasm."
[0141] For example, the operation of generating the first subquery may include the operation of extracting key elements and question intent, the operation of matching a query pattern by correcting the question intent, and the operation of converting the result value of the matched query pattern into SQL.
[0142] In this case, the operation of extracting key elements and question intent can be performed by utilizing the Attention Mechanism included in the language model.
[0143] In this case, the subject is "Company A," the objects are "solid electrolytes" and "lithium metals," the situation is the "chasm in the electric vehicle market," and the intention of the question is "to know Company A's strategy."
[0144] In addition, the operation of matching query patterns by correcting the question intent can be implemented by a rule-based pattern matching algorithm included in the language model.
[0145] In this case, query pattern matching can be performed by removing "to know" from the question intent, converting the keyword "strategy" included in the question intent to "extraction," mapping the "objects" to "field," mapping the "situation" to "market situation," and mapping the "subject" to "company."
[0146] In addition, the operation of converting result values matched with query patterns into SQL can be implemented by a Seq2Seq algorithm included in the language model, which converts the result values matched with query patterns from a natural language format into a query (SQL) format so that they can be input into the database.
[0147] For example, the result value matched by the query pattern can be expressed as "Return of solid electrolyte and lithium metal strategy in the electric vehicle market chasm situation of Company A," and the result value matched by the query pattern can be generated as the first subquery.
[0148] In addition, the first subquery can be expressed in a query (SQL) format that can be entered into a database.
[0149] For example, the first subquery may include content where 'goal' is mapped to 'strategy', 'company' is mapped to 'Company A', 'market situation' is mapped to 'EV chasm', and 'field' is mapped to 'solid electrolyte' and 'lithium metal'.
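To make the shape of such a first subquery concrete, the sketch below renders the mapping as a parameterized SQL statement. It is a hand-written template standing in for the conversion step, not a Seq2Seq model; the table name `documents`, the column names, and the function `build_subquery_sql` are assumptions for illustration.

```python
def build_subquery_sql(mapping, table="documents"):
    """Render a key-to-value mapping as a parameterized SQL SELECT.

    Column and table names are interpolated directly, so they are assumed to
    come from a trusted, fixed schema; values are passed as parameters.
    """
    clauses, params = [], []
    for column, value in mapping.items():
        values = list(value) if isinstance(value, (list, tuple)) else [value]
        placeholders = ", ".join("?" for _ in values)
        clauses.append(f"{column} IN ({placeholders})")
        params.extend(values)
    sql = f"SELECT * FROM {table} WHERE " + " AND ".join(clauses)
    return sql, params


# Mapping taken from the example; the column spellings are hypothetical.
first_subquery = {
    "goal": "strategy",
    "company": "Company A",
    "market_situation": "EV chasm",
    "field": ["solid electrolyte", "lithium metal"],
}
sql, params = build_subquery_sql(first_subquery)
```

Keeping the values as parameters rather than splicing them into the string lets the same statement be reused with a database driver that supports `?` placeholders.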
[0150] However, not limited to the present embodiment, the first subquery may include various contents by mapping, and the document extraction unit (420) may generate the first subquery using various algorithms included in the language model based on input values.
[0151] As another example, the document extraction unit (420) can generate two or more second subqueries by decomposing and correcting the query included in the input value when the processing method corresponds to a complex processing method.
[0152] The operation of generating two or more second subqueries of the present disclosure adds an operation of decomposing the query to the operation of generating the first subquery. That is, since the remaining operations are similar in operation, function, and control method to those of generating the first subquery, the present embodiment focuses on the operation of decomposing the query.
[0153] In this case, the explanation is provided using the input value "Tell me about the strategies Company A has chosen in fields such as solid electrolytes and lithium metal amidst the electric vehicle market chasm, and its competitiveness in light of recent global trends."
[0154] For example, the operation of decomposing a query may include the operation of extracting key elements and question intent, and the operation of decomposing the query by analyzing dependencies.
[0155] In this case, the operation of extracting key elements and question intent can be implemented by an Attention Mechanism included in the language model, and key elements and question intent can be extracted from the query included in the input value.
[0156] For example, the subject is "Company A," the objects are "solid electrolytes" and "lithium metals," the situation is the "chasm in the electric vehicle market," and the intent of the question is "Company A's strategy" and "competitiveness in light of global circumstances."
[0157] In addition, the operation of decomposing a query by analyzing dependencies can be implemented using the Attention Mechanism included in the language model.
[0158] For example, "strategy" can be linked to "solid electrolytes" and "lithium metals," and "competitiveness" can be linked to "recent global circumstances."
[0159] That is, the language model can decompose the query included in the input value into first decomposition data ("strategy," "solid electrolyte," and "lithium metal") and second decomposition data ("competitiveness" and "recent global situation").
[0160] Subsequently, as described above, a query pattern can be matched for each of the first and second decomposition data using the rule-based pattern matching algorithm included in the language model, based on the decomposition data, the key elements, and the question intent.
[0161] In addition, the language model can generate two or more second subqueries by inputting the result value of each matched query pattern into the Seq2Seq algorithm.
[0162] For example, the query included in the input value can be converted into a text format, such as "Company A's strategy regarding solid electrolytes and lithium metal" and "Company A's competitiveness in light of global conditions".
[0163] In this case, the query converted into text format can be converted into two or more second subqueries in the form of a query (SQL) that can be entered into a database.
[0164] For example, a second subquery can map 'goal' to 'strategy', 'company' to 'Company A', 'market_situation' to 'EV chasm', and 'field' to 'solid electrolyte' and 'lithium metal'.
[0165] Another second subquery can map 'goal' to 'competitiveness', 'company' to 'Company A', and 'recent trends' to 'global situation'.
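The two mappings above can be produced from the decomposition data as in the following sketch, where each question intent becomes one second subquery and the elements shared by all intents are copied into each. The function `decompose` and the key spellings are hypothetical, and the grouping of elements per intent is given as input rather than derived by an Attention Mechanism.

```python
def decompose(shared, intents):
    """Build one subquery mapping per question intent.

    `shared` holds elements common to every subquery; `intents` maps each
    goal to the elements that the dependency analysis linked to it.
    """
    return [{"goal": goal, **shared, **linked} for goal, linked in intents.items()]


shared = {"company": "Company A"}
intents = {
    "strategy": {
        "market_situation": "EV chasm",
        "field": ["solid electrolyte", "lithium metal"],
    },
    "competitiveness": {"recent_trends": "global situation"},
}
second_subqueries = decompose(shared, intents)
```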
[0166] However, not limited to the present embodiment, the second subquery may include various contents by mapping, and the document extraction unit (420) may generate the second subquery using various algorithms included in the language model based on the input value.
[0167] As another example, the document extraction unit (420) can select a database according to the characteristics of the subquery and extract documents using the selected database.
[0168] For example, the case where the subquery is the first subquery is explained using the example described above, in which the first subquery maps 'goal' to 'strategy', 'company' to 'Company A', 'market situation' to 'electric vehicle chasm', and 'field' to 'solid electrolyte' and 'lithium metal'.
[0169] For example, the document extraction unit (420) can select any one of 'goal', 'company', and 'market situation' included in the first subquery as a feature of the first subquery, and a database can be selected according to the feature of the first subquery.
[0170] In this case, if the characteristic of the first subquery corresponds to either 'goal' or 'market situation', a database from which news can be extracted may be selected. As another example, if the characteristic of the first subquery is 'company', the internal knowledge database of 'Company A' may be selected.
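The selection rule of this example can be written as a small lookup, sketched below. The database identifiers `news_db` and `company_a_internal_kb` are placeholder names, and only the three features named in the example are handled.

```python
# Feature-to-database routing from the example (identifiers are hypothetical).
DATABASE_BY_FEATURE = {
    "goal": "news_db",
    "market_situation": "news_db",
    "company": "company_a_internal_kb",
}


def select_database(feature):
    """Return the database to query for the given subquery feature."""
    try:
        return DATABASE_BY_FEATURE[feature]
    except KeyError:
        raise ValueError(f"no database configured for feature {feature!r}") from None
```

A table-driven rule keeps the routing easy to extend when further features or databases are added.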
[0171] As another example, the document extraction unit (420) can further determine the dependency relationship of each of the features of two or more second subqueries to extract documents simultaneously or in order.
[0172] For example, dependency relationships can be determined by analyzing the structure of two or more second subqueries using an Attention Mechanism included in the language model, understanding the context of the second subqueries and learning the relationships using Embeddings included in the language model, and calculating similarity using a Cosine similarity determination algorithm included in the language model. Through this, it is possible to determine how related two or more second subqueries are.
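The similarity step can be illustrated with plain cosine similarity between two embedding vectors, as sketched below. The embedding vectors are assumed to be given, and the threshold of 0.5 in `has_dependency` is a hypothetical value; the embodiment does not state how similarity is turned into a dependency decision.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def has_dependency(u, v, threshold=0.5):
    """Treat two subqueries as dependent when similarity reaches the threshold.

    The threshold is an assumption made for this sketch.
    """
    return cosine_similarity(u, v) >= threshold
```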
[0173] As another example, if the document extraction unit (420) determines that there is no dependency relationship, it can select a database according to the characteristics of each of two or more second subqueries and extract documents simultaneously.
[0174] That is, if it is determined that the two or more second subqueries are not related to one another, the document extraction unit (420) can, in order to reduce the document extraction time, identify the characteristics of each of the two or more second subqueries, select a database for each, and extract documents simultaneously.
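Simultaneous extraction for independent subqueries could be sketched with a thread pool, as below. The `extract` callable is a hypothetical stand-in for selecting a database and retrieving documents for one subquery; the embodiment does not prescribe a concurrency mechanism.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_simultaneously(subqueries, extract):
    """Run `extract` for every subquery concurrently.

    Results are returned in the same order as `subqueries`.
    """
    if not subqueries:
        return []
    with ThreadPoolExecutor(max_workers=len(subqueries)) as pool:
        return list(pool.map(extract, subqueries))
```

Threads suit this case because document retrieval is typically dominated by waiting on the database rather than by computation.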
[0175] As another example, the document extraction unit (420) can, when it is determined that there is a dependency relationship, select a database according to the characteristics of each of two or more second subqueries, determine the order according to the dependency relationship, and extract documents according to the order.
[0176] That is, when it is determined that the two or more second subqueries are related to one another, in order to increase the completeness of the resulting data set, the document extraction unit (420) selects a database according to the characteristics of each of the two or more second subqueries, evaluates the dependency relationship for each, assigns the highest rank to the second subquery with the highest dependency relationship, assigns lower ranks as the dependency relationship decreases, and extracts documents according to the ranking.
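The ranking described here amounts to sorting the second subqueries by a dependency score in descending order, as in the sketch below. The numeric scores are assumed to come from the dependency evaluation; `rank_by_dependency` is a hypothetical name.

```python
def rank_by_dependency(subqueries, scores):
    """Order subqueries from highest to lowest dependency score.

    Ties keep their original relative order because the sort is stable.
    """
    if len(subqueries) != len(scores):
        raise ValueError("one score is required per subquery")
    order = sorted(range(len(subqueries)), key=lambda i: scores[i], reverse=True)
    return [subqueries[i] for i in order]
```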
[0177] However, not limited to the present embodiment, the document extraction unit (420) may select or extract features of the subquery in various ways, and various methods for extracting documents may be used.
[0178] In addition, the document extraction unit (420) is not limited to the present embodiment, and various algorithms, models, methods, etc. capable of determining dependency relationships included in the language model may be used.
[0179] Meanwhile, the data set generation unit (430) can generate a data set by mapping documents and subqueries.
[0180] For example, the data set generation unit (430) can generate a data set in text format by mapping the extracted document and the first subquery when the processing method determination unit (410) determines that it is a simple processing method.
[0181] For example, a dataset can be created by mapping the first subquery to the entire extracted document.
[0182] As another example, a dataset can be created in the order of the first subquery, the title of the extracted document, and the body of the extracted document.
[0183] However, the mapping method of the data set can be configured in various ways, not limited to the present embodiment.
[0184] In another example, when the processing method determination unit (410) determines that the processing method is a complex processing method and the document extraction unit (420) determines that there is no dependency relationship, the data set generation unit (430) can map each of the simultaneously extracted documents to the corresponding one of the two or more second subqueries, and integrate the mapped second subqueries and documents to create a data set.
[0185] For example, a dataset can be generated by mapping each second subquery to the title of the extracted document and the body of the extracted document.
[0186] As another example, the dataset can be generated by mapping each second subquery to the title of the extracted document and the date of the extracted document.
[0187] However, the mapping method of the data set can be configured in various ways, not limited to the present embodiment.
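The two mappings mentioned above (subquery with title and body, or subquery with title and date) can be sketched as one function with a selectable second field. The record layout and the name `build_dataset` are assumptions; documents are represented as plain dictionaries with `title`, `body`, and `date` keys for illustration.

```python
def build_dataset(pairs, second_field="body"):
    """Map each subquery to its document's title and one further field.

    `pairs` is a list of (subquery, document) tuples; `second_field` selects
    whether the body or the date accompanies the title.
    """
    if second_field not in ("body", "date"):
        raise ValueError("second_field must be 'body' or 'date'")
    return [
        {"subquery": subquery, "title": doc["title"], second_field: doc[second_field]}
        for subquery, doc in pairs
    ]
```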
[0188] As another example, when the processing method determination unit (410) determines that the processing method is a complex processing method and the document extraction unit (420) determines that there is a dependency relationship, the data set generation unit (430) can generate a data set of the Nth order (where N is a natural number) based on the documents extracted in order and the second subquery of the Nth order.
[0189] In this case, the document extraction unit (420) can integrate the data set of the Nth order and the second subquery of the (N+1)th order to select features again, select a database again according to the features, and extract documents. Then, the data set generation unit (430) can generate the data set of the (N+1)th order based on the extracted documents and the second subquery of the (N+1)th order.
[0190] This process is repeated until the last step, and finally, a single data set can be generated.
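The repetition over ordered subqueries can be sketched as a loop in which each extraction sees the data set built so far. The `extract` callable, which takes the current subquery and the accumulated data set, is a hypothetical interface chosen for this sketch; the embodiment does not define how the earlier data set is passed to the next extraction.

```python
def build_ordered_dataset(ranked_subqueries, extract):
    """Extract documents in rank order, feeding earlier results to later steps.

    `extract(subquery, dataset_so_far)` returns the documents for one step.
    """
    dataset = []
    for subquery in ranked_subqueries:
        documents = extract(subquery, list(dataset))  # copy: callee cannot mutate
        dataset.append({"subquery": subquery, "documents": documents})
    return dataset
```

The list returned after the final iteration corresponds to the single data set produced at the last step.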
[0191] Additionally, the dataset can be generated by mapping each second subquery to the title of the extracted document and the body of the extracted document. Alternatively, the dataset can be generated by mapping each second subquery to the title of the extracted document and the date of the extracted document.
[0192] However, the mapping method of the data set can be configured in various ways, not limited to the present embodiment.
[0193] The embodiments described above may be implemented within a computer system, for example, on a computer-readable recording medium. The computer system of the execution plan generation device may include at least one element among one or more processors, memory, storage, user interface inputs, and user interface outputs, and these may communicate with each other via a bus. Additionally, the computer system may also include a network interface for connecting to a network. The processor may be a CPU or a semiconductor device that executes processing instructions stored in memory and / or storage. The memory and storage may include various types of volatile / non-volatile storage media. For example, the memory may include ROM and RAM.
[0194] The foregoing description of the present invention is for illustrative purposes only, and those skilled in the art will understand that other specific forms can be easily modified without altering the technical spirit or essential features of the present invention. Therefore, the embodiments described above should be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in a combined form.
[0195] The scope of the present invention is defined by the claims set forth below rather than by the detailed description above, and all modifications or variations derived from the meaning and scope of the claims and equivalent concepts thereof should be interpreted as being included within the scope of the present invention.
[0196] The foregoing description is merely an illustrative explanation of the technical concept of the present disclosure, and those skilled in the art to which the present disclosure pertains may make various modifications and variations within the scope of the essential characteristics of the technical concept. Furthermore, since these embodiments are intended to explain, not limit, the technical concept of the present disclosure, the scope of the technical concept is not limited by these embodiments. The scope of protection of the present disclosure shall be interpreted by the claims below, and all technical concepts within an equivalent scope shall be interpreted as being included within the scope of rights of the present disclosure.
[0197]
[0198] CROSS-REFERENCE TO RELATED APPLICATION
[0199] This patent application claims priority under Section 119(a) of the U.S. Patent Act (35 U.S.C. § 119(a)) to Korean Patent Application No. 10-2024-0191037, filed on December 19, 2024, the entire contents of which are incorporated by reference into this patent application. Furthermore, this patent application claims priority in countries other than the United States for the same reasons as above, and the entire contents thereof are incorporated by reference into this patent application.
Claims
1. A method for generating an execution plan, comprising: a processing method determination step of evaluating the complexity of a query included in an input value and determining a processing method according to the complexity of the query; a document extraction step of generating a subquery using the input value according to the processing method, and extracting a document using a database selected according to the characteristics of the subquery; and a data set generation step of generating a data set by mapping the document and the subquery.
2. The method of claim 1, wherein the complexity of the query is evaluated using at least one of the conjunctions, symbols, and interrogative words included in the query.
3. The method of claim 2, wherein the processing method determination step determines the processing method as a simple processing method when the complexity of the query is evaluated as simple, and determines the processing method as a complex processing method when the complexity of the query is evaluated as not simple.
4. The method of claim 1, wherein, when the processing method corresponds to a simple processing method, the document extraction step generates a first subquery by correcting the query included in the input value.
5. The method of claim 1, wherein, when the processing method corresponds to a complex processing method, the document extraction step generates two or more second subqueries by decomposing and correcting the query included in the input value.
6. The method of claim 5, wherein the document extraction step further determines a dependency relationship of each of the features of the two or more second subqueries and extracts the documents simultaneously or in order.
7. The method of claim 6, wherein, when it is determined that the dependency relationship does not exist, the document extraction step selects a database according to the characteristics of each of the two or more second subqueries and extracts the documents simultaneously.
8. The method of claim 7, wherein the data set generation step maps each of the simultaneously extracted documents to the corresponding one of the two or more second subqueries, and generates the data set by integrating the mapped second subqueries and documents.
9. The method of claim 6, wherein, when it is determined that the dependency relationship exists, the document extraction step selects a database according to the characteristics of each of the two or more second subqueries, determines an order according to the dependency relationship, and extracts the documents according to the order.
10. An execution plan generation device, comprising: a processing method determination unit that evaluates the complexity of a query included in an input value and determines a processing method according to the complexity of the query; a document extraction unit that generates a subquery using the input value according to the processing method and extracts a document using a database selected according to the characteristics of the subquery; and a data set generation unit that generates a data set by mapping the document and the subquery.
11. The device of claim 10, wherein the complexity of the query is evaluated using at least one of the conjunctions, symbols, and interrogative words included in the query.
12. The device of claim 11, wherein the processing method determination unit determines the processing method as a simple processing method when the complexity of the query is evaluated as simple, and determines the processing method as a complex processing method when the complexity of the query is evaluated as not simple.
13. The device of claim 10, wherein, when the processing method corresponds to a simple processing method, the document extraction unit generates a first subquery by correcting the query included in the input value.
14. The device of claim 10, wherein, when the processing method corresponds to a complex processing method, the document extraction unit generates two or more second subqueries by decomposing and correcting the query included in the input value.
15. The device of claim 14, wherein the document extraction unit further determines a dependency relationship of each of the features of the two or more second subqueries and extracts the documents simultaneously or in order.
16. The device of claim 15, wherein, when it is determined that the dependency relationship does not exist, the document extraction unit selects a database according to the characteristics of each of the two or more second subqueries and extracts the documents simultaneously.
17. The device of claim 16, wherein the data set generation unit maps each of the simultaneously extracted documents to the corresponding one of the two or more second subqueries, and generates the data set by integrating the mapped second subqueries and documents.
18. The device of claim 15, wherein, when it is determined that the dependency relationship exists, the document extraction unit selects a database according to the characteristics of each of the two or more second subqueries, determines an order according to the dependency relationship, and extracts the documents according to the order.