Information processing device, information processing method, and program

The information processing device and method automate the generation and evaluation of QRA datasets for RAG systems, addressing the inefficiencies of existing methods by ensuring quality and diversity, thereby enhancing the reliability of RAG evaluations.

WO2026126401A1 · PCT designated stage · Publication Date: 2026-06-18 · NTT DOCOMO INC

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: NTT DOCOMO INC
Filing Date: 2024-12-11
Publication Date: 2026-06-18

AI Technical Summary

Technical Problem

Existing methods for generating QRA datasets for evaluating Retrieval Augmented Generation (RAG) systems are time-consuming and labor-intensive, and the appropriateness of the datasets is not adequately evaluated, which affects the reliability of RAG evaluation results.

Method used

An information processing device and method that includes a document receiving unit, QRA generation unit, and QRA evaluation unit to automatically generate, evaluate, and control the storage of QRA datasets, ensuring diversity and comprehensiveness by randomly selecting document sections as references and using AI models to create questions and answers, with evaluation based on methods like RAGAS.

Benefits of technology

Facilitates the generation of diverse and comprehensive QRA datasets efficiently, improving the reliability of RAG evaluation results by ensuring the quality and appropriateness of the generated datasets.

✦ Generated by Eureka AI based on patent content.

Figure (abstract drawing): JP2024043875_18062026_PF_FP_ABST

Abstract

An information processing device according to one aspect of the present disclosure comprises: a document reception unit that receives input data; a QRA generation unit that generates a set of questions, references, and answers (QRA dataset) on the basis of the input data; a QRA evaluation unit that evaluates the QRA dataset generated by the QRA generation unit; and a control unit that controls the accumulation of the QRA dataset according to the result of the evaluation by the QRA evaluation unit.

Description

Information Processing Apparatus, Information Processing Method, and Program

[0001] The present disclosure relates to an information processing apparatus, an information processing method, and a program. In particular, it relates to the generation of a QRA dataset used for evaluating so-called Retrieval Augmented Generation (RAG), which combines information retrieval with a generative artificial intelligence (AI) model.

[0002] In RAG, information related to a question (query) is retrieved, and natural-language sentences or answers are generated using the retrieved information. When RAG is operated within a company, it is important, from the perspective of optimizing resources across the entire organization, that the company using RAG can make its own decision on whether to introduce it. That is, it is important to be able to determine whether the RAG under consideration is worth introducing or not.

[0003] Currently, the introduction of an accuracy evaluation assistance tool is being considered to support decisions on introducing RAG. An evaluation dataset is prepared, consisting of the documents to be searched and a collection of QRA datasets related to those documents (Question: the question; Reference: the reference location; Answer: the ideal answer). When this QRA dataset collection is input into the accuracy evaluation assistance tool, the tool outputs an evaluation index based on a RAG evaluation metric such as RAGAS (RAG Assessment). Based on this evaluation index, it is possible to confirm whether the output of a content generation system using generative AI, for example an answer generated by RAG, is sufficiently accurate. This makes it easy to decide whether to adopt the RAG under consideration.

[0004] Patent Document 1: Japanese Patent No. 7530134. Patent Document 2: Japanese Patent No. 7452623.

[0005] As a system for verifying the output of a content generation system that uses generative AI, it has been proposed to check whether the output content is factually correct (see, for example, Patent Document 1). However, Patent Document 1 does not describe how to generate the information needed to verify the output of the content generation system.

[0006] A system has also been proposed that outputs evidence from a question and a text, and outputs an answer from the evidence and the question (see, for example, Patent Document 2). Patent Document 2 describes a device that learns using sets of questions, evidence, and answers; it does not generate QRA datasets for verifying the output of a content generation system. In addition, it requires a reference text P and a question Q as input, and it does not evaluate the appropriateness of a generated QRA dataset.

[0007] To improve the reliability of RAG evaluation results, a diverse and comprehensive collection of QRA datasets (multiple QRA datasets) is needed. However, creating such a collection is time-consuming and labor-intensive. Moreover, because an inappropriate collection of QRA datasets does not improve the reliability of RAG evaluation results, it is also necessary to determine whether each generated QRA dataset is appropriate.

[0008] An information processing device in one aspect of the present disclosure includes: a document receiving unit that receives input data; a QRA generation unit that generates a set of questions, references, and answers (QRA dataset) based on the input data; a QRA evaluation unit that evaluates the QRA dataset generated by the QRA generation unit; and a control unit that controls the storage of the QRA dataset according to the evaluation results of the QRA evaluation unit.

[0009] An information processing method in one aspect of the present disclosure includes an information processing device receiving input data, generating a QRA dataset based on the input data, evaluating the generated QRA dataset, and controlling the storage of the QRA dataset according to the evaluation result.

[0010] A program in one aspect of this disclosure causes a computer to receive input data, generate a QRA dataset based on the input data, evaluate the generated QRA dataset, and control the storage of the QRA dataset according to the evaluation result.

[0011] According to this disclosure, by generating QRA datasets and evaluating the generated QRA datasets, it is possible to generate a diverse and comprehensive collection of QRA datasets.

[0012] Brief description of the drawings:
Figure 1: Diagram showing the overall system configuration.
Figure 2: Diagram showing the configuration of the information processing device.
Figure 3: Functional block diagram of the information processing device.
Figure 4: Diagram showing an example of a QRA dataset.
Figure 5: Flowchart of the process executed by the information processing device.
Figure 6: Example of prompts input to the AI model when generating questions and answers using the AI model.

[0013] The embodiments of this disclosure will be described in detail below with reference to the drawings as appropriate.

[0014] (Embodiment 1) Figure 1 shows the configuration of the entire system 100. The system 100 includes an input / output terminal 110, an information processing device 120, a knowledge database (DB) 130, and an LLM (Large Language Model) 140.

[0015] The input / output terminal 110 is a device that receives input information (e.g., a document) from a user and outputs a response to that input information (e.g., a QRA dataset corresponding to the document).

[0016] The information processing device 120 is a device that generates text to be input to a learning device, such as an LLM 140, based on input information entered by the user. For example, the information processing device 120 obtains an answer A from a QA dataset in response to a question Q entered by the user, and generates content based on the obtained answer A.

[0017] The knowledge DB 130 consists of storage devices such as hard disk drives (HDDs) and solid state drives (SSDs).

[0018] The LLM 140 is a language model built using deep learning. It outputs a response to each prompt input to it.

[0019] Figure 2 shows the configuration of the information processing device 120. The information processing device 120 includes a processor 121, an input device 122, and an output device 123. In addition to the devices shown in Figure 2, it may also include a communication device, memory, and storage. Each device, such as the processor 121 and the input device 122, is connected by a bus 124 for communicating information. The bus 124 may be configured using a single bus, or different buses may be configured for each device.

[0020] The processor 121 is composed of a computer including a CPU (Central Processing Unit) and a GPU (Graphics Processing Unit). The processor 121 receives input information from the input device 122 and analyzes the input information. The processor 121 searches the knowledge database 130 for the input information. Based on the search results obtained from the knowledge database 130, the processor 121 generates a prompt to be input to the LLM 140. The processor 121 outputs the generated prompt to the LLM 140 via the output device 123. Furthermore, when the processor 121 receives information generated by the LLM 140 from the LLM 140, it generates a user response based on the information generated by the LLM 140 and outputs it to the input / output terminal 110.

[0021] The input device 122 may be an input device that receives input information from an external source (e.g., a keyboard, mouse, microphone, switch, button, sensor, etc.), or it may be an interface or communication device that receives data from the input / output terminal 110. The input device 122 may be connected to the knowledge DB 130 and LLM 140.

[0022] The output device 123 may be an output device that outputs user responses to the outside (e.g., a display, speaker, LED lamp, etc.), or it may be an interface or communication device that transmits data to the input / output terminal 110. The output device 123 may be connected to the knowledge DB 130 and LLM 140.

[0023] The input device 122 and the output device 123 may be configured as an integrated unit (for example, a touch panel), or the communication device may be the input device 122 and / or the output device 123.

[0024] Memory is a computer-readable recording medium and may consist of at least one of the following: a non-transitory computer-readable recording medium such as ROM (Read Only Memory), EPROM (Erasable Programmable ROM), EEPROM (Electrically Erasable Programmable ROM), or RAM (Random Access Memory). Memory may also be called a register, cache, or main memory. Memory can store executable programs (program code), software modules, and the like for implementing an information processing method according to one embodiment of this disclosure.

[0025] Storage is a non-transitory, computer-readable recording medium and may consist of at least one of the following: an optical disc such as a CD-ROM (Compact Disc ROM), a hard disk drive, a flexible disk, a magneto-optical disk (e.g., compact disk, digital versatile disk, Blu-ray® disk), a smart card, flash memory (e.g., card, stick, key drive), a floppy® disk, a magnetic strip, etc. Storage may also be called an auxiliary storage device. The above-mentioned storage medium may be, for example, a database, server, or other suitable medium that includes at least one of memory and storage.

[0026] Communication equipment is hardware (input / output devices) for communicating between computers via at least one of a wired network and a wireless network, and is also called a network device, network controller, network card, communication module, etc.

[0027] Furthermore, the information processing device 120 may be configured to include hardware such as a microprocessor, a digital signal processor (DSP), an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), and an FPGA (Field Programmable Gate Array), and some or all of each functional block may be realized by such hardware. For example, the processor 121 may be implemented using at least one of these hardware components.

[0028] Figure 3 shows a functional block diagram of the processor 121. The processor 121 includes a document receiving unit 125, a QRA generation unit 126, a QRA evaluation unit 127, and a storage control unit 128.

[0029] The document reception unit 125 receives documents from the input device 122 and may receive multiple documents. Hereinafter, one or more documents input from the input device 122 are referred to as a document group; that is, a document group consists of one or more documents. The document reception unit 125 may receive, as a document group, the result of searching the knowledge DB 130 based on information input from the input / output terminal 110. The document group received by the document reception unit 125 may be in text format. For example, a text conversion unit (not shown) of the processor 121 may convert the document group into text using character recognition such as OCR (Optical Character Recognition) and pass the result to the document reception unit 125.

[0030] The QRA generation unit 126 generates a QRA dataset for the group of documents received by the document reception unit 125. The QRA dataset consists of a set of question Q, reference R, and answer A. The QRA generation unit 126 may also generate the dataset using the LLM 140 in response to the group of documents received by the document reception unit 125.

[0031] The QRA evaluation unit 127 evaluates the QRA dataset generated by the QRA generation unit 126. The QRA evaluation unit 127 may evaluate the relationship between question Q and reference R (whether R is an appropriate reference location for the question), the relationship between reference R and answer A (whether the correct answer is derived from the reference location), and the relationship between question Q and answer A (whether A correctly answers the question). The QRA evaluation unit 127 may evaluate the QRA dataset using a known method such as RAGAS, or using a rule-based method. It may express the result as an evaluation value or as a class (e.g., "acceptable / unacceptable" or "good / average / bad"). The QRA evaluation unit 127 may also evaluate the QRA dataset by having an AI compare it with example QRA datasets for each class, or it may calculate the evaluation value of the generated QRA dataset using an AI whose learning model has learned the relationship between example QRA datasets and evaluation values.
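The rule-based option can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each of the three relationships (Q-R, R-A, Q-A) is scored by simple word overlap; the function names `overlap` and `rule_based_scores` are hypothetical, and a practical system would more likely rely on RAGAS or an AI-based judge as described above.

```python
import re


def _tokens(text: str) -> set[str]:
    # Lowercased word tokens; a crude stand-in for real linguistic analysis.
    return set(re.findall(r"\w+", text.lower()))


def overlap(a: str, b: str) -> float:
    """Fraction of the tokens in `a` that also appear in `b` (0.0 if `a` has no tokens)."""
    ta, tb = _tokens(a), _tokens(b)
    return len(ta & tb) / len(ta) if ta else 0.0


def rule_based_scores(question: str, reference: str, answer: str) -> dict[str, float]:
    # One heuristic score per relationship named in the text.
    return {
        "q_r": overlap(question, reference),  # is R a plausible reference location for Q?
        "r_a": overlap(answer, reference),    # is A supported by R?
        "q_a": overlap(question, answer),     # does A address Q?
    }


print(rule_based_scores(
    "What is RAG?",
    "RAG combines retrieval and generation.",
    "RAG combines retrieval and generation.",
))
```

Word overlap ignores paraphrase and negation, so scores like these are best treated as a cheap pre-filter rather than a final judgment.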

[0032] The storage control unit 128 controls the storage of data in the knowledge database 130. The storage control unit 128 may also determine whether the QRA dataset evaluated by the QRA evaluation unit 127 is appropriate, and whether the number of QRA datasets deemed appropriate by the QRA evaluation unit 127 is sufficient.

[0033] Figure 4 shows an example of a QRA dataset. The QRA dataset is generated by the QRA generation unit 126.

[0034] R is a reference. Reference R may be extracted from the document group, for example by randomly selecting one of the paragraphs or sections into which the document group has been divided.

[0035] Q is a question. Question Q may be generated using an AI model based on reference R.

[0036] A is the answer. Answer A is the ideal answer to question Q. Answer A may be generated based on question Q and reference R, and may be generated using an AI model such as the LLM 140.
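The three fields of a QRA dataset map naturally onto a small record type. The sketch below is a hypothetical Python representation; the class name `QRA` and the sample strings are illustrative only and are not the contents of Figure 4.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class QRA:
    """One QRA dataset: question Q, reference R, and ideal answer A."""
    question: str   # Q: generated from the reference
    reference: str  # R: a division unit taken from the document group
    answer: str     # A: ideal answer derived from Q and R


# Illustrative values only.
example = QRA(
    question="What does RAG combine?",
    reference="RAG combines information retrieval with a generative AI model.",
    answer="It combines information retrieval with a generative AI model.",
)
print(asdict(example))
```

Making the record immutable (`frozen=True`) keeps an evaluated dataset from being altered after it has been judged appropriate and stored.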

[0037] Figure 5 shows a flowchart of the actions performed by the processor 121. The processor 121 determines whether or not a group of documents has been input (step S501). Step S501 is performed by the document receiving unit 125. The group of documents input in step S501 may be a group of documents that have been converted into text. The processor 121 may convert the group of documents into text, and the document receiving unit 125 may receive the group of documents that have been converted into text. The documents input in step S501 may be documents (a group of documents converted into text) retrieved from the knowledge DB 130 based on information input from the input / output terminal 110.

[0038] If no document has been entered (step S501, No), the processor 121 returns to step S501 and waits for the document to be entered.

[0039] When a document is input to the processor 121 (step S501, Yes), the processor 121 extracts a reference R (step S502). Step S502 is performed by the QRA generation unit 126. The processor 121 may extract a reference R based on the document input in step S501. For example, the processor 121 may divide the input document into units each consisting of a predetermined number of paragraphs, sections, pages, sentences, lines, or characters (hereinafter referred to as "division units") and randomly select one division unit from among them. Using the selected division unit as the reference R makes reference extraction simple. Because random selection draws references from many different parts of the document, the resulting QRA datasets are less biased. References R can also be extracted using a different kind of division unit.

[0040] For example, the number of sentences in a division unit may be fixed (for example, four) or may vary within a range (for example, three to five). A division unit may consist of the consecutive sentences whose total number of characters is closest to a predetermined number, or of the consecutive sentences whose total number of characters exceeds that number. For example, suppose the first sentence has 100 characters, the second 90, the third 90, and the fourth 80. The first to third sentences total 280 characters, and the first to fourth sentences total 360 characters. If the predetermined number of characters is 300, the first to third sentences, which are closest to 300 characters, may form one division unit, with the fourth sentence forming a separate division unit; alternatively, the first to fourth sentences, which exceed 300 characters, may form one division unit. The number of sentences and the total number of characters in a division unit may be changed. The same applies to the number of paragraphs, sections, and pages in a division unit.
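The character-count rule in the example above can be sketched in Python. This is a minimal illustration, assuming the document has already been split into sentences and that length is counted with `len()`; the function name `split_units` and its `mode` parameter are hypothetical. In "exceed" mode the sketch closes a unit once the running total reaches or exceeds the target, which is one reading of "exceeds".

```python
def split_units(sentences: list[str], target: int, mode: str = "closest") -> list[list[str]]:
    """Group consecutive sentences into division units of roughly `target` characters.

    mode="closest": end the unit at the prefix whose character total is closest to `target`.
    mode="exceed":  end the unit at the first prefix whose total reaches or exceeds `target`.
    """
    units: list[list[str]] = []
    i = 0
    while i < len(sentences):
        total, end = 0, i
        best_end, best_gap = i + 1, None
        while end < len(sentences):
            total += len(sentences[end])
            end += 1
            gap = abs(total - target)
            if best_gap is None or gap < best_gap:
                best_end, best_gap = end, gap
            if total >= target:
                break  # any further sentence only moves the total away from the target
        if mode == "exceed":
            best_end = end  # take everything up to the first prefix that reached the target
        units.append(sentences[i:best_end])
        i = best_end
    return units


# The worked example: sentences of 100, 90, 90, and 80 characters, target 300.
sentences = ["a" * 100, "b" * 90, "c" * 90, "d" * 80]
for mode in ("closest", "exceed"):
    print(mode, [[len(s) for s in unit] for unit in split_units(sentences, 300, mode)])
```

Run on the example, "closest" mode yields units of [100, 90, 90] and [80], while "exceed" mode yields a single unit of all four sentences, matching the two alternatives described in the text.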

[0041] The processor 121 generates a question Q and an answer A for the reference R (step S503). Step S503 is performed by the QRA generation unit 126. The processor 121 may use an AI model such as the LLM 140 to generate the question Q and answer A. Figure 6 shows an example of prompts input to the AI model when generating the question Q and answer A with it. The AI model that the processor 121 uses to generate the question Q and answer A is different from the AI model being evaluated.

[0042] The processor 121 may generate a question Q based on a reference R; for example, it may generate a question Q about a technical term contained in the reference R. The same question Q may be generated from different references R, and multiple questions Q may be generated from one reference R. The processor 121 may generate an answer A based on the generated question Q and the extracted reference R (that is, a pair of question Q and reference R), and may generate one or more answers A for each such pair. Generating the question Q and answer A from the reference R in this way makes QRA dataset generation less difficult and reduces the number of tokens used. Because a collection of QRA datasets is generated automatically simply by inputting documents, creating a diverse and comprehensive collection of QRA datasets does not require time and effort.
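The two-step generation (Q from R, then A from the pair of Q and R) can be sketched with the model abstracted as a plain function from prompt text to output text. This is a hypothetical Python illustration: the prompt wording is invented for the sketch and is not the Figure 6 prompt, and `fake_llm` is a deterministic stub standing in for a real model, which per the text should differ from the model being evaluated.

```python
from typing import Callable

# Hypothetical interface: any function mapping a prompt string to the model's text output.
LLM = Callable[[str], str]


def generate_qra(reference: str, llm: LLM, n_questions: int = 1) -> list[tuple[str, str, str]]:
    """Generate (Q, R, A) triples for one reference R: Q from R, then A from the (Q, R) pair."""
    triples = []
    for k in range(n_questions):
        # Step 1: ask the model for a question grounded in the reference.
        question = llm(
            f"Write question #{k + 1} about a technical term in the passage below.\n"
            f"Passage:\n{reference}"
        ).strip()
        # Step 2: ask for the ideal answer, given both the question and the reference.
        answer = llm(
            "Answer the question using only the passage.\n"
            f"Question: {question}\nPassage:\n{reference}"
        ).strip()
        triples.append((question, reference, answer))
    return triples


def fake_llm(prompt: str) -> str:
    # Deterministic stand-in for a real model, used only to make the sketch runnable.
    if prompt.startswith("Write question"):
        return "What does RAG combine?"
    return "Information retrieval and a generative AI model."


print(generate_qra("RAG combines information retrieval with a generative AI model.", fake_llm))
```

Passing the model in as a function keeps the generation logic independent of any particular LLM client library.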

[0043] The processor 121 evaluates whether the QRA dataset generated in steps S502 and S503 is appropriate (step S504). Step S504 is performed by the QRA evaluation unit 127. The processor 121 may evaluate the QRA dataset using a known evaluation method such as RAGAS. The QRA evaluation unit 127 may evaluate the quality of the QRA dataset by an evaluation value or a class. The classes may be, for example, "acceptable / unacceptable" (two classes) or "good / average / bad" (three classes), or there may be four or more classes. The QRA evaluation unit 127 may assign the class "good" if the evaluation value is above a first threshold, "average" if it is below the first threshold but above a second threshold, and "bad" if it is below the second threshold. The QRA evaluation unit 127 may calculate the evaluation value using an AI with a learning model that has learned the relationship between example QRA datasets and evaluation values, or it may determine the class using an AI trained on examples of correspondence between QRA datasets and classes. The QRA evaluation unit 127 may determine whether the QRA dataset generated by the QRA generation unit 126 is appropriate either by comparing the evaluation value with a threshold or based on the class. For example, it may determine that the QRA dataset is appropriate if the class is "acceptable," "good," or "average," and inappropriate if the class is "unacceptable" or "bad."
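The two-threshold classification and the appropriateness decision can be sketched as follows. The threshold values 0.8 and 0.5 are hypothetical, as is treating a value exactly equal to a threshold as belonging to the higher class (the text does not specify boundary handling); the evaluation value itself would come from a method such as RAGAS.

```python
# Hypothetical thresholds; the disclosure leaves the actual values open.
FIRST_THRESHOLD = 0.8
SECOND_THRESHOLD = 0.5

# Classes treated as appropriate, per the examples in the text.
APPROPRIATE_CLASSES = {"acceptable", "good", "average"}


def classify(score: float, first: float = FIRST_THRESHOLD, second: float = SECOND_THRESHOLD) -> str:
    """Map an evaluation value to "good" / "average" / "bad" using two thresholds (first > second)."""
    if score >= first:
        return "good"
    if score >= second:
        return "average"
    return "bad"


def is_appropriate(label: str) -> bool:
    # "unacceptable" and "bad" (and any unknown label) are treated as inappropriate.
    return label in APPROPRIATE_CLASSES


for value in (0.9, 0.6, 0.2):
    label = classify(value)
    print(value, label, is_appropriate(label))
```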

[0044] If the generated QRA dataset is not appropriate (step S504, No), the processor 121 discards it, and the flow returns to step S502 to extract a reference R and generate a question Q again. Alternatively, instead of returning to step S502, the processor 121 may return to step S503 and regenerate only the question Q and answer A for the original reference R. Because the QRA dataset is generated from a reference R selected at random from the division units, an inappropriate QRA dataset may be generated in step S503. However, because the QRA evaluation unit 127 evaluates in step S504 whether each generated QRA dataset is appropriate, inappropriate QRA datasets are discarded.

[0045] If the generated QRA dataset is appropriate (step S504, Yes), the processor 121 determines whether the number of generated QRA datasets, that is, the number of QRA datasets constituting the QRA dataset collection, is sufficient (step S505). Step S505 is executed by the storage control unit 128. The processor 121 may store the appropriate QRA dataset in a storage device such as the knowledge DB 130. The processor 121 may determine that the number is sufficient when it is equal to or greater than a threshold value (a predetermined number), and insufficient when it is less than the threshold value.

[0046] If the number of generated QRA datasets is not sufficient (step S505, No), the processor 121 returns to step S502 to generate a new question Q.

[0047] If the number of generated QRA datasets is sufficient (step S505, Yes), the processor 121 ends the process.
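The flowchart of steps S502 to S505 amounts to a generate-evaluate-store loop. The Python sketch below is a hypothetical rendering of that loop: `generate` and `is_appropriate` are placeholders for the QRA generation and evaluation units, references are drawn with replacement, and the `max_attempts` safeguard is an added assumption that does not appear in the flowchart.

```python
import random
from typing import Callable, Optional

QRA = tuple[str, str, str]  # (question, reference, answer)


def build_collection(
    units: list[str],
    generate: Callable[[str], QRA],
    is_appropriate: Callable[[QRA], bool],
    required: int,
    max_attempts: int = 1000,
    rng: Optional[random.Random] = None,
) -> list[QRA]:
    """Repeat S502-S505 until `required` appropriate QRA datasets have been stored."""
    rng = rng or random.Random()
    stored: list[QRA] = []
    attempts = 0
    while len(stored) < required:              # S505: are there enough datasets yet?
        if attempts >= max_attempts:           # safeguard not present in the flowchart
            raise RuntimeError("too many inappropriate QRA datasets; giving up")
        attempts += 1
        reference = rng.choice(units)          # S502: random division unit as R
        qra = generate(reference)              # S503: generate Q and A for R
        if is_appropriate(qra):                # S504: evaluate
            stored.append(qra)                 # store the appropriate dataset
        # S504 "No": the dataset is discarded and the loop returns to S502
    return stored


units = ["unit 1", "unit 2", "unit 3"]
collection = build_collection(
    units,
    generate=lambda r: (f"Question about {r}?", r, f"Answer from {r}."),
    is_appropriate=lambda qra: qra[1] != "unit 2",  # pretend unit 2 always yields a bad QRA
    required=4,
    rng=random.Random(0),
)
print(collection)
```

Passing a seeded `random.Random` makes a run reproducible, which helps when comparing RAG evaluation results across collections.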

[0048] According to the present disclosure, the processor 121 can generate an appropriate QRA dataset collection. The processor 121 may evaluate the RAG based on the document group and the QRA dataset collection, both of which may be stored in the knowledge DB 130. By evaluating a RAG with a QRA dataset collection containing at least a predetermined number of appropriate QRA datasets, it is possible to check, for any RAG including its document search, whether the answers it generates are sufficiently accurate. This makes it easy to determine whether the RAG can be used.

[0049] <Modified Example> The QRA generation unit 126 may generate a question Q from the document group, search the document group for a document related to the question Q to extract a reference R, and generate an answer A based on the generated question Q and the retrieved reference R. The question Q may also be generated in advance; doing so makes it possible to generate QRA datasets for questions Q that must be prepared beforehand.
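In this modified example the reference is found by searching rather than by random selection. The sketch below uses word-token overlap as a stand-in for whatever search the knowledge DB actually provides (for example, an embedding or BM25 search, which the text does not specify); `retrieve_reference` is a hypothetical name.

```python
import re


def _tokens(text: str) -> set[str]:
    # Lowercased word tokens; a crude stand-in for a real search index.
    return set(re.findall(r"\w+", text.lower()))


def retrieve_reference(question: str, units: list[str]) -> str:
    """Return the division unit sharing the most word tokens with the question.

    Ties go to the earliest unit, because max() returns the first maximal element.
    """
    q = _tokens(question)
    return max(units, key=lambda unit: len(q & _tokens(unit)))


units = [
    "The bus connects the processor and the input device.",
    "RAGAS outputs an evaluation index for RAG.",
]
print(retrieve_reference("What does RAGAS output?", units))
```

The retrieved unit then serves as R, and A is generated from the prepared Q and that R, as in step S503.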

[0050] The storage control unit 128 may select a predetermined number of the generated QRA datasets in descending order of evaluation value and store them in a storage device such as the knowledge DB 130. Selecting those with the highest evaluation values yields a QRA dataset collection composed of more appropriate QRA datasets.
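Selecting a predetermined number of datasets with the highest evaluation values is a top-k selection. A minimal Python sketch using the standard library's `heapq.nlargest`, assuming each candidate is paired with its evaluation value (the identifiers and scores are illustrative):

```python
import heapq


def select_top(scored: list[tuple[float, str]], k: int) -> list[tuple[float, str]]:
    """Keep the k QRA datasets with the highest evaluation values, highest first."""
    return heapq.nlargest(k, scored, key=lambda item: item[0])


# (evaluation value, QRA dataset identifier); values are illustrative.
scored = [(0.4, "qra-a"), (0.9, "qra-b"), (0.7, "qra-c")]
print(select_top(scored, 2))
```

`heapq.nlargest` avoids sorting the whole candidate list, which matters little for small collections but scales better when many candidates are generated.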

[0051] When evaluating a QRA dataset, the QRA evaluation unit 127 may take into account its relationship to QRA datasets that have already been generated. For example, the QRA evaluation unit 127 may evaluate whether the question Q of the newly generated QRA dataset is similar to the questions Q of existing QRA datasets, and whether the generated questions Q are biased. Similarity and bias can be evaluated by text similarity or cluster analysis. Evaluating the RAG with a QRA dataset collection whose questions Q are less biased is expected to improve the reliability of the evaluation result.
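One simple form of text similarity is the Jaccard index over word sets. The sketch below rejects a new question when it is too close to any stored question; the `is_novel` name and the 0.6 cutoff are hypothetical, and cluster analysis, also mentioned in the text, is not shown.

```python
import re


def _tokens(text: str) -> set[str]:
    # Lowercased word tokens.
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two texts' word sets (1.0 when both are empty)."""
    ta, tb = _tokens(a), _tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0


def is_novel(question: str, existing: list[str], max_sim: float = 0.6) -> bool:
    """True if the new question is below `max_sim` similarity to every stored question."""
    return all(jaccard(question, q) < max_sim for q in existing)


existing = ["What is RAG?"]
print(is_novel("What is RAG?", existing))                     # identical question
print(is_novel("How is the knowledge DB stored?", existing))  # shares only "is"
```

Word-set similarity misses paraphrases with no shared words; an embedding-based similarity would catch more near-duplicates at higher cost.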

[0052] The QRA generation unit 126 may shorten a generated question Q that is too long, and may generate an answer A that does not include the question Q. Modifying the generated question Q and answer A in this way yields QRA datasets that are easier to understand.

[0053] If the number of division units is small compared with the number of QRA datasets required for the QRA dataset collection, the QRA generation unit 126 may select the division units sequentially rather than randomly. When there are few division units, selecting them sequentially as references R does not bias the QRA datasets generated for the document group.
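The switch between random and sequential selection can be sketched as below. The rule used here for "small", namely that fewer units exist than datasets are needed, is a hypothetical choice; the text leaves the criterion open. `pick_references` is also a hypothetical name, and this sketch samples without replacement in the random case.

```python
import random
from typing import Optional


def pick_references(units: list[str], needed: int, rng: Optional[random.Random] = None) -> list[str]:
    """Choose division units to use as references R.

    If there are at least as many units as datasets needed, sample at random without
    replacement; otherwise cycle through the units in document order so that every
    part of the document group is used evenly.
    """
    if needed <= len(units):
        return (rng or random.Random()).sample(units, needed)
    return [units[i % len(units)] for i in range(needed)]


units = ["a", "b", "c"]
print(pick_references(units, 2, random.Random(1)))  # random subset of two units
print(pick_references(units, 5))                    # sequential, wrapping around
```

Cycling in order guarantees that each unit is used either floor(needed / n) or ceil(needed / n) times, which is the even coverage the text relies on.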

[0054] In the embodiments described above, the notation for each component may be replaced with other notations such as "circuitry," "assembly," "device," "unit," or "module."

[0055] Furthermore, this disclosure can be implemented in software, hardware, or software in conjunction with hardware. Each functional block used in the description of the above embodiments may be implemented in part or in whole as an integrated circuit (LSI), and each process described in the above embodiments may be controlled in part or in whole by a single LSI or a combination of LSIs. An LSI may consist of individual chips, or it may consist of a single chip that includes some or all of the functional blocks. An LSI may have data inputs and outputs. Depending on the degree of integration, LSIs may also be referred to as ICs, system LSIs, super LSIs, or ultra LSIs.

[0056] The integrated circuit implementation method is not limited to LSIs; it may also be implemented using dedicated circuits, general-purpose processors, or dedicated processors. Furthermore, a Field Programmable Gate Array (FPGA) that can be programmed after LSI manufacturing, or a reconfigurable processor that allows for the reconfiguration of the connections and settings of circuit cells within the LSI, may also be used. This disclosure may be implemented as digital or analog processing.

[0057] Furthermore, if advancements in semiconductor technology or other derived technologies lead to the emergence of integrated circuit technologies that can replace LSIs, then naturally, it would be possible to use those technologies to integrate functional blocks. The application of biotechnology, for example, is a possibility.

[0058] This disclosure is useful for information processing devices that create diverse and comprehensive QRA datasets.

[0059]
100 System
110 Input / Output Terminal
120 Information Processing Device
121 Processor
122 Input Device
123 Output Device
124 Bus
125 Document Receiving Unit
126 QRA Generation Unit
127 QRA Evaluation Unit
128 Storage Control Unit
130 Knowledge DB
140 LLM

Claims

1. An information processing device comprising: a document receiving unit that receives a set of documents in text format; a QRA generation unit that generates a set of questions, references, and answers (QRA dataset) based on the set of documents; a QRA evaluation unit that evaluates the QRA dataset generated by the QRA generation unit; and a control unit that controls the storage of the QRA dataset according to the evaluation results from the QRA evaluation unit.

2. The information processing apparatus according to claim 1, wherein the information processing apparatus evaluates RAG using the stored QRA datasets.

3. The information processing apparatus according to claim 1, wherein the QRA generation unit generates the QRA dataset using an AI model.

4. The information processing apparatus according to claim 3, wherein the QRA generation unit generates one question for one reference and generates one or more answers for one reference and one question.

5. The information processing apparatus according to claim 1, wherein the QRA generation unit extracts the references from the document group, generates the questions based on the references, and generates the answers based on the questions and the references.

6. The information processing apparatus according to claim 5, wherein the QRA generation unit divides the document group into division units and extracts the references by randomly selecting the division units.

7. The information processing apparatus according to claim 6, wherein the division unit is a predetermined number of paragraphs, sections, pages, sentences, lines, or characters.

8. The information processing apparatus according to claim 1, wherein the QRA evaluation unit evaluates the quality of the QRA dataset generated by the QRA generation unit.

9. The information processing apparatus according to claim 8, wherein the QRA evaluation unit evaluates the quality of the QRA dataset by an evaluation value or class.

10. The information processing apparatus according to claim 9, wherein the QRA evaluation unit evaluates the quality of the QRA dataset using an AI model.

11. The information processing device according to claim 8, wherein the information processing device discards the QRA dataset that has been determined to be inappropriate by the QRA evaluation unit.

12. An information processing method comprising: an information processing device receiving a set of text-formatted documents; generating a QRA dataset based on the set of text-formatted documents; evaluating the generated QRA dataset; and controlling the storage of the QRA dataset according to the evaluation result.

13. A program for causing a computer to receive a set of text-formatted documents, generate a QRA dataset based on the set of text-formatted documents, evaluate the generated QRA dataset, and control the storage of the QRA dataset according to the evaluation results.