Large language model system

The RAG approach addresses the inefficiency and inaccuracy of large language models by retrieving context from a separate database and adding it to questions, which improves responsiveness, reduces costs, and supports accurate, timely integration of information across domains.

WO2026140992A1 · PCT designated stage · Publication Date: 2026-07-02 · INST OF MEDICAL INFORMATION TECH CO LTD

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: INST OF MEDICAL INFORMATION TECH CO LTD
Filing Date: 2025-12-15
Publication Date: 2026-07-02

Application Information

Patent Timeline

15 Dec 2025: Application
02 Jul 2026: Publication (WO2026140992A1)

IPC: G06N3/0475; G06F16/332; G06N3/10

AI Tagging

Technology Topics

Linguistic model; Background information

Similar Technology Patents

Bloom cognitive level constraint-based achievement-oriented education diagnosis method and system
CN122453570A · Linguistic model; Algorithm
Systems and methods for training a multi-modal language model with reasoning
US20260148541A1 · Character and pattern recognition; Linguistic model; Modal language
Method and device for evaluating quality of activities of adolescents based on LLM, and storage medium
CN122332558A · Evaluation result; Linguistic model
A large model-based exclusive team performance portrait generation and intelligent evaluation system
CN122264618A · Objective evaluation; Comprehensive quantitative evaluation; Data processing applications; Inference methods; Linguistic model; Data acquisition
Open-vocabulary segmentation method and system with multi-modal model representation optimization
CN118823350B · Pattern recognition; Visual technology

AI Technical Summary

Technical Problem

Large language models face challenges in efficiently incorporating up-to-date knowledge, requiring significant computing resources and costs, and are prone to hallucinations due to incomplete data, especially in critical fields like medicine, where inaccuracies can lead to severe consequences.

Method used

A Retrieval-Augmented Generation (RAG) approach where additional information is stored in a separate database, and relevant chunks are retrieved and added to the question input field without modifying the model, enhancing responsiveness and reducing costs by leveraging multiple RAG databases for specific fields and incorporating background information.

Benefits of technology

This method allows for efficient, cost-effective integration of the latest information, reduces hallucinations, and supports immediate reflection of urgent data without extensive retraining, ensuring accurate responses across various domains.

✦ Generated by Eureka AI based on patent content.

Figure JP2025043653_02072026_PF_FP_ABST

Abstract

[Problem] To provide a large language model using retrieval-augmented generation (RAG) that: increases the capacity for recording additional information by constructing, as needed, a plurality of RAG databases, one for each relevant field; eliminates missed references by referring to the entire text or image of the original document page that contains the relevant description, identified from the chunks obtained by RAG retrieval, while also reflecting in the prompt context any background or related information described around a retrieved chunk; and eliminates costly and time-consuming regeneration of responses by recording prompts and responses in a searchable database. [Solution] The RAG database of the invention includes a page image acquisition means, a page text database recording means, a relevant feature vector extraction means, and a feature vector database recording means.

Description

Large Language Model System

[0001] The present invention relates to a large language model system using retrieval-augmented generation.

[0002] In recent years, machine learning has developed remarkably, and large language models (LLMs) in particular have become widespread. These models learn hundreds of billions to trillions of neural network parameters from data sets exceeding terabytes and are used for tasks such as translation, speech and image recognition, and text summarization. Generative AI that produces images, music, and documents from instructions (prompts) has also been increasingly put into practical use. Because performance is known to improve as models are scaled up (the scaling laws), model sizes have been growing rapidly.

[0003] As the range of applications of large language models expands, knowledge from many fields is required, and as social conditions and technologies in each field evolve, up-to-date knowledge must be incorporated continually. However, training on large amounts of data requires large-scale computing resources, enormous amounts of power, and high costs, so large language models cannot be rebuilt frequently. Moreover, when a large language model is asked a question whose answer requires knowledge it does not hold internally, it may still generate an answer that is not based on facts, a phenomenon called hallucination. This is why adoption has been limited in fields such as medicine, where mistakes can lead directly to accidents.

[0004] To put large language models to practical use with the latest information in a field, that information must be acquired additionally, and two approaches are currently taken. The first is additional learning (fine-tuning) or transfer learning, in which part of the output layer (fine-tuning) or only the final layer (transfer learning) of an existing large language model is trained on the additional information to produce a specialized large language model. Because the result is specialized for a field, it is highly useful. However, although less demanding than training a large language model from scratch, this approach still requires substantial cost and technical skill.

[0005] The other approach is the Retrieval-Augmented Generation (RAG) approach used in this invention. The large-scale language model itself is not modified. Instead, additional information is stored in a separate database (the RAG database), the information needed to answer a question is retrieved from it, and the retrieved information is added to the question before the large-scale language model is asked to answer. Even when the pool of potentially useful data is large, the additional information needed to answer any given question is limited, so only that information is retrieved from the RAG database and added as context to the question in the question input field of the large-scale language model. Because no training changes are made to the model and only context is added to the question input field, the cost and technical hurdles are low. The approach is also highly responsive: urgent information, such as urgent drug side-effect information, can be reflected immediately.

[0006] The RAG database is created by breaking the additional documents into small fragments (chunks) and converting each chunk into a feature vector (chunk vector). When a question is asked, the question is also vectorized, the RAG database is searched for chunk vectors highly similar to the question vector, and the contents of the resulting chunks are added to the question. This makes it possible to handle specific fields and the latest information without costly additional training of a large-scale language model, and restricting the basis of the answer to the RAG database helps prevent hallucinations. Prior art documents related to this application are listed below.

[0007]
https://ja.wikipedia.org/wiki/%E5%A4%A7%E8%A6%8F%E6%A8%A1%E8%A8%80%E8%AA%9E%E3%83%A2%E3%83%87%E3%83%AB
https://www.idnet.co.jp/column/page_308.html
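The conventional chunk-and-vector procedure described in [0006] can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the applicant's implementation: the word-count `embed` function stands in for a real embedding model, and the eight-word chunk size, `top_k`, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy feature vector: word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def chunk(text: str, size: int) -> list[str]:
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def build_rag_database(documents: list[str], size: int = 8) -> list[tuple[str, Counter]]:
    """One (chunk text, chunk vector) entry per chunk."""
    return [(c, embed(c)) for doc in documents for c in chunk(doc, size)]


def augment_question(question: str, database: list[tuple[str, Counter]], top_k: int = 2) -> str:
    """Add the text of the top_k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(database, key=lambda entry: cosine(q, entry[1]), reverse=True)
    context = "\n".join(text for text, _ in ranked[:top_k])
    return f"Context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    docs = ["Drug X may cause dizziness and nausea in elderly patients.",
            "The hospital cafeteria opens at seven in the morning."]
    db = build_rag_database(docs)
    print(augment_question("What side effects does drug X cause?", db, top_k=1))
```

Only chunk text reaches the prompt here, which is exactly the behavior the later paragraphs identify as the source of missed information.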

[0008] Actually building and searching a RAG database raises many unresolved issues: (1) the RAG database itself becomes enormous, so it contains a relatively large amount of data unrelated to any given question; (2) the content being searched for is not always contained neatly within a single chunk, so data related to the question can be missed; (3) background and related information that may appear around the retrieved chunks is missed; and (4) current large-scale language models are still inadequate at recognizing and searching images. Furthermore, repeatedly performing RAG searches for similar questions wastes cost and time.

[0009] The present invention was made to solve these conventional problems. Its objective is to provide a large-scale language model system using retrieval-augmented generation that: constructs multiple RAG databases, one for each relevant field as needed, to increase the storage capacity for additional information and to improve efficiency by narrowing the scope of each RAG search; refers to the text or image of the entire original document page containing the relevant content, identified from the chunks obtained by RAG search, to eliminate missed references, and reflects in the question context any relevant background or related information around the retrieved chunks; presents to the user image data that is difficult to convert into text, so that additional information can be added to the question; and records questions and answers in a searchable database, eliminating costly and time-consuming regeneration of answers.

[0010] As a means of achieving the above objective, the large-scale language model system according to claim 1 includes: (1) page image acquisition means for acquiring images of the individual pages of information sources that are referenced during inference separately from the large-scale language model, together with reference addresses to those images; (2) page text database recording means for extracting text strings from the page images and recording them together with the reference addresses; (3) feature vector database recording means for dividing the extracted text into small chunks, calculating feature vectors, and recording them together with the reference addresses; (4) related feature vector extraction means for extracting, together with their reference addresses, groups of feature vectors that are highly related to the question sentence given to the large-scale language model; (5) duplicate reference address removal means for removing duplicate reference addresses from the extracted related feature vectors; and (6) related text transcription means for retrieving the page text specified by each reference address from the page text database and transcribing it, together with the question sentence, into the question input field. An answer sentence is obtained by performing operations (1) to (6).

[0011] The large-scale language model system according to claim 2 is the large-scale language model system according to claim 1, further comprising a page image database recording means that records each acquired page image together with a reference address to that page image.

[0012] The large-scale language model system according to claim 3 is characterized in that, in the large-scale language model system according to claim 1 or 2, it is provided with page image viewing means for displaying and viewing the page image specified by the reference address.

[0013] The large-scale language model system according to claim 4 is characterized in that, in the large-scale language model system according to claim 2, the large-scale language model is provided with a comment transcription means that extracts the page image specified by a reference address from the page image database recording means, accepts comments on figures and tables that have not been adequately converted to text, and transcribes those comments into the question input field together with the question text.

[0014] The large-scale language model system according to claim 5 is characterized in that, in the large-scale language model system according to claim 2, it comprises a plurality of page image acquisition means, page text database recording means corresponding to the page image database recording means, a plurality of feature vector database recording means corresponding to the page text database recording means, and a plurality of RAG database search means for searching any of the feature vector database recording means and extracting related feature vectors.

[0015] The large-scale language model system according to claim 6 is characterized in that, in the large-scale language model system according to claim 1, it comprises a question and answer recording means for recording the question sentence and the obtained answer sentence, and for a new question sentence, the question and answer recording means first searches for a question sentence similar to the new question sentence, and if a similar question sentence is found, it comprises an F&Q database in which the answer record for the similar question sentence is used as the answer to the question.

[0016] The large-scale language model system according to claim 1 includes a page image acquisition means that acquires images of the individual pages of information sources referenced during inference separately from the large-scale language model, together with reference addresses to those images; a page text database recording means that extracts text strings from the page images and records them together with the reference addresses; a feature vector database recording means that divides the extracted text into small segments (chunks), calculates feature vectors, and records them together with the reference addresses; a related feature vector extraction means that extracts, together with their reference addresses, groups of feature vectors highly related to the question text given to the large-scale language model; a duplicate reference address removal means that removes duplicate reference addresses from the extracted related feature vectors; and a related text transcription means that retrieves the page text specified by each reference address from the page text database and transcribes it into the question input field together with the question text.

[0017] The large-scale language model system described in claim 2 includes a page image database recording means, so that each acquired page image is recorded along with a reference address to that page image.

[0018] The large-scale language model system described in claim 3 includes a means for viewing page images, which displays and makes available for viewing the page image specified by the reference address.

[0019] The large-scale language model system according to claim 4 includes a comment transcription means. After the page image specified by a reference address is extracted from the page image database recording means, comments are entered for figures and tables that have not been adequately converted to text, and these comments are transcribed into the question input field of the large-scale language model together with the question text.

[0020] The large-scale language model system according to claim 5 comprises a plurality of page image acquisition means, page text database recording means corresponding to the page image database recording means, feature vector recording means corresponding to the page text database recording means, and a plurality of RAG database search means for searching any of the feature vector recording means and extracting relevant feature vectors.

[0021] The large-scale language model system according to claim 6 includes a question-answer recording means for recording a question and the obtained answer, and for a new question, the question-answer recording means first searches for a question similar to the new question, and if a similar question is found, it includes an F&Q database in which the answer record for the similar question is used as the answer to the question.
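The F&Q database of claim 6 behaves like a similarity-keyed answer cache. The sketch below assumes a word-count vector and cosine similarity for comparing questions, an arbitrary reuse threshold of 0.8, and a caller-supplied `generate` function standing in for the full RAG and LLM pipeline; the class and method names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    """Toy question vector: word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class FQDatabase:
    """Records question/answer pairs and reuses an answer for a similar new question."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold  # hypothetical similarity cut-off for reuse
        self.records: list[tuple[str, Counter, str]] = []  # (question, vector, answer)

    def answer(self, question: str, generate: Callable[[str], str]) -> tuple[str, bool]:
        """Return (answer, reused). `generate` runs only if no similar question is recorded."""
        q = embed(question)
        best = max(self.records, key=lambda r: cosine(q, r[1]), default=None)
        if best is not None and cosine(q, best[1]) >= self.threshold:
            return best[2], True
        new_answer = generate(question)
        self.records.append((question, q, new_answer))
        return new_answer, False
```

Because reuse is decided by the threshold alone, a higher value means fewer reused answers but a lower risk of returning an answer recorded for a question that only looks similar.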

[0022] Brief description of the drawings:
Figure 1 shows an embodiment of the network configuration of the present invention.
Figure 2 shows an example of a user interface of a Large-Scale Language Model (LLM).
Figure 3 shows the relationship between the large-scale language model and the RAG database.
Figure 4 shows the configuration of the RAG database.
Figure 5 shows the method for assigning reference addresses.
Figure 6 shows the search procedure for the RAG database.
Figure 7 shows multiple RAG databases constructed for each domain.
Figure 8 is an explanatory diagram of the F&Q database.

[0023] The large-scale language model system according to the present invention comprises a server device, a database, and a terminal. The server device is a known computer device comprising an arithmetic unit, main memory, an auxiliary storage device, an input device, an output device, and a communication device, all connected to one another via a bus interface. The arithmetic unit comprises a known processor capable of executing an instruction set. The main memory comprises volatile memory, such as RAM, capable of temporarily storing the instruction set. The auxiliary storage device comprises non-volatile data storage capable of recording the OS and programs; the data storage may be an HDD or an SSD, for example. The input device may be a keyboard or mouse, and the output device may be a display such as an LCD, for example. The communication device comprises a network interface capable of connecting to a network. The server device includes means for acquiring page images, means for recording page text in a database, means for recording feature vectors in a database, means for extracting related feature vectors, means for removing duplicate reference addresses, means for transcribing related text, means for recording page images in a database, means for viewing page images, means for transcribing comments, means for searching multiple RAG databases, and an F&Q database. The processor of the server device implements these means. The database according to the present invention may be configured in the auxiliary storage device of the server device or in a separate auxiliary storage device independent of the server device, and it stores the information handled by the large-scale language model system. The terminal according to the present invention has the same known computer hardware configuration as the server device.
The server device, database, and terminal according to the present invention can communicate with one another via a network.

[0024] Figure 1 shows a typical system configuration of the present invention. Large-scale language model systems consist of massive amounts of data, numerous CPUs (Central Processing Units), GPUs (Graphics Processing Units), and a high-speed network connecting them. Therefore, they are built within large servers such as cloud data centers and provided via the Web. Companies and hospitals have numerous PC terminals connected via LANs (Local Area Networks) that are connected to the Web. Companies and hospitals also have servers operating their internal databases and electronic medical records. In recent years, there has been an increase in cases where mobile devices such as smartphones and tablets are used to access cloud services such as large-scale language models, internal databases, and electronic medical records from outside the company or hospital.

[0025] Servers, terminals, and mobile terminals all consist of memory for recording programs and data, recording media such as hard disks for persistently storing the programs and data as needed, a CPU for reading and processing the programs and data, a GPU for high-speed parallel processing as appropriate, and communication modules. As cloud services become more stable and inexpensive, there is an increasing trend to migrate some or all of in-house databases and electronic medical records to the cloud. Conversely, there is also a movement to move some or all of large-scale language models to terminals with increased processing power and memory capacity (edge computing). Furthermore, the development of small-scale language models with a reduced number of parameters is also progressing. It should be noted that even small-scale models are still sufficiently large compared to those before the emergence of large-scale language models, and all embodiments, including this form, are included in the present invention.

[0026] Figure 2 shows an example of a user interface for a Large-Scale Language Model (LLM). LLM is currently under rapid development, with numerous models being developed, including ChatGPT (a registered trademark of OpenAI), Bard, LaMDA (a registered trademark of Google), and LLaMA (a registered trademark of Meta). While the user interfaces naturally differ, the standard configuration, as shown in Figure 2, consists of a prompt input field for entering instructions and inquiries to the LLM, a field for displaying the response to that prompt (response display field), and a field for displaying the history of prompts and responses as a usage log (usage history field).

[0027] Recently, in addition to using LLM as a standalone application as described above, there has been an increase in cases where LLM itself is equipped with an API (Application Programming Interface), allowing external software to utilize LLM's functions. In this case, prompts, responses, and history are input and output between the external software and the LLM via the API, so the display format is controlled by the external software and is not limited to Figure 2.

[0028] In large-scale language models, each vocabulary word is represented by a one-hot vector. This vector has as many dimensions as there are words in the vocabulary; every element is 0 except for a single 1 at the position corresponding to that word. The vocabulary of a large body of literature is replaced with such vectors, and the relationships (attention) between the word vectors are learned by deep learning. Given a query (prompt), the model builds the answer sentence by generating, one at a time, the words most likely to follow the query and the partial answer generated so far. If the information needed to generate the answer is contained in the training literature, a rational, useful, and correct sentence can be expected. If it is not, however, the model still mechanically selects high-probability words and continues generating, which is known to produce false answers that are not based on evidence (hallucination). Hallucination in a medical setting can endanger a patient's life, and this is one reason large-scale language models have not been applied to core business operations.
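A toy illustration of the two mechanisms in [0028]: one-hot word vectors, and answer generation that repeatedly appends the most probable next word. The five-word vocabulary and the next-word probability table are invented for illustration, and the table conditions only on the previous word; a real model computes these probabilities from the whole context with attention layers.

```python
VOCAB = ["<end>", "aspirin", "causes", "bleeding", "rarely"]


def one_hot(word: str) -> list[int]:
    """Vector with len(VOCAB) dimensions: 1 at the word's index, 0 everywhere else."""
    vec = [0] * len(VOCAB)
    vec[VOCAB.index(word)] = 1
    return vec


# Hypothetical next-word probabilities, conditioned only on the previous word.
NEXT = {
    "aspirin": {"causes": 0.7, "rarely": 0.3},
    "rarely": {"causes": 1.0},
    "causes": {"bleeding": 0.9, "<end>": 0.1},
    "bleeding": {"<end>": 1.0},
}


def generate(prompt: list[str], max_words: int = 10) -> list[str]:
    """Greedy generation: append the highest-probability next word until <end>."""
    words = list(prompt)
    for _ in range(max_words):
        candidates = NEXT.get(words[-1])
        if not candidates:
            break
        best = max(candidates, key=candidates.get)
        if best == "<end>":
            break
        words.append(best)
    return words
```

For example, `generate(["aspirin"])` returns `["aspirin", "causes", "bleeding"]`. The loop has no notion of whether a continuation is true; it only follows the probabilities.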

[0029] Training large-scale language models requires massive servers, including a large number of parallel processing units (GPUs), as well as significant electricity resources and costs, to process vast amounts of documents and use deep learning to determine relationships between vocabulary words. While new documents are created daily, it is not realistic to reflect all of them in large-scale language models without any delay. Furthermore, the vast amount of documents processed mainly consists of publicly available documents on the web, but they do not include sensitive information such as internal company documents or electronic medical records from hospitals. Therefore, it is said that the documents that can be collected represent only a small fraction of all documents that exist on Earth.

[0030] To effectively utilize large-scale language models based on the latest information in the field, additional up-to-date information is required, and currently two approaches are being taken. One is called additional learning (fine-tuning) or transfer learning, where additional information is used to train only a part of the output layer (fine-tuning) or only the final layer (transfer learning) of an existing large-scale language model, creating a specialized large-scale language model. While this results in a domain-specific large-scale language model with high utility, it still requires considerable expense and technical skills, even if not as extensive as training a large-scale language model from scratch. Furthermore, sensitive information such as personal information and descriptions of medical conditions included in the additional information is used for training, posing a risk of it being accessed outside the organization. To prevent this, it is necessary to exclusively build and operate the additionally trained large-scale language model within the company or hospital.

[0031] The other approach is the Retrieval-Augmented Generation (RAG) approach used in this invention. The large-scale language model itself is not modified. Additional information is stored in a separate database (the RAG database), the information needed to answer a question is retrieved from it, the text of the retrieved chunks is added to the question, and the large-scale language model is asked to produce an answer. Even when the pool of potentially useful data is large, the additional information needed to answer any given question is limited, so only that information is retrieved from the RAG database and added to the question as context in the question input field of the large-scale language model. Because no training changes are made to the model and only context is added to the question input field, the cost and technical hurdles are low. The approach is also highly responsive, since urgent information such as urgent drug side-effect information can be reflected immediately.

[0032] Figure 3 shows the processing flow. A broker program, such as a chat application, receives a question from the user (1) and searches the RAG database for information related to the question (2). The broker program then sends the search results (3) together with the question (4) to the LLM (large language model), receives an answer from the LLM (5), and presents it to the user (6). However, although the length of a chunk's text string is configurable, it is subject to limits, and a chunk may not be able to hold all the necessary information. Conversely, if a chunk is too long, its feature vector loses focus, and searches may not match it properly. Also, if a keyword straddles the boundary between two chunks, it cannot be properly captured by feature vectorization, making it difficult to find. Furthermore, images and illustrations in scanned pages are not yet recognized and converted into text well enough, so they may not be found by searches. Moreover, a RAG database robust enough for practical use requires a large-scale system built from a large number of document chunks. The required document set varies greatly with the field of interest; for example, the documents needed in the medical field differ significantly from those needed in history or literature. Building a single RAG database from documents covering all fields would therefore be redundant and could place an unnecessary burden on database construction and retrieval.
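The chunk-boundary problem can be reproduced with a simple fixed-length character chunker. This is a hypothetical chunking rule chosen for clarity (real systems often split on tokens or sentences, sometimes with overlap), and the example text is invented.

```python
def char_chunks(text: str, size: int) -> list[str]:
    """Fixed-length character chunks with no overlap."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def chunks_containing(text: str, keyword: str, size: int) -> list[int]:
    """Indices of the chunks that contain the whole keyword."""
    return [i for i, c in enumerate(char_chunks(text, size)) if keyword in c]


TEXT = "Report: warfarin interacts with aspirin."

# size=12 gives "Report: warf", "arin interac", "ts with aspi", "rin."
print(chunks_containing(TEXT, "warfarin", 12))  # []
# size=20 keeps both drug names intact
print(chunks_containing(TEXT, "warfarin", 20))  # [0]
```

With 12-character chunks neither drug name survives intact in any chunk, so neither can be matched by a chunk-level search; with 20-character chunks both do.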

[0033] Figure 4 shows the configuration of the RAG database. Scanned images of books, PDF images of web documents, and similar material from sources of potential additional information (information resources) are recorded in a database, with each individual page image assigned a reference address such as "information resource name + page" (page image database). Text is then extracted from each page image (page text) and recorded along with the reference address (page text database). The text of each page is divided into small fragments (chunks), a feature vector is calculated for each chunk, and the vector is recorded in the feature vector database together with the reference address. If necessary, for example when recording capacity is severely limited, the page text may be compressed with the summarization function of a large-scale language model before being recorded.
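The three-database layout of Figure 4 can be modeled with in-memory structures. This sketch assumes plain dicts and a list in place of real databases, a caller-supplied `extract_text` function in place of OCR, a word-count vector in place of an embedding model, and a 50-word default chunk size; the class and its names are hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RagDatabase:
    """In-memory stand-in for the page image, page text, and feature vector databases."""

    page_images: dict[str, bytes] = field(default_factory=dict)       # address -> page image
    page_texts: dict[str, str] = field(default_factory=dict)          # address -> page text
    vectors: list[tuple[str, Counter]] = field(default_factory=list)  # (address, chunk vector)

    def add_page(self, resource: str, page: int, image: bytes,
                 extract_text: Callable[[bytes], str], chunk_words: int = 50) -> str:
        """Record one page in all three stores and return its reference address."""
        address = f"{resource} p.{page}"  # "information resource name + page"
        self.page_images[address] = image
        text = extract_text(image)
        self.page_texts[address] = text
        words = text.split()
        for i in range(0, len(words), chunk_words):
            chunk = " ".join(words[i:i + chunk_words])
            # Each chunk vector carries its page's address, so a hit leads back to the page.
            self.vectors.append((address, Counter(re.findall(r"\w+", chunk.lower()))))
        return address
```

Keeping the address on every chunk vector, rather than only the chunk text, is what later allows a search hit to be expanded to the whole page.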

[0034] Here, the reference address is formatted as a book title followed by page numbering, as is common in books, as shown in Figure 5. However, unlike books which have physical constraints, there is no need to be particular about the display format in web documents, so depending on the granularity of the information, it may be numbered at the paragraph level, or conversely, at the section or chapter level. The page text can be unstructured plain text, but in the case of complex content, tagged notation such as XML, JSON, or Markdown is desirable to clearly indicate the structure of the document. Different feature vector calculation software recommends different notations. This invention uses JSON notation, but any notation may be used.
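A page-text record in JSON might look like the one produced below. The source states only that JSON is used and that the address combines the information resource name with a page (or paragraph, section, or chapter) number; the field names `address`, `resource`, `page`, and `sections` and the `p.`/`para.` address format are assumptions.

```python
import json
from typing import Optional


def reference_address(resource: str, page: int, paragraph: Optional[int] = None) -> str:
    """Page-level address by default; paragraph-level when finer granularity is wanted."""
    address = f"{resource} p.{page}"
    return f"{address} para.{paragraph}" if paragraph is not None else address


def page_record(resource: str, page: int, sections: list[dict]) -> str:
    """Serialize one page's text with its reference address as JSON."""
    record = {
        "address": reference_address(resource, page),
        "resource": resource,
        "page": page,
        # Tagged structure (heading/text pairs) instead of unstructured plain text.
        "sections": sections,
    }
    return json.dumps(record, ensure_ascii=False, indent=2)


print(page_record("Drug Handbook", 12,
                  [{"heading": "Side effects", "text": "Dizziness, nausea."}]))
```

`ensure_ascii=False` keeps non-ASCII page text (for example Japanese) readable in the stored record.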

[0035] Currently, it is not possible to perfectly represent images, videos, and illustrations included in page images with text. Therefore, human judgment and understanding are ultimately required. It is also useful for users to judge the aforementioned images, videos, and illustrations and add explanatory text to the corresponding page text as needed. Thus, while a page image database after extracting page text is not essential, it can be useful, and the relevant page images should be referenced as needed.

[0036] Figure 6 shows the search procedure for the RAG database. When a question is entered in the prompt input box, the broker program calculates the feature vector of the question and extracts a list of similar feature vectors from the vector database (RAG database). Similarity between feature vectors is often measured by cosine similarity, the dot product of the two vectors divided by the product of their magnitudes, but the Manhattan distance, the sum of the absolute differences between corresponding elements of the two vectors, may also be used (in which case a smaller distance means greater similarity). The similarity threshold used for extraction can be adjusted to the situation: if the extracted list is too large, the threshold can be raised to narrow it down, and if it is too small, the threshold can be lowered to enlarge it. In some cases, the number of items can be fixed, for example at the top 10, and the threshold adjusted automatically so that the number of extracted items matches that value.
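The two similarity measures and both ways of controlling the size of the extracted list (a fixed threshold, or a threshold set automatically so that a fixed number of items is returned) can be sketched as follows. Vectors are plain Python lists and the function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product divided by the product of the vector lengths (higher = more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def manhattan_distance(a: list[float], b: list[float]) -> float:
    """Sum of absolute element-wise differences (lower = more similar)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def extract_by_threshold(query: list[float], vectors: dict[str, list[float]],
                         threshold: float) -> list[str]:
    """IDs whose cosine similarity to the query is at least `threshold`, best first."""
    scored = [(cosine_similarity(query, v), key) for key, v in vectors.items()]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [key for score, key in scored if score >= threshold]


def extract_top_n(query: list[float], vectors: dict[str, list[float]], n: int) -> list[str]:
    """Set the threshold to the n-th best score so that about n items are extracted.

    Items tied with the n-th score are kept, so slightly more than n may be returned.
    """
    if n <= 0 or not vectors:
        return []
    scores = sorted((cosine_similarity(query, v) for v in vectors.values()), reverse=True)
    threshold = scores[min(n, len(scores)) - 1]
    return extract_by_threshold(query, vectors, threshold)
```

For example, with `vectors = {"A": [1.0, 0.0], "B": [0.7, 0.7], "C": [0.0, 1.0]}` and query `[1.0, 0.1]`, a threshold of 0.5 extracts `["A", "B"]`, and `extract_top_n(..., 1)` extracts `["A"]`.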

[0037] In conventional RAG systems, when a list of feature vectors corresponding to multiple chunks was extracted, the text of the chunk from which each feature vector originated was added to the question text. However, as mentioned above, the length of chunk text strings is limited, so the necessary information sometimes could not be included. Also, if a keyword straddled the boundary between two chunks, it could not be properly captured in either chunk's feature vector. Furthermore, content in scanned pages, such as pictures and illustrations, sometimes could not be adequately converted into text. To overcome these shortcomings, the present invention employs the following procedure.
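The chunk-boundary problem can be illustrated with a toy example. This sketch assumes naive, non-overlapping, fixed-length chunking by characters, with a deliberately tiny chunk size chosen only for demonstration. Practical systems usually chunk by tokens and often overlap adjacent chunks, which reduces but does not eliminate the problem.

```python
def fixed_length_chunks(text, size):
    """Split text into consecutive, non-overlapping chunks of `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def keyword_split_by_chunking(text, keyword, size):
    """True if the keyword appears in the text but no single chunk contains it whole."""
    chunks = fixed_length_chunks(text, size)
    return keyword in text and not any(keyword in chunk for chunk in chunks)


text = "Treatment of atrial fibrillation includes anticoagulation."
print(fixed_length_chunks(text, 16)[:2])  # ['Treatment of atr', 'ial fibrillation']
print(keyword_split_by_chunking(text, "atrial", 16))        # True
print(keyword_split_by_chunking(text, "fibrillation", 16))  # False
```

Here "atrial" is cut into "atr" and "ial", so neither chunk's feature vector reflects the keyword. Transcribing the whole page, as in the procedure that follows, restores the intact text.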

[0038] The system collects the reference addresses attached to the extracted feature vectors, removes duplicate reference addresses, and transcribes the entire page text indicated by each remaining reference address into the prompt input box along with the question. The large-scale language model is then prompted to respond on that basis. Because whole pages are supplied rather than isolated chunks, this procedure resolves the problem of individual chunks not containing all the necessary information, and it also adds the text lying between the matched chunks to the question. This feature is useful in education for students and others, as it promotes deeper understanding by presenting not only the solution to a specific problem but also the background information behind it.
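The deduplication and transcription steps might look like the following minimal sketch. It assumes the extracted hits arrive as (similarity, reference_address) pairs ordered best first, and that the page text database can be read as a dictionary from reference address to full page text. The prompt wording is hypothetical.

```python
def unique_addresses(hits):
    """Drop duplicate reference addresses, keeping the first (best-ranked) occurrence.

    `hits` is a list of (similarity, reference_address) pairs, best first.
    """
    return list(dict.fromkeys(ref for _, ref in hits))


def build_prompt(question, hits, page_text_db):
    """Transcribe the whole page text for each distinct address, then the question.

    `page_text_db` maps reference address -> full page text (assumed to be a dict).
    """
    pages = [f"[{ref}]\n{page_text_db[ref]}" for ref in unique_addresses(hits)]
    return (
        "Answer the question based on the following reference pages.\n\n"
        + "\n\n".join(pages)
        + f"\n\nQuestion: {question}"
    )
```

`dict.fromkeys` preserves insertion order, so when two chunks from the same page both match, that page is transcribed only once, at the rank of its best-scoring chunk.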

[0039] If images, videos, illustrations, or the like are not adequately represented in text, the system retrieves the image of the relevant page from the page image database and displays it to the user. The user then adds a description of the image, video, or illustration to the question. If necessary, this description may also be added to the page text. This makes it possible to use information contained in images, videos, and illustrations that previously could not be fully exploited.
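The image fallback can be sketched as follows. The sketch assumes three things that the specification does not define: a dictionary from reference address to page image location, a set of addresses whose text conversion was flagged as insufficient, and a `describe` callback (hypothetical) that stands in for the user viewing the displayed image and typing a description.

```python
from typing import Callable, Dict, Iterable, Set


def add_image_descriptions(
    question: str,
    refs: Iterable[str],
    page_image_db: Dict[str, str],
    flagged: Set[str],
    describe: Callable[[str, str], str],
) -> str:
    """Append user-written descriptions for pages flagged as poorly converted to text."""
    notes = []
    for ref in refs:
        if ref in flagged:
            image = page_image_db[ref]  # page image retrieved for display to the user
            notes.append(f"[{ref}] Figure description: {describe(ref, image)}")
    if not notes:
        return question
    return question + "\n\n" + "\n".join(notes)
```

The same description string could also be written back into the page text database so that later questions benefit from it without another manual step.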

[0040] Building a RAG database of practical quality requires a large-scale database based on a massive collection of document chunks. For example, the medical field alone has well over 50 major specialties, such as internal medicine, surgery, and obstetrics and gynecology, and these specialties tend to become further subdivided as medicine advances. Consolidating everything into a single RAG database would place a tremendous burden on data storage and retrieval. Moreover, the required document sets differ greatly depending on the field of interest. For example, even within medicine, there is little overlap between the document sets for abdominal surgery and psychiatry, and the document sets required for medicine differ significantly from those for history and literature. Therefore, as shown in Figure 7, constructing multiple RAG databases, one for each field of interest, and having the broker program search one or more RAG databases relevant to the query as needed avoids the search inefficiency caused by a single massive RAG database. Conversely, if the RAG databases are too finely subdivided, numerous searches become necessary, which also reduces search efficiency, so documents from closely related fields should be consolidated into the same RAG database.
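One way the broker program might choose which field-specific databases to search is sketched below. The routing rule is an assumption swapped in for illustration: each database keeps a centroid (the mean feature vector of its chunks), and only databases whose centroid is similar enough to the query are searched. The specification does not say how relevant databases are selected; `FieldDatabase` and the 0.5 cut-off are hypothetical.

```python
import math
from dataclasses import dataclass, field


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


@dataclass
class FieldDatabase:
    """One RAG database per field of interest (hypothetical structure)."""
    name: str
    centroid: list  # assumed: mean feature vector of the field's chunks, used for routing
    entries: list = field(default_factory=list)  # (feature_vector, reference_address)


def select_databases(query, databases, min_similarity=0.5):
    """Broker step: keep only the field databases whose centroid resembles the query."""
    return [db for db in databases if cosine_similarity(query, db.centroid) >= min_similarity]


def search_selected(query, databases, k=10, min_similarity=0.5):
    """Search the selected databases and merge their hits into one ranked list."""
    hits = []
    for db in select_databases(query, databases, min_similarity):
        hits.extend((cosine_similarity(query, vec), ref) for vec, ref in db.entries)
    return sorted(hits, reverse=True)[:k]
```

Only the entries of the selected databases are scanned, which is where the saving comes from. Merging closely related fields into one database keeps the number of databases selected per query small.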

[0041] In educational settings, when explanations are generated for exam answers, an explanation based on a general RAG database is appropriate for students who answered a question incorrectly and whose performance is weak. However, such explanations risk leaving high-achieving students intellectually unsatisfied. For these students, it is useful to build a RAG database containing more advanced and fundamental content and to generate in-depth explanations from it.

[0042] Although an answer can be generated afresh for every question, doing so consumes considerable computing resources and incurs cost. Therefore, as shown in Figure 8, pairs of questions and their corresponding answers are stored in a RAG database serving as an FAQ database. When a question is entered, the system first searches the FAQ database for answers closely related to that question. Only if the retrieved answers are unsatisfactory is a new answer created using the procedure of the present invention, and the result is likewise registered in the FAQ database. When registering in the FAQ database, either the feature vector of the question is registered in the RAG database with a link to the answer, or the question and answer pairs are chunked and their feature vectors are registered.
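The FAQ-first flow might be sketched as follows, using the first of the two registration options: the question's feature vector is stored together with a link to its answer. In this sketch, `embed` and `generate` are hypothetical callables standing in for the feature vector software and the full RAG procedure. The 0.9 similarity cut-off and the `is_satisfactory` check are assumptions, since the specification leaves both unspecified.

```python
import math
from typing import Callable, List, Optional, Tuple


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


class FAQDatabase:
    """Stores (question feature vector, question, answer); the vector links to the answer."""

    def __init__(self) -> None:
        self.items: List[Tuple[list, str, str]] = []

    def lookup(self, q_vec: list, min_similarity: float) -> Optional[str]:
        """Return the answer of the most similar stored question, if similar enough."""
        best = max(self.items, key=lambda item: cosine_similarity(q_vec, item[0]), default=None)
        if best is not None and cosine_similarity(q_vec, best[0]) >= min_similarity:
            return best[2]
        return None

    def register(self, q_vec: list, question: str, answer: str) -> None:
        self.items.append((q_vec, question, answer))


def answer_question(question: str, embed: Callable[[str], list],
                    generate: Callable[[str], str], faq: FAQDatabase,
                    min_similarity: float = 0.9,
                    is_satisfactory: Callable[[str], bool] = lambda a: True) -> str:
    """Reuse an FAQ answer when one fits; otherwise generate a new one and register it."""
    q_vec = embed(question)
    cached = faq.lookup(q_vec, min_similarity)
    if cached is not None and is_satisfactory(cached):
        return cached
    new_answer = generate(question)  # e.g. the full page-text RAG procedure
    faq.register(q_vec, question, new_answer)
    return new_answer
```

The language model is called only on a cache miss or when the cached answer is rejected, which is the source of the resource savings described above.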

[0043] Furthermore, since there is a risk of information leakage if the content entered in the prompt is used to train a large-scale language model, it is useful to explicitly declare in the prompt that training is prohibited, or to use a paid version of a large-scale language model that guarantees that the content will not be used for training.

[0044] Although embodiments have been described above, the specific configuration of the present invention is not limited to these embodiments, and design changes and the like that do not depart from the gist of the invention are also included in the present invention. For example, although the present invention is described mainly in terms of a chunk feature vector database, the present invention also covers cases in which information useful for generation, such as information extracted from an ordinary relational database using SQL statements or reference information from the web, is added to the feature vector database as appropriate.
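Adding relational data to the feature vector database could be sketched with Python's built-in `sqlite3` module, as below. The table, its contents, the "sql:<table>#row<n>" address format, and the `embed` callable are all hypothetical toy choices. Each result row becomes a text chunk with its own reference address and is stored the same way as an ordinary page chunk, so it can later be transcribed into the prompt.

```python
import sqlite3
from typing import Callable, List, Tuple


def rows_to_chunks(conn: sqlite3.Connection, sql: str, label: str) -> List[Tuple[str, str]]:
    """Run a SQL query and turn each result row into a (reference_address, text) chunk."""
    cursor = conn.execute(sql)
    columns = [desc[0] for desc in cursor.description]
    return [
        (f"{label}#row{i}", "; ".join(f"{c}: {v}" for c, v in zip(columns, row)))
        for i, row in enumerate(cursor.fetchall(), start=1)
    ]


def add_to_databases(vector_db: list, text_db: dict, chunks,
                     embed: Callable[[str], list]) -> None:
    """Store each chunk's feature vector and its text under the same reference address."""
    for ref, text in chunks:
        vector_db.append((embed(text), ref))
        text_db[ref] = text


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE departments (name TEXT, building TEXT)")
conn.executemany("INSERT INTO departments VALUES (?, ?)",
                 [("Cardiology", "Building A"), ("Psychiatry", "Building C")])
chunks = rows_to_chunks(conn, "SELECT name, building FROM departments ORDER BY name",
                        "sql:departments")
print(chunks)
```

Web reference information could be handled the same way, with the source URL serving as the reference address. That case is not sketched here.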

Claims

1. A large-scale language model system comprising: (1) page image acquisition means for acquiring images of the individual pages of information sources to be referenced separately from the large-scale language model during inference, along with reference addresses to said page images; (2) page text database recording means for extracting text strings from the page images and recording them together with the reference addresses; (3) feature vector database recording means for dividing the extracted text into small sections (chunks), calculating their feature vectors, and recording them together with the reference addresses; (4) related feature vector extraction means for extracting groups of feature vectors that are highly related to the question sentence given to the large-scale language model, along with their reference addresses; (5) duplicate reference address removal means for removing duplicate reference addresses from those associated with the extracted related feature vectors; and (6) related text transcription means for extracting the page text specified by each remaining reference address from the page text database and transcribing it together with the question sentence into the question input field; whereby an answer sentence is obtained by performing operations (1) to (6) above.

2. The large-scale language model system according to claim 1, characterized by comprising a page image database recording means that records each acquired page image along with a reference address to the page image.

3. The large-scale language model system according to claim 1 or 2, characterized by comprising a page image viewing means for displaying and providing access to the page image specified by the reference address.

4. The large-scale language model system according to claim 2, characterized by comprising comment transcription means for extracting the page image specified by a reference address from the page image database recording means, accepting input of comments on figures and tables that were not adequately converted into text, and transcribing those comments together with the question text into the question input field.

5. The large-scale language model system according to claim 2, comprising a plurality of page image acquisition means, page text database recording means for the page image database recording means, and a plurality of feature vector database recording means for the page text database recording means, and further comprising a plurality of RAG database search means for searching for and extracting relevant feature vectors in any of the feature vector database recording means.

6. The large-scale language model system according to claim 1, comprising question and answer recording means for recording each question and the obtained answer in an FAQ database, wherein, for a new question, a question similar to the new question is first searched for in the question and answer recording means, and if a similar question is found, the answer recorded in the FAQ database for that similar question is used as the answer to the new question.