Efficient adjustment of chunk impact in retrieval augmented generation

A self-optimizing system improves generative AI responses by using user feedback to update relevance scores for text chunks, addressing the challenge of inaccurate and biased outputs in Retrieval Augmented Generation.

JP2026104780A · Pending · Publication Date: 2026-06-25 · SAP SE (エスアーペーエスエー)

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
SAP SE (エスアーペーエスエー)
Filing Date
2025-09-18
Publication Date
2026-06-25

AI Technical Summary

Technical Problem

Modern generative AI models struggle to provide domain-specific responses due to their broad training data, leading to inaccurate and biased outputs when using Retrieval Augmented Generation (RAG) without effective management of the RAG corpus.

Method used

A self-optimizing system that collects user ratings of text chunks to update relevance scores, prioritizing high-scoring chunks for context in prompts, thereby improving response accuracy and reducing hallucinations.

Benefits of technology

The system enhances response quality by dynamically adjusting the RAG corpus based on user feedback, ensuring more reliable and relevant outputs over time.

✦ Generated by Eureka AI based on patent content.


Abstract

[Problem] To provide a system that efficiently manages a RAG corpus used to prompt generative AI models to produce improved responses. [Solution] Systems and methods include receiving a query from a user; determining, from a plurality of stored text portions, first text portions that are semantically similar to the query; determining a first score associated with each of the first text portions; generating a first prompt based on the first scores, wherein the first prompt includes the query and the first text portions; sending the first prompt to a text generation model; receiving a response to the first prompt from the text generation model; presenting the response and the first text portions; receiving, from the user, an evaluation of one of the presented first text portions; and updating the first score associated with that first text portion based on the evaluation.

Description

Technical Field

[0001] This disclosure relates to efficient adjustment of chunk impact in retrieval augmented generation.

Background Art

[0002] Modern generative AI models perform sophisticated generation of text, images, and even audio based on prompts sent by users. The most powerful of these models are trained on vast corpora of available data so as to be generally usable for all intended purposes. Due to the breadth of knowledge obtained through such training, it can be difficult to narrow the range of model responses to a desired domain. Furthermore, these models may not incorporate the specialized knowledge required to appropriately respond to some prompts.

[0003] To address the above, one approach involves fine-tuning the generative model using specific information not included in the initial training corpus. This approach can be costly and may not yield the desired results. Instead, Retrieval Augmented Generation (RAG) involves using a search algorithm to retrieve query-specific information from a RAG corpus. The retrieved data is then incorporated into the context of the prompt, which also includes the query, and the prompt is input into the generative model. RAG may increase response accuracy and reduce hallucinations that may arise from queries related to topics for which the generative model has not been trained.

[0004] RAG's performance depends on the quality of the data retrieved for inclusion in the prompt. For example, if the retrieved data is inaccurate, biased, and/or outdated, the resulting response may also be inaccurate, biased, and/or outdated. Managing the RAG corpus to filter out this undesirable data would be prohibitively expensive given the volume of data involved. Even if this information is filtered out, the RAG data source may still contain information that, while accurate, hinders the generative model from generating useful responses.

Summary of the Invention

Means for Solving the Problem

[0005] What is needed is a system that efficiently manages the RAG corpus for use in prompting generative AI models to produce improved responses.

Brief Description of the Drawings

[0006]
[Figure 1] A self-optimizing retrieval augmented generation system according to several embodiments.
[Figure 2] A flowchart of a process for self-optimizing retrieval augmented generation according to several embodiments.
[Figure 3] A user interface that receives user queries according to several embodiments.
[Figure 4] Data being loaded into a vector database according to several embodiments.
[Figure 5] A partial tabular representation of chunk score information according to several embodiments.
[Figure 6] Prompting of a text generation model according to several embodiments.
[Figure 7] A user interface that presents a response and an overall chunk score according to several embodiments.
[Figure 8] A user interface that receives chunk evaluations from the user according to several embodiments.
[Figure 9] A partial tabular representation of chunk score information according to several embodiments.
[Figure 10] A cloud-based implementation according to several embodiments.

Description of Embodiments

[0007] The following description is provided so that those skilled in the art may create and use the embodiments described. However, various modifications will be immediately apparent to those skilled in the art.

[0008] Several embodiments implement an effective feedback loop that collects user ratings of RAG text chunks and instructs the model to utilize text chunks based on these ratings. For example, a query is received from a user, and text chunks that are semantically similar to the query are identified. Where available, the relevance score for each identified text chunk is retrieved from a data store. The relevance score for a given text chunk is based on previously received user ratings of that text chunk and is intended to represent the reliability and/or relevance of the text chunk for use in RAG.

[0009] A prompt is generated that includes the query and text chunks as context for the query. In some cases, identified text chunks associated with low scores are ignored and not included in the context. The context also includes instructions to the model to give more authoritative weight to chunks associated with higher scores than to chunks with lower scores, while generating a response to the query.

[0010] Upon receiving a response, the user also receives a display of the text chunks used to generate the response, and, if any, a display of their respective scores. The user may provide a rating for one or more of the text chunks, which is then used to update the stored scores of the rated text chunks. By collecting ratings of text chunks from consumers of the responses generated based on them, the scores associated with the text chunks will, over time, more accurately reflect their reliability and usefulness in generating responses. Then, by utilizing such scores to instruct the generation model, the quality of responses generated by the model will gradually improve.

[0011] Figure 1 shows a self-optimizing retrieval augmented generation system according to several embodiments. Each of the illustrated components may be implemented using any suitable combination of local, on-premises, cloud-based, or distributed computing hardware and/or software that is known or becomes known (including, for example, distributed storage and/or compute nodes). Each component described herein may be run by one or more physical and/or virtualized servers.

[0012] Two or more components in Figure 1 may be located in the same place. In some embodiments, two or more components are implemented by a single computing device. One or more components may be implemented by a cloud service (e.g., Software-as-a-Service, Platform-as-a-Service). A cloud-based implementation of any component in Figure 1 may flexibly allocate computing resources according to demand, needs, price, and/or any other criteria. Each component may be run by an execution environment that includes one or more services, virtual machines, a collection of container orchestration systems, etc. Such an execution environment may provide applications running within it with an operating system, services, I/O, storage, libraries, frameworks, etc.

[0013] Generally, the system in Figure 1 allows a user 105 to send queries to a text generation model 110 and receive responses from it. The text generation model 110 may include a neural network trained to generate text based on input text. Embodiments may implement a generative model that generates any type of data based on input prompts, including but not limited to image, video, and audio data.

[0014] According to several embodiments, model 110 is a Large Language Model (LLM) that follows a transformer architecture. Non-exhaustive examples of LLMs include GPT-4, LaMDA, and Claude. The transformer architecture may include, for example, an embedding layer, a feedforward layer, a recurrent layer, and an attention layer. The embedding layer creates embeddings from the input text that are intended to capture the semantic and syntactic meaning of the input text. The feedforward layer consists of multiple fully connected layers that transform the embeddings. Some feedforward layers are designed to generate a representation of the intent of the text input. The recurrent layer interprets the tokens (e.g., words) of the input text sequentially to capture the relationships between tokens. The attention layer may use a self-attention mechanism that can consider different parts of the input text and/or the entire context of the input text to generate the output text. Generally, each layer contains nodes whose outputs are coupled to the inputs of nodes in the next layer to form a directed, weighted graph. Each node receives an input, changes its internal state according to that input, and produces an output corresponding to the input and its internal state.

[0015] The text generation model 110 may be implemented by any other representation, for example, executable program code, a set of hyperparameters defining the model structure and a corresponding set of weights, or an input-to-output mapping learned as a result of training. The model 110 may be publicly available or deployed within a trusted environment. Similarly, the text generation model 110 may be trained on publicly available data and / or personal data.

[0016] User 105 operates user device 115 to send queries to query server 120. User device 115 may include, for example, a laptop computer, desktop computer, smartphone, or tablet computer. Query server 120 may operate to provide a user interface to user device 115 for query submission, chunk evaluation, etc. According to some embodiments, user device 115 runs a web browser that accesses web pages provided by query server 120. Such a web browser may run a front-end application corresponding to the back-end application of query server 120. In some embodiments, query server 120 is a chatbot application.

[0017] The query server 120 may call a chunk retriever 125 to request text chunks that are semantically similar to the query received from the user device 115. The chunk retriever 125 performs a similarity search to identify these text chunks from within the chunk database 130. The chunk database 130 may include a vector database populated with data based on the text of the text data 135. The text data 135 may include any type of text data that can be used in RAG as described above.

[0018] As is known, the text data 135 is decomposed into text portions, i.e., "chunks", using any chunking algorithm that is known or becomes known. Each chunk is converted into a multi-dimensional vector (i.e., an embedding) intended to capture the semantic and syntactic meaning of the chunk. The conversion is performed such that the vectors of semantically similar chunks are close to each other in the vector space, and the vectors of semantically dissimilar chunks are far from each other in the vector space. The chunk database 130 stores each chunk in association with the multi-dimensional vector generated therefrom. Thus, the chunk retriever 125 converts the received query into a multi-dimensional vector, identifies one or more vectors in the database 130 that are closest to the query vector (e.g., using a cosine similarity measure), and retrieves the text chunks stored in the database 130 in association with the identified vectors.
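
The similarity search described in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the in-memory list `chunk_db`, the function names, and the toy three-dimensional vectors are hypothetical, and a real system would obtain embeddings from an embedding model and query a vector database.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)

def nearest_chunk(query_vec, chunk_db):
    """Return (chunk_text, similarity) for the stored vector closest to query_vec."""
    text, vec = max(chunk_db, key=lambda item: cosine_similarity(query_vec, item[1]))
    return text, cosine_similarity(query_vec, vec)

# Hypothetical stand-in for chunk database 130: (chunk text, embedding) pairs.
chunk_db = [
    ("Paris is the capital of France.", [0.9, 0.1, 0.0]),
    ("Kubernetes schedules containers.", [0.0, 0.2, 0.9]),
]
```

Calling `nearest_chunk([1.0, 0.0, 0.0], chunk_db)` returns the first chunk, because its vector points in nearly the same direction as the query vector.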

[0019] The query server 120 may receive the text chunks identified by the chunk retriever 125 and request score information for each of the text chunks from the chunk scoring component 140. Next, the chunk scoring component 140 requests score information from the chunk score data store 145. The chunk score data store 145 may include a key-value data store where the text chunk is a key to the associated score information. The score information may indicate the reliability and usefulness of the text chunk for generating a suitable response using the model 110. The score information associated with the text chunk may be updated based on user evaluations of the text chunk received during operation, as described below.

[0020] The query server 120 passes the text chunks and their score information to the prompt generation component 150. The prompt generation component 150 generates a prompt (e.g., consisting of a system prompt and a user prompt) that includes the query and includes the text chunks as context for the query. The context includes instructions to prioritize text chunks associated with higher scores over text chunks associated with lower scores. The context may include the scores for each text chunk and may order the text chunks according to priorities, etc.

[0021] The prompt is sent to the model 110, and the model 110 operates based on its training to generate a response. The response is returned to the query server 120 for presentation to the user 105. The response may, in some embodiments, be presented with an overall score determined based on the score information of the text chunks included in the prompt. One or more of the text chunks included in the prompt may also be presented to the user along with their corresponding score information (e.g., the scores determined based on their score information).

[0022] The user 105 may operate the user device 115 to input a user evaluation for one or more of the presented text chunks. The query server 120 provides the evaluation to the chunk scoring component 140, and the chunk scoring component 140 updates the score information for the corresponding text chunks stored in the data store 145. The updated score information may be used in the generation of the next prompt that includes the corresponding text chunks.

[0023] The chunk synchronizer 170 may periodically update the chunk score data store 145 based on changes in the chunk database 130. For example, the chunk synchronizer 170 may delete keys for text chunks that no longer exist in the chunk database 130, or add keys for newly stored text chunks. In some embodiments, the chunk synchronizer 170 may also delete old score information from the chunk score data store 145.
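
A rough sketch of the synchronization described in this paragraph follows. The dictionary layout, the function name `synchronize`, and the `max_age_seconds` parameter are assumptions made for illustration; in particular, "deleting old score information" is interpreted here as resetting an entry to an empty state, which is only one possible reading.

```python
EMPTY = {"sum": 0, "count": 0, "timestamp": None}

def synchronize(score_store, chunk_db_keys, now, max_age_seconds=None):
    """Align the keys of the score store with the chunks currently in the chunk database."""
    current = set(chunk_db_keys)
    # Delete keys for text chunks that no longer exist in the chunk database.
    for key in list(score_store):
        if key not in current:
            del score_store[key]
    # Add keys for newly stored text chunks, with empty score information.
    for key in current:
        score_store.setdefault(key, dict(EMPTY))
    # Optionally reset score information whose last rating is too old (assumed policy).
    if max_age_seconds is not None:
        for key, info in score_store.items():
            ts = info["timestamp"]
            if ts is not None and now - ts > max_age_seconds:
                score_store[key] = dict(EMPTY)
    return score_store
```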

[0024] Figure 2 is a flowchart of process 200 for performing self-optimizing retrieval augmented generation according to several embodiments. Process 200 and other processes described herein may be performed using any suitable combination of hardware and software. Software program code embodying these processes may be stored on any non-transitory tangible medium, including but not limited to fixed disks, volatile or non-volatile random access memory, DVDs, flash drives, or magnetic tape, and may be executed by any number of processing units, including but not limited to processors, processor cores, and processor threads. Such processors, processor cores, and processor threads may be implemented by virtual machines provided in a cloud-based architecture. Embodiments are not limited to the examples described below.

[0025] In S205, a text query is received from the user. Figure 3 shows a user interface 300 of an application that may receive a text query in S205. The user may access interface 300 via a web browser and/or via a link provided by another application such as a launchpad. The user may be authenticated before receiving user interface 300. As shown in the figure, the user enters the text query 310 "Is Paris in France?" into interface 300.

[0026] In S210, an embedding is generated from the text query. The generation of the embedding may include providing the text query to an embedding model to generate a multi-dimensional vector representing the semantics of the text query. Next, in S215, text chunks are identified based on the similarity between the query embedding and other embeddings generated from multiple text chunks. The other embeddings may be stored in a vector database in association with the multiple text chunks. S215 may therefore consist of searching the vector database using the query embedding.

[0027] Figure 4 shows data being populated into a vector database 130 according to several embodiments. The text data 135 may include domain-related information that can help the text generation model respond to domain-related queries. The text data 135 may include documents, spreadsheets, program code, etc., in any known format. The chunking component 410 may include any suitable algorithm, known or becoming known, for generating text chunks 420 from the text data 135. One or more text chunks 420 may be generated from each item of the text data 135. The algorithm may include, but is not limited to, a semantic chunking algorithm that divides the text data 135 according to semantic boundaries. For example, the chunking algorithm may convert the text data 135 into tokens consisting of words, subwords, or characters. The chunks 420 may be formed by dividing the text data 135 at natural breakpoints, such as sentences, paragraphs, or attributes. Some of the chunks 420 may contain the same (i.e., overlapping) tokens. For example, if the determined chunk size is 100 tokens, the next chunk may start with 80 tokens from the previous chunk to maintain context between consecutive chunks.
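
The overlapping-chunk example in this paragraph can be illustrated with a simple fixed-size sliding window. This is only a sketch: the function name `chunk_tokens` and its parameters are hypothetical, the token list is assumed to be produced elsewhere, and a semantic chunking algorithm would choose boundaries differently.

```python
def chunk_tokens(tokens, chunk_size=100, overlap=80):
    """Split a token list into fixed-size chunks that share `overlap` tokens
    with the preceding chunk (defaults mirror the 100/80 example in the text)."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    stride = chunk_size - overlap  # how far the window advances each step
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the current window already reaches the end of the text
        start += stride
    return chunks
```

With `chunk_size=4` and `overlap=2`, ten tokens yield four chunks, each beginning with the last two tokens of the chunk before it.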

[0028] The embedding model 430 generates an embedding based on each of the chunks 420, resulting in an embedding 440. Each embedding 440 is stored in the vector database 130 in relation to the chunk 420 from which it was generated. As a result, identifying an embedding 440 in the vector database 130 allows for the retrieval of the chunk 420 used to generate the embedding 440.

[0029] One or more text chunks are identified in S215. The identified text chunks may include those text chunks associated with embeddings that have a similarity to the query embedding greater than a threshold. The identified text chunks may be text chunks associated with P of the most similar embeddings, where P is a predefined number. In some embodiments, the identified text chunks may be text chunks associated with P of the most similar embeddings, and their embedding similarity is greater than a threshold.
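
The three selection variants in this paragraph (threshold only, top P only, or both) can be sketched with one helper. The function name and the input format, a list of `(chunk, similarity)` pairs assumed to have been computed by an earlier search step, are illustrative assumptions.

```python
def select_chunks(similarities, top_p=None, threshold=None):
    """Pick chunks by similarity.

    similarities: list of (chunk, similarity) pairs.
    top_p:        keep at most this many of the most similar chunks, if given.
    threshold:    keep only chunks whose similarity exceeds this value, if given.
    """
    ranked = sorted(similarities, key=lambda pair: pair[1], reverse=True)
    if threshold is not None:
        ranked = [pair for pair in ranked if pair[1] > threshold]
    if top_p is not None:
        ranked = ranked[:top_p]
    return [chunk for chunk, _ in ranked]
```

Because the list is sorted before filtering, applying the threshold and then truncating to P gives the same result as truncating first and then applying the threshold.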

[0030] In S220, score information associated with each identified text chunk is retrieved. The score information may be stored in a key-value store whose key is the text chunk. Thus, each identified text chunk may be used to look up the associated score information from such a data store. Figure 5 shows a partial tabular representation of chunk score information 500 according to several embodiments. For example, the score information includes the sum of all user ratings received for the associated text chunk, the number (count) of user ratings received for the text chunk, a timestamp indicating when the last user rating was received for the text chunk, and a score equal to the sum divided by the count. In this example, each user rating is an integer, i.e., one of -2 (unreliable/undesirable), -1, 0, 1, or 2 (reliable/desirable). In S220, if score information is not stored for one of the identified text chunks, no score information is retrieved for that text chunk. In some embodiments, the values of score, sum, and count for such a text chunk are assumed to be 0.
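
One possible in-memory shape for the score information in this paragraph is sketched below. The class name `ScoreInfo`, the field names, and the use of a plain dictionary as the key-value store are assumptions for illustration; the neutral default for unrated chunks follows the embodiment in which score, sum, and count are assumed to be 0.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ScoreInfo:
    total: int = 0                     # sum of all user ratings received
    count: int = 0                     # number of user ratings received
    timestamp: Optional[float] = None  # time of the most recent rating, if any

    @property
    def score(self) -> float:
        """Average rating (sum / count); 0.0 when no ratings exist yet."""
        return self.total / self.count if self.count else 0.0

def lookup_score(store, chunk):
    """Look up score info keyed by chunk text; unrated chunks get neutral defaults."""
    return store.get(chunk, ScoreInfo())
```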

[0031] In S225, a prompt is generated based on the text query, the identified text chunks, and the retrieved score information. The prompt includes the text query and uses the score information to indicate the importance (e.g., level of consideration) that the text generation model should give to each text chunk while creating a response to the text query. Embodiments may use any suitable method for determining and indicating different importance levels for different text chunks. In one embodiment, the prompt provides each identified text chunk with its associated score and an instruction to consider the text chunks according to their scores. In another embodiment, the text chunks are listed in order of their scores, and the prompt instructs the model to consider or weight the text chunks based on the order in which they are listed. One or more identified text chunks may be excluded from the prompt if their score is below a threshold, their count of user evaluations is below a threshold, and/or their timestamp is older than a threshold length of time before the current time. The prompt is sent to the text generation model in S230, and a response is received from the text generation model in S235.
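
A minimal sketch of S225 follows, combining the score-annotated and score-ordered variants with the optional exclusion of low-scoring chunks. The function name, the threshold defaults, and the instruction wording are assumptions, not text from the disclosure.

```python
def build_prompt(query, scored_chunks, min_score=-1.0, min_count=0):
    """Build a prompt from a query and (chunk_text, score, count) triples.

    Chunks whose score or rating count falls below the (assumed) thresholds
    are excluded; the rest are listed from highest to lowest score.
    """
    kept = [c for c in scored_chunks if c[1] >= min_score and c[2] >= min_count]
    kept.sort(key=lambda c: c[1], reverse=True)
    lines = [
        "Answer the question using the context chunks below.",
        "Each chunk has a relevance score from -2 to 2; give more weight to "
        "chunks with higher scores.",
        "",
    ]
    for i, (text, score, _count) in enumerate(kept, start=1):
        lines.append(f"[Chunk {i} | score {score:+.2f}] {text}")
    lines += ["", f"Question: {query}"]
    return "\n".join(lines)
```

Leaving `min_count` at 0 keeps unrated chunks in the context with a neutral score, which lets new chunks be shown to users and accumulate ratings.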

[0032] Figure 6 shows S225-S235 in several embodiments. The prompt generation component 150 receives a text query 610, identified text chunks 612, and score information 614. The score information 614 may include any one or more values retrieved in S220 and/or any one or more values determined therefrom. For example, the score in the score information 614 may be reduced based on the length of time between the current time and the timestamp of the score information.
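
The text does not specify how a score is reduced with age. As one hypothetical choice, the sketch below applies an exponential half-life that shrinks the score's magnitude toward the neutral value 0; the function name, the decay form, and the 30-day default are all assumptions.

```python
def decayed_score(score, timestamp, now, half_life_seconds=30 * 24 * 3600):
    """Shrink a score toward 0 as its most recent rating ages.

    Assumed policy: the magnitude halves every `half_life_seconds`.
    A score with no timestamp is returned unchanged.
    """
    if timestamp is None:
        return score
    age = max(0.0, now - timestamp)  # guard against clock skew
    return score * 0.5 ** (age / half_life_seconds)
```

Under this policy a stale negative score also moves toward 0, so an old poor rating gradually stops suppressing a chunk; a system that should only lower scores over time would need a different rule.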

[0033] The prompt generation component 150 selects a prompt template 620, populates the prompt template 620 with data using the text query 610, identified text chunks 612, and score information 614 to generate a prompt 630, and sends the prompt 630 to the text generation model 160. In some embodiments, the prompt template 620 is sent to the text generation model 160 as a system prompt, and the text query 610, identified text chunks 612, and score information 614 are sent to the text generation model 160 as a user prompt. The text generation model 160 generates and returns a response 640 based on the prompt 630.

[0034] The response and the text chunks used to create the response are presented in S240. Figure 7 shows a user interface 300 presenting a response 710 in several embodiments. The response 710 is presented together with a total score indicator 715. In some embodiments, the total score indicator 715 shows the sum of the scores associated with the text chunks used to generate the response 710. The total score can be determined from the score information of the text chunks in any preferred manner.

[0035] According to some embodiments, the user manipulates the cursor to select indicator 715, for example, by a double-click action. This selection will display window 800 in Figure 8. Window 800 presents the text chunks 810 used to generate response 710, as well as the score 820 for each text chunk 810, which was retrieved in S220.

[0036] In S245, an evaluation is received for one of the text chunks presented in S240. Continuing the present example, the user manipulates cursor 720 to select the star icon on indicator 822 that corresponds to a user evaluation of -2 for text chunk 812. Next, in S250, the score information for the text chunk is updated based on the received evaluation. As shown in Figure 9, upon receiving the user evaluation described with respect to Figure 8, the count associated with the corresponding text chunk is updated from 20 to 21, -2 is added to the sum to make it -22, the timestamp is updated to the time the user evaluation was received (serving as an indicator of the score's freshness), and the score is recalculated as the average of all individual evaluations received so far, i.e., the updated sum divided by the updated count. Advantageously, the updated score information can be used to generate the next prompt containing the corresponding text chunk.
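
The update in S250 is a running average that can be maintained from the stored sum and count alone. The sketch below reproduces the arithmetic of the example (count 20 to 21, sum -20 to -22); the dictionary keys and the function name `apply_rating` are assumptions.

```python
import time

ALLOWED_RATINGS = (-2, -1, 0, 1, 2)

def apply_rating(info, rating, now=None):
    """Return updated score information after one new user rating."""
    if rating not in ALLOWED_RATINGS:
        raise ValueError(f"rating must be one of {ALLOWED_RATINGS}")
    total = info["sum"] + rating
    count = info["count"] + 1
    return {
        "sum": total,
        "count": count,
        "timestamp": time.time() if now is None else now,  # freshness marker
        "score": total / count,  # average of all ratings received so far
    }

# Values from the example: 20 earlier ratings summing to -20, then a rating of -2.
before = {"sum": -20, "count": 20, "timestamp": 0.0, "score": -1.0}
after = apply_rating(before, -2, now=1_700_000_000.0)
```

Here `after["score"]` is -22 / 21, slightly lower than the previous score of -1.0.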

[0037] Figure 10 shows a cloud-based implementation according to several embodiments. The query server 1010 may receive a text query from a user (not shown) and request the embedding model 1020 to generate an embedding corresponding to it. The query server 1010 may search the vector database 1030 for text chunks that are semantically similar to the text query based on the embedding. Next, the query server 1010 queries the chunk score store 1040 for the score associated with each returned text chunk. If no entry is found in the store 1040 for a given text chunk, the given chunk may be assigned a neutral score (for example, 0 for a relevance score ranging from -2 (very unreliable/irrelevant) to +2 (very reliable/relevant)). The query server 1010 may generate a prompt based on the text query, text chunks, and score information and send the prompt to the text generation model 1050.

[0038] Model 1050 returns a response, which is then presented to the user along with the text chunks. The user provides a user rating for one or more of the text chunks, and the score information for the text chunks is updated in the chunk score store 1040. Each of systems 1010 through 1050 may have cloud-based resources in one or more public clouds that provide self-service and immediate provisioning, auto-scaling, security, compliance, and identity management capabilities. Each of systems 1010 through 1050 may have servers or virtual machines in their respective Kubernetes clusters, but the embodiments are not limited thereto.

[0039] The diagram above represents a logical architecture for illustrating processes according to several embodiments, and actual implementations may include more or different components configured in other ways. Other topologies may be used in conjunction with other embodiments. Furthermore, each component or device described herein may be implemented by any number of devices communicating over any number of public and/or private networks. Two or more such computing devices may be located remotely from each other and may communicate with each other via any known type of network and/or via dedicated connections. Each component or device may include any number of hardware and/or software elements suitable for providing the functions described herein and any other functions. For example, any computing device used in the implementation of the system according to some embodiments may include a processor that executes program code so that the computing device operates as described herein.

[0040] All systems and processes discussed herein may be embodied in program code stored on one or more non-transitory computer-readable recording media. Such media may include, for example, hard disks, DVD-ROMs, flash drives, magnetic tapes, and solid-state random-access memory (RAM) or read-only memory (ROM) storage units. Therefore, embodiments are not limited to any particular combination of hardware and software.

[0041] The embodiments described herein are for illustrative purposes only. Those skilled in the art will recognize that other embodiments can be carried out by modifying and altering the embodiments described above.

Explanation of Symbols

[0042]
105 User
110 Text generation model
115 User device
120 Query server
125 Chunk retriever
130 Chunk database
135 Text data
140 Chunk scoring component
145 Chunk score data store
150 Prompt generation component
160 Text generation model
170 Chunk synchronizer
300 User interface
310 Text query
410 Chunking component
420 Text chunks
430 Embedding model
440 Embeddings
610 Text query
612 Text chunks
614 Score information
620 Prompt template
630 Prompt
640 Response
710 Response
715 Overall score indicator
720 Cursor
800 Window
810 Text chunks
812 Text chunk
820 Score
822 Indicator
1010 Query server
1020 Embedding model
1030 Vector database
1040 Chunk score store
1050 Text generation model

Claims

1. A method comprising: receiving a text query from a user; identifying first text chunks based on similarity between the text query and a plurality of text chunks; determining first score information associated with each of the first text chunks; generating a first prompt based on the first score information, wherein the first prompt includes the text query and the first text chunks; sending the first prompt to a text generation model; receiving a response to the first prompt from the text generation model; presenting the response and the first text chunks; receiving, from the user, an evaluation of one of the first text chunks; and updating the first score information associated with the one of the first text chunks based on the evaluation.

2. The method according to claim 1, wherein presenting the response and the first text chunks includes: determining an overall score based on the first score information associated with each of the first text chunks; and presenting an indicator of the overall score along with the response.

3. The method according to claim 2, wherein the first score information associated with each of the first text chunks includes a first score associated with each of the first text chunks, and presenting the response and the first text chunks includes presenting each of the first text chunks along with an indicator of the first score associated with that first text chunk.

4. The method according to claim 3, wherein receiving the evaluation of one of the first text chunks includes receiving a selection of an indicator that is associated with the one of the first text chunks and that is different from the presented indicator of the first score associated with the one of the first text chunks.

5. The method according to claim 1, wherein the first score information associated with each of the first text chunks includes a first score associated with each of the first text chunks, and presenting the response and the first text chunks includes presenting each of the first text chunks along with an indicator of the first score associated with that first text chunk.

6. The method according to claim 5, wherein receiving the evaluation of one of the first text chunks includes receiving a selection of an indicator that is associated with the one of the first text chunks and that is different from the presented indicator of the first score associated with the one of the first text chunks.

7. The method according to claim 1, wherein the first score information associated with each of the first text chunks includes a first score associated with each of the first text chunks, and the first prompt includes an instruction to associate the first text chunks with importance based on the first score of each of the first text chunks.

8. The steps include receiving a second text query from a second user, A step of identifying a second text chunk based on the similarity between the second text query and the plurality of text chunks, A step of determining second score information associated with each of the second text chunks, A step of generating a second prompt based on the second score information, wherein the second prompt includes the text query and the second text chunk. The steps include sending the second prompt to the text generation model, The steps include receiving a second response to the second prompt from the text generation model, The steps include presenting the second response and the second text chunk, The steps include receiving a second evaluation of one of the second text chunks from the second user, A step of updating the second score information associated with one of the second text chunks based on the second evaluation. The method according to claim 1, further comprising:

9. The method according to claim 8, wherein the identified second text chunks include one of the first text chunks, and the determined second score information associated with the one of the first text chunks is the updated first score information.
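
Claims 8 and 9 imply that score information persists between queries, so an evaluation by one user affects what a later user's query retrieves. The sketch below illustrates that with a hypothetical in-memory `ScoreStore`; the default score of 3.0 and the equal-weight blending rule are assumptions, not details from the claims.

```python
class ScoreStore:
    """In-memory score table keyed by chunk id (hypothetical helper)."""

    def __init__(self, default: float = 3.0) -> None:
        self._default = default
        self._scores: dict[str, float] = {}

    def get(self, chunk_id: str) -> float:
        return self._scores.get(chunk_id, self._default)

    def update(self, chunk_id: str, rating: float, weight: float = 0.5) -> float:
        """Move the stored score toward the rating (assumed blending rule)."""
        new_score = (1 - weight) * self.get(chunk_id) + weight * rating
        self._scores[chunk_id] = new_score
        return new_score


def score_information(store: ScoreStore, chunk_ids: list[str]) -> dict[str, float]:
    """Look up the current score for every identified chunk."""
    return {cid: store.get(cid) for cid in chunk_ids}


store = ScoreStore()
first = score_information(store, ["c1", "c2"])    # first user's query
store.update("c1", rating=5.0)                    # first user's evaluation
second = score_information(store, ["c1", "c3"])   # second user's query
```

Here chunk `c1` appears in both result sets, and the score the second user's prompt is built from is the one already updated by the first user's evaluation.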

10. The method according to claim 8, wherein: the first score information associated with each of the first text chunks includes a first score associated with that first text chunk; the step of presenting the response and the first text chunks includes a step of presenting each of the first text chunks together with an indicator of the first score associated with that first text chunk; the second score information associated with each of the second text chunks includes a second score associated with that second text chunk; and the step of presenting the second response and the second text chunks includes a step of presenting each of the second text chunks together with an indicator of the second score associated with that second text chunk.

11. The method according to claim 10, wherein: the step of receiving the evaluation of one of the first text chunks includes a step of receiving a selection of an indicator associated with the one of the first text chunks, the selected indicator being different from the presented indicator of the first score associated with the one of the first text chunks; and the step of receiving the second evaluation of one of the second text chunks includes a step of receiving a second selection of a second indicator associated with the one of the second text chunks, the selected second indicator being different from the presented indicator of the second score associated with the one of the second text chunks.

12. A system comprising: a memory storing executable program code; and at least one processing unit configured to execute the program code to cause the system to perform operations comprising: receiving a query from a user; determining, from a plurality of stored text portions, first text portions that are semantically similar to the query; determining a first score associated with each of the first text portions; generating a first prompt based on the first scores, wherein the first prompt includes the query and the first text portions; sending the first prompt to a text generation model; receiving a response to the first prompt from the text generation model; presenting the response and the first text portions; receiving, from the user, an evaluation of one of the presented first text portions; and updating the first score associated with the one of the first text portions based on the evaluation.
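
The "semantically similar" determination in claim 12 is commonly implemented as a nearest-neighbour search over embedding vectors. The claim does not name a similarity measure, so the following is only a sketch assuming cosine similarity over precomputed vectors; `cosine`, `top_k`, and the toy two-dimensional vectors are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], corpus: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k stored text portions most similar to the query."""
    ranked = sorted(corpus, key=lambda cid: cosine(query_vec, corpus[cid]), reverse=True)
    return ranked[:k]
```

In a real deployment the vectors would come from an embedding model and the search would typically run in a vector database; both are outside the scope of this sketch.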

13. The system according to claim 12, wherein presenting the response and the first text portions includes: determining an overall score based on the first score associated with each of the first text portions; and presenting an indicator of the overall score together with the response.
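
Claim 13 does not say how the overall score is derived from the per-portion scores. As one possible reading, the sketch below takes the arithmetic mean and maps it to a coarse label; the mean, the thresholds, and the label text are all assumptions made for illustration.

```python
def overall_score(scores: list[float]) -> float:
    """Aggregate per-portion scores into one value (assumed: arithmetic mean)."""
    if not scores:
        raise ValueError("at least one score is required")
    return sum(scores) / len(scores)


def overall_indicator(scores: list[float]) -> str:
    """Label shown with the response; thresholds are illustrative only."""
    value = overall_score(scores)
    if value >= 4.0:
        label = "high"
    elif value >= 2.5:
        label = "medium"
    else:
        label = "low"
    return f"Source confidence: {label} ({value:.1f}/5)"
```

Other aggregations, such as a minimum or a similarity-weighted mean, would fit the claim language equally well.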

14. The system according to claim 12, wherein: presenting the response and the first text portions includes presenting each of the first text portions together with an indicator of the first score associated with that first text portion; and receiving the evaluation of one of the first text portions includes receiving a selection of an indicator associated with the one of the first text portions, the selected indicator being different from the presented indicator of the first score associated with the one of the first text portions.

15. The system according to claim 12, wherein the first prompt includes an instruction to assign importance to the first text portions based on the first score of each of the first text portions.

16. The system according to claim 12, wherein the operations further comprise: receiving a second query from a second user; determining, from the plurality of stored text portions, second text portions that are semantically similar to the second query; determining a second score associated with each of the second text portions; generating a second prompt based on the second scores, wherein the second prompt includes the second query and the second text portions; sending the second prompt to the text generation model; receiving a second response to the second prompt from the text generation model; presenting the second response and the second text portions; receiving, from the second user, a second evaluation of one of the presented second text portions; and updating the second score associated with the one of the second text portions based on the second evaluation.

17. The system according to claim 16, wherein the determined second text portions include one of the first text portions, and the determined second score associated with the one of the first text portions is the updated first score.

18. The system according to claim 16, wherein: presenting the response and the first text portions includes presenting each of the first text portions together with an indicator of the first score associated with that first text portion; presenting the second response and the second text portions includes presenting each of the second text portions together with an indicator of the second score associated with that second text portion; receiving the evaluation of one of the first text portions includes receiving a selection of an indicator associated with the one of the first text portions, the selected indicator being different from the presented indicator of the first score associated with the one of the first text portions; and receiving the second evaluation of one of the second text portions includes receiving a second selection of a second indicator associated with the one of the second text portions, the selected second indicator being different from the presented indicator of the second score associated with the one of the second text portions.

19. One or more non-transitory computer-readable recording media storing program code, wherein the program code is executable by at least one processing unit of a computing system to cause the computing system to perform operations comprising: receiving a query from a user; determining, from a plurality of stored text portions, first text portions that are semantically similar to the query; determining a first score associated with each of the first text portions; generating a first prompt based on the first scores, wherein the first prompt includes the query and the first text portions; sending the first prompt to a text generation model; receiving a response to the first prompt from the text generation model; presenting the response and the first text portions; receiving, from the user, an evaluation of one of the presented first text portions; and updating the first score associated with the one of the first text portions based on the evaluation.

20. The one or more non-transitory computer-readable recording media according to claim 19, wherein the first prompt includes an instruction to assign importance to the first text portions based on the first score of each of the first text portions, and the operations further comprise: receiving a second query from a second user; determining, from the plurality of stored text portions, second text portions that are semantically similar to the second query; determining a second score associated with each of the second text portions; generating a second prompt based on the second scores, wherein the second prompt includes the second query and the second text portions, and the second prompt includes an instruction to assign importance to the second text portions based on the second score of each of the second text portions; sending the second prompt to the text generation model; receiving a second response to the second prompt from the text generation model; presenting the second response and the second text portions; receiving, from the second user, a second evaluation of one of the presented second text portions; and updating the second score associated with the one of the second text portions based on the second evaluation.