A human preference-oriented efficient network retrieval enhanced answering method and system
By using a large-model augmented retrieval system, a self-guided training generative model, and a human preference-aware scorer, the language model is optimized to address the issues of long-format answer generation and low network resource utilization efficiency in open-domain question answering. This results in a high-efficiency, low-cost network-augmented question answering system with generated answer quality comparable to WebGPT.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- TSINGHUA UNIVERSITY
- Filing Date
- 2023-02-21
- Publication Date
- 2026-06-19
AI Technical Summary
Existing large-scale language models are inefficient in handling open-domain question answering, especially in generating long-format answers and utilizing network resources, and are costly to deploy, making it difficult to meet the demand for efficient network retrieval that humans prefer.
By developing a large-model augmented retrieval system, a self-guided training generative model, and a human preference-aware scorer, combined with web search and recall techniques, the language model is optimized to generate an efficient and low-cost web-augmented question-answering system.
We have developed a network-enhanced question-answering system that can be efficiently deployed in real-world scenarios. It boasts high performance and cost-effectiveness, generates answers of comparable quality to WebGPT, and is more efficient than models of similar size.

Figure CN116501843B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of language model retrieval technology, and in particular to an efficient web retrieval enhanced answering method and system oriented towards human preferences.
Background Technology
[0002] Large language models (LLMs), such as GPT-3, PaLM, OPT, BLOOM, and GLM-130B, have significantly pushed the boundaries of machine language understanding and generation. Question answering is one of the most attractive and fundamental language applications, and the development of LLMs has greatly improved it. Their closed-book question answering and in-context learning performance, comparable to that of supervised models, has changed our understanding of their capacity to memorize knowledge. However, this capacity is limited: LLMs fall short of expectations on questions that require sufficiently rare knowledge. Many teams have therefore recently focused on building language models augmented with retrieval and web search, which use external knowledge to accomplish tasks that were previously out of reach. For example, WebGPT can browse web pages, answer complex human questions in long form, and provide appropriate references. Despite its success, WebGPT is still far from real-world deployment. First, it relies on extensive expert-level browsing demonstrations, pre-written answers, and answer preference labels, all of which require significant cost, time, and annotator training. Second, its behavior cloning method (i.e., imitation learning) requires GPT-3 (with up to 175 billion parameters) to interact with the browser like a human expert, generating action commands (such as search, read, and quote) and then retrieving relevant information from the web. This browsing pattern requires massive computational resources and is too slow for a good user experience. Thus, despite the high quality of WebGPT's answers, building an efficient web-augmented question-answering system remains a significant challenge.
[0003] Traditional question-answering tasks (such as SQuAD) assume that the correct reference passage is given together with the question. Open-domain question answering (OpenQA) instead targets the open world, which makes it more practical but also more challenging. For example, the Natural Questions dataset consists of queries issued to the Google search engine with annotations from Wikipedia paragraphs; WebQuestions generates numerous open-domain questions from a knowledge base; and MS MARCO collects passages together with labels indicating which passages were selected for each question. However, most open-domain QA datasets and models are limited to short, phrase-level answers, whereas people generally prefer long-form answers that include references. One likely reason is that constructing and evaluating long-form question-answering datasets with open-world references is difficult and requires expert-level annotation. Recent attempts include ELI5, which collects questions and long-form answers from Reddit, and WebGPT, which employs a large number of experts for annotation and uses GPT-3, with up to 175 billion parameters, as its backbone.
[0004] In terms of retrieval techniques, mainstream methods include sparse-vector methods such as BM25 and TF-IDF and, more recently, dense-vector methods such as DPR and Contriever. REALM introduced the idea of retrieval-augmented language models, advocating joint optimization of the retriever and the language model; representative works include RAG, Fusion-in-Decoder, and Atlas. WebGPT follows a similar idea, requiring a large language model to interact with a browser to seek relevant information and improve accuracy. However, this consumes significant computational resources and is too slow for practical deployment.
Summary of the Invention
[0005] The present invention aims to at least partially solve one of the technical problems in the related art.
[0006] To this end, the present invention proposes an efficient web retrieval enhanced answering method oriented towards human preferences. It enhances a pre-trained language model with web search and retrieval techniques so that it can be deployed in real-world applications while maintaining high system efficiency and low deployment cost. The invention develops an LLM-augmented retriever, a bootstrapped generator, and a human preference-aware scorer to accomplish these challenging tasks. On this basis, a set of criteria for evaluating web-enhanced question answering systems is proposed, and extensive multi-dimensional human evaluations and quantitative ablation studies show that the WebGLM design outperforms existing systems. Furthermore, while remaining efficient and cost-effective, WebGLM performs better in human evaluation than WebGPT (13B), a model of similar size, and is even comparable to WebGPT (175B).
[0007] Another objective of this invention is to propose an efficient web retrieval enhancement response system oriented towards human preferences.
[0008] To achieve the above objectives, this invention proposes an efficient web retrieval enhancement response method oriented towards human preferences, comprising:
[0009] Use a preset web search engine to retrieve candidate answers to the question from web pages;
[0010] Candidate references are obtained using a retrieval model;
[0011] A question-and-answer dataset based on the candidate references is generated through in-context learning of a language model; the language model's adoption of the candidate references in the question-and-answer dataset is used as labels to train the retrieval model; and a language model is fine-tuned on the question-and-answer dataset so that the trained language model generates answers to questions based on the candidate references.
[0012] A human preference-aware scorer is built from user feedback data on the answers to questions, and the trained scorer is used to obtain optimized answers.
[0013] In addition, the efficient web retrieval enhancement response method based on human preferences according to the above embodiments of the present invention may also have the following additional technical features:
[0014] Furthermore, in one embodiment of the present invention, the step of obtaining candidate answers corresponding to the question from web pages using a preset web search engine includes:
[0015] A list of candidate webpage URLs is obtained for the question through a web search engine interface;
[0016] The content of the candidate web pages is fetched in parallel according to the URL list;
[0017] The extracted text content of the web page is divided into paragraphs using line breaks.
[0018] Furthermore, in one embodiment of the present invention, the adoption information of candidate reference materials by the language model in the question-answering dataset is used as a label to train the retrieval model, including:
[0019] The citations produced by the language model are corrected using a citation correction algorithm, and whether each candidate reference was adopted is then determined from the corrected citations;
[0020] The retrieval model encodes the question and the candidate references, and the relevance score between the question and each candidate reference is computed from the encodings;
[0021] The mean squared error between the predicted relevance score and the accuracy score is used as the loss function to train the retrieval model, yielding a trained retrieval model.
[0022] Furthermore, in one embodiment of the present invention, generating the question-and-answer dataset based on the candidate references through in-context learning of a language model includes:
[0023] Obtaining a prompt for the candidate references and the question;
[0024] Obtaining instructions for in-context learning based on the prompt and the parameters of the language model;
[0025] Generating the question-and-answer dataset by one-shot learning based on the obtained in-context learning instructions.
[0026] Furthermore, in one embodiment of the present invention, building the human preference-aware scorer based on user feedback data on the answers to questions includes:
[0027] Obtaining all answers to the questions together with their numbers of user likes, determining the valid answers according to the numbers of likes, and determining the valid questions according to their valid answers;
[0028] Comparing the lengths of the valid answers with a threshold equal to the median length of all answers, and obtaining the final valid answers based on the comparison result;
[0029] Sorting the final valid answers by number of user likes and, based on the ranking, selecting pairs of answers whose ranking positions differ by more than a preset number as positive and negative samples to train the human preference-aware scorer.
[0030] To achieve the above objectives, another aspect of the present invention proposes an efficient web retrieval enhanced response system oriented towards human preferences, comprising:
[0031] The coarse-grained search module is used to retrieve candidate answers to questions from web pages using a preset web search engine;
[0032] The fine-grained retrieval module is used to obtain candidate references using the retrieval model;
[0033] The model training module is used to generate a question-and-answer dataset based on the candidate references through in-context learning of a language model, use the language model's adoption of the candidate references in the dataset as labels to train the retrieval model, and fine-tune a language model on the dataset so that the trained language model generates answers based on the candidate references.
[0034] The scorer optimization module is used to build a human preference-aware scorer based on user feedback data on the answers to the question, so as to obtain the optimized answer to the question based on the trained scorer.
[0035] The efficient web retrieval enhanced answering method and system oriented towards human preferences of the present invention enhance a pre-trained language model with web search and retrieval techniques, enabling deployment in real-world scenarios while maintaining high system efficiency and low deployment cost, and offer high performance, efficiency, and cost-effectiveness.
[0036] Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Attached Figure Description
[0037] The above and / or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, wherein:
[0038] Figure 1 is a flowchart of an efficient web retrieval enhanced answering method oriented towards human preferences according to an embodiment of the present invention;
[0039] Figure 2 is a framework diagram of an efficient web retrieval enhanced answering method oriented towards human preferences according to an embodiment of the present invention;
[0040] Figure 3 is a schematic diagram of coarse-grained web search and fine-grained, language model-distilled retrieval according to an embodiment of the present invention;
[0041] Figure 4 is a schematic diagram of the construction of a high-quality dataset according to an embodiment of the present invention;
[0042] Figure 5 shows examples of dataset generation according to an embodiment of the present invention;
[0043] Figure 6 is a schematic structural diagram of an efficient web retrieval enhanced answering system oriented towards human preferences according to an embodiment of the present invention.
Detailed Implementation
[0044] It should be noted that, unless otherwise specified, the embodiments and features described in the present invention can be combined with each other. The present invention will now be described in detail with reference to the accompanying drawings and embodiments.
[0045] To enable those skilled in the art to better understand the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention.
[0046] The following description, with reference to the accompanying drawings, describes an efficient web retrieval enhancement answer method and system based on human preferences, according to embodiments of the present invention.
[0047] The web retrieval augmented question-answering system implemented by the present invention is a complex systems engineering effort requiring cross-domain collaboration, involving large language models, retrieval augmentation, and reinforcement learning from human feedback. The invention develops WebGLM, a practical web retrieval augmented question-answering system based on GLM-10B. It is efficient, cost-effective, and human preference-aware, and, most importantly, its answer quality is comparable to that of WebGPT. The system adopts several novel strategies and designs; its overall framework is shown in Figure 2 and comprises the following components:
[0048] LLM-augmented Retriever: The invention implements a two-stage retrieval system comprising coarse-grained web search and fine-grained retrieval distilled from a large language model. It is inspired by the observation that large language models such as GPT-3 naturally learn to adopt correct references, a capability that can be distilled to augment smaller dense retrievers.
[0049] Bootstrapped Generator: The invention finds that, with appropriate reference-based filtering, large language models can generate high-quality data that previously had to be written by expensive human experts. Therefore, the invention generates a high-quality dataset, WebGLM-QA, using in-context learning with GPT-3 together with corresponding cleaning and correction methods. WebGLM-QA is a question-and-answer dataset with web retrieval references and long-form answers, comprising 45k high-quality samples after filtering and 83k noisy but diverse samples before filtering. The backbone of the WebGLM system is a GLM-10B model trained on WebGLM-QA.
[0050] Human Preference-aware Scorer: A scorer trained on user upvote data from online question-and-answer forums to learn human preferences among different answers. Compared with WebGPT's costly expert labeling, the invention shows that, with appropriate dataset construction, a high-quality scorer can also be trained and used to filter for high-quality answers.
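To make the division of labour among the three components concrete, the following Python sketch wires them together. The class name WebQAPipeline, the callable signatures, and the use of the scorer to pick the best of several sampled answers are assumptions for illustration; the source states only that the scorer is used for high-quality answer filtering, not how WebGLM invokes it.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical interfaces: the source names the three components but not their signatures.
Retriever = Callable[[str], List[str]]            # question -> candidate references
Generator = Callable[[str, Sequence[str]], str]   # (question, references) -> answer
Scorer = Callable[[str, str], float]              # (question, answer) -> preference score


@dataclass
class WebQAPipeline:
    retrieve: Retriever
    generate: Generator
    score: Scorer
    num_candidates: int = 4  # hypothetical number of sampled answers

    def answer(self, question: str) -> str:
        # Two-stage retrieval (web search + fine-grained retriever) is hidden behind `retrieve`.
        references = self.retrieve(question)
        # Sample several answers from the generator.
        candidates = [self.generate(question, references) for _ in range(self.num_candidates)]
        # Keep the answer that the preference-aware scorer rates highest.
        return max(candidates, key=lambda a: self.score(question, a))
```

In practice each callable would wrap a real model; plain functions are used here so the control flow can be exercised without any model.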
[0051] Figure 1 is a flowchart of an efficient web retrieval enhanced answering method oriented towards human preferences according to an embodiment of the present invention.
[0052] As shown in Figure 1, the method includes, but is not limited to, the following steps:
[0053] S1: Use a preset web search engine to obtain candidate answers that correspond to the question from web pages.
[0054] In traditional open-domain QA, systems typically retrieve information only from reliable sources (such as encyclopedias) and cannot benefit from knowledge across the entire web, because it is difficult to extract useful information from raw web pages. The invention addresses this problem through a two-stage retrieval process, coarse-grained web search followed by fine-grained language model-enhanced retrieval, as shown in Figure 3.
[0055] In one embodiment of the invention, the coarse-grained web search uses a third-party web search engine to obtain the URLs of the main candidate web pages. In most cases, these pages cover the context and knowledge needed to answer the question, but they also contain much useless information. This process consists of three main steps: search, retrieval, and extraction.
[0056] In the search step, the present invention inputs the question into the search engine interface and obtains a list of URLs of potentially relevant pages (usually less than 10).
[0057] In the retrieval step, the HTML content of each candidate web page is fetched according to the obtained URLs. Since there are many candidate pages, the invention fetches them in parallel to improve efficiency.
[0058] In the extraction step, the invention uses HTML2TEXT to extract the text content of each HTML page and splits it into a list of paragraphs at line breaks.
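The three coarse-grained steps can be sketched in Python as below. The search and fetch callables are hypothetical placeholders (no particular search-engine API is implied), and Python's standard html.parser module is swapped in for HTML2TEXT so the sketch has no third-party dependencies; its text extraction is much cruder than a dedicated converter.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from typing import Callable, List


class _TextExtractor(HTMLParser):
    """Collects visible text, inserting line breaks at block-level tags."""
    BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6", "tr"}
    SKIP_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:  # ignore script/style contents
            self.parts.append(data)


def html_to_paragraphs(html: str) -> List[str]:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Extraction step: split on line breaks and drop empty lines.
    return [line.strip() for line in text.split("\n") if line.strip()]


def coarse_search(question: str,
                  search: Callable[[str], List[str]],   # hypothetical: question -> URLs
                  fetch: Callable[[str], str],          # hypothetical: URL -> HTML
                  max_workers: int = 8) -> List[str]:
    urls = search(question)[:10]  # search step: usually fewer than 10 URLs
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pages = list(pool.map(fetch, urls))  # retrieval step, in parallel, order preserved
    paragraphs: List[str] = []
    for html in pages:
        paragraphs.extend(html_to_paragraphs(html))
    return paragraphs
```

A thread pool suits this step because fetching is I/O-bound; each paragraph returned here is a candidate for the fine-grained retriever.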
[0059] S2, using the retrieval model to obtain candidate references.
[0060] This step performs fine-grained, language model-enhanced retrieval, as shown in Figure 3. Web search has already collected paragraphs that may be useful for the question. However, even after filtering with widely used dense retrievers, many of them remain irrelevant (up to 30% of the context was irrelevant in the experiments of the invention). As a solution, the invention uses imitation learning to exploit the strong language understanding of large language models for question-specific paragraph selection.
[0061] Specifically, the invention explores how language models use references, and finds that large language models can naturally distinguish useful from useless references and use the useful ones in their answers. A dataset of 200 questions was created, each with five candidate references selected by Contriever scores. Manual labeling of each reference's relevance revealed that only 68.6% were relevant. When GPT-3 was given the candidate references and asked to answer, it cited only some of them, and the cited references reached an accuracy of 90.2%, significantly higher than that of Contriever.
[0062] Furthermore, the invention distills the large language model's ability to identify useful references into the retrieval model. It uses GPT-3's reference adoption information in the WebGLM-QA dataset (see the data construction method below) as labels to fine-tune the Contriever model. Since GPT-3 sometimes generates incorrect citations, a citation correction algorithm is applied before adoption is determined. During fine-tuning, two Contriever encoders encode the question and the reference respectively, and the inner product of their embeddings is used as the relevance score. The mean squared error (MSE) between the predicted relevance score and the Rouge-1 accuracy score is used as the training loss. As shown in Table 1, further quantitative experiments show that this distillation significantly improves Contriever's retrieval accuracy for web-enhanced question answering.
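The training objective described above can be sketched in pure Python as follows. The encoders are left out (the embedding vectors are taken as given), whitespace tokenization is a simplification, and reading the "Rouge-1 accuracy score" as clipped unigram precision is an assumption; the helper reference_label, which assigns 0 to non-adopted references, is likewise hypothetical rather than the invention's exact labeling rule.

```python
from collections import Counter
from typing import List, Sequence


def rouge1_precision(candidate: str, reference: str) -> float:
    """Clipped unigram precision of `candidate` against `reference`."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    total = sum(cand.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref[tok]) for tok, count in cand.items())
    return overlap / total


def reference_label(reference: str, citing_segments: List[str]) -> float:
    """Hypothetical target: 0 if the reference was not adopted, otherwise the best
    Rouge-1 precision of a citing answer segment measured against the reference."""
    if not citing_segments:
        return 0.0
    return max(rouge1_precision(seg, reference) for seg in citing_segments)


def relevance(q_vec: Sequence[float], r_vec: Sequence[float]) -> float:
    """Relevance score: inner product of the question and reference embeddings."""
    return sum(q * r for q, r in zip(q_vec, r_vec))


def mse_loss(preds: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error between predicted relevance scores and target scores."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)
```

Regressing the inner product directly onto a graded target, rather than training on binary adopted/not-adopted labels, lets the retriever learn how strongly each reference was used.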
[0063] Table 1
[0064]
[0065] S3: Generate a question-and-answer dataset based on the candidate references through in-context learning of a language model, use the language model's adoption of the candidate references in the dataset as labels to train the retrieval model, and fine-tune a language model on the dataset so that the trained language model generates answers based on the candidate references.
[0066] Building a web retrieval enhanced question-answering system faces a major hurdle: constructing high-quality question-answer datasets with long-form answers and correct citations is extremely costly. Unlike traditional question-answering systems, the invention aims to produce fact-based answers with accurate references.
[0067] In-context learning by large language models, i.e., generating answers conditioned on a small number of examples given as context, has recently been well demonstrated and explored. Therefore, the invention uses OpenAI's API, questions from the ELI5 dataset, and references collected by the invention's retrieval mechanism to generate long-form answers with numerous citations. Since the quality of the generated samples is sometimes poor, the invention designs correction and selection strategies to filter out a high-quality subset for actual training. The result is the WebGLM-QA dataset, a long-form question-and-answer dataset with citations, containing 45k filtered high-quality samples and 83k unfiltered samples. The dataset construction process is shown in Figure 4.
[0068] Specifically, appropriate prompts must first be selected. Since a large amount of content is fed into in-context learning, including examples, questions, and the corresponding references, the formulation of the prompt can significantly affect performance. The invention compares several prompt variants, including the order of the question and its references, the symbols used to mark reference indices, and the wording of the prompts for the references and the question. After experimenting with each variant, the invention found that the format shown in Figure 5(a) performs best.
[0069] Furthermore, the invention uses the model itself to bootstrap the task description. Appropriate instructions (e.g., "Please write an answer based on the question and references") are needed to guide a large language model to generate a qualified answer. Recent work has shown that a large language model can itself be used to design in-context learning instructions, without human intervention. As shown in Figure 5(b), the invention uses several high-quality examples to elicit candidate instructions and selects the best-performing one after several experiments.
[0070] Furthermore, the invention generates a large amount of data through in-context learning. As shown in Figure 5(c), the invention investigated the optimal number of examples for generating long-form answers. Since the references typically occupy a large share of the sequence length, the invention observed that one-shot learning usually yields better answers than few-shot learning. The invention therefore generated a dataset of over 80,000 samples through one-shot learning.
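A one-shot prompt of the kind described above can be assembled as in the Python sketch below. The "Reference [k]:" labels, the references-before-question order, and the function names are hypothetical choices; the exact layout of Figure 5(a) is not reproduced in the text, and only the instruction wording is taken from the example quoted in [0069].

```python
from typing import Sequence


def format_references(refs: Sequence[str]) -> str:
    # Hypothetical index format; the exact layout of Figure 5(a) is not reproduced here.
    return "\n".join(f"Reference [{i}]: {ref}" for i, ref in enumerate(refs, start=1))


def build_one_shot_prompt(instruction: str,
                          example_refs: Sequence[str], example_question: str,
                          example_answer: str,
                          refs: Sequence[str], question: str) -> str:
    """Instruction, then exactly one worked example (one-shot), then the target question.
    The language model is expected to continue the text after the final 'Answer:'."""
    demo = (f"{format_references(example_refs)}\n"
            f"Question: {example_question}\nAnswer: {example_answer}")
    target = f"{format_references(refs)}\nQuestion: {question}\nAnswer:"
    return f"{instruction}\n\n{demo}\n\n{target}"
```

Using a single demonstration leaves more of the context window for the target question's references, which is the trade-off behind preferring one-shot over few-shot prompting.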
[0071] The invention generates a large number of answers through in-context learning with GPT-3, but found that some answers cite incorrect or non-existent reference numbers. Correcting the citation numbers is therefore essential to ensure high quality. The invention corrects citation numbers based on the similarity between the cited text and the references: the answer is split into segments at the citation numbers, and each segment is matched against the references. For each question, let the retrieved references be $R = \{r_1, r_2, \dots, r_n\}$ and let the answer be divided into text segments $A = \{a_1, a_2, \dots, a_m\}$. For each pair $(a_i, r_j)$, a citation matching score $s_{ij} = f(a_i, r_j)$ is computed. Given a threshold $T$, the final citations of each segment $a_i$ are $C_i = \{\, j \mid s_{ij} \ge T,\ 1 \le j \le n \,\}$. The invention uses the Rouge-1 score as the function $f$ and sets the threshold $T = 0.57$.
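The correction rule $C_i = \{ j \mid s_{ij} \ge T \}$ can be sketched in Python as follows. The "[k]" citation marker format and the choice of Rouge-1 F1 as the function f are assumptions (the text says only "Rouge-1 score"); the threshold 0.57 is taken from the description above.

```python
import re
from collections import Counter
from typing import List, Sequence, Tuple

CITATION = re.compile(r"\[(\d+)\]")  # assumed marker format, e.g. "[3]"


def _tokens(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


def rouge1_f1(a: str, b: str) -> float:
    """Unigram-overlap F1 between two texts (one common Rouge-1 variant)."""
    ta, tb = Counter(_tokens(a)), Counter(_tokens(b))
    overlap = sum((ta & tb).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(ta.values())
    recall = overlap / sum(tb.values())
    return 2 * precision * recall / (precision + recall)


def correct_citations(answer: str, references: Sequence[str],
                      threshold: float = 0.57) -> List[Tuple[str, List[int]]]:
    """Split `answer` into segments a_i at its citation markers and recompute C_i = {j : s_ij >= T}."""
    # With one capturing group, re.split alternates text and captured numbers; keep the text parts.
    pieces = CITATION.split(answer)[::2]
    result = []
    for piece in pieces:
        seg = piece.strip(" .,;:")
        if not _tokens(seg):
            continue  # skip punctuation-only leftovers such as a trailing "."
        scores = [rouge1_f1(seg, ref) for ref in references]
        cites = [j for j, s in enumerate(scores, start=1) if s >= threshold]  # 1-based indices
        result.append((seg, cites))
    return result
```

For example, an answer that wrongly cites [2] and a non-existent [7] for two sentences copied from references 1 and 2 comes back with citations [1] and [2].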
[0072] After correction, this invention further investigated more issues that might affect the quality of the dataset. This invention will not use a generated sample if it exhibits the following problems: 1) The answer utilizes the internal knowledge of a large language model instead of being based on references. Such answers are not fact-based and are sometimes seriously erroneous. This can be identified by the low overlap ratio between all references and the answer. 2) When an answer cites too few of the provided references, it typically exhibits poor reference relevance and therefore often lacks sufficient information and factual basis. 3) If an answer has too many incorrect citation numbers, this invention assumes it is a low-quality answer.
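The three rejection rules can be expressed as a single filter, sketched in Python below. The source gives no numeric thresholds, so min_overlap, min_cited, and max_invalid are hypothetical parameters, and rule 3 is approximated by counting only out-of-range citation numbers; word overlap with the union of all references stands in for the "overlap ratio" of rule 1.

```python
import re
from typing import Sequence, Set

CITATION = re.compile(r"\[(\d+)\]")  # assumed marker format, e.g. "[3]"


def _token_set(text: str) -> Set[str]:
    return set(re.findall(r"\w+", text.lower()))


def keep_sample(answer: str, references: Sequence[str],
                min_overlap: float = 0.5,       # hypothetical threshold for rule 1
                min_cited: int = 2,             # hypothetical threshold for rule 2
                max_invalid: int = 0) -> bool:  # hypothetical threshold for rule 3
    cited = [int(m) for m in CITATION.findall(answer)]
    invalid = sum(1 for j in cited if not 1 <= j <= len(references))
    distinct_valid = {j for j in cited if 1 <= j <= len(references)}

    # Rule 1: share of answer words that also occur in some reference.
    answer_words = _token_set(CITATION.sub(" ", answer))
    reference_words = set().union(*(_token_set(r) for r in references))
    overlap = len(answer_words & reference_words) / len(answer_words) if answer_words else 0.0

    return (overlap >= min_overlap                # rule 1: grounded in the references
            and len(distinct_valid) >= min_cited  # rule 2: cites enough of the references
            and invalid <= max_invalid)           # rule 3: few incorrect citation numbers
```

A sample is kept only if it passes all three checks, mirroring the statement that any one of the problems is enough to discard it.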
[0073] Under this screening strategy, the present invention ultimately obtained approximately 45,000 high-quality data points. The following are the results of the manual evaluation of the samples, as shown in Table 2.
[0074] Table 2
[0075]
[0076] S4: Build a human preference-aware scorer based on user feedback data on answers, and obtain optimized answers using the trained scorer.
[0077] In initial tests, the model trained on the dataset generated by the above strategy performed satisfactorily in most cases. However, recent research indicates that human feedback (likes or dislikes) on text generated by large language models is crucial for high-quality text generation. WebGPT recruited numerous experts to compare and rank generated answers and used this feedback to train a reward model that selects the best answer from n candidates, further optimizing the model through reinforcement learning.
[0078] However, such expert annotation is costly, and reinforcement learning consumes significant computational resources. As a competitive alternative, the invention uses a large amount of user feedback (e.g., numbers of likes) from online question-and-answer forums to build a human preference-aware scorer. With proper design and data cleaning, the invention demonstrates experimentally that such a scorer significantly improves answer quality and ratings in real-world human evaluation.
[0079] Specifically, this invention first collects question-and-answer pairs and corresponding user likes from online question-and-answer forums. Although these answers are diverse, their length and quality vary greatly, and without proper preprocessing, the scorer may be biased during training.
[0080] In one embodiment of the present invention, preprocessing includes the following steps: 1) High-quality feedback: an answer with more than 3 likes is defined as a valid answer, and questions with 8 or more valid answers are selected as qualified questions. 2) Length bias: preliminary research showed that longer answers tend to receive higher scores even when they are not better. To mitigate this bias, for each qualified question, the median length of all its answers is used as a threshold to truncate longer answers, and answers shorter than a certain length are discarded. 3) Contrast augmentation: after sorting answers by likes, adjacent answers differ only slightly, and a scorer trained on such an uninformative dataset performs poorly. To increase the contrast between training samples, pairs of answers whose ranking positions differ by more than 5 are selected as positive and negative sample pairs.
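The three preprocessing steps can be sketched in Python as follows. The thresholds 3, 8, and 5 come from the text; measuring length in characters, the min_len_ratio cut-off for "shorter than a certain length", skipping pairs with tied likes, and the function name build_preference_pairs are assumptions.

```python
from statistics import median
from typing import Dict, List, Tuple

Answer = Tuple[str, int]  # (answer text, number of likes)


def build_preference_pairs(answers_by_question: Dict[str, List[Answer]],
                           min_likes: int = 3,          # "more than 3 likes" => valid answer
                           min_valid: int = 8,          # "8 or more valid answers" => qualified
                           rank_gap: int = 5,           # ranks "more than 5" apart => a pair
                           min_len_ratio: float = 0.5,  # hypothetical short-answer cut-off
                           ) -> List[Tuple[str, str, str]]:
    """Return (question, preferred answer, less-preferred answer) training pairs."""
    pairs = []
    for question, answers in answers_by_question.items():
        # 1) High-quality feedback: keep valid answers, skip unqualified questions.
        valid = [(text, likes) for text, likes in answers if likes > min_likes]
        if len(valid) < min_valid:
            continue
        # 2) Length bias: truncate at the median length (characters here) of all answers
        #    and drop answers shorter than min_len_ratio * median.
        limit = median(len(text) for text, _ in answers)
        kept = [(text[: int(limit)], likes) for text, likes in valid
                if len(text) >= min_len_ratio * limit]
        # 3) Contrast augmentation: pair answers whose ranks differ by more than rank_gap.
        ranked = sorted(kept, key=lambda a: a[1], reverse=True)
        for i in range(len(ranked)):
            for j in range(i + rank_gap + 1, len(ranked)):
                if ranked[i][1] > ranked[j][1]:  # skip ties in likes
                    pairs.append((question, ranked[i][0], ranked[j][0]))
    return pairs
```

With 10 valid answers to one question, only ranks 0-3 can pair with ranks at least 6 positions lower, giving 4 + 3 + 2 + 1 = 10 pairs.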
[0081] After preprocessing in this invention, there are approximately 93,000 questions and 249,000 positive and negative sample pairs, of which 230,000 pairs are used as the training set and 19,000 pairs are used as the test set. The backbone model for training the scorer in this invention is a GLM with 6 billion parameters.
[0082] This invention uses 272 questions displayed on WebGPT’s official website for primary evaluation—because WebGPT is not publicly available, and the selected questions are typically complex and closer to real human problems.
[0083] This invention recruited 15 experts with master's degrees for evaluation. For each question, this invention aggregated all search results and answers from different models into a single table, enabling annotators to effectively compare them and standardize annotation criteria. This invention evaluated the performance of its model against other different models through manual evaluation. This invention also compared and analyzed the results from different perspectives; the main results are shown in Table 3.
[0084] Table 3
[0085]
[0086] In addition, this invention conducted a Turing test. This invention randomly selected 200 questions from the 272 questions displayed on WebGPT's official website. For each question, this invention shuffled the answers generated by WebGLM, WebGPT-175B, WebGPT-13B, and perplexity.ai, and removed citation tags from the answers for fairness. Next, this invention mixed human-written answers into these answers and asked evaluators to rank the answers according to their quality, such as correctness, informativeness, and authenticity. The experimental results are shown in Table 4.
[0087] Table 4
[0088]
[0089] The present invention thus provides an efficient web retrieval enhanced answering method oriented towards human preferences. It enhances a pre-trained language model with web search and retrieval techniques, enabling real-world deployment while maintaining high system efficiency and low deployment cost. An LLM-augmented retriever, a bootstrapped generator, and a human preference-aware scorer are developed to handle these challenging tasks. Furthermore, a set of criteria for evaluating web-enhanced question answering systems is proposed, and extensive multi-dimensional human evaluations and quantitative ablation studies show that the WebGLM design outperforms existing systems.
[0090] To implement the above embodiments, as shown in Figure 6, this embodiment further provides an efficient web retrieval enhanced answering system 10 oriented towards human preferences. The system 10 includes a coarse-grained search module 100, a fine-grained retrieval module 200, a model training module 300, and a scorer optimization module 400.
[0091] The coarse-grained search module 100 is used to obtain candidate answers corresponding to questions from web pages using a preset web search engine;
[0092] The fine-grained retrieval module 200 is used to obtain candidate references using the retrieval model;
[0093] The model training module 300 is used to generate a question-answering dataset from the candidate references through in-context learning of the language model, to use whether the language model adopts each candidate reference in that dataset as the label for training the retrieval model, and to fine-tune the language model on the question-answering dataset so that the trained language model can generate answers to questions based on the candidate references;
[0094] The scorer optimization module 400 is used to build a human-preference-aware scorer from user feedback data on answers, and to obtain the optimized answer to a question using the trained scorer.
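The text does not say how the trained scorer yields the "optimized result". One plausible reading, sketched below under that assumption, is best-of-n reranking: several candidate answers are generated for a question and the scorer keeps the one it rates highest. `select_best_answer` and `toy_scorer` are hypothetical names; the toy scorer only counts citation brackets and stands in for the trained human-preference-aware scorer.

```python
from typing import Callable, Sequence


def select_best_answer(
    question: str,
    candidates: Sequence[str],
    scorer: Callable[[str, str], float],
) -> str:
    """Return the candidate answer that the preference scorer rates highest."""
    if not candidates:
        raise ValueError("no candidate answers to score")
    # max() returns the first maximal candidate, so ties resolve deterministically.
    return max(candidates, key=lambda answer: scorer(question, answer))


def toy_scorer(question: str, answer: str) -> float:
    # Hypothetical stand-in for the trained scorer: counts citation brackets.
    return float(answer.count("["))
```

For example, `select_best_answer("q", ["A [1].", "B [1][2].", "C."], toy_scorer)` returns `"B [1][2]."`. Injecting the scorer as a callable keeps the selection step independent of how the scorer model is actually implemented.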
[0095] Furthermore, the aforementioned coarse-grained search module 100 is also used for:
[0096] A list of candidate web page URLs is obtained through a web search engine interface based on the parsed question;
[0097] The content of the candidate web pages is fetched according to the URL list using a parallel strategy;
[0098] The extracted text content of the web pages is split into paragraphs at line breaks.
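A minimal sketch of the coarse-grained search steps, assuming the URL list has already been returned by the search engine interface. The parallel strategy is modeled here with a thread pool from Python's standard library; `fetch_text` is a hypothetical callable standing in for the HTTP download and HTML-to-text extraction, neither of which the text specifies.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def split_paragraphs(text: str) -> List[str]:
    # Split at line breaks, dropping blank lines and surrounding whitespace.
    return [line.strip() for line in text.splitlines() if line.strip()]


def fetch_and_split(
    urls: Sequence[str],
    fetch_text: Callable[[str], str],
    max_workers: int = 8,
) -> Dict[str, List[str]]:
    """Fetch candidate pages in parallel and split each page into paragraphs."""

    def safe_fetch(url: str) -> str:
        try:
            return fetch_text(url)
        except Exception:
            return ""  # an unreachable page contributes no paragraphs

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so texts line up with urls.
        texts = list(pool.map(safe_fetch, urls))
    return {url: split_paragraphs(text) for url, text in zip(urls, texts)}
```

Because the fetcher is injected, the function can be exercised offline, for instance with `pages.__getitem__` over a dictionary of page texts; a page that fails to download simply yields an empty paragraph list instead of aborting the whole batch.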
[0099] Furthermore, the aforementioned model training module 300 is also used for:
[0100] A citation correction algorithm is used to correct the citations produced by the language model, and whether each candidate reference is adopted is determined from the corrected citations;
[0101] The retrieval model is used to encode the question and the candidate references, and a relevance score between the question and each candidate reference is calculated from the encoding results;
[0102] The mean squared error of the predicted relevance score, measured against the accuracy score, is used as the loss function to train the retrieval model, yielding a trained retrieval model.
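The retriever training steps above can be sketched as regression on relevance scores with a mean-squared-error loss. This is a simplified, dependency-free illustration under stated assumptions: the embeddings are fixed inputs (a real system would fine-tune the encoder itself), only a hypothetical per-dimension weight vector `w` is learned, and labels are assumed to be 1.0 for references the corrected citations adopt and 0.0 otherwise, since the text does not define the "accuracy score" further.

```python
from typing import List, Sequence, Tuple

# (question embedding, reference embedding, adoption label)
Example = Tuple[Sequence[float], Sequence[float], float]


def relevance(w: Sequence[float], q: Sequence[float], r: Sequence[float]) -> float:
    # Relevance score: inner product of question and reference embeddings,
    # reweighted per dimension by the trainable vector w.
    return sum(wi * qi * ri for wi, qi, ri in zip(w, q, r))


def mse_loss(w: Sequence[float], data: Sequence[Example]) -> float:
    return sum((relevance(w, q, r) - y) ** 2 for q, r, y in data) / len(data)


def train_retriever(
    data: Sequence[Example], dim: int, lr: float = 0.1, epochs: int = 200
) -> List[float]:
    """Fit w by full-batch gradient descent on the mean squared error."""
    w = [1.0] * dim  # start from a plain inner product
    n = len(data)
    for _ in range(epochs):
        grad = [0.0] * dim
        for q, r, y in data:
            err = relevance(w, q, r) - y
            for i in range(dim):
                # d/dw_i of (score - y)^2 / n is 2 * err * q_i * r_i / n
                grad[i] += 2.0 * err * q[i] * r[i] / n
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w
```

Treating adoption as a regression target (rather than a hard classification) lets the retriever output a graded relevance score that can be used directly to rank candidate references.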
[0103] Furthermore, the aforementioned model training module 300 is also used for:
[0104] Prompts are obtained from the candidate references and the question;
[0105] Instructions for in-context learning are derived from the prompts and the parameters of the language model;
[0106] One-shot learning is performed with the in-context learning instructions to generate the question-answering dataset.
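A sketch of how a one-shot prompt for dataset generation might be assembled: an instruction, one solved demonstration, then the unsolved target built from the candidate references and the question. The instruction wording, the demonstration, and the `Reference [i]` / `Question:` / `Answer:` layout are all hypothetical; the call to the language model is omitted, and its completion after the final `Answer:` would become the answer stored in the dataset entry.

```python
from typing import Sequence

# Hypothetical single demonstration for one-shot prompting.
DEMO_REFERENCES = ["Water boils at 100 degrees Celsius at sea level."]
DEMO_QUESTION = "At what temperature does water boil at sea level?"
DEMO_ANSWER = "At sea level, water boils at 100 degrees Celsius [1]."

INSTRUCTION = (
    "Read the references and answer the question in detail, "
    "citing the references you use by their bracketed numbers."
)


def format_block(references: Sequence[str], question: str) -> str:
    # Number references [1], [2], ... so generated answers can cite them.
    refs = "\n".join(
        f"Reference [{i}]: {text}" for i, text in enumerate(references, start=1)
    )
    return f"{refs}\nQuestion: {question}\nAnswer:"


def build_one_shot_prompt(references: Sequence[str], question: str) -> str:
    """Instruction, one solved demonstration, then the unsolved target."""
    demo = f"{format_block(DEMO_REFERENCES, DEMO_QUESTION)} {DEMO_ANSWER}"
    target = format_block(references, question)
    return f"{INSTRUCTION}\n\n{demo}\n\n{target}"
```

Numbering the references in the prompt is what makes the later citation-correction and adoption-labeling steps possible, since each bracketed number in a generated answer maps back to one candidate reference.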
[0107] Furthermore, the aforementioned scorer optimization module 400 is also used for:
[0108] All answers and their user like counts are obtained, together with the valid answers identified from the like counts and the valid questions to which those valid answers belong;
[0109] The lengths of the valid answers are compared against a length threshold based on the median length of all answers, and the final valid answers are obtained from the comparison results;
[0110] The final valid answers are sorted by user like count, and, based on the ranking, answers with more than a preset number of likes are selected as positive and negative samples to train the human-preference-aware scorer.
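A sketch of the scorer data preparation, under assumptions the text leaves open: answers shorter than the median length of all answers are discarded, and positive/negative samples are formed as pairs within the same question, where the positive answer has more than `min_likes` likes and strictly more likes than the negative one. The `Answer` record and `build_preference_pairs` are hypothetical names.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, NamedTuple, Sequence, Tuple


class Answer(NamedTuple):
    question_id: str
    text: str
    likes: int


def build_preference_pairs(
    answers: Sequence[Answer], min_likes: int
) -> List[Tuple[str, str]]:
    """Return (preferred, rejected) answer-text pairs for scorer training."""
    if not answers:
        return []
    # Length filter: keep answers at least as long as the median answer.
    length_threshold = median(len(a.text) for a in answers)
    by_question: Dict[str, List[Answer]] = defaultdict(list)
    for a in answers:
        if len(a.text) >= length_threshold:
            by_question[a.question_id].append(a)

    pairs: List[Tuple[str, str]] = []
    for group in by_question.values():
        group.sort(key=lambda a: a.likes, reverse=True)  # most-liked first
        for i, better in enumerate(group):
            if better.likes <= min_likes:
                break  # sorted descending, so no later answer qualifies either
            for worse in group[i + 1:]:
                if worse.likes < better.likes:
                    pairs.append((better.text, worse.text))
    return pairs
```

Pairing answers to the same question controls for question difficulty, so the scorer learns which of two comparable answers users preferred rather than which questions attract more likes.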
[0111] This invention provides an efficient, human-preference-oriented web-retrieval-enhanced answering system, which augments a pre-trained language model with web search and retrieval techniques so that the model can be deployed in real-world applications while keeping system efficiency high and deployment cost low. A large-model-augmented retriever, a generator built by self-guided training, and a human-preference-aware scorer were developed for this purpose. In addition, a systematic set of criteria for evaluating web-enhanced question-answering systems was proposed, and extensive multi-dimensional human evaluations and quantitative ablation studies were conducted, showing that the WebGLM design outperforms existing systems.
[0112] In the description of this specification, the references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples. Moreover, without contradiction, those skilled in the art can combine and integrate the different embodiments or examples described in this specification, as well as the features of different embodiments or examples.
[0113] Furthermore, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one of that feature. In the description of this invention, "a plurality of" means at least two, such as two, three, etc., unless otherwise explicitly specified.
Claims
1. An efficient, human-preference-oriented web-retrieval-enhanced answering method, characterized in that it comprises the following steps: using a preset web search engine to obtain candidate answers corresponding to a question from web pages; obtaining candidate references using a retrieval model; obtaining prompts from the candidate references and the question; deriving instructions for in-context learning from the prompts and the parameters of a language model; performing one-shot learning with the in-context learning instructions to generate a question-answering dataset; correcting the citations produced by the language model using a citation correction algorithm, and determining whether each candidate reference is adopted based on the corrected citations; encoding the question and the candidate references using the retrieval model, and calculating a relevance score between the question and each candidate reference from the encoding results; training the retrieval model using the mean squared error of the relevance score, measured against the accuracy score, as the loss function, to obtain a trained retrieval model; fine-tuning the language model on the question-answering dataset so that the trained language model generates answers to questions based on the candidate references; obtaining all answers and their user like counts, together with the valid answers identified from the like counts and the valid questions to which those valid answers belong; comparing the lengths of the valid answers against a length threshold based on the median length of all answers, and obtaining the final valid answers from the comparison results; and sorting the final valid answers by user like count and selecting answers with more than a preset number of likes as positive and negative samples to train a human-preference-aware scorer, so that an optimized answer to the question is obtained using the trained scorer.
2. The method of claim 1, wherein the step of using a preset web search engine to obtain candidate answers corresponding to the question from web pages comprises: obtaining a list of candidate web page URLs through a web search engine interface based on the parsed question; fetching the content of the candidate web pages according to the URL list using a parallel strategy; and splitting the extracted text content of the web pages into paragraphs at line breaks.
3. An efficient, human-preference-oriented web-retrieval-enhanced answering system, characterized in that it comprises: a coarse-grained search module, configured to obtain candidate answers corresponding to a question from web pages using a preset web search engine; a fine-grained retrieval module, configured to obtain candidate references using a retrieval model; and a model training module, configured to: obtain prompts from the candidate references and the question; derive instructions for in-context learning from the prompts and the parameters of a language model; perform one-shot learning with the in-context learning instructions to generate a question-answering dataset; correct the citations produced by the language model using a citation correction algorithm, and determine whether each candidate reference is adopted based on the corrected citations; encode the question and the candidate references using the retrieval model, and calculate a relevance score between the question and each candidate reference from the encoding results; train the retrieval model using the mean squared error of the relevance score, measured against the accuracy score, as the loss function, to obtain a trained retrieval model; and fine-tune the language model on the question-answering dataset so that the trained language model generates answers to questions based on the candidate references;
a scorer optimization module, configured to: obtain all answers and their user like counts, together with the valid answers identified from the like counts and the valid questions to which those valid answers belong; compare the lengths of the valid answers against a length threshold based on the median length of all answers, and obtain the final valid answers from the comparison results; and sort the final valid answers by user like count and select answers with more than a preset number of likes as positive and negative samples to train a human-preference-aware scorer, so that an optimized answer to the question is obtained using the trained scorer.
4. The system of claim 3, wherein the coarse-grained search module is further configured to: obtain a list of candidate web page URLs through a web search engine interface based on the parsed question; fetch the content of the candidate web pages according to the URL list using a parallel strategy; and split the extracted text content of the web pages into paragraphs at line breaks.