A generative document re-ranking method based on retrieval-augmented generation

By decomposing the thought chain into atomic reasoning steps and combining this with a sliding-window re-ranking strategy, this invention addresses the problem that existing document re-ranking methods cannot effectively perceive reasoning logic in complex question-answering tasks. It achieves globally consistent ranking of large candidate document sets and preserves high-value evidence, thereby improving the model's recall accuracy.

CN122019766B · Active · Publication Date: 2026-06-26 · HANGZHOU DIANZI UNIV

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: HANGZHOU DIANZI UNIV
Filing Date: 2026-04-14
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

Existing document reordering methods cannot effectively perceive reasoning logic in complex question-answering tasks, and are limited by the length of the context window, resulting in the omission of high-value evidence and affecting system performance.

Method used

By breaking down the thought chain into atomic reasoning steps, quantifying the information gain and semantic similarity of documents to the reasoning steps, and combining a sliding window reordering strategy and a position-aware weighted loss function, the accuracy and efficiency of the document reordering model are improved.

Benefits of technology

It effectively improves the recall accuracy of key documents in complex question-answering scenarios, avoids evidence omissions caused by data truncation, and enhances the model's recall capability in inference tasks.

✦ Generated by Eureka AI based on patent content.


Abstract

The application discloses a generative document re-ranking method based on retrieval-augmented generation, which comprises the following steps: first, preprocessing an original document set to obtain a candidate document set; then, inputting a query question and the candidate document set into a large language model to generate an answer and a thought chain, decomposing the thought chain into atomic reasoning steps, and performing sample screening; calculating the information gain score and the semantic similarity score of each candidate document for each atomic reasoning step, fusing the two scores by weighting, taking the maximum fused value across steps as the final contribution score of the document, and sorting the candidate documents by this score to form a training data set; training a generative re-ranking model so that it outputs an ordered sequence of document identifiers; and applying the trained generative re-ranking model in the online inference stage. The method shifts the ranking target to the actual contribution of a document to the reasoning steps and combines high-quality supervision signal construction with a sliding-window global re-ranking strategy, improving the retrieval accuracy of key documents and the practicality of the re-ranking model in complex question-answering scenarios.

Description

Technical Field

[0001] This invention relates to the fields of natural language processing and information retrieval, and in particular to a document re-ranking method for retrieval-augmented generation scenarios, with the aim of improving document ranking performance.

Background Technology

[0002] In retrieval-augmented generation (RAG) question-answering systems, document ranking is a critical factor affecting system performance. The system first recalls candidate documents relevant to the query from a large-scale document library, and then uses a re-ranking model to filter out high-quality documents, providing reliable evidence to support subsequent answer generation.

[0003] Currently, most document re-ranking methods rely on surface-level relevance measures between queries and documents, such as keyword-based matching scores, sparse retrieval scores, or dense vector similarity scores. While these methods perform well in simple factual question-answering scenarios, in complex question-answering tasks involving multi-hop reasoning, causal inference, contrastive analysis, or cross-document evidence integration, their ranking results often do not match the evidence support required for the reasoning process. Specifically, this manifests in the following two typical problems:

[0004] "Similar but not usable": Some documents are highly similar to the query topic in semantics, but lack the necessary facts or evidence to support a key step in the reasoning chain, and therefore cannot truly support the generation of an answer;

[0005] "Usable but not similar": Some documents contain key evidence, but due to significant differences in expression from the query, the surface similarity is low, so they rank low in the candidate list and are difficult to be effectively used by the subsequent reasoning module.

[0006] Furthermore, the training of existing re-ranking models typically relies on coarse-grained manual annotations such as "relevant" or "irrelevant," which fail to reflect the specific contributions of documents to different steps in complex reasoning processes. This lack of supervision signals limits the performance improvement of models in reasoning-aware ranking tasks.

[0007] On the other hand, in actual deployments, re-ranking models are often limited by the preset context window length. When there are many candidate documents, the model cannot process all documents at once, and often needs to reduce the input through truncation or random sampling, which leads to the omission of high-value evidence and further reduces the overall system performance.

[0008] Therefore, designing a re-ranking method that can perceive reasoning logic, refine supervision signals, and adapt to long document lists has become a key challenge in improving the accuracy and reliability of retrieval-augmented generation systems.

Summary of the Invention

[0009] To address the aforementioned technical problems, the core objective of this invention is to provide a generative document re-ranking method that balances reasoning utility and global ranking capability. By shifting the ranking objective to the actual contribution of a document to the reasoning steps, and combining high-quality supervision signal construction with a sliding window global re-ranking strategy, this method improves the retrieval accuracy of key documents and the practicality of the re-ranking model in complex question-answering scenarios.

[0010] To achieve the above objectives, the technical solution specifically adopted by the present invention is as follows:

[0011] A generative document re-ranking method based on retrieval enhancement, executed by a computing device, includes offline training and online inference phases, with the following specific steps:

[0012] (1) The original document set is cleaned and structured and divided into document fragments. After vectorization encoding, a vector index is constructed using the Faiss database to establish a mapping relationship between the fragments and the original documents. The query question is received and vector retrieval is performed based on the vector index to obtain a candidate document set.

[0013] (2) Input the query question and candidate document set into the large language model to generate the answer and thought chain. After decomposing the thought chain into atomic reasoning steps, sample screening is carried out by answer comparison, evidence verification and logical consistency evaluation to remove unqualified data samples.

[0014] (3) Calculate the information gain score and semantic similarity score of each candidate document for each atomic reasoning step, and take the maximum value after weighted fusion as the final contribution score of the document. Sort the candidate documents according to the score to form the training dataset.

[0015] (4) Encode the query question and the training dataset as model input, and iteratively train the generative re-ranking model so that it outputs an ordered sequence of document identifiers; during training, introduce a position-aware weighted loss function that gives higher loss weight to the top-ranked documents.

[0016] (5) During the online inference stage, if the number of candidate documents does not exceed the input length limit of the generative reordering model, the ranking result is obtained by directly inputting the data into the model; if it exceeds the limit, the window capacity and sliding step size are determined according to the model context window, and global reordering is achieved by iterative sliding of the window.

[0017] Preferably, the document segmentation in step (1) is performed by segmenting according to natural paragraphs or semantic boundaries. When a paragraph exceeds a preset threshold, it is segmented again according to a fixed token length. An overlapping area is set between adjacent document segments, and the target length of the document segment is 256 or 512 tokens.

[0018] Preferably, the vectorization encoding in step (1) adopts the bge-m3 embedding model, which maps both document fragments and query questions to the same semantic vector space; the vector retrieval uses cosine similarity to measure the similarity between the query vector and the document fragment vector, and aggregates the retrieval results from the fragment layer to the document layer to form a candidate document set.

[0019] As a preferred method, the method of decomposing the thought chain into atomic reasoning steps in step (2) is as follows: based on the logical order or syntactic boundaries in the thought chain text, each semantic unit that can independently express intermediate inference is taken as an atomic reasoning step.

[0020] As a preferred option, the sample screening step in step (2) includes: comparing the final answer generated by the large language model with the standard answer; determining whether the atomic reasoning steps can locate supporting evidence in the candidate documents; and determining whether there is a logical conflict or causal inconsistency between the atomic reasoning steps.

[0021] Preferably, the method for obtaining the information gain score in step (3) is as follows:

[0022] For each candidate document and each atomic inference step, calculate the probability of generating the inference step under the condition that the document is introduced, calculate the probability of generating the inference step without the document being introduced, and calculate the information gain score of the document for the inference step based on the above probabilities.

[0023] Preferably, the method for obtaining the semantic similarity score in step (3) is as follows:

[0024] Atomic inference steps and candidate documents are mapped to the same semantic vector space, and the semantic similarity score between each candidate document and each atomic inference step is calculated.

[0025] Preferably, the weighted fusion in step (3) is the information gain score multiplied by the weight parameter, plus the semantic similarity score multiplied by the difference between 1 and the weight parameter, wherein the weight parameter is used to balance causal contribution and semantic relevance.

[0026] Preferably, the position-aware weighted loss function in step (4) is the negative reciprocal of the number of candidate documents multiplied by the sum, over all sorting positions, of the product of the weight value of each sorting position and the logarithm of the generation probability of the document number at that position; the weight value of a sorting position is the ratio of 1 plus the hyperparameter to 1 plus the logarithm of the position number, the hyperparameter is greater than 0, and the weight value decreases as the sorting position increases.

[0027] As a preferred option, the sliding step size in step (5) is half of the window capacity, rounded down. The window capacity is determined so that the model input does not exceed a preset proportion of the maximum input length and space is reserved for the output. The globally most relevant documents are obtained by aggregating the results of iterative window sliding and the candidate update strategy.

[0028] Preferably, the generative reordering model adopts a listwise sorting method, which is obtained by fine-tuning the LLaMA-7B base model; the large language model is a GPT-4 model with reasoning ability, used to generate thought chains and construct sorting supervision signals.

[0029] This invention has the following characteristics and beneficial effects:

[0030] First, by breaking down the thought chain into atomic reasoning steps, the information gain and semantic similarity of documents to each reasoning step are quantified. This shifts the ranking objective from surface semantic matching to supporting reasoning utility, effectively improving the recall accuracy of key documents in complex question-answering scenarios and solving the core problems of "similar but unusable" and "usable but dissimilar." Second, for practical applications with limited model context window length, a sliding window re-ranking strategy is designed. By reasonably setting the window capacity and sliding step size, combined with iterative window movement and candidate updates, a globally consistent ranking of large-scale candidate document sets is achieved, avoiding the omission of high-value evidence due to data truncation. Third, a position-aware weighted loss function is introduced during the training process of the generative re-ranking model, making the model optimization process focus more on the accuracy of top-ranked documents, further strengthening the recall capability of core relevant documents in reasoning tasks and enhancing the practical application value of the model.

[0031] Furthermore, this invention employs a multi-round sample selection mechanism to eliminate samples with incorrect answers, insufficient evidence, or logical inconsistencies, ensuring the quality of the training dataset and avoiding interference from illusory supervision signals during model training. Simultaneously, it models the document reordering problem as a conditional sequence generation problem, using a listwise sorting method to enable the model to directly output an ordered sequence of documents, improving the efficiency and accuracy of reordering. The method of this invention can be widely applied to retrieval-enhanced question-answering systems, providing reliable evidence documents to support the reasoning and generation of large models. It is suitable for various complex question-answering scenarios such as multi-hop reasoning, causal inference, and cross-document evidence splicing, demonstrating good versatility and practicality.

[0032] This invention utilizes reasoning steps to shift the ranking objective towards reasoning utility, effectively improving the accuracy of key documents in question answering.

[0033] This invention addresses practical application scenarios with limited model context length by employing a sliding window reordering strategy to achieve globally consistent ranking of large-scale candidate document sets.

[0034] By introducing a position-aware weighted loss mechanism during model training, the model pays more attention to the accuracy of the top-ranked documents in the ranking results during optimization, effectively improving the recall ability of relevant documents in inference tasks.

Description of the Drawings

[0035] Figure 1 is a flowchart of the dataset construction of this invention.

[0036] Figure 2 is a flowchart of the sliding-window-based re-ranking.

[0037] Figure 3 is a schematic diagram of document sorting within a sliding window.

Detailed Implementation

[0038] The present invention will now be described in detail with reference to specific embodiments. These embodiments will help those skilled in the art to further understand the present invention, but do not limit the invention in any way. It should be noted that, unless otherwise specified, the embodiments and features described in the present invention can be combined with each other.

[0039] This invention discloses a generative document re-ranking method based on retrieval-augmented generation, executed by a computing device, comprising two stages: offline training and online inference. The computing device can be a server, workstation, cloud computing platform computing node, or other hardware device with data processing and model training capabilities. As shown in Figures 1-3, the specific implementation steps of the method are as follows:

[0040] Step 1: Document preprocessing and vector index construction, candidate document retrieval.

[0041] 1.1 Raw Document Set Processing: Obtain the raw document set used for retrieval and sorting. The document set may originate from web page text, structured knowledge base, internal document system, or other text storage media. Perform unified cleaning and structuring processing on the raw document set, converting all documents to a unified encoding format and removing invisible characters, abnormal symbols, and redundant whitespace characters. For documents containing titles, body text, or multi-field structures, concatenate them according to a preset template while retaining meta-information fields such as source, timestamp, and domain tags. Assign a unique document identifier (docid) to each document, and perform deduplication processing on documents using text similarity or hash methods to reduce the index size and improve retrieval efficiency.

[0042] 1.2 Document Fragment Segmentation: The cleaned documents are segmented, prioritizing segmentation based on natural paragraphs or semantic boundaries. When the length of a single paragraph exceeds a preset threshold, a secondary segmentation is performed using a fixed token length. The target length for document fragments is set to 256 or 512 tokens. A certain proportion of overlap is introduced between adjacent document fragments to prevent key information from being truncated at segmentation boundaries. A unique fragment identifier is assigned to each document fragment, and the identifier of the document to which the fragment belongs, together with the fragment's order and position in the original text, is recorded. Each fragment stores the original text content and the display text used as input to the model. The display text can be truncated to control its length, while the original text is used for subsequent text localization.
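The segmentation rule in 1.2 can be illustrated with a minimal Python sketch. It is not the patented implementation: whitespace-separated words stand in for model tokens, the overlap is applied only when a long paragraph is re-split, and the names `split_with_overlap`, `segment_document`, and the `docid#n` identifier scheme are hypothetical.

```python
from typing import Dict, List


def split_with_overlap(tokens: List[str], target_len: int = 256, overlap: int = 32) -> List[List[str]]:
    """Split a token list into fixed-length windows that share `overlap` tokens."""
    if not 0 <= overlap < target_len:
        raise ValueError("overlap must be in [0, target_len)")
    stride = target_len - overlap
    chunks: List[List[str]] = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + target_len])
        if start + target_len >= len(tokens):
            break  # the last window already reaches the end of the paragraph
        start += stride
    return chunks


def segment_document(docid: str, text: str, target_len: int = 256, overlap: int = 32) -> List[Dict]:
    """Paragraph-first segmentation with a fixed-length fallback for long paragraphs."""
    fragments: List[Dict] = []
    for paragraph in (p.strip() for p in text.split("\n\n")):
        if not paragraph:
            continue
        tokens = paragraph.split()  # assumption: whitespace tokens approximate model tokens
        if len(tokens) <= target_len:
            pieces = [tokens]
        else:
            pieces = split_with_overlap(tokens, target_len, overlap)
        for piece in pieces:
            fragments.append({
                "fragment_id": f"{docid}#{len(fragments)}",  # hypothetical identifier scheme
                "docid": docid,                               # parent document
                "position": len(fragments),                   # order within the document
                "text": " ".join(piece),
            })
    return fragments
```

In practice the token count would come from the embedding model's tokenizer rather than from whitespace splitting.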

[0043] 1.3 Vectorized Encoding and Vector Index Construction: bge-m3 is selected as the embedding model Encoder. All document fragments are vectorized, mapping each document fragment's text to the same semantic vector space. For any document fragment c, its vector representation is defined as:

[0044] $\mathbf{v}_c = \mathrm{Encoder}(c)$;

[0045] where $\mathbf{v}_c \in \mathbb{R}^d$ is a d-dimensional dense vector. The vectors of all document fragments are written into the Faiss database to construct a document fragment vector index; an approximate nearest neighbor index structure may be chosen based on the document scale and performance requirements. During index construction, the mapping among the internal identifiers of the vector index, the fragment identifiers, and the document identifiers is maintained; after the index is built, the index file and the mapping table are persistently stored.

[0046] 1.4 Candidate Document Retrieval: After receiving the query, the system applies the same normalization to the query text as to the documents and vectorizes the query with the bge-m3 embedding model described above. For query q, its vector representation is defined as $\mathbf{v}_q = \mathrm{Encoder}(q)$. The query vector is then normalized. Cosine similarity is used as the metric between the query vector and all document fragment vectors in the Faiss vector index, and is defined as follows:

[0047] $\cos(\mathbf{v}_q, \mathbf{v}_c) = \dfrac{\mathbf{v}_q \cdot \mathbf{v}_c}{\lVert \mathbf{v}_q \rVert \, \lVert \mathbf{v}_c \rVert}$, where $\mathbf{v}_c$ is the vector of document fragment $c$;

[0048] The retrieval results are a set of document fragments and their similarity scores. To construct a candidate document set for reasoning and ranking, the results are aggregated from the fragment layer to the document layer. Specifically, the several document fragments most similar to the query vector are retrieved; for multiple fragments belonging to the same document, the fragments with the highest similarity are selected as evidence fragments for that document, and the highest similarity is used as the document's initial relevance score; the top few documents by initial relevance score form the candidate document set, and the corresponding evidence fragment text is retained for each document.
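The fragment-to-document aggregation described above can be sketched in plain Python. This is an illustration under stated assumptions, not the embodiment itself: a brute-force scan replaces the Faiss index, only the single best fragment is kept per document, and `retrieve_candidates`, `top_fragments`, and `top_docs` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_candidates(
    query_vec: Sequence[float],
    fragments: List[Tuple[str, str, Sequence[float]]],  # (fragment_id, docid, vector)
    top_fragments: int = 100,
    top_docs: int = 50,
) -> List[Tuple[str, float, str]]:
    """Return (docid, initial_relevance, evidence_fragment_id), best document first."""
    # Fragment layer: score every fragment and keep the most similar ones.
    scored = sorted(
        ((cosine(query_vec, vec), frag_id, docid) for frag_id, docid, vec in fragments),
        reverse=True,
    )[:top_fragments]
    # Document layer: a document's score is the maximum over its retrieved fragments.
    best: Dict[str, Tuple[float, str]] = {}
    for score, frag_id, docid in scored:
        if docid not in best or score > best[docid][0]:
            best[docid] = (score, frag_id)
    ranked = sorted(best.items(), key=lambda kv: kv[1][0], reverse=True)[:top_docs]
    return [(docid, score, frag_id) for docid, (score, frag_id) in ranked]
```

With a real index, the first stage would be a nearest-neighbor search over normalized vectors, while the document-layer aggregation would stay the same.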

[0049] Step 2: Deconstruction of thought chain generation and atomic reasoning steps, and sample selection.

[0050] 2.1 Thought Chain Generation: Select a subset of training questions from the question set. For each training question, generate a corresponding candidate document set through the dense vector retrieval process in step 1 above. Organize the query question text and the candidate document set according to a unified input format and input them into a large language model with reasoning capabilities (GPT-4 is used in this embodiment) to generate the final answer to the query question and the corresponding thought chain. During the generation process, explicitly constrain the output format in the prompts so that the reasoning process is presented in a step-by-step manner. In each step, specify the document identifier on which it is based. At the same time, fix the sampling parameters to reduce the impact of randomness on the stability of the supervision signal.

[0051] 2.2 Atomic Reasoning Step Decomposition: The thought chain generated by the large language model is decomposed into multiple atomic reasoning steps, forming a set of reasoning steps. If the thought chain text contains explicit numbers, the numbers are used directly as the step boundaries for decomposition. If the thought chain text does not contain explicit numbers, it is decomposed according to syntactic boundaries, and each sentence that fully expresses an intermediate inference or factual judgment is regarded as an atomic reasoning step. Each decomposed atomic reasoning step is a semantic unit that can logically be used independently as an inference unit. The resulting ordered set of atomic reasoning steps is denoted as $S = \{s_1, s_2, \dots, s_n\}$, where $s_i$ represents the i-th atomic reasoning step.
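The decomposition rule in 2.2 (explicit numbers first, sentence boundaries otherwise) can be sketched as follows. The regular expressions are illustrative heuristics chosen for this example, not patterns taken from the patent, and `split_atomic_steps` is a hypothetical name.

```python
import re
from typing import List

# Assumption: numbered steps look like "1.", "2)", "3:" or "Step 4:" at the start of a line.
_NUMBERED = re.compile(r"^\s*(?:Step\s*)?\d+[.):]\s*", flags=re.MULTILINE)
# Assumption: a sentence ends with ".", "!" or "?" followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_atomic_steps(chain: str) -> List[str]:
    """Split a thought chain into atomic reasoning steps."""
    if _NUMBERED.search(chain):
        parts = _NUMBERED.split(chain)      # explicit numbering marks the step boundaries
    else:
        parts = _SENTENCE_END.split(chain)  # fall back to sentence boundaries
    return [p.strip() for p in parts if p.strip()]
```

A production pipeline would likely need a sturdier sentence splitter, since abbreviations and decimals at line starts can confuse these patterns.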

[0052] 2.3 Evidence Alignment: Each atomic reasoning step $s_i$ is vector-aligned with the evidence fragments in the candidate documents. Each atomic reasoning step $s_i$ and each evidence fragment of the candidate documents are mapped to the same vector space, and their semantic similarity is calculated. For step $s_i$ and any evidence fragment $c$ in document $d_j$, the similarity is defined as:

[0053] $\mathrm{sim}(s_i, c) = \dfrac{\mathbf{v}_{s_i} \cdot \mathbf{v}_c}{\lVert \mathbf{v}_{s_i} \rVert \, \lVert \mathbf{v}_c \rVert}$;

[0054] where $\mathbf{v}_{s_i}$ and $\mathbf{v}_c$ are the vector representations of the step text and the evidence fragment, respectively. For document $d_j$, the maximum similarity score between its evidence fragments and step $s_i$ represents the semantic matching degree of the document for this step. If the maximum similarity score exceeds a preset threshold, document $d_j$ is considered to semantically support step $s_i$, and a set of candidate supporting documents is constructed for each atomic reasoning step.

[0055] 2.4 Multi-dimensional Sample Screening: The generated answers, thought processes, and atomic reasoning steps are screened from multiple dimensions to remove unqualified data samples. Specifically, this includes:

[0056] (1) Answer correctness screening: The final answer generated by the large language model is compared with the standard answer corresponding to the question. If the standard answer is in a closed form, it is matched precisely after normalization. If it is a numerical answer, a certain range of numerical error is allowed. If it is an open text answer, key facts are extracted by rules or a fixed discriminant model is used to judge semantic equivalence. Samples whose answers do not meet the requirements are directly removed.

[0057] (2) Evidence sufficiency check: Determine whether the candidate document set contains the necessary information to support the reasoning of the query question. For each atomic reasoning step, check whether its candidate supporting document set is non-empty. If the thought chain contains steps that have no corresponding supporting evidence, multiple steps that rely on external knowledge rather than candidate document content, or facts or entities that cannot be located in the candidate documents, the sample is judged to have insufficient information and is removed.

[0058] (3) Logical consistency assessment: Input the query question text, the candidate document set, and the set of decomposed atomic reasoning steps into the review model. The review model judges the logical rationality of the reasoning process. Specifically, it checks whether there are obvious jumps between reasoning steps, whether there are circular arguments or self-contradictions, whether causal relationships are inverted or conclusions precede their premises, and whether the reasoning chain as a whole gradually converges from the document evidence to the final answer. If the review model outputs a failure result, or the answer generated by the review model is inconsistent with the standard answer, the sample is removed.
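The first two screening checks can be sketched in Python as below; the logical consistency check is omitted because it depends on a review model. The normalization rules, the 1% relative tolerance, and the names `normalize`, `answer_matches`, and `keep_sample` are assumptions made for this illustration, and open-text answers are not handled.

```python
import re
import string
from typing import Collection, Sequence


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def answer_matches(predicted: str, gold: str, rel_tol: float = 1e-2) -> bool:
    """Numeric answers match within a relative tolerance; others need a normalized exact match."""
    try:
        p = float(predicted.replace(",", ""))
        g = float(gold.replace(",", ""))
    except ValueError:
        return normalize(predicted) == normalize(gold)
    return abs(p - g) <= rel_tol * max(abs(g), 1e-12)


def keep_sample(predicted: str, gold: str, support_sets: Sequence[Collection[str]]) -> bool:
    """Keep a sample only if the answer is correct and every step has supporting documents."""
    return answer_matches(predicted, gold) and all(len(s) > 0 for s in support_sets)
```

Here `support_sets` holds, for each atomic reasoning step, the identifiers of the documents judged to support it.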

[0059] Step 3: Document contribution metric and training dataset construction

[0060] 3.1 Information Gain Score Calculation: For each candidate document and each atomic inference step, calculate the probability of generating the inference step under the condition of introducing the document and the probability of generating the inference step without introducing the document. The probabilities are obtained through multiple sampling confidence estimation. The information gain score is defined based on the counterfactual assumption as follows:

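[0061] $\mathrm{IG}(d_j, s_i)$, computed from $P(s_i \mid q, d_j)$ and $P(s_i \mid q)$;

[0062] where $P(s_i \mid q, d_j)$ denotes the probability of generating step $s_i$ when document $d_j$ is included, and $P(s_i \mid q)$ denotes the probability of generating step $s_i$ when document $d_j$ is not included.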

[0063] 3.2 Semantic Similarity Score Calculation: The atomic reasoning steps and each evidence fragment of the candidate document are mapped to the same semantic vector space using the bge-m3 embedding model to obtain the corresponding vector representations. The formula is as follows:

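[0064] $\mathbf{v}_{s_i} = \mathrm{Encoder}(s_i), \quad \mathbf{v}_c = \mathrm{Encoder}(c)$;

[0065] For each candidate document $d_j$ and each atomic reasoning step $s_i$, the semantic similarity score is defined as:

[0066] $\mathrm{Sim}(d_j, s_i) = \max_{c \in d_j} \dfrac{\mathbf{v}_{s_i} \cdot \mathbf{v}_c}{\lVert \mathbf{v}_{s_i} \rVert \, \lVert \mathbf{v}_c \rVert}$;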

[0067] 3.3 Final Contribution Score Calculation: The information gain score and semantic similarity score of each candidate document are weighted and fused. The fusion formula is as follows:

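[0068] $\mathrm{Score}(d_j, s_i) = \alpha \cdot \mathrm{IG}(d_j, s_i) + (1 - \alpha) \cdot \mathrm{Sim}(d_j, s_i)$;

[0069] where $\mathrm{IG}(d_j, s_i)$ and $\mathrm{Sim}(d_j, s_i)$ are the information gain score and the semantic similarity score of document $d_j$ for step $s_i$, and α is a weighting parameter used to balance causal contribution and semantic relevance. A max-pooling strategy selects the maximum fusion score of the document across all atomic reasoning steps as its final contribution score:

[0070] $\mathrm{Score}(d_j) = \max_{i} \mathrm{Score}(d_j, s_i)$;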

[0071] 3.4 Training Dataset Formation: Based on the reasoning-chain alignment described above, the final contribution score of each candidate document to question q has been calculated. The candidate document set is sorted in descending order of this score to obtain the target sorting sequence of document identifiers, which represents the optimal document ranking induced by the model's reasoning process. To facilitate model learning, local numbers are assigned to the candidate documents within each sample, and a mapping between local numbers and original document identifiers is established, transforming the target sorting sequence into an ordered sequence of local numbers $Y = (y_1, y_2, \dots, y_N)$, where $N$ is the number of candidate documents and $y_t$ is the local number of the document at position t. This serves as the training dataset for document re-ranking.
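Steps 3.3 and 3.4 (weighted fusion, max-pooling over reasoning steps, and conversion to a sequence of local numbers) can be sketched as follows. The per-step information gain and similarity scores are assumed to be precomputed and on comparable scales; `contribution_scores`, `target_sequence`, the default α of 0.5, and 1-based local numbering are choices made for this example only.

```python
from typing import Dict, List, Sequence, Tuple


def contribution_scores(
    ig: Dict[str, Sequence[float]],   # docid -> information gain per reasoning step
    sim: Dict[str, Sequence[float]],  # docid -> semantic similarity per reasoning step
    alpha: float = 0.5,
) -> Dict[str, float]:
    """Fuse both scores per step, then max-pool over the reasoning steps."""
    return {
        docid: max(alpha * g + (1.0 - alpha) * s for g, s in zip(ig[docid], sim[docid]))
        for docid in ig
    }


def target_sequence(candidates: List[str], scores: Dict[str, float]) -> Tuple[List[int], Dict[int, str]]:
    """Assign local numbers (1-based, in candidate order) and sort them by contribution score."""
    local_to_docid = {i + 1: docid for i, docid in enumerate(candidates)}
    order = sorted(local_to_docid, key=lambda n: scores[local_to_docid[n]], reverse=True)
    return order, local_to_docid
```

For example, with candidates `["a", "b", "c"]` whose final scores are 0.7, 0.2, and 0.6, the target sequence is `[1, 3, 2]`, and the mapping recovers the original identifiers after generation.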

[0072] Step 4: Training the generative reordering model

[0073] 4.1 Model Input Encoding: Concatenate the query question and the candidate document set in a preset format and encode it into the input features of the generative re-ranking model. In this embodiment, the generative re-ranking model is based on the LLaMA-7B model and adopts the Listwise ranking method, modeling the document re-ranking problem as a conditional sequence generation problem.

[0074] 4.2 Basic Settings for Model Training: Let the model parameters be θ, and the conditional probability distribution learned by the generative re-ranking model is:

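[0075] $P_\theta(Y \mid q, D) = \prod_{t=1}^{N} P_\theta(y_t \mid q, D, y_{<t})$;

[0076] where $Y = (y_1, \dots, y_N)$ is the output sequence of local document numbers, $D$ is the candidate document set, and $y_{<t}$ represents the sequence of document numbers generated before the t-th position. The model is trained so that, after receiving the input, it directly outputs an ordered sequence of local document numbers.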

[0077] 4.3 Position-Aware Weighted Loss Function Setting: A position-aware weighted loss function is introduced during model training. It assigns higher loss weights to top-ranked documents, so that the model pays more attention to the sorting accuracy of the head documents. The loss function is defined as:

[0078] $\mathcal{L} = -\dfrac{1}{N} \sum_{t=1}^{N} w(t) \, \log P_\theta(y_t \mid q, D, y_{<t})$;

[0079] where $N$ is the number of candidate documents and $w(t)$ represents the weight function corresponding to sorting position t. To highlight the importance of the head documents, the weight function is designed to decrease with position:

[0080] $w(t) = \dfrac{1 + \lambda}{1 + \log t}$;

[0081] where $\lambda > 0$ is a hyperparameter used to control the enhancement amplitude of the head position weight; in the specific implementation, the position weight is applied only to output tokens representing document numbers, and the default weight is maintained for other natural language tokens.
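The position-aware loss can be illustrated numerically with a short Python sketch. It follows the wording of the preferred embodiment (a weight equal to the ratio of 1 plus the hyperparameter to 1 plus the logarithm of the position number, and a loss equal to the negative mean of weighted log-probabilities). The natural logarithm, 1-based positions, and the names `position_weight`, `position_aware_loss`, and `lam` are assumptions.

```python
import math
from typing import Sequence


def position_weight(t: int, lam: float = 1.0) -> float:
    """Weight for 1-based ranking position t; decreases as t grows (natural log assumed)."""
    return (1.0 + lam) / (1.0 + math.log(t))


def position_aware_loss(log_probs: Sequence[float], lam: float = 1.0) -> float:
    """Negative mean of position-weighted log-probabilities.

    log_probs[t-1] is the log-probability the model assigns to the correct
    document number at ranking position t.
    """
    n = len(log_probs)
    weighted = sum(position_weight(t, lam) * lp for t, lp in enumerate(log_probs, start=1))
    return -weighted / n
```

Under these assumptions, an error of the same magnitude costs more at position 1 than at position 3, which is the intended emphasis on head documents. In a real training loop the same weights would multiply the per-token cross-entropy of the document-number tokens only.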

[0082] 4.4 Model Iterative Training: Use the training dataset constructed in step 3 above to perform iterative fine-tuning training on the generative re-ranking model until the sorting loss of the model converges and the performance reaches the preset standard, obtaining the trained generative re-ranking model.

[0083] Step 5: Global Re-ranking in the Online Inference Phase

[0084] 5.1 Candidate Document Preprocessing: In the online inference phase, after the system receives the query text, it repeats the document retrieval process in step 1 to obtain the candidate document set, and retains several highest-score evidence segments for each document as the model input content.

[0085] 5.2 Direct Ranking: If the total model input length corresponding to the candidate documents does not exceed the maximum input length of the context window of the generative re-ranking model, the query question and the candidate document set are input directly into the trained generative re-ranking model, and the model outputs an ordered sequence of document identifiers as the final re-ranking result.

[0086] 5.3 Global Reordering via Sliding Window: If the total model input length corresponding to the candidate documents exceeds the maximum input length of the model context window, a sliding window mechanism is used to achieve global reordering:

[0087] (1) Window parameter settings: based on the maximum input length $L_{\max}$ of the model, determine the window size k so that the model input within the window does not exceed a preset proportion of the maximum input length and sufficient space is reserved for the model output; set the sliding step size $s = \lfloor k/2 \rfloor$;

[0088] (2) Local reordering: The candidate document list is divided into windows in order, and the document sequence in each window is concatenated with the query question and then input into the generative reordering model to complete the local reordering;

[0089] (3) Global aggregation: Through iterative window movement and candidate update strategy, the local re-ranking results of each window are merged and finally aggregated to generate an ordered sequence of global Top-k related documents as the final re-ranking result.
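The three sub-steps above can be sketched in Python. The text does not state the sliding direction or the exact candidate update rule, so this sketch assumes a single back-to-front pass in which the documents promoted to the front of one window are carried into the next, overlapping window. `rank_window` is a hypothetical stand-in for the generative re-ranking model, and `window_capacity` with its `input_ratio`, `prompt_len`, and `per_doc_len` parameters is likewise an assumption.

```python
from typing import Callable, List, Sequence


def window_capacity(max_input_len: int, prompt_len: int, per_doc_len: int, input_ratio: float = 0.8) -> int:
    """Number of documents per window so the input stays within a share of the context."""
    budget = int(max_input_len * input_ratio) - prompt_len  # the remainder is left for the output
    return max(1, budget // per_doc_len)


def sliding_window_rerank(
    docs: Sequence[str],
    rank_window: Callable[[List[str]], List[str]],  # returns the same documents, best first
    k: int,
) -> List[str]:
    """Re-rank a long candidate list with overlapping windows of size k and step k // 2."""
    ranked = list(docs)
    if len(ranked) <= k:
        return rank_window(ranked)  # direct ranking: everything fits in one window
    step = max(1, k // 2)
    end = len(ranked)
    while True:
        start = max(0, end - k)
        # Local re-ranking; winners move to the front of the window and are
        # carried into the next window by the overlap (candidate update).
        ranked[start:end] = rank_window(ranked[start:end])
        if start == 0:
            break
        end -= step
    return ranked
```

With a perfect local ranker, one such pass places the best `k - step` documents at the head of the list in the correct order; the tail is only locally ordered, which is acceptable when only the top results are passed to answer generation.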

[0090] Experimental verification

[0091] To verify the effectiveness of the method of this invention, a systematic experimental evaluation was conducted on the public reasoning question-answering datasets HotpotQA and MuSiQue:

[0092] Experimental setup: In the candidate recall phase, a dense vector retrieval method based on Sentence-BERT was adopted, and a vector index was built with Faiss to recall the Top-50 candidate documents for each question. The generative re-ranking model was based on LLaMA-7B and trained by instruction fine-tuning. GPT-4 served as the large language reasoning model, generating the thought chains and constructing the ranking supervision signals. The Cross-Encoder (ms-marco-minilm-l-6-v2) re-ranking model was used as the baseline, and Recall@5 was used as the evaluation metric to measure how well each model recalls relevant documents.
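For reference, the Recall@5 metric can be computed as in the sketch below. The text does not spell out the exact definition used in the experiments, so this assumes the common one: for each question, the fraction of gold supporting documents that appear among the top 5 re-ranked documents, averaged over questions and reported on a 0 to 100 scale. The function names are illustrative.

```python
from typing import Iterable, Sequence, Tuple


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Iterable[str], k: int = 5) -> float:
    """Fraction of relevant documents that appear in the top-k of the ranking."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("each question needs at least one relevant document")
    hits = len(relevant.intersection(ranked_ids[:k]))
    return hits / len(relevant)


def mean_recall_at_k(
    examples: Sequence[Tuple[Sequence[str], Iterable[str]]],
    k: int = 5,
) -> float:
    """Average Recall@k over (ranking, gold documents) pairs, scaled to 0-100."""
    if not examples:
        raise ValueError("need at least one example")
    return 100.0 * sum(recall_at_k(r, g, k) for r, g in examples) / len(examples)


if __name__ == "__main__":
    examples = [
        (["a", "b", "c", "d", "e", "f"], {"b", "f"}),  # one of two gold docs in top 5
        (["x", "y"], {"x"}),                           # the only gold doc is in top 5
    ]
    print(mean_recall_at_k(examples, k=5))
```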

[0093] The document re-ranking method proposed in this invention was compared with the Cross-Encoder (ms-marco-minilm-l-6-v2) re-ranking model in question-answering scenarios, using the Recall@5 score as the measure of how well relevant documents are recalled. The evaluation results are presented in the following table:

[0094]

[0095] Table 1 shows the evaluation results.

[0096] Results Analysis: As shown in Table 1, the method of this invention achieves a Recall@5 score of 65.5 on the MuSiQue dataset and 60.1 on the HotpotQA dataset, compared with 57.2 and 48.3, respectively, for the comparison model, an improvement of 8.3 and 11.8 points. These results indicate that the method of this invention effectively guides the model to prioritize documents that play a key supporting role in the reasoning process, and that it improves document re-ranking performance in complex reasoning question-answering scenarios.

[0097] The foregoing has shown and described the basic principles, main features, and advantages of the present invention. Those skilled in the art should understand that the present invention is not limited to the above embodiments. The embodiments and descriptions in the specification are merely preferred examples and are not intended to limit the invention. Various changes and modifications can be made to the invention without departing from its spirit and scope, and all such changes and modifications fall within the scope of the present invention as claimed. The scope of protection of the present invention is defined by the appended claims and their equivalents.

Claims

1. A generative document re-ranking method based on retrieval enhancement, characterized in that the method is performed by computing devices and includes an offline training phase and an online inference phase, with the following specific steps:
(1) The original document set is cleaned, structured, and divided into document fragments. After vectorization encoding, a vector index is constructed using the Faiss database to establish a mapping relationship between the fragments and the original documents. The query question is received and vector retrieval is performed based on the vector index to obtain a candidate document set.
(2) The query question and the candidate document set are input into the large language model to generate the answer and thought chain. After the thought chain is decomposed into atomic reasoning steps, sample screening is carried out by answer comparison, evidence verification, and logical consistency evaluation to remove unqualified data samples.
(3) The information gain score and semantic similarity score of each candidate document are calculated for each atomic reasoning step, and the maximum fused value across the atomic reasoning steps after weighted fusion is taken as the final contribution score of the document. The candidate documents are sorted according to this score to form the training dataset.
(4) The query question and training dataset are encoded into the input of the large language model, and the generative reordering model is iteratively trained so that it outputs an ordered sequence of document identifiers; a position-aware weighted loss function is introduced during training to give higher loss weights to the documents ranked higher.
(5) During the online inference stage, if the model input corresponding to the candidate documents does not exceed the input length limit of the generative reordering model, the ranking result is obtained by inputting the data directly into the model; if it exceeds the limit, the window capacity and sliding step size are determined according to the model context window, and global reordering is achieved by iterative sliding of the window.

2. The method according to claim 1, characterized in that the document segmentation in step (1) prioritizes segmentation based on natural paragraphs or semantic boundaries. When a paragraph exceeds a preset threshold, it is segmented a second time based on a fixed token length. An overlapping area is set between adjacent document segments, and the target length of the document segment is 256 or 512 tokens.

3. The method according to claim 1, characterized in that the vectorization encoding in step (1) adopts the bge-m3 embedding model, which maps both document fragments and query questions to the same semantic vector space; the vector retrieval adopts cosine similarity to measure the similarity between query vectors and document fragment vectors, and aggregates the retrieval results from the fragment layer to the document layer to form a candidate document set.

4. The method according to claim 1, characterized in that the method for decomposing the thought chain into atomic reasoning steps in step (2) is as follows: based on the logical order or syntactic boundaries in the thought chain text, each semantic unit that can independently express an intermediate inference is taken as an atomic reasoning step.

5. The method according to claim 1, characterized in that the sample screening in step (2) includes: comparing the final answer generated by the large language model with the standard answer; determining whether the atomic reasoning steps can locate supporting evidence in the candidate documents; and determining whether there are logical conflicts or causal inconsistencies between the atomic reasoning steps.

6. The method according to claim 1, characterized in that the method for obtaining the information gain score in step (3) is as follows: for each candidate document and each atomic reasoning step, calculate the probability of generating the reasoning step when the document is introduced, calculate the probability of generating the reasoning step when the document is not introduced, and calculate the information gain score of the document for the reasoning step based on these two probabilities.

7. The method according to claim 6, characterized in that the method for obtaining the semantic similarity score in step (3) is as follows: the atomic reasoning steps and candidate documents are mapped to the same semantic vector space, and the semantic similarity score between each candidate document and each atomic reasoning step is calculated.

8. The method according to claim 7, characterized in that the weighted fusion in step (3) is the information gain score multiplied by a weight parameter, plus the semantic similarity score multiplied by the difference between 1 and the weight parameter. The weight parameter is used to balance causal contribution and semantic relevance.

9. The method according to claim 1, characterized in that the position-aware weighted loss function in step (4) is the negative reciprocal of the number of candidate documents multiplied by the sum, over all ranking positions, of the product of the ranking-position weight and the logarithm of the generation probability of the document identifier at that position; the weight of the ranking position is: [formula not reproduced], where one symbol denotes a hyperparameter and the other denotes the ranking position.

10. The method according to claim 1, characterized in that the sliding step size in step (5) is half of the window capacity; the window capacity is determined so that the model input does not exceed the preset proportion of the maximum input length and output space is reserved; and the globally most relevant documents are obtained by aggregating the results of iterative window sliding under the candidate update strategy.

11. The method according to claim 1, characterized in that the generative reordering model adopts a listwise sorting method and is obtained by instruction fine-tuning of the LLaMA-7B base model; the large language model is a GPT-4 model with reasoning ability, which is used to generate thought chains and construct sorting supervision signals.
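For illustration only, the contribution score described in step (3) and claims 6 to 8 can be sketched as below. The claims state that the information gain is computed from the two generation probabilities but do not give the formula in this text, so the log-probability ratio in `information_gain` is an assumption; the names `alpha`, `contribution_score`, and `rank_documents` are likewise illustrative rather than taken from the invention.

```python
import math
from typing import Dict, List, Sequence, Tuple


def information_gain(p_with_doc: float, p_without_doc: float) -> float:
    """Assumed form: log-ratio of the step's generation probability with vs. without the document."""
    if p_with_doc <= 0 or p_without_doc <= 0:
        raise ValueError("probabilities must be positive")
    return math.log(p_with_doc) - math.log(p_without_doc)


def contribution_score(gains: Sequence[float], sims: Sequence[float], alpha: float) -> float:
    """Fuse per-step scores as alpha * gain + (1 - alpha) * similarity, then take the maximum."""
    if not gains or len(gains) != len(sims):
        raise ValueError("need one gain and one similarity per reasoning step")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return max(alpha * g + (1.0 - alpha) * s for g, s in zip(gains, sims))


def rank_documents(
    per_doc: Dict[str, Tuple[Sequence[float], Sequence[float]]],
    alpha: float,
) -> List[str]:
    """Sort document ids by descending contribution score."""
    scores = {d: contribution_score(g, s, alpha) for d, (g, s) in per_doc.items()}
    return sorted(scores, key=lambda d: scores[d], reverse=True)


if __name__ == "__main__":
    per_doc = {
        "doc_a": ([information_gain(0.8, 0.2), 0.0], [0.3, 0.1]),
        "doc_b": ([0.0, 0.1], [0.6, 0.2]),
    }
    print(rank_documents(per_doc, alpha=0.5))
```

Taking the maximum over steps means a document that strongly supports a single reasoning step is ranked highly even if it is irrelevant to the other steps.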