A method and system for enhancing retrieval question answering performance based on hard negative examples

By using iterative linear assignment and adversarial training based on hard negative examples, the problem of low training efficiency in retrieval question answering systems is solved, improving model performance and avoiding overfitting. It is suitable for retrieval question answering systems in low-resource scenarios.

CN116186230B · Active · Publication Date: 2026-06-19 · BEIHANG UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
BEIHANG UNIV
Filing Date
2023-03-20
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

In retrieval-based question answering systems, the negative examples obtained from in-batch sampling are relatively easy and carry little information, which limits model performance, while encoding additional negative examples lowers training efficiency. How to improve the performance of retrieval-based question answering models without losing training efficiency has become an urgent problem to be solved, especially in low-resource scenarios where overfitting is serious and the number of hard negative examples is small.

Method used

We adopt a method for enhancing retrieval question answering performance based on hard negative examples. The training set is divided into groups by an iterative linear assignment algorithm to construct hard training batches, so that the encodings of in-batch negative examples are reused for training. Adversarial perturbations are also added to the question and answer embedding vectors to enhance training stability and increase the difficulty of in-batch negative examples.

🎯Benefits of technology

Without increasing the forward training cost, the performance of the retrieval question answering model is improved, overfitting is avoided, and the robustness and training efficiency of the model are enhanced, especially the accuracy of retrieval question answering is significantly improved in low-resource scenarios.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention relates to a method and system for enhancing retrieval question answering performance based on hard negative examples. The method includes: S1: preprocessing the question and answer texts to obtain the input text sequences $X_q$ and $X_a$; S2: inputting $X_q$ and $X_a$ into a dual-tower model to obtain the sentence-level context feature matrices $H_q$ and $H_a$; S3: calculating the unnormalized matching score between $H_q$ and $H_a$ of a question-answer pair and applying a Softmax operation to obtain a normalized probability distribution; S4: constructing a loss function based on the normalized probability distribution; S5: calculating the symmetric and asymmetric difficulty of mismatched question-answer pairs based on their normalized probability distribution; S6: grouping the training set to maximize the difficulty of each group, and constructing training batches based on the grouping results; S7: adding adversarial perturbations to the question and answer embedding vectors to enhance training stability and increase the difficulty of negative examples within a batch. The method provided by this invention improves the performance of the retrieval question answering model while maintaining training efficiency.

Description

Technical Field

[0001] This invention relates to the field of machine question answering, and more specifically to a method and system for enhancing retrieval question answering performance based on hard negative examples.

Background Technology

[0002] Question answering systems, as a crucial application of natural language processing, have garnered increasing attention in recent years. In the information age, the internet is flooded with information, making information sharing and acquisition easier, but also posing a challenge to quickly retrieving the information needed. Question answering systems, as an ideal human-computer interaction model, aim to directly answer user questions, eliminating the annoyance of false information and advertisements that users may encounter when using search engines. Today, with the development of machine learning and deep learning, the general paradigm of question answering systems has shifted from traditional rule-based models to those based on pre-trained models, such as the large neural network model BERT. These models are pre-trained on massive amounts of unsupervised corpora and then fine-tuned using downstream data for specific tasks to achieve better performance. This data-driven model outperforms traditional rule-driven methods and is more flexible, unlike the rigidity of rules. Furthermore, benefiting from advancements in computing power and the increase in available data, its performance is even better; therefore, data-driven deep learning models have become the main paradigm for question answering systems.

[0003] Depending on the technology used, question answering systems can be divided into generative, extractive, and retrieval question answering systems. Generative question answering systems generate answers word by word using generative models, which carries risks such as slow generation, incorrect answers, and content that violates ethical and legal norms. Extractive question answering systems find answers through a two-stage process of recalling documents and then extracting answers; documents containing the correct answer may be missed during recall, so this approach carries the risk of error accumulation. Retrieval question answering systems retrieve answers end-to-end from existing answer paragraphs and sentences, which is controllable and avoids the error propagation problem of extractive systems. However, in retrieval question answering systems, negative examples obtained through in-batch sampling are of low difficulty and carry little information, which limits model performance, while encoding additional negative examples introduces time and memory costs that significantly reduce training efficiency. Therefore, how to improve the performance of retrieval question answering models without sacrificing training efficiency has become an urgent problem to be solved.

Summary of the Invention

[0004] To address the aforementioned technical problems, this invention provides a method and system for enhancing retrieval question answering performance based on hard negative examples.

[0005] The technical solution of this invention is: a method for enhancing retrieval and question answering performance based on difficult negative examples, comprising:

[0006] Step S1: Preprocess the question and answer texts to obtain the input text sequences $X_q$ and $X_a$ of the question and answer, respectively;

[0007] Step S2: Input $X_q$ and $X_a$ into a dual-tower model to output the sentence-level context feature matrices $H_q$ and $H_a$ of $X_q$ and $X_a$;

[0008] Step S3: Calculate the unnormalized matching score between $H_q$ and $H_a$ of a question-answer pair, and apply a Softmax operation to obtain a normalized probability distribution;

[0009] Step S4: Approximate the negative example set using an in-batch negative example sampling strategy. Based on the normalized probability distribution, the loss function is calculated using cross-entropy for backpropagation;

[0010] Step S5: Based on the normalized probability distribution of the question-answer pairs, calculate the symmetric and asymmetric difficulty between mismatched question-answer pairs;

[0011] Step S6: Group the training set: First, initialize the group centers of n groups. Then, based on the symmetric difficulty of the mismatched question-answer pairs, calculate the maximum payoff bijection between the ungrouped samples and the existing groups, and group them to maximize the difficulty of each group. The group difficulty is the average asymmetric difficulty of the mismatched question-answer pairs within the group. After multiple initializations and groupings, the group with the highest total difficulty is taken as the final grouping result, and random samples are taken from each group to construct the training batch.

[0012] Step S7: Add adversarial perturbations to the question and answer embedding vectors to enhance training stability and further increase the difficulty of negative examples within a batch.

[0013] Compared with the prior art, the present invention has the following advantages:

[0014] 1. This invention discloses a method for enhancing retrieval and question answering performance based on difficult negative examples. It uses an iterative linear allocation algorithm to divide the training set into several difficult groups, and on this basis, constructs difficult training batches, so that the model can reuse the encoding results of difficult negative examples in the batch. Without increasing the forward training cost, it can benefit from the training of difficult negative examples in the batch, maintain the high efficiency of training, and enhance the final retrieval and question answering performance.

[0015] 2. In low-resource scenarios such as biomedicine, data annotation can only be done by domain experts, which requires expensive human resources. In such scenarios, model performance is limited by dataset size, the model is prone to overfitting, and fewer hard negative examples are available in the dataset. This invention proposes adversarial training using the fast gradient sign method. By applying small perturbations to the token embedding vectors of questions and answers, the model is induced to make mistakes. This not only increases the difficulty of in-batch negative examples through gradient ascent but also enhances the robustness of the model and avoids overfitting.

Attached Figure Description

[0016] Figure 1 is a flowchart of a method for enhancing retrieval question answering performance based on hard negative examples in an embodiment of the present invention;

[0017] Figure 2 is a schematic diagram of the dual-tower model architecture in an embodiment of the present invention;

[0018] Figure 3 is a structural block diagram of a system for enhancing retrieval question answering performance based on hard negative examples in an embodiment of the present invention.

Detailed Implementation

[0019] This invention provides a method for enhancing retrieval question answering performance based on hard negative examples. It combines the advantages of efficient training and no additional forward computation cost of negative examples obtained through batch sampling with the advantages of encoding additional hard negative examples to provide more information and better training results. This improves the performance of the retrieval question answering model without losing training efficiency.

[0020] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below through specific implementations and in conjunction with the accompanying drawings.

[0021] To aid in understanding the embodiments of the present invention, the definition of a hard negative example is given first:

[0022] Hard negative examples are incorrect answers that are difficult to distinguish from the correct answer, and they are crucial for retrieval question answering tasks. However, conventional hard negative examples incur additional encoding time and memory costs. For example, if two additional hard negative examples are needed, the forward training time and memory required will double. When more hard negative examples are needed, the problem becomes more serious, making training inefficient.

[0023] Example 1

[0024] As shown in Figure 1, an embodiment of the present invention provides a method for enhancing retrieval question answering performance based on hard negative examples, comprising the following steps:

[0025] Step S1: Preprocess the question and answer texts to obtain the input text sequences $X_q$ and $X_a$ of the question and answer, respectively;

[0026] Step S2: Input $X_q$ and $X_a$ into a dual-tower model to output the sentence-level context feature matrices $H_q$ and $H_a$ of $X_q$ and $X_a$;

[0027] Step S3: Calculate the unnormalized matching score between $H_q$ and $H_a$ of a question-answer pair, and apply a Softmax operation to obtain a normalized probability distribution;

[0028] Step S4: Approximate the negative example set using an in-batch negative example sampling strategy. Based on the normalized probability distribution, the loss function is calculated using cross-entropy and used for backpropagation;

[0029] Step S5: Based on the normalized probability distribution of question-answer pairs, calculate the symmetric and asymmetric difficulty between mismatched question-answer pairs;

[0030] Step S6: Group the training set: First, initialize the group centers of n groups. Then, based on the symmetric difficulty of mismatched question-answer pairs, calculate the maximum payoff bijection between the ungrouped samples and the existing groups, and group them to maximize the difficulty of each group. The group difficulty is the mean of the asymmetric difficulty of mismatched question-answer pairs within the group. After multiple initializations and groupings, the group with the highest total difficulty is taken as the final grouping result, and random samples are taken from each group to construct the training batch.

[0031] Step S7: Add adversarial perturbations to the question and answer embedding vectors to enhance training stability and further increase the difficulty of negative examples within a batch.

[0032] In one embodiment, step S1 above, preprocessing the question and answer texts to obtain the input text sequences $X_q$ and $X_a$ of the question and answer, specifically includes:

[0033] Step S11: Convert the input question and candidate answer texts to Unicode encoding and perform word segmentation on them to obtain the segmentation results $q = \{w^q_1, \dots, w^q_{l_q}\}$ and $a = \{w^a_1, \dots, w^a_{l_a}\}$, where q and a denote a single question and answer, $w^q_i$ and $w^a_i$ denote the i-th word in the question and the i-th word in the answer, respectively, and $l_q$ and $l_a$ are the text lengths of the question and the answer, respectively.

[0034] Given a user-submitted question and candidate answers, retrieval question answering aims to retrieve the correct answer from the candidate answers end-to-end. First, the question and candidate answers are converted to Unicode encoding (e.g., the commonly used UTF-8 encoding). Then, the question and candidate answers are segmented using the WordPiece sub-word model.

[0035] Step S12: Concatenate the word segmentation results of the question and answer with the tokens [CLS] and [SEP] to obtain the input text sequence $X_q$ of the question and the input text sequence $X_a$ of the answer:

[0036] $X_q = \{[\mathrm{CLS}], w^q_1, \dots, w^q_{l_q}, [\mathrm{SEP}]\}$

[0037] $X_a = \{[\mathrm{CLS}], w^a_1, \dots, w^a_{l_a}, [\mathrm{SEP}]\}$

[0038] That is, the word segmentation results of the question and answer are concatenated with the special tokens of BERT, with [CLS] as the start token and [SEP] as the end token.
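The construction in step S12 can be illustrated with a minimal Python sketch. The helper name `build_input_sequence` is hypothetical, and whitespace splitting stands in for the WordPiece model purely to keep the example dependency-free; it is not the segmentation used in the embodiment.

```python
def build_input_sequence(tokens):
    """Wrap an already-segmented text with the [CLS] start and [SEP] end tokens."""
    return ["[CLS]"] + list(tokens) + ["[SEP]"]

# Assumption: whitespace splitting replaces WordPiece for illustration only.
question_tokens = "what causes cystic fibrosis".split()
answer_tokens = "mutations in the cftr gene".split()

x_q = build_input_sequence(question_tokens)
x_a = build_input_sequence(answer_tokens)
print(x_q)
print(x_a)
```

The resulting sequences have lengths l_q + 2 and l_a + 2, which is the length the position_id assignment in step S21 would need to cover.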

[0039] In one embodiment, step S2 above, inputting $X_q$ and $X_a$ into a dual-tower model to output the sentence-level context feature matrices $H_q$ and $H_a$ of $X_q$ and $X_a$, specifically includes:

[0040] Step S21: Convert $X_q$ and $X_a$ into token_id sequences and, based on the length of each token_id sequence, assign position_ids of the corresponding length.

[0041] Step S22: Construct a dual-tower model with shared parameters based on BERT; input the token_id sequences and position_ids into the dual-tower model to extract the word-level context feature matrices $H_q$ and $H_a$:

[0042] $H_q = [h^q_{\mathrm{CLS}}, h^q_1, \dots, h^q_{l_q}, h^q_{\mathrm{SEP}}] \in \mathbb{R}^{(l_q+2) \times d}$

[0043] $H_a = [h^a_{\mathrm{CLS}}, h^a_1, \dots, h^a_{l_a}, h^a_{\mathrm{SEP}}] \in \mathbb{R}^{(l_a+2) \times d}$

[0044] where $h^q_{\mathrm{CLS}}$, $h^q_{\mathrm{SEP}}$ and $h^a_{\mathrm{CLS}}$, $h^a_{\mathrm{SEP}}$ are the context feature vectors corresponding to [CLS] and [SEP] in the question and answer, respectively; $h^q_i \in \mathbb{R}^d$ and $h^a_i \in \mathbb{R}^d$ are the context feature vectors corresponding to the i-th word in the question and answer, respectively, where d is the dimension of the context feature vector; and $H_q$ and $H_a$ are the context feature matrices composed of all words in the question and the answer, respectively.

[0045] In one embodiment, step S3 above, calculating the unnormalized matching score between $H_q$ and $H_a$ of a question-answer pair and applying a Softmax operation to obtain a normalized probability distribution, specifically includes:

[0046] Step S31: To obtain fixed-size sentence-level context features for efficient retrieval, this embodiment of the invention applies average pooling to $H_q$ and $H_a$ to obtain the corresponding sentence-level context feature vectors $e_q$ and $e_a$:

[0047] $e_q = \mathrm{AvgPool}(H_q)$

[0048] $e_a = \mathrm{AvgPool}(H_a)$

[0049] where $e_q \in \mathbb{R}^d$ is the sentence-level context feature vector of the question and $e_a \in \mathbb{R}^d$ is the sentence-level context feature vector of the answer text;
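As an illustration of the aggregation in step S31, the following sketch averages the rows of a word-level feature matrix into one fixed-size vector. It uses plain Python lists and a hypothetical helper `mean_pool`; whether padding or special tokens are excluded from the average is not stated in the text, so the sketch simply averages every row it is given.

```python
def mean_pool(feature_matrix):
    """Average token-level feature vectors (rows) into one sentence-level vector."""
    n_tokens = len(feature_matrix)
    dim = len(feature_matrix[0])
    return [sum(row[k] for row in feature_matrix) / n_tokens for k in range(dim)]

# Toy word-level feature matrix: 3 tokens, d = 2.
h_q = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
e_q = mean_pool(h_q)
print(e_q)
```

The output dimension depends only on d, not on the text length, which is what makes the pooled vectors directly comparable across questions and answers.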

[0050] Step S32: Calculate the dot product between the question and answer vectors to measure the degree of matching between question q and answer a:

[0051] $\mathrm{sim}(e_q, e_a) = e_q \cdot e_a$

[0052] where $\mathrm{sim}(e_q, e_a)$ is the unnormalized matching score between the question-answer pair;

[0053] Step S33: Apply a softmax operation to $\mathrm{sim}(e_q, e_a)$ to obtain a normalized probability distribution:

[0054] $p(a \mid q) = \dfrac{\exp(\mathrm{sim}(e_q, e_a))}{\exp(\mathrm{sim}(e_q, e_a)) + \sum_{a^- \in \mathcal{N}_q} \exp(\mathrm{sim}(e_q, e_{a^-}))}$

[0055] where $\mathcal{N}_q$ is the set of negative examples, i.e., the set of all answers that do not match question q. These unmatched answers are called negative samples or negative examples.
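Steps S32 and S33 can be sketched as follows: dot-product scores between one question vector and a list of candidate answer vectors are turned into a probability distribution with a numerically stable softmax. The function names are hypothetical, and the candidate list is assumed to contain the matching answer together with its negatives.

```python
import math

def dot(u, v):
    """Unnormalized matching score between two sentence-level vectors."""
    return sum(x * y for x, y in zip(u, v))

def match_probabilities(e_q, candidate_answers):
    """Softmax over the dot-product scores of one question against all candidates."""
    scores = [dot(e_q, e_a) for e_a in candidate_answers]
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    z = sum(exps)
    return [x / z for x in exps]

e_q = [1.0, 0.0]
candidates = [[2.0, 0.0], [0.0, 2.0], [-1.0, 1.0]]  # scores: 2, 0, -1
probs = match_probabilities(e_q, candidates)
print(probs)
```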

[0056] In one embodiment, step S4 above, approximating the negative example set using an in-batch negative sampling strategy and calculating the loss function using cross-entropy based on the normalized probability distribution for backpropagation, specifically includes:

[0057] The in-batch negative sampling strategy leverages the randomness of batch construction: in each iteration, the random negative examples within the batch are used to approximate the full negative example set when computing the conditional probability and constructing the loss function:

[0058] $\mathcal{L} = -\dfrac{1}{B} \sum_{i=1}^{B} \log \dfrac{\exp(\mathrm{sim}(e_{q_i}, e_{a_i}))}{\sum_{j=1}^{B} \exp(\mathrm{sim}(e_{q_i}, e_{a_j}))}$

[0059] where B is the size of the training batch, and $e_{q_i}$ and $e_{a_j}$ are the sentence-level context feature vectors of the i-th question and the j-th answer within the batch, respectively.

[0060] In retrieval scenarios, the number of negative examples can reach thousands or more. Encoding all negative examples every time the conditional probability is calculated incurs unacceptable computational cost. In mini-batch gradient descent, a training batch consists of several question-answer pairs used to calculate the loss function and update the model via backpropagation. This invention employs an in-batch negative sampling strategy, leveraging the randomness of batch construction to approximate the conditional probability using the negative examples within each batch. This approach is both effective and efficient to train. By optimizing the loss function under this strategy, the dual-tower model pulls the vectors of matching question-answer pairs closer together and pushes those of unmatched pairs apart, thereby learning good feature vectors.
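A minimal sketch of the in-batch negative loss described above, assuming that answer i is the match for question i and that the other B-1 answers in the batch serve as its negatives. The function name `in_batch_loss` and the toy vectors are illustrative only; a real implementation would operate on tensors from the dual-tower encoder.

```python
import math

def in_batch_loss(question_vecs, answer_vecs):
    """Mean cross-entropy over a batch; row i of the B x B score matrix has its positive at column i."""
    batch_size = len(question_vecs)
    total = 0.0
    for i, e_q in enumerate(question_vecs):
        scores = [sum(x * y for x, y in zip(e_q, e_a)) for e_a in answer_vecs]
        shift = max(scores)
        log_z = shift + math.log(sum(math.exp(s - shift) for s in scores))
        total += log_z - scores[i]  # equals -log p(a_i | q_i)
    return total / batch_size

questions = [[3.0, 0.0], [0.0, 3.0]]
answers = [[3.0, 0.0], [0.0, 3.0]]
loss_aligned = in_batch_loss(questions, answers)        # each positive scores highest
loss_swapped = in_batch_loss(questions, answers[::-1])  # positives now score lowest
print(loss_aligned, loss_swapped)
```

Every answer is encoded once and reused as a negative for all other questions in the batch, which is why no extra forward passes are needed.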

[0061] Before each training round, the questions and answers in the training set are encoded in inference mode and the difficulty of negative examples is calculated. Steps S2 and S3 above are run in inference mode to obtain the conditional matching probabilities between question-answer pairs over the entire training set, in the form of p(a|q). To find truly hard negative examples, all negative examples are used here; since inference is performed only once per round and the computation graph does not need to be saved, this does not raise the computational efficiency problem encountered in training. A training batch containing hard negative examples is then constructed using the following steps.

[0062] In one embodiment, step S5 above, which calculates the symmetric and asymmetric difficulty between mismatched question-answer pairs based on the normalized probability distribution of the question-answer pairs, specifically includes:

[0063] The asymmetric difficulty between mismatched question-answer pairs q and a is defined as follows:

[0064] $\mathrm{diff}(i,j) = p(a_j \mid q_i)$

[0065] where diff(i,j) is the probability that, when selecting the correct answer for question $q_i$ from the candidate answers, $a_j$ is mistakenly judged to be the answer; the higher the value of diff(i,j), the more valuable answer $a_j$ is as a negative example for question $q_i$. diff(i,j) is an asymmetric difficulty, i.e.:

[0066] $\mathrm{diff}(i,j) \neq \mathrm{diff}(j,i)$

[0067] However, the negative-example relation is symmetric: in a training batch, if answer $a_j$ is a negative example for question $q_i$, then answer $a_i$ is also a negative example for question $q_j$;

[0068] Therefore, the difficulty between question-answer pairs is expressed in symmetric form by the following formula:

[0069] $\mathrm{diff}_{sym}(i,j) = \mathrm{diff}(i,j) + \mathrm{diff}(j,i)$

[0070] where $\mathrm{diff}_{sym}(i,j)$ is the symmetric difficulty of a mismatched question-answer pair.
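The two difficulty measures can be sketched from a matrix of inferred probabilities, where entry [i][j] holds p(a_j | q_i). The asymmetric part follows the definition diff(i,j) = p(a_j | q_i) directly. The symmetric combination below adds the two directions, which is an assumption made for this sketch; any symmetric combination of diff(i,j) and diff(j,i) would fit the same code.

```python
def asymmetric_difficulty(prob_matrix):
    """diff(i, j) = p(a_j | q_i) for i != j; matched pairs on the diagonal are set to 0."""
    n = len(prob_matrix)
    return [[prob_matrix[i][j] if i != j else 0.0 for j in range(n)] for i in range(n)]

def symmetric_difficulty(diff):
    """Assumption: combine both directions by summation so that the result is symmetric."""
    n = len(diff)
    return [[diff[i][j] + diff[j][i] for j in range(n)] for i in range(n)]

# Toy probabilities for 3 question-answer pairs (rows sum to 1).
probs = [
    [0.70, 0.20, 0.10],
    [0.30, 0.60, 0.10],
    [0.05, 0.05, 0.90],
]
diff = asymmetric_difficulty(probs)
diff_sym = symmetric_difficulty(diff)
print(diff[0][1], diff[1][0], diff_sym[0][1])
```

In this toy example pairs 0 and 1 confuse each other far more than either does with pair 2, so they are the natural candidates for the same group.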

[0071] In one embodiment, step S6 above: grouping the training set: First, initialize the group centers of n groups, then calculate the maximum payoff bijection between the ungrouped samples and the existing groups based on the symmetric difficulty of the mismatched question-answer pairs, group them, and maximize the group difficulty, where the group difficulty is the mean of the asymmetric difficulty of the mismatched question-answer pairs within the group; after multiple initializations and groupings, the group with the highest total difficulty is taken as the final grouping result, and random sampling is performed from each group to construct a training batch, specifically including:

[0072] Step S61: Define the difficulty objective of the grouping to be optimized: split the training set into n groups of equal size while maximizing the difficulty of each group.

[0073] The difficulty of a group G is defined as the mean of the asymmetric difficulty among mismatched question-answer pairs within the group:

[0074] $\mathrm{Diff}(G) = \dfrac{1}{|G|(|G|-1)} \sum_{i \in G} \sum_{j \in G,\, j \neq i} \mathrm{diff}(i,j)$

[0075] where G is a set of several training samples, and |G| is the size of the set G;

[0076] Based on the preset group size, the training set is split into multiple groups of the same size, and the sum of the difficulties of all groups is maximized. The optimization objective is defined as follows:

[0077] $\max_{G_1, \dots, G_n} \sum_{k=1}^{n} \mathrm{Diff}(G_k)$

[0078] $\text{s.t. } |G_k| = GC, \quad \bigcup_{k=1}^{n} G_k = D, \quad G_k \cap G_l = \emptyset \ (k \neq l)$

[0079] where GC is the group size, n = |D| / GC is the number of groups, and D is the set of samples in the training set;
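The group difficulty described in step S61 can be sketched as the mean of diff(i,j) over ordered mismatched pairs within a group. The helper name `group_difficulty` is hypothetical, and normalizing by the number of ordered pairs |G|(|G|-1) is this sketch's reading of "mean".

```python
def group_difficulty(group, diff):
    """Mean asymmetric difficulty over all ordered pairs (i, j), i != j, inside the group."""
    members = list(group)
    if len(members) < 2:
        return 0.0  # a single sample has no mismatched pair
    total = sum(diff[i][j] for i in members for j in members if i != j)
    return total / (len(members) * (len(members) - 1))

diff = [
    [0.00, 0.20, 0.10],
    [0.30, 0.00, 0.10],
    [0.05, 0.05, 0.00],
]
hard_group = group_difficulty([0, 1], diff)  # (0.20 + 0.30) / 2
easy_group = group_difficulty([0, 2], diff)  # (0.10 + 0.05) / 2
print(hard_group, easy_group)
```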

[0080] A greedy random algorithm is used as the initialization strategy to initialize n group centers. Specifically, a sample is first randomly selected as the first group center. Then, for each remaining sample, its maximum difficulty with respect to the existing group centers is calculated, and one of the k samples with the smallest such values is randomly selected as the new group center. This process is repeated n-1 times to obtain n group centers. Completely random initialization might split pairs of mutually hard negative examples into different groups; the greedy step in this invention therefore reduces some of the randomness and achieves better initialization results.
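The greedy random initialization can be sketched as below. The function name, the use of the symmetric difficulty matrix as input, and the `top_k` parameter name are assumptions for illustration; the text only says "difficulty" and "top-k".

```python
import random

def init_group_centers(diff_sym, n_groups, top_k, rng):
    """Pick n_groups centers: first at random, then among the top_k samples least hard w.r.t. existing centers."""
    n = len(diff_sym)
    centers = [rng.randrange(n)]
    while len(centers) < n_groups:
        remaining = [i for i in range(n) if i not in centers]
        # Hardest relation of each remaining sample to any existing center.
        hardest = {i: max(diff_sym[i][c] for c in centers) for i in remaining}
        # Keep the top_k samples with the smallest such value, then choose one at random.
        candidates = sorted(remaining, key=lambda i: hardest[i])[:top_k]
        centers.append(rng.choice(candidates))
    return centers

# Toy data: samples 0-2 are mutually hard, samples 3-5 are mutually hard.
diff_sym = [[0.0 if i == j else (0.9 if (i < 3) == (j < 3) else 0.1)
             for j in range(6)] for i in range(6)]
centers = init_group_centers(diff_sym, n_groups=2, top_k=1, rng=random.Random(0))
print(centers)
```

With `top_k=1` the choice after the first center is fully greedy, so the two centers land in different clusters; a larger `top_k` restores some randomness between restarts.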

[0081] Step S62: Define the gain of assigning a sample to a group: for a group G, the gain of adding a new sample to it is defined as the increase in the total difficulty value within the group, as shown in the following formula:

[0082] $\mathrm{Gain}(i, G) = \sum_{j \in G} \mathrm{diff}_{sym}(i, j)$

[0083] where Gain(i,G) is the difficulty gain generated by assigning the new sample i to group G;

[0084] Since the final groups in this embodiment of the invention are all of uniform size, the operation of taking the average value is omitted.

[0085] Step S63: Calculate the optimal bijective assignment between multiple samples and multiple groups using a linear assignment algorithm:

[0086] Let Gain: X × Y → ℝ be the payoff function, where X is the set of samples to be assigned and Y is the set of groups. The goal is to find the bijection f: X → Y that maximizes the total payoff, i.e.:

[0087] $f^{*} = \arg\max_{f} \sum_{x \in X} \mathrm{Gain}(x, f(x))$

[0088] At each allocation, n distinct ungrouped samples are randomly sampled, and the payoff of assigning each of the n ungrouped samples to each group is calculated to obtain the payoff matrix. Then, the Jonker-Volgenant algorithm is used to calculate the maximum payoff bijection between the n ungrouped samples and the existing groups, and the n ungrouped samples are added to the corresponding groups to complete the allocation;
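One allocation round can be sketched as follows. For brevity the maximum-payoff bijection is found by brute force over permutations, which is a stand-in for the Jonker-Volgenant algorithm and only practical for very small n; in practice a solver such as SciPy's `scipy.optimize.linear_sum_assignment` with `maximize=True` could fill that role. The gain is taken here as the sum of symmetric difficulties between the new sample and the current group members, which is this sketch's assumption.

```python
from itertools import permutations

def gain(sample, group, diff_sym):
    """Increase in total within-group difficulty if `sample` joins `group` (assumed form)."""
    return sum(diff_sym[sample][member] for member in group)

def assign_round(samples, groups, diff_sym):
    """Add exactly one sample to each group, using the bijection with maximal total gain."""
    payoff = [[gain(s, g, diff_sym) for g in groups] for s in samples]
    best = max(
        permutations(range(len(groups))),
        key=lambda perm: sum(payoff[row][perm[row]] for row in range(len(samples))),
    )
    for row, col in enumerate(best):
        groups[col].append(samples[row])
    return groups

diff_sym = [
    [0.0, 0.0, 0.1, 0.7],
    [0.0, 0.0, 0.8, 0.2],
    [0.1, 0.8, 0.0, 0.0],
    [0.7, 0.2, 0.0, 0.0],
]
groups = assign_round(samples=[2, 3], groups=[[0], [1]], diff_sym=diff_sym)
print(groups)
```

Because each round assigns one sample per group, all groups grow at the same rate and end with equal size.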

[0089] Step S64: Repeat S63 until all samples are assigned to their corresponding groups, resulting in a grouping result;

[0090] Step S65: Repeat steps S61 to S64. After multiple initializations and groupings, multiple grouping results are obtained; the grouping with the highest total difficulty is selected as the final grouping result.

[0091] Due to the influence of randomness, the above-mentioned initialization strategy and allocation algorithm are executed multiple times in the embodiments of the present invention. The group center and allocation result obtained each time are different. The total difficulty of all groups in each execution is calculated, and then the grouping result with the highest total difficulty is selected as the final grouping result.
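Selecting the best of several restarts can be sketched as below. The helper names are hypothetical, and each candidate grouping is assumed to be a list of index lists produced by one run of initialization plus assignment.

```python
def total_difficulty(grouping, diff):
    """Sum over groups of the mean asymmetric difficulty among mismatched pairs."""
    total = 0.0
    for group in grouping:
        pairs = [(i, j) for i in group for j in group if i != j]
        if pairs:
            total += sum(diff[i][j] for i, j in pairs) / len(pairs)
    return total

def select_best_grouping(candidate_groupings, diff):
    """Keep the grouping whose total difficulty is highest."""
    return max(candidate_groupings, key=lambda g: total_difficulty(g, diff))

diff = [
    [0.00, 0.40, 0.05, 0.05],
    [0.50, 0.00, 0.05, 0.05],
    [0.05, 0.05, 0.00, 0.30],
    [0.05, 0.05, 0.20, 0.00],
]
candidates = [
    [[0, 2], [1, 3]],  # total difficulty 0.05 + 0.05
    [[0, 1], [2, 3]],  # total difficulty 0.45 + 0.25
]
best = select_best_grouping(candidates, diff)
print(best)
```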

[0092] Finally, training batches are constructed by randomly sampling from several groups in the final grouping results. Traditional training processes construct batches by randomly sampling from the entire training set, resulting in random and simple negative examples within each batch. The method of this invention constructs within-group samples that are mutually difficult negative examples; therefore, sampling from these groups yields training batches with higher difficulty.

[0093] This invention presents a method for initializing group centers that combines greedy and random strategies. It leverages the heuristic nature of random strategies while mitigating the uncontrollability of complete randomness through greedy constraints. Furthermore, the grouping is completed iteratively by linear assignment, finding the optimal bijection at each round of sample allocation to maximize the grouping gain.

[0094] In addition, to enhance the stability of training and further increase the difficulty of negative examples within a batch, this embodiment of the invention adds adversarial perturbations to the question and answer vector embeddings.

[0095] In one embodiment, step S7: adding adversarial perturbations to the question and answer embedding vectors to enhance training stability and further increase the difficulty of in-batch negative examples, specifically including:

[0096] The goal of adversarial training is defined as follows:

[0097]

[0098]

[0099]

[0100]

[0101]

[0102] where D is the training set, θ denotes the model parameters, and y is a label indicating whether a question-answer pair matches. The perturbed question and answer feature vectors are obtained by encoding and aggregating the perturbed embedding vectors with the BERT model; the token embedding vectors of question q and answer a are obtained by multiplying the token_id with the token embedding layer of the BERT model; and the perturbation vector lies within the space Ω;

[0103] To maximize the loss, the fast gradient sign method is used to add adversarial perturbations through gradient ascent, as defined below:

[0104]

[0105] where ε is the maximum norm of the perturbation; to prevent $\delta_q$ from becoming too large, it is normalized using the following formula:

[0106]

[0107] where sign is the sign function and the gradient term denotes the derivative of the loss with respect to the embedding of question q;

[0108] Similarly, $\delta_a$ is defined as:

[0109]
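The idea of a sign-of-gradient perturbation can be illustrated with a dependency-free sketch. Several simplifications are assumptions of this example and not the embodiment: the perturbation is applied directly to a sentence-level question vector rather than to BERT token embeddings, the loss is the softmax cross-entropy over dot-product scores with an analytic gradient, and the normalization step of the text is not modeled. The standard fast gradient sign form δ = ε · sign(∇L) is used.

```python
import math

def sign(x):
    """Sign function returning -1, 0, or 1."""
    return (x > 0) - (x < 0)

def loss_and_grad(e_q, answers, pos):
    """Cross-entropy of the positive answer and its analytic gradient w.r.t. e_q."""
    scores = [sum(x * y for x, y in zip(e_q, a)) for a in answers]
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    z = sum(exps)
    probs = [x / z for x in exps]
    loss = -math.log(probs[pos])
    # d(loss)/d(e_q) = sum_j p_j * a_j - a_pos
    grad = [sum(p * a[k] for p, a in zip(probs, answers)) - answers[pos][k]
            for k in range(len(e_q))]
    return loss, grad

def fgsm_perturbation(grad, epsilon):
    """Gradient-ascent step of size epsilon along the sign of the gradient."""
    return [epsilon * sign(g) for g in grad]

e_q = [1.0, 0.5]
answers = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # index 0 is the matching answer
clean_loss, grad = loss_and_grad(e_q, answers, pos=0)
delta_q = fgsm_perturbation(grad, epsilon=0.1)
adv_loss, _ = loss_and_grad([x + d for x, d in zip(e_q, delta_q)], answers, pos=0)
print(clean_loss, adv_loss)
```

The perturbed loss is larger than the clean loss, which is the sense in which the perturbation makes the in-batch negatives harder while staying within an ε-bounded neighborhood.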

[0110] Define the final loss function. The formula is as follows:

[0111]

[0112] Figure 2 shows a schematic diagram of the dual-tower model.

[0113] The proposed method for enhancing retrieval and question answering performance based on difficult negative examples is named Adversarial-Iterative Linear Assignment based Grouping (A-ILAG). By combining it with different text encoders such as BERT and BioBERT, the method of this invention can be named A-ILAG-BERT and A-ILAG-BioBERT, respectively.

[0114] The method proposed in this invention was tested on the well-known biomedical datasets BioASQ 6b, 7b, 8b, and 9b, as well as the retrieval question answering (ReQA) version of the well-known general-domain question answering dataset Stanford Question Answering Dataset (SQuAD). The results were compared with top methods on the corresponding datasets. Ten experiments were conducted, and the mean and standard deviation were reported.

[0115] This invention uses three popular retrieval metrics—MRR, P@K, and R@K—to evaluate the results, following the conventions of previous ReQA work. MRR (Mean Reciprocal Rank) measures the overall performance of the ranking results, and the formula is as follows.

[0116] $\mathrm{MRR} = \dfrac{1}{|Q|} \sum_{i=1}^{|Q|} \dfrac{1}{\mathrm{rank}_i}$

[0117] where |Q| is the number of questions in the question set, and $\mathrm{rank}_i$ is the rank of the first correct answer among the prediction results for the i-th question.
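As a small worked example of MRR, the sketch below takes the 1-based rank of the first correct answer for each question and averages the reciprocals. The function name and the toy ranks are illustrative.

```python
def mean_reciprocal_rank(first_correct_ranks):
    """Average of 1 / rank_i, where rank_i is the 1-based rank of the first correct answer."""
    return sum(1.0 / r for r in first_correct_ranks) / len(first_correct_ranks)

# Three questions whose first correct answers are ranked 1st, 2nd, and 4th.
mrr = mean_reciprocal_rank([1, 2, 4])
print(mrr)  # (1 + 0.5 + 0.25) / 3
```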

[0118] P@K (Precision At Top K) measures the model's ability to predict the correct answer. It represents the proportion of correct answers among the top-k ranked results, and the formula is as follows:

[0119] $P@K = \dfrac{1}{|Q|} \sum_{i=1}^{|Q|} \dfrac{|A_i^{1:K} \cap A_i^{*}|}{K}$

[0120] where $A_i$ and $A_i^{*}$ are the predicted answer ranking and the set of correct answers for the i-th question, respectively, and $A_i^{1:K}$ denotes the top K entries of $A_i$. Considering that the main purpose of ReQA is to find the correct answer for each question, this embodiment of the invention reports P@1.

[0121] R@K (Recall At Top K) measures the model's ability to recall correct answers. It represents the proportion of correct answers recalled in the top-k ranked results out of the total number of correct answers, as shown in the formula below:

[0122] $R@K = \dfrac{1}{|Q|} \sum_{i=1}^{|Q|} \dfrac{|A_i^{1:K} \cap A_i^{*}|}{|A_i^{*}|}$

[0123] Consistent with previous ReQA work, this embodiment of the invention reports R@5.
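P@K and R@K can be sketched in the same spirit. The function names, the answer identifiers, and the toy rankings are hypothetical; each ranking is assumed to be ordered from best to worst prediction.

```python
def precision_at_k(ranked, relevant, k):
    """Share of the top-k predictions that are correct answers."""
    hits = sum(1 for answer in ranked[:k] if answer in relevant)
    return hits / k

def recall_at_k(ranked, relevant, k):
    """Share of all correct answers that appear in the top-k predictions."""
    hits = sum(1 for answer in ranked[:k] if answer in relevant)
    return hits / len(relevant)

def mean_over_questions(metric, rankings, gold_sets, k):
    """Average a per-question metric over the whole question set."""
    return sum(metric(r, g, k) for r, g in zip(rankings, gold_sets)) / len(rankings)

rankings = [
    ["a3", "a1", "a7", "a2", "a9"],
    ["a4", "a8", "a5", "a6", "a0"],
]
gold_sets = [{"a1", "a2"}, {"a4"}]

p_at_1 = mean_over_questions(precision_at_k, rankings, gold_sets, 1)
r_at_5 = mean_over_questions(recall_at_k, rankings, gold_sets, 5)
print(p_at_1, r_at_5)
```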

[0124] Tables 1, 2, 3, and 4 below show the experimental results on ReQA BioASQ 6b, 7b, 8b, and 9b, respectively. Table 5 shows the experimental results on the ReQA SQuAD dataset. It can be seen that A-ILAG-BioBERT significantly improves the three metrics MRR, P@1, and R@5 compared to the base models Dual-BioBERT and Dual-BERT, and surpasses previous methods based on interaction enhancement and transfer learning, such as Cross-VAE, ENDX, and RBAR, achieving the best results on all five datasets.

[0125] Table 1. Model performance on ReQA BioASQ 6b

[0126]

[0127] Table 2. Model performance on ReQA BioASQ 7b

[0128]

[0129]

[0130] Table 3. Model performance on ReQA BioASQ 8b

[0131]

[0132] Table 4. Model performance on ReQA BioASQ 9b

[0133]

[0134] Table 5. Model performance on ReQA SQuAD

[0135]

[0136] This invention discloses a method for enhancing retrieval and question answering performance based on difficult negative examples. It uses an iterative linear allocation algorithm to divide the training set into several difficult groups, and then constructs difficult training batches based on these groups. This allows the model to reuse the encoding results of difficult negative examples within the batch, benefiting from the training of difficult negative examples within the batch without increasing the forward training cost. This maintains the high efficiency of training and enhances the final retrieval and question answering performance.

[0137] In low-resource scenarios such as biomedicine, data annotation can only be done by domain experts, which requires significant human resources. In such scenarios, model performance is limited by dataset size, the model is prone to overfitting, and the number of available hard negative examples is also limited. This invention proposes adversarial training using the fast gradient sign method. By applying small perturbations to the token embedding vectors of questions and answers, the model is induced to make errors. This increases the difficulty of in-batch negative examples through gradient ascent and enhances the model's robustness, avoiding overfitting.

[0138] Example 2

[0139] As shown in Figure 3, this embodiment of the invention provides a system for enhancing retrieval question answering performance based on hard negative examples, comprising the following modules:

[0140] A preprocessing module 81, used to preprocess the question and answer texts to obtain the input text sequences X_q and X_a of the question and the answer, respectively;

[0141] A sentence-level context feature matrix construction module 82, used to input X_q and X_a into a dual-tower model and output the sentence-level context feature matrices H_q and H_a of X_q and X_a;

[0142] A question-answer pair normalized probability distribution calculation module 83, used to calculate the unnormalized matching scores between the question-answer pairs H_q and H_a and then apply a Softmax operation to obtain a normalized probability distribution;

[0143] A negative log-likelihood loss construction module 84, used to approximate the negative example set with an in-batch negative sampling strategy and, based on the normalized probability distribution, to calculate the cross-entropy loss function used for backpropagation;

[0144] A question-answer pair difficulty calculation module 85, used to calculate the symmetric and asymmetric difficulty between mismatched question-answer pairs based on the normalized probability distribution of the question-answer pairs;

[0145] A training batch construction module 86, used to group the training set. First, the group centers of n groups are initialized. Then, based on the symmetric difficulty of mismatched question-answer pairs, the maximum-payoff bijection between ungrouped samples and the existing groups is calculated, and the samples are assigned so as to maximize the difficulty of each group, where the group difficulty is the mean of the asymmetric difficulty of mismatched question-answer pairs within the group. After multiple initializations and groupings, the grouping with the highest total difficulty is taken as the final grouping result, and training batches are constructed by randomly sampling from each group;

[0146] A final loss function construction module 87, used to add adversarial perturbations to the question and answer embedding vectors, thereby enhancing the stability of training and further increasing the difficulty of in-batch negative examples.
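The computations performed by modules 83 to 85 can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed implementation: sentence-level features are taken to be plain vectors scored by dot product, the asymmetric difficulty is read as the in-batch probability of wrongly selecting a mismatched answer, and the symmetric difficulty is assumed to be the average of the two directions, since the exact symmetrization formula is not reproduced in this text. All function names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    z = sum(exps)
    return [e / z for e in exps]


def in_batch_probs(h_q, h_a):
    """probs[i][j]: probability that question i selects answer j within the batch."""
    scores = [[sum(x * y for x, y in zip(q, a)) for a in h_a] for q in h_q]
    return [softmax(row) for row in scores]


def nll_loss(probs):
    """Mean negative log-likelihood; answer i is the matching answer of question i."""
    return -sum(math.log(probs[i][i]) for i in range(len(probs))) / len(probs)


def asymmetric_difficulty(probs, i, j):
    """Probability that question i wrongly selects the mismatched answer j (i != j)."""
    return probs[i][j]


def symmetric_difficulty(probs, i, j):
    # Assumption: average of both directions; the source formula may differ.
    return 0.5 * (probs[i][j] + probs[j][i])


h_q = [[1.0, 0.0], [0.0, 1.0]]  # toy sentence-level question vectors
h_a = [[0.9, 0.1], [0.2, 0.8]]  # toy sentence-level answer vectors
probs = in_batch_probs(h_q, h_a)
print(nll_loss(probs), symmetric_difficulty(probs, 0, 1))
```

Every off-diagonal entry of `probs` comes from the same forward pass that produces the loss, which is why using in-batch negatives adds no extra encoding cost.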

[0147] The above embodiments are provided merely for the purpose of describing the present invention and are not intended to limit the scope of the invention. The scope of the invention is defined by the appended claims. Various equivalent substitutions and modifications made without departing from the spirit and principles of the invention should be covered within the scope of the invention.

Claims

1. A method for enhancing retrieval question answering performance based on hard negative examples, characterized in that it includes:
Step S1: Preprocess the question and answer texts to obtain the input text sequences X_q and X_a of the question and the answer, respectively;
Step S2: Input X_q and X_a into a dual-tower model and output the sentence-level context feature matrices H_q and H_a of X_q and X_a;
Step S3: Calculate the unnormalized matching scores between the question-answer pairs H_q and H_a, then apply a Softmax operation to obtain a normalized probability distribution;
Step S4: Approximate the negative example set using an in-batch negative sampling strategy and, based on the normalized probability distribution, calculate the cross-entropy loss function used for backpropagation;
Step S5: Based on the normalized probability distribution of the question-answer pairs, calculate the symmetric and asymmetric difficulty between mismatched question-answer pairs;
Step S6: Divide the training set into groups: first initialize n group centers; then, based on the symmetric difficulty of mismatched question-answer pairs, calculate the maximum-payoff bijection between ungrouped samples and the existing groups and assign the samples so as to maximize the difficulty of each group, where the group difficulty is the mean of the asymmetric difficulty of mismatched question-answer pairs within the group; after multiple initializations and groupings, take the grouping with the highest total difficulty as the final grouping result, and construct training batches by randomly sampling from each group. Specifically, this includes:
Step S61: Define the group difficulty objective to be optimized: split the training set into n groups of equal size while maximizing the difficulty of each group. The difficulty of a group is defined as the mean of the asymmetric difficulty among mismatched question-answer pairs within the group, where a group is a set consisting of several training samples and the mean is taken with respect to the group size. Based on a preset group size, the training set is split into multiple groups of equal size and the sum of the group difficulties is maximized; the optimization objective is defined in terms of the group capacity, the number of groups, and the training set. Using a greedy random algorithm as the initialization strategy, initialize the center of each group;
Step S62: Define the payoff of assigning a sample to a group: when a new sample is added to a group, the payoff is defined as the resulting increase in the total difficulty within the group, that is, the difficulty payoff obtained by assigning the new sample to that group;
Step S63: Calculate the optimal bijective assignment between multiple samples and multiple groups using a linear assignment algorithm: given the payoff function, find the bijection that maximizes the total payoff. In each assignment round, randomly sample n distinct ungrouped samples and calculate the payoff of assigning each of them to each group to obtain the payoff matrix; then use the Jonker-Volgenant algorithm to calculate the maximum-payoff bijection between the n ungrouped samples and the existing groups, and add the n ungrouped samples to their corresponding groups;
Step S64: Repeat S63 until all samples are assigned to their corresponding groups, yielding one grouping result;
Step S65: Repeat steps S61 to S64. After multiple initializations and groupings, multiple grouping results are obtained, and the grouping with the highest total difficulty is selected as the final grouping result;
Step S7: Add adversarial perturbations to the question and answer embedding vectors to enhance training stability and further increase the difficulty of in-batch negative examples. Specifically, this includes: the objective of adversarial training is defined over the training set, the model parameters, a label indicating whether a question-answer pair matches, and the question and answer feature vectors obtained by encoding and aggregating the perturbed embedding vectors with the BERT model. The token embedding vectors of a question and an answer are obtained by multiplying token_id with the token embedding layer of the BERT model, and each perturbation vector lies within a bounded space. To maximize the loss with respect to the perturbation, the fast gradient sign method is used to add adversarial perturbations through gradient ascent, where the perturbation is bounded by a maximum norm and, to keep it from becoming too large, it is normalized using the sign function applied to the derivative of the loss with respect to q; the perturbation for the answer is defined similarly. The final loss function is then defined.

2. The method for enhancing retrieval question answering performance based on hard negative examples according to claim 1, characterized in that Step S1, preprocessing the question and answer texts to obtain the input text sequences X_q and X_a of the question and the answer, respectively, specifically includes:
Step S11: Convert the input question and candidate answer texts to Unicode encoding and perform word segmentation on them to obtain the word segmentation results of a single question and a single answer, each result being the sequence of individual words of the question or the answer, with the sequence lengths equal to the text lengths of the question and the answer, respectively;
Step S12: Concatenate the word segmentation results of the question and the answer with the special characters to obtain the input text sequence X_q of the question and the input text sequence X_a of the answer.

3. The method for enhancing retrieval question answering performance based on hard negative examples according to claim 2, characterized in that Step S2, inputting X_q and X_a into a dual-tower model and outputting the sentence-level context feature matrices H_q and H_a of X_q and X_a, specifically includes:
Step S21: Convert the word sequences X_q and X_a into token_id sequences and, based on the length of each token_id sequence, assign position_id values of the corresponding length;
Step S22: Construct a dual-tower model with shared parameters based on BERT, and input the token_id sequences and position_id values into the dual-tower model to extract the word-level context feature matrices of the question and the answer, where each matrix is composed of the context feature vectors corresponding to all words of the question or of the answer, and all context feature vectors share the same dimension.

4. The method for enhancing retrieval question answering performance based on hard negative examples according to claim 3, characterized in that Step S3, calculating the unnormalized matching scores between the question-answer pairs H_q and H_a and then applying a Softmax operation to obtain a normalized probability distribution, specifically includes:
Step S31: Perform an aggregation operation on the context feature matrices of the question and the answer to obtain the corresponding sentence-level context feature vector of the question and sentence-level context feature vector of the answer text;
Step S32: Calculate the dot product between the vectors of a question-answer pair to measure the degree of matching between the question and the answer, which gives the unnormalized matching score of the question-answer pair;
Step S33: Apply a softmax operation to the unnormalized matching scores to obtain a normalized probability distribution, where the negative example set is the set of all answers that do not match the question.

5. The method for enhancing retrieval question answering performance based on hard negative examples according to claim 4, characterized in that Step S4, approximating the negative example set using an in-batch negative sampling strategy and, based on the normalized probability distribution, calculating the cross-entropy loss function used for backpropagation, specifically includes: the in-batch negative sampling strategy exploits the randomness of batch construction, using the negative examples within the batch in each iteration to approximate the conditional probability and construct the loss function, so that the random in-batch negative examples approximate the negative example set. The loss is computed over the training batch, whose size is the number of question-answer pairs it contains, using the sentence-level context feature vectors of each question and each answer in the batch.

6. The method for enhancing retrieval question answering performance based on hard negative examples according to claim 5, characterized in that Step S5, calculating the symmetric and asymmetric difficulty between mismatched question-answer pairs based on the normalized probability distribution of the question-answer pairs, specifically includes: the asymmetric difficulty of a mismatched question-answer pair is defined as the probability that, when the correct answer to the question is selected from the candidate answers, the mismatched answer is incorrectly chosen; a higher value indicates that the answer is a more valuable negative example for the question. This difficulty is asymmetric, that is, its value generally changes when the roles of the two question-answer pairs are exchanged. However, the negative-example relationship itself is symmetric: in a training batch, if the answer of one question is a negative example for a second question, then the answer of the second question is also a negative example for the first. Therefore, the difficulty of a mismatched question-answer pair is symmetrized to give the symmetric difficulty of the pair.

7. A system for enhancing retrieval question answering performance based on hard negative examples, characterized in that it includes the following modules:
A preprocessing module, used to preprocess the question and answer texts to obtain the input text sequences X_q and X_a of the question and the answer, respectively;
A sentence-level context feature matrix construction module, used to input X_q and X_a into a dual-tower model and output the sentence-level context feature matrices H_q and H_a of X_q and X_a;
A question-answer pair normalized probability distribution calculation module, used to calculate the unnormalized matching scores between the question-answer pairs H_q and H_a and then apply a Softmax operation to obtain a normalized probability distribution;
A negative log-likelihood loss construction module, used to approximate the negative example set with an in-batch negative sampling strategy and, based on the normalized probability distribution, to calculate the cross-entropy loss function used for backpropagation;
A question-answer pair difficulty calculation module, used to calculate the symmetric and asymmetric difficulty between mismatched question-answer pairs based on the normalized probability distribution of the question-answer pairs;
A training batch construction module, used to group the training set: first initialize n group centers; then, based on the symmetric difficulty of mismatched question-answer pairs, calculate the maximum-payoff bijection between ungrouped samples and the existing groups and assign the samples so as to maximize the difficulty of each group, where the group difficulty is the mean of the asymmetric difficulty of mismatched question-answer pairs within the group; after multiple initializations and groupings, take the grouping with the highest total difficulty as the final grouping result, and construct training batches by randomly sampling from each group. Specifically, this includes:
Step S61: Define the group difficulty objective to be optimized: split the training set into n groups of equal size while maximizing the difficulty of each group. The difficulty of a group is defined as the mean of the asymmetric difficulty among mismatched question-answer pairs within the group, where a group is a set consisting of several training samples and the mean is taken with respect to the group size. Based on a preset group size, the training set is split into multiple groups of equal size and the sum of the group difficulties is maximized; the optimization objective is defined in terms of the group capacity, the number of groups, and the training set. Using a greedy random algorithm as the initialization strategy, initialize the center of each group;
Step S62: Define the payoff of assigning a sample to a group: when a new sample is added to a group, the payoff is defined as the resulting increase in the total difficulty within the group, that is, the difficulty payoff obtained by assigning the new sample to that group;
Step S63: Calculate the optimal bijective assignment between multiple samples and multiple groups using a linear assignment algorithm: given the payoff function, find the bijection that maximizes the total payoff. In each assignment round, randomly sample n distinct ungrouped samples and calculate the payoff of assigning each of them to each group to obtain the payoff matrix; then use the Jonker-Volgenant algorithm to calculate the maximum-payoff bijection between the n ungrouped samples and the existing groups, and add the n ungrouped samples to their corresponding groups;
Step S64: Repeat S63 until all samples are assigned to their corresponding groups, yielding one grouping result;
Step S65: Repeat steps S61 to S64. After multiple initializations and groupings, multiple grouping results are obtained, and the grouping with the highest total difficulty is selected as the final grouping result;
A final loss function construction module, used to add adversarial perturbations to the question and answer embedding vectors to enhance training stability and further increase the difficulty of in-batch negative examples. Specifically, this includes: the objective of adversarial training is defined over the training set, the model parameters, a label indicating whether a question-answer pair matches, and the question and answer feature vectors obtained by encoding and aggregating the perturbed embedding vectors with the BERT model. The token embedding vectors of a question and an answer are obtained by multiplying token_id with the token embedding layer of the BERT model, and each perturbation vector lies within a bounded space. To maximize the loss with respect to the perturbation, the fast gradient sign method is used to add adversarial perturbations through gradient ascent, where the perturbation is bounded by a maximum norm and, to keep it from becoming too large, it is normalized using the sign function applied to the derivative of the loss with respect to q; the perturbation for the answer is defined similarly. The final loss function is then defined.

Citation Information

Patent Citations

  • Neural network model training method, device and equipment and readable storage medium

    CN111368989A

  • Table retrieval method based on deep learning

    CN113743539A