Question and answer pair matching model training method, question and answer pair matching method and device

CN122309635A · Pending · Publication Date: 2026-06-30 · CHINA PETROLEUM & CHEMICAL CORP +1


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: CHINA PETROLEUM & CHEMICAL CORP
Filing Date: 2024-12-27
Publication Date: 2026-06-30


IPC: G06F16/3329; G06F18/214; G06N3/045; G06N3/08; G06N3/0442

AI Tagging

Technology Topics

Questions and answers · Model parameters




AI Technical Summary

Technical Problem

Traditional question-answering matching methods struggle to deeply understand the semantic structure and logical order of questions, resulting in inaccurate extraction of key features and failing to meet the precise question-answering needs of scenarios such as intelligent customer service and knowledge question-answering systems.

Method used

A RoBERTa-wwm model (a robustly optimized BERT variant pre-trained with whole-word masking) encodes the question-answer pair training data. The encoder is combined with a feature enhancement layer and a cross-attention layer: an ordered-neurons long short-term memory network (On-LSTM) enhances the question features, and a skip long short-term memory network (Leap-LSTM) enhances the answer features. Cross-attention between the question and the answer is then calculated, and the model parameters are adjusted using a target loss function.

Benefits of technology

It improves the accuracy of the question-answering pair matching model in predicting the matching degree between questions and answers, enhances the model's ability to handle complex data structures, and improves the response efficiency and accuracy of the intelligent question-answering system.

✦ Generated by Eureka AI based on patent content.


Patent Text Reader

Abstract

This disclosure relates to the field of artificial intelligence technology, and particularly to a question-answer pair matching model training method, a question-answer pair matching method, and an apparatus. The method includes: acquiring question-answer pair training data, wherein the question-answer pair training data includes question training data and answer training data; inputting the question-answer pair training data into a question-answer pair matching model, predicting the matching degree between the question training data and the answer training data, and outputting a predicted matching probability, wherein the predicted matching probability represents the similarity between the question training data and the answer training data; calculating a target loss function based on the predicted matching probability; and adjusting the model parameters of the question-answer pair matching model based on the target loss function to obtain a target question-answer pair matching model. This method can improve the accuracy of the answer data output by the model for question data.


Description

Technical Field

[0001] This disclosure relates to the field of artificial intelligence technology, and in particular to a question-answer pair matching model training method, question-answer pair matching method and apparatus.

Background Technology

[0002] Question-answer pair matching refers to determining the degree of match between a given question and a set of candidate answers. It is widely used in intelligent customer service systems, search engines, intelligent assistants, online education platforms, social media and online forums, medical consultations, and other fields. This process typically utilizes machine learning and natural language processing techniques to improve the relevance and accuracy between questions and answers.

[0003] Traditional methods primarily rely on simple keyword matching or lexical similarity calculations, which struggle to deeply understand the semantic structure and logical order of questions, leading to inaccurate key feature extraction. Furthermore, their limited processing capabilities when dealing with complex data structures prevent them from meeting the demands for precise question answers in scenarios such as intelligent customer service and knowledge-based question-answering systems. Therefore, improving the accuracy of the model's output answers for question data has become a pressing issue.

Summary of the Invention

[0004] This disclosure provides a question-answer pair matching model training method, question-answer pair matching method and apparatus, which can solve the problem of how to improve the accuracy of the answer data output by the model for question data.

[0005] In a first aspect, this disclosure provides a method for training a question-answer pair matching model, including:

[0006] Obtaining question-answer pair training data, which includes question training data and answer training data;

[0007] Inputting the question-answer pair training data into the question-answer pair matching model, predicting the matching degree between the question training data and the answer training data, and outputting a predicted matching probability, where the predicted matching probability represents the similarity between the question training data and the answer training data;

[0008] Calculating a target loss function based on the predicted matching probability;

[0009] Adjusting the model parameters of the question-answer pair matching model according to the target loss function to obtain a target question-answer pair matching model.

[0010] In some embodiments, the question-answer pair matching model includes an encoding layer, and inputting the question-answer pair training data into the question-answer pair matching model to predict the matching degree between the question training data and the answer training data includes:

[0011] In the encoding layer, encoding the question-answer pair training data with a RoBERTa-wwm model (a robustly optimized BERT variant pre-trained with whole-word masking) to obtain encoded question data and encoded answer data.

[0012] In some embodiments, the question-answering pair matching model further includes a feature enhancement layer, into which encoded question data and encoded answer data are input;

[0013] Inputting the question-answer pair training data into the question-answer pair matching model to predict the matching degree between the question training data and the answer data also includes:

[0014] In the feature enhancement layer, feature enhancement is performed on the encoded question data and the encoded answer data to obtain enhanced question data and enhanced answer data.

[0015] In some embodiments, feature enhancement is performed on the encoded question data and the encoded answer data, including:

[0016] Feature enhancement is performed on the encoded question data using an ordered-neurons long short-term memory network (On-LSTM) to obtain enhanced question data. The On-LSTM network includes a preset control signal, which guides the activation order of neurons in the network. The preset control signal is correlated with the syntactic and semantic relationships of the encoded question data.

[0017] Feature enhancement is performed on the encoded answer data using a skip long short-term memory network (Leap-LSTM) to obtain enhanced answer data. The skip long short-term memory network includes a skip matrix, which is correlated with the semantic relationships and syntactic structure of the encoded answer data.

[0018] In some embodiments, the question-answer pair matching model further includes a cross-attention mechanism layer, into which the enhanced question data and the enhanced answer data are input; inputting the question-answer pair training data into the question-answer pair matching model to predict the matching degree between the question training data and the answer training data further includes:

[0019] The cross-attention mechanism is used to calculate the cross-attention of the enhanced question data over the enhanced answer data, obtaining the question-answer cross-attention.

[0020] The cross-attention mechanism is used to calculate the cross-attention of the enhanced answer data over the enhanced question data, obtaining the answer-question cross-attention.

[0021] The target attention is obtained by concatenating the question-answer cross-attention and the answer-question cross-attention.

[0022] The matching degree between the question training data and the answer training data is predicted based on the target attention.
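As a rough illustration of the bidirectional cross-attention steps above, the following Python sketch lets each question vector attend over the answer vectors, does the same in the opposite direction, and concatenates the two results. The scaled dot-product form, the absence of learned projection matrices, the mean pooling applied before concatenation, and all function names are assumptions made for this example; the disclosure does not specify them at this point.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Each query vector attends over all rows of keys_values.

    Assumption: plain scaled dot-product attention with no learned
    projections; keys and values are the same vectors.
    """
    dim = len(queries[0])
    attended = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys_values
        ]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, keys_values)) for j in range(dim)]
        )
    return attended


def mean_pool(rows):
    """Average a list of equal-length vectors into one vector."""
    return [sum(r[j] for r in rows) / len(rows) for j in range(len(rows[0]))]


def target_attention(question, answer):
    """Concatenate pooled question-answer and answer-question attention."""
    q2a = cross_attention(question, answer)  # question attends to answer
    a2q = cross_attention(answer, question)  # answer attends to question
    # Assumption: pooling makes the two directions the same length
    # so that they can be concatenated into one feature vector.
    return mean_pool(q2a) + mean_pool(a2q)


# Toy "enhanced" representations: 3 question positions, 2 answer positions.
question = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
answer = [[0.5, 0.5], [1.0, 0.0]]
features = target_attention(question, answer)
```

In this sketch `features` has twice the hidden dimension, and a downstream classifier (not shown) would map it to a matching probability.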

[0023] In a second aspect, this disclosure provides a question-answer pair matching method, including:

[0024] Obtaining question data;

[0025] Inputting the question data into the target question-answer pair matching model for processing and outputting the answer data corresponding to the question data, where the target question-answer pair matching model is trained by any of the methods in the first aspect.

[0026] In a third aspect, this disclosure provides a training apparatus for a question-answer pair matching model, comprising:

[0027] The acquisition unit is used to acquire question-answer pair training data, which includes question training data and answer training data.

[0028] The prediction unit is used to input question-answer pair training data into the question-answer pair matching model, predict the matching degree between the question training data and the answer data, and output the predicted matching probability. The predicted matching probability is used to represent the similarity between the question training data and the answer data.

[0029] The calculation unit is used to calculate the target loss function based on the predicted matching probability;

[0030] The adjustment unit is used to adjust the model parameters of the question-answer pair matching model according to the target loss function to obtain the target question-answer pair matching model.

[0031] In a fourth aspect, this disclosure provides a question-answer pair matching device, comprising:

[0032] The acquisition unit is used to acquire question data;

[0033] The processing unit is used to input the question data into the target question-answer pair matching model for processing and output the answer data corresponding to the question data. The target question-answer pair matching model is trained by any of the methods in the first aspect.

[0034] In a fifth aspect, this disclosure provides a computer device including a memory, a processor, and a computer program stored in the memory, the processor executing the computer program to implement the steps of the method described in the preceding aspects.

[0035] In a sixth aspect, this disclosure provides a computer-readable storage medium having a computer program stored thereon that, when executed by a processor, implements the steps of the method described in the above aspects.

[0036] In a seventh aspect, this disclosure provides a computer program product, including a computer program / instructions that, when executed by a processor, implement the steps of the methods described in the preceding aspects.

[0037] This disclosure provides a question-and-answer pair matching model training method, matching method, and apparatus. By using a question-and-answer pair training dataset, the model can learn rich linguistic features and contextual information. During training, by inputting questions and answers into the model and calculating their matching degree, the model can understand which features can represent the similarity between questions and answers. The model improves its prediction accuracy by learning which features have the greatest impact on the matching degree. By calculating the target loss function, the model can measure the gap between its prediction results and the true labels, thereby guiding the model's learning. The model parameters are adjusted according to the target loss function: this periodic feedback adjustment mechanism ensures that the model can gradually optimize its parameters in each iteration to minimize the loss, thereby enhancing the model's adaptability to input. As training continues, the model parameters will tend to the optimal value, ensuring that the model can accurately predict answers in real-world scenarios.

Attached Figure Description

[0038] The present disclosure will be described in more detail below based on embodiments and with reference to the accompanying drawings:

[0039] Figure 1 is a flowchart illustrating a question-answer pair matching model training method provided in an embodiment of this disclosure;

[0040] Figure 2 is a flowchart illustrating a question-answer pair matching method provided in an embodiment of this disclosure;

[0041] Figure 3 is a flowchart of a question-answer pair data processing method provided in an embodiment of this disclosure;

[0042] Figure 4 is a schematic diagram of the structure of a question-answer pair matching model training device provided in an embodiment of this disclosure;

[0043] Figure 5 is a schematic diagram of the structure of a question-answer pair matching device provided in an embodiment of this disclosure;

[0044] Figure 6 is a schematic diagram of the structure of a computer device provided in an embodiment of this disclosure.

[0045] In the accompanying drawings, the same parts are referred to by the same reference numerals, and the drawings are not drawn to scale.

Detailed Implementation

[0046] To enable those skilled in the art to better understand the technical solutions of this disclosure, and to fully understand and implement the process of how this disclosure applies technical means to solve technical problems and achieve corresponding technical effects, the technical solutions in the embodiments of this disclosure will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this disclosure, not all embodiments. The embodiments of this disclosure and the various features within them can be combined with each other without conflict, and the resulting technical solutions are all within the protection scope of this disclosure. All other embodiments obtained by those skilled in the art based on the embodiments of this disclosure without creative effort should fall within the protection scope of this disclosure.

[0047] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this disclosure are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this disclosure described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.

[0048] It should be noted that the steps shown in the flowchart in the accompanying drawings can be executed in a computer system such as a set of computer-executable instructions, and although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in a different order than that shown here.

[0049] Example 1

[0050] Figure 1 is a flowchart illustrating a question-answer pair matching model training method provided in an embodiment of this disclosure. As shown in Figure 1, the method includes the following steps S101 to S104.

[0051] S101. Obtain question-answer pair training data.

[0052] The question-answer pair training data includes question training data and answer training data.

[0053] A question-answer pair can be understood as a set of paired data containing questions and corresponding answers. Each question-answer pair contains a question (such as "What is machine learning?") and one or more answers (such as "Machine learning is a subset of artificial intelligence..."). Question-answer pairs can include question and answer pairs from different categories and contexts.

[0054] Training data refers to the dataset used to train a machine learning model. Obtaining question-answer pair training data is the first step in model learning. The model needs enough data to learn the underlying relationship between questions and answers.

[0055] As an example, and not a limitation, question-answer pair training data can be obtained from multiple sources, such as publicly available question-answer datasets, publicly accessible social media, forums, or manually generated data. Training data can be stored in CSV, JSON, or a database, ensuring that each question-answer pair has a clearly defined format.

[0056] Suppose we obtain the following question-answer pairs from an open dataset:

[0057] Question 1: What is machine learning?

[0058] Answer 1: Machine learning is a subset of artificial intelligence that enables computers to learn from data.

[0059] Question 2: Please tell me the definition of machine learning.

[0060] Answer 2: Machine learning is a field of computer science that focuses on using algorithms to improve performance from experience.

[0061] Question 3: How's the weather today?

[0062] Answer 3: The current time is 10 a.m.

[0063] It should be understood that the specific questions and answers will vary depending on the actual situation, and no limitation is imposed here.
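To make the storage format concrete, the sketch below loads question-answer pairs from a JSON string into a small typed record. The field names (`question`, `answer`, `label`), the `QAPair` class, and the `load_pairs` helper are hypothetical and chosen only for illustration; the 0/1 `label` follows the match/no-match convention that the description uses later when calculating the loss.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class QAPair:
    question: str
    answer: str
    label: int  # 1 = the answer matches the question, 0 = it does not


# Hypothetical JSON layout; the disclosure only says CSV, JSON, or a database.
RAW_JSON = """
[
  {"question": "What is machine learning?",
   "answer": "Machine learning is a subset of artificial intelligence that enables computers to learn from data.",
   "label": 1},
  {"question": "How's the weather today?",
   "answer": "The current time is 10 a.m.",
   "label": 0}
]
"""


def load_pairs(text: str) -> list[QAPair]:
    """Parse JSON records and reject any label other than 0 or 1."""
    pairs = []
    for record in json.loads(text):
        label = int(record["label"])
        if label not in (0, 1):
            raise ValueError(f"label must be 0 or 1, got {label}")
        pairs.append(
            QAPair(record["question"].strip(), record["answer"].strip(), label)
        )
    return pairs


pairs = load_pairs(RAW_JSON)
```

Validating the label while loading is one simple way to keep every pair in a clearly defined format before training starts.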

[0064] Obtaining high-quality training data is a crucial step in model performance, directly impacting training effectiveness and final prediction accuracy. Therefore, in the early stages of training, it is essential to ensure a sufficient quantity and quality of question-and-answer pairs as input data for the model.

[0065] S102. Input the question-answer pair training data into the question-answer pair matching model, predict the matching degree between the question training data and the answer data, and output the predicted matching probability.

[0066] Predicted matching probability is used to represent the degree of similarity between the question training data and the answer data.

[0067] Question-answering pair matching models are machine learning models used to determine the degree of matching between a given question and an answer. For example, question-answering pair matching models can include deep learning models based on BERT or RoBERTa.

[0068] The predicted match probability is the output generated by the model after inputting the question and answer, representing the similarity between the question and the answer, typically expressed as a probability value between 0 and 1. By inputting question-answer pairs into training data, the model extracts features and calculates the similarity between questions and answers, then converts this into a match probability using an activation function (such as Sigmoid or Softmax).

[0069] As an example, and not a limitation, the input data is preprocessed, including word segmentation and encoding (e.g., using BERT embeddings), and then fed into the matching model. The model calculates the prediction result and then applies an activation function to convert it into matching probabilities.

[0070] Referring back to the example above, the model's output is a matching probability between 0 and 1. For example:

[0071] For the question "What is machine learning?" and the answer "Machine learning is a subset of artificial intelligence that enables computers to learn from data.", the model might return 0.95.

[0072] For the question "How's the weather today?" and the answer "The current time is 10 a.m.", the model might return 0.1.

[0073] It should be understood that the specific output probability will vary depending on the actual situation, and no limitation is imposed here.
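The conversion from a raw model score to a matching probability can be sketched as follows. The raw scores (logits) used here are made-up values chosen so that the outputs land near the 0.95 and 0.1 figures in the example; a real model would produce its own scores.

```python
import math


def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability in (0, 1), avoiding overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softmax(logits):
    """Map a list of class scores to probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for a matching and a non-matching pair.
p_match = sigmoid(2.944)      # roughly 0.95
p_mismatch = sigmoid(-2.197)  # roughly 0.10

# With two classes [no-match, match], Softmax gives the same match probability
# as Sigmoid applied to the difference of the two scores.
p_match_softmax = softmax([0.0, 2.944])[1]
```

Sigmoid suits a single match score per pair, while Softmax is the natural choice when the model emits one score per class.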

[0074] By predicting the matching probability, the model can identify which questions and answers are most relevant, which helps guide the calculation of the loss function in subsequent steps.

[0075] S103. Calculate the target loss function based on the predicted matching probability.

[0076] The target loss function is used to evaluate the difference between the model's predictions and the actual labels. Commonly used loss functions include cross-entropy loss and mean squared error.

[0077] The loss function measures the difference between the model's predicted output and the target value; the smaller the value, the better the model's performance.

[0078] Calculate the loss between the predicted and actual values based on the actual labels (1 if the question and answer match, 0 if they don't).

[0079] Based on the example above, suppose the model's predicted probabilities and true labels are as follows:

[0080] For Question 1 and Answer 1, the predicted value is 0.95 and the true label is 1.

[0081] For Question 3 and Answer 3, the predicted value is 0.1 and the true label is 0.

[0082] The target loss function can be obtained by using the cross-entropy loss function.
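For the two predictions above, the binary cross-entropy can be worked out directly. The sketch below assumes the standard binary form, in which each pair contributes -(y·ln p + (1 - y)·ln(1 - p)) and the contributions are averaged; the function name and the clipping constant `eps` are choices made for this example.

```python
import math


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


# Pair (Question 1, Answer 1): p = 0.95, label 1 -> -ln(0.95), about 0.0513
# Pair (Question 3, Answer 3): p = 0.10, label 0 -> -ln(0.90), about 0.1054
loss = binary_cross_entropy([0.95, 0.1], [1, 0])  # mean, about 0.0783
```

Both predictions are close to their labels, so the loss is small; a confident wrong prediction (for example 0.95 with label 0) would contribute about 3.0 on its own.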

[0083] The purpose of calculating the loss function is to quantify the model's performance and provide a basis for subsequent parameter tuning and model optimization.

[0084] S104. Adjust the model parameters of the question-answer pair matching model according to the target loss function to obtain the target question-answer pair matching model.

[0085] Model parameters refer to the weights used within a model to determine its output; for example, in deep learning, these are the weights and biases of the network.

[0086] A target question-answering pair matching model refers to a final model that, after training, can accurately match question and answer pairs.

[0087] Adjust the model parameters based on the results of the loss function to make the model output closer to the true label.

[0088] Update model parameters using optimization algorithms such as Adam or SGD. Adjust weights by calculating the gradient of each parameter (based on the loss function), replacing old parameters to improve model performance. Repeat this process until the maximum number of iterations is reached or the loss function converges. By adjusting the parameters, the model can optimize its predictive ability, thereby improving the accuracy of question-answer pair matching.
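The update loop described above can be sketched with plain gradient descent on a deliberately tiny stand-in model. Here each question-answer pair is reduced to a single made-up similarity feature and the "model" is one weight and one bias; the real model, its features, and its optimizer settings (for example Adam and its learning rate) are not specified here, so everything below is an illustrative assumption.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(features, labels, lr=0.5, max_iters=200, tol=1e-6):
    """Fit p = sigmoid(w * x + b) by gradient descent on cross-entropy loss.

    Stops at max_iters or when the loss changes by less than tol.
    """
    w, b = 0.0, 0.0
    n = len(features)
    history = []
    for _ in range(max_iters):
        probs = [sigmoid(w * x + b) for x in features]
        loss = -sum(
            y * math.log(p) + (1 - y) * math.log(1.0 - p)
            for p, y in zip(probs, labels)
        ) / n
        converged = bool(history) and abs(history[-1] - loss) < tol
        history.append(loss)
        if converged:
            break
        # For sigmoid + cross-entropy, d(loss)/d(score) is (p - y).
        grad_w = sum((p - y) * x for p, y, x in zip(probs, labels, features)) / n
        grad_b = sum(p - y for p, y in zip(probs, labels)) / n
        # Replace the old parameters with the updated ones.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, history


# Hypothetical similarity features: high for matching pairs, low otherwise.
xs = [0.9, 0.8, 0.1, 0.2]
ys = [1, 1, 0, 0]
w, b, history = train(xs, ys)
```

The recorded `history` makes the stopping rule visible: training ends either when the iteration budget is spent or when successive losses differ by less than `tol`.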

[0089] Through the above steps, the model continuously acquires new question-answer pair data, performs training, calculates the loss, and updates the parameters. Ultimately, it trains a model capable of accurately determining the degree of matching between questions and answers.

[0090] Using question-and-answer pairs for training ensures the model learns rich linguistic features and contextual information. During training, by inputting questions and answers into the model and calculating their matching degree, the model learns which features characterize the similarity between questions and answers. By learning which features have the greatest impact on the matching degree, the model improves its prediction accuracy. By calculating the target loss function, the model can measure the gap between its predictions and the true labels, thus guiding its learning. Adjusting model parameters based on the target loss function: this periodic feedback adjustment mechanism ensures that the model gradually optimizes its parameters in each iteration to minimize loss, thereby enhancing the model's adaptability to input. As training continues, the model's parameters tend towards optimal values, ensuring the model can accurately predict answers in real-world scenarios.

[0091] Example 2

[0092] Based on the above embodiments, the question-answer pair matching model includes an encoding layer. Inputting the question-answer pair training data into the question-answer pair matching model to predict the matching degree between the question training data and the answer training data includes:

[0093] In the encoding layer, the question-answer pair training data is encoded with a RoBERTa-wwm model (a robustly optimized BERT variant pre-trained with whole-word masking) to obtain encoded question data and encoded answer data.

[0094] The encoding layer is a part of the model responsible for converting input data (such as text) into a numerical representation that can be used for computation. It typically includes various feature extraction and embedding techniques.

[0095] Encoded question data refers to the vector representation produced by the encoding layer of the model, which captures the semantic features of the question.

[0096] Encoded answer data refers to the vector representation produced by the encoding layer of the model, which captures the semantic features of the answer.

[0097] RoBERTa (Robustly Optimized BERT Pretraining Approach) is a pre-trained language model based on the Transformer architecture. It is an improved version of BERT and is suitable for a wider range of text understanding tasks.

[0098] Whole word masking is a strategy used in natural language processing (NLP) tasks to train language models and improve their understanding of textual context.

[0099] In a traditional masked language model (such as the masking strategy used in BERT), some tokens of the input text are randomly selected during pre-training and masked (that is, replaced with a special mask token). The model's goal is then to predict the masked tokens from the context. For example, in the sentence "Machine learning is an artificial intelligence technology," the token "learning" might be replaced with "[MASK]" so that the model learns to recover it.

[0100] Whole-word masking, by contrast, masks an entire word rather than part of it. If any part of a word is selected for masking, every token belonging to that word is replaced with the mask token. For example, if the word "machine learning" is selected, both "machine" and "learning" are replaced, rather than only "learning" or only part of "machine".

[0101] The advantage of whole-word masking is that the model must recover the complete semantics of the word, which avoids the ambiguity that can arise when only individual sub-words are masked. In some cases, the meaning or usage of a sub-word differs significantly from that of the whole word. Whole-word masking trains the model to infer the meaning of the whole word from its context.

[0102] In other words, during pre-training the RoBERTa-wwm model randomly selects complete words for masking, rather than masking individual sub-word tokens as the original BERT does. This improves the model's ability to understand context.

[0103] Suppose there is a statement: "Natural language processing is an important area of artificial intelligence."

[0104] If "natural language processing" needs to be masked, the effect of whole-word masking is as follows: "[MASK] is an important area of artificial intelligence".

[0105] The traditional masking method might be: "[MASK] Language processing is an important area of artificial intelligence."

[0106] In the first case, the entire phrase is masked, and the model needs to rely on the context to predict its meaning; while in traditional methods, some words or sub-words are masked, and the information provided by the context may not be complete enough, affecting the accuracy of understanding.

[0107] The introduction of whole word masking is an optimization of the model training strategy, which aims to improve the language model's ability to understand words and their context, so that the trained model can perform more accurately and reliably in practical applications.
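The difference between token-level and whole-word masking can be sketched in a few lines. The example assumes WordPiece-style tokens in which a leading "##" marks the continuation of the previous word, and it uses a 15% default masking ratio; both are conventions borrowed from BERT-style pre-training rather than details stated in this disclosure, and the function names are hypothetical.

```python
import random


def group_whole_words(tokens):
    """Group token indices so that '##' continuation pieces join their word."""
    groups = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and groups:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


def whole_word_mask(tokens, mask_ratio=0.15, rng=None, mask_token="[MASK]"):
    """Mask whole words: every piece of a selected word is replaced."""
    groups = group_whole_words(tokens)
    if not groups:
        return []
    rng = rng or random.Random()
    n_to_mask = min(len(groups), max(1, round(len(groups) * mask_ratio)))
    masked = list(tokens)
    for group in rng.sample(groups, n_to_mask):
        for i in group:
            masked[i] = mask_token
    return masked


# "tokenization" is split into two pieces; they must be masked together.
tokens = ["token", "##ization", "helps", "language", "models"]
masked = whole_word_mask(tokens, mask_ratio=0.4, rng=random.Random(0))
```

Token-level masking would instead sample from the individual indices, so "token" could be masked while "##ization" stays visible, which is exactly the partial-word leak that whole-word masking avoids.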

[0108] Based on the examples above, for the question-and-answer pairs Q and A in the training set, RoBERTa-wwm encoding is used, as shown in Equations (1) and (2).

[0109] [Equations (1) and (2) are missing from this text.]

[0110] Here, n is the number of words, and the result of each equation represents the sentence after RoBERTa-wwm encoding.

[0111] Example 3

[0112] Based on the above embodiments, the question-answering pair matching model also includes a feature enhancement layer, into which encoded question data and encoded answer data are input;

[0113] Inputting the question-answer pair training data into the question-answer pair matching model to predict the matching degree between the question training data and the answer data also includes:

[0114] In the feature enhancement layer, feature enhancement is performed on the encoded question data and the encoded answer data to obtain enhanced question data and enhanced answer data.

[0115] The feature enhancement layer is responsible for further processing the encoded input features (such as question data and answer data) to extract more useful information.

[0116] Enhanced question data refers to question data that has been processed by the feature enhancement layer, containing more information and features compared to the original encoded question data.

[0117] Enhanced answer data refers to answer data that has been processed by the feature enhancement layer, containing more information and features compared to the original encoded answer data.

[0118] The purpose of the feature enhancement layer is to improve the expressive power of the input features through various methods (such as adding additional features, nonlinear transformations, and contextual information integration), thereby enhancing the model's ability to judge the matching degree between questions and answers. It can also be understood as enabling the model to better understand the complex relationship between questions and answers, thus improving the accuracy of matching degree prediction, providing the model with additional contextual information or features, and improving the model's overall performance.

[0119] In some embodiments, feature enhancement is performed on the encoded question data and the encoded answer data, including:

[0120] Enhanced question data is obtained by using an ordered-neurons long short-term memory network (On-LSTM) to perform feature enhancement on the encoded question data. The On-LSTM network includes a preset control signal, which guides the activation order of neurons in the network. The preset control signal is correlated with the syntactic and semantic relationships of the encoded question data. Enhanced answer data is obtained by using a skip long short-term memory network (Leap-LSTM) to perform feature enhancement on the encoded answer data. The skip LSTM network includes a skip matrix, which is correlated with the semantic and syntactic relationships of the encoded answer data.

[0121] For the questions, On-LSTM (Ordered Neurons Long Short-Term Memory) is used for feature enhancement. By introducing the orderliness of neurons, On-LSTM can better capture the semantic structure and logical sequence of questions. When processing questions, it can orderly activate neurons at different locations according to the grammatical and semantic relationships of the sentence, thereby extracting the key features of the question more accurately.

[0122] Control signals are the signals used in On-LSTM to guide the activation order of neurons. These signals can adjust the network's behavior based on the syntactic and semantic relationships of the data.

[0123] Grammatical relations refer to the grammatical structures and relationships between words (such as parts of speech, syntax, etc.).

[0124] Semantic relations refer to the semantic and conceptual relationships between words.

[0125] On-LSTM leverages the long-term and short-term dependencies of sequence data, combined with control signals, to enhance the feature representation of the encoded question data by dynamically adjusting historical information and the current state.

[0126] Leaping Long Short-Term Memory (Leap-LSTM) is an improved LSTM architecture that allows the network to skip certain time steps in a time series to capture dependencies over longer distances. This improves the understanding of semantic relationships and syntactic structures.

[0127] The skip matrix is a matrix used in Leap-LSTM to control the time steps skipped, and it corresponds to the semantic relationships and syntactic structure of the encoded answer data.

[0128] Through the skipping mechanism, Leap-LSTM can process the input sequence more flexibly, enabling the network to learn semantic and syntactic information over a longer range rather than relying solely on adjacent time steps.

[0129] ① Question feature enhancement

[0130] For a question, the input is the question vector sequence obtained after RoBERTa-wwm encoding, where n is the sequence length and the i-th position in the question is represented by a d-dimensional vector.

[0131] The core computational steps of On-LSTM are as follows:

[0132] 1) Forget gate:

[0133]

[0134] Here, W_f and U_f are weight matrices (each of dimension d×d), b_f is the bias vector (of dimension d), and σ is the sigmoid activation function. The gate also takes as input the hidden state of the i-th position at the previous time step. f_{t,i} controls the degree to which information from the previous moment is retained.

[0135] 2) Input Gate:

[0136]

[0137] Here, W_i, U_i, W_C and U_C are the corresponding weight matrices (each of dimension d×d), and b_i and b_C are bias vectors (of dimension d). i_{t,i} determines the degree to which the current input information is used for the update, and the candidate memory content is generated from the current input.

[0138] 3) Control of neuronal activation sequence:

[0139] On-LSTM introduces the orderliness of neurons through an additional control signal g_t. Let g_t be a scalar representing the current level of activation, which can be dynamically generated based on the grammatical and semantic relationships of the question.

[0140]

[0141] Here, the product in the formula is element-wise multiplication, and C_{t-1,i} is the state of the memory unit at the previous moment. By dynamically adjusting the current input information through g_t, neurons are activated in a certain order, which better captures the semantic structure and logical order of the question.

[0142] 4) Output gate:

[0143]

[0144] Here, W_o and U_o are weight matrices (each of dimension d×d) and b_o is a bias vector (of dimension d). o_{t,i} controls the degree to which the memory unit outputs information, and the enhanced question feature representation is ultimately obtained.
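The four steps above can be sketched as a single cell update. This is only an illustration under stated assumptions, not the formulas of this application: the gates follow the standard LSTM form, and the way the scalar control signal scales the newly written input (`c = f * c_prev + g * i * c_hat`) is assumed, because the formula itself is not reproduced in the text. The names `on_lstm_step`, `affine`, `zero_params` and `params` are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, U, b, x, h):
    """Return W @ x + U @ h + b for d x d matrices given as nested lists."""
    d = len(b)
    return [
        sum(W[r][k] * x[k] for k in range(d))
        + sum(U[r][k] * h[k] for k in range(d))
        + b[r]
        for r in range(d)
    ]

def on_lstm_step(x, h_prev, c_prev, params, g):
    """One cell update with a scalar control signal g in [0, 1]."""
    f = [sigmoid(v) for v in affine(*params["f"], x, h_prev)]        # forget gate
    i = [sigmoid(v) for v in affine(*params["i"], x, h_prev)]        # input gate
    c_hat = [math.tanh(v) for v in affine(*params["c"], x, h_prev)]  # candidate memory
    o = [sigmoid(v) for v in affine(*params["o"], x, h_prev)]        # output gate
    # Assumed form: g scales how much new input is written to memory.
    c = [f[k] * c_prev[k] + g * i[k] * c_hat[k] for k in range(len(x))]
    h = [o[k] * math.tanh(c[k]) for k in range(len(x))]
    return h, c

def zero_params(d):
    """All-zero weights, so every gate evaluates to sigmoid(0) = 0.5."""
    Z = [[0.0] * d for _ in range(d)]
    return {name: (Z, Z, [0.0] * d) for name in ("f", "i", "c", "o")}
```

With g set to 0 the cell only decays its previous memory through the forget gate; with g set to 1 it behaves like an ordinary LSTM cell.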

[0145] ② Enhanced answer features

[0146] For the answer, Leap-LSTM (Leaping Long Short-Term Memory) is used for feature enhancement. Leap-LSTM has a unique skip connection mechanism that can span different time steps, better integrating long-range information in the answer. When processing the answer, it can effectively capture the contextual dependencies in the answer text, especially for longer answer paragraphs, where Leap-LSTM can quickly associate important information from different locations.

[0147] The input is the answer vector sequence obtained after RoBERTa-wwm encoding, where n is the sequence length and the j-th position in the answer is represented by a d-dimensional vector.

[0148] The calculation process of Leap-LSTM is as follows:

[0149] 1) Standard LSTM computation section:

[0150] Similar to traditional LSTM, the forget gate, input gate, and candidate memory contents are first calculated:

[0151]

[0152] The dimensions of each weight matrix and bias vector are the same as the corresponding parts in On-LSTM, and their functions are similar.

[0153] 2) Skip connection mechanism:

[0154] Leap-LSTM introduces skip connections, integrating long-range information through a skip matrix S (m×m). Let S be a learnable matrix that learns the skip connection weights between different positions based on the semantic and syntactic structure of the answer text.

[0155]

[0156] Here, information from other locations is integrated into the memory unit update of the current location in a skip-like manner through S, enabling Leap-LSTM to effectively capture the contextual dependencies in the answer text across different time steps. In particular, for longer answer paragraphs, it can quickly associate important information from different locations.
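As a rough sketch of how a skip matrix can fold information from other positions into the memory of the current position, the following assumes an additive update in which each position keeps its own memory and adds a weighted sum of the memories at all positions. The additive form and the name `leap_mix` are assumptions for illustration; the actual update formula is not reproduced in the text.

```python
def leap_mix(cells, S):
    """Add skip-weighted memory from every position to each position.

    cells: list of m memory vectors, one per answer position.
    S: m x m skip matrix; S[j][k] weights the contribution of position k
       to position j. In a trained model S would be learned.
    """
    m, d = len(cells), len(cells[0])
    mixed = []
    for j in range(m):
        row = [
            cells[j][t] + sum(S[j][k] * cells[k][t] for k in range(m))
            for t in range(d)
        ]
        mixed.append(row)
    return mixed
```

A non-zero entry such as `S[0][2]` lets the first position draw directly on the third one without passing through the step in between, which is the long-range effect described above.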

[0157] 3) Output gate:

[0158]

[0159] The final enhanced answer feature representation is obtained.

[0160] Understandably, in question-answering matching tasks, questions and answers often contain long-range dependencies, such as the understanding of a word in one sentence in relation to a word in another sentence or paragraph. In such cases, traditional LSTMs may face the problem of vanishing gradients. On-LSTM and Leap-LSTM improve their ability to capture long-range dependencies by introducing improved gating mechanisms and non-linear operations, enabling the model to better preserve and reference important contextual information.

[0161] Using these two networks for feature enhancement can improve the model's robustness to noise and uncertainty. The complexity of both the representation of the question data and the representation of the answer data is effectively enhanced, enabling the model to maintain high performance even when faced with inputs of different forms and expressions.

[0162] On-LSTM and Leap-LSTM allow the model to dynamically select which information to use to update the state at each time step. This flexibility enables the model to extract features based on the specific circumstances at different time steps of the input, rather than relying solely on past states. This is especially beneficial in question-answer matching tasks, where the matching relationships between questions and answers are often complex and diverse.

[0163] By applying different types of feature enhancement methods to questions and answers respectively, multi-level and multi-dimensional feature representations can be provided to the model. Complementary characteristics are more likely to help the model capture rich semantic information and improve matching accuracy. For example, question features may focus more on intent, while answer features may focus on specific details. This complementary characteristic helps to form a more comprehensive representation.

[0164] In other words, by employing On-LSTM and Leap-LSTM for feature enhancement, the model is able to better understand and handle the complex relationships between questions and answers. These improvements not only enhance the ability to represent features but also improve the model's robustness and learning efficiency, ultimately improving the accuracy of question-answering pair matching tasks. This structure undoubtedly lays the foundation for improving the performance of intelligent question-answering systems.

[0165] Example 4

[0166] Based on the above embodiments, the question-answer pair matching model further includes a cross-attention mechanism layer, into which the enhanced question data and the enhanced answer data are input. Inputting the question-answer pair training data into the question-answer pair matching model to predict the matching degree between the question training data and the answer data further includes: calculating the cross-attention between the enhanced question data and the enhanced answer data using the cross-attention mechanism to obtain question-answer cross-attention; calculating the cross-attention between the enhanced answer data and the enhanced question data using the cross-attention mechanism to obtain answer-question cross-attention; concatenating the question-answer cross-attention and the answer-question cross-attention to obtain the target attention; and predicting the matching degree between the question training data and the answer data based on the target attention.

[0167] Cross-attention is a method to enhance a model's understanding of the relationships between different parts of an input sequence. It gains contextual information by calculating the attention of elements of one sequence to elements of another sequence. The principle is to capture the relationship between two sequences by calculating the degree of attention one sequence (e.g., a question) gives to another sequence (e.g., an answer), enabling the model to understand their interdependence.

[0168] Question-answer cross-attention refers to the attention weight matrix obtained by applying the cross-attention mechanism to the enhanced question data and the enhanced answer data.

[0169] Answer-question cross-attention refers to the attention weight matrix obtained by applying the cross-attention mechanism to the enhanced answer data and the enhanced question data.

[0170] Target attention can be understood as the result of combining question-answer cross attention and answer-question cross attention.

[0171] In question-and-answer matching scenarios, the cross-attention mechanism offers significant advantages. It enhances semantic understanding, deeply analyzes the relationship between questions and answers, and accurately captures key information. Regarding improved matching accuracy, it can dynamically adjust the attention distribution to adapt to different question-and-answer combinations, effectively reducing noise interference. Furthermore, this mechanism enables the fusion of different types of information and the processing of complex data structures.

[0172] The formula for calculating question-answer cross-attention is shown in equation (10):

[0173]

[0174] The formula for calculating answer-question cross-attention is shown in equation (11):

[0175]

[0176] The question-answer and answer-question cross-attention matrices are concatenated as shown in Equation (12):

[0177]

[0178] The purpose of cross-attention mechanisms is to capture the dependencies and connections between questions and answers by focusing on their semantic and syntactic relationships. Target attention combines information from both, providing rich feature representations for subsequent matching degree prediction, thereby improving the model's accuracy and robustness. In other words, cross-attention mechanisms provide richer contextual information by strengthening the semantic mapping between questions and answers, enabling the model to more accurately identify the close relationships between question-answer pairs when processing matching. By calculating cross-attention between question-answer and answer-question pairs, the resulting target attention provides strong support for subsequent matching degree prediction, improving the overall performance and effectiveness of question-answer matching models.
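The bidirectional computation can be sketched as follows. Equations (10) to (12) are not reproduced in the text, so this sketch assumes scaled dot-product attention and assumes that each direction is mean-pooled to a fixed-size vector before concatenation; both choices, and the names `cross_attention`, `mean_pool` and `target_attention`, are illustrative only.

```python
import math

def softmax(row):
    peak = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(X, Y):
    """Each row of X attends over the rows of Y (scaled dot-product scores)."""
    d = len(X[0])
    out = []
    for x in X:
        scores = [sum(a * b for a, b in zip(x, y)) / math.sqrt(d) for y in Y]
        weights = softmax(scores)
        out.append([sum(w * y[t] for w, y in zip(weights, Y)) for t in range(d)])
    return out

def mean_pool(M):
    """Average the rows of M into a single vector."""
    return [sum(col) / len(M) for col in zip(*M)]

def target_attention(Q, A):
    qa = cross_attention(Q, A)  # question attends to answer
    aq = cross_attention(A, Q)  # answer attends to question
    # Assumption: pool each direction, then concatenate the two vectors.
    return mean_pool(qa) + mean_pool(aq)
```

Pooling before concatenation keeps the output size independent of the question and answer lengths, which is convenient for a fixed-size prediction layer.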

[0179] Next, the prediction layer uses the question interaction matrices to calculate the matching relationship between them, i.e., the similarity.

[0180] The matching problem between questions is regarded as a classification problem, and Softmax is used to transform the concatenated question interaction matrix so that the prediction result is between 0 and 1, as shown in Equation (13).

[0181] p = softmax(W·V_p + b). (13)

[0182] In other words, by calculating the question interaction matrix, the similarity between question pairs can be extracted and treated as a classification problem, transforming the input into a probability distribution. Through this calculation and transformation, the matching relationship between questions is clearly reflected, providing an important basis for the model's subsequent decisions.
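Equation (13) can be illustrated with a few lines of code. The sketch below assumes W is a matrix with one row per class, b is a bias vector of the same length, and the input vector is the concatenated interaction representation; the function name `predict` and the two-class setup in the usage note are assumptions.

```python
import math

def predict(W, b, v):
    """Return softmax(W @ v + b) as a list of class probabilities."""
    logits = [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

For a two-class setup (match or no match), `predict` returns two values between 0 and 1 that sum to 1, and the entry for the "match" class can be read as the predicted matching probability.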

[0183] The binary cross-entropy is used as the loss function, and the calculation formula is shown in equation (14):

[0184]

[0185] Here, y_i represents the true label of the question pair, and p(y_i) represents the predicted probability.

[0186] Binary cross-entropy is a loss function used in binary classification tasks to quantify the difference between the model's predicted values and the true labels. Its calculation compares the actual corresponding labels with the predicted probabilities output by the model. In question-answering pair matching, the labels are typically 0 or 1, indicating whether the question pairs match (1 for match, 0 for no match). It can also be understood that the binary cross-entropy loss function calculates the degree of inconsistency between the true labels and the model's predicted probabilities. By using the binary cross-entropy loss function, the difference between the model's predictions and the true labels can be clearly measured, providing direction for model optimization. Optimization algorithms adjust model parameters based on the value of the loss function, gradually reducing the loss and improving the model's performance on question-answering pair matching tasks.
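A minimal sketch of this loss is given below. It uses the standard mean form, -(1/N) Σ [y_i log p(y_i) + (1 - y_i) log(1 - p(y_i))]; whether equation (14) averages or sums over samples is not visible in the text, so the averaging, the clamping constant `eps` and the function name are assumptions.

```python
import math

def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)
```

Confident correct predictions give a loss close to 0, while confident wrong predictions are penalized heavily, which is what drives the parameter updates during training.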

[0187] In question-answer pair matching models, the use of a cross-attention mechanism layer and a series of operations based on this mechanism is mainly to better capture and represent the complex relationship between questions and answers, thereby improving the accuracy of matching degree prediction.

[0188] It's understandable that cross-attention mechanisms help models focus on the important relationships between questions and answers. In practical applications, questions and answers often contain different information; through cross-attention, models can effectively identify the dependencies and semantic connections between them. For example, in the case of the question "Why do things revive in spring?", the answer is "Warm sunshine and ample water in spring enable plants to grow." Through cross-attention, the model can recognize the importance of "spring" to growth.

[0189] After acquiring question-answer cross-attention and answer-question cross-attention, the model can understand questions and answers in different contexts. For example, question-answer cross-attention allows the model to know which part of the answer best addresses the key points of the question, while answer-question cross-attention can start from the answer to understand the intent of the question. This bidirectional information processing can significantly improve the comprehensiveness of understanding.

[0190] By concatenating the question-answer cross-attention and the answer-question cross-attention, the resulting target attention is a novel representation that integrates the features of both attention types. This representation not only captures the relationship between the question and the answer but also synthesizes their information, thus forming a richer feature representation. The model can use this feature to make more accurate matching predictions.

[0191] Ultimately, the step of predicting the matching degree based on target attention can effectively improve the accuracy of assessing the similarity between questions and answers. In this way, the model can output a prediction result indicating the degree of matching between the question and the answer, which is of great value for practical applications such as information retrieval and dialogue systems.

[0192] Thanks to the introduction of the cross-attention mechanism layer, the model can better adapt to different question and answer scenarios. For example, even if the question wording or answer selection differs, the model can still accurately evaluate based on the key contextual information it pays attention to, achieving stronger generalization ability.

[0193] The method in the above embodiments utilizes a RoBERTa-wwm encoding layer to encode question-answer pairs. Next, in the question-answer feature enhancement layer, On-LSTM is used to enhance the features of the questions to capture semantic structure and logical order, while Leap-LSTM is used to enhance the answers to integrate long-range information. A cross-attention mechanism layer strengthens semantic understanding, reduces noise interference, and achieves information fusion. The prediction layer determines the question-answer matching relationship and uses Softmax to transform the results. Finally, binary cross-entropy is used as the loss function to ensure accuracy.

[0194] Example 5

[0195] Figure 2 is a flowchart of a question-and-answer pair matching method provided in an embodiment of this disclosure. As shown in Figure 2, the method includes the following steps S201 to S202.

[0196] S201. Obtain question data.

[0197] As an example rather than a limitation, the question data to be processed is obtained from a user or a data source.

[0198] For example, users can enter one or more questions (e.g., through a natural language processing interface), existing question data can be retrieved from a database or knowledge base, or real-time question data can be obtained via API calls (as in applications such as question-answering systems or chatbots).

[0199] S202. Input the question data into the target question-answer matching model, process it, and output the answer data corresponding to the question data.

[0200] The target question-answering pair matching model is trained using any of the above question-answering pair matching model training methods.

[0201] The question data is input into a pre-trained matching model, and the model can output the best matching answer based on the learning process during training.
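Step S202 can be pictured as ranking candidate answers by their predicted matching probability and returning the highest-scoring one. The sketch below assumes a pool of candidate answers is available and uses a toy word-overlap scorer as a stand-in for the trained matching model; `best_answer`, `word_overlap` and the candidate pool are hypothetical and are not described in this application.

```python
def best_answer(question, candidates, score):
    """Return the candidate answer with the highest matching score.

    score(question, answer) stands in for the trained matching model and
    must return a number; higher means a better match.
    """
    if not candidates:
        raise ValueError("no candidate answers to rank")
    return max(candidates, key=lambda answer: score(question, answer))

def word_overlap(question, answer):
    """Toy stand-in scorer: fraction of question words found in the answer."""
    q = set(question.lower().split())
    a = set(answer.lower().split())
    return len(q & a) / len(q) if q else 0.0
```

In a real deployment the `score` argument would wrap a forward pass of the target question-answer pair matching model rather than a word count.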

[0202] A well-trained model can respond to user questions quickly and accurately, improving system response efficiency and user satisfaction. Furthermore, a well-trained model can be applied to various question-answering systems, including customer service systems, intelligent assistants, and educational software, offering high flexibility.

[0203] Example 6

[0204] Based on the above embodiments, this embodiment provides an application example.

[0205] Figure 3 is a flowchart of a question-and-answer pair data processing method provided in an embodiment of this application.

[0206] As can be seen from Figure 3, after the question-answer pair data is input into the question-answer pair matching model, it is first encoded by the RoBERTa-wwm encoding layer to obtain encoded data. On-LSTM is then used to enhance the question features, and Leap-LSTM is used to enhance the answer features. The cross-attention mechanism calculates the question-answer and answer-question cross-attention matrices, Softmax transforms the concatenated question interaction matrix so that the prediction result is between 0 and 1, and the prediction result is finally output.

[0207] The specific details are as follows.

[0208] (1) RoBERTa-wwm coding layer

[0209] For the question-and-answer pairs Q and A in the training set, RoBERTa-wwm encoding is used, as shown in formulas (1) to (2) above, which will not be repeated here.

[0210] (2) Question Answer - Feature Enhancement Layer

[0211] In the question-and-answer pair matching task, feature enhancements are performed on both the question and the answer in order to better process and understand the user's question and provide accurate answers.

[0212] ① Question feature enhancement

[0213] For a question, the input is the question vector encoded by RoBERTa-wwm, and On-LSTM is used to enhance the features of the question vector. For the specific calculation steps, refer to formulas (3) to (6) above, which will not be repeated here.

[0214] ② Enhanced answer features

[0215] For the answer, Leap-LSTM is used for feature enhancement. Leap-LSTM has a unique skip connection mechanism that can span different time steps, better integrating long-range information in the answer. When processing the answer, it can effectively capture the contextual dependencies in the answer text, especially for longer answer paragraphs, where Leap-LSTM can quickly associate important information from different locations.

[0216] The input is the answer vector encoded by RoBERTa-wwm. For the calculation process of Leap-LSTM, refer to formulas (7) to (9) above, which will not be repeated here.

[0217] (3) Cross-attention mechanism layer

[0218] In question-and-answer matching scenarios, the cross-attention mechanism offers significant advantages. It enhances semantic understanding, deeply analyzes the relationship between questions and answers, and accurately captures key information. Regarding improved matching accuracy, it can dynamically adjust the attention distribution to adapt to different question-and-answer combinations, effectively reducing noise interference. Furthermore, this mechanism enables the fusion of different types of information and the processing of complex data structures.

[0219] The formula for calculating question-answer cross-attention is shown in formula (10), and the formula for calculating answer-question cross-attention is shown in formula (11). The question-answer and answer-question cross-attention matrices are concatenated as shown in formula (12).

[0220] (4) Prediction layer

[0221] The prediction layer uses the question interaction matrices to calculate the matching relationship between them, i.e., the similarity. The matching problem between questions is regarded as a classification problem, and Softmax is used to transform the concatenated question interaction matrix so that the prediction result is between 0 and 1, as shown in formula (13).

[0222] (5) Loss Function

[0223] The binary cross-entropy is used as the loss function, and the calculation formula is shown in formula (14).

[0224] Figure 3 is a question-and-answer pair data flow diagram provided in an embodiment of this application.

[0225] The above method accurately encodes question-and-answer pairs using the RoBERTa-wwm encoding layer. The On-LSTM in the question feature enhancement layer better captures the semantic structure and logical order of the questions, while the Leap-LSTM in the answer feature enhancement layer effectively integrates long-range information about the answers. The cross-attention mechanism layer strengthens semantic understanding, accurately captures key information, reduces noise interference, and achieves information fusion. The prediction layer accurately determines the question-and-answer matching relationship, and the loss function ensures the accuracy and stability of the model.

[0226] Example 7

[0227] The question-and-answer pair matching model training apparatus of this application embodiment will now be described with reference to the accompanying drawings. For the sake of brevity, appropriate omissions will be made in the following description of the apparatus; relevant content can be referred to in the relevant description of the method above, and will not be repeated.

[0228] Figure 4 is a schematic diagram of the structure of a question-answer pair matching model training device provided in an embodiment of this disclosure.

[0229] As shown in Figure 4, the device 1000 includes the following units.

[0230] The acquisition unit 1001 is used to acquire question-answer pair training data, which includes question training data and answer training data.

[0231] The prediction unit 1002 is used to input question-answer pair training data into the question-answer pair matching model, predict the matching degree between the question training data and the answer data, and output the predicted matching probability. The predicted matching probability is used to represent the similarity between the question training data and the answer data.

[0232] The calculation unit 1003 is used to calculate the target loss function based on the predicted matching probability;

[0233] The adjustment unit 1004 is used to adjust the model parameters of the question-answer pair matching model according to the target loss function to obtain the target question-answer pair matching model.
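The division of work among the four units can be sketched as one training loop. To stay short, the sketch replaces the full matching model with a one-feature logistic model and plain gradient descent; the `feature` function, the learning rate, the number of epochs and the name `train` are all hypothetical stand-ins and are not taken from this application.

```python
import math

def train(pairs, labels, feature, epochs=200, lr=0.5):
    """Fit p = sigmoid(w * feature(q, a) + b) by gradient descent on BCE."""
    w, b = 0.0, 0.0
    # Acquisition unit: collect the question-answer training pairs.
    xs = [feature(q, a) for q, a in pairs]
    history = []
    for _ in range(epochs):
        # Prediction unit: predicted matching probability for each pair.
        probs = [1.0 / (1.0 + math.exp(-(w * x + b))) for x in xs]
        # Calculation unit: binary cross-entropy over the batch.
        loss = -sum(
            y * math.log(p) + (1 - y) * math.log(1.0 - p)
            for y, p in zip(labels, probs)
        ) / len(xs)
        history.append(loss)
        # Adjustment unit: update the parameters along the negative gradient.
        grad_w = sum((p - y) * x for p, y, x in zip(probs, labels, xs)) / len(xs)
        grad_b = sum(p - y for p, y in zip(probs, labels)) / len(xs)
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b, history
```

The returned `history` lets a caller check that the loss decreases as the parameters are adjusted, mirroring the role of the target loss function in the device.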

[0234] In some embodiments, the question-answer pair matching model includes an encoding layer, and the prediction unit 1002 is further configured to input question-answer pair training data into the question-answer pair matching model to predict the matching degree between the question training data and the answer data, including:

[0235] In the encoding layer, a robust optimized bidirectional encoder representation-whole word mask model is used to encode the question-answer pair training data to obtain encoded question data and encoded answer data.

[0236] In some embodiments, the question-answering pair matching model further includes a feature enhancement layer, into which encoded question data and encoded answer data are input;

[0237] The prediction unit 1002 is also used to input the question-answer pair training data into the question-answer pair matching model to predict the matching degree between the question training data and the answer data, and further includes:

[0238] In the feature enhancement layer, feature enhancement is performed on the encoded question data and the encoded answer data to obtain enhanced question data and enhanced answer data.

[0239] In some embodiments, the prediction unit 1002 is further configured to perform feature enhancement on the encoded question data and the encoded answer data, including:

[0240] Feature enhancement is performed on the encoded question data using an ordered-neurons long short-term memory (On-LSTM) network to obtain enhanced question data. The On-LSTM network includes a preset control signal, which guides the activation order of neurons in the On-LSTM network. The preset control signal is correlated with the syntactic and semantic relationships of the encoded question data.

[0241] Feature enhancement is performed on the encoded answer data using a leaping long short-term memory (Leap-LSTM) network to obtain enhanced answer data. The Leap-LSTM network includes a skip matrix, which is correlated with the semantic relationships and syntactic structure of the encoded answer data.

[0242] In some embodiments, the question-answer pair matching model further includes a cross-attention mechanism layer, into which augmented question data and augmented answer data are input; the prediction unit 1002 is further configured to input question-answer pair training data into the question-answer pair matching model, predict the matching degree between the question training data and the answer data, and further includes:

[0243] The cross-attention mechanism is used to calculate the cross-attention between augmented question data and augmented answer data, thus obtaining the question-answer cross-attention.

[0244] The cross-attention mechanism is used to calculate the cross-attention between augmented answer data and augmented question data, thus obtaining the answer-question cross-attention.

[0245] The target attention is obtained by concatenating the question-answer cross-attention and the answer-question cross-attention.

[0246] The matching degree between the question training data and the answer data is predicted based on the target attention.

[0247] It should be noted that the information interaction and execution process between the above-mentioned units are based on the same concept as the method embodiments of this application. For details on their specific functions and technical effects, please refer to the method embodiments section, which will not be repeated here.

[0248] Example 8

[0249] The question-and-answer matching device according to an embodiment of this application will now be described with reference to the accompanying drawings. For the sake of brevity, appropriate omissions will be made in the following description of the device; relevant content can be referred to in the relevant descriptions of the method above, and will not be repeated.

[0250] Figure 5 is a schematic diagram of the structure of a question-and-answer pair matching device provided in an embodiment of this disclosure.

[0251] As shown in Figure 5, the device 2000 includes the following units.

[0252] The unit 2001 is used to obtain question data;

[0253] The processing unit 2002 is used to input the question data into the target question-answer pair matching model for processing and output the answer data corresponding to the question data. The target question-answer pair matching model is trained by any of the methods in the first aspect.

[0254] The aforementioned apparatus 2000 may further include a storage unit 2003 for storing instructions and / or data, thereby implementing the methods described in the above embodiments.

[0255] It should be noted that the information interaction and execution process between the above-mentioned units are based on the same concept as the method embodiments of this application. For details on their specific functions and technical effects, please refer to the method embodiments section, which will not be repeated here.

[0256] Example 9

[0257] Based on the above embodiments, this embodiment provides a computer device 3000, including a memory 3200, a processor 3100, and a computer program 3210 stored in the memory. The processor 3100 executes the computer program 3210 to implement the steps of the method described in the above embodiments.

[0258] In some embodiments of this example, a computer-readable storage medium is provided, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method described in the above embodiments.

[0259] In some embodiments of this example, a computer program product is provided, including a computer program / instructions, characterized in that the computer program, when executed by a processor, implements the steps of the method described in the above embodiments.

[0260] The processor 3100 may include, but is not limited to, one or more processors or microprocessors. Each processor may be implemented as an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, a microprocessor, or other electronic components, for performing the methods described in the above embodiments.

[0261] Computer-readable storage media can be implemented by any type of volatile or non-volatile storage device or a combination thereof. Computer-readable storage media may include, but are not limited to, random access memory (RAM), read-only memory (ROM), flash memory, EPROM memory, EEPROM memory, registers, and computer storage media (e.g., hard disks, floppy disks, solid-state drives, removable disks, CD-ROMs, DVD-ROMs, Blu-ray discs, etc.).

[0262] Computer-readable storage media may also store at least one computer-executable program / instruction, such as computer-readable instructions. Computer-readable storage media include, but are not limited to, volatile memory and / or non-volatile memory. Volatile memory may include, for example, random access memory (RAM) and / or cache memory. Non-volatile memory may include, for example, read-only memory (ROM), hard disk, flash memory, etc. For example, a non-transitory computer-readable storage medium may be connected to a computing device such as a computer, and when the computing device executes the computer-readable instructions stored on the computer-readable storage medium, the various methods described above can be performed.

[0263] In addition, the computer device 3000 may also include (but is not limited to) a data bus, an input/output (I/O) bus, a display, and input/output devices (e.g., keyboard, mouse, speakers, etc.).

[0264] The processor 3100 can communicate with external devices via wired or wireless networks through the I/O bus.

[0265] In one embodiment, the at least one computer-executable instruction may also be compiled into or comprise a software product/computer program product, wherein one or more computer-executable instructions are executed by a processor to perform the steps of the various functions and/or methods in the embodiments described herein.

[0266] In the embodiments provided in this disclosure, it should be understood that the disclosed apparatus and methods can also be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods, and computer program products according to various embodiments of this disclosure. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions marked in the blocks may occur in a different order than those marked in the drawings. For example, two consecutive blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in a block diagram and/or flowchart, and combinations of blocks in block diagrams and/or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or action, or using a combination of dedicated hardware and computer instructions.

[0267] It should be noted that, in this disclosure, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element limited by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes that element.

[0268] While the embodiments disclosed herein are as described above, the foregoing content is merely for the purpose of facilitating understanding of this disclosure and is not intended to limit this disclosure. Any person skilled in the art to which this disclosure pertains may make any modifications and changes in form and detail of the implementation without departing from the spirit and scope of this disclosure; however, the scope of patent protection of this disclosure shall still be determined by the scope defined in the appended claims.

Claims

1. A method for training a question-answer pair matching model, characterized in that it comprises: acquiring question-answer pair training data, which includes question training data and answer training data; inputting the question-answer pair training data into the question-answer pair matching model to predict the matching degree between the question training data and the answer training data, and outputting a predicted matching probability, wherein the predicted matching probability represents the degree of similarity between the question training data and the answer training data; calculating a target loss function based on the predicted matching probability; and adjusting the model parameters of the question-answer pair matching model according to the target loss function to obtain a target question-answer pair matching model.
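The four steps of claim 1 (acquire data, predict a matching probability, compute a loss, adjust parameters) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the claims: the toy logistic scorer standing in for the matching model, the choice of binary cross-entropy as the target loss function, plain gradient descent as the parameter update, and the names `predict_match_probability`, `binary_cross_entropy`, `train`, and `TOY_PAIRS`.

```python
import math
import random


def predict_match_probability(weights, bias, question_vec, answer_vec):
    """Toy stand-in for the matching model (assumption): a logistic score
    over the element-wise product of the question and answer vectors."""
    score = bias + sum(w * q * a for w, q, a in zip(weights, question_vec, answer_vec))
    return 1.0 / (1.0 + math.exp(-score))


def binary_cross_entropy(prob, label, eps=1e-12):
    """Assumed target loss: binary cross-entropy on the predicted probability."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def train(pairs, dim, epochs=200, lr=0.5, seed=0):
    """pairs: list of (question_vec, answer_vec, label), label in {0, 1}.
    Returns the adjusted weights, bias, and the mean loss per epoch."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    bias = 0.0
    history = []
    for _ in range(epochs):
        total = 0.0
        for q, a, y in pairs:
            p = predict_match_probability(weights, bias, q, a)  # prediction step
            total += binary_cross_entropy(p, y)                 # loss step
            grad = p - y  # derivative of BCE w.r.t. the pre-sigmoid score
            for i in range(dim):                                # adjustment step
                weights[i] -= lr * grad * q[i] * a[i]
            bias -= lr * grad
        history.append(total / len(pairs))
    return weights, bias, history


# Hypothetical training data: matching pairs share a direction, mismatches do not.
TOY_PAIRS = [
    ([1.0, 0.0], [1.0, 0.0], 1),
    ([0.0, 1.0], [0.0, 1.0], 1),
    ([1.0, 0.0], [0.0, 1.0], 0),
    ([0.0, 1.0], [1.0, 0.0], 0),
]
```

Calling `train(TOY_PAIRS, dim=2)` returns parameters for which matching pairs score above 0.5 and mismatched pairs below it; in the claimed method the scorer would be the full question-answer pair matching model rather than this toy function.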

2. The method according to claim 1, characterized in that the question-answer pair matching model includes an encoding layer, and the step of inputting the question-answer pair training data into the question-answer pair matching model and predicting the matching degree between the question training data and the answer training data includes: in the encoding layer, encoding the question-answer pair training data using a robustly optimized bidirectional encoder representations model with whole word masking (RoBERTa-wwm) to obtain encoded question data and encoded answer data.

3. The method according to claim 2, characterized in that the question-answer pair matching model further includes a feature enhancement layer, into which the encoded question data and the encoded answer data are input; and the step of inputting the question-answer pair training data into the question-answer pair matching model to predict the matching degree between the question training data and the answer training data further includes: in the feature enhancement layer, performing feature enhancement on the encoded question data and the encoded answer data to obtain enhanced question data and enhanced answer data.

4. The method according to claim 3, characterized in that performing feature enhancement on the encoded question data and the encoded answer data includes: performing feature enhancement on the encoded question data using an online long short-term memory network to obtain the enhanced question data, wherein the online long short-term memory network includes a preset control signal used to guide the activation order of neurons in the online long short-term memory network, and the preset control signal is correlated with the syntactic and semantic relationships of the encoded question data; and performing feature enhancement on the encoded answer data using a skipping long short-term memory network to obtain the enhanced answer data, wherein the skipping long short-term memory network includes a skipping matrix, which is correlated with the semantic relationships and syntactic structure of the encoded answer data.

5. The method according to claim 3 or 4, characterized in that the question-answer pair matching model further includes a cross-attention mechanism layer, into which the enhanced question data and the enhanced answer data are input; and the step of inputting the question-answer pair training data into the question-answer pair matching model to predict the matching degree between the question training data and the answer training data further includes: calculating, using a cross-attention mechanism, the cross-attention of the enhanced question data over the enhanced answer data to obtain the question-answer cross-attention; calculating, using the cross-attention mechanism, the cross-attention of the enhanced answer data over the enhanced question data to obtain the answer-question cross-attention; concatenating the question-answer cross-attention and the answer-question cross-attention to obtain the target attention; and predicting the matching degree between the question training data and the answer training data based on the target attention.
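The bidirectional cross-attention of claim 5 can be sketched as follows. The claim does not specify the attention formula or how sequences of different lengths are concatenated, so this sketch assumes scaled dot-product attention without learned projections and mean pooling of each direction before concatenation; the function names (`softmax`, `cross_attention`, `mean_pool`, `target_attention`) are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention(queries, keys_values):
    """Assumed form: scaled dot-product attention in which every row of
    `queries` attends over the rows of `keys_values` (no learned projections)."""
    dim = len(queries[0])
    attended = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys_values
        ]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, keys_values)) for j in range(dim)]
        )
    return attended


def mean_pool(rows):
    """Average the token vectors into one fixed-size vector (assumption)."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def target_attention(enhanced_question, enhanced_answer):
    """Concatenate the pooled question-answer and answer-question cross-attention."""
    qa = cross_attention(enhanced_question, enhanced_answer)  # question attends to answer
    aq = cross_attention(enhanced_answer, enhanced_question)  # answer attends to question
    return mean_pool(qa) + mean_pool(aq)  # list concatenation, length 2 * dim
```

For token vectors of dimension `dim`, `target_attention` returns a vector of length `2 * dim`, which a downstream classifier could map to the predicted matching probability.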

6. A question-answer pair matching method, characterized in that it comprises: obtaining question data; and inputting the question data into a target question-answer pair matching model for processing and outputting the answer data corresponding to the question data, wherein the target question-answer pair matching model is trained by the method according to any one of claims 1 to 5.

7. A question-answer pair matching model training device, characterized in that it comprises: an acquisition unit, used to acquire question-answer pair training data, which includes question training data and answer training data; a prediction unit, used to input the question-answer pair training data into the question-answer pair matching model, predict the matching degree between the question training data and the answer training data, and output a predicted matching probability, which represents the degree of similarity between the question training data and the answer training data; a calculation unit, used to calculate a target loss function based on the predicted matching probability; and an adjustment unit, used to adjust the model parameters of the question-answer pair matching model according to the target loss function to obtain a target question-answer pair matching model.

8. A computer device, comprising a memory, a processor, and a computer program stored in the memory, characterized in that the processor executes the computer program to implement the steps of the method according to any one of claims 1 to 5 or claim 6.

9. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 5 or claim 6.

10. A computer program product, comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 5 or claim 6.