An aspect-level sentiment triple extraction method based on a diffusion model
By employing a boundary denoising diffusion process based on a diffusion model and a contrastive denoising training strategy, the problem of word generation limitations in generative models is solved, achieving efficient and accurate prediction of multi-word aspect/opinion terms and improving the performance of aspect-level sentiment triple extraction.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- UNIV OF ELECTRONICS SCI & TECH OF CHINA
- Filing Date
- 2024-08-28
- Publication Date
- 2026-06-26
AI Technical Summary
Existing generative models focus only on generating individual words during autoregressive decoding, limiting their ability to utilize holistic semantics when dealing with multi-word aspects/opinion terms.
We adopt a non-autoregressive diffusion model framework, defining the aspect-level sentiment triple extraction task as a boundary denoising diffusion process. By introducing Gaussian noise to simulate boundary uncertainty and introducing a contrastive denoising training strategy, we directly model the boundary index and use comprehensive contextual information for prediction.
It significantly improves the accuracy of aspect and opinion term boundary prediction, enhances the robustness and predictive performance of the model, and reduces the generation of erroneous triples.
Figure CN119168044B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of sentiment analysis technology, specifically to a method for extracting aspect-level sentiment triples based on a diffusion model.
Background Technology
[0002] With the rapid development of social media and online commenting platforms, users express their opinions and emotions online in more diverse ways. Beyond overall sentiment analysis, in-depth mining of fine-grained sentiment information within text has become an important research topic. Against this backdrop, aspect-level sentiment triple extraction has gradually become a key technology in the field of natural language processing. Aspect-level sentiment triple extraction aims to simultaneously identify aspect words, related opinion words, and their corresponding sentiment polarity (triples: aspect word, opinion word, sentiment polarity) from text. For example, in a product review, a user might mention "the screen display is very clear," where "screen" is an aspect word, "clear" is an opinion word, and the overall expression conveys a positive sentiment polarity. By extracting this triple information, businesses can gain a more detailed understanding of users' specific opinions on different product attributes, providing valuable references for product optimization. This technology not only requires the model to recognize aspect words but also to accurately locate related opinion expressions and determine sentiment polarity. Compared to traditional sentiment classification tasks, aspect-level sentiment triple extraction requires text analysis at a fine-grained level, thus presenting greater complexity and challenges.
[0003] Furthermore, existing research often employs autoregressive Seq2Seq generative models. However, this autoregressive approach generates only a single word in each decoding step, which limits the model's ability to utilize holistic semantics when processing multi-word aspect/opinion terms. Therefore, this patent proposes an aspect-level sentiment triple extraction method based on a diffusion model. Using a non-autoregressive diffusion model framework and a contrastive denoising training strategy, it significantly improves the performance of the aspect-level sentiment triple extraction model.
Summary of the Invention
[0004] The purpose of this invention is to address the problem that existing generative models, in their autoregressive decoding phase, focus only on the generation of single words, limiting the model's ability to utilize holistic semantics when handling multi-word aspects / opinion terms. Therefore, this invention, based on a non-autoregressive diffusion model framework, defines the aspect-level sentiment triple extraction task as a boundary denoising diffusion process. It directly models the boundary index, progressively refining and dynamically adjusting the boundary under noise conditions based on comprehensive contextual information. Furthermore, a contrastive denoising training strategy is introduced, which effectively mitigates the subtle variations in repetitive predictions introduced by the diffusion process, thereby solving the problems mentioned in the background section.
[0005] To achieve the above objectives, the present invention provides the following technical solution:
[0006] This invention provides a method for aspect-level sentiment triple extraction based on a diffusion model, comprising the following steps:
[0007] Step S10: Gaussian noise is introduced into the boundary sequence $T_b$ to simulate the uncertainty in identifying term boundaries, yielding a noisy sequence $X_t$;
[0008] Step S20: The denoising network $f_\theta(X_t, S, t)$ takes the noisy boundary sequence $X_t$ and the given sentence $S$ as input and predicts the corresponding boundary sequence;
[0009] Step S30: The sentiment classifier processes the sequence representation through a feedforward neural network (FFN) to output the probability distribution of sentiment categories;
[0010] Step S40: Through a contrastive denoising training strategy, positive and negative samples are generated to improve the accuracy of the refined term boundaries and to strengthen the sentiment classification process by reducing the generation of incorrect triples;
[0011] Step S50: The loss function is defined in terms of a matching loss and a contrastive denoising loss, and the model parameters are updated;
[0012] Step S60: During the inference phase, a noise sequence is randomly sampled from a Gaussian distribution, and iterative denoising is then performed using the learned reverse diffusion process over boundary indices. The prediction probabilities obtained from this denoising process are the probabilities of the boundary indices and of their sentiment polarity.
[0013] In the above scheme, the specific steps of step S10 are as follows:
[0014] Step S101, the boundary index forward diffusion stage: Gaussian noise is gradually introduced into the boundary sequence to simulate the inherent uncertainty in identifying term boundaries. For parallel training, the number of boundary sequences $N$ is standardized to $N_{train}$ by copying, and the normalized sequence is denoted $X_0$. The noisy sequence $X_t$ at any time step $t$ is computed with a single-step Markov transition as follows:
[0015] $$X_t = \sqrt{\bar{\alpha}_t}\, X_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon$$
[0016] where $\bar{\alpha}_t = \prod_{s=0}^{t} \alpha_s$ is the cumulative noise retention ratio, i.e. the product of all $\alpha_s$ from time step 0 to $t$; $\beta_s$ denotes a predefined variance schedule, with $\alpha_s = 1 - \beta_s$ the noise retention ratio at the $s$-th time step; and $\epsilon$ is a random vector whose elements are noise samples drawn from a standard Gaussian distribution;
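The forward diffusion of step S101 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the linear variance schedule, its end points, the value range of the normalized boundaries, and all function names are hypothetical, since the text only says that the schedule is predefined.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear variance schedule; the source only says 'predefined'."""
    step = (beta_end - beta_start) / max(num_steps - 1, 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alpha(betas):
    """alpha_bar[t] = product of alpha_s = (1 - beta_s) for all s <= t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample X_t = sqrt(alpha_bar_t) * X_0 + sqrt(1 - alpha_bar_t) * eps."""
    a = alpha_bars[t]
    eps = [[rng.gauss(0.0, 1.0) for _ in row] for row in x0]
    xt = [[math.sqrt(a) * v + math.sqrt(1.0 - a) * e for v, e in zip(row, eps_row)]
          for row, eps_row in zip(x0, eps)]
    return xt, eps


# Two normalized boundary sequences (a_s, a_e, o_s, o_e); the [-1, 1] range is assumed
x0 = [[-0.8, -0.6, 0.2, 0.4], [0.0, 0.1, -0.5, -0.5]]
alpha_bars = cumulative_alpha(linear_beta_schedule(1000))
xt, eps = forward_diffuse(x0, 500, alpha_bars, random.Random(0))
```

Because the transition is expressed through the cumulative ratio, any time step can be sampled directly from the clean sequence without simulating the intermediate steps.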
[0017] Step S102: starting from the noisy state $X_t$, the reverse diffusion process employs the non-Markovian denoising strategy DDIM to accurately reconstruct the term boundaries. A subsequence $\tau$ of length $\gamma$ is selected from the complete time-step sequence $[1, \ldots, T]$, and the boundary sequence $X_{\tau_{i-1}}$ is refined iteratively from the boundary sequence $\hat{X}_0$ predicted at the previous time step and the predicted noise $\hat{\epsilon}_{\tau_i}$. The iterative refinement uses a trainable denoising network $f_\theta$ conditioned on $S$, as shown below:
[0018] $$\hat{X}_0 = f_\theta(X_{\tau_i}, S, \tau_i)$$
[0019] $$\hat{\epsilon}_{\tau_i} = \frac{X_{\tau_i} - \sqrt{\bar{\alpha}_{\tau_i}}\, \hat{X}_0}{\sqrt{1 - \bar{\alpha}_{\tau_i}}}$$
[0020] where $\bar{\alpha}_{\tau_i}$ denotes the noise retention ratio at time step $\tau_i$; $\hat{X}_0$ denotes the boundary sequence predicted at time step $\tau_i$; $X_{\tau_i}$ denotes the noisy boundary sequence at time step $\tau_i$; and $\hat{\epsilon}_{\tau_i}$ denotes the predicted noise at time step $\tau_i$, determined as the normalized difference between the noisy boundary sequence $X_{\tau_i}$ and the predicted boundary sequence $\hat{X}_0$. Then $\hat{X}_0$ and $\hat{\epsilon}_{\tau_i}$ are combined, each scaled by its respective standard deviation, and the process is repeated iteratively, as shown in the expression:
[0021] $$X_{\tau_{i-1}} = \sqrt{\bar{\alpha}_{\tau_{i-1}}}\, \hat{X}_0 + \sqrt{1 - \bar{\alpha}_{\tau_{i-1}}}\, \hat{\epsilon}_{\tau_i}$$
[0022] where $X_{\tau_{i-1}}$ denotes the estimate of the boundary sequence obtained by the reverse diffusion process at time step $\tau_{i-1}$;
[0023] and $\bar{\alpha}_{\tau_{i-1}}$ denotes the noise retention ratio at time step $\tau_{i-1}$;
[0024] After $\gamma$ iterations of DDIM, the noisy boundary index is gradually refined and converges to the accurate boundary index.
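A minimal Python sketch of the DDIM refinement in step S102 follows. It assumes the deterministic form of DDIM (no extra noise injected per step). The `denoiser` callable is a hypothetical stand-in for the trained network conditioned on the sentence, and the toy cumulative ratios are made up for illustration.

```python
import math


def ddim_step(x_cur, x0_hat, a_cur, a_prev):
    """One deterministic DDIM update from time step tau_i to tau_{i-1}."""
    out = []
    for x, p in zip(x_cur, x0_hat):
        # Predicted noise: normalized difference between noisy and predicted value
        eps_hat = (x - math.sqrt(a_cur) * p) / math.sqrt(1.0 - a_cur)
        # Recombine prediction and noise with the previous step's scales
        out.append(math.sqrt(a_prev) * p + math.sqrt(1.0 - a_prev) * eps_hat)
    return out


def ddim_refine(x_start, taus, alpha_bars, denoiser):
    """Refine a noisy boundary sequence along the decreasing time steps `taus`."""
    x = list(x_start)
    for i, tau in enumerate(taus):
        x0_hat = denoiser(x, tau)  # stand-in for the trained denoising network
        if i + 1 == len(taus):
            return x0_hat          # final prediction of the clean boundaries
        x = ddim_step(x, x0_hat, alpha_bars[tau], alpha_bars[taus[i + 1]])
    return x


# Toy run with an "oracle" denoiser that always returns the true boundaries
truth = [-0.8, -0.6, 0.2, 0.4]
alpha_bars = [0.99, 0.9, 0.6, 0.3, 0.05]   # made-up cumulative retention ratios
noisy = [1.3, -0.2, 0.7, -1.1]
refined = ddim_refine(noisy, [4, 2, 0], alpha_bars, lambda x, t: truth)
```

Skipping from one selected time step to the next is what lets the number of denoising iterations be much smaller than the total number of diffusion steps.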
[0025] In the above scheme, the specific steps of step S20 are as follows:
[0026] Step S201: the encoder transforms the input sentence $S = \{w_1, w_2, \ldots, w_M\}$ of length $M$ into an $h$-dimensional sentence representation $H_s \in \mathbb{R}^{M \times h}$. The encoder is implemented as a pre-trained language model (PLM) followed by a bidirectional LSTM:
[0027] $$H_s = \mathrm{BiLSTM}(\mathrm{BERT}(S))$$
[0028] BERT(*) indicates that the input sentence S is processed by the bidirectional encoder BERT, and BiLSTM indicates a bidirectional long short-term memory network.
[0029] Step S202: given the sentence representation $H_s$, the decoder's task is to obtain the semantic representation of the aspect and opinion terms denoted by the noisy boundary index sequence $X_t$;
[0030] First, the noisy sequence is discretized into word indices by rescaling. Then the sequence representation $H_X$ is obtained by mean pooling over the tokens between the start and end indices of each aspect term and opinion term. The merged representation of the $i$-th sequence within the boundary sequence is calculated as follows:
[0031]
[0032] where the four hidden states are, respectively, those at the start index of the aspect term, the end index of the aspect term, the start index of the opinion term, and the end index of the opinion term; Pooling(*) denotes the pooling operation;
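The discretization and mean pooling described in step S202 might look like the sketch below. Several details are assumptions made for illustration and are not stated in the text: the noisy values are taken to lie in [-1, 1], the rescaling is linear, inverted spans are reordered, and the aspect and opinion vectors are merged by concatenation.

```python
def to_word_index(value, sent_len):
    """Map a noisy value (assumed range [-1, 1]) to a token index in [0, sent_len - 1]."""
    clipped = max(-1.0, min(1.0, value))
    return int(round((clipped + 1.0) / 2.0 * (sent_len - 1)))


def mean_pool(hidden, start, end):
    """Average the hidden vectors over the inclusive token span [start, end]."""
    lo, hi = min(start, end), max(start, end)   # noisy spans may come out inverted
    span = hidden[lo:hi + 1]
    dim = len(hidden[0])
    return [sum(vec[d] for vec in span) / len(span) for d in range(dim)]


def span_representation(hidden, noisy_boundaries):
    """Pool the aspect span and the opinion span, then merge the two vectors."""
    a_s, a_e, o_s, o_e = (to_word_index(v, len(hidden)) for v in noisy_boundaries)
    # Concatenation is an assumption; the source only speaks of a merged representation
    return mean_pool(hidden, a_s, a_e) + mean_pool(hidden, o_s, o_e)


# Five tokens with 2-dimensional hidden states
hidden = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
rep = span_representation(hidden, (-0.97, -0.45, 0.52, 1.3))
```

In this toy input the aspect span resolves to tokens 0 to 1 and the opinion span to tokens 3 to 4, so the out-of-range value 1.3 is simply clipped to the last token.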
[0033] The sequence representation is then decoded using a Transformer decoder that integrates self-attention and cross-attention layers. The self-attention module derives queries, keys, and values from the sequence representation $H_X$, enabling interaction between sequences:
[0034] $$H_{sa} = \mathrm{SelfAttention}(H_X)$$
[0035] where SelfAttention(*) denotes the self-attention mechanism. The cross-attention mechanism then further refines the sequence representation by incorporating the broader semantic context of the sentence: the self-attention output $H_{sa}$ is used as the query, with keys and values derived from the sentence representation $H_s$:
[0036] $$H_{ca} = \mathrm{CrossAttention}(H_{sa}, H_s)$$
[0037] where CrossAttention(*) denotes the cross-attention mechanism. To accommodate the iterative nature of the diffusion process, a sinusoidal code $E_t$ corresponding to each time step $t$ is integrated into the sequence representation, and the final noisy sequence representation is calculated as follows:
[0038]
[0039] Four index pointers are used to predict the boundary indices of the aspect and opinion terms, respectively. For each index $\delta \in \{a_s, a_e, o_s, o_e\}$, a fused representation is generated that combines the noisy sequence representation with the sentence representation, and each word index is assigned a probability of being the term boundary, as follows:
[0040]
[0041]
[0042] where the projection weight is a learnable matrix, FFN(*) denotes a feedforward network, and the type code is used to distinguish aspect boundaries from opinion boundaries.
[0043] In the above scheme, the specific steps of step S30 are as follows:
[0044] Step S301: the sentiment classifier processes the sequence representation using an FFN and outputs the probability distribution over sentiment categories, expressed as:
[0045]
[0046] where $C$ denotes the total number of sentiment polarity categories.
[0047] In the above scheme, the contrastive denoising training strategy in step S40 is as follows:
[0048] Two types of samples, positive and negative, are generated by adding noise at two different scales, $\lambda_1$ and $\lambda_2$ with $\lambda_1 < \lambda_2$, to the $N_{train}$ ground-truth values. After reverse diffusion processing, the decoder takes both types of samples as input: positive samples have a noise scale smaller than $\lambda_1$ and are used to reconstruct their corresponding ground-truth values; negative samples have a noise scale larger than $\lambda_1$ and smaller than $\lambda_2$ and are used to predict the sentiment category "Invalid" (an invalid triple, in which the aspect word and opinion word do not match), denoted $\varepsilon$. If a sentence has $N_{train}$ ground-truth values, contrastive denoising training uses $2 \times N_{train}$ samples, with one positive sample and one negative sample generated for each ground-truth value;
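The construction of positive and negative samples can be illustrated with a small Python sketch. The noise model is an assumption: a uniformly drawn magnitude with a random sign is applied to each boundary value, whereas the text only fixes the two scales. The function name and the data layout are likewise hypothetical.

```python
import random


def make_contrastive_samples(truths, lam1, lam2, rng):
    """Build one positive and one negative sample per ground-truth triple.

    truths: list of (boundaries, polarity), boundaries = (a_s, a_e, o_s, o_e)
    given as normalized values. Returns a list of (noisy_boundaries, target_label).
    """
    if not 0.0 < lam1 < lam2:
        raise ValueError("expected 0 < lam1 < lam2")
    samples = []
    for boundaries, polarity in truths:
        # Positive: perturbation magnitude below lam1
        pos = [v + rng.choice((-1.0, 1.0)) * lam1 * rng.random() for v in boundaries]
        # Negative: perturbation magnitude in [lam1, lam2)
        neg = [v + rng.choice((-1.0, 1.0)) * (lam1 + (lam2 - lam1) * rng.random())
               for v in boundaries]
        samples.append((pos, polarity))    # should reconstruct the ground truth
        samples.append((neg, "Invalid"))   # should be rejected as an invalid triple
    return samples


truths = [([0.1, 0.2, 0.5, 0.6], "POS"), ([-0.3, -0.3, 0.0, 0.1], "NEG")]
samples = make_contrastive_samples(truths, 0.05, 0.2, random.Random(0))
```

Each ground truth thus contributes a near-miss that must still be recovered and a slightly farther miss that must be rejected, which is what pushes the decoder to separate correct boundaries from close but wrong ones.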
[0049] The positive sample $X^{+}$ and the negative sample $X^{-}$ are fed into the decoder, which first applies self-attention and cross-attention:
[0050] $$H_{sa}^{+} = \mathrm{SelfAttention}(X^{+})$$
[0051] $$H_{sa}^{-} = \mathrm{SelfAttention}(X^{-})$$
[0052] $$H_{ca}^{+} = \mathrm{CrossAttention}(H_{sa}^{+}, H_s)$$
[0053] $$H_{ca}^{-} = \mathrm{CrossAttention}(H_{sa}^{-}, H_s)$$
[0054] where SelfAttention(*) denotes the self-attention mechanism, CrossAttention(*) denotes the cross-attention mechanism, and $H_s$ is the sentence representation. To accommodate the iterative nature of the diffusion process, a sinusoidal code $E_t$ corresponding to each time step $t$ is integrated into the sequence representation, and the final noisy sequence representations of the positive and negative samples are calculated as follows:
[0055]
[0056]
[0057] Four index pointers are used to predict the boundary indices of the aspect and opinion terms, respectively. For each index $\delta \in \{a_s, a_e, o_s, o_e\}$, a fused representation is generated that combines the noisy sequence representation with the sentence representation, giving the boundary probability of the positive samples:
[0058]
[0059]
[0060] where the projection weight is a learnable matrix, FFN(*) denotes a feedforward network, and the type code is used to distinguish aspect boundaries from opinion boundaries.
[0061] The classification probabilities of the positive and negative samples are calculated by the sentiment classifier and are expressed, respectively, as:
[0062]
[0063]
[0064] where $C$ denotes the total number of sentiment polarity categories.
[0065] In the above scheme, the specific steps of step S50 are as follows:
[0066] Step S501: the training loss function comprises a matching loss and a contrastive denoising loss;
[0067] Matching loss: given the $N_{train}$ predictions and the corresponding $N_{train}$ expanded ground-truth values, the Hungarian algorithm is used to establish the best match between the set of predictions and the set of ground-truth values, which assigns a ground truth to the $i$-th noisy sequence. The matching loss comprises a boundary loss and a sentiment classification loss. The reverse process is then trained by maximizing the probability of the matched ground truth:
[0068]
[0069] $\delta \in \{a_s, a_e, o_s, o_e\}$ ranges over the boundary indices of the aspect term and the opinion term, where $a_s$ and $a_e$ are the start and end indices of the aspect term, and $o_s$ and $o_e$ are the start and end indices of the opinion term;
[0070] The boundary term is the probability that the model assigns, for the $i$-th noisy sequence, to the true boundary index determined by the matching; it is read from the probability distribution over the boundary index $\delta$ that the model predicts for the $i$-th noisy sequence;
[0071] The classification term is the probability that the model assigns, for the $i$-th noisy sequence, to the true sentiment category determined by the matching; it is read from the probability distribution over sentiment categories that the model predicts for the $i$-th noisy sequence;
Contrastive denoising loss: the contrastive loss comprises a boundary loss and a sentiment classification loss. Specifically, the boundary loss is calculated only from the boundary probabilities of the positive samples, while the classification loss is calculated from the classification probabilities of both the positive and the negative samples. The contrastive loss is therefore calculated as follows:
[0072]
[0073] The boundary term is the probability that the $i$-th positive sample assigns to its true boundary index for the boundary index $\delta$;
[0074] The positive classification term is the probability that the $i$-th positive sample assigns to its true sentiment category among the sentiment categories $c$;
[0075] The negative classification term is the probability that the $i$-th negative sample belongs to the "Invalid" category.
[0076] The matching loss and the contrastive denoising loss are optimized jointly, and the overall training loss can be expressed as:
[0077]
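The matching step of the matching loss can be illustrated as follows. This sketch does not implement the Hungarian algorithm itself: it finds the same minimum-cost one-to-one assignment by exhaustive search, which is only practical for very small sets. The cost function (negative log-probability of the ground truth under a prediction) and the dictionary layout are assumptions made for the example.

```python
import math
from itertools import permutations

KEYS = ("a_s", "a_e", "o_s", "o_e")


def match_cost(pred, truth):
    """Negative log-probability of one ground truth under one prediction."""
    cost = -sum(math.log(pred[k][truth[k]]) for k in KEYS)   # four boundary pointers
    return cost - math.log(pred["cls"][truth["cls"]])        # sentiment category


def best_matching(preds, truths):
    """Exhaustive search for the minimum-cost assignment (stand-in for Hungarian)."""
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(len(truths))):
        cost = sum(match_cost(preds[i], truths[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return list(best_perm), best_cost


# Two predictions over a 3-word sentence with 3 sentiment categories
preds = [
    {"a_s": [0.8, 0.1, 0.1], "a_e": [0.8, 0.1, 0.1],
     "o_s": [0.1, 0.1, 0.8], "o_e": [0.1, 0.1, 0.8], "cls": [0.7, 0.2, 0.1]},
    {"a_s": [0.1, 0.8, 0.1], "a_e": [0.1, 0.8, 0.1],
     "o_s": [0.8, 0.1, 0.1], "o_e": [0.8, 0.1, 0.1], "cls": [0.1, 0.8, 0.1]},
]
truths = [
    {"a_s": 1, "a_e": 1, "o_s": 0, "o_e": 0, "cls": 1},
    {"a_s": 0, "a_e": 0, "o_s": 2, "o_e": 2, "cls": 0},
]
assignment, total_cost = best_matching(preds, truths)   # assignment[i] = matched truth
```

With the assignment fixed, the total cost is the quantity that training would minimize, which is equivalent to maximizing the probability of the matched ground truths.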
[0078] In the above scheme, the specific steps of step S60 are as follows:
[0079] Step S601: in the inference phase, $N_{eval}$ noisy sequences are randomly sampled from a Gaussian distribution and then iteratively denoised along the denoising time steps $\tau$ using the learned reverse diffusion process over boundary indices. The predicted probabilities derived from this denoising process are the boundary index probabilities and the sentiment category probabilities.
[0080] Using the boundary index probabilities and the sentiment category probabilities, the model decodes $N_{eval}$ candidate sentiment triples. After decoding, the following post-processing steps are performed:
[0081] Deduplication and filtering;
[0082] For triples with the same term boundary indices, the triple with the highest polarity probability is retained.
[0083] Furthermore, triples whose cumulative sum of predicted probabilities is below a threshold are filtered out.
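The two post-processing rules can be sketched as below. The `Candidate` record, its field names, and the threshold value are hypothetical; the text specifies only the two rules themselves.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    boundaries: tuple      # (a_s, a_e, o_s, o_e) as word indices
    polarity: str
    polarity_prob: float
    score: float           # cumulative sum of predicted probabilities


def postprocess(candidates, threshold):
    """Deduplicate by boundaries, then drop low-scoring triples."""
    best = {}
    for cand in candidates:
        kept = best.get(cand.boundaries)
        # Same boundaries: keep the triple with the highest polarity probability
        if kept is None or cand.polarity_prob > kept.polarity_prob:
            best[cand.boundaries] = cand
    # Filter out triples whose cumulative probability is below the threshold
    return [c for c in best.values() if c.score >= threshold]


candidates = [
    Candidate((0, 0, 3, 3), "POS", 0.9, 4.5),
    Candidate((0, 0, 3, 3), "NEG", 0.4, 3.9),   # duplicate boundaries, lower polarity
    Candidate((5, 6, 8, 8), "NEU", 0.5, 2.0),   # below the threshold
]
triples = postprocess(candidates, threshold=3.0)
```

Deduplication is needed because several of the sampled noisy sequences typically converge to the same boundaries during denoising.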
[0084] The beneficial effects of this invention are:
[0085] Existing generative models focus only on single-word generation during autoregressive decoding, which restricts their ability to utilize holistic semantics when handling multi-word aspect/opinion terms. To address this limitation, this invention, based on a non-autoregressive diffusion model framework, defines the aspect-level sentiment triple extraction task as a boundary denoising diffusion process. Gaussian noise is injected into the boundaries of aspect/opinion terms through a forward diffusion process, systematically introducing uncertainty. Unlike the traditional word-by-word generation paradigm, directly modeling boundary indices allows all boundary indices to be predicted in a single step, avoiding word-by-word generation and making full use of comprehensive contextual information during prediction. By progressively refining and dynamically adjusting the boundaries under noisy conditions, this invention significantly improves the accuracy of aspect and opinion term boundary prediction. To further enhance the model's robustness and prediction performance, this invention introduces a contrastive denoising training strategy. By denoising positive and negative samples, this strategy further trains the decoder and effectively mitigates the subtle variations in repeated predictions introduced by the diffusion process, thereby solving the multi-word term prediction problem mentioned in the background and improving the performance of aspect-level sentiment triple extraction.
Attached Figure Description
[0086] Figure 1 is a flowchart of the method steps of the present invention.
Detailed Implementation
[0087] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0088] Referring to Figure 1, the present invention provides the following technical solution:
[0089] An aspect-level sentiment triple extraction method based on a diffusion model. The method is illustrated using aspect-level sentiment triple extraction from restaurant reviews as an example; the specific steps for applying it to restaurant reviews are as follows:
[0090] Step 1: In the forward diffusion stage, Gaussian noise is gradually added to the boundary indices of the sequences. Conversely, the reverse diffusion process aims to gradually recover the original boundary indices;
[0091] Step 1-1, the boundary index forward diffusion stage: Gaussian noise is gradually introduced into the boundary sequence to simulate the inherent uncertainty in identifying term boundaries. For parallel training, the number of boundary sequences $N$ is standardized to $N_{train}$ by copying, and the normalized sequence is denoted $X_0$. The noisy sequence $X_t$ at any time step $t$ is computed with a single-step Markov transition as follows:
[0092] $$X_t = \sqrt{\bar{\alpha}_t}\, X_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon$$
[0093] where $\bar{\alpha}_t = \prod_{s=0}^{t} \alpha_s$ is the cumulative noise retention ratio, i.e. the product of all $\alpha_s$ from time step 0 to $t$; $\beta_s$ denotes a predefined variance schedule, with $\alpha_s = 1 - \beta_s$ the noise retention ratio at the $s$-th time step; and $\epsilon$ is a random vector whose elements are noise samples drawn from a standard Gaussian distribution;
[0094] Step 1-2: starting from the noisy state $X_t$, the reverse diffusion process employs the non-Markovian denoising strategy DDIM to accurately reconstruct the term boundaries. A subsequence $\tau$ of length $\gamma$ is selected from the complete time-step sequence $[1, \ldots, T]$, and the boundary sequence $X_{\tau_{i-1}}$ is refined iteratively from the boundary sequence $\hat{X}_0$ predicted at the previous time step and the predicted noise $\hat{\epsilon}_{\tau_i}$. The iterative refinement uses a trainable denoising network $f_\theta$ conditioned on $S$, as shown below:
[0095] $$\hat{X}_0 = f_\theta(X_{\tau_i}, S, \tau_i)$$
[0096] $$\hat{\epsilon}_{\tau_i} = \frac{X_{\tau_i} - \sqrt{\bar{\alpha}_{\tau_i}}\, \hat{X}_0}{\sqrt{1 - \bar{\alpha}_{\tau_i}}}$$
[0097] where $\bar{\alpha}_{\tau_i}$ denotes the noise retention ratio at time step $\tau_i$; $\hat{X}_0$ denotes the boundary sequence predicted at time step $\tau_i$; $X_{\tau_i}$ denotes the noisy boundary sequence at time step $\tau_i$; and $\hat{\epsilon}_{\tau_i}$ denotes the predicted noise at time step $\tau_i$, determined as the normalized difference between the noisy boundary sequence $X_{\tau_i}$ and the predicted boundary sequence $\hat{X}_0$. Then $\hat{X}_0$ and $\hat{\epsilon}_{\tau_i}$ are combined, each scaled by its respective standard deviation, and the process is repeated iteratively, as shown in the expression:
[0098] $$X_{\tau_{i-1}} = \sqrt{\bar{\alpha}_{\tau_{i-1}}}\, \hat{X}_0 + \sqrt{1 - \bar{\alpha}_{\tau_{i-1}}}\, \hat{\epsilon}_{\tau_i}$$
[0099] where $X_{\tau_{i-1}}$ denotes the estimate of the boundary sequence obtained by the reverse diffusion process at time step $\tau_{i-1}$;
[0100] and $\bar{\alpha}_{\tau_{i-1}}$ denotes the noise retention ratio at time step $\tau_{i-1}$;
[0101] After $\gamma$ iterations of DDIM, the noisy boundary index is gradually refined and converges to the accurate boundary index.
[0102] Step 2: The denoising network $f_\theta(X_t, S, t)$ takes the noisy boundary sequence $X_t$ and the given sentence $S$ as input and predicts the corresponding boundary sequence.
[0103] Step 2-1: the encoder transforms the input sentence $S = \{w_1, w_2, \ldots, w_M\}$ of length $M$ into an $h$-dimensional sentence representation $H_s \in \mathbb{R}^{M \times h}$. The encoder is implemented as a pre-trained language model (PLM) followed by a bidirectional LSTM:
[0104] $$H_s = \mathrm{BiLSTM}(\mathrm{BERT}(S))$$
[0105] BERT(*) indicates that the input sentence S is processed by the bidirectional encoder BERT, and BiLSTM indicates a bidirectional long short-term memory network.
[0106] Step 2-2: given the sentence representation $H_s$, the decoder's task is to obtain the semantic representation of the aspect and opinion terms denoted by the noisy boundary index sequence $X_t$;
[0107] First, the noisy sequence is discretized into word indices by rescaling. Then the sequence representation $H_X$ is obtained by mean pooling over the tokens between the start and end indices of each aspect term and opinion term. The merged representation of the $i$-th sequence within the boundary sequence is calculated as follows:
[0108]
[0109] where the four hidden states are, respectively, those at the start index of the aspect term, the end index of the aspect term, the start index of the opinion term, and the end index of the opinion term; Pooling(*) denotes the pooling operation;
[0110] The sequence representation is then decoded using a Transformer decoder that integrates self-attention and cross-attention layers. The self-attention module derives queries, keys, and values from the sequence representation $H_X$, enabling interaction between sequences:
[0111] $$H_{sa} = \mathrm{SelfAttention}(H_X)$$
[0112] where SelfAttention(*) denotes the self-attention mechanism. The cross-attention mechanism then further refines the sequence representation by incorporating the broader semantic context of the sentence: the self-attention output $H_{sa}$ is used as the query, with keys and values derived from the sentence representation $H_s$:
[0113] $$H_{ca} = \mathrm{CrossAttention}(H_{sa}, H_s)$$
[0114] where CrossAttention(*) denotes the cross-attention mechanism. To accommodate the iterative nature of the diffusion process, a sinusoidal code $E_t$ corresponding to each time step $t$ is integrated into the sequence representation, and the final noisy sequence representation is calculated as follows:
[0115]
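A sinusoidal time-step code of the kind mentioned above can be sketched as follows. The frequency layout (sine half followed by cosine half, base period 10000) follows the common Transformer convention and is an assumption, as is integrating the code by simple addition; neither detail is fixed by the text.

```python
import math


def sinusoidal_code(t, dim, max_period=10000.0):
    """Return a dim-sized sinusoidal code for time step t (dim must be even)."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * k / half) for k in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]


def add_time_code(rows, t):
    """Add the code for time step t to every sequence representation (assumed fusion)."""
    code = sinusoidal_code(t, len(rows[0]))
    return [[v + c for v, c in zip(row, code)] for row in rows]


# Two sequence representations of width 4, conditioned on time steps 0 and 50
h_ca = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
h_t0 = add_time_code(h_ca, 0)
h_t50 = add_time_code(h_ca, 50)
```

Conditioning on the time step lets a single decoder behave differently at heavily noised and nearly clean stages of the refinement.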
[0116] Four index pointers are used to predict the boundary indices of the aspect and opinion terms, respectively. For each index $\delta \in \{a_s, a_e, o_s, o_e\}$, a fused representation is generated that combines the noisy sequence representation with the sentence representation, and each word index is assigned a probability of being the term boundary, as follows:
[0117]
[0118]
[0119] where the projection weight is a learnable matrix, FFN(*) denotes a feedforward network, and the type code is used to distinguish aspect boundaries from opinion boundaries.
[0120] Step 3: The sentiment classifier processes the sequence representation through a feedforward neural network (FFN) to output the probability distribution of sentiment categories;
[0121] Step 3-1: the sentiment classifier processes the sequence representation using an FFN and outputs the probability distribution over sentiment categories, expressed as:
[0122]
[0123] where $C$ denotes the total number of sentiment polarity categories.
[0124] Step 4: Through the contrastive denoising training strategy, the accuracy of the refined term boundaries is further improved, and the sentiment classification process is strengthened by reducing the generation of incorrect triples;
[0125] Step 4-1: the diffusion process introduces a degree of uncertainty, leading to repeated predictions around the initially predicted boundary indices. This allows the model to flexibly explore various possible start or end points of terms. However, while this increased uncertainty helps handle multi-word terms, it also carries the risk of incorrect boundary index predictions due to subtle variations. To further improve the accuracy of the refined term boundaries and strengthen the sentiment classification process by reducing the generation of incorrect triples, a contrastive denoising training strategy is introduced during the training phase.
[0126] Two types of samples, positive and negative, are generated by adding noise at two different scales, $\lambda_1$ and $\lambda_2$ with $\lambda_1 < \lambda_2$, to the $N_{train}$ ground-truth values. After reverse diffusion processing, the decoder takes both types of samples as input: positive samples have a noise scale smaller than $\lambda_1$ and are used to reconstruct their corresponding ground-truth values; negative samples have a noise scale larger than $\lambda_1$ and smaller than $\lambda_2$ and are used to predict "Invalid", denoted $\varepsilon$. If a sentence has $N_{train}$ ground-truth values, contrastive denoising training uses $2 \times N_{train}$ samples, with one positive sample and one negative sample generated for each ground-truth value;
[0127] Following the same calculation process as in Steps 2 and 3, the boundary probabilities of the positive samples and the classification probabilities of the positive and negative samples are obtained.
[0128] Step 5: The loss function is defined in terms of a matching loss and a contrastive denoising loss, and the model parameters are updated;
[0129] Step 5-1: the training loss function comprises a matching loss and a contrastive denoising loss;
[0130] Matching loss: given the $N_{train}$ predictions and the corresponding $N_{train}$ expanded ground-truth values, the Hungarian algorithm is used to establish the best match between the set of predictions (boundary probabilities and sentiment classification probabilities) and the set of ground-truth values, which assigns a ground truth to the $i$-th noisy sequence. The matching loss comprises a boundary loss and a sentiment classification loss. The reverse process is then trained by maximizing the probability of the matched ground truth:
[0131]
[0132] $\delta \in \{a_s, a_e, o_s, o_e\}$ ranges over the boundary indices of the aspect term and the opinion term, where $a_s$ and $a_e$ are the start and end indices of the aspect term, and $o_s$ and $o_e$ are the start and end indices of the opinion term;
[0133] The boundary term is the probability that the model assigns, for the $i$-th noisy sequence, to the true boundary index determined by the matching; it is read from the probability distribution over the boundary index $\delta$ that the model predicts for the $i$-th noisy sequence;
[0134] The classification term is the probability that the model assigns, for the $i$-th noisy sequence, to the true sentiment category determined by the matching; it is read from the probability distribution over sentiment categories that the model predicts for the $i$-th noisy sequence;
Contrastive denoising loss: the contrastive loss comprises a boundary loss and a sentiment classification loss. Specifically, the boundary loss is calculated only from the boundary probabilities of the positive samples, while the classification loss is calculated from the classification probabilities of both the positive and the negative samples. The contrastive loss is therefore calculated as follows:
[0135]
[0136] The boundary term is the probability that the $i$-th positive sample assigns to its true boundary index for the boundary index $\delta$;
[0137] The positive classification term is the probability that the $i$-th positive sample assigns to its true sentiment category among the sentiment categories $c$;
[0138] The negative classification term is the probability that the $i$-th negative sample belongs to the "Invalid" category.
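The structure of the contrastive denoising loss described above can be sketched in Python. The sketch assumes a plain negative log-likelihood with equal weight on every term; any weighting or normalization used in the actual method is not stated in the text, and the function name and argument layout are hypothetical.

```python
import math


def contrastive_denoising_loss(pos_boundary_probs, pos_class_probs, neg_invalid_probs):
    """Unweighted negative log-likelihood over the three kinds of terms.

    pos_boundary_probs: per positive sample, the four probabilities assigned to
                        its true boundary indices (a_s, a_e, o_s, o_e)
    pos_class_probs:    per positive sample, probability of its true sentiment
    neg_invalid_probs:  per negative sample, probability of the "Invalid" class
    """
    loss = 0.0
    for boundary_probs, class_prob in zip(pos_boundary_probs, pos_class_probs):
        loss -= sum(math.log(p) for p in boundary_probs)   # boundary loss (positives only)
        loss -= math.log(class_prob)                        # classification, positive sample
    for invalid_prob in neg_invalid_probs:
        loss -= math.log(invalid_prob)                      # classification, negative sample
    return loss


loss = contrastive_denoising_loss([[0.5, 0.5, 0.5, 0.5]], [0.5], [0.5])
```

Negative samples contribute no boundary term because they have no valid boundaries to reconstruct; they only teach the classifier to output "Invalid".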
[0139] The matching loss and the contrastive denoising loss are optimized jointly, and the overall training loss can be expressed as:
[0140]
[0141] Step 6: In the inference phase, the process begins by randomly sampling a noise sequence from a Gaussian distribution. This is followed by iterative denoising using a backdiffusion process based on the learned boundary indices. The predicted probabilities derived from this denoising process correspond to the likelihoods associated with the boundary indices and their sentiment polarities.
[0142] Step 6-1, Step S601, In the inference phase, randomly sample N from the Gaussian distribution. eval Starting with a noisy sequence, it then iteratively denoises based on a denoising time step τ using a back-diffusion process with the learned boundary index. The predicted probability derived from this denoising process corresponds to the boundary index probability and the sentiment category probability.
[0143] Using the boundary index probabilities and the sentiment category probabilities, the model decodes $N_{eval}$ candidate sentiment triples. After decoding, the following post-processing is performed:
[0144] Deduplication and filtering;
[0145] Among triples with the same term boundary indices, only the triple with the highest polarity probability is retained.
[0146] In addition, triples whose cumulative sum of predicted probabilities falls below a threshold are filtered out.
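The deduplication and filtering steps above can be sketched as follows. The candidate layout (a dictionary with `boundary`, `polarity`, `boundary_probs` and `polarity_prob` keys) and the function name are hypothetical, and the sketch assumes that the "cumulative sum of predicted probabilities" is the sum of the four boundary probabilities and the polarity probability.

```python
def postprocess(candidates, threshold):
    """Deduplicate candidate triples, then drop low-confidence ones.

    Each candidate is a dict with:
      "boundary"       : (a_s, a_e, o_s, o_e) token indices
      "polarity"       : predicted sentiment label
      "boundary_probs" : probabilities of the four boundary indices
      "polarity_prob"  : probability of the predicted polarity
    """
    # Deduplication: keep the highest-polarity-probability triple per boundary.
    best_by_boundary = {}
    for cand in candidates:
        key = tuple(cand["boundary"])
        kept = best_by_boundary.get(key)
        if kept is None or cand["polarity_prob"] > kept["polarity_prob"]:
            best_by_boundary[key] = cand

    # Filtering: drop triples whose summed probabilities fall below the threshold.
    results = []
    for cand in best_by_boundary.values():
        score = sum(cand["boundary_probs"]) + cand["polarity_prob"]
        if score >= threshold:
            results.append(cand)
    return results
```

The threshold value is left to the caller because the text does not state it.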
[0147] An aspect-level sentiment triple extraction method based on a diffusion model comprises a boundary forward diffusion module, a boundary denoising network module, and a contrastive denoising training module, specifically:
[0148] In the boundary forward diffusion stage, Gaussian noise is gradually added to the boundary indices of the sequences to produce boundary indices in a noisy state.
[0149] The denoising network $f_\theta(x_t, S, t)$ takes the noisy boundary sequence and the sentence representation as input and predicts the corresponding term boundaries and sentiment polarities.
[0150] The contrastive denoising training strategy further improves the accuracy of term boundary refinement and strengthens sentiment classification by reducing the generation of erroneous triples, thereby improving the performance of aspect-level sentiment triple extraction.
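As an illustration of the boundary forward diffusion module listed above, the sketch below adds Gaussian noise to a normalized boundary sequence in closed form, $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$. The function names, the 0-based time index and the example variance values are assumptions made for this sketch only.

```python
import math
import random


def cumulative_retention(betas):
    """Running product of (1 - beta_i): the cumulative noise retention ratio."""
    alpha_bar, product = [], 1.0
    for beta in betas:
        product *= 1.0 - beta
        alpha_bar.append(product)
    return alpha_bar


def forward_diffuse(x0, t, alpha_bar, rng):
    """Return the noisy boundary sequence at (0-based) time step t.

    x0  : normalized boundary values, e.g. [a_s, a_e, o_s, o_e] scaled to [0, 1]
    rng : a random.Random instance, so that sampling is reproducible
    """
    keep = math.sqrt(alpha_bar[t])            # share of the clean signal kept
    spread = math.sqrt(1.0 - alpha_bar[t])    # standard deviation of added noise
    return [keep * value + spread * rng.gauss(0.0, 1.0) for value in x0]


# Example: noise grows as t increases because alpha_bar shrinks.
alpha_bar = cumulative_retention([0.1, 0.2, 0.5])
noisy = forward_diffuse([0.0, 0.25, 0.5, 0.75], 2, alpha_bar, random.Random(0))
```

Because the noisy sequence at any step is obtained directly from the clean sequence, training can sample a time step at random instead of simulating every intermediate step.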
[0151] The method of this invention addresses aspect-level sentiment triple extraction. Based on a non-autoregressive diffusion model framework, it defines the aspect-level sentiment triple extraction task as a boundary denoising diffusion process: boundary indices are modeled directly, and the boundaries are progressively refined and dynamically adjusted under noisy conditions using comprehensive contextual information. In addition, a contrastive denoising training strategy is introduced, which mitigates the repeated predictions with subtle variations that the diffusion process introduces, thereby improving the performance of aspect-level sentiment triple extraction.
[0152] Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing embodiments or make equivalent substitutions for some of the technical features. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.
Claims
1. A method for extracting aspect-level sentiment triples based on a diffusion model, characterized in that it comprises the following steps:
Step S10: Gaussian noise is introduced into the boundary sequence $x_0$ to simulate the uncertainty in identifying term boundaries, yielding a noise sequence $x_t$;
Step S20: the denoising network $f_\theta$ takes the noisy boundary sequence $x_t$ and the sentence $S$ as input and predicts the corresponding boundary sequence $\hat{x}_0$;
Step S30: the sentiment classifier processes the sequence representation through a feedforward neural network (FFN) and outputs the probability distribution over sentiment categories;
Step S40: through the contrastive denoising training strategy, positive and negative samples are generated to improve the accuracy of term boundary refinement and to strengthen sentiment classification by reducing the generation of erroneous triples; noise at two different scales, $\lambda_1$ and $\lambda_2$, is added to the true values to generate two types of samples, positive samples $x_{pos}$ and negative samples $x_{neg}$, where $\lambda_1 < \lambda_2$; in the reverse diffusion process, the decoder takes these two additional types of samples as input; the positive samples have a noise scale smaller than $\lambda_1$ and are used to reconstruct their corresponding true values; the negative samples have a noise scale larger than $\lambda_1$ and smaller than $\lambda_2$ and are used to predict the sentiment category "Invalid", which indicates that the aspect term and the opinion term do not match; if a sentence has $K$ true values, contrastive denoising training uses $2K$ samples, one positive sample and one negative sample being generated for each true value; the boundary probabilities $P^{\delta}_{pos}$ of the positive samples are obtained, and the classification probabilities of the positive and negative samples are denoted $P^{c}_{pos}$ and $P^{c}_{neg}$, respectively;
Step S50: the loss function is defined as a matching loss and a contrastive denoising loss, and the model parameters are updated; the contrastive denoising loss comprises a boundary loss and a sentiment classification loss, where the boundary loss is computed only from the boundary probabilities $P^{\delta}_{pos}$ of the positive samples and the classification loss is computed from the classification probabilities $P^{c}_{pos}$ and $P^{c}_{neg}$ of the positive and negative samples, respectively;
Step S60: in the inference phase, a noise sequence is randomly sampled from a Gaussian distribution and then iteratively denoised by the learned boundary-index reverse diffusion process; the predicted probabilities obtained from this denoising are the probabilities of the boundary indices and of their sentiment polarities.
2. The aspect-level sentiment triple extraction method based on a diffusion model according to claim 1, characterized in that the specific steps of step S10 are as follows:
Step S101: in the boundary index forward diffusion stage, Gaussian noise is gradually introduced into the boundary sequence to simulate the inherent uncertainty in identifying term boundaries. For parallel training, the number of boundary sequences is normalized to $N$ by replication, and the normalized sequence is denoted $x_0$. At any given time step $t$, the noisy boundary sequence $x_t$ is computed with a single-step Markov transition as follows:
$$x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1 - \bar{\alpha}_t}\,\epsilon$$
where $\bar{\alpha}_t = \prod_{i=0}^{t} \alpha_i$ is the cumulative noise retention ratio, that is, the product of all $\alpha_i$ from time step 0 to $t$; $\alpha_i = 1 - \beta_i$ is the noise retention ratio at the i-th time step; $\beta_i$ is a predefined variance schedule; and $\epsilon \sim \mathcal{N}(0, I)$ is a random vector whose elements are each sampled from a standard Gaussian distribution;
Step S102: starting from the noisy boundary sequence $x_T$, the reverse diffusion process employs the non-Markovian denoising strategy DDIM to accurately reconstruct the term boundaries. DDIM selects a subsequence $\tau$ of length $\gamma$ from the complete time-step sequence $[1, \dots, T]$ and iteratively refines the noisy boundary sequence $x_{\tau_i}$ using the boundary sequence $\hat{x}_0$ and the noise $\hat{\epsilon}_{\tau_i}$ predicted at the previous time step. The iterative refinement uses the conditional trainable denoising network $f_\theta$, as follows:
$$\hat{x}_0 = f_\theta(x_{\tau_i}, S, \tau_i)$$
$$\hat{\epsilon}_{\tau_i} = \frac{x_{\tau_i} - \sqrt{\bar{\alpha}_{\tau_i}}\,\hat{x}_0}{\sqrt{1 - \bar{\alpha}_{\tau_i}}}$$
where $\bar{\alpha}_{\tau_i}$ is the cumulative noise retention ratio at time step $\tau_i$; $\hat{x}_0$ is the boundary sequence predicted at time step $\tau_i$; $x_{\tau_i}$ is the noisy boundary sequence at time step $\tau_i$; and $\hat{\epsilon}_{\tau_i}$ is the noise predicted at time step $\tau_i$, determined as the normalized difference between the noisy boundary sequence $x_{\tau_i}$ and the predicted boundary sequence $\hat{x}_0$. Then $\hat{x}_0$ and $\hat{\epsilon}_{\tau_i}$ are combined, each scaled by its respective standard deviation, and this process is repeated iteratively:
$$x_{\tau_{i-1}} = \sqrt{\bar{\alpha}_{\tau_{i-1}}}\,\hat{x}_0 + \sqrt{1 - \bar{\alpha}_{\tau_{i-1}}}\,\hat{\epsilon}_{\tau_i}$$
where $x_{\tau_{i-1}}$ is the estimate of the boundary sequence at time step $\tau_{i-1}$ obtained during reverse diffusion, and $\bar{\alpha}_{\tau_{i-1}}$ is the cumulative noise retention ratio at time step $\tau_{i-1}$. After $\gamma$ DDIM iterations, the noisy boundary indices are gradually refined and converge to the accurate boundary indices.
3. The aspect-level sentiment triple extraction method based on a diffusion model according to claim 2, characterized in that the specific steps of step S20 are as follows:
Step S201: the encoder converts the input sentence $S$ of length $M$ into a sentence representation $H_S \in \mathbb{R}^{M \times h}$ of dimension $h$, and is implemented as a pre-trained language model (PLM) followed by a bidirectional LSTM:
$$H_S = \mathrm{BiLSTM}(\mathrm{BERT}(S))$$
where $\mathrm{BERT}(\cdot)$ denotes processing the input sentence with Bidirectional Encoder Representations from Transformers (BERT), and $\mathrm{BiLSTM}(\cdot)$ denotes a bidirectional long short-term memory network;
Step S202: the decoder takes the sentence representation $H_S$ and the noisy boundary index sequence $x_t$ and derives the semantic representation of the aspects and opinions that the noisy boundary index sequence denotes. First, the noisy sequence is discretized into word indices by rescaling. The sequence representation $H_X$ is then computed by mean pooling between the start and end indices specified for the aspect and opinion terms, each $H_X^{i}$ being the combined representation of the i-th sequence in the boundary sequence, where $H_S^{a_s}$ is the hidden state at the start index $a_s$ of the aspect term; $H_S^{a_e}$ is the hidden state at the end index $a_e$ of the aspect term; $H_S^{o_s}$ is the hidden state at the start index $o_s$ of the opinion term; $H_S^{o_e}$ is the hidden state at the end index $o_e$ of the opinion term; and $\mathrm{Pool}(\cdot)$ denotes the pooling operation;
the sequence representation is then refined by a Transformer decoder that integrates self-attention and cross-attention layers. The self-attention module, $\mathrm{SelfAttention}(\cdot)$, generates its queries, keys, and values from the sequence representation $H_X$, which enables interaction between sequences. The cross-attention mechanism, $\mathrm{CrossAttention}(\cdot)$, further refines the sequence representation by incorporating the broader semantic context of the sentence: it uses the output of the self-attention module as the query and derives the keys and values from the sentence representation $H_S$;
to accommodate the iterative nature of the diffusion process, the sinusoidal encoding $E_t$ corresponding to each time step $t$ is integrated into the sequence representation, giving the final noisy sequence representation $\bar{H}_X$;
four index pointers are used to predict the boundary indices of the aspects and opinions, respectively. For each index $\delta \in \{a_s, a_e, o_s, o_e\}$, a fused representation $H^{\delta}_{SX}$ is generated that combines the noisy sequence representation with the sentence representation, and the probability $P^{\delta}$ that each index is a term boundary is predicted from it using a learnable matrix, a feedforward network $\mathrm{FFN}$, and a type encoding that distinguishes aspect boundaries from opinion boundaries, where $a_s$ and $a_e$ are the start and end indices of the aspect term, and $o_s$ and $o_e$ are the start and end indices of the opinion term.
4. The aspect-level sentiment triple extraction method based on a diffusion model according to claim 3, characterized in that the specific steps of step S30 are as follows:
Step S301: the sentiment classifier processes the final noisy sequence representation $\bar{H}_X$ with an FFN and outputs the probability distribution $P^{c}$ over sentiment categories, where $C$ is the total number of sentiment polarity categories.
5. The aspect-level sentiment triple extraction method based on a diffusion model according to claim 4, characterized in that, in the contrastive denoising training strategy of step S40, the positive samples $x_{pos}$ and the negative samples $x_{neg}$ are input to the decoder, which first applies self-attention and cross-attention, where $\mathrm{SelfAttention}(\cdot)$ denotes the self-attention mechanism, $\mathrm{CrossAttention}(\cdot)$ denotes the cross-attention mechanism, and $H_S$ is the sentence representation;
to accommodate the iterative nature of the diffusion process, the sinusoidal encoding $E_t$ corresponding to each time step $t$ is integrated into the sequence representations, giving the final noisy sequence representations $\bar{H}_{pos}$ and $\bar{H}_{neg}$;
four index pointers are used to predict the boundary indices of the aspects and opinions, respectively; for each index $\delta$, a fused representation is generated that combines the noisy sequence representation with the sentence representation, from which the boundary probability $P^{\delta}_{pos}$ of the positive samples is obtained using a learnable matrix, a feedforward network $\mathrm{FFN}$, and a type encoding that distinguishes aspect boundaries from opinion boundaries;
the classification probabilities of the positive and negative samples computed by the sentiment classifier are denoted $P^{c}_{pos}$ and $P^{c}_{neg}$, respectively, where $C$ is the total number of sentiment polarity categories.
6. The aspect-level sentiment triple extraction method based on a diffusion model according to claim 5, characterized in that the specific steps of step S50 are as follows:
Step S501: the training loss function comprises a matching loss and a contrastive denoising loss;
Matching loss: given $N$ predictions and the corresponding $N$ expanded true values, the Hungarian algorithm is used to establish the best match $\hat{\pi}$ between the set of predictions and the set of true values, where $\hat{\pi}(i)$ is the true value matched to the i-th noise sequence. The matching loss comprises a boundary loss and a sentiment classification loss, and the reverse process is trained by maximizing the probability of the prediction:
$$\mathcal{L}_{match} = -\sum_{i=1}^{N}\Big[\sum_{\delta \in \{a_s, a_e, o_s, o_e\}} \log P^{\delta}_{i}(\hat{\pi}^{\delta}(i)) + \log P^{c}_{i}(\hat{\pi}^{c}(i))\Big]$$
where $\delta$ denotes a boundary index of the aspect and opinion terms, $a_s$ and $a_e$ being the start and end indices of the aspect term and $o_s$ and $o_e$ the start and end indices of the opinion term; $P^{\delta}_{i}(\hat{\pi}^{\delta}(i))$ is the probability that the model assigns to the true boundary index of the i-th noise sequence during matching; $\hat{\pi}^{\delta}(i)$ is the true boundary index assigned to the i-th noise sequence by the matching; $P^{\delta}_{i}$ is the probability distribution over boundary index $\delta$ that the model predicts for the i-th noise sequence; $P^{c}_{i}(\hat{\pi}^{c}(i))$ is the probability that the model assigns to the true sentiment category of the i-th noise sequence during matching; $P^{c}_{i}$ is the probability distribution over sentiment categories that the model predicts for the i-th noise sequence; and $\hat{\pi}^{c}(i)$ is the true sentiment category assigned to the i-th noise sequence by the matching;
Contrastive denoising loss: the contrastive denoising loss comprises a boundary loss and a sentiment classification loss. The boundary loss is computed only from the boundary probabilities $P^{\delta}_{pos}$ of the positive samples, and the classification loss is computed from the classification probabilities $P^{c}_{pos}$ and $P^{c}_{neg}$ of the positive and negative samples, respectively. The contrastive denoising loss is therefore calculated as follows:
$$\mathcal{L}_{cdn} = -\sum_{i=1}^{K}\Big[\sum_{\delta \in \{a_s, a_e, o_s, o_e\}} \log P^{\delta}_{pos,i}(y^{\delta}_{i}) + \log P^{c}_{pos,i}(y^{c}_{i}) + \log P^{c}_{neg,i}(\text{Invalid})\Big]$$
where $P^{\delta}_{pos,i}(y^{\delta}_{i})$ is the probability that the i-th positive sample predicts its true boundary index at boundary index $\delta$; $y^{\delta}_{i}$ is the true boundary index of the i-th positive sample; $P^{c}_{pos,i}(y^{c}_{i})$ is the probability that the i-th positive sample predicts its true sentiment category for sentiment category $c$; $y^{c}_{i}$ is the true sentiment category of the i-th positive sample; and $P^{c}_{neg,i}(\text{Invalid})$ is the probability that the i-th negative sample belongs to the "Invalid" category;
the matching loss $\mathcal{L}_{match}$ and the contrastive denoising loss $\mathcal{L}_{cdn}$ are optimized jointly, and the overall training loss can be expressed as:
$$\mathcal{L} = \mathcal{L}_{match} + \mathcal{L}_{cdn}$$
7. The aspect-level sentiment triple extraction method based on a diffusion model according to claim 6, characterized in that the specific steps of step S60 are as follows:
Step S601: in the inference phase, $N_{eval}$ noise sequences are randomly sampled from a Gaussian distribution and then iteratively denoised over the denoising time steps $\tau$ by the learned boundary-index reverse diffusion process. The predicted probabilities obtained from this denoising are the boundary index probabilities and the sentiment category probabilities. Using the boundary index probabilities and the sentiment category probabilities, the model decodes $N_{eval}$ candidate sentiment triples. After decoding, the following post-processing is performed: deduplication and filtering; among triples with the same term boundary indices, only the triple with the highest polarity probability is retained; in addition, triples whose cumulative sum of predicted probabilities falls below a threshold are filtered out.