An aspect-level sentiment analysis method based on prompt words and knowledge graph enhancement
By introducing prompt words and knowledge graph enhancement, combined with self-attention and aspect-aware attention mechanisms, the method dynamically generates prompt words and fuses multi-granularity features. This addresses the low efficiency of dependency tree construction and the insufficient utilization of pre-trained models in existing technologies, and improves the accuracy and robustness of aspect-level sentiment analysis.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Applications (China)
- Current Assignee / Owner: CHONGQING UNIV OF POSTS & TELECOMM
- Filing Date: 2026-02-28
- Publication Date: 2026-06-12
AI Technical Summary
Existing aspect-level sentiment analysis methods suffer from inefficient dependency tree construction and over-reliance on dependency trees when processing long and short text comment data. Pre-trained language models are not fully utilized in downstream tasks, and fusing multi-granularity features introduces information redundancy and conflicts, leading to performance degradation.
We employ a method based on prompt words and knowledge graph enhancement. We generate encoding vectors through self-attention and aspect-aware attention mechanisms, combine semantic and syntactic convolutional networks to dynamically generate prompt words, and fuse multi-granular features through orthogonal deduplication technology. We also utilize the WordNet knowledge graph for external knowledge embedding to improve feature representation and model performance.
The method addresses the low efficiency of dependency tree construction and the insufficient utilization of pre-trained models, improves the accuracy and robustness of the model in aspect-level sentiment analysis, improves the fusion of multi-granularity features, and enhances the adaptability and generalization ability of the model.
Figure CN122196182A_ABST
Description
Technical Field
[0001] This invention relates to the field of sentiment analysis technology, and in particular to an aspect-level sentiment analysis method based on prompt words and knowledge graph enhancement.
Background Technology
[0002] With the rapid development of internet technology, a large amount of user-generated content has emerged in the form of comments, blogs, and social media posts. This textual data contains rich emotional information. How to accurately extract users' sentiment tendencies from this massive amount of unstructured text has become an important research direction in the field of natural language processing. Sentiment analysis, as a core task, plays an indispensable role in applications such as user feedback mining, public opinion monitoring, and product improvement. Aspect-Based Sentiment Analysis (ABSA) identifies the sentiment expressed toward specific aspects mentioned in a text, offering finer-grained analysis. For example, in the sentence "The phone's screen is very clear, but the battery life is poor," the ABSA task requires determining the sentiment polarity of "screen" and "battery life" as "positive" and "negative," respectively. This fine-grained analysis is of great significance for interpreting user feedback on e-commerce platforms, improving products, and optimizing personalized recommendation systems.
[0003] Currently, ABSA research is mainly divided into traditional methods and deep learning-based methods. Traditional methods mainly comprise machine learning and sentiment dictionaries. Machine learning as used in sentiment analysis is shallow learning and does not involve feature learning, whereas deep learning-based methods use non-linear network models to fit functions that better meet task requirements, thereby learning data features automatically. Building upon deep learning, knowledge graphs are introduced as external knowledge embeddings and prompt learning is introduced as external prompt engineering to assist in completing the ABSA task. As forms of external knowledge enhancement, these methods can learn deeper information.
[0004] In ABSA tasks, external sentiment knowledge is often used as a source to enhance sentiment feature expression. However, existing models tend to rely too heavily on dependency trees. For long text comment data, while syntactic dependency trees can capture syntactic structure information to achieve context awareness, errors in dependency parsing results can introduce noise. For short text comment data, due to the lack of obvious syntactic features in colloquial sentence structures, dependency tree construction is inefficient. To compensate for the shortcomings of syntactic dependency trees, some models introduce self-attention mechanisms to model semantic relationships between words, but these models often overlook the enhancing effect of external knowledge on semantic similarity information.
[0005] Furthermore, after release, pre-trained models are typically adapted separately to each downstream task. For example, BERT is adapted by adding special tokens, placing them in the sentence according to the downstream task, and then fine-tuning. While this approach is convenient, it limits the implementation options for downstream tasks to some extent and fails to bridge the gap between pre-training and fine-tuning, thus restricting the performance of pre-trained language models. These special tokens carry no actual semantics and perform poorly on aspect-level sentiment analysis tasks, so the potential of pre-trained models in ABSA is not fully realized.
[0006] Most existing methods employ complex and inefficient techniques to integrate different types of knowledge. Currently, there is a lack of a framework that can combine features of multiple granularities while maintaining model scalability. Semantic, syntactic, and contextual features in existing ABSA models may overlap significantly, leading to information redundancy during feature fusion. Overlap between different features can also cause information conflicts, resulting in performance degradation. Therefore, ensuring that the combination of multiple granular features achieves cumulative effects rather than canceling each other out or duplicating features is a key challenge in the ABSA task.
Summary of the Invention
[0007] To address the aforementioned technical problems, this invention aims to provide a method that combines fine-grained generation of prompt words with external knowledge graph enhancement, thereby resolving the issue of inaccurate sentiment analysis in existing technologies.
[0008] To achieve the above-mentioned technical objectives, the technical solution provided by the present invention includes:
[0009] An aspect-level sentiment analysis method based on prompt words and knowledge graph enhancement includes the following steps:
[0010] S1: Obtain the sentences to be analyzed from the corpus and construct a training dataset. Each training sample in the training dataset includes an initial training sentence, aspect words in the initial training sentence, and sentiment labels corresponding to the aspect words.
[0011] S2: Input the initial sentence into a model based on prompt words and knowledge graph enhancement. The model includes a knowledge graph embedding module, a syntactic information prompting engineering module, a syntactic graph convolutional network module, a semantic graph convolutional network module, and a multi-feature fusion module.
[0012] S3: Convert the training text into a sequence of encoded vectors, where each vector corresponds to a word in the text; jointly process the encoded vector sequence through the self-attention mechanism module and the aspect-aware attention mechanism module to generate the semantic vector of the training text;
[0013] S4: Based on the encoded vector sequence, construct a semantic convolutional network based on multi-head attention and a syntactic convolutional network based on syntactic dependency tree in parallel to capture semantic information and syntactic information, respectively.
[0014] S5: Calculate the syntactic distance matrix and syntactic relation matrix using the syntactic dependency tree, and input the matrix into the prompt decision algorithm to select prompt words;
[0015] S6: Use external knowledge graphs and contextual representations to dynamically generate concept associations;
[0016] S7: Orthogonal deduplication technology is used to fuse features of four granularities: semantic information, syntactic information, prompting engineering information, and knowledge graph information;
[0017] S8: Perform aspect-level sentiment label prediction on target aspect words in sentences to determine the sentiment polarity of target aspect words.
[0018] Furthermore, in step S3, a sequence of encoded vectors for the training text is generated using a BERT or RoBERTa encoder.
[0019] Further, in step S4, a semantic graph convolutional network is generated based on the encoded vector sequence using a multi-head attention mechanism, specifically including:
[0020] An attention score matrix is obtained through a self-attention mechanism and used as the adjacency matrix $A$:
[0021] $$A = \mathrm{softmax}\left(\frac{QW^{Q}\,(KW^{K})^{\top}}{\sqrt{d}}\right);$$
[0022] where $Q$ and $K$ are the query matrix and key matrix, respectively; the scaling factor $d$ is the dimension of the input node features divided by the number of attention heads $k$; and $W^{Q}$ and $W^{K}$ are the corresponding weight matrices.
[0023] To enhance semantic graph representation, multiple attention heads are used to generate a more robust adjacency matrix, and the final attention score matrix is generated according to the following formula:
[0024] .
[0025] Further, in step S4, based on the encoded vector sequence, a dependency parser is used to model the grammatical relations between words in the sentence and generate a dependency parse tree, which is treated as a directed graph to construct an adjacency matrix.
[0026] Further, in step S5, the syntactic distance matrix and relation matrix are calculated using the syntactic dependency tree, and then fed into the prompt decision algorithm to select prompt words. Specifically, this includes:
[0027] First, insert several [prompt] tokens into the prompt template. These tokens are placeholders in the language model vocabulary.
[0028] The syntactic dependency tree generated in step S4 is regarded as a directed graph. Each word in the sentence is used as a node to calculate the syntactic distance matrix and the syntactic relation matrix.
[0029] When calculating the syntactic distance matrix, the original distance matrix $D^{raw}$ is obtained first:
[0030] $$D^{raw} = [d_{ij}]_{n \times n},$$
[0031] where the distance $d_{ij}$ between nodes $i$ and $j$ is defined as the length of the dependency path between the two words. Based on the word order in the sentence, forward distances are taken as positive and backward distances as negative, giving a signed distance matrix $D$:
[0032] $$D_{ij} = \begin{cases} d_{ij}, & j \ge i \\ -d_{ij}, & j < i \end{cases};$$
[0033] Dependency relations are mapped to predefined weight coefficients, with smaller weight values indicating more important relations. The final syntactic relation matrix R is:
[0034] $$R_{ij} = \mathrm{MAP}(\mathrm{Relation}(i, j));$$
[0035] where Relation(i, j) is the dependency relation between nodes i and j, and MAP is the corresponding weight mapping function;
[0036] The adjacency matrix is obtained by multiplying the syntactic distance matrix and the syntactic relation matrix. The Dijkstra algorithm is used to calculate the shortest path from each word node to the target aspect word. Based on a preset decision threshold, the word closest to the aspect word is selected from the shortest path list. These words are added to the prompt word list in the original sentence order. The prompt word list is then injected into the final prompt content.
[0037] Furthermore, in step S5, several initial [Prompt] tokens are added between the aspect word and [MASK] and are gradually replaced during training. These [Prompt] tokens are not tied to any word in the language model's vocabulary; the actual prompt words that replace them must be selected by a specific algorithm.
[0038] The initialized prompt template is input into the language model, which contains special [Prompt] and [MASK] tokens;
[0039] In each training cycle, a first-order approximation of the change in log-likelihood is computed: the gradient with respect to the placeholder's input embedding, obtained by backpropagation, is multiplied by the embedding of each word in the vocabulary.
[0040] A candidate set $V_{cand}$ is then determined, containing the top-k words estimated to cause the largest increase:
[0041] $$V_{cand} = \operatorname{top-}k_{\,w \in V}\left[\, w_{in}^{\top}\, \nabla \log p(y \mid x_{prompt}) \right];$$
[0042] where $w_{in}$ is the input embedding of word $w$, and the gradient is taken of the log-likelihood estimate.
[0043] Furthermore, in step S6, the WordNet knowledge graph is introduced as an external knowledge base, and continuous embedding vectors are used to represent the external knowledge, specifically including:
[0044] First, the external knowledge graph is integrated into a low-dimensional continuous embedding to replace the complex subgraph construction process. Then, the knowledge representation related to specific aspects is captured through the dot product attention mechanism.
[0045] For a given triple, entity embedding is performed using the open KGE toolkit OpenKE. Trained knowledge embeddings are used to represent words in sentences and aspects. The mapped knowledge is then embedded into the model. This process not only integrates heterogeneous features but also mitigates the negative impact of sparsity and inaccuracy in knowledge embedding.
[0046] Furthermore, in step S7, a fusion method is used to purify features of different granularities using orthogonal projection deduplication technology, making each feature more independent. This allows different features to better complement each other during fusion, rather than canceling each other out or repeating each other.
[0047] First, the prompt feature $h_p$ is projected onto the semantic feature $h_s$:
[0048] $$h_p^{*} = \mathrm{Proj}(h_p, h_s),$$
[0049] $$\mathrm{Proj}(x, y) = \frac{x \cdot y}{|y|}\,\frac{y}{|y|},$$
[0050] where Proj() is the projection formula, and x and y are vectors;
[0051] By projecting again in the orthogonal direction, a "pure" prompt feature that is independent of the semantic features is obtained, ensuring the independence of the prompt features:
[0052] $$\tilde{h}_p = \mathrm{Proj}(h_p,\; h_p - h_p^{*});$$
[0053] Syntactic features can also have semantic information removed in a similar way, so that syntactic features and semantic features remain independent in multi-granularity feature fusion;
[0054] The deduplicated syntactic features, prompt features, semantic features and external knowledge features are multiplied point by point, and then normalization is performed to ensure that the fused features are processed on the same scale.
[0055] Use residual connections to chain multiple fusion blocks together, ensuring that the output of each block can be directly passed to the next block.
[0056] Furthermore, in step S8, the fused feature $r$ is processed by a multilayer perceptron and a softmax function to predict the sentiment polarity of the target aspect word in the sentence:
[0057] $$\hat{y} = \mathrm{softmax}(W r + b);$$
[0058] where $W$ and $b$ are the classifier parameters. During training, the cross-entropy loss function is used:
[0059] $$\mathcal{L} = -\sum_{c} y_{c} \log \hat{y}_{c};$$
[0060] where $y$ is the one-hot encoding of the true label and $\hat{y}$ is the predicted probability. Through the above fusion and training process, the model can effectively integrate multi-granularity features and accurately predict the sentiment polarity of target aspect words.
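The classification step of S8 can be illustrated with a minimal sketch. It assumes a single linear layer in place of the multilayer perceptron, and the weights `W`, bias `b`, and fused feature `r` are toy values chosen for illustration, not values from the invention.

```python
import math

LABELS = ["pos", "neg", "neu"]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    mx = max(logits)
    exps = [math.exp(z - mx) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(r, W, b):
    """Apply one linear layer (assumed stand-in for the MLP), then softmax."""
    logits = [sum(w * x for w, x in zip(row, r)) + bias for row, bias in zip(W, b)]
    return softmax(logits)

def cross_entropy(one_hot, probs):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs) if y > 0)

# Toy fused feature and classifier parameters (illustrative only).
r = [1.0, -1.0]
W = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
b = [0.0, 0.0, 0.0]

probs = predict(r, W, b)
predicted = LABELS[probs.index(max(probs))]
loss = cross_entropy([1, 0, 0], probs)
```

With these toy values the logits are 2, -2, and 0, so the predicted label is `pos` and the loss equals the negative log of the first probability.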
[0061] Compared with the prior art, the present invention has the following advantages:
[0062] (1) This invention effectively solves the problem of overly complex knowledge graph embedding processes in traditional methods by introducing WordNet knowledge graph as an external knowledge base and representing external knowledge as continuous embedding vectors. By using low-dimensional continuous embedding to replace the traditional subgraph construction process, the computational complexity is significantly reduced, while the feature representation is enriched, and the accuracy and generalization ability of the model are improved.
[0063] (2) The syntactic information-based prompting engineering module proposed in this invention uses a gradient-driven method to automatically generate discrete prompt templates, effectively solving the problem of insufficient knowledge utilization in pre-trained models. By constructing prompt templates and predicting [MASK], the task is transformed into a fill-in-the-blank task familiar to the pre-trained language model, significantly enhancing the model's adaptability to diverse emotional expressions, especially improving the model's performance in low-resource scenarios.
[0064] (3) This invention uses orthogonal projection deduplication technology to fuse multi-granularity features, ensuring the independence of different features. By removing some semantic information from syntactic features and prompt features, information redundancy and conflict during feature fusion are avoided, enabling each feature to better complement each other during the fusion process, thereby improving the overall performance of the model.
[0065] (4) The model framework proposed in this invention combines a semantic GCN branch, a syntactic GCN branch, a prompting engineering branch based on syntactic information, and an external knowledge graph embedding enhancement branch, which together can comprehensively capture the semantic information, syntactic information, and external knowledge information of the text. Through the design of the multi-feature fusion module, features of different granularities are effectively integrated, which significantly improves the accuracy and robustness of the model in aspect-level sentiment analysis tasks.
Description of the Drawings
[0066] Figure 1 is a model architecture diagram of an aspect-level sentiment analysis method based on prompt words and knowledge graph enhancement in a preferred embodiment of the present invention;
[0067] Figure 2 is a schematic diagram of the prompting decision algorithm module constructed in a preferred embodiment of the present invention;
[0068] Figure 3 is a schematic diagram of the knowledge graph embedding constructed in a preferred embodiment of the present invention; Figure 4 is a schematic diagram of the fusion module constructed in a preferred embodiment of the present invention.
Detailed Implementation
[0069] To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention will be further described below in conjunction with the accompanying drawings and embodiments. However, the present invention can be implemented in many different ways and should not be construed as limited to the embodiments shown; rather, these embodiments provide those skilled in the art with implementation methods that meet applicable legal requirements.
[0070] In the description of this invention, it should be understood that the terms "longitudinal", "lateral", "up", "down", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc., indicate the orientation or positional relationship based on the orientation or positional relationship shown in the accompanying drawings. They are only for the convenience of describing this invention and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation. Therefore, they should not be construed as limitations on this invention.
[0071] In this invention, let $S = \{w_1, w_2, \ldots, w_n\}$ represent a text (also called a sentence), and $A = \{a_1, a_2, \ldots, a_m\}$ represent a predefined set of aspects. For each $S$, $A_S = \{a_i \mid a_i \in A \text{ and } a_i \in S\}$ denotes the aspect terms contained in $S$; for brevity, $a_i$ also denotes the $i$-th word of $S$. The task of ABSA is to predict, for each aspect word $a_i \in A_S$, the sentiment polarity $y_i \in \{pos, neg, neu\}$, where pos represents positive, neg represents negative, and neu represents neutral.
[0072] As shown in Figure 1, the aspect-level sentiment analysis model of this invention, based on prompt words and external knowledge graph enhancement, integrates syntactic convolutional networks, semantic convolutional networks, prompt word information, and knowledge graph information. A multi-layer fusion network module is designed to capture the synergistic effects among features of different granularities. This model can comprehensively mine semantic, syntactic, and external knowledge information in text, achieving accurate sentiment analysis.
[0073] The aspect-level sentiment analysis method based on prompt words and external knowledge graph enhancement provided by this invention includes the following steps:
[0074] S1: Obtain the sentences to be analyzed from the corpus and construct a training dataset. Each training sample in the training dataset includes an initial training sentence, aspect words in the initial training sentence, and sentiment labels corresponding to the aspect words.
[0075] In a specific implementation, step S101 collects from the Internet restaurant review data (the Restaurant dataset), electronic product review data (the Laptop dataset), and online data selected from the platform (the Twitter dataset).
[0076] It should be noted that each data point in these datasets consists of a sentence, aspect words, and sentiment polarity.
[0077] Sentiment polarity takes one of three values: positive, neutral, or negative. Aspect words are the specific objects in the text toward which the user expresses sentiment; an aspect word can be a single word or a multi-word phrase. A sentence often contains more than one aspect word, and different aspect words may carry different sentiment polarities. For example, in sentence A extracted from the Restaurant dataset, "The food at this restaurant is delicious, but the service is slow," there are two aspect words: "food" and "service speed." Their sentiment polarities differ: "food" has a positive sentiment polarity, while "service speed" has a negative sentiment polarity.
[0078] Based on this, a sentiment analysis model capable of predicting the sentiment polarity Y of aspect word A in sentence s is constructed through the following steps.
[0079] S2: Input the initial sentence into a model based on prompt words and knowledge graph enhancement. The model includes a knowledge graph embedding module, a syntactic information prompting engineering module, a syntactic graph convolutional network module, a semantic graph convolutional network module, and a multi-feature fusion module.
[0080] S3: Convert the training text into a sequence of encoded vectors, where each vector corresponds to a word in the text; jointly process the encoded vector sequence through the self-attention mechanism module and the aspect-aware attention mechanism module to generate the semantic vector of the training text;
[0081] It should be noted that the specific implementation process of step S3 includes the following steps:
[0082] Step S3-1: Input the initial sentence into the BERT or RoBERTa encoder in the format "[CLS]sentence[SEP]sentence[SEP]sentence[MASK][SEP]" to capture long-range dependencies in the input text sequence, providing a deep representation for each word, so that the representation of each word contains its contextual information. Next, the BERT or RoBERTa encoder outputs an encoding vector sequence H, where each vector in the encoding vector sequence corresponds to a word in the input text sequence, encoding the contextual information of that word.
[0083] Step S3-2: Introduce two attention mechanisms: self-attention and aspect-aware attention. Self-attention is used to capture long-distance dependencies of the context. However, this may overemphasize irrelevant dependencies. Therefore, aspect-aware attention is added to filter noise by using aspect vectors as queries, strengthening the syntactic connection between aspect words and related opinion words.
[0084] S4: Based on the encoded vector sequence, construct a semantic convolutional network based on multi-head attention and a syntactic convolutional network based on syntactic dependency tree in parallel to capture semantic information and syntactic information, respectively.
[0085] It should be noted that the specific implementation process of step S4 includes the following steps:
[0086] Step S4-1: Obtain an attention score matrix through a self-attention mechanism and use it as the adjacency matrix $A$:
[0087] $$A = \mathrm{softmax}\left(\frac{QW^{Q}\,(KW^{K})^{\top}}{\sqrt{d}}\right);$$
[0088] where $Q$ and $K$ are the query matrix and key matrix, respectively; the scaling factor $d$ is the dimension of the input node features divided by the number of attention heads $k$; and $W^{Q}$ and $W^{K}$ are the corresponding weight matrices.
[0089] To enhance semantic graph representation, multiple attention heads are used to generate a more robust adjacency matrix, and the final attention score matrix is generated according to the following formula:
[0090] .
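Step S4-1 can be sketched in plain Python as follows. This is a single-head illustration that assumes the standard scaled dot-product form with a row-wise softmax; the toy encodings `H` and the identity weight matrices are illustrative assumptions, and the way several heads are combined is not shown because the source formula for it is not available here.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax_rows(m):
    """Apply a numerically stable softmax to every row."""
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def attention_adjacency(H, W_q, W_k, num_heads=1):
    """Return the n x n attention score matrix used as a semantic adjacency matrix."""
    Q = matmul(H, W_q)
    K = matmul(H, W_k)
    d = len(H[0]) / num_heads  # scaling factor: feature dimension / number of heads
    scores = matmul(Q, transpose(K))
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]
    return softmax_rows(scaled)

# Toy encodings for three tokens (illustrative only).
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
A_sem = attention_adjacency(H, identity, identity)
```

Each row of `A_sem` sums to 1, so it can be read as a soft, fully connected graph over the tokens of the sentence.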
[0091] Step S4-2: Based on the encoded vector sequence, model the grammatical relations between words in the sentence using a dependency parser, generate a dependency parse tree using spaCy, and treat the dependency parse tree as a directed graph to construct an adjacency matrix. This helps the ABSA task accurately associate aspect words with opinion words, addressing the inability of simple position attention in traditional methods to handle long-distance dependencies.
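A minimal sketch of turning a dependency parse into a directed adjacency matrix is given below. To stay self-contained it takes a list of head indices instead of calling a parser; the parse of the example sentence is hand-written for illustration, and the added self-loops are an assumption commonly made for graph convolution rather than something stated in the text. With spaCy, head indices could be read from `token.head.i`, where the root token points to itself.

```python
def dependency_adjacency(heads, self_loops=True):
    """Build a directed adjacency matrix from head indices.

    heads[i] is the index of token i's syntactic head, or -1 for the root.
    An edge runs from each head to its dependent.
    """
    n = len(heads)
    A = [[0] * n for _ in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            A[head][child] = 1
    if self_loops:  # assumption: keep each node's own features in a GCN layer
        for i in range(n):
            A[i][i] = 1
    return A

# Hand-written illustrative parse: "is" is the root,
# "screen" and "clear" depend on "is", "The" depends on "screen".
tokens = ["The", "screen", "is", "clear"]
heads = [1, 2, -1, 2]
A_syn = dependency_adjacency(heads)
```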
[0092] S5: The syntactic distance matrix and syntactic relation matrix are calculated by the syntactic dependency tree. The syntactic distance matrix and relation matrix are then fed into the prompt decision algorithm to select prompt words. This method solves the problems of random word insertion and lack of semantic guidance in ordinary prompt engineering.
[0093] It should be noted that the specific implementation process of step S5 includes the following steps:
[0094] Step S5-1: First, insert several [prompt] tokens into the prompt template. These tokens are placeholders in the language model vocabulary.
[0095] Step S5-2: Treat the syntactic dependency tree generated in step S4 as a directed graph, and use each word in the sentence as a node to calculate the syntactic distance matrix and the syntactic relation matrix;
[0096] When calculating the syntactic distance matrix, the original distance matrix $D^{raw}$ is obtained first:
[0097] $$D^{raw} = [d_{ij}]_{n \times n},$$
[0098] where the distance $d_{ij}$ between nodes $i$ and $j$ is defined as the length of the dependency path between the two words. Based on the word order in the sentence, forward distances are taken as positive and backward distances as negative, giving a signed distance matrix $D$:
[0099] $$D_{ij} = \begin{cases} d_{ij}, & j \ge i \\ -d_{ij}, & j < i \end{cases};$$
[0100] For aspect phrases consisting of multiple words, they are treated as a single node with an internal distance of 0.
[0101] The distance matrix D can quantify the grammatical distance between words, avoid selecting words whose distance from aspect words exceeds a threshold, and thus filter out irrelevant words.
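The distance computation of step S5-2 can be sketched as below. The sketch assumes that the dependency path length is measured on the tree with edge direction ignored, and that "forward" means the second word appears later in the sentence than the first; both are interpretations, and the head list is the same hand-written toy parse of "The screen is clear".

```python
from collections import deque

def tree_distances(heads):
    """Unsigned dependency path length between every pair of tokens (BFS)."""
    n = len(heads)
    neighbors = [[] for _ in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            neighbors[child].append(head)
            neighbors[head].append(child)
    dist = [[0] * n for _ in range(n)]
    for src in range(n):
        seen = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        for tgt, d in seen.items():
            dist[src][tgt] = d
    return dist

def signed_distances(dist):
    """Positive when j follows i in the sentence, negative when j precedes i."""
    n = len(dist)
    return [[dist[i][j] if j >= i else -dist[i][j] for j in range(n)]
            for i in range(n)]

heads = [1, 2, -1, 2]  # The -> screen -> is (root) <- clear
D_raw = tree_distances(heads)
D = signed_distances(D_raw)
```

In this toy parse, "The" and "clear" are three edges apart, so `D[0][3]` is 3 and `D[3][0]` is -3.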
[0102] Step S5-3: Dependency trees, as indicators of sentiment associations in sentences, are very helpful for sentiment analysis. For example, `amod` (adjective modifier) can directly represent attributes; `nsubj` (noun subject) can associate actions with targets. Therefore, dependency relations are mapped to predefined weight coefficients. The smaller the weight value, the more important the relation. For example, the syntactic relation `nsubj` (noun subject) is very meaningful for determining sentiment, so its weight coefficient `r` is set to 0.1. The syntactic relation `det` (article or determiner) has little impact on sentiment determination, so its weight coefficient `r` is set to 5. The final syntactic relation matrix R is:
[0103] $$R_{ij} = \mathrm{MAP}(\mathrm{Relation}(i, j));$$
[0104] where Relation(i, j) is the dependency relation between nodes i and j, and MAP is the corresponding weight mapping function;
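The weight mapping of step S5-3 might look like the following sketch. Only the weights for `nsubj` (0.1) and `det` (5) come from the text; the fallback weight `DEFAULT_WEIGHT`, the symmetric fill, and the dependency labels of the toy parse are assumptions made for illustration.

```python
# Weights stated in the text: nsubj -> 0.1, det -> 5.
MAP = {"nsubj": 0.1, "det": 5.0}
DEFAULT_WEIGHT = 1.0  # hypothetical fallback for relations not listed above

def relation_matrix(heads, labels):
    """Fill R[i][j] with the weight of the dependency relation linking i and j.

    heads[i] is the head index of token i (-1 for the root) and labels[i]
    is the relation between token i and its head. Pairs that are not
    directly linked keep the value 0.0.
    """
    n = len(heads)
    R = [[0.0] * n for _ in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            weight = MAP.get(labels[child], DEFAULT_WEIGHT)
            R[head][child] = weight
            R[child][head] = weight  # assumption: treat the weight as symmetric
    return R

heads = [1, 2, -1, 2]                      # "The screen is clear"
labels = ["det", "nsubj", "ROOT", "acomp"]  # hand-written illustrative labels
R = relation_matrix(heads, labels)
```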
[0105] Step S5-4: Multiply the syntactic distance matrix and the syntactic relation matrix element by element to obtain the adjacency matrix. The adjacency matrix integrates the importance of distance and syntactic relation; the shorter the distance and the more important the syntactic relation, the smaller the value. For each non-aspect word node, Dijkstra's algorithm is used to calculate the shortest path cost from the node to the target aspect word:
[0106]
[0107] Based on a preset decision threshold k, the words closest to the aspect word are selected from the shortest path list. These words are added to the prompt word list in the original sentence order, and the prompt word list is then injected into the final prompt content. Experiments showed that setting the decision threshold to 4 yields optimal performance; setting it too high introduces noise, leading to performance degradation.
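The selection procedure of step S5-4 can be sketched with Dijkstra's algorithm from the standard library's `heapq`. Because Dijkstra's algorithm requires non-negative edge weights, the sketch assumes the cost matrix is built from unsigned distances multiplied by relation weights; the toy `cost` matrix and the reading of the threshold k as "keep the k cheapest words" are likewise assumptions.

```python
import heapq

def shortest_costs(cost, target):
    """Dijkstra from the aspect word over a symmetric, non-negative cost matrix.

    cost[u][v] > 0 marks an edge; 0 means the two words are not linked.
    """
    n = len(cost)
    best = [float("inf")] * n
    best[target] = 0.0
    heap = [(0.0, target)]
    while heap:
        c, u = heapq.heappop(heap)
        if c > best[u]:
            continue  # stale heap entry
        for v in range(n):
            w = cost[u][v]
            if w > 0 and c + w < best[v]:
                best[v] = c + w
                heapq.heappush(heap, (best[v], v))
    return best

def select_prompt_words(tokens, cost, aspect_idx, k=4):
    """Keep the k words with the cheapest path to the aspect, in sentence order."""
    best = shortest_costs(cost, aspect_idx)
    reachable = [i for i in range(len(tokens))
                 if i != aspect_idx and best[i] < float("inf")]
    nearest = sorted(reachable, key=lambda i: best[i])[:k]
    return [tokens[i] for i in sorted(nearest)]

tokens = ["The", "screen", "is", "clear"]
# Toy costs: det edge 5.0, nsubj edge 0.1, remaining edge 1.0.
cost = [
    [0.0, 5.0, 0.0, 0.0],
    [5.0, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
prompt_words = select_prompt_words(tokens, cost, aspect_idx=1, k=2)
```

For the aspect word "screen" and k = 2, the determiner "The" is the most expensive word to reach and is dropped, leaving "is" and "clear" in sentence order.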
[0108] Step S5-5: Add several initial [Prompt] tokens between the aspect word and [MASK]. These tokens are gradually replaced during training. They are not tied to any word in the language model's vocabulary; the actual prompt words that replace them must be selected through a specific algorithm.
[0109] The initialized prompt template is input into the language model. This template contains special [Prompt] and [MASK] tokens, forming the initial base template:
[0110] ,
[0111] In the template, [Prompt] is a placeholder to be replaced. By default, 3 [Prompt] tokens are used, placed before the aspect words.
[0112] Step S5-6: In each training cycle, compute a first-order approximation of the change in log-likelihood: the gradient with respect to the placeholder embedding, obtained by backpropagation, is multiplied by the embedding of each word in the vocabulary;
[0113] A candidate set $V_{cand}$ is then determined, containing the top-k words estimated to cause the largest increase:
[0114] $$V_{cand} = \operatorname{top-}k_{\,w \in V}\left[\, w_{in}^{\top}\, \nabla \log p(y \mid x_{prompt}) \right];$$
[0115] where $w_{in}$ is the input embedding vector of word $w$, and the gradient term is the gradient with respect to the placeholder embedding obtained during backpropagation, computed from the log-likelihood estimate. The cross-entropy loss function is used, and the k words with the highest scores are selected as the candidate set.
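The candidate selection of step S5-6 reduces to scoring every vocabulary word by the dot product of its input embedding with a gradient vector and keeping the k best. The sketch below assumes the gradient has already been obtained by backpropagation; the vocabulary, embeddings, and gradient values are toy numbers for illustration only.

```python
def top_k_candidates(vocab, embeddings, grad, k):
    """Rank words by embedding . gradient and return the k highest-scoring ones."""
    scores = {w: sum(e * g for e, g in zip(embeddings[w], grad)) for w in vocab}
    return sorted(vocab, key=lambda w: scores[w], reverse=True)[:k]

# Toy vocabulary, input embeddings, and placeholder gradient (illustrative only).
vocab = ["good", "bad", "screen", "the"]
embeddings = {
    "good": [1.0, 0.0],
    "bad": [-1.0, 0.0],
    "screen": [0.0, 1.0],
    "the": [0.1, 0.1],
}
grad = [0.8, 0.2]  # assumed gradient of the log-likelihood w.r.t. the placeholder

candidates = top_k_candidates(vocab, embeddings, grad, k=2)
```

In a full training loop each candidate would then be substituted for the placeholder and re-scored, which is not shown here.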
[0116] S6: Use external knowledge graphs and contextual representations to dynamically generate concept associations;
[0117] It should be noted that the specific implementation process of step S6 includes the following steps:
[0118] Step S6-1: By introducing the WordNet knowledge graph as an external knowledge base, the external knowledge graph is integrated into a low-dimensional continuous embedding, replacing the complex subgraph construction process, and realizing the fusion of the external knowledge graph and the context representation.
[0119] The input is an external knowledge graph in triplet format, which includes head entity, relation, and tail entity information. The open KGE toolkit OpenKE is used for entity embedding. The semantic matching model ANALOGY module of OpenKE is called to generate entity embedding vectors. This model can capture complex semantic relations and is more suitable for ABSA tasks.
[0120] Step S6-2: For each word, its knowledge embedding is looked up. If the word is not in the knowledge base, the mean of the embeddings of its synonym set is used. Then, a dot product attention mechanism is used to capture knowledge representations related to specific aspects. This process not only integrates heterogeneous features but also mitigates the negative impact of sparsity and inaccuracy in knowledge embedding.
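The lookup and attention of step S6-2 can be sketched as follows. The dictionary `kg_emb` stands in for embeddings that would be trained beforehand (for example with OpenKE), and the synonym table is a hypothetical stand-in for WordNet synsets; no OpenKE or WordNet API is called, and all vectors are toy values.

```python
import math

def lookup_embedding(word, kg_emb, synonyms):
    """Return the word's knowledge embedding, or the mean over its synonyms."""
    if word in kg_emb:
        return kg_emb[word]
    known = [kg_emb[s] for s in synonyms.get(word, []) if s in kg_emb]
    if not known:
        return None  # no knowledge available for this word
    dim = len(known[0])
    return [sum(vec[d] for vec in known) / len(known) for d in range(dim)]

def aspect_attention(word_vecs, aspect_vec):
    """Dot-product attention: weight word embeddings by similarity to the aspect."""
    scores = [sum(a * b for a, b in zip(vec, aspect_vec)) for vec in word_vecs]
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(aspect_vec)
    return [sum(w * vec[d] for w, vec in zip(weights, word_vecs)) for d in range(dim)]

# Toy pre-trained knowledge embeddings and synonym sets (illustrative only).
kg_emb = {"tasty": [1.0, 0.0], "delicious": [0.0, 1.0], "food": [1.0, 1.0]}
synonyms = {"yummy": ["tasty", "delicious"]}

fallback = lookup_embedding("yummy", kg_emb, synonyms)
pooled = aspect_attention([kg_emb["tasty"], kg_emb["delicious"]], [1.0, 0.0])
```

The out-of-vocabulary word "yummy" receives the average of its two synonyms, `[0.5, 0.5]`.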
[0121] S7: Orthogonal deduplication technology is used to fuse features of four granularities: semantic information, syntactic information, prompting engineering information, and knowledge graph information;
[0122] It should be noted that the specific implementation process of step S7 includes the following steps:
[0123] Step S7-1: The fusion method uses orthogonal projection deduplication to purify features of different granularities so that each feature is more independent. Different features can then complement each other during fusion rather than canceling out or duplicating one another. First, the prompt feature $h^{p}$ is projected onto the semantic feature $h^{sem}$:
[0124] $h^{p*} = \mathrm{Proj}(h^{p}, h^{sem})$,
[0125] $\mathrm{Proj}(x, y) = \frac{x \cdot y}{|y|} \cdot \frac{y}{|y|}$,
[0126] where Proj() represents the projection function, and x and y are vectors;
[0127] By projecting again onto the orthogonal direction, a "pure" prompt feature $\tilde{h}^{p}$ that is independent of the semantic feature is obtained, which ensures the independence of the prompt feature:
[0128] $\tilde{h}^{p} = \mathrm{Proj}(h^{p}, h^{p} - h^{p*})$;
[0129] Syntactic features can also have semantic information removed in a similar way, so that syntactic features and semantic features remain independent in multi-granularity feature fusion;
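The two-stage projection of step S7-1 can be sketched as below. The sketch assumes the standard vector projection Proj(x, y) = (x · y / |y|²) y and uses hypothetical two-dimensional feature values; the function names are illustrative only.

```python
def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def proj(x, y):
    """Project x onto the direction of y; returns zeros if y is the zero vector."""
    norm_sq = dot(y, y)
    if norm_sq == 0.0:
        return [0.0] * len(x)
    scale = dot(x, y) / norm_sq
    return [scale * b for b in y]


def purify(feature, reference):
    """Keep only the part of `feature` that is orthogonal to `reference`."""
    shared = proj(feature, reference)                     # component along reference
    residual = [f - s for f, s in zip(feature, shared)]   # orthogonal direction
    return proj(feature, residual)                        # second projection


prompt_feature = [3.0, 4.0]     # hypothetical values
semantic_feature = [1.0, 0.0]
pure_prompt = purify(prompt_feature, semantic_feature)
print(pure_prompt)
```

The purified vector has a zero inner product with the semantic feature, which is the independence property the step relies on. The same `purify` call can be applied to the syntactic feature with the semantic feature as the reference.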
[0130] Step S7-2: The deduplicated syntactic features, prompt features, semantic features, and external knowledge features are multiplied point by point to integrate their relevance. A Dropout layer is used to prevent overfitting, and the result is then normalized so that the fused features are on the same scale, which improves the model's generalization ability. Stacking multiple modules can easily lead to vanishing gradients, and deep modules are difficult to train effectively. Therefore, residual connections are used to chain multiple fusion blocks, so that the output of each block is passed directly to the next block, which improves training stability. Each module gradually fuses new information while retaining historical features, achieving a cumulative effect. After the repeated fusion computations, average pooling is applied to the output to obtain the final fused feature r. The specific calculation formulas are as follows:
[0131]
[0132]
[0133] Here, the weight terms in the formulas are learnable parameters; Norm() represents the normalization operation; the per-layer term represents the output of the i-th layer module; r represents the final feature output; m represents the number of module layers; and Mean() represents the average pooling function.
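A rough sketch of the stacked fusion blocks of step S7-2 follows. Several details are assumptions, since the text describes the operations but this sketch does not rely on the exact formulas: the learnable weights are modeled as one scalar per feature type and layer, the semantic features start the residual stream, normalization is a per-token layer normalization, and the final average pooling is taken over tokens. All numeric values are hypothetical.

```python
import math
import random


def layer_norm(vec, eps=1e-5):
    """Normalize one vector to zero mean and approximately unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((x - mean) ** 2 for x in vec) / len(vec)
    return [(x - mean) / math.sqrt(var + eps) for x in vec]


def dropout(vec, p, rng):
    """Inverted dropout; p = 0 disables it (inference mode)."""
    if p <= 0.0:
        return list(vec)
    return [0.0 if rng.random() < p else x / (1.0 - p) for x in vec]


def fusion_block(prev, feats, weights, p, rng):
    """Weighted point-wise product -> dropout -> norm -> residual add, per token."""
    out = []
    for t, prev_tok in enumerate(prev):
        prod = [1.0] * len(prev_tok)
        for name, w in weights.items():
            prod = [acc * w * x for acc, x in zip(prod, feats[name][t])]
        fused = layer_norm(dropout(prod, p, rng))
        out.append([h + f for h, f in zip(prev_tok, fused)])
    return out


def fuse(feats, layer_weights, p=0.0, seed=0):
    """Stack fusion blocks with residual connections, then mean-pool over tokens."""
    rng = random.Random(seed)
    h = feats["semantic"]  # assumption: semantic features start the residual stream
    for weights in layer_weights:
        h = fusion_block(h, feats, weights, p, rng)
    return [sum(tok[d] for tok in h) / len(h) for d in range(len(h[0]))]


# Hypothetical features: 2 tokens x 3 dimensions for each granularity.
feats = {
    "syntactic": [[0.5, 1.0, -1.0], [1.0, 0.2, 0.3]],
    "prompt":    [[1.0, 0.5, 0.5], [0.4, 1.0, -0.6]],
    "semantic":  [[0.2, 0.1, 0.3], [0.6, 0.5, 0.1]],
    "knowledge": [[1.0, 1.0, 0.5], [0.5, -1.0, 1.0]],
}
layer_weights = [
    {"syntactic": 1.0, "prompt": 0.9, "semantic": 1.1, "knowledge": 1.0},
    {"syntactic": 0.8, "prompt": 1.0, "semantic": 1.0, "knowledge": 1.2},
]
r = fuse(feats, layer_weights)
```

The residual addition means each block adds a normalized fused term on top of what earlier blocks produced, which matches the cumulative behavior described in the step.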
[0134] S8: Perform aspect-level sentiment label prediction on target aspect words in sentences to determine the sentiment polarity of target aspect words.
[0135] It should be noted that, in the specific implementation of step S8, the fused feature is processed by a multilayer perceptron and a softmax function to predict the sentiment polarity of the target aspect words in the sentence:
[0136] $\hat{y} = \mathrm{softmax}(W_{p}\, r + b_{p})$;
[0137] where $W_{p}$ and $b_{p}$ are the classifier parameters. During training, the cross-entropy loss function is used:
[0138] $\mathcal{L} = -\sum_{c} y_{c} \log \hat{y}_{c}$;
[0139] where $y$ is the one-hot encoding of the true label and $\hat{y}$ is the predicted probability distribution. Through the above fusion and training process, the model can effectively integrate multi-granularity features and accurately predict the sentiment polarity of the target aspect words.
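The classification in step S8 can be illustrated with a small sketch of a multilayer perceptron followed by softmax and a cross-entropy loss. The layer sizes, the ReLU activation, the parameter values, and the three-class label set are assumptions chosen for illustration; the text only specifies a multilayer perceptron, a softmax function, and a cross-entropy loss.

```python
import math


def linear(weights, bias, x):
    """Affine map: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(r, params):
    """One hidden ReLU layer followed by a softmax output layer."""
    hidden = [max(0.0, h) for h in linear(params["W1"], params["b1"], r)]
    return softmax(linear(params["W2"], params["b2"], hidden))


def cross_entropy(probs, one_hot):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs) if y > 0)


# Hypothetical parameters: 3-dim fused feature -> 2 hidden units -> 3 classes.
params = {
    "W1": [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]],
    "b1": [0.0, 0.1],
    "W2": [[1.0, -1.0], [0.0, 0.5], [-1.0, 1.0]],
    "b2": [0.0, 0.0, 0.0],
}
labels = ["negative", "neutral", "positive"]  # assumed label set
r = [0.4, 0.9, -0.3]                           # stand-in for the fused feature

probs = predict(r, params)
loss = cross_entropy(probs, [0, 0, 1])          # true label: positive
predicted = labels[probs.index(max(probs))]
```

With a one-hot label the loss reduces to the negative log-probability assigned to the true class, so it shrinks as the model becomes more confident in the correct polarity.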
[0140] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims
1. An aspect-level sentiment analysis method based on prompt words and knowledge graph enhancement, characterized by comprising the following steps: S1: Obtain the sentences to be analyzed from the corpus and construct a training dataset, wherein each training sample in the training dataset includes an initial training sentence, the aspect words in the initial training sentence, and the sentiment labels corresponding to the aspect words; S2: Input the initial training sentence into a model based on prompt words and knowledge graph enhancement, wherein the model includes a knowledge graph embedding module, a syntactic information prompt engineering module, a syntactic graph convolutional network module, a semantic graph convolutional network module, and a multi-feature fusion module; S3: Convert the training text into a sequence of encoded vectors, where each vector corresponds to a word in the text, and jointly process the encoded vector sequence through the self-attention mechanism module and the aspect-aware attention mechanism module to generate the semantic vector of the training text; S4: Based on the encoded vector sequence, construct in parallel a semantic convolutional network based on multi-head attention and a syntactic convolutional network based on the syntactic dependency tree, to capture semantic information and syntactic information, respectively; S5: Calculate the syntactic distance matrix and the syntactic relation matrix using the syntactic dependency tree, and input the matrices into the prompt decision algorithm to select prompt words; S6: Use external knowledge graphs and contextual representations to build a concept-level lexicon; S7: Use orthogonal deduplication to fuse features of four granularities: semantic information, syntactic information, prompt engineering information, and knowledge graph information; S8: Perform aspect-level sentiment label prediction on the target aspect words in sentences to determine the sentiment polarity of the target aspect words.
2. The aspect-level sentiment analysis method based on prompt words and knowledge graph enhancement according to claim 1, characterized in that, in step S3, the sequence of encoded vectors for the training text is generated using a BERT or RoBERTa encoder.
3. The aspect-level sentiment analysis method based on prompt words and knowledge graph enhancement according to claim 1, characterized in that, in step S4, a semantic graph convolutional network is generated based on the encoded vector sequence using a multi-head attention mechanism, specifically including: an attention score matrix is obtained through a self-attention mechanism and used as the adjacency matrix $A$: $A = \mathrm{softmax}\left(\frac{Q W^{Q} (K W^{K})^{\top}}{\sqrt{d}}\right)$; where Q and K represent the query matrix and key matrix, respectively; the scaling factor $d$ is the dimension of the input node features divided by the number of attention heads k; and $W^{Q}$ and $W^{K}$ are the weight matrices; to enhance the semantic graph representation, multiple attention heads are used to generate a more robust adjacency matrix, and the final attention score matrix is generated by combining the attention score matrices of the heads.
4. The aspect-level sentiment analysis method based on prompt words and knowledge graph enhancement according to claim 1, characterized in that, in step S4, the dependency parser models the grammatical relations between words in the sentence based on the encoded vector sequence, generates a dependency parsing tree, and treats the dependency parsing tree as a directed graph to construct an adjacency matrix.
5. The aspect-level sentiment analysis method based on prompt words and knowledge graph enhancement according to claim 1, characterized in that, in step S5, the syntactic distance matrix and the syntactic relation matrix are calculated using the syntactic dependency tree and are then fed into the prompt decision algorithm to select prompt words, specifically including: first, several [Prompt] tokens are inserted into the prompt template, these tokens being placeholders in the language model vocabulary; the syntactic dependency tree generated in step S4 is regarded as a directed graph, and each word in the sentence is used as a node to calculate the syntactic distance matrix and the syntactic relation matrix; when calculating the syntactic distance matrix, the original distance matrix is obtained first, in which the distance between two nodes is defined as the dependency path length between the two words; based on the word order in the sentence, forward distances are positive and backward distances are negative, and the distances are adjusted accordingly to form a signed distance matrix D; dependency relations are mapped to predefined weight coefficients, with smaller weight values indicating more important relations, and the final syntactic relation matrix R is: $R_{ij} = \mathrm{MAP}(\mathrm{Relation}(i, j))$; wherein Relation represents the dependency relationship between two nodes, and MAP is the corresponding weight mapping function; the adjacency matrix is obtained by multiplying the syntactic distance matrix and the syntactic relation matrix; the Dijkstra algorithm is used to calculate the shortest path from each word node to the target aspect word; based on a preset decision threshold, the words closest to the aspect word are selected from the shortest path list; these words are added to the prompt word list in the original sentence order; and the prompt word list is then injected into the final prompt content.
6. The aspect-level sentiment analysis method based on prompt words and knowledge graph enhancement according to claim 1, characterized in that, in step S5, several initial [Prompt] tokens are added between the aspect word and [MASK], and these tokens are gradually replaced during training; the [Prompt] tokens do not correspond to words in the vocabulary of the language model, and the actual prompt words must be selected by a specific algorithm; the initialized prompt template, which contains the special [Prompt] and [MASK] tokens, is input into the language model; in each training cycle, a first-order approximation of the log-likelihood is calculated, specifically by multiplying the gradient at the embedding layer, obtained through backpropagation of the loss, by the word embeddings in the vocabulary; a candidate set $\mathcal{V}_{cand}$ is determined, consisting of the top k words that are estimated to cause the largest increase: $\mathcal{V}_{cand} = \underset{w \in \mathcal{V}}{\text{top-}k}\left[\mathbf{w}_{in}^{\top}\,\nabla \log p(y \mid x_{prompt})\right]$; where $\mathbf{w}_{in}$ is the input embedding of word $w$, and the gradient is calculated using the log-likelihood estimate.
7. The aspect-level sentiment analysis method based on prompt words and knowledge graph enhancement according to claim 1, characterized in that, in step S6, the WordNet knowledge graph is introduced as an external knowledge base, and continuous embedding vectors are used to represent the external knowledge, specifically including: first, the external knowledge graph is integrated into a low-dimensional continuous embedding to replace the complex subgraph construction process; knowledge representations relevant to specific aspects are captured through a dot-product attention mechanism; for a given triple, entity embedding is performed using the open-source KGE toolkit OpenKE; the trained knowledge embeddings are used to represent the words in sentences and aspects, and the mapped knowledge is then embedded into the model; this process not only integrates heterogeneous features but also mitigates the negative impact of sparsity and inaccuracy in knowledge embedding.
8. The aspect-level sentiment analysis method based on prompt words and knowledge graph enhancement according to claim 1, characterized in that, in step S7, orthogonal projection deduplication is used in the fusion method to purify features of different granularities, making each feature more independent, so that different features complement each other during fusion rather than canceling out or duplicating one another; first, the prompt feature $h^{p}$ is projected onto the semantic feature $h^{sem}$: $h^{p*} = \mathrm{Proj}(h^{p}, h^{sem})$, $\mathrm{Proj}(x, y) = \frac{x \cdot y}{|y|} \cdot \frac{y}{|y|}$, where Proj() represents the projection function, and x and y are vectors; by projecting again onto the orthogonal direction, a "pure" prompt feature $\tilde{h}^{p}$ that is independent of the semantic feature is obtained, ensuring the independence of the prompt feature: $\tilde{h}^{p} = \mathrm{Proj}(h^{p}, h^{p} - h^{p*})$; syntactic features can have semantic information removed in a similar way, so that syntactic features and semantic features remain independent in multi-granularity feature fusion; the deduplicated syntactic features, prompt features, semantic features, and external knowledge features are multiplied point by point, and normalization is then performed so that the fused features are on the same scale; residual connections are used to chain multiple fusion blocks together, so that the output of each block is passed directly to the next block.
9. The aspect-level sentiment analysis method based on prompt words and knowledge graph enhancement according to claim 1, characterized in that, in step S8, the fused feature is processed using a multilayer perceptron and a softmax function to predict the sentiment polarity of the target aspect words in the sentence: $\hat{y} = \mathrm{softmax}(W_{p}\, r + b_{p})$; where $W_{p}$ and $b_{p}$ are the classifier parameters; during training, the cross-entropy loss function is used: $\mathcal{L} = -\sum_{c} y_{c} \log \hat{y}_{c}$; where $y$ is the one-hot encoding of the true label and $\hat{y}$ is the predicted probability distribution; through the above fusion and training process, the model can effectively integrate multi-granularity features and accurately predict the sentiment polarity of the target aspect words.