[0021] The present invention will be further described below with reference to the accompanying drawings and specific embodiments.
[0022] As shown in Figure 1, an embodiment of the present invention provides a microblog sentiment analysis method based on standard dictionaries and semantic rules, including
[0023] the following steps:
[0024] Step 1. Collect the Weibo dataset
[0025] Collect 10,000 microblogs from Sina Weibo and manually score the sentiment tendency value of each one; sentiment polarity is divided into positive, negative, and neutral, and the scores lie in the interval [-1, 1].
[0026] Step 2. Perform normalized text preprocessing on the microblog data
[0027] Perform text preprocessing on the collected microblog data: delete special characters, separate the microblog emoticons from the text so that each microblog is uniformly divided into a part containing only emoticons and a plain-text part suited to program analysis, and perform word segmentation and sentence segmentation on the plain-text part.
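The preprocessing in Step 2 might be sketched as follows. This is only an illustration under stated assumptions: the text does not specify how emoticons are encoded or which characters count as special, so the bracketed emoticon form (for example "[哈哈]"), the regular expressions, and the function name `preprocess` are all hypothetical. Word segmentation is omitted because it would require a dedicated Chinese word segmenter.

```python
import re

# ASSUMPTION: emoticons appear as short bracketed tokens such as "[哈哈]".
EMOTICON_RE = re.compile(r"\[[^\[\]\s]{1,8}\]")
# ASSUMPTION: anything other than word characters, whitespace, and common
# sentence punctuation is treated as a "special character".
SPECIAL_RE = re.compile(r"[^\w\s，。！？；,.!?;]")
# Clause boundaries are taken to be runs of common punctuation marks.
CLAUSE_SPLIT_RE = re.compile(r"[，。！？；,.!?;]+")


def preprocess(raw):
    """Split a raw microblog into (emoticons, plain_text, clauses)."""
    emoticons = EMOTICON_RE.findall(raw)          # emoticon-only part
    text = EMOTICON_RE.sub("", raw)               # remove emoticons from the text
    text = SPECIAL_RE.sub("", text)               # delete special characters
    text = re.sub(r"\s+", " ", text).strip()      # normalize whitespace
    clauses = [c.strip() for c in CLAUSE_SPLIT_RE.split(text) if c.strip()]
    return emoticons, text, clauses


emoticons, text, clauses = preprocess("今天天气真好[哈哈]，但是堵车太严重了！")
```

Keeping the emoticon part separate from the plain text matches the later step in which the two parts are scored independently and then combined.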
[0028] Step 3. Establish the Weibo standard sentiment dictionary database
[0029] The standard sentiment dictionary used by this algorithm consists of 6 parts: the Weibo sentiment word dictionary, the benchmark dictionary of commendatory words, the benchmark dictionary of derogatory words, the dictionary of degree adverbs, the dictionary of negative adverbs, and the Weibo emoticon dictionary. Each dictionary records attributes such as the word itself, its intensity, its polarity, and its part of speech. The combined dictionaries are imported into a database to create a standard dictionary database for easy word retrieval. Analyzing the sentiment of a microblog requires operations such as sentence segmentation, stop-word removal, and word segmentation. After word segmentation, a microblog consists of words of various kinds, and each word must be looked up in the Weibo standard sentiment dictionary database to determine the sentiment value of the sentiment words in the microblog.
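A dictionary database of this kind could be sketched with an in-memory SQLite table. The table name `lexicon`, the column layout, the dictionary labels, and the sample entries with their intensity values are assumptions made for illustration; the text only states that each entry carries a word, an intensity, a polarity, and a part of speech.

```python
import sqlite3


def build_dictionary_db(entries):
    """Create an in-memory lexicon table and load (word, dictionary,
    intensity, polarity, pos) tuples into it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE lexicon ("
        " word TEXT NOT NULL,"
        " dictionary TEXT NOT NULL,"   # which of the 6 dictionaries the entry is from
        " intensity REAL NOT NULL,"
        " polarity INTEGER NOT NULL,"  # hypothetical coding: 1, -1, or 0
        " pos TEXT,"
        " PRIMARY KEY (word, dictionary))"
    )
    conn.executemany("INSERT INTO lexicon VALUES (?, ?, ?, ?, ?)", entries)
    conn.commit()
    return conn


def lookup(conn, word):
    """Return every dictionary entry for a segmented word (empty list if none)."""
    cursor = conn.execute(
        "SELECT dictionary, intensity, polarity, pos FROM lexicon WHERE word = ?",
        (word,),
    )
    return cursor.fetchall()


# Hypothetical sample entries; real values would come from the 6 dictionaries.
SAMPLE_ENTRIES = [
    ("喜欢", "sentiment", 0.8, 1, "verb"),
    ("讨厌", "sentiment", 0.8, -1, "verb"),
    ("非常", "degree_adverb", 1.5, 0, "adverb"),
    ("不", "negative_adverb", 1.0, -1, "adverb"),
]

conn = build_dictionary_db(SAMPLE_ENTRIES)
rows = lookup(conn, "喜欢")
```

Keying the table on the word and its source dictionary lets one word appear in more than one of the six dictionaries without conflict.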
[0030] Step 4. Establish the core algorithm of Weibo analysis
[0031] The sentiment value of a word, E(w_i), can be expressed as E(w_i) = v × Neg × Deg, where v represents the sentiment word, Neg represents the negative adverb corresponding to the sentiment word, and Deg represents the degree adverb. Let E(S) denote the sentiment value of an entire sentence and E(s_i) the sentiment value of its i-th clause s_i. The clause sentiment value E(s_i) is computed using R_i, the inter-sentence relationship coefficient of the current clause [equation missing from the text]. The sentiment value E(S) of the whole sentence is in turn computed using P_i, the sentence pattern coefficient [equation missing from the text]. E(text) denotes the sentiment value of the text [equation missing from the text]. The microblog text and the microblog emoticons are then combined effectively by fixing the proportion each contributes, so that the final expression of microblog sentiment is E(microblog) = 0.4E(emoticon) + 0.6E(text), where E(emoticon) represents the sentiment value of the microblog emoticons.
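The two formulas that survive in the text, the word-level product and the final 0.4/0.6 weighting, can be illustrated with a short sketch. The clause-level function is an assumption: the equation for E(s_i) is not available here, so a plain sum of word values scaled by R_i is used only as a placeholder. All function names are hypothetical.

```python
def word_sentiment(v, neg=1.0, deg=1.0):
    """E(w_i) = v * Neg * Deg, as stated in the text.

    neg and deg default to 1.0 (no negation, no degree adverb);
    that default is an assumption of this sketch.
    """
    return v * neg * deg


def clause_sentiment(word_values, r_i=1.0):
    """ASSUMPTION: the clause value is the sum of its word values scaled by
    the inter-sentence relationship coefficient R_i. The actual equation
    is not given in the text."""
    return r_i * sum(word_values)


def microblog_sentiment(e_emoticon, e_text):
    """E(microblog) = 0.4 * E(emoticon) + 0.6 * E(text), as stated in the text."""
    return 0.4 * e_emoticon + 0.6 * e_text


# Hypothetical values: a sentiment word of strength 0.5, negated (-1.0)
# and intensified by a degree adverb (2.0).
w = word_sentiment(0.5, neg=-1.0, deg=2.0)
clause = clause_sentiment([w, 0.25], r_i=1.0)
final = microblog_sentiment(e_emoticon=1.0, e_text=clause)
```

With these inputs a positive emoticon partly offsets a negative text value, which is the kind of correction the weighted combination is meant to provide.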
[0032] Step 5. Add semantic rule assistance to the algorithm, and adjust the semantic parameters
[0033] In the core algorithm of step 4, semantic rules are added to assist in analyzing the sentiment value of the microblog text. The semantic rules cover the sentence pattern relationship P_i and the inter-sentence relationship R_i. The sentence patterns considered by this algorithm are exclamatory sentences, interrogative sentences (including rhetorical questions), and declarative sentences; the inter-sentence relationships are adversative (turning) sentences, progressive sentences, and hypothetical sentences. Introducing the sentence pattern relationship and the inter-sentence relationship allows the text to be analyzed from the whole-sentence level down to the clause level.
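One way such rules could be detected is by ending punctuation and marker words, as in the sketch below. The marker lists, the label names, and the priority order among relations are assumptions for illustration; the text names the categories but not how they are recognized.

```python
# ASSUMPTION: small illustrative marker lists; a real system would need
# far more complete lists.
ADVERSATIVE_MARKERS = ("但是", "然而", "可是")
PROGRESSIVE_MARKERS = ("而且", "甚至", "不仅")
HYPOTHETICAL_MARKERS = ("如果", "假如", "要是")
RHETORICAL_MARKERS = ("难道", "怎么能", "何必")


def sentence_pattern(sentence):
    """Classify a sentence as exclamatory, interrogative, rhetorical, or declarative."""
    s = sentence.strip()
    if s.endswith(("？", "?")):
        if any(marker in s for marker in RHETORICAL_MARKERS):
            return "rhetorical"
        return "interrogative"
    if s.endswith(("！", "!")):
        return "exclamatory"
    return "declarative"


def inter_sentence_relation(clause):
    """Classify a clause as adversative, hypothetical, progressive, or none."""
    if any(marker in clause for marker in ADVERSATIVE_MARKERS):
        return "adversative"
    if any(marker in clause for marker in HYPOTHETICAL_MARKERS):
        return "hypothetical"
    if any(marker in clause for marker in PROGRESSIVE_MARKERS):
        return "progressive"
    return "none"


pattern = sentence_pattern("难道这不好吗？")
relation = inter_sentence_relation("但是堵车太严重了")
```

The returned labels would then be mapped to the tuned coefficients P_i and R_i; those numeric values are deliberately left out because the text does not list them.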
[0034] Furthermore, parameter tuning experiments were carried out to optimize the selection of parameter values. The experimental data are microblogs that exhibit the specific sentence patterns and inter-sentence relationships concerned, and each semantic parameter is adjusted in steps of 0.1. The declarative sentence serves as the benchmark sentence pattern, with its sentence pattern parameter set to 1.0. Accuracy on the microblog data is used as the measurement standard: the parameter value at which accuracy reaches its maximum is selected as the parameter for that sentence pattern or inter-sentence relationship.
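The tuning procedure amounts to a one-dimensional grid search, which might look like the sketch below. Only the 0.1 step and the accuracy criterion come from the text; the search range of 0.1 to 2.0, the function names, the toy predictor, and its 0.3 neutral threshold are assumptions for illustration.

```python
def tune_coefficient(samples, predict, low=0.1, high=2.0, step=0.1):
    """Return (best_value, best_accuracy) over a grid of candidate values.

    samples: list of (features, gold_label) pairs.
    predict: function (features, coefficient) -> predicted label.
    ASSUMPTION: the range [low, high] is hypothetical.
    """
    best_value, best_accuracy = None, -1.0
    n_steps = int(round((high - low) / step))
    for k in range(n_steps + 1):
        value = round(low + k * step, 10)  # avoid accumulated float drift
        correct = sum(1 for features, gold in samples
                      if predict(features, value) == gold)
        accuracy = correct / len(samples)
        if accuracy > best_accuracy:       # keep the first maximum found
            best_value, best_accuracy = value, accuracy
    return best_value, best_accuracy


def toy_predict(base_score, coefficient):
    """Hypothetical predictor: scale a base score, then threshold it."""
    score = base_score * coefficient
    if score > 0.3:
        return "positive"
    if score < -0.3:
        return "negative"
    return "neutral"


toy_samples = [(0.2, "positive"), (0.25, "positive"),
               (0.1, "neutral"), (-0.2, "negative")]
best_value, best_accuracy = tune_coefficient(toy_samples, toy_predict)
```

Each sentence pattern or inter-sentence relationship would be tuned this way on the subset of microblogs that exhibit it, with the declarative coefficient held at 1.0.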
[0035] Step 6. Run experiments on the real dataset to obtain the classification accuracy
[0036] Apply the real microblog data obtained in step 1 and step 2 to the complete algorithm of step 4 and step 5, analyze each microblog, and compare the analysis result with the manual annotation. The precision (correct rate), recall, and F-measure on positive, negative, and neutral microblogs are used as the criteria for judging microblog sentiment polarity, and the three values are averaged to obtain the final classification precision. In addition, accuracy is introduced to judge how accurately the microblogs are scored.
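These evaluation measures could be computed as in the following sketch. It assumes that "correct rate" means precision, that the averaging is a macro-average over the three classes, and that accuracy is simple agreement between predicted and annotated labels; the text does not spell out any of these, and the function names are hypothetical.

```python
LABELS = ("positive", "negative", "neutral")


def per_class_metrics(gold, predicted, label):
    """Return (precision, recall, f_measure) for one sentiment class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


def evaluate(gold, predicted):
    """Return per-class metrics, their macro-average, and overall accuracy."""
    per_class = {lab: per_class_metrics(gold, predicted, lab) for lab in LABELS}
    # ASSUMPTION: macro-average of each metric over the three classes.
    macro = tuple(sum(m[i] for m in per_class.values()) / len(LABELS)
                  for i in range(3))
    accuracy = sum(1 for g, p in zip(gold, predicted) if g == p) / len(gold)
    return per_class, macro, accuracy


gold = ["positive", "positive", "negative", "neutral"]
predicted = ["positive", "negative", "negative", "neutral"]
per_class, macro, accuracy = evaluate(gold, predicted)
```

In this toy example one positive microblog is misclassified as negative, which lowers positive recall and negative precision while leaving the neutral class untouched.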
[0037] To verify the effectiveness of the present invention and its performance relative to traditional microblog sentiment analysis methods, a set of comparative experiments was carried out. The classification results of the three methods are shown in Table 1, and their accuracy rates are shown in Table 2.
[0038] Table 1 Comparison of classification results between the method of the present invention and two traditional microblog sentiment analysis methods
[0039]
[0040] In Table 1, because the neutral interval is widened, all three methods show high values for positive and negative microblogs and low values for neutral ones: the denominator in the precision formula increases, so the value decreases. Similarly, because the sentiment lexicon covers negative sentiment words incompletely and rhetorical devices are expressed in varied ways, some negative microblogs cannot be identified accurately, so recall shows the same pattern as precision. As for the F value, when the text sentiment value is misjudged, emoticon weighting can correct it, so the result is more accurate than that of the semantic-rule method.
[0041] Table 2 Comparison of the accuracy of the method of the present invention and two traditional microblog sentiment analysis methods
[0042]
[0043] It can be seen from Table 1 and Table 2 that the method of the present invention draws fully on the advantages of the other two methods, and that both the classification precision and the accuracy are effectively improved in three-class microblog sentiment recognition.