Microblog emotion analysis method based on standard dictionaries and semantic rules

A microblog sentiment analysis technology, applied in the field of pattern recognition. It addresses two problems of existing methods: they ignore the contextual relationships between words and the syntactic rules of sentences, and they perform poorly on short microblog text. The aim is higher classification accuracy.

Inactive Publication Date: 2016-12-07
BEIJING UNIV OF TECH
3 Cites 13 Cited by

AI-Extracted Technical Summary

Problems solved by technology

However, in the process of analyzing sentiment, traditional algorithms still face some important problems to be solved: 1) Although the semantic sentiment analysis algorithm can isolate words from sentences, it ignores the ...

Abstract

The invention discloses a microblog emotion analysis method based on standard dictionaries and semantic rules. The method comprises the following steps: collecting microblog data and manually labeling the emotion value of each microblog; building the corresponding standard microblog emotion dictionaries and establishing an emotion dictionary database; adding semantic rules on top of the standard emotion dictionaries and tuning the parameters of the semantic rules; and running experiments on a real dataset to obtain the final classification accuracy and precision. By introducing the standard emotion dictionaries, the microblog emoticon dictionaries and the semantic rules, the technical scheme analyzes the emotional tendency of each microblog user well and therefore achieves higher classification accuracy and precision.

Application Domain

Data processing applications, Web data indexing (+2 more)

Technology Topic

Accuracy and precision, Data set (+5 more)

Examples

Example Embodiment

[0021] The present invention will be further described below with reference to the accompanying drawings and specific embodiments.
[0022] As shown in Figure 1, an embodiment of the present invention provides a microblog sentiment analysis method based on standard dictionaries and semantic rules, comprising
[0023] the following steps:
[0024] Step 1. Collect Weibo dataset
[0025] Collect 10,000 microblogs from Sina Weibo and manually score the sentiment tendency of each one. Sentiment polarity is divided into positive, negative and neutral, and scores lie in the interval [-1, 1].
[0026] Step 2. Do normalized text preprocessing on the microblog data
[0027] Perform text preprocessing on the collected microblog data: delete special characters, extract the microblog emoticons from the text, and uniformly divide each microblog into a part that contains only the emoticons and a plain-text part that is suited to program analysis. Then perform word segmentation and sentence segmentation on the text part.
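The preprocessing in Step 2 could be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: it assumes emoticons appear as short bracketed labels such as "[哈哈]", treats URLs and hashtag markers as the "special characters", and splits clauses on common punctuation. The names `preprocess`, `EMOTICON_RE`, `NOISE_RE` and `CLAUSE_RE` are hypothetical. Word segmentation is left out because it needs a dedicated Chinese segmenter.

```python
import re

# Assumption: emoticons appear as short bracketed labels such as "[哈哈]".
EMOTICON_RE = re.compile(r"\[([^\[\]\s]{1,8})\]")
# Hypothetical notion of "special characters": URLs and hashtag markers.
NOISE_RE = re.compile(r"https?://[A-Za-z0-9./?=&_%-]+|#")
# A clause is a run of non-punctuation followed by its closing punctuation.
CLAUSE_RE = re.compile(r"[^，,。！!？?；;]+[，,。！!？?；;]*")


def preprocess(raw):
    """Split a raw microblog into (emoticon labels, text clauses)."""
    emoticons = EMOTICON_RE.findall(raw)
    # Remove emoticons first, then the noise, leaving plain text only.
    text = NOISE_RE.sub("", EMOTICON_RE.sub("", raw))
    clauses = [c.strip() for c in CLAUSE_RE.findall(text) if c.strip()]
    return emoticons, clauses


emoticons, clauses = preprocess("今天天气真好[哈哈]！但是我很累。http://t.cn/abc")
print(emoticons)
print(clauses)
```

Closing punctuation is kept on each clause on purpose, since the sentence pattern (exclamatory, interrogative, declarative) is later judged from it.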
[0028] Step 3. Establish the Weibo standard sentiment dictionary database
[0029] The standard sentiment dictionary used by the algorithm consists of six parts: the Weibo sentiment word dictionary, the benchmark dictionary of commendatory words, the benchmark dictionary of derogatory words, the dictionary of degree adverbs, the dictionary of negative adverbs, and the microblog emoticon dictionary. Each dictionary entry records attributes such as the word, its intensity, its polarity and its part of speech. The combined dictionaries are imported into a database to create a standard dictionary database for easy word retrieval. Analyzing the sentiment of a microblog requires sentence segmentation, stop-word removal and word segmentation. After word segmentation, the microblog consists of words that play various roles, and each word is looked up in the Weibo standard sentiment dictionary database to determine the sentiment value of the sentiment words in the microblog.
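A minimal sketch of such a dictionary database, using Python's built-in `sqlite3` module, might look like this. The table layout, the column names, the dictionary tags and every sample entry are assumptions made for illustration; the source only states that entries carry a word, an intensity, a polarity and a part of speech.

```python
import sqlite3


def build_dictionary_db(entries):
    """Create an in-memory lexicon table and load the given entries."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE lexicon ("
        " word TEXT NOT NULL,"        # the word itself
        " dictionary TEXT NOT NULL,"  # which of the six dictionaries it is from
        " intensity REAL NOT NULL,"   # word intensity
        " polarity INTEGER NOT NULL," # 1 positive, -1 negative, 0 none
        " pos TEXT,"                  # part of speech
        " PRIMARY KEY (word, dictionary))"
    )
    conn.executemany("INSERT INTO lexicon VALUES (?, ?, ?, ?, ?)", entries)
    conn.commit()
    return conn


def lookup(conn, word):
    """Return every dictionary entry recorded for a segmented word."""
    return conn.execute(
        "SELECT dictionary, intensity, polarity, pos FROM lexicon WHERE word = ?",
        (word,),
    ).fetchall()


# Hypothetical sample entries; the values are invented for the example.
SAMPLE = [
    ("开心", "sentiment", 0.8, 1, "adj"),
    ("难过", "sentiment", 0.7, -1, "adj"),
    ("非常", "degree", 1.5, 0, "adv"),
    ("不", "negation", 1.0, -1, "adv"),
]
conn = build_dictionary_db(SAMPLE)
print(lookup(conn, "开心"))
```

A word that is absent from every dictionary simply returns an empty list, which a caller can treat as carrying no sentiment.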
[0030] Step 4. Establish the core algorithm of Weibo analysis
[0031] The sentiment value of a word, E(w_i), can be expressed as E(w_i) = v × Neg × Deg, where v is the value of the sentiment word, Neg is the coefficient of the negative adverb that modifies the sentiment word, and Deg is the coefficient of the degree adverb. Let E(S) denote the sentiment value of the whole sentence and E(s_i) the sentiment value of its i-th clause s_i. E(s_i) is given by [formula not reproduced in the source], where R_i is the inter-sentence relationship coefficient of the current clause. The sentiment value of the whole sentence, E(S), is given by [formula not reproduced in the source], where P_i is the sentence pattern coefficient. Let E(text) denote the sentiment value of the text; it is given by [formula not reproduced in the source]. Microblog text and microblog emoticons are then combined by fixing the weight of each part, so the final microblog sentiment value is E(microblog) = 0.4 E(emoticon) + 0.6 E(text), where E(emoticon) is the sentiment value of the microblog emoticons.
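The scoring chain of Step 4 can be sketched as below. Only two formulas survive in the source: E(w_i) = v × Neg × Deg and E(microblog) = 0.4 E(emoticon) + 0.6 E(text). The formulas for E(s_i), E(S) and E(text) are missing, so the aggregation used here (a coefficient times a sum, and a mean over sentences) is purely an assumption, and all function names are hypothetical.

```python
def word_value(v, neg=1.0, deg=1.0):
    """E(w_i) = v * Neg * Deg, as stated in the source."""
    return v * neg * deg


def clause_value(word_values, r=1.0):
    # ASSUMPTION: the clause value is R_i times the sum of its word values.
    return r * sum(word_values)


def sentence_value(clause_values, p=1.0):
    # ASSUMPTION: the sentence value is the pattern coefficient times
    # the sum of its clause values.
    return p * sum(clause_values)


def text_value(sentence_values):
    # ASSUMPTION: the text value is the mean of its sentence values.
    if not sentence_values:
        return 0.0
    return sum(sentence_values) / len(sentence_values)


def microblog_value(e_emoticon, e_text):
    """E(microblog) = 0.4 * E(emoticon) + 0.6 * E(text), as stated in the source."""
    return 0.4 * e_emoticon + 0.6 * e_text


# Hypothetical numbers: "not very happy" with a mildly positive emoticon.
e_word = word_value(0.8, neg=-1.0, deg=1.5)
e_text = text_value([sentence_value([clause_value([e_word])])])
print(microblog_value(0.5, e_text))
```

With all coefficients left at 1.0 the chain reduces to the word value itself, which makes it easy to see how each coefficient changes the result once real values are plugged in.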
[0032] Step 5. Add semantic rule assistance to the algorithm, and adjust the semantic parameters
[0033] Semantic rules are added to the core algorithm of Step 4 to assist in analyzing the sentiment value of microblog text. The semantic rules cover the sentence pattern coefficient P_i and the inter-sentence relation coefficient R_i. The sentence patterns considered by the algorithm are exclamatory sentences, interrogative sentences (including rhetorical questions) and declarative sentences; the inter-sentence relations are the adversative (turning), progressive and hypothetical relations. Introducing sentence patterns and inter-sentence relations lets the text be analyzed from the whole-sentence level down to the clause level.
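One way to detect these categories is sketched below. The source does not say how sentence patterns or inter-sentence relations are recognized, so both the punctuation test and the connective word lists are assumptions, and the names `sentence_pattern` and `clause_relation` are hypothetical. Rhetorical questions are not separated from ordinary questions here, since that needs more than punctuation.

```python
# Hypothetical connective lists; a real system would need fuller ones.
RELATION_MARKERS = (
    ("adversative", ("但是", "可是", "然而", "不过")),
    ("progressive", ("而且", "并且", "甚至")),
    ("hypothetical", ("如果", "假如", "要是")),
)


def sentence_pattern(sentence):
    """Classify a sentence by its closing punctuation (an assumption)."""
    s = sentence.rstrip()
    if s.endswith(("?", "？")):
        return "interrogative"
    if s.endswith(("!", "！")):
        return "exclamatory"
    return "declarative"


def clause_relation(clause):
    """Return the first inter-sentence relation whose marker occurs in the clause."""
    for name, markers in RELATION_MARKERS:
        if any(marker in clause for marker in markers):
            return name
    return "none"


print(sentence_pattern("今天真开心！"), clause_relation("但是我很累"))
```

Each returned label would then select a coefficient: P_i for the sentence pattern and R_i for the relation.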
[0034] Parameter tuning experiments were then carried out to select the parameter values. The experimental data are microblogs that contain the specific sentence patterns and inter-sentence relations in question, and each semantic parameter is adjusted across its range in steps of 0.1. The declarative sentence serves as the benchmark sentence pattern, with its parameter value fixed at 1.0. Accuracy on the microblog data is the measurement standard: the parameter value at which the accuracy reaches its maximum is selected as the parameter of that sentence pattern or inter-sentence relation.
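The tuning procedure amounts to a one-dimensional grid search, which might be sketched as follows. The search range [0.0, 2.0], the classification threshold of 0.5 and the toy samples are all assumptions for illustration; the source only fixes the step of 0.1 and the use of accuracy as the selection criterion. `tune_coefficient` and `predict` are hypothetical names.

```python
def tune_coefficient(samples, predict, lo=0.0, hi=2.0, step=0.1):
    """Grid-search one coefficient and return (best_value, best_accuracy).

    samples: list of (features, gold_label) pairs.
    predict: function (features, coeff) -> predicted label.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    n_steps = int(round((hi - lo) / step))
    best_value, best_acc = None, -1.0
    for k in range(n_steps + 1):
        coeff = round(lo + k * step, 10)  # avoid floating-point drift
        correct = sum(1 for x, gold in samples if predict(x, coeff) == gold)
        acc = correct / len(samples)
        if acc > best_acc:  # keep the first value that reaches the maximum
            best_value, best_acc = coeff, acc
    return best_value, best_acc


def predict(raw_value, coeff, threshold=0.5):
    # Hypothetical classifier: scale the raw value, then threshold it.
    scaled = coeff * raw_value
    if scaled > threshold:
        return "positive"
    if scaled < -threshold:
        return "negative"
    return "neutral"


# Toy data: (unscaled sentence value, manually annotated polarity).
samples = [(0.4, "positive"), (0.3, "neutral"), (-0.45, "negative"), (0.1, "neutral")]
best_value, best_acc = tune_coefficient(samples, predict)
print(best_value, best_acc)
```

When several grid points tie on accuracy, this sketch keeps the smallest one; the source does not say how ties are resolved.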
[0035] Step 6. Run experiments on the real dataset to obtain the classification accuracy
[0036] Apply the complete algorithm of Steps 4 and 5 to the real microblog data obtained in Steps 1 and 2, analyze each microblog, and compare the result with the manual annotation. The correct rate, recall rate and F-measure on positive, negative and neutral microblogs are used as the criteria for judging sentiment polarity classification, and the three values are averaged to obtain the final classification correct rate. In addition, an accuracy rate is introduced to judge the accuracy of the microblog sentiment scores.
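The per-class evaluation could be computed along these lines. The sketch reads "correct rate" as precision and "averaged" as a macro average over the three classes; both readings are assumptions, since the source does not give the formulas. The function names and the toy labels are hypothetical.

```python
LABELS = ("positive", "negative", "neutral")


def per_class_metrics(gold, pred, labels=LABELS):
    """Return {label: (precision, recall, f_measure)} for each class."""
    metrics = {}
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f_measure = 2 * precision * recall / total if total else 0.0
        metrics[label] = (precision, recall, f_measure)
    return metrics


def macro_average(metrics):
    """Average precision, recall and F-measure over all classes."""
    n = len(metrics)
    return tuple(sum(m[i] for m in metrics.values()) / n for i in range(3))


# Toy annotations, invented for the example.
gold = ["positive", "positive", "negative", "neutral"]
pred = ["positive", "neutral", "negative", "neutral"]
metrics = per_class_metrics(gold, pred)
print(metrics)
print(macro_average(metrics))
```

A class that is never predicted gets a precision of 0.0 here rather than raising an error, which keeps the average defined on small test sets.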
[0037] To verify the effectiveness of the present invention and compare its performance with traditional microblog sentiment analysis methods, a set of comparative experiments was carried out. The classification results of the three methods are shown in Table 1, and the accuracy rates are shown in Table 2.
[0038] Table 1 Comparison of classification results between the method of the present invention and two traditional microblog sentiment analysis methods
[0039]
[0040] In Table 1, because the neutral interval is expanded, all three methods show high values on positive and negative microblogs and low values on neutral ones: the denominator in the formula for the correct rate increases, so the value decreases. Similarly, because the negative sentiment words in the sentiment lexicon are incomplete and rhetorical devices are expressed in varied ways, some negative microblogs cannot be identified accurately, so the recall rate shows the same pattern as the correct rate. For the F value, when the text sentiment value is misjudged, the emoticon weighting can correct it, so the result is more accurate than that of the semantic-rule method.
[0041] Table 2 Comparison of the accuracy of the method of the present invention and two traditional microblog sentiment analysis methods
[0042]
[0043] Tables 1 and 2 show that the method of the present invention draws on the advantages of the two traditional methods, and that both the classification correct rate and the scoring accuracy are effectively improved in three-class microblog sentiment recognition.
