An emotion analysis method based on word vector and part of speech

A sentiment analysis and word vector technology, applied in semantic analysis, special data processing applications, instruments, etc. It addresses the problem that existing methods do not fully consider the influence of word part of speech and semantic information on sentiment analysis results, and it improves time performance and experimental results.

Inactive Publication Date: 2018-12-14
TIANJIN UNIV
Cites: 6; Cited by: 14

AI Technical Summary

Problems solved by technology

[0006] The present invention provides a sentiment analysis method based on word vectors and parts of speech. By combining part of speech with semantics, it overcomes the problem that traditional sentiment analysis methods cannot fully consider the influence of word part of speech and semantic information on sentiment analysis results. See the description below for details:



Examples


Embodiment 1

[0025] Referring to Figure 1, an embodiment of the present invention provides a sentiment analysis method based on word vectors and part of speech. The method comprises the following steps:

[0026] 101: Organize the original corpus;

[0027] Step 101 specifically includes: taking an existing original microblog corpus and matching the Chinese corpus information in it with the corpus label information.

[0028] 102: data preprocessing;

[0029] Remove special symbols in the Weibo text that do not help, or that interfere with, sentiment analysis, such as URLs, @ mentions, the forwarding mark "//" and the content marking information "#content#".
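A minimal sketch of this cleaning step, assuming simple regular expressions are enough. The source names the kinds of noise to remove but not the exact rules, so the patterns and the function name `clean_weibo` below are illustrative assumptions.

```python
import re

# Hypothetical patterns: the source lists the noise types, not the exact rules.
URL_RE = re.compile(r"https?://[\w./?=&%:~+\-]+", re.ASCII)  # ASCII-only so adjacent Chinese text survives
MENTION_RE = re.compile(r"@[\w\-]+[:：]?")                    # "@user", optionally followed by a colon
TOPIC_RE = re.compile(r"#[^#]+#")                            # "#content#" topic markers
FORWARD_RE = re.compile(r"//")                               # forwarding mark


def clean_weibo(text: str) -> str:
    """Strip URLs, @mentions, #topic# markers and '//' forwarding marks."""
    text = URL_RE.sub(" ", text)      # URLs first, because they contain "//"
    text = MENTION_RE.sub(" ", text)
    text = TOPIC_RE.sub(" ", text)
    text = FORWARD_RE.sub(" ", text)
    return " ".join(text.split())     # collapse the whitespace left behind


if __name__ == "__main__":
    print(clean_weibo("今天天气真好 http://t.cn/abc123 //@小明: 同意 #天气#"))
```

Note that the mention pattern stops only at a non-word character, so a mention written directly against following text with no separator would be removed together with that text.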

[0030] 103: Process the preprocessed text according to the part of speech of each word, keep the required adjectives, verbs and negation words, and form the original feature set.
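The part-of-speech filter in step 103 could look roughly like the sketch below. It assumes the text has already been segmented and tagged into (word, tag) pairs. The tag codes ("a" for adjective, "v" for verb) and the short negation list are hypothetical placeholders, since the source does not give the tag set or the negation vocabulary here.

```python
# Hypothetical coarse tag codes; a real tagger may use a different tag set.
KEPT_TAGS = {"a", "v"}  # adjectives and verbs

# Small illustrative negation list, not taken from the source.
NEGATION_WORDS = {"不", "没", "没有", "别", "未"}


def filter_by_pos(tagged_tokens):
    """Keep adjectives, verbs and negation words from (word, tag) pairs."""
    return [
        word
        for word, tag in tagged_tokens
        if tag in KEPT_TAGS or word in NEGATION_WORDS
    ]


if __name__ == "__main__":
    tagged = [("我", "r"), ("不", "d"), ("喜欢", "v"), ("这", "r"),
              ("部", "q"), ("电影", "n"), ("无聊", "a")]
    print(filter_by_pos(tagged))
```

Negation words are matched by surface form rather than by tag, because taggers commonly label them as adverbs and a tag filter alone would drop them.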

[0031] 104: Calculate the TF-IDF value of each word and use it to extract feature words;
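As an illustration of step 104, the sketch below uses the common textbook form of TF-IDF (term frequency times the natural log of the number of documents over document frequency). The visible text does not state which TF-IDF variant or selection rule the method uses, so the formula, the "best score in any document" ranking and the names `tf_idf` and `top_features` are all assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: tf-idf} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    df = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            word: (count / len(doc)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return scores


def top_features(docs, k):
    """Pick the k words with the highest TF-IDF score seen in any document."""
    best = {}
    for doc_scores in tf_idf(docs):
        for word, score in doc_scores.items():
            best[word] = max(best.get(word, 0.0), score)
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


if __name__ == "__main__":
    corpus = [["喜欢", "喜欢", "好"], ["讨厌", "好"], ["好", "无聊"]]
    print(top_features(corpus, 2))
```

A word that occurs in every document, such as "好" in the sample corpus, gets a score of zero under this variant and therefore ranks last.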

[00...

Embodiment 2

[0051] The scheme of Embodiment 1 is described further below with specific examples and mathematical formulas:

[0052] 201: First, obtain the original microblog corpus and organize it: match the corpus information Data with the corpus label information Senti_Label, so that each piece of corpus information corresponds to one label. If the corpus information is positive, it is labeled 1; otherwise, it is labeled 0.
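The pairing of Data with Senti_Label can be sketched as follows. The raw label format is not given in the source, so the string tag "positive" and the function name `build_labeled_corpus` are hypothetical.

```python
def build_labeled_corpus(data, raw_labels, positive_tag="positive"):
    """Pair each text with 1 if its raw label is positive, else 0."""
    if len(data) != len(raw_labels):
        raise ValueError("every corpus item needs exactly one label")
    return [
        (text, 1 if label == positive_tag else 0)
        for text, label in zip(data, raw_labels)
    ]


if __name__ == "__main__":
    print(build_labeled_corpus(["好看", "太差了"], ["positive", "negative"]))
```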

[0053] 202: Perform data preprocessing of the original microblog corpus;

[0054] From the original Weibo corpus, repeated Weibo posts, special symbols such as @, URL and #, and English content are removed in turn. Then BosonNLP is used to segment the Weibo text and tag parts of speech, and meaningless stop words are removed. Finally, each word in the word segmentation result is marked with a corr...

Embodiment 3

[0071] The feasibility of the schemes in Embodiments 1 and 2 is verified below using specific experimental data together with Figure 2 and Figure 3:

[0072] First, experiments compared the effects of the naive Bayes classifier, the nearest neighbor classifier, the support vector machine (SVM) classifier and the random forest classifier on the results, using accuracy (Accuracy), recall rate (RecallRate), F value (F-measure) and precision (Precision) as evaluation criteria. As shown in Figure 2, the results indicate that the support vector machine classifier performs better at sentiment classification on the microblog dataset.

[0073] Figure 2 shows that the Accuracy, Recall, F value and Precision of the SVM are higher than those of the Bayesian classifier, the nearest neighbor classifier and the random forest classifier...
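For reference, the four evaluation criteria can be computed from binary labels as in the sketch below. It assumes label 1 is the positive class and that the F value is the balanced F1 score; the source does not state which F-measure weighting was used, and no figures from the experiments are reproduced here.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for 0/1 labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    print(binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
```

Running the same function on the predictions of each classifier gives a like-for-like comparison across the four criteria.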


Abstract

The invention discloses a sentiment analysis method based on word vectors and part of speech, comprising the following steps: obtaining an original microblog corpus and matching the Chinese corpus information in it with the corpus label information; removing special symbols that do not contribute to, or that interfere with, sentiment analysis; processing the preprocessed text according to the part of speech of each word to form the original feature set; calculating the TF-IDF value of the words in the microblog data and extracting feature words according to those values; building a word vector dictionary in which each entry consists of a word and its corresponding word vector; combining the feature words with the word vector dictionary to form a dictionary of feature words and their word vectors; calculating the vector of each piece of microblog data to obtain the vectors of all microblog data; and, from the training data, establishing microblog sentiment classification models for sentiment analysis.
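One way to turn feature words and their word vectors into a single vector per microblog is sketched below. The abstract does not say how the per-microblog vector is calculated, so the plain average used here is an assumption, as are the function name `document_vector` and the toy two-dimensional vectors.

```python
def document_vector(tokens, feature_words, word_vectors, dim):
    """Average the vectors of tokens that are feature words with a known vector."""
    selected = [
        word_vectors[word]
        for word in tokens
        if word in feature_words and word in word_vectors
    ]
    if not selected:
        return [0.0] * dim  # no usable feature word: fall back to a zero vector
    return [
        sum(vec[i] for vec in selected) / len(selected)
        for i in range(dim)
    ]


if __name__ == "__main__":
    # Toy dictionary; real word vectors would come from a trained model.
    vectors = {"喜欢": [1.0, 0.0], "无聊": [0.0, 1.0], "电影": [5.0, 5.0]}
    features = {"喜欢", "无聊"}
    print(document_vector(["喜欢", "电影", "无聊"], features, vectors, 2))
```

The resulting fixed-length vectors can then be passed to any standard classifier as training input.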

Description

Technical field

[0001] The present invention relates to the fields of natural language processing, data mining, text analysis, computational linguistics and machine learning, and involves text preprocessing, feature extraction, sentiment analysis and machine learning classification techniques; in particular, it relates to a sentiment analysis method based on word vectors and part of speech.

Background

[0002] At present, Chinese microblog sentiment analysis methods fall into two categories: methods based on a sentiment dictionary and methods based on machine learning. The dictionary-based method relies mainly on the sentiment dictionary and takes the sum of the sentiment polarity values of a microblog sentence as the sentiment polarity of that sentence; it can be divided into word feature level and sentence level sentiment discrimi...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
CPC: G06F40/242, G06F40/289, G06F40/30
Inventors: 刘春凤, 张妍, 于健, 喻梅, 徐天一, 曹雅茹
Owner: TIANJIN UNIV