A Sentiment Analysis Method Based on Word Vector and Part of Speech

A sentiment analysis and word vector technology, applied to semantic analysis, text database clustering/classification, and unstructured text data retrieval. It addresses the problem that existing methods do not fully consider the impact of word part of speech and semantic information on sentiment analysis results, and achieves improved time performance and improved experimental results.

Status: Inactive
Publication Date: 2022-02-11
TIANJIN UNIV
6 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0006] The present invention provides a sentiment analysis method based on word vectors and parts of speech. By combining part of speech with semantics, it effectively overcomes the problem that traditional sentiment analysis methods cannot fully consider the influence of word part of speech and semantic information on sentiment analysis results. See the description below for details:

Examples

Embodiment 1

[0025] Referring to Figure 1, this embodiment of the present invention provides a sentiment analysis method based on word vectors and part of speech. The method comprises the following steps:

[0026] 101: Organize the original corpus;

[0027] Step 101 specifically includes: taking the existing original microblog corpus and matching the Chinese corpus information in it with the corpus label information.

[0028] 102: Preprocess the data;

[0029] Remove special symbols from the Weibo text that contribute nothing to, or interfere with, sentiment analysis, such as URLs, @ marks, forwarding marks "//", and content-marking tags of the form "#content#".
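The cleanup in step 102 can be sketched with regular expressions. This is a minimal illustration, not the patent's implementation: the patterns, the function name `clean_weibo`, and the choice to also drop the colon that usually follows a forwarded user name are all assumptions.

```python
import re

# Hypothetical patterns; the source names the symbol types but not how to match them.
URL_RE = re.compile(r"https?://\S+")
TOPIC_RE = re.compile(r"#[^#]+#")          # content tags of the form "#content#"
MENTION_RE = re.compile(r"@[\w\-]+[::]?")  # "@user", plus an optional trailing colon
FORWARD_RE = re.compile(r"//")             # forwarding mark


def clean_weibo(text: str) -> str:
    """Strip URLs, topic tags, @ mentions, and forwarding marks from one post."""
    # URLs go first so that the "//" inside "http://" is not treated as a forwarding mark.
    text = URL_RE.sub(" ", text)
    text = TOPIC_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = FORWARD_RE.sub(" ", text)
    # Collapse the whitespace left behind by the removals.
    return " ".join(text.split())


print(clean_weibo("今天天气真好 #晴天# //@小明: 同意 http://t.cn/abc"))
```

Because `\w` also matches Chinese characters, a mention written without a following space or colon would swallow the adjacent text; a production pattern would need a stricter user-name rule.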

[0030] 103: Filter the preprocessed text by part of speech, keeping the required adjectives, verbs, and negation words to form the original feature set.
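Step 103 can be illustrated on text that has already been segmented and tagged. The sketch below assumes `(word, tag)` pairs with the tags `"a"` for adjectives and `"v"` for verbs, and a small hand-written negation list; the tag set, the list, and the function name are hypothetical, since the source does not specify them at this point.

```python
# Assumed tag codes and negation lexicon, for illustration only.
KEEP_TAGS = {"a", "v"}                      # adjective, verb
NEGATION_WORDS = {"不", "没", "没有", "别"}


def filter_by_pos(tagged_words):
    """Keep adjectives, verbs, and negation words from a list of (word, tag) pairs."""
    return [
        word
        for word, tag in tagged_words
        if tag in KEEP_TAGS or word in NEGATION_WORDS
    ]


tagged = [("我", "r"), ("不", "d"), ("喜欢", "v"),
          ("这个", "r"), ("糟糕", "a"), ("电影", "n")]
print(filter_by_pos(tagged))
```

Negation words are matched by surface form rather than by tag because taggers usually label them as generic adverbs.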

[0031] 104: Calculate the TF-IDF value of each word, and use these values to extract the feature words;
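A minimal sketch of step 104 follows. It assumes the common definitions tf = count / document length and idf = ln(N / document frequency), and ranks each word by its highest TF-IDF score in any document; the exact variant, the selection rule, and the names `tfidf_scores` and `select_features` are assumptions, not taken from the source.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return one {word: tf-idf} dict per document; docs is a list of token lists."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))           # count each word once per document
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


def select_features(docs, top_k):
    """Pick the top_k words, ranked by their highest tf-idf in any document."""
    best = {}
    for doc_scores in tfidf_scores(docs):
        for word, score in doc_scores.items():
            best[word] = max(best.get(word, 0.0), score)
    ranked = sorted(best, key=lambda w: (-best[w], w))
    return ranked[:top_k]


docs = [["喜欢", "好"], ["讨厌", "好"], ["喜欢", "开心"]]
print(select_features(docs, 2))
```

With this definition a word that occurs in every document scores zero, so it is never ranked above a more distinctive word.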

[00...

Embodiment 2

[0051] The scheme of Embodiment 1 is described further below with specific examples and mathematical formulas; see the following description for details:

[0052] 201: First, obtain the original microblog corpus and organize it: the corpus information Data in the original microblog corpus is matched with the corpus label information Senti_Label, so that each piece of corpus information corresponds to one label. If the corpus information is positive, it is labeled 1; otherwise, it is labeled 0.
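The pairing in step 201 can be sketched as below. The raw label string `"positive"` and the function name `match_labels` are assumptions; the source only states that positive corpus information is marked 1 and everything else 0.

```python
def match_labels(data, raw_labels):
    """Pair each post in Data with a binary Senti_Label (1 = positive, 0 = otherwise)."""
    if len(data) != len(raw_labels):
        raise ValueError("every piece of corpus information needs exactly one label")
    return [
        (text, 1 if label == "positive" else 0)
        for text, label in zip(data, raw_labels)
    ]


print(match_labels(["今天很开心", "太糟糕了"], ["positive", "negative"]))
```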

[0053] 202: Perform data preprocessing of the original microblog corpus;

[0054] Duplicate posts, special symbols such as @, URLs, and #, and English content are removed in turn from the original Weibo corpus. Then BostonNLP is used to segment the Weibo text and tag parts of speech, and meaningless stop words are removed. Finally, each word in the word segmentation result is marked with a corr...

Embodiment 3

[0071] The feasibility of the schemes in Embodiments 1 and 2 is verified below using specific experimental data together with Figures 2 and 3; see the following description for details:

[0072] First, experiments were run to compare the effect of the Naive Bayes classifier, nearest neighbor classifier, support vector machine (SVM) classifier, and random forest classifier on the results, using accuracy (Accuracy), recall (RecallRate), F-measure, and precision (Precision) as evaluation criteria. As shown in Figure 2, the experimental results show that the support vector machine classifier performs better at sentiment classification on the microblog dataset.
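The four evaluation criteria can be computed from the confusion counts of a binary classifier. The sketch below assumes label 1 is the positive class and that the F-measure is the balanced F1 score; the source does not state which F variant was used, and the function name `binary_metrics` is hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for 0/1 labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


print(binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

Running the same function on the predictions of each classifier gives directly comparable numbers for the kind of comparison described above.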

[0073] In Figure 2, it can be seen that the Accuracy, Recall, F-measure, and Precision of the SVM are higher than those of the Bayesian classifier, nearest neighbor classifier, and random forest classifier...

Abstract

The invention discloses a sentiment analysis method based on word vectors and parts of speech, comprising: obtaining the original microblog corpus, and matching the Chinese corpus information in the original microblog corpus with the corpus label information; removing special symbols from the microblog text that contribute nothing to, or interfere with, sentiment analysis; processing the preprocessed text according to the part of speech of each word to form the original feature set; calculating the TF-IDF value of each word in the microblog data, and then extracting the feature words according to TF-IDF; building a word vector dictionary in which each entry consists of a word and its corresponding word vector; combining the feature words with the word vector dictionary to form a feature word and word vector dictionary; calculating the vector of each microblog text, and finally obtaining the vectors of all microblog data; and, according to the training data, establishing the respective sentiment classification models for the microblog data and performing sentiment analysis.
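The dictionary-combination and text-vector steps can be sketched as follows. The abstract does not say how word vectors are combined into a text vector, so the unweighted mean used here is an assumption, as are the toy vectors and the names `build_feature_vectors` and `text_vector`.

```python
def build_feature_vectors(feature_words, word_vectors):
    """Restrict the word vector dictionary to the extracted feature words."""
    return {w: word_vectors[w] for w in feature_words if w in word_vectors}


def text_vector(tokens, feature_vectors, dim):
    """Average the vectors of the feature words found in one post (assumed rule)."""
    vectors = [feature_vectors[w] for w in tokens if w in feature_vectors]
    if not vectors:
        return [0.0] * dim                  # no feature word present in this post
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Toy two-dimensional vectors, for illustration only.
word_vectors = {"喜欢": [1.0, 0.0], "讨厌": [0.0, 1.0], "电影": [0.5, 0.5]}
features = build_feature_vectors(["喜欢", "讨厌"], word_vectors)
print(text_vector(["喜欢", "电影", "讨厌"], features, dim=2))
```

The resulting fixed-length vectors are what a classifier such as an SVM would take as input; a TF-IDF-weighted mean would be a natural alternative to the plain average.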

Description

Technical field

[0001] The present invention relates to the fields of natural language processing, data mining, text analysis, computational linguistics, and machine learning, and involves text preprocessing, feature extraction, sentiment analysis, and machine learning classification techniques; in particular, it relates to a sentiment analysis method based on word vectors and part of speech.

Background technique

[0002] At present, Chinese microblog sentiment analysis methods fall into two categories: methods based on a sentiment dictionary and methods based on machine learning. The dictionary-based method relies mainly on the sentiment dictionary and takes the sum of the sentiment polarity values of a microblog sentence as the sentiment polarity of the sentence; it can be divided into word feature level and sentence level sentiment discrimi...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06F40/242, G06F40/289, G06F40/30, G06F40/284, G06K9/62
CPC: G06F40/242, G06F40/289, G06F40/30
Inventor: 刘春凤, 张妍, 于健, 喻梅, 徐天一, 曹雅茹
Owner: TIANJIN UNIV