A Text Positive and Negative Sentiment Classification Method

A sentiment classification technology for text, applied in the fields of natural language processing and machine learning. It addresses the problems that existing methods fail to capture the semantic information of emotional expressions effectively, fail to reflect the sentiment classification ability of words, and do not take an IDF smoothing factor into account. The result is improved portability and interpretability, a better classification effect, and higher classification accuracy.

Inactive Publication Date: 2020-09-08
HUBEI NORMAL UNIV

AI Technical Summary

Problems solved by technology

[0003] TF-IDF is the most commonly used feature weighting method in sentiment classification tasks. Researchers in China and abroad have proposed many TF-IDF variants, including delta TF-IDF [1], TF-RF [2], SentiStrength [3], and TF-KL [4]. The main problems with these schemes are: 1) they fail to capture the semantic information in emotional expressions effectively; and 2) they fail to reflect the sentiment classification ability carried by the words themselves.
Among them, the delta TF-IDF scheme proposed by Martineau and Finin computes word scores effectively and improves the accuracy of text sentiment classification with a Support Vector Machine (SVM) classifier. However, it does not include an IDF smoothing factor: if a sentiment word does not appear in any text of the positive class or of the negative class, a division-by-zero error occurs.
The supervised TF-RF term weighting method proposed by Tam T deliberately increases the importance of terms in positive texts and works well for positive/negative text classification. However, it assigns very low weights to terms of a class with insufficient training samples, a problem that is especially prominent in classification tasks on imbalanced corpora.
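
The division-by-zero problem described for delta TF-IDF can be illustrated with a minimal Python sketch. It assumes the usual formulation in which the score is the term count multiplied by the difference between the IDF over the positive set and the IDF over the negative set. The function names, the sign convention, and the smoothing constant `k = 0.5` are illustrative assumptions; they are not taken from the patent, whose own adjustable formula is not reproduced in this excerpt.

```python
import math

def delta_idf(n_pos, n_neg, df_pos, df_neg):
    """Unsmoothed delta IDF: log2(|P| / P_t) - log2(|N| / N_t).

    n_pos, n_neg: number of positive / negative training documents.
    df_pos, df_neg: number of positive / negative documents containing the term.
    In this sketch the value is positive when the term is relatively rarer
    in the positive set (sign convention is an assumption).
    """
    return math.log2(n_pos / df_pos) - math.log2(n_neg / df_neg)

def smoothed_delta_idf(n_pos, n_neg, df_pos, df_neg, k=0.5):
    """Same quantity written as one log ratio, with an assumed constant k
    added to numerator and denominator so neither can be zero."""
    return math.log2((n_pos * df_neg + k) / (n_neg * df_pos + k))

def delta_tfidf(tf, n_pos, n_neg, df_pos, df_neg, k=0.5):
    """Term count in the document times the smoothed delta IDF."""
    return tf * smoothed_delta_idf(n_pos, n_neg, df_pos, df_neg, k)

if __name__ == "__main__":
    # A term seen in 10 of 100 positive and 40 of 100 negative documents.
    print(delta_idf(100, 100, 10, 40))
    # A term that never occurs in the positive class breaks the unsmoothed form.
    try:
        delta_idf(100, 100, 0, 5)
    except ZeroDivisionError:
        print("unsmoothed delta IDF fails when a class never contains the term")
    print(smoothed_delta_idf(100, 100, 0, 5))
```

With smoothing, a word that occurs in only one class receives a large but finite weight instead of raising an error, which is the behaviour a sentiment word with strong class preference needs.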

Examples

Embodiment Construction

[0023] The present invention is further described below in conjunction with an embodiment.

[0024] This embodiment provides a method for classifying positive and negative sentiment in text, comprising the following steps.

[0025] Step 1: Preprocess all texts in the text collection, including removing HTML tags, punctuation marks, emoticons, and numbers, to form a noise-free collection of positive and negative texts. The text collection is expressed as $D = \{d_1, d_2, \ldots, d_m\}$, where $d_m$ denotes the $m$-th sentence or document in the text collection $D$.
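
The cleaning in Step 1 could look roughly like the following Python sketch, which uses only the standard library. The regular expressions, the set of ASCII emoticons recognised, and the name `clean_text` are assumptions made for illustration; the patent does not specify how each kind of noise is detected.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")                       # HTML tags such as <br/>
# A small, assumed set of ASCII emoticons like :D  :-)  ;P  =(
EMOTICON_RE = re.compile(r"(?<!\w)[:;=8][\-o\*']?[\)\]\(\[dDpP/\\|](?!\w)")
# Punctuation, symbols (including emoji), digits and underscores
NON_WORD_RE = re.compile(r"[^\w\s]|\d|_")
WS_RE = re.compile(r"\s+")

def clean_text(raw):
    """Remove HTML tags, emoticons, punctuation and numbers from one text."""
    text = TAG_RE.sub(" ", raw)
    text = html.unescape(text)          # turn entities like &amp; into characters
    text = EMOTICON_RE.sub(" ", text)   # must run before punctuation is stripped
    text = NON_WORD_RE.sub(" ", text)
    return WS_RE.sub(" ", text).strip()

if __name__ == "__main__":
    raw_texts = ["Great movie :D <br/> 10/10!!! :-)", "Tom &amp; Jerry"]
    D = [clean_text(t) for t in raw_texts]   # the noise-free collection D
    print(D)
```

ASCII emoticons are removed before punctuation so that a letter such as the `D` in `:D` is not left behind as a spurious token.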

[0026] Step 2: Use the bag-of-words model to split the positive and negative example texts into unigrams and bigrams, forming a multi-dimensional feature vector space without duplicate features.
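
A minimal Python sketch of Step 2 follows. It assumes whitespace tokenisation (texts in languages without spaces would need a word segmenter first) and an optional stop-word set applied to unigrams only; both choices, and all function names, are illustrative assumptions rather than details given in the patent.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (joined by a space) in token order."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def extract_features(text, stop_words=frozenset()):
    """Unigram plus bigram features of one cleaned text."""
    tokens = text.split()
    unigrams = [t for t in tokens if t not in stop_words]
    return unigrams + ngrams(tokens, 2)

def build_vocabulary(texts, stop_words=frozenset()):
    """Map each distinct feature to a column index (no duplicates)."""
    vocab = {}
    for text in texts:
        for feat in extract_features(text, stop_words):
            vocab.setdefault(feat, len(vocab))
    return vocab

def count_vector(text, vocab, stop_words=frozenset()):
    """Raw feature counts of one text, aligned with the vocabulary columns."""
    counts = Counter(extract_features(text, stop_words))
    vec = [0] * len(vocab)
    for feat, c in counts.items():
        if feat in vocab:
            vec[vocab[feat]] = c
    return vec

if __name__ == "__main__":
    texts = ["good movie", "not good movie"]
    vocab = build_vocabulary(texts)
    print(vocab)
    print([count_vector(t, vocab) for t in texts])
```

Keeping bigrams such as `not good` as their own dimensions lets a linear classifier weight a negated phrase differently from the unigram `good`.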

[0027] Step 3: Using a feature weight calculation method with adjustable parameters, compute the variant term frequency-inverse document frequency for each feature dimension in the multi-dimensional featur...

Abstract

The invention discloses a method for classifying positive and negative sentiment in text. All texts in a text collection are preprocessed to form a noise-free collection of positive and negative texts. The positive and negative texts are split into unigrams and bigrams and, after stop words are removed, a multi-dimensional feature vector space without duplicate features is formed. A variant term frequency-inverse document frequency weight is then computed for every feature dimension in that space. Finally, the resulting term-document matrix, together with the labelled positive and negative sentiment category tags, is used as input to train supervised classifiers (a support vector machine and logistic regression), yielding a linear text classifier model that can predict the sentiment of new, unseen texts. The method makes effective use of the fact that sentiment words in a labelled corpus have an inherent classification ability, and proposes a new calculation method that maximizes the class discrimination of sentiment words, thereby improving the precision of automatic text sentiment classification.
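
The final training stage can be sketched in plain Python as a logistic regression fitted by batch gradient descent on a small term-document matrix. This is only an illustration under stated assumptions: the matrix values are hypothetical toy counts rather than the patent's weights, the learning rate and epoch count are arbitrary, and the support vector machine variant is not shown.

```python
import math

def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit weights w and bias b by batch gradient descent on the mean log-loss.

    X: list of feature rows (one row per document), y: labels in {0, 1}.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, label in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            p = 1.0 / (1.0 + math.exp(-z))   # predicted P(positive)
            err = p - label
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict(w, b, row):
    """Return 1 (positive) if the linear score is non-negative, else 0."""
    z = b + sum(wj * xj for wj, xj in zip(w, row))
    return 1 if z >= 0 else 0

if __name__ == "__main__":
    # Hypothetical columns: "good", "bad", "movie"
    X = [[1, 0, 1], [2, 0, 1], [0, 1, 1], [0, 2, 1]]
    y = [1, 1, 0, 0]
    w, b = train_logistic_regression(X, y)
    print(w, b)
    print(predict(w, b, [1, 0, 1]), predict(w, b, [0, 1, 1]))
```

Because the model is linear, each learned weight can be read directly as the contribution of one unigram or bigram to the positive class, which is one way to obtain the interpretability the summary mentions.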

Description

Technical field

[0001] The invention relates to the fields of natural language processing and machine learning, and in particular to a method for classifying positive and negative sentiment in text.

Background art

[0002] With the rapid development of the Internet, Web texts have become the main carrier for exchanging emotions and expressing opinions, and a main source of information on hot topics. Users use social media (forums, blogs, microblogs) to share their feelings about products they have purchased, comments on newly released movies, personal opinions on current news, and so on. These remarks often contain rich personal emotions and opinions, such as joy, anger, sorrow, happiness, approval, disapproval, and neutrality. On the one hand, these emotionally rich Web comment texts can help manufacturers understand the strengths and weaknesses of their products through electronic word-of-mouth, so as to improve product design and services, adjust advertising strategies, ...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06F40/289, G06F40/30
CPC: G06F16/35, G06F40/289, G06F40/30
Inventor: 李光敏, 林志伟, 王晖, 魏欣
Owner: HUBEI NORMAL UNIV