A Sentiment Classification Method Based on Parts of Speech Combination and Feature Selection

A sentiment classification and feature selection technology, applied in the field of computer science, that addresses problems such as Word2vec's inability to directly extract phrase structures, to directly learn phrase vectors, and to distinguish the meanings of the same word under different parts of speech.

Active Publication Date: 2022-05-20
NANTONG UNIVERSITY
Cites: 3 | Cited by: 0

AI Technical Summary

Problems solved by technology

[0003] Although a word vector model trained with traditional Word2vec can reflect the latent semantic relationships between words, training the model often raises some problems. First, the Word2vec tool cannot directly extract the phrase structures that better reflect the emotional tendency of a text. For example, "unhappy" is segmented into "no" and "happy". During training, Word2vec learns the contextual semantics of the separate words "no" and "happy" and cannot directly learn a vector for the phrase "unhappy". Second, it cannot distinguish the semantics of the same word under different parts of speech. Consider the sentences "Xiao Ming bought a bundle of incense and used it for sacrifices, but the incense bought this time is too rubbish" and "The rice cooked by Xiao Ming is really fragrant". In the first sentence, "xiang" is a noun that refers to the thin sticks made of sawdust mixed with spices and used in worshiping ancestors or gods; it has no emotional color and is a neutral word. In the second sentence, "xiang" is an adjective that describes a pleasant smell and is a commendatory word.
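The first limitation in [0003] can be illustrated with a small sketch. It assumes the text is already segmented into tokens and simply joins a negation word to the token that follows it, so that a downstream Word2vec-style model would see the pair as one vocabulary entry. The function name `merge_negation_phrases`, the toy negation list, and the `_` separator are illustrative assumptions, not details from the patent.

```python
# Hypothetical helper: merge a negation word with the word that follows it,
# so the pair can be treated as one vocabulary entry (e.g. "not_happy").
NEGATIONS = {"not", "no", "never"}  # assumed toy negation list


def merge_negation_phrases(tokens, negations=NEGATIONS, sep="_"):
    """Return a new token list where each negation is joined to the next token."""
    merged = []
    i = 0
    while i < len(tokens):
        if tokens[i] in negations and i + 1 < len(tokens):
            merged.append(tokens[i] + sep + tokens[i + 1])
            i += 2  # skip the token that was absorbed into the phrase
        else:
            merged.append(tokens[i])
            i += 1
    return merged


tokens = ["i", "am", "not", "happy", "today"]
print(merge_negation_phrases(tokens))  # ['i', 'am', 'not_happy', 'today']
```

With the merged token in the training corpus, the phrase gets its own vector instead of being represented only through its two parts.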
[0004] Traditional data storage and processing methods waste a great deal of computing resources and time. Moreover, because of its step-by-step processing mechanism, a traditional Hadoop cluster is limited in performance, and its disk I/O overhead is extremely high.

Examples

Embodiment Construction

[0039] The technical solution of the present invention will be described in detail below in conjunction with the accompanying drawings. In this embodiment, the microblog comment text is used as input text data.

[0040] As shown in Figure 1, the sentiment classification method based on part-of-speech combination and feature selection of this embodiment performs binary (positive or negative) classification of text sentiment and comprises the following steps:

[0041] Step 1) Initialize the word-part-of-speech Word2vec model.

[0042] Step 2) Preprocess the text, and select feature words carrying emotional information from the preprocessed text data based on the sentiment dictionary. The sentiment dictionary of this embodiment is composed of a basic sentiment dictionary, an extended sentiment dictionary and a multi-collocation sentiment dictionary.
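Step 2 can be sketched as a dictionary lookup. The snippet below assumes the text is already tokenized and that the three dictionaries are plain sets of words; the dictionary entries and the function name `select_feature_words` are invented for illustration and are not taken from the patent.

```python
# Toy stand-ins for the three dictionaries named in Step 2; the entries are
# invented for illustration only.
BASIC_DICT = {"happy", "sad"}
EXTENDED_DICT = {"awesome"}
MULTI_COLLOCATION_DICT = {"not_happy"}

# The combined sentiment dictionary is the union of the three parts.
SENTIMENT_DICT = BASIC_DICT | EXTENDED_DICT | MULTI_COLLOCATION_DICT


def select_feature_words(tokens, sentiment_dict=SENTIMENT_DICT):
    """Keep only tokens found in the sentiment dictionary, preserving order."""
    return [tok for tok in tokens if tok in sentiment_dict]


tokens = ["the", "movie", "was", "awesome", "but", "i", "am", "not_happy"]
print(select_feature_words(tokens))  # ['awesome', 'not_happy']
```

Filtering this way discards words without emotional information before any vectors are looked up.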

[0043] Step 3) Combine each feature word and part of speech to convert the text into a sequence text of "word part of speech...
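The word and part-of-speech combination in Step 3 might look like the following minimal sketch. It assumes the feature words arrive as (word, tag) pairs from some tagger; the `_` separator, the tag names `n` and `a`, and the function name `to_word_pos_sequence` are illustrative assumptions.

```python
def to_word_pos_sequence(tagged_tokens, sep="_"):
    """Join each (word, pos) pair into a single token such as 'xiang_n'."""
    return [f"{word}{sep}{pos}" for word, pos in tagged_tokens]


# The same surface word with two different tags becomes two distinct tokens,
# so a Word2vec-style model could learn a separate vector for each.
noun_use = [("xiang", "n"), ("rubbish", "a")]
adj_use = [("rice", "n"), ("xiang", "a")]

print(to_word_pos_sequence(noun_use))  # ['xiang_n', 'rubbish_a']
print(to_word_pos_sequence(adj_use))   # ['rice_n', 'xiang_a']
```

Because `xiang_n` and `xiang_a` are different vocabulary entries, the neutral noun and the commendatory adjective no longer share one vector.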

Abstract

The sentiment classification method based on part-of-speech combination and feature selection of the present invention comprises the following steps: first, the word-part-of-speech Word2vec model is initialized; second, the data is preprocessed, and feature words carrying emotional information are selected from the preprocessed data based on the sentiment dictionary; then each feature word of the text is combined with its part of speech, converting the text into a sequence text of word-part-of-speech pairs; next, the vector of each feature word in the word-part-of-speech pair sequence text is obtained from the word-part-of-speech Word2vec model, and for each text the word vectors are added dimension by dimension and averaged to represent the text, yielding the feature vector of the text; finally, the SVM classifier is used to obtain the sentiment classification model. The beneficial effects are: on the one hand, the sentiment dictionary is used to extract feature words, highlighting the feature words with single-sentiment information; on the other hand, word segmentation based on phrase structure optimization extracts phrase structures with emotional tendencies, and combining words with parts of speech solves the problem of polysemy.
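The averaging step in the abstract can be sketched in plain Python. The snippet assumes word vectors are available as a dictionary from word-part-of-speech tokens to equal-length lists of floats; the toy embeddings, the function names, and the choice to skip tokens missing from the vocabulary are assumptions made for illustration.

```python
def average_vectors(vectors):
    """Average a non-empty list of equal-length vectors dimension by dimension."""
    if not vectors:
        raise ValueError("need at least one word vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all word vectors must have the same dimension")
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def text_feature_vector(tokens, embeddings):
    """Represent a text as the mean of the vectors of its known tokens."""
    known = [embeddings[tok] for tok in tokens if tok in embeddings]
    return average_vectors(known)


# Toy 3-dimensional embeddings; the values are invented for illustration.
embeddings = {
    "xiang_a": [1.0, 0.0, 2.0],
    "not_happy_a": [3.0, 2.0, 0.0],
}
print(text_feature_vector(["xiang_a", "not_happy_a", "unknown_n"], embeddings))
# [2.0, 1.0, 1.0]
```

The resulting fixed-length vectors are what would be passed, together with positive or negative labels, to an SVM classifier such as scikit-learn's `sklearn.svm.SVC`; that library choice is an assumption, since the source only says an SVM classifier is used.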

Description

Technical field

[0001] The invention relates to the field of computer science, in particular to a sentiment classification method based on part-of-speech combination and feature selection.

Background technique

[0002] With the rapid development of social networking platforms, especially Weibo, large numbers of netizens can express their opinions and emotions on social events more conveniently. This produces a large amount of Weibo comment data that contains rich opinions and emotional information. How to deeply analyze and mine the emotional tendency of massive Weibo text data has become a hot research direction. Traditional sentiment classification methods focus only on lexical and syntactic features and ignore the semantic features between words.

[0003] Although the word vector model trained by traditional Word2vec can reflect the potential semantic relationship between words, there are often some problems when training the model. First, the Wor...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06F40/289
CPC: G06F40/289
Inventor: 施佺, 郑亚平, 邵叶秦, 王晗, 周晨璨
Owner: NANTONG UNIVERSITY