Multi-class sentiment analysis method and system for bilingual microblog text

Sentiment analysis and bilingual text technology, applied to text database clustering/classification, unstructured text data retrieval, natural language data processing, etc.

Inactive Publication Date: 2015-02-04
BEIJING INSTITUTE OF TECHNOLOGY

AI Technical Summary

Problems solved by technology

[0005] The purpose of the present invention is to solve the problems that existing microblog sentiment analysis methods classify at a coarse granularity, that the analysis quality is poor for microblog text that mixes Chinese and English, and that the identification method of the emotional vocabulary is l


Examples

Embodiment Construction

[0037] Figure 1 is a flowchart of a multi-class sentiment analysis method for bilingual microblog texts according to an embodiment of the present invention. The main workflow of text emotion recognition is as follows:

[0038] (1) Construction of a bilingual emotional dictionary: first, collect a corpus of a certain scale with emotional tendencies and extract the high-frequency words with emotional tendencies from it; then, expand the emotional dictionary using existing knowledge bases (WordNet, NTUSD, and HowNet) and a vocabulary similarity computation model; finally, add new Internet slang and emoticons to the emotional dictionary;

[0039] (2) Text preprocessing: segment the text to be recognized into words and remove stop words. Stop words are function words that carry no real meaning of their own, such as determiners in English (“the”, “a”, “an”, “that”). On this basis, the English text also needs lemmatization and stemming operatio...
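Steps (1) and (2) above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the toy corpus, the lexicon, the stop-word list, and every function name are hypothetical. `toy_similarity` stands in for a knowledge-base similarity model built on WordNet, NTUSD, or HowNet; forward maximum matching stands in for a real Chinese word segmenter; and the suffix rules are a crude substitute for a proper lemmatizer and stemmer.

```python
import re
from collections import Counter

# --- Step (1): bilingual emotional dictionary (all data here is illustrative) ---
def extract_seed_words(labeled_corpus, min_count=2):
    """Keep words seen at least min_count times under their most frequent emotion."""
    counts = {}  # word -> Counter mapping emotion label to frequency
    for tokens, emotion in labeled_corpus:
        for tok in tokens:
            counts.setdefault(tok, Counter())[emotion] += 1
    seeds = {}
    for word, by_emotion in counts.items():
        emotion, n = by_emotion.most_common(1)[0]
        if n >= min_count:
            seeds[word] = emotion
    return seeds

def expand_dictionary(seeds, candidates, similarity, threshold=0.8):
    """Give each candidate the emotion of its most similar seed, if similar enough."""
    expanded = dict(seeds)
    for cand in candidates:
        if cand in expanded or not seeds:
            continue
        best = max(seeds, key=lambda s: similarity(cand, s))
        if similarity(cand, best) >= threshold:
            expanded[cand] = seeds[best]
    return expanded

# Hypothetical stand-in for a knowledge-base similarity model.
SYNONYM_PAIRS = {frozenset(("glad", "happy")), frozenset(("unhappy", "sad"))}

def toy_similarity(a, b):
    return 1.0 if frozenset((a, b)) in SYNONYM_PAIRS else 0.0

corpus = [(["so", "happy"], "joy"), (["happy", "开心"], "joy"), (["开心"], "joy"),
          (["sad", "难过"], "sadness"), (["难过", "sad"], "sadness")]
emotion_dict = expand_dictionary(extract_seed_words(corpus),
                                 ["glad", "unhappy", "table"], toy_similarity)
emotion_dict.update({":)": "joy", "T_T": "sadness", "给力": "joy"})  # slang, emoticons

# --- Step (2): preprocessing of mixed Chinese-English text ---
CN_LEXICON = {"今天", "开心", "天气", "真的"}  # hypothetical segmentation lexicon
STOP_WORDS = {"the", "a", "an", "that", "is", "的", "了"}
IRREGULAR = {"was": "be", "felt": "feel"}  # toy lemma table
SUFFIXES = ("ing", "ed", "s")

def segment_chinese(run, max_len=4):
    """Forward maximum matching: take the longest lexicon word at each position."""
    out, i = [], 0
    while i < len(run):
        for size in range(min(max_len, len(run) - i), 0, -1):
            piece = run[i:i + size]
            if size == 1 or piece in CN_LEXICON:
                out.append(piece)
                i += size
                break
    return out

def normalize_english(word):
    """Lowercase, apply the lemma table, then strip one common suffix."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word

def preprocess(text):
    """Split into Latin and CJK runs, normalize each, and drop stop words."""
    tokens = []
    for run in re.findall(r"[A-Za-z]+|[\u4e00-\u9fff]+", text):
        if run[0].isascii():
            tokens.append(normalize_english(run))
        else:
            tokens.extend(segment_chinese(run))
    return [t for t in tokens if t not in STOP_WORDS]
```

Note that the regular expression in `preprocess` discards punctuation, so emoticons such as `:)` would be lost; a pipeline that uses emoticons as dictionary entries would need to extract them before this step.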

Abstract

The invention relates to a multi-class sentiment analysis method and system for bilingual microblog text and belongs to the technical field of microblog text sentiment analysis. The method comprises the following steps: (1) bilingual emotional dictionary construction: a corpus of a certain size with emotional tendencies is first collected and the high-frequency words with emotional tendencies are extracted from it; the emotional dictionary is then expanded using existing knowledge bases and a vocabulary similarity computation model; finally, Internet slang and emoticons are added to the emotional dictionary; (2) text preprocessing: the text to be recognized is segmented into words, stop words are removed, and English word forms are normalized; (3) text feature space representation: the text is vectorized using the bilingual emotional dictionary; (4) emotion recognition on the corpus text is performed by a multi-class emotion model. The accuracy and F1 value of the method are higher than those of traditional classification methods; in particular, on small-scale training sets the semi-supervised Gaussian mixture model classification algorithm performs markedly better than the other methods.
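Steps (3) and (4) could look roughly like the sketch below. The excerpt does not give the patent's feature definition or its exact Gaussian mixture formulation, so everything here is an assumption made for illustration: the tiny `EMOTION_DICT`, the two classes, the choice of one dictionary-hit count per emotion class as the feature vector, one diagonal-covariance Gaussian per class, and the function names `vectorize`, `fit`, and `predict`. The semi-supervised part is the common EM variant in which labeled texts keep fixed class memberships and only unlabeled texts are re-estimated.

```python
import math
from collections import Counter

CLASSES = ["joy", "sadness"]
# Hypothetical bilingual emotional dictionary: token -> emotion class.
EMOTION_DICT = {"happy": "joy", "开心": "joy", "lol": "joy",
                "sad": "sadness", "难过": "sadness", "cry": "sadness"}

def vectorize(tokens):
    """Step (3): one feature per emotion class, counting dictionary hits."""
    hits = Counter(EMOTION_DICT[t] for t in tokens if t in EMOTION_DICT)
    return [float(hits[c]) for c in CLASSES]

def log_joint(x, weight, mean, var):
    """log(weight) plus the log density of a diagonal-covariance Gaussian at x."""
    return math.log(weight) + sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var))

def fit(labeled, unlabeled, n_iter=20, min_var=0.1):
    """Step (4): EM with one Gaussian per class; labeled memberships stay fixed.

    Assumes every class has at least one labeled example.
    """
    k = len(CLASSES)
    points = [x for x, _ in labeled] + list(unlabeled)
    dim = len(points[0])
    # One-hot responsibilities for labeled points, uniform for unlabeled ones.
    resp = [[1.0 if c == y else 0.0 for c in CLASSES] for _, y in labeled]
    resp += [[1.0 / k] * k for _ in unlabeled]
    model = []
    for _ in range(n_iter):
        # M-step: re-estimate weight, mean and variance of each class component.
        model = []
        for j in range(k):
            nj = sum(r[j] for r in resp)
            mean = [sum(r[j] * x[d] for r, x in zip(resp, points)) / nj
                    for d in range(dim)]
            var = [max(min_var, sum(r[j] * (x[d] - mean[d]) ** 2
                                    for r, x in zip(resp, points)) / nj)
                   for d in range(dim)]
            model.append((nj / len(points), mean, var))
        # E-step: update soft class memberships of the unlabeled points only.
        for i in range(len(labeled), len(points)):
            logs = [log_joint(points[i], *comp) for comp in model]
            top = max(logs)
            exps = [math.exp(value - top) for value in logs]
            total = sum(exps)
            resp[i] = [e / total for e in exps]
    return model

def predict(model, x):
    """Return the class whose component gives x the highest joint log score."""
    scores = [log_joint(x, *comp) for comp in model]
    return CLASSES[scores.index(max(scores))]

labeled = [(vectorize(["happy", "开心"]), "joy"),
           (vectorize(["sad", "难过"]), "sadness")]
unlabeled = [vectorize(t) for t in
             (["lol", "happy"], ["cry", "sad"], ["开心"], ["难过", "cry", "cry"])]
model = fit(labeled, unlabeled)
print(predict(model, vectorize(["lol", "happy"])))
```

The variance floor `min_var` keeps a component from collapsing when all of its points share the same value in one dimension, which is likely with sparse count features and a small labeled set.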

Description

Technical field

[0001] The invention relates to a sentiment analysis method and system, in particular to a multi-class sentiment analysis method and system for bilingual microblog texts, and belongs to the technical field of microblog text sentiment analysis.

Background

[0002] With the rise of social media platforms and the widespread use of mobile devices, people have become accustomed to expressing themselves in 140 characters. Posting on Weibo has become an important means for individuals to express their emotions, so analyzing the sentiment tendency of Weibo texts is of great practical significance. At present, Sina Weibo has become the main carrier of domestic Internet public opinion, and a large number of users exchange information and express emotions through Weibo. The development of an emotion classification system for user microblog texts and then the completion of emotion recognition has important reference significance in the fields of pu...

Application Information

IPC(8): G06F17/30, G06F17/27
CPC: G06F16/35, G06F40/205
Inventor: 礼欣, 栗雨晴, 韩煦, 宋丹丹, 廖乐健
Owner: BEIJING INSTITUTE OF TECHNOLOGY