Semi-supervised text sentiment classification method based on random feature subspace

The method applies random feature subspace techniques to semi-supervised text sentiment classification. It addresses two problems: base classifiers that differ too little from one another, and a large number of misclassified samples.

Active Publication Date: 2015-12-30
HEFEI UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0004] To overcome the deficiencies of the prior art, the present invention proposes a semi-supervised text sentiment classification method based on random subspace. It addresses two problems: the large number of misclassified samples produced during training by the traditional cooperative training (co-training) algorithm, and the small difference between base classifiers in existing semi-supervised text sentiment classification methods. Solving both further improves the accuracy of text sentiment classification.

Embodiment Construction

[0056] The present invention preprocesses the comment texts to construct a global feature set and represents every comment text as a vector. The sentiment polarity of some of the comment texts is then labeled, yielding a labeled sample set and an unlabeled sample set. Next, the Lasso method is used to compute the feature weight of every feature word in the global feature set, and feature words are drawn with probability given by these weights to construct a random subspace. The labeled sample set is mapped into the random subspace to train a classifier, and co-training with the unlabeled sample set yields the final classifier. Finally, the Z classifiers are integrated by majority voting to obtain the final ensemble classifier F(x_ε). Specifically, as shown in Figure 1, the method comprises the following steps:
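The subspace construction described above can be sketched as follows. This is only an illustration under stated assumptions, not the patent's implementation: the feature weights are taken as already computed (for example, the absolute values of Lasso coefficients), each feature is drawn without replacement with probability proportional to its weight, and the names `sample_subspace`, `project`, `feature_weights`, and the subspace size of 3 are hypothetical.

```python
import random

def sample_subspace(weights, k, rng):
    """Draw k distinct feature indices with probability proportional to |weight|."""
    # Assumption: a feature with zero weight has zero probability and is never drawn.
    remaining = {i: abs(w) for i, w in enumerate(weights) if w != 0}
    if k > len(remaining):
        raise ValueError("k exceeds the number of features with nonzero weight")
    chosen = []
    for _ in range(k):
        indices = list(remaining)
        pick = rng.choices(indices, weights=[remaining[i] for i in indices], k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # sample without replacement
    return sorted(chosen)

def project(vector, subspace):
    """Map a full feature vector onto the selected feature indices."""
    return [vector[i] for i in subspace]

# Hypothetical weights for a global feature set of six feature words.
feature_weights = [0.0, 0.9, -0.1, 0.5, 0.0, 0.3]
rng = random.Random(0)
Z = 4  # number of random subspaces, one per base classifier
subspaces = [sample_subspace(feature_weights, 3, rng) for _ in range(Z)]

# Project one labeled sample into each subspace before training.
sample = [1, 0, 2, 0, 3, 1]
projected = [project(sample, s) for s in subspaces]
```

Because each subspace is drawn independently, the Z base classifiers see different feature subsets, which is what lets them differ from one another while still favoring high-weight feature words.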

[0057] Step 1. Construct a global feature set T:

[0058] Step 1.1. Obtain n comment texts to form a comment text set D, denoted as D={d 1 ...


Abstract

The invention discloses a semi-supervised text sentiment classification method based on random feature subspace. The method comprises the following steps: (1) acquiring and preprocessing comment text data and constructing a global feature set; (2) representing all comment texts in vector form; (3) labeling some of the comment texts to obtain a labeled sample set and an unlabeled sample set; (4) calculating the feature weight of every feature word in the global feature set; (5) constructing random subspaces; (6) performing co-training with the unlabeled samples to obtain Z classifiers; (7) integrating the Z classifiers by majority voting to obtain the final ensemble classifier. The method addresses two problems: the large number of misclassified samples produced during training by traditional co-training algorithms, and the small difference between classifiers in semi-supervised text sentiment classification methods. The accuracy of text sentiment classification is thereby improved.
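The majority-voting integration in step 7 might look like the sketch below. It assumes that each of the Z classifiers is a callable returning a polarity label (+1 or -1) and that each one is applied to the sample projected into its own subspace. The toy classifiers, the subspaces, and the function name `majority_vote` are hypothetical and stand in for the trained models.

```python
from collections import Counter

def majority_vote(classifiers, subspaces, x):
    """Return the label predicted by most classifiers for the full vector x."""
    votes = []
    for clf, subspace in zip(classifiers, subspaces):
        # Each classifier only sees the features of its own subspace.
        votes.append(clf([x[i] for i in subspace]))
    return Counter(votes).most_common(1)[0][0]

# Three toy classifiers over three hypothetical subspaces of a 3-feature vector.
subspaces = [[0, 1], [1, 2], [0, 2]]
classifiers = [
    lambda v: 1 if sum(v) > 0 else -1,
    lambda v: 1 if v[0] > 0 else -1,
    lambda v: 1 if v[1] > 0 else -1,
]

x = [2.0, -1.0, -3.0]
label = majority_vote(classifiers, subspaces, x)  # -1: two of the three vote negative
```

With two polarity classes, choosing an odd Z avoids ties. In this sketch a tie would fall to whichever label was voted first, which is an arbitrary choice rather than something the source specifies.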

Description

Technical field

[0001] The invention belongs to the field of natural language processing and pattern recognition, and in particular relates to a semi-supervised text sentiment classification method based on random feature subspace.

Background

[0002] In recent years, with the rapid development of the Internet, more and more Internet users have been willing to publish their opinions and comments online, producing a large volume of user-created subjective text. Such subjective texts contain sentiment information such as users' views, opinions, and attitudes. Analyzing the sentiment expressed in subjective texts and identifying its polarity therefore plays an important role for Internet users. Text sentiment analysis requires a large number of labeled samples. In practical applications, however, it is quite easy to collect a large number of unlabeled samples, whereas it takes a lot of manpower and material resources to label these unlabeled sa...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/35
Inventor: 王刚, 孙二冬, 李宁宁, 程八一, 何耀耀, 汪洋, 蒋军, 夏婷婷
Owner: HEFEI UNIV OF TECH