Method for text emotion classification through sparse multinomial logistic regression model under Spark framework

Logistic regression modeling applied to text sentiment classification (classified under special data processing applications, instruments, electrical digital data processing, etc.). The method addresses the high computational complexity of the SMLR algorithm and its inability to process large-scale sample or feature data effectively, and achieves fast solution, a high recognition rate, and strong generalization ability.

Active Publication Date: 2018-09-18
CHONGQING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

However, the data to be handled in these fields often has a large sample size or a large number of features.
The original SMLR algorithm faces two problems. The first is that its computational complexity is too high.
The second is that little work in distributed machine learning has addressed parallelizing the SMLR problem, so large-scale sample or feature data cannot be handled effectively.



Examples


Embodiment Construction

[0055] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0056] It should be noted that the terms "first" and "second" in the description and claims of the present invention and in the above drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It is to be understood that the data so used are interchangeable under appropriate circumstances such that the embodiments of the invention described herein can be practiced in sequences other than those illustrate...


Abstract

The invention provides a method for text emotion classification through a sparse multinomial logistic regression model under a Spark framework. The method comprises the steps that a training sample dataset is stored in an HDFS (Hadoop Distributed File System); a Spark platform reads data from the HDFS to generate an RDD (Resilient Distributed Dataset); the Spark platform divides a preprocessing task of the data into multiple task groups, the RDD storing the read data in each task group is preprocessed, and the preprocessing result is stored into the HDFS; the sparse multinomial logistic regression model is trained, and a sparse multinomial logistic regression classifier is obtained through solving; the sparse multinomial logistic regression classifier is output into the HDFS; the preprocessed data of a to-be-predicted text and the sparse multinomial logistic regression classifier obtained through training are read from the HDFS; and the emotion classification of the to-be-predicted text is acquired. According to the method, an ADMM (Alternating Direction Method of Multipliers) parallel method is used to solve an optimization problem under the Spark computing framework, so that model training is faster, and the method is more suitable for text emotion classification under a big data scene; and classification efficiency and precision are effectively improved.
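The training step described in the abstract can be illustrated with a minimal, single-process sketch of consensus ADMM for L1-regularized multinomial logistic regression. This is not the patented implementation: it uses plain Python lists in place of Spark RDD partitions, a few gradient steps as an approximate local solver, and hypothetical names throughout (`admm_smlr`, `local_update`, `lam`, `rho`, the toy data). It only shows how per-partition updates and a soft-thresholded consensus step could fit together.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_threshold(value, kappa):
    """Proximal operator of kappa * |value|; this step produces exact zeros."""
    if value > kappa:
        return value - kappa
    if value < -kappa:
        return value + kappa
    return 0.0

def local_update(rows, labels, w, z, u, rho, steps=20, lr=0.1):
    """Approximate per-partition update (an assumption, not the patent's solver):
    gradient descent on mean negative log-likelihood + (rho/2)*||w - z + u||^2."""
    num_classes, num_features = len(z), len(z[0])
    w = [row[:] for row in w]
    for _ in range(steps):
        grad = [[rho * (w[k][d] - z[k][d] + u[k][d]) for d in range(num_features)]
                for k in range(num_classes)]
        for x, y in zip(rows, labels):
            probs = softmax([sum(wk[d] * x[d] for d in range(num_features)) for wk in w])
            for k in range(num_classes):
                err = (probs[k] - (1.0 if k == y else 0.0)) / len(rows)
                for d in range(num_features):
                    grad[k][d] += err * x[d]
        for k in range(num_classes):
            for d in range(num_features):
                w[k][d] -= lr * grad[k][d]
    return w

def admm_smlr(partitions, num_classes, num_features, lam=0.01, rho=1.0, iters=30):
    """Consensus ADMM: local weights per partition, one shared sparse model z."""
    n = len(partitions)
    zeros = lambda: [[0.0] * num_features for _ in range(num_classes)]
    z, ws, us = zeros(), [zeros() for _ in range(n)], [zeros() for _ in range(n)]
    for _ in range(iters):
        # Local step: on a cluster, each partition would run this in parallel.
        ws = [local_update(rows, labels, ws[i], z, us[i], rho)
              for i, (rows, labels) in enumerate(partitions)]
        # Consensus step: average, then soft-threshold to enforce sparsity.
        z = [[soft_threshold(sum(ws[i][k][d] + us[i][k][d] for i in range(n)) / n,
                             lam / (rho * n))
              for d in range(num_features)] for k in range(num_classes)]
        # Dual step: accumulate each partition's disagreement with z.
        us = [[[us[i][k][d] + ws[i][k][d] - z[k][d] for d in range(num_features)]
               for k in range(num_classes)] for i in range(n)]
    return z

def predict(model, x):
    """Return the index of the highest-scoring class for feature vector x."""
    scores = [sum(wk[d] * x[d] for d in range(len(x))) for wk in model]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy data (hypothetical): feature 0 marks class 0, feature 1 marks class 1,
# feature 2 is a word that appears equally in both classes.
partitions = [
    ([[2.0, 0.0, 1.0], [0.0, 2.0, 1.0]], [0, 1]),
    ([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]], [0, 1]),
]
model = admm_smlr(partitions, num_classes=2, num_features=3)
print(predict(model, [3.0, 0.0, 1.0]), predict(model, [0.0, 3.0, 1.0]))
```

In a Spark setting, the list comprehension over `partitions` is the part that would be distributed, while the averaging and soft-thresholding would run after the local results are collected. The soft-threshold step is what tends to drive weights of uninformative features to exactly zero.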

Description

Technical field

[0001] The invention relates to the field of distributed machine learning, in particular to a text sentiment classification method based on a sparse multinomial logistic regression model under the Spark framework.

Background art

[0002] As a key part of machine learning and data mining, classification has a wide range of applications in image recognition, drug development, speech recognition, handwriting recognition, etc. It is a supervised learning problem of identifying which category a new instance belongs to based on a known training set.

[0003] With the continuous expansion of data scale, serial methods for solving the Sparse Multinomial Logistic Regression (SMLR) problem struggle to meet the time and storage constraints of big data applications. Among many distributed algorithms, the Alternating Direction Method of Multipliers (ADMM) is widely used in the field of distributed machine learning because of its high decomposition and c...
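For orientation, the SMLR problem and a standard global-consensus ADMM splitting can be written as follows. This is the textbook form from the ADMM literature, not necessarily the exact formulation used in the patent; all symbols are assumptions introduced here: $x_j$ and $y_j$ are the feature vector and label of sample $j$, $W = [w_1, \dots, w_K]$ holds one weight vector per class, $\lambda$ is the sparsity weight, $\rho$ the penalty parameter, $N$ the number of data blocks, $f_i$ the negative log-likelihood on block $i$, and $S_\kappa$ the elementwise soft-thresholding operator.

```latex
\begin{aligned}
&\min_{W}\; -\sum_{j=1}^{n} \log
  \frac{\exp\!\left(w_{y_j}^{\top} x_j\right)}
       {\sum_{k=1}^{K} \exp\!\left(w_k^{\top} x_j\right)}
  \;+\; \lambda \lVert W \rVert_1
\\[6pt]
&\text{Consensus form: }\;
  \min_{W_1,\dots,W_N,\,Z}\; \sum_{i=1}^{N} f_i(W_i) + \lambda \lVert Z \rVert_1
  \quad \text{s.t. } W_i = Z,\; i = 1,\dots,N
\\[6pt]
&W_i^{t+1} = \arg\min_{W_i}\; f_i(W_i)
  + \frac{\rho}{2}\,\bigl\lVert W_i - Z^{t} + U_i^{t} \bigr\rVert_F^2
\\
&Z^{t+1} = S_{\lambda/(\rho N)}\!\left(
  \frac{1}{N}\sum_{i=1}^{N} \bigl(W_i^{t+1} + U_i^{t}\bigr)\right)
\\
&U_i^{t+1} = U_i^{t} + W_i^{t+1} - Z^{t+1}
\end{aligned}
```

The $W_i$ updates are independent across blocks, which is what makes the scheme a natural fit for a data-parallel framework; only the $Z$ update requires aggregating results from all blocks.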


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
CPC: G06F40/289
Inventor: 雷大江, 杜萌, 陈浩, 张莉萍, 吴渝, 杨杰, 程克非
Owner: CHONGQING UNIV OF POSTS & TELECOMM