Preprocessing method of text sentiment analysis characteristic verification

A sentiment analysis preprocessing technique, applicable to special data processing applications, text database clustering/classification, and electronic digital data processing. It addresses three problems: unbalanced corpus distribution and easy misclassification in manually constructed sentiment analysis corpora, both of which are generally given insufficient consideration, and low accuracy. Its effects are improved accuracy, good versatility and scalability, and richer information in the evaluation conclusion.

Active Publication Date: 2016-04-06
科来网络技术股份有限公司

AI Technical Summary

Problems solved by technology

[0003] 1. Schemes designed for general-purpose scenarios do not go deep enough into the sentiment analysis domain. In particular, sentiment information on the Internet has a markedly unbalanced corpus distribution, and manually constructed sentiment analysis corpora are prone to misclassification. These two issues are generally given insufficient consideration;
[0004] 2. The general basis for verifying and comparing different feature extraction algorithms lacks a best-practice design for practical applications.
[0006] 1. The accuracy of most existing Chinese sentiment analysis algorithms is low, and there is a lack of feature verification or feature selection schemes that can guide algorithm improvement. For example, according to the results of the 5th Chinese Tendency Analysis and Evaluation Symposium (COAE2013), accuracy is generally about 60%;
[0007] 2. Text information is represented by feature vectors. However, because the field lacks recognized best practices, the modeling approaches and algorithms used for sentiment analysis vary widely. A recognition and verification scheme for sentiment analysis feature vectors therefore also needs to account for the characteristics of these algorithms and models, such as the commonly used bag-of-words, n-gram, and word2vec models.
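
To make the feature-vector representations named in [0007] concrete, the following is a minimal sketch of bag-of-words and n-gram counting. It is not taken from the patent: the helper names `ngrams` and `bag_of_ngrams` and the example sentence are illustrative assumptions, and whitespace tokenization is used only for simplicity (Chinese text would need a word segmenter first).

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-token sequences, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bag_of_ngrams(tokens, n=1):
    """Count n-grams; with n=1 this is a plain bag-of-words vector."""
    return Counter(ngrams(tokens, n))

# Hypothetical example sentence, tokenized on whitespace.
tokens = "the service was not good".split()
unigrams = bag_of_ngrams(tokens, 1)  # word order is discarded
bigrams = bag_of_ngrams(tokens, 2)   # keeps local context such as "not good"
```

The bigram vector keeps the negation "not good" as a single feature, which the unigram vector cannot express. Differences of this kind are why a verification scheme has to take the underlying model into account.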

Examples

Embodiment Construction

[0032] A preprocessing method for text sentiment analysis feature verification, comprising the following steps:

[0033] 1. Preprocess the original training set to obtain preprocessing information:

[0034] As shown in Figure 1, this step includes the following:

[0035] 1.1. Perform a summary analysis of the original training set. The output is recorded as sample_struct:

[0036] (1) Judge whether the total number of samples is large enough. The result is recorded in the parameter sample_size. For sentiment analysis, a sample set is large enough when each effective classification has more than 1000 unique samples.

[0037] (2) Judge whether the distribution of sentiment classifications is balanced. The result is recorded in the parameter sample_dist, which contains the number of samples in each classification. If the number of samples of different classifications is not m...
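
The summary analysis in step 1.1 can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the patent's implementation: the function name `summarize_training_set`, the `(text, label)` input format, and the dictionary layout of `sample_struct` are hypothetical, and every label present in the data is treated as an "effective classification". The balance criterion is truncated in the source text, so the sketch only records the per-class counts in `sample_dist` and does not decide whether they are balanced.

```python
from collections import Counter

MIN_UNIQUE_PER_CLASS = 1000  # threshold stated in [0036]

def summarize_training_set(samples, min_unique=MIN_UNIQUE_PER_CLASS):
    """Build a sample_struct summary from (text, label) pairs.

    Assumption: every label that appears counts as an effective classification.
    """
    unique = set(samples)  # drop exact duplicate (text, label) pairs
    sample_dist = Counter(label for _, label in unique)
    # Large enough only if every class has MORE than min_unique unique samples.
    sample_size = bool(sample_dist) and all(
        count > min_unique for count in sample_dist.values()
    )
    return {"sample_size": sample_size, "sample_dist": dict(sample_dist)}

# Toy data with a lowered threshold, purely for illustration.
toy = [("great", "pos"), ("great", "pos"), ("bad", "neg"), ("awful", "neg")]
sample_struct = summarize_training_set(toy, min_unique=1)
```

In the toy data the duplicated positive sample is counted once, so the positive class has a single unique sample and the set is reported as not large enough even with the lowered threshold.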

Abstract

The invention discloses a preprocessing method for text sentiment analysis feature verification. The method comprises the following steps: preprocessing an original training set to obtain preprocessing information, and then performing feature verification and feature selection on that information. Preprocessing the original training set comprises the following specific steps: determining a summary of the original training set, determining a summary of the original feature vector set, and expanding the original data to construct integrated processing information. The beneficial effects are that analysis information is generated from two perspectives, the training set and the feature vectors, which ensures the richness of the information behind the evaluation conclusion and helps improve the accuracy of the whole sentiment analysis workflow. The method shows good generality and extensibility, and works well with sentiment analysis algorithms built on a variety of modeling and implementation approaches.

Description

Technical field

[0001] The invention belongs to the field of text sentiment analysis, and in particular relates to a preprocessing method for feature verification in text sentiment analysis.

Background

[0002] Existing feature selection and verification schemes for text classification have achieved good results in content-domain classification, but they have the following problems when applied to sentiment analysis:

[0003] 1. Schemes designed for general-purpose scenarios do not go deep enough into the sentiment analysis domain. In particular, sentiment information on the Internet has a markedly unbalanced corpus distribution, and manually constructed sentiment analysis corpora are prone to misclassification. These two issues are generally given insufficient consideration;

[0004] 2. The general basis for verifying and comparing different feature extraction algorithms lacks a best-practice design for practical applications. For example, in the commonly u...

Application Information

IPC(8): G06F17/30
CPC: G06F16/35, G06F16/374
Inventor: 罗鹰张鑫阳林康
Owner: 科来网络技术股份有限公司